Preface


In the spring of 1953, shortly after Watson and Crick’s discovery of the double-helical structure of 
DNA, I found myself at dinner next to the famous geneticist R.A. Fisher. When I asked him what he 
thought would be that structure’s implication for genetics, he replied firmly “None!” He may have 
taken this myopic view because to him genetics was an abstract mathematical subject with laws that
were independent of the physical nature of genes. At the time, it was an esoteric subject taught only to 
a few biologists and regarded as largely irrelevant to medicine. Archibald Garrod’s seminal book 
Inborn Errors of Metabolism had made little impact, perhaps because no-one knew what genes were 
made of or how mutations acted. 

The first recognition that mutations act on proteins came in 1949 when Linus Pauling and his 
collaborators published their paper on “Sickle cell anemia, a molecular disease.” They found that 
patients suffering from this recessively inherited disease had an abnormal hemoglobin that differed 
from the normal form by the elimination of two negative charges. Eight years later, Vernon Ingram 
showed that this was due to the replacement of a single glutamic acid residue in each of the identical 
half-molecules of hemoglobin by a valine. Ingram’s discovery was the first specific evidence of the 
chemical effect of a mutation. It marked the birth of molecular genetics and molecular medicine and it 
started the transformation of genetics to its central position in molecular biology, biochemistry, and 
medicine today. Even so, people who have studied genetics as part of their curriculum may not have 
become familiar with all the hundreds of specialized terms that geneticists have coined. To them and 
many others at the periphery of genetics, this Encyclopedia will prove most useful. 

Like the Encyclopaedia Britannica, it is a cross between a dictionary and a textbook. Hybrids are
defined in three lines, while Jonathan Hodgkin’s brilliant exposition of past and present research on 
Drosophila’s rival, the minute nematode worm Caenorhabditis elegans, occupies nearly six pages. 
This entry is intelligible to non-specialists, but that is not true of some others that were “Greek” to 
me.

The Encyclopedia includes biographies of many of the pioneers, from Gregor Mendel to Ernst
Mayr, the great evolutionist who actually contributed several entries. Many of the other contributors 
are also pioneers, even though they are not yet old enough to figure among the biographies. Sydney 
Brenner has written the entry on the genetic code which he himself helped to discover 40 years ago; 
David Weatherall has written on thalassemia to whose exploration he has devoted a lifetime; Malcolm 
Ferguson-Smith has written on human chromosomal anomalies on which he is the world authority; 
he and some others made such varied contributions to their fields that they figure among many of the 
1650-odd entries. Sydney Brenner, one of the two editors-in-chief, may be the only contributor who 
does not need the Encyclopedia. 

What of the future of genetics? Its applications are likely to multiply, and many of those applying it
from other fields will welcome the Encyclopedia. An increasing number of applications will
be in medicine. I have heard predictions that in future every newborn child will have its genes 
screened and the results imprinted on a computer chip that he or she will carry for life. Recorded 
on it would be all genetic anomalies, susceptibilities to diseases, and intolerance of drugs. In case of 
illness or accident, that chip would activate an algorithm that automatically prescribes the correct 
treatment. Unfortunately for such utopias, medicine is more complex. Weatherall has shown elsewhere
that the single recessive disease thalassemia is a multiplicity of different diseases and that the
same genotype may give rise to widely different phenotypes, depending on environmental and other
factors. Complexity arises even where single point mutations lead not to disease but merely to
susceptibility to disease, as in α1-antitrypsin deficiency or with certain abnormal
hemoglobins. The complexity is much greater still in multifactorial diseases like schizophrenia or
diabetes. For many reasons good medicine will continue to require wide knowledge, mature
judgement, empathy, and wisdom.

There have been glib predictions that the mapping of the human genome will allow most genetic 
diseases to be cured by either germline or somatic gene therapy, but the former is too risky and the 
latter is proving extremely difficult and costly. The risks of germline therapy are illustrated by the 
attempt to create a genetically modified monkey. Scientists injected the gene for the green fluorescent 
protein from a jellyfish into 222 monkey eggs. After fertilizing them with monkey sperm, they 
incubated them and implanted a pair each into the wombs of 20 surrogate monkey mothers. Only five 
of the implants resulted in pregnancies. Only three monkey babies were born and only one of them 
carried the jellyfish gene, but the monkey does not fluoresce, because the gene, though incorporated 
into its chromosomes, fails to be expressed. It will be argued that technical improvements may lead to 
a 100% success rate, but this kind of gene transfer has now been practised for some years in mice and 
other animals, and it has remained a haphazard affair that would be criminal to apply to humans. 
Human cloning carries similar risks. 

After many failures, somatic gene therapy of a potentially fatal human genetic disease recently 
succeeded for the first time. A French team cured two baby boys of X-linked severe combined
immunodeficiency (SCID-X1). They infected cells extracted from the boys’ bone marrow with cDNA
containing the required gene coupled to a retrovirus-derived vector, and then re-injected the cells
into the boys’ bone marrow. The therapy restored normal immune function that was still intact 10
months later. Another interesting development was the restoration of normal function to the muscle 
of a dystrophic mouse after injection of fragments of the giant muscular dystrophy gene. On the other 
hand, it has so far been impossible to cure some of the most common genetic diseases: thalassemia, 
sickle cell anemia, or cystic fibrosis, because it has proved extremely difficult to insert the genes in
the correct place in the patients’ chromosomes and get them to express the required protein in
sufficient quantities in the right tissues. If gene therapy has a bright future, it does not look as though 
it is just round the corner. 

Before the completion of the Human Genome Project, identification of the genes for some human 
inherited diseases required truly heroic efforts. The search for the Huntington’s disease gene occupied 
up to a hundred people for about 10 years. The same work could now be accomplished by a few people
in a fraction of the time. This is one of the Human Genome Project’s important medical benefits. 
Others may be the rapid identification of promising new drug targets against diseases ranging from 
high blood pressure to a variety of cancers, the epidemiology of alleles linked to susceptibility to 
various diseases and improved basic understanding of human physiology and pathology. 

Agriculture offers the greatest scope for applied genetics, but distrust of genetically modified foods 
has blinded the public to its potential benefits and its vital importance for the avoidance of widespread 
famines later in this century. Since the early 1960s, food available per head in the developing world has 
increased by 20% despite a doubling of the population. This outstanding success has been achieved by 
the introduction of crops improved by crossing and by intensive application of fertilizers, pesticides, 
and weed-killers. Even so, there are 800 million hungry people and 185 million seriously
malnourished pre-school children in the developing world. It is unlikely that the methods that have raised
cereal yields hitherto will allow them to be raised again sufficiently to reduce these distressing 
numbers. Since most fertile land is already intensively cultivated, scientists are trying to introduce 
genes into crops that would allow them to be grown on poorer soils and in harsher climates, and to 
make existing crops more nutritious. In the tropics, fungi, bacteria, and viruses still cause huge harvest 
losses. Scientists are trying to introduce genes that will confer resistance to some of these pests, 
enabling farmers to use fewer pesticides. Genetically modified plants offer our best hope of feeding a 
world population that is expected to double in the next 50 years. It would be tragic if the present outcry
over genetically modified foods were to discourage further research and development in this field. If this
Encyclopedia helps to promote better public understanding of genetics, this might be the best remedy 
against irrational fears. 


Max F. Perutz 

MRC Laboratory of Molecular Biology 
Hills Road, Cambridge 

CB2 2QH 

UK 
Introduction


Genetics, the study of inheritance, is fundamental to all of biology. Living organisms are unique 
among all natural complex systems in that they contain within their genes an internal description 
encoded in the chemical text of DNA. It is this description, and not the organism itself, which is
handed down from generation to generation, and understanding how the genes work to specify the
organism constitutes the science of genetics. Furthermore, this constancy is embedded in a vast range 
of diversity, from bacteria to ourselves, all having arisen by changes in the genes. Understanding 
evolution is also part of genetics, and an area which will benefit from our increasing ability to 
determine the complete DNA sequences of genomes. In some sense, these sequences contain a record 
of genome history and we have now learnt that many genes in our genomes can be found in other 
organisms, quite unlike us. Indeed some are much the same as those found in bacteria, and can be 
viewed as molecular fossils, preserved in our genomes. 

Although “like begets like” must be one of the oldest observations of mankind, it was only in the 
19th century that major scientific advances began. Charles Darwin put forward his theory of the
origin of species by natural selection, but he lacked a credible theory of the mechanism of
inheritance. He believed in blending inheritance, which meant that variation would be continually
removed, and he was therefore compelled to invoke the inheritance of acquired characters to introduce
new variation in each generation. Gregor Mendel, working at the same time, discovered the laws of
inheritance and showed how the characteristics of the organism could be accounted for by factors
which specified them. Mendel’s work was rediscovered independently in 1900 by Correns, de Vries and
Tschermak. Soon after this, Bateson named the science “genetics,” and Johannsen coined the term
“gene” for the Mendelian factors.

During the first 50 years of the twentieth century, there was a stream of important discoveries in 
genetics. We came to understand the relation between genes and chromosomes, and the connection 
between recombination maps and the physical structure of chromosomes. However, what the genes 
were made of and what they did remained a mystery until 1953 when Watson and Crick proposed the 
double helical structure of DNA, which at one blow unified genetics and biochemistry and ushered in 
the modern era of molecular biology. 

Genetics and especially the molecular approach to it is now a pervasive field covering all of biology. 
In the Encyclopedia of Genetics, we have tried to draw together the many strands of what is still a 
rapidly expanding field, to present a view of all of genetics. This has been a five-year effort by over 700
expert authors from all around the world. We have tried to ensure that the breadth of the work has not 
compromised the depth of the articles, and we hope that readers will be able to find accurate and 
up-to-date information on all major topics of genetics. We have also included articles on the history of 
the field as well as the impact of the applications of genetics to medicine and agriculture. 

When we began this work, the sequencing of complete genomes was still in its infancy and the 
sequencing of the human genome was thought to be far in the future. Technological advances and 
the concentration of resources have brought this to fruition this year, and genetics is a subject very 
much in the public eye. We hope that at least some of the articles will also be of value to those who are 
not professional biological or medical scientists, but want to discover more about this field. Many of 
the articles contain lists for further reading, and the online version of the Encyclopedia also includes 
hypertext links to original articles, abstracts, source items, databases and useful websites, so that 
readers can seamlessly search other appropriate literature. 

We would like to acknowledge the efforts of the Associate Editors, who worked hard in 
commissioning individual contributors to prepare cutting-edge articles, and who reviewed and edited 
the manuscripts in a timely manner. Our thanks also go to the Publishers, Academic Press, and the 
outstanding staff for their commitment, resourcefulness and creative input; and in particular Tessa 
Picknett, Kate Handyside, Peronel Craddock and the production team, who helped to make this a 
reality. 


Sydney Brenner, Jeffrey H. Miller 
Editors-in-Chief 
Abbreviations


A adenine

aa amino acid 

aa-tRNA amino acyl-tRNA 

ABC ATP binding cassette 

AD Alzheimer’s disease 

ADA adenosine deaminase 

ADC adenocarcinoma 

ADH alcohol dehydrogenase 

ADP adenosine 5’-diphosphate 

AFA acromegaloid facial appearance 
(syndrome) 

AFB aflatoxin B 

AFP alpha-fetoprotein 

AHA acute hemolytic anemia 

AIDS acquired immunodeficiency 
syndrome 

AIL advanced intercross line 

Ala alanine 

ALL acute lymphoblastic leukemia 

AMCA aminomethyl coumarin acetic acid 

AMH anti-Müllerian hormone

AML acute myeloid leukemia 

Amp ampicillin 

AMP adenosine 5’-monophosphate 

An polyadenylation

AP apyrimidinic (site) 

APC adenomatous polyposis coli 

APO- apolipoprotein- 

APOBEC apo-B mRNA editing cytidine 
deaminase 

APP amyloid precursor protein 

ARF ADP-ribosylation factor 

Arg arginine 

ARS autonomous replication sequence 

Asn asparagine 

ASO allele specific oligonucleotide 

Asp aspartic acid 

AT ataxia telangiectasia 

ATP adenosine 5'-triphosphate 

BAC bacterial artificial chromosome 

BER base excision repair 

BIC Breast Cancer Information Core

BIME bacterial interspersed mosaic element 

BMD Becker muscular dystrophy 

bp base pair 

BR Balbiani rings 

BS Bloom’s syndrome 

BSE bovine spongiform encephalopathy 

BWS Beckwith-Wiedemann syndrome 

C cytosine 

C- carboxyl- 

CAF chromatin assembly factor 

cAMP cyclic AMP 

CAP catabolite activator protein 

CD campomelic dysplasia

CD circular dichroism
cell division cycle
cyclin-dependent kinase 
complementary DNA 

coding sequence 

centromere 

cystic fibrosis 

cystic fibrosis transmembrane 
conductance regulator 
comparative genome hybridisation 
Chinese hamster ovary (cells) 
Creutzfeldt-Jakob disease

cutis laxa 

centimorgan 

congenital muscular dystrophy 
chronic myeloid leukemia 
cytoplasmic male sterility 
coenzyme A 

colicin 

coat protein 

cytoplasmic polyadenylation element 
centiray 

conserved region 

colorectal adenocarcinoma 
cyclization recombination 

cAMP response element binding factor 
cAMP receptor protein 

cholera toxin 

chloroplast DNA 
cotransformation frequency 
cytidine triphosphate 

coefficient of variation 

chorionic villus sampling 

cysteine 

dalton 

dentinogenesis imperfecta 
differential inference 

differentially methylated domain 
Duchenne muscular dystrophy 
Dulbecco’s Modified Eagle’s Medium 
deviation from Mendelian inheritance 
dimethylsulfoxide 
deoxynucleotide 

deoxyribonucleic acid 
deoxyribonucleoprotein 
deoxyribonucleotide triphosphate 
double stranded 

double strand break repair 
endosperm balanced number 
Epstein-Barr virus 

extracellular matrix 
Ehlers-Danlos syndrome
elongation factor 

epidermal growth factor 

enzyme linked immunosorbent assay
European Molecular Biology 
Laboratories 
EN early nodule

EN endonuclease

ENU N-ethyl-N-nitrosourea

EP early promoter

ER endoplasmic reticulum

eRF eukaryal release factor

ES embryonic stem (cells)

ESI electrospray ionization

ESS evolutionarily stable strategy

EST expressed sequence tag

F-factor fertility factor

FA fluctuating asymmetry

FACS fluorescence-activated cell sorter

FAD flavin-adenine dinucleotide

FADH2 reduced FAD

FAK focal adhesion kinase

FAP familial adenomatous polyposis

FDS first-division segregation

FFI fatal familial insomnia

FGF fibroblast growth factor

FH familial hypercholesterolemia

FIGE field inversion gel electrophoresis

FISH fluorescent in situ hybridization

FITC fluorescein isothiocyanate

FRDA Friedreich’s ataxia

FRET fluorescence resonance energy transfer

FSH follicle stimulating hormone

FSHMD facioscapulohumeral muscular dystrophy

G guanine

G-6-P glucose-6-phosphate

G-banding Giemsa-banding

G-proteins GTP-binding proteins

GAP GTPase activating protein

GDB Genome Database

GDP guanosine diphosphate

GEF guanine nucleotide exchange factor

GF growth factor

GFP green fluorescent protein

GIST gastrointestinal stromal tumors

Glu glutamic acid

Gly glycine

gp glycoprotein

GPCR G-protein-coupled receptor

gRNA guide RNA

GSD Gerstmann-Sträussler disease

GSS Gerstmann-Sträussler-Scheinker syndrome

GTP guanosine triphosphate

HA hemagglutinin

HAT histone acetyl transferase

Hb hemoglobin

hCG human chorionic gonadotrophin

HCL hairy cell leukemia

HDAC histone deacetylase

HDGS homology-dependent gene silencing

HDL high-density lipoprotein

Hfr high frequency recombination

HGMD Human Gene Mutation Database


HGMP Human Genome Mapping Project

His histidine

HIV human immunodeficiency virus

HLA human leukocyte antigen

HLH helix-loop-helix

HMC 5-hydroxymethylcytosine

HMG high mobility group

HMW high molecular weight

HNPCC hereditary nonpolyposis colorectal cancer

hnRNP heterogeneous nuclear RNP

HPLC high-performance liquid chromatography

HPV human papillomavirus

hsp heat shock protein

HTH helix-turn-helix

HTLV human T-cell leukemia virus

HV-I, HV-II hypervariable regions I, II

HW equilibrium Hardy-Weinberg equilibrium

IBD identical by descent

IBS identical by state

ICM inner cell mass

ICSI intracytoplasmic sperm injection

IES internal eliminated sequences

IF initiation factor

Ig immunoglobulin

IGF insulin-like growth factor

IHF integration host factor

IL interleukin

ILAR Institute for Laboratory Animal Research

Ile isoleucine

IMAC immobilized metal ion affinity chromatography

IN integrase (protein)

INR initiator region

IPTG isopropylthiogalactoside

IS insertion sequence

ISH in situ hybridization

ISR induced systemic resistance

IVF in vitro fertilization

IVS intervening sequence

kb kilobase

KL kit ligand

KO knockout

KSS Kearns-Sayre syndrome

Lac lactose

LBC lampbrush chromosome

LCR locus control region

LD linkage disequilibrium

LDL low-density lipoprotein

Leu leucine

LH luteinizing hormone

LHSI/II light harvesting system I/II

LINE long interspersed nuclear element

LMC local mate competition

LMW low molecular weight

LOD logarithm of the odds (score)

LOH loss of heterozygosity


Lox locus of X-over

LPS lipopolysaccharide

LRC local resource competition

LRE local resource enhancement

LRR leucine-rich repeat

LTR long terminal repeat

Lys lysine

m-BCR minor breakpoint cluster region

M-BCR major breakpoint cluster region

M-phase meiosis/mitosis phase

MAP microtubule associated protein

MAP mitogen-activated protein

MAPK mitogen-activated protein kinase

MAR matrix-attached region

Mb megabase

MBP myelin basic protein

MC’F micro-complement fixation

MCR mutation cluster region

MCS multiple cloning site

MDS myelodysplastic syndrome

Mel maternal-effect embryonic lethal

MELAS mitochondrial encephalomyopathy, lactic acidosis, and stroke-like episodes

MEN multiple endocrine neoplasia

MERRF myoclonus epilepsy with ragged red fibers

Met methionine

MFH malignant fibrous histiocytoma

MFS Marfan syndrome

MGD Mouse Genome Database

MGF mast cell growth factor

MGI Mouse Genome Informatics

MHC major histocompatibility complex

MIC minimum inhibitory concentration

MIS Müllerian-inhibiting substance

MLP major late promoter

MLS myxoid liposarcoma

mM millimolar

MMC maternally inherited myopathy and cardiomyopathy

MMR mismatch repair

MOI multiplicity of infection

MPS mucopolysaccharidosis

MRCA most recent common ancestor

MRD minimal residual disease

mRNA messenger RNA

MS mass spectrometry

MSI microsatellite instability

mtDNA mitochondrial DNA

Mu element mutator element

amino- 

nicotinamide adenine dinucleotide 
reduced NAD 

nicotinamide adenine dinucleotide 
phosphate 

nucleosome assembly protein 
nonreplicating Bacteroides units
noncoding RNA 


NCS nonchromosomal striped

NER nucleotide excision repair

NHL non-Hodgkin lymphoma

NK natural killer (cells)

NMD nonsense-mediated mRNA decay

NMR nuclear magnetic resonance

NOR nucleolus organizing region

NOS nitric oxide synthase

NPC nasopharyngeal carcinoma

NPC nuclear pore complex

NR nuclear reorganization

nt nucleotide

OI osteogenesis imperfecta

OMIM Online Mendelian Inheritance in Man

OPMD oculopharyngeal muscular dystrophy

ORC origin recognition complex

ORF open reading frame

Ori origin (of replication)

Ori T origin of transfer

OTU operational taxonomic unit

PAA propionic acidemia

PAC P1 artificial chromosome (vector)

PAC prostate adenocarcinoma

PAGE polyacrylamide gel electrophoresis

PAI pathogenicity islands

PAPP pregnancy-associated plasma protein

PAR pseudoautosomal region

Pax paired box-containing genes

PBP penicillin binding protein

Pc polycomb

PCO polycystic ovarian (disease)

PCR polymerase chain reaction

PCT plasmacytoma

PDGF platelet derived growth factor

PE phosphatidylethanolamine

PEP phosphoenolpyruvate

PEV position effect variegation

PFGE pulsed-field gel electrophoresis

PGPR plant growth-promoting rhizobacteria

PH pleckstrin homology

PH polyhedrin

Phe phenylalanine

Pi inorganic phosphate

PI phosphatidylinositol

PIP2 phosphatidylinositol-4,5-bisphosphate

PKA protein kinase A

PKU phenylketonuria

PMF proton motive force

PMS postmeiotic segregation

Pol polymerase

PR protease

Pro proline

PS phosphatidylserine

PS I/II photosystem I/II

PTC premature termination codon

PTGS posttranscriptional gene silencing

Q-(banding) quinacrine
QTL quantitative trait loci

R resistance (e.g., AmpR, ampicillin resistance)

R-plasmids resistance plasmids

RA retinoic acid

Ram ribosomal ambiguity

RB retinoblastoma

RCC renal cancer cell

RCL round cell liposarcoma

RCS recombinant congenic strains

rDNA recombinant DNA

RDR recombination-dependent replication

RED repeat expansion detection

RER rough endoplasmic reticulum

REV reticuloendotheliosis virus

RF release factor

RF replicative form

RFLP restriction fragment length polymorphism

RI recombinant inbred

Rif rifampicin

RIM reproductive isolating mechanism

RIP repeat induced point (mutation)

RM restriction modification

RN recombination nodule

RNA ribonucleic acid

RNAi RNA interference

RNP ribonucleoprotein

ROS reactive oxygen species

RRF ribosome recycling factor

rRNA ribosomal RNA

RSS recombination signal sequence

RSV Rous sarcoma virus

RT reverse transcriptase

RTK receptor tyrosine kinase

S-phase synthesis phase

SAR scaffold-attached region

SAR systemic acquired resistance

SBT shifting balance theory

SC synaptonemal complex

SCE sister chromatid exchange

SCF stem cell factor

scRNA small cytoplasmic RNA

SDP strain distribution pattern

SDS second-division segregation

SDS-PAGE sodium dodecyl sulfate-polyacrylamide gel electrophoresis

SEM scanning electron microscopy

sen DNA senescent DNA

Ser serine

SER smooth endoplasmic reticulum

SF steroidogenic factor

SH (domains) Src homology (domains)

SI self-incompatibility

SINE short interspersed nuclear element

SIV simian immunodeficiency virus

SL spliced leader

SLF steel factor

snoRNA small nucleolar RNA


single nucleotide polymorphism 
small nuclear RNA 

small nuclear ribonucleoprotein 
superoxide dismutase 

sterol response element 

signal recognition particle 
sex-determining region Y 

single stranded 

single strand binding (protein) 
simple sequence length polymorphism 
simple tandem sequence repeats 
short tandem repeats 

sequence tagged sites 

subunit 

surface (viral) 

simian virus 40 

thymine 

Thermus aquaticus

TATA-box binding protein 
tricarboxylic acid 

T-cell receptor 

testis-determining factor 
transposable elements 

telomere 

transmission electron microscopy 
terminator 

tetracycline

transcription factor 

transforming growth factor 
transcriptional gene silencing 
threonine 

translocase of the inner membrane 
tissue inhibitor of metalloproteinases 
terminal inverted repeats 
thymidine kinase 

melting temperature 
transmembrane 

tobacco mosaic virus 

transposon 

tumor necrosis factor 

translocase of the outer membrane 
topoisomerase 

optimum temperature 
transmission ratio distortion 
T-complex polypeptide ring complex 
tetramethyl rhodamine isothiocyanate 
transfer RNA 

tryptophan 

Tay-Sachs disease 

transmissible spongiform 
encephalopathy 

tumor suppressor gene
transcription start site 

tyrosine 

uracil 

ubiquitin 

uniparental disomy 

unidentified reading frame 

uptake signal sequence 


UTI upper respiratory tract infection

UTP uridine triphosphate

UTR untranslated region

UV ultraviolet

V gene variable gene

Val valine

VEGF vascular endothelial growth factor

VHL von Hippel-Lindau disease

VNTR variable number of tandem repeats

VWF von Willebrand factor


WHO World Health Organization

WS Werner syndrome

WT wild-type

WT1 Wilms tumor 1

XIC X-inactivation center

Xist X-inactive specific transcript

XP xeroderma pigmentosum

YAC yeast artificial chromosome

ZP zona pellucida




(A)n tail 


See: Poly(A) Tail 


A-DNA


See: DNA, History of; DNA Structure 
Abortive Transduction


‘Abortive transduction’ refers to the introduction of transcriptionally competent but nonreplicating segments of foreign genetic material into a bacterial cell by a transducing phage (bacterial virus). A transducing phage is one capable of packaging DNA that is not its own into phage capsids, usually at low frequency. Once injected into a recipient cell, the transduced DNA fragment has three possible fates: it can be degraded; it can recombine with the recipient chromosome or plasmid, resulting in a stable change in the bacterial genotype (complete transduction); or it can establish itself as a nonreplicating genetic element that is segregated to only one of the two daughter cells at each division (abortive transduction). Establishment of an abortive transducing fragment may involve protein-mediated circularization of the entering linear fragment.
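Because a nonreplicating abortive fragment passes to only one of the two daughter cells at each division, the number of cells carrying it stays fixed while the population grows. The short Python simulation below illustrates this consequence. It is an illustrative model only, not a published method. It assumes that all cells divide synchronously and that the fragment is never degraded, and the function name `simulate_abortive_lineage` is hypothetical.

```python
"""Sketch: unilinear inheritance of a nonreplicating (abortive) transducing fragment.

Illustrative assumptions: every cell divides once per generation, the fragment
is never degraded, and its single copy passes to exactly one daughter at random.
"""
import random


def simulate_abortive_lineage(generations: int, seed: int = 0) -> list[tuple[int, int]]:
    """Return (total_cells, cells_carrying_fragment) after each generation."""
    rng = random.Random(seed)
    # Each cell is a boolean: True if it carries the abortive fragment.
    cells = [True]  # start from a single abortively transduced cell
    history = []
    for _ in range(generations):
        next_cells = []
        for has_fragment in cells:
            if has_fragment:
                # The fragment does not replicate, so only one daughter inherits it.
                first_daughter_inherits = rng.random() < 0.5
                next_cells.extend([first_daughter_inherits, not first_daughter_inherits])
            else:
                next_cells.extend([False, False])
        cells = next_cells
        history.append((len(cells), sum(cells)))
    return history


if __name__ == "__main__":
    for gen, (total, carriers) in enumerate(simulate_abortive_lineage(6), start=1):
        print(f"generation {gen}: {total} cells, {carriers} carrying the fragment "
              f"({carriers / total:.1%})")
```

Under these assumptions, only one cell carries the fragment in every generation while the total doubles. After six divisions, 1 of 64 cells carries it. This fits the observation that abortive transductants form only very small colonies on media lacking the required nutrient.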

Abortive transduction was first described in the 1950s by (among others) B.A.D. Stocker, J. Lederberg, and H. Ozeki. Particularly informative were Stocker’s transductional analyses of motility mutants of Salmonella typhimurium using phage P22. Motile cells embedded in semisolid agar can swim away from a growing colony and multiply further, forming a large circular swarm of cells, but a nonmotile mutant strain (e.g., one lacking flagella) multiplies in place, forming a small circular colony. A suitable abortively transduced wild-type DNA fragment can complement the motility mutation, allowing the formerly nonmotile cell to swim. However, because the fragment passes to only one daughter cell at each division, nonmotile daughter cells are generated during the swim and remain in place, where they multiply further. This results in a compact colony (descendants of the first daughter cell) with a trail of cells emanating from it (later descendants of the abortively transduced swimming cell). Nutritional markers (for example, mutations abolishing the ability to synthesize an amino acid) can also be abortively transduced, resulting in very small colonies on minimal media lacking the required nutrient. Such markers have been used to study the process of abortive transduction, using P1 in Escherichia coli and P22 in S. typhimurium. Abortive transduction is in fact more frequent than complete transduction: as many as 90% of all transducing fragments introduced into cells become established as abortive transductants, while only about 2% form complete transductants.

The physical nature of abortive transduction has been studied by Sandri and Berger, by Schmieger, and by others. One method is to infect unlabeled cells with phage grown on bacteria whose DNA is labeled. When the label is a heavy, non-radioactive isotope such as ¹⁵N, the fate of the labeled DNA can be followed by separation according to density. Only about 10–15% of the label in the fragments becomes physically associated with the unlabeled chromosome (either by recombination or by nucleotide recycling). The remaining label is not degraded and can be quantitatively recovered for at least 5 h after introduction. This persistent state is consistent with the genetic observation that the DNA can complement defective chromosomal genes for many generations. Complete transduction, by contrast, occurs within the first hour of introduction.

Physical protection of the abortive fragments from host nucleases appears to result from protein association with the DNA. When reisolated from recipient cells, abortive transducing DNA labeled with heavy isotopes sedimented faster than expected, consistent with a supercoiled circular form; protease treatment restored normal sedimentation velocity. In the P22 system, a particular phage protein has been implicated in the protection process: P22 gene 16 mutants yield fewer abortive transductants but normal numbers of complete transductants. The protein is thought to be packaged with the DNA in the capsid and injected with the transducing fragment.




The biological impact of this process is hard to assess, and its frequency in nature is unknown. In principle, it could allow a cell to survive a stressful condition long enough to acquire a new mutational adaptation or to find a new environment, without leaving a permanent genetic record of the event.


See also: Transduction 


Acentric Fragment 
An acentric fragment is a piece of a chromosome, produced by breakage, that lacks a centromere. Because it cannot attach to the spindle, it is lost at cell division.


See also: Centromere; Chromosome 


Achondroplasia 


R Savarirayan, V Cormier-Daire, and 
D L Rimoin 
Achondroplasia is the most common form of disproportionate short stature (dwarfism) with an estimated
incidence of 1 per 20 000-30 000 live births. This type 
of dwarfism has been recognized for more than 4000 
years, and can be seen depicted in many ancient statues 
and drawings. Achondroplasia is inherited as an 
autosomal dominant trait with approximately 75% 
of cases representing new dominant mutations. The 
molecular defects underlying achondroplasia have 
recently been elucidated, and comprise heterozygous 
mutations in the fibroblast growth factor receptor 3 
(FGFR3) gene located on the short arm of chromo- 
some 4. This gene encodes a tyrosine kinase cell 
surface receptor, and one specific gain-of-function 
mutation (G1138A), resulting in a glycine to arginine 
substitution in the transmembrane domain of FGFR3, 
is responsible for the vast majority (approximately 
98%) of cases, and is the most common known muta- 
tion in humans. 

Diagnosis of achondroplasia is usually made at or 
around birth, based on the typical appearance of these 
infants, which includes disproportionate short stature with short limbs, especially the most proximal (rhizomelic) segments, redundant folds of skin overlying the shortened limbs, short and broad hands and feet with a
“trident” configuration of the digits, a shortened 
thorax with relatively long abdomen, limitation of 
elbow extension, and a characteristic facial appearance 
with a disproportionately large head, prominent fore- 
head, depressed nasal bridge, flat midface, and a short, 
upturned nose. The clinical diagnosis is confirmed by 
the specific radiographic features of the condition, 
which include a large skull with relatively small cranial 
base, narrow foramen magnum, short, flat vertebral 
bodies, lack of normal increase in interpediculate dis- 
tance from upper lumbar vertebrae caudally, short 
pedicles with narrow vertebral canal, square-shaped 
iliac wings, short, narrow sciatic notches, flat acetabu- 
lar roof, short limbs with short thick tubular bones, 
broad and short metacarpals and phalanges, fibular 
overgrowth, and short ribs. The diagnosis of achon- 
droplasia can now be made before birth by molecular 
testing for the specific FGFR3 mutation in families 
with a prior history of the condition. Like many 
other skeletal dysplasias, the diagnosis of achondro- 
plasia can be suspected by the use of prenatal ultra- 
sonography, although it cannot be made until 
relatively late in pregnancy because shortening of the 
long bones becomes manifest only after 24 weeks of 
gestation. Hypochondroplasia and thanatophoric 
dysplasia are related conditions, also due to mutations 
in the FGFR3 gene; however achondroplasia can be 
readily distinguished from these, as the changes in 
hypochondroplasia are milder and those in thanato- 
phoric dysplasia much more severe and almost invari- 
ably lethal. 

The majority of individuals with achondroplasia 
are of normal intelligence, have a normal lifespan, 
and lead independent and productive lives. These indi- 
viduals, however, face many potential medical, psycho- 
social, and architectural challenges secondary to their 
abnormal skeletal development and subsequent disproportionate short stature. The mean final adult height in achondroplasia is 130 cm for men and 125 cm for women, and specific growth charts have
been developed to document and track linear growth, 
head circumference, and weight in these individuals. 
Human growth hormone and other drug therapies 
have not been effective in significantly increasing 
final adult stature in achondroplasia. Recently, surgi- 
cal limb lengthening procedures have been employed 
successfully to increase leg length by up to 30 cm. 

There are many potential medical problems that a 
person with achondroplasia may experience during 
his or her life. In early infancy the most potentially 
serious of these is compression of the cervicomedul- 
lary spinal cord secondary to a narrow foramen mag- 
num, cervical spinal canal, or both. This complication 
may be manifest clinically by symptoms and signs of
high cervical myelopathy, central apnea, or profound
hypotonia and motor delay and may, in some instan- 
ces, require decompressive neurosurgery. Other poten- 
tial complications in infancy include significant nasal 
obstruction that may lead to sleep apnea in a minority 
(5%) of cases, development of a thoracolumbar kyph- 
osis, which usually resolves upon weight-bearing, 
and hydrocephalus in a small proportion of cases 
(1%) during the first 2 years of life, which may require 
shunting. From early childhood, and as the child 
begins to walk, several orthopedic manifestations 
may evolve including progressive bowing of the legs 
due to fibular overgrowth, development of lumbar 
lordosis, and hip flexion contractures. Recurrent ear 
infections with ensuing chronic serous otitis media are 
common complications at this time and may lead to 
conductive hearing loss with consequent delayed 
speech and language development. The older child 
with achondroplasia commonly develops dental mal- 
occlusion secondary to a disproportionate cranial base 
with subsequent crowding of teeth and crossbite. 
The main potential medical complication of the adult 
with achondroplasia is lumbar spinal canal stenosis, 
with impingement on the spinal cord roots. This 
complication may be manifested by lower limb pain 
and paresthesiae, bladder or bowel dysfunction, and
neurological signs and may require decompressive 
surgery. 

Throughout their lives, some people with achon- 
droplasia may experience a variety of psychosocial 
challenges. These can be addressed by specialized 
medical and social support of the individual and 
family, appropriate anticipatory guidance and by 
interaction with patient support groups such as the 
“Little People of America.” 


See also: Genetic Diseases 


Acquired Resistance 
See: Systemic Acquired Resistance (SAR) 


Acridines 


J H Miller 
A group of polycyclic heteroaromatic compounds, often used as dyes, that intercalate into DNA, often resulting in the insertion or deletion of base pairs and thereby generating frameshift mutations.


Acrocentric Chromosome 


Copyright © 2001 Academic Press 
doi: 10.1006/rwgn.2001.1751 


An acrocentric chromosome possesses a centromere located very close to one end, so that one chromosome arm is much shorter than the other.


See also: Centromere; Chromosome 


Acrosome 


G S Kopf 
The acrosome is a vesicle overlying the nucleus of both invertebrate and vertebrate sperm. It is composed of nonenzymatic and enzymatic proteins, generally arranged as a matrix, some of which have been shown to play specific roles in the fertilization process. The contents of the acrosome are released prior to sperm–egg fusion in a regulated secretory event called the acrosome reaction. The morphology of the acrosome varies between species, and the mechanics of the acrosome reaction vary widely between invertebrates and vertebrates. This chapter will focus specifically on the acrosome of mammalian sperm.

The acrosome is a product of the Golgi complex, 
and is synthesized and assembled during spermio- 
genesis. The contents of the acrosome include struc- 
tural and nonstructural, nonenzymatic and enzymatic 
components, and this secretory vesicle is delimited by 
both inner and outer acrosomal membranes. These 
components appear to play important roles in the 
establishment and maintenance of the acrosomal 
matrix, in the dispersion of the acrosomal matrix, in 
the penetration of the egg’s zona pellucida, and pos- 
sibly in the interaction between the sperm and the egg 
plasma membranes. This vesicle is finally confined 
within the plasma membrane overlying the entire 
sperm surface. There remain several questions pertain- 
ing to the formation and maturation of this organelle. 

For example, although prominent biogenesis of the 
acrosome occurs during the Golgi and cap phases of 
spermiogenesis, it is not clear at what point in this developmental process the organelle actually starts to form. Furthermore, the acrosome is composed of multiple component proteins, but little is known regarding whether all of these components are synthesized at the same time or whether their synthesis is ordered and coordinated. Experimental evidence to date suggests the latter mechanism.


The mechanisms by which these acrosomal compo- 
nents are targeted to this organelle during biogenesis 
are also not known. Although spermatogenic cells 
possess functional mannose-6-phosphate/insulin-like 
growth factor II receptors, it is not clear whether these 
receptors play a role in the transport of glycoproteins 
to the acrosome or whether targeting occurs primarily 
through the ‘default’ pathway seen in the transport of 
proteins in other secretory systems. Finally, once these 
components are packaged into the acrosome, the functional significance of additional processing of these components (e.g., posttranslational modifications or movement within the organelle) during sperm residence in the testis and/or in the extratesticular male reproductive organs (e.g., epididymis and vas deferens) is not clear.

In some species (e.g., guinea pig, mouse), the forma- 
tion of specific protein domains within the acrosome 
has been clearly demonstrated, but the mechanism by 
which this compartmentalization is established is 
poorly understood and an understanding of the biolo- 
gical role of this compartmentalization is only starting 
to be realized. Answers to all of these questions will no 
doubt become apparent when a systematic evaluation 
of the proteins comprising the acrosome is undertaken 
with respect to transcription, translation, and posttranslational modifications. An understanding of these
processes may greatly further our knowledge of the 
role of the acrosome in fertilization since it is becoming 
apparent that this secretory vesicle may have multiple 
functions (see below). It should also be noted that 
individuals whose sperm have poorly formed acro- 
somes or lack acrosomes altogether display infertility; 
this speaks to the importance of this organelle in 
the normal fertilization process. In any event, studies 
focused on the synthesis and processing of acrosomal 
components should be considered in the context of the 
acrosome functioning as a secretory granule and not a
modified lysosome, as has been historically suggested. 

Although the fusion of the plasma membrane over- 
lying the acrosome and the outer acrosomal mem- 
brane constitutes the acrosome reaction, it must be 
emphasized that this process is very complex and 
likely involves many of the steps constituting regu- 
lated exocytotic processes in other cell types. Such 
steps might include membrane priming, docking, and 
fusion. Therefore, this process can also be referred to 
as acrosomal exocytosis. Recent data support the idea 
that sperm capacitation, an extratesticular matura- 
tional process that normally occurs in the female 
reproductive tract and confers fertilization compe- 
tence to the sperm, may comprise signal transduction 
events that ready the plasma and outer acrosomal 
membranes for subsequent fusion during the process 
of acrosomal exocytosis. Acrosomal exocytosis is
regulated by ligand-induced signal transduction
events in which the physiologically relevant ligand is 
the zona pellucida, an oocyte-specific extracellular 
matrix. Specific components of the zona pellucida are 
responsible for species-specific binding of the sperm 
and subsequent acrosomal exocytosis. These events 
are likely mediated by sperm membrane-associated 
zona pellucida binding proteins and/or receptors; the 
identity and mode of action of such proteins are still
quite controversial. Resultant exocytosis involves the 
point fusion and vesiculation of the plasma membrane 
overlying the acrosome with the outer acrosomal 
membrane, thus creating hybrid membrane vesicles. 
The molecular mechanisms involved in this fusion 
and vesiculation process are not known. The resul- 
tant fusion of these membranes leads to the subse- 
quent exposure of the acrosomal contents to the 
extracellular environment. Both the exposed soluble 
and insoluble components of the acrosome may play 
important roles in the binding of the acrosome-reacted sperm to the zona pellucida, as well as in the subsequent penetration of the acrosome-reacted sperm through
the zona pellucida. Although this exocytotic event 
can be induced by both physiological stimuli and 
pharmacological agents, the molecular mechanisms 
by which these different stimuli and agents function 
to induce exocytosis may be dramatically different. 


See also: Fertilization 


Active Site 
An active site is the part or region of a protein (typically an enzyme) to which a substrate binds and where catalysis takes place.


See also: Proteins and Protein Structure 


Adaptive Landscapes 
M B Cruzan 
Overview


The genetic determination of fitness is complex, involv- 
ing a large number of loci with numerous interactions. 
In 1932 Sewall Wright depicted this myriad of effects 
as a two-dimensional view of peaks and valleys that 
represented fitness levels of multilocus genotypes
(Figure 1A). In this version of an adaptive landscape (a
gene combination landscape), the horizontal and ver- 
tical axes represent genetic dimensions, and fitness 
(selective value) is indicated by contours (lines repre- 
senting elevation differences as found on a topographic 
map). As envisioned by Wright, a gene combination 
landscape could consist of many thousands of peaks 
of various elevations separated by valleys and saddles. 
Individual genotypes are represented by single points, 
and populations as clouds of points that are typically 
found on or near an adaptive peak. Adaptive evolution 
translates into local hill climbing, and shifts to higher 
peaks can only occur through fitness reductions as 
populations traverse valleys or saddles. The rugged 
genetic topography is due to the prevalence of genetic 
interactions such that many different gene combina- 
tions can produce high-fitness phenotypes. The para- 
digm of an adaptive landscape is a key element of 
Wright’s Shifting Balance Theory of evolution, where- 
by species undergo shifts among fitness peaks. 


Adaptive Landscapes as Described by 
Wright 


Early in his career, Wright’s work with animal breeding programs led him to the conclusion that interactions among loci (epistasis) were common, that individual characters could be influenced by a number of genetic factors (polygeny), and that single genes could affect several characters (pleiotropy). He considered evolution to be a process of selection on networks of interdependent genetic factors rather than on single loci with independent effects, the latter being the view emphasized by R.A. Fisher. With thousands of loci, the
assumption of strong genetic interactions naturally 
leads to the conclusion that there must be multiple 
fitness optima, each of which represents a unique 
genetic combination. Hence, epistasis produces a 
rugged adaptive landscape with multiple peaks and 
valleys, as opposed to a single fitness optimum, which 
would be expected if all combinations of loci acted in 
a purely additive fashion. While a two-dimensional 
projection is inadequate to represent such a complex 
multidimensional genotypic space, Wright’s view of an 
adaptive landscape has served as an important heuris- 
tic tool for understanding evolutionary processes. 


Evolution on Rugged Adaptive 
Landscapes 


Wright used depictions of adaptive landscapes to 
demonstrate several features of evolution. He pointed 
out that very large populations would be more likely 
to be found near the top of an adaptive peak because 
the influence of selection would be much greater than 
the effects of genetic drift (random variations in popu- 
lation allele frequencies among generations). Very 
small populations, on the other hand, would tend to
undergo severe inbreeding and by chance could be- 
come fixed for nonoptimal genetic combinations, in 
which case they would be depicted as single points that 
had drifted to lower positions on the fitness surface. 
Wright proposed that populations that were small 
enough to allow some drift but large enough to avoid 
severe inbreeding would occasionally shift far enough 
from the local optimum to come under the influence 
of a different adaptive peak. In this way species could 
explore the fitness surface by continually making tran- 
sitions to ever higher peaks. Wright argued that this 
process would be facilitated if a species were divided 
into a large number of small populations connected by 
low levels of gene flow, a concept which came to be 
known as the Shifting Balance Theory. The rugged topography of the landscape is a consequence of epistasis as
well as genotype-by-environment interaction. Hence, 
with changes in the environment, previously fit 
genetic combinations may be rendered maladaptive, 
and in fluctuating environments, populations will 
constantly be subjected to selection of variable inten- 
sity and direction. 


Gene Frequency Landscapes 


The gene combination adaptive landscape described 
above has been subject to criticism because the axes 
are difficult to define in a concise manner. As a con- 
sequence, most evolutionary biologists have regarded 
this model of the fitness landscape as a metaphor with 
heuristic rather than analytical value. In his later years, 
Wright changed his depiction of an adaptive land- 
scape to represent a fitness surface for combinations 
of two different loci (Figure 1B). In this version of the
adaptive landscape, each axis is defined as the fre- 
quency of a single allele, and points on the surface 
represent the mean fitness of a population with a 
unique combination of gene frequencies. In effect, 
there are innumerable gene frequency landscapes in 
the original gene combination landscape, each of 
which represents a single pair of genes. Gene frequency surfaces have the advantage of being amenable
to analytical methods and have been used to provide 
insights into conditions that promote peak shifts. 
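Wright's picture of populations climbing the nearest peak of a gene frequency surface can be illustrated with a small deterministic sketch. It assumes a hypothetical two-locus fitness scheme (not taken from the text) that produces two peaks separated by a saddle, Hardy–Weinberg and linkage equilibrium, and Wright's approximation Δp = p(1 − p)/(2W̄) · ∂W̄/∂p applied to each locus separately. The function names and parameter values are illustrative only.

```python
def mean_fitness(p1, p2, a=0.10, b=0.05):
    # Hypothetical fitness scheme: 1 + a for genotypes homozygous for allele 1
    # at both loci, 1 + b for genotypes homozygous for allele 2 at both loci,
    # and 1 for every other genotype. Under Hardy-Weinberg and linkage
    # equilibrium, the population mean fitness is therefore:
    return 1.0 + a * p1**2 * p2**2 + b * (1 - p1) ** 2 * (1 - p2) ** 2


def gradient(p1, p2, a=0.10, b=0.05):
    # Partial derivatives of mean_fitness with respect to p1 and p2.
    d1 = 2 * a * p1 * p2**2 - 2 * b * (1 - p1) * (1 - p2) ** 2
    d2 = 2 * a * p1**2 * p2 - 2 * b * (1 - p1) ** 2 * (1 - p2)
    return d1, d2


def climb(p1, p2, generations=5000, a=0.10, b=0.05):
    """Follow Wright's approximation dp = p(1 - p) / (2W) * dW/dp at each locus."""
    for _ in range(generations):
        w = mean_fitness(p1, p2, a, b)
        d1, d2 = gradient(p1, p2, a, b)
        p1, p2 = (p1 + p1 * (1 - p1) * d1 / (2 * w),
                  p2 + p2 * (1 - p2) * d2 / (2 * w))
    return p1, p2
```

Starting on either side of the saddle, each population climbs to the nearer peak: `climb(0.8, 0.8)` approaches (1, 1), while `climb(0.2, 0.2)` approaches (0, 0), even though the peak at (1, 1) is higher. Deterministic selection alone never carries a population across the saddle, which is the difficulty Wright's Shifting Balance Theory addresses through drift.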


Phenotypic Landscapes 


Fitness surfaces that are based on genotypes often have 
limited utility because there may be few situations 
where the allelic states of fitness-determining loci can 
be determined. Quantitative (phenotypic) traits, on 
the other hand, are generally much more accessible 
for empirical studies, and a rich body of theory 
for the evolution of phenotypic characters has been 
developed. The concept of an adaptive landscape as a
combination of two phenotypic characters was first 
introduced by Karl Pearson in 1903 and elaborated by 
George Gaylord Simpson in 1944. In this case the axes 
represent quantitative trait values and points on the 
fitness surface can represent either individuals or population means (Figure 1C). This version of the adaptive
landscape has been used extensively in models of the 
evolution of quantitative traits (Lande, 1976; Arnold 
and Wade, 1984; Wade and Goodnight, 1998). 


Holey Landscapes 


When Wright developed the fitness surface metaphor, 
his ability to characterize a genotypic space with a 
large number of dimensions was hampered by the
lack of appropriate analytical tools. In recent
years theoretical investigations of landscapes defined
by a large number of loci have led to the realization 
that ridges (referred to as neutral or nearly neutral 
networks) connecting regions of high fitness are a 
natural feature of the multidimensional adaptive land- 
scape (Gavrilets, 1997) (Figure 1D). Hence, through 
mutation, recombination and genetic drift, popula- 
tions can diverge by traversing high fitness networks 
without opposing selection. With extensive diver- 
gence, populations will eventually come to occupy 
opposite sides of regions of low fitness (a hole in the 
fitness landscape), in which case they are reproduc- 
tively isolated because of hybrid inviability or incom- 
patibility of the parental genotypes. Like Wright’s 
original fitness surface, the topography of holey landscapes is dependent on the prevalence of epistasis, but the existence of connecting ridges facilitates evolution and divergence of populations by small steps without the necessity of crossing valleys or saddles.


Figure 1 Adaptive landscapes. In each case increasing elevation on the three-dimensional surface is equivalent to higher fitness. The fitness contours on the floor of each graph represent two-dimensional projections of the adaptive surface. Four different versions of the adaptive landscape are depicted: (A) A portion of a gene combination landscape features a rugged topography and axes that correspond to a multidimensional genotype space. (B) In the gene frequency landscape only two loci are considered. In this case there are two fitness peaks separated by a saddle. (C) An example of a phenotypic landscape displays a ridge of equal fitness produced by different combinations of values for two quantitative traits. (D) Holey adaptive landscapes are characterized by networks of equal fitness perforated by regions of low adaptive value (holes). The actual genotype space consists of a large number of dimensions, so the graphical representation shown here is a rough approximation.
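The way populations can diverge along a neutral network and yet end up separated by a hole can be shown with a minimal sketch. It assumes haploid genotypes at four biallelic loci and a single hypothetical two-locus incompatibility that forms the hole. Breadth-first search then finds the shortest chain of single substitutions that never leaves viable genotypes. The names `viable`, `neutral_path`, and `count_viable` are illustrative and do not come from any published model code.

```python
from collections import deque
from itertools import product


def viable(genotype):
    # Hypothetical incompatibility: derived alleles (1) at loci 0 and 1 are
    # each harmless alone but jointly inviable, creating a "hole".
    return not (genotype[0] == 1 and genotype[1] == 1)


def neighbors(genotype):
    # Every genotype reachable by a single allele substitution.
    for i in range(len(genotype)):
        mutant = list(genotype)
        mutant[i] = 1 - mutant[i]
        yield tuple(mutant)


def count_viable(n_loci):
    # Number of viable genotypes among all 2**n_loci haploid genotypes.
    return sum(viable(g) for g in product((0, 1), repeat=n_loci))


def neutral_path(start, goal):
    """Shortest chain of single substitutions that stays on viable genotypes."""
    if not (viable(start) and viable(goal)):
        return None
    previous = {start: None}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        if current == goal:
            path = []
            while current is not None:
                path.append(current)
                current = previous[current]
            return path[::-1]
        for nxt in neighbors(current):
            if nxt not in previous and viable(nxt):
                previous[nxt] = current
                queue.append(nxt)
    return None
```

Two populations that fixed derived alleles at different loci, (1, 0, 0, 0) and (0, 1, 0, 0), are connected by a viable path through the ancestral genotype (0, 0, 0, 0). However, the recombinant (1, 1, 0, 0) that a hybrid cross would produce falls into the hole, so the populations behave as reproductively isolated.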


Future Prospects 


As both metaphors and analytical constructs, adaptive 
landscapes will continue to be useful tools for under- 
standing evolutionary processes from both theoretical 
and empirical perspectives. The theory associated with 
fitness surfaces has made substantial advances in recent 
years, but empirical evidence supporting the topo- 
graphies proposed in these models is sparse. The recent 
development of multidimensional models of adaptive landscapes, with their more precise predictions
concerning the genetic determination of mating 
barriers between divergent populations and taxa, pro- 
vides new foci for empirical investigations and an 
opportunity to refine our understanding of evolution- 
ary processes. 
Adaptor Hypothesis


S Brenner 
The adaptor hypothesis was first proposed by Francis
Crick, originally in a privately circulated note in 1955, 
and published later in 1958. He suggested that nucleic 
acids, which interact by base-pairing through hydro- 
gen bonds, were unlikely to be able to distinguish 
between the different amino acids, especially those 
that differed by only one methyl group. He therefore 
proposed that genetic messages would not read amino 
acids directly but that each amino acid would be 
linked to an adaptor molecule, probably a small 
nucleic acid, with 20 enzymes to perform the specific 
linkages. Thus, whereas nucleic acids could not easily 
differentiate between the 20 amino acids, a protein 
could, by recognizing both, specifically join the 
amino acid to its adaptor and the adaptor could then 
be recognized by the message by standard base-pairing 
rules. Although very much later it was shown that 
nucleic acids could by themselves recognize a wide 
range of molecular configurations, by adopting three- 
dimensional structures, the hypothesis was enor- 
mously prescient, predicting as it did the existence 
of transfer RNAs (tRNAs) and the tRNA aminoacylases (aminoacyl-tRNA synthetases). At the time, however, its main impact was
the realization that the degeneracy of the code need 
not follow logical rules but could simply be due to 
historical accidents which assigned the triplets to the 
different amino acids. 


See also: Crick, Francis Harry Compton; 
Genetic Code 


Additive Genetic Variance 


W J Ewens 
The variation from individual to individual in most
characters has both a genetic and an environmental 
component, and many attempts have been made to 
estimate the relative sizes of these two components. 
Within the genetic component of variation further 
subdivision is possible, roughly speaking into the ad- 
ditive, the dominance, and the epistatic components of 
this variance. 


The additive genetic variance is, in effect, the com- 
ponent of the total genetic variance that can be 
explained by genes within genotypes. If some charac- 
ter is determined by the genes at a single locus, and has 
measurement 3 for $A_1A_1$ individuals, 4 for $A_1A_2$ individuals, and 5 for $A_2A_2$ individuals, then all the variation in the value of this character can be explained by genes, with an $A_2$ gene contributing an additive component of 1 compared to an $A_1$ gene. Here the additive genetic variance comprises all the genetic variance. If the character has measurement 3 for $A_1A_1$ individuals, 4 for $A_1A_2$ individuals, and 3 for $A_2A_2$ individuals, and $A_1$ and $A_2$ are equally frequent, then none of the
variation can be explained by genes, and the additive 
genetic variance is zero. Usually a situation intermedi- 
ate between these extremes holds. 

A better expression for the additive genetic variance is the ‘genic variance,’ that is, the component of the variance attributable to genes, but the expression ‘additive genetic’ is entrenched in the literature. The adjective ‘additive’ is explained by the definition of this variance through a least-squares procedure. Suppose that some character is determined by the genes at a single locus, with individuals of genotypes $A_1A_1$, $A_1A_2$, and $A_2A_2$ having respective measurement values $m_{11}$, $m_{12}$, and $m_{22}$ for this character. If the population frequencies of these three genotypes are $P_{11}$, $2P_{12}$, and $P_{22}$, then the population mean for this measurement is

$$m = P_{11}m_{11} + 2P_{12}m_{12} + P_{22}m_{22}$$

and the population variance in the character is

$$\sigma^2 = P_{11}(m_{11} - m)^2 + 2P_{12}(m_{12} - m)^2 + P_{22}(m_{22} - m)^2.$$

The additive genetic variance is found by finding two parameters, $\alpha_1$ (for the allele $A_1$) and $\alpha_2$ (for the allele $A_2$), which minimize the quadratic function $Q$, defined by

$$Q = P_{11}(m_{11} - m - 2\alpha_1)^2 + 2P_{12}(m_{12} - m - \alpha_1 - \alpha_2)^2 + P_{22}(m_{22} - m - 2\alpha_2)^2,$$

the minimization being subject to the constraint $(P_{11} + P_{12})\alpha_1 + (P_{12} + P_{22})\alpha_2 = 0$. The additive genetic variance is then $\sigma^2 - Q_{\min}$, where $Q_{\min}$ is the minimized value of $Q$. This explains the adjective ‘additive,’ since the additive genetic variance is that portion of the total variance explained by fitting additive parameters associated with the genes at the locus in question. If $Q$ can be reduced to zero by this additive fitting of the $\alpha$ values, then all the genetic variance is additive genetic.
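The least-squares definition can be checked numerically. The sketch below uses a hypothetical function name and the text's notation, in which the genotype frequencies are $P_{11}$, $2P_{12}$, $P_{22}$. It solves the weighted normal equations for $\alpha_1$ and $\alpha_2$ directly rather than imposing the constraint explicitly. Because the allele-count vectors for $A_1$ and $A_2$ sum to a constant, the unconstrained least-squares fit already satisfies $(P_{11} + P_{12})\alpha_1 + (P_{12} + P_{22})\alpha_2 = 0$. The function returns the additive part as $\sigma^2 - Q_{\min}$ and the dominance part as $Q_{\min}$.

```python
def additive_decomposition(P11, P12, P22, m11, m12, m22):
    """Least-squares split of single-locus variance (two alleles).

    Genotype frequencies are P11, 2*P12, P22, as in the text.
    Returns (alpha1, alpha2, total variance, additive variance, dominance variance).
    """
    weights = (P11, 2 * P12, P22)
    values = (m11, m12, m22)
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("genotype frequencies must sum to 1")
    m = sum(w * v for w, v in zip(weights, values))
    y = [v - m for v in values]                     # deviations m_ij - m
    sigma2 = sum(w * d * d for w, d in zip(weights, y))

    # Copies of (A1, A2) carried by A1A1, A1A2, A2A2.
    counts = ((2, 0), (1, 1), (0, 2))
    # Weighted normal equations for minimizing Q over (alpha1, alpha2).
    s11 = sum(w * c1 * c1 for w, (c1, _) in zip(weights, counts))
    s12 = sum(w * c1 * c2 for w, (c1, c2) in zip(weights, counts))
    s22 = sum(w * c2 * c2 for w, (_, c2) in zip(weights, counts))
    t1 = sum(w * c1 * d for w, (c1, _), d in zip(weights, counts, y))
    t2 = sum(w * c2 * d for w, (_, c2), d in zip(weights, counts, y))
    det = s11 * s22 - s12 * s12                     # zero only if monomorphic
    alpha1 = (t1 * s22 - s12 * t2) / det
    alpha2 = (s11 * t2 - s12 * t1) / det

    # Minimized Q is the variance left unexplained by the additive fit.
    q_min = sum(w * (d - c1 * alpha1 - c2 * alpha2) ** 2
                for w, (c1, c2), d in zip(weights, counts, y))
    return alpha1, alpha2, sigma2, sigma2 - q_min, q_min
```

With measurements 3, 4, 5 the fit is exact, so all the variance is additive (for example, with Hardy–Weinberg frequencies and $A_1$ at 0.3, $\alpha_1 = -0.7$, $\alpha_2 = 0.3$ and both variances equal 0.42). With measurements 3, 4, 3 and equal allele frequencies ($P_{11} = P_{12} = P_{22} = 1/4$), both $\alpha$ values are zero and the whole variance of 0.25 is dominance variance, matching the two extreme cases described above.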

When many alleles are possible at the locus in 
question, the additive genetic variance is found by a 
direct extension of the above procedure. In both the 
two- and many-allele cases, any variance not explained 
by fitting additive parameters, that is the difference 
between the total genetic variance and the additive 
component, is called the dominance variance. This is usually denoted $\sigma_D^2$, while the additive genetic variance is denoted by $\sigma_A^2$ (or by $V_A$). In both the two- and multiallele cases, the $\alpha$ quantities are called the average effects of the respective alleles. Because of their use as additive parameters in determining the additive genetic variance, a better expression would be ‘additive effects.’ In animal breeding programs the sum of the $\alpha$ values for any genotype is called the breeding value of that genotype, again emphasizing the additive nature of these quantities. This implies that the breeding value of any heterozygote is always the average of the breeding values of the two corresponding homozygotes, even though, when dominance exists, this is not true of the corresponding phenotypic values.
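For two alleles, the statement about heterozygotes follows in one line from the definition of breeding value as the sum of the $\alpha$ values. The abbreviation BV is introduced here only for brevity:

```latex
\begin{aligned}
\mathrm{BV}(A_1A_1) &= 2\alpha_1, \qquad \mathrm{BV}(A_2A_2) = 2\alpha_2, \\
\mathrm{BV}(A_1A_2) &= \alpha_1 + \alpha_2
  = \tfrac{1}{2}\bigl[\mathrm{BV}(A_1A_1) + \mathrm{BV}(A_2A_2)\bigr].
\end{aligned}
```

No corresponding identity holds for the measurements themselves: $m_{12}$ equals $\tfrac{1}{2}(m_{11} + m_{22})$ only when there is no dominance.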

The importance of the additive genetic variance 
can best be seen by considering the correlation in 
the measurement of interest between various types 
of relative. In simple cases, for example for random-mating populations, the parent–offspring correlation is $\tfrac{1}{2}\sigma_A^2/\sigma^2$ and the grandparent–grandchild correlation is $\tfrac{1}{4}\sigma_A^2/\sigma^2$. These correlations involve the additive genetic variance, but not the dominance variance,
because a parent passes on a gene to an offspring, not a 
genotype, and the additive genetic variance is that com- 
ponent of the total genetic variance in the measurement 
due to genes within genotypes. This implies that the 
additive genetic variance is of crucial interest in animal 
breeding programs; in this context the ratio $\sigma_A^2/\sigma^2$
is called the (narrow) heritability. Since full sibs can 
share two genes in common from their parents, the 
correlation between full sibs contains also a compo- 
nent from the dominance variance. All these calcula- 
tions apply in the case where an arbitrary number of 
alleles is possible at the gene locus controlling the 
character. 
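The simplest-case correlations can be collected in a short function with a hypothetical name. The full-sib coefficient of 1/4 on the dominance variance is the standard textbook value and is assumed here, because the text states only that such a component is present. The formulas also assume random mating, no epistasis, and no environment shared by relatives, with $\sigma^2$ taken as the total variance of the character.

```python
def relative_correlations(var_additive, var_dominance, var_total):
    """Correlations between relatives in the simplest random-mating case.

    Assumes no epistasis and no shared environment; var_total is the
    total variance of the character (genetic plus environmental).
    """
    if var_total <= 0:
        raise ValueError("total variance must be positive")
    return {
        "parent_offspring": 0.5 * var_additive / var_total,
        "grandparent_grandchild": 0.25 * var_additive / var_total,
        # Full sibs can share both genes at a locus, hence the dominance term.
        "full_sibs": (0.5 * var_additive + 0.25 * var_dominance) / var_total,
        "narrow_heritability": var_additive / var_total,
    }
```

For example, with $\sigma_A^2 = 0.5$, $\sigma_D^2 = 0.25$, and an environmental contribution of 0.25 (so $\sigma^2 = 1$), the parent–offspring correlation is 0.25, the grandparent–grandchild correlation is 0.125, and the full-sib correlation is 0.3125. Doubling the parent–offspring correlation recovers the narrow heritability of 0.5.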

For characters determined by the genes at several 
loci, an additive genetic variance can be calculated for 
each locus, together with a dominance variance. Apart 
from these, additive-by-additive variances, additive- 
by-dominance variances, and other epistatic variances 
can also be calculated. The correlations between rela- 
tives now become far more complex and depend not 
only in a complicated way on all these components of 
variance, but also on the linkage arrangement between 
the loci controlling the character as well as the various 
coefficients of linkage disequilibrium between the 
genes at the loci involved. Nevertheless the most im- 
portant component in these correlations is usually the 
sum of the additive genetic variances at all the loci 
controlling the character, since in these correlations 
the coefficients of the epistatic components are usually 
small. 

In evolutionary population genetics, one possible 
interpretation of the ‘fundamental theorem of natural 
selection’ states that the partial increase in mean fitness 
from one generation to the next is proportional to 
the parental generation additive genetic variance in 
fitness. The effect of natural selection over many generations is ultimately to reduce the additive genetic variance to zero at any equilibrium point of the evolutionary process.
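The claim that selection drives the additive genetic variance in fitness to zero at equilibrium can be illustrated for a single locus under viability selection with random mating. The sketch assumes Hardy–Weinberg proportions, under which the additive variance in fitness is $2pq\alpha^2$, with $\alpha = p(w_{11} - w_{12}) + q(w_{12} - w_{22})$ the average effect of an allele substitution. The fitness values and function names are hypothetical, and the sketch does not compute Fisher's theorem itself.

```python
def mean_fitness(p, w11, w12, w22):
    q = 1 - p
    return p * p * w11 + 2 * p * q * w12 + q * q * w22


def average_effect(p, w11, w12, w22):
    # Average effect of substituting A1 for A2 under Hardy-Weinberg proportions.
    q = 1 - p
    return p * (w11 - w12) + q * (w12 - w22)


def additive_variance_in_fitness(p, w11, w12, w22):
    q = 1 - p
    return 2 * p * q * average_effect(p, w11, w12, w22) ** 2


def genotypic_variance_in_fitness(p, w11, w12, w22):
    # Total variance in fitness among genotypes (additive plus dominance).
    q = 1 - p
    wbar = mean_fitness(p, w11, w12, w22)
    return (p * p * (w11 - wbar) ** 2 + 2 * p * q * (w12 - wbar) ** 2
            + q * q * (w22 - wbar) ** 2)


def select(p, w11, w12, w22, generations):
    """Iterate the standard one-locus selection recursion; return all p values."""
    history = [p]
    for _ in range(generations):
        q = 1 - p
        p = p * (p * w11 + q * w12) / mean_fitness(p, w11, w12, w22)
        history.append(p)
    return history
```

With overdominance, $(w_{11}, w_{12}, w_{22}) = (0.9, 1.0, 0.8)$, the allele frequency converges to 2/3. At that point $\alpha = 0$ and the additive variance in fitness vanishes, although genotypes still differ in fitness (all remaining variance is dominance variance). With directional selection, for example $(1.0, 0.95, 0.9)$, the favored allele goes to fixation, so $pq$, and with it the additive variance, goes to zero. In both runs mean fitness never decreases from one generation to the next.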


See also: Adaptive Landscapes; Fitness; 
Fundamental Theorem of Natural Selection; 
Linkage Disequilibrium 
Adenine


Adenine (Ade) is one of the purine bases found in
nucleic acids. When attached to ribose, it is the nucleo- 
side adenosine (A); when attached to deoxyribose, it is 
the nucleoside deoxyadenosine (dA) (Figure 1). The 
phosphate esters of those nucleosides are the nucleot- 
ides adenylic acid (adenosine + phosphate; AMP) 
and deoxyadenylic acid (deoxyadenosine + phosphate; 
dAMP). The triphosphate forms of adenosine (ATP) 
and deoxyadenosine (dATP) are substrates for the 
synthesis of RNA and DNA, respectively. When pre- 
sent in RNA and DNA, adenine functions as one of 
the ‘letters’ of the genetic code. ATP is a ubiquitous, 
high-energy substrate (see Mitochondria) and, along 
with GTP, a cofactor in many cellular reactions. 
See also: Genetic Code; Mitochondria


Adenocarcinomas 


A F Gazdar and A Maitra 


Copyright © 2001 Academic Press 
doi: 10.1006/rwgn.2001.1540 


Adenocarcinomas (ADCs) are defined as malignant epithelial tumors with glandular differentiation or mucin production by the tumor cells. Their benign counterparts are known as adenomas. ADCs constitute a subgroup of a broader category of epithelial tumors known as carcinomas. Carcinomas are the most prevalent form of human tumors (approximately 80% of all noncutaneous malignancies), and ADCs are the commonest form of carcinoma. Carcinomas may arise in virtually any organ that contains glandular or secretory epithelium, and the most frequent sites include lung, kidney, gastrointestinal tract, breast, and prostate. In some organs, such as colorectum, breast, and kidney, almost all of the carcinomas are ADCs, while in other organs, such as lung, only a portion of the carcinomas are ADCs. As with other epithelial malignancies, ADCs are usually preceded by a series of histopathologically identifiable preneoplastic lesions. Molecular changes can usually be detected during the lengthy preneoplastic process, and may be present in histologically normal-appearing epithelium. Because ADCs may arise from multiple diverse structures and organs, their molecular pathogenesis varies considerably. In this section we will briefly discuss some aspects of the molecular genetics of ADCs arising in the common organ sites.


Colorectal Adenocarcinomas 


There are two major genetic pathways by which colorectal adenocarcinomas (CRCs) arise: the “suppressor” and the “mutator” pathways. The “suppressor” phenotype is epitomized by the adenoma-carcinoma sequence. In this sequence, progressive accumulation of mutations in dominant tumor-promoting oncogenes and recessive tumor suppressor genes (TSGs) transforms a benign adenoma into an ADC. One of the earliest changes in this sequence is inactivation of the Adenomatous Polyposis Coli (APC) gene on chromosome 5q21. Germline APC mutations are responsible for the inherited polyposis syndrome familial adenomatous polyposis, or FAP (see Table 1). While FAP is a relatively uncommon cause of CRC, inactivation of the APC gene has been found in the vast majority of sporadic tumors as well. APC gene mutations are seen in aberrant crypt foci, which are considered the first histologic manifestation of disordered epithelial proliferation in the colon. They are also seen in up to 80% of sporadic adenomas and CRCs. The traditionally proposed “two-hit” pathway of loss of APC function in CRCs has been deletion of one parental allele followed by mutational inactivation of the second. However, recent data suggest that hypermethylation of the APC promoter is an equally important epigenetic mechanism of inactivation, especially in sporadic tumors. The second consistent genetic aberration in the “suppressor” phenotype involves activating point mutations of the K-ras oncogene, located on chromosome 12p12. K-ras encodes a 21-kDa GTP-binding protein involved in signal transduction, which controls cellular proliferation and differentiation. K-ras mutations are seen in half of adenomas >1 cm in size and in a similar proportion of CRCs, but much less frequently in smaller adenomas, suggesting that these changes are preceded by APC gene mutations in most cases. Additional genetic alterations in the adenoma-carcinoma sequence are usually seen within the larger late-stage adenomas or only at the carcinoma stage. For example, loss of heterozygosity (LOH) at the Deleted in Colon Carcinoma (DCC) gene locus on chromosome 18q is found in approximately 70% of CRCs and 50% of late-stage adenomas, but in only 10% of smaller adenomas. Similarly, LOH at 17p13, the p53 TSG locus, is seen in 75% of CRCs, but infrequently in any adenomas, including the late-stage ones. The majority of CRCs with LOH of one p53 allele show a missense mutation of the remaining allele. To summarize, the “suppressor” phenotype of CRCs is characterized by a defined sequence of genetic alterations. The sequence begins in histologically benign epithelium with inactivation of the APC gene and proceeds through stepwise accumulation of mutations involving K-ras, DCC, p53, and putative TSGs on other chromosomes.

Table 1 Familial CRC syndromes

Familial CRC syndrome | Clinical features | Affected gene
Familial adenomatous polyposis (FAP) | Multiple adenomatous polyps (>100) in the colon | APC (chromosome 5q)
Peutz-Jeghers syndrome | Multiple hamartomatous polyps, circumoral pigmentation | LKB1 (chromosome 19p)
Juvenile polyposis | Multiple juvenile polyps | DPC4/smad4 (chromosome 18q)
Cowden syndrome | Multiple hamartomas of all three embryonal layers, breast cancer, thyroid cancer | PTEN/MMAC (chromosome 10q)
Hereditary nonpolyposis colorectal cancer | CRCs with or without other cancers such as endometrial carcinoma | DNA mismatch repair genes

In contrast, the “mutator” phenotype is characterized by microsatellite instability (MSI). Microsatellites are short, simple repetitive DNA sequences of mono-, di-, tri-, or tetranucleotides. They are dispersed throughout the human genome and are by nature highly polymorphic. CRCs developing along this pathway are initiated by an inherited or somatic mutation within one or more DNA mismatch repair (MMR) genes. Unlike classic TSGs (or “gatekeepers”), MMR genes are “caretakers” of the genome, responsible for correcting spontaneous slippage-induced errors during DNA replication. Inactivating mutations in the MMR genes facilitate further mutations in cancer-causing genes or additional MMR genes. The result is tremendous genetic instability and a fertile soil for neoplastic transformation. In MMR-deficient tumors, MSI manifests as a 100- to 1000-fold increase in the rate of repeat-unit additions or deletions in microsatellite sequences. Hereditary nonpolyposis colorectal cancer (HNPCC), or Lynch syndrome, is the prototypical example of the inherited “mutator” phenotype. Germline mutations involving the MMR genes hMLH1, hMSH2, hMSH6, hPMS1, or hPMS2 have been identified in the majority of HNPCC patients and in 15% of MSI+ sporadic CRCs.
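MSI is recognized by comparing the length of the same repeat in tumor DNA and in matched normal DNA. The sketch below uses short, made-up sequences and a simple greedy scan. The function name, the five-copy cutoff, and the sequences are all hypothetical. Clinical MSI testing, which typically sizes PCR products from a panel of marker loci, is not modeled.

```python
def find_microsatellites(seq, min_copies=5, max_unit=4):
    """Greedy left-to-right scan for tandem repeats of 1- to max_unit-nt units.

    Returns a list of (start, unit, copies) for runs with at least min_copies copies.
    Ties in run length go to the shortest unit (e.g. "CA" rather than "CACA").
    """
    hits = []
    i = 0
    while i < len(seq):
        best = None
        for k in range(1, max_unit + 1):
            unit = seq[i:i + k]
            if len(unit) < k:
                break
            copies = 1
            while seq[i + copies * k:i + (copies + 1) * k] == unit:
                copies += 1
            span = copies * k
            if copies >= min_copies and (best is None or span > best[2] * len(best[1])):
                best = (i, unit, copies)
        if best is None:
            i += 1
        else:
            hits.append(best)
            i += best[2] * len(best[1])  # skip past the whole run
    return hits


# Hypothetical matched normal and tumor sequences: a (CA)n dinucleotide repeat
# and a (T)n mononucleotide repeat; the tumor has lost three CA units.
normal = "GATC" + "CA" * 20 + "GGTA" + "T" * 12 + "CCG"
tumor = "GATC" + "CA" * 17 + "GGTA" + "T" * 12 + "CCG"

normal_hits = find_microsatellites(normal)
tumor_hits = find_microsatellites(tumor)

# Compare loci pairwise (assumes both scans return the same loci in the same order).
for (_, unit, n_copies), (_, _, t_copies) in zip(normal_hits, tumor_hits):
    status = "unstable" if n_copies != t_copies else "stable"
    print(f"({unit})n: normal {n_copies} copies, tumor {t_copies} copies -> {status}")
```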

A third phenotype of CRC, only recently described, is the CpG island “methylator” phenotype (CIMP). It is characterized by the simultaneous methylation of multiple CpG islands, including the promoters of known TSG and MMR genes such as p16, APC, and hMLH1. The CIMP phenotype occurs independently of MSI and can be detected even at the adenoma stage. CIMP+ tumors show a distinct subset of genetic changes compared with CIMP− tumors.


Breast Adenocarcinomas 


There are two principal subtypes of breast ADCs: infiltrating ductal and infiltrating lobular carcinomas. In general, the clinical features and underlying genetic abnormalities of these two subtypes are similar. Most breast carcinomas arise against a background of premalignant breast disease. This disease forms a histologic continuum that begins with usual ductal or lobular hyperplasia, proceeds through atypical hyperplasias and carcinoma in situ, and culminates in invasive tumors. Approximately 10% of breast cancers show a strong familial tendency and appear in women younger than those affected in the general population. Linkage studies in these families have led to the isolation of two breast cancer susceptibility genes: BRCA1 (for BReast CAncer 1) on chromosome 17q21 and BRCA2 on chromosome 13q12. Women with germline BRCA1 mutations have an 85-90% lifetime risk of developing breast cancer and about a 33% risk of developing ovarian cancer. Until recently, the role of the BRCA1/2 genes in sporadic breast cancers was unclear, since mutations of these genes were not detected outside the context of familial cases. New data suggest that a subset of sporadic breast cancers with LOH at 17q21 undergo complete abrogation of BRCA1 function via promoter hypermethylation of the second allele. Methylation will likely emerge as a common mechanism of inactivation in most sporadic cancers, just as intragenic deletions and mutations play the predominant role in their familial counterparts.

Other genetic loci important in breast cancer development include p53 and the ataxia telangiectasia gene, ATM, on chromosome 11q23. Patients with germline p53 mutations (Li-Fraumeni syndrome) have an increased risk of developing breast cancer, and 50% of sporadic cases contain a mutated p53 gene. Similarly, patients with germline ATM mutations are predisposed to breast cancer, and 40% of sporadic tumors show LOH at the ATM locus. As with BRCA1, ATM mutations have not been detected in sporadic tumors. It remains to be determined what fraction of these cases will show promoter hypermethylation.

Besides TSGs, one growth-promoting oncogene that has received considerable attention in recent clinical trials is c-erbB2, or Her-2neu. Her-2neu shares considerable homology with the epidermal growth factor receptor and encodes a transmembrane glycoprotein with intracellular tyrosine kinase activity. Approximately 20-30% of breast cancers show amplification of Her-2neu, and there is a strong correlation between amplification and proliferation indices. Herceptin is a monoclonal antibody to Her-2neu. In patients with advanced-stage breast cancers that overexpress the oncogene, it has been shown to prolong survival and delay disease progression. Its use represents one of the best-known examples of translational cancer research in modern times.


Adenocarcinomas of the Lung 


Unlike those of most organs, the carcinomas arising in the lung are histologically diverse, and ADC is the most frequent form in many countries. Its incidence is rising rapidly and is related to geography, gender, smoking status, smoking habits, and age. ADCs are peripheral tumors, arising from the small airways or alveoli. Because of this peripheral location, the preneoplastic changes associated with them cannot be studied in great detail or followed sequentially. However, there are small benign lesions called atypical adenomatous hyperplasias that share some of the morphological, genetic, and phenotypic properties of lung ADCs. Bronchioloalveolar carcinoma, a superficial, spreading, noninvasive form of ADC that lines the peripheral airways, may represent an intermediate step between atypical adenomatous hyperplasia and invasive carcinoma. The molecular changes in pulmonary ADCs show both similarities to and differences from those of the other types of lung carcinoma. ADCs have lower rates of overall allelic loss and a lower incidence of p53 mutations than other lung carcinomas. However, ras mutations, especially of the K-ras gene at codons 12 or 13, are much more frequent in ADCs.


Prostatic Adenocarcinomas 


The incidence of prostatic adenocarcinomas (PACs) has been increasing in recent times, primarily because of better use of screening and detection techniques such as the prostate-specific antigen assay. More than 70% of men above the age of 80 years harbor foci of PAC, so that, conceptually, malignancy can almost be called a “physiologic inevitability” at this age. Androgens have at least a permissive role in the genesis of these tumors: neoplastic epithelial cells possess androgen receptors, and androgen deprivation can lead to regression of PAC in many cases. In approximately 10% of patients, the development of PAC has been linked to inheritance of putative susceptibility genes. Linkage analysis in these cohorts has narrowed the locus for one-third of familial cases to the hereditary prostate cancer 1 (HPC1) region on chromosome 1q24-25. The identity of the implicated gene within the HPC1 region remains to be determined, although several candidate genes have been isolated. Allelic deletions of the short arm of chromosome 8 are found in 80-100% of sporadic PACs and in 30-65% of their precursor lesion, prostatic intraepithelial neoplasia. No single gene has been identified as the target of intragenic deletions in these cases, and the search for a causal TSG continues. Approximately one-fourth of PACs show inactivation of the PTEN/MMAC gene on chromosome 10q23. Preliminary studies have suggested that loss of PTEN/MMAC function may be associated with disease progression and metastasis.


Summary 


In Table 2, we have summarized the most frequently implicated genetic alterations in ADCs at various other anatomic sites. All ADCs share a defining morphologic pattern, the formation of neoplastic glands. It is intriguing that, despite this, each of these tumors has a subset of distinct, organ-specific molecular abnormalities. This subset probably represents the endpoint of a complex interplay between genetic and environmental factors. Molecular approaches are increasingly integrated into the diagnosis and treatment of cancers. In the future, ADCs may therefore be classified not by the organ in which they arise but by their unique and clinically relevant “molecular profiles.”

Table 2 Frequent genetic abnormalities in adenocarcinomas at other anatomic sites

Organ | Genetic abnormality
Pancreas | Inactivation of DPC4/smad4 (18q21), p16 (9p21), and p53 (17p13); K-ras mutations
Kidney | Inactivation of VHL (3p25)
Stomach | Inactivation of DCC (18q21), p53 (17p13), and APC (5q21); overexpression of the cell cycle regulator CDC25B
Gallbladder | Inactivation of p16 (9p21), p53 (17p13), and FHIT (3p14)
An adenoma is a benign neoplasm arising from glandular or secretory epithelium. It is classified by reference to the cell lineage of origin. Thus a benign neoplasm arising from the glandular cells of colonic mucosa is a colonic adenoma. Likewise, an adenoma arising from the secretory epithelium of thyroid follicles is a thyroid follicular adenoma. Occasionally the term may be further qualified to reflect macroscopic features such as cyst formation, for example cystadenoma of the ovary. In some circumstances an adenoma may progress to a malignant neoplasm, termed an adenocarcinoma. Before this happens, the adenoma cells may show evidence of disordered maturation and growth, known as dysplasia. Thus the complete designation of an adenoma includes reference to site, cell type of origin, and degree of dysplasia.


Further Reading 
Klatt EC and Kumar V (2000) Robbins Review of Pathology. Philadelphia, PA: W.B. Saunders.


See also: Adenocarcinomas 
Adenomatous polyposis coli (APC) protein is the product of a gene (APC) located in the human genome near the 5q21-22 boundary. Germline mutations in APC are responsible for familial adenomatous polyposis (FAP). FAP is a Mendelian dominant condition in which hundreds of benign adenomas develop in the colorectal mucosa, some of which inevitably evolve into carcinomas. Somatically acquired deficiency of APC function also characterizes the majority of sporadic colorectal adenomas and carcinomas in humans. The phenotype of FAP is modeled in mice bearing heterozygous germline mutations that encode a truncated APC product, although in mice the tumors tend to be distributed throughout both the small and large intestines. In both mice and humans, the tumors themselves consist of clones of cells in which all APC function has been lost. In FAP, this includes the function of the residual normal germline allele. APC is therefore a tumor suppressor (oncosuppressor) protein.


Although deficiency of APC function is particularly associated with tumorigenesis in the intestinal mucosa, the protein is normally expressed in the majority of tissue types. It often appears associated with the lateral borders of epithelial cells, where it colocalizes with the junctional protein E-cadherin. In intestinal epithelium, however, it is also found close to the apical cell membrane, where E-cadherin is absent. APC is essential for development: homozygous truncation mutations are lethal in mouse embryos around the stage of gastrulation.


Molecular Organization 


The human gene was originally described as encoding a 2843-amino-acid protein. Several isoforms are now known to exist, arising from alternative splicing, an additional exon (10A), and further exons 5′ to exon 1 (four have been described, three including translation start codons). There is some evidence for tissue specificity in the expression of these isoforms. An unusual feature of the genomic organization is the exceptionally long 15th (terminal) exon, which encodes more than two-thirds of the entire protein. The N-terminal portion of the protein includes an oligomerization domain (codons 1-171) and seven tandem armadillo repeats (453-767). The central region encodes three incomplete 15-aa repeats and seven incomplete 20-aa repeats (1014-2130). The C-terminal third contains a basic domain and a terminal T/SXV motif. The functional significance of the oligomerization and armadillo repeat domains is not yet completely clear. In the central region, the 15- and 20-amino-acid repeats bind β-catenin. Three of the incomplete 20-amino-acid repeats contain GSK-3β phosphorylation sites. Axin (or its homolog conductin) is bound by three closely adjacent SAMP-motif sequences. This central region also contains both nuclear localization and nuclear export signals. The basic region, and possibly other regions N-terminal to it, binds microtubules, whilst the C-terminal 170 amino acids include a binding site for the microtubule-associated protein EB-1. The C-terminal T/SXV motif binds the human homolog of the Drosophila discs large protein (hDLG), a member of the membrane-associated guanylate kinase (MAGUK) family. There is at least one caspase-3 site, at Asp777, which is cleaved in apoptosis.


Homologs 


Closely similar homologs of human APC are present in many species, and there is a second family member (APC-2) located on human chromosome 19p13.3. All of these include versions of the oligomerization and armadillo repeat domains and the β-catenin- and axin-binding sites. However, the terminal domains, including the Dlg binding site, are absent in both APC-2 and the Drosophila homologs. The expression of APC-2 appears more restricted than that of APC, being largely confined to the central nervous system.


Function 


The best-understood functions of APC relate to its interaction with β-catenin and axin. Binding to β-catenin through the 15-amino-acid repeats is constitutive. Binding through the third, fourth, and seventh 20-amino-acid repeats, by contrast, depends on their phosphorylation by the serine-threonine kinase GSK-3β. GSK-3β also phosphorylates β-catenin, targeting it for ubiquitination and proteasomal destruction. APC catalyzes this reaction by forming a multimeric complex with axin/conductin, β-catenin, and GSK-3β. Hence APC plays a major role in destabilizing β-catenin. This has profound effects on the cell, because it prevents entry of β-catenin into the nucleus. In the nucleus, β-catenin partners with tcf to act as a heterodimeric transcription factor. The proteins known to be transactivated by β-catenin/tcf include the immediate-early DNA replicative proteins c-myc and c-jun, the cell cycle protein cyclin D1, the epithelial growth factor gastrin, and the extracellular protease matrilysin. APC appears to play an additional role in reducing the effective intranuclear concentration of β-catenin by exporting it from the nucleus. By both these mechanisms, therefore, APC can interrupt pathways in which β-catenin activates transcription and hence cell division.

One such pathway is associated with cell stimulation by Wnt-1. This paracrine growth factor binds to the seven-span membrane receptor Frizzled (Fz) and so activates the cytoplasmic protein Dishevelled (Dvl), an inhibitor of the kinase activity of GSK-3β. Thus Wnt-1 stimulation stabilizes β-catenin, opposing APC. Free cytoplasmic β-catenin is also generated through its release from the cytosolic moiety of E-cadherin, the cell-to-cell adherence molecule that is a major protein in adherens junctions. This release is triggered by tyrosine kinase activity and occurs in the cellular response to stimulation by many growth factors, whose receptors also cluster at these junctions. In both these circumstances, APC destabilizes β-catenin and thereby downregulates the effect of external growth signals on cell proliferation.
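The regulatory logic of this section can be condensed into a deliberately simplified Boolean (on/off) sketch. It is an illustrative assumption, not a quantitative model. It treats Wnt-1 signaling as fully inhibiting GSK-3β and APC loss as fully disabling β-catenin destruction. The class and function names are hypothetical. The sketch encodes the point that loss of APC has the same downstream effect as constant Wnt-1 stimulation.

```python
from dataclasses import dataclass


@dataclass
class CellState:
    wnt1_bound: bool      # Wnt-1 bound to Frizzled, activating Dishevelled
    apc_functional: bool  # at least one working APC allele


def beta_catenin_signaling(state: CellState) -> bool:
    """Return True if free beta-catenin is expected to accumulate and drive tcf targets.

    Qualitative Boolean sketch of the pathway described in the text, not a kinetic model.
    """
    # Dishevelled, activated by Wnt-1/Frizzled, inhibits GSK-3beta kinase activity.
    gsk3b_active = not state.wnt1_bound
    # Destruction of beta-catenin requires both active GSK-3beta and the APC/axin complex.
    # (APC-mediated nuclear export of beta-catenin is not modeled separately.)
    destruction_active = gsk3b_active and state.apc_functional
    # Without destruction, beta-catenin is stabilized, enters the nucleus, and acts with tcf.
    return not destruction_active


for wnt in (False, True):
    for apc in (True, False):
        on = beta_catenin_signaling(CellState(wnt1_bound=wnt, apc_functional=apc))
        print(f"Wnt-1 {'on ' if wnt else 'off'}, APC {'intact' if apc else 'lost  '}: "
              f"tcf targets {'ON' if on else 'off'}")
```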

Components of the Wnt-1 signaling pathway, including APC, axin, GSK-3β, and β-catenin, appear widely in biology. In development they are often concerned with cell polarity and cell-to-cell orientation as well as with the initiation of cell replication. The binding of APC to microtubules, EB-1, and Dlg may relate to these roles. In vitro, APC binds and bundles microtubules. In living cells it decorates the microtubule cytoskeleton, appearing to move along it as it is distributed to more peripheral parts of the cell. In particular, APC concentrates in punctate aggregates at the margins of the membrane extensions that characterize the advancing edge of migrating cells. It also concentrates at the growing ends of microtubules that terminate close to these extensions. The association with the tips of growing microtubules is regulated by phosphorylation and appears to be mediated through binding to EB-1. Interestingly, an EB-1 homolog, Bim1p, is also a critical element in aligning the mitotic spindle with the position of the budding daughter cell during the yeast division cycle. Dlg also plays a major role in determining cell polarity. It establishes the position of adherens junctions in Drosophila development, and the human homolog is essential for the formation of normal synaptic junctions in neurons. There is some evidence that hDLG regulates both cell growth and intercellular adhesion by β-catenin-mediated, APC-dependent pathways.


Pathology 


These observations go some way towards explaining the remarkably high frequency with which APC deficiency leads to the formation of colorectal adenomas, lesions with a sustained disorder in cell orientation, migration, and proliferation. In FAP, almost all the germline mutations generate truncated N-terminal fragments of the protein. Within the tumors, the function of the residual allele is suppressed through a second mutation, partial or complete chromosome loss, or suppression of expression. Over 70% of sporadic tumors also show a similar pattern of biallelic silencing. Here, however, more than 60% of the truncation mutations fall within a mutation cluster region (MCR) delineated by codons 1286-1513. These mutations generate N-terminal peptides that include the oligomerization and armadillo repeat domains. The peptides lack the axin- and GSK-3β-dependent β-catenin binding sites and the nuclear export sites. Correspondingly, most adenomas, both in humans and in the murine APC-deleted models, show intense nuclear β-catenin accumulation.
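To make the consequences of truncation concrete, the sketch below maps a premature stop codon onto the region boundaries given under Molecular Organization. It simplifies under stated assumptions. Codon and residue numbers are treated as equivalent. The T/SXV motif is assumed to occupy the last three residues, and the EB-1 region is taken as the C-terminal 170 residues. All names are hypothetical. At this coarse resolution the sketch shows only three things about an MCR truncation: it keeps the oligomerization and armadillo domains, it cuts through the central repeat region, and it removes everything C-terminal to that region. It cannot say which individual repeats survive.

```python
APC_LENGTH = 2843  # residues in the originally described isoform

# Region boundaries (residue numbers) as given in this entry; the basic domain and the
# individual 15-aa/20-aa repeats are omitted because their boundaries are not given here.
APC_REGIONS = {
    "oligomerization domain": (1, 171),
    "armadillo repeats": (453, 767),
    "caspase-3 site (Asp777)": (777, 777),
    "15-aa/20-aa repeat region": (1014, 2130),
    "EB-1 binding region (last 170 aa)": (APC_LENGTH - 169, APC_LENGTH),
    "T/SXV motif (Dlg binding)": (APC_LENGTH - 2, APC_LENGTH),  # assumed last 3 residues
}
MCR = (1286, 1513)  # mutation cluster region, codons


def classify_truncation(stop_codon: int) -> dict:
    """Sort APC regions by fate after a premature stop at stop_codon.

    The truncated peptide is assumed to contain residues 1 .. stop_codon - 1.
    """
    last = stop_codon - 1
    fate = {"retained": [], "partial": [], "lost": []}
    for name, (start, end) in APC_REGIONS.items():
        if end <= last:
            fate["retained"].append(name)
        elif start <= last:
            fate["partial"].append(name)
        else:
            fate["lost"].append(name)
    return fate


for stop in (1300, 1450):  # hypothetical stop codons inside the MCR
    in_mcr = MCR[0] <= stop <= MCR[1]
    print(f"stop at codon {stop} (in MCR: {in_mcr})")
    for category, names in classify_truncation(stop).items():
        print(f"  {category}: {', '.join(names) or 'none'}")
```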

In the FAP syndrome associated with germline mutation close to the MCR, the large numbers of colorectal adenomas are usually accompanied by extracolonic abnormalities. The most common are congenital hypertrophy of the retinal pigment epithelium (CHRPE), duodenal adenomas, and (often) desmoid tumors of the soft tissues. More rarely, there are osteomas of the mandibular bone and, in a very small percentage of cases, carcinomas of the thyroid gland, medulloblastoma (Turcot syndrome), or hepatoblastoma. Germline deletions distant from the MCR also tend to produce a less severe phenotype, with many fewer adenomas. CHRPE is not found in association with mutations 5′ to the ninth exon. Somewhat mysteriously, germline mutations in the centre of the MCR (close to codon 1300) tend to be associated in tumors with suppression of the residual APC allele by partial or complete deletion, producing loss of heterozygosity at this locus. Germline mutations at more proximal or distal sites are instead associated with second point mutations. Rare missense mutations in the MCR are not usually associated with FAP but can confer cancer susceptibility. Finally, studies with animal models of germline APC mutation show the existence of other genes that substantially modify the multiple adenoma phenotype.
Further Reading
McKusick VA Adenomatous polyposis of the colon: APC. http://www.ncbi.nlm.nih.gov/htbin-post/Omim.dispmim/175100.

Peifer M and Polakis P (2000) Wnt signaling in oncogenesis and embryogenesis: a look outside the nucleus. Science 287.


See also: Tumor Suppressor Genes 


Adenosine Phosphates 


J Parker 


Copyright © 2001 Academic Press 
doi: 10.1006/rwgn.2001.0009


Adenosine is a nucleoside in which the base adenine is covalently linked to the 1′ carbon of the sugar ribose. In the adenosine phosphates, one, two, or three phosphate groups are also bonded to the ribose, typically linked in series to the 5′ carbon of the sugar. The bond connecting the first phosphate group to the carbon is a phosphoester bond, and the bonds between the phosphate groups are phosphoanhydride bonds. The adenosine phosphates include adenosine 5′-triphosphate (ATP), adenosine 5′-diphosphate (ADP), and adenosine 5′-monophosphate (AMP). Nucleosides to which phosphates are attached are also referred to as nucleotides.


See also: Adenine; ATP (Adenosine 
Triphosphate); cAMP and Cell Signaling; 
Nucleotides and Nucleosides 
Adenoviruses are large (130 nm from fiber knob to fiber knob), nonenveloped, icosahedral viruses with characteristic fibers extending from the vertices (Figure 1) and with genomes of ~36 kb of double-stranded DNA. They infect respiratory, conjunctival, and intestinal epithelia of humans and other vertebrates. In humans they are responsible for the following diseases:
upper respiratory tract infections (URIs), generally in children;
rare, lethal viral pneumonia in newborns, which can spread rapidly through a hospital nursery;
a fatal diarrheal disease in newborns in underdeveloped regions.
The respiratory adenoviruses generally produce an acute URI followed by a subacute infection of the tonsils, from which low levels of virus are shed for months. The name ‘adenovirus’ derives from their frequent recovery from cultured adenoid tonsils. Human adenoviruses are classified according to their serotype (~50), defined by neutralizing antisera, and are named serotype 1, 2, etc., or simply Ad1, Ad2, etc. The closely related respiratory serotypes Ad2 and Ad5 replicate to high titer in cultured human cells (e.g., HeLa cells). They have been significant models for the analysis of animal cell transcription and mRNA processing, virus-host interactions, and oncogenic transformation. Much current research is directed toward the development of adenovirus-based gene transducing vectors for use in gene therapy.


Replication Cycle 


The globular domain at the termini of the fibers (Figure 1) adsorbs to a cell-surface glycoprotein receptor. For Ad2 and Ad5 this receptor is CAR, which is also the receptor for coxsackie B virus. Next, an RGD sequence in a flexible loop of the penton protein at each vertex binds to cellular fibronectin-binding integrins (integrins αvβ5 and αvβ3). This stimulates receptor-mediated endocytosis. As the pH of the resulting endosome drops, virion structural proteins change conformation, releasing the fiber proteins and lysing the endosomal membrane, thus delivering the virion into the cytosol. This process has been exploited experimentally to introduce material into the cytosol of cultured cells. Proteins added to the medium and endocytosed with the virion, and DNA adsorbed to the virion surface, can both be delivered in this way. On entry into the cytosol, a virion-associated protease is activated by the reducing environment, resulting in the release of several virion proteins. The partially disassembled capsid is transported along microtubules to a nuclear pore complex (NPC), where it associates with NPC cytoplasmic filaments. The capsid then ‘uncoats’ as viral DNA associated with the viral histone-like protein pVII is transported through the NPC into the nucleoplasm by a poorly understood mechanism.

Figure 1 Ad2 structure based on computer averaging of cryoelectron micrographs and X-ray crystallography of the fiber globular domain. Scale bar represents 10 nm. Courtesy of Dr Phoebe Stewart.

After entry into the nucleus, a strong enhancer at the left end of the standard genetic map activates transcription of early region 1A (E1A; Figure 2). Two related E1A proteins are expressed from alternatively spliced E1A mRNAs. Four regions of the largest E1A protein are highly conserved among human and simian adenoviruses: conserved regions (CR) 1, 2, and 3, and a hexapeptide at the C-terminus. CR1 and CR2 stimulate host cell entry into S phase. CR3 activates transcription from the early promoters E1B, E2 early, E3, and E4. The C-terminal hexapeptide partially represses transcriptional activation by CR3. CR2 binds to the RB (retinoblastoma) protein family, displacing these repressors from host E2F transcription factors. This consequently induces the expression of genes required for entry into S phase, such as Cdk2, cyclins A and E, and enzymes required for deoxyribonucleotide and DNA synthesis. CR1 binds to the CBP/p300 family of transcriptional coactivators, displacing the PCAF histone acetylase complex. This represses the function of multiple enhancers controlling differentiated cell functions, although it is not understood how this contributes to the G1-S transition. CR3 activates early transcription by binding to two kinds of target. The first is the DNA-binding domains of host cell transcription factors that bind to early viral promoters. The second is the hSur2 subunit of the mammalian mediator complex. The mediator complex in turn interacts with host cell RNA polymerase II and general transcription factors. The C-terminal hexapeptide binds the host protein CtBP, which is homologous to a corepressor that functions in early Drosophila development (dCtBP).

E2 expresses three viral proteins that replicate the 
viral DNA: a DNA polymerase, the preterminal 
protein (pTP) primase that remains covalently linked 
to the 5’ end of viral DNA strands and is cleaved to 
form the terminal protein during virion assembly, 
and a single-stranded DNA-binding protein. The late 
phase of infection is defined by the onset of DNA 
replication ~5-6h postinfection of HeLa cells. The 
pTP primase primes continuous 5’—3’ single-stranded 
DNA synthesis at each end of the viral genome. This 
results in the displacement of a full-length single viral 
DNA strand that forms a panhandle secondary struc- 
ture because of the ~ 100 bp inverted terminal repeat 
(ITR) which forms the viral replication origins at each 
end of the viral genome. Priming of DNA synthesis at 
the panhandle terminus by pTP results in conversion 
of the displaced single-stranded parental DNA into 
double-stranded DNA. The high rate of recombin- 
ation between viral mutants is due to annealing of 
single-stranded genomes generated by viral DNA 
replication, and subsequent repair of mismatches in 
heteroduplexes by host cell repair processes. Host cell 
transcription factors NF-1 and Oct-1 stimulate viral 
DNA replication by binding to the ITR and recruiting 
a pTP-Ad DNA polymerase complex to the genome
terminus. 
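The panhandle depends only on the terminal sequences being inverted repeats: the 3’ end of a displaced single strand is then the reverse complement of its own 5’ end. The Python sketch below checks that property on a toy sequence; the function names and the short made-up repeat are illustrative assumptions, not real adenovirus sequence (real ITRs are ~100 bp).

```python
# Complement table for unambiguous DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def has_panhandle_itrs(strand: str, itr_length: int) -> bool:
    """True if the first and last `itr_length` bases form an inverted repeat.

    When this holds, the 3' end of a displaced single strand is
    complementary to its own 5' end and can anneal into a panhandle.
    """
    if itr_length <= 0 or 2 * itr_length > len(strand):
        return False
    left = strand[:itr_length]
    right = strand[-itr_length:]
    return reverse_complement(right) == left


# Toy "genome": a made-up terminal repeat, an arbitrary interior,
# and the reverse complement of the repeat at the other end.
itr = "ACGTTGCAAGGT"
genome = itr + "GGGTTTAAACCCGGG" + reverse_complement(itr)
print(has_panhandle_itrs(genome, len(itr)))  # True
```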

Two viral proteins expressed from E1B inhibit 
apoptosis that is otherwise induced by E1A CR1 and 
2 in E1B deletion mutant-infected cells. The E1B-19K 
(kilodaltons) protein is a Bcl2 homolog that inhibits 
the release of cytochrome c from mitochondria and 
the activation of caspases. The E1B-55K protein binds
to host cell p53, a transcription factor activated by 
DNA damage that induces G1 arrest or apoptosis.
E1B-55K contains a strong repression domain and 
activates the DNA-binding activity of p53, converting 
it from a regulated activator of cell cycle arrest and 
apoptotic genes into a constitutive repressor of the 
same genes. Like E1B-55K, E4-ORF6 also binds to 
p53 and inhibits its ability to activate transcription. 
E1B-55K and E4-ORF6 also perform a second function
during the late phase of infection: they form a
complex that shuttles in and out of the nucleus and 
stimulates the selective nuclear transport and transla- 
tion of late viral mRNAs. 

E4-ORF4 stimulates the dephosphorylation of 
nuclear RNA-binding SR proteins by protein
phosphatase 2A, altering the activity of splice sites in
the complex viral transcription units that encode mul- 
tiple alternatively processed mRNAs during the late 
phase of infection. E4-ORF4 also induces apoptosis 
by a p53-independent mechanism at the end of the 
~36 h infection cycle, a process that may be important
for the release of progeny virions. Dimers of E4-ORF6/7
bind to host cell E2F transcription factors,
stimulating their cooperative binding to two inverted 
E2F sites in the E2 promoter. 

Proteins expressed from E3 and the VAI (Virus 
Associated I) RNA counter host defenses against viral 
infection. VAI RNA is abundantly transcribed by RNA polymerase
III and transported to the cytoplasm. It binds
to the double-stranded RNA-binding site on the PKR 
protein kinase, inhibiting its activity. In the absence of 
VAI RNA, PKR is activated during the late phase 
of infection, inhibiting protein synthesis by phospho- 
rylating translation initiation factor eIF2. E3-19K 
protein binds to class I histocompatibility antigens (HLA)
in the endoplasmic reticulum, preventing their
transport to the plasma membrane, and therefore their
ability to present viral peptides to T-cell receptors
on cytotoxic T lymphocytes. Three E3 proteins prevent
the apoptosis of infected cells that is induced by tumor
necrosis factor (TNF) secreted from activated
macrophages and cytotoxic T cells. E3-14.7K inhibits
signaling by the TNF receptor. E3-10.4K and E3-14.5K
form a transmembrane complex that causes endocytosis
and degradation in lysosomes of Fas, a death receptor
of the TNF receptor family.

With the onset of viral DNA replication, a viral
transcription factor expressed from the IVa2 promoter
binds to three sites in the first intron of the
major late transcription unit, greatly stimulating
transcription from the major late promoter (MLP). DNA
replication is also required for activation of the MLP, 
perhaps by displacing the histone-like pVII protein. 
The MLP has a TATA box with high affinity for the 
TATA-box-binding protein subunit of transcription 
factor TFIID, and consequently is a strong promoter 
for in vitro transcription using a nuclear extract from 
HeLa cells. The MLP was used as the template for 
transcription assays used in the purification and iden- 
tification of the general transcription factors for RNA 
polymerase II: TFIIA, B, D, E, F, and H. Approxi- 
mately 18 late mRNAs encoding virion structural 
proteins, a 100 kDa nonvirion protein required for 
virion assembly, and a virion-associated protease 
required for virion maturation and virion uncoating 
are processed from transcripts from the MLP by alter- 
native polyadenylation at one of five possible sites 
(Figure 2) followed by alternative RNA splicing. 
RNA splicing was discovered through the electron 
microscopic examination of heteroduplexes between 
late viral mRNAs and viral DNA, which revealed
introns as looped-out regions of single-stranded
DNA. Adenovirus introns and splice sites have been
important substrates in in vitro experiments that have
characterized the mechanism of RNA splicing.

Figure 2 Ad2 transcription map. Transcription units are designated by horizontal arrows above and below the double line representing the 36 kb viral genome, with arrowheads indicating the direction of transcription. Vertical arrows indicate sites of polyadenylation for each of the five families of late mRNAs processed from the major late promoter (MLP) transcription unit (L1-L5) and the E2 early and late transcription units (E2A, E2B).

Late in infection (after the onset of DNA replica- 
tion), host cell and early viral protein synthesis is 
inhibited due to the dephosphorylation and consequent
inactivation of the cap-binding translation
initiation factor eIF4E. Translation of mRNAs transcribed
from the MLP is resistant to this inhibition
because of their common ~200-base 5’ untranslated
region, the ‘tripartite leader’ produced from the
splicing of three short exons. The tripartite leader 
allows translation initiation by hypophosphorylated 
eIF4E, and stimulates a high rate of translation by 
a ‘ribosome shunting’ mechanism that transfers the 
40S initiation complex from the 5’ end of the leader 
to an AUG within ~30 bases of the 3’ end of the 
leader, without scanning through the intervening 
RNA. Host cell mRNA nuclear-cytoplasmic tran- 
sport is also inhibited at very late times in infection 
by a poorly understood mechanism that requires 
the E1B-55K-E4-ORF6 complex and viral DNA rep- 
lication. Progeny virions assemble in the nucleus 
forming crystalline arrays of ~10° virions per HeLa 
cell and are released by disintegration of the killed 
host cell. 


Oncogenic Transformation 


At low frequency, Ad2, 5, and 12 transform cultured 
rodent fibroblasts to a noncontact inhibited pheno- 
type that forms foci of transformed clones readily 
recognized on a monolayer of nontransformed cells. 
Ad12 causes sarcomas in rats and hamsters at the site 
of injection. Human adenovirus DNA replication and
late gene expression are attenuated in rodent cells, which
are not killed by infection. Transformation results
from the rare integration of viral DNA into a random 
site in a cellular chromosome, probably as a conse- 
quence of host cell DNA repair processes. Transform- 
ation requires the continued expression of E1A CR1 
and 2, which drives cell cycling, and either E1B-19K, 
E1B-55K, or E4-ORF6 to prevent the apoptosis that is 
otherwise induced by E1A. Tumorigenicity of Ad12
is due to inhibition of cell killing by cytotoxic T cells 
as a result of decreased expression of class I major 
histocompatibility complex (MHC) proteins, and 
resistance to killing by natural killer (NK) cells, both 
a consequence of Ad12 E1A functions. Cultured 
human cells are largely resistant to transformation by 
transfected E1A and E1B, and extensive analysis has 
failed to find human adenovirus DNA associated with 
human tumors. 


Adenovirus Transducing (Gene Therapy) Vectors


Adenovirus recombinants have been constructed that 
package engineered gene expression cassettes into ad- 
enovirus virions. Most methods for preparing adeno- 
viral recombinants depend on 293 cells, a line of 
human embryonic kidney cells transformed by trans- 
fection of sheared Ad5 DNA. 293 cells contain the left 
~4 kb of the Ad5 genome integrated into a cell chromosome
and express the Ad5 E1A and E1B proteins.
Consequently, they complement the replication of 
Ad2 and 5 recombinants in which nonviral DNA is 
substituted for E1A and E1B. Large numbers of such 
recombinants can be propagated in 293 cells and 
infected into cells that express the adenovirus CAR 
and o,B; integrin receptors. Less efficient infection 
can be achieved in cells lacking the CAR receptor 
and expressing other forms of integrins. 
If the DNA substituted for E1A and E1B contains
an expression cassette including a promoter, the gene 
of interest, and RNA processing signals, the gene of 
interest is expressed in the transduced cells. Since E1A 
CR3 is required to stimulate high levels of transcrip- 
tion from early viral promoters, the productive infec- 
tion is greatly delayed and attenuated. In animal 
models, E1A-E1B substitution vectors (often called 
‘first generation’ vectors) have expressed transduced 
genes at high level for periods on the order of 1-2 
weeks. Infection of hepatocytes is particularly efficient
following intravenous injection of recombinant
adenovirus because of high levels of the viral receptors 
on hepatocytes and the slow percolation of blood 
past hepatocytes during hepatic circulation. Virtually 
100% of hepatocytes have been transiently transduced 
in the mouse and rat. Transient expression of trans- 
genes results from loss of viral DNA, especially in 
replicating cells, and from cytotoxic T-cell elimination 
of transduced cells that express low levels of the highly 
immunogenic virion proteins. 

Successively more defective recombinants have 
been constructed in order to minimize the expression 
of viral proteins and the resulting induction of an 
immune response against the transduced cells. These 
defective substitution mutants are often propagated in 
engineered host cells that express multiple viral early 
proteins. A recent promising approach is the use of a 
helper virus for propagation of recombinants containing
only the ~100 bp ITRs that function as DNA
replication origins and a nonprotein-coding packaging 
signal region of ~350 bp from the left end of the Ad5 
genome. Such highly substituted recombinants are 
commonly known as ‘gutless’ adenovirus vectors. 
Helper viruses have been engineered in which the 
viral DNA packaging signal is flanked by directly repeated
loxP sites, recognized by the Cre site-specific recombinase
of bacteriophage P1. When introduced into 293
cells that express a high level of Cre, the packaging 
signal is removed from the helper genome, which then 
replicates and expresses viral proteins at high levels, 
but is not incorporated into virions efficiently. Gutless 
vector introduced into the same cells replicates from 
the ITRs and is packaged into progeny virions because 
of the packaging signal near one end of the recombin- 
ant DNA. Recently, such gutless vectors have been 
reported to express transduced genes for more than 
1 year in mice. 
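Cre-mediated removal of the helper packaging signal can be pictured as a string operation: recombination between two directly repeated loxP sites deletes the intervening DNA and leaves a single loxP site behind. This Python sketch assumes the standard 34 bp loxP sequence and uses hypothetical placeholder labels for the helper genome; it models only the net product of excision, not the recombinase mechanism.

```python
# 34 bp loxP: 13 bp inverted repeat, 8 bp asymmetric spacer, 13 bp inverted repeat.
LOXP = "ATAACTTCGTATA" + "GCATACAT" + "TATACGAAGTTAT"


def cre_excise(dna: str, site: str = LOXP) -> str:
    """Return `dna` after excision between the first two directly repeated sites.

    The first site and everything up to the second site are removed, so a
    single site remains. With fewer than two sites, `dna` is unchanged.
    """
    first = dna.find(site)
    if first == -1:
        return dna
    second = dna.find(site, first + len(site))
    if second == -1:
        return dna
    return dna[:first] + dna[second:]


# Hypothetical helper genome layout with a loxP-flanked packaging signal.
helper = "ITR_L" + LOXP + "PACKAGING_SIGNAL" + LOXP + "VIRAL_GENES" + "ITR_R"
excised = cre_excise(helper)
print("PACKAGING_SIGNAL" in excised)  # False
print(excised.count(LOXP))            # 1
```

The excised helper still carries its ITRs and viral genes, so it can replicate and supply proteins in trans, but it has lost the signal needed for efficient packaging.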


Further Reading 

Shenk T (1996) Adenoviridae: the viruses and their replication. 
In: Fields BN, Knipe DM and Howley PM (eds) Fundamental 
Virology, 3rd edn, pp. 979-1016. Philadelphia, PA: Raven 
Publishers. 


See also: Cre/lox — Transgenics; Oncogenes 
Chromosome segregation at anaphase I of meiosis
in translocation heterozygotes may yield genetically 
normal, balanced, and unbalanced products in differ- 
ent proportions, depending on the orientation of the 
chromosomes in the translocation quadrivalent and 
the positioning of chiasmata (meiotic crossovers). 
Alternate disjunction is the co-segregation to the 
same spindle pole of alternate nonhomologous cen- 
tromeres around the quadrivalent (Figure 1, AB’ and 
A'B). Adjacent I disjunction is the co-segregation of 
adjacent nonhomologous centromeres (Figure 1, AB 
and A'B’), while Adjacent II disjunction is the co- 
segregation of adjacent homologous centromeres 
(Figure 1, AA’ and BB’).

Chiasma formation within the four segments distal 
to the centromere (the pairing segments) leads to a 
ring quadrivalent at metaphase I. As illustrated in 
Figure 1A, Alternate disjunction then gives rise to
50% normal and 50% balanced gametes. On the other
hand, both Adjacent I and II disjunction lead to
unbalanced gametes only. It is important to note that
chiasma formation (single or odd number) within the
interstitial segments (between the centromere and the
breakpoint) drastically changes the gametic
output. In this situation (Figure 1B) both Alternate
and Adjacent I disjunction give rise to 25% normal,
25% balanced, and 50% unbalanced gametes. Chromosome
analysis of gametes therefore cannot provide
information on patterns of meiotic segregation with
respect to Alternate/Adjacent I disjunction.

Figure 1 Alternate/Adjacent I segregation and gametic output of translocation quadrivalent: (A) Chiasma formation within the pairing segments only; (B) Additional chiasma formation within one interstitial segment. Note difference in metaphase II chromosome morphology. Equal dyads are produced when there is no chiasma formation in the interstitial segments of the quadrivalent (A). On the other hand, unequal dyads are produced with interstitial chiasma formation (B). Note also difference in gametic output. Alternate disjunction in absence of chiasma formation in the interstitial segments gives rise to 50% normal and 50% balanced gametes (A). On the other hand, Alternate disjunction, when there is chiasma formation within an interstitial segment, leads to 25% normal, 25% balanced, and 50% unbalanced gametes. Similarly, Adjacent I disjunction in absence of interstitial chiasma formation gives rise to unbalanced gametes only, while the presence of interstitial chiasma formation leads to 25% normal, 25% balanced, and 50% unbalanced gametes. (Reproduced from Armstrong SJ and Hultén MA (1998) Meiotic segregation analysis by FISH investigations in sperm and spermatocytes of translocation heterozygotes. European Journal of Human Genetics 6: 430-431.)
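The gametic proportions for each disjunction mode can be reproduced with a toy chromatid model. This Python sketch is an illustration under stated assumptions, not the method of the cited study: each chromatid is reduced to a (centromere, distal segment) pair, a single interstitial chiasma swaps the distal segment between one chromatid of the normal and one of the translocated chromosome sharing centromere c1, pairing-segment chiasmata are ignored because they exchange identical material here, and sister chromatids separate at random at anaphase II. The labels N1, N2, T1, T2 are hypothetical.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Normal chromosomes: (c1, d1) and (c2, d2); the reciprocal translocation
# swaps distal segments, giving (c1, d2) and (c2, d1).
NORMAL_SET = {("c1", "d1"), ("c2", "d2")}
BALANCED_SET = {("c1", "d2"), ("c2", "d1")}

# Chromosomes co-segregating to each pole at anaphase I.
MODES = {
    "alternate": (("N1", "N2"), ("T1", "T2")),
    "adjacent I": (("N1", "T2"), ("T1", "N2")),
    "adjacent II": (("N1", "T1"), ("N2", "T2")),
}


def quadrivalent(interstitial_chiasma: bool) -> dict:
    """Sister chromatids of the four chromosomes at the end of prophase I."""
    n1 = [("c1", "d1"), ("c1", "d1")]
    t1 = [("c1", "d2"), ("c1", "d2")]
    if interstitial_chiasma:
        # Crossover between centromere and breakpoint: swap distal segments
        # between one chromatid of N1 and one of T1.
        n1[1], t1[1] = t1[1], n1[1]
    return {
        "N1": n1,
        "T1": t1,
        "N2": [("c2", "d2"), ("c2", "d2")],
        "T2": [("c2", "d1"), ("c2", "d1")],
    }


def classify(gamete: tuple) -> str:
    chromatids = set(gamete)
    if chromatids == NORMAL_SET:
        return "normal"
    if chromatids == BALANCED_SET:
        return "balanced"
    return "unbalanced"


def gametic_output(mode: str, interstitial_chiasma: bool) -> dict:
    """Expected gamete proportions for one disjunction mode."""
    chromosomes = quadrivalent(interstitial_chiasma)
    counts = Counter()
    for first, second in MODES[mode]:
        # Each gamete from this pole gets one chromatid of each chromosome.
        for pair in product(chromosomes[first], chromosomes[second]):
            counts[classify(pair)] += 1
    total = sum(counts.values())
    return {kind: Fraction(n, total) for kind, n in counts.items()}


for mode in ("alternate", "adjacent I"):
    for chiasma in (False, True):
        label = "interstitial chiasma" if chiasma else "no interstitial chiasma"
        print(mode, label, gametic_output(mode, chiasma))
```

With an interstitial chiasma, Alternate and Adjacent I give the same 1:1:2 output, which is why gamete karyotypes alone cannot distinguish the two modes.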


Further Reading 

Armstrong SJ and Hultén MA (1998) Meiotic segregation analy- 
sis by FISH investigations in sperm and spermatocytes of 
translocation heterozygotes. European Journal of Human 
Genetics 6: 430-431. 

Armstrong SJ, Goldman SH and Hultén MA (2000) Meiotic 
studies of a human male carrier of the common transloca- 
tion, t(11;22), suggests postzygotic selection rather than pre- 
ferential 3:1 MI segregation as the cause of liveborn offspring 
with an unbalanced translocation. American Journal of Human 
Genetics 67: 601-609.

Rickards GK (1983) Orientation behavior of chromosome multi- 
ples of interchange (reciprocal translocation) heterozygotes. 
Annual Review of Genetics 17: 443-498.


See also: Balanced Translocation; Translocation 
Advanced Intercross Lines
Mapping genes in experimental species requires the
generation of appropriate crosses. Several sexually re- 
producing species can be manipulated to create popu- 
lations of a desired architecture. Such populations are 
designed so that genes of interest segregate alongside 
genetic markers that can be genotyped. As the distance 
between a marker and a gene increases, recombination 
will tend to break the correlation between the two. 
Therefore, only markers in the vicinity of the gene will 
remain in correlation with the gene, allowing its localization
to a specific region. The more recombination has
occurred, the smaller and more accurately defined
that region becomes. Advanced intercross lines (AILs) are designed
to increase the actual recombination rate, which 
allows higher mapping accuracy. AILs are of particu- 
lar interest for mapping quantitative trait loci (QTL) 
that generally suffer from low mapping accuracy due 
to their initial low genotype-phenotype correlation. 
The higher rate of recombination is achieved by a series 
of random intercrosses in subsequent generations. 


Creating the Cross 


An AIL is produced from an F2 population generated
by crossing two inbred lines assumed to be homo- 
zygous for alternative alleles at a series of QTL and 
marker loci. The following generations, F3, F4, F5, ...,
Ft, are sequentially produced by random intercrossing
among individuals of the previous generation.
Individuals from one of the later generations are 
phenotyped and genotyped for QTL mapping pur- 
poses; the previous generations are reared and re- 
produced only. Although random intercrossing is 
adequate, one can improve the strategy by a semiran- 
dom procedure where inbreeding is avoided by select- 
ing genetically unrelated mates as much as possible. 
Figure 1 illustrates the creation of an AIL from two
inbred lines and the increase in recombination as the 
number of generations increases. 
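The rise in recombinant haplotypes over successive intercross generations can be illustrated with a small Monte Carlo simulation. This Python sketch assumes a large random-mating population founded from F1 (0,0)/(1,1) heterozygotes, a per-meiosis crossover probability r between two loci, and no selection or mutation; the function and parameter names are hypothetical.

```python
import random


def meiosis(individual, r, rng):
    """Return one gamete (two-locus haplotype) from a diploid individual."""
    hap_a, hap_b = individual
    if rng.random() < 0.5:  # pick which parental haplotype supplies locus 1
        hap_a, hap_b = hap_b, hap_a
    if rng.random() < r:  # crossover between the two loci
        return (hap_a[0], hap_b[1])
    return hap_a


def simulate_ail(r, generations, pop_size=20000, seed=1):
    """Fraction of recombinant haplotypes carried by generation F_t (t = generations)."""
    rng = random.Random(seed)
    population = [((0, 0), (1, 1))] * pop_size  # F1 from two inbred lines
    for _ in range(generations - 1):  # F1 -> F2 -> ... -> F_t
        population = [
            (meiosis(rng.choice(population), r, rng),
             meiosis(rng.choice(population), r, rng))
            for _ in range(pop_size)
        ]
    haplotypes = [hap for individual in population for hap in individual]
    recombinant = sum(1 for hap in haplotypes if hap[0] != hap[1])
    return recombinant / len(haplotypes)


for t in (2, 4, 8):
    print(f"F{t}: recombinant fraction ~ {simulate_ail(0.05, t):.3f}")
```

Parents are drawn with replacement, so an individual can occasionally be mated with itself; with thousands of individuals this has a negligible effect, and the semirandom scheme described above would avoid it.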


Increasing Mapping Accuracy 


The increase in mapping accuracy is attributable to 
the increase in recombination rate at the advanced 
generation. At the advanced generation the actual 
proportion of recombinant haplotypes is increased 
since at each generation there is a new chance for a 
recombination event between any two loci.

Figure 1 Generating advanced intercross lines (AILs).

If the recombination distance between two loci in the F2
generation is represented by r, then the proportion of
recombinant haplotypes, r_t, in the tth generation will
be on average:

$$r_t = \frac{1 - (1 - r)^{t-2}(1 - 2r)}{2} \qquad (1)$$

For small values of r, equation (1) can be approximated
using a first-order Taylor expansion, giving:

$$r_t \approx \frac{rt}{2} \qquad (2)$$
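Equations (1) and (2) are easy to compare numerically. The Python sketch below (hypothetical function names) evaluates both for a tightly linked pair of loci.

```python
def recombinant_fraction(r: float, t: int) -> float:
    """Exact expected recombinant fraction in F_t, equation (1); r is the F2 value."""
    return (1 - (1 - r) ** (t - 2) * (1 - 2 * r)) / 2


def recombinant_fraction_approx(r: float, t: int) -> float:
    """First-order approximation, equation (2): r_t ~ r * t / 2 for small r."""
    return r * t / 2


for t in (2, 6, 10, 20):
    exact = recombinant_fraction(0.01, t)
    approx = recombinant_fraction_approx(0.01, t)
    print(f"F{t}: exact={exact:.4f} approx={approx:.4f}")
```

At t = 2 both expressions reduce to r. As rt grows, the exact value falls increasingly below the linear approximation, because the recombinant fraction can never exceed 0.5.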


The effect of increased recombination on mapping 
accuracy follows directly. If the one-sided confidence 
interval in a particular situation is C (in proportion of 
recombination units) for an F2, then, at an advanced
generation, the one-sided confidence interval will be
represented by the same proportion of recombination
but it will now correspond to a smaller region, with the
ratio as given in equations (1) or (2). Since in this case
we need to represent r as a function of r_t, equation (2)
is more convenient. Denoting by C_t the one-sided
confidence interval length in proportion of recombination
units at the tth generation, one obtains:


$$C_t = \frac{C}{t/2} \qquad (3)$$


That is, with advancing generations, the confidence 
interval is reduced by a factor of t/2, where t is the 
number of generations. To translate this into centi- 
morgans (cM) one needs to apply a mapping function. 
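A short Python sketch of equation (3), followed by conversion to centimorgans. Using Haldane's map function (which assumes no crossover interference) is an assumption made here for illustration; the text does not specify which mapping function to apply, and the 20% F2 confidence interval is a made-up value.

```python
import math


def ci_at_generation(c_f2: float, t: int) -> float:
    """Equation (3): the F2 confidence interval C shrinks by a factor t/2 at F_t."""
    return c_f2 / (t / 2)


def haldane_cm(r: float) -> float:
    """Haldane map function, recombination fraction -> centimorgans."""
    if not 0 <= r < 0.5:
        raise ValueError("recombination fraction must be in [0, 0.5)")
    return -50.0 * math.log(1 - 2 * r)


c_f2 = 0.20  # hypothetical one-sided CI in an F2, as a recombination fraction
c_f10 = ci_at_generation(c_f2, 10)
print(f"F2:  r={c_f2:.2f}  ~{haldane_cm(c_f2):.1f} cM")
print(f"F10: r={c_f10:.2f}  ~{haldane_cm(c_f10):.1f} cM")
```

With these numbers the interval shrinks from about 25.5 cM to about 4.2 cM, a factor somewhat larger than t/2 = 5 because the map function is nonlinear.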

AILs are of interest in species with a short gener- 
ation cycle that can be easily reproduced by intercross- 
ing and for which inbred lines exist. Hence, mice are 
an appropriate species for this design. Still, the time 
and effort required to create an AIL is significant. It 
should, therefore, be created as a resource that enables 
the analysis of multiple QTL and multiple traits in one 
AIL. In mice where distinct inbred strains exist, an 
AIL may serve as the most efficient means to accurately
map a large number of QTL using a single population
resource.


See also: Gene Mapping; Inbred Strain; 
QTL (Quantitative Trait Locus) 
Aflatoxin B1 (AFB1) is a potent toxin, mutagen, and
carcinogen, and is implicated in the etiology of hep- 
atocarcinoma. Although the liver is the major site of 
injury, AFB1-induced tumors have been experimentally
produced in the lungs, kidneys, and colons of
rodents. Aflatoxins are produced by certain strains of 
the fungi Aspergillus flavus and A. parasiticus that 
infect grain. Warm temperatures, high humidity, and 
plant injuries, in the field and during storage, promote 
both the growth of the fungi and aflatoxin production. 
The greatest threat to public health is from contamin- 
ated peanuts, cottonseed, maize, and rice. Aflatoxins 
comprise a family of related compounds, each of which 
is designated B or G depending on whether it fluor- 
esces blue or green when exposed to ultraviolet light. 
AFB1, which is produced by A. flavus (Figure 1A), is
the most dangerous. 

AFB1 is relatively nonreactive until it is ingested
and converted by liver enzymes to the reactive
intermediate AFB1 8,9-oxide (Figure 1B). This metabolite
rapidly reacts with DNA and forms adducts to
the N-7 position of guanine (Figure 1C). Adducted 
guanines can undergo two further reactions: (1) the 
imidazole ring of the guanine can open, yielding the 
formamidopyrimidine derivative (AFB1-FAPY); or
(2) the glycosidic bond between the guanine and the 
sugar can break, resulting in loss of the adducted base 
from the DNA. Of these three DNA lesions it is 
probably the N-7 adduct that is responsible for 
AFB1’s mutagenicity and carcinogenicity.

AFB1 exposure produces point mutations, chromosomal
aberrations, chromosomal breaks, and other
types of genetic damage. AFB1 was one of the first
human carcinogens whose mutagenicity was
demonstrated with the ‘Ames strains’ (McCann et al.,
1975), a set of bacterial strains that have become
part of the battery of short-term tests for genotoxic
agents. AFB1 was also one of the first human
carcinogens whose mutational spectrum was determined in
vivo, again using a simple bacterium (Foster et al.,
1983). AFB1 preferentially induces G to T and,
secondarily, G to A mutations in all organisms tested,
indicating that the mechanism by which it produces
mutations does not differ among species. The adduction
pattern of AFB1 is also nonrandom: it preferentially
binds to guanines that are surrounded by other
guanines.

Mutations in the tumor suppressor gene p53 occur 
in many cancers and in most hepatocarcinomas. More 
than 50% of the hepatocarcinomas from patients 
living in areas with high dietary aflatoxin intake have 
one specific mutation in p53: a GC to TA mutation at
the third position of codon 249 (which changes AGG
to AGT, so a serine replaces the normal arginine in
the protein) (Bressac et al., 1991; Hsu et al., 1991).
While this mutation reflects the mutagenic action of
aflatoxin, it is also possible that the mutation confers
some particular advantage to liver cells. This remarkable
mutational specificity suggests that the presence
of a G to T mutation at codon 249 could be used as a
biomarker for human aflatoxin exposure.
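The codon change described above can be checked with a few lines of Python. The sketch assumes a two-entry excerpt of the standard genetic code and a hypothetical helper `substitute`; it also labels the G to T change as a transversion (purine to pyrimidine), in contrast to the G to A transition that AFB1 induces less often.

```python
# Excerpt of the standard genetic code (only the codons used here).
CODON_TABLE = {"AGG": "Arg", "AGT": "Ser"}
PURINES = {"A", "G"}


def substitute(codon: str, position: int, new_base: str) -> str:
    """Replace the base at 1-based `position` in `codon`."""
    i = position - 1
    return codon[:i] + new_base + codon[i + 1:]


def is_transversion(ref: str, alt: str) -> bool:
    """Purine <-> pyrimidine changes are transversions; the rest are transitions."""
    return (ref in PURINES) != (alt in PURINES)


wild_type = "AGG"
mutant = substitute(wild_type, 3, "T")  # G to T at the third codon position
print(wild_type, CODON_TABLE[wild_type], "->", mutant, CODON_TABLE[mutant])
# prints: AGG Arg -> AGT Ser
print("G->T transversion:", is_transversion("G", "T"))  # True
print("G->A transversion:", is_transversion("G", "A"))  # False
```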
Figure 1 Structures of aflatoxins. (A) Aflatoxin B1 (AFB1); (B) AFB1 8,9-oxide; (C) AFB1 N7-guanine.


Further Reading 
ins. Mutation Research 424: 167-181.
Aging is the sum of multiple processes that increase
the probability of death with increasing chronological 
age of an organism. Age-dependent mortality rates 
increase with age in most organisms that have a dis- 
tinct soma/germline division or exhibit asymmetric 
cell division. The rate of increase in age-dependent 
mortality is strongly influenced by genes leading to 
distinctive species-specific lifespans. 


Evolutionary Origins of Aging 


Extrinsic hazards such as disease and predation 
make indefinite survival of an animal unlikely. Hence 
natural populations exhibit age-structure in which
young organisms outnumber old organisms. Such an 
age-structure results in the decline in the force of 
natural selection with age. The effect of genes on 
aging is therefore not due to direct selection on aging 
characters. Rather, aging is a nonadaptive process in 
which the genes that influence the rate of aging either 
do not affect fitness or have been selected due to 
beneficial effects early in life. One prediction of this 
evolutionary theory of aging is that aging is influenced by
many genes and is caused by a range of distinct 
physiological and molecular processes. In keeping 
with this view, genetic variants of Drosophila or 
Caenorhabditis that exhibit extended lifespan usually 
exhibit phenotypic trade-offs such as reduced or 
delayed fertility. 
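The first step of this argument, that extrinsic hazards alone produce an age structure in which young individuals outnumber old ones, can be sketched numerically. The Python example below assumes a hypothetical constant extrinsic death rate per year and no intrinsic aging at all.

```python
def survivorship(extrinsic_mortality: float, max_age: int) -> list:
    """Fraction of a cohort still alive at each age 0..max_age.

    Assumes the same extrinsic death probability every year and
    no intrinsic (age-dependent) increase in mortality.
    """
    return [(1 - extrinsic_mortality) ** age for age in range(max_age + 1)]


l_x = survivorship(0.3, 10)  # 30% annual extrinsic mortality (hypothetical)
print([round(x, 3) for x in l_x])
```

Even without any aging, fewer than 3% of the cohort reach age 10 at this assumed rate, so an allele whose effects appear only at late ages influences the reproduction of few carriers. This is the sense in which the force of natural selection declines with age.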


Human Aging 


A genetic underpinning of aging in humans is revealed 
by rare genetic premature aging conditions and herit- 
ability estimates of longevity in normal populations. 
Hutchinson-Gilford syndrome is a rare autosomal
dominant condition of childhood characterized by 
balding, skin wrinkling, subcutaneous fat loss, and 
intense atherosclerosis resulting in cardiovascular- 
associated death. There is no acceleration in brain 
aging, illustrating that premature aging is segmental,
with many tissue types remaining unaffected. Werner 
syndrome (WS) is an adult progeria and is character- 
ized by skeletal muscle atrophy, premature greying of 
hair, heart valve calcification, intense atherosclerosis, 
and hypogonadism. WS is caused by mutation of the 
WRN gene that encodes a member of the RecQ family 
of DNA helicases. 

Genetic factors account for 25% of variance in life- 
span in human populations but the contributing loci 
are unknown. Some alleles associated with longevity 
have been identified from gene association studies. 
These include the ε2 variant of the APOE gene, which
is overrepresented in centenarian populations but is
also associated with type III and IV hyperlipidemia.

Cell culture studies reveal the characteristics of 
human aging. Human cells cultured in vitro undergo
a limited number of cell divisions (replicative senes- 
cence). The nondividing cells display altered charac- 
teristics including the secretion of matrix-degrading 
extracellular metalloproteinases. A limited number of 
such senescent cells also occur in the tissues of aged 
humans and may contribute to age-related changes and 
functional deficits. Replicative senescence results from the
shortening of chromosomal telomeres during replication.
Replicative senescence can be prevented in vitro by 
restoration of the telomere-synthesizing enzyme, 
telomerase. 
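The link between telomere shortening and a limited number of divisions can be expressed as a toy model. In the Python sketch below, all lengths and loss rates are hypothetical, and telomerase is modeled crudely as fully restoring the length lost at each division.

```python
from typing import Optional


def divisions_until_senescence(
    initial_length_bp: int,
    loss_per_division_bp: int,
    critical_length_bp: int,
    telomerase: bool = False,
    max_divisions: int = 10_000,
) -> Optional[int]:
    """Count divisions until telomeres fall below a critical length.

    Returns None if the critical length is never reached within
    `max_divisions` (e.g. when telomerase restores lost sequence).
    """
    length = initial_length_bp
    for division in range(1, max_divisions + 1):
        length -= loss_per_division_bp  # end-replication loss
        if telomerase:
            length += loss_per_division_bp  # telomerase re-extends the ends
        if length < critical_length_bp:
            return division
    return None


print(divisions_until_senescence(10_000, 100, 5_000))
print(divisions_until_senescence(10_000, 100, 5_000, telomerase=True))
```

With these made-up values the cell stops at division 51 without telomerase and never reaches the limit with it, mirroring the in vitro observation that restoring telomerase prevents replicative senescence.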


Aging in Model Genetic Systems 


Most knowledge of aging mechanisms is based on 
genetic model systems. Saccharomyces cerevisiae 
divides by asymmetric budding, which produces a
mother cell and a smaller daughter cell. The mother 
cell continues to divide but only a finite number of 
times. The number of divisions defines the yeast cell’s
lifespan. After the last daughter cell buds off, the yeast 
mother cell granulates and death occurs. Mutation 
of the yeast homolog of the human WRN helicase 
gene, called SGS1, significantly shortens lifespan. 
A histone deacetylase-encoding gene (SIR2) and a
homolog of the mammalian RAS proto-oncogene 
(RAS2) are examples of genes controlling yeast life- 
span and may provide links between nutritional sens- 
ing, gene expression, and cell division. 

In invertebrate models, multiple genetic pathways 
determine lifespan. The adult lifespan of the nematode 
roundworm Caenorhabditis elegans is determined 
by multiple endo- or paracrine signals including an 
insulin-like signaling pathway that responds to nutri- 
tion and pheromone sensory signals, and to signals 
emerging from the somatic gonad and the germline. 
This insulin-signaling pathway also functions during 
development to regulate formation of a nonreproduc- 
ing diapause life cycle stage (the dauer larvae) that 
forms in response to adverse conditions such as food 
deprivation. Conditional dauer-formation mutants 
(daf) allow for studies on the role of this pathway in 
adult worms. Mutant adults can exhibit altered body 
size and fertility. However, some of these mutations 
(Age mutations) increase adult lifespan. For example, 
mutations in the daf-2 gene, encoding an insulin 
receptor-like protein, confer a 100% increase in lifespan.
Worms carrying additional daf mutations have
lifespan extensions of 300%. Such mutations also con- 
fer resistance to environmental stresses such as heat 
and ultraviolet radiation. Whilst mutations of this 
pathway are highly pleiotropic and are likely to affect 
fitness under natural conditions, lifespan does not 
appear to correlate strongly with fertility or body size 
under laboratory conditions. Other classes of Age 
mutations are also highly pleiotropic. For example, 
clock (clk) mutations confer an extended lifespan but 
also deregulate a series of timed events such as devel- 
opment, cell cycle, and reproductive schedule. 

Single genes also have large effects on insect life- 
span. Mutation of the Methuselah gene in Drosophila 
melanogaster causes a 35% increase in lifespan. 
Methuselah encodes a guanosine triphosphate-binding 
protein-coupled seven-transmembrane domain recep- 
tor and therefore may regulate a number of intracellu- 
lar processes including gene expression. The mutant 
flies are also resistant to environmental stressors. 


A Mechanism for Aging 


The molecular mechanisms of aging are unknown. 
However, correlated phenotypes of longevity are con- 
sistent with the oxygen radical theory of aging in 
which endogenously produced reactive oxygen spe- 
cies (ROS) cause the accumulation of damaged macro- 
molecules that compromise the organism. C. elegans 
Age mutants are resistant to oxidative stress and over- 
express some genes encoding antioxidant enzymes 
such as superoxide dismutase (SOD) and catalase. 
D. melanogaster strains engineered to maintain addi- 
tional copies of genes encoding these enzymes exhibit 
an extended lifespan. Additionally, a mouse strain 
with a mutated p66Shc gene has a 30% extension of
life, and is resistant to oxidative stress. 

The genes that determine aging rates are pleiotropic 
and there is a strong correlation between extended 
lifespan, altered life history, lowered fitness, and 
increased stress resistance. The type of genes that 
modulate aging rate indicates that aging is influenced 
by endocrine signals, intracellular signaling pathways, 
metabolic regulators, and stress response factors in- 
cluding antioxidant enzymes. 


See also: Cancer Susceptibility; Cell Cycle; 
Pleiotropy; Ras Gene Family 
The word ‘agouti’ takes its name from a native South
American language, where it refers to the rodent Dasy- 
procta leporina, also known as Dasyprocta aguti or the 
golden agouti. Within genetics, however, the term 
describes a coat color gene responsible for a character- 
istic phenotype in laboratory mice and other mammals, 
including Dasyprocta species, in which most hairs on
the body have a subapical band of red or yellow pig- 
ment on an otherwise black or brown background. 

The Agouti gene encodes a paracrine signaling 
molecule, produced by a specialized group of cells at 
the base of hair follicles, that causes overlying mela- 
nocytes to switch from the synthesis of black/brown 
eumelanin to red/yellow pheomelanin. Under some 
circumstances, Agouti protein can also cause a switch 
from eumelanin synthesis to the synthesis of no pig- 
ment, causing a pattern of black-white-black rather 
than black-yellow-black. 

Variation in regulation of Agouti expression is 
responsible for a diverse set of coat color patterns in 
different mammals. In the wild-type configuration,
the gene is controlled by two promoters. One promo- 
ter drives expression in the early portion of the hair 
growth cycle, is responsible for the banding pattern 
described above, and variation in its timing causes 
changes in the width or position of the yellow band. 
A second promoter drives expression throughout the 
entire hair growth cycle, but only in hair follicles on 
the ventral surface of the body, and is therefore 
responsible for the yellow or white ventral coloration 
characteristic of many different species. 

Interest in Agouti gene action stems not only from 
natural coat color polymorphisms and the underlying 
cellular and developmental processes, but also from 
the phenotype of an unusual Agouti allele, lethal 
yellow (A^y), initially described in laboratory mice
in 1903. A^y was the first recognized recessive lethal
mutation in a metazoan organism and was the subject
of many studies directed at understanding the gen- 
etic control of early embryonic development. In the 
heterozygous state, A” is dominant to other Agouti 
alleles, causing a completely yellow coat, and, in addi- 
tion, nonpigmentary effects including hyperphagia, 
obesity, and increased growth. 

Molecular cloning of Agouti and subsequent analyses in laboratory mice revealed that the A^y mutation is caused by a deletion immediately adjacent to the Agouti gene that removes coding regions for a ubiquitously expressed RNA-binding protein and, simultaneously, causes Agouti protein coding sequences to be ubiquitously expressed. Thus, embryonic lethality in A^y/A^y mutant embryos is caused by a requirement for the RNA-binding protein in preimplantation development, whereas the dominant effects of A^y are caused by deregulated expression of Agouti protein. The ability of Agouti protein to cause obesity when expressed abnormally in animals heterozygous for A^y led to the discovery of a related molecule, Agouti-related protein (Agrp), which is normally expressed in the hypothalamus and helps to regulate body weight.


See also: Coat Color Mutations, Animals 
The genus Agrobacterium contains a large group of gram-negative, non-spore-forming soil bacteria, often isolated from abnormally proliferating plant tissues. This genus has been grouped together with Rhizobium, Allorhizobium, and Sinorhizobium in the bacterial family Rhizobiaceae. Over the years, analyses based on new taxonomic criteria, including 16S rRNA sequence comparisons, have pointed toward the close relationship of these genera.

They are so close that some taxonomists question the relevance of distinguishing between Agrobacterium and Rhizobium, and some even want to abolish the distinction. However, renaming A. tumefaciens as R. tumefaciens might be difficult to accept for many soil microbiologists and plant physiologists. The bacteria grouped in the family Rhizobiaceae are also closely related to Mesorhizobium and Phyllobacterium, but these are sufficiently different to be kept in a separate family, the Phyllobacteriaceae.

Until the late 1970s the classification of the different species of Agrobacterium was based on their phytopathogenic properties. Strains capable of inducing crown galls on a large variety of dicotyledonous plants were called A. tumefaciens, those inducing hairy root disease A. rhizogenes, and the nonpathogenic strains A. radiobacter. Other Agrobacterium species were grouped by the fact that they had a very limited host range and induced plant tissue proliferation only in some plant species. This was the case with A. vitis, specific for grapevine, and A. rubi, rather specific for some Rubus species (raspberries and blackberries). Such a classification became clearly invalid from a taxonomic point of view when it was demonstrated that phytopathogenicity and host range were due to the presence of large plasmids (megaplasmids). In A. tumefaciens these plasmids were called Ti plasmids; in A. rhizogenes they were called Ri plasmids, which are variant Ti plasmids. Curing a strain of these plasmids converted it into an A. radiobacter strain. Conversely, introducing a Ti or an Ri plasmid into an A. radiobacter strain by conjugation turned it into an A. tumefaciens or an A. rhizogenes strain, respectively. The absence or presence of an extrachromosomal element cannot be a valid criterion for a taxonomic classification.

As the technologies for genome studies and for studying metabolic pathways (metabolite display) improved, arguments accumulated for grouping the different Agrobacterium isolates into clusters called biotypes or biovars. Discussion is ongoing about how many clusters should be recognized. These clusters could eventually be considered as genera, each containing bacterial species formerly named Agrobacterium and Rhizobium.


General Properties of Agrobacteria 


These motile (one to six flagella), aerobic (using oxygen as the terminal electron acceptor), rod-shaped bacteria have a rather slow generation time (1.5 to several hours), even under optimal laboratory conditions. They are able to catabolize a large variety of metabolites. They show chemotaxis toward some plant exudates, which is used in nature to start colonization of a characteristic plant tissue. Ongoing studies analyze the steps in the cross-talk between plants and the different species of Agrobacterium. Detailed molecular knowledge of these plant-bacteria interactions is of great importance to plant biologists. Indeed, the synthesis and secretion by these bacteria of cellulose fibers and other cell-wall-like molecules is perceived as a developmental signal by specific plant tissues and triggers a variety of responses.

Agrobacteria show a remarkable resistance to 
desiccation. A bacterial colony left on an agar plate, 
at room temperature, can survive more than 6 months. 

Strains harboring a Ti plasmid (see Ti Plasmids) have the remarkable capacity to develop a conjugation bridge with plant cells and to transfer a single-stranded copy of a segment of the Ti plasmid, called the T-DNA, straight to the nucleus of the plant cell. A set of Ti plasmid genes, the vir (virulence) genes, is required to form the conjugation pilus that establishes contact with the plant plasmalemma, to build the conjugation bridge, to produce, coat, and transfer the T-DNA, and finally to integrate it into plant chromosomal DNA.

This process requires not only Ti plasmid genes: several loci of the bacterial chromosome also participate in establishing efficient contact and DNA transfer between agrobacteria and plant cells.

In nature, successful DNA transfer is revealed by the proliferation of the transformed plant cells, that is, cells whose nucleus harbors one or more T-DNA copies. This results in abnormal tissue proliferations such as crown galls, hairy roots, or leaf galls, depending on the Agrobacterium species and the nature of the Ti plasmid. Such proliferations have been documented on more than 1000 different plant species belonging to most families of dicotyledonous plants. Exceptionally, some monocots (Liliaceae) were also found to be susceptible.


Agrobacterium as Gene Vector for Plant 
Genetic Engineering 


As soon as it was understood that crown gall formation was due to the transfer and stable integration of bacterial DNA into plant DNA, it seemed likely that this natural genetic engineering could be exploited to introduce, at will, genes conferring new, desirable properties on a plant. Methods were developed to remove the T-DNA from the Ti plasmids while leaving the border sequences that enable its transfer (the origin and terminator of transfer sites). The removed T-DNA was then replaced by one or more plant or plant-like genes. When an Agrobacterium harboring such an engineered Ti plasmid was allowed to interact with plant tissue, it became possible to regenerate from the transformed cells a healthy, fertile plant expressing the newly introduced genes. Such transgenic or genetically modified (GM) plants are at the base of a real revolution in the plant sciences: they made it possible to study plant growth and development, plant physiology, and plant ecology in molecular terms. The importance of this approach was so overwhelming that attempts were made to extend the host range of the Agrobacterium-plant interaction so that many more plant species could be transformed. This work is ongoing, but representatives of most plant families, including monocots of the Gramineae (grasses such as rice), can already be transformed efficiently by Agrobacterium-based methods.

Converting Ti plasmids into gene vectors in this way also made it possible to engineer crop plants with beneficial traits. The first laboratory achievements came around the mid-1980s and concerned the engineering of plants producing their own insecticide: an insecticidal protein encoded by a bacterial gene, cloned from a Bacillus thuringiensis strain and modified so that it could be expressed and accumulated in plant cells. The second success story was the engineering of plants tolerant to novel, ecologically more acceptable herbicides. These constructions were then introduced into the élite lines of major crops such as corn, soybean, canola (rapeseed), and cotton. These GM plants, as they have been called, were field-tested for many years and were approved by US regulatory agencies (FDA, EPA, and USDA’s APHIS) for large-scale trials in 1996. In 2000 some 45 million hectares of these transgenic plants were grown, mostly in North and South America and China. After 15 years of testing and 5 years of large-scale production, no science-based argument for a danger to the health of humans or animals, or to the environment, has been advanced. Present studies involve engineering crop plants that are better adapted to biotic and abiotic stresses and engineering plants to produce new compounds.

Agrobacterium will thus be used increasingly in fundamental and applied studies, both to unravel its interactions with plant cells and to improve them.


See also: Rhizobium; Ti Plasmids; Transfer of 
Genetic Information from Agrobacterium 
tumefaciens to Plants 
Alanine (Figure 1) is one of the 20 amino acids commonly found in proteins. Its abbreviation is Ala and its single-letter designation is A. As one of the nonessential amino acids in humans, it is synthesized by the body and so need not be provided in the individual’s diet.
Figure 1 Alanine.


See also: Amino Acids 
Characteristics and History


Albinism is a group of genetic disorders characterized by reduced or absent melanin pigmentation, with an overall estimated frequency of about 1 per 20 000 in most populations. Oculocutaneous albinism (OCA) involves the eyes, hair, and skin, whereas in ocular albinism (OA) visual involvement is accompanied by only slightly reduced pigmentation of skin and hair. The distinctive phenotype of OCA was known to the ancient Greeks and Romans, and the typical clinical features, modes of inheritance, and even the genetic heterogeneity of these disorders are evident in classical descriptions. Similar phenotypes occur in a great many vertebrate species, and absent catalytic activity of tyrosinase in the skin of albino mice was one of the first enzymatic deficiencies recognized, prompting Garrod to suggest in 1908 that albinism might be an inborn error of metabolism. Indeed, there is close correspondence between the various OCA phenotypes in humans and in mice, and the study of such mice has contributed greatly to the understanding of OCA in humans.
OCA results from reduced or absent biosynthesis of melanin pigment in the skin, hair, and eyes, and affected individuals thus appear lightly pigmented or even virtually white (Figure 1). Melanin plays a major protective role against ultraviolet light, and persons with OCA are accordingly subject to severe sunburn and, eventually, to skin cancers induced by long-term actinic irradiation, particularly in parts of the world with high rates of sun exposure. The role played by melanin in the developing visual system is not known, but all forms of albinism are accompanied by defects of neuronal migration in the visual pathways, with consequent low vision, nystagmus, strabismus, and reduced tolerance of ambient light (photophobia). These visual defects occur regardless of the specific genetic cause of albinism and thus seem to result from reduced melanin itself. Finally, the social stigmatization and consequent psychological morbidity of persons with albinism should not be overlooked, particularly in populations in which relatively dark skin pigmentation is the norm.


Biochemistry and Molecular Genetics 


The rate-limiting reactions of melanin biosynthesis are catalyzed by the enzyme tyrosinase, which converts L-tyrosine to L-dopa and then L-dopa to dopaquinone. Historically, OCA was classified as ‘tyrosinase-negative’ versus ‘tyrosinase-positive,’ with the recognition that these forms of OCA were nonallelic on the basis of the normal phenotype of obligate double heterozygotes. With the identification of the corresponding genetic loci, it has frequently proved impossible to distinguish among the different forms of OCA on clinical grounds, and these designations have given way to a more precise gene-based nomenclature. Three principal forms of oculocutaneous albinism are currently recognized: OCA1, OCA2, and OCA3. Autosomal recessive ocular albinism has been found to result from compound heterozygosity for mild OCA1 or OCA2 mutant alleles. X-linked recessive ocular albinism (OA1) involves principally ocular manifestations of albinism. In two additional disorders, Chediak-Higashi syndrome and Hermansky-Pudlak syndrome, OCA is accompanied by additional, frequently fatal, systemic manifestations.

OCA1 corresponds to the former tyrosinase-negative OCA and results from mutations in the tyrosinase (TYR) gene, located in chromosome segment 11q14-q21. In its most severe forms, OCA1 can be associated with little or no pigmentation and with severe visual deficits. Almost 100 different mutations of the TYR gene have been identified, some resulting in complete loss of enzymatic activity and others in only partial loss of function. Several of the latter are associated with temperature-sensitive variants of the enzyme, presenting human homologs of the Siamese cat! Most OCA1 mutations are rare except in certain defined populations, and as a result most patients are compound heterozygotes for different mutant TYR alleles, greatly complicating efforts at carrier detection and prenatal diagnosis of OCA1. Ordinary white mice likewise have mutations of the tyrosinase gene, which corresponds to the classical c locus.

OCA2, perhaps the most frequent form of the former tyrosinase-positive OCA, results from mutations in the P gene, located in chromosome segment 15q11-q13 and so named because of its correspondence to the pink-eyed dilute (p) locus of mice. This chromosomal region is frequently deleted in patients with Prader-Willi syndrome (PWS), and about 1% of PWS patients additionally manifest OCA2 due to hemizygosity or uniparental isodisomy for maternally inherited OCA2 mutations of the P gene. OCA2 is usually clinically somewhat milder than OCA1. About 30 different human P gene mutations have been described, again presenting a diversity of missense and loss-of-function alleles. However, an intragenic deletion of the P gene represents the major OCA2 mutation in African and some African-American patients, accounting for a frequency of OCA2 of up to 1 per 1400 in parts of Africa. The P polypeptide is a melanosomal membrane protein with considerable homology to known small-molecule transporters, but its specific function is not yet known.

OCA3 is a clinically somewhat milder form of albinism associated with moderately reduced, often reddish, pigmentation of the skin and hair and variable visual defects. OCA3 results from mutations in the TYRP1 gene located at 9p23, which corresponds to the brown (b) locus of mice. TYRP1 encodes the melanogenic enzyme DHICA oxidase, which favors the production of black eumelanin over brown pigments. OCA3 has thus far been studied only in African and African-American patients, in whom a specific frameshift mutation may account for the majority of cases. The frequency of OCA3 is unknown.

OA1, also called X-linked recessive ocular albinism, is a form of albinism in which skin and hair hypopigmentation is very mild, whereas visual deficits may be relatively severe. In general, only males manifest clinical symptoms, although carrier females may exhibit variegated pigmentation of the retina. The OA1 gene, located at Xp22.3-p22.2, encodes a melanosomal protein whose function is unknown. There is no known mouse disorder homologous to human OA1.

Chediak-Higashi syndrome (CHS) is a rare, autosomal recessive disorder characterized by variable manifestations of OCA, a mild bleeding tendency, severe immunologic deficiency, slowly progressive neurologic dysfunction, and frequently early death from an unusual lymphoproliferative syndrome. CHS results from mutations of the CHS1 gene, located at 1q42-q43, which corresponds to the mouse beige (bg) locus. Homozygosity for protein-null mutant alleles of CHS1 results in the typical severe childhood form of CHS, whereas amino acid substitutions can be associated with a clinically milder form of the disorder. The function of the CHS protein is not yet known, but it is thought to be involved in the sorting of proteins to vesicles, lysosomes, and cytoplasmic granules.

Hermansky-Pudlak syndrome (HPS) is an autosomal recessive disorder characterized by OCA, a moderate bleeding disorder, apparent lysosomal storage, colitis, and progressive restrictive lung disease that frequently results in death in mid-adulthood. Although rare in most populations, HPS is prevalent in Puerto Rico, where it occurs with a frequency of about 1 per 1800. HPS results from mutations of the HPS gene, located at 10q23, which corresponds to the mouse pale-ear (ep) locus. The great majority of Puerto Rican patients carry a specific frameshift mutation, indicative of a founder effect in this island population. However, only about half of non-Puerto Rican HPS patients have mutations in this gene, indicating the existence of additional HPS genes that have yet to be identified.
Alcoholism can be defined as excessive and/or compulsive use of alcohol. Alcoholism results in persistent social, psychological, and medical problems, and is a leading cause of morbidity and premature death. The motivation underlying excessive alcohol consumption, however, remains obscure.

Determinants of alcohol abuse include an interaction of environmental and biological factors; among the latter, genetic factors deserve special attention. One-third of alcoholics have at least one alcoholic parent, and children of alcoholics are more likely to become alcoholic than children of nonalcoholics, even when raised in nonalcoholic families. Despite a large effort to understand the biological basis of alcoholism, a definitive biological marker for alcoholism has not been found.


Animal Models 


Since genetic animal models were first employed, studies have investigated how different strains of mice differ in their response to alcohol, which gene products (proteins) influence behavior, and through which mechanism(s). Genetic animal models offer several advantages over studies using human subjects. For one, the experimenter can control the subject’s genotype, whereas in humans only monozygotic twins have identical genotypes. One widely used method is the study of inbred strains. Each inbred strain consists of animals that are essentially identical twins. By studying a number of different inbred strains, one can investigate differences among the strains in response to alcohol. If individual differences exist within a strain, they are assumed to be nongenetic (environmental) in origin, while differences among strains are evidence for genetic differences. Studies with inbred mouse strains have demonstrated that some strains prefer to drink alcohol over water, while other inbred strains tend to avoid alcohol. This is one example of a behavior that is controlled, to some extent, by genetic factors.
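
The within-strain versus among-strain reasoning above can be made quantitative with a standard one-way analysis of variance: spread among strain means estimates a genetic component, and spread among animals of the same strain estimates an environmental one. The sketch below is an illustration under stated assumptions, not a method taken from the studies cited here. It assumes a balanced design (equal numbers of animals per strain), and the strain names and alcohol-preference scores are hypothetical.

```python
from statistics import mean


def variance_components(groups: dict[str, list[float]]) -> tuple[float, float]:
    """Return (among-strain, within-strain) variance components.

    Assumes a balanced one-way design: every strain has the same number
    of animals, with at least two strains and two animals per strain.
    """
    sizes = {len(scores) for scores in groups.values()}
    if len(sizes) != 1:
        raise ValueError("this sketch assumes equal sample sizes per strain")
    n = sizes.pop()  # animals per strain
    k = len(groups)  # number of strains
    strain_means = {name: mean(scores) for name, scores in groups.items()}
    grand_mean = mean(list(strain_means.values()))  # equals the overall mean when balanced

    # Mean square among strains: spread of strain means around the grand mean.
    ms_among = n * sum((m - grand_mean) ** 2 for m in strain_means.values()) / (k - 1)
    # Mean square within strains: spread of animals around their own strain mean.
    ms_within = sum(
        (x - strain_means[name]) ** 2
        for name, scores in groups.items()
        for x in scores
    ) / (k * (n - 1))

    # Among-strain component, truncated at zero if sampling noise makes it negative.
    var_among = max((ms_among - ms_within) / n, 0.0)
    return var_among, ms_within


def among_strain_fraction(groups: dict[str, list[float]]) -> float:
    """Intraclass correlation: share of total variance lying among strains."""
    var_among, var_within = variance_components(groups)
    return var_among / (var_among + var_within)


# Hypothetical alcohol-preference ratios for three inbred strains.
preference = {
    "strain_A": [0.80, 0.85, 0.75],
    "strain_B": [0.20, 0.25, 0.15],
    "strain_C": [0.50, 0.55, 0.45],
}
print(round(among_strain_fraction(preference), 3))
```

On these made-up numbers nearly all of the variance (roughly 0.97 of it) lies among strains, the pattern one would read as strong genetic influence; identical strain means would drive the fraction to zero.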

A second widely used method is to study animals bred to exhibit a specific response following alcohol administration. When animals that are sensitive to a trait (e.g., that prefer alcohol solutions or exhibit severe withdrawal) are mated together over successive generations, most of the genes leading to sensitivity become fixed in the resulting line. At the same time, animals that are insensitive to the same response are mated, fixing most of the genes leading to low responsiveness in the insensitive line. If genes contribute to the trait, the sensitive and insensitive selected lines will come to differ greatly on it. If the lines also differ on behaviors other than those for which they were selected, this is evidence that the same genes are responsible for both traits. Studies using this powerful technique have increased our knowledge of which responses to alcohol share genetic influences (Crabbe et al., 1994).
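
Why divergence between the selected lines points to genetic influence can be illustrated with the textbook breeder's equation, R = h²S, which predicts the response to selection R from the narrow-sense heritability h² and the selection differential S. This is a deliberately simplified sketch, not the procedure used by Crabbe and colleagues: it assumes that heritability and the selection differential stay constant across generations, and the parameter values are hypothetical.

```python
def line_divergence(h2: float, s: float, generations: int) -> list[float]:
    """Predicted difference between high- and low-selected lines after each
    generation, applying the breeder's equation R = h2 * S in both directions.

    h2: narrow-sense heritability of the trait (assumed constant).
    s:  selection differential applied each generation, in trait units.
    """
    high = low = 0.0  # both lines start at the base-population mean
    gaps = []
    for _ in range(generations):
        high += h2 * s  # high line: parents chosen from the most responsive animals
        low -= h2 * s   # low line: parents chosen from the least responsive animals
        gaps.append(high - low)
    return gaps


print(line_divergence(h2=0.25, s=1.0, generations=4))  # heritable trait: lines separate
print(line_divergence(h2=0.0, s=1.0, generations=4))   # no genetic variance: no separation
```

With h² = 0 the lines never separate, however strictly the parents are chosen, which is the logic behind treating divergence of selected lines as evidence that genes contribute.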

While inbred strains and selected lines are useful in demonstrating genetic influence on a trait, it is not always clear which specific genes make an animal sensitive or resistant to a given drug effect. One technique that is useful in elucidating precise genetic influences is quantitative trait locus (QTL) mapping. Traits such as those mentioned above can be studied in panels of recombinant inbred (RI) strains. Each RI strain is uniquely derived from a cross of two specific standard inbred progenitor strains, and individuals within an RI strain are genetically identical. RI strains have been typed for many genetic markers, and the chromosomal locations of those markers have been mapped. Genes located on the same chromosome are generally inherited together, and the more closely they are linked, the more often this is the case. Genetic regions (loci) contributing to the quantitative trait can then be identified by correlating RI strain behavioral scores with genotype at groups of the mapped markers. If a set of linked markers is associated with the trait, this QTL is thought to lie close to a functional gene that influences the trait.
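
The marker-correlation step can be sketched as follows. This is a simplified illustration, not the analysis software used in the studies cited: it assumes each RI strain has a single mean behavioral score and a genotype at each marker coded 0 or 1 for the allele inherited from one progenitor strain or the other, and it uses a plain Pearson correlation in place of a full QTL statistic with significance thresholds. The marker names and scores are hypothetical.

```python
from math import sqrt


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a marker that does not vary across strains carries no information
    return sxy / sqrt(sxx * syy)


def marker_scan(scores: list[float], genotypes: dict[str, list[int]]) -> dict[str, float]:
    """Correlate RI strain scores with the genotype at every mapped marker."""
    return {
        marker: pearson([float(g) for g in calls], scores)
        for marker, calls in genotypes.items()
    }


# Hypothetical withdrawal-severity means for six RI strains.
scores = [2.0, 2.2, 5.1, 4.9, 2.1, 5.0]
# Hypothetical genotypes: 0 = allele from progenitor 1, 1 = allele from progenitor 2.
genotypes = {
    "chr11_marker": [0, 0, 1, 1, 0, 1],
    "chr4_marker": [0, 1, 0, 1, 0, 1],
}
results = marker_scan(scores, genotypes)
best = max(results, key=lambda m: abs(results[m]))
print(best, round(results[best], 3))
```

Here the strains carrying the progenitor 2 allele at the hypothetical chromosome 11 marker are exactly the high scorers, so that marker stands out; in real mapping, neighboring linked markers would show a similar but weaker signal.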

Once a chromosome region containing a QTL that affects the trait has been identified, functional genes already mapped to this region may become candidate genes. Future studies can investigate these candidates further to determine whether any of them corresponds to the mapped QTL. For example, three QTLs for acute alcohol withdrawal severity have been mapped to regions of mouse chromosomes 1, 4, and 11 (Buck et al., 2000). Genes coding for several different subunits of the γ-aminobutyric acid type A (GABA-A) receptor map near the chromosome 11 QTL. Because GABA is the principal inhibitory neurotransmitter in the central nervous system, these receptor subunit genes are good candidates for a withdrawal convulsion QTL, and studies are underway to test the hypothesis that they are among the genes affecting acute ethanol withdrawal.


Clinical Findings 


Certain characteristics present in those likely to become alcoholics may be useful markers of potential risk for the development of alcoholism. Electrophysiological evidence suggests that a brain abnormality exists in alcoholics and in their nondrinking offspring. When exposed to a novel stimulus, alcohol-naive sons of alcoholics show a pattern of brain waves (the P3 or P300 evoked potential) resembling that measured in alcoholics (Begleiter et al., 1998). These differences in brain activity are hypothesized to reflect a genetic vulnerability to alcoholism.

Predisposing factors have also been studied in human populations that have very little genetic or social/environmental variability (e.g., a southwestern American Indian tribe), and researchers have found several genetic markers linked with alcoholism in this tribe (Long et al., 1998). One marker was located near a gene coding for a GABA receptor. While the results of this study require verification, they support other evidence implicating this neurotransmitter system in alcoholism. A second group of researchers studied the general population of the United States, selecting families affected by alcoholism (Reich et al., 1998). One of the markers discriminating alcoholics from nonalcoholics was provisionally mapped near the gene coding for the alcohol-metabolizing enzyme alcohol dehydrogenase (ADH). Other evidence suggests that a variant of the ADH gene tends to protect its carriers against the development of alcoholism in Asian populations. Although still preliminary, these studies are beginning to identify genes mediating susceptibility to alcoholism.


Conclusions 


Genetic animal models have provided many cases of clear genetic influences on alcohol sensitivity. Populations of human alcoholics with well-defined pedigrees are also readily available, making it possible to relate the animal models to human genetic findings. This offers the hope that, besides improving our understanding of how alcohol works, genetic studies can provide new methods for identifying individuals who might be at risk. Ideally, this information will also help to generate new pharmacotherapies to address this disease.
Sequences


During the course of evolution, nucleotide and amino acid sequences change. These changes are of two basic types: (1) insertions and deletions, caused by the gain or loss of one or more residues, and (2) substitutions, caused by the replacement of one residue by another. This can be seen clearly when we take sets of sequences that we know to be evolutionarily related (homologous sequences) and attempt to carry out a sequence alignment. In an alignment, we match up corresponding residues in the sequences in such a way that homologous parts of the sequences are aligned with each other. An example is shown in Figure 1: an alignment between the amino acid sequences of the two types of protein chain found in human hemoglobin, α and β.

These proteins are both members of a multigene family of proteins involved in the transport and storage of oxygen in vertebrates, invertebrates, plants, and bacteria. They all have a similar three-dimensional structure, made up of seven or eight α-helices surrounding a prosthetic, iron-containing heme group, and all diverged from a common ancestral protein over a billion years ago. The two chains of human hemoglobin are relatively similar to each other compared with the diversity found in the globin family in general; they diverged from each other a few hundred million years ago, during the early evolution of vertebrates. To carry out this alignment, we note that the two sequences differ slightly in length, owing to insertions and/or deletions in one or both sequences over time. This means that we cannot simply match the two sequences up from the first residue in each and continue towards the C-termini. Instead, we must introduce extra blank characters (in this case hyphens) to create what we will refer to as gaps.

These gaps are just padding that allows us to match up the two sequences in some optimal manner. Ideally they will correspond to the sites of insertions or deletions, but it can often be hard to tell exactly where these have occurred and in which sequences. In this case, we cannot tell whether a gap corresponds to an insertion of extra residues in one sequence or a deletion of residues from the other (or both), although this can be partly inferred by examining the pattern of gaps in the globin family as a whole. Because of this ambiguity, insertions and deletions are often referred to collectively, for alignment purposes, as indels.

Figure 1 A sequence alignment between human α-globin (bottom) and β-globin (top). These sequences have been aligned by the insertion of gaps (hyphens). Residues that are identical between the two sequences are marked by stars.
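
How gaps are placed "in some optimal manner" depends on a scoring scheme. One standard way to find an optimal gapped alignment of two sequences is the Needleman-Wunsch dynamic-programming algorithm; the text does not say which method produced Figure 1, so the sketch below is only an illustration. Its scores (+1 for an identity, -1 for a mismatch, -2 per gap position) are arbitrary assumptions chosen for readability; practical protein aligners use substitution matrices and separate gap-opening and gap-extension penalties.

```python
def global_align(a: str, b: str, match: int = 1, mismatch: int = -1,
                 gap: int = -2) -> tuple[str, str, int]:
    """Needleman-Wunsch global alignment with a linear gap penalty.

    Returns the two gapped strings (gaps shown as '-') and the optimal score.
    """
    def pair(x: str, y: str) -> int:
        return match if x == y else mismatch

    n, m = len(a), len(b)
    # score[i][j] holds the best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap  # a[:i] set against gaps only
    for j in range(1, m + 1):
        score[0][j] = j * gap  # b[:j] set against gaps only
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + pair(a[i - 1], b[j - 1]),  # residue opposite residue
                score[i - 1][j] + gap,  # residue of a opposite a gap
                score[i][j - 1] + gap,  # residue of b opposite a gap
            )

    # Trace back from the last cell to recover one optimal alignment.
    top, bottom = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair(a[i - 1], b[j - 1]):
            top.append(a[i - 1])
            bottom.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            top.append(a[i - 1])
            bottom.append("-")
            i -= 1
        else:
            top.append("-")
            bottom.append(b[j - 1])
            j -= 1
    return "".join(reversed(top)), "".join(reversed(bottom)), score[n][m]


# First residues of the mature human alpha-globin and beta-globin chains.
aligned_alpha, aligned_beta, best = global_align("VLSPADK", "VHLTPEEK")
print(aligned_beta)   # top row, as in Figure 1
print(aligned_alpha)  # bottom row
print(best)
```

With these scores the single gap falls after the first residue of α-globin, which places serine opposite threonine in the fourth column; a different scoring scheme could move the gap, which is one reason gap positions are treated with caution.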


Alignment is Important 


Sequence alignment is an essential prerequisite for a wide range of analyses that can be carried out on sequences. Any analysis that involves the simultaneous treatment of a number of homologous proteins will usually require that the proteins first be lined up with the homologous residues in columns. Figure 2 shows a multiple alignment of some globins in which this has been done. Only when we have such an alignment can we attempt to ask questions about the way in which these sequences evolve. These include questions relating to the phylogeny of the sequences and the rate at which they change (e.g., the estimated number of substitutions per site). Such phylogenetic and evolutionary analyses are interesting in their own right but can also be put to more practical use: for example, they can tell us about the dates of important events in the evolution of a gene family or be used to derive amino acid weight matrices (see below).
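
Estimating substitutions per site from an alignment needs a correction, because the observed proportion of differing sites p underestimates the true number once some sites have changed more than once. One classical correction is the Jukes-Cantor formula, d = -((k-1)/k) ln(1 - kp/(k-1)), with k = 4 for nucleotides; setting k = 20 gives the analogous equal-rates correction for amino acids. The sketch assumes that simple model (all states equally frequent, all changes equally likely), which real analyses usually replace with richer models. The example p is taken from the α/β-globin comparison given in the Alignment Parameters discussion (63 identities in 137 gap-free columns).

```python
from math import log


def corrected_distance(p: float, states: int = 4) -> float:
    """Jukes-Cantor-type estimate of substitutions per site.

    d = -((k - 1) / k) * ln(1 - k * p / (k - 1)), with k = number of states
    (4 for nucleotides, 20 for amino acids under the equal-rates assumption).
    """
    k = states
    limit = (k - 1) / k  # value of p at which two sequences look randomly related
    if not 0.0 <= p < limit:
        raise ValueError("p must lie in [0, (k-1)/k) for this correction")
    return -limit * log(1.0 - p / limit)


# Observed proportion of differing residues: 74 of 137 gap-free columns.
p_globin = 74 / 137
print(round(p_globin, 3), round(corrected_distance(p_globin, states=20), 3))
```

The corrected value (about 0.8 substitutions per site) is noticeably larger than the observed proportion of differences (about 0.54), and the gap between the two widens as sequences become more divergent.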



Sequence alignment is very widely used in the biological literature to demonstrate conserved regions in a protein family, which we assume to be of great functional importance. Alignments may also be used to demonstrate homology between a protein family and a distantly related member. There may be little overall similarity between the proteins, but we can feel more confident in the possible homology if we observe that the residues that are most conserved in the family as a whole are also present in the new member. Alignments may also be used to investigate the conservation of protein structure or to predict the structures of new members when we know the tertiary structures of one or more members of a sequence data set.
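
Finding the conserved positions in a multiple alignment is a simple column-by-column scan once the sequences are aligned. The sketch treats a column as conserved only if every sequence carries the same residue and none has a gap; that strict definition is an assumption for illustration, and practical tools also score columns conserved only in residue type. The four short aligned fragments are used purely as example input.

```python
def conserved_columns(alignment: list[str]) -> list[int]:
    """Return 0-based indices of columns in which every sequence carries
    the same residue and no sequence has a gap."""
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    conserved = []
    for i in range(length):
        column = {seq[i] for seq in alignment}
        if len(column) == 1 and "-" not in column:
            conserved.append(i)
    return conserved


def conservation_line(alignment: list[str]) -> str:
    """Mark fully conserved columns with '*', in the style of alignment figures."""
    keep = set(conserved_columns(alignment))
    return "".join("*" if i in keep else " " for i in range(len(alignment[0])))


# Example input: four short, already aligned fragments.
fragments = ["LGNVLV", "LGNVLV", "LSHCLL", "LSHCLL"]
for seq in fragments:
    print(seq)
print(conservation_line(fragments))
```

Only the two leucine columns are shared by all four fragments, so only they receive a star; adding more distant family members would typically shrink the set of fully conserved columns further.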

Most protein sequences belong to multigene families or contain protein domains that are related, evolutionarily, to domains in other proteins (from the same and from different species). The largest families contain hundreds of members in many species, especially in multicellular eukaryotes. The most familiar examples include protein kinases, zinc-finger transcription factors, and 7-transmembrane receptor proteins. Without an evolutionary overview of the proteins, it is difficult to see the subgroupings within these families, to follow all of their functional diversity, or to relate function in different species. This overview can be provided by a phylogenetic analysis, which requires prior alignment. This is more important than ever now that the entire genomic sequences of so many model organisms, including humans, have been elucidated.

Figure 2 A multiple alignment of seven globin sequences from human (α- and β-chains of hemoglobin), horse (α- and β-chains), whale (myoglobin), lamprey (cyanohemoglobin), and lupin (leghemoglobin).


Alignment Parameters 


From Figure 1 we can clearly see that these sequences
are reasonably similar to each other in many respects.
They share a similar arrangement of α-helices, and the
heme-binding histidines are at matching positions.
The two sequences are 46% identical to each
other, assuming that this alignment is correct (there
are 63 identical residues in 137 alignment positions,
ignoring gaps). Over the past few hundred
million years or so, these sequences have diverged by
the accumulation of numerous amino acid substitu-
tions, which we now observe as sequence differences
along the alignment. Again, it can be hard to tell exact-
ly which substitutions have occurred and in which
sequences, although we can make some guesses
by including other globin sequences in our study. For
example, at the fourth position of the alignment we
have a serine (S) in the α-globin and a threonine (T) in
the β-globin. This is a difference, and we can assume
that at least one of the sequences has changed since
their most recent common ancestor. If we do not
have access to any other sequences, we cannot tell whether
the serine changed to a threonine or vice versa, whether both
sequences changed, or even whether several substitutions
occurred at this position in one or both sequences.
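The identity figure quoted above is a simple ratio. The sketch below is a minimal Python illustration, assuming gaps are written as '-' and that, as in the text, only columns where both sequences have a residue are counted; the function name `percent_identity` and the toy sequences are hypothetical.

```python
def percent_identity(seq_a: str, seq_b: str, gap: str = "-") -> float:
    """Percent identity of two aligned sequences, ignoring gapped columns.

    Only columns where both sequences carry a residue are compared,
    following the "ignoring gaps" convention used in the text.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    identical = 0
    for a, b in zip(seq_a, seq_b):
        if a == gap or b == gap:
            continue  # skip any column that contains a gap
        compared += 1
        if a == b:
            identical += 1
    if compared == 0:
        return 0.0
    return 100.0 * identical / compared


# The globin figures from the text: 63 identities over 137 ungapped columns.
print(round(100 * 63 / 137))               # 46
print(percent_identity("ACD-E", "ACDFQ"))  # 75.0
```

Other conventions (for example, dividing by the length of the shorter sequence) give different percentages for the same alignment, so the denominator should always be stated.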
The patterns of insertions/deletions and substitu- 
tions that we observe in real sequences are highly 
nonrandom. There are two obvious constraints on 
indels, from the point of view of natural selection. 
Firstly, at the nucleotide level, indels in protein- 
coding regions must maintain the phase of the coding 
sequence, i.e., there are normally only deletions or 
insertions of multiples of three nucleotides. Indels
that do not maintain phase are almost guaranteed
to result in truncated or highly altered amino acid
sequences. One may occasionally see insertions or
deletions of entire exons in eukaryotes, but again, the 
phase of the coding sequence must be maintained. It is 
harder to generalize about indels in noncoding regions 
or in non-protein-coding RNA genes. Once an indel
has occurred in a protein-coding gene, it will result
in an altered amino acid sequence. Clearly, if the 
change greatly impairs the functionality of the pro- 
tein, this will not be tolerated by natural selection. 
As a general rule, indels which alter the folding of a 
protein or which affect the active site (in the case of an 
enzyme) or a binding site for a ligand or cofactor will 
not be tolerated. More specifically, indels are rare in 
conserved α-helices and β-strands but are relatively
common in the loops of irregular structure that con-
nect them. Helices and strands must pack together to
form the basic overall tertiary structure of the protein.
This packing is largely mediated by the burying of 
hydrophobic amino acid sidechains in the core of the 
protein. These hydrophobic amino acids are arranged 
in conserved patterns on the core helices and strands. 
Indels that disrupt these patterns will disrupt the fold- 
ing of the protein and are uncommon. In Figure 2 we 
can see a multiple alignment of a small number of 
globin sequences. In this alignment, the gaps are all 
between the conserved α-helices.
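The phase constraint on indels in coding regions can be illustrated by translating a toy gene before and after a deletion. This is a minimal sketch assuming the standard genetic code and a hypothetical 12-nucleotide open reading frame; the helper names `translate` and `delete` are illustrative only.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1). Codons are enumerated in
# TCAG order (TTT, TTC, TTA, TTG, TCT, ...); '*' marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(dna: str) -> str:
    """Translate codon by codon from the first base; a trailing partial
    codon is ignored."""
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))


def delete(dna: str, start: int, length: int) -> str:
    """Return dna with `length` bases removed starting at index `start`."""
    return dna[:start] + dna[start + length:]


gene = "ATGGCTGAATAA"                 # hypothetical toy ORF: Met-Ala-Glu-stop
print(translate(gene))                # MAE*
print(translate(delete(gene, 3, 3)))  # ME*  (in frame: one codon removed)
print(translate(delete(gene, 3, 1)))  # MLN  (frameshift: downstream codons change)
```

Removing three bases deletes a single amino acid and leaves the rest of the protein intact, whereas removing one base changes every downstream codon, here including the loss of the original stop codon.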

In principle, any nucleotide may be replaced by any 
other after a mutational event. In practice, some sub- 
stitutions are more frequent than others, either due to 
biases in the mutational processes or due to the effects 
of natural selection. In protein-coding regions, the most 
visible bias will usually be a preponderance of silent 
nucleotide changes in codons. Some species will also 
exhibit strong preferences for the use of some codons 
over others. There may also be preferences for either 
using or avoiding the use of G-C base pairs. All of these 
effects are strongly species- and position-dependent 
and so it is difficult to lay down general rules. 

With amino acid substitutions, it is possible to 
make some stronger generalizations. Firstly, one might 
expect to see many substitutions between amino acids 
that are coded for by similar codons. These would be
expected to occur most frequently by chance, as
they require just one nucleotide substitution rather
than the two or three required between more dissimilar
codons. This can be seen to a small extent between
closely related proteins (ones which have diverged 
relatively little from each other) but it is, usually, 
completely masked by the effects of natural selection 
on the biochemical properties of the amino acid side- 
chains. Amino acid substitutions which greatly alter 
the properties of the residue are relatively rare while 
those that preserve the main biochemical properties 
are relatively common. If we look at Figure 1, we can
see that most of the differences are between pairs of 
biochemically similar amino acids (e.g., serine and 
threonine or leucine and isoleucine). The most import- 
ant biochemical properties are hydrophobicity, polar- 
ity, charge, and size. The need to bury hydrophobic 
residues in the centers of globular proteins makes this 
property especially important. The well-known muta- 
tion that causes sickle cell anemia is a mutation of a 
glutamate to a valine (charged amino acid residue to 
hydrophobic) at residue number 6 in the β-chain of
human hemoglobin. This results in an exposed hydro- 
phobic residue which causes hemoglobin molecules to 
stick together under conditions of low oxygen pressure. 
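The idea that some amino acid exchanges need only one nucleotide change while others need two or three can be made concrete with the genetic code. The sketch below assumes the standard genetic code and simply takes the smallest number of differing bases between any codon for one amino acid and any codon for the other; it ignores mutational biases and codon usage, and the function name is hypothetical.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), codons in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Map each amino acid (and '*' for stop) to the codons that encode it.
CODONS: dict[str, list[str]] = {}
for bases, aa in zip(product(BASES, repeat=3), AMINO_ACIDS):
    CODONS.setdefault(aa, []).append("".join(bases))


def min_nucleotide_changes(aa1: str, aa2: str) -> int:
    """Smallest number of point substitutions converting some codon for aa1
    into some codon for aa2 (minimum Hamming distance between codons)."""
    return min(
        sum(x != y for x, y in zip(c1, c2))
        for c1 in CODONS[aa1]
        for c2 in CODONS[aa2]
    )


# Glu to Val (the sickle cell change), Leu to Ile, Ser to Thr, Trp to Asn
for pair in [("E", "V"), ("L", "I"), ("S", "T"), ("W", "N")]:
    print(pair, min_nucleotide_changes(*pair))  # prints 1, 1, 1 and 3 respectively
```

The sickle cell substitution (GAG to GTG) needs a single base change, while tryptophan, with its single codon TGG, is three changes away from asparagine.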

Figure 3 A BLOSUM 62 amino acid weight matrix. The 20 amino acids are given using the one-letter code.

Over the years, many attempts have been made to
quantify the degree to which different amino acid
residues may be replaced by each other, during the
course of evolution. This becomes important when
we try to quantify the quality of an alignment in order
to choose the best one (see below). The most success- 
ful of these methods are empirical and are based on 
counting actual amino acid differences in alignments. 
In one very influential study, Dayhoff et al.
(1978) counted differences in closely related pro-
teins and used these to derive a probability model for
amino acid substitution. This model was then used to 
give scores reflecting the chances of seeing a particular 
amino acid substitution after any given amount of 
evolution: the Dayhoff PAM series of weight matrices. 
They are still widely used, despite the relatively tiny 
data sets that were used in their construction. More 
recently, the BLOSUM series of weight matrices was 
derived by Henikoff and Henikoff (1992), by counting 
frequencies of amino acids in columns from conserved 
blocks in alignments (Figure 3). These are based on far 
more data than the earlier PAM matrices and are usually 
considered to be more sensitive. Crudely, these num- 
bers can be used to assign scores to pairs of aligned 
amino acids. The higher the score, the more plausible 
the alignment will be. Such scores can be used to
choose the best of several alternative alignments or,
during a database search, to pick out the sequences
most similar to a query sequence.


Alignment Methods 


We could align two sequences very accurately
if we knew exactly
which changes (insertions/deletions and substitu- 
tions) occurred and where. We could simply match up 
residues which we know to be related to each other, 
either because they have not changed during evolution 
or because we know they have diverged through one 


or more substitutions. In practice, we observe the pre- 
sent-day sequences and align these with each other 
according to some measure of alignment quality. 
Commonly, we attempt to find an alignment that dis- 
plays a maximum number (and quality) of matches 
between residues and a minimal number (and length) 
of gaps. This can be done manually using a word 
processing program on a computer but this is tedious 
and error-prone; more commonly, automatic com- 
puter programs are used. 


Dot Matrix Plots 

A useful device to help visualize alternative align- 
ments is to make use of a dot matrix plot as shown in 
Figure 4. This is a graphical device that places the two 
sequences along two sides of a rectangle and inserts 
dots at all positions between the two sequences where 
there is a match of some kind (say a residue in one 
sequence that is identical to one in the second se- 
quence or some number of residues out of so many, 
e.g., three amino acids out of five identical or some 
number of residues with a high score using an amino 
acid weight matrix). If a plot is made between homo- 
logous sequences, then the best alignment will usually 
show up on the plot as lines of dots, parallel to the 
main diagonal. These lines will be interrupted by 
blanks corresponding to mismatches between the 
sequences and there will be jumps to different dia- 
gonals corresponding to gaps. These plots are useful 
because they are very simple. Repeated sequences 
show up as sets of parallel lines and small regions of 
local similarity, perhaps corresponding to isolated 
matches of single domains in otherwise unrelated 
proteins, can be detected very easily. Plots can also 
be used to reveal regions of self-complementarity in
nucleotide sequences if one compares a sequence to its
reverse complement.

Figure 4 A dot matrix plot between the sequences from Figure 1 (axes: human α-globin, 141 residues; human β-globin, 146 residues). A dot is placed at every position where two out of three amino acids are identical between the two sequences. This figure was prepared using the DOTPLOT program of Ramin Nakisa.
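A dot matrix plot of the kind shown in Figure 4 takes only a few lines to compute. The sketch below uses the same criterion as that figure (a window of three residues with at least two identities) and draws the result as text; it is a minimal illustration, not the DOTPLOT program, and the function names are hypothetical. The example sequences are the C-terminal segments of human α- and β-globin as they appear in the alignment of Figure 2.

```python
def dot_plot(seq_a: str, seq_b: str, window: int = 3,
             threshold: int = 2) -> set[tuple[int, int]]:
    """Return the (i, j) window start positions where seq_a[i:i+window] and
    seq_b[j:j+window] share at least `threshold` identical residues."""
    dots = set()
    for i in range(len(seq_a) - window + 1):
        for j in range(len(seq_b) - window + 1):
            identities = sum(seq_a[i + k] == seq_b[j + k] for k in range(window))
            if identities >= threshold:
                dots.add((i, j))
    return dots


def render(seq_a: str, seq_b: str, dots: set[tuple[int, int]], window: int = 3) -> str:
    """Draw the plot as text: rows follow seq_a, columns follow seq_b."""
    return "\n".join(
        "".join("*" if (i, j) in dots else "." for j in range(len(seq_b) - window + 1))
        for i in range(len(seq_a) - window + 1)
    )


# C-terminal segments of human alpha- and beta-globin.
alpha = "LSHCLLVTLAAHLPAEFTPAVHASLDKFLASVSTVLTSKYR"
beta = "LGNVLVCVLAHHFGKEFTPPVQAAYQKVVAGVANALAHKYH"
print(render(alpha, beta, dot_plot(alpha, beta)))
```

Runs of '*' parallel to the main diagonal mark the similar stretches (for example around "EFTP"), and a shift to a different diagonal would indicate a gap.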


Dynamic Programming 

Dot matrix plots do not deliver an alignment directly. 
Alignments are most commonly derived using compu- 
ter programs that implement a method called dynamic 
programming. This was first introduced to sequence 
analysis by Needleman and Wunsch (1970). It is based 
on the ability to assign a score to any possible alignment
between two sequences. These scores are simply the 
sum of all the scores for each pair of aligned residues in 
the alignment, minus some penalty (a gap penalty) for 
each gap. Given such a scoring scheme, dynamic pro- 
gramming will deliver an alignment with the best pos- 
sible score. It is beyond the scope of this article to 
describe any details of how this method works; rather 
we will focus on how we use it in practice. 

For nucleotide sequences, one often scores align- 
ed residues using a simple scheme where identical 
nucleotides get a score of 1 and nonidentical ones a 
score of zero. Gap penalties must then be scaled 
accordingly. More complicated schemes can be imple- 
mented where scores may have a value that is inter- 
mediate between those for a match and a mismatch. 
Further, different matches and mismatches could be 
scored, depending on the position in a codon, if the 
sequence is protein-coding. Such schemes are compli- 
cated to implement and not widely used. 
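Although the details of dynamic programming are beyond the scope of this article, the core recurrence is short. The following is a minimal Python sketch of Needleman-Wunsch-style global alignment, assuming the simple nucleotide scheme above (match 1, mismatch 0) and a penalty `gap` charged for every gapped position; unlike many practical programs, it also penalizes terminal gaps. The function name and defaults are illustrative.

```python
def needleman_wunsch(a: str, b: str, match: int = 1, mismatch: int = 0,
                     gap: int = 1) -> tuple[int, str, str]:
    """Global alignment of a and b by dynamic programming.

    Each aligned pair scores `match` or `mismatch`; every gapped position
    costs `gap`. Returns (best score, aligned a, aligned b) for one of the
    optimal alignments.
    """
    n, m = len(a), len(b)

    def pair(i: int, j: int) -> int:
        return match if a[i - 1] == b[j - 1] else mismatch

    # score[i][j] = best score for aligning the prefixes a[:i] and b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = -gap * i
    for j in range(1, m + 1):
        score[0][j] = -gap * j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + pair(i, j),  # align a[i-1] with b[j-1]
                score[i - 1][j] - gap,             # a[i-1] opposite a gap
                score[i][j - 1] - gap,             # b[j-1] opposite a gap
            )

    # Trace back from the bottom-right cell to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair(i, j):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] - gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


print(needleman_wunsch("ACGT", "AGT"))  # (2, 'ACGT', 'A-GT')
```

When several alignments share the best score, the traceback order (diagonal first) decides which one is returned, consistent with the point that dynamic programming guarantees only one of the best-scoring alignments.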

For amino acid sequences, the amino acid weight 
matrices from the Dayhoff PAM series or from the 
BLOSUM series, or one of the other alternatives, can 
be used. These assign a score for all 210 possible pairs 
of aligned residues (see Figure 3). There are different 
scores for identical residues, depending on the degree 
to which that residue is found to be highly conserved
in most proteins (the diagonal elements of the matrix 
in Figure 3). For example, tryptophan and cysteine 
are often found at key positions in proteins where they 
play a critical role and cannot be substituted easily. 
Therefore, two aligned tryptophans get a high score 
(11) whereas two aligned alanines get a much lower 
score (4), because alanine residues can often be substi- 
tuted quite readily by other amino acids. The remain- 
ing 190 values in these weight matrices give scores for 
mismatched residues. These values may be positive or 
negative, depending on how often we expect to see a 
particular pair of residues aligned with each other in 
alignments. Residue pairs with high positive scores 
may be defined as conservative substitutions. 
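Scoring an ungapped stretch of alignment with a weight matrix is just a sum of table lookups. The sketch below uses only a handful of BLOSUM62 entries, including the tryptophan (11) and alanine (4) values quoted above; a real program would load the complete matrix, and the dictionary and function names are hypothetical.

```python
# A few BLOSUM62 entries. The matrix is symmetric, so each pair is stored
# once and looked up in either order. A real program would load all 210.
BLOSUM62_SUBSET = {
    ("W", "W"): 11, ("C", "C"): 9, ("A", "A"): 4, ("S", "S"): 4,
    ("T", "T"): 5, ("L", "L"): 4, ("I", "I"): 4, ("E", "E"): 5, ("V", "V"): 4,
    ("S", "T"): 1, ("I", "L"): 2, ("E", "V"): -2,
}


def pair_score(x: str, y: str) -> int:
    """Look up a residue pair in either order; raises KeyError if absent."""
    if (x, y) in BLOSUM62_SUBSET:
        return BLOSUM62_SUBSET[(x, y)]
    return BLOSUM62_SUBSET[(y, x)]


def ungapped_score(seq_a: str, seq_b: str) -> int:
    """Sum of pair scores for two equal-length, gap-free aligned segments."""
    if len(seq_a) != len(seq_b):
        raise ValueError("segments must have equal length")
    return sum(pair_score(x, y) for x, y in zip(seq_a, seq_b))


print(ungapped_score("WAS", "WAT"))  # 11 + 4 + 1 = 16
print(ungapped_score("LE", "IV"))    # 2 + (-2) = 0
```

The conservative Ser/Thr and Leu/Ile pairs score positively, while the Glu/Val exchange seen in sickle cell hemoglobin scores negatively.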

In both cases above, we assign a high score to similar 
sequences in an alignment and a low score to dissimi- 
lar ones. We can then describe our alignment task as 
one of finding the alignment that maximizes this simi- 
larity score. Alternatively, we could describe the task 
as one of minimizing a distance score where identical 
sequences get a score of zero and nonidentical ones a 
score that reflects the divergence between them. This 
is of importance in the mathematical analysis of align- 
ments, but many biologists will be familiar with the 
similarity scoring schemes only. Either way, we must 
still consider how to score gaps. We can use a gap penalty
(GP) for each gap, which is subtracted from the
alignment score. The simplest scheme simply assigns
a fixed penalty (g) to each gap, regardless of its size.


$$GP = g$$


This is certainly simple and allows for some useful 
shortcuts when one writes a computer program to 
use it but it is very crude. It means an indel of three 
residues will get the same penalty as one of 100. A 
more realistic scheme assigns a penalty that is
proportional to the length (l) of the gap, where g is a
parameter that is normally set by the user.


$$GP = g\,l$$


This is still simple but at least makes long gaps more 
costly than short ones. One problem is that two gaps 
of length one get exactly the same score as a single gap 
of length two. A better scheme assigns a separate
penalty for opening a new gap and a lower penalty
for extending an existing gap. This can be achieved 
using linear or ‘affine’ gap penalties of the form: 


$$GP = g + h\,l$$


where GP is the penalty for a gap of length l, g is a
so-called gap opening penalty, and h is a gap extension
penalty. This scheme is now very commonly used,
thanks to a development by Gotoh (1982) which allows 
us to compute alignments using this scheme as quickly 
as with the simpler schemes. Other, more elaborate 
schemes have been proposed which have more com- 
plicated relationships between gap length and gap 
penalty (e.g., logarithmic) but these are not widely 
used. Terminal gaps are usually not penalized (i.e., 
gaps at the ends of alignments). It should be stressed 
that these gap scoring schemes are simple devices to 
allow us to control the lengths and numbers of gaps in 
alignments; they do not necessarily have any deep 
biological justification. 
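The three gap schemes can be compared directly. This sketch follows the text's formulas (GP = g, GP = gl, GP = g + hl); the parameter values are arbitrary illustrations, and note that some programs instead define the affine cost so that the first gapped residue costs g and each additional one costs h. The function names are hypothetical.

```python
def constant_gp(length: int, g: float) -> float:
    """GP = g: every gap costs the same, whatever its length."""
    return g


def proportional_gp(length: int, g: float) -> float:
    """GP = g * l: cost grows in proportion to gap length."""
    return g * length


def affine_gp(length: int, g: float, h: float) -> float:
    """GP = g + h * l: an opening cost plus a per-residue extension cost."""
    return g + h * length


def total_penalty(gap_lengths, scheme, *params) -> float:
    """Sum the penalties of all gaps in an alignment under one scheme."""
    return sum(scheme(length, *params) for length in gap_lengths)


two_short, one_long = [1, 1], [2]
# Proportional penalties cannot tell two 1-residue gaps from one 2-residue gap.
print(total_penalty(two_short, proportional_gp, 2), total_penalty(one_long, proportional_gp, 2))  # 4 4
# Affine penalties favor the single longer gap.
print(total_penalty(two_short, affine_gp, 10, 1), total_penalty(one_long, affine_gp, 10, 1))  # 22 12
# A fixed penalty treats an indel of 3 residues like one of 100.
print(constant_gp(3, 10) == constant_gp(100, 10))  # True
```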

Given, then, a set of values for all possible pairs of 
aligned residues, and given a gap scoring scheme, 
dynamic programming can be used to find an align- 
ment for a pair of sequences that is guaranteed to have 
the best possible score. There may be several or even 
many alignments with the same best score but we are 
guaranteed to find one of them. Whether or not
this alignment is also the best one in terms of the
evolution of the sequences, aligning homologous
residues with each other, will depend on
the parameters used and on the divergence of the se- 
quences. If the sequences are closely related then the 
best alignment will be easy to derive manually and a 
wide variety of parameter values will get the same 
answer. If the sequences are very highly divergent, 
then this will not necessarily be so. Nonetheless, 
dynamic programming is very extensively used in 
biological sequence analysis. 


Uses of Dynamic Programming 

The alignment of only two sequences is of limited use 
given the volumes of data that are available. Dynamic 
programming can, in principle, be extended to more 
than two sequences to produce a multiple alignment 
but it becomes computationally very expensive (e.g., 
the MSA program of Lipman et al., 1989). The com- 
puter time and memory requirements grow expo- 
nentially with the number of sequences and so one is 
limited to relatively small numbers of sequences. 
Multiple alignments are now very commonly derived 
by building up the overall alignment gradually, fol- 
lowing the branching order in an approximate phylo- 
genetic tree of the sequences (e.g., using the Clustal 
program of Thompson et al., 1994). 

Perhaps the most common use of dynamic pro- 
gramming is in database similarity searches. This is 
where you take a query sequence and try to find 
any sequences in the database that are similar to it. 
A variation of dynamic programming called the best 
local alignment algorithm (Smith and Waterman, 
1981) can be used to find any segments of the query 


sequence that have a high alignment score with any 
segment of a sequence from the database. You use 
gap penalties and a weight matrix as with normal 
dynamic programming and the alignment scores are 
used to rank the sequences in order of greatest simi- 
larity with the query. The familiar BLAST program 
(Altschul et al., 1997) can be considered to be a very 
fast approximation to the best local alignment algo- 
rithm and is the most widely used for routine searches. 
Searches can be iterated once some homologous se- 
quences have been found. A multiple alignment of the 
related sequences can be constructed and used to 
search for further, more distantly related homologs 
in a process called profile searching (Gribskov et al., 
1987). This is done with the very powerful PSI- 
BLAST program. 
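The best local alignment algorithm differs from global dynamic programming mainly in clamping scores at zero, so that an alignment can start and end anywhere. The sketch below computes only the Smith-Waterman score, assuming a simple match/mismatch scheme and a per-residue gap penalty, and ranks the entries of a hypothetical three-sequence "database" against a query; it is an illustration, not how BLAST works internally.

```python
def smith_waterman_score(a: str, b: str, match: int = 2, mismatch: int = -1,
                         gap: int = 2) -> int:
    """Best local alignment score between a and b (per-residue gap penalty)."""
    best = 0
    prev = [0] * (len(b) + 1)  # previous row of the scoring matrix
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            # Clamping at zero lets a local alignment begin at any cell.
            curr[j] = max(0, prev[j - 1] + s, prev[j] - gap, curr[j - 1] - gap)
            best = max(best, curr[j])
        prev = curr
    return best


# Hypothetical mini-database: rank entries by local similarity to a query.
query = "ACGTACGT"
database = {"seq1": "TTTTACGTACTT", "seq2": "GGGGGGGG", "seq3": "ACGAACGT"}
ranked = sorted(database,
                key=lambda name: smith_waterman_score(query, database[name]),
                reverse=True)
print(ranked)
```

Only the best-scoring segment pair counts, so unrelated flanking sequence (the leading and trailing T residues of "seq1") does not lower the score.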


Alternative Alignment Methods 

All that we have seen so far is based on taking simple 
scores for matches and mismatches and gaps and using 
these to find high-scoring alignments. There is no 
profound reason why these high-scoring alignments 
have to be perfect biologically. We use these methods 
because they are simple to encode in computer pro- 
grams and fast. Ideally, we might like to use methods 
that have a deeper biological or statistical significance 
but these require considerable mathematical sophisti- 
cation. One set of techniques that are particularly 
powerful are hidden Markov models (HMMs) which 
can be used to mirror all of the methods that we have 
discussed so far, but using probabilities rather than 
simple scores (e.g., Krogh et al., 1994). In some cases 
these methods can give greater sensitivity or accuracy 
than the more conventional dynamic programming 
approaches, but in other cases the reverse is true. 
Nonetheless, these more rigorous methods offer 
great scope for future work. 
Alkaptonuria is a rare, hereditary metabolic disorder
in which the metabolism of the amino acids tyrosine
and phenylalanine is defective. Both of these aromatic
amino acids are normally metabolized via homogen- 
tisic acid (HGA) to fumaric acid and acetoacetic 
acid in the liver (Figure 1). In contrast, alkaptonuric 
individuals are deficient in the critical enzyme, homo-
gentisic acid oxidase (HGO). This enzymatic defi-
ciency causes an accumulation of HGA in their tissues
and the excretion of several grams of HGA per day in 
the urine. Alkaptonuric patients show three character- 
istic features: the excretion of HGA in the urine; a 
yellow (ocher) pigmentation of their connective tissues 
(joints and tendons); and, in later years, arthritis of the 
larger, weight-bearing joints (hips, knees, shoulders, 
and the lower spine). The exact relationships between 
the accumulation of HGA, tissue ochronosis, and 
arthritis are still not completely understood. 

The essentially complete deficiency of HGO activ- 
ity is inherited as an autosomal recessive disorder. 
Over 30 distinct genetic mutations have been discov- 
ered over the past few years as being responsible for 
structural abnormalities in the enzyme. This means 
that although specific DNA tests can now be devel- 
oped to determine whether or not a person is carrying 
one of the specific HGO gene mutations, we still do 
not have a general test for detecting heterozygous 
carriers of alkaptonuria. 

The urine containing HGA turns dark slowly on 
exposure to oxygen and alkaline conditions, but may 
not be an abnormal color when first excreted. Thus, 
the diagnosis may be missed until adulthood, when 
operations are performed on the knee, for example, and
black, pigmented fragments of the knee cartilage are 
noticed. Unusual X-rays of the lower spine also may 
suggest alkaptonuria because of the characteristic 
degeneration of the intervertebral disks with a nar- 
rowing of the space between the disks and calcifica- 
tion of the intervertebral material. 

Alkaptonuria is a lifetime condition that slowly 
leads to arthritis when the person reaches middle-age, 
or many years earlier if the person engages in very hard labor,
such as coal mining. The condition does not shorten 
the lifespan of affected people but the ochronotic 
arthritis may be disabling and severe for the large 
joints and spine. It appears that the collagen fibers 
of the connective tissues of the joints and tendons 
are more fragile in alkaptonuric subjects, so shearing
forces tend to break the collagen fibers and cause 
disruption and erosion of the protective joint surfaces. 
It has been proposed that hydroxylysine residues, 
important in cross-linkage of collagen fibers, are defi- 
cient in alkaptonuric connective tissues, presumably 
because of the abnormal metabolites preventing the 
conversion of lysine to hydroxylysine residues. 

Little can be done to reverse these pathological 
changes once they have taken place, so the objective 
of treatment at this time is to reduce the rate and extent 
of the joint pathology by avoiding physical stress to 
the large joints as much as possible. Vitamin C (ascor- 
bic acid) is being tried at higher levels (a gram or more 
per day) than those required for preventing scurvy. 
This treatment should reduce the degree of tissue 
pigmentation and may retard the progression of the 
arthritic complications. A mouse model of alkapto- 
nuria was discovered at the Pasteur Institute in 1994; 
these animals excrete HGA in their urine but they 
do not develop the arthritic complications. Mice, like 
most animals, are able to synthesize their own ascor-
bic acid, and they maintain much higher tissue concentra-
tions of the vitamin than the usual, normally adequate,
human diet achieves. Thus, it
appears that increasing the tissue level of vitamin C to 
appreciably higher than usual concentrations might 
prevent the development of arthritis in alkaptonuric 
patients. 

Treatment of alkaptonuria by replacing the missing 
enzyme using genetic engineering may be a reasonable 
future expectation, but there are special reasons to be 
cautious about this therapeutic direction for this par- 
ticular metabolic disorder. It would be potentially 
dangerous to generate the subsequent intermediary 
compounds of the tyrosine metabolic pathway from 
HGA, i.e., maleylacetoacetic acid and fumarylaceto-
acetic acid (FAA) (see Figure 1) in any tissues unable
to dispose of them efficiently. There is another much 
more serious hereditary disease, called tyrosinemia,
which is caused by a deficiency of FAA hydrolase.
These patients have severe liver cirrhosis, kidney fail- 
ure, and neurological disturbances. Careful studies 
will need to be undertaken to ensure that such poten- 
tially toxic HGA metabolites would not accumulate 
in the wrong tissues by the introduction of the gene 
for HGO in alkaptonuric patients. 


See also: Phenylalanine; Tyrosine 
Deoxyribonucleic acid (DNA) is sensitive to damage
by chemical alkylating agents, which are either generated
endogenously or present in our natural environment.
Sites on the DNA molecule most susceptible to
alkylation-induced modification include the N7, O6, 
and N3 positions of guanine (G), the N3 position of 
adenine (A), the O4 position of thymine (T), and the
O2 position of both cytosine (C) and thymine. The
phosphate residues that make up the backbone of
the DNA molecule are also readily alkylated to form
alkylphosphotriesters. Out of the dozen or so lesions
produced by alkylators, O6-methylguanine (O6-MeG)
and O4-methylthymine (O4-MeT) are responsible for
generating the majority of mutations in both prokaryotes
and eukaryotes, including mammals. O6-MeG preferentially
pairs with thymine during DNA replication and
O4-MeT preferentially pairs with guanine; subsequent
replication results in G:C to A:T and A:T to G:C transition
mutations, respectively. Mutagenesis by O6-MeG and
O4-MeT is largely prevented by the specificity and
catalytic efficiency of the DNA alkyltransferase repair
proteins that directly dealkylate these DNA lesions.
DNA alkyltransferases preferentially repair the
O6-MeG and O4-MeT lesions present in double-
stranded DNA by catalyzing the irreversible transfer
of the alkyl group to a specific cysteine residue of 
the alkyltransferase protein, forming S-methylcysteine 
and regenerating the normal guanine and thymine bases 
in DNA (i.e., the protein is consumed in the reaction). 
The first DNA alkyltransferase to be discovered 
was the C-terminal fragment of the Escherichia 
coli Ada (39kDa) protein. Ada contains two alkyl- 
accepting cysteine residues, one situated in the 
N-terminal and the other in the C-terminal half of 
the protein, which are held together by a protease-sensitive,
so-called asparagine (Asn) hinge region.
The C-terminal cysteine residue (Cys321) accepts 
the alkyl group from the O6-MeG/O4-MeT bases,
while the N-terminal active cysteine (Cys69) accepts 
the alkyl group from nonmutagenic alkylphosphotrie- 
sters formed in DNA, but notably only from the 
S-diastereoisomers. When Ada Cys69 is alkylated,
Ada undergoes a conformational change that enables 
it to act as a strong transcriptional activator for a group 
of genes (i.e., ada, alkB, alkA, and aidB) whose pro- 
ducts further protect E. coli from alkylation-induced 
DNA damage. E. coli has a second alkyltransferase, 
Ogt (encoded by the ogt gene), which is constitutively


expressed and only accepts alkyl groups from alkyl- 
ated DNA bases (i.e., at the Cys139 residue in the 
C-terminal half of the protein). 

Alkyltransferases have been found in a variety of 
different organisms, including yeast, insect, rodent, 
fish, and human; this fact alone underscores the 
importance of alkylating agents as a source of DNA 
damage. There is considerable amino acid sequence 
similarity among the known alkyltransferases. For 
example, the cysteine residue that accepts an alkyl 
group from the O6-MeG/O4-MeT residues is con-
tained in the conserved amino acid sequence
PCHRI/V (i.e., Pro, Cys, His, Arg, Ile/Val), while
the active cysteine of those alkyltransferases that 
accept alkyl groups from alkylphosphotriesters (i.e., 
E. coli Ada, Bacillus subtilis AdaB, and Salmonella 
typhimurium Adas) is contained in the conserved 
FRPCKR (Phe, Arg, Pro, Cys, Lys, Arg) sequence. 
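Because the two alkyl-acceptor sites lie in short conserved sequences, they can be located in a protein sequence by simple pattern matching. The sketch below encodes PCHR[I/V] and FRPCKR as regular expressions and reports 0-based positions of each motif and its cysteine; the toy sequence is hypothetical, and a motif match by itself is only a hint of function, not proof.

```python
import re

# Conserved alkyl-acceptor motifs from the text, as regular expressions.
MOTIFS = {
    "O6-MeG/O4-MeT acceptor (PCHR[I/V])": r"PCHR[IV]",
    "alkylphosphotriester acceptor (FRPCKR)": r"FRPCKR",
}


def find_acceptor_sites(protein: str) -> list[tuple[str, int, str, int]]:
    """Return (motif name, 0-based motif start, matched text, 0-based index
    of the motif's cysteine) for every motif occurrence in the sequence."""
    hits = []
    for name, pattern in MOTIFS.items():
        for m in re.finditer(pattern, protein):
            cys_index = m.start() + m.group().index("C")
            hits.append((name, m.start(), m.group(), cys_index))
    return hits


# Hypothetical toy sequence containing one copy of each motif.
toy = "MKTFRPCKRGGSAVPCHRVLE"
for hit in find_acceptor_sites(toy):
    print(hit)
```

Residue numbers such as Cys321 or Cys69 are conventionally 1-based, so 1 must be added to the indices reported here when comparing with published numbering.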

Crystal structures of alkyltransferases from E. coli 
(the 19-kDa C-terminal Ada fragment, Ada-C), Pyro-
coccus kodakaraensis (Pkat), and human (hAGT) have
recently been determined. Studying the crystal struc- 
tures of Ada-C and hAGT, Madeleine Moore and 
colleagues and Robin Vora and colleagues have pro- 
posed two somewhat different models to describe 
how alkyltransferases might recognize and transfer 
the alkyl group from alkylated DNA lesions. Moore’s 
group initially determined the crystal structure of 
Ada-C and observed that the active alkyl-acceptor 
cysteine is buried within the protein, and thus is not 
properly positioned to make favorable contacts with 
the alkylated base in duplex DNA. Therefore, Moore’s 
group suggested that a conformational change in alkyl- 
transferase proteins is required to expose the active 
cysteine to the target alkylated DNA base. Given that 
Ada-C contains a helix-turn-helix (HTH) motif, which
is characteristic of DNA-binding proteins, they sug- 
gested further that Ada-C most likely binds the major 
groove of the DNA molecule via the second helix (i.e., 
the so-called recognition helix) and that the C-terminal 
helix is then swiveled about the DNA molecule such 
that the active cysteine site of the protein is exposed to 
the alkylated DNA base substrate whose alkyl group 
protrudes into the DNA major groove. Thus, this 
model requires a gross change in the conformation of 
the Ada-C protein and not the DNA molecule. 

By contrast, Vora’s group suggested that alkyltrans- 
ferases bind to the DNA major groove, but instead 
induce the ‘flipping out’ of the target alkylated base 
into the binding pocket containing the active site 
cysteine, a process which might be initiated via the 
insertion of an amino acid residue(s) into the DNA 
helix. They suggested that this could be initiated by 
the arginine residue that is contained in the conserved 
sequence, RAV[A/G] (present in the recognition 
helix of the HTH motif of all the known alkyltrans-
ferases), and might also be important for stabilizing
the extrahelical (displaced) nucleotide or even for 
forming hydrogen bonds with the unpaired (orphan) 
base. In support of the Vora model, results of bio-
chemical studies with hAGT show that replacing
Arg128 with lysine, which apparently can still enter
the major groove of DNA but has reduced hydrogen- 
bonding capacity, greatly diminishes the repair activ- 
ity of the hAGT protein. 

It is still unclear which of the two models men- 
tioned above most accurately describes the repair 
activities of DNA alkyltransferases. Further crystal
structures of the bacterial and human alkyltransferase 
proteins will no doubt help us to better understand 
precisely how DNA alkyltransferases recognize, bind, 
and repair alkylation DNA damage. 


See also: DNA Repair 
Allele frequency (also called gene frequency) is the
term used to describe the fraction of gene copies that 
are of a particular allele in a defined population. Let us 
consider, for example, a population of 100 diploid 
individuals. Each individual carries two copies of 
each gene, so there are a total of 200 gene copies in 
the population of 100 people. Now let us say that 20 
individuals in this population are heterozygous for 
allele A (with a second allele of some other type), 
and 5 individuals are homozygous for allele A. Each 
homozygote would contribute two copies of the allele 
toward the total fraction, while each heterozygote 
would only contribute one copy toward the total 
fraction. So the total number of A alleles in the popu-
lation would be 20 + (2 × 5) = 30. The allele
frequency is this number divided by the total
number of gene copies: 30/200 = 0.15.
Allele frequencies can always be
determined in this way when the numbers of homo- 
zygotes and heterozygotes in a population are known. 
When heterozygotes cannot be distinguished because 
an allele expresses a recessive trait, it is still possible to 
use Hardy-Weinberg statistics (see Population Genet-
ics) to estimate the allele frequency if certain assump-
tions about breeding practices are made.
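The counting rule in the worked example, and the Hardy-Weinberg shortcut for a recessive allele, can be written out directly. This is a minimal sketch: the second function assumes random mating and a population at Hardy-Weinberg equilibrium, so that the fraction of individuals showing a recessive trait is expected to be q², and the function names and the 1-in-400 example frequency are hypothetical.

```python
import math


def allele_frequency(n_homozygotes: int, n_heterozygotes: int, n_individuals: int) -> float:
    """Fraction of gene copies that are allele A in a diploid population.
    Each A/A homozygote carries two copies of A, each heterozygote one."""
    copies_of_a = 2 * n_homozygotes + n_heterozygotes
    total_copies = 2 * n_individuals
    return copies_of_a / total_copies


def recessive_allele_frequency(affected_fraction: float) -> float:
    """Hardy-Weinberg estimate of a recessive allele's frequency q, taking the
    fraction of individuals showing the recessive trait as q**2.
    Assumes random mating and Hardy-Weinberg equilibrium."""
    return math.sqrt(affected_fraction)


print(allele_frequency(5, 20, 100))         # 0.15, as in the worked example
print(recessive_allele_frequency(1 / 400))  # hypothetical trait frequency of 1 in 400
```

If the equilibrium assumptions fail (for example, with inbreeding), the square-root estimate can be badly biased, which is why the text stresses that it depends on assumptions about breeding practices.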


See also: Hardy-Weinberg Law; Population 
Genetics 
An allele is one of a series of alternative forms of
a gene. On rediscovery of Mendelian principles in 
the early 1900s, it was recognized that many pheno- 
typic traits of both plants and animals are governed by 
‘unit-characters,’ one such character being inherited 
from each parent. William Bateson proposed the term 
allelomorph to describe each of the inherited unit- 
characters, with differing allelomorphs leading to dif- 
fering phenotypic traits (Bateson and Saunders, 1902). 
Today we know Bateson’s unit-characters as ‘genes’ and 
use the shortened term ‘allele’ in lieu of allelomorph. 
In contemporary genetics, ‘allele’ refers more generally
to any DNA sequence variation located at equivalent 
positions of homologous chromosomes, regardless of 
whether that variation influences phenotypic traits. 
For example, many variant DNA sequences are lo- 
cated in noncoding DNA and are detected only using 
molecular genetic techniques. Such alternative DNA 
sequences are nevertheless termed alleles, with the 
alternative forms providing molecular markers for 
that region of DNA. A gene or molecular marker that 
exhibits allelic variation is often termed a locus (plural 
loci). 

Diploid organisms contain two alleles at each locus 
in all somatic cells, having inherited one from each 
parent at fertilization. When those alleles are identical, 
the individual is said to be homozygous for that locus 
(noun form homozygote). When those alleles are non- 
identical, the individual is said to be heterozygous 
(noun form heterozygote). When a normally diploid 
organism contains only a single copy of a chromo- 
somal region, such as occurs in male mammals for 
sequences located on the X chromosome, the individ- 
ual is said to be hemizygous (noun form hemizygote) 
for loci of that region. 

Alleles of diploid organisms are classified as being 
dominant, codominant, or recessive by comparing the 
phenotypes of heterozygotes and homozygotes. If the 
phenotype of a heterozygote is identical to that of one 
of the homozygotes, the allele whose phenotype is 
evident in the heterozygote is said to be dominant. 
The alternative allele, whose phenotype is not evident 
in the heterozygote, is said to be recessive. If the 
phenotype of a heterozygote exhibits properties of 
both homozygotes, as occurs for example in the 
DNA of individuals heterozygous for molecular 
markers, the alleles are said to be codominant. 


Although a single individual contains only two 
alleles of a locus, populations of individuals can 
collectively contain many different alleles of that 
locus. Such loci are said to be polymorphic or to 
exhibit multiple alleles, each allele being denoted by 
a standardized nomenclature (e.g., a¹, a², a³, a⁴, etc.).
Any specific allele constitutes a fraction of all alleles 
in the population, that fraction being termed its allele 
frequency. Allele frequencies change over time owing 
to the combined action of mutation, natural selection, 
genetic drift, and other processes. Alleles are in many 
ways the fundamental units of inherited variation. 
In sexually reproducing species, alleles separate from 
their partners during meiosis and are passed from 
generation to generation via the gametes. New com- 
binations of alleles and, hence, new genotypes in 
the population are created each generation by segre- 
gation and recombination of alleles during meiosis 
and by the random union of gametes during fertil- 
ization. 

In laboratory investigations of genetically manipu- 
lated organisms, one or a small number of ‘normal’ or 
wild-type strains are defined as a standard relative to 
which other strains are compared. Such strains are 
usually inbred and contain little if any allelic variation, being homozygous at nearly all loci. In such an unchanging genetic background, inheritance of new
mutations can dramatically and consistently alter the 
phenotype of the organism. Such mutant alleles are 
a valuable resource for investigating biological phe- 
nomena. By altering or eliminating the components 
of cells, mutant alleles yield insights into how those 
components function. 

Deducing explicit molecular mechanisms from ana- 
lysis of mutants, however, requires careful considera- 
tion of the molecular nature of the alleles involved. 
Loss-of-function alleles are those that reduce or elim- 
inate the quantity or quality of an encoded protein. 
Such alleles are usually, but not always, recessive. Gain- 
of-function alleles are those in which the encoded 
protein acquires a new or aberrant property, such as 
being expressed in elevated quantities, functioning in 
an unregulated manner, or interfering with the func- 
tion of other genes. Gain-of-function alleles are usually, 
but not always, dominant. For most genes, loss-of- 
function alleles are more common than gain-of- 
function alleles, but explicit genetic and/or molecular 
tests are needed to establish whether an allele is a gain- 
or loss-of-function one. This is especially true when 
only one or a small number of alleles are available for 
investigation, as molecular interpretations are very 
different depending on the nature of the alleles 
involved. Mechanistic interpretations are most precise 
when the exact sequences of the alleles are known. For example, nonsense alleles alter a gene sequence such that a translation-terminating stop codon is introduced into its mRNA, while missense alleles alter a gene sequence such that one amino acid is substituted with another in the protein. Mutations that completely eliminate an encoded protein are termed null alleles and are particularly useful for understanding the function of a gene.
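The difference between nonsense and missense alleles comes down to how a changed codon is read under the genetic code. The sketch below classifies a single-codon change using the standard genetic code (NCBI translation table 1). It is illustrative only: the function name is hypothetical, it assumes the reading frame is already known, and it adds 'synonymous' and 'stop-loss' categories that the text does not discuss.

```python
# Standard genetic code, codons ordered with bases T, C, A, G at each position
# (first position varies slowest). '*' marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    first + second + third: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, first in enumerate(BASES)
    for j, second in enumerate(BASES)
    for k, third in enumerate(BASES)
}


def classify_point_mutation(ref_codon: str, alt_codon: str) -> str:
    """Classify the effect of changing ref_codon to alt_codon (DNA coding strand)."""
    ref_aa = CODON_TABLE[ref_codon.upper()]
    alt_aa = CODON_TABLE[alt_codon.upper()]
    if alt_aa == ref_aa:
        return "synonymous"   # same amino acid (or stop -> stop)
    if alt_aa == "*":
        return "nonsense"     # a premature stop codon is introduced
    if ref_aa == "*":
        return "stop-loss"    # the normal stop codon is lost
    return "missense"         # one amino acid replaced by another


print(classify_point_mutation("GAG", "GTG"))  # missense (Glu -> Val, the sickle-cell change)
print(classify_point_mutation("TGG", "TAG"))  # nonsense (Trp -> stop)
print(classify_point_mutation("GAG", "GAA"))  # synonymous (Glu -> Glu)
```

Whether a missense allele behaves as a loss- or gain-of-function allele cannot be read off the codon change. That still requires the genetic and molecular tests described above.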


Allelic exclusion refers to the process by which a cell
involved in the immune response expresses just one of 
the two alleles it carries for a particular immuno- 
globulin gene or T-cell receptor gene. Allelic exclusion 
is a random process that occurs independently in dif- 
ferent cells of the immune system. Allelic exclusion 
allows each individual immune system cell to maintain 
specificity for a particular antigen. 


See also: Immunoglobulin Gene Superfamily 


Allopatric is a term used to describe populations or species that occupy mutually exclusive (nonoverlapping) geographic areas. Two forms of allopatry are distinguished by their origin. When allopatry results from the separation (split) of an extensive species range, the separated populations are said to be dichopatric; when a peripherally isolated population has been established by a founder population, it is said to be peripatric.


See also: Speciation 


Many proteins have two or more different, nonoverlapping ligand-binding sites. In the case of enzymes,
one of these, the active site, binds the substrate and is 
responsible for the biological activity of the protein. 
The other site, or allosteric site, is specific for the 
structure of some other metabolite, the allosteric
effector. The effector molecule can have a positive or 
negative effect on the rate of reaction or the binding 
affinity at the substrate site. When the protein binds 
the allosteric effector, it changes its geometry and 
undergoes an allosteric transition. Today allostery 
refers to almost any action transmitted through a 
macromolecule. 


Origins 


J. Monod, J.-P. Changeux, and F. Jacob introduced
the idea of allosteric proteins for control of cellular 
metabolism in 1963. Their original examples involved 
the biosynthetic pathways of several amino acids 
where end-product inhibition was observed. They 
distinguished between steric inhibition, where a sub- 
strate analog may inhibit enzymes by interacting with 
the active site, and end-product inhibition of an 
enzyme at the start of a pathway. In the latter, inhibitors have little or no steric resemblance to the substrate for the first step in a multistep pathway. This difference in geometry between substrate and inhibitor, implying two distinct sites of action, gave rise to the concept of allosteric proteins (Greek allos, other + stereos, solid or space). Not surprisingly, the final example of a use for allostery in that 1963 publication by Monod, Changeux, and Jacob was gene regulation; Jacob and Monod had introduced their operon theory in 1961.

Two years later, in 1965, Monod, Wyman, and 
Changeux expanded on the idea of the allosteric pro- 
teins and introduced a model dependent on multi- 
meric proteins and the constraints of maintaining 
interactions among subunits as individual subunits 
undergo allosteric transitions. There is a vast literature on the nature of the allosteric transition and the propagation of the allosteric signal from the effector binding site to the substrate binding site. Observed cases fall between two limiting descriptions: the symmetry (concerted) model and the sequential model. In part, this literature grew because so few protein structures were known. The structure of lysozyme, a monomer and the first enzyme structure to be determined, was solved in 1965; hemoglobin was then the only multimeric protein of known structure. Since then, the structures of numerous proteins have been investigated in great detail in connection with allostery, both in the absence and in the presence of substrates and effectors.
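The symmetry (concerted) model of Monod, Wyman, and Changeux has a standard closed form for the fraction of ligand sites occupied. Each subunit is either in a low-affinity T state or a high-affinity R state, and all subunits switch together. The sketch below evaluates that expression. The default parameters (four subunits, L = 1000, c = 0.01) are illustrative assumptions, not measured values for hemoglobin or any other protein.

```python
def mwc_fractional_saturation(s: float, n: int = 4, L: float = 1000.0,
                              c: float = 0.01, K_R: float = 1.0) -> float:
    """Fraction of ligand sites occupied in the MWC (concerted) model.

    s   : free ligand concentration (same units as K_R)
    n   : number of identical subunits (binding sites)
    L   : [T0]/[R0], ratio of T to R conformers with no ligand bound
    c   : K_R / K_T, ratio of ligand dissociation constants for R and T states
    K_R : dissociation constant for ligand binding to an R-state site
    """
    a = s / K_R  # ligand concentration normalized to K_R (often written alpha)
    numerator = a * (1 + a) ** (n - 1) + L * c * a * (1 + c * a) ** (n - 1)
    denominator = (1 + a) ** n + L * (1 + c * a) ** n
    return numerator / denominator


# A sigmoidal (cooperative) saturation curve with the illustrative parameters:
for s in (0.1, 1.0, 10.0, 30.0, 100.0, 1000.0):
    print(s, round(mwc_fractional_saturation(s), 3))
```

Setting L = 0 removes the T state and reduces the expression to simple hyperbolic binding, a/(1 + a). With a large L and a small c, the curve becomes sigmoidal, which is the signature of the homotropic cooperativity discussed for hemoglobin.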


Specific Examples 


Historically, hemoglobin, being one of the first two 
proteins whose structure was determined, served as a 
source of information about the nature of allostery. It 
has both homotropic allosteric effects, i.e., interactions among identical ligands, and heterotropic ones. The effects are ‘homotropic’ because, as each subunit binds an oxygen molecule, the affinity of the next subunit for oxygen increases (also called ‘cooperativity’). They are ‘heterotropic’ because pH and 2,3-bisphosphoglycerate affect O2 binding. The other classic example is aspartate transcarbamoylase, which is notable for having 12 subunits, 6 catalytic and 6 regulatory.

Among the gene regulatory proteins, the structures of the repressors for the lactose operon, purine biosynthesis, and the tryptophan operon of Escherichia coli have been determined with and without small-molecule effectors and DNA. In these proteins, the small-molecule effector modifies the binding affinity of the protein for operator DNA rather than the conversion of substrate to product. These molecules illustrate the propagation of long-range structural changes across the protein.

The term ‘allosteric’ or ‘allostery’ is used today to 
refer to almost any consequent action at a distance 
that involves macromolecules and interaction with 
ligands. For example, the activation of events inside the cell by cell-surface receptors is often referred to as ‘allostery.’


An allotype is an allelic form of an immune system
protein that can induce an immunological response 
when introduced into another animal of the same spe- 
cies that does not carry the allotypic allele. An allotype 
is inherited through the germline and is present on all 
polypeptides expressed from the particular allele 
within an individual. In contrast, an idiotype is an 
antigenic entity that is present only on a small subset 
of proteins expressed by a particular individual. 


See also: Immunity 


α1-antitrypsin (α1AT) deficiency is an autosomal recessive disorder in which the plasma concentration of a major protease inhibitor is reduced to about 15% of normal. The condition occurs in about 1 in 2000 to 1 in 7000 people of white European origin and is rare in people of African or Asian origin.

α1AT is a major inhibitor of proteases and therefore plays an important role in the control of tissue destruction. α1AT is a member of a family of serine protease inhibitors, or serpins. A major consequence of the deficiency is emphysema, with particularly early onset in smokers. A small percentage of individuals with this deficiency develop liver disease in infancy. The deficiency is also associated with liver disease in adults.

α1AT is one of the more plentiful plasma proteins, present at a concentration of 1.3 g per liter. The plasma concentration varies according to genetic type, or PI (protease inhibitor) type. Plasma concentration can be measured by either immunological or functional methods. α1AT is an acute phase protein and can show a marked increase in concentration during infection, in cancer, and in liver disease. Modest increases are induced by estrogen during pregnancy or when estrogen is administered as therapy. The deficient condition is generally considered to exist when the level is about 25% or less of the normal concentration.


Genetic Variation 


α1AT shows considerable genetic variability, with more than 70 different genetic variants, called PI types. These variants are identified using isoelectric focusing, which separates them according to their charge. Many of the genetic variants have also been sequenced.

The deficiency is inherited as an autosomal recessive trait. The most common deficiency allele is PI type Z. PI ZZ homozygotes have about 15 to 20% of the normal plasma concentration of α1AT. The Z protein is visible by isoelectric focusing and shows both a reduced concentration and a more basic (higher) isoelectric point. About 95% of α1AT deficiency is due to the Z allele. This allele is particularly common in northern European populations: ZZ homozygotes occur in about 1 in 2000 people in Scandinavian countries and about 1 in 7000 North Americans of European ancestry, but the allele is essentially absent from African and Asian populations. The plasma deficiency is due to lack of secretion of the Z-type protein from liver cells. Z α1AT tends to self-aggregate and forms insoluble inclusions within liver cells. Several other rare variants, including Mmalton, also show this tendency to self-aggregate.

The remaining 5% or so of deficiency cases are due to more than 10 rare deficiency alleles, which are found in individuals of all ancestries. These variants have a variety of causes, including early truncation of the protein and instability of the mRNA.
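The incidence figures quoted for the deficiency can be turned into allele and carrier frequencies with Hardy-Weinberg arithmetic. The sketch below assumes Hardy-Weinberg equilibrium. It also makes the simplifying assumption that the 1 in 2000 and 1 in 7000 figures are the frequency of ZZ homozygotes (q²), ignoring the small share of cases due to other alleles. The function name is hypothetical.

```python
import math


def hardy_weinberg_from_recessive_incidence(incidence: float) -> tuple[float, float]:
    """Given the frequency of recessive homozygotes (q**2), return
    (allele frequency q, heterozygote/carrier frequency 2pq), assuming
    Hardy-Weinberg equilibrium."""
    q = math.sqrt(incidence)
    p = 1.0 - q
    return q, 2 * p * q


for label, incidence in (("1 in 2000", 1 / 2000), ("1 in 7000", 1 / 7000)):
    q, carriers = hardy_weinberg_from_recessive_incidence(incidence)
    print(f"ZZ incidence {label}: q = {q:.4f}, carriers = {carriers:.4f} "
          f"(about 1 in {1 / carriers:.0f})")
```

Under these assumptions, the Z allele frequency comes out at roughly 0.022 and 0.012. Heterozygous carriers would then be about 1 in 23 and 1 in 42 people, many times more common than affected homozygotes.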


The Gene 


The gene for α1AT is located on human chromosome 14 (14q32) within a cluster of genes of similar sequence, which includes those for corticosteroid-binding globulin, α1-antichymotrypsin, protein C inhibitor, and kallistatin. The gene is 12.2 kb in length, including a 1.4 kb coding region and six introns. Somewhat different transcripts of the gene are expressed in different tissues: three forms produced in monocytes and one produced in the cornea differ from those produced in the liver. Each transcription start site has its own promoter regulating tissue-specific expression.


Disease Predisposition


Lung Disease 

α1AT inhibits a variety of serine proteases, but its major physiological target is neutrophil elastase, particularly in the lower airways. A deficiency of α1AT results in a protease/protease inhibitor imbalance in the lung, which allows destruction of the walls of the alveoli (air sacs). This tissue destruction occurs mainly in the bases of the lungs. In nonsmokers, the onset of shortness of breath occurs at a mean age of 45 to 50 years. Smoking has a major effect on both the age of onset of pulmonary symptoms and the course of deterioration.


Liver Disease 

The major effects of α1AT deficiency are in the lung; however, the liver can also be damaged. Signs of liver abnormality in infancy appear in about 17% of all individuals with α1AT deficiency. However, most patients recover, and only 2 or 3% of those with the deficiency develop early progressive liver disease.


Other Diseases 

α1AT appears to be involved in regulation of the
immune system. A deficiency is associated with a 
variety of disorders with an immune component 
including glomerulonephritis, causing impairment of 
the kidney; panniculitis, inflammation of the fat layer 
immediately under the skin; and rheumatoid arthritis. 


Therapy 


The most effective approach to preventing tissue destruction is to avoid smoking. Infusion of purified α1AT is being used, but the extent of the benefit is not clear. Aerosol administration of purified α1AT may become feasible.


Alpha (α)-fetoprotein (AFP) is the major protein in
the serum and amniotic fluid of mammalian fetuses. 
AFP synthesis occurs primarily in the yolk sac and 
fetal liver and to a much lesser extent in the fetal gut. 
AFP levels rapidly decline at birth due to the loss of 
the yolk sac and greatly reduced expression in the liver 
and gut. AFP has multiple activities and its biological 
role is not fully understood. AFP appears to function 
primarily as a binding and transport protein for 
numerous molecules including estrogen, fatty acids, 
and steroids, activities which might allow AFP to 
control cell proliferation and differentiation in the 
developing fetus. AFP may contribute to the osmotic 
pressure of intravascular fluids by binding several 
divalent cations. Studies suggest that AFP may be 
immunosuppressive and therefore protect the de- 
veloping fetus from the maternal immune system, 
although this idea remains controversial. 

AFP was first identified in 1956 in the fetal serum 
of humans. Several years later, mouse and human 
studies revealed that elevated adult serum AFP levels 
correlated with the presence of liver cancer. Since 
then, increased postnatal serum AFP levels have 
also been associated with other cancers including 
some of the gastrointestinal tract and germ cell tumors 
of the ovary and testes. Thus, AFP is classified as an 
oncofetal protein: a protein that is synthesized during 
fetal life, normally absent in adults, and resynthesized 
in tumors. Simple, sensitive antibody-based assays to
monitor serum AFP levels exist. Consequently, AFP 
is commonly used as a diagnostic marker for certain 
types of tumors. AFP has also been extensively used as 
a marker for prenatal testing. Elevated maternal serum 
AFP levels are associated with neural tube defects in 
the developing fetus, whereas low AFP levels have 
been associated with trisomy 21 (Down syndrome). 
Elevated AFP levels may result from nonneoplastic 
liver diseases such as viral hepatitis and alcohol or 
drug-induced liver damage. Also, several families 
have been identified in which there is an incomplete 
shut-off of AFP at birth (termed hereditary persistence 
of AFP). Therefore, AFP screening as a diagnostic 
tool for cancers or birth defects must be interpreted 
with caution. 

AFP in various species is a single-chain glycoprotein of about 590 amino acids; heterogeneity in the extent of carbohydrate moieties results in molecular weights of 67,000-74,000 daltons. The AFP gene in mice and humans is composed of 15 exons that span roughly 22 kb of DNA and encodes an mRNA of 2.2 kb. The human and mouse AFP genes are located on chromosomes 4 and 5, respectively. AFP
is evolutionarily related to the albumin gene, which 
encodes the major serum protein in adult mammals. 
These two genes presumably arose from a duplication 
of an ancestral gene 300-500 million years ago. Two 
additional genes, α-albumin and the vitamin D binding protein, are also evolutionarily related to albumin
and AFP. Interestingly, all four of these genes have 
remained tightly linked during evolution in several 
different species. One possible reason for the con- 
served linkage is that the members of this small multi- 
gene family share common mechanisms of gene 
regulation. 

The patterns of AFP synthesis have engendered 
considerable interest in the AFP gene as a model 
for developmental and tissue-specific transcriptional 
regulation; this question has been studied most extensively in the laboratory mouse. AFP is transcribed in the yolk sac visceral endoderm, as are many other liver genes. The AFP gene is transcribed at
high levels in hepatocytes as soon as they can be 
detected and continues to be expressed in these cells 
during fetal development. At birth, AFP transcription 
is dramatically reduced in the liver and gut to levels 
that are extremely low by 4 weeks of age. However, 
AFP transcription can be transiently activated in the 
liver during regeneration that occurs in response to 
injury. Cis-acting elements that regulate AFP tran- 
scription are located upstream of the AFP gene. 
These include a promoter and three distinct enhancers, 
as well as a repressor region that is involved in post- 
natal shut-off. A number of liver-enriched transcrip- 
tion factors that govern AFP synthesis have been 
identified; how these factors contribute to the com- 
plex mode of AFP expression is not fully understood. 


See also: Oncogenes 


Introduction


The discovery by Barbara McClintock of transpos- 
able elements controlling gene expression in maize 
(see McClintock, Barbara; Transposable Elements) 
led researchers to look for DNA rearrangements as mechanisms for regulation of gene expression in other eukaryotic and prokaryotic systems. Many examples of reversible on/off gene expression and switching between expression of two alleles of a gene were found to involve DNA rearrangements, including DNA inversion, insertion/excision of DNA elements, and directed gene conversion events (see Gene Rearrangement in Eukaryotic Organisms; Gene Rearrangements, Prokaryotic). The switch-regulated gene products are commonly surface antigens required for motility, adhesion, and cell-type determination, such as flagellin, pilin, extracellular polysaccharide, and a/α mating-type proteins. Specialized DNA recombination systems, involving site-specific recombination
(see Site-Specific Recombination) or transposition 
(see Insertion Sequence; Transposable Elements), 
mobilize DNA elements that control alternation of 
gene expression. Directed gene conversion utilizes 
endonucleases that nick or cleave at specific DNA 
sequences flanking gene alleles to be switched. Gen- 
eral recombination/replication enzymes are directed 
to the cleaved sites where they mediate gene conver- 
sion, as in the case of mating-type switching in fission 
and budding yeast (see Mating-Type Genes and Their 
Switching in Yeasts). 

Table 1 lists representative DNA rearrangement systems that control alternation of gene expression in prokaryotes and eukaryotes. This article focuses on specialized recombination systems involved in alternation of gene expression. Directed gene conversion controlling mating type in yeast is described elsewhere (see Mating-Type Genes and Their Switching in Yeasts).


Site-Specific Inversion and Alternation 
of Gene Expression 


Hin-Mediated Inversion and Control of 
Flagellin Synthesis in Salmonella 
typhimurium 

The first site-specific recombination system shown 
to control gene expression was the flagellar phase 
variation system of Salmonella typhimurium. The 
details of the DNA inversion system that switches expression between H1-type flagellin (FliC) and H2-type flagellin (FljB) in S. typhimurium are given in the article Hin/Gin-Mediated Site-Specific DNA Inversion. Simply stated, expression of the flagellin genes is controlled by inversion of a chromosomal DNA segment encoding the promoter for fljB and for fljA, which encodes the repressor of fliC. The site-specific DNA invertase, Hin, mediates this DNA rearrangement within a complex nucleoprotein structure, which includes a recombinational enhancer sequence and the accessory proteins Fis (factor for inversion stimulation) and HU (histone-like protein). The characterization of the molecular details of this DNA inversion system, and of related systems in bacteriophages Mu and P1 (Table 1), contributed significantly to defining the roles of DNA enhancers in stimulating DNA recombination, replication, and transcription.
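Why inversion works as a reversible switch can be shown with a toy string model. Flipping a segment replaces it with its reverse complement, and flipping it again restores the original, so the promoter alternates between facing the downstream gene and facing away from it. This is a hypothetical sketch, not the Hin mechanism itself. The sequences, the segment boundaries, and the use of the E. coli -35 consensus TTGACA as a stand-in 'promoter' are all illustrative assumptions. Real Hin recombination acts only at specific recombination sites within the nucleoprotein complex.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA string containing only A, C, G, T."""
    return seq.translate(COMPLEMENT)[::-1]


def invert_segment(chrom: str, start: int, end: int) -> str:
    """Toy site-specific inversion: replace chrom[start:end] by its reverse complement."""
    return chrom[:start] + reverse_complement(chrom[start:end]) + chrom[end:]


def promoter_faces_gene(segment: str, motif: str = "TTGACA") -> bool:
    """Hypothetical readout: True if the motif reads left to right on the top
    strand, i.e. toward a gene lying to the right of the invertible segment."""
    return motif in segment


start, end = 4, 14
phase_1 = "AAAA" + "GGTTGACACC" + "ATGCCC"   # invertible segment in positions 4-13
phase_2 = invert_segment(phase_1, start, end)

print(promoter_faces_gene(phase_1[start:end]))  # True: downstream gene ON
print(promoter_faces_gene(phase_2[start:end]))  # False: downstream gene OFF
print(invert_segment(phase_2, start, end) == phase_1)  # True: the switch is reversible
```

Only the segment between the two boundaries changes. The flanking DNA and the gene to its right are untouched, which is what lets a single recombination event toggle expression without mutating the coding sequence.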


FimB/FimE-Mediated Inversion
and Type 1 Pilin Phase Variation in
Escherichia coli

The site-specific DNA inversion strategy controll- 
ing ON/OFF phase variation of type 1 fimbriae in 
Escherichia coli is quite distinct from the Hin-related 
systems. The exceptional feature of this DNA re- 
arrangement is that two different site-specific recom- 
binases, FimB and FimE, mediate inversion of the 
chromosomal segment containing the promoter for 
the type 1 fimbriae gene (fim). FimE, which has been 
shown to be a lambda-integrase-related recombinase 
(see Phage λ Integration and Excision), mediates inversion of the promoter segment in only one direction, ON to OFF, whereas FimB can mediate inversion in either direction. Interestingly,
the regulation of these two recombinases is linked to 
the expression of pyelonephritis-associated pili (pap). 
FimB-promoted bidirectional switching is inhibited by 
PapB, a positive transcriptional regulator of pap, while 
the FimE-mediated unidirectional inversion (ON- 
to-OFF) is stimulated by PapB. Thus, expression of 
Pap is linked to repression of type 1 fimbriae produc- 
tion in the same cell, effectively switching which tissue- 
specific adhesin is present on the surface of the E. coli 
cell (Xia et al., 2000). 
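The effect of having one bidirectional and one unidirectional recombinase can be expressed as a simple two-state model. Cells switch OFF to ON only through FimB, but ON to OFF through both FimB and FimE. At steady state, the fraction of ON cells is therefore the OFF-to-ON rate divided by the sum of both rates. This is a hypothetical sketch. It assumes that FimB switches at the same rate in both directions and that PapB's effect can be represented simply by lowering the FimB rate and raising the FimE rate. All rate values are invented for illustration.

```python
def fraction_on_at_steady_state(fimb_rate: float, fime_rate: float) -> float:
    """Steady-state fraction of cells with the fim promoter ON in a two-state model.

    fimb_rate : switching rate contributed by FimB (assumed equal in both directions)
    fime_rate : switching rate contributed by FimE (ON -> OFF only)
    At steady state: P_on * (fimb + fime) = P_off * fimb, with P_on + P_off = 1.
    """
    off_to_on = fimb_rate              # only FimB can switch OFF -> ON
    on_to_off = fimb_rate + fime_rate  # both FimB and FimE switch ON -> OFF
    total = off_to_on + on_to_off
    if total == 0:
        raise ValueError("at least one recombinase rate must be positive")
    return off_to_on / total


# Hypothetical per-generation switching rates, for illustration only.
baseline = fraction_on_at_steady_state(fimb_rate=1e-3, fime_rate=1e-3)
with_papb = fraction_on_at_steady_state(fimb_rate=0.5e-3, fime_rate=2e-3)
print(baseline, with_papb)
```

Even with symmetric FimB switching, any FimE activity biases the population toward OFF. Shifting activity from FimB to FimE, as PapB is described as doing, lowers the ON fraction further.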


Piv-Mediated Inversion and Phase Variation 
of Type 4 Pilin in Moraxella bovis and 
Moraxella lacunata 

A unique site-specific DNA inversion system regulates type 4 pilin (tfpQ/I) expression in both Moraxella bovis, a bovine eye pathogen, and Moraxella lacunata, a human eye pathogen. Type 4 pilin, an important virulence factor for these pathogens, is required for adherence to corneal and conjunctival epithelial tissues. The invertible chromosomal segment of M. bovis contains the coding sequence for the C-terminal regions of TfpQ and TfpI; the constant N-terminal region of these pilin proteins and the tfpQ/I promoter (tfpQ/Ip2) are encoded immediately upstream of the invertible segment (Figure 1). Inversion, mediated by the recombinase Piv, determines which pilin gene segment (tfpQ or tfpI) is translationally fused to the constant region of tfpQ/I. The DNA sequence of the inversion region of M. lacunata is nearly identical to that of M. bovis, with the notable
exception of a 19 bp duplication early in the tfpI segment, which prevents expression of a functional TfpI pilin. Consequently, M. lacunata exhibits ON/OFF phase variation of the TfpQ pili associated with inversion of the pilin segment. While the organization of these inversion regions is quite similar to many of the Hin-related systems, the novel feature of this inversion system is the recombinase, Piv. Based on its primary amino acid sequence, Piv is unlike any other known site-specific DNA recombinase. In fact, it has recently been demonstrated that Piv is structurally and functionally related to the transposases of the IS110/IS492 family of insertion elements (Tobiason et al., in press). Not surprisingly, insertion elements from this and other IS families have been found to control alternation of gene expression in bacteria.


Table 1 Specialized recombination systems controlling alternation of gene expression in prokaryotes and eukaryotes

Phase variable function | Organism | Recombination system | Reference
ON/OFF and TfpQ/TfpI type IV pilin | Moraxella lacunata and M. bovis | Piv inversion | 1
H1/H2-type flagellin | Salmonella typhimurium | Hin inversion | 2
Tail fiber (host range) proteins | Bacteriophages Mu and P1 | Gin/Cin inversion | 2
ON/OFF type 1 fimbriae | Escherichia coli | FimB/FimE inversion | 1
ON/OFF and polymorphic host specificity determinant | Mycoplasma pulmonis | Site-specific inversion | 4
ON/OFF variable surface lipoprotein expression | Mycoplasma bovis | Site-specific inversion or reversible insertion (?) | 5
ON/OFF extracellular polysaccharide | Pseudoalteromonas atlantica | IS492 reversible insertion | 1
ON/OFF polysaccharide intercellular adhesin | Staphylococcus epidermidis | IS256 reversible insertion | 1
ON/OFF sialic acid synthesis | Neisseria meningitidis | IS1301 reversible insertion | 1
Possible ON/OFF PorA expression | Neisseria meningitidis | IS1301 insertional inactivation | 6
Mating-type switching | Schizosaccharomyces pombe | Site- and strand-specific nick, gene conversion | 7
Mating-type switching | Saccharomyces cerevisiae | HO endonuclease, directed gene conversion | 3
ON/OFF LpfA fimbrial protein expression | Salmonella typhimurium | Not determined | 8

1. This article.
2. Articles on Hin/Gin-Mediated Site-Specific DNA Inversion; and Insertion Sequences.
3. Article on Mating-Type Genes and Their Switching in Yeasts.
4. Dybvig K, Sitaraman R and French CT (1998) A family of phase-variable restriction enzymes with differing specificities generated by high-frequency gene rearrangements.
5. Lysnyansky I, Rosengarten R and Yogev D (1996) Phenotypic switching of variable surface lipoproteins in Mycoplasma bovis involves high-frequency chromosomal
6. Newcombe J, Cartwright K, Dyer S and McFadden J (1998) Naturally occurring insertional inactivation of the porA gene of Neisseria meningitidis by integration of IS1301.


Reversible Insertion of IS Elements and 
Phase Variation of Gene Expression 


Reversible Insertion of IS492 and Phase
Variation of Extracellular Polysaccharide

Reversible transposition of insertion elements controls ON/OFF phase variation of gene expression in both gram-negative and gram-positive bacteria. This mechanism for phase variation was first identified in the gram-negative marine bacterium Pseudoalteromonas atlantica (Bartlett et al., 1988). Extracellular polysaccharide (EPS) production by P. atlantica is essential
for biofilm formation on various solid surfaces in the 
ocean and is an important part of ocean ecology. Rever- 
sible insertion of IS492 into a gene involved in EPS 
synthesis (eps) controls EPS production in response to 
environmental signals, such as cell density (Figure 2). 
Biochemical and genetic characterization of EPS ON/ 
OFF phase variants of P. atlantica demonstrated that 
insertion of IS492 into a specific eps target site has 
occurred in nearly all OFF variants, and precise exci- 
sion of IS492 has restored the eps locus in ON variants. 
Insertion of IS492 results in a 5 bp duplication of the 
target sequence; excision of IS492 from the eps target 
site results in deletion of this duplication as well as the 
element. Excision of IS492 from the eps site is not linked 
to insertion at a new site on the chromosome; however, 
a circular form of IS492 that contains the 5 bp target 
sequence at the circle junction is a product of this 


invL invR 


Figure | Alternation of type 4 pilin in M. bovis. The invertible chromosomal segment, containing the alternate pilin gene 
segments, tfpQ (solid box) and tfp! (cross-hatched box), and the tfpB gene (of unknown function, open box), is shown in the 
orientations for TfpQ expression (solid pili) and for Tfpl expression (cross-hatched pili). The reversible inversion of the 
chromosomal segment is mediated by the recombinase, Piv, at the invL and invR recombination sites. piv (shaded box) is 
encoded immediately downstream of the invertible segment, andis transcribed from pivp in the opposite direction of tfpQ/I. 
Figure 2 ON/OFF phase variation of EPS in P. atlantica. IS492 (solid box) is inserted in eps (hatched box), an essential gene for synthesis of EPS in P. atlantica. Excision of IS492 from a specific site within eps is mediated by the transposase MooV, which is encoded within the IS element (open box). Precise excision produces a circular form of the IS element containing one copy of the duplicated 5 bp target site at the circle junction. Although the process is reversible, the source of the IS492 that reinserts into eps is not known; it may be the circular form or one of at least four copies of the element found at different sites on the chromosome.
precise excision (Perkins-Balding et al., 1999). The frequency of precise excision per cell per generation varies widely, depending on cell growth conditions. The regulation of excision by environmental conditions and the high rate of precise excision are, at present, unique to the IS492 transposition system.
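The insertion and precise excision events described above can be modeled as simple string operations. The following Python sketch is illustrative only: the host and element sequences are hypothetical, the transposase MooV and target-site recognition are not modeled, and the circular molecule is represented as a linear string whose last base is taken to join its first.

```python
from __future__ import annotations

TSD_LEN = 5  # IS492 insertion duplicates 5 bp of target sequence


def insert_element(host: str, site: int, element: str) -> str:
    """Insert `element` at `site`, duplicating the 5 bp target host[site:site+5]
    so that one copy flanks each end of the element."""
    target = host[site:site + TSD_LEN]
    return host[:site] + target + element + target + host[site + TSD_LEN:]


def precise_excision(locus: str, element: str) -> tuple[str, str]:
    """Remove the element plus one copy of the 5 bp duplication.

    Returns (restored locus, circular element). The circle is written as the
    element followed by the single target copy that forms the circle junction.
    """
    start = locus.find(element)
    if start < TSD_LEN:
        raise ValueError("element not found with a 5 bp left flank")
    end = start + len(element)
    left, right = locus[start - TSD_LEN:start], locus[end:end + TSD_LEN]
    if left != right:
        raise ValueError("element is not flanked by a 5 bp direct repeat")
    restored = locus[:start] + locus[end + TSD_LEN:]  # drops element + one repeat
    circle = element + right                          # junction carries one repeat
    return restored, circle


# Hypothetical sequences, chosen only for illustration
HOST = "AAACCCGGGTTTACGTACGT"
ELEMENT = "CAGTCAGTCA"

if __name__ == "__main__":
    off_locus = insert_element(HOST, 6, ELEMENT)  # OFF variant: gene disrupted
    on_locus, circle = precise_excision(off_locus, ELEMENT)
    print(off_locus)
    print(on_locus == HOST, circle)
```

Because the restored sequence is identical to the original host sequence, the sketch shows why precise excision can return an OFF variant to the ON state.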


Insertion/Excision of IS1301 and Phase
Variation of Capsule and Sialylated 
Lipooligosaccharide 

IS1301, which is unrelated to IS492, has been found to be associated with ON/OFF phase variation of extracellular polysaccharide synthesis in the gram-negative human pathogen Neisseria meningitidis
(Hammerschmidt et al., 1996). Neisseria meningitidis, 
a causative agent of sepsis and meningitis, expresses 
a polysaccharide capsule and sialylated lipooligo- 
saccharide (LOS) that allow this pathogen to escape 
the early antigen-nonspecific immune defenses, such 
as phagocytosis by macrophages and activation of the 
alternative complement pathway. In the infection 
process, initial adherence by N. meningitidis to nasal/ 
pharyngeal tissue is mediated by long pili that protrude 
through the thick capsule. However, after successfully 
evading the immune response and reaching mucosal 
infection sites, outer membrane proteins of N. menin- 
gitidis (Opa and Opc) are required for adherence to 
and penetration of mucosal epithelial tissue; the poly- 
saccharide capsule and sialylated LOS essentially block 
Opa and Opc from interacting with their target cell receptors. Insertion of IS1301 into a specific site within
siaA, a biosynthetic gene for sialic acid synthesis, turns 
off both capsule and sialylated LOS expression. Thus, a 
subpopulation of the disseminated bacteria can invade 
the mucosal tissues. Reversion to the encapsulated phenotype is due to precise excision of IS1301. As with IS492, excision of IS1301 is not linked to insertion at a new chromosomal site. Whether the frequency of insertion or excision is affected by environmental conditions has not been determined.


Transposition of IS256 and Phase Variation
of Polysaccharide Intracellular Adhesin 
Staphylococcus epidermidis, normally found on 
human skin, is capable of biofilm formation when it 
expresses polysaccharide intracellular adhesin (PIA). 
Production of PIA is a virulence factor that is associ- 
ated with S. epidermidis strains found in opportunistic 
infections. Phase variation of PIA can occur by trans- 
position of IS256 into biosynthetic genes for PIA, 
icaA, or icaC (Ziebuhr et al., 1999). However, unlike 
IS492 and IS1301, insertion does not appear to be targeted to a specific sequence within these loci, and precise excision occurs at a very low frequency. Thus, the phase variation of PIA in S. epidermidis appears to reflect the
plasticity that most insertion elements confer to their 
resident genomes. 
Alternative Splicing


Alternative splicing is the process of selecting different
combinations of splice sites within an mRNA precur- 
sor (pre-mRNA) to produce variably spliced mRNAs. 
These multiple mRNAs can encode proteins that vary 
in their sequence and activity, and yet arise from 
a single gene. Alternative splicing is an important 
mechanism in the developmental and cell-type specific 
control of gene expression, and is found in nearly all 
eukaryotic organisms that carry out standard nuclear 
pre-mRNA splicing, including animals, plants, and in 
some cases fungi. 

The primary RNA transcript of a gene contains 
exon sequences separated by intervening sequences, 
or introns. Introns are removed from the pre-mRNA 
and the exons spliced together to form a mature 
mRNA. The intron excision process is catalyzed by a 
large macromolecular complex called the spliceosome. 
The spliceosome assembles onto each intron from a set
of five small nuclear ribonucleoproteins (called the 
U1, U2, U4, U5, and U6 snRNPs), as well as additional 
protein factors. The initial assembly of the spliceosome 
involves binding of the U1 snRNP to the 5’ splice site 
of the intron, and the U2 auxiliary factor (U2AF) 
protein to the polypyrimidine tract within the 3’ splice 
site. U2AF then directs the binding of the U2 snRNP 
to the branchpoint sequence upstream of the poly- 
pyrimidine tract. The binding of these initial spliceo- 
somal components to the splice sites defines where 
the exons will be joined and is affected by many 
factors. Regulatory proteins, pre-mRNA secondary 
structure, and the rate of transcription elongation 
through a spliced region are all thought to affect splice 
site choice in different gene transcripts. In particu- 
lar, regulatory proteins binding to the pre-mRNA 
transcript can either enhance or repress specific splicing patterns. However, the interactions of these proteins with the general splicing apparatus are mostly unknown.

Exons or splice sites that are always used in the 
production of an mRNA are called constitutive or 
unregulated, as opposed to alternative or regulated 
exons and splice sites that are not used in every 
mRNA product. Sometimes alternatively spliced 
mRNAs are produced in a set ratio that does not 
vary. In other cases, the use of an alternative splicing 
pattern is dependent on the cellular conditions, and 
can be regulated by cell type, developmental state, or 
extracellular stimulus. Although changes in splicing 
can be in untranslated regions of the mRNA, they
usually introduce or delete portions of peptide se- 
quence in the encoded protein. These optional sequen- 
ces can drastically affect the activity of the protein. In 
different proteins, alternatively spliced segments are 
known to alter subcellular localization, ligand binding, enzymatic activity, or posttranslational modification. Splicing changes that introduce premature translational stop codons can produce truncated or inactive proteins.
The available splicing patterns for an mRNA can be 
very numerous. Through the combinatorial use of 
multiple alternative exons and splice sites, hundreds or 
even thousands of different mRNAs can be produced 
from a single pre-mRNA. 

Variation in the splicing pattern of an mRNA can 
take many different forms. Different gene transcripts 
can contain optional exons, optional introns, and 
alternate 5’ or 3’ splice sites. One common form of 
splicing variation is an optional or cassette exon pre- 
sent in the pre-mRNA that can be either spliced into 
the mRNA or excluded from it. Specific examples of 
such regulated exons include the male-specific exon of 
the Drosophila sex-lethal transcript and the c-src N1 
exon, which is included in the src mRNA only in 
neurons. Mutually exclusive exons are a specialized 
pair of adjacent cassette exons that are spliced in a 
mutually exclusive manner; only one exon of the pair 
is included in a given mRNA. Examples of mutually 
exclusive exons are found in the α- and β-tropomyosin
transcripts among others. Instead of altering the use of 
a whole exon, the position of a single splice junction 
can be shifted to produce exons of differing size. A 
well-known example of this pattern of alternative splic- 
ing is in simian virus 40 (SV40), where the viral 
T antigen mRNAs differ in their use of a pair of 5’ 
splice sites. Splicing from the upstream 5’ splice site to 
the common 3’ splice site generates the large T antigen 
mRNA, whereas if the downstream 5’ splice site is 
used, the mRNA will encode the small t antigen. Simi- 
larly, alternative 3’ splice sites can be joined to a 
common 5’ splice site, as is found in transcripts of 
the adenovirus major late transcription unit and the 
Drosophila transformer gene. A third pattern of 
alternative splicing is a retained or optional intron. In 
this case, mRNAs from the gene will differ in the 
removal of an intron. One mRNA will be fully spliced, 
while another retains an intron sequence within its 
final structure. The best-known examples of this 
form of alternative splicing are the retroviral genomic 
RNAs, which remain unspliced to encode the Gag and 
Pol proteins or are spliced to produce subgenomic 
mRNAs encoding the Env and other proteins. Regu- 
lated intron retention is also seen in the P element 
transposon transcript of Drosophila, where the third 
intron is removed only in germline cells. This restricts 
expression of the fully spliced mRNA encoding the
P element transposase to the germline. Splicing of alter- 
native 3’ terminal exons can be coupled to the use of 
alternative polyadenylation sites, producing mRNAs 
with different 3’ terminal sequences. The switch from 
membrane bound to secreted immunoglobulin M dur- 
ing B cell differentiation is brought about by a change in 
the splicing and polyadenylation of 3’ terminal IgM 
exons. Similarly, changes in promoter position coupled 
with alternative splicing can produce mRNAs with 
alternate 5’ exons. Finally, these different types of spli- 
cing variation are often combined to produce complex 
patterns of splicing within a single pre-mRNA. 
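The combinatorial growth in splicing patterns can be illustrated by enumerating the mRNAs that a gene model allows. In this Python sketch the gene model and exon names are hypothetical: each position lists the alternatives for a constitutive exon, a cassette exon, a mutually exclusive pair, or an exon with alternative 5' splice sites, and choices at different positions are assumed to be independent, which real regulated transcripts often are not.

```python
from itertools import product
from math import prod

# Hypothetical gene model: each group lists the alternatives at one position.
# None means the optional exon is skipped.
GENE_MODEL = [
    ["E1"],          # constitutive exon: always included
    ["E2", None],    # cassette exon: included or skipped
    ["E3a", "E3b"],  # mutually exclusive pair: exactly one is included
    ["E4", "E4+"],   # alternative 5' splice sites: shorter or longer E4
    ["E5"],          # constitutive terminal exon
]


def enumerate_isoforms(model):
    """Return every exon combination allowed by the model, dropping skipped exons."""
    return [tuple(exon for exon in choice if exon is not None)
            for choice in product(*model)]


def count_isoforms(model):
    """Number of combinations when the choices at each position are independent."""
    return prod(len(group) for group in model)


if __name__ == "__main__":
    for isoform in enumerate_isoforms(GENE_MODEL):
        print("-".join(isoform))
    print(count_isoforms(GENE_MODEL))
    # Ten independent cassette exons alone allow 2**10 combinations
    print(count_isoforms([["X", None]] * 10))
```

The toy model yields 8 mRNAs; ten independent cassette exons alone would allow 2^10 = 1024, which illustrates how a single pre-mRNA can in principle give rise to hundreds or thousands of mRNAs.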

The best understood systems of alternative splicing 
are in the pathway of somatic sex determination in 
Drosophila melanogaster. Several genes in the genetic 
cascade that directs male or female development of the 
fly encode RNA binding proteins that alter the spli- 
cing of specific pre-mRNAs in a sex-specific manner. 
The sex-lethal gene (Sxl) encodes a splicing repressor 
protein that blocks certain splicing patterns. In female 
flies, where Sxl protein is present, the protein binds to 
elements surrounding a male-specific exon in the Sxl
transcript itself. This in some way prevents exon
recognition by the splicing apparatus, causing female-specific exon skipping. Sex-lethal protein also regulates splicing of the transformer (tra) transcript,
further downstream in the pathway. In this case, Sxl 
binds directly to a 3’ splice site of the transformer 
transcript, thereby blocking assembly of the spliceo- 
some. This directs the splicing to an alternative 3’ 
splice site downstream, producing a female-specific 
transformer mRNA and protein. The female trans- 
former protein (Tra) is a positive regulator of splicing 
and activates a female-specific exon in the doublesex 
(dsx) transcript. Exon 4 of doublesex contains a splicing enhancer sequence composed of a repeated 13-nucleotide element and a purine-rich element. Each
of these elements binds to Tra along with two other 
proteins: Tra-2 and a member of the SR protein family, 
an important group of splicing regulators. Unlike Tra, 
Tra-2 and the SR proteins are not female specific 
but are more generally expressed. These proteins 
assemble into a large complex on dsx exon 4 and 
activate splicing at the upstream 3’ splice site, perhaps 
through interactions between the enhancer proteins 
and U2AF. 
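The logic of the sex-specific splicing cascade described above can be summarized as a small Boolean model. This Python sketch simplifies under stated assumptions: it takes the presence of Sxl protein as its input (how Sxl expression is first established is not modeled), treats each splicing decision as all-or-none, and uses hypothetical field names.

```python
from dataclasses import dataclass

# Tra-2 and SR proteins are expressed in both sexes (not female specific).
TRA2_AND_SR_PRESENT = True


@dataclass(frozen=True)
class SplicingOutcome:
    sxl_male_exon_included: bool  # male-specific exon of the Sxl transcript
    tra_3prime_site: str          # which tra 3' splice site is used
    female_tra_protein: bool      # female-specific Tra protein produced
    dsx_exon4_included: bool      # female-specific exon 4 of doublesex


def sex_specific_splicing(sxl_protein_present: bool) -> SplicingOutcome:
    # Sxl represses recognition of the male-specific exon in its own transcript.
    sxl_exon = not sxl_protein_present
    # Sxl binds one tra 3' splice site and blocks spliceosome assembly there,
    # shifting splicing to an alternative 3' splice site downstream.
    tra_site = "downstream alternative" if sxl_protein_present else "default"
    female_tra = tra_site == "downstream alternative"
    # Tra, with Tra-2 and an SR protein, activates the 3' splice site of dsx exon 4.
    dsx_exon4 = female_tra and TRA2_AND_SR_PRESENT
    return SplicingOutcome(sxl_exon, tra_site, female_tra, dsx_exon4)


if __name__ == "__main__":
    print("female:", sex_specific_splicing(True))
    print("male:  ", sex_specific_splicing(False))
```

Encoding the cascade this way makes the dependency explicit: dsx exon 4 is included only when female Tra is made, and female Tra is made only when Sxl protein is present.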

In mammalian cells, there are many examples of 
tissue-specific alternative splicing. In some cases, 
these transcripts are thought to be regulated in a simi- 
lar manner to the sex-specific splicing of Drosophila, 
employing combinations of regulatory proteins that 
bind to specific sequence elements in the RNA trans- 
cript. Mammalian splicing regulatory proteins include 
SR proteins and members of the hnRNP group of
proteins, as well as other factors. Some of these sys- 
tems appear quite complex, exhibiting both positive 
and negative splicing regulation. Work in this area 
is currently focused on identifying splicing regula- 
tory proteins and characterizing their interactions 
with the pre-mRNA and the general splicing appar- 
atus. However, there are likely to be multiple mechan- 
isms contributing to the regulation of alternative 
splicing. 


See also: Gene Regulation; Pre-mRNA Splicing; 
Sex Determination, Human; Sex Determination, 
Mouse 


Altruism 


See: Kin Selection 


Alu Family 
The Alu family is a set of related distributed sequences, each approximately 300 bp long, in the human
genome. Individual members possess Alu cleavage 
sites at each end. 


See also: Repetitive (DNA) Sequence 


Alzheimer’s Disease 


D C Rubinsztein 
Dementia is a common condition whose prevalence increases with age. For instance, dementia affects about 1% of individuals younger than 70 years, 15% of individuals aged 80-84 years, and more than 40% of individuals older than 90. Alzheimer disease
(AD) accounts for about 60% of the dementias in the 
UK. Twin and family studies support a role for genetic 
factors in the etiology of AD. 

Amyloid plaques and neurofibrillary tangles char- 
acterize the neuropathology of AD. AD is neuro- 
pathologically indistinguishable in the young and 
old, but has been arbitrarily divided into early- and 
late-onset disease using age cutoffs of 60 or 65. 

A major component of the amyloid plaques found 
in AD brain is the β-amyloid peptide. This contains
40-43 amino acids, depending on the C-terminal
cleavage site. The β-amyloid peptide is derived from
a larger protein coded for by the amyloid precursor 
protein (APP) gene. The possibility that APP muta- 
tions caused AD was supported by the observations 
that pathological changes indistinguishable from those 
seen in AD are almost universal in Down syndrome 
(trisomy 21) cases aged older than 40 years. This sug- 
gested that there was a locus (loci) on chromosome 
21 that was sufficient to cause AD if present in three 
copies rather than two copies. Subsequently, the APP 
gene was mapped to chromosome 21 and its protein 
was shown to be overexpressed in Down syndrome. 
The discovery of dominant mutations in the APP gene 
in early-onset AD families and characterization of 
the consequences of these mutations suggested that 
abnormal overproduction of β-amyloid was responsible for some cases of AD. APP mutations account
for less than 5% of familial early-onset AD. 
Dominant mutations in the presenilin-1 gene on 
chromosome 14 may account for up to 50% of familial 
early-onset cases, while mutations in its homolog, 
presenilin-2, are very rare. Most presenilin mutations 
described to date in AD cases are missense, suggesting 
that these may create abnormal proteins which inter- 
fere with normal metabolism by gain-of-function or 
dominant-negative effects. Fibroblasts from patients 
with presenilin-1 and presenilin-2 mutations oversecrete the 42-amino-acid form of β-amyloid, and transgenic mice expressing mutant presenilin-1 overproduce β-amyloid compared to transgenic mice expressing wild-type human presenilin-1. This provides further support for the gain-of-function nature of these mutations and the model that β-amyloid overproduction may be a central theme in AD pathology.
The apoE gene is located on chromosome 19q13.2 
and codes for a mature protein of 299 amino acids. The 
three common alleles in humans are distinguished by 
amino acid changes at positions 112 and 158. Apo ε3, the commonest allele in most populations, has cysteine at amino acid 112 and arginine at position 158. Apo ε4 has arginines at positions 112 and 158, while apo ε2 has cysteines at these positions. ApoE plays important roles in lipoprotein transport, and the different alleles are associated with variations in plasma cholesterol and triglyceride concentrations. Individuals with one copy of apo ε4 have a threefold risk of AD, while apo ε4 homozygotes have an 11.5-fold risk, relative to apo ε3 homozygotes. In addition, increasing apo ε4 dose is associated with earlier onset of AD. Apo ε4 heterozygosity and homozygosity are both associated with higher relative risks for early-onset AD than for late-onset AD. The apo ε2 allele is associated with a decreased relative risk for AD in individuals aged over 65 years, compared to apo ε3.
Apo ε4 is neither necessary nor sufficient to cause late-onset AD, and recent estimates suggest that this locus accounts for about 50% of the genetic etiology of late-onset AD. Thus, other genes are likely to influence AD risk.
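The allele definitions and risk figures given above can be expressed as a lookup. This Python sketch uses only the residues at positions 112 and 158 and the relative risks quoted in the text (versus ε3/ε3 homozygotes); genotypes for which the text gives no figure, such as ε2 carriers, return None rather than an invented value. It illustrates the classification and is not a clinical risk calculator.

```python
from typing import Optional, Tuple

# Residues at positions 112 and 158 of mature apoE define the three common alleles.
ALLELE_BY_RESIDUES = {
    ("Cys", "Cys"): "ε2",
    ("Cys", "Arg"): "ε3",
    ("Arg", "Arg"): "ε4",
}


def classify_allele(residue_112: str, residue_158: str) -> str:
    try:
        return ALLELE_BY_RESIDUES[(residue_112, residue_158)]
    except KeyError:
        raise ValueError("not one of the three common apoE alleles") from None


def relative_risk_of_ad(genotype: Tuple[str, str]) -> Optional[float]:
    """Approximate AD risk relative to ε3/ε3, using only the figures quoted above."""
    if "ε2" in genotype:
        return None  # the text reports decreased risk for ε2 but gives no figure
    e4_copies = genotype.count("ε4")
    # 0 copies: ε3/ε3 baseline; 1 copy: threefold; 2 copies: 11.5-fold
    return {0: 1.0, 1: 3.0, 2: 11.5}[e4_copies]


if __name__ == "__main__":
    genotype = (classify_allele("Cys", "Arg"), classify_allele("Arg", "Arg"))
    print(genotype, relative_risk_of_ad(genotype))
```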

A number of other loci have been proposed to play 
a role in late-onset AD. Candidate gene association 
studies have suggested that the low-density lipoprotein receptor-related protein gene, the α1-antichymotrypsin
gene, and the bleomycin hydrolase gene, among 
others, play a role in AD. However, these data need 
further replication. In addition, it has been suggested 
that variants in the apoE promoter, which appear to 
modify its transcription, may modify the impact on 
AD risk of the coding polymorphisms giving rise to the apo ε2, ε3, and ε4 alleles. Reports of linkage of late-onset AD to chromosome 12 also await replication. Recently, a late-onset AD locus has been identified on chromosome 10q.

The other major component in the pathology of AD, besides amyloid deposits, is the neurofibrillary tangle. Ultrastructurally, tangles are composed of paired helical filaments and straight filaments, which contain the hyperphosphorylated microtubule-associated protein tau. This protein has now been shown to play a direct role in neurodegeneration, as mutations in tau are associated with some cases of familial frontotemporal dementia and parkinsonism.


See also: Down Syndrome; Genetic Diseases 
Amber Codon


The nucleotide triplet UAG, or amber codon, is one of
the three ‘nonsense’ codons responsible for termin- 
ation of protein synthesis. 


See also: Nonsense Codon; Ochre Codon; 
Opal Codon 


Amber Mutation 
An amber mutation is the result of changes in the
DNA sequence that convert an amino acid codon 
into an amber codon (UAG). 
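Which amino acid codons can become amber codons through a single base substitution can be worked out directly from the standard genetic code. This Python sketch works at the mRNA level (U rather than T), encodes the standard code as a 64-character string in UCAG order, and considers only single-base substitutions. UAA (ochre) is also one substitution away from UAG, but it is already a stop codon and so is excluded.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code in UCAG order (first base varies slowest); "*" marks a stop codon.
AMINO_ACIDS = ("FFLLSSSSYY**CC*W"
               "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR"
               "VVVVAAAADDEEGGGG")
CODE = {"".join(codon): aa
        for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

AMBER = "UAG"


def amber_precursors(code=CODE):
    """Sense codons that become UAG through one base substitution."""
    hits = []
    for pos in range(3):
        for base in BASES:
            if base == AMBER[pos]:
                continue  # same base: not a substitution
            codon = AMBER[:pos] + base + AMBER[pos + 1:]
            if code[codon] != "*":  # skip codons that are already stop codons
                hits.append((codon, code[codon]))
    return hits


if __name__ == "__main__":
    for codon, aa in amber_precursors():
        print(codon, aa)
```

The eight codons found (for glutamine, lysine, glutamate, leucine, serine, tryptophan, and two tyrosine codons) are the sense codons in which a single point mutation can create an amber mutation.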


See also: Nonsense Codon; Start, Stop Codons 
Amber Suppressors


Copyright © 2001 Academic Press 
doi: 10.1006/rwgn.2001.1756 


Amber suppressors are genes coding for mutant 
tRNAs whose anticodons have been altered such 
that they respond to the amber codon (UAG). 
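An amber suppressor tRNA must carry an anticodon that base-pairs with UAG. Assuming strict Watson-Crick pairing (wobble pairing and modified anticodon bases are ignored), the anticodon is the reverse complement of the codon. For example, a single change at the anticodon's 5' (wobble) position converts the anticodon that reads the tyrosine codon UAC into one that reads UAG. This Python sketch only computes these pairing relationships; it does not model which natural tRNA genes actually give rise to suppressors.

```python
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def anticodon_for(codon: str) -> str:
    """Anticodon (written 5'->3') that pairs with `codon` by Watson-Crick rules."""
    return "".join(COMPLEMENT[base] for base in reversed(codon))


def single_base_differences(a: str, b: str) -> int:
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


AMBER = "UAG"

if __name__ == "__main__":
    amber_anticodon = anticodon_for(AMBER)
    print(amber_anticodon)  # CUA
    for codon in ["UAC", "CAG", "UGG"]:
        ac = anticodon_for(codon)
        print(codon, ac, single_base_differences(ac, amber_anticodon))
```

Because reverse complementation preserves the number of mismatches, any codon one substitution away from UAG is read by a tRNA whose anticodon is one substitution away from CUA.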


See also: Amber Codon 
Professor Bruce Ames (born 1928) received his BA
degree from Cornell University in 1950 and his PhD 
in biochemistry and genetics from the California 
Institute of Technology in 1953. He was at the 
National Institutes of Health from 1953 to 1967, 
where he was a section chief in the Laboratory of 
Molecular Biology of the National Institute of Arthri- 
tis and Metabolic Diseases. In 1961 he spent a sabbat- 
ical year in the laboratories of F. H. C. Crick in 
Cambridge and F. Jacob in Paris. He joined the faculty 
of the University of California at Berkeley as a 
Professor of Biochemistry in 1968. 

Ames’s pioneering studies on the regulation of the 
histidine operon and the discovery of the role of 
the histidine transfer ribonucleic acid, tRNA^His, in the control
of the histidine operon in Salmonella established him 
as a leader in the field of gene regulation. His paper 
with R. G. Martin on the use of sucrose gradient 
centrifugation for determining the molecular weight 
of proteins in complex mixtures is one of the most 
widely cited papers in the scientific literature. His 
work on bacterial signal molecules, alarmones, and 
modified bases in tRNA has opened up new areas in 
gene regulation. 

Ames has been the international leader in the field 
of mutagenesis and genetic toxicology for 25 years. 
His work has had a major impact on the direction of 
basic and applied research on mutation, cancer, and 
aging. 

Ames’s mutagenicity test, which he developed in 
the early 1970s, is routinely used by drug and chemical 
companies throughout the world for the detection 
of potential carcinogens, making it possible to weed 
out mutagenic chemicals inexpensively, before they 
are introduced into commerce. Ames and his test,
which is used in over 3000 laboratories, have made a
major contribution to the characterization of envir- 
onmental mutagens, both synthetic and naturally 
occurring, as well as to clarifying the role of muta- 
genesis in carcinogenesis. Thus two major contribu- 
tions of his work have been the demonstration that a 
high percentage of carcinogens are detectable as muta- 
gens, and that the ability of carcinogens to damage 
DNA is a major aspect of the mechanism of carcino- 
genesis. 

Ames’s research on endogenous DNA damage and 
its importance in aging and cancer has had a major 
impact on understanding disease, as did his seminal 
work on the detection of mutagens and carcinogens. 
In a series of influential papers and integrative reviews,
he has documented that endogenous oxidants from 
normal metabolism are important in damaging DNA. 
He has developed an innovative method for measuring 
oxidative DNA damage in individual humans by mea- 
suring oxidative lesions from DNA in the form of 
compounds that are excreted in urine after DNA 
repair. He has shown that, although repair is very 
effective, some oxidative lesions escape repair. The 
steady state level of oxidative lesions increases with 
age, and an old rat can accumulate about 66 000 oxida- 
tive DNA lesions per cell. Ames and his students have 
persuasively argued that oxidative damage to mito- 
chondria (DNA, protein, and lipids) is the weak link 
in aging. 

Ames’s work had a major impact on the field of 
oxidative pathology, by clarifying the role of various 
antioxidants in plasma and identifying major antioxi- 
dants that were previously not fully appreciated. 
These include urate, bilirubin, and ubiquinol. He has 
pioneered the development of important new methods 
for measuring oxidative damage and defenses in 
tissues as well as biological fluids such as urine and 
plasma. The methods include those which detect 
oxidative damage products of DNA, the oxidative 
damage products of lipids such as lipid hydroperoxides 
and malondialdehyde, and a key lipid-soluble antioxi- 
dant, ubiquinol. 

Ames and his students clarified the strategies 
employed by bacterial cells in their response to low 
doses of oxidants such as hydrogen peroxide. The 
discovery of the oxyR regulatory gene, its isolation 
and determination of its sequence and its DNA bind- 
ing sites, has provided general insights into what oxi- 
dants are hazardous to cells, what cell constituents are 
damaged by oxidants, and how these cells sense and 
respond to oxidative stress. They showed that OxyR 
controls a variety of genes, including that for catalase 
and a newly discovered enzyme, alkyl hydroperoxide 
reductase. Studies on the oxyR regulon have led to the 
elucidation of the mechanisms by which exposure of
bacterial cells to low doses of oxidants allows these
cells to adapt to subsequent challenges by higher 
doses of oxidants. These pioneering studies have pro- 
vided the insights and foundation for understanding 
how higher organisms such as mammals adapt to oxi- 
dant exposure. 

Ames (with Lois Swirsky Gold) has been the leader 
in painting a broad picture of the wide variety of muta- 
gens and carcinogens to which humans are exposed. 
Their carcinogenic potency data base is the definitive 
reference source for all animal cancer tests. Their ana- 
lyses are having an unusual impact on the prevailing 
paradigm in the field. They have characterized the 
large background of natural mutagens and carcinogens 
and thus have put into perspective, in humans, low 
exposures to synthetic chemicals, both qualitatively 
and in terms of quantitative carcinogenic potency. 
Ames and Gold have shown that half of the chemicals 
tested in high-dose animal cancer bioassays, whether 
synthetic or natural, are classified as carcinogens. 
They have critically addressed the reasons for this 
high positivity rate and have supported the interpreta- 
tion that it is a high-dose effect: induced cell division 
and cell replacement converting DNA lesions to 
mutations. Thus they have made a rigorous and per- 
suasive case that the current practice of linear extra- 
polation from high-dose animal cancer tests to predict 
human risk for low doses of synthetic chemicals has 
distorted the perception of hazard and allocation of 
resources, a matter of great societal import. Ames has 
also provided an intellectual bridge that connects can- 
cer mechanisms to epidemiological results on the role 
of diet in the causation and prevention of cancer. 

Ames’s recent research showing that deficiencies of 
micronutrients such as folic acid are a major cause of 
DNA damage in humans is likely to have a major
impact on health and prevention of cancer. He has 
shown that acetyl carnitine and lipoic acid, fed to rats 
at high levels, reverse some of the age-related decay of 
mitochondria. These compounds may be conditional 
micronutrients and thus have a major impact on delay- 
ing aging. 

Bruce Ames is a Professor of Biochemistry and 
Molecular Biology, and Director of the National Insti- 
tute of Environmental Health Sciences Center, Uni- 
versity of California, Berkeley. He is also a Senior 
Research Scientist at the Children’s Hospital Oakland 
Research Institute. He is a member of the National 
Academy of Sciences and was on their Commission on 
Life Sciences. He was a member of the board of direct- 
ors of the National Cancer Institute, the National 
Cancer Advisory Board, from 1976 to 1982. His 
awards include the General Motors Cancer Research 
Foundation Prize (1983), the Tyler Prize for environ- 
mental achievement (1985), the Gold Medal Award 
of the American Institute of Chemists (1991), the
Glenn Foundation Award of the Gerontological 
Society of America (1992), the Honda Prize of the 
Honda Foundation, Japan (1996), the Japan Prize 
(1997), the Medal of the City of Paris (1998), and the 
US National Medal of Science (1998). His 400 publications have made him consistently one of the few hundred most-cited scientists in all fields (23rd for 1973-84).


See also: Ames Test 
Ames Test


The Ames test (Salmonella typhimurium reverse mutation assay) is a bacterial short-term test for identification of carcinogens using mutagenicity in bacteria as an endpoint. It includes a mammalian metabolizing system to activate promutagens. A high but not complete correlation has been found between carcinogenicity in animals and mutagenicity in the Ames test. The test detects mutations that convert a histidine-requiring bacterial strain into a histidine-independent strain. The Ames test is one of the most frequently
applied tests in toxicology. Almost all new pharma- 
ceutical substances and chemicals used in industry are 
tested by this assay. The Ames test is named after 
Bruce N. Ames, University of California, Berkeley, 
who developed this mutagenicity test. 


Principle and Tester Strains 


Several histidine-requiring bacterial strains of Salmo- 
nella typhimurium are used for mutagenicity testing. 
Each tester strain contains a different type of mutation 
in the histidine operon (Table 1). Because of this mutation, the tester strain cannot form colonies on agar that lacks histidine or contains only very little of it. If a mutation is induced in this histidine-requiring strain that generates a histidine-independent
strain, for instance by restoration of the wild-type 
gene (Figure 1), it will gain the ability to form colon- 
ies also on minimal agar. Since a mutation restores 
the histidine-independent wild-type phenotype, the 
Ames test is classified as a “reverse” mutation assay. 
Approximately 10⁸ bacteria are incubated with a
single concentration of a test substance. Although 
the probability for reversion to the wild-type is extre- 
mely low for a single bacterium, the extremely high 
Table 1 Genotypes of commonly used Salmonella typhimurium tester strains(a)

Strain    Mutation                        Lipopolysaccharide barrier   DNA repair (uvrB)   pKM101   Ampicillin   Tetracycline
TA 98     Frameshift in hisD3052          rfa                          −                   +        R            S
TA 100    Base substitution in hisG46     rfa                          −                   +        R            S
TA 102    Base substitution in hisG428    rfa                          +                   +        R            R
TA 1535   Base substitution in hisG46     rfa                          −                   −        S            S
TA 1537   Frameshift in hisC3076          rfa                          −                   −        S            S

(a) All strains were originally derived from Salmonella typhimurium LT2. hisD3052, mutation in the hisD gene coding for histidinol dehydrogenase; hisG46, mutation in the hisG gene coding for the enzyme of the first step in histidine biosynthesis; hisG428, TA 102 contains A-T base pairs at the site of the mutation in hisG, in contrast to TA 100 and TA 1535, which contain G-C base pairs at the site of mutation; rfa, mutation that causes a strong reduction in the lipopolysaccharide layer; uvrB, a gene involved in DNA excision repair (−, deleted; +, present); pKM101, plasmid that increases chemically induced and spontaneous mutagenesis by enhancement of error-prone DNA repair (+, present; −, absent); R, resistant; S, sensitive.


[Figure 1 schematic: wild-type bacterium, codon at the hisG46 site 5'-CTC-3'/3'-GAG-5' (leucine); Salmonella typhimurium TA 1535, 5'-CCC-3'/3'-GGG-5' (proline), no colony formation on minimal agar; reversion by chemical mutagens or endogenous processes leads to formation of colonies on minimal agar.]
Figure 1 Genetic basis of the Ames test shown for
test strain Salmonella typhimurium TA 1535. TA 1535 
carries an A to G point mutation compared with the 
wild-type bacterium. This point mutation causes an 
amino acid exchange (leucine versus proline) in the 
histidine operon (hisG46). As a consequence, TA 1535 is 
not able to perform histidine biosynthesis. A G to A 
point mutation restores the wild-type gene and 
produces a bacterium that is able to form a colony also 
on minimal agar, containing only very small concentra- 
tions of histidine. 


number of exposed bacteria results in a high probabil- 
ity that a mutagen will cause a reverse mutation to the 
histidine prototroph. 
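The statistical argument above can be made explicit. If each of N exposed bacteria reverts independently with a small probability p, the expected number of revertant colonies is Np, and the probability of at least one revertant is 1 − (1 − p)^N. In this Python sketch N = 10⁸ is an assumed, typical number of bacteria per plate, and the per-cell reversion probabilities are hypothetical values chosen only for illustration; independence between cells is also an assumption, and real plates carry spontaneous revertants as well.

```python
import math


def expected_revertants(n_cells: float, p_revert: float) -> float:
    """Mean number of revertant colonies if each cell reverts independently."""
    return n_cells * p_revert


def prob_at_least_one_revertant(n_cells: float, p_revert: float) -> float:
    """1 - (1 - p)**N, computed with log1p/expm1 to stay accurate for tiny p."""
    return -math.expm1(n_cells * math.log1p(-p_revert))


N_CELLS = 1e8  # assumed number of bacteria exposed per plate

if __name__ == "__main__":
    for p in (1e-10, 1e-8, 1e-6):  # hypothetical per-cell reversion probabilities
        print(f"p={p:g}: expected {expected_revertants(N_CELLS, p):g}, "
              f"P(>=1) = {prob_at_least_one_revertant(N_CELLS, p):.4f}")
```

With p = 10⁻⁸, about one revertant per plate is expected; raising p a hundredfold gives about 100 revertants, which is why a mutagen shows up as a marked increase in colony number over the solvent control.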

Some mutagens induce only specific types of mutation, which can be classified as base-substitution (base exchange) or frameshift mutations. The set of tester strains shown in Table 1 includes different mutations in the histidine
operon that combined are able to detect most (prob- 
ably >85%) of all genotoxic carcinogens. In order to 
increase their ability to detect mutagens, the Ames 
tester strains also contain other mutations. One muta- 
tion (rfa) causes partial loss of the lipopolysaccharide 
barrier. As a consequence, the permeability of the cell 
wall to large molecules is increased. Another advantage 
of rfa is that this mutation renders the bacteria nonpathogenic. The designation uvrB indicates a deletion of a gene required for DNA excision repair, resulting in increased sensitivity in the detection of many carcinogens. The deletion removing the uvrB gene also extends through a gene required for biotin synthesis, so strains carrying it also require biotin for growth. Some tester strains (TA98, TA100, TA102) contain the plasmid pKM101,
which confers ampicillin resistance and increases 
the sensitivity to mutagens by enhancement of error- 
prone DNA repair. Bacteria lack most of the enzymes 
required for the activation of promutagens to mam- 
malian carcinogens. A metabolically active fraction of 
mammalian liver homogenate is therefore added in the 
Ames test. 


Specific Techniques 


Two versions of the Ames test are usually applied
(Figure 2): the plate incorporation assay, in which
bacteria and test substance are mixed and immediately
poured onto the agar, and the preincubation assay, in
which bacteria and test substance are incubated together
for 1 h at 37 °C before plating. The preincubation
version is more sensitive for some compounds, but
also more laborious. In routine toxicological testing, a
negative result in the plate incorporation assay has to
be confirmed in a second series; the first may be a plate
incorporation assay, the second a preincubation assay.
Recently, a new version of the Ames test, ‘Ames II,’ has
been developed as a microwell fluctuation test, in
contrast to the standard plate incorporation or
preincubation test. Ames II allows automated,
high-throughput screening and requires only very small
amounts of test substance.

Figure 2 (A) Ames test procedure. All incubations
are performed with and without addition of rat liver S9
mix (see Figure 3). (B) A typical result of an Ames test
with tester strain TA98 without S9 mix. Only a small
number of revertants can be observed in the solvent
control (left side). Colonies on control dishes are a
consequence of spontaneous mutations due to
endogenous processes, such as generation of reactive
oxygen species and physical instability of DNA. Addition
of a mutagen, such as 10 µg benzo[a]pyrene-4,5-oxide
per plate, dramatically increases the number of
revertants (right side). (Preparation of plates: Hildegard
Georgi; photo: Friedrich Feyrer.)
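In a fluctuation format such as Ames II, each well is scored only as positive (revertants grew) or negative. One classical way to turn such counts into an estimate is the P0 method of Luria and Delbrück: if the number of revertant-founding events per well follows a Poisson distribution with mean m, the expected fraction of negative wells is e^(-m), so m = -ln(P0). The sketch below applies that assumption; it is not necessarily how Ames II data are scored in practice, and the well counts are hypothetical.

```python
import math


def mean_events_per_well(total_wells: int, positive_wells: int) -> float:
    """Estimate the mean number of revertant events per well (P0 method).

    Assumes events per well are Poisson-distributed, so the expected
    fraction of negative wells is exp(-m); solving gives m = -ln(P0).
    """
    if not 0 <= positive_wells < total_wells:
        raise ValueError("need at least one negative well to estimate m")
    p0 = (total_wells - positive_wells) / total_wells
    return -math.log(p0)


# Hypothetical 48-well counts for a solvent control and a treated sample.
control = mean_events_per_well(48, 3)
treated = mean_events_per_well(48, 30)
print(f"control m = {control:.3f}, treated m = {treated:.3f}, "
      f"ratio = {treated / control:.1f}")
```

The estimate breaks down when every well is positive (P0 = 0), which is why the function refuses that case rather than returning infinity.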


Metabolizing System: Rat Liver S9 Mix 


Since most carcinogens are not carcinogenic directly
but become active only after metabolism, compounds
are tested in the Ames test both directly and in the
presence of a mammalian metabolizing system. The
9000 g supernatant fraction (‘S9’; see Figure 3) of liver
homogenate from rats treated with substances that
cause a strong induction of many xenobiotic-metabolizing
enzymes (e.g., Aroclor 1254 or a combination of
β-naphthoflavone and phenobarbital), combined with
an NADPH-generating system, has been shown to be
well suited for metabolic activation. NADPH is required
because it is the cofactor for cytochrome
P450-dependent monooxygenase activity. Liver S9 is
highly active in carcinogen metabolism, since the liver
is the most important organ for the metabolism of most
foreign compounds.


Guidelines for Interpretation of Ames 
Test Data 


There are several criteria for determining a positive 
result, including a reproducible increase in the number 
of revertants or a dose-related increase in mutations. A 
positive result in the Ames test will usually initiate
additional investigations with other mutagenicity
assays, including assays in mammalian cells. If the
positive result is confirmed, most pharmaceutical
companies will terminate further development of a
drug. However, an Ames-positive substance is not
necessarily harmful to humans. Although the Ames test
is a useful tool in screening for potential carcinogens,
false-positive results are often obtained. It is generally
accepted that a substance may be used clinically
despite a positive Ames test if the positive effect is due
to a mechanism not relevant for humans and, ideally, if
additional mutagenicity tests with mammalian cells
in vivo and in vitro, tests for chromosomal aberrations,
and animal carcinogenicity studies in two species were
negative.
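The two criteria named above, a clear increase over the solvent control and a dose-related rise in revertants, can be expressed as a simple screening rule. This is a minimal sketch, assuming a fold-increase threshold of 2 (a widely used rule of thumb, not a value given in this article, and not a substitute for the full evaluation described in guidelines such as OECD 471) and hypothetical mean revertant counts per plate.

```python
from typing import List, Sequence


def fold_increases(control_mean: float, dose_means: Sequence[float]) -> List[float]:
    """Revertants at each dose divided by the solvent-control count."""
    if control_mean <= 0:
        raise ValueError("control mean must be positive")
    return [m / control_mean for m in dose_means]


def is_dose_related(dose_means: Sequence[float]) -> bool:
    """True if counts never decrease as dose increases (doses in ascending order)."""
    return all(a <= b for a, b in zip(dose_means, dose_means[1:]))


def looks_positive(control_mean: float, dose_means: Sequence[float],
                   threshold: float = 2.0) -> bool:
    """Flag a result if the largest fold increase meets the assumed threshold
    and the response rises with dose. A real call also requires reproducibility."""
    folds = fold_increases(control_mean, dose_means)
    return max(folds) >= threshold and is_dose_related(dose_means)


# Hypothetical mean revertants per plate at increasing doses.
print(looks_positive(25.0, [27.0, 41.0, 88.0, 160.0]))  # True
print(looks_positive(25.0, [30.0, 26.0, 29.0, 31.0]))   # False
```

Requiring a strictly non-decreasing response is a simplification: at high doses, toxicity to the bacteria often reduces the number of revertant colonies.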

Mechanisms causing false-positive results may be:
(1) differences between bacteria and mammalian cells
in metabolism and DNA repair; (2) differences between
rat and human liver, since rat liver S9 mix is used in
the standard Ames test; and (3) differences between
liver homogenate preparations such as S9 mix and
intact hepatocytes. The last of these arises from the
loss of barrier effects in the homogenate, because the
cell membrane is destroyed, and from the loss of phase II
metabolizing enzyme activity owing to dilution of
cofactors. Thus, bear in mind that the Ames test is an
artificial system and does not necessarily reflect the
in vivo situation. This is illustrated by the observation
that the endogenous tripeptide glutathione and the
amino acid cysteine are both positive in the Ames test
under specific conditions (Glatt et al., 1983).

Figure 3 Preparation of rat liver S9 mix. After centrifugation of liver homogenate at 9000 g, the supernatant (S9) is
used as a metabolizing system in the Ames test. S9 contains microsomes and cytosol and therefore all microsomal and
cytosolic xenobiotic-metabolizing enzymes. In contrast, the sediment containing cell membranes and lysosomes is
discarded. An NADPH-generating system (NADPH is the cofactor for cytochrome P450-dependent monooxygenase
activity) is added to S9 to form the “S9 mix.”

Performance and interpretation of Ames test results
have been standardized by international guidelines,
such as those of the Organization for Economic
Co-operation and Development (OECD Guideline 471)
and the International Conference on Harmonization
(ICH).


Future Prospects 


The Ames test is a sensitive tool in screening for
potential genotoxic carcinogens. However, despite the
high correlation between mutagenicity in the Ames test
and carcinogenicity, a positive result is difficult to
interpret for the individual case in question, because a
mutagen in the Ames test is not necessarily harmful to
humans. These problems may be alleviated in the
future. It has been shown that the use of intact cells
instead of S9 mix improves the correlation between
carcinogenicity and mutagenicity data (Utesch et al.,
1987). Since cryopreserved human hepatocytes are now
available, they can be used as a metabolizing system
instead of rat S9 mix (Hengstler et al., 2000). This makes
it possible to test whether a positive result in the
standard rat S9 Ames test is also relevant to humans.


Further Reading 

Ames B and Hooper K (1978) Does carcinogenic potency 
Amino acids are a class of important biomolecules that
contain both amino groups (-NH₃⁺) and carboxylate
groups (-COO⁻). In most contexts, the term ‘amino
acids’ refers to the α-amino acids, so called because
both the amino and carboxyl groups are attached to
the α-carbon of the structure depicted in Figure 1A.
However, other types of amino acids are encountered
in nature, such as the β-amino acids, in which the
amino and carboxyl groups are attached to different
carbons in the backbone (Figure 1B).

All α-amino acids (with the exception of glycine)
have four different substituents attached to the
α-carbon and are therefore chiral molecules. Chirality,
also called handedness, is a special case of asymmetry
describing objects that have no internal plane of
symmetry and are not superimposable upon their own
mirror image. Almost all of the amino acids found
in nature have the same chirality with respect to the
α-carbon and are referred to as L-amino acids according
to chemical nomenclature guidelines. Their rare
mirror-image stereoisomers are called D-amino acids.
Just as your right hand will not fit properly into
a left-hand glove, a D-amino acid will not fit properly in
a space that fits an L-amino acid.

Although hundreds of amino acids have been 
identified or synthesized, 20 of these are often desig- 
nated the common amino acids. These are shown in 
Figure 2. In biological systems these amino acids 
are the building blocks of proteins. Under the direction
of messenger RNA during protein synthesis, the amino
group of one amino acid and the carboxyl group of
another are condensed chemically on the ribosome,
which acts as an amino acid polymerase. This reaction
releases water and is called peptide bond formation
(Figure 1C). Chains of amino acids linked in this
manner are thus called polypeptides or, more loosely,
proteins.
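Because each peptide bond releases one molecule of water, the molar mass of a linear peptide of n residues is the sum of the masses of its free amino acids minus (n − 1) waters. A minimal sketch of that arithmetic, using rounded average molar masses for only three amino acids (glycine, alanine, serine); the table is deliberately incomplete and the function name is illustrative.

```python
# Rounded average molar masses (g/mol); only a few amino acids are included.
FREE_AMINO_ACID_MASS = {"G": 75.07, "A": 89.09, "S": 105.09}
WATER = 18.02


def peptide_mass(sequence: str) -> float:
    """Mass of a linear peptide: sum of the free amino acids minus one
    water for each of the (n - 1) peptide bonds formed."""
    if not sequence:
        raise ValueError("empty sequence")
    total = sum(FREE_AMINO_ACID_MASS[aa] for aa in sequence)
    return round(total - (len(sequence) - 1) * WATER, 2)


print(peptide_mass("GA"))   # 146.14 (glycylalanine)
print(peptide_mass("GAS"))
```

A single amino acid forms no peptide bond, so its "peptide" mass is just its free mass.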

Proteins are polymers with a constant backbone 
region (the peptide bonds) and variable side chains 
(the ‘R’ side chains of amino acids). The chemical
properties of the side chain of each amino acid (e.g., 
its size, charge, or hydrophobicity) and the order in 
which the amino acids are polymerized contribute to 
the overall shape and chemical properties and there- 
fore the function of each protein. These functions can 
vary widely, from proteins that form physical struc- 
tures for maintaining cell shape to elegant and delicate 
enzymatic machines that carry out highly regulated 
chemical reactions. 


Sources and Uses of Amino Acids 


Plants and many bacteria synthesize all 20 of the
amino acids listed in Figure 2. Amino acids are
synthesized from a variety of primary metabolites in
living cells. However, vertebrates, including humans,
can manufacture only a subset of these amino acids,
so they must obtain the remainder from their diet.
Amino acids that must be obtained in this manner are
known as essential amino acids (Figure 2). Since
proteins are composed of amino acids, diets that are
rich in protein are more likely to contain sufficient
amounts of each of the essential amino acids to
preclude any deficiencies.

Figure 1 Structural features of amino acids. (A) Common structure of α-amino acids; the position of the α-carbon is
labeled; ‘R’ represents any of a number of possible chemical structures, which may be as simple as a single hydrogen
atom (-H, as in the amino acid glycine) or more complex chemical groups, as in (B). (B) Examples of naturally occurring
amino acids with different ‘R’ groups. From left to right: alanine, β-alanine, and phosphinothricin. Note that in
β-alanine the amino group is attached to the β-carbon. Phosphinothricin, a herbicide, is an unusual amino acid in that
it contains phosphorus. (C) Representation of peptide bond formation.

Figure 2 The 20 common amino acids. Three- and one-letter abbreviations and the composition of the ‘R’ side
chain group are shown. Asterisks indicate which amino acids are essential for humans. Proline is an imino acid; the
entire structure and not just the side group is shown. Tyrosine can be synthesized from phenylalanine and therefore is
not essential if phenylalanine is provided.

In animal feeds, however, where the bulk of the 
protein present may come from a single source such 
as grain, imbalances in the individual essential amino 
acids can occur. For example, corn (maize) provides 
the bulk of protein in feed for livestock. Yet the pro- 
tein found in normal field corn is disproportionately 
low in lysine. To compensate, farmers routinely add 
lysine to animal feed to improve its nutritional value. 

Amino acids are produced commercially from a 
variety of sources and for a variety of uses. For exam- 
ple, lysine, tryptophan, and threonine to be used as 
feed supplements are produced by fermentation. In 
these processes, genetically altered bacteria that 
produce more of an amino acid than they need for 
their own growth excrete the excess amino acid into 
their growth medium. Once the desired amino acid
accumulates to a sufficient level, the bacteria can be
removed and the amino acid purified for use directly or
as an ingredient in feed formulations. Glutamic acid,
which is often used as the flavor enhancer monosodium
glutamate (MSG), is similarly produced by microbial 
fermentation. Other amino acids are produced 
commercially by chemically hydrolyzing proteins. 
Thus cysteine, which is particularly abundant in the 
protein keratin, is produced from hair. 

In addition to industrial applications in animal 
feed, human nutrition, and flavor enhancers, amino 
acids are also important components of cosmetics 
and medications. Amino acids or their chemical ana- 
logs can be used as precursors for synthesis of phar- 
maceutical agents. Synthetic polymers of amino acids 
are used to encapsulate drugs so as to aid in their 
absorption or to control their release into the blood- 
stream. Current research into amino acids promises 
to yield new polymers that can be used as textile 
fibers, novel antibiotics to combat infectious diseases, 
and nutritionally enhanced plants to feed a hungry 
world. 


See also: Biotechnology; Genetic Code; 
Translation 
Amino acid substitution is the term used to indicate
that the amino acid residue at a specific location in a 
protein is different from the residue found in the 
normal or wild-type protein at the same location. 
Amino acid substitutions typically arise because of 
mutation or because of an error in either transcription 
or translation. 

Mutations which lead to amino acid substitutions 
are called missense mutations. Such mutations occur 
within the coding region of a protein-encoding gene 
and are typically single base pair substitutions. Errors 
in translation that give rise to amino acid substitutions
can be either the result of misacylation of a transfer
RNA (tRNA), i.e., an error by an aminoacyl-tRNA 
synthetase, or a misreading of the genetic code on the 
ribosome itself. Suppression of nonsense mutations by 
suppressor tRNAs also typically leads to amino acid 
substitution. 

Several of the 20 amino acids of proteins can be
grouped according to the related chemical properties
and physical structures of their side chains. For
instance, the acidic amino acids aspartic acid and
glutamic acid have similar properties, as do the nonpolar
branched-chain amino acids isoleucine, leucine, and
valine. Substitutions of closely related amino acids at a 
particular site in a protein are called conservative sub- 
stitutions. A conservative amino acid substitution may 
have a limited impact on the activity of the protein. Of 
course, in some cases the sequence of a particular 
region of a protein is not critical to the activity of the 
protein and almost any kind of amino acid substi- 
tution will have a limited effect, provided the protein 
can still fold correctly. 
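The grouping idea can be made concrete with a small lookup. This is a minimal sketch: the two groups below are only the ones named in the text (acidic: Asp, Glu; nonpolar branched-chain: Ile, Leu, Val), so any residue outside them is never called conservative here. A real analysis would use a fuller grouping or a substitution matrix, which is outside the scope of this sketch; the function name is illustrative.

```python
# Side-chain groups named in the text, using one-letter codes.
GROUPS = {
    "acidic": {"D", "E"},                  # aspartic acid, glutamic acid
    "branched_nonpolar": {"I", "L", "V"},  # isoleucine, leucine, valine
}


def classify_substitution(wild_type: str, replacement: str) -> str:
    """Label a single substitution as identical, conservative, or other.

    'conservative' means both residues fall in the same listed group.
    """
    if wild_type == replacement:
        return "identical"
    for members in GROUPS.values():
        if wild_type in members and replacement in members:
            return "conservative"
    return "other"


print(classify_substitution("L", "I"))  # conservative
print(classify_substitution("E", "D"))  # conservative
print(classify_substitution("E", "V"))  # other
```

As the text goes on to note, a "conservative" label does not guarantee a small effect: at an active site even a conservative change can abolish activity.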

Clearly, however, many amino acid substitutions
will lead to a protein with an altered activity because
the amino acid substitution does in fact affect the
activity, folding, or assembly of the protein.
Incorporation of additional cysteine residues can lead
to inappropriate disulfide cross-bridges and interfere
with folding. Similarly, the amino acid proline disrupts
regular secondary structure, so substitutions involving
proline at certain sites can lead to loss of activity. In
addition, amino acid substitutions on the exterior of a
protein generally have less effect on its activity than
amino acid substitutions within the folded protein’s
interior. Of course, substitutions of amino acids in or
near an enzyme’s active site often have profound
effects on activity, even if the substitution is a
conservative one. For example, all amino acid
substitutions for the catalytically active histidine in the
enzyme chloramphenicol acetyltransferase result in an
almost complete loss of enzyme activity.

Particular substitutions which affect activity are 
often found by analyzing mutants which were selected 
or screened as having phenotypes indicating a loss of 
activity. The earliest example of this kind of analysis
is the fine-structure mapping of the trpA gene of
Escherichia coli and the biochemical analysis of the
tryptophan synthetase A protein encoded by the gene
from Trp⁻ mutants, carried out by Charles Yanofsky.
Because such mutants are selected precisely for loss of
function, analyzing data only from such examples would
lead to an overestimate of the impact of amino acid
substitutions.

However, the use of site-specific mutagenesis to
introduce essentially random point mutations in an
open reading frame makes it possible to ask questions
about the overall effect of amino acid substitutions in a
given protein. Analysis of such studies from a large
number of proteins indicates that only a minority of
random amino acid substitutions would be expected to
lead to a significant loss of protein function. Similar
information is available from sequence studies of the
human β-globin gene and its product. Over 300 single
amino acid substitutions are known in the β chain of
human hemoglobin, and the majority of these have no
effect on hemoglobin function and are compatible with
normal health. Some residues, however, are critical to
function, and amino acid substitutions at these sites
lead to loss of function (and hemoglobinopathy).

When the sequences of hemoglobin molecules from
phylogenetically diverse organisms are compared, some
amino acids are found always to occupy the same
positions; these amino acid residues are said to be
invariant. However, as might be expected from the
above discussion, there is variation at most residues.
The amount of variation is correlated with evolutionary
distance.

The percentage of positions at which amino acid 
substitutions have occurred between homologous 
proteins from two different species is a measure of 
the fixation of mutations in the genes encoding these
proteins since the two species diverged from a
common ancestor. This divergence can be used to
calculate an evolutionary distance.

Different genes or regions within a gene will accu- 
mulate mutations, and therefore amino acid substi- 
tutions, at different rates depending on such factors 
as the fraction of residues which are invariant or
the fraction at which only conservative changes are 
allowed. In addition to the globins, several other pro- 
teins have been analyzed in this way, some of which 
span an even wider phylogenetic range. For example, 
the 104 residues of cytochrome c from humans and
from chimpanzees are identical, but they have only 
38 residues in common with cytochrome c from the 
yeast Saccharomyces cerevisiae. 
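The cytochrome c figures above can be turned into a distance. The proportion of differing positions, p, is the simplest measure; because a single position can change more than once over long periods, a common correction (the Poisson correction, d = -ln(1 - p), which is not described in this article) estimates the number of substitutions per site. A minimal sketch, treating the 104 human residues as the compared length, which ignores gaps and any length difference with the yeast protein:

```python
import math


def p_distance(identical: int, compared: int) -> float:
    """Proportion of compared positions that differ."""
    return (compared - identical) / compared


def poisson_distance(p: float) -> float:
    """Estimated substitutions per site, correcting for multiple hits."""
    if not 0 <= p < 1:
        raise ValueError("p must be in [0, 1)")
    return -math.log(1 - p)


# Human vs. chimpanzee: all 104 residues identical.
print(p_distance(104, 104))  # 0.0
# Human vs. yeast: 38 of 104 residues in common (simplified alignment).
p = p_distance(38, 104)
print(round(p, 3), round(poisson_distance(p), 3))  # 0.635 1.007
```

The corrected value exceeds the raw proportion because it allows for positions that changed more than once, which become more frequent as divergence increases.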


See also: Globin Genes, Human; Homology; 
Mistranslation; Mutation, Missense; Suppressor 
tRNA; Yanofsky, Charles 


Amino Terminus 


J Parker 


Copyright © 2001 Academic Press 
doi: 10.1006/rwgn.2001.0045


All proteins have at one end an amino acid residue
whose α-amino group is not involved in the
formation of a peptide bond. This is the amino
terminus or N-terminus. Ribosomes begin synthesizing
a protein from the amino terminus, and all ribosomes 
initiate using either the amino acid methionine (ar- 
chaea and eukarya) or N-formylmethionine (bacteria). 
In bacteria the formyl group is removed from the 
methionine of nearly all nascent proteins by the en- 
zyme polypeptide deformylase. 

In all organisms the initiating methionine is re- 
moved from the majority of proteins by the enzyme 
methionine aminopeptidase. Therefore, many mature 
proteins do not have a methionine residue at the amino 
terminus. However, the activity of this enzyme is 
dependent on the identity of the next residue in the 
protein. In many organisms, it appears that the next 
residue must be an alanine, glycine, proline, serine, 
threonine, or valine for the enzyme to act. At least in 
the enteric bacteria it has been shown that methionine 
aminopeptidase is essential to the organism. 
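The dependence of methionine aminopeptidase on the second residue can be written as a simple rule. This is a minimal sketch, assuming exactly the permissive residues listed in the text (Ala, Gly, Pro, Ser, Thr, Val) and ignoring every other influence on processing; the function name and example sequences are hypothetical.

```python
# Second residues that, per the text, allow methionine aminopeptidase to act.
PERMISSIVE_SECOND_RESIDUES = set("AGPSTV")


def mature_n_terminus(sequence: str) -> str:
    """Predict the protein after initiator-methionine removal.

    Input is a one-letter sequence beginning with the initiator M.
    Met is removed only when the next residue is permissive.
    """
    if not sequence.startswith("M"):
        raise ValueError("expected a sequence beginning with initiator Met")
    if len(sequence) > 1 and sequence[1] in PERMISSIVE_SECOND_RESIDUES:
        return sequence[1:]
    return sequence


print(mature_n_terminus("MSKGE"))  # SKGE  (Ser follows Met: Met removed)
print(mature_n_terminus("MKLVE"))  # MKLVE (Lys follows Met: Met retained)
```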

Methionine aminopeptidase removes only the initi- 
ating methionine. Other proteases must be respon- 
sible for removing subsequent amino acids. For 
instance, the mature cII protein encoded by
bacteriophage λ is missing both the initiating methionine and
valine, which would have been added as the second 
residue when the protein was being synthesized. 

The amino-terminal region also contains the amino 
acid residues which signal the cell to export the pro- 
tein, or to transport it to certain cellular compart- 
ments. These ‘signal sequences’ or ‘leader peptides’ 
function in the transport process and are removed as 
these proteins are transferred through membranes. 
Other mature proteins are the product of processing 
events by proteases that do not involve transport 
but may involve removal of residues at the amino 
terminus. 


In bacteria the formylation of the initiating methio- 
nine is accomplished while the methionine is attached 
to the initiator tRNA (and as we have seen is com- 
monly removed during translation of the complete 
protein). However, in all organisms, modification to 
the amino-terminal amino acid residue of a protein 
can occur as the result of a posttranslational process. 
Acetylation is quite common; for example, in Escher- 
ichia coli the amino-terminal serine of elongation 
factor Tu is acetylated, as are the amino-terminal resi- 
dues of several ribosomal proteins. The E. coli riboso- 
mal proteins L7 and L12 are identical except that the 
amino-terminal serine of L12 is acetylated to give L7. 
In the case of L7/L12, at least, this acetylation is under 
physiological control. 

The identity of the amino acid residue that occurs at 
the amino terminus of the mature protein also is an 
important determinant of the rate of degradation of the 
protein, at least in bacteria, eukaryotic microorgan- 
isms, and animal cells. This is sometimes referred to 
as the N-end rule. Interestingly, amino acid residues 
that are uncovered by the action of methionine amino- 
peptidase (see above) are stabilizing, whereas the 
presence of other amino acid residues at the amino 
terminus leads to more rapid degradation of the
protein. Therefore, the requirement organisms have for
this enzyme could relate to stabilization of proteins 
rather than the necessity of removing methionine to 
achieve full activity. 

The stability of proteins can be affected by add- 
ing residues to the amino terminus in a nontemplated 
fashion. In eukaryotes this can be accomplished by 
arginyl-tRNA protein transferase which adds an argi- 
nine residue to amino-terminal glutamyl or aspartyl 
residues of certain proteins as part of a pathway by 
which these proteins are degraded. Prokaryotes
contain a leucyl/phenylalanyl-tRNA protein transferase
which serves a similar function.


See also: Leader Peptide; Translation 
Aminoacyl-tRNA is transfer RNA carrying an amino
acid; the covalent linkage is an ester bond between the
COOH group of the amino acid and either the 3′- or
2′-OH group of the ribose of the terminal adenosine of
the tRNA.


See also: Transfer RNA (tRNA) 
During translation of the genetic code, the ribosome
polymerizes amino acids according to the information 
(codon sequence) in the ribosome-bound messenger 
RNA (mRNA). The identity of an amino acid inserted 
at a particular position during polypeptide synthesis is 
determined by the interaction of an mRNA codon 
with a particular aminoacyl-tRNA (AA-tRNA), spe- 
cified by the tRNA portion of the AA-tRNA. AA- 
tRNAs are formed by the 3’ esterification of tRNAs 
with the appropriate amino acids. For most AA- 
tRNAs, this is achieved by direct aminoacylation of 
a particular tRNA with the corresponding amino acid, 
catalyzed by a group of enzymes known collectively 
as the AA-tRNA synthetases (AARSs), in the following
two-step reaction:

AA + ATP + AARS ⇌ AARS·AA-AMP + PPi
AARS·AA-AMP + tRNA → AARS + AA-tRNA + AMP

where AA is an amino acid and AARS is the corres- 
ponding AA-tRNA synthetase. In the first step, the 
amino acid activation step, some synthetases (such as 
GlnRS, ArgRS, and GluRS from Escherichia coli) also 
require the presence of tRNA for catalysis of the 
activation. When an amino acid is attached to a 
tRNA (step 2), the tRNA is said to be aminoacylated 
or charged. The terms ‘uncharged tRNA,’ ‘unacyl- 
ated tRNA,’ or ‘deacylated tRNA’ refer to a tRNA 
molecule lacking an amino acid. If the tRNA specific
for tryptophan (Trp) is acylated with tryptophan, the
product is termed tryptophanyl-tRNA (Trp-tRNA) or,
more precisely, Trp-tRNA^Trp, where the superscript
denotes the specificity of the tRNA. If a glutamine
(Gln)-specific tRNA were misacylated with Trp,
the product would be termed Trp-tRNA^Gln.
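The naming convention can be captured in a small helper, with the superscript written as "^" in plain text: the prefix names the amino acid actually attached, and the superscript names the amino acid for which the tRNA is specific. A minimal, illustrative sketch; the function name is hypothetical.

```python
def aa_trna_name(attached: str, trna_specificity: str) -> str:
    """Name an aminoacyl-tRNA as <attached>-tRNA^<specificity>.

    Both arguments are three-letter amino acid codes; a mismatch
    between them marks a misacylated tRNA.
    """
    name = f"{attached}-tRNA^{trna_specificity}"
    if attached != trna_specificity:
        name += " (misacylated)"
    return name


print(aa_trna_name("Trp", "Trp"))  # Trp-tRNA^Trp
print(aa_trna_name("Trp", "Gln"))  # Trp-tRNA^Gln (misacylated)
```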

The specific synthetases are denoted by their three- 
letter amino acid designation and ‘RS’; for example, 
GlyRS for glycyl-tRNA synthetase. Despite their 
conserved mechanisms of catalysis, the AARSs differ 
in the nature of the active site for the amino acid and 
ATP as well as with respect to tRNA identity elements 
recognized and the modes of binding to the tRNAs. 
Accuracy in the formation of a particular AA-tRNA
involves factors such as tRNA identity determinants
for a particular AARS, the existence of antideterminants
in certain tRNAs that prevent interaction with
noncognate AARSs, and proofreading of a misactivated
amino acid either as an enzyme-bound aminoacyl-adenylate
(pretransfer proofreading) or as an enzyme-bound
aminoacyl-tRNA (posttransfer proofreading).

A central premise of Crick’s “adaptor” hypothesis 
(late 1950s) was that each AA-tRNA would be synthe- 
sized by a unique amino acid-specific enzyme. Con- 
sequently, the cell was expected to contain 20 such 
proteins for the 20 common amino acids used in protein 
synthesis. In the decades that followed, that expect- 
ation was realized in the discovery of the 20 AARSs. 
However, more recent studies as well as discoveries 
arising mainly from completion of the genome 
sequencing of organisms from the three domains of 
living things, Bacteria, Archaea, and Eukarya, have 
shown that, contrary to all expectations, numerous 
organisms do not use a full complement of 20 canon- 
ical AARSs to synthesize AA-tRNAs for protein 
synthesis and that some have unexpected kinds of 
synthetases. 

A widespread exception to the conventional adaptor
hypothesis occurs in the formation of
glutaminyl-tRNA^Gln (Gln-tRNA^Gln) and
asparaginyl-tRNA^Asn (Asn-tRNA^Asn). In most
Bacteria, Archaea, and eukaryal organelles, those
AA-tRNAs are formed by an indirect, two-step route
involving tRNA-dependent amino acid transformations,
generally in the absence of GlnRS and AsnRS. In the
formation of Gln-tRNA^Gln, for example, tRNA^Gln is
first misacylated with glutamate by a nondiscriminating
GluRS. In addition to generating Glu-tRNA^Glu, that
synthetase, owing to relaxed tRNA specificity, also
forms Glu-tRNA^Gln. The resulting misacylated
tRNA is then specifically recognized by
glutamyl-tRNA^Gln amidotransferase (GluAdT) and
converted to Gln-tRNA^Gln. Analogously, the actions of
a nondiscriminating aspartyl-tRNA synthetase and an
aspartyl-tRNA^Asn amidotransferase lead to the
formation of Asn-tRNA^Asn.

A most interesting and unexpected kind of tRNA- 
dependent amino acid transformation came with the 
discovery that selenocysteine (Sec), a nonstandard 
amino acid found in proteins of all three domains, is 
cotranslationally inserted into polypeptides under the 
direction of the codon UGA and assisted by a distinct 
elongation factor and a structural signal in the mRNA. 
Although strong evidence exists for different aspects 
of the mechanism in eukaryal and archaeal organisms, 
the details have been most clearly revealed in studies 
with E. coli. A special tRNA species, tRNA^Sec, is
misacylated with serine (Ser) by SerRS. The Ser in
the Ser-tRNA^Sec so formed is then converted in two
steps to selenocysteine, resulting in Sec-tRNA^Sec, the
AA-tRNA that binds to the Sec-specific translation
factor SelB and then responds to the UGA codon.
The most serious and intrinsically contradictory
exception to the simple adaptor hypothesis so far is
the recent finding that, in several organisms,
Cys-tRNA^Cys is formed not by a CysRS but rather by
a ProRS: a single polypeptide, the amino-terminal
part of which corresponds to ProRSs, but which
acylates tRNA^Cys with Cys in addition to forming
Pro-tRNA^Pro. This is the first known example of a single
AA-tRNA synthetase that can specify two different
amino acids in translation. Although ProCysRS does
not require the presence of tRNA^Pro for
prolyl-adenylate synthesis, the activation of cysteine is
seen only in the presence of tRNA^Cys. The enzyme does
not make the misacylated cross-products
(Cys-tRNA^Pro or Pro-tRNA^Cys), and it does not interact
with the other 18 amino acids. However, while binding
and activation of proline facilitates tRNA^Pro binding, it
simultaneously prevents tRNA^Cys binding. Similarly,
the binding of tRNA^Cys to the enzyme blocks the
activation of proline, thereby allowing the activation
only of cysteine. Besides posing a challenge for
deciphering the mechanisms involved, the ProCysRSs
should eventually shed light on the evolution of
AA-tRNA synthetases.


See also: Adaptor Hypothesis 
Aminopterin is an analog of the coenzyme folic acid
and, like the closely related compound methotrexate,
is a potent inhibitor of the enzyme dihydrofolate
reductase. Inhibition of this enzyme blocks the
regeneration of tetrahydrofolate, without which the
methylation of dUMP to dTMP (carried out by thymidylate
synthetase) cannot be accomplished. Tetrahydrofolate 
is also required for the synthesis of the purine 
ring found in adenine and guanine. Therefore, inhib- 
ition of tetrahydrofolate formation leads to inhibition 
of the de novo pathways to the DNA precursors 
dATP, dGTP, and dTTP and thus to inhibition of 
DNA synthesis. This ability to inhibit DNA synthesis 
in rapidly dividing cells has led to the use of amino- 
pterin and methotrexate as anticancer drugs. These 
drugs have also been used to induce abortions, but 
they are teratogens and their failure when used as 
abortifacients can result in infants being born with 
the multiple anomalies characterizing fetal amino- 
pterin syndrome. 


Aminopterin is also used in the laboratory as a
selective agent in so-called ‘HAT medium’
(hypoxanthine, aminopterin, thymidine). The aminopterin
in the medium blocks de novo synthesis of DNA
precursors, as outlined above, forcing cells in this
medium to use salvage pathways to convert
hypoxanthine to usable purines and to convert
thymidine to dTMP. The medium’s primary use has been
to select for fusions between myeloma cells lacking an
enzyme required for the use of hypoxanthine
(hypoxanthine-guanine phosphoribosyltransferase,
HGPRT) and other cells lacking thymidine kinase and,
therefore, unable to use thymidine. Neither parental cell
type can grow in HAT medium, but hybrid cells, in which
each parent supplies the enzyme the other lacks, can.
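The selection logic can be expressed as a small truth table. This is a minimal sketch, assuming that in HAT medium a cell survives only if it can salvage both hypoxanthine (which requires HGPRT) and thymidine (which requires thymidine kinase), and that a fused cell carries the enzymes of both parents. Growth limits unrelated to HAT, such as the limited life span of unfused primary cells, are ignored; the class and function names are illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cell:
    name: str
    has_hgprt: bool  # hypoxanthine salvage (hypoxanthine-guanine phosphoribosyltransferase)
    has_tk: bool     # thymidine salvage (thymidine kinase)


def fuse(a: Cell, b: Cell) -> Cell:
    """A hybrid expresses every salvage enzyme present in either parent."""
    return Cell(f"{a.name} x {b.name}",
                a.has_hgprt or b.has_hgprt,
                a.has_tk or b.has_tk)


def survives_hat(cell: Cell) -> bool:
    """Aminopterin blocks de novo synthesis, so both salvage routes are needed."""
    return cell.has_hgprt and cell.has_tk


myeloma = Cell("HGPRT- myeloma", has_hgprt=False, has_tk=True)
partner = Cell("TK- partner", has_hgprt=True, has_tk=False)
for cell in (myeloma, partner, fuse(myeloma, partner)):
    print(cell.name, survives_hat(cell))
```

Only the hybrid prints True, which is the complementation that makes HAT medium selective for fused cells.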


See also: DNA Replication; Purine 


Amniocentesis 


M A Ferguson-Smith 


Copyright © 2001 Academic Press 
doi: 10.1006/rwgn.2001.0046


Amniocentesis is the withdrawal of fluid from the 
amniotic sac. The procedure is used for fetal diagnosis 
and is usually performed at 16-18 weeks of gestation 
when there is about 180 ml of fluid. At this stage
the risk of inducing miscarriage is about 1%. Earlier 
amniocentesis is associated with an increased risk of 
miscarriage and may, in addition, lead to postural 
deformities of the fetus including club foot as a result 
of persistent leakage of amniotic fluid from the cer- 
vical canal. 

Amniocentesis is performed under aseptic con- 
ditions and after placental localization using ultra- 
sound. A needle is inserted transabdominally under 
ultrasound guidance into the amniotic cavity. Up 
to 20 ml can be withdrawn for fetal diagnosis. The
chance of contamination of the sample with maternal 
tissue cells and decidua can be greatly reduced if 
a stilette is used in the needle and if the first few 
drops of the amniotic fluid sample are discarded after 
withdrawal. 

Amniocentesis was first used extensively in the 
management of erythroblastosis fetalis due to mater- 
nal isoimmunization resulting from Rhesus D incom- 
patibility of the fetus. Since the 1970s, it has been used 
more frequently for the prenatal diagnosis of genetic 
disease and neural tube defects. Amniotic fluid 
contains viable fetal cells which can be grown in tissue- 
culture medium and used for chromosome analysis, 
enzyme assay, and DNA analysis for single-gene 
disorders. After centrifugation and the removal of
cells, an increased level of amniotic α-fetoprotein
(AFP) may indicate the presence of an open neural
tube defect, i.e., anencephaly or open spina bifida or,
more rarely, other fetal lesions associated with the
leakage of fetal blood plasma into the amniotic fluid,
for example congenital nephrosis and placental
hemangioma. However, by far the most common use 
of amniocentesis is for the diagnosis of chromosomal 
trisomy, especially trisomy 21 (Down syndrome). In 
these cases the indication for amniocentesis may be a 
maternal age greater than 35 years, as the risk of 
Down syndrome increases with maternal age. In 
pregnancies affected by Down syndrome certain 
maternal serum constituents can act as markers for 
the syndrome; thus free β-chorionic gonadotrophin
is elevated and serum AFP is decreased in affected 
pregnancies. These and other markers are used in 
prenatal screening programs to estimate the risk of 
an affected pregnancy at all maternal ages. The serum 
screening programs, taken together with the results 
of prenatal ultrasound examination for fetal nuchal 
translucency, are widely used by mothers who wish 
to know their risk of giving birth to an affected child. 
The screening test is not diagnostic and amniocentesis 
is required to determine if the pregnancy is affected. 
A definitive diagnosis of a severe abnormality gives the
parents the option of termination of pregnancy. 
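How a screening program combines an age-related background risk with marker results is not spelled out above. One standard approach, assumed here, is Bayes' theorem in odds form: posterior odds = prior odds × likelihood ratio, where the likelihood ratio expresses how much more often the observed marker pattern occurs in affected than in unaffected pregnancies. The prior risk and likelihood ratio below are purely illustrative numbers, not clinical values, and the function names are hypothetical.

```python
def risk_to_odds(risk: float) -> float:
    """Convert a probability (e.g. 1/250) to odds in favour (1:249)."""
    return risk / (1.0 - risk)

def odds_to_risk(odds: float) -> float:
    """Convert odds back to a probability."""
    return odds / (1.0 + odds)

def combined_risk(prior_risk: float, likelihood_ratio: float) -> float:
    """Update a background (e.g. age-based) risk with a marker likelihood ratio."""
    return odds_to_risk(risk_to_odds(prior_risk) * likelihood_ratio)

# Illustrative only: a 1-in-250 background risk and a marker profile assumed
# to be twice as common in affected pregnancies as in unaffected ones.
risk = combined_risk(1 / 250, 2.0)
print(f"combined risk is about 1 in {1 / risk:.0f}")
```

The output is still only a risk estimate; as noted above, amniocentesis is needed to establish whether the pregnancy is affected.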


See also: Down Syndrome; Prenatal Diagnosis 
Amplicons are regions within a genome that consist of
a DNA segment bordered by two repeated sequences 
positioned in direct orientation. Amplicons have
also been referred to as amplifiable units of DNA.
Homologous recombination between the repeated 
sequences may lead either to amplification or deletion 
of the whole amplicon structure. 

The rate-limiting step for amplification is the 
homologous recombination event between the re- 
peated sequences, which generates a tandem duplica- 
tion. The tandem duplicated structure formed can 
further recombine along its whole length to increase 
amplification or to delete one of the tandem repeats. 
An amplified region of the genome consists of a series 
of tandemly repeated sequences in direct orientation. 
Recombination among such sequences leads to either 
an increase or a decrease in the amplification factor. In 
this context, amplification of a certain region of the 
genome should be considered as a highly dynamic 
state, whose copy number may vary and which may 
return to the basal nonamplified state without disrupting
the structure of the genome.
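The copy-number bookkeeping behind these changes can be sketched with a simplified model, assumed here for illustration. An amplified region is written as alternating repeats and amplicon units (R0 S1 R1 ... Sn Rn), and every crossover is taken to be an exact exchange between two whole repeats. Unequal exchange between misaligned repeats on sister copies of the chromosome gains units on one product and loses the same number on the other; recombination between two repeats of the same molecule loops out a circle of one or more units. The function names are hypothetical.

```python
def unequal_sister_exchange(n: int, p: int, q: int) -> tuple[int, int]:
    """Crossover between repeat p on one sister copy and repeat q on the other.

    Each sister carries n tandem amplicon units with repeats indexed 0..n.
    Returns the unit counts of the two recombinant products.
    """
    if not (0 <= p <= n and 0 <= q <= n):
        raise ValueError("repeat index out of range")
    shift = p - q  # misalignment between the two recombining repeats
    return n + shift, n - shift

def intramolecular_excision(n: int, p: int, q: int) -> tuple[int, int]:
    """Recombination between repeats p < q on the same molecule.

    Returns (units left in the chromosome, units in the excised circle).
    """
    if not (0 <= p < q <= n):
        raise ValueError("need 0 <= p < q <= n")
    circle = q - p
    return n - circle, circle

# Basal amplicon (n = 1): tandem duplication on one sister, deletion on the other.
print(unequal_sister_exchange(1, 1, 0))
# A three-unit array loses a dimeric circle and keeps one unit.
print(intramolecular_excision(3, 0, 2))
```

In this model the total number of units is conserved in every event, which is why the amplified state can both grow and shrink, and can return to one unit (or to zero, a deletion) without any other change to the genome.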

The dynamic nature of a DNA amplified region 
continuously generates closed circular structures that 
consist of monomers or multimers of the whole
amplicon. Because they lack an origin of replication,
these structures are lost as the cell divides. However,
some genetic manipulations may allow such structures
to be recovered, which facilitates the molecular
characterization of amplicon structures.

The main factors influencing the rate at which an 
amplicon can duplicate or delete are the length and 
sequence conservation of the recombining repeats. 
The specific nature of the repeated sequences is un- 
important. Among the reiterated sequences that 
participate as amplicon borders, there are common 
inhabitants of many genomes, such as ribosomal oper- 
ons and insertion sequences (IS) of different kinds. On 
the other hand, examples of species-specific repeated 
sequences are the recombination hot spot sequences 
(rhs) found in Escherichia coli, or the repeated nitro- 
genase operons found in several Rhizobium species. 

Amplicons define a structural characteristic of the 
genome. Any replicon, chromosome, or plasmid can 
be viewed as a structure formed by overlapping ampli- 
cons. The only exception would be a replicon devoid 
of repeated sequences. The length of specific
amplicons may vary from a few kilobases to more than
one megabase. The number and size distribution
of amplicons in a genome depend on the amount,
location, and relative orientation of long repeated
sequences. These factors, coupled with the genome
architecture (chromosomal vs. plasmid, linear vs.
circular), contribute to shaping the “amplicon
structure” of a particular genome.
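If each amplicon is taken, as in the definition above, to be a segment bordered by two directly oriented copies of the same repeat, the amplicon structure of a replicon can be enumerated from a list of repeat positions. The sketch assumes a linear coordinate map and exact repeat-family labels; the coordinates are invented for illustration and the names are hypothetical.

```python
from itertools import combinations
from typing import NamedTuple

class Repeat(NamedTuple):
    position: int  # coordinate of the repeat copy (bp)
    family: str    # e.g. an IS element or a ribosomal operon
    strand: str    # "+" or "-"

def amplicons(repeats: list[Repeat]) -> list[tuple[int, int, str]]:
    """Return (start, end, family) for every pair of same-family repeats in direct orientation.

    Pairs in inverted orientation are skipped: recombination between them
    inverts the intervening segment rather than amplifying or deleting it.
    """
    ordered = sorted(repeats, key=lambda r: r.position)
    return [(a.position, b.position, a.family)
            for a, b in combinations(ordered, 2)
            if a.family == b.family and a.strand == b.strand]

# Hypothetical replicon with two IS1 copies in direct orientation, one inverted
# IS1 copy, and two directly oriented ribosomal operons (rrn).
genome = [
    Repeat(100, "IS1", "+"),
    Repeat(500, "rrn", "+"),
    Repeat(900, "IS1", "+"),
    Repeat(1_400, "IS1", "-"),
    Repeat(2_000, "rrn", "+"),
]
for start, end, family in amplicons(genome):
    print(f"{family}: {start}-{end} ({end - start} bp)")
```

The two amplicons found (100-900 and 500-2000) overlap, which illustrates the point that a replicon can be viewed as a set of overlapping amplicons. On a circular replicon, each directly oriented pair would also delimit the complementary arc, which this linear sketch ignores.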

The biological role of DNA amplification in pro- 
karyotes might be related to adaptation to extreme 
conditions or situations that impose severe demands 
on the ability of regulatory systems. Under some con- 
ditions, overexpression of gene products through gene 
amplification may confer the phenotypic advantages 
needed for survival. The amplified state would remain 
as long as the selective condition exists; when the 
selective condition disappears, the population will 
return to the basal nonamplified state. 

Different examples of natural DNA amplification 
have been correlated with adaptive situations in
different organisms. Among these are the following:
increased resistance to antibiotics; increased resistance 
to heavy metals; growth under conditions of nutrient 
scarcity; growth in different exotic (nonnatural) car- 
bon sources; enhancement of pathogenicity; and en- 
hancement of the capacity to fix nitrogen. 

In addition to its role in adaptation to transient 
conditions, DNA amplification provides new subjects 
for evolution. Amplified genes can receive different
mutations, thus allowing the evolution of new func- 
tions, while leaving nonmutated copies to cope with 
the original function. 


See also: Gene Amplification; Insertion Sequence 
The story of life on earth shows that it led from
the simplest bacteria, 3.5 billion years ago, to organ- 
isms that are far more complex. Indeed, one can rec- 
ognize a whole series of seeming stepping stones: 
prokaryotes; single-celled eukaryotes with a nucl- 
eus, cellular organelles, and sexuality; and finally 
many-celled fungi, plants, and animals. Among the 
animals one may step from ectotherms (cold-blooded) 
to two types of endotherms (warm-blooded): birds 
and mammals and, among the mammals, to species 
with highly developed parental care and the acquisi- 
tion of a complex central nervous system. Tradition- 
ally taxa with these evolutionary acquisitions have 
been referred to as higher organisms, and this kind 
of evolution has been called anagenesis (‘upward’ 
evolution). 

Considering humans as the endpoint of anagene- 
sis, this development has also been called evolutio- 
nary progress. In a purely descriptive sense the word 
‘progress’ is defensible, because each step leading to 
such ‘progress’ was the result of natural selection, a
reward for a genotype that was at that moment superior
to its competitors. However, the term ‘evolutionary
progress’ should be used cautiously for two
reasons. The first is that many, if not most, authors 
who used the term ‘progress’ were teleologists and 
ascribed the advance to intrinsic forces of purpose 
that led inexorably to ever greater perfection. But 
this is something entirely different from the tempor- 
ary superiority reflected in any given act of natural 
selection. 

More important is the criticism that most evolu- 
tionary developments in various phyletic lineages are 
not conspicuously progressive. What is so progressive 
in a sea-urchin, a moss, a giant kelp, a cave inhabitant, 
or a parasite? Evidently they all are well adapted to the 
place in nature which they occupy, but do they repre- 
sent the same kind of progress which so many authors 
saw in the series from amphioxus, to rhipidistian fish, 
reptile, monkey, and humans? Surely not! 


The term ‘anagenesis’ is still useful as a caption for 
all evolutionary processes leading to the divergence 
of phyletic lineages and to the genetic changes in a
lineage owing to its response to natural selection, but it
does not imply the existence of any teleological forces. 

The most comprehensive treatment of anagenesis is 
still that of Rensch (1960, pp. 281-308). An extensive 
critique of anagenesis, under the label of evolutionary 
progress, was published by Ruse (1996). The other 
major evolutionary process is cladogenesis. 
As used by Owen (1843), analogous structures are
structures that perform similar functions. Panchen
(1994) presents a history of the terms homology and
analogy. Although analogy is often treated as the 
complement of homology, this was not the original 
intent. Indeed, according to Owen, homologous 
similarities can also be analogous similarities if they 
perform the same function in the two organisms that 
are compared. Insect wings and bird wings are func- 
tionally analogous because they perform the same 
function but have no underlying degree of similarity 
of their parts. But the wings of birds and bats also
perform an analogous function (flight), while being 
homologous as vertebrate fore-limbs. Patterson (1988) 
distinguishes analogous characters as those that fail 
the similarity test as well as the congruence test. Most
other authors reserve the term analogous to describe 
those identical character states that are not homologs 
(bird wings and insect wings) and use the terms paral- 
lelisms and convergence to describe those homopla- 
sies that are more or less structurally similar but fail 
the congruence test of Patterson (1988).
The theory of ancestral heredity was developed in the
late 1880s by the Victorian amateur scientist Francis 
Galton (1822-1911) who wanted to determine the 
relationship between various characteristics in parents 
and offspring. The theory was derived from his law of 
ancestral heredity which stated that 


two parents contribute between them on average 1/2 (or 
0.50) of the total inheritance of offspring; four grandparents 
contribute 1/4 (or 0.50²) and so on generating the occupier of
each ancestral place in the nth degree. 


Galton decided that the influence, pure and simple, 
of the midparent may be taken as 1/2 the midparent, 
1/4 the mid-grandparent, and 1/8 the mid-great- 
grandparent. 
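The arithmetic of the law can be made explicit. The sketch below follows the fractions quoted above (1/2 for the two parents together, 1/4 for the four grandparents, and so on). It derives the share of a single ancestor by dividing each generation's share among its 2^n members, and applies Galton's weights of 1/2, 1/4, 1/8, ... to mid-ancestral deviations. Expressing characters as deviations from the population mean is an assumption of this sketch, and the function names are hypothetical.

```python
def generation_share(n: int) -> float:
    """Total contribution of all ancestors n generations back: (1/2)**n."""
    return 0.5 ** n

def per_ancestor_share(n: int) -> float:
    """Share of one ancestor: (1/2)**n divided among 2**n ancestors, i.e. (1/4)**n."""
    return generation_share(n) / 2 ** n

def predicted_deviation(mid_ancestral_deviations: list[float]) -> float:
    """Offspring deviation predicted from mid-ancestor deviations.

    Index 0 is the midparent, 1 the mid-grandparent, 2 the mid-great-grandparent,
    weighted 1/2, 1/4, 1/8, ... respectively.
    """
    return sum(d * 0.5 ** (k + 1) for k, d in enumerate(mid_ancestral_deviations))

# The generation shares form the series 1/2 + 1/4 + 1/8 + ..., which approaches 1,
# so the ancestry as a whole accounts for the entire inheritance.
print(sum(generation_share(n) for n in range(1, 31)))
print(per_ancestor_share(2))  # one grandparent
```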

Until the 1870s ideas of heredity were linked to 
problems of embryology and growth. Charles Dar- 
win’s theory of inheritance, expressed in his “provi- 
sional hypothesis of pangenesis,” was a developmental 
theory. Pangenesis was a blending theory of inherit- 
ance and implied that each cell of an organism threw 
off minute particles of its contents or gemmules, not 
only during the adult stage, but during each stage of 
development of every organism. After the gemmules 
multiplied and aggregated in the reproductive appar- 
atus, this material was to be passed on to the next 
generation. To explain how useful variation could sur- 
vive from one generation to the next without being 
swamped by the effects of blending inheritance, Dar- 
win adopted a Lamarckian view. 

In the following year, Galton wrote his first paper 
on inheritance and had “observed the fact of rever- 
sion.” Darwin had, by then, attributed reversion to 
the development of gemmules which had merely lain 
dormant. Though Galton had objected initially to
Darwin’s Lamarckian view, some five years after his 
experiment, he accepted pangenesis with considerable 
modification, as “a supplementary and subordinate 
part of a complete theory of heredity, but by no 
means for the primary and more important part.” 
Nearly two decades later, Galton formed a theory of 
“particulate inheritance” derived from the “particles” 
in Darwin’s hypothesis of pangenesis. 

In addition to Darwin’s theory of pangenesis, 
physiological theories of inheritance had also been 
suggested by Hugo de Vries, Ernst Haeckel, Karl 
Nägeli, August Weismann, and Herbert Spencer. 
Their ideas produced theories of the physical mechan- 
ism of heredity — how the units controlling the devel- 
opment of the organism were produced, assembled 
in the germ cells and handed on to offspring. These 
theories, which emphasized rare or discontinuous 
variation, attempted to explain what had happened in 
the reproductive organs to produce that most familiar 
phenomenon that offspring resemble parents in many 
respects although not completely. Galton, who was 
interested in measuring the ordinary and gradual 
(or continuous) variation that underpinned Darwin’s 
theory of natural selection, added a mathematical 
component to theories of inheritance when he began 
to look for statistical laws of heredity. 

This work led to the development of Galton’s an- 
cestral and “alternative” theories of inheritance which, 
in turn, provided the catalyst for his work on correlation
and especially on regression: two of his most
important and innovative contributions to statistics. 
Galton’s theory of ancestral inheritance incorporated 
blending and non-blending inheritance. He referred to 
characters that did not blend (such as eye color) as 
“alternative inheritance.” Galton’s statistical approach 
to heredity allowed him to move away from the sterile 
approach of using developmental and embryological 
ideas of heredity. His use of statistics enabled him 
to place problems of heredity within a population 
and not just within individual acts of reproduction. 
When Galton decided to set aside the whole question 
of how reproduction and growth work at the physio- 
logical level, he was able to depart from the existing 
traditions in a way that Darwin would not. Evolution 
was uncoupled from the problem of generation, and in 
one stroke Galton undermined the whole complex 
range of ideas that had upheld the developmental 
world view. 

Galton used simple regression to help him explain 
what he perceived to be discontinuous variation (i.e., 
the skipping of generations or reverting to the ances- 
tral type). He defined regression as a phenomenon 
where the child inherits partly from his parents, partly 
from his ancestry; the further his genealogy went back, 
the more numerous and varied the ancestry was to
become, until they ceased to differ. Galton then 
thought that, statistically, the most probable value of 
the midancestral deviates in any remote generation was 
zero. If the complete ancestry were known, the whole 
ancestry could be replaced by a single midancestor. 
The midparent thus represented the parentage of the 
offspring. The statistician Karl Pearson (1857-1936) 
understood Galton’s idea of regression to indicate the 
result of the influence of parental heredity pulling 
the offspring towards the parental value and the med- 
iocrity of the more distant ancestry pulling towards its 
own character. 

When Galton wanted to measure the relationship 
between the stature of fathers and that of their sons, he used
graphical methods in an attempt to obtain a measure of 
simple correlation (termed by Pearson when measur- 
ing the relationship between two continuous variables 
only — for Galton this tended to be one characteristic 
usually in two generations). However, when Galton 
became interested in measuring the relationships 
between characters in more than two generations, he 
needed a different set of statistical procedures. In 1896 
and 1898 Pearson offered a partial resolution to Galton’s
concerns when he devised a set of mathematical- 
statistical procedures, known as multiple correlation. 
While simple correlation measures the linear relation- 
ship between two continuous variables (where one 
variable is termed the ‘independent variable’ and the 
other is the ‘dependent variable’), multiple correlation 
is a measure of the relationship of three or more con- 
tinuous variables (i.e., between one dependent vari- 
able and two or more independent variables combined 
with optimal weights). 

From Galton’s idea of regression, Pearson devel- 
oped a statistical system that is used for the linear 
prediction between two continuous variables. Though 
Pearson had shown that Galton’s correlation for- 
mula was a measure of regression instead, he retained 
Galton’s r to symbolize the correlation coefficient. 
The regression coefficient was symbolized by the let- 
ter b which designated a constant in the equation for a 
straight line Y = a + bX (where a = the intercept con- 
stant and b = the slope of the line). It then followed 
that Y’ = a + bX was the equation for the regression 
(or predicted) line. Pearson determined that the
regression coefficient is

$$b = \frac{\sum xy}{\sum x^2}$$

where x and y are the deviations of X and Y from
their means and x is the independent variable. The
constant $a = \bar{Y} - b\bar{X}$.
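The regression coefficient and intercept can be computed directly in the usual least-squares form, assumed here: b = Σxy/Σx², with x and y taken as deviations from their means, and a = Ȳ − bX̄. This is a minimal plain-Python sketch; the function name and the toy data are illustrative and are not Pearson's or Galton's.

```python
def regression_line(xs: list[float], ys: list[float]) -> tuple[float, float]:
    """Least-squares intercept a and slope b for the line Y' = a + bX."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length samples of size >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Deviations x = X - mean(X) and y = Y - mean(Y).
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    sxx = sum(x * x for x in dx)
    if sxx == 0:
        raise ValueError("X has no variation")
    b = sum(x * y for x, y in zip(dx, dy)) / sxx  # b = sum(xy) / sum(x^2)
    a = mean_y - b * mean_x                       # a = mean(Y) - b * mean(X)
    return a, b

# Toy data: Y rises two units for every unit of X and passes through the origin.
a, b = regression_line([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
print(a, b)
```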

A measure of the product-moment correlation
coefficient is:

$$r = \frac{\sum xy}{\sqrt{\sum x^2 \sum y^2}}$$

In these formulas, it may be seen that the covariance
term (i.e., the sum of the cross-products Σxy) is the
numerator for both regression and correlation;
hence, the numerator is the same for both.
However, if the standard deviations of x and y differ
(which Galton did not consider to be tenable), then 
the denominators for correlation and for regression 
will also differ; hence, Pearson showed that the correl- 
ation coefficient would not necessarily be identical to 
the regression coefficient as Galton had believed. The 
mathematical procedures Pearson used to define 
Galton’s law of ancestral heredity provided the basis 
for the development of multiple regression. Like simple 
regression, it involves a linear prediction, but rather
than using only one variable as the predictor, a
collection of predictor variables can be used instead.
Galton’s law of ancestral heredity was a biological 
hypothesis, Pearson termed his multiple correlation 
and multiple regression “Galton’s law of ancestral 
inheritance.” When Pearson went on to devise 
this multivariate system of correlation from
Galton’s law of ancestral inheritance, he also went on
to make one of the most seminal contributions to statistics
when he introduced matrix algebra into statistical the- 
ory (which was to become a sine qua non for multi- 
variate statistical theory). 
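Two points in this passage can be checked numerically. First, the slope b and the correlation r share the numerator Σxy but differ whenever the spreads of x and y differ. Second, multiple regression extends the same least-squares prediction to several predictors. The sketch below uses the textbook least-squares definitions. It is not Pearson's matrix formulation: the two-predictor case is solved with Cramer's rule for brevity. The names and toy data are hypothetical; in a heredity setting X1 and X2 might stand for midparent and mid-grandparent values.

```python
from math import sqrt

def deviations(values: list[float]) -> list[float]:
    """Deviations of each value from the sample mean."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]

def slope_and_r(xs: list[float], ys: list[float]) -> tuple[float, float]:
    """Return (b, r): same numerator sum(xy), different denominators."""
    dx, dy = deviations(xs), deviations(ys)
    sxy = sum(x * y for x, y in zip(dx, dy))
    sxx = sum(x * x for x in dx)
    syy = sum(y * y for y in dy)
    return sxy / sxx, sxy / sqrt(sxx * syy)

def two_predictor_regression(x1: list[float], x2: list[float],
                             ys: list[float]) -> tuple[float, float, float]:
    """Least-squares a, b1, b2 for Y' = a + b1*X1 + b2*X2."""
    d1, d2, dy = deviations(x1), deviations(x2), deviations(ys)
    s11 = sum(u * u for u in d1)
    s22 = sum(v * v for v in d2)
    s12 = sum(u * v for u, v in zip(d1, d2))
    s1y = sum(u * y for u, y in zip(d1, dy))
    s2y = sum(v * y for v, y in zip(d2, dy))
    det = s11 * s22 - s12 * s12
    if det == 0:
        raise ValueError("predictors are perfectly collinear")
    # Cramer's rule on the normal equations:
    #   s11*b1 + s12*b2 = s1y
    #   s12*b1 + s22*b2 = s2y
    b1 = (s1y * s22 - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    a = sum(ys) / len(ys) - b1 * sum(x1) / len(x1) - b2 * sum(x2) / len(x2)
    return a, b1, b2

# Equal spreads of X and Y: the slope equals the correlation.
print(slope_and_r([1, 2, 3], [1, 3, 2]))  # -> (0.5, 0.5)
# Doubling the spread of Y doubles b but leaves r unchanged.
print(slope_and_r([1, 2, 3], [2, 6, 4]))  # -> (1.0, 0.5)
# Toy data built as Y = 1 + 2*X1 + 3*X2, so the fit should recover 1, 2 and 3.
print(two_predictor_regression([0, 1, 0, 1, 2], [0, 0, 1, 1, 1], [1, 3, 4, 6, 8]))
```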
An anchor locus is a previously well-mapped locus
that is chosen as a marker to provide an “anchor” for
mapping studies with loci that have not been
previously mapped. With the use of anchor loci
distributed at 20 cM intervals throughout a genome, it is
typically possible to establish map locations for
newly characterized loci in the context of new
breeding studies.
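The adequacy of a 20 cM spacing can be made concrete. With anchors at that interval, a new locus lies at most 10 cM from the nearest anchor, which corresponds to a recombination fraction well below 0.5 and so to detectable linkage. The sketch assumes Haldane's mapping function, r = ½(1 − e^(−2d)) with d in morgans, which ignores crossover interference. It also assumes a hypothetical 200 cM chromosome with evenly spaced anchors; the function names are illustrative.

```python
import bisect
import math

def haldane_r(distance_cm: float) -> float:
    """Recombination fraction for a map distance, using Haldane's function (no interference)."""
    return 0.5 * (1.0 - math.exp(-2.0 * distance_cm / 100.0))

def nearest_anchor(anchors_cm: list[float], locus_cm: float) -> tuple[float, float]:
    """Return (anchor position, distance in cM) of the anchor closest to a locus.

    `anchors_cm` must be sorted in ascending order.
    """
    i = bisect.bisect_left(anchors_cm, locus_cm)
    candidates = anchors_cm[max(i - 1, 0): i + 1]  # anchors flanking the locus
    best = min(candidates, key=lambda a: abs(a - locus_cm))
    return best, abs(best - locus_cm)

# Hypothetical 200 cM chromosome with anchors every 20 cM; scan loci every 0.1 cM.
anchors = [float(p) for p in range(0, 201, 20)]
worst_case = max(nearest_anchor(anchors, x / 10)[1] for x in range(0, 2001))
print(worst_case, round(haldane_r(worst_case), 3))
```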


See also: Gene Mapping; Marker 
Adhesion to extracellular matrix is a requirement
for many cell types to proliferate and/or survive. The 
ability of a cell to proliferate in suspension, unattached 
to any matrix, is termed anchorage-independent 
growth and is a frequent characteristic of transformed 
cells, correlating with tumorigenic potential in vivo. 
Cell adhesion is mediated mainly through the interac- 
tion of integrin receptors with their extracellular 
matrix ligands. Clustering of integrins induces signal- 
ing events that cooperate with mitogen and survival 
factor signals to promote progress through the cell cycle 
and survival. Cells that are anchorage-independent no 
longer require signals via the integrins to proliferate 
and survive. 


See also: Cell Cycle 
Androgenetic embryos have paternal genomes only
(haploid or diploid). In natural androgenesis (which 
is rare) the maternal genome is inactivated or lost. 
Experimental androgenetic embryos can be made by 
irradiating or enucleating the oocyte, followed by 
fertilization (and potentially diploidization). In the 
mouse, androgenetic embryos can also be made by 
pronuclear transplantation, whereby after fertilization 
and pronucleus formation the female pronucleus is 
removed by microsurgery and replaced by a second 
male pronucleus. 

Androgenetic embryos are useful for genetic
mapping purposes or for the rapid recovery of mutations.
In mice androgenetic development only progresses 
halfway through gestation, to early postimplantation 
stages. These embryos have reasonably well developed 
extraembryonic tissues such as the trophoblast, but 
the embryo proper is retarded and often abnormal. 
This developmental failure is explained by the phe- 
nomenon of genomic imprinting whereby certain 
genes in eutherian mammals are expressed from 
only one of the parental chromosomes. Androgenetic 
embryos thus lack gene products that are
only made by the maternal genome, and overexpres- 
sion of gene products made by the paternal genome. 
In humans androgenetically developing embryos can 
lead to hydatidiform moles, in which there is prolif- 
eration of trophoblast tissue but fetal tissues are absent 
or abnormal. The existence in nature of androgenetic- 
ally reproducing species, or normal development fol- 
lowing experimental production of androgenones, 
indicates that genomic imprinting is largely absent in 
these species. 


See also: Hydatidiform Moles 
An aneuploid cell or organism has a number of
chromosomes that differs from the normal chromosome
number for the species by one or a few chromosomes.
If the normal haploid chromosome number is n, then
a somatic diploid cell has 2n chromosomes. An
aneuploid somatic cell with one extra chromosome has
2n+1 chromosomes and is said to be trisomic, since
three versions or homologs of one chromosome are
present. A double trisomic would be denoted 2n+1+1,
whereas a tetrasomic would be referred to as 2n+2. A
monosomic would have 2n−1 chromosomes.
Aneuploid organisms are generally less healthy 
than euploids, which have each chromosome repre- 
sented the same number of times (apart from the two 
different sex chromosomes in the heterogametic sex). 
The deleterious effect of aneuploidy is thought to be 
caused not by the altered dosage of a single gene, but 
by the cumulative effect of an imbalance in the gene 
activities of many genes on the extra or missing
chromosome. This explains why the severity of a trisomic
phenotype generally increases with the size of the 
trisomic chromosome (excluding sex chromosomes, 
the expression of which may be dosage compensated). 
It also helps explain why monosomy for a given
chromosome is generally more deleterious than tri- 
somy for the same chromosome, since the gene dosage 
is halved in the case of monosomy but increased to 
only 1.5 times the normal level in the case of trisomy. 
There is another contribution to the deleterious 
effect of monosomy, however: the singly represented 
chromosome may carry deleterious recessive genes 
that are unmasked by the absence of a homolog. 
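The notation and the dosage argument above reduce to simple arithmetic. In this sketch, assumed for illustration, a karyotype is described only by how many copies of each chromosome are present relative to the disomic (two-copy) state. The human haploid number n = 23 is used as an example, and the function names are hypothetical.

```python
def aneuploid_notation(copies: dict[str, int]) -> str:
    """Write a diploid karyotype in the 2n+1 / 2n-1 style used above.

    `copies` maps a chromosome name to its copy number; any chromosome not
    listed is assumed to be present twice (disomic).
    """
    terms = [f"{c - 2:+d}" for _, c in sorted(copies.items()) if c != 2]
    return "2n" + "".join(terms)

def chromosome_count(n: int, copies: dict[str, int]) -> int:
    """Total somatic chromosome number for haploid number n."""
    return 2 * n + sum(c - 2 for c in copies.values())

def relative_dosage(copies: int) -> float:
    """Dosage of the genes on one chromosome relative to the normal two copies."""
    return copies / 2

trisomy_21 = {"21": 3}
print(aneuploid_notation(trisomy_21), chromosome_count(23, trisomy_21))  # 2n+1 47
print(relative_dosage(1), relative_dosage(3))  # monosomy 0.5, trisomy 1.5
```

The last line restates the argument in the text: monosomy halves the dosage of every gene on the chromosome, a larger proportional change than the 1.5-fold increase in trisomy.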

Aneuploidy can be caused by a failure of homo- 
logous chromosomes to separate properly at meiosis 
or mitosis, a phenomenon called nondisjunction. 
Nondisjunction at either meiosis I or meiosis II can 
generate gametes with n+1 and n−1 chromosomes,
which upon fusion with normal gametes having 
n chromosomes will produce trisomic and monosomic 
progeny, respectively. Somatic aneuploids, in which 
some cells are euploid and others are aneuploid, can 
be generated by mitotic chromosome nondisjunction; 
the germline in a somatic aneuploid may also be aneu- 
ploid. A fertile aneuploid can transmit the aneuploid 
condition to its offspring. A fertile trisomic, for exam- 
ple, can produce gametes having both n and n+1 
chromosomes, and the latter will give rise to trisomic 
progeny. There is a tendency for one of the three 
chromosomes in a trisomic to be lost during meiosis, 
however, in which case the proportion of gametes that 
are aneuploid is less than half. In addition, trisomy
is often transmitted only rarely through pollen in
plants, because pollen with n+1 chromosomes is
outcompeted by normal haploid pollen.
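The expectation that a fertile trisomic produces n and n+1 gametes in equal numbers, and fewer than half n+1 gametes once chromosome loss is allowed for, can be checked by enumeration. This sketch assumes that the three homologs always segregate two to one pole and one to the other. It also assumes that in a fixed, hypothetical fraction of meioses the extra chromosome is lost, so that only n gametes result. It ignores the pollen competition mentioned above, and the names are illustrative.

```python
from itertools import combinations

def trisomic_gametes(homologs: tuple[str, ...] = ("A", "B", "C")) -> list[tuple[str, ...]]:
    """All gametes from 2:1 segregation of three homologs.

    Each way of sending two homologs to one pole and the third to the other
    yields one n+1 gamete (two copies) and one n gamete (one copy).
    """
    gametes = []
    for pair in combinations(homologs, 2):
        single = tuple(h for h in homologs if h not in pair)
        gametes.extend([pair, single])
    return gametes

def fraction_n_plus_1(loss_probability: float = 0.0) -> float:
    """Fraction of n+1 gametes when a (hypothetical) fraction of meioses
    loses the extra chromosome and therefore yields only n gametes."""
    if not 0.0 <= loss_probability <= 1.0:
        raise ValueError("loss_probability must be between 0 and 1")
    gametes = trisomic_gametes()
    share = sum(len(g) == 2 for g in gametes) / len(gametes)  # 1/2 without loss
    return (1.0 - loss_probability) * share

print(fraction_n_plus_1())     # no loss: half the gametes carry n+1
print(fraction_n_plus_1(0.3))  # with loss: fewer than half
```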

Many trisomic lines of plants have been derived 
experimentally from triploids, in which every chromo- 
some is represented three times. Triploids can be 
readily generated from crosses between diploids and 
tetraploids. 

In humans, monosomy for any one of the 22 
autosomes is lethal in utero. About 1 in 3000 human 
females born, however, has a single X and no Y chromo- 
some (denoted X0). This results in a characteristic 
phenotype, known as Turner syndrome, which in- 
cludes incomplete sexual development and sterility. 
The trisomic combination XXY occurs in about 1 in 
1000 male births and results in Klinefelter syndrome, 
which includes mental retardation and sterility. The 
most common viable autosomal aneuploid in humans, 
occurring in about 1 in 700 live births, is trisomy for 
chromosome 21. This results in Down syndrome, 
which includes mental retardation, reduced life 
expectancy, and a characteristic physical appearance — 
broad flat face, small folds of skin covering the inner 
corners of the eyes, short stature, and short hands. 
The only other human autosomal trisomics to survive
birth involve chromosomes 13 (Patau syndrome)
and 18 (Edwards syndrome). Both show severe
abnormalities and generally survive only a few weeks
or months.

Some allopolyploid plant species are viable as 
nullisomics, lacking both members of a pair of 
homologous chromosomes. Nullisomy can be toler- 
ated in these cases because additional pairs of partially 
homologous chromosomes can compensate for the 
missing pair. All of the possible 21 nullisomics of the 
allohexaploid wheat Triticum aestivum have been 
made, for example; they differ in appearance from 
normal hexaploids and are less vigorous. Plant genet- 
icists have used collections of nullisomic or mono- 
somic lines (a separate line affecting each chromosome 
in the haploid set) to identify quickly the chromosome 
to which new recessive mutations map. Complete sets 
of trisomics have been identified for various plants, 
including rice and the Jimson weed, with each trisomic 
exhibiting a characteristic phenotype. 

The term aneuploidy is sometimes extended to cases 
where a sizable chromosomal segment is duplicated or 
deficient. The term ‘segmental aneuploidy’ has been 
used to distinguish these special cases from the usual 
definition of aneuploidy, involving changes in chromo- 
some number. 


See also: Dosage Compensation; Meiosis; 
Nondisjunction; Polyploidy 
Angiogenesis is the process by which new capillary
blood vessels grow. Capillary blood vessels are thinner 
than a hair and supply virtually every tissue and organ 
in the body. A cubic millimeter of heart muscle con- 
tains approximately 2500 millimeters of capillary 
blood vessels. A pound of fat contains approximately 
one mile of capillary vessels. Because the oxygen dif- 
fusion limit in tissues is approximately 100 to 150 
microns, almost every cell in the body lives adjacent 
to a capillary blood vessel (Figure 1). Some types of 
cells are sandwiched between two capillary vessels: 
fat cells, skeletal muscle cells, and islet cells in the 
pancreas. Capillary blood vessels are generated by 
endothelial cells which then come to line these vessels 
facing the lumen. All of the endothelial cells in the 
vascular system exist as a single cell layer which would 
cover approximately 1000 square meters, an area the 
size of a tennis court. Within this vast expanse of 
endothelial cells, less than 0.1% are undergoing cell
division. The other 99.9% are quiescent. To appreciate
how rarely endothelial cells proliferate under normal
conditions, they can be compared to bone marrow
cells which are among the most rapidly proliferating
cells in the body. In the bone marrow there are
approximately 6 billion cell divisions per hour and
the entire cell population in the bone marrow is
replaced (turned over) in approximately 5 days. In
contrast, the turnover of normal endothelial cells is
measured in hundreds of days. However, during
angiogenesis endothelial cells have the potential to
divide as rapidly as bone marrow cells.

Figure 1 Diagram of three common configurations of
how cells are apposed to capillary blood vessels. (A)
Islet cells are sandwiched between two capillary vessels,
apical and basal. Fat cells and most skeletal muscle cells
are similar. (B) Hepatic cells live adjacent to one
capillary, and the next layer of hepatocytes are in
another capillary neighborhood. Kidney, lung, and other
organs are similar. (C) Epithelial cells in the epidermis,
gastrointestinal tract, and genitourinary tract are
separated from vessels beneath them in dermis or
submucosa, but are still within the 100-200 μm oxygen
diffusion limit. (D) In contrast, tumor cells pile up as
microcylinders around a capillary out to six or more
layers. The cells at the greatest distance from an open
capillary are severely hypoxic or anoxic. (From Folkman,
2000.)

Physiologic angiogenesis occurs in the female 
reproductive tract in the ovary and in the uterus, for 
a few days every month. Angiogenesis in a wound is 
similarly short-lived, usually lasting not more than 2 
weeks. In mice, 75% of the liver can be removed 
surgically and it will grow back by 9 days. The brisk 
angiogenesis which turns on the day after hepatec- 
tomy turns off approximately 7 to 8 days later. There- 
fore, one hallmark of physiologic angiogenesis is its 
brevity. A second hallmark is that many of the new 
capillary blood vessels will either regress or will go on 
to become “established” microvessels. These mature
microvessels contain quiescent endothelial cells which 
rest on an intact basement membrane consisting mainly 
of the proteins collagen and laminin and various 
carbohydrates complexed to proteins, such as heparan 
sulfate proteoglycans. Embedded in this basement 
membrane are smooth-muscle-like cells called peri- 
cytes. Thus, established microvessels have a slightly 
thicker wall than growing vessels. In growing micro- 
vessels the basement membrane is disrupted and peri- 
cytes are sparse or absent. 

Pathologic angiogenesis persists unabated for 
months or years. Endothelial cells continue to prolif- 
erate rapidly. The new blood vessels are thin-walled 
and pericyte-poor. They rarely regress spontaneously.
Pathologic angiogenesis supports the growth and pro- 
gression of solid tumors and leukemias. It provides a 
conduit for the entry of inflammatory cells into sites 
of chronic inflammation, such as in the intestine 
(Crohn’s disease) or in the bladder (chronic cystitis). 
Pathologic angiogenesis is the most common cause of 
blindness, destroys cartilage in rheumatoid arthritis, 
contributes to growth and hemorrhage of athero- 
sclerotic plaques, leads to intraperitoneal bleeding in 
endometriosis, is the basis of life-threatening heman- 
giomas of infancy, and permits prostate growth in 
benign prostatic hypertrophy — to name just a few 
“angiogenic disease processes” that are found in 
almost all specialties of medicine. 


The Beginnings of Angiogenesis Research


Tumor hyperemia that had been observed during sur- 
gery since the 1870s was for the next 100 years attri- 
buted to simple dilation of existing host vessels. Two 
reports, in 1939 and 1945, suggested that tumor vascu- 
larity was due to the induction of new blood vessels. 
This idea was dismissed by most investigators. The 
few who accepted it believed that new vessels were 
an inflammatory side-effect of tumor growth. In 1968 
a hypothesis was advanced that tumors secreted a 
diffusible angiogenic substance. 

In 1971 I proposed a hypothesis based on experi- 
ments carried out in the 1960s with Frederick Becker, 
that tumor growth is angiogenesis-dependent. The 
idea was that tumors could recruit their own private 
blood supply by releasing a diffusible chemical signal 
which stimulated angiogenesis. Tumor angiogenesis 
could then be a novel second target for anticancer 
therapy. These concepts were not accepted at the time. 
The conventional wisdom was that tumor neovascularization, if it existed, was either (1) an inflammatory host response to necrotic tumor cells or (2) a host response detrimental to the tumor. A few scientists thought that the blood vessels in a tumor were “established” and could not undergo regression. From these assumptions most scientists concluded that it was fruitless to attempt to discover an angiogenesis stimulator, to say nothing of discovering angiogenesis inhibitors. Acceptance of the 1971 hypothesis was slow because it would be 2 more years before the first vascular endothelial cells (from human umbilical veins) were successfully cultured in vitro, 8 more years before capillary endothelial cells could be cultured in vitro, 11 years before the discovery of the first angiogenesis inhibitor, and 13 years before the purification of the first angiogenic protein. By the mid-1980s a series of reports from our laboratory and from other laboratories had provided indirect and direct evidence that tumor growth was angiogenesis-dependent. The hypothesis has since been confirmed by genetic methods, has been widely accepted, and continues to be a fruitful basis for laboratory and clinical research. Currently, hundreds of laboratories worldwide pursue angiogenesis research or some related area of vascular biology. In fact, the modern field of vascular biology began only after the first successful long-term culture of vascular endothelial cells in the early 1970s.

Table 1 Endogenous stimulators of angiogenesis

Protein                                     Molecular weight (kDa)  Year reported
Basic fibroblast growth factor (FGF-2)      18                      1984
Acidic fibroblast growth factor (FGF-1)     16.4                    1984
Angiogenin                                  14.1                    1985
Transforming growth factor alpha            5.5                     1986
Transforming growth factor beta             25                      1986
Tumor necrosis factor alpha                 17                      1987
Vascular endothelial growth factor:
  VPF                                       40–45                   1983
  VEGF                                      –                       1989
Platelet-derived endothelial growth factor  45                      1989
Granulocyte colony-stimulating factor       17                      1991
Placental growth factor                     25                      1991
Interleukin-8                               40                      1992
Hepatocyte growth factor                    92                      1993
Proliferin                                  35                      1994
Angiopoietin-1                              70                      1996
Leptin                                      16                      1998


Tumor Angiogenesis 


How Tumors Become Angiogenic 

Most human tumors arise as a microscopic colony of cells. The colony usually stops expanding when it reaches a population of approximately 1 million cells and a diameter of 0.2 to 2 millimeters. This is called in situ cancer. In its early stages an in situ cancer is usually not angiogenic and cannot recruit new microvessels. Therefore, it must live close to established neighboring microvessels (co-option) to receive available oxygen and nutrients and to have its catabolites cleared. Tumor cell proliferation under these stringent conditions is balanced by a high rate of cell death (apoptosis), which restricts growth of the whole population. These tumors may remain undetectable for many years, but they can be found at autopsy in people who died of trauma and were never diagnosed with cancer during their lifetime. Of women aged 40 to 50 years, 39% have in situ carcinomas in their breasts, but breast cancer is diagnosed in only 1% of women in this age range. In men aged 50 to 70, 46% have in situ prostate cancers at the time of death, but only 1% are diagnosed with prostate cancer in this age range during life. In people aged 50 to 70, more than 98% have small carcinomas of the thyroid, but thyroid cancer is diagnosed in only 0.1% of people in this age range.
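The autopsy figures above can be turned into a rough ratio of occult in situ cancers to clinically diagnosed cancers. The sketch below simply divides the two percentages quoted in the text; treating an autopsy prevalence and a diagnosis rate during life as directly comparable is a simplifying assumption, so the results are orders of magnitude, not precise estimates.

```python
# Percentages quoted in the text: (found at autopsy, diagnosed during life).
PREVALENCE = {
    "breast, women aged 40-50": (39.0, 1.0),
    "prostate, men aged 50-70": (46.0, 1.0),
    "thyroid, people aged 50-70": (98.0, 0.1),  # text says "more than 98%"
}


def occult_per_diagnosed(autopsy_pct: float, diagnosed_pct: float) -> float:
    """Rough number of in situ cancers present for every cancer diagnosed."""
    if diagnosed_pct <= 0:
        raise ValueError("diagnosed percentage must be positive")
    return autopsy_pct / diagnosed_pct


if __name__ == "__main__":
    for site, (autopsy, diagnosed) in PREVALENCE.items():
        ratio = occult_per_diagnosed(autopsy, diagnosed)
        print(f"{site}: about {ratio:.0f} in situ cancers per diagnosed cancer")
```

On these numbers the ratios are about 39, 46, and 980 respectively (the thyroid value is a lower bound, since the text gives "more than 98%").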


The Angiogenic Switch 

Between 1 in 1000 and 1 in 100 in situ cancers may switch to the angiogenic phenotype and begin to recruit their own private blood supply. In experimental studies the switch itself is triggered by increased expression of one or more angiogenic proteins by a subset of tumor cells (Table 1). Vascular endothelial growth factor (VEGF) is the angiogenic protein most commonly overexpressed by different tumors. For example, approximately 60% of human breast cancers overexpress VEGF. Tumors can also overexpress basic fibroblast growth factor (bFGF), platelet-derived growth factor (PDGF), interleukin-8 (IL-8), hepatocyte growth factor, angiogenin, tumor necrosis factor alpha (TNF-alpha), and other angiogenic proteins.
In some tumors an oncogene is responsible for 
overexpression of a positive regulator of angiogenesis. 
For example, the ras oncogene upregulates expression of VEGF. In other tumor types, concomitant downregulation of an endogenous negative regulator of angiogenesis is also required for the angiogenic switch. When the tumor suppressor gene p53 is mutated or deleted, for example, thrombospondin, an endogenous angiogenesis inhibitor, is downregulated. At the time of writing, it is being discovered that other tumor-suppressor genes control negative regulators of angiogenesis. The tumor cell is not the sole regulator of the angiogenic switch. Tumor angiogenesis can be potentiated by hypoxia. Tumor cells located more than 100–150 µm away from a blood vessel become hypoxic. Hypoxia induces hypoxia-inducible factor 1 (HIF-1), which binds to a HIF-1-binding sequence in the VEGF promoter. This leads to transcription of VEGF mRNA, increased stability of VEGF message, and increased production of VEGF protein beyond what may have initially been triggered by an oncogene. HIF-1 also increases the transcription of genes for PDGF-BB and nitric oxide synthase (NOS). Mast cells attracted to
the tumor bed can potentiate angiogenesis by releasing 
enzymes (metalloproteinases) which mobilize VEGF 
(and/or bFGF) from their heparan sulfate proteogly- 
can storage sites in extracellular matrix. Tumor angio- 
genesis can also be potentiated by angiogenic proteins 
released by fibroblasts in the tumor bed or by macro- 
phages attracted to the tumor bed. Tumor angiogene- 
sis may also be modified by endogenous angiogenesis 
inhibitors which either circulate (e.g., interferon- 
beta, platelet factor 4, angiostatin, and others), or are 
releasable from extracellular matrix (e.g., endostatin, 
thrombospondin-1, and tissue inhibitors of metalloproteinases (TIMPs)). The intensity of tumor angiogenesis is also governed by the angiogenic response of
the host, which is genetically regulated. It is known 
that hemangiomas predominate in white infants and 
that ocular neovascularization in macular degener- 
ation is common in whites, but rare in black patients. 
These clinical observations led to a recent experimen- 
tal finding that a constant dose of angiogenic stimula- 
tion by FGF in mouse corneas yielded a 10-fold 
difference in angiogenic response in mice of different 
genetic backgrounds. 

Other molecules that help to regulate neovascularization include ephrins, which specify arterial or venous development of capillary vessels, and cyclooxygenase-2, the production of which is stimulated by bFGF and which converts lipid precursors to prostaglandin E2 (PGE2), an angiogenic stimulator.
Neovascularization

New microvessels are understood to grow by at 
least three mechanisms: (1) new sprouts bud from 
preexisting vessels; (2) circulating endothelial progeni- 
tor cells from the bone marrow can accumulate in the 
tumor bed and participate in new vessel formation; (3) 
in some situations, endothelial cells in preexisting ves- 
sels bridge the lumen to form new vessels by intussus- 
ception. All three mechanisms depend upon loosening 
of preexisting endothelial cells from: (1) their junc- 
tions with each other which are maintained by pro- 
teins such as VE-cadherin and platelet-endothelial cell 
adhesion molecule (PECAM); (2) their junctions with 
contiguous pericytes, which are increased by angio- 
poietin-1 and decreased by angiopoietin-2; and (3) 
their attachment to underlying basement membrane proteins, which is governed by integrins such as αvβ3 and by a variety of local proteinases and their local inhibitors. An example of such a proteinase–inhibitor pair is urokinase plasminogen activator (uPA), which is produced by growing capillaries, and its inhibitor, plasminogen activator inhibitor-1 (PAI-1). The endothelial cell loosening process may be aided by early dilation of microvessels, which occurs prior to sprout formation and is partly mediated by NOS.


Tumor Growth and Metastasis after 
Neovascularization 
As new microvessels converge on a microscopic in situ 
tumor, tumor cells grow around each vessel to form 
perivascular cuffs (Figure 2). Each endothelial cell 
can support up to 100 tumor cells. As this process con- 
tinues, more new microvessels are recruited and the 
tumor begins to expand. By the time it reaches a size of approximately 0.5 cm³, it causes symptoms or is detectable by a variety of imaging techniques (X-rays, ultrasound, or magnetic resonance imaging), and it may contain half a billion cells. New microvessels leak plasma, which oozes toward the tumor surface and is carried away by host lymphatics. Because most tumors are themselves deficient in lymphatics, tissue pressure increases in the tumor and some microvessels are compressed or closed. This creates additional areas of hypoxia, which further increase the production of angiogenic factors. Tumors, therefore, do not “outgrow their blood supply,” but compress it.
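The sizes quoted above can be cross-checked with simple arithmetic. The sketch assumes a packing density of about 10^9 cells per cm³, a common rule of thumb that is not stated in the text but is consistent with its figure of half a billion cells in 0.5 cm³. It also uses the statement that each endothelial cell can support up to 100 tumor cells; the function names are illustrative only.

```python
import math

CELLS_PER_CM3 = 1e9  # assumed packing density (rule of thumb, not from the text)


def cells_in_volume(volume_cm3: float, density: float = CELLS_PER_CM3) -> float:
    """Approximate number of tumor cells in a given volume."""
    return volume_cm3 * density


def sphere_diameter_mm(n_cells: float, density: float = CELLS_PER_CM3) -> float:
    """Diameter (mm) of a sphere holding n_cells at the assumed density."""
    volume_mm3 = (n_cells / density) * 1000.0  # 1 cm^3 = 1000 mm^3
    return (6.0 * volume_mm3 / math.pi) ** (1.0 / 3.0)


def min_endothelial_cells(n_tumor_cells: float, tumor_cells_per_ec: int = 100) -> int:
    """Fewest endothelial cells needed if each supports at most 100 tumor cells."""
    return math.ceil(n_tumor_cells / tumor_cells_per_ec)


if __name__ == "__main__":
    print(f"0.5 cm^3 tumor: {cells_in_volume(0.5):.1e} cells")
    print(f"1 million cells: sphere about {sphere_diameter_mm(1e6):.2f} mm across")
    print(f"endothelial cells needed for 5e8 tumor cells: {min_endothelial_cells(5e8)}")
```

Under this assumption a colony of 1 million cells is a sphere roughly 1.2 mm across, inside the 0.2 to 2 mm range given for in situ cancers.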
Figure 2 (A) Human breast cancer. Large breast duct partially lined by duct carcinoma in situ (arrow) and intense angiogenesis in the immediately adjacent periductal breast stroma. Brown-staining microvessels (antibody to von Willebrand factor) are indicated by arrowhead. Note the absence of angiogenesis in the areas of breast stroma adjacent to portions of duct lined with benign duct epithelium. (From Weidner et al., with permission of the publisher.) (B) Large breast duct filled with carcinoma in situ and surrounded by new microvessels in the periductal breast stroma. Arrowhead shows invasion of microvessels through basement membrane of the duct, accompanied by invasion of tumor cells into the periductal stroma. (C) Invasive duct carcinoma, area of highest density of microvessels. (D) Higher-power micrograph of invasive human breast carcinoma, 4-µm-thick section. Microvessels (brown, stained with antibody to CD31) are indicated by arrowheads. (E) 50-µm-thick confocal microscopy section showing microvessels in three dimensions (arrowheads) surrounded by tumor cells, which fill the intercapillary space. (F, G, and H) Cross-sections of breast cancer in mice showing the microcylinders of tumor cells that surround each microvessel. The 100-µm thickness of these tumor microcylinders is within the range of the oxygen diffusion limit. (F) Scanning electron microscope view of a 100-µm Vibratome section of a subcutaneous MCa-IV mouse breast tumor with the skin at the top. Blood vessels appear as black holes emptied of blood and preserved in an open state by vascular perfusion of fixative. Pale necrotic regions surround perivascular rings of tumor tissue that are approximately 100 µm thick. Magnification = 25×. (G, H) Large and small thin-walled blood vessels in MCa-IV mouse breast tumor labeled by vascular perfusion of green (FITC) fluorescent lectin (G, green) and CD31 immunoreactivity viewed by Cy3 fluorescence (H, gold). Like the lectin, CD31 immunoreactivity defines the luminal surface of the vessels but, unlike the lectin, it also labels tiny sprouts, which have CD31 immunoreactivity but no lectin staining, indicating that they have no apparent lumen. Sprouts (white arrowheads) about 1 µm in diameter radiate from the vessel lining into the 100-µm-thick perivascular ring of tumor tissue (outlined by white dots). Vessels preserved in open state by vascular perfusion of fixative. (From Folkman, 2000; panels F, G, and H courtesy of Donald M. McDonald, University of California, San Francisco.)

Tumor cells begin to squeeze between endothelial cells, enter the vessel lumen, and circulate to remote sites to form metastases. Recent experimental studies with human colon cancer implanted into animals reveal that at any given time, tumor cells are entering the lumen of approximately 15% of vessels in a tumor; they take about 24 h to completely traverse the vessel wall and, during this time, share about 4% of the total vessel wall with endothelial cells, producing a “mosaic” vessel. Approximately 1 million tumor cells per gram of tumor enter the circulation per day. Tumors contain angiogenic and nonangiogenic subpopulations of tumor cells. The cells that are already angiogenic when they begin to grow in a remote organ can continue to grow rapidly and will become detectable metastases soon after exiting from the bloodstream. In contrast, nonangiogenic tumor cells may grow to a microscopic-sized tumor and then lie dormant and undetected for many years. Whenever they become angiogenic, they will grow to sizes that are detectable and symptomatic.
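The shedding figure of about 1 million cells per gram per day scales linearly with tumor mass and time. The sketch below applies it directly; taking a tumor density of about 1 g/cm³ (so that the 0.5 cm³ tumor mentioned earlier weighs roughly 0.5 g) is an assumption, not a figure from the text.

```python
SHED_CELLS_PER_GRAM_PER_DAY = 1e6  # figure quoted in the text


def cells_shed(tumor_mass_g: float, days: float = 1.0,
               rate: float = SHED_CELLS_PER_GRAM_PER_DAY) -> float:
    """Expected number of tumor cells entering the circulation over `days`."""
    return tumor_mass_g * rate * days


if __name__ == "__main__":
    mass_g = 0.5  # assumes ~1 g/cm^3 for a 0.5 cm^3 tumor
    print(f"per day: {cells_shed(mass_g):.1e} cells")
    print(f"per 30 days: {cells_shed(mass_g, days=30):.1e} cells")
```

Even a tumor at the threshold of detection would release on the order of 10^7 cells a month on this estimate, which is why the angiogenic capacity of the shed cells, rather than their number alone, determines whether metastases become detectable.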


Angiogenesis Inhibitors 


Endogenous Angiogenesis Inhibitors 
Certain endogenous inhibitors of angiogenesis are known to play a role in the angiogenic switch. For more recently discovered endogenous inhibitors, the function is less clear (Table 2). The first endogenous
angiogenesis inhibitors were discovered in the 1980s: 
interferon alpha/beta, platelet factor 4, and the class 
of angiostatic steroids typified by tetrahydrocortisol. 
However, thrombospondin-1 was the first endogen- 
ous inhibitor for which there was compelling evidence 
of participation in the angiogenic switch, because it 
was downregulated in tumors before angiogenesis 
could be triggered. By 1989 the switch itself was 
understood as the result of a shift in the “net balance” 
of angiogenesis stimulators and inhibitors. This led to 
the discovery of angiostatin and endostatin, two very 
potent and specific endogenous angiogenesis inhib- 
itors. Angiostatin is a 38-kDa internal fragment of 
plasminogen. Endostatin is a 20-kDa internal fragment 
of collagen XVIII. The discovery of these two proteins came from a clinical clue. Surgical removal of a primary tumor sometimes results in rapid growth of
remote metastases. This phenomenon had been 
observed in both animals and humans for more than 
50 years, but had always been difficult to explain. 
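The "net balance" view of the switch can be illustrated with a deliberately simplified model. In the sketch below each regulator is reduced to a single number in arbitrary units, and the switch is "on" when the summed stimulators exceed the summed inhibitors. The additive rule, the `Regulator` class, and every level used are hypothetical; the text describes the balance only qualitatively.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Regulator:
    name: str
    level: float  # hypothetical activity in arbitrary units


def is_angiogenic(stimulators, inhibitors) -> bool:
    """Toy net-balance rule: switch is on when stimulation exceeds inhibition."""
    return sum(r.level for r in stimulators) > sum(r.level for r in inhibitors)


if __name__ == "__main__":
    vegf = Regulator("VEGF", 1.0)
    tsp1 = Regulator("thrombospondin-1", 1.5)
    # Inhibition dominates: the in situ tumor stays nonangiogenic.
    print(is_angiogenic([vegf], [tsp1]))
    # Loss of p53 lowers thrombospondin-1 (invented level).
    print(is_angiogenic([vegf], [replace(tsp1, level=0.5)]))
    # Alternatively, an oncogene such as ras raises VEGF (invented level).
    print(is_angiogenic([replace(vegf, level=2.0)], [tsp1]))
```

The point of the toy is that the same switch can be thrown from either side, by raising a stimulator or by lowering an inhibitor, as the text describes for ras and p53.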


Once it was recognized that tumors can produce both angiogenesis stimulators and inhibitors, it was found that the stimulators are produced in excess of the inhibitors within the primary tumor itself, but that in the circulation the level of inhibitor exceeds that of stimulator, because the stimulator is cleared from the circulation more rapidly than the inhibitor. Therefore, in the presence of a primary tumor, secondary tumors (remote metastases) are exposed to high levels of circulating angiogenesis inhibitors. Removal of the primary tumor can result in a decrease of the circulating angiogenesis inhibitor. This permits the metastases to become neovascularized if they have angiogenic capacity. This phenomenon occurs in about 3% of human tumors. (The primary tumor must be removed if possible, to prevent further shedding of tumor cells, but the metastases must then be treated.) Another endogenous angiogenesis inhibitor was discovered in the circulation of tumor-bearing animals: a 53-kDa cleavage product of antithrombin III (antiangiogenic AT III). Angiostatin, endostatin, and antiangiogenic antithrombin III specifically inhibit the proliferation of endothelial cells but not of other cell types. They have no effect on tumor cells per se. Endostatin inhibits tumor angiogenesis, but not wound angiogenesis. It is present in the Caenorhabditis elegans worm as a product of collagen XVIII and so may be at least 600 million years old on an evolutionary time-scale. Recent reports of other endogenous inhibitors include pigment epithelium-derived factor (PEDF), a 50-kDa serpin, and lactoferrin, an 80-kDa heparin- and iron-binding glycoprotein.

Table 2 Endogenous inhibitors of angiogenesis

Name                                      Molecular weight (kDa)  Year reported  Reference
Platelet factor 4                         –                       1982           Nature 297: 307
Interferon alpha                          –                       1980           Science 208: 516
Prolactin fragment                        16                      1993           Endocrinology 133: 1292
Angiostatin                               38                      1994           Cell 79: 315
Endostatin                                20                      1997           Cell 88: 277
Antithrombin III                          53                      1999           Science 285: 1926
Interleukin-12                            –                       1995           Journal of the National Cancer Institute 87: 581
Inducible protein 10                      –                       1995           Journal of Experimental Medicine 182: 155
Vasostatin                                21                      1998           Journal of Experimental Medicine 188: 2349
Canstatin                                 24                      2000           Journal of Biological Chemistry 275: 1209
Restin                                    22                      1999           Biochemical and Biophysical Research Communications 255: 735
Troponin I                                22                      1999           Proceedings of the National Academy of Sciences, USA 96: 2645
Pigment epithelium-derived factor (PEDF)  50                      1999           Science 285: 245
2-methoxyestradiol                        –                       1994           Proceedings of the National Academy of Sciences, USA 91: 3964
PEX                                       26                      1998           Cell 92: 391
Id1 and Id3                               –                       1999           Nature 401: 670
VEGI                                      –                       1999           FASEB Journal 13: 181
Proliferin-related protein (PRP)          –                       1994           Science 266: 1581
Meth-1 and Meth-2                         110, 98                 1999           Journal of Biological Chemistry 274: 13349
Osteopontin cleaved product               –                       1999           Trends in Biochemical Science 7: 182
Maspin                                    –                       2000           Nature Medicine 6: 196

Data from Folkman (2000) and in part from Carmeliet and Jain (2000).
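The clearance argument above (stimulator made in excess locally, yet outnumbered by inhibitor in the blood) follows from first-order elimination, where the steady-state level equals production rate divided by the elimination constant. The sketch below assumes a single well-mixed circulating compartment with first-order clearance; the production rates and half-lives are invented for illustration and are not measured values for any real stimulator or inhibitor.

```python
import math


def elimination_rate(half_life_h: float) -> float:
    """First-order elimination constant k (per hour) from a half-life."""
    return math.log(2) / half_life_h


def steady_state_level(production_per_h: float, half_life_h: float) -> float:
    """Level at which input equals first-order clearance: dC/dt = P - k*C = 0."""
    return production_per_h / elimination_rate(half_life_h)


def level_after_removal(start_level: float, half_life_h: float, hours: float) -> float:
    """Exponential decay once the source (the primary tumor) is removed."""
    return start_level * math.exp(-elimination_rate(half_life_h) * hours)


if __name__ == "__main__":
    # Hypothetical: the tumor makes 5x more stimulator than inhibitor,
    # but the stimulator is cleared far faster than the inhibitor.
    stim = steady_state_level(production_per_h=10.0, half_life_h=0.5)
    inhib = steady_state_level(production_per_h=2.0, half_life_h=24.0)
    print(f"circulating stimulator {stim:.1f}, inhibitor {inhib:.1f}")
    for hours in (0, 24, 48, 72):
        print(f"{hours} h after removal: inhibitor {level_after_removal(inhib, 24.0, hours):.1f}")
```

With these made-up numbers the circulating inhibitor ends up roughly ten times the stimulator, and after the primary tumor is removed it falls by half each half-life. Metastases that make their own stimulator locally are then no longer held in check.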


Exogenous Angiogenesis Inhibitors 

A variety of angiogenesis inhibitors have been made synthetically or by recombinant technology to interfere with a specific function in the angiogenic pathway. Certain of these inhibitors neutralize
angiogenic proteins (e.g., VEGF). Others block the 
receptors for these proteins, while other inhibitors 
block enzymes which are necessary for endothelial 
cells to migrate through extracellular matrix during 
the formation of capillary tubes (for a full discussion 
of these inhibitors see Folkman, 2000). 


Angiogenesis Inhibitors in Clinical Trials 
Therapeutic administration of an angiogenesis inhibitor can tip the balance of the angiogenic switch so that the angiogenic output of a tumor is opposed or abrogated completely. At the time of writing, 20 angiogenesis inhibitors are in clinical trials for patients with
cancer in multiple medical centers in the USA. Seven 
of these have reached Phase III. Each phase of a clin- 
ical trial usually takes at least 2 years. At least five 
angiogenesis inhibitors are in clinical trial in the USA 
for the treatment of eye diseases such as diabetic 
retinopathy and macular degeneration, which are 
dominated by pathological neovascularization. Two 
inhibitors are in Phase III. Antiangiogenic gene 
therapy using genes which code for endogenous anti- 
angiogenic proteins has been reported in a variety of 
tumor-bearing animal systems. While no clinical 
translation of this technology has been initiated, anti- 
angiogenic gene therapy holds great promise for the 
future, especially because certain endogenous angiogenesis inhibitors, such as angiostatin and endostatin, are not toxic and do not induce drug resistance.


Angiogenesis in Other Diseases 


Many nonneoplastic diseases are also angiogenesis- 
dependent. They are common to many different speci- 
alties in medicine and are potentially treatable by 
angiogenesis inhibitors. A few examples are as fol- 
lows. 

In ophthalmology, pathological angiogenesis is the most common cause of blindness worldwide. Pathological neovascularization can occur in each compartment of the eye, accounting for at least 70 different diseases of ocular neovascularization. An
example is age-related macular degeneration in 
which angiogenesis occurs in the choroid. In the 
severe form of the disease, microhemorrhages from 
these new vessels lead to blindness. Approximately 
1.7 million people in the USA suffer from the severe 
form which is the leading cause of blindness at age 64 
and over. Laser therapy is less effective in this disease than in diabetic retinopathy. In the past 5 years it has been discovered
that the angiogenic protein VEGF is markedly ele- 
vated in macular degeneration and may be a major 
mediator of this disease. 

In dermatology, certain skin diseases are angiogenesis-dependent, such as cutaneous Kaposi sarcoma and infantile hemangioma, which are already being treated by antiangiogenic therapy in clinical trials. In the future, other skin diseases such as psoriasis, warts (verruca vulgaris), and neurofibromatosis may benefit from antiangiogenic therapy.

In rheumatology, rheumatoid arthritis may in part be an angiogenesis-dependent disease. The role of angiogenesis in rheumatoid arthritis, and in other forms of arthritis as well, can be most simply conceptualized as two phases, prevascular and vascular. The prevascular phase is analogous to an acute inflammatory state in which the synovium is invaded by inflammatory and immune cells, with macrophages, mast cells, and T cells predominating, among others. These cells may be the source of the angiogenic stimulators found in synovial fluid, which include VEGF, bFGF, interleukin-8, and hepatocyte growth factor. Activated endothelial cells can also release hepatocyte growth factor. The growth of new blood vessels from the synovium, which lines the joint, begins the vascular phase of arthritis. The new vessels (called a neovascular pannus) can invade and destroy cartilage, a process that is enhanced by the generation of enzymatic activity, mainly metalloproteinases, at the advancing front of new proliferating endothelium. This neovascular pannus overcomes endogenous angiogenesis inhibitors in the cartilage, which normally protect it from vascular invasion and maintain its avascularity. These inhibitors include, among others, tissue inhibitors of metalloproteinases (TIMPs 1, 2, 3, and 4, ranging from 21 to 29 kDa), thrombospondin-1, and troponin I.

In gynecology, endometriosis is illustrative of several diseases that are angiogenesis-dependent. In endometriosis, endometrial tissue from the lining of the uterus becomes implanted outside the uterine cavity, on the ovaries or on the peritoneal lining of the abdominal cavity. Cyclic production of female hormones drives angiogenesis in these ectopic implants. Bleeding in the abdomen or pelvis can occur on a monthly basis. At least one angiogenic protein, VEGF, is known to mediate the neovascularization in these lesions. VEGF is upregulated by increased estrogen and downregulated by withdrawal of estrogen. Approximately 780,000 women in the USA suffer from endometriosis. No clinical trials of angiogenesis inhibitors for endometriosis have been initiated at the time of writing, but these inhibitors may be beneficial in this disease.

In cardiology, atherosclerotic plaques in the coronary arteries have been shown to be intensely neovascularized. In experimental studies, mice deficient in
apolipoprotein E and fed a Western diet develop large 
atherosclerotic plaques in the aorta. These plaques are 
also highly neovascularized and plaque growth is 
significantly inhibited by antiangiogenic therapy of 
the mice. Thus, growth of atherosclerotic plaques 
may be angiogenesis-dependent. In certain cases of ischemic heart disease, attempts are being made to
increase angiogenesis in the myocardium by implant- 
ing genes for angiogenic proteins. Local therapeutic 
angiogenesis has been tested in animals and is being 
tested in a very few early clinical trials in humans. It 
remains to be seen if local stimulation of angiogenesis 
in specific cases of ischemia which is refractory to 
conventional therapy will be beneficial. 


Summary 


From the field of angiogenesis research has emerged an 
increasing understanding of the regulation of growth 
and regression of blood vessels. The molecules which 
mediate this regulation and the complexity of their 
interactions are gradually being elucidated. In this 
sense, the process of angiogenesis is not unlike the 
process of coagulation. At least 40 proteins are known 
to participate in the prevention or initiation of blood 
clotting. It appears that even more proteins are neces- 
sary to prevent angiogenesis under normal conditions, 
or to initiate it during brief periods of reproduction 
and wound repair. Pathological angiogenesis, espe- 
cially in cancer, seems to be a perturbation of these 
physiological mechanisms for initiating angiogenesis 
and turning it off. The microvascular endothelial cell 
may become a second target in cancer therapy in 
addition to the cancer cell. Angiogenesis-dependent 
diseases illustrate an important direction for the 
future. Oncologists, dermatologists, ophthalmolo- 
gists, rheumatologists, gynecologists, and cardiolo- 
gists are dealing with diseases which appear to be 
different from each other. Nevertheless, this family 
of angiogenesis-dependent diseases is driven by a 
small but similar set of molecules, which are regulated 
differently in each disease. Furthermore, a new class 
of drugs, the angiogenesis inhibitors, is becoming 
available which should permit improvements in therapy for many of these diseases. Thus, angiogenesis is a
unifying process that has heuristic value across many 
medical specialities. 
Nucleic acids adopt double-stranded structures by the antiparallel association of complementary strands, held together by Watson–Crick base pairing of adenine with thymine (or uracil) and guanine with cytosine. A number of mismatches can be tolerated, generally forming non-Watson–Crick base pairs of lower stability, exemplified by the wobble G·U pair. Hybridization is the process by which complementary strands are annealed, and is the opposite of the melting process by which a double-stranded nucleic acid becomes dissociated into its component strands. At equilibrium these processes should be the exact reverse of one another. They are often called helix–coil transitions.

Annealing and melting transitions are very 
temperature-dependent. The double-helical structure 
is favored at low temperature and higher salt concen- 
trations. For a given set of conditions (notably buffer 
composition), the temperature at which the nucleic 
acid is 50% annealed is referred to as the melting 
temperature (Tm). Short duplexes tend to melt in a cooperative, all-or-none process over a narrow temperature interval, from which thermodynamic properties may be obtained using a simple two-state model. Longer nucleic acids generally melt in a series of steps, with complex and broad transitions, and require treatment using more complex statistical mechanical models. These processes can be kinetically quite slow.
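The two-state model mentioned above can be made concrete for a non-self-complementary duplex A + B ⇌ AB formed from equal concentrations of each strand. Assuming temperature-independent ΔH° and ΔS°, K = exp(−(ΔH° − TΔS°)/RT), and the temperature at which half the strands are paired is Tm = ΔH°/(ΔS° + R ln(CT/4)), where CT is the total strand concentration. The ΔH° and ΔS° values in the example are illustrative, not measurements for any particular sequence.

```python
import math

R = 1.987  # gas constant, cal/(mol*K)


def fraction_paired(temp_k: float, dh: float, ds: float, c_total: float) -> float:
    """Two-state fraction of strands in duplex for A + B <=> AB.

    dh in cal/mol, ds in cal/(mol*K), c_total = [A]0 + [B]0 in mol/L,
    with [A]0 = [B]0 = c_total / 2.
    """
    k = math.exp(-(dh - temp_k * ds) / (R * temp_k))
    a = k * c_total / 2.0
    # Root of a*f**2 - (2a + 1)*f + a = 0 that lies in [0, 1], written in a
    # form that avoids cancellation when a is small (high temperature).
    return 2.0 * a / ((2.0 * a + 1.0) + math.sqrt(4.0 * a + 1.0))


def melting_temperature(dh: float, ds: float, c_total: float) -> float:
    """Temperature (K) at which half of the strands are paired."""
    return dh / (ds + R * math.log(c_total / 4.0))


if __name__ == "__main__":
    dh, ds, ct = -80000.0, -220.0, 1e-6  # illustrative values
    tm = melting_temperature(dh, ds, ct)
    print(f"Tm = {tm - 273.15:.1f} C")
    for offset in (-10, -5, 0, 5, 10):
        f = fraction_paired(tm + offset, dh, ds, ct)
        print(f"{tm + offset - 273.15:5.1f} C  fraction paired = {f:.3f}")
```

The steep sigmoid between about Tm − 10 and Tm + 10 K is the cooperative, all-or-none behavior described for short duplexes; the model does not describe the multistep melting of long nucleic acids, and it says nothing about the kinetics.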

In the laboratory, hybridization reactions are carried out by incubating the nucleic acids at a temperature close to the Tm for long periods of time, followed by a slow reduction of temperature. It is important that the process is done as close to equilibrium as possible, to avoid kinetically trapping incompletely and/or incorrectly annealed species.


See also: DNA Hybridization 


Anonymous Locus 
L Silver 


Copyright © 2001 Academic Press 
doi: 10.1006/rwgn.2001.0056


An isolated and characterized DNA region that has no known function but occurs in two different allelic forms within a population, so that it can serve as a DNA marker to be followed in genetic studies.


See also: DNA Marker 
Antibiotic resistance is the phenomenon encountered among some strains of bacteria when they cause an infection and are not inhibited or killed by the concentration of antibiotic achieved in the body tissues. It is important to differentiate between clinically significant resistance and that which may be observed in laboratory cultures. Susceptibility is usually defined by the minimum inhibitory concentration (MIC), i.e., the antibiotic concentration (in mg l⁻¹) that inhibits growth. For example, most strains of Streptococcus pneumoniae (the causative agent of lobar pneumonia) are clinically sensitive to benzylpenicillin, with an MIC < 0.01 mg l⁻¹. The MIC of penicillin for Escherichia coli is 32–64 mg l⁻¹, and as mean blood levels are of the order of 5 mg l⁻¹, infections caused by E. coli cannot be treated with penicillin, although E. coli could be killed in a laboratory culture incorporating high levels of benzylpenicillin. The ability of microorganisms to resist killing by antimicrobial agents was first recognized by Ehrlich in 1914, who called it ‘Arzneifestigkeit’ (drug-fastness). Antimicrobial chemotherapy really started with the introduction of sulfonamides in 1936 and penicillin in 1940, the year in which Chain and Abraham described a bacterial enzyme produced by E. coli, penicillinase (a type of β-lactamase), which destroyed penicillin.
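The MIC definition and the comparison with achievable blood levels above can be sketched as follows. The function names, the twofold dilution series, and the "growth" readings are hypothetical, chosen so that the E. coli result matches the upper end of the 32–64 mg l⁻¹ range quoted in the text. The susceptibility rule is a deliberate simplification: clinical laboratories use published breakpoints rather than a single comparison with a mean blood level.

```python
from typing import Optional, Sequence, Tuple


def mic_from_dilutions(results: Sequence[Tuple[float, bool]]) -> Optional[float]:
    """Lowest concentration (mg/l) in a dilution series at which no growth was seen.

    `results` holds (concentration, growth_observed) pairs. Returns None if
    growth occurred at every concentration tested.
    """
    inhibitory = [conc for conc, grew in results if not grew]
    return min(inhibitory) if inhibitory else None


def clinically_susceptible(mic: float, achievable_level: float) -> bool:
    """Simplified rule: treatable only if the MIC is at or below the level reached in the body."""
    return mic <= achievable_level


if __name__ == "__main__":
    series = [0.5 * 2 ** i for i in range(9)]  # twofold dilutions, 0.5 to 128 mg/l
    e_coli = [(c, c <= 32) for c in series]     # hypothetical readings
    mic = mic_from_dilutions(e_coli)
    print("E. coli MIC:", mic, "mg/l")
    print("treatable at ~5 mg/l blood level:", clinically_susceptible(mic, 5.0))
    print("S. pneumoniae (MIC 0.008 mg/l):", clinically_susceptible(0.008, 5.0))
```

Taking the minimum over non-growth wells assumes a clean, monotonic series; a skipped well would need to be flagged in practice.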


Mechanisms of Antibiotic Resistance 


Very many biochemical mechanisms have been iden- 
tified, but all fall into one of the following types: 


Enzymatic Modification 

Resistant bacteria exhibiting this mechanism retain the same target for the action of the antibiotic as sensitive strains, but the antibiotic is modified before it reaches the target. β-Lactamases are the most studied group of enzymes responsible for this mechanism of resistance, with over 200 types known. They open the four-membered β-lactam ring found in penicillins, cephalosporins, and monobactams through a nucleophilic attack on the β-lactam amide bond by the hydroxyl group of a serine residue (in most β-lactamases) located at the active site of the enzyme. This produces an acyl-enzyme intermediate characterized by an ester bond between the enzyme and the penicilloyl (or cephalosporoyl) moiety. The ester bond is then efficiently hydrolyzed by water (acting as a nucleophile), liberating active β-lactamase and inactivated β-lactam antibiotic. The rate of enzyme turnover is highly variable, giving rise to variations in the level of resistance conferred. Most β-lactamases act to some degree against both penicillins and cephalosporins; others are more specific (e.g., the AmpC enzyme found in Enterobacter spp., which is predominantly a cephalosporinase, or the penicillinase of Staphylococcus aureus). The other major group of antibiotic-modifying enzymes comprises the three classes of aminoglycoside-modifying enzymes: aminoglycoside acetyltransferases (AAC), adenylyltransferases (ANT), and phosphotransferases (APH). These enzymes are located in the cytoplasm and inactivate the drug only as it enters the cell. They are frequently plasmid-mediated and are widespread among gram-positive and gram-negative bacteria.
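The link between turnover rate and level of resistance can be made concrete with simple enzyme kinetics. The sketch below assumes, purely for illustration, that a fixed amount of β-lactamase hydrolyzes a single dose of antibiotic with Michaelis-Menten kinetics and that no further drug enters; it uses the integrated Michaelis-Menten equation to compute how long the enzyme takes to bring the drug below a hypothetical inhibitory threshold. The function name and all parameter values are hypothetical, not measured constants for any real enzyme.

```python
import math


def time_to_inactivate(s0, threshold, enzyme, kcat, km):
    """Time (s) for an enzyme to lower substrate from s0 to threshold.

    Uses the integrated Michaelis-Menten equation
        t = (Km * ln(S0 / S) + (S0 - S)) / (kcat * E)
    Concentrations (s0, threshold, enzyme, km) share one unit, e.g. uM;
    kcat is in 1/s. Assumes no new substrate enters and the enzyme is stable.
    """
    if not 0 < threshold < s0:
        raise ValueError("need 0 < threshold < s0")
    vmax = kcat * enzyme  # maximal hydrolysis rate, concentration per second
    return (km * math.log(s0 / threshold) + (s0 - threshold)) / vmax


# Hypothetical comparison: two enzymes differing only in turnover number.
fast = time_to_inactivate(s0=100.0, threshold=1.0, enzyme=0.01, kcat=1000.0, km=50.0)
slow = time_to_inactivate(s0=100.0, threshold=1.0, enzyme=0.01, kcat=10.0, km=50.0)
print(f"fast enzyme: {fast:.1f} s, slow enzyme: {slow:.1f} s")
```

Because the time scales inversely with kcat in this model, the enzyme with a 100-fold higher turnover number clears the same dose 100 times sooner, which is one simple way to see why turnover rate affects the level of resistance.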


Decreased Uptake 
The target site for antibiotic action can also be protected by preventing the antibiotic from entering the cell, or by pumping it out via an efflux pump faster than it can flow in. β-Lactam antibiotics gain intracellular access to gram-negative bacteria via a water-filled hollow membrane protein known as a porin. Imipenem is a carbapenem β-lactam antibiotic, and some imipenem-resistant Pseudomonas aeruginosa strains lack the specific D2 porin by which imipenem is taken up into the cell. A similar mechanism involving other porins is seen in gram-negative bacteria with low-level resistance to fluoroquinolones and aminoglycosides. Increased efflux via an energy-dependent membrane transport pump is a common mechanism of resistance to tetracyclines in gram-negative bacteria; it is encoded by a range of related genes, such as tet(A), that are widely distributed in the Enterobacteriaceae. The marRAB operon associated with the MAR (multiple antibiotic resistance) phenotype probably works in part by influencing the expression of distant genes such as micF, which encodes an antisense RNA that inhibits ompF translation. This reduces the amount of OmpF porin and hence the entry of antibiotics such as tetracycline, β-lactams, fluoroquinolones, and nalidixic acid. In addition, active efflux of tetracycline and chloramphenicol has been seen to be associated with the MAR phenotype.
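The balance between passive entry through porins and active efflux can be sketched with a minimal steady-state model. It assumes, for illustration only, that influx is proportional to the concentration difference across the outer membrane (with a porin-dependent permeability) and that the efflux pump follows Michaelis-Menten kinetics; the intracellular concentration at steady state is then found by bisection. The function name, units, and parameter values are hypothetical.

```python
def steady_state_internal(c_out, permeability, vmax, km, tol=1e-12):
    """Intracellular drug concentration at which influx equals efflux.

    influx = permeability * (c_out - c_in)   (passive entry via porins)
    efflux = vmax * c_in / (km + c_in)       (saturable pump)
    Net flux decreases as c_in rises, so bisection on [0, c_out] converges.
    """
    def net_flux(c_in):
        return permeability * (c_out - c_in) - vmax * c_in / (km + c_in)

    lo, hi = 0.0, c_out
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if net_flux(mid) > 0:
            lo = mid  # still net uptake, so the steady state lies higher
        else:
            hi = mid
    return (lo + hi) / 2


# Hypothetical scenarios (arbitrary units).
no_pump = steady_state_internal(c_out=10.0, permeability=1.0, vmax=0.0, km=1.0)
pump = steady_state_internal(c_out=10.0, permeability=1.0, vmax=5.0, km=1.0)
pump_fewer_porins = steady_state_internal(c_out=10.0, permeability=0.2, vmax=5.0, km=1.0)
print(no_pump, pump, pump_fewer_porins)
```

With these made-up numbers the pump alone lowers the internal level from 10 to about 5.7, and combining the pump with a five-fold lower porin permeability drops it to about 0.6, illustrating how reduced uptake and increased efflux can reinforce each other.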


Altered Target Site 

All antibiotics have a molecular target with which they interfere to inhibit growth or kill bacteria; should structural changes occur in that target molecule that allow it to resist the action of the antibiotic, the cell will be resistant. Enterococcus spp. are inherently resistant to cephalosporins because the enzymes responsible for synthesis of the major structural component of the cell wall, peptidoglycan (the penicillin-binding proteins, PBPs), have a low binding affinity for them and are therefore not inhibited by them. Most strains of Streptococcus pneumoniae are fully susceptible to penicillin, but by the process of transformation cells can take up DNA from other species of streptococci that have PBPs with a low affinity for penicillins. The resulting altered enzyme has a different structure but remains functional, so it continues to synthesize peptidoglycan even in the presence of penicillin.


Bypass of Synthetic Pathways 

Bacteria can continue to produce a target that is inhibited by the antibiotic, but if they also produce an alternative target that is not inhibited, the cell can continue to grow in the presence of the antibiotic, effectively “bypassing” its effect. Methicillin-resistant Staphylococcus aureus (MRSA) strains produce an additional PBP, PBP2a, which is encoded by the mecA gene and is not inhibited by antibiotics such as flucloxacillin or nafcillin. The transferable mechanism of resistance to vancomycin seen in vancomycin-resistant Enterococcus spp., encoded by the vanA gene complex, is an interesting variant of the bypass mechanism. In vancomycin-sensitive enterococci, the target for vancomycin is a cell wall precursor containing a pentapeptide with a D-alanyl-D-alanine terminus; vancomycin binds to this terminus, preventing peptidoglycan synthesis. Enterococci carrying vanA can make an alternative cell wall precursor ending in D-alanyl-D-lactate, to which vancomycin does not bind. Other genes in the vanA cluster contribute to resistance; for example, VanX cleaves the normal D-alanyl-D-alanine should any be made, thus enhancing the role of the alternative synthetic pathway.


Molecular Basis of Antibiotic Resistance 


Bacterial resistance to antibiotics can be intrinsic or acquired. Intrinsic resistance results from a naturally occurring trait in a species (e.g., cephalosporin resistance in Enterococcus spp.), and by implication all members of that species will exhibit that resistance pattern. Acquired resistance arises from mutation of an existing gene or acquisition of new DNA encoding a novel gene, and therefore not all strains will be resistant. Mutation is a spontaneous event that occurs regardless of the presence of the antibiotic. In Escherichia coli, mutations occur at a frequency of about 10⁻⁷ in the DNA gyrase gene (gyrA), frequently resulting in a Ser83→Leu or Ser83→Trp substitution. Such strains are highly resistant to fluoroquinolones like ciprofloxacin, and in the presence of the drug the resistant mutants rapidly replace the sensitive population by outgrowing it.
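A mutation frequency of about 10⁻⁷ means that resistant mutants are very likely to be present in a large population before any antibiotic is given. The sketch below computes the expected number of mutants and the probability of at least one, then follows a hypothetical outgrowth in which, under the drug, only the resistant cells divide. The population sizes, generation count, and the assumption that sensitive cells simply stop growing are illustrative, not taken from the text.

```python
import math


def expected_mutants(n_cells, freq):
    """Expected number of resistant mutants among n_cells."""
    return n_cells * freq


def prob_at_least_one(n_cells, freq):
    """P(at least one mutant) = 1 - (1 - freq)**n_cells, computed stably."""
    return -math.expm1(n_cells * math.log1p(-freq))


def resistant_fraction(n_sensitive, n_resistant, generations):
    """Fraction resistant after some generations in which only mutants double."""
    resistant = n_resistant * 2 ** generations
    return resistant / (resistant + n_sensitive)


print(expected_mutants(1e9, 1e-7))       # about 100 mutants
print(prob_at_least_one(1e6, 1e-7))      # about 0.095
print(resistant_fraction(1e9, 100, 25))  # about 0.77
```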

Transferable antibiotic resistance was described in 
1959 when genes encoding sulfonamide resistance 
found in Shigella transferred to E. coli on a plasmid. 
Conjugative plasmids are small (20-200 kb), self- 
replicating circular pieces of double-stranded DNA 
which encode their transfer by replication into 
another bacterial strain or species. Antibiotic resist- 
ance genes may integrate into the DNA of a bacterio- 
phage and transfer from one host to another, a process 
known as transduction. Dying bacteria release DNA 
which can be taken up by competent bacteria, a process 
called transformation, which is increasingly recog- 
nized as an important route for the spread of antibiotic 
resistance genes, e.g., the evolution of ‘mosaic’ PBPs 
in penicillin-resistant Streptococcus pneumoniae. 

A single plasmid may carry multiple antibiotic resistance genes, which are capable of replicative transfer from one plasmid to another, or to the bacterial genome, if located within a transposon (or ‘jumping gene’). Antibiotic resistance genes may also be carried on mobile gene cassettes, which can be integrated into or deleted from their receptor elements, integrons, or, infrequently, integrated at other locations, via site-specific recombination catalyzed by an integron-encoded recombinase.


Origins of Antibiotic Resistance Genes 


When the Murray collection of bacteria, assembled between 1914 and 1950, was examined for the presence of antibiotic resistance genes, none were found; however, a number of conjugative plasmids very similar to those carrying antibiotic resistance genes in “modern bacteria” were found. This implies that all of the mechanisms for antibiotic resistance gene dissemination existed prior to the use of antibiotics. Many antibiotic resistance genes have homologs among bacterial housekeeping genes (e.g., β-lactamases and PBPs), suggesting that they may have evolved by mutation. Antibiotic resistance genes are also found in fungi and bacteria that produce antibiotics, and it is probable that they have moved from that source. DNA sequencing studies of β-lactamases and aminoglycoside-inactivating enzymes show that, despite similarities within the protein sequences, there are substantial DNA sequence differences. As the evolutionary time frame is less than 50 years, it is not possible to derive a model in which these differences could have arisen by mutation alone. The genes must therefore be derived from a large and diverse gene pool occurring in environmental bacteria, some of which produce antibiotics. Mutation is nonetheless an important process for the “refinement” of antibiotic resistance genes, as has been seen in the last 10 years with the SHV and TEM plasmid-encoded β-lactamases. The parental enzymes SHV-1 and TEM-1/2 are ‘pure’ penicillinases, but the substitutions Glu237→Lys in SHV-5 and Glu102→Lys in TEM-9 extend their activity to degrade cephalosporins like cefotaxime and ceftazidime. Mutations such as Arg244→Cys and Val69→Met in TEM β-lactamases confer resistance to inhibition by β-lactamase inhibitors like clavulanic acid.

The selection pressure for the maintenance of antibiotic resistance genes is heavy, and injudicious use of antibiotics, largely in medical practice, is probably responsible (about 50% of production is used in humans, 20% of this in hospitals and 80% in the community). The addition of antibiotics to animal feed or water, either for growth promotion or, more significantly, for mass treatment or prophylaxis in factory-farmed animals, is having an unquantified effect on resistance levels.
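Read as nested proportions (taking the 20% and 80% figures to be shares of human use, as their sum of 100% suggests), the usage figures translate into shares of total production as follows:

```latex
\begin{aligned}
\text{hospital use} &= 0.50 \times 0.20 = 0.10 \quad (10\% \text{ of total production}),\\
\text{community use} &= 0.50 \times 0.80 = 0.40 \quad (40\% \text{ of total production}).
\end{aligned}
```

On this reading, community use accounts for four times as much antibiotic as hospital use.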


Antibiotic-resistant organisms can arise from antibiotic-sensitive organisms in a number of ways. Genes, or sets of genes, conferring antibiotic resistance may be obtained via resistance plasmids (R plasmids), integrons, or transposable elements. These genes often encode proteins responsible for modifying or destroying the antibiotic, affecting the uptake or efflux of the antibiotic, or modifying the cellular target of the antibiotic. This mechanism is of great practical importance, as horizontal transfer of antibiotic resistance to pathogens is putting severe constraints on antibiotic therapy for treatment of infectious disease. This topic is covered in the entries on Antibiotic Resistance, Drug Resistance, and Resistance Plasmids.

Many of the resistance genes present on transpos- 
able elements, viruses, or specially constructed cas- 
settes also have important uses as mutagenic agents 
in the laboratory, either in transposon mutagenesis or 
in various in vitro methods (for example see Transpos- 
ons as Tools). In these cases the antibiotic-resistance 
phenotype is of importance to the geneticist because it 
can be positively selected. 

Antibiotic resistance can also be the result of a 
mutation in one or more chromosomal genes in a 
sensitive strain. Sometimes these genes are also in- 
volved in uptake, destruction, or modification of the 
antibiotic. However, in some cases the resistance arises 
because the mutation is in the gene encoding the 
cellular target of the antibiotic. These latter antibiotic- 
resistance mutations have been invaluable for dissec- 
tion of the complex reactions in macromolecular 
synthesis. In this entry we will discuss a few examples of antibiotic resistance in the bacterium Escherichia coli that result from such mutations.


Coumarins 


The coumarins, such as coumermycin A and novobiocin, are antibiotics that inhibit certain DNA topoisomerases, enzymes that catalyze interconversions of different topological isomers of DNA. One of the topoisomerases that the coumarins inhibit is bacterial DNA gyrase. This enzyme introduces negative supercoils into DNA and is an essential enzyme in DNA replication. Specifically, the coumarins inhibit the activity of the B subunit, which catalyzes the ATP hydrolysis involved in the enzyme reaction. Some mutations of the gyrB gene, which encodes the B subunit of this enzyme, lead to resistance to these antibiotics.


Erythromycin 


Erythromycin is one of the macrolides, a large group 
of structurally related antibiotics that inhibit protein 
synthesis. Erythromycin has been shown to bind to 
the large ribosomal subunit in the peptidyltransferase 
region of the 23S ribosomal RNA (rRNA). Resistance 
can arise from mutations in at least three different 
genes encoding large subunit ribosomal proteins. Curi- 
ously, in E. coli, genetic elimination of ribosomal 
protein L11 makes the cells hypersensitive to erythro- 
mycin. Resistance can also arise from specific muta- 
tions in the gene encoding 23S rRNA, mutations 
which must be constructed in organisms like E. coli 
that have multiple copies of this gene. These mutations 
are in a region of the 23S rRNA which is protected by 
specific methylation in the organism that produces 
erythromycin. Methylation at this site in E. coli leads 
to erythromycin resistance, but the gene that encodes 
the specific methylase must be acquired by horizontal 
gene transfer. 


Fusidic Acid 


The antibiotic fusidic acid inhibits the translational 
elongation factor, EF-G, which promotes transloca- 
tion of the ribosome from one codon on the messenger 
RNA to the next. Mutants of EF-G that are resistant to fusidic acid are known, and it is from them that the gene encoding this factor was named fus. Many such mutations reduce the growth rate of the cell as well as the rate of translation elongation.


Kasugamycin 


The aminoglycoside kasugamycin acts as an inhibitor 
of translation initiation. Sensitivity to kasugamycin is 
dependent on the presence of two dimethyladenosine 
residues found near the 3’ end of 16S rRNA. Ribo- 
somes whose 16S rRNA is missing these methylated 
adenosine residues are resistant to the antibiotic 
and, therefore, mutants defective in the methylase 
(encoded by the gene ksgA) are kasugamycin resistant. 
Cells containing this undermethylated rRNA also 
have a reduced growth rate. 
Kirromycin


The polyenic antibiotic kirromycin inhibits the trans- 
lational elongation factor, EF-Tu, blocking its exit 
from the ribosome. Kirromycin-resistant alleles of 
the tuf genes, which encode EF-Tu, have been isol- 
ated. Individually, these mutations lead to amino acid 
substitutions at one of a small number of sites. Many 
of these alleles lead to an increase in various errors in 
translation. Many bacteria, including E. coli, have 
duplicate tuf genes (typically tufA and tufB) and the 
alleles conferring resistance are recessive to the wild- 
type alleles. Kirromycin-resistant cells grow more 
slowly than wild-type cells. Resistance to kirromycin-like antibiotics in some of the actinomycetes that produce them is also the result of these organisms having a resistant EF-Tu.


Quinolones 


The quinolones, such as nalidixic acid, also inhibit 
bacterial DNA gyrase. However, unlike coumarins 
(see above) these antibiotics specifically inhibit the 
A subunit, the nicking-ligating component of the 
enzyme. Certain mutations in the gene encoding this subunit, gyrA (formerly nalA), confer resistance to nalidixic acid and other quinolones.


Rifampicin 


The rifamycins are a group of antibiotics synthesized 
by certain Streptomyces species. One of these antibiotics, rifampicin, specifically inhibits the bacterial RNA polymerase by binding to the β subunit of this enzyme. Rifampicin blocks transcription at, or shortly after, the initiation of an RNA chain by the polymerase, but it does not block the elongation of chains already initiated. Rifampicin-resistant mutants are readily isolated and are found to have mutations in rpoB (formerly rif), the gene encoding the β subunit.
The mutations are point mutations or small, in-frame 
insertions and deletions at a limited number of sites 
which result in amino acid substitutions leading to the 
loss of the ability of the enzyme to bind rifampicin. 
Rifamycin-resistant mutations can have pleiotropic 
phenotypes, such as temperature sensitivity or a 
change in the regulation of transcription of some genes. 


Streptomycin 


The aminoglycoside antibiotic streptomycin inhibits 
protein synthesis in bacteria by binding to a specific 
site on 16S rRNA. Streptomycin-resistant mutants of 
E. coli were first reported in 1950. These mutants have 
amino acid substitutions in ribosomal protein S12, encoded by the rpsL gene (formerly str). Streptomycin itself can increase many types of translational
errors that occur on the ribosome. Many, but not all, 
streptomycin-resistant mutants are said to be “restric- 
tive” in that they reduce the error frequency below 
that of wild-type ribosomes. All such restrictive muta- 
tions also result in a decreased growth rate and a 
decreased peptide chain elongation rate. In at least 
some cases, these effects can be compensated for by 
second-site mutations outside rpsL without diminish- 
ing the level of streptomycin resistance. Streptomycin- 
resistant mutants are recessive in cells that also contain 
a wild-type allele. Streptomycin resistance can also 
arise from mutations in the genes encoding 16S 
rRNA. These mutations are also recessive to the 
wild-type allele and in organisms containing multiple 
rRNA genes these mutations are typically constructed 
by in vitro techniques. 


See also: Antibiotic Resistance; DNA Replication; 
Drug Resistance; Elongation Factors; Escherichia 
coli; Integrons; Resistance Plasmids; Resistance to 
Antibiotics, Genetics of; Ribosomal RNA (rRNA); 
Ribosomes; RNA Polymerase; Streptomyces; 
Streptomycin; Topoisomerases; Transcription; 
Translation; Transposable Elements; Transposons 
as Tools 
An antibody is a protein (immunoglobulin) produced
by B lymphocytes that recognizes and binds to a 
particular foreign ‘antigen.’ 


See also: Antigen; Immunity 


Anticodons 
A Liljas 
The anticodon is the part of the tRNA that decodes the genetic message contained in the mRNA, leading to the incorporation of amino acids into the growing polypeptide. The anticodon is composed of three nucleotides, located approximately in the middle of the tRNA sequence and at one end of these elongated molecules.


tRNA 


Crick, in his adaptor hypothesis, proposed that small 
RNA molecules would be the adaptors that could be 
charged with amino acids by specific enzymes and 
that could identify the codons (triplets of nucleotides) 
of the mRNA by base-pairing. These adaptors could 
thus participate in incorporating the amino acids into a 
growing polypeptide. Subsequently these adaptors 
were identified and are now known as the tRNA 
molecules. From the nucleotide sequences of numer- 
ous tRNA molecules, the secondary structure of the 
tRNA, the classic cloverleaf, has been identified. Of 
the three loops, the middle one contains the anticodon 
of the tRNA. The three-dimensional structure of 
tRNA has the shape of an “L.” Here the anticodon is 
located at one end and the 3’ acceptor for amino acids 
is at the opposite end, approximately 80 Å away. This
means that the anticodon has no possibility of inter- 
acting with the amino acid. This also means that, when 
the tRNA assists in the incorporation of the amino 
acid into the growing polypeptide on the ribosome, 
the interaction of the anticodon with the mRNA is far 
from the site of peptidyl transfer. The anticodons are frequently modified posttranscriptionally; the modifications affect the bases as well as the riboses.


Code, Codons, and Codon Usage 


With a universal triplet genetic code and four different nucleotides in the mRNA, there are 4³ = 64 words, or codons, in the genetic code. Although the genetic code is essentially universal, there are variations in the meaning of some code words. In bacteria three codons designate stop and are normally not read by tRNAs. Since the remaining 61 codons specify only the 20 different amino acids found in regular proteins, the code is degenerate: between one and six codons correspond to each amino acid. The tRNAs that bind the same amino acid are called isoacceptor tRNAs. The number of tRNAs that decode the message varies between organisms. Every codon used in the mRNAs must be read by some tRNA expressed by the organism. In some organisms the set of tRNAs is small (minimally 20), while in others there are different tRNAs for almost all codons. Thus codon usage differs between organisms.
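The degeneracy described above can be checked directly against the standard genetic code. The sketch below builds the standard code from the compact 64-letter string used for NCBI translation table 1, written here with U for mRNA (an assumption of this example; the variant codes mentioned in the text would need a different string), and counts how many codons specify each amino acid.

```python
from collections import Counter
from itertools import product

# Standard genetic code (NCBI translation table 1). Codons are listed with the
# first base varying slowest, in the base order U, C, A, G: UUU, UUC, UUA, ...
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

CODE = {"".join(codon): aa
        for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

stops = sorted(codon for codon, aa in CODE.items() if aa == "*")
degeneracy = Counter(aa for aa in CODE.values() if aa != "*")

print(len(CODE), "codons;", len(stops), "stop codons:", stops)
print(len(degeneracy), "amino acids")
for n in range(1, 7):
    names = "".join(sorted(aa for aa, count in degeneracy.items() if count == n))
    if names:
        print(n, "codon(s):", names)
```

The output groups the one-letter amino acid codes by the number of codons that specify them, from Met and Trp (one codon each) to Leu, Ser, and Arg (six each).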


Anticodons 


The anticodon is composed of three nucleotides, normally positions 34-36 of the tRNA, that read the codons of the mRNA, primarily by Watson-Crick base-pairing. However, the same tRNA can base-pair with different nucleotides in the third position of the codon, which corresponds to the first position of the anticodon. Normally a ‘G’ in the first position of the anticodon can read codons ending with ‘C’ as well as with ‘U.’ This was first proposed by Crick and given the name ‘wobble hypothesis,’ owing to the departure from strict Watson-Crick base-pairing in this position. Other noncanonical base pairs also occur with the third, or wobble, position of the codon, including pairs involving modified bases of the tRNA. The lack of tRNAs for certain codons can be compensated for by the ability of some tRNAs to read several codons.
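The wobble rules can be expressed as a small lookup. The sketch below takes an anticodon written 5'→3' (positions 34, 35, 36), pairs positions 36 and 35 with codon positions 1 and 2 by strict Watson-Crick complementarity, and lets position 34 pair with the codon's third base according to Crick's original wobble rules (including inosine, I). Other modified bases mentioned in the text are not modeled, and the function and table names are illustrative.

```python
# Strict Watson-Crick partners (RNA) for anticodon positions 35 and 36.
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}

# Crick's wobble rules: anticodon position 34 -> codon third bases it can read.
WOBBLE = {"U": "AG", "C": "G", "A": "U", "G": "CU", "I": "ACU"}


def codons_read(anticodon):
    """Codons (5'->3') read by an anticodon given 5'->3' as positions 34-36."""
    if len(anticodon) != 3:
        raise ValueError("an anticodon has exactly three nucleotides")
    pos34, pos35, pos36 = anticodon.upper()
    # Antiparallel pairing: 36 pairs with codon base 1, 35 with codon base 2.
    first_two = COMPLEMENT[pos36] + COMPLEMENT[pos35]
    return sorted(first_two + third for third in WOBBLE[pos34])


print(codons_read("GAA"))  # ['UUC', 'UUU']: G34 reads codons ending in C or U
print(codons_read("IGC"))  # ['GCA', 'GCC', 'GCU']
```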


Ribosomal Decoding Site 


The decoding site, or the ribosomal A-site, is the site 
where the codons form a short, double-stranded 
RNA helix with the anticodons on the ribosome. It 
is situated in the neck region between the head and 
the body of the ribosomal small subunit. It is partly 
composed of one region of the penultimate helix of 
the 16S RNA. In the immediate vicinity are also 
regions of the rRNA that are involved in a confor- 
mational switch. The ribosome switches from a state 
of ribosome ambiguity (ram) to a restrictive state, 
which relates to the accuracy of decoding. The switch 
of the conformation occurs in every cycle of the 
elongation. 


Fidelity 


The fidelity of the decoding depends on the interac- 
tions between the anticodon of the tRNA and the 
codon of the mRNA. The fidelity is in the order of 
1 error per 10000 incorporated amino acid residues. 
The fidelity is a result of two main processes on the 
ribosome: the initial recognition of the tRNA by 
the ribosome-bound mRNA and the proofreading 
by the ribosome. The initial recognition of a tRNA 
is done while the tRNA is bound to elongation factor 
Tu (EF-Tu) in complex with GTP. The fidelity of this 
step is in the order of 1:100. The proofreading occurs 
once the GTP molecule bound to EF-Tu has been 
hydrolyzed and EF-Tu has dissociated from the 
tRNA and the ribosome. Before the tRNA can partici- 
pate in peptidyl transfer, it has to reorient itself to 
place the amino acid into its correct position of the 
peptidyl transfer site while maintaining the codon-anticodon interaction. During and after this reorientation the aminoacyl-tRNA can fall off the ribosome or proceed to participate in peptidyl transfer. This proofreading step raises the fidelity to the level observed in vivo. Certain antibiotics or ribosomal mutations can affect the fidelity significantly.
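If the two selection steps are assumed to act independently, so that their error fractions simply multiply (a simplifying assumption, not stated in the text), the figures above imply how much proofreading contributes:

```latex
\varepsilon_{\text{total}} \approx \varepsilon_{\text{initial}} \times \varepsilon_{\text{proofreading}}
\quad\Longrightarrow\quad
\varepsilon_{\text{proofreading}} \approx \frac{10^{-4}}{10^{-2}} = 10^{-2}
```

On this simple model, proofreading rejects roughly 99 of every 100 incorrect aminoacyl-tRNAs that survive initial selection.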
An antigen is any molecule (often foreign) whose presence in an organism induces the synthesis of an antibody (immunoglobulin).


See also: Antibody; Immunity 


Antigenic Variation 


K L Hill and J E Donelson 


Copyright © 2001 Academic Press 
doi: 10.1006/rwgn.2001.1458 


Antigenic variation is a process by which many infec- 
tious agents, including some pathogenic viruses, bac- 
teria, fungi, and parasites, evade the defense responses 
of the vertebrate immune system. A major component 
of the immune system is the generation of a specific 
group of proteins, called antibodies, that attack invad- 
ing pathogens by recognizing and binding to mol- 
ecules on the pathogen’s surface. Surface molecules that 
elicit this immune response are called ‘antigens’ (‘anti- 
body generators’) and can be proteins, carbohydrates, 
or lipids. Thus, pathogens that can periodically change 
or switch the molecular composition of their surface 
antigens are said to undergo antigenic variation. This 
periodic variation provides a means for individual 
organisms within a population to temporarily camou- 
flage themselves and thereby prevent elimination of 
the entire population by the host’s immune system. In 
some pathogens, antigenic variation is accomplished 
through random mutations in the genes encoding 
either surface molecules themselves or the enzymes 
that synthesize them. In other pathogens, antigenic 
variation is mediated by mechanisms that function 
specifically to generate diversity in the structures of
surface molecules. 


Antigenic Variation Caused by Random 
Events 


Examples of random antigenic variation are those that 
occur in viruses such as the influenza virus and the 
human immunodeficiency virus (HIV), which causes 
acquired immunodeficiency syndrome (AIDS). The 
major antigenic components of these viruses are glyco- 
proteins that make up their viral coat. Occasionally, 
random mutations occur in the genes for these coat 
proteins during viral replication. These mutations 
often produce changes in the structure of the corres- 
ponding protein. If these structural changes do not 
significantly impair the protein’s function, a new 
strain of virus is produced that is not immediately 
recognized by an individual’s immune system, even if 
this individual was infected previously with another 
strain of the same virus. This type of antigenic vari- 
ation is called ‘antigenic drift,’ because changes in 
antigen structure are moderate and the identity of 
new viral strains slowly ‘drifts’ away from the identity 
of the original (parent) strain. A more dramatic case of 
random antigenic variation occurs when two different 
strains of a virus infect the same individual at one time. 
In this case, extensive reassortment of the viral genes between these two virus strains sometimes occurs, leading to drastically modified surface antigens. This type of antigenic variation is called ‘antigenic shift.’
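The scale of possible reassortment can be illustrated for influenza A virus, whose genome consists of eight RNA segments (background knowledge, not stated in the text). The sketch below enumerates every combination of segment origins when two parental strains infect the same cell; it counts possible genotypes only and says nothing about which are viable or actually arise.

```python
from itertools import product

# The eight genome segments of influenza A virus.
SEGMENTS = ("PB2", "PB1", "PA", "HA", "NP", "NA", "M", "NS")


def reassortants(parent_a="A", parent_b="B"):
    """All genotypes built by taking each segment from either parent."""
    return [dict(zip(SEGMENTS, origins))
            for origins in product((parent_a, parent_b), repeat=len(SEGMENTS))]


genotypes = reassortants()
novel = [g for g in genotypes if len(set(g.values())) == 2]   # not a parent
mixed_surface = [g for g in genotypes if g["HA"] != g["NA"]]  # new HA/NA pairing
print(len(genotypes), len(novel), len(mixed_surface))  # 256 254 128
```

Genotypes that pair one parent's hemagglutinin (HA) with the other's neuraminidase (NA) carry a combination of the two major surface glycoproteins found in neither parent.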

Random antigenic variation also occurs in bacterial 
and fungal pathogens. For example, there are an esti- 
mated 90 antigenically distinct strains (‘serotypes’) of 
Streptococcus pneumoniae, a bacterial pathogen that 
causes a variety of invasive diseases, including pneu- 
monia. Each S. pneumoniae serotype produces its own 
unique surface polysaccharide capsule, which is essen- 
tial for virulence and is a major target of host anti- 
bodies during an infection. Genes that are necessary for 
polysaccharide capsule biosynthesis can be modified 
by random mutations, large gene rearrangements, or 
genetic exchange between strains. As is the case for viruses, these modifications give rise to the different serotypes of S. pneumoniae that can cause repeat infections in the same individual.


Antigenic Variation Caused by Specific 
Mechanisms 


Some pathogens have developed elaborate mechan- 
isms that specifically function to change the structure 
of their surface antigens. While the actual antigen changes may be random, these mechanisms operate specifically to facilitate variation, and hence do not rely on random errors made during replication of the
pathogen’s genome. The best-characterized example 
of this type of ‘programmed’ antigenic variation occurs 
in African trypanosomes, single-celled, protozoan 
parasites that cause disease in humans as well as wild 
and domesticated animals. The surface of African try- 
panosomes is covered almost entirely with 10 million 
copies of a single glycoprotein called the variant sur- 
face glycoprotein (VSG). The trypanosome genome 
contains several hundred genes for different VSGs. 
However, only one VSG gene is expressed at any 
given time in a single parasite. Expression of VSG 
genes occurs at specific ‘expression sites’ that are 
located near the ends (telomeres) of the trypanosome’s 
chromosomes. Trypanosomes can change which VSG 
gene is expressed by turning off the current expression 
site and turning on a new expression site, located near 
a different telomere. Alternatively, the VSG gene that 
is currently being expressed can be removed from the 
active expression site and replaced with a different 
VSG gene. The trypanosome’s repertoire of several 
hundred VSG genes can be expanded through the 
introduction of mutations in existing VSG genes or 
by putting together new, mosaic VSG coding 
sequences using pieces of existing VSG genes. Hence, 
these parasites have a virtually endless supply of anti- 
gen variants that can be expressed on their surface for 
purposes of camouflage. 
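A rough count shows why mosaic genes make the supply of variants virtually endless. The sketch assumes, purely for illustration, that a mosaic VSG is assembled from a fixed number of segments and that each segment can be copied from any gene in the repertoire; real mosaics form by segmental gene conversion with variable breakpoints, so this is an idealization, and the repertoire size of 500 used below is only a stand-in for "several hundred".

```python
def mosaic_count(repertoire_size, segments):
    """Number of segment combinations that are not a copy of one donor gene.

    Each of `segments` positions may come from any of `repertoire_size` genes
    (repertoire_size ** segments combinations in all); the repertoire_size
    combinations that copy a single gene throughout are not mosaics.
    """
    return repertoire_size ** segments - repertoire_size


for k in (2, 3, 4):
    print(k, "segments:", mosaic_count(500, k))
```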

Programmed antigenic variation was originally 
identified in African trypanosomes. However, it is 
now known to occur in other pathogens. Bacteria 
that cause Lyme disease (Borrelia burgdorferi), 
Rocky Mountain relapsing fever (B. hermsii), and 
gonorrhea (Neisseria gonorrhoeae) possess multiple 
silent copies of surface protein genes that are activated 
one at a time by transposition of a silent gene into 
a single expression site. Plasmodium spp., which are 
intracellular protozoan parasites that cause malaria, 
can also periodically activate or deactivate any one of 
50-150 var genes, which encode similar, but non- 
identical surface proteins. While antigenic variation 
in these pathogens may differ mechanistically from
VSG switching in trypanosomes, these systems are all 
conceptually similar in that they exploit differential 
expression of a preexisting pool of variant surface 
protein genes to generate diversity in antigens exposed 
to the host immune system. 




See also: Alternation of Gene Expression; 
Antibody; Antigen 
Cancer results, at least in part, from mutations in three
classes of genes: 


1. Oncogenes: These are genes whose action positively promotes cell proliferation or growth. The normal, nonmutant versions are known as proto-oncogenes. The mutant versions are excessively or inappropriately active, leading to tumor growth.

2. Tumor suppressor genes (TSGs) or anti-oncogenes: These are genes that normally suppress cell division or growth. Loss of TSG function promotes uncontrolled cell division and tumor growth.

3. DNA damage repair genes (mutator genes): These genes are not directly involved in cell growth. Their inactivation leads to an increase in the rate of mutations in oncogenes and TSGs.


TSGs negatively regulate cell proliferation and growth. The existence of TSGs was originally demonstrated by Harris and colleagues, who found that when tumorigenic and nontumorigenic cells were fused in culture, the resulting hybrids were generally nontumorigenic. When some of these hybrids reverted to the tumorigenic state, this was accompanied by loss of chromosomal material.


Two-Hit Model 


In the early 1970s Alfred Knudson put forward a theory, “the two-hit model,” to explain how mutations in TSGs can lead to cancer. This theory postulates that tumor formation results when mutations occur at the same genetic locus on both homologous chromosomes within a single cell.

In the case of inherited cancer, one mutation may be inherited (germline mutation) and one acquired (somatic mutation), whereas in sporadic (nonhereditary) cancers both may be acquired. In inherited cancer syndromes, individuals who are gene carriers are born with the “first hit” in each of their constitutional cells. Hence, one copy of the gene will be defective throughout their life. Then, during the individual’s lifetime, a “second hit” occurs at the same locus on the homologous chromosome within one or more cells, and tumorigenesis is initiated. In the case of common, sporadic tumors, both hits are acquired during the individual’s lifetime.
Knudson’s model also explains that the inherited and 
common forms of the same cancer are caused by 
mutations in the same gene. It therefore follows that 
the onset of inherited cancer would be earlier in an 
individual’s lifetime than in the case of sporadic 
cancer, since only one further mutation is required in 


Table I Major tumor suppressor genes 
Gene Chromosomal Neoplasm Function 
locus 

RBI 13q14 Retinoblastoma Cell cycle regulation 

APC 5q21 Colorectal cancer Cell adhesion 

p53 17p13 Sarcomas, gliomas, carcinomas Cell cycle regulation, apoptosis 

NFI 17ql11.2 Neurofibromatosis type | RAS-GTPase activating protein 

NF2 22q12 Neurofibromatosis type 2 Cell adhesion 

WTI Ilpl2 Wilms’s tumor Transcription factor 

BRCAI 17q21 Familial breast cancer 

BRCA2 13q12 Familial breast and ovarian cancer 

VHL 3p25 von Hippel—Lindau disease 

plé 9p2l1 Familial melanoma Cell cycle regulation, inhibitor of 
CDK4/CDK6 cyclin-dependent kinases 

DPC4/SMAD4 18q21.1 Pancreatic carcinoma Cell growth inhibitor 

PTC 9q22.3 Nevoid basal cell carcinoma syndrome Negative regulator of Sonic 
Hedgehog/Smoothered signal pathway 

TSCl 9q34 Tuberous sclerosis 

TSC2 16p13.3 Tuberous sclerosis 
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inherited cases to initiate cancer as compared to two 
mutations required in the event of sporadic tumors. 
Many (but not all) TSGs comply with the “two-hit” 
model of tumorigenesis (Table 1). TSG inactivation 
may occur by mutation (missense truncating, splice, 
etc.), loss, or epigenetic silencing (methylation). 
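Knudson’s prediction of earlier onset in carriers can be made concrete with a simple kinetic model. The sketch below is illustrative only: it assumes that each allele in each of `n_cells` target cells is inactivated independently at a constant rate `mu` per year, and the parameter values are hypothetical, not taken from any study. In carriers every cell already carries the first hit, so one further event in any cell suffices and early risk grows roughly linearly with age; in sporadic cases both alleles of the same cell must be hit, so early risk grows roughly with the square of age.

```python
import math


def risk_carrier(t, mu, n_cells):
    """Cumulative tumor risk by age t when every cell already carries the first hit.

    One further hit on the remaining wild-type allele in any of n_cells cells
    initiates a tumor; each cell is hit at rate mu, so
    P(no tumor by t) = exp(-n_cells * mu * t).
    """
    return -math.expm1(-n_cells * mu * t)


def risk_sporadic(t, mu, n_cells):
    """Cumulative tumor risk by age t when both hits must be acquired somatically.

    A given allele has been hit by time t with probability 1 - exp(-mu * t);
    a cell is transformed only if both of its alleles have been hit.
    """
    p_allele = -math.expm1(-mu * t)
    p_cell = p_allele ** 2
    # 1 - (1 - p_cell) ** n_cells, computed stably for very small p_cell
    return -math.expm1(n_cells * math.log1p(-p_cell))


if __name__ == "__main__":
    MU = 1e-7       # hypothetical hit rate per allele per year
    N_CELLS = 1e5   # hypothetical number of target cells
    for age in (1, 2, 4, 8):
        c = risk_carrier(age, MU, N_CELLS)
        s = risk_sporadic(age, MU, N_CELLS)
        print(f"age {age}: carrier risk {c:.3g}, sporadic risk {s:.3g}")
```

With these small hypothetical values, doubling the age roughly doubles the carrier risk but roughly quadruples the sporadic risk, the one-hit versus two-hit signature; the absolute numbers have no biological meaning.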


Protein Products 


The first TSG to be cloned was RB1, the gene responsible for retinoblastoma in children. Since then more than a dozen TSGs have been isolated, including p53, APC, and VHL. These genes are also known as “gatekeepers” because they prevent cancer through direct control of cell growth. The protein products of TSGs play various roles in cell cycle control, apoptosis, and transcriptional regulation. In cell cycle control, RB1 prevents cells in G0/G1 phase from entering S phase, while p53 acts in late G1 phase to prevent cells from progressing to S phase. In apoptosis, the level of p53 rises rapidly in response to DNA damage, arresting the cell cycle in G1 and allowing the cell to repair its DNA; if repair is not possible, p53 induces programmed cell death (apoptosis). In transcriptional regulation, the WT1 protein acts as a transcription factor that binds specific DNA sequences and can activate or repress transcription.


Clinical Implications 


As well as providing insights into the mechanisms 
that regulate normal cell proliferation, the study of 
TSGs should eventually lead to novel therapies and 
better clinical management of cancer patients. Genetic 
alterations in TSGs and oncogenes mark cancer cells 
as distinct from their normal counterparts. The study 
of TSGs should provide molecular markers for the early detection of specific cancers and surrogate markers for chemoprevention trials, and may help in the design of tumor-specific therapies.


Further Reading 

Fearon ER (1997) Human cancer syndromes: clues to the
origin and nature of cancer. Science 278: 1043-1050. 

Kinzler KW and Vogelstein B (1996) Lessons from hereditary 
colorectal cancer. Cell 87: 159-170. 

Kinzler KW and Vogelstein B (1998) Landscaping the cancer 
terrain. Science 280: 1036. 

Knudson AG (1993) Antioncogenes and human cancer. Proceedings of the National Academy of Sciences, USA 90: 10914-10921.

Peters G and Vousden KH (eds) (1997) Oncogenes and Tumor 
Suppressors. New York: Oxford University Press. 

Weinberg RA (1995) The retinoblastoma protein and cell cycle 
control. Cell 81: 323-330. 


See also: Oncogenes; Retinoblastoma 
Antiparallel strands of duplex DNA run in opposite orientations; thus the 5’ end of one strand is aligned with the 3’ end of the complementary strand.
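Because the strands are antiparallel, the partner strand read 5’ to 3’ is the reverse complement of the given strand, not its simple complement. A minimal Python sketch follows; the function names are illustrative, and only the four canonical bases are handled.

```python
# Watson-Crick base pairs for DNA
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(strand: str) -> str:
    """Return the complementary strand, written 5' -> 3'.

    Each base is complemented and the order is reversed, because the
    partner strand runs in the opposite direction.
    """
    return "".join(COMPLEMENT[base] for base in reversed(strand.upper()))


def pairs_antiparallel(top: str, bottom: str) -> bool:
    """True if two strands, both written 5' -> 3', form a perfect duplex.

    The 5' end of `top` pairs with the 3' end of `bottom`, so `top` is
    compared base by base with `bottom` read backwards.
    """
    if len(top) != len(bottom):
        return False
    return all(COMPLEMENT[a] == b
               for a, b in zip(top.upper(), reversed(bottom.upper())))


print(reverse_complement("ATGCGT"))  # ACGCAT
```

Comparing the strands in the same direction (parallel) would fail: `pairs_antiparallel("ATGCGT", "TACGCA")` is False even though each position holds a complementary base.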


See also: DNA Structure 
Synthetic antisense DNA oligomers are used to block gene expression temporarily in vitro and in vivo. Antisense oligonucleotides are short (traditionally 15-25 bases), single-stranded DNA fragments designed to hybridize by Watson-Crick base pairing with mRNA or genomic DNA. They are synthesized to be complementary (antisense) to the sense-strand target nucleic acid sequence.

Because DNA is poorly stable in vivo, the nucleotides of antisense oligonucleotides must be modified. The most frequently used DNA backbone modification is the substitution of sulfur for a nonbridging oxygen at the phosphorus of the internucleotide linkage (phosphorothioate oligonucleotides). Other substitutions at the internucleotide linkage have led to the development of, for example, methylphosphonate oligonucleotides; unmodified oligonucleotides retain the natural phosphodiester linkage. Important design considerations are to use sequences that will not form duplexes with themselves and that are free of secondary structures such as hairpin loops. The oligonucleotide must also hybridize strongly to a target sequence that is unique to the gene of interest and free of secondary structure; such a target is usually found around the translation initiation site of the mRNA. Another important point is the half-life of the target protein, which needs to be considerably shorter than that of the antisense oligonucleotide, because protein already present must turn over before any reduction can be seen. Suppression of gene expression by DNA oligonucleotides is transient: nucleases degrade the applied oligonucleotide, intra- or extracellularly, within days, and expression rapidly returns to baseline.

It was originally suggested that antisense oligonucleotides prevent synthesis of gene products by blocking the translation of mRNA, but it has since been recognized that binding to mRNA can also trigger message degradation, and that binding of oligonucleotides to nuclear DNA, resulting in triplex formation, can prevent transcription. Oligonucleotides can also bind to proteins in a non-sequence-specific way and thus block their action, a process that has been named aptamer binding. Most studies reporting successful application of antisense DNA have not identified the mechanism of action responsible for gene suppression. In fact, gene suppression by oligonucleotides designed as ‘antisense’ molecules is very often not due to sequence-specific blockage of the target DNA or mRNA; phosphorothioate oligonucleotides in particular exert high nonspecific toxicity.

For a convincing demonstration of a gene-specific effect, adequate control oligonucleotides (scrambled, sense-strand, and one- or two-base mismatched sequences) must be used along with appropriate biological endpoints. A sensitive technique for detecting specific antisense downregulation of the target gene’s expression, at either the RNA or the protein level, is mandatory. Major obstacles to the use of antisense oligonucleotides are their poor cellular uptake and limited subcellular distribution. In vitro, these problems can be overcome by microinjection, by transient disruption of the cell membrane by electroporation, or, in some cases, by using high oligonucleotide concentrations. Antisense DNA can be delivered into cells in vitro and in vivo either by packing it into inactivated virus envelopes, which are able to penetrate the target cell membrane, or by coating the oligonucleotides with lipids.
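The control sequences listed above can be generated mechanically from an antisense oligo. The sketch below is one possible convention, not a standard protocol: the scrambled control keeps the base composition and is shuffled with a seeded random generator for reproducibility, the sense control is the reverse complement of the antisense oligo (i.e., the target sequence itself), and the mismatch controls replace one or two bases, spread across the oligo, with their complements. The substitution scheme and positions are arbitrary assumptions.

```python
import random
from typing import Dict

# Watson-Crick complements for DNA bases
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA string, written 5' -> 3'."""
    return "".join(COMPLEMENT[b] for b in reversed(seq))


def scrambled(seq: str, seed: int = 0) -> str:
    """Same base composition, shuffled order (seeded for reproducibility)."""
    rng = random.Random(seed)
    bases = list(seq)
    for _ in range(100):
        rng.shuffle(bases)
        candidate = "".join(bases)
        if candidate != seq:
            return candidate
    return "".join(bases)  # e.g. a homopolymer cannot be scrambled


def mismatch(seq: str, n: int) -> str:
    """Replace n bases, evenly spaced through the oligo, with their complements.

    Any substitution creates a mismatch against the target; the complement
    swap is just one simple, deterministic choice.
    """
    bases = list(seq)
    for k in range(n):
        pos = (k + 1) * len(bases) // (n + 1)
        bases[pos] = COMPLEMENT[bases[pos]]
    return "".join(bases)


def control_set(antisense: str, seed: int = 0) -> Dict[str, str]:
    """Antisense oligo plus scrambled, sense, and 1- and 2-mismatch controls."""
    antisense = antisense.upper()
    return {
        "antisense": antisense,
        "sense": reverse_complement(antisense),
        "scrambled": scrambled(antisense, seed),
        "mismatch_1": mismatch(antisense, 1),
        "mismatch_2": mismatch(antisense, 2),
    }
```

A scrambled control built this way still needs to be checked against the transcriptome, since a shuffled sequence can by chance match another gene.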

A goal even harder to achieve is tissue-specific delivery of these constructs. In theory, tissue-specific ligands could be used to direct the oligonucleotides to the target cells. A more promising method is local delivery of the antisense oligonucleotides. Some organs, however, take up intravenously administered oligonucleotides efficiently: after systemic administration of phosphorothioate oligonucleotides to laboratory animals, the kidneys and the liver have been shown to be the organs with the highest uptake. This was one of the reasons investigators targeted these organs with antisense oligonucleotides against transporters, transcription factors, and cell cycle regulatory genes, and demonstrated a sequence-specific reduction in mRNA as well as protein. Other potential in vivo applications of antisense oligonucleotides include the temporary knockdown of genes responsible for neointimal proliferation in restenotic lesions after balloon angioplasty, and the temporary inhibition of genes coding for MHC molecules or adhesion molecules in organ transplant recipients. A further possible clinical use is in the treatment of malignancies: in a mouse model of melanoma, blocking bcl-2 expression facilitated apoptosis of the malignant cells. Because the successful treatment of viral infections with classical antiviral drugs is limited, treatment strategies using antisense oligonucleotides could revolutionize antiviral therapy, owing to their sequence specificity and the possibility of precisely selecting target sites in the virus. One caveat applies when modified oligonucleotides are used in vivo: nucleotides from degraded oligomers can be incorporated into cellular DNA, and there is significant potential for modified bases to induce mutagenesis or interfere with normal DNA repair mechanisms. Such effects may be subtle and delayed in their appearance, making them difficult to distinguish from sequence-specific effects.

Although antisense methods can, in theory, be used to limit the expression of any gene, at present the technique is applied mainly in vitro, to study cell physiology by suppressing any of the myriad gene products whose activities cannot be manipulated by conventional pharmaceuticals. The antisense approach is applicable to a wide variety of signal transduction systems, including G-protein-coupled receptor signaling, for analyzing the downstream events that dictate biological responsiveness. The application of antisense DNA in vivo, however, has proven more difficult.


See also: Clinical Genetics; Gene Expression 
Antisense RNAs are small (~100 nucleotides), diffusible, regulatory RNAs that bind to complementary regions on specific target RNAs to control their expression or biological function at a posttranscriptional level. Most antisense RNAs have been identified in prokaryotes, although a few apparent cases have been described in eukaryotic cells. It is likely that antisense control exists in all cells.

The genetic systems in which antisense RNAs function are varied and interesting in their own right. These include the homeostatic replication systems of diverse bacterial plasmids, copy-number-dependent inhibition of mobile genetic element transposition, temporal development of viral gene expression, control of cell division, and postsegregational killing following cell division. Antisense inhibition can occur at many molecular levels, including transcription termination, messenger RNA processing, messenger RNA decay, and translation.

Antisense and target RNAs are often, but not 
necessarily, transcribed in opposite directions from 
the same DNA template, yielding RNAs that are 
completely complementary across a defined region. 
However, complementarity need not be complete, 
and the antisense and target RNAs can be transcribed 
from regions that are unlinked to one another. Some 
antisense RNAs inhibit two or more related but dis- 
tinct target RNAs. 

Antisense RNAs are nearly always highly structured, consisting of one or more stem-and-loop secondary structural elements flanked or separated by single-stranded (unpaired) regions. Occasionally, tertiary structures such as pseudoknots form between two or more secondary structural elements. These structures are critical for antisense RNA function for two main reasons. First, they largely determine the sensitivity of an individual antisense RNA to attack by cellular ribonucleases, and hence how quickly it is degraded. Some antisense RNAs are very short-lived while others are long-lived, and their individual half-lives often clearly suit their biological purposes; for example, antisense RNAs that must respond quickly to rapid changes in plasmid copy number are short-lived. Second, the structures of the antisense and target RNAs determine the rate at which the two RNAs pair with each other. In all carefully studied antisense RNA systems, the rate of pairing is far more important than the thermodynamic stability of the paired species; indeed, partial pairing is usually sufficient for a biological effect. Detailed genetic and physical analysis of the antisense/target RNA pairing pathways of several key systems has helped to reveal important underlying principles of RNA structure and function.
In a few cases, accessory proteins have been shown 
to enhance pairing, usually by binding to and stabil- 
izing a key intermediate structure in the pairing path- 
way. 
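Two of the kinetic points above can be illustrated with elementary rate equations. The sketch assumes first-order decay of the antisense RNA and pseudo-first-order pairing (antisense RNA in excess over target) within a fixed time window during which the target is still susceptible; the window and all rate constants are hypothetical. With first-order decay, the time needed to move halfway to a new steady-state level after a change in synthesis rate equals the RNA’s half-life, which is why short-lived antisense RNAs can track copy-number changes quickly. And if pairing must occur within a window, the fraction of targets inhibited depends on the association rate, whereas the equilibrium stability of the duplex does not enter at all.

```python
import math


def time_to_fraction_of_new_steady_state(half_life: float, fraction: float) -> float:
    """Time for a first-order-decaying RNA to cover `fraction` of the gap to a
    new steady-state level after a step change in its synthesis rate.

    The level approaches the new steady state as 1 - exp(-k t), with
    k = ln 2 / half_life.
    """
    k = math.log(2) / half_life
    return -math.log1p(-fraction) / k


def fraction_inhibited(k_on: float, antisense_conc: float, window: float) -> float:
    """Fraction of target RNAs paired within a susceptibility `window`.

    Assumes pseudo-first-order binding at rate k_on * antisense_conc; the
    duplex's equilibrium stability does not appear, only the pairing rate.
    """
    return -math.expm1(-k_on * antisense_conc * window)
```

Under these assumptions, halving an antisense RNA’s half-life halves its response time, and doubling `k_on` has the same effect on inhibition as doubling the antisense concentration.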

Artificially expressed antisense RNAs have been used in a broad variety of experimental and therapeutic settings. Artificial antisense RNA expression is usually achieved by transcribing a region of interest, such as part of the DNA encoding an important target gene, in the antisense direction, that is, opposite to the direction of native target gene transcription. Such constructs are easily made by modern molecular genetic methods. However, most artificial antisense RNAs do not function as well as naturally occurring ones.


See also: Antisense DNA 
The phenomenon of persistent antitermination has been studied best in the context of the phage lambda N and Q proteins, which interact with RNA polymerase near the start of transcription in such a way as to prevent termination at subsequent rho-dependent terminators. This is a key element in the choice between lytic and temperate growth of the phage. The existence of important regulatory mechanisms acting at the level of transcription termination was first demonstrated by J. W. Roberts (Roberts, 1969). As shown by Berg, Squires, and Squires (Berg et al., 1989), a similar mechanism helps regulate transcription of ribosomal RNA genes in a variety of microorganisms.

Establishing N-mediated antitermination involves modification of the RNA polymerase just after it initiates transcription at either of the two lambda early promoters, PL and PR. It requires a recognition site near the start of the transcript called nut (for N utilization), which consists of a specific sequence (boxA) followed shortly by a specific stem-loop (boxB). Several cellular factors are involved: N binds to boxB, while a heterodimer of NusB and NusE (ribosomal protein S10) binds to boxA. NusA binds to N, and at least NusA, NusE, and NusG bind tightly to the RNA polymerase. Binding of N and NusA at the nut site is alone sufficient to produce antitermination over short distances in vitro, while addition of the other factors establishes a stable complex that persists over kilobases, permitting transcription of the whole lambda genome and suppressing pausing of the transcription complex along the way. Antitermination in the ribosomal RNA operons involves a very similar process and the same host elements. An additional host factor may function more or less in place of the N protein, and NusB and S10 bind the boxA region more strongly than in the case of lambda.

The lambda Q protein recognizes a site, qut, in the single lambda promoter responsible for transcription of the phage structural and other late genes, and establishes a termination-resistant transcription complex by a mechanism that is completely different from that used in N-mediated antitermination. The qut site is recognized on the DNA, not the RNA; the promoter region itself is involved, particularly certain bases in the -10 region, as are nucleotides 1-9 of the nontranscribed strand in the transcribed region. In the absence of Q, this segment induces a prolonged pause in transcription at positions +16 and +17. Sigma is required for the pause and is present in the paused complex, even though the polymerase has already switched from initiation to elongation mode (as described in detail in Transcription), in which sigma is normally released from the elongating complex; residual, nonspecific binding of sigma to the polymerase may be involved. The pause gives Q time to bind to the elongation complex; Q, in turn, seems to chase the polymerase from the pause site, reducing the pause half-life about fivefold. Q antitermination requires the host protein NusA, but none of the other Nus proteins involved in N antitermination. Q suppresses pausing as well as termination at sites far downstream from the promoter.

The examination of antitermination in the lambdoid phage HK022 gives further insight into the range of possible mechanisms. The HK022 antitermination system does not involve N or, in fact, any other phage-encoded protein, but it does involve specific sites on the DNA; its action can be blocked either by mutations in the DNA site or by mutations at specific sites in the zinc-binding region of the β′ subunit of RNA polymerase. Interestingly, HK022 does encode a protein, Nun, that is homologous to the lambda N protein in both sequence and location and that also acts at nut sites. However, rather than recognizing sites in its own genome, Nun targets heterologous nut sites in the DNA of bacteriophage lambda and some related phages, where it induces termination rather than preventing it, thus allowing HK022 to exclude some of its competing relatives.

With all three modes of inducing antitermination, 
the effect is clearly specific to the elongating complex, 
not to the termination site; overlapping transcripts 
started from other promoters are not affected. It also 
appears that the effect relates to some sort of special- 
ized pausing state, not to the general elongation 
process, and that pausing state is involved in rho- 
dependent termination, as has been suggested from 
other lines of evidence. 

‘Attenuation’, a site-specific form of antitermin- 
ation that controls expression of a number of genes 
involved in amino acid metabolism, is discussed in a 
separate entry in the encyclopedia (see Attenuation). 


Further Reading 

Weisberg RA, Gottesman ME, Hendrix RW and Little JW (1999) 
Family values in the age of genomics: comparative analyses of 
temperate bacteriophage HK022. Annual Review of Genetics 
33: 565-602. 
An antitermination protein is a protein that permits RNA polymerase to transcribe through certain terminator sites.


See also: RNA Polymerase 


AP Endonucleases 


Copyright © 2001 Academic Press 
doi: 10.1006/rwgn.2001.1764 


AP endonucleases are enzymes that make cuts in DNA 
on the 5’ side of either apurinic or apyrimidinic sites. 


See also: Endonucleases 


Apert Syndrome 


See: Craniosynostosis, Genetics of; Syndactyly 
An apomorphy is a character that is hypothesized to
have evolved later in time than its plesiomorphic 
homolog. All taxic homologies are apomorphic at 
some level in the phylogeny of life. Apomorphies are 
contrasted with plesiomorphies when considering 
only a restricted part of the tree. 


See also: Homology; Plesiomorphy; 
Synapomorphy 
Apoptosis is the most frequently encountered form of physiological (as opposed to pathological) cell death. Apoptosis is an active process, requiring expenditure of energy and metabolic activity by the dying cell, and is therefore often termed ‘cell suicide’.

The process is usually characterized by shrinkage of the cell, blebbing of the cell membrane, cleavage of the DNA into fragments that produce a ‘laddering pattern’ on gels, and condensation and margination of chromatin. Because of its ‘deliberate’ nature, apoptosis is often referred to as ‘programmed cell death’. In contrast with necrosis, cells that die by apoptosis do not generally elicit inflammatory responses.


See also: Cell Cycle 


Aporepressors 
An aporepressor is the inactive form of a repressor
protein. 


See also: Repressor 
Arabidopsis thaliana (thale cress) is a short-lived, self-compatible, and predominantly inbreeding annual plant distributed in the temperate regions of Europe, the Far East, and East Africa. In previous centuries, A. thaliana was introduced into North America, South Africa, and Australasia, and it continues to expand its range. A. thaliana belongs to the mustard family (Brassicaceae). This family is morphologically well defined by its highly conserved flower architecture, with four sepals and four petals, each arranged in a cross, and four inner plus two outer stamens (cruciferous plants). Fruits consist of two valves and are divided into siliquae (fruits more than three times as long as wide), as in A. thaliana, and siliculae (fruits less than three times as long as wide). The family’s common name derives from a characteristic compound of this clade, mustard oil, which characterizes not only the Brassicaceae but also a larger monophyletic group of 15 plant families (with the Capparaceae as the closest sister group to the Brassicaceae).

Within the Brassicaceae, taxonomic concepts based on morphology and anatomy are highly artificial and do not reflect phylogenetic relationships. Traditionally, the approximately 3500 species and 350 genera of the mustard family have been grouped into several tribes, with A. thaliana assigned to the tribe Sisymbrieae. However, molecular systematics based on DNA sequence variation of nuclear and plastidial genes and noncoding DNA has shown that this tribe is not a natural group. According to these molecular-phylogenetic analyses, the closest relatives of A. thaliana are species of the former genus Cardaminopsis, which is now integrated into the genus Arabidopsis. The former genus Cardaminopsis also includes perennial, self-incompatible, and outcrossing species. More than 45 other species traditionally placed in the genus Arabidopsis, distributed primarily in Central and Southeast Asia, have been excluded and assigned to newly described genera such as Olimarabidopsis (e.g., A. griffithiana) or Crucihimalaya (e.g., A. wallichii). Furthermore, A. thaliana is not closely related to any segregates of the genus Arabis.

A. thaliana is a diploid organism and has the lowest chromosome number reported for the Brassicaceae (n = 5). This low number arose by a reduction from the base number of its close relatives (n = 8) to n = 5. The low chromosome number, small genome, inbreeding habit, and short life cycle have made this species the model organism in plant molecular biology and led to its genome sequencing project. The sequencing project has provided strong evidence for chromosomal rearrangements and segmental duplications; the approximately 120 Mbp genome should therefore be regarded as derived. Nonetheless, comparative genome analysis reveals high levels of synteny, even on a microscale, among several cruciferous plants (A. thaliana, A. lyrata, Brassica oleracea and its relatives, Capsella rubella).

Molecular clock estimates based on DNA sequence variation of the nuclear genes for chalcone synthase and alcohol dehydrogenase, the plastidial maturase K gene, and the mitochondrial nad4 gene place the divergence of A. thaliana from its closest relatives, the former genus Cardaminopsis, at approximately 5.8 million years ago. A. thaliana diverged from the lineage containing B. oleracea (cabbage) approximately 16-20 million years ago. The age of the whole family is approximately 50 million years.
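The divergence times above rest on the molecular clock assumption. In its simplest textbook form, shown here as a general illustration and not necessarily the exact procedure used in the studies summarized above, the divergence time follows from the genetic distance between two lineages and a substitution rate calibrated independently:

```latex
\begin{aligned}
K &= \text{substitutions per site separating the two sequences} \\
r &= \text{substitution rate per site per year, in each lineage} \\
T &= \frac{K}{2r}
\end{aligned}
```

The factor 2 appears because both lineages accumulate substitutions independently after they split, so the distance between them grows at twice the per-lineage rate.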

The complete genome sequence is freely accessible via the World Wide Web, together with numerous tools for the search and prediction of genes, proteins, and promoter sequences. The combination of rapidly expanding molecular knowledge about A. thaliana with increasing knowledge about the evolutionary history of the Brassicaceae on different scales (from genes to genomes) makes A. thaliana an ideal subject not only for functional analysis but also for answering evolutionary questions.


See also: Arabidopsis thaliana: The Premier Model 
Plant; Brassicaceae, Molecular Systematics and 
Evolution of 
The Origin of Plant Genetics


Genetics was born out of a pea seed, the controlled union of a pollen tube and an egg. Gregor Mendel created the field of genetics when he used the garden pea, Pisum sativum, to study inheritance, systematically exposing the mysterious behavior of genes and gametes in the control of physical traits. The concepts of dominance, segregation, and independent assortment, later explained by the behavior of maternal and paternal chromosomes during meiosis, came from Mendel’s experiments, published in 1866, 78 years before it was shown that DNA was the chemical agent behind the observed patterns of heredity. Mendel’s use of the garden pea as his experimental system was his first display of genius. The garden pea is self-fertile and therefore lacks the extreme genetic variability that comes with interbreeding. By manipulating the fertilization of true-breeding lines of pea that differed in a single characteristic, such as flower color, Mendel was able to study the pattern of inheritance. It was not too difficult for this monk to convince his colleagues of the impact and significance of his research; even failed experiments provided a vegetable for the evening meal. Unfortunately, the universal nature of Mendelian genetics was not immediately recognized; it took the scientific community 35 years to discover the importance of his work.


Arabidopsis is a Useful Model Species for Plant Genetics


Arabidopsis Facts 

Mendel’s new science provided the foundation for 
modern plant genetics and the adoption of a new 
model plant species, Arabidopsis thaliana (Figure 1). 
A. thaliana is a member of the crucifer family. Unlike 
other members of this phylogenetic group, which 
includes the mustards, canola, cabbage, broccoli, and 
cauliflower, Arabidopsis is not an agriculturally 
important species. It is an annual weed that populates 
sidewalk cracks and flower gardens around the world. 
Although the bitter taste of Arabidopsis leaves may 
not have been palatable to Gregor and his brothers, it 
is the geneticist’s dream plant. Like pea, Arabidopsis is 
self-fertile. Unlike pea, Arabidopsis completes its entire 
life cycle in only 45 days. The small stature and 
fecundity of Arabidopsis would also satisfy Mendel’s
requirements for large sample sizes and replication in 
his genetic experiments. In the one square foot that 
could support a single pea plant, 500 Arabidopsis 
plants can be grown. A single Arabidopsis plant can 
produce about 100 times more seed than a pea plant. 


A Brief History of Arabidopsis Genetics 

The positive attributes of Arabidopsis as a model plant for genetic analyses were first recognized by the German botanist Friedrich Laibach. Laibach first described the chromosomes of Arabidopsis and, in 1943, published a paper outlining the strengths of Arabidopsis for genetic experiments. George P. Rédei, while at the University of Missouri in Columbia, pioneered the use of Arabidopsis in plant biochemical genetics. Throughout the 1960s Rédei mutagenized large populations of Arabidopsis seeds and screened for mutations in genes that caused seedling lethality. He specifically looked for plants in which seedling lethality could be reversed by the addition of thiamine (thiamine auxotrophs) (Rédei, 1975). Rédei’s publications provided a foundation on which an international research community could build. Arabidopsis is a useful model plant for studying many aspects of growth and development. Not only is Arabidopsis fast-growing and compact, but its physiology is also extremely robust, including features that are conserved throughout the plant kingdom. In 1979 Chris Somerville and Bill Ogren published a classic paper in which Arabidopsis mutants were used to investigate how plants survive in atmospheric conditions of low carbon dioxide and high oxygen concentrations (Somerville and Ogren, 1979). Since then, plant research laboratories around the world have used Arabidopsis and a genetics approach to study hundreds of physiological processes, from light responses to cell shape control (Szymanski et al., 1999). The results of these experiments have greatly increased our understanding of plant growth and development. On a more practical side, information obtained using Arabidopsis has enhanced the value or production of many economically important species. For example, a gene that controls flowering time in Arabidopsis has been used to reduce the generation time of aspen trees from 7 years to 1 year. New knowledge about the genes that control lipid metabolism in Arabidopsis has been used to increase the nutritional and economic value of seed oil. Both examples illustrate the importance of being able to combine genetic analyses, gene identification, and gene function across species lines. The rate of progress from mutant identification to gene cloning and functional analysis depends partly on the quality of the map of Arabidopsis that relates chromosome position to gene sequence.

Figure 1 An Arabidopsis thaliana plant at the flowering stage. A rosette of leaves is present at the base of the inflorescence shoot. Several flowers (arrows) and seed pods (arrowheads) are present at this stage of development. Bar = 1 cm.


Arabidopsis and its Genome 


The Classical Genetic Map 

Early genetic screens identified loci that were import- 
ant for plant growth or metabolism, but did not 
reveal the identity of the gene that controls the pheno- 
type. The development of a genetic map of Arabi- 
dopsis was the first breakthrough in bridging the gap 
between mutant phenotype and gene identity. Using a 
large collection of Arabidopsis mutants with visually 
scorable phenotypes, Maarten Koornneef and William
Feenstra published a comprehensive linkage map
that divided the genome into 500 map units on five
chromosomes (Koornneef et al., 1983). The classical
map was used by geneticists to locate the chromoso- 
mal positions of new mutations. In some cases these 
mapping experiments defined a small interval within a 
chromosome that contained the mutation of interest. 
While the classical genetic map was useful, it did not 
allow researchers to identify the individual genes that 
were affected in mutant plants. To identify the affected 
gene it was necessary to construct a map based on the 
DNA sequence of the genome. 
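Map units on a classical linkage map are centimorgans (cM): 1 cM corresponds to 1% recombination between closely linked loci. Because double crossovers between distant markers go undetected, observed recombination fractions are usually converted to map distances with a mapping function. The sketch below shows two standard ones, Haldane’s (which assumes no crossover interference) and Kosambi’s; the source does not say which function, if any, was used for the 1983 map, and the progeny counts in the example are hypothetical.

```python
import math


def recombination_fraction(recombinants: int, total: int) -> float:
    """Observed recombination fraction r from scored progeny (e.g. a backcross)."""
    return recombinants / total


def haldane_cm(r: float) -> float:
    """Haldane map distance in cM: d = -50 ln(1 - 2r), assuming no interference."""
    if not 0 <= r < 0.5:
        raise ValueError("r must be in [0, 0.5)")
    return -50.0 * math.log(1.0 - 2.0 * r)


def kosambi_cm(r: float) -> float:
    """Kosambi map distance in cM: d = 25 ln((1 + 2r) / (1 - 2r))."""
    if not 0 <= r < 0.5:
        raise ValueError("r must be in [0, 0.5)")
    return 25.0 * math.log((1.0 + 2.0 * r) / (1.0 - 2.0 * r))


r = recombination_fraction(12, 200)  # hypothetical counts
print(f"r = {r:.2f}, Haldane {haldane_cm(r):.2f} cM, Kosambi {kosambi_cm(r):.2f} cM")
```

For example, 12 recombinants among 200 scored progeny give r = 0.06, about 6.39 cM by Haldane’s function and 6.03 cM by Kosambi’s; for small r both are close to 100r.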


The Molecular Map of the Arabidopsis Genome

The elevation of Arabidopsis from garden weed to the 
most studied plant on earth occurred because of its 
amenability to molecular genetic analyses. To pin- 
point a mutant gene among the tens of thousands of 
additional genes in the genome, it was essential to 
develop two additional research tools: molecular mar- 
kers and a collection of DNA clones that span the 
genome. Molecular markers allowed researchers to 
identify regions of Arabidopsis chromosomes that 
were closely linked to mutations of interest. Once a 
closely linked molecular marker delimits the location 
of the mutation to a small chromosomal segment, the 
molecular marker is used as a starting point for a 
‘chromosome walk’ to the target gene. A collection 
of contiguous DNA clones that span the chromoso- 
mal segment provides the reagents to systematically 
‘walk’ along a path of DNA clones. The ‘walk’ con- 
tinues until the segment of DNA that contains the 
mutation of interest is identified. Two parameters are 
important when considering the feasibility and speed 
of using map-based cloning to identify genes: (1) gen- 
ome size; and (2) amount of repetitive DNA. Initial 
measurements of the DNA content and sequence 
composition of the haploid Arabidopsis genome were 
carried out in Elliot Meyerowitz’s laboratory at the 
California Institute of Technology. Leslie Leutwiler
and Bob Pruitt showed that the Arabidopsis genome
was favorable on both counts. The Arabidopsis
genome is estimated to comprise 140 million base
pairs, quite a small package compared with the genome
of wheat (17 billion base pairs). Equally important,
DNA reassociation experiments indicated that only a 
small fraction (about 10%) of the Arabidopsis genome
consists of repetitive DNA, unlike the maize
genome, 80% of which is highly repetitive sequence 
elements. Large segments of repetitive DNA make it 
difficult to link cloned segments of DNA into an 
arrangement that reflects their order in the genome. 
The problem with repetitive DNA in creating a large 
contiguous segment is analogous to assembling a jig- 
saw puzzle that contains many pieces of the same 
color and shape. The scarcity of repetitive DNA ele- 
ments in the Arabidopsis genome made it feasible to 
assemble a collection of large-insert clones that span 
almost the entire genome. These clones provided the 
materials to determine the nucleotide sequence of the 
entire Arabidopsis genome. 
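The chromosome walk described above is, in effect, a path search through a set of overlapping clones. The Python sketch below is illustrative only: the clone names and kilobase coordinates are hypothetical, and the coordinate-overlap test stands in for the hybridization or fingerprinting experiments that establish overlaps between real clones.

```python
from collections import deque

# Hypothetical clones: name -> (start, end) in kb along one chromosome.
# In a real walk the coordinates are unknown; overlaps are found experimentally.
CLONES = {
    "c1": (0, 40),
    "c2": (30, 75),
    "c3": (70, 110),
    "c4": (105, 150),
    "c5": (200, 240),  # not connected to the others
}


def overlaps(a, b):
    """True if two half-open (start, end) intervals share any position."""
    return a[0] < b[1] and b[0] < a[1]


def walk(clones, start_clone, target_kb):
    """Breadth-first walk from the clone carrying a linked marker to a clone
    that covers target_kb. Returns the clones stepped through, or None."""
    queue = deque([[start_clone]])
    seen = {start_clone}
    while queue:
        path = queue.popleft()
        current = clones[path[-1]]
        if current[0] <= target_kb < current[1]:
            return path
        for name, interval in clones.items():
            if name not in seen and overlaps(current, interval):
                seen.add(name)
                queue.append(path + [name])
    return None  # the target lies beyond a gap in the clone collection


print(walk(CLONES, "c1", 120))  # ['c1', 'c2', 'c3', 'c4']
```

A target covered only by c5 returns None, which corresponds to a gap in the clone collection. Repetitive DNA would show up in this picture as false overlaps between clones that are not actually adjacent, which is why a genome poor in repeats made it practical to assemble clones spanning almost the entire genome.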


Arabidopsis Genome Sequencing: The 
Arabidopsis Genome Initiative 

The large-scale sequencing of the Arabidopsis nuclear 
genome was initiated in 1996 by an international 
research consortium. The goal of the Arabidopsis 
Genome Initiative was to provide the complete DNA 
sequence of the Arabidopsis genome by the end of 
2000. The 16 Mb contiguous DNA sequence of the 
lower arm of chromosome 2 published in 1999 was 
the largest continuous sequence in GenBank until it was
recently surpassed by a sequenced segment of human 
chromosome 22. Currently 93% of the Arabidopsis 
genome is sequenced, assembled, and annotated. The 
analysis of the existing sequence depicts a dynamic 
Arabidopsis genome. Duplications and translocations 
of genes and chromosomes appear to be common 
events during the evolution of the Arabidopsis genome. 
Recent translocations of large chromosomal segments 
from one chromosome to another have been detected 
by comparing the DNA sequences of different chro- 
mosomes. At the gene level, about 15% of all Arabi- 
dopsis genes are the result of tandem duplications, and 
75% of the genes encode proteins that share signifi- 
cant amino acid sequence identity with another Ara- 
bidopsis gene. Sequence comparisons also identified an 
insertion of 270 kb of mitochondrial genome sequence 
that was integrated into chromosome 2. Clearly there 
is great flexibility in the rules that govern chromosome 
organization. Annotation of the genome sequence has 
confirmed the early descriptions of the Arabidopsis 
genome as gene-dense. The distribution of genes is 
surprisingly regular; on average every 4.75 kb of geno- 
mic DNA contains a predicted gene. Predictions of 
the total number of Arabidopsis genes range from 
20 000 to 26 000. About 55% of the predicted proteins 
share significant amino acid sequence identity with
proteins of known function, usually proteins involved
in conserved functions such as DNA replication
or translation. Sequence similarities between predicted
Arabidopsis genes and homologs in other species
provide researchers with important hints as to where to 
look to uncover gene function. For 45% of the predict- 
ed genes, the amino acid sequence does not provide 
any hint of function. In total it is estimated that the 
function of only 5% of all Arabidopsis genes is known. 


The Use of Arabidopsis to Understand 
the Function of All Genes 


Genetic Analyses and the Genome 
Sequence 

A primary justification for sequencing the Arabidopsis 
genome was to provide the international research 
community with an important tool to rapidly uncover 
the function of Arabidopsis genes. Most functional 
analyses begin with the identification of a mutation 
that disrupts the function of a biochemical or develop- 
mental pathway. Identifying both the exact DNA se- 
quence defect of the mutant and the minimal wild-type 
DNA sequence that can be added back to the mutant 
to restore wild-type gene function are the first steps in 
the path toward understanding function. The
availability of a genome sequence will accelerate
chromosome walks to mutated genes. The sequence is used to
design new DNA sequence-based molecular markers
that narrow the mutation of interest down to a small
region of the genome. Often, predicted open reading
frames in the genomic interval that contains the
mutation can be examined quickly for candidate genes
that are suspected to function in the pathway being
studied. Analysis of candidate genes can greatly
accelerate gene identification. Once the mutated gene is
identified, the clones that were used to sequence the 
wild-type Arabidopsis genome can be used to deter- 
mine which wild-type gene can be simply added back 
to the genome of the mutant individual to restore the 
wild-type phenotype. The genome sequence is also 
being used to study the sequence composition and 
function of promoters, telomeres, and centromeres in 
Arabidopsis. Comparisons of the gene content, gen- 
ome organization, and evolution of Arabidopsis, rice, 
and soybean genomes are also now possible. 
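Narrowing a mapped interval to candidate genes amounts to filtering predicted gene models by position and annotation. A minimal Python sketch follows; the GeneModel record, gene identifiers, coordinates, and annotation strings are all hypothetical, and real annotation data would be loaded from files rather than typed in.

```python
from dataclasses import dataclass


@dataclass
class GeneModel:
    gene_id: str      # hypothetical identifier, not a real Arabidopsis locus
    chrom: int
    start: int        # bp
    end: int          # bp
    annotation: str


def candidates(genes, chrom, left, right, keywords):
    """IDs of genes overlapping [left, right] on chrom whose annotation
    mentions any keyword (case-insensitive)."""
    hits = []
    for g in genes:
        in_interval = g.chrom == chrom and g.start <= right and g.end >= left
        relevant = any(k.lower() in g.annotation.lower() for k in keywords)
        if in_interval and relevant:
            hits.append(g.gene_id)
    return hits


genes = [
    GeneModel("geneA", 2, 1_000, 3_500, "putative fatty acid desaturase"),
    GeneModel("geneB", 2, 5_000, 7_200, "unknown protein"),
    GeneModel("geneC", 2, 9_000, 11_000, "acyl-CoA thioesterase"),
    GeneModel("geneD", 3, 2_000, 4_000, "fatty acid elongase"),
]

# A mutation mapped to 0-10 kb on chromosome 2 in a lipid-metabolism screen:
print(candidates(genes, 2, 0, 10_000, ["fatty acid", "acyl"]))  # ['geneA', 'geneC']
```

Here geneB is set aside only because its annotation gives no hint of function; since nearly half of predicted genes are in that situation, such genes remain candidates if the annotated ones fail to explain the mutation.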


Functional Genomics and the Arabidopsis 
Research Community 

Plant research in Arabidopsis goes well beyond genome 
sequencing and conventional phenotype-based genetic 
screens. There is an international effort to provide all of 
the reagents that are needed to understand the function 
of each Arabidopsis gene. Several large-scale gene- 
tagging approaches have been undertaken to generate 
hundreds of thousands of lines of Arabidopsis that
contain DNA elements that randomly insert into the
genome and cause mutations. These populations of
‘sequence-tagged’ mutants are used both in forward
genetic screens for mutants and in reverse-genetic 
experiments in which the sequence of the gene of 
interest is known. One then screens through DNA 
samples isolated from plant lines for the rare sample 
that contains an insertional element within the gene of 
interest. Gene chip or microarray technology is being 
used to measure the expression levels of all genes 
under different experimental conditions. Large-scale 
protein tag experiments are being conducted to loca- 
lize every protein to a specific domain of the cell and to 
determine all of the protein-protein interactions that 
occur in the cell. At the population level, Arabidopsis 
is also being used to study evolution, adaptation, and 
the control of quantitative traits. An entry point to
the current status of Arabidopsis research and a wealth of
information about Arabidopsis is provided by The
Arabidopsis Information Resource [5] (TAIR) at
http://www.arabidopsis.org/. The all-out assault on
gene function in Arabidopsis undoubtedly will con- 
tinue to have a great impact on plant science and reveal 
many of the genetic secrets that underlie plant growth 
and development. 
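The reverse-genetic search for the rare line carrying an insertion in a chosen gene is often organized by pooling DNA samples so that fewer tests are needed. The Python sketch below assumes a simple row-and-column pooling scheme, one common design that the text itself does not describe; the line names, insertion coordinates, and gene interval are hypothetical, and has_insertion stands in for a laboratory PCR assay.

```python
# Hypothetical insertion sites (bp on one chromosome) for nine tagged lines.
INSERTION_SITE = {
    "L1": 120, "L2": 48_000, "L3": 9_100,
    "L4": 77_500, "L5": 3_300, "L6": 5_500,
    "L7": 61_000, "L8": 15_250, "L9": 30_400,
}
GENE_START, GENE_END = 5_000, 6_000  # hypothetical gene of interest


def has_insertion(line):
    """Stands in for a PCR test detecting an insert within the gene."""
    return GENE_START <= INSERTION_SITE[line] <= GENE_END


def screen(grid):
    """Test each row pool and column pool, then report the lines at the
    intersections of positive rows and positive columns."""
    rows = [r for r, row in enumerate(grid) if any(has_insertion(x) for x in row)]
    cols = [c for c in range(len(grid[0]))
            if any(has_insertion(row[c]) for row in grid)]
    return [grid[r][c] for r in rows for c in cols]


grid = [["L1", "L2", "L3"],
        ["L4", "L5", "L6"],
        ["L7", "L8", "L9"]]
print(screen(grid))  # ['L6']
```

With a single positive line, six pooled tests replace nine individual ones. When several lines in the grid are positive, the row-column intersections include false candidates, which must then be tested individually.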

Arabinose is a simple 5-carbon sugar often found
in plants. The bacterium Escherichia coli encodes the
enzymes for the utilization of carbon derived from
arabinose. Arabinose catabolism in E. coli is a well- 
studied paradigm of genetic regulation. 

This term derives from the Greek words arachnes
(spider) and daktylos (finger), and was coined by the
French physician Émile Achard in 1902 to refer to
long, thin fingers, as seen in patients previously
described by his colleague Antoine Marfan in 1896.
This is a common finding in several heritable disorders
of connective tissue, including Marfan syndrome and
congenital contractural arachnodactyly. Arachnodactyly
can also refer to the toes, and represents one manifestation
of generalized dolichostenomelia, or relative over-
growth of long bones. Archaic or less appropriate
terms include ‘spider fingers,’ ‘acromacria,’ and
‘arachnodactylia.’


See also: Marfan Syndrome 

Werner Arber (1929– ) shared the 1978 Nobel Prize in
Physiology or Medicine with Daniel Nathans and 
Hamilton Smith, “for the discovery of restriction 
enzymes and their applications in molecular biology.” 
Each of the scientists had independently contributed 
to different components of the discovery and hence 
they shared the honor. 

Werner Arber discovered the existence of restric- 
tion enzymes — a group of “chemical knives” that cut 
DNA molecules into defined fragments. He showed 
that these enzymes bind to the DNA at specific sites 
containing recurring structural elements made up of 
specific base pair sequences. Arber also postulated
the molecular mechanisms leading to variation in
the DNA of bacteriophage, the viruses that infect
bacteria.

Arber was born in 1929 in Gränichen, Switzerland.
He studied natural sciences at the Swiss Polytechnical
School in Zurich and biophysics at the University of
Geneva. In Geneva, Arber developed strong foun- 
dations in genetics, electron microscopy, and the 
physiology of bacteriophage. After training as a post- 
doctoral fellow in the United States for 2 years, he 
returned to Geneva in 1960 and pursued genetic 
research. After a series of preliminary studies related 
to bacteriophage physiology, he began studying the 
mechanisms of gene transfer in bacteria with phage as 
vector. Specifically, he and his associate Daisy Dussoix 
began assessing changes in the genetic material of
the common bacterium Escherichia coli induced by
radiation and by bacteriophage.

By studying the phenomenon of “host controlled 
restriction of bacteriophages” these investigators dis- 
covered that over time, the “infected” viral DNA is 
changed in the host cell, although the host DNA itself 
remains unchanged. This suggested the existence of a 
natural “barrier” against foreign genetic material in the 
host cells. They then found that a variety of enzymes 
were involved in such barrier functions. 

Arber and colleagues then showed that there were 
two main processes intimately involved in the devel- 
opment of barrier functions: restriction and modifica- 
tion. ‘Restriction’ meant breakdown of DNA, while 
‘modification’ meant preventing such breakdown. 
Arber postulated that both processes are catalyzed 
by a specific set of enzymes. He proposed that the 
DNA molecule contained specific sites with capacities 
to bind both types of enzymes. He also demonstrated 
the chemical nature of those sites, as recurring, specific 
base pair sequences. The enzymes act at these sites 
either by cleaving the molecule, causing restriction
(breakdown), or by adding a methyl group, causing
modification; the methylation prevents the breakdown.

Thus, Arber and associates had discovered the prin- 
ciples that were involved in the functioning of a family 
of “chemical scissors” responsible both for the
cutting of specific segments of the DNA molecule
and for the prevention of such cutting.
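The division of labor between restriction and modification can be illustrated with a toy digest in Python. This is a sketch under simplifying assumptions: it treats DNA as a single strand, uses the EcoRI recognition sequence GAATTC and its cut position purely as a familiar example (EcoRI is not one of the enzymes from Arber's experiments), and represents methylation as a hypothetical set of protected site positions.

```python
SITE = "GAATTC"  # EcoRI recognition sequence, used only as a familiar example


def site_positions(dna, site=SITE):
    """Start index of every occurrence of site in dna."""
    return [i for i in range(len(dna) - len(site) + 1)
            if dna[i:i + len(site)] == site]


def digest(dna, methylated=frozenset(), site=SITE, cut_offset=1):
    """Cut one strand of dna at every site not protected by methylation.

    methylated: start positions of sites modified by the host's methylase.
    cut_offset: cut position within the site (EcoRI cuts G^AATTC).
    """
    cuts = [p + cut_offset for p in site_positions(dna, site)
            if p not in methylated]
    bounds = [0] + cuts + [len(dna)]
    return [dna[a:b] for a, b in zip(bounds, bounds[1:])]


foreign = "AAGAATTCTTTTGAATTCCC"
# Unmodified (foreign) DNA is restricted into fragments:
print(digest(foreign))                      # ['AAG', 'AATTCTTTTG', 'AATTCCC']
# The same sequence, methylated at both sites as host DNA would be, stays intact:
print(digest(foreign, methylated={2, 12}))  # ['AAGAATTCTTTTGAATTCCC']
```

Real EcoRI cuts both strands in a staggered fashion, leaving short single-stranded ends; the sketch records only one strand.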

Arber’s discovery opened new avenues in mole- 
cular genetics. Using purified bacterial restriction 
enzymes, Hamilton Smith confirmed Arber’s results 
and hypotheses, and isolated the first restriction 
enzyme that cut the DNA in the middle of specific 
symmetrical sequences. Daniel Nathans showed that 
restriction enzymes could be used for constructing 
genetic maps and developed methods involving res- 
triction enzymes to explain how genes were organized 
and expressed in the living cells. 
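The symmetrical sequences cut by Smith's enzyme read the same on both strands: each equals its own reverse complement. A short Python sketch of that test follows. GTTAAC is used as one instance of the site commonly written for HindII (GTYRAC, where Y is C or T and R is A or G); the helper names are hypothetical, and cutting exactly at the center is an assumption matching an enzyme that leaves blunt ends.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Sequence of the opposite strand, read 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def is_symmetric(site):
    """True if the site reads the same on both strands."""
    return site == reverse_complement(site)


def middle_cut(site):
    """Split an even-length symmetric site at its center (blunt cut)."""
    if len(site) % 2 or not is_symmetric(site):
        raise ValueError("expected an even-length symmetric site")
    half = len(site) // 2
    return site[:half], site[half:]


print(is_symmetric("GTTAAC"), is_symmetric("GATTTC"))  # True False
print(middle_cut("GTTAAC"))                            # ('GTT', 'AAC')
```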

These pioneering works led to such major discov- 
eries as determining the order of genes on human and 
animal chromosomes, analyzing the chemical struc- 
tures of genes, and most importantly, the discovery of 
the regions of DNA that regulate gene functions. The
knowledge of restriction enzymes also led to develop- 
ing new combinations of genes — the recombinant 
DNA technology and genetic engineering. 

Thus, the discovery of restriction enzymes became 
the cornerstone of advancing molecular biology, gen- 
etic engineering, and clinical genetics in which one 
could characterize the genetic basis for human ill- 
nesses. Without the development of the restriction 
enzyme knowledge base, the Human Genome Project 
would not have succeeded. 

After learning her father had won the big prize, 
Arber’s 8-year-old daughter Sylvia asked him to 
describe his research in simple terms, and then made 
up her own version of the discovery. In her “Tale of the 
King and His Servants” she said: 


On the tables in my father’s laboratory, there are plates with
colonies of bacteria, like a city with many people. Inside 
each bacterium there is a king called “DNA,” who is very 
long and skinny. The king has many servants called 
“enzymes,” who are thick and short. One such servant 
serves as a pair of scissors. If a foreign king invades the 
bacterium, this servant can cut him up into small fragments, 
but does not harm his own king. My father received the 
Nobel Prize for the discovery of the servant with scissors. 


See also: Bacteriophages; Genetic 
Recombination; Nathans, Daniel; Restriction 
Endonuclease; Smith, Hamilton 


Archaea, Genetics of 


F T Robb 


Copyright © 2001 Academic Press 
doi: 10.1006/rwgn.2001.0071


History and Characteristics of the 
Archaea 


History 

Since their discovery in 1978 by Carl Woese and col- 
leagues, the Archaea have become widely recognized 
as a third lineage of life, distinct from bacteria and 
eukaryotes, but sharing many molecular characteris- 
tics of both of these taxa (Woese et al., 1978). Origin- 
ally identified by distinctive sequence patterns in their 
16S ribosomal RNA, the Archaea are also set apart by
unique phenotypic characteristics such as ether-linked 
membrane lipids. Another widespread feature of the 
Archaea is their colonization of environments with 
the most extreme physical and chemical conditions 
that support life as we know it. An extensive body of 
research, assembled over more than 12 years, confirms
that hyperthermophilic Archaea can be isolated from
most active geothermal areas. The sources of many
hyperthermophilic isolates are undersea hydrothermal
vents, where fluids are superheated to more than 200 °C,
and boiling terrestrial hot springs, environments that
seem prohibitive to life. Growth of the Archaea currently
in pure culture can occur at temperatures up to 113 °C,
across a wide pH range upwards from pH 0, in saturated
salt, and under anaerobic, highly reduced gas phases.
Many species produce methane and other gases normally
considered highly toxic, such as carbon monoxide
and hydrogen sulfide. The Archaea are separated
into two subkingdoms, the Euryarchaeota and the
Crenarchaeota, which are deeply divided on phylogenetic
trees constructed using 16S ribosomal RNA
sequences. 

The Archaea that have been recovered from harsh 
environments have unusual metabolic processes such 
as methanogenesis, sulfur and sulfate reduction, sulfur 
oxidation, nitrate reduction, and hydrogen oxidation. 
The chemolithotrophic energy physiology of many 
genera of the Archaea may have significant relevance 
for space exploration. The notion that these unusual 
microorganisms might provide model systems for the 
interplanetary transmission of life has recently gained 
considerable acceptance. They may be relics of histor- 
ical transport processes that brought microbial life to 
earth from neighboring planets, where they may still 
be resident. On the other hand, they may contaminate 
the planets and moons that our space vehicles visit if 
appropriate precautions are not observed. Many new
physiological processes, as well as adaptive strategies
in response to extreme conditions, have been well
described as phenomena, but in most cases their
mechanisms remain to be clarified. The development
of archaeal genetic systems is beginning to
answer specific questions on mechanisms of growth 
in extreme conditions, and may be crucial to the 
reliable identification of extraterrestrial life forms, if 
they exist. Surprisingly, the idea that the Archaea are 
confined to extreme environments may be a miscon- 
ception resulting from our inability to recover the vast 
majority of microbes in the environment. Recently, 
another lifestyle was added to the Archaeal repertoire 
with the discovery of an uncultivated symbiotic
species, Cenarchaeum symbiosum, which grows
intracellularly in the marine sponge Axinella mexicana
(Schleper et al., 1998). This discovery has highlighted 
the possibility that the Archaea that we are studying 
are the “weeds” or easily cultivated strains, and that 
we are ignoring the most interesting aspects of the 
Archaea, namely their interactions with other organ-
isms, either prokaryotic or eukaryotic. In addition, there
is substantial evidence of abundant but as yet uncultivated
marine Archaea growing at low temperatures, for
example in Antarctic waters.


Diversity, Genomics, and Basic Genetic 
Mechanisms 
The diversity of the isolated species of Archaea is 
growing steadily. There are at least 70 described
species of methanogenic Archaea, with habitats
ranging from the rumen, where they are
responsible for significant global methane production,
to hydrothermal vents. To date, members of at least
22 genera, with widely divergent growth physiology, 
have been characterized and found to grow at tem- 
peratures above 90 °C, the generally accepted threshold 
for classification as a hyperthermophile. Many more 
genera of thermoacidophiles and their phages, which 
generally grow at lower temperatures, have been dis- 
covered by Wolfram Zillig and his coworkers. 

Genomic sequencing projects have recently been 
completed for four thermophilic Archaea, and efforts 
to sequence the genomes of at least six other species are 
under way. The full sequences from the hyperthermo- 
philes Methanococcus jannaschii, Archaeoglobus fulgi- 
dus, and Pyrococcus horikoshii, and the extreme 
thermophile Methanobacterium thermoautotrophi- 
cum are freely available and fully annotated. 

Although genomic sequence information is not 
limiting at this time, surprisingly little data is available 
covering basic genetic processes. Despite considerable 
effort, progress in this area continues at a slow pace. 
The spontaneous mutation rates of Sulfolobus spp.
have been measured by Jacobs and Grogan (1997) and 
were found to be on the order of 10’ mutational events 
per cell per division cycle for the pyrE and pyrF loci. 
In addition, an intrinsic mechanism for exchange and 
recombination of chromosomal markers was recently 
described in Sulfolobus acidocaldarius. The recent 
description of at least five additional conjugative 
plasmids in Sulfolobus spp. indicates that marker 
exchange may be a common event in hot solfataric 
environments, leading to speculation that viral evolu- 
tion may show a trend toward cell-to-cell transmis- 
sion when the extracellular environment is basically 
hot sulfuric acid! In many cases the recombination 
process can occur in rapidly agitated liquid cultures, 
unlike the mating phenomenon of the halophile Halo-
ferax volcanii, which appears to require prolonged
and stable cell-to-cell contact. A mobile intron from 
Desulfurococcus mobilis has recently been described 
and may be of use in generating a new type of vector 
for genetic knockout formation in hyperthermo- 
philes. 

These mechanisms of chromosomal recombination 
are very intriguing, supporting the possibility that 
the Archaea may be remnants of an early form of
microbial life that relied on efficient gene transfer as a
means of adaptation to extreme and rapidly changing 
environments. The stage is set for the application of 
the power of genetic analysis to dissect adaptive func- 
tions of the group as a whole. 


Growth and Selection 


The very adaptations that make Archaea fascinating 
objects of study have impeded efforts to establish 
genetic systems, and continue to be the major chal- 
lenge in this area of research. The principal reason for 
this is that extreme conditions are required for growth 
of many Archaea, which makes it necessary to modify 
many of the routine procedures of microbiology 
merely to observe growth. For example, solid growth 
media using agar as a solidifying agent are useless 
above 70 °C, necessitating the use of Gellan gum, 
which has the property of remaining solid at tempera- 
tures up to 100 °C. Plates solidified with Gellan gum 
can also be incubated under anaerobic and highly 
reduced conditions which allow the growth of most 
of the genera of hyperthermophiles, and several have 
been reported to form colonies in one to several days. 
However, most research involving colony formation 
with the Archaea is slow and painstaking. 

In general, the plethora of markers and selection loci
that are routinely used with bacterial strains is not
available for the Archaea. The difficulties in obtaining
colonial growth and of maintaining thermosensitive 
selective agents such as antibiotics for the lengthy 
incubation periods at temperatures of 80-100 °C are 
the major limitations. In addition, many Archaea are 
refractory to conventional antibiotics affecting
bacteria. This is due in part to their non-bacterial
transcriptional system, which is similar to the Pol II
complex in yeast, and to the fact that their ribosomes
are significantly divergent from bacterial ribosomes,
although still classified as 70S rather than as 80S.

The most tractable of the Archaea to date are 
undoubtedly the halophiles, which grow readily on 
surfaces of agar-solidified media containing high
concentrations of NaCl and Mg²⁺ salts, generally forming
colonies in 2-5 days. Many halophiles show highly 
variable colony morphology and coloration, which 
is evidence for their possession of natural genetic 
mechanisms such as transposition via insertion 
sequences (so-called IS elements). Halophiles are 
sensitive to many antibiotics such as mevinolin, an
inhibitor of 3-hydroxy-3-methylglutaryl coenzyme A
reductase, and resistance can be conferred by a modified
copy of the chromosomal gene encoding this enzyme
that expresses it at an elevated level. Resistance to anisomycin and
thiostrepton has also been obtained using a mutated 
23S rRNA, which is the target for these antibiotics. 
The antibiotics carbomycin, celesticetin, chloram-
phenicol, and thiostrepton as well as butanol and 
butylic alcohol have been used as growth inhibitory 
agents in hyperthermophiles. Novobiocin, an inhibi- 
tor of DNA gyrase, has also been used to good effect 
in the design of shuttle vectors for the halophiles, but 
it is unfortunately thermolabile. The effective use of 
puromycin as a selective agent in mesophilic methano- 
gens and the availability of a puromycin resistance 
gene marker has led to the construction of a set of 
seven vectors for Methanococcus and Methanosarcina 
spp. 

The hyperthermophiles, such as Sulfolobus and the 
Thermococcus/Pyrococcus group, grow at tempera- 
tures in excess of 80 °C, and colony formation may 
require incubation for up to one or two weeks. 


DNA Transfer 


Transformation 

Chromosomal transformation and complementation 
of the mesophilic, methanogenic Archaeon Methano-
coccus voltae was first reported by Bertani and
Baresi (1987). This method, which is of low efficiency,
is not applicable to many Methanosarcina spp.,
which are characterized by rigid chondroitin-like cell
walls (methanochondroitin), causing the cells to grow 
as multicellular aggregates which have low plating 
efficiency and are refractory to DNA uptake. In 
some instances, the methanochondroitin habit can be 
cured by continuous growth in high salt media, lead- 
ing to the production, on marine salts medium, of 
cells with S-layer outer boundaries. The subsequent 
growth of these cultures as free cells in suspension has 
led to the development of transformation methods 
that are sufficiently mild as to leave a significant pro- 
portion of the population as viable cells. Recently, 
DNA transfer into Archaea with glycoprotein S-layers 
has been shown using electroporation, protoplast
formation, and liposome techniques. Transformation
frequencies as high as 2 × 10⁸ transformants per µg
of DNA per 10⁹ cells, or approximately 20% of the
recipient population, have been reported using
liposomes. The S-layers can be disrupted by treatment
with Mg²⁺-free sucrose buffer or by EDTA treatment,
leading to protoplast formation and easier access into
the cells by transforming DNA. The cell cultures are
regenerated by resuspension in Mg²⁺-containing
medium. Liposome-mediated DNA transformation
has also been used to establish the first tractable, 
highly efficient cloning systems for methanogens 
(Metcalf et al., 1997). Methanobacterium spp. are
bounded by a rigid cell wall of pseudomurein that 
can be spheroplasted by digestion with a methanobac- 
terial endopeptidase, which may lead to the use of 
standard transformation protocols with these strains
as well. Amongst the 22 genera of hyperthermophiles, 
transformation and transfection protocols have been 
reported only for Sulfolobus and Pyrococcus. Plasmids 
are available for the hyperthermophiles Sulfolobus and 
Pyrococcus, and these provide the basis for ongoing 
vector construction in these hosts. A Desulfurococcus 
mobile intron may provide a novel means to introduce 
genes into a variety of archaeal hosts. 
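The two ways of stating the liposome-mediated transformation frequency are related by dividing transformants by recipient cells. The Python lines below assume the Methanosarcina figures of 2 × 10⁸ transformants per µg of DNA per 10⁹ cells, with one microgram of DNA applied to those cells; the function name is hypothetical.

```python
def fraction_transformed(transformants, recipient_cells):
    """Fraction of the recipient population that was transformed."""
    return transformants / recipient_cells


# Assumed inputs: 2e8 transformants obtained with 1 ug of DNA on 1e9 cells.
frac = fraction_transformed(2e8, 1e9)
print(f"{frac:.0%} of the recipient population")  # 20% of the recipient population
```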


Phage Growth and Transduction 

The three major groups of Archaea have all been 
shown to harbor phage and virus-like particles.
However, only one archaeal transducing phage, ΨM1 from
Methanobacterium thermoautotrophicum, has been
shown to transfer chromosomal markers. Because of
its low burst size (approximately six phage per cell),
this phage is not in widespread use. DNA from phage
SSV1 of Sulfolobus solfataricus has been shown to be
infective, and it is unique in showing high-frequency
integration into the host’s chromosome. A newly
isolated related phage, His1, which infects the halophile
Haloarcula hispanica, shows considerable promise
for use as a transduction system. 


Conjugation 

One of the bacteria-like capabilities of the Archaea, 
cell-to-cell conjugative transfer of DNA, appears to 
be quite widespread in halophiles and in Sulfolobus 
spp. It therefore seems to occur in both Crenarchaeota
and Euryarchaeota. Transfer of chromosomal genes by
cell fusion or cytoplasmic bridges has been observed 
in several halophile species, notably Haloferax 
mediterranei and H. volcanii, which also exhibit inter- 
specific conjugation. Plasmid pNOB8, a 45 kb conju- 
gatable plasmid of a Sulfolobus species isolated from a 
Japanese hot spring, is transferred unidirectionally 
and propagated at high frequency throughout a 
mixed culture of the Japanese isolate with either Sul- 
folobus solfataricus or S. islandicum, an Icelandic 
strain. Because of the high copy number of the plas-
mid in recipient cells during epidemic spread, the
colonies of recipients can be distinguished visually,
without a selectable marker, by their small size
compared with the colonies of plasmid-free cells.
To date, however, the conjugation phenomenon
has not been exploited to any extent for genetic
mapping or for strain construction. 


Shuttle Vector Systems 

Plasmid vectors using puromycin resistance as a selec- 
tive marker have been developed for many methano- 
genic Archaea. The current vectors are constructed 
either with autonomous replication from rolling- 
circle origins, or else with the ability to integrate into
the chromosome using cassette markers flanked by
chromosomal DNA fragments to promote homologous
recombination. Recently, vectors have been developed
based on the thermoadapted hygromycin B
phosphotransferase gene, which confers resistance 
to hygromycin at 85 °C; mutations in the gene that
increase the thermostability of the protein made this
possible. Two shuttle vectors have been developed for
hyperthermophiles. pEXSs, which utilizes a phage
SSV1 replication origin, can be propagated in S. solfa-
taricus under selection for hygromycin resistance.
Another autonomously replicating plasmid, pAG21, 
contains the adh gene from S. solfataricus and a repli- 
cation origin from the Pyrococcus abyssi plasmid 
pGT5. The plasmid propagates in both P. furiosus 
and S. solfataricus. The selection marker used is imper- 
fect, however. The system uses toxic alcohols such as 
butanol that are detoxified by the product of the Sulfolobus
adh gene encoded by the plasmid, and the selection provided is
weak, possibly because of resident alcohol dehydro- 
genases encoded by the Pyrococcus spp. 

Arginine (Figure 1) is one of the 20 amino acids
commonly found in proteins. Its abbreviation is Arg 
and its single letter designation is R. As one of the 
nonessential amino acids in humans, it is synthesized 
by the body and so need not be provided in the indi- 
vidual’s diet. 

Figure 1 Arginine.


See also: Amino Acids 

Yeast artificial chromosomes (YACs) are the products
of a recombinant DNA cloning methodology to isolate
and propagate very large segments of DNA in a yeast
host (Burke et al., 1987). The YAC cloning system 
provides a means of cloning exogenous DNA seg- 
ments as linear molecules and at a size scale that is 
significantly larger than can be accomplished in bac- 
terial cloning systems. The cloning capacity of YACs 
is open-ended and ranges from less than 100 kb to
greater than 1000 kb, a capacity 30 times that of cos- 
mids (40 kb) and five times that of bacterial artificial 
chromosomes (BACs, typically 200-300 kb). Because 
of their very large insert size, YACs simplify the pro- 
cess of constructing physical maps of genomes as a 
series of overlapping cloned segments (also called con- 
tig maps). Furthermore, the budding yeast host, Sac- 
charomyces cerevisiae, is a eukaryotic organism that 
offers a variety of homologous recombination-based 
methodologies for subsequent manipulation of cloned 
exogenous DNA segments within YACs. Vectors that 
target homologous recombination to exogenous 
DNA segments in YACs have been developed, and 
provide a means for introducing specific mutations, 
reporter sequences, or appropriate selectable markers. 
Modified YACs can be transferred back into cells as 
intact DNA segments (by processes collectively 
known as YAC transgenesis) for analysis of gene 
structure and function in cultured cells or in experi- 
mental animals. 


YAC Structure and Cloning 


The basic structure of a YAC resembles a telocentric 
chromosome (Figure 1). The short arm (~5 kb) con- 
tains four DNA elements derived from YAC vector 
sequences: a functional telomere (TEL), a centromere 
(CEN), an origin of replication (ARS), and a yeast 
selectable marker. The long arm consists primarily of 
a contiguous segment of cloned exogenous DNA (up 
to 1000 kb) and vector sequences containing a second 
yeast selectable marker and a functional TEL sequence 
at the distal end. Thus, YACs contain all the cis-acting 
elements (CEN, ARS, TEL) required for chromo- 
some replication and proper segregation in the yeast 
host. Because YACs replicate once per cell division 
cycle and segregate faithfully at mitosis, they are 
maintained stably at approximately one copy per 
cell. YACs are constructed by ligation of the YAC vector “arms” onto the ends of size-fractionated high molecular weight insert DNA. The ligation products are then introduced into yeast cells by an optimized DNA-mediated transformation method. Yeast transformants carrying YACs are selected using auxotrophic markers in the yeast host strain that are complemented by the selectable markers on the YAC vector arms (typically Ura3+ Trp1+).

Figure 1 Basic structure of a yeast artificial chromosome. [Schematic: short arm (~5 kb): TEL, (plasmid), YSM-1, ARS, CEN; long arm: 100-1000 kb exogenous DNA insert, YSM-2, (plasmid), TEL.] TEL, Tetrahymena telomere-derived sequences; (plasmid), sequences derived from a bacterial cloning vector such as pBR322; YSM-1 and YSM-2, yeast genes for selecting yeast host transformants, generally prototrophic markers; ARS1, yeast autonomously replicating sequence; CEN, yeast centromere DNA.


Practical Considerations 


While certainly feasible, the construction of compre- 
hensive genomic YAC libraries with average insert 
sizes of greater than 300 kb is technically challenging, 
requiring considerable effort and technical skill. A 
number of representative genomic libraries have 
been constructed from human DNA and from the 
DNA of numerous other organisms (Green et al., 
1998). Efficient methods have been developed for 
screening YAC libraries for individual clones using a 
PCR-based strategy. 

YACs have played a major role in the construction 
of clone-based physical maps of whole genomes, most 
notably of the human genome. The basic method used, 
called ‘STS-content mapping,’ involves the use of 
sequence-tagged sites (STSs) as markers and YACs as 
the source of cloned DNA. An STS is a short segment 
of genomic DNA that can be uniquely detected using 
a PCR assay. An STS map represents the genome as a 
series of STS landmarks at known physical distances 
from one another. By determining the STS content of a 
sufficiently large number of individual overlapping 
YACs using a sufficiently high number of unique 
STSs, both the order of STSs and the extent of overlap 
of adjacent YAC clones can be deduced simultan- 
eously. 
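The ordering logic behind STS-content mapping can be illustrated with a deliberately small sketch. Assuming error-free data (no false-positive or false-negative PCR results and no chimeric clones, all of which real mapping software must tolerate), a valid STS order is one in which every YAC's STSs form a single unbroken run. The sketch simply tries every permutation, which is feasible only for a handful of STSs; the clone names and STS labels are hypothetical.

```python
from itertools import permutations

def is_contiguous(order, content):
    """True if the STSs in `content` form one unbroken run within `order`."""
    positions = sorted(order.index(sts) for sts in content)
    return positions[-1] - positions[0] + 1 == len(positions)

def consistent_orders(yac_content):
    """All STS orders in which every YAC's STS content is contiguous.

    Brute force over permutations: assumes error-free, non-chimeric data
    and is only practical for a handful of STSs.
    """
    stss = sorted(set().union(*yac_content.values()))
    return [
        order
        for order in permutations(stss)
        if all(is_contiguous(order, c) for c in yac_content.values())
    ]

# Hypothetical STS content of four overlapping YAC clones
yac_content = {
    "YAC-A": {"s1", "s2"},
    "YAC-B": {"s2", "s3", "s4"},
    "YAC-C": {"s4", "s5"},
    "YAC-D": {"s3", "s4", "s5"},
}

orders = consistent_orders(yac_content)
for order in orders:
    print(" - ".join(order))
```

For these four hypothetical clones only s1-s2-s3-s4-s5 and its reverse satisfy every clone, and clones that share STSs (for example YAC-B and YAC-D) are inferred to overlap.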

The development of methods to transfer YACs 
back into mammalian cells has provided a means for 
analyzing gene expression or regulation within very 
large regions of DNA. On introduction into mammal- 
ian cells, the cloned insert DNAs within YACs tend to 
integrate into the mammalian genome as intact 
segments. This feature permits functional analysis of large stretches of DNA spanning hundreds of kilobases or more, involving large genes, gene clusters, or regulatory elements that can be dispersed over large regions.

There are technical difficulties associated with YAC cloning methodology. In particular, individual YAC clones exhibit a high rate of chimerism, that is, the presence of two or more unrelated segments of DNA within a single cloned YAC insert. Such chimeric YACs constitute about 50% of the YAC clones in most libraries. Another problem is the difficulty of purifying YAC clone DNA away from the endogenous yeast chromosomes in quantities sufficient for subsequent analysis. Because of this difficulty, YACs are often subcloned into smaller segments in bacterial vectors, which are then amplified and purified from bacterial cells to allow efficient subsequent manipulations such as DNA sequencing.

An excellent presentation of background infor- 
mation and detailed protocols for constructing, isolat- 
ing, and using YACs can be found in Green et al. 
(1998). 
Introduction


Artificial selection is distinct from natural selection in 
that it describes selection applied by humans in order 
to produce genetic change. When artificial selection is 
imposed, the trait or traits being selected are known, 
whereas with natural selection they have to be 
inferred. In most circumstances and unless otherwise 
qualified, directional selection is applied, i.e., only 
high-scoring individuals are favored for a quantitative 
trait. Artificial selection is the basic method of genetic 
improvement programs for crop plants or livestock 
(see Selective Breeding). It is also used as a tool in 
the laboratory to investigate the genetic properties of 
a trait in a species or population, for example, the 
magnitude of genetic variance or heritability, the pos- 
sible duration of and limits to selection, and the 
correlations among traits, including with fitness. 


Expected Response to Selection 


With selection on individual performance (mass selection), the expected response ($R$) to selection each generation is given by $R = h^2 S$, where $h^2$ is the heritability of the trait and $S$ is the selection differential applied to it, i.e., the mean superiority of the selected parents. The ratio or regression of selection response on selection differential (the realized heritability) is therefore an estimate of the heritability of the trait. The response can also be expressed as $R = i h^2 \sigma_P = i h \sigma_A$, where $\sigma_P^2$ is the phenotypic variance, $\sigma_A^2$ is the additive genetic variance, and $i = S/\sigma_P$ is the selection intensity (see Heritability; Selection Intensity). If selection is practiced on some index or other criterion of selection, family mean performance, for example, the expected response per generation is $i \rho \sigma_A$, where $\rho$ is the accuracy of selection ($\rho = h$ for individual selection) (see Selection Index). In this form it is seen that the response depends on: (1) spare reproductive capacity to enable selection to be applied, (2) the accuracy of predicting genotype (specifically breeding value) from phenotype, and (3) the magnitude of genetic variation (specifically additive genetic variance) in the trait. The expected rate of response per year also depends on (4) the generation interval ($L$, the mean age of parents when their progeny are born), and with continued selection of the same intensity the annual rate equals $R/L$.
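As a worked illustration of these formulas, the sketch below computes the selection intensity for truncation selection of the best fraction $p$ of a normally distributed trait, using the standard large-population result $i = \phi(x)/p$ with $x$ the truncation point (with few individuals selected, the true $i$ is slightly smaller), and then the selection differential, the response per generation ($R = h^2 S$), and the response per year ($R/L$). The function names and trait values are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def selection_intensity(p):
    """Intensity i for truncation selection of the best fraction p of a
    normally distributed trait (large-population approximation)."""
    std_normal = NormalDist()
    x = std_normal.inv_cdf(1 - p)   # truncation point, in SD units
    return std_normal.pdf(x) / p    # mean superiority of selected, in SD units

def expected_response(h2, sigma_p, p, generation_interval=1.0):
    """Return (S, R per generation, R per year) for mass selection."""
    i = selection_intensity(p)
    S = i * sigma_p                 # selection differential, S = i * sigma_P
    R = h2 * S                      # R = h^2 * S
    return S, R, R / generation_interval

# Hypothetical trait: h^2 = 0.3, sigma_P = 10 units, best 20% kept, L = 2 years
S, R, R_year = expected_response(0.3, 10.0, 0.20, generation_interval=2.0)
sigma_a = sqrt(0.3) * 10.0          # sigma_A = h * sigma_P
i = selection_intensity(0.20)
print(f"i = {i:.3f}, S = {S:.2f}, R = {R:.2f}, R/L = {R_year:.2f}")
print(f"i * h * sigma_A = {i * sqrt(0.3) * sigma_a:.2f}")
```

The last line confirms numerically that $i h^2 \sigma_P$ and $i h \sigma_A$ are the same quantity.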

These predicted responses are functions solely of 
variances and covariances, and formally hold only for 
a single generation. Selection itself changes gene and 
haplotype frequencies and hence the genetic variation, 
and is necessarily practiced in a population of finite 
size, such that heterozygosity falls due to genetic drift. 
For the infinitesimal model of additive unlinked genes 
each with infinitesimally small effect on the trait, 
selection yields negligible gene frequency changes 
and consequent variance changes. It induces, however, 
a negative correlation of frequencies among loci, i.e., 
linkage (gametic) disequilibrium, so genetic variance is 
reduced by an amount which can be predicted using the methods of Bulmer. Most of this reduction occurs within the first two generations, and the loss of response asymptotes at about one-quarter, more if there is tight linkage between the relevant genes. With long-term selection, changes in gene frequency from both selection and drift have to be taken into account, as, eventually, do mutations. Therefore, although it is possible to make qualitative predictions, for example that rates of response will decline as genes become fixed, quantitative predictions require information on gene effects and frequencies. As this is not available, it is not possible to make confident predictions about the magnitude of long-term response from data that can readily be obtained (see Selection Limit). Most of our information therefore comes from the results of selection experiments, which typically show that initial rates of response are maintained for five or more generations without substantial attenuation, and that response may continue for many tens of generations.
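The Bulmer effect can be sketched numerically. Assuming the infinitesimal model, unlinked loci, truncation selection on phenotype, and no drift, the additive variance in generation $t+1$ is half the variance of the selected parents' breeding values, $\frac{1}{2}V_A(t)(1 - k h_t^2)$, plus a Mendelian-sampling term equal to half the (constant) genic variance, where $k = i(i - x)$ is the proportional reduction in variance under truncation at standardized point $x$. The function name and parameter values are illustrative.

```python
from math import sqrt
from statistics import NormalDist

def bulmer_trajectory(var_a0, var_e, p, generations):
    """Additive variance and response per generation under the Bulmer recursion.

    var_a0: additive (= genic) variance in the unselected base population
    var_e:  environmental variance; p: proportion selected (truncation).
    """
    std_normal = NormalDist()
    x = std_normal.inv_cdf(1 - p)
    i = std_normal.pdf(x) / p
    k = i * (i - x)                       # variance reduction among selected
    var_a = var_a0                        # base population in equilibrium
    history = []
    for _ in range(generations + 1):
        response = i * var_a / sqrt(var_a + var_e)   # R = i * sigma_A^2 / sigma_P
        history.append((var_a, response))
        h2 = var_a / (var_a + var_e)
        # Selected parents pass on half their reduced variance; Mendelian
        # sampling restores half the genic variance, which stays at var_a0.
        var_a = 0.5 * var_a * (1 - k * h2) + 0.5 * var_a0
    return history, k

history, k = bulmer_trajectory(var_a0=1.0, var_e=1.0, p=0.2, generations=10)
for t, (var_a, response) in enumerate(history[:4]):
    print(f"generation {t}: V_A = {var_a:.3f}, R = {response:.3f}")
loss = 1 - history[-1][1] / history[0][1]
print(f"asymptotic loss of response: {loss:.0%}")
```

With these illustrative settings (initial $h^2 = 0.5$, 20% selected) the variance falls most of the way to its equilibrium within two generations; the size of the asymptotic loss depends on the heritability and on the proportion selected.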
Correlated Responses to Selection


Correlated responses in other traits are also expected as a consequence of artificial selection (see Genetic Correlation). The predicted response can be expressed in several ways. With mass selection on trait X, for example, the correlated response ($CR_Y$) expected in a trait Y is given by $CR_Y = i r_A h_X h_Y \sigma_{PY}$, where $i$ is the selection intensity, $r_A$ is the (additive) genetic correlation between X and Y, and the subscripts on the heritabilities and standard deviations identify the trait. More generally, $CR_Y = i \rho r_A \sigma_{AY}$, where $\rho$ is the accuracy of selection (as above). The correlated response therefore depends on the magnitude of the genetic correlation, on the genetic variability of the correlated trait Y, and on the effectiveness of the direct selection on trait X (see also Selection Index).

The artificial selection on trait X also induces a correlated selection differential in Y, i.e., in the phenotypes of those selected. The ratio of correlated response to (correlated) selection differential in Y is equal to $\mathrm{cov}_A/\mathrm{cov}_P = r_A h_X h_Y / r_P$, where $\mathrm{cov}_A$ and $\mathrm{cov}_P$ are the genetic and phenotypic covariances between X and Y and $r_P$ is their phenotypic correlation. Note that in general it is not equal to the heritability of Y, $h_Y^2 = V_{AY}/V_{PY}$, illustrating the problem of inferring what trait(s) have been selected, and with what intensity, simply by observation. In principle, data are required on all traits for this to be done.
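These correlated-response relations can be checked with a short sketch. It computes $CR_Y = i r_A h_X h_Y \sigma_{PY}$, compares it with the general form $i \rho r_A \sigma_{AY}$ taking $\rho = h_X$ for mass selection, re-derives it by regressing Y's breeding value and phenotype on X's phenotype, and shows that the ratio of correlated response to correlated selection differential differs from $h_Y^2$. All parameter values are hypothetical.

```python
from math import sqrt

def correlated_response(i, r_a, h2_x, h2_y, sigma_py):
    """CR_Y = i * r_A * h_X * h_Y * sigma_PY for mass selection on X."""
    return i * r_a * sqrt(h2_x) * sqrt(h2_y) * sigma_py

# Hypothetical parameters for traits X (selected) and Y (correlated)
i, r_a, r_p = 1.4, 0.5, 0.3
h2_x, h2_y = 0.4, 0.25
sigma_px, sigma_py = 5.0, 8.0
sigma_ax, sigma_ay = sqrt(h2_x) * sigma_px, sqrt(h2_y) * sigma_py

cr_y = correlated_response(i, r_a, h2_x, h2_y, sigma_py)
cr_general = i * sqrt(h2_x) * r_a * sigma_ay      # i * rho * r_A * sigma_AY

cov_a = r_a * sigma_ax * sigma_ay                 # genetic covariance of X and Y
cov_p = r_p * sigma_px * sigma_py                 # phenotypic covariance of X and Y

# Regression route: selection differential S_X = i * sigma_PX on X
s_x = i * sigma_px
s_y = cov_p / sigma_px**2 * s_x                   # correlated selection differential in Y
cr_regression = cov_a / sigma_px**2 * s_x         # correlated response in Y

ratio = cr_regression / s_y                       # equals cov_A / cov_P
print(f"CR_Y = {cr_y:.3f} (general form {cr_general:.3f}, regression {cr_regression:.3f})")
print(f"CR_Y / S_Y = {ratio:.3f}, h_Y^2 = {h2_y}")
```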

As for the trait under selection, correlated responses 
to selection are expected to deviate from initial expect- 
ation in a predictable way due to linkage disequili- 
brium, and over the long term in an unpredictable 
way due to gene frequency change. Indeed, correlated 
changes are less easy to predict accurately than are those 
in the trait under selection. They rely on initial estimates
of genetic correlation, themselves hard to estimate 
precisely; and genes contributing positively and 
negatively to change in the correlated trait may change 
at different rates in the population; indeed, the signs of 
correlations can, in principle, change during selection. 


Changes in Gene Frequency 

If selection has been effective in changing mean performance, the frequencies of genes that influence the trait(s) under selection must have changed. Reversing the argument, selection changes gene frequencies, which in turn change mean performance. In essence, artificial selection induces fitness (viability) differences between genotypes at loci affecting the trait, so standard methods from population genetics can be used to predict changes in gene frequency. For an additive gene at frequency $q$, for which the difference in performance between each homozygote and the heterozygote is $a$, the selective value with mass selection can be shown to be $2ia/\sigma_P$, and thus the change in gene frequency is $\Delta q = i a q(1-q)/\sigma_P$. The consequent change in mean from that locus is $2a\Delta q$, and if these changes are summed over loci, in the absence of epistasis, the response in the mean, $i h^2 \sigma_P$, is obtained. The genetic variance also changes because of this change in frequency.
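The locus-by-locus argument can be checked numerically: summing $2a\Delta q$ over additive loci, with $\Delta q = i a q(1-q)/\sigma_P$, should reproduce $i h^2 \sigma_P = i\sigma_A^2/\sigma_P$, using the standard result that an additive locus contributes $2q(1-q)a^2$ to the additive variance. The allele frequencies, effects, environmental variance, and selection intensity below are hypothetical, and the approximation assumes each locus has a small effect relative to $\sigma_P$.

```python
from math import sqrt

# Hypothetical additive loci: (q, a), with q the frequency of the allele that
# raises the trait and a the homozygote-heterozygote difference.
loci = [(0.2, 0.5), (0.5, 0.3), (0.7, 0.4), (0.1, 0.6)]
var_e = 1.0        # hypothetical environmental variance
i = 1.4            # hypothetical selection intensity

var_a = sum(2 * q * (1 - q) * a**2 for q, a in loci)   # additive variance
sigma_p = sqrt(var_a + var_e)

total_change = 0.0
for q, a in loci:
    dq = i * a * q * (1 - q) / sigma_p     # change in allele frequency
    total_change += 2 * a * dq             # change in mean from this locus
    print(f"q = {q:.2f}: delta q = {dq:.4f}")

h2 = var_a / sigma_p**2
print(f"sum over loci = {total_change:.4f}; i * h^2 * sigma_P = {i * h2 * sigma_p:.4f}")
```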


Selection Experiments 


Numerous selection experiments using artificial 
selection have been conducted over the last century, 
for a wide range of traits in many species. With very few exceptions, selection response has been achieved, indicating that genetic variance is present for almost any trait in any population. In those experiments
continued for many generations, responses have 
continued so that the mean performance of the 
selected line is well outside the range of that found 
in the base population. Examples of important or 
typical experiments are given in the accompanying 
figures to illustrate the roles of selection experiments. 

Experiments differ in aspects of design. In some, an unselected control population is maintained so that genetic and environmental change can be distinguished. In others, divergent selection is practiced, in which a high and a low line are maintained, either to check whether response is symmetric in the two directions (which may require a control as a check) or simply to eliminate environmental change by comparing the contemporaneous high and low lines. The outcome of a selection experiment in a finite population is essentially a random walk: because the selected line is finite in size, genetic sampling (drift) produces variation in response among (conceptual) replicates each generation, and selection in the next generation starts off from the point reached in the current generation. Whilst it is possible to calculate the sampling error of the response using simple approximations, which show that the variance of response is inversely proportional to the size of the population, replication of the experiment is a more robust practice.
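The random-walk character of response in finite lines can be illustrated by simulation. The sketch below assumes a simple infinitesimal model (an illustrative assumption, not the design of the experiments described here): each offspring's breeding value is the mean of two distinct selected parents plus Mendelian-sampling noise with variance half the base additive variance, the top 20% on phenotype are selected each generation, and replicate lines of two hypothetical sizes are compared. Function names, line sizes, and variances are hypothetical.

```python
import random
import statistics

def simulate_line(n, n_sel, generations, var_a, var_e, rng):
    """Change in mean breeding value of one selected line (infinitesimal model)."""
    bv = [rng.gauss(0.0, var_a ** 0.5) for _ in range(n)]
    base_mean = statistics.fmean(bv)
    for _ in range(generations):
        # Phenotype = breeding value + environmental deviation
        pheno = [g + rng.gauss(0.0, var_e ** 0.5) for g in bv]
        # Truncation selection on phenotype: keep the n_sel best individuals
        best = sorted(range(n), key=lambda j: pheno[j], reverse=True)[:n_sel]
        parents = [bv[j] for j in best]
        offspring = []
        for _ in range(n):
            p1, p2 = rng.sample(parents, 2)                 # two distinct parents
            mendelian = rng.gauss(0.0, (var_a / 2) ** 0.5)  # Mendelian sampling
            offspring.append((p1 + p2) / 2 + mendelian)
        bv = offspring
    return statistics.fmean(bv) - base_mean

def replicate_responses(n, reps, generations=10, seed=1):
    """Mean and variance of response over replicate lines of size n."""
    rng = random.Random(seed)
    n_sel = max(2, n // 5)                                  # keep the top 20%
    responses = [simulate_line(n, n_sel, generations, 1.0, 1.0, rng)
                 for _ in range(reps)]
    return statistics.fmean(responses), statistics.variance(responses)

small = replicate_responses(n=20, reps=100)
large = replicate_responses(n=200, reps=100)
print(f"N = 20:  mean response {small[0]:.2f}, variance {small[1]:.3f}")
print(f"N = 200: mean response {large[0]:.2f}, variance {large[1]:.3f}")
```

The smaller lines should show a visibly larger variance of response among replicates, in keeping with the inverse dependence on population size noted above; exact values depend on the random seed.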


Examples 

The Illinois corn oil experiment (Figure 1) was established before 1900, and continues to this day (Dudley,
personal communication). Seed is selected from cobs 
with a high or low oil content in the high and low 
lines, respectively. While response in the low lines has 
attenuated, it is doing so at levels of oil content so low 
that there is little variation left in the population. 
Response in the high line continues after almost 
100 generations of selection. In 1934 ‘Student’ used 
results of this experiment to estimate numbers of loci 
contributing to response, by comparing response with 
variance; he obtained a value of over 30 genes, but 
estimates rise as response continues. Essentially, this 
experiment shows the power of selection to effect 
change and also the duration over which responses 
continue, these presumably being fed by new muta- 
tions coming into the lines over the many generations 
since selection started. 

Figure 2 shows the lines of Drosophila melanogaster 
selected for abdominal bristle number from a founder 
population recently caught from the wild (Clayton and 
Robertson, 1957). This experiment is important 
because it was used to test quantitative genetic theory directly: estimates of heritability and variance were made in the base population and used to predict selection response; the observed response was well in line with predictions (early generations are shown in Figure 2A). After long-term selection, the distributions of bristle number in the high, base population, and low lines were nonoverlapping, illustrating the power of selection and the fact that the initial phenotypic range could readily be exceeded (Figure 2B).

Figure 1 Selection for oil content in maize (unpublished graph courtesy of J W Dudley, University of Illinois). IHO, continuous selection for high oil content; RHO, reverse high oil selection; SHO, switchback to high oil selection; ILO, continuous selection for low oil content; RLO, reverse low oil selection.

Figure 2 Selection for abdominal bristle number from an outbred population of Drosophila melanogaster. (A) Responses in five replicate selected high and low lines (broken) and in relaxed or unselected lines (solid). (B) Frequency distribution of bristle number in the base population and in the highest and lowest replicate lines after 34-35 generations. (Reproduced with permission from Clayton and Robertson, 1957.)

Figure 3 presents lines of Drosophila melanogaster selected for bristle number from an inbred base (Mackay et al., 1994). In these, the responses come entirely from mutations arising after the start of the selection experiment, and can be used to estimate the input of variance from mutation. Note the asymmetry of response, presumably due to asymmetry in the distribution of mutational effects. As in the selection experiments from an outbred base shown in Figure 2, genes with deleterious effects on fitness were found to be segregating in later generations.

Figure 3 Selection for abdominal (top graph) and sternopleural bristle number (lower graph) from an inbred population of Drosophila melanogaster. (Reproduced with permission from Mackay et al., 1994.)


Other Examples 
Selection experiments have also been used to: 


1. Estimate genetic parameters (see Genetic Correl- 
ation, Heritability). 

2. Estimate the effect of population size on short- and 
long-term response, in particular to test theory (see 
Selection Limit). It has generally been found that 
greater responses have been obtained with larger 
populations; most notably Weber (1990) has 
included populations of many hundreds of breed- 
ing individuals. 

3. Compare the responses to alternative breeding 
schemes; for example Bell et al. (1955) compared 
the effectiveness of selection on pure and cross line 
performance for improvement of cross perform- 
ance; and Falconer (1952) compared selection on 
animals reared in good and poor environments. 

4. Test the predictions of Wright’s shifting balance 
theory, usually by comparing selection in a single 
large population with that in a series of small lines 
between which selection is made, the best chosen, 
intercrossed, and new inbreds formed. This approach has not
proved effective in most experiments. More direct
comparisons have been undertaken by Wade and 
Goodnight (1991). 


These examples illustrate the importance of artifi- 
cial selection as a tool in experimental quantitative 
genetics. It is also, of course, the basis of genetic 
improvement of crop plants and livestock. 
Introduction


The fungal genus Ascobolus was established in 1791 by 
Persoon. Ascobolus has small apothecia (fruiting bodies) with large asci protruding beyond the hymenium at maturity (see Figure 1). Ascobolus is in the Ascobolaceae, Pezizales, Ascomycota, and there are about 50
species in all. These fungi usually live on dung or 
rotting plant remains and have a world-wide distribu- 
tion. Some are homothallic and self-fertile; others are 
heterothallic, usually with two mating types. 

Although many species have been studied myco- 
logically (Van Brummelen, 1967), nearly all genetic 
work has used Ascobolus immersus, which is common 
and widely distributed, especially on the dung of herbi- 
vorous mammals. A. immersus has been extremely 
important in elucidating recombination mechanisms, 
mainly through studies of segregation ratios in un- 
ordered octads from crosses using ascospore color 
markers, so that aberrant segregation ratios can be 
identified visually, as in Figure 2. 


Biology 


Sexual Reproduction 

In some species, there are fruit bodies that are cleisto- 
thecial (closed) and in others they are perithecial 
(with a neck), but in most species they are apothecial 
(open disks). Some species can develop fruit bodies 
parthenogenetically, but they are usually sexual. Some
species have ascogonia and antheridia, while some 
only have ascogonia. In heterothallic species, fusion 
is normally between vegetative hyphae, though A. 
carbonarius has trichogynes and antheridial conidia.
Figure 1 Apothecia of A. immersus with intact asci from a wild-type (+, red-brown ascospores) × white w1-78 (white ascospores) cross, normally giving 4+:4w1-78 segregations. A spontaneous mutation occurred early in the cross, from + to a new white mutation, giving 0 red-brown:8 white ascospore segregations. The diameter of the apothecium in the top left corner is 650 µm. This figure originally appeared in Lamb BC and Wickramaratne MRT (1973) Corresponding-site interference, synaptonemal complex structure, and 8+:0m and 7+:1m octads from wild-type × mutant crosses of Ascobolus immersus. Genetical Research 22: 113-124, Cambridge University Press (reproduced with permission).

In A. immersus the apothecia are up to 1 mm across, hemispherical, and usually immersed in the substrate up to the disk, with only a few asci protruding (Figure 1), and with up to 40 asci being produced in partly synchronized waves. The asci are 500-700 by 100-130 µm and are phototropic, turning to face open spaces. The eight large, uninucleate, haploid ascospores are about 50-70 by 28-36 µm each and are arranged in roughly two rows (Figure 2). In the Pasadena (California) strains, the ascospores are dark red-brown and oval, whereas the ascospores of European strains are brown and more straight-sided. Many species have violet ascospores. In A. immersus, the ascospores are dehisced together as a group, which is therefore too heavy to be much influenced by air currents; the groups travel up to 30 cm horizontally and 35 cm vertically. In contrast, species such as A. furfuraceus and A. scatigenus largely dehisce their ascospores individually, and so air currents are important for their dispersal.


The diploid stage is limited to the fusion nucleus in 
the young ascus after conjugate nuclear division of 
both mating types in the ascogenous hyphae. In 
A. immersus, sexual reproduction follows the fusion 
of vegetative hyphae of opposite mating type. At temperatures of 17.5-22.5 °C, dehiscence starts about 9 days after inoculation of + and − strains, and
continues for about 3 weeks. Strains often become 
sterile after prolonged vegetative culture. 


Vegetative State 

Although a few species have oidia and small conidia, most have no asexual spores, as in A. immersus, in which the mycelium is white or pale, although yellow and other mutant forms occur. The fungus can germinate, grow, and fruit on synthetic media (Yu-Sun, 1964). The hyphae are septate, branched, and coenocytic, with up to 10 haploid nuclei per segment. Anastomosis of hyphae is very common. Unlike in Neurospora crassa, vegetative fusions between A. immersus strains of opposite mating type can occur, as well as fusions of like mating type. The haploid chromosome number has been reported as 8, 9, 11, 12, 14, and 16; it is 11 in the European strains, with nine identified linkage groups.

Figure 2 An intact, undehisced ascus of A. immersus which has been removed from its apothecium. It is from a wild-type (+) × white w1-10 cross and has a 6+:2w1-10 gene conversion ratio, with one white ascospore partly hidden. The ascus is 550 µm long. This figure originally appeared in Lamb BC and Wickramaratne MRT (1973) Corresponding-site interference, synaptonemal complex structure, and 8+:0m and 7+:1m octads from wild-type × mutant crosses of Ascobolus immersus. Genetical Research 22: 113-124, Cambridge University Press (reproduced with permission).


Ascospore Germination 

As befits a coprophilous genus, ascospores pass 
unharmed through animal alimentary canals and are 
often stimulated to germinate by this passage. Germination can also be stimulated by heat shock (e.g., 2 h at 50 °C, or 3 min at 70 °C, for different species), by treatment with alkali or pepsin, by incubation on dung extract medium, or by incubation at 37 °C.
Ascospore color mutants are often easier to germinate 
than wild-type spores, and white spores often germin- 
ate spontaneously on plain agar. 


Advantages and Disadvantages of 
A. immersus for Genetic Studies 


The life cycle of A. immersus takes about 12-17 days. 
The large haploid ascospores are easy to pick up with 
mounted needles at ×25 to ×50 magnification. In
addition to thousands of known white ascospore
mutations (e.g., Figures 1 and 2), it is possible to
get granular ascospore mutants, with the pigment 
restricted to large and small granules on the outer 
spore surface, instead of being uniform. It is also pos- 
sible to get auxotrophs, ascospore shape mutations, 
and mycelial morphological mutants. DNA cloning, 
integrative transformation, and physical mapping 
methods have been developed by Rossignol’s group. 
The plasmids of A. immersus tend to be unstable. 
One cannot carry out ordered tetrad analysis for 
mapping centromeres or detecting postmeiotic segre- 
gation, but by having two pairs of visual ascospore 
markers segregating, such as white/red and granular/ 
nongranular, one can detect postmeiotic segregation, 
e.g., aberrant 4:4 asci. Conversion frequencies are 
often high, usually 1-12%, sometimes up to 26%, 
with genetic factors affecting them. The whole octad 
of ascospores usually dehisces together and so can be 
collected on lids with agar for visual scoring or isol- 
ation and germination in octad analysis. A disadvantage 
is the lack of asexual conidia for mutation studies and 
filtration enrichment for auxotrophs, although modified enrichment methods using fragments of germinating hyphae have been successful. Because ascospore pigmentation markers are usually only expressed
in haploid ascospores, it is difficult to test for allelism 
of closely linked ascospore markers, as one cannot 
get diploids or partial diploids, although very rare 
oversized ascospores enclosing more than one meiotic 
product have occasionally been used for cis/trans tests. 


Uses of A. immersus in Genetic Research 


References on this subject are given in a review by 
Lamb (1996). Research on recombination has included 
relations between crossovers and conversions, polar- 
ity in recombination, formation of symmetrical and 
asymmetrical hybrid DNA, gradients of conversion 
frequencies across a locus, wider ratio octads (classes 
requiring that more than one pair of chromatids was 
involved in hybrid DNA formation at one point, e.g., 
8+:0m and 7+:1m ratios in + x m crosses), the rela- 
tion between a mutation’s molecular type and its 
conversion spectrum (frequency of postmeiotic segre- 
gation and relative frequencies of conversion to + and 
to m), co-conversion, and double-strand gap repair. 
Other studies include the effects of gene conversion in 
evolution, methylation induced premeiotically, and 
mutation and reversion frequencies. 
The special value for genetic studies of the ascomycete
fungus Ascobolus immersus results from two features: 
(1) the ascospores spontaneously eject from the fruit- 
ing body in clusters of eight, these including a pair for 
each of the four products of a single meiosis; (2) the 
ascospores, which are dark brown, are made colorless 
by mutation at any of a number of genetic loci, and the 
color reflects the genotype of the spore. In a cross 
segregating for an ascospore color mutation, the inves- 
tigator can quickly score large numbers of octads 
for recombination by merely scanning a collecting 
surface. In the cross of mutant x wild-type, any 
departure from the normal 4:4 ratio is easily noted. 
The non-4:4 ratios represent gene conversion or post- 
meiotic segregation at the mutant site. 
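Scoring such octads amounts to counting colored and colorless spores. The sketch below (the function name and the '+'/'m' symbols are illustrative) classifies an unordered octad from a + × m cross: even non-4:4 ratios indicate gene conversion, odd ratios indicate that at least one meiotic product carried unrepaired hybrid DNA and segregated at the postmeiotic mitosis, and 8:0, 7:1, 1:7, and 0:8 are flagged as wider-ratio classes. Aberrant 4:4 octads cannot be recognized from the counts of a single marker, and the sketch cannot distinguish these ratios from those produced by other events, such as the spontaneous mutation described for Figure 1.

```python
from collections import Counter

def classify_octad(spores):
    """Classify an unordered octad from a + x m cross by its segregation ratio.

    spores: 8 symbols, '+' for wild-type (colored) and 'm' for mutant (white).
    Returns (ratio, class, wider_ratio).
    """
    spores = list(spores)
    if len(spores) != 8 or set(spores) - {"+", "m"}:
        raise ValueError("expected 8 spores scored as '+' or 'm'")
    n_wt = Counter(spores)["+"]
    ratio = f"{n_wt}+:{8 - n_wt}m"
    if n_wt == 4:
        kind = "4:4 (normal, or an undetectable aberrant 4:4)"
    elif n_wt % 2 == 1:
        kind = "postmeiotic segregation"
    else:
        kind = "gene conversion"
    # Wider ratios need more than one pair of chromatids in hybrid DNA
    wider = n_wt in (0, 1, 7, 8)
    return ratio, kind, wider

for octad in ["++++mmmm", "++++++mm", "+++++mmm", "+++++++m", "++++++++"]:
    print(octad, *classify_octad(octad))
```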

In a cross between strains carrying two different 
mutant alleles at the same spore color locus, no brown 
spores are produced except by intragenic recombin- 
ation. In the octad with a pair of recombinant brown 
spores, the other three pairs are scored for which 
parental allele they carry. This permits the distinction 
between reciprocal recombination and gene 
conversion. This system is beautifully suited for the 
easy and quick analysis of large numbers of tetrads 
with intragenic recombination at the ascospore color 
loci. Extensive studies of this kind by Rossignol and 
his colleagues have provided a detailed description of 
the properties and products of meiotic recombination 
that must be accounted for by any proposed models 
for the mechanism of the event. 

The usefulness of Ascobolus in genetic research is 
limited by the lack of vegetative spores. Such cells have 
permitted students of other filamentous fungi (e.g., 
Neurospora and Aspergillus) to exploit the methods 
of bacteriology: plating known numbers of cells, 
screening large numbers of cells for rare variants. 


The absence of these spores has impeded the Ascobolus 
work in various ways, one being the failure to detect 
nutritional mutants. Such mutants would be useful as 
genetic markers and to construct selective systems for 
the detection of rare events like mutation, intragenic 
recombination, and genetic transformation. 


Further Reading 

Rossignol J-L and Picard M (1991) Ascobolus immersus and Podos- 
pora anserina: sex, recombination, silencing and death. In: 
Bennett JW and Lasure LL (eds) More Gene Manipulations in 
Fungi, pp. 266-290. San Diego, CA: Academic Press. 


See also: Ascobolus; Fungal Genetics; 
Gene Conversion; Meiotic Product; 
Postmeiotic Segregation 
An ascus is a fungal structure containing a tetrad or octad of (haploid) spores, representing the result of a single meiosis.


See also: Tetrad Analysis 
Asilomar Conference on Recombinant
DNA Molecules 


In 1975 a group of scientists gathered at the Asilomar 
Conference Center, Pacific Grove, California, to
discuss the potential dangers of recombinant DNA 
molecules. The group concluded that it would be 
prudent to exercise caution in carrying out recom- 
binant DNA experiments. It was reasoned that the 
combining of genetic information from different 
organisms might result in unknown consequences. 
The recommendations of the conference resulted in 
classifying the risk of a recombinant DNA experiment 
based on the scale of the experiment and on what 
biological organisms formed the source of the DNA. 
The higher the risk of the experiment, the greater the containment required. Containment was to be based both on (1) special laboratory facilities (physical containment) and (2) the biology of the system (biological containment). Certain experiments, such as cloning genes that encoded toxins, were to be deferred until some time in the future when appropriate experimental containment could be assured and additional experience with the technology had been acquired. The recommendations of the Asilomar conference ultimately became the basis for the NIH’s Recombinant DNA Guidelines.


Further Reading 

Berg P, Baltimore D, Brenner S, Roblin RO and Singer MF (1975) 
Summary statement of the Asilomar conference on recom- 
binant DNA molecules. Proceedings of the National Academy of 
Sciences, USA 72(6): 1981-1984. 

Berg P, Baltimore D, Brenner S, Roblin RO and Singer MF (1975) 
Asilomar conference on recombinant DNA molecules. 
Science 188 (4192): 991-994. 


See also: Recombinant DNA Guidelines 
ASO is the acronym for an allele-specific oligonucleotide,
which is typically 17 to 25 bases in length and has
been designed to hybridize to only one of two or
more alternative alleles at a locus. An ASO is usually
designed around a variant nucleotide located at or near
its center. It is typically used in combination with the
polymerase chain reaction (PCR) as a means
of determining the presence or absence of a particular
allele in a genomic DNA sample.
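The design rule above (a probe of fixed length with the variant nucleotide at its center) can be sketched in Python. This is a hypothetical illustration: the function names, the 19-base default length (picked from the 17 to 25 base range) and the toy sequences are assumptions, and real ASO design also weighs melting temperature and hybridization stringency, which are not modeled here.

```python
# Hypothetical sketch of allele-specific oligonucleotide (ASO) design:
# the variant nucleotide is placed at the exact center of the probe.

def make_aso(reference: str, variant_index: int, allele: str, length: int = 19) -> str:
    """Return a probe of `length` bases centered on `variant_index` (0-based),
    with the variant position set to `allele`."""
    if length % 2 == 0:
        raise ValueError("use an odd length so the variant sits exactly at the center")
    half = length // 2
    start, end = variant_index - half, variant_index + half + 1
    if start < 0 or end > len(reference):
        raise ValueError("not enough flanking sequence around the variant")
    window = list(reference[start:end])
    window[half] = allele
    return "".join(window)


def mismatches(probe: str, target: str) -> int:
    """Count positions where the probe differs from a same-sense target of equal length.
    A real probe pairs with the complementary strand; comparing same-sense
    sequences is a simplification that gives the same mismatch count."""
    if len(probe) != len(target):
        raise ValueError("probe and target must have equal length")
    return sum(a != b for a, b in zip(probe, target))


# Toy locus (invented sequence) with an A/G single-nucleotide variant at index 15.
left, right = "ACGTTGACCTGAAGG", "TCATGCCATTGACGT"
allele_a = left + "A" + right
allele_g = left + "G" + right

probe_a = make_aso(allele_a, 15, "A")
target_a = allele_a[6:25]  # the 19-base window covered by the probe
target_g = allele_g[6:25]
print(probe_a, mismatches(probe_a, target_a), mismatches(probe_a, target_g))
```

The probe for the A allele matches its own target perfectly and carries a single central mismatch against the G allele, which is what allows stringent hybridization conditions to discriminate between the two.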


See also: Polymerase Chain Reaction (PCR) 
Asparagine (Figure 1) is one of the 20 amino acids
commonly found in proteins. Its abbreviation is Asn 
and its single letter designation is N. As one of the 
nonessential amino acids in humans, it is synthesized 
by the body and so need not be provided in the indi- 
vidual’s diet. 
Figure 1 Asparagine.


See also: Amino Acids 
Aspartic acid (Figure 1) is one of the 20 amino acids
commonly found in proteins. Its abbreviation is Asp 
and its single letter designation is D. As one of the 
nonessential amino acids in humans, it is synthesized 
by the body and so need not be provided in the indi- 
vidual’s diet. 
Figure 1 Aspartic acid.


See also: Amino Acids 
The genetic system of Aspergillus nidulans, an asco-
mycetous fungus (Kingdom: Fungi; Phylum: Asco- 
mycota; Class: Plectomycetes; Order: Eurotiales; 
Family: Trichocomaceae) has been used extensively for 
studying genetic mechanisms, metabolic regulation, 
development, differentiation, and cell cycle controls. 
A. nidulans is also known as Emericella nidulans, particularly
in DNA sequence databases, owing to peculiarities
of fungal taxonomic nomenclature. A. nidulans
has biological properties that have permitted development
of experimental techniques for manipulating its
genome to provide insightful experimental results.
These properties include: a haploid genome of modest
size; a sexual (meiotic) reproductive cycle; an asexual
(mitotic) reproductive cycle; simple nutritional
requirements; rapid growth and reproduction; small 
size; multicellularity; transformability; and readily 
observable, informative phenotypes. A. nidulans is 
related to the extensively studied yeast Saccharomyces 
cerevisiae (Class: Hemiascomycetes) but has provided 
complementary experimental approaches and infor- 
mation due to its multicellularity, developmental path- 
ways, and genetic system. A. nidulans is also closely 
related to several other fungal model systems, including 
Neurospora crassa (see Neurospora crassa). 


History 


Almost all commonly used laboratory strains of 
Aspergillus nidulans have been derived from a single 
strain isolated by Yuill in 1939. Aspergillus genetics 
was pioneered by Pontecorvo, his associates, and his 
students during the 1940s. These scientists developed 
many of the basic techniques for utilizing the genetic 
system, including mutation and selection, meiotic and 
mitotic recombination, and aneuploidy (Pontecorvo 
et al., 1953). Molecular technologies, including DNA- 
mediated transformation, were developed in the 1970s 
and 1980s, and a physical map of the genome and an 
extensive DNA sequence database were developed in 
the 1990s (Martinelli and Kinghorn, 1994). 


Growth and Reproduction 


Life Cycle 
The life cycle of A. nidulans (Figure 1) passes through
the following stages: 


1. Growth. The vegetative growth phase consists of 
filamentous, multinucleated cells called hyphae (see 
Figure 2A), that elongate apically and branch sub- 
apically, accompanied by repeated mitoses. An 
Aspergillus colony is a syncytium because there is 
cytoplasmic continuity between the cells. 

2. Asexual reproduction. When hyphae encounter 
conditions of nutrient depletion and exposure to 
air, they form asexual reproductive structures, 
called conidiophores, that in turn produce uni- 
nucleate, mitotically derived spores called conidia 
(Figure 2B). Conidia germinate under nutritive 
conditions to produce hyphae, completing the asex- 
ual life cycle. 

3. Sexual reproduction. Sexual reproduction occurs 
spontaneously within aging colonies that produce 
multicellular fruiting bodies called cleistothecia. 
Within sac-like cells in the cleistothecia, called 
asci, nuclei undergo karyogamy and meiosis (see
Meiosis) followed by two mitotic divisions to
produce eight binucleate spores called ascospores
(Figure 2C). Ascospores germinate under nutritive
conditions to produce hyphae, completing the
sexual life cycle. There are no discernible mating
types: any two laboratory strains can be crossed by
cocultivation.

Figure 1 The life cycle of Aspergillus (Emericella)
nidulans. (Reproduced with permission of the author and
publisher from Scherer M and Fischer R (1998) Purification
and characterization of laccase II of Aspergillus nidulans.
Archives of Microbiology 94: 78-84.)


Heterokaryosis 

Genetically distinct strains can fuse to produce hyphae
containing variable mixtures of nuclei with different
genotypes; such a mycelium is called a heterokaryon. Because nuclei do
not move freely through pores in septa, heterokaryons 
are intrinsically unstable. Heterokaryons can be main- 
tained by selecting for complementing nutritional 
deficiencies. In the laboratory, heterokaryons are 
usually formed and allowed to produce cleistothecia 
to make genetic crosses. 


Diploidy and Aneuploidy 
Aspergillus grows vegetatively as a haploid (n = 8).
However, diploid (2n) strains can be selected from
heterokaryons and maintained indefinitely. Unlike
premeiotic nuclei formed during sexual reproduction,
which are also 2n, vegetative diploids do not undergo
meiosis. The mechanisms controlling the different
fates of premeiotic versus vegetative diploid nuclei are
not known. Vegetative diploids can be destabilized by
inhibitors of chromosome segregation. They then
randomly lose chromosomes to return to the haploid
state. Aneuploids carrying one extra chromosome
(n + 1) can be selected and stably maintained. Each of
the eight n + 1 aneuploids has a diagnostic phenotype
(Käfer and Upshall, 1973).

Figure 2 Micrographs of Aspergillus nidulans cell types.
(A) Light micrograph of vegetative hyphae; (B) scanning
electron micrograph of conidia, the asexual spores; and
(C) scanning electron micrograph of ascospores, the
sexual spores. (Figures (B) and (C) were kindly provided
by Drs KY Jahng, D-M Han, and YS Chung, and The
Korean Filamentous Fungi Study Group.)


Genetic System 


Meiosis 

Genetic crosses are made by mixing conidia from 
genetically marked strains and selecting for a hetero- 
karyon. Cleistothecia produced by the heterokaryon 
are microdissected, cleaned of adhering peripheral 
cells, crushed, and the ascospores are grown out. In 
any given cross, some cleistothecia will result from 
each parent’s self-fertilization. Recombinant cleisto- 
thecia are typically identified by incorporating muta- 
tions resulting in different, readily identifiable 
phenotypes, such as colony color. Once identified, 
ascospores from recombinant cleistothecia are grown 
in appropriate numbers to score for assortment of all 
segregating alleles. For fine-structure mapping, recombinants
across an interval can be obtained exclusively, and
in large numbers, by selecting for recombination between markers carried in repulsion.
Although tetrad dissection is possible, this procedure 
is technically very challenging and is not typically 
used. Centromeres have been mapped by mitotic 
recombination. 
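Scoring ascospore progeny from a recombinant cleistothecium comes down, for any pair of markers, to counting parental and recombinant classes. The Python sketch below shows that arithmetic; the marker names (a, b) and the counts are invented for illustration, and reading recombination frequency as map distance (1% = 1 map unit) is only a reasonable approximation for short intervals.

```python
from collections import Counter


def recombination_frequency(progeny, parental_1, parental_2):
    """Fraction of progeny whose two-locus genotype matches neither parent.

    progeny: iterable of (locus_A_allele, locus_B_allele) tuples.
    parental_1, parental_2: the two parental allele combinations.
    """
    counts = Counter(progeny)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no progeny scored")
    parental = counts[parental_1] + counts[parental_2]
    return (total - parental) / total


# Hypothetical cross a b+ x a+ b, with 100 ascospore colonies scored.
progeny = ([("a", "b+")] * 42 + [("a+", "b")] * 40 +
           [("a", "b")] * 9 + [("a+", "b+")] * 9)
rf = recombination_frequency(progeny, ("a", "b+"), ("a+", "b"))
print(f"RF = {rf:.2f}, roughly {100 * rf:.0f} map units")
```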


Mitosis 

Mitotic crosses are made by forming diploids, usually 
with a mutant strain to be tested and special strains 
containing single mutations on each chromosome that 
result in different, readily identifiable phenotypes. 
The diploid is then destabilized, and haploid sectors 
are selected, grown out, and scored for each phenotype, 
including that of the test strain. Mitotic crossing-over 
is rare. Therefore, the phenotype of the test strain will 
segregate in repulsion with the special strain mutation 
residing on the same chromosome as the unknown 
mutation (Table 1). All other special strain mutations 
will segregate 1:1. This is a rapid method for assigning 
new mutations to chromosomes. 

Mitotic crossing-over occurs at the four-strand 
stage (Figure 3). Therefore, in a heterozygous diploid, 
crossing-over between the centromere and the hetero- 
zygous locus, followed by chromosome reassortment, 
can produce homozygous diploids (see twin-spot 
analysis in Drosophila). The rate of production 
of homozygous diploids defines the mitotic genetic 
distance of a locus from the centromere. Meiotic 
and mitotic genetic distances cannot be compared 
directly. 
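Why crossing-over between the centromere and a heterozygous locus can yield homozygous diploids can be checked by enumerating how the four chromatids segregate. This Python sketch assumes a single exchange between the centromere and the locus, and random, independent segregation of the two chromatids of each homolog at mitosis; under those assumptions half of the segregation patterns give twin m/m and +/+ daughter nuclei, while the other half restore two heterozygotes.

```python
from itertools import product

# After replication and one exchange between the centromere and the locus:
# homolog A (originally m on both chromatids) now carries m and +,
# homolog B (originally + on both chromatids) now carries + and m.
HOMOLOG_A = ("m", "+")
HOMOLOG_B = ("+", "m")


def daughters(i: int, j: int):
    """Daughter genotypes when chromatid i of A and chromatid j of B go to
    daughter 1; their sister chromatids go to daughter 2."""
    d1 = tuple(sorted((HOMOLOG_A[i], HOMOLOG_B[j])))
    d2 = tuple(sorted((HOMOLOG_A[1 - i], HOMOLOG_B[1 - j])))
    return d1, d2


def is_homozygous(genotype) -> bool:
    return genotype[0] == genotype[1]


outcomes = [daughters(i, j) for i, j in product((0, 1), repeat=2)]
twin_fraction = sum(
    is_homozygous(d1) and is_homozygous(d2) for d1, d2 in outcomes
) / len(outcomes)

for d1, d2 in outcomes:
    print(d1, d2)
print("fraction giving homozygous twins:", twin_fraction)
```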


Table 1 Mitotic assignment of a mutation to a chromosomeᵃ

Marker on     Wild-type   Unknown
chromosome    strain      mutation
I             11          13
II            15           9
III           12          12
IV            10          14
V             13          11
VI            16           8
VII           24           0
VIII          14          10

ᵃA diploid was formed between a mitotic mapping strain,
containing a scorable marker on each chromosome, and a
strain containing a new mutation in an otherwise wild-type
background. Twenty-four haploids were obtained and
scored for presence of mapping strain markers and the
unknown mutation. The lack of assortment with the
chromosome VII marker shows that the new mutation
resides on that chromosome.
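The logic of Table 1 can be made explicit with a simple goodness-of-fit test of each row against the 1:1 ratio expected for markers on other chromosomes. The chi-square test and the 0.05 threshold are choices made for this Python sketch, not part of the original procedure, which relies on the complete lack of assortment seen for the chromosome VII marker.

```python
# Counts from Table 1 (24 haploids): (wild-type strain, unknown mutation).
TABLE_1 = {
    "I": (11, 13), "II": (15, 9), "III": (12, 12), "IV": (10, 14),
    "V": (13, 11), "VI": (16, 8), "VII": (24, 0), "VIII": (14, 10),
}

CHI2_CRITICAL_1DF_P05 = 3.841  # chi-square critical value, 1 degree of freedom, p = 0.05


def chi_square_1_to_1(a: int, b: int) -> float:
    """Goodness-of-fit chi-square against a 1:1 expectation, (a - b)^2 / (a + b)."""
    return (a - b) ** 2 / (a + b)


linked = [chrom for chrom, (a, b) in TABLE_1.items()
          if chi_square_1_to_1(a, b) > CHI2_CRITICAL_1DF_P05]
print(linked)
```

Only the chromosome VII row departs significantly from 1:1; the largest of the remaining values (chromosome VI, 16:8) gives a chi-square of about 2.7, below the threshold.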


Mutagenesis and Selection 

Mutations are typically induced in conidia by treat- 
ment with chemical mutagens, ultraviolet light, or 
ionizing radiation. Mutations resulting in growth or 
developmental abnormalities can be detected by direct 
observation. Nutritional deficiencies can be detected 
by replica-plating onto media containing or lacking 
nutritional supplements. Conditional mutations, such 
as temperature-sensitive mutations, can also be found 
by replica-plating. Unstable diploids, containing muta- 
tions in microtubule components, can be used to 
detect heterozygous lethal mutations on selected 
chromosomes because they result in complete segre- 
gation distortion in haploid derivatives. 


Transformation 

DNA-mediated transformation is accomplished by 
enzymatic removal of cell walls from hyphae in osmot- 
ically stabilized medium to form protoplasts, addition 
of DNA, protoplast fusion and regeneration, 
and selection for an introduced trait (Yelton et al., 
1984). Most often, the selectable trait complements a
mutational deficiency, e.g., an amino acid requirement,
in the recipient strain. However, antibiotic resistance
determinants can be used as well. Most transformation 
events involve recombination of the transforming 
DNA with the chromosomes, although autonomously 
replicating transformation vectors exist. Recombin- 
ation of incoming DNA with chromosomes generally 
follows the patterns established for S. cerevisiae. Thus, 
gene disruption and replacement are readily accomplished
in Aspergillus. Many methods have been
developed for cloning specific genes from libraries of
plasmids or cosmids by mutation complementation.

Figure 3 Genetic assortment following mitotic crossing-over in a heterozygous diploid. m = mutant, + = wild-type.


Maps 


1. Genetic map. An extensive recombinational map 
for Aspergillus has been assembled and is regularly 
updated (Clutterbuck, 1993). 

2. Physical map. A physical map of the genome has
been assembled, consisting of overlapping cosmid
clones (Prade et al., 1997; http://fungus.genetics.uga.edu:5080/).

3. DNA sequence. The DNA sequence can be
accessed by the public at GenBank under Emericella
nidulans, including the results of expressed
sequence tag (EST) sequencing projects. A draft
(3x shotgun) of the genomic sequence is expected to
be made available in 2001.*

*Ed.: Sequencing was done at Cereon Genomics. We expect to
make it available on a web site late in 2000 or early in 2001,
pending Monsanto approval.


Genetic Resources 

Aspergillus strains, genetic map, transformation vec- 
tors, genes, and the physical mapping resources are 
maintained by and available from the Fungal Genetics 
Stock Center (http://www.fgsc.net). 


Developmental Regulation 


Conidiophore development has been studied exten- 
sively because it is easy to control in the laboratory 
and is amenable to investigation by using molecular 
and genetic approaches. Figure 4 shows developing 
conidiophores, which consist of a basal foot cell, a 
stalk (S) terminating in a swollen vesicle (V), two 
layers of cells called metulae (M) and phialides (P), 
and long chains of conidia (C). 

Figure 4 Scanning electron micrographs of conidiophore development. (A) A conidiophore stalk (S) with a
developing vesicle (V). (B) Developing metulae (M) bud from the surface of the vesicle. (C) Developing uninucleate
phialides (P). (D) Mature conidiophore bearing conidia. (Figure was kindly provided by Reinhard Fischer,
Laboratorium für Mikrobiologie, Philipps-Universität Marburg and Max-Planck-Institut für Terrestrische Mikrobiologie,
Marburg, Germany.)

Figure 5 Pathways controlling conidiophore development in Aspergillus. Major regulatory events required for
conidiation are shown. Arrows indicate positive interactions, and blunt lines indicate negative interactions.

Hundreds of genes are selectively activated during
development. Many of these encode structural
proteins or enzymes that are responsible for the
differentiated functions of the various cell types making
up the conidiophore. Three major regulatory genes en- 
coding transcription factors that control development 
are brlA, abaA, and wetA (Figure 5). Expression of 
brlA initiates a self-sustaining cascade of events in 
which regulatory (abaA and wetA) and structural 
genes are sequentially activated at appropriate times 
in proper cell types. Initiation of conidiophore devel- 
opment is coordinated through the output of distinct 
signal transduction pathways involving fluffy (flu) 
genes that are required to sense environmental and 
cellular factors to inhibit or promote conidiation 
and growth. Under conditions that favor conidiation, 
these signal pathway outputs serve to activate brlA, 
thereby controlling the onset of development (Adams 
et al., 1998). 


Metabolic Regulation 


Aspergillus can use a wide range of compounds as sole 
carbon and/or nitrogen sources. The assimilation of
these sources is regulated by wide-domain and
pathway-specific transcriptional regulators. Utilization
of nitrogen sources is subject to nitrogen meta- 
bolite repression conveyed by the wide-domain 
positive regulator AreA, which is inactive in the 
presence of ammonium (or glutamine), the preferred 
nitrogen source. Utilization of carbon sources is 
subject to carbon catabolite repression by CreA, a 
negatively acting transcriptional regulator that 
becomes active in the presence of glucose. Utilization 
of compounds that can be used as both carbon and
nitrogen sources is under the control of both
regulators, such that gene transcription depends on
AreA only when CreA is acting as a repressor, that is,
in the presence of glucose. An example of a well-studied metabolic system is
the pathway of nitrate assimilation (Figure 6). The 
niaD and niiA genes code for nitrate reductase and 
nitrite reductase, respectively, which convert nitrate to 
ammonia. The transcription of these genes is induced 
by nitrate via the pathway-specific regulator NirA, 
and is also controlled by the wide-domain regulator 
AreA. The protein products of these two regulatory genes are both
required for activation of the structural genes. Because
NirA functions only when nitrate is in the medium, and AreA
only when ammonium is absent, niiA and niaD are transcribed
only if nitrate is present and ammonium is absent.

Figure 6 Regulation of nitrate assimilation.
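The regulatory logic described in this section can be summarized as a small Boolean model. The Python sketch below is a deliberate simplification: it assumes each signal is simply present or absent, ignores induction by the compound itself and any graded response, and uses illustrative function names. It encodes both the nitrate case and the rule stated for compounds used as both carbon and nitrogen sources.

```python
from itertools import product


def area_active(ammonium: bool) -> bool:
    # AreA, the positive nitrogen regulator, is inactive when ammonium (or glutamine) is present.
    return not ammonium


def nira_active(nitrate: bool) -> bool:
    # NirA, the pathway-specific regulator, functions only when nitrate is present.
    return nitrate


def crea_repressing(glucose: bool) -> bool:
    # CreA becomes an active repressor in the presence of glucose.
    return glucose


def nitrate_genes_on(nitrate: bool, ammonium: bool) -> bool:
    """niaD and niiA need both NirA and AreA."""
    return nira_active(nitrate) and area_active(ammonium)


def dual_source_genes_on(glucose: bool, ammonium: bool) -> bool:
    """Genes for a compound used as both C and N source need AreA only when CreA represses."""
    return (not crea_repressing(glucose)) or area_active(ammonium)


for nitrate, ammonium in product((False, True), repeat=2):
    print(f"nitrate={nitrate!s:5} ammonium={ammonium!s:5} "
          f"niaD/niiA on: {nitrate_genes_on(nitrate, ammonium)}")
```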


Cell-Cycle Controls 


Studies of cell-cycle controls in Aspergillus have 
complemented those carried out in S. cerevisiae and 
Schizosaccharomyces pombe (Morris and Enos, 1992). 
Identification of mutants that never enter (nim) or are 
blocked in (bim) mitosis led to the discovery of genes 
encoding protein kinases (e.g., NimA) and phospho- 
protein phosphatases (e.g., BimG) that participate in 
controlling progression through the cell cycle in all 
eukaryotes (see Cell Cycle). 


Additional Areas of Investigation 


Aspergillus has proven to be useful for studying other, 
less well-developed, topics of interest to biological 
scientists. These include: chromosome movement dur- 
ing mitosis; cytokinesis; organelle movement and 
localization; essential gene discovery; secondary 
metabolism; and protein secretion. 
Mating is said to be ‘assortative’ when there is a cor-
relation in phenotype between mating pairs. When the 
correlation is greater than zero, the mating system is 
positive assortative mating; when it is less than zero, 
the mating system is negative assortative mating (also 
called ‘disassortative mating’). 

Positive assortative mating differs from inbreed- 
ing. Whereas inbreeding affects all the genes in the 
organism, positive assortative mating affects only
those genes that determine the phenotypic character- 
istics on which mate choice is based, as well as genes 
that are genetically linked to them. 

In human beings, positive assortative mating is 
observed for height, skin color, IQ score, and certain 
other traits, although assortative mating varies in 
degree in different populations and is absent in some. 
As might be expected, positive assortative mating is 
found for certain socioeconomic variables. In one 
study in the United States, the highest correlation 
found between married couples was in the number 
of rooms in their parents’ homes. Negative assortative 
mating is apparently quite rare in human populations. 

The consequences of positive assortative mating are 
complex. They depend on the number of genes that 
influence the trait on which mate selection is based, 
on the number of different possible alleles of the 
genes, on the number of different phenotypes, on the 
sex performing the mate selection, and on the criteria 
for mate selection. Traits for which mating is assortative 
are rarely determined by the alleles of a single gene. 
Most such traits are polygenic, so reasonably realis- 
tic models of assortative mating tend to be rather invol- 
ved. The theoretical study of assortative mating for 
polygenic traits was pioneered by Fisher (1918), and 
the best modern treatment is in Crow and Felsenstein 
(1968). For a comprehensive review of the theory of 
both positive and negative assortative mating, see 
Crow and Kimura (1970). One qualitative consequence 
of positive assortative mating is seemingly obvious: 
since like phenotypes tend to mate, assortative mating 
generally increases the frequency of homozygous 
genotypes in the population at the expense of 
heterozygous genotypes. Although this is correct, the 
increase in homozygosity is trivial if the number of 
genetic factors influencing the trait is large. In contrast, 
the genetic variance is increased by a factor of approximately
1/(1 - r) at equilibrium, where r is the correlation
between mating pairs. For r = 1/4, which is close to
the correlation in height between husbands and wives,
the equilibrium genetic variance is increased by one-third (a factor of 4/3).
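The equilibrium approximation quoted above is easy to evaluate. This Python sketch simply computes 1/(1 - r) with exact fractions; it inherits the assumptions of the approximation (a polygenic trait at equilibrium under positive assortative mating) and adds nothing beyond the arithmetic.

```python
from fractions import Fraction


def variance_inflation(r: Fraction) -> Fraction:
    """Approximate equilibrium factor 1/(1 - r) by which positive assortative
    mating with mate correlation r inflates the genetic variance."""
    if not 0 <= r < 1:
        raise ValueError("r must satisfy 0 <= r < 1")
    return 1 / (1 - r)


factor = variance_inflation(Fraction(1, 4))
print(factor, factor - 1)  # 4/3 1/3
```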

Positive assortative mating is often thought to play a
role in the evolution of premating reproductive barriers
between closely related species whose habitats overlap.
Selection for positive assortative mating in this context is often
called the ‘Wallace effect,’ after Alfred Russel Wallace.
Intuitive support for the Wallace effect comes from the 
argument that, when hybrids have reduced viability or 
fertility, then organisms that choose mates from other 
than their own subpopulation are in effect wasting their 
gametes. Quantitatively, however, this effect is appar- 
ently quite small (Sawyer and Hartl, 1981). 

In animals, negative assortative mating appears 
to be less common than positive assortative mating.
One indisputable example, which occurs in all sexual
organisms, is that most matings are heterosexual. In 
certain species of Drosophila, a curious form of
negative assortative mating is the phenomenon called
‘minority male mating advantage,’ in which females mate
preferentially with males with rare phenotypes. For
example, in a study of experimental populations of
D. pseudoobscura containing flies homozygous for
either a recessive orange eye-color mutation or a
recessive purple eye-color mutation, Ehrman (1970)
found that, when 20% of the males were orange, the
orange-eyed males participated in 30% of the observed 
matings; conversely, when 20% of the males were 
purple, the purple-eyed males participated in 40% of 
the observed matings. 
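One way to express the size of the effect in Ehrman's example is as the ratio of a male type's share of matings to its frequency among males, where a ratio of 1 would mean no advantage. This index is introduced here only for illustration and is not a measure defined in the text; the Python sketch uses the percentages quoted above.

```python
def mating_advantage(percent_of_males: int, percent_of_matings: int) -> float:
    """Share of matings divided by share of males (1.0 means no advantage)."""
    if percent_of_males <= 0:
        raise ValueError("percent_of_males must be positive")
    return percent_of_matings / percent_of_males


orange = mating_advantage(20, 30)  # orange-eyed males at 20% of males
purple = mating_advantage(20, 40)  # purple-eyed males at 20% of males
print(orange, purple)  # 1.5 2.0
```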

There are many examples of negative assortative 
mating in plants. A classical example concerns a poly- 
morphism known as ‘heterostyly’ found in most 
species of primroses (Primula) and their relatives. 
The heterostyly polymorphism refers to the relative 
lengths of the styles and stamens in the flowers. Most 
populations of primroses contain approximately equal 
proportions of two types of flowers, one known as 
‘pin,’ which has a tall style and short stamens, and the 
other known as ‘thrum,’ which has a short style and 
tall stamens. In heterostyly, insect pollinators that 
work high on the flowers pick up mostly thrum pollen 
and deposit it on pin stigmas, whereas pollinators that 
work low in the flowers pick up mostly pin pollen and 
deposit it on thrum stigmas. Negative assortative mat- 
ing therefore takes place because pins mate preferen- 
tially with thrums. Additional floral adaptations 
facilitate the negative assortative mating. For example, 
pollen grains from pin flowers fit the receptor cells of 
thrum stigmas better than they do their own, and 
pollen grains from thrum flowers germinate better 
on pin stigmas than they do on their own. 

Negative assortative mating occurs in the form of 
RNase-based gametophytic self-incompatibility in 
the Rosaceae, Solanaceae, and Scrophulariaceae, and 
in the form of sporophytic self-incompatibility in the 
Brassicaceae. In Brassica, pollen specificity is encoded 
at the multipartite S-locus, a complex region contain- 
ing many expressed genes whose functions are largely 
unknown, but among whose products are the 
S-locus glycoprotein and the S-locus receptor kinase 
(McCubbin and Kao, 1999). Self-incompatibility 
results from the rejection of pollen grains that express 
S-locus specificities held in common with the seed 
parent. A current model of self-incompatibility in 
Brassica postulates that common specificities between 
pollen and seed parent activate a signal transduction 
pathway leading to rejection of the incompatible pol- 
len. Population studies suggest that the high levels 
of genetic diversity found at the S-locus reflect the
ancient origin of sporophytic self-incompatibility in
this group, which predates species divergence within 
the genus by a factor of 4 or 5 (Uyenoyama, 1995). 
This term usually implies independent assortment of
genes and means the same as independent segregation. 


See also: First and Second Division Segregation; 
Independent Segregation 
Ataxia telangiectasia (A-T) is an autosomal recessive
neurodegenerative disorder that becomes apparent
in children as soon as they begin to walk. The disorder
is progressive, so that by their early teens, patients 
have to rely on a wheelchair for mobility. Not only 
is the brain affected in A-T, but also the immune
system (patients can have quite serious infections), 
the liver, and some blood vessels. Patients are also at 
increased risk of developing cancer of the lymphoid 
system. One of the most remarkable features about 
these patients is that they are unusually sensitive to the 
killing effects of ionizing radiation. It is this feature
and its implications that have interested many
scientists. In short, the increased radiosensitivity, which can
be easily detected in cells from these patients, is an
indicator of a defective response to DNA damage;
therefore the gene for ataxia telangiectasia, called
ATM (for ataxia telangiectasia mutated), is important
for protecting the cells of all of us from some forms of
damage to our genetic material. How the ATM protein
does this, and how loss of ATM protein produces
the features of A-T including the tumors, are described
below. As approximately 0.5-1.0% of us carry mutations
in this gene, the effect the gene has in carriers,
i.e., in the heterozygous state, is also addressed below.
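The carrier frequency quoted above implies, under Hardy-Weinberg assumptions (random mating, with all pathogenic ATM alleles pooled into one class of frequency q), a frequency q² of affected homozygotes and compound heterozygotes. The Python sketch below solves 2q(1 - q) = H for q; the resulting figures are derived only from the 0.5-1.0% range given in the text and are not observed incidence data.

```python
import math


def allele_freq_from_carriers(h: float) -> float:
    """Solve 2q(1 - q) = h for the smaller root q (the rare-allele frequency)."""
    if not 0 < h <= 0.5:
        raise ValueError("heterozygote frequency must lie in (0, 0.5]")
    return (1 - math.sqrt(1 - 2 * h)) / 2


for h in (0.005, 0.01):
    q = allele_freq_from_carriers(h)
    print(f"carriers {h:.1%}: q = {q:.4f}, affected about 1 in {1 / q**2:,.0f}")
```

Because q is small, q is close to H/2, so the expected frequency of affected individuals scales roughly with the square of the carrier frequency.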


Features of Classical Ataxia 
Telangiectasia 


The major feature of A-T is progressive cerebellar 
degeneration beginning in infancy. The cerebellar 
abnormality results in ataxia, difficulties with speech, 
and also a characteristic abnormal eye movement
(oculomotor apraxia). There is also progressive oculocutaneous
telangiectasia (dilated blood vessels) first noted in the 
exposed bulbar conjunctivae, susceptibility to neopla- 
sia, and sinopulmonary infection. All patients show a 
deficiency of cell-mediated immunity, although defi- 
ciency of humoral immunity is more variable. All 
classical A-T patients show an increased level of 
chromosome translocations involving chromosomes 
7 and 14 in peripheral blood T cells; they also all 
show an increased cellular radiosensitivity, which can 
be measured in different ways. An elevated level of 
serum α-fetoprotein (AFP) is a consistent finding.

Classical A-T results from the total absence of
any functional ATM protein. At the gene level this
is the consequence of homozygosity or compound
heterozygosity for ATM null alleles, so that no
functional ATM protein is produced. Interestingly, therefore,
ATM is not an essential gene, and there must be
redundancy with at least one, and possibly more than
one, other protein.


ATM Protein 


The ATM gene spans 150 kb of genomic DNA
and encodes a ubiquitously expressed transcript of
approximately 13 kb derived from 66 exons. The
main promoter of ATM is bidirectional, and the single
open reading frame of the ATM gene encodes a 350 kDa
protein of 3056 amino acids. This protein shows
similarity at its C-terminal end to the catalytic domain of
phosphatidylinositol 3-kinase (PI-3 kinase). The PI-3 kinase
motif is common to a group of protein kinases 
involved in cell cycle regulation, response to DNA 
damage, interlocus recombination, and control of 
telomere length. 

ATM is principally a nuclear protein. Its expression
level and localization are not affected by the stage of the
cell cycle or by prior exposure of the cell to ionizing
radiation. Some ATM protein may also be located in the
cytoplasm. Although the complete inventory of ATM
functions is still to be established, it is known that ATM
has a role in activating the G1/S, S, and G2/M cell cycle
checkpoints following exposure to DNA damage. Most
distinctive is the S phase
defect, characterized by radioresistant DNA synthesis 
(RDS), in which A-T cells do not suppress DNA 
synthesis, as normal cells would, following exposure 
of cells to ionizing radiation. 

ATM is a serine/threonine protein kinase that is 
activated by exposure of cells to ionizing radiation. 
A major role of ATM is to regulate the p53 protein. 
ATM binds to and phosphorylates p53 on serine-15 
in vitro and in vivo. This phosphorylation is likely to 
enhance the ability of p53 to transactivate downstream 
genes such as p21 and MDM2. The triggering of pro- 
grammed cell death (apoptosis) is a normal physiolo- 
gical response to eliminate cells with levels of genetic 
damage too high to be repaired. Cells defective in ATM 
appear to be more resistant to ionizing radiation- 
induced apoptosis, although this appears to be a cell 
type-specific response. In addition to p53, several
other substrates for ATM protein kinase have been
identified, including β-adaptin, c-Abl, Nbs1, and
BRCA1. An amino acid consensus phosphorylation
sequence has been compiled and various other puta- 
tive substrates identified. 
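The text does not spell out the consensus itself. A widely used simplification is that ATM phosphorylates serine or threonine residues immediately followed by glutamine (S/T-Q motifs), and p53 serine-15 lies in such a motif. The Python sketch below scans a protein sequence for S/T-Q motifs as a first-pass way of flagging putative substrates. The p53 fragment is assumed here to be residues 1-26 of human p53 and should be checked against a sequence database; a motif match alone does not show that a site is phosphorylated.

```python
import re

SQ_TQ = re.compile(r"[ST]Q")  # serine or threonine immediately followed by glutamine


def sq_tq_sites(protein: str) -> list[int]:
    """Return 1-based positions of S or T residues that are immediately followed by Q."""
    return [m.start() + 1 for m in SQ_TQ.finditer(protein.upper())]


# Assumed N-terminal residues 1-26 of human p53 (verify against a sequence database).
P53_N_TERMINUS = "MEEPQSDPSVEPPLSQETFSDLWKLL"
print(sq_tq_sites(P53_N_TERMINUS))
```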

ATM-deficient cells also have a defect in stress res- 
ponse pathways so that, for example, c-Jun N-terminal 
kinase (JNK) activation following exposure of cells to 
ionizing radiation is defective. ATM is reported to be 
part of the BRCA1-associated genome surveillance 
complex (BASC). In addition to ATM, the hMre11/ 
hRad50/Nbs1 complex, and BRCA1, at least five 
other proteins are reported to be in this large complex, 
which appears to act by recognizing damage to the 
genetic material and also by repairing it. 


ATM Mutations and Phenotypic Variation 
in A-T Patients 


Mutations in A-T patients are scattered across 
the whole coding sequence of the ATM gene. The
majority of mutations are predicted to lead to the
premature termination and complete loss of the pro- 
tein, but a minority of leaky mutations can result in 
synthesis of a small amount of normal protein and 
missense mutations can express higher levels of 
mutant protein. Consequently, some patients may 
have sufficient ATM function to moderate their clin- 
ical and cellular phenotype. 


A-T-Like Disorder (ATLD) Caused by 
Mutation in the hMRE11 Gene


Occasionally, mutations in the ATM gene cannot be 
detected in patients who have ataxia telangiectasia. It 
is now known that some of these patients have
mutations in another gene, hMRE11. Interestingly, hMre11
is a DNA double-strand break repair protein and is
part of the hMre11/hRad50/Nbs1 protein complex
acting in the same DNA damage response pathway
as ATM. It has been shown that a combination of two
null alleles of hMRE11 (or hRAD50) is lethal. The
mutations described in patients, however, result in 
either a truncated protein or a full-length mutated 
protein with some residual function. 


Comparison with Nijmegen Breakage 
Syndrome 


For some time A-T was the only disorder in which 
increased radiosensitivity was a recognized part of the 
disorder. Subsequently, patients were described with 
the Nijmegen breakage syndrome (NBS) who also 
show increased radiosensitivity. These two disorders, 
and ATLD, show similar features at the cellular level, 
based mainly on their increased sensitivity to ionizing 
radiation. RDS is a hallmark of cells from all three 
disorders. 

However, clinical overlap between A-T and NBS is
only partial. Shared features include immunodeficiency and
an increased risk of lymphoid malignancies. Patients
with NBS show microcephaly and frequently
borderline mental retardation but do not develop
cerebellar degeneration or telangiectasia. As in A-T 
and ATLD they also show chromosome transloca- 
tions in peripheral lymphocytes with breaks at the 
sites of the T cell receptor genes. 

A clear biochemical link between double-strand 
break (DSB) repair and mammalian cellular responses 
to DNA damage was revealed by the observation that 
the product of the NBS1 gene, mutated in Nijmegen breakage syndrome,
functions in a complex with the highly conserved
DSB repair proteins hMre11 and hRad50. The subse- 
quent finding that hMRE11 mutations are associated 
with the clinical features of ataxia telangiectasia 
further links A-T to Nijmegen breakage syndrome. 


More recent work has shown that ATM is linked more 
directly to this repair complex since the ATM protein 
phosphorylates Nbs1 protein following exposure of 
cells to damage. 


Malignant Disease in Ataxia 
Telangiectasia Patients 


An increased risk of developing malignant disease is 
an important feature of A-T. Indeed approximately 
10-15% of all A-T patients develop a malignancy in 
childhood with the majority of these tumors being 
lymphoid in origin, including both B and T cell lym- 
phoid tumors as well as Hodgkin disease. The ATM 
gene defect appears to allow either a higher level of 
formation of illegitimate chromosome translocations, 
involving recombination of T cell receptor (TCR) 
genes, in T lymphocytes compared with non-A-T 
individuals or a lower rate of removal of these trans- 
locations. Adult patients develop T-cell prolympho- 
cytic leukemia arising from a proliferating T-cell clone 
marked by a translocation involving one of two genes, 
TCL1 or MTCP1. Younger patients develop T-cell 
acute leukemia or T-cell lymphoma and it is likely 
that the propensity for chromosome translocation 
contributes to these. 

ATM mutations in A-T patients with leukemia and 
lymphoma are scattered across the ATM gene suggest- 
ing that a single position within the ATM-coding 
sequence is unlikely to be associated with occurrence 
of leukemia or lymphoma in A-T patients. Disruption 
of the hMre11/hRad50/Nbs1 complex through muta- 
tions in the NBS1 gene in patients with NBS also 
results in a high frequency of lymphoma in these 
individuals. 

Other tumors seen in A-T patients at a higher 
frequency than normal include various epithelial cell 
tumors and brain tumors, and also breast cancer in a 
few families. 


Cancer Risks of ATM Mutation Carriers 
in A-T Families 


Although ATM mutations in A-T patients predispose 
to lymphoid tumors, the effect of the mutations may be
numerically more important in the heterozygous 
state. Approximately 0.5-1% of the population carry 
an ATM mutation and carriers in A-T families have 
been reported to have an increased relative risk of 
breast cancer of three- to fourfold. In contrast to studies 
with carriers in A-T families, studies of patients with 
sporadic breast cancer have not shown convincingly 
that ATM mutation carriers have an increased relative 
risk for breast cancer. 


ATM Mutation in Sporadic Tumors


ATM mutations also play a role in the development of 
some sporadic tumors including T-cell prolympho- 
cytic leukemia, B-cell chronic lymphocytic leukemia, 
and mantle cell lymphoma. 


Adenosine triphosphate (ATP) is the primary chemical
energy source used by cells to carry out energy- 
requiring reactions. Adenosine is a nucleoside in which 
the purine base adenine is covalently linked to the 1’ 
carbon of the sugar ribose. In the case of ATP, three 
phosphate groups are also present, bonded to the 5’ 
carbon of the ribose; therefore the ATP used by cells is 
adenosine 5’-triphosphate. The bond connecting the 
first phosphate group to the carbon is a phosphoester 
bond, and the bonds between the phosphate groups 
are phosphoanhydride bonds (formerly called
phosphodiester bonds). These latter two bonds are
often referred to as ‘high energy’ bonds because
considerable free energy is released when they are
hydrolyzed (a standard free-energy change, ΔG°′, of
about −30.5 kJ per mole). It is the hydrolysis
of these phosphoanhydride bonds, when appropri- 
ately coupled to other reactions, which drives many 
of the energy-requiring reactions in metabolism. In 
several cases both phosphoanhydride bonds must be
cleaved to make the coupled synthetic reaction
possible. In this case the reaction is of the form
ATP → AMP + inorganic pyrophosphate (PPi), and the
pyrophosphate is subsequently hydrolyzed to two
molecules of inorganic phosphate by pyrophosphatase.
ATP is a substrate in many reactions.
In some cases the AMP portion of ATP is in- 
corporated directly into the final product, as in RNA 
synthesis (and as deoxy ATP in DNA synthesis), but 
in most others ATP is simply used to supply energy to 
the overall process. 
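As a worked illustration of coupling, standard free-energy changes of reactions that share intermediates add together. Suppose, purely hypothetically, that a condensation A + B → AB has ΔG°′ = +20.0 kJ per mole; on its own it is unfavorable, but if an enzyme couples it to ATP hydrolysis (−30.5 kJ per mole, the value given above) the overall reaction becomes favorable. The species A and B and the +20.0 kJ per mole figure are illustrative assumptions, not data from this entry.

```latex
\begin{aligned}
\mathrm{A} + \mathrm{B} &\rightarrow \mathrm{AB} + \mathrm{H_2O}
  && \Delta G^{\circ\prime}_1 = +20.0\ \mathrm{kJ\,mol^{-1}} \\
\mathrm{ATP} + \mathrm{H_2O} &\rightarrow \mathrm{ADP} + \mathrm{P_i}
  && \Delta G^{\circ\prime}_2 = -30.5\ \mathrm{kJ\,mol^{-1}} \\
\text{sum: } \mathrm{A} + \mathrm{B} + \mathrm{ATP} &\rightarrow \mathrm{AB} + \mathrm{ADP} + \mathrm{P_i}
  && \Delta G^{\circ\prime}_{\text{total}} = +20.0 - 30.5 = -10.5\ \mathrm{kJ\,mol^{-1}}
\end{aligned}
```

In practice the coupling depends on the enzyme forming a shared intermediate (for example, a phosphorylated form of one reactant); simply mixing the two reactions in solution would not couple them.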

Most anabolic metabolism in the cell is powered 
by ATP. ATP is used in protein biosynthesis, nucleic 
acid synthesis, as well as the synthesis of lipids and 
carbohydrates. It is also involved in the active trans- 
port of molecules and ions, muscle contraction and 
many other processes. ATP is not used by cells to
store energy but to capture it and transfer it.
Consequently, the actual concentration of ATP in a
cell remains only in the millimolar range even when
ATP is being synthesized in prodigious amounts;
rather than accumulating, it is continually broken
down and must be resynthesized.

ATP is generated by adding an inorganic phosphate 
group (Pi) to adenosine 5’-diphosphate (ADP). The
energy required to carry out this reaction comes 
from the energy-releasing reactions of respiration, 
photosynthesis, or fermentation. In respiration and 
photosynthesis, the production of ATP is carried out 
by the membrane-bound enzyme ATP synthase. In 
both processes the ATP synthase produces ATP from ADP
and Pi using the energy from dissipation of the
proton motive force that is generated across the
membrane. The ATP synthases of both prokaryotes and
eukaryotes are complex structures. They contain
subunits that are rotated by the proton motive force, and
this rotation (mechanical energy) is converted to
chemical energy during the synthesis of ATP. The
reaction is fully reversible, and thus ATP can be broken
down by the same enzyme, acting as an ATPase, to
generate a proton motive force.
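To make the proton motive force concrete, the sketch below uses the standard relation in which the force (Δp, in volts) combines the electrical potential across the membrane (Δψ) with a chemical term proportional to the pH difference (ΔpH). Only magnitudes are used, to sidestep sign conventions. Multiplying Δp by the Faraday constant gives the free energy released per mole of protons flowing back through ATP synthase. The membrane values (0.14 V, 0.75 pH units) and the 50 kJ per mole assumed as the cost of making ATP under cellular conditions are illustrative inputs, not figures from this entry, and the function names are hypothetical.

```python
import math

FARADAY = 96485.0      # Faraday constant, C/mol
GAS_CONSTANT = 8.314   # gas constant, J/(mol K)


def proton_motive_force(delta_psi_v, delta_ph, temp_k=298.15):
    """Magnitude of the proton motive force (V) from the magnitudes of the
    membrane potential (V) and of the pH difference across the membrane."""
    volts_per_ph_unit = 2.303 * GAS_CONSTANT * temp_k / FARADAY  # about 0.059 V at 25 C
    return delta_psi_v + volts_per_ph_unit * delta_ph


def energy_per_mol_protons(pmf_v):
    """Free energy (J/mol) released when one mole of protons flows back across the membrane."""
    return FARADAY * pmf_v


def min_protons_per_atp(pmf_v, atp_cost_j_per_mol):
    """Smallest whole number of protons whose combined energy covers one ATP."""
    return math.ceil(atp_cost_j_per_mol / energy_per_mol_protons(pmf_v))


# Illustrative inputs (assumptions): 0.14 V membrane potential, 0.75 pH units,
# and 50 kJ/mol as the cost of making ATP under cellular conditions.
pmf = proton_motive_force(0.14, 0.75)                  # about 0.184 V
print(round(energy_per_mol_protons(pmf) / 1000, 1))    # 17.8 (kJ per mol of H+)
print(min_protons_per_atp(pmf, 50_000))                # 3
```

Because one ATP costs more energy than a single proton can deliver under these assumptions, several protons must pass through the enzyme for each ATP made, which is consistent with the rotary coupling described above.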

In respiration the production of ATP is termed 
oxidative phosphorylation. Respiration involves the 
oxidation of organic (or inorganic) compounds coupled
to the reduction of a terminal electron acceptor such as
oxygen. In this case, oxidation-reduction reactions in a
series of membrane-associated electron carriers
generate a proton motive force across a membrane. The
dissipation of this force is coupled by the membrane-bound
ATP synthase to the synthesis of ATP from ADP. In
eukaryotic cells these reactions take place in the inner
mitochondrial membrane, whereas in prokaryotes they take
place in the cytoplasmic membrane. Respiration
involves a more complete oxidation of the substrate
than does fermentation and thus yields considerably
more ATP.

In photosynthesis the process of forming ATP is 
termed photophosphorylation. Here, light energy is 
converted to chemical energy (in the so-called ‘light 
reactions’) and the chemical energy is trapped in the 
terminal phosphoanhydride bond in ATP. As in 
respiration, a proton motive force is generated by 
electron transport and is used by ATP synthase to 
generate ATP. However, instead of oxidizing organic 
or inorganic substances as in respiration, in photo- 
synthesis energy is obtained from light. In plants 
these reactions take place in chloroplasts. 

ATP is also produced in fermentation, a process 
that occurs in the absence of added terminal electron
acceptors like oxygen and does not involve membrane- 
mediated events. In this case, ATP is synthesized 
by substrate level phosphorylation during specific 
steps in the fermentative pathway; that is, ADP is 
phosphorylated by specific enzymatic steps in catab- 
olism. Fermentation yields much less ATP than does 
respiration because in the absence of a terminal electron
acceptor the fermented organic material cannot be 
completely oxidized. 


See also: Adenosine Phosphates; cAMP and Cell 
Signaling; Mitochondria; Nucleotides and 
Nucleosides 


att sites are loci on either phage or bacterial DNA at
which integration of phage DNA into, or excision of
phage DNA from, the bacterial chromosome takes place.


See also: Excision Repair; Integration 


Attached-X and other 
Compound Chromosomes 


A compound chromosome is one that has a nonstandard
set of arms attached to one centromere. For
example, in Drosophila melanogaster, the second 
chromosome is a metacentric, with left and right 
arms; the normal second chromosome can therefore
be symbolized as 2L-centromere-2R. A chromosome 
that is instead composed of 2L-centromere-2L is 
called a ‘compound chromosome,’ specifically here 
‘compound 2 left,’ and symbolized C(2L). Any pair
of arms can be attached to the same centromere, 
subject only to the ingenuity of the experimenter in 
arranging that the resulting fly (or other organism) 
be euploid, therefore alive, but compounds of the 
type 2L-centromere-3L are more usually classified 
as ‘translocations,’ although they are also termed ‘het- 
erocompounds.’ Indeed, a (homo) compound such as 
C(2L) is a translocation — between homologs — but 
the genetic consequences of compounds like C(2L) 
(see below) are sufficiently different from the genetic 
consequences of translocations to justify a distinctive 
name for them. Attached-X chromosomes are simply 
compounds involving the two X chromosomes and, in 
flies, are more properly symbolized C(1). 


Viability Considerations 


Most organisms are intolerant of aneuploidy; hetero- 
zygous deficiencies for small regions are compatible 
with viability, as are slightly larger duplications, but 
in general deficiencies or duplications for whole arms 
cause lethality. The above C(2L) chromosome is 
therefore viable only in a fly that simultaneously lacks
a normal second chromosome and has two copies of
the right arm of chromosome 2, for example as C(2R).
A C(2L);C(2R) fly is euploid (it has two and only two
copies of each gene) but is sterile in crosses to flies
with normal chromosomes (Figure 1A) because all
progeny are grossly aneuploid. C(2L);C(2R) flies are,
however, fertile when crossed to other C(2L);C(2R) flies,
since complementary gametes are produced (Figure 1B).
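The viability argument can be checked by simple bookkeeping of arm dosages. In the sketch below each gamete is reduced to its number of copies of the 2L and 2R arms, and a zygote counts as euploid only if it has exactly two of each. The equal frequencies of the four gamete classes follow what the section on meiotic segregation states for C(2L);C(2R) males; applying them to both parents is a simplifying assumption, and the names are illustrative.

```python
from fractions import Fraction
from itertools import product

# A gamete is summarized by its dosage of the chromosome-2 arms: (copies of 2L, copies of 2R).
NORMAL_GAMETES = {(1, 1): Fraction(1)}  # every gamete carries one normal chromosome 2

# C(2L);C(2R) parent, assuming the four classes are equally frequent
# (as described for males; females segregate the compounds more regularly).
COMPOUND_GAMETES = {
    (2, 2): Fraction(1, 4),  # both C(2L) and C(2R)
    (2, 0): Fraction(1, 4),  # C(2L) only
    (0, 2): Fraction(1, 4),  # C(2R) only
    (0, 0): Fraction(1, 4),  # neither compound
}


def euploid_fraction(mother, father):
    """Fraction of zygotes with exactly two copies of both 2L and 2R."""
    total = Fraction(0)
    for (egg, p_egg), (sperm, p_sperm) in product(mother.items(), father.items()):
        dose_2l = egg[0] + sperm[0]
        dose_2r = egg[1] + sperm[1]
        if dose_2l == 2 and dose_2r == 2:
            total += p_egg * p_sperm
    return total


print(euploid_fraction(COMPOUND_GAMETES, NORMAL_GAMETES))    # 0   (no euploid progeny)
print(euploid_fraction(COMPOUND_GAMETES, COMPOUND_GAMETES))  # 1/4 (complementary gametes)
```

Because every egg class has exactly one complementary sperm class, replacing the mother's frequencies with the roughly 90:10 distribution described for females still leaves one quarter of the zygotes euploid when the father's four classes are equally frequent.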

The situation for attached-Xs in flies is slightly 
different, because aneuploidy for the Y chromosome 
has little effect on viability and no effect on sex. Thus, 
crossing a C(1)/0 female with a normal X/Y male gives 
viable C(1)/Y females and X/0 males (Figure 1C) as
50% of the total zygotes (aneuploidy for the X is 
lethal), although the X/0 males are sterile for lack of 
a Y. Crossing that C(1)/Y daughter back to a normal 
X/Y male now gives C(1)/Y daughters and X/Y (fer- 
tile) sons, which form a viable stock. 
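The same kind of bookkeeping reproduces the attached-X crosses. This sketch assumes equally frequent gametes and, following the text, treats any zygote without exactly two X doses (female) or one X dose (male) as lethal. Sex is read from the X dose alone, since the Y affects neither viability nor sex here; the Y matters only for male fertility. The chromosome labels and function names are illustrative.

```python
from itertools import product

# Illustrative labels: "C1" is an attached-X (two X doses on one centromere),
# "X" a free X chromosome, "Y" a Y chromosome. A gamete is a tuple of labels.
X_DOSE = {"C1": 2, "X": 1, "Y": 0}


def classify(zygote):
    """Classify a zygote from its X dose; other X doses are treated as lethal, as in the text."""
    x_dose = sum(X_DOSE[c] for c in zygote)
    if x_dose == 2:
        return "female"
    if x_dose == 1:
        return "fertile male" if "Y" in zygote else "sterile male"
    return "lethal"


def cross(eggs, sperm):
    """Pair every egg with every sperm (all assumed equally frequent)."""
    return [(egg + s, classify(egg + s)) for egg, s in product(eggs, sperm)]


# C(1)/0 female x X/Y male: half the zygotes survive.
first = cross([("C1",), ()], [("X",), ("Y",)])
# C(1)/Y daughter backcrossed to an X/Y male: a self-maintaining stock.
stock = cross([("C1",), ("Y",)], [("X",), ("Y",)])
for zygote, outcome in first + stock:
    print(zygote, outcome)
```

In the stock cross every surviving son carries the X contributed by his father, which is the patroclinous inheritance discussed later in this entry.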

Should one wish to keep C(1) females without a 
free Y, males carrying an attached XY (C(1;Y)) would 
be used. 


Meiotic Segregation in 
Compound-Bearing Flies 


In Drosophila males, compound chromosomes
segregate independently of each other (see caveat
below), as expected since they are not homologs;
in Drosophila females, however, even though com- 
pound chromosomes are not homologs and do not 
cross over with each other, they nevertheless segregate 
regularly from each other via the backup distributive 
segregation system (see Segregation). Consequently, 
although a C(2L);C(2R) male produces all four types 
of sperm in equal frequency, a C(2L);C(2R) female 
produces ca. 90% of eggs with either C(2L) or 
C(2R) and only ca. 10% with both or neither. Simi- 
larly, a C(1)/Y female produces primarily eggs with 
the C(1) or the Y. 


Construction of Compounds 


Getting the first autosomal compound chromosome 
as a viable fertile fly was decidedly tricky, but once 
an autosomal compound exists, getting more (of 
defined genotypes) is easy: treat a normal female with 
X-rays and cross to a male carrying compounds. Only 
progeny carrying a new compound will be viable. 
These new compounds are found to have been caused 
by a translocation-type event between the left arm 
of one homolog and the right arm of the other 
(Figure 2), and most of the recovered new compounds 
have both breaks in the proximal heterochromatin 
(Figure 2A); this is because breaks in the euchromatin 
give compounds that are aneuploid, and here, too, 
very much aneuploidy (of euchromatic genes) is lethal 
(aneuploidy for heterochromatin has very little effect). 
Compounds that have a little bit of one arm
hyperploid are, however, viable, and interestingly these
do show preferential segregation of C(L) from C(R) in
male meiosis; this strongly suggests that whatever it is that
achiasmate Drosophila males use to direct autosomal
homologs to opposite poles at meiosis I, it involves
euchromatic homology and ignores heterochromatin.

Attached-Xs are easier to construct in Drosophila,
since a new C(1)-bearing egg is viable if fertilized with 
a Y-bearing sperm; however, just X-raying a normal 
X/X female yields very few C(1)s, because the right 
arm of the very nearly telocentric X is a very small 
target indeed. Consequently, building an attached-X is 
a two-step process; first, an X-centromere-Y arm 
chromosome is generated by a translocation between 
the extensive proximal X heterochromatin and either 
arm of the Y (Figure 2B), then a second translocation 
is induced between the other Y arm and the proximal 
heterochromatin of a second normal X. Attached-Xs 
therefore usually bear a Y-chromosome centromere. 


Types of Compounds 


The arrangement of the genetic material of the
two arms of a compound chromosome is not fixed:
the two arms may be in the same left-right orientation
(tandem, T) or in opposite left-right orientation
(reversed, R); the centromere may be in the middle
(metacentric, M) or near one end (acrocentric, A); and the
ends may be free (no special symbol) or joined to make
a ring chromosome (another ‘R,’ unfortunately; R(1)
is a ‘single’ X ring). There are therefore six basic
compound chromosomes possible (TM, TA, RM, RA,
TR, and RR), and all six have been made for the X in
Drosophila and the consequences of crossing-over
within them studied (see Lindsley and Zimm, 1992).
Basically, reversed compounds synapse as a rod
(Figure 3) and crossovers occur freely and have
no consequences other than changing genotype
(chiasmata resulting from such crossovers have no
role in directing meiosis I segregation, since they join
homologs attached to the same kinetochore).
Tandem compounds synapse as a circle (Figure 3),
and crossovers occur freely and can generate
single-ring chromosomes.

… parent; and (C) normal XY male. Solid line, euchromatin;
wavy line, heterochromatin; circle, centromere. Lethal
classes are crossed out with dashed lines.

Figure 2 Construction of compound chromosomes. Solid line,
euchromatin; wavy line, heterochromatin; circle, centromere;
short arrows, sites of breakage; cross lines, sites of rejoining.

Figure 3 Synapsis of reversed and tandem compounds.
Solid line, euchromatin; wavy line, heterochromatin;
circle, centromere.

Figure 4 Generation of isochromosomes. Solid line,
euchromatin; wavy line, heterochromatin; circle and
semi-circle, centromere.


Isochromosomes 


Isochromosomes are compound chromosomes that 
have been generated by centromere misdivision 
(Figure 4); the two arms are therefore always identical, 
since they are derived from sister chromatids rather 
than from homologs. Centromere misdivision is 
common in some plants (e.g., wheat) whenever a uni- 
valent chromosome goes through meiosis, but if it 
occurs at all in Drosophila, it is very rare; compound 
chromosomes in flies (and worms) are therefore not 
isochromosomes. 


Genetic Consequences of Compound 
Chromosomes 


In ordinary crosses, progeny get one homolog from 
the paternal parent and the other from the maternal 
parent. Crosses involving compounds, however, give 
progeny who have received both homologs from 
one parent. This is most striking for the X. In ordinary 
crosses (X/X female by X/Y male), males get their single 
X from their mothers and are therefore ‘like’ them 
(matroclinous, like mother). In crosses of C(1) females 
by X/Y males, sons get their single X from the father and 
are therefore ‘like’ him (patroclinous, like father) and 
daughters get their two Xs from their mother (are 
matroclinous), rather than one from each parent. 


Attachment sites are relatively short DNA sequences,
roughly 20 to 250 base pairs long, that serve as
the targets for site-specific recombination reactions,
generally between a bacterial chromosome and a 
bacteriophage (phage) genome. Site-specific recombin- 
ation reactions are used for very many biological 
functions, but the reaction first studied and the one 
which gave the name attachment site to these se- 
quences is used by the temperate bacteriophage lambda 
to insert its genome into the chromosome of Escher- 
ichia coli to generate a lysogen (a bacterium which 
carries an integrated prophage; Figure 1). Many tem- 
perate phage use chromosomal integration as the 
mode of establishing the lysogenic state. The recom- 
bination event occurs between a unique locus on the 
phage genome (attP) and a unique locus on the
bacterial chromosome (attB). The recombination event
generates hybrid recombination sequences at the two
boundaries between the phage and bacterial genome; 
these are known as the left and right attachment sites 
(attL and attR). Once in the chromosome, the 
prophage represses most of its genes, and remains
“quiescent” as long as its host is growing well and is
not suffering DNA damage. In response to DNA
damage, the prophage can excise from the chromosome
of the host bacterium by a second site-specific
recombination reaction, between attL and attR, called
excision. The excision reaction regenerates the original
attP and attB sequences and separates the phage and
host genomes, permitting the phage to enter its lytic
replication cycle. When lysogeny of phage lambda was
first discovered, the nature of the association between
phage and host bacterium was not clear, but it was
observed that their chromosomes were somehow
“attached.” In 1962, Allan Campbell proposed the
recombination model shown in Figure 1, and all
genetic and biochemical data obtained since his
proposal have shown that the Campbell model is correct.

Figure 1 Insertion of phage lambda into the
chromosome of Escherichia coli.
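The Campbell model can be sketched as operations on a list of labeled segments. In this toy version, an illustration rather than a real sequence, attB is written B O B' and attP is P O P', with O the shared core. A single crossover within O inserts the circular phage genome and creates attL (B O P') and attR (P O B'), and a crossover between attL and attR reverses it. The flanking labels gal and bio reflect the genes on either side of the lambda attB site in E. coli; the phage segment names g1 and g2 and all function names are hypothetical, and real recombination exchanges DNA strands inside the core rather than moving whole labeled segments.

```python
def open_circle_after_core(circle, core="O"):
    """Rotate a circular genome so that it ends with its (single) core segment."""
    i = circle.index(core)
    return circle[i + 1:] + circle[: i + 1]


def integrate(chromosome, phage_circle, core="O"):
    """Campbell-type integration: one crossover within the core shared by attB and attP."""
    i = chromosome.index(core)                          # core of attB
    phage = open_circle_after_core(phage_circle, core)  # reads P' ... P O
    # Left part ending in B O, then the opened phage, then B' and the rest of the chromosome.
    return chromosome[: i + 1] + phage + chromosome[i + 1:]


def excise(lysogen, core="O"):
    """Reverse reaction: a crossover between the cores of attL and attR."""
    first = lysogen.index(core)               # core of attL (B O P')
    second = lysogen.index(core, first + 1)   # core of attR (P O B')
    phage_circle = lysogen[first + 1: second + 1]
    chromosome = lysogen[: first + 1] + lysogen[second + 1:]
    return chromosome, phage_circle


bacterial = ["gal", "B", "O", "B'", "bio"]  # attB = B O B' between gal and bio
phage = ["g1", "P", "O", "P'", "g2"]        # circular; attP = P O P'

lysogen = integrate(bacterial, phage)
print(lysogen)  # ['gal', 'B', 'O', "P'", 'g2', 'g1', 'P', 'O', "B'", 'bio']
restored, released = excise(lysogen)
print(restored == bacterial)  # True
```

Excision returns the phage as the same circle (the list is only a rotation of the starting one) and restores attB exactly, as the text describes.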


The Function and Structure of the 
att Site 


The proteins that perform the catalytic steps of recom- 
bination are called integrases or Int proteins, and are 
encoded by the phage. These enzymes perform many 
functions and are extremely interesting proteins. First, 
they have to recognize the attachment sites, and 
almost always do so by binding to repeats within the 
att sites that are inverted with respect to each other 
(Figure 2). These repeated recognition sites are separ- 
ated by a short spacer, known as the overlap region. 
Int breaks and rejoins the DNA strands of the att sites, 
one at a time, at the edges of the overlap: the top strand 
of the DNA is cleaved at the left of the overlap and 
the bottom strand at the right. The sequence of the 
overlap must be identical between the recombination 
partners; sequence differences within the spacer 
severely lower the recombination efficiency, as much 
as 20-fold for a single base difference between the attB 
and attP partners. This stringent requirement for
identity within the overlap acts as a check on whether the
recombination is taking place between the correct 
partner sequences rather than between sequences 
that merely resemble the genuine att sites. 
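The core logic described above, a pair of inverted Int recognition sites flanking an overlap that must match between partners, can be expressed as a small check. The sketch assumes an idealized core made of an 8-bp recognition site, a 7-bp overlap, and the reverse complement of the first site. The sequences are invented for illustration, real core sites are typically imperfect inverted repeats, and the function names are hypothetical. The check reports whether the recognition sites form an inverted repeat and which overlap positions differ between two partners; according to the text, even a single such mismatch can lower recombination as much as 20-fold.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of a DNA sequence (A/C/G/T only)."""
    return seq.translate(COMPLEMENT)[::-1]


def split_core(core, site_len=8, overlap_len=7):
    """Split an idealized core into (left site, overlap, right site)."""
    if len(core) != 2 * site_len + overlap_len:
        raise ValueError("unexpected core length")
    return core[:site_len], core[site_len:site_len + overlap_len], core[site_len + overlap_len:]


def is_inverted_repeat(core):
    """True if the right recognition site is the reverse complement of the left one."""
    left, _, right = split_core(core)
    return right == reverse_complement(left)


def overlap_mismatches(core_a, core_b):
    """Positions (0-based) at which the overlaps of two partner cores differ."""
    _, overlap_a, _ = split_core(core_a)
    _, overlap_b, _ = split_core(core_b)
    return [i for i, (x, y) in enumerate(zip(overlap_a, overlap_b)) if x != y]


# Invented sequences: left site + overlap + reverse complement of the left site.
attB_core = "CTTAAGCA" + "TTTATAC" + "TGCTTAAG"
attP_core = "CTTAAGCA" + "TTTATAC" + "TGCTTAAG"
mutant_core = "CTTAAGCA" + "TTTGTAC" + "TGCTTAAG"  # one base changed in the overlap

print(is_inverted_repeat(attB_core))               # True
print(overlap_mismatches(attB_core, attP_core))    # []
print(overlap_mismatches(attB_core, mutant_core))  # [3]
```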

The overlap and inversely repeated recognition 
sites for Int constitute the core of the att site; in fact 
the simplest att sites, such as the E. coli attB for phage 
lambda, consist only of this core region. In contrast, 
the attP sites have extra DNA flanking the core 
region. These flanking sequences, or arms, contain 
additional binding sites for Int as well as binding 
sites for accessory proteins required to help Int per- 
form the recombination reaction (Figure 2). The Int 
binding sites in the core of the att site are known as 
core sites, while those in the flanking regions are 
known as arm sites. The arm and core sites have dif- 
ferent sequences, and Int binds them with different 
domains. Moreover, the arm sites have a relatively 
higher affinity for Int than the core sites do, and are 
the place where Int first contacts the att site DNA. 
While the arm-binding domain of Int touches the 
flanks of the att sites, the catalytic domain of Int 
must be delivered to the core sites. This is accom- 
plished with the help of helper proteins that bind to 
the arm region of the att sites between the Int arm and 
core binding sites. One of these helper proteins is the 
host-encoded integration host factor (IHF), a protein 
which binds DNA at specific sequences and causes 
sharp, almost hairpin turns in the DNA. As seen in 
Figure 2, IHF binding sites separate the arm and core 
binding sites in every att site which contains flanking 
sequences in addition to the core region. The IHF 
protein is, as its name implies, encoded by the host, 
and was discovered in the late 1970s because of its role 
in site-specific recombination. However, it is found in 
most gram-negative bacteria and profoundly influ- 
ences the structure of the bacterial genome and the 
expression of as many as 20% of bacterial genes, either 
by helping to repress or to activate them. 

In the case of lambda and related phages, two more 
helper proteins bind to two of the four att sites, attP 
and attR, and play an important role in ensuring that 
recombination proceeds in a directional fashion. 
These are the phage-encoded excisionase (Xis) protein
and the host-encoded factor for inversion stimulation (FIS).
When the host is suffering DNA damage, it is import- 
ant that the phage, once excised, does not reinsert into 
the chromosome since this would expose it to more 
damage. By the same token, once a phage has managed 
to infect a rather lonely bacterium barely eking out a 
living in a nutrient-poor environment, it would be 
best if it remained integrated rather than mistakenly 
excising and setting off a lytic infection which would 
kill one of the only hosts around. The Xis protein is 
required for excision but not for integration (Figure
2). In fact, together with FIS, it inhibits integration. 
These two properties help ensure that recombination 
results in the appropriate outcome for the phage. Xis 
and FIS appear to adjust the curvature of the DNA to 
make recombination between attR and attL much 
more efficient than it would otherwise be. Because, 
in E. coli, high levels of FIS are made only during 
exponential growth, excision is coupled to this growth 
phase; FIS signals a healthy cell with enough biosyn- 
thetic capacity to make lots of phage particles during 
the lytic cycle. 
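The directionality rules in this section can be summarized as a small decision function. This is a qualitative sketch, not a biochemical model. It assumes that Int and the helper protein IHF are needed for both reactions, that Xis is required for excision and inhibits integration (the text says it does so together with FIS; treating Xis alone as inhibitory is a simplification here), and that FIS makes excision more efficient. The function and its return strings are hypothetical.

```python
def recombination_outcome(partners, proteins):
    """Qualitative outcome of lambda site-specific recombination (simplified)."""
    present = set(proteins)
    if not {"Int", "IHF"} <= present:
        return "no recombination"
    if set(partners) == {"attP", "attB"}:
        # Xis (with FIS) inhibits integration; Xis alone is treated as inhibitory here.
        return "integration inhibited" if "Xis" in present else "integration"
    if set(partners) == {"attL", "attR"}:
        if "Xis" not in present:
            return "no recombination"  # Xis is required for excision
        return "efficient excision" if "FIS" in present else "excision"
    return "no recombination"


# Establishing lysogeny: no Xis, so integration proceeds.
print(recombination_outcome(["attP", "attB"], ["Int", "IHF"]))
# Prophage excision during exponential growth, when FIS is abundant.
print(recombination_outcome(["attL", "attR"], ["Int", "IHF", "Xis", "FIS"]))
```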


Location of att Sites in the Host 
Chromosomes 


Phages would presumably be best served by recombin- 
ation with short sequences that are unobtrusive as far 
as the host cell is concerned, and this is true of the 
location of the lambda attB site in the E. coli genome, 
between transcription units. However, most attB sites, 
including those of many E. coli phages, the Salmonella 
phage Gifsy-1, and the Haemophilus influenzae phage 
HP1, occur within coding regions, many for essential 
genes! In the case of some phages the attB sites are 
within tRNA genes; the attB site of the Salmonella 
Gifsy-1 phage is within the lepAB operon. Integration
of the phage into these att sites would be expected to 
disrupt these coding regions and kill the host, were it
not for the fact that the attP versions of the att sites are
such that integration of the phage regenerates the 
intact coding sequence. That is, the phages in question 
carry a portion of the essential gene’s coding region 
within their genomes, and recombination between 
attB and attP generates a short duplication of bacterial 
sequences: one copy encodes the intact gene, the sec- 
ond a partial copy (Figure 3). Why is this seemingly 
awkward situation advantageous to the phage?
Presumably, this coincidence of attB sequences with
essential genes ensures that the att site sequence
will be maintained intact over the course of evolution.
Although lysogeny does confer some advantages to 
the lysogen, this state nevertheless carries great potential
risk for the host bacterium: it harbors the equivalent of a
ticking time-bomb. A cell in which the attB sequence 
has been deleted will be immune to being lysogenized. 
However, if the phage recombines with an att site that 
is part of an essential gene for the host, it is no longer 
possible for the host to survive deletion of the att site. 
This provides a very strong selection within the popu- 
lation of host bacteria to maintain an intact att site. 
The need for inversely repeated sequences within the 
core region is fulfilled within tRNA genes since these 
genes encode structures which themselves contain 
inversely repeated sequences. 
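How integration into a gene can leave that gene intact is easiest to see with toy sequences. The sketch below assumes, as an idealization, that attB lies inside a bacterial gene and that the phage carries its own copy of the part of the gene lying to the right of the core. The sequences, the placeholder letters x, y and p for flanking and phage DNA, and the function name are all hypothetical. After a single crossover in the core, the gene reads through intact into the phage-borne copy, and a partial copy lacking the gene's left part is left on the other side of the prophage.

```python
# Hypothetical toy sequences (not real att sites).
gene_left = "ATGGCTAAA"    # part of the bacterial gene to the left of the core
core = "TTTATAC"           # core/overlap shared by attB and attP
gene_right = "GGCTGA"      # part of the gene to the right of the core
gene = gene_left + core + gene_right

chromosome = "xxxx" + gene + "yyyy"   # x, y: flanking bacterial DNA
# Circular phage genome opened just after its core: it begins with its own copy of
# gene_right (in the P' arm), continues through phage DNA (p), and ends with the core.
phage_opened = gene_right + "pppppp" + core


def integrate_at_core(chromosome, phage_opened, core):
    """Single crossover at the end of the chromosomal core (idealized)."""
    cut = chromosome.index(core) + len(core)
    return chromosome[:cut] + phage_opened + chromosome[cut:]


lysogen = integrate_at_core(chromosome, phage_opened, core)
print(gene in lysogen)            # True: the gene is still intact
print(lysogen.count(gene_right))  # 2: the duplicated right-hand segment
print(lysogen.count(gene))        # 1: the second copy is only partial
```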


Figure 3 Example of insertion into an attB site which
is part of a gene. 


Other att Sites


Other mobile elements have sites at which site-specific 
recombination reactions occur. Some examples are cer- 
tain plasmids, the elements encoding drug-resistance 
genes known as integrons, and even some transposons, 
such as Tn7, which show a great preference for
inserting at a specific sequence. Occasionally, the ends
of transposons such as Mu, where transposase binds
prior to catalyzing transposition, have been named att sites,
although the reaction catalyzed in no way resembles 
site-specific recombination. 


See also: Integrons; Lysogeny; Site-Specific 
Recombination 


The term ‘attenuation’ refers to mechanisms of
regulation of gene or operon expression that result in
discontinuation or termination of RNA synthesis by
RNA polymerase, soon after the initiation of tran- 
scription. Transcription attenuation makes use of 
RNA sequences and structures and allows cells to 
sense availability of the precursors needed for RNA 
and protein synthesis. RNA signals can direct a tran- 
scribing RNA polymerase molecule to pause dur- 
ing transcript elongation, to terminate transcription 
prematurely, or to transcribe through a potential
termination sequence. Determination of whether
transcription will or will not be terminated at a par- 
ticular site can be dictated by the formation of 
mutually exclusive RNA structures, for example, 
either of two alternative base-paired structures in a 
nascent transcript, one of which causes transcription 
termination. Furthermore, translation is often used to 
mediate attenuation decisions (translational control of 
transcription). A characteristic feature of transcription 
attenuation is that control over the continuation of 
transcript elongation occurs at sites that are encoun- 
tered by RNA polymerase in a ‘leader’ region, a 
sequence prior to the beginning of a particular gene. 
Transcription attenuation was discovered in
regulatory studies with the his and trp amino acid
biosynthetic operons in Salmonella typhimurium (official
designation, Salmonella enterica serovar Typhimurium)
and Escherichia coli, respectively. Although a
wide variety of attenuation mechanisms have been 
discovered in enteric bacteria, the most comprehen- 
sively understood mechanism is that represented by 
regulation of the E. coli trp operon. Complementing 
the trp repression regulatory mechanism, attenuation 
in the trp leader region is of the type in which the 
location of a ribosome controls formation of alterna- 
tive secondary structures in the nascent transcript 
(ribosome stalling, alternative RNA structure- 
dependent attenuation). Attenuation mechanisms for 
regulation of a number of biosynthetic pathways have 
features similar to those of the trp operon. They 
include sites of transcription pausing and Rho- 
independent termination, a coding region for a short 
leader peptide containing codons for the regulating 
amino acid(s), and transcript segments that specify 
alternative RNA secondary structures. RNA poly- 
merase pauses in the initial segment of the leader 
region, caused in part by formation of an RNA 
secondary structure in the nascent RNA chain 
(termed the ‘pause RNA hairpin’). The temporary 
transcriptional pause induced by the pause signal 
allows time for the ribosome to start the synthesis 
of the leader peptide before the Rho-independent 
termination site is transcribed. When the ribosome
reaches the paused polymerase, transcription resumes. 
The attenuator is a terminator hairpin that is tran- 
scribed from a DNA segment downstream of the 
pause signal. The antiterminator, an alternative RNA 
secondary structure, can form from the downstream 
half of the pause structure and the upstream half of the 
terminator structure. Formation of the antiterminator 
structure during transcription of the leader region 
prevents formation of the terminator structure, there- 
by allowing continuation of transcription. The un- 
hampered translation of the leader peptide coding 
region versus ribosome stalling at key sites within that
region selects between terminator and antiterminator
formation, resulting either in termination of
transcription or continuation of transcription into the genes of
the operon. In the trp operon in particular, the leader 
RNA encodes a 14-amino-acid peptide that contains 
two tryptophan residues, at positions 10 and 11, and 
the UGA translation termination signal at codon 15. 
In the presence of ample tryptophan (tryptophanyl- 
tRNA), the ribosome efficiently translates through 
the two Trp codons, preventing formation of the 
antiterminator hairpin and thereby allowing forma- 
tion of the terminator. However, under conditions of 
limiting tryptophan, the scarcity of tryptophanyl- 
tRNA results in pausing of the ribosome at the Trp 
codons. This allows formation of the antiterminator 
structure, prevents formation of the terminator, and 
allows transcription to continue into the genes of the 
trp biosynthetic operon. 
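The decision logic of the trp leader can be sketched by labeling four RNA segments in order of synthesis: A and B form the pause hairpin, C and D form the terminator, and B with C forms the antiterminator (the downstream half of the pause structure paired with the upstream half of the terminator, as described above). The sketch assumes that a segment covered by a ribosome cannot pair and that each hairpin forms as soon as its downstream half emerges if its partner is still free. The segment letters and the ribosome footprints are simplifications chosen for illustration, not the real leader sequence, and the function names are hypothetical.

```python
# Hairpins that can form between leader segments, keyed by (upstream, downstream).
PAIRS = {
    ("A", "B"): "pause hairpin",
    ("B", "C"): "antiterminator",
    ("C", "D"): "terminator",
}


def fold_leader(covered_by_ribosome):
    """Form hairpins in order of synthesis; covered segments cannot pair."""
    paired, formed = set(), []
    for segment in "ABCD":  # segments emerge 5' to 3' as RNA polymerase transcribes
        if segment in covered_by_ribosome:
            continue
        for (up, down), name in PAIRS.items():
            if down == segment and up not in paired and up not in covered_by_ribosome:
                paired.update((up, down))
                formed.append(name)
                break
    return formed


def transcription_continues(trp_trna_abundant):
    """Ample tryptophanyl-tRNA: the ribosome translates past the Trp codons and covers B.
    Scarce tryptophanyl-tRNA: the ribosome stalls at the Trp codons, covering only A."""
    covered = {"B"} if trp_trna_abundant else {"A"}
    return "terminator" not in fold_leader(covered)


print(transcription_continues(True))   # False: terminator forms, transcription stops
print(transcription_continues(False))  # True: antiterminator forms, operon is transcribed
print(fold_leader(set()))              # ['pause hairpin', 'terminator']
```

With no ribosome on the leader at all, the model predicts termination as well, because B pairs with A before C is made and C is then free to pair with D.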

Transcription attenuation is an often-used, effective
regulatory strategy. By using RNA as the main ele- 
ment in a regulatory decision, strategies become avail- 
able that are not possible with DNA as the target. 
Furthermore, those mechanisms that utilize translation 
or translational components expand the avenues by 
which the gene or operon can respond quickly to 
physiological changes. Although there are several 
examples in eukaryal organisms with features resem- 
bling those of bacterial transcription attenuation 
mechanisms, this form of regulation has not been 
extensively studied in higher organisms. 


See also: Antitermination Factors; Attenuation, 
Transcriptional; Terminator 


Gene expression can be efficiently regulated by
aborting the synthesis of messenger RNA through the
premature termination of transcription. Transcription 
termination signals that are affected by this type of 
control mechanism are called attenuators, because 
they have the capacity to reduce the levels of down- 
stream transcription. Attenuators generally resemble 
the typical termination signals found at the ends of 
transcriptional units, but lie upstream of the coding 
sequence of the genes they control. A variety of mol- 
ecular mechanisms have evolved for directing RNA 
polymerase to either terminate or read through a
particular attenuation site (Figure 1). These mechanisms
include controlling the formation of an RNA structure 
that precludes terminator formation, or converting 
the transcription machinery to a termination-resistant 
form. A few of these mechanisms are described below. 
In each case, the efficiency of termination can be 
modulated in response to some physiological signal 
such as nutrient availability. 


Translation of Leader mRNA 


A number of genes, primarily amino acid biosynthetic 
operons such as the Escherichia coli trp and his operons,
are regulated by the efficiency of translation of a short 
segment of mRNA (the “leader”) upstream of an 
attenuator. Leader mRNA has the ability to fold into 
two or more alternate structures, most important of 
which are the terminator and a competing antitermin- 
ator. Because transcription and translation are coupled 
in prokaryotes, transient pausing of RNA polymerase 
as it traverses the leader region permits initiation of 
translation of a short (usually <30 codons) open reading
frame. The leader mRNA is enriched for codons for the
cognate amino acid; e.g., the trp leader contains tandem
tryptophan codons, the his leader contains multiple
histidine codons, and so on. If the concentration of the
cognate amino acid is low, ribosomes translating this
region will stall at these codons, due to low availability 
of the charged tRNA. Conversely, if the amino acid is 
abundant, translation of the leader peptide will be 
rapid. The movement of the ribosome in turn affects 
the formation of structural elements in the leader 
RNA, since a stalled ribosome will sequester a portion 
of the RNA. Stalling of the ribosome results in the 
formation of an antiterminator structure which in turn 
prevents formation of the transcription terminator, 
leading to synthesis of the full-length mRNA and 
expression of the downstream genes. In contrast, effi- 
cient leader RNA translation, which occurs when the 
cognate amino acid is abundant, results in termination 
of transcription and downregulation of gene expres- 
sion. The specificity of the response of each operon is 
governed by the presence of the appropriate codons in 
the leader mRNA, dictating which amino acid will be 
monitored. 
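The decision described above can be illustrated with a toy model. This is a minimal sketch under stated assumptions, not a kinetic model of the trp leader: the partial codon table, the per-amino-acid "charged fraction" values, and the stalling threshold are all hypothetical, and the region labels (A, B:C, C:D) follow Figure 1.

```python
# Toy model (assumption): a ribosome translating the leader ORF stalls at the
# first codon whose cognate charged tRNA fraction is below a hypothetical
# threshold. Stalling in region A favors the antiterminator (B:C); rapid
# translation lets the terminator (C:D) form.

# Partial codon table (assumption): only the codons needed for this example.
CODON_AA = {
    "AUG": "Met", "AAA": "Lys", "GCU": "Ala",
    "UGG": "Trp", "CAU": "His", "CAC": "His",
    "UAA": "Stop", "UAG": "Stop", "UGA": "Stop",
}


def leader_codons(orf: str) -> list:
    """Split a leader ORF (5'->3', RNA alphabet) into codons."""
    return [orf[i:i + 3] for i in range(0, len(orf) - 2, 3)]


def ribosome_stalls(orf: str, charged_fraction: dict, threshold: float = 0.2) -> bool:
    """True if any sense codon's cognate charged tRNA fraction is below threshold."""
    for codon in leader_codons(orf):
        aa = CODON_AA[codon]
        if aa == "Stop":
            return False  # reached the leader stop codon without stalling
        if charged_fraction.get(aa, 1.0) < threshold:
            return True
    return False


def attenuation_outcome(orf: str, charged_fraction: dict, threshold: float = 0.2) -> str:
    """'readthrough' if the antiterminator forms, otherwise 'termination'."""
    if ribosome_stalls(orf, charged_fraction, threshold):
        return "readthrough"  # antiterminator (B:C) prevents terminator (C:D)
    return "termination"      # terminator (C:D) forms at the attenuator


# Hypothetical trp-like leader carrying two tandem Trp codons.
trp_leader = "AUG" + "AAA" + "UGG" + "UGG" + "GCU" + "UAA"
print(attenuation_outcome(trp_leader, {"Trp": 0.05}))  # -> readthrough
print(attenuation_outcome(trp_leader, {"Trp": 0.90}))  # -> termination
```

Starving the same leader for histidine instead (for example {"His": 0.05}) still gives termination, mirroring the point that the codons present in the leader dictate which amino acid is monitored.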


RNA Binding Proteins 


The switch between terminator and antiterminator 
structures can be controlled by binding of a specific 
protein to the leader RNA. These RNA binding 
proteins can act to stimulate either termination or
readthrough.
Figure 1 Mechanisms for control of transcription termination. (A) Translation of leader mRNA (e.g., E. coli trp operon). Amino acid limitation causes stalling of the ribosome (stippled circles) in region A of the leader RNA, resulting in formation of the antiterminator (B:C) instead of the terminator (C:D). (B) Termination directed by an RNA binding protein (e.g., B. subtilis trp). Binding of a protein (shaded circle) to the mRNA prevents antiterminator (B:C) formation, promoting formation of the terminator (C:D). (C) Readthrough directed by an RNA binding protein (e.g., E. coli bgl). Binding of a protein (shaded circle) to the leader region stabilizes the antiterminator (B:C), preventing formation of the terminator (C:D). (D) Antitermination directed by tRNA (e.g., B. subtilis tyrS). Interaction of the cognate uncharged tRNA with the leader mRNA promotes formation of the antiterminator (B:C), preventing formation of the terminator (C:D). (E) Modification of the transcription elongation complex (e.g., λ N). Binding of proteins (small circles) to the nascent transcript (diagonal line) and RNA polymerase (hatched ellipse) converts the transcription complex to a form resistant to termination signals (t). (Reproduced with permission from Annual Review of Genetics 30, by Annual Reviews (www.AnnualReviews.org).)


Terminator Proteins 

Expression of the Bacillus subtilis trp operon is con- 
trolled by TRAP, an unusual RNA binding protein. In 
the presence of tryptophan, TRAP binds to the trp 
leader RNA and prevents formation of an antitermin- 
ator structure, thereby permitting formation of the 
competing intrinsic terminator. TRAP assembles into 
an 11-subunit symmetrical ring, with 11 molecules of 
tryptophan spaced between the TRAP monomers. 
The RNA appears to wrap around the outside of the 
TRAP ring, with contacts between each monomer 
and GAG/UAG repeats in the RNA binding site. 
TRAP oligomerization is tryptophan-independent, 
but RNA binding requires tryptophan, suggesting 
that tryptophan controls TRAP activity by causing a 
conformational change that is required for binding to 
its RNA target site. The B. subtilis pyr system is also 
regulated by binding of a regulatory protein to the
RNA leader region to mediate transcription termination.
In this case, the regulator, PyrR, binds in the
presence of UMP, an end product of pyr operon 
expression. The target site for PyrR is a complex 
structure. Binding to this element precludes formation 
of an antiterminator structure, which competes with 
the attenuator. Thus, PyrR causes termination by 
stabilization of an anti-antiterminator. The trp and 
pyr systems are similar in that the default state is 
readthrough of the attenuator in the absence of the 
end-product of expression of the operon, so that tran- 
scription will be prevented only if the required meta- 
bolite is present. 
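The "default readthrough, termination only when the end product is present" logic can be sketched quantitatively. This is an illustrative sketch only: it assumes that the fraction of TRAP in the tryptophan-bound, RNA-binding state follows a Hill-type curve, and the constants kd and n are hypothetical rather than measured values for TRAP.

```python
# Sketch (assumption): tryptophan-activated TRAP occupancy follows a Hill-type
# curve with hypothetical constants kd and n. Bound TRAP blocks the
# antiterminator, so readthrough is the complement of occupancy.


def trap_occupancy(trp: float, kd: float = 1.0, n: float = 2.0) -> float:
    """Hypothetical fraction of TRAP able to bind the trp leader RNA."""
    return trp ** n / (kd ** n + trp ** n)


def readthrough_fraction(trp: float, kd: float = 1.0, n: float = 2.0) -> float:
    """Default state is readthrough; tryptophan-activated TRAP causes termination."""
    return 1.0 - trap_occupancy(trp, kd, n)


for trp in (0.0, 1.0, 10.0):
    print(trp, round(readthrough_fraction(trp), 3))
```

The same shape could describe PyrR with UMP as the ligand; only the qualitative trend (full readthrough without the metabolite, little readthrough with it) is meant to carry over.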


Antiterminator Proteins 

Systems such as the E. coli bgl and B. subtilis 
sac operons, which are involved in utilization of β-glucosides
and sucrose, respectively, differ from the
previous systems in that the default state is termination.
Binding of a regulatory protein to the leader
RNA is required to stabilize an otherwise unstable 
antiterminator structure, and to prevent the formation 
of a competing terminator. The RNA binding activity 
of the antiterminator protein is controlled by a phos- 
phorylation reaction catalyzed by a specific sugar 
transport protein. When the substrate sugar is available, 
the phosphorylation activity of the transporter is 
directed toward the sugar so that the antiterminator 
protein remains unmodified and is active in antitermin- 
ation. In the absence of the sugar substrate, the anti- 
terminator protein is inactivated by phosphorylation, 
so transcription terminates at the attenuator. These 
systems therefore couple substrate transport to the 
expression of genes which encode enzymes that meta- 
bolize the substrate. By using the same transporter 
protein to mediate both functions, transcription 
occurs only when the substrate is available. 
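The coupling of sugar transport to antitermination, and its opposite default state compared with the TRAP and PyrR systems, can be summarized as Boolean logic. This is a simplification that assumes phosphorylation is all-or-none; the function names are hypothetical.

```python
# Sketch (assumption: all-or-none phosphorylation). The transporter
# phosphorylates its sugar when the sugar is present; otherwise it
# phosphorylates, and thereby inactivates, the antiterminator protein.


def antiterminator_phosphorylated(sugar_present: bool) -> bool:
    return not sugar_present


def bgl_type_outcome(sugar_present: bool) -> str:
    """Default is termination; only unphosphorylated antiterminator allows readthrough."""
    if antiterminator_phosphorylated(sugar_present):
        return "termination"
    return "readthrough"


def trap_type_outcome(end_product_present: bool) -> str:
    """For contrast: default is readthrough; the end product triggers termination."""
    return "termination" if end_product_present else "readthrough"


for present in (False, True):
    print(present, bgl_type_outcome(present), trap_type_outcome(present))
```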


tRNA-Directed Antitermination 


In B. subtilis and other gram-positive bacteria, many 
genes involved in amino acid biosynthesis and activation
are regulated by a unique transcription termination
control mechanism. Each gene responds specifically 
to the charging ratio of the cognate tRNA (e.g., 
the tyrosyl-tRNA synthetase gene, tyrS, responds to 
tyrosyl-tRNA, the threonyl-tRNA synthetase gene, 
thrS, responds to threonyl-tRNA, etc.). Unlike the E. 
coli trp-type system, where tRNA charging is moni- 
tored via the translation of a leader RNA-encoded 
peptide, in gram-positive systems tRNA charging 
is measured by a direct interaction between the 
uncharged tRNA and the leader RNA. The specificity 
of the interaction is mediated by a single codon in the 
leader RNA which matches the anticodon of the regu- 
latory tRNA, and a single-stranded region of an anti- 
terminator structure which pairs with the acceptor 
end of uncharged tRNA. Binding of the uncharged 
tRNA to the leader is postulated to stabilize the anti- 
terminator, which competes with terminator forma- 
tion; charged tRNA is predicted to be unable to 
interact with the antiterminator, so that readthrough 
of the terminator occurs only when uncharged tRNA 
accumulates. This event signals a requirement for 
increased expression of the appropriate amino acid 
biosynthesis or aminoacyl-tRNA synthetase genes. 
The leader regions of all of the genes in this family 
exhibit high conservation of a number of sequence 
and structural elements, all of which are required for 
readthrough of the terminator, but the roles of these 
elements remain to be determined. It is likely that 
factors in addition to the tRNA play a role in anti- 
termination. 
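The two specificity requirements described above (a leader codon matching the tRNA anticodon, and an uncharged tRNA) can be written as a simple predicate. This sketch assumes strict Watson-Crick pairing with no wobble, reduces charging to a boolean, and ignores the other conserved leader elements whose roles remain to be determined; the example codon and anticodon are chosen for illustration.

```python
# Minimal sketch (assumptions: strict Watson-Crick pairing, no wobble;
# charging reduced to a boolean; other conserved leader elements ignored).
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna: str) -> str:
    """Reverse complement of an RNA sequence written 5'->3'."""
    return "".join(COMPLEMENT[base] for base in reversed(rna))


def anticodon_matches(leader_codon: str, anticodon: str) -> bool:
    """Codon and anticodon (both written 5'->3') pair antiparallel."""
    return reverse_complement(anticodon) == leader_codon


def tbox_readthrough(leader_codon: str, anticodon: str, charged: bool) -> bool:
    """Readthrough only when the cognate tRNA is uncharged."""
    return anticodon_matches(leader_codon, anticodon) and not charged


# Illustrative tyrosine codon (UAC) and a tRNA-Tyr anticodon (GUA).
print(tbox_readthrough("UAC", "GUA", charged=False))  # True
print(tbox_readthrough("UAC", "GUA", charged=True))   # False
```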
Modification of Transcription Complex


Another mechanism for termination control involves 
conversion of RNA polymerase to a termination- 
resistant form. The classic example of this mechanism 
is N-mediated antitermination in bacteriophage 
lambda, where readthrough of terminators early in 
the major transcriptional units is required for the 
transition from early to delayed-early gene expres- 
sion. The phage-encoded N protein binds to target 
sequences in the nascent transcript upstream of the 
terminators, and recruits a set of host cell factors 
(Nus proteins) which form a complex with RNA 
polymerase. This N-modified transcription complex 
is capable of ignoring multiple transcriptional termin- 
ators, both intrinsic and Rho-dependent, over long 
distances. A related mechanism comes into play dur- 
ing the transition between delayed early and late tran- 
scription, mediated by the Q antitermination protein. 
This type of system differs from the previous systems 
in that N and Q modulate transcription termination 
by altering the properties of the transcription machin- 
ery itself, rather than by affecting the conformation of 
the leader RNA. 


Further Reading 

Henkin TM (2000) Transcription termination control in bac- 
teria. Current Opinion in Microbiology 3: 149-153. 

Weisberg RA and Gottesman ME (1999) Processive antitermin- 
ation. Journal of Bacteriology 181: 359-367. 


See also: Bacillus subtilis; Leader Sequence; 
Transcription 


AUG Codons 
The unique characteristic of the AUG codon is that it
is the initiator codon at the start of a translated
sequence of a messenger RNA (mRNA). It leads to
the incorporation of a methionine as the first amino
acid of a synthesized protein during protein biosynthesis
on ribosomes. AUG codons are always translated
into methionine regardless of their position in the
mRNA, but the initiator AUG is recognized in different
ways in eukarya and bacteria. In eubacterial mRNAs, a small region
rich in As and Gs usually precedes the initiator AUG 
codon. This region is complementary to a region of 
the 3’ end of the 16S ribosomal RNA. The binding of 
this region of the mRNA to the 3’ end of the 16S
rRNA is called the ‘Shine-Dalgarno interaction.’ This
interaction presents the initiator AUG codon at the
decoding site of the small ribosomal subunit. The 
initiator tRNA (fMet-tRNA) complexed with initia- 
tion factor 2 recognizes this AUG codon and binds to 
the P-site of the small subunit of the ribosome. The 
translation then proceeds after the association of the 
large subunit to the initiation complex and by binding 
of aminoacyl-tRNAs to the A-site. 
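The bacterial selection rule (an AUG preceded by a purine-rich region complementary to the 3’ end of 16S rRNA) can be sketched as a sequence search. This is a hypothetical scoring scheme, not a validated predictor: the motif AGGAGG is taken here as an assumed Shine-Dalgarno consensus, and the 5 to 15 nucleotide upstream window and the minimum match score are illustrative choices.

```python
# Hypothetical sketch: choose the first AUG that has a Shine-Dalgarno-like
# match in an upstream window. Motif, window and threshold are assumptions.


def sd_score(window: str, motif: str = "AGGAGG") -> int:
    """Best number of matching positions of the motif anywhere in the window."""
    best = 0
    for i in range(len(window) - len(motif) + 1):
        matches = sum(a == b for a, b in zip(window[i:i + len(motif)], motif))
        best = max(best, matches)
    return best


def find_initiator(mrna: str, min_score: int = 4, near: int = 5, far: int = 15) -> int:
    """Index of the first AUG with an SD-like match near..far nt upstream, else -1."""
    start = mrna.find("AUG")
    while start != -1:
        window = mrna[max(0, start - far):max(0, start - near)]
        if sd_score(window) >= min_score:
            return start
        start = mrna.find("AUG", start + 1)
    return -1


# A decoy AUG near the 5' end has no upstream SD-like region; the second does.
mrna = "CAUG" + "C" * 12 + "AGGAGG" + "UAAUCA" + "AUGGCUUAA"
print(mrna.find("AUG"), find_initiator(mrna))  # 1 28
```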

In eukaryal systems, the initiation AUG codon is 
recognized quite differently. The eukaryal mRNAs 
are usually capped at the terminal 5’ position. This 
means that they carry an N7-methylated guanosine linked
by a 5’-5’ triphosphate bridge to the terminal
nucleotide. Specific proteins, the cap-binding pro- 
teins, recognize this so-called cap and are important 
constituents for initiation. The cap is situated at a 
varying distance from the initiation codon, the first 
AUG. The initiator tRNA binds to the small subunit 
in complex with the eukaryal initiation factor 2 (eIF-2, 
which is composed of three polypeptides). The small 
subunit then scans the mRNA for this AUG codon 
which will be recognized by the bound initiator tRNA. 
Subsequently the large subunit associates with this 
complex to initiate protein synthesis. Internal (elongator)
AUG codons are read by elongator methionyl-tRNA in the same way as any other codon.
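A highly simplified version of the eukaryal scanning model is sketched below: scanning starts at the 5’ end, the first AUG encountered is taken as the initiator, and later in-frame AUGs are simply read as methionine codons during elongation. This sketch assumes no influence of sequence context, leaky scanning, or reinitiation, and the cap itself is not represented.

```python
# Simplified scanning model (assumptions: first AUG always used; no sequence
# context effects, leaky scanning or reinitiation; cap not modeled).
STOP_CODONS = {"UAA", "UAG", "UGA"}


def scan_for_start(mrna: str) -> int:
    """Index of the first AUG from the 5' end, or -1 if none."""
    return mrna.find("AUG")


def read_orf(mrna: str) -> list:
    """Codons from the initiator AUG up to (not including) the first in-frame stop."""
    start = scan_for_start(mrna)
    if start < 0:
        return []
    orf = []
    for i in range(start, len(mrna) - 2, 3):
        codon = mrna[i:i + 3]
        if codon in STOP_CODONS:
            break
        orf.append(codon)  # an internal AUG here is read as an ordinary Met codon
    return orf


orf = read_orf("GGCUUAUGGCCAUGAAAUAAGG")
print(orf)  # ['AUG', 'GCC', 'AUG', 'AAA']
```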


See also: Elongation; Genetic Code; Initiation 
Factors; Messenger RNA (mRNA); Shine- 
Dalgarno Sequence; Terminator; Translation 
Several sites of action for growth factors have been
described. Sporn and Todaro (1980) defined autocrine 
and paracrine to distinguish activities of growth fac- 
tors from the classical activities of hormones, which 
travel great distances through the circulation from 
their sites of production to their target cells within 
the organism. The autocrine mode refers to the ability 
of a growth factor to act on the cell releasing it. In the
paracrine mode, the released growth factor from one 
cell acts on a nearby or adjacent cell. Certain growth 
factors also exist as membrane-anchored forms, which 
can bind and activate membrane receptors only on 
adjacent cells. This process, considered a variant of
the paracrine mode, has been termed juxtacrine 
(Massague, 1990; Bosenberg and Massague, 1993) 
and is capable of delivering spatially localized intercellular
stimuli. A number of researchers have
observed that factors which are produced in cells but
are not detectably secreted can nevertheless induce
observable phenotypic changes in those cells. The
suggestion has been made that this represents an 
‘intracrine’ mode of action, whereby the factor inter- 
acts with its receptor, for example, within the Golgi 
apparatus (Re, 1988; Logan, 1990). A sixth mode of 
action, in which the growth factor is bound to, and 
stored within, the extracellular matrix before present- 
ation to the receptor on the cell surface has also 
been demonstrated (Klagsbrun and Baird, 1991; 
Yayon et al., 1991). 

The role of autocrine-acting growth factors in 
transformation was initially established by the demon- 
stration that the v-sis oncogene encoded a protein 
closely related to the human PDGF B chain (Doolittle et al.,
1983; Waterfield et al., 1983). Tumors induced by 
this oncogene were specific for target cells possessing 
the cognate PDGF receptors. Subsequent studies re- 
vealed that MMTV induction of mammary carcinoma 
in mice correlated with integration of the provirus in 
the region of the int-2 (FGF-3) gene (Smith et al.,
1988). Moreover, the FGF-4 and FGF-5 genes were 
isolated by their ability to cause transformation of 
mouse fibroblasts in vitro (Thomas, 1988; Burgess 
and Maciag, 1989; Chiu, 1989). By extrapolation, it 
follows that the expression of any growth factor and 
its specific receptor by the same cell might establish 
an autocrine loop that contributes to tumor pro- 
gression. In fact, the ability of autocrine stimulation 
to induce a tumorigenic phenotype in established cell 
lines has been demonstrated under a variety of experi- 
mental conditions. After transfection of cDNA 
expression vectors encoding the specific factor and 
receptor, such cells overcome their growth factor de- 
pendence and become tumorigenic (Cleveland et al., 
1994; Valtieri et al., 1987). However, it should be noted 
that normal cells also have the capacity to produce 
certain growth factors under conditions that can tran- 
siently activate autostimulatory pathways. 

Autocrine-transforming interactions have been 
identified in a number of human malignancies. At 
least one PDGF chain and one of its receptors have 
been detected in a high fraction of sarcomas as well as 
in glial-derived neoplasms (Nister et al., 1988; Heldin 
and Westermark, 1989; Matsui et al., 1989; Maxwell 
et al., 1990). In tissue culture, such tumor cells exhibit 
evidence of a functional autocrine loop, in which 
chronic PDGF receptor activation can be demon- 
strated by the detection of tyrosine-phosphorylated 
receptors and/or downregulation of the receptor 
protein. Thus, it appears that inappropriate expression 
of PDGF often plays an important role in such tumors. 

TGFα is often detected in carcinomas that express
high levels of EGF receptors (Derynck, 1988; Di
Marco et al., 1989). The role of acidic or basic FGF in
tumors is less well established. Since neither of these 
molecules possesses a secretory signal peptide 
sequence, their normal route of release from cells is 
not through the classical secretory pathway by which 
growth factor receptors are processed (Burgess and 
Maciag, 1989; Chiu, 1989). However, studies have 
demonstrated the expression of bFGF by human mel- 
anoma cell lines but not by normal melanocytes 
(Halaban et al., 1988a). Moreover, only the latter 
require bFGF for proliferation in culture (Halaban 
et al., 1988b). Evidence that antagonists of FGF can 
inhibit growth of melanoma cells argues for a role of 
bFGF in the uncontrolled growth of these cells 
(Halaban et al., 1988a). Since many more ligands for 
tyrosine kinase receptors have recently been identi- 
fied, the contribution of autocrine loops to human 
malignancies is probably much more extensive than 
is presently documented. 

While several growth factors have been shown to 
induce transformation by an autocrine mode, it is also 
worth considering the possible role that growth factors 
acting in a paracrine mode might have in predisposing 
to cancer or contributing to malignant progression. For 
example, chronic stimulation by growth factors acting 
in a paracrine mode under conditions such as inflam- 
matory bowel disease or chronic hepatitis involving 
tissue damage and repair might increase the prolifer- 
ation of a polyclonal target cell population. This could 
increase the frequency of spontaneous genetic changes 
in the population, eventually selecting for a cancer 
cell. Experimental evidence in support of this concept 
has been obtained (Coussens et al., 2000). By such a 
model, increased production of paracrine-acting 
growth factors might function in a manner analogous 
to that of a tumor promoter. 

Tumor cells are also known to release growth fac- 
tors, which act in a paracrine manner to stimulate 
proliferation of cells such as endothelial cells. The 
ability of a tumor cell population to grow beyond a 
certain size is thought to be dependent on the develop- 
ment of new blood vessels, termed neovascularization 
or neoangiogenesis (Hanahan and Folkman, 1996). 
Thus, the production by tumor cells of growth factors 
including vascular endothelial growth factor (VEGF), 
which acts specifically on endothelial cells to attract 
them and induce their proliferation, represents a 
specific example of the role of a paracrine-acting 
growth factor in tumor progression. 
Autogenous control is the process by which a gene
product either inhibits (negative autogenous control) 
or activates (positive autogenous control) expression 
of the gene coding for it. 


See also: Gene Expression 


Autoimmune Diseases 
A Cooke 
The immune system provides one of the body’s most
important defense mechanisms against infection. 
The immune system can be subdivided into two 
branches, innate and adaptive. Key features of the 
adaptive immune response are that it is specific and 
that it has a memory component. Vaccination utilizes
both the specific and the memory component of the
immune response to provide long-term protection 
from a specific disease. The immune response is nor- 
mally directed against external agents but a category of 
diseases, autoimmune diseases, arise as a result of im- 
mune reactivity to the body’s own components. These 
autoimmune responses are both specific and have mem- 
ory. When this immune reactivity is directed against 
a discrete target organ such as the insulin-secreting 
β cells of the pancreas, this is termed an organ-specific
autoimmune disease. If the reactivity is directed 
against a more generalized target such as DNA this 
is called a nonorgan-specific autoimmune disease. The 
development of autoimmune disease is normally con- 
trolled by many genes, some of which may govern the 
innate and some the adaptive immune response. 


See also: Immunoglobulin Gene Superfamily 


Autonomous Controlling Element
An autonomous controlling element is an active transposon
(in maize) demonstrating the ability to transpose
(cf. nonautonomous controlling element).


See also: Nonautonomous Controlling Elements; 
Transposable Elements 
Autoradiography is a technique for detecting radio-
actively labeled molecules by virtue of their ability to 
create an image on photographic film. 


Autoregulation 


J Hodgkin 
Autoregulation is a term used for situations in which a
gene or gene product regulates its own activity, either
positively or negatively. Most cases of autoregulation
involve genes that play regulatory roles themselves,
affecting the expression of other genes as well as their 
own. In some cases, however, autoregulation is inci- 
dental to the main function of the gene. For example, 
bacterial ribosomal proteins are able to regulate their 
own level of synthesis by both transcriptional and 
translational mechanisms. 

In cases of positive autoregulation, a positive feed- 
back loop can be established, which will potentially 
lead to an unlimited increase in activity of the gene in 
question. However, other factors that are essential for 
gene expression will usually be limiting and thereby 
prevent activity from rising above a certain level. A 
system involving such a loop will therefore tend to 
adopt one of two states: either zero or maximal activ- 
ity. This can provide the essential switch mechanism 
for controlling the choice between two physiological 
states, or two developmental pathways. 
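The switch-like behavior of a positive feedback loop can be illustrated with a one-variable toy model, dx/dt = vmax x^n / (K^n + x^n) - d x, in which saturation of the production term stands in for the limiting factors mentioned above. The model form and all parameter values are assumptions chosen so that the system is bistable; with these values the stable states are x = 0 and x = (3 + √5)/2, separated by an unstable threshold at (3 - √5)/2 ≈ 0.38.

```python
# Toy model (assumption): dx/dt = vmax * x**n / (K**n + x**n) - d * x,
# integrated with forward Euler. Parameters are hypothetical and chosen so
# that there are two stable states separated by an unstable threshold.


def simulate(x0: float, vmax: float = 3.0, K: float = 1.0, n: int = 2,
             d: float = 1.0, dt: float = 0.01, steps: int = 5000) -> float:
    """Return the activity level x after integrating from x0."""
    x = x0
    for _ in range(steps):
        production = vmax * x ** n / (K ** n + x ** n)  # saturates at vmax
        x += dt * (production - d * x)
    return x


low = simulate(0.2)   # starts below the threshold: decays toward zero activity
high = simulate(0.6)  # starts above the threshold: rises to the high state
print(round(low, 3), round(high, 3))
```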

In cases of negative autoregulation, negative feed- 
back occurs and will prevent activity from rising 
above a certain level. Negative autoregulation there- 
fore tends to act homeostatically, keeping the amount 
or activity of a given gene product at a constant level. 
This will be important for genes that exhibit dosage 
sensitivity. For example, cells contain many macro- 
molecular complexes made up of different protein 
subunits, which need to be synthesized in the optimal 
stoichiometric ratios for efficient assembly of the 
complex. Many DNA-binding proteins also need to 
be tightly regulated in amount, because overproduc- 
tion can lead to binding to inappropriate target sites, 
or to repression instead of activation. 
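The homeostatic effect of negative autoregulation can be shown by comparing steady states. In this sketch (model form and parameters are assumptions), an unregulated gene follows dx/dt = β - d x, while an autorepressed gene follows dx/dt = β / (1 + (x/K)^n) - d x; doubling the synthesis rate β doubles the unregulated level but raises the autorepressed level much less.

```python
# Sketch (assumption): unregulated dx/dt = beta - d*x versus autorepressed
# dx/dt = beta / (1 + (x/K)**n) - d*x. All parameter values are hypothetical.


def steady_state_unregulated(beta: float, d: float = 1.0) -> float:
    return beta / d


def steady_state_autorepressed(beta: float, K: float = 1.0, n: int = 2,
                               d: float = 1.0, tol: float = 1e-10) -> float:
    """Bisection on g(x) = beta / (1 + (x/K)**n) - d*x, which decreases in x."""
    lo, hi = 0.0, beta / d  # g(lo) > 0 and g(hi) < 0 when beta > 0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if beta / (1 + (mid / K) ** n) - d * mid > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


for beta in (10.0, 20.0):
    print(beta, steady_state_unregulated(beta), round(steady_state_autorepressed(beta), 3))
```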

The activity of a gene can be controlled at many 
levels, from DNA to protein, and it is therefore not 
surprising that autoregulatory controls can take a var- 
iety of forms. A simple case, often encountered, is 
provided by a DNA-binding protein that can bind to 
the control region of the gene encoding that protein. If 
the protein is a transcriptional activator, then positive 
autoregulation will occur, and if it is a repressor, nega- 
tive autoregulation will occur. Sometimes both effects 
can occur with the same protein. A classic example of 
this is provided by lambda repressor protein, during 
lysogenic growth of bacteria. The repressor is able to 
bind to three sites in the maintenance promoter PRM of
the cI gene which encodes repressor, with different 
affinities. At low levels of repressor, two of these sites 
are occupied and the protein acts as a transcriptional 
activator. At higher levels of repressor, the third site is 
also bound, and this results in transcriptional repres- 
sion. As a result, expression of the repressor gene is 
maintained, but cannot rise above a certain level. 
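The combination of activation at low repressor levels and repression at high levels can be captured in a toy occupancy model. This sketch assumes the high-affinity sites and the weaker third site are bound independently, which ignores the cooperative binding of the real repressor, and the constants k_high, k_low, basal and activation are hypothetical.

```python
# Toy model (assumption): occupancy of the high-affinity sites activates cI
# transcription; occupancy of the weaker third site represses it. Sites are
# treated as independent; all constants are hypothetical.


def occupancy(conc: float, k: float) -> float:
    """Fraction of time a site with dissociation constant k is bound."""
    return conc / (conc + k)


def ci_transcription_rate(repressor: float, k_high: float = 1.0, k_low: float = 20.0,
                          basal: float = 0.1, activation: float = 1.0) -> float:
    activated = basal + activation * occupancy(repressor, k_high)
    return activated * (1.0 - occupancy(repressor, k_low))


for r in (0.1, 5.0, 200.0):
    print(r, round(ci_transcription_rate(r), 3))
```

The rate rises as repressor accumulates and then falls once the third site fills, so repressor levels are held within a band rather than rising without limit.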

A variety of cases of autoregulation acting at the 
level of RNA have been studied in detail. Proteins 
with RNA affinity can bind to their own mRNA or 
pre-mRNA and thereby affect any of the steps
involved in RNA processing, transport, translation,
and stability. The Drosophila protein SXL, encoded
by the sex determination gene Sxl (Sex-lethal), illustrates
positive autoregulation by means of alternative
RNA splicing. Throughout most of their lives, male 
and female fruit flies produce identical primary 
transcripts from the Sxl gene, which contain eight 
exons. In female flies, SXL protein binds to this pre- 
mRNA and modifies its splicing, so that exon 3 is 
skipped. The resulting mRNA with seven exons 
encodes full-length, functional SXL protein. In male 
flies, SXL protein is not made and exon 3 is included in 
the final mRNA. This exon contains a stop codon, so 
only a truncated, nonfunctional SXL protein can be 
made in males. Consequently, both the presence of 
SXL (in females) and the absence of SXL (in males) 
are self-sustaining processes. The system is primed by 
sex-specific events in early embryogenesis, which lead 
to the production of SXL protein in females but not in 
males. In addition to regulating its own synthesis, SXL 
has multiple functions in promoting female develop- 
ment and preventing activation of the male mode 
of dosage compensation. Negative autoregulation by 
means of regulated splicing is observed in some yeast 
ribosomal proteins, which can inhibit productive spli- 
cing of their own pre-mRNAs. 
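The self-sustaining character of the Sxl loop can be shown with a Boolean sketch. The model reduces the process to the facts stated above (eight exons, exon 3 carries a stop codon, SXL protein causes exon 3 to be skipped); the function names are hypothetical.

```python
# Sketch (assumption): Boolean model of Sxl positive autoregulation.
# Exon 3 carries a stop codon; SXL protein causes exon 3 to be skipped.

SXL_EXONS = list(range(1, 9))  # eight exons in the primary transcript


def splice(sxl_present: bool) -> list:
    """Exons retained in the mature mRNA."""
    if sxl_present:
        return [e for e in SXL_EXONS if e != 3]  # female mode: exon 3 skipped
    return list(SXL_EXONS)                         # male mode: exon 3 included


def encodes_functional_sxl(mrna_exons: list) -> bool:
    return 3 not in mrna_exons  # the exon 3 stop codon truncates the protein


def maintain(initial_sxl: bool, rounds: int = 10) -> bool:
    """Iterate rounds of transcription, splicing and translation."""
    sxl = initial_sxl
    for _ in range(rounds):
        sxl = encodes_functional_sxl(splice(sxl))
    return sxl


print(maintain(True), maintain(False))  # True False
```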

Autoregulation at the protein level is also encoun- 
tered. Many metabolic enzymes exhibit product inhib- 
ition, which is a form of negative autoregulation. 
Proteolytic enzymes can exhibit positive autoregula- 
tion, by cleaving an inactive precursor (proenzyme) to 
an active final form. They can also exhibit negative 
autoregulation by self-cleavage, so that proteolytic
activity is reduced or destroyed. Proteins that are 
able to modify other proteins by phosphoryla- 
tion (kinases) or dephosphorylation (phosphatases) 
can modify their own activity either positively or 
negatively. Other kinds of protein modification, such 
as acetylation, offer the same potential. 


See also: Dosage Compensation; Gene 
Regulation; Phage λ Integration and Excision


Autosomal Inheritance 
D E Wilcox 
When mutation in a single gene has a large effect on
the phenotype, it shows a pattern of inheritance
similar to those described by Mendel for transmission
of characteristics in peas. The inheritance pattern is
determined by the transmission of the chromosomes 
and is called autosomal inheritance when the gene 
is located on one of the autosomal chromosomes. A 
mutant phenotype is dominant if it is expressed when 
a single allele is affected, i.e., when the genotype is 
heterozygous with one mutant and one wild-type or 
normal allele (M/+). A mutant phenotype is recessive 
when it is expressed only when mutations are present 
on both alleles, i.e., homozygous mutant (m/m). Each 
pattern of inheritance is associated with a number of 
characteristic features. Not all disorders or families 
will show all of these features, but identification of 
features present in an affected family can help deter- 
mine or confirm the pattern of inheritance allowing 
genetic risks for family members to be calculated. 


Autosomal Dominant Inheritance 


Figure 1 shows an example of a family affected by
Machado-Joseph disease, an autosomal dominant 
neurological condition that causes a progressive 
ataxia (unsteadiness). The condition varies in severity
between family members and tends to become more
severe in each successive generation. The gene locus is 
on chromosome 14; in Figure 1, the chromosome
pairs are shown below each individual’s pedigree sym- 
bol. The genotypes are +/+ (normal homozygote) 
and +/M (mutant heterozygote). 


Features of Autosomal Dominant Pedigrees 


1. Phenotype expressed in heterozygote. Each affect- 
ed person has inherited the mutation from only one 
parent and so is heterozygous. 

2. Vertical pattern of inheritance. The result of the 
mutation being present in heterozygotes in several 
generations of a family is a vertical pattern of affected
individuals, typically from grandparent to child to
grandchild. In this family, the transmission
from the male II:3 to III:2 and then to IV:1 is a vertical
pattern.

3. Affected person has a 1 in 2 chance of transmitting 
phenotype to each offspring. Each affected person 
has two copies of the relevant chromosome, only one of
which carries the mutated allele, so the chance of
transmitting the chromosome with the mutated allele is 1 in 2.


Figure 1 Example pedigree with autosomal dominant inheritance showing the arrangement of the alleles on each individual’s chromosome pair.

4. Males and females are affected in equal proportions. 
Autosomes, and their linked genes, are transmitted 
to offspring independently of the sex chromosomes,
so males and females are equally likely to be
affected. 

5. Male to male transmission. In this family, II:3 has 
transmitted the disorder to his son III:2. This
excludes X-linked inheritance as a male transmits 
his Y sex chromosome and not his X chromosome 
to his sons. 

6. Nonpenetrance. It is extremely unlikely that II:1, 
II:3, and III:4 have arisen because of separate new 
mutations. The most likely explanation is that I:2
and II:4 are obligate gene carriers. Nonpenetrance 
occurs when a heterozygote has no evidence of 
the phenotype. In the case of I:2, who is the first
heterozygote in the family, this may be because
the mutation arose in her ovaries. This is
called gonadal mosaicism and such a person’s 
somatic cells do not have the mutation and are 
homozygous normal. II:4 cannot be a gonadal 
mosaic as she has inherited the mutation from her 
mother via the egg, so the mutation will be in all her 
cells. Although IV:2 has a healthy phenotype, it 
would be a mistake to assume his genotype is
normal. He should be offered a presymptomatic
genetic test when he is old enough to consent. His 
healthy aunt, III:3, was tested and found to have a 
normal genotype. 

7. Variable expressivity. Machado-Joseph disease is 
caused by an unstable amplified CAG trinucleotide 
repeat mutation. The size of the amplified repeat
correlates with the severity of the disorder. However,
although variation in severity among affected
family members is a feature of many autosomal 
dominant disorders, it is not universal. Conditions 
such as achondroplasia, a skeletal dysplasia causing 
restricted growth, show little variation even among 
unrelated families. The molecular explanation is 
that most cases of achondroplasia are caused by 
the same G-to-A missense mutation in the FGFR3 
gene. Mutations elsewhere in the gene are asso- 
ciated with a number of other distinct disease 
phenotypes. This is an example of the pleiotropic
effects of a single gene, which can cause different
phenotypes, and even distinct disorders, depending
on the nature of the mutation.

8. Anticipation. The increase in severity with each 
generation is called anticipation. It is present in a 
number of autosomal disorders, which, like 
Machado-Joseph disease, are caused by unstable 
trinucleotide repeat mutations. The increase in 
mutation size is often greater when transmitted 
through one sex. In Machado-Joseph disease and
Huntington disease, anticipation occurs through
paternal transmission, and in myotonic dystrophy
through maternal transmission.
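The 1 in 2 transmission risk in feature 3 follows from enumerating the equally likely combinations of parental alleles, as in a Punnett square. The sketch below is a generic single-locus illustration, assuming Mendelian segregation and full penetrance; the allele symbols follow the notation used here (M for the dominant mutant allele, + for normal), and the same function gives 1 in 4 for a cross between two +/a carriers of a recessive allele.

```python
from fractions import Fraction
from itertools import product


def offspring_genotypes(parent1: tuple, parent2: tuple) -> dict:
    """Probability of each offspring genotype, assuming Mendelian segregation."""
    probs = {}
    for a, b in product(parent1, parent2):  # four equally likely allele pairings
        genotype = tuple(sorted((a, b)))     # treat +/M and M/+ as the same genotype
        probs[genotype] = probs.get(genotype, Fraction(0)) + Fraction(1, 4)
    return probs


def risk(parent1: tuple, parent2: tuple, affected) -> Fraction:
    """Total probability of offspring genotypes for which affected(genotype) is True."""
    return sum((p for g, p in offspring_genotypes(parent1, parent2).items() if affected(g)),
               Fraction(0))


# Autosomal dominant: affected heterozygote (M/+) x normal homozygote (+/+).
print(risk(("M", "+"), ("+", "+"), lambda g: "M" in g))         # 1/2
# Autosomal recessive: two carrier parents (+/a x +/a).
print(risk(("+", "a"), ("+", "a"), lambda g: g == ("a", "a")))  # 1/4
```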


Autosomal Recessive Inheritance 


Figure 2 shows an example of a family affected by 
albinism type 1, an autosomal recessive condition of 
pigmentation caused by mutations in the tyrosinase 
gene on chromosome 11. The chromosome pairs 
are shown below each individual’s pedigree symbol. 
The genotypes are +/+ (normal homozygote), +/a 
(mutant heterozygote), and a/a (mutant homo- 
zygote). Only mutant homozygotes express the pheno- 
type. Where an individual’s genotype, with respect 
to the tyrosinase gene, cannot be inferred from the 
position in the pedigree, the following notation is 
used: +/?. 


Features of Autosomal Recessive Pedigrees 


1. Phenotype only expressed in homozygote. Each 
affected person has inherited the mutation from 
both parents and so is homozygous. 

2. Horizontal pattern of inheritance with low risk to 
offspring of affected individuals. Since in affected 
individuals the mutation needs to be inherited from 
both parents (and partners of other individuals in the 
family such as II:1 and II:4 are unlikely to be car- 
riers), recurrence of the phenotype is usually only 
seen in one sibship. In this family, the affected indi- 
viduals II:4 and II:5 represent the horizontal pattern. 
Note that the mutation is carried by heterozygotes 
in the other generations. The parents are obligate 
carriers since the new mutation rate in recessive dis- 
orders tends to be low compared with the frequency 
of heterozygous mutation carriers in the population. 
III:2 is also an obligate carrier since she must inherit a
mutation from her homozygous (a/a) mother. Obli- 
gate carriers are identified on the pedigree by half 
shading the pedigree symbol (see Figure 2). In a 
condition that affects two or more sibs whose par- 
ents are healthy, autosomal recessive inheritance of a 
single gene disorder is not the only explanation. 
Multifactorial disorders such as neural tube defects 
also recur in sibships owing to a combination of 
shared genes of small effect and environmental 
factors. This is called multifactorial inheritance. 

3. Parents of an affected individual have a 1 in 4 chance, in each pregnancy, of having another affected child. Each parent (+/a) has a 1 in 2 chance of transmitting the chromosome with the mutant allele, so there is a 1/2 × 1/2 = 1 in 4 chance of having a homozygous (a/a) child with the affected phenotype. The other three possible genotypes are +/+, +/a, and a/+. Each of these genotypes has a healthy phenotype; therefore, the carrier risk for the healthy sib of an affected person is 2/3. It is not 2/4, as one of the four possible genotypes (a/a) is excluded because it has an affected phenotype.

Figure 2 Example pedigree with autosomal recessive inheritance showing the arrangement of the alleles on each individual’s chromosome pair.

4. Males and females are affected in equal proportions. Autosomes, and the genes they carry, are transmitted to offspring independently of the sex chromosomes; thus males and females are equally likely to be affected.

5. Constant expressivity within an affected sibship. 
Although each recessive mutation affects the func- 
tion of a gene to a differing degree, each affected sib 
carries the same pair of mutations and so has a 
similar phenotype. The severity of the phenotype 
varies between sibships with different combin- 
ations of mutations. An example is spinal muscular 
atrophy, a cause of progressive neuromuscular 
weakness, that can vary from severely affected 
infants to affected children and mildly affected 
adults, depending on the combination of mutations. 

6. Consanguinity. Consanguinity should always be asked about when taking a family history of an autosomal recessive disorder. Since relatives share genes, a relative is much more likely to carry an individual’s recessive mutation than a nonrelative. The consanguineous partnership between III:1 and III:2 risks a recurrence in this family. II:2 has a 2/3 carrier risk and would transmit a mutation with probability 1/2, so III:1’s carrier risk is 1/3. III:2 is an obligate carrier and has a carrier risk of 1/1. The chance that III:1 and III:2 are both carriers and might have an affected child is therefore 1/3 × 1/1 × 1/4 = 1/12.

7. Ethnic origin. Certain ethnic groups have high incidences of particular autosomal recessive disorders: people originating in northwest Europe are at increased risk of cystic fibrosis; those from malarial regions of Africa, of sickle-cell anemia; and those from the Mediterranean and the Middle and Far East, of thalassemia. Individuals whose partner comes from outside their own ethnic group have a lower risk of their population’s recessive disorders.
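The risk figures in features 3 and 6 can be checked by enumerating the equally likely allele combinations. The following is a minimal Python sketch using exact fractions; the genotype strings follow the notation of Figure 2, and, as in the text, it assumes that III:1’s other parent married into the family and is not a carrier. The function names are illustrative only.

```python
from fractions import Fraction
from itertools import product

# Genotypes are two-character strings in the notation of Figure 2:
# "+" is the normal allele and "a" the mutant allele, so "+a" is a +/a carrier.

def offspring_genotypes(parent1: str, parent2: str) -> list[str]:
    """Four equally likely offspring genotypes: each parent passes on either allele with probability 1/2."""
    return [allele1 + allele2 for allele1, allele2 in product(parent1, parent2)]

def is_affected(genotype: str) -> bool:
    # A recessive phenotype is expressed only by the mutant homozygote (a/a).
    return genotype == "aa"

def is_carrier(genotype: str) -> bool:
    # Heterozygotes (+/a or a/+) carry one mutant allele but are healthy.
    return genotype.count("a") == 1

# Feature 3: two carrier parents (+/a x +/a).
children = offspring_genotypes("+a", "+a")
p_affected = Fraction(sum(map(is_affected, children)), len(children))

# A healthy sib cannot be a/a, so condition on the three healthy genotypes.
healthy = [g for g in children if not is_affected(g)]
p_sib_carrier = Fraction(sum(map(is_carrier, healthy)), len(healthy))

# Feature 6: III:1 is the child of a healthy sib (II:2) and, by assumption,
# a noncarrier partner; III:2 is an obligate carrier (her mother is a/a).
p_III1_carrier = p_sib_carrier * Fraction(1, 2)
p_III2_carrier = Fraction(1)
p_affected_child = p_III1_carrier * p_III2_carrier * p_affected

print("recurrence risk for carrier parents:", p_affected)        # 1/4
print("carrier risk for a healthy sib:", p_sib_carrier)          # 2/3
print("risk for a child of III:1 and III:2:", p_affected_child)  # 1/12
```

Using `Fraction` rather than floating point keeps the results in the same 1/4, 2/3 and 1/12 form used in the text.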


Autosomal Codominant Expression 


Medical tests have revealed a number of polymorph- 
isms where both alleles can be distinguished in hetero- 
zygotes. An example is the three-allele ABO blood 
group system. Serological tests can detect both A and 
B alleles in an individual with A/B genotype. A and B 
are therefore codominant with respect to each other. 
Allele O is recessive to both A and B and individuals 
with A/O or B/O genotypes have A or B phenotypes, 
respectively. 
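The dominance relations just described can be written as a small genotype-to-phenotype function. The following is a minimal Python sketch; the function and allele labels are illustrative, and it models only the A and B antigens detected serologically as described here, not subgroups of A or other blood group systems.

```python
from itertools import combinations_with_replacement

ALLELES = ("A", "B", "O")

def abo_phenotype(allele1: str, allele2: str) -> str:
    """Blood group phenotype for an ABO genotype.

    A and B are codominant (both are detected in an A/B individual);
    O is recessive to both, so it is expressed only in O/O.
    """
    genotype = {allele1, allele2}
    if not genotype <= set(ALLELES):
        raise ValueError(f"unknown ABO allele in {allele1}/{allele2}")
    # Alleles whose antigen a serological test detects; O contributes none.
    detected = sorted(genotype - {"O"})
    return "".join(detected) if detected else "O"

# List all six genotypes with their phenotypes.
for a1, a2 in combinations_with_replacement(ALLELES, 2):
    print(f"{a1}/{a2} -> {abo_phenotype(a1, a2)}")
```

Treating the genotype as a set makes A/O and O/A equivalent, and removing O before reading off the detected antigens is what encodes its recessiveness.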



See also: Clinical Genetics; Mendelian Genetics; 
Mendelian Inheritance 
Chromosomes other than the sex chromosomes are
referred to as autosomes. The number of autosomes 
varies from one organism to another. Humans have a 
total of 46 chromosomes. Of these, 44 are autosomes 
and 2 are sex chromosomes — either XX for females or 
XY for males. Mice have 40 chromosomes, including 
38 autosomes and 2 sex chromosomes. Although nor- 
mal diploid cells have two copies of each autosome, 
autosomal number abnormalities have been associated 
with certain diseases such as Down syndrome, which 
results from an autosomal trisomy of chromosome 21. 
For the most part, genes on autosomes tend to follow a 
Mendelian pattern of inheritance. 


See also: Disjunction; Karyotype; 
Sex Chromosomes; Trisomy 
An auxotroph is a mutant organism that has an additional nutritional growth requirement, when compared with the parental organism from which it was
derived. This concept has been especially important in 
the genetic analysis of microorganisms such as bac- 
teria and fungi, some of which normally have very 
few growth requirements and therefore can grow on 
simple defined synthetic media in the laboratory. 
Mutations in any of hundreds of different genes 
can cause auxotrophy, resulting in a requirement for 
an amino acid, nucleotide, vitamin, etc., or some 
combination of requirements. These kinds of mutations are easy to select against in genetic crosses and
have been crucial for the analysis of both the large- 
scale and fine genetic structure and biosynthetic path- 
ways of numerous microorganisms. 

In order to obtain new auxotrophs in the laboratory, it is possible in many cases to induce mutations in a culture of the strain under study and plate out the surviving cells to obtain colonies on agar plates containing a rich medium with many nutrients, which allows auxotrophs as well as parental cells to grow. This is followed by ‘replica plating’ (copying the pattern of colonies by pressing a round, velvet-covered block onto the plate and then onto fresh plates, so that some cells from each colony are transferred) onto both rich medium and minimal medium, in order to identify the rare clones that grow only on the rich medium. The particular growth requirements of the new, unknown auxotrophs can then be determined by testing them on various mixtures (pools) of known ingredients.
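The screening logic described above amounts to two set operations: replica plating finds colonies present on rich medium but absent on minimal medium, and growth on pools narrows down the requirement. The following is a minimal Python sketch; the colony IDs, the nine supplements and the row-and-column pool layout (each supplement placed in exactly two pools) are hypothetical choices for illustration, not a protocol from the text.

```python
# Hypothetical data: colony IDs scored on replicas of one master plate,
# and an assumed 3 x 3 grid of supplements used to build six pools.

def auxotroph_candidates(rich_growth, minimal_growth):
    """Colonies that grow on the rich-medium replica but not on the minimal-medium replica."""
    return sorted(set(rich_growth) - set(minimal_growth))

GRID = [
    ["arginine", "histidine", "leucine"],
    ["methionine", "tryptophan", "adenine"],
    ["uracil", "thiamine", "biotin"],
]

def build_pools(grid):
    """Each row and each column of the grid becomes one pool, so every supplement is in exactly two pools."""
    pools = {f"row{i + 1}": set(row) for i, row in enumerate(grid)}
    for j in range(len(grid[0])):
        pools[f"col{j + 1}"] = {row[j] for row in grid}
    return pools

def infer_requirement(pools, supporting_pools):
    """Supplements present in every pool that supports growth of the auxotroph.

    For a mutant with a single requirement, this is the supplement at the
    intersection of the one row pool and one column pool on which it grows.
    """
    if not supporting_pools:
        return set()
    return set.intersection(*(pools[name] for name in supporting_pools))

# Usage with made-up scores: colony c3 fails to grow on minimal medium.
rich = ["c1", "c2", "c3", "c4", "c5"]
minimal = ["c1", "c2", "c4", "c5"]
candidates = auxotroph_candidates(rich, minimal)

pools = build_pools(GRID)
# Suppose c3 grows only on the pools that contain histidine.
supporting = [name for name, contents in pools.items() if "histidine" in contents]
print(candidates, supporting, infer_requirement(pools, supporting))
```

A mutant with two requirements, or a leaky one, can give ambiguous pool results, so a requirement inferred this way would normally be confirmed on the single supplement.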

An alternative procedure for auxotroph isolation, 
when mutants of a particular type are desired (e.g., 
requiring arginine for growth), is to plate out muta- 
genized cultures onto minimal medium plates that 
contain a very small amount of the particular ingredi- 
ent (e.g., arginine), so that the desired auxotrophs (e.g. 
arginine-requiring) will form rare tiny colonies as 
compared with the majority large colony type. Some 
of the tiny colonies are found to carry mutations in a 
gene (e.g., arg) for the desired phenotype. 

The efficiency of isolation of rare auxotrophs in a 
mutagenized culture can be improved using a method 
that kills most of the nonmutated cells but does 
not kill the auxotrophs. Such a method is the use of an 
antibiotic such as penicillin, which kills, for example, 
Escherichia coli or Salmonella if they are growing (by 
interfering with cell wall synthesis) but not if they are 
starved and in stationary phase. This can be accom- 
plished by growing the mutagenized culture in min- 
imal medium for a period, to allow the auxotrophs to 
come to a halt in growth, and then adding penicillin to 
kill (most of) the growing cells. The survivors are then 
screened to identify auxotrophic mutants. 
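The gain from penicillin enrichment can be estimated with simple arithmetic: growing cells are killed while starved auxotrophs largely survive, so the auxotroph fraction among the survivors rises. The following is a minimal Python sketch; the starting frequency and the killing efficiencies are hypothetical values, not measurements.

```python
def auxotroph_fraction_after_enrichment(initial_fraction, kill_growing, kill_starved=0.0):
    """Fraction of auxotrophs among survivors of one penicillin enrichment.

    initial_fraction: auxotroph frequency in the mutagenized culture.
    kill_growing:     fraction of growing (nonmutated) cells killed by penicillin.
    kill_starved:     fraction of starved, nongrowing auxotrophs killed anyway.
    All three values are illustrative assumptions.
    """
    auxotrophs = initial_fraction * (1 - kill_starved)
    prototrophs = (1 - initial_fraction) * (1 - kill_growing)
    return auxotrophs / (auxotrophs + prototrophs)

# Hypothetical example: 1 auxotroph per 10,000 cells, 99.9% of growing cells killed,
# with the enrichment repeated on the survivors.
fraction = 1e-4
for cycle in (1, 2):
    fraction = auxotroph_fraction_after_enrichment(fraction, kill_growing=0.999)
    print(f"after cycle {cycle}: auxotroph fraction = {fraction:.3f}")
```

With these assumed values, one cycle raises the auxotroph frequency from 1 in 10,000 to roughly 1 in 11, and a second cycle to about 99%.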

Auxotrophic mutations differ widely in how defect- 
ive or ‘tight’ the block in function is. A partially 
blocked, or ‘leaky’ auxotroph is known as a ‘brady- 
troph,’ and grows slowly in the absence of the 
required nutrient. Some of these auxotrophic require- 
ments are much tighter at either high or low tempera- 
ture, thus providing a conditional phenotype that is 
useful in some studies. Most auxotrophic mutations lie in a biosynthetic gene, a gene for tRNA synthesis, or a gene encoding an aminoacyl-tRNA synthetase.


See also: Screening 
Bacterial artificial chromosomes (BACs) are plasmids
used for cloning and stably maintaining large seg- 
ments of foreign DNA in Escherichia coli. This is 
important in various types of analyses of mammalian 
and other genomes. A problem in some recombinant 
DNA experiments is the stable maintenance of large 
(>100 kilobase pairs) inserts in E. coli. Typically, most 
plasmid vectors used for cloning foreign DNA into 
E. coli can stably carry DNA fragments of 10 kb or 
less. When the DNA inserted into a standard cloning 
vector exceeds this size, several problems may result: 


1. The plasmid clone replicates poorly (due, for 
instance, to foreign sequences that adopt structures 
that are difficult for the E. coli apparatus to repli- 
cate, or do not segregate evenly at cell division). 

2. The cell grows slowly (due, for example, to toxic 
products being produced by gene expression from 
the foreign DNA). 

3. The inserted DNA adopts structures, such as cruci- 
forms, that are readily deleted in E. coli. 


In the first two cases, clones carrying rare deletions that reduce the size of the insert and eliminate the growth problem have a growth advantage over the parental clone and eventually overgrow the culture; in the third case, the deletions arise readily. Thus in all these instances the large insert is unstable and hard to maintain.
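Why a rare deletion eventually takes over can be seen from a simple selection model: a subpopulation with a per-generation growth advantage expands geometrically relative to the parental clone. The following is a minimal Python sketch; the initial deletion frequency and the 20% advantage are hypothetical values chosen for illustration.

```python
import math

def deletion_fraction(p0, w, generations):
    """Fraction of cells carrying a deletion after a number of generations.

    p0: initial fraction of deletion-carrying cells (assumed).
    w:  relative growth advantage per generation (w = 1.2 means deletion cells
        leave 20% more descendants per generation than the parental clone; assumed).
    New deletions arising during growth are ignored, so this underestimates
    how fast the culture is taken over.
    """
    expanded = p0 * w ** generations
    return expanded / (expanded + (1 - p0))

def generations_to_majority(p0, w):
    """Smallest whole number of generations after which deletion cells make up at least half the culture."""
    return math.ceil(math.log((1 - p0) / p0) / math.log(w))

# Hypothetical numbers: one deletion cell per million, 20% growth advantage.
n = generations_to_majority(1e-6, 1.2)
print(n, round(deletion_fraction(1e-6, 1.2, n), 3))
```

Under these assumptions deletion-carrying cells become the majority after about 76 generations, a number easily exceeded when a clone is subcultured repeatedly.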

In 1992, Shizuya and Simon at Caltech developed vectors capable of maintaining large inserts without these problems. These vectors were plasmids that contained the origin of replication from the F factor of E. coli. The F factor is a large plasmid that is normally capable of replicating DNA molecules greater than 100 kb in length. It is a low copy number plasmid, being present in only one to two copies per cell. These two features help to prevent problems 1 and 2 above. The deletion of structures such as cruciforms (problem 3) requires that certain enzymes, such as nucleases, load onto the DNA, and may also be influenced by DNA superhelicity; both of these may differ at a replication fork formed from the F origin compared with forks formed from the replicons used in higher copy number vectors.
BAC vectors have been engineered to have other fea- 
tures such as selectable antibiotic resistance genes and 
restriction enzyme cleavage sites for inserting or 
removing foreign DNA. 

BAC vectors have been extremely successful for 
cloning and maintaining mammalian DNA in E. coli. 
In the Human Genome Project, BAC clones from 
large libraries (10°-10° clones) were first carefully 
mapped along each chromosome, and then the DNA 
sequences of a subset of these BAC clones were indi- 
vidually determined and assembled to provide the 
complete human genome sequence. 


See also: DNA Cloning; Plasmids; Vectors 
Dozens of genomes have been sequenced and many
more will soon be added to the list. Unfortunately, 
most genomes have long runs of nucleotides that 
encode genes with unknown functions. Achieving an 
understanding of model bacteria as a means of organ- 
izing our biological knowledge (and more generally 
our knowledge about what constitutes life) is there- 
fore of particular importance. Only two bacterial 
models are available: Escherichia coli, which is the 
model for gram-negative bacteria and is the best- 
known living organism; and Bacillus subtilis, which is 
the model for gram-positive bacteria. The recent con- 
troversial proposal of Gupta to classify bacteria into 
monoderm and diderm cell types places these two 
model systems in key positions in investigations of
what constitutes life. In the case of B. subtilis, most 
of the studies have been devoted to specific processes 
such as sporulation, competence/transformation/re- 
combination, or secretion. Until recently, not much 
was known about the intermediary metabolism of 
B. subtilis. After the elucidation of its genome sequence, knowledge in this area increased dramatically. Apart from its importance as a model
organism, B. subtilis is also widely employed in bio- 
technology, for example in fermentation processes 
(secretion of enzymes and processing of plants such 
as soybean). Because the sequence of its genome is 
now known, it is fast becoming one of the few uni- 
versal models for the understanding of the require- 
ments for life in unicellular organisms. 


Bacillus subtilis and its Biotope 


The objective of any living organism is to occupy a part of the earth’s crust. Among other, ancillary functions, this entails the exploration, colonization, maintenance and exploitation of local resources, and dealing with congeners and with other organisms.
As a consequence, one cannot understand an organism 
if one does not have knowledge of its habitat. Bacillus 
subtilis was first identified in 1872. It is a bacterium that 
can be routinely obtained in pure culture by soaking 
hay in water for a few hours at 37 °C, then filtering and 
boiling for 1 h at neutral pH. Bacillus subtilis has also
been isolated directly from soil-inoculated nutrient 
agar, where B. subtilis predominates among the out- 
growing cultures. Spores are more readily obtained in 
solid media than in liquid media, and they require the 
presence of manganese ions. The bacteria produce a 
complex lipopeptide, surfactin, that permits them to 
glide very efficiently over the surface of certain types of 
media. This property is likely to be related to coloni- 
zation of the surfaces of leaves (the ‘phylloplane’), 
fruits or sometimes roots. Indeed B. subtilis makes up 
the major population of bacteria on flax stems during 
the retting process. Vegetative cells of B. subtilis are 
responsible for the early stages of breakdown of plants, 
and sometimes products of animal origin; some vari- 
ants (e.g., B. amyloliquefaciens) cause potato tubers to 
rot. When conditions become unfavorable, the onset of 
a differentiation process, sporulation, permits the cells 
to generate resistant spores that can be easily dispersed 
throughout the environment, where they will germin- 
ate if conditions are appropriate. 

Unlike the cells of most other bacterial species, the spores of endospore-forming bacteria are highly resistant to the lethal effects of heat, drying, many chemicals, and radiation. In fact, one fashionable hypothesis for the origin of life on earth, panspermia, proposed by Svante Arrhenius and more recently by Francis Crick, relies on the notion that bacterial spores such as those of B. subtilis could travel through space and survive for millions of years. Despite its imaginative appeal, this hypothesis essentially puts the investigation of the origin of life out of our reach, because exploring the whole Universe is not possible.


Compartmentalization: Bacillus subtilis 
and its Envelopes 


Envelope of the Vegetative Cell 
Gram-positive bacteria and, in general, monoderms 
have complex envelopes comprising one bilayer lipid 
membrane separating the cytoplasm from the exterior 
of the cell. The membrane is part of a very complex 
structure that comprises many layers (up to 40 in the 
case of B. subtilis) of murein, or peptidoglycan, a complex of peptides containing D-amino acids (in particular meso-diaminopimelic acid), and amino sugars.
The cell envelope also has several layers of teichoic 
acid (Figure 1). 

Whether B. subtilis possesses a periplasm, a distinct cell compartment bounded by the cytoplasmic membrane and the cell wall, is a controversial issue. Cytoplasm, membrane, and protoplast supernatant fractions were prepared from protoplasts generated from phosphate-limited cells. The protoplast supernatant fraction was found to include cell wall-bound proteins, exoproteins in transit, and contaminating cytoplasmic proteins arising through leakage from a fraction of the protoplasts. By this operational definition, 10% of the proteins of B. subtilis can be considered periplasmic.


Sporulation 

Figure 1 (See Plate 2) Electron micrograph of Bacillus subtilis in the process of sporulation.

Upon starvation, B. subtilis stops growing and initiates sporulation. This developmental process involves differentiation into two cell types (Figure 1). The process begins with a reorganization of the cell cycle that leads to the production of cells whose size and chromosome content are appropriate for the developmental process. The formation of the two cell types, a forespore and a mother cell twice as large as the forespore, with differing developmental fates, is the first morphological indication of the early stages of
sporulation in B. subtilis. Endospore formation is a 
multistep process that is common among bacilli. This 
seemingly simple structure is the product of a very 
complex network of interconnected regulatory 
pathways that become activated during late growth 
in response to unbalanced nutritional shifts and cell 
cycle-related signals. Sporulation starts with stage 0 
(vegetative growth). Symmetrical cell division, char- 
acteristic of vegetative growth, is blocked. Instead, the 
cell divides asymmetrically to produce a small polar 
prespore cell and a much larger mother cell. During 
stage I, asymmetrical preseptation starts. The cellular 
DNA takes the shape of an axial filament. At stage II, 
septation proceeds and the daughter chromosomes 
are separated. Spore development follows at stage III
(engulfment of the forespore and complete separation 
of the spore membrane from that of the mother cell). 
Stage IV involves formation of the spore cortex. In stage 
V spore coat proteins are synthesized and assembled. 
At stage VI the spore becomes highly refractile under 
the microscope, and it acquires heat and stress resist- 
ance. Finally, the programmed death of the mother cell 
occurs, leading to lysis and release of the mature spore 
(stage VII). Pigments are produced that stain the spores from reddish brown to blackish brown (black in the presence of tyrosine).
Our understanding of sporulation control in
B. subtilis is extensive. The process combines phos- 
phorylation cascades mediated by kinases and phos- 
phatases with a network of transcription controls 
by sigma factors, together with membrane-bound ef- 
fector molecules that control compartmentalization 
(Figure 2). Despite the intensive work of hundreds of 
scientists on many of the signals involved in the onset 
and control of sporulation, some still remain unknown. 

The spore coat is a complex envelope comprising 
several layers of spore coat proteins that protect the 
almost entirely desiccated interior of the spore, where 
DNA is compacted and protected from the harmful 
influence of the environment. Under conditions of 
appropriate moisture, in media that contain alanine, 
glucose, and minerals, spores are able to germinate. 
This process involves swelling and a complex lytic 
process that opens and sometimes degrades the coat- 
ing envelope, during which time metabolism is 
initiated. Cells then resume normal vegetative growth. 


Quorum-Sensing and Chemotaxis 

It has long been known that bacteria form colonies on agar plates. If the medium is appropriate, these colonies give rise to bacterial swarming. In the late 1960s, it was observed that cultures of Vibrio fischeri (a luminescent gram-negative bacterium that colonizes squid) remained nonluminescent during the first hours of growth, during which time the number of cells increased. Luminescence appeared when the population reached a significant density, at a moment when the bacteria ran out of nutrients. This collective behavior meant that a bacterial function was expressed at a certain cell density: the organisms in the population were sensing each other. This was termed ‘quorum sensing.’

Figure 2 The stages of sporulation. The various sigma transcription factors that control gene expression processes during sporulation are indicated within the compartments where they operate.

A variety of processes are regulated in a cell density-dependent or growth phase-dependent manner in gram-positive bacteria. In the early 1990s quorum sensing was discovered in B. subtilis and was clearly linked to sporulation (to swarm or to sporulate, that is the question), but the functional reason(s) for the existence of the process are not yet known. Most bacteria that use
quorum sensing systems inhabit an animal or plant. 
The microorganisms benefit from the process, but 
the host organism may or may not. Each bacterium 
produces small diffusible molecules that allow cell-to- 
cell communication. As the population of bacteria 
increases, so does the concentration of the signaling 
molecules. Sensors recognize these molecules. Once 
the local concentration in the medium has reached a 
threshold value, the sensor proteins transmit a signal 
to a transcriptional regulator. Examples of such 
quorum-sensing modes are the development of 
genetic competence in B. subtilis and Streptococcus 
pneumoniae, the virulence response in Staphylococcus 
aureus, and the production of antimicrobial peptides 
by several species of gram-positive bacteria, including 
lactic acid bacteria. 
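The threshold mechanism just described can be illustrated with a toy model in which each cell secretes a diffusible signal, the signal concentration tracks cell density, and the response switches on when a sensor threshold is crossed. The following is a minimal Python sketch, assuming logistic growth and a quasi-steady-state signal level; all parameter values are arbitrary, and the model does not represent any particular pheromone system.

```python
def signal_concentration(density, secretion_rate, loss_rate):
    """Quasi-steady-state signal level C = s * N / k when every cell secretes the
    signal at rate s and it is removed (diffusion, degradation) at first-order rate k."""
    return secretion_rate * density / loss_rate

def time_to_quorum(n0, growth_rate, capacity, secretion_rate, loss_rate,
                   threshold, dt=0.01, t_max=48.0):
    """Grow a population logistically and return (time, density) when the signal
    first reaches the sensor threshold, or None if it never does by t_max."""
    n, t = n0, 0.0
    while t <= t_max:
        if signal_concentration(n, secretion_rate, loss_rate) >= threshold:
            return t, n
        n += growth_rate * n * (1 - n / capacity) * dt  # Euler step of logistic growth
        t += dt
    return None

# Illustrative parameters (arbitrary units): the quorum density is
# threshold * loss_rate / secretion_rate = 1e9 cells per unit volume.
params = dict(secretion_rate=1e-9, loss_rate=1.0, threshold=1.0)
print(time_to_quorum(n0=1e6, growth_rate=1.0, capacity=1e10, **params))
print(time_to_quorum(n0=1e6, growth_rate=1.0, capacity=5e8, **params))  # None
```

The second call shows that the response never switches on if the population cannot reach the quorum density, which is what distinguishes density sensing from a simple timer.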

A variety of ways for bacterial populations to coordinate their activities have been discovered. Cell density-
dependent regulation in these systems appears to 
follow a common theme. First, the signal molecule (a 
posttranslationally processed peptide-pheromone) is 
secreted by a dedicated ATP-binding cassette (ABC) 
exporter. The role of the secreted peptide pheromone 
is to function as the input signal for a specific sensor 
component of a two-component signal-transduction 
system. Coexpression of the elements involved in 
this process results in self-regulation of peptide- 
pheromone production. Peptides are secreted and 
processed under various conditions that are further 
recognized by the cell. Next, in response to phero- 
mone, cells swim in a coordinated fashion, thereby 
forming a kind of wall surrounding rings of bacteria 
having the same exploration behavior (Figure 3). 
Bacillus subtilis is a highly motile bacterium, endowed 
with a complex flagellar machinery. This permits cells 
to swim toward nutrients or away from repellents. 
Many genes similar to those known in motile bacteria 
are found in B. subtilis, making it likely that the tum- 
bling and swimming processes function similarly to 
those of E. coli. One can expect that this behavior permits the cells to invade and colonize the surface of leaves, where they can find nutrients (especially carbon and nitrogen sources, as well as micronutrients) secreted by the plant or by decaying leaves. The bacteria secrete antibiotics that permit them to outcompete other organisms; for example, the products of the pks genes act against Agrobacterium species. This establishes a cooperation between the plant and the bacteria, a form of commensalism rather than symbiosis.


Protein Secretion 

Bacillus subtilis is one of the organisms of choice in the 
study of protein secretion. At the time of this review, 
many fundamental aspects of this process are not yet 
understood. Several systems enable proteins to be 
inserted into the membrane and/or to be located out- 
side of the membrane or secreted into the surrounding 
medium. In B. subtilis, the Sec-dependent pathway (one that recognizes signal peptides) has at least five different signal peptidases. Proteins that are periplasmic in gram-negative bacteria are also found in B. subtilis, presumably as lipoproteins (i.e., proteins possessing a specific signal peptide that is cleaved immediately upstream of a cysteine residue, which anchors the protein covalently to the outer lipid layer of the cell membrane).

The signal recognition particle (SRP) system is an 
oligomeric complex that mediates targeting and inser- 
tion of proteins into the cytoplasmic membrane. SRP 
consists of a 4.5S RNA and several protein subunits. 
One of these subunits, Ffh, interacts with the signal se- 
quence of nascent polypeptides. The N-terminal resi- 
dues of Ffh include a GTP-binding site (G-domain) 
and are evolutionarily related to similar domains in 
other proteins. A second protein, the counterpart of 
the E. coli FtsY protein, is believed to play a role 
similar to that of the docking protein in eukaryotes. 

Finally, it appears that some B. subtilis secreted 
proteins are made of two parts. The first part remains 
inserted in the membrane, presumably as a permease, 
and the second part is liberated in the surrounding 
medium after cleavage by an unknown protease. 


Metabolism 


In addition to the need for compartmentalization, living cells must chemically transform some molecules into others. Metabolism is the hallmark of life. Cells can be in a dormant state, as is the case with spores, for example, but one cannot be sure that they are living organisms unless, at some point, they initiate metabolism. In general, one distinguishes between primary metabolism (the transformation of molecules that support cell growth and energy production) and secondary metabolism (transactions involving molecules that are not necessary for survival and multiplication, but assist in the exploration and occupation of biotopes, e.g., antibiotic synthesis).

Figure 3 Competence is triggered when Bacillus subtilis encounters a signal from the environment and when an appropriate quorum is reached, monitored by pheromones synthesized by the bacteria. The chain of events is depicted. A sensor controls a regulator which, through a phosphorylation cascade, controls transcription. The onset of sporulation negatively controls competence under appropriate conditions through the action of the protein


Transport of Basal Cell Atoms 
Carbon, oxygen, nitrogen, hydrogen, sulfur, and 
phosphorus are the core atoms of life. Electron trans- 
fers and catalytic processes, as well as the generation of 
electrochemical gradients, require many other atoms 
in the form of ions. Metabolic processes allow the cell 
to concentrate, modify, and excrete ions and molecules 
that are necessary in energy management, growth and 
cell division. 

Nutrients and ions are transported into cells by 
a number of more or less specific permeases, most 
of which belong to the ABC permease category. In 
B. subtilis these permeases generally comprise a bind- 
ing lipoprotein responsible for part of the specificity, 
located at the external surface of the membrane, an 
integral membrane channel made of proteins of two 
different types, and a dimeric, membrane-bound cyto- 
plasmic complex, which binds and hydrolyzes ATP as 
the energy source. For positively charged ions, select- 
ivity is the most important feature of the permease, 
because the electrochemical gradient is oriented 
toward the interior of the cell (negative inside). Ions 
must be concentrated from the environment until they reach the concentration required for proper activity, but must not reach inhibitory levels. Apart from iron, which is scavenged from the environment with highly selective siderophores synthesized in response to iron limitation, manganese is the most important transition metal ion for B. subtilis. It is required for many enzyme activities, such as superoxide dismutase, agmatinase, phosphoglycerate mutase, pyrophosphatase, etc.
Copper is important for electron transfer and cobalt is 
required by the important recycling protein methio- 
nine aminopeptidase. Nickel is required by urease, zinc 
is needed as a cofactor of polymerases and dehydro- 
genases, and magnesium is involved in catalytic com- 
plexes with substrate in about one-third of enzyme 
reactions. Potassium is needed to construct the elec- 
trochemical gradient of the cell’s cytoplasm, and is a 
likely cofactor in many reactions. Calcium is probably 
needed in major reactions during the division cycle, 
but the importance of this ion still remains a mystery. 
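The argument that cations are drawn into the cell by the negative-inside potential, whereas anions must be imported against it, can be made quantitative with the Nernst relation. The following is a minimal Python sketch, assuming a membrane potential of -150 mV and a temperature of 37 °C; both values are illustrative assumptions rather than figures from the text, and real uptake also depends on pH gradients and on how each transporter is energized.

```python
import math

FARADAY = 96485.0     # C/mol
GAS_CONSTANT = 8.314  # J/(mol*K)

def equilibrium_ratio(z, delta_psi, temp_kelvin=310.15):
    """Inside/outside concentration ratio at which an ion of charge z is at
    electrochemical equilibrium across a membrane with potential
    delta_psi = psi_inside - psi_outside (volts):
        c_in / c_out = exp(-z * F * delta_psi / (R * T)).
    Only the membrane potential is considered; pH gradients and ATP-driven
    transport are ignored."""
    return math.exp(-z * FARADAY * delta_psi / (GAS_CONSTANT * temp_kelvin))

DELTA_PSI = -0.150  # assumed illustrative potential, negative inside (volts)

for name, z in [("K+", 1), ("Mn2+", 2), ("monovalent anion", -1), ("divalent anion", -2)]:
    print(f"{name:>16}: c_in/c_out = {equilibrium_ratio(z, DELTA_PSI):.3g}")
```

With these assumed values a monovalent cation would reach equilibrium at roughly 270-fold accumulation and a divalent cation such as Mn2+ at about 7.5 × 10^4-fold, which is why selective, regulated uptake matters; a monovalent anion would be held at about 1/270 of its external concentration unless energy is spent.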

Anions are also important, and they need to be 
imported against a strong electrochemical gradient. 
Phosphate in particular requires a set of highly 
involved transport systems. For B. subtilis a main 
source of phosphate is probably phytic acid, a slowly 
degraded phosphate-rich molecule. Sulfate is the pre- 
cursor of many important coenzymes in addition 
to cysteine and methionine, but not much is yet 
known about its transport and metabolism, except 
Intermediary Metabolism

Carbon and nitrogen metabolism in B. subtilis follow 
the general rules of intermediary metabolism in aerobic 
bacteria, with a complete glycolytic pathway and 
a tricarboxylic acid cycle. Electron transfer to oxygen 
is mediated by a set of cytochromes and cytochrome 
oxidases, allowing efficient respiration in B. subtilis. 
This organism is generally said to be a strict aerobe, 
and indeed it respires very efficiently. However, it can 
grow in the absence of molecular oxygen, provided 
that appropriate electron acceptors such as nitrate 
are present in the environment. Coupled to electron 
transfer, a proton addition between NAD(P) and 
NAD(P)H occurs. Bacillus subtilis does not possess 
a transhydrogenase that could equilibrate the pools 
of NADH and NADPH. Therefore, because the 
enzymes using NAD and NADP often differ, there 
must be a means of equilibrating the corresponding 
pools of reduced molecules. 

As expected from its plant-associated biotope, B. subtilis can grow on many of the carbohydrates synthesized by plants. In particular, sucrose can function as a major carbon source in this organism, via a very complicated set of highly regulated pathways. As in many other eubacteria, the phosphoenolpyruvate-dependent phosphotransferase system (PTS) plays a major role in carbohydrate transport and regulation. Catabolite repression control,
mediated by a unique system involving specific factors 
(and no cyclic AMP), exists in this organism. Some 
knowledge about nitrogen metabolism in B. subtilis 
has accumulated, but significantly less than in its E. coli 
counterpart. Many nitrogenous compounds, such as 
arginine or histidine, can be transported and used by 
B. subtilis. A specific transcription factor controls
the response to nitrogen availability. Amino acid biosynthesis is not
yet well documented, but purine and pyrimidine
metabolism is well understood. In B. subtilis, in contrast
to E. coli, there are two carbamoylphosphate
synthases: one specific for arginine synthesis and the 
other for pyrimidine synthesis. As in other living 
organisms, the ubiquitous polyamines putrescine and 
spermidine play a fundamental, yet enigmatic, role. 
They arise via the decarboxylation of arginine to
agmatine, coupled to a manganese-containing agmatinase,
and not from decarboxylation of ornithine, as in
higher eukaryotes.


Special Environmental Conditions 

Another aspect of the B. subtilis life cycle is that it can 
grow over a wide range of different temperatures up to 
54-55 °C. This indicates that its biosynthetic machinery 
comprises control elements and molecular chaperones
that permit this versatility. Specific transcription control
processes allow the cell to adapt to changes of
temperature by transiently synthesizing heat shock or 
cold shock proteins, according to the environmental 
conditions. In addition, gene duplication may permit 
adaptation to high temperature, with isozymes having 
low and high temperature optima. As a case in point, 
B. subtilis has two thymidylate synthases. The one 
coded by the thyA gene is thermostable (and more 
related to the archaebacterial type), and the other, ThyB,
is thermosensitive. Bacillus subtilis is also able to adapt
to strong osmotic stresses, such as those that occur
during dehydration, and can adapt to high oxygen
concentrations and changes in pH. Not much is yet
known about the corresponding genes and regulation. 

Because the ecological niche of B. subtilis is linked 
to the plant kingdom it is subjected to rapid alternat- 
ing drying and wetting. Accordingly, this organism is 
very resistant to osmotic stress and can grow well in
media containing 1 M NaCl; indeed, B. subtilis has
been recovered from sea water.


Secondary Metabolism 

In Bacillus species, starvation leads to the activation of 
a number of processes that affect the ability to survive 
during periods of nutritional stress. Capabilities that 
are induced include competence and sporulation, the 
synthesis of degradative enzymes, motility, and anti- 
biotic production. Some genes in these systems are 
activated during the transition from exponential to 
stationary growth. They are controlled by mechan- 
isms that operate primarily at the level of transcription 
initiation. One class of genes functions in the synthesis 
of special metabolites such as peptide antibiotics, as 
well as the cyclic lipopeptide surfactin. These genes 
include the srfA operon that codes for the enzymes of 
the surfactin synthetase complex or the pks operon, 
presumably controlling synthesis of polyketides. Several
antifungal antibiotics, some of which are used in
agriculture, are produced by B. subtilis strains, indi- 
cating that competition with fungi is probably a major 
feature of the B. subtilis biotope. Peptide or polyketide 
antibiotic biosynthesis genes are regulated by factors as 
diverse as the early sporulation gene product Spo0A, 
the transition-state regulator AbrB, and gene products 
such as ComA, ComP, and ComQ, required for the 
initiation of the competence developmental pathway. 


Information Transfer: B. subtilis Genome 
and its Organization 


The complete sequence (4 214 820 bp) of the B. subtilis 
genome (strain 168) was published in November 1997, 
and further corrected after several rounds of sequence
verification. The reference specialized database,
SubtiList, updates the genome sequence and annotation
as work on B. subtilis progresses throughout the
world. Of the more than 4100 protein-coding genes,
53% are present in a single copy. A quarter of the genome
corresponds to several gene families that have been 
greatly expanded by gene duplication, the largest of 
which is a family containing 77 putative ATP-binding 
cassette permeases. 


Features of Genome Sequence 

Analysis for repeated sequences in the genome 
demonstrated that strain 168 does not contain inser- 
tion sequences. A strict constraint on the spatial dis- 
tribution of repeats longer than 25 bp was found in the 
genome, in contrast to the situation in E. coli. This was 
interpreted as a hallmark of selective processes leading 
to the insertion of new genetic information into the 
genome. Such insertion appears to rest on the uptake 
of nonspecific DNA by the competent cell and its 
subsequent integration in the chromosome in a circu- 
lar form through a Campbell-like mechanism. Similar 
patterns are found in other competent genomes of 
gram-negative bacteria as well as Archaea, suggesting 
a similar evolutionary mechanism. The correlation of 
the spatial distribution of repeats and the absence of 
insertion sequences in the genome suggests that 
mechanisms aiming at their avoidance and/or elimin- 
ation have been developed. 
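The repeat survey described above can be illustrated with a minimal sketch. It assumes that only exact, direct (same-orientation) repeats matter and uses hypothetical helper names (`find_exact_repeats`, `repeat_spacings`). Because any exact repeat longer than 25 bp must share at least one 26-mer, the sketch indexes every 26-mer and reports those that occur more than once, along with the spacing between copies. Inverted and approximate repeats, which a real genome survey would also have to consider, are ignored here.

```python
from collections import defaultdict


def find_exact_repeats(seq: str, k: int = 26) -> dict:
    """Map each k-mer occurring more than once in `seq` to its 0-based start positions.

    Hypothetical helper: only exact, same-orientation (direct) copies are found.
    The default k = 26 detects any exact repeat longer than 25 bp.
    """
    seq = seq.upper()
    positions = defaultdict(list)
    for i in range(len(seq) - k + 1):
        positions[seq[i:i + k]].append(i)
    return {kmer: starts for kmer, starts in positions.items() if len(starts) > 1}


def repeat_spacings(repeats: dict) -> list:
    """Distances between consecutive copies of each repeated k-mer, sorted."""
    spacings = []
    for starts in repeats.values():
        spacings.extend(b - a for a, b in zip(starts, starts[1:]))
    return sorted(spacings)


# Toy input: a made-up 28-bp segment present twice, 33 bp apart (not real B. subtilis DNA).
unit = "ACGTTGCAGGCTAACGTTAGCCGATCGA"
genome = unit + "TTTTT" + unit
reps = find_exact_repeats(genome)
print(len(reps), repeat_spacings(reps))
```

Looking at the spacings, rather than only at the repeat count, is what allows the spatial distribution of repeats to be compared between genomes.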

Knowledge of whole genome sequences allows one 
to investigate the relationships between genes and gene
products at a global level. Although there is generally
no predictable link between the structure and function
of biological objects, the pressure of natural selection
has created some fit between genes, gene products,
and survival. Biases in the features of processes expected to be unbiased
are evidence of prior selective pressure. In
the case of B. subtilis one observes a strong bias in the 
polarity of transcription with respect to replication: 
70% of the genes are transcribed in the direction of the 
replicating fork movement. Global analysis of oligo- 
nucleotides in the genome demonstrated that there is a 
significant bias, not only in the base or codon compos- 
ition of one DNA strand with respect to the other, but, 
quite surprisingly, also at the level of the amino acid 
content of the proteins. The proteins coded by the 
leading strand are valine-rich, and those coded by 
the lagging strand are threonine + isoleucine-rich. 
This first law of genomics seems to extend to most 
bacterial genomes. It must result from a strong selec- 
tion pressure of a yet unknown nature. 
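The transcription-versus-replication bias can be measured roughly as follows. This is a minimal sketch under stated simplifications that are not from the text: the origin sits at coordinate 0, the terminus defaults to the genome midpoint (only an approximation, so pass the real coordinate when it is known), and no gene straddles the origin or terminus. The function name and the toy annotation are hypothetical.

```python
def fraction_codirectional(genes, genome_length, terminus=None):
    """Fraction of genes transcribed in the same direction as replication-fork movement.

    genes: iterable of (start, end, strand) tuples, 0-based coordinates, strand "+" or "-".
    Assumes the origin lies at coordinate 0 and, by default, the terminus at the midpoint.
    """
    if terminus is None:
        terminus = genome_length // 2
    codirectional = total = 0
    for start, end, strand in genes:
        midpoint = (start + end) // 2
        # Between origin and terminus the fork moves toward higher coordinates,
        # so "+" genes are codirectional there. On the other replichore the fork
        # moves toward lower coordinates, so "-" genes are codirectional.
        in_right_replichore = midpoint < terminus
        if (strand == "+") == in_right_replichore:
            codirectional += 1
        total += 1
    return codirectional / total if total else float("nan")


# Toy annotation on an 8,000-bp "genome" (made-up coordinates).
toy_genes = [(100, 900, "+"), (2000, 2500, "+"), (6000, 6500, "-"), (7000, 7400, "+")]
print(fraction_codirectional(toy_genes, 8000))  # 0.75
```

Applied to a real annotation, the same count yields the kind of figure quoted above for strain 168. The toy data here are not meant to reproduce that value.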


Codon Usage and Organization of the Cell’s 
Cytoplasm 

Because the genetic code is redundant, coding 
sequences exhibit highly variable patterns of codon 
usage. If there were no bias, all codons for a given
amino acid should be used more or less equally. The 
genes of B. subtilis have been split into three classes on 
the basis of their codon usage bias. One class comprises 
the bulk of the proteins, another is made up of genes 
that are expressed at a high level during exponential 
growth, and a third class, with A+T-rich codons, cor- 
responds to portions of the genome that have been 
horizontally exchanged. What is the source of such 
biases? Random mutations would be expected to have 
smoothed out any differences, but this is not the case. 
There are also systematic effects of context, with some 
DNA sequences being favored or selected against. 
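The text does not name a specific bias measure. One standard measure, offered here only as an illustration, is relative synonymous codon usage (RSCU): the observed count of a codon divided by the count expected if all synonymous codons for that amino acid were used equally. A value of 1.0 therefore means no bias. The sketch assumes the standard genetic code and a single in-frame coding sequence. Partitioning genes into the three classes described above would require an additional clustering step across all genes, which is not shown.

```python
from collections import Counter
from itertools import product

# Standard genetic code (NCBI translation table 1), codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

# Group synonymous codons by the amino acid they encode (stop codons excluded).
SYNONYMS = {}
for codon, aa in CODON_TABLE.items():
    if aa != "*":
        SYNONYMS.setdefault(aa, []).append(codon)


def rscu(cds: str) -> dict:
    """Relative synonymous codon usage for one coding sequence.

    RSCU = observed count / count expected if all synonyms were used equally,
    so 1.0 means no bias, >1.0 a preferred codon, <1.0 an avoided one.
    """
    cds = cds.upper().replace("U", "T")
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    counts = Counter(cds[i:i + 3] for i in range(0, usable, 3))
    values = {}
    for aa, codons in SYNONYMS.items():
        total = sum(counts[c] for c in codons)
        if total == 0:
            continue  # amino acid absent: RSCU undefined
        expected = total / len(codons)
        for c in codons:
            values[c] = counts[c] / expected
    return values


# Toy sequence: four alanine codons, three of them GCT.
print(rscu("GCTGCTGCTGCA"))
```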

The cytoplasm of a cell is not a tiny test tube. One 
of the most puzzling features of the organization of 
the cytoplasm is that it accommodates the presence of 
a very long thread-like molecule, DNA, which is tran- 
scribed to generate a multitude of RNA threads that 
usually are as long as the whole cell. If
mRNA molecules were left free in the cytoplasm, all
kinds of knotted structures would arise. There must
therefore exist some organizational principles that
prevent mRNA molecules and DNA from becoming 
entangled. Several models, supported by experiments, 
postulate an arrangement where transcribed regions 
are present at the surface of a chromoid, in such a way 
that RNA polymerase does not have to circumscribe 
the double helix during transcription. Compartmen- 
talization is important even for small molecules, 
despite the fact that they can diffuse quickly. In a 
B. subtilis cell growing exponentially in rich medium, 
the ribosomes occupy more than 15% of the cell’s 
volume. The cytoplasm is therefore a ribosome lattice, 
in which the local diffusion rates of small molecules, 
as well as macromolecules, are relatively slow. Along
the same lines, the calculated protein concentration
of the cell is ca. 100-200 mg ml⁻¹, a very high concentration.

The translational machinery requires an appro- 
priate pool of elongation factors, aminoacyl-tRNA 
synthetases, and tRNAs. Counting the number of 
tRNA molecules adjacent to a given ribosome, one 
conceptualizes a small, finite number of molecules. As 
a consequence, a translating ribosome is an attractor 
that acts upon a limited pool of tRNA molecules. This 
situation provides a form of selective pressure, whose 
outcome would be adaptation of the codon usage bias 
of the translated message as a function of its position 
within the cytoplasm. If codon usage bias were to 
change from mRNA to mRNA, these different mole- 
cules would not see the same ribosomes during the life 
cycle. In particular, if two genes had very different 
codon usage patterns, this would predict that the cor- 
responding mRNAs are not formed within the same 
sector of the cytoplasm. 
When mRNA threads are emerging from DNA
they become engaged by the lattice of ribosomes, 
and they ratchet from one ribosome to the next, like 
a thread in a wiredrawing machine (note that this is 
exactly opposite to the view of translation presented in 
textbooks, where ribosomes are supposed to travel 
along fixed mRNA molecules). In this process, nas- 
cent proteins are synthesized on each ribosome, and 
spread throughout the cytoplasm by the linear diffu- 
sion of the mRNA molecule from one ribosome to the 
next. However, when mRNA disengages from DNA, 
the transcription complex must sometimes break up. 
Broken mRNA is likely to be a dangerous molecule 
because, if translated, it would produce a truncated 
protein. Such protein fragments are often toxic, because 
they can disrupt the architecture of multisubunit complexes
(this explains why many nonsense mutants are
dominant negative, rather than recessive). There exists
a process that copes with this kind of accident in
B. subtilis. When a prematurely terminated mRNA molecule
reaches its end, the ribosome stops translating,
does not dissociate, and waits. A specialized RNA,
tmRNA, which is folded and processed at its 3′ end
like a tRNA and charged with alanine, comes in, inserts
its alanine at the C-terminus of the nascent polypeptide,
then replaces the mRNA within the ribosome, where
it is translated as ASFNQNVALAA. This tail is a
protein tag that then directs the tagged protein to a proteolytic
complex (ClpA, ClpX), where it is degraded.

The organization of the ribosome lattice, coupled 
to the organization of the transcribing surface of the 
chromoid, ensures that mRNA molecules are trans- 
lated parallel to each other, in such a way that they do 
not make knots. Polycistronic operons ensure that 
proteins having related functions are coexpressed 
locally, permitting channeling of the corresponding 
pathway intermediates. In this way, the structure of 
mRNA molecules is coupled to their fate in the cell, 
and to their function in compartmentalization. Genes 
translated sequentially in operons are physiologically 
and structurally connected. This is also true for 
mRNAs that are translated parallel to each other, 
suggesting that several RNA polymerases are engaged 
in the transcription process simultaneously, yoked 
as draft animals. Indeed, if there is correlation of 
function and/or localization in one dimension, there 
exists a similar constraint in the orthogonal directions. 
Because ribosomes attract tRNA molecules, they 
bring about a local coupling between these molecules 
and the codons being translated. This predicts that a 
given ribosome would preferentially translate mRNAs 
having similar patterns of codon usage. As a conse- 
quence, as one moves away from a strongly biased ribo- 
some, there would be less and less availability of the 
most biased tRNAs. This creates a selection pressure
for a gradient of codon usage as one goes away from
the most biased messages and ribosomes, nesting tran- 
scripts around central core(s), formed of transcripts 
for highly biased genes. Finally, ribosome synthesis 
creates a repulsive force that pushes DNA strands 
away from each other, in particular from regions 
near the origin of replication. Together these processes 
result in a gene gradient along the chromosome, which 
is an important element of the architecture of the cell. 


Information Transfer 

The DNA polymerase complex of B. subtilis is 
attached to the membrane. During replication, the 
DNA template moves through the polymerase. This 
might be caused in part by the formation of planar 
hexagonal layers of DnaC, the homolog of E. coli 
DnaB helicase. The B. subtilis chromosome starts 
replicating at a well-defined Ori site, and terminates 
in a symmetrical region, probably using a recombin- 
ational process to resolve the knotted structure at the 
terminus. This may account for the presence of hori- 
zontally exchanged genetic material (prophages in 
particular) near the terminus. 

Transcription in B. subtilis is similar to that in other 
eubacteria. The major RNA polymerase is a holoenzyme
made up of four subunits (two α, β, β′) and a
sigma factor. Eighteen sigma factors have been identified
within the genomic sequence. Apart from σ54,
which is specialized for the control of nitrogen metabolism,
the other σ factors specifically control specialized
processes such as sporulation, stress response, or chemotaxis
and motility.

Translation in B. subtilis is typical of eubacterial 
translation. A new type of control of the synthesis 
of aminoacyl-tRNA synthetase was discovered in 
B. subtilis. Most aminoacyl-tRNA synthetase genes 
belong to the so-called T-box family of genes. They 
are regulated by a common mechanism of transcrip- 
tional antitermination. Each gene is induced by spe- 
cific amino acid limitation; the uncharged cognate 
tRNA is the effector that induces transcription of the 
full-length message. The mRNA leader regions of 
the genes in this family share a number of conserved 
primary sequence and secondary structural elements,
some of which are involved in binding the uncharged
tRNA molecule.
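The T-box logic can be reduced to a boolean toy model. In this sketch the leader carries a specifier codon, and transcription reads through the terminator only when an uncharged tRNA with a matching anticodon is present. The class and function names, the UAC specifier codon, and the all-or-nothing response are illustrative assumptions. The real switch is presumably graded by the balance of charged and uncharged tRNA, and wobble pairing is ignored here.

```python
from dataclasses import dataclass

# RNA complement, used to check codon-anticodon pairing.
COMPLEMENT = str.maketrans("ACGU", "UGCA")


@dataclass
class TRNA:
    anticodon: str  # written 5'->3', as is conventional
    charged: bool   # True if aminoacylated


def pairs_with(specifier_codon: str, anticodon: str) -> bool:
    """Strict Watson-Crick pairing (codon 5'->3' vs anticodon read 3'->5'); no wobble."""
    return specifier_codon == anticodon[::-1].translate(COMPLEMENT)


def leader_reads_through(specifier_codon: str, trna_pool) -> bool:
    """Toy T-box switch: antitermination (full-length transcript) occurs only if an
    uncharged tRNA whose anticodon matches the specifier codon is present."""
    return any(
        pairs_with(specifier_codon, t.anticodon) and not t.charged for t in trna_pool
    )


# Hypothetical leader with specifier codon UAC (tyrosine); tRNA-Tyr anticodon GUA.
print(leader_reads_through("UAC", [TRNA("GUA", charged=True)]))   # False: terminate
print(leader_reads_through("UAC", [TRNA("GUA", charged=False)]))  # True: read through
```

Only the tRNA's charging state changes between the two calls, which mirrors the text's point that the uncharged cognate tRNA is the effector.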


Horizontal Gene Transfer and Phylogeny 


Three principal modes of transfer of genetic material, 
namely transformation, conjugation, and transduction 
occur naturally in prokaryotes. In B. subtilis there 
is not much evidence for conjugation processes 
(although DNA can be conjugated into the organism), 
but transformation is an efficient process (at least in
some B. subtilis strains, such as the Marburg strain
168), and transduction with the appropriate carrier
phages is well understood. 


Bacillus subtilis Phages 

An unexpected result that emerged from an analysis of
the B. subtilis genome sequence was that it harbors at 
least 10 prophages or prophage-like elements. While 
the lysogenic SPbeta phage, as well as the defective 
PBSX and skin elements, was known to be present, no 
other phage had been identified. Many phages how- 
ever can utilize B. subtilis as a host, in particular phi- 
29, phi-105, SPO1, SPP1, beta 22 or SF6, but the 
details of their biology are generally not well docu- 
mented. Bacteriophage PBS1, or the phages IG1, IG3, 
and IG4 can perform specialized transduction. 
Among the remarkable features of the phage genomes 
is the presence of introns or inteins, especially in
genes involved in modulating DNA synthesis by the 
host. A three-dimensional reconstruction of phage 
phi-29 and its empty prohead precursor has been per- 
formed using cryoelectron microscopy. The head-tail 
connector, which is the central component of the 
DNA packaging machine, has been visualized in situ. 
The connector, with 12- or 13-fold symmetry, appears 
to fit loosely into a pentameric vertex of the head, a 
symmetry mismatch that may be required to rotate 
the connector to package DNA. An RNA molecule, 
pRNA, is required in the form of a hexamer to
package DNA in the capsid. 


Competence and Transformation 

In addition to sporulation, B. subtilis undergoes another
developmental process, competence, which can
lead to genetic transformation. Interconnected regu- 
latory networks control the initiation of sporulation 
and the development of genetic competence. These 
two developmental pathways have both common 
and unique features and make use of similar regulatory 
strategies. This explains why, before the genome of 
B. subtilis was sequenced, the vast majority of ex- 
periments using this organism were dealing with 
these processes. Quorum-sensing, used by cells to 
monitor local cell density, controls the transformation- 
competence of B. subtilis. This control system is part 
of the 11 phosphorylation cascades comprising a 
regulatory aspartate phosphatase that have been dis- 
covered in the strain 168 genome. 


Recombination 

The presence in the B. subtilis genome of local repeats, 
suggesting Campbell-like integration of foreign 
DNA, is consistent with a strong involvement 
of recombination processes in its evolution. In add- 
ition, recombination must be involved in mutation 
correction. It is therefore interesting to analyze the
proofreading systems at the level of replication. In
B. subtilis, MutS and MutL homologs exist, presumably
for the purpose of recognizing mismatched base pairs.
But no MutH activity could be identified that would 
allow the daughter strand to be distinguished from its 
parent. It is therefore not known yet how long-patch 
mismatch repair corrects mutations in the proper 
strand. Excision of misincorporated uracil instead of 
thymine might be a general process that would not 
require extra information. 


Restriction-Modification Systems

Bacillus subtilis strains contain many restriction- 
modification systems, mostly of type II, many of 
which were probably transferred from phages. The sequence
specificities of several restriction-modification
systems are known: BsuM (CTCGAG); BsuE 
(CGCG); BsuF (CCGG); BsuRI (GGCC); and 
BsuBI, which is similar to the PstI system. BsuC is a 
type I system, which is very similar to the ones found 
in enterobacteria. 
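Given the recognition sequences listed above, locating candidate sites in a DNA sequence is a simple pattern scan. In this sketch, the dictionary keys reuse the system names from the text, and the function names are hypothetical. BsuBI and BsuC are omitted because no sequence is given for them. All four listed motifs are palindromic, so scanning one strand also finds every site on the complementary strand. The code checks this before scanning.

```python
import re

# Recognition sequences as listed in the text.
SITES = {"BsuM": "CTCGAG", "BsuE": "CGCG", "BsuF": "CCGG", "BsuRI": "GGCC"}

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(_COMPLEMENT)[::-1]


def find_sites(seq: str, sites=SITES) -> dict:
    """0-based start positions of every recognition site, overlapping hits included.

    Only palindromic motifs are accepted, because for those a single-strand scan
    is sufficient.
    """
    seq = seq.upper()
    hits = {}
    for name, motif in sites.items():
        if motif != reverse_complement(motif):
            raise ValueError(f"{name} site {motif} is not palindromic")
        # A zero-width lookahead lets overlapping matches (e.g. in CGCGCG) all be reported.
        hits[name] = [m.start() for m in re.finditer(f"(?={motif})", seq)]
    return hits


print(find_sites("AACTCGAGTTCGCGCGAA"))
# {'BsuM': [2], 'BsuE': [10, 12], 'BsuF': [], 'BsuRI': []}
```

The overlapping-match handling matters for short motifs such as CGCG. A plain `re.findall` on the same toy sequence would report only one of the two BsuE sites.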


Phylogeny 

Bacillus subtilis is a typical gram-positive eubacter- 
ium. As such it is significantly more similar to Archaea 
than is E. coli. Many metabolic genes have a distinct 
archaeal flavour, in particular genes involved in the 
synthesis of polyamines, but it is rare to find genes in 
B. subtilis that are similar to eukaryotic genes. This led 
Gupta to propose that ancestral bacteria comprised 
a monoderm organism that diverged into gram- 
positive bacteria and Archaea, and that gram-positive 
bacteria further led to gram-negative bacteria with 
their typical double membrane (diderms). This 
hypothesis stirred a very heated, but interesting, debate 
about the origin of the first cell(s). The bacilli themselves
form a heterogeneous family of bacteria that can be
split into at least five distinct groups. Bacillus subtilis 
is part of group 1 and is strongly linked to B. licheni- 
formis (which is often found on the cuticle of insects), 
and to the group of animal pathogens formed by B. 
thuringiensis, B. cereus, and B. anthracis. In this clas- 
sification B. sphaericus is typical of group 2, B. poly- 
myxa of group 3, and B. stearothermophilus of group 
5. The pathogen Listeria monocytogenes (in between 
groups 2 and 5) is related to B. subtilis, and, indeed, its 
genome has many features in common with that of
B. subtilis. Accordingly, B. subtilis is an
excellent model for these groups of bacteria. 


Industrial Processes 


As a model organism, B. subtilis possesses most of the 
functions that one would expect to find in bacteria. 
It is an organism Generally Recognized As Safe
(GRAS). This explains why it is a source of many 
products synthesized by the agro-food industry. 
Bacillus subtilis has often been thought to be a desir- 
able host for foreign gene expression or fermentation 
and it is commonly used at the industrial level for both 
enzyme production (amylase, proteases, etc.) and food 
supply fermentation (Bacillus natto, a close relative of
B. subtilis, is used in Japan to ferment soybeans, producing
the popular ‘natto’). Riboflavin is produced by
genetically modified B. subtilis using fermentation
techniques. For some time, high levels of heterologous
gene expression in B. subtilis were difficult to achieve.
Knowledge of the genome allowed identification of 
one of the major bottlenecks in this process: Although 
it has a counterpart of the rpsA gene, this organism 
lacks the function of the corresponding ribosomal S1 
protein that permits recognition of the ribosome bind- 
ing site upstream of the translation start codons. In 
general, gram-positive bacteria have transcription and
translation signals that must comply with much more stringent
rules than those of gram-negative bacteria. Traditional
techniques (e.g., random mutagenesis followed
by screening; ad hoc optimization of poorly defined 
culture media) are important and will continue to be 
utilized in the food industry. But modern biotechnology
now includes genomics, which adds the possibility
of targeting genes constructed in vitro to precise
positions, as well as of modifying intermediary metabolism.
As a complement to standard genetic engineering
and transgenic technology, this has opened up a whole 
new range of possibilities in food product develop- 
ment, in particular allowing ‘humanization’ (i.e., adapt- 
ation to the human metabolism and even adaptation to 
sickness or health) of the content of food products. 
These techniques provide an attractive means of pro- 
ducing healthier food ingredients and products that are 
presently not available or are very expensive. Bacillus 
subtilis will remain a tool of choice in this respect. 


Conclusion: Open Questions 


The complete genome sequence of B. subtilis contains 
information that remains underutilized in the current 
prediction methods applied to gene functions, most of 
which are based on similarity searches of individual 
genes. Methods that utilize higher level information 
on molecular pathways to reconstruct a complete 
functional unit from a set of genes have been devel- 
oped. The reconstruction of selected portions of the 
metabolic pathways using the existing biochemical 
knowledge of similar gene products has been undertaken.
But it often remains necessary to validate such
in silico (computer-based) reconstructions by in vivo
and in vitro experiments. The completeness of a
reconstructed pathway (i.e., no missing reaction) is
an indicator of the correctness of the initial functional 
assignment. The core biosynthetic pathways of all 
20 amino acids have been completely reconstructed 
in B. subtilis. However, many satellite or recycling 
pathways have not been identified yet. Finally, there 
remain at least 800 completely unknown genes in the
genome of strain 168. Functional genomics is aimed at 
identifying their role. 
Backcross is the term for a cross between a class of
organisms that is heterozygous for alternative alleles 
at a particular locus under investigation and a second 
class that is homozygous for one of these alleles. 
The term is often used by itself to describe a two- 
generation breeding protocol that begins with a cross 
between two inbred strains to produce F1 hybrid offspring
(see F1 Hybrid). These F1 hybrid offspring are
heterozygous at numerous loci throughout the genome.
The F1 organisms are ‘backcrossed’ to organisms from
one of the original parental strains to obtain a
second-generation population of organisms in which segregation
and assortment of alleles occur independently
during the generation of each individual. Genotypic
and phenotypic analysis of animals in this second- 
generation population can provide data that can be 
used to determine linkage relationships and map posi- 
tions of genes. 
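In a two-locus backcross (for example, AB/ab F1 × ab/ab parent), each offspring's genotype shows directly whether it received a parental or a recombinant gamete from the F1. The recombination fraction can therefore be estimated by simple counting. The sketch below converts that fraction to a map distance using Haldane's mapping function. Haldane's function is one standard choice that assumes no crossover interference; it is not specified in the text. The counts and function names are hypothetical.

```python
import math


def recombination_fraction(parental: int, recombinant: int) -> tuple:
    """Estimate the recombination fraction r and its binomial standard error
    from counts of parental-type and recombinant-type backcross offspring."""
    n = parental + recombinant
    r = recombinant / n
    se = math.sqrt(r * (1 - r) / n)
    return r, se


def haldane_cm(r: float) -> float:
    """Map distance in centimorgans via Haldane's mapping function
    d = -1/2 ln(1 - 2r) Morgans (assumes no crossover interference)."""
    if not 0 <= r < 0.5:
        raise ValueError("r must lie in [0, 0.5)")
    return -50.0 * math.log(1 - 2 * r)


# Hypothetical counts: 80 parental-type and 20 recombinant-type offspring.
r, se = recombination_fraction(parental=80, recombinant=20)
print(round(r, 3), round(se, 3), round(haldane_cm(r), 1))  # 0.2 0.04 25.5
```

For small r, the Haldane distance is close to 100r cM. The correction becomes noticeable as r approaches 0.5, the value expected for unlinked loci.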


See also: F1 Hybrid; Independent Assortment;
Independent Segregation; Test Cross 
The concept of ‘background selection’ was proposed
by Brian and Deborah Charlesworth and their col- 
leagues as a theoretical explanation for the empirical 
observation that population levels of genomic vari- 
ability are reduced in regions of reduced recombina- 
tion. This observation, first uncovered in Drosophila, 
has been described in humans, mice, and plants, and
appears to be general.

Background selection introduces the idea that there 
is a steady introduction of deleterious mutations into 
the genome, and each mutant allele has a reduced life 
span relative to neutral or mutation-free alleles. This 
subset of alleles is removed from the population at a rate
proportional to the selection intensity against them. If 
the mutation rate to deleterious alleles is sufficiently 
high and genetic recombination of a genomic region 
sufficiently reduced, the population genetic conse- 
quences are a reduction in the neutral genetic vari- 
ability of the entire region. 

The process can be explicitly described using the
Wright-Fisher population model as a theoretical 
description of how random sampling processes in 
finite populations affect steady-state levels of neutral 
nucleotide polymorphism. In this mathematical 
model, the population consists of 2N gene copies 
(where N is the number of diploid individuals), and 
each gene copy is assumed to produce an infinite 
number of gametes from which 2N copies are then 
sampled for the next generation. As a result, each
copy is replaced every generation by a number of descendants that is
approximately Poisson-distributed with mean 1. Through time this
process leads to a genealogy of allele lineages with 
expected mathematical features. For example, viewed 
retrospectively as a coalescent process, it can be shown 
that any two copies sampled in a contemporary popu- 
lation will have a common ancestor that is on the 
average 2N generations in the past. Imposing neutral 
mutational events on this genealogy results in a sample 
of alleles with an expected number of segregating
polymorphisms. For a gene undergoing deleterious 
mutation, a subset of these 2N sampled alleles acquires 
a deleterious mutation every generation and will, 
depending on the intensity of selection, replace itself, 
with mean less than 1, until eventual loss from the 
population. The removal of this subset of alleles or 
lineages leads to a reduced number of descendant
lineages relative to the same population without dele- 
terious mutation. From a genealogical perspective, 
this results in relatively fewer generations to the com- 
mon ancestor of the remaining lineages. Since the total 
genealogical history of a sample of alleles now totals 
fewer generations, the sample will also show reduced 
levels of neutral polymorphism. 
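The genealogical argument above can be checked by simulation. In the Wright-Fisher model, two lineages pick parents independently and uniformly each generation, so they coalesce with probability 1/(2N) per generation and the mean waiting time is 2N. To mimic background selection in the crudest way, this toy adds one assumption: selection is strong enough that copies carrying a deleterious mutation leave no long-term descendants, so only a mutation-free fraction f0 of the 2N copies can be ancestors. Recombination is ignored. The function names and the parameter f0 are hypothetical.

```python
import random


def pairwise_coalescence_time(two_n: int, f0: float = 1.0, rng=None) -> int:
    """Generations back until two sampled gene copies share a parent.

    two_n: number of gene copies (2N). f0: fraction of copies free of deleterious
    mutations; in this toy only that class can contribute ancestors.
    """
    rng = rng if rng is not None else random.Random()
    eligible = max(1, round(two_n * f0))
    t = 0
    while True:
        t += 1
        # Each lineage picks its parent independently and uniformly among eligible copies.
        if rng.randrange(eligible) == rng.randrange(eligible):
            return t


def mean_coalescence_time(two_n, f0=1.0, reps=4000, seed=1):
    rng = random.Random(seed)
    return sum(pairwise_coalescence_time(two_n, f0, rng) for _ in range(reps)) / reps


print(mean_coalescence_time(100))          # roughly 2N = 100
print(mean_coalescence_time(100, f0=0.5))  # roughly 50: a shorter genealogy
```

With neutral mutations at rate μ per copy per generation on both branches, the expected number of pairwise differences is about 2μT, or 4Nμ without selection. Halving the genealogy therefore roughly halves neutral diversity.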

Since single-locus deleterious mutation rates are
very small, the number of lineages being removed by 
selection as a proportion of the total is insignificant at 
any single locus. However, as single alleles are trans- 
mitted as linked gene arrays the cumulative impact of 
deleterious mutation at distant but linked genes can 
have significant effects on neutral variation at a single 
locus embedded in a chromosome. When the effect is 
integrated across many linked loci, the predicted 
reduction in variation can be substantial, reducing 
neutral variation by an order of magnitude or more. 
For a given locus, the level of standing variation will 
depend on the potential opportunity for its genea- 
logical history to be distorted by a background of 
linked deleterious selection. Overall, this will depend
on the level of local recombination, on the rate of deleterious
mutation, and on the strength of selection against the mutations
entering the critical region. For two loci embedded in
regions with equivalent gene densities, but different 
regional recombination rates, the locus in the region of 
lower level of recombination will possess lower levels 
of neutral polymorphism due to background selec- 
tion. This model could explain the reduced level of 
polymorphism and codon bias seen for genes in telo- 
meric and centromeric regions of chromosomes. 
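The cumulative effect of many linked loci can be sketched numerically. The sketch assumes one widely used approximation from the background-selection literature: each linked locus i reduces diversity at a focal neutral site by a factor exp(-u_i t_i / (t_i + r_i(1 - t_i))²), and these factors multiply across loci. Here u_i is the per-gamete deleterious mutation rate, t_i = hs is the selection coefficient against heterozygotes, and r_i is the recombination fraction to the focal site. Published formulations differ in constant factors depending on how u is counted, so treat the numbers as illustrative. The region parameters below are hypothetical. With r = 0, a single locus gives exp(-u/t), the expected frequency of mutation-free copies.

```python
import math


def background_selection_b(linked_loci) -> float:
    """Approximate neutral diversity at a focal site relative to no selection (B = pi/pi0).

    linked_loci: iterable of (u, t, r) per selected locus: per-gamete deleterious
    mutation rate, heterozygous selection coefficient t = h*s, and recombination
    fraction to the focal site. Assumed approximation (see lead-in):
    B = exp(-sum(u*t / (t + r*(1 - t))**2)).
    """
    exponent = sum(u * t / (t + r * (1 - t)) ** 2 for u, t, r in linked_loci)
    return math.exp(-exponent)


def region(step):
    """Hypothetical region: 200 selected loci with identical u and t, spaced evenly
    in recombination distance from the focal site; only `step` differs."""
    return [(1e-4, 0.02, i * step) for i in range(1, 201)]


b_low = background_selection_b(region(1e-6))   # little recombination
b_high = background_selection_b(region(1e-3))  # ample recombination
print(round(b_low, 3), round(b_high, 3))
```

Gene density and selection are identical in the two regions, yet B is markedly lower in the low-recombination region. This matches the prediction stated above for two loci in regions that differ only in recombination rate.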

There are, however, alternative models to explain 
this phenomenon. The leading alternative model is the 
concept that the genome constantly undergoes ‘adap- 
tive sweeps’ associated with advantageous mutations 
that enter the population and rise to appreciable fre- 
quencies or fixation. The difficulty in addressing these 
competing views is that both background-selection 
and adaptive-sweep models predict the same qualitative
outcome: low variation associated with
reduced recombination. The test of these alternative
explanations has been to contrast the frequency dis- 
tribution of alleles in regions of normal and reduced 
recombination. Under the background-selection 
model, the frequency spectrum of nucleotide poly- 
morphisms is predicted to not deviate from the neutral 
expectation; however, under the adaptive-sweep scenario,
the genomic region is expected, depending on
the age of the sweep, to show an excess of rare alleles,
or ‘singletons.’ This issue remains unresolved.
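One standard way to quantify an excess of rare alleles is Tajima's D, which compares mean pairwise differences with the number of segregating sites. The statistic is not named in the text and is offered here only as an illustration; negative values indicate more rare variants than neutrality predicts. This minimal sketch assumes aligned sequences of equal length and no outgroup. Singletons are therefore counted as sites where the minor allele occurs exactly once (a folded count). The toy alignments are made up.

```python
import math
from collections import Counter
from itertools import combinations


def summary_stats(seqs):
    """Segregating sites S, singleton sites, and mean pairwise differences pi."""
    n = len(seqs)
    segregating = singletons = 0
    for column in zip(*seqs):
        counts = Counter(column)
        if len(counts) > 1:
            segregating += 1
            if len(counts) == 2 and min(counts.values()) == 1:
                singletons += 1
    diffs = sum(sum(a != b for a, b in zip(x, y)) for x, y in combinations(seqs, 2))
    pi = diffs / (n * (n - 1) / 2)
    return segregating, singletons, pi


def tajimas_d(seqs) -> float:
    """Tajima's D; negative values indicate an excess of rare variants."""
    n = len(seqs)
    s, _, pi = summary_stats(seqs)
    if s == 0:
        return 0.0
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i**2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n * n + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1, e2 = c1 / a1, c2 / (a1**2 + a2)
    return (pi - s / a1) / math.sqrt(e1 * s + e2 * s * (s - 1))


rare = ["1000", "0100", "0010", "0001"]          # every variant is a singleton
intermediate = ["1100", "1100", "0011", "0011"]  # every variant at frequency 1/2
print(summary_stats(rare), round(tajimas_d(rare), 2))
print(summary_stats(intermediate), round(tajimas_d(intermediate), 2))
```

Both toy alignments have the same number of segregating sites. Only the frequency spectrum differs, which is exactly the contrast used to distinguish the two models.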


Further Reading 

Begun DJ and Aquadro CF (1992) Levels of naturally occurring 
DNA polymorphism correlate with recombination rates in 
D. melanogaster. Nature 356: 519-520. 

Charlesworth B, Morgan MT and Charlesworth D (1993) The 
effect of deleterious mutations on neutral molecular varia- 
tion. Genetics 134: 1289-1303. 

Charlesworth D, Charlesworth B and Morgan MT (1995) The 
pattern of neutral molecular variation under the background 
selection model. Genetics 141: 1619-1632. 


See also: Mutation Rate; Neutral Mutation; 
Polymorphism; Polymorphisms, Tree 
Reconstruction; Selective Sweep 


Bacteria 


J Parker 


Copyright © 2001 Academic Press 
doi: 10.1006/rwgn.2001.0102 


Bacteria constitute one of the three basic taxonomic 
domains of cellular organisms, the other two being the
Archaea and the Eukarya. Bacteria are prokaryotes, 
cells which lack a true nucleus, and for a long time the 
terms bacteria and prokaryotes were synonyms. How- 
ever, in the 1970s, through the research of Carl Woese 
and colleagues, it became clear that the prokaryotes 
contain two very distinct groups of organisms. These 
have come to be called Bacteria (formerly eubacteria) 
and Archaea (formerly archaebacteria). Although the 
term bacteria is still sometimes used in its original,
broader sense, this entry will deal only with the
taxonomic domain Bacteria. 


Characteristics of Bacteria 


Prokaryotes (Bacteria and Archaea) have some things 
in common besides the lack of a membrane-bound 
nucleus. Prokaryotes are also missing the other mem- 
branous organelles, such as mitochondria and chloro- 
plasts, found in eukaryotic cells. (Interestingly, it 
seems very clear that mitochondria and chloroplasts 
have evolved from Bacteria.) While many prokaryotes 
are motile by means of a flagellum or flagella, the fla- 
gellum of the prokaryotes is unrelated to that found in 
eukaryotes. 

Prokaryotes also reproduce exclusively asexually, by 
a process called binary fission. Unlike the Eukarya,
prokaryotes typically have a single, often circular,
molecule of double-stranded DNA as their only 
chromosome and these chromosomes seem to have a 
single site for replication initiation. The chromosomes 
of prokaryotes have much less protein associated with 
them than is the case for the structurally more com- 
plex eukaryotic chromosome. 

Although macromolecular synthesis is very similar
in the three taxonomic domains, there are some
important distinguishing characteristics. For example,
the RNA polymerases of the Bacteria are simpler, that
is, they have fewer subunits, than those of the Archaea
and Eukarya. In addition, protein synthesis in the
Bacteria is initiated with formylmethionine, whereas
in both the Archaea and the Eukarya an unmodified
methionine is used. There are numerous other
biochemical and physiological differences between the
organisms in the three domains, including the
chemistry of the cell wall.

Almost all prokaryotes have cell walls, and these
cell walls are quite distinct from those of the
eukaryotic fungi or plants. In addition, the cell walls of the
Bacteria are quite distinct from those of the Archaea.
The cell walls of Bacteria contain peptidoglycan, a
fairly rigid polymer of modified sugars crosslinked
by peptides. One of the important distinguishing
molecules in Bacterial cell walls is the sugar derivative
muramic acid, a part of the peptidoglycan. The
production of peptidoglycan is inhibited by penicillin and,
therefore, this antibiotic is specific for Bacteria. (Many
other antibiotics are also specific for Bacteria.)

Most Bacteria can be divided into two groups
by a staining technique, the gram stain, which depends
on the structure of their cell walls. Gram-positive
Bacteria have cell walls composed primarily of
peptidoglycan, while gram-negative Bacteria have
complex cell walls containing a thin inner layer of
peptidoglycan and a complex outer layer of lipids,
proteins, and lipopolysaccharides. This outer layer is
called the outer membrane. The complex cell wall of
gram-negative Bacteria interferes with the uptake
of some antibiotics, and therefore gram-negative
Bacteria are commonly more resistant to these
antibiotics than are gram-positive Bacteria.

These rigid cell walls give the different species of
Bacteria characteristic shapes. Some are ovoid or
spherical (Figure 1) and are called cocci (singular,
coccus), some are rod shaped (Figure 2), and others
are curved, sometimes into spiral or helical
shapes (Figure 3). These last include the spirochetes,
which are tightly coiled, are motile by means of
axial filaments, and have a complex outer sheath.

Most Bacteria (and Archaea) are small, with
diameters (and lengths) of about 1 to 5 µm. However,
although the spirochetes are thin, they are sometimes
over 200 µm in length. One of the largest Bacteria is
the rod-shaped Epulopiscium fishelsoni, which has a
diameter of 50 µm and is over 500 µm in length. The
cells of the bacterium Thiomargarita namibiensis can
be as large as 750 µm in diameter, about the size of a
printed period (full stop), and are thus visible to the
naked eye. This enormous size results from the
presence of a very large vacuole that contains nitrates.
Many Bacteria contain vacuoles or storage granules,
but most are much smaller.
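The scale of these size differences is easier to grasp as a volume. The short calculation below is a sketch that treats cells as perfect spheres (a simplifying assumption of ours, not of the text) and compares a small coccus 1 µm across with a 750 µm Thiomargarita cell. Because volume grows with the cube of the diameter, the ratio is simply 750³.

```python
import math


def sphere_volume_um3(diameter_um: float) -> float:
    """Volume, in cubic micrometres, of a sphere with the given diameter in µm."""
    radius = diameter_um / 2
    return 4 / 3 * math.pi * radius ** 3


typical_coccus = sphere_volume_um3(1.0)    # small coccus about 1 µm across
thiomargarita = sphere_volume_um3(750.0)   # largest Thiomargarita namibiensis cells

# Volume scales with the cube of the diameter, so this ratio equals 750 ** 3.
ratio = thiomargarita / typical_coccus
print(f"{ratio:.2e}")
```

The printed ratio is about 4.2 × 10^8, so under this spherical approximation a single large Thiomargarita cell has the volume of several hundred million small cocci.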

While most Bacteria are unicellular, some clump
together in regular patterns (see Figure 1) and others
can form complex multicellular groups during their life
cycles. These latter include organisms like Myxococcus
xanthus, in which specialized cell types form during a
complex life cycle.


Diversity of Bacteria 


Bacteria occupy almost every niche where cellular life
is possible and are the most numerous forms of cellular
life on the planet. Indeed, many Bacteria live in
habitats far outside the environmental conditions that
eukaryotic organisms can tolerate (although Archaea
can also thrive in these extreme environments). Bacteria
also exist in very large numbers in most habitats, and
consequently their metabolism has a profound impact on
environmental and even geological processes. Of course,
some Bacteria cause disease. In fact, almost half of
human disease is caused by Bacteria. Even so, pathogenic
species of Bacteria are a minor fraction of known species.
Additionally, the approximately 4500 known species seem
to constitute less than 1% of all the species of Bacteria
thought to exist (see below).

Figure 1 A scanning electron micrograph of Micrococcus
luteus. Each coccoid cell is approximately 1 µm in
diameter. Note the tendency of the cells to occur in
small clusters. This organism is an obligate aerobe, that
is, oxygen is required for its metabolism, and is a member
of the gram-positive division. (Electron micrograph
courtesy of John Bozzola.)


Metabolic Diversity 
The metabolic diversity found among the various species
of Bacteria is enormous, encompassing all known
major modes of nutrition and most known modes of
metabolism. Many Bacteria (like most Eukarya) are
chemoheterotrophs and must consume organic molecules
as a source of both carbon and energy. Many
other Bacteria (like most plants) are photoautotrophs:
they derive energy from light and synthesize
organic compounds from carbon dioxide. Some Bacteria
are chemolithoautotrophs; they also synthesize
organic compounds from carbon dioxide but derive
energy from oxidizing inorganic substances. Still
other Bacteria are photoheterotrophs, which use light
to generate energy but require organic carbon as a
carbon source.
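The four modes described above combine two independent choices: the source of energy and the source of carbon. The mapping below is a minimal sketch that covers only the four combinations named in the text. Any other combination is rejected rather than guessed, and the enum and function names are our own.

```python
from enum import Enum


class Energy(Enum):
    LIGHT = "light"
    ORGANIC = "organic compounds"
    INORGANIC = "inorganic compounds"


class Carbon(Enum):
    CO2 = "carbon dioxide"
    ORGANIC = "organic compounds"


# Only the four combinations named in the text are included.
MODES = {
    (Energy.ORGANIC, Carbon.ORGANIC): "chemoheterotroph",
    (Energy.LIGHT, Carbon.CO2): "photoautotroph",
    (Energy.INORGANIC, Carbon.CO2): "chemolithoautotroph",
    (Energy.LIGHT, Carbon.ORGANIC): "photoheterotroph",
}


def nutritional_mode(energy: Energy, carbon: Carbon) -> str:
    """Name the nutritional mode for a given energy source and carbon source."""
    try:
        return MODES[(energy, carbon)]
    except KeyError:
        raise ValueError(
            f"combination not covered by this sketch: {energy.value} / {carbon.value}"
        ) from None


print(nutritional_mode(Energy.INORGANIC, Carbon.CO2))  # chemolithoautotroph
```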

Figure 2 A scanning electron micrograph of Proteus
vulgaris. The cells are about 2.0 to 2.5 µm in length. This
organism is motile by means of flagella, which are
bunched together in this micrograph. This bacterium is a
member of the Proteobacteria division and is a frequent
cause of urinary tract infections in humans. (Electron
micrograph courtesy of John Bozzola, strain courtesy of
Eric Niederhoffer.)

Figure 3 A scanning electron micrograph of Borrelia burgdorferi. This bacterium is a member of the Spirochete division
and is the cause of Lyme disease. It is one of the relatively few Bacteria known to have a linear chromosome (see Table 1).
The cell shown is 15 µm in length. (Electron micrograph courtesy of Pawel Krasucki and Cathy Santanello.)

To say merely that the heterotrophic Bacteria
require an organic carbon source fails to convey the
enormous variety of carbon sources that different
Bacteria might use. Indeed, most organic compounds
can be metabolized by some species of Bacteria. Some,
such as Bacillus subtilis and Escherichia coli, need only
a simple organic carbon source, such as glucose, from
which they derive their energy and form all the other
carbon compounds found in the cell. Other free-living
Bacteria require a complex mixture of organic
compounds for growth, while others are parasites and
obtain complex substances from their hosts.
However, the metabolic diversity of the Bacteria is
not confined to how they derive energy or what carbon
source they use. Nitrogen is the second most abundant
element in living material, and several of the reactions
that nitrogen undergoes in the environment are carried
out almost exclusively by Bacteria. Nitrogen fixation is
the process by which atmospheric nitrogen gas (N2) is
reduced to ammonia (NH3). About 85% of the nitrogen
fixation that occurs on the planet is biological, and almost
all of this is carried out by Bacteria (although a few
Archaea can also fix nitrogen). Many of the Bacteria
that fix nitrogen, e.g., species of Rhizobium, do so
while participating in a symbiotic relationship with
higher plants. Bacteria are also responsible for the
denitrification that returns nitrogen to the atmosphere,
and they play key roles in the global sulfur and
iron cycles as well as those of several trace metals.
Some Bacteria are obligate aerobes; that is, they
require oxygen for respiration and cannot grow without
it. Other Bacteria use oxygen for respiration if it is present
but can also grow in an oxygen-free environment.
These organisms are called facultative anaerobes.
Finally, there are obligate anaerobes, organisms that
are poisoned by oxygen.
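The three oxygen categories can be read off two growth observations. This sketch assumes simple yes/no growth results with and without oxygen, and it encodes only the three categories in the text. An organism that grows in both conditions without actually using oxygen would need a further test to distinguish it from a facultative anaerobe.

```python
def oxygen_class(grows_with_o2: bool, grows_without_o2: bool) -> str:
    """Classify an organism from two growth tests, with and without oxygen."""
    if grows_with_o2 and grows_without_o2:
        return "facultative anaerobe"
    if grows_with_o2:
        return "obligate aerobe"      # cannot grow without oxygen
    if grows_without_o2:
        return "obligate anaerobe"    # poisoned by oxygen
    return "no growth in either test"


# Micrococcus luteus, an obligate aerobe, grows only when oxygen is present.
print(oxygen_class(True, False))  # obligate aerobe
```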

Bacterial metabolism can have profound effects 
on the environment and has had such effects in the 
past. Phototrophic Bacteria played an important role
in the evolution of an oxygen-rich atmosphere on
Earth. This change in turn must have had a dramatic
impact on the continuing evolution of organisms on
the planet.


Phylogenetic Diversity 

As mentioned above, the known species of Bacteria
constitute only a very minor fraction of the existing
bacterial species. The true extent of the phylogenetic
diversity of the Bacteria became apparent only after
the advent of molecular taxonomy and of techniques
for identifying and sequencing DNA from organisms
without having to grow them in culture. Indeed, one of
the first triumphs of molecular taxonomy was the
recognition that Bacteria constitute one of three
domains, the highest taxonomic grouping.

Traditionally, it was necessary to culture a bacterium
in order to classify it taxonomically. Since
Bacteria do not reproduce sexually, species identity
cannot be defined by reproductive isolation. Instead,
traditional taxonomy depended very much on phenotypic
characteristics such as morphology, gram reaction and
cell wall chemistry, nutritional classification
(photoautotroph, chemoheterotroph, and so forth), ability
to use various carbon, nitrogen, and sulfur sources,
nutritional requirements, lipid chemistry, temperature
and pH requirements or tolerances, pathogenicity, and
habitat. Molecular taxonomy, by contrast, is based on the
relatedness of the sequences of macromolecules,
particularly that of the 16S ribosomal RNA (rRNA)
found in the small subunit of the prokaryotic ribosome.
Two Bacteria whose 16S rRNA sequences differ by more
than 3% are considered to be separate species, and
those that differ by more than 5-7% are considered
to be in separate genera. (The conservation of these
sequences is illustrated by the fact that a 3% difference
in 16S rRNA sequence can be indicative of an
overall genome similarity of only about 70%.)
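The rule-of-thumb thresholds above can be applied to a pair of 16S rRNA sequences once they have been aligned. This sketch makes several assumptions. The sequences must already be aligned to equal length, and gap characters are compared like any other symbol. The genus cut-off is fixed at 5%, an arbitrary choice within the 5-7% range quoted. The function names and the toy sequences are hypothetical.

```python
def percent_difference(seq_a: str, seq_b: str) -> float:
    """Percent of aligned positions at which two equal-length sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    if not seq_a:
        raise ValueError("sequences must not be empty")
    mismatches = sum(a != b for a, b in zip(seq_a.upper(), seq_b.upper()))
    return 100.0 * mismatches / len(seq_a)


def rank_by_16s(diff_percent: float, genus_threshold: float = 5.0) -> str:
    """Apply the text's thresholds: >3% separate species, >5-7% separate genera."""
    if diff_percent > genus_threshold:
        return "probably different genera"
    if diff_percent > 3.0:
        return "probably different species, same genus"
    return "not separated as species by this criterion"


reference = "A" * 100               # toy 100-base "sequence" (hypothetical)
variant = "C" * 4 + "A" * 96        # differs at 4 of 100 aligned positions
print(rank_by_16s(percent_difference(reference, variant)))
```

Real 16S comparisons depend heavily on how the alignment is made and how gaps are treated, so the thresholds are best read as the rough guides the text presents.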

Higher-level taxonomy depends both on these
sequence differences and on the phenotypic
characterization mentioned above. Among the culturable Bacteria
there are at least 14 different major divisions. These
divisions are also sometimes referred to as ‘kingdoms’
or ‘phyla’ (there is not yet a consistent taxonomic
nomenclature for the higher-level groupings of the
Bacteria). The typical difference between 16S
rRNA sequences from different divisions is 20-25%.
The major divisions include Aquificales, Chlamydia,
Cyanobacteria (the chloroplasts of eukaryotes are
related to this division), Cytophagales, Deinococcus/Thermus,
Fusobacteria, Gram-positive Bacteria
(often divided into two divisions), Green nonsulfur
Bacteria, Green sulfur Bacteria, Nitrospira, Planctomyces,
Proteobacteria (also called Purple Bacteria),
Spirochetes, and Thermotogales. (There is not yet
agreement on the names of the divisions.) However, the
ability to identify and characterize organisms using in
situ hybridization to nucleic acid probes has indicated
that there may be as many as 50 such higher-order
groupings. Many of these are currently known only from
sequences found in natural populations; these are
referred to as ‘environmental sequences.’

It is the power of this methodology that has given
rise to the estimate that so few of the existing species of
Bacteria have been cultured. At the same time, it has
become clear that the currently culturable Bacteria do
not make up the bulk of natural populations.
This finding emphasizes that these new divisions do
not simply contain a few minor, exotic Bacteria.


Bacterial Genomes 


As mentioned above, the majority of Bacteria whose
genomes have been characterized are found to contain
a single circular double-stranded DNA molecule
as a chromosome. Certain Bacteria do have linear
chromosomes, e.g., Borrelia burgdorferi and members
of the genus Streptomyces. A good case can also be
made that certain Bacteria, e.g., Rhodobacter sphaeroides,
have more than one chromosome. A chromosome
is defined here as a genetic element that carries
essential genes.

Many, perhaps most, Bacteria also contain plasmids,
which vary widely in size and carry a variety
of genes. Some plasmids carry genes that confer
important and distinguishing phenotypes on their
hosts, such as the Sym plasmids of Rhizobium, which
are essential for establishing the symbiotic relationship
these Bacteria have with plants, and the resistance
plasmids found in many Bacteria. Most plasmids are
also circular double-stranded DNA molecules, but
there are many examples of linear plasmids in Bacteria.

The first cellular chromosome to be sequenced
was that of the bacterium Haemophilus influenzae.
Table 1 lists a few of the Bacteria whose chromosomes
have been completely sequenced. The organisms in
the list are certainly not a representative sample of
Bacteria. They are, of course, cultured organisms, and
many are from the gram-positive Bacteria and
Proteobacteria divisions. The Proteobacteria is the largest and
most diverse Bacterial division. The list also has a heavy
representation of pathogenic Bacteria because of our
interest in human disease. There is also a bias toward
Bacteria with smaller genomes, presumably because
they were somewhat easier to sequence completely.
Much larger bacterial chromosomes exist. For example,
the linear chromosomes of members of the genus
Streptomyces are about 8 megabase pairs and the
circular genome of Myxococcus xanthus is about 9 megabase
pairs. Even so, the largest bacterial genomes are smaller
than that of any known eukaryote. Also unlike those of
eukaryotes, the genomes of Bacteria have a very wide
range of G + C content, from about 25% to 75%.
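G + C content is simply the percentage of G and C among all bases in a sequence. The sketch below computes it and ignores any symbols other than A, C, G and T, such as N for an undetermined base. That handling of other symbols is an assumption of this sketch.

```python
def gc_content(seq: str) -> float:
    """G + C content of a DNA sequence, as a percentage of A, C, G and T bases.

    Other symbols (for example N for an undetermined base) are ignored.
    """
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    if gc + at == 0:
        raise ValueError("sequence contains no A, C, G or T")
    return 100.0 * gc / (gc + at)


print(gc_content("ATATATGC"))  # 25.0
print(gc_content("GCGCGCAT"))  # 75.0
```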

Note that the smallest chromosome in Table 1 has fewer than 500
protein-encoding genes. Some geneticists have speculated
that it might be possible to maintain a cellular
existence with as few as 250 genes. In most cases,
bacterial chromosomes have very little noncoding DNA.
However, this is not true of the intracellular parasite
Rickettsia prowazekii, which has 24% noncoding
DNA. This DNA may represent coding capacity
that has been lost during the organism’s evolution
toward a parasitic existence. (Rickettsia are also
evolutionarily related to the Bacteria that were the
progenitors of mitochondria.) If so, this noncoding DNA
is unrelated in origin to the enormous fraction, greater
than 95%, of noncoding DNA found in the genomes of the
higher eukaryotes. Genomic analysis of the Bacteria is leading
to important information about evolutionary relationships
among organisms and also has practical applications
in the biotechnology industry and in medicine.


Table 1 Some Bacteria with sequenced chromosomes

Organism (a)                Chromosome size (bp)  ORFs (b)   Description (c)
Mycoplasma genitalium                    580 070       470   Gram-positive Bacteria, lacks cell wall, parasitic, smallest known cellular genome
Mycoplasma pneumoniae                    816 394       677   Gram-positive Bacteria, lacks cell wall, causes pneumonia
Borrelia burgdorferi                     910 725       853   Spirochete, has linear chromosome (d), causes Lyme disease
Chlamydia trachomatis                  1 042 519       894   Chlamydia, obligate intracellular parasite, common human pathogen
Rickettsia prowazekii                  1 111 523       834   Proteobacteria, obligate intracellular parasite, causes epidemic typhus
Treponema pallidum                     1 138 006      1041   Spirochete, human parasite that cannot be cultured continuously in vitro, causes syphilis
Aquifex aeolicus                       1 551 335      1512   Aquificales, a hyperthermophilic chemolithoautotroph, growth maximum near 95 °C
Helicobacter pylori                    1 667 867      1590   Proteobacteria, causes peptic ulcers, the most common chronic infection of humans
Haemophilus influenzae                 1 830 137      1743   Proteobacteria, causes infectious meningitis, naturally transformable, first cellular genome sequenced
Synechocystis sp.                      3 573 470      3168   Cyanobacteria, a photoautotroph
Bacillus subtilis                      4 214 810      4100   Gram-positive Bacteria, genetic model
Mycobacterium tuberculosis             4 411 529      3924   Gram-positive Bacteria, causes tuberculosis, claims 3 000 000 lives per year, more than any other pathogen
Escherichia coli                       4 639 221      4288   Proteobacteria, gram-negative genetic model

(a) All Bacteria listed have a single chromosome. However, many contain one or more plasmids. For example, the strain of Borrelia burgdorferi whose chromosome was
sequenced contains 17 different plasmids, which together total 533 kilobase pairs. (b) The number of open reading frames (ORFs) is an approximation of the total number of
different proteins that an organism encodes. (c) The first term in each description is the name of the bacterial ‘division’ to which the organism belongs. (d) All other
chromosomes in this list are circular.
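Table 1 also allows a quick arithmetic check on gene density. The sketch below divides each ORF count by the chromosome size in kilobase pairs, using the values transcribed from the table. It treats the ORF counts as exact, although footnote b notes that they are approximations.

```python
# (organism, chromosome size in base pairs, number of ORFs), transcribed from Table 1
TABLE_1 = [
    ("Mycoplasma genitalium", 580_070, 470),
    ("Mycoplasma pneumoniae", 816_394, 677),
    ("Borrelia burgdorferi", 910_725, 853),
    ("Chlamydia trachomatis", 1_042_519, 894),
    ("Rickettsia prowazekii", 1_111_523, 834),
    ("Treponema pallidum", 1_138_006, 1041),
    ("Aquifex aeolicus", 1_551_335, 1512),
    ("Helicobacter pylori", 1_667_867, 1590),
    ("Haemophilus influenzae", 1_830_137, 1743),
    ("Synechocystis sp.", 3_573_470, 3168),
    ("Bacillus subtilis", 4_214_810, 4100),
    ("Mycobacterium tuberculosis", 4_411_529, 3924),
    ("Escherichia coli", 4_639_221, 4288),
]


def orfs_per_kb(size_bp: int, orfs: int) -> float:
    """Number of ORFs per 1000 base pairs of chromosome."""
    return orfs / (size_bp / 1000)


densities = {name: orfs_per_kb(size, orfs) for name, size, orfs in TABLE_1}
for name, density in sorted(densities.items(), key=lambda item: item[1]):
    print(f"{name:<28}{density:.2f} ORFs per kb")
```

Rickettsia prowazekii comes out lowest, at about 0.75 ORFs per kb. This fits the text's remark about its unusually large fraction of noncoding DNA. Every other entry falls between roughly 0.8 and 1 ORF per kb.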
Gene Transfer in Bacteria


As was mentioned above, Bacteria do not reproduce
sexually, and there is no mechanism of recombination
involving entire chromosomes such as occurs in the
Eukarya. However, there are mechanisms of gene
transfer, known to occur in nature, by which
fragments of bacterial chromosomes can be
exchanged. Some of these involve plasmids and
transposable elements, and the transfer mechanisms allow
these genetic elements to transfer themselves in their
entirety. Geneticists have taken advantage of these
natural processes for genetic analysis and gene mapping
in Bacteria, but for many uses such techniques are
being supplanted by recombinant DNA technology.


Transformation 

Genetic transformation in Bacteria is the process of
taking up free DNA from the environment and
incorporating it into a recipient cell. The ability of an
organism to take up DNA is called competence. The
demonstration by Oswald Avery and colleagues that
DNA was the ‘transforming agent’ in Streptococcus
pneumoniae was one of the outstanding discoveries
in genetics of the last century. Several different
mechanisms of natural transformation are known, and
many organisms, including Bacillus subtilis and
Haemophilus influenzae, are naturally transformable
(see Table 1). Methods of inducing artificial
competence in the laboratory, such as electroporation,
have also been developed to facilitate molecular
cloning in other organisms.


Conjugation 

Bacterial conjugation is a plasmid-encoded mechanism
that allows certain plasmids to transfer themselves
from cell to cell, sometimes across wide
phylogenetic distances. Conjugation is one of the
prominent mechanisms for the spread of antibiotic
resistance in pathogenic Bacteria and is undoubtedly
involved in other types of horizontal gene transfer.
Conjugation was, however, discovered (in Escherichia
coli) because under certain conditions the
process can also mobilize the transfer of part of the
host chromosome.


Transduction 

Transduction in Bacteria involves host genes being
transferred by a virus. Broadly speaking, there are
two types of transduction: generalized transduction,
in which random parts of the host chromosome are
accidentally packaged in the virion instead of viral
DNA, and specialized transduction, in which a specific
portion of the host chromosome becomes incorporated
into the viral genome and is packaged along
with it.
Introduction and Nomenclature


What is a Gene? 

Genes are the segments of genetic material (DNA in
the case of bacterial chromosomes) that specify
individual, heritable, mutable functions of an organism.
For most organisms with DNA genomes, this definition
is usually narrowed a bit by specifying that genes
are transcribed into RNA. This excludes from the
definition mutable elements such as promoters,
operators, and other regulatory sites, which act as signals
but are not themselves functions. A complete
understanding of a particular gene requires some specification
of all those sites surrounding the gene that affect
its expression (which say ‘start,’ ‘stop,’ ‘more,’ ‘less,’
and so forth).

Historically, the identity of a ‘function’ in this
formulation has gone through several changes. Mendel
originally described ‘heritable characters’ as the unit
of genetics, without a molecular description of what
these were. (The term ‘character’ is now often used by
evolutionary geneticists to refer to an informative
mutable site.) At the beginning of the molecular
revolution, the development of biochemical genetics led to
the ‘one gene, one enzyme’ hypothesis (Beadle and
Tatum, 1941). On the discovery of multisubunit enzymes,
this became ‘one gene, one polypeptide chain.’ Still
later, with the discovery of RNA molecules with mutable
function independent of any translation product, it
became essentially ‘one gene, one diffusible gene
product.’ Some authors still restrict the definition of
a gene to sequences coding for proteins, but it is well
accepted to refer to RNA-coding genes, for example
those encoding tRNAs (e.g., Schedl and Primakoff,
1973). As we shall see, these matters can become
highly complex, with genes embedded in genes,
genes on both strands, and interrupted genes. The
basic idea remains that there is a connection between
the term ‘gene’ and an individual unit of function.
Protein-coding genes are further delimited by
comparing the nucleotide sequence with the sequence of
the encoded protein.


Genes and Phenotypes 

Genes are defined by finding organisms with variant
functions, manifested as differing observable
behaviors. One behavior is defined as the ‘wild-type’; a
variant behavior (such as failing to grow on lactose
as sole source of carbon) is then ‘mutant’ behavior.
The observed behavior itself is the ‘phenotype’; the
underlying gene state is the ‘genotype.’ A strain is
identified as a mutant strain only if the altered
behavior is heritable, that is, stably transmitted to offspring.
This is distinct from an adapted condition, such as
may occur upon transfer to a new environment. For
example, the absence of the enzyme β-galactosidase,
used to break down lactose, may indicate either an
environmental condition or a genetic one, depending
on the strain. A culture of wild-type cells grown in
the absence of lactose (with glucose or glycerol as
the sole source of carbon) will contain very little of
this enzyme, while the same culture, shifted to a
medium with lactose, will contain a large amount. In
contrast, a mutant unable to make β-galactosidase
at all will contain no β-galactosidase under either
condition.
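The distinction between an adapted state and a mutant genotype can be made explicit with a toy model. In the sketch below, the qualitative labels ("very low", "high", "none") are our own stand-ins for the amounts described in the text. In the absence of lactose, neither culture has appreciable enzyme, so a single measurement cannot tell them apart. Only the shift to lactose separates the environmental effect from the genetic one.

```python
def beta_galactosidase_level(can_make_enzyme: bool, lactose_present: bool) -> str:
    """Qualitative β-galactosidase level in a culture (toy model of the text's example)."""
    if not can_make_enzyme:
        return "none"            # mutant genotype: absent under every condition
    # Wild type: the amount depends on the environment (an adapted state).
    return "high" if lactose_present else "very low"


for genotype in (True, False):
    levels = [beta_galactosidase_level(genotype, lactose) for lactose in (False, True)]
    label = "wild type" if genotype else "mutant"
    print(f"{label:<10} without lactose: {levels[0]:<9} with lactose: {levels[1]}")
```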


Gene Names 

In bacteria, genes are named, for the most part, by
rules originally laid down in the 1960s (Demerec et al.,
1968). Each gene gets a three-letter lower-case italic
mnemonic that has something to do with an observable
behavior, or phenotype: for example, lac has to do
with degradation of lactose. If more than one gene is
found to determine a general function, each is
distinguished by an additional upper-case italic letter
(lacZ, lacY, lacA, lacI). Classically, each gene was
defined by mutations leading to altered function. With
the advent of genomic sequencing, large numbers of
‘genes’ (open reading frames or ORFs, see below) exist
that have no known observable property. Pending
characterization of mutations and phenotypes, these
have usually been given accession numbers, often two
capital letters that are an acronym for the genus and
species, followed by a number. For the best-studied
bacteria, Escherichia coli and Bacillus subtilis, ORFs
of unknown function have been given names that
conform to the gene formula but refer to location, not
function. These names are of the form yzzZ, where z is
a letter between a and j and Z can be any upper-case
letter.
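The naming rules above are regular enough to check mechanically. The sketch below encodes them as regular expressions, exactly as stated in this entry. A function-based name is three lower-case letters plus an optional upper-case letter. A location-based name is y, followed by two letters from a to j and then an upper-case letter. Real gene catalogues contain exceptions and older names that this simplified pattern does not capture, so treat the code as an illustration rather than a validator.

```python
import re

# Function-based name: three lower-case letters, optionally one upper-case letter.
GENE_NAME = re.compile(r"[a-z]{3}[A-Z]?")
# Location-based ORF name: 'y', two letters from a to j, then an upper-case letter.
Y_NAME = re.compile(r"y[a-j]{2}[A-Z]")


def classify_gene_name(name: str) -> str:
    """Classify a bacterial gene name according to the simplified rules above."""
    if Y_NAME.fullmatch(name):          # checked first: y-names also fit GENE_NAME
        return "location-based ORF name"
    if GENE_NAME.fullmatch(name):
        return "function-based gene name"
    return "does not follow the pattern"


for name in ("lacZ", "lac", "yaaA", "LacZ"):
    print(name, "->", classify_gene_name(name))
```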


Mutations 

Genes are operationally defined by the heritable
effects on phenotype observed when mutations occur.
Mutations, in turn, are changes in the nucleotide
sequence: substitution of one nucleotide for another,
deletion or insertion of one or more nucleotides, and
inversion or translocation of a sequence segment are
all mutations. Demonstrating that a particular function
is encoded by a particular gene requires showing
that a strain lacking the function in question
carries an alteration in the gene to which it has been
assigned. Nowadays this is usually accomplished by
sequencing the gene. Some sequence changes (called
‘silent mutations’) do not give rise to an observable
change in phenotype. This is due largely to the
degeneracy of the genetic code (see below): many
third-position nucleotide substitutions do not change the
sequence of the polypeptide encoded. Sequence
changes that lie outside of genes and that do not affect
regulatory sites are also silent. In contrast to the
genomes of many eukaryotes, almost all DNA in bacterial
genomes is used to code for gene products or to
regulate expression, so most sequence changes do
disrupt function.
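The role of code degeneracy in silent mutations can be illustrated by translating a codon before and after a single-base change. This sketch writes codons in DNA letters (T rather than U) and uses the standard genetic code as a 64-letter string in TCAG order. The "nonsense" label, for a change that creates a stop codon, is standard terminology that the text above does not introduce.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code in TCAG order ('*' = stop), written with DNA letters.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def substitution_effect(codon: str, position: int, new_base: str) -> str:
    """Classify a single-base substitution at position 0, 1 or 2 of a codon."""
    codon, new_base = codon.upper(), new_base.upper()
    mutant = codon[:position] + new_base + codon[position + 1:]
    before, after = CODE[codon], CODE[mutant]
    if before == after:
        return "silent"
    if after == "*":
        return "nonsense"
    return "missense"


print(substitution_effect("GCA", 2, "G"))  # silent: GCA and GCG both encode alanine
print(substitution_effect("GCA", 0, "A"))  # missense: ACA encodes threonine
```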


Gene Structure 


Protein-Coding Genes 

A protein-coding gene is one that is transcribed into
messenger RNA and then translated into protein
(Figure 1). The DNA signals that determine where
the transcript starts (promoter) and stops (terminator)
are essential for gene activity but are not part of the
gene. The transcript typically contains three parts. A
5′ untranslated region (UTR) contains a ribosome-binding
site (RBS; also known as a Shine-Dalgarno
sequence). In bacteria, more than one gene may be
expressed from a single transcript (see below). The
coding sequence (CDS) is the gene proper and
specifies the polypeptide sequence of the protein. It
begins with a translation start signal (codon) that is
always read by formyl-methionyl tRNA and ends
with a translation stop codon that prompts the
ribosome to dissociate from the mRNA. A 3′ UTR is also
present. The 5′ and 3′ UTRs may contain regulatory
sites that affect the level of expression. The start codon
is most frequently AUG; GUG is sometimes used, and
CUG and UUG are rarely used. Stop codons are
UAG, UAA, and UGA in most organisms, but some
mycoplasmas and nonplant mitochondria use UGA to
code for tryptophan.

Figure 1 … protein-coding genes: the lac operon. Three protein-coding
genes in one operon are shown (not to scale).
The genes correspond to protein coding sequences as
described in the text. Translation start (AUG and UUG)
and stop (UAA) codons are included in the definition of
the gene. A transcriptional promoter and terminator
present in the DNA sequence are essential for proper
expression but are not part of the gene. Other
regulatory signals may be present that are active in the
DNA or mRNA. Operators (Oc not shown), attenuators,
or enhancers affect the level of transcription;
ribosome-binding sites, message-stabilizing elements, or
message-processing sites affect the level of translation.
These heritable signals affect gene activity but are not
part of the gene itself.
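Because a coding sequence runs from a start codon to a stop codon in the same frame, candidate coding sequences can be located by a simple scan. The sketch below is a deliberately minimal open-reading-frame finder. It assumes DNA letters, scans only the forward strand, accepts ATG and GTG as starts, and takes the first start after the previous stop in each frame. Real start selection also depends on the ribosome-binding site, which this scan ignores.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq: str, starts: tuple[str, ...] = ("ATG", "GTG")) -> list[tuple[int, int, int]]:
    """Return (frame, start, end) for each open reading frame on the forward strand.

    `start` is the 0-based index of the first base of the start codon and `end`
    is the index just past the stop codon, so seq[start:end] spans the whole CDS.
    Within a frame, the first start codon after the previous stop opens an ORF.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        orf_start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if orf_start is None:
                if codon in starts:
                    orf_start = i
            elif codon in STOP_CODONS:
                orfs.append((frame, orf_start, i + 3))
                orf_start = None
    return orfs


example = "CCATGAAATTTTAAGG"   # toy sequence: ATG AAA TTT TAA in frame 2
print(find_orfs(example))        # [(2, 2, 14)]
```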


The Genetic Code 

Translation of the coding sequence into a polypeptide
chain follows coding rules that transform a nucleotide
sequence with a four-letter alphabet into a polypeptide
sequence with a 20-letter alphabet. A rarely used
twenty-first amino acid, selenocysteine, can also be
incorporated by a special mechanism. Three sequential
nucleotide letters form a ‘word’ known as a codon,
specifying one amino acid. Since there are 64 possible codons
(four possible letters in each of three positions)
but only 21 possible translations (20 amino
acids plus ‘stop’), up to six different codons specify
the same amino acid. This is called degeneracy. The
third position in the codon is often not informative;
for example, GCA, GCC, GCG, and GCU are all
translated as alanine, and the third letter carries no
information not present in the first two. Three different
reading frames are possible for any sequence on a given
strand: for example, ATG CCA or xAT GCC Axx or xxA TGC
CAx. The location of the ribosome-binding site and
start codon determines which frame is actually
translated. The start codon is not always the first available
potential start in the transcript. Conversely, there may
be no ribosome-binding site: in a lambda lysogen, the
CI repressor is translated starting at the first base of
the message.
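The counting argument above can be checked directly by building the standard codon table. In this sketch, codons are written in DNA letters, and the 64-letter string is the standard table in TCAG order, with '*' marking stop. reading_frames is a helper of our own that splits a sequence into complete codons for each of the three forward frames.

```python
from collections import Counter
from itertools import product

BASES = "TCAG"
# Standard genetic code in TCAG order: TTT, TTC, TTA, TTG, TCT, ... ('*' = stop).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

# Degeneracy: number of codons that specify each amino acid (or stop).
degeneracy = Counter(CODE.values())


def reading_frames(seq: str) -> list[list[str]]:
    """Complete codons in each of the three forward reading frames of seq."""
    return [[seq[i:i + 3] for i in range(frame, len(seq) - 2, 3)] for frame in range(3)]


print(len(CODE), len(degeneracy))                        # 64 21
print(max(degeneracy.values()))                          # 6
print(sorted(c for c, aa in CODE.items() if aa == "A"))  # ['GCA', 'GCC', 'GCG', 'GCT']
print(reading_frames("ATGCCA"))                          # [['ATG', 'CCA'], ['TGC'], ['GCC']]
```

The six-codon maximum is reached by leucine, serine and arginine. Stop has three codons.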


RNA Genes 

Some genes code for RNA transcripts that function
in themselves, without translation. Pre-eminent
among these are those involved in the translation
machinery: ribosomal RNA and transfer RNA. The
gene is then defined as the transcribed segment in
toto. Some processing may occur to yield the final
product, so that the active species may be smaller
than the initial transcript. Other RNA gene
products play enzymatic roles: RNase P, for example,
comprises both a protein moiety and an RNA
moiety. Yet others act by altering the metabolism of
particular transcripts: for example, RNA I of pBR322
regulates plasmid copy number via a pairing
interaction with the RNA (RNA II) that serves as a
replication primer.


Cistrons and Complementation Groups 

When multiple gene products are required for the
same function, mutations affecting these functions
can be assigned to different genes using a
complementation test. The idea behind the test is that in a
heterozygote, with one mutant and one wild-type
copy of a gene region, the wild-type copy can provide
functions missing in the other copy: it complements
the defect. In a test known as a cis-trans test, two
mutations are analyzed in two configurations:
in one, both mutations are on the same DNA molecule
in the heterozygote (in cis); in the other they are on
different molecules (in trans). If both mutations are
in the same gene, the wild-type copy of the sequence in
the heterozygote will provide the needed function
when the mutations are in cis; but when the mutations
are in trans, both copies of the gene are defective, and
complementation does not occur. The mutations are
then assigned to the same gene, or complementation
group. If the two mutations affect different genes,
one copy of the DNA region will provide function
for both genes in the cis configuration, while in the
trans configuration one function will be provided by
one copy and the other function by the other copy.
The mutations are then assigned to different genes.
The term ‘cistron’ is sometimes used to refer to a gene
defined in this way. This procedure requires no
knowledge of the DNA sequence in either the wild-type
or mutant state, nor even any knowledge of the
biochemical function in question. However, it does
require a method for introducing a second, wild-type
copy of the candidate genes. In bacteria, this is not
straightforward, since the genome is haploid. Such
experiments are usually conducted by introducing a
plasmid carrying the gene region in question.
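The logic of the cis-trans test can be simulated to show why only the trans arrangement is diagnostic. This sketch assumes recessive loss-of-function mutations and ignores complications such as polar effects within operons. Its input is the gene each mutation actually lies in, which is exactly what a real experiment tries to discover. The gene names geneA and geneB are hypothetical.

```python
def has_function(copies: list[set[str]], required: set[str]) -> bool:
    """True if every required gene is wild type on at least one DNA copy.

    Each copy is modelled as the set of genes that carry a mutation on it.
    """
    return all(any(gene not in mutated for mutated in copies) for gene in required)


def cis_trans_outcome(gene_of_m1: str, gene_of_m2: str) -> tuple[bool, bool]:
    """Predict (cis gives function, trans gives function) for two mutations."""
    required = {gene_of_m1, gene_of_m2}
    cis = [{gene_of_m1, gene_of_m2}, set()]   # both mutations on one copy
    trans = [{gene_of_m1}, {gene_of_m2}]      # one mutation on each copy
    return has_function(cis, required), has_function(trans, required)


print(cis_trans_outcome("geneA", "geneA"))  # (True, False): same gene
print(cis_trans_outcome("geneA", "geneB"))  # (True, True): different genes
```

The cis configuration always provides function because one copy is fully wild type, so it serves as a control. The trans result is what assigns the two mutations to the same cistron or to different ones.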


Gene Organization 


Operons 

In bacteria, in contrast to higher organisms, two or 
more genes may be expressed from one mRNA. This 
affords an opportunity for co-regulation of the ex- 
pression of multiple genes, by regulating the level of 
transcription. Such genes are said to form operons. 
Frequently, as in the example of lac shown in Figure 1,
all the genes in the operon affect related functions: the 
product of lacZ is an enzyme that degrades lactose, 
while the product of lacY is a membrane-bound pro- 
tein that specifically transports lactose into the cell. 
Disruption of the function of either of these proteins 
makes the cell unable to grow on media with lactose as 
the sole source of carbon and energy (phenotypically 
Lac⁻).


Overlapping, Frameshifted, and Nested 
Genes 

In some cases, adjacent genes overlap and are trans- 
lated in different frames from the same sequence. 
Usually the overlap is small. A significant minority of genes in operons overlap by one or four nucleotides, for example TAATG, where TAA is the stop codon for the upstream gene and ATG is the start codon of the downstream gene; or GTGA, where GTG is the start of the downstream gene and TGA the stop for the upstream gene.
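The overlap lengths quoted above follow from simple coordinate arithmetic: the overlap runs from the first base of the downstream start codon to the last base of the upstream stop codon. A minimal sketch, assuming 0-based positions supplied by the caller; the function name is hypothetical, and TTG is included as a start codon because bacteria also use it, although less often than ATG and GTG.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
START_CODONS = {"ATG", "GTG", "TTG"}


def overlap_length(seq: str, stop_pos: int, start_pos: int) -> int:
    """Nucleotides shared by an upstream stop codon (at stop_pos) and a downstream
    start codon (at start_pos); 0 if the two genes do not overlap."""
    if seq[stop_pos:stop_pos + 3] not in STOP_CODONS:
        raise ValueError("no stop codon at stop_pos")
    if seq[start_pos:start_pos + 3] not in START_CODONS:
        raise ValueError("no start codon at start_pos")
    upstream_end = stop_pos + 2  # last base of the upstream gene
    return max(0, upstream_end - start_pos + 1)


print(overlap_length("TAATG", stop_pos=0, start_pos=2))   # 1
print(overlap_length("GTGA", stop_pos=1, start_pos=0))    # 4
print(overlap_length("TAAATG", stop_pos=0, start_pos=3))  # 0 (adjacent, no overlap)
```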

Numerous examples of ribosomal frameshifting have been described in viruses and insertion sequences, as well as in at least two conventional bacterial genes. Translating ribosomes ‘slip’ on the message at a defined location (called a ‘slippery sequence’) and continue translation in a frame different from the original one. This occurs with dnaX of E. coli: ribosomes that shift frame encounter a stop codon a short distance downstream and release a truncated product, replication factor gamma, whereas the subset of ribosomes that fail to frameshift continue to the normal stop codon and produce the full-length replication factor tau. Thus one gene yields two gene products.
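How a single slippery site gives two products of different lengths can be illustrated with a toy message. This is a hypothetical sketch, not the dnaX sequence: the ribosome reads triplets (written in DNA letters, as elsewhere in this entry), optionally steps back one nucleotide (a -1 frameshift) on reaching a chosen position, and the function counts the codons read before the first stop codon.

```python
from typing import Optional

STOP_CODONS = {"TAA", "TAG", "TGA"}


def codons_before_stop(seq: str, shift_at: Optional[int] = None) -> int:
    """Number of sense codons read before the first stop codon.

    If shift_at is given, the ribosome slips back one nucleotide (-1 frameshift)
    when it reaches that position, and reading continues in the new frame.
    """
    pos, count, shifted = 0, 0, False
    while pos + 3 <= len(seq):
        if seq[pos:pos + 3] in STOP_CODONS:
            return count
        count += 1
        pos += 3
        if shift_at is not None and not shifted and pos == shift_at:
            pos -= 1
            shifted = True
    return count  # ran off the end without reaching a stop codon


# Hypothetical message, read in the original frame as ATG AAA AAT GAC CTG TAA.
message = "ATGAAAAATGACCTGTAA"
print(codons_before_stop(message))              # 5: no slip, longer product
print(codons_before_stop(message, shift_at=6))  # 3: slip, then TGA soon after
```

Ribosomes that do not slip make the longer product; those that slip meet a stop codon almost immediately in the new frame.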

In rare instances, two genes may overlap exten- 
sively: the IS5 insertion sequence expresses one pro- 
tein from one strand and two others from the other 
strand. In this instance, the same sequence segment 
codes for two genes. This sort of overlap is more 
frequent in mobile elements and bacteriophage, which have presumably experienced evolutionary pressure to keep genomes small.

Another strategy used in several instances is to 
initiate translation at two different locations in the 
same frame, resulting in a full-length protein (from 
the first initiation site) and an N-terminal truncation. 
In the best-known examples, such as the Tn5 trans- 
posase and Inh protein, the truncated protein acts to 
inhibit or otherwise regulate the activity of the full- 
length protein. Because the functions are significantly 
different, this can be considered two genes coded by 
the same sequence. 


Intervening Sequences 

Intervening sequences are segments of DNA sequence within a protein-coding sequence that do not contribute to the final protein product. These come in two
kinds: introns and inteins. Introns remove themselves 
at the RNA level; inteins remove themselves at the 
protein level. 

Introns are abundant in eukaryotes, which elab- 
orate a complex set of ribonucleoproteins to remove 
them. A smaller number of introns (Group I introns) 
are able to remove themselves via appropriate folding 
and catalytic action of the RNA transcript itself. 
Although introns are rare in Bacteria, self-splicing introns do occur, in some tRNA genes and, more commonly, in protein-coding genes of bacteriophages. Frequently the segment of
sequence that is removed encodes a second polypep- 
tide, distinct from the product of the original gene. 
The second protein in all known examples displays similarity to enzymes known as homing endonucleases; in the cases examined, these promote spread of the intron. The second protein is not
required for RNA splicing by group I introns. 

Inteins are found in all three domains of life. The intervening DNA sequence codes for an in-frame insertion of polypeptide sequence (an intein) that has the ability to splice itself out of the host protein. The splicing event rejoins the external sequences (called exteins) and results in two protein products from one translation product. Most inteins also display similarity to homing endonucleases, but the endonuclease activity is not required for protein splicing to occur. Inteins are found at highly conserved locations in critical proteins, such as DNA polymerase or RecA.


Fused Genes and Domain Structure 

Functions coded for by two genes in one organism 
may be coded for by one gene (one translation 
product) in another. The classic example of this is the 
product of trpC in E. coli; this polypeptide catalyzes 
two sequential steps in the biosynthesis of tryptophan, 
with enzymatic activities designated phosphoribosylanthranilate isomerase and indoleglycerol phosphate
synthase. However, in Pseudomonas putida and other 
nonenteric bacteria, different polypeptides carry out 
these two steps. Mutations in trpC can inactivate the 
two activities separately. These mutations cluster in 
separate parts of the polypeptide. Limited proteolysis 
of the TrpC polypeptide enables separation and isola- 
tion of two segments, each containing one of the 
activities. These two segments are called domains. 

Domain structure of this kind, in which one well- 
folded portion of a polypeptide has one activity and 
another portion exhibits a second activity, turns out to 
be fairly common. At least three other fusion genes 
occur in various bacteria, just in the aromatic amino 
acid biosynthesis pathway. Presumably, the two-gene 
arrangement is the ancestral one. This notion is con- 
sistent with the known distribution of one- or two- 
gene arrangements in gram-negative bacteria. 


Genes in Databases 


Large-scale sequencing projects have produced mas- 
sive quantities of sequence information on large 
numbers of bacterial species. This section sketches 
very briefly the basis of gene definition for these raw 
sequences, and some problems of interpretation that 
arise in the absence of biochemical or mutational data. 

Genetic analysis of model systems (especially E. coli 
and B. subtilis) over the last 50 years has provided a 
large reservoir of knowledge of genes and functions 
essential for life. This knowledge can be applied to 
new sequences, by computerized analysis based on 
known gene structure and by comparison with known 
sequences. Accordingly, annotations to database en- 
tries can provide a guide to what genes are present, 
where they are, and what they might do. 


ORFs 

The starting point for bioinformatic annotation of a 
sequence is conceptual translation of possible protein- 
coding sequences (open reading frames, or ORFs), 
defined as the region between stop codons in any 
frame (see above). Translation normally follows the standard genetic code, although variant codes are known; in Mycoplasma, for example, the TGA (UGA) codon specifies tryptophan rather than acting as a stop codon. Limited computational resources usually constrain analysis to deduced polypeptides of 100 amino acids or more (sometimes a smaller number is used).
Some genuine genes are shorter: for example, the tran- 
scription regulator Cro of phage lambda is 66 amino 
acids. Nevertheless, most known proteins are longer 
than this. This set of predicted proteins is further 
analyzed for compatibility with properties of known 
proteins (amino acid composition and codon usage 
especially). In most cases the locations of promoters, 
terminators or regulatory sites are not predicted, since 
these are highly variable in sequence and are not easily
handled by the bioinformatic methods available at 
present. 
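The first step, conceptual translation of stop-to-stop open reading frames in all six frames, can be sketched in a few lines. This is a minimal illustration under stated assumptions: an ORF is taken strictly as the codons between two in-frame stop codons (open-ended fragments at the sequence ends are ignored), only the standard stop codons are used, and the 100-codon cutoff is a parameter. find_orfs is a hypothetical name; real annotation pipelines do considerably more.

```python
from typing import List, Tuple

STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def find_orfs(seq: str, min_codons: int = 100) -> List[Tuple[str, int, int, int]]:
    """Stop-to-stop ORFs of at least min_codons codons in all six frames.

    Returns (strand, frame, start, end): 0-based, half-open coordinates on the
    strand that was scanned, excluding the flanking stop codons.
    """
    seq = seq.upper()
    orfs = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None  # no upstream stop codon seen yet in this frame
            for pos in range(frame, len(s) - 2, 3):
                if s[pos:pos + 3] in STOP_CODONS:
                    if start is not None and (pos - start) // 3 >= min_codons:
                        orfs.append((strand, frame, start, pos))
                    start = pos + 3
    return orfs


# A 120-codon ORF flanked by TAA stop codons.
test_seq = "TAA" + "GCT" * 120 + "TAA"
print(find_orfs(test_seq))  # [('+', 0, 3, 363)]
```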


Codon Usage 

Because of the redundancy of the genetic code, several 
different codons can in principle code for one amino 
acid. However, in practice, a given organism will pre- 
ferentially employ a smaller number of codons for 
particular amino acids, especially in highly expressed 
genes. For example, E. coli very rarely uses two of the 
six codons that specify arginine (AGA and AGG). 
This nonrandom distribution correlates with the 
abundance of the corresponding tRNA, in cases 
studied. 

In the well-studied bacteria E. coli and B. subtilis, 
genes expressed at a high level (like ribosomal pro- 
teins) show a different distribution of codon choice 
than the majority. Genes judged likely to have been 
horizontally transferred show a third distinct codon 
usage. Open reading frames with an abundance of rare 
codons are accordingly downgraded in likelihood of 
actual expression, although another possibility is that 
the ORFs were recently acquired from an evolution- 
arily distant source. If such rare codons are clustered 
in one part of a sequence, it can help in choosing 
among possible start codons; or it may suggest the 
presence of sequence errors, particularly if there is 
reason to suppose that the protein in question would 
be expressed at a high level. 
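Codon-usage comparisons of this kind start from simple counts. The sketch below is a minimal illustration rather than a published codon-usage index: it splits an in-frame coding sequence into codons, reports how the six arginine codons are used, and lists the positions of codons designated rare, which is one way to look for the clusters mentioned above. Treating AGA and AGG as the rare set follows the E. coli example; the function names are hypothetical.

```python
from collections import Counter
from typing import Dict, List

ARG_CODONS = ("CGT", "CGC", "CGA", "CGG", "AGA", "AGG")
RARE_CODONS = {"AGA", "AGG"}  # arginine codons rarely used in E. coli


def codons(cds: str) -> List[str]:
    """Split an in-frame coding sequence into codons (trailing bases ignored)."""
    cds = cds.upper()
    return [cds[i:i + 3] for i in range(0, len(cds) - 2, 3)]


def arginine_usage(cds: str) -> Dict[str, float]:
    """Fraction of arginine codons contributed by each of the six synonyms."""
    counts = Counter(c for c in codons(cds) if c in ARG_CODONS)
    total = sum(counts.values())
    return {c: counts[c] / total if total else 0.0 for c in ARG_CODONS}


def rare_codon_positions(cds: str) -> List[int]:
    """0-based codon indices of rare codons, e.g. to look for clusters."""
    return [i for i, c in enumerate(codons(cds)) if c in RARE_CODONS]


gene = "ATG" + "CGT" * 3 + "AGA" + "TAA"
print(arginine_usage(gene)["AGA"])  # 0.25
print(rare_codon_positions(gene))   # [4]
```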


Sequence Similarity and Functional 
Predictions 

Relatively few amino acid substitutions in a given 
protein will allow function to be preserved, particu- 
larly at enzymatic active sites or cofactor binding sites, 
or at locations buried in the core of a folded protein. 
Because of this, polypeptide sequences are conserved 
during evolution. Thus, a DNA polymerase from 
one genus nearly always has a sequence with few 
changes from the functionally identical DNA poly- 
merase of a related genus. Even across large evolution- 
ary distances, amino acids critical for function at the 
active site of a polymerase are highly conserved. 
Because of this conservation, a newly determined
DNA polymerase sequence will line up with known 
sequences, aligning identical strings of amino acids in 
similar locations. 

Using a computer, such alignments can be carried 
out on a large scale, to derive a set of alignments of 
each predicted polypeptide with each known poly- 
peptide. The ‘best’ of these can then be chosen and used to predict the function of the unknown polypeptide. Public or private sequence databases are used for this exercise. GenBank, EMBL, and DDBJ are the main public databases, maintained by agencies of the US, Europe, and Japan respectively. A widely used method is the BLAST program developed at NCBI, but there are numerous other programs.

What is the ‘best’ alignment? These programs 
typically yield pairwise alignments in which identical
amino acids are lined up; gaps in one sequence or the 
other or both may be introduced to restore alignment. 
Each alignment is assigned a percent identity (fraction 
of positions in the alignment that have identical amino 
acids in the two sequences), a quality score, and also an 
estimate of the probability that the alignment could 
occur by chance. Scores are generally higher for longer 
alignments, for alignments with higher identity, and 
for alignments with fewer gaps; probability estimates 
are correspondingly lower. 
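Percent identity, as defined above, can be computed directly from a pairwise alignment. A minimal sketch, assuming the two aligned sequences are supplied as equal-length strings with '-' marking gaps, and that gap columns count toward the alignment length; programs differ in the denominator they use, so identity values from different tools are not always comparable.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percentage of alignment columns with the same residue in both sequences."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    if not aligned_a:
        return 0.0
    identical = sum(1 for x, y in zip(aligned_a, aligned_b) if x == y and x != gap)
    return 100.0 * identical / len(aligned_a)


def gap_columns(aligned_a: str, aligned_b: str, gap: str = "-") -> int:
    """Number of alignment columns containing a gap in either sequence."""
    return sum(1 for x, y in zip(aligned_a, aligned_b) if gap in (x, y))


print(round(percent_identity("MKV-LA", "MKVQLS"), 1))  # 66.7
print(gap_columns("MKV-LA", "MKVQLS"))                 # 1
```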

Depending on these scores, an annotator may 
decide to assign to the unknown protein the same 
function as that known for the protein in the database 
that gives the best alignment. Such an assignment can 
be made with confidence for long alignments with 
high identities, e.g., more than 95% of amino acids
are the same for an alignment covering the whole 
length of both proteins. With lower similarity scores, 
(e.g. 50% identity over most of the protein) the infer- 
ence may be drawn that the unknown belongs to a 
family of proteins with a general class of function (for 
example, it is predicted to be a dehydrogenase). With 
still lower scores, or with high similarity of a small 
segment of a protein, very limited predictions are 
possible. For example, nucleotide-binding sites show 
good conservation but span only a short sequence
segment. With alignments showing 20% or lower 
identity, very little can be said. When entered in the 
database, the unknown then acquires an annotation 
reflecting this judgement. The predicted function then 
becomes an hypothesis to be tested. 
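The graded judgement described above can be written down as a rule of thumb. The sketch is illustrative only: the identity thresholds come from the examples in the text, while the coverage cut-offs (0.95 for ‘the whole length’ and 0.8 for ‘most of the protein’) are assumptions, and real annotators also weigh alignment scores, chance probabilities, and conserved motifs.

```python
def annotation_level(identity: float, coverage: float) -> str:
    """Rough annotation tier from percent identity and the fraction of both
    proteins covered by the alignment (hypothetical cut-offs)."""
    if identity > 95 and coverage >= 0.95:
        return "same function"
    if identity >= 50 and coverage >= 0.8:
        return "family / general class of function"
    if identity <= 20:
        return "very little can be said"
    return "very limited prediction"


print(annotation_level(98, 1.0))  # same function
print(annotation_level(52, 0.9))  # family / general class of function
print(annotation_level(90, 0.2))  # very limited prediction (short segment only)
print(annotation_level(18, 1.0))  # very little can be said
```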


Modules 

With the availability of large amounts of sequence data 
from a variety of bacteria, it has become apparent that 
gene fusions are fairly common. For this reason, attri- 
bution of function based on high similarity over only a 
portion of a gene can be tricky. The term module 
refers to an extended segment of high similarity in an 
alignment that nevertheless covers only a fraction of 
a total polypeptide. In well-characterized examples 
(see above), these modules correspond to well-folded 
polypeptide domains. For less well-studied proteins, 
caution is in order. The known partner in the align- 
ment may have a known function, but it may be that 
one module carries it out, while a second module 
carries out a second, uncharacterized function. If 
the unknown aligns over only one segment, it is 
impossible to know whether to assign it the known function, since that function may belong to the unaligned region.
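A simple safeguard is to check how much of the characterized partner protein an alignment covers before transferring its function. A minimal sketch, assuming 1-based inclusive residue coordinates and a hypothetical 90% coverage cut-off: when much of the known protein lies outside the alignment, its function may belong to that other module, so the transfer is flagged as ambiguous.

```python
from typing import List, Tuple

Span = Tuple[int, int]  # 1-based, inclusive residue coordinates


def coverage(length: int, span: Span) -> float:
    """Fraction of a protein of the given length lying inside the aligned span."""
    start, end = span
    return (end - start + 1) / length


def unaligned_segments(length: int, span: Span) -> List[Span]:
    """Residue ranges of the protein outside the aligned span."""
    start, end = span
    segments = []
    if start > 1:
        segments.append((1, start - 1))
    if end < length:
        segments.append((end + 1, length))
    return segments


def transfer_is_ambiguous(known_length: int, known_span: Span, min_coverage: float = 0.9) -> bool:
    """True if the alignment covers too little of the characterized protein."""
    return coverage(known_length, known_span) < min_coverage


# An unknown aligns with residues 1-280 of a characterized 600-residue protein.
print(transfer_is_ambiguous(600, (1, 280)))  # True
print(unaligned_segments(600, (1, 280)))     # [(281, 600)]
```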
Bacterial genetics is the study of how genetic information is transferred, either from a particular bacterium to its offspring or between interbreeding lines of bacteria, and how that genetic information is expressed. Given the short generation times of most bacteria, the inheritance of genetic information must be extremely faithful. Occasionally, however, mutation or the transfer of genetic information between bacteria gives rise to genetic variation. The large sizes of bacterial populations ensure that even extremely rare genetic events are likely to occur. This genetic variation makes it possible for individual members of huge populations of bacteria to evolve new traits rapidly. For example, a single mutation may allow a bacterium to survive environmental conditions that would kill its nonmutant siblings (e.g., exposure to an antibiotic), or a group of genes transferred from another bacterial species may enable the recipient bacterium to invade a new environmental niche (e.g., the ability to infect a new host). In the laboratory, genetic variation is exploited to study the properties of bacteria, to explore the fundamental characteristics of gene transfer and gene expression, and to construct mutants with desired characteristics.

A minimal set of tools required for bacterial gen- 
etics includes the ability to isolate mutants, the avail- 
ability of selectable genetic markers, and the ability to 
transfer genetic information between cells. 


Mutations and Mutagenesis 


A powerful feature of bacterial genetics is the ability to examine very large populations of cells (typically more than 10¹⁰) for extremely rare types of mutants. Mutations can arise
in a variety of ways, including rare errors in DNA 
synthesis or DNA repair. The probability that a spontaneous mutation will affect a particular base pair varies from about 10⁻⁷ to 10⁻¹¹ per generation.
Thus, in a population of bacteria, about 1 in 10⁶ cells
may have a mutation in a particular gene. The 
frequency of mutations can be increased by certain 
chemical or physical agents called mutagens. Muta- 
gens may act by increasing the frequency of errors 
during DNA synthesis or repair or by directly altering 
the DNA. Direct exposure of bacteria to a mutagen 
may increase the frequency of mutations in a popula- 
tion of cells from 10°- to 10°-fold. Thorough genetic 
analysis requires many types of mutations, but each 
particular method of mutagenesis yields different sub- 
sets of mutations. 
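The link between per-base-pair mutation rates and the number of mutants in a population is simple arithmetic. A minimal sketch, assuming base pairs mutate independently during one round of replication; the 1,000-bp target, the rate of 10⁻⁹ per base pair, and the population of 10¹⁰ cells in the example are illustrative values, not measurements.

```python
import math


def mutant_fraction(rate_per_bp: float, target_bp: int) -> float:
    """Probability that a cell carries at least one mutation in a target of
    target_bp base pairs, given an independent per-base-pair rate."""
    # 1 - (1 - r)**n, computed stably for very small r
    return -math.expm1(target_bp * math.log1p(-rate_per_bp))


def expected_mutants(population: float, rate_per_bp: float, target_bp: int) -> float:
    """Expected number of cells in the population with a mutation in the target."""
    return population * mutant_fraction(rate_per_bp, target_bp)


# Hypothetical example: a 1,000-bp gene, 1e-9 per base pair, 1e10 cells.
print(f"{mutant_fraction(1e-9, 1000):.2e}")       # 1.00e-06
print(round(expected_mutants(1e10, 1e-9, 1000)))  # 10000
```

Even a gene-level frequency of about one in a million therefore leaves thousands of mutant cells in a single large culture.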


Effects of Mutations 

Bacterial mutants are typically described by compar- 
ison to a standard, well-characterized, reference strain 
called the ‘wild-type’ strain. Bacterial mutants have 
often lost some growth property (e.g., failure to utilize 
a particular carbon or nitrogen source or failure to 
grow without a particular nutrient), or have acquired
some new growth property (e.g., ability to grow in the 
presence of some toxic substance). Genes can be 
divided into two categories based on the phenotypes 
of the corresponding mutants: nonessential gene pro- 
ducts are only required under specific growth condi- 
tions, while essential gene products are required under 
all conditions. The genes of lactose catabolism are 
nonessential because they are only required for
growth on medium with lactose as the sole carbon 
source. In contrast, the genes encoding RNA poly- 
merase are essential because they are required for 
growth on all media. Null mutations in a nonessential 
gene will prevent growth on a medium that requires 
that gene product but such mutants will still grow on 
other media. In contrast, null mutations in an essential 
gene are lethal. Consequently, such mutants cannot be 
recovered from haploid bacteria. Nevertheless, it is 
possible to isolate more subtle mutations in essential 
genes. For example, it is possible to isolate mutations 
that alter a subunit of RNA polymerase that make the 
organism resistant to the antibiotic rifampicin. It is also 
possible to isolate mutations where some phenotype is 
observed under certain ‘nonpermissive’ conditions but 
not under other ‘permissive’ conditions (Table 1). 

Because not all mutations have an observable effect, 
it is important to distinguish the genotype from the 
resulting phenotype. In bacterial genetic nomenclature, a three-letter mnemonic refers to a pathway or discrete cluster of physiologically connected systems. A fourth, capitalized letter represents a particular gene of that set. The genotype is italicized, with the mnemonic in lower case letters (e.g., putA), and a plus superscript indicates the wild-type genotype. The phenotype is indicated by the same three-letter mnemonic, but the first letter is upper case and it is not italicized (e.g., Put), with a plus superscript indicating the functional phenotype (Put⁺) and a minus indicating a mutant phenotype (Put⁻). The genotype of a cell is usually inferred from
its phenotype but may also be determined indirectly 
by recombination experiments or directly by DNA 
sequencing. 


Isolation and Characterization of Bacterial 
Mutants 

Genetic analysis begins with the isolation of mutants 
that affect some property of the bacteria. Since muta- 
tions are very rare, mutants must be isolated from 
large populations of wild-type cells. Thus, some tactic 
is needed to find the rare mutants within a vast excess 
of parental bacteria. It is possible to identify mutations 
by physical methods such as DNA sequencing, but 
because mutations are so rare this approach is impractical. Instead, mutants are usually identified on the basis of an observable effect on the physiology of the cell. Detection of mutants requires a genetic selection or screen (Table 2).

Table 1 Some types of conditional mutations used in bacterial genetics

Conditional mutation | Permissive conditions | Nonpermissive conditions
Temperature sensitive (Ts) | 30 °C | 42 °C
Cold sensitive (Cs) | 42 °C | 30 °C
Osmoremedial | High osmotic strength | Low osmotic strength
Suppressor sensitive | Host with suppressor mutation | Host lacking suppressor mutation

A selection is an experimental arrangement that 
allows specific mutants, but not the parental cells, to 
grow. Genetic selections are very powerful because 
they allow the direct isolation of rare mutations from 
a very large population of cells. Some useful selections 
include resistance to antibiotics, resistance to phage, 
or the ability to grow on a medium where the parental 
cells cannot grow. For example, selection for rifampicin-resistant (Rifʳ) mutants simply requires exposing a large number of bacteria to medium containing rifampicin: sensitive bacteria are killed by the antibiotic and resistant mutants grow.

The ability to grow bacteria on solid media (i.e., 
agar plates) under conditions where each cell forms a 
single colony allows one to carry out screens that 
distinguish a particular mutant from other bacteria in 
the population. If the mutation is relatively common 
and no direct selection for the mutant phenotype is 
available, it is possible to screen for mutants on media 
where both the mutant and parental cells grow, but 
where the mutant has a readily scorable phenotype. 
For example, mutants of Escherichia coli unable to 
degrade lactose can be identified on indicator media, 
such as MacConkey-lactose color indicator plates. 
MacConkey-lactose plates contain lactose, a carbon source that can be used by Lac⁺ cells, and peptides that can be used as a carbon source by both Lac⁺ and Lac⁻ cells. Both the Lac⁻ mutant and the Lac⁺ parental cells can therefore grow on MacConkey-lactose medium. However, MacConkey-lactose medium also contains a pH indicator that is colorless at high pH but red when the pH decreases owing to fermentation of the lactose, so Lac⁺ bacteria form red colonies and Lac⁻ bacteria form white colonies. Hence, it is possible to screen visually for Lac⁻ mutants by looking for rare white colonies on MacConkey-lactose plates.

Screens for other types of mutants are not this simple. For example, when mutations disrupt a biosynthetic pathway, the bacteria cannot grow unless the endproduct of the pathway is provided in the medium (permissive medium). It is not possible to isolate such ‘auxotrophic mutants’ by directly screening for growth on media lacking the endproduct because the desired mutants will not grow (nonpermissive medium). However, auxotrophs are readily identified by replica plating. The bacteria are grown on permissive medium at a density of several hundred colonies per plate. Each of these ‘master plates’ is then replicated onto two other plates, containing either the permissive medium or the nonpermissive medium. The bacteria are transferred to an identical position on each of the replica plates. Once colonies are identified that grow only on the permissive medium, the mutants can be isolated from the master plate.

The difference between a selection and a screen has important practical consequences. Compare the selection for a Rifʳ mutant with a screen for a His⁻ mutant. Isolation of a Rifʳ colony is a direct selection because as many as 10¹⁰ cells can be spread on a single plate containing rifampicin and only Rifʳ mutants will form colonies. Thus, a Rifʳ mutant as rare as 1 in 10¹⁰ can be isolated easily. In contrast, finding an auxotrophic mutant involves screening through a large number of colonies for one that fails to grow in the absence of the required nutrient. To score the growth behavior of individual colonies, only a few hundred colonies can be examined on any given plate. Thus, if 1 in 10⁶ cells in the population were a particular auxotrophic mutant, thousands of plates would be needed to find a single mutant. Treating populations of cells with a mutagen may increase the fraction of mutants to 1 in 10°, allowing screening for the desired mutant on a reasonable number of replica plates. Nevertheless, random mutagenesis may not increase the mutant fraction sufficiently to screen easily for the desired mutants, and mutagenesis may lead to other, undesirable mutations as well.

Table 2 Isolation of bacterial mutants

Approach | Features | Sensitivity | Examples
Selection | Condition where only mutant cells can grow |  | Resistance to antibiotics or toxic substrate analogs; resistance to phage
Screen | Condition where both mutants and parental cells can grow, but the mutants have a phenotype that is distinguishable from the parent | 10⁻² to 10⁻³ | Indicator medium to identify mutants unable to ferment a carbon source; replica plating to identify auxotrophs
Enrichment | Condition where survival of the mutant is favored over the parental cells; usually employs an antibiotic that selectively kills growing cells or a condition that kills cells able to incorporate a particular substrate; a genetic screen is needed to identify the resulting mutants | 10⁻² per cycle | Penicillin enrichment; D-cycloserine enrichment; radioisotope suicide
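The practical difference between a selection and a screen comes down to how many cells can be examined per plate. A rough sketch, assuming about 10¹⁰ cells per selective plate and about 300 scorable colonies per screening plate (the text says ‘a few hundred’), with an illustrative auxotroph frequency of 1 in 10⁶; plates_needed is a hypothetical helper.

```python
import math


def plates_needed(mutant_frequency: float, cells_per_plate: float) -> int:
    """Plates required to expect one mutant among the cells examined."""
    per_plate = mutant_frequency * cells_per_plate  # expected mutants per plate
    return math.ceil(round(1 / per_plate, 6))       # round() absorbs float noise


print(plates_needed(1e-10, 1e10))  # selection for a rare Rif-resistant mutant: 1
print(plates_needed(1e-6, 300))    # replica-plate screen for an auxotroph: 3334
```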

Isolating rare mutants is generally achieved by 
using some sort of enrichment, a method that favors 
the growth of the desired mutants relative to non- 
mutant bacteria. Penicillin enrichment is a classical 
example of this approach. This antibiotic disrupts the 
synthesis of the bacterial cell wall. Nongrowing bac- 
teria do not engage in cell wall synthesis, so penicillin 
only kills actively growing bacteria. The differential 
survival of growing versus nongrowing bacteria can 
be used to enrich for a desired mutant. For example, 
penicillin enrichment can be used to isolate a rare 
auxotrophic mutant from a population of bacteria. If 
a mixture of wild-type and auxotrophic bacteria is
suspended in the nonpermissive medium containing 
penicillin, the auxotrophic mutants do not grow and 
thus will survive, while 99% of the wild-type bacteria 
will grow and be killed by the penicillin. When the sur- 
viving bacteria are washed free of the penicillin, and re- 
suspended in permissive medium, both the mutant and 
wild-type bacteria will grow, but the ratio of mutant 
to wild-type bacteria will be enriched 100-fold. 
Repeating this procedure multiple times eventually 
increases the proportion of mutants in the population. 
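The 100-fold gain per cycle compounds with each round of enrichment. A minimal sketch, assuming that all nongrowing auxotrophs survive, that 1% of wild-type cells survive each penicillin treatment (the 99% killing described above), and that regrowth in permissive medium leaves the mutant-to-wild-type ratio unchanged; the starting fraction of 10⁻⁶ is illustrative.

```python
def fraction_after_enrichment(mutant_fraction: float, cycles: int,
                              wild_type_survival: float = 0.01) -> float:
    """Fraction of mutants in the population after repeated penicillin enrichment."""
    ratio = mutant_fraction / (1 - mutant_fraction)  # mutants per wild-type cell
    ratio /= wild_type_survival ** cycles             # wild type cut to 1% per cycle
    return ratio / (1 + ratio)


for n in range(4):
    print(n, f"{fraction_after_enrichment(1e-6, n):.2g}")
# 0 1e-06
# 1 0.0001
# 2 0.0099
# 3 0.5
```

Three cycles take a one-in-a-million mutant to about half of the population, at which point colonies can be screened directly.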


Genetic Exchange 


Exchange of DNA between bacteria plays an import- 
ant role in evolution. Gene transfer in nature can result 
in the acquisition of antibiotic resistance and new 
virulence traits. Gene transfer is also a useful tool in 
the laboratory, allowing genetic mapping and comple- 
mentation tests, and the construction of bacterial 
strains with multiple mutations. The three most 
common methods of gene transfer between bacteria 
are transformation, conjugation, and transduction
(Table 3). A suitable selectable marker is required to 
identify recipients that have inherited the desired 
region of donor DNA. 


Transformation 

The uptake of naked DNA is called transformation. 
Some species of bacteria are naturally transformable. 
At some stage during growth they express gene pro- 
ducts that facilitate the uptake of exogenous DNA, 
a condition called ‘competence.’ The physiological 
conditions required to induce competence differ for
different species of bacteria. However, most natural 
transformation seems to involve the degradation of 
one strand of the exogenous DNA during the transfer 
of the other strand of DNA into the cell. Stable inheri- 
tance of the donor DNA requires either that it be 
replicated (e.g., some plasmids and phage) or that it 
integrate into the recipient chromosome via homolo- 
gous recombination. 

Many types of bacteria are not naturally transform- 
able but can be induced to take up DNA by treatment 
with specific chemicals or by electric shock, processes 
that are mechanistically different from natural trans- 
formation. The most common method of chemical 
transformation is by hypotonic Ca²⁺ shock. To prepare competent bacteria by this method, an early-exponential phase culture of cells is suspended in a cold hypotonic CaCl₂ solution. When DNA is added to these bacteria it forms a calcium-DNA complex
that adsorbs to the cell surface. The bacteria are then 
briefly warmed (heat shocked), which allows the 
DNA to enter the cell. In electroporation, cells are 
exposed to an electric field, so that a voltage potential 
develops across the membrane, transiently forming 
small pores that allow entry of exogenous DNA. 
In contrast to natural transformation and chemical 
transformation, which only seem to work in certain 
bacteria, a wide range of bacteria can be transformed 
by electroporation. 


Table 3 Common mechanisms of natural gene transfer in bacteria

Transfer mechanism | Vehicle | Properties
Transformation | None | Uptake of naked DNA from environment; recipient cell must be competent
Conjugation | Conjugal plasmid or transposon | Requires cell-to-cell contact; very long fragments of DNA can be transferred to recipient cell
Generalized transduction | Phage | Phage-length fragments of chromosomal DNA transferred to recipient cell; phage head only carries host DNA
Specialized transduction | Lysogenic phage | Short fragments of chromosomal DNA transferred to recipient cell; phage head carries host DNA covalently attached to phage DNA


Conjugation

DNA can also be transferred between bacteria by a 
process called conjugation. Conjugation requires cell- 
cell contact and the formation of a mating channel. 
This structure is formed by specific proteins that form 
a pore between the juxtaposed membranes through 
which single-stranded DNA and some associated pro- 
teins are transferred into the recipient cell. Conjuga- 
tion requires four events: (1) contact between donor 
and recipient cells and formation of a mating append- 
age; (2) nicking of the donor DNA at specific sites; 
(3) translocation of the nicked strand of donor DNA 
to the mating bridge and into the recipient cell; and 
(4) replication of the transferred DNA in the recipient 
cell. The proteins required for conjugation are usually 
encoded on specific plasmids or transposons. These 
conjugal plasmids or transposons can transfer them- 
selves or, if integrated into another region of the 
genome, any adjacent genomic sequence. Thus, con- 
jugation can result in the transfer of very large 
DNA fragments — even chromosome length DNA 
fragments. Furthermore, some conjugal plasmids 
are quite promiscuous. In addition to transferring 
DNA into a wide variety of bacteria, some conjugal 
plasmids can transfer DNA into Archaea and eukary- 
otes as well. 


Transduction 

Transduction is the transfer of bacterial DNA from 
one cell to another by a phage particle. Phage particles 
that contain bacterial DNA are called transducing 
particles. There are two types of transduction: gener- 
alized and specialized. 

Generalized transduction requires an error during
the packaging phase of phage maturation, such that 
random, phage-size fragments of chromosomal DNA 
get inserted into the phage head in place of phage 
DNA. Thus, in a lysate of a generalized transducing 
phage, some particles contain DNA obtained from
the host bacterium rather than phage DNA. The bac- 
terial DNA fragment can be derived from any part of 
the bacterial chromosome. When these phage particles 
adsorb to a recipient, the double-stranded chromo- 
somal DNA is injected. Stable inheritance of the 
donor DNA requires either that it be replicated (e.g., 
following transfer of plasmids) or that it integrate into 
the recipient chromosome via homologous recombin- 
ation. Generalized transducing particles arise during 
lytic growth of both virulent and lysogenic phages. 

Specialized transduction results from the aberrant 
excision of an integrated lysogenic phage. In contrast 
to generalized transduction, the phage head contains a contiguous molecule having both host and
phage DNA. Only regions of the host DNA that flank 
the prophage are packaged. By using genetic tricks to force lysogenic phages to integrate at many sites in the
genome, it is possible to isolate specialized transdu- 
cing phage from many different regions of a bacterial 
genome. Stable inheritance of specialized transducing 
fragments can occur either by phage-specific pro- 
cesses or by homologous recombination mediated by 
bacterial recombinases. 


Genetic Analysis of Bacterial Mutants 


Three of the most important tools for the analysis 
of bacterial mutations are suppressor analysis, genetic 
recombination, and complementation. Suppressor 
analysis facilitates the dissection of the roles of gene 
products and how they interact with other gene pro- 
ducts. Recombination allows the construction of new 
combinations of genes and the elucidation of the posi- 
tions of genes on a chromosome relative to other 
genes. Complementation can reveal how many genes 
are responsible for a particular phenotype and can 
distinguish regulatory genes from regulatory sites. 


Reversion and Suppressor Analysis 

Mutant organisms can regain their wild-type charac-
teristics by means of a mutation that restores function,
a process called reversion. Reversion can result either
from ‘true reversion,’ a back mutation that restores
the original genotype, or from ‘suppression,’ a second
mutation that partially or fully compensates for the
first mutation. The ability to
analyze very large populations of bacteria facilitates 
the isolation of rare classes of such revertants. 

The reversion frequency is a useful criterion for 
classifying mutants. The reversion frequency can dis- 
tinguish a single point mutant from a double mutant 
or a deletion mutant: single point mutants (which can 
be repaired by a single back mutation) revert at a much 
higher frequency than double mutants (which require 
two changes and thus revert very rarely) or deletion 
mutants (which cannot directly revert). The effect 
of mutagens on the reversion frequency can also be 
used to distinguish different types of mutations — 
reversion of frameshift mutations is stimulated by 
frameshift mutagens, and so on. For most purposes 
this approach has been superseded by direct DNA
sequence analysis; however, the reversion of known 
bacterial mutations remains an important assay to 
detect potential human carcinogens (for example, by 
the Ames test). 
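The reasoning behind using reversion frequency to classify mutants can be put into numbers. The minimal Python sketch below assumes hypothetical plate counts and assumes that both sites of a double mutant revert at the same rate. Because a double mutant needs two independent back mutations in the same cell, its expected reversion frequency is roughly the product of the single-site frequencies.

```python
def reversion_frequency(revertant_colonies: int, viable_cells: float) -> float:
    """Revertants per viable cell plated on medium that selects for the wild-type phenotype."""
    return revertant_colonies / viable_cells


# Hypothetical plate counts: 40 revertant colonies from 2e8 viable cells.
point = reversion_frequency(40, 2e8)  # 2e-7 per cell

# A double mutant needs two independent back mutations in the same cell, so its
# expected reversion frequency is roughly the product of the single-site values
# (both sites are assumed here to revert at the same rate).
double = point * point

# A deletion cannot revert by back mutation at all.
deletion = 0.0

for label, freq in [("point", point), ("double", double), ("deletion", deletion)]:
    print(f"{label:8s} {freq:.1e}")
```

With these assumed numbers the double mutant reverts about ten million times less often than the point mutant, which is why double mutants are described above as reverting very rarely.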

Second site mutations that result in suppression 
can occur within the same gene as the original muta- 
tion (intragenic suppression), or within a different 
gene (intergenic suppression). Intragenic suppressors 
can provide information about the role of particular
amino acids in a protein, protein folding, and interac-
tions between different amino acids within a protein. 
Intergenic suppressors can occur in a variety of ways, 
and each provides different insights into the structure 
or function of the gene products involved. Suppressor 
mutations often result from ‘gain-of-function’ muta- 
tions that are dominant over the wild-type gene. 
Thus, characterization of suppressors requires genetic 
recombination to construct strains with different 
combinations of mutant alleles, and complementation 
analysis to determine dominance. 


Homologous Recombination 

Genetic recombination is the physical breakage, 
exchange, and rejoining of two DNA molecules. 
Homologous or general recombination can be 
mediated by several different pathways in bacteria. 
Each of these pathways requires the RecA protein to 
align the DNA molecules between regions of substan- 
tial DNA sequence identity. The DNA molecules are 
broken between random but matching nucleotides, 
and then the DNA fragments are exchanged and 
rejoined to form two new combinations of genes. 
For example, recombination between two DNA
molecules with the genotypes a+b and ab+ can yield
two recombinant DNA molecules with the genotypes
a+b+ and ab. Owing to the efficiency of gene transfer
and the ability to work with large populations of 
bacteria, recombination analysis can be sufficiently 
sensitive to detect recombination events between 
adjacent base pairs. 

Recombination can be used not only to construct 
strains with different genotypes, but can also reveal 
the relative map location of genes. The probability of
recombination between any two adjacent base pairs
is very low, but recombination is about equally likely
between any pair of adjacent base pairs within a
homologous DNA sequence; that is, the crossover
site is essentially random. Hence, the physical
distance between two genes located on the same DNA 
molecule determines the frequency of recombination 
between the genes: the probability of recombination is 
less if the genes are close to each other than if the genes 
are farther apart. 

Genetic mapping exploits the recombination fre- 
quency between genes to measure the relative distance 
between genes. In bacterial genetics, the probability 
that recombination did not occur between genes 
is usually determined. If recombination does not 
occur between two genes, the genes will be coin- 
herited. For two genetic markers on the same DNA 
molecule, the closer two genetic markers are to each 
other, the more often they will be coinherited. The
frequency with which two genes are coinherited is
defined as their linkage. Determining the linkage of two
genetic markers is called a two-factor cross. It is also
possible to determine the relative location of genetic 
mutations by using three genetic markers (three- 
factor crosses) or by genetic crosses involving a set 
of defined deletion mutations (deletion mapping). 
Although it is possible to determine the relative loca- 
tion of genes by hybridization or DNA sequencing, 
genetic mapping often provides a simple and inexpens- 
ive way to determine rapidly the location of mutations 
in bacteria. 
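A two-factor cross reduces to a simple ratio. You select for one marker, then score how many of the selected recombinants also inherited the unselected marker. The minimal Python sketch below computes linkage that way from hypothetical colony counts. As one illustration of converting linkage to distance, it also inverts Wu's cotransduction formula, C = (1 - d/L)^3, where L is the length of the transduced fragment. The formula, the fragment length of 2 map minutes (a value often used for phage P1 in E. coli), and the counts are assumptions added for illustration and do not come from the text above.

```python
def linkage(coinherited: int, selected: int) -> float:
    """Fraction of selected recombinants that also inherited the unselected marker."""
    if selected <= 0:
        raise ValueError("need at least one selected recombinant")
    return coinherited / selected


def wu_distance(cotransduction: float, fragment_length: float = 2.0) -> float:
    """Invert Wu's formula C = (1 - d/L)**3 to estimate the distance d.

    Assumes d and L are in the same units (here map minutes) and 0 < C <= 1.
    """
    if not 0.0 < cotransduction <= 1.0:
        raise ValueError("cotransduction frequency must be in (0, 1]")
    return fragment_length * (1.0 - cotransduction ** (1.0 / 3.0))


# Hypothetical cross: 200 transductants selected for marker A, 54 also carry B.
c = linkage(54, 200)
d = wu_distance(c)
print(f"linkage = {c:.2f}, estimated distance = {d:.2f} min")
```

Tighter linkage (a larger C) maps to a smaller distance, matching the statement that closer markers are coinherited more often.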

Recombination also provides an invaluable tool for 
constructing strains with multiple mutations. If a mu- 
tation involves a directly selectable phenotype, it 
can be transferred with ease from a donor to an appro- 
priate recipient strain. If the mutation cannot be 
directly selected, linkage to an adjacent, selectable 
marker can be used to move the mutation into a reci- 
pient strain. For example, an auxotrophic mutation 
may be transferred by selecting for coinheritance of a 
nearby gene. 

In order to determine the genetic or biochemical 
effects of a mutation, it is necessary to compare a
mutant with a strain that differs from it by only a
single mutation. If several mutations are present, it is not
obvious which of the mutations is responsible for an 
observed phenotypic change. Two organisms that dif- 
fer by only a single mutation are said to be isogenic or 
to have the same genetic background. The most com- 
mon way to ensure that two strains are isogenic is to 
transfer a small region of DNA carrying the mutation 
into the parental strain by recombination. 


Complementation Analysis 


A particular phenotype frequently reflects the activity 
of many genes. To understand any genetic system it is 
essential to know the number of genes and regulatory 
elements involved. Since multiple genes that affect the 
same function may map very close to each other, it is 
not possible to determine if two mutations are in the 
same or different genes simply from the recombina- 
tion frequency. For a variety of reasons, it is also often 
difficult to prove that two mutations affect the same 
gene product by DNA sequence analysis. However, 
this question can be addressed by genetic complemen- 
tation. 

Bacteria are normally haploid, but complementa- 
tion requires the maintenance of two copies of a par- 
ticular gene in the same cell. In bacteria this can be 
done by constructing a partial diploid or ‘merodi- 
ploid’ that carries two copies of the relevant genes. 
Partial diploids may provide one copy of the gene on 
the chromosome and a second copy of the gene on a 
plasmid. For example, the partial diploid lacZ/lacZ+
has one copy of the mutant lacZ gene and a second copy
carrying the wild-type lacZ+ gene. If the functional copy
encodes a protein that can diffuse through the cell to
perform its function, and the second copy has a loss-
of-function mutation, the functional copy of the gene
will be dominant over the mutant copy. Thus, even
though the lacZ gene on the chromosome cannot pro-
duce β-galactosidase, the plasmid-borne lacZ+ gene
makes β-galactosidase, so the cell is phenotypically
Lac+. The mutant is complemented by the wild-type
gene, indicating that the mutation is recessive. In con-
trast, the partial diploid lacZ/lacZ cannot make
β-galactosidase, so the cell is phenotypically Lac−. If
two recessive mutants fail to complement, they affect
the same gene.
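When many recessive mutations are tested pairwise, the rule that two mutations that fail to complement lie in the same gene lets them be sorted into complementation groups. The minimal Python sketch below assumes hypothetical mutant names and test results. It also assumes that every mutation is recessive and that there is no intragenic complementation, so that failure to complement can be treated as transitive. It merges noncomplementing pairs with a union-find structure.

```python
def complementation_groups(mutants, noncomplementing_pairs):
    """Group recessive mutations; each pair that fails to complement shares a gene."""
    parent = {m: m for m in mutants}

    def find(m):
        # Follow parent links to the group representative, halving the path as we go.
        while parent[m] != m:
            parent[m] = parent[parent[m]]
            m = parent[m]
        return m

    for a, b in noncomplementing_pairs:
        parent[find(a)] = find(b)

    groups = {}
    for m in mutants:
        groups.setdefault(find(m), []).append(m)
    return sorted(sorted(g) for g in groups.values())


# Hypothetical lac mutations; each listed pair gave a Lac- partial diploid.
mutants = ["m1", "m2", "m3", "m4", "m5"]
fail_to_complement = [("m1", "m2"), ("m2", "m4"), ("m3", "m5")]
print(complementation_groups(mutants, fail_to_complement))
# → [['m1', 'm2', 'm4'], ['m3', 'm5']]
```

Each resulting group corresponds to one gene under the stated assumptions, so this hypothetical set of mutations would define two genes.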

Complementation analysis in bacteria usually fol- 
lows the grouping of mutations by genetic mapping 
experiments. This reduces the number of partial 
diploids that must be constructed because genes that 
map far from each other are clearly different. In 
addition, complementation analysis should involve 
partial diploids with an equal number of copies 
of each gene. If one of the genes is present in excess 
(for example, with genes cloned on multicopy plas- 
mids) artifacts can occur which may be very mis- 
leading. 


Portable Regions of Homology 


Although most genes have a specific, defined location 
in a bacterial genome, the genome is by no means 
static. Recombination between repeats of homologous 
DNA sequences can result in the duplication, dele- 
tion, or inversion of the intervening DNA. It is pos- 
sible to select for insertion of such homologous 
sequences at strategic positions in the genome. Such 
‘portable regions of homology’ can be generated by 
transposable elements, which are mobile segments of 
DNA that move to new locations at low frequency. 
Alternatively, having specific DNA fragments on 
a circular DNA molecule (e.g., a plasmid or phage) 
can permit the targeted insertion of DNA into a 
particular site on the chromosome. These two types 
of insertions have a wide variety of genetic uses. 
A few examples include: construction of insertion 
mutants with complete loss of function; construction 
of deletion mutations with defined endpoints; con- 
struction of duplication mutations for complementa- 
tion studies; integration of other genetic elements at 
particular sites in the genome by homologous recom- 
bination; transfer of mutations by selection for an 
associated genetic marker (e.g., antibiotic resistance); 
isolation of linked genetic markers with a selectable 
phenotype; and construction of operon or gene 
fusions. 


Transposable Elements 

Transposable elements in bacteria include insertion 
sequences and transposons. Insertion sequences are 
short elements (typically less than 5 kb) that only
encode functions required for their own transposition. 
Transposons are typically longer (>5 kb) and encode 
other gene products (e.g., antibiotic resistance) in 
addition to the functions required for transposition. 
The frequency of transposition of these elements is
typically low, although it varies over a wide range,
spanning several orders of magnitude per generation.
Transposition requires cutting the DNA at each end
of the transposon and at the target site, joining the
ends of the transposon to the target DNA, and DNA
replication. Transposition can occur by a replicative
mechanism (requiring replication of the entire
transposon) or a nonreplicative mechanism (requiring
only replication of short segments at the ends of the
insertion site). Both mechanisms are independent
of the homologous recombination machinery of the
host.
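In nonreplicative transposition, the short stretch of replication at the ends of the insertion site arises because the target DNA is cut at staggered positions on the two strands. After the element is joined to the protruding ends, filling the single-stranded gaps copies the target bases that lay between the two cuts. This leaves a short direct repeat on each side of the insertion. The minimal Python sketch below models only the top strand. It assumes a hypothetical target sequence, a placeholder element, and a 5 bp stagger; the real stagger length is characteristic of each element.

```python
def insert_with_target_duplication(target: str, element: str, cut: int, stagger: int) -> str:
    """Top-strand result of inserting `element` at a staggered cut in `target`.

    The top strand is cut before position `cut` and the bottom strand `stagger`
    bases further along; gap filling copies the bases between the two cuts,
    so they appear on both sides of the inserted element.
    """
    if not 0 <= cut <= len(target) - stagger:
        raise ValueError("cut site must leave room for the stagger")
    duplicated = target[cut:cut + stagger]
    return target[:cut + stagger] + element + duplicated + target[cut + stagger:]


# Hypothetical target and element; the 5 bp between the cuts is duplicated.
result = insert_with_target_duplication("AAACCCGGGTTT", "[IS]", cut=3, stagger=5)
print(result)  # AAACCCGG[IS]CCCGGGTTT
```

The bases CCCGG now flank the element on both sides, the signature that such an insertion leaves in the target DNA.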

Transposable elements play an important role in 
bacterial evolution, including the transfer of antibiotic 
resistance genes between bacteria and promoting 
chromosome rearrangements. In addition, transpos- 
able elements are useful tools in bacterial genetics 
because they provide selectable markers and portable 
regions of homology that can be used to facilitate 
genetic recombination. 


Integration of Circular DNA Molecules 

In addition to transposons, it is possible to construct 
small repeats of homologous DNA by integration of 
a circular DNA molecule with a cloned fragment of 
chromosomal DNA. This approach is useful for 
construction of defined duplications for comple- 
mentation analysis and for construction of insertion 
mutations on the chromosome. Recombination be- 
tween the homologous sequences of the resulting 
duplication can result in allele exchange, moving a 
mutation from a cloned sequence onto the chromosomal
DNA.
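The integration and allele-exchange steps can be modeled with strings. The minimal Python sketch below assumes a hypothetical chromosomal region, a hypothetical circular plasmid carrying a copy of that region with one point mutation (shown in lowercase), and crossover offsets chosen by hand. A single crossover inserts the vector between two copies of the region. A second crossover between the repeats then excises the vector and one copy. Whether the mutation stays on the chromosome depends on where it lies relative to the two crossovers.

```python
def integrate(left, region, right, plasmid_region, vector, i):
    """Single crossover at offset i within the homologous region.

    The circular plasmid (plasmid_region + vector) is opened at offset i and
    inserted, producing two copies of the region flanking the vector.
    """
    copy1 = region[:i] + plasmid_region[i:]
    copy2 = plasmid_region[:i] + region[i:]
    return left + copy1 + vector + copy2 + right, copy1, copy2


def resolve(copy1, copy2, k):
    """Second crossover at offset k between the two direct repeats.

    The vector and one copy are excised as a circle; the copy left on the
    chromosome is the first k bases of copy1 joined to the rest of copy2.
    """
    return copy1[:k] + copy2[k:]


# Hypothetical region with the mutation (lowercase g) at offset 6 on the plasmid copy.
chromosome, c1, c2 = integrate("<<", "ACGTACGTAC", ">>", "ACGTACgTAC", "=VECTOR=", i=3)
print(chromosome)          # <<ACGTACgTAC=VECTOR=ACGTACGTAC>>
print(resolve(c1, c2, 8))  # ACGTACgTAC: mutation exchanged onto the chromosome
print(resolve(c1, c2, 2))  # ACGTACGTAC: wild-type sequence restored
```

In this model, a second crossover on the opposite side of the mutation from the first leaves the mutant allele behind. A second crossover on the same side restores the original sequence.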


Summary 


A hallmark of bacterial genetics is the ability to ana- 
lyze very large populations of cells to identify rare 
genetic events. Although a wide variety of genetic 
tricks have been developed for specific purposes 
in particular bacteria, bacterial genetics relies on a 
relatively small core of tools for dissection of the 
structure and function of genes. The essential tools 
include the isolation of mutations, the ability to transfer 
genes between bacterial strains, the ability to isolate 
recombinants, and the ability to do complementation
tests. These tools have been finely honed for a select
group of model bacteria, including Escherichia coli,
Salmonella enterica, and Bacillus subtilis. The con- 
cepts developed for these model bacteria are readily 
applicable to other bacteria as well, although the ex- 
perimental details typically require adaptation for 
each particular species of bacteria. 
Transcription initiation by bacterial RNA polymerase
is a multistep process that includes the initial binding 
of RNA polymerase to the promoter to form a closed 
complex, isomerization of the closed complex to an 
open complex, the initial polymerization of ribo- 
nucleotides, and clearance of the promoter by RNA 
polymerase. Transcription factors are proteins that 
affect, either negatively or positively, specific steps 
in this process. 


Sigma (σ) Factors


Function of Sigma Factors 

The catalytic form of RNA polymerase, core RNA
polymerase, consists of four protein subunits, β, β′, and
two α subunits. This form of the enzyme, however,
cannot recognize promoter sequences. Binding of an
additional subunit, σ, to core RNA polymerase results
in a holoenzyme that recognizes specific promoter
sequences. Bacteria often have several sigma factors,
each conferring a different promoter specificity to
holoenzyme. One or more of these proteins serves as 
the primary sigma factor which is required for tran- 
scription of most of the genes within the bacterium. 
The rest are alternative sigma factors that are required 
for the transcription of specific sets of genes. The 
number of sigma factors varies widely between bac- 
terial species. For example, Mycoplasma genitalium
possesses a single sigma factor, while Bacillus subtilis
has 17. Genes that are transcribed by alternative sigma
factors usually encode proteins that are involved in
a common cellular function. For example, σ32 is an
alternative sigma factor in Escherichia coli that is
required for the expression of heat shock proteins in
response to increased temperature.

Amino acid sequence comparisons of sigma factors 
have revealed two distinct protein families. The larger 
of these two families shares sequence homology with
σ70, the primary sigma factor of E. coli, and consists of
both primary and alternative sigma factors. The se-
cond family consists of a single alternative sigma fac-
tor, σ54, which is found in a wide variety of bacterial
species. Not only does σ54 lack sequence homology
with members of the σ70 family, but the mechanism of
transcription initiation with σ54-RNA polymerase
holoenzyme differs from that of other forms of
RNA polymerase holoenzyme.


Regulation of Sigma Factor Function 

The activities of certain alternative sigma factors are 
tightly controlled as a means of regulating expression 
of genes that are dependent on these sigma factors. 
This is accomplished either by modulating the levels 
of the sigma factor inside the cell or by regulating the 
activity of the sigma factor. An example of the first 
type of regulation is observed with E. coli σ32. Levels
of σ32 are normally low inside the cell due to the
instability of the protein. This instability is caused by
binding of the molecular chaperone DnaK to σ32,
which leads to the degradation of σ32 by FtsH, a
membrane-bound metalloprotease. Upon temperature
upshift, DnaK dissociates from σ32 and preferentially
binds denatured proteins that accumulate during the
temperature stress. This results in dramatically higher
levels of σ32, which leads in turn to increased expres-
sion of σ32-dependent genes.
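Why rapid degradation keeps σ32 scarce, and why slowing degradation raises its level, follows from a one-compartment model. In this model σ32 is made at a constant rate and removed by first-order degradation: dS/dt = k_syn - k_deg·S, so the steady-state level is k_syn/k_deg. The minimal Python sketch below uses hypothetical rate constants and deliberately ignores any other layers of heat shock control.

```python
def steady_state(k_syn: float, k_deg: float) -> float:
    """Steady-state level of a protein made at rate k_syn and degraded at rate k_deg * S."""
    return k_syn / k_deg


def simulate(k_syn: float, k_deg: float, s0: float, t_end: float, dt: float = 0.01) -> float:
    """Forward-Euler integration of dS/dt = k_syn - k_deg * S from S(0) = s0."""
    s, t = s0, 0.0
    while t < t_end:
        s += (k_syn - k_deg * s) * dt
        t += dt
    return s


# Hypothetical units: molecules per cell and per minute.
k_syn = 10.0
before = steady_state(k_syn, k_deg=0.7)   # rapid DnaK/FtsH-dependent turnover
after = steady_state(k_syn, k_deg=0.07)   # turnover assumed slowed tenfold after upshift
print(round(before, 1), round(after, 1))  # 14.3 142.9

# Starting from the pre-shift level, the level climbs toward the new steady state.
print(round(simulate(k_syn, 0.07, before, t_end=60.0), 1))
```

In this simple model, a tenfold drop in the degradation rate gives a tenfold rise in steady-state σ32 without any change in synthesis.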

The activities of some alternative sigma factors are 
regulated by anti-sigma factors. Anti-sigma factors 
bind tightly to their corresponding sigma factors, 
thereby preventing them from interacting with core 
RNA polymerase. Anti-sigma factors are themselves 
subject to regulation by various mechanisms, includ- 
ing secretion of the anti-sigma factor from the cell, 
interactions of the anti-sigma factor with an extra- 
cytoplasmic signal, or sequestration of the anti-sigma 
factor by an anti-anti-sigma factor. An interesting
example of anti-sigma regulation occurs in flagellar
biogenesis in enteric bacteria. Synthesis of the late
flagellar genes requires the alternative sigma factor
σ28. During the early stages of flagellar synthesis, σ28
is inactive due to its association with the anti-sigma
factor FlgM. Upon assembly of the flagellar basal
body and hook structures, FlgM is secreted out of
the cell by the flagellar export system, thereby freeing
σ28 to bind to core RNA polymerase.
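Sequestration by an anti-sigma factor can be treated as a simple 1:1 binding equilibrium. Solving that equilibrium exactly, using the quadratic appropriate for tight binding, gives the concentration of free sigma factor available to core RNA polymerase. Lowering the total anti-sigma factor, as happens when FlgM is exported, frees most of the sigma factor. The minimal Python sketch below uses hypothetical concentrations and a hypothetical dissociation constant. It assumes 1:1 stoichiometry and ignores competition from core polymerase.

```python
import math


def free_sigma(sigma_total: float, anti_total: float, kd: float) -> float:
    """Free sigma for 1:1 binding, sigma + anti <-> complex, with dissociation constant kd.

    Solves kd = (S - C)(A - C) / C for the complex C, taking the physical root.
    """
    b = sigma_total + anti_total + kd
    complex_conc = (b - math.sqrt(b * b - 4.0 * sigma_total * anti_total)) / 2.0
    return sigma_total - complex_conc


# Hypothetical values in nM: 100 nM sigma factor, tight binding (kd = 1 nM).
with_flgm = free_sigma(100.0, 200.0, 1.0)   # anti-sigma factor in excess
after_export = free_sigma(100.0, 5.0, 1.0)  # most anti-sigma factor secreted
print(f"{with_flgm:.2f} nM free before export, {after_export:.1f} nM free after")
```

With binding this tight, free sigma factor stays near 1 nM until the anti-sigma factor falls below the sigma factor, after which most of the sigma factor is released.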


Transcriptional Activators 


Additional transcriptional factors are required for 
maximal expression from many bacterial promoters. 
These transcriptional activators may be required for 
recruiting RNA polymerase to the promoter, for isom- 
erization of the closed promoter complex to an open 
complex, or for promoter clearance. Activators, there- 
fore, work by facilitating specific steps in the normal 
pathway of transcription initiation rather than by 
creating new pathways. 
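One way to see how activators can act at different steps is a two-step model of open complex formation, R + P ⇌ RPc → RPo. Here K_B is the equilibrium constant for closed complex formation and k_f is the rate constant for isomerization. In the kinetic analysis commonly applied to this scheme (from abortive-initiation assays), the average time to form an open complex is τ = 1/k_f + 1/(K_B·k_f·[R]). The minimal Python sketch below uses hypothetical constants to contrast an activator that raises K_B (recruitment) with one that raises k_f (isomerization). It leaves promoter clearance out of the model.

```python
def tau_open(k_b: float, k_f: float, rnap: float) -> float:
    """Mean time to form an open complex for R + P <-> RPc -> RPo (rapid pre-equilibrium).

    k_b: equilibrium association constant for the closed complex (per M)
    k_f: isomerization rate constant (per s)
    rnap: free RNA polymerase concentration (M)
    """
    return 1.0 / k_f + 1.0 / (k_b * k_f * rnap)


# Hypothetical promoter: weak binding, moderate isomerization, 30 nM polymerase.
base = dict(k_b=1e7, k_f=0.05, rnap=30e-9)
basal = tau_open(**base)
recruit = tau_open(**{**base, "k_b": 1e8})     # activator raises K_B tenfold
isomerize = tau_open(**{**base, "k_f": 0.5})  # activator raises k_f tenfold

for label, t in [("basal", basal), ("recruitment", recruit), ("isomerization", isomerize)]:
    print(f"{label:14s} tau = {t:6.1f} s")
```

In this model, raising K_B shortens only the binding term, so recruitment can never push τ below the 1/k_f floor. Only an activator that speeds isomerization can lower that floor.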


Activators that Recruit RNA Polymerase 
Promoters that are dependent on activators often lack 
either the canonical −10 or −35 hexamer elements
that are contacted by specific regions of the sigma 
subunit of RNA polymerase. Consequently, these 
promoters have low affinities for RNA polymerase. 
Activators that function at these promoters generally 
bind immediately upstream of the −35 region of the
promoter and directly contact RNA polymerase, 
which increases the affinity of the RNA polymerase 
for the promoter. 

The best studied of this type of activator is the 
cyclic AMP receptor protein (CRP) (also referred to 
as catabolite gene activator protein (CAP)). In the lac 
system, CRP binds as a dimer to a site upstream of the 
lac promoter, and the subunit that is proximal to the 
promoter contacts the C-terminal domain (αCTD) of
one of the α subunits of RNA polymerase. The αCTD
is the target for a number of activators. Interestingly,
the exact location of the activator binding site varies
considerably. This is because the αCTD is joined to
the rest of the α subunit by a flexible linker, and
activators can bind to a variety of positions upstream
of the promoter and still contact αCTD, provided
that the two proteins are bound on the same face of
the DNA.
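The requirement that two DNA-bound proteins lie on the same face of the DNA can be checked with simple helical geometry. Assuming B-form DNA with about 10.5 bp per helical turn, sites separated by a whole number of turns point the same way, and sites separated by half a turn point in opposite directions. The minimal Python sketch below compares candidate site centers with the CRP site at the lac promoter, taken here as centered at −61.5. The candidate positions and the 60° tolerance are assumptions for illustration only.

```python
BP_PER_TURN = 10.5  # assumed helical repeat of B-form DNA


def phase_offset_degrees(pos_a: float, pos_b: float) -> float:
    """Angular offset (0 to 180 degrees) between two sites on a B-form helix."""
    turns = abs(pos_a - pos_b) / BP_PER_TURN
    angle = (turns % 1.0) * 360.0
    return min(angle, 360.0 - angle)


def same_face(pos_a: float, pos_b: float, tolerance: float = 60.0) -> bool:
    """True if the two sites lie within `tolerance` degrees of the same helical face."""
    return phase_offset_degrees(pos_a, pos_b) <= tolerance


reference = -61.5  # assumed CRP site center at the lac promoter
for center in (-66.5, -71.5, -76.5, -82.5, -92.5):
    print(center, round(phase_offset_degrees(reference, center)), same_face(reference, center))
```

Moving a site by about one helical turn keeps it in phase, while moving it by half a turn puts it on the opposite face. This is why activator sites can tolerate large shifts only in roughly 10 bp steps.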

In principle, any surface region of RNA polymer- 
ase could serve as a binding site for activators that 
function by recruiting polymerase to the promoter. 
Some activators, such as the bacteriophage lambda cI 
protein, function by making direct contact with the σ70
subunit of σ70-RNA polymerase holoenzyme. Other
target sites within RNA polymerase that have been
identified include: the N-terminal domain of the α
subunit, which is contacted by CRP at certain pro-
moters; the β subunit, which is contacted by DnaA at
the λ PR promoter; and the β′ subunit, which is con-
tacted by the bacteriophage N4 single-stranded DNA
binding protein.


Some Activators Work by Altering 
Promoter Structure 

A limited number of activators function by altering 
the DNA structure within the promoter. The para- 
digm for this class of activators is the MerR protein, 
which activates transcription of genes required for 
mercury resistance. The spacing between the −10
and −35 elements of the merP promoter is slightly
greater than the optimal spacing. This prevents tran-
scription initiation from the merP promoter. MerR
binds to a site located between the −10 and −35
regions of the merP promoter. Upon binding of mer-
curic ions, MerR underwinds the DNA to realign
the −10 and −35 promoter elements, allowing RNA
polymerase to initiate transcription. 


Activators of σ54-RNA Polymerase
Holoenzyme Catalyze Open Complex
Formation

Unlike other forms of RNA polymerase, σ54-RNA
polymerase holoenzyme (σ54-holoenzyme) recognizes
promoter elements at the −12 and −24 regions.
σ54-holoenzyme binds the promoter to form a closed
promoter complex, but it is unable to undergo isomer-
ization to an open complex. Transcription initiation
from a σ54-dependent promoter requires an activator
that catalyzes the isomerization of the closed complex 
to an open complex. This type of activator must hydro- 
lyze ATP or other nucleoside triphosphates to catalyze 
open complex formation, a feature that distinguishes it 
from other bacterial transcriptional activators. 

The best characterized activator of σ54-holo-
enzyme is NtrC, which activates transcription from
glnA in enteric bacteria. NtrC-binding sites at glnA
have the properties of eukaryotic transcriptional
enhancers in that they can function when placed sev-
eral kilobases away from the promoter. Hence, bind-
ing sites for activators of σ54-holoenzyme are referred
to as enhancers and the activators are considered to be 
bacterial enhancer-binding proteins. 

Enhancers serve two purposes in transcriptional 
activation. First, enhancers tether the enhancer-binding 
protein near the promoter to increase the local con- 
centration of activator and improve the chance
of productive interactions between σ54-holoenzyme
and the enhancer-binding protein. Second, enhancers
facilitate the assembly of the enhancer-binding
proteins into an active oligomeric form. A single dimer
of NtrC is unable to activate transcription, but bind- 
ing of NtrC to the enhancer promotes the assembly of 
an oligomeric form of the protein that can activate 
transcription. 
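The gain from tethering an activator near the promoter can be estimated by asking what concentration a single molecule represents in a given volume, c = 1/(N_A·V). The minimal Python sketch below compares one molecule spread through a whole cell with one confined to a small sphere around the promoter. The cell volume of 1 fL (roughly an E. coli cell) and the 50 nm confinement radius are assumptions for illustration, not measured values for NtrC.

```python
import math

AVOGADRO = 6.02214076e23  # molecules per mole


def molar_concentration(molecules: float, volume_litres: float) -> float:
    """Concentration in mol/L of a given number of molecules in a given volume."""
    return molecules / (AVOGADRO * volume_litres)


def sphere_volume_litres(radius_nm: float) -> float:
    """Volume in litres of a sphere whose radius is given in nanometres."""
    radius_m = radius_nm * 1e-9
    return (4.0 / 3.0) * math.pi * radius_m ** 3 * 1000.0  # 1 m^3 = 1000 L


# Assumed volumes: ~1 fL for a whole cell, a 50 nm sphere for a tethered activator.
whole_cell = molar_concentration(1, 1e-15)
tethered = molar_concentration(1, sphere_volume_litres(50.0))
print(f"free: {whole_cell * 1e9:.2f} nM, tethered: {tethered * 1e6:.2f} µM, "
      f"ratio: {tethered / whole_cell:.0f}")
```

Under these assumptions, confinement raises the effective concentration of a single molecule from the low-nanomolar to the micromolar range, roughly a thousandfold or more.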

Interactions between the enhancer-binding protein
and σ54-holoenzyme are necessary for transcriptional
activation. Chemical crosslinking studies suggest
that the enhancer-binding protein interacts with both
σ54 and the β subunit of RNA polymerase. Enhancer-
binding proteins contact σ54-holoenzyme through
DNA looping after binding to sites upstream of the
promoter. At some promoter regulatory regions, these
loops result from random and transient changes in
DNA structure, while at other promoter regulatory
regions the loops are stabilized by the integration host
factor. At such promoter regulatory regions, binding
of integration host factor to a site located between the
promoter and the enhancer causes a sharp bend in the
DNA, which facilitates productive interactions be-
tween σ54-holoenzyme and the enhancer-binding
protein bound to their respective sites.

The activities of enhancer-binding proteins are 
themselves regulated. Several of them, such as NtrC, 
are response regulators of two-component systems. 
These proteins are activated after being phosphorylated 
by protein histidine kinases that are responsive to envir- 
onmental or cellular signals. Other enhancer-binding 
proteins are activated upon binding an inducer. Finally, 
the activities of some enhancer-binding proteins are 
regulated through interactions with other proteins. 


Repressors of Transcription 


The operon theory of repression, enunciated in 1961 
by Jacob and Monod, was the earliest model to 
account for transcriptional control in bacteria. Re- 
pressors bind at specific DNA sites, referred to as 
operators, where they interfere with the binding of 
either RNA polymerase or activators. It was origin- 
ally thought that repression was caused by binding of 
one repressor to a single operator. Subsequent studies 
showed that repression is more complicated than this 
simple model, and usually involves multiple operators 
or even the involvement of additional proteins. 


The Lac Repressor 

The Lac repressor protein, LacI, prevents the tran-
scription of genes involved in lactose utilization (lac
genes) in E. coli. Like many other repressors, LacI
utilizes multiple operators to increase the efficiency
of repression. The main operator, O1, is centered at
+11 relative to the transcriptional start site of the lac
operon. Auxiliary operators, O2 and O3, are centered
at positions +412 and −82, respectively. LacI, which is
a tetramer, binds to O1 through two of its subunits.
The other two subunits then bind either O2 or O3,
forming a loop between the operators. Either config-
uration prevents the binding of RNA polymerase to
the promoter. Binding of the inducer allolactose to
LacI stabilizes a conformation of the protein that has
a low affinity for the operator, which results in 
decreased occupancy of the lac operators and the 
derepression of the lac genes. 
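The operator positions given above fix the sizes of the DNA loops a LacI tetramer can form, and the effect of inducer can be pictured with a simple occupancy curve. In the minimal Python sketch below, the loop lengths come directly from the operator centers in the text. The single-site occupancy formula θ = [R]/([R] + K_d) ignores the cooperativity that looping adds. The repressor concentration and dissociation constants are hypothetical values chosen only to show the direction of the effect.

```python
OPERATOR_CENTERS = {"O1": 11, "O2": 412, "O3": -82}  # bp, relative to the lac start site


def loop_length(a: str, b: str) -> int:
    """Distance in bp between the centers of two operators."""
    return abs(OPERATOR_CENTERS[a] - OPERATOR_CENTERS[b])


def occupancy(repressor_nm: float, kd_nm: float) -> float:
    """Fraction of time a single operator is bound (no looping, no competition)."""
    return repressor_nm / (repressor_nm + kd_nm)


print(loop_length("O1", "O2"), loop_length("O1", "O3"))  # 401 93

# Hypothetical: 10 nM repressor; inducer assumed to raise the apparent Kd 1000-fold.
print(round(occupancy(10.0, 0.01), 4), round(occupancy(10.0, 10.0), 2))  # 0.999 0.5
```

Even this crude model shows how a drop in operator affinity converts an almost permanently occupied operator into one that is free much of the time.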


The Cyt Repressor 

The Cyt repressor, CytR, is an anti-activator that 
regulates the expression of nine transcription units 
in E. coli whose products are involved in the uptake
and catabolism of nucleosides. Cytidine is an inducer that relieves
the repression of these genes by CytR. Most of the 
promoter regions of these operons have a CRP-binding
site near position −41 and a second CRP-binding site
near position −94. A binding site for CytR is located
between these two CRP sites, but CytR only binds
to this site and represses transcription when CRP
is bound. Binding of CytR to its site represses
transcription by preventing CRP from contacting the
α subunit of RNA polymerase.
In bacterial genetics, the term competence refers to
the ability of cells to take up DNA molecules from
their environment, whereas transformation refers to
the acquisition of a new genetic property from that
DNA. For example, competent Escherichia coli cells
sensitive to the drug ampicillin may take up and repli-
cate a plasmid DNA molecule carrying an
ampicillin-resistance gene, and thus become
transformed to ampicillin resistance.
In many bacteria the ability to actively take up DNA 
develops under natural conditions, controlled by 
genetically programmed developmental pathways. It 
is important to distinguish between this ‘natural’ or 
‘evolved’ competence and the competence of cells that
have been artificially permeabilized by exposure to 
divalent cations or electrical shock. Such procedures 
are widely used in the laboratory to introduce plas- 
mids into cells, but probably have no evolutionary sig- 
nificance. The term ‘transfection’ is used to describe 
transformation of bacterial cells with DNA of an in- 
fectious bacterial virus (a bacteriophage) and, more 
generally, for transformation of cultured mammalian 
cells (see Transfection). 


Issues Common to Both Natural and 
Artificial Competence 


DNA uptake is usually not a continuous process, i.e., a 
longer incubation time does not result in more trans- 
formants. Rather, under laboratory conditions, uptake 
is completed in a short time, and increasing the 
amount of DNA beyond a saturation point does 
not give more transformants. Two measures are com-
monly used for transformation: transformation effi-
ciency (transformed cells per microgram of DNA) for 
artificially competent cells, and transformation fre- 
quency (transformed cells per total cells) for naturally 
competent cells. Where transformation is done with 
saturating concentrations of DNA, the competence of 
the culture is most sensitively measured by the trans- 
formation frequency rather than by direct measure- 
ments of uptake of radiolabeled DNA. Most competent 
cells can take up more than one DNA molecule. Thus, 
with saturating DNA, the frequency of cotransforma- 
tion with different fragments or plasmids (‘congres- 
sion’) can be used to estimate the fraction of cells in the 
culture that are competent. This is often much less 
than 100%. 
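The two measures, and the congression estimate of the competent fraction, reduce to a few ratios. The minimal Python sketch below uses hypothetical counts. The congression estimate assumes that only a fraction f of cells is competent, that competent cells take up the two markers independently, and that the DNA is saturating. Under these assumptions the single-marker frequencies are f·a and f·b and the cotransformation frequency is f·a·b, which gives f = (fA × fB)/fAB.

```python
def efficiency_per_ug(transformants: int, dna_ug: float) -> float:
    """Transformation efficiency: transformants per microgram of DNA."""
    return transformants / dna_ug


def frequency(transformants: float, viable_cells: float) -> float:
    """Transformation frequency: transformants per viable cell."""
    return transformants / viable_cells


def competent_fraction(freq_a: float, freq_b: float, freq_ab: float) -> float:
    """Estimate the competent fraction f from single and double transformation frequencies.

    If f*a and f*b are the single-marker frequencies and f*a*b is the
    cotransformation frequency, then (f*a) * (f*b) / (f*a*b) = f.
    """
    return freq_a * freq_b / freq_ab


# Artificial competence: 2,000 colonies from 1 ng (0.001 ug) of plasmid.
print(f"{efficiency_per_ug(2_000, 0.001):.1e} transformants per ug")  # 2.0e+06

# Natural competence with saturating DNA carrying markers A and B (hypothetical counts).
viable = 1e8
f_a = frequency(1e6, viable)
f_b = frequency(5e5, viable)
f_ab = frequency(5e4, viable)
print(f"estimated competent fraction: {competent_fraction(f_a, f_b, f_ab):.2f}")
```

Here the double transformants are ten times more common than independent uptake by all cells would predict, which under these assumptions implies that only about 10% of the culture is competent.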


Natural Competence 


Distribution 

Many bacteria are naturally able to take up DNA from 
their environment. Usually double-stranded linear 
DNA fragments are the preferred substrate. These 
cannot replicate independently and are quickly 
degraded unless they recombine with a homologous 
region of the chromosome, replacing the resident ver- 
sion of the sequence. 

Natural competence is not confined to particular 
evolutionary lineages on bacterial phylogenetic trees, 
but is sporadically distributed among lineages that 
also have many nontransformable groups. This sug- 
gests that competence may have evolved independ- 
ently many times, and/or may have often been lost 
during evolution. Even within a single species, differ- 
ent strains or isolates often differ greatly in their abil- 
ity to be transformed. 


Importance 

Transformation has been demonstrated under semi- 
natural conditions, in soil, water, and the mammalian 
hosts of bacterial pathogens. However, we have only 
indirect evidence about the frequency of DNA uptake 
in nature. DNA is relatively abundant in many envir-
onments; concentrations range from 0.2-50 µg l⁻¹ in
fresh water and sea water to 300 µg ml⁻¹ in some bodily
fluids; in each case this is a significant fraction of
the total available nutrients. Furthermore, in some 
environments DNA can be more abundant than free 
nucleotides and bases. Most of this DNA is not closely 
related to the genomes of the competent cells that 
might take it up. 


Immediate utility to bacteria 

The primary benefit that cells obtain from taking up 
DNA is likely to be the nucleotides the DNA contains 
rather than the genetic information it encodes. Because
nucleotides are expensive for cells to produce and can 
also be metabolized into energy, phosphate, and nitro- 
gen, nucleotides from DNA can provide a significant 
resource for the cell. In contrast, the new genetic 
combinations that can be produced when cells take 
up and recombine with DNA from closely related 
bacteria are more often harmful than beneficial, both 
because coadapted sets of alleles may be disrupted and 
because the new DNA may carry harmful mutations. 
Many noncompetent bacteria, including E. coli, also 
use DNA as a source of nucleotides; however, they do 
so by secreting nucleases and taking up the nucleotides 
they release. 

The short-term nutritional function of DNA 
uptake does not detract from the long-term evolution- 
ary importance of the recombination that it causes. 
Every sequenced bacterial genome contains many seg- 
ments that appear to have come into the genome from 
other bacteria, providing evidence that natural selec- 
tion has sometimes favored ancestors with recombin- 
ant genomes. In naturally competent bacteria, much 
of this recombination will have occurred by transform- 
ation. However, in the short term these beneficial 
recombinants arise so rarely that they are unlikely to 
influence the evolution of competence, or the evolu- 
tion of the other processes that can produce them. In 
certain bacteria, the abundance in the genome of an 
‘uptake signal sequence,’ whose only known function 
is in competence, indicates that uptake of homologous 
DNA has played a significant role in shaping the
genome.
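What "abundance" of an uptake signal sequence means can be made concrete by comparing the observed count of a motif with the count expected by chance. The minimal Python sketch below counts a motif on both strands and computes the random expectation 2(L − k + 1)/4^k for a sequence of length L with equal base frequencies and a non-palindromic k-mer. As an example motif it uses AAGTGCGGT, the 9 bp core commonly cited for the Haemophilus influenzae uptake signal sequence; treat that choice as an assumption here. The demonstration sequence is random, not a real genome.

```python
import random

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence written in A, C, G, T."""
    return seq.translate(COMPLEMENT)[::-1]


def count_both_strands(genome: str, motif: str) -> int:
    """Count (possibly overlapping) occurrences of a motif on either strand."""
    k = len(motif)
    targets = {motif, reverse_complement(motif)}
    return sum(genome[i:i + k] in targets for i in range(len(genome) - k + 1))


def expected_by_chance(length: int, k: int) -> float:
    """Expected count of one non-palindromic k-mer on both strands, equal base frequencies."""
    return 2 * (length - k + 1) / 4 ** k


USS = "AAGTGCGGT"  # assumed example motif (H. influenzae uptake signal core)
rng = random.Random(1)
genome = "".join(rng.choice("ACGT") for _ in range(200_000))
print(count_both_strands(genome, USS), round(expected_by_chance(len(genome), len(USS)), 2))
```

A 9 bp motif is expected only about once per 130 kb of random sequence (counting both strands). A genome shaped by selection for DNA uptake would therefore show counts far above this baseline.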


Population structure and evolution 

In some Bacillus, Helicobacter, and Neisseria species, 
transformation with closely related DNA is frequent 
enough that the population structure is fully mixed
(panmictic), with linkages between alleles of different
genes randomized by recombination rather than being 
inherited only clonally. This population structure 
resembles that of sexually reproducing eukaryotes, 
even though the actual amount of recombination 
is much less. In other naturally competent bacteria, 
the clonal population structure dominates, although 
genome sequences show that some genes have been 
transferred from one lineage to another of the same 
or closely related ‘species.’ This clonality may exist 
because DNA uptake is rare in nature, because homo- 
logous DNA is not commonly available, or because 
most of the available homologous DNA comes from 
sibling cells derived from the same clone. 
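The contrast between mixed and clonal structure depends on how quickly recombination breaks up associations between alleles. As a rough illustration, the sketch below uses the classical two-locus result for a randomly mating population, in which linkage disequilibrium D shrinks by a factor (1 − r) each generation, so D_t = D_0(1 − r)^t. Treating a per-generation probability r of recombination between two genes as a stand-in for transformation is an assumption made only for illustration. Bacterial gene transfer is not reciprocal and does not follow this model exactly. The function names and values of r are hypothetical.

```python
import math

def ld_after(d0: float, r: float, generations: int) -> float:
    """Linkage disequilibrium left after `generations`, assuming D shrinks
    by a factor (1 - r) per generation."""
    return d0 * (1.0 - r) ** generations

def generations_to_fraction(r: float, fraction: float) -> int:
    """Smallest whole number of generations for D to fall to `fraction` of D0."""
    # Solve (1 - r)^t <= fraction for the smallest integer t.
    return math.ceil(math.log(fraction) / math.log(1.0 - r))

# Hypothetical per-generation recombination probabilities between two genes
for r in (0.5, 0.01, 0.0001):
    print(f"r = {r}: D halves after {generations_to_fraction(r, 0.5)} generations")
```

Under this model, even rare recombination eventually randomizes linkage. Clonal structure persists only when recombination between genuinely different lineages is rarer still, for example because most DNA taken up comes from clonal siblings.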


In the laboratory 

Transformation by natural competence provides an 
extremely convenient method of strain construction 
in the laboratory, because efficiencies are often much 
higher than those seen when the same cells are made 
artificially competent, and because the sequences pre- 
ferentially recombine with the chromosome. Natural 
transformation with simple plasmids (lacking chromo- 
somal sequences) can also be reasonably efficient. 
However, plasmids carrying chromosomal sequences 
often recombine with the chromosome when they 
enter the cell; this may be a benefit or an inconveni- 
ence, depending on the desired outcome, and can be 
prevented by using a recombination-deficient host. 


Mechanisms of DNA Uptake and Recombination

The mechanisms and regulation of natural compe- 
tence have been well characterized in only a few 
groups, primarily the gram-negative genera Haemo- 
philus and Neisseria and the gram-positive Bacillus 
and Streptococcus. In both gram-negative and gram- 
positive bacteria, specific proteins on the competent 
cell surface bind double-stranded DNA and pass it to 
the DNA-translocation machinery in the inner (cyto- 
plasmic) membrane, which then threads one or both 
strands of the DNA across this membrane into the 
cell. One strand of the DNA is usually degraded; this 
may occur before, during, or after entry of the other 
strand into the cytoplasm. The proteins involved in 
the initial steps are different in different bacteria, but 
at least some components of the membrane-threading 
machinery are homologous. 

The topological problems associated with DNA 
uptake have received little attention. DNA molecules 
are very big (a 10-kb molecule is as long as a bacterial 
cell), and double-stranded DNA is not very flexible 
(persistence length ~50 nm). DNA also carries a
strong negative charge that may repel the cell surface 
and that will resist passage through hydrophobic 
membranes. In principle, linear DNAs may be
threaded through relatively small pores, starting at 
one end, but this may require a protracted search for 
the end of a long fragment. Some bacteria (e.g., Bacil- 
lus subtilis) solve this problem with a cell-surface 
nuclease that cuts long fragments to create new ends. 
However, other bacteria do not cut DNAs, and the 
initial steps of uptake do not require a free end, so 
kinking or other solutions must be involved. 
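The size comparison above can be checked with simple arithmetic. The sketch assumes B-form DNA with a rise of about 0.34 nm per base pair and takes about 2 µm as a representative length for a rod-shaped bacterium such as E. coli. Both numbers are assumptions supplied here, not figures from the text. The 50-nm persistence length is the value quoted above.

```python
RISE_NM_PER_BP = 0.34          # assumed B-form rise per base pair
PERSISTENCE_LENGTH_NM = 50.0   # value quoted in the text
CELL_LENGTH_UM = 2.0           # assumed length of a typical rod-shaped cell

def contour_length_um(n_bp: int) -> float:
    """Length of fully extended double-stranded DNA, in micrometres."""
    return n_bp * RISE_NM_PER_BP / 1000.0

def persistence_lengths(n_bp: int) -> float:
    """Number of persistence lengths spanned (a rough measure of stiffness)."""
    return n_bp * RISE_NM_PER_BP / PERSISTENCE_LENGTH_NM

n = 10_000  # the 10-kb fragment discussed in the text
print(f"{contour_length_um(n):.1f} um of DNA vs a ~{CELL_LENGTH_UM:.0f} um cell")
print(f"about {persistence_lengths(n):.0f} persistence lengths")
```

A 10-kb fragment is about 3.4 µm long, the same order as the cell itself. It spans dozens of persistence lengths, so it behaves as a loose coil on the scale of the whole fragment while remaining stiff over stretches of about 150 bp.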

DNA strands that have no homology to the 
chromosome are rapidly degraded to nucleotides by 
cytoplasmic nucleases. The fate of homologous 
sequences is determined by the balance between 
these degradative nucleases and the cell’s machinery 
for recombinational repair, which carries out a hom- 
ology search with any single strands it encounters. 
This search can be very efficient, and up to 50% of 
incoming DNA may recombine with homologous 
sequences in the chromosome. Even large insertions 
and deletions can be readily recombined into the 
chromosome if they are flanked by sequences homo- 
logous to the chromosome. 


Regulation of Competence 

Competence develops most commonly under condi- 
tions of growth downshift or nutritional stress, but the 
genes, signals, and mechanisms of regulation differ 
widely. Many regulatory genes have been identified, 
but few appear to be specific to competence (i.e., 
most also control other cellular functions). Laboratory 
cultures of Neisseria gonorrhoeae are competent at all 
stages of growth. Some cells in Haemophilus influen- 
zae cultures become competent at the end of expo- 
nential growth, and the entire culture becomes 
competent after an abrupt transfer to a starvation 
medium. B. subtilis uses secreted factors and a com- 
plex network of nutritional signals to coregulate 
competence induction with other ‘postexponential’ 
cellular processes at the onset of stationary phase. 
Streptococcus pneumoniae becomes competent in 
response to a secreted factor as culture density in- 
creases during exponential growth. There is little evi- 
dence that different bacteria use common factors to 
regulate competence, suggesting that regulation has 
evolved independently in response to different selec- 
tive pressures or environments. 


Specificity of DNA Uptake 

Although competent cells of gram-positive bacteria 
will bind and take up all double-stranded DNAs 
equally well, several competent gram-negative bac- 
teria (H. influenzae, N. gonorrhoeae, and N. meningi- 
tidis) efficiently bind only DNAs from their own or a 
closely related species. This specificity is due to pre- 
ferential binding of DNA fragments containing a 9- to 
10-bp sequence, the uptake signal sequence (USS),
which is very abundant in each organism’s genome 
(approximately one copy per 1000 bp). The USSs are 
not species-specific; the two Neisseria species share 
a common USS, and the H. influenzae USS is also 
abundant in other members of the family Pasteurella- 
ceae. The evolutionary role of USSs is not understood. 
Several other naturally competent bacteria are known 
to preferentially take up conspecific DNA, but no 
USS has been identified; these include Helicobacter 
pylori, Campylobacter jejuni, Pseudomonas stutzeri, 
and Azotobacter vinelandii.
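The abundance of USSs can be put in perspective by comparing it with chance expectation. The sketch below counts occurrences of a motif on both strands of a sequence. It also computes the mean spacing expected between chance matches in a random sequence with equal base frequencies, which is a simplification. The 9-bp core AAGTGCGGT is supplied here as an assumption (it is the commonly cited H. influenzae USS). The toy sequence and function names are hypothetical.

```python
def reverse_complement(seq: str) -> str:
    comp = {"A": "T", "T": "A", "G": "C", "C": "G"}
    return "".join(comp[b] for b in reversed(seq))

def count_both_strands(genome: str, motif: str) -> int:
    """Count (possibly overlapping) matches of `motif` on either strand."""
    rc = reverse_complement(motif)
    hits = 0
    for i in range(len(genome) - len(motif) + 1):
        window = genome[i:i + len(motif)]
        if window == motif or window == rc:
            hits += 1
    return hits

def expected_spacing_bp(motif_len: int) -> float:
    """Mean bp between chance matches on either strand, equal base frequencies."""
    return 4 ** motif_len / 2

USS = "AAGTGCGGT"  # assumed H. influenzae USS core
toy = USS + "CCCC" + reverse_complement(USS)  # one copy in each orientation
print(count_both_strands(toy, USS), expected_spacing_bp(len(USS)))
```

By chance, a 9-bp sequence would be expected only about once every 131 kb. One copy per ~1000 bp is therefore roughly a hundredfold overrepresentation, and this is why USS abundance is taken as evidence of selection.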


Artificial Competence 


Transformation of laboratory cultures with plasmids 
is an essential tool for molecular biology, and the 
availability of a reliable laboratory procedure for 
transformation may determine whether a particular 
bacterial species or strain is suitable for genetic analy- 
sis. Cells made competent by these procedures are 
usually transformed with self-replicating plasmids as 
the DNA taken up by such cells does not usually 
recombine with the chromosome even if sequences 
are homologous. 


Chemical Competence 

In the standard procedure for preparing competent 
E. coli cells, exponentially growing cells are incubated 
in a cold solution of calcium chloride and then exposed
to circular double-stranded plasmid DNA molecules 
which bind to the cell surface. A brief heat shock then 
allows the DNA to enter the cells. Transformation 
efficiencies are usually between 10° and 10° trans- 
formants per microgram of plasmid DNA. Higher 
efficiencies can be obtained with more complex pro- 
tocols, using other divalent cations such as rubidium. 
Simpler protocols are also available, for example
transferring the cells to a medium containing the solvent
dimethyl sulfoxide (DMSO). Similar procedures have
been developed for many other bacterial species. Most 
artificially competent cells can be stored frozen, and 
frozen competent cells of standard E. coli strains are 
commercially available and very convenient. 
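Transformation efficiencies such as those quoted above are calculated from colony counts on selective plates. The sketch below shows the arithmetic. The amounts, volumes, colony count and function name are hypothetical.

```python
def transformants_per_ug(colonies: int, dna_ng: float,
                         recovery_volume_ul: float, plated_volume_ul: float,
                         dilution_factor: float = 1.0) -> float:
    """Transformants per microgram of plasmid DNA.

    colonies: colonies counted on the selective plate
    dna_ng: plasmid DNA added to the cells (ng)
    recovery_volume_ul: total volume after recovery in growth medium
    plated_volume_ul: volume of (possibly diluted) culture spread on the plate
    dilution_factor: fold dilution applied before plating
    """
    total = colonies * dilution_factor * recovery_volume_ul / plated_volume_ul
    return total / (dna_ng / 1000.0)  # convert ng to micrograms

# Hypothetical: 1 ng plasmid, 1 ml recovery, 100 ul plated, 150 colonies
print(f"{transformants_per_ug(150, 1.0, 1000.0, 100.0):.1e} per microgram")
```

In this example, 150 colonies from a tenth of the culture means 1500 transformants from 1 ng. This is equivalent to 1.5 × 10⁶ transformants per microgram.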


Electroporation 

Transformation by electroporation uses very brief 
exposure to a very high electric field to create transient 
small openings in the cell membranes, through which 
DNA may enter. A dedicated power supply and spe- 
cial cuvettes are needed, and conditions must be care- 
fully controlled to prevent killing the cells. Under 
optimal conditions, transformation efficiencies of 
10⁹–10¹⁰ transformants per microgram of plasmid
DNA are obtained. Electroporation procedures have 
been developed for many bacterial species and are
especially valuable where chemical competence is not
an option. Frozen ‘electrocompetent’ E. coli cells are 
commercially available. 
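Two quantities are usually quoted for an electroporation pulse. The first is the field strength, which is the voltage divided by the cuvette gap. The second is the time constant of the exponentially decaying pulse, which is the resistance times the capacitance. The settings in the sketch (1.8 kV, 0.1-cm cuvette, 25 µF, 200 Ω) are typical values for E. coli and are supplied here as assumptions. In practice, the measured time constant is somewhat lower because the sample also conducts in parallel with the instrument's resistor, which this sketch ignores.

```python
import math

def field_strength_kv_per_cm(voltage_kv: float, gap_cm: float) -> float:
    return voltage_kv / gap_cm

def time_constant_ms(resistance_ohm: float, capacitance_uf: float) -> float:
    # tau = R * C; ohms x farads gives seconds, converted here to milliseconds
    return resistance_ohm * capacitance_uf * 1e-6 * 1e3

def voltage_at(t_ms: float, v0_kv: float, tau_ms: float) -> float:
    """Exponentially decaying pulse: V(t) = V0 * exp(-t / tau)."""
    return v0_kv * math.exp(-t_ms / tau_ms)

tau = time_constant_ms(200.0, 25.0)  # assumed settings
print(f"{field_strength_kv_per_cm(1.8, 0.1):.0f} kV/cm, tau = {tau:.1f} ms")
```

The high field opens the membrane only briefly. Conditions, especially salt in the DNA sample, must therefore be controlled, because excess conductivity lowers the resistance and can cause arcing that kills the cells.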
Genetic recombination has two major forms. Homo-
logous recombination involves exchange between two 
DNA molecules that have extensive homology, and is 
catalyzed by proteins that can function anywhere 
along the homologous regions. Site-specific recombin- 
ation, on the other hand, involves exchange between 
two DNA molecules that have little or no homology, 
and is carried out by specialized proteins that act only 
at those particular sites. Both forms of recombin- 
ation are central to the life cycles of bacteriophages, 
although their importance varies greatly between 
different phage species. 

This article begins with a brief historical perspective 
that highlights a few of the most important early devel- 
opments in bacteriophage recombination, followed by 
a more in-depth overview of some of the key roles of
homologous recombination in bacteriophage, and 
closes with a brief summary of site-specific
recombination. The emphasis will be on a broad overview of the
biology of phage recombination, and the reader is 
encouraged to explore the fascinating mechanisms 
underlying this biology in numerous other review 
articles and in the primary literature. 


History 


The recombination of bacteriophage genetic markers 
was first observed in the 1940s, a few years before the
nucleic acids were shown to be the genetic material of
bacteriophage. The propensity of bacteriophage to 
recombine and the ease of measuring this recombin- 
ation were crucial during the ‘explosive’ period when 
studies of bacteriophage largely created the new field 
of molecular biology, as portrayed in the excellent 
book Phage and the Origins of Molecular Biology 
(Cairns et al., 1992). 

One important role of homologous recombination 
in phage life cycles began to emerge in 1947 when 
Salvador Luria discovered the phenomenon of ‘multi- 
plicity reactivation’ in the T-even phages T2, T4, and 
T6 (Luria, 1947). In multiplicity reactivation, heavily 
damaged (UV-treated) phage are able to mount a 
productive infection only when two or more phage 
particles coinfect the same cell. Homologous recom- 
bination pathways are somehow able to stitch to- 
gether complete phage chromosomes free of damage 
beginning with multiple chromosomes that have 
damage throughout their lengths. Consistent with the 
importance of recombination in DNA repair, phage 
T4 mutants deficient in particular recombination 
proteins were later isolated and found to be hypersen- 
sitive to DNA-damaging agents. 

Early studies on the structure of T-even phage gen- 
omes also highlighted another key role of homologous 
recombination in phage life cycles. The packaging of 
T-even phage genomes into the protein capsid was 
found to occur by a ‘headful packaging’ mechanism, 
in which DNA is stuffed into a preformed head
structure until the head is full (Streisinger et al., 1967).
Homologous recombination was found to be essential 
in generating the DNA precursor for packaging, 
namely a very long ‘concatemer’ in which multiple
phage genomes are linked one after the other. We
will now turn to a detailed description of the
importance of concatemeric DNA in phage life cycles.


Homologous Recombination 


Production of Concatemeric Phage DNA 

A critical issue in the replication of DNA is the ‘end 
problem,’ which refers to the difficulty of replicating 
a DNA end. This end problem arises from three facts: 
(1) the two chains of a DNA duplex are antiparallel; 
(2) DNA polymerases can only synthesize DNA 
in the 5’ to 3’ direction; and (3) DNA polymerases 
require a pre-existing primer (usually a short RNA 
chain) to begin synthesis. Thus, the two 3’ ends of a 
linear DNA molecule cannot easily be used as tem- 
plates, and these 3’ ends in the daughter molecules 
remain single-stranded (ss). A linear DNA duplex 
would thereby become shorter and shorter in succes- 
sive replication cycles unless some mechanism is 
invoked to avoid the end problem. Most bacterial 
chromosomes avoid the end problem by being cir-
cular, while linear eukaryotic chromosomes have 
special end sequences, telomeres, which allow a 
novel mode of replication to solve the end problem. 
Bacteriophages have adopted a variety of strategies 
to deal with the end problem, and these strategies 
are reflected in the diverse genome structures of dif- 
ferent phages. Phages such as phi X174 and PM2 have 
adopted the same simple strategy as bacterial cells, 
namely the use of a circular genome. Others, like 
phi 29, have adopted another simple strategy, using a 
terminal protein that serves as a primer for DNA 
polymerase at the very ends of the genome. For most 
phages, however, the solution to the end problem 
involves homologous recombination in one way or 
another. 
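A deliberately simplified model makes the shortening caused by the end problem concrete. The sketch makes three assumptions. It represents each strand as an interval with an orientation. It assumes that every newly made strand lacks a fixed number of nucleotides (the hypothetical parameter `gap`) at its 5’ end, where the terminal primer was removed and not replaced. It also ignores every end-maintenance mechanism. Under these assumptions, each copying step shortens a strand by `gap`, so after n rounds the shortest strand is n × `gap` nucleotides shorter than the original.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Strand:
    start: int           # leftmost coordinate covered (inclusive)
    end: int             # rightmost coordinate covered (exclusive)
    left_to_right: bool  # True if the strand runs 5'->3' from left to right

    @property
    def length(self) -> int:
        return self.end - self.start

def copy_strand(template: Strand, gap: int) -> Strand:
    """Make the complementary, antiparallel strand.

    The new strand's 5' end lies opposite the template's 3' end; the primer
    there is removed and not replaced, leaving `gap` nt of the template unpaired.
    """
    if template.left_to_right:  # template 3' end is on the right
        return Strand(template.start, template.end - gap, left_to_right=False)
    return Strand(template.start + gap, template.end, left_to_right=True)

def replicate(duplexes, gap):
    """Semiconservative replication: each parental strand gets a new partner."""
    out = []
    for top, bottom in duplexes:
        out.append((top, copy_strand(top, gap)))
        out.append((copy_strand(bottom, gap), bottom))
    return out

L, GAP, ROUNDS = 1000, 10, 3  # hypothetical sizes
population = [(Strand(0, L, True), Strand(0, L, False))]
for _ in range(ROUNDS):
    population = replicate(population, GAP)
shortest = min(s.length for duplex in population for s in duplex)
print(len(population), shortest)  # 8 970
```

In each daughter duplex, the unpaired stretch of template is the single-stranded 3’ end described in the text.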

As mentioned above, the T-even phages use homo- 
logous recombination to generate the concatemeric 
DNA that is a precursor to headful packaging. The 
pathways for generating these long DNA concatemers 
are quite complex, and indeed are still under intensive 
study. To understand the process, we need to consider 
first the structure of DNA within the phage head. 
Each T4 particle contains one linear duplex of phage 
DNA. Even though the nonredundant T4 genetic 
material consists of 169 003 bp, each packaged DNA
is several kilobases longer. The explanation is that the 
DNA sequence at one end of the packaged DNA is 
repeated at the other end, so-called ‘terminal redun- 
dancy.’ Using simple letters instead of more complex 
gene names, a given phage DNA molecule might have 
the structure ABCD... YZAB, with segment AB as 
the terminal redundancy. A second complexity of the 
genome structure is called ‘circular permutation,’ 
which refers to the fact that different ends are found 
in different phage DNA molecules. Thus, a series of 
five phage DNA molecules (with terminal redundancy)
could be represented as: ABCD...YZAB; BCDE...ZABC;
CDEF...ABCD; DEFG...BCDE; EFGH...CDEF. This is an
oversimplification: the precise ends of the genome
can lie within genes.
Indeed, every one of the 169 003 bp of the T4 genome 
may be an end within a subset of the packaged DNA 
molecules. 

During the course of an infection, the infecting 
linear DNA of phage T4 is replicated into very long, 
branched concatemers containing hundreds of phage 
genome equivalents (i.e., tens of millions of base pairs). 
Immediately after infection, T4 replicates from internally
located replication origins to duplicate the genome
once or a few times (Figure 1). As described
above, the 3’ ends of the parental DNA cannot be 
completely replicated, resulting in ss ends that are 
recombinogenic. During infection by a single phage 
particle, the two daughter molecules from this early 
replication undergo homologous recombination, with
the right end of one daughter interacting with the left 
end of itself or the other daughter molecule. In a 
process that will be described more fully below, the 
recombination event creates a new replication fork 
that travels down the length of the chromosome 
(Figure 1). Notice that the DNA becomes longer 
than the infecting molecule as replication proceeds; 
repeated rounds of this recombination-dependent 
replication (RDR) generate the very large concatemers 
that are found later in the infection. During infection 
with multiple phage particles, the ends of one 
molecule will also be homologous to the middles of 
others, raising additional possibilities for generating 
branched concatemeric DNA, and creating genetically
recognizable recombinant phage in the process
(see Figure 1). 

The packaging machinery of T4 recognizes the long 
concatemeric products of RDR and begins the process
of packaging DNA into the new phage particles. 
Packaging into each head is terminated, by endonu- 
cleolytic cleavage, when the head becomes full. Since 
the phage head is large enough to hold slightly more 
than one genome length, the packaged DNA is ter- 
minally redundant. The packaging machinery uses a 
single concatemer to fill many heads in succession
with DNA. Each successive head will have a different 
end sequence (since more than one genome length 
is packaged), and this causes the circular permutation 
observed in packaged phage DNA. 
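Headful packaging from a concatemer can be simulated on a toy genome written as letters, as in the text. This also shows how terminal redundancy and circular permutation arise. The 26-letter genome, the head capacity (one genome plus two letters) and the function name are assumptions made for illustration.

```python
def headful_package(genome: str, copies: int, head_capacity: int, n_heads: int):
    """Cut successive headfuls from a concatemer of `copies` tandem genomes.

    Each head is filled from the current position until it holds
    `head_capacity` units; the next head starts where the last cut was made.
    """
    concatemer = genome * copies
    start = 0
    packaged = []
    for _ in range(n_heads):
        end = start + head_capacity
        if end > len(concatemer):
            break  # not enough DNA left to fill another head
        packaged.append(concatemer[start:end])
        start = end
    return packaged

genome = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"  # one genome equivalent
heads = headful_package(genome, copies=10, head_capacity=len(genome) + 2, n_heads=5)
for h in heads:
    print(h[:4] + "..." + h[-4:])
```

The output begins ABCD...YZAB, CDEF...ABCD, EFGH...CDEF. Every molecule repeats its first two letters at its end (terminal redundancy), and each starts two letters further along than the previous one (circular permutation). The shift equals the amount of redundancy.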

Phage T7 particles contain a linear duplex DNA of 
39 936 bp, with terminal repeats of the same 160-bp 
sequence at the two ends of the viral genome. Thus, T7 
DNA is terminally redundant but has unique ends 
(i.e., it is not circularly permuted). After infection of 
the bacterial host, T7 DNA replicates from an internal 
origin, generating multiple copies with ss 3’ ends. 
These ss genome ends engage in a ‘nonconservative’ 
form of homologous recombination (Figure 2). 
Because the ss region on the right end of one genome 
is complementary to the ss region on the left end of 
another, simple annealing can join the two ends and 
thereby generate a dimeric phage chromosome. This 
recombination event is nonconservative because it 
results in the loss of one strand of DNA from each 
repeat, which translates into the loss of one complete 
copy of the 160-bp repeat. Additional rounds of end 
recombination generate longer and longer concat- 
emers, which are then used as the substrate for DNA 
packaging. T7 does not use the headful packaging 
mechanism described above. Instead, T7 enzymes 
recognize the end sequences, even though they are 
embedded within the concatemeric DNA. A complex 
series of steps then duplicates each 160-bp repeat and 
packages the remainder of the viral genome, located
between repeats, into a phage particle. In this case, the
special replication reaction regenerates the strands
that were lost during the original recombination
event, in essence restoring the lost copy of the
160-bp repeat so that all DNA in the concatemer can
be productively converted into genomes with a full
repeat at each end.

Figure 1 DNA replication and recombination of bacteriophage
T4, presented schematically. (A) A single round of
replication from an origin, with each DNA duplex
indicated by a single line. DNA replication results in ss
regions at the 3’ ends from the parental DNA (see also
Figures 2 and 3), which activate recombination of the
ends. Because of the terminal redundancy, the end of
one daughter molecule is homologous to the opposite
end of itself or of the other daughter molecule. The
recombination event that ensues initiates a new
replication fork, by a mechanism that will be described
below. (B) A similar event during a coinfection, with the
end of one phage DNA molecule homologous to the
middle of another.
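In the T7 scheme, each end-annealing event removes one copy of the terminal repeat. The length of a concatemer built from n genomes therefore follows directly: n × 39 936 − (n − 1) × 160 bp. The sketch checks this by joining toy genomes that carry short repeats. The toy sequences and helper names are hypothetical, and the nucleolytic trimming and repair steps are ignored.

```python
def anneal_join(left: str, right: str, repeat_len: int) -> str:
    """Join two terminally redundant molecules through their shared repeat.

    The right-end repeat of `left` and the left-end repeat of `right` are the
    same sequence; annealing keeps only one copy of it.
    """
    if left[-repeat_len:] != right[:repeat_len]:
        raise ValueError("ends are not terminally redundant")
    return left + right[repeat_len:]

def concatemer_length(genome_bp: int, repeat_bp: int, n: int) -> int:
    return n * genome_bp - (n - 1) * repeat_bp

toy = "RRR" + "abcdefg" + "RRR"  # toy genome with a 3-unit terminal repeat
concat = toy
for _ in range(3):
    concat = anneal_join(concat, toy, repeat_len=3)
print(concat)
print(len(concat) == concatemer_length(len(toy), 3, 4))
print(concatemer_length(39_936, 160, 4))  # four T7 genomes joined end to end
```

Packaging then has to restore the missing repeat copy at every junction, which is the role of the special replication reaction described above.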

Phage lambda uses yet another strategy for generating
concatemeric DNA prior to packaging, and once
again recombination proteins are involved. Lambda 
phage particles contain a linear duplex DNA with each 
strand containing 48 502 nucleotides, but the two 5’ ends of the
duplex overhang the 3’ ends by 12 bases. Since these 
two ends are complementary to each other, they are 
‘cohesive’ (cos sites) and quickly anneal with each 
other when the DNA enters the new host. Thus, the 
infecting DNA initially forms a circle, but later in 
the infection, concatemeric lambda DNA is evident 
(and required for packaging). The concatemeric DNA 
is generated by rolling-circle replication, with one tem- 
plate circle spinning off repeated daughter molecules. 
Although the precise mechanism for establishing 
rolling-circle replication is still not elucidated, phage- 
encoded recombination proteins are involved. Pack- 
aging of lambda DNA from the concatemer involves 
staggered endonucleolytic cleavages at the cos sites, 
which regenerates the 12-base cohesive ends found in
the phage particles. 
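The complementarity of the two cohesive ends can be checked directly. The sketch assumes the commonly cited 12-nt 5’ overhang of the lambda left end, GGGCGGCGACCT. The right end’s 5’ overhang, written 5’ to 3’, is then its reverse complement. The helper names are hypothetical.

```python
def reverse_complement(seq: str) -> str:
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]

def can_anneal(overhang_a: str, overhang_b: str) -> bool:
    """Two 5' overhangs (each written 5'->3') pair if one is the
    reverse complement of the other."""
    return overhang_a == reverse_complement(overhang_b)

COS_LEFT = "GGGCGGCGACCT"                 # assumed lambda left-end overhang
COS_RIGHT = reverse_complement(COS_LEFT)  # right-end overhang, 5'->3'
print(COS_RIGHT, len(COS_LEFT), can_anneal(COS_LEFT, COS_RIGHT))
```

The same check explains why the staggered cleavage at cos sites during packaging regenerates ends that can reanneal when the DNA enters the next host.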


Recombination-Dependent Replication 

As mentioned above, the process of RDR is important 
in the generation of concatemeric DNA in the T-even 
phages (and perhaps many others). RDR is also critical
for producing large amounts of phage DNA. Phage
T4 mutants deficient in any of several different
recombination proteins display a severe defect in DNA
synthesis. Indeed, most T4 DNA replication requires
the phage-encoded recombination proteins.

Figure 2 End annealing in bacteriophage T7. The
infecting duplex of T7 DNA is depicted at the top, with
the heavy boxes at the ends indicating the 160-bp
terminal repeats. After DNA replication from an
internal origin, one 3’ end of each daughter molecule
is exposed in ss form. Because the two ends are
terminally redundant, these two ss regions are
complementary and can anneal to form a duplex. This
recombination reaction may require additional
nucleolytic trimming and/or repair replication, which are not
shown here.

Until recently, RDR was thought to be an odd 
quirk of a complicated phage life cycle. However, the 
process has now turned up in many unexpected situ- 
ations throughout the biological world. First, the 
recombinational repair of double-strand breaks in 
both prokaryotes and eukaryotes very often involves 
either localized or extensive DNA replication. Sec- 
ond, as elucidated with studies in Escherichia coli, 
replication forks that were initiated at a bona fide 
origin of replication often run into difficulties and 
cannot complete replication of the intended genome 
segment. In these cases, a process of RDR creates a 
new replication fork to complete the job. Third, even 
the replication of mammalian telomeres may use RDR 
at certain times, for example in telomerase-deficient 
tumor cells. These and other fascinating examples of 
RDR are reviewed in a recent collection published in 
Trends in Biochemical Sciences (see Kowalczykowski
and von Hippel, 2000).

The phage T4 system provides an excellent model 
in which to study RDR because it occurs at such a high
frequency in every infection. Genetic analyses over 
the years identified the phage-encoded recombination 
and replication proteins that are involved, and also 
characterized the precise nature of the recombinants 
that are produced during the process. Studies of phage 
DNA replication in vivo clearly showed that DNA 
ends can trigger the process, as expected from the 
general model depicted in Figure 1. Nearly all of
the involved proteins have been purified to homo- 
geneity, and biochemical studies have elucidated 
their precise biochemical functions and, in some 
cases, their three-dimensional structures. Further- 
more, a T4 RDR reaction was reconstituted with 
these purified proteins. 

In vivo, T4 RDR probably consists of a family of 
closely related pathways, but the mechanism depicted 
in Figure 3 conveys the major features and is probably 
the predominant pathway. In a step that is not 
depicted in Figure 3, an ss region is generated next 
to a DNA end, either by incomplete DNA replication 
or by exonucleolytic degradation. In the first step 
shown in Figure 3, the ss end invades homologous
DNA in a strand-invasion reaction catalyzed by 
phage-encoded recombination proteins, most notably 
the strand-exchange protein UvsX. The ‘D-loop’ that 
is formed then becomes the site of assembly of a new 
replication complex. The phage-encoded DNA poly- 
merase holoenzyme binds to and extends the invading 
3’ end to create a new leading strand of replication. 
In addition, another phage protein binds to the D-
loop and then loads the replicative helicase and pri- 
mase onto the displaced DNA strand, allowing Oka- 
zaki fragment synthesis on the lagging strand. 


Recombinational Repair 

The general process of recombinational repair was first 
inferred from the phenomenon of multiplicity reacti- 
vation, in which two or more heavily damaged phage 
genomes reconstitute a viable phage genome by
recombinational processes. However, recombinational repair
in phages goes well beyond multiplicity reactivation, 
which is simply a convenient way to study the process.
For example, recombination-deficient mutants of
many phages are hypersensitive to DNA-damaging
agents that are introduced during an infection.

Figure 3 Mechanism of recombination-dependent
replication in bacteriophage T4. The ss 3’ end at the
top is created by a previous round of DNA replication.
Invasion of the single-stranded region into a homologous
duplex creates a D-loop, which becomes the site of
assembly of a new replication complex. The continuous
leading strand is shown as a solid thick line and the
discontinuous lagging strand as a dashed thick line.

Many phages, and probably all cells, use
recombination to repair and circumvent DNA damage.
Indeed, one school of thought argues that the selective 
pressure for the evolution of recombination was the 
repair of DNA damage and, in this view, the exchange 
of genetic information was a fortunate byproduct of the 
process. Recombinational repair is a highly sophisti- 
cated process, rather than a set of random recombina- 
tion reactions that patch together two bad DNA 
molecules to make one good one. Clearly, the DNA 
damage itself often initiates the recombination reac- 
tion, since damaging agents generally stimulate recom- 
bination. Remarkably, damaged regions appear to be 
excluded from the progeny molecules during recom- 
binational repair. For example, in the process of multi- 
plicity reactivation, viable progeny are generated at a 
much higher frequency than expected from random 
recombination reactions. Two phage DNA molecules, 
each with multiple lethal lesions, can generate a viable 
phage in nearly every infection. Furthermore, the 
presence of several damaged phage DNA molecules 
does not jeopardize the fate of a single undamaged 
molecule. 
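A toy calculation shows why the survival observed in multiplicity reactivation points to more than random recombination. The sketch assumes three things: lethal lesions are scattered at random (Poisson) with a mean of m per genome, the genome is divided into k equal segments, and the parameter values are illustrative. Suppose a progeny genome takes each segment at random from one of two damaged parents. Its chance of being lesion-free is then still e^(−m), the same as for a single damaged phage. If instead each segment could come from whichever parent is undamaged there, survival is far higher. The model is an illustrative assumption, not a description of the actual repair pathways.

```python
import math

def p_random(m: float, k: int) -> float:
    """Each segment is taken from a randomly chosen parent."""
    return math.exp(-m / k) ** k

def p_selective(m: float, k: int) -> float:
    """Each segment is taken from an undamaged parent whenever one exists."""
    clean = math.exp(-m / k)  # P(a given parent's segment is lesion-free)
    return (1.0 - (1.0 - clean) ** 2) ** k

m, k = 4.0, 20  # hypothetical: 4 lethal lesions per genome, 20 segments
print(f"single phage:               {math.exp(-m):.3f}")
print(f"two parents, random choice: {p_random(m, k):.3f}")
print(f"two parents, damage avoided: {p_selective(m, k):.3f}")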

Pathways of recombinational repair are quite com- 
plex and varied, and many are discussed in the collec- 
tion of reviews in Kowalczykowski and von Hippel 
(2000). The basic process of T4 RDR, described 
above, probably forms the platform for most or all 
recombinational repair in phage T4. In phage lambda, 
both host-encoded and phage-encoded recombination 
proteins can participate in the repair of damaged phage 
chromosomes. 


Hotspots for Genetic Recombination 

The frequency of homologous recombination is not 
constant across the genome. In all phages that have 
been studied in sufficient detail, recombination is 
more frequent in certain regions called recombination 
hotspots. Recombination hotspots provide a very 
useful avenue to analyze the mechanisms of recom- 
bination, because they focus recombination into one 
region and because they reflect sites or structures in 
the DNA molecule that trigger the process. 

Origins of DNA replication cause recombination
hotspots in several different phages, including
phi X174, T7, and T4. In the case of phi X174, the
replication origin is recognized by a phage-encoded
replication initiation protein, which nicks one strand to begin
replication. The site-specific nick, or some derivative 
thereof, is presumably the signal that triggers inflated 
recombination. In the cases of T7 and T4, the recom- 
bination hotspots were measured by a procedure that 
involves damaging the DNA molecule that is donating
the genetic material. The simplest model to explain
these hotspots is that the replication origin triggers 
new replication forks, which are then blocked when 
they reach nearby damage. The increase in DNA copy 
number due to the localized replication presumably 
contributes to the inflated recombination. In addition, 
the blocked fork itself is probably a recombinogenic 
structure. As discussed above, DNA damage stimu- 
lates recombination, and this stimulation may require 
or be enhanced by the arrival of a replication fork. 

E. coli infected with phage lambda has provided one 
of the best ‘laboratories’ for studying the mechanisms 
of recombination (see Thaler and Stahl, 1988). Two im- 
portant kinds of hotspots emerged from these studies. 
First, when lambda recombination is catalyzed by the 
phage-encoded ‘Red’ proteins, double-strand breaks 
or ends reveal themselves as strong recombination hot- 
spots. The cos sites which are cleaved during packaging 
to generate the mature virion DNA were found to be re- 
combination hotspots, implying that some cos cleavage 
occurs prior to DNA packaging. In addition, sites for 
restriction enzymes can be shown to be recombination 
hotspots when the bacterial cells express the restriction 
enzyme and the lambda DNA is not protected from 
cleavage by the corresponding methylase enzyme. 

The second kind of recombination hotspot discovered
in phage lambda infections is the χ site, a short
DNA sequence that is recognized by the RecBCD en- 
zyme of the bacterial host. This hotspot is only active 
when recombination is catalyzed by the host machi- 
nery, including RecBCD (a nuclease and helicase) and 
RecA (strand-exchange protein). Numerous copies of 
the χ site are found around the bacterial chromosome
and play important roles in bacterial recombination. 
Largely because of studies with the χ site, we now
understand the broad outlines of recombination 
promoted by RecBCD and RecA proteins. The 
RecBCD protein binds to double-strand ends and 
begins to degrade the DNA, working progressively 
inwards. When the enzyme encounters a χ site, it
changes from a degradative nuclease to a potent 
recombination machine, unwinding the two strands 
and loading the strand-exchange protein RecA onto 
the ss product of the unwinding reaction. The RecA 
protein then catalyzes a strand-invasion reaction, 
leading ultimately to recombinant products. 
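The RecBCD behavior described above can be sketched as a scan inward from a double-strand end. The sketch makes two assumptions. It uses the E. coli χ sequence 5’-GCTGGTGG-3’, and it follows the orientation rule that RecBCD recognizes χ only when it enters from a DNA end on the 3’ side of that sequence. The input is one strand written 5’ to 3’, with RecBCD entering at its right-hand (3’) end. The function name and example sequences are hypothetical.

```python
CHI = "GCTGGTGG"  # assumed E. coli chi sequence, written 5'->3'

def bases_before_chi(strand_5to3: str):
    """Bases traversed by RecBCD, entering at the right (3') end, before it
    meets a correctly oriented chi; None if it meets no chi."""
    # Scanning leftward from the right end, the first chi reached is the
    # occurrence that lies closest to the entry end.
    pos = strand_5to3.rfind(CHI)
    if pos == -1:
        return None  # no chi in this orientation: no switch to recombination
    return len(strand_5to3) - (pos + len(CHI))

seq = "AAAA" + CHI + "TTTTTTTTTT"  # hypothetical: chi 10 bases from the end
print(bases_before_chi(seq))
```

In this sketch, a χ in the opposite orientation (CCACCAGC on the strand scanned) is ignored. This matches the strand-specific recognition that makes χ hotspots act in only one direction.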


Intron Mobility 

A fascinating genetic process was uncovered with the 
discovery of mobile introns in bacteriophage genomes 
(also found in eukaryotic organelle genomes). Like 
other introns, the RNA of mobile introns is spliced 
out of the final mRNA so that the gene into which the 
intron is inserted remains functional. The novel aspect 
of mobile introns is that they induce a particular 
homologous recombination reaction at a very high
frequency. The recombination reaction occurs at 
the level of DNA, and does not involve the intronic 
RNA. 

Even in closely related phages, the existence of 
mobile introns is inconsistent. For example, phage 
T4 has an intron within its td gene, but very closely 
related phages do not. When an intron-containing and 
an intron-free phage coinfect the same bacterial cell, 
something remarkable happens — virtually every pro- 
geny phage from the infection contains the intron 
DNA in its td gene. The intron somehow sends a 
copy of itself to the genome that does not contain 
the intron, while maintaining itself in the original 
genome. The process is extremely accurate — the 
intron always appears at exactly the same site in the 
td gene, and not elsewhere in the genome. 

The process of intron mobility was found to be 
dependent on a site-specific endonuclease, which itself 
is encoded within the intron. Remarkably, this endo- 
nuclease recognizes the intronless td gene and intro- 
duces a double-strand break very close to the site 
where the intron is normally located. DNA that 
already contains the intron is immune from cutting, 
because its sequence is different due to the presence of 
the intron. Once the intronless DNA is cleaved, it 
enters a double-strand break repair pathway. In this 
pathway, the broken DNA (genome without the 
intron) is repaired by recombination using an intact 
copy of the homologous sequence (genome with the 
intron) as template for DNA replication across 
the break. Thus, both resulting molecules contain the 
intron, and it appears as though the intron has sent a 
copy of itself to the DNA that was originally free of 
the intron. 
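The logic of intron homing can be captured in a few lines of string manipulation. Everything in the sketch is hypothetical and stands in for the real td gene and its intron-encoded endonuclease: the exon sequences, the intron, and the rule that the endonuclease cuts only the uninterrupted exon junction. Repair is modeled simply as copying the donor’s sequence between the homologous flanks across the break.

```python
EXON1, EXON2 = "GGATCCTTA", "CGAATTCAG"  # hypothetical flanking exon sequences
INTRON = "gtaagtintroncag"               # hypothetical intron (lowercase)
TARGET = EXON1 + EXON2                   # intronless site cut by the endonuclease

def endonuclease_cuts(genome: str) -> bool:
    """The enzyme cuts only the intronless junction; the intron disrupts it."""
    return TARGET in genome

def home(recipient: str, donor: str) -> str:
    """Double-strand break repair using the donor as template."""
    if not endonuclease_cuts(recipient):
        return recipient  # no break, so no conversion
    left = donor.find(EXON1)
    right = donor.find(EXON2, left + len(EXON1)) if left != -1 else -1
    if right == -1:
        return recipient  # donor lacks the homologous flanks
    insert = donor[left + len(EXON1):right]  # donor sequence between the flanks
    return recipient.replace(TARGET, EXON1 + insert + EXON2, 1)

intronless = "AAA" + TARGET + "TTT"
with_intron = "AAA" + EXON1 + INTRON + EXON2 + "TTT"
print(home(intronless, with_intron) == with_intron, endonuclease_cuts(with_intron))
```

After repair, both genomes carry the intron. The converted copy is no longer a substrate for the endonuclease, which is why the intron appears only at its own site and is never cut out again.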

The process of intron mobility raises very interest- 
ing evolutionary questions. Which came first, the 
intron endonuclease or the RNA splicing component 
of the intron? Did the DNA without the intron have 
the intron at some point in its evolutionary history but 
then lose it? Or is the site in the related phage simply 
(almost) identical to the site in T4 into which the 
intron first inserted itself at some point in the distant 
past, with the specificity of transfer reflecting the 
initial specificity of the enzyme responsible for that 
process? In addition to its evolutionary interest, the 
process of intron mobility also provides an excel- 
lent opportunity to study the detailed mechanism of 
double-strand break repair. 


Site-Specific Recombination in 
Bacteriophages 


Studies of phage genome structures led to the first 
detailed understanding of a site-specific recombination 
system. Lysogenic bacteriophages can enter into a
benign relationship with their bacterial hosts; this is
called lysogeny. For example, a lysogenic phage genome in
the chromosome of E. coli does not express any of its 
lytic genes, allowing the bacterial host to grow nor- 
mally. However, the lysogenic phage does express one 
or a few genes that alter the phenotype of the host, for 
example making the host immune to another infection 
with closely related phages. To establish the lysogenic 
state, many (but not all) species of lysogenic bacterio- 
phages integrate into the bacterial chromosome by a 
site-specific recombination reaction. Once the phage DNA is inserted,
the lysogenic bacterial cell can propagate indefinitely,
making many copies of itself and of the phage DNA 
that is harbored within the bacterial genome. Under 
certain conditions, however, the lysogenic state is ter- 
minated and the phage enters a normal lytic cycle. At 
this time, the phage DNA first excises from the bac- 
terial chromosome by a reversal of the integration 
reaction, then replicates extensively, and is finally 
packaged into new phage particles. 

Campbell (1962) provided the first model for a site- 
specific recombination reaction, the integration of 
lambda DNA into the host chromosome. In this elegant
model, a particular site on the lambda chromosome, 
attP (attachment phage), and another site on the 
bacterial chromosome, attB (attachment bacterial), are 
cleaved and then reconnected in the opposite config- 
uration to integrate the lambda DNA (Figure 4A). The 
Campbell model has now been confirmed and extended 
by reconstitution of the reaction in vitro and numerous
elegant biochemical and genetic experiments. The 
reaction is catalyzed by the phage-encoded integrase 
(or Int) protein, which makes staggered double-strand 
breaks in the att sites, rearranges the broken DNA to 
swap partners, and reseals the DNA in the recombin- 
ant configuration (Figure 4A). 
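The Campbell model can be sketched as list surgery on labeled segments. This is an illustration, not a sequence-level simulation: it assumes the conventional shorthand in which attP is written P O P′ and attB is B O B′, with O the core shared by both sites, and the gene order on the phage circle is illustrative only. A single reciprocal crossover inside O inserts the opened circle, and the same crossover between the two resulting cores reverses it.

```python
# Campbell-model sketch: genomes are lists of labeled segments, not sequences.
# "O" is the core shared by attP (P O P') and attB (B O B').

CORE = "O"


def integrate(bacterial, phage_circle, core=CORE):
    """Reciprocal crossover inside the shared core of attB and attP."""
    b = bacterial.index(core)
    p = phage_circle.index(core)
    # Open the circle at its core, so the inserted phage reads P' ... P.
    opened = phage_circle[p + 1:] + phage_circle[:p]
    # Each parent contributes one core: attL = B O P', attR = P O B'.
    return bacterial[:b + 1] + opened + [core] + bacterial[b + 1:]


def excise(chromosome, core=CORE):
    """Reverse reaction: crossover between the cores of attL and attR."""
    first = chromosome.index(core)
    second = chromosome.index(core, first + 1)
    bacterial = chromosome[:first + 1] + chromosome[second + 1:]
    phage_circle = [core] + chromosome[first + 1:second]
    return bacterial, phage_circle


def same_circle(a, b):
    """True if two lists describe the same circular molecule."""
    return len(a) == len(b) and any(a[i:] + a[:i] == b for i in range(len(a)))


bacterial = ["gal", "B", "O", "B'", "bio"]   # lambda attB lies between gal and bio
lam = ["P", "O", "P'", "int", "xis", "cI"]   # illustrative gene order

lysogen = integrate(bacterial, lam)
print(" ".join(lysogen))  # gal B O P' int xis cI P O B' bio
```

Calling `excise(lysogen)` returns the original bacterial list and a circle equivalent to `lam`, mirroring the reversal of integration during induction.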

A family of site-specific recombination reactions 
that are related to integration involves two sites in 
the same DNA molecule, and results in a genetic 
flip-flop. In this case, cleavage and religation of the 
two sites by an invertase protein flips the intervening 
DNA. One well-studied example is located within the 
genome of phage Mu (Figure 4B). The intervening 
segment of DNA encodes two genes that control the 
host specificity of the phage. In one orientation, the 
genes S and U are expressed and allow attachment 
to E. coli K-12 (and certain other hosts). When the 
DNA segment flips, the alternate genes S′ and U′ are
expressed, and phage particles with these proteins
bind instead to Citrobacter freundii, Shigella sonnei,
and certain other hosts. Thus, the genetic flip-flop
increases the host range by allowing the phage to
have two very different attachment specificities. The
invertase protein that catalyzes the recombination
event is encoded within the phage Mu genome, near
(but not within) the invertible segment. Other phages
such as P1 and P7 contain analogous systems.

Figure 4 Site-specific recombination reactions in
bacteriophages lambda and Mu. (A) The integration of
lambda DNA into the host bacterial chromosome, with
each line representing duplex DNA. The jagged breaks
within attP and attB indicate the staggered DNA breaks
that occur during the recombination reaction. (B) The
inversion of the genes responsible for host recognition
in phage Mu, with each line representing duplex DNA.
The genes are transcribed and translated from the left
flank, so that only the genes on the left border are
expressed. The DNA is redrawn in a looped configuration
to illustrate the substrate for site-specific
recombination.
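The genetic flip-flop can be sketched by treating the invertible segment as a list of genes with strand orientations. The layout `SEGMENT` below is hypothetical, and the rule that only '+' genes are expressed is a simplification of the idea that transcription enters from the left flank. Inversion reverses the order and switches every strand, so a second inversion restores the original state.

```python
# Sketch of the Mu flip-flop; (name, strand) pairs, '+' = readable from the
# promoter assumed to lie in the left flank.

FLIP = {"+": "-", "-": "+"}

# Hypothetical layout of the invertible segment in one orientation.
SEGMENT = [("S", "+"), ("U", "+"), ("U'", "-"), ("S'", "-")]

HOST_RANGE = {
    ("S", "U"): "E. coli K-12 and certain other hosts",
    ("S'", "U'"): "Citrobacter freundii, Shigella sonnei, and certain other hosts",
}


def invert(segment):
    """Site-specific inversion: order reverses and every gene switches strand."""
    return [(name, FLIP[strand]) for name, strand in reversed(segment)]


def expressed(segment):
    """Genes transcribed from the left flank: only those in '+' orientation."""
    return tuple(name for name, strand in segment if strand == "+")


for state in (SEGMENT, invert(SEGMENT)):
    genes = expressed(state)
    print(genes, "->", HOST_RANGE[genes])
```

Because each orientation exposes a different gene pair, a population of Mu particles carries both attachment specificities at once.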

The last example of site-specific recombination in 
bacteriophage involves the process of transposition. 
Transposons are segments of DNA that can move 
from one DNA location to another, often causing 
mutations at the sites where they insert. Remarkably,
the members of one family of bacteriophages are actually transposons,
the best-studied example being phage Mu. When 
phage Mu infects a new bacterial cell, the phage 
DNA transposes into the bacterial chromosome at a 
randomly chosen site. Mu can then take either of two 
pathways. In one, the phage becomes a lysogen much 
like phage lambda, with all of its lytic genes repressed 
and the one copy of Mu DNA remaining at its singular 
location in the bacterial chromosome. In the other 
pathway, Mu undergoes a lytic infection in which it 
replicates extensively and is ultimately packaged into
new phage particles. In this lytic infection, all of the
DNA replication of the phage occurs by means of 
replicative transposition events. Thus, the phage DNA 
is madly transposing around the bacterial chromosome 
in ever-larger numbers as the lytic infection proceeds, 
until the cell lyses and releases a crop of new phage 
particles. Phage Mu has provided a wonderful system 
to study transposition, since the process occurs at an 
extremely high rate during a lytic infection (hundreds
of events per cell per hour), as opposed to the very low
rate of other bacterial transposons (on the order of
10⁻⁴ events per cell per hour).


Closing Comments 


Homologous and site-specific recombination play 
many important roles in the life cycles of various 
bacteriophages. Some have been presented in this 
chapter, but numerous others have been either studied 
less well or remain to be discovered by the next gen- 
eration of researchers. Undoubtedly, both homo- 
logous and site-specific recombination also play 
major roles in the evolution of the huge diversity of 
bacteriophages found in nature, although the details 
are less clear. The study of bacteriophage recombina- 
tion holds great interest beyond an appreciation of the 
life cycles and evolution of bacteriophage. For ex- 
ample, over the second half of the twentieth century, 
the elucidation of particular pathways of recombina- 
tion in bacteriophage has provided some of the most 
important advances in the field of molecular biology. 
Furthermore, bacteriophage enzymes involved in 
recombination (e.g., ligases, polymerases, site-specific 
recombinases) have provided key tools used through- 
out molecular biology. Finally, in recent years, 
different bacteriophages have been shown to play 
key roles in bacterial pathogenesis, and thus bacterio- 
phage recombination pathways are also relevant to 
human disease processes. 
The rapid spread of multi-drug-resistant bacterial
strains is limiting the effectiveness of antibiotic treatment
and has led to an intense search for alternatives.
As one consequence, the use of bacteriophages as anti- 
biotics is regaining attention, particularly for treating 
a wide variety of diseases whose control with chemo- 
therapeutic agents is difficult. With our present level 
of understanding and technical expertise, using phages 
as antibiotics makes sense in both scientific and eco- 
logical terms. High specificity against the infectious 
agents and the benignity of lytic bacteriophages offer
encouraging possibilities of their successful usage. In 
the last few years, three companies have been estab- 
lished in the US specifically to develop phage therapy 
applications; all three are nearly ready for human
trials and also have very promising phages available
for agricultural uses. A number of additional scientists
are exploring potential therapeutic applications in uni- 
versities, institutes, government facilities, and broader- 
based corporations in Europe, North America, Israel, 
and India. This article will focus on giving an overview 
of the history and current applications; sources cited 
at the end will provide additional information that is 
regularly updated. 

Felix d’Herelle, co-discoverer of bacteriophages, 
immediately focused on their potential for the treat- 
ment of bacterial disease. In 1915, he was at the 
Pasteur Institute studying prevention and treatment 
of infectious disease and was working on an outbreak 
of severe dysentery among French troops in a nearby 
town. In connection with trying to develop a vaccine, 
he made bacteria-free filtrates from the feces of 
patients and mixed them with Shigella bacteria isolated 
from the patients. When this mixture was spread
on plates, he saw the round, cleared areas he later 
named plaques. In a 1917 paper to the French Aca- 
demy of Sciences, he suggested that these were caused 
by viruses that parasitized the bacteria. D’Herelle 
soon carried out a great deal of work using phage 
therapy approaches in animals. For example, he 
worked throughout France studying phage use in con- 
trolling the widespread problem of typhosis in chick- 
ens and spent 1920 in Indochina, mainly treating 
barbone infections in cattle. The probable first use of 
phage to treat people was in 1919, when d’Herelle 
collaborated with the Chief of Pediatrics at the 
Hôpital des Enfants-Malades in Paris to treat severe 
dysentery. He and several interns swallowed the 
concentrated phage preparation to guarantee its 
safety the day before giving it to the first patient, a 
12-year-old boy; encouragingly, the several children 
that were treated with phage all got well within a day 
or two. 

The best sources of phages against Shigella and 
Escherichia coli are the stools of people recovering 
from dysentery, emphasizing the role of phages in 
natural disease modulation. D’Herelle actually 
reported in his first phage paper that the isolation of 
bacteriophage from dysentery convalescents with 
residual enteritis was easy, although he did not find 
significant phages against the bacteria causing 
dysentery in uninfected individuals or in patients 
in the active phase of the disease until he introduced 
very sensitive plaque assays. He went on to carefully 
characterize bacteriophages as viruses that multiply 
in bacteria and worked out the details of infection 
by various phages of different bacterial hosts under a 
variety of environmental conditions. Always, he was 
working to combine natural phenomena with labora- 
tory findings, to better understand immunity and nat- 
ural healing from infectious disease (Summers, 1998). 

There was much academic argument over the nat- 
ure of bacteriophages, but d’Herelle’s enthusiasm for 
their potential for treating bacterial disease was infectious.
Over 700 papers related to phage therapy were
written in the first half of the twentieth century; even the Eli
Lilly Co. had phage preparations for several bacterial 
diseases in their catalogues, and there were many stor- 
ies of remarkable successes. However, phage therapy 
research and clinical application were abandoned by 
the Western World after World War II. This happened 
because of the mixed success of the approach at that 
time — a consequence of poor basic understanding of 
phage biology then, difficulties in bacterial identifica- 
tion coupled with the high specificity of phages, low- 
quality work by many enthusiasts — and because of the 
discovery and widespread introduction of broad- 
spectrum antibiotics. 


Tbilisi, Georgia 


Work on therapeutic uses of phages continued in the 
USSR, where they had much less access to the new 
antibiotics and more experience and trust in the appli- 
cation of phages. This work was led by scientists at 
what is now called the Eliava Institute of Bacterio- 
phages, Microbiology, and Virology in Tbilisi. During 
visits to the Pasteur Institute in 1921-26, its founder 
George Eliava worked extensively with d’Herelle; 
their first joint paper was published in April 1921. 

For over half a century, the main focus of
the Institute has remained the investigation of bacteriophages.
Many basic and practical studies have addressed
the isolation and selection, morphology and biology,
serology and taxonomy of virulent and temperate
bacteriophages. Bacteriophage ecology, the mechanisms
of phage–host cell interaction, the appearance and
development of lysogeny, methods for isolating active
phage clones against aerobic and anaerobic bacteria,
and phage purification and concentration have been
studied at the Institute. At its
height, the Institute had about 150 research scientists, 
with an additional 650 people associated with the facil- 
ity for mass producing phage for commercial applica- 
tions. At times they produced over 2 tonnes of phage 
preparations a day, which were distributed through 
hospitals and pharmacies all over the USSR. 

The Institute had focused its interest on applica- 
tions of phages in medicine for treatment and prophy- 
laxis of different infectious diseases. Polyvalent phage 
preparations against purulent microorganisms, par- 
ticularly intestinal and wound anaerobic infections, 
have been created. ‘Piophage’ (containing phages 
active against Staphylococcus, Streptococcus, Proteus, 
E. coli, and Pseudomonas bacterial strains) and ‘Intesti- 
phage’ (21 different phage components) were success- 
fully used as remedies against infectious diseases in the 
entire former USSR, and additional production cen- 
ters were established in three Russian cities. In addi- 
tion to the combined polyvalent phage preparations, 
mono-phage preparations were produced against 
Salmonella, Shigella, Staphylococcus, Streptococcus, 
E. coli, and Pseudomonas. A Staphylococcus phage pre- 
paration should be specially mentioned. This prepara- 
tion has been very highly purified and concentrated, 
giving it higher efficiency and rapid action; in contrast 
with the other phage preparations, it can be applied 
intravenously. This anti-Staphylococcus phage has 
been successfully used against septicemia, female 
infertility, osteomyelitis and open fractures, and burn 
and wound infections. It appeared especially efficient 
for treating septicemia in the newborn. 

Along with the phage research aimed at the creation
of therapeutic and prophylactic phage preparations,
original phage-typing patterns (or lysotyping schemes)
designed for interspecific differentiation of pathogens 
such as Salmonella paratyphi, S. typhimurium, 
Shigella flexneri, Sh. sonnei, Clostridium perfringens, 
and others have been elaborated. Application of the 
highly specific lysotyping phage sets is of great signi- 
ficance for epidemic studies, determining the sources 
and ways of transmission of the infections, as well as for 
rapidly diagnosing bacterial pathogens for selection of 
treatment. 

In recent years, a number of new applications of
phages have been developed. In particular, they have
been used to solve a severe problem of hospital-acquired
infections. Purposeful phage application has been
practiced regularly in many
clinics in the Republic of Georgia and in Russia. A new
phage preparation ‘Phagobioderm,’ a highly effective 
wound-covering material, was developed in
collaboration among scientists from several
institutes in Tbilisi. This preparation, which stimulates
accelerated wound healing, is a bioresorbable polymeric film
impregnated with dried bacteriophage (sometimes
with other antibiotics) and painkiller substances, with
a surface-immobilized α-chymotrypsin that
causes slow release of the various agents. It protects 
against entry of new microbes and treats those already 
there while allowing efficient air circulation and 
stimulating healing, even of large burns. 

Investigation of new bacteriophages and the
mechanisms of their interaction with host cells is
also carried out at the Tbilisi Institute. Bacteriophages
remain excellent model systems for elucidating such 
major problems of biology as DNA-protein and 
protein-protein recognition. Investigation of phage 
genomes and gene products may be useful with regard 
to practical applications in biotechnology as well as in 
some fields of medicine. The basic characteristics and
molecular properties of the phages included in the
multicomponent preparations are being studied, as
are the mechanisms by which these bacterial viruses
degrade host cellular structures to ensure their own
replication. Comparative analyses of the interaction mechanisms of
phages with different host cells provide useful infor- 
mation as to what is common in these processes, thus 
enlightening some peculiarities of viral evolution. 


Eastern Europe 


The Soviet Union and Eastern Europe maintained a 
strong emphasis on both basic and applied research on 
phages in many other institutes and universities, as 
well. The other major facility that has been particu- 
larly instrumental in work with phage therapy since 
1957 is the Hirszfeld Institute of Immunology and 
Experimental Therapy of the Polish Academy of
Sciences, founded in 1952. Stefan Slopek’s group 
there published the most detailed papers available in 
English documenting phage therapy, describing the 
results of phage treatments carried out from 1981 to 
1986 with 550 patients in ten Polish medical centres. In
518 of the cases, phage use followed extensive unsuccessful
treatment with all available antibiotics; these failed treatments
served as internal controls. The major categories
of infections treated were long-persisting suppurative 
fistulas, septicemia, abscesses, respiratory tract sup- 
purative infections and bronchopneumonia, purulent 
peritonitis, and furunculosis. In a final summary paper
(Slopek et al., 1987), the authors analyzed the results 
with regard to such factors as nature and severity of 
the infection and monoinfection versus infection with 
multiple bacteria. Rates of success ranged from 75% 
to 100% (92% overall), as measured by marked general 
improvement of health, tendency to heal local wounds, 
and disappearance of measurable bacteria; 84% demon- 
strated full elimination of the suppurative process 
and healing of local wounds. Infants and children did 
particularly well; not surprisingly, the poorest results 
came with the elderly and those in the final stages of 
extended serious illness, with weakened immune 
systems and generally poor resistance. 

The bacteriophages used all came from the exten- 
sive collection of the Bacteriophage Laboratory of the 
Institute of Immunology and Experimental Therapy. 
All were virulent, capable of completely lysing the 
bacteria being treated. In the first study alone, 259 
different phages were tested (116 for Staphylococcus, 
42 for Klebsiella, 11 for Proteus, 39 for Escherichia, 30 
for Shigella, 20 for Pseudomonas, and one for Salmo- 
nella); 40% of them were selected to use directly for 
therapy. All of the treatment was done in a research 
rather than production mode, with the phage prepared 
for each patient at the Institute and tested for sterility. 
Treatment generally involved 10 ml of sterile phage 
lysate orally half an hour before each meal, with 
the stomach juices neutralized by (basic) Vichy water, 
baking soda or gelatin. Phage-soaked compresses 
were also applied three times a day for local infection. 
Various other methods of administration were suc- 
cessfully used, including aerosols and infusion rectally 
or in surgical wounds. Treatment ran for 1.5 to 14 
weeks, with an average of 5.3 weeks; for intestinal problems,
short treatment was enough, while it was longer for 
pneumonia and pyogenic arthritis. Bacterial levels and 
phage sensitivity were continually monitored, and the 
phage(s) being used were changed if the bacteria lost 
their sensitivity. 

Few side-effects were observed, and those seemed 
directly associated with the therapeutic process. Brief 
pain in the liver area was often reported around days 
3–5; the authors suggested that this might be related to
endotoxin liberation as the phage were destroying the
bacteria. In severe cases with sepsis, patients often ran
a fever for 24 h around days 7–8. Intravenous
administration was not recommended for fear of possible
toxic shock from bacterial debris in the lysates. How- 
ever, it was clear that the phages readily got into the 
body from the digestive tract and multiplied internally 
wherever appropriate bacteria were present, as meas- 
ured by their presence in blood and urine as well as by 
therapeutic effects. The articles include many specific 
details on individual patients which help give insight 
into the ways phage therapy was used, as well as an in- 
depth analysis of difficult cases. 


Advantages of Phage Therapy 


There are many reasons why it now makes sense to 
seriously explore widespread use of phage therapy 
worldwide: 


1. Phage are both self-replicating and self-limiting, 
since they will multiply only as long as sensitive 
bacteria are present and then are gradually elimin- 
ated from the individual and the environment. 

2. Phages can be selected that are targeted far more 
specifically than other antibiotics to the specific 
problem bacteria, causing much less damage to the 
normal microbial balance in the body. The bacterial 
imbalance or ‘dysbiosis’ caused by treatment with 
many antibiotics can lead to serious secondary 
infections involving relatively resistant bacteria, 
often extending hospitalization time, expense and 
mortality. 

3. Phages can possibly be targeted to receptors on the 
bacterial surface which are involved in pathogen- 
esis, in which case any resistant bacterial mutants 
tend to be less virulent. 

4. Virtually no side effects have been reported for 
phage therapy. 

5. Phage therapy would be particularly useful for 
people with allergies to antibiotics. 

6. Appropriately selected phages can easily be used 
prophylactically to help prevent bacterial disease 
in people or animals during times of exposure, 
or to sanitize hospitals and help protect against 
hospital-acquired (nosocomial) infections. 

7. Especially for external applications, phages can be 
prepared fairly inexpensively and locally, facilitating 
their potential use in underserved populations 
worldwide. 

8. For localized infections, phage have the special ad- 
vantage that they continue multiplying and pene- 
trating deeper into the tissues or wounds as long as 
the infection is present, rather than decreasing
rapidly in concentration below the surface as antibiotics do.
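Point 1 (phages are self-replicating and self-limiting) can be illustrated with a deliberately crude discrete-time model. Everything here is an assumption: the function `simulate`, its parameter values, the Poisson infection term, the fixed per-step phage clearance, and the rule that fewer than one surviving cell counts as elimination. It ignores resistant mutants, host immunity, spatial structure, and the loss of phage that adsorb to cells. It shows only the qualitative pattern: phage numbers rise while sensitive bacteria remain, then decline once the hosts are gone.

```python
import math


def simulate(bacteria=1e6, phage=1e7, steps=40, adsorption=1e-7,
             burst=100, growth=2.0, clearance=0.5):
    """Toy discrete-time model of a lytic phage acting on sensitive bacteria.

    Each step: a Poisson fraction 1 - exp(-adsorption * phage) of bacteria is
    infected and lysed, releasing `burst` phages each; survivors multiply by
    `growth`; free phages are cleared at a fixed fraction per step.
    All parameter values are hypothetical.
    """
    history = [(bacteria, phage)]
    for _ in range(steps):
        infected = bacteria * (1 - math.exp(-adsorption * phage))
        bacteria = (bacteria - infected) * growth
        if bacteria < 1:  # fewer than one cell left: treat as eliminated
            bacteria = 0.0
        phage = phage * (1 - clearance) + burst * infected
        history.append((bacteria, phage))
    return history


history = simulate()
peak = max(p for _, p in history)
print(f"peak phage: {peak:.2e}; final bacteria: {history[-1][0]}; "
      f"final phage: {history[-1][1]:.2e}")
```

With no hosts left, the `burst * infected` term is zero and only clearance acts, so the phage population decays geometrically: multiplication is tied entirely to the presence of sensitive bacteria.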


Precautions in Phage Therapy 


Clearly, it is important to carefully select phages that 
target the bacteria in question and to monitor the 
ongoing sensitivity of the bacteria, switching phages 
if necessary. Cocktail mixtures of different phages can 
make this process quicker and more efficient. In all the 
phage therapy work discussed above, care has been 
taken to use phages that are lytic, killing their bacterial 
hosts in short order, rather than temperate phages that 
are capable of existing in a prophage state inside their 
hosts for extended periods. The importance of this pre- 
caution has become especially clear as we have learned 
more about the role of some temperate phages in 
transferring genes involved in bacterial pathogenesis. 


Conclusions 


While it may be premature to generally introduce
injectable phage preparations in the West without
further extensive research, the carefully implemented 
use of phages for a variety of agricultural purposes and 
in external applications could potentially soon help 
reduce the emergence of antibiotic-resistant strains. 
Phage are also especially useful in dealing with chal- 
lenging nosocomial infections, where large numbers 
of particularly vulnerable people are being exposed to 
the same strains of bacteria in a closed hospital setting. 
In this case, the environment as well as, eventually, the 
patients can be effectively treated using phages. 

New techniques for the detailed genetic and 
physiological characterization of phages isolated 
from nature, for the rapid characterization of the 
pathogens involved in a specific disease process and 
for the eventual intentional modification of potential 
therapeutic phages offer further promise in the 
development of powerful phage therapy approaches 
for restoring microbial balance and improving our 
health and that of our ecosystem. 
Bacteriophages are viruses that specifically infect bacteria.
Like all viruses, they are obligate parasites.
While they carry all the information to direct their 
own reproduction in an appropriate host, they have no 
machinery for generating energy and they have only 
one kind of nucleic acid — either DNA or RNA. Each 
phage consists of a piece of genetic information, deter- 
mining all of the properties of the virus, which is 
packaged in a protein coat. Phages are like ‘space 
ships’ that carry genetic material from one susceptible 
bacterial cell to another and then reproduce in the cell 
where they land. 

Phages are found in large quantities wherever their 
hosts are found — in sewage and feces, in the soil, in 
deep thermal vents, in drinking water. Wommack and
Colwell (2000) provide an excellent review of phages
in aquatic ecosystems, including their key roles in 
maintaining the food web in the oceans, where their 
numbers surpass those of bacteria by an order of 
magnitude. Their high level of specificity, long-term 
survival, and ability to reproduce rapidly given appro- 
priate hosts contribute to bacteriophages maintaining 
a dynamic balance among the wide variety of bacterial 
species in any natural ecosystem. When no hosts are 
present, they can maintain their infectivity for years 
unless damaged by something in the environment. 
Phages are susceptible to agents such as UV that 
damage nucleic acid and most are killed by drying out 
or freezing, but the majority are not susceptible to 
organic solvents and many other forms of sterilization. 

The target for each bacteriophage is a specific group 
of bacteria. Bacteriophages cannot infect the cells of 
more complex organisms because of major differences 
in key intracellular machinery as well as in the specific 
cell-surface proteins to which they must bind to infect 
a cell. Most phages have tails, the tips of which have 
the ability to bind to specific molecules on the surface
of their target bacteria. The phage genome then passes 
through the tail into the host cell where it directs the 
production of progeny phages, often over a hundred 
in half an hour. 

Each strain of bacteria has very large numbers of 
characteristic protein, carbohydrate, and lipopolysac- 
charide molecules on its surface. These molecules are 
involved in forming pores, in motility, and in binding 
of the bacteria to particular surfaces; the majority of 
such molecules can act as receptors for particular 
phages. Most phages require clusters of a specific 
kind of molecule, binding with several tail fibers 
simultaneously to position themselves properly for 
penetration of the surface. Development of resistance 
to a particular phage generally reflects mutational loss 
of its specific receptor; this loss often has negative 
effects on the bacterium and does not protect it against 
the many other kinds of phage that use different 
cell-surface molecules as receptors. 

Bacteriophages were first described and named by 
Félix d’Herelle in 1917 (see D’Herelle, Félix); similar 
phenomena had been independently reported by 
Frederick Twort in 1915 though with little detail, 
and the two are jointly given credit for the discovery 
of phages. From the beginning, d’Herelle was inter- 
ested in the possibility of using phages therapeutically 
in the treatment of bacterial disease. The separate 
entry on bacteriophage therapy (see Bacteriophage 
Therapy) discusses the history of their early use as 
antibiotics, the ongoing applications in Eastern 
Europe even after the advent of chemical antibiotics, 
and the resurgence of interest in the West as antibiotic- 
resistant bacteria become increasingly problematic. 

The study of phages, as with their discovery, gen- 
erally begins with plaque formation. Various dilutions 
of a sample that is likely to contain phages are mixed 
with a few drops of a culture of some susceptible 
bacterium, and the mixture is spread over the surface 
of a petri plate containing nutrient agar. After incu- 
bation for a number of hours at an appropriate tem- 
perature, the bacteria form a continuous layer, or 
lawn, over the plate. At the appropriate dilution of a 
solution containing phages, the lawn is interrupted by 
clear, round areas of various sizes; each of these is a 
‘plaque,’ and it represents an area where the bacteria 
have been infected by phages and killed. Plaques 
made by different phages differ in size, degree of 
clearing, and in characteristic circular zones of clarity 
or turbidity. Because most phages grow well only 
in bacteria in exponential phase, plaques do not 
generally grow in size indefinitely. As bacteria enter 
the stationary phase, further infection is limited and 
the size of the plaque is therefore defined by the 
relative reproductive rate of the phages and bacteria
up to that point. 

Each plaque contains many millions of phages, all 
the progeny of a single phage or infected cell. It was 
the phenomenon of plaque formation that first indi- 
cated that phage should be thought of as particulate 
entities, rather than some kind of ‘poison.’ Thus, the 
titer of a phage stock (the number of phage particles 
per milliliter) may be estimated by plating appropriate 
dilutions to obtain plaques, counting the number of 
plaques, and multiplying by the dilution factor, just 
as bacteria are enumerated by counting colonies. 
While the number of plaques is always proportional to
the number of phages plated, it is not always equal to it;
the ratio of the number of plaques produced on a given
host and medium to the actual number of viable phage
particles plated is called the ‘efficiency of plating.’
It is affected by the ability of
the phage to get past the host defenses, especially in 
the first round of replication, and by the burst size (the 
average number of phages made per cell at each round 
of infection). A single strain of phage may be purified 
out of a mixture by carefully removing a sample from 
one plaque (with a sterilized bacteriological needle, 
capillary tube, or toothpick) and regrowing in a fresh 
bacterial culture. 
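The titer arithmetic described above is short enough to write out. This sketch adds one assumption the text leaves implicit: the volume of the dilution actually spread on the plate, so that the result comes out in plaque-forming units (PFU) per milliliter of the undiluted stock. The function names and the counts in the example are hypothetical. Efficiency of plating is computed here as plaques per viable particle plated; in practice it is also often expressed relative to the titer on a reference host.

```python
def titer_pfu_per_ml(plaques, dilution_factor, volume_plated_ml):
    """Plaque-forming units per ml of the undiluted stock.

    plaques: plaques counted on one plate
    dilution_factor: e.g. 1e6 for a 10^-6 dilution
    volume_plated_ml: volume of that dilution spread on the plate (assumed known)
    """
    return plaques * dilution_factor / volume_plated_ml


def efficiency_of_plating(plaques, particles_plated):
    """Fraction of viable particles plated that go on to form a plaque."""
    return plaques / particles_plated


# Hypothetical counts: 150 plaques from 0.1 ml of a 10^-6 dilution.
stock = titer_pfu_per_ml(150, 1e6, 0.1)
print(f"{stock:.1e} PFU/ml")             # 1.5e+09 PFU/ml
print(efficiency_of_plating(150, 300))   # 0.5
```

Counting a plate in the tens-to-hundreds range keeps the estimate reliable, just as with bacterial colony counts.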

The single-step growth curve is the other general 
way that phage-host interactions have been studied 
ever since the seminal work of Emory Ellis and Max 
Delbrück in 1939. At time zero, phages are mixed with
appropriate host bacteria. Samples are removed at 
various times and plated. The result is that the number 
of plaques remains constant (at the number of infected 
cells) for a characteristic time, referred to as the ‘latent 
period’ (about 25 min for T4 on Escherichia coli aerated 
at 370° in broth); it then rises sharply and levels off at 
about 100 times its initial value as each cell bursts, or 


lyses, liberating the completed phage. The ratio 
between the number of plaques obtained after and 
before lysis is called the ‘burst size.’ Both the burst 
size and the latent period tend to be characteristic of 
each phage strain under particular conditions; they are 
affected by the host used and its growth rate in the 
particular medium and temperature. If the infected 
cells are broken open at various times after infection, 
no phage can be detected for the first 11-12 min after 
infection; this ‘eclipse period’ was a mystery until 
the nature of the phage particle and of the infection 
process were determined. 
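The way latent period and burst size are read off a single-step growth curve can be sketched as below. This is an illustrative sketch under stated assumptions: the first sample is taken as the number of infected cells, the last sample is assumed to lie on the post-lysis plateau, the twofold "rise factor" is an arbitrary threshold, and the data are invented to resemble the T4 figures quoted in the text.

```python
def analyze_single_step_curve(times_min, plaque_counts, rise_factor=2.0):
    """Estimate latent period and burst size from a single-step growth curve.

    Assumes the first sample reflects the number of infected cells (the
    baseline) and the last sample lies on the post-lysis plateau.
    """
    baseline = plaque_counts[0]
    latent_period = None
    for t, count in zip(times_min, plaque_counts):
        # The latent period ends (to within the sampling interval) at the
        # first sample whose count clearly exceeds the baseline.
        if count >= rise_factor * baseline:
            latent_period = t
            break
    burst_size = plaque_counts[-1] / baseline  # plaques after / before lysis
    return latent_period, burst_size


# Hypothetical data shaped like the T4 example in the text.
times_min = [0, 5, 10, 15, 20, 25, 30, 35, 40, 50]
plaques = [200, 198, 203, 201, 205, 1500, 9000, 17500, 19900, 20000]
latent, burst = analyze_single_step_curve(times_min, plaques)
print(f"latent period <= {latent} min, burst size ~ {burst:.0f}")
```

The resolution of the latent-period estimate is limited by the sampling interval; the burst size depends only on the baseline and plateau counts.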

The typical phage particle is made of approximately equal amounts of protein and DNA. In 1952, before the genetic role of DNA had been firmly demonstrated, Alfred Hershey and Martha Chase separated the roles of the protein and the DNA in phages by a classic series of experiments. They grew one stock of phage T2 in medium containing ³²P, to label its DNA, and another stock in medium containing ³⁵S, to label its protein. They then followed the fates of the labeled components. Thomas Anderson’s electron microscopic observations of phages attached to the bacterial cell surface after infection suggested that the attached phage coats might be stripped off by violent agitation; therefore Hershey and Chase looked for the release of radioactivity from infected cells agitated in a blender. They showed that the DNA (³²P label) always remained in the infected cells (which were collected by centrifugation), but that the protein, labeled with ³⁵S, was easily released into the supernatant by blending. They therefore concluded that only the DNA of the phage is actually injected into the cell, the protein remaining outside. When phages were mixed with bacterial cell wall fragments, they could be made to adsorb to these fragments and release their DNA into the medium. Furthermore, the labeling pattern of newly made phages showed that large amounts of labeled DNA are passed on to the next generation, while virtually no parental protein is contained in the new phage.

Symbol  Family            Genus           Features                   Nucleic acid   Example
M       Myoviridae        -               Contractile tail           DNA, ds, L     T4
S       Siphoviridae      -               Long, noncontractile tail  DNA, ds, L     lambda
P       Podoviridae       -               Short tail                 DNA, ds, L     T7
Ii      Inoviridae        Inovirus        Long filament              DNA, ss, C     fd
Ip      Inoviridae        Plectovirus     Short rod                  DNA, ss, C     MVL1
Mi      Microviridae      Microvirus      Isometric                  DNA, ss, C     φX174
Cc      Corticoviridae    Corticovirus    Lipid-containing capsid    DNA, ds, C, S  PM2
T       Tectiviridae      Tectivirus      Double capsid              DNA, ds, L     PRD1
L       Leviviridae       Levivirus       Isometric                  RNA, ss, L     MS2
Pl      Plasmaviridae     Plasmavirus     Envelope, no capsid        DNA, ds, C, S  MVL2
Cy      Cystoviridae      Cystovirus      Envelope                   RNA, ds, L, M  φ6
SSV1    SSV1 group        SSV1 group      Lemon-shaped               DNA, ds, C, S  SSV1
Li      Lipothrixviridae  Lipothrixvirus  Lipid-containing envelope  DNA, ds, L     TTV1

-, none established; ds, double-stranded; ss, single-stranded; L, linear; C, circular; S, supercoiled; M, multipartite.

The Hershey-Chase experiment was the classical demonstration that DNA is the stuff of heredity, and for this reason it is important to all of biology. But it
also clearly established the general pattern of phage 
growth, and it explained the eclipse period. The first 
event following adsorption of the phage particle must 
be injection of its DNA. The DNA takes over the 
cellular apparatus and initiates the synthesis of new 
phage proteins; but whole phage particles are not 
made until after about 11-12 min, and then their 
numbers increase rapidly. 


Phage Morphology and Classification 


Bacteriophages come in a large variety of sizes and shapes (Figure 1). They are classified in terms of
morphology, genome type, and host organisms. 
Unlike the taxonomy for cellular organisms, viral tax- 
onomy at this point is simply a classification scheme 
and does not imply phylogenetic relationships; these 
are very poorly understood. 

Over 95% of the phages described in the literature 
to date have double-stranded DNA genomes and 
tailed morphology, looking rather like sperm or tad- 
poles which attach to bacteria by their tails; this large 
group has recently been assigned the order name 
Caudovirales. Two quite different lifestyles are seen 
in members of this group. 

Many are virulent or lytic; they enter their host and 
immediately adapt its machinery to making more 
phages, lysing minutes or hours later to release hun- 
dreds of new phages. They all go through a common 
general pattern of developmental gene expression, 
though the details are specific for each phage group. 
They initially transcribe and translate a set of early 
genes whose functions include protecting the phage 
genome and restructuring the host appropriately for 
the needs of that particular phage. A set of middle 
genes then is generally responsible for synthesizing 
the new phage DNA, while a set of late genes makes 
the components of the phage capsid, the machinery 
for packaging the DNA, and the proteins responsible 
for lysing the cell at the appropriate time and releasing 
the progeny phage. This group includes the enteric T phages T1-T7 and their relatives, which have their own entry (see T Phages), as well as Bacillus subtilis phage SPO1, discussed below.

Other phages have a so-called temperate lifestyle. Upon entering the host cell, they can either establish a prophage that may be maintained for years or immediately enter a vegetative (lytic) growth phase. Bacteriophages lambda and Mu (see Phage Mu; Phage λ Integration and Excision) are the best-studied members of this group. Temperate phages can help protect their hosts from infection by other phages, can carry host genes from one bacterial cell to another (transduction), and can significantly change the properties of their hosts. In some cases they may even convert the host to a pathogenic phenotype, as in diphtheria or enterohemorrhagic E. coli (EHEC) strains; this is discussed further below.

Some virulent phages also produce occasional trans- 
ducing particles, but these contain only host DNA 
rather than a combination of host and phage DNA, 
as is frequently seen with temperate phages. The larger 
virulent phages generally encode a number of different 
host-lethal proteins that disrupt host replication, tran- 
scription, and/or translation and may lead to degrada- 
tion of the host genome. 

The Caudovirales of both virulent and temperate types can be divided into three families on the basis of morphology (see Figure 1):


• Myoviridae: contractile tails, built on baseplates (25%), e.g., the T-even phages.

• Podoviridae: very short tails (15%), e.g., T7.

• Siphoviridae: long, noncontractile tails (60%), e.g., lambda.
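The three-way morphological split of the tailed phages can be expressed as a small lookup and decision rule. This is a toy sketch: the shares and examples are those given in the list, while the two yes/no observations used as inputs (is the tail contractile, is it very short) are a simplifying assumption, not a formal taxonomic key.

```python
# Shares and examples as given in the text.
TAILED_FAMILIES = {
    "Myoviridae": {"tail": "contractile", "share_percent": 25, "example": "T4"},
    "Podoviridae": {"tail": "very short", "share_percent": 15, "example": "T7"},
    "Siphoviridae": {"tail": "long, noncontractile", "share_percent": 60, "example": "lambda"},
}


def tailed_family(tail_contractile, tail_very_short):
    """Assign a tailed phage to a family from two morphological observations."""
    if tail_contractile:
        return "Myoviridae"
    if tail_very_short:
        return "Podoviridae"
    return "Siphoviridae"


# The three families together account for all tailed phages.
assert sum(f["share_percent"] for f in TAILED_FAMILIES.values()) == 100
for name, info in TAILED_FAMILIES.items():
    print(f"{info['example']}: {name} ({info['tail']} tail, ~{info['share_percent']}%)")
```

Contractility is tested first because a contractile tail places a phage in the Myoviridae regardless of tail length.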


The nine tailless phage families described to date each have very few members. They are differentiated by criteria such as shape (rod-shaped, spherical, lemon-shaped, or pleomorphic); the presence or absence of a lipid envelope; a double-stranded or single-stranded DNA or RNA genome, segmented or not; and whether the progeny are continually extruded or released by lysing the cell. Some filamentous phages have been implicated in cholera (see Filamentous Bacteriophages). No role in pathogenicity has been associated with any of the other tailless families of phages.

Figure 1 The various families of phage.


Specific Bacteriophages 


Several specific phage groups are discussed in some 
detail elsewhere (see T Phages; Phage Mu; Filamentous 
Bacteriophages; Archaea, Genetics of). There are also 
several articles on topics important to the genetics of a variety of phages (Phage Recombination; Lambda Integration; Rolling-Circle Replication; Transcription). To give a better idea of the breadth of properties of phages, we also include here a survey of some of the other phages that have played important roles in genetic analysis and understanding of gene function. They are explored in much greater detail in Webster and Granoff (1994) and Calendar (1988). The complete sequences of many of them have now been determined and are available in the bacteriophage section of the genome site at NCBI http://www.ncbi.nlm.nih.gov:80/. This repository
also contains the sequences of many prophages deter- 
mined in the course of microbial genome projects, as 
well as lambdoid and mycobacterial phages analyzed 
as part of the comparative genomics project at the 
Pittsburgh Bacteriophage Institute, under the direct- 
ion of Graham Hatfull and Roger Hendrix. 


P1

P1 is a temperate phage that is unusual in that its pro- 
phage most commonly exists in plasmid rather than 
integrated form. It is particularly useful because it can 
carry out generalized transduction of markers between 
strains of E. coli and Shigella. P1 also encodes its own 
restriction-modification system, the study of which by 
Werner Arber played a major role in determining the 
biology of such systems and laying the foundation for 
genetic engineering. P1 belongs to the Myoviridae. Its 
93 601 bp genome, encoding 110 genes, is terminally redundant and circularly permuted in the phage particle, being packaged by a headful mechanism from a concatemeric genome that is produced by rolling-circle replication. Packaging into the first prohead is initiated from a distinct pac site that must be dam methylated (by either the host or the phage system) to function and that interacts with specific packaging machinery, which then continues to insert headfuls of DNA (with about 10% redundancy) into subsequent proheads. P1 has an 85 nm icosahedral head, but variable numbers of 65 and 47 nm heads are also produced, depending on phage and host strain and on conditions; these heads package partial genomes.
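Why headful packaging from a concatemer yields molecules that are both terminally redundant and circularly permuted can be shown with a short simulation. This is a simplified sketch: the genome is represented by 100 arbitrary position units rather than the real 93 601 bp, the ~10% redundancy figure is taken from the text, and packaging is assumed to proceed processively from the pac site with each new headful starting exactly where the previous cut was made.

```python
def headful_packaging(genome_len, pac_site, redundancy=0.10, n_heads=5):
    """Cut successive headfuls from a concatemer of tandem genome copies.

    Positions are genome coordinates (0 .. genome_len - 1); the concatemer is
    modeled by reading coordinates modulo genome_len.
    """
    headful = round(genome_len * (1 + redundancy))  # a head holds > 1 genome
    molecules = []
    start = pac_site  # the first headful is initiated at the pac site
    for _ in range(n_heads):
        molecules.append([(start + i) % genome_len for i in range(headful)])
        start += headful  # the next headful begins where the last cut was made
    return molecules


genome_len = 100  # arbitrary units standing in for the P1 genome
for mol in headful_packaging(genome_len, pac_site=0):
    redundant = mol[: len(mol) - genome_len]
    # Each molecule starts at a different point (circular permutation) and
    # repeats its first stretch at its end (terminal redundancy).
    print(f"starts at {mol[0]:>2}, ends at {mol[-1]:>2}, "
          f"ends repeated: {redundant == mol[genome_len:]}")
```

Each successive molecule starts one redundancy length further along the genome, so a population of particles contains a family of permutations of the same sequence.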


Genes for related functions are scattered through- 
out the genome, as in T4, rather than being tightly 
clustered as in lambdoid phages. Two different origins 
of replication are used: oriR in the prophage state and oriL in the lytic phase. The two modes of replication also have different requirements for host proteins. Though present in only one to two copies per cell, the P1 prophage plasmid is lost only about once per 100 000 cell divisions. This very high efficiency of maintenance is due to the fact that P1 encodes its own effective partition function, par, which ensures that the replicated plasmid copies are properly divided between the two daughter cells. P1 has a particularly complex set of immunity functions involved in maintaining the prophage state and excluding other phages. It also encodes an antirepressor protein that is capable of blocking the action of the repressor and thus causing the prophage to be activated to the lytic mode. However, activity of the antirepressor is tightly controlled in the prophage state by a 77-nucleotide antisense RNA.
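How stable a loss rate of about one per 100 000 divisions is, and why an active partition system is needed at such a low copy number, can be illustrated numerically. This sketch rests on two textbook simplifying assumptions, not on P1-specific data: plasmid loss events are independent from one division to the next, and, in the absence of a partition system, each copy would go to either daughter independently with probability 1/2.

```python
def retention_probability(divisions, loss_per_division=1e-5):
    """Probability that a cell lineage still carries the plasmid after n
    divisions, assuming a constant, independent loss rate per division."""
    return (1.0 - loss_per_division) ** divisions


def loss_by_random_segregation(copies_at_division):
    """Probability that a given daughter cell receives no copy when each copy
    goes independently to either daughter with probability 1/2."""
    return 0.5 ** copies_at_division


print(f"retained after 1000 divisions: {retention_probability(1000):.4f}")
print(f"loss per division, 2 copies segregating at random: "
      f"{loss_by_random_segregation(2)}")
```

With only two copies at division, random segregation would leave a daughter without a plasmid about a quarter of the time, some four orders of magnitude more often than the observed loss rate.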

P1 makes generalized transducing particles that carry all parts of the host genome with roughly equal frequency (and little or no phage DNA). This implies that the occasional packaging of sections of the host genome continues until all of the DNA molecule is used and/or that the packaging apparatus recognizes a number of host sites as if they were pseudo-pac sites from which DNA packaging can be initiated. Another unusual feature is that P1 extends its host range by encoding two different versions of its tail-fiber genes on a 4.2 kb invertible ‘C-segment’ that is largely homologous to the smaller G-segment of phage Mu (see Phage Mu); both seem to recognize lipopolysaccharide moieties.


P2 and P4 

P2 and P4 are generally considered together because P4 has no genes for structural proteins of its own. Rather, it has the ability to instruct the main head protein of P2 and related temperate phages to assemble into a particle one-third the normal size: the right size for the P4 genome (11 624 bp) rather than for the P2 genome (33 593 bp). P4 is a true parasite, absolutely requiring its helper phage despite having virtually no sequence homology and no organizational similarity to P2. The P2 family, widespread in nature, belongs to the Myoviridae; a pair of disks, one inside the head and one outside, attach the head to the inner tail tube, while an outer contractile sheath attaches to a baseplate with six tail fibers and a single tail probe. Sections of the tail-fiber genes show homology with tail-fiber genes of unrelated coliphages, presumably reflecting horizontal exchange that enhances host range. P2 and P4 can each infect cells lysogenic for the other, and lytic development of either induces the other. Induction of P4 requires the cox gene of P2, which activates transcription from the P4 lytic promoter.
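The claim that a head one-third the normal size suits P4 can be checked against the two genome lengths given above. This worked calculation is only a rough geometric sketch: it assumes DNA is packed at the same density in both heads and that the heads are geometrically similar, so that capacity scales with volume (the cube of the diameter); real P2 and P4 capsids differ in their lattice geometry.

```python
P4_GENOME_BP = 11_624
P2_GENOME_BP = 33_593

# Fraction of a P2-sized head that the P4 genome needs.
capacity_ratio = P4_GENOME_BP / P2_GENOME_BP

# Assumption: equal packing density and similar shapes, so capacity
# scales with volume, i.e. with the cube of the head diameter.
diameter_ratio = capacity_ratio ** (1 / 3)

print(f"P4/P2 genome length: {capacity_ratio:.3f}")
print(f"implied P4/P2 head diameter: {diameter_ratio:.2f}")
```

Under these assumptions a head holding about a third as much DNA needs a diameter only about 70% that of the P2 head.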

Replication of P2 occurs via a rolling-circle mechanism that includes its own site-specific initiation functions but otherwise relies on host genes. The related phage 186 also encodes a protein that depresses host replication; this enhances the phage burst size but is not essential. In contrast, P4 replicates bidirectionally from a unique ori (origin) that requires a second site 4.5 kb away and several phage proteins, but only two host proteins: Pol III and Ssb, the single-stranded DNA-binding protein. P2 DNA is packaged from monomeric circles rather than from linear DNA and requires a 125 bp region including a site that is cleaved to give 19 bp cohesive ends.


P22 
The generalized transducing phage P22 of Salmonella 
typhimurium was involved in the initial discovery of 
transduction by Zinder and Lederberg in 1952. P22 is 
a member of the Podoviridae, with a 57 nm icosahe- 
dral head, a short tail, a sort of baseplate made up of six 
trimers of the tailspike protein, and a single fiber 
extending from the middle of the baseplate. The 
41724 bp genome includes 64 genes and unidentified 
open reading frames (ORFs). Its genes are clustered 
by function, as in its distant relative, phage lambda, 
with which it can exchange blocks of genes. It cir- 
cularizes and then either integrates into a specific 
chromosomal site to form a prophage or replicates 
via a rolling-circle process to form a concatemer. The 
DNA is packaged by headfuls that give circularly 
permuted molecules with a terminal redundancy of 
several per cent, as in T4. However, in P22 the pack- 
aging of the first head starts from a specific pac site and 
proceeds unidirectionally, with packaging into the 
next prohead then starting wherever the previous 
one finishes. Generalized transduction is thought to 
be a consequence of the occasional packaging of host 
DNA starting at some pac-like site and continuing 
through multiple headfuls of bacterial DNA. P22 can 
be very advantageous to its host. In addition to the 
genes involved in maintaining lysogeny, P22 pro- 
phages express genes that interfere with DNA injec- 
tion by related phages, that alter the O-antigen 
structure to interfere with P22 adsorption, and that 
abort the lytic cycle of some other Salmonella phages. 
The P22 DNA packaging apparatus and pac site 
have been used very effectively in building cloning 
vectors. P22 also encodes an antirepressor that is 
tightly controlled in the prophage state but can induce 
any P22-like prophages when it is expressed. The 
operator region for this ant gene has been used 
to construct a clever system to screen for clones 
expressing particular classes of regulatory proteins. 
Any DNA sequence thought to be involved in gene
regulation can be substituted for the ant operator 
sequence; the resulting phage will kill any cell it infects 
except those expressing a protein that can effectively 
bind to that DNA sequence and thus repress the 
expression of the ant gene. 
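The selection logic of this operator-substitution screen can be sketched as a simple filter. This is a toy model only: the clone names and DNA "sites" are hypothetical labels, and DNA binding is reduced to set membership, whereas real binding is a matter of affinity and occupancy.

```python
from dataclasses import dataclass, field


@dataclass
class Clone:
    """A hypothetical library clone and the DNA sites its product can bind."""
    name: str
    bound_sites: frozenset = field(default_factory=frozenset)


def survives_challenge(clone, substituted_operator):
    # Unless a cellular protein binds the sequence that replaced the ant
    # operator, ant is expressed and infection by the test phage kills the cell.
    return substituted_operator in clone.bound_sites


def screen(library, substituted_operator):
    """Return the names of clones that survive infection by the test phage."""
    return [c.name for c in library if survives_challenge(c, substituted_operator)]


library = [
    Clone("clone_1", frozenset({"candidate_site"})),
    Clone("clone_2"),
    Clone("clone_3", frozenset({"candidate_site", "other_site"})),
]
print(screen(library, "candidate_site"))
```

The attraction of the design is that it is a positive selection: only cells expressing a suitable DNA-binding protein survive, so rare clones can be recovered directly from a large library.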


Cyanophages 

After much early confusion about the nature of the photosynthetic ‘blue-green algae,’ it has become clear that they are actually bacteria with cell walls closely resembling those of gram-negative bacteria. Studies have been carried out with a number of phages, belonging to all three major morphological families of tailed phages, that infect either unicellular or filamentous cyanobacteria. Phages infecting the
latter generally cause rapid invagination and destruc- 
tion of the host’s photosynthetic membranes. Such 
destruction is only seen very late in the infection 
cycle with those phages infecting unicellular cyano- 
bacteria, for which successful infection seems to 
depend on ongoing photosynthesis. Cyanophages 
are currently being used extensively to explore the 
complex physiology of these very interesting oxygen- 
producing organisms. 


Phages of Gram-Positive Bacteria 

The phages of gram-positive bacteria have a smaller variety of potential binding sites available than do the phages of gram-negative bacterial species with their elaborate outer membranes; most that have been characterized seem to bind to the glucosylated teichoic acids that make up much of the cell surface. The genomes of the gram-positive bacteria characterized so far all contain multiple prophages, which generally seem to belong to a rather small number of families of closely related phages. For example, the temperate phages of Bacillus subtilis have been classified into four groups, all with
long tails; the genome sizes of groups I to IV are about 
40, 40, 126, and 60 kb, respectively. A number of the 
group III phages, in particular, seem to encode their 
own versions of such enzymes as DNA polymerase 
and thymidylate synthase. An additional group of 
defective prophages has also been found in B. subtilis. 
Prophages and the problems they cause have also been 
studied extensively in the lactic acid bacteria used in 
the dairy fermentation industry; again, only very few 
families seem to be involved. 

Several B. subtilis lytic phages have been studied 
quite extensively. SPO1 and its relatives are large virulent members of the Myoviridae (145 kb, including a 12.4 kb terminal redundancy) in which hydroxymethyluracil (hmU) is used instead of thymine. It
has a self-splicing intron in its DNA polymerase gene 
whose secondary structure and sequence are consistent with the conserved features of the group I introns of
T-even phages, cyanobacteria, and the mitochondria 
of filamentous fungi. Despite the potentially identify- 
ing presence of the unusual base, host DNA is not 
degraded during infection (as it is during T4 infection) 
and there is no indication that the substitution is involved in or required for the shutoff of host DNA
synthesis and transcription. The hmU does seem to 
enhance middle-mode SPO1 transcription and the 
binding of TF1, an SPO1-specific DNA-binding pro- 
tein made in large quantities after infection that 
enhances replication though it is not essential. SPO1 
is replicated as a long concatemer with a single copy of 
the terminal redundancy between monomers that is 
then cleaved in staggered fashion leaving overhanging 
5’ ends which are then replicated. A cluster of 20 genes 
is involved in shutting off host replication and gene 
expression and inhibiting cell division. However, 
SPO1 does not shut off host ribosomal RNA synthesis 
or degrade the host DNA. Like T4, SPO1 has a very 
complex capsid, involving at least 53 different poly- 
peptides and almost half its genome. Its contractile tail 
also ends in a complex baseplate. However, there is no 
indication of any relationship between SPO1 and T4. 
Little is known about SPO1’s morphogenesis, infect- 
ion process, or the functions of individual genes except 
for some of those involved in its complex regulatory 
processes and DNA replication. 

Phi 29 is a rather small lytic phage (19 285 bp)
characterized by a terminal protein (TP) covalently 
linked at the 5’ ends via a phosphoester bond, leading 
to a very interesting mechanism of replication that has 
been studied extensively. Phi 29 morphogenesis and 
DNA packaging have also been studied extensively in 
vitro. The tail connector protein seems to have an 
important role in giving the head its prolate shape. A 
special 174-base phage-encoded packaging RNA 
(pRNA) is essential for in vitro DNA packaging; six 
copies are found attached to the connector. This is 
the only case to date where such pRNAs have been 
found. 


Lipid-Containing Phages 

Phi 6 is a small enveloped virus whose genome consists 
of three polycistronic pieces of double-stranded RNA 
of 6374, 4057, and 2948 bp, respectively. The RNA is 
encased in an icosahedral polymerase complex sur- 
rounded by a capsid. This is in turn encased in a 
membrane that is about half phage-encoded proteins (including an adsorption-fusion complex) and half
lipid, with a lytic enzyme carried in between the 
membrane and capsid. It is pilus-specific and infects 
pseudomonads that are pathogenic to plants. After 
infection, the viral transcriptase transcribes all three 
segments. The largest is translated to form the polymerase-procapsid complexes, which take in one
of each of the three mRNAs and replicate them to 
double-stranded form. Transcription of the capsid 
and membrane proteins then occurs within the pro- 
capsid until it has been encased within the capsid. 

Coliphage PRD1 is a member of a broad family of 
lytic phages that infect various gram-negative bacteria 
which contain antibiotic-resistance conjugative plas- 
mids of type N, P, or W, attaching to the sex pilus. 
PRD1 has a 14 925 bp double-stranded DNA genome 
that encodes 22 genes and is encased in a membrane 
layer which is in turn surrounded by a protein shell. 
The membrane includes phage-encoded proteins in- 
volved in adsorption, DNA injection, and DNA 
packaging. The phage also encodes its own DNA poly- 
merase as well as an initiator protein that is bound 
covalently to the start of each DNA strand. The shell 
is formed and then lined with membrane (taken from 
the host plasma membrane) before it is filled with 
DNA; several hundred viral particles are liberated on 
cell lysis. 


Temperate Bacteriophages Involved in Bacterial Pathogenicity and Toxin Production

Lysogenization by specific phages carrying toxin 
genes and/or pathogenicity islands is involved in the 
conversion of a number of nonpathogenic bacteria 
to pathogens, as has been recognized increasingly in 
recent years. 

The genesis of cholera is discussed in the entry on 
filamentous phages (see Filamentous Bacteriophages). 


Shiga-like toxins of E. coli 

The Shigella dysenteriae toxins involved in causing 
bacterial dysentery are chromosomally encoded, but 
the related Shiga-like or Vero toxins SLT-I and SLT-II 
of E. coli are carried on phages related to lambda. 
The structural genes for SLT-I and SLT-II are in two 
different prophages in the enterohemorrhagic E. coli 
(EHEC) strains responsible for causing hemorrhagic 
colitis and the hemolytic uremic syndrome, so any 
given strain can produce either or both toxins. Both
kinds of phages have been induced and characterized; 
SLT-I converting phage H19 has a long, flexible non- 
contractile tail, while SLT-II converting phage 933W 
has a very short tail. In some cases, non-inducible 
defective prophages seem to be responsible for the 
toxin production. Both SLT-I and SLT-II bind to 
specific glycosphingolipid receptors on susceptible 
endothelial cells of the blood vessels of the colon 
and/or kidneys. This triggers receptor-mediated endo- 
cytosis followed by movement into the cytoplasm of a protein component that blocks protein synthesis. This occurs by removing a specific adenosine from the 28S ribosomal RNA; both the structure and the mechanism are related to those of ricin and similar plant toxins.


Clostridium botulinum 

The toxin produced when C. botulinum spores germinate and grow anaerobically affects the peripheral cholinergic system, leading to the neuromuscular paralysis typical of botulism and, in some cases, death within 24 h, generally from respiratory paralysis. Recently, there has been particular concern regarding infant botulism and the possibility that it may be involved in some cases of sudden infant death syndrome (SIDS), in which the organism may have been acquired from unpasteurized honey. Genes for two of the seven distinct but related neurotoxins that can be involved in botulism have been found in a family of phages of the Myoviridae type. These two toxin types, C1 and D, are the major causes of botulism in animals. The toxin types that affect humans seem to be chromosomally encoded, but the possibility that defective prophages are involved cannot be excluded.


Diphtheria toxin 

Corynebacterium diphtheriae infection of the upper respiratory tract can lead to the obstruction of the airways associated with diphtheria; while the infection is localized there, the toxin is distributed through the circulation and can also cause polyneuritis, myocarditis, and other systemic complications. Immunization successfully reduced the reported cases of diphtheria in the USA from over 200 000 in 1922 to only 22 between 1980 and 1987. Diphtheria antitoxin is the main treatment, along with antibiotics to eliminate the infection. The involvement of phages in toxigenicity was discovered in 1951. The best studied of the tox-carrying corynephages is the 34.7 kb temperate Siphoviridae phage β, which has the tox gene adjacent to the attachment site, consistent with its having originally been acquired through imprecise prophage excision. The toxin blocks protein synthesis through NAD-dependent ADP-ribosylation of elongation factor EF-2; a single molecule of diphtheria toxin is able to block protein synthesis in a cell within a few hours.


Pyrogenic staphylococcal enterotoxins and streptococcal exotoxins

The streptococcal pyrogenic exotoxins are responsible 
for the rash produced in scarlet fever, while related 
Staphylococcus aureus toxins are responsible for many 
of the symptoms of toxic shock syndrome and those of 
staphylococcal food poisoning. These toxins are the best characterized microbial superantigens, responsible for polyclonal activation of T cells, stimulation of macrophage cytokine production, and suppression of
the activation of B cells to produce antibodies. They 
also enhance sensitivity to the endotoxins of gram- 
negative bacteria. 


Phage Evolution 


There has long been interest in where viruses come 
from, how they acquire their special properties and 
genes, and how they relate to each other. In 1980, 
David Botstein suggested that lambdoid phages, at 
least, are put together in a sort of mix-and-match 
fashion from an ordered set of modules, each of 
which may have come from a particular host, plasmid, 
or other phage. It is now generally agreed that bac- 
teriophages are very ancient — as ancient as the bacteria 
that they infect. Within each large family of phages, a 
common general gene order is preserved, facilitating 
large-scale recombination among them; this has been 
particularly well studied in the lambdoid phages, 
many of which have been sequenced at the Pittsburgh 
Phage Institute. Harald Bruessow has provided simi- 
lar data for temperate phages of the gram-positive 
lactic acid bacteria. In addition, there is strong evidence 
for considerable intervirus recombination through 
simultaneous infection or recombination with pro- 
phages. This eventually can lead to unrelated temperate 
bacteriophages from distant bacterial groups posses- 
sing homologous genes. This has been particularly 
well demonstrated with Andrew Kropinski’s recent 
completion of the sequence of bacteriophage P22. 
Significant homologies are seen between P22 and not 
only other members of Podoviridae, but also those of 
Myoviridae and Siphoviridae — all of them temperate 
phages, but infecting a variety of different gram- 
negative bacteria. In addition, the pair of genes 
involved in O-antigen conversion are related to those 
of phage Sfx — a member of the Inoviridae. All of this 
supports the long-held concept that the temperate 
tailed phages, at least, are mosaics built of a series of 
modules or cassettes. The extent of the relationships 
and the degree of apparent randomness are particu- 
larly interesting. For example, while the holin gene is 
related to that of phage lambda, the lysozyme is not, 
even though the main purpose of the holin is to give 
the lysozyme access to the peptidoglycan layer at the 
time that lysis is to occur. 
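The mosaic picture, in which modules are exchanged while the overall gene order is conserved, can be sketched as a toy model. Everything here is hypothetical: the module names are illustrative labels, and exchange is modeled as an idealized swap of one module (as if by recombination in conserved flanking sequences). The holin/lysozyme split mirrors the P22 example above, where adjacent lysis genes have different ancestries.

```python
# Hypothetical module order shared by a family of temperate phages.
MODULE_ORDER = ["head", "tail", "integration", "repressor",
                "replication", "holin", "lysozyme"]


def make_phage(label):
    """A hypothetical phage whose every module variant is tagged with its origin."""
    return {module: f"{label}:{module}" for module in MODULE_ORDER}


def exchange_module(recipient, donor, module):
    """Replace one module with the donor's version, as if by recombination in
    conserved sequences flanking that module; the gene order is unchanged."""
    return {m: (donor[m] if m == module else recipient[m]) for m in MODULE_ORDER}


phage_x = make_phage("X")
lambda_like = make_phage("lambda-like")
hybrid = exchange_module(phage_x, lambda_like, "holin")
print(hybrid["holin"], "|", hybrid["lysozyme"])
```

Because every phage in the model keeps the same module order, any module boundary can serve as an exchange point, which is the feature the text credits with facilitating large-scale recombination.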

The generalizations made by the Pittsburgh group, suggesting that all double-stranded DNA phages are “mosaics with access, by horizontal exchange, to a large common gene pool”, seem likely to apply to a substantial degree to the temperate phages, but the case is much less clear for the large lytic phages. T4, with 168 903 bp, is the only such phage whose sequence has been completed. Only about 12% of the T4 genes show significant homologies with anything
from the databases other than genes of other large, 
lytic phages. Similar results have been seen for the 
substantial regions of the genome sequenced for 
other phages such as SPO1 and T5. The main hom- 
ologies seen are for enzymes involved in nucleic acid 
metabolism, and they clearly reflect ancient divergence, not recent acquisitions. For example, a detailed analysis of the relationship patterns for thymidylate synthase suggests that the T4 enzyme branched off shortly before the split between eukaryotes and bacteria.
This pattern is not just due to faster evolution among 
viruses, which generally seem to coevolve with their 
hosts; herpes viruses, for example, appear to branch 
off just before the separation between the human 
and rat-mouse lines. Inspection of the sequence 
alignment reveals that the T4 enzyme has several 
stretches that seem to be generally diagnostic of the 
eukaryotic enzymes, intermixed with others that seem to be common to, and unique to, bacteria. It also has
an N-terminal sequence otherwise seen only in the 
Archaea. The closest similarity with an E. coli enzyme 
is seen for the two components of the anaerobic ribo- 
nucleotide reductase; even here, detailed pattern 
analysis makes it clear that the separation occurred 
well before the divergence of E. coli and Haemophilus
influenzae. 

T4’s only similarity with phage lambda and with an
apparent cryptic prophage in E. coli involves the distal 
portion of the long tail fibers, a region where T4 shows 
no similarity to most other T-even phages, and where 
there is clearly a high level of selection for any event, 
however rare, that can lead to new host specificity. T4 
genes also share interesting homologies with enzymes 
from eukaryotic viruses, such as the DNA polymerase 
of Herpes and an RNA ligase/polynucleotide kinase 
from Baculovirus. 

Patrick Forterre has proposed an interesting hypothesis to explain the large number of genes in most large viruses
that have no homologs in the growing database 
of fully sequenced cellular genomes (Tran Thanh Van 
et al., 1992). He suggests that many of these “orphan” 
genes may be ancient relics from before the time of the 
“last common ancestor” of bacteria, Archaea, and 
eukaryotes; that these may have been somehow pre- 
served in some of the vast number of viral genomes, 
even though the cells from which they originally 
came may have been lost to evolution in that very 
narrow cellular bottleneck. Whether or not this is 
true, bacteriophages still clearly have a great deal to 
teach us in areas such as ecology, evolution, develop- 
ment, and gene regulation, in addition to supplying 
very valuable enzymes for biotechnology and pro- 
viding a promising option for dealing with antibiotic- 
resistant bacteria. 


Further Reading 

Hendrix R, Smith MC, Burns RN, Ford ME and Hatfull GF (1999) 
Evolutionary relationships among diverse bacteriophages 
and prophages: all the world’s a phage. Proceedings of the 
National Academy of Sciences, USA 96: 2192-2197. 
The baculovirus system is a system whereby an insect
virus, modified to contain a specific DNA sequence, is
used to infect insect cell cultures, in which the inserted
gene is overexpressed in order to produce large
quantities of semi-pure or pure protein.


See also: Gene Expression 
The traditional notion of genetic polymorphism assumed
that genes were either essentially monomorphic or
highly polymorphic. Now we know that there is a
continuum of genetic variation and that most, if not
all, genes harbor some genetic variation, the amount
and type of variation depending on the nature and the
strength of evolutionary forces: mutation, selection,
migration, and genetic drift. ‘Balanced polymorphism’
implies that the polymorphism is being maintained by
the interplay of two or more evolutionary forces acting
in opposite directions. Not all genetic polymorphisms
need be, or probably can be, balanced polymorphisms.
For example, as beneficial mutations increase in
frequency they necessarily produce genetic
polymorphisms alongside the original allele before
reaching fixation (100% frequency). This situation is
described as ‘transient polymorphism.’ Another form
of transient polymorphism develops when mutations
that are neutral or nearly neutral in their fitness
increase in frequency by chance (random genetic drift).
This is known as ‘mutation-drift’ balance. A large
number of genes affecting human diseases tend to show
a small amount of genetic variation, and the frequency
of the deleterious allele is governed by a balance
between mutation producing the deleterious allele and
natural selection eliminating it (‘mutation-selection’
balance). Whether the deleterious gene is dominant or
recessive makes a big difference in the equilibrium
population frequency of the deleterious gene.
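The effect of dominance on mutation-selection balance can be seen in a minimal deterministic sketch. It assumes an infinitely large, randomly mating population, one-way mutation A → a at rate u, and genotype fitnesses AA = 1, Aa = 1 − hs, aa = 1 − s, with selection followed by mutation each generation. The parameters u, s, and h are illustrative, not values from the text. Standard approximations put the equilibrium frequency near √(u/s) for a fully recessive allele (h = 0) and near u/s for a fully dominant one (h = 1).

```python
import math

def next_q(q, u, s, h):
    """One generation: viability selection, then mutation A -> a at rate u.

    q is the frequency of the deleterious allele a; genotype fitnesses are
    AA = 1, Aa = 1 - h*s, aa = 1 - s (h is the dominance coefficient).
    """
    p = 1 - q
    w_het = 1 - h * s
    w_bar = p * p + 2 * p * q * w_het + q * q * (1 - s)
    q_sel = (p * q * w_het + q * q * (1 - s)) / w_bar
    return q_sel + u * (1 - q_sel)     # new mutations arise on surviving A alleles

def equilibrium_q(u, s, h, generations=50_000):
    """Iterate from q = 0 until the allele frequency has settled."""
    q = 0.0
    for _ in range(generations):
        q = next_q(q, u, s, h)
    return q

u, s = 1e-5, 0.1                       # hypothetical mutation rate and selection
print(equilibrium_q(u, s, h=0.0), math.sqrt(u / s))   # recessive: about 0.01
print(equilibrium_q(u, s, h=1.0), u / s)              # dominant: about 0.0001
```

With these numbers the recessive allele settles at a frequency roughly a hundred times higher than the dominant one. This is because selection acts on a recessive allele only in the rare homozygotes.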

A third form of balanced polymorphism involves 
‘selection-migration’ balance for a locally deleterious
gene: selection reducing the frequency in the local 
population and migration (from nearby populations) 
replenishing it. Depending on the relative strength of 
selection and migration, the amount of genetic poly- 
morphism can be little or substantial. 
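Selection-migration balance can be sketched the same way. The model below is an assumption made for illustration. In a local population, the allele has fitness 1 − s relative to 1 (genic, haploid-style selection). After selection, a fraction m of the local gene pool is replaced by migrants carrying the allele at frequency Q. Linearizing the recursion for a rare allele gives an equilibrium near mQ/(m + s − ms), so the outcome depends on the ratio of migration to selection.

```python
def migration_selection_equilibrium(s, m, Q, generations=5_000):
    """Iterate selection against the allele, then immigration, to equilibrium."""
    q = Q                                   # start at the migrants' frequency
    for _ in range(generations):
        q_sel = q * (1 - s) / (1 - s * q)   # selection (mean fitness 1 - s*q)
        q = (1 - m) * q_sel + m * Q         # a fraction m of genes are immigrants
    return q

s, Q = 0.1, 0.5                             # hypothetical values
for m in (0.001, 0.01, 0.05):
    print(m, migration_selection_equilibrium(s, m, Q), m * Q / (m + s - m * s))
```

Weak migration leaves the allele rare. As m approaches s, the local polymorphism becomes substantial.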

The most interesting forms of balanced polymorphism
are those that show segregation of high-frequency
alleles and are suspected of being maintained by natural 
selection. This is the type of polymorphism which E.B. 
Ford had in mind when he defined polymorphism as: 


the occurrence together in the same locality of two or more
discontinuous forms of a species in such proportions that the 
rarest of them cannot be maintained merely by recurrent 
mutation. 


Three of the most interesting forms of selectively main- 
tained balanced polymorphisms are heterozygous 
advantage or overdominance, frequency-dependent 
selection, and multiple-niche polymorphism. 
Heterotic-balance polymorphisms develop when
the fitness of the heterozygotes is higher than
that of either homozygote. A classic case of balanced
polymorphism in human populations is that of sickle-cell
anemia. A mutation in the β-globin gene (β^S)
leads to an alteration in the hemoglobin protein such
that the homozygous (β^Sβ^S) genotype is effectively
lethal because individuals die of anemia. This would
lead to elimination of the β^S allele from populations
except that, in regions where there is malaria, normal
homozygotes (β^Aβ^A) suffer relatively more
mortality from malaria (caused by Plasmodium
falciparum) than heterozygous individuals (β^Aβ^S).
The latter enjoy the highest fitness, as they receive
protection from both anemia and malaria. The loss of
β^S alleles due to anemia is compensated (at equilibrium)
by the loss of β^A alleles from malaria, and thus both
alleles are maintained in a state of balanced polymorphism.
Such polymorphisms are found in many parts of the
world where there is malaria, such as Africa, the Middle
East, and India. Many of these countries have fairly high
(5-6%) frequencies of the sickle-cell allele. Eradication
of malaria would be expected to reduce the frequency
of the β^S allele in human populations, as appears to be
the trend in the black populations of the United States
of America.
Thus heterozygote advantage is a powerful mechan- 
ism for maintaining genetic polymorphisms, even for 
deleterious genes, and many of the debilitating human 
diseases (e.g., Tay-Sachs, Gaucher, and Niemann- 
Pick diseases in the Ashkenazi Jews) and some of 
the highly polymorphic blood group and enzyme 
genes (e.g., the ABO blood groups and glucose-6- 
phosphate dehydrogenase) are suspected of being 
cases of present or past selectively maintained balanced 
polymorphisms. 
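For the heterozygote-advantage case, a standard one-locus model assigns fitnesses AA = 1 − s, AS = 1, SS = 1 − t. Here s stands for the extra malaria mortality of normal homozygotes and t for the anemia mortality of sickle homozygotes. Setting the change in allele frequency to zero gives an equilibrium frequency of the sickle allele of s/(s + t). The sketch below checks this numerically. The value of s is hypothetical and is not fitted to any population. The value t = 1 reflects the text's statement that the homozygote is effectively lethal.

```python
def next_q(q, s, t):
    """One generation of viability selection; q is the sickle-allele frequency.

    Fitnesses: AA = 1 - s (malaria), AS = 1, SS = 1 - t (anemia).
    """
    p = 1 - q
    w_bar = p * p * (1 - s) + 2 * p * q + q * q * (1 - t)
    return (p * q + q * q * (1 - t)) / w_bar

def simulate(q, s, t, generations=1_000):
    for _ in range(generations):
        q = next_q(q, s, t)
    return q

s, t = 0.15, 1.0                            # hypothetical malaria cost; SS lethal
print(simulate(0.01, s, t), s / (s + t))    # both about 0.130
print(simulate(0.10, 0.0, t))               # no malaria: S declines toward 0
```

Without the malaria term (s = 0), the recursion reduces to q′ = q/(1 + q). The allele therefore declines only slowly, roughly as 1/n after n generations once it is rare. This is in line with the decline expected after malaria eradication.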

Another example of overdominant molecular poly- 
morphism may be that of the alcohol dehydrogenase 
(Adh) gene in natural populations of Drosophila
melanogaster. This gene segregates for two protein
electrophoretic alleles, Adh-F (Fast) and Adh-S (Slow),
which show north-south (latitudinal) clinal variation 
in populations from different continents. The DNA 
sequence studies show that the Adh-F allele is of 
recent origin and a lysine residue in the Adh-S allele 
has been replaced by a threonine residue in the Adh-F 
allele. The F protein shows more enzymatic activity 
and is produced in larger quantity. DNA sequencing 
studies of the representative samples of the two alleles 
have shown many silent site polymorphisms and a 
higher level of nucleotide variation at sites near to the 
amino acid altering mutational site than elsewhere in 
the gene. The latter observation is expected as linked 
polymorphic nucleotide sites cannot segregate freely 
and a nucleotide site under balancing selection within 
the gene, through linkage, will influence levels of 
polymorphism at the tightly linked sites. The Adh 
polymorphism is suspected of being maintained by 
heterozygote advantage. 

Another form of selection which can lead to bal- 
anced polymorphism is frequency-dependent selec- 
tion. Here the fitnesses of genes and genotypes are 
not constant (as is usually assumed in the case of 
heterotic balance) but rather vary inversely with
their frequencies, i.e., the larger the frequency
of a genotype, the lower its fitness and vice versa. In 
this form of natural selection a single genotype will 
never replace its competitors as increasing its fre- 
quency will lead to lowering its fitness and consequent 
decrease of its frequency. Thus rare genes will tend to 
increase and common genes decrease in frequency 
leading to a balanced polymorphism. Some well- 
known cases of frequency-dependent selection are 
rare-male mating advantage in Drosophila, self-
incompatibility alleles in plants, and bird predation
on colored moths and snails. Self-incompatibility
genes in plants control germination of pollen on the
female stigma and discriminate between self and non-
self. Successful pollination occurs only when pollen
and stigma carry different self-incompatibility alleles.
A rare allele is therefore favored, because its pollen is
compatible with almost every plant, and a population
would theoretically maintain multiple self-incompatibility
alleles. Several types of self-incompatibility genes are
known, and they provide examples of some of the most
highly polymorphic, multiallelic genes known, second
only to the major histocompatibility complex (MHC)
genes in humans.
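Self-incompatibility gives a concrete picture of frequency-dependent selection. The sketch below rests on several simplifying assumptions. It assumes gametophytic self-incompatibility, in which pollen carrying allele S_k fails on any plant that also carries S_k. It also assumes random pollen dispersal and no pollen limitation, so every plant sets the same amount of seed. The allele numbers and starting frequencies are hypothetical. Under these rules every plant is a heterozygote, so genotypes are stored as pairs (i, j).

```python
from itertools import combinations

def allele_freqs(genotypes, n):
    """Allele frequencies from {(i, j): frequency}; every plant is S_i S_j."""
    p = [0.0] * n
    for (i, j), g in genotypes.items():
        p[i] += g / 2
        p[j] += g / 2
    return p

def next_generation(genotypes, n):
    """One generation of random pollination under gametophytic SI.

    Pollen carrying allele k can fertilize only plants that lack S_k.
    Assumes every mother sets the same amount of seed (no pollen limitation).
    """
    pollen = allele_freqs(genotypes, n)   # each plant sheds its two alleles equally
    new = {pair: 0.0 for pair in combinations(range(n), 2)}
    for (i, j), g in genotypes.items():
        compatible = [k for k in range(n) if k not in (i, j)]
        total = sum(pollen[k] for k in compatible)
        if g == 0 or total == 0:
            continue
        for k in compatible:
            sired = g * pollen[k] / total      # share of this mother's seed sired by S_k
            for m in (i, j):                   # each maternal allele goes to half the seed
                new[tuple(sorted((m, k)))] += sired / 2
    norm = sum(new.values())
    return {pair: f / norm for pair, f in new.items()}

g = {(0, 1): 0.6, (0, 2): 0.3, (1, 2): 0.1}
for _ in range(60):
    g = next_generation(g, 3)
print(allele_freqs(g, 3))                      # each close to 1/3

rare = {(0, 1): 0.33, (0, 2): 0.33, (1, 2): 0.33, (0, 3): 0.01}
print(allele_freqs(rare, 4)[3], allele_freqs(next_generation(rare, 4), 4)[3])
```

Starting from unequal frequencies of three alleles, the frequencies converge to 1/3 each. With four alleles, a rare fourth allele roughly doubles in frequency in one generation, because its pollen is compatible with nearly every plant.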

An ecologically important form of balanced poly- 
morphism develops when a population’s environ- 
ment is heterogeneous in a way that favors different 
genotypes in different environments (multiple-niche
polymorphism). A combination of random mating, 
niche-specific genotypic fitness, and niche-specific 
contribution to total population numbers can lead to 
a balanced polymorphism. 
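A multiple-niche polymorphism can arise even when no genotype is superior in every niche. The sketch below follows the classic Levene model under assumed conditions. Mating is random across the whole population and selection acts within each niche. Each niche contributes a fixed fraction c of the adults, however many of its zygotes survived. All fitness values and niche fractions are hypothetical. Linearizing the recursion shows that a rare allele a increases when the sum over niches of c · w_Aa / w_AA exceeds 1.

```python
def after_selection(q, w_AA, w_Aa, w_aa):
    """Frequency of allele a among survivors within one niche."""
    p = 1 - q
    w_bar = p * p * w_AA + 2 * p * q * w_Aa + q * q * w_aa
    return (p * q * w_Aa + q * q * w_aa) / w_bar

def levene_step(q, niches):
    """niches: list of (c, (w_AA, w_Aa, w_aa)); c is the niche's share of adults."""
    return sum(c * after_selection(q, *w) for c, w in niches)

def run(q, niches, generations=5_000):
    for _ in range(generations):
        q = levene_step(q, niches)
    return q

favors_A = (1.0, 0.8, 0.6)   # hypothetical fitnesses; no heterozygote advantage
favors_a = (0.6, 0.8, 1.0)
print(run(0.1, [(0.5, favors_A), (0.5, favors_a)]))   # polymorphism: q -> 0.5
print(run(0.5, [(0.9, favors_A), (0.1, favors_a)]))   # allele a is lost
```

With equal niche sizes the sum is 0.5 × 0.8 + 0.5 × (0.8/0.6) ≈ 1.07, and both alleles persist. When the niche favoring A supplies 90% of the adults, the sum falls to about 0.85 and a is lost.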

There are many other forms of selection, such as 
seasonal selection within a year, cyclic selection over 
generations, and selection between different life 
stages, that can lead to balanced polymorphisms. 
However, it is important to realize that balanced
polymorphism, especially in the case of heterotic
balance, is maintained through loss of fitness (lower
viability or fertility) of disadvantaged individuals,
which leads to more loss in population number than
would be the case without balanced polymorphism.
Therefore it has been argued that there must be a limit
to the number of gene loci that can be maintained by
balancing selection. Unlike the polarized view of 50
years ago, the problem of the amount of genetic
variation is now treated separately from the problem
of its maintenance.
There appears to be more genetic variation in most 
populations than can be maintained by balancing 
selection alone. Mutation rate and population size, in 
addition to selection, are important variables affecting 
genetic variation. 


See also: Frequency-Dependent Selection; Nearly 
Neutral Theory; Neutral Theory; Sickle Cell 
Anemia 
A chromosomal translocation is the result of the
exchange of chromosomal segments between
nonhomologous chromosomes. A translocation is
called ‘balanced’ or ‘reciprocal’ if there is no overall 
loss or gain of genetic material. If the translocation is 
present in every cell in the body, it is called a constitu- 
tional translocation. When a chromosomal change is 
found in every cell, it indicates that it was either pre- 
sent at the time of fertilization or occurred very early 
in development. If the chromosomal change is found 
in only certain cells or tissues, it is referred to as an 
acquired or somatic translocation. In this case, the 
translocation could have occurred later in develop- 
ment or in the adult. An acquired or somatic trans- 
location might not be present in the germ cells (sperm 
and egg cells) and thus is not necessarily passed on 
to the next generation. Balanced translocations have 
been seen in a wide range of organisms, including 
fungi, plants, insects, and mammals. 

Balanced or reciprocal translocations have been 
found in individuals with no medical problems, in 
individuals with specific diseases, and in individuals 
with abnormalities but no specific diagnosis. The 
different outcomes are related to the location of 
the breakpoints of the chromosomal exchange. A 
balanced translocation may have a detrimental effect 
if the rearrangement disrupts an important gene (such
as the genes responsible for neurofibromatosis or
retinoblastoma). This could
be due to the direct disruption of the coding sequence, 
separation of regulatory regions from the transcrip- 
tion unit or a position effect by the new chromosomal 
environment. It is also possible that an apparently 
reciprocal translocation is not truly balanced. For 
example, a translocation may appear balanced by 
cytogenetic analysis, but, in fact, could have a net 
loss or gain of material at the molecular level. This is 
because most translocations are ascertained using 
traditional cytogenetic approaches, which have a 
limit of resolution of about 4 megabases (4 × 10⁶ bp).
In most cases the breaks occur in regions where there 
are no genes (only 10-15% of the genetic material 
codes for genes) and in these cases there probably 
will be no clinical abnormalities associated with the 
translocation. 

Studies in humans show that approximately 1 in 
500 individuals have a balanced constitutional trans- 
location (Van Dyke et al., 1983; Hook et al., 1984). A 
constitutional translocation can either be inherited 
from a parent (familial) or occur de novo. It has been 
noted that the risk of associated medical problems is 
higher in individuals who carry a de novo transloca- 
tion (Warburton, 1991). The most likely reason for 
this difference is that normal, fertile individuals who 
carry a translocation and pass it to their children 
probably have a translocation that does not disrupt a 
gene. On the other hand, an individual with a de novo 
translocation that causes serious medical problems is
less likely to reproduce and thus to pass the
translocation to his or her offspring.

Most reciprocal translocations are unique to an 
individual or family. In humans, translocations involv- 
ing different regions of all of the autosomes and the 
sex chromosomes have been reported. The only known 
recurrent balanced translocation occurs between 
chromosomes 11 and 22, which has been documented 
in over 150 families. Recently, both breakpoint 
junctions from the t(11;22) were characterized
molecularly and found to occur within palindromic 
AT-rich regions (Kurahashi et al., 2000). Results from 
cloning and sequencing other balanced translocation 
breakpoints have shown that for most unique constitu- 
tional translocations there is little sequence homology 
at the breakpoint region of the two involved
chromosomes. Further, in most cases only a small
number of nucleotides are lost or gained.

The major medical problems balanced transloca- 
tion carriers encounter relate to reproduction. In 
some cases, carriers have reduced fertility and/or 
an increase in the number of spontaneous abortions.
Carriers are also at risk of having offspring that are 
abnormal due to malsegregation of translocation 
chromosomes during meiosis. To understand how 
a balanced translocation can lead to chromosomal 
imbalance in the offspring of carriers one must ex- 
amine the steps of meiosis, which occurs during the 
formation of gametes. Normally, meiosis requires the 
pairing of homologous chromosomes to form biva- 
lents. When there is a balanced translocation, the two 
translocation chromosomes and the two normal 
chromosomes must align along the regions that are 
homologous. The resulting structure is called a 
quadrivalent (see Figure 1). There are multiple ways 
in which the chromosomes may segregate from the 
quadrivalent. For example, if the chromosomes segre- 
gate in such a way that the gametes receive either the 
two normal chromosomes or both translocation 
chromosomes, then the genetic material is balanced 
and the offspring will be phenotypically normal
(although some will be balanced carriers). This is called
alternate segregation. However, if the normal chromosome
segregates with one of the translocation chromosomes 
(adjacent-1 segregation), the zygote would have partial 
trisomy and partial monosomy for the relevant seg- 
ment of the involved chromosomes. The resulting 
pregnancy could result in early miscarriage or off- 
spring with multiple abnormalities. There are other 
possible outcomes from resolution of the quadrivalent. 
Since each translocation is unique, it is difficult to pre- 
dict accurately the frequencies of the different events. 
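The bookkeeping behind alternate and adjacent segregation can be checked by listing the chromosome segments each gamete receives. In the sketch below, a hypothetical reciprocal translocation between chromosomes 1 and 2 is modeled by splitting each chromosome into two parts: a centromeric segment (A1, A2) and an exchanged distal segment (B1, B2). N1 and N2 are the normal chromosomes and T1 and T2 the translocation chromosomes; all these names are invented for illustration. Only 2:2 segregation is enumerated. The case in which homologous centromeres go to the same pole is labeled with its conventional name, adjacent-2. Segregation of the 3:1 type is not modeled.

```python
from collections import Counter
from itertools import combinations

# Hypothetical t(1;2) carrier: name -> (centromere, segments on the chromosome).
CHROMOSOMES = {
    "N1": ("1", ["A1", "B1"]),   # normal chromosome 1
    "N2": ("2", ["A2", "B2"]),   # normal chromosome 2
    "T1": ("1", ["A1", "B2"]),   # translocation chromosome with centromere 1
    "T2": ("2", ["A2", "B1"]),   # translocation chromosome with centromere 2
}
ALL_SEGMENTS = Counter(["A1", "B1", "A2", "B2"])   # content of a normal gamete

def segments_of(gamete):
    return Counter(seg for name in gamete for seg in CHROMOSOMES[name][1])

def classify(gamete):
    """Label a two-chromosome gamete by segregation mode."""
    if segments_of(gamete) == ALL_SEGMENTS:
        return "alternate (balanced)"
    centromeres = {CHROMOSOMES[name][0] for name in gamete}
    if len(centromeres) == 2:                      # non-homologous centromeres together
        return "adjacent-1 (unbalanced)"
    return "adjacent-2 (unbalanced)"               # homologous centromeres together

def imbalance(gamete):
    """Segments present in an extra copy, and segments missing."""
    segments = segments_of(gamete)
    extra = sorted((segments - ALL_SEGMENTS).elements())
    missing = sorted((ALL_SEGMENTS - segments).elements())
    return extra, missing

for gamete in combinations(CHROMOSOMES, 2):
    print(gamete, classify(gamete), imbalance(gamete))
```

Of the six possible two-chromosome gametes, only the two alternate products carry every segment exactly once. An adjacent-1 gamete such as N1 + T2 carries B1 twice and lacks B2. After fertilization by a normal gamete, this gives the partial trisomy and partial monosomy described in the text.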

Figure 1 Gametes resulting from meiosis in a
balanced translocation carrier. The top of the figure
illustrates the quadrivalent formed at meiosis during
pairing of the chromosomes involved in the transloca-
tion. Shown at the bottom are the gametes most
frequently formed at the end of meiosis. Illustrated on
the left are the gametes resulting from ‘alternate
segregation’ (when nonadjacent chromosomes segre-
gate together). The products of ‘alternate segregation’
are genetically balanced. On the right are shown the
gametes resulting from ‘adjacent-1 segregation’ (when
chromosomes next to one another segregate together).
These gametes will produce unbalanced offspring that
have partial trisomy and partial monosomy for the
involved chromosomes. There are additional modes of
segregation, such as 3:1 segregation, which are not
shown.

Acquired balanced translocations are most often
seen in association with cancer cells (reviewed by
Rabbitts, 1994). The best-known example of this is
the Philadelphia chromosome (Ph¹), which is found
in about 90% of patients with chronic myeloid leukemia.
The Ph¹ chromosome is the result of a reciprocal
translocation between chromosomes 9 and 22. The
rearrangement fuses two genes: the ABL oncogene
on chromosome 9 and the BCR gene on chromosome
22. The chimeric protein formed is a tyrosine kinase
with unique properties, notably constitutive
(unregulated) activity.

Balanced translocations from a variety of organ- 
isms have been useful laboratory tools. Whole organ- 
isms and cell lines with these rearrangements have 
been used in a wide range of genetic investigations, 
including gene mapping, studies on the effect of 
duplications and deficiencies of defined chromosomal 
regions, and analysis of chromosome position effects.
Chromosomal rearrangements, such as balanced 
translocations, are also thought to play an important 
role in the evolution of new species. 


References 

Hook EB, Schreinemachers DM, Willey AM and Cross PK
(1984) Inherited structural cytogenetic abnormalities
detected incidentally in fetuses diagnosed prenatally:
frequency, parental-age associations, sex-ratio trends,
and comparisons with rates of mutants. American Journal
of Human Genetics 36: 422-443.

Kurahashi H, Shaikh TH, Hu P et al. (2000) Regions of genomic
instability on 22q11 and 11q23 as the etiology for the
recurrent constitutional t(11;22). Human Molecular Genetics 9:
1665-1670.

Rabbitts TH (1994) Chromosomal translocations in human 
cancer. Nature 372 (6502): 143-149. 

Van Dyke DL, Weiss L, Roberson JR and Babu VR (1983) 
The frequency and mutation rate of balanced autosomal 
rearrangements in man estimated from prenatal genetic 
studies for advanced maternal age. American Journal of 
Human Genetics 35: 301-308.

Warburton D (1991) De novo balanced chromosome re- 
arrangements and extra marker chromosomes identified 
at prenatal diagnosis: clinical significance and distribution 
of breakpoints. American Journal of Human Genetics 49: 
995-1013. 


See also: Philadelphia Chromosome 


BALB/c Mouse 


L Silver 


Copyright © 2001 Academic Press 
doi: 10.1006/rwgn.2001.0109 


A well-known inbred strain of albino mice used exten- 
sively in immunological studies. 


See also: Inbred Strain 


Balbiani Rings 
J C J Eeken 


Copyright © 2001 Academic Press 
doi: 10.1006/rwgn.2001.0110 


Balbiani rings (BRs) are giant puffs in the
polytene chromosomes of the salivary gland cells of
larvae of Chironomus tentans, first described in
1881 by Balbiani. They are sites on the chromosome
with a very high RNA production. Chironomus 
tentans is a holometabolous insect (midge) whose 
egg, larval, and pupal stages develop in water. The 
larvae live under water in a thin fibrous protein tube, 
a protective and food-gathering funnel, the compon- 
ents of which are continuously produced by the 
salivary gland cells. The amount of protein each
salivary gland cell produces is enormous (in 24 h it
produces and exports an amount of BR-encoded protein
equal to its own total protein content). Approximately
15 different secretory proteins can be extracted. In 
vitro these proteins form soluble complexes that are 
capable of assembly, disassembly, and reassembly. In 
vivo, however, the luminal contents are pumped on 
demand through the salivary gland duct and leave the 
animal’s mouth as an insoluble silk fiber. The four 
largest of the proteins produced in the salivary gland 
cells (also called ‘silk-proteins’) are encoded by single 
copy genes in the Balbiani rings. The typical morpho- 
logical chromosome structure forming the Balbiani 
rings is the result of the very high transcription rate 
of these genes. The majority of the genes encoding
the secretory proteins, including the Balbiani ring
genes, are clearly related to one another. They are
characterized by internal sequence repetition. The four
genes creating the Balbiani rings during transcription
are approximately 35-40 kb in length. All four genes
contain several exons, one of which is very large and
consists entirely of 130-150 copies of a nearly
identical sequence. This repeated sequence encodes
between 60 and 90 amino acid residues, depending on
the particular gene. The first part of this sequence,
the C (constant) region, is highly conserved and
encodes a peptide able to form an α-helix. The second
part of the repeat, the subrepeat (SR) region, is itself
built up of repeats of 9 to 33 bp, typically containing
a -Pro- motif. As a result, the SR region encodes a
peptide that can also form a helical structure.

Because of the large size of the Balbiani ring
structures and of the salivary gland cells in which
they are found, the Balbiani ring genes have been used
in studies of the structure of actively transcribing
chromatin, packaging of pre-mRNA into RNA-protein
particles, the process of splicing, nuclear pore
passage, and polysome structure. The molecular
structure of the genes themselves makes them
particularly useful as a model for the evolution
of genes by sequence duplication and intragenic
reduplication.


Further Reading 
Daneholt B (1997) A look at messenger RNP moving through 
the nuclear pore. Cell 88: 585-588. 


Wieslander L (1994) The Balbiani ring multigene family: coding 
repetitive sequences and evolution of a tissue-specific cell 
function. Progress in Nucleic Acid Research and Molecular 
Biology 48: 275-313. 


See also: Chromosome Structure; Polytene 
Chromosomes 
The inactive X chromosome of XX females is visible
in the nucleus as a Barr body. This is not seen in the 
nuclei of XY males, or of XO females (Turner syn- 
drome patients). 


See also: Sex Chromatin; X Chromosome; 
X-Chromosome Inactivation 


Basal Cell Carcinoma 


A Balmain 


Copyright © 2001 Academic Press 
doi: 10.1006/rwgn.2001.1548 


Basal cell carcinoma of the skin is one of the most 
common forms of human cancer, occurring predomin- 
antly in individuals exposed to sunlight. Incidence is 
relatively higher among males than females, and
increases at latitudes closer to the equator. The most
frequent site of occurrence is on the sun-exposed skin
of the face or limbs, but other sites can also be affected. 
These lesions do not generally metastasize, but can be 
locally invasive and highly disfiguring due to spread 
from the primary tumor site. Surgical removal is the 
most common treatment, but radiotherapy or chemo- 
therapy can also be used. Some individuals are highly 
susceptible to basal cell carcinoma development due to 
an inherited mutation in the gene known as ‘Patched’ 
or PTC. Families in which this mutation is carried 
from one generation to the next are highly tumor- 
prone, but the same gene is mutated in a high propor- 
tion of sporadic basal cell carcinomas in individuals 
with no family history of the disease. 


Further Reading 
MacKie RM (1989) Skin Cancer. London: Martin Dunitz. 


See also: Cancer Susceptibility 
Base analog mutagens are chemicals that mimic bases
to such an extent that they can be incorporated into 
DNA in place of one of the normal bases but in doing 
so lead to an increase in the rate of mutation. To 
be mutagenic, a base analog must mispair more fre- 
quently than the normal base it replaced. This mis- 
pairing can occur either during the initial incorporation 
into DNA, or during subsequent rounds of replication 
when the base analog is used as a template. Most of 
these mutagens typically induce only base pair sub- 
stitutions (and not other types of mutation). They are 
usually not highly toxic, nor do they increase rates of
recombination.

Base analog mutagenesis normally causes transi- 
tions, and not transversions. This would seem to be a 
natural outcome of the fact that the mechanism 
involves formation of base pairs, which even when 
not natural almost always involve a purine and a 
pyrimidine. However, different base analogs behave 
differently as two examples will illustrate. 

The base analog 5-bromouracil (BU) is efficiently 
incorporated into DNA in place of thymine (T). It is 
mutagenic because BU is more often in the enol form 
than is T, and in this form it can base pair with guanine 
(G). (BU is also more reactive to ultraviolet light than 
T, and this may also increase the level of mutation.) 
Experiments have shown that BU typically seems to 
cause GC → AT transitions. This can be explained
by postulating that BU typically mispairs when it is 
originally incorporated, rather than during subse- 
quent rounds of replication. On the other hand, 2- 
aminopurine (AP), an analog of adenine (A), is poorly 
incorporated into DNA, but is highly mutagenic 
because in its normal state it can base pair with both 
T and with cytosine (C). AP seems to lead mostly to
AT → GC transitions, which would occur if it usually
paired with C not during incorporation, but during 
subsequent rounds of replication. 
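The two pathways just described can be traced step by step. The sketch below follows a single lineage through three rounds of replication. Its pairing rules encode the text's postulates. BU is taken to pair with G in its enol form at incorporation and with A afterwards. AP is taken to pair with T at incorporation and with C when it later serves as a template. The function name trace is hypothetical.

```python
# Watson-Crick partners for the normal bases.
WATSON_CRICK = {"A": "T", "T": "A", "G": "C", "C": "G"}

def trace(pair, steps):
    """Follow one lineage of a base pair through successive replications.

    pair:  (base, partner) at the start.
    steps: one (template_index, rule) per round; the base at pair[template_index]
           serves as template and `rule` overrides which base is inserted
           opposite it (otherwise normal Watson-Crick pairing applies).
    """
    history = [pair]
    for template_index, rule in steps:
        template = pair[template_index]
        pair = (template, rule.get(template, WATSON_CRICK.get(template)))
        history.append(pair)
    return history

# BU: mispairs with G when incorporated (enol form), then pairs normally with A.
bu = trace(("G", "C"), [(0, {"G": "BU"}), (1, {"BU": "A"}), (1, {})])
# AP: incorporated normally opposite T, then mispairs with C as a template.
ap = trace(("A", "T"), [(1, {"T": "AP"}), (1, {"AP": "C"}), (1, {})])

print(" -> ".join(":".join(p) for p in bu))   # G:C -> G:BU -> BU:A -> A:T
print(" -> ".join(":".join(p) for p in ap))   # A:T -> T:AP -> AP:C -> C:G
```

In both lineages a purine:pyrimidine pair is replaced by the other purine:pyrimidine pair. This is why the outcome is a transition rather than a transversion.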

Many base analog mutagens show specificity for 
the organism in which they induce mutations. This 
could be because of the varying efficiency of different 
organisms in taking up the bases (or the nucleosides), 
in converting them into the nucleoside triphosphates, 
or in using the analog-containing nucleoside triphos- 
phates as substrates for their DNA polymerases. 
While AP (and also 2,6-diaminopurine) is an effective 
mutagen when used in bacteria, it is ineffective against 
eukaryotes. However, 6-hydroxyaminopurine can 
be used in bacteria but is most effective against
eukaryotes. The base analog mutagen
4-hydrazino-2-oxopyrimidine (N⁴-aminocytosine) is about equally
effective against a wide range of organisms. 

Of course, some useful base analogs are not mutagens.
The base analog azidothymidine
(3′-deoxy-3′-azidothymidine, AZT) is not mutagenic, but when
converted into the triphosphate this compound 
can inhibit retroviral reverse transcriptase. If it is in- 
corporated into DNA it leads to chain termination. 


See also: Purine; Pyrimidine; Transition; 
Transversion Mutation 
The genetic material of all cellular organisms is
double-stranded DNA. Because of base pairing, the
amount of adenine (A) will equal the amount of
thymine (T) and the amount of guanine (G) will equal
the amount of cytosine (C). Although the G + C content
and the A + T content must of necessity sum to 100%,
the ratio of G + C to A + T in the genome can
vary quite widely from one organism to another.
Although this can be reported as a ratio, or as the 
amount of either base pair as a fraction of the total, 
it is more typical to refer to the base composition of 
the DNA of an organism simply in terms of G + C 
content or per cent GC content. 
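Per cent GC can be computed directly from a sequence. Because A pairs with T and G with C, the partner strand has exactly the same GC content, so one strand suffices to characterize the duplex. The minimal sketch below assumes an A/C/G/T string in upper or lower case; other characters, such as N, are ignored.

```python
def gc_content(seq: str) -> float:
    """Per cent G + C among the unambiguous bases (A, C, G, T) of a DNA sequence."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    return 100.0 * gc / (gc + at)

COMPLEMENT = str.maketrans("ACGT", "TGCA")

strand = "ATGCGCGA"
other = strand.translate(COMPLEMENT)[::-1]   # partner strand, read 5' to 3'
print(gc_content(strand), gc_content(other))  # 62.5 62.5
print(100 - gc_content(strand))               # A + T content: 37.5
```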

The greatest variation in overall per cent GC con- 
tent is seen in the genomes of different species of 
prokaryotes and the lower eukaryotes, such as the 
algae and protists. Among these organisms GC con- 
tent can vary between 25% and 75%. For example, 
among the prokaryotes, the bacterium Mycoplasma 
capricolum has a GC content of 25%, the bacterium 
Borrelia burgdorferi, 29%, the archaeon Methanococcus
jannaschii, 31%, the bacterium Staphylococcus aureus,
33%, the bacterium Helicobacter pylori, 39%, the
bacterium Bacillus subtilis, 44%, the bacterium 
Escherichia coli, 51%, the bacterium Mycobacterium 
tuberculosis, 66%, and the bacterium Micrococcus 
luteus, 75%. The GC content of the genomes of higher 
eukaryotes shows much less variation; the vertebrates 
have a GC content of 42-44%, and many inverte- 
brates and plants have GC contents in the range of 
35-45%. 

The variation in GC content among the prokaryotes
reflects phylogenetic relationships. For
example, the gram-positive bacteria with low GC
content, such as Mycoplasma capricolum and S. aureus, 
are related to each other, as are the gram-positive 
bacteria with high GC content, such as Mycobacterium
tuberculosis and Micrococcus luteus. All members of 
the genus Streptomyces have a high GC content 
(70-74%). However, the pressures that have led to 
this variation are unknown. Interestingly, differences 
in base composition exist in individual strands of 
DNA. For example, in bacteria those regions of DNA 
that are replicated as leading strands seem to have a 
greater abundance of G than those regions that are 
replicated as lagging strands. 
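Strand asymmetry of this kind is usually quantified as GC skew, (G − C)/(G + C), computed in windows along one strand. In many bacterial genomes the cumulative skew changes direction near the replication origin and terminus. The minimal sketch below uses a hypothetical window size and test sequence.

```python
from itertools import accumulate

def gc_skew(seq, window):
    """(G - C) / (G + C) for consecutive non-overlapping windows of one strand."""
    skews = []
    for start in range(0, len(seq) - window + 1, window):
        w = seq[start:start + window].upper()
        g, c = w.count("G"), w.count("C")
        skews.append((g - c) / (g + c) if g + c else 0.0)
    return skews

skew = gc_skew("GGGCATAT" + "CCCGATAT", 8)
print(skew)                     # [0.5, -0.5]
print(list(accumulate(skew)))   # cumulative skew: [0.5, 0.0]
```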

Although the overall GC content can vary widely 
in prokaryotes, within a particular genome the com- 
position is fairly uniform. As would be expected, 
codon bias (the preferential use of synonymous 
codons) strongly reflects the base composition of the 
genome and varies dramatically between GC-rich and 
AT-rich organisms. However, there is some heterogen- 
eity in GC composition within a prokaryotic genome. 
Some of these differences clearly relate to the func- 
tion of the DNA. The ribosomal RNA operons of 
the hyperthermophilic bacteria have relatively high 
GC content, even if the overall GC content of a 
particular genome is low. For instance, while the over- 
all GC content of the bacterium Aquifex aeolicus is 
43%, the GC content of the ribosomal RNA operons 
is 65%. However, some differences in base composi- 
tion of individual prokaryotic genes or regions of the 
chromosome seem to relate to whether a particular 
gene was acquired from another organism by hori- 
zontal transfer. The GC content of such regions will 
reflect to some degree the GC content of the donor 
organism. 
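A simple way to look for regions whose composition departs from the genome average is to scan non-overlapping windows. Such a departure is one of the signals for horizontally transferred DNA. The window size, threshold, and test sequence below are arbitrary choices for illustration. A GC deviation on its own is only suggestive, because transferred DNA reflects the donor's composition only to some degree. The sketch assumes every window contains at least one A, C, G, or T.

```python
def gc_percent(seq):
    """Per cent G + C among A, C, G, T characters."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    return 100.0 * gc / (gc + at)

def atypical_windows(seq, window, threshold):
    """Non-overlapping windows whose GC content differs from the whole
    sequence's GC content by more than `threshold` percentage points."""
    overall = gc_percent(seq)
    hits = []
    for start in range(0, len(seq) - window + 1, window):
        gc = gc_percent(seq[start:start + window])
        if abs(gc - overall) > threshold:
            hits.append((start, gc))
    return hits

# Hypothetical genome: a GC-rich insert flanked by 50% GC sequence.
genome = "ATGC" * 50 + "GGCC" * 25 + "ATGC" * 50
print(gc_percent(genome))                   # 60.0
print(atypical_windows(genome, 100, 15))    # [(200, 100.0)]
```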

Although genome-to-genome comparisons of GC
content among most eukaryotes show less variation
than is seen among prokaryotes, regional differences
within eukaryotic genomes are common. Many eukaryotic
genomes contain long segments of DNA (50 to
over 300 kb) which have relatively homogeneous base 
composition, called isochores. In humans these iso- 
chores have GC contents ranging from 30% to 60%. 
The GC-rich segments are also gene rich. (It must be 
remembered that in the higher eukaryotes most of the 
genome is noncoding; in humans it is estimated that 
97% of the genome is noncoding.) Although these 
regional differences are a prominent component of 
genomes from yeast to humans, they are not found 
in all eukaryotes. The GC content of the nematode 
Caenorhabditis elegans is a relatively constant 36% 
among all chromosomes. 


See also: Base Pairing and Base Pair Substitution; 
Codon Usage Bias; DNA; DNA Replication 
A base pair (bp) is a partnership of adenine (A) with
thymine (T) or of cytosine (C) with guanine (G) in a DNA
duplex. In RNA, the pairs are adenine and uracil (U) 
and guanine and cytosine. 
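These pairing rules can be expressed as a check that two strands, each written 5′ to 3′, form a fully complementary antiparallel duplex. This is a minimal sketch, and it deliberately ignores the G·U wobble pairs that also occur in RNA.

```python
DNA_PAIRS = {("A", "T"), ("T", "A"), ("G", "C"), ("C", "G")}
RNA_PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}

def is_duplex(strand1, strand2, rna=False):
    """True if strand2 pairs base-for-base with strand1 in an antiparallel
    duplex (both strands written 5' to 3') under Watson-Crick rules."""
    pairs = RNA_PAIRS if rna else DNA_PAIRS
    if len(strand1) != len(strand2):
        return False
    return all((a, b) in pairs
               for a, b in zip(strand1.upper(), reversed(strand2.upper())))

print(is_duplex("ATGC", "GCAT"))            # True
print(is_duplex("AUGC", "GCAU", rna=True))  # True
print(is_duplex("ATGC", "GCAT", rna=True))  # False: T does not occur in RNA
```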


See also: Codon Usage Bias; Genetic Code 
The fidelity of DNA replication is essential for the
accurate transmission of genetic information. Errors 
made by DNA polymerases during DNA replication, 
if unrepaired, would result in base substitution muta- 
tions. Since the discovery of the three-dimensional 
structure of DNA by Watson and Crick, the mechan- 
ism of base mispair formation has been the subject of 
intense laboratory investigation. The significance of 
base mispair formation extends beyond mispairs 
formed by the normal DNA bases. It is now well 
established that many carcinogenic agents may alter 
the chemical structures of DNA bases, thus facilitat- 
ing base mispair formation, mutagenesis and carcino- 
genesis. 

In 1953, Watson and Crick proposed a model for 
the three-dimensional structure of DNA. This model 
comprised two antiparallel DNA strands joined by 
hydrogen bonding between the bases on the interior 
of the duplex, with the negatively charged phosphate 
backbone on the exterior of the duplex. Complemen- 
tary base pairs are formed between adenine-thymine 
(Figure 1A) and guanine-cytosine (Figure 1B) base
residues. The two complementary strands of the 
duplex are held together by a combination of hydro- 
gen bonding, between complementary paired bases on 
opposite strands, and base-stacking interactions, pri- 
marily involving neighboring bases on the same 
strand. The magnitudes of both hydrogen bonding 
and base-stacking interactions depend upon the geometry
of the interacting bases. Both hydrogen bonding
and base-stacking interactions are optimized when
the bases are in Watson-Crick geometry. The Watson-Crick
structural model for the normal, complementary
base pairs in duplex DNA has been confirmed in
numerous experimental studies.

While formulating their model for the DNA helix, 
Watson and Crick encountered a problem in joining 
the bases correctly because thymine and guanine were 
drawn incorrectly in the enol tautomeric form in the 
biochemistry textbooks of the time. As shown in
Figure 1, a proton may be placed either on a ring
nitrogen atom (keto form, Figure 1C) or on an exocyclic
oxygen atom (enol form, Figure 1D). The placement
of the proton is critical because it changes the way in
which the base forms complementary
hydrogen bonds. Although the keto and enol forms
are in equilibrium with one another, the keto forms
of thymine and guanine predominate by a ratio of
approximately 100,000 to 1.

Once the predominant keto forms of thymine and 
guanine were recognized, Watson and Crick quickly 
converged on the structures of the complementary 
base pairs and duplex DNA. However, having recog- 
nized the potential ambiguity generated by the exist- 
ence of alternative tautomeric forms, Watson and 
Crick proposed a model for the spontaneous forma- 
tion of base mispairs during DNA replication. In the 
incorrect tautomeric forms, thymine could mispair 
with guanine (Figure 1E) and cytosine with adenine.
The concept of incorrect or “rare” tautomeric forms as 
a basis for spontaneous transition mutations took hold 
and has appeared since in most biochemistry text- 
books, in spite of a paucity of confirmatory experi- 
mental data. It is therefore important to consider 
alternative base mispair structures. 

In addition to the exchange of protons between ring 
nitrogens and exocyclic nitrogen or oxygen atoms, 
protons may also exchange with solvent in aqueous 
solution, generating ionized forms of the bases 
(Figures 1F–1I). The ionization constants
(pKa and pKb values) for the normal DNA bases
are approximately three pH units away from physiological
pH. Therefore, under physiological conditions, the
neutral forms of the bases would predominate over the
ionized forms by a ratio of approximately 1000 to 1.
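The 1000-to-1 figure follows from the Henderson-Hasselbalch equation. Each pH unit between the ionization constant and the solution pH shifts the ratio tenfold, so three units give 10³. The sketch below uses round, hypothetical pKa values placed exactly three units from pH 7. They are not measured constants for any particular base. The sketch then compares the result with the keto:enol ratio of about 100,000 to 1 to recover the hundredfold difference discussed next.

```python
def neutral_to_ionized(pka: float, ph: float, loses_proton: bool) -> float:
    """Henderson-Hasselbalch ratio of neutral to ionized form for a single site.

    loses_proton=True:  neutral form is the acid, HA <-> A- + H+,
                        so [HA]/[A-] = 10**(pKa - pH).
    loses_proton=False: ionized form is the protonated cation, BH+ <-> B + H+,
                        so [B]/[BH+] = 10**(pH - pKa).
    """
    return 10 ** (pka - ph) if loses_proton else 10 ** (ph - pka)


KETO_TO_ENOL = 1e5  # approximate tautomer ratio quoted in the text

# Hypothetical constants three pH units from physiological pH (taken as 7).
deprotonation = neutral_to_ionized(pka=10.0, ph=7.0, loses_proton=True)
protonation = neutral_to_ionized(pka=4.0, ph=7.0, loses_proton=False)
print(deprotonation, protonation)
print(KETO_TO_ENOL / deprotonation)  # how much more common ionized forms are than rare tautomers
```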

Although ionized forms of the bases may similarly 
generate base mispairs, and may exist at a frequency 
100 times greater than rare tautomeric forms, the con- 
cept of ionized base pairs in DNA was originally 
dismissed. Prior to the model presented by Watson 
and Crick, Linus Pauling proposed a model for duplex 
DNA in which the phosphate backbones of the two 
strands formed salt bridges with one another on the 
interior of the helix, and the base residues were on 
the exterior of the helix. However, Watson and Crick 
were aware of titration data which indicated that the 
sites of ionization of the DNA bases were substantially
more difficult to ionize in duplex DNA. This
critical piece of data convinced Watson and Crick that
the sites of base ionization, which are also the sites of
hydrogen bond formation, must be in the center of the
helix in duplex DNA and not protruding into solution
as proposed by Pauling. The observation that base pair
formation suppresses ionization in duplex DNA then
led to the concept that base ionization would inhibit
rather than promote base pair formation. Twenty-five
years later, this concept would undergo considerable
revision.

Figure 1 Structures of base pairs and base mispairs. (A) T·A Watson-Crick base pair; (B) C·G Watson-Crick
base pair; equilibrium between (C) favored keto tautomer of T and (D) rare enol tautomer of T; (E) T(enol)·G base
mispair; pH-dependent equilibrium between (F) T (keto) and (G) T (ionized); pH-dependent equilibrium between (H)
A (amino) and (I) A (protonated); (J) T·G wobble base mispair; equilibrium between (K) C·A (ionized) base mispair
and (L) C·A reverse wobble mispair; (M) N4-methoxycytosine (imino)·A base mispair; (N) N4-methoxycytosine
(amino)·G base pair; pH-dependent equilibrium between (O) BrU·G wobble base mispair and (P) BrU (ionized)·G mispair.

From the time Watson and Crick proposed their 
original model, the configuration of base mispairs in 
DNA was the subject of intense theoretical examin- 
ation. However, in the late 1970s and early 1980s, the 
technology was developed for the chemical synthesis 
of defined sequence oligonucleotides. Such synthesis 
made possible, for the first time, the generation
of specific mispairs embedded in otherwise normal
DNA. Simultaneously, the development of high-field
superconducting magnets and computers capable 
of performing Fourier transform NMR spectroscopy 
allowed examination of the structure and dynamics of 
base mispairs in DNA. 

The first mispair examined by NMR spectroscopy 
in aqueous solution using a defined-sequence syn- 
thetic oligonucleotide was the guanine-thymine mis- 
pair. Surprisingly, the structure identified did not 
involve rare tautomeric forms as anticipated by the 
Watson-Crick model. Rather, both the guanine and 
thymine residues were observed in the predominant 
keto tautomeric forms, hydrogen bonded in a 
wobble geometry (Figure 1J). Examination of this
mispair by X-ray crystallography has led to the same
conclusion.

The other mispair formed between a normal purine
and pyrimidine, which would result in a transition 
mutation, is the adenine-cytosine mispair. When first 
examined by NMR spectroscopy, an insufficient 
number of resonances were observed to define the 
structure of the adenine-cytosine mispair. An exam- 
ination by X-ray crystallography indicated that the 
adenine amino group was in close proximity to the 
cytosine ring nitrogen, and the adenine ring nitrogen 
was within hydrogen bonding distance of the cyto- 
sine carbonyl. The position of these heteroatoms in 
the crystal structure indicated the formation of two 
hydrogen bonds between the adenine and cytosine 
residues in a wobble geometry. However, two 
hydrogen bonds would not be possible in such a con- 
figuration if both the adenine and cytosine residues 
were in the normal, amino tautomeric forms. A
re-examination of the NMR data led to the conclusion
that the additional hydrogen bond formed in the
adenine-cytosine mispair resulted from protonation
of the adenine residue on the ring (N1) nitrogen
(Figure 1K).

While emerging data prompted a theoretical re- 
examination of the potential involvement of rare tau- 
tomeric forms and ionized bases in mispair formation, 
the observation of the protonated adenine-cytosine
mispair in DNA placed the concept of ionized base
pairs on a solid experimental footing. It was becoming
clear that, while Watson-Crick base pairing suppresses
ionization, protonation or ionization of a base can
create additional hydrogen bonds between mispaired
bases.

Experimental data from several sources have converged
upon the protonated adenine-cytosine wobble
mispair as the predominant configuration at and
below physiological pH. However, if the solution
pH were increased, could the proton be removed from
the adenine residue? If so, would the structure then
collapse into a Watson-Crick base pair in which either
the adenine or the cytosine residue would assume a rare
tautomeric form? With increasing solution pH, the
proton of the protonated adenine residue can indeed
be removed, but the base pair configuration changes
from the protonated wobble to a much less stable,
neutral, reverse wobble configuration involving only
one hydrogen bond (Figure 1L). The pH at which this
transition occurs is between 7 and 8, depending upon 
the surrounding base sequence context. 
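If the mispair is treated as a simple two-state system, with a protonated wobble form and a neutral reverse wobble form, the population of each form follows a titration curve. The sketch below assumes a hypothetical apparent pKa of 7.5, chosen only because the text places the transition between pH 7 and 8. Real mispairs may populate more than two configurations, and their apparent pKa depends on sequence context.

```python
def fraction_protonated(ph: float, apparent_pka: float) -> float:
    """Fraction of a two-state site that is protonated (Henderson-Hasselbalch)."""
    return 1.0 / (1.0 + 10 ** (ph - apparent_pka))


APPARENT_PKA = 7.5  # hypothetical value within the pH 7-8 range given in the text

for ph in (6.5, 7.0, 7.5, 8.0, 8.5):
    f = fraction_protonated(ph, APPARENT_PKA)
    print(f"pH {ph}: protonated wobble {f:.2f}, neutral reverse wobble {1 - f:.2f}")
```

Under this model, at the apparent pKa the two forms are equally populated. One pH unit on either side shifts the balance to roughly 10 to 1.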

Studies with the adenine-cytosine mispair established
two important concepts. First, base ionization
can occur within base mispairs in duplex DNA, and
base protonation can stabilize the mispair.
Second, unlike normal Watson-Crick base pairs,
which are observed as a single, predominant
configuration, base mispairs may exist as a family of
structures in equilibrium with one another.

Numerous experimental studies have been con- 
ducted over the past few years on a variety of mispairs 
in DNA involving mutagenic base analogs such as 
5-bromouracil and 2-aminopurine, and bases chem- 
ically modified by carcinogens including those dam- 
aged by oxidation and alkylation. The picture which 
is emerging from these studies is that essentially all 
mispairs examined to date are best represented as a 
family of configurations which are in equilibrium with 
one another. Such equilibria may involve ionization, 
rotation of purine residues around the glycosidic bond 
from the normal anti to a syn conformation, and even 
tautomerization. 

To date, the only confirmed tautomeric equilibrium 
within a base mispair in DNA, observed by either 
NMR spectroscopy or X-ray crystallography, involves
N4-methoxycytosine. This modified base,
formed by reaction of methoxyamine with cytosine, 
is preferentially in the unusual imino configuration. 
However, the energy difference between tautomeric 
forms is sufficiently small that the tautomeric forms 
observed in DNA may be altered by changing the base 
paired opposite the modified base (Figure 1M, N).

Because most mispairs, including those involving
modified bases, exist as a family of
structures in equilibrium with one another, which of
these forms is the configuration that results
in incorporation of the mispaired deoxynucleoside
triphosphate by DNA polymerase? This question
is a central focus of current and future experimental
research.

With normal base pairs, the energy of both hydrogen
bonding and base stacking is optimized when the
base pair assumes Watson-Crick geometry. With
mispairs, however, the geometric configuration in which
base stacking is optimal might correspond to a
configuration with a substantial steric clash between
hydrogen bonding protons, or with two
highly electronegative heteroatoms positioned directly
opposite one another. The most stable of the possible
configurations formed between mismatched
bases is usually not Watson-Crick. As a consequence,
mispair formation generally results in a substantial
decrease in the thermal and thermodynamic stability
of a DNA duplex.

One view of the mechanism of correct base selec- 
tion by a polymerase is that DNA polymerase can 
discriminate between correct and incorrect base pairs 
by free energy differences. As mispairs are generally 
less stable, an incorrect deoxynucleoside triphosphate 
would dissociate more readily from the replication 
complex, and thus be less likely to be incorporated 
covalently by DNA polymerase. Although mispairs
are generally less stable than correct base pairs, the free
energy differences between correct and incorrect base
pairs often are not sufficiently large to explain
polymerase selectivity. Therefore, other considerations,
including the structure of water in the polymerase
active site, as well as base pair geometry, are being
examined.
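The free energy argument can be made quantitative with a Boltzmann factor. A free energy difference ΔΔG between correct and incorrect pairing gives, at most, a discrimination of exp(ΔΔG/RT). The sketch below uses illustrative ΔΔG values, not measurements for any particular mispair. It also computes the ΔΔG that a hypothetical 100,000-fold selectivity would require at 37 °C (310 K). This shows why modest stability differences alone seem insufficient to account for polymerase selectivity.

```python
import math

R_KCAL = 1.987e-3  # gas constant, kcal/(mol*K)


def discrimination(ddg_kcal: float, temp_k: float = 310.0) -> float:
    """Correct:incorrect ratio implied by a free-energy difference ddG (Boltzmann factor)."""
    return math.exp(ddg_kcal / (R_KCAL * temp_k))


def ddg_needed(ratio: float, temp_k: float = 310.0) -> float:
    """Free-energy difference (kcal/mol) needed to reach a given correct:incorrect ratio."""
    return R_KCAL * temp_k * math.log(ratio)


for ddg in (1.0, 2.0, 3.0):  # illustrative values only
    print(f"ddG = {ddg} kcal/mol -> about {discrimination(ddg):.0f}-fold")
print(f"100,000-fold would need about {ddg_needed(1e5):.1f} kcal/mol")
```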

Alternatively, minor configurations of the mispairs, 
in equilibrium with the predominant forms, may cor- 
respond to the configurations incorporated by DNA 
polymerase. As experimentally measured thermo- 
dynamic properties correspond to the predominant 
configuration under specific experimental conditions, 
the lack of correlation between thermodynamic
measurements and polymerase insertion preferences
might suggest a role for minor configurations in base
selection by DNA polymerase.

Perhaps the best data to date in favor of the con- 
tribution of minor forms is from work with the muta- 
genic base analog 5-bromouracil (BrU). Both NMR 
and crystallography studies have demonstrated that 
the mispair formed between BrU and guanine in 
duplex DNA is predominantly wobble, under physio- 
logical conditions. With increasing solution pH, how- 
ever, the BrU residue ionizes and the base pair assumes 
a configuration similar to that of a Watson-Crick base
pair. When the polymerase-directed coding properties 
of BrU are examined as a function of solution pH, 
formation of the incorrect BrU-guanine mispair 
increases with increasing solution pH. This experi- 
mental observation indeed suggests that the ionized 
form of BrU contributes to formation of the mispair, 
although the ionized form does not predominate at 
physiological pH. Further experimental studies with 
other systems should indicate if this is a general trend. 

Currently, several generalizations can be made with 
respect to base mispairs in DNA. First, the standard 
Watson-Crick base pairs can be observed as a strongly
preferred, predominant configuration. However, most 
mispairs are best described as a family of different con- 
figurations in equilibrium with one another. The vari- 
ous possible structures of the mispairs can be related 
to one another by ionization, tautomeric shifts and/or 
rotation of purines from anti to syn conformations. 

Second, mispairs in general destabilize duplex 
structure. Discrimination by DNA polymerase may 
exploit structural differences between correct and 
incorrect bases, free energy differences, or both. The 
degree to which a given configuration of a mispair 
may contribute to mispair formation by DNA poly- 
merase could be influenced strongly by the specific 
mispair, the neighboring base sequence, as well as 
experimental variables such as solution pH. 

Third, the role of H-bonding between an incoming
dNTP substrate and its complementary template base
may, in fact, be much less important in determining
the fidelity of DNA synthesis than originally thought.
Recent data show that difluorotoluene, a base analog
of thymine that cannot form Watson-Crick H-bonds
with adenine, is nevertheless incorporated opposite A
almost as efficiently as T. This surprising finding
suggests that the geometrical and electrostatic
properties of the polymerase active site may
influence nucleotide insertion fidelity profoundly,
favoring those structures that most closely
approximate Watson-Crick base pairs.

Indeed, various polymerases, and their associated 
proteins, may rely differently upon these factors to 
optimize fidelity and DNA replication rates. Future 
studies must consider this complex array of variables 
in order to explain how DNA polymerase makes 
errors on either natural or chemically damaged DNA 
templates. 


Further Reading 
Sinden RR (1994) DNA Structure and Function, pp. 12-22. San 
Diego, CA: Academic Press. 


See also: DNA Replication; Replication Errors; 
Wobble Hypothesis 

A base substitution mutation is a change of a single
base or base pair in an organism’s heritable genetic
material (DNA or RNA). Base substitution mutations
are a type of point mutation (point mutations are
mutations that change only one or a few bases).

Base substitution mutations are either transition 
or transversion mutations. Transition mutations are 
those changes that substitute a purine for the other 
purine or a pyrimidine for the other pyrimidine. 
Transversion mutations are those changes that substi- 
tute a purine for a pyrimidine or a pyrimidine for a 
purine (see Figure 1). Thus, transition mutations main-
tain and transversion mutations reverse the purine/ 
pyrimidine axis of the DNA helix. 
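The distinction depends only on whether the old and new bases belong to the same chemical class. The minimal sketch below makes this explicit. The function name is hypothetical. Because a change on one strand implies the matching change in its partner, the same label applies to the base pair (G → A corresponds to G:C → A:T).

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def classify_substitution(original: str, new: str) -> str:
    """Classify a single-base change in DNA as a transition or a transversion."""
    original, new = original.upper(), new.upper()
    if original not in PURINES | PYRIMIDINES or new not in PURINES | PYRIMIDINES:
        raise ValueError("bases must be A, C, G or T")
    if original == new:
        raise ValueError("not a substitution")
    same_class = {original, new} <= PURINES or {original, new} <= PYRIMIDINES
    return "transition" if same_class else "transversion"


print(classify_substitution("G", "A"))  # transition (purine -> purine)
print(classify_substitution("G", "T"))  # transversion (purine -> pyrimidine)
```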

Base substitution mutations can be induced by a
number of chemical agents. For example,
N-methyl-N′-nitro-N-nitrosoguanidine (MNNG) and ethyl
methanesulfonate (EMS) primarily generate G:C →
A:T transitions, and the base analog 2-aminopurine
(2-AP) generates both A:T → G:C and G:C → A:T
transitions. Benzo[a]pyrene diol epoxide (BPDE) and
5-azacytidine (5AZ) induce G:C → T:A and G:C →
C:G transversions, respectively. Physical agents such
as ultraviolet (UV) light can also induce base
substitution mutations. UV light induces a number of
different mutations, but G:C → A:T transitions predominate.

Figure 1 Base-substitution mutations are either
transitions or transversions. Bases connected with “:”
are base paired to each other.

When a base substitution mutation occurs in the
coding region of a gene, the mutation can be
silent, missense, or nonsense. Silent mutations
are mutations that change a codon to another
codon for the same amino acid. For example, if TAC
is changed to TAT (a G:C → A:T transition), a silent
mutation has occurred because TAC and TAT are
redundant codons for tyrosine. A missense mutation
is a mutation that alters the codon so that the amino
acid is changed. If the TAC codon is changed to a
CAC codon (an A:T → G:C transition), histidine will
be inserted instead of tyrosine. If a base substitution
mutation causes a codon to be changed to a stop codon,
it is called a nonsense mutation. If the TAC codon
is changed to TAA (a G:C → T:A transversion), the
tyrosine codon has been replaced by a stop codon
and the protein will be truncated at that point.
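The three outcomes can be checked mechanically against the standard genetic code. The sketch below builds the code table from the conventional 64-letter string, with codons ordered TTT, TTC, TTA, TTG, TCT, and so on, and '*' marking stop codons. It then classifies the tyrosine examples from the text. The function name is hypothetical, and codons are assumed to be written as coding-strand DNA.

```python
BASES = "TCAG"
# Standard genetic code; '*' marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def substitution_effect(old_codon: str, new_codon: str) -> str:
    """Classify a codon change (coding-strand DNA) as silent, missense, or nonsense."""
    old_aa, new_aa = CODE[old_codon], CODE[new_codon]
    if new_aa == "*" and old_aa != "*":
        return "nonsense"
    if new_aa == old_aa:
        return "silent"
    return "missense"


for new in ("TAT", "CAC", "TAA"):
    print("TAC ->", new, substitution_effect("TAC", new))
```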

Geneticists have designed various ways to test for 
specific mutations, and these methods can be used to 
determine what types of mutations are induced by 
certain agents. For example, Cupples and Miller 
(1989) developed Escherichia coli strains that cannot
use lactose as a carbon source but can grow on lactose
if a specific mutation has occurred. A mutagen is
added to each of the tester strains in the set, and the
number of lactose-utilizing mutants is determined.
Usually only one of the tester strains will show a 
significant increase in the number of mutants induced 
by the chemical. Because the base change that is 
needed to produce lactose utilization is known, the 
mutation that is induced by that mutagen is deter- 
mined. 


Further Reading 

In general, a base is any chemical species, ionic or
molecular, that can accept a proton (hydrogen ion)
from another substance. In molecular genetics, the
term bases refers to the weakly basic nitrogenous,
organic derivatives of pyrimidine and purine (Figure 1)
that are part of the nucleotide components of DNA 
and RNA. The four common bases in DNA are the 
purines adenine (Ade) and guanine (Gua) and the pyr- 
imidines cytosine (Cyt) and thymine (Thy). The four 
common bases in RNA are the same as those in DNA 
except that thymine is almost always replaced by uracil 
(Ura). One exception is that almost all transfer RNAs 
have thymine (attached to a ribose) at a particular 
conserved position. 

A base attached to ribose or deoxyribose consti- 
tutes a nucleoside. A nucleotide is a phosphate ester of 
a nucleoside. In a strand of DNA or RNA, the single 
letters A, G, T, C, and U usually designate the nucleo- 
sides for Ade, Gua, Thy, Cyt, and Ura. Thus a hexa- 
meric DNA oligonucleotide might be represented 
as pApGpApTpCpT and a comparable RNA oligo as 
pApGpApUpCpU. The bases in strands of DNA and 
RNA can interact with each other through hydrogen 
bonding, which provides the basis for the formation of 
specific base pairs; in DNA, Ade with Thy and Gua 
with Cyt, and in RNA, Ade with Ura, Gua with Cyt, 
and often Gua with Ura. The specificity of these inter- 
actions is the foundation for DNA replication, for 
defining the secondary structure (and sometimes the
tertiary or higher order structure) of structural RNAs,
and for the decoding of genetic information in
messenger RNA (base pairing between codon and tRNA
anticodon). Modified bases or nucleosides are found in
both DNA and RNA. Some of the modifications have
been shown to have important biological functions.
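The notation and pairing rules described above can be expressed in a few lines. The sketch below writes a sequence with a 'p' (phosphate) before each nucleoside, as in pApGpApTpCpT. It derives the comparable RNA sequence by replacing T with U, which changes only the letters and ignores the deoxyribose/ribose difference. It also checks pairs, allowing the G-U pair in RNA. All function names are hypothetical.

```python
DNA_PAIRS = {("A", "T"), ("T", "A"), ("G", "C"), ("C", "G")}
# RNA pairs include the G-U pair that often forms in structured RNAs.
RNA_PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def to_p_notation(seq: str) -> str:
    """Prefix each nucleoside letter with 'p' (phosphate), e.g. AGATCT -> pApGpApTpCpT."""
    return "".join("p" + nucleoside for nucleoside in seq.upper())


def dna_to_rna(seq: str) -> str:
    """Letter-level DNA-to-RNA conversion: thymidine (T) becomes uridine (U)."""
    return seq.upper().replace("T", "U")


def can_pair(x: str, y: str, rna: bool = False) -> bool:
    """True if the two bases form one of the pairs listed for DNA or RNA."""
    return (x.upper(), y.upper()) in (RNA_PAIRS if rna else DNA_PAIRS)


dna = "AGATCT"
print(to_p_notation(dna))              # pApGpApTpCpT
print(to_p_notation(dna_to_rna(dna)))  # pApGpApUpCpU
print(can_pair("G", "U", rna=True), can_pair("G", "T"))
```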


See also: DNA; Messenger RNA (mRNA); 
Transfer RNA (tRNA) 

Erwin Baur (1875-1933) can be viewed as (1) the
founder of plant virology, (2) the father of plastid
genetics, (3) the first to present a fully rational explanation of
plant chimeras, (4) the one who laid the basis for
Antirrhinum genetics, which is so fertile today, and (5) the one
responsible for the controversial introduction of
what is currently known as neo-Darwinism into the
German-speaking world more than a decade before
the “modern synthesis” was launched by Dobzhansky
in 1937.


Baur’s Youth, Education, and Years of 
Searching for his Scientific Research 
Subject 


Erwin Baur was born on 16 April 1875 in Ichenheim, 
near the Black Forest, Southern Germany, where his 
parents owned the local drugstore. Since there was no 
high school in Ichenheim, for further education young 
Erwin spent the next four years in Konstanz with his 
uncle, also a chemist having strong interests in botany. 
The boy regularly accompanied his uncle on botanical
excursions, helping to collect plants for their
herbarium. Moreover, in 1892 a botanical journey with
his father to Norway is reported to have been a key
event in deepening his interest in botany.
This was also the time when Erwin studied Humboldt’s 
voyage letters, Darwin’s On the Origin of Species, and 
works of the ultra-Darwinist Ernst Haeckel. 

Although he wanted to study botany, Erwin 
respected his father’s wish and began studying 
medicine in 1894 at the universities of Heidelberg, 
Freiburg, Strasbourg, and Kiel, gaining his MD in 
1900 in Heidelberg. 

After performing military service in the navy and 
working as an assistant physician in two mental asy- 
lums, Baur once and for all changed to botany in 1902, 
receiving his PhD in Freiburg in 1903 with a thesis on 
lichens. While in Freiburg Baur heard lectures by 
August Weismann, who emphasized natural selection 
as a key process in the origin of the species, but 
strongly opposed the idea of the inheritance of 
acquired characters still found in the works of Darwin, 
Haeckel, and others. 


Scientific Career and Main Scientific 
Discoveries 


In the same year, 1903, Baur obtained a post at the 
Berlin Botanical Institute, and in 1904 he became a 
reader in botany. It was also in 1904 that Baur’s growing 
interests in the laws of hereditary trait transmission led 
nearly simultaneously to at least five major research 
projects, later to be amplified by further activities, as
detailed below.


Baur’s Examination of the Infectious 
Chlorosis in Malvaceae 

After careful experimentation on the transmission of 
the disorder (especially in Abutilon), Baur concluded 
that viruses were responsible for infectious chlorosis. 
Publishing his results from 1904 to 1911 when hardly 
anything was known about viruses at all, his clear, 
daring, and essentially correct conclusions laid the 
basis for further research in that area: “By these infer- 
ences from his experimental investigations and reflec- 
tions, Baur had reached the limits of the knowledge of 
his time; moreover, beyond these limits he had deter- 
mined the way which proved to be correct for the 
following decades of virus research.” (Hagemann, 
2000, pp. 51-52) 


Non-Mendelian Inheritance in Pelargonium 
zonale 

Two scientists are usually credited with discovering
non-Mendelian or cytoplasmic inheritance in
plants at about the same time (1909): Erwin Baur and
Carl Correns. However, it was definitely Baur who
clearly drew attention to separate plastid inheritance,
whilst Correns developed a hypothesis that only the
cytoplasm had changed but not the plastids themselves.

Figure 1 Erwin Baur. (Reproduced with permission from Stubbe, 1959.)

The following quotations illustrate Baur’s cautious
but clear-cut and entirely correct conclusions.
On the very special case of biparental plastid genetics,
Baur (1909, pp. 349-350) wrote:


The zygote, arisen by uniting of a “green” and a “white” sexual
cell, contains two different plastids, green and white ones. In 
the course of cell divisions forming the embryo, the plastids 
segregate to the daughter cells according to the laws of prob- 
ability. If a daughter cell has only white plastids, all the cells 
derived from it will be white generating a white patch of 
cells. If the cell has only green plastids, a green complex of cells
is produced. There is no need for me to further analyze (the 
point) that cells with both kinds of plastids will be able to 
continue to segregate. 

... According to the present dominating opinion, the plas- 
tids of a zygote are derived solely from the mother. Whether 
this view is absolutely sure is not for me to decide... If, 
however, in contrast to the expert opinion so far, it can be 
shown that male gametes can also transmit plastids, the 
hereditary relations of the plants with the white edges will 
be entirely clear. Further studies will decide these questions. 


Baur’s analysis was proved to be fully correct by further 
research. 


Baur’s Explanation of Plant Chimeras 

In the years from 1907 to 1912 Hans Winkler’s plant 
chimeras were the botanical sensation of the time. 
Winkler believed that solely by grafting techniques 
he had generated genuine hybrids between different 
plant species. It was again Baur who, having worked 
since 1904 with Pelargonium zonale on related ques- 
tions, drew the correct inferences from the experimen- 
tal material. Baur distinguished between two kinds of 
chimeras: (1) sectorial chimeras, constituting plants 
with different tissues often forming sectors right 
through large parts or the entire body of the plant; 
and (2) periclinal chimeras, constituting plants the 
apical domes of which (as well as the cells derived 
from them) consist of genetically different cell layers, 
the differences of which may concern either the 
plastids or the larger DNA programs of the nucleus 
as well as the plastids. Baur was able to show con- 
vincingly that Winkler’s so-called genuine hybrids 
obtained by grafting were, in fact, periclinal chimeras. 
Winkler, after bitter controversy, accepted Baur’s 
analysis. Heribert Nilsson, one of Baur’s contempor- 
ary genetical pioneers in Sweden, commented (p. 61): 
“A more elegant solution of a complex problem by 
combining morphological, anatomical and genetic 
experiments can hardly be found in biological 
research.” 


The Beginnings of Genetical Research with 
Antirrhinum, Leading to Pioneering and 
Lasting Contributions to Genetics 

Baur began working with Antirrhinum in 1904, publishing
the first description of a (semi-dominant) lethal
gene mutant in 1907 (the Aurea gene). Baur’s book on 
his Antirrhinum studies (1924) lists and discusses 29 
genes including three cases of multiple alleles. The first 
gene linkages in Antirrhinum were published by Baur 
for the genes Eluta-rosea-pallida in the years 1911 and 
1912. In 1927, after hearing Muller, Baur and his stu- 
dent Hans Stubbe worked on induction of mutations 
by X-rays and other mutagenic agents. Following 
Baur’s death, Stubbe continued this work, and pub- 
lished his Antirrhinum monograph in 1966, with a 
description of all the mutants obtained so far. 

For the first homeotic plant gene to be cloned, 
Baur’s deficiens mutant of 1917 was used (Sommer 
et al., 1990), followed by investigations on several 
other homeotic Antirrhinum mutants from his collec- 
tion (Theissen and Saedler, 1999). 


Baur’s Controversial Introduction of 
Neo-Darwinism into the German-Speaking 
World 

The term neo-Darwinism was first used for Weismann’s 
theory of evolution at the end of the nineteenth and 


beginning of the twentieth century and later in the 
English-speaking world for the “modern synthesis” 
beginning with Dobzhansky’s book Genetics and 
the Origin of Species (1937; for details, see Lönnig,
1998). 

In clear contrast to the hypotheses of De Vries, 
Correns, Bateson, Goldschmidt, and others who 
thought that the origin of species was due to “large” 
mutations (saltationism), with selection playing
only a minor role, Baur thought that the mutations
responsible for such adaptations had “small or even
invisible effects on the phenotype” (Mayr, 1970,
p. 169), selection being the other key factor in the
evolutionary process. He also defended Darwin’s
theory against the objection that very small differences
would hardly be noticed by selection, arguing that
combinations of many hereditary factors will produce
differences large enough to be relevant for selection:
“At least for the differentiation of subspecies and 
closely related species we return to the pure Darwinian 
theory of selection, however with the addition that the 
original material to be selected is mostly produced by 
small mutations” (Baur, 1924, pp. 146-147). 

So, for his contemporary geneticists and biologists, 
it was Erwin Baur who was the driving force for 
the introduction of neo-Darwinism to the German- 
speaking world in the 1920s. It is Baur who is
responsible for the resuscitation of this opinion, which was
once totally overcome by saltationism and Mendelism
(Nilsson, 1953, p. 161). Mayr (1997, p. 352) concurs,
stating that Baur’s work on Antirrhinum was a crucial
factor in making the “new synthesis” of the 1930s and
1940s possible. Before the “modern synthesis”, nearly
all Darwinians were still convinced that the inheritance
of acquired characters (modifications) played a
major role in the origin of species and higher
systematic categories. As a geneticist, Baur was a Mendelist
who fully rejected Lamarckism, formulating noticeably
clear-cut definitions of the differences between
modifications and mutations in his textbooks. However, in
contrast to Mendel, who was convinced “that species 
are fixed with limits beyond which they cannot 
change” (Mendel, 1866, p. 47), Baur was a whole- 
hearted evolutionist. Discovering that many small 
segregating differences were responsible for the differ- 
ences between the many Antirrhinum species he had 
investigated, he returned to the idea that “small” muta- 
tions and selection were responsible for the origin of 
most Antirrhinum species. Incidentally, by invoking
Mendelian factors to explain the origin of the species he studied,
Baur (together with Bateson) was also one of the first
biologists to apply the laws of genetics to questions of
systematic botany, a key issue still largely
neglected in systematics at the beginning of the
twenty-first century.
Yet Baur was rather cautious as to whether his findings could be extrapolated to the origin of the rest of the plant world, especially genera and higher systematic categories: “The inferences which can be drawn from experimental genetics for the problem of evolution are at present rather modest and more negative than positive” (Baur, 1930, p. 401). Despite some exquisite progress in comparative biology (especially molecular genetics) illuminating systematic relationships in the plant world, this statement of Baur’s, too, has proved to be true to this very day (Lönnig, 1993, 1995, 2001; Behe, 1996).


Seed Collections 

Baur began work on seed collections in 1911, gathering original wheat and oat lines. After World War I he expanded these activities by participating in several expeditions to Turkey, Spain, Portugal, and South America. Also, from about 1927 onwards, Baur was in contact with the famous Russian geneticist Nicolai I. Vavilov, who, independently of Baur, had recognized the key function of large seed banks for future recombinant plant breeding. Baur’s students continued his work in Germany, resulting in the Gatersleben collection of more than 50,000 lines of original and cultivated plants. The enterprise of Baur and Vavilov can be viewed as the first scientific undertaking of the modern era for the conservation of biodiversity.


Plant Breeding Projects 

Perhaps the most famous project inspired by Baur led 
to the discovery of the sweet lupin (in Lupinus luteus 
and L. angustifolius), at the end of the 1920s. Further 
projects involved wheat, rye, barley, potato, vine, and 
tree fruits. 


Publications 

With the help of older colleagues, Baur founded in 1908 the world’s first genetics journal, Zeitschrift für induktive Abstammungs- und Vererbungslehre (now Molecular and General Genetics), and in 1929 a second journal, Der Züchter (now Theoretical and Applied Genetics), the latter intended to emphasize plant breeding. In 1917 Baur edited the first volume of Bibliotheca Genetica, which was followed by 14 further monographs until 1930. Also worth mentioning is the multivolume Handbuch der Vererbungswissenschaft (Handbook of Genetics), edited by Erwin Baur and Max Hartmann and begun in 1927. He also wrote two textbooks: Einführung in die experimentelle Vererbungslehre (Introduction to Experimental Genetics) (1st edn 1911, 5th edn 1930) and Die Wissenschaftlichen Grundlagen der Pflanzenzüchtung (The Scientific Basis of Plant Breeding) (1921).




Baur and Eugenics 


It is for his involvement in eugenics that Baur has been most severely criticized, even to the point of being categorized with the Nazi movement, or at least named as an important forerunner of its murderous race politics (Müller-Hill, 1984; Gilsenbach, 1990).

The following points have been cited to support the accusation: (1) Baur supported the National Socialist sterilization laws in 1933; (2) Baur made some discriminatory comments on the immigration of Eastern Jews (Ostjuden) into Germany; (3) as early as 1906 Baur had become a member of the eugenics movement in Germany and had contributed strongly to the reputation of eugenics in his country by working and publishing in favour of that movement; and (4) he was involved in the foundation of the later infamous Kaiser-Wilhelm-Institut für Anthropologie, Eugenik und menschliche Erblehre in 1927.

The following points may be enumerated in favor of Baur: (1) though strongly desiring to become secretary of agriculture, Baur was never a member of a political party; (2) in contrast to the majority of his colleagues, Baur did not send a declaration of loyalty to Hitler after the dictator’s seizure of power in 1933; (3) no Nazi would have proposed a Jewish scientist (R. Goldschmidt) as his successor in Berlin, as Baur did in 1928; (4) Baur defended the Jewish member of his institute, Fanny du Bois-Reymond, against being fired and did his utmost to keep her as a coworker in 1933; (5) the English geneticist R.N. Salaman wrote in 1934: “I have heard that Baur exerted himself energetically on behalf of his non-Aryan genetic colleagues” (as quoted by Kröner et al., 1994); (6) nor did Baur consider terminating the contracts of his leftist and Marxist coworkers, such as the geneticists Stubbe, Kuckuck, and Schick; (7) his students have given positive accounts of their teacher, seeing in him a victim of Nazi politics rather than a collaborator; and (8) last but not least, Baur repeatedly contradicted National Socialist theories about pure races in man and emphasized that the German people, too, were a mixture of different races, as were all other nations.

Moreover, Baur’s views on eugenics must be viewed in connection with the worldwide eugenics movement of his time (sterilization laws were passed in Norway, Finland, Sweden, Denmark, Iceland, Canada, and the USA), and especially in association with his early imprinting by Darwinian and Haeckelian ideas on the origin of species, including man (see above). Darwin himself had emphasized “the preservation of favoured races in the struggle for life” in the very title of his book On the Origin of Species in 1859. Practically all of Darwin’s followers of the nineteenth century were racists, most outspokenly the German zoologist Ernst Haeckel. Shipman (1994, pp. 134-135) comments on the special German situation as follows:


The influence of Haeckel’s anti-Semitic views on German 
society and the Nazi party was immense because of his huge 
personal following and high scientific standing. ... Hitler 
referred directly to many of Haeckel’s most important 
ideas, including the biological unfitness of the Jews and the 
sure doom that would befall the German people if they did 
not cleanse themselves of such impurities. 


In contrast to Haeckelian and Nazi views, Baur never proposed the destruction of “foreign race elements” (for further details, see Kröner et al., 1994). In general, Baur’s character appears to have inclined him to avoid inhumanity and brutality, including the pronounced racism of Haeckel.


Concluding Remarks 


Erwin Baur was an ambivalent character. He was an internationalist in science and a nationalist at home. He is described as having been generous and liberal in his treatment of his coworkers, and also as having displayed an all-embracing claim to leadership, so that his coworkers had difficulty developing their own scientific profiles; he is said to have been entirely free from vanity and self-interest, and also to have been an inconsiderate egocentric who sometimes even claimed his coworkers’ research results as his own. In general, his behaviour at the institute, as well as toward his wife and four children, has been described as open-minded and communicative (he had a good sense of humor), yet toward his coworkers he could at times be terribly quarrelsome.

The last year of his life was overshadowed by severe financial problems at his institute, the Kaiser-Wilhelm-Institut für Züchtungsforschung Müncheberg (later the Max-Planck-Institut für Züchtungsforschung). The institute had been founded on condition that Baur himself raise the money necessary for its maintenance from industry, administration, and international societies, which proved too heavy a burden after some four years. Personal and political problems added to his difficulties, and Baur died of an acute heart attack on 2 December 1933.

Hans Stubbe made a commemorative speech on 
Erwin Baur in 1958. After mentioning his strengths 
and weaknesses he closed his statement as follows: 


He was a man, take him for all in all, 
I shall not look upon his like again. 
Bayes’ theorem states:

$$P(A\mid B)\,P(B) = P(B\mid A)\,P(A)$$

and is often rewritten in the form

$$P(B\mid A) = \frac{P(A\mid B)\,P(B)}{P(A)}$$


In these expressions, P(A|B) is the conditional prob- 
ability of A given B and P(B|A) is the conditional 
probability of B given A. This relationship between 
the conditional probability expressions is most 
commonly encountered in scenarios in which new 
information is used to revise a prior probability. The
multiplication rule for independent events is a special 
case of Bayes’ theorem. The more general formulation 
of the theorem is useful for the many situations in 
which the events A and B are either known not to be 
independent or are of uncertain independence. 
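As an illustration only, the theorem can be applied in the equivalent form P(A|B) = P(B|A)P(A)/P(B), assuming a binary event A with a known prior and expanding P(B) by the law of total probability. The helper name `bayes_posterior` and the numbers are hypothetical. The second call shows the independent case, in which B carries no information about A and the posterior simply equals the prior.

```python
# Illustrative sketch; `bayes_posterior` is a hypothetical helper, not from the source.
def bayes_posterior(prior_a: float, p_b_given_a: float, p_b_given_not_a: float) -> float:
    """Return P(A | B) = P(B | A) P(A) / P(B) for a binary event A.

    P(B) is expanded by total probability:
    P(B) = P(B | A) P(A) + P(B | not A) (1 - P(A)).
    """
    p_b = p_b_given_a * prior_a + p_b_given_not_a * (1.0 - prior_a)
    if p_b == 0.0:
        raise ValueError("P(B) is zero; the posterior is undefined")
    return p_b_given_a * prior_a / p_b


# New information B that is nine times likelier under A revises a 0.05 prior upward.
print(bayes_posterior(0.05, 0.9, 0.1))
# If B is independent of A, P(B | A) = P(B | not A) and the posterior equals the prior.
print(bayes_posterior(0.05, 0.3, 0.3))
```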

Nonindependent events are common in everyday 
life, so Bayes’ theorem has a wide range of potential 
applications. The most important of these involve 
refining predictions based on subsequently obtained 
information. A few common examples include traffic 
patterns, weather prediction, and economic forecasting. 

Establishing linkage between an unmapped gene 
and a marker has different implications in a maximum 
likelihood setting than in a Bayesian setting. A like- 
lihood interpretation asserts that linkage is established 
when the null hypothesis of no linkage is rejected at 
the established significance level and assigns the gen- 
etic distance as the maximum likelihood estimate of 
the map distance between the test locus and the mar- 
ker. The Bayesian interpretation addresses a different 
question — evaluating the probability of a specific map 
distance given a specific data set. 

Consider the problem of establishing whether a test locus is linked to a marker locus at some map distance D. Let us suppose that the marker is located at the midpoint of a 70 cM chromosome in a 1400 cM genome, suppositions consistent with estimates of murine chromosome and genome sizes. In the absence of additional data, and assuming that genes are uniformly distributed over the genome, P(linkage) = 70/1400 = 0.05. In general, P(linkage) is the fraction of the total genome length occupied by the chromosome of interest.
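A minimal sketch of this prior, assuming (as the text does) that genes are uniformly distributed along a genome measured in centimorgans. The helper name `prior_linkage` is hypothetical; the second printed value is the prior odds against linkage, which reappears later in the comparison with the LOD threshold.

```python
# Illustrative sketch; `prior_linkage` is a hypothetical helper, not from the source.
def prior_linkage(chromosome_cm: float, genome_cm: float) -> float:
    """Prior probability that an unmapped gene is linked to the marker.

    Assumes genes are uniformly distributed over the genome, so the prior is
    the fraction of total genome length occupied by the chromosome of interest.
    """
    if not 0 < chromosome_cm <= genome_cm:
        raise ValueError("need 0 < chromosome_cm <= genome_cm")
    return chromosome_cm / genome_cm


p_link = prior_linkage(70, 1400)    # murine-scale example from the text
p_unlinked = 1 - p_link
print(p_link, p_unlinked / p_link)  # prior probability and prior odds against linkage
```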

Next, suppose that genotype data are obtained from a backcross in which the test locus and the marker are scored as either concordant or discordant. For a sample of N individuals, the probability of having R concordant and (N − R) discordant individuals is given by the binomial term

$$\binom{N}{R}(1-D)^{R}\,D^{N-R}$$

In this expression, 1 − D is the probability of concordance and D is the probability of discordance. According to the null hypothesis of no linkage, D = 0.5*.
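The binomial term can be evaluated directly. The sketch below is a hypothetical helper built on Python's standard `math.comb`; the progeny counts are invented for illustration, and D is treated as the recombination fraction, as the text assumes.

```python
from math import comb


# Illustrative sketch; `backcross_likelihood` is a hypothetical helper, not from the source.
def backcross_likelihood(n: int, r: int, d: float) -> float:
    """Probability of r concordant and n - r discordant backcross progeny
    when each individual is discordant with probability d."""
    if not 0 <= r <= n:
        raise ValueError("need 0 <= r <= n")
    if not 0.0 <= d <= 1.0:
        raise ValueError("d must be a probability")
    return comb(n, r) * (1 - d) ** r * d ** (n - r)


# Hypothetical data: 20 progeny, of which 17 are concordant.
print(backcross_likelihood(20, 17, 0.5))   # under the null hypothesis of no linkage
print(backcross_likelihood(20, 17, 0.15))  # under linkage at D = 0.15
```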

In the maximum likelihood interpretation, the value of D is chosen so as to maximize the overall expression. This is the maximum likelihood estimate of the linkage distance. This value is compared with the likelihood under the null hypothesis. The base-10 logarithm of the ratio of the maximum likelihood to the null hypothesis likelihood gives the maximum logarithm of the odds, or LOD score. Traditionally (although not in many recent treatments), a threshold value of 3.0 has been used in mammalian linkage mapping for establishing linkage. The confidence interval of linkage is then calculated by integrating over locations around the LOD maximum.

*Slight modification of these expressions is necessary if mapping functions (Kosambi, Haldane, etc.) are used, but this does not affect the statistical interpretation. Here, map distance = recombination fraction is assumed.
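For a backcross, the maximum likelihood estimate of D is the observed fraction of discordant progeny, (N − R)/N, restricted in this sketch to at most 0.5. The code below computes the LOD score from it; the binomial coefficient cancels in the likelihood ratio, so it is omitted. The helpers `lod_score` and `_xlog10` are hypothetical names, map distance is taken to equal recombination fraction, and the final comparison applies the traditional threshold of 3.0.

```python
from math import log10


# Illustrative sketch; `_xlog10` and `lod_score` are hypothetical helpers, not from the source.
def _xlog10(count: int, p: float) -> float:
    """count * log10(p), using the convention 0 * log10(0) = 0."""
    return 0.0 if count == 0 else count * log10(p)


def lod_score(n: int, r: int) -> tuple[float, float]:
    """Return (d_hat, lod) for r concordant out of n backcross progeny."""
    if n <= 0 or not 0 <= r <= n:
        raise ValueError("need n > 0 and 0 <= r <= n")
    d_hat = min((n - r) / n, 0.5)  # MLE of discordance, capped at the no-linkage value
    # log10 likelihoods; the binomial coefficient cancels in the ratio
    log_lik_hat = _xlog10(r, 1 - d_hat) + _xlog10(n - r, d_hat)
    log_lik_null = n * log10(0.5)
    return d_hat, log_lik_hat - log_lik_null


d_hat, lod = lod_score(20, 17)  # hypothetical data: 3 discordant among 20
print(d_hat, round(lod, 2), lod >= 3.0)
```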

In the Bayesian interpretation, the binomial term given above is interpreted differently: it is P(data | linkage at D), where D is a specific map position. For all unlinked locations in the genome D = 0.5, and in the example given above these locations total 1330 of the 1400 cM of the genome. They are all equivalent from the standpoint of mapping, so that:

$$P(\text{data}\mid\text{no linkage}) = \frac{1330}{1400}\binom{N}{R}(0.5)^{R}(0.5)^{N-R}$$


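For the remaining, linked, 70 cM of the hypothetical genome, D may assume any value between 0 and 0.35 on either side of the marker, and P(data|linkage) is

$$P(\text{data}\mid\text{linkage}) = \frac{2\times 100}{1400}\int_{0}^{0.35}\binom{N}{R}(1-D)^{R}\,D^{N-R}\,dD$$

where, by analogy with the 1330/1400 weight above, each unit of D is weighted by the 2 × 100 cM of the genome it represents (both sides of the marker, 100 cM per unit of D).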


Bayesian analysis of this experiment is possible because there is a finite and enumerable set of linkage relationships. The sum P(data|linkage) + P(data|no linkage) gives us an expression for P(data). In this example, we have generated expressions for P(linkage), P(data), and P(data|linkage). Bayes’ theorem is then used to calculate P(linkage|data) according to

$$P(\text{linkage}\mid\text{data}) = \frac{P(\text{data}\mid\text{linkage})\,P(\text{linkage})}{P(\text{data})}$$
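A numerical sketch of this calculation, under the text's assumptions: the marker at the midpoint of a 70 cM chromosome in a 1400 cM genome, a uniform prior over position, and map distance equal to recombination fraction. The integral over linked positions is approximated with a midpoint rule, which is our choice rather than anything specified in the text, and `posterior_linkage` is a hypothetical name. The two terms are computed as joint probabilities of the data with linkage or with no linkage, each weighted by the genome length it represents, so their sum is P(data) and the posterior is their ratio.

```python
from math import comb


# Illustrative sketch; `posterior_linkage` is a hypothetical helper, not from the source.
def posterior_linkage(n: int, r: int, chrom_cm: float = 70.0,
                      genome_cm: float = 1400.0, steps: int = 10_000) -> float:
    """Approximate P(linkage | data) for r concordant out of n backcross progeny.

    Assumes the marker sits at the midpoint of the chromosome, genes are
    uniformly distributed over the genome, and D equals recombination fraction.
    """
    def lik(d: float) -> float:
        # Binomial term from the text: C(N, R) (1 - D)^R D^(N - R)
        return comb(n, r) * (1 - d) ** r * d ** (n - r)

    d_max = chrom_cm / 2 / 100     # linked D runs from 0 to 0.35 on each side
    density = 2 * 100 / genome_cm  # prior weight per unit D (two sides, 100 cM per unit)
    h = d_max / steps
    # Midpoint-rule integral of lik(D) * density over [0, d_max]: P(data, linked)
    p_data_linked = density * h * sum(lik((i + 0.5) * h) for i in range(steps))
    # All unlinked positions share D = 0.5: P(data, unlinked)
    p_data_unlinked = (genome_cm - chrom_cm) / genome_cm * lik(0.5)
    return p_data_linked / (p_data_linked + p_data_unlinked)


print(posterior_linkage(0, 0))    # no data: the posterior equals the 0.05 prior
print(posterior_linkage(20, 17))  # hypothetical data with 3 discordant among 20
```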


The linkage confidence interval can be calculated simply by choosing limits of integration for P(linkage|data) that enclose (1 − significance level) of the total probability, since the contributions P(linkage|data) from all locations in the genome sum to unity. It is worth noting that P(linkage|data) is in fact the quantity that geneticists wish to determine. This Bayesian interpretation also reminds us that the judgement of linkage and map distance is contingent on the available data.
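One way to choose such limits, sketched under the same assumptions: accumulate posterior probability outward from the marker, folding the two sides together so the result is a distance rather than a position, until (1 − significance level) is reached. Other choices of integration limits are possible, and `linkage_interval` is a hypothetical name. It returns None when more than the significance level of posterior probability lies on unlinked locations, since then no interval within the linked region qualifies.

```python
# Illustrative sketch; `linkage_interval` is a hypothetical helper, not from the source.
def linkage_interval(n: int, r: int, alpha: float = 0.05, chrom_cm: float = 70.0,
                     genome_cm: float = 1400.0, steps: int = 3500):
    """Smallest distance d_up from the marker with P(linked, D <= d_up | data) >= 1 - alpha.

    Returns None if the posterior probability of no linkage exceeds alpha.
    The binomial coefficient C(N, R) is omitted because it cancels on normalization.
    """
    d_max = chrom_cm / 2 / 100
    density = 2 * 100 / genome_cm
    h = d_max / steps
    # Joint probability of the data and of D falling in each grid cell
    cells = [density * h * (1 - d) ** r * d ** (n - r)
             for d in ((i + 0.5) * h for i in range(steps))]
    unlinked = (genome_cm - chrom_cm) / genome_cm * 0.5 ** n
    total = sum(cells) + unlinked
    cumulative = 0.0
    for i, mass in enumerate(cells):
        cumulative += mass / total
        if cumulative >= 1 - alpha:
            return (i + 1) * h  # upper edge of the cell that reaches the target
    return None


print(linkage_interval(20, 20))  # hypothetical: no discordant progeny among 20
print(linkage_interval(0, 0))    # no data: None, since P(no linkage) = 0.95
```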

It appears from the above discussion that, in practice, Bayesian analysis has little effect other than to demand more stringent criteria for a given level of significance. This observation may explain why a Bayesian significance level of 0.05 is approximately equivalent to a LOD score of 3.0, reflecting the approximate factor of 20 (1330/70 = 19) by which P(no linkage) exceeds P(linkage) in the absence of prior data. This is less important, however, than appreciating that the statistical questions addressed by the maximum likelihood and Bayesian interpretations differ, as outlined above. The crux of this difference is that the Bayesian interpretation leads naturally to evaluation of P(linkage|data) at every location in the genome, and that over all locations these values must sum to 1. The Bayesian interpretation partitions a finite, normalized P(linkage|data), while the maximum likelihood interpretation compares the likelihood of the best location with that of no linkage, without considering the excess of unlinked locations in the genome.
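A rough arithmetic check of the factor-of-20 argument, assuming the uniform prior of the example (P(linkage) = 0.05). It computes the Bayes factor, P(data|linkage)/P(data|no linkage) with the linked likelihood averaged over positions, needed to raise the posterior probability of linkage to 0.95. Because an averaged likelihood cannot exceed the maximum used in the LOD score, the required maximum LOD is at least log10 of this factor, about 2.56, which is broadly in line with the approximate equivalence to a LOD of 3.0. This is an illustration, not a derivation from the original text.

```python
from math import log10

# Illustrative arithmetic; variable names are hypothetical, not from the source.
prior_link = 70 / 1400                      # P(linkage) before any data
prior_odds = prior_link / (1 - prior_link)  # 1:19 in favour of no linkage
target_posterior = 0.95                     # P(linkage | data) at a 0.05 significance level
target_odds = target_posterior / (1 - target_posterior)  # 19:1 in favour of linkage

# Posterior odds = Bayes factor * prior odds, so the required Bayes factor is:
required_bayes_factor = target_odds / prior_odds
print(round(required_bayes_factor), round(log10(required_bayes_factor), 2))  # 361 2.56
```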
The Bcl-2 family of cytoplasmic proteins are major
regulators of programmed cell death. This process of 
cellular suicide, termed apoptosis, is thought to be 
conserved among all metazoan organisms. Apoptosis 
is vital for normal development, maintenance of tissue 
homeostasis, and proper immune function, and dis- 
turbances in it are implicated in disorders ranging 
from cancer to degenerative and autoimmune diseases. 
Typically, members of the Bcl-2 family, some of which 
work in opposition to the others, determine whether 
a cell commits to undergo apoptosis. The apoptotic 
program is mediated by proteases of the caspase 
group, which cleave vital cellular proteins. However, 
the caspases are normally maintained as nearly in- 
active precursors. It is thus the task of the Bcl-2 family, 
in response to developmental and environmental cues 
and various intracellular damage signals, to determine 
whether certain procaspases are processed into the 
active form and initiate the proteolytic cascade that
dismembers the cell. 


Discovery and Link to the Cell Death Pathway


The gene for Bcl-2, the founder of the family, was 
discovered as a presumptive oncogene activated by 
chromosome translocation in follicular lymphoma, a 
common human malignancy of B lymphocytes. The 
subsequent unexpected discovery in 1988 that the
bcl-2 gene promoted cell survival was seminal because
it revealed the first molecularly defined regulator of 
cell death. Moreover, the finding immediately sug- 
gested that enhanced cell survival might be a critical 
step in oncogenesis, whereas attention had previously 
been focused almost exclusively on altered prolifer- 
ation. The first insights into the genetics of cell death 
came from the nematode Caenorhabditis elegans. All 
the cell deaths that occur during the development of 
this worm have been shown to require three genes 
(ced-3, ced-4, and egl-1), whereas the cells are saved 
by a gain-of-function mutation in the ced-9 gene. 
Genetic analysis now suggests a pathway in which 
EGL-1 counters CED-9 activity, whereas CED-9 
inhibits CED-4, which is required for the action of 
CED-3. Satisfyingly, the ced-9 gene proved to be the 
nematode counterpart of bcl-2, and indeed each of the 
C. elegans genes is now known to have one or more 
homologs in mammals and probably also in Droso- 
phila. CED-3 proved to be a caspase, whereas CED-4 
is an adaptor or scaffold protein that activates CED-3 
by binding its precursor and inducing it to form multi- 
mers that can undergo autocatalysis. Thus, CED-9 
functions by keeping CED-4 in a latent form. Like- 
wise, Bcl-2 is thought to restrain the activity of the 
mammalian CED-4 homolog Apaf-1 (see below), 
which regulates activation of caspase-9. 


Competing Activities within the Bcl-2 Family


A remarkable feature of the Bcl-2 family is that, 
whereas some members promote cell survival, others 
instead favor apoptosis. The mammalian life-sparing 
members include not only Bcl-2 but also Bcl-xL,
Mcl-1, A-1, and Boo. They share either three or four 
conserved regions, termed Bcl-2 homology (BH) 
domains. (Reflecting their order of discovery, these 
domains are numbered from the N-terminus as BH4 
- BH3 - BH1 - BH2.) The promoters of apoptosis fall 
naturally into two groups. The structure of some, such 
as mammalian Bax, Bak, and Bok, is surprisingly simi- 
lar to that of the Bcl-2 subfamily and includes the 
BH3, BH1, and BH2 domains. The second apoptogenic 
group, however, is much more distant and heterogeneous. Its members, which include the mammalian Bik, Bad, Bid, Bim, Hrk, and Noxa as well as the nematode EGL-1, have only the short (9-16 residue) BH3 domain in common with the family and with each other. This amphipathic α-helix is necessary and perhaps sufficient for the apoptogenic action of the “BH3-only” proteins and is also very important in that of the Bax subfamily.

The opposing action of various Bcl-2 family members is in part due to their ability to form heterodimers, and structural studies on Bcl-xL revealed the basis. A hydrophobic groove on the surface of Bcl-xL, formed by the convergence of α-helices in its BH3, BH1, and BH2 regions, can bind tightly to the BH3 α-helix of an apoptogenic family member. This interaction is thought to inactivate pro-survival function, or perhaps even render the molecule apoptogenic.

In addition to the BH domains, most members of 
the Bcl-2 and Bax subfamilies and certain of the BH3- 
only proteins also possess a hydrophobic C-terminal 
domain. It facilitates their targeting to the surface of 
the mitochondria, endoplasmic reticulum, and nuclear 
envelope, the sites where the majority of pro-survival 
molecules typically reside and where the apoptogenic 
ones gather during apoptosis. 

To preclude unwarranted apoptosis, healthy cells 
prevent heterodimerization between family members 
in various ways. Some of the apoptogenic genes, such 
as those for EGL-1, Hrk, or Noxa, are transcribed 
predominantly only after particular apoptotic stimuli. 
Most apoptogenic members, however, are constitu- 
tively made but restrained by their conformation or 
subcellular localization. In Bid, for example, the BH3 domain is buried and becomes exposed only when Bid has been cleaved, for instance by caspase-8, which is usually activated by “death receptors” of the TNF receptor family. With Bad, phosphorylation at different sites
allows its sequestration by 14-3-3 proteins or directly 
masks its BH3 domain. Bim, on the other hand, is 
rendered inactive by its association with the dynein 
motor complex on microtubules. Finally, the conform- 
ation of Bax keeps it in the cytosol until an unknown 
signal triggers its oligomerization and translocation to 
organelles such as mitochondria. 


Biological Roles of Family Members 


It now seems reasonable to expect that, in all metazoan 
organisms, the activity of Bcl-2-like proteins is essen- 
tial for the survival of most if not all nucleated cells. 
Development in C. elegans, for instance, requires the 
ced-9 gene. In mice, disruption of pro-survival genes 
has led to apoptosis in specific tissues. For example, 
Bcl-2 is required for the maintenance of the lymphoid
system, Bcl-xL for erythropoiesis and neurogenesis,
and Bcl-w for spermatogenesis. These tissue-specific 
effects, however, almost certainly reflect the partially 
overlapping expression patterns and largely redundant 
functions of these guardians. 

The BH3-only proteins appear to represent senti- 
nels for various types of intracellular damage as well as 
major mediators of developmentally induced apopto- 
sis. In C. elegans, expression of the egl-1 gene heralds 
developmental cell death. In mice, the bim gene is a 
major mediator of hemopoietic homeostasis, because 
its disruption leads to excess myeloid and lymphoid 
cells and eventually to autoimmune disease. Notably, 
the bim−/− lymphocytes exhibit impaired responses
to certain apoptotic stimuli (e.g., cytokine withdrawal, 
taxol) but not others (e.g., gamma-irradiation), whereas 
liver cells deficient in Bid are refractory to cytotoxic 
signals from the CD95 death receptor. Hence, 
specific BH3-only proteins may have the prime 
responsibility for monitoring particular cellular com- 
partments and/or detecting particular stresses. The 
biological role of the Bax-like genes is not as yet 
well understood. They may act as additional sentinels 
or instead act mainly downstream of Bcl-2-like pro- 
teins, perhaps to deliver the final blow to the cell after 
the latter have been inactivated. Growing evidence 
suggests that the Bax-like proteins can kill cells 
independently of association with the pro-survival 
molecules, perhaps by damaging mitochondria (see below).


Potential Molecular Mechanisms for Regulating Cell Death


At present the function of the apoptogenic family 
members seems better understood than that of the 
pro-survival ones. Most BH3-only proteins appear 
to represent direct antagonists of their pro-survival 
relatives, although it is uncertain whether Bid targets 
those molecules or instead activates Bax or Bak. Mem- 
bers of the Bax-like group may kill in two ways: (1) by 
using their BH3 domain to ligate the Bcl-2-like pro- 
teins and (2) by using the duplex hairpin structure in 
the BH1-BH2 region to penetrate the membranes of 
organelles, particularly mitochondria. 

How the Bcl-2-like proteins promote cell survival remains contentious. It is generally accepted that their impact is greatest on death stimuli that would otherwise lead to mitochondrial disturbances. More specifically, the major pathway regulated by the mammalian antiapoptotic family members is thought to lead to activation of caspase-9 via the adaptor Apaf-1, the only known mammalian homolog of C. elegans CED-4. This belief is based on the observations that Bcl-2 can prevent the release from mitochondria of cytochrome c, an essential cofactor for Apaf-1, and on evidence that some (though not all) cells lacking Apaf-1 or caspase-9 are refractory to cytotoxic stimuli that Bcl-2 can regulate.


Direct Sequestration of Caspase Activators?

At present, there is conflicting evidence as to whether
a Bcl-2-like protein functions by directly or indirectly 
inhibiting a caspase activator. For C. elegans, a direct 
sequestration model is strongly favored. In healthy 
worm cells, CED-9 appears to hold CED-4 on the 
mitochondria, but in cells fated to die EGL-1 is 
expressed and displaces CED-4 from CED-9. CED- 
4 then translocates to the nuclear envelope, where 
it presumably activates CED-3. In mammalian cells, 
on the other hand, the pro-survival proteins do not 
appear to associate with Apaf-1. In healthy cells, 
Apaf-1 appears to be a monomeric cytosolic protein, 
although some Apaf-1 molecules may be associated 
with other proteins (HSP90, Aven). If Bcl-2 does not 
sequester Apaf-1, it must control Apaf-1 indirectly. 
Bcl-2 might, for example, sequester another, as yet 
unidentified, mammalian CED-4 homolog which acts 
upstream of Apaf-1. In that model, the true initiator 
caspase regulated by Bcl-2 remains to be identified, 
and Apaf-1 and caspase-9 serve merely to amplify the 
proteolytic cascade. 


Guardian of the Organelle Barrier? 

Another model for pro-survival function, currently more widely embraced, abandons the parallel with the C. elegans pathway and postulates that the mammalian Bcl-2-like proteins act by preserving the integrity of organelles, particularly the mitochondrial outer membrane. Certainly, the pro-survival proteins can prevent the release from the mitochondrial intermembrane space of cytochrome c and other apoptogenic molecules, such as the recently described Diablo/Smac. How Bcl-2 preserves this barrier function, however, remains obscure. One clue is that some evidence suggests that Bcl-2 normally associates with the mitochondrial channels, the ‘permeability transition pores,’ that allow small molecules such as ATP to pass through the two mitochondrial membranes. Speculatively, Bcl-2 might stabilize this pore, whereas oligomerized Bax might interact with the pore, or act alone, to form putative novel channels large enough to allow passage of the apoptogenic proteins. Thus, it remains to be
determined whether a Bcl-2-like protein functions 
through direct association with a novel caspase- 
activator protein or with an organelle component, 
such as the mitochondrial pores, or in some other 
fashion. In view of the central role of the Bcl-2 family 
in determining the life or death of cells, the answers 
will be eagerly sought. 
The BCR/ABL oncogene is the product of the Philadelphia chromosome (denoted Ph¹ or simply Ph). The Ph chromosome was first identified over 40 years ago by Nowell and Hungerford in Philadelphia as an abnormally short G-group chromosome in blood cells from patients with the myeloproliferative disease chronic myelogenous or myeloid leukemia (CML), and has the distinction of being the first genetic abnormality identified in a human cancer. The development of chromosome-banding techniques allowed the identification of the Ph chromosome as the der(22) product of a balanced translocation between chromosomes 9 and 22, t(9;22)(q34.1;q11.21). The structure of the breakpoint on the Ph chromosome was elucidated in 1984, with the demonstration that the c-ABL gene on chromosome 9 was translocated to a restricted region of about 6 kb on chromosome 22 called the breakpoint cluster region, or bcr. Subsequent characterization of this region demonstrated that bcr was in the middle of a protein-coding gene composed of 25 exons, now called BCR. In contrast, the breakpoint in the ABL gene on chromosome 9 was variable, and located in a large first intron of over 250 kb. The
c-ABL gene encodes a nonreceptor protein-tyrosine kinase, c-Abl (see c-ABL Gene and Gene Product), while the BCR gene product is a 160 kDa cytoplasmic phosphoprotein, Bcr. As a consequence of the Ph translocation in CML, the first 13 or 14 exons of BCR are fused 5′ to the second exon of c-ABL, with maintenance of the translational reading frame. Transcription of the chimeric gene and RNA splicing generate an 8.5 kb mRNA encoding a fusion protein of 210 kDa, designated p210 Bcr/Abl. There are actually two alternative p210 fusion genes (usually designated b2a2 and b3a2) found in different CML patients, depending on whether BCR exon 14 is included in the fusion. The b3a2 p210 protein is 25 amino acids longer than the b2a2 isoform. The reciprocal translocation product from the der(9) chromosome is a chimeric ABL/BCR gene whose reading frame is also intact, but the variable expression of this gene in CML patients argues against a significant role for this fusion gene in leukemogenesis.

The Ph chromosome is found in over 90% of patients with clinical features of CML (which include leukocytosis with increased maturing myeloid cells, hepatosplenomegaly, and basophilia). Among the infrequent patients with CML-like disease who lack the Ph chromosome, about half demonstrate molecular evidence of fusion of BCR and ABL when Southern blotting, reverse transcriptase polymerase chain reaction, or fluorescence in situ hybridization is employed. These patients are likely to have multiple chromosomal rearrangements in their leukemic cells that obscure the Ph translocation. Of the remaining Ph-negative patients, many have atypical clinical features and may represent another disease, such as a myelodysplastic syndrome. A small number may have a variant 9q34 translocation that fuses c-ABL to another gene, such as TEL (ETV6) on chromosome 12p13. Besides CML, the Ph chromosome is also
found in several other human hematologic malignancies, most commonly in some cases of B-lymphoid acute lymphoblastic leukemia (B-ALL), and infrequently in acute myeloid leukemia, non-Hodgkin’s lymphoma, and multiple myeloma. The majority of adult and pediatric patients with Ph-positive B-ALL have a distinct type of chimeric BCR/ABL gene that results from a chromosome 22 breakpoint within the first intron of BCR, rather than in the classic bcr region. The product of this chimeric gene (denoted e1a2) is a fusion protein of 190 kDa, p190 Bcr/Abl (also referred to as p185 in some references). A third and less common variant fuses BCR exon 19 to ABL exon 2 (e19a2), generating a p230 form of Bcr/Abl. Rare patients with fusion of BCR exon 1 or 13 to ABL exon 3 (e1a3 and b2a3) and of BCR exon 6 to ABL exon 2 (e6a2) are also observed; in all of these the translational reading frame is preserved.

While neither Bcr nor c-Abl will transform cells, the Bcr/Abl fusion protein transforms fibroblasts, hematopoietic cell lines, and primary bone marrow cells in vitro. Furthermore, expression of the BCR/ABL gene in hematopoietic cells in mice, by retroviral transduction or in transgenic mice, induces fatal leukemias that closely resemble human CML and B-ALL, demonstrating that BCR/ABL is the fundamental cause of these diseases. The Bcr/Abl fusion protein is localized to the cytoplasm and actin cytoskeleton of hematopoietic cells, and has increased and dysregulated tyrosine kinase activity relative to c-Abl. The tyrosine kinase activity of Bcr/Abl is required for transformation and leukemogenesis, and small molecule inhibitors of the Abl kinase can revert Bcr/Abl-transformed cells in vitro and induce clinical
remissions in CML patients. In the Bcr portion of the p210 polypeptide, the fusion protein contains an N-terminal coiled-coil oligomerization domain, an atypical serine-threonine kinase domain, a region that binds to the SH2 domain of Abl in a non-phosphotyrosine-dependent manner, and a region of homology to the Dbl oncogene and yeast Cdc24. The Abl portion of the fusion contains all of c-Abl except the short first exon-derived sequence, and includes the Src homology 3 and 2 domains, the catalytic domain, and a large C-terminal region containing nuclear localization and export signals and DNA- and actin-binding domains. Expression of Bcr/Abl results in constitutive tyrosine phosphorylation of many cellular proteins and activation of a large number of cell signaling pathways, including Ras, Rac, MAPK/ERK, SAPK/JNK, phosphatidylinositol 3-kinase, NF-κB, Myc, Jun, and Jak/STAT. Experiments with inhibitors and dominant-negative mutants suggest that many of these pathways contribute to transformation by Bcr/Abl. Cells expressing Bcr/Abl have increased proliferative capacity in limiting concentrations of mitogens, increased survival in response to cytokine deprivation and genotoxic damage, and multiple abnormalities of cytoskeletal structure and function. However, the precise mechanisms by which Bcr/Abl induces cell transformation in vitro or leukemia in vivo are not understood.


See also: c-ABL Gene and Gene Product; 
Leukemia, Chronic 


Bead Theory 


J H Miller 


Copyright © 2001 Academic Press 
doi: 10.1006/rwgn.2001.0121 


Bead theory is the concept that genes are arrayed on the
chromosome much like beads in a necklace, in that
different allelic states are represented by the whole gene,
or whole bead, being different. According to this theory,
the gene cannot be separated into smaller parts that
are themselves mutable or able to be recombined.
Benzer’s uncovering of the fine structure of the gene
rendered the bead theory obsolete.


See also: Alleles 


Beckwith-Wiedemann Syndrome
In Beckwith-Wiedemann syndrome, babies are born
with both pre- and postnatal overgrowth. Children
are prone to embryonal tumors, with about 5%
developing tumors, particularly Wilms’ tumor. Although
most cases are sporadic, there is some clustering in
families and clearly a genetic component. The genetics
is particularly complex because it involves a cluster of
genes that maps to 11p15 and shows genomic
imprinting. A small number of cases show cytogenetic
abnormalities involving this region of 11p. Maternal
transmission of these abnormalities carries a greater risk
to the offspring, demonstrating the parent-of-origin
effects associated with imprinting. In approximately
20% of sporadic cases, paternal uniparental disomy
of 11p15.5 is found as a somatic mosaic, which has
arisen through postzygotic mitotic recombination.
Significant genes within the imprinted gene
cluster are IGF2, H19, KVLQT1, and CDKN1C, and
the most important event is overexpression of IGF2.
This is mediated by H19, a gene with no protein
product. Point mutations have occasionally been found in
CDKN1C, which encodes a cyclin-dependent kinase
inhibitor, but KVLQT1 is unlikely to have an effect on
the phenotype.


See also: Wilms’ Tumor 


Behavioral Genetics


Overview


Behavioral genetics is one of the oldest areas within
the discipline and, until relatively recently, one of the
least well-developed, for two reasons. The first is
technical: the difficulty of measuring behavior, which
is a dynamic phenotype quite unlike any of the
characteristics that geneticists usually like to measure. The
second is its political incorrectness, and the shadow
cast by its tragic and gruesome history in the twentieth
century. Since the 1980s, however, the genetic analysis
of the nervous system and the explosion of interest
in molecular neuroscience have moved the study of
behavior into center stage.


The Early Days 


Plato, writing in The Republic, was the first to suggest
that the 


best men should as often as possible form alliances with the 
best women, and the most depraved men, on the contrary, 
with the most depraved women: and the offspring of the 
former is to be educated, but not of the latter, if the flock is to 
be of the most perfect kind. 


This crude eugenics message has found many
followers in more recent times. Francis Galton, as a
young man, took distinguished Victorians (men, of
course) whom he considered to be geniuses in their
own fields, and examined the eminence of their male
relatives. He observed that the closer the genetic
relationship, the more likely the relatives were to be
successful, which suggested to him a genetic predisposition
for mental ability. As an older man, Galton founded
the Eugenics Society, and every decade there was
an international meeting to discuss genetics and the
“directed” evolution of human behavior. The first two
meetings were attended by the “who’s who” of the
genetics world. The third meeting, held in New York
in August 1932, had been abandoned by reputable
geneticists, and had descended into an unpleasant
farce, with paper after paper advocating mass
sterilization of moral or mental “inferiors.” Unfortunately, this
message was taken literally in the USA, where it marked a
dark period in the country’s history, with tens of thousands of
forced sterilizations. This was to be taken much further
in Germany, culminating in the Holocaust.

In this climate of genetic determinism, mostly advo- 
cated by the medical profession, psychologists inter- 
ested in the genetic analysis of behavior either kept 
their heads down or, like JB Watson, put forward the 
opposite view of extreme environmentalism. Mean- 
while, during the 1920s and 1930s, Edward Tolman 
and his student Robert Tryon were performing their 
classic experiments on the genetic basis of maze 
learning in rats. This work was not fully published 
until the late 1940s, when it stimulated the field we 
now call behavioral genetics. There were two schools
in those early days of the 1950s: the American
school, which consisted of psychologists interested in
subjects such as animal learning, and the European
school, whose background lay more in the
emerging field of ethology. The sophisticated developments
that were taking place in quantitative genetics pro- 
vided useful tools for analyzing the genetic architec- 
tures of polygenic behavioral traits from inbred or 
selected strains. Consequently, hundreds of studies in 
mice, rats, and flies were conducted in which the main 
question asked was whether high or low levels of a 
specific behavior in particular strains were due to 
underlying dominant, additive, or epistatic genetic vari- 
ation. Mapping behavioral genes was not feasible 
using such methods. 

Geneticists interested in human behavior also made
headlines, using family and adoption studies,
particularly the twin paradigm, to assess the contributions of
genetic variation to human traits such as IQ,
alcoholism, schizophrenia, manic depression, etc. The IQ
debate that ran from the late 1960s onwards was
particularly fiercely contested, and its implications for racial
differences stirred up sensitive political and
sociological issues. The overall finding of all these studies, on almost
every conceivable personality trait, was that there was
nearly always a significant genetic component,
sometimes minor, sometimes more compelling. However,
localization of the relevant genes was impossible until
the molecular revolution took hold in the 1980s.


Benzer and Single Genes 


In the late 1960s, Seymour Benzer, a geneticist work- 
ing in California, advocated a new approach to study- 
ing behavior. Using Drosophila as the model system 
he suggested that instead of investigating polygenic 
inheritance through strain differences, chemical muta- 
genesis should be used to induce single gene mutations 
in the behavior of choice. Setting up ingenious mass- 
screening techniques, and clever genetic tricks, he 
succeeded in generating many mutations in simple 
behaviors such as phototaxis, geotaxis, flight, loco- 
motor activity, and coordination. The mutations 
could then be mapped, and by use of genetic mosaics 
and fate mapping (see Neurogenetics in Drosophila), 
the region of the nervous system responsible for the 
mutant behavior could be roughly identified. In 
essence, he was using the mutated gene and behavior 
as scalpels to dissect the function of the nervous 
system. This gave rise to the term ‘neurogenetics,’ 
and represented a dramatic departure from the quanti- 
tative genetics methodologies used previously to study 
behavior. 

Initially, this new work caused enormous problems
for the more traditionally minded behavioral
geneticists. They were unimpressed with the anatomical,
physiological, and biochemical studies coming from
Benzer’s laboratory, and complained that the mutants,
which often showed quite bizarre behavior (e.g., leg
shaking under ether, dropping dead after a few hours),
could not contribute to any understanding of
“normal behavior,” either at the functional or
evolutionary level. However, as we shall see below, they
were wrong.

Gradually, Benzer’s students began to study much 
more complex phenotypes, including learning, circa- 
dian rhythms, and sexual behavior. For example, they 
isolated mutants that could not learn to associate an 
odor with electric shock, or mutants that could learn 
but forgot very quickly, mutants whose 24-hour 
circadian clock ran fast or slow, or not at all (see 
Clock Mutants), and mutants whose lovesongs were 
abnormal and so males had trouble finding a mate 
(see Neurogenetics in Drosophila). The relevant genes 
were mapped, and mosaic analysis revealed those parts 
of the nervous system that were responsible for 
generating normal behavior. This was particularly 
successful in the analysis of courtship in which the 
various hierarchically arranged steps in the male
behavioral sequence were mapped sequentially to
different neural foci. This was noninvasive neuro- 
ethology at its best. 


Molecular Behavioral Genetics 


In the early 1980s, Richard Scheller demonstrated that 
behavioral genes could be identified at the molecular 
level, by cloning a neuropeptide gene involved in the 
egg-laying behavior of Aplysia, a large marine mollusc. 
The gene was highly expressed in a cluster of neurons 
called bag cells, and by extracting their mRNA during 
the Aplysia mating season, and then screening a genomic 
library, Scheller identified the genomic sequence that 
corresponded to egg-laying hormone (ELH), a small 
neuropeptide whose amino acid sequence was already 
known. The gene also encoded a number of other 
neuropeptides that could be cleaved posttranslationally,
and which were also known to be important in control- 
ling various aspects of egg-laying behavior. This repre- 
sented a major leap forward in understanding how a 
single gene could encode a complex behavioral pro- 
gramme, in that a coordinated set of egg-laying behav- 
iors could be encoded by a gene releasing more than 
10 neuropeptides, either singly or in combination. 

By 1984, the first behavioral genes in Drosophila
had been cloned: the learning mutant dunce and the
circadian clock gene period (per). Later that year the
per gene was transformed back into arrhythmic per
mutant hosts, rescuing the 24-h circadian phenotype.
The per gene became, and still is, the cutting edge of the
molecular analysis of behavior, and a number of other
clock genes have been isolated, initially by mutagenesis,
and their molecular roles in generating the circadian
phenotype have been elucidated (see Clock Mutants).
Furthermore, intraspecific natural variation in the 
coding regions of the per gene has been shown to 
have important implications for the fly’s Darwinian 
fitness, with natural selection distributing this varia- 
tion geographically along a latitudinal cline in Europe 
and Australia. 

In addition, the per gene carries with it species- 
specific behavioral information, so that circadian 
locomotor activity patterns of different fly species 
can be transferred between species by interspecific 
transformation. This also applies to another species- 
specific phenotype controlled by the per gene, the 
60-sec lovesong cycle generated by the male’s wing 
display during courtship. These results show how the 
identification of per by mutagenesis has ultimately 
resulted not only in dissecting out the molecular basis 
of the circadian clock, but also in contributing to our 
understanding of the biological clock in an evolu- 
tionary, ecological and population genetics context. 
This example provides the most powerful response to 
the critics of Benzer’s approach, namely that it could 
never tell us anything about normal behavior, nor its 
evolution. 

Other equally compelling stories have been devel- 
oped in the study of learning and memory, sexual 
behavior, olfaction, etc. (see Neurogenetics in
Drosophila). The fly remains a wonderful model system for
behavior genetic research because of its rich repertoire 
of behavior and its tractable genetics. However, other 
higher eukaryote models have now been developed, 
primarily the nematode and the mouse. Sydney 
Brenner, like Benzer, also saw the value of the single- 
gene approach, and extensive mutagenesis of 
Caenorhabditis elegans has provided many behavioral 
variants in mechanoreception, locomotion, etc. Beha- 
vior in the worm is rather less sophisticated than in the 
fly, but the developmental fate of all 302 nerve cells is 
known, as is the worm’s complete DNA sequence, so 
the potential for molecular neurogenetics is enormous 
(see Neurogenetics in Caenorhabditis elegans).

Forward genetics by mutagenesis followed by be- 
havioral screening is not very practical in the mouse, 
although recently the approach has had some stunning 
success in the case of the Clock circadian rhythm 
mutant (see Clock Mutants). The development of 
gene knockouts in the mouse has permitted the targeted 
mutagenesis of known DNA sequences, and the assess- 
ment of function. This has been very informative for a 
number of genes that are involved in long-term mem- 
ory formation, particularly the gene that encodes 
CREB, a protein that mediates cAMP-responsive 
transcription, and that encoding the protein kinase 
CaMKII. Molecular analysis of learning and memory
in flies, mice, and Aplysia has revealed a highly conserved
mechanism (see also Neurogenetics in Drosophila).


Human Behavioral Genetics 


In spite of the molecular revolution, much of the 
behavioral genetics of mammals still focuses on the 
same paradigms that have been in use for 50 years, 
particularly with humans, where molecular analysis 
is particularly difficult. Thus, complex multivariate 
models continue to be imposed on family pedigree 
data involving complex behavioral traits such as
general cognitive ability, schizophrenia, and mood
disorders, and are designed to partition the underlying
genetic and environmental factors. Many attempts 
have been made to identify major gene contributions 
to psychopathological conditions using linkage analy- 
sis with DNA markers, but as yet no compelling case 
has been made, particularly for schizophrenia. 

The most spectacular case of an apparently complex 
personality disorder that was mapped to a single muta- 
tion involved a Dutch family in which half the boys or 
young men showed sudden outbursts of unrestrained 
violence, arson, attempted rape, and exhibitionism. 
This X-linked trait was mapped using DNA markers 
to the site of the monoamine oxidase A (MAOA) gene, 
whose product metabolizes synaptic serotonin, dopa- 
mine, and noradrenaline. Sequencing revealed a point 
mutation that generated a premature translational stop 
codon, truncating the MAOA product and having 
dramatic effects on monoamine metabolism. Other 
success stories include the study of the catastrophic 
neurodegeneration and severe behavioral abnormal- 
ities caused in middle age by Huntington disease and 
Alzheimer disease. A number of genes known to be 
important in the development of these conditions have 
been identified by linkage analysis. However, these 
severe neuropathologies are not generally included 
under the umbrella of personality disorders. 

An interesting alternative approach to human 
behavioral genetics has been to take candidate genes 
that may underlie various personality traits, examine 
them for variation, and then attempt to correlate this 
variation with different levels of the trait. This type of 
analysis was performed for the personality trait of 
novelty-seeking with the gene encoding the dopamine 
D4 receptor (D4DR). A length polymorphism involving
between two and eight copies of a repeat encoding
16 amino acids means that individuals can have
either a long or a short allele of the gene. There appears
to be a positive correlation between individuals’
scores on a novelty-seeking questionnaire (i.e.,
people whose idea of fun is to bungee jump, as opposed
to those who prefer to stay in and watch the soccer
on TV) and the length of the D4DR alleles they carry.
This study has been replicated a number of times, but
the proportion of the variation accounted for by this
locus is very small, about 4-5%.


Future Prospects 


There is little doubt that the new century will see 
remarkable developments in animal behavioral 
genetics, driven mainly by the molecular revolution. 
The various mammalian genome projects will be 
concluded in the next few years and the conservation 
of gene function in behavior that has been seen, for 
example in learning and circadian rhythms across 
taxa, will inevitably provide candidate genes and 
behaviors that can be analyzed in mammals, including 
humans. Developments in breeding techniques, and 
increased sensitivity of the mathematical tools for 
identifying the gene loci that make small contributions 
to behavioral phenotypes, will begin to dissect the 
polygenic bases of behavior, particularly in tractable 
organisms such as mice. Linkage studies will continue 
to be used in attempts to find loci that make contri- 
butions, however small, to both normal and abnormal 
phenotypes in human behavior. As the field expands 
and becomes an area within molecular neuroscience, 
important political and ethical questions will have 
to be addressed about the use of the knowledge that 
will be obtained. Will it be a Pandora’s box, or will it 
finally herald in the Age of Reason? 


See also: Alzheimer’s Disease; Benzer, Seymour; 
Clock Mutants; Neurogenetics in Caenorhabditis 
elegans; Neurogenetics in Drosophila; 
Schizophrenia 


Benzer, Seymour


After receiving his BA from Brooklyn College in
1942, Seymour Benzer (1921-) studied solid state
physics at Purdue University, earning his PhD in
1947. However, it was his participation in the summer
bacteriophage genetics course at Cold Spring Harbor
Laboratory that altered the direction of Benzer’s
research and led him to become one of the
leaders in the rapidly developing field of molecular
biology. After a year at Oak Ridge, Benzer spent two
years with Max Delbrück at Caltech, from 1949 to
1951, followed by a stint at the Pasteur Institute in
the laboratory of André Lwoff (1951-52). Benzer then
headed a laboratory at Purdue from 1953 to 1965.
At Purdue, Benzer developed bacteriophage genetics
to a fine art, exploiting deletion mapping and
high-resolution selection with the phage T4 rII system. In
addition to introducing the term ‘cistron’ to define
the gene as a functional unit, he was involved in
deciphering nonsense mutations and in discovering
suppressors for them. His papers on the
fine structure of the gene were a landmark in molecular
genetics, since they helped to bridge the gap between
the classical view of the gene as an indivisible unit and
the Watson-Crick structure of DNA, which pointed to
individual base pairs as units of mutation and
recombination. He showed that the gene consisted of a linear
array of subunits that could mutate and could
recombine with one another, and that were later correlated
with individual base pairs. Benzer also showed that
some points in the gene were more mutable than
others, and coined the term ‘hotspot’ to refer to these
points. In the early 1960s, Benzer’s interest shifted to
neurobiology, using the fruit fly Drosophila as a model
system, and in 1965 he joined the Caltech (California
Institute of Technology) faculty in Pasadena. He has
recently been involved in studying the Methuselah
gene in Drosophila, which can increase the life span of the
fruit fly when mutated. Benzer has received numerous
awards, including the National Medal of Science.
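The logic of deletion mapping that Benzer exploited can be illustrated with a toy model: a point mutation can yield wild-type recombinants with a deletion only if its site lies outside the deleted segment. In the sketch below the gene is divided into numbered segments, and the deletion names and endpoints are hypothetical; it is not a reconstruction of Benzer’s rII data.

```python
from typing import Dict, List, Tuple

Deletion = Tuple[int, int]  # first and last deleted segment (inclusive)


def gives_wild_type(site: int, deletion: Deletion) -> bool:
    """Wild-type recombinants are possible only if the mutant site is not deleted."""
    first, last = deletion
    return not first <= site <= last


def locate_mutation(results: Dict[str, bool], deletions: Dict[str, Deletion],
                    n_segments: int) -> List[int]:
    """Return the segments consistent with every cross.

    `results` maps a deletion name to True if the cross with the point mutant
    produced wild-type recombinants, and to False if it did not.
    """
    return [
        site for site in range(n_segments)
        if all(gives_wild_type(site, deletions[name]) == seen
               for name, seen in results.items())
    ]


# Hypothetical overlapping deletions in a gene divided into 10 segments.
deletions = {"del1": (0, 5), "del2": (3, 9), "del3": (6, 7)}
print(locate_mutation({"del1": False, "del2": True, "del3": True}, deletions, 10))  # [0, 1, 2]
print(locate_mutation({"del1": False, "del2": False}, deletions, 10))               # [3, 4, 5]
```

Each additional overlapping deletion narrows the interval, which is why a nested set of deletions lets a mutant be placed with only a handful of crosses.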


See also: Behavioral Genetics; Cistron; 
Delbrück, Max


Beta (β)-Galactosidase
R E Huber 


Copyright © 2001 Academic Press 
doi: 10.1006/rwgn.2001.0489


β-Galactosidase is an enzyme found in many
organisms. The β-galactosidase that has been studied in the
most detail is encoded by the lacZ gene of the lac operon
of Escherichia coli. It catalyzes β-D-galactoside
breakdown and galactosyl transfer reactions. This enzyme is
of both scientific and historical significance. Jacob and
Monod used it to study the lac operon and induction,
and later won the Nobel Prize for their work.


Structure 


The enzyme is composed of four identical monomers
(1023 amino acids; mol. wt = 116 353), each monomer
having five domains. β-Galactosidase is active only
as a tetramer. The active sites, located on TIM
barrel domains, are shared between subunits.
M15-β-galactosidase is a form of the enzyme that has a
deletion of residues 11-41. It is not able to form tetramers
and, as a result, is inactive; however, it can be activated
by the addition of a polypeptide that contains residues
3-44. The activation occurs as a result of formation
of the tetrameric structure. The activating peptide is
called the α-peptide and the process of activation
is called α-complementation. α-Complementation is
important in molecular biology and in diagnostics.


Substrates and Assays 


Binding Site 
β-Galactosidase has two subsites, which bind galactose and
glucose (the two monosaccharides of lactose):


1. Galactose subsite. The enzyme is highly specific for
D-galactose. Only sugars that differ from galactose at
position 6 are tolerated, and even they are poor
substrates.

2. Glucose subsite. β-Galactosidase has low
specificity for D-glucose, for which a variety of alcohols
can be substituted. The glucose subsite is
hydrophobic, so β-galactosides that have hydrophobic
groups in place of glucose bind very well. The
affinity for D-glucose increases significantly after
the glycosidic bond has been broken.


Lactose 
Hydrolysis of lactose yields galactose and glucose. 
Intramolecular galactose transfer yields allolactose, 
the natural inducer of the lac operon. Allolactose is
also hydrolyzed and thus is only a transient product. 
A complete assay that quantifies the three products 
of reaction with lactose (D-galactose, D-glucose, and 
allolactose) is best accomplished using gas-liquid 
chromatography; however, this assay is quite time 
consuming. A coupled assay with galactose dehydro- 
genase enables one to follow galactose production 
quickly and a coupled assay using a combination of 
hexokinase and glucose-6-phosphate dehydrogenase 
is well suited for glucose quantification. 


Synthetic Substrates 
Two common synthetic substrates are
o-nitrophenyl-β-D-galactopyranoside (ONPG) and
p-nitrophenyl-β-D-galactopyranoside (pNPG). The nitrophenol
products absorb at 420 nm and assays are rapid and straightforward.
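ONPG assays on whole cells are often reported in Miller units, which normalize the yellow color read at 420 nm to reaction time, volume of culture assayed, and cell density. The function below is a minimal sketch of the conventional Miller formula; it assumes the standard assay layout (absorbance at 550 nm used to correct for light scattering by cell debris, culture density read at 600 nm), which the text above does not itself specify, and the readings in the usage line are made up.

```python
def miller_units(od420: float, od550: float, od600: float,
                 minutes: float, volume_ml: float) -> float:
    """Beta-galactosidase activity in Miller units from an ONPG assay.

    od420     -- absorbance of the released o-nitrophenol
    od550     -- absorbance used to correct for scattering by cell debris
    od600     -- density of the culture before the assay
    minutes   -- reaction time
    volume_ml -- volume of culture added to the reaction
    """
    if minutes <= 0 or volume_ml <= 0 or od600 <= 0:
        raise ValueError("time, volume and OD600 must be positive")
    return 1000.0 * (od420 - 1.75 * od550) / (minutes * volume_ml * od600)


# Hypothetical readings: 0.1 ml of culture at OD600 0.5, 10 min reaction.
print(miller_units(0.5, 0.0, 0.5, 10, 0.1))  # 1000.0
```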


Metal Requirements 


1. Either Na⁺ or K⁺ binds at the active site (Asp201 is
a ligand) and is needed for full activity. The role of
the monovalent cation has not been established.

2. One Mg²⁺ or one Mn²⁺ binds at each active site
(Glu416, His418, and Glu461 are the ligands). The
bound Mg²⁺ (or Mn²⁺) is probably important for
proper structure but it might also act as an
electrophile. There may be a second divalent metal site in
the N-terminal region.


Reaction Mechanism 


Glu461 is thought to be a general acid catalyst for 
cleavage of the glycosidic bond. His540, His357, 
His391, Asp201, Glu461, Trp568, and Phe604 are 
required for transition state stabilization, but other 
residues are also undoubtedly involved. Glu537 forms 
a covalent bond with galactose during the reaction and 
Tyr503 is a general acid catalyst for breakage of this 
covalent bond. 


Biochemical and Biotechnological Applications


β-Galactosidase is used in a number of biochemical
and biotechnological applications.

Many adults cannot digest lactose because they
lack intestinal β-galactosidase, a condition known as
lactose intolerance. Production of “low-lactose” dairy
products for consumption by these individuals is
accomplished using microbial β-galactosidases.

Genes for other proteins are often fused to the
beginning of the gene that codes for β-galactosidase
so that the β-galactosidase produced has some other
polypeptide or protein attached at the N-terminal end.
There is considerable latitude in the sequence and
composition at this end of β-galactosidase as long as
the fusion position leaves most of the amino acids at
the N-terminal end of β-galactosidase intact. In some
cases the lacZ gene is fused to some other operon and
then “reports” the expression of that operon.

The product of the reaction of β-galactosidase with
X-gal (5-bromo-4-chloro-3-indolyl-β-D-galactopyranoside)
is insoluble and intensely blue. Its formation
is used in DNA recombination experiments (blue/white
screening). The screening process is dependent
upon α-complementation.

β-Galactosidase is also used to synthesize a variety
of galactosides by intermolecular galactose transfer to
alcohols.

Finally, α-complementation in conjunction with
antibodies directed toward specific antigens attached
to the α-peptide is used in a diagnostic test for
compounds in blood or urine.


See also: Fusion Proteins; lac Operon; 


Bidirectional Replication


Bidirectional replication is the replication
accomplished when two replication forks move
away from the origin in opposite directions.
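As a rough illustration of why two forks matter, the sketch below compares the time needed to copy a circular chromosome from a single origin with one fork versus two forks moving in opposite directions. It assumes a constant fork speed with no pausing; the genome size and fork rate in the usage lines are hypothetical round numbers, not measured values.

```python
def replication_time(genome_bp: float, fork_rate_bp_per_s: float,
                     bidirectional: bool = True) -> float:
    """Seconds needed to replicate a circular chromosome from one origin.

    Assumes every fork moves at the same constant rate, with no pauses,
    and that each fork copies an equal share of the chromosome.
    """
    if genome_bp <= 0 or fork_rate_bp_per_s <= 0:
        raise ValueError("genome size and fork rate must be positive")
    forks = 2 if bidirectional else 1
    return genome_bp / (forks * fork_rate_bp_per_s)


# Hypothetical values: a 4 Mb circular chromosome, forks moving at 1000 bp/s.
print(replication_time(4_000_000, 1000))                       # 2000.0
print(replication_time(4_000_000, 1000, bidirectional=False))  # 4000.0
```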


See also: Replication; Replication Fork 


In 1968, studies on the kinetics of DNA renaturation
showed that large numbers of repeated DNA
sequences form significant portions of eukaryotic
genomes (Britten and Kohne, 1968). Although Britten
and Kohne did not identify repetitive DNA in
Escherichia coli, a number of repeated elements are now
known. In fact, other prokaryotic genomes also contain
DNA repeats, some of which are several kilobases
long and may have identical nucleotide sequences.
Such duplications include operons coding for
ribosomal RNA subunits, genes for other essential cellular
functions, insertion sequences, transposons,
symbiotic genes for nitrogen fixation, etc.

Many bacterial genomes also include shorter inter- 
spersed repetitive elements (<200 bp) generally found 
at the 3’ end of transcription units. First described in E. 
coli and Salmonella typhimurium, these repeats 
include the repetitive extragenic palindrome (REP) 
(Higgins et al., 1982) or PU (Palindromic Unit)
(Gilson et al., 1984) elements, as well as the 126 bp 
long enterobacterial repetitive intergenic consensus 
(ERIC) (Hulton et al., 1991) sequences also found 
in other enterobacteria. Analysis of the E. coli genome 
identified 581 REP-like sequences that are grouped 
into 314 elements of one to twelve tandem copies 
(Blattner, 1997). Together these sequences represent 
the most abundant class of repetitive DNA in E. coli 
and account for 0.54% of the whole chromosome. In 
contrast, only 19 ERIC elements were found. 

These short repeats often combine into complex
mosaics of up to 300 nucleotides, first
characterized as bacterial interspersed mosaic elements
(BIMEs) (Gilson et al., 1991). Similarly, genomes of
various members of the Rhizobiaceae were shown to
contain modular repeats larger than 100 bp, which were
called Rhizobium-specific intergenic mosaic elements
(RIME) (Osteras et al., 1995, 1998). As in BIME, ERIC,
or REP elements, RIME1 and RIME2 sequences also
include inverted repeats that can form stem-loop
structures when transcribed into RNA. Although REP elements were
shown to stabilize upstream mRNA and influence gene
expression, these functions appear to be secondary
consequences of stem-loop formation in appropriate
locations rather than reflecting a primary function of
the REP sequences (Higgins et al., 1988). Although the
exact role of short interspersed sequences remains
unknown, their scattered distribution in many
different bacterial genomes has made it possible to produce genomic
fingerprints by PCR with primers complementary to REP,
ERIC, and other repeats.
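To make the idea of an inverted repeat concrete, the toy scanner below looks for a perfect stem, a short loop, and then the reverse complement of the stem further along the same strand, the arrangement that can fold into a stem-loop once transcribed. It is a simplified sketch: real REP, ERIC, BIME, and RIME searches use consensus sequences and tolerate mismatches, and the stem and loop lengths here are arbitrary parameters.

```python
from typing import List, Tuple

# Watson-Crick complements for an uppercase DNA string.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def find_inverted_repeats(seq: str, stem: int, max_loop: int,
                          min_loop: int = 3) -> List[Tuple[int, int, int]]:
    """Return (start, stem_length, loop_length) for every perfect stem-loop.

    A hit is a stem of `stem` bases, a loop of min_loop..max_loop bases,
    and then the reverse complement of the stem.
    """
    seq = seq.upper()
    hits = []
    for i in range(len(seq) - 2 * stem - min_loop + 1):
        left_arm = seq[i:i + stem]
        for loop in range(min_loop, max_loop + 1):
            j = i + stem + loop  # start of the putative right arm
            if j + stem > len(seq):
                break
            if seq[j:j + stem] == reverse_complement(left_arm):
                hits.append((i, stem, loop))
    return hits


# Made-up sequence: stem AAGGCC, loop TTTT, right arm GGCCTT.
print(find_inverted_repeats("CCAAGGCCTTTTGGCCTTCC", stem=6, max_loop=8))  # [(2, 6, 4)]
```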

In contrast to the intergenic BIME, ERIC, REP or 
RIME sequences, 19 of the 44 Rickettsia palindromic 
elements (RPEs) identified in R. conorii are inserted 
in open reading frames (Ogata et al., 2000). These 
19 repeats of 106 to 150 nucleotides never interrupt 
the translation frame, and are predicted to code for 
a central α-helix domain flanked by two extended
or coil regions. As RPEs are found in transcripts and
lack the common features of the self-splicing inteins,
they probably form a class of DNA elements
encoding a peptide insert tolerated by many arbitrary host
proteins.
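The statement that these RPE insertions never interrupt the translation frame can be checked mechanically. The sketch below assumes, hypothetically, that an element is inserted exactly between two codons; it is then frame-preserving only if its length is a multiple of three and it contains no in-frame stop codon of the standard genetic code. The test sequences are made up.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code


def preserves_reading_frame(insert: str) -> bool:
    """True if `insert`, placed between two codons, keeps the downstream
    frame and adds no premature stop codon."""
    insert = insert.upper()
    if len(insert) % 3 != 0:
        return False  # a frameshift would alter every downstream codon
    codons = (insert[i:i + 3] for i in range(0, len(insert), 3))
    return not any(codon in STOP_CODONS for codon in codons)


print(preserves_reading_frame("GCTGCAAAA"))  # True
print(preserves_reading_frame("GCTGC"))      # False: length not a multiple of 3
print(preserves_reading_frame("GCTTAAGCA"))  # False: in-frame TAA
```

If an element were inserted inside a codon instead, the check would have to be made on the codons spanning the junctions, which this sketch does not attempt.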


Binomial Distribution


The binomial distribution is based on ‘Bernoulli
trials.’ Consider repeated independent trials in which
there are only two possible outcomes for each trial
and their probabilities (p and q) remain the same
throughout the trials. These are called ‘Bernoulli
trials,’ after the Swiss mathematician Jacob Bernoulli.
Of course, p + q = 1. Coin tossing is a good example;
in this case, p = q = 0.5 for the ideal situation.

We are often interested only in the total number of
a particular outcome (such as ‘head’ in coin tossing) in
n trials. This number can be 0, 1, 2, ..., n, and the
probability (Prob[k]) of having k particular outcomes
is given by ${}_nC_k\, p^k q^{n-k}$, where
${}_nC_k = n!/[(n-k)!\,k!]$.
Because this probability is the term containing $p^k$
in the binomial expansion of $(p + q)^n$, this probability
distribution is called the ‘binomial distribution.’ Because
p + q = 1 by definition, the sum of all terms is
$(p + q)^n = 1$, which satisfies the requirement of a
probability distribution.

When n is not large, the binomial distribution is
skewed. For example, when n = 12 and p = 1/3,
Prob[0], Prob[1], ..., and Prob[12] are 0.008, 0.046,
0.127, 0.212, 0.238, 0.191, 0.111, 0.048, 0.015, 0.003,
0.000, 0.000, and 0.000, respectively. The highest
probability is for Prob[4], because the expectation is
$np = 4$. As n becomes larger, the distribution
approaches the normal distribution with mean $np$
and variance $npq$. When n is large and p is small,
the binomial distribution can be approximated by the
Poisson distribution, $e^{-\lambda}\lambda^k/k!$, where
$\lambda = np$ is the mean.
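The figures quoted for n = 12 and p = 1/3 can be reproduced directly from the formula. The sketch below uses only the Python standard library (`math.comb` requires Python 3.8 or later); the n = 1000, p = 0.002 comparison with the Poisson approximation is an illustrative choice, not taken from the text.

```python
from math import comb, exp, factorial


def binomial_pmf(k: int, n: int, p: float) -> float:
    """Prob[k] = nCk * p**k * q**(n - k), with q = 1 - p."""
    return comb(n, k) * p**k * (1 - p)**(n - k)


def poisson_pmf(k: int, lam: float) -> float:
    """Poisson probability exp(-lam) * lam**k / k!."""
    return exp(-lam) * lam**k / factorial(k)


probs = [binomial_pmf(k, 12, 1 / 3) for k in range(13)]
print([round(x, 3) for x in probs])
# [0.008, 0.046, 0.127, 0.212, 0.238, 0.191, 0.111, 0.048, 0.015, 0.003, 0.0, 0.0, 0.0]
print(max(range(13), key=probs.__getitem__))  # 4, matching np = 4

# Large n, small p: the binomial is close to a Poisson with lambda = np = 2.
for k in range(4):
    print(k, round(binomial_pmf(k, 1000, 0.002), 4), round(poisson_pmf(k, 2.0), 4))
```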

When Mendel studied pea phenotypes in search of 
the fundamental laws of genetics, he used a simple 
binomial distribution, with p = q = 1/2 and n = 2. In
this case, there are only three terms (k = 0, 1, and 2), 
and these correspond to the probability of obtain- 
ing homozygotes of one allele, heterozygotes, and 
homozygotes of the other allele. Under this condition, 
Prob[0], Prob[1], and Prob[2] are 1/4, 2/4, and 
1/4, respectively. When the phenotype of the hetero- 
zygote is indistinguishable from one of the homo- 
zygotes, we obtain the famous 3:1 proportion under 
dominance. 
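Mendel's case can also be checked by enumerating gamete pairs instead of applying the formula. This is a small sketch in which the allele labels "A" (dominant) and "a" (recessive) are hypothetical placeholders. Exact fractions are used so that the 1:2:1 and 3:1 ratios come out without rounding.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def f2_genotypes(alleles: str = "Aa") -> dict[str, Fraction]:
    """Genotype probabilities from crossing two heterozygotes.

    Each parent transmits either allele with probability 1/2, so the offspring
    genotype follows the binomial distribution with n = 2 and p = q = 1/2.
    """
    pairs = product(alleles, repeat=2)  # (paternal, maternal) gametes
    # 'aA' and 'Aa' are the same genotype, so sort the alleles before counting.
    counts = Counter("".join(sorted(pair)) for pair in pairs)
    total = sum(counts.values())
    return {g: Fraction(c, total) for g, c in counts.items()}


def dominance_ratio(genotypes: dict[str, Fraction], dominant: str = "A") -> tuple[Fraction, Fraction]:
    """Collapse genotypes into (dominant phenotype, recessive phenotype) probabilities."""
    dom = sum((f for g, f in genotypes.items() if dominant in g), Fraction(0))
    return dom, 1 - dom


if __name__ == "__main__":
    print(f2_genotypes())                   # 1/4 AA, 1/2 Aa, 1/4 aa
    print(dominance_ratio(f2_genotypes()))  # 3/4 dominant, 1/4 recessive
```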

The so-called Hardy-Weinberg ratio is also a simple application of the binomial distribution. In this case, the probabilities p and q correspond to the frequencies of two alleles at one locus in one population, and n = 2 (two gametes transmitted from the paternal and maternal parents). Because two gametes are united by chance under random mating, the expected frequencies of genotypes in the offspring generation are given by the expansion $(p + q)^2 = p^2 + 2pq + q^2$. When there are more than two alleles, we can use the multinomial formula instead of the binomial one. In any case, the Hardy-Weinberg ratio is known to be a good approximation of the observed numbers of genotypes in a population, unless the effects of random genetic drift, inbreeding, assortative mating, mutation, gene flow, and other factors are significant.
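The multinomial extension mentioned above gives $p_i^2$ for each homozygote and $2p_ip_j$ for each heterozygote. The following sketch computes these expected genotype frequencies for any number of alleles. The allele names and frequencies in the usage lines are hypothetical inputs chosen for illustration.

```python
from itertools import combinations_with_replacement


def hardy_weinberg(freqs: dict[str, float]) -> dict[str, float]:
    """Expected genotype frequencies under random mating.

    Homozygote i/i has frequency p_i**2; heterozygote i/j has 2 * p_i * p_j.
    """
    if abs(sum(freqs.values()) - 1.0) > 1e-9:
        raise ValueError("allele frequencies must sum to 1")
    genotypes = {}
    for a, b in combinations_with_replacement(sorted(freqs), 2):
        genotypes[a + b] = freqs[a] ** 2 if a == b else 2 * freqs[a] * freqs[b]
    return genotypes


if __name__ == "__main__":
    print(hardy_weinberg({"A": 0.6, "a": 0.4}))                 # two alleles: p^2, 2pq, q^2
    print(hardy_weinberg({"A1": 0.5, "A2": 0.3, "A3": 0.2}))    # three alleles (multinomial)
```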

When we compare two homologous amino acid or nucleotide sequences, we obtain two classes of sites after aligning them: identical or different. Both the number of identical sites and the number of different sites follow the binomial distribution, ${}_nC_k\, p^k q^{n-k}$, where n is the number of compared sites (excluding gaps), p and q are the probabilities of different and identical sites, respectively, and k is the observed number of different sites. The probability p can be estimated as k/n, and the variance of this estimate is $p(1-p)/n$.
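As a rough illustration, the estimate k/n and its variance can be computed from a pair of already-aligned sequences. This sketch assumes that gaps are written as "-" and that any column containing a gap is excluded, as the text specifies. The function name `site_divergence` is our own.

```python
def site_divergence(seq1: str, seq2: str, gap: str = "-") -> tuple[int, int, float, float]:
    """Return (n, k, p_hat, variance) for two aligned sequences.

    n: compared sites (columns with a gap in either sequence are skipped)
    k: sites that differ; p_hat = k / n; variance = p_hat * (1 - p_hat) / n
    """
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(x, y) for x, y in zip(seq1, seq2) if x != gap and y != gap]
    n = len(pairs)
    if n == 0:
        raise ValueError("no gap-free sites to compare")
    k = sum(x != y for x, y in pairs)
    p_hat = k / n
    return n, k, p_hat, p_hat * (1 - p_hat) / n


if __name__ == "__main__":
    # Toy alignment: one gapped column is excluded, and 2 of the 10 remaining sites differ.
    print(site_divergence("ACGT-ACGTAC", "ACGAGACGTTC"))
```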


See also: Hardy-Weinberg Law 
Biochemical genetics combines the unique strengths of both genetic and biochemical analyses to gain insights into cellular metabolism. In the most rudimentary sense, biochemical genetics defines a biochemical function for a gene product in vitro and uses this information to assign a role to the respective gene product in the context of in vivo physiology. This approach is currently most tractable in microorganisms with established genetic systems, although, with the continuing advances in molecular biological techniques, biochemical genetic approaches are becoming viable in a variety of systems.

The goal of a biochemical genetic approach is to 
identify the specific biochemical role of a gene pro- 
duct in cellular metabolism. To achieve this goal, two 
results must be obtained: (1) demonstration of the 
biochemical activity of a gene product in vitro; (2) 
demonstration that inactivation of the gene by 
mutation results in the phenotype predicted for a 
strain lacking this activity. If both of these results are obtained, confidence that the biochemical activity identified in vitro is physiologically relevant is increased. For instance, a mutant lacking the trpD gene (Figure 1) should not only require tryptophan but also lack detectable anthranilate phosphoribosyltransferase activity in cell-free extracts.


Classical Biochemical Genetics 


Biochemical genetics has been instrumental in advancing our understanding of biosynthetic and catabolic pathways in the bacterial cell. In general, these analyses started with the identification of a mutant with a particular metabolic phenotype or nutritional requirement. Careful dissection of this phenotype allows the investigator to predict a biochemical process or specific enzymatic reaction that is defective in the mutant strain. In some cases, blocking an enzyme results in accumulation of the substrate for that enzyme. If this substrate (or a metabolizable derivative) can both exit the cell and enter a different cell, this accumulation can result in a ‘crossfeeding’ phenomenon that further defines the biochemical lesions in the respective strains. For instance, in the tryptophan pathway shown in Figure 1, mutants defective in either trpA or trpD require tryptophan. However, trpA mutants accumulate and excrete a compound (indoleglycerol phosphate) that, under some conditions, allows trpD mutants to grow. The converse is not true. This result reflects the order of these two gene products in the biosynthetic pathway and narrows down the biochemical defects that could explain the mutant phenotypes.

Figure 1 The biochemical reactions of the tryptophan biosynthetic pathway. Each gene encoding a biosynthetic enzyme is shown beside the reaction catalyzed by its product. SER, serine; PPi, pyrophosphate; PRPP, 5-phosphoribosyl-1-pyrophosphate; GLU, glutamate; GLN, glutamine.

The biochemical prediction resulting from phenotypic analysis and ‘crossfeeding’ experiments can then be tested in vitro, often by demonstrating that a particular enzyme-catalyzed reaction does not occur in cell-free extracts of the mutant strain but is demonstrable in extracts of the wild-type strain. For instance, in a straightforward case, a mutant requiring an amino acid could be shown in vitro to lack one of the enzymatic activities leading to the formation of this amino acid. In this scenario, it is reasonable to pursue the hypothesis that the mutant gene encodes the missing enzymatic activity. In a number of cases this general approach revealed the genes encoding the biosynthetic steps in a pathway. Results from this type of work generated and propagated the simple notion that one gene encodes one enzyme, an assumption that has been instrumental in our understanding of the biochemical steps in many of the metabolic pathways in the bacterial cell.

However, when performed as described above, the correlation between gene product, biochemical function, and in vivo role has not been rigorously demonstrated. The work must be extended to demonstrate a direct correlation between mutant gene and lost activity. A standard way to do this is to purify the relevant gene product and demonstrate that it can perform the activity that is lacking in the crude extract of the mutant. By this approach the direct role of the mutant gene product in the lack of activity can be verified, and the possibility of an indirect or regulatory effect of the mutation minimized. These analyses can be extended to sophisticated levels, including analyzing mutant proteins in vitro and correlating the change in activity with the phenotypes produced by the mutant proteins. It is this type of work that allows the correlation to be extended to understand not only the specific function of the gene product but also other aspects of metabolism that are affected by its activity. For instance, mutations causing temperature-sensitive phenotypes could be shown to generate enzymes that were defective at elevated temperatures. Mutant proteins with altered kinetic parameters may produce phenotypes that can be adequately explained only by invoking broader physiological effects of the mutation.


Modern Biochemical Genetics 


The increased availability of genome sequences and 
the desire to address metabolism in less genetically 
tractable organisms has stimulated the development 
of modern biochemical genetic approaches. In theory 
these approaches have the same goal as those described 
above. However, the newer approaches are often 
more targeted than the broad and general classical 
approaches that were based on initiating mutant char- 
acterization by starting with a nutritional phenotype. 
In modern biochemical genetics, the starting point is 
often a sequenced gene that is of interest for either 
biochemical or genetic reasons. The plan of attack in 
modern techniques is to: (1) mutate the gene and assess 
phenotypic manifestations, and (2) characterize the 
gene product in vitro to determine function. Ideally, 
the mutant should display the phenotype predicted by a lack of the function demonstrated in vitro. However, in many cases this level of rigor is not achieved before a physiological function is assigned to the gene product. Several things can contribute to the lack of rigor in these cases. First, in an organism that is difficult to manipulate genetically or to culture in the laboratory, confirmation of the biochemical/genetic correlation may be unattainable. Second, phenotypic analysis may not be feasible, particularly in more complex or differentiated organisms. In these cases, previous work in more tractable systems is used as a precedent, and functional assignments are based on sequence similarity of the relevant gene to those that have been rigorously characterized in different systems. Investigators should be cautioned against drawing strong conclusions about the physiological function of a gene product solely on the basis of structural homologies. While functional assignment based on similarity may be correct in most cases, relying on it without detailed experimentation reduces the chance of identifying new functions or paradigms in metabolism.


When a Negative Result is Informative 


Biochemical genetic approaches are useful in uncover- 
ing new aspects of metabolism. In the examples 
described in the preceding section, there is the expect- 
ation that the mutant phenotype will support the 
biochemical function identified in vitro. Those 
instances where the predicted phenotype is not found 
can provide insights into metabolism that would 
otherwise be overlooked. In such a case, there are 
two simple explanations for this result: (1) the in vitro assay is monitoring an activity that is not physiologically relevant, or (2) there is a redundant function in the
cell that masks the requirement for this enzyme 
in vivo. Further experimentation can distinguish be- 
tween these two possibilities. Importantly, such re- 
sults have the potential to broaden our understanding 
of metabolism in a way that might not be achievable if 
a biochemical or genetic approach had been used in 
isolation. 


Summary 


Biochemical genetics provides a means to rigorously 
gather insight into the physiology of a living cell. The 
premise behind this approach is that to understand 
physiology one must know the gene, the function of 
the product, and the role of this product in metabolism. 
To obtain this information requires the integrative use 
of both biochemical and genetic approaches. Either 
approach alone, or in combination with sequence data, can go only so far in defining the role of a gene
in cellular physiology. In each system the feasibility of 
the current and future technologies must be evaluated 
to ensure that the highest level of rigor possible is 
being used to make functional assignments. 
Goals of Understanding Small Molecule Biosynthesis


Living cells contain hundreds of different enzymes 
that mediate the operation of metabolic pathways. 
For example, the genome of the gram-negative bac- 
terium Escherichia coli specifies 4473 genes, at least 
800 of which encode metabolic enzymes that catalyze 
nearly 990 chemical reactions in over 120 different 
multistep metabolic pathways. At least 280 of these 


genes encode enzymes that carry out the synthesis 
of small molecules, which furnish precursors for the 
essential building blocks of macromolecules, such as 
DNA, RNA, proteins, and polysaccharides. For most 
of the twentieth century, scientists pursued the goal of 
determining the order of the chemical reactions in 
biosynthetic pathways and the mechanisms that regu- 
late these steps. Beyond achieving its fundamental 
aims, this research has contributed in important ways 
to the understanding of metabolic diseases and has laid 
a foundation for the utilization of enzymes for indus- 
trial purposes. It has also led to the identification 
of the genes and enzymes essential for the steps in 
biosynthetic pathways. 

Genetic approaches have been especially helpful 
in identifying biosynthetic genes and enzymes in 
model organisms, such as the bacteria E. coli, Salmon- 
ella typhimurium, and Bacillus subtilis and the yeast 
Saccharomyces cerevisiae. In many instances, the path- 
ways and regulatory principles worked out in these 
model organisms turned out to be widely applicable to
plants and animals, where genetic methods tend to be 
considerably more difficult and time-consuming. In 
other cases, information about biosynthesis in model 
organisms has been a point of departure for studies 
of the pathways in plants and animals. 

This article is divided into three sections that dis- 
cuss genetic approaches to elucidating the biosyn- 
thesis of small molecules in model organisms. The first 
and second sections cover updated classical genetics 
and reverse genetics. The third section focuses on the 
applications of genomics to metabolism. The article 
concludes with a short section on the control of 
biosynthesis. The article uses many examples from 
model bacterial systems to illustrate approaches and 
reasoning. Analogous genetic approaches exist in 
yeast and other model organisms, and the methods 
of reverse genetics can usually be applied to any 
organism. 


Updated Classical Genetic Approaches 


Simple Screening 

A powerful classical approach to elucidating bio- 
synthetic pathways has been to isolate mutants that 
require the addition of a specific nutrient for growth — 
‘auxotrophs.’ These mutants usually contain defective 
variants of one or more of the enzymes necessary to 
synthesize a compound that is needed for growth. For 
example, suppose we want to study the biosynthesis of 
a small molecule, arbitrarily called F (Figure 1), which 
could be an amino acid, vitamin, pyrimidine or purine 
base, or some other essential compound normally 
synthesized by bacterial cells. For the sake of illustration, suppose F is synthesized in cells in five steps catalyzed by enzymes called a, b, c, d, and e, where enzyme a converts starting substrate A into intermediate B, enzyme b converts intermediate B into intermediate C, and so on (Figure 1). Mutants defective in any one of these five enzymes (a to e) will require small molecule F for growth.

Figure 1 Biosynthesis of a small molecule. Genes a to e encode enzymes a to e, which convert the starting substrate A through intermediates B, C, D, and E into the end product F (A → B → C → D → E → F).

In the simplest scheme, mutants unable to syn- 
thesize nutrient F could be isolated from a mixed 
population of mutants that arise spontaneously. The 
screening process could be accomplished by spreading 
diluted bacterial populations onto agar plates contain- 
ing a carbon source and nutrient F. Each colony that 
arises on these plates is descended from a single bac- 
terial cell, and each colony could be tested to deter- 
mine whether the bacteria therein require nutrient F 
for growth. A similar approach could be applied to 
yeast cells. While simple in principle, this strategy is 
incredibly laborious, because a spontaneous mutation 
in one of the five genes encoding the five enzymes 
(a through e) required for nutrient F biosynthesis 
will occur in only about 1 in $10^6$ to $10^7$ bacteria. Thus,
about 4000 petri plates containing 250 colonies each 
would have to be screened to find just one mutant 
auxotrophic for nutrient F. 


Mutagenesis 
Several methods were developed to make the isolation 
of auxotrophic mutants more efficient. Bacteria and 
other kinds of cells can be treated with various agents 
to increase the fraction of a bacterial population that 
contains mutants. Several powerful chemical mutagens 
are available, including alkylating agents, deaminating 
agents, and base analogs. Treatment with these muta- 
gens ultimately leads to changes in the DNA and the 
accumulation of mutations throughout the chromo- 
some. Alternatively, auxotrophic mutants may be 
sought in bacterial strains that lack key DNA repair 
pathways, such as DNA proofreading or mismatch 
repair, and consequently have very high spontaneous 
mutation frequencies. 

The power of mutagenesis was extended consid- 
erably by using transposons, which can “jump” into 
the chromosomes of bacteria, yeast, and other organisms from carrier pieces of DNA referred to as
vehicles. Transposon insertion into a gene disrupts 
the protein reading frame and thereby usually inacti- 
vates the gene. In addition, transposons themselves 
carry genes that impart resistance to antibiotics or 
other readily selectable genetic markers. For example, 
when a transposon carrying a gene that imparts resist- 
ance to an antibiotic, such as kanamycin, inserts into a 
bacterial chromosome, the resulting bacterial strain 
becomes resistant to kanamycin. Thus, the bacteria 
in each kanamycin-resistant colony contain a trans- 
poson inserted at a specific place in their chromosomes, 
and bacteria from individual kanamycin-resistant 
colonies will usually have the transposon inserted at 
different loci in their chromosomes. Following a 
transposon jump, antibiotic-resistant colonies can be 
screened for an auxotrophic requirement, such as 
the need for nutrient F (Figure 1). About 1 in 2000 
transposon-containing, antibiotic-resistant colonies 
will typically turn out to be nutrient F auxotrophs. 
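The screening arithmetic in this section can be made explicit. The sketch below compares spontaneous mutants (1 in $10^6$ to $10^7$) with transposon-generated mutants (about 1 in 2000), using the 250 colonies per plate from the example. As an added assumption, it treats each colony as an independent Bernoulli trial to estimate how many colonies give a 95% chance of finding at least one mutant.

```python
import math


def plates_needed(mutant_frequency: float, colonies_per_plate: int = 250) -> float:
    """Average number of plates to screen to find one mutant (1/frequency colonies)."""
    return 1.0 / (mutant_frequency * colonies_per_plate)


def colonies_for_confidence(mutant_frequency: float, confidence: float = 0.95) -> int:
    """Colonies to screen so that P(at least one mutant) >= confidence.

    Solves 1 - (1 - f)**N >= confidence, i.e. N >= log(1 - confidence) / log(1 - f);
    log1p keeps precision when f is very small.
    """
    return math.ceil(math.log1p(-confidence) / math.log1p(-mutant_frequency))


if __name__ == "__main__":
    for label, f in [("spontaneous, 1e-6", 1e-6),
                     ("spontaneous, 1e-7", 1e-7),
                     ("transposon, 1/2000", 1 / 2000)]:
        print(f"{label}: {plates_needed(f):,.0f} plates on average, "
              f"{colonies_for_confidence(f):,} colonies for 95% confidence")
```

At 1 in $10^6$ the first function returns the 4000 plates quoted above. At 1 in 2000 it drops to about 8 plates, which shows why transposon mutagenesis makes screening practical.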


Mutant Enrichment 

Methods have been developed in bacteria to avoid the 
need to screen thousands of colonies for a growth 
requirement. Enrichment schemes rely on the fact 
that bacteria stop growing when they are deprived of 
an essential nutrient. Nongrowing bacteria survive 
exposure to antibiotics, such as penicillin, that kill 
growing bacteria by preventing peptidoglycan (cell- 
wall) biosynthesis. Usually enrichment schemes in- 
clude prior mutagenesis to increase the proportion 
of auxotrophic mutants. The mutagenized bacteria 
are then washed with a physiological saline solution 
to remove nutrients and suspended in a minimal-salts 
medium containing a suitable carbon source and an 
antibiotic, such as penicillin. During this incubation, 
auxotrophic bacteria cannot grow and will survive, 
whereas nonauxotrophs are killed. After the antibiotic 
is washed away, the surviving bacteria can be spread 
onto nutrient-rich medium and screened for growth 
requirements, such as an inability to grow without 
added nutrient F (Figure 1). Many variations of
enrichment procedures have been developed. 
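To see why even an imperfect antibiotic kill enriches auxotrophs so strongly, consider a simple two-population model. This is a sketch under stated assumptions: starved auxotrophs are assumed to survive the antibiotic completely, while a fixed fraction of growing cells escapes killing. The starting numbers below (1 auxotroph per 1000 cells, 0.1% escape) are hypothetical and are not taken from the text.

```python
def enrich(auxotroph_fraction: float, prototroph_survival: float,
           cycles: int = 1, auxotroph_survival: float = 1.0) -> float:
    """Fraction of auxotrophs among survivors after `cycles` rounds of antibiotic selection.

    Growing (prototrophic) cells survive each round with probability
    prototroph_survival; nongrowing auxotrophs survive with auxotroph_survival.
    """
    f = auxotroph_fraction
    for _ in range(cycles):
        surviving_aux = f * auxotroph_survival
        surviving_proto = (1.0 - f) * prototroph_survival
        f = surviving_aux / (surviving_aux + surviving_proto)
    return f


if __name__ == "__main__":
    # Hypothetical inputs: 1e-3 auxotrophs after mutagenesis, 1e-3 of growing cells escape.
    for n in (1, 2):
        print(n, "cycle(s):", round(enrich(1e-3, 1e-3, cycles=n), 4))
```

Under these assumed numbers, one cycle raises auxotrophs from 0.1% to about half of the survivors, and a second cycle raises them above 99%.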


Chemical Analogs 

The chemical structure of the endproduct of a bio- 
synthetic pathway (e.g., small molecule F in Figure 1) 
can be used to design analogs that can be exploited to 
identify the genes involved in the steps or regulation of 
that pathway. For example, a halogen atom could 
be substituted for a hydrogen atom in the structure. 
Numerous potential analogs of biologically active 
compounds have been synthesized and many are com- 
mercially available. Chemical analogs frequently in- 
hibit enzymes or alter the regulation of biosynthetic 
pathways and thereby inhibit the growth of treated 
bacterial cells. Analogs can also be used with yeast 
and other kinds of cells. Depending on the enzyme 
and pathway, bacterial mutants that are hypersensi- 
tive or resistant to chemical analogs can be isolated. 
Hypersensitive mutants must be screened or enriched 
for in populations of mutagenized cells. In contrast, 
analog-resistant mutants are the only cells that can grow on plates containing the analog; the wild-type parent bacteria cannot. Thus, analog-resistant mutants are among the easiest classes of mutants to isolate by direct selection.

Hypersensitivity and resistance to chemical analogs 
can arise by several different mechanisms. For ex- 
ample, hypersensitivity can result when a mutation 
increases the affinity of a biosynthetic enzyme for an 
inhibitory analog. Alternatively, a regulatory mutant 
that decreases the cellular amount of an enzyme may 
cause hypersensitivity. Conversely, resistant mutants can result when a mutation increases the cellular amount of an inhibited enzyme or decreases the affinity of the enzyme for the analog. Resistance can also arise when a mutation disables an enzyme that converts a chemical analog into a toxic substance, or when it decreases uptake or retention of the chemical analog by cells.


Using Mutants to Identify Genes in 
Biosynthetic Pathways 


Locating transposon insertions 

Mutants defective in the biosynthesis of small mol- 
ecules can be used in numerous ways to gain informat- 
ion about the genes that mediate a biosynthetic 
pathway. The DNA sequences that flank a transposon insertion in the chromosomes of bacteria, yeast, and other organisms can be
determined rapidly by cloning and polymerase chain 
reaction (PCR) methods. If the DNA sequence of the 
genome is known, then the gene or open reading frame 
disrupted by the transposon can be identified from the 
database. 
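Once the flanking sequence has been matched to the genome and an insertion coordinate obtained, identifying the disrupted gene is an interval lookup in the genome annotation. The sketch below assumes that this coordinate is already known and that the annotation is a sorted list of non-overlapping genes. The `Gene` record and the gene names and coordinates in `EXAMPLE_ANNOTATION` are hypothetical.

```python
from bisect import bisect_right
from typing import NamedTuple, Optional


class Gene(NamedTuple):
    name: str
    start: int   # 1-based, inclusive
    end: int     # inclusive
    strand: str


# Hypothetical annotation, sorted by start coordinate, genes not overlapping.
EXAMPLE_ANNOTATION = [
    Gene("geneA", 100, 1000, "+"),
    Gene("geneB", 1050, 2000, "-"),
    Gene("geneC", 2100, 3000, "+"),
]


def disrupted_gene(genes: list[Gene], position: int) -> Optional[Gene]:
    """Return the annotated gene containing the insertion position, or None if intergenic."""
    starts = [g.start for g in genes]
    i = bisect_right(starts, position) - 1   # last gene starting at or before position
    if i >= 0 and genes[i].start <= position <= genes[i].end:
        return genes[i]
    return None


if __name__ == "__main__":
    for pos in (1500, 1020):
        hit = disrupted_gene(EXAMPLE_ANNOTATION, pos)
        print(pos, hit.name if hit else "intergenic")
```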


In bacteria, the genes that mediate biosynthetic 
pathways are often organized into polycistronic operons or clusters. Therefore, a gene disrupted by a transposon is often surrounded by other genes involved in the same biosynthetic pathway. On the other hand, there are
numerous instances of multifunctional operons con- 
taining genes whose products function in entirely 
unrelated pathways. In these cases, a gene containing 
a transposon insertion may not be directly involved in 
the pathway whose disruption causes a nutritional 
requirement. Rather, the transposon insertion in an 
upstream gene in a multifunctional operon may 
block the expression of a downstream gene that en- 
codes an enzyme in the biosynthetic pathway. This 
indirect effect on function is referred to as polarity. 
Sorting out direct loss of function from indirect 
polarity usually requires data from several molecular 
biological approaches, including analyses of mRNA 
transcripts produced from adjacent genes. 
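The polarity reasoning can be summarized as a rule for listing candidate genes. The sketch makes several assumptions: the operon is listed in transcription order; any gene at or downstream of the insertion may be responsible for the phenotype; and a downstream gene can be dropped if transcript analysis shows it is still expressed (for example, from an internal promoter). Gene names are hypothetical.

```python
def candidate_genes(operon: list[str], inserted: str,
                    still_expressed: frozenset[str] = frozenset()) -> list[str]:
    """Genes whose loss could explain the phenotype of an insertion mutant.

    The disrupted gene itself is always a candidate; genes downstream of it in
    the same operon are candidates through polarity unless mRNA data show they
    are still expressed.
    """
    i = operon.index(inserted)
    downstream = [g for g in operon[i + 1:] if g not in still_expressed]
    return [inserted] + downstream


if __name__ == "__main__":
    operon = ["orf1", "orf2", "orf3", "orf4"]   # hypothetical operon, 5' to 3'
    print(candidate_genes(operon, "orf2"))
    print(candidate_genes(operon, "orf2", still_expressed=frozenset({"orf4"})))
```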


Complementation 
This technique can be used to classify genes whose mutations produce identical phenotypes. For example, mutations that inactivate any of genes a to e would all cause a requirement for F. In a complementation experiment, a wild-type copy of a gene is introduced into a
cell that contains a mutant gene. For example, suppose 
a set of bacterial mutants that require a specific 
nutrient (e.g., small molecule F in Figure 1) were 
collected following chemical mutagenesis. The gene 
or operon deficient in each auxotrophic mutant could 
be identified using a genomic library prepared from 
the wild-type bacterium. One type of bacterial genomic 
library is a collection of plasmids that separately 
contain different segments of the wild-type bacterial 
chromosome. To find candidate genes (e.g., genes a to e 
in Figure 1) whose loss of function causes a nutrient 
requirement (e.g., small molecule F), each auxotrophic 
mutant in the set would be transformed with the 
plasmid library, and the resulting bacteria would be 
spread onto plates lacking small molecule F. Assuming 
negligible reversion of the original auxotrophic muta- 
tion, the only colonies that could appear would be 
those where the growth requirement had been alle- 
viated by a wild-type copy of the gene carried on a 
plasmid, a process referred to as complementation. 
The chromosomal DNA inserts carried by complementing plasmids can be sequenced by rapid methods. The intact wild-type genes contained on the plasmids would constitute candidates for the genes that were defective in the auxotrophic mutants. For example, if genes a to e in Figure 1 are located at separate positions in the bacterial chromosome, then we would obtain five separate groups of mutants unable to synthesize nutrient F, and complementation of these mutations would yield wild-type copies of genes a to e contained on five distinct plasmids. On the other hand, if genes a to e are grouped together into an operon, then we would obtain a set of complementing plasmids whose DNA sequences overlapped, because genes a to e lie adjacent to one another.
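The distinction between scattered genes and an operon can be read off the chromosomal coordinates of the complementing inserts. This sketch merges overlapping inserts into groups. It assumes that each auxotroph has been assigned the (start, end) coordinates of the insert that complements it. The mutant names and coordinates are hypothetical, and an overlap only suggests a gene cluster; it does not prove one.

```python
def group_by_overlap(inserts: dict[str, tuple[int, int]]) -> list[list[str]]:
    """Group mutants whose complementing plasmid inserts overlap on the chromosome.

    `inserts` maps mutant name -> (start, end) chromosomal coordinates of the
    insert carried by the plasmid that complements that mutant.
    """
    ordered = sorted(inserts.items(), key=lambda item: item[1][0])
    groups: list[list[str]] = []
    current_end = None
    for mutant, (start, end) in ordered:
        if current_end is not None and start <= current_end:
            groups[-1].append(mutant)          # overlaps the current group
            current_end = max(current_end, end)
        else:
            groups.append([mutant])            # starts a new, separate locus
            current_end = end
    return groups


if __name__ == "__main__":
    example = {"m1": (1000, 9000), "m2": (5000, 12000),
               "m3": (50000, 58000), "m4": (8000, 15000)}
    print(group_by_overlap(example))   # m1, m2, m4 overlap; m3 maps elsewhere
```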

The DNA sequences of candidate genes identified 
by genetic complementation can be used to design 
PCR primers needed to determine the DNA sequence 
of the corresponding gene in the original auxotrophic 
mutant. Such direct determinations of the sequences 
of chromosomal mutations were made feasible by the 
development of methods to isolate bacterial chromo- 
somal DNA rapidly and amplify specific sequences 
by PCR, by the relatively low cost of oligonucleotide 
primers, and by the advent of automated DNA sequen- 
cing. Barring certain complications, the candidate gene 
in the chromosome of the auxotrophic mutant should 
contain a change in its DNA that would disrupt its 
function. For example, suppose a nutrient F auxotroph 
is complemented by a wild-type copy of gene b con- 
tained on a plasmid. In the simplest case, the chromo- 
some of the auxotroph should contain a mutation in the 
gene b coding region or in a regulatory element that 
impairs gene expression. 
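Once the candidate gene has been sequenced from the auxotroph, the change can be interpreted codon by codon. The sketch below is a simplified illustration that compares in-frame coding sequences of equal length under the standard genetic code. It classifies each altered codon, but it does not handle insertions, deletions, or changes in regulatory elements. The sequences in the usage example are made up.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG; '*' = stop.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
               for i, a in enumerate(BASES)
               for j, b in enumerate(BASES)
               for k, c in enumerate(BASES)}


def classify_substitutions(wild_type: str, mutant: str) -> list[tuple]:
    """List (codon number, wt codon, mutant codon, wt aa, mutant aa, kind) for each changed codon."""
    if len(wild_type) != len(mutant) or len(wild_type) % 3:
        raise ValueError("expects in-frame coding sequences of equal length")
    changes = []
    for pos in range(0, len(wild_type), 3):
        wt_codon, mt_codon = wild_type[pos:pos + 3], mutant[pos:pos + 3]
        if wt_codon == mt_codon:
            continue
        wt_aa, mt_aa = CODON_TABLE[wt_codon], CODON_TABLE[mt_codon]
        if wt_aa == mt_aa:
            kind = "synonymous"
        elif mt_aa == "*":
            kind = "nonsense"
        elif wt_aa == "*":
            kind = "stop-loss"
        else:
            kind = "missense"
        changes.append((pos // 3 + 1, wt_codon, mt_codon, wt_aa, mt_aa, kind))
    return changes


if __name__ == "__main__":
    for change in classify_substitutions("ATGGAGTGGAAA", "ATGGTGTGAAAA"):
        print(change)
```

A nonsense or missense change in the complemented gene is consistent with the text's expectation. A purely synonymous change would point toward one of the "certain complications" instead.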


Mapping 

The ability to locate rapidly the sites of transposon 
insertions in bacterial chromosomes, the availability 
of bacterial genomic libraries, and advances in PCR 
methods have largely supplanted older genetic map- 
ping methods for localizing the sites of auxotrophic 
mutations. These older methods depended on conju- 
gation or generalized transduction by bacteriophages 
to locate mutations roughly within fairly broad 
regions of bacterial chromosomes. Genetic mapping 
methods are still an option for locating auxotrophic 
mutations in genes whose wild-type copies are not 
well represented in genomic libraries. Certain genes 
may be missing from genomic libraries because of 
cloning artifacts or because multiple copies of these 
biosynthetic genes or surrounding genes are dele- 
terious to cell growth. 


Determining the Function and Order of 
Steps in Biosynthetic Pathways 


Conserved functional domains 

Once a set of genes is identified that mediates the 
biosynthesis of a small molecule (e.g., genes a to e, 
Figure 1), then the next goal is to identify the func- 
tions of the genes and the order of the steps in the 
pathway. The amino acid sequences of proteins can 
be predicted from their DNA sequences. Many classes 
of enzymes, such as nucleotide-binding proteins and oxidoreductases, contain highly conserved amino acid sequences in their functional domains. These hallmark motifs have been assembled for many different classes of enzymes and can be used to predict possible enzyme functions. Often it is even possible to formulate hypothetical biosynthetic pathways from the substrates used at the start of a pathway (e.g., A in Figure 1), the end products (e.g., F in Figure 1), and the number and predicted functions of enzymes that mediate the pathway (enzymes a to e in Figure 1). However, hypothetical pathways and enzyme functions must be
confirmed directly by biochemical analyses, including 
isotopic labeling of suspected intermediates and the 
purification and characterization of the substrates and 
products of each enzyme in a pathway. 
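A motif scan of this kind takes only a few lines. As an example, we use the nucleotide-binding P-loop (Walker A) consensus, commonly written in PROSITE notation as [AG]-x(4)-G-K-[ST], translated into a regular expression. The protein sequence is a made-up example. A match only suggests a possible function, which is exactly why the biochemical confirmation described above is still needed.

```python
import re

# P-loop / Walker A consensus [AG]-x(4)-G-K-[ST] as a regular expression.
P_LOOP = r"[AG].{4}GK[ST]"


def find_motif(protein: str, pattern: str = P_LOOP) -> list[tuple[int, str]]:
    """Return (1-based start position, matched segment) for each non-overlapping motif match."""
    return [(m.start() + 1, m.group()) for m in re.finditer(pattern, protein)]


if __name__ == "__main__":
    hypothetical_protein = "MSKIAVVGPPGSGKSTLAR"
    print(find_motif(hypothetical_protein))
```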


Crossfeeding 

Two classical genetic approaches have often been 
helpful in determining the order of the steps in bio- 
synthetic pathways. The first approach, which relies 
on ‘crossfeeding’ (syntrophism) by mutants defective in a biosynthetic pathway, has been used extensively in bacteria. To illustrate the point, suppose we do not know whether enzyme b or enzyme d acts first in the pathway of nutrient F biosynthesis (Figure 1). A mutation in gene b will inactivate enzyme b; hence the cells may accumulate intermediate B, which could be excreted from the cell. Likewise, a mutation in gene d could lead to the accumulation and excretion of intermediate D. Now suppose some mutant b cells were spotted on a lawn of mutant d cells and vice versa. If enzyme d acts after enzyme b in the pathway, as depicted in Figure 1, then the gene d mutant would excrete intermediate D, which can be converted into intermediate E and nutrient F by mutant b cells. That is,
d mutants crossfeed b mutants. Conversely, the b 
mutant will not crossfeed the d mutant, because inter- 
mediate B excreted by the gene b mutant still cannot 
be converted into nutrient F by the gene d mutant. 
For crossfeeding approaches to be successful, inter- 
mediates before genetic blocks must be synthesized 
and excreted in sufficient amounts to feed other 
mutants, which in turn must be able to take up the 
excreted intermediates. Crossfeeding typically does 
not work well for pathways involving phosphorylated 
compounds, which are neither excreted nor taken up 
efficiently. 
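For a linear pathway like the one in Figure 1, the crossfeeding rule reduces to comparing positions. This is a minimal model under the same assumptions the text lists: each mutant excretes the substrate of its missing enzyme, and the recipient can take that compound up. A donor therefore feeds a recipient only when the donor's block lies later in the pathway.

```python
# Linear pathway of Figure 1: A -a-> B -b-> C -c-> D -d-> E -e-> F
ENZYMES = "abcde"      # enzyme (and gene) order along the pathway
COMPOUNDS = "ABCDEF"   # COMPOUNDS[i] is the substrate of ENZYMES[i]


def excreted(mutant: str) -> str:
    """A mutant blocked at enzyme x accumulates (and, by assumption, excretes) x's substrate."""
    return COMPOUNDS[ENZYMES.index(mutant)]


def crossfeeds(donor: str, recipient: str) -> bool:
    """True if the donor's excreted intermediate lies after the recipient's block."""
    return COMPOUNDS.index(excreted(donor)) > ENZYMES.index(recipient)


if __name__ == "__main__":
    for donor in ENZYMES:
        fed = [r for r in ENZYMES if crossfeeds(donor, r)]
        print(f"{donor} mutant crossfeeds: {fed}")
```

Running the loop over all pairs reproduces the asymmetry described above: d mutants crossfeed b mutants, but b mutants do not crossfeed d mutants.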


Epistasis 

A second general way to order the action of genes and 
enzymes in biosynthetic pathways in bacteria and 
yeast is by epistasis tests. Here, the phenotypic prop- 
erties of single mutants are compared with those 
of double mutants. In the context of biosynthesis, 
suppose that we have a biochemical method, such as high-performance liquid chromatography (HPLC) or thin-layer chromatography, to determine the amounts of intermediates A through E and nutrient F synthesized de novo by bacterial cells (Figure 1). A gene b mutant lacking functional enzyme b would accumulate intermediate B, whereas a gene d mutant lacking functional enzyme d would accumulate intermediate D. If enzyme b acts before enzyme d in the pathway as depicted in Figure 1, then a gene-b gene-d double
mutant lacking both enzymes b and d would accu- 
mulate intermediate B, but not intermediate D, and 
one would say that gene b is epistatic to gene d (lit- 
erally “sits on top of”). The opposite result would be 
obtained if enzyme d acted first in the pathway. For 
epistasis analysis to succeed, it is necessary that the 
pairs of mutants analyzed have readily distinguishable 
phenotypes; in this example, accumulation of dif- 
ferent pathway intermediates that can be biochem- 
ically distinguished. 
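The epistasis logic can be written the same way for the linear pathway of Figure 1. This sketch assumes that a strain missing several enzymes accumulates only the substrate of the earliest blocked step. It then reports which single mutant has the same phenotype as the double mutant.

```python
# Linear pathway of Figure 1: A -a-> B -b-> C -c-> D -d-> E -e-> F
ENZYMES = "abcde"
COMPOUNDS = "ABCDEF"   # COMPOUNDS[i] is the substrate of ENZYMES[i]


def accumulates(mutant_genes: set[str]) -> str:
    """Intermediate that builds up when all listed enzymes are absent:
    the substrate of the earliest blocked step."""
    return COMPOUNDS[min(ENZYMES.index(g) for g in mutant_genes)]


def epistatic(g1: str, g2: str) -> str:
    """Return the gene whose single-mutant phenotype matches the double mutant."""
    double = accumulates({g1, g2})
    return g1 if accumulates({g1}) == double else g2


if __name__ == "__main__":
    print("b single:", accumulates({"b"}), "| d single:", accumulates({"d"}),
          "| b d double:", accumulates({"b", "d"}))
    print("epistatic gene:", epistatic("b", "d"))
```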


Potential Problems 

These updated classical approaches have worked well 
for many biosynthetic pathways and are still used 
to study new primary and alternative biosynthetic 
pathways. Yet, there are two serious problems that 
have hampered application of these approaches to 
some important biosynthetic pathways. First, these 
approaches depend on the ability to fulfil growth 
requirements by the addition of pathway end products 
(e.g., small molecule F in Figure 1). As noted above, 
several broad classes of nutrients, including phos- 
phorylated compounds such as pyridoxal phosphate 
(the active form of vitamin B6) and isopentenyl pyro- 
phosphate (a precursor for polyprenoids), are not 
taken up by bacterial cells. The pathways leading 
to the biosynthesis of these compounds have therefore 
been difficult to dissect and are only now being 
worked out. 

Another serious problem with the updated classical 
approaches is the redundancy of enzyme activities. 
In the example shown in Figure 1, all the steps in a 
pathway are catalyzed by single enzymes, and muta- 
tions in any one of the five genes (a to e) will cause a 
growth requirement for small molecule F. However, it 
turns out that many activities can be provided by two or 
more enzymes. Sometimes these redundant activities 
are provided by true isozymes, which are slightly 
different forms of enzymes with identical activities. 
In other cases, redundancy is provided by a minor 
activity of a related enzyme that may not normally 
function in a pathway in wild-type cells. In either case, 
it is difficult to obtain simultaneous mutations in two 
or more genes and thereby cause a nutrient requirement. Redundancy problems can often be overcome by applying knowledge from complete DNA genomic sequences, as discussed below.
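The effect of redundancy on nutrient requirements can be illustrated with a toy model. The step names, the extra isozyme gene b2, and the function requires_F are hypothetical; the only rule encoded is the one described above, namely that a step is blocked only when every enzyme able to catalyze it has been lost.

```python
# Toy pathway to F in which each step lists every gene able to catalyze it.
# The isozyme gene "b2" is hypothetical and makes step B -> C redundant.
STEPS = {
    "A->B": {"a"},
    "B->C": {"b", "b2"},
    "C->D": {"c"},
    "D->E": {"d"},
    "E->F": {"e"},
}


def requires_F(knocked_out, steps=STEPS):
    """True if a strain with these genes knocked out needs F for growth.

    A step is blocked only when every gene that can catalyze it is lost.
    """
    lost = set(knocked_out)
    return any(genes <= lost for genes in steps.values())


for mutant in ({"b"}, {"b2"}, {"b", "b2"}, {"c"}):
    print(sorted(mutant), requires_F(mutant))
```

Only the b b2 double knockout creates a requirement for F, which is why screens for single mutants can miss redundant steps entirely.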


Reverse Genetics 


Reverse genetics depends on the isolation of an 
enzyme of sufficient purity to allow determination 
of a segment of its amino acid sequence. Significant 
technological advances have recently been made in 
peptide sequencing by chromatography and mass
spectrometry so that the amount of purified protein
needed for amino acid analysis is small. After an amino 
acid sequence has been obtained, two strategies can 
be used to find the gene that encodes the purified 
enzyme. If the enzyme is from an organism whose 
entire genome is known, then the amino acid sequence 
can rapidly be found among all the proteins encoded 
by that genome. For organisms whose genomes have 
not been fully sequenced, searches of the available 
DNA sequences may fail to identify the gene encod- 
ing the purified enzyme. Nevertheless, the entire gene 
can still be identified by molecular methods. In this 
approach, a set of mixed oligonucleotide probes is 
synthesized, based on the genetic code, to correspond 
to the sequence of amino acids in the peptide from 
the purified enzyme. 
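How many different oligonucleotides such a mixed probe contains follows directly from the genetic code: it is the product of the number of codons for each amino acid in the peptide. The sketch below uses the codon counts of the standard genetic code; the example peptide and the helper names are invented for illustration. It also picks the stretch of a peptide with the lowest degeneracy, reflecting the common practice (assumed here, not stated in the text) of basing probes on regions rich in amino acids such as Met and Trp that have a single codon.

```python
from math import prod

# Number of codons specifying each amino acid in the standard genetic code.
CODONS_PER_AA = {
    "A": 4, "R": 6, "N": 2, "D": 2, "C": 2, "Q": 2, "E": 2, "G": 4,
    "H": 2, "I": 3, "L": 6, "K": 2, "M": 1, "F": 2, "P": 4, "S": 6,
    "T": 4, "W": 1, "Y": 2, "V": 4,
}


def probe_degeneracy(peptide):
    """Number of distinct oligonucleotides in a mixed probe that covers every
    possible coding sequence of `peptide` (one-letter codes, no stop codon)."""
    return prod(CODONS_PER_AA[aa] for aa in peptide.upper())


def least_degenerate_window(peptide, length):
    """(start, window, degeneracy) for the run of `length` residues whose
    probe mixture would contain the fewest distinct sequences."""
    windows = [
        (i, peptide[i:i + length], probe_degeneracy(peptide[i:i + length]))
        for i in range(len(peptide) - length + 1)
    ]
    return min(windows, key=lambda w: w[2])


# Hypothetical peptide obtained from the purified enzyme.
peptide = "LSRWMDCKFLS"
start, window, n = least_degenerate_window(peptide, 6)
print(f"{window} at {start}: {3 * len(window)}-mer mixture of {n} sequences")
```

For this made-up peptide the least degenerate 6-residue window is WMDCKF, whose 18-nucleotide probe would be a mixture of 16 sequences, whereas a probe starting at LSR would need 432.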

A genomic library prepared from the organism 
under study (above) is then screened for the gene 
that hybridizes strongly to the mixed oligonucleotide 
probe. As noted above, genomic libraries can be 
prepared in plasmid vectors. They can also be pre- 
pared in bacteriophage vectors. To screen a library, 
individual bacterial or yeast colonies or bacterio- 
phage plaques, each containing a vector with a small 
segment of the chromosome under study, are separated on petri plates, attached to a synthetic support medium such as nylon, lysed, and hybridized to the labeled mixed oligonucleotide probe. A colony or plaque that hybridizes strongly to the probe is
further analyzed to determine whether it contains a 
complete copy of the gene that encodes the purified 
enzyme. 

The application of reverse genetic approaches 
depends on the availability of methods for purifying 
enzymes from crude extracts of an organism. In addi- 
tion, the success of this approach depends on the 
stability and cellular abundance of the enzymes. 
Enzyme assays and purification schemes often require 
considerable ingenuity to design and optimize. Un- 
fortunately, the activities of some enzymes cannot be 
assayed, because the substrates are not known or are 
not available. Certain enzymes are not functional or 
stable during purification despite the addition of cock- 
tails containing protease inhibitors, reducing agents, 
and stabilizing agents. Finally, some enzymes are present in amounts too low to be purified in sufficient quantity, even from large batches of harvested cells. In these cases, another approach must be tried, such as two-dimensional electrophoresis to separate extremely small amounts of polypeptides, which can then be identified by ultrasensitive methods of mass spectrometry.


Genomics 


The complete DNA sequences of the chromosomes of many prokaryotes and eukaryotes have been determined, and many more genomes will be sequenced in the near future. One surprise about the genome of the bacterium E. coli is that the functions of approximately 1700 of its 4473 genes are unknown. Thus, about 38% of the genes in E. coli, which is one of the most studied of all organisms, had not been encountered in earlier genetic or biochemical studies. Complete genome
sequences allow comparisons of the predicted amino 
acid sequences of all the proteins encoded within an 
organism and among different organisms. Within 
an organism, polypeptides with conserved amino 
acid sequences, shared structural motifs, and similar 
functions using different substrates are referred to as 
‘paralogs’ and can be classified into paralogous gene 
families. Such comparisons can sometimes suggest 
whether related proteins likely evolved by duplication 
from a common ancestral protein followed by divergence to acquire new functions. In other cases, proteins with related functions seem to have arisen by convergent evolution from two unrelated ancestral proteins, or to have been acquired from another organism by horizontal gene transfer.

Comparisons of the predicted proteins synthesized 
in different organisms reveal conserved proteins,
called ‘orthologs,’ that likely carry out the same func- 
tions. One of the great values of genomics is that once 
the function of a protein or enzyme has been deter- 
mined biochemically and genetically in one organism, 
its orthologs will likely have the same or a very similar 
function in other organisms. Thus, homology searches 
can suggest some readily testable hypotheses about 
the functions of previously unidentified genes in 
many different organisms. 

Genomics has been invaluable in helping to eluci- 
date the biosynthetic pathways of certain small 
molecules. For example, suppose a hypothetical bio- 
synthetic pathway has been proposed, but mutants 
defective in one step of the pathway have never been 
isolated. Or suppose that knockout mutations in a 
gene always result in a partial growth defect in which 
the mutant grows normally when a certain nutrient is 
added but slowly when the nutrient is omitted. Both 
of these cases suggest redundancy of enzyme function 
in which two different enzymes independently
catalyze the same step in a biosynthetic pathway. 

In the first case, the proposed step in the biosyn- 
thetic pathway may suggest a specific class of enzyme, 
such as a dehydrogenase, transaminase, kinase, or 
phosphatase. These different classes of enzymes have 
conserved amino acid motifs. From the genome se- 
quence, it is possible to predict all of the proteins in 
each class of enzyme in an organism. Thus, if a de- 
hydrogenase is postulated, then it is possible to make a 
list of all the genes encoding putative dehydrogenases 
that have not already been identified or characterized. 
In the second case, the sequence of the mutated gene 
giving the partial growth phenotype can be deter- 
mined. If this gene product is in a readily recognizable 
class of enzymes, such as kinases, then a list of kinases 
of unknown functions can be obtained from the gen- 
ome sequence. In either case, candidate genes can 
rapidly be cloned by PCR methods into multicopy 
vectors. Extracts prepared from cells overexpressing 
gene products can often be assayed for marked 
increases in suspected enzyme activities. Alterna- 
tively, insertion mutations in specific genes can be 
constructed rapidly in vitro and crossed into the
chromosomes of bacteria, yeast, and other organisms. 
Growth defects imparted by one or by combinations 
of these mutations can then be tested. If hypotheses 
about enzymatic redundancy are correct and barring 
certain additional complications, then specific nutrient 
requirements should appear when two or more muta- 
tions are combined in the same genetic background. 
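The bookkeeping behind both strategies can be sketched as follows. The genome annotation, the gene names (kinA, kinB, and so on), and the helper functions are hypothetical stand-ins; in practice the candidate list would come from motif searches of a real genome sequence, and the growth tests would be done at the bench.

```python
from itertools import combinations

# Hypothetical genome annotation: gene -> (predicted enzyme class,
# whether its function is already known). Names are illustrative only.
GENOME = {
    "kinA": ("kinase", False),
    "kinB": ("kinase", False),
    "kinC": ("kinase", True),
    "kinD": ("kinase", False),
    "dhgA": ("dehydrogenase", False),
}


def uncharacterized(genome, enzyme_class):
    """Genes predicted to belong to `enzyme_class` whose function is unknown."""
    return sorted(g for g, (cls, known) in genome.items()
                  if cls == enzyme_class and not known)


def double_mutants_to_test(candidates, known_gene=None):
    """Pairs of insertion mutations to combine in one genetic background.

    If a gene giving a partial growth defect is known, pair it with each
    candidate; otherwise test every pair of candidates.
    """
    if known_gene is not None:
        return [(known_gene, c) for c in candidates if c != known_gene]
    return list(combinations(candidates, 2))


kinases = uncharacterized(GENOME, "kinase")
print(double_mutants_to_test(kinases))                     # every pair
print(double_mutants_to_test(kinases, known_gene="kinA"))  # anchored on kinA
```

Because the number of pairs grows as n(n - 1)/2 for n candidates, anchoring the combinations on a gene already known to give a partial defect (the second case above) keeps the number of strains to construct much smaller.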

In addition, genome sequences are being used to 
identify genes encoding new biosynthetic enzyme 
activities. There is often a high degree of amino acid 
sequence conservation among enzymes that catalyze 
the same reaction on structurally related, but non- 
identical, substrates. For example, the amino acid 
sequences of dehydrogenases that oxidize three- 
carbon or four-carbon sugar phosphates are highly 
conserved. Thus, the highly conserved paralogs of an 
enzyme whose function has been determined may 
catalyze the same reaction on substrates structurally 
similar to that used by the characterized enzyme. 
Besides resolving questions about enzymatic redun- 
dancy, characterization of enzymes from specific 
families predicted from genome sequences is some- 
times a faster route to identifying new biosynthetic 
genes than genetic approaches. 

For example, transketolases make up a family of 
enzymes that catalyze a certain type of chemical con- 
densation of different pairs of substrates. This kind of 
condensation is needed to synthesize a precursor to 
a building block of long-chain hydrophobic mol- 
ecules called polyprenoids in E. coli. To identify this 
biosynthetic enzyme, unidentified enzymes in the 
transketolase family were individually cloned, and
one was found to carry out the biosynthesis of the 
precursor. In this case, a genetic approach would have 
been complicated by the fact that polyprenoids are 
essential for bacterial cell wall biosynthesis; therefore, 
it would have only been possible to isolate condition- 
ally lethal mutations defective in precursor biosyn- 
thesis, such as those that allow function of this enzyme 
at low but not high temperatures. 


Regulation of Biosynthesis 


A detailed discussion of the regulation of biosynthetic 
pathways is beyond the scope of this article; never- 
theless, a couple of generalizations are warranted. 
Biosynthesis is regulated at several levels. Pathway 
regulation involves inhibition of enzymatic activity 
by an intermediate or product of a biosynthetic path- 
way. Often the end product, such as nutrient F in 
Figure 1, inhibits the activity of the first enzyme in
the pathway, in this case enzyme a. Mutations in genes 
encoding enzymes, such as enzyme a, that are no 
longer feedback-inhibited by end products, such as 
nutrient F, can often be selected using chemical ana- 
logs (see above). Genetic regulation involves changes 
in the amounts of the enzymes themselves (e.g., 
enzymes a to e in Figure 1). Genetic regulation is
often studied by constructing fusions between biosyn- 
thetic enzymes and reporter proteins, whose activities 
are easy to assay. A cutting-edge approach to study 
genetic regulation involves the simultaneous quan- 
tification of all mRNA transcripts in cells using DNA 
microarrays attached to chips. Finally, it should be 
noted that the older generalization that biosynthetic 
pathways contain a single rate-limiting step is no 
longer accepted. Instead, several enzymatic steps in 
each pathway may influence the rate at which inter- 
mediates are converted and product formed by the 
pathway. The analysis of the flux of intermediates 
through pathways is called metabolic control analysis. 
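For readers who want the central quantities of metabolic control analysis, one common formulation (the Kacser-Burns and Heinrich-Rapoport framework, summarized here from general knowledge rather than from this article) defines a flux control coefficient for each enzyme and a summation theorem relating them, where J is the steady-state flux through the pathway and E_i is the amount or activity of the i-th of its n enzymes. Under the usual assumptions of the theory, a pathway with a single rate-limiting step would correspond to one coefficient equal to 1 and all others equal to 0; distributed control means several coefficients are appreciably nonzero.

```latex
C^{J}_{E_i} = \frac{\partial J / J}{\partial E_i / E_i}
            = \frac{\partial \ln J}{\partial \ln E_i},
\qquad
\sum_{i=1}^{n} C^{J}_{E_i} = 1
```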
The term ‘biotechnology,’ which seems to have originated in the 1970s, means different things to different
people. A useful, broad definition — the application of 
biological systems and organisms to technical and 
industrial processes — coined by a White House work- 
ing group in the mid-1980s clearly encompasses a 
variety of old and new processes and products. These 
include endeavors as different as fish farming, the 
production of enzymes for laundry detergents, and 
the genetic manipulation of bacteria to enable them 
to clean up oil spills or synthesize human insulin. But 
to many, biotechnology connotes genetic engineering, specifically that done with the newest molecular, gene-splicing techniques.

Neither biotechnology nor its subset, genetic engin- 
eering, is new. A primitive form of biotechnology 
dates back at least to 6000 BC, when the Babylonians
used microorganisms in fermentation to brew al- 
coholic beverages. And genetic engineering can be 
dated from humans’ recognition that animals and 
crop plants can be selected and bred to enhance 
desired characteristics. In these applications, early 
biologists or agriculturists selected for desired 
phenotypes, with the poorly understood evolution of 
genotypes occurring concomitantly. 

During the past half century, better understanding 
of genetics at the molecular level has added to the 
sophistication of genetic manipulation. An excellent
example is the genetic improvement of Penicillium
chrysogenum, the mold that produces penicillin: Via 
the application of a variety of techniques during the 
past half century, penicillin yields have been increased 
more than a hundred-fold. Similarly, agricultural 
crops have been genetically improved with astonish- 
ing success. These applications of “conventional” bio- 
technology, or genetic engineering, represent scientific, 
technological, commercial, and humanitarian succes- 
ses of monumental proportions. However, the tech- 
niques used for these earlier successes were relatively 
crude; recently, they have been supplemented, and 
even supplanted, by “the new biotechnology,” a set 
of enabling techniques that make possible genetic 
manipulation at the molecular level. These new, 
widely applicable techniques are of two general kinds. 
The prototype, variously called recombinant DNA or 
gene-splicing, shuttles genes readily between organ- 
isms. Recombinant DNA technology provides more 
precise, better understood, and more predictable 
methods for manipulating genetic material than was 
possible with conventional biotechnology. The de- 
sired “product” of recombinant DNA manipulations 
may be the engineered organism itself — for example, 
bacteria altered to clean up oil spills, a weakened virus 
used as a vaccine, or a pest-resistant crop plant — or a 
biosynthetic product of the cells, such as human insulin 
produced in bacteria, a hepatitis vaccine antigen 
synthesized in yeast, or oil expressed from seeds. The 
other major enabling technology is the production of 
hybridomas, immortal cell lines that produce mono- 
clonal antibodies of high specificity that are useful as 
drugs and in clinical diagnostics. 

The seminal recombinant DNA experiment was reported in 1973 by Stanley Cohen, Herbert Boyer, and their collaborators (Cohen and Boyer, 1973), who mixed two plasmid DNAs digested with a restriction enzyme and, after ligation, introduced the resulting recombinant, or chimeric, DNA into Escherichia coli. When the bacteria were propagated, the plasmids containing heterologous DNA were likewise propagated, yielding amplified amounts of this recombinant DNA.

The idea for this experiment did not come as a sudden
epiphany but was the logical extension of earlier work
in several discrete scientific areas. Recombinant DNA 
technology developed from the synergy of several 
more or less independent lines of biological and chem- 
ical research extending over several decades. Prodigious 
research in enzymology had led to the use of restriction 
enzymes to cut DNA molecules at defined sequences, 
and to the use of DNA ligases to rejoin DNA frag- 
ments to form covalently linked chimeric molecules. 
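Cutting at defined sequences can be illustrated with a short sketch. EcoRI is used as the example enzyme because its recognition site (GAATTC) and cut position (G^AATTC, leaving a four-base AATT single-stranded overhang) are well established; the text above does not say which enzyme was used, and the DNA sequence below is made up. The helpers treat the molecule as linear. Because every EcoRI-cut end carries the same complementary overhang, fragments from two different DNAs can anneal and then be sealed by DNA ligase.

```python
def cut_positions(seq, site="GAATTC", offset=1):
    """0-based top-strand positions where an enzyme cuts a linear sequence.

    `offset` is where the cut falls within the recognition site; EcoRI cuts
    G^AATTC, i.e. after the first base. Overlapping sites are all reported.
    """
    seq = seq.upper()
    positions = []
    start = seq.find(site)
    while start != -1:
        positions.append(start + offset)
        start = seq.find(site, start + 1)
    return positions


def fragment_lengths(seq, cuts):
    """Top-strand lengths of the pieces produced from a linear molecule."""
    bounds = [0] + cuts + [len(seq)]
    return [end - begin for begin, end in zip(bounds, bounds[1:])]


def overhang(site="GAATTC", offset=1):
    """Single-stranded 5' overhang left at each end when both strands of a
    palindromic site are cut `offset` bases in from their 5' ends.
    Returns '' for blunt cutters and for 3'-overhang cutters (not modeled)."""
    return site[offset:len(site) - offset]


dna = "ATGAATTCGGCCGAATTCTT"  # made-up sequence with two EcoRI sites
cuts = cut_positions(dna)
print(cuts, fragment_lengths(dna, cuts), overhang())
```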

Another essential contribution was the panoply of 
advances in fractionation procedures that permitted the 
rapid detection and separation of nucleic acids and
proteins. The most prominent of these techniques 
were gel electrophoretic separation of polynucleotides, 
nucleic acid hybridization, and immunological de- 
tection of specific antigens. These techniques made it 
possible to sort through, purify, and identify the frag- 
ments of genetic material to be manipulated and 
moved. 

The last essential element was the accumulated 
knowledge of microbial physiology and genetics 
that made possible the introduction of recombinant 
plasmids into bacterial cells (‘transformation’) and the 
appropriate expression of introduced genes. Thereby, 
heterologous genes could be made to function and to be expressed at high levels in new intracellular milieus.

The technical successes of recombinant DNA technology have offered not only myriad commercial applications but also extraordinary tools for studying the
genetics and biochemistry that underlie fundamental 
biological processes in normal and disease states — how 
genes duplicate, the mechanism(s) of genetic recom- 
bination, the details of macromolecular synthesis, 
and the nature of control over cellular growth and 
senescence. 


Biotechnology’s Contributions to Science and Society


Thus, what has changed since the demonstration of 
recombinant DNA technology in the early 1970s is 
the technology of biotechnology. The new technology 
is at the same time more precise and predictable 
than its predecessors and yields better characterized 
and more predictable products. And what a cornu- 
copia of products! There are already more than two 
dozen distinct gene-spliced or hybridoma-derived 
drugs on the market (including one adjunct to cancer 
chemotherapy whose revenues exceed $1 billion 
annually) and upwards of 500 in clinical develop- 
ment. Marketed products include human insulin 
synthesized in recombinant E. coli (Figure 1), used 
daily by millions of American diabetics; tPA, tissue 
plasminogen activator, a protein that dissolves the 
blood clots that cause heart attacks and strokes; 
human growth hormone, used to treat children with 
hormonal deficiency; erythropoietin, which stimu- 
lates the growth of red blood cells in certain 
patients suffering from anemia; and several interfer- 
ons, proteins used to treat a variety of maladies, from 
multiple sclerosis to viral infections and cancer. 
Dozens of recombinant crop and garden plants on 
the market have been genetically improved with a 
variety of introduced genes, to impart pest and 
disease resistance; these include tomato resistant to 
bacterial speck disease (modified by the introduction of a gene from the bacterium Pseudomonas syringae; Figure 2), and herbicide-resistant soybeans (modified by the addition of an enzyme that degrades the herbicide glyphosate) that permit the use of a more environment-friendly herbicide, and in smaller amounts.

Figure 1 Escherichia coli containing genes for human proinsulin that were introduced with recombinant DNA techniques. The large, homogeneous-appearing inclusion bodies in these elongated bacteria are crystallized proinsulin that has precipitated out because of its high intracellular concentration. (Courtesy of Eli Lilly & Co.)

Another promising application of the new biotech- 
nology is gene therapy, the insertion of normal or 
modified genes into an animal or human, which can 
be done for different purposes. A common application 
is the creation of genetic lines of animals with char- 
acteristics useful in research or medicine — animals that 
are, for example, models of important human diseases 
such as breast cancer or multiple sclerosis, or that 
secrete into their bloodstream large amounts of a sub- 
stance that can be used as a human therapeutic, a 
process known as ‘biopharming.’ In humans, gene 
therapy is being widely tested to correct genetic or 
acquired disorders via the synthesis in the body of missing, defective, or insufficient gene products.
More than 6000 patients in approximately three 
dozen countries are currently undergoing gene ther- 
apy for diseases ranging from cystic fibrosis to cancer 
and AIDS. Gene therapy can potentially also be used 
for nontherapeutic purposes, including attempts 
at genetic ‘enhancement’ that would not correct 
abnormalities or disease but would treat conditions 
like baldness, or even increase human physical or 
mental capacities above the person’s baseline. 

Thus, genetic manipulation with the techniques of 
the new biotechnology has already provided all man- 
ner of important new research tools and commercial 
products. Yet these techniques have only begun to change the way we
do biological research and to increase the choices 
available to farmers, food producers, physicians, and 
consumers. But given that the new biotechnology is an extension, or refinement, of the kinds of genetic manipulation that preceded it, perhaps we should think of the technological era that is approaching as a Brave Old World.

Figure 2 Tomato with bacterial gene that confers resistance to bacterial speck disease. On the right is a wild-type tomato plant; on the left is a plant that differs from wild-type functionally by the addition of a bacterial gene (Prf, from Pseudomonas syringae) that modulates resistance to bacterial speck disease. Both plants have been challenged by the application of the Pseudomonas pathogen. (Courtesy of Dr. Brian Staskawicz, University of California, Berkeley.)
A chimera is an organism whose cells derive from two
or more zygotes. Blood group chimeras are identified 
by the presence of two different blood groups in one
person. There are two types of blood group chimera: 
twin chimeras, the result of mixing of blood between 
two fetuses in utero; and tetragametic (dispermic) chi- 
meras, the product of the fusion of two zygotes and 
development into one person containing two cell 
lineages. 


See also: Blood Group Systems; Chimera 
Blood groups are antigenic determinants on the surface of blood cells, but the use of the term is generally
restricted to antigens on red blood cells. A blood 
group system is one or more blood group antigens 
encoded either by a single gene or by a cluster of two 
or more closely linked, homologous genes. 


There are 25 blood group systems recognized 
by the International Society for Blood Transfusion 
(Table 1). Some systems contain only one determinant;
others, such as Rh and MNS, contain many. The MNS
system encompasses three genes, the Rh and Chido/ 
Rodgers systems two genes each, and the remainder of 
the systems appear to represent single genes. The 
genes controlling all the blood group systems have 
been located on specific chromosomes (Table 1); the 
genes for all but four (P, DO, SC, and RAPH) have 
been cloned and sequenced. In addition to the antigens 
of the blood group systems there are about 50 other 
well-defined red cell antigens, mostly of very high or 
very low frequency, that have not been assigned to a 
system due to insufficient genetical evidence. 

Almost all blood groups are inherited characters, 
although some blood group phenotypes may be 
modified by environment, development, or disease. 
Some blood group antigens, such as the Rh antigens, 
are only detected on red cells, whereas others may 
be present on other blood cells and in other tissues. 
Those with wide distribution throughout the body, 
such as the ABO antigens, are referred to as histo- 
blood group antigens. 


Table 1 Human blood group systems, genes that encode them, and their chromosomal location

Number  System name         System symbol  Gene name(s)      Chromosome  Number of antigens
001     ABO                 ABO            ABO               9           4
002     MNS                 MNS            GYPA, GYPB, GYPE  4           43
003     P                   P1             P1                22          1
004     Rh                  RH             RHD, RHCE         1           45
005     Lutheran            LU             LU                19          18
006     Kell                KEL            KEL               7           23
007     Lewis               LE             FUT3              19          6
008     Duffy               FY             FY                1           6
009     Kidd                JK             SLC14A1           18          3
010     Diego               DI             SLC4A1            17          18
011     Yt                  YT             ACHE              7           2
012     Xg                  XG             XG                X           1
013     Scianna             SC             SC                1           3
014     Dombrock            DO             DO                12          5
015     Colton              CO             AQP1              7           3
016     Landsteiner-Wiener  LW             LW                19          3
017     Chido/Rodgers       CH/RG          C4A, C4B          6           9
018     Hh                  H              FUT1              19          1
019     Kx                  XK             XK                X           1
020     Gerbich             GE             GYPC              2           7
021     Cromer              CROM           DAF               1           10
022     Knops               KN             CR1               1           5
023     Indian              IN             CD44              11          2
024     Ok                  OK             CD147             19          1
025     Raph                RAPH           MER2              11          1


Blood Group Antibodies 


Blood groups are defined by antibodies, usually 
alloantibodies produced by individuals who lack the 
corresponding antigen. Some blood group antibodies, 
such as anti-A and anti-B, are present in the plasma 
of everybody whose red cells lack the corresponding 
antigen, but most blood group antibodies are only 
formed in response to antigen-positive red cells as 
the result of transfusion or pregnancy. Some blood 
group antibodies facilitate immune destruction of 
transfused red cells carrying the corresponding anti- 
gen. This can result in an immediate or delayed 
hemolytic transfusion reaction. Maternal immunoglobulin G (IgG) blood group antibodies are capable
of crossing the placenta and facilitating immune 
destruction of fetal red cells or erythroid precursors. 
This is the cause of hemolytic disease of the fetus 
and newborn. 
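The rule stated above for anti-A and anti-B, that they are present whenever the corresponding antigen is absent from a person's own red cells, is enough to predict ABO red-cell compatibility. The sketch below encodes only that rule; it is a hypothetical illustration (the names are invented) that ignores Rh and every other blood group system, so it is not a substitute for pretransfusion testing.

```python
# ABO antigens on the red cells of each ABO phenotype.
ANTIGENS = {"O": set(), "A": {"A"}, "B": {"B"}, "AB": {"A", "B"}}


def expected_antibodies(abo_type):
    """Anti-A and/or anti-B expected in plasma: those directed against the
    ABO antigens that the person's own red cells lack."""
    return {"anti-" + ag for ag in {"A", "B"} - ANTIGENS[abo_type]}


def red_cells_compatible(donor, recipient):
    """True if no ABO antigen on the donor's red cells meets the
    corresponding antibody in the recipient's plasma (ABO only)."""
    antibodies = expected_antibodies(recipient)
    return all("anti-" + ag not in antibodies for ag in ANTIGENS[donor])


for recipient in ANTIGENS:
    donors = [d for d in ANTIGENS if red_cells_compatible(d, recipient)]
    print(recipient, "can receive red cells from", donors)
```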


Structure of Blood Group Antigens 


Some blood groups are carbohydrate structures on 
glycoproteins and glycolipids. These include the anti- 
gens of the histo-blood group systems, ABO, H, and 
Lewis. The genes controlling expression of the carbo- 
hydrate antigens do not encode the antigen directly, 
but produce transferase enzymes that catalyze bio- 
synthesis of the antigens by stepwise addition of 
monosaccharide residues to an oligosaccharide chain. 
Most blood group antigens are proteins or glycopro- 
teins in which the main factor determining the blood 
group polymorphism is amino acid sequence, encoded 
directly by the blood group gene. With these glyco- 
proteins the presence of carbohydrate may still play a 
role in expression of the antigen. Many blood group 
polymorphisms represent single amino acid substi- 
tutions, but some, in the more complex MNS and Rh 
systems, involve a variety of different genetic mechan- 
isms that include intergenic recombination and splice 
site mutations. 


Function of Blood Group Antigens 


The functions of some red cell antigens are known. 
Some antigens act as membrane transporters and 
channels, facilitating the movement of biologically 
important molecules in or out of the cells. The Diego 
antigen is the red cell anion exchanger, band 3, the 
Kidd antigen is a urea transporter, and the Colton 
antigen is a water channel. The Cromer and Knops 
blood group antigens are complement regulatory 
proteins, protecting the cells from attack by autologous complement. Band 3, the Diego antigen, and
glycophorin C, the Gerbich antigen, have a structural 
function, acting as links between the lipid bilayer and
the membrane skeleton. Functions of some antigens 
on red cells can be surmised, either because their func- 
tions on other cells are known or because they resem- 
ble other structures of known function. For example, 
the Lutheran, LW, and Ok antigens are members of 
the immunoglobulin superfamily of adhesion mol- 
ecules and receptors. 

Almost nothing is known of the biological signifi- 
cance of blood group polymorphism. Some blood 
group antigens have been exploited by pathogenic 
microorganisms as receptors, important for the attachment of the pathogen to the host cell and its subsequent invasion. It can be speculated that some cell surface
polymorphisms may have evolved in response to 
selection pressures imposed by pathogens. 
Bloom’s syndrome (BS), the constant clinical feature
of which is small size, is the phenotype of persons who 
fail to inherit a normal BLM gene. BS cells, because 
they lack the activity of BLM, the protein encoded by 
BLM, are hypermutable and hyperrecombinable, an 
important consequence of which is a predisposition to 
neoplasia. 

Clinically, BS features proportional dwarfism, 
usually accompanied by a sun-sensitive erythematous 
skin lesion limited to the face and dorsa of the hands 
and forearms, a characteristic facies and head config- 
uration, and immunodeficiency, the last predisposing 
to otitis media and pneumonia. Affected men fail to 
produce spermatozoa, and women, though sometimes 
fertile, cease menstruating at unusually early ages. 
Excessive numbers of well-circumscribed areas of 
dermal hypo- and hyperpigmentation are present. 
The three major complications are chronic lung disease, diabetes mellitus, and cancer.

BS is a genetically determined trait transmitted in 
straightforward autosomal recessive fashion, muta- 
tion at the locus BLM being responsible. Homozygosity or compound heterozygosity for any of the more than 60 mutations at BLM identified so far results in a similar phenotype. The mutations are predominantly
null alleles, but missense mutations also have been 
detected. BS is rare in all populations, but in the 
Ashkenazi Jewish population one particular mutant 
allele, a 6-bp deletion and 7-bp insertion that results in 
premature termination of translation, has through 
founder effect reached a relatively high carrier 
frequency of approximately 1%; in 31% of all persons 
with BS one or both parents are Ashkenazi. 
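The carrier frequency quoted above implies, under simple assumptions, an expected frequency of affected births. The sketch assumes Hardy-Weinberg proportions (random mating) and treats the founder allele as the only BLM mutation in the population; both are simplifications, so the result is an estimate, not an observed incidence. The function name is hypothetical.

```python
import math


def allele_freq_from_carriers(carrier_freq):
    """Frequency q of a rare allele given the heterozygote (carrier)
    frequency, by solving 2q(1 - q) = carrier_freq under Hardy-Weinberg."""
    return (1 - math.sqrt(1 - 2 * carrier_freq)) / 2


q = allele_freq_from_carriers(0.01)  # ~1% carriers, as stated in the text
affected = q ** 2                     # expected frequency of homozygous births
print(f"q = {q:.4f}; about 1 affected birth in {1 / affected:,.0f}")
```

Solving 2q(1 - q) = 0.01 gives q of about 0.005, so q² is roughly 2.5 × 10⁻⁵, or about 1 affected birth in 40,000 under these assumptions.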

The genome is abnormally unstable in the somatic 
cells of persons with BS so that mutations arise spon- 
taneously and accumulate in numbers many times 
greater than normal. These include both microscop- 
ically visible chromatid gaps, breaks, and rearrange- 
ments and mutations at specific loci. Exchanges 
between chromatids take place excessively, at what 
appear to be homologous sites. One consequence of 
this hyperrecombinability is reduction to homo- 
zygosity of constitutionally heterozygous loci distal 
to points of exchange. 

Some of the clinical characteristics of BS may be 
viewed as direct or indirect consequences of the 
hypermutability, so that clinical BS has been consid- 
ered the prototype of a class of disease referred to as 
the somatic mutational disorders. Nevertheless, the 
small size, the diabetes, and the immunodeficiency 
remain to be explained. A major consequence of the 
hyperrecombinability and hypermutability is prone- 
ness to neoplasia; BS more than any other known 
human state predisposes to the development of cancer 
of the types and sites that affect the general popu- 
lation, and at unusually early ages: carcinoma com- 
monest, leukemia and lymphoma next in frequency, 
the rare childhood neoplasms last. 

Diagnosis of BS is based on clinical observation. 
Laboratory confirmation ordinarily is by cytogenetic 
demonstration of the characteristically increased ten- 
dency of chromatid exchange to take place. BS is the 
only condition known that features a greatly increased 
rate of sister chromatid exchange (SCE), and blood 
lymphocytes in short-term culture are suitable for 
confirming or ruling out the diagnosis. Under certain circumstances, the diagnosis can be confirmed by
demonstrating mutation(s) at BLM by molecular 
techniques. 

The mapping of BLM to chromosome band 
15q26.1 and its subsequent molecular isolation identified a nuclear protein which contains a 350-amino-acid domain common to DNA and RNA helicases. The
helicase domain of the BLM protein is 40-45% 
identical to that present in the RecQ subfamily of 
DNA helicases. Although DNA-dependent ATPase 
activity and DNA duplex-unwinding activity have 
been demonstrated for several RecQ helicases includ- 
ing BLM, the nucleic acid substrates these proteins 
act upon in the cell are unknown. Whatever these 
substrates are, the molecular and genetic evidence 
from BS identifies BLM as a protein of importance in the cellular mechanisms that maintain genomic stability.
Blunt-end ligation is a reaction that joins two double-stranded DNA molecules (without ‘staggered cohesive ends’) directly at their ends.


See also: DNA Ligases 


Bombay Blood Group 
The Bombay phenotype is a very rare histo-blood group phenotype in which H antigen, the precursor of the A and B blood group antigens, is absent from red cells and from all other parts of the body. A and B antigens are not produced, regardless of ABO genotype. Bombay phenotype results from homozygosity for inactivating mutations in both α1,2-fucosyltransferase genes, FUT1 and FUT2.


See also: Blood Group Systems 
The domesticated or mulberry silkworm, Bombyx mori, is the second best-studied insect genetic model after the fruit fly, Drosophila melanogaster. Its relatively large size (up to 5 g per mature last stage larva),
fecundity (200-400 eggs per female), and reasonably 
short life cycle (2 months), together with the ability to 
rear thousands of insects en masse, present experimen- 
tal advantages that have been exploited for basic 
research in parallel with its use in agriculture for silk 
production. Two major types of genetic resources 
have been developed for the silkworm: (1) stocks 
carrying a wide variety of classic Mendelian mutations 
(more than 450 described) and radiation-induced 
chromosome aberrations which have been used to 
study fundamental biological processes, such as bio- 
chemistry, development, physiology, hormone action, 
sex determination, virus infection, radiation sensitiv- 
ity, and feeding behavior; and (2) hundreds of inbred 
strains that differ in economic traits such as silk yield 
and quality, growth rate, fecundity, fertility, disease 
resistance, and tolerance to seasonal variation in rear- 
ing conditions, which are used for practical breeding. 
The practical breeding strains have been the source 
of many of the spontaneous mutations in present 
stock collections; however, their potential as sources 
of quantitative trait loci (QTL), genes affecting 
complex or polygenic traits, is just beginning to be 
exploited. A third source of genetic variation is 
Bombyx mandarina, the putative wild ancestor of the silkworm, which can be found in mulberry fields (the main food source for both species) in East Asia (Japan, Korea, and China). Despite a difference in
chromosome number between most extant popula- 
tions of B. mandarina (n = 27) and B. mori (n = 28), 
the two species are interfertile. Thus, the wild silk- 
moth has been the source of many larval and adult 
color variants as well as, more recently, distinctive 
behavioral traits by introgression into the domestic- 
ated species. 

In general, silkworm genetics has developed in 
parallel with traditional genetics worldwide, with 
scientists applying the basic tools and techniques of 
the larger genetics community to this model organism 
as they became available, from linkage mapping to 
gene isolation. The most recent and exciting technical 
breakthrough is the ability to produce transgenic silk- 
worms; this will significantly extend the usefulness of 
the silkworm as a model for studying basic biological 
processes as well as for potential production of value- 
added products introduced by genetic engineering. 


Cytogenetics 


B. mori, as a typical lepidopteran, has small, numerous
‘holokinetic’ chromosomes (n = 28) with dispersed 
centromeres and, unfortunately, few visible landmarks 
apart from occasional constrictions and bead-like 
chromomeres. The failure to show regular banding 
patterns in meiotic or mitotic tissue and the lack of 
polyteny in terminally differentiated tissue despite 
its polyploidization (chromosomes replicate without 
cytokinesis but fail to align as in dipteran insects) have 
meant that cytogenetics is of limited utility for investigating chromosome fine structure. Chromosomal fluorescent in situ hybridization (FISH) has been used successfully to localize repeated sequences, including telomeric repeats, ribosomal DNA, families of dispersed retrotransposable elements (LINEs), and silkworm homologs of centromere-associated sequences. Few single-copy genes have been localized by this method, perhaps because of the relatively large genome size (530 Mb) and features of chromosome organization that are not well understood.


Classic Genetics 


The community of scientists exploiting the silkworm 
for genetic and other studies has largely been confined 
to East and South Asian countries traditionally
engaged in sericulture such as Japan, China, Korea, and 
India, which maintain the largest number of well- 
defined genetic stocks, but there has also been a tradi- 
tion of basic research in the silkworm in Europe, 
notably in Russia, Italy, and France, which also have 
historic and modern ties to the silk industry. Most 
Mendelian mutations have been found as spontaneous 
mutations during the course of mass-rearing for stock 
maintenance or sericulture. Standard methods of muta- 
genesis such as irradiation and chemical mutagenesis 
followed by selective screens to target specific pro- 
cesses or developmental stages have also been used, 
but have been limited by the expense of mass-rearing 
and long-term stock maintenance, which must be car- 
ried out annually to renew vitality despite the ability 
to put most strains into an early embryonic diapause 
(dormancy) for 6-10 months per year. Diapause, which 
depends partly on genetic constitution and partly on
rearing conditions (temperature and light cycle), is 
broken by a period of chilling, or by artificial acti- 
vation at critical stages after egg-laying. In addition to 
point mutations and deletions affecting specific traits, 
modest collections of chromosome aberrations have 
been produced, notably autosomal and sex-limited 
translocations; the latter have been especially
useful for investigating the chromosomal basis of sex- 
determination, and in the creation of visibly marked 
stocks for automatic sexing for egg production in 
sericulture. Silkworm mutations have been mapped 
to around 210 loci on 28 linkage groups. The latter 
vary in density from 2 to 15 markers. Although some- 
what arduous because of the large number of linkage 
groups, gene mapping routinely takes advantage of the 
fact that silkworm females are heterogametic (ZW) 
and have no crossing-over, whereas males, which are 
homogametic (ZZ), undergo normal genetic recombin- 
ation. This allows one to assign a marker to a known 
linkage group without the complications of interchromosomal exchange by mating heterozygous F1 females
to homozygous males of an appropriate genotype. The 
mutation’s map position can then be assayed by rever- 
sing the cross and mating heterozygous males with 
homozygous females, using stocks with multiple mar- 
kers located only on the chromosome of interest. 
These traditional approaches have remained largely 
unchanged since the beginnings of silkworm gene 
mapping in the early twentieth century, primarily
because of the difficulty of developing balancer stocks 
or other kinds of genetic tools to facilitate the process. 
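The logic of the two reciprocal crosses described above can be sketched in Python. This is a minimal illustration, assuming two loci on the same chromosome separated by a recombination fraction r, with a coupling-phase heterozygote (AB/ab) test-crossed to a homozygous recessive; the function names, allele labels, and progeny counts are hypothetical.

```python
from fractions import Fraction


def gamete_frequencies(r, heterozygote_sex):
    """Expected gamete frequencies from a coupling-phase heterozygote AB/ab.

    Assumes two linked loci with recombination fraction r. Following the
    text, silkworm females (ZW) have no crossing-over, so a heterozygous
    female transmits only the parental haplotypes whatever r is.
    """
    if heterozygote_sex not in ("female", "male"):
        raise ValueError("heterozygote_sex must be 'female' or 'male'")
    if heterozygote_sex == "female":
        r = r * 0  # achiasmatic female meiosis: no recombinants (keeps r's type)
    parental = (1 - r) / 2
    recombinant = r / 2
    return {"AB": parental, "ab": parental, "Ab": recombinant, "aB": recombinant}


def recombination_fraction(progeny_counts):
    """Estimate r from test-cross progeny classed by the heterozygote's gamete."""
    total = sum(progeny_counts.values())
    recombinants = progeny_counts.get("Ab", 0) + progeny_counts.get("aB", 0)
    return recombinants / total


if __name__ == "__main__":
    r = Fraction(1, 5)  # hypothetical distance between a new mutation and a marker
    # Heterozygous female x homozygous male: complete linkage, so the mutation
    # co-segregates with a marker on its own linkage group.
    print(gamete_frequencies(r, "female"))
    # Reciprocal cross (heterozygous male): recombinant classes reveal map position.
    print(gamete_frequencies(r, "male"))
    print(recombination_fraction({"AB": 40, "ab": 40, "Ab": 10, "aB": 10}))
```

The female cross only answers "which linkage group?"; the male cross supplies the recombination data needed to place the mutation within it.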


Molecular Biology 


In the past 30 years research groups in Japan, France, 
the United States, and Canada have developed a num- 
ber of model systems in the silkworm primarily to 
study gene regulation; these early model
systems include the genes encoding silk gland-specific
proteins (the two major silk fiber proteins, fibroin 
heavy and light chains, the soluble cocoon ‘gum,’ 
sericin, and p25, a putative chaperone protein) and 
transfer RNAs for amino acids that are highly 
enriched in silk, and the chorion multigene families 
that encode the eggshell proteins, which are synthe- 
sized and secreted by the follicular cells that nurture 
the growing oocyte. Recent advances in cloning tech- 
nology have been widely used to isolate and study 
many silkworm genes based on knowledge of protein 
products or expected homology with conserved genes 
from other species, extending the study of fundamen- 
tal mechanisms into such areas as early development, 
the immune response, sex determination, and neuro- 
biology. Although to date no group has reported suc- 
cessful isolation of a mutation by direct positional or 
map-based cloning, as discussed below, the tools to do 
this are becoming available. 


Molecular Genetics 


In the past several years molecular linkage maps have been constructed using a variety of physical markers, including restriction fragment length polymorphisms (RFLPs) based primarily on anonymous and partially sequenced cloned cDNAs (expressed sequence tags, ESTs) and isolated genes, randomly amplified polymorphic DNAs (RAPDs), microsatellites, and inter-simple sequence repeats (ISSRs). Map densities range up to 1000 markers for RAPDs, which are estimated to be spaced an average of 500 kb apart; these maps are being integrated with other
molecular markers as well as with the conventional 
genetic maps. The construction of BAC libraries in the 
same genetic stocks as the primary molecular linkage 
maps will make positional cloning feasible in the near 
future. This work, together with broad-scale gene 
isolation and identification, is being aided by the 
development of a large-scale EST sequencing project 
which will provide a source of anchor loci for con- 
tig assembly aimed at a whole genome sequencing 
project. The EST database (“SilkBase”) contains more 
than 20000 clones to date representing at least 8000 
independent sequences obtained from more than 30 
different cDNA libraries from many tissues and 
developmental stages. Data from SilkBase is also 
being used for tissue transcription profiling and 
transcription mapping of isolated genomic DNA 
fragments. 
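As a rough consistency check on the RAPD spacing, assume the roughly 1000 markers were spread evenly over the 530 Mb genome given under Cytogenetics (real marker distributions are uneven, so this is only an approximation):

```latex
\bar{d} \approx \frac{530\ \text{Mb}}{1000\ \text{markers}} = 0.53\ \text{Mb} \approx 530\ \text{kb per interval}
```

This is in line with the average spacing of about 500 kb quoted above.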


Gene Introduction 


BmNPV, the B. mori nucleopolyhedrovirus, a double-stranded
DNA baculovirus that infects silkworms and is a ser- 
ious problem in sericulture, has been engineered as an 
expression vector to produce exogenous products 
such as interferon for commercial use and has some 
applications in the pharmaceutical industry. Strains of 
B. mori have been bred specifically for mass-rearing 
under sterile conditions using artificial diet, and can be 
used to harvest expressed proteins directly from the 
hemolymph before the virus kills the host. Current 
efforts are aimed at disabling the virus, and engineer- 
ing it to become a suitable vector for germline trans- 
formation. 

Current methods for obtaining transgenic silkworms rely on a vector derived from piggyBac, a transposable element first found in the cabbage looper, Trichoplusia ni, carrying green fluorescent protein as a reporter under the control of a silkworm actin promoter; the vector is injected directly into the embryo at the syncytial preblastoderm stage shortly after egg-laying. Although transgenesis is not yet routinely used by silkworm geneticists, improvements in the efficiency of obtaining stable germline transformants through new vectors and easier methods of gene delivery are under investigation, and promise to usher in a new period of basic research using this model to elucidate basic genetic mechanisms in an order of insects that includes many of the worst agricultural pests worldwide.


Further Reading 

Goldsmith MR and Wilkins AS (eds) (1995) Molecular Model 
Systems in the Lepidoptera. New York: Cambridge University 
Press. 


See also: Biotechnology 


Bootstrapping 


See: Trees 
Definition and Examples


In the context of population genetic studies, the effect of a temporary reduction (and subsequent expansion) of population size on genetic variation is called the bottleneck effect. Often this phenomenon is
studied in conjunction with the ‘founder effect,’ 
which influences the genetic composition of a popula- 
tion initially formed by a small number of founders. 
The population genetic properties of these effects are 
closely related, since in both cases the contemporary 
members of the population can trace back their ances- 
try to a small number of common ancestors at some 
time point in the past. Scenarios that produce these effects vary. For example, a large, successful population may undergo an ecological disaster that only a small number of individuals survive; upon restoration of favorable ecological conditions, they continue to produce offspring, increasing the population size in subsequent generations. Likewise, a small number of individuals from a population may colonize a new geographic area to establish a new population, in which case at a future point in time all individuals of this new population would trace their ancestry to the common set of founders. Disease epidemics, warfare, geographic isolation, and similar events may all induce these types of effects, and examples are abundant in animal and plant populations.

In the human context, the genetic composition of 
the present population of Finland is studied in refer- 
ence to bottleneck effects (Sajantila et al., 1996; Kittles 
et al., 1999). The origin of American Indian populations
(Cavalli-Sforza et al., 1994), and the genetic structure 
of the population in the South Atlantic island of 
Tristan da Cunha (Thompson, 1986) are classic ex- 
amples in which founder effects are discussed. Some 
authors argue that the contemporary genetic structure 
of almost every modern human population is shaped 
by past bottlenecks and subsequent rapid expansions 
(Harpending et al., 1993; Kimmel et al., 1998). How- 
ever, there is still a controversy about whether the 
global human genome diversity can be explained by 
any drastic bottleneck, or a relatively small long-term 
effective population size (Li and Sadler, 1991; Reich 
and Goldstein, 1998). Historical bottleneck effects are 
also used to explain the reduced genetic variation in 
African cheetahs (Menotti-Raymond and O’Brien, 
1993) and elephant seals (Hoelzel et al., 1993), and 
evidence exists suggesting that many Drosophila 
species may have evolved through genetic bottlenecks 
(Nei et al., 1975; Hedge and Krishna, 1996). Domestication of plants has also led to reduction of genetic diversity through bottlenecks (Eyre-Walker et al., 1998). The evolution of the human immunodeficiency virus type 1 (HIV-1) has also been explained by serial bottleneck phenomena (Nijhuis et al., 1998; Yuste et al., 1999).


Genetic Effects of Population Bottleneck 


The realization that population bottlenecks reduce 
genetic variation was made long before any formal 
assessment of this phenomenon was made (e.g., 
Mayr, 1963). Nei et al. (1975) made the first formal 
attempt to quantify the extent of such loss of variation, 
together with making predictions of the time required 
for restoring genetic variation through subsequent 
expansion of the population size. Subsequent to this initial study, numerous other quantitative treatments of bottleneck effects have been made, considering different types of mutation models (Chakraborty and Nei, 1977), repeated bottlenecks (Maruyama and Fuerst, 1985), estimation of the timing of expansion looking backward in time (Rogers and Harpending, 1992), and the statistical power of different statistics for detecting signatures of population bottlenecks from genetic data on current populations (Cornuet and Luikart, 1996; King et al., 2000).
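To make the loss of variation concrete, the sketch below iterates the textbook Wright-Fisher recursion for expected homozygosity under the infinite-alleles model, F' = [1/(2N) + (1 - 1/(2N))F](1 - u)^2, through a user-supplied sequence of population sizes. It assumes random mating, non-overlapping generations, and a mutation rate u per generation; it is an illustration of the general idea, not a reproduction of Nei et al.'s (1975) own formulation, and the function name and example history are hypothetical.

```python
def heterozygosity_trajectory(sizes, h0, u=0.0):
    """Expected heterozygosity H_t in each generation, starting from h0.

    sizes: effective population size N_t in successive generations.
    u: mutation rate per locus per generation (infinite-alleles model).
    """
    f = 1.0 - h0  # probability that two genes are identical (homozygosity)
    trajectory = [h0]
    for n in sizes:
        drift = 1.0 / (2.0 * n)  # chance two genes copy the same parental gene
        # Identity arises by drift or is inherited, and then neither gene mutates.
        f = (drift + (1.0 - drift) * f) * (1.0 - u) ** 2
        trajectory.append(1.0 - f)
    return trajectory


if __name__ == "__main__":
    # Hypothetical history: large population, a two-generation crash to N = 5,
    # then recovery to a large size.
    history = [10_000] * 10 + [5] * 2 + [10_000] * 10
    h = heterozygosity_trajectory(history, h0=0.8, u=1e-4)
    print(f"before {h[10]:.3f}, after crash {h[12]:.3f}, 10 generations on {h[-1]:.3f}")
```

With u = 0 each generation multiplies H by (1 - 1/(2N_t)), so even a brief crash removes a large fraction of heterozygosity, while recovery depends on new mutation and is slow.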

In summary, the rate at which genetic variation is 
restored following a bottleneck event depends upon a 
number of parameters, such as the bottleneck size, 
intrinsic rate of population growth following the bottleneck event, mutation rate and mechanism (i.e., every mutation being new, versus forward-backward changes of allelic states), impact of natural selection, and migration. Further, with these parameters remaining the same, not all summary measures of genetic variation (e.g., number of segregating alleles, gene diversity, and allele size variance) have the same population dynamics during the recovery process. For example, genetic variation measured as the number of segregating alleles or as allele size variance is restored much more quickly than gene diversity (heterozygosity). As a consequence, when composite parameters of a model are estimated from these different statistics, the estimates do not necessarily agree with each other; the discrepancy can be measured by so-called ‘imbalance indices’ (Kimmel et al., 1998; Reich et al., 1999; Gonser et al., 2000).

In the context of microsatellite loci, the signature of expansion is a faster growth of the variance of the number of repeats compared to heterozygosity. This can be translated into transient growth of an imbalance index β, defined as a function of variance and heterozygosity (Kimmel et al., 1998). One use of such imbalance indices is that they allow distinction between different scenarios of population bottlenecks. The scenario in which a small population at equilibrium undergoes a rapid expansion is sometimes named the ‘long neck.’ Another type is the ‘hourglass,’ which is characterized by a rapid reduction in size followed by a rapid expansion. In this latter case, the imbalance observed is an initial transient reduction of β, followed by a subsequent growth of β. At a given time following the event, the genetic signature of the hourglass can be opposite to that of the long neck (classical bottleneck), as demonstrated by Kimmel et al. (1998). This distinction is not always recognized in the literature (Harpending et al., 1998).
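A minimal sketch of such an index for one microsatellite locus follows. It assumes the stepwise mutation model and two estimators of θ = 4Nμ which, as we understand it, are the ones Kimmel et al. (1998) compare: θ_V = 2V from the variance V of repeat number, and θ_P0 = (1/P0^2 - 1)/2 from the homozygosity P0. The plug-in estimate of P0, the (n - 1) sample variance, and the function name are simplifying assumptions; consult the original paper for its exact estimators. Values well above 1 mean variance is running ahead of heterozygosity, the expansion signature described above.

```python
from collections import Counter
from statistics import variance


def imbalance_index(repeat_counts):
    """Ratio of variance-based to homozygosity-based estimates of theta.

    repeat_counts: repeat number of each sampled gene at one locus.
    Assumes the stepwise mutation model, under which E[V] = theta / 2 and
    P0 = 1 / sqrt(1 + 2 * theta) at equilibrium.
    """
    n = len(repeat_counts)
    if n < 2:
        raise ValueError("need at least two sampled genes")
    # Plug-in homozygosity: sum of squared allele (repeat-length) frequencies.
    p0 = sum((c / n) ** 2 for c in Counter(repeat_counts).values())
    if p0 == 1.0:
        raise ValueError("monomorphic sample: theta_P0 is zero")
    theta_p0 = (1.0 / p0 ** 2 - 1.0) / 2.0
    theta_v = 2.0 * variance(repeat_counts)  # sample variance, n - 1 denominator
    return theta_v / theta_p0


if __name__ == "__main__":
    print(imbalance_index([10, 10, 11, 11]))  # hypothetical: narrow allele range
    print(imbalance_index([5, 10, 15, 20]))   # hypothetical: wide allele range
```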

Bottleneck effects also distort the dynamics of gene 
differentiation between populations. For models 
where the genetic differentiation accumulates linearly 
with the time of divergence, nonlinearity may result 
from recent bottlenecks in one or both of the popula- 
tions (Chakraborty and Nei, 1977; Hedrick, 1999). 
As a consequence, the original idea of Wright (1938), 
that the effect of population size fluctuation can 
be accounted for by considering the evolutionary har- 
monic mean effective size, does not explain all features 
of genetic variation in a bottlenecked population. 
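Wright's harmonic-mean argument can be illustrated with a few lines of Python; the population sizes below are hypothetical. Because the harmonic mean is dominated by the smallest values, a single generation at a small size pulls the long-term effective size far below the arithmetic mean. This shows why a brief bottleneck leaves a lasting mark, even though, as noted above, this single number does not capture every feature of a bottlenecked population.

```python
def harmonic_mean_ne(sizes):
    """Harmonic mean of per-generation effective sizes (Wright's approximation)."""
    if not sizes or any(n <= 0 for n in sizes):
        raise ValueError("sizes must be a non-empty sequence of positive numbers")
    return len(sizes) / sum(1.0 / n for n in sizes)


if __name__ == "__main__":
    sizes = [1000, 10, 1000, 1000]  # one bottleneck generation among four
    print(sum(sizes) / len(sizes))  # arithmetic mean: 752.5
    print(harmonic_mean_ne(sizes))  # roughly 38.8
```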


Bottleneck and Other Evolutionary Factors


Bottleneck effects may also mimic effects of other 
evolutionary factors. For example, under the ‘infinite 
allele model’ of mutations, the deviation from the 
expected relationship between number of segregating 
alleles and homozygosity in a population recovering 
from a bottleneck is exactly in the same direction (i.e., 
too little homozygosity for a given number of seg- 
regating alleles) as that produced by advantageous 


selection and/or population substructure. Likewise, 
for DNA sequence polymorphisms, the imbalance of 
the number of segregating sites and the extent of 
sequence mismatch (Tajima, 1989) cannot discrimin- 
ate bottleneck effect from that of advantageous muta- 
tions and/or population substructure. These are 
artifacts of the phenomenon of accumulation of excess 
rare alleles, observed through theoretical as well as 
empirical analyses (Chakraborty et al., 1988). Further 
problems arise because some statistics of genetic 
variation cannot discriminate between different scen- 
arios of population growth following a bottleneck 
(Polanski et al., 1998). 
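For DNA sequences, the comparison between the number of segregating sites and the mean pairwise mismatch is summarized by Tajima's (1989) D statistic. The sketch below computes it with Tajima's normalizing constants from a list of aligned, equal-length sequences. It assumes no gaps or missing data and counts a site as segregating if it shows more than one base; the function name and toy sequences are hypothetical. Negative values reflect an excess of rare variants, but, as the text stresses, the statistic alone cannot tell which process produced the imbalance.

```python
from itertools import combinations
from math import sqrt


def tajimas_d(sequences):
    """Tajima's (1989) D for aligned, equal-length DNA sequences (no gaps)."""
    n = len(sequences)
    if n < 4:
        raise ValueError("need at least four sequences")
    length = len(sequences[0])
    if any(len(s) != length for s in sequences):
        raise ValueError("sequences must be aligned to equal length")

    # S: number of segregating (polymorphic) sites.
    s = sum(1 for i in range(length) if len({seq[i] for seq in sequences}) > 1)
    if s == 0:
        raise ValueError("no segregating sites: D is undefined")

    # pi: mean number of pairwise differences (sequence mismatch).
    pairs = list(combinations(sequences, 2))
    pi = sum(sum(a != b for a, b in zip(x, y)) for x, y in pairs) / len(pairs)

    # Tajima's normalizing constants.
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)

    return (pi - s / a1) / sqrt(e1 * s + e2 * s * (s - 1))


if __name__ == "__main__":
    rare = ["AAAA", "TAAA", "ATAA", "AATA", "AAAT"]    # every variant a singleton
    common = ["AAAA", "AAAA", "AAAA", "TTTT", "TTTT"]  # intermediate frequencies
    print(tajimas_d(rare), tajimas_d(common))
```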

Since bottlenecks affect the coalescence history of gene genealogies, such effects are also important in designing population-based association studies for fine mapping of genes. This is because a bottleneck can create linkage disequilibrium between loci in certain regions of the genome, probably at random and irrespective of the history of disease mutations.
In present-day parlance, Theodor Boveri (1862-1915) would be called a developmental biologist (or geneticist), although in his day he was considered a
cytologist. His experimental as well as theoretical 
approach to biology was purely analytical. He stood 
unrivaled among biologists of his age. Boveri was born 
in Bamberg, Germany, in 1862. His early education 
was in a semi-classical secondary school in Nürnberg, 
which was followed by classical education and a doc- 
tor’s degree in medical science at the University of 
Munich, the latter awarded in 1885. He chose cyto- 
logical research as a career and the distinguished zo- 
ologist Richard Hertwig of the Zoological Institute of 
Munich as his mentor. His work in association with 
Hertwig gained international recognition and he was 
visited by distinguished scientists such as E. B. Wilson 
of New York. At the age of 31, he was appointed to the 
chair of Zoology and Comparative Anatomy at the 
University of Würzburg, where he spent the rest of his 
life. He also held nonacademic positions such as rector 
at the University. During this period, he made regular 
visits to the Zoological Station in Naples, renowned 
for its emphasis on developmental biology. His aca- 
demic and public standing led to the offer in 1912 of 
the directorship of the soon-to-be-founded Kaiser 
Wilhelm Institute of Biology in Berlin. Although he 
ultimately declined the position, he was instrumental 
in recruiting such luminaries as Hans Spemann, Otto 
Warburg, Richard Goldschmidt, and Max Hartmann 
to this institution. 

Boveri’s scientific work can be broadly divided into 
two parts, experimental and theoretical; the latter, 
dealing mainly with the origin of cancer, was based 
on the results of the former. In a series of publications 
entitled Zellenstudien he described a set of seminal 
studies on the mitotic behavior of chromosomes in the roundworm Ascaris megalocephala and the sea urchin Paracentrotus, which led to far-reaching conclusions.
He demonstrated that chromosomes emerge from 
interphase with the same number with which they 
went into interphase (four in the case of A. megaloce- 
phala). This observation allowed him to hypothesize 
that chromosomes maintain their independence (indi- 
viduality) during interphase between cell divisions. 
In another set of experiments, he fertilized fragmented 
sea urchin eggs with sperm and his subsequent analy- 
sis of the fertilized and unfertilized products enabled 
him to show that gametic nuclei of both parents 
contributed parallel information. A third important 
line of investigation dealt with the behavior of chromo- 
somes and fate of cleavage products of eggs with 
tetrapolar and tripolar spindles, derived from dis- 
permic fertilizations. Eggs with tetrapolar spindles 
showed unequal distribution of chromosomes in 
daughter cells and the embryos did not survive 
beyond gastrulation, while those with tripolar spin- 
dles carried varying numbers of chromosomes and 
were capable of development to various stages, exhi- 
biting abnormalities at the same time. The results of 
these experiments allowed Boveri to suggest that individual chromosomes are endowed with unique qualities. The ideas that chromosomes maintain their individuality through cell divisions, that parental gametic nuclei contain parallel information, and that individual chromosomes are qualitatively different from each other laid down a firm foundation for the chromosome theory of heredity, foreshadowing the emergence of the new science of genetics early in the twentieth century.

Obviously Boveri thought a great deal about his 
experimental work, because a year before his death 
he published the book Zur Frage der Entstehung
maligner Tumoren in which he set forth what has 
since come to be known as the chromosomal theory 
of cancer. Although he had never worked with cancer 
cells, based on his sea urchin work, he proposed that a 
malignant tumor could arise from an abnormal 
chromosomal constitution resulting from a multipolar 
mitosis. For good measure, he declared that the tumor 
problem is a cell problem. It took another 85 years 
and the emergence of genetics and molecular biology 
to realize how perceptive the best of human geniuses 
can be. 


Further Reading 
Boveri T (1914) Zur Frage der Entstehung maligner Tumoren.
Jena: G. Fischer. 


See also: Cancer Susceptibility; Chromosome 
B-prolymphocytic leukemia (B-PLL) is a leukemia of medium-sized B lymphocytes with a distinctive morphological feature, a prominent central nucleolus; it affects blood, bone marrow, and spleen. B-PLL is rare,
comprising about 1% of lymphocytic leukemias. Most 
patients are elderly, with a median age of 70 and a 
slight male predominance (male:female ratio 1.6:1). 
The main clinical features are a high lymphocyte count (>100 × 10⁹/l), splenomegaly with no lymphadenopathy, anemia, and thrombocytopenia. B-PLL cells express strong surface IgM +/- IgD and other B-cell antigens (CD19, 20, 22, 79b). In contrast to B-chronic lymphocytic leukemia, CD5 and CD23 are often negative. There are no specific chromosome abnormalities, although breakpoints involving 14q32 and t(11;14)(q13;q32) are found in 20% of cases. Problems
of differential diagnosis with mantle cell lymphoma 
may arise in such cases. The frequency of p53 ab- 
normalities is the highest (53%) of all lymphoid 
malignancies and this underlies the progressive clinical 
course. Deletions at 11q23 and 13q14 have been 
reported by fluorescent in situ hybridization (FISH) 
analysis. B-PLL responds poorly to treatment. The 
median survival is 2-3 years. 


See also: Leukemia 
Derived from the Greek meaning “short digits,” this term encompasses a group of hand malformations characterized by shortening of the fingers secondary to abnormal development of the metacarpals and/or phalanges. The heritable brachydactylies can occur either as an isolated malformation or as part of a wider syndrome and are subclassified based on their specific pattern of digital involvement (types A1-A4, B, C, D, E). Brachydactyly types A1-A4 are characterized by shortening of the middle phalanges in various patterns; type B, by shortened middle and distal phalanges and nail aplasia (gene localized to 9q); type C, by shortened middle phalanges and ulnar deviation of the index and middle fingers (due to heterozygous mutations in the cartilage-derived morphogenetic protein 1 gene); type D, by short, broad terminal phalanges of the thumbs and great toes; and type E, by shortness of all metacarpals and phalanges, especially the fourth and fifth digits (GNAS1 gene mutations have been found in a subgroup with Albright hereditary osteodystrophy).


See also: Genetic Diseases 
The mouse Brachyury or T locus encodes a product with specific DNA-binding activity that is likely to play a role in the development of all metazoan organisms. A family of mouse genes that share the novel DNA-binding peptide motif (called the T-box) found in the prototypical T locus has recently been identified. Each mouse T-box gene is
expressed in a unique temporal and spatial pattern 
during mouse embryogenesis and these expression 
patterns are suggestive of possible functional roles 
for each gene product. T-box homologs have also been uncovered and characterized in the human and
Caenorhabditis elegans genomes. The accumulated 
data indicate first that multiple T-box genes were 
present in the common metazoan precursor to 
worms and people, and second, that certain T-box 
genes may have emerged from duplication events 
that catalyzed the evolution of specialized vertebrate 
developmental characters. The T-box motif is unique, 
and the T-box family of genes appears to represent 
a heretofore unrecognized category of develop- 
mental transcription factors. Thus, members of the 
T-box family could have played a role in the 
evolution of all metazoan organisms. Current studies 
of T-box expression in adult human tissues as well as 
knockout studies in the mouse are aimed at further 
elucidating the function of individual T-box genes 
and examining the possibility that mutations in these 
genes could be involved in particular human disease 
states. Already, connections have been made between 
Tbx5 and Holt-Oram syndrome, and between Tbx3 
and ulnar-mammary syndrome. Results from our 
laboratory suggest a possible role for Tbx1 in 


DiGeorge syndrome, and for Tbx15 in acromegaloid 
facial appearance (AFA) syndrome. Most interest- 
ingly, all these diseases appear to be caused by null 
mutations that act in a dominant manner, as first 
discovered with the prototypical T-box gene, 
Brachyury. 


See also: DiGeorge Syndrome; T-Box Genes 
Branch migration is the movement of the point at
which two homologous DNA molecules exchange 
base-paired strands. Two duplex molecules interact 
at a Holliday junction, and a Y-structure is formed 
if a single strand interacts with a duplex (forming a 
displacement- or D-loop). Branch migration extends 
or shortens the length of heteroduplex DNA accord- 
ing to the direction of movement, away from or 
towards the initial point of exchange. Branch migra- 
tion can occur spontaneously. The process is catalyzed 
in Escherichia coli by RuvAB, acting predominantly 
on Holliday junctions, and by RecG, which also acts 
on Y-structures. 


See also: Cruciform DNA; Holliday Junction; 
RuvAB Enzyme 
The Brassicaceae, also called the Cruciferae in reference to its four “crossed” petals, is commonly known
as the mustard family. The family is of systematic 
interest, in part, because it includes the various culin- 
ary mustards such as Chinese mustard (Brassica jun- 
cea), black mustard (B. nigra), white mustard (Sinapis 
alba), horseradish (Armoracia rusticana), radish 
(Raphanus sativus), and the highly human-modified 
B. oleracea which provides broccoli, Brussels sprouts, 
cabbage, cauliflower, kale, and kohlrabi. The family 
also provides canola (rapeseed) oil (B. napus) and a
number of ornamental plants. The “fast plants” often 
used in biology classes are derived from turnip 
(B. rapa). Of more current interest is the plant Arabi- 
dopsis thaliana, which has become a model organism 
in studies of development, embryology, gene expres- 
sion, and genome evolution and organization because 
of its low chromosome number (n = 5), compact 
genome, rapid life cycle, and the ease with which it 
is grown in the laboratory. The entire genome of this 
species has recently been sequenced by the inter- 
national Arabidopsis Genome Initiative. Systematic 
studies in Brassicaceae, i.e., those that focus on evolu- 
tionary history and classification, have traditionally 
focused on morphological variation of the silique, a 
unique fruit found only in the Brassicaceae. With the 
advent of molecular systematics (using information derived primarily from DNA to ascertain phylogenetic, or evolutionary, relationships and applying this knowledge to taxonomy), the classical view of Brassicaceae has been changing.

Phylogenetic analyses of DNA sequences from the 
nuclear (18S ribosomal DNA) and chloroplast (rbcL 
and atpB genes) genomes, as well as combined ana- 
lyses including gene sequences and morphological 
data, place the Brassicaceae in a monophyletic group 
of families, the order Brassicales, that all produce 
sulfur-containing glucosinolates, the mustard oils. A 
monophyletic group is one that contains its ancestor 
and all of that ancestor’s descendants. Only one other, 
unrelated genus of plants (Drypetes, Euphorbiaceae) 
contains glucosinolates. These analyses, as well as one 
that also included the mitochondrial genes atp1 and
matR, place the Brassicales in a larger monophyletic 
group often referred to as Eurosids II, which includes 
such families as the Malvaceae (e.g., cotton, okra), 
Onagraceae (fuchsia, evening primrose), Anacardia- 
ceae (sumac, poison ivy), Rutaceae (citrus crops), and 
Sapindaceae (maples). An inescapable conclusion of 
these studies is that the Capparaceae (caper family) 
should be included in a broadened Brassicaceae. It is 
clear that the Brassicaceae and the Capparaceae share a 
common ancestor and that the Brassicaceae as tradi- 
tionally delimited has arisen from within Cappara- 
ceae. Excluding Capparaceae from Brassicaceae 
causes Capparaceae to be paraphyletic (i.e., it includes 
its common ancestor but not all of that ancestor’s 
descendants). The broadened family retains the
name Brassicaceae, rather than Capparaceae, because 
Brassicaceae is the older name. 
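The definitions of monophyly and paraphyly above can be turned into a simple test: a group is monophyletic when it contains exactly the terminal taxa descended from its most recent common ancestor (MRCA). The Python sketch below applies that test to a small hypothetical tree whose shape loosely mirrors the situation described here, with the traditional Brassicaceae nested inside Capparaceae. The taxon names and topology are invented for illustration and are not taken from any published phylogeny.

```python
from typing import Dict, Iterable, List, Set

# A rooted tree stored as parent -> list of children; leaves have no entry.
Tree = Dict[str, List[str]]


def leaves_under(tree: Tree, node: str) -> Set[str]:
    """Return every leaf (terminal taxon) descended from `node`."""
    children = tree.get(node, [])
    if not children:
        return {node}
    found: Set[str] = set()
    for child in children:
        found |= leaves_under(tree, child)
    return found


def path_to_root(tree: Tree, node: str) -> List[str]:
    """List `node` followed by each of its ancestors up to the root."""
    parent_of = {c: p for p, kids in tree.items() for c in kids}
    path = [node]
    while path[-1] in parent_of:
        path.append(parent_of[path[-1]])
    return path


def mrca(tree: Tree, taxa: Iterable[str]) -> str:
    """Most recent common ancestor of the given taxa."""
    ordered = list(taxa)
    shared = set(path_to_root(tree, ordered[0]))
    for taxon in ordered[1:]:
        shared &= set(path_to_root(tree, taxon))
    # Walking upward from any one taxon, the first shared node is the MRCA.
    return next(n for n in path_to_root(tree, ordered[0]) if n in shared)


def is_monophyletic(tree: Tree, taxa: Iterable[str]) -> bool:
    """True if the group holds all of its MRCA's descendants and nothing else."""
    group = set(taxa)
    return leaves_under(tree, mrca(tree, group)) == group


# Hypothetical toy tree: the "mustard" clade is nested among the "capers".
toy_tree: Tree = {
    "root": ["outgroup", "n1"],
    "n1": ["caper_A", "n2"],
    "n2": ["caper_B", "n3"],
    "n3": ["mustard_A", "mustard_B"],
}
capers = {"caper_A", "caper_B"}
mustards = {"mustard_A", "mustard_B"}

print(is_monophyletic(toy_tree, capers))             # capers alone
print(is_monophyletic(toy_tree, mustards))           # traditional mustards
print(is_monophyletic(toy_tree, capers | mustards))  # broadened family
```

Only the capers-alone grouping fails, because their MRCA (`n1`) also has the mustards as descendants. This check flags any non-monophyletic group; it does not by itself distinguish paraphyly from polyphyly.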

Analyses within Brassicaceae have either focused
on the relationships of species within particular genera
or, because of the importance of A. thaliana to
biological studies, examined the relationships of the
genus Arabidopsis to other members of the family.
To date, our overall view of relationships within
Brassicaceae is primarily a product of the latter studies.
Published molecular analyses focusing on the placement
of Arabidopsis have relied on DNA sequences of the
chloroplast rbcL gene, the nuclear genes adc
(arginine decarboxylase), adh (alcohol dehydrogenase),
and chs (chalcone synthase),
the nuclear internal transcribed spacers (ITS) of the 
large subunit of rDNA, and on restriction site analysis 
of the chloroplast genome. These studies clearly indi- 
cate that the intrafamilial systematics of the family are 
in need of revision. For example, the genus Lesquer- 
ella is paraphyletic if Physaria is not included in it; 
Arabis is currently polyphyletic, consisting of several 
unrelated lineages; and long-standing tribal relation- 
ships in the family either need to be thoroughly 
revised or abandoned due to the rampant polyphyly 
and paraphyly of the tribes. Conversely, molecular 
studies have strongly supported, for example, the rela- 
tionships among the mustard species of the genus 
Brassica previously worked out on morphological 
and cytological grounds. Because of the attention 
given to Arabidopsis, much more is now known of 
its inter- and intrageneric relationships. The genus 
now consists of A. thaliana, all of the species formerly 
included in Cardaminopsis, and some taxa previously 
placed in Arabis. Except for Arabidopsis thaliana, all 
species previously placed in Arabidopsis have now 
been reassigned to other, mostly new, genera. 

Molecular tools have not only forced a reconsideration
of the systematics of Brassicaceae but have also
refocused the attention of systematists on a suite of
previously underutilized characters, such as leaf
insertion, growth form, hair types, cytology, and
biogeography. Reliance on silique morphology to the
exclusion of other characters is untenable. The genus
Arabis is an example where relying on fruit morphology
has produced a highly unnatural, polyphyletic genus.


See also: Arabidopsis thaliana: Molecular 
Systematics and Evolution; Arabidopsis thaliana: 
The Premier Model Plant; Plant Development, 
Genetics of 
Brassinosteroids are a class of polyhydroxylated sterol
derivatives that act as steroid hormones in plants.
Like their animal counterparts, brassinosteroids have
been shown to regulate gene expression, stimulate cell
division and differentiation, and modulate reproductive
development. Brassinosteroids also mediate growth
responses unique to plants, including promotion of cell
elongation in the presence of a complex cell wall, xylem
differentiation, senescence, stress protection, and the
coordination of multiple developmental responses to
darkness and light. The chemical structure of
brassinolide is shown in Figure 1.

Figure 1 Brassinolide.


Further Reading 
Ecker JR (1997) BRI-ghtening the path to steroid hormone 
signaling events in plants. Cell 90: 825-827. 


See also: Plant Hormones 
Approximately 1 in 10 women in the Western world
develop cancer of the breast, and at least 5% of these 
cases are thought to result from a hereditary predis- 
position to the disease. Two breast cancer susceptibil- 
ity (BRCA*) genes have been mapped and cloned and 
mutations in these genes account for most families 
with four or more cases of breast cancer diagnosed 
before the age of 60 years. Women who inherit loss- 
of-function mutations in one allele of either of these 
genes have an up to 85% risk of breast cancer by age 
70 years. Both BRCA1 and BRCA2 are thought to 
be tumor suppressor genes, as the wild-type allele of 
the gene is observed to be lost in tumors of hetero- 
zygous carriers. As well as breast cancer, carriers of 
mutations in these genes are at elevated risk of cancer 
of the ovary, prostate, and pancreas. Surprisingly, 
despite the association with inherited predisposition, 
somatic disease-causing mutations in BRCA1 or
BRCA2 are extremely rare in sporadic breast cancers.
Functions for the BRCA proteins in both transcrip- 
tional regulation and DNA repair/recombination have 
been suggested. However, it is, as yet, unclear how loss 
of BRCA gene function leads to tumorigenesis. 


Clinical Aspects 


Analysis of the pathology of breast tumors that arise
in carriers of mutations in BRCA1 or BRCA2 has revealed
that their properties differ from each other and from
those of sporadic cases. Tumors in both BRCA1 and BRCA2
carriers are of higher grade than sporadic cases, and
BRCA1 tumors are much more likely than sporadic tumors
to be negative for the estrogen receptor and to have p53
mutations. This may indicate some differences in the
ways in which the genes predispose to breast cancer.
Whether the survival of women with breast cancer who
carry BRCA1 or BRCA2 mutations differs from that of
women with sporadic cancers is controversial. Early
reports suggested that the prognosis was better than for
matched individuals with sporadic tumors. However,
other studies have suggested that survival is worse
in carriers. Larger, longer-term studies are required to
resolve this issue.

The high rates and early onset of breast cancer (up to
85% by age 70 years) and ovarian cancer (up to 40%
lifetime risk for BRCA1 carriers) in mutation carriers
have important implications for clinical management.
Regular mammographic screening is indicated but is of
unknown effectiveness in younger women. Bilateral
prophylactic mastectomy has been shown to reduce the
risk of breast cancer considerably in women with a
family history, although it can carry psychological
and physical morbidity. Prophylactic ovariectomy has
also been shown to have some effect in reducing breast
cancer risk in BRCA1 mutation carriers. This finding
suggests that hormonal therapies such as tamoxifen
might also be effective in reducing the risk of breast
cancer.


BRCA1 and BRCA2 Genes and Their
Encoded Proteins 


The BRCA1 gene, which maps to human chromosome
17q21, consists of 22 coding exons and encodes a
protein of 1863 amino acids. Most of the BRCA1
protein shows no sequence similarity to previously
described proteins apart from the presence of a
Zn2+-binding RING finger domain at the N-terminus
of the protein and two BRCT repeats at the
C-terminus. RING finger domains may be responsible
for protein-protein interaction, and the BRCA1
RING finger, like some others of this family, may be
involved in facilitating protein degradation. The
BRCT repeat is a poorly conserved domain found in
a range of proteins, many of which are involved in
DNA repair or DNA metabolism, such as RAD9 and
XRCC1. Although there has been some controversy
regarding the location of the BRCA1 protein in the
cell, it is now believed that the protein is present
within the cell nucleus; during S phase of the cell
cycle, BRCA1 localizes to discrete foci within the
nucleus.

*Upper case and italics, i.e., BRCA1 and BRCA2, indicate the
genes in humans, whereas Brca1 and Brca2 denote the
equivalent mouse genes. Roman type, e.g., BRCA1, is used
for the corresponding proteins.

The BRCA2 gene, which maps to human chromo- 
some 13q12, has 26 coding exons and encodes a pro- 
tein of 3418 amino acids, with a molecular weight of 
384 kDa, which localizes to the nucleus. The only
obvious feature of the BRCA2 protein is the presence 
of eight copies of a 30- to 80-amino-acid repeat (the 
BRC repeat) in the part of the protein encoded by 
exon 11; these repeats are able to bind the RAD51 
protein implicated in DNA repair and recombination. 


Genetics 


Breast cancer exhibits familial association in that the
disease is about twice as common in the mothers,
sisters, and daughters of affected women as it is in the
general population. This familial risk rises to about
fivefold where the cancer occurs before 40 years of age.
Mutations in BRCA1 and BRCA2 account for most of
the inherited susceptibility to breast cancer in families
with many (more than six) affected individuals.
However, it has been estimated that, overall, BRCA1
and BRCA2 mutations might account for only 20-25%
of familial risk, implying that other susceptibility genes
exist. None of these other putative BRCA genes
(BRCA3, BRCA4, BRCA5, etc.) has yet been mapped or
cloned.

Carriers of mutations in BRCA1 or BRCA2 have
an up to 85% chance of developing breast cancer by
age 70 years, but this figure might differ between
populations. Hundreds of different mutations in
BRCA1 and BRCA2 have been described (see the
Breast Cancer Information Core (BIC) database
on the World Wide Web at
http://www.nhgri.gov/Intramural_research/Lab_transfer/Bic/).
Some mutations are found more commonly than others,
usually because of founder effects in certain
populations. For example, in the Ashkenazi Jewish
population, two BRCA1 mutations (185delAG and
5382insC) and one BRCA2 mutation (6174delT) are
common and are detected in a significant proportion of
early-onset breast cancer cases. A few disease-causing
missense changes, most notably in the RING finger
region of BRCA1, have been noted, but the majority of
mutations are truncating nonsense or frameshift
mutations spread throughout the genes. Some evidence
for a genotype-phenotype correlation for an elevated
risk of ovarian cancer has been presented for both
BRCA genes, but this remains to be definitively proven.

Evidence is accumulating for the effect of modify- 
ing genes on the penetrance of certain mutations in the 
BRCA1 and BRCA2 genes. For example, the 999del5 
mutation in BRCA2, prevalent in the Icelandic 
population, appears to be associated with male breast 
cancer in some families but not others. No modifying 
genes have yet been identified, but rare alleles of a
variable number of tandem repeats (VNTR) polymorphism
linked to HRAS1, the Harvey-Ras proto-oncogene, might
modestly increase the risk of ovarian cancer in
individuals carrying a BRCA1 mutation. This area is
certain to receive much more attention in the next few 
years. 


Mouse Models for Loss of BRCA1 and
BRCA2 


Germline manipulation has been used to create mice 
carrying several different presumptive null alleles of 
both Brca1 and Brca2. Mice heterozygous for these
mutations have not shown elevated susceptibility to 
cancer of the mammary gland or indeed of any other 
tissue. It is possible that the rate of loss of the wild- 
type allele is insufficient to lead to a population of null 
cells large enough for tumorigenesis. This might re- 
late to differences in breast physiology or develop- 
ment between mice and humans. Alternatively, some 
species-specific cellular difference, such as telomere 
length, might be responsible. 

In contrast to heterozygotes, mice that are homo- 
zygous for null alleles of the Brca genes are very 
severely affected. Brca1 and Brca2 have indispensable
roles during mouse development, and null mutations
in either gene result in embryonic lethality between
days 5.5 and 9.5 of embryogenesis, the phenotype of
Brca1-/- embryos being more severe than that of
Brca2-/- embryos. A failure in cell proliferation
has been suggested as the explanation for the failure
of Brca1 and Brca2 null embryos to develop. The
lethality of homozygosity for Brca1 has been cir- 
cumvented by mammary gland-specific deletion of 
the gene. Mammary gland tumors having some of 
the morphological features of human breast cancers 
occurred in these animals. These mice should be useful 
in the development of novel preventative or thera- 
peutic approaches. 


BRCA1, BRCA2, and DNA Repair


Mouse cells with Brca1 or Brca2 mutations are
hypersensitive to ionizing radiation, a genotoxic
treatment that causes primarily double-strand breaks in
DNA. This finding, together with the association of both
BRCA1 and BRCA2 with RAD51, a protein that plays a
key role in homologous recombination, suggests that
BRCA1 and BRCA2 play a part in the cellular response
to DNA double-strand breaks. Furthermore, BRCA1
also associates with the RAD50/MRE11/nibrin complex,
which is thought to process DNA double-strand breaks
for repair by both nonhomologous end joining and
homologous recombination. BRCA1 and BRCA2 are
also present, at least in part, in a common cellular
complex. Together, these data suggest that BRCA1 and
BRCA2 are involved in homologous recombination-mediated
repair of double-strand breaks.
There is also some evidence that BRCA1 may have 
a role in the mechanistically independent process of 
the transcription-coupled repair of oxidative DNA 
damage. 

Spontaneous chromosomal abnormalities are observed
at high frequency in untreated Brca1 and Brca2 mutant
cells, implying that these genes act to repair DNA
damage that occurs as a consequence of normal cell
division, as well as that caused by genotoxic agents. At
the end of mitosis, each daughter cell inherits one of
the two centrosomes and duplicates it at the G1/S
transition so that it has two centrosomes during the
next mitosis. Recent studies have found that a high
proportion of Brca1 and Brca2 mutant cells contain
supernumerary centrosomes. This finding might explain
the high degree of aneuploidy seen in BRCA-associated
breast tumors in humans.


Possible Roles of BRCA1 and BRCA2 in
Transcriptional Regulation 


There is accumulating evidence for a role for BRCA1 
and, to a lesser extent, BRCA2 in transcriptional 
regulation. Dysregulation of target genes consequent
to the loss of BRCA genes is a plausible mechanism
with which to explain, at least in part, tumorigenic
progression. However, the exact function of the
BRCA proteins in transcriptional regulation is not
yet understood. Various genes, such as GADD45 and
the cell cycle regulator p21WAF1/CIP1, are thought to be
regulated by BRCA1. However, there are no reports,
as yet, of BRCA1 binding DNA and acting directly
as a transcription factor. Rather, BRCA1 appears
to exert its influence on transcription as a cofactor
or adaptor, since it can interact both with DNA-binding
transcription factors and with the RNA polymerase II
holoenzyme.
It has been suggested that BRCA1 may
be part of the complex involved in the process of 
chromatin remodeling. This process might be required 
for both transcriptional regulation and DNA repair, 
potentially reconciling findings on the apparently 
diverse functions of the BRCA proteins. 


Further Reading 

Bertwistle D and Ashworth A (1998) Functions of the BRCA1
and BRCA2 genes. Current Opinion in Genetics and Development
8: 14-20.

Bertwistle D and Ashworth A (1999) How do the functions of
BRCA1 and BRCA2 relate to breast tumour pathology?
Breast Cancer Research 1: 41-47.

Zhang H, Tombline G and Weber BL (1998) BRCA1, BRCA2
and DNA damage response. Cell 92: 433-436.

Rahman N and Stratton MR (1998) The genetics of breast
cancer susceptibility. Annual Review of Genetics 32: 95-121.


See also: Breast Cancer; Cancer Susceptibility 


Breakage and Reunion 
See: Break-Copy/Break-Join
Break-copy and break-join are two of the three
fundamentally different mechanisms that have been
considered by which homologous recombination
might occur. The third mechanism is copy-choice.
Break-join recombination has the net effect of breaking
and rejoining two molecules such that the recombinant
molecule consists only of material derived from the
two parental molecules. A break-copy mechanism
involves a broken molecule priming DNA synthesis
from a homologous molecule, so that part of the
recombinant molecule is parental material and part
is newly synthesized. Newly synthesized DNA in
break-copy recombination would be extensive (to the
end of the chromosome or to the replication terminus);
the term is not applied where the synthesis is local
repair synthesis. In copy-choice, the recombinant
molecule would consist entirely of newly synthesized
DNA (or RNA). Both break-join and break-copy
have been demonstrated in different bacteriophages.
Break-join was long regarded as the major mode for
most organisms, but much recent work has pointed to
the widespread occurrence of break-copy as a mechanism
for the repriming of broken replication forks
(replication restart).
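To make the contrast among the three mechanisms concrete, the Python sketch below labels each position of a recombinant molecule by which parent supplied the sequence and whether that stretch is old (parental) or newly synthesized DNA. It is a deliberately crude toy under stated assumptions: a single junction, a hypothetical molecule of arbitrary length, and no modeling of strand polarity, heteroduplex formation, or local repair synthesis. The labels "X" and "Y" are invented for illustration.

```python
from typing import List, Tuple

# Each position of the recombinant is labeled (parent, origin):
#   parent = which parental molecule supplied the sequence ("X" or "Y")
#   origin = "parental" (pre-existing material) or "new" (newly synthesized)
Position = Tuple[str, str]


def recombinant(mechanism: str, length: int, junction: int) -> List[Position]:
    """Toy model: sequence left of `junction` comes from X, the rest from Y."""
    if mechanism == "break-join":
        # Both pieces are physically reused parental DNA.
        left, right = "parental", "parental"
    elif mechanism == "break-copy":
        # The broken X end primes synthesis copied from Y to the molecule's end.
        left, right = "parental", "new"
    elif mechanism == "copy-choice":
        # Synthesis switches template from X to Y; everything is new.
        left, right = "new", "new"
    else:
        raise ValueError(f"unknown mechanism: {mechanism}")
    return [("X", left) if i < junction else ("Y", right) for i in range(length)]


def fraction_new(molecule: List[Position]) -> float:
    """Share of positions that are newly synthesized."""
    return sum(origin == "new" for _, origin in molecule) / len(molecule)


for m in ("break-join", "break-copy", "copy-choice"):
    print(m, fraction_new(recombinant(m, length=10, junction=4)))
```

With a 10-position molecule and the junction after position 4, the fraction of new material comes out as 0.0 for break-join, 0.6 for break-copy (everything copied from Y to the end), and 1.0 for copy-choice.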


See also: Copy-Choice Hypothesis; Genetic 
Recombination 


Breast Cancer 
Breast cancer can occur in sporadic and hereditary
forms, and both forms are associated with alterations
of the genetic material. In the case of hereditary
forms, a constitutive mutation in a specific gene may 
predispose individuals to cancer. In sporadic forms, 
mutations in somatic cells accumulate and result in 
transformation of a normal cell to one with malignant 
potential. 


Family History as an Indicator of 
Predisposition to Breast Cancer 


A history of breast cancer among relatives has been 
found in epidemiological studies to be an indication of 
breast cancer risk. Familial breast cancer is charac- 
terized by a younger age at diagnosis than sporadic 
forms, increasing numbers of affected family mem- 
bers, an increased risk of bilateral breast cancer, and a 
strong association with ovarian cancer. 


Studies of Familial Breast Cancer 


Previous studies have found evidence of an autosomal
dominant susceptibility gene with a population frequency
of around 0.0033. Several studies suggest that about 5%
of cases of breast cancer in the general population are
associated with germline mutations in dominant, highly
penetrant susceptibility genes. Linkage analysis has
produced evidence of cosegregation between breast
cancer predisposition and genetic markers; the first such
evidence was linkage between the breast cancer trait
and the anonymous marker D17S74, located on
chromosome 17q21.
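If the 0.0033 figure is read as the frequency of the susceptibility allele, and Hardy-Weinberg proportions are assumed (neither assumption is stated explicitly above), the fraction of people carrying at least one copy is 1 - (1 - q)^2, which is close to 2q for a rare allele. A minimal Python sketch of that arithmetic:

```python
def carrier_frequency(q: float) -> float:
    """Fraction of people with at least one copy of an allele of frequency q,
    assuming Hardy-Weinberg proportions (random mating, no selection)."""
    if not 0.0 <= q <= 1.0:
        raise ValueError("allele frequency must lie in [0, 1]")
    heterozygotes = 2 * q * (1 - q)
    homozygotes = q * q
    return heterozygotes + homozygotes


q = 0.0033  # figure quoted in the text, interpreted here as an allele frequency
f = carrier_frequency(q)
print(f"carrier frequency ~ {f:.4f} (about 1 in {round(1 / f)})")
```

This prints a carrier frequency of about 0.0066, or roughly 1 person in 152. Homozygotes contribute almost nothing at this allele frequency.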


Genes Implicated in Breast Cancer 
Predisposition: BRCA1 and BRCA2


BRCA1

The BRCA1 gene has been identified on chromosome
17q21 by positional cloning methods. This gene has
5592 coding nucleotides that are distributed over
100 000 bases of genomic DNA and consists of 22
coding exons, which encode a protein of 1863 amino
acids. About 80% of all BRCA1 mutations are
frameshift or nonsense mutations, which introduce a
premature 'stop' codon (either directly or by altering
the codon reading frame) and so cause premature
termination of the protein. Genetic susceptibility to
breast cancer is thought to occur when one BRCA1
allele is inactivated in the germline and the other
allele is subsequently lost in somatic breast tissue.
The most common mutations are 185delAG and
5382insC.
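The BRCA1 figures quoted above are mutually consistent if the 5592 coding nucleotides include the stop codon: 5592 / 3 = 1864 codons, of which 1863 specify amino acids. The Python sketch below checks that arithmetic and also shows, on a hypothetical 18-nucleotide toy sequence (not real BRCA1 sequence), how deleting two bases, as in a "delAG" mutation, shifts the reading frame so that a stop codon is reached early. The helper names are invented for illustration.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def residues_from_coding_length(coding_nt: int) -> int:
    """Protein length implied by a coding sequence that includes its stop codon."""
    if coding_nt % 3:
        raise ValueError("coding length must be a whole number of codons")
    return coding_nt // 3 - 1  # the final codon is the stop, which adds no residue


def first_stop(seq: str) -> int:
    """0-based index of the first in-frame stop codon, or -1 if there is none."""
    for i in range(0, len(seq) - 2, 3):
        if seq[i:i + 3] in STOP_CODONS:
            return i // 3
    return -1


def delete(seq: str, start: int, length: int) -> str:
    """Remove `length` bases beginning at 0-based position `start`."""
    return seq[:start] + seq[start + length:]


print(residues_from_coding_length(5592))  # BRCA1 figures from the text

# Hypothetical toy open reading frame: ATG AAG AGT AAC CCA TGA
wild_type = "ATGAAGAGTAACCCATGA"
mutant = delete(wild_type, 4, 2)  # remove an "AG" dinucleotide
print(first_stop(wild_type), first_stop(mutant))
```

The first line prints 1863; the second prints `5 2`, showing that after the two-base deletion the frame reads ATG AAG TAA and the protein would stop after two residues instead of five.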

There is evidence that approximately 45% of 
families with pure, site-specific breast cancer have 
linkage to BRCA1 and have an associated cumulative 
breast cancer risk among gene carriers of 50-85% by 
age 70 depending on the population studied. Where a 
woman is affected with both breast and ovarian cancer 
or has a family history of both breast and ovarian 
cancer there is an increased probability of a mutation 
in BRCA1. For a BRCA1 carrier with a first breast
cancer, the risk of contralateral breast cancer is esti- 
mated to be up to 48% by age 50 years and 64% by 
age 70. Similarly, the risk of ovarian cancer in these 
women ranges from 20% to 50% by age 70. Colon 
cancer risk is fourfold that of the general population 
and prostate cancer may occur 3.3 times more often 
than expected in male BRCA1 mutation carriers, with 
an absolute risk of 8% by age 70 years. 


BRCA2 

A second breast cancer susceptibility gene (BRCA2) 
was localized on chromosome 13q12-13. In affected
families, male breast cancer was found to be part of
the BRCA2 tumor spectrum and, in addition, the risk
of ovarian cancer was lower than in families with
BRCA1 mutations.

BRCA2 was cloned and found to be a large gene. Its
11385-nucleotide transcript is encoded by 27 exons
(26 of them coding) distributed over 70 000 bases of
genomic DNA, and it encodes a protein of 3418 amino
acids. As with BRCA1, many distinct mutations have
been identified in BRCA2, scattered throughout the
gene. It is estimated that BRCA1 and BRCA2 together
account for approximately 80% of inherited breast
cancer. Many other cancers, including pancreatic
cancer and melanoma, occur in excess in carriers of
BRCA2 mutations.


Founder Effects involving BRCA1 and
BRCA2 

Specific BRCA1 and BRCA2 mutations are highly
prevalent in population subgroups, such as those
identified among Jewish women of central European
(Ashkenazi) origin. BRCA1 (185delAG) and BRCA2
(6174delT) together may account for one-fourth of all
early-onset breast cancer and two-thirds of early-onset
breast cancer in the setting of a family history of
breast or ovarian cancer among Ashkenazi Jewish
women. Observations suggest that the penetrance of
185delAG (that is, the likelihood that a person with
the mutation will actually develop cancer) is
significantly greater than the penetrance of 6174delT.
This supports the possibility that some breast cancer
gene mutations are associated with a higher risk than
others, a finding that further complicates genetic
counseling in this setting. Similar founder mutations
have been found in other populations, and mutations
such as large deletions may also be specific to founder
populations.


Function of BRCA1 and BRCA2 Proteins

Studies of the normal function of BRCA1 suggest
that it encodes a protein involved in the cellular
response to DNA damage. Evidence that BRCA1 is
phosphorylated by Chk2 and ATM suggests that BRCA1
may link the DNA repair functions of BRCA2 to
pathways that signal DNA damage or incomplete DNA
replication.


‘BRCA3’...? 

Studies indicate that 10-20% of families at high risk
of breast cancer are not linked to either BRCA1 or
BRCA2.


Other Syndromes associated with 
Predisposition to Breast Cancer 


Li-Fraumeni Syndrome 

In 1990, germline mutations of the tumor suppressor 
gene p53 were found in five families with the
Li-Fraumeni syndrome. The risk of breast cancer in
carriers of p53 mutations in these families is not
known precisely, but it has been estimated that at
least 50% will develop breast cancer by age 50 years.


Cowden Disease 

Cowden disease is considered an autosomal domin- 
ant disorder in which an estimated 30% of affected 
women have breast cancer, often bilateral and typically 
at a younger than average age. A genomic search local- 
ized the Cowden gene to chromosome 10q22-23, and 
mutations in the gene, PTEN, have been found in 
Cowden disease patients. 


Androgen Receptor Mutations 

Two families have been described in which multiple 
male breast cancers and germline mutations in the 
androgen receptor (chromosome Xq11.2-12) have 
been observed. 
Low-Penetrance Breast Cancer
Susceptibility Genes 


Susceptibility genes may also exist that are much more 
common but less penetrant than the above. Mutations 
in the ataxia telangiectasia gene and the rare HRAS1 
variable number of tandem repeats (VNTR) poly- 
morphisms may be two such loci. These do not pro- 
duce dramatic familial aggregations of breast cancer 
but may prove to be responsible for a substantial 
proportion of all breast cancers if their epidemiologic 
association with breast cancer is confirmed. 

Investigators have recently suggested that an interaction
between the HRAS1 VNTR locus and BRCA1 produces a
twofold increase in the risk of ovarian cancer among
BRCA1 mutation carriers.


Predictive Testing for BRCA1 and
BRCA2 


It is generally agreed that none of the currently avail- 
able cancer susceptibility tests are appropriate for the 
screening of asymptomatic persons in the general 
population, although the population-specific muta- 
tions described among Ashkenazi Jews and Icelanders 
may achieve that status in the future. The testing of un- 
affected members of a family known to carry a BRCA1 
or BRCA2 mutation or other cancer-predisposing gene 
(known as a predictive genetic test) is probably best 
done at specialty clinics. 


See also: BRCA1/BRCA2; Cancer Susceptibility
A formal classification system has been developed to
describe the various types of crosses that can be set up 
between animals having defined genetic relationships 
relative to each other at one or more loci. For the sake 
of simplicity in describing these crosses, I will arbitrar- 
ily use a single locus (the A locus) with two alleles (A 
and a) to represent the situation encountered for the 
whole genome. With a simple two-allele system, there 
are only four generalized classes of crosses that can be 
carried out. 

At the start of most breeding experiments, there is 
usually an outcross, which is defined as a mating be- 
tween two animals or strains considered unrelated to 
each other. In many experiments, the starting material 
for this outcross is two inbred strains. All members 
of an inbred strain are homozygous across their entire
genome and genetically identical to each other.
Thus, an outcross between two inbred strains can be
symbolized as A/A x a/a, and the offspring resulting
from such a cross are called the first filial generation,
symbolized by F1. All F1 animals that derive from an
outcross between the same pair of inbred strains are
identical to each other, with a heterozygous genome
symbolized as A/a. However, when either or both
parents are not inbred, F1 siblings will not be identical
to each other.

An outcross between two inbred strains or between 
one inbred strain and a non-inbred animal that 
contains a genetic variant of interest is almost always 
the first breeding step performed in a linkage analysis. 
The F1 animals obtained from this outcross can
be used in two types of crosses commonly performed
by experimental geneticists: backcrosses and
intercrosses. A mating between a heterozygous F1 animal
(with an A/a genotype) and one that is homozygous
for either the A or a allele is called a backcross.
This term is derived from the vision of an F1 animal
being mated “back” to one of its parents. In practice, a
backcross is usually accomplished by mating F1
animals with other members of a parental strain rather
than a parent itself. The two-generation outcross— 
backcross combination is one of the major breeding 
protocols used in linkage analysis. From Mendel’s first 
law of segregation, we know that the offspring from 
a backcross to the a/a parent will be distributed 
in roughly equal proportions between two genotypes 
at any single locus — approximately 50% will be 
heterozygous A/a, and approximately 50% will 
be homozygous a/a. 

A mating set up between brothers and sisters from
the F1 generation, or between any other two animals
that are identically heterozygous at a particular locus 
under investigation, is called an intercross. An inter- 
cross can be represented by the notation: A/a x A/a. 
The two-generation outcross—intercross series was 
the classic breeding scheme used by Mendel in the 
formulation of his laws of heredity, and it is the second 
major breeding protocol used today for linkage an- 
alysis in animals. Again, according to Mendel’s first 
law, the offspring from an intercross will be distrib- 
uted among three genotypes at any single locus — 50% 
will be heterozygous A/a, 25% will be homozygous 
A/A, and 25% will be homozygous a/a. 

A mating between two members of the same inbred 
strain, or between any two animals having the same 
homozygous genotype is called an incross. The 
incross (A/A x A/A or a/a x a/a) serves primarily as 
a means for maintaining strains of animals that are 
inbred or carry particular alleles of interest to the 
investigator. All offspring from an incross will have
the same homozygous genotype, which is identical to
that present in both parents.
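The expected single-locus ratios for the four classes of cross can be reproduced by enumerating every pairing of parental gametes, as in a Punnett square. This Python sketch assumes a single autosomal locus, equal transmission of each parental allele (Mendel's first law), and no selection; it illustrates the genotype bookkeeping only and does not perform linkage analysis.

```python
from collections import Counter
from fractions import Fraction
from itertools import product
from typing import Dict


def offspring(parent1: str, parent2: str) -> Dict[str, Fraction]:
    """Expected single-locus genotype frequencies among offspring.

    Genotypes are written "A/a"; each parent passes on one of its two alleles
    with equal probability.
    """
    alleles1, alleles2 = parent1.split("/"), parent2.split("/")
    counts = Counter(
        # Sort so "A/a" and "a/A" count as one genotype (upper case sorts first).
        "/".join(sorted((x, y)))
        for x, y in product(alleles1, alleles2)
    )
    total = sum(counts.values())
    return {g: Fraction(n, total) for g, n in counts.items()}


crosses = {
    "outcross": ("A/A", "a/a"),
    "backcross": ("A/a", "a/a"),
    "intercross": ("A/a", "A/a"),
    "incross": ("a/a", "a/a"),
}
for name, (p1, p2) in crosses.items():
    ratios = offspring(p1, p2)
    print(name, {g: str(f) for g, f in sorted(ratios.items())})
```

Printed as fractions, the outcross gives all A/a, the backcross 1/2 A/a and 1/2 a/a, the intercross 1/4 A/A, 1/2 A/a, and 1/4 a/a, and the incross all a/a, matching the proportions stated above. Using `Fraction` keeps the ratios exact rather than as rounded decimals.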


See also: Genetic Engineering; Inbred Strain; 
Mendel’s Laws 
Sydney Brenner (1927- ) studied at the University of
the Witwatersrand, South Africa, and received his PhD
from Oxford, England, in 1954. After spending 1955-56
as a lecturer in physiology at the University of the
Witwatersrand, he returned to England to work at
the Medical Research Council (MRC) in Cambridge. 
He was the Director of its Laboratory of Molecular 
Biology from 1979 to 1986 and then headed the Mole- 
cular Genetics Unit until 1989. 

Brenner’s initial focus was on the transfer of infor- 
mation from DNA to protein, combining theoretical 
and experimental approaches. One of his first contri- 
butions to the growing science of molecular biology 
was his demonstration of the impossibility of all
overlapping triplet codes. Before the genetic code had
been deciphered, it was clear that each codon must
contain at least three nucleotides, since two nucleotides
can form only 4 x 4 = 16 combinations, too few to
encode the 20 amino acids. However,
the code might have been overlapping, so that nucleo- 
tides 123 in some sequence might be the first word, 
234 the second, 345 the third, and so on. But if 123 
represents a unique amino acid, there are only four 
possible choices for position 4, and thus 123 can only 
have four possible neighbors to the right. By exam- 
ining the relatively few known protein sequences, 
Brenner showed that in several cases a given amino 
acid is followed by more than four others, thus 
excluding this mode. Brenner later collaborated with 
Crick, Barnett, and Watts-Tobin in a classical experi- 
ment (see Frameshift Mutation) that demonstrated 
general features of the code: that it must be triplet 
(or a multiple of three nucleotides) and that it does 
not contain “commas” to mark off codons. 
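Both arguments in the paragraph above are simple enough to state as code. The Python sketch below is a simplified rendering of the reasoning as summarized here, not a reconstruction of Brenner's actual analysis: it finds the shortest word length that yields at least 20 distinct codons from four bases, and it flags any amino acid followed by more than four distinct neighbors in a set of sequences, which a fully overlapping triplet code with one triplet per amino acid would forbid. The input sequences are invented one-letter toy strings, not real protein data.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set


def min_codon_length(n_amino_acids: int = 20, n_bases: int = 4) -> int:
    """Smallest word length k with n_bases**k >= n_amino_acids."""
    k = 1
    while n_bases ** k < n_amino_acids:
        k += 1
    return k


def right_neighbors(proteins: Iterable[str]) -> Dict[str, Set[str]]:
    """Map each residue to the set of residues seen immediately after it."""
    seen: Dict[str, Set[str]] = defaultdict(set)
    for protein in proteins:
        for left, right in zip(protein, protein[1:]):
            seen[left].add(right)
    return dict(seen)


def violates_overlapping_code(proteins: Iterable[str], limit: int = 4) -> Set[str]:
    """Residues followed by more than `limit` distinct residues.

    In a fully overlapping triplet code where each amino acid has a single
    triplet, the next triplet shares two bases with the current one, so only
    four right-hand neighbors (one per possible next base) are possible.
    """
    return {aa for aa, nxt in right_neighbors(proteins).items() if len(nxt) > limit}


print(min_codon_length())             # 4**2 = 16 < 20 <= 4**3 = 64
toy = ["GAGS", "GLGK", "GE"]          # invented sequences, not real proteins
print(violates_overlapping_code(toy))
```

The first call returns 3. In the toy data, G is followed by A, S, L, K, and E, five distinct neighbors, so the second call returns `{'G'}`; a single such case is enough to rule out this kind of overlapping code.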

By 1961, several aspects of coding and protein 
synthesis had been elucidated; RNA was known to 
be an intermediate between DNA and protein and, 
since it was clear that ribosomes are the sites of protein 
synthesis and that they contain RNA, it was generally 
assumed that the code was carried in those ribosomal 
molecules. However, Brenner and François Jacob 
found reason to doubt this. For one, analyses of
ribosomal RNA showed that the molecules are of only a
few types, but molecules encoding proteins ought to
vary widely. Furthermore, ribosomes were known to 
be quite stable, and yet the phenomena of enzyme 
induction and repression in bacteria showed that the 
synthesis of specific proteins could be started or 
stopped within minutes. Brenner and Jacob therefore 
postulated that the message is actually carried by a 
short-lived molecule, which they called messenger 
RNA, that would be made quickly, associate with 
ribosomes where it would be translated, and then be 
destroyed quickly. In collaboration with Matthew 
Meselson, they demonstrated that such molecules 
must exist in phage-infected bacteria and thus, by 
extension, in normal, uninfected cells. 

In reporting on this experiment at the 1961 Cold 
Spring Harbor Symposium, Brenner demonstrated a 
classic sense of humor. In concluding his presentation 
in his rich, resonant voice, reminiscent of the poet 
Dylan Thomas, Brenner said: 


Last night, Spiegelman was highly induced for ethanol dehy- 
drogenase, and he came over to me and said, “Who invented 
this term ‘messenger RNA’?” and I said, “Well, I didn’t.” 
“Well,” he said, “it’s a very bad term.” But when I thought 
about it afterward, I decided that ‘messenger RNA’ is really a
very good term. In Greek mythology, the messenger of the 
gods was Hermes, and this stuff has been hermetic — it’s been 
hidden from us. And in Roman mythology, the messenger of 
the gods was Mercury, and this stuff has been mercurial — it’s 
been hard for us to get our hands on. 


Fritz Lipmann then spoke up and said: 
Mercury was also the god of thieves 


and in a peal of laughter the session broke up.

Another of Brenner’s important contributions fol-
lowed the discovery of amber mutations, initially in
phage T4. To confirm other observations suggesting 
that such mutations are nonsense mutations, resulting 
in chain termination, Sarabhai, Stretton, Brenner, and 
Bolle examined amber mutants of T4 gene 23, which 
encodes the major head protein of the phage. Since this 
protein constitutes so much of the intracellular pro- 
tein made during phage growth, it was not necessary 
to purify it; Brenner and his associates demonstrated 
that the amber mutants make only fragments of the 
protein and that the lengths of these fragments cor- 
respond to the positions of the mutations within the 
gene. Thus, protein synthesis, directed by codons,
starts at one end of the gene and continues to the
point of the amber mutation, where it stops. Kaplan,
Stretton, and Brenner then went on to discover the ochre
mutations, a similar class of chain terminators. This
work was critical in identifying the triplets that nor- 
mally serve as chain terminators for protein synthesis. 
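The logic of the gene 23 experiment is that the length of each protein fragment reports the position of the amber mutation. A toy translation loop can illustrate this. The Python sketch below uses a made-up mRNA and reads codons from the 5′ end until the first stop. UAG (amber) and UAA (ochre) are the triplets discussed above; UGA (opal) is included only for completeness. The helper `introduce_amber` is hypothetical.

```python
from typing import Optional, Tuple

# The three chain-terminating codons and their conventional nicknames.
STOP_CODONS = {"UAG": "amber", "UAA": "ochre", "UGA": "opal"}

def translated_length(mrna: str) -> Tuple[int, Optional[str]]:
    """Return (codons read before the first stop, stop name), reading 5' to 3'."""
    for i in range(0, len(mrna) - 2, 3):
        codon = mrna[i:i + 3]
        if codon in STOP_CODONS:
            return i // 3, STOP_CODONS[codon]
    return len(mrna) // 3, None

def introduce_amber(mrna: str, codon_index: int) -> str:
    """Hypothetical helper: replace codon number `codon_index` (0-based) with UAG."""
    start = codon_index * 3
    return mrna[:start] + "UAG" + mrna[start + 3:]

# A made-up 10-codon message with no internal stop codon.
message = "AUG" + "GCU" * 9
for position in (2, 5, 8):
    length, stop = translated_length(introduce_amber(message, position))
    print(position, length, stop)  # fragment length (in codons) equals the mutation position
```

Moving the amber codon further from the start lengthens the fragment in step, which is the colinearity the experiment demonstrated.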

Brenner also played important roles in the early
development of guidelines for research in genetic en- 
gineering, including representing Britain at pivotal 
meetings of the NIH Director’s Advisory Committee, 
which was developing rules for the US that largely set 
the tone for the rest of the world. His insights and wit 
were generally much appreciated, as was his focus 
(along with that of Roy Curtiss) on relying primarily
on biological rather than physical approaches to con- 
tainment of potentially hazardous recombinants. He 
even tested some of the early, genetically weakened 
Escherichia coli on himself, determining how poorly 
they came through his intestine in contrast to his 
normal intestinal flora. 

Once the foundation of molecular biology was 
firmly laid with bacterial and phage systems, Brenner 
decided it was time for him to move on to apply these 
methods of analysis to the problem of embryological 
development, especially of the nervous system, and he 
searched for an appropriate organism. He finally 
found it in a minute roundworm, only about 1 mm
long: Caenorhabditis elegans, which has proven to be 
an ideal subject for analyzing development. It has a 
fixed number of cells, each of which follows a tightly 
determined developmental pathway and has exactly 
the same function from individual to individual, 
allowing the derivation of a flow chart of the fate of 
each cell. Because C. elegans reproduces mainly as a
self-fertilizing hermaphrodite, making and working with
mutants required developing techniques quite different
from those used in such typical genetic systems as the
fruit fly. Many mutants have
been discovered whose analysis, in combination with 
other techniques, is yielding a detailed picture of the 
unfolding of genetic information in the genome as
cells differentiate along their specialized pathways.
In particular, he and postdoctoral fellow Robert 
Horvitz sorted out the molecular mechanisms of the 
controlled cell suicide, or ‘apoptosis,’ that plays a pivotal
role in many aspects of embryonic development and 
also helps protect higher eukaryotes against cancer. 
His group was also instrumental in developing the 
study of aging, nerve cell function, and transducing 
chemical signals from the cell surface to its interior. 
They also used very intricate analysis of electron 
micrographs of thin sections of the worm to trace the 
connections of every neuron — thus producing the 
only total wiring diagram to date for any animal. 

Worldwide, about 1000 scientists in many labora- 
tories are still using C. elegans productively. However, 
in the mid-1980s, Brenner moved on to do seminal 
work in yet another field — the Human Genome Pro- 
ject. In 1986, he initiated the focus on those very 
limited regions of the DNA that actually encode pro- 
teins, getting the UK to employ this method of genomic 
studies. He then turned to the pufferfish, whose genes 
are organized far more compactly than those of most
higher organisms, with very little noncoding DNA. Its 
genome is only one-eighth the size of the human gen- 
ome, yet seems to encode most of the same genes; many 
have been shown to be able to substitute for defective 
mouse genes. A major focus is on the problem of how 
higher organisms have evolved from one another. 

After retiring as head of the MRC Molecular 
Genetics Unit in 1989, Brenner continued his research 
at the Salk Institute in La Jolla, California. In 1996, 
he formed a multidisciplinary Molecular Sciences 
Institute (MSI) to focus intently on research into 
how the genes of an organism can direct the genera- 
tion of a fully functioning living entity, using a com- 
bination of genomic research, computation, and 
simulation. The establishment of this nonprofit organ- 
ization, which opened in January 1998 in Berkeley, 
was funded through a one-time, no-strings-attached 
gift of $10 million from Philip Morris. A main goal is 
to collect massive amounts of information on such 
factors as the complement of RNAs being expressed 
in a given cell and the phosphorylation state of every 
significant kinase or other regulatory molecule. MSI is 
also continuing his work on sequencing the Japanese
pufferfish and determining the functions of its genes.
They are also interested in its remaining “junk” DNA.
As Brenner pointed out, such DNA should indeed be
regarded as junk, something kept in case it proves
useful for some purpose some day, rather than as garbage
DNA, which might as well be thrown away. Another
project involves developing innovative databases, such 
as one that explores interesting patterns of protein 
interactions for possible functional implications. 

Most recently, Brenner is trying to realize another 
long-time dream. His ambitious project to greatly 
accelerate the analysis of genes expressed in particular 
cells and the large-scale sequencing of interesting 
DNA stretches is being carried out through Lynx 
Therapeutics, based in Hayward, California. The 
Lynx “Megaclone” technology involves transferring 
the contents of a DNA library onto beads, each of 
which then contains thousands of copies of one mem- 
ber of the original library bound through a system 
of tags and antitags. The technology should allow 
researchers to fish out and then analyze regions 
that are over- or underexpressed, as well as identify 
disease-associated, single-nucleotide polymorphisms. 

In November of 2000, Brenner was awarded the 
Albert Lasker Award for Special Achievement in 
Medical Science: 


for 50 years of brilliant creativity in biomedical science — 
exemplified by his legendary work on the genetic code; his 
daring introduction of the roundworm Caenorhabditis ele-
gans as a system for tracing the birth and death of every cell
in a living animal; his rational voice in the debate on recom-
binant DNA; and his trenchant wit.


He had already received the 1971 Albert Lasker Basic 
Medical Research Award for his contributions toward 
our basic understanding of molecular genetics. He was
also awarded the Royal Medal of the Royal Society
and the Gairdner Foundation Award, as well as hon-
orary degrees from universities throughout the world.


See also: Caenorhabditis elegans; Crick, Francis 
Harry Compton; Frameshift Mutation; Human 
Genome Project; Jacob, François
Buoyant density is a measure of the ability of a
substance or particle to float in a standard solution,
e.g., CsCl.
This lymphoid cancer derives its name from Dr Denis
Burkitt who, in 1958, described a new neoplasm 
affecting jaws and other facial bones of African chil- 
dren. Epidemiological research by Burkitt and others 
quickly established that the condition is common in 
the region of Africa represented by the geographical 
band extending 15° north and south of the Equator 
and extending further south on the eastern side of the 
continent. These studies also found that this distribu- 
tion of Burkitt’s lymphoma in Africa corresponded 
with regions that are holoendemic for malaria, a 
disease transmitted by Anopheles mosquitoes.
Burkitt’s lymphoma was subsequently detected in 
children in other parts of the world as well. The Afri- 
can form, also called endemic Burkitt’s lymphoma, 
accounts for nearly 50% of all childhood cancers 
in the endemic region, while the non-African 
(nonendemic) form accounts for nearly one-third of 
childhood lymphomas, with a male to female ratio of 
two or three to one. A less common adult form is 
associated with immunodeficiency. Occasionally, the
tumor in children as well as adults may also present in
the marrow as an acute lymphoblastic leukemia. Facial 
bone tumors are less common in the nonendemic 
form, the majority of cases presenting in the abdomen. 
These tumors are highly aggressive, but are potentially 
curable. The prognosis in children correlates with 
disease bulk at the time of diagnosis. 

As a lymphoma, it constitutes a small but well-
defined histological subset of the histologically and
clinically complex B cell non-Hodgkin’s lymphomas.
Histologically, the tumor cells are monomorphic,
consisting of small B cells. They display round nuclei,
multiple nucleoli, and relatively abundant basophilic 
cytoplasm. The tumor cells exhibit a high rate of pro- 
liferation as well as a high rate of spontaneous cell 
death. A histological hallmark of this disease is the 
so-called ‘starry sky’ pattern resulting from ingestion 
of the apoptotic cells by numerous benign macro- 
phages. As a lymphoid neoplasm, the tumor cells 
present surface immunoglobulin as well as a number 
of B-cell-associated antigens, notably CD19, CD20, 
CD22, CD79a, and CD10. 

One of the first recognized intriguing features of 
this neoplasm was that the serum of endemic patients 
exhibits high titers of antibodies against Epstein-
Barr virus (EBV), a DNA virus. The viral genomes
are also detected in the lymphoma cells of almost all 
endemic patients, but rarely in nonendemic patients. 
The precise role played by the virus in the generation 
of the tumor is unknown. It has been suggested that 
the virus-infected B lymphocytes may be prone to 
excessive proliferation, which may predispose them 
to additional genetic errors, some of which may 
indeed be specific for cell transformation to a malig- 
nant state. While this scenario is possible, EBV is a 
common virus with most individuals in the population 
being seropositive. Acute infection by the virus causes 
infectious mononucleosis and there is no increased 
predisposition of these patients to Burkitt’s lymph- 
oma. Nonendemic patients with EBV-negative 
Burkitt’s lymphomas (no virus in tumor cells) are 
often seropositive for the virus. Further complicating 
the issue of the role of this virus in the etiology of 
lymphoma is the fact that EBV-positive Burkitt’s or 
Burkitt’s-like lymphoma is one of the malignancies 
frequently associated with immunodeficiency states 
associated with human immunodeficiency virus 
(HIV) infection or iatrogenic immunosuppression. 

Burkitt’s lymphoma was the first human tumor 
in which the key molecular aberration underlying 
tumorigenesis was defined. Soon after the disease 
was identified, cell lines were derived in Sweden 
from tumors collected in Africa. Cytogenetic analysis 
of these tumors during the 1970s identified a set of un- 
usual chromosomal translocations that characterized 
essentially all tumors. In these tumors, a region of
chromosome 8 (band 8q24) was invariably exchanged 
by breakage and rejoining, most commonly with 
a region of chromosome 14 (band 14q32), and less 
commonly with regions of chromosomes 2 (band 
2p12) and 22 (band 22q11); in cytogenetics parlance, 
these translocations are written as t(8;14)(q24;q32), 
t(2;8)(p12;q24), and t(8;22)(q24;q11), respectively. 
Combining gene mapping and molecular cloning 
techniques that became available during the late 
1970s and early 1980s, investigators showed that these
chromosomal regions harbored some very important 
genes. Thus, a newly discovered cellular oncogene 
called MYC was mapped at the chromosome 8 break- 
point, 8q24. MYC is the cellular counterpart of the
transforming gene (v-myc) of avian myelocytomatosis
virus (an RNA virus, or retrovirus), which causes
carcinomas and sarcomas in chickens by infection. The
breakpoints on chromosomes 14, 2, and 22 (14q32,
2p12, and 22q11, respectively) harbor the immuno-
globulin heavy chain (IGH), κ light chain (IGK), and
λ light chain (IGL) genes.
Analysis of the gene structure changes associated with 
the translocation using molecular cloning techniques 
showed that in each of these translocations, the onco- 
gene’s transcription regulation machinery was re- 
placed by that of the immunoglobulin genes without 
disrupting its protein-coding region. The MYC gene
encodes a transcription factor, a protein whose main
cellular function is to activate or repress transcription
of many genes in the genome; these target genes, in
turn, regulate multiple cellular functions. Therefore,
expression of the transcription factor itself is precisely
regulated. The Burkitt’s lymphoma translocations
eliminate this precise regulation and bring the gene 
under the transcriptional regulation of the immuno- 
globulin genes that are continuously expressed in B 
cells. This deregulation leads to inappropriate expres- 
sion of the oncogene’s normal protein product, lead- 
ing in turn to activation of other genes that should not 
be expressed in this lineage, and ultimately leads to 
cell transformation. In support of this scenario, when 
artificially created IG-MYC fusion genes were intro-
duced into B cells, transformation resulted in vitro as
well as in vivo in mice.
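The cytogenetic shorthand used above packs the two chromosomes and their breakpoint bands into one string. As an illustration only, the Python sketch below parses the simple two-chromosome form t(A;B)(bandA;bandB) and looks up the immunoglobulin locus partnered with MYC. The regular expression handles just this pattern, not the full cytogenetic nomenclature. The lookup table only restates the band assignments given in the text.

```python
import re

# Matches the simple reciprocal form t(8;14)(q24;q32); not the full nomenclature.
TRANSLOCATION = re.compile(r"^t\((\w+);(\w+)\)\(([pq]\d+(?:\.\d+)?);([pq]\d+(?:\.\d+)?)\)$")

# Immunoglobulin loci at the partner breakpoints, as described in the text.
IG_LOCUS_AT_BAND = {"14q32": "IGH", "2p12": "IGK", "22q11": "IGL"}
MYC_BAND = "8q24"

def parse_translocation(notation: str) -> tuple:
    """Return the two breakpoints, e.g. 't(8;14)(q24;q32)' -> ('8q24', '14q32')."""
    match = TRANSLOCATION.match(notation)
    if match is None:
        raise ValueError(f"unsupported notation: {notation!r}")
    chrom_a, chrom_b, band_a, band_b = match.groups()
    return chrom_a + band_a, chrom_b + band_b

def ig_partner_of_myc(notation: str) -> str:
    """Name the immunoglobulin locus juxtaposed to MYC by a Burkitt translocation."""
    first, second = parse_translocation(notation)
    if MYC_BAND not in (first, second):
        raise ValueError("translocation does not involve 8q24")
    partner = second if first == MYC_BAND else first
    return IG_LOCUS_AT_BAND.get(partner, "unknown")

for t in ("t(8;14)(q24;q32)", "t(2;8)(p12;q24)", "t(8;22)(q24;q11)"):
    print(t, parse_translocation(t), ig_partner_of_myc(t))
```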

The mechanism of deregulation of MYC expres- 
sion by immunoglobulin genes has since been 
recognized as a general model of transformation of 
lymphoid cells, especially B cells. Currently, over a 
dozen such translocations have been recognized and 
in many of these the deregulated genes have been 
identified. These genes belong to many different 
classes that regulate key cell functions such as the 
cell cycle, apoptosis, and immune system regulation. 
Indeed, several of these genes were first discovered 
through the analysis of these immunoglobulin-gene-
associated translocations and the study of their normal 
function made significant contributions to our under- 
standing of these biological phenomena. Thus, the 
discovery of MYC-associated translocations and 
the characterization of their molecular biological
consequences initiated a new era of powerful insights
into mechanisms of cell regulation and tumorigenesis. 


See also: EBNA; Epstein-Barr Virus (EBV); 
Immunoglobulin Gene Superfamily 
C genes are genes coding for the constant regions of
immunoglobulin (antibody) molecules. 


See also: Antibody; Constant Regions; V Gene 


C Value 
The C value is the total amount of DNA in a haploid
genome.


See also: C-Value Paradox; Genome Size 


C-Value Paradox 
C-value for a species is defined as the amount of DNA
in picograms (1 pg = 10⁻¹² g) in one haploid set of chromo-
somes from a nondividing, somatic cell of an organism
belonging to that species. C-value is the same as ‘gen-
ome size,’ although the latter is more often expressed
in kilobases of DNA.
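C-values are quoted in picograms, but genome sizes are quoted in bases, so converting between the two is a routine step. This Python sketch assumes the commonly used factor of about 0.978 × 10⁹ base pairs per picogram of DNA. That constant is an assumption of the example, not a value given in this entry.

```python
# Assumed conversion factor: ~0.978e9 base pairs per picogram of double-stranded DNA.
BP_PER_PG = 0.978e9

def pg_to_bp(c_value_pg: float) -> float:
    """Convert a C-value in picograms to base pairs."""
    return c_value_pg * BP_PER_PG

def pg_to_kb(c_value_pg: float) -> float:
    """Convert a C-value in picograms to kilobases."""
    return pg_to_bp(c_value_pg) / 1_000

def bp_to_pg(genome_size_bp: float) -> float:
    """Convert a genome size in base pairs back to picograms."""
    return genome_size_bp / BP_PER_PG

# Example: the smallest vertebrate C-value cited in this entry (0.5 pg).
print(f"{pg_to_kb(0.5):,.0f} kb")  # 489,000 kb
```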

The C-value paradox arises from the fact that 
different organisms having the same general level of 
biochemical, organic, and morphological complexity 
and even organisms belonging to the same genus, 
nevertheless often have widely different C-values. For 
example, genome sizes among vertebrates range from 
0.5 to 150 pg, among the insects from 0.05 to 15 pg, and 
among the annelid worms from 0.7 to 8 pg. Within one 
genus of salamanders, Plethodon, genome sizes for
different species range from 18 to 69 pg, even though
these species all look remarkably alike and show no 
strikingly obvious developmental, behavioral, or eco- 
logical differences (Macgregor, 1983). Exactly the 
same applies to plants. In any one group of animals 
or plants, the minimum genome size required to pro- 
duce a given grade of organization is usually small 
compared to the maximum genome size found within 
that group. 
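The scale of the paradox is easiest to see as fold-differences between the smallest and largest genome sizes. The Python sketch below only divides the extremes quoted in the preceding paragraph. It adds no new data.

```python
# Genome-size ranges (picograms per haploid genome) quoted in the paragraph above.
RANGES_PG = {
    "vertebrates": (0.5, 150),
    "insects": (0.05, 15),
    "annelid worms": (0.7, 8),
    "Plethodon salamanders": (18, 69),
}

def fold_difference(smallest: float, largest: float) -> float:
    """How many times larger the biggest genome is than the smallest."""
    return largest / smallest

for group, (low, high) in RANGES_PG.items():
    print(f"{group}: {fold_difference(low, high):.1f}-fold")
```

Within a single salamander genus the spread is nearly fourfold, and across vertebrates or insects it is about 300-fold.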

The C-value paradox can be resolved on three 
grounds. 


1. The genomes of eukaryotes, and to a lesser but 
nonetheless significant extent, prokaryotes (see 
Bendich and Drlica, 2000) show a tendency for 
growth by duplication of both coding and non- 
coding DNA sequences. Genome size differences 
amongst eukaryotes are mainly the result of dif- 
ferent amounts of noncoding repetitive DNA 
sequences and different levels of repetition of cod- 
ing and noncoding sequences. Drosophila virilis, for 
example, has a genome twice as large as D. melano- 
gaster, but over 40% of the virilis genome consists 
of multiple repeats of just four short noncoding 
sequences. 

2. There is no C-value paradox at the levels of metab- 
olism and development, as determined by com- 
plexity of messenger RNA, i.e., the transcriptive 
capacity of the genome. For example, the genetic 
coding information content is about the same
for the genomes of all vertebrates. 

3. In considering the differences in genome size 
(the ‘nucleotype’) between related organisms and 
the wide differences in chromosome number and 
shape (karyotype) that are also found within 
families and genera, it is essential to uncouple the 
coding informational component of the genome 
from nucleotype and karyotype. Nucleotype and 
karyotype are characters of an organism or species 
that have evolved through pressures of natural 
selection that are in different categories from those 
that determine the evolution of the informational 
component of the genome. Genome size, for ex- 
ample, influences cell size and cell cycle time and, 
through these effects, it undoubtedly has a wide 
impact on growth and development (Horner and 
Macgregor, 1981). Karyotype determines patterns
of linkage and gene segregation and recombination 
and, again in a broad sense, it probably influences 
patterns of gene expression, through the formation 
of chromosomal and nuclear domains. 


An excellent discussion of the C-value paradox in 
relation to chromosome organization is given in 
Gall (1981). Important aspects of the C-value paradox 
in relation to evolution and development are excep- 
tionally well covered by John and Miklos (1988). 
C. elegans


See: Caenorhabditis elegans 


C57BL/6 
C57BL/6 is the name given to a very widely used
inbred strain of black mice. This strain is also referred
to as B6. In studies of the genetic effects of mutations on
any expressed trait, it is always important to rule out 
contributions from other genes. This is accomplished 
typically by placing the mutation of interest on a 
standard inbred genetic background through the 
creation of a congenic strain. The C57BL/6 strain is 
the most common inbred strain used for the creation 
of congenic strains of mice. 
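Congenic strains are usually produced by repeated backcrossing to the inbred partner, such as C57BL/6, while selecting for the mutation. The Python sketch below is rough and rests on two assumptions. It follows the standard convention that the F1 counts as backcross generation N1. It also ignores the chromosome segment linked to the selected mutation. Under these assumptions, the expected fraction of the unlinked genome still derived from the donor strain halves with every generation.

```python
def donor_fraction(generation: int) -> float:
    """Expected donor-strain fraction of the genome unlinked to the selected mutation.

    Assumes the F1 counts as generation N1 (50% donor) and that each further
    backcross to the inbred recipient strain halves the remaining donor share.
    """
    if generation < 1:
        raise ValueError("backcross generations start at N1 (the F1)")
    return 0.5 ** generation

for n in (1, 2, 5, 10):
    print(f"N{n}: {100 * (1 - donor_fraction(n)):.2f}% recipient background")
```

Under these assumptions, generation N10 leaves roughly 0.1% of the unlinked genome from the donor. This is why about ten backcross generations is a common target.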


See also: Congenic Strain; Inbred Strain 
CA microsatellite repeats are variable-length runs of
the repeated dinucleotide CA, written (CA)n (CACACA...),
that are associated with genes.
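Finding (CA)n runs in a sequence is a simple pattern-matching task. The Python sketch below is illustrative only. The minimum of six repeat units is an arbitrary assumption. The sketch also scans a single strand; a (CA)n run reads as (TG)n on the complementary strand.

```python
import re

def find_ca_repeats(sequence: str, min_units: int = 6):
    """Return (start, repeat_units) for each run of at least `min_units` CA dinucleotides.

    `min_units` is an illustrative threshold, not a standard definition.
    Only the given strand is scanned.
    """
    pattern = re.compile(r"(?:CA){%d,}" % min_units)
    return [(m.start(), len(m.group()) // 2) for m in pattern.finditer(sequence.upper())]

print(find_ca_repeats("GG" + "CA" * 8 + "TTCACA"))  # [(2, 8)]
```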


See also: Microsatellite 
The CAAT box is a motif present in the conserved
sequence upstream of the start-points in eukaryotic 
transcription units, which is recognized by a large 
group of transcription factors. 
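The Python sketch below illustrates how such a motif might be located. It searches a stretch of sequence upstream of an annotated transcription start point for the core pentamer CCAAT on either strand. On the given strand, ATTGG marks CCAAT on the complementary strand. The 120-bp search window and the toy promoter are assumptions made for the example. Real CAAT boxes are defined by a longer consensus and by factor binding, not by this string match alone.

```python
def find_ccaat(sequence: str, start_index: int, window: int = 120):
    """Return (offset, strand) for CCAAT cores upstream of a transcription start.

    `start_index` is the 0-based position of the start point in `sequence`;
    offsets are negative distances from it. `window` is an assumed search range.
    """
    sequence = sequence.upper()
    region_start = max(0, start_index - window)
    region = sequence[region_start:start_index]
    hits = []
    for motif, strand in (("CCAAT", "+"), ("ATTGG", "-")):
        pos = region.find(motif)
        while pos != -1:
            hits.append((region_start + pos - start_index, strand))
            pos = region.find(motif, pos + 1)
    return sorted(hits)

# Toy promoter: a CCAAT core 80 bp upstream of a start point at index 100.
toy = "A" * 20 + "CCAAT" + "A" * 75 + "GCGC"
print(find_ccaat(toy, start_index=100))  # [(-80, '+')]
```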


See also: Transcription 


c-ABL Gene and Gene 
Product 
The c-ABL (formally denoted ABL1) gene was first
identified as the cellular homolog of the transforming 
gene of Abelson murine leukemia virus. ABL is a 
single-copy gene in the haploid mammalian genome 
and is located on human chromosome 9q34.1. ABL 
has been conserved throughout metazoan evolution, 
with orthologs present in the genomes of Caenorhab- 
ditis elegans and Drosophila. The mammalian gene has 
11 exons, with two alternative 5’ exons, transcribed 
from distinct promoters that are GC-rich and lack 
TATA elements. A large first intron of approximately 
250 kb separates the two alternative first exons. This 
intron is the region of breakpoints on chromosome 9 
in the t(9;22) translocation that generates the Philadel- 
phia chromosome, characteristic of the human hema- 
tologic malignancy chronic myeloid leukemia (see 
BCR/ABL Oncogene). The c-ABL gene produces 
two mRNA transcripts of 5.3 and 6.5 kb, containing
the different 5’ exons spliced to a common second
exon, followed by the remainder of the coding
sequence. A smaller transcript is detected in testes, 
but there is no alteration in coding sequence of this 
mRNA. The c-ABL gene is relatively ubiquitously 
expressed during embryonic development and in 
adult tissues, with higher levels of expression in 
lymphoid tissue and testes. There is a single known 
homolog of c-ABL in the genome known as ARG 
(ABL-related gene), or formally as ABL2. 

The c-ABL gene encodes a nontransmembrane 
protein-tyrosine kinase, c-Abl, of about 145 kDa. 
There are two protein isoforms of Abl, denoted type 
Ia and Ib, that are the products of the two distinct 
mRNAs, and differ only in their N-terminal 
sequences. The type Ib form of c-Abl contains a gly- 
cine residue at the second position and is covalently 
attached to a myristoyl fatty acid moiety. The remain-
der of the polypeptide is identical between the two
isoforms. The N-terminal 60 kDa region is very similar in
structure to members of the Src family, with Src 
homology 3 (SH3) and Src homology 2 (SH2) 
domains, followed by the catalytic domain. However, 
c-Abl (and Arg) differ from Src proteins by the exist- 
ence of a large (90 kDa) C-terminal domain. This 
domain contains many functional motifs, including 
phosphorylation sites, nuclear localization and export 
signals, and DNA- and actin-binding domains. 

The normal cellular functions of c-Abl are un- 
known. The protein is localized predominantly to 
the cell nucleus in adherent cell types, with a fraction 
also found in the cytoplasm, predominantly associated 
with the filamentous actin cytoskeleton. The tyrosine 
kinase activity of the protein is tightly regulated 
in vivo. Abl catalytic activity is directly inhibited by 
its own SH3 domain in an intramolecular fashion that 
is similar to the regulation of Src kinases, but there is 
also evidence for regulation by a cellular inhibitor and 
via phosphorylation by other tyrosine and serine 
kinases. Abl kinase activity is stimulated by several 
physiologic stimuli, including DNA damage, oxidative 
stress, and integrin and growth factor stimulation. 
Overexpression studies have suggested roles for 
nuclear Abl in growth arrest and apoptosis responses 
to genotoxic stress and in transcription, and for cyto- 
plasmic Abl in cytoskeletal responses to cell adherence 
and growth factor stimulation. In Drosophila, genetic 
evidence implicates Abl in axonogenesis in the central 
and peripheral nervous system. The murine c-abl gene 
was one of the first to be inactivated by homologous 
recombination. Mice with homozygous null muta- 
tions in c-abl have reduced postnatal survival, variable 
deficiency of mature B- and T-lymphoid cells, and 
impaired spermatogenesis. However, mice lacking 
Abl do not have profound defects in DNA damage 
responses, immunity, or fertility at the organismal or
cellular levels, suggesting that the role of Abl in these 
processes is redundant, subtle, or both. 


See also: BCR/ABL Oncogene; Mouse Leukemia
Viruses 
Introduction


Caenorhabditis elegans is a small, free-living nema-
tode worm, which has become established as a stand-
ard model organism for a great variety of genetic 
investigations, being especially useful for studying 
developmental biology, cell biology and neurobiol- 
ogy. As an invertebrate experimental system, it is 
now second only to Drosophila melanogaster in terms 
of convenience and popularity. By the year 2000, the 
community of C. elegans researchers had expanded to 
approximately 300 laboratories, distributed over 20 
countries. In 1998, sequencing of the 97 million base 
pairs of DNA that make up the entire genome was 
essentially completed. This was the first complete 
genome sequence to be determined for any multi- 
cellular organism. 

The potential usefulness of nematodes as tools for 
genetic research was recognized early on by Ellsworth 
Dougherty (USA) and Victor Nigon (France), but the 
special popularity of C. elegans stems directly from 
its adoption as an experimental organism by Sydney 
Brenner, working in Cambridge, UK, during the 1960s. 
The reasons for choosing this species as a subject for 
intensive study are described later in this article. 

C. elegans is a member of the phylum Nematoda, 
which are commonly known as roundworms. The 
number of species in this phylum is unknown, but it is
believed to be very large: at least one hundred
thousand and probably several million. In terms of
individual numbers, nematodes are also extraordinar-
ily numerous; according to some estimates, four out of
every five animals on this planet are nematodes. Both
free-living and parasitic species occur. Plant parasitic 
nematodes are economically important, being respon- 
sible for the loss of 10-20% of primary agricultural 
production. The animal parasites are also very import- 
ant; most vertebrates have at least one nematode 
parasite, which can have debilitating or lethal effects 
on its host. Entomopathogenic nematodes, which kill 
insects by means of symbiotic bacterial pathogens,
are used for biological control. Nematode parasites 
that infect humans and cause disease include ascarids 
(causing ascaridiasis), threadworms, hookworms, gui- 
nea worms, filarial nematodes (causing elephantiasis), 
and Onchocerca (causing river blindness). At least one 
billion people carry one or more nematode infections. 
Thus, the nematodes are a major group of animals, 
deserving of study in their own right. Nematology is 
a recognized subdiscipline of zoology, with its own 
journals and departments. 


Properties and Life Cycle 


In the natural environment, C. elegans is a free-living 
species, which exists as part of the soil microfauna, 
eating the bacteria that proliferate during the decomposi-
tion of vegetable matter. Strains of the species have
been recovered from soil in many different countries. 
In the laboratory, it is usually grown monoxenically, 
using Escherichia coli as a bacterial food source. Lawns 
of E. coli are spread on nutrient agar plates, and worms 
are added to these lawns. The worms can also be 
grown in bulk liquid culture, either monoxenically in 
suspensions of E. coli, or on axenic media. Most 
experimentation employs monoxenic plate culture. 
When feeding on E. coli, the worms grow rapidly, 
going through one generation every 3 days at 25°C.

There are two sexes, the self-fertilizing herm-
aphrodite and the male (Figure 1). Both sexes are
diploid, with five pairs of autosomal chromosomes,
but the hermaphrodite has two X chromosomes (XX) 
and the male has only one X chromosome (XO). The 
hermaphrodite sex is essentially a modified female sex, 
which has evolved as an adaptation for rapid popula- 
tion growth. Usually, populations consist of almost 
100% XX hermaphrodites, and males only arise as a 
result of rare loss of X chromosomes at meiosis, so 
only about 1 in 500 animals is a male. The basic life 
cycle is simple (Figure 2). Adult hermaphrodites con- 
tain both sperm and oocytes, and are therefore capable 
of self-fertilization. This takes place in the sperma- 
theca; the fertilized egg then undergoes development 
for a few hours in the uterus before being laid through 
the centrally located vulva. Each self-fertile hermaph- 
rodite lays approximately 330 eggs. Twelve hours after 
first cleavage at 25°C, embryonic development is 
completed and the worm hatches from the egg as a 
first stage larva (L1), about 0.15 mm long, with 552 
nuclei (554 in the case of a male). The L1 larva has 
most of the adult organ systems, with the exception of 
reproductive structures. It begins feeding and grows 
rapidly, molting four times while the body matures 
and the reproductive organs and germline develop. 
Spermatogenesis occurs during the L4 stage; the 
resulting ameboid sperm are stored in the sperma- 
theca, and the germline switches over to oogenesis. 
In early adulthood, when egg-laying commences, 
the hermaphrodite is about 1mm long. Production 
of eggs continues for about 3 days, until all sperm 
Figure 1 Anatomy of adult hermaphrodite (XX) and adult male (XO) C. elegans. The hermaphrodite is about 1 mm
long when it reaches adulthood, but continues to grow in length thereafter, to a maximum of 1.5 mm. Males are
smaller, reaching 1 mm in total length. Larval stages have simpler anatomy, because they lack mature gonads and
genitalia.
Figure 2 Life cycle of C. elegans. Minimum length of
the life cycle is about 52 hours at 25°C and about 100 
hours at 15°C. The dauer larva is an alternative third- 
stage larval form, specialized for survival, which develops 
under conditions of nutrient deprivation. 


are used up. More eggs will be produced if the herm- 
aphrodite mates with a male, so the fertile period can 
be extended for another few days. Sterility and death 
follow, at a total lifespan of about 14 days. 

Two variations on this basic life cycle are the pro- 
duction of males, and the formation of dauer larvae. 
Males arise as a result of X chromosome loss during 
meiosis; embryonic development is almost identical 
between male and hermaphrodite, but larval develop- 
ment diverges increasingly, so there are extensive dif- 
ferences between adults of the two sexes. The male 
gonad produces only sperm, which are transferred to 
hermaphrodites during mating. After mating, sperm 
contributed by the male are used in preference to the 
resident sperm produced by a hermaphrodite, so only 
outcross progeny are generated after mating. Fifty 
percent of the male sperm carry no X chromosome, 
and consequently 50% of the outcross progeny are male,
50% hermaphrodite (Figure 3). Crosses between 
males and hermaphrodites provide the basis for all 
classical genetic analysis in C. elegans, because they 
can be used to generate any desired combination of 
genes. 
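The sex ratios described above follow directly from the gametes each parent makes. The Python sketch below enumerates gamete pairings for a self-fertilizing XX hermaphrodite and for an XX × XO cross. It assumes equal numbers of X-bearing and nullo-X male sperm and, for simplicity, ignores the occasional X-chromosome loss that gives rare males among self-progeny; the dictionary and function names are illustrative only.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Sex-chromosome content of gametes; "O" denotes a gamete with no X.
# Assumption: each listed gamete type is produced with equal probability.
GAMETES = {
    "XX": ["X", "X"],  # hermaphrodite: every oocyte and self sperm carries one X
    "XO": ["X", "O"],  # male: the unpaired X goes to half of the sperm
}
SEX = {"XX": "hermaphrodite", "XO": "male"}

def zygote(egg, sperm):
    """Name the zygote's karyotype by its number of X chromosomes."""
    n_x = (egg + sperm).count("X")
    return {2: "XX", 1: "XO"}[n_x]

def sex_ratio(mother, sperm_source):
    """Expected fraction of each sex among progeny of mother x sperm_source."""
    pairs = list(product(GAMETES[mother], GAMETES[sperm_source]))
    counts = Counter(SEX[zygote(egg, sperm)] for egg, sperm in pairs)
    return {sex: Fraction(n, len(pairs)) for sex, n in counts.items()}

# Selfing uses the hermaphrodite's own (X-bearing) sperm.
print("self-progeny:", sex_ratio("XX", "XX"))
# After mating, male sperm are used in preference.
print("outcross progeny:", sex_ratio("XX", "XO"))
```

Because every hermaphrodite gamete carries an X, the sex of each outcross zygote is decided entirely by whether the male sperm carried an X, which is why the ratio is 1:1.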

Dauer larvae (from the German dauer, enduring) are an alternative third-stage larval form, which develops under conditions of crowding and nutrient deprivation. They are extensively modified for survival, can withstand harsh conditions and prolonged starvation, and remain viable for several months. When food becomes available, dauer larvae molt to form L4 larvae and resume maturation.

Figure 3 Sexual system of C. elegans. Each self-fertile hermaphrodite (XX) produces about 330 sperm and a larger number of oocytes, and consequently produces about 330 XX progeny, sperm being used with 100% efficiency. Males (XO) arise occasionally by X chromosome loss. They are able to mate with hermaphrodites and transfer sperm to the spermathecae. These male-derived sperm have a competitive advantage over the hermaphrodite-derived sperm, so after mating, a hermaphrodite will switch over completely to the production of cross-progeny.


Technical Advantages 


The major advantages of C. elegans as an experimental 
system are as follows: 


• Anatomical simplicity: the adult hermaphrodite has only 959 somatic nuclei when fully grown, yet it contains well-differentiated tissues, corresponding to those found in more complicated animals. These include intestine, musculature, epidermis (also known as hypodermis), sense organs, neurons, germ cells, and so on.

• Genomic simplicity: the DNA content, at 97 megabases, is lower than that of most animal species.

• Ease of culture: the worms are easily grown on bacterial culture plates. All wild-type and mutant strains can be frozen for permanent storage in liquid nitrogen, retaining viability indefinitely.

• Rapid growth: the 3-day generation time permits many genetic crosses in a short space of time.

• Self-fertility: the ability to reproduce by selfing means that homozygous mutant strains arise automatically. If a hermaphrodite is exposed to a mutagen, some of her haploid gametes will acquire mutations and her F1 progeny will therefore be heterozygous for these mutations. Most mutations are recessive, so no difference in phenotype will be seen, but each heterozygous F1 hermaphrodite (genotype symbolized m/+) will produce F2 self-progeny in the Mendelian proportions of one-fourth homozygous mutant (m/m), one-half heterozygous (m/+), and one-fourth wild-type (+/+). This means that F2 populations can be screened or selected for recessive mutant phenotypes in bulk, without any need for separate crosses. Self-fertility also means that mutants with very severe neuronal, locomotory, or developmental abnormalities can still be propagated as viable homozygous stocks, because the hermaphrodite needs little more than a functional alimentary tract, a supply of food, and a functional gonad in order to reproduce.

• Small size: when fully grown, animals are only about 1 mm in length, so many thousands can be cultivated and examined on a single 9 cm culture plate. For some purposes, the animal can therefore be treated as a microorganism. Handling large numbers means that very rare genetic events (recombinants or mutations) can be recovered by deploying efficient selections or screens. Small size also permits detailed light and electron microscopy.

• Transparency: the animal is fully transparent to light microscopy at all stages in development. Developmental and cellular events can therefore be examined directly in the living animal, and followed in real time or by time-lapse microscopy. Nomarski differential interference contrast microscopy has been extensively used for this purpose. More recently, many transgenic lines expressing different proteins tagged with GFP have been generated, which permit in vivo examination of each specific reporter by means of fluorescence microscopy.

• Invariance: many nematode species, including C. elegans, are eutelic; that is, all wild-type individuals have identical numbers of nuclei. These are generated by essentially invariant patterns of cell division. Moreover, the branching pattern and synaptic connections made by all neurons in the adult hermaphrodite are also largely invariant between different individuals. This invariance has permitted description of the entire cell lineage (from the single cell of the zygote to the 959 somatic nuclei of each adult hermaphrodite), the complete parts list (the functional identity of all the cells in the animal), and the complete wiring diagram (the ultrastructure and synaptic connectivity of all 302 neurons, as reconstructed from serial section electron micrographs). These descriptions were carried out by John Sulston, John White, Robert Horvitz, Judith Kimble, Nichol Thomson and their collaborators.

• Resources: Informational resources include documentation of the complete cell lineage, complete wiring diagram and parts list; the physical and genetic maps of the six chromosomes; and the complete sequence of the genome. Practical resources include the 18 000 cosmid clones and 5000 YAC (yeast artificial chromosome) clones which were used to assemble the physical map, and over 3000 strains held at the Caenorhabditis Genetics Center (CGC). The CGC also maintains a bibliography of over 4000 references. There are comprehensive and publicly accessible databases for C. elegans such as the Caenorhabditis elegans World Wide Web server (http://elegans.swmed.edu), the CGC (http://biosci.umn.edu), WormBase (http://www.wormbase.org), and ACeDB (http://www.sanger.ac.uk/Software/Acedb/).
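The Mendelian proportions quoted under Self-fertility can be checked by enumerating the gametes of a selfing hermaphrodite. This Python sketch assumes ordinary Mendelian segregation at a single locus, with m for the mutant allele and + for wild type; the function name is hypothetical.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

def self_progeny(genotype):
    """Expected genotype frequencies from selfing a diploid hermaphrodite.

    genotype is a two-character string such as "m+" (m/+). Assumes each
    sperm and each oocyte receives either allele with probability 1/2 and
    that they combine at random.
    """
    counts = Counter()
    for a, b in product(genotype, repeat=2):
        # Sort the alleles so that m/+ and +/m are counted as one class.
        counts["/".join(sorted((a, b), reverse=True))] += 1
    total = sum(counts.values())
    return {g: Fraction(n, total) for g, n in counts.items()}

f2 = self_progeny("m+")
print(f2)
# A recessive phenotype is visible only in m/m animals.
print("fraction showing mutant phenotype:", f2["m/m"])
```

Selfing an m/m hermaphrodite returns only m/m progeny (`self_progeny("mm")`), which is why a homozygote recovered from an F2 screen can be kept directly as a true-breeding stock.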


The Genome and Postgenomic Analyses 


The 97 million base pairs of the haploid genome are 
distributed over six chromosomes of roughly equal 
size. Nematode chromosomes are unusual in being 
holocentric, with multiple attachment points for 
mitotic microtubules, and consequently there are no 
centromere-specific concentrations of heterochroma- 
tin, in contrast to most other eukaryotes. Instead, 
repeated sequences are distributed along each chromo- 
some, with some repeat families occurring at lower 
frequency in the middle third of each autosome. These 
central regions, known as clusters, contain more 
unique and conserved genes, and are more gene dense. 

Computer predictions of genes, supported by 
expressed sequence tag (EST) analyses, indicate that 
there are a total of about 19 000 protein-coding genes. 
According to these predictions, 27% of the genome is 
protein-coding, 26% is intronic, and 47% intergenic or 
RNA-encoding. Some of the protein-coding genes are 
organized into operons, which are small sets of genes 
arranged in tandem and cotranscribed from a single 5’ 
promoter. The primary transcript is then broken up 
into single cistrons by trans-splicing to a specialized 
splice leader RNA called SL2. Genes in operons are 
usually not functionally related. Many single gene 
transcripts undergo trans-splicing to a different 5’ 
splice leader, SL1. Trans-splicing is an unusual feature 
of nematodes, but in other respects RNA processing 
resembles that in most other eukaryotes, with genes 
containing an average of five introns; these are 
removed by conventional cis-splicing. 
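The genome-composition percentages above translate into absolute amounts with simple arithmetic. The Python sketch below uses only figures given in this entry (97 Mb, about 19 000 protein-coding genes, the 27/26/47% split); the per-gene value is a crude average that ignores alternative splicing and the uncertainty of computer-predicted gene models, and the names are illustrative.

```python
GENOME_BP = 97_000_000          # haploid genome size given in the text
PROTEIN_CODING_GENES = 19_000   # predicted protein-coding genes

# Predicted fractions of the genome (from the text).
PARTITION = {
    "protein-coding": 0.27,
    "intronic": 0.26,
    "intergenic or RNA-encoding": 0.47,
}

def partition_mb(genome_bp=GENOME_BP, partition=PARTITION):
    """Convert genome fractions into megabases."""
    return {name: frac * genome_bp / 1e6 for name, frac in partition.items()}

def mean_coding_bp_per_gene(genome_bp=GENOME_BP, genes=PROTEIN_CODING_GENES):
    """Average protein-coding sequence per gene, in base pairs."""
    return PARTITION["protein-coding"] * genome_bp / genes

if __name__ == "__main__":
    for name, mb in partition_mb().items():
        print(f"{name:>28}: {mb:5.1f} Mb")
    coding = mean_coding_bp_per_gene()
    print(f"mean coding sequence per gene: ~{coding:.0f} bp (~{coding / 3:.0f} codons)")
    print(f"one protein-coding gene per ~{GENOME_BP / PROTEIN_CODING_GENES / 1000:.1f} kb")
```

With these inputs the script reports about 26 Mb of coding sequence, roughly 1.4 kb of coding sequence (about 460 codons) per gene, and on average one protein-coding gene every 5 kb or so.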


The genome also encodes the usual sets of eukary- 
otic RNAs: ribosomal RNAs, tRNAs, snRNAs, 
scRNAs and other small RNAs, as well as gene- 
specific regulatory RNAs. Most of the RNA genes 
occur in families, amounting to at least another one 
thousand genes. A variety of repeated sequences con- 
tribute about 6% of the genome. Six families of active 
transposons, called Tc1-Tc6, have been defined, present in 5-30 copies and polymorphic between different races of C. elegans. Two of these transposons, Tc1
and Tc3, belong to the Tc1/Mariner family and have 
been extensively studied with respect to transposition 
mechanism, as well as being used for transposon- 
tagging and other manipulations. 

The major gene families in the protein-coding part 
of the genome conform to the general animal pattern, 
encoding numerous kinases, DNA-binding proteins, 
RNA-binding proteins, extracellular matrix compon- 
ents, and so on. Conspicuously abundant are collagen 
genes, which encode the various components of the 
complex cuticle which acts as the animal’s exoskeleton. The genome also contains large families of G-protein-coupled transmembrane receptors, which are believed
to act as chemoreceptors: C. elegans has a sophis- 
ticated olfactory sense, by which it probably gets 
most information about its environment. Another 
major gene family encodes nuclear hormone recep- 
tors, which seem to be more frequent than in other 
animals, for unknown reasons. 

A set of postgenomic tools, which supplement and greatly extend the classical genetic tools already available, is being used to assign and analyze the functions of the 20 000 genes of C. elegans. These include:


• Systematic isolation and characterization of cDNA
clones, in order to isolate the coding parts of each 
gene and to verify the genes and intron-exon organ- 
ization predicted from sequence information. 

• Sequencing of the related nematode Caenorhabditis briggsae, which diverged from C. elegans at least 20 million years ago, so that intronic sequences exhibit no similarity but exons and control regions are conserved and can therefore be recognized.

• Expression analyses, by in situ hybridization and by making transgenic lines with reporter gene constructs, using either β-galactosidase or green fluorescent protein (GFP) tags. Transformation and the construction of transgenic lines of C. elegans are simple, because the absence of specific centromeric sequences means that any piece of exogenous DNA can be propagated as an extrachromosomal element once it has been injected into the germline. The GFP-tagged lines allow labeling and examination of specific tissues, cells, or subcellular structures in the living animal.

• Investigation of whole-genome transcriptional
properties, by means of hybridization to micro- 
arrays carrying sequences corresponding to all cod- 
ing genes. 

• Systematic gene knockouts by chemically induced deletions. Homologous recombination between chromosomal genes and injected transgenes does not occur efficiently in C. elegans, so deletions are generated as rare events in large mutagenized populations and then isolated by PCR screening.

• Transient gene knockouts by injection of double-stranded RNA. Injection of double-stranded RNA corresponding to a gene of interest into the syncytial gonad of a hermaphrodite results in silencing of that gene in all progeny, as a result of a posttranscriptional gene silencing (PTGS) process called RNAi (for RNA interference). RNAi is a particularly useful technique because it blocks both maternal and zygotic components of gene expression, and consequently reveals phenotypes that may not be apparent in a simple gene knockout.

• Large-scale searches for protein-protein interactions, using yeast two-hybrid screens.
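PCR screening for deletions rests on a simple fact: a deletion lying between two primers shortens the amplified product. The Python sketch below illustrates only that logic. Primers are reduced to single coordinates, each allele in a pool is either wild type (None) or a deletion interval, and all coordinates in the example are invented; it does not describe any particular laboratory protocol.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Deletion:
    start: int  # first deleted base (hypothetical genomic coordinate)
    end: int    # last deleted base, inclusive

def product_length(fwd: int, rev: int, allele: Optional[Deletion] = None) -> Optional[int]:
    """Length of the PCR product between primer positions fwd..rev (inclusive).

    Primers are simplified to single coordinates. Returns None when the
    deletion removes a primer position, since no product would form.
    """
    wild_type = rev - fwd + 1
    if allele is None or allele.end < fwd or allele.start > rev:
        return wild_type                  # wild type, or deletion outside amplicon
    if allele.start <= fwd or allele.end >= rev:
        return None                       # a primer position is deleted
    return wild_type - (allele.end - allele.start + 1)

def pools_with_short_product(pools: List[List[Optional[Deletion]]],
                             fwd: int, rev: int) -> List[int]:
    """Indices of pools containing an allele that gives a shorter product."""
    wild_type = product_length(fwd, rev)
    hits = []
    for i, alleles in enumerate(pools):
        sizes = [product_length(fwd, rev, a) for a in alleles]
        if any(s is not None and s < wild_type for s in sizes):
            hits.append(i)
    return hits

# Invented example: primers at 1000 and 4000; pool 1 carries a 901 bp deletion.
pools = [
    [None, None],
    [None, Deletion(1200, 2100)],
    [Deletion(5000, 5600)],               # deletion lies outside the amplicon
]
print(pools_with_short_product(pools, 1000, 4000))
```

Pooling is what makes rare events tractable: only pools that give a short band need to be subdivided further to isolate the deletion-carrying animal.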


Research Areas 


Conventional mutational techniques, mostly using 
the chemical mutagen ethyl methanesulfonate, have
defined more than 1500 genes, distributed over 300 
phenotypic categories. The largest gene classes named 
for specific phenotypes, and therefore implicated in 
particular processes, provide a reflection of some of 
the main research areas. Between 12 and 130 genes 
have been defined by mutation in each of the follow- 
ing classes: emb (embryogenesis), lin (cell lineage and 
differentiation), ced (programmed cell death), mig 
(cell and axon migration), gon (gonad development), 
mab (male-specific development), him (meiosis), fer 
and spe (spermatogenesis), che and dyf (chemotaxis), 
cod (male copulation behavior), eat (feeding behav- 
ior), egl (egg-laying), mec (mechanosensation), 
osm (osmotaxis), unc (locomotion), daf (dauer-larva 
formation), dpy (size and body shape), sup (genetic 
suppression). The efficiency of mutagenesis in C. ele- 
gans also means that genes are often defined by mul- 
tiple independent alleles, sometimes including unusual 
mutations such as temperature-sensitive alleles or 
gain-of-function alleles. 

In addition to genes implicated in specific pro- 
cesses, many hundreds of essential let (lethal) genes 
have been defined by mutations that lead to embry- 
onic death, larval death, or adult sterility, but as yet 
these have been little characterized with respect to 
phenotype. The total number of essential genes in C. 
elegans is not certain, but is unlikely to be more than 
five or six thousand, less than one-third of all the genes in the genome. Consistent with this estimate, the majority of gene knockouts generated by chemically induced deletion or RNAi do not result in lethality.

Major research areas that have been opened up by 
work on C. elegans include many topics in devel- 
opmental biology, neurobiology, cell biology, and life 
history. The system has also contributed to studies of 
basic genetic phenomena such as genome organization, 
recombination, and informational suppression. 

In developmental biology, almost all of the con- 
served signaling pathways have been found and 
extensively studied in C. elegans, often contributing 
important advances to the definition or elucidation of 
these pathways. Several transcription factor families 
were first discovered in C. elegans, such as the LIM 
and POU subclasses of homeobox-containing proteins. Major conserved signaling pathways have been studied intensively, such as growth-factor-linked kinase cascades, Wnt signaling and LIN-12/Notch-mediated cell-cell interaction. The conserved pathway regulating apoptosis (programmed cell death) was first discovered and explored in C. elegans. Less conserved pathways, such as those involved in sex determination, in developmental timing, and in the control of
dauer-larva formation, have also been elucidated, 
so that these processes are being analyzed at the mo- 
lecular level. 

In neurobiology, analyses of ontogeny and func- 
tion have been similarly detailed and thorough. The 
genetic bases of neuronal generation, specification, 
axon guidance, synaptic specificity, and function are 
now understood in detail for some parts of the 
nervous system, providing many insights that are re- 
levant to the more complex nervous systems of insects 
and mammals. Sensory transduction is being un- 
raveled down to molecular mechanisms, in particular 
for mechanotransduction and for odorant detection, 
which are the two most thoroughly studied sensory 
modalities. 

In cell biology, C. elegans has contributed to 
advances in the study of cell division, migration and 
morphogenesis. Muscle biology has benefited from 
the extensive collection of muscle-defective mutants 
isolated early on in C. elegans research; the first 
complete sequence for a myosin heavy chain was 
generated as a result of this research. 

Life history traits have also been much studied 
using C. elegans as a model, in particular the process 
of aging. Because the normal lifespan of the species is less than 3 weeks, it makes attractive experimental material for aging studies. A variety of mutants with extended lifespan have been isolated and studied; some of these result in twofold or threefold increases in longevity. Neuronal signaling, nutrient supply, and catalase levels are among the factors that have been shown to play significant roles.

This entry has described some of the main fields for 
which C. elegans has been important, but research on 
this organism continues to expand into new areas and 
applications. For example, pharmaceutical companies 
have begun to exploit the small size and manipula- 
bility of the organism by using it in large-scale drug 
screens. The emphasis in C. elegans research on com- 
plete description and holistic analysis seems likely to 
be sustained in the future. In terms of cellular anat- 
omy, development and genomic sequence it is already 
the most thoroughly described of all animals. A major 
challenge now is to assign function to all 20 000 genes, 
to describe their regulation and interaction, and ulti- 
mately to arrive at a complete and integrated under- 
standing of the biology of this simple creature. 
John Cairns (1922-) began his professional career
in 1946 in Oxford as a clinician but soon turned 
to research. After moving to Australia in 1950, he 
embarked on studies of the replication of influenza and 
vaccinia viruses, during which he produced, in passing, 
the first genetic map of an animal virus (Gemmell and 
Cairns, 1959). However, Cairns’s research changed 
direction when, in 1957, he took a short leave from 
the Australian National University, Canberra, to learn 
cell culture techniques at the California Institute of 
Technology. There he stayed in a house occupied by the young molecular biologists Matthew Meselson, John Drake, and Howard Temin, and frequented by Franklin Stahl. This experience and his exposure to Max Delbrück led him eventually into molecular biology. His career change was solidified by a sabbatical
leave with Alfred Hershey at Cold Spring Harbor in 
1960. Hershey was developing techniques to isolate 
unbroken DNA, and Cairns used his experience with 
the autoradiography of vaccinia virus to make the first 
measurement of the length of an intact DNA molecule 
(Cairns, 1961). This work established that biologically 
active DNA exists as a single double helix. Upon 
returning to Australia, Cairns applied this technique 
in several studies to visualize the structure and mode 
of replication of DNA. The most famous of his pic- 
tures, showing the replicating chromosome of the 
bacterium Escherichia coli (Cairns, 1963), remains a 
staple of biological textbooks. 

In 1963 Cairns returned to the US to become the 
director of the Cold Spring Harbor Laboratory. Two 
existing institutions, run by the Carnegie Institution 
and by the Long Island Biological Association, had
merged to form the Cold Spring Harbor Laboratory 
of Quantitative Biology, and Cairns was its first 
director. Unfortunately, the new institution inherited 
a crippling debt and decaying facilities. In the 5 years 
that Cairns was director, the debts were paid off, many 
needed repairs completed, and a significant cash reserve
established. This was achieved mainly by a program of 
extreme austerity and a great increase in the profits 
from the sale of the Symposia volumes. By pulling the 
laboratory from the brink of financial disaster while 
maintaining its worldwide reputation, Cairns set it on 
the path to the success that it enjoys today. 

Cairns remained at Cold Spring Harbor Labora- 
tory as an American Cancer Society Professor until 
1973. During this period he achieved a classic feat 
of scientific deduction and analysis. At that time E. coli’s DNA polymerase I (the Kornberg enzyme) was believed to replicate the chromosome. But Cairns felt that Pol I did not have the properties a replicative enzyme should have. So he developed a rapid assay for the particular DNA polymerization activity of Pol I, and his technician, Paula De Lucia, screened a mutated population of E. coli for a clone lacking this activity. The 3478th clone had no detectable Pol I activity but had a normal growth rate (De Lucia and Cairns, 1969). Although this result strongly suggested that E. coli had another DNA polymerase, De Lucia and Cairns did not claim so, but suggested that their mutant could be used to find the true replicative activity. And, indeed, Pol II and Pol III were identified shortly thereafter, independently in the laboratories of Friedrich Bonhoeffer and Malcolm Gefter, and Pol III proved to be the replicative enzyme.
From 1973 to 1980, Cairns was head of the Mill Hill
Laboratories (the precursor of the current Clare Hall 
Laboratories) of the Imperial Cancer Research Fund 
in England. At this point Cairns became committed to 
understanding the causes of human cancer. Believing 
that the study of development would lead to such 
understanding, he recruited to Mill Hill several devel- 
opmental biologists studying simple model organ- 
isms. Research in Cairns’s own laboratory focused on 
mechanisms of mutagenesis in E. coli. This approach 
led Cairns and his graduate student Leona Samson to 
discover a previously unknown pathway of DNA 
repair that removes the lethal and mutagenic lesions 
produced by alkylating agents (Samson and Cairns, 
1977). The burst of research stimulated by this discov- 
ery included the finding that one of the repair proteins 
of this pathway was a ‘suicide enzyme’ that acted only 
once to remove the methyl group from alkylated 
DNA bases (Robins and Cairns, 1979). It turned out 
that this enzyme and others in the alkylation repair 
pathway are conserved in higher organisms. 

Cairns returned to the US in 1980 as a professor in 
the Cancer Biology Department of Harvard School of 
Public Health. Although hindered by limited research 
funds, Cairns developed more fully his thoughts about 
the causes of cancer, the history of cancer research, and 
indeed the history of human diseases and death. In a 
series of penetrating articles, Cairns argued that life 
patterns such as diet and smoking determine the 
susceptibility of populations to specific cancers. There- 
fore, most human cancers are preventable by changes 
in lifestyle, but understandable only by knowing the 
molecular changes caused or promoted by the known 
risk factors. 

Since the proximal causes of cancers are mutations, 
but the common risk factors for cancer are not muta- 
gens, Cairns decided to study mechanisms that might 
provoke or enhance spontaneous mutation, again 
using E. coli as a model system. The result was a highly 
controversial paper documenting cases in which 
mutations seemed to be induced or directed by select- 
ive conditions (Cairns et al., 1988). This paper stimu- 
lated a flurry of research on the phenomenon by 
Cairns and other scientists. In general the Lamarckian 
interpretation of the phenomenon was not supported, 
but the mechanisms producing mutations during 
selection proved to be various and often distinct 
from those producing mutations during normal cellu- 
lar proliferation. 

John Cairns has a keen disregard for conventional 
wisdom and an unwillingness to ignore awkward facts. 
These traits, together with his prodigious laboratory 
skills, were responsible for some of his most original 
ideas and discoveries. He is also an erudite and graceful 
essayist. Many of the ideas he developed during his 
career are explicated for the general reader in his book
Matters of Life and Death (Cairns, 1997). 

John Cairns was elected a Fellow of the Royal 
Society, UK, in 1974, and received a MacArthur 
Fellowship in 1981. He retired to Oxford in 1991, 
where he continues to write and to collaborate. 
Cyclic 3’, 5’-AMP (cAMP), a nucleotide synthesized
from ATP by the enzyme adenylate cyclase, is used to 
regulate intracellular processes in some prokaryotes 
and higher eukaryotes, but apparently not in plants. In 
these organisms it regulates numerous biological pro- 
cesses ranging from catabolite repression in Escheri- 
chia coli, control of the cell cycle in Saccharomyces 
cerevisiae, and morphological switching in pathogenic 
fungi, to chemotaxis in Dictyostelium discoideum and 
odor perception in Homo sapiens. 


cAMP Signaling in Eukaryotes 


In eukaryotes, cAMP acts as a second messenger: it is produced in response to extracellular stimuli and is used to trigger a variety of intracellular responses. In mammalian cells the stimulus is usually a hormone or neurotransmitter, but in yeast there is evidence for metabolites acting as the stimuli. The extracellular stimulus is usually detected by a membrane receptor, which then activates or inhibits adenylate cyclase, increasing or decreasing cAMP production, respectively. For example, in mammalian cells, adrenaline binds to β-adrenergic receptors, stimulating cAMP production, whilst acetylcholine binds to muscarinic receptors, inhibiting cAMP production. In the yeast Saccharomyces cerevisiae the cAMP signal transduction pathway is used to control mating in response to the binding of α- or a-factor pheromones to the complementary Ste2 and Ste3 receptors, and the Gpr1 receptor responds to nitrogen limitation to induce filamentous growth. Generally the receptors are seven-helix membrane proteins that interact with G-proteins and are termed G-protein-coupled receptors (GPCRs). G-proteins are frequently heterotrimeric proteins composed of a GTP-binding α-subunit and two intimately associated β and γ subunits, which have approximate molecular masses of 45 kDa, 35 kDa, and 7 kDa, respectively. The binding of GTP activates the G-protein, which subsequently adopts an inactive form as the GTP is hydrolyzed to GDP. The interaction of the receptor with the inactive G-protein-GDP complex triggers exchange of the GDP for GTP, activating the α-subunit. The α-subunit dissociates from the β/γ regulatory subunits and interacts with adenylate cyclase, increasing or decreasing its activity to effect changes in the concentration of cAMP. Interestingly, only an α-subunit mediates the signal of the filamentous growth pathway in S. cerevisiae (e.g., nitrogen starvation triggers an increase in cAMP, which then induces the cells to produce pseudohyphae and to grow invasively into the agar medium). In addition, the G-protein Ras, which commonly activates MAP kinase signal transduction pathways, can also activate adenylate cyclase in yeast.
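The G-protein cycle described above behaves like a two-state switch. The following Python sketch models it as such; the class and method names are hypothetical, and the model deliberately omits kinetics, signaling by the free βγ dimer, and any regulators of GTP hydrolysis.

```python
from enum import Enum

class State(Enum):
    GDP_BOUND = "inactive: alpha-GDP associated with beta/gamma"
    GTP_BOUND = "active: alpha-GTP dissociated from beta/gamma"

class GProtein:
    """Two-state sketch of a heterotrimeric G-protein (hypothetical names)."""

    def __init__(self, stimulatory: bool = True):
        self.stimulatory = stimulatory  # True: raises cyclase activity; False: lowers it
        self.state = State.GDP_BOUND

    def receptor_activates(self) -> None:
        """A ligand-bound receptor triggers exchange of GDP for GTP on alpha."""
        if self.state is State.GDP_BOUND:
            self.state = State.GTP_BOUND

    def hydrolyze_gtp(self) -> None:
        """GTP hydrolysis to GDP returns alpha to the inactive heterotrimer."""
        if self.state is State.GTP_BOUND:
            self.state = State.GDP_BOUND

    def cyclase_effect(self) -> int:
        """+1 or -1 while free alpha-GTP acts on adenylate cyclase, else 0."""
        if self.state is State.GTP_BOUND:
            return 1 if self.stimulatory else -1
        return 0

g = GProtein(stimulatory=True)
print(g.state.name, g.cyclase_effect())
g.receptor_activates()
print(g.state.name, g.cyclase_effect())
g.hydrolyze_gtp()
print(g.state.name, g.cyclase_effect())
```

Modeling stimulatory and inhibitory α-subunits with one flag reflects the text's point that the same cycle can either raise or lower cAMP, depending on which α-subunit the receptor engages.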

Adenylate cyclase is an integral membrane protein, 
with a molecular mass of approximately 120 kDa. It 
has a topology that consists of two membrane domains, 
each composed of six α-helices, and two cytoplasmic
catalytic domains, one connecting the two membrane
domains and the other at the C-terminal end of the
protein. Adenylate cyclase deactivates the Gs protein
by stimulating its GTPase activity. Increases in cAMP 
levels are downregulated by phosphodiesterases 
(PDEs), which convert cAMP to AMP. 
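The balance between synthesis by adenylate cyclase and hydrolysis by phosphodiesterases can be captured by a minimal rate equation, dc/dt = v_AC − k_PDE·c, whose steady state is c* = v_AC / k_PDE. This Python sketch integrates it numerically. It assumes constant cyclase activity and first-order PDE kinetics (a simplification, since real PDEs saturate), and all parameter values are arbitrary units chosen for illustration.

```python
def simulate_camp(v_ac, k_pde, c0=0.0, dt=0.01, t_end=50.0):
    """Forward-Euler integration of dc/dt = v_ac - k_pde * c.

    v_ac  : synthesis rate by adenylate cyclase (concentration per time)
    k_pde : first-order hydrolysis rate constant for PDEs (per time)
    Returns the cAMP concentration at t_end (arbitrary units).
    Euler steps are stable only while dt * k_pde < 2.
    """
    c = c0
    for _ in range(round(t_end / dt)):
        c += dt * (v_ac - k_pde * c)
    return c

def steady_state(v_ac, k_pde):
    """Analytical steady state, where synthesis balances hydrolysis."""
    return v_ac / k_pde

basal = simulate_camp(v_ac=1.0, k_pde=1.0)
stimulated = simulate_camp(v_ac=3.0, k_pde=1.0)     # receptor raises cyclase activity
pde_inhibited = simulate_camp(v_ac=1.0, k_pde=0.5)  # reduced PDE activity
print(f"{basal:.3f} {stimulated:.3f} {pde_inhibited:.3f}")
```

The point of the sketch is that cAMP can rise either because cyclase activity increases or because PDE activity falls; both shift the same steady-state ratio.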

cAMP elicits its effects by binding to protein kinase 
A (PKA), a tetrameric protein composed of two regu- 
latory subunits, which bind cAMP, and two catalytic 
subunits, which act as kinases phosphorylating 
serines/threonines in target proteins containing the 
consensus sequence Arg-Arg-X-Ser/Thr-X. Mammalian cells possess three isoforms of the catalytic subunit (i.e., Cα, Cβ, and Cγ) and two isoforms of the regulatory subunits (i.e., RI and RII). Differences in the regulatory subunits determine the cellular location of
the PKA; while some are cytoplasmic, others are asso- 
ciated with cellular structures and organelles owing to 
an interaction between the regulatory (e.g., RII) sub- 
unit and A-kinase anchoring proteins (AKAPs). 
Owing to a dimerization domain at the N-terminus, 
the regulatory subunits exist as a dimer. Each subunit 
also includes a hinge region, toward the N-terminus, 
and two structurally and kinetically distinct cAMP 
binding sites in the C-terminal end, which probably 
arose by a gene-duplication event. The hinge region, 
which is highly susceptible to proteolytic cleavage, 
includes a consensus phosphorylation sequence and 
acts as a pseudosubstrate site to which the catalytic 
domain binds with high affinity. This then acts as 
an autoinhibitory domain, with the type II site differ- 
ing from the type I in that it has a phosphorylatable 
serine. Although S. cerevisiae possesses three different 
catalytic subunits, it only has a single regulatory sub- 
unit and no AKAPs have been identified on sequen- 
cing the genome. The binding of cAMP to the 
regulatory subunits causes dissociation of these sub- 
units from their complex with the catalytic subunits, 
which can then phosphorylate downstream target 
proteins to alter their activity. For example, the hormone adrenaline controls glycogen metabolism via phosphorylation of phosphorylase kinase and glycogen synthase by PKA. Phosphorylase kinase activates phosphorylase, an enzyme that breaks down glycogen, releasing glucose for metabolism, while phosphorylation of glycogen synthase inhibits the synthesis of glycogen. There are hundreds of known physiological substrates of PKA, including metabolic enzymes, hormone receptors, and ion channels. PKA can also regulate gene transcription by phosphorylating target transcription factors, such as CREB in mammals and Flo8 in S. cerevisiae. CREB binds to cAMP-response elements found in the promoter regions of a number of genes, such as those encoding enzymes involved in gluconeogenesis. Flo8 controls the transcription of the cell-surface flocculin Flo11, one of a class of serine/threonine-rich, glycosylphosphatidylinositol-anchored cell wall proteins that have a role in the calcium-dependent process of cell-cell adhesion known as flocculation. Moreover, Flo11 plays a critical role in the production of pseudohyphae by, and invasive growth of, S. cerevisiae in response to nitrogen starvation, probably as the cells search for a new nutrient source. The reassociation of the PKA tetramer is driven by phosphatases that dephosphorylate the RII subunit and by the binding of MgATP to the RI subunit.

Schematic summary of the cAMP signal transduction pathway.
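Candidate PKA sites matching the consensus Arg-Arg-X-Ser/Thr-X can be located with a regular expression. This Python sketch is a naive motif scan: a match does not show that a protein is phosphorylated in vivo, and genuine substrates can deviate from the consensus. The example uses Kemptide (LRRASLG), a synthetic peptide widely used as a PKA substrate; the function name is hypothetical.

```python
import re

# Arg-Arg-X-Ser/Thr-X in one-letter code. The lookahead lets
# overlapping candidate sites all be reported.
PKA_CONSENSUS = re.compile(r"(?=(RR.[ST].))")

def candidate_pka_sites(protein: str):
    """Return (position, motif) pairs; position is the 1-based index of
    the Ser/Thr phosphoacceptor within the protein sequence."""
    seq = protein.upper()
    # The Ser/Thr is the 4th residue of the motif, i.e. 0-based start + 3.
    return [(m.start() + 4, m.group(1)) for m in PKA_CONSENSUS.finditer(seq)]

print(candidate_pka_sites("LRRASLG"))  # Kemptide -> [(5, 'RRASL')]
```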

cAMP can also interact with channels that conduct monovalent and divalent cations. The channel is composed of four or five subunits, with each subunit adopting a six-helix topology in which a pore-forming segment links helices 5 and 6. Each subunit carries a single cAMP-binding site in its C-terminal region, and binding of cAMP triggers channel opening. The degree of occupancy of the multiple cAMP-binding sites present within a channel may regulate its conductance.
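If each subunit's site is assumed to bind cAMP independently with the same dissociation constant Kd (an assumption made here for illustration; it ignores any cooperativity between subunits), the number of occupied sites per channel follows a binomial distribution. The Python sketch below computes that distribution; Kd and the concentrations are arbitrary illustrative values.

```python
from math import comb

def site_occupancy(camp, kd):
    """Fraction of sites bound, assuming simple independent binding."""
    return camp / (camp + kd)

def occupancy_distribution(camp, kd, n_subunits=4):
    """Probability that exactly k of n_subunits sites are occupied, k = 0..n."""
    p = site_occupancy(camp, kd)
    return [comb(n_subunits, k) * p**k * (1 - p) ** (n_subunits - k)
            for k in range(n_subunits + 1)]

for camp in (0.1, 1.0, 10.0):  # concentrations in units of Kd
    dist = occupancy_distribution(camp, kd=1.0)
    print(camp, [round(x, 3) for x in dist])
```

Setting `n_subunits=5` covers the five-subunit case mentioned above; if conductance depends on how many sites are filled, such a distribution shows how graded occupancy could translate into graded channel behavior.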


cAMP as a Chemoattractant 


Another role for cAMP is as a chemoattractant in cell 
chemotaxis, in which amoeboid cells, such as those of 
the amoeba Dictyostelium discoideum, move toward 
increasing concentrations of extracellular cAMP. 
Under growth conditions, these amoebae track down and phagocytose bacteria; but when starved they move toward secreted cAMP, form aggregates, and differentiate into stalk and spore cells. The attractant is detected by its binding to the serpentine GPCR cAR1, with the signal propagated through the βγ subunits rather than the α subunit of the associated G-protein, eliciting rapid and transient increases in the second messengers cAMP, cGMP (guanosine 3’,5’-monophosphate), IP3 (inositol 1,4,5-trisphosphate), and Ca²⁺. However, IP3/Ca²⁺ signaling does not appear to be required for chemotaxis. cGMP stimulates myosin fiber assembly (probably via a cGMP-dependent protein kinase that activates myosin II kinase) and reorganization of the actin cytoskeleton. These events allow the cell to throw out a pseudopod containing F-actin toward the cAMP source. Analogous systems operate in humans, such as the movement of leukocytes toward chemokine attractants.


Catabolite Repression in Bacteria 


When bacteria grow in the presence of a plentiful supply of several different carbon sources, they switch off the genes encoding enzymes that catabolize carbon substrates that are poor sources of energy. In some bacteria, cAMP is used to regulate catabolism in a process known as catabolite repression. Cellular concentrations of cAMP vary inversely with levels of cellular catabolites (i.e., cAMP levels are higher when catabolite levels are low, a situation that prevails during growth on a poor carbon source such as lactose), owing to changes in the activity of adenylate cyclase. The activity of adenylate cyclase is regulated by the IIA component of the phosphoenolpyruvate (PEP)-dependent sugar phosphotransferase system (PTS), which catalyzes the uptake of glucose. The IIA protein cycles between phosphorylated and unphosphorylated forms, passing its phosphoryl group on (via the IIB component) to glucose as the sugar is translocated across the membrane. The phosphorylated IIA protein can also interact with adenylate cyclase to increase its activity. In the presence of glucose, less phosphorylated IIA protein is available to interact with adenylate cyclase, and cAMP levels decrease. cAMP relieves catabolite repression when it binds to the CRP/CAP protein (cAMP receptor protein/catabolite gene activator protein), which interacts with RNA polymerase to activate transcription of specific operons.
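The regulatory chain described above can be summarized as a toy on/off model. This is a minimal sketch, not a kinetic model: each quantity is reduced to high or low, the function and field names are hypothetical, and the `inducer_present` input (the operon-specific inducer that operons such as lac also require) is an assumption that goes beyond this entry.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CataboliteState:
    iia_phosphorylated: bool  # is IIA mostly in its phosphorylated form?
    camp_high: bool           # is adenylate cyclase stimulated?
    crp_active: bool          # has cAMP bound CRP/CAP?
    operon_on: bool           # is a CRP-dependent operon transcribed?


def catabolite_state(glucose_present: bool, inducer_present: bool = True) -> CataboliteState:
    """Toy on/off model of cAMP-mediated catabolite repression.

    Assumption: every quantity is treated as simply high or low.
    """
    # Glucose transport drains phosphoryl groups from IIA, so IIA~P is scarce.
    iia_p = not glucose_present
    # Phosphorylated IIA stimulates adenylate cyclase, raising cAMP.
    camp_high = iia_p
    # cAMP binds CRP/CAP; the complex helps RNA polymerase activate transcription.
    crp_active = camp_high
    # Hypothetical extra input: operon-specific induction (e.g. lac).
    operon_on = crp_active and inducer_present
    return CataboliteState(iia_p, camp_high, crp_active, operon_on)


if __name__ == "__main__":
    for glucose in (True, False):
        print(f"glucose={glucose}:", catabolite_state(glucose))
```

With glucose present the chain breaks at the first step, so CRP-dependent operons stay off regardless of the inducer.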
The Campbell model was first proposed to explain the
mode of association of the genomes of bacteriophage 
lambda and its host, Escherichia coli, in lysogenic bac- 
teria (Campbell, 1962). First, the ends of the linear 
lambda genome become joined; then the circular 
phage and host genomes cointegrate by reciprocal 
crossover within a segment of homology between 
the two. Such cointegration (Figure 1) is the essence 
of the Campbell model, a term that has been applied to 
all similar cointegrations, whether they take place 
through site-specific recombination (as in lambda) or 
by general (homology-dependent) recombination. 
The reversal of the reaction leads to excision of the 
integrated element from the chromosome. 
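The cointegration step can be sketched by treating each genome as a list of marker labels. This is a minimal sketch under stated assumptions: the labels follow Figure 1 (markers abcd on the element, wxyz on the chromosome, homologous segment 123), the homologous segment is assumed to lie between b and c on the circle (which is what produces the permuted prophage order cdab), and the function names are hypothetical.

```python
from typing import List, Tuple


def integrate(chrom: List[str], circle: List[str],
              chrom_h: int, circle_h: int, k: int) -> List[str]:
    """Insert a circular element by one reciprocal crossover.

    chrom_h, circle_h: index where the homologous segment starts in each partner.
    k: number of homologous positions to the left of the crossover point.
    """
    cut_chrom = chrom_h + k
    cut_circle = circle_h + k
    # Open the circle at the crossover and read all the way round it.
    opened = circle[cut_circle:] + circle[:cut_circle]
    return chrom[:cut_chrom] + opened + chrom[cut_chrom:]


def excise(integrant: List[str], first_h: int, second_h: int,
           k: int) -> Tuple[List[str], List[str]]:
    """Reverse reaction: one crossover between the two direct repeats."""
    left, right = first_h + k, second_h + k
    chromosome = integrant[:left] + integrant[right:]
    circle = integrant[left:right]  # circular; read from the crossover point
    return chromosome, circle


if __name__ == "__main__":
    chrom = ["w", "x", "1", "2", "3", "y", "z"]
    circle = ["a", "b", "1", "2", "3", "c", "d"]  # d is joined back to a
    lysogen = integrate(chrom, circle, chrom_h=2, circle_h=2, k=1)
    print(" ".join(lysogen))  # w x 1 2 3 c d a b 1 2 3 y z
    restored, freed = excise(lysogen, first_h=2, second_h=9, k=1)
    print(" ".join(restored), "|", " ".join(freed))
```

Wherever the crossover falls within 123, the inserted element reads c d a b between two direct copies of the homologous segment, and excision at the same point restores the original chromosome.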


Integration by Site-Specific 
Recombination 


The applicability of the model to lambda integration has been rigorously tested. On entering an infected cell, the lambda genome circularizes by annealing and ligation of complementary ‘sticky ends’ (projecting 12-nucleotide single-stranded 5’ ends of the viral DNA). Integration is mediated by a phage-coded protein (integrase) that recognizes specific sequences at the crossover site. One such site is present on each partner (phage and bacterium), although rare integration events use bacterial partner sequences with reduced similarity to the primary sites. Besides integrase, excision requires a second phage-coded protein, excisionase.

For lambda and E. coli, the segment of sequence identity (123 in Figure 1) is 15 bp long. This is too short to serve as a substrate for general recombinases such as E. coli RecA. Among phages and plasmids using lambda-related integrases, the extent of sequence identity at the crossover point varies from as few as 10 bp to over 100 bp (Campbell, 1992).

From mutational and biochemical studies of the integration reaction, the inferred mechanism entails single-strand cleavage at corresponding sites on one DNA strand of each partner, followed by cross-ligation to give a crossed-strand (Holliday) structure. An intermediate in the strand transfer has a covalent DNA-protein link on the 3’ end of each of the two cleaved strands. Subsequently, the other two strands are cleaved and cross-ligated at a position displaced 7 bp from the site of initial cleavage. It is only within the 7 bp between the two sites (the overlap segment) that sequence identity between the two partners is required, probably to facilitate the strand swapping that takes place between cleavage and ligation (Nunes-Duby et al., 1995).
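The requirement for identity only within the overlap segment can be expressed as a simple check on aligned core sequences. This is a minimal sketch: the sequences and cleavage position below are hypothetical (they are not the real lambda att sites), the function names are invented for illustration, and the check ignores the integrase- and IHF-binding requirements. The overlap length is a parameter because some members of the integrase family use 6- or 8-bp overlaps.

```python
def overlap_identical(core_p: str, core_b: str, first_cut: int,
                      overlap_len: int = 7) -> bool:
    """Return True if two aligned core sequences match across the overlap.

    first_cut: index of the first cleavage site; the second cleavage is
    assumed to lie overlap_len bases further along.
    """
    if len(core_p) != len(core_b):
        raise ValueError("core sequences must be aligned and equal in length")
    if first_cut < 0 or first_cut + overlap_len > len(core_p):
        raise ValueError("overlap segment falls outside the core")
    window = slice(first_cut, first_cut + overlap_len)
    return core_p[window].upper() == core_b[window].upper()


def flank_mismatches(core_p: str, core_b: str, first_cut: int,
                     overlap_len: int = 7) -> int:
    """Count mismatches outside the overlap (tolerated in this model)."""
    inside = range(first_cut, first_cut + overlap_len)
    return sum(1 for i, (p, b) in enumerate(zip(core_p, core_b))
               if i not in inside and p.upper() != b.upper())


if __name__ == "__main__":
    phage = "ACGTACGTTTATACG"      # hypothetical 15-bp core
    host_ok = "TCGAACGTTTATACG"    # differs only outside the overlap
    host_bad = "ACGTACGATTATACG"   # differs inside the overlap
    print(overlap_identical(phage, host_ok, first_cut=4),
          flank_mismatches(phage, host_ok, first_cut=4))
    print(overlap_identical(phage, host_bad, first_cut=4))
```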

This 7-bp segment is flanked by an approximate reverse repeat apparently used in protein-DNA recognition. The extent of specific sequence needed at the crossover site is about 20 bp. In the phage partner, however, additional specific sequences are needed in the DNA flanking the crossover site (attP). A DNA-binding protein, integration host factor (IHF), is also needed for proper positioning of the DNA loops. The complex of attP, IHF, and several molecules of integrase (called an intasome) forms first; bacterial (attB) DNA is then recruited (Craig, 1998).

Integrase and excisionase are separately controlled 
during lambda development so as to give efficient 
integration in those cells destined for lysogeny and 
efficient excision within those lysogenic cells that are 
reinitiating phage production. 

Phage lambda belongs to a large group of natural 
(lambdoid) coliphages related to one another in DNA 
sequence. Lambdoid phages use various integration 
sites in their host bacteria. Their common feature is 
an approximate reverse repeat surrounding a 7-bp 
segment of identity. Some members of the integrase 
family use a 6-bp or 8-bp overlap segment, but these 
integrases are not ordinarily used in phage integration. 
Some phages and plasmids with little DNA sequence similarity to lambda also use integrases of the same gene family. A phage has also been reported that uses a member of the other major group of site-specific recombinases, the DNA invertase-resolvase family, for integration (Thorpe and Smith, 1998).

Figure 1 Generalized Campbell model. A circular extrachromosomal element (above) integrates into the bacterial chromosome (of which only a linear segment is shown) within homologous DNA (123). abcd and wxyz are genetic markers of element and chromosome, respectively. In phage lambda, the circle is formed by joining the ends of linear DNA (with order abcd), so in the inserted prophage this order is permuted to cdab.


Integration by General Recombination 


When the model was proposed, it was an attempt to provide a unified mechanism for integration by autonomous elements, most specifically phage lambda and the E. coli fertility plasmid F. F integrates mainly by general recombination, using insertion sequences (IS elements) common to both F and the chromosome as portable regions of homology. Some integration may also take place through replicative transposition mediated by the IS elements, a mechanism for cointegrate formation that transcends the Campbell model.

Both lambda and F can excise abnormally from the chromosome to include host DNA adjacent to the insertion site. Abnormal excision of lambda is rare and proceeds by unknown biochemistry that juxtaposes heterologous DNA; with F, the major mechanism is general recombination between homologous IS elements flanking the inserted F. Such abnormal elements (called specialized transducing phages and F’ plasmids, respectively) can integrate into the bacterial chromosome by general recombination, using the homology provided by the DNA they have picked up from the host.

Besides such natural processes, integration by homology has been used extensively with genetically engineered constructs in which a host gene is cloned into a phage or plasmid vector. As implied by Figure 1, the resulting integrant has two copies of the cloned segment, in direct orientation, flanking one copy of vector DNA. Where these two copies differ by mutation, excision by general recombination can generate a chromosome carrying alleles originally present in the cloned insert and a vector carrying alleles originally present in the host. Such swaps will occur for alleles at position 2 (Figure 1) if the integrating crossover occurs between 1 and 2 and excision between 2 and 3.
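The allele swap can be traced with the same list-of-markers picture used for Figure 1. In this minimal sketch the function names are hypothetical, and the labels 2h and 2v are assumptions standing for the host and cloned-insert alleles at position 2. Integration by a crossover between 1 and 2, followed by excision between 2 and 3, leaves the insert allele in the chromosome and the host allele on the excised vector.

```python
from typing import List, Tuple


def crossover_in(chrom: List[str], circle: List[str], chrom_h: int,
                 circle_h: int, k: int) -> List[str]:
    """Integrate a circle by one crossover after k homologous positions."""
    cut_c, cut_e = chrom_h + k, circle_h + k
    return chrom[:cut_c] + circle[cut_e:] + circle[:cut_e] + chrom[cut_c:]


def crossover_out(integrant: List[str], first_h: int, second_h: int,
                  k: int) -> Tuple[List[str], List[str]]:
    """Excise by one crossover between the two direct repeats."""
    left, right = first_h + k, second_h + k
    return integrant[:left] + integrant[right:], integrant[left:right]


if __name__ == "__main__":
    host = ["w", "x", "1", "2h", "3", "y", "z"]    # host allele at position 2
    vector = ["a", "b", "1", "2v", "3", "c", "d"]  # cloned-insert allele
    integrant = crossover_in(host, vector, 2, 2, k=1)     # between 1 and 2
    chrom, excised = crossover_out(integrant, 2, 9, k=2)  # between 2 and 3
    print(" ".join(chrom))    # w x 1 2v 3 y z
    print(" ".join(excised))  # 3 c d a b 1 2h  (circular)
```

Excising in the same interval used for integration (k=1 in both steps) simply restores the original host chromosome.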
Cancer susceptibility refers to the increased risk of developing cancer that is found in some individuals as compared with the general population’s risk. Individuals may have a significantly increased relative risk (e.g., 2-10 times the population risk) that is still associated with a relatively low overall risk of cancer, or they may have a very large risk associated with a strong family history (e.g., a 50-90% probability of developing cancer). In both cases, a significant genetic component contributes to the disease.


Hereditary Cancer Predisposition 


Hereditary forms of cancer may be associated primar- 
ily with a single tumor type (e.g., hereditary breast 
cancer) or with several distinct tumor types (e.g., 
Li-Fraumeni syndrome). Inherited cancer syndromes 
may also feature other associated phenotypes which 
act as markers for individuals at risk of developing 
tumors (e.g., café-au-lait spots in patients with neuro- 
fibromatosis) or the tumor may be the only manifest- 
ation of the disease. It is estimated that 5-10% of 
human cancer is due to hereditary mutations of a 
cancer-related gene. Although this represents a rela- 
tively small fraction of all cancer cases in the popula- 
tion, the relative risk for individuals within these 
families is very high. For example, the lifetime risk of 
breast cancer is 1 in 8 for women in the general popu- 
lation, while those with inherited mutations of the 
BRCA1 or BRCA2 genes have a 1.2 to 1.6 risk of 
developing tumors by the age of 70. 

Familial predisposition to cancer is inherited as an 
autosomal dominant phenotype with variable pene- 
trance depending on the specific disease and genes 
implicated. Generally, these hereditary forms of can- 
cer arise from inactivation of tumor suppressor genes. 
As described in Knudson’s “two hit” model (1971),
these genes contribute to tumorigenesis through the 
inactivation of both alleles in a single cell. Affected 
family members inherit one mutant copy of this tumor 
suppressor gene, but the remaining copy allows the cell 
to function normally. However, subsequent mutations 
that occur somatically and inactivate the remaining 
copy of the gene in a relevant cell type may lead to 
transformation and tumorigenesis. The probability of 
this second mutation occurring is high due to the large 
population of predisposed target cells which harbor 
the “1st hit” (i.e., all cells) and, thus, these individuals 
have a high risk of developing specific tumor types. 
Because of the large population of predisposed cells, 
and since only one mutation may be required for 
transformation of any of these cells, individuals with 
inherited cancer frequently have early-onset tumors 
and often have multiple primary tumors. 
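The argument from the large number of predisposed target cells can be made quantitative with a deliberately simple version of the two-hit model. This sketch rests on illustrative assumptions that do not come from the text: each allele in each target cell is inactivated independently with lifetime probability u, any cell with both alleles inactivated gives rise to a tumor, and cells behave independently. Cell division, loss-of-heterozygosity mechanisms, and timing are ignored, and the parameter values and function names are hypothetical.

```python
import math


def risk_at_least_one(per_cell: float, n_cells: int) -> float:
    """P(at least one of n_cells is fully inactivated) = 1 - (1 - p)^n."""
    # log1p/expm1 keep the result accurate when per_cell is tiny.
    return -math.expm1(n_cells * math.log1p(-per_cell))


def two_hit_risk(u: float, n_cells: int, carrier: bool) -> tuple[float, float]:
    """Return (risk of at least one tumor, expected number of tumors).

    carrier=True: every cell already carries the inherited first hit, so a
    single somatic hit (probability u) suffices. carrier=False: both alleles
    must be hit in the same cell (probability about u * u).
    """
    per_cell = u if carrier else u * u
    return risk_at_least_one(per_cell, n_cells), per_cell * n_cells


if __name__ == "__main__":
    u, cells = 1e-6, 10_000_000  # hypothetical values
    for carrier in (True, False):
        risk, expected = two_hit_risk(u, cells, carrier)
        print(f"carrier={carrier}: risk={risk:.5f}, expected tumors={expected:.3g}")
```

With these illustrative numbers the carrier is almost certain to develop at least one tumor and is expected to develop several independent ones (multiple primaries), whereas the non-carrier’s risk is orders of magnitude lower.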

The cancer syndrome multiple endocrine neoplasia 
type 2 (MEN 2) is the single known exception to the 
rule that tumor suppressor genes are responsible for 
hereditary cancers. In this instance, activating muta- 
tions of the RET proto-oncogene are the predisposing 
mutations. These contribute to tumorigenesis even in 
the presence of a normal, functional RET allele. It is 
likely that other oncogenes associated with inherited cancer susceptibility will be identified in the future as the complex protein interactions involved in controlling cell proliferation are elucidated.

The first gene to be identified as responsible for an inherited predisposition to cancer was the retinoblastoma tumor suppressor gene RB1. Mutation and/or loss of this gene results in the hereditary childhood tumor of the retina, retinoblastoma, and is also implicated in the bone tumor osteosarcoma and in lung carcinoma. The RB1 product is a phosphoprotein that plays multiple roles in regulating gene transcription and cell proliferation. Inactivation of RB1 deregulates the cell’s entry into the cell cycle, allowing cells to continue dividing inappropriately, which can result in tumor formation. Since the discovery of RB1, a large number of genes that deregulate normal cell growth have been implicated as potential “cancer genes.” The genes associated with hereditary cancers vary in their nature; however, the majority have been shown to affect major cellular pathways that are required for cell growth and proliferation or for cell death. These include transcription regulators (e.g., VHL, p53), receptors (e.g., RET), and many proteins with as yet unclear functions (e.g., MEN1).


Low-Penetrance Cancer-Susceptibility 
Alleles 


Although Mendelian inheritance of mutations in cancer-susceptibility genes is associated with a subset of human cancer, there are also mutations and variants in the genome that do not confer this type of strong predisposition yet still increase cancer risk, either generally or specifically for one tumor type. These variants may identify additional risk factors associated with known cancer-related genes, or may identify genes that have less obvious or direct roles in tumorigenesis and yet also contribute to the cancer phenotype. Recent studies show that low-penetrance mutant alleles of some tumor suppressor genes, as well as the better known high-penetrance mutations, can contribute to cancer incidence. These low-penetrance alleles confer a significantly increased risk of sporadic tumors. For example, the I1307K allele of the adenomatous polyposis coli (APC) gene confers a twofold risk of sporadic colon carcinoma, in addition to the gene’s hereditary role in familial adenomatous polyposis. These low-penetrance alleles are inherited in the same way as the higher-penetrance mutations; however, only a proportion of those who inherit them will develop tumors. Thus, the typical autosomal dominant pattern of cancer predisposition seen with highly penetrant alleles is not obvious, and the few tumor cases observed in a family can be mistakenly interpreted as sporadic cases. Mutations of these genes may also contribute to tumors in a dosage-dependent fashion, such that the degree of cancer susceptibility depends on the number of functional copies of the gene. For example, carriers of mutations of the ataxia-telangiectasia mutated (ATM) gene have a 3.5- to 5-fold relative risk of developing breast cancer as compared with the general population.
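Why a low-penetrance allele rarely produces a recognizable dominant pedigree can be illustrated with the binomial distribution. In this sketch the penetrance values, the number of carriers, and the function name are hypothetical, and carriers are assumed to develop tumors independently of one another.

```python
from math import comb


def prob_at_least_affected(carriers: int, penetrance: float, min_affected: int) -> float:
    """P(at least min_affected of `carriers` relatives develop a tumor)."""
    return sum(comb(carriers, k) * penetrance**k * (1 - penetrance)**(carriers - k)
               for k in range(min_affected, carriers + 1))


if __name__ == "__main__":
    for penetrance in (0.9, 0.1):  # hypothetical high- vs low-penetrance allele
        p = prob_at_least_affected(carriers=4, penetrance=penetrance, min_affected=2)
        print(f"penetrance {penetrance}: P(>=2 of 4 carriers affected) = {p:.3f}")
```

With four carriers in a family, the highly penetrant allele almost always yields two or more affected relatives (about 0.996), whereas the low-penetrance allele does so only about 5% of the time, so most such families would look like isolated sporadic cases.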

A number of cancer-susceptibility genes confer 
increased relative risk of specific tumor types when 
coupled with environmental factors. For example, 
women with variant alleles of CYP1A1 or CYP2E1 
who are also cigarette smokers have a significantly 
increased risk of breast cancer, probably owing to the decreased ability of the enzymes encoded by these alleles to metabolize carcinogens. More significantly, individuals with
mutations of proteins required for DNA repair, such 
as those involved in xeroderma pigmentosum, are 
unable to repair DNA damage caused by environmen- 
tal factors such as UV light or other mutagens and 
have an accumulation of DNA damage which contri- 
butes to a high incidence of skin tumors. 


Summary 


It seems likely that an individual’s risk of developing cancer depends on a few major genetic effects and multiple low-penetrance alleles, potentially in combination with environmental risks. These effects do not act in isolation; each forms part of the individual’s cumulative risk. Thus, inherited cancer susceptibility may be much more common than the 5-10% estimate associated with familial forms of cancer suggests, and may account for a significant subset of what we currently think of as sporadic tumor cases.
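One common way to combine several modest effects into a cumulative risk is to assume that relative risks multiply. The multiplicative model, the baseline risk, the relative-risk values, and the function names below are all assumptions for illustration, not estimates from this entry; converting a relative risk to an absolute risk by simple multiplication is also only a rough approximation.

```python
import math
from typing import Iterable


def combined_relative_risk(relative_risks: Iterable[float]) -> float:
    """Multiplicative (log-additive) model: independent factors multiply."""
    return math.prod(relative_risks)


def approx_absolute_risk(baseline: float, relative_risks: Iterable[float]) -> float:
    """Rough lifetime risk: baseline x combined relative risk, capped at 1."""
    return min(1.0, baseline * combined_relative_risk(relative_risks))


if __name__ == "__main__":
    baseline = 0.05                 # hypothetical population lifetime risk
    factors = [1.3, 1.2, 1.5, 2.0]  # hypothetical low-penetrance alleles + exposure
    print(combined_relative_risk(factors), approx_absolute_risk(baseline, factors))
```

Individually modest factors can thus compound to a several-fold increase in risk, which is the sense in which each effect forms part of a cumulative risk.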


See also: Breast Cancer; Carcinogens; Oncogenes; 
Proto-Oncogene; Tumor Suppressor Genes 
A candidate gene is a gene that is believed to harbor
alleles causing a Mendelian disorder, or contributing 
to a complex phenotype, based on an a priori under- 
standing of that gene’s biochemical function or mutant 
phenotypes associated with that gene. 


Contexts in which Candidate Genes are 
Important 


Traditional, single-generation meiosis-based linkage mapping generally maps a genetic locus to a physical region of the genome containing a number of open reading frames. When multiple loci and the environment contribute to a phenotype, or the number of informative meioses that can be obtained is small, the region to which a gene has been mapped can span several centimorgans. In organisms where this entire region is sequenced and annotated (or in closely related organisms where synteny can be effectively employed), it is often possible to narrow the search for a causative or contributing locus to a subset of the annotated genes in the mapped region. In the case of Mendelian disorders, candidate genes in a mapped interval are closely examined for frameshift, stop, missense, or splicing mutations that segregate with the disease. In the case of complex diseases, the candidate genes are often examined more closely in a population- or family-based association study.
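Narrowing a mapped interval to candidate genes amounts to intersecting the annotated genes in the interval with a list chosen on biological grounds. The sketch below is hypothetical throughout: the gene names, coordinates, prior-knowledge set, and function names are invented placeholders, and real annotation would come from a genome database.

```python
from dataclasses import dataclass
from typing import Iterable, List, Set


@dataclass(frozen=True)
class Gene:
    name: str
    chrom: str
    start: int  # 1-based, inclusive
    end: int


def genes_in_interval(genes: Iterable[Gene], chrom: str, start: int, end: int) -> List[Gene]:
    """Annotated genes that overlap the mapped interval."""
    return [g for g in genes if g.chrom == chrom and g.start <= end and g.end >= start]


def candidates(genes: Iterable[Gene], chrom: str, start: int, end: int,
               prior: Set[str]) -> List[str]:
    """Genes in the interval that are also supported by prior biology."""
    return [g.name for g in genes_in_interval(genes, chrom, start, end) if g.name in prior]


if __name__ == "__main__":
    annotation = [
        Gene("GENE_A", "chr5", 1_000_000, 1_020_000),
        Gene("GENE_B", "chr5", 2_500_000, 2_560_000),
        Gene("GENE_C", "chr5", 4_100_000, 4_150_000),
        Gene("GENE_D", "chr7", 2_500_000, 2_600_000),
    ]
    prior_knowledge = {"GENE_B", "GENE_D"}  # e.g. same pathway as a known disease gene
    print(candidates(annotation, "chr5", 2_000_000, 4_200_000, prior_knowledge))
```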

It has recently become feasible to move directly 
to an association-based candidate gene approach with- 
out any prior linkage mapping information. In this 
approach, the researcher will test for the possible 
role of a number of candidate genes in contributing 
to a disease regardless of their map position. This 
can be a particularly effective strategy when there is 
a very strong set of candidate genes and/or the 
nature of the disorder prevents an effective linkage 
study. 
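A population-based association test at a single candidate gene often compares allele counts between cases and controls. This sketch computes the standard Pearson chi-square statistic for a 2 × 2 table of allele counts (1 degree of freedom, no continuity correction); the counts and function names are hypothetical.

```python
import math


def allelic_chi_square(case_a: int, case_b: int, ctrl_a: int, ctrl_b: int) -> float:
    """Pearson chi-square for a 2x2 table of allele counts (no Yates correction)."""
    table = [[case_a, case_b], [ctrl_a, ctrl_b]]
    n = case_a + case_b + ctrl_a + ctrl_b
    rows = [sum(r) for r in table]
    cols = [table[0][j] + table[1][j] for j in range(2)]
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = rows[i] * cols[j] / n
            chi2 += (table[i][j] - expected) ** 2 / expected
    return chi2


def p_value_1df(chi2: float) -> float:
    """Upper-tail p-value for chi-square with 1 df: erfc(sqrt(chi2 / 2))."""
    return math.erfc(math.sqrt(chi2 / 2))


if __name__ == "__main__":
    # Hypothetical counts of allele A / allele a in cases and in controls
    stat = allelic_chi_square(60, 40, 40, 60)
    print(stat, p_value_1df(stat))
```

For these counts the statistic is 8.0 (p ≈ 0.005).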


Identification of Candidate Genes 


Candidate genes are chosen based on a biological understanding of the role of the wild-type product of that gene. A gene is a good candidate for a complex disease if mutations of large effect in that gene give rise to phenotypes similar to, but more dramatic than, the complex disease under consideration. For example, if mutations in a gene are known to give rise to a particularly severe or early-onset form of a disease, this gene is a strong candidate for harboring alleles of more subtle effect.

A second approach for identifying candidate genes 
is to choose genes from a biochemical pathway known 
to be involved in disease etiology and/or that contain 
members in which mutations exist that affect the 
phenotype of interest. In instances where these path- 
ways are poorly understood, methods such as gene 
expression profiling using DNA microarrays and 
yeast two-hybrid screens can generate candidate 
genes that are coregulated with, or interact with, a 
known member of the pathway. 


See also: Genetic Diseases 
Selective breeding over many generations has produced more than 300 distinct breeds of domestic dog worldwide, each defined by specific physical and behavioral characteristics. As a result, modern breeds of dogs display a range of morphological and behavioral variation that is unique among mammals and results almost exclusively from genetic causes. Despite this extraordinary phenotypic variation, however, all breeds of dog belong to a single species (Canis familiaris), and crosses between most breeds produce fertile offspring.

Purebred dogs, by definition, are segregated into small inbreeding subpopulations that are subject to intense selection to meet specific and rigid criteria for performance and physique, as defined by breed standards. The combination of founder effects, genetic drift in small populations, inbreeding, and positive selection means that dogs of any one breed, even those from apparently unrelated families, will be genetically homogeneous at some places in the genome. The combination of interbreed variation and intrabreed homogeneity offers geneticists a rare opportunity to delve into the genetics of mammalian morphology, behavior, and disease.


Evolution of Dogs 


The domestic dog is a single species in the family Canidae, order Carnivora, and superfamily Canoidea, which also includes seals, bears, and weasel- and raccoon-like carnivores. Although all dogs appear to have been derived from the gray wolf, origination or interbreeding events may have occurred several times over human history. Theories of dog origins range from those maintaining that dogs originated once from a limited founding pool to those suggesting multiple origins, possibly from more than one species, over the course of human history.

The number, timing, and location of the domestication events that led to the modern dog are somewhat controversial. The archeological record suggests that the first domestic dogs were found in the Middle East about 14 000 years ago. However, very old fossil remains found in North America and Europe suggest that dogs are closest phenotypically to Chinese wolves. The phenotypic plasticity of dogs is a problem when attempting reconstructions of their origin. Some dogs closely resemble wild wolves in phenotype; others less so. Consequently, the first appearance of domestic dogs in the fossil record, as marked by their phenotypic divergence from wolves, may be misleading. Rather than the first domestication event, the appearance of the first differentiated dogs in the fossil record may instead record a change in artificial selection associated with a cultural change in human societies.

An independent assessment of dog domestication is provided by mitochondrial control region sequence data (Figure 1). Recall that mitochondrial DNA is non-nuclear DNA that is inherited only from the mother. Tracking the phylogenetic relatedness of mitochondrial sequences between populations is therefore a useful way to unravel evolutionary relationships. Phylogenetic analysis of mitochondrial control region sequences reveals four divergent sequence clades, suggesting four primary lines of canine lineage. The most diverse of these clades contains sequences that differ by at most 1% in DNA sequence (Figure 1, clade I). Because wolves and coyotes diverged about 1 million years ago and have mitochondrial control region sequences that are 7.5% different, and assuming a similar rate of sequence divergence in the dog lineage, dogs and gray wolves may have diverged 1/7.5 as long ago (1% versus 7.5% divergence), or about 133 000 years before present, implying an ancient origin of domestic dogs from wolves. There is strong evidence that wolves and humans lived in the same habitats for as much as 500 000 years. Thus, the domestication of dogs may have long preceded the more recent physical changes associated with the shift of human societies from hunter-gatherer cultures to more agrarian societies about 12 000 years ago.
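The divergence-time estimate above is a simple molecular-clock calculation: calibrate a rate from the wolf-coyote split, then divide the observed dog divergence by that rate. This sketch assumes, as the argument implicitly does, that control-region sequences diverge at a constant rate in both lineages; the function name is hypothetical.

```python
def divergence_time(observed: float, calib_divergence: float, calib_years: float) -> float:
    """Molecular-clock estimate: observed / (calibration divergence per year)."""
    if calib_divergence <= 0 or calib_years <= 0:
        raise ValueError("calibration values must be positive")
    rate_per_year = calib_divergence / calib_years
    return observed / rate_per_year


if __name__ == "__main__":
    # Wolf-coyote: 7.5% divergence over ~1 million years; dog clade I: up to 1%.
    years = divergence_time(observed=0.01, calib_divergence=0.075, calib_years=1_000_000)
    print(f"{years:,.0f} years")  # 133,333 years
```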


Genotypic and Phenotypic Diversity and 
Ancient Dog Breeds 


Although different breeds of dog have strikingly different appearances, it is difficult and usually impossible to distinguish different breeds of dog by their DNA sequence alone. These results are expected because most breeds of dog have been developed only very recently, and apparently were derived from a gene pool that was both diverse and well mixed. Therefore, the relationship of sequences among breeds reflects the complexity and divergence in the ancestral common gene pool of dogs rather than specific ancestor-descendant relationships of recently diverged breeds.

Interestingly, some more ancient breeds, such as the dingo, the New Guinea singing dog, greyhounds, and mastiffs, were developed when human populations were more isolated. It is therefore postulated that some of these breeds may even have been independently domesticated from wolves. This notion is supported, for instance, by data from the Norwegian breeds suggesting an ancient and perhaps independent origin from wolves (Figure 1, clade II). To determine whether other ancient breeds with a long history of isolation were independently derived from wolves, a survey was recently done of the Mexican hairless, or Xolo, dogs (Vila et al., 1999), developed by the Aztecs over 1000 years ago. A survey of 26 Xolos showed that they contained sequences identical to those found in other, more common breeds. Moreover, representatives of all four sequence clades were found in the Xolo (Figure 1), indicating that the population of dogs that migrated with humans into the New World was large and diverse and had a recent common ancestry with dogs of the Old World. None of the Xolo sequences were similar to those of New World wolves, suggesting that Xolos were not independently domesticated from them.


Wolf-Dog Hybridization 


Even today, wolves continue to influence the genetic diversity of dogs. In the US alone, there are currently over 100 000 wolf-dog hybrids. These dogs are popular among some individuals because of their appearance and their attributes as protectors. Wolf-dog hybrids are frequently interbred with purebred dogs, particularly Nordic breeds and German shepherds. As such interbreeding occurs, it is generally assumed that the resulting progeny will have a lower proportion of wolf genes, and thus be more docile. Unfortunately, so little is known about the genetics of behavior in mammals that, although this is likely true at the population level (dilution of wolf-dog hybrid genes by mating with domestic dogs will likely produce more docile dogs), it is difficult to predict what the progeny of any single wolf-dog hybrid cross will be like.

As a result of such crosses, wolf genes continue to 
diffuse into the general dog population. It is interest- 
ing to note, however, that dog genes may also be 
influencing the genetic composition of wild wolves. 
In Italy and Spain, for instance, gray wolves occa- 
sionally will interact and even interbreed with semi- 
feral populations of domestic dogs. Such matings 
can threaten the genetic integrity of wild wolf popu- 
lations and are a major concern of conservation 
geneticists. 


Developmental and Genetic Diversity 


The origin of phenotypic diversity in domestic dogs is intriguing. Dogs are clearly the most diverse domestic species. The range in size and conformation is exemplified by very small breeds, such as the chihuahua and Pomeranian, which weigh only a pound or two, and the Newfoundland and St. Bernard, which can weigh close to 200 pounds. This hundredfold (two orders of magnitude) difference in size has no parallel in other domesticated animals.

Figure 1 Neighbor-joining relationship tree of wolf (W) and dog (D) control region sequences (16). Dog haplotypes are grouped in four clades, I to IV. Boxes indicate haplotypes found in the 19 Xolos (25). Haplotypes found in two Chinese crested dogs are indicated with a black circle. Bold characters indicate haplotypes found in New World wolves (W20 to W25). (Reproduced with permission from “Origin, genetic diversity and genome structure of the domestic dog,” Wayne RK and Ostrander EA. BioEssays. Copyright © 1999, Wiley-Liss, Inc., a subsidiary of

Past theories have hypothesized that the basis of phenotypic diversity in dogs is related to the profound developmental alterations that occur from neonate to adult. Neonatal dogs have an extremely broad and foreshortened cranium, whereas many adult dogs have a long, extended head. Developmental alterations that truncate, accelerate, or retard aspects of this ontogenetic transformation create dramatically divergent skull morphologies that can readily be selected by breeders. Consider, for instance, the very different head shapes of collies, with long sloping foreheads, and pugs, with round heads and a ‘pushed in’ face. Puppy-like features in adult animals are often cultivated by humans, and this is particularly true in the selection of dogs for breeding. It is interesting to note that, in contrast, neonatal and adult domestic cats differ very little in phenotype, and thus changes in growth rate or timing will not cause such a dramatic change in conformation.

Breed diversity in other domestic mammals also reflects ontogenetic diversity. This implies that the difference in diversity between dogs and other domestic animals reflects the degree to which neonates and adults differ in conformation: the action of only a few developmental genes on growth would cause more dramatic change in dogs than in other domestic animals. However, this conclusion was based on the assumption that dogs and other domestic species had similar initial levels of genetic variation. The finding that dogs have a diverse and ancient origin implies that genetic variation may be an important prerequisite for phenotypic variation in dogs and other domestic species.


Genome Organization and Cytogenetics 


Among the many methods for studying genetic variation are those based on cytogenetic analyses, comparative studies of gene families, and molecular genetic analyses. The canine genome comprises many (2n = 78) small acrocentric chromosomes, which makes cytogenetic analysis difficult. Using high-resolution banding of metaphase chromosomes prepared from dog fibroblasts, a 460-band ideogram of the dog genome has been described. Standards for chromosome identification by G-banding have been established for only the largest 22 canine autosomes by the Committee for the Standardized Karyotype of the Dog. The remaining chromosomes are expected to be identified in the very near future using one of several approaches. First, efforts are under way to isolate cosmids that can be used in fluorescence in situ hybridization (FISH) analysis to ‘tag’ each chromosome. The presence of characterized polymorphisms within the cosmids means that the location of each cosmid can be easily integrated into the evolving genetic map. In addition, cytogenetic techniques are continually developing, and it is expected that new techniques will lead to the development of higher-resolution chromosome maps.


Linkage Analyses and Genetic Maps 


While chromosome gene maps are necessary for determining the evolutionary relationship between genomes and the syntenic relationships between mammals, a genetic map is needed for identifying loci that contribute to traits of interest. A genetic map is one in which the distance between markers is measured as a function of genetic recombination. A marker is a short segment of DNA that varies between homologous chromosomes in the population. Because any given individual has two copies of each autosome, each individual must have, by definition, two alleles for every autosomal marker. If identical alleles are inherited from each parent, the individual is homozygous for that marker. Markers are considered informative if there are sufficient alleles in the population that most matings allow the inheritance of chromosomes (or regions of chromosomes) to be tracked from grandparent to parent to offspring. If the frequency of the most common allele in the population is less than 95%, the marker is referred to as polymorphic.
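The 95% rule for calling a marker polymorphic can be checked directly from allele counts. This sketch also computes expected heterozygosity (1 minus the sum of squared allele frequencies, i.e., the chance that a random individual is heterozygous under Hardy-Weinberg proportions), a standard measure of marker informativeness that is not mentioned in the text. The genotype data and function names are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def allele_frequencies(genotypes: Iterable[Tuple[str, str]]) -> Dict[str, float]:
    """Allele frequencies from diploid genotypes (two alleles per individual)."""
    counts = Counter(allele for pair in genotypes for allele in pair)
    total = sum(counts.values())
    return {allele: n / total for allele, n in counts.items()}


def is_polymorphic(freqs: Dict[str, float], threshold: float = 0.95) -> bool:
    """Polymorphic if the most common allele has frequency below the threshold."""
    return max(freqs.values()) < threshold


def expected_heterozygosity(freqs: Dict[str, float]) -> float:
    """Chance a random individual is heterozygous (Hardy-Weinberg proportions)."""
    return 1.0 - sum(p * p for p in freqs.values())


if __name__ == "__main__":
    sample = [("A", "A")] * 8 + [("A", "B")] * 2  # hypothetical: A = 0.9, B = 0.1
    freqs = allele_frequencies(sample)
    print(freqs, is_polymorphic(freqs), round(expected_heterozygosity(freqs), 3))
```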

If a marker and a gene are physically located close
together on the same chromosome, the alleles they carry
on that chromosome will be coinherited in a significant
number of offspring, and the two loci are thus linked. If two
markers are located far apart on the same chromosome 
or on different chromosomes their alleles will be 
inherited independently or randomly in offspring 
and are unlinked. For a given region of the genome 
the probability of a genetic recombination event 
occurring between a pair of markers or a marker and 
a disease gene is approximately proportional to the distance between
them. This probability is expressed as a recombination
fraction or in units called centiMorgans (cM).
One percent recombination is equal to 1 cM, which
roughly corresponds to a million base pairs in the
human genome.
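Converting observed recombinants to map distance is simple arithmetic under the 1% = 1 cM rule. A minimal sketch, assuming hypothetical recombinant counts; the Haldane mapping function included for comparison is a standard correction for undetected multiple crossovers that the text does not discuss, and the 1 Mb per cM factor is only the rough human average quoted above.

```python
import math


def recombination_fraction(recombinants: int, meioses: int) -> float:
    """Fraction of scored (informative) meioses in which recombination separated the two loci."""
    if meioses <= 0 or not 0 <= recombinants <= meioses:
        raise ValueError("require meioses > 0 and 0 <= recombinants <= meioses")
    return recombinants / meioses


def centimorgans(theta: float) -> float:
    """The text's rule of thumb: 1% recombination = 1 cM (adequate only for small theta)."""
    return 100.0 * theta


def haldane_centimorgans(theta: float) -> float:
    """Haldane mapping function (an addition, not from the text): d = -50 * ln(1 - 2*theta) cM."""
    if not 0.0 <= theta < 0.5:
        raise ValueError("theta must be in [0, 0.5)")
    return -50.0 * math.log(1.0 - 2.0 * theta)


# Hypothetical counts: 4 recombinants among 80 informative meioses.
theta = recombination_fraction(4, 80)
cm = centimorgans(theta)
approx_human_mb = cm * 1.0  # ~1 Mb per cM, the rough human average quoted in the text
print(theta, cm, round(haldane_centimorgans(theta), 2), approx_human_mb)
```

At a 5% recombination fraction the two distance estimates differ by only about a quarter of a centiMorgan; the gap widens as the fraction approaches 0.5, which is why the simple proportional rule is best reserved for closely spaced loci.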

To map the gene for a trait of interest, a genomic
screen of DNA from families with the trait of
interest is undertaken, using markers spaced about
every 5-10 cM. Figure 2 shows a schematic of a
two-generation pedigree and a denaturing sequencing
gel resulting from analysis of a single marker. The
black bars represent alleles separated on the gel and
demonstrate Mendelian inheritance of the alleles. One
allele from the father, which is circled, appears in all
affected individuals. In addition, no unaffected
individuals inherit this allele. Thus, it can be
hypothesized that the marker indicated is close to the
disease gene. Additional markers and many more
families would need to be analyzed to determine
whether the proposed linkage is real and to estimate
the distance between the marker and the disease gene.
The Lod score is the base-10 logarithm of the odds in
favor of linkage, so a Lod score greater than 3.0
(odds of 1000:1 that a given marker is linked to a
trait of interest) is generally accepted as evidence
of linkage. A Lod score of less than -2.0 indicates
that a given marker and trait of interest are unlinked.

Figure 2 (A) Segregation of alleles in a single pedigree.
Females are represented as circles, males as squares.
Affected individuals are colored in black, unaffected in
white. (B) The marker analyzed here is hypothesized to
be linked to the disease gene in question, since all
affected individuals have inherited this allele from their
affected parent or grandparent.

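The Lod score thresholds can be made concrete for the simplest case. The sketch below assumes phase-known, fully informative meioses in which every offspring can be scored as recombinant or nonrecombinant; real pedigrees usually require likelihood calculations over unknown phases and genotypes, which dedicated linkage software performs. The function names and example counts are hypothetical.

```python
import math


def lod_score(recombinants: int, nonrecombinants: int, theta: float) -> float:
    """Two-point Lod score, log10[L(theta) / L(0.5)], for phase-known, fully informative meioses."""
    if not 0.0 <= theta < 0.5:
        raise ValueError("theta must lie in [0, 0.5)")
    if recombinants > 0 and theta == 0.0:
        return -math.inf  # any observed recombinant is impossible under theta = 0
    n = recombinants + nonrecombinants
    log_l_linked = nonrecombinants * math.log10(1.0 - theta)
    if recombinants:
        log_l_linked += recombinants * math.log10(theta)
    log_l_unlinked = n * math.log10(0.5)
    return log_l_linked - log_l_unlinked


def best_lod(recombinants: int, nonrecombinants: int, steps: int = 500) -> tuple[float, float]:
    """Maximize the Lod score over the grid theta = 0, 0.001, ..., 0.499 (for steps=500)."""
    grid = [i / (2 * steps) for i in range(steps)]
    theta_hat = max(grid, key=lambda t: lod_score(recombinants, nonrecombinants, t))
    return theta_hat, lod_score(recombinants, nonrecombinants, theta_hat)


# Ten informative meioses, no recombinants: Lod = 10 * log10(2), just above 3.
print(round(lod_score(0, 10, 0.0), 2))
# Hypothetical data: 2 recombinants among 20 meioses.
theta_hat, z = best_lod(2, 18)
print(theta_hat, round(z, 2), f"odds about {10 ** z:.0f}:1")
```

Because each fully informative meiosis without recombination adds log10 2 (about 0.3) to the score, at least ten such meioses are needed to pass the threshold of 3.0, which is one reason large families are attractive for mapping.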
Currently most screens utilize genetic maps
composed of microsatellite markers. Microsatellites are
small repetitive stretches of polymorphic DNA that
can be tracked using the polymerase chain reaction
(PCR). They are optimal for the construction of
genetic maps for several reasons. First, they are
frequent and distributed randomly; there are several
thousand of the common repeat arrays (e.g., (CA)n,
(GATA)n, or (CAG)n) scattered throughout the
canine genome. Hence collection of large numbers of
markers for map building is a relatively
straightforward exercise. Second, the rate at which mutations
generate new variation/length alleles is nontrivial — 
about 107° for (CA) repeats and about 107° for 
microsatellites based upon tetranucleotide repeats. 
This means that they are highly informative in map- 
ping studies in relatively inbred families. Neverthe- 
less, they are sufficiently stable that the inheritance 
of adjacent sections of chromosomes can be tracked 
through several generations of a family with reliability. 
Linkage analyses of large numbers of microsatellite
markers on outbred reference families, comprising
many distinct dog breeds, have led to the production
of a preliminary canine genetic map. A high-density
map appears well on its way to completion, with
well-spaced, highly informative markers spanning several
chromosomes. The map likely covers more than
85% of the canine genome, although exact coverage
is difficult to determine since the precise size of
the canine genome is not known. The best estimates
suggest that it is about 26.5 ± 1.1 Morgans (95%
confidence interval = 24.3 M to 28.7 M). As the density
and coverage of the map increase, the ability to
identify loci through linkage analyses of families with
traits of interest will increase proportionately. Thus
far, several hundred canine microsatellites have been 
described and placed on the canine map (Mellersh 
et al., 1997), with several hundred more currently in 
progress. While there often appears to be a unique 
distribution of alleles within particular breeds, it has 
not yet been possible to define markers which are 
breed specific. This is not surprising given the discus- 
sion above about the significant genetic variation that 
contributed to the canine gene pool. 
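Two numbers in this section can be cross-checked with a few lines of arithmetic. Treating the quoted ±1.1 Morgans as a standard error (our assumption) and using a normal approximation reproduces the stated 95% confidence interval; the same genome length also gives a rough count of markers needed for the 5-10 cM screens described earlier. The constant and function names are hypothetical, and the marker count ignores chromosome ends and uneven spacing, so it is an order-of-magnitude sketch only.

```python
import math

GENOME_MORGANS = 26.5  # best estimate quoted in the text
PLUS_MINUS = 1.1       # the "±" value quoted in the text, treated here as a standard error


def ci95(estimate: float, se: float, z: float = 1.96) -> tuple[float, float]:
    """Normal-approximation 95% confidence interval."""
    return estimate - z * se, estimate + z * se


def markers_for_spacing(genome_cm: float, spacing_cm: float) -> int:
    """Rough count of evenly spaced markers; ignores chromosome ends and uneven spacing."""
    return math.ceil(genome_cm / spacing_cm)


low, high = ci95(GENOME_MORGANS, PLUS_MINUS)
print(f"95% CI: {low:.1f} M to {high:.1f} M")

genome_cm = GENOME_MORGANS * 100  # 1 Morgan = 100 cM
for spacing in (10, 5):
    print(f"{spacing} cM spacing -> about {markers_for_spacing(genome_cm, spacing)} markers")
```

On these assumptions, roughly 265 to 530 evenly spaced markers would be needed for a 10 cM or 5 cM screen, respectively; in practice more are required because each chromosome also needs markers near its ends.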


Mapping Genetic Disease in Dogs 


The abundance of genetic disease in modern purebred 
dogs, coupled with the evolving canine genetic map, 
presents a rare opportunity to better understand the 
genetic basis for disease in all mammals. In recent 
years, tremendous progress has been made in the map- 
ping of human disease genes, including those for cystic 
fibrosis, Huntington disease, and colon, breast, and 
prostate cancer susceptibility genes. In general, iden- 
tification of disease gene loci by linkage mapping in 
human families is a slow and laborious process, ham- 
pered largely by the pedigree structure of human 
families and human populations in general. In large 
part, these problems could be remedied by using the 
dog as a surrogate. For instance, the problem of locus 
heterogeneity which often confounds human linkage 
studies may be avoided in dogs, because breeding
practices usually ensure that only a small number of genes,
or even a single gene, underlies a given disease in a
specific breed. This presents a unique opportunity for
simplifying the study of human diseases for which 
there are likely to be several underlying genes, such 
as epilepsy, cancer, deafness, blindness, and motor 
neuron disease. The problem of mapping human dis- 
ease genes is further compounded by the fact that, in 
some cases, different genes responsible for very similar
phenotypes produce only slight variations in the
presentation of a disease. For instance, when comparing different types
of retinitis pigmentosa, it is possible to relate variation 
in disease presentation to the underlying genetic cause. 
But for many other diseases, such as cancer, such 
subtleties are not obvious. Thus, even in a collection
of families where there is strong evidence that genetic
predisposition underlies the disease, it may be difficult 
to localize any single underlying disease gene. 

Dog families offer the additional advantage that 
they are often much larger than human families; a 
given set of parents may produce dozens of offspring 
in their lifetime. Canine families may also be more 
informative for mapping as related individuals can 
be easily crossed to produce the most informative 
families for genetic mapping. Therefore levels of stat- 
istical power can be high, and once the canine linkage 
map reaches sufficient density and coverage, it may be 
quicker to map mammalian disease genes in dogs than 
in humans. 


Canine Diseases of Interest 


Thus far, several canine diseases appear to be due to the
same underlying genetic causes as phenotypically
similar human diseases. For instance, von Willebrand 
disease is a group of inherited bleeding disorders in 
mammals, including dogs, all of which are caused by a 
deficiency of the multimeric plasma glycoprotein, von 
Willebrand factor. Hematologic disorders such
as hemophilia A and B also share a similar
genetic basis in dogs and humans, as do mucopolysaccharidosis
type VII (MPS VII), X-linked severe combined
immunodeficiency, and a host of others.

One arena where there is great promise that canine 
studies will unravel the underlying genetics of similar 
human disorders is the study of hereditary blindness. 
Progressive retinal atrophy (PRA) is the name given
to a group of heterogeneous diseases in dogs that
are the counterpart of retinitis pigmentosa in humans.
The gene for an early-onset form of PRA in the Irish
setter, classified as rod-cone dysplasia type 1, has
recently been identified as the gene encoding the β-subunit
of cyclic guanosine monophosphate (cGMP) phosphodiesterase,
a protein involved in the visual transduction
cascade. Mutations in this gene, however, account for only
a portion of canine blindness, and studies in other dog
breeds are under way to identify other relevant genes.

Progressive rod-cone degeneration (prcd) is the 
most widespread retinal disease leading to blindness 
in dogs, and accounts for eye disease in several breeds 
including poodles, Portuguese water dogs, Labrador 
retrievers, and others. The prcd locus has recently been 
localized to a small region of canine chromosome 9, 
in a region which is partially syntenic with human 
chromosome 17q. This result was important in the 
field for two reasons. First, it has led to the development
of a useful diagnostic test for identifying dogs that
are carriers of prcd. Incorporation of the genetic test
into breeding programs is likely to reduce the frequency
of the deleterious allele in the gene pool quickly. Dogs that are
carriers of the disease allele may have physical attributes
that lead breeders to keep them in the breeding program,
but as long as matings are structured so that carriers
are not mated with each other, the health of the overall
breed can still be expected to improve.
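The breeding argument above rests on simple Mendelian ratios at a single autosomal recessive locus such as prcd. A minimal sketch, using hypothetical allele symbols (N for normal, p for the disease allele) and a hypothetical helper name rather than any real test nomenclature:

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Hypothetical allele symbols for illustration: "N" = normal, "p" = recessive disease allele.
CLEAR, CARRIER, AFFECTED = "NN", "Np", "pp"


def offspring_genotypes(sire: str, dam: str) -> dict[str, Fraction]:
    """Expected genotype proportions at one autosomal locus; each parent passes either allele with probability 1/2."""
    counts = Counter("".join(sorted(a + b)) for a, b in product(sire, dam))
    total = sum(counts.values())
    return {genotype: Fraction(n, total) for genotype, n in sorted(counts.items())}


for sire, dam in [(CARRIER, CARRIER), (CARRIER, CLEAR)]:
    outcome = offspring_genotypes(sire, dam)
    print(f"{sire} x {dam}: affected {outcome.get(AFFECTED, Fraction(0))}, "
          f"carriers {outcome.get(CARRIER, Fraction(0))}")
```

A carrier mated to a clear dog produces no affected pups, but half of the pups are expected to be carriers, so continued testing is needed if the allele is eventually to be removed from the breed.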

In addition, the mapping of prcd to canine chromo- 
some 9 established locus homogeneity with RP17, a 
human retinitis pigmentosa locus for which no gene 
has yet been identified due to the small number of 
linked families. Cloning of the prcd gene, therefore, 
would likely identify the human RP17 gene as well. 
Several other breeds of dog, such as Norwegian elk- 
hounds, miniature schnauzers, Tibetan terriers, and 
miniature longhaired dachshunds, are characterized 
by similar, but apparently distinct, forms of hereditary 
blindness. The mapping of those disease genes, even if 
there is no comparable human disease, will likely pro- 
vide insight into the cascade of interacting genes 
responsible for vision. 


Synteny between Mammalian Genomes 


The ultimate identification of genes in the dog can be 
expedited by knowledge of the syntenic relationship 
between mammalian genomes for which extensive 
gene maps are available, such as the human or 
mouse. The two best strategies for linking the evolving
canine genetic map with those of the human and
mouse are the identification of gene-containing
cosmids, which can then be used for FISH mapping,
and the development of resources for physical
mapping, such as interspecies hybrid cell lines or
radiation hybrid panels.

The first approach is best illustrated by the use of 
FISH to map several loci from human chromosome 
17q to the centromeric two-thirds of dog chromosome 
9. Subsequent isolation of microsatellite-based mark- 
ers from each cosmid followed by linkage analyses 
using multiple large outbred families has allowed the 
placement of these ‘gene-linked markers’ on the canine 
microsatellite map. Both FISH and linkage analysis 
now suggest that the gene order on canine chromo- 
some 9 is similar to that of human 17q and mouse 
chromosome 11 (Werner et al., 1997). All the human 
genes mapped between the neurofibromatosis gene 
(NF1) and the thymidine kinase gene (TK1) appear 
to be present in the dog, although the gene order is 
inverted with respect to the centromere. In addition, 
two loci, GLUT4 and PMP22, which are located on
human chromosome 17p, have been mapped by FISH
analysis of gene-containing cosmids to dog chromosome
5, in a region also identified by the whole human
chromosome 17 paint, thus indicating a breakage of
human chromosome 17 syntenic homology at the
centromere. This is confirmed by the previous
placement of the canine p53 gene (human 17p) on canine
chromosome 5. Genes or expressed sequence tags
(ESTs) mapped to human chromosome 17q and 17p, therefore,
serve as candidates for linkage to loci mapped to
canine chromosomes 9 and 5, respectively. This is likely
to facilitate mapping of the canine prcd gene, which lies
close to the TK1 gene on canine chromosome 9, as well as
studies of a number of genes, such as BRCA1, her2, and
RARA, that have a role in the growth and regulation of
malignant tumors.
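The candidate-selection logic described here can be sketched as a lookup from a gene's human chromosome-17 arm to the dog chromosome suggested by the FISH results. This is a deliberately simplified, hypothetical helper: it uses only the arm-level correspondences stated in the text (17q with dog chromosome 9, 17p with dog chromosome 5), takes the arm assignments for BRCA1, ERBB2, and RARA from the standard human map (all lie on 17q), writes her2 and p53 under their human gene symbols ERBB2 and TP53, and ignores gene order (which the text notes is inverted) and any finer rearrangements.

```python
from typing import Optional

# Simplified from the text: human 17q genes map to dog chromosome 9, human 17p genes to dog chromosome 5.
HUMAN_ARM_TO_DOG_CHROMOSOME = {"17q": 9, "17p": 5}

# Human chromosome-17 arm for genes named in the text (standard human map assignments).
HUMAN_GENE_ARM = {
    "NF1": "17q", "TK1": "17q", "BRCA1": "17q", "ERBB2": "17q", "RARA": "17q",
    "GLUT4": "17p", "PMP22": "17p", "TP53": "17p",
}


def predicted_dog_chromosome(gene: str) -> Optional[int]:
    """Hypothetical helper: dog chromosome predicted from the human arm, or None if the gene is not listed."""
    arm = HUMAN_GENE_ARM.get(gene)
    return HUMAN_ARM_TO_DOG_CHROMOSOME.get(arm) if arm else None


def candidates_for_dog_chromosome(chromosome: int) -> list[str]:
    """Hypothetical helper: listed human genes whose arm predicts placement on the given dog chromosome."""
    return sorted(g for g in HUMAN_GENE_ARM if predicted_dog_chromosome(g) == chromosome)


print(predicted_dog_chromosome("BRCA1"), predicted_dog_chromosome("TP53"))
print(candidates_for_dog_chromosome(5))
```

A real comparative-mapping pipeline would work at much finer resolution, using conserved segments and gene order rather than whole chromosome arms.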

A second approach for undertaking comparative 
studies of all mammalian genomes is the placement 
of common sets of genes on all mammalian genome 
maps. This is most easily done using a panel of 
radiation hybrids, in which each hybrid contains a 
portion of the genome of interest in a cell line with a 
complete background of mouse or hamster DNA. 
Canine radiation hybrid panels have recently become
available, and a radiation hybrid map of 400 genes and
markers has been described (Priat et al., 2000). By
comparing the location of genes on the dog map to
the corresponding and more densely mapped mouse
and human genomes, candidate genes may be selected
to follow up any primary linkage findings in canine
families.

Several sets of anchored reference loci have been 
developed to facilitate these comparative mapping 
studies. The genes selected as anchor loci are evolu- 
tionarily conserved, are members of important gene 
families, and have been characterized in several mam- 
malian species such as the cow, pig, and cat (Lyons 
et al., 1997). Primer pairs that define each gene in the 
anchor set have been designed to span introns, thus 
maximizing the opportunity for development of poly- 
morphic markers as well. A concerted effort is under- 
way for the developers of maps of all mammalian 
genomes to place the same set of 300-400 genes on 
their maps. In this way, analyses of a locus on any 
single mammalian chromosome will be enhanced by 
a wealth of data from the comparative chromosomes 
of other mammals. 


Concluding Remarks 


Genetic analysis of the domestic dog offers a unique 
opportunity for genetic dissection of a wide variety 
of mammalian traits. The high incidence of genetic 
disease within specific dog breeds as well as the 
availability of multigeneration genealogies, coupled 
with the recent availability of a canine genetic map, 
now make the dog a tangible and attractive genetic 
model. Together with population level and evolution- 
ary studies, the dog is likely to become one of the 
genetically best-defined domestic species in the
coming years.


Cap
A cap is the structure at the 5’ end of eukaryotic
mRNA, introduced after transcription by linking the
terminal phosphate of 5’ GTP to the terminal base
of the mRNA. The added G is methylated, as
occasionally are the adjacent nucleotides, resulting in the structure
m7G(5’)ppp(5’)Np...


See also: Messenger RNA (mRNA) 


CAP (CRP) 
CAP (catabolite activator protein; also called CRP,
cyclic AMP receptor protein) is a positive regulator protein
activated by cyclic AMP. It is required for RNA polymerase
to initiate transcription of some (catabolite-sensitive)
operons in Escherichia coli.


See also: Cyclic AMP (cAMP); RNA Polymerase 


Capsid 
A capsid is the external protein coat of a virus particle.


See also: Virus 


Carcinogens 
Carcinogens are agents that cause cancer, such as
ionizing radiation, ultraviolet radiation, viruses, and
chemicals.


Ionizing Radiation


Examples of ionizing radiation include α particles,
X-rays, and γ-rays. Most human exposure to ionizing
radiation is the result of exposure to cosmic rays,
environmental radioactivity in the form of radioisotopes,
medical radiography, and radon gas. Radon
exposure results from the radioactive decay of uranium
in soil to radium, which in turn decays to radon, a
gas that can collect in habitable structures. Radon gas
is now regarded as one of the principal sources of
radiation affecting the US population. There are con- 
siderable epidemiological data linking radiation expo- 
sure to human cancers. For example, lung cancer was 
common among the first uranium miners, who regu- 
larly breathed in large amounts of radon gas, and skin 
cancer was common among early X-ray workers. 
More recently, a high incidence of leukemia and a 
variety of solid tumors have been observed among 
survivors of the atomic bombing of Hiroshima and 
Nagasaki during World War II. Although the detailed 
mechanism for carcinogenesis by radiation exposure 
has yet to be established, it is well known that radia- 
tion is mutagenic and readily produces chromosomal 
translocations and deletions via the formation of 
DNA double-strand breaks. These chromosomal 
anomalies may lead to oncogene activation or suppres- 
sor gene deletion, thereby initiating carcinogenesis. 


Ultraviolet Radiation 


Skin cancers are the most common form of human 
cancer. Early carcinogenesis experiments involving 
mice and rats demonstrated that ultraviolet (UV) 
radiation from sunlight was responsible for skin cancer
in these animals. The shorter-wavelength UV-B
region of ultraviolet light (280-320 nm) was shown
to be a more effective inducer of carcinogenesis than
the UV-A region (320-400 nm), although the latter
region is also carcinogenic at higher doses over 
extended times. It is now widely accepted that 
repeated exposure to UV radiation from sunlight is 
responsible for most nonmelanoma human skin can- 
cers, and it contributes to the onset of melanoma as 
well. UV radiation produces cyclobutane-type pyrimi- 
dine dimers and other pyrimidine-pyrimidine and 
pyrimidine-purine photoproducts in DNA. A failure 
to repair these lesions may result in base pair substitu- 
tion mutations that can inactivate suppressor genes 
(e.g., p53), resulting in carcinogenesis. 


Viruses 


Some of the earliest experiments in viral carcinogen- 
esis during the first half of the twentieth century 
demonstrated that avian leukemia and an avian sar- 
coma were transmissible diseases. Later investigations 
identified viruses as the agents responsible for this 
transmissibility. Viruses were also identified as the 
agents responsible for causing fibrous tumors and 
benign papillomas in rabbits, and later studies led to 
the discovery of murine leukemia viruses and the 
feline leukemia virus. The involvement of viruses in 
causing human cancer has only recently been estab- 
lished and has thus far been limited to human T-cell 
leukemia, Burkitt’s lymphoma, and nasopharyngeal 
cancer. Adult T-cell leukemia, which is endemic to 
Japan, the Caribbean, and parts of Africa, is caused 
by the human T-cell leukemia virus type I (HTLV-1), a 
human retrovirus. The Epstein-Barr virus (EBV), a 
double-stranded DNA virus and a member of the 
herpesvirus family, was shown to be responsible for 
Burkitt’s lymphoma, particularly among equatorial- 
belt East Africans. EBV is also linked to the occur- 
rence of nasopharyngeal carcinoma in China as well as 
in areas of Africa. Genital tract carcinomas and some 
upper airway and oral cancers are more loosely asso- 
ciated with some human papillomaviruses, which are 
also DNA viruses. Additionally, there is epidemi- 
ological evidence suggesting a strong role for the hepa- 
titis B virus (another DNA virus) in the etiology of 
human hepatocellular carcinoma; however, it is not 
yet regarded as the sole causative agent. 

The mechanism of carcinogenesis by human retroviruses
involves the copying of viral RNA by reverse
transcriptase into a complementary DNA.
This DNA is then converted to a double-stranded
DNA provirus that integrates into the host cell’s
genome. In the case of DNA viruses, the viral DNA
is integrated directly into the genome. Insertion sites
differ and the mechanisms of transformation resulting 
from the integrated DNA also differ and are quite 
complex, but may involve either oncogene activation 
or inactivation of tumor suppressor genes as a result of 
the insertion of viral DNA. In addition, virus-encoded 
proteins may act as transcription factors or interact 
with critical regulatory proteins of the host cell as part 
of the transformation process. 


Chemicals 


Chemicals constitute the most diverse group of car- 
cinogens. Hundreds of chemicals are known to be 
carcinogenic/tumorigenic in animals. A carcinogen is 
termed genotoxic if it covalently binds to cellular 
DNA. If unrepaired, the damaged DNA may cause 
mutations by inducing the misincorporation of bases 
during DNA replication. Genotoxic carcinogens may 
be either direct-acting (ultimately reactive toward 
DNA from the outset) or they may require metabolic 
activation to become reactive toward DNA (indirect- 
acting carcinogens). Examples of direct-acting car- 
cinogens include alkyl or aryl epoxides, nitrosoureas, 
nitrosamides, and certain sulfonate and sulfate esters. 
Examples of indirect-acting carcinogens include polycyclic
aromatic hydrocarbons, aromatic amines, alkyl
nitrosamines, and aflatoxin B1. Most chemical carcinogens
require metabolic activation to elicit a tumorigenic
response.

Epigenetic carcinogens are carcinogens that do not 
damage DNA directly; however, they may enhance 
tumorigenesis by a variety of mechanisms. Epigenetic 
carcinogens may induce the generation of activating 
enzymes that metabolize carcinogens to DNA react- 
ive forms or they may inhibit beneficial detoxifying 
reactions that convert procarcinogens to excretable 
forms that are not DNA reactive. Epigenetic carcino- 
gens may also inhibit the repair of damaged DNA or 
serve as promoters. Promoters are agents that are not 
directly reactive toward DNA or mutagenic but 
instead stimulate the growth and division of cells 
that may have already sustained the genetic damage 
that predisposes them to become tumorigenic. 

Cigarette smoking poses the greatest chemical risk 
for causing cancer in humans. Cancers linked to cigar- 
ette smoking include those of the lung, larynx, mouth, 
pharynx, esophagus, bladder, and pancreas. This reflects
the large number of carcinogens that have
been identified in cigarette smoke. However, examples
of carcinogenic chemicals are also found among agri- 
cultural chemicals (e.g., pesticides, herbicides, and fun- 
gicides), industrial chemicals (e.g., aromatic amines, 
vinyl chloride, benzene, and chromium compounds), 
atmospheric pollutants (e.g., polycyclic aromatic
hydrocarbons resulting from incomplete combustion
of fossil fuels), contaminants in drinking water (halo- 
genated organic compounds produced during water 
chlorination), some medications (including some 
anticancer drugs, estrogens, and analgesics), plants 
such as cured tobacco, cooked meats (which produce 
polycyclic aromatic hydrocarbons and heterocyclic 
aromatic amines), and mycotoxin-contaminated foods 
(e.g., aflatoxins).


Carcinoid Tumors
Most organ systems in the body contain neuroendocrine
cells as part of a diffuse endocrine system
(Gosney, 1992). These cells often respond to injuries 
or disease. For example, diffuse idiopathic pulmonary 
neuroendocrine hyperplasia (DIPNECH) is a rare 
clinicopathologic syndrome of the lung. It is seen in
the setting of obstructive pulmonary disease, usually
obliterative bronchiolitis, with no interstitial lung dis- 
ease. The neuroendocrine hyperplasia is confined to 
airways, usually bronchioles and alveolar walls. In 
some conditions, such as bronchiectasis, there may 
be focal proliferation of neuroendocrine cells, known 
as tumorlets. The distinction between a tumorlet and a
carcinoid tumor is entirely arbitrary: tumorlets are
less than 0.5 cm in diameter, whereas carcinoid tumors are
0.5 cm or greater. Similarly, the stomach can show
multiple neuroendocrine lesions, ranging from the 
size of tumorlets through to carcinoids. 


Carcinoid tumors are part of a histological spectrum 
of tumors. Thus, in the lung at the “benign” end there 
is the typical carcinoid, then atypical carcinoid, large 
cell neuroendocrine carcinoma, and small cell lung 
cancer. The latter three entities will not be discussed
here but are described in standard texts (e.g., Hasleton, 1996).

The histology of carcinoid tumors is the same 
wherever they are found in the body. There are a
variety of histological patterns, including insular,
trabecular, and acinar patterns and rarer types such as
oncocytic, papillary, and goblet cell carcinoids.

The tumors are extremely vascular due to production
of the angiogenic transforming growth factor
alpha (TGFα). In addition, the stroma may be
extremely fibrotic and show foci of calcification. These
latter two features are due to other cytokines, such as
insulin-like growth factor (IGF) and TGFβ.


Types of Neuroendocrine Tumors 


Carcinoid tumors are traditionally described according
to their location in the body: foregut, midgut, or hindgut.
Foregut includes thymic, esophageal, gastric, and
respiratory tract, as well as pancreatic and duodenal
carcinoids. Midgut includes appendiceal and ileal
carcinoids. Hindgut encompasses the large bowel. Carcinoid
tumors may also be identified in the kidneys and other sites.

Irrespective of site, the tumors have uniform cytologic
features, with moderate amounts of eosinophilic, finely
granular cytoplasm. The nuclei have a finely granular
chromatin pattern. Despite the uniform histology, the
clinical behavior appears to differ with site. Thus,
appendiceal carcinoids may spread outside the wall of 
the organ into the serosa. Although this often indicates 
malignancy, in this location the tumor usually behaves 
in a benign manner. However, ileal carcinoids, once 
they spread into the serosa, metastasize to the liver 
and cause the carcinoid syndrome (see below). 

The site of a tumor determines to some extent its
clinical behavior. Thus, if a tumor blocks a viscus,
obstruction causes symptoms. Alternatively, the surface
of the tumor may become ulcerated, producing
blood loss and anemia. In some organs, such as the
lung, tumors occupying the parenchyma have a
large area available for growth. Because of the large
functional reserve of this organ, they may be symptomless
for a long period. Separation of typical from
atypical carcinoid tumor in the lung is important, 
since the former have an 87% 10-year survival, whereas 
this figure drops to 35% in atypical carcinoids. 


Functional Significance of 
Neuroendocrine Tumors 


Carcinoid tumors are known to medical students for 
the carcinoid syndrome. This consists of flushing, 
diarrhea, and valvular lesions. The latter commonly, 
but not exclusively, affect the right side of the heart, 
involving the tricuspid and pulmonary valves. In addi- 
tion there may, less commonly, be wheezing and pel- 
lagra. The syndrome is rare in pulmonary carcinoids, 
occurring in up to 7% of cases. Ileal carcinoids only 
produce the carcinoid syndrome when liver metastases
are present. This is because the liver normally
inactivates substances such as serotonin
(5-hydroxytryptamine), converting it to 5-HIAA
(5-hydroxyindole acetic acid), whereas secretions from
liver metastases reach the systemic circulation
directly. Serotonin interacts synergistically with kinins
and prostaglandins to cause the carcinoid symptoms.
This explanation, however, makes the rarity of the
carcinoid syndrome with pulmonary neuroendocrine
tumors hard to understand: these drain substances
such as serotonin into the left atrium via the pulmonary
veins, bypassing the liver, so one would expect the
syndrome to be more common.

Carcinoid tumors are rich in bioactive substances. These include
kinins, endorphins and enkephalins, vasoactive amines,
and other peptides. The commonly associated peptides are
bombesin, calcitonin, gastrin, and glucagon, but many
others can be demonstrated, and more than one peptide
may be identified in a single cell. Despite the plethora of
these substances, endocrine manifestations are rare.
When they occur with carcinoid tumors, they include 
acromegaly, Cushing syndrome, and insulin produc- 
tion with hypoglycemia. The tumor may be part 
of multiple endocrine adenomatosis (MEA), when 
patients may also have adenomas in the pituitary, 
thyroid, parathyroid, or adrenals. Identification of 
such cases is important since they are familial and 
relatives should be investigated. 


Cell Control Mechanisms in Carcinoid 
Tumors 


Ploidy 

Since cytogenetic changes are a recognized feature of
many human neoplasms, it was hoped that they might be
related to clinical behavior and have prognostic value. In
bronchial carcinoids this hope was not fulfilled. In a study of 53
patients, those with DNA diploid tumors tended to
survive longer than those with DNA aneuploid tumors
(Jones et al., 1988), though the difference was of borderline
statistical significance. The incidence of DNA
aneuploidy in tumors with lymph node metastases
was significantly higher than in those without. However,
two typical carcinoids that had lymph node metastases
were DNA aneuploid. In a Cox multivariate analysis
the most powerful predictor of prognosis was histological
growth pattern.


p53 
Inactivation of tumor suppressor genes through inhibition
of their protein products (p53, the retinoblastoma
protein, and the cyclin-dependent kinase inhibitor p16)
removes important regulatory constraints in the cell
cycle at the G1 restriction point. The p53 transcription
factor lies on a common pathway with the retinoblastoma
gene product (Rb) that regulates G1 arrest and, through
a pathway independent of Rb, also regulates apoptosis.
Thus p53 inactivation could contribute to accelerated
growth of tumor tissue by increasing the rate of
cell division as well as allowing escape from apoptosis.

p53 mutation or stabilization is absent in typical
carcinoids. Atypical carcinoids that showed focal
(less than 10%) or patchy p53 positivity were more
aggressive and had significantly shorter survival times 
than those without p53 staining (Brambilla and 
Brambilla, 1999). 


Retinoblastoma Gene 

The loss of pRB (a nuclear phosphoprotein able to 
bind to double-stranded DNA) in bronchial carcin- 
oids is rare. The frequency of RB gene inactivation in 
high-grade neuroendocrine carcinomas is similar to 
that seen in retinoblastoma. 


Cyclin 

Benign endocrine tumors are frequently cyclin D3
positive, whereas high-grade tumors (small cell
neuroendocrine carcinomas) are always negative.


bax and bcl2 

bax and bcl2 are pro-apoptotic and survival genes,
respectively. There is an inverse correlation between
the scores of bax and bcl2 expression in neuroendocrine
tumors: bax expression predominates in low-grade
neuroendocrine tumors (typical and atypical carcinoids),
and bcl2 expression mainly in small cell and large cell
lung cancers. The p16-retinoblastoma pathway is normal
in typical carcinoids but abnormal in the higher grade
neuroendocrine tumors (Dosaka-Akita et al., 2000).


Telomeres 

Telomerase is a specialized ribonucleoprotein polymerase
that adds TTAGGG repeats to the ends of vertebrate
chromosomal DNA. Telomeres undergo progressive
shortening with cell division, through replication-dependent
sequence loss at DNA termini. This telomere
shortening may be a mechanism for cellular senescence.
Telomerase probably compensates for the loss of
telomeric repeats and is associated with acquisition of
the immortal phenotype. Some malignant tumors
specifically express telomerase activity. Some 55% of
typical and atypical carcinoids, which have low-grade
malignant potential, are weakly positive for telomerase
RNA expression, whereas all of the rapidly growing
large cell neuroendocrine and small cell carcinomas
show high-level expression.


Somatic Genetic Changes 


The number of carcinoid tumours analyzed for such 
changes is small. There are relatively simple karyo- 
typic abnormalities in carcinoid tumors, whereas 
atypical variants have more complex karyotypes. 
The term ‘carrier’ or ‘carrier state’ usually refers to an individual who has both a wild-type (normal) and a recessive allele of a gene at a particular locus. Such a person is said to be heterozygous at that locus. The recessive allele is often deleterious and is expressed only in the homozygous state. Phenotypically, the carrier appears normal and may even be at a selective advantage compared to the homozygous wild-type. If two carriers for an autosomal trait have a child, the chance that it will be affected is one in four. If the gene is sex-linked (X-linked) and a female carrier mates with a normal male, on average half the sons, but none of the daughters, will display the trait.
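The one-in-four and half-the-sons figures follow from Mendelian segregation, in which each parent transmits either of its two alleles with equal probability. The minimal sketch below enumerates the possible gamete combinations; the allele labels ('A', 'a', 'XA', 'Xa', 'Y') and function names are arbitrary choices made for illustration.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def autosomal_cross(mother, father):
    """Offspring genotype probabilities for one autosomal locus.

    Each parent is a pair of alleles; every maternal/paternal combination
    is equally likely under Mendelian segregation.
    """
    counts = Counter(tuple(sorted(pair)) for pair in product(mother, father))
    total = sum(counts.values())
    return {genotype: Fraction(n, total) for genotype, n in counts.items()}


def x_linked_cross(mother, father_x):
    """Fractions of affected sons and daughters for a recessive X-linked allele 'Xa'.

    Sons receive one maternal X plus the paternal Y; daughters receive one
    maternal X plus the father's X (father_x).
    """
    sons = [(m, "Y") for m in mother]
    daughters = [(m, father_x) for m in mother]
    affected_sons = Fraction(sum(x == "Xa" for x, _ in sons), len(sons))
    affected_daughters = Fraction(
        sum(pair == ("Xa", "Xa") for pair in daughters), len(daughters))
    return affected_sons, affected_daughters


# Two autosomal carriers ('A' = wild type, 'a' = recessive allele).
print(autosomal_cross(("A", "a"), ("A", "a"))[("a", "a")])  # 1/4
# Carrier mother x normal father: half the sons, none of the daughters.
print(x_linked_cross(("XA", "Xa"), "XA"))
```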


Frequency 


An outbred population contains an enormous amount of heterozygosity; just consider the amount of diversity evident in a single family. Many genes are polymorphic. If the alleles are of nearly equal selective value, each may be present at a significant level (i.e., 1% or greater) in the population. Skin color, hair color, eye color, and blood groups may be cases in point. Other alleles are kept at a relatively high frequency by heterosis (where the heterozygote is more fit than either homozygote). The classic example is the carrier of the sickle cell trait, who is relatively resistant to malaria. More subtle mechanisms such as linkage disequilibrium may also be at work. Since there is selection against deleterious genes, one would expect their frequency at any given locus to be lower than that of neutral or near-neutral alleles, and this expectation is borne out. Many governmental bodies now require screening of infants for a variety of metabolic disorders. From the number of homozygotes detected in such screens, it was easy to show that the frequency of heterozygotes (carriers) for these traits ranged from 1.6% for phenylketonuria to less than 0.3% for Fanconi syndrome.
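The step from homozygotes detected to carrier frequency uses the Hardy-Weinberg relation: if q is the frequency of the recessive allele and p = 1 - q, affected homozygotes occur at frequency q² and carriers at 2pq. The minimal sketch below assumes random mating and Hardy-Weinberg equilibrium; the incidence of 1 in 15 000 is a hypothetical input chosen to show how a carrier frequency of about 1.6% arises, not a figure from the screens cited.

```python
import math


def carrier_frequency(homozygote_incidence):
    """Estimate the heterozygote (carrier) frequency 2pq from the incidence q**2
    of affected homozygotes, assuming Hardy-Weinberg equilibrium."""
    q = math.sqrt(homozygote_incidence)  # recessive allele frequency
    p = 1.0 - q                          # wild-type allele frequency
    return 2 * p * q


# Hypothetical screen: 1 affected infant per 15 000 births.
freq = carrier_frequency(1 / 15_000)
print(f"carrier frequency ~ {freq:.1%}")  # carrier frequency ~ 1.6%
```

Because 2pq is close to 2q when q is small, carriers vastly outnumber affected homozygotes for rare recessive diseases.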

What fraction of the population carries at least one deleterious gene? This is a difficult question. Each ethnic group carries its own assortment of defective genes at relatively high levels. For example, the frequency of the gene for Tay-Sachs disease is 10 times higher among American Jews of Eastern European origin than in the US population as a whole, including Jews originating from other parts of the world. In the United States, sickle cell anemia is largely limited to African Americans, and its frequency is considered to reflect the high incidence of malaria in Western and Central Africa. Nevertheless, since there are over 1600 known recessive diseases, it is reasonable to assume that over a third of the population carries at least one gene for a serious genetic disease.
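The ‘over a third’ estimate rests on combining many individually small probabilities. If carrier states at different loci are independent (an assumption), the chance of carrying at least one recessive disease allele is 1 - ∏(1 - hᵢ), where hᵢ is the carrier frequency for disease i. The uniform average of 0.03% per disease used below is hypothetical; real carrier frequencies vary widely between diseases and populations.

```python
import math


def prob_at_least_one(carrier_freqs):
    """Probability of carrying at least one recessive disease allele,
    assuming independence between loci."""
    p_none = math.prod(1.0 - h for h in carrier_freqs)
    return 1.0 - p_none


# Hypothetical: 1600 recessive diseases, each with a 0.03% carrier frequency.
freqs = [0.0003] * 1600
print(f"{prob_at_least_one(freqs):.2f}")  # 0.38
```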


Detection 


Several means are available to determine whether a person is carrying a mutant gene. An early approach was to measure the level of a gene product. For example, Tay-Sachs disease is characterized by a hexosaminidase deficiency. Carriers of the defective gene have a significantly lower level of this enzyme in their blood than noncarriers and can be identified accurately and inexpensively. This method, however, has limited applicability. Newer methods test directly for mutant DNA sequences. As an example, consider the following scenarios. (1) A young man has a living relative with cystic fibrosis (CF); he is contemplating marriage to a woman who ‘may’ have a similar relative in her family. He is therefore anxious to know whether he is a carrier of a mutant CF gene. Since the relative is available for testing, and the sequence of the CF gene is known, several techniques can be used to determine whether the man has inherited the same mutant gene. If he has, the woman can be tested to see whether she is carrying the same one. (2) The young man's affected relative has died, so the family's mutation cannot be identified directly. This case is more difficult, since over 400 different mutations of the CF gene are known to be associated with the disease. However, one particular mutation accounts for 70% of the cases, and 29 others cover a further 20%. A commercial test is available that screens for all 30 of these mutations. It requires only 20 ml of blood and costs about $200. If both the young man and his prospective bride are free of these mutations, the chance of their having a child with CF should be less than 1 in 10 000. Similar tests are available for 14 other relatively common diseases. The costs range from $200 to $400 per sample, depending on the technique used to detect the defect.

Individuals frequently wish to know whether they are carrying particular deleterious genes, either because the gene is known to be in the family, as in the above example, or because it has a high frequency in certain ethnic groups. For example, 4% of the US Caucasian population carries a gene for cystic fibrosis. Some people argue that everyone in that population should be screened for the gene.
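The ‘less than 1 in 10 000’ figure for the couple in scenario (2) can be checked with Bayes' rule. This is a minimal sketch under stated assumptions: a prior carrier frequency of 4% for each partner (the US Caucasian figure above), a panel that detects 90% of carriers (the 70% plus 20% above, treated here as the fraction of carriers whose mutation is on the panel), no false-positive results, unrelated partners, and a 1 in 4 risk per child for a carrier couple. A partner whose affected relative is close kin would have a higher prior than the population figure used here.

```python
def residual_carrier_risk(prior, detection_rate):
    """Probability that a person with a negative panel result is still a carrier
    (Bayes' rule; assumes noncarriers always test negative)."""
    missed = prior * (1.0 - detection_rate)   # carriers whose mutation is not on the panel
    return missed / (missed + (1.0 - prior))  # share of all negative results


def child_risk(partner1_risk, partner2_risk):
    """Risk of an affected child: both partners carriers, then 1 in 4."""
    return partner1_risk * partner2_risk * 0.25


r = residual_carrier_risk(prior=0.04, detection_rate=0.90)
print(f"residual carrier risk per partner: about 1 in {1 / r:.0f}")
print(f"risk for the couple's child: about 1 in {1 / child_risk(r, r):,.0f}")
```

Under these assumptions the couple's residual risk comes out at roughly 1 in 230 000, comfortably below the bound stated in the text.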


Prospects for the Future 


The recent completion of the Human Genome Project has laid the groundwork for the efficient screening of many additional genetic variants. With new DNA microchip technology, it should soon be possible to screen simultaneously for thousands of possible mutations at a price not much higher than it now costs to screen for errors in a single gene. This capability will raise many complicated questions of a practical and moral nature, including health insurance issues and the potential to prevent the birth of affected or even carrier children. However, consideration of these vital and contentious issues is outside the scope of this article.


See also: Cystic Fibrosis; Ethics and Genetics; 
Genetic Counseling; Sex Linkage; Sickle Cell 
Anemia; Tay-Sachs Disease 
Although yeast is a unicellular organism, it has three specialized cell types: a and α haploid cells and a/α diploid cells. a and α cells are specialized for mating; in contrast, a/α cells cannot mate but are able to undergo meiosis and produce haploid spores (the process of sporulation). A great deal has been learned about the genetic control of cell specialization from studies of the yeast cell types. A major theme has emerged: cell specialization results from the interplay of two processes, namely control of gene transcription by regulatory proteins coded by a master regulatory locus (the mating type locus), and induction of the final stages of differentiation at the appropriate time by environmental stimuli (such as signaling molecules produced by mating partners, or nutritional starvation). This article first discusses how the mating type locus controls cell type and then the mechanism of signal transduction during differentiation of mating types. Notably, the molecular machinery used by this unicellular eukaryote is closely related to machinery used by multicellular eukaryotes such as humans. Examples include the use of homeodomain transcription factors to regulate transcription of yeast genes and the use of G-protein-coupled receptors to signal between mating partners.

Budding yeast exhibits a fascinating phenomenon that mystified geneticists for many years because it seemed to violate the basic rules of genetics: cells can switch from one mating type to another. a cells typically produce more a cells, but at a frequency of around 1 per million cells they produce an α cell. Similarly, α cells give rise to a cells at the same frequency. Even more remarkably, in some yeast strains the switch from a to α and from α to a occurs nearly every cell division. This phenomenon (mating type interconversion) is now known to result from a genetic rearrangement by a process using genetic “cassettes.” According to this cassette mechanism, silent information becomes activated when a copy of it is transferred from a storage position in the genome to the mating type locus, which behaves as a “playback” locus where the information is expressed. This article also briefly describes mating type interconversion and the cassette mechanism for gene regulation.


Control of Cell Specialization by the 
Mating Type Locus 


Yeast cells double in number every 2 h or so. When cells of opposite mating type (a or α) are near each other, nearly touching, they undergo the process of mating, whereby their cell walls break down and the two cells and their nuclei fuse to form a diploid cell. This diploid a/α cell then doubles every 2 h or so until it receives a signal (nutritional starvation) to undergo the reverse process (producing haploid cells from diploids) by meiosis and spore formation.

a and α cells produce special products that facilitate mating. In particular, a cells secrete a-factor, which acts on a receptor (the a-factor receptor) present on the surface of α cells. Similarly, α cells secrete α-factor, which acts on a receptor (the α-factor receptor) present on the surface of a cells. Both a and α cells contain the genes for production of a-factor and α-factor, and they also contain the genes for the a-factor and α-factor receptors. Why is it that a cells produce only the a-specific products, a-factor and the α-factor receptor, and why is it that α cells produce only the α-specific products, α-factor and the a-factor receptor? The answer lies in the mating type locus.

The mating type locus (MAT) is located at a particular position on chromosome 3 (of yeast’s 16 chromosomes) and has two different forms (or “alleles”): MATa, which programs the a cell type, and MATα, which programs the α cell type (Figure 1). MATα codes for two transcriptional regulatory proteins: α1 is an activator protein, which turns on transcription of α-specific genes such as the genes for α-factor and the a-factor receptor; α2 is a repressor protein, which turns off transcription of a-specific genes such as the genes for a-factor and the α-factor receptor. Thus, in an α cell, the appropriate genes are turned on and the inappropriate genes are turned off, and the cell mates as an α.

In an a cell, the α-specific genes are not expressed because α1 is absent; the a-specific genes are expressed because α2 is absent. Thus, in an a cell, the appropriate genes are on and the inappropriate genes are off, and the cell mates as an a.

In an a/α cell, which does not mate, neither the a-specific nor the α-specific genes are expressed. This is because α2 turns off the a-specific genes (as it does in the α cell), and the α-specific genes are not expressed because a/α cells possess a novel repressor protein formed by association between α2 and the a1 protein encoded by the MATa allele. This novel repressor, a1-α2, turns off synthesis of α1 and other genes involved in mating. Thus, an a/α cell does not mate because the genes required for mating are turned off. a/α cells are able to sporulate when they are nutritionally starved because the a1-α2 repressor turns off synthesis of an inhibitor of sporulation, which is produced in a cells and α cells but not in a/α cells.

α2 and a1-α2 are both homeodomain proteins (a1-α2 being a complex of two such proteins); homeodomain proteins form a large group that is involved in cell specialization in fruit flies, nematodes, mice, and humans.
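The regulatory rules just described behave like simple on/off logic. The minimal Boolean sketch below uses hypothetical function names and gene-set labels, and it encodes only the rules stated in the text (α1 activates α-specific genes, α2 represses a-specific genes, a1-α2 represses α1 and the haploid-specific genes, including the sporulation inhibitor), not quantitative expression levels.

```python
def cell_program(mat_alleles):
    """On/off state of gene sets for a set of MAT alleles: {'a'}, {'alpha'},
    or {'a', 'alpha'} (the a/alpha diploid)."""
    a1 = "a" in mat_alleles              # MATa encodes a1
    alpha2 = "alpha" in mat_alleles      # MATalpha encodes alpha2 (and alpha1)
    a1_alpha2 = a1 and alpha2            # novel repressor, formed only in a/alpha cells
    alpha1 = ("alpha" in mat_alleles) and not a1_alpha2  # a1-alpha2 represses alpha1
    return {
        "a-specific": not alpha2,            # repressed by alpha2
        "alpha-specific": alpha1,            # require the activator alpha1
        "haploid-specific": not a1_alpha2,   # mating genes, sporulation inhibitor
    }


def phenotype(mat_alleles):
    genes = cell_program(mat_alleles)
    if genes["haploid-specific"] and genes["a-specific"] and not genes["alpha-specific"]:
        return "mates as a"
    if genes["haploid-specific"] and genes["alpha-specific"] and not genes["a-specific"]:
        return "mates as alpha"
    if not genes["haploid-specific"]:
        return "does not mate; can sporulate when starved"
    return "undefined"


for alleles in ({"a"}, {"alpha"}, {"a", "alpha"}):
    print(sorted(alleles), "->", phenotype(alleles))
```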


Cell Signaling between Mating Partners 
Turns on the Final Differentiation of 
a and α Cells


Studies of how yeast cells respond to the mating factors, a-factor and α-factor, have contributed a great deal to understanding how signals are transduced from a cell-surface receptor into a cell. When yeast a cells are exposed to α-factor, they exhibit three responses: (1) they arrest in the G1 phase of the cell cycle; (2) they synthesize a variety of proteins involved in cell fusion; and (3) they grow towards their mating partner. All of these responses are initiated when the mating factor (a signaling molecule that acts between individual organisms, hence called a “pheromone”) binds to the receptor (Figure 2). The activated receptor then triggers activation of a protein kinase cascade (a so-called “MAP kinase cascade”), which culminates in activation of a transcriptional activator protein, Ste12. Ste12 then induces synthesis of a variety of proteins that are important for arresting the cell cycle (the Far1 protein) and for cell fusion (the Fus1 protein).

Figure 1 Control of yeast cell specialization by regulatory proteins coded by the mating type locus. Arrow from α1 indicates activation of transcription; blunt arrowhead from α2 and from a1-α2 indicates repression of transcription. Arrows over a-, α-, and haploid-specific genes indicate transcription. Other details are described in the text.

Human cells contain many of the same components that are found in this signaling pathway of yeast. In particular, the yeast receptors resemble G-protein-coupled receptors that function in the human brain. The way in which yeast cells arrest their cell cycle as a prelude to mating uses machinery (the Far1 protein) related to machinery that controls human cell division and that can go awry in certain types of cancer.

Figure 2 A signal transduction pathway induces the final differentiation of yeast cells and causes cell-cycle arrest. Mating factors produced by one mating partner act on cell-surface receptors to activate a signal-transduction pathway as indicated. This culminates in induction of synthesis of a variety of proteins involved in cell fusion (Fus1 protein) and in arrest of the cell cycle as a prelude to mating (Far1). Other details are described in the text.


Mating Type Interconversion and the 
Cassette Mechanism 


As noted earlier, yeast cells have the remarkable ability 
to change mating type. This happens at low fre- 
quency in standard laboratory strains but at very 
high frequency in strains that contain a functional 
HO gene. 

Cells can switch mating type, for example from α to a, because all cells contain silent copies of mating type locus information in addition to an active copy at the mating type locus itself: they have both silent α and silent a information (Figure 3). These blocks of information are called genetic “cassettes” because they can become active if they are inserted into the “playback” locus, which is the mating type locus. Consequently, an α cell switches to an a cell by having its active α cassette at MAT replaced by a copy of the a cassette donated by the silent a locus (see Figure 3). This type of nonreciprocal transfer of genetic information from one position to another is called “gene conversion.”

Figure 3 Mating type interconversion in yeast occurs by a genetic rearrangement. All yeast cells contain silent α information (a silent α cassette) on the left arm of chromosome 3 and silent a information (a silent a cassette) on the right arm of chromosome 3. The cassette at MAT (the “playback” locus) is expressed and determines cell type. a and α cassettes have distinctive DNA sequences indicated by open or hatched rectangles. These sequences are flanked by DNA sequences that are the same at all three cassette loci (indicated in black), which participate in the recombinational event. Mating type switching occurs when the DNA at the mating type locus is cleaved in the black area on the right side of MAT. This broken chromosome is repaired using the silent a or silent α information.
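The nonreciprocal character of the switch (MAT changes, the silent donor does not) can be shown with a toy representation of chromosome 3 as three labeled loci. The locus keys are hypothetical labels, and the rule that the donor is always the silent cassette of the opposite type is a simplifying assumption that matches the α-to-a example above; the sketch ignores the cleavage and repair steps shown in Figure 3.

```python
def switch_mating_type(chromosome3):
    """Return the chromosome 3 state after one switch by gene conversion.

    chromosome3 maps locus name -> cassette ('a' or 'alpha'). The cassette at
    MAT is replaced by a copy of the silent cassette of the opposite type;
    the silent donor loci are copied from, never altered.
    """
    donor = "silent_a" if chromosome3["MAT"] == "alpha" else "silent_alpha"
    switched = dict(chromosome3)          # new state; input left untouched
    switched["MAT"] = chromosome3[donor]  # copy, not move
    return switched


alpha_cell = {"silent_alpha": "alpha", "MAT": "alpha", "silent_a": "a"}
a_cell = switch_mating_type(alpha_cell)
print(a_cell)  # {'silent_alpha': 'alpha', 'MAT': 'a', 'silent_a': 'a'}
```

After the switch the genome still carries both silent cassettes, so the same cell lineage can later switch back.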

A similar cassette mechanism is used by a variety of organisms; trypanosomes, for example, evade the immune system of the hosts they infect by changing their cell-surface proteins through a cassette mechanism.
The most significant force in early genetic work on the
mouse was William Ernest Castle, who directed the 
Bussey Institute at Harvard University until his retire- 
ment in 1936. Castle brought the fancy mouse into his 
laboratory in 1902 and with his numerous students 
began a systematic analysis of inheritance and genetic 
variation in this species as well as in other mammals. 
The influence of Castle on the field of mammalian 
genetics as a whole was enormous: over a period of
28 years, the Bussey Institute trained 49 students, 
including L.C. Dunn, Clarence Little, Sewall Wright, 
and George Snell; 13 were elected to the National 
Academy of Sciences in the USA, and many students 
of mouse genetics today can trace their scientific heri- 
tage back to Castle in one way or another. 


See also: Dunn, L.C.; Little, Clarence; 
Snell, George; Wright, Sewall 
The catabolite gene activator protein (CAP) is a pleiotropic effector for the expression of hundreds of catabolite-sensitive genes. CAP was discovered with the help of an in vitro assay for its ability to stimulate expression of the lac operon. CAP binds cAMP, and in the process its conformation is altered so that it becomes a gene activator. The cAMP-CAP complex binds to DNA at or near the promoter sites for susceptible genes. The ultimate effect of CAP binding to a promoter site is to stimulate the transcription of the promoter-associated genes.

The key investigations on the lac operon that led to the discovery of CAP and some of its basic properties are described below.


Discovery and Isolation of CAP 


Although the genetic and biochemical studies on the action of repressor on the lac operon answered many questions about the expression of the lac operon, they left equally important questions unanswered. It has been known since the turn of the twentieth century that the lac operon is expressed at a greatly reduced level if lactose and glucose are present simultaneously. Either of these sugars can be used by bacteria as a source of carbon compounds and energy, but lactose is not utilized to any appreciable extent until the glucose supply has been exhausted. This effect is called catabolite repression: as long as glucose is available, lactose is underutilized.

A turning point in our understanding of catabolite repression was provided by Makman and Sutherland (1965), who found that when glucose was added to growing Escherichia coli cells, the level of 3’,5’-cyclic AMP (cAMP) was drastically reduced. Could the lack of cAMP be responsible for the poor expression of the lac operon in the presence of glucose? In support of this theory, Perlman and Pastan (1968) found that large quantities of cAMP added to the growth medium could partially reverse the glucose catabolite repression effect. In a cell-free system containing crude extracts from Escherichia coli and lac operon DNA, Chambers and Zubay (1969) found that low-level expression of the lac operon could be greatly increased by addition of cAMP. This supported the notion that cAMP plays a direct role in activating the lac operon.

Further investigations were facilitated by 
Beckwith’s genetic studies and the isolation of key 
mutants relating to the action of cAMP (Zubay et al., 
1970). Beckwith and his colleagues isolated a large 
family of mutants that were permanently catabolite- 
repressed. These mutants fell into two categories: those 
that could be phenotypically corrected by growing in 
the presence of cAMP and those that could not. Mu- 
tants in the first category were believed to be defective 
in the synthesis of cAMP; those in the second category were presumed to be defective in the protein receptor for cAMP. Cell-free extracts were prepared from both types of mutant. When used for cell-free synthesis of β-galactosidase, extracts from mutants of the first type were found to be greatly stimulated by addition of cAMP, confirming the belief that these mutants were defective in the synthesis of cAMP. When extracts from mutants of the second type were used instead, cAMP had no stimulating effect, suggesting that a protein necessary for cAMP action was missing or defective. Further cell-free studies were performed in which extracts from mutants of the second type were used in conjunction with partially purified extracts from a normal strain. Addition of small amounts of extracts from a normal strain reestablished the stimulatory effect of cAMP. The purification of the cAMP receptor protein was monitored with this system. Ultimately, a single protein, which we named CAP because it behaved as a catabolite gene activator protein, was found to be responsible for the effect.

Figure 1 The structure of the CAP regulatory protein. The protein is a dimer containing identical monomers with recognition helices (labeled 3) spaced precisely 34 Å apart along the direction of the DNA helix axis so that they can make identical contacts with adjacent major grooves of the DNA duplex. The cylinders in the figure represent regions of the polypeptide chains that are in the folded α-helical conformation. These cylinders are interconnected by extended polypeptide chains. The arrows indicate the directional sense (N to C) of the regions containing extended polypeptide chains; N and C labels indicate the N- and C-termini of the polypeptide chains.

Figure 2 The CAP-DNA complex. The CAP dimer’s two helix-turn-helix motifs bind in successive major grooves of the DNA. The binding of CAP produces two kinks in the DNA structure, leading to an overall change in direction of the double helix of about 90°. (Figure kindly provided by Dr Thomas Steitz, Yale University.)


Properties of CAP 


Shortly after its isolation, CAP was found to be a dimer composed of identical subunits, each with a molecular weight of 22 000. CAP binds to DNA, and this binding is greatly stimulated in the presence of cAMP. The cAMP alters the conformation of CAP so that it can form a strong complex with DNA at the lac promoter region.

A great deal is known about CAP structure and how it binds to DNA. This information has come mostly from crystallographic investigations conducted by Steitz and his colleagues (Schultz et al., 1991). The CAP homodimer contains recognition helices spaced precisely 34 Å apart along the direction of the DNA helix axis, so that they can make identical contacts with adjacent major grooves in the DNA duplex. The strategy seems clear: the regulatory protein contains two identical half-sites for interaction with two virtually identical half-sites in the DNA.
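Why a 34 Å spacing lets both recognition helices engage adjacent major grooves can be checked with simple arithmetic. This sketch assumes standard B-DNA values of about 3.4 Å rise per base pair and about 10.5 base pairs per helical turn; these are general textbook figures, not values taken from the CAP studies cited.

```python
RISE_PER_BP = 3.4   # angstroms per base pair in B-DNA (assumed textbook value)
BP_PER_TURN = 10.5  # base pairs per helical turn in B-DNA (assumed textbook value)


def spacing_in_turns(spacing_angstrom):
    """Convert a distance along the helix axis into base pairs and helical turns."""
    bp = spacing_angstrom / RISE_PER_BP
    return bp, bp / BP_PER_TURN


bp, turns = spacing_in_turns(34.0)
print(f"34 A is about {bp:.1f} bp, or {turns:.2f} helical turns")
```

A separation of roughly one helical turn places the two recognition helices on the same face of the duplex, one in each of two successive major grooves, which is what allows identical contacts with the two half-sites.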

In prokaryotes such as Escherichia coli, the protein helix segment that recognizes the DNA is part of a larger domain known as the helix-turn-helix motif (Figure 1). A protruding recognition helix is supported by a second segment of helix that stabilizes the recognition helix and fixes its orientation with respect to the remainder of the regulatory protein.

In its active form, the CAP-cAMP dimer forms a complex with a self-complementary 30-base-pair duplex stretch of the DNA (Figure 2). The CAP dimer’s two helix-turn-helix motifs bind in successive major grooves of the DNA. The binding of CAP produces two kinks in the DNA structure, leading to an overall change in direction of the double helix of about 90°. It is not clear how this kinking of the DNA influences the ability of CAP to stimulate transcription of the DNA.
Catabolite repression is a phenomenon observed during growth under conditions where catabolism exceeds anabolism. As the term ‘repression’ implies, the synthesis of many enzymes involved in the quest for food is inhibited at the level of transcription.
The triggering event very often is the availability of 
a rapidly metabolizable carbon source (e.g., glucose) 
which causes repression of the enzymes involved 
directly or indirectly in the utilization of poorer car- 
bon sources and in energy generation. Carbon cata- 
bolite repression is a universal phenomenon found in 
prokaryotic, ranyolic, and eukaryotic organisms. The 
mechanisms by which repression is imposed are quite 
variable. They seem, however, to follow a general 
scheme: complex sensory systems which rely mostly 
on protein kinases and phosphatases sense either 
the intracellular levels or the ratios of glycolytic 
intermediates, and alarmones, i.e., molecules whose 
levels reflect the energization state of a cell. The sensory systems transduce this information to global regulators. These regulators control the transcription and expression of large groups of genes and enzymes, e.g., those for carbohydrate transporters, catabolic metabolism, and other functions related to the quest for food, such as motility, respiration, sporulation, or polymer degradation. The global control mechanisms involved in carbon catabolite repression are as diverse as the systems they control and are not well understood for most types of cells, in particular eukaryotic organisms.


Catabolite Repression and Related 
Regulatory Phenomena 


When offered mixtures of carbon sources, most pro- 
karyotes and lower eukaryotes use one of them pre- 
ferentially. Fast degradation of the preferred or class A 
substrate inhibits the synthesis of the enzymes 
involved in the transport and metabolism of the lesser 
or class B substrates. After the class A carbon source is 
exhausted, repression is relieved, i.e., gene transcription 
for class B enzymes begins and, after a lag, growth 
on the class B substrates starts. The term ‘diauxie’
describes this two-phase growth. The mechanism
which ensures that a cell will prefer the best carbon 
and energy source available is sometimes called the 
glucose effect (repression) when triggered by the rapid 
consumption of glucose. This abundant carbohydrate
is the primary fuel for most cells. However, many 
microorganisms use other preferred substrates, and 
any growth conditions leading to an excess of cata- 
bolism (degradation) over anabolism (macromolecule 
synthesis) can cause repression, so the name ‘carbon 
catabolite repression’ is more appropriate. Even this 
name is a misnomer because it assumes that one or 
more intermediates (catabolites) generated from the 
repressing substrate trigger repression, and that one 
mechanism is responsible for all effects. In reality, 
whether a substrate behaves as a class A or class B 
carbon source is not defined by its chemical structure 
but by the rate at which it enters metabolism. 
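Diauxie can be illustrated with a toy growth model in which the class B substrate is left untouched until the class A substrate runs out, followed by a lag while class B enzymes are induced. All parameter values (substrate amounts, yield, growth rates, lag length) are hypothetical, and the simple Euler time-stepping is chosen for clarity rather than accuracy; the model does not represent any particular repression mechanism.

```python
def simulate_diauxie(a0=5.0, b0=5.0, x0=0.1, yield_coeff=0.5,
                     mu_a=0.8, mu_b=0.4, lag_h=1.0, dt=0.01, t_end=12.0):
    """Toy diauxic growth model; every parameter value is hypothetical.

    a0, b0: initial class A and class B substrate (arbitrary units, a0 > 0)
    x0: initial biomass; yield_coeff: biomass made per unit substrate
    mu_a, mu_b: specific growth rates (per hour) on class A and class B
    lag_h: hours needed to induce class B enzymes once class A is gone
    Returns (hour, biomass) samples taken every hour.
    """
    a, b, x = a0, b0, x0
    b_growth_starts = None               # set when class A runs out
    steps_per_hour = round(1 / dt)
    samples = []
    for step in range(round(t_end / dt) + 1):
        t = step * dt
        if step % steps_per_hour == 0:
            samples.append((round(t), x))
        if a > 0:                        # class A present: class B genes repressed
            growth = mu_a * x * dt
            a -= growth / yield_coeff
            if a <= 0:
                a = 0.0
                b_growth_starts = t + lag_h  # repression relieved; induction lag begins
        elif b_growth_starts is not None and t >= b_growth_starts and b > 0:
            growth = mu_b * x * dt       # second growth phase on class B
            b = max(0.0, b - growth / yield_coeff)
        else:
            growth = 0.0                 # lag phase, or both substrates used up
        x += growth
    return samples


for hour, biomass in simulate_diauxie():
    print(f"{hour:2d} h  biomass {biomass:.2f}")
```

With these values the biomass curve rises on class A, stalls for about an hour after class A is exhausted, and then rises again, more slowly, on class B, which is the two-phase shape the term describes.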

Repression of the catabolic operons may be direct, 
by controlling transcription initiation at promoters; it 
may be permanent and last as long as the repressing 
carbon source is metabolized rapidly; or it may be 
transient, i.e., acting only during and immediately 
following a change in the carbon sources present in 
the medium. Repression may also be caused indirectly 
by a process called inducer exclusion (catabolite in- 
hibition). When taken up rapidly, class A substrates 
inhibit the activity of the transport systems and cata- 
bolic enzymes involved in inducer uptake and synthe- 
sis and thus prevent synthesis of the enzymes for 
lesser substrates; the corresponding genes remain 
uninduced (repressed). Depending on the growth 
conditions that trigger catabolite repression, e.g., an excess of carbon supply or a limitation in nitrogen, phosphorus, or sulfur supply, very different mechanisms may be involved, and these also vary from organism to organism.


Sensory Systems and Global Regulatory 
Networks are Central in Catabolite 
Repression 


Microorganisms directly monitor their surround- 
ings for specific stimuli like carbohydrates using 
membrane-bound sensors (Figure 1). These sensors 
are often transport systems. Together with substrate- 
specific repressors and activators the sensors control ex- 
pression of the genes (operons, regulons) for the meta- 
bolism of the inducing carbohydrates. Alternatively, 
sensors measure pools of intracellular molecules that 
depend on the transport capacity of the cell and indir- 
ectly also reflect the environment. Drastic changes such 
as feast and famine correspond to stress and generate a 
physiological state of alarm. This is often signaled to 
the cell as changes in distinct indicator molecules or 
alarmones synthesized specifically for this purpose 
and perceived through intracellular sensory networks. 
The best-understood example of such a sensory system is the carbohydrate:phosphotransferase system (PTS) of the eubacteria. It comprises a phosphoenolpyruvate (PEP)-dependent histidine protein kinase (EI in Figure 1) whose activity depends on the ratio of PEP to pyruvate, which in turn reflects the ratio of glycolysis to gluconeogenesis. The EI kinase is linked to a series (up to 15 per cell) of membrane-bound and substrate-specific transporters (EIIs in Figure 1) through several targeting subunits or phosphate-transfer proteins. One of these, named IIA^Glc (or IIA^Crr), in its phosphorylated form activates adenylate cyclase (gene cyaA) to synthesize the alarmone cAMP. IIA^Glc is dephosphorylated whenever a PTS carbohydrate, e.g., glucose, is transported. This is because EI-dependent transport is coupled to the phosphorylation of the transported substrate, and hence to the dephosphorylation of the PTS proteins, including IIA^Glc. Unphosphorylated IIA^Glc inhibits non-PTS transporters, e.g., those for lactose, maltose, L-arabinose, and glycerol, thereby causing inducer exclusion. Other targeting subunits may also be involved, e.g., the histidine protein HPr in gram-positive bacteria (see below).

In all microorganisms, sensors that are coupled to protein kinases convert a stimulus into a signal by increasing or decreasing the autophosphorylating activity of the kinase. These changes are perceived by receivers that, in catabolite repression, either modulate regulators involved in gene expression at the transcription level directly, or modulate them indirectly through second messengers (alarmones) such as cAMP. These regulators invariably are global regulators that control large groups of genes, operons, and regulons with a common goal, e.g., the quest for food. Thus, a system of global gene control that responds to physiological alarm states like feast and famine is epistatic (‘superimposed’) over specific control mechanisms. Groups of genes, operons, and regulons coordinately controlled by such epistatic global systems are called ‘modulons’ in bacterial genetics.


Carbon Catabolite Repression involving 
the cAMP-CrpA Global Regulator; the
crpA-Modulon of Enteric Bacteria 


Figure 1 Carbon catabolite repression in enteric bacteria. The phosphoenolpyruvate (PEP)-dependent glucose (Glc):phosphotransferase system (PTS) is shown with its general proteins enzyme I (EI), a PEP-dependent histidine protein kinase, and histidine protein (HPr), the specific proteins IIA^Glc (also called IIA^Crr), IIB^Glc, and the transporter IIC^Glc, as well as various reversible phosphate transfer reactions (solid lines). Under famine conditions, P~IIA^Glc activates the adenylate cyclase (CyaA), which converts ATP to 3’,5’-cyclic adenosine monophosphate (cAMP). This alarmone (second messenger), in a complex with the global activator CrpA, enhances transcription of the crpA modulon and synthesis of all catabolic enzymes (broken arrows, positive signs). During growth on glucose and under feast conditions, which cause a high pyruvate:PEP ratio and dephosphorylation of the PTS proteins, IIA^Glc inhibits transport and metabolism of non-PTS carbohydrates, e.g., lactose and glycerol (broken arrows, negative signs), causing inducer exclusion, and elicits catabolite repression through the failure to activate the crpA modulon.

In enteric bacteria, the phosphorylation state of the PTS proteins, in particular of IIA^Glc, reflects carbohydrate influx into catabolism. The ratio of PEP to pyruvate reflects the catabolic state of the cell. As a consequence, starved cells show high levels of P~IIA^Glc, which activates adenylate cyclase and causes cAMP synthesis. Conversely, nonstarved cells have low cAMP levels. The alarmone cAMP is the co-regulator of the cAMP-binding or receptor protein CrpA (also CRP, CAP; gene crpA), a global regulator for the crpA modulon. This regulator binds to a consensus sequence(s) located within the promoters of catabolite repression-sensitive operons. The cAMP/CrpA complex then interacts with the RNA polymerase α-subunit and acts as an activator of transcription. Mechanistically, the feast conditions and low cAMP concentrations that elicit carbon catabolite repression act through a lack of gene activation. The CrpA binding site may be located further upstream and require a DNA-binding activity intrinsic to CrpA and/or the presence of other regulatory proteins, e.g., specific gene activators involved in specific induction. CrpA may also weaken the binding of specific repressors and cause the recognition of alternative promoters by RNA polymerase. In rare cases it may even act as a repressor and decrease transcription.

Members of the crpA modulon include all catabolic 
operons and regulons, and many systems involved 
in a more general way in the quest for food, e.g., 
those involved in carbon storage, cell motility and 
starvation control. This includes autoregulation of the genes cyaA and crpA, whose expression also depends on cAMP/CrpA. Taken together, these mechanisms, i.e., cAMP/CrpA-dependent catabolite repression and autoregulation, as well as IIA^Glc-mediated inducer exclusion acting on catabolic systems that require a specific inducer for full transcription rates, essentially explain the various regulatory phenomena related to catabolite repression in the enteric bacteria. They all ultimately act by modulating, on the one hand, the intracellular concentration of inducers for specific operons and regulons and, on the other hand, the intracellular amount of the global regulator CrpA with its co-regulator cAMP.
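The two enteric mechanisms just summarized, cAMP/CrpA-dependent activation and IIA^Glc-mediated inducer exclusion, can be caricatured as a Boolean model. This is a minimal sketch under stated assumptions: every state is treated as all-or-none, the function and field names are hypothetical, and the "catabolic operon" is an idealized one that needs both active cAMP/CrpA and its own specific inducer.

```python
from dataclasses import dataclass


@dataclass
class OperonState:
    camp_crpa_active: bool  # cAMP/CrpA complex available to activate transcription
    inducer_inside: bool    # specific inducer has entered the cell
    transcribed: bool       # idealized catabolic operon fully transcribed


def enteric_ccr(glucose_feast: bool, inducer_outside: bool) -> OperonState:
    """Toy Boolean model (hypothetical) of catabolite repression in enteric bacteria."""
    # Feast: rapid PTS uptake of glucose keeps IIA^Glc mostly dephosphorylated;
    # famine: IIA^Glc accumulates as P~IIA^Glc.
    iia_phosphorylated = not glucose_feast
    # P~IIA^Glc activates adenylate cyclase (CyaA), so cAMP is high only in famine.
    camp_high = iia_phosphorylated
    # cAMP is the co-regulator that makes CrpA an active global activator.
    camp_crpa_active = camp_high
    # Dephosphorylated IIA^Glc blocks uptake of non-PTS sugars (inducer exclusion).
    inducer_inside = inducer_outside and iia_phosphorylated
    # The idealized operon needs both global activation and specific induction.
    transcribed = camp_crpa_active and inducer_inside
    return OperonState(camp_crpa_active, inducer_inside, transcribed)


for feast in (True, False):
    print(feast, enteric_ccr(glucose_feast=feast, inducer_outside=True))
```

Under feast conditions both arms fail, so the idealized operon stays off even with its inducer in the medium; real cells show graded rather than binary responses.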


Carbon Catabolite Repression Involving the Global Regulator CcpA in Bacillus subtilis


Gram-positive bacteria possess an ATP-dependent serine protein kinase, HprK, whose activity is modulated during feast and famine conditions by glycolytic intermediates, e.g., fructose bisphosphate (Figure 2). One of its substrates is the targeting subunit HPr, and perhaps other HPr-like proteins of the PTS (see above). When HPr is in the dephosphorylated form, i.e., under feast conditions, it can be phosphorylated at a conserved serine residue by the activated serine kinase. In this state, HPr becomes refractory to phosphorylation by EI at the histidine residue, and hence inactive in transport (‘inducer exclusion’). At the same time, it can now modulate the activity of a global regulator called catabolite control protein A, or CcpA, by either increasing or decreasing its binding to a consensus binding site (cre, for catabolite response elements) located in or close to the promoters of all members of the ccpA modulon. CcpA can act as a repressor (normally) or as an activator (rarely) of gene transcription, i.e., it acts as a mirror image of CrpA.

Figure 2 Carbon catabolite repression in Bacillus subtilis. The glucose-PTS is shown as in Figure 1. Under feast conditions, e.g., during growth on glucose, glycolytic intermediates accumulate that activate an ATP-dependent serine protein kinase (HprK). This kinase phosphorylates free HPr, i.e., molecules that are not phosphorylated at a histidine residue, at a serine residue. Thus activated, HPr-Ser-P forms a complex with the global repressor CcpA, which under famine conditions is inactive; the complex represses the ccpA modulon and the synthesis of (most) catabolic enzymes, thus causing carbon catabolite repression. A putative HPr phosphatase (HprP) ends the process and generates free HPr.

In addition to enzymes of carbon metabolism, carbon storage, and starvation survival, those involved in extracellular polymer degradation, cell adhesion (including motility), and especially sporulation are all subject to catabolite repression. Beyond the PTS and the CcpA-dependent global control, many other mechanisms are involved in catabolite repression and form a complex regulatory network. For example, CcpA seems to be part of the link that couples catabolite repression triggered by carbon starvation to that triggered by nitrogen starvation, a coupling that in enteric bacteria requires alternative PTS proteins. In this case, the PTS sensory system seems to allow cross-regulation between a global regulatory network for the catabolism of carbon sources and one for nitrogen metabolism.
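The HprK/HPr/CcpA cascade described for B. subtilis can be sketched in the same all-or-none style. This is a hypothetical toy model, not a published one: it treats CcpA only in its normal repressor role, ignores its rare activator mode and the putative HprP phosphatase, and assumes that in the absence of active HprK all HPr is kept phosphorylated on the histidine by EI.

```python
from enum import Enum


class HPrForm(Enum):
    HIS_P = "HPr-His~P"  # phosphorylated by EI: active in PTS transport
    SER_P = "HPr-Ser-P"  # phosphorylated by HprK: refractory to EI


def bsubtilis_ccr(glycolytic_intermediates_high: bool) -> dict:
    """Toy all-or-none model (hypothetical) of the HprK/HPr/CcpA cascade."""
    # Feast: intermediates such as fructose bisphosphate activate HprK.
    hprk_active = glycolytic_intermediates_high
    # Active HprK phosphorylates free HPr on the serine; otherwise we assume
    # (a simplification) that EI keeps HPr phosphorylated on the histidine.
    hpr = HPrForm.SER_P if hprk_active else HPrForm.HIS_P
    # HPr-Ser-P complexes with CcpA, and the complex binds cre sites.
    ccpa_bound = hpr is HPrForm.SER_P
    return {
        "hpr_form": hpr.value,
        "hpr_active_in_transport": hpr is HPrForm.HIS_P,
        "ccpa_modulon_repressed": ccpa_bound,
    }


print(bsubtilis_ccr(True))
print(bsubtilis_ccr(False))
```

Compared with the enteric sketch, the sign is inverted: here the feast signal switches a repressor on, rather than failing to switch an activator on, which is the "mirror image" relationship noted in the text.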


Catabolite Repression in the Yeast Saccharomyces cerevisiae


Carbon catabolite repression in S. cerevisiae, although 
more complex because of the multitude of regulators 
and their ancillary proteins, basically follows the same 
strategy as in bacteria, particularly where glucose 
is involved. Glucose in the medium is sensed by 
membrane-bound proteins that either have sensing 
and transporter activity (Hxt), or have lost the trans- 
porter activity (e.g., Rgt2/Snf3). There are high- 
affinity (Hxt1-4) and low-affinity (Hxt1) transporters 
whose synthesis is controlled by the glucose concen- 
tration in the medium. As in bacteria, it is the rate with 
which glucose enters metabolism that defines the level 
of catabolite repression. At least two different glucose 
sensing and signal transduction pathways are in- 
volved: one for induction of genes involved in glucose 
transport and glycolysis, the other for repression of 
the genes under catabolite repression control, e.g., those 
for proteins in the respiratory pathways and for the 
utilization of lesser carbon sources. Both pathways involve a multitude of sensors (e.g., Hxt1-4; Rgt2; Snf3; Gpr1, Gpa2), protein kinases/phosphatases and transmitters (e.g., Snf1 to Snf4 kinases; cAMP-dependent PKA kinases; Glc7; Grr1; Reg1), adenylate cyclase (Cyr1) and the second messenger cAMP, and several general and specific gene regulators (e.g., Rgt1; SCF^Grr1; Mig1). A complete picture of the mechanisms involved in glucose induction and repression cannot yet be given, mainly because it is not clear which of the plethora of regulators identified thus far act directly and which act only indirectly. Hexokinase 2 (Hxk2), for example, plays a major role in catabolite repression when cells grow with abundant glucose, but probably only because it drastically alters the AMP:ATP ratio and thereby the activity of, e.g., AMP-activated or ATP-dependent protein kinases/phosphatases (e.g., Snf2) involved in gene expression. Thus, although we are beginning to understand catabolite repression in bacteria, our understanding of it in even the lower eukaryotes is still very incomplete.
Cattanach’s translocation was first described in 1961. It is a complex mouse chromosome rearrangement in which a central region of chromosome 7 has been inserted into the X chromosome. The autosomal loci in the inserted segment are variably subject to X inactivation and additionally show a variegated-type position effect. The insertion is inverted relative to the centromere, and for this reason recombinants are not recovered. The original variegated female arose in a cross of a mutagenized wild-type male with a female of a mutation-testing stock homozygous for recessive alleles at seven visible loci. Notably, these included alleles at the pink-eyed dilution (p) and chinchilla (c^ch) loci.


Chromosomal Breakpoints 


On the physical map, the central chromosome 7 region is inserted into the X chromosome at the junction of Giemsa bands XF1 and XF2. On the linkage map, it lies closely distal to the mottled (Mo) locus. No loss of X-chromosome material has been detected. Because the insertion comprises about one-third of chromosome 7, the rearranged X is the longest chromosome in the complement (14% longer than the longest normal chromosome, chromosome 1), and it therefore provides a good cytogenetic marker. The proximal and distal chromosome 7 breakpoints lie at G bands 7B-C and 7E2, respectively, on the physical map, and on the linkage map between ruby-2 (ru2) and quivering (qv) and between shaker-1 (sh1) and hemoglobin B-chain (Hb), respectively. Although at pachytene of male meiosis the central region of the normal chromosome 7 is regularly seen to form a loop and pair homologously with the insertion within the X, multivalent associations are seen at diakinesis with a frequency of less than 10%, in accord with the rarity of anaphase bridges.


Inheritance 


Two forms of the translocation exist, the original 
balanced form (Type I) and its unbalanced derivative 
(Type II). The latter has three copies of the central 
chromosome 7 region (two in normal chromosome 7s 
and one in the X). Type I females are fertile, but the 
equivalent male is commonly sterile on genetic back- 
grounds other than that of its origin. Type II females and 
males are almost invariably fertile, although the females 
are prone to have imperforate vaginas. The other un- 
balanced derivative, which is deficient for the central 
region of chromosome 7, dies early in development. 


Phenotypes 


The wild-type alleles of the coat and eye color genes, 
ru2, p and c lie within the insertion. These are liable to 
be inactivated when the rearranged X is inactivated in 
the normal process of X-inactivation. When recessive 
alleles at one or more of the three loci are present on 
the normal chromosome 7(s), variegated coat and eye 
colors may be seen. In Type I females, the variegating coat color is that of the hemizygote for the alleles on the single normal chromosome 7, whereas in Type II females it is that of the compound of the alleles on the two normal chromosome 7s. The difference provides one means of distinguishing Type I and Type II females. In males, both the Type I and Type II classes, having only a single active X, are phenotypically wild-type. They can be distinguished from each other with some reliability because the latter become growth-retarded after birth and have a reduced viability that depends on the genetic background. Chromosome 7 markers allow chromosomally normal segregants to be distinguished.


Sex Chromosome Aneuploids and 
Homozygotes 


Elevated frequencies of X-Y nondisjunction and sex 
chromosome loss occur in both Type I and Type II 
males. Nonvariegated XO daughters lacking the pater- 
nal rearranged X are therefore commonly produced, 
and variegated XXY males which have inherited the 
paternal rearranged X as well as the Y are also found. 
XO females carrying a single rearranged X can be 
generated and, of necessity, are phenotypically wild- 
type. Type II homozygous females die early in develop- 
ment but the Type I/Type II compound is viable 
and wild-type. 


Position Effect Variegation 


Although the variegated coat phenotypes of the 
translocation heterozygotes are primarily caused by 
random inactivation of one or other of their X chromo- 
somes, a Drosophila-type position effect variegation 
also occurs. Thus, loci that are closer to a breakpoint 
are more likely to be inactivated when the X is inactiv- 
ated and heterochromatic than those located further 
away. This has been demonstrated in several ways. 
First, c variegation is typically more extensive than p variegation, the p locus lying more centrally in the insertion. Second, in mice with the balanced form of the translocation showing variegation for both the p and c loci, three colors are found: wild-type, the white of the compound p c^ch/deficiency, and the brownish color attributable to c^ch/deficiency. This implies that the p locus can be active in cells in which the c locus is inactivated. Third, p and/or c variegation still occurs, if at low levels, when the Is1Ct X is forced, through the addition of the t(16;X)16H translocation, to be the inactive X in all cells. Again, pigmented areas are more evident when the p locus is studied.

Variegation at ru2, which lies at the opposite end of the insertion from the c locus, is also more evident than that at the p locus, indicating that the inactivation effects of the inactive X spread into the insertion from both ends.

Studies on aging animals have shown that the position effect variegation reverses with age. This has been most clearly demonstrated with t(16;X)16H/Is1Ct compounds that show extensive c variegation; the white c areas progressively darken with age, eventually appearing slate colored. Hair-plucking studies have shown that this is related not to cell cycle number but to a true temporal effect. This time-related reactivation suggests that mouse position effect variegation is not attributable to a progressive spread of inactivation into the insertion from the heterochromatic X, as hypothesized for Drosophila position effect variegation, but, conversely, results from the progressive reactivation of previously inactivated autosomal loci.


Uses of Translocation 


The translocation has been extensively used in diverse genetic and cytogenetic studies. It provided the first recorded examples of the XXY condition in the mouse, and selection studies on the levels of variegation ultimately led to the recognition of the Xce locus and its control of the randomness of X inactivation. Xce effects have been investigated in both fetal and placental tissues using the translocation. The translocation has also been used in (1) diverse X-inactivation studies, (2) comparisons of X-inactivation-based and chimera-based variegation, (3) biochemical studies of gene dosage at the c locus, (4) creating flow-sorted X chromosome libraries, (5) duplication mapping, (6) eye pigmentation studies, and (7) investigations of the influence of pigmentation on the retinofugal pathways; as a long marker chromosome, it has also been used in (8) cytogenetic studies on X inactivation and (9) investigations of the single-cell origin of induced tumors. More recently, the translocation has been used to generate maternal duplication/paternal deficiency for the central region of chromosome 7, creating a mouse model of the human imprinting condition Prader-Willi syndrome. Currently, unbalanced (Type II) animals are being used to investigate the basis of autism found in humans with additional copies of the homologous region.
Complementary DNA (cDNA) is DNA synthesized on an RNA template by the action of reverse transcriptase (RNA-dependent DNA polymerase). The sequence of the cDNA is complementary to that of the RNA. Unlike RNA, cDNA can easily be cloned (yielding ‘cDNA clones’) by making it double-stranded and ligating it into a vector DNA. Because sequence analysis of DNA is much easier than that of RNA, cDNA is the essential form in the analysis of RNA, particularly of eukaryotic mRNA.
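The complementarity rule can be made concrete with a short sketch that writes out the first-strand cDNA for a made-up RNA, and then the second strand used to make the cDNA double-stranded. It models base pairing only; primers, enzymes, and cloning steps are not represented, and the function names are hypothetical.

```python
# Base that reverse transcriptase incorporates opposite each RNA base.
RNA_TO_DNA = {"A": "T", "U": "A", "G": "C", "C": "G"}
DNA_TO_DNA = {"A": "T", "T": "A", "G": "C", "C": "G"}


def first_strand_cdna(rna: str) -> str:
    """First-strand cDNA, written 5'->3', for an RNA written 5'->3'.

    The new strand is antiparallel to its template, so each base is
    complemented and the order reversed to report it 5'->3'.
    """
    return "".join(RNA_TO_DNA[base] for base in reversed(rna.upper()))


def second_strand(cdna: str) -> str:
    # Complement of the first strand: the RNA sequence with T in place of U.
    return "".join(DNA_TO_DNA[base] for base in reversed(cdna))


mrna = "AUGGCCUAA"
cdna = first_strand_cdna(mrna)
print(cdna)                 # TTAGGCCAT
print(second_strand(cdna))  # ATGGCCTAA
```

The second strand reproduces the mRNA sequence in DNA form, which is why sequencing a cDNA clone reveals the sequence of the original mRNA.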

Eukaryotic genes are fragmented into exons in the genomic DNA by the presence of intron sequences. When a gene is expressed, the entire gene region, including the intron sequences, is initially transcribed into RNA. The introns are then removed (a process called ‘splicing’) to generate mature mRNA, which has a continuous set of triplets (three-base genetic codons) corresponding to the amino acid sequence of the protein product. The pattern of splicing can vary, leading to the production of different proteins from a single gene; this information is obtained mainly from cDNA analysis. Finally, cDNA clones are used for the production of proteins in suitable expression systems such as bacteria, yeast, or animal cells.
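Splicing and alternative splicing can be illustrated by joining exon intervals of a toy pre-mRNA and reading the result as codons. This is a sketch under obvious simplifications: the sequence is invented, and the exon coordinates are supplied by hand rather than found from splice signals, as a real spliceosome would do.

```python
def splice(pre_mrna: str, exons: list[tuple[int, int]]) -> str:
    """Join the given exon intervals (0-based, end-exclusive), dropping introns."""
    return "".join(pre_mrna[start:end] for start, end in exons)


def codons(mrna: str) -> list[str]:
    # Read the mature mRNA as consecutive, non-overlapping triplets.
    usable = len(mrna) - len(mrna) % 3
    return [mrna[i:i + 3] for i in range(0, usable, 3)]


# Hypothetical pre-mRNA: exons in upper case, introns (GU...AG) in lower case.
pre = "AUGGCC" + "guaaag" + "GAAUUC" + "guuuag" + "UGGUAA"
exon1, exon2, exon3 = (0, 6), (12, 18), (24, 30)

full = splice(pre, [exon1, exon2, exon3])  # all exons joined
skipped = splice(pre, [exon1, exon3])      # exon 2 skipped: alternative product
print(codons(full))     # ['AUG', 'GCC', 'GAA', 'UUC', 'UGG', 'UAA']
print(codons(skipped))  # ['AUG', 'GCC', 'UGG', 'UAA']
```

Skipping exon 2 removes two codons while keeping the reading frame, so one gene yields two related protein products; a cDNA made from each mRNA would show exactly which exons were joined.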


See also: DNA Cloning; Reverse Transcription 
In the first half of the nineteenth century, accumulating evidence from microscopic observations led to the recognition of cells as the fundamental building blocks of plant and animal tissues, and raised the question of how new cells are made. Pioneering microscopists witnessed the birth of new cells from the division of pre-existing cells, and repeated observation of cell division in many tissues firmly established that all cells arise by the division of parental cells. Indeed, over 150 years later, our appreciation of the complexity of cell organization makes it inconceivable that cells could arise in any other way. The means whereby self-replicating cells first evolved in the primeval soup remains one of the deepest mysteries in the origin of life, and even with our rapidly increasing technology and knowledge about what cells are made of, the goal of synthesizing even the simplest living cell from scratch using chemical ingredients is likely to remain unattainable for the foreseeable future. Even the copying of a cell’s contents and their distribution to produce two daughter cells is a stunning feat requiring exquisite coordination. The set of carefully orchestrated steps by which proliferating cells make copies of themselves constitutes the cell cycle.


Duplication, Segregation, and Division 


We can distinguish three processes that all cells must 
complete in order to proliferate: cellular constituents 
must be duplicated, the duplicated sets of constituents 
must be spatially segregated from each other, and the 
parent cells must then divide in two so that each 
daughter cell inherits all necessary ingredients from 
the parent. In most cases these three processes occur 
sequentially, and the cell cycle in eukaryotic cells (cells 
that have a nucleus) has been divided into ‘interphase,’ 
when cell contents are duplicated, ‘mitosis,’ when du- 
plicated components are segregated from each other, 
and ‘cytokinesis,’ when the cell physically divides. 
Interphase consumes the lion’s share of the cell cycle, 
whereas in most cells mitosis takes less than 10% of 
the cell cycle time, and cytokinesis is even more rapid. 


Interphase 

By far the most intensively studied of the duplication events occurring in interphase is DNA replication. DNA replication occurs during a restricted portion of interphase called S (synthesis) phase; the periods before (G1) and after (G2) S-phase are called ‘gap’ phases, as early studies did not detect any obvious changes occurring during those periods of the cell cycle. However, we now recognize that much of the information processing that coordinates successful proliferation occurs during these G1 and G2 phases (see below).

DNA in eukaryotic cells is present in a set of chromosomes, linear segments of DNA each of which codes for hundreds or thousands of genes. Given the rate at which DNA polymerases copy DNA, a single polymerase would take several days to copy a large chromosome, but S-phase (during which all chromosomes are copied) typically takes less than 8 h in mammalian
cells, and can even be completed in under 10 min in 
early fly embryos. This speed-copying is accom- 
plished by using many polymerases to copy each 
chromosome. For this strategy to succeed, each poly- 
merase must copy a distinct segment of the chromo- 
some, and no polymerase can be allowed to recopy 
a segment that has already been copied. But how can 
a polymerase distinguish whether it is appropriate to 
copy a particular segment? 
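The argument for many polymerases is a matter of arithmetic. The sketch below uses hypothetical round numbers (a 20 Mb chromosome and a fork rate of 3 kb per minute; neither figure comes from the text) and assumes every origin fires at the very start of S-phase, so the origin counts it returns are lower bounds; origins that fire later in S-phase would raise them.

```python
import math


def single_fork_days(length_bp: float, fork_rate_bp_per_min: float) -> float:
    # Time for one fork (one polymerase) to copy the whole chromosome.
    return length_bp / fork_rate_bp_per_min / (60 * 24)


def min_origins(length_bp: float, fork_rate_bp_per_min: float, s_phase_min: float) -> int:
    # Each origin sends two forks in opposite directions; if all origins fire
    # together at the start of S-phase, one origin can copy at most
    # 2 * rate * time base pairs before S-phase ends.
    span_per_origin = 2 * fork_rate_bp_per_min * s_phase_min
    return math.ceil(length_bp / span_per_origin)


# Hypothetical round numbers chosen only for illustration.
LENGTH_BP = 20_000_000  # a 20 Mb chromosome
RATE = 3_000            # 3 kb per minute per fork

print(round(single_fork_days(LENGTH_BP, RATE), 1))  # 4.6
print(min_origins(LENGTH_BP, RATE, 8 * 60))         # 7    (8 h S-phase)
print(min_origins(LENGTH_BP, RATE, 10))             # 334  (10 min S-phase)
```

With these placeholder values a lone fork needs several days, while a 10-minute S-phase of the kind seen in early fly embryos would demand hundreds of origins on a single chromosome.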
The polymerases are loaded onto DNA at specialized sites on the chromosomes called origins of replication. Once loaded, polymerases start copying DNA
moving away from the origins in both directions, and 
keep going until they meet a polymerase coming the 
other way or until they reach the end of the chromo- 
some. Some origins initiate replication early in S-phase, 
while others initiate replication later in S-phase, but a 
single origin never initiates replication more than once 
during a particular S-phase. This behavior helps to 
guarantee that polymerases don’t start rereplicating 
DNA, and raises the question of how an origin 
‘knows’ that it should not start another round of 
replication. In addition, how is it that the same origin 
is once again allowed to initiate replication in the next 
S-phase? Furthermore, why is it that origins do not initiate replication during parts of the cell cycle other than S-phase? One of the most satisfying advances in
cell cycle research during the past decade has been the 
discovery of how cells ensure that all parts of every 
chromosome are copied once and only once in each 
cell cycle (see below). 
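The behavioral rules stated above can be encoded directly. An origin fires only in S-phase and at most once per S-phase, and forks run until they meet a fork from a neighboring origin or reach a chromosome end. In the sketch below, the `permitted` flag is a hypothetical abstraction of those rules, not the molecular mechanism, and the fork-meeting calculation assumes that all origins fire at once and all forks move at the same speed.

```python
class Origin:
    """Toy replication origin obeying a 'fire at most once per S-phase' rule."""

    def __init__(self, position: int) -> None:
        self.position = position
        self.permitted = False  # hypothetical flag: may fire in the next S-phase

    def new_cycle(self) -> None:
        # Before S-phase the origin regains permission to fire (but does not fire yet).
        self.permitted = True

    def try_fire(self, in_s_phase: bool) -> bool:
        # Fires only during S-phase and only if permission has not been used up.
        if in_s_phase and self.permitted:
            self.permitted = False  # cannot fire again until the next cycle
            return True
        return False


def replicated_segments(positions: list[int], length: int) -> dict[int, tuple[float, float]]:
    """Stretch copied by each origin's two forks, assuming simultaneous firing and
    equal fork speeds, so neighboring forks meet midway; the outermost forks run
    to the chromosome ends."""
    pos = sorted(positions)
    segments = {}
    for i, p in enumerate(pos):
        left = 0.0 if i == 0 else (pos[i - 1] + p) / 2
        right = float(length) if i == len(pos) - 1 else (p + pos[i + 1]) / 2
        segments[p] = (left, right)
    return segments


origins = [Origin(p) for p in (10, 40, 90)]
for o in origins:
    o.new_cycle()
print([o.try_fire(in_s_phase=True) for o in origins])  # [True, True, True]
print([o.try_fire(in_s_phase=True) for o in origins])  # [False, False, False]
print(replicated_segments([o.position for o in origins], 100))
# {10: (0.0, 25.0), 40: (25.0, 65.0), 90: (65.0, 100.0)}
```

Because the segments tile the chromosome without overlap and no origin can fire twice, every stretch is copied exactly once in this model.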

The microtubule organizing center must also be 
precisely duplicated once per cell cycle. Microtubules 
are cytoskeletal polymers (long hollow fibers of 
stacked protein subunits) that help to determine cell 
shape during interphase and that reorganize during 
mitosis to make a remarkable apparatus that segre- 
gates the duplicated chromosomes to opposite ends 
of the cell (see below). The microtubule organizing 
center (either a centrosome or a spindle pole body 
depending on the species) nucleates polymerization 
of microtubules, and hence determines the spatial dis- 
tribution of these fibers within the cell. Newborn cells 
have a single microtubule organizing center that is 
duplicated at a characteristic time (frequently at the 
beginning of S-phase) during interphase. It appears 
that common signals trigger both DNA replication 
and duplication of the microtubule organizing center, 
although the detailed mechanism underlying this 
duplication remains mysterious. 

Duplication of other components of the cell need 
not be as exact as it must be for DNA, perhaps 
suggesting that duplication of these components is 
more or less automatic, requiring little in the way of 
sophisticated coordination. Nevertheless, it is astonishing that each of the cell’s many components is (approximately) doubled during each interphase. Studies on yeast cells suggest that some of these duplication events may be linked: if delivery of new
membrane to the cell surface (needed to double the 
amount of plasma membrane) is blocked, then a signal 
is sent to halt production of new ribosomes (the cyto- 
plasmic factories that synthesize proteins). This may 
represent just the tip of the iceberg of self-monitoring 
capabilities that cells employ to coordinate duplication of their various constituents.


Mitosis 

Once cell constituents are duplicated they must be 
segregated into two distinct portions of the cell that 
will become the daughter cells upon cell division. This 
is a particularly heroic task in the case of the chromo- 
somes. For instance, human cells contain 46 chromo- 
somes (two homologs each of 22 distinct autosomes 
plus two sex chromosomes) that are duplicated during 
interphase, yielding a total of 92 chromosomes. How 
do cells ensure that exactly 46 chromosomes (and the 
right ones, not a jumbled assortment!) are received by 
each daughter cell? 

A key to the success of this endeavor is the con- 
struction of a segregation machine called the mitotic 
spindle, built with microtubules (Figure 1). Micro- 
tubule organizing centers, duplicated during inter- 
phase, move apart from each other near the beginning 
of mitosis and organize microtubules into a bipolar 
array in which fibers from each pole spread out into 
the cell interior. In animal and plant cells, the nuclear 
envelope breaks down into vesicles, allowing the 
microtubules to gain access to the chromosomes (pre- 
viously sequestered within the nucleus). In fungi, the 
spindle pole bodies are embedded within the nuclear 
envelope and are able to grow microtubules directly 
into the nucleus, contacting chromosomes without 
need for nuclear envelope breakdown. A specialized 
region of each chromosome recruits many proteins to 
form the ‘kinetochore,’ a grasping hand that can cap- 
ture and hold onto microtubules, thereby forming 
connections between the chromosomes and the spin- 
dle poles (Figure 2A, B). 

A second key to successful chromosome segrega- 
tion is that replicated chromosomes don’t drift apart 
and become lost in a free-for-all within the nucleus. 
Rather, during S phase a sticky protein-based ‘cohe- 
sion’ is established so that the two copies of each 
chromosome (called ‘sister chromatids’) remain joined 
to each other along their length. During mitosis 
the chromosomes become highly condensed, and the 
kinetochores of sister chromatids are stuck back-to- 
back so that the kinetochore of each sister cannot 
interact with microtubules coming from the direc- 
tion of the other sister (Figure 2A). This promotes 
bipolar attachment of the chromosomes to the 
spindle, so that for each pair of sister chromatids 
one kinetochore is connected to one pole, while the 
sister kinetochore is connected to the other pole 
(Figure 2C). 

Once all of the chromosomes have attained a bipolar attachment to the spindle, the cohesion between sister chromatids dissolves in a concerted manner for all chromosomes, and the sisters move toward opposite sides of the cell (Figure 2D). This involves both the movement of chromosomes toward the spindle poles and the movement of the spindle poles away from each other toward opposite ends of the cell (Figures 1 and 2D). When the spindle poles and associated sets of chromosomes are segregated to opposite
ends of the cell, the changes induced during mitosis are 
reversed: kinetochores detach from the microtubules, 
the nuclear envelope reforms around the chromo- 
somes, and the chromosomes decondense in prepar- 
ation for the next interphase. 

For this extraordinarily accurate segregation plan 
to work, it is critical that cohesion be maintained until 
all kinetochores have been properly connected to the 
spindle. Yet, the interaction of microtubules with 
kinetochores occurs by chance: It may occur quickly 
if all chromosomes happen to be conveniently placed 
when the nuclear envelope breaks down, or it may 
occur much more slowly if one or more chromosomes 
have drifted to peripheral regions rarely visited by 
microtubules. If cohesion between sisters were dis- 
solved while such chromosomes were still misplaced, 
then both of the lost sister chromatids might end up in 
one of the daughter cells, with the other daughter 
getting no copies of that chromosome. This kind 
of mitotic mistake is remarkably rare, because cells 
monitor correct attachment of sister chromatids to 
the mitotic spindle and do not dissolve the cohesion 
between sisters until all sisters are properly aligned 
(see below). 
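Why waiting for bipolar attachment matters can be shown with a small Monte Carlo sketch. It assumes, hypothetically, 46 pairs of sister chromatids; a fixed per-step chance that an unattached kinetochore captures a microtubule from the pole it faces (back-to-back kinetochores, as described above); and that an unattached sister ends up in a random daughter when cohesion dissolves. Parameter names and values are illustrative, and real cells also correct faulty attachments, which is not modeled.

```python
import random


def simulate_mitosis(n_pairs: int, checkpoint: bool, seed: int,
                     capture_prob: float = 0.2, fixed_steps: int = 3) -> bool:
    """Return True if each daughter receives exactly one copy of every chromosome."""
    rng = random.Random(seed)
    # Back-to-back sister kinetochores: sister 0 faces pole facing[i],
    # sister 1 faces the opposite pole, so each can only capture from its side.
    facing = [rng.choice((0, 1)) for _ in range(n_pairs)]
    attached = [[False, False] for _ in range(n_pairs)]

    steps = 0
    while True:
        all_bipolar = all(a and b for a, b in attached)
        if checkpoint and all_bipolar:
            break  # monitored: cohesion dissolves only now
        if not checkpoint and steps >= fixed_steps:
            break  # unmonitored: cohesion dissolves on a fixed schedule
        for pair in attached:
            for sister in (0, 1):
                if not pair[sister] and rng.random() < capture_prob:
                    pair[sister] = True  # chance microtubule capture
        steps += 1

    # Cohesion dissolves. Attached sisters go to the pole they face;
    # unattached ("lost") sisters end up in a random daughter (an assumption).
    counts = [[0] * n_pairs for _ in range(2)]  # counts[daughter][pair]
    for i, pair in enumerate(attached):
        for sister in (0, 1):
            pole = facing[i] if sister == 0 else 1 - facing[i]
            daughter = pole if pair[sister] else rng.choice((0, 1))
            counts[daughter][i] += 1
    return all(c == 1 for daughter in counts for c in daughter)


print(sum(simulate_mitosis(46, checkpoint=True, seed=s) for s in range(100)))  # 100
print(sum(simulate_mitosis(46, checkpoint=False, seed=s) for s in range(100)))
```

With monitoring, every run segregates correctly however long capture takes; without it, at least one of the 46 pairs is almost always still partly unattached when cohesion dissolves, so correct divisions become rare.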

As with duplication, segregation of other cell 
components need not be as exact as that of the micro- 
tubule organizing centers and chromosomes. Never- 
theless, a number of other events occur during mitosis 
to ensure the more-or-less equal partitioning of mem- 
brane, cytoplasm, and organelles to daughter cells. 
Animal cells, which often have very asymmetric and 
irregular shapes during interphase, are usually re- 
shaped into symmetrical spheres (through remodeling 
of another set of fibers called actin filaments) during 
mitosis, which allows the fairly equal partitioning 
of the cytoplasm during cytokinesis. Some mem- 
brane-bounded organelles disintegrate into smaller 
vesicles that are distributed in sufficient numbers to 
ensure that each daughter inherits an adequate com- 
plement from which to rebuild a functional organelle. 
Many of these vesicles also appear to hitch a ride on the 
mitotic spindle to assist in ensuring that approxi- 
mately equal numbers of vesicles end up in each 
daughter cell. 


Cytokinesis 
Cytokinesis, the physical division of one parent cell into two daughters, begins during or after chromosome segregation. The mechanisms of cytokinesis appear to be quite variable depending on the cell type. Most animal cells have an easily deformable plasma membrane and divide by changing cell shape, constricting a ring of actin and myosin filaments (similar to those that power muscle movements) to generate a ‘cleavage furrow’ that pinches the cell in two (Figure 3A). Plant, fungal, and bacterial cells, in contrast, are surrounded by a rigid cell wall and divide by synthesizing a new cell wall or septum that bisects the parental cell (Figure 3B). In all cases it is critical that the plane of cell division be perpendicular to the axis of the mitotic spindle. How is this spatial coordination accomplished?

Figure 1 Mitosis. Photographs of rat kangaroo kidney cells at different stages of mitosis, showing the positions of the chromosomes (seen by phase contrast microscopy; left panels) and microtubules (made visible by using fluorescent antibodies to decorate the fibers; right panels). In the top cell, the microtubule organizing centers have moved apart and the chromosomes are starting to condense within the nucleus at the beginning of mitosis. In the middle cell, the nuclear envelope has disassembled and microtubules from the two poles have contacted the kinetochores of most of the sister chromatids, whereas in the bottom cell the cohesion between the sisters has been dissolved and the chromosomes are moving toward opposite poles of the spindle. Note how the sister chromatid pairs in the middle cell are thicker than the segregating individual chromatids in the bottom cell. Also, the distance between the poles is increasing (compare the middle and lower cells) as the poles move to opposite ends of the cell. (Pictures kindly provided by Julie Canman and Ted Salmon, University of North Carolina, Chapel Hill, NC, USA.)

Experiments in which the mitotic spindle was physically displaced in large animal cells revealed that the position of the spindle dictated the position of the cleavage furrow. This coupling appears to be mediated by a signal emanating from the center of the mitotic spindle (where the microtubules from each pole interdigitate) and acting on the local cell cortex to assemble the actin-myosin ring and initiate furrowing in the right place. In contrast, budding yeast cells always divide at the ‘neck’ between mother and bud; in essence, these cells build a daughter cell (the bud) adjacent to the parent, segregate components into the bud, and form a septum at the neck once segregation is complete. In this case, it is the spindle that must orient along the predetermined mother-bud axis so that it is perpendicular to the cleavage plane. Spindle orientation is achieved through pulling of the spindle pole bodies by attached cytoplasmic microtubules that interact with cortical cues established by the actin cytoskeleton (which is polarized along the mother-bud axis). Thus, in animal cells actin responds to spatial information from microtubules in the spindle, while in yeast cells microtubules respond to spatial information from the actin cytoskeleton.

Figure 2 Chromosome segregation. (A) Schematic of a fully condensed pair of sister chromatids during mitosis. Because microtubule fibers are rigid, they tend to assemble in straight lines or shallow curves, making it very unlikely that microtubules from one pole will loop around to contact both sister kinetochores. (B) Chance encounters between microtubules and kinetochores promote capture of the microtubule and formation of a stable linkage between the chromosome and the spindle pole. (C) Eventually, chance encounters will lead to ‘bipolar’ attachments in which sister kinetochores grasp microtubules from opposite poles. This leads to pulling forces attempting to separate the sister chromatids, generating tension at the kinetochores. (D) Once all of the chromosomes attain a bipolar attachment, the cohesion that keeps sister chromatids together is dissolved and the chromosomes are pulled to opposite poles as the poles also move apart to opposite ends of the cell. (Adapted with permission from a figure by Bruce Nicklas, Duke University, NC, USA.)


Timing: A Cell Cycle Clock 


How do cells know when their contents have been 
duplicated and it is time to enter mitosis? How do cells know when segregation has been accomplished and it
is time to begin cytokinesis? What is it that triggers 
DNA replication (the onset of S-phase) within inter- 
phase? Early answers to these questions came from 
clever experiments in which the plasma membranes of 
animal cells at different cell cycle stages were fused 
together, yielding one big cell with two nuclei and 
mixed cytoplasm. Fusion of an interphase cell with a 
cell in mitosis caused the interphase nucleus to enter 
mitosis immediately (and prematurely), indicating the 
presence in mitotic cells of a diffusible factor that 
triggered entry into mitosis. Similarly, fusion of an 
S-phase cell with a cell in G1 caused the G1 nucleus 
to begin replicating DNA prematurely, indicating the 
presence of another diffusible factor that triggered 
entry into S-phase. The molecular identities of these 
and other factors controlling the timing of cell cycle 
events emerged from studies on the cell cycles of early 
embryos and of unicellular yeasts. 




Figure 3 Cytokinesis. (A) Cytokinesis in animal cells is illustrated with pictures of a sand dollar egg undergoing its 
second cleavage. The spindle poles are separating in the left panel, and the cell surface is starting to invaginate in the 
second panel, forming a cleavage furrow that ingresses until the cells are pinched in two. (Reproduced with 
permission from Rappaport R (1996) Cytokinesis in Animal Cells. Cambridge: Cambridge University Press.) 
(B) Cytokinesis in plant cells is illustrated with pictures of Tradescantia stamen hair cells. Chromosome segregation 
is occurring in the left panel, and the beginning of a cell plate or septum can be seen forming in the middle of the cell 
in the second panel. The cell plate then grows outward until it completely bisects the cell. (Phase contrast microscopy 
pictures kindly provided by Aline H. Valster and Peter K. Hepler, University of Massachusetts, Amherst, MA, USA.) 


Cyclins and Cyclin-Dependent Kinases 

Marine invertebrates and amphibians invest a lot of 
energy into generating huge egg cells (up to 50 
times the diameter of typical somatic cells) that, once 
fertilized, embark on a frantic program of rapid cell 
division to generate several thousand small cells 
without additional growth. Eggs have stored reserves 
of all of the cell’s components except DNA and the 
microtubule organizing center (for which the egg 
must wait for the sperm’s contribution), so interphase 
in the early embryonic divisions is reduced to the 
task of replicating DNA and duplicating centrosomes. 
These stripped-down cell cycles have been extraordin- 
arily useful for investigating the proteins respon- 
sible for driving the cell cycle. Studies on these 
embryos revealed a class of proteins called ‘cyclins’ 
that accumulated during each interphase and were 
destroyed during each mitosis. We now know that 
accumulation of one class of cyclins triggers entry 
into S-phase, whereas another class of cyclins triggers 
entry into mitosis. Furthermore, cyclin destruction is 


needed to promote cytokinesis and to return to inter- 
phase. 

Further insight into the molecular machinery driv- 
ing the cell cycle came from studies of unicellular 
yeasts. Budding yeasts (the ones used for baking and 
brewing) and the distantly related fission yeast were 
particularly attractive for cell cycle studies because 
their cell shape provided a rapid and simple readout 
of the cell cycle stage (Figure 4). In addition, the ease 
of obtaining and manipulating mutant strains of yeast 
allowed investigators to identify genes encoding key 
components responsible for driving the cell cycle. One 
particular gene called cdc2 was identified in two separ- 
ate genetic screens: one screen identified conditional 
cdc (cell division cycle) mutants that arrested cells in 
interphase, while the second screen identified ‘wee’ 
mutants that accelerated entry into mitosis. The fact 
that a single gene, cdc2, was identified in both screens 
suggested that altering the activity of the encoded 
protein (designated as Cdc2 with a capital C) in different ways could either prevent (cdc arrest) or accelerate (wee phenotype) the transition from interphase to mitosis.

Figure 4 Yeast cell cycles. Yeast cells are surrounded by a rigid cell wall and change shape in a characteristic manner as they proceed through the cell cycle. (A) The baker’s yeast, Saccharomyces cerevisiae, grows larger during G1 and begins to form a bud at the beginning of S-phase. The bud grows during S-phase and G2, and during mitosis the nucleus elongates and divides along the mother-bud axis prior to cytokinesis. The relative size of the bud compared to the mother provides a rough indication of where the cell is in the cell cycle. (B) The fission yeast, Schizosaccharomyces pombe, is rod-shaped and grows at its ends, so that the length-to-width ratio provides a rough indication of where the cell is in the cell cycle. Cytokinesis and septum formation (which occurs through new cell wall growing inward at the cell middle in fungi, rather than outward as in plants) take a significant amount of time, and in rich nutrient broth these cells grow with a very short G1 phase and begin S-phase while the septum is still being constructed. By the time cell separation is complete, the cells are in G2, where they spend the bulk of the cell cycle.

These findings came together with the discovery 
that cyclin and Cdc2 were both subunits of a protein 
complex purified from starfish and frog egg cytoplasm 
that induced entry into mitosis upon injection into 
interphase cells. Cdc2 is a protein kinase (an enzyme 
that modifies specific target proteins by transferring a 
phosphate from ATP onto the target protein) whose 
enzymatic activity is switched on upon binding to 
cyclin. Cdc2 and its relatives are now known as 
‘cyclin-dependent kinases,’ or CDKs, because of this 
property. Cyclin/Cdc2 and other kinases turned on at 
mitosis by cyclin/Cdc2 catalyze a large increase in the phosphorylation (the number of attached phosphate groups) of many cellular proteins, and these phosphorylations
are thought to alter the properties of those proteins 
so as to induce chromosome condensation, nuclear 
envelope disassembly, altered microtubule behavior, 
and other events of early mitosis. Following cyclin 
destruction, Cdc2 is inactivated and cellular phos- 
phatases (a phosphatase is an enzyme that removes 
the phosphate groups attached by kinases) return the 
Cdc2 targets to their dephosphorylated form, leading 
to nuclear envelope reassembly, chromosome decon- 
densation, and exit from mitosis. 

Subsequent studies have discovered many more 
cyclins and CDKs, and they appear to drive cell 
cycle progression in all eukaryotic cells. Whereas the 
‘mitotic cyclins’ discussed above trigger entry into 
mitosis, a distantly related set of ‘G1 cyclins’ activates
CDKs to promote initiation of DNA replication (the
transition from G1 to S-phase) and duplication of
the microtubule organizing center during interphase. 
The different classes of cyclins cause the CDKs to 
which they bind to target distinct (though overlap- 
ping) sets of proteins for phosphorylation, leading to 
different cell-cycle events. In this way, the sequential 
accumulation and destruction of different cyclins trig- 
gers the events of the cell cycle in the proper order. 


Building an Autonomous Oscillator 

If cyclin accumulation and destruction drive the cell 
cycle, then what is it that drives cyclin accumulation 
and destruction? Remarkably, cycles of accumulation 
and destruction occur spontaneously in cell-free ex- 
tracts of frog eggs that lack nuclei and microtubule 
organizing centers, suggesting that these extracts con- 
tain a cyclin-based biochemical oscillator or cell cycle 
clock. Cyclin synthesis (by ribosomes programmed
with a pool of stable cyclin mRNA) occurs at a con- 
stant rate, but cyclin destruction only occurs in short 
bursts at the end of each cycle. This leads to a ‘saw- 
tooth’ pattern of cyclin abundance, with gradual accu- 
mulation of cyclin punctuated by brief episodes of 
cyclin annihilation (Figure 5A). What triggers this 
sudden annihilation? Cyclin destruction takes place 
inside a protein complex called the proteasome, 
which feeds proteins through a tunnel-like interior 
cavity that chews them up into small peptides and 
amino acids. This executioner does not touch the 
majority of proteins, but specifically recognizes 
those that have been sentenced to death by conju- 
gation to a small protein called ubiquitin. Cells have 
a complex biochemical judiciary to ensure that only 
the right proteins are flagged with ubiquitin, and the 
judges who deliver the death sentence are called ‘ubiquitin ligases.’ The particular ubiquitin ligase responsible for flagging cyclin is another protein complex that is dormant much of the time, but is awakened by a
process that involves phosphorylation of several of its 
constituent proteins by active cyclin/CDK. Thus, 
cyclin/CDK sows the seeds of its own demise (follow- 
ing a lag period that is still poorly understood) by 
activating the ubiquitin ligase that flags cyclin 
for destruction. Once the cyclin is gone, the CDK 
becomes inactive, and ever-present phosphatases 
reverse the phosphorylations that activated the 
ubiquitin ligase, so that cyclins can once again begin 
to accumulate and the cycle starts anew (Figure 5A). 
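The sawtooth pattern can be reproduced with a deliberately crude model. The sketch below assumes constant cyclin synthesis, treats cyclin/CDK as active whenever cyclin exceeds a fixed threshold, wakes the ubiquitin ligase only after a fixed lag, and lets the awakened ligase destroy most of the remaining cyclin at each time step until almost none is left. The function name and every parameter value are hypothetical choices for illustration, not measured quantities.

```python
def simulate_cyclin(steps=300, synthesis=1.0, threshold=50.0, lag=10,
                    destruction=0.8, floor=1.0):
    """Toy discrete-time model of the cyclin 'sawtooth' in egg extracts.

    All parameter values are illustrative assumptions, not measurements.
    """
    cyclin = 0.0
    cdk_active_steps = 0   # consecutive steps with cyclin/CDK active
    ligase_on = False      # is the cyclin-flagging ubiquitin ligase awake?
    trace = []
    for _ in range(steps):
        cyclin += synthesis                  # constant synthesis from stable mRNA
        # Simplification: CDK counts as active once cyclin exceeds a threshold.
        cdk_active_steps = cdk_active_steps + 1 if cyclin > threshold else 0
        if cdk_active_steps > lag:           # ligase wakes only after a lag
            ligase_on = True
        if ligase_on:
            cyclin *= 1.0 - destruction      # burst of proteasomal destruction
            if cyclin < floor:               # cyclin gone: CDK off, ligase reset
                ligase_on = False
                cdk_active_steps = 0
        trace.append(cyclin)
    return trace


trace = simulate_cyclin()
collapses = sum(1 for a, b in zip(trace, trace[1:]) if b < a - 20)
print(f"peak cyclin {max(trace):.1f}, collapses in 300 steps: {collapses}")
```

The only ingredients are steady synthesis, delayed activation of destruction, and a reset once cyclin is gone; under these assumptions that is already enough to turn a constant input into a repeating rise-and-collapse pattern.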

Why is the ubiquitin ligase activated suddenly 
toward the end of the cycle, rather than gradually as 
the cyclin accumulates? It turns out that the gradual 
accumulation of cyclin does not lead to a gradual 
activation of the CDK to which it binds. Instead, 
another kinase keeps cyclin/CDK complexes inactive 
by phosphorylating a key position on the CDK 
(Figure 5B). A specific phosphatase attempts to 
remove the offending phosphate and allow cyclin/ 
CDK activation, but at the time when cyclin starts to 
accumulate the balance of power between the inhibi- 
tory kinase and the activating phosphatase is tilted 
firmly in favor of the kinase, and the accumulating 
cyclin/CDK complexes remain inactive (Figure 5B). 
However, as more cyclin/CDK complexes are formed, 
a few of the complexes escape inhibition, and these 
active complexes begin to phosphorylate both the 
inhibitory kinase (decreasing its activity) and the acti- 
vating phosphatase (increasing its activity). This tilts 
the balance in favor of the phosphatase, and leads to 
rapid activation of all of the remaining cyclin/CDK 
complexes (Figure 5B). Thus, the gradual accumu- 
lation of cyclin is not converted into CDK activation 
until a significant amount of cyclin has built up, at 
which time the CDK is suddenly activated, leading to 
abrupt activation of the ubiquitin ligase and conse- 
quent cyclin destruction. Following cyclin destruc- 
tion and CDK inactivation, the phosphorylations of 
the inhibitory kinase and activating phosphatase are 
reversed, once again tilting the balance of power in 
favor of the inhibitory kinase at the beginning of the 
next cycle. 
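The conversion of gradual cyclin accumulation into abrupt activation can be illustrated numerically. The sketch below is a toy model, not a published one: it assumes that the total number of cyclin/CDK complexes rises linearly, that the activating phosphatase and the inhibitory kinase each keep a small basal activity, and that active complexes raise phosphatase activity and lower kinase activity through a steep Hill function. The names, the parameters k and n, and all rate constants are hypothetical.

```python
def hill(x, k, n):
    """Fraction of maximal effect produced by input x (a Hill function)."""
    return x**n / (k**n + x**n)


def simulate_switch(t_end=400.0, dt=0.01, synthesis=1.0, k=10.0, n=4):
    """Euler integration of a toy Wee1/Cdc25-style positive feedback loop.

    Assumed rate laws (illustrative only): active CDK turns the activating
    phosphatase up and the inhibitory kinase down, each via a Hill function.
    """
    total, active, t = 0.0, 0.0, 0.0
    times, fractions = [], []
    while t < t_end:
        h = hill(active, k, n)
        phosphatase = 0.01 + h            # activating phosphatase (Cdc25-like)
        kinase = 0.01 + (1.0 - h)         # inhibitory kinase (Wee1-like)
        d_active = phosphatase * (total - active) - kinase * active
        active += d_active * dt
        total += synthesis * dt           # cyclin/CDK complexes accumulate steadily
        t += dt
        times.append(t)
        fractions.append(active / total)
    return times, fractions


times, fractions = simulate_switch()
switch_time = next(t for t, f in zip(times, fractions) if f > 0.5)
print(f"active fraction at t=100: {fractions[9999]:.3f}")
print(f"half of the complexes become active at t = {switch_time:.1f}")
```

Under these assumptions only about 1% of the complexes are active for most of the ramp, and then nearly all of them become active over a short interval, even though the total grew smoothly. With the feedback removed (h held constant), active complexes would simply rise in proportion to total complexes.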

From the simplified outline above, the cyclin cycle 
in frog egg extracts can be understood as an alternation 
between two unstable states: a low-CDK-activity state 
that is unstable because cyclin accumulation eventually 
triggers abrupt CDK activation, and a high-CDK- 
activity state that is unstable because it promotes 
cyclin destruction and CDK inactivation. The cyclin 
oscillations observed in typical somatic cells are 
considerably more complex, but they use similar regu- 
latory strategies to promote the constant progress 
from one unstable state to the next. For instance, in 
budding yeast the accumulation of G1 cyclins triggers the subsequent accumulation of mitotic cyclins, while the mitotic cyclins shut off the synthesis of G1 cyclins. Such regulatory circuits keep the cyclin clock ticking from one wave of cyclins to the next.

Figure 5 The cyclin/CDK oscillator. (A) The abundance of mitotic cyclin in extracts from frog eggs rises gradually and then collapses rapidly in a ‘sawtooth’ pattern. This is driven by the periodic activation of a ubiquitin ligase (a protein complex whose components were discovered through genetic screens in yeast) that flags cyclin for destruction (see text). (B) Gradual cyclin accumulation is converted into a delayed but sudden activation of cyclin/CDK complexes as a result of a regulatory ‘positive feedback loop’ involving an inhibitory kinase (called Wee1) and an activating phosphatase (called Cdc25), both of which were discovered through genetic screens in yeast. As cyclin begins to accumulate, cyclin/CDK complexes are phosphorylated (denoted by the circled ‘P’) and thereby inhibited. Although the balance between inhibition by the kinase (arrow to the right) and reactivation by the phosphatase (arrow to the left) is biased toward inhibition, the gradual buildup of cyclin/CDK complexes allows a few of those complexes to become active. These begin to phosphorylate both the kinase and the phosphatase, tilting the balance toward more cyclin/CDK activation, which leads to more phosphorylation of the kinase and phosphatase, tilting the balance even further, and so on until all of the cyclin/CDK complexes are active, all of the kinase is inhibited, and all of the phosphatase is activated.


Executing the Clock’s Instructions 
What is the output of the cyclin/CDK clock? How 
does cyclin/CDK activation or inactivation promote 
cell cycle events? Many of these events are thought to 
be directly triggered by phosphorylations catalyzed 
by cyclin/CDK complexes (or by dephosphorylations 
that follow cyclin destruction and CDK inactivation). 
In addition, many of the cyclin/CDK target proteins 
are themselves kinases or phosphatases that promote 
further phosphorylations and dephosphorylations. 
Another set of cyclin/CDK targets are transcription 
factors (proteins that bind to DNA sequences near 
specific genes and recruit RNA polymerases to pro- 
mote mRNA synthesis) whose activity is altered by 
phosphorylation, leading to waves of gene transcrip- 
tion at different times during the cell cycle. In budding 
yeast about 800 genes (out of approximately 6000 in 
total) are expressed in at least five distinct waves 
during the cell cycle. These include genes encoding 
cyclins and other components of the cell cycle clock, 
but also many genes involved in DNA replication and 
in the events of mitosis. Finally, cell-cycle-regulated 
protein destruction is important not only for destroy- 
ing cyclins but also for executing the clock’s instruc- 
tions. In particular, once the cyclin-destroying 
ubiquitin ligase is activated during mitosis it triggers 
the destruction of other proteins controlling the sticky 
cohesion that ties sister chromosomes together, lead- 
ing to chromosome segregation. 

One of the most intensely studied events of the cell 
cycle is DNA replication. Chromosomal origins of 
replication at the beginning of the cell cycle are popu- 
lated by a ‘preinitiation complex’ of specialized pro- 
teins (including some referred to as licensing factors) 
that control recruitment of replication factors. Many 
of these proteins are phosphorylated by CDKs, and 
others are targets of kinases activated by CDKs. 
In aggregate, these multiple phosphorylations are 
thought to ‘activate’ these proteins to promote origin 
‘firing’ and ensuing DNA replication once appropri- 
ate cyclin/CDK complexes become active toward the 
end of G1. However, origin firing triggers the departure of licensing factors from the origin, and crucial
proteins are then either destroyed, exported from the 
nucleus, or otherwise rendered unavailable as a result 
of the phosphorylations. This explains why origins 
cannot refire during S-phase: the CDKs that promote 
origin firing also prevent the reestablishment of pre- 
initiation complexes. This ‘block to rereplication’ is 
maintained by subsequent waves of cyclins until the end of mitosis, when all cyclins are degraded, CDKs
are inactivated, and the phosphorylations of preiniti- 
ation complex proteins are removed by phosphatases. 
This permits the reaccumulation of proteins that were 
degraded, the reimport of proteins that were excluded 
from the nucleus, and the reassembly of preinitiation 
complexes at origins of replication. When CDKs accumulate again in the next cell cycle, they activate these complexes to initiate replication once more, marking the start of the next S-phase. This elegant
two-part strategy, permitting origin licensing only 
while CDKs are inactive and origin firing only while 
CDKs are active, ensures that DNA is replicated once 
and only once during each cycle of CDK activation/ 
inactivation. 
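The two-part rule can be written as a small state machine. In the sketch below, a simplification with hypothetical class and function names, an origin may acquire a license only while CDKs are inactive and may fire only while CDKs are active, and firing consumes the license; the phase list stands in for one cycle of CDK activation and inactivation.

```python
class Origin:
    """One replication origin in a toy licensing model (illustrative only)."""

    def __init__(self, block_rereplication=True):
        self.block_rereplication = block_rereplication
        self.licensed = False      # preinitiation complex assembled?
        self.fire_count = 0        # times this origin has fired

    def try_license(self, cdk_active):
        # Normally a preinitiation complex can assemble only while CDKs are inactive.
        if not cdk_active or not self.block_rereplication:
            self.licensed = True

    def try_fire(self, cdk_active):
        # Firing needs both a license and active CDK, and it consumes the license.
        if cdk_active and self.licensed:
            self.licensed = False
            self.fire_count += 1


# (phase, CDK active?) for one cycle; CDKs are inactivated at the end of mitosis,
# so the next cycle starts again with CDK activity low.
PHASES = [("G1", False), ("S", True), ("G2", True), ("M", True)]


def firings_per_origin(n_cycles=3, n_origins=4, attempts=5, block=True):
    origins = [Origin(block) for _ in range(n_origins)]
    for _ in range(n_cycles):
        for _phase, cdk_active in PHASES:
            for _ in range(attempts):      # repeated chances to license and fire
                for origin in origins:
                    origin.try_license(cdk_active)
                    origin.try_fire(cdk_active)
    return [origin.fire_count for origin in origins]


print(firings_per_origin(block=True))    # [3, 3, 3, 3]
print(firings_per_origin(block=False))   # [45, 45, 45, 45]
```

Because no phase allows both licensing and firing, each origin fires exactly once per cycle no matter how many opportunities it gets. Permitting licensing while CDKs are active (block=False) lets origins refire repeatedly, the toy equivalent of rereplication.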


Coordination: Stopping and Starting the Cell Cycle


In the real world, the scenario of a constantly ticking 
cell cycle clock driving continuous cell proliferation 
only applies to very special circumstances, like the 
early embryo or the growth of microorganisms in 
rich nutrient broth. In all other cases, proliferation is 
tightly regulated. Microorganisms adjust their prolif- 
eration rate to match the rate at which they can make 
new cellular components with available nutrients, and 
they stop proliferating entirely when critical nutrients 
are in short supply. Cells within multicellular organ- 
isms generally inhabit a more constant environment 
with nutrients delivered from other cells in more- 
or-less continuous supply, but they are very sensitive 
to instructions from diffusible signals such as hor- 
mones and from adhesion to neighboring cells or to 
extracellular matrix, which ensure that proliferation 
only occurs when it benefits the organism as a whole. 
How are these and other signals translated into regu- 
lation of the cell cycle? 


Response to External Conditions 

Given that the vast majority of cells on the planet are 
not engaged in active proliferation at any given 
instant, we can ask where in the cell cycle they choose to halt. For unicellular organisms that lack sufficient nutrients, the answer is that all of the cells stop in G1. This is not simply because they happened to run out of nutrients during G1: the cells apparently decide not to embark on another cell cycle before they actually run out of nutrients altogether. Studies in budding yeast have identified mutant strains that keep proliferating when nutrients become scarce, and then die (at whatever stage of the cell cycle they happen to have reached) when the nutrients finally run out. This indicates that healthy cells (i.e., nonmutant or ‘wild-type’ cells) make an active decision to stop proliferating in G1 when nutrients are scarce. Similarly, mammalian cells arrest proliferation in G1 in response to many conditions, including the absence of appropriate ‘growth factors’ and the loss of physical anchorage (when cells are released to float in liquid medium). Other signals, such as cell crowding (causing a phenomenon called contact inhibition) or certain cytokines, cause G1 arrest even in the presence of growth factors and suitable anchorage. In all cases, the external signals act through intracellular signal transduction pathways to affect the activation of G1 cyclin/CDK complexes, and research in this area has uncovered complex regulation of G1 cyclin synthesis as well as a host of CDK inhibitors that can block the assembly or the activity of cyclin/CDK complexes to cause G1 arrest.

Even during active proliferation, unicellular organ- 
isms adjust the rate of progression through the cell 
cycle to match the rate at which the available nutrients 
permit cells to double their mass. When nutrients are 
available at low levels it takes cells longer to double 
their mass, and cells appear to wait until they reach a 
‘critical size’ before proceeding with the cell cycle. In 
budding yeast the critical size must be reached during 
G1 to allow G1 cyclin/CDK activation, whereas in
fission yeast the critical size must be reached in G2 to
allow mitotic cyclin/CDK activation. The basis for 
this ‘size control’ is unclear, but the term may be 
misleading, as it is thought that cells may respond to 
some parameter (like total protein synthesis capacity) 
that is only loosely correlated with cell size. 
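One consequence of waiting for a critical size is that the length of the cycle automatically matches the time cells need to double their mass. The sketch below illustrates this for an idealized ‘sizer’ under assumptions that the text does not make: mass grows exponentially, division happens exactly at a fixed critical mass, and the two daughters are equal (closer to fission yeast than to budding yeast). The function name and all numbers are hypothetical.

```python
import math


def sizer_cycle_times(growth_rate, critical_size=2.0, birth_size=0.7, n_cycles=6):
    """Interdivision times for an idealized 'sizer' (illustrative assumptions).

    Assumes exponential mass growth, division exactly when mass reaches
    `critical_size`, and symmetric division into two equal daughters.
    """
    times = []
    mass = birth_size
    for _ in range(n_cycles):
        # Time for exponential growth from `mass` up to the critical size.
        wait = max(0.0, math.log(critical_size / mass) / growth_rate)
        times.append(wait)
        mass = max(mass, critical_size) / 2.0   # each daughter gets half
    return times


for rate in (0.5, 0.1):                 # richer vs. poorer nutrients
    t = sizer_cycle_times(rate)
    print(f"growth rate {rate}: cycle times {[round(x, 2) for x in t]}, "
          f"mass-doubling time {math.log(2) / rate:.2f}")
```

Whatever the starting size, after the first cycle every daughter in this model is born at half the critical size, so each later cycle lasts exactly one mass-doubling time (ln 2 divided by the growth rate), and slower growth lengthens the cycle rather than producing smaller cells. As noted above, real cells may monitor some correlate of size rather than size itself.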

Some specialized cells in multicellular organisms 
arrest proliferation at unusual places in the cell cycle. 
For instance, frog oocytes arrest in G2 until a hormone triggers their maturation into eggs, and then the
eggs arrest in metaphase of the second meiotic division until fertilization by sperm triggers them to begin the embryonic divisions. G2 arrest is due to inhibition of cyclin/CDK complexes by phosphorylation, accompanied by the exclusion of cyclin/CDK complexes from the nucleus. Metaphase arrest is thought to involve inhibition of the ubiquitin
ligase that flags cohesion proteins and cyclins for 
destruction. Thus, a plethora of external signals can 
act to stop the cell cycle clock at different stages, 
stabilizing the normally unstable states of the clock 
until appropriate signals trigger release from cell cycle 
arrest. In some cases (as with oocytes) the arrest can 
persist for several decades, and in others (as with many 
‘terminally differentiated’ cells including nerve cells) 
the arrest is effectively permanent. 


Checkpoint Controls 
In addition to external inputs to the cell cycle, it appears 
that cells can monitor some of the cell cycle events 
themselves, and transiently stop the cell cycle clock if
events are not proceeding according to plan. This 
insight arose from studies on budding yeast ‘cdc’ 
genes. Although several of these encode components 
of the cell cycle clock, many more of the cdc genes 
turned out to encode proteins required to perform 
certain cell cycle events, particularly DNA replication 
and assembly of the mitotic spindle. This indicated 
that unlike frog egg extracts, yeast cells did not have 
an autonomous oscillator cut loose from the cell cycle 
events that it triggered: Blocking either DNA replica- 
tion or spindle assembly caused the cell cycle clock to 
stop. Yeast geneticists hypothesized that cells might 
possess surveillance pathways, which they termed 
‘checkpoint controls,’ that could halt the cell cycle 
clock if key cell cycle events had not been completed. 
They reasoned that it should be possible to isolate 
‘checkpoint mutants’ that inactivated genes required 
for these surveillance pathways, and that such mutants 
would attempt to continue the cell cycle even if key 
events were blocked. Many such mutants were isol- 
ated, lending support to the checkpoint hypothesis, 
and analysis of the genes affected by such mutants is 
beginning to reveal how the surveillance pathways 
operate. 

The ‘DNA replication checkpoint’ prevents 
chromosome segregation if DNA replication has not 
been completed. This checkpoint is thought to detect 
the presence of DNA polymerases still actively engaged 
in replicating DNA, or perhaps to detect polymerases 
that have stalled during replication. Attempts to segre- 
gate incompletely replicated chromosomes would lead 
to DNA breaks, and daughter cells would not inherit 
a complete set of chromosomes. The checkpoint 
prevents this by stopping the clock until replication 
is complete, though the detailed mechanism varies 
between species. In budding yeast the checkpoint 
allows cyclin/CDK activation and spindle assembly, 
but prevents the ubiquitin ligase-mediated dissolution 
of sister chromatid cohesion. In fission yeast and 
mammalian cells, the checkpoint restrains cyclin/CDK 
activation, thereby preventing entry into mitosis. 

The ‘spindle assembly’ checkpoint prevents the 
dissolution of sister chromatid cohesion until all of 
the kinetochores on sister chromatids have been 
appropriately attached to microtubules from opposite 
poles of the spindle. This is important because loss of 
cohesion between sister chromatids would leave unat- 
tached chromosomes with no way of ensuring equal 
segregation of the sisters to the two daughter cells. 
Many of the proteins identified through screens for 
spindle assembly checkpoint mutants are present on 
kinetochores, but are released once a pair of sister 
kinetochores becomes appropriately attached to the 
spindle. Based on elegant experiments using glass 
needles to pull on the large condensed chromosomes in
grasshopper spermatocytes, it was proposed that the 
checkpoint proteins actually monitor physical tension 
at the kinetochore. Tension is generated when sister 
kinetochores are being pulled toward opposite poles 
of the spindle by their attached microtubules, at which 
point the sister chromosomes are ready for segrega- 
tion. Even a single chromosome whose kinetochore 
has not yet appropriately attached can prevent mitotic 
progression. Various experiments suggest that the 
checkpoint proteins directly inhibit the ubiquitin ligase 
that flags cohesion proteins and cyclins for destruction 
until every kinetochore is properly attached. 
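Reduced to logic, the tension-sensing proposal says that the ubiquitin ligase may act only when every pair of sister kinetochores is under tension. The sketch below encodes that rule as a predicate; the class, its fields, and the pole labels are hypothetical, and the model ignores how the real checkpoint proteins generate and relay their inhibitory signal.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SisterKinetochores:
    """Attachment state of one pair of sister kinetochores (toy model)."""
    sister_a: Optional[str] = None   # pole captured by sister A ("left"/"right"), or None
    sister_b: Optional[str] = None   # pole captured by sister B, or None

    def under_tension(self) -> bool:
        # Tension arises only when the sisters are pulled toward opposite poles.
        return (self.sister_a is not None and self.sister_b is not None
                and self.sister_a != self.sister_b)


def ligase_released(pairs: List[SisterKinetochores]) -> bool:
    """Checkpoint logic: the ligase stays inhibited unless every pair is under tension."""
    return all(pair.under_tension() for pair in pairs)


bioriented = [SisterKinetochores("left", "right") for _ in range(3)]
print(ligase_released(bioriented))                                         # True
print(ligase_released(bioriented + [SisterKinetochores("left")]))          # False: one sister unattached
print(ligase_released(bioriented + [SisterKinetochores("left", "left")]))  # False: both sisters on one pole
```

The all-or-nothing form of the predicate mirrors the observation that a single unattached chromosome is enough to hold up mitotic progression.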

Another checkpoint control pathway monitors 
DNA damage. Such damage can be caused by radiation 
or chemicals at any time in the cell cycle, and cells have 
many sophisticated strategies to repair the damage. 
However, repair takes time, and either replication of 
a damaged region or mitotic segregation of a damaged 
chromosome can render the damage irreparable, so it 
is important to ensure that repair is completed before 
either replication or segregation. Remarkably, this 
checkpoint can halt (or at least delay) the cell cycle at 
almost any step: in G1, within S-phase, in G2, in mid-mitosis, or even during chromosome segregation,
depending on when the damage was incurred. 

New checkpoint pathways are still being discov- 
ered. In budding yeast recent studies have identified a 
‘morphogenesis checkpoint’ which ensures that cells 
have built a bud prior to entering mitosis, and a ‘spindle 
position checkpoint’ which ensures that the spindle 
has correctly segregated one nucleus into the mother 
and one into the bud prior to undergoing cytokinesis. 
These checkpoints protect cells from environmental 
perturbations that affect the cytoskeletal fibers and 
thus delay proper bud formation or spindle orien- 
tation. In aggregate, checkpoint controls ensure that 
the order of cell cycle events is preserved in the face of 
delays in key processes, and protect cells from disaster 
when random perturbations make continued cell cycle 
progression dangerous. 


Cell Cycle in Health and Disease 


The events of the cell cycle are masterpieces of mo- 
lecular engineering, employing distinct machineries 
(such as DNA polymerases for replication, the micro- 
tubule spindle for segregation, and actin and myosin 
fibers for cytokinesis) to generate two cells from one. 
These machineries are governed by a biochemical clock 
that is in turn controlled by exquisitely tuned signal- 
ing pathways processing information from both inside 
and outside the cell to ensure the correct order of 
events and to allow proliferation only when it is 
appropriate for the organism. Derangements of cell cycle control pathways by mutations in somatic cells
make an important contribution to disease, and in 
particular to cancer. Mutations that uncouple cell 
cycle progression from its normal requirement for 
hormones or anchorage cause uncontrolled prolifer- 
ation, while mutations that cripple checkpoint con- 
trols can cause cells with damaged DNA to proceed 
with DNA replication or cells with unattached chromosomes to proceed through chromosome segregation, dramatically accelerating the rate at which the
cells acquire more mutations, and increasing the 
chance that they will become malignant. Thus, under- 
standing the cell cycle in full molecular detail remains 
an important goal in combating disease as well as a 
fascinating study in its own right. 

In an early embryo a cell has the potential to generate many different cell types. During development
cells generally lose this potential (or ‘potency’), and 
become restricted to making one or a few cell 
types. This process by which cells become progres- 
sively restricted in their potency is referred to as deter- 
mination. 

Determination of a cell or tissue is an operational 
concept, and is analyzed by experiments in which the 
cell or tissue is isolated or placed in an abnormal en- 
vironment. If the cell’s fate does not change as a result 
of the experiment, then the cell can be said to be 
determined with respect to that manipulation. How- 
ever, it is possible that other experiments could cause 
alterations in the cell’s fate. Thus, a cell cannot be said 
to be absolutely ‘determined’ but only determined relative to experimental tests. Evidence that a cell is
not determined can also come from cell marking 
(clonal analysis) experiments: if a marked cell gives 
rise to multiple cell types in its progeny, then the 
marked precursor cannot have been determined to
make any one cell type. 

From analyses of cell fate determination in many 
organisms, the following general rules have emerged. 
First, determination is a gradual process, in which a 
cell’s potency is progressively restricted during devel- 
opment. Second, the ‘determined state’ is heritable 
through somatic cell divisions, an example of ‘cellular 
memory.’ Third, determination is usually but not 
always irreversible; in some situations a cell can revert 
to an apparently undetermined state, or can ‘trans- 
determine’ to a different stable state. 

Although determination is a multistep process, two 
basic phases can be distinguished: an initial phase in 
which a cell is specified to a particular developmental 
pathway (‘cell fate specification’), and a more extend- 
ed process of commitment, in which the specification 
is fixed and made largely irreversible. It is now well 
established that cell fate specification in embryos can 
involve both cell-autonomous mechanisms and induct- 
ive signals from a cell’s surroundings. Combinations 
of these influences result in progressive alterations in 
the gene expression patterns of embryonic cells. The 
later process of commitment is less well understood — 
for example, why the determined state is stable and 
heritable, and why it is unstable in some situations. 

Cells can become undetermined in special circum- 
stances. In amphibian limb regeneration, cells lose 
their differentiated characteristics and form a ‘regen- 
eration blastema,’ which can generate all the tissues 
of a mature limb. Certain cultured cell lines behave as 
if undetermined, such as the embryonic stem cells 
(ES cells) used in generating transgenic mice. Germ- 
line cells are also exceptional in that they retain the 
potency to generate an entire organism when they 
combine to form a zygote. 

The distinction between cell fate specification 
and determination is exemplified by Drosophila genes 
known as selector genes. Genetic analysis in Dros- 
ophila identified homeotic mutants, in which the fates 
of certain body regions were altered. These homeotic 
mutants defined the homeobox-containing selector 
genes, which function to specify region-specific cell 
fates. For example, cell fates in the third thoracic (T3) 
segment of Drosophila are specified by the homeobox 
gene Ultrabithorax (Ubx). 

The specification of cells to the T3 identity occurs 
during embryogenesis and involves the localized acti- 
vation of Ubx by transcription factors that are tran- 
siently expressed in the embryo. Ubx expression is 
activated in the future T3 segment and then persists 
in these cells throughout development. If Ubx function
is removed from cells later in development, they
lose their T3 identity and become transformed in fate, 
indicating that Ubx activity is required continuously to 
maintain the cells in their determined state. The stable 
activation of Ubx in T3 cells and its stable repression 
in other cells involves chromatin-associated proteins 
required for the maintenance of active and inactive 
states of gene expression. Thus, the stability of the 
determined state may in part reflect stable patterns of 
chromatin. In vertebrates, DNA methylation could 
provide an additional heritable mechanism for stable 
patterns of gene expression. 


See also: Embryonic Stem Cells 
Genes involved in various aspects of the cell division
cycle have been identified in a wide variety of prokar- 
yotic and eukaryotic species. Classical and molecular 
genetic studies are generating a coherent overview of 
cell cycle controls, which in many respects have been 
highly conserved during eukaryotic evolution. 

The cell division cycle in all organisms comprises 
replication of the genome and segregation of the 
duplicated DNA. In most eukaryotes these events 
are separated temporally into discrete phases termed 
S-phase (for synthesis) and M-phase (for mitosis or 
meiosis). These phases usually alternate, except in 
specialized circumstances such as meiosis, where two 
M-phases occur without an intervening S-phase, or 
endoreduplication, where successive S-phases proceed 
without intervening M-phases. Except in syncytia, the 
cell cycle is completed by cell division (cytokinesis), 
which is frequently dependent on completion of 
chromosome segregation. Progression through the 
cell cycle is under genetic control. Many cell cycle 
genes encode components of the machinery required 
for S-phase, M-phase, or cytokinesis, such as DNA 
polymerase subunits or tubulin. Others have regula- 
tory roles that, for example, determine the relative 
timing of cell cycle events. 
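The alternation of S- and M-phases, and the two exceptions mentioned above, can be made concrete by tracking the DNA content of a nucleus through a sequence of phases. The sketch below is a minimal bookkeeping model under stated assumptions, not a description of any regulatory mechanism: the function name `dna_content` is hypothetical, an S-phase is taken to double nuclear DNA, an M-phase to halve it, and a diploid G1 nucleus is assumed to start at 2C.

```python
def dna_content(phases, start_c=2.0):
    """Return nuclear DNA content (in C units) after each phase.

    Toy bookkeeping model: an S-phase doubles the DNA content of the
    nucleus, and an M-phase halves it because each daughter nucleus
    receives one of the two duplicated sets. A diploid G1 nucleus is
    assumed to start at 2C.
    """
    c = start_c
    history = [c]
    for phase in phases:
        if phase == "S":
            c *= 2
        elif phase == "M":
            c /= 2
        else:
            raise ValueError(f"unknown phase: {phase!r}")
        history.append(c)
    return history


# Ordinary mitotic cycles: S alternates with M, so content returns to 2C.
print(dna_content(["S", "M", "S", "M"]))  # [2.0, 4.0, 2.0, 4.0, 2.0]
# Meiosis: two M-phases without an intervening S-phase give 1C products.
print(dna_content(["S", "M", "M"]))       # [2.0, 4.0, 2.0, 1.0]
# Endoreduplication: successive S-phases without M-phases.
print(dna_content(["S", "S", "S"]))       # [2.0, 4.0, 8.0, 16.0]
```

Only strict alternation keeps the content constant from cycle to cycle; either departure from it changes ploidy in a predictable direction.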

In bacteria, many genes required for the successful 
completion of DNA replication have been identified 
through mutational screens. Although mutations in 
DNA replication genes can also influence the septation 
process, bacterial DNA replication is relatively loosely 
coupled to (and frequently overlaps with) cell division, 
in contrast with the situation in most eukaryotic cells.
Asymmetric divisions generating unequal daughter 
cells are important in bacterial sporulation, and spor- 
ulation-defective mutants have been used to identify 
genes required for this simple developmental program. 

Among the eukaryotes, cell cycle genetics are most 
highly developed in the budding yeast Saccharomyces 
cerevisiae and the distantly related fission yeast 
Schizosaccharomyces pombe. As well as their straight- 
forward genetics, these model organisms have the 
advantage of being capable of continuous growth in 
the haplophase, greatly facilitating the identification 
of recessive mutations. Many cell division cycle (cdc) 
genes encoding either mechanical or regulatory cell 
cycle components have been identified in these yeasts, 
through the isolation of conditional mutants that are 
unable to proceed through the cell cycle after shifting 
to the restrictive temperature. In S. pombe the identi- 
fication of such mutants is made easier by the con- 
tinued elongation of the roughly cylindrical cells after 
blockage of the nuclear cycle. Isolation of the corres- 
ponding cdc genes has employed complementation 
using plasmid libraries and, more rarely, fine-scale 
mapping and linkage studies. In S. pombe, mitotic 
advancement can be scored microscopically as reduced 
cell length at division. Key regulators of M-phase have 
been identified in this organism through the compara- 
tively small number of alleles that cause the advance- 
ment of M-phase entry, rather than cell cycle arrest. 

Orthologs of many of the cdc genes discovered in 
yeasts have since been identified in more complex 
eukaryotes, in some cases by complementation of the 
appropriate yeast mutations using cDNA libraries 
from other species, but more frequently on the basis 
of sequence similarity. Genetic screens in Drosophila 
melanogaster have also defined a number of cell cycle 
regulatory genes, although the corresponding mutant 
phenotypes are frequently masked by cryptic mater- 
nal effects during early development. In a small num- 
ber of cases, conditional cell cycle mutants have been 
identified in cell lines of vertebrate origin, and for 
some of these the corresponding genes have been iso- 
lated. In such cases the recessive mutations are pre- 
sumably revealed by spontaneous loss or inactivation 
of the second allele. These diverse approaches, rein- 
forced by extensive biochemical studies, have shown 
that fundamental cell cycle controls are broadly simi- 
lar in all eukaryotes. This is particularly true of the 
mechanisms governing entry into (and exit from) 
mitosis or initiation of DNA replication. 

The fidelity of chromosome duplication and segre- 
gation in eukaryotes is ensured by mechanisms collect- 
ively termed DNA structure checkpoints, which 
normally ensure that mitosis is not initiated if the 
chromosomal DNA is damaged or not fully replicated. 


Additional controls govern the alternation of S- and 
M-phases, and ensure that anaphase is not initiated 
until every kinetochore is appropriately attached to 
spindle microtubules. Mutants defective in each of 
these processes have been identified in several species, 
and most of the corresponding genes have been identi- 
fied. Several of these checkpoint genes are not required 
for cell cycle progression per se, but are important for 
maintenance of genome integrity and cell viability 
following DNA damage, inhibition of DNA replica- 
tion, or loss of spindle function. In contrast, the genes 
that ensure the alternation of S- and M-phases gener- 
ally encode components of the cell cycle machinery 
itself. 


See also: Cell Division in Caenorhabditis elegans; 
DNA Replication; Meiosis; Mitosis; 
Schizosaccharomyces pombe, the Principal 
Subject of Fission Yeast Genetics 
Cellular reproduction is a complex endeavor, encompassing
many independent processes that are coordinately
controlled. Classical and molecular genetic
studies, particularly those carried out in budding and 
fission yeast, have identified many factors essential for 
karyokinesis (nuclear division) and cytokinesis (div- 
ision of the cytoplasm). Genetic, cytological, and ultra- 
structural analysis of these mutants has furthered our 
knowledge of cell division by uncovering key regula- 
tory mechanisms and intermediate steps in important 
processes such as assembly of the mitotic spindle and 
of the cytokinetic ring. While much of the work in 
lower eukaryotes is broadly relevant, the mechanisms
of cell division in animal cells are likely to differ
in many important aspects, and thus, animal model 
systems are necessary. This review discusses the use of 
the nematode worm Caenorhabditis elegans for the 
study of animal cell division. 


Caenorhabditis elegans as a Model System 
for Cell Division Studies 


Since its inception as a model organism for the study 
of animal development and behavior over 30 years ago,
C. elegans has been employed by a steadily increasing
number of investigators. Many of the qualities that 
draw developmental biologists to this animal are also 
appealing to those who wish to investigate basic 
aspects of cell division. C. elegans is a small worm, 
approximately 1 mm in length, with a short life cycle 
of 3 days. It can be easily cultivated in the laboratory; 
each self-fertilizing hermaphrodite can produce 300 
progeny, and several generations can be grown on a 
single petri plate with a lawn of bacteria as the food 
source. It has a compact genome of about 100 Mbp, and
the adult hermaphrodite has fewer than 1000 somatic
nuclei. This simplicity has allowed it to become the most
completely understood animal in terms of its development,
anatomy, and genome structure.

In 1974, Sydney Brenner described the isolation of 
the first set of morphological and behavioral mutants 
of C. elegans and provided basic methods for generat- 
ing, isolating, mapping, and analyzing mutations. 
Many additional techniques have since been devel- 
oped by numerous other C. elegans researchers, and 
there now exists a formidable arsenal of genetic tools 
including techniques for mosaic analysis, transposon- 
mediated mutagenesis, germline transformation, 
RNA-mediated gene silencing (RNAi), and in situ 
hybridization. 

For the investigator interested in cell division, C. 
elegans also offers an important accompaniment to 
these strong genetic approaches: the ability to charac- 
terize mutants in terms of their cytological defects. 
All developmental stages of C. elegans are transparent, 
and thus, every cell division can be monitored in live 
specimens by light microscopy. The early embryo is 
especially attractive for cytological studies; it is im- 
mobile and the early blastomeres are relatively large 
(Figure 1). Subcellular structures such as the centro- 
somes and spindles are visible in live specimens by 
differential interference contrast (DIC) microscopy, 
and thus, perturbations in these structures can be readily
identified in mutant animals. In addition, the embryo 
possesses dramatic examples of two basic types of cell 
division: proliferative divisions in which two identical 
daughters are produced and determinative (or asymmetric)
divisions in which two dissimilar daughters are
produced. Thus, genes that are required for one or both 
types of cell division can be identified and analyzed. 


Genetic Approaches to Studying Cell 
Division in Caenorhabditis elegans 


The application of genetics to a particular problem 
usually begins with the identification of genes that 
have essential roles in the process under study. Two
general strategies can be used to identify cell division
genes in C. elegans. Forward genetic approaches 
involve identifying functionally relevant genes via
mutation, and reverse genetic approaches involve 
silencing the activities of molecularly defined genes. 
Both are now widely used in C. elegans. 

In forward genetic screens, heritable mutations are 
induced in a population of worms by exposing them 
to a potent mutagenic agent. The descendants of these 
animals are then screened for cell division defects. 
In practice, cell division mutants are identified based 
on some easily scored phenotype such as lethality or 
sterility. The relatively small number of mutants that 
exhibit this target phenotype can then be examined 
by more time-consuming measures for cell division 
defects. Thus, only those genes that confer cell div- 
ision defects when mutated are identified. 

Cell division mutants can exhibit any one of a 
number of phenotypes depending on the functional 
specificity of the gene, the strength and nature of the 
mutation, and many other factors. Some mutations 
affect the early embryonic divisions and exhibit a 
maternal-effect embryonic lethal (Mel) phenotype. 
That is, mothers that are homozygous for the mutation
are themselves phenotypically wild-type,
but all the offspring of such animals are inviable. The
reason for this is that many of the factors required 
for early embryogenesis — including those required 
for cell division — are synthesized by the mother and 
stored in the egg. Homozygotes, therefore, are able to 
complete embryogenesis normally using materials 
supplied by their heterozygous mothers but are 
unable to provide their offspring with these essential 
activities. Other mutations affect the postembryonic 
divisions; most of these occur in the developing gonad 
and nervous system and therefore these mutants 
usually exhibit sterility and motility defects. Still
other mutations affect genes with very limited roles
and confer more subtle mutant phenotypes. 
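The maternal-effect logic can be checked with a short Mendelian sketch. It assumes a hypothetical recessive maternal-effect lethal allele `m` in a self-fertilizing hermaphrodite and a strict maternal effect, in which the embryo's survival depends only on the mother's genotype; the function names `self_cross` and `embryo_viable` are illustrative, not part of any published tool.

```python
from itertools import product


def self_cross(parent):
    """Offspring genotypes from self-fertilization, one entry per equally
    likely gamete combination (simple Mendelian segregation)."""
    return [tuple(sorted(pair)) for pair in product(parent, repeat=2)]


def embryo_viable(mother, embryo):
    """Strict maternal-effect lethal model for a hypothetical allele 'm'.

    The embryo's own genotype is deliberately ignored: its early divisions
    rely on gene products the mother deposited in the egg, so it survives
    only if the mother carries at least one wild-type ('+') allele.
    """
    return "+" in mother


het = ("+", "m")
children = self_cross(het)
# All offspring of an m/+ mother survive, including m/m homozygotes.
print([g for g in children if embryo_viable(het, g)])

homozygous = ("m", "m")
# An m/m adult is phenotypically wild-type, yet none of its embryos survive.
print([embryo_viable(homozygous, g) for g in self_cross(homozygous)])
```

The m/m animals from the first cross grow up normally; the lethality appears only one generation later, which is why such mutations are recognized in the progeny of homozygotes.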

Reverse genetic approaches provide an alternative 
means to investigate gene function. These approaches, 
which require DNA sequence information, have 
become popular as more genomic sequence data have 
become available. Although these approaches vary 
greatly, they all involve disrupting the expression of 
genes that have been defined by DNA sequence only. 
One such method involves molecularly screening a 
large pool of mutagenized chromosomes for a small 
deletion that removes the gene sequence of interest. 
From an initially large population, a single worm 
heterozygous for the deficiency can eventually be 
identified and the phenotype of homozygous off- 
spring analyzed. Alternatively, the activity of a gene 
can be temporarily silenced through the introduction
of a double-stranded RNA molecule derived from the
sequence of interest. This method, called RNA interference,
or RNAi, involves microinjecting the dsRNA
into the gonad of an adult hermaphrodite. Expression
of this gene in the offspring of the injected animal is
often silenced. Although the mechanism of RNAi was
not understood when the method was introduced, it is
known to be specific and highly effective, in many cases
producing a phenotype equal to that of the strongest
loss-of-function mutations.

Figure 1 Early embryonic development of Caenorhabditis elegans. (A) Pronuclear migration. The oocyte pronucleus (o) travels toward the sperm pronucleus (s) at the posterior of the embryo and passes through a transient furrow at mid egg length. (B, C) Alignment of the first spindle. After meeting at the posterior, the two pronuclei move to the center, where they undergo a 90° rotation to position the centrosomes, or future spindle poles (arrowheads), on the anterior-posterior (A-P) axis. (D, E) First cleavage. The spindle, visible as a clearing of cytoplasmic granules, initially forms at the center of the cell but becomes eccentrically placed towards the posterior during anaphase. The ensuing furrow divides the cell into a larger anterior cell (AB) and a smaller posterior cell (P1). (F) Second division. AB divides first and symmetrically, with its spindle (poles denoted by arrowheads) perpendicular to the A-P axis. Soon after, P1 divides asymmetrically along the A-P axis; the positions of the P1 centrosomes (indicated by arrows) predict the axis on which the spindle will form. Anterior is to the left in all panels. Bar = 10 µm. (Reproduced with permission from O’Connell et al., 1998.)

The following sections illustrate how these approaches
have been applied to study various cell division
processes in C. elegans. 


Mitotic Spindle Assembly and 
Chromosome Segregation 


The hallmark of mitosis is the bipolar spindle, an 
elaborate macromolecular assembly of biopolymers
called microtubules, microtubule organizing centers
or centrosomes, and a host of accessory proteins. 
The principles that underlie its structure and essential 
role in chromosome segregation have long been active 
areas of research. Among the C. elegans genes known 
to be required for proper spindle assembly, the zyg-1 
and spd-2 genes play crucial roles. Mutation of the 
zyg-1 gene leads to the presence of abnormal numbers 
of centrosomes. As each spindle pole is organized by a 
centrosome, zyg-1 mutations result in the formation
of mono- and multipolar mitotic spindles. In spd-2 
mutants, centrosomes appear to be nonfunctional, 
and spindle formation is completely blocked. These 
genes underscore the dominant role of the centrosome
in establishing spindle form. Spindle structure is also
affected by mutations that inhibit microtubule func- 
tion as illustrated by maternal-effect mutations in the 
zyg-9 gene. In zyg-9 mutant embryos, the first mitotic 
spindle is smaller than normal and misplaced towards 
the posterior of the zygote. In addition, female meiotic 
spindles do not form. The ZYG-9 protein is homo- 
logous to XMAP215, a frog microtubule-binding 
protein, and like XMAP215, ZYG-9 protein localizes 
to spindle poles and regulates microtubule length. 

Genetics has also been used to investigate the mechanisms
of chromosome segregation in worms, and many
mutations that affect the fidelity of this process
have been identified. Most of these mainly affect
meiotic chromosome segregation. They are easily
identified because they affect segregation of the X
chromosome: hermaphrodites are XX and males are XO,
so loss of an X alters the ratio of hermaphrodites
to males, yielding a high-incidence-of-males (Him)
phenotype. At least one of these genes,
him-10, also functions in mitotic chromosome segre- 
gation and may act in both the germline and the soma. 
While the underlying cause of the him-10 chromosome 
missegregation defect is unknown, further analysis 
should provide valuable insight into this important 
process. The abc-1 gene is also required for mitotic 
chromosome segregation. In abc-1 mutants, daughter 
chromosomes fail to separate completely and remain 
connected by chromatin bridges. Thus, the ABC-1 
protein may act to facilitate the separation of sister 
chromatids. 


Cytokinesis 


Cell division ends with the process of cytokinesis in 
which the mother cell is cleaved in half yielding two 
cells, each containing one of the daughter nuclei. 
Mutant analysis in C. elegans indicates that cytokinesis 
may proceed in two distinct steps. The first step involves
furrow formation and requires cytoskeletal elements 
that act to constrict the cell at the equator. This step is 
defined genetically by mutations that block furrowing 
activity, such as those in the cyk-3 gene. The second 
step involves separation of daughter cells and is illus- 
trated by mutations in the cyk-1 gene. In cyk-1 
mutants, furrows form and ingress normally but 
fail to complete. cyk-1 encodes an evolutionarily
conserved protein that localizes to cytokinetic furrows,
where it may act to carry out this final step by
stabilizing the constriction until cleavage is complete. 


Cell Cycle Coordination 


Successful completion of the cell cycle requires that 
each of the necessary events is executed in an orderly 
fashion; for instance, DNA synthesis must be 
completed before mitosis begins, and mitosis must be
completed before initiating a new cell cycle. To main- 
tain order, a set of regulatory genes monitor cell cycle 
‘checkpoints’ and block cycle progression until a particular
task is complete. In C. elegans, the lin-5 and
lin-6 genes appear to play such a role. In lin-6 mutants,
for example, multiple rounds of nuclear division occur 
in the absence of DNA synthesis. Thus, one can specu- 
late that the LIN-6 protein functions to prevent cell 
cycle progression until DNA synthesis is completed. 
Further analysis of these mutants will aid in under- 
standing how these important cell cycle controls 
function in multicellular organisms. 
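Why such a checkpoint matters can be illustrated with a toy model contrasting a cycle in which nuclear division waits for DNA synthesis with one in which it does not. This is only an illustration of the speculated role described above, not a model of LIN-5 or LIN-6 biochemistry; `run_cycles` and its parameters are hypothetical, and DNA content is tracked in C units starting from a diploid 2C nucleus.

```python
def run_cycles(n_cycles, replication_ok, checkpoint_on, start_c=2.0):
    """Toy cell cycle: returns (event, nuclei, DNA per nucleus) per cycle.

    replication_ok: whether S-phase succeeds in this (hypothetical) cell.
    checkpoint_on: whether nuclear division is blocked until replication
    has been completed.
    """
    nuclei, c = 1, start_c
    log = []
    for _ in range(n_cycles):
        if replication_ok:
            c *= 2  # S-phase duplicates each nucleus's DNA
        if checkpoint_on and not replication_ok:
            log.append(("arrest", nuclei, c))  # checkpoint halts the cycle
            break
        nuclei *= 2  # nuclear division proceeds
        c /= 2       # each daughter nucleus gets half the DNA
        log.append(("divide", nuclei, c))
    return log


# Normal cycles: DNA per nucleus stays at 2C.
print(run_cycles(3, replication_ok=True, checkpoint_on=True))
# Replication fails, checkpoint intact: the cycle arrests.
print(run_cycles(3, replication_ok=False, checkpoint_on=True))
# Replication fails, no checkpoint: nuclei keep dividing as DNA content falls.
print(run_cycles(3, replication_ok=False, checkpoint_on=False))
```

The third call mimics the reported lin-6 phenotype of nuclear divisions in the absence of DNA synthesis; in the model, the checkpoint is the only thing that converts a replication failure into an arrest rather than progressive DNA loss.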


Mechanisms of Asymmetric Cell Division 


Many of the embryonic divisions of C. elegans are 
asymmetric; the two daughters differ in size, developmental
potential, cell cycle length, and cleavage
pattern. These divisions require the six par genes. 
Mutations in these genes partially or completely trans- 
form the asymmetric divisions of the early embryo 
into proliferative divisions. That is, daughters of a 
normally asymmetric division are qualitatively more 
similar. Genetic analysis suggests that the par genes 
interact with cytoskeletal elements to reorganize the 
embryo prior to cleavage such that each daughter 
inherits a different set of developmental instructions. 

An important aspect of asymmetric division that 
is being investigated is how the orientation of these 
divisions is established. In contrast to proliferative 
divisions, which occur along all embryonic axes, asymmetric
divisions occur only along the anterior-posterior
(A-P) axis. This A-P alignment is essential for asymmetric
division and is established by mechanisms that
align the mitotic spindle parallel to the A-P axis. The 
molecular motor protein dynein may provide the 
motive force that drives spindle positioning. Spindles 
fail to align properly in embryos that have been treated 
with RNAi to silence expression of genes required for 
dynein activity. As a result, these embryos exhibit ab- 
normal cleavage configurations. Likewise, mutations 
in the let-99 gene cause related defects. Further analysis
of these genes will provide important information on 
the molecular mechanism of spindle alignment. 


Future Prospects 


While cell division research in C. elegans is only in 
its infancy, there is great potential for growth. With 
the entire genomic DNA sequence nearly in hand,
the advent of powerful reverse genetic approaches, 
and strong methodology for cytological studies, this 
model organism offers an enormous opportunity to 
learn more about basic cell division processes. 
The cell lineage of an organism is the pattern of cell
divisions during its development. Cell lineages are 
described by following cell divisions in living individ- 
uals, or by marking cells and examining their progeny. 
Some organisms or precursor cells display invariant 
patterns of cell division, in which specification of cell 
fates is correlated with cell division patterns; in other 
organisms, lineage patterns are variable and not correl- 
ated with cell fates. Invariant cell lineages reflect both 
cell-autonomous mechanisms of fate determination 
and highly reproducible cell-cell interactions. Genetic 
analysis of cell lineages has focused on systems where 
cell lineage and cell fates are correlated, such as Caeno- 
rhabditis elegans or the nervous system of Drosophila. 
Mutations affecting cell lineages in these animals have 
been informative in understanding both the mechan- 
isms of cell fate specification and the control of cell 
proliferation. 


Overview of Biology of Cell Lineages 


History of Cell Lineage Studies 

Cell lineage studies began with Whitman’s description 
of cleavage patterns in leech embryos in the 1870s, and 
continued with descriptions of lineages in many inver- 
tebrate animals, including nematodes, sea urchins, and 
ascidians. It was found that in some animal groups, 
such as nematodes and ascidians, the pattern of cell 
divisions was almost identical from individual to
individual. Such ‘invariant’ cell lineages allowed the
reconstruction of extensive lineage trees. In other ani- 
mals, such as leeches and insects, stereotyped patterns 
of cell division (‘sublineages’) were seen in the pro- 
geny of particular precursor cells. Because of the cor- 
relation between cell lineage and cell fate in such 
invariant lineages, it was assumed that cell fates were 
determined by factors segregating within the dividing 
cells (termed ‘determinate’ cleavage). This mode of 
development was contrasted with the ‘indeterminate’ 
cleavages observed in other animals, in which cell 
lineages are variable and cell fates are determined by 
a cell’s interaction with its environment. However, as 
discussed below, invariant cell lineages do not neces- 
sarily mean that cell fates are determined by the cell 
lineage pattern (see Moody, 1999 for examples). Over 
time, the term ‘cell lineage’ has acquired multiple 
meanings (Slack, 1991; Price, 1993). Here, cell lineage 
is defined as the pattern of cell divisions in the devel- 
opment of an organism, whether invariant or not. 


How Cell Lineages Are Followed 


Direct observation 

In the nineteenth century, lineages were followed 
either by direct observation, or by reconstruction 
from fixed specimens. Such studies required embryos 
that were small, transparent, and rapidly developing, 
but were necessarily limited to early embryogenesis 
where the cells were large and few in number. More 
extensive observations of cell lineages have been 
made possible by the development in the 1960s of 
Nomarski differential interference contrast micro- 
scopy, which allows the imaging of transparent speci- 
mens. The complete cell lineage of the nematode 
C. elegans was followed using Nomarski microscopy; 
cell lineages in the Drosophila central nervous system 
have also been described by direct observation. More 
recently, time lapse microscopy in multiple focal 
planes (‘four-dimensional’ microscopy) has allowed 
entire cell lineages of individual animals to be re- 
corded digitally. 


Clonal analysis 

In large, opaque, or slowly developing embryos, direct 
observation of cell divisions is not feasible. To analyze 
cell lineages in such cases, it is necessary to mark 
individual cells by physical or genetic means, and 
later to identify their progeny by expression of the 
marker. Such techniques are known as clonal analysis, 
because the progeny of a single cell forms a clone. In 
many animals cells can be labeled by injection with a 
nondiffusing dye such as fluorescein-conjugated dex- 
tran. A problem with this technique in growing tissues 
is that the dye can become progressively diluted with
each round of cell division. In vertebrates, cells can be
marked by infection of an embryo with a replication- 
defective retrovirus that expresses a reporter gene such 
as β-galactosidase or green fluorescent protein (GFP).
At low virus concentrations single cells can be infected 
and their progeny recognized by reporter gene expres- 
sion; there is no dilution of the marker because each 
cell in the clone expresses the reporter gene. This tech- 
nique has been used to analyze cell lineages in chick 
and mammalian neural development. 
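The contrast between the two marking methods is simple arithmetic: after n rounds of division a clone contains 2^n cells, so a fixed amount of injected dye is spread 2^n-fold thinner, whereas a reporter integrated into the genome is inherited by every cell. The sketch below assumes idealized conditions (every cell divides at each round, dye is split equally and never degraded, reporter expression is constant); the function name `marker_per_cell` is hypothetical.

```python
def marker_per_cell(divisions, marker):
    """Relative marker amount in each cell of a clone (founder cell = 1.0).

    Simplifying assumptions: every cell divides at each round, injected dye
    is split equally between daughters and is never degraded, and an
    integrated reporter is expressed at a constant level in every cell.
    """
    if marker == "dye":
        return 0.5 ** divisions  # a fixed amount shared among 2**n cells
    if marker == "provirus":
        return 1.0  # copied with the genome, so it never dilutes
    raise ValueError(f"unknown marker: {marker!r}")


for n in (0, 5, 10):
    clone_size = 2 ** n
    print(n, clone_size, marker_per_cell(n, "dye"), marker_per_cell(n, "provirus"))
```

After ten divisions each of the 1024 labeled cells would hold about 0.1% of the original dye, which is why retroviral marking suits extensively proliferating tissues.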

In Drosophila, individual cells can be marked gen- 
etically for clonal analysis by mitotic recombination 
(Figure 1A). This technique is based on the observation
that X-irradiation of mitotically dividing cells
induces recombination between the chromatids of
homologous chromosomes. Thus, if
a parent cell that is heterozygous for a mutation (m/+)
undergoes recombination between the mutation and
the centromere in the G2 phase of the cell cycle, it can
divide to produce one homozygous mutant cell (m/m)
and one homozygous wild-type cell (+/+). Recessive
mutations that cause cell-autonomous phenotypes 
will be expressed only in the clone of mutant cells 
derived from the m/m daughter, allowing this clone 
to be visualized. The size of the clone depends on the 
number of cell divisions between irradiation and the 
time of analysis. Inducible expression of recombinases 
such as the yeast FLP enzyme causes mitotic recom- 
bination between chromatids bearing the FLP recog- 
nition sequence (FRT sites), allowing clones to be 
made at specific times and in specific tissues. Clones 
of genetically marked cells can also be generated in 
plants by induced excision of a transposon from with- 
in a transgenic reporter gene. 
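The genotypes produced by an exchange in G2 can be enumerated directly. The sketch below is a simplified model with one marker, one exchange between the marker and the centromere, and an equal chance for each way the chromatids segregate; its function names are hypothetical. It shows that only half of the segregation patterns give the m/m plus +/+ ‘twin’ pair, while the other half restore two heterozygous daughters, and that the marked clone grows as 2^n with the number of subsequent divisions.

```python
from collections import Counter
from itertools import product


def genotype(alleles):
    """Format two alleles as a genotype string with 'm' written first."""
    return "/".join(sorted(alleles, key=lambda a: a != "m"))


def segregations_after_g2_exchange():
    """Count daughter genotype pairs over all chromatid segregation patterns.

    Each chromatid is represented only by the allele it carries distal to
    the exchange point. Before the exchange, one homolog has two 'm'
    chromatids and the other two '+' chromatids; an exchange between the
    marker and the centromere swaps the distal segments of one chromatid
    from each homolog.
    """
    homolog_1 = ("m", "+")
    homolog_2 = ("+", "m")
    outcomes = []
    # At anaphase each daughter receives one chromatid of each homolog.
    for i, j in product((0, 1), repeat=2):
        daughter_a = genotype((homolog_1[i], homolog_2[j]))
        daughter_b = genotype((homolog_1[1 - i], homolog_2[1 - j]))
        outcomes.append(tuple(sorted((daughter_a, daughter_b))))
    return Counter(outcomes)


def clone_size(divisions_after_exchange):
    """Cells in the m/m clone, assuming every descendant keeps dividing."""
    return 2 ** divisions_after_exchange


print(segregations_after_g2_exchange())
print(clone_size(5))
```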

Chimeric embryos are a different form of genetic 
mosaic and have also been useful in defining lineage 
relationships. Chimeras are embryos formed from cells 
of two different genotypes. Most chimeras involve 
multiple cells of each type and thus these approaches 
involve the analysis of multiple rather than single 
clones. Mammalian chimeras are made by combining 
blastomeres from two early embryos; if the cells 
are genetically or physically distinct their progeny can 
be identified later. Chimeras can be made between 
chick and quail embryos; the quail cells can be dis- 
tinguished by nucleolar morphology, allowing lineage 
relationships to be traced. Interspecific chimeras have 
also been used to examine lineages in plant develop- 
ment. 


Types of Cell Division Pattern 

Figure 1 (A) Clonal analysis techniques. Generation of homozygous mutant clones in Drosophila by X-ray irradiation or by FLP recombinase-catalyzed recombination between FRT sites. The centromere is denoted by the circle between chromatids. m, cell-autonomous recessive marker mutation. (B) Types of cell division: proliferative, stem-cell, and diversifying.

Cell division patterns are typically represented as a
branching tree (Figure 1B). Three basic types of division
can be distinguished (Stent, 1998). In a ‘proliferative’
cell division, a cell divides symmetrically to give
rise to two daughters, each of which behaves like its
parent (cell type A divides to give two cells of type A).
The other two types of division are asymmetric, in 
that the fates of the daughter cells are different. In a 
‘stem-cell’ division, the parent cell gives rise to one 
daughter that resembles the parent and one daughter 
of a different type (A divides to make A + B). Finally, 
in a ‘diversifying’ lineage the two daughters are differ- 
ent in fate from each other and from their parent (A 
divides to make B + C). Some bacteria, such as Bacillus 
subtilis and Caulobacter crescentus, and single-celled 
eukaryotes such as the budding yeast Saccharomyces 
cerevisiae develop by stem-cell-like cell divisions 
and provide models for understanding asymmetric cell 
division in multicellular animals. Because asymmetric 
cell divisions give rise to daughters with different fates 
they are important in understanding how different 
cell types arise, and have been the focus of intense 
genetic analysis (see below; reviewed by Horvitz and 
Herskowitz, 1992; Jan and Jan, 1998). 
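The three types of division described above are defined purely by comparing each daughter's type with the parent's, so they can be expressed as a small classification rule applied to every branch of a lineage tree. The sketch below is illustrative only; the function name and the example lineage are hypothetical, and divisions that fit none of the three types (for example A giving B + B) are reported as "other".

```python
def classify_division(parent: str, daughters: tuple[str, str]) -> str:
    """Classify a division by comparing daughter cell types with the parent."""
    a, b = daughters
    if a == b == parent:
        return "proliferative"  # A -> A + A
    if a != b and parent in daughters:
        return "stem-cell"      # A -> A + B
    if a != b:
        return "diversifying"   # A -> B + C
    return "other"              # e.g. A -> B + B, outside the three types


# A small hypothetical lineage: (parent type, (daughter types)).
lineage = [
    ("A", ("A", "A")),
    ("A", ("A", "B")),
    ("B", ("C", "D")),
]
for parent, daughters in lineage:
    print(parent, "->", " + ".join(daughters), ":", classify_division(parent, daughters))
```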
Intrinsic and Extrinsic Mechanisms in Cell
Fate Determination 

In animals displaying invariant cell lineages, the ances- 
try, environment, and fate of a cell are correlated. It 
was often assumed that invariant cell lineages reflected 
intrinsic (cell-autonomous) mechanisms of cell fate 
determination (also known as the ‘mosaic’ mode of 
development), in which the fate of a cell is determined 
only by its inheritance of factors segregated in ances- 
tral cell divisions. However, lineage invariance is not
sufficient evidence for a lineage-intrinsic mechanism.
Because both a cell’s environment and its ancestry are
correlated with its fate in an invariant lineage, fates
could equally be specified by reproducible cell-cell
interactions rather than by reproducible inheritance of
intrinsic factors. To prove that
fates are specified autonomously, experiments in which 
a cell is isolated or transplanted must be performed. 
Although nematodes and ascidians both display invari- 
ant lineages, modern experiments have shown that 
many aspects of development in these animals are 
not cell-autonomously programmed, but instead rely 
on invariant cell-cell interactions. 


Genetics of Cell Lineage in Nematode 
Caenorhabditis elegans 


Cell Lineage 

Our understanding of cell lineages in Caenorhabditis 
elegans is uniquely privileged in that the complete cell 
lineage from zygote to adult has been determined 
(Figure 2), a heroic work of direct observation of 


living specimens (reviewed by Sulston, 1988). In con- 
junction with maps of cell nuclei, the cell lineage 
provides a complete fate map, and makes it possible 
to analyze the results of experimental manipulations 
and mutants with single-cell resolution. 

The C. elegans zygote undergoes a series of asym- 
metric cell divisions to generate six blastomeres (AB, 
MS, E, C, D, and P4), known as embryonic founder 
cells (Figure 3A). Each founder cell is distinctive in 
terms of its cell lineage pattern and the cell fates it 
generates. For example, the zygote divides asymmet- 
rically to form a larger anterior daughter denoted the 
AB founder cell, which undergoes a set of initially 
symmetrical divisions to generate neurons, muscle 
cells, and some epidermal cells. Most cell proliferation 
occurs during the first half of embryogenesis; a small 
number of postembryonic blast cells divide in larval 
development to generate neuronal and epidermal cells, 
the gonad, and sexually dimorphic structures. During 
the development of a C. elegans hermaphrodite, 1090
somatic cells are generated, of which 131 undergo 
programmed cell death, to yield an adult containing 
959 somatic cell nuclei (the number of cells is lower 
because some cells fuse to form multinucleate syncytia). 
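As a check, the hermaphrodite counts given above balance (the total is expressed in nuclei rather than cells, for the reason just stated):

```latex
\underbrace{1090}_{\text{somatic cells generated}}
\;-\; \underbrace{131}_{\text{programmed cell deaths}}
\;=\; \underbrace{959}_{\text{somatic nuclei in the adult}}
```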

In C. elegans lineage studies each cell is given a 
unique name reflecting its lineage history. Certain key 
embryonic and postembryonic precursors are given ar- 
bitrary names (e.g., AB, Z1). Their progeny are named 
by adding letters denoting the axis of the cell division 
relative to the body axes (a/p for anterior/posterior, 
etc.). Thus, Z1.ppp is the posterior daughter of the 
posterior daughter of the posterior daughter of Z1. 
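The naming rule encodes a cell's path through a binary lineage tree: a founder name followed by one letter per division. The Python sketch below assumes the dotted form used for postembryonic names such as Z1.ppp. Embryonic names descended from AB are often written without the period, and the sketch does not handle that form. The helper names are hypothetical. Besides a/p, the other axis letters in this nomenclature are l/r (left/right) and d/v (dorsal/ventral).

```python
# Division-axis letters used in C. elegans lineage names.
AXIS_LETTERS = {
    "a": "anterior", "p": "posterior",
    "l": "left", "r": "right",
    "d": "dorsal", "v": "ventral",
}


def daughter(name: str, letter: str) -> str:
    """Name of the daughter of `name` on the side given by `letter` (dotted form assumed)."""
    if letter not in AXIS_LETTERS:
        raise ValueError(f"unknown division axis letter: {letter!r}")
    # The first division of a founder adds a period; later divisions just append letters.
    return name + letter if "." in name else f"{name}.{letter}"


def ancestors(name: str) -> list[str]:
    """All ancestors of `name`, from the founder cell down to its parent."""
    founder, _, path = name.partition(".")
    if not path:
        return []  # a founder cell has no named ancestors in this scheme
    return [founder] + [f"{founder}.{path[:i]}" for i in range(1, len(path))]


def describe(name: str) -> str:
    """Spell out a lineage name in words."""
    founder, _, path = name.partition(".")
    text = founder
    for letter in path:
        text = f"the {AXIS_LETTERS[letter]} daughter of {text}"
    return text


print(describe("Z1.ppp"))
print(ancestors("Z4.aaa"))
```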
Figure 2 The Caenorhabditis elegans cell lineage. Time axis is vertical; each cell division is a horizontal line. The origin of some cell types is indicated.
Figure 3 (A) Abbreviated embryonic lineage of Caenorhabditis elegans, showing the relationships of the major em-
bryonic founder cells. (B) P cell sublineage, showing the classes of cell generated. mn, motor neuron, with neuro- 
transmitter indicated (ACh, cholinergic; GABA, GABAergic); sn, sensory neuron. 


The somatic cell lineage of C. elegans is largely 
invariant, with limited exceptions. Within some pairs 
of cells there is variation in terms of which member of 
the pair adopts one fate and which adopts the other 
fate. For example, two adjacent gonadal precursor 
cells, Z1.ppp and Z4.aaa, generate two cells known as 
an anchor cell (ac) and a ventral uterine precursor 
(VU) cell. In an individual animal, either Z1.ppp or 
Z4.aaa becomes an anchor cell, and the other cell 
becomes a VU cell. Since in normal development the 
Z1.ppp/Z4.aaa pair never generates two anchor cells 
or two VU cells, the two cells must communicate to 
ensure the normal pattern of fates. The Z1.ppp/Z4.aaa 
pair of cells is an example of an ‘equivalence group’: a
group of cells equivalent in developmental potential. 
In the case of the Z1.ppp/Z4.aaa pair, the choice of 
fates appears to be entirely stochastic; in other equiva- 
lence groups, the choice of fates is biased. 


The vast majority of cell divisions in C. elegans 
are asymmetric, in that the fates of the daughters 
are different. Most cell types (neurons, muscles, epi- 
dermis) are generated in patterns that, while not ran- 
dom, do not show simple lineage relationships. The 
intestine and germline are exceptional in that they
develop as clones from the precursors E and P4,
respectively. Furthermore, the germline develops in a
proliferative lineage that is variable from animal to 
animal. 

A striking feature of the lineage is that repeated 
‘sublineages’ are evident, in which homologous pre- 
cursors divide in identical ways to make homologous 
sets of cells. Such sublineages, in which cell fate and 
lineage correlate in multiple instances, suggest the 
existence of lineage-intrinsic mechanisms specifying 
fates. For example, along the length of the ventral 
side of the first-stage larva are 12 postembryonic
blast cells denoted P1 through P12 (these are different
from the embryonic blast cells P0–P4). Each P cell
divides to generate an anterior daughter with neuro- 
blast fate and a posterior daughter with epidermal fate; 
the anterior daughters all divide in similar patterns to 
generate five motor neuron types at identical positions 
in each lineage tree (Figure 3B). P cells in different 
body regions divide in the same basic pattern with 
slight modifications. Because isolation or transplant- 
ation of P cells is not technically feasible, it is not 
known to what extent cell fates are determined intrin- 
sically within each sublineage. 

An example of extrinsic control of cell fates was 
provided by Priess and Thomson (1987). Normally the 
anterior and posterior daughters of AB have different 
fates. If the division axis of AB is reversed by micro- 
manipulation, such that the anterior daughter now lies 
posteriorly, the AB daughters display regulation and a 
normal embryo is formed. Thus, differences between 
AB daughters cannot result from cell-autonomous 
mechanisms but must involve interactions with each 
cell’s environment. 


Isolation of Cell Lineage Mutants 

Mutations affecting C. elegans cell lineages have been 
isolated in many genetic screens. The most common 
approach has been to isolate mutants with morpho- 
logical or behavioral defects, and subsequently to 
identify cell lineage defects. Because C. elegans can 
propagate as a self-fertilizing hermaphrodite, many 
mutants with severe defects in morphology or behav- 
ior can be recovered. Alternative approaches have 
been to screen directly for alterations in the pattern 
or number of cells generated, visualizing cells by 
Nomarski microscopy or by staining with DNA- 
binding dyes. Early screens focused on mutants affect- 
ing postembryonic cell divisions; more recently, 
screens for maternal-effect and zygotic embryonic 
lethal mutants have identified genes required for 
embryonic cell lineages. The genes defined by such 
cell lineage mutants form a diverse set, with roles 
ranging from general requirements in cell division to 
roles in certain types of cell division or specific cell 
fates (reviewed by Horvitz, 1988). 


Genes Identified by Cell Lineage Mutations 


Genes required for cell-cell interactions that specify 
fates 

Many mutations result in ‘homeotic’ cell fate trans- 
formations, that is, a particular cell is not simply 
abnormal but takes on the fate (as evidenced by a cell 
lineage transformation or other markers) of another 
cell normally found in a different body region, in a 
different developmental stage, or in the other sex. 


An example of a homeotic transformation of cell 
lineage is provided by mutations in the lin-12 gene (lin
stands for cell lineage abnormal). lin-12 mutants 
display a variety of homeotic transformations, often 
involving the members of equivalence groups. For 
example, in the ac/VU (Z1.ppp/Z4.aaa) equivalence 
group, a reduction of lin-12 function causes both 
cells to become anchor cells (Figure 4A). Elevation 
of lin-12 activity causes both cells to become VU cells. 
Because opposite changes in lin-12 activity cause 
opposite effects on cell fates, lin-12 is an example of
a binary switch gene, whose activity controls which 
of two alternative fates a cell can adopt. The LIN-12 
protein is a transmembrane receptor of the Notch 
family, and functions in cell-cell communication 
between members of an equivalence group. Thus, in 
normal development, LIN-12 is likely initially 
expressed in both Z1.ppp and Z4.aaa. By chance, 
LIN-12 becomes more active in one cell than the 
other; elevated activity of LIN-12 feeds back posi- 
tively to keep LIN-12 on in that cell, and negatively to 
turn LIN-12 off in the other cell. As a result, LIN-12 
activity increases in the cell that becomes the VU 
cell, and decreases in the cell that becomes the anchor cell.
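The feedback loop described above can be illustrated with a toy numerical model. The sketch below is not a published model of LIN-12 signaling. The starting activity, noise, feedback gain, activity bounds, and the threshold separating the VU fate from the anchor-cell fate are all hypothetical. The model shows that amplifying a small random difference between two initially equivalent cells always yields one VU cell and one anchor cell, with the winner chosen by chance. It also shows that fixing LIN-12 activity low (lf) or high (gf) in both cells gives two anchor cells or two VU cells, as in Figure 4A.

```python
import random

THRESHOLD = 1.0  # hypothetical: LIN-12 activity above this gives the VU fate


def resolve_pair(seed: int, genotype: str = "wild-type",
                 gain: float = 0.5, steps: int = 60) -> dict[str, str]:
    """Toy model of the LIN-12-mediated ac/VU choice between Z1.ppp and Z4.aaa."""
    if genotype == "lf":
        activity = {"Z1.ppp": 0.0, "Z4.aaa": 0.0}  # loss of function: no LIN-12 activity
    elif genotype == "gf":
        activity = {"Z1.ppp": 2.0, "Z4.aaa": 2.0}  # gain of function: LIN-12 on in both cells
    else:
        rng = random.Random(seed)
        # Both cells start with nearly equal activity; a small random difference breaks the tie.
        activity = {cell: 1.0 + rng.uniform(-0.01, 0.01) for cell in ("Z1.ppp", "Z4.aaa")}
        for _ in range(steps):
            a, b = activity["Z1.ppp"], activity["Z4.aaa"]
            # Positive feedback on the leading cell and negative feedback on its neighbour,
            # with activity bounded between 0 and 2.
            activity["Z1.ppp"] = min(2.0, max(0.0, a + gain * (a - b)))
            activity["Z4.aaa"] = min(2.0, max(0.0, b + gain * (b - a)))
    return {cell: "VU" if level > THRESHOLD else "AC" for cell, level in activity.items()}


for genotype in ("wild-type", "lf", "gf"):
    print(genotype, resolve_pair(seed=1, genotype=genotype))
```

Running the wild-type case with different seeds makes either Z1.ppp or Z4.aaa the anchor cell, mirroring the stochastic choice described for this equivalence group.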


Genes required for timing of cell lineage patterns 

C. elegans normally develops through four larval 
stages (L1-L4). Postembryonic precursor cells under- 
go stage-specific patterns of cell division within each 
larval stage. A fascinating class of mutants known as 
heterochronic mutants display either precocious or 
retarded expression of these cell lineage patterns. 
Genes defined by heterochronic mutations thus func- 
tion in controlling the temporal pattern of cell fates 
during larval development. 

Mutations in the lin-14 gene affect stage-specific
patterns of cell division (Figure 4B). Reduction of 
LIN-14 function results in a precocious phenotype 
(early stages express the patterns of later larval stages), 
while abnormally high LIN-14 function causes re- 
tarded cell lineage patterns (all stages express early 
patterns). Thus, the level of LIN-14 activity deter- 
mines whether a precursor undergoes early or late 
division patterns. The lin-14 locus encodes nuclear 
proteins of unknown biochemical function that are 
present at high levels in early larvae and low levels in 
late larvae. 
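The statement that the LIN-14 level decides between early and late patterns can be sketched as a threshold rule. In the Python sketch below, the relative LIN-14 levels and the threshold are hypothetical. The sketch also collapses all later-stage patterns into a single "late" category, which is a simplification.

```python
STAGES = ["L1", "L2", "L3", "L4"]
# Hypothetical relative LIN-14 levels: high in early larvae, low in later larvae.
WILD_TYPE_LIN14 = {"L1": 1.0, "L2": 0.1, "L3": 0.1, "L4": 0.1}
THRESHOLD = 0.5  # hypothetical cut-off between early and late division patterns


def division_pattern(stage: str, genotype: str = "wild-type") -> str:
    """Return 'early' or 'late' depending on the LIN-14 level at a larval stage."""
    level = WILD_TYPE_LIN14[stage]
    if genotype == "lf":
        level = 0.0  # loss of function: LIN-14 low at every stage
    elif genotype == "gf":
        level = 1.0  # gain of function: LIN-14 stays high at every stage
    return "early" if level > THRESHOLD else "late"


def phenotype(genotype: str) -> str:
    """Compare a mutant's stage-by-stage patterns with wild type."""
    mutant = [division_pattern(s, genotype) for s in STAGES]
    wild = [division_pattern(s) for s in STAGES]
    if mutant == wild:
        return "wild-type"
    # Late patterns appearing where wild type shows early ones indicate precocious development.
    if any(m == "late" and w == "early" for m, w in zip(mutant, wild)):
        return "precocious"
    # Otherwise early patterns persist into later stages: retarded development.
    return "retarded"


for g in ("wild-type", "lf", "gf"):
    print(g, [division_pattern(s, g) for s in STAGES], phenotype(g))
```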


Genes involved in asymmetric cell divisions 

Many cell lineage mutants display defects in the nor- 
mal asymmetry of cell divisions, and have provided 
insights into the mechanisms by which determinants 
of cell fates are segregated in such asymmetric div- 
isions. The first division of the zygote is asymmetric 
Figure 4 Effects of cell lineage mutations. (A) Effects of lin-12 mutations on cell fates in the anchor cell/ventral
uterine (ac/VU) equivalence group. Filled circle = anchor cell; open circle = ventral uterine precursor. lin-12(gf) =
gain-of-function lin-12 mutation causing overactivity of the LIN-12 protein; lin-12(lf) = loss or reduction of LIN-12
function. (B) Effects of lin-14 mutations on the temporal control of the lineage of the postembryonic blast cell T. In
normal development T generates the lineage shown in the L1 and L2 larval stages. In lin-14(gf) mutants the LIN-14
protein is overactive and early (L1-specific) lineages are reiterated (retarded phenotype); in mutants that cause loss of
LIN-14 function (lin-14(lf)), L1-specific patterns are bypassed and T undergoes an L2-specific lineage. (C) Effect of
unc-86 mutations on diversifying lineages in the nervous system. In unc-86 loss-of-function mutants, diversifying
lineages are transformed to a reiterating stem-cell-like pattern. UNC-86 protein is expressed within the daughter
affected in the mutants (C in the figure).
along the anteroposterior axis, which is determined by
the point of fertilization. Because the oocyte appears 
to be symmetrical, and after fertilization is isolated 
within an eggshell, the asymmetry of the first division 
is likely to be set up cell-autonomously rather than by 
environmental cues. Maternal-effect mutations affect- 
ing the asymmetry of the first division define several 
par (defective partitioning) genes, the products of 
which are asymmetrically distributed in the zygote. 
The asymmetry of subsequent cell divisions may 
involve both intrinsic mechanisms that provide a cel- 
lular memory of this initial asymmetry, and cell-cell 
interactions. All asymmetric divisions in C. elegans 
involve cell division along the anteroposterior axis, 
and in many of these divisions the protein POP-1 is 
asymmetrically distributed with higher POP-1 levels 
in the anterior daughter. In many cells this asymmetry 
of POP-1 levels requires cell signaling via the Wnt 
pathway. 

Several genes have been identified that function in 
asymmetric cell divisions in later development. One 
gene, unc-86, is required in diversifying neuroblast 
lineages. In unc-86 mutants, the diversifying character 
of such divisions is lost, revealing an underlying stem- 
cell type of division (Figure 4C). The UNC-86 pro- 
tein is a POU-domain transcription factor that is 
asymmetrically activated in the daughter cell that re- 
quires its function. 


Cell Lineages in Insects 


Insects mostly display cell lineages that are variable 
at the level of individual cell divisions. However, in 
the central and peripheral nervous systems (CNS 
and PNS), precursor cells undergo stereotyped sub- 
lineages giving rise to neurons and neuronal support 
cells. Analysis of such lineages has involved a combin- 
ation of direct observation, dye labeling, and exam- 
ination of lineage-specific molecular markers. 

Genetic analysis of cell lineages in insects has 
focused on Drosophila CNS and PNS neuroblast 
lineages. In the development of a peripheral sensillum 
such as a bristle, a precursor cell generates one neuron 
and three support cells (Figure 5A). If activity of the 
Notch signaling pathway is reduced, all cells become 
neuronal, indicating that Notch signaling normally 
promotes the non-neuronal fate. Notch signaling 
appears to operate between sister cells in the lineage 
(Figure 5B). Thus, although the lineage as a whole
generates its fates autonomously, the individual fates
within it depend on local interactions between cells
of the same lineage.

Other mutations disrupting neuroblast lineages 
define several genes required for the normal asym- 
metry of cell division and cell fates. Such genes may be 
involved in determining the polarity of the asymmetry
itself or, like numb, may encode products that are
segregated in response to that polarity. Mutations in
the numb gene cause sister
cell transformations in peripheral neuroblasts, leading 
to a total absence of sensilla (Figure 5C). The numb 
protein is asymmetrically localized in the dividing 
precursor cell, is segregated to the one daughter cell 
that will make a neuron, and thus can be considered a 
localized determinant. The function of the numb pro- 
tein is to antagonize the effects of Notch signaling, 
and thus promote neuronal development. Several other 
Figure 5 Cell lineage of a Drosophila peripheral
sensory organ precursor. (A) A single precursor under- 
goes two rounds of asymmetric divisions to generate 
four cells: a sensory neuron, a hair cell, a socket cell, 
and a glial cell. (B) In a Notch mutant, all four cells are 
converted into neurons; (C) in a numb mutant, the 
opposite effect is seen, in which all cells adopt non- 
neuronal fates. 


genes have been found that regulate the asymmetric 
cell division itself. Some of these genes may be in- 
volved in setting up or responding to the apical/ 
basal asymmetry of the neuroepithelium from which 
the neuroblasts arise. 
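The sensory organ lineage in Figure 5 can be expressed as a recursive rule. At each division, the daughter that inherits numb has Notch signaling blocked, and the other daughter responds to Notch. The Python sketch below assumes this rule. The intermediate names "neuronal precursor" and "non-neuronal precursor" are illustrative assumptions that the text does not specify. So is the assignment of the glial cell to the neuronal branch and of the hair and socket cells to the other branch. Under these assumptions the model reproduces the wild-type outcome, the all-neuron Notch phenotype, and the all-non-neuronal numb phenotype.

```python
# Hypothetical simplified lineage: for each dividing cell,
# (fate of the daughter that inherits numb, fate of the daughter that does not).
DIVISIONS = {
    "precursor": ("neuronal precursor", "non-neuronal precursor"),
    "neuronal precursor": ("neuron", "glial cell"),
    "non-neuronal precursor": ("hair cell", "socket cell"),
}


def notch_active(inherits_numb: bool, genotype: str) -> bool:
    """Whether Notch signaling is active in a daughter cell; numb antagonizes Notch."""
    if genotype == "Notch":
        return False          # no Notch signaling in any cell
    if genotype == "numb":
        return True           # nothing antagonizes Notch, so both daughters respond
    return not inherits_numb  # wild type: only the daughter lacking numb responds


def terminal_fates(genotype: str = "wild-type", cell: str = "precursor") -> list[str]:
    """Follow the lineage recursively and return the fates of the terminal cells."""
    if cell not in DIVISIONS:
        return [cell]
    numb_fate, other_fate = DIVISIONS[cell]
    fates = []
    for inherits_numb in (True, False):
        daughter = other_fate if notch_active(inherits_numb, genotype) else numb_fate
        fates += terminal_fates(genotype, daughter)
    return fates


for genotype in ("wild-type", "Notch", "numb"):
    print(genotype, terminal_fates(genotype))
```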


Cell Lineage in Vertebrates 


The size and cell number of most vertebrate embryos 
make direct observation of cell division patterns 
difficult, and thus lineage relationships have been 
largely defined using clonal analysis. Cell marking 
and transplantation experiments in amphibians and 
the zebrafish Danio rerio have shown that the early 
cleavages are not determinative, and that cells do not 
become committed to specific fates until the blastula 
stage. 

Cell lineage studies in the vertebrate CNS and 
retina showed that individual cells can generate a 
wide variety of cell fates, even in very small clones. 
Thus, cell fates in these situations appear to be speci- 
fied by a cell’s environment and not by its lineal 
ancestry. Evidence suggestive of lineage-autonomous 
mechanisms of fate determination has come from the 
analysis of vertebrate homologs of proteins such as 
numb and Notch, both of which are asymmetrically 
localized in dividing neuroblasts in the mammalian 
cerebral cortex. However, the role of these proteins 
in cell fate specification in vertebrates has not yet been 
determined. 


Cell Lineage in Plant Development 


Stereotyped cell lineages have been observed in the 
development of many plants. Asymmetric cell div- 
isions occur in the development of colonial algae 
such as Volvox, in which they segregate somatic versus 
germline fates. Early cell divisions of flowering plants 
such as Arabidopsis are highly stereotyped. However, 
cell interactions appear to be more important than 
ancestry in specifying fates. Stereotyped cell lineages 
are also observed during development of Arabidopsis 
root and floral meristems, and in stomatal develop- 
ment, but again the pattern of cell fates may be deter- 
mined by interactions rather than ancestry. 


Evolution of Cell Lineages 


Once a cell lineage has been described for one species, 
one can examine equivalent lineages in related species 
to understand how cell lineages have been modified 
in evolution — in effect, comparative anatomy with 
single-cell resolution. Comparative cell lineage analysis 
has been performed in nematodes, mollusks, insects, 
and ascidians. Studies of cell lineages in nematodes 
have begun to yield insights into how morphological
change occurs in evolution (reviewed by Félix and
Sternberg, 1997). For example, in C. elegans the choice
of fates in the ac/VU (Z1.ppp/Z4.aaa) equivalence
group is stochastic, with each precursor equally cap- 
able of becoming an ac or a VU. In some nematode 
species the allocation of fates is variable but biased, 
while in other species the allocation of fates is invari- 
ant. Cell killing experiments show that in such species 
cell fates are no longer dependent on cell-cell inter- 
actions. An emerging theme is that alterations in the 
behavior of single cells can result in dramatic morpho- 
logical changes. 
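The stochastic, biased, and invariant situations described above differ in the probability that a given precursor, for example Z1.ppp, adopts the anchor-cell fate. The Python sketch below describes only this outcome distribution, not the cell interactions that produce it. The bias values, which cell is favored, and the function names are hypothetical.

```python
import random


def allocate_fates(bias: float, rng: random.Random) -> dict[str, str]:
    """Assign ac/VU fates in one animal; `bias` is the (hypothetical) chance Z1.ppp becomes the ac."""
    if rng.random() < bias:
        return {"Z1.ppp": "AC", "Z4.aaa": "VU"}
    return {"Z1.ppp": "VU", "Z4.aaa": "AC"}


def fraction_z1_anchor(bias: float, animals: int = 10_000, seed: int = 0) -> float:
    """Fraction of simulated animals in which Z1.ppp becomes the anchor cell."""
    rng = random.Random(seed)
    hits = sum(allocate_fates(bias, rng)["Z1.ppp"] == "AC" for _ in range(animals))
    return hits / animals


for label, bias in [("stochastic", 0.5), ("biased", 0.8), ("invariant", 1.0)]:
    print(label, fraction_z1_anchor(bias))
```

Every simulated animal still has exactly one anchor cell and one VU cell. Only the frequency with which each precursor takes each role changes between the three cases.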


Conclusions 


Studies of cell lineages have been critical in our under- 
standing of how cell fates are specified in development 
and how fates are correlated with cell division pat- 
terns. Invariant lineages or sublineages, although 
initially considered to imply ‘lineage-intrinsic’ mech- 
anisms of fate determination, are now thought to 
reflect both intrinsic and extrinsic mechanisms. Thus, 
animals with invariant cell lineages may not develop in 
fundamentally different ways from larger animals in 
which cell lineages are variable. In insects and verte- 
brates, cells mostly function in groups, within which 
cell communication specifies fate. In such animals 
development may be described as a lineage of cell 
groups. Selection for rapid development and small 
size might have led to the reduction of such cell groups 
to individual cells, and thus the appearance of animals 
with defined cell lineages. 
Cell Line


A cell line is a permanently established cell culture that 
will proliferate indefinitely given appropriate fresh 
medium and space. Lines differ from cell strains in 
that they become immortalized. 


Cell Culture and the Establishment of Cell 
Lines 
Cell culture and cell lines have assumed an important
role in studying the physiological, pathophysiological,
and differentiation processes of specific cells. They
allow stepwise alterations in the structure, biology,
and genetic makeup of the cell to be examined under
controlled conditions. This is especially valuable
for complex tissues composed of various cell types,
such as the pancreas, where in vivo examination of
individual cells is difficult, if not impossible. The
extreme difficulty of isolating and purifying individual
epithelial cells from complex tissues while maintaining
their native characteristics has hampered our
understanding of their physiology, biology, growth,
and differentiation.
Attempts have been made to culture almost every
tissue, including neuronal cells, bone, cartilage, and
hair cells. In general, nonhuman animal cells, particularly
fibroblasts, can be cultured more successfully than human
cells, and human fibroblasts are easier to culture than
epithelial cells. Different epithelial cells also respond
differently to culture conditions. Despite advances in
culturing techniques, human epithelial cells could not
be maintained in culture for long periods, because
human cells tend to undergo senescence after a limited
number of cell divisions. Transfection of these cells
with the E6 and E7 genes of human papillomavirus 16
(HPV-16), or with the small and large T antigens of
simian virus 40 (SV40), has partially overcome
senescence and increased cell longevity in vitro, but
has not led to immortality of the cells. The resulting


genetic manipulations limit the use of these cells for 
molecular biological studies, especially for defining 
genetic changes that occur during cell differentiation 
and transformation. The introduction of these foreign
genes alters the function of the host’s regulatory genes,
including inactivation of the tumor suppressor proteins
p53 and pRb (the retinoblastoma protein). Although these
cell lines neither grow in soft agar, which would be a
first sign of transformation, nor form tumors when
introduced into nude mice, additional transfection with
certain oncogenes such as K-ras has resulted in their
malignant transformation.

The quality of the culture medium and the cell 
preparation technique are very important for the 
maintenance of human epithelial cells in culture. By 
using a defined culture medium and cell separation 
technique, human pancreatic epithelial cells have been 
kept in culture for more than 10 months. Another,
recently discovered method to prolong the lifespan
of human cells is the introduction of the telomerase
gene into the cells. Telomerase is an enzyme that
prevents telomere loss by de novo addition of telomeric
DNA. It restores the length of telomeres, which
otherwise shorten with each cell division until the
cells senesce. So far, successful reports include
immortalized fibroblasts, retinal cells, and endothelial cells.
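The telomere argument above can be sketched as a simple counter: telomeres shorten by a fixed amount at each division, and the cell senesces once they reach a critical length. The Python sketch below uses hypothetical values in arbitrary units. It also treats telomerase as fully restoring telomere length after every division, which is a simplification. The `max_divisions` cap exists only so that the loop terminates.

```python
def divisions_before_senescence(initial_length: int, loss_per_division: int,
                                critical_length: int, telomerase: bool = False,
                                max_divisions: int = 1000) -> int:
    """Count divisions until telomeres reach a critical length (toy model, arbitrary units)."""
    length = initial_length
    divisions = 0
    while divisions < max_divisions and length > critical_length:
        length -= loss_per_division  # telomeres shorten with each division
        if telomerase:
            length = initial_length  # telomerase restores telomere length (simplified)
        divisions += 1
    return divisions


print(divisions_before_senescence(10_000, 100, 5_000))                   # prints 50
print(divisions_before_senescence(10_000, 100, 5_000, telomerase=True))  # stops only at the cap: 1000
```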

Attempts have been made to identify and culture 
stem cells of specific tissues because these cells can 
better adjust to the environmental conditions and 
can give rise to a variety of mature cells under specific 
environments. For example, it has been shown that 
cultured colon cells containing stem cells can give rise 
to either neuroendocrine cells, colon cells, or a mixture 
of them. Therefore, such cultures provide ample 
opportunity to investigate differentiation pathways 
and provide a unique tool to test the effects of natural 
and synthetic substances, including cytokines, growth 
factors, nutrients, and physical factors in the matur- 
ation or death of the cells. 

The mechanisms of malignant transformation can
be studied in vitro using cell lines treated with a
carcinogen or radiation in culture. Gradual phenotypic,
genetic (e.g., DNA adduct levels, alkylations, mutations),
and chromosomal changes can be investigated.
Specific markers associated with transformation,
such as transforming growth factor-α (TGF-α) and
the epidermal growth factor receptor (EGFR), may be
expressed. Unfortunately, it has not yet been possible
to transform human epithelial cells in culture in this
way, so animal models are still needed. Rodents are
much more susceptible to carcinogens than humans.


Advantages of Cell Culture 


Cell culture offers many research possibilities difficult 
or impossible to achieve in vivo. The effects and 
metabolism of certain drugs and toxins can be tested
under various conditions in individual cells of a com- 
plex tissue. Many parameters, including the in- 
gredients of the culture medium, culture conditions, 
population density, and growth rate can be controlled. 
Furthermore, cells can be manipulated by transfection 
to investigate the role of various genes in the physi- 
ology or malignancy of the cells. The effect of toxic and 
carcinogenic substances and the interaction of various 
drugs, viruses, and physical or chemical carcinogens, 
can be evaluated. From a mixed cell population (most 
native cell lines are known to have a heterogeneous cell
population), clones can be established and the patterns 
of individual clones can be studied. Functional studies 
can also be performed. Proteins or peptides, produced 
or secreted by the cells, can be measured in conditioned 
media under various culture conditions. Immuno-
histochemical, molecular biological, and immunoelectron
microscopical examinations are other useful methods
for characterizing the cells. Cell lines are also useful
for evaluating therapeutic measures, both in vitro and
after implantation of the cells into animals, before
the procedures are applied to humans.

Cell differentiation is another important field of 
cell culture research. Cell culture can also have a life- 
saving function. For example, the sensitivity of tumor 
cells to specific cytotoxic agents can be tested in
cultured tumor cells from the patient to select the
drug most effective in killing these tumor cells.
Short-term culture of human cells is also used in
lymphatic diseases: normal lymphocytes and stem
cells are propagated for reintroduction into patients
who have lost their blood cells after high-dose
radiation or chemotherapy.


Problem Areas 


Cell-to-cell interaction is one of the most important
cellular functions in an organism, and its disruption
in culture has both known and still unknown
consequences. It is questionable whether the few cells
from which immortalized cell lines originate are
representative of their tissue or disease of origin.
Genetic manipulations of the cells add further
problems and can ultimately alter some or many
native functions and responses of the cells. The major
problem with human cells is their tendency to
become senescent, which at present makes them
unsuitable for long-term experiments.

Transplantation of cultured cells into a suitable 
host, as often performed to test the malignancy of 
cells, can also be problematic. The growth and differ- 
entiation of tumor cells and their response to thera- 
peutic agents can be different in different species, and 
even between different strains of the same animal. 


Conclusion 


Cultured cells have provided some information on 
physiological and pathophysiological processes of 
various cell types. So far, most of the findings are 
based on cultured rodent cells. Advances in tissue
culture techniques and molecular biology promise
steady progress in this important line of
research. The collaboration of researchers from differ- 
ent medical disciplines is necessary for successful 
isolation, purification, and maintenance of normal 
human epithelial cells. 


See also: Tissue Culture 
Among the bioluminescent organisms, two groups,
bacteria and coelenterates (e.g., jellyfish), not only 
produce light, but also alter the wavelength of the 
emitted light by using an associated fluorescent pro- 
tein. In the coelenterates the fluorescent protein, 
called the green fluorescent protein or GFP, absorbs 
near-UV and blue light and emits green light. One of 
the remarkable features of GFP (and one not shared by 
the bacterial protein) is that it requires no additional 
factors: the fluorophore forms by the modification of 
the primary amino acid sequence. The demonstra- 
tion that the GFP from the jellyfish Aequorea victoria 
fluoresced when expressed in other organisms, and, 
thus, needed no other coelenterate components to 
form a functional protein, ushered in the use of this 
protein as a biological marker. Transformation with 
DNA encoding GFP is now used to label cells, 
organelles, and their constituents. 

Several properties of GFP make it particularly use- 
ful as a biological marker. First, because fluorescence 
relies on gene expression, the use of GFP is relatively 
noninvasive. Cells do not have to be fixed or perme- 
abilized to gain access to necessary components. Sec- 
ond, because access to a marked protein is not 
required (GFP can be incorporated as part of a fusion 
protein), protein interactions that may block active 
sites or antibody binding sites are not a problem. 
Third, GFP fluorescence is seen in living (and fixed) 
tissues. The fluorescence in living tissue permits 
researchers to study dynamic changes in biological 
processes. Fourth, GFP is fluorescent as a monomer 
(its usual form, except at very high concentrations),
so it should not interfere with protein interactions, as
β-galactosidase, which forms tetramers, can. Fifth,
the small size of the monomer allows GFP, when 
expressed alone, to diffuse throughout cells, outlining 
their entire structure. This feature is particularly 
important when examining nerve cells. Sixth, GFP is 
fluorescent both in the cytoplasm and in the extracel-
lular space. β-Galactosidase, in contrast, functions
only in the cytoplasm. Seventh, GFP is a hardy mol- 
ecule, being resistant to a broad range of pH (essentially 
no changes are seen in fluorescence between pH 5.5 
and 10.0), most proteases, and elevated temperature 
(once the protein has formed). Eighth, because of the 
structure of the protein, photodamage from continued 
irradiation is minimal. 


Properties of GFP and its Variants 


The 238-amino-acid polypeptide of GFP forms a
structure called a β-can. The can is formed from eleven
β strands that make a cylinder (a β-barrel) and several
loops that form the top and bottom. An α helix located
between the third and fourth β strands contains the
fluorophore and runs from one end to the other within
the can. The fluorophore is produced from the cyclization
of the peptide backbone at Ser65-Tyr66-Gly67,
and the formation of a dehydrotyrosine at position 66.
The β-can structure explains the stability of the
molecule and its resistance to proteases. Moreover, the fact
that the fluorophore is buried within the molecule 
explains why irradiation of GFP-containing cells 
causes little if any photodamage. 

Native and recombinant wild-type GFP have exci- 
tation peaks at 395 nm (near UV) and 470 nm (blue) 
and an emission peak at 509 nm (green). Excitation of 
the 395 nm peak produces about six times more fluor- 
escence than excitation at 470 nm. Oxygen is needed 
for the formation of the fluorescent protein. However, 
if GFP produced in oxygen-depleted cells is irradiated 
with blue (488 nm) light, the protein will fluoresce red 
(with peaks at 590 and 600 nm) when subsequently
irradiated with green (525 nm) light. 

Mutation of GFP has produced several variants 
with altered properties. One mutation (changing 
Ser65 to Thr) results in a single excitation peak at 
488 nm that produces about sixfold more fluores-
cence than excitation of the wild-type 470 nm peak.
Other mutations also increase the fluorescence inten-
sity, usually by allowing more soluble protein (which
is needed to form the fluorophore) to be produced.
Mutation of Tyr66 alters the emission spectra; re- 
placement with His produces a blue emission peak 
(448 nm). This replacement reduces the fluorescence 
intensity, so other changes are needed to increase 


fluorescence intensity. Mutation of Thr203 to Tyr in 
a molecule that also has the Ser65 Thr mutation results 
ina shift of both the excitation and emission spectra so 
that the emission is in the yellow-green (527 nm). 
Finally, while GFP, once formed, fluoresces at ele- 
vated temperatures (only half the fluorescence is lost 
at 76 °C), the formation of the folded protein appears 
to be temperature-dependent. Inappropriate folding 
leads to the production of a nonfluorescent, insoluble 
protein. The double mutation Val163Ala and Ser175Gly
results in GFP that is more soluble and presumably
more thermostable with regard to folding at 37 °C.

Besides altering the GFP amino acid sequence,
fluorescence intensity can be increased by boosting
expression of the protein. Changes to the
wild-type cDNA that have increased expression in
some organisms include: (1) changing the original
translation start to conform with the predicted
Kozak sequence, (2) altering codon third positions
to optimize codon usage, (3) inserting synthetic
introns, presumably to increase processing and export
of the mRNA from the nucleus, and (4) removing a
cryptic splice site to allow GFP expression in
Arabidopsis thaliana and other plants.
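
Change (2) relies on the degeneracy of the genetic code: many amino acids are encoded by several codons that differ only at the third position, so third-position edits can change codon usage without changing the protein. The minimal Python sketch below checks that property using the standard genetic code. The nine-base sequences are hypothetical stand-ins for codons encoding the Ser65-Tyr66-Gly67 fluorophore residues, not the actual GFP cDNA, and the function names are illustrative; choosing which synonymous codon a particular organism prefers would require a codon-usage table, which is not modeled here.

```python
# Standard genetic code: codons enumerated with T, C, A, G order at each position
# map, in that same order, onto this 64-letter string ("*" marks a stop codon).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    first + second + third: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, first in enumerate(BASES)
    for j, second in enumerate(BASES)
    for k, third in enumerate(BASES)
}


def translate(dna: str) -> str:
    """Translate an in-frame coding sequence into one-letter amino acid codes."""
    dna = dna.upper()
    if len(dna) % 3:
        raise ValueError("sequence length must be a multiple of 3")
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


def only_third_positions_changed(original: str, edited: str) -> bool:
    """True if the sequences have equal length and differ only at codon third positions."""
    if len(original) != len(edited):
        return False
    pairs = zip(original.upper(), edited.upper())
    return all(a == b for pos, (a, b) in enumerate(pairs) if pos % 3 != 2)


def is_silent_recoding(original: str, edited: str) -> bool:
    """True if only third positions changed AND the encoded protein is identical."""
    return (only_third_positions_changed(original, edited)
            and translate(original) == translate(edited))


# Hypothetical codons for Ser-Tyr-Gly (the residues at GFP positions 65-67).
wild_type = "TCTTATGGA"
recoded = "TCCTACGGC"   # third positions only: still Ser-Tyr-Gly
s65t = "ACTTATGGA"      # first-position change: Ser -> Thr
print(translate(wild_type), translate(recoded), translate(s65t))  # SYG SYG TYG
print(is_silent_recoding(wild_type, recoded))                     # True
print(is_silent_recoding(wild_type, "TCTTAAGGA"))                 # False (TAA is a stop)
```

The last line is a reminder that a third-position change is not automatically silent: TAT (Tyr) to TAA creates a stop codon, so any recoding has to be checked against the translated product, as the sketch does.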


Uses of GFP 


GFP and its variants have been used in organisms from 
bacteria and yeast to mice and human cells. One of the 
most common uses of GFP is in promoter and pro- 
tein fusion constructs. Promoter fusions with GFP 
can document patterns of gene expression. Given the 
dynamics of GFP production (the fluorophore takes 
some time to form) and stability (the protein appears 
to be long-lived), detailed studies of the onset and 
cessation of gene expression (with a resolution of 
minutes) are not possible. Protein fusions are useful 
in determining the subcellular localization of a protein 
of interest and whether that localization changes dur- 
ing development, with different growth conditions, 
or in different genetic backgrounds. The most useful 
fusions are those that also rescue the mutant pheno- 
type, because the rescue indicates that the fusion pro- 
tein functions appropriately. 

Sometimes these fusion constructs are used to ana- 
lyze a protein or promoter of interest. At other times 
these fusions mark cells or cellular compartments so 
that biological phenomena can be examined or mani- 
pulated. Nuclei, endoplasmic reticulum, Golgi, mito- 
chondria, peroxisomes, and synaptic endings have all 
been labeled using GFP. Once organisms have been 
labeled, they can be subjected to various conditions or 
they can be mutated to obtain mutants with altered or 
absent expression. For example, we have used
GFP-labeled neurons in the nematode Caenorhabditis
elegans as the basis of a screen for mutations that alter
cell fate, cell migration, or neuronal outgrowth.

GFP can also indicate the presence of viruses and 
microorganisms. In molecular biology research, the 
labeling of viral proteins makes GFP a useful transfec- 
tion marker. Since GFP labels living cells, the labeling 
of microorganisms may be particularly important in 
studying interactions between and within popula- 
tions, e.g., symbiosis and host-parasite interactions. 
GFP can also be used to monitor infectious processes 
in plants and animals. 

Recently several groups have produced GFP fusion 
proteins that couple the fluorescence of GFP to par- 
ticular biological conditions. Such hybrid molecules 
respond with altered fluorescence to differences in 
membrane potential, calcium concentration, and pH. 
These molecules and others like them promise to 
greatly expand the usefulness of GFP into the realm 
of biological sensors. 

Inappropriate cell death underlies the pathology of
many human and animal diseases. In particular, pre- 
mature neuronal cell death plays a significant role in 
several late onset degenerative disorders such as 
Alzheimer disease and amyotrophic lateral sclerosis. 
Somewhat unexpectedly, genetic programs that exe- 
cute and regulate cell death exist. The genetic instruc- 
tions for the regulation and execution of one type of cell 
death, called apoptosis or programmed cell death, have 
been remarkably conserved. Studies in lower organ- 
isms, such as in the nematode Caenorhabditis elegans, 
have provided significant insight into the mechanism 
underlying this cell death process. The mechanisms of 
pathological or necrotic cell death are less clear, but a 
detailed molecular model of inherited neurodegenera- 
tive conditions, identified in the nematode, is emerging 
and may provide a means of identifying a conserved 
pathway for pathological cell death in humans. 


Cell/Neuron Degeneration 313 


Overview of Cell Death 


Two major types of cell death were initially distin- 
guished on the basis of morphological changes 
observed in the dying cell. One, termed apoptosis, is 
characterized by shrinkage and fragmentation of
cytoplasm, compaction of chromatin, and eventual
destruction of cellular organelles. Frequently DNA is
degraded by internucleosomal cleavage, which after
electrophoretic separation generates a characteristic
DNA ladder of fragments that differ in size by one
nucleosome repeat length. Apoptotic cellular remains
are usually removed by phagocytosis and do not
elicit an inflammatory response. In several cases it
has been demonstrated that apoptotic cell death is an
active process, requiring RNA and protein synthesis,
although this is not a universal feature of this type of
cell death. Death by apoptosis often occurs as part of
normal development or homeostasis. Apoptotic death
generally accounts for the normal elimination of cells
during development and for cell depletion due to a broad
range of stimuli, including changes in growth factor or
hormone levels, mild ischemia, cell-mediated immune
attack, ionizing radiation, mild hyperthermia, and
several chemotherapeutic agents. Genetic and biochemical
studies have identified proteins that regulate and
execute apoptotic death. The activities of these pro- 
teins have proved to be remarkably conserved be- 
tween invertebrates and vertebrates. 

A second type of cell death, termed necrosis or 
pathological cell death, contrasts with apoptosis in 
several respects. First, necrotic cell death does not 
appear to be part of normal development or homeo- 
stasis. Rather, this type of death generally occurs as a 
consequence of cellular injury or in response to 
extreme changes in physiological conditions. Second, 
the morphological changes observed during necrosis 
differ greatly from those observed in apoptotic cell 
death. Necrotic cell death is characterized by gross 
cellular swelling and distention of subcellular organ- 
elles such as mitochondria and endoplasmic reticulum. 
Clumping of chromatin is observed and DNA degrad- 
ation occurs by cleavage at random sites. In general, 
necrosis occurs in response to severe changes of physi- 
ological conditions including hypoxia and ischemia, 
and exposure to toxins, reactive oxygen metabolites, 
or extreme temperature. 

Necrotic cell death is a significant problem in 
human health. For example, the excitotoxic neuronal 
cell death that accompanies oxygen deprivation asso- 
ciated with stroke is a major contributor to death and 
disability. Ischemic diseases of the heart, kidney, and 
brain have been cited as the primary causes of mor- 
tality and morbidity in the US and industrialized 
nations. Necrosis is believed to occur independently 
of de novo protein synthesis and is generally thought
to reflect the chaotic breakdown of the cell. However, 
given that many cells of diverse origins exhibit stereo- 
typed responses to cellular injury, it is conceivable that 
a conserved ‘execution’ program, activated in response 
to injury, may exist. 

Some investigators have argued that more than these
two patterns of cell death can be distinguished.
Intensive research into death mechanisms suggests
that the initial distinction between apoptosis and
necrosis is an oversimplification. For example,
alternative morphological death profiles have been
described, and certain dying cells exhibit some, but
not all, of the features commonly used to distinguish
apoptosis from necrosis. Likewise, certain markers
of death can be expressed by both apoptotic and 
necrotic cells. Although remarkable progress in under- 
standing apoptosis has been accomplished, understand- 
ing necrosis and alternative death mechanisms is more 
limited. 


Caenorhabditis elegans as a Model System 
for the Study of Cell Death 


Caenorhabditis elegans is a small (1.3 mm), free-living 
soil nematode that feeds on E. coli in the laboratory. A 
key strength of the C. elegans model system resides in 
the extensive genetic analyses that can be conducted 
with this animal. The ability of C. elegans to
reproduce by self-fertilization makes mutants easy to
produce and recover: homozygous mutants segregate
as F2 progeny of mutagenized parents without any
genetic crossing. Mutant alleles are readily
transferred by male matings, so complementation
analysis and construction of double mutant
strains are straightforward. Positions of thousands
of genes on the six C. elegans chromosomes have 
been determined. This genetic map has been aligned 
with the physical map of the genome (a collection of 
overlapping DNA clones that span the six chromo- 
somes). Sequence analysis of the C. elegans genome 
has been completed. Transgenic nematodes are con- 
structed by injecting DNA into the hermaphrodite 
gonad where it is packaged into developing oocytes. 
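
Why self-fertilization makes recessive mutants easy to recover can be sketched with single-locus Mendelian segregation. In this minimal Python example, "+" denotes the wild-type allele and "m" a hypothetical recessive mutation induced in the parent; the model assumes one locus, equal transmission of both alleles to sperm and oocytes, and no selection. A mutagenized parent passes the new allele to heterozygous (+/m) F1 progeny, and each such F1 hermaphrodite, by selfing alone, produces homozygous mutants among its F2 progeny.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def self_fertilize(genotype: str) -> dict:
    """Offspring genotype frequencies when a diploid hermaphrodite self-fertilizes.

    `genotype` is a two-character string of alleles at one locus, e.g. "+m".
    Each allele is assumed to reach half of the sperm and half of the oocytes.
    """
    alleles = list(genotype)
    # Every sperm allele combines with every oocyte allele with equal probability;
    # sorting makes "+m" and "m+" the same genotype.
    zygotes = Counter("".join(sorted(pair)) for pair in product(alleles, repeat=2))
    total = sum(zygotes.values())
    return {g: Fraction(n, total) for g, n in zygotes.items()}


# F2 from one selfed +/m F1 hermaphrodite: 1/4 +/+, 1/2 +/m, 1/4 m/m.
f2 = self_fertilize("+m")
print(f2)
# A homozygous mutant hermaphrodite breeds true when it selfs.
print(self_fertilize("mm"))
```

The second call shows why no crossing is needed afterwards either: a selfing m/m hermaphrodite yields only m/m progeny, so the mutant strain propagates on its own.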

C. elegans is well suited for the study of both
normal and aberrant cell death at the cellular, genetic 
and molecular levels. There is no model system in 
which development is better understood. The animal 
is essentially transparent throughout its life cycle and 
individual nuclei can be readily visualized using dif- 
ferential interference contrast optics. These attributes 
have enabled the complete sequence of somatic cell 
divisions, from the fertilized egg to the 959-celled 
adult hermaphrodite, to be determined. Elucidation 
of the lineage map has revealed that in certain lineages,
particular divisions generate cells that die at specific
times and locations, and that the identities of these
ill-fated cells are invariant from one animal to another.
The ability to easily recognize dying cells within a living
animal has allowed identification of mutants with
aberrant patterns of both apoptotic and necrotic cell
death.


A Conserved Apoptotic Death 
Mechanism in Caenorhabditis elegans 


C. elegans development includes the programmed 
death of 131 identified cells. Genetic studies have 
identified several genes that participate in all C. elegans 
programmed cell deaths. ced-3 (cell death abnormal) 
encodes a cysteine protease (caspase) that is essential 
for death execution. The ced-4 product activates CED-3
activity and is also required for all programmed cell
deaths. In cells fated to live, the death program is held in
check by the negative regulator CED-9, which can be
antagonized by EGL-1. Both activation and negative 
regulation may be controlled by physical association/ 
multimerization of these proteins in the vicinity of the 
mitochondrial membrane. After death, cell corpses are 
removed by the products of two groups of genes that act 
in two parallel pathways (one includes ced-1, ced-6, and 
ced-7; another includes ced-2, ced-5, ced-10, and ced- 
12). These ‘undertaker’ genes are required for phago- 
cytosis and degradation of dead cells. 
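
The regulatory logic just described can be summarized as a chain of interactions: EGL-1 inhibits CED-9, CED-9 inhibits CED-4, and CED-4 activates the CED-3 caspase, which executes death. The Python sketch below is a deliberately simplified Boolean caricature of that chain, not a biochemical model: it ignores the physical association of these proteins near the mitochondrial membrane, partial-function alleles, and the corpse-removal genes, and it assumes, for illustration only, that EGL-1 is present only in cells fated to die. The class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DeathGenes:
    """Which core genes are functional; False stands for a loss-of-function (lf) allele."""
    egl1: bool = True
    ced9: bool = True
    ced4: bool = True
    ced3: bool = True


def dies(genes: DeathGenes, fated_to_die: bool) -> bool:
    """Boolean caricature of EGL-1 -| CED-9 -| CED-4 -> CED-3 -> programmed death.

    Illustrative assumption: EGL-1 is present only in cells fated to die.
    """
    egl1_active = genes.egl1 and fated_to_die
    ced9_restrains = genes.ced9 and not egl1_active   # CED-9 holds the program in check
    ced4_active = genes.ced4 and not ced9_restrains   # unrestrained CED-4 activates CED-3
    return ced4_active and genes.ced3                 # CED-3 caspase executes death


wild_type = DeathGenes()
print(dies(wild_type, fated_to_die=True))                            # True
print(dies(wild_type, fated_to_die=False))                           # False
print(dies(DeathGenes(ced3=False), fated_to_die=True))               # False
print(dies(DeathGenes(ced9=False), fated_to_die=False))              # True (ectopic death)
print(dies(DeathGenes(ced9=False, ced3=False), fated_to_die=False))  # False
```

In this model, loss of ced-3 or ced-4 blocks death in a cell fated to die, while loss of ced-9 lets a cell fated to live die unless ced-3 is also defective; that ordering (ced-3 acting downstream of ced-9) is the epistasis such a pathway predicts.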

Analysis of gene function in C. elegans pro- 
grammed cell death has had an important influence 
in advancing understanding of mammalian apoptotic
death mechanisms because regulators, executors, and
undertakers of programmed cell death are functionally
conserved from nematodes to humans. CED-3 is
related to the mammalian caspases that execute
apoptotic cell death, CED-4 is related to Apaf-1, CED-9 is
a member of the mammalian BCL-2 family, and EGL-1
is a member of the death-regulatory BH3-only
family. Because apoptotic cell death is discussed
elsewhere in this volume, we focus on nonapoptotic cell
death in this section.


Degenerins and Neurodegeneration in 
Caenorhabditis elegans 


Unusual gain-of-function mutations in several spe- 
cific C. elegans ion channel genes induce necrotic-like 
death of the neurons that express these channel genes. 
For example, dominant mutations in the mec-4 gene 
(mechanosensory; mec-4(d)) induce degeneration of 
six touch receptor neurons required for the sensation 
of gentle touch to the body. (In contrast, most mec-4
mutations are recessive loss-of-function mutations
that disrupt body touch sensitivity without affecting
touch receptor ultrastructure or viability.) Similarly,
dominant mutations in deg-1 (degenerin; deg-1(d)) 
induce death of a group of neurons that includes the 
PVC interneurons of the posterior touch sensory 
circuit. (Loss-of-function mutations in deg-1 appear 
wild-type in behavior.) 


mec-4 and deg-1 Encode Ion Channel
Subunits of the DEG/ENaC Superfamily 
mec-4 and deg-1 encode proteins that are 51% iden- 
tical. These genes were the first identified members of 
the C. elegans ‘degenerin’ family, so named because 
several members can mutate to forms that induce cell
degeneration. Included in this family are mec-10, which
can be engineered to encode toxic degeneration-inducing
substitutions; unc-8, which can mutate to a
semidominant form that induces swelling and
dysfunction of ventral nerve cord neurons; and unc-105, which
appears to be expressed in muscle and can mutate to 
a semidominant form that induces muscle hypercon- 
traction. Thus, a general feature of the degenerin gene 
family is that specific gain-of-function mutations have 
deleterious consequences for the cells in which they 
are expressed. 

C. elegans degenerins share sequence similarity 
with subunits of the vertebrate amiloride-sensitive 
epithelial Na⁺ channel. α-, β-, and γ-ENaC (for
epithelial Na⁺ channel) are homologous subunits of the
multimeric Na⁺ channel that mediates Na⁺ absorption
in epithelia of the distal part of the kidney tubule,
the urinary bladder, the distal colon and the lung. The 
degenerin family of C. elegans currently includes 23 
members that have been characterized or predicted
by the C. elegans Genome Sequencing Consortium. 
Given such a large C. elegans gene family, it is pre- 
dicted that the mammalian ENaC family should like- 
wise be large. Because many C. elegans degenerins can 
mutate to toxic forms that induce neurodegeneration, 
the neuronally-expressed mammalian family members 
are logical candidates for genes that can mutate to 
cause neurodegeneration in higher organisms. In this 
regard it is interesting that mammalian MDEG, 
engineered to encode an amino acid substitution analo- 
gous to the change in mec-4(d) (see below), induces 
degeneration when expressed in Xenopus oocytes and 
embryonic hamster kidney cells. 


Morphology, Timing, and Ultrastructure of 
mec-4(d)- and deg-1(d)-Induced
Neurodegeneration 

Although mec-4(d) and deg-1(d) mutations kill differ- 
ent groups of neurons, the morphological features of 
cell deaths they induce are the same. The time course 
of degeneration depends upon the dosage of the toxic 
allele, but on average can take approximately 8 hours. 
When viewed using the light microscope, the nucleus 
and cell body of the affected cell first appear distorted 
and then the cell swells to several times its normal cell 
diameter (Figure 1). Eventually the swollen cell
disappears, often after shrinking but sometimes as a
consequence of cell lysis. Interestingly, the swollen 
character of mec-4(d)- and deg-1(d)-induced deaths 
resembles the morphologies of mammalian cells 
undergoing necrotic cell death. 


Figure 1 Apoptotic and degenerative cell death in Caenorhabditis elegans. White arrows indicate normal cells, while
black ones point to dying ones. A cell undergoing apoptotic (programmed) cell death is shown in (A). In (B), a
degenerating cell has swollen to several times its diameter and adopts a vacuole-like appearance that differs from
the compacted, button-like structure of the apoptotic cell.

At the ultrastructural level, cells dying as a
consequence of mec-4(d) and deg-1(d) expression exhibit
some remarkable features. The first detectable 
abnormality apparent in an ill-fated cell is the forma- 
tion of small tightly wrapped membrane whorls that 
seem to originate at the plasma membrane. These 
whorls are internalized and appear to coalesce into 
large electron-dense membranous structures. Large 
internal vacuoles form and distortion of the nucleus 
by these vacuoles is associated with chromatin clump- 
ing. Finally, organelles and cytoplasmic contents are 
degraded, usually leaving a membrane-enclosed shell. 
The striking membranous inclusions suggest that 
intracellular trafficking may contribute to degener- 
ation. Interestingly, in some mammalian degenerative 
conditions such as neuronal ceroid lipofuscinosis 
(Batten disease; the mnd mouse) and that occurring 
in the wobbler mouse, cells develop vacuoles and 
whorls (fingerprint bodies) that look similar to inter- 
nalized structures in dying C. elegans neurons. This 
suggests that some degenerative processes may be 
similar in nematodes and mammals. 

The touch receptor neurons in mec-4(d) mutants 
express terminally differentiated properties before 
they die and the PVC neurons in deg-1(d) mutants 
differentiate and function before they degenerate. 
mec-4(d)- and deg-1(d)-induced cell deaths have 
therefore sometimes been referred to as the nematode 
version of ‘late onset’ neurodegeneration. Careful
studies of the timing of mec-4 expression relative to the
onset of degeneration indicate that the onset of
neurodegeneration is correlated with the initial expression
of the toxic gene product.


Death-Inducing Channel Mutations and 
Models for Initiation of Neurodegeneration 
mec-4(d) and deg-1(d) alleles encode substitutions for 
a conserved alanine that is positioned extracellularly, 
adjacent to the pore-lining membrane-spanning domain.
The size of the amino acid sidechain at this position is 
correlated with toxicity — substitution of a small side- 
chain amino acid does not induce degeneration 
whereas replacement of the Ala with a large sidechain 
amino acid is toxic. This ‘rule’ suggests that steric 
hindrance plays a role in the degeneration mechanism 
and supports the following working model for 
mec-4(d)-induced degeneration. MEC-4 is postulated 
to be a subunit of a channel that, like other channels, 
can assume alternative open and closed conforma- 
tions. In adopting the closed conformation, the side- 
chain of the amino acid at MEC-4 position 713 is 
proposed to come into close proximity to another 
part of the channel. Steric interference conferred by 
a bulky amino acid side chain prevents such an 
approach, causing the channel to close less effectively.
Increased cation influx results, initiating
neurodegeneration. That ion influx is critical for degeneration is
supported by the fact that amino acid substitutions 
that disrupt the channel conducting pore can prevent 
neurodegeneration when present in cis to the A713 
substitution. In addition, large sidechain substitutions 
at the analogous position in some neuronally ex- 
pressed mammalian superfamily members do mark- 
edly increase channel conductance. 


mec-4(d)- and deg-1(d)-Induced
Neurodegeneration Occur Autonomously 
and Independently of Programmed Cell 
Death Executors 

Genetic mosaic analyses first indicated that mec-4(d) 
kills as a consequence of a toxic activity within the 
cells that die. Ectopic expression of mec-4(d) can 
induce swelling and death of cells other than the 
touch receptor neurons, confirming the cell autonomy 
of mec-4(d) action. The execution of degenerative cell 
death occurs by a mechanism that appears distinct 
from that utilized in programmed cell death. At the 
genetic level, it has been demonstrated that ced-3(lf)
and ced-4(lf) mutations do not block mec-4(d)-
and deg-1(d)-induced cell degeneration. Likewise,
mec-4(d) and deg-1(d) alleles do not affect pro- 
grammed cell deaths. 


Other Cellular Insults Can also Induce 
Necrotic-Like Cell Death, Suggesting a 
Common Response to Cell Injury 


In the case of degenerin-induced cell death, degener- 
ation is the consequence of a highly specific stimulus. 
One could argue that the death process is unique to 
this particular ion channel family. Evidence suggests, 
however, that necrotic-like cell death may actually 
be a general response to different ‘injuries.’ At least 
three additional genes cause C. elegans cell death that 
is morphologically similar to that induced by de- 
generins. 


Mutations in Other Ion Channels

Additional genes that increase channel activity cause 
vacuolar degeneration of C. elegans neurons. deg-3
encodes a protein related to the vertebrate α7 nicotinic
acetylcholine receptor that, together with DES-2,
forms a channel highly permeable to Ca²⁺. The dominant
allele deg-3(u662) induces swelling and degeneration
of several C. elegans neurons. Interestingly, deg-3(u662)
encodes a mutation similar to that of a characterized
allele in the chick that decreases desensitization
(thus increasing ion influx). Channel assays
suggest that the C. elegans mutation causes a similar
disruption. Consistent with this hypothesis, some
nicotinic antagonists partially suppress deg-3(d)-induced
defects.


Activated Gαs

Expression of constitutively active, GTPase-defective, 
heterotrimeric G protein Gαs (either from C. elegans
or from rat) causes swelling and degeneration of many 
(but not all) cells in which the mutant gene is ex- 
pressed. 


Human Alzheimer’s Disease Amyloid 
Peptide Aβ1-42

Alzheimer’s disease can be caused by mutations that
increase deposition of β-amyloid peptide 1-42, derived
from the amyloid precursor protein (APP). Expression of the
toxic human fragment using the C. elegans bodywall 
muscle promoter unc-54 causes animals to become pro- 
gressively paralyzed as they develop and can induce 
necrotic-like death of some cells around the nerve 
ring. 


A Common Ion Channel Theme in Necrotic-
Like Cell Death in Caenorhabditis elegans?
Although these genes normally are involved in dis- 
tinct processes, it remains possible that they share 
a common death-activating mechanism: alteration of 
channel activity. Consistent with this possibility, G 
proteins are known to modulate channel activity. 
Likewise, some studies have linked β-amyloid toxicity
with altered channel function. 


Many Genes Can Mutate to Cause Necrosis 
New mutations that induce necrotic-like cell death 
can be isolated fairly readily in genetic screens for such 
mutations (the identities of genes affected are not yet 
known), consistent with the possibility that diverse 
insults can provoke a similar degenerative process. 

Along these lines, it is interesting that necrotic-like 
figures (of unknown origin) are commonly noted in 
aged animals. Could various cell injuries, environmen- 
tally or genetically introduced, converge to activate a 
degenerative death process that involves common bio- 
chemical steps? 


Genetic Requirements for Degeneration 


One of the key advantages of using C. elegans as a 
model organism for deciphering death mechanisms 
is that genetic approaches can be applied to the pro- 
blem. By isolating mutations that suppress degener- 
ation, molecular requirements for the degeneration 
process can be identified. Since several aspects of 
necrotic cell death appear conserved, this strategy 
may reveal new targets for therapeutic intervention 
in humans. 

Although generally acting death suppressors have
been isolated, data on these have yet to be published in
the scientific literature. At present, the best-understood
death suppressors act on specific death-initiating
stimuli. For example, mec-6 mutations can suppress
degeneration induced by various hyperactivated
degenerin channel mutations. mec-6 is thought to be
specifically required for degenerin channel function; it
is not needed for Gαs-induced cell death.

One gene required for Gαs-induced cell death is
acy-1/sgs-1, which encodes an adenylyl cyclase
expressed broadly throughout the nervous system.
Although acy-1/sgs-1 is expressed in the touch
receptor neurons, it is not required for mec-4(d)-induced
touch cell degeneration. What does this say about
the necrotic death process? There are two
possibilities. First, distinct death mechanisms may be
involved in the necrosis induced by different initiating
factors. Second, the initiating events may feed
into a common pathway at a point downstream of acy-1.
Characterization of broadly acting necrosis suppres- 
sors will indicate which of these alternatives applies. 


Parallels between Neurodegenerative 
Cell Death in Caenorhabditis elegans and 
Higher Organisms: A Common 
Degenerative Death Mechanism? 


Inappropriate channel activity is known to be causative 
for some mammalian neurodegenerative conditions. 
For example, it is interesting that the working model
for the initiation of degenerative cell death in C. elegans
is remarkably similar to events that initiate excitotoxic
cell death in higher organisms. In excitotoxicity,
glutamate receptor ion channels are hyperstimulated by the
excitatory transmitter glutamate, and the resultant
elevated Na⁺ and Ca²⁺ transport induces death
accompanied by neuronal swelling. Mammalian ion channel
mutations can also induce neurodegeneration. In the 
weaver mutant mouse, altered gating and ion selectiv- 
ity properties of the GIRK2 potassium channel are 
associated with vacuolar cell death in the cerebellum, 
dentate gyrus and olfactory bulb. 

It is noteworthy, however, that mutations in chan- 
nel genes are not the sole means by which vacuolar 
neurodegeneration can be induced in C. elegans. As 
noted above, necrotic-like death of some C. elegans 
cells can be induced by expression of human β-amyloid
peptide. Mutations in the transcription factor gene lin-26 cause
hypodermal cells to become neuroblasts which swell 
and die. Also, since mutations that cause swelling and 
death can be isolated at a relatively high frequency, 
multiple gene classes appear capable of mutation 
to induce necrotic-like death. These observations 
and morphological parallels between nematodes and 
higher organisms suggest that cell death might be
induced by a variety of cellular ‘injuries’ and that a 
common death mechanism (rather than chaotic cellu- 
lar destruction) could operate to eliminate injured 
cells. The peculiar internalized membranous whorls 
observed suggest degenerin-induced death could 
involve disrupted intracellular trafficking, an interest- 
ing implication given that disrupted trafficking has 
been implicated in Alzheimer disease, Huntington 
disease, and ALS. Perhaps endocytotic responses
provoked by diverse types of damage might be a common
element of diverse degenerative conditions.


Future Prospects 


The identification of C. elegans mutations that cause 
necrotic-like cell death enables us to exploit the 
strengths of this model system to gain novel insight 
into a nonapoptotic death mechanism. The intriguing 
observation that distinct cellular insults can induce a 
similar necrotic-like response suggests that C. elegans 
cells may respond to various injuries by a common 
process, which can lead to cell death. 

The initiation of degenerative cell death in C. ele- 
gans and its general neuropathology are reminiscent of 
elements of excitotoxic cell death and other necrotic- 
like cell death in higher organisms. Excitotoxic neur- 
onal death mediated via glutamate receptors (channel 
proteins) in cell culture or in vivo in response to
ischemia is an example of this type of cell death. It is 
also interesting that there are many reported instances, 
in animals as diverse as flies, mice, and humans, in 
which neurons degenerating due to genetic lesions 
exhibit morphological changes similar to those 
induced by mec-4(d) and other hyperactivated
degenerins. Given that apoptotic death mechanisms are
conserved between nematodes and humans, it can be
hypothesized that various cell injuries, environmentally
or genetically introduced, converge to activate a
degenerative death process that involves common
biochemical steps. At present, whether such common
mechanisms exist remains an intriguing but open question.

If specific genes enact different steps of the degen- 
erative process, then such genes should be identifiable 
by mutation in C. elegans. Indeed, suppressor muta- 
tions in several genes that block mec-4(d)-induced 
degeneration have been isolated. Although some sup- 
pressor mutations affect channel function (for ex- 
ample mutations in mec-6), others are expected to be 
more generally involved in the death process. Analysis 
of such genes should result in the description of a 
genetic pathway for degenerative cell death. Perhaps, 
as has proven to be the case for the analysis of 
C. elegans programmed cell death mechanisms, 
elaboration of an injury-induced death pathway in
C. elegans may provide insight into neurodegenerative
death mechanisms in higher organisms. 




See also: Apoptosis; Caenorhabditis elegans; 
Neurogenetics in Caenorhabditis elegans 


Cenancestor 


W Fitch 


Copyright © 2001 Academic Press 
doi: 10.1006/rwgn.2001.0176 


Cenancestor is the term for the most recent common ancestor of the biological entities (organisms, structures, proteins, genes, etc.) being considered. The word combines cen- (from the Greek kainos, meaning recent, as in Cenozoic, and koinos, meaning common, as in cenobite) with ancestor. See figure under Homology.
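The definition can be made concrete on a rooted tree: the cenancestor of a set of entities is the first node that lies on every one of their paths to the root. The short Python sketch below assumes a hypothetical tree stored as child-to-parent links; the node names and helper functions are illustrative only.

```python
# Hypothetical rooted tree, stored as child -> parent links.
PARENT = {
    "human": "hominid",
    "chimp": "hominid",
    "hominid": "primate",
    "macaque": "primate",
    "primate": "mammal",
    "mouse": "mammal",
}


def lineage(node, parent=PARENT):
    """Return the path from node up to the root, node first."""
    path = [node]
    while node in parent:
        node = parent[node]
        path.append(node)
    return path


def cenancestor(nodes, parent=PARENT):
    """Most recent common ancestor of all the given nodes (None if none)."""
    others = [set(lineage(n, parent)) for n in nodes[1:]]
    # Walk up from the first node; the first shared ancestor is the most recent.
    for candidate in lineage(nodes[0], parent):
        if all(candidate in ancestors for ancestors in others):
            return candidate
    return None


print(cenancestor(["human", "chimp"]))             # hominid
print(cenancestor(["chimp", "macaque", "mouse"]))  # mammal
```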


See also: Cladograms; Homology 
The linkage distance between two markers on the same chromosome is defined in terms of morgans, the mean number (M) of nonsister exchanges in the interval between the markers per haploid meiotic product. The more commonly used unit is the centimorgan (1 M = 100 cM). The units are named after T.H. Morgan, the American founder of Drosophila genetics. Values of linkage distance are derived from observed recombination frequencies (R) in standardized crosses.

Because recombination occurs in meiosis after chromosome replication, one nonsister exchange per bivalent in a given interval equals 0.5 nonsister exchanges per haploid product of meiosis. Values of linkage distance for adjacent intervals must be additive: for markers linked in the order ABC, the map distance $M_{AB}$ for interval AB plus the map distance $M_{BC}$ for interval BC must equal the map distance $M_{AC}$ for interval AC (i.e., $M_{AB} + M_{BC} = M_{AC}$). Whenever $R_{AB} + R_{BC} \approx R_{AC}$, multiple exchanges are rare, and $R \approx M$. Additivity of R values will obtain when R values are small and/or interference reduces multiple exchange frequencies. Since meiotic recombination frequencies of 1% are usually additive, it is conventional to set a meiotic recombination frequency of 1% equal to 1 cM.
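The additivity test and the 1% = 1 cM convention amount to simple arithmetic. The Python sketch below checks whether R_AB + R_BC ≈ R_AC within a tolerance; the tolerance value, the function names, and the example recombination frequencies are assumptions chosen for illustration, not data from the text.

```python
def recombination_to_cm(r):
    """Map distance in cM for a *small* recombination frequency r (0-1),
    using the convention that 1% recombination = 1 cM (valid only while
    multiple exchanges are rare)."""
    return 100.0 * r


def roughly_additive(r_ab, r_bc, r_ac, tolerance=0.005):
    """True when R_AB + R_BC is approximately R_AC for markers in order ABC.
    The default tolerance is an arbitrary illustrative choice."""
    return abs((r_ab + r_bc) - r_ac) <= tolerance


# Short intervals: R values add up, so R can stand in for M.
print(roughly_additive(0.03, 0.04, 0.07))   # True
# Long intervals: double exchanges hide recombinants, so R_AC falls short.
print(roughly_additive(0.30, 0.30, 0.45))   # False
print(recombination_to_cm(0.01))            # 1.0
```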

For intervals with larger R values, R < M unless interference is positive and strong. M can be estimated for intervals with such large R values in several ways. (1) When markers are available to break the interval into smaller segments for which the R values are additive, M for the inclusive interval is the sum of the R values of the smaller segments; (2) in the absence of markers that subdivide the interval, M can be estimated from R with the aid of a mapping function designed to transform (observed) R values to (additive) M values; (3) when meiotic tetrad data are available, the frequency of double exchanges in an interval can be estimated from the frequency of nonparental ditype tetrads, allowing an estimate of map distance that enumerates primarily single and double exchanges: $M \approx (T + 6\,\mathrm{NPD})/2$, where T and NPD are the frequencies of tetratype and nonparental ditype tetrads, respectively.
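Methods (2) and (3) can be sketched in Python. The entry does not name a particular mapping function; Haldane's function, M = -½ ln(1 - 2R), which assumes no interference, is used here only as one common example. The tetrad estimate implements M ≈ (T + 6 NPD)/2 directly, with T and NPD as fractions of all tetrads. For comparison, the sketch also computes the observed recombinant fraction T/2 + NPD (a standard relation added here, not stated in the entry); the tetrad frequencies are hypothetical.

```python
import math


def haldane_morgans(r):
    """Haldane's mapping function (assumes no interference).

    Converts an observed recombination frequency R (0 <= R < 0.5) into an
    additive map distance M in morgans: M = -0.5 * ln(1 - 2R)."""
    if not 0.0 <= r < 0.5:
        raise ValueError("R must lie in [0, 0.5)")
    return -0.5 * math.log(1.0 - 2.0 * r)


def tetrad_morgans(t, npd):
    """Map distance from tetrad data: M ~ (T + 6 NPD) / 2, where t and npd
    are the tetratype and nonparental ditype frequencies (fractions)."""
    return (t + 6.0 * npd) / 2.0


def tetrad_recombination(t, npd):
    """Observed recombinant fraction among meiotic products: half of the
    products of each tetratype and all of each nonparental ditype."""
    return t / 2.0 + npd


# Hypothetical tetrad frequencies for one interval.
T, NPD = 0.40, 0.02
print(tetrad_recombination(T, NPD))  # R is about 0.22
print(tetrad_morgans(T, NPD))        # M is about 0.26 (26 cM), so R < M
print(haldane_morgans(0.22))         # Haldane's estimate from R alone, about 0.29
```

The gap between R and M in this example is the signature of undetected double exchanges that the text describes for larger intervals.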


See also: Gene Mapping; Interference, Genetic; 
Map Expansion; Mapping Function; Tetrad 
Analysis 
Centric fusion refers to the situation in which two telocentric chromosomes (i.e., chromosomes with terminal or near-terminal centromeres, which are rod-shaped at metaphase or anaphase) appear to have become fused at or close to their centromeres to form one metacentric chromosome, which is V-shaped at metaphase or anaphase. Examples are occasionally found as aberrations within populations, but are more often inferred from comparisons of chromosome sets between related species. Thus, in the insect order Orthoptera (grasshoppers and locusts) some species, including the common locust, have (apart from the X chromosome) 11 chromosomes, all rod-shaped, in the haploid complement, while others have nine rod-shaped and one V-shaped chromosome, and yet others have seven ‘rods’ and two ‘Vs’. In the fruit fly genus Drosophila the ancestral haploid karyotype is considered to have consisted of five long rod-shaped chromosomes, including the X chromosome, and one very short ‘dot-like’ chromosome. This situation persists in D. virilis, but in some other species there have apparently been either one or two centric fusions, converting two or four of the rod-shaped chromosomes into one or two Vs. The most studied species, D. melanogaster, is an example of the latter pattern. There are also a few apparent examples of centric fusion among mammals. Goats (2n = 60) have rod chromosomes exclusively, whereas sheep (2n = 54) have six pairs of rods replaced by three pairs of Vs. Among flowering plants, one of the best of the rather few examples of centric fusion concerns species of Fritillaria, where the haploid chromosome number has been shown to vary between 9 (4 rods plus 5 Vs) and 13 (12 rods and 1 V) with the number of chromosome arms remaining constant.
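The argument from constant arm number is simple bookkeeping: a rod (telocentric) contributes one arm and a V (metacentric) two, so each centric fusion lowers the chromosome count by one while leaving the arm count unchanged. The Python sketch below applies this to the Orthoptera and goat/sheep karyotypes given in the text, written as haploid (rods, Vs) pairs; the function names are illustrative, and the fusion sequence shows only that the arithmetic is consistent, not the actual evolutionary history.

```python
def arm_count(rods, vs):
    """Chromosome arms in a haploid set: one per rod, two per V."""
    return rods + 2 * vs


def fuse(rods, vs):
    """Karyotype after one centric fusion: two rods become one V."""
    if rods < 2:
        raise ValueError("need at least two rods to fuse")
    return rods - 2, vs + 1


# Haploid (rods, Vs) karyotypes quoted in the text.
orthoptera = [(11, 0), (9, 1), (7, 2)]   # X chromosome excluded
goat = (30, 0)                           # 2n = 60, all rods
sheep = (24, 3)                          # 2n = 54: six rod pairs -> three V pairs

for rods, vs in orthoptera:
    print(rods + vs, "chromosomes,", arm_count(rods, vs), "arms")  # 11 arms each

# Arithmetically, three successive fusions turn the goat set into the sheep set.
karyotype = goat
for _ in range(3):
    karyotype = fuse(*karyotype)
print(karyotype == sheep, arm_count(*goat) == arm_count(*sheep))  # True True
```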

One may ask how, in such examples as those above, 
one can know whether the difference between species 
is due to fusion rather than splitting at the centromere. 
It is difficult to be sure, but if one is making compari- 
sons within a species, the ancestral type is most likely 
to be the one that could have given rise to all the others 
with the fewest rearrangements overall. This type of 
argument has generally favored fusion rather than split- 
ting. Also, centric fusion seems mechanistically more 
probable, since it could be the result of a Robertsonian 
translocation (see Robertsonian Translocation), of 
which there are many examples. In a few plants, such 
as Fritillaria, centric fission has been seen to occur as a
result of ‘crosswise’ misdivision of the centromeres in 
lagging chromosomes at anaphase I of meiosis, but 
such an event has not been widely observed and the 
resulting telocentric chromosomes are probably not 
stably transmitted. 

What is the difference between centric fusion and 
Robertsonian translocation, or are they the same 
thing? A Robertsonian translocation is usually seen 
as resulting from two breaks in different telocentric 
chromosomes in positions close to the centromeres, 
very probably within heterochromatin, but not actu- 
ally within the centromeres themselves, in so far as 
their limits can be defined. On this assumption, each 
of the translocation products has just one of the original 
centromeres, but one product, probably consisting 
mainly or entirely of heterochromatin and hence 
genetically inert, has been lost without consequence 
for the viability of the cell. On the other hand, centric 
fusion in a strict sense should mean fusion following 
breakage within the centromeres themselves. It is very difficult to distinguish between the two possibilities by microscopy alone, without DNA sequencing across the centromere regions of the species being compared (which has in fact never been done). Evidence for true centric fusion has been claimed in one particular case. Two species of muntjac deer show an extraordinary difference in chromosome number and size. The Indian muntjac (Muntiacus muntjak vaginalis) has just three pairs of large V-shaped chromosomes, whereas the Chinese muntjac (Muntiacus reevesi) has 23 pairs of relatively very small telocentric chromosomes. Clearly, centric fusion, which can at most halve the chromosome number, could not account for all of this difference, but it could be part of the explanation. Brinkley and colleagues investigated the fine structure of centromeres by staining chromosomes with fluorescent antibodies specific for centromeric protein, and the resulting appearance of the chromosomes suggested that the Indian muntjac centromeres were compound, with internal linear repetition, as compared with the Chinese muntjac centromeres. Investigation of the centromere structures at a finer level, i.e., by DNA sequencing, is required for a more certain conclusion to be reached.
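The factor-of-two limit follows from the same bookkeeping: each fusion consumes two telocentrics, so n telocentric chromosomes can be reduced by centric fusion alone to no fewer than ⌈n/2⌉. A minimal Python sketch, applied to the haploid numbers given above (23 telocentrics versus 3 chromosomes); the function name is illustrative.

```python
import math


def min_count_by_fusion(telocentrics):
    """Smallest haploid number reachable from an all-telocentric set by
    centric fusion alone: each fusion merges two rods into one V."""
    return math.ceil(telocentrics / 2)


lower_bound = min_count_by_fusion(23)   # haploid set of 23 telocentrics
print(lower_bound)                      # 12
print(lower_bound > 3)                  # True: fusion alone cannot reach 3
```

Reaching three chromosomes from 23 would therefore require other kinds of rearrangement in addition to centric fusion.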
Centrioles are specialized organelles of animal cells, usually present as a pair of orthogonally arranged cylinders, each with a wall composed of nine microtubule triplets. The forming (immature) centriole is termed the procentriole; multiple procentrioles are present in some cells. Centrioles duplicate prior to mitosis, and the daughter centrioles become located at the poles of the spindle.


See also: Mitosis; Spindle 
Appearance under the Microscope


The centromere is the chromosome region that attaches 
to a spindle fiber at metaphase of mitosis or meiosis and
moves to the spindle pole at anaphase, pulling the rest of 
the chromosome behind it. It can often be distinguished 
microscopically at metaphase as a thin constriction in 
the otherwise thick condensed chromosome, and a 
point at which the chromosome is flexible and free to 
bend. The two chromatids into which metaphase 
chromosomes are usually visibly divided are held 
together in the centromere region. In particularly 
clear chromosome preparations chromatids sometimes 
appear separate at the centromere core but adherent at 
points closely placed on each side (Figure 1A). The 
position of the centromere is constant for a particular 
chromosome, but variable between chromosomes, 
which are called metacentric, acrocentric, or telo- 
centric, depending on whether their centromeres are 
more or less central, near the end, or terminal (Figure 1). 
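The three positional classes can be expressed as a rule on arm lengths. In the Python sketch below, the cutoff that separates "more or less central" from "near the end" (a long-to-short arm ratio of 1.7) is an illustrative assumption rather than a value from this entry, and the intermediate classes used in some nomenclatures are ignored.

```python
def centromere_class(arm1, arm2, metacentric_max_ratio=1.7):
    """Classify a chromosome from its two arm lengths (any unit).

    metacentric_max_ratio is an assumed cutoff, not a standard from the text."""
    short_arm, long_arm = sorted((arm1, arm2))
    if short_arm < 0 or long_arm <= 0:
        raise ValueError("arm lengths must be >= 0 and not both zero")
    if short_arm == 0:
        return "telocentric"     # centromere is terminal
    if long_arm / short_arm <= metacentric_max_ratio:
        return "metacentric"     # centromere more or less central
    return "acrocentric"         # centromere near one end


print(centromere_class(5, 5))    # metacentric
print(centromere_class(1, 9))    # acrocentric
print(centromere_class(0, 10))   # telocentric
```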


Timing of Centromere Splitting 


In mitosis, the cohesion of sister chromatids at the centromere lapses at the end of metaphase, enabling the daughter chromosomes to move apart towards the two poles of the spindle. In meiosis, in contrast, the chromatids remain joined at the centromere at the first anaphase. Each bivalent, resulting from pairwise synapsis and chiasma formation, separates into two dyads, each consisting of two chromatids joined at the centromere (Figure 1B), which is not split until the end of metaphase of the second division. Thus the centromere can be defined genetically as that point in a linkage map that always segregates at the first division of meiosis (reductionally) and never at the second division (equationally).


Kinetochores 


Centromeres attach to spindle fibers through protein 
structures called kinetochores. Except in organisms 
without localized centromeres (see below), there is 
one kinetochore per chromatid, and mitotic metaphase 
chromosomes, divided into two chromatids, have one 
kinetochore directed towards each pole of the spindle. 
Kinetochores can sometimes be seen under the micro- 
scope to be stretched towards the spindle poles. At the 
first division of meiosis, there are again two kinetochores on each of the undivided centromeres, but here they are pointed in the same direction; this may be at least part of the explanation for the centromeres remaining undivided (Figure 1B).


Centromere Structures 


The structures of centromeres are extremely diverse 
between different organisms. They have been investigated most completely in the budding yeast Saccharomyces cerevisiae. With the completion of the yeast genome project, the DNA sequences underlying all 16 centromeres are known, and the minimum sequence necessary for centromere function has been determined by testing chromosome fragments in yeast artificial chromosomes. This minimum sequence, only 125 bp long, is, with minor variations, conserved between chromosomes, and consists of three elements, CDEI, II, and III, of which CDEIII has the best-defined function. CDEIII is an imperfect palindrome of 25 bp, and it binds to the innermost proteins of the kinetochore, which is a highly complex multiprotein structure with multiple functions in the regulation of chromosome division, the most obvious of which is binding to a spindle fiber (microtubule). CDEI is an imperfect palindrome of 8 bp which binds one known protein and facilitates centromere function without being absolutely essential. The remaining part of the centromere, CDEII, is a spacer between CDEI and CDEIII and consists of a less specific A/T-rich sequence (Figure 2A).
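In DNA, a palindrome is a sequence that reads the same as its reverse complement, and an imperfect palindrome falls short of this at some positions. The Python sketch below scores palindromicity as the fraction of positions that match the reverse complement; the scoring scheme and the example sequences are hypothetical illustrations and are not the CDEI or CDEIII sequences themselves.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of a DNA sequence (A<->T, C<->G, then reversed)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def palindrome_score(seq):
    """Fraction of positions where seq agrees with its reverse complement.

    1.0 means a perfect palindrome. Each mismatched base pair is counted
    twice (once from each half); for odd lengths the central base can
    never match its own complement."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    rc = reverse_complement(seq)
    return sum(a == b for a, b in zip(seq, rc)) / len(seq)


print(palindrome_score("GAATTC"))  # 1.0 (perfect palindrome)
print(palindrome_score("GAATTG"))  # about 0.67 (imperfect palindrome)
```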
Figure 1 (A) The three chromosomes in the haploid genome of the Indian muntjac deer, as seen at metaphase of mitosis. The centromeres are clearly visible as thin constrictions. Chromosomes 1 and 2 are metacentric and acrocentric, respectively. Note that sister chromatids appear to be attached together at points closely flanking the centromere cores, rather than at the cores themselves, which are visibly divided. The X chromosome centromere is unusual in appearing to extend along a substantial chromosome segment. (Redrawn from Lima-de-Faria (1983); original preparation of K. Fredga (1971), Hereditas 36: 322-337.) (B) Chromosome dyad (two chromatids attached at their centromeres) at anaphase I of meiosis in the plant Tradescantia. The two kinetochores (one for each chromatid) are seen stretched towards the same pole of the division spindle. (Redrawn from Lima-de-Faria, 1983.)


The fission yeast Schizosaccharomyces pombe has 
only three chromosomes, with distinctive centromere 
regions very much larger than those of budding yeast. 
They each have an essential central core (cc1, 2, and 3) 
of 4-7 kb; there is some sequence similarity between
cc1 and cc3, but little or none between either and cc2. 
Flanking the cc regions are long and, to some extent, 
mutually inverted repetitive sequences, amounting to 
approximately 38, 65, and 97 kb for centromeres 1, 2, 
and 3, respectively. The sequences flanking cc1 form 
an almost perfect inverted duplication (Figure 2B). 

Mammalian centromere and kinetochore structure 
has also been much investigated. The multiprotein 
kinetochores have some resemblance to those of Sac- 
charomyces, but the underlying DNA is quite differ- 
ent. There is no analog of the CDEIII palindrome 
to serve as an attachment point for the kinetochore. 
Instead, the kinetochores are positioned somewhere 
within long sequences (240 kb to several megabases in 
humans) of repetitive DNA of the alphoid (α-satellite)
type (Figure 2C). Satellite DNA is characteristically 
associated with heterochromatin, which in many 
organisms (e.g., mammals, Drosophila spp., and many 
flowering plants) is found mainly in blocks flanking 
centromeres, but it does not appear to play any essen- 
tial part in centromere function. There is much 
evidence that, over the long term, repetitive DNA 
sequence tends to expand in an invasive way, and 
it may be that, for reasons unknown, centromeric 
regions are where this apparently ‘selfish’ DNA can 
be accommodated with the least disruption. 


Neocentromeres 


Centromeric DNA sequence is extremely variable, 
both between and within species. The ability of cen- 
tromeres to become established over different DNA 
sequences is most strikingly shown in the formation 
of neocentromeres — more or less functional new cen- 
tromeres with associated kinetochores that sometimes 
appear on chromosomes that have had their regular 
centromeres deleted or otherwise inactivated. These 
have been particularly studied in cultured human 
cells. A well-investigated example is a neocentro- 
mere in a partially deleted human chromosome 10. 
Here the DNA spanning the neocentromere has been sequenced and found to contain neither alphoid satellite nor any other sequence similar to that of regular centromeres.

A hypothesis that is attracting increasing attention 
is that kinetochores are propagated epigenetically 
(Karpen and Allshire, 1997). On this view, new kinet- 
ochores are built on newly replicated chromosomes 
at the same sites as the old ones, not primarily because 
of the DNA sequence but because some trace of kinet- 
ochore structure is already there. The formation of 
neocentromeres suggests that, in centromere-deficient 
chromosomes, kinetochores can be established de 
novo, without strict DNA sequence requirements, 
Figure 2 Centromere DNA structures. (A) Budding yeast, Saccharomyces cerevisiae, all 16 centromeres; 1, 2 and 3 are the regions named CDEI, CDEII, and CDEIII (the latter is an imperfect inverted repeat). (After Karpen and Allshire, 1997.) (B) Fission yeast, Schizosaccharomyces pombe, chromosome 1, with six kinds of repetitive sequence flanking the centromere core (cc1), together forming a long inverted repeat. (From Choo, 1997.) (C) The human Y chromosome centromere region, with the black box representing α-satellite sequence, the stippled box a region of 5 bp repeats, and the other boxes repetitive sequences of other kinds. (After Karpen and Allshire, 1997.)


but no doubt with greater probability over some 
DNA sequences than others. The fact that most 
organisms have a single localized centromere per 
chromosome implies that, once established, a cen- 
tromere (or more likely the kinetochore that it carries) 
effectively inhibits the formation of a neocentromere 
on the same chromosome. This inhibition can be supposed to be absent in organisms that do not have localized centromeres, which include nematode worms, some arachnids (spiders and scorpions), and some monocotyledonous plants (Carex, Luzula). In these diverse species, spindle fibers attach to multiple kinetochores along the whole lengths of the chromosomes, which consequently separate at anaphase as parallel rods rather than in the usual pole-directed arrowhead orientation.
The process of translation is traditionally divided into
three steps: initiation, elongation, and termination. 
Soluble protein factors catalyze the process by bind- 
ing to the ribosome transiently. More than ten fac- 
tors participate in eubacterial translation whereas a 
considerably larger number participate in eukaryotic 
translation. 
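The three phases can be followed in a toy model that reads an mRNA codon by codon using the standard genetic code. The Python sketch below rests on simplifying assumptions: initiation just starts at the first AUG, each A-site codon is decoded and joined to the peptide in a single step, and termination occurs when a stop codon (UAG, UAA, or UGA) reaches the A-site. Factors, GTP hydrolysis, and proofreading are not modeled.

```python
# Standard genetic code, built from the usual UCAG ordering of codons.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    first + second + third: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, first in enumerate(BASES)
    for j, second in enumerate(BASES)
    for k, third in enumerate(BASES)
}
STOP_CODONS = {"UAG", "UAA", "UGA"}   # the '*' entries in the table


def translate(mrna):
    """Toy model of initiation, elongation and termination.

    mrna: string over A, C, G, U. Returns one-letter amino acid codes."""
    mrna = mrna.upper()
    start = mrna.find("AUG")
    if start == -1:
        return ""                 # no initiation codon: no protein
    peptide = ["M"]               # initiator tRNA (Met) pairs with AUG
    a_site = start + 3            # next codon is exposed in the A-site
    while a_site + 3 <= len(mrna):
        codon = mrna[a_site:a_site + 3]
        if codon in STOP_CODONS:
            break                 # termination: a release factor reads the stop codon
        peptide.append(CODON_TABLE[codon])  # decoding + peptidyl transfer
        a_site += 3               # translocation moves the mRNA by one codon
    # If no stop codon is reached, the model simply stops at the mRNA end.
    return "".join(peptide)


print(translate("GGAUGUUUGGCUAAGG"))  # MFG
```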


Initiation 


In the initiation phase of protein synthesis a messenger 
RNA (mRNA) is bound to the ribosome. In this 
process the correct initiation (methionine) codon 
from which the translation begins is selected. Eubac- 
terial initiation is stimulated by three initiation 
factors, IF-1, IF-2, and IF-3. In eukaryotes a much 
larger number of initiation factors participate. The 
fundamental steps in initiation are the binding of the 
mRNA to the small subunit, the subsequent binding 
of the initiator tRNA, and the attachment of the large 
subunit to this initiation complex. IF-2 in complex 
with GTP binds with the initiator tRNA to the ini- 
tiation codon of the mRNA on the small subunit, 
which in turn associates with the large subunit. IF-1 
may assist IF-2 in binding the initiator tRNA to 
the P-site, whereas IF-3 prevents the large subunit 
from associating before proper initiation has been
completed. 


Elongation 


During each cycle of elongation one amino acid is incorporated into the nascent peptide. The elongation factors (three in eubacteria) catalyze two of the basic steps in translation: binding of aminoacyl-tRNA to the A-site and translocation of peptidyl-tRNA from the A-site to the P-site. During the translocation step the mRNA is moved so that the next codon is exposed in the A-site. However, the central chemical event in elongation, peptidyl transfer, seems to be a spontaneous process that does not require a protein factor.

The recognition of the codon by the anticodon of 
the tRNA is a multistep process. The anticodon of the 
aminoacyl-tRNA, complexed to the elongation factor 
Tu (EF-Tu) and GTP, is matched against the codon in 
the A-site of the ribosome in a phase called initial 
selection. A good match allows EF-Tu to interact 
with the ribosome in a way that induces it to hydrolyze its bound GTP to GDP and phosphate. As a result, the EF-Tu-GDP complex loses its affinity for the aminoacyl-tRNA and the ribosome. At this stage the aminoacyl-tRNA has an orientation in which its amino acid moiety is far from the peptidyl transfer center. After the dissociation of EF-Tu the aminoacyl-tRNA can reorient itself in the A-site of the ribosome, while retaining the interaction with its codon. This process coincides with proofreading of the match between the tRNA anticodon and the mRNA codon. An incorrect (noncognate) match of the anticodon to the codon increases the likelihood that the aminoacyl-tRNA will dissociate before its amino acid has reached the peptidyl transfer site of the ribosome.
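Because a noncognate tRNA must pass both initial selection and proofreading, the two stages act roughly multiplicatively on accuracy, an idea often called kinetic proofreading. The Python sketch below illustrates only that arithmetic: the discrimination factors are hypothetical, cognate tRNAs are assumed to pass every stage, and the model does not describe actual rate constants.

```python
def error_fraction(discrimination, noncognate_excess=1.0):
    """Fraction of incorporated amino acids that are wrong.

    discrimination: for each selection stage, how many times less likely a
    noncognate tRNA is to pass than a cognate one (hypothetical values).
    noncognate_excess: noncognate : cognate tRNA abundance at the A-site.
    Cognate tRNAs are assumed to pass every stage."""
    wrong = noncognate_excess
    for factor in discrimination:
        wrong /= factor          # noncognate tRNAs surviving this stage
    right = 1.0
    return wrong / (wrong + right)


one_stage = error_fraction([100])        # initial selection only
two_stage = error_fraction([100, 100])   # initial selection + proofreading
print(one_stage, two_stage)              # about 1e-2 versus about 1e-4
```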

Peptidyl transfer is catalyzed by the rRNA of the 
large subunit without direct assistance of ribosomal 
proteins or elongation factors. Once the aminoacyl 
moiety reaches the A-site part of the peptidyl transfer 
site, the peptide on the peptidyl-tRNA in the P-site
can be transferred to it. This leads to a peptidyl-tRNA 
in the A-site and a deacylated tRNA in the P-site. 

The final step of elongation is the translocation of 
the peptidyl-tRNA from the A-site and the movement 
of the mRNA by three nucleotides so that the next codon
is exposed in the A-site. EF-G, which catalyzes this 
process, binds to the ribosome in complex with GTP. 
After translocation is performed it dissociates in com- 
plex with GDP. A surprising finding is that the ternary 
complex of EF-Tu with GTP and aminoacyl-tRNA
has the same shape as EF-G. It is possible that EF-G, 
when it dissociates from the ribosome, leaves an 
imprint that matches this ternary complex. 


Termination 


The termination of protein synthesis depends on the 
exposure of one of the three stop codons, UAG, 
UAA, and UGA, in the decoding part of the A-site. 
In eubacteria two release factors, RF1 and RF2, participate in decoding the stop codons and in hydrolyzing the completed peptide from the P-site tRNA. In eukaryotes they correspond to a single decoding factor,
eRF1. The crystal structure of eRF1 indicates that 
these factors may perform their function by mimick- 
ing tRNA. The termination factor RF3 in all cases 
catalyzes the dissociation of the decoding factors 
from the ribosome. 
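As background not stated in this entry, the two bacterial release factors divide the stop codons between them: RF1 recognizes UAA and UAG, RF2 recognizes UAA and UGA, while eukaryotic eRF1 recognizes all three. A minimal Python lookup recording this; the function name is illustrative.

```python
# Stop-codon specificity of the bacterial release factors (background
# knowledge, not stated in this entry); eukaryotic eRF1 reads all three.
BACTERIAL_RF = {
    "UAA": {"RF1", "RF2"},
    "UAG": {"RF1"},
    "UGA": {"RF2"},
}


def release_factors(codon, eukaryote=False):
    """Decoding release factor(s) for a codon in the A-site.

    Returns an empty set for sense codons, which are read by tRNAs instead."""
    codon = codon.upper()
    if codon not in BACTERIAL_RF:
        return set()
    return {"eRF1"} if eukaryote else set(BACTERIAL_RF[codon])


print(release_factors("UAG"))                  # {'RF1'}
print(release_factors("UGA", eukaryote=True))  # {'eRF1'}
```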

The ribosome recycling factor (RRF) has the role 
of removing the mRNA from the ribosome so that 
the ribosome is available to synthesize new protein 
from new mRNAs. It performs this role together 
with EF-G. An amazing observation is that RRF also 
closely mimics tRNA. This may suggest that RRF 
binds to a tRNA binding site, possibly the A-site, 
and is translocated from this site by EF-G. This would 
lead to the dissociation of the mRNA from the ribo- 
some and the ribosomal subunits from each other. 


See also: Translation 
The chaperonins are a family of molecular chaperones
that form cylindrical double-ring complexes. They 
assist, in an adenosine triphosphate (ATP)-dependent 
manner, the folding of newly synthesized polypep- 
tides and the refolding of proteins that are partially 
denatured under stress, for example, when the cells are 
exposed to high temperatures. Two distantly related 
subfamilies of chaperonins can be distinguished: 
Members of the GroE (group I) subfamily are found 
in eubacteria, mitochondria, and chloroplasts, whereas 
the TCP-1 (group II) chaperonins occur in archaebac- 
teria and eukaryotic cytosol. GroE terminology re- 
flects the fact that the eubacterial chaperonin was first 
identified as a protein required for the replication of 
bacteriophage lambda. ‘Gro’ refers to phage growth, 
and the suffix ‘E’ indicates that the GroE dependence 
for growth is overcome when the phage carries a mutation in the head gene E. TCP-1 terminology is derived
from the identification of the protein T-complex poly- 
peptide, encoded in the mouse T locus, as a chaperonin 
subunit. 


Function 


Chaperonins are essential for cell viability in all 
growth conditions, because they are required for the 
efficient folding of numerous proteins that mediate vital cellular functions. Substrate proteins of the
Escherichia coli chaperonin GroEL are involved in 
processes such as energy metabolism, protein bio- 
synthesis, and DNA rearrangement. Substrates of 
TRiC (T-complex polypeptide ring complex, also 
known as chaperonin-containing TCP-1, CCT), the 
cytosolic chaperonin of eukaryotes, include the cyto- 
skeletal components actin and tubulin. Generally, 
chaperonin substrates are thought to have relatively 
slow folding kinetics and therefore to be sensitive to 
aggregation during folding. At normal growth tem- 
peratures of 30-37 °C, GroEL interacts with 10-15% 
of total newly synthesized cytosolic proteins and with 
up to 30% under heat stress at 42 °C, where GroEL 
levels increase three- to fivefold. A subset of GroEL 
substrates are structurally unstable and require repea- 
ted chaperonin assistance for conformational mainten- 
ance even under normal growth conditions. GroEL 
acts posttranslationally and frequently cooperates in 
protein folding with the heat shock protein (Hsp)-70 
machinery of molecular chaperones. TRiC can inter- 
act cotranslationally with a subset of nascent chains 
and cooperates in their folding with molecular chaper- 
ones of the GIM/prefoldin family. 


Structure and Mechanism 


Chaperonins form large, approx. 800-kDa complexes 
with ATPase activity. They consist of two stacked rings 
of subunits that enclose separate cavities for the bind- 
ing of substrate polypeptide. The homo-oligomeric 
group I chaperonins have sevenfold symmetry, where- 
as group II chaperonins are hetero-oligomeric and 
contain eight or nine subunits per ring. The structure 
and mechanism of action of group I chaperonins are 
well understood. Their subunits have three domains: 
the equatorial domain binds and hydrolyzes ATP; it 
mediates most of the intersubunit contacts within and 
between rings and is connected via the hinge-like 
intermediate domain to the apical domain. The apical 
domains of the seven subunits form the ring opening 
and expose hydrophobic amino acid residues toward 
the central cavity. These hydrophobic patches provide 
binding regions for the hydrophobic surfaces of 
nonnative polypeptides. Folding is dependent on the 
cofactor GroES, a single heptameric ring of approx. 
10-kDa subunits that covers the openings of the 
GroEL cylinder. GroES binding displaces the sub- 
strate protein from its binding sites on GroEL. As a 
result, a single polypeptide chain becomes enclosed 
inside the GroEL-GroES cage, where it is protected 
from off-pathway aggregation reactions and can fold 
productively to the native state. Binding and release of 
GroES is timed by GroEL ATPase. GroES associates 
with the ATP form of GroEL and dissociates once the seven ATP molecules in the interacting GroEL subunits have been hydrolyzed, i.e., after approx. 15 s. At this point, GroES release is triggered by binding of ATP to the opposite GroEL ring. Folded protein leaves the cage, whereas incompletely folded
protein may rebind for another folding attempt. 
Most GroEL substrates are below 60 kDa, the upper 
size limit of the folding compartment. Prevention of 
aggregation during folding seems to be the main 
feature of the chaperonin mechanism. Additionally, 
binding to the chaperonin may result in unfolding of 
kinetically trapped folding intermediates prior to their 
release into the folding cage. Group II chaperonins 
function without a GroES-like cofactor. Instead, 
closure of the chaperonin cavity is achieved by flexible 
α-helical extensions emanating from the apical domains of the chaperonin subunits.
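The GroEL cycle described above can be treated as repeated trials: each encapsulation lasts roughly 15 s, and a chain that has not folded may rebind for another attempt. If each attempt succeeded with a fixed probability p (a simplifying assumption; the entry gives no such value), the number of cycles would be geometric with mean 1/p. The Python sketch below computes that expectation, checks it by simulation, and applies the approximately 60-kDa size limit; all function names are illustrative.

```python
import random

CYCLE_SECONDS = 15.0    # approx. time for GroEL to hydrolyze its seven ATP (text)
CAGE_LIMIT_KDA = 60.0   # approx. upper size limit of the folding cage (text)


def expected_folding_time(p_fold):
    """Mean time to the native state if each encapsulation cycle succeeds
    with probability p_fold (hypothetical) and failures rebind."""
    if not 0.0 < p_fold <= 1.0:
        raise ValueError("p_fold must be in (0, 1]")
    return CYCLE_SECONDS / p_fold   # geometric: mean number of cycles is 1/p


def simulate_cycles(p_fold, rng):
    """Number of encapsulation cycles until one chain folds (Monte Carlo)."""
    cycles = 1
    while rng.random() >= p_fold:
        cycles += 1    # incompletely folded chain rebinds for another attempt
    return cycles


def fits_in_cage(mass_kda):
    """Rough check against the ~60-kDa size limit of the GroEL-GroES cage."""
    return mass_kda < CAGE_LIMIT_KDA


rng = random.Random(1)
runs = [simulate_cycles(0.25, rng) for _ in range(20000)]
print(expected_folding_time(0.25))   # 60.0 seconds
print(sum(runs) / len(runs))         # close to 4 cycles
```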


See also: Heat Shock Proteins; Proteins and 
Protein Structure 
A character is any biological feature that occurs
across a range of organisms and might thus be used 
to determine evolutionary relationships. Characters 
may be structural (bones, organs), molecular (genes, 
proteins), functional (flying, enzymatic activity), or 
behavioral (mating, food gathering). 


See also: Character State; Cladograms; 
Quantitative Trait 


Character State 


W Fitch 
Character state is the term that denotes the various
forms that a character may have. If the character is 
thumb, its state might be opposable; if nucleotide, 
perhaps adenine; if amino acid, perhaps tryptophan; 
if enzyme, perhaps hydrolase; if mating, perhaps eats 
mate. 
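Character states become evidence about relationships when they are compared across a tree. One standard way to do this, Fitch's small-parsimony algorithm (not described in this entry), counts the minimum number of state changes a given tree requires for one character. The Python sketch below assumes a rooted binary tree written as nested pairs; the taxa and states are hypothetical.

```python
def fitch(tree, states):
    """Fitch small parsimony for one character on a rooted binary tree.

    tree: a taxon name, or a pair (left, right) of subtrees.
    states: mapping from taxon name to its character state.
    Returns (candidate states at this node, minimum number of changes)."""
    if isinstance(tree, str):                 # leaf: observed state, no cost
        return {states[tree]}, 0
    left, right = tree
    left_set, left_cost = fitch(left, states)
    right_set, right_cost = fitch(right, states)
    shared = left_set & right_set
    if shared:                                # subtrees can agree: no change here
        return shared, left_cost + right_cost
    return left_set | right_set, left_cost + right_cost + 1  # one change needed


tree = (("a", "b"), ("c", "d"))
print(fitch(tree, {"a": "A", "b": "A", "c": "G", "d": "G"})[1])  # 1 change
print(fitch(tree, {"a": "A", "b": "G", "c": "A", "d": "G"})[1])  # 2 changes
```

Under parsimony, a tree that accounts for the observed character states with fewer changes is preferred.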


See also: Character; Cladograms 
Chargaff’s Rules
Chargaff’s rules are empirical rules, first proposed by Erwin Chargaff, stating that in any preparation of double-stranded DNA the molar quantities of adenine (A) and thymine (T) are equal, as are the molar quantities of guanine (G) and cytosine (C). Before Chargaff’s work, a commonly held view of DNA structure was embodied in the tetranucleotide hypothesis, proposed by Phoebus A. Levene: a polynucleotide is made up of a repeating tetranucleotide of A, T, G, and C (not necessarily in that order). (Notice that under this model, DNA could not be an informational molecule, since all molecules would have (virtually) the same nucleotide sequence.) If this were true, every DNA sample ought to contain equimolar amounts of the four nucleotides. Chargaff’s observations contradicted this hypothesis. Chargaff’s rules also provided an important clue to Watson and Crick in elucidating the double-helix model of DNA, leading them to the specific A-T and G-C base-pairing arrangements.
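Base pairing explains why the rules hold: every A on one strand faces a T on the other, and every G faces a C, so the duplex as a whole must have A = T and G = C whatever its sequence, whereas the tetranucleotide hypothesis predicts 25% of each base. The Python sketch below computes the composition of a duplex from one hypothetical strand; the function names are illustrative.

```python
from collections import Counter

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def base_composition(seq):
    """Molar fraction of A, C, G and T in a DNA sequence."""
    counts = Counter(seq.upper())
    total = sum(counts[base] for base in "ACGT")
    return {base: counts[base] / total for base in "ACGT"}


def duplex_composition(strand):
    """Composition of a double helix: the given strand plus its
    complementary strand (strand order does not affect the counts)."""
    strand = strand.upper()
    return base_composition(strand + strand.translate(COMPLEMENT))


comp = duplex_composition("AAAAGC")   # hypothetical strand
print(comp)  # A equals T and G equals C, but the bases are not all 0.25
```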


Further Reading 

Chargaff E (1950) Chemical specificity of the nucleic acids 
and mechanism of their enzymatic degradation. Experientia 
6: 201. 


See also: DNA; Nucleotides and Nucleosides 


Charon Phages 


See: Vectors 
Chi sequences, 5' GCTGGTGG 3’, are hotspots of
homologous recombination in the bacterium Escher- 
ichia coli and related species. Chi is recognized by and 
alters the RecBCD enzyme, which produces at Chi a 
recombinogenic 3’-ended single strand of DNA and 
coats it with RecA protein. This DNA-protein com- 
plex invades a homologous duplex leading to an ele- 
vated frequency of recombination within a few kb of 
Chi. Other sequences appear to have a similar role in
more distantly related bacteria. 
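Because Chi is not its own reverse complement, its orientation in a duplex matters, so a search for Chi sites should report which strand carries the 5' GCTGGTGG 3' reading. The Python sketch below scans the top strand for Chi and for its reverse complement, CCACCAGC, which marks Chi on the bottom strand; the example sequence and function name are hypothetical.

```python
CHI = "GCTGGTGG"
COMPLEMENT = str.maketrans("ACGT", "TGCA")
CHI_BOTTOM = CHI.translate(COMPLEMENT)[::-1]   # CCACCAGC: Chi on the other strand


def find_chi(top_strand):
    """Return (position, strand) for every Chi site in a duplex.

    Positions are 0-based on the given top strand. '+' means Chi reads
    5'-GCTGGTGG-3' on the top strand; '-' means it reads so on the bottom
    strand, which appears on the top strand as CCACCAGC."""
    seq = top_strand.upper()
    hits = []
    for i in range(len(seq) - len(CHI) + 1):
        window = seq[i:i + len(CHI)]
        if window == CHI:
            hits.append((i, "+"))
        elif window == CHI_BOTTOM:
            hits.append((i, "-"))
    return hits


print(find_chi("AAGCTGGTGGAACCACCAGCTT"))  # [(2, '+'), (12, '-')]
```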


Discovery of Chi and its Nucleotide 
Sequence 


Chi was discovered as a mutation that enhances the 
growth of phage lambda red gam mutants. Multimeric 
lambda DNA, produced by rolling circle replication 
or by recombination of monomeric DNA, is required 
to produce viable phage particles. In the absence of 
Gam protein the E. coli RecBCD enzyme blocks 
lambda rolling circle replication, the major route to 
multimeric DNA. In the absence of the Red recombin- 
ation pathway, recombination is limited to the host 
RecBCD pathway, which recombines lambda at 
low frequency. Consequently, lambda red gam phage 
make few viable phage particles and small plaques. 
Spontaneous mutants that form large plaques contain 
mutations creating Chi at one of four identified sites. 
These mutations increase the frequency of recombin- 
ation near Chi and thus the amount of multimeric 
DNA. Lambda red gam phage with plasmid pBR322 
inserted into it also makes small plaques. These results 
indicate that wild-type lambda and pBR322 contain 
no active Chi site. Mapping and sequencing of Chi 
mutations in lambda and pBR322 revealed eight 
nucleotides common to all active Chi sequences. Secondary mutations inactivating Chi occur in this octamer, 5' GCTGGTGG 3’, which is thus equated with Chi (see below).


Genetic Properties of Chi 


As noted above, in phage lambda crosses Chi stimu- 
lates homologous recombination at or near the site of 
the mutation. Stimulation is exclusively by the 
RecBCD pathway and extends leftward from Chi 
(with respect to the direction Chi is written here); 
stimulation is greatest at Chi and diminishes by a 
factor of 2 for each 2-3 kb from Chi (Figure 1A). A
Chi site in only one parent shows high activity, even 
when the other parent carries a heterology of several 
kb opposite the Chi site. In this case recombination is 
stimulated in the region of homology just to the left of 
the heterology. Chi also stimulates E. coli generalized 
transduction by phage P1 and transformation by lin- 
ear DNA (gene targeting). Chi stimulates the forma- 
tion of high molecular weight DNA by plasmids 
that replicate as rolling circles; this stimulation, like 
Chi-stimulated homologous recombination, requires 
RecA protein and may reflect increased recombina- 
tion of the plasmids or decreased nuclease activity of 
RecBCD enzyme (see below). 
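The gradient described above (maximal at Chi, halving every 2-3 kb to the left) can be written as an idealized exponential decay. This is a sketch under stated assumptions: the halving distance of 2.5 kb is an assumed midpoint of the 2-3 kb range, stimulation to the right of Chi is simply set to zero, and the function name is hypothetical.

```python
def relative_chi_stimulation(distance_kb: float, halving_kb: float = 2.5) -> float:
    """Fraction of peak Chi stimulation at a given distance (idealized model).

    distance_kb > 0: interval lies to the LEFT of Chi (as Chi is written 5'->3'),
                     the side on which stimulation extends.
    distance_kb < 0: interval lies to the right; treated here as unstimulated.
    halving_kb: distance over which stimulation falls by half
                (2-3 kb in lambda crosses; 2.5 kb is an assumed midpoint).
    """
    if distance_kb < 0:
        return 0.0
    return 2.0 ** (-distance_kb / halving_kb)

for d in (0, 2.5, 5, 10):
    print(d, round(relative_chi_stimulation(d), 3))
```

Changing `halving_kb` between 2 and 3 shows how sensitive the predicted reach of a single Chi site is to that parameter.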


Interaction of Chi and RecBCD Enzyme 


The pathway specificity of Chi’s stimulation of 
recombination suggested that Chi interacts with 
RecBCD enzyme, the only known component unique 
to that pathway. RecBCD enzyme has multiple activ- 
ities on linear DNA, broadly classed as DNA unwind- 
ing and nuclease activities, both of which require 
hydrolysis of ATP or another NTP. Certain mutations in the recB, recC, and recD genes, encoding the three subunits of the enzyme, reduce or abolish Chi activity without abolishing recombination proficiency;
these results suggested a direct interaction between 
Chi and RecBCD enzyme. Studies with purified 
RecBCD enzyme and DNA showed this direct inter- 
action, in which both the enzyme and the DNA sub- 
strate are changed. 

There are two distinct reactions of purified 
RecBCD enzyme at Chi. The outcome of these reac- 
tions depends on the reaction conditions, notably the 
ratio of the concentrations of ATP and Mg2+, which
form a 1:1 complex. Both reactions require that the 
enzyme enter linear DNA from the right (as Chi 
is written here) and that the DNA contain 
5’ GCTGGTGG 3’ on the upper strand (Figure 1B).
With excess ATP (i.e., with little uncomplexed Mg2+)
RecBCD enzyme nicks one strand (that with a 3’ end at 
the site of entry) a few nucleotides to the right (3’ side) 
of Chi; this reaction occurs only during DNA unwind- 
ing and releases single-stranded DNA products. With 
excess Mg2+, RecBCD enzyme degrades the 3’-ended
strand up to Chi, ceases degradation of that strand, 
nicks the opposite strand, and continues to degrade 
this latter strand. Although it has not been demon- 
strated which, if either, of these two reactions occurs 
in E. coli cells, both reactions produce single-stranded 
DNA with a 3’ end bearing Chi, the ‘Chi tail,’ thought 
to be an important recombination intermediate. 
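The two reactions can be compared in a toy string model of the substrate. This is a sketch only: the function name and condition labels are hypothetical, the DNA is represented by its upper strand written 5’ to 3’ from left to right, the enzyme is assumed to enter at the right-hand end and to act at the first correctly oriented Chi it meets (a simplification), and the 5-nucleotide nick offset follows the "about five nucleotides" given in the Figure 1 legend.

```python
from typing import Optional

CHI = "GCTGGTGG"  # 5'->3' on the upper strand, as written in the text

def chi_tail(upper: str, condition: str, nick_offset: int = 5) -> Optional[str]:
    """Toy model of the 3'-ended 'Chi tail' produced by RecBCD enzyme.

    condition: "excess_ATP" -> nick the upper strand a few nt 3' of Chi;
               "excess_Mg"  -> degrade the upper strand up to Chi.
    Returns the upper-strand segment left of the cut (its 3' end is at or
    near Chi), or None if no correctly oriented Chi is present.
    """
    upper = upper.upper()
    # Entering from the right and moving leftward, the enzyme meets the
    # rightmost correctly oriented Chi first (assumed to be recognized).
    start = upper.rfind(CHI)
    if start == -1:
        return None  # Chi-less substrates are not modeled here
    chi_end = start + len(CHI)  # index just past Chi's 3' end
    if condition == "excess_ATP":
        cut = min(chi_end + nick_offset, len(upper))
    elif condition == "excess_Mg":
        cut = chi_end
    else:
        raise ValueError("condition must be 'excess_ATP' or 'excess_Mg'")
    return upper[:cut]

substrate = "AAAA" + CHI + "TTTTTTTTTT"  # hypothetical substrate
print(chi_tail(substrate, "excess_Mg"))   # AAAAGCTGGTGG
print(chi_tail(substrate, "excess_ATP"))  # AAAAGCTGGTGGTTTTT
```

In both cases the product is the same single strand extending leftward from Chi, differing only by the few nucleotides beyond Chi's 3’ end.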

Chi also changes RecBCD enzyme: Chi alters the 
nuclease activity, as noted above, and activates the 
loading of RecA protein onto the 3’-ended Chi tail 
by RecBCD enzyme. This Chi-dependent change has 
been speculated to be a change in the RecD subunit or 
its ejection from the holoenzyme, since in some ways 
RecBC enzyme (lacking the RecD subunit) behaves 
like Chi-altered RecBCD enzyme. The only reported 
Chi-dependent physical change in RecBCD enzyme is 
the disassembly of all three subunits, which leaves the 
enzyme inactive; this change may happen not at Chi 
but when the enzyme reaches the end of the DNA 
substrate. Such inactivation would lead to one 
RecBCD enzyme promoting just one recombin- 
ational exchange, the minimum required at each DNA 
end to effect recombination of a linear DNA fragment 
with the circular chromosome. 

Figure 1 (A) Localized stimulation of recombination by Chi in phage lambda crosses. I, Ia, etc. are genetic intervals bounded by markers located the indicated distance from a Chi site in lambda. Solid circles indicate the midpoint of each interval and the frequency of recombinants per physical length of that interval, normalized to interval II = 1. (B) Action of purified RecBCD enzyme at Chi. With [ATP] > [Mg2+], RecBCD enzyme unwinds the DNA substrate, nicks the upper strand about five nucleotides to the right of Chi, and continues unwinding. With [Mg2+] > [ATP], RecBCD enzyme degrades the upper strand up to Chi, nicks the lower strand, and degrades or unwinds it to the left of Chi. Both conditions produce single-stranded DNA with a 3’ end near Chi and extending to its left. RecBCD enzyme loads RecA protein onto this “Chi tail.” (Reprinted with modification from Smith et al. (1995) with permission.)


Distribution of Chi in the Escherichia coli Chromosome


There are 1009 Chi sites in the 4.6 Mb E. coli genome, 
or 1 per 4.6 kb. This is about seven times more 
frequent than predicted from the random association 
of nucleotides in E. coli’s 51% G + C DNA. The frequent occurrence of Chi may be accounted for by its containing frequently used codons, such as CTG for leucine, and by the fact that ~90% of the genome encodes proteins. About 75% of the Chi sites are co-oriented
with the direction of replication (left to right, as Chi is 
written above); this feature may reflect transcription 
(and hence translation) being preferentially in the 
same direction as replication. 
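The "about seven times" figure can be checked with a short calculation. This sketch assumes, as the comparison in the text implies, that nucleotides are independent, with P(G) = P(C) = 0.255 and P(A) = P(T) = 0.245 for 51% G + C DNA; since Chi is not palindromic, both strands are counted at every position. It gives roughly 150 expected sites, between six- and sevenfold fewer than the 1009 observed.

```python
def motif_probability(motif: str, gc: float) -> float:
    """Chance that one strand at one position spells `motif`, assuming
    independent nucleotides with P(G) = P(C) = gc/2 and P(A) = P(T) = (1 - gc)/2."""
    freq = {"G": gc / 2, "C": gc / 2, "A": (1 - gc) / 2, "T": (1 - gc) / 2}
    p = 1.0
    for base in motif:
        p *= freq[base]
    return p

GENOME_BP = 4.6e6   # E. coli genome size from the text
GC = 0.51           # G + C fraction from the text
OBSERVED = 1009     # Chi sites reported in the text

# Factor of 2: Chi may read on either strand at each position.
expected = 2 * motif_probability("GCTGGTGG", GC) * GENOME_BP
print(f"expected ~{expected:.0f}, observed {OBSERVED}, "
      f"enrichment ~{OBSERVED / expected:.1f}-fold")
```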


Similar Sites in Other Bacteria 


Chi appears to interact with the RecBCD enzyme 
of numerous enteric bacteria, such as Salmonella 
typhimurium and Klebsiella pneumoniae, to enhance 
recombination as it does in E. coli. This sequence does
not appear to act in more distantly related gram- 
negative bacteria, such as Pseudomonas spp., however. 
In the gram-positive bacterium Bacillus subtilis 
5’ AGCGG 3’ directs the AddAB enzyme, functionally similar to RecBCD enzyme, to produce single-stranded DNA fragments with this sequence at or near the 3’ end, and it promotes the formation of high molecular weight DNA by rolling circle plasmids. In the gram-positive bacterium Lactococcus lactis, 5’ GCGCGTG 3’ appears
to play a similar role. Thus, Chi or other sequences 
may be a signature of closely related, recombining 
bacterial species. Although the Chi sequence is 
found in eukaryotic DNA, there is no clear evidence 
that it affects recombination in eukaryotes, which do 
not appear to contain functional analogs of RecBCD 
enzyme. 
In the context of reproductive biology, chiasma (pl.
chiasmata, from Greek, meaning a cross) refers to the 
microscopically observable nonsister chromatid ex- 
changes in meiotic prophase nuclei of most sexually re- 
producing species: protists, fungi, plants, and animals. 
At meiotic prophase, the maternal chromosome 
(Figure 1, gray lines) and the paternal chromosome (Figure 1, solid lines) are paired and have duplicated so that each consists of two sister chromatids. The chiasma is the result of breakage and rejoining between two nonsister chromatids. Breakage and rejoining between sister chromatids does happen but is not observable by microscopy without special labeling techniques such as BUdR (bromodeoxyuridine) incorporation.


Physical Exchange of Chromosome Parts 


The accumulated evidence indicates that a chiasma 
represents the site of genetic recombination between 
the parental chromosomes. For example, as shown 
in Figure 1 with morphologically marked chromosomes (long satellite and knob) and genetic markers (A/a and B/b), the genetically recombinant chromosomes (Ab and aB) also have a physical exchange of satellites and knobs. Furthermore, in the absence of chiasmata in the short arm, the long satellites always remain together at metaphase I. This is a variation of Barbara McClintock’s classical experiment of the 1930s.


Multiple Chiasmata 


The number of chiasmata per chromosome is a her- 
editary characteristic and is therefore genetically 
determined. Depending on the species, there can be 
one or several chiasmata in a set of paired chromosomes. If two adjacent chiasmata involve the same two nonsister chromatids, the result is a two-strand double crossover. Depending on which chromatids are involved, there can also be a three-strand or a four-strand double crossover. As a rule, each chromosome pair, no matter how short, has at least one chiasma, which ensures proper chromosome segregation at the first meiotic division.
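The three classes of double crossover can be enumerated directly. In this sketch (the chromatid numbering and function name are illustrative), chromatids 1 and 2 are maternal sisters and 3 and 4 paternal sisters; each chiasma involves one of the four nonsister pairs. If the two chiasmata choose their pairs independently (an assumption, i.e., no chromatid interference), the 16 combinations fall into two-, three-, and four-strand doubles in a 1:2:1 ratio.

```python
from collections import Counter
from itertools import product

# Chromatids 1, 2: maternal sisters; 3, 4: paternal sisters.
NONSISTER_PAIRS = [(1, 3), (1, 4), (2, 3), (2, 4)]

def classify_double(first: tuple[int, int], second: tuple[int, int]) -> str:
    """Classify two chiasmata by how many chromatids they share."""
    shared = len(set(first) & set(second))
    return {2: "two-strand", 1: "three-strand", 0: "four-strand"}[shared]

# Every ordered combination of nonsister pairs for two adjacent chiasmata.
counts = Counter(classify_double(a, b)
                 for a, b in product(NONSISTER_PAIRS, repeat=2))
print(counts)
```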


Localized Chiasmata 


The positions of chiasmata along the length of the 
paired chromosomes are genetically determined and 
can be highly localized or may appear to be more 
evenly distributed. In some cases, there can be a pre- 
determined single chiasma next to the centromere, 
whereas a closely related species can have several 
chiasmata along the length of each chromosome pair. 
In other cases the single chiasma can be located strictly 
at the end of the chromosomes, and a closely related 
natural selection must have favored one or the other
mode. It has been speculated that localization of 
chiasmata preserves large blocks of genetic material that are not altered by recombination. This may be a
biological adaptation to relatively stable environmen- 
tal conditions. 


Chiasma Interference 


It is an observed but unexplained fact that, in most 
organisms, the presence of a given chiasma interferes 
with the occurrence of a chiasma nearby. As a conse- 
quence, chiasmata are rarely distributed in a random 
fashion. The presence of chiasma interference is 
reflected at the genetic analysis level by the fact that 
for short genetic distances there are fewer than ex- 
pected double crossovers. 
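A conventional way to quantify this shortfall is the coefficient of coincidence: observed double crossovers divided by the number expected if crossing over in two adjacent intervals were independent, with interference defined as 1 minus that coefficient. The sketch below uses hypothetical cross data; the function names are illustrative.

```python
def coefficient_of_coincidence(observed_doubles: int, total: int,
                               rf_region1: float, rf_region2: float) -> float:
    """Observed doubles / doubles expected under independence (rf1 * rf2 * total)."""
    expected = rf_region1 * rf_region2 * total
    return observed_doubles / expected

def interference(coc: float) -> float:
    """Interference = 1 - coefficient of coincidence (0 means no interference)."""
    return 1.0 - coc

# Hypothetical three-point cross: 1000 progeny, adjacent intervals with
# recombination frequencies 0.10 and 0.20, but only 8 double crossovers seen.
coc = coefficient_of_coincidence(8, 1000, 0.10, 0.20)
print(f"CoC = {coc:.2f}, interference = {interference(coc):.2f}")
```

Here 20 doubles would be expected without interference, so observing 8 corresponds to a coefficient of 0.4 and an interference of 0.6.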


Chiasma Resolution 


In order for the paired chromosomes to separate at 
the first meiotic division, it is necessary for the 
chiasmata to be resolved. The older models postulated 
that the chiasmata slide to the end of the paired 
chromosomes, a process referred to as ‘chiasma terminalization.’

Figure 1 (See Plate 3) Diagrammatic representation of chiasma formation and resolution. The light gray chromosome and its dark homolog have a reciprocal exchange event between the positions of genes A and B involving nonsister chromatids. The chromosome core holds the sister chromatids together when the centromeres are pulled to opposite poles (arrows). Under tension, the cross formation (chiasma) becomes evident. When the proteins of the

exchange is accompanied by a physical exchange of chromosome parts.

satisfactory. On the contrary, experimentally differentiated chromatids gave clear evidence that the
chiasma position does not change. It now appears 
that the axial cores of each pair of sister chromatids 
(see Figure 1) prevent the resolution and movement
of the chiasma. (The axial core is the protein structure 
around which the chromatin organizes in meiotic pro- 
phase.) When the core proteins are degraded at the 
first meiotic division, the sister chromatids are free to 
separate except at the centromere and, as a consequence, the chiasmata are resolved. Apparently the core
proteins that reside between the sister centromeres are 
exempt from the degradation so that the centromeres 
stay together until the second meiotic division. 


Recombination Nodules 


At the high resolution of an electron microscope, it is possible to detect a few small (100-nm) dense bodies along the length of the paired chromosome cores. The positions of these nodules correlate
with the positions of chiasmata at the later stages of 
meiotic prophase. It is likely that these nodules are 
conglomerates of proteins that are involved in DNA metabolism at the site of a crossover. In contrast to these few late nodules, there are numerous
early nodules that do not necessarily correlate with 
chiasmata in number or position. These early nodules, 
however, are associated with proteins involved in 
the initiation of early recombination events. The function of these structures in the initiation, maturation,
and resolution of recombination is under investiga- 
tion. 


Molecular Mechanisms 


The chiasma/crossover is based on the induction 
of chromosome breaks by the cell. While these 
breaks are double-stranded in Saccharomyces cerevi- 
siae and Schizosaccharomyces pombe, no similar infor- 
mation has been reported to date for other species. 
The induction of breaks is a remarkable process 
because cells in general guard strongly against breaks 
in DNA. Several mechanisms in somatic cells detect 
damage, arrest cell proliferation, and either repair the 
damage or cause cell degeneration. Meiotic prophase 
cells, on the contrary, activate enzymes that induce 
DNA breaks. This is followed by detection and repair 
processes with the employment of somatic and 
meiosis-specific enzymes. Whereas in somatic cells, 
recombinational repair can make use of the un- 
damaged sister chromatid as a template, in meiotic 
cells, meiosis-specific enzymes direct the repair toward the nonsister chromatid, thereby promoting the formation of a chiasma.




See also: Interference, Genetic; Recombination 
Nodules (RNs) 


Chimera 
C L Stewart 


Copyright © 2001 Academic Press 
doi: 10.1006/rwgn.2001.0190 


Webster’s Dictionary defines a ‘chimera’ in Greek
mythology as “A fire-breathing she-monster having 
a lion’s head, a goat’s body, and a serpent’s tail.” Cur- 
rent scientific methods have not yet created quite so 
dramatic an organism, although chicken embryos, 
with parts of their organs derived from mouse cells, 
have been produced. A chimera, in current usage, is 
defined as an animal composed of two or more genetically distinct cell populations derived from two or
more zygotes. Chimeras have been, and continue to 
be, an extremely important tool in determining the 
role and functions of different cell types in many 
aspects of biology and in the generation of novel 
strains of animals carrying mutations in specific genes. 

Chimeras are experimentally produced by the 
transplantation of cells or tissues between individuals, most often between embryos, although transfers between adults or between embryos and adults are also used. Chimeras have been widely used as an
experimental technique particularly in the analysis of 
embryonic development. Cells or tissues from one 
embryo are transplanted into a recipient embryo. 
Usually the cells of one or the other embryo are 
marked in some manner so that they and their descend- 
ants can be distinguished from the cells of the other 
embryo. In some of the earliest experiments per- 
formed on amphibians the markers were naturally 
occurring pigment granules present in the cells. In 
interspecific avian chimeras, differences between chicken and quail nucleolar morphology have functioned as an effective means of distinguishing between
the two cell populations in chimeric tissues. In mam- 
mals, polymorphic differences in the electrophoretic 
mobility of certain constitutively expressed enzymes, 
differences in coat color, and chromosomal markers 
have all been employed in the analysis of chimeras. 
Recently, molecular biology has provided new and 
more convenient ranges of markers that are easier to 


use and are more informative. In particular, mice have 
been genetically engineered to express in all their cells 
enzymes such as β-galactosidase, alkaline phosphatase, or green fluorescent protein, which result in the
cells becoming colored when they are treated with 
particular substrates or fluoresce when exposed to 
UV light. These are particularly useful as they make 
it relatively easy to determine the distribution of cells 
within a whole chimeric embryo or histological sec- 
tion (Figure 1). 


Use of Chimeras in the Analysis of Embryogenesis


By intermingling marked with unmarked cells at dif- 
ferent stages of embryogenesis, or between different 
tissues in newborn or even adult individuals, a re- 
markable body of information has accumulated 
regarding what cells do in embryos and in adults. 
This has been particularly informative as to how cells 
interact with each other in coordinating embryonic 
development. 

The earliest experiments employing chimeras were 
performed on amphibians. The great embryologist 
Hans Spemann demonstrated the existence of organizers, establishing that particular groups of cells interact with (induce) other cells, thereby regulating the formation of embryonic structures such as the neural plate (the precursor of the central nervous system) and the anterior-posterior axis of the embryo. Transplantation of the dorsal lip from one early embryo to
another induced the formation of a second embryonic 
axis in the recipient embryo. Similar experiments on 
other tissues, particularly in chickens, have demon- 
strated the existence of key groups of cells organizing 
neuronal distribution in the spinal cord, formation of 
the central nervous system, and the development of 
the limbs. 

Chimeras have been central to understanding cell 
lineages, i.e., the range or extent of different cell types 
formed from a precursor in chickens and mice. Elegant studies initiated by Nicole Le Douarin using chick-quail chimeras established the embryonic origins and development of the peripheral nervous system and of the craniofacial musculature, the origin and
migration of blood cells, and the development of the 
immune system including the thymus and bursa of 
Fabricius. Chimeras are also providing insight into the cellular and neurological basis of behavior.
Grafts of brainstem tissue between embryonic chicks 
and quails identified a region of the brainstem that 
determines what type of song the chimera will sing 
as well as the type of head movement that accom- 
panies the singing. 

Figure 1 A chimeric mouse embryo in which unmarked mouse ES cells were injected into a blastocyst expressing β-galactosidase as a marker, which colors the cells expressing it dark blue. The ES cells have formed almost the entire embryo proper (2) except for part of the gut, which is darkly labeled. The membranes (yolk sac) surrounding the embryo are entirely derived from the blastocyst, as they are all darkly labeled (1).


In mammals, the mouse is the principal species used 
in genetics and embryology. Early studies in mice were largely limited to creating chimeras from
preimplantation stages, as access to the later stages is 
complicated by their development in the uterus and 
their dependence on a placenta. Mouse chimeras made 
from preimplantation stages established that the 
trophoblast of the preimplantation embryo forms the 
placenta, whereas the inner cell mass (ICM) forms 
the embryo proper. Significant progress has, however, recently been made in analyzing later stages of development, owing to the advent of gene targeting, the use of mosaics (see below), and the fact that embryos between 7 and 10 days of age can now be cultured and manipulated in vitro. The latter technique has provided limited but significant information on cell
lineages at this critical time of mouse gastrulation, 
particularly with regard to the formation of the 
three germ layers and the way they organize to form 
the embryo. As in chickens, bone marrow transfer 
between adults or between embryo and adult mice 
has been central to understanding how the hemopoietic and immune systems are formed and the roles
of the different cell types that comprise both systems. 
Chimeras have been instrumental in determining the 
origin of germ cells, the cells that will ultimately form 
the eggs or sperm. Also, they revealed that in mam- 
mals phenotypic sex is determined by the genetic sex 
of the somatic cells in the gonads and not the genetic 
sex of the germ cells. 

Gene Targeting and Chimeras


At present the most extensive use of mammalian chi- 
meras is in the generation of so-called ‘knockout’ lines 
of mice. Embryonic stem (ES) cells, which are estab- 
lished from the ICM of the preimplantation embryo, 
can produce an entire mouse when injected into blasto- 
cysts. ES cells can also be selected in culture to carry 
a specific mutation introduced into a gene of interest. 
An ES clone, carrying a specific mutation, is then 
injected into recipient blastocysts where it integrates 
into the ICM and participates in embryonic develop- 
ment resulting in a chimera in which the ES cells form 
the gametes as well as somatic tissues. The adult chimeras are then bred to obtain offspring carrying the mutated gene in all their cells (Figure 2). The
heterozygotes are then intercrossed to produce homo- 
zygotes in which the mutation’s effect on gene func- 
tion can be studied in the context of the entire life 
cycle of the animal. This technology is well established in mice but has had very limited success with other mammalian species, primarily because it has been difficult to establish ES cells from these species, although human ES cells do exist. Legislation has forbidden the use of human ES cells for the
genetic manipulation of embryos. However there is 
currently much interest as to whether these cells could 
be used as a source for the derivation of other cell 
types, e.g., hemopoietic stem cells that could be of 
therapeutic use in blood transfusion. Nevertheless, 
mouse ES chimeras are revolutionizing mammalian 
genetics in understanding the function and require- 
ment of particular genes in all aspects of mammalian 
biology, including the generation of mouse models 
for human congenital diseases, such as cystic fibrosis, 
Alzheimer disease, muscular dystrophy, and inherited 
forms of cancer. 


Mosaics 


Another form of chimera is the mosaic, which is a 
composite individual derived from a single fertilized 
egg. In mammals all females can be described as 
‘mosaics’ since they are a mixture of cells, differing 
from each other by which X chromosome has been 
inactivated during embryogenesis. As an experimental 
tool mosaics have been of greater use in the study of 
the development of worms and flies, as well as plants. 
Mosaics are generated by the individual marking of 
cells by a dye or by the introduction of specific genes. 
They can also be derived by inducing a specific genetic 
alteration (e.g., chromosomal translocation) in a cell, 
with all the descendants of the cell subsequently in- 
heriting the chromosomal change. Mosaics have been 
used to study cell lineages as well as to determine what effect the genetic alteration has on the cells inheriting the alteration.

Figure 2 Three chimeric adult mice derived from the injection of embryonic cells from a black mouse into blastocysts derived from albino (white) mice. The mouse on the extreme left is entirely black, showing that it was derived entirely from the injected black cells. The other two mice show intermediate mixtures of black and white coloring in their hair, revealing that they are derived from both of the embryos used to make them.

In mammals, mosaics, through the use of gene targeting techniques, are becoming increasingly important for understanding the role of genes
in development. Cre-loxP technology makes it possible to inactivate a gene in a specific tissue or cell type or at a specific stage of development. In this technique a particular region of a gene of interest is flanked by two loxP sequences. The loxP sequences are short stretches of DNA that are recognized by the Cre recombinase enzyme, which recombines them and in doing so loops out and deletes the DNA that lies between the two loxP sites. The loxP sequences are inserted, for example into the gene’s introns, in such a way that they do not interfere with the gene’s normal function. When Cre recombinase is expressed in a specific tissue, e.g., the heart or pancreas, or at a particular time in development, it acts on the loxP sites, deleting the intervening gene sequences and so inactivating the gene in that tissue. As an example of the power of this technique, mice lacking the insulin receptor in all cells die shortly
after birth. However, mice carrying a ‘floxed’ insulin 
receptor gene are viable. If these mice are crossed with 
mice that only express Cre recombinase in the islet 
cells of the pancreas, these cells then specifically lose 
expression of the insulin receptor and the mice 
develop a form of diabetes. 
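The excision step can be illustrated with a toy string model of a ‘floxed’ allele. This is a sketch, not a design tool: the function name and example sequence are hypothetical; `LOXP` is the standard 34-bp loxP site; and only deletion between two sites in the same orientation is modeled (Cre inverts, rather than deletes, DNA flanked by loxP sites in opposite orientation, which this sketch ignores).

```python
LOXP = "ATAACTTCGTATAGCATACATTATACGAAGTTAT"  # 34-bp loxP site

def cre_excise(seq: str) -> str:
    """Toy model of Cre-mediated excision between two directly repeated loxP sites.

    The DNA between the two sites is looped out and deleted, leaving a
    single loxP site. Returns seq unchanged if fewer than two
    same-orientation loxP sites are present.
    """
    first = seq.find(LOXP)
    if first == -1:
        return seq
    second = seq.find(LOXP, first + len(LOXP))
    if second == -1:
        return seq
    return seq[:first] + LOXP + seq[second + len(LOXP):]

# Hypothetical floxed allele: flanking DNA, loxP, target exon, loxP, flanking DNA.
floxed = "AAAA" + LOXP + "GGGGGGGG" + LOXP + "CCCC"
print(cre_excise(floxed) == "AAAA" + LOXP + "CCCC")  # True
```

In the tissue-specific case described above, only cells expressing Cre would carry the excised allele; all other cells keep the intact floxed gene.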

Sequencing of the genomes of many higher organisms is nearing completion. The combination of gene targeting and the use of chimeras and mosaic analysis will go a long way toward understanding how genes interact and function in regulating the embryogenesis and life cycle of these organisms.


The term ‘chimeric’ originates from the Chimaera of
Greek mythology. The Chimaera was a monstrous 
creature with the head of a lion, body of a goat, and 
tail of a serpent. By analogy, chimeric proteins combine two or more individual proteins within the same physical body, a single polypeptide. They have proved extremely useful, and gene fusion techniques are now used in many areas of protein engineering. New applications of chimeric proteins are continuously being developed.


Gene Fusion 


Chimeric proteins are prepared by fusing the struc- 
tural genes of the proteins in question in a suitable 
expression vector. The translation stop codon at the 3’ terminus of the first gene is deleted, as is the promoter at the 5’ terminus of the second structural gene. The two genes are
then ligated in-frame and expressed in an appropriate 
host. The most frequently used hosts are bacteria such 
as Escherichia coli, but plant, mammalian, and insect 
cells have also been used. After transcription and 
translation, the cell produces a single polypeptide chain with the properties of both original gene products. The fusion can be made at either or both termini of a protein; whether one terminus is more favorable for the biological activity of the protein must be evaluated for each fusion construct. The DNA
molecules to be fused can be short, synthetic oligonu- 
cleotides or full-length structural genes. The increas- 
ing number of sequenced genes, in combination with 
PCR, provides many combinations for the possible 
fusion of structural genes from a variety of sources. 
The almost endless number of possible combinations
of fusion partners have turned this technique into a 
versatile and valuable tool within many areas of bio- 
chemistry and biotechnology. 
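The in-frame requirement described above can be checked at the level of coding sequence. In this sketch (function names and example sequences are hypothetical), each gene is given as a coding sequence from start to stop codon; the first gene's stop codon is removed so translation reads through, and the fusion is rejected if any stop codon remains in frame before the end. Promoter removal and vector cloning steps are not modeled.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def codons(orf: str) -> list[str]:
    """Split a coding sequence into consecutive triplets."""
    return [orf[i:i + 3] for i in range(0, len(orf), 3)]

def fuse_in_frame(gene1: str, gene2: str) -> str:
    """Join two coding sequences into one open reading frame."""
    gene1, gene2 = gene1.upper(), gene2.upper()
    for g in (gene1, gene2):
        if len(g) % 3 != 0:
            raise ValueError("each coding sequence must be a multiple of 3 nt long")
    # Delete the first gene's stop codon so translation reads through.
    if gene1 and codons(gene1)[-1] in STOP_CODONS:
        gene1 = gene1[:-3]
    fused = gene1 + gene2
    # Any stop codon before the last triplet would truncate the chimera.
    if any(c in STOP_CODONS for c in codons(fused)[:-1]):
        raise ValueError("in-frame stop codon before the end of the fusion")
    return fused

print(fuse_in_frame("ATGAAATAA", "ATGGGCTGA"))  # ATGAAAATGGGCTGA
```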


Linker Regions 


It is crucial to the design of a chimeric protein that 
upon fusion the individual units retain their ability to 
fold independently of the remainder of the polypeptide chain. Otherwise folding may result in inactive,
scrambled structures. There are many aspects to con- 
sider when designing and analyzing a linker region. 
The first requirement is to avoid proteolytic cleavage 
within the linker, which limits the amino acid composition of the linker. Pairs of basic amino acids (dibasic sites) are often targets of proteolytic cleavage, as are, for instance, repeats of glycine-glycine-X, where X is an
amino acid with a hydrophobic side chain. The linker 
region should not depend on the rest of the protein for 
its stabilization and conformation. Since the preferred 
secondary structure within the linker is a coil or a bent 
structure, the amino acid composition of the linker 
is further restricted. In naturally occurring chimeric 
proteins, bulky amino acids as well as hydrophobic 
residues are avoided in the linkers, with the exception 
of the smaller hydrophobic residues alanine and pro- 
line. Glycine, serine, and threonine are strongly pref- 
erred. Other common linker residues are asparagine, 
glutamine, and lysine. 
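The design rules above can be turned into a simple screening function for a candidate linker (one-letter amino acid codes). This is a hypothetical heuristic sketch, not a validated tool: the set of "hydrophobic" residues used for the glycine-glycine-X rule is an assumption, and the allowed set simply collects the residues the text names as preferred, common, or acceptable.

```python
PREFERRED = set("GST")          # strongly preferred in natural linkers
COMMON = set("NQK")             # other common linker residues
SMALL_HYDROPHOBIC_OK = set("AP")
BASIC = set("KR")               # dibasic pairs are protease targets
HYDROPHOBIC = set("VILMFW")     # assumed hydrophobic set for the G-G-X rule

def linker_warnings(linker: str) -> list[str]:
    """Return warnings for features the text says to avoid (1-based positions)."""
    linker = linker.upper()
    warnings = []
    for i in range(len(linker) - 1):
        if linker[i] in BASIC and linker[i + 1] in BASIC:
            warnings.append(f"dibasic pair {linker[i:i + 2]} at position {i + 1}")
    for i in range(len(linker) - 2):
        if linker[i] == "G" and linker[i + 1] == "G" and linker[i + 2] in HYDROPHOBIC:
            warnings.append(f"G-G-X motif {linker[i:i + 3]} at position {i + 1}")
    allowed = PREFERRED | COMMON | SMALL_HYDROPHOBIC_OK
    for i, aa in enumerate(linker):
        if aa not in allowed:
            warnings.append(f"disfavored residue {aa} at position {i + 1}")
    return warnings

print(linker_warnings("GGGGSGGGGS"))  # []
print(linker_warnings("GGSKKGGL"))
```

The widely used (Gly4Ser)n linker passes all three checks, whereas the second, hypothetical linker is flagged for a KK pair, a GGL motif, and a leucine.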


Areas of Application 


The design of synthetic chimeric enzymes has proven 
to be a valuable tool in protein engineering and in 
enzymology in the study of proximity effects between 
enzymes. In particular, if an enzymatic process is based
on consecutive steps, it can be convenient to put the 
catalytic centers of separate enzymes in close prox- 
imity to improve the overall kinetics and yield of the 
process. In nature, this has been accomplished by 
the evolution of multifunctional proteins and multienzyme complexes. Several metabolic pathways appear to be carried out by free enzymes or multienzyme systems in cell types that probably arose early in evolution, but by multifunctional enzymes or multienzymes in cells that arose later in evolution. This phenomenon has been explained by fusion of the genes coding for the separate enzymes into a single gene encoding a multifunctional polypeptide. The evolution of fatty acid synthases
and the enzymes involved in aromatic amino acid 
biosynthesis are considered to be examples of evolution by gene fusion. In most prokaryotic organisms and in plants, these activities reside in discrete, separate enzymes, whereas in other species various combinations of multifunctional enzymes are present.

There is great interest in developing new tech- 
niques for fast and simple purification of proteins. 
For industrial purposes, such methods can improve 
the overall economy of the production process. 
Genetic engineering has made it possible to create 
chimeric proteins between the target protein and an 
affinity tag. The tag makes it possible to purify the 
protein to near homogeneity from a crude biological 
mixture, often in a single affinity chromatography step. In recent years, the most commonly used system for affinity purification has been histidine tagging, which allows purification by immobilized metal ion affinity chromatography (IMAC). Histidine is a relatively rare amino acid in globular proteins (about 2%), and only about half of these residues are exposed on protein surfaces. A histidine tag gives the target protein a high affinity for chelated metal ions, which distinguishes it from most other proteins and makes it easy to isolate from contaminants. Histidine tags can also be
used to purify proteins under denaturing conditions, 
which can be useful in the recovery of proteins in 
inclusion bodies. 

Fusions with antibodies as one fusion partner 
have simplified the design of affinity-based analytical 
methods, e.g., enzyme-linked immunosorbent assay 
(ELISA) and Western blotting. Chimeric proteins are 
also valuable tools with which to create new drugs for 
targeted delivery to tumor cells. Recombinant immunotoxins are chimeric proteins in which an antibody or
fragment of an antibody with affinity to the target 
cells is fused to a toxin. 

A very important application within molecular biology is the construction of reporter molecules for monitoring gene expression and protein localization. One of the most commonly used reporter molecules is the structural gene of β-galactosidase (lacZ fusions). Another example is the green fluorescent protein, which is currently being widely exploited
as a reporter molecule in various gene expression 
studies. 

It has become possible to modify cell surfaces and 
to construct ‘display’ libraries on the surfaces of both 
bacteriophages and cells. These efforts have mainly 
been focused on the possibilities of creating peptide 
libraries. Chimeric proteins composed of peptides 
fused to membrane proteins have proved powerful for 
isolating, e.g., novel antigens and enzyme inhibitors. 


More than 450 species of the unicellular flagellated
photosynthetic algae are included in the genus
Chlamydomonas. Because of their small size, fast 
growth, and short sexual cycle, these organisms pro- 
vide unique possibilities for addressing important 
problems in cell and molecular biology, although only 
a few species have been used extensively for research.
Amongst these, Chlamydomonas reinhardtii has 
emerged as the organism of choice. It is a powerful 
model system for a large variety of biological prob- 
lems, including chloroplast biogenesis, photosyn- 
thesis, flagellar structure and function, gametogenesis 
and cell mating, cell-wall synthesis, phototaxis, and 
circadian rhythms. 


The Organism 


Cells of C. reinhardtii are oval-shaped, typically 10 μm
in length and 3 μm in width, with two flagella at
their anterior end (Figure 1). This alga contains several
mitochondria and a unique cup-shaped chloroplast 
containing the internal thylakoid membranes where 
the primary reactions of photosynthesis occur. The 
chloroplast occupies 40% of the cell volume and 
partly surrounds the nucleus. C. reinhardtii possesses 
a primitive visual system with the eyespot consisting 
of stacks of carotenoid-containing lipid granules that 
focus the incoming light on an overlying patch of 
plasma membrane where the photoreceptor is believed
to be localized. This photoreceptor is a rhodopsin-like
protein with an all-trans-retinal chromophore, and it
is closely related to photoreceptors of multicellular 
organisms. 


Haploid vegetative cells of C. reinhardtii can be 
propagated through mitotic divisions. These cells exist 
as mating type (+) or mating type (−), determined by
two structurally distinct alleles of the mating-type 
locus. Vegetative cells differentiate into gametes 
when they are starved of nitrogen. Gametes undergo 
several characteristic changes such as loss of ribo- 
somes, alteration of chloroplast morphology, starch 
accumulation, and reduced photosynthetic activity. 
Mixing of gametes of opposite mating type leads to a 
rapid agglutination of their flagella, a response which 
is mediated through gamete-specific glycoproteins, 
called agglutinins, which are associated with the fla- 
gellar membrane. The flagellar agglutination triggers a 
series of complex reactions which ultimately lead to 
the fusion of the gametic cells as well as their nuclei 
and chloroplasts, and to the maturation of the zygote 
into a thick-walled zygospore. After a maturation 
period, the latter undergoes meiosis and produces a 
tetrad consisting of four haploid daughter cells. Stable 
vegetative diploid cells can also be obtained after mat- 
ing. They divide mitotically and are useful for deter- 
mining whether a mutation is dominant or recessive. 
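The outcome of a cross can be sketched as a toy model of one tetrad. Nuclear alleles, including the mating-type locus, segregate 2:2 among the four meiotic products, while organelle genomes are usually inherited uniparentally (chloroplast from the mt(+) parent, mitochondria from the mt(−) parent, as noted for the three genetic systems of this alga). The marker names, the assumption that the nuclear marker is unlinked to the mating-type locus, and the strictly uniparental rule are all simplifications made for illustration.

```python
import random
from collections import Counter

def tetrad(plus_parent: dict, minus_parent: dict, rng: random.Random) -> list:
    """Toy model of the four haploid products of one C. reinhardtii cross.

    Each parent is a dict with 'nuclear', 'chloroplast' and 'mitochondrial' alleles.
    Assumptions: one nuclear marker unlinked to the mating-type locus; chloroplast
    genomes always from the mt(+) parent and mitochondrial genomes always from the
    mt(-) parent (the usual, not universal, pattern).
    """
    mating = ["+", "+", "-", "-"]  # the mating-type locus segregates 2:2
    nuclear = [plus_parent["nuclear"]] * 2 + [minus_parent["nuclear"]] * 2
    rng.shuffle(mating)   # independent shuffles model independent
    rng.shuffle(nuclear)  # assortment of the two unlinked loci
    return [
        {"mt": mt, "nuclear": allele,
         "chloroplast": plus_parent["chloroplast"],
         "mitochondrial": minus_parent["mitochondrial"]}
        for mt, allele in zip(mating, nuclear)
    ]

# Hypothetical parents with distinguishable alleles in each compartment.
plus = {"nuclear": "A", "chloroplast": "cp_plus", "mitochondrial": "mito_plus"}
minus = {"nuclear": "a", "chloroplast": "cp_minus", "mitochondrial": "mito_minus"}
spores = tetrad(plus, minus, random.Random(1))
print(Counter(s["nuclear"] for s in spores))  # always two A and two a
print({s["chloroplast"] for s in spores})     # only the mt(+) chloroplast allele
```

Every tetrad shows 2:2 segregation for the nuclear marker but 4:0 for the organelle markers, which is how uniparental transmission is recognized in tetrad analysis.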

Photosynthetic function is dispensable in C. reinhardtii
provided a carbon source such as acetate is
included in the growth medium. This property has 
been used extensively for isolating numerous mutants 
deficient in photosynthetic function. An important 
feature of this photosynthetic alga is that its chloro- 
phyll fluorescence patterns are highly sensitive to 
deficiencies in the photosynthetic electron transport 
chain. Fluorescence can thus be used as a noninvasive 
method to identify photosynthetic lesions in the 
mutants. Cells of C. reinhardtii can be grown under 
three distinct regimes: phototrophic growth, with CO₂
assimilated through photosynthesis as the sole carbon
source, heterotrophic growth in the dark with acetate, 
and mixotrophic growth in the light with acetate. 
In addition, cell division can be synchronized by
alternating cycles of 12 h light and 12 h dark. Another
important feature of C. reinhardtii is its ability to 
synthesize chlorophyll both in a light-dependent and 
light-independent manner. As many mutants deficient 
in photosynthesis are sensitive to light, they need to 
be grown in the dark. Under these conditions and in 
contrast to higher plants, C. reinhardtii is still able to 
assemble its photosynthetic apparatus. It is thus pos- 
sible to isolate photosynthetic complexes from light- 
sensitive mutant cells and to study their properties. 
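The three growth regimes, together with the behavior of the cytochrome b mutants described for the mitochondrial genome, can be summarized as a simple rule. A strain grows if at least one route for energy and carbon works. The sketch below is a toy model under stated assumptions. It reduces each strain to two yes/no properties, and it deliberately ignores the light sensitivity of many photosynthetic mutants discussed above.

```python
from dataclasses import dataclass
from itertools import product

@dataclass(frozen=True)
class Strain:
    name: str
    photosynthesis: bool  # functional photosynthetic electron transport
    respiration: bool     # functional mitochondrial respiration

def can_grow(strain: Strain, light: bool, acetate: bool) -> bool:
    """Toy rule: a strain grows if at least one energy/carbon route works.

    phototrophic route:  light + photosynthesis (CO2 as the sole carbon source)
    heterotrophic route: acetate + respiration (possible in the dark)
    Light plus acetate (mixotrophic growth) allows either route.
    """
    phototrophic = light and strain.photosynthesis
    heterotrophic = acetate and strain.respiration
    return phototrophic or heterotrophic

strains = [
    Strain("wild type", True, True),
    Strain("photosynthesis-deficient", False, True),
    Strain("respiration-deficient (e.g. cytochrome b)", True, False),
]
for s in strains:
    row = {f"light={l}, acetate={a}": can_grow(s, l, a)
           for l, a in product([True, False], repeat=2)}
    print(s.name, row)
```

Under this rule a photosynthesis-deficient mutant is rescued by acetate in the dark, while a respiration-deficient mutant still needs light, matching the two classes of mutants described in this entry.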


Three Genetic Systems 


As in other photosynthetic eukaryotes, C. reinhardtii
contains three genetic systems, located in the nucleus,
the chloroplast, and the mitochondria. Nuclear genes
are transmitted to the progeny according to Mendel’s
rules, whereas chloroplast and mitochondrial genes are
usually transmitted uniparentally from the mt(+) and
mt(−) parent, respectively. The complexity of the
nuclear genome of C. reinhardtii has been estimated
at 10⁵ kb. The genetic map is composed of 148 loci
distributed over 17 linkage groups. Approximately
240 molecular markers, including RFLP and STS
(sequence-tagged site) markers, have been mapped
to all linkage groups with an average spacing of
4-5 cM or 0.4-0.5 Mb. The nuclear DNA of
C. reinhardtii has a GC content of 62%, and genes that
are highly expressed display a strong codon bias. A
distinctive feature of these nuclear genes is the pres-
ence of multiple introns.

Figure 1 Section through a cell of Chlamydomonas
reinhardtii, showing cup-shaped chloroplast (c) with
thylakoid membranes, prominent nucleus (n), and
mitochondria (m). (Courtesy of U. Goodenough.)
(Reproduced with permission from Encyclopaedia of
Molecular Biology and Molecular Medicine (1996) 1: 347–360.
VCH Verlagsgesellschaft.)

The chloroplast DNA of C. reinhardtii consists of 
circular molecules of 196 kb. Although its sequence complexity
is modest, only about 0.2% of that of the total cell DNA,
the chloroplast DNA constitutes between 10 and 15% 
of the total cell DNA mass because it is present in 
80 copies per cell, which are organized into 8-10
nucleoids within the single chloroplast. In marked contrast
to nuclear DNA, the chloroplast DNA G + C content 
is only 37%, indicating that these genomes have differ-
ent origins. Most of the chloroplast genes
encode subunits of the photosynthetic complexes and
components of the chloroplast protein synthesizing 
system. The chloroplast gene expression system re- 
sembles that of prokaryotic organisms, e.g., chloro- 
plast ribosomes sediment at 70S and are sensitive to 
the same type of antibiotics as prokaryotic ribosomes. 
There are some striking differences, however. Tran- 
scription and translation, for example, are not coupled 
and some chloroplast genes have an unusual structure. 
Thus, psaA, encoding one of the photosynthetic re- 
action center subunits, consists of three exons that 
are dispersed on the chloroplast genome and that are 
transcribed independently. Maturation of the psaA 
mRNA requires two trans-splicing reactions. 
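The apparent paradox between low complexity and high mass is simple arithmetic. The sketch below uses the figures given in this entry: a 196-kb chloroplast genome, 80 copies per cell, and a nuclear genome of about 10⁵ kb. It assumes a single haploid nuclear genome per cell and ignores mitochondrial DNA, so it is only a rough check.

```python
def organelle_fractions(organelle_kb: float, copies: int, nuclear_kb: float):
    """Return (complexity_fraction, mass_fraction) for an organelle genome.

    complexity: one copy of the unique organelle sequence vs. all unique cell DNA
    mass: all organelle copies vs. organelle copies plus one haploid nuclear genome
    (mitochondrial DNA is ignored in this rough estimate).
    """
    complexity = organelle_kb / (organelle_kb + nuclear_kb)
    organelle_mass = organelle_kb * copies
    mass = organelle_mass / (organelle_mass + nuclear_kb)
    return complexity, mass

complexity, mass = organelle_fractions(196, 80, 1e5)
print(f"complexity ~{complexity:.1%}, mass ~{mass:.1%}")  # complexity ~0.2%, mass ~13.6%
```

The estimated mass fraction falls within the 10-15% range quoted above. The copy number, not the genome size, is what makes the chloroplast DNA so abundant.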

The mitochondrial genome of C. reinhardtii con-
sists of 15.8 kb linear DNA molecules. This DNA
encodes only eight proteins: cytochrome b, subunit I
of cytochrome oxidase, five subunits of NADH de- 
hydrogenase, and a protein with some resemblance to 
reverse transcriptase. It also contains the mitochon- 
drial ribosomal genes which are organized in an un- 
usual way: they are split into several smaller coding 
modules scattered over nearly half of the mitochon- 
drial genome and interspersed with protein and tRNA 
genes. Because only three tRNA genes are present 
in this mitochondrial genome, it is likely that the 
other tRNAs are nucleus- or chloroplast-encoded 
and imported into mitochondria. Mutants deficient 
in cytochrome b require light for growth and are 
unable to grow on acetate in the dark. 


Nuclear and Chloroplast Transformation 


The value of C. reinhardtii as a model system has been 
greatly increased by the development of efficient 
methods for nuclear and chloroplast transformation. In 
most cases nuclear transformation occurs through non- 
homologous recombination as the transforming DNA 
integrates at random sites of the nuclear genome. This 
property has been used successfully for tagging genes. 
Because of the high efficiency of nuclear transform-
ation, it has also been possible to isolate nuclear genes by
complementation of nuclear mutations with genomic
cosmid libraries. Chloroplast transformation can be 
achieved by bombarding cells with DNA-coated 
tungsten particles from a particle gun. The chloroplast 
aadA expression cassette, consisting of the bacterial
aadA gene (aminoglycoside adenylyltransferase) placed under
the control of chloroplast expression signals, has been used
widely as a selectable marker for spectinomycin resistance. In contrast to
nuclear transformation, chloroplast transformation 
occurs exclusively through homologous recombin- 
ation. It is thus possible to disrupt specific chloroplast 
genes or to perform site-directed mutagenesis on any 
chloroplast gene of interest. These features have been 
exploited for performing chloroplast reverse genetics 
and in particular, for elucidating the functions of spe-
cific plastid genes. Because the chloroplast genome is
present in many identical copies (about 80 per cell), several
rounds of cell cloning under constant selection are
required to reach a homoplasmic state, in which every copy
carries the transformed allele. However, disruptions of chloro-
plast genes that have an essential function under all
growth conditions never lead to a homoplasmic state.
In this case a stable heteroplasmic state, with both wild-type
and transformed copies, persists as long as the selective
pressure is applied. Persistent heteroplasmy has therefore
been used for identifying chloroplast genes with essential functions.
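The need for repeated cloning steps, and the failure of essential-gene disruptions to become homoplasmic, can be illustrated with a toy simulation. Only the copy number (about 80) comes from this entry. The following are all hypothetical simplifications: random partitioning of replicated copies at each division, survival under selection only with at least one transformed copy, the extra requirement of a wild-type copy when the gene is essential, and keeping the most transformed surviving daughter as a stand-in for picking a resistant subclone.

```python
import random
from typing import Optional, Tuple

def split_copies(n_transformed: int, n_copies: int, rng: random.Random) -> Tuple[int, int]:
    """Replicate every genome copy, then deal the 2N copies at random into two daughters of N.

    Returns the number of transformed copies in each daughter.
    """
    pool = [1] * (2 * n_transformed) + [0] * (2 * (n_copies - n_transformed))
    rng.shuffle(pool)
    return sum(pool[:n_copies]), sum(pool[n_copies:])

def rounds_to_homoplasmy(n_copies: int = 80, start: int = 1, essential: bool = False,
                         max_rounds: int = 500, seed: int = 0) -> Optional[int]:
    """Division rounds until every copy is transformed, or None if that never happens.

    Selection (assumed): a daughter survives only with >= 1 transformed copy; if the
    disrupted gene is essential it must also keep >= 1 wild-type copy. The surviving
    daughter with the most transformed copies is followed into the next round.
    """
    rng = random.Random(seed)
    n_t = start
    for r in range(1, max_rounds + 1):
        viable = [d for d in split_copies(n_t, n_copies, rng)
                  if d >= 1 and (not essential or d < n_copies)]
        if not viable:
            return None  # both daughters died: lineage lost
        n_t = max(viable)
        if n_t == n_copies:
            return r
    return None

print(rounds_to_homoplasmy(seed=1))                  # number of rounds; depends on the seed
print(rounds_to_homoplasmy(essential=True, seed=1))  # None: never homoplasmic
```

In this model homoplasmy is reached only gradually, because each division shifts the ratio of transformed to wild-type copies by a random amount. When wild-type copies are required for survival, the lineage stays heteroplasmic for as long as selection is applied.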


Chloroplast Biogenesis 


As in higher plants, the biosynthesis of the photosyn- 
thetic apparatus of C. reinhardtii occurs through the 
concerted action of the nuclear and chloroplast genetic 
systems. Subunits of the photosynthetic complexes are 
either encoded by the chloroplast genome and trans- 
lated on chloroplast 70S ribosomes or encoded by 
nuclear genes, translated on cytosolic ribosomes as pre-
cursors, in most cases with an N-terminal extension that
targets the proteins to the chloroplast. An extensive 
genetic analysis of mutants deficient in photosynthetic 
activity has revealed two major classes. The first 
includes mutations within the genes of the subunits 
of the photosynthetic complexes. The second includes 
mainly nuclear mutations that interfere with photo- 
synthesis indirectly by affecting chloroplast gene 
expression. These mutations mostly affect post-transcriptional
steps of chloroplast gene expression, such as RNA stability and
processing, splicing, translation, and the assembly
of photosynthetic complexes. The number of
these genes is surprisingly high and most of their 
products appear to act in a gene-specific manner. 
Although it is clear that the nucleus plays a crucial 
role in chloroplast gene expression, the state of the
chloroplast can also influence nuclear gene expression. 
Earlier studies with higher plants revealed that certain 
nuclear genes involved in photosynthesis are not 
expressed in plants containing defective plastids as a
result of plastid ribosome deficiency or of the absence
of carotenoids, which leads to photobleaching of the 
plastids in the light. Recent work in Chlamydomonas 
strongly suggests that some of the plastid-derived 
factors involved in this chloroplast–nuclear crosstalk
are the chlorophyll precursor Mg-protoporphyrin IX
methyl ester and its immediate precursor.


Flagellar Assembly and Function 


The flagellar system of C. reinhardtii has proven to 
be particularly well suited for the study of micro- 
tubule assembly and function, and motility. The rea- 
son is that flagellar biosynthesis can be readily 
synchronized and numerous mutants affected in the 
function and assembly of the flagellar apparatus have 
been isolated. Both mutants with abnormal or no 
motility and those deficient in flagellar assembly 
have been characterized. Besides the major flagellar α
and β tubulins, as many as 250-300 distinct polypep-
tides can be resolved in the flagella. Analysis of many
paralyzed mutants has revealed deficiencies in sets of 
polypeptides corresponding to distinct flagellar pro- 
tein complexes. Because flagellar structure has been 
conserved throughout evolution, results obtained 
with Chlamydomonas are relevant for understanding 
human diseases. These include primary ciliary dyski- 
nesia, which affects cilia motility; polycystic kidney 
disease, some forms of which involve a defect in assem- 
bly of the primary cilia; and retinitis pigmentosa, which 
causes blindness due to retinal degeneration and 
involves a defect in transport of proteins through the 
connecting cilium of the photoreceptor cells. Several of 
the Chlamydomonas flagellar proteins are remarkably 
similar to human proteins associated with some of 
these diseases. It is thus apparent that the use of C. 
reinhardtii as a model system is not restricted to photo- 
synthesis and chloroplast biogenesis, but can also be
extended to the study of human diseases
associated with flagellar or ciliary dysfunction. 
Photosynthesizers convert carbon dioxide into sugar
using light energy captured by the green pigment, 
chlorophyll. They comprise most of the biomass on 
Earth and are at the base of almost all ecological com-
munities, those at deep-sea hydrothermal vents being a notable exception.
Photosynthesizers are found in both major div-
isions of organisms, the prokaryotes and the eukary-
otes, which are distinguished by the lack of a nucleus and
other cell compartments in the former and the pres-
ence of a nucleus and compartments such as mito-
chondria and chloroplasts in the latter. Chloroplasts
are the site where photosynthesis occurs in eukary- 
otes, in particular in plants and in algae. An evolu- 
tionary link between prokaryotes and eukaryotes is 
that chloroplasts are former prokaryotic (cyanobac- 
terial) symbionts, acquired about 2.5 billion years
ago by an ancestral eukaryote. Chloroplasts are now 
well-integrated, permanent residents of their hosts, 
although they still retain some of their prokaryotic 
characteristics. These include a genome with the semi-
autonomous capabilities of replication, transcription,
and translation. 

In general, photosynthetic symbioses are quite 
common, perhaps owing to the obvious advantages of
acquiring a food-generating partner. It appears that
chloroplasts (sometimes generically called plastids) 
evolved several times. Among the earliest extant 
lineages of photosynthesizers are the euglenoids. 
Green algae (Chlorophytes, including Chlamydo- 
monas) seem to have acquired their chloroplasts later 
and subsequently some of this lineage gave rise to 
plants. 


Chloroplast Genomes 


Compared to their cyanobacterial ancestors, chloro-
plasts have lost most of their genes. Algal and plant
chloroplasts have only a few hundred kilobases of 
DNA in circular genomes present in multiple copies 
with about 100 genes. Parasitic plants that have 
secondarily lost the ability to photosynthesize have 
even smaller chloroplast genomes as in Epifagus with 
70 kb of DNA and 42 genes. An even greater loss
of genes is found unexpectedly in the remnant 
chloroplasts (apicoplasts) of apicomplexans, which are obligate
parasites. For example, Plasmodium, which causes
malaria, has a plastid genome of 35 kb. Sequence analyses
suggest that the apicomplexans evolved from photo- 
synthetic dinoflagellates. For additional details on loss 
of genes see Mitochondria and Symbionts, Genetics of. 


Shared Coding 


Complete loss of redundant and extraneous genes 
seems to occur easily in close symbioses, perhaps 
because a streamlining of the genome confers a repli-
cative advantage. In addition, intracellular horizontal
transfer of genes may occur, facilitated by the prox- 
imity of the chloroplast, mitochondrial, and nuclear 
genes. The direction of transfer is strongly biased 
toward the nucleus, although chloroplast to mito- 
chondria transfers are also noted. A result of hori- 
zontal transfer is a shared coding for some essential 
chloroplast structures including the ribosomes. This 
makes the relationships even more obligate among the 
various genomes of eukaryotic cells. For additional 
details see Mitochondria and Symbionts, Genetics of. 


Variations on Genetic Code and Editing 


Unlike mitochondria, chloroplasts seem to adhere to
the standard genetic code, at least among those that have been
examined so far. However, some chloroplast tran-
scripts do undergo editing of their mRNA, in par-
ticular, conversions of C to U. The purpose of editing,
convoluted as it is, appears to be a means of regu-
lating gene expression and modifying transcripts. For more details
on editing see Mitochondria, RNA Editing in Plants. 
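A C-to-U change alters the codon that contains it. One outcome reported for some plant chloroplast transcripts is the conversion of an ACG codon into an AUG start codon. The sketch below applies C-to-U edits at given positions and reads the result codon by codon. The transcript and the editing site are hypothetical and serve only to show the mechanics.

```python
from typing import List

def edit_c_to_u(mrna: str, sites: List[int]) -> str:
    """Return a copy of an mRNA with C converted to U at the given 0-based sites.

    Raises ValueError if an editing site does not carry a C.
    """
    bases = list(mrna.upper())
    for i in sites:
        if bases[i] != "C":
            raise ValueError(f"site {i} is {bases[i]!r}, not C")
        bases[i] = "U"
    return "".join(bases)

def codons(mrna: str) -> List[str]:
    """Split an mRNA into consecutive triplets from position 0 (trailing bases dropped)."""
    return [mrna[i:i + 3] for i in range(0, len(mrna) - len(mrna) % 3, 3)]

# Hypothetical transcript whose first codon is ACG; editing site 1 creates an AUG start.
pre = "ACGGCUUCAUAA"
post = edit_c_to_u(pre, [1])
print(codons(pre))   # ['ACG', 'GCU', 'UCA', 'UAA']
print(codons(post))  # ['AUG', 'GCU', 'UCA', 'UAA']
```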


Recombination of Chloroplast DNA 


A wide range of chloroplast gene mutations, including those
affecting antibiotic sensitivity and pigmentation, may be
observed to recombine in those algae and plants in 
which gametes are of similar size. For example, Chla- 
mydomonas has been frequently used to demonstrate 
recombination. 


Maternal Inheritance 


There is considerable variation in the plants and algae 
with respect to gamete size. In some cases maternal
gametes (ova) are much larger than the paternal ones 
(pollen) and contribute entirely or almost entirely to 
the chloroplasts of the zygote. This means that mater- 
nal inheritance of chloroplast mutations can occur 
in some plants. Often such inheritance is manifested
by variegation, as in chloroplast mutants that fail to
produce chlorophyll, yielding a splotchy phenotype 
of green and colorless areas on the plant. Complete- 
ly colorless plants generally fail to reproduce, so a 
mixed population of chloroplasts is more likely to 
be inherited. 




See also: Mitochondria; RNA Editing in Plants; 
Symbionts, Genetics of 


Christmas Disease 


F Gianelli 


Copyright © 2001 Academic Press 
doi: 10.1006/rwgn.2001.0196


Christmas disease is the name given to the type of 
hemophilia caused by deficiency of coagulation factor 
IX. The term originates from the surname (Christmas) 
of the first patient found to suffer from this type of 
hemophilia. Christmas disease is synonymous with 
hemophilia B. 


See also: Hemophilia 
A chromatid is one of the replicated copies of a chromo-
some. Identical sister chromatids are produced as a
result of DNA replication. In contrast, homologous 
chromosomes derive from either the mother or the 
father of the organism, and although they contain the 
same set of genes, they usually have genetic differences. 
Sister chromatids are physically attached all along their 
lengths and particularly at the centromeres. This cohe- 
sion between the chromatids is established while the 
DNA is being replicated and is mediated by several 
proteins, some of which constitute the cohesin com- 
plex. Sister chromatid cohesion is essential for the 
movement of the chromosomes to the metaphase
plate and their proper segregation to daughter cells. In
mitosis, sister chromatids are segregated to different 
daughter cells. In prophase, the chromosomes begin to 
condense, and only then can the sister chromatids be 
distinguished from each other cytologically. At meta- 
phase, the paired sister chromatids are aligned on the 
metaphase plate. At the metaphase–anaphase transi-
tion, the cohesion between the sister chromatids is
released, and they are pulled to opposite sides of the 
dividing cell. During telophase, as the chromatids
decondense, each former sister chromatid is con-
sidered a ‘chromosome’ of its daughter cell.

In meiosis, sister chromatids segregate together as
one unit in the first meiotic division, moving
to the same cell, and are then separated from each other
in the second meiotic division. Early in meiosis, homo- 
logous chromosomes are paired and recombination 
between nonsister chromatids occurs. This exchange 
produces recombinant sister chromatids such that 
chromatid arms distal to the exchange point are derived from the
homolog. At the metaphase I–anaphase I transition,
homologous chromosomes are segregated and each 
pair of sister chromatids moves to the same pole of 
the cell. While cohesion is lost along the arms of the 
sister chromatids at this transition, the chromatids 
remain joined at the centromere until the metaphase
II–anaphase II transition, when centromeric cohesion
is lost and the chromatids proceed to opposite poles. 

In both mitosis and meiosis, the loss of cohesion is 
mediated by the cleavage of one of the cohesin com- 
plex subunit proteins. In mitosis, the cohesion is lost 
simultaneously along the arms and at the centromere. 
However, in meiosis, cohesion is lost sequentially, first 
along the arms in the first division and then at the 
centromeres in the second. The proper timing of these 
events is of critical importance to the cell. Missegrega- 
tion of chromosomes can lead to aneuploidy, chromo- 
some breakage, or chromosome loss. 

Recombination between sister chromatids occurs 
frequently in mitosis as a mechanism of DNA repair. 
Evidence for mitotic sister chromatid exchange has 
come from both cytological and genetic observations. 
If the thymidine analog bromodeoxyuridine (BrdU) is 
added to the growth medium of cells for two rounds 
of DNA replication, the DNA strands of each sister 
chromatid can be easily distinguished under a micro- 
scope after appropriate staining because one sister 
chromatid has both DNA strands labeled with BrdU 
while the other sister has only one strand labeled. 
These are called harlequin chromosomes. Occasion-
ally, an exchange between sister chromatids is seen
as a reciprocal switch in the BrdU labeling pattern.
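The labeling pattern follows directly from semiconservative replication: each parental strand is kept and paired with a newly synthesized strand that incorporates BrdU. The sketch below tracks only strand labels ("T" for an original thymidine-containing strand, "B" for a BrdU-substituted one) and follows one daughter chromosome from cycle to cycle. Staining behavior is not modeled.

```python
from typing import List, Tuple

Duplex = Tuple[str, str]  # labels of the two DNA strands: "T" = thymidine, "B" = BrdU

def replicate(duplex: Duplex, new_label: str = "B") -> List[Duplex]:
    """Semiconservative replication: each parental strand pairs with a new BrdU strand."""
    return [(strand, new_label) for strand in duplex]

def sisters_after(rounds: int) -> List[Duplex]:
    """Sister chromatids of one chromosome after `rounds` replications in BrdU."""
    if rounds < 1:
        raise ValueError("need at least one round of replication")
    duplex: Duplex = ("T", "T")  # chromosome before BrdU is added
    for _ in range(rounds):
        sisters = replicate(duplex)
        duplex = sisters[0]  # follow one daughter chromosome into the next cycle
    return sisters

for n in (1, 2):
    print(n, [s.count("B") for s in sisters_after(n)])
# 1 [1, 1]  -> both sisters carry one BrdU strand
# 2 [1, 2]  -> one sister has one BrdU strand, the other has two
```

After one round the two sisters are indistinguishable. Only after the second round do they differ, which is why two rounds of BrdU incorporation are needed to see harlequin chromosomes.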

Recombination between sister chromatids can be 
scored only when genetic markers are tandemly re- 
peated. If recombination occurs ‘unequally’ between 
repeats that are out of register, one product contains an
increase in the number of repeats and the other a 
decrease. This has been used in yeast to monitor 
sister chromatid recombination. In higher eukaryotes, 
unequal crossing-over often happens between areas 
of repeated sequence, and gene duplications, deletions, 
and loss of heterozygosity can result. One well- 
studied example of unequal crossing-over is the visual
pigment (opsin) gene cluster on the human X chromosome.
The red and green photoreceptor genes are located in
a tandem array and are extremely similar in their
sequence. Unequal crossing-over between them can delete genes
or produce hybrid genes that lead to color blindness, and it is the most
frequent cause of red-green color blindness in humans.
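The gain and loss of repeats can be shown by treating a tandem array as a list of units and exchanging segments at a boundary in each array. The array below (one red gene followed by two green genes) is hypothetical, and the model treats whole genes as units. Exchanges that fall within genes and produce hybrid genes are not modeled.

```python
from typing import List, Tuple

def unequal_crossover(a: List[str], b: List[str], i: int, j: int) -> Tuple[List[str], List[str]]:
    """Reciprocal exchange after repeat unit i of array a and unit j of array b.

    With i == j the exchange is in register and both products keep their length;
    with i != j one product gains (i - j) units and the other loses them.
    """
    return a[:i] + b[j:], b[:j] + a[i:]

# Hypothetical array: one red (R) gene followed by two green (G) genes.
array = ["R", "G", "G"]
longer, shorter = unequal_crossover(array, array, 2, 1)
print(longer)   # ['R', 'G', 'G', 'G']
print(shorter)  # ['R', 'G']
```

Because the products are reciprocal, the total number of units is conserved. An out-of-register exchange therefore always yields one expanded and one contracted array.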

Recombination occurs in the early stages of meio- 
sis, but genetic exchange between nonsister chromat- 
ids is favored over recombination between sister 
chromatids. It is crucial that recombination occurs 
between homologous chromosomes to ensure their 
proper segregation in meiosis I. This appears to be 
accomplished by repressing recombination between 
sister chromatids during meiosis so that recombin- 
ation between homologous chromosomes will occur. 
The mechanism for this bias is not understood. 


See also: Cell Division Genetics; Chromosome; 
Meiosis; Mitosis; Unequal Crossing-Over 
In meiosis, chromosomes are replicated prior to the
onset of recombination, so that each chromosome is 
composed of a pair of chromatids (sisters). These rep- 
licated chromosomes pair with their homologs, creating 
a bivalent, and the chromatids undergo exchange. 
Exchange between nonsister chromatids can result 
in crossing-over. If the chromatid pair that crosses over 
in one interval has no influence on which pair of non- 
sister chromatids is involved when crossing over occurs 
in a linked interval, chromatid interference is absent. 
Chromatid interference is detected when 2-chromatid,
3-chromatid, and 4-chromatid double crossovers depart
from the 1:2:1 ratio expected from randomness.
Negative chromatid interference implies the ten- 
dency for two exchanges occurring in the same bi- 
valent to involve the same nonsister pair of chromatids 
and is manifested as an excess of 2-chromatid double 
crossovers. Positive chromatid interference is the 
tendency for two exchanges in the same bivalent to 
involve different pairs of nonsister chromatids and is
manifested as a paucity of 2-chromatid double cross- 
overs. When chromatid interference is reported, it is 
usually weak and negative. 
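The 1:2:1 expectation comes from counting. Label the sister chromatids a1 and a2 on one homolog and b1 and b2 on the other. Each crossover then involves one of four nonsister pairs. If the second exchange picks its pair independently of the first, the 16 combinations fall into 4 two-chromatid, 8 three-chromatid, and 4 four-chromatid doubles. The sketch below enumerates these classes and computes a Pearson chi-square statistic against them. The observed counts in the usage lines are hypothetical.

```python
from collections import Counter
from itertools import product

# Chromatids in a bivalent: sisters a1, a2 on one homolog; b1, b2 on the other.
NONSISTER_PAIRS = [(x, y) for x in ("a1", "a2") for y in ("b1", "b2")]

def chromatids_involved(first, second) -> int:
    """2, 3 or 4: number of distinct chromatids used by two crossovers."""
    return len(set(first) | set(second))

def expected_counts() -> Counter:
    """Classes of double crossovers when each exchange picks a nonsister pair at random."""
    return Counter(chromatids_involved(f, s) for f, s in product(NONSISTER_PAIRS, repeat=2))

def chi_square(observed: dict) -> float:
    """Pearson chi-square of observed 2-, 3-, 4-chromatid doubles against 1:2:1 (2 df)."""
    exp = expected_counts()
    n_obs, n_exp = sum(observed.values()), sum(exp.values())
    stat = 0.0
    for k, e in exp.items():
        expected = n_obs * e / n_exp
        stat += (observed.get(k, 0) - expected) ** 2 / expected
    return stat

print(sorted(expected_counts().items()))  # [(2, 4), (3, 8), (4, 4)] -> 1:2:1
print(chi_square({2: 40, 3: 40, 4: 20}))  # 12.0 for these hypothetical counts
```

With 2 degrees of freedom, the 5% critical value is about 5.99. The hypothetical excess of 2-chromatid doubles above would therefore be read as negative chromatid interference.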


See also: Coincidence, Coefficient of; 
Interference, Genetic; Negative Interference; 
Tetrad Analysis 
Composition and Structure


Chromatin is composed of a cell’s DNA and asso-
ciated proteins. Histone proteins and DNA are
found in approximately equal mass in eukaryotic 
chromatin, and nonhistone proteins are also in great 
abundance. The basic unit of organization of chro- 
matin is the nucleosome, a structure of DNA and 
histone proteins that repeats itself throughout an 
organism’s genetic material. 

Histones are highly conserved basic proteins, 
whose positively charged character helps them to 
bind the negatively charged phosphate backbone of 
DNA. There are five histone proteins in the family: 
H1, H2A, H2B, H3, and H4. Two H3 and two H4 
proteins form a tetramer, which combines with two 
H2A/H2B dimers to form the disk-shaped histone 
core. Approximately 150 bp of DNA wrap around
this protein structure almost twice to make a nucleo- 
some core particle. With linker histone (e.g., histone 
H1) and linker DNA, this is called the nucleosome. 
The linker DNA can vary in length, usually be-
tween 10 and 90 bp, depending on the species,
gene activity, developmental stage, and other factors. 
The nucleosome repeats approximately every 200 bp 
and is close to 10 nm in diameter. The X-ray crystal 
structure of the nucleosome core particle was determined
in 1997. The core histones each have a central fold, 
which lies within the DNA, and an unstructured N- 
terminal tail, which protrudes outside the core. The 
tail extensions of histone H3 in particular are very 
long and are held in place by nucleosome-nucleosome 
interactions. 
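The dimensions above allow some quick arithmetic. The repeat length is the wrapped DNA plus the linker, so linkers of 10-90 bp give repeats of roughly 160-240 bp. A 200-bp repeat means about 5000 nucleosomes per megabase. The sketch below also gives a crude packing ratio for the 10-nm fiber. It assumes 0.34 nm of B-form DNA per base pair and treats each repeat as packed into one particle about 10 nm across; both are simplifying assumptions.

```python
BP_RISE_NM = 0.34  # assumed length of B-form DNA per base pair, in nm

def repeat_length(core_bp: int = 150, linker_bp: int = 50) -> int:
    """Nucleosome repeat length = DNA wrapped on the core + linker DNA."""
    return core_bp + linker_bp

def nucleosome_count(dna_bp: int, repeat_bp: int = 200) -> int:
    """Approximate number of nucleosomes on a stretch of DNA."""
    return dna_bp // repeat_bp

def beads_on_string_compaction(repeat_bp: int = 200, particle_nm: float = 10.0) -> float:
    """Crude packing ratio if each repeat's DNA were packed into one ~10 nm particle."""
    return repeat_bp * BP_RISE_NM / particle_nm

print(repeat_length(150, 10), repeat_length(150, 90))  # 160 240
print(nucleosome_count(1_000_000))                     # 5000 per megabase
print(round(beads_on_string_compaction(), 1))          # 6.8
```

A packing ratio of only about sevenfold is far too small to fit a genome into a nucleus. This is why the higher-order folding described next is needed.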

The nucleosome is the most basic unit of structure 
of chromatin, but the chromatin is even further organ- 
ized by folding into a higher-order structure. Early 
evidence for this came from the observation that
in vitro, at low salt concentrations, the overall
chromatin structure unfolds and the nucleosomes
resemble ‘beads on a string.’ In addition, in solutions
with salt concentrations comparable to physio-
logical conditions, chromatin is usually seen as a
thicker structure (the 30-nm fiber) than would be expected
of a simple string of nucleosomes. In the ‘solenoid’ model of
chromatin folding, each nucleosome associates with
one H1 protein, and the chain of nucleosomes is coiled into a
helix with about six nucleosomes per turn (see Figure 1). The structure
of the nucleosome suggests that interactions between
histone tails and nucleosomes may also play a role in 
the coiling of chromatin fibers. During cell division, 
further compaction of DNA occurs when the chro- 
matin is condensed into chromosomes in prophase. As 
it is very difficult to organize and move large amounts 
of chromatin fiber, this condensation is necessary for 
the cell to be able to properly segregate chromosomes 
in mitosis and meiosis. Though all of the proteins 
necessary for this process have not been identified, 
the condensin complex is a group of proteins that is 
essential for the proper condensation of chromo-
somes. It is thought that compaction may involve
the 30-nm fiber forming loops that extend from
a proteinaceous scaffold composed of nonhistone pro-
teins, though there may be even more complex mech-
anisms of condensation (see Figure 1 for a model).


Function 


Chromatin structure plays an important role in con- 
trolling gene expression and replication. The packaging 
of DNA into nucleosomes forms a ‘closed’ structure 
that is not very accessible to enzymes that perform 
replication, transcription, and DNA repair. This struc- 
ture is generally transcriptionally repressive, allowing 
only a basal level of gene expression. In a disrupted, 
‘open’ nucleosome structure, the DNA is more acces- 
sible to replication and transcription factors. 

In transcription, some activators and repressors 
interact with RNA polymerases to change the chro- 
matin structure and modulate gene activity. Activators 
can help to disrupt nucleosome structure and thereby 
stimulate the assembly of RNA polymerase and tran- 
scription factors at the promoter. For replication, 
a similar modulation of chromatin structure must 
occur to allow the replication machinery to be pos- 
itioned at the origins of replication. 

The structure of chromatin can also have long- 
range effects on gene expression. In a phenomenon 
termed ‘position effect variegation,’ genes located 
near silent heterochromatic regions can also be made 
transcriptionally inactive. Genes as far as 1000 kb
away can be silenced. Because the exact areas that are 
repressed vary from cell to cell, this is an epigenetic 
phenomenon that produces variegation in phenotype. 
It is generally thought that the highly condensed
nature of heterochromatin prevents access by tran-
scription factors, but how this can affect neighboring,
nonheterochromatic regions is not fully understood.
While it is accepted that proteins found in the hetero-
chromatin can ‘spread’ to adjoining regions and im-
part a similar repressive effect, another possibility
is that the heterochromatin may be grouped into com-
partments of the nucleus that are inaccessible to tran-
scription factors.

Figure 1 Model of chromatin packing into higher-
order structures. DNA is assembled into the ‘beads-on-
a-string’ conformation by wrapping around histones to
form nucleosomes, which can then be packed into a 30-
nm fiber. The fibers coil to form a chromosome. In
metaphase, the chromosomes condense even more
extensively. Reproduced from Alberts B et al.
(1994) Molecular Biology of the Cell, 3rd edn. New York:
Garland Publishing. (Permission from Elsevier Science.)

Chromatin structure can also affect DNA replica- 
tion on a global level. For example, heterochromatin 
and other silent areas of the genome replicate late in 
S-phase, but why these silent regions replicate
late is unknown. One possibility is a
specific repressive chromatin structure that can be 
overcome to allow origin firing late in S-phase. 
Modification


Both protein complexes and small organic molecules modulate the state of chromatin in the cell. Large chromatin-remodeling complexes use the energy from ATP hydrolysis to destabilize and reposition nucleosomes. These complexes are highly conserved in many eukaryotic organisms (see Table 1). They all include a helicase-like subunit that has a DNA-dependent ATPase activity. The SWI/SNF family of chromatin-remodeling complexes are very large protein complexes with many subunits. They are thought to help activate transcription by disrupting nucleosome structure, for example by displacing the histone octamer to neighboring DNA. The ISWI family of remodeling complexes are smaller in mass and generally have fewer subunits. Unlike the SWI/SNF complexes, they are thought to promote gene expression by sliding nucleosomes along the DNA, opening up a local area.

Another example of a chromatin-remodeling complex is the Polycomb (Pc) group in Drosophila. This complex is a negative regulator of transcription, acting to repress homeotic genes during development. The Pc complex is believed either to tighten chromatin structure, inducing a heterochromatin-like state, or to coat the chromatin, thus preventing access by transcription factors. The Pc group is also needed for position effect variegation.

Posttranslational modifications also exert significant effects on chromatin activity (see Figure 2). Acetylation of the N-terminal tails of core histones is the best-studied histone modification. The addition and removal of acetyl groups has considerable influence on gene regulation. This connection is supported by several observations. First, some transcriptional activators are histone acetyltransferases (HATs), while some transcriptional repressors are, or recruit, histone deacetylases (HDACs). Second, in Saccharomyces cerevisiae, regions with hyperacetylated histones (in particular, H3 and H4) tend to be transcriptionally active, while areas with hypoacetylated histones are mostly silent. In mammals, the inactive X chromosome has little acetylation of histone H4.

The addition of an acetyl group to a lysine residue neutralizes that residue's positive charge, thereby weakening the interaction between the histone and the negatively charged DNA. This loss of stability in the nucleosome probably facilitates access to the DNA by transcription factors. The crystal structure of the nucleosome showed that the N-terminal tails of histones mediate nucleosome-nucleosome interactions, and acetylation of these tails is predicted to disrupt these interactions and thereby open the chromatin structure. Since multiple lysine residues are often modified, it is suspected that these hyperacetylated histones could affect global chromatin structure, e.g., by destabilizing the 30-nm fiber.

Table 1 Chromatin-remodeling complexes.

Complex      Organism          ATPase      Mass (MDa)   No. of subunits
SWI/SNF family
SWI/SNF      S. cerevisiae     Swi2/Snf2   2            11
RSC          S. cerevisiae     Sth1        1            15
Brahma       D. melanogaster   brahma      2            nd
hSWI/SNF     H. sapiens        hBRM        2            ~10
hSWI/SNF     H. sapiens        BRG1        2            ~10
NRD          H. sapiens        CHD4        1.5          18
ISWI family
ISW1         S. cerevisiae     ISW1        0.4          4
ISW2         S. cerevisiae     ISW2        0.3          2
NURF         D. melanogaster   ISWI        0.5          4
CHRAC        D. melanogaster   ISWI        0.7          5
ACF          D. melanogaster   ISWI        0.2          4
RSF          H. sapiens        hISWI       0.5          2

nd, not determined.
(Reprinted from Kornberg and Lorch, 1999. Twenty-five years of the nucleosome, fundamental particle of the eukaryote chromosome. Cell 98: 285-294; with permission from Elsevier Science.)

Figure 2 Schematic of modification of chromatin histones (H2A, H2B, H3, H4). The sites of acetylation (Ac), phosphorylation (P), methylation (Me), and ubiquitination (Ub) of the core histones are diagrammed. Acetylation, methylation, and phosphorylation occur on the N-terminal tails of the histones. With the exception of phosphorylation of serine residues (S), modifications are of lysine residues (K). (Adapted from Spencer VA and Davie JR (1999) Role of covalent modifications of histones in regulating gene expression. Gene 240: 1-12; with permission from Elsevier Science.)

Phosphorylation is another modification seen on histone tails. In particular, the phosphorylation level of N-terminal serines of H1 and H3 changes with mitotic and meiotic stages, appearing late in G2, reaching its peak in metaphase, and disappearing at anaphase. This correlates with the timing of chromosome condensation, and phosphorylation of histone H3 is required for proper chromosome condensation and segregation.

Methylation occurs on histone lysines, beginning after nucleosome assembly and peaking in mitosis. Recent work has shown that methylation at a particular lysine in H3 is required for proper cell division.

ADP-ribosylation and ubiquitination of histones have also been observed, but their effects are not yet well understood.

Modification of the DNA itself has profound effects on chromatin structure and gene expression. In many eukaryotes, methyl groups are added to the cytosine residue in a CG doublet. Silent genes are often methylated, while active genes are usually not methylated. Since methylation is frequently found at the 5′ ends of genes, this modification probably induces some sort of silent chromatin state that prevents access by RNA polymerase. One example of genes silenced by methylation is the globin gene cluster in adult chicken erythroid cells. In mammals, the inactive X chromosome in females is silenced by methylation. Studies on methylated genes have shown that methylation patterns are heritable, and models have been proposed for how such a state could be propagated.


Replication 


The exact mechanism of how chromatin is replicated is not yet clear, but important observations have been made on certain aspects of this process. Most chromatin assembly occurs during S-phase, so that nucleosomes are assembled soon after DNA replication. Only a very small region of DNA is perturbed at a time as replication occurs: only two nucleosomes in front of the replication fork are disturbed, and fewer than 300 bp behind the fork are without nucleosomes. The first nucleosome behind the fork is almost complete (lacking only H1), and fully assembled nucleosomes are found 450-650 bp behind the fork. As the replication fork approaches, the histone octamers disassemble into H2A/H2B dimers and (H3/H4)₂ tetramers. The formation of the new nucleosome occurs in several stages. The tetramer of histones H3 and H4 is deposited onto the newly replicated DNA first, with the help of chromatin assembly factor-1 (CAF-1); this step is replication-dependent. Interestingly, H3 and H4 are acetylated on specific lysines when they are first deposited, and are deacetylated after they are incorporated. The H2A/H2B dimers are then deposited in a replication-independent process that may involve NAP-1 (nucleosome assembly protein-1). H1 is the last protein added, and the new nucleosome is made up of both old and new histone proteins.

The overall state of the chromatin is preserved after replication. For instance, regions that are silent because of position effect variegation are maintained. It has been hypothesized that the modification state of the chromatin can be the epigenetic mark that is passed on to new chromatin. During cell division in females, the same X chromosome remains inactive in the daughter cells, faithfully maintained through its methylation state. More work is needed to determine whether the modification state of histones may also carry gene-expression information through replication.

At prophase of mitosis and meiosis in plants and animals, condensing chromosomes display a beaded, granular appearance. These beads are termed chromomeres. Chromomeres can also be seen in polytene interphase chromosomes of dipteran insects.

Chromomeres vary considerably in size, but provide a constant pattern on any given chromosome. The term chromomere has been applied inconsistently in different organisms to include chromatin of variable composition. In mammals, the pattern of chromomeres at pachytene of meiosis is very similar to that of the dark bands on somatic chromosomes obtained by staining with Giemsa following trypsin treatment. Such dark G-bands tend to be AT-rich and gene-poor regions of the genome. This implies that the chromomeres visible at meiosis have a similar constitution. Somewhat in contrast, chromomeres are present at the bases of transcription loops of lampbrush chromosomes of urodeles. This would indicate that in these organisms, chromomeres form in chromatin segments that are gene-rich. The significance of chromomeres, in terms of chromosome structure and function, remains a matter of debate.

Chromosomes are the physical carriers of genes, consisting of DNA and associated proteins. Bacteria typically have one circular chromosome, while eukaryotes usually have linear chromosomes and vary widely in the sizes and numbers of their chromosomes. All chromosomes have the capacity to transmit genes faithfully during cell division. The structures that allow eukaryotic chromosomes to do this are replication origins; telomeres, which protect the chromosome ends; and centromeres, which provide sites for microtubule attachment and microtubule motor binding in cell division. Chromosomes must also be able to condense during cell division so that they can align on the spindle and be moved to the spindle poles. In metazoans, chromosomes can be most broadly categorized into autosomes and sex chromosomes (chromosomes that have a part in determining the sex of the organism).

The term ‘chromosome’ was coined in 1888 by Waldeyer, who used an aniline dye and saw ‘colored bodies’ under the microscope. Bridges, in 1916, was the first to prove that genes are carried by the physical chromosomes, in his experiments with fruit flies (Drosophila). By doing genetic crosses and following a sex-linked marker for eye color, Bridges hypothesized that the rare fruit flies with unexpected phenotypes were the result of chromosome nondisjunction. By studying the chromosomes of these fruit flies under the microscope, he was able to confirm that they had abnormal numbers of sex chromosomes and that the gene responsible for eye color was present on the X chromosome. The order of genes on chromosomes was determined by linkage studies, in which genetic crosses were performed and the frequency of recombination between markers on the chromosome was calculated. The order of genes determined by these genetic crosses corresponded to the physical order of genes on chromosomes.
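The mapping logic described above can be made concrete with a short calculation. The following Python sketch is illustrative only: the marker names (A, B, C) and progeny counts are hypothetical, recombination frequency is taken as recombinant progeny divided by total progeny, and the three markers are ordered on the assumption that frequencies are roughly additive, so the pair that recombines most often must be the two outer markers.

```python
def recombination_frequency(recombinants: int, total: int) -> float:
    """Fraction of progeny in which a crossover separated two markers."""
    if total <= 0:
        raise ValueError("total progeny must be positive")
    return recombinants / total


def order_three_markers(rf: dict[frozenset, float]) -> list[str]:
    """Put the most frequently recombining pair at the ends; the remaining
    marker must lie between them (assumes approximate additivity)."""
    markers = sorted(set().union(*rf))
    outer = max(rf, key=rf.get)  # pair with the largest frequency
    middle = next(m for m in markers if m not in outer)
    left, right = sorted(outer)
    return [left, middle, right]


# Hypothetical recombinant counts out of 1000 progeny for each marker pair.
counts = {("A", "B"): 80, ("B", "C"): 120, ("A", "C"): 190}
rf = {frozenset(pair): recombination_frequency(n, 1000) for pair, n in counts.items()}

for pair, value in rf.items():
    print(sorted(pair), f"{value:.1%}")
print("inferred order:", order_three_markers(rf))
```

With these invented numbers the A-C frequency (19%) is slightly lower than the sum of A-B and B-C (20%); a shortfall of this kind is expected when some double crossovers between the outer markers go undetected.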


The chromosomes of eukaryotic cells may be differentiated by size, the position of the centromere, and banding patterns made by stains that have various affinities for certain kinds of DNA sequences. Organisms vary widely in the size of their genomes and in the number and size of the chromosomes that carry their genetic information. Even two species in the same genus can show extensive diversity. The Chinese muntjac deer (Muntiacus muntjac reevesi) has 46 chromosomes, while the Indian muntjac (M. muntjac vaginalis) has only six chromosomes in the female and seven in the male. Salamanders of the genus Plethodon can have 19.5 × 10⁹ bp (P. cinereus) or 67.6 × 10⁹ bp (P. vandykei), but in spite of a more than threefold difference in the number of base pairs, both species have 28 chromosomes.

Chromosomes can only be visualized when they are condensed during the cell cycle. For most of the cell cycle (interphase), one can see only a tangle of chromatin in the nucleus. In order for the chromosomes to be accurately partitioned during cell division, the chromatin must condense into a more tightly compacted form. As cells enter prophase, the chromatin begins to condense into rod-like structures that become fully formed in prometaphase. At metaphase, the chromosomes are aligned on the metaphase plate, and the two sister chromatids are segregated at the metaphase-anaphase transition. The chromosomes decondense back to the diffuse form in telophase. In many organisms, prophase chromosomes have been observed to take on an arrangement in the nucleus called the Rabl orientation, in which the centromeres of the chromosomes are at one end of the nucleus while the telomeres are oriented toward the other end. This develops as a result of the chromosome arrangement during the previous anaphase, where the centromeres are the leading part of the chromosome in moving to the spindle pole. In some organisms, early meiotic chromosomes can also arrange themselves into a ‘bouquet formation,’ where the chromosome ends cluster together at the nuclear membrane.

There are several types of special chromosomes. Polytene chromosomes, found in some insect tissues and in giant trophoblast cells of mammals, are formed as a result of endoreplication: chromosomes are replicated two or more times without intervening mitoses, producing many copies of tightly paired, replicated chromosomes. Transcription of these chromosomes can be a mechanism for a cell to produce proteins in large quantities. Double Minutes are unstable chromosome-like structures composed of amplified genes. They do not have telomeres or centromeres and segregate randomly at mitosis. Double Minutes have only been observed in cancers or in cell lines that have developed resistance to drugs. Some organisms carry extra chromosomes, known as B chromosomes or supernumerary chromosomes. They have been found in many different plants and animals (but not in humans) and have been most extensively studied in maize and grasshoppers. B chromosomes are largely heterochromatic and dispensable for the organism. Surprisingly, B chromosomes seem to affect the organism only when they are present at high copy numbers, where they can influence viability and recombination frequency.


See also: C-Value Paradox; Centromere; 
Chromatid; Double-Minute Chromosomes; 
Linkage; Linkage Map; Meiosis; Mitosis; 
Telomeres 

Changes in the genetic material which cause phenotypic abnormality involve mutation, duplication, deletion, or rearrangement of DNA. The extent of the change varies from the gain or loss of a single nucleotide (a point mutation) to gain or loss of whole chromosomes. Those changes that are large enough to be visible under the light microscope are usually termed ‘chromosome aberrations.’ They differ from molecular aberrations involving individual genes only in terms of scale. In practice, it would be unusual to identify under the microscope a chromosomal deletion or duplication that involves less than 4 × 10⁶ bp. The causes of structural chromosome aberrations may be expected to be similar to the causes of other types of genetic mutation.

The identification of a chromosome aberration requires a source of dividing cells. In human chromosome analysis, a small heparinized sample of venous blood cultured for 48-72 h in the presence of a mitotic stimulant (phytohemagglutinin) provides sufficient numbers of dividing T lymphocytes. Mitosis is arrested by colchicine, and the accumulated metaphases are prepared for analysis by treatment with hypotonic solution, which separates the chromosomes from one another, before fixation in acetic alcohol. A few drops of the suspension of mitotic cells in fixative are dropped onto a microscope slide and allowed to dry in air; this causes the chromosomes in each metaphase to spread in one optical plane in a state suitable for microscopic analysis. Mild denaturation by enzyme or heat treatment, followed by Giemsa staining, produces characteristic banding patterns along the chromosomes, allowing the unequivocal identification of each individual chromosome. The same basic technique is used for chromosome preparations made from any active cell culture, bone marrow, or other mitotically active material, including cancer cells. Similar preparations of meiotic chromosomes from adult testicular material or from fetal ovarian material permit the study of chromosome behavior during gametogenesis.

It is convenient to classify chromosome aberrations into ‘numerical aberrations,’ where the somatic cells contain an abnormal number of normal chromosomes, and ‘structural aberrations,’ where the somatic cells contain one or more abnormal chromosomes. The former are the result of errors of cell division, while the latter involve breakage and reunion of DNA. The various types of aberration occur in all species, but, for convenience, examples are drawn from human chromosome aberrations, as human cytogenetics has provided the most extensive experience.


Numerical Chromosome Aberrations 


Human somatic cells normally contain 46 chromosomes (the diploid number, 2n). Mature sperm and eggs have 23 chromosomes (the haploid number, n), i.e., one member of each chromosome pair. The diploid number is thus reconstituted at fertilization. Cells with a chromosome number that is an exact multiple of the haploid number, and exceeds the diploid number, are polyploid (see Polyploidy).
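The distinction between polyploidy and aneuploidy rests on simple arithmetic with the haploid number. As a minimal sketch (the function name is hypothetical, and only the total chromosome count is considered, so structural rearrangements and mosaicism are ignored), a count can be classified by whether it is an exact multiple of n = 23:

```python
HAPLOID_HUMAN = 23  # haploid number (n) for humans, as given in the text

PLOIDY_NAMES = {1: "haploid", 2: "diploid", 3: "triploid", 4: "tetraploid"}


def classify_karyotype_count(count: int, n: int = HAPLOID_HUMAN) -> str:
    """Classify a chromosome count relative to the haploid number n.

    Simplified: only the total count is used, so structural
    rearrangements and mosaicism are not detected.
    """
    if count <= 0 or n <= 0:
        raise ValueError("counts must be positive")
    multiple, remainder = divmod(count, n)
    if remainder != 0:
        # Not an exact multiple of n: an irregular number of chromosomes.
        return "aneuploid"
    name = PLOIDY_NAMES.get(multiple, f"{multiple}n")
    # Exact multiples that exceed the diploid number are polyploid.
    return f"polyploid ({name})" if multiple > 2 else name


for count in (46, 69, 92, 47, 45):
    print(count, classify_karyotype_count(count))
```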

Triploidy (3n) is the most common form of polyploidy associated with phenotypic abnormality (see Triploidy). It results from the fertilization of an egg by two sperm (dispermy) or from failure of one of the maturation divisions of the egg (digyny). It is estimated that some 2% of all human conceptions are triploid, but most are lost early in pregnancy and only a few survive to term. The affected fetus is small, with a trunk that is disproportionately small relative to the head, and has multiple malformations, including syndactyly (fusion of digits III and IV). When the extra chromosome set comes from the father, the placenta is large and shows hydatidiform change; this is due to genomic imprinting, as it does not occur when the extra chromosome set is maternal. A similar finding occurs in hydatidiform mole. This is an abnormal conception in which there is no embryo and the chorionic villi contain no vasculature and become greatly swollen. Chromosome analysis reveals a 46,XX karyotype, but both chromosome sets are derived from the father. This can be explained by degeneration of the female pronucleus and diploidization of the male pronucleus, but other explanations are also possible.

Tetraploidy (4n) is a normal finding in many somatic tissues, including regenerating liver and bone marrow. Tetraploid cells arise by endomitotic reduplication, i.e., the chromosomes divide twice and the cell divides only once. Tetraploid embryos occur as a result of failure to complete the first cleavage division, but this is invariably followed by abortion in the first trimester.

Aneuploidy is the term used to describe an irregular number of normal chromosomes in somatic cells. It arises from failure of paired chromosomes to separate from one another during first meiosis, or of sister chromatids to separate from one another during second meiosis or during somatic mitosis after fertilization. The result may be the production of an embryo with three chromosomes of a particular pair (trisomy) or only one (monosomy). Embryos with both members of a chromosome pair missing (nullisomy) are inviable.
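The consequences of nondisjunction at the two meiotic divisions can be tallied for a single chromosome pair. The sketch below is a simplified model with hypothetical function names: it follows only one chromosome, assumes every meiotic product could be fertilized by a normal gamete (in female meiosis only one product actually becomes the egg), and ignores anaphase lag and postzygotic errors.

```python
from collections import Counter
from typing import Optional


def gametes_from_meiosis(nondisjunction: Optional[str]) -> list[int]:
    """Copies of one chromosome carried by each of the four meiotic products.

    nondisjunction is None (normal meiosis), "MI" or "MII".
    """
    if nondisjunction is None:
        # Homologs separate at MI, sister chromatids at MII: one copy each.
        return [1, 1, 1, 1]
    if nondisjunction == "MI":
        # Both homologs go to one secondary cell; its sisters then separate normally.
        return [2, 2, 0, 0]
    if nondisjunction == "MII":
        # MI is normal; in one secondary cell the sister chromatids fail to separate.
        return [2, 0, 1, 1]
    raise ValueError("expected None, 'MI' or 'MII'")


def zygote_outcome(copies_in_gamete: int) -> str:
    # Fertilization by a normal gamete contributes one further copy.
    total = copies_in_gamete + 1
    return {1: "monosomy", 2: "disomy", 3: "trisomy"}[total]


for event in (None, "MI", "MII"):
    tally = Counter(zygote_outcome(g) for g in gametes_from_meiosis(event))
    print(event, dict(tally))
```

Under these assumptions, failure at first meiosis yields only trisomic or monosomic zygotes, whereas failure at second meiosis in one secondary cell leaves half of the products normal.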

Failure of chromosomes or chromatids to separate during cell division is referred to as ‘nondisjunction.’ Failure to pair during first meiosis is ‘nonconjunction.’ Delayed movement of a chromosome at anaphase may result in it being incorporated into the wrong daughter cell or excluded from either. All these mechanisms result in embryos having one or more normal chromosomes missing or extra. If the abnormal event occurs after fertilization, cell lines with different chromosome complements may persist in the embryo and/or extraembryonic fetal membranes. Chromosome analysis may then detect more than one cell line, with normal and/or abnormal numbers of chromosomes; this is termed ‘chromosomal mosaicism.’

While the mechanisms which lead to aneuploidy 
are understood, the causes remain uncertain. Meiotic 
nondisjunction occurs with increased frequency with 
increasing maternal age, suggesting some degenerative 
changes in the meiotic spindle apparatus. 

Trisomy 21 is the best known of the human aneuploidies. It leads to Down syndrome (see Down Syndrome), which occurs in approximately 1 in 700 births. It accounts for about 30% of moderate and severe learning difficulties in children of school age. Affected patients are short, and dysmorphic features include oblique palpebral fissures, speckled irides, and midface hypoplasia with a small nose and relatively large tongue. The skull is brachycephalic and the ears are small and low-set. The fifth finger is short and incurved, and there may be a transverse palmar crease. Congenital heart disease is common, and other features may include duodenal atresia, epilepsy, leukemia, and presenile dementia. Young parents of children with trisomy 21 have a risk of recurrence of trisomy 21 of about 1.5%. Many couples will seek the reassurance of prenatal diagnosis in subsequent pregnancies. Maternal screening by ultrasound and biochemical tests can help to determine the risk of an affected pregnancy (see Prenatal Diagnosis).

Trisomy 18 (see Trisomy 18) and trisomy 13 (see Patau Syndrome) are less common forms of aneuploidy. Both cause severe physical and mental handicap, and most affected infants do not survive the neonatal period. Each has characteristic dysmorphic features which permit a clinical diagnosis, and the incidence at birth increases with maternal age.

Most other human autosomes can be involved in trisomic conceptions, but many of these are anembryonic and all are inviable. The rare exceptions are chromosomal mosaics, where the survival of the trisomic cell line is supported by the presence of a normal cell line. In other types of trisomic conception, return to the disomic state is achieved by ‘trisomic rescue,’ in which one of the trisomic chromosomes is lost. If this results in the two remaining chromosomes having the same parental origin, the term ‘uniparental disomy’ (UPD) is used (see Uniparental Inheritance). The importance of this relates to the phenomenon of parental genomic imprinting, in which specific genetic loci are inactivated during either male or female gametogenesis. UPD may therefore result in both alleles being inactivated at one or more loci, with consequent abnormal effects on the phenotype. Chromosomes 7, 11, 14, and 15 are known to be affected by imprinting, whereas others such as chromosomes 13 and 21 are not. The best-known example is chromosome 15, in which maternal UPD occurs in 30% of patients with Prader-Willi syndrome, and paternal UPD accounts for 5% of cases of Angelman syndrome.
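The link between trisomic rescue and UPD can be shown by simple counting. In this sketch the parental labels ("M", "P") and the function name are hypothetical, and it assumes, purely for illustration, that each of the three homologs is equally likely to be the one lost; real chromosome loss need not be random. Under that assumption, a trisomy carrying two copies from one parent leaves both remaining chromosomes from that parent in one rescue event out of three.

```python
from fractions import Fraction


def rescue_outcomes(trisomy: tuple[str, str, str]) -> dict[str, Fraction]:
    """Probability of each disomic outcome when one of the three homologs
    is lost at random (an illustrative assumption, not a measured rate)."""
    outcomes: dict[str, Fraction] = {}
    for lost in range(3):
        remaining = [parent for i, parent in enumerate(trisomy) if i != lost]
        kind = "UPD" if remaining[0] == remaining[1] else "biparental"
        outcomes[kind] = outcomes.get(kind, Fraction(0)) + Fraction(1, 3)
    return outcomes


# Trisomy after maternal nondisjunction: two maternal copies, one paternal copy.
print(rescue_outcomes(("M", "M", "P")))
```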

Aneuploidy for the sex chromosomes is not associated with such severe disability as is found in autosomal aneuploidy. 47,XXY Klinefelter syndrome (see Klinefelter Syndrome) occurs in approximately 1 in 1000 male births, and the main feature is infertility due to primary hypogonadism. 47,XXX is the equivalent condition in women, who are not infertile but may have learning difficulties. Similarly, males with a 47,XYY complement are usually fertile and asymptomatic; they tend to be 4-5 cm taller than average.

The only viable human monosomy is 45,X Turner syndrome (see Turner Syndrome). However, it is estimated that 98% of such conceptions are inviable and that placental mosaicism may explain the occurrence of most of the survivors. The incidence at birth is approximately 1 in 5000 female births. Short stature and sexual infantilism are the main findings, together with complex dysmorphic features, including webbed neck and congenital heart disease.


Structural Chromosome Aberrations 


In essence, structural chromosome aberrations are the result of chromosome breakage and abnormal reunion of broken chromosomes. They can be produced experimentally by exposing active cells to mutagens such as ionizing radiation. However, spontaneous structural rearrangements in both somatic and germ cells arise from errors of recombination. Meiotic recombination is preceded by synapsis of homologous chromosomes, which involves the recognition by one homolog of complementary sequences in the other homolog. Mismatching can occur in this process, particularly at chromosomal sites containing tandem repeats of DNA sequences. This may result in duplication or deletion of the DNA at such sites. Similarly, synapsis between homologous sites on nonhomologous chromosomes may lead to accidental recombination between nonhomologous chromosomes, thereby leading to the transfer of chromosomal segments from one chromosome to another. These rearrangements are termed ‘translocations.’

Recombination also occurs between homologous chromosomes in somatic cells, and occasionally examples of pairing and chromatid exchange can be observed in routine chromosome preparations. However, the main evidence comes from studies of DNA markers in neoplasia, in which individuals heterozygous at a number of gene loci on a chromosome pair have tumors homozygous at the same loci on the same pair of chromosomes.

Chromosome analysis of cell cultures exposed to irradiation before DNA synthesis reveals that, when a chromosome breaks, two unstable ends are produced. DNA repair mechanisms within the cell usually ensure that the two ends are rejoined. However, when there is more than one break, the correct ends may not be rejoined and abnormal chromosomes may result. Various combinations can occur, including acentric fragments, ring chromosomes, translocations, and multicentric chromosomes. When breaks are induced at stages in the cell cycle during or after DNA synthesis, chromatid aberrations occur, in which chromatids rather than whole chromosomes are seen to be involved in exchanges. These studies show that somatic cells are capable of DNA repair and also that the ends of chromosomes are unstable unless they possess an organized telomere. Constitutional terminal deletions must arise in such a way as to retain a functional telomere: some result from reciprocal translocations, others from interstitial deletions; in others a terminal deletion extends down the chromosome until a DNA region homologous to telomeric sequences is reached. At this point a new telomere is synthesized by the enzyme telomerase.

Many structural chromosome rearrangements do not result in gain or loss of DNA. They are referred to as ‘balanced rearrangements’ and usually cause no phenotypic effect. Sometimes, however, the DNA is disrupted at the point of rearrangement, leading to clinical abnormality. Most rearrangements of this type arise as de novo events and are not transmitted to offspring.

Balanced chromosomal rearrangements may be inherited without obvious effect through many generations. However, abnormal (unequal) segregation (see Adjacent/Alternate Disjunction) of the rearranged chromosomes during meiosis may result in unbalanced gametes containing one or other of the structurally abnormal chromosomes. If these gametes take part in fertilization, embryos with unbalanced chromosome complements will result. Usually this results in miscarriage or stillbirth, but sometimes a live-born infant is delivered with varying degrees of developmental abnormality, depending on the extent of the chromosomal imbalance. Karyotype-phenotype correlations in many such cases have led to the characterization of a large number of clinically distinguishable chromosomal syndromes, each associated with imbalance of a different chromosome region.

The following paragraphs define the various types of structural chromosome aberration that are encountered in diagnostic cytogenetic laboratories. Detailed phenotypic descriptions of specific chromosomal syndromes are outside the scope of this section, and reference should be made to standard medical genetics texts.

A reciprocal translocation arises from an exchange of fragments between the ends of nonhomologous chromosomes. A quadrivalent is formed during meiosis, and the various possibilities of alternate and adjacent disjunction of the translocation derivatives into pregametic cells are considered under meiotic segregation (see Meiosis).

The smaller the exchange, the more likely is the 
occurrence of a viable unbalanced embryo with 
developmental abnormality; large imbalances tend to 
be inviable. 
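Whether each possible 2:2 segregation from a reciprocal translocation quadrivalent is balanced can be checked by simple bookkeeping of chromosome segments. This Python sketch uses hypothetical segment labels (1a, 1b, 2a, 2b, with each centromere in the "a" segment), considers only 2:2 segregation (3:1 segregation is ignored), and follows the usual convention that adjacent-1 sends nonhomologous centromeres to the same pole while adjacent-2 sends homologous centromeres together.

```python
from collections import Counter
from itertools import combinations

# Hypothetical segments: chromosome 1 = 1a + 1b, chromosome 2 = 2a + 2b.
# The centromere lies in the "a" segment; the exchange is between "a" and "b".
QUADRIVALENT = {
    "N1": ("1a", "1b"),  # normal chromosome 1
    "N2": ("2a", "2b"),  # normal chromosome 2
    "D1": ("1a", "2b"),  # derivative carrying the chromosome 1 centromere
    "D2": ("2a", "1b"),  # derivative carrying the chromosome 2 centromere
}
ALL_SEGMENTS = Counter(["1a", "1b", "2a", "2b"])


def classify(pair: tuple[str, str]) -> str:
    """Name the 2:2 segregation mode that sends this pair to one pole."""
    if set(pair) in ({"N1", "N2"}, {"D1", "D2"}):
        return "alternate"
    # First character of the centromeric segment identifies its chromosome.
    centromeres = {QUADRIVALENT[c][0][0] for c in pair}
    return "adjacent-1" if len(centromeres) == 2 else "adjacent-2"


def is_balanced(pair: tuple[str, str]) -> bool:
    """True if every segment is present exactly once in the gamete."""
    segments = Counter(s for c in pair for s in QUADRIVALENT[c])
    return segments == ALL_SEGMENTS


for pair in combinations(QUADRIVALENT, 2):
    status = "balanced" if is_balanced(pair) else "unbalanced"
    print(pair, classify(pair), status)
```

Only the two alternate combinations carry each segment exactly once; every adjacent combination duplicates one segment and lacks another, which is the source of the unbalanced embryos described above.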

A Robertsonian (centric fusion) translocation is one which occurs between the short arms of acrocentric chromosomes, e.g., human chromosomes 13, 14, 15, 21, and 22. The exchange occurs within regions of repetitive DNA, and the result is a large, dicentric chromosome and a very small acentric fragment containing the ribosomal genes of both chromosomes. The acentric fragment is readily eliminated from the cell, and so balanced carriers of Robertsonian translocations typically have only 45 chromosomes and show no phenotypic effects of chromosomal loss, as there is sufficient redundancy of ribosomal genes at other loci. Carriers of Robertsonian translocations occur in about 1 in 500 births. The most important types are 45,t(13q;14q) and 45,t(14q;21q), as these can lead to unbalanced offspring with translocation trisomy 13 (Patau syndrome) and translocation trisomy 21 (Down syndrome), respectively.

Insertional translocations may occur within a chromosome or between two chromosomes. Three breakpoints are required: two to provide the interstitial deletion of a chromosome fragment and one to allow the insertion of the fragment into another site. Segregation of the two translocation derivatives during meiosis may lead to unbalanced offspring with either a deletion or a duplication of the inserted fragment.

Most deletions and duplications are the result of unbalanced translocations. Ring chromosomes arise when breaks occur at both ends of a chromosome, with reunion of the proximal ends and loss of the distal telomeric fragments. They can be associated with substantial terminal deletions of DNA. Ring chromosomes are a common feature of irradiated cells. They may occur as part of a constitutional abnormality but are seldom inherited unless very small. Some are unstable, and sister chromatid exchange within the ring may lead to double-sized dicentric rings. The instability of the dicentric ring may lead to further changes, and these may be associated with more extensive phenotypic abnormality.

An isochromosome is a metacentric chromosome in which both arms are genetically identical (see Isochromosome). It most often arises by an isochromatid break and fusion of the sister chromatids above the centromere. Thus most isochromosomes are dicentric, although only one centromere is active and the isochromosome segregates normally during cell division. Human isochromosomes occur mostly as sex chromosome abnormalities and are particularly associated with Turner syndrome (see Turner Syndrome).

Inversions are intrachromosomal aberrations which result from two breaks with inversion of the intervening segment through 180°. There are essentially two types: pericentric inversions, in which the centromere is included within the inversion, and paracentric inversions, in which the centromere is outside the inverted segment. An inversion alters the order of gene loci within a chromosome, and this in itself usually has no phenotypic effect. However, inversions interfere with synapsis of homologs during meiosis, and an inversion loop may form in order to achieve synapsis. Crossing-over within the loop of a paracentric inversion leads to dicentric and acentric recombinants. Crossing-over within a pericentric inversion leads to monocentric recombinants with duplication and deletion of the chromosome segments distal to the breakpoints of the inversion. The closer the inversion breakpoints are to the ends of the chromosome, the smaller the imbalance.
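The bookkeeping of a single crossover inside an inversion loop can be sketched as a toy Python model. It rests on stated assumptions: chromosomes are lists of hypothetical, uniquely named markers with "CEN" marking the centromere, the crossover falls between two adjacent markers that both lie inside the inverted segment, and the function names are illustrative only. Counting centromeres in the two products reproduces the outcomes described above.

```python
from typing import List, Tuple

Chromatid = List[str]


def crossover_in_loop(normal: Chromatid, inverted: Chromatid,
                      left: str, right: str) -> Tuple[Chromatid, Chromatid]:
    """Return the two recombinant chromatids from one crossover between the
    adjacent markers `left` and `right` (named in normal order), both of
    which must lie inside the inverted segment."""
    i_left, i_right = normal.index(left), normal.index(right)
    if i_right != i_left + 1:
        raise ValueError("markers must be adjacent on the normal chromosome")
    j_left, j_right = inverted.index(left), inverted.index(right)
    if j_left != j_right + 1:
        # Outside an inversion the two markers keep their normal order.
        raise ValueError("markers must both lie inside the inverted segment")
    # Normal chromosome up to the exchange point, then the inverted homolog
    # read backwards from the same point (the loop reverses its direction).
    first = normal[: i_left + 1] + inverted[: j_right + 1][::-1]
    # Reciprocal product: the rest of the inverted homolog, read backwards,
    # joined to the rest of the normal chromosome.
    second = inverted[j_left:][::-1] + normal[i_right:]
    return first, second


def centromere_count(chromatid: Chromatid) -> int:
    return chromatid.count("CEN")


if __name__ == "__main__":
    # Paracentric inversion of C-D-E; crossover between D and E.
    para_normal = ["A", "CEN", "B", "C", "D", "E", "F"]
    para_inverted = ["A", "CEN", "B", "E", "D", "C", "F"]
    for product in crossover_in_loop(para_normal, para_inverted, "D", "E"):
        print(product, "centromeres:", centromere_count(product))

    # Pericentric inversion of B-CEN-C; crossover between B and CEN.
    peri_normal = ["A", "B", "CEN", "C", "D", "E"]
    peri_inverted = ["A", "C", "CEN", "B", "D", "E"]
    for product in crossover_in_loop(peri_normal, peri_inverted, "B", "CEN"):
        print(product, "centromeres:", centromere_count(product))
```

In the paracentric case the products are the dicentric A-CEN-B-C-D-E-B-CEN-A and the acentric F-C-D-E-F; in the pericentric case both products are monocentric, one duplicated for A and deleted for D-E, the other duplicated for D-E and deleted for A.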


Definition and History


Chromosome banding is the “lengthwise variation in 
staining properties along a chromosome... normally 
independent of any immediately obvious structural 
variation,” and thus excludes patterns such as those 
seen on polytene chromosomes of Drosophila, which 
have a morphological component. Although the first 
observations of what could be called chromosome 
banding were made at the end of the nineteenth cen- 
tury, modern chromosome banding methods date 
from 1968 and can be applied to chromosomes of a 
wide variety of species with no more than slight modi- 
fications. Following the introduction of Q-banding 
by Caspersson and his colleagues in 1968, Pardue 
and Gall inadvertently produced differential staining 
of heterochromatin in their pioneering in situ hybri- 
dization studies, leading directly to C-banding, and in 
1971 G-banding was discovered by several authors. R- 
banding was also introduced in 1971. Over the 
next few years, many other banding techniques, too 
numerous to mention individually, were introduced, 
many of them using fluorochromes. Silver staining for 
nucleolus organizing regions (NORs) was introduced 
in 1975, methods to show chromosome replication 
were invented, and the use of autoimmune sera to 
label kinetochores immunocytochemically was dis- 
covered. 


Classification of Chromosome Bands 
Four classes of bands can be recognized: 


1. Heterochromatic bands are demonstrated by C-banding techniques, as well as by various methods of fluorochrome staining, and correspond to classically defined constitutive heterochromatin, that is, regions of chromosomes that normally remain condensed throughout interphase, and are generally found as blocks around centromeres, and sometimes terminally or interstitially on chromosomes. Facultative heterochromatin, such as the inactive X chromosome in female mammals, is not stained specifically by banding methods.

2. Euchromatic bands form a pattern of alternating positively and negatively stained (or fluorescent) bands throughout the length of the chromosomes, and are demonstrated by methods such as G-banding, R-banding, Q-banding, and by certain fluorochromes.

3. Nucleolus organizer regions are the segments of chromosomes that contain the genes for ribosomal RNA, and which give rise to the interphase nucleoli. They can be stained with silver (Ag-NOR staining).

4. Kinetochores are the centromeric structures through which mitotic and meiotic chromosomes are attached to the spindle microtubules, and are generally labeled using autoimmune CREST sera.

Applications of Chromosome Banding


The most important application of banding is in the 
identification of individual chromosomes. Euchro- 
matic banding techniques, especially G-banding, are 
ideal, but euchromatic bands are essentially restricted 
to higher vertebrates. However, even in organisms 
that lack euchromatic bands, distinctive patterns of 
heterochromatic banding can often be used to distin- 
guish between chromosomes. In humans, G-banding 
is used to identify chromosome abnormalities and 
rearrangements in genetic diseases and cancers. Band- 
ing is also valuable for the identification of chromo- 
some rearrangements that have occurred in the course 
of evolution. 

Polymorphisms of heterochromatic bands have 
become a study in their own right, as well as being 
useful tools for distinguishing between paternal and 
maternal homologs. There is no evidence that these 
polymorphisms have any phenotypic effects in 
humans, but in maize there is a correlation between 
growth rate and amount of heterochromatin, suggest- 
ing that in some cases, heterochromatin may have 
phenotypic effects. 

Ag-NOR staining can be used not only to identify the location of nucleolus organizers on chromosomes, but also to assess their activity. In a species with multiple NORs, such as humans (who have five pairs), only a proportion are stainable with silver, while in hybrids often only the NORs from one parent are active. Ag-staining of NORs in interphase nuclei also has prognostic value in various cancers.

CREST labeling of kinetochores is an important
tool in identifying centromeres and in distinguishing 
active and inactive centromeres in dicentric chromo- 
somes, and has become invaluable in understanding 
centromeric organization. 


Functional and Structural Significance of 
Chromosome Bands 


Heterochromatin is widely believed to be functionless (‘junk DNA’), a view supported by the lack of obvious phenotypic effect of C-band polymorphisms in many cases (see above), and by its content (in most cases) of highly repetitive DNA sequences that could certainly not code for proteins and which, it seems to many people, could have no other conceivable function. Such views are almost certainly incorrect. In Drosophila, a number of genes, as well as some nongenic functions, have been localized to heterochromatin, and it may well prove that the heterochromatin of other organisms, once examined in the same amount of detail, also has various functions. In addition, it has been suggested that centromeric heterochromatin has an essential role in holding sister chromatids together until the end of metaphase and in ensuring their controlled separation at the beginning of anaphase.

Many differences have now been found between 
positive and negative euchromatic bands, most of 
which are related to the fact that positive G-bands 
have relatively few genes, while negative ones are 
much richer in genes; in humans, approximately 80% 
of the genes are in the negative G-bands, which form 
only about half of the genome. The highest concen- 
trations of genes are in the T-bands, a subset of 
R-bands (negative G-bands) that are found largely at 
the ends of chromosomes. 
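Taking the rounded figures above at face value (about 80% of genes lying in negative G-bands that make up about half of the genome, and treating positive G-bands as the remaining half), the implied difference in average gene density can be worked out directly. This is back-of-the-envelope arithmetic on approximate numbers, not a measured ratio; $\rho$ denotes the fraction of genes divided by the fraction of the genome occupied:

$$\frac{\rho_{\text{G-negative}}}{\rho_{\text{G-positive}}} = \frac{0.80/0.50}{0.20/0.50} = \frac{1.6}{0.4} = 4$$

On these numbers, negative G-bands carry on average about four times as many genes per unit length as positive G-bands.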

The reason for the division of chromosomes into 
gene-rich and gene-poor segments is not at all clear, 
but it is probably universal, and not restricted to 
mammals. Evidence is accumulating that lower verte- 
brates, invertebrates, and plants also show a nonuni- 
form distribution of genes on their chromosomes, 
which appears to be correlated with patterns of early 
and late replication. Chromosome banding, therefore, 
is not simply an invaluable method of identifying 
chromosomes, but has also become very important 
in drawing our attention to functional aspects of the 
longitudinal differentiation of chromosomes. 


See also: Centromere; Heterochromatin;

A chromosome break is a break in the phosphodiester backbone of DNA; here, a chromosome break is understood to be a double-strand break in the DNA. The consequences of unwanted chromosome breaks are severe. The resulting broken ends of DNA are recombinogenic, which can lead to chromosome fusions, aneuploidy, or rearrangements such as inversions, translocations, and deletions. In certain plants and animals, a chromosome break can be the first stage of a ‘breakage-fusion-bridge’ cycle: the loose ends made by a chromosome break may fuse together and form a bridge between two different chromatids or chromosomes. The resulting dicentric chromosome is unstable and breaks again, leading to duplications and deletions.

In vivo, chromosome breaks are formed in several 
normal cellular processes. During T and B cell differ- 
entiation, a double-strand break is made during V(D)J 
recombination such that the different gene segments 
are cut and joined together by the nonhomologous 
end-joining pathway (see below). During meiosis in 
yeast, recombination is initiated by the formation of a 
double-strand break by the enzyme Spo11. The Spo11 gene is conserved in mammals and is essential for meiosis and synaptonemal complex formation in the mouse, suggesting that formation of double-strand breaks may be a general mechanism for initiating meiotic recombination. The cut ends formed by the breaks go on to find homologous sequences on nonsister chromatids and exchange sequences with them.

Errors in cell metabolism can lead to chromosome breaks. Incomplete replication of the chromosomes can cause difficulties when the sister chromatids are segregated at mitosis: the pull of the spindle can break off part of a chromatid that lacks a sister sequence because replication is unfinished. Improper capping of chromosome ends by telomeres can also lead to breaks. The telomere is a protective structure at the chromosome end, ensuring that no genetic information is lost at replication; an uncapped chromosome can be recombinogenic and is more likely to fuse with another uncapped chromosome. Chromosome breaks can also be made by a variety of agents external to the cell. Ionizing (gamma or X-ray) radiation induces breaks, as do a variety of chemicals: alkylating agents (e.g., methyl methanesulfonate), base derivatives, aromatic amines, nitroso compounds, and heavy metals can all cause chromosome breaks.

There are two pathways to repair chromosome 
breaks. Repair by homologous recombination uses 
the genetic information from the sister chromatid or 
homolog as the template for repair. If a sister chroma- 
tid is available (i.e., after DNA replication and before 
mitosis), the repair process should restore the chromo- 
some to its original state. However, if repair is per- 
formed using the sequence of the homolog and the 
homolog sequence is different, the repaired chromo- 
some will carry a new sequence. This may lead to 
loss of heterozygosity at that locus and the possibility of a detrimental phenotype. Repair by the nonhomologous end-joining (NHEJ) pathway does not use a copy of the chromosome, but instead brings two pieces of broken DNA together and joins their ends. While this process may restore the overall structure of the DNA, it is usually not precise, and deletions can occur. Though proteins of both pathways are conserved throughout evolution, yeast cells preferentially use homologous recombination, while mammalian cells more frequently employ NHEJ.

Mutations in repair genes may cause serious diffi- 
culties for an organism. In humans, there are several 
diseases in which genes in the NHEJ pathway are 
mutated and patients are highly cancer-prone. In 
ataxia-telangiectasia-like disorder, the hMRE11 gene is mutated, and in Nijmegen breakage syndrome, NBS1 is altered. Both of these proteins are in the NHEJ repair pathway, along with the Ku proteins and DNA-dependent protein kinase.


See also: Chromatid; Chromosome 

Chromosome and chromatid bridges occur at anaphase of mitosis and meiosis when chromatids are not free to separate and instead form a bridge between the two sets of segregating chromosomes. The chromatid or chromosome forming the bridge usually breaks, leading to duplication of a segment in one daughter nucleus and deletion in the other. Bridges occur for several reasons. A chromosome may have two active centromeres (a dicentric arising, for example, from a reciprocal translocation with intercalary breakpoints along chromosome arms), forming a bridge when the two centromeres on one chromatid move to different spindle poles. Chromosome bridges may also occur when a cell divides before replication of the DNA is complete, so that the unreplicated segment cannot separate. Ring chromosomes, involving deletion of both terminal regions and rejoining as a ring, are frequently associated with bridges at mitosis, arising from interlocked or dicentric rings formed following sister chromatid exchange. Repeated breakage-fusion-bridge cycles may occur, leading to massive amplification of terminal DNA sequences.
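How repeated breakage-fusion-bridge cycles can amplify terminal sequences can be shown with a toy model. This is a hypothetical sketch, not a published simulation: a chromatid is a list of segment labels running from the centromere ("CEN") to a broken end, sister chromatids are assumed to fuse at their broken ends, and the bridge break points are chosen by hand rather than at random.

```python
from typing import List, Tuple

Chromatid = List[str]


def bfb_cycle(chromatid: Chromatid, break_at: int) -> Tuple[Chromatid, Chromatid]:
    """One breakage-fusion-bridge cycle.

    `chromatid` lists segments from the centromere ("CEN") to a broken,
    uncapped end. After replication the two sister chromatids fuse at their
    broken ends, giving a dicentric mirror-image chromosome that forms an
    anaphase bridge. The bridge then breaks `break_at` segments from the
    first centromere. Both daughters are returned centromere-first, and
    each again ends in a broken end, ready for the next cycle.
    """
    dicentric = chromatid + chromatid[::-1]
    if not 1 <= break_at < len(dicentric):
        raise ValueError("the break must leave one centromere in each daughter")
    daughter_1 = dicentric[:break_at]
    daughter_2 = dicentric[break_at:][::-1]
    return daughter_1, daughter_2


if __name__ == "__main__":
    chromatid = ["CEN", "a", "b", "c"]
    # Off-center breaks chosen by hand; in cells the break point varies.
    for break_at in (5, 7):
        chromatid, sister = bfb_cycle(chromatid, break_at)
        print(chromatid, sister)
```

With these break points the terminal segment c goes from one copy to two after the first cycle and to four after the second, while the other daughter loses it each time: duplication in one nucleus, deletion in the other.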


See also: Centromere; Chromatid; Chromosome 
Aberrations 

Homologous recombination is a ubiquitous process
that plays a key role in the repair of DNA damage and 
in restarting replication forks that have stalled or 
aborted as a consequence of the fork encountering 
DNA lesions (see Cox, 1998). Homologous recombin- 
ation is also exploited by organisms to generate genetic 
diversity in different ways, which include meiosis, mat- 
ing type switching, and various processes that lead to 
antigenic variation. Homologous recombination poses 
a unique threat to the integrity of circular genomes, 
since an odd number of recombinational exchanges 
can result in the fusion of individual monomer chromo- 
somes into dimers which cannot be effectively seg- 
regated to daughter cells at cell division (Figure 1A). 
In bacteria, homologous recombination between 
newly replicated sister duplexes, within a replicating 
chromosome, is probably the most frequent homolo- 
gous recombination event. To overcome the problem 
of chromosome dimerization, most bacteria with cir- 
cular chromosomes have evolved a specific mechan- 
ism to resolve dimeric chromosomes (and other 
circular replicons) to monomers prior to cell division, 
thereby ensuring their stable inheritance (Sherratt 
et al., 1995). In Escherichia coli, Xer site-specific recombination converts dimers to monomers by using two related site-specific recombinases, XerC and
XerD, which act at a specific chromosomal recom- 
bination site, dif. Homologs of XerC and XerD are 
present in the genomes of most characterized bacteria 
that have circular genomes. 
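The parity argument behind chromosome dimerization can be made concrete with a toy model. It is a hypothetical sketch under a simplifying assumption: each homologous exchange between two separate circles fuses them into one dimer, and each exchange between the two copies within a dimer splits it back into two monomers. Under that assumption the outcome depends only on whether the number of exchanges is odd.

```python
def molecules_after_exchanges(n_exchanges: int) -> int:
    """Toy model of two circular sister chromosomes undergoing homologous
    exchanges. A crossover between two separate circles fuses them into one
    dimer; a crossover between the two copies inside a dimer splits it back
    into two monomers. Returns the number of molecules left at the end."""
    if n_exchanges < 0:
        raise ValueError("n_exchanges must be non-negative")
    molecules = 2
    for _ in range(n_exchanges):
        molecules = 1 if molecules == 2 else 2
    return molecules


def needs_dimer_resolution(n_exchanges: int) -> bool:
    """True when the sisters end up as a single dimer (odd exchange count)."""
    return molecules_after_exchanges(n_exchanges) == 1


if __name__ == "__main__":
    for n in range(5):
        print(n, molecules_after_exchanges(n), needs_dimer_resolution(n))
```

The loop is written out to mirror the physical picture; it is equivalent to testing `n_exchanges % 2 == 1`.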


The Chromosomal Recombination Site, 
dif 

The recombination site, dif, is located at position 
1589 kb of the E. coli chromosome within the replica- 
tion terminus region. The minimal dif site required for 
recombination is a 28 bp sequence consisting of two 
11 bp binding sites, for XerC and XerD, flanking a 
6 bp central spacer region. The position of dif is critical for its role in chromosome dimer resolution, since translocation of dif to a position more than approximately 20 kb from its normal location renders it inactive. While dif is not essential for E. coli survival, deletion of dif, or its translocation away from its normal position, results in a number of phenotypic defects, including cell filamentation, aberrant chromosome segregation, and induction of the SOS response. The formation of aseptate filaments is largely a consequence of induction of the SOS response, since in the absence of SOS induction, dif-deficient cells appear as chains in which partially formed septa bisect apparently dimeric chromosomes. Mutations in the xerC and xerD genes result in the same phenotype as loss of dif. Conditions that increase DNA damage and replication fork demise lead to increased use of the Xer recombination system and result in more severe phenotypic defects in cells impaired for Xer recombination.
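The layout of the minimal dif site and the positional requirement can be sketched as follows. The lengths (11 + 6 + 11 bp), the 1589 kb position and the approximate 20 kb window come from the text; the assumption that the XerC arm is written first is ours, the function names are hypothetical, and the sequence used in the example is an artificial placeholder, not the real dif sequence.

```python
from dataclasses import dataclass

# Numbers taken from the text; the arm order below is an assumption.
XERC_ARM_BP = 11
SPACER_BP = 6
XERD_ARM_BP = 11
DIF_POSITION_KB = 1589
ACTIVE_WINDOW_KB = 20  # "approximately" in the text


@dataclass(frozen=True)
class DifSite:
    xerc_arm: str
    spacer: str
    xerd_arm: str


def split_dif(site: str) -> DifSite:
    """Split a 28 bp core site into XerC arm, central spacer and XerD arm."""
    expected = XERC_ARM_BP + SPACER_BP + XERD_ARM_BP
    if len(site) != expected:
        raise ValueError(f"expected a {expected} bp site, got {len(site)} bp")
    return DifSite(
        xerc_arm=site[:XERC_ARM_BP],
        spacer=site[XERC_ARM_BP:XERC_ARM_BP + SPACER_BP],
        xerd_arm=site[XERC_ARM_BP + SPACER_BP:],
    )


def position_likely_active(position_kb: float) -> bool:
    """Rough check: is a relocated dif within ~20 kb of its normal position?
    Uses a simple linear distance and ignores the circularity of the
    chromosome."""
    return abs(position_kb - DIF_POSITION_KB) <= ACTIVE_WINDOW_KB


if __name__ == "__main__":
    # Placeholder sequence for illustration only; it is NOT the real dif site.
    placeholder = "A" * 11 + "C" * 6 + "G" * 11
    print(split_dif(placeholder))
    print(position_likely_active(1600), position_likely_active(1700))
```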


The Recombination Reaction 


XerC and XerD are members of the large tyrosine recombinase family (Esposito and Scocca, 1997). They are related to proteins which function in a wide range of DNA processing events, including the recombination of bacterial viruses and other genetic elements into and out of host genomes, the processing of transposition intermediates, and the control of gene expression. All members of this family catalyze recombination by using the same basic mechanism of conservative site-specific recombination. Recombination is initiated when an active-site tyrosine nucleophile located near the recombinase C-terminus attacks the DNA scissile phosphate to form a 3’ phosphotyrosyl-recombinase-DNA covalent intermediate and a free DNA 5’ OH. A conserved pentad of other catalytic residues is implicated in transition-state stabilization and general acid-base catalysis. A strand exchange is then completed when a DNA 5’ OH from the partner duplex attacks the 3’ phosphotyrosyl bond and rejoins the DNA 3’ phosphate to the 5’ OH. A complete recombination reaction proceeds by two sequential pairs of strand exchanges separated by 6 to 8 bp, the first generating a Holliday junction intermediate and the second resolving this intermediate to generate recombinant product (Figure 1B).

Figure 1 (A) Consequences of an odd number of homologous recombinational exchanges. In linear chromosomes, recombination between homologs (as shown), or between sister chromatids, always yields linear chromosomes. In the case of circular chromosomes, any odd number of homologous recombination events will generate a dimeric chromosome, which requires Xer-mediated resolution for segregation. (B) Schematic representation of the site-specific recombination reaction catalyzed by tyrosine recombinases. Two recombination sites (shown double-stranded) align in antiparallel, each bound by two recombinase monomers. In the case of Xer recombination, XerC (white ovals) catalyzes the first pair of strand exchanges to form a Holliday junction intermediate. This can then undergo a conformational change to provide a substrate suitable for its resolution by XerD (shaded ovals). (C) dif recombination and the Escherichia coli cell cycle. Xer recombination at dif only becomes necessary in the event of homologous recombinational exchanges between replicating molecules that generate a dimeric chromosome. At the onset of septation, dimeric chromosomes are resolved by Xer recombination in an FtsK-dependent manner. Chromosomes are specifically oriented within the cell, with the origin close to one pole and the terminus region (including dif) close to the other pole. After replication, both dif sites lie close to the invaginating septum, in the appropriate position for cell-division-dependent recombination to occur.

Whereas most site-specific recombination systems utilize only one recombinase, the Xer system is unusual in that two recombinases are required. The roles of XerC and XerD are temporally and spatially separated, with XerC catalyzing the first pair of strand exchanges to generate the Holliday junction intermediate, and XerD resolving this intermediate. In order for XerD to recognize the Holliday junction and complete the recombination reaction, a conformational change of the Holliday junction intermediate must take place. XerC-XerD interactions play a key role in the assembly of the heterotetrameric recombination complex and in coordinating catalysis so that only two of the four recombinase molecules are active at any one time. Partial and complete recombination reactions have been reconstituted in vitro; they faithfully reproduce the in vivo reactions.
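The ordering constraints of the Xer reaction described above (XerC acts first, the Holliday junction must change conformation, and only then can XerD resolve it) can be captured as a small state machine. This is a hypothetical sketch: the stage names and the step labels "exchange" and "isomerize" are invented for illustration, and the model tracks only the order of events, naming the one recombinase whose pair of monomers is active at each exchange step.

```python
from enum import Enum, auto
from typing import List, Optional, Tuple


class Stage(Enum):
    SYNAPSIS = auto()             # two sites aligned, each bound by two monomers
    HOLLIDAY_JUNCTION = auto()    # after the first pair of strand exchanges
    ISOMERIZED_JUNCTION = auto()  # after the conformational change
    RECOMBINANT = auto()          # after the second pair of strand exchanges


# (current stage, step) -> (next stage, recombinase whose two monomers act)
TRANSITIONS = {
    (Stage.SYNAPSIS, "exchange"): (Stage.HOLLIDAY_JUNCTION, "XerC"),
    (Stage.HOLLIDAY_JUNCTION, "isomerize"): (Stage.ISOMERIZED_JUNCTION, None),
    (Stage.ISOMERIZED_JUNCTION, "exchange"): (Stage.RECOMBINANT, "XerD"),
}


def run(steps: List[str]) -> Tuple[Stage, List[Tuple[str, Optional[str]]]]:
    """Apply steps in order; reject any step the model does not allow."""
    stage = Stage.SYNAPSIS
    history: List[Tuple[str, Optional[str]]] = []
    for step in steps:
        try:
            stage, active = TRANSITIONS[(stage, step)]
        except KeyError:
            raise ValueError(f"{step!r} is not allowed at stage {stage.name}") from None
        history.append((stage.name, active))
    return stage, history


if __name__ == "__main__":
    final, history = run(["exchange", "isomerize", "exchange"])
    print(final.name, history)
    try:
        run(["exchange", "exchange"])  # XerD cannot act before isomerization
    except ValueError as err:
        print(err)
```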


Xer Recombination and Multicopy 
Plasmid Inheritance 


Small multicopy plasmids have recruited the Xer 
recombinases to convert multimers to monomers, 
whereas large low-copy-number plasmids encode their own resolution systems. Plasmids such as ColE1
and pSC101 contain recombination sites similar to 
dif (cer and psi, respectively) at which XerC and XerD 
act to carry out recombination via the same re- 
action mechanism as at dif. However, unlike dif, recom- 
bination at these natural plasmid sites is exclusively 
intramolecular. In addition to the dif-like core 
recombination site, cer and psi each possess approxi- 
mately 200 bp of accessory sequences adjacent to the 
XerC binding site at which host-encoded proteins 
bind (PepA and ArgR in the case of cer, PepA and 
ArcA at psi), in order to assemble a nucleoprotein 
complex in which the DNA duplexes follow a specific 
path. These complexes only form and undergo re- 
combination when the two recombination sites are 
directly repeated in the same DNA molecule, thereby 
restricting recombination to the production of mono- 
mers from dimers. 


Dimer Resolution and the Cell Cycle 


Xer recombination at chromosomal dif is restricted to 
cells that contain chromosomal dimers at the time 
of cell division. Furthermore, recombination requires 
that cell division can be initiated and that the 
septum-located protein FtsK is functional. The 
C-terminal domain of FtsK is needed for Xer recom- 
bination at dif, while the FtsK N-terminal domain is 
required for cell division. Therefore FtsK acts to inte- 
grate chromosome replication and segregation with 
cell division. Xer recombination at cer and psi is
FtsK-independent. These findings suggest the ex- 
istence of a chromosome dimer-dependent, FtsK- 
dependent activation process for Xer recombination 
at dif (Figure 1C), which ensures that chromosome 
dimer resolution occurs only when required in cells 
containing dimers immediately prior to cell div- 
ision. 
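The conditions described in the last two sections can be collected into a single yes/no predicate. This is a hypothetical summary, not a model from the literature: the class and field names are invented for illustration, the accessory proteins are those named in the text, and each requirement is reduced to a boolean flag (availability of a protein stands in for its binding).

```python
from dataclasses import dataclass, field
from typing import FrozenSet

# Host-encoded accessory proteins named in the text for each plasmid site.
ACCESSORY_PROTEINS = {
    "cer": frozenset({"PepA", "ArgR"}),
    "psi": frozenset({"PepA", "ArcA"}),
}


@dataclass
class CellContext:
    chromosome_dimer_present: bool = False
    division_initiated: bool = False
    ftsk_c_terminal_functional: bool = True
    host_proteins: FrozenSet[str] = field(default_factory=frozenset)


@dataclass
class SitePair:
    kind: str            # "dif", "cer" or "psi"
    same_molecule: bool  # are both sites on one DNA molecule?
    direct_repeat: bool  # are they in the same orientation on it?


def xer_recombination_expected(pair: SitePair, cell: CellContext) -> bool:
    """Yes/no summary of when Xer recombination should proceed."""
    if pair.kind == "dif":
        # Chromosomal site: needs a dimer, cell division under way and a
        # functional FtsK C-terminal domain.
        return (cell.chromosome_dimer_present
                and cell.division_initiated
                and cell.ftsk_c_terminal_functional)
    if pair.kind in ACCESSORY_PROTEINS:
        # Plasmid sites: FtsK-independent, but restricted to directly
        # repeated sites on one molecule, with the accessory proteins present.
        return (pair.same_molecule
                and pair.direct_repeat
                and ACCESSORY_PROTEINS[pair.kind] <= cell.host_proteins)
    raise ValueError(f"unknown site type: {pair.kind!r}")


if __name__ == "__main__":
    host = frozenset({"PepA", "ArgR", "ArcA"})
    print(xer_recombination_expected(SitePair("cer", True, True), CellContext(host_proteins=host)))
    print(xer_recombination_expected(SitePair("cer", False, True), CellContext(host_proteins=host)))
    dividing = CellContext(chromosome_dimer_present=True, division_initiated=True)
    print(xer_recombination_expected(SitePair("dif", True, True), dividing))
```

Keeping the plasmid and chromosomal branches separate mirrors the text: the plasmid sites are filtered by DNA topology, whereas dif is gated by the cell cycle through FtsK.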

Gene mapping is the term used to refer to determining the position of genes on chromosomes. The term is generally synonymous with chromosome mapping, which is the process of establishing gene maps for entire chromosomes. Chromosome mapping has attracted the attention of many renowned biologists and statisticians over the last 90 years and has led to the development of many ingenious methods for estimating the order of and distance between genes and, frequently, by extrapolation, the total number of genes in a large variety of organisms. This entry is confined to discussing chromosome mapping as applied primarily to higher organisms (eukaryotes) and particularly to humans.


Historical Perspectives 


After the independent rediscovery of the principles of Mendelian genetics at the start of the twentieth century by three botanists (de Vries, Correns, and von Tschermak), it was quickly realized by Bateson and Punnett, studying the transmission of characters in the sweet pea Lathyrus odoratus, that although the majority of genes appeared to segregate independently of each other when transmitted from one generation to the next, as predicted by Mendel’s laws, there were exceptions: particular combinations of gene variants (alleles) appeared either to attract or to repel one another, which Bateson referred to as ‘coupling’ and ‘repulsion’, respectively. This phenomenon was subsequently correctly interpreted by T. H. Morgan, working on gene inheritance patterns in Drosophila melanogaster, as being caused by the genes concerned being physically located on the same chromosome, so that their alleles could not be transmitted independently of each other. Morgan first observed this for eye-color genes carried on the X chromosome and later for genes on other chromosomes, the autosomes. This association was termed ‘linkage’.

Morgan and his pupils Sturtevant, Bridges, and Muller went on to establish that the level of linkage between genes on the same chromosome was determined by the frequency of meiotic recombination (meiotic crossing-over, chiasmata) that had occurred during gamete formation in the parents, such that the greater the distance between genes, the higher the frequency of recombination in the offspring, and vice versa. The frequency of recombination between genes could therefore be used as an approximation of the distance between them, with the corollary that the maximum recombination frequency observable between linked genes was 50%, a value indistinguishable from the recombination frequency expected when genes are carried on separate chromosomes and segregate entirely independently of each other. Morgan and his colleagues deduced that recombination frequencies could be directly converted into units of distance when the genes were close together. These units were termed centimorgans, with 1% recombination being equal to 1 centimorgan. Genes were considered to be linked when the recombination frequency between them was significantly less than 50%. Using these principles, Morgan and colleagues went on to establish that each gene had a specific position on a chromosome and that genetic linkage maps, arranged as linear orders of genes, could be constructed for all Drosophila chromosomes. Morgan was awarded the 1933 Nobel Prize in Physiology or Medicine for his discoveries.
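The conversion from recombination frequency to map distance, and one common way of judging whether a frequency is significantly below 50%, can be sketched in Python. Only the linear rule (1% recombination = 1 cM) comes from the text. Haldane's mapping function and the LOD score are standard tools swapped in here for illustration: the first assumes crossovers occur independently (no interference), and the LOD formula as written assumes phase-known meioses with recombinants counted directly. Function names are hypothetical.

```python
import math


def rf_to_cm_linear(rf: float) -> float:
    """Morgan's rule for closely linked genes: 1% recombination = 1 cM."""
    return 100.0 * rf


def rf_to_cm_haldane(rf: float) -> float:
    """Haldane's mapping function (not from the text):
    d = -0.5 * ln(1 - 2r) Morgans, assuming no crossover interference.
    Defined only for 0 <= r < 0.5."""
    if not 0.0 <= rf < 0.5:
        raise ValueError("rf must satisfy 0 <= rf < 0.5")
    return -50.0 * math.log(1.0 - 2.0 * rf)


def lod_score(recombinants: int, nonrecombinants: int, theta: float) -> float:
    """log10 likelihood ratio of linkage at recombination fraction `theta`
    versus free recombination (theta = 0.5), for phase-known meioses."""
    if not 0.0 < theta <= 0.5:
        raise ValueError("theta must satisfy 0 < theta <= 0.5")
    n = recombinants + nonrecombinants
    linked = (recombinants * math.log10(theta)
              + nonrecombinants * math.log10(1.0 - theta))
    unlinked = n * math.log10(0.5)
    return linked - unlinked


if __name__ == "__main__":
    for rf in (0.01, 0.10, 0.25, 0.40):
        print(f"rf={rf:.2f}  linear={rf_to_cm_linear(rf):5.1f} cM  "
              f"Haldane={rf_to_cm_haldane(rf):5.1f} cM")
    # 2 recombinants among 20 scored offspring, evaluated at theta = 0.1.
    print(f"LOD = {lod_score(2, 18, 0.1):.2f}")
```

The two scales agree closely at 1% recombination (1.0 versus about 1.01 cM) but diverge as the frequency approaches 50%, which is why the linear rule is restricted to closely linked genes. Two recombinants among twenty offspring give a LOD of about 3.2 at theta = 0.1; a LOD of 3 or more is the conventional threshold for accepting linkage.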

In 1933, the cytologist T.S. Painter published an analysis of the giant polytene chromosomes of the Drosophila salivary gland, in which a comparison was made of the chromosome banding
patterns with the gene distributions observed in the 
genetic linkage maps generated by the Morgan school; 
this permitted defining the physical location of in- 
dividual genes directly on the chromosomes within 
the levels of resolution offered by banded polytene 
chromosomes and light microscopy. This process 
became known as physical mapping. Further, the 
use of Drosophila stocks carrying visible structural 
chromosome rearrangements such as translocations 
and inversions which disrupted the normal linkage 
map in a predictable fashion was an important means 
of confirming the colinearity of the two maps. This 
combination of genetic and physical map evidence set 
the gold standard and provided an important para- 
digm for the majority of subsequent gene mapping 
studies. 

Although it had been long realized that genes car- 
ried on the human X chromosome must be linked to 
each other, the first formal demonstration of this by 
segregation analysis was published by J.B.S. Haldane 
and J. Bell in 1937 for the genes for red-green color 
blindness and hemophilia A. In discussing their 
results, the authors made a profoundly prophetic statement which both sums up the enormous potential of gene mapping for human health care and underlines many of the social issues being faced today:


Should an equally close linkage (as that between colour blindness and haemophilia) be found between the genes determining Huntington’s chorea and a blood group, we should be able, in many cases, to predict which children of an affected person would develop this disease and to advise on the desirability or otherwise of their marriage.


In fact, it was a further 50 years before the information 
derived from gene maps was sufficiently detailed to 
permit diagnosis of Huntington disease and several 
other diseases by either indirect detection of disease 
status by linkage or shortly after by the direct detec- 
tion of mutations in the disease genes themselves. 
These developments were driven by the many tech- 
nical innovations that took place starting in the mid- 
1960s and carrying on up to the present day. These 
commenced with the discovery of chromosome band- 
ing in the late 1960s, which permitted alignment of 
gene location relative to the chromosome banding 
pattern in an analogous fashion to that first demon- 
strated in Drosophila polytene chromosomes. 

The advent of recombinant DNA technology allowed the genome of complex organisms, such as man, to be subdivided by large-insert cloning into much smaller and more easily analyzable units than entire chromosomes, and led to the development of an entirely new category of genetic marker based on DNA sequence information that could be used efficiently in linkage and physical mapping studies. The
discovery of the polymerase chain reaction (PCR) was 
an important development that revolutionized all 
aspects of DNA analysis, but in particular facilitated 
the ease with which DNA markers can be generated 
and detected in all types of mapping study. Other 
technical developments that have played a vital role 
in the progress of gene mapping included the follow- 
ing: the introduction of efficient and rapid methods 
of DNA sequence analysis; the use of radiation to 
artificially induce chromosome breakage and genetic 
segregation in in vitro somatic cell hybrids (radiation 
hybrid mapping); the direct physical mapping of 
DNA sequences to chromosomes by fluorescent 
in situ hybridization; the development of sophisti- 
cated computational tools to permit complex con- 
struction and integration of genetic and physical 
maps; and the efficient storage and retrieval of large- 
scale map and DNA sequence information from 
publicly accessible databases. 

The first coordinated human gene mapping activ- 
ities were organized as human gene mapping confer- 
ences which continued on an annual or biennial basis 
from 1973 to 1991. Significantly, at the first conference only 64 autosomal genes were recorded with a known map location, a total that included all studies carried out over the previous decades. However, the speed of mapping then increased exponentially, driven in the early to mid-1970s largely by the widespread use of somatic cell analysis and later by the implementation of DNA markers.

1990 saw the start of the international Human 
Genome Project, one of whose primary goals was to 
define the ultimate gene map in which the location and 
structure of all coding sequences (genes) in humans 
and a variety of model organisms were defined at the
DNA sequence level. In principle such a “complete” 
map should subsume all prior genetic and physical map 
information with one important proviso, namely that 
since the majority of genetic traits or diseases can only 
be recognized by their phenotype, mapping disease 
genes would still require genetic linkage analysis via 
family segregation analysis to establish an initial loca- 
tion on the DNA sequence map. As discussed later, 
the availability of the complete DNA sequence of the 
human genome should facilitate the ease with which 
genetic disease and trait phenotypes can be attributed 
to variations in specific coding sequences. 


From Gene Mapping Conferences to the 
Human Genome Project 


Human gene mapping results were presented and dis- 
cussed at the human gene mapping conferences which 
commenced in 1973. The human gene mapping com- 
munity quickly divided itself up into separate com- 
mittees to determine the mapping status of individual 
chromosomes. There were also standing committees 
for nomenclature, comparative mapping and, from 
1982 onwards, a recombinant DNA committee which 
made recommendations on naming and keeping track 
of cloned DNA segments used in mapping studies. 
One of the major problems was that workers in the 
field were starting to produce and publish large num- 
bers of cloned markers and naming them in all sorts 
of nonstandard ways leading to many errors in the 
identity of clones. In particular, the nomenclature of 
genes was considered to be extremely important to 
ensure that all newly discovered genes received a 
standardized name and gene symbol reflecting the 
known biological activity of the gene concerned. The 
standards and recommendations for naming human 
genes were defined in a landmark document published 
in 1977 at the time of convening the committee for the 
first time. The nomenclature committee still exists 
today and is still flourishing and striving to maintain 
the standards in gene naming first defined 25 years 
previously. From the earliest days of the gene mapping 
conferences, both map and gene nomenclature data
were stored in a centralized database. The database
was modified later and extended to permit remote 
access for data perusal and editing of map information 
by designated editors for each chromosome and to 
maintain an up-to-date listing of all official gene 
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names. This was at a time that internet connections 
were still slow and unreliable and the World Wide 
Web did not yet exist. Part of the data curation carried 
out by chromosome editors was to determine the reliability
of mapping assignments. Assignments were considered
provisional if they had been determined by a single group,
and confirmed when two or more groups had independently
mapped genes to the same location, preferably using
different mapping methods. This rigor of data control was
important in establishing the scientific credibility and
value of human gene mapping activities. The success of the
human gene mapping community in working in a coordinated
fashion through chromosome committees encouraged the mouse
mapping community to adopt the same data collection and
evaluation model. In addition to the human gene mapping
conferences, gene mapping workshops on individual
chromosomes were organized and continued for some 10 years
with funding from NIH, DOE, and the European Community.
Although the number of mapped genes was increasing
exponentially, the number that had been mapped by 1990 was
still less than 10% of the estimated number of human genes.
One of the problems was that data
generation was still very much a cottage industry with 
many small groups making minor contributions rather 
than a few large groups generating most of the data on 
a production-line basis. The genome project which 
commenced in 1990 foresaw the need for such a 
scale-up in productivity and several large groups 
were funded in the USA as genome centers by NIH 
and DOE in what was originally believed to be a 15- 
year program to completely map and sequence the 
human genome. The initial US budget was $200 
million per year. At that time the only major center in 
Europe involved in large-scale mapping was Généthon 
in France, initially funded on a private basis by the 
French Muscular Dystrophy Association. Some years 
later, the Sanger Centre was established and funded by 
the Wellcome Trust in the UK which increased the 
European contribution to the Human Genome Pro- 
ject enormously. The net result was that the original 
gene mapping activities were mainly replaced within 
a space of 2 years by a limited number of large and 
well-funded genome centers and the gene mapping 
conferences were abandoned. The chromosome specif- 
ic workshops have carried on sporadically. 

The original planning of the Human Genome Pro- 
ject foresaw that the first 5 years would be mainly 
concerned with creating high resolution genetic or 
linkage maps, the second 5 years with creating phys- 
ical maps and generating the DNA clones necessary for 
sequencing, and the last 5 years with carrying out the 
sequencing itself. Another expectation was that faster 
and cheaper methods of DNA sequencing would have 
to be devised to achieve the goal of sequencing the
entire human genome in a cost-effective and timely 
fashion. During the first few years of the project con- 
siderable investment was made, particularly in Japan 
and the USA, to design and develop new sequencing 
strategies. In the event, no new and effective sequen- 
cing methods emerged and a more efficient use of 
already existing technology proved to be the answer. 
These improvements included increasing the speed 
and capacity of the automated sequencing machines 
based on fluorescent dideoxy labeling, running many 
machines simultaneously in parallel, and using robots 
to handle sequence clones and set up sequencing reactions.
A number of centers were designated to start large-scale
sequencing operations. However, the planning was completely
undermined by an unexpected development: the commercial
firm Celera, under the leadership of Craig Venter, developed
the capability of reconstructing the DNA sequence of entire
genomes from the sequences of short, randomly selected DNA
clones (shotgun sequencing). In theory, this removed the
necessity of mapping the clones needed for DNA sequencing
beforehand. The year 2000 saw
the publicly funded arm of the Human Genome Pro- 
ject rapidly adjust its short-term sequencing goals in 
response to this pressure from Celera and it seems 
likely that all of the sequencing will be complete by 
the end of 2001 in draft form and fully completed 
by 2003, just in time to celebrate the 50th anniversary 
of the discovery of the structure of DNA by Watson 
and Crick. 


Modern Developments in Linkage 
Mapping 


An intrinsic feature of establishing linkage between 
genes is the necessity of being able to distinguish 
between two or more forms of the genes concerned, 
known as genetic polymorphisms. In Morgan’s ori- 
ginal studies he made use of variations in external 
phenotypic features such as eye color, sternopleural 
bristle shape and bristle number, etc. The use of serum 
protein and blood group variations was introduced 
in human linkage studies and led to demonstration of 
the first autosomal linkage in humans (between ABO 
secretor and the Lutheran blood group) by Jan Mohr in 
1954. This was followed 2 years later by the demon- 
stration by Newton Morton of linkage between ellipto- 
cytosis, a form of erythrocyte membrane abnormality 
associated with anemia, and the rhesus blood group. 
This study also determined that elliptocytosis was 
not linked to rhesus in all families and demonstrated 
heterogeneity of a genetic disease by linkage analysis 
for the first time. Another notable study carried out in
this period, by Jim Renwick and Sylvia Lawler, linked the
ABO blood group to the nail-patella syndrome and showed
that the meiotic recombination frequency between the two
loci was significantly higher in females than in males.
Many subsequent studies have confirmed this sex difference
in general, although its magnitude varies from one
chromosome region to another, and some regions, such as
11p, even show a male excess. Overall, these differences
make the female genetic map approximately 1.7 times longer
than the male map, although the gene order along the
chromosomes is the same in both sexes. The average total
length of the human male meiotic map is about 27 Morgans
and that of the female about 47 Morgans. Conventionally,
male and female linkage maps are derived and published
separately or, alternatively, combined to give an average of
the two sexes. Taking the length of the human genome to be
approximately 3.5 × 10^9 bp, the average physical length
corresponding to 1 cM of the sex-averaged linkage map is
approximately 0.9 Mb. A similar excess of female recombination has
also been described in other organisms including the 
mouse. Ironically, Morgan did not encounter this prob- 
lem in his original Drosophila studies because of the 
complete lack of recombination in the male, the linkage 
map being entirely based on female recombination. 
Recombination appears to be much higher towards the
telomeres of the majority of chromosomes in both sexes in
man and mouse, and in some telomeric regions the ratio of
physical to genetic map length drops as low as 100 kb per cM.
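As a quick check on the figures quoted above, the sketch below recomputes the female-to-male map length ratio and the average physical distance per centimorgan. It simply divides genome length by map length, taking the text's values of 3.5 × 10^9 bp and 27 and 47 Morgans as exact and assuming recombination is uniform. The function name `kb_per_cm` is illustrative.

```python
# Figures quoted in the text, treated here as exact for illustration only.
GENOME_BP = 3.5e9            # approximate length of the human genome (bp)
MALE_MAP_CM = 27 * 100       # male map length: 27 Morgans = 2700 cM
FEMALE_MAP_CM = 47 * 100     # female map length: 47 Morgans = 4700 cM


def kb_per_cm(genome_bp: float, map_length_cm: float) -> float:
    """Average physical distance (kb) spanned by 1 cM, assuming uniform recombination."""
    return genome_bp / map_length_cm / 1_000


female_to_male = FEMALE_MAP_CM / MALE_MAP_CM
sex_averaged_cm = (MALE_MAP_CM + FEMALE_MAP_CM) / 2

print(f"female/male map length ratio: {female_to_male:.2f}")
print(f"sex-averaged map length: {sex_averaged_cm:.0f} cM")
for label, length_cm in (("male", MALE_MAP_CM),
                         ("female", FEMALE_MAP_CM),
                         ("sex-averaged", sex_averaged_cm)):
    print(f"{label:>12}: {kb_per_cm(GENOME_BP, length_cm):5.0f} kb per cM")
```

This gives a ratio of about 1.74 and roughly 950 kb per sex-averaged cM, in line with the 1.7 and 0.9 Mb quoted above. Because recombination is not uniform, local values depart from this average, down to about 100 kb per cM near some telomeres.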

An important development in human linkage analysis was
the introduction of appropriate methods of statistical
analysis to take account of the small family sizes
encountered. The amount of information derived from a single
family is usually insufficient to draw firm conclusions on
the likelihood that two loci are linked. Many analytical
procedures were developed to cope with this problem,
starting with the work of Bernstein in the early 1930s,
carried on by Haldane and Smith in 1947, and followed by
Morton in 1955. This led to implementation of the LOD
(logarithm of the odds) score method, which is still widely
used today. In this method, the relative likelihood that a
particular set of family data would be obtained if a pair
of genes is linked, rather than if they segregate randomly,
is calculated for different recombination fractions between
0 and 0.5. The LOD score is defined as:


$$\mathrm{LOD}(\theta) = \log_{10}\left[\frac{(1-\theta)^{N}\,\theta^{R}}{(0.5)^{N+R}}\right]$$

where θ is the recombination fraction and N and R refer to
the numbers of observed nonrecombinant and recombinant
individuals, respectively. The power of LOD score analysis
is that LOD scores can be summed over several families.
Linkage between two loci is formally accepted as established
when the total LOD score reaches a value of 3 or more,
corresponding to odds of at least 1000:1 in favor of linkage
over independent segregation. A LOD score of −2 or less is
accepted as excluding linkage at the recombination fraction
concerned. Where linkage has been established, the
recombination fraction between the two loci is taken as the
θ value corresponding to the highest LOD score, referred to
as the maximum likelihood estimate of θ. However, direct determination of
the number of recombinants and nonrecombinants 
requires information on the phase of the alleles on 
the parental chromosomes, i.e., which alleles of the 
two loci are in coupling and which are in repulsion. 
Such direct information can only be derived from 
three-generation families in which the alleles of all 
four parental chromosomes can be unambiguously 
recognized. This level of complete information rarely 
occurs in human families and the probability of indi- 
viduals being recombinants or nonrecombinants is 
derived from calculating the likelihoods of all possible 
genotype (allelic) combinations. Computer programs 
such as LIPED have helped remove the burden of such 
tedious and complex calculations. LIPED permits 
construction of linkage maps based on two-point crosses,
taking into account lack of expression of a disease
phenotype in some individuals (incomplete penetrance), the
mode of inheritance, and the frequency of the disease and
marker alleles in the population being studied. Programs developed later,
including LINKAGE, and much more recently MAP- 
MAKER, have opened the way to calculating linkage 
between multiple loci simultaneously which leads to a 
much more rigorous estimate of the distances between 
loci and establishing their order than combining the 
data from a series of two-point crosses. 
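The two-point LOD calculation defined above can be sketched for the idealized phase-known case, in which recombinants and nonrecombinants can be counted directly. The sketch assumes exactly that. It does not attempt the phase-unknown likelihood calculations performed by programs such as LIPED or LINKAGE. The function names and family counts are illustrative.

```python
import math


def lod(theta: float, n_nonrec: int, n_rec: int) -> float:
    """Two-point LOD score for phase-known meioses at recombination fraction theta.

    LOD = log10[(1 - theta)^N * theta^R / 0.5^(N + R)]
    """
    if not 0.0 <= theta <= 0.5:
        raise ValueError("theta must lie in [0, 0.5]")
    if theta == 0.0 and n_rec > 0:
        return float("-inf")  # any recombinant rules out theta = 0
    log_linked = n_nonrec * math.log10(1.0 - theta)
    if n_rec > 0:
        log_linked += n_rec * math.log10(theta)
    log_unlinked = (n_nonrec + n_rec) * math.log10(0.5)
    return log_linked - log_unlinked


def total_lod(theta: float, families: list[tuple[int, int]]) -> float:
    """LOD scores from independent families are summed at the same theta."""
    return sum(lod(theta, n, r) for n, r in families)


def max_lod(families: list[tuple[int, int]], step: float = 0.01) -> tuple[float, float]:
    """Grid search over theta in [0, 0.5] for the maximum likelihood estimate."""
    grid = [round(i * step, 10) for i in range(int(round(0.5 / step)) + 1)]
    best = max(grid, key=lambda t: total_lod(t, families))
    return best, total_lod(best, families)


# Hypothetical data: (nonrecombinants, recombinants) in each of two families.
families = [(10, 1), (8, 1)]
theta_hat, score = max_lod(families)
print(f"maximum LOD {score:.2f} at theta = {theta_hat:.2f}")
```

With 18 nonrecombinants and 2 recombinants in total, the maximum falls at θ = 0.1 with a LOD score of about 3.2. That is just above the conventional threshold of 3.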

It was realized by Haldane in 1919, in the very earliest
days of linkage mapping, that the direct conversion of
recombination fractions into genetic distance would
underestimate the map distance between genes lying further
apart. This is because double crossovers between them
return the alleles to their original chromosomal combination
and go undetected. Haldane developed a mapping function
based on assumption of a random distribution of 
chiasmata, corresponding to a Poisson distribution, 
which corrected the observed recombination fraction 
for the chance that two or more recombinations 
had occurred in a given chromosome segment. The 
Haldane mapping function is defined as: 


$$w = -\tfrac{1}{2}\ln(1 - 2\theta)$$

where w is the map distance (in Morgans), θ is the
recombination fraction, and ln is the natural logarithm
(log to the base e).
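Haldane's function and its inverse can be written directly from the definition above. This is a minimal sketch, assuming distances in Morgans and no interference. The function names are illustrative.

```python
import math


def haldane_distance(r: float) -> float:
    """Map distance w (Morgans) from recombination fraction r, assuming no interference."""
    if not 0.0 <= r < 0.5:
        raise ValueError("r must lie in [0, 0.5)")
    return -0.5 * math.log(1.0 - 2.0 * r)


def haldane_fraction(w: float) -> float:
    """Inverse of the Haldane function: expected recombination fraction for distance w."""
    if w < 0.0:
        raise ValueError("w must be non-negative")
    return 0.5 * (1.0 - math.exp(-2.0 * w))


for r in (0.01, 0.10, 0.20, 0.30, 0.40):
    print(f"r = {r:.2f} -> {100 * haldane_distance(r):5.1f} cM")
```

For small r the map distance is close to 100r cM (r = 0.01 gives about 1.0 cM). The correction grows with distance, so r = 0.30 corresponds to about 45.8 cM.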

Mechanisms have been proposed that would cause a nonrandom
distribution of chiasmata. These include an obligate chiasma
per chromosome or chromosome arm, as proposed by several
authors, and interference between chiasmata. Interference
occurs when the formation of a chiasma within a given
chromosome segment reduces the likelihood of further
chiasmata arising within the same segment. The longer the
segment, the lower the level of interference, and vice
versa. The generally applied Kosambi mapping function has
many similarities to Haldane's function but also takes
account of interference between chiasmata.

The Kosambi mapping function is defined as 


$$w = \tfrac{1}{4}\ln\left(\frac{1 + 2\theta}{1 - 2\theta}\right)$$

where w is the map distance (in Morgans), θ is the
recombination fraction, and ln is the natural logarithm
(log to the base e).
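To see how allowing for interference changes the correction, the sketch below implements the Kosambi function and compares it with Haldane's function at a few recombination fractions. It also includes the Kosambi inverse, r = 0.5 tanh(2w), obtained by solving the definition above for θ. Distances are in Morgans, and the function names are illustrative.

```python
import math


def kosambi_distance(r: float) -> float:
    """Map distance w (Morgans) from recombination fraction r under Kosambi's function."""
    if not 0.0 <= r < 0.5:
        raise ValueError("r must lie in [0, 0.5)")
    return 0.25 * math.log((1.0 + 2.0 * r) / (1.0 - 2.0 * r))


def kosambi_fraction(w: float) -> float:
    """Inverse of the Kosambi function: r = 0.5 * tanh(2w)."""
    if w < 0.0:
        raise ValueError("w must be non-negative")
    return 0.5 * math.tanh(2.0 * w)


def haldane_distance(r: float) -> float:
    """Haldane's function (no interference), included for comparison."""
    return -0.5 * math.log(1.0 - 2.0 * r)


print(" r    Kosambi (cM)  Haldane (cM)")
for r in (0.05, 0.10, 0.20, 0.30, 0.40):
    print(f"{r:.2f}  {100 * kosambi_distance(r):12.1f}  {100 * haldane_distance(r):12.1f}")
```

Because interference suppresses closely spaced double crossovers, Kosambi distances are shorter than Haldane distances for the same r. At r = 0.30, for example, the two functions give about 34.7 cM and 45.8 cM respectively.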

Hultén examined meiotic chromosome preparations directly and
found that the distribution of chiasmata in human male
meiosis departs markedly from a random distribution. In
response, Newton Morton and others proposed more complex
mapping functions. These take into account differences in
the distribution of chiasmata along the chromosome. In
principle, specialized mapping functions can be defined to
match the chiasma distribution characteristics of each
individual chromosome. In practice, however, these have
been applied in only a limited number of studies for a few
chromosomes.

For many decades, the major factor limiting the efficiency
of human linkage studies was the lack of a sufficiently
large number of highly polymorphic markers that were
informative (heterozygous) in the majority of individuals
studied. The discovery of variations in DNA sequences known
as restriction fragment length polymorphisms (RFLPs) opened
a new era of linkage mapping. In a landmark paper in 1980,
Botstein and colleagues predicted that RFLPs could be used
to construct a complete human linkage map and to facilitate
the mapping of disease genes. By 1985 linkage analysis had
defined the map location of several important disease
genes, including those for Huntington disease, myotonic
dystrophy, retinoblastoma, antithrombin III deficiency,
neurofibromatosis, and polycystic kidney disease type I. It
had also mapped a large number of blood group, serum
protein, and isoenzyme loci. The use of RFLPs as markers in
linkage studies was rapidly followed by that of variable
number tandem repeats (VNTRs) and microsatellites, also
referred to as short tandem repeats (STRs). In both, the
polymorphism is caused by variations in the number of
repeats of a short segment of DNA at a given locus.

Table 1 Progress in human linkage mapping resolution, 1987-94

Year  Institute/author                                  Type of marker  Number of random markers  Average spacing
1987  Collaborative Research Inc.; Donis-Keller et al.  RFLPs           403                       ~10 cM
1992  Généthon; Weissenbach et al.                      STRs            813                       ~4.5 cM
1994  CEPH; Gyapay et al.                               STRs            2066                      ~1.5 cM
1994  CHLC; Murray et al.                               RFLPs, STRs     5840                      ~0.7 cM

These markers exhibited a
much higher level of polymorphism than RFLPs and 
were used for constructing high-density framework 
linkage maps in the first phase of the Human Genome 
Project. Ideal three-generation families containing 
many offspring were used for the construction of 
these maps. The families had been assembled some 
years earlier by the Centre d’Etude du Polymorphisme Humain
(CEPH) in Paris. Their express purpose was to encourage the
gene mapping community to construct genetic maps using the
same families, so that maps generated by different groups
using different markers could be directly integrated.
Several large groups were involved in generating the data,
notably Généthon, the Whitehead Institute, and the
Co-operative Human Linkage Centre (CHLC). The local map
order of such framework markers had to be supported by odds
of at least 1000:1. By 1994 a framework linkage map had
been created with an average resolution of 2 cM, requiring
approximately 1000 equally spaced markers (see Table 1).
Importantly, STR markers are typed by PCR, which can be
automated by the use of robots. This has opened the way to
large-scale linkage studies, referred to as genome screens,
involving several hundred markers spanning the whole genome.
A 10 cM resolution using ~350 markers is usually used for an
initial screen. The detection of linkage of single gene
disorders using this strategy is now standard practice.
Complex diseases, which involve several genes, are a harder
target. Many genome screen studies are now under way to
identify the genes involved in chronic disorders such as
type 1 and type 2 diabetes, schizophrenia, cardiovascular
diseases, hypertension, and asthma. The phenotype of some
of these complex disorders, such as hypertension, shows
continuous variation. For such traits, phenotype classes
suitable for linkage studies are defined by arbitrarily
dividing the total variation into classes of suitable size.
This is known as quantitative trait locus (QTL) mapping. It
has been used extensively in animal and plant breeding
programs to map genes determining commercially important
traits such as weight, fat, and starch content. However,
the analysis of complex traits has necessitated developing
methods of data analysis other than the standard LOD score
procedures.
These are known as nonparametric analysis methods since,
unlike standard LOD score analysis, they do not require
fixed parameters such as the mode of inheritance and trait
frequency. Further, the extended families required for LOD
score analysis are often unavailable in human populations
for complex traits. For this reason the affected sib-pair
method has been widely implemented. This method was first
proposed by Penrose in the mid-1930s and further extended
by Elston and Stewart in 1971. It compares the frequency
with which individual marker alleles are transmitted to both
affected sibs with the 50% expected on a random basis. It is
important to be able to identify the specific parental
origin of each allele, since this increases the statistical
power of the study. Alleles shared in this sense are
referred to as identical by descent (IBD), as opposed to
identical by state (IBS). For example, suppose two sibs both
have the genotype 2-1 for a given marker. Their alleles are
then identical by state, but whether they are identical by
descent depends on the genotypes of the parents.
Most of the complex trait mapping studies now under- 
way make use of mapping procedures based on the 
affected sib-pair principle. However, although several 
claims of detecting linkage with particular marker loci 
in complex disorders such as schizophrenia and dia- 
betes have been made, it has proven difficult to con- 
firm these claims in other data sets and populations. It 
has been argued that the chance of detecting a linkage 
in complex disorders is increased when the studies are 
carried out in isolated populations derived from a 
small group of founder individuals within the last 
1000 years. The underlying argument is that such 
population isolates should exhibit much less genetic 
heterogeneity and it will be accordingly easier to de- 
tect linkage with rare alleles causing disease. Although 
this principle has been clearly demonstrated for auto- 
somal recessive genes, the same may not be true for the 
genes involved in complex disorders. Two historically
isolated populations, the Finnish and the Icelandic, are
being used extensively for gene mapping in complex
disorders, but so far they have failed to live up to their
promise. Is it possible that the disease alleles involved
are so common in all populations that there is no
advantage to be gained from using isolated populations?
Accordingly, it has recently been argued
that screening for complex disease genes by linkage 
analysis can better be carried out using populations 
from London and New York rather than Helsinki or 
Reykjavik. 
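The affected sib-pair comparison described above can be made concrete with one common statistic, the so-called mean test. The text does not specify this statistic, so it is used here only as an example. The sketch assumes that the number of alleles each affected pair shares identical by descent (0, 1, or 2) is known. Under no linkage these outcomes occur with probabilities 1/4, 1/2, and 1/4, so the expected proportion of alleles shared IBD is 50%. The counts below are hypothetical.

```python
import math


def asp_mean_test(n0: int, n1: int, n2: int) -> tuple[float, float, float]:
    """Mean IBD-sharing test for affected sib pairs.

    n0, n1, n2: numbers of pairs sharing 0, 1 or 2 alleles identical by descent.
    Returns (mean proportion shared, z statistic, one-sided p-value).
    """
    n = n0 + n1 + n2
    if n == 0:
        raise ValueError("need at least one sib pair")
    mean_share = (0.5 * n1 + 1.0 * n2) / n
    # Under no linkage the per-pair proportion (0, 0.5, 1) has mean 0.5, variance 0.125.
    standard_error = math.sqrt(0.125 / n)
    z = (mean_share - 0.5) / standard_error
    p_one_sided = 0.5 * math.erfc(z / math.sqrt(2.0))  # upper tail of N(0, 1)
    return mean_share, z, p_one_sided


# Hypothetical marker typed in 200 affected sib pairs.
share, z, p = asp_mean_test(n0=35, n1=100, n2=65)
print(f"mean IBD sharing {share:.3f}, z = {z:.2f}, one-sided p = {p:.4f}")
```

Here the pairs share 57.5% of alleles IBD rather than 50%, giving z = 3.0 (one-sided p ≈ 0.0013). The test depends on knowing IBD rather than merely IBS status, which is why the parental origin of each allele matters.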

The availability of large amounts of DNA sequence 
data during the last 2 years as a result of the sequencing 
activities in the Human Genome Project has led to 
defining single nucleotide substitution polymorphisms
(SNPs). These are extremely common and occur at an average
frequency of about 1 per 1000 nucleotides, giving a total of
approximately 3 × 10^6 SNPs for the whole genome. The major
technical advantage of SNPs for genetic marker studies is
their low analysis cost and high-throughput potential
compared with other markers. Another benefit of SNPs is that
the variant nucleotide might itself be responsible for the
disease phenotype being studied, particularly where SNPs
occur within coding sequences. An expected difficulty in
their use for linkage studies is their low polymorphism
content, similar to that of RFLPs. As a result, the
information from several adjacent SNPs will have to be
combined into a phase-known haplotype to deliver the same
level of information as a single VNTR or microsatellite
locus. It is anticipated that SNPs will be extremely useful
for detecting disease association by linkage disequilibrium
over short genomic distances (see below). Many novel methods
of rapid SNP detection are either under development or
already available, ranging from mass spectrometry to
oligonucleotide hybridization. Large SNP databases are being
constructed and made available in both the public and
private domains. However, despite the enormous attention now
being given to SNPs, they remain a theoretically interesting
marker whose potential has still to be realized.
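The contrast between biallelic SNPs and multiallelic VNTR or microsatellite markers can be illustrated with expected heterozygosity, H = 1 − Σ p_i². This is a standard measure of how often a random individual is informative (heterozygous) at a locus. The sketch uses hypothetical, equal allele frequencies. For the SNP haplotype it assumes the SNPs are in linkage equilibrium with one another, which gives an upper bound; SNPs in disequilibrium would yield fewer distinct haplotypes.

```python
from itertools import product
from math import prod


def heterozygosity(freqs: list[float]) -> float:
    """Expected heterozygosity H = 1 - sum(p_i^2) for allele (or haplotype) frequencies."""
    total = sum(freqs)
    return 1.0 - sum((f / total) ** 2 for f in freqs)


def haplotype_freqs(snp_freqs: list[list[float]]) -> list[float]:
    """Haplotype frequencies for several SNPs, assuming independence between them."""
    return [prod(combo) for combo in product(*snp_freqs)]


snp = [0.5, 0.5]               # best case for a single biallelic SNP
microsatellite = [1 / 8] * 8   # hypothetical STR with 8 equally common alleles
three_snp_haplotype = haplotype_freqs([snp, snp, snp])
snp_count = 3.5e9 / 1000       # one SNP per 1000 bp across 3.5e9 bp

print(f"single SNP:      H = {heterozygosity(snp):.3f}")
print(f"8-allele STR:    H = {heterozygosity(microsatellite):.3f}")
print(f"3-SNP haplotype: H = {heterozygosity(three_snp_haplotype):.3f}")
print(f"expected number of SNPs: {snp_count:.1e}")
```

A single SNP can never exceed H = 0.5. Combining three SNPs into a phase-known haplotype can, at best, match an eight-allele microsatellite.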

Linkage disequilibrium arises when alleles of two 
linked loci are located so close together on the same 
chromosome (in coupling) that their linkage relation- 
ship is never or extremely rarely disturbed by meiotic 
recombination. This phenomenon can be extremely important
in narrowing down the chromosome region in which a putative
disease gene is located, once linkage analysis has placed
the disease gene between two flanking markers of known map
location. For example, the cystic fibrosis (CF) locus was
first assigned to the long arm of chromosome 7 by linkage
analysis. Linkage disequilibrium between the ΔF508 allele of
the CF gene and alleles of adjacent flanking markers was
then used to narrow down significantly the region
encompassing the gene. In principle, the level of linkage
disequilibrium between a disease allele and the alleles of
a series of linked marker loci is determined by the
distance between them. Directly adjacent loci show complete
linkage disequilibrium, and linkage disequilibrium decreases
with increasing distance. In the case of the CF gene,
changes in the level of linkage disequilibrium between the
original two flanking markers suggested that the gene must
lie midway between them, which helped in locating the gene.
There are conflicting estimates of the average distance
over which complete linkage disequilibrium extends, ranging
from 3 to 50 kb. In practice, the relationship between
linkage disequilibrium and distance appears to vary widely
from one chromosome region to another, and each region has
to be tested on its own merits. The lower estimate of 3 kb
implies that over 1 million equally spaced SNPs will be
required to cover the entire human genome when searching by
linkage disequilibrium for disease alleles in complex
diseases. Most workers in the field are hoping fervently
that the average disequilibrium distance will turn out to
be much larger than 3 kb, so that a much smaller number of
markers will be needed for disease allele detection by
linkage disequilibrium.
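The text describes linkage disequilibrium qualitatively. The sketch below illustrates it with the standard pairwise measures D, D′, and r², which the text does not define and which are included here only as an example. They are computed from hypothetical two-locus haplotype frequencies, and D′ = 1 corresponds to the complete disequilibrium described above. The sketch also repeats the marker-count arithmetic for disequilibrium distances of 3 kb and 50 kb, taking the genome to be 3.5 × 10^9 bp.

```python
import math


def ld_measures(f_AB: float, f_Ab: float, f_aB: float, f_ab: float) -> tuple[float, float, float]:
    """Return (D, |D'|, r^2) from the four haplotype frequencies for alleles A/a and B/b."""
    total = f_AB + f_Ab + f_aB + f_ab
    f_AB, f_Ab, f_aB, f_ab = (f / total for f in (f_AB, f_Ab, f_aB, f_ab))
    p_A = f_AB + f_Ab                 # frequency of allele A at the first locus
    p_B = f_AB + f_aB                 # frequency of allele B at the second locus
    if not (0.0 < p_A < 1.0 and 0.0 < p_B < 1.0):
        raise ValueError("both loci must be polymorphic")
    d = f_AB - p_A * p_B              # departure from random association of alleles
    if d >= 0:
        d_max = min(p_A * (1 - p_B), (1 - p_A) * p_B)
    else:
        d_max = min(p_A * p_B, (1 - p_A) * (1 - p_B))
    d_prime = abs(d) / d_max          # 1.0 means complete disequilibrium
    r_squared = d * d / (p_A * (1 - p_A) * p_B * (1 - p_B))
    return d, d_prime, r_squared


def markers_needed(genome_bp: float, spacing_bp: float) -> int:
    """Equally spaced markers needed to cover the genome at the given spacing."""
    return math.ceil(genome_bp / spacing_bp)


for haplotypes in ((0.4, 0.1, 0.1, 0.4), (0.3, 0.0, 0.2, 0.5)):
    d, d_prime, r2 = ld_measures(*haplotypes)
    print(f"{haplotypes}: D = {d:.3f}, D' = {d_prime:.2f}, r^2 = {r2:.2f}")

for spacing_kb in (3, 50):
    print(f"{spacing_kb} kb spacing: {markers_needed(3.5e9, spacing_kb * 1_000):,} markers")
```

In the second example one haplotype class is absent, so D′ = 1. Because the allele frequencies at the two loci differ, r² is only about 0.43. At 3 kb spacing the genome needs about 1.17 million markers, compared with 70,000 at 50 kb.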


Other Forms of Linkage (Syntenic) 
Analysis 


In 1971 Renwick coined the term ‘synteny’ to apply 
to genes carried on the same chromosome. It was 
realized that genes carried on the same chromosome are not
necessarily genetically linked. They may lie so far apart
that recombination between them reaches the maximum
frequency of 50%, so that they segregate as if they were on
different chromosomes. Accordingly, all linked genes are by
definition syntenic, but not all syntenic genes are linked.


Somatic Cell Hybrid Analysis 

In the mid-1960s an extremely powerful method of 
gene mapping for detecting syntenic relationships 
emerged from independent contributions by Henry 
Harris, Boris Ephrussi, and John Littlefield and was 
termed ‘somatic cell hybridization.’ The method works as
follows. Tissue culture cells from two different species are
fused together using Sendai virus. If one of the cell lines
carries an enzyme deficiency, cell hybrids can be selected
by their capacity to grow in selection medium. In 1967 Weiss and Green fused a permanent
mouse cell line which was deficient in the enzyme 
thymidine kinase (TK) with human diploid fibroblasts 
and grew the hybrids in hypoxanthine-aminopterin- 
thymidine (HAT) selection medium which kills 
TK-deficient cells. The enzyme deficiency was com- 
plemented in those mouse cells which had successfully 
fused with human cells and could make use of human 
TK activity. Weiss and Green noted that human 
chromosome 17 was consistently retained and concluded
that the gene for human TK must be on the same
chromosome. Shortly thereafter, a similar selection 
system was developed in mouse and Chinese hamster 
cell lines based on deficiencies of the HGPRT locus on 
the X chromosome. These developments led to the 
observation that hybrids between mouse and human 
cells tended to lose most human chromosomes except 
for the chromosome providing the selectable marker 
and that the loss of the other chromosomes was more 
or less random. By comparing the pattern of retention 
and loss of human chromosomes in independently 
derived somatic cell hybrid lines with the pattern 
of specific human gene products, such as isoenzymes 
and cell membrane markers, it was possible to as- 
sign human genes to individual chromosomes. The 
approach was quickly implemented by many labora- 
tories and the majority of genes mapped during the 
1970s and early 1980s were mapped using somatic cell 
hybrids. The use of human cell lines containing trans- 
locations which divided particular chromosomes into 
two pieces that could be separated from each other in the
somatic cell hybrid system permitted mapping human 
genes to subregions of chromosomes. This was termed 
regional assignment. Various groups developed exten- 
sive regional mapping panels of somatic cell hybrids 
to permit rapid construction of maps for individual 
chromosomes. For example, such panels for the X 
chromosome used nearly 50 different translocation 
breakpoints to divide the chromosome into small seg- 
ments or bins into which genes were mapped. The 
resolution provided by such mapping panels exceeded 
the resolution of the chromosome banding patterns 
which is approximately 5 Mb. 
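The logic of assigning a gene to a chromosome from a hybrid panel can be sketched as a concordance calculation. For each human chromosome, count the hybrids in which the presence of that chromosome and the presence of the human gene product disagree, then pick the chromosome with the lowest discordance. The panel below is entirely hypothetical (six hybrids, a handful of chromosomes). It is constructed so that the marker follows chromosome 17, as human TK did in Weiss and Green's hybrids.

```python
def discordance(panel: dict[str, set[str]], marker: dict[str, bool]) -> dict[str, float]:
    """Fraction of hybrids in which each human chromosome and the marker disagree.

    panel:  hybrid name -> set of human chromosomes retained in that hybrid
    marker: hybrid name -> True if the human gene product was detected
    """
    chromosomes = set().union(*panel.values())
    return {
        chrom: sum((chrom in panel[h]) != present for h, present in marker.items()) / len(marker)
        for chrom in chromosomes
    }


# Hypothetical typing results for six mouse-human hybrids.
panel = {
    "H1": {"1", "5", "17", "X"},
    "H2": {"2", "5", "X"},
    "H3": {"1", "17"},
    "H4": {"2", "3", "X"},
    "H5": {"3", "5", "17"},
    "H6": {"1", "2"},
}
marker = {"H1": True, "H2": False, "H3": True, "H4": False, "H5": True, "H6": False}

scores = discordance(panel, marker)
best = min(scores, key=scores.get)
for chrom in sorted(scores, key=scores.get):
    print(f"chromosome {chrom:>2}: discordance {scores[chrom]:.2f}")
print("assigned to chromosome", best)
```

Only chromosome 17 is fully concordant (discordance 0.00). With so few hybrids, however, other chromosomes could match a marker by chance, so real panels need many more independently derived lines.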


Radiation Hybrid Mapping 

In 1975 Henry Harris and Stephen Goss used radi- 
ation to induce breakage in the human chromosomes 
present in a hybrid cell to divide a given chromosome 
into small fragments, most of which were subsequently 
lost. This then permitted localization of genes to small 
chromosome regions. David Cox adapted the same 
principle to develop radiation hybrid mapping (RH 
mapping) in which human cells are strongly irradiated 
and then fused to a rodent cell line using standard 
somatic cell hybridization techniques. A panel of
approximately 100 such hybrid cell lines would together
contain the whole human genome as random, overlapping
fragments. Cox reasoned that two adjacent genes would very
likely be carried on the same fragment and so be present
together in a given cell line. The greater the distance
between the two genes, the lower the likelihood that they
would be present together. In many ways the procedure
depends on the same principles as meiotic recombination
mapping, in that the frequency of radiation-induced breaks
occurring between two genes is a direct function of the
distance between them. The segregation frequency varies
from 0 (the two markers are never separated) to 1.0 (the
two markers are always broken apart and are therefore
unlinked). A mapping function was used to correct for the
underestimate of the segregation frequency when the markers
are far apart. This function is almost identical to the
Haldane mapping function used in meiotic recombination
studies. The units of distance were known as centirays
(cR). Further, by varying the radiation dose and generating
fragments of different average lengths, it was possible to
create maps at different resolutions. Chiasmata are clearly
not randomly distributed, which makes the genetic map longer
towards the ends of the chromosomes. Radiation breaks, it
was quickly discovered, occur more or less randomly, so RH
maps are proportional to physical distance along the whole
length of the chromosome. The only requirement was that the presence of
each human marker could be individually detected 
by a PCR reaction in a DNA sample of a radiation 
hybrid. This development opened the way to rapidly 
establishing the map position of DNA markers, includ- 
ing sequence tagged sites (STSs), expressed sequence 
tags (ESTs), and STRs, which were being generated 
and used in large numbers in the genome project. In 
principle the whole gene mapping community could 
use the same RH cell lines to construct a map based on 
data from many centers. A particularly powerful fea- 
ture of the system is that the DNA from standard sets 
of RH cell lines is commercially available permitting 
everyone to map individual markers in their own 
laboratory. This is done by determining which lines 
from the standard set are positive or negative for a 
given marker by PCR and sending the results to a 
server such as those at the Sanger Centre, the Stanford 
Human Genome Center, or the Whitehead Institute. 
The map location with supporting evidence for the 
reliability of the result and information on the sur- 
rounding markers is returned within minutes: gene 
mapping made easy. 
Sanger Centre: http://www.sanger.ac.uk/Software/Rhserver/
Stanford Human Genome Center: http://www-shgc.stanford.edu/RH/index.html
Whitehead Institute: http://carbon.wi.mt.edu:8000/cgi-bin/contig/rhmapper.pl
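The RH mapping function mentioned above is usually written D = −ln(1 − θ), with D in rays and θ the breakage (segregation) frequency between two markers. Multiplying by 100 gives centirays at the radiation dose used. The sketch below assumes that form and that θ has already been estimated. Estimating θ from hybrid typing data also requires the retention frequency of the human fragments, which is omitted here. Function names are illustrative.

```python
import math


def rh_distance_cr(theta: float) -> float:
    """Distance in centirays from the breakage frequency theta (0 <= theta < 1)."""
    if not 0.0 <= theta < 1.0:
        raise ValueError("theta must lie in [0, 1)")
    return -100.0 * math.log(1.0 - theta)


def rh_breakage(distance_cr: float) -> float:
    """Inverse: expected breakage frequency for a distance in centirays."""
    if distance_cr < 0.0:
        raise ValueError("distance must be non-negative")
    return 1.0 - math.exp(-distance_cr / 100.0)


for theta in (0.1, 0.3, 0.5, 0.8):
    print(f"theta = {theta:.1f} -> {rh_distance_cr(theta):6.1f} cR")
```

As with the Haldane function, small breakage frequencies map almost linearly (θ = 0.1 gives about 10.5 cR), while θ = 0.8 corresponds to about 161 cR. Because centirays depend on the radiation dose, distances from panels made with different doses are not directly comparable.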


RH mapping has become a standard mapping method 
used in many organisms besides humans. These 
include mouse, dog, pig, cow, chicken, and zebrafish. 


Mapping by Chromosome in situ 
Hybridization 

Chromosome in situ hybridization was introduced in 
the late 1960s by Pardue and Gall to locate the position 
of highly repetitive ribosomal genes in Drosophila 
polytene chromosomes. Essentially the method entails 
labeling a particular type or segment of DNA with 
tritiated thymidine, hybridizing the labeled DNA 
directly to chromosome preparations on a microscope 
slide, and detecting the position of labeling above 
the chromosomes by exposure to an extremely sensi- 
tive and thin layer of photographic emulsion placed 
above the preparation. This method was termed auto- 
radiography and initially could only be used to locate 
the chromosomal position of highly repetitive types of 
DNA. The ribosomal genes and several categories of satellite
DNA were mapped in this way in man and other organisms,
mainly primates, in the 1970s. Early claims were also made
to have mapped single-copy genes using the method, but the
mapping community was highly skeptical because of the low
sensitivity of the procedure. At an early human gene mapping
conference one of these claims was discussed and evoked
the wry comment “the claimed result could only have 
been achieved by starting the autoradiographic expos- 
ure some time in the Pleistocene.” However, by 
improving the labeling and hybridization conditions 
it finally became possible to detect single copy DNA 
sequences using autoradiography in the late 1970s. 
Over the following years the chromosome location 
of several human genes for which cloned DNA 
sequences were then available was established using 
autoradiography. However, the low signal-to-noise ratio made
the method extremely time-consuming. It involved scoring the
chromosomal location of silver grains over many metaphases
to determine whether there was a statistically significant
excess of grains at one particular chromosome location. A significant breakthrough came in the
mid-1980s with the introduction of fluorescent 
DNA probe labeling which was termed fluorescent 
in situ hybridization (FISH). In this technique, the
DNA probe is labeled using nucleotides that have been
modified beforehand by incorporating reporter molecules,
such as digoxigenin or biotin. These probes are then
hybridized to metaphase chromosomes. The reporter
molecules are subsequently detected either by binding of
a specific antibody to which a fluorescent dye has been
attached, as in the case of digoxigenin, or by binding
of fluorescently labeled avidin to biotin. The preparations
are examined under a fluorescence microscope and give a
much higher signal-to-noise ratio than radioactive labeling.
Although it is usual and easier to use relatively large DNA
probes (> 40 kb), the method permits detection of signals
using probes down to about 3 kb in length. This method was
immediately used to determine or confirm the chromosome
location of many genes. In particular, by simultaneously
hybridizing several probes, each labeled with a fluorochrome
with a unique emission wavelength, it was possible to
establish the local order of DNA sequences on the
chromosome. In the early 1990s probes became available for
many human loci, including regions involved in chromosome
microdeletion syndromes such as the DiGeorge, Prader-Willi,
and Langer-Giedion syndromes. FISH analysis has become the
standard method of detecting such deletions: a heterozygous
deletion gives a signal on only one of the two homologs,
instead of on both as in normal individuals (see Figure 1).
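The autoradiographic grain scoring described above amounts to a significance test: are more silver grains found over one chromosome segment than random background would put there? The following is a minimal sketch, not the published procedure. It assumes that background grains land on chromosomes in proportion to relative length, so the number over a segment follows a binomial distribution. The function names and counts are hypothetical.

```python
from math import comb


def binomial_upper_tail(k: int, n: int, p: float) -> float:
    """Probability of k or more successes in n independent Bernoulli(p) trials."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def grain_enrichment_p(grains_in_segment: int, total_grains: int,
                       segment_fraction: float) -> float:
    """One-sided p-value that a segment carries more grains than expected.

    Null model (an illustrative assumption): each background grain falls on
    the segment with probability equal to the segment's share of the total
    haploid chromosome length.
    """
    if not 0 < segment_fraction < 1:
        raise ValueError("segment_fraction must lie strictly between 0 and 1")
    return binomial_upper_tail(grains_in_segment, total_grains, segment_fraction)


if __name__ == "__main__":
    # Hypothetical counts pooled over many metaphases: 200 grains in total,
    # 20 of them over a segment that makes up 2% of the haploid length.
    p = grain_enrichment_p(grains_in_segment=20, total_grains=200, segment_fraction=0.02)
    print(f"p = {p:.2e}")
```

When many segments are screened, the smallest p-value should be corrected for multiple testing, for example by multiplying it by the number of segments tested (a Bonferroni correction).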

However, the compaction of the metaphase chromosome does
not permit distinguishing the locations of probes that are
closer together than about 3 Mb; under the microscope such
signals appear to lie on top of each other. Trask and
Lawrence independently introduced interphase analysis, in
which the distance between probe-specific signals is
measured in the nuclei of nondividing cells. This permitted
determining the order of DNA sequences down to about 100 kb.
A further increase in spatial resolution came with the
introduction of extended DNA fiber analysis. In this method
DNA fibers are first released from nuclei by controlled lysis
directly onto microscope slides by one of several methods and
then hybridized, giving a spatial resolution down to about 3 kb.
These high-resolution methods have become an 
important part of establishing the local order and dis- 
tance between specific DNA sequences. Applications 
which have used this approach include determining 
the order of cloned segments within the Duchenne 
muscular dystrophy gene and establishing the position 
of breakpoints in specific genes in cases of leukemia. 
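The interphase and fiber methods above both reduce to simple distance reasoning. The following is a minimal sketch under two assumptions. First, mean interphase distance increases monotonically with genomic separation over the range studied, so the two probes that lie farthest apart flank the third. Second, fibers are fully extended B-form DNA at 0.34 nm per base pair, which is about 2.9 kb per micrometre. Real fiber preparations stretch unevenly, so in practice the conversion factor should be calibrated with a probe of known length. All names and measurements here are hypothetical.

```python
B_DNA_RISE_NM = 0.34  # rise per base pair of B-form DNA, in nanometres


def order_three_probes(mean_dist: dict[frozenset[str], float]) -> tuple[str, str, str]:
    """Order three probes from their mean pairwise interphase distances.

    Assumes distance grows monotonically with genomic separation, so the
    pair with the largest mean distance marks the two ends.
    """
    if len(mean_dist) != 3 or any(len(pair) != 2 for pair in mean_dist):
        raise ValueError("expected mean distances for exactly three probe pairs")
    probes = frozenset().union(*mean_dist)
    if len(probes) != 3:
        raise ValueError("expected exactly three distinct probes")
    outer = max(mean_dist, key=mean_dist.get)
    (middle,) = probes - outer
    first, last = sorted(outer)
    return (first, middle, last)


def fiber_length_kb(length_um: float, kb_per_um: float = 1 / B_DNA_RISE_NM) -> float:
    """Convert a measured fiber signal length (micrometres) to kilobases.

    The default assumes fully extended B-DNA: 1000 nm / 0.34 nm per bp is
    about 2941 bp, i.e. about 2.94 kb per micrometre.
    """
    return length_um * kb_per_um


if __name__ == "__main__":
    # Hypothetical mean interphase distances in micrometres
    distances = {
        frozenset({"A", "B"}): 0.45,
        frozenset({"B", "C"}): 0.52,
        frozenset({"A", "C"}): 0.81,
    }
    print(order_three_probes(distances))    # A and C flank B
    print(round(fiber_length_kb(10.0), 1))  # length in kb of a 10 um fiber signal
```

With three or more probes the same idea generalizes to finding the order that best fits all pairwise distances, but the three-probe case shows the core reasoning.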

An important discovery was that when a probe 
library was created from a sample of a single chromo- 
some isolated by flow sorting, the library could be 
fluorescently labeled and used to detect sequences 
for the entire chromosome. This was referred to as 
chromosome painting. Flow-sorted libraries have now 
been created for all human chromosomes and many 
other mammalian species also. Observations on the 
interphase nucleus using chromosome paints showed 
that each chromosome is located within its own three- 
dimensional territory with very little overlap between 
chromosomes. Subsequent improvements in fluorescent
labeling and image analysis now permit visualizing all 24
different human chromosomes simultaneously, each in a unique
color. This approach is referred to as multicolor FISH or
M-FISH. The system is particularly powerful for detecting
small translocations, which appear as chromosomes labeled in
two different colors (see Figure 2). Several M-FISH systems
are now commercially available and are being rapidly
introduced into clinical cytogenetic laboratories for the
analysis of complex chromosome rearrangements in relation to
congenital abnormalities and malignancies.

Figure 1 Single cosmid signal on the normal X chromosome
in a female carrier patient carrying a microdeletion of the
Duchenne muscular dystrophy gene on the other X chromosome
(signal absent).
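Displaying 24 chromosomes each in a unique color is possible because each paint can be labeled with a combination of fluorochromes rather than a single dye. With n fluorochromes there are 2^n - 1 non-empty combinations, so five are enough for 24 targets (2^5 - 1 = 31); five is also the number used in the original M-FISH scheme. The sketch below is illustrative only. The dye names F1 to F5 and the assignment of combinations to chromosomes are hypothetical and do not correspond to any commercial kit.

```python
from itertools import combinations

HUMAN_CHROMOSOMES = [str(i) for i in range(1, 23)] + ["X", "Y"]


def min_fluorochromes(n_targets: int) -> int:
    """Smallest n such that 2**n - 1 non-empty dye combinations cover n_targets."""
    n = 1
    while 2**n - 1 < n_targets:
        n += 1
    return n


def assign_codes(targets: list[str], dyes: list[str]) -> dict[frozenset[str], str]:
    """Give each target a distinct dye combination, using fewer dyes first."""
    codes = [
        frozenset(combo)
        for k in range(1, len(dyes) + 1)
        for combo in combinations(dyes, k)
    ]
    if len(codes) < len(targets):
        raise ValueError("not enough dye combinations for all targets")
    return dict(zip(codes, targets))


def classify_segments(observed: list[frozenset[str]],
                      lookup: dict[frozenset[str], str]) -> list[str]:
    """Translate the dye sets seen along one chromosome into chromosome origins."""
    return [lookup.get(code, "unclassified") for code in observed]


if __name__ == "__main__":
    dyes = [f"F{i}" for i in range(1, min_fluorochromes(24) + 1)]  # hypothetical names
    lookup = assign_codes(HUMAN_CHROMOSOMES, dyes)
    code_for = {chrom: code for code, chrom in lookup.items()}
    # A hypothetical derivative chromosome: material from 8 joined to material from 12
    derivative = [code_for["8"], code_for["8"], code_for["12"]]
    print(len(dyes), classify_segments(derivative, lookup))
```

A translocation shows up exactly as the text describes: a single chromosome whose segments decode to two different origins.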

It was discovered by Wienberg, Ferguson-Smith, 
and colleagues that chromosome-specific painting 
could be applied across species so that a chromosome 
paint made from flow-sorted chromosomes in one 
species would also detect the homologous sequences 
in the other. This has permitted tracing karyotypic 
evolution between different mammalian groups 
and defining which human syntenic relationships are 
ancient and which have arisen recently. Surprisingly, 
the overall structure of mammalian genomes seems to 
have been conserved over very long periods of time 
with occasional disruptions of syntenic regions by 
translocation. In particular the method permits map- 
ping the redistribution of syntenic fragments gener- 
ated by translocation during the evolution of new 
species and estimating the minimal number of
translocations that must have taken place. In general the
results of this cross-species hybridization for defining
conserved syntenies match the results of gene mapping
carried out by other methods. Achieving chromosome painting
between species that have been evolutionarily separated for
many tens of millions of years is surprising because, given
the low proportion of coding sequences in the genome, it
implies conservation not only of coding but also of
noncoding sequences.


Figure 2 Multicolor FISH showing several transloca- 
tions in a colon cancer cell. 


In 1992 a technique called comparative genomic
hybridization (CGH) was introduced by Kallioniemi. By
simultaneously hybridizing total genomic test and normal
control DNAs, labeled with different fluorescent dyes, to
chromosome preparations, increases or decreases in the signal
of the test DNA relative to the control can be detected along
individual chromosomes. The ratio of the two signals indicates
changes in the number of copies of different chromosome
regions caused by aneuploidy or local amplification. Complex
image analysis equipment and software are necessary to carry
out the analysis and calculate the level of aneuploidy. The
method has been instrumental in mapping the chromosome
location of oncogenes that undergo local amplification in
tumor tissue. In general, the method is applicable to studying
the genomic location of copy number changes in tissues from
which it is difficult or impossible to obtain chromosome
preparations.
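Reading the test-to-control ratio as a copy number can be sketched as follows. This is a minimal illustration that relies on several assumptions. Equal amounts of test and diploid control DNA are hybridized, and the test sample is not diluted by normal cells. The gain and loss thresholds of 1.25 and 0.75 are placeholders, not the criteria of any particular laboratory. Under these assumptions, a ratio r corresponds to roughly 2r copies of the region.

```python
def estimated_copies(ratio: float, reference_ploidy: int = 2) -> float:
    """Approximate copy number of a region from its test/control fluorescence ratio."""
    return ratio * reference_ploidy


def classify_ratio(ratio: float, gain: float = 1.25, loss: float = 0.75) -> str:
    """Label a region as gained, lost, or balanced (thresholds are placeholders)."""
    if ratio >= gain:
        return "gain"
    if ratio <= loss:
        return "loss"
    return "balanced"


def profile(ratios_by_band: dict[str, float]) -> dict[str, tuple[str, float]]:
    """Classify each band of a chromosome and attach its approximate copy number."""
    return {band: (classify_ratio(r), estimated_copies(r))
            for band, r in ratios_by_band.items()}


if __name__ == "__main__":
    # Hypothetical mean ratios along one chromosome arm
    bands = {"q11": 1.02, "q21": 0.51, "q24": 2.4}
    for band, (call, copies) in profile(bands).items():
        print(band, call, round(copies, 1))
```

In tumor samples, admixed normal cells pull ratios toward 1. As a result, high-level amplifications are easier to detect than single-copy gains or losses.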

Table 2 Different types of physical map

Type of map | Method | Resolution
Chromosome breakpoint maps | Translocation breakpoints positioned using chromosome banding | ~3 Mb
In situ hybridization maps | Metaphase chromosome analysis positioned using chromosome banding | ~3 Mb
  | Interphase analysis | ~100 kb
  | Extended DNA fiber analysis | ~3 kb
Induced chromosome fragmentation maps | Radiation hybrids (not itself a direct physical mapping method, but used to order markers) | 500 kb to 2 Mb, depending on radiation dose used
Long-range restriction map | NotI restriction maps | Several hundred kb
Clone contig map | Overlaps between clones detected with STSs | 150 kb with PACs; 40 kb with cosmids
Transcript map | ESTs located by RH mapping | 500 kb to 2 Mb
DNA sequence map | Complete nucleotide sequence | 1 bp

A new method of examining copy number changes in
chromosome regions, which is now emerging and is likely to
replace or supplement cytogenetic investigation by
microscopy, is CGH microarray analysis. Essentially, the
method involves spotting probes derived from many chromosome
regions onto a glass slide as a tightly grouped grid of spots.
By simultaneously hybridizing test and control DNAs labeled
with different fluorescent dyes, increases or decreases in the
signal of the test relative to the control can be detected in
the same way as in CGH analysis on chromosomes. Work is under
way to use 3300 bacterial artificial chromosome (BAC) clones,
premapped and equally spaced along the entire human genome at
an average distance of 1 Mb. Early results by Albertson
indicate that this technique will be a powerful method for
determining the map position and size of chromosome
abnormalities that result in locus copy number changes, such
as duplications and deletions.
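On an array, each spotted clone yields its own test/control ratio. Because the clones are premapped, runs of neighboring clones with a shifted ratio give both the position and the approximate size of a duplication or deletion. The following is a minimal sketch with several assumptions:

- log2 ratios are centered on the array median, on the premise that most of the genome is balanced;
- 0.3 is a placeholder threshold for calling gains and losses;
- the clones are supplied in map order.

The function names and data are hypothetical. With clones about 1 Mb apart, as in the BAC set described above, the boundaries of an aberration can only be placed to within about one clone interval.

```python
import math
from statistics import median


def call_segments(positions_mb, test, ref, threshold=0.3):
    """Return (call, first_mb, last_mb, n_clones) for each run of aberrant clones.

    Clones must be given in map order. Ratios are converted to log2, centered
    on the array median, and clones beyond +/- threshold are called gain/loss.
    """
    log2 = [math.log2(t / r) for t, r in zip(test, ref)]
    centre = median(log2)
    calls = []
    for value in log2:
        shifted = value - centre
        if shifted > threshold:
            calls.append("gain")
        elif shifted < -threshold:
            calls.append("loss")
        else:
            calls.append("normal")

    segments = []
    start = 0
    for i in range(1, len(calls) + 1):
        # close the current run at the end of the array or when the call changes
        if i == len(calls) or calls[i] != calls[start]:
            if calls[start] != "normal":
                segments.append((calls[start], positions_mb[start],
                                 positions_mb[i - 1], i - start))
            start = i
    return segments


if __name__ == "__main__":
    positions = [float(mb) for mb in range(10)]  # hypothetical clones 1 Mb apart
    reference = [1000.0] * 10
    test = [1000.0, 980.0, 1010.0, 500.0, 510.0, 490.0, 1000.0, 995.0, 1600.0, 1005.0]
    for segment in call_segments(positions, test, reference):
        print(segment)
```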


Physical Mapping 


Various types of physical map were originally envis- 
aged to be constructed within the Human Genome 
Project with the emphasis on creating clone contig 
maps. However, technical developments led to more types of
map being created than originally expected. Table 2 lists
the large-scale physical mapping strategies that were
ultimately used within the project.

Chromosome breakpoint, in situ hybridization, 
and induced chromosome fragmentation maps have 
already been considered. 


Long-Range Restriction Maps 

Constructing long-range restriction maps depends on the use
of restriction enzymes, such as NotI, which cleave DNA very
rarely. Typically, these enzymes recognize sequences
containing CpG dinucleotides, which occur rarely in vertebrate
DNA, and generate fragments that are usually several hundred
kilobases in length. A technique called pulsed-field gel
electrophoresis (PFGE) is used to separate the resulting long
DNA fragments on agarose gels by periodically changing the
direction of the electric field. Following electrophoresis the
fragments are analyzed by filter hybridization with sequences
believed to be in the chromosome region of interest. The
structure of several human chromosome regions has been
analyzed using this approach. In an early application of the
method in the mid-1980s, the Duchenne muscular dystrophy gene
was studied and proved to be the longest known human gene,
with a genomic length of 2.5 Mb. This result was later
confirmed using large insert clones. A NotI restriction map
has been created for the entire long arm of human chromosome
21.
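A back-of-the-envelope calculation shows why an enzyme such as NotI yields fragments of several hundred kilobases. NotI recognizes the 8 bp site GCGGCCGC. This minimal sketch assumes a random sequence with independent bases and a given G+C fraction. Real genomes violate this assumption, since CpG dinucleotides are depleted overall but concentrated in CpG islands. At 40% G+C each G or C has probability 0.2, so the site is expected once every 1/0.2^8 bp, or about 390 kb. At 50% G+C the expected spacing is 4^8 = 65 536 bp. The optional `cpg_obs_exp` factor is a hypothetical parameter that down-weights each CpG in the site to mimic CpG depletion. Treat all outputs as order-of-magnitude estimates.

```python
def expected_site_spacing(site: str, gc_fraction: float, cpg_obs_exp: float = 1.0) -> float:
    """Mean distance in bp between occurrences of `site` in a random sequence.

    Assumes independent bases with G = C = gc_fraction / 2 and
    A = T = (1 - gc_fraction) / 2. Each CpG dinucleotide in the site is
    further multiplied by cpg_obs_exp (1.0 means no CpG depletion).
    """
    if not 0 < gc_fraction < 1:
        raise ValueError("gc_fraction must lie strictly between 0 and 1")
    base_prob = {
        "G": gc_fraction / 2,
        "C": gc_fraction / 2,
        "A": (1 - gc_fraction) / 2,
        "T": (1 - gc_fraction) / 2,
    }
    site = site.upper()
    p = 1.0
    for base in site:
        p *= base_prob[base]
    p *= cpg_obs_exp ** site.count("CG")  # "CG" cannot overlap itself
    return 1 / p


if __name__ == "__main__":
    for name, site in [("EcoRI", "GAATTC"), ("NotI", "GCGGCCGC")]:
        for gc in (0.5, 0.4):
            kb = expected_site_spacing(site, gc) / 1000
            print(f"{name} at {gc:.0%} G+C: one site per ~{kb:,.0f} kb")
```

Comparing the 6 bp EcoRI site with the 8 bp NotI site shows how site length and G+C-only composition together push NotI fragments into the size range that needs PFGE.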


Clone Contig Maps 

Further developments in cloning vector design using
recombinant DNA technology led to the construction of large
insert cloning vectors capable of accepting foreign DNA
inserts much larger than those previously cloned in either
plasmids (5 kb) or lambda phage (20 kb). The major vectors
involved were cosmids (40 kb inserts), YACs (300 kb to 3 Mb
inserts), PACs (120 kb inserts), P1 vectors (100 kb inserts),
and BACs (150 kb inserts). The vectors were used to create
clone libraries of either total genomic DNA or DNA of reduced
genomic complexity isolated from individual chromosomes,
separated either by flow sorting (cosmids) or by using
somatic cell hybrids containing a single human chromosome
(all vectors). DNA used for cloning was subjected to partial
digestion to ensure the creation of overlapping fragments.
The isolated clones were subsequently used to create clone
contig maps by detecting overlaps between clones using a
variety of methods such as chromosome walking, repetitive
DNA fingerprinting, and STS content mapping. STS content
mapping is based on the high specificity and sensitivity of
PCR and has provided the most robust and universally
applicable method for constructing clone contig maps. Once a
contig has been made, a selection of the clones is made that
provides the minimum number of clones needed to cover the
entire chromosome; this is referred to as the minimal tiling
path. Because of their very large insert capacity, YACs were
initially used for creating clone contig maps of individual
chromosomes. However, YACs tend to be genetically unstable,
losing part of their inserts, and often carry chimeric
inserts due to coligation of fragments from two different
genome locations. As a result the mapping community has
gradually moved over to using BACs, which, although they
carry smaller inserts than YACs, seem to be much more stable
and provide more reliable mapping results. The cloned contigs
finally used for large-scale genomic sequencing have
generally been derived from BACs.
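STS content mapping and the choice of a minimal tiling path can each be expressed as a small algorithm. The sketch below is a simplified illustration, not the procedure of any particular genome center. It makes three assumptions:

- Clones are treated as overlapping when they share at least one STS. A union-find structure then groups them into contigs.
- The order of the STSs is known independently, for example from an RH map.
- Each clone contains every STS lying between its two outermost STSs.

A greedy interval cover then picks the fewest clones that span a contig, with each clone overlapping the next. Clone and STS names are hypothetical.

```python
from collections import defaultdict


def build_contigs(sts_content: dict[str, set[str]]) -> list[set[str]]:
    """Group clones into contigs; clones sharing any STS are joined."""
    parent = {clone: clone for clone in sts_content}

    def find(clone: str) -> str:
        while parent[clone] != clone:
            parent[clone] = parent[parent[clone]]  # path halving
            clone = parent[clone]
        return clone

    first_clone_with = {}  # STS -> first clone seen containing it
    for clone, markers in sts_content.items():
        for sts in markers:
            if sts in first_clone_with:
                parent[find(clone)] = find(first_clone_with[sts])
            else:
                first_clone_with[sts] = clone

    groups = defaultdict(set)
    for clone in sts_content:
        groups[find(clone)].add(clone)
    return list(groups.values())


def minimal_tiling_path(contig: set[str], sts_content: dict[str, set[str]],
                        sts_order: list[str]) -> list[str]:
    """Greedily choose the fewest overlapping clones spanning a contig."""
    index = {sts: i for i, sts in enumerate(sts_order)}
    # each clone covers the interval between its leftmost and rightmost STS
    spans = sorted(
        (min(index[s] for s in sts_content[c]), max(index[s] for s in sts_content[c]), c)
        for c in contig
    )
    left_end = spans[0][0]
    right_end = max(end for _, end, _ in spans)

    first = max((span for span in spans if span[0] == left_end), key=lambda span: span[1])
    path, reach = [first[2]], first[1]
    while reach < right_end:
        # the next clone must start at or before the path's rightmost STS, so they overlap
        candidates = [span for span in spans if span[0] <= reach < span[1]]
        if not candidates:
            raise ValueError("no clone extends the path: the contig has a gap")
        best = max(candidates, key=lambda span: span[1])
        path.append(best[2])
        reach = best[1]
    return path


if __name__ == "__main__":
    order = [f"s{i}" for i in range(1, 10)]
    content = {
        "A": {"s1", "s2"}, "B": {"s2", "s3", "s4"}, "C": {"s3", "s4", "s5"},
        "D": {"s4", "s5", "s6"}, "E": {"s1", "s2", "s3"},
        "G": {"s8"}, "H": {"s8", "s9"},
    }
    for contig in build_contigs(content):
        print(sorted(contig), "->", minimal_tiling_path(contig, content, order))
```

Choosing, at each step, the overlapping clone that reaches farthest is the standard greedy solution to interval covering. This choice is what keeps the tiling path minimal.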


Transcript Maps 

Gene sequences represent less than 3% of the total human
genome, and various methods have been employed to identify
coding sequences in genomic DNA. The starting point is
frequently a large insert genomic clone with a known physical
map location. Because coding DNA is much more strongly
conserved during evolution than noncoding DNA, a positive
signal when the clone is hybridized to DNA from a variety of
animal species on a Southern blot indicates the presence of
coding sequences in the clone. This method is known as zoo
blot hybridization. Other methods for locating gene sequences
include CpG island identification, exon trapping, and
analysis of DNA sequence data to search for homologies with
other gene sequences in the DNA sequence databases or to
predict the presence of exons (exon prediction). EST mapping
has proved to be the quickest and easiest method of
developing transcript maps. Expressed sequence tags (ESTs)
are relatively short DNA sequences (usually 200-300
nucleotides), generally derived from the 3’ ends of cDNA
clones, from which PCR primers can be designed and used to
detect the presence of the specific coding sequence in
genomic DNA. At the beginning
of 2001 more than 3 million human ESTs were avail- 
able in the publicly accessible database dbEST and ~4 
million for all other species (http://www.ncbi.nlm. 
nih.gov/dbEST/). 
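CpG island identification, one of the gene-finding methods listed above, is straightforward to sketch. A common working definition (after Gardiner-Garden and Frommer) is a stretch of at least 200 bp with a G+C content above 50% and an observed/expected CpG ratio above 0.6. Here observed/expected = (number of CpG x length) / (number of C x number of G). The scan below applies those thresholds to a sliding window and merges overlapping qualifying windows. It is a naive, unoptimized sketch, and the function names are hypothetical.

```python
def cpg_stats(window: str) -> tuple[float, float]:
    """Return (G+C fraction, observed/expected CpG ratio) for a DNA window."""
    n = len(window)
    c, g = window.count("C"), window.count("G")
    cpg = window.count("CG")  # "CG" cannot overlap itself, so count() is exact
    gc_fraction = (c + g) / n
    obs_exp = cpg * n / (c * g) if c and g else 0.0
    return gc_fraction, obs_exp


def find_cpg_islands(seq: str, window: int = 200, min_gc: float = 0.5,
                     min_obs_exp: float = 0.6) -> list[tuple[int, int]]:
    """Return merged (start, end) intervals, 0-based and end-exclusive."""
    seq = seq.upper()
    islands: list[list[int]] = []
    for start in range(len(seq) - window + 1):
        gc_fraction, obs_exp = cpg_stats(seq[start:start + window])
        if gc_fraction > min_gc and obs_exp > min_obs_exp:
            end = start + window
            if islands and start <= islands[-1][1]:
                islands[-1][1] = end  # extend the current island
            else:
                islands.append([start, end])
    return [(s, e) for s, e in islands]


if __name__ == "__main__":
    # Hypothetical sequence: AT-rich flanks around a CpG-dense core
    test_seq = "AT" * 300 + "CG" * 150 + "AT" * 300
    print(find_cpg_islands(test_seq))
    # G+C-rich but CpG-free repeat: fails the observed/expected test
    print(find_cpg_islands("GGCCAA" * 100))
```

The second example shows why the observed/expected criterion matters. A region can be G+C rich and still lack the CpG dinucleotides that mark many gene promoters.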

The database has a tremendous redundancy and 
most genes are represented many times. In 1996 a 
large-scale DNA sequence comparison was made of 
163 000 EST sequences present in dbEST at that time 
and 8500 known gene sequences in the DNA sequence 
database GenBank. This identified a set of 49 000
unique genes, referred to as the UniGene set. An
international consortium mapped about 16 000 of these
genes, using two RH mapping panels and YAC clones of known
map location, relative to a framework map of 1000
polymorphic markers that had previously been ordered in
high-resolution linkage maps.


The density of the map was doubled to 30 200 genes in 1998.
Figure 3 shows the rate of gene mapping up to 1998. The
transcript map data are available through the NCBI web site
(http://www.ncbi.nlm.nih.gov/genemap99). Individual
chromosome regions can be examined for their gene content,
or alternatively the map position can be retrieved for
individual ESTs. The 1998 transcript map also gives the
position of each EST on the two radiation hybrid maps and on
the genetic map used to place it. Figure 4 displays the
density of genes along chromosome 13 in relation to the
chromosome banding pattern. The genes are clearly not
equally distributed along the chromosome, and there is a
tendency for genes to be located in the light-banded regions,
which are relatively GC rich by comparison with the
dark-banded regions.


DNA Sequence Map 

At the time of preparing this entry, completed DNA sequence
is publicly available for the two smallest human chromosomes,
namely 21 and 22. Completed means that sequencing errors have
been reduced to less than 1 in 10 000 nucleotides and gaps in
the data reduced to “acceptable” levels. For example, in the
case of the chromosome 21 sequence, coverage of the long arm
is at least 99.7%, with three small clone gaps and seven
sequence gaps remaining. Draft sequence, which still requires
further work to reduce sequence errors to the 1 in 10 000
level and to close gaps, is publicly available for the
majority of the other human chromosomes, to varying degrees.
The company Celera claims that it has completed the sequence
analysis of the entire human genome at high quality but is
restricting access to the data to fee-paying clients. Early
analysis of the data suggests either that the human genome
contains considerably fewer genes than originally predicted or
that the algorithms for detecting coding sequences are still
inefficient. Current estimates of the total number of human
genes now range from 28 000 to 45 000, in place of the
60 000-70 000 predicted on the basis of the complexity of cDNA
libraries. The comparison of chromosome 22 sequence data with
marker positions predicted from genetic linkage information
confirms a general colinearity between the two, but with
local variations suggesting hotspots of recombination. The
sequence analysis of gene content for chromosome 21 showed
that, of the 284 predicted genes and pseudogenes, 127 were
already known. The chromosome banding pattern of chromosome
21 divides the long arm into proximal dark-banded and distal
light-banded halves. The sequence data show that the gene
density is three times higher in the distal half of
chromosome 21q than in the proximal half, with approximate
G+C contents of 48% and 37%, respectively. In general,
chromosome 21 exhibits a two- to threefold lower gene density
than chromosome 22, which is also in keeping with general
differences in G+C content between the two chromosomes. We
can expect similar analyses and comparisons for all human
chromosomes in the near future. It now becomes possible for
the first time to refer to the map location of a human gene
by its absolute position in terms of the number of
nucleotides from the end of the short arm of the chromosome.

Figure 3 Rate of gene mapping up to 1998. (From Deloukas
et al. (1998) Science 282: 744-746.)

Figure 4 Nonrandom distribution of ESTs along chromosome 13.
(From Schuler et al. (1996) Science 274: 540-546.)


Mapping Human Genetic Disease Genes 


There has always been intense interest in mapping and
isolating human disease genes to help in a range of
activities, including recognition of disease gene carriers,
improving disease diagnosis, disease prevention, understanding
disease etiology, and designing treatment strategies. Various
disease gene catalogs have been created, the best known and
most extensive being Victor McKusick’s Online Mendelian
Inheritance in Man (OMIM), which primarily considers inherited
disease (http://www3.ncbi.nlm.nih.gov/omim/).

The catalog includes information on disease phenotypes,
extensive information on inheritance patterns, their
association with genes, and map position. Specific mutation
information, the so-called allelic variants, is also
included. The catalog contains a total of 6641 mapped
disease gene entries as of January 2001.

Disease genes have been mapped using a wide variety and
combination of techniques. However, the most widely applied
method is referred to as positional cloning. This has been
applied for approximately the last 12 years in the way
depicted in Figure 5. The starting point is usually
determining a chromosome location for the disease gene using
linkage analysis. However, the subsequent search for candidate
genes and detection of disease-specific mutations will become
much easier with the availability of well-annotated databases
containing the entire DNA sequence. Cloning will no longer be
necessary, and the positional cloning method will be replaced
by a positional candidate gene approach.
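The linkage analysis that starts positional cloning is usually summarized as a LOD score. This is the log10 ratio of the likelihood of the family data at recombination fraction θ to their likelihood under free recombination (θ = 0.5). A maximum LOD of 3 or more is conventionally taken as evidence of linkage. The minimal sketch below covers only the simplest case: phase-known, fully informative meioses in which R of N are recombinant, so that LOD(θ) = R log10 θ + (N - R) log10(1 - θ) + N log10 2. Real pedigrees need likelihood programs that handle unknown phase and missing data. The family data here are hypothetical.

```python
import math


def lod(recombinants: int, meioses: int, theta: float) -> float:
    """LOD score for phase-known, fully informative meioses."""
    if not 0.0 <= theta <= 0.5:
        raise ValueError("theta must lie in [0, 0.5]")
    if not 0 <= recombinants <= meioses:
        raise ValueError("need 0 <= recombinants <= meioses")
    if recombinants and theta == 0.0:
        return float("-inf")  # a recombinant is impossible at theta = 0
    rec_term = recombinants * math.log10(theta) if recombinants else 0.0
    nonrec_term = (meioses - recombinants) * math.log10(1 - theta)
    return rec_term + nonrec_term + meioses * math.log10(2)


def max_lod(recombinants: int, meioses: int, grid: int = 1000) -> tuple[float, float]:
    """Return (maximum LOD, theta at maximum) over an even grid on [0, 0.5]."""
    thetas = [0.5 * i / grid for i in range(grid + 1)]
    return max((lod(recombinants, meioses, t), t) for t in thetas)


if __name__ == "__main__":
    # Hypothetical family data: 2 recombinants among 20 informative meioses
    best, theta_hat = max_lod(2, 20)
    print(f"max LOD = {best:.2f} at theta = {theta_hat:.3f}")
```

For this simple model the maximum lies at θ = R/N, so the grid search mainly serves to show the shape of the LOD curve. The grid approach also carries over to likelihoods that have no closed-form maximum.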


Comparative Mapping Studies 


From the earliest days of the Human Genome 
Project it was realized that it would be much easier 
to interpret the structure and function of the human 
genome correctly if the results of comprehensive map- 
ping and sequence studies were available in a series of 
model organisms that were much more amenable to
genetic manipulation than humans. In particular,
the analysis of gene function in a model organism 
would provide valuable insights into the function of 
human homologous coding sequences. Table 4 lists 
some of the more important model organisms being 
studied. 

An extensive physical map became available for Escherichia
coli in 1986 with the creation of an overlapping cosmid clone
contig. This map confirmed the gene order and position of the
E. coli genetic maps generated over the preceding decades.
However, it was a further 11 years before the 4.6 Mb genome
was completely sequenced. Yeast was the first eukaryotic
organism to be completely sequenced, in a project that
involved a large international consortium of laboratories.
Although the genome of yeast is practically three times as big
as that of E. coli, and divided over 16 chromosomes in place of
the single chromosome in E. coli, it contains only about 45%
more genes (Table 4). The DNA sequences of Caenorhabditis
elegans and Drosophila were completed in 1998 and 2000,
respectively. Both organisms had been thoroughly mapped
beforehand. Drosophila mapping studies had commenced 90 years
previously under the supervision of T.H. Morgan, and mapping
work on C. elegans was initiated in the mid-1960s by Sydney
Brenner, who had realized the importance of studying as
simple a multicellular organism as possible to be able to
ultimately determine how an entire genome is regulated to
give tissue differentiation. In particular, the origin and
wiring of the nervous system of C. elegans was studied. In
the case of C. elegans an almost complete clone contig, mainly
composed of cosmids, was used for sequencing. A surprise was
the much lower number of genes in Drosophila than in C.
elegans despite the significantly larger genome of
Drosophila. Comparison of average gene structure in the two
organisms with estimates of the number and types of
transcripts led to the early conclusion that Drosophila made
much more use of alternative exon splicing than C. elegans,
so that in C. elegans there was a much closer one-to-one
relationship between the numbers of transcripts and
structural genes than in Drosophila. The recent debate on the
number of structural genes in man, based on differences
between initial estimates of structural gene number from the
human chromosome 21 and 22 sequence analyses and the large
number of transcripts determined from EST analyses, suggests
that the one structural gene, multiple transcripts paradigm
was extended further in the origin of the vertebrates. In
Table 4, a question mark has been placed against the
estimated number of genes in vertebrate species because of
this discussion. The numbers may have to be downgraded when
the genomes are sequenced and analyzed.

Table 3 Largest dbEST entries

Species | Number of ESTs
Homo sapiens (human) | 3 027 604
Mus musculus (mouse) | 884 582
Rattus spp. (rat) | 263 120
Bos taurus (cattle) | 158 593
Glycine max (soybean) | 137 698
Drosophila melanogaster (fruit fly) | 116 471
Arabidopsis thaliana (thale cress) | 112 500
Caenorhabditis elegans (nematode worm) | 109 215

Figure 5 Scheme for positional cloning. (From Schuler et al.
(1996) Science 274: 540-546.)

Table 4 Model organisms

Organism | Genome size | Haploid number of chromosomes | Sequence status | Estimated number of genes
Escherichia coli | 4.6 Mb | 1 | Completed September 1997 | 4289
Saccharomyces cerevisiae (yeast) | 12 Mb | 16 | Completed October 1996 | 6217
Caenorhabditis elegans (nematode worm) | 97 Mb | 6 | Completed December 1998 | 19 099
Drosophila melanogaster (fruit fly) | 165 Mb | 5 | Completed March 2000 | 13 600
Danio rerio (zebrafish) | 1700 Mb | 25 | Draft sequence by January 2003 | 60 000?
Fugu rubripes rubripes (puffer fish) | 400 Mb | ? | Draft sequence by April 2001 | 60 000?
Mus musculus (house mouse) | 3000 Mb | 20 | Draft sequence by January 2002 | 60 000?

Recent comparative mapping data for zebrafish and humans
point to extensive conservation of syntenic groups despite an
evolutionary separation of 450 million years. Extensive
conserved synteny has also been observed between the genomes
of humans and the puffer fish. Not surprisingly, comparisons
of the gene maps of mammals demonstrate very high levels of
conservation of synteny over a period of 70 million years of
separation. However, a universal phenomenon is that map order
within the syntenic groups is invariably disturbed, showing
that chromosome inversions have generally occurred much more
frequently than translocations in the evolutionary separation
of the vertebrates.
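Disturbance of gene order within a conserved syntenic group can be quantified by counting breakpoints. This minimal sketch assumes the following setup. The genes of one syntenic group are numbered 1 to n in their human order and then listed in the order found in the other species, ignoring orientation. A breakpoint is a pair of neighbors that were not neighbors in the human order. One inversion can remove at most two breakpoints. Half the breakpoint count, rounded up, is therefore a lower bound on the number of inversions separating the two orders. The gene orders used here are hypothetical.

```python
def breakpoints(order: list[int]) -> int:
    """Count adjacencies in `order` that are absent from the order 1..n.

    The list is framed by 0 and n + 1 so that changes at either end count too.
    """
    n = len(order)
    if sorted(order) != list(range(1, n + 1)):
        raise ValueError("order must be a permutation of 1..n")
    framed = [0, *order, n + 1]
    return sum(1 for a, b in zip(framed, framed[1:]) if abs(a - b) != 1)


def min_inversions_lower_bound(order: list[int]) -> int:
    """Each inversion removes at most two breakpoints, so ceil(b / 2) is a floor."""
    return (breakpoints(order) + 1) // 2


if __name__ == "__main__":
    conserved = list(range(1, 9))
    one_inversion = [1, 2, 5, 4, 3, 6, 7, 8]  # genes 3 to 5 inverted
    scrambled = [3, 1, 6, 2, 8, 4, 7, 5]
    for name, order in [("conserved", conserved),
                        ("one inversion", one_inversion),
                        ("scrambled", scrambled)]:
        print(name, breakpoints(order), min_inversions_lower_bound(order))
```

The bound is only a floor. The exact minimum number of inversions for unsigned gene orders is harder to compute, but breakpoint counts already show how strongly order has been disturbed within a syntenic group.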

The success of Celera in using a shotgun strategy to sequence
the genomes of Drosophila, mouse, and humans within a matter
of months of each other has invigorated the genome mapping and
sequencing community to follow suit. At the end of 2000
various consortia put forward proposals to produce draft DNA
sequences of three model organisms within a short period of
time. Each of the organisms concerned (zebrafish, puffer fish,
and mouse) has unique features that make it invaluable for
interpreting the function of the human genome. Zebrafish
embryonic development, particularly of the central nervous
system, can be studied uniquely because of the ease of
visualizing the effect of mutations in the young embryo. The
genome of the puffer fish exhibits a sevenfold compaction by
comparison with the human genome and has a vastly reduced
complexity of repetitive sequences, making gene detection and
analysis much simpler. The mouse is the organism most closely
related to humans that is being extensively studied for gene
function, notably by gene knockout and knockin experiments.


See also: BAC (Bacterial Artificial Chromosome);
Candidate Gene; Chiasma; Chromosome 
Painting; Comparative Genomic Hybridization 
(CGH); Contig; CpG Islands; Crossing-Over; 
FISH (Fluorescent in situ Hybridization); 
Functional Genomics; Haldane, J.B.S.; Human 
Genome Project; In situ Hybridization; Linkage 
Disequilibrium; LOD Score; Mapping Function; 
Marker; Microarray Technology; Morgan, 
Thomas Hunt; Physical Mapping; Polytene 
Chromosomes; QTL Mapping; QTL 
(Quantitative Trait Locus); Restriction Fragment 
Length Polymorphism (RFLP); Single Nucleotide 
Polymorphisms (SNPs); YAC (Yeast Artificial 
Chromosome) 
Chromosome Movement

The proper distribution of chromosomes at both
mitosis and meiosis requires their movement to oppos- 
ite sides of the cell. In mitosis the two sister chroma- 
tids split at the metaphase—anaphase transition. They 
then move towards opposite poles of the spindle on 
microtubule tracks. Their movement towards the 
poles is driven by protein motors known as kinesins 
and dyneins acting at the kinetochore (the DNA pro- 
tein complex assembled at the centromere of the 
chromosome). Other motors attached to the arms 
of chromosomes may act to balance this force or to 
modulate progression to the poles. 

In meiosis, homologs are also pulled to opposite 
poles by similar motor proteins. At the first meiotic 
division the two homologs, with each pair of sisters 
still attached, move to opposite poles of the spindle. 
Because this division reduces the total number of 
chromosomes in the cell by half, it is called the ‘reduc- 
tional division.’ Each pair of homologous chromo- 
somes orients itself on the spindle during meiosis I in 
a fashion that is fully independent of the other pairs of 
homologs. This random orientation of homolog pairs 
(bivalents) is the physical basis of independent assort- 
ment. At meiosis II, each homolog now orients itself 
on the metaphase plate such that the two sister cen- 
tromeres are oriented towards opposite poles of the 
spindle. At anaphase of meiosis II the sister chroma- 
tids separate and move to opposite poles. Because the 
total number of chromosomes does not change during
this second meiotic division, it is called the “equational
division.”


See also: Meiosis; Mitosis 


Chromosome Number 


M A Ferguson-Smith 


Copyright © 2001 Academic Press 
doi: 10.1006/rwgn.2001.0210


Each species has a characteristic number of chromo- 
somes. The total number in each somatic cell nucleus 
is referred to as the diploid number as it consists of a 
series of pairs of chromosomes, one member of each 
pair being contributed by each parent at fertilization. 
In normal human somatic cells there are 23 pairs of
chromosomes and the diploid number is thus 46, usually
indicated as 2n = 46. Gametes (sperm and ova) contain only one
member of each chromosome pair as a result of the reduction
division of meiosis (see Meiosis), and this number is referred
to as the haploid number. The diploid number varies greatly
between species. The mouse has 40 chromosomes, the cat has 38,
and the dog has 78. The mammal with the smallest number of
chromosomes is the Indian muntjac with 6 chromosomes. In
contrast, the black rhinoceros has 84.


See also: C-Value Paradox; Diploidy; Meiosis 


Chromosome Painting 
DNA probes can be produced to span an entire human
chromosome. This spectrum of probes can be labeled with
fluorescent reporter molecules and hybridized to metaphase
chromosomes, so that the target chromosome appears painted.
These chromosome-specific paints are generally produced by
fluorescence-activated chromosome sorting followed by
polymerase chain reaction (PCR) amplification. Several
chromosomes share a large number of repetitive sequences,
which can cause interchromosomal hybridization. To overcome
this problem, in situ suppression with Cot-1 placental DNA is
employed to anneal to the repetitive regions and inhibit their
binding potential.

Figure 1 (See Plate 5) Examples of (A) two-color chromosome
painting and (B) multicolor M-FISH. Part (A) demonstrates the
use of two-color chromosome painting to characterize further
the prostate cancer cell line PC3, helping to elucidate the
origin of a complex marker chromosome containing material from
chromosomes 8 (green) and 12 (red). Part (B) shows the M-FISH
karyotype of the bladder cancer cell line EJ28. Several
chromosome rearrangements can be seen in this transitional
cell carcinoma.

Initially this technology was used to paint one
or two chromosomes (see Figure 1A), so that
rearrangements could be more accurately characterized.
Over the last few years, techniques such as multicolor 
FISH (M-FISH, see Figure 1B) and spectral karyo- 
typing (SKY) have allowed all human chromo- 
somes to be simultaneously visualized in 24 discrete 
colors. 


See also: Chromosome; Karyotype; Probe 
Synapsis
P B Moens 


Copyright © 2001 Academic Press 
doi: 10.1006/rwgn.2001.0211


Chromosome pairing refers to the lengthwise align- 
ment of homologous chromosomes at the prophase 
stage of meiosis. Most sexually reproducing organisms 
have two sets of chromosomes, one set inherited from 
each parent. For these organisms to produce cells with 
a single set of chromosomes, the sets have to be sep- 
arated such that the daughter cells have one copy of 
each chromosome. The cell division responsible is meiosis, and the mechanism is the pairing/synapsis and subsequent separation of homologous chromosomes.
‘Pairing’ refers to the juxtaposition of a pair of homo- 
logs at meiotic prophase, and ‘synapsis’ refers to the 
even closer alignment of the homologs, usually via 
the parallel alignment of the meiotic chromosome 
cores that form the synaptonemal complex. Close 
alignment of homologs can occasionally also be 
observed in somatic cells, particularly in dipteran 
insects, but that phenomenon is not included in this 
presentation. 
Historical Background


Since the early 1900s, a large number of microscopic observations have been reported on the progression of meiotic chromosome pairing/synapsis. Originally, observations were made in the oocytes of newborn rabbits and later in male and female reproductive cells of numerous species of mammals, insects, plants, and fungi. To visualize the chromosomes, the cells/nuclei were fixed in a mixture of alcohol and acetic acid and squashed between a glass microscope slide and a thin coverslip. The chromosomes (from Greek, meaning colored bodies) were then stained with chromosome-specific natural dyes such as carmine or orcein or with aniline dyes (the Feulgen procedure), which were particularly useful for quantifying the amount of DNA per nucleus. For example, in the nucleus of Figure 1A, the 22 chromosomes and the X chromosome of the common locust are Feulgen-stained. In Figure 1B, the homologous pairs have synapsed. These pairs, referred to as bivalents, are more readily visible once they have shortened, as in Figure 1C, where there are now 11 bivalents and the X chromosome. This synapsis of homologous chromosomes is one of the outstanding characteristics of meiosis. At a later stage, the partners of each bivalent begin to separate but remain bound at the sites of chiasmata (ch in Figure 1D).

After 1960, much more detailed images were 
obtained through the use of electron microscopy. At 
first, limited views of the meiotic nuclei were obtained 
by observation of thin sections of the nuclei. With 
improved sectioning techniques, complete serial sec- 
tions of single nuclei were used to give a more com- 
plete view of the nucleus by means of computerized 
reconstruction of chromosomes and chromosome 
cores (Figure 2A, B). From 1973 onwards, the use of 
surface spreading in combination with silver staining 
gave quick and easy visualization of complete nuclei. 
After 1985, antibodies against proteins of the meiotic chromosomes were widely used for fluorescence microscopy and electron microscopy studies of chromosome cores and synaptonemal complex formation during meiosis.


The Bouquet Stage 


Following meiotic DNA replication at S-phase, the replicated but still decondensed chromosomes start to associate in pairs of homologs (Figure 1B). Simultaneously, the chromosomes begin to shorten and become more amenable to light-microscope observation (Figure 1C). In the early stages of meiotic prophase, the ends of the chromosomes are attached to the inner nuclear membrane. Frequently the ends are polarized so that most are clustered in one region of the nuclear envelope (Figure 2A, B). The possible functional significance of this bouquet organization has been speculated on at length. It has been seen as a mechanism that assists in the pairing of homologous chromosomes or, conversely, as a side effect of homologous chromosomes undergoing synapsis, which brings the ends together. But it has also been considered a fortuitous result of cytoplasmic organization, since the bouquet arrangement can be abolished by tubulin inhibitors acting in the cytoplasm.


Figure 1 Chromosome pairing during meiosis in the male locust. (A) This nucleus undergoing mitotic metaphase shows that the male locust has 22 chromosomes and an X (arrow) chromosome. (Female locusts are XX; males are XO.) That number is derived from two sets of 11 chromosomes and the sex chromosome X. (B) During the pachytene stage of meiotic prophase, the pairs of homologous chromosomes associate with each other and then form a tight synapsis. (C) As the pairs of chromosomes (now called bivalents) shorten, it is evident that there are now 11 bivalents plus the X chromosome (arrow). (D) At a later stage of meiotic prophase, the partners of each bivalent are pulled apart while they remain bound at the sites of a reciprocal crossover/chiasma, ch. The arrow marks the single X chromosome. Scale bars = 10 µm.


Figure 2A, B Three-dimensional demonstration of the bouquet stage of meiotic prophase. Each line represents the synaptonemal complex of a set of paired homologs. This grasshopper has three pairs of very long chromosomes with a centromere, ce, in the middle. There are five shorter pairs with the centromere near one end. In the lower left corner, all the ends are attached to a small region of the nuclear envelope. This computer reconstruction is based on electron micrographs of a complete series of sections through the spermatocyte nucleus. The three-dimensional view is generated by the computerized 5° rotation of the nucleus around a central point, c. Scale bars represent 10 µm.


Initiation of Pairing 


Optical microscope and electron microscope obser- 
vations on well-differentiated chromosomes or on 
chromosome cores indicate that in most organisms, 
chromosome pairing can be initiated simultaneously 
at several locations along the length of the chromo- 
somes. The existence of internal initiation sites is sup- 
ported by the observation that if one homolog has 
an internal inverted region relative to its partner, one 
can see what is called an ‘inversion loop’ at meiosis 
(Figure 3). This complex pairing configuration can 
only be the result of a pairing initiation site within 
the inverted region. 
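The inversion in Figure 3 can be stated precisely: two homologs carry the same genes, but in one of them a contiguous segment is in reverse order. The Python sketch below, a hypothetical helper not taken from the source, compares two gene orders and returns the inverted segment, that is, the part that must loop out for gene-by-gene pairing. It assumes each gene appears once and that the homologs differ by a single inversion.

```python
def find_inversion(normal, variant):
    """Locate a single inverted segment in `variant` relative to `normal`.

    Returns the inclusive (start, end) indices of the inverted segment, or None
    if the two gene orders are identical. Raises ValueError if the homologs do
    not carry the same genes or differ by something other than one inversion.
    """
    if sorted(normal) != sorted(variant):
        raise ValueError("homologs must carry the same genes")
    differing = [i for i, (a, b) in enumerate(zip(normal, variant)) if a != b]
    if not differing:
        return None
    start, end = differing[0], differing[-1]
    if variant[start:end + 1] != normal[start:end + 1][::-1]:
        raise ValueError("difference is not a single inversion")
    return start, end


# Hypothetical gene orders: B, C, and D are inverted; A and E flank the inversion.
normal = ["A", "B", "C", "D", "E"]
inverted = ["A", "D", "C", "B", "E"]
start, end = find_inversion(normal, inverted)
print(normal[start:end + 1])  # ['B', 'C', 'D']: the segment that must loop out to pair
```

The sketch only identifies the segment; it says nothing about where pairing initiates, which is the cytological point the loop makes.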

The case has been made that, at least in maize, the 
pairing initiation site is also the site of a chiasma 
(reciprocal genetic exchange). In the heterogametic sex of some insects (male flies and female butterflies/moths), the lack of crossing-over is correlated with
modified chromosome synapsis. In some fly species, 
the males have no close synapsis and the limited 
association of homologous chromosomes is of a 
specialized kind that involves the DNA spacer regions 
of the ribosomal genes and other, weaker pairing 
sites. 

With chromosome painting, it has been reported that throughout interphase in some fungi, pairs of homologous chromosomes may occupy adjacent domains in the meiotic prophase nuclei, which would facilitate chromosome pairing, but such an arrangement has not been reported in plants and animals. A number of studies have implicated the telomere regions of chromosomes in the initiation of pairing/synapsis, perhaps resulting in the bouquet configuration noted above.

Figure 3 A diagram of the formation of an inversion loop where an inverted chromosome segment pairs with its normal homolog. In one of the two homologous chromosomes, the segment containing the genes B, C, and D is inverted relative to the other. In order to pair the homologous segments, one of the two chromosomes must form a loop which contains the inverted section. When the second chromosome bends around this loop, the homologous regions of the chromosomes are aligned.

Stringent regulation of meiotic chromosome pairing must exist in common bread wheat, which has not just one set of homologous chromosomes but three similar sets, AA, BB, and DD. At meiosis, the A chromosomes pair with each other, as do the chromosomes of the B set and of the D set. Apparently, the initiation of pairing is not very precise, but subsequent correction mechanisms establish precise order. This order is lost in strains that carry a mutation in the Ph gene, thereby allowing recombination between the A set and the B set of chromosomes. This loss of order is used by wheat breeders to introduce desirable genes into wheat strains from ancestral species.


Genetic Aspects 


It has been observed for over half a century that occa- 
sionally some plants or animals may have a defect in 
chromosome synapsis which leads to sterility. The 
trait is inherited: it is passed on through heterozygous 
parents that carry the recessive mutation. These traits 
are of medical interest in humans and are of value 
for agricultural plant breeding programs where male 
and female sterility are manipulated to mass-produce 
plants with hybrid vigor.
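Why a sterility-causing synapsis defect persists is a matter of simple Mendelian arithmetic: affected homozygotes leave no offspring, but fertile heterozygous carriers transmit the recessive allele. The Python sketch below enumerates a one-locus cross; the allele symbols "A" (normal) and "a" (defective) are hypothetical labels, and it assumes every allele combination is equally likely.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def offspring_genotypes(parent1, parent2):
    """Genotype probabilities for a one-locus cross such as 'Aa' x 'Aa'.

    Each parent is a two-character genotype; every allele combination
    (one allele from each parent) is taken to be equally likely.
    """
    counts = Counter("".join(sorted(a + b)) for a, b in product(parent1, parent2))
    total = sum(counts.values())
    return {genotype: Fraction(n, total) for genotype, n in counts.items()}


# 'A' = normal allele, 'a' = hypothetical recessive synapsis-defective allele.
carrier_cross = offspring_genotypes("Aa", "Aa")
print(carrier_cross["aa"])  # 1/4 affected (sterile) offspring
print(carrier_cross["Aa"])  # 1/2 fertile carriers
```

A carrier crossed with a non-carrier ("Aa" x "AA") produces no affected offspring at all, which is how such alleles can pass silently through several generations.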
The genetic defects that lead to synaptic abnormalities and subsequent partial or full sterility have
recently been identified. In yeast and mouse model 
systems, it has been reported that pairing/synaptic 
aberrations result from defects in the proteins of the 
chromosome cores, the synaptic proteins, recombin- 
ation proteins, and DNA damage/detection/repair 
proteins. Most of the defects lead to degeneration 
of the meiotic cells and thereby failure to generate 
haploid cells. 
If mitotic metaphase chromosomes are depleted of
95% of their proteins and 99% of their DNA and 
RNA, an insoluble network of nonhistone proteins 
can be obtained, which still retains the overall shape 
of a mitotic metaphase chromosome. This network is 
the chromosome scaffold. According to the scaffold- 
loop model of chromosomes, the scaffold determines 
the shape of native metaphase chromosomes and func- 
tions as a basis for organizing the chromatin in specific 
loop domains. 

Two major scaffold proteins (Sc proteins) have been identified, ScI and ScII. ScI is identical to topoisomerase II (Topo II), whereas ScII is a structural maintenance of chromosomes (SMC)-type protein, which is now also called SMC2. Topo II and SMC2 have both been localized by immunofluorescence to the center of the long axis of the chromatid arms of mitotic metaphase chromosomes. The immunolabeled scaffolds appear helically coiled, and the coiling of sister-chromatid scaffolds displays mirror symmetry.

The DNA sequences by which chromatin loops are anchored to the chromosome scaffold have been analyzed in various ways. If metaphase chromosomes are protein-depleted and the genomic DNA is subsequently digested with restriction enzymes, specific AT-rich DNA sequences remain bound to the chromosome scaffold. If these scaffold-attached regions (SARs) are added as cloned fragments to metaphase chromosome scaffolds, they bind to the scaffolds with high affinity. The association of AT-rich sequences with the chromosome scaffold has also been demonstrated within intact mitotic metaphase chromosomes by differential fluorescent staining of AT- and GC-rich sequences. The signal for the AT-rich sequences (named 'the AT queue') colocalizes with the chromosome scaffold as detected by immunolabeling of Topo II (Saitoh and Laemmli, 1994).
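The one sequence property emphasized here, AT richness, is easy to scan for computationally. The Python sketch below slides a fixed window along a DNA string and reports windows whose AT fraction meets a threshold. The window size and threshold are arbitrary illustration values, not published SAR criteria, and AT richness alone does not make a sequence a SAR; scaffold binding has to be shown experimentally, as described above.

```python
def at_fraction(seq):
    """Fraction of A and T bases in a DNA string."""
    seq = seq.upper()
    return sum(base in "AT" for base in seq) / len(seq)


def at_rich_windows(seq, window=8, threshold=0.8):
    """Return (start, AT fraction) for each sliding window at or above `threshold`.

    The window size and threshold are arbitrary illustration values, not
    published SAR criteria.
    """
    hits = []
    for start in range(len(seq) - window + 1):
        frac = at_fraction(seq[start:start + window])
        if frac >= threshold:
            hits.append((start, frac))
    return hits


toy = "GCGCGC" + "ATATATAT" + "GCGCGC"  # made-up sequence with one AT-rich block
print(at_rich_windows(toy, window=8, threshold=1.0))  # [(6, 1.0)]
```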

The role of the major scaffold proteins in chromo- 
some organization is far from elucidated. Topo II can 
catalyze the passage of one double-stranded DNA molecule through a transient double-strand break in another DNA molecule and thereby catenate and decatenate DNA. SARs are highly enriched for the consensus DNA sequence for cleavage by Topo II.
In vitro, Topo II binds selectively and cooperatively 
to SARs. In vivo, binding sites of Topo II to DNA 
have been mapped by means of drugs that inhibit 
Topo II at the cleavage step. Many of the double-strand DNA breaks thus generated were localized within SARs, indicating that Topo II binds to SARs in vivo. Topo II is required for late steps in chromosome condensation and for decatenation of sister
chromatids at anaphase. These two functions could 
be linked. Residual catenation of sister chromatids 
possibly sterically hampers chromosome conden- 
sation, whereas condensation could push the equili- 
brium between catenation and decatenation by Topo II 
toward decatenation of sister chromatids. 

SMC2 (ScII) can interact via its C-terminal domain with DNA, in particular with AT-rich sequences that tend to form secondary structures. Because many SARs have these features, it is possible that SMC2 binds to SARs in vivo. SMC2 can form a heterodimer with another SMC protein, SMC4. In metaphase chromosomes of Xenopus spp., proteins homologous to SMC2 and SMC4 colocalize in the chromosome scaffold. SMC2 and SMC4 also occur as a heterodimer in a 13S protein complex from Xenopus egg extracts that fulfills an essential role in chromosome condensation in vitro. This 13S condensin contains, besides the SMC2/SMC4 heterodimer, three other protein components, all of which are evolutionarily highly conserved. It has been proposed that, during chromosome condensation, 13S condensin forms large, positively supercoiled loops in DNA, which it fastens and organizes into a regular solenoidal structure. It is conceivable that, during such a process, 13S condensin or some of its components end up in the chromosome scaffold. However, it remains to be established whether all 13S condensin components contribute to the structural organization of the chromosome scaffold, and which components specifically recognize SARs during chromosome condensation.

The chromosome scaffold has some features in 
common with the nuclear matrix or nuclear scaffold, 
which represents the insoluble fraction from inter- 
phase nuclei after protein depletion and nuclease 
digestion. All SARs analyzed also remain attached to nuclear matrices and are then named MARs (matrix-attached regions), whereas many but not all MARs are also SARs. There are also pronounced differences between the two structures. For instance, SMC2 (ScII) is not a component of nuclear scaffolds.
A typical metaphase chromosome consists of a pair of
roughly cylindrical chromatids, which are joined at 
the centromere, or primary constriction, and may 
show other, secondary constrictions, such as nucleolar 
organizers and fragile sites. The centromere is the site 
of attachment of the chromosome to the spindle, and at 
anaphase the chromatids split apart at the centromere, 
and the two separate daughter chromosomes, as they 
are now known, move to the poles of the cell, so that one 
of each is incorporated into the daughter cells. Before metaphase, the chromosomes are in prophase and appear much longer and thinner, and their separation into individual chromatids is not always apparent.

Chromosomes are organelles for packing an im- 
mense length of DNA into a form sufficiently com- 
pact to be handled by the cell, particularly at cell 
division (mitosis and meiosis). Each human nucleus 
contains approximately 2 meters of DNA, and this is reduced to a length of approximately 200 µm or less in the condensed chromosomes. The ratio between the length of fiber (whether DNA or chromatin) and the length of the object into which it is packed is known as the packing ratio, which is about 10,000
for DNA in the chromosome. This high degree of 
compaction is attained in several stages, each with its 
own packing ratio. 
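The overall figure follows directly from the two lengths quoted above. The short Python sketch below computes the packing ratio as defined here (fiber length divided by the length it is packed into), using the approximate values from the text.

```python
def packing_ratio(fiber_length_m, packed_length_m):
    """Length of the fiber divided by the length of the object it is packed into."""
    return fiber_length_m / packed_length_m


DNA_PER_NUCLEUS_M = 2.0       # about 2 m of DNA per human nucleus (from the text)
CONDENSED_LENGTH_M = 200e-6   # about 200 µm of condensed chromosomes (from the text)

ratio = packing_ratio(DNA_PER_NUCLEUS_M, CONDENSED_LENGTH_M)
print(round(ratio))  # 10000
```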
Uninemy


The undivided chromosome is uninemic, that is,
only a single DNA molecule runs throughout the
length of a chromatid. The evidence for this is: (1) 
DNA replicates semiconservatively, and chromo- 
somes themselves replicate semiconservatively; uni- 
nemy is the simplest explanation for this; (2) DNA 
molecules from species with small chromosomes 
have the sizes that would be expected if there were 
only a single molecule per chromosome; (3) the 
kinetics of chromosome breakage by low-energy 
X-rays or by DNase are consistent with the presence 
of only a single DNA fiber; and (4) the axial fiber of 
lampbrush chromosomes is only wide enough to con- 
tain two DNA molecules, one for each chromatid. 


Chromosomal Fibers 


The chromosomal DNA fiber is packed into a 10-nm nucleosomal fiber, producing a packing ratio of about 7, and this, in turn, is packed into a 30-nm fiber, the solenoid, which again produces a packing ratio of about 7, giving a total of about 50. The 30-nm fiber seems to be
the basic unit of chromosome organization, and is also 
found in interphase nuclei. The further 200-fold com- 
paction to form the metaphase chromosome appears 
to involve further folding of the 30-nm fiber, and not 
its reorganization into thicker fibers. 
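Packing ratios of successive stages multiply. The Python sketch below combines the two fiber stages described above and shows how much compaction is left to reach the overall ratio of about 10,000 given in the previous section; the stage values are the text's approximations.

```python
from math import prod

# Approximate packing ratio contributed by each stage, as given in the text.
FIBER_STAGES = {
    "DNA -> 10-nm nucleosomal fiber": 7,
    "10-nm fiber -> 30-nm fiber (solenoid)": 7,
}
OVERALL_RATIO = 10_000  # metaphase chromosome, from the previous section

fiber_ratio = prod(FIBER_STAGES.values())
remaining = OVERALL_RATIO / fiber_ratio

print(fiber_ratio)       # 49, i.e. "about 50"
print(round(remaining))  # 204, i.e. roughly the "further 200-fold"
```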


Chromosome Scaffolds 


Metaphase chromosomes examined by electron microscopy show only an apparently random tangle of chromatin fibers, but a variety of evidence, in particular the reproducibility of detailed chromosomal banding patterns, indicates that chromatin fibers are organized in a reproducible pattern. It is now accepted that chromosomes consist of a proteinaceous 'scaffold' from which chromatin loops radiate. The details of the scaffold, as seen by electron microscopy, are quite variable: it is sometimes a compact structure and sometimes much more diffuse.

Two main proteinaceous components of the scaffold have been identified. One of these is the important nuclear enzyme topoisomerase II (Topo II), which is involved in many processes that require topological alterations of the DNA molecule, including replication, transcription, DNA repair, and the decatenation (separation) of newly replicated DNA molecules. Topo II also seems to be necessary for chromosome condensation. The function of Topo II in the chromosome scaffold may be primarily structural; there is evidence that specific sequences in the chromosomal DNA may attach themselves to Topo II. The other major scaffold protein belongs to a class known as the SMC family (SMC, structural maintenance of chromosomes), whose members are involved in diverse chromosomal processes such as chromosome condensation, sister-chromatid cohesion, DNA repair, and dosage compensation. This SMC protein can form complexes with Topo II.


Chromatin Loops 


Although there is a single DNA molecule running 
throughout each chromatid, the DNA behaves as if it 
consists of much smaller units. The DNA is attached 
to the scaffold at frequent intervals, and forms loops 
with characteristic properties. The loops of DNA extend 6-30 µm from the scaffold, corresponding to up to 100 kb. The loops probably radiate in all directions from the scaffold to form a rosette. It has been claimed that the structure formed by loops attached to a scaffold is no more than 0.2-0.3 µm in diameter. It has been estimated that the degree of compaction produced by the loops would be in the region of 40-fold, giving a total packing ratio for the DNA of about 2000.
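Two quick consistency checks on these figures are sketched below in Python. The first converts loop size in base pairs to contour length, assuming the standard rise of about 0.34 nm per base pair for B-form DNA (a value not stated in this article); a 100-kb loop comes out near 34 µm, of the same order as the upper end of the 6-30 µm range. The second multiplies the fiber stages (about 50) by loop folding (about 40), which leaves a fivefold gap to the overall ratio of about 10,000.

```python
NM_PER_BP = 0.34  # rise per base pair in B-form DNA (assumed standard value)


def contour_length_um(n_bp):
    """Contour length in micrometers of a stretch of B-form DNA."""
    return n_bp * NM_PER_BP / 1000.0


FIBER_RATIO = 50   # nucleosome + 30-nm fiber, "about 50"
LOOP_RATIO = 40    # loop folding, "in the region of 40-fold"
NEEDED = 10_000    # overall packing ratio of a metaphase chromosome

print(round(contour_length_um(100_000), 1))  # 34.0 (µm) for a 100-kb loop
total = FIBER_RATIO * LOOP_RATIO
print(total)           # 2000
print(NEEDED / total)  # 5.0, the shortfall a further level would have to supply
```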

The DNA loops are attached to the scaffold by scaffold attachment regions (SARs), DNA sequences that remain attached to scaffolds after exhaustive digestion. SARs are AT-rich and generally appear to contain a consensus sequence for Topo II cleavage, consistent with the observation that Topo II is a major scaffold protein, although not all SARs appear to contain a sequence for Topo II cleavage, and SARs appear to be able to bind to scaffolds lacking Topo II. SARs are never found in coding sequences and are between about 3 and 140 kb apart. They have been found to flank genes and to coincide with the boundaries of the nuclease-sensitive domains associated with active genes. They have also been associated with origins of replication, and the sizes of loops are similar to the sizes of replicons. The loops delimited by SARs may therefore be functional units of chromatin and chromosome organization.

There is considerable variation in the size of loops, 
which may correspond to variation in sizes of repli- 
cons or transcribed domains. However, there may 
also be more systematic differences in loop sizes. The 
more frequent attachment of ribosomal DNA to the 
scaffold, with correspondingly shorter loops, might 
explain the existence of a secondary constriction at 
the sites of nucleolar organizer regions (NORs); similarly, the tendency of alphoid satellite DNA (in humans) to
associate with the scaffold rather than with the loops 
might account for the centromeric constriction. How- 
ever, other explanations of chromosomal constrictions 
are possible. 


Final Stages of Chromosome Condensation


Early prophase chromosomes appear as long, thin 
threads, which become shorter and fatter as the cell 
proceeds to metaphase. A simple chromosome model 
in which loops of a fixed size are attached to a scaffold 
of fixed length is incompatible with the changes in 
length and diameter as the chromosomes condense. 
In any case, the packing ratio of the scaffold-and-loops structure is only 2000, still fivefold short of what is needed, and this type of structure appears to be only 0.2-0.3 µm in diameter, still severalfold thinner than a fully condensed chromosome. Coiled chromosomes have often been reported, and coils can sometimes be seen in living cells, but they are not visible in most chromosomes. Coiling is not incompatible with a scaffold-and-loops model, and there is
evidence for a coiled scaffold, in which the chromo- 
some would be composed of a fiber 200-300 nm in 
diameter, made up of radial loops, which could form 
the metaphase chromosome by coiling, resulting in at 
least a ninefold compaction. In such structures, con- 
strictions would be the result of differences in coiling 
rather than in loop size. 

The condensation of chromosomes into a series of 
chromomeres has also been observed. This can most 
easily be demonstrated at the pachytene stage of 
meiotic prophase, but can also be seen occasionally 
in mitotic chromosomes. However, the majority of 
mitotic chromosomes do not show any clear sign 
of chromomeric structure. Condensation of chromo- 
somes into a series of chromomeres, which then fuse to 
form a uniform cylinder, also seems to be compatible 
with the scaffold-and-loops model. Nevertheless, it is 
difficult to see how the helical and chromomeric 
models of chromosome condensation can be reconciled. Although there is plenty of observational evidence for both, and both are compatible with the scaffold-and-loops model, more often than not neither type of structure is visible. Moreover, it is not clear whether
a further level of condensation is actually needed 
beyond that provided by loops attached to a scaffold. 
At present, there is no consensus on the highest level 
of chromosome structure. 


Chromosome Periphery 


Chromosomes do not consist only of chromatin loops radiating from a core; they also have a characteristic surface layer. This has been called the chromosome periphery, or perichromosomal material, and consists of closely packed fibrils and granules composed of ribonucleoproteins (RNPs). More recently, several proteins have been identified at the surface of chromosomes. Different proteins are bound to the surface of chromosomes at different stages of mitosis. Some, including snRNPs, are bound only during metaphase and anaphase; others, such as some nucleolar proteins, are present from early prophase until telophase. Finally, in anaphase and telophase, the chromosomes are coated by the lamin B receptor, which is essential for the re-formation of the nuclear envelope at the end of mitosis.

Functions proposed for the surface coating of 
chromosomes include: (1) a role in chromosome organ- 
ization, particularly in condensation; (2) protection 
of the chromosomes in the absence of the nuclear 
envelope during mitosis; (3) segregation of proteins 
to daughter cells (‘passenger proteins’); and (4) involve- 
ment in the reformation of the nuclear envelope at 
telophase. 


See also: Centromere; Chromosome Banding; 
Chromosome Scaffold; Nucleolus; Telomeres 
Chromosome walking was first applied to isolate the Ace, rosy, and bithorax gene complex loci of Drosophila as series of overlapping genomic clones
(Bender et al., 1983). ‘Walking’ provides a useful meta- 
phor for the process, which relies on the extension of 
regional clone coverage by small overlapping in- 
crements (Figure 1). Chromosome walking was 
soon used to isolate DNA sequences from the human 
major histocompatibility complex (MHC) in overlap- 
ping cosmid clones, and has since been used to con- 
struct contiguous clone sets spanning large genomic 
intervals in many species. The procedure can be carried out using any type of genomic clone library, and whether the clones of choice contain 15-kb (e.g., in phage replacement vectors) or 1-Mb recombinant DNA inserts (e.g., yeast artificial chromosomes, or YACs), the procedure is essentially the same.


Chromosome Walking in Practice 


Establishing a Foundational Clone Set 

A chromosome walk begins with isolation of a foun- 
dational set of clones derived from the genomic region 
of interest. This foundational set generally consists of 
a clone or clones isolated by screening a specific type 
of genomic library; for the purposes of this discussion 
I will use the example of walking in a bacterial artificial chromosome (BAC) clone library. These first
clones are typically identified by hybridization of 
recombinant bacterial colonies stamped onto nylon 
filters with a probe designed from a gene, microsatel- 
lite sequence, or other marker (marker A in Figure 1). 
Alternatively, the clones may be identified through 
polymerase chain reaction (PCR) screening of DNA 
prepared from pooled recombinants using specific sets 
of oligonucleotide primers. Depending on the depth 
of the library that is screened (that is, the average number of times a particular sequence is represented in different clones comprising the library), this first screening generally will permit the isolation of a foundational set of overlapping clones, each containing the marker of interest and extending to different lengths in either direction. Such a collection of overlapping clones is termed a 'contig' (Figure 1; see below). The degree of
overlap between specific clones in the contig is gen- 
erally determined by comparing restriction enzyme 
fragment patterns produced by each clone, a process 
called ‘restriction fingerprinting.’ Assembly of clones 
based on the presence of shared and unique restriction 
fragments can produce a restriction map of the contig 
if a large enough number of clones is examined with 
informative enzymes. These maps permit the order and placement of clones within the contig to be established.
Figure 1 Bidirectional chromosome walking from a foundational clone set. To establish a foundational clone contig containing a gene or genetic marker of interest (marker A), the marker is used, in the form of a hybridization probe or oligonucleotide primers for PCR, to isolate an initial set of overlapping clones. End sequences are generated from the overlapping set, to identify the sequences from the extreme ends of the contig (shown as boxes). Unique sequences from the contig ends are used to create new probes or primer sets for an additional round of library screening. This process permits extension of the foundational contig in both directions. A third walking step can be initiated with the extreme ends of the newly expanded clone set (shown as open circles). This process can be repeated multiple times, in one or both directions, until the contig reaches a desired length.

Extending a Foundational Clone Contig through Chromosome Walking

Walking is initiated in order to increase clone coverage 
further, through the identification of new clones that 
overlap the original set and extend further in either or 
both directions. To begin the walk, sequences located 
at the extreme ends of the foundational clone set 
are first isolated. These ‘endclone’ sequences are used 
to design new hybridization probes or oligonucleotide 
primer sets for a second round of library screening 
(Figure 1). The new sets of clones will overlap the original contig to varying degrees; in some cases the extension provided by a single clone will be quite large (up to 90% of a clone length, or 100-150 kb for high-quality BAC clone libraries). However, the degree of extension provided by many clones will be significantly less. To ensure that significant extensions are made in a walking step, it is therefore important to screen libraries with the highest possible level of sequence redundancy.

After the first walking step, each set of new clones 
(emanating from either side of the foundational set) 
must be mapped relative to each other and to the 
foundational contig, to identify clones defining the 
longest extension in either direction. To continue 
extension of the clone set, end sequences are generated 
from clones that extend furthest in either direction, 
and new probes or primer sets are designed from the 
clone ends to initiate a second walking step (Figure 1). 
The process is repeated until the cloned region is 
extended to the desired length. Traditionally, the gen- 
eration of usable end-sequence probes has represented 
a significant bottleneck in the process of chromosome 
walking. Although a number of clever schemes have been developed for isolation of clone ends by plasmid rescue or PCR with vector-specific primer sets, these protocols can be slow and tedious. Random clone ends in human or other mammalian libraries also often contain repetitive sequences that confound further attempts at contig expansion and raise the probability of false walking steps, by joining clones that are derived from different genomic regions.
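The iterative logic of a walking step (probe with the contig end, collect the hits, keep the clone that extends furthest, repeat) can be simulated in a few lines. The Python sketch below is a toy model, not a laboratory protocol: it assumes clones have known genomic coordinates (in a real walk, overlaps are inferred from hybridization, PCR, or restriction fingerprints), treats "hybridizes to the end probe" as "contains the last `probe_len` kb of the contig", and walks in one direction only; walking to the left is the mirror image. All clone names and coordinates are hypothetical.

```python
from typing import List, NamedTuple, Tuple


class Clone(NamedTuple):
    name: str
    start: int  # hypothetical genomic coordinates in kb (unknown in a real walk)
    end: int


def walk_right(library: List[Clone], seed: Clone, target: int,
               probe_len: int = 5) -> Tuple[List[Clone], int]:
    """Greedy rightward walk from a seed clone.

    At each step, the last `probe_len` kb of the contig act as the end-sequence
    probe; library clones containing that probe and reaching further right are
    the hits, and the one extending furthest is added to the contig.
    """
    contig = [seed]
    right = seed.end
    while right < target:
        probe_pos = right - probe_len
        hits = [c for c in library if c.start <= probe_pos < c.end and c.end > right]
        if not hits:
            break  # no overlapping clone: the walk stalls at a gap in the library
        best = max(hits, key=lambda c: c.end)
        contig.append(best)
        right = best.end
    return contig, right


library = [Clone("b1", 100, 240), Clone("b2", 140, 300),
           Clone("b3", 260, 420), Clone("b4", 290, 380)]
contig, reached = walk_right(library, Clone("seed", 0, 150), target=400)
print([c.name for c in contig], reached)  # ['seed', 'b2', 'b3'] 420
```

Choosing the furthest-extending hit mirrors the advice above to map each new clone set and continue from the clones defining the longest extension; a false joint caused by a repetitive end probe would, in this model, correspond to a hit that does not truly overlap.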

More recently, efficient protocols for direct se- 
quencing of BAC ends have been employed, using 
BAC clone DNA preparations as sequencing tem- 
plates and vector-specific primers in dye-terminator 
sequencing reactions. These protocols, driven by the 
need for strategies to streamline selection of clones for 
large-scale genome sequencing (Venter et al., 1996) 
have been used to generate databases of end sequences 
for large collections of BAC clones in libraries derived 
from DNA of human, mouse, and many other species. 
These BAC end-sequence databases, when used to design short, overlapping oligonucleotide or 'overgo' probes, have revolutionized the once tedious problem of chromosome walking. Overgo probes, which are typically designed to recognize unique 40-bp segments, are short enough to be generated from most sources of sequence, including random genomic sequence reads like BAC ends, even in repeat-rich mammalian genomes.
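A common overgo design, assumed here rather than taken from this article, uses two 24-base oligonucleotides whose 3' ends are complementary over 8 bases; after annealing, the overhangs are filled in (with labeled nucleotides) to give a 40-bp double-stranded probe, since 24 + 24 - 8 = 40. The Python sketch below splits a 40-bp target into such a pair. The target sequence is made up, and real designs also screen candidates for uniqueness and base composition.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an A/C/G/T string."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def overgo_pair(target, oligo_len=24, overlap=8):
    """Split a unique target into two oligos whose 3' ends pair over `overlap` bases.

    With 24-mers and an 8-base overlap, the target is 2 * 24 - 8 = 40 bp.
    """
    target = target.upper()
    expected = 2 * oligo_len - overlap
    if len(target) != expected:
        raise ValueError(f"target must be {expected} bp long")
    forward = target[:oligo_len]                                # top-strand oligo
    reverse = reverse_complement(target[oligo_len - overlap:])  # bottom-strand oligo
    return forward, reverse


# Made-up 40-bp sequence standing in for a unique segment of a BAC-end read.
site = "ATGCGTACCT" "TAGGCAATCG" "GATCCTAGCT" "AGGTTACGAC"
fwd, rev = overgo_pair(site)
print(fwd, rev)
print(reverse_complement(rev[-8:]) == fwd[-8:])  # True: the 3' ends overlap by 8 bases
```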


Chromosome Walking in silico 


As human genome sequence begins to flood into public 
databases, the process of chromosome walking is being 
transformed from an experimental task to a process 
that can be carried out entirely with the computer. 
This revolution in chromosome walking, and hence
in positional gene cloning and genome analysis, is
likely to become a reality for researchers interested
in the genomes of mouse, zebrafish, and many other species
in the near future. The driver behind this revolution
is coordination of genomic sequencing with other
resources, such as restriction fingerprinting (which
detects overlapping clones through shared restriction
fragments) and, especially, deeply
sampled BAC end-sequence sets, in libraries that are 
designated as the common currency for genome cen- 
ters worldwide. Given draft sequence of a foundational 
clone or set of clones, overlapping BACs can quickly 
be identified in fingerprinted contig sets, or through 
database searches for matches between sequenced 
BACs and BAC end-sequence collections. The accu- 
mulation of restriction-fingerprint and BAC end- 
sequence data is certain to transform the analysis of 
contiguous genomic regions in dozens of species from 
a laborious, repetitive laboratory effort to a rapid, 
computer-based method within the next several years. 
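A minimal sketch of the database search described above follows. It assumes the draft sequence of the foundational clone and a collection of BAC end reads are available as plain strings. Real pipelines use alignment tools that tolerate sequencing errors and mask repeats. This version instead looks for exact matches of a fixed-length prefix of each end read on both strands. BAC names and end labels are illustrative.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]

def overlapping_bacs(draft, end_reads, k=30):
    """Find BACs whose end reads match the draft sequence.

    end_reads maps a BAC name to {end_label: read}, for example
    {"bacX": {"T7": "...", "SP6": "..."}} (names are illustrative).
    Only the first k bases of each read are used as the query.
    Returns {bac: [(end_label, position_in_draft, strand), ...]}.
    """
    draft = draft.upper()
    hits = {}
    for bac, ends in end_reads.items():
        for label, read in ends.items():
            probe = read.upper()[:k]
            if len(probe) < k:
                continue  # too short to be informative
            for strand, query in (("+", probe), ("-", reverse_complement(probe))):
                pos = draft.find(query)  # first exact match only
                if pos != -1:
                    hits.setdefault(bac, []).append((label, pos, strand))
    return hits
```

A BAC whose end read lands inside the draft sequence overlaps the foundational clone. Its other end then indicates the direction and extent of the next in silico step.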
A linkage map is usually a graphical summary of the
estimated linkage distances between genetic markers
as deduced from observed recombination frequencies
obtained from standardized crosses. For eukaryotes, 
such maps, like most chromosomes, are typically 
linear in the sense of having two ends. For many pro- 
karyotes and viruses, however, linkage maps are 
circular in the sense of being one-dimensional and of 
finite length but having no ends. 

Consider three markers, A, B, and C, linked in that 
order. If a fourth marker, D, also maps between A and 
C, a linear map remains adequate if either of the 
linkage orders ADB or BDC can be demonstrated. 
Demonstration of the orders BCD or DAB implies a 
circular map, which remains adequate so long as each 
additional marker, N, which shows linkage ANC can 
be mapped to one of the following positions: ANB, 
BNC, CND, or DNA. 
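The argument can be checked by brute force. The Python sketch below is illustrative only. It treats each demonstrated order, such as "ADC", as a constraint that the middle marker lies between the outer two, and it lists every linear arrangement of the markers that satisfies all constraints. An empty result means that no linear map can accommodate the data.

```python
from itertools import permutations

def between(order, x, y, z):
    """True if y lies between x and z in the linear arrangement `order`."""
    i, j, k = order.index(x), order.index(y), order.index(z)
    return min(i, k) < j < max(i, k)

def linear_maps(markers, orders):
    """All linear arrangements consistent with every three-point order.

    `orders` holds strings such as "ADC", meaning D lies between A and C.
    A map and its mirror image are the same map, so only one is kept.
    """
    maps = []
    for perm in permutations(markers):
        if perm[0] > perm[-1]:
            continue  # reversed copy of a map already considered
        if all(between(perm, *order) for order in orders):
            maps.append("".join(perm))
    return maps
```

With ABC and ADC, adding ADB gives the single linear map ADBC, and adding BDC gives ABDC. Adding BCD or DAB instead leaves no linear solution, which is the situation that calls for a circular map.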

In a circular map, any two markers are linked to 
each other by two arcs of the map. Recombination 
of the markers requires disruption of both linkages. 
Since the disruption of linkage in each arc implies an 
odd number of exchanges in that arc, recombinant pro- 
duction requires an even total number of exchanges. 
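The parity rule can be written in a few lines of Python. Map positions and exchange positions are arbitrary illustrative numbers on a circular map. The function encodes only the rule stated above and makes no claim about how exchanges are distributed.

```python
def recombinant_on_circle(a, b, exchanges):
    """Decide whether markers at positions a < b on a circular map recombine.

    exchanges: crossover positions on the same circular map (none exactly
    at a or b). The markers are recombined only if each of the two arcs
    joining them carries an odd number of exchanges.
    """
    inner = sum(1 for x in exchanges if a < x < b)  # arc from a to b
    outer = len(exchanges) - inner                   # the complementary arc
    return inner % 2 == 1 and outer % 2 == 1
```

With markers at 10 and 40, exchanges at 20 and 70 give a recombinant: one exchange in each arc, two in total. A single exchange at 20 does not, because it disrupts only one of the two linkages.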


Occurrence 


Many viruses and plasmids have circular chromo- 
somes, typically DNA duplexes, throughout their 
life cycles. These replicons manifest circular linkage 
maps, implying that there is no single spot at which an 
exchange always occurs. 

In T-even bacteriophages of Escherichia coli (e.g., T2 
or T4), the chromosomes in the virion are linear 
but the gene orders among the particles of the same 
clone are circular permutations of each other. Thus, in 
some particles, members of a given marker pair are 
farther apart, physically, than they are in the other 
particles. Crosses typically involve infection of 
numerous host cells by several phage particles of 
each of two genotypes. The resulting linkage maps, 
based on recombinant frequencies among the phage 
particles produced, are circular (Stahl and Steinberg, 
1964). 
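A short sketch shows why circular permutation makes the physical separation of a marker pair vary among particles, even though the circular gene order, and hence the map, stays the same. Gene labels are hypothetical, and terminal redundancy is ignored.

```python
def physical_distances(gene_order, x, y):
    """Distinct distances between markers x and y over all circular permutations.

    Each linear chromosome is modelled as gene_order rotated to start at a
    different gene. Distances are counted in gene positions.
    """
    distances = set()
    for start in range(len(gene_order)):
        chromosome = gene_order[start:] + gene_order[:start]
        distances.add(abs(chromosome.index(x) - chromosome.index(y)))
    return distances
```

For eight genes g1 to g8, the pair g2 and g4 is two positions apart in some particles and six in others. The two values always sum to the circle length.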

Bacteria that carry the sex factor can conjugate 
with other bacteria and transfer the sex factor to 
them. Clones of donor E. coli cells in which the sex 
factor F is integrated into the bacterial chromosome 
(Hfr cells) efficiently transfer the segment of chromo- 
some to one side of F. Artificial interruption of 
synchronous conjugation can determine the time of 
transfer of a given marker, providing a basis for deter- 
mining the order of markers. The gene orders obtained 
with different Hfr strains constitute a set of overlap- 
ping linear map segments which can be assembled 
uniquely into one circular linkage map for E. coli. By
convention, the distances shown on the map are in 
units of time of transfer, normalized to a total map
length of 100 minutes. 
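The assembly of Hfr segments into one circle can be sketched in Python. Marker names are hypothetical. The sketch assumes that each segment lists markers in transfer order (either orientation), that every segment scores all markers in the region it covers, and therefore that adjacent markers in a segment are true neighbours on the map.

```python
from collections import defaultdict

def assemble_circle(segments):
    """Join overlapping linear gene orders into a single circular order."""
    neighbours = defaultdict(set)
    for seg in segments:
        for left, right in zip(seg, seg[1:]):
            neighbours[left].add(right)
            neighbours[right].add(left)
    for marker, nbrs in neighbours.items():
        if len(nbrs) != 2:
            raise ValueError(f"{marker} has {len(nbrs)} neighbours; no single circle")
    start = min(neighbours)
    order = [start]
    prev, cur = start, min(neighbours[start])  # walk in one arbitrary direction
    while cur != start:
        order.append(cur)
        prev, cur = cur, next(n for n in neighbours[cur] if n != prev)
    if len(order) != len(neighbours):
        raise ValueError("segments form more than one circle")
    return order
```

Segments `a b c d`, `e d c` (an Hfr transferring in the opposite direction), and `e f a b` assemble into the circle `a b c d e f`. Times of entry could then be scaled to the 100-minute convention.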

The chromosome of phage lambda is linear in the 
virion but circularizes after entering the host cell. 
However, the linkage map obtained from mixed infec- 
tions with two different genotypes is linear, because 
the chromosomes are cut at a unique site (cos) prior to 
being packaged into virions. As circular chromosomes 
can yield linear linkage maps, so can linear chromo- 
somes yield circular maps, at least in principle. The 
formal requirement for a circular map is a tendency to 
even numbers of exchanges in variable positions, inde- 
pendent of chromosome topology (Stahl, 1967). 

When lambda lysogenizes its host, the repressed 
chromosome, in its circular state, undergoes a single 
genetic exchange with the chromosome of its host. This 
exchange occurs at a defined site (att) on the chromo- 
some of the phage, so that the resulting prophage 
chromosome has a linear gene sequence that is a 
circular permutation of the sequence of the virion 
chromosome. It is likely that recombinant frequencies 
obtained from genetic crosses between lytic cycle 
phage and prophage would yield a circular linkage map. 
Cis-Acting DNA Regions


Analysis of the genomic DNA of prokaryotes has 
focused almost exclusively on the open reading 
frames and the proteins encoded by them because 
they represent the vast majority of the genomic 
sequences. This situation is quite different in higher 
eukaryotes like mammalian species. Only a very small 
fraction of the genomic DNA of these organisms 
encodes proteins. DNA sequences involved in chro- 
matin organization, transcriptional regulation, and 
splicing constitute a much larger fraction of the gen- 
omic sequences as compared to protein-coding regions. 
Among the most important and abundant cis-acting 
DNA regions are the matrix or scaffold attachment 
regions (S/MARs), the locus control regions (LCRs), 
enhancers, silencers, and promoters. A DNA region is
cis-acting if it is located on the same DNA molecule as
the target sequence that it affects. In this sense, all of
the above-mentioned regions are cis-acting. In classical
genetics the term locus has a meaning different from
that used here. Usually a locus defines a complete
genetic unit (at least a complete gene) while here the 
term refers to a certain region in DNA with a defined 
biological function, which is not necessarily a com- 
plete gene. The focus of this article will be on eukary- 
otic DNA regions, as they are more complex in 
organization than prokaryotic sequences. 


Scaffold/Matrix Attachment Regions 
(S/MARs) 


S/MARs are involved in the architectural structuring 
of chromosomal DNA as well as in the long range 
regulation of transcription. S/MARs are abundant in 
mammalian genomes (estimated number of S/MARs 
in the human genome, about 100,000) and consist of
DNA regions ranging from about 300 nucleotides up 
to more than 1000 nucleotides in length. 


Functions 

The functions of these elements include delimitation 
of chromosomal loops (between two S/MARs) and 
attachment of chromosomal DNA to the nuclear scaf- 
fold or matrix, located at the inner surface of the 
nuclear membrane. Matrix attachment is a prerequis- 
ite for transcription. S/MARs play an important role 
in chromatin rearrangements such as histone H1 
displacement and DNA unwinding, including strand 
separation. S/MARs also have various effects on tran- 
scriptional regulation. These include insulating the 
DNA in between two corresponding S/MARs from 
the positional effects of chromatin and synergistic 
actions with enhancers and/or promoters within the 
S/MAR delimited chromatin loop. There is also a 
subclass of S/MARs associated with origins of DNA 
replication. This role of the elements probably reflects 
the ability of S/MARs to induce single-stranded 
DNA stretches associated with DNA replication. It 
is important to note that S/MARs require a native 
chromatin structure to function. Most of their effects 
are lost in transient assays where the constructs are not 
integrated into chromosomal DNA. 


Structure 

So far, no definitive structure for S/MARs is available.
However, S/MARs are characterized by the accu-
mulation of a number of individual elements like
intrinsically curved DNA, topoisomerase II binding
and cleavage sites, short repeats, DNase I hyper-
sensitive sites, DNA unwinding elements, and
stretches with a high probability of strand separation.
AT tracts, inverted repeats capable of forming
cruciform DNA, and numerous transcription factor
(TF) binding sites have also been described as S/MAR-
associated elements.


Detection 

S/MARs are usually detected experimentally on the 
basis of their ability to bind to nuclear matrix compon- 
ents. Their transcriptional effects have been studied 
in some detail and there are methods available — and 
emerging — for computer-assisted detection of poten- 
tial S/MARs in genomic DNA. 
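As a toy illustration of computer-assisted screening, the Python sketch below scores only one of the S/MAR-associated features listed above: AT richness in a sliding window. It is not any published S/MAR predictor, which would combine several features. The window, step, and threshold values are arbitrary assumptions.

```python
def at_rich_windows(seq, window=100, step=50, threshold=0.7):
    """Return (start, AT fraction) for windows whose AT fraction >= threshold.

    Only one S/MAR-associated feature (AT richness) is scored; window,
    step and threshold are illustrative values, not validated settings.
    """
    seq = seq.upper()
    if len(seq) < window:
        return []
    hits = []
    for start in range(0, len(seq) - window + 1, step):
        win = seq[start:start + window]
        at_fraction = (win.count("A") + win.count("T")) / window
        if at_fraction >= threshold:
            hits.append((start, at_fraction))
    return hits
```

On a synthetic 600-bp sequence with an AT-only block at positions 200 to 399, the windows starting at 200, 250, and 300 are flagged. Candidates found this way would still need experimental confirmation, for example by matrix-binding assays.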


Locus Control Regions (LCRs)


Locus control regions (LCRs) are absent from consti- 
tutively expressed chromatin domains, which is con- 
sistent with their function of activating silent genes. 
They are also not yet structurally well defined but a 
few known examples suggest an organization that 
includes features present in S/MARs and enhancers. 


Function, Structure, and Detection 

LCRs can open condensed chromatin domains, there- 
by activating enhancers. Like S/MARs they control 
gene expression independently of their chromosomal 
position. LCRs also act over long time scales, in
contrast to enhancers and promoters. When a silent
domain has been experimentally derepressed, estab- 
lishment of the repressed chromatin structure requires 
that cells pass through S-phase. Thus, LCRs act as 
long-term on/off switches in the chromosome. LCRs 
are also usually detected experimentally on the basis 
of effects on transcription control. They contain 
various binding sites for activating proteins (TFs). 


Enhancers and Silencers 


Enhancers are sequences that dramatically increase the 
transcription of responsive promoters. Their most 
prominent hallmark is that they function in a position-
and orientation-independent manner within several
kb of the DNA whose transcription they modulate.


Function 

Enhancers act on promoters by binding activating TFs 
and bringing them into close proximity to promoters 
by a phenomenon known as DNA looping. In this 
manner the local concentration of the activating 
domains of TFs is increased. The enhancer-bound 
proteins result in stronger activation than would be 
possible if the proteins had to bind from free solution. 
Enhancers modulate the overall transcription of a 
promoter (sometimes by several orders of magnitude). 


Silencers are similar in principle to enhancers; they 
also contain protein-binding sites and can exert 
their effects in position- and orientation-independent 
fashion. They suppress or reduce transcription from 
promoters (they are not a part of the promoter) and 
are vital for the establishment of tissue- and/or 
cell-specific expression. Silencing can also be achieved 
by competitive binding of factors to overlapping 
binding sites (as in the osteocalcin gene) where the 
relative concentration of two binding proteins dic- 
tates whether silencing or activation is the outcome. 
Enhancers and silencers also work in transient systems 
and do not necessarily require a native chromatin 
structure because they are usually located in nu- 
cleosome-free regions of genomic DNA. However, 
sometimes effects vary between transient and inte- 
grated enhancer sequences indicating at least some 
influence of chromatin which may be due to the 
involvement of S/MAR sequences which can interact 
with enhancers. 


Structure 

Many enhancers are organized in a manner very 
similar to promoters. The promoter-specific region 
(core) is absent from enhancers (see below for 
details). Enhancers contain clusters of TF-binding 
sites probably facilitating the formation of oriented 
multiprotein complexes which then interact with pro- 
moter/protein complexes. Enhancers have also been 
described in prokaryotic systems. 


Detection 

Enhancers and silencers are mainly detected by means 
of their effects on reporter gene constructs driven by 
responsive promoters. So far, no general identification 
of enhancers and/or silencers by computer-assisted 
sequence analysis is possible. 
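Interpreting a reporter assay can be sketched in Python. The readings are hypothetical, and the twofold cutoff is an arbitrary assumption. The sketch compares a promoter-only construct with constructs carrying the test element in both orientations, reflecting the orientation-independence hallmark. Testing position independence would need further constructs.

```python
from statistics import mean

def classify_element(baseline, forward, reverse, fold_cutoff=2.0):
    """Classify a test element from replicate reporter readings.

    baseline: readings for the promoter-only construct.
    forward, reverse: readings with the element inserted in each orientation.
    An element counts as enhancer-like (or silencer-like) only if both
    orientations change expression at least fold_cutoff-fold.
    """
    base = mean(baseline)
    fold_fwd = mean(forward) / base
    fold_rev = mean(reverse) / base
    if fold_fwd >= fold_cutoff and fold_rev >= fold_cutoff:
        return "enhancer-like"
    if fold_fwd <= 1 / fold_cutoff and fold_rev <= 1 / fold_cutoff:
        return "silencer-like"
    return "no orientation-independent effect"
```

An element that raises expression in only one orientation falls into the last category. That pattern is more suggestive of a distal promoter element than of an enhancer.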


Promoters 


In general, the promoter is an integral part of the gene. 
The behavior of a given promoter often makes sense 
only in the context of its own gene, especially if the 
frequency of transcription is determined outside of 
the promoter (e.g., by an enhancer). The promoter 
by definition marks the beginning of the first exon of 
a gene and always contains the transcriptional start 
site (TSS). There are three different promoter types 
in higher eukaryotic sequences, named after their 
respective RNA polymerases, I, II, and III. Since 
most of the regulated cellular genes are transcribed 
by polymerase II, only these promoters will be de- 
scribed in detail. 
Function

The function of a promoter is to mediate and control 
the initiation of transcription of that part of a gene 
which is located immediately downstream of the pro-
moter (3’). This can be achieved either in an unregu-
lated manner (constitutive transcription) or in a highly
regulated fashion, where transcription is under the 
control of various extracellular and intracellular sig- 
nals (regulated transcription). 


Structure 

The structure of a polymerase II promoter can be 
viewed as a mosaic of several segments of DNA, 
each with a specific function. To start from inside 
out in terms of function, a promoter must contain a 
transcription start site (TSS) often located inside a so- 
called initiator region (INR). Promoters also contain 
one or more essential binding sites for general tran- 
scription factors (GTFs) which are sometimes located 
downstream of the TSS (downstream elements). 
One of the most prominent GTF binding sites is 
the TATA box, recognized by the TATA-box-binding 
protein, which itself is part of a larger complex of 
proteins. The minimal promoter may include a few 
more sites located close to the TATA box or the 
TSS. The region immediately 5'-adjacent to the min- 
imal promoter constitutes the promoter proximal 
segment, which usually extends about 200 to 300 
nucleotides upstream of the TSS. The CCAAT box is 
an example of a relatively common upstream TF bind- 
ing site situated in the proximal part of the promoter. 
Further upstream (i.e., in the 5' direction) there may be
distal promoter sequences. The only difference 
between distal promoter sequences and enhancer 
sequences is the position and orientation independ- 
ence of enhancers. In addition to these features, spe- 
cific DNA or RNA structural elements, such as 
intrinsically curved DNA, direct or inverted repeat 
elements, may also influence the formation of the 
initiation complex. 
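A naive motif scan for two of the elements named above can be written in Python. It assumes the widely quoted TATA-box consensus TATAWAW (W = A or T) and the literal CCAAT box. Real TATA boxes often deviate from the consensus, many promoters lack one, and a pattern match alone says nothing about function. Positions are reported as plain offsets from the TSS, with negative values upstream.

```python
import re

# W = A or T. TATAWAW is a common TATA-box consensus; many real promoters
# deviate from it or lack a TATA box altogether.
PATTERNS = {
    "TATA": re.compile(r"TATA[AT]A[AT]"),
    "CCAAT": re.compile(r"CCAAT"),
}

def scan_promoter(seq, tss_index):
    """List (element, offset from the TSS) for each motif hit.

    Offsets are negative upstream (5') of the TSS. Only the given strand
    is searched, and overlapping hits of the same motif are not reported.
    """
    seq = seq.upper()
    hits = []
    for name, pattern in PATTERNS.items():
        for match in pattern.finditer(seq):
            hits.append((name, match.start() - tss_index))
    return sorted(hits, key=lambda hit: hit[1])
```

Applied to a synthetic sequence with a CCAAT box 90 nt and a TATA box 35 nt upstream of the TSS, the scan returns both elements in 5'-to-3' order.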


Modular Organization in Promoter 
Structures 

The TF binding sites within a promoter (or the 
upstream regulatory sequences) do not show any 
obvious patterns with respect to location and orien- 
tation within promoter sequences. Apparently, TF 
binding sites can be found virtually anywhere in 
promoters but they need not be present in every 
promoter. A closer look reveals that the particular 
function of a TF binding site (e.g., activation or re- 
pression) often critically depends on the relative 
location and especially on the context of the binding 
site. 
As a consequence of context requirements, TF sites
are often grouped together and such functional groups 
have been described in many cases as promoter mod- 
ules. Within a promoter module both sequential order 
and distance can be crucial for function, indicating 
that these modules rather than individual binding 
sites may be the critical determinants of a promoter. 
Promoter modules may use overlapping sets of bind- 
ing sites. The basic principles of module organization 
are also true for at least some enhancers and are neither 
peculiar nor restricted to promoters. 
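Because order and spacing can both matter within a module, a module search differs from a single-site search. The Python sketch below reports occurrences of one site followed downstream by a second site within a spacer range. The site sequences and spacer limits are hypothetical placeholders, not real TF binding sites.

```python
import re

def find_modules(seq, site_a, site_b, min_gap, max_gap):
    """Find site_a followed downstream by site_b with a spacer of min_gap..max_gap nt.

    Sites are literal strings searched on the given strand only; the
    lookahead lets overlapping occurrences be found. Returns (start_a, start_b).
    """
    seq, site_a, site_b = seq.upper(), site_a.upper(), site_b.upper()
    starts_a = [m.start() for m in re.finditer(f"(?={re.escape(site_a)})", seq)]
    starts_b = [m.start() for m in re.finditer(f"(?={re.escape(site_b)})", seq)]
    modules = []
    for a in starts_a:
        for b in starts_b:
            spacer = b - (a + len(site_a))
            if min_gap <= spacer <= max_gap:
                modules.append((a, b))
    return modules
```

A second copy of site B placed too far downstream, or a copy of B placed upstream of A, is not reported. The same two sites therefore count as a module in one arrangement but not in another.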


Detection 

The DNA region representing a promoter can be 
determined by assays for promoter function in a 
heterologous context. Many attempts have also been 
made within the last few years to achieve promoter 
prediction by computer-assisted methods. However, 
owing to the variability of the modular organization 
of promoters all attempts towards general promoter 
recognition have been hampered by low specificity. 
Some specific promoter models exist, but they only 
describe one class of promoters and are not suitable
for detecting functionally unrelated promoters.
However, these attempts demonstrated the feasibility 
of specific promoter recognition. Computer-assisted 
prediction of a wide variety of cis-acting DNA regions 
is likely to become routine in the analysis of genomic 
sequences. 

In summary, many details of the complex structure 
of cis-acting DNA regions are known and modular 
organization is a widespread if not general principle. 
However, many of the functional requirements and 
restrictions on the DNA sequences of these regions 
remain elusive and detection of these cis-acting regions 
by computer-assisted sequence analysis is still in its 
infancy. In particular, general identification of promoters
in whole genomic DNA sequences remains out of reach,
and will most likely remain so in the near future. Further
improvements in bioinformatics methods are required.
Cis-acting proteins are an unusual class of DNA-binding
proteins that act preferentially on DNA sites
located close to the gene from which they are expressed. 
This is in sharp contrast to the majority of proteins, 
which are freely diffusible (trans-acting) and can act at 
many different locations in the genome with equal 
efficiency. In fact, a protein’s ability to freely diffuse 
in a bacterium is a basic requirement of the classical 
complementation test used to determine if two muta- 
tions affect the same gene function. Cis-acting pro- 
teins were originally identified using such an assay: 
they exhibited weak complementation of a defective
allele when a wild-type copy of the gene was supplied
in trans.


Classification of Cis-Acting Proteins 


Most examples of cis-acting proteins have been 
described in bacteria where the coupling of transcrip- 
tion and translation ensures that a protein is synthe- 
sized in the vicinity of its gene, which fulfills one 
of the requirements for cis action (see below). The 
discrete compartmentalization of transcription and 
translation in eukaryotes prevents such a localized 
synthesis. In fact, the few examples of cis preference 
described in higher organisms have involved proteins 
acting in cis on their mRNA. 

Bacterial cis-acting proteins have been divided into 
three groups based on their function; however, pro- 
teins within a class do not achieve cis action by the same 
mechanism. The largest and most well-studied class 
of cis-acting proteins consists of the transposases, 
encoded by bacterial insertion elements. Proteins asso- 
ciated with replication of certain single-stranded phage 
(for example, the CisA protein of phi X174) and bac- 
terial plasmids (the RepA protein of plasmid R1) form 
the second group of cis-acting proteins. The third class 
includes regulatory proteins such as the bacteriophage 
lambda anti-termination protein, Q, and the D-serine 
deaminase activator protein of Escherichia coli. 


What Purpose Does Cis Action Serve? 


All the cis-acting proteins described to date play a 
critical role in DNA/RNA metabolism or regulation 
wherein restriction of activation to a single genetic 
unit is beneficial to the survival of the cell and/or the 
genetic element encoding the protein. This is espe- 
cially true of the insertion sequence (IS) transposases. 
Bacterial IS elements transpose predominantly by a 
cut-and-paste (donor-suicide) mechanism
that leaves potentially lethal double-strand breaks in
the chromosome. These transposons are often found 
in multiple copies within a cell and many of these are 
defective due to the acquisition of deleterious muta- 
tions. High expression of a trans-acting transposase, 
encoded by a cut-and-paste transposon, would result 
in large-scale activation of cryptic elements within a 
genome and lead to many deleterious IS insertions and 
DNA rearrangements. Thus cis action limits trans- 
position to a single element, ensures that distant defec- 
tive elements are not activated, and provides a selective 
process to enrich for active transposons. Interestingly, 
transposons that move via a replicative mechanism 
which does not involve double-strand breaks encode 
trans-acting transposases. 


How Is Cis Action Achieved? 


To explain cis preference most models propose that 
there is an unequal distribution of protein within the 
cell such that the highest concentration of active pro- 
tein exists around its site of action — close to the gene 
encoding the protein. To generate such a gradient 
requires that (1) protein synthesis be limited to the 
immediate vicinity of the gene, and (2) that diffusion 
of the protein to other sites in the genome be restricted. 
How this gradient is achieved and maintained has been 
the focus of much research and, not surprisingly, is 
accomplished in several distinct ways. The most sig- 
nificant insight into these mechanisms has been gained 
by the isolation and characterization of protein 
mutants, or the development of conditions, that 
allow a cis-acting protein to become trans-acting. 


Localizing Protein Synthesis to the Vicinity 
of its Gene 

In bacteria the natural coupling of transcription and 
translation results in localized protein synthesis. Con- 
sequently, any process that enhances this coupling will 
increase the likelihood of action in cis. For example, 
slow release of mRNA from its DNA template will 
increase the tethering of the mRNA (and therefore 
protein) to its gene. The rate of degradation of a tran- 
script will also influence the location of protein synthe- 
sis: a long mRNA half-life would allow time for the 
message to diffuse away from the gene and therefore
facilitate trans action. 

Examples of such regulation have been described
for the IS10 transposase. The cis preference of the
transposase is influenced by mutations that affect the 
release and stability of the transposase mRNA. Muta- 
tions that increase the rate of translational initiation 
result in an increased rate of transcript release and 
also an increased half-life of the mRNA, as ribosomes 
protect the mRNA from nucleases. The net effect of 
this is to increase the amount of diffusible transcript, 
resulting in a more even distribution of protein in the 
cell. 

The cis action of the RepA protein of the plasmid R1 
has also been attributed to transcriptional tethering. 
This protein is required for initiation of plasmid 
replication. A Rho-dependent transcription termin- 
ation site located at the 3’ end of the repA gene is 
thought to cause the RNA polymerase complex to 
stall thereby increasing the length of time the mRNA 
is tethered to its template and thus facilitating the 
delivery of RepA protein to sites associated with the 
repA gene. 

An unusual form of cis preference, also based on 
tethering, has been proposed to explain the lack of 
complementation observed between the multiple 
copies of LINE (L1) retrotransposons found in 
mammals. Although these elements are extremely 
abundant, only a small fraction of them actually trans- 
pose. It is thought that the nature of retrotranspos- 
ition, which occurs via an RNA intermediate, plays a 
key role in cis action. The L1 transcript has two roles. 
It is the template for translation of two proteins 
required for transposition. One of these, the ORF2
protein, has reverse transcriptase and endonuclease
activities and is thought to bind to the poly(A)
tail of its own mRNA immediately following transla-
tion. The transcript also acts as the template for
target-primed reverse transcription mediated by ORF2.
Thus, ORF2 preferentially reverse transcribes its 
own mRNA in the transposition process. 


Mechanisms that Limit Protein Diffusion and 
thereby Enhance Cis Preference 

To maintain the gradient established by coupled tran- 
scription and translation, it is important to limit the 
redistribution of protein to other locations in the gen- 
ome. There is a variety of processes that serve to 
accomplish this and in many cases more than one is 
employed to reduce trans action. 


Protein instability 

Cis preference of the IS903 transposase has been 
correlated with its very short half-life. Mutations 
of transposase, or conditions, that increase protein
stability increase its ability to complement defective
transposons in trans by allowing more time for the 
protein to diffuse through the cell before it is inacti- 
vated. This requires that the protein be made in limit- 
ing amounts, and that the time taken to find a distant 
site (in trans) must be longer than the half-life of the 
protein (see section “Sequestration of protein” below). 
In fact, the IS903 transposase, like many other IS 
transposases, is poorly expressed. Since transposition 
is thought to require multimers of transposase, limit- 
ing the amount of protein synthesized will also reduce 
the likelihood that the concentration of protein at 
trans sites would be sufficient to form the multimers 
that catalyze transposition. 
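The kinetic argument can be made concrete with a simple model. The model is an assumption and has not been established for IS903. It assumes first-order inactivation, so that the fraction of molecules still active after time t is 2^(-t/t_half). It further assumes that a multimer of k subunits needs k molecules that have independently remained active.

```python
import math

def fraction_active(search_time, half_life):
    """Fraction of molecules still active after search_time (first-order decay)."""
    return math.exp(-math.log(2) * search_time / half_life)

def multimer_fraction(search_time, half_life, subunits):
    """Chance that every subunit of a multimer is still active (independence assumed)."""
    return fraction_active(search_time, half_life) ** subunits
```

If reaching a trans site takes three half-lives, 12.5% of monomers survive the search. Under the independence assumption, fewer than 2% of would-be dimers survive, whereas a cis site reached almost at once sees nearly all of the protein.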


Sequestration of protein 
Reducing the functional half-life of the protein by 
increasing the time required to find a distant site can be 
achieved by sequestering the protein. The CisA replica- 
tion initiation protein of phi X174 is quickly seques- 
tered in the membrane away from its site of action in 
the A gene. Thus the membrane acts as a trap reducing 
the availability of protein to other genomic sites. 
Formation of inactive multimers is thought to 
sequester the IS50 transposase away from trans sites 
and favor cis action of the protein. A derivative of IS50 
transposase that reduces dimerization with either 
itself or a transposase inhibitor protein increases 
trans activity, suggesting that nonproductive multi- 
merization of the transposase reduces the functional 
half-life of non-DNA bound protein. 


Multiple binding sites 

A third way to reduce the redistribution of protein is 
to have multiple binding sites for the protein in the 
vicinity of the gene. This situation is observed with the 
repA gene of the plasmid R1, which is closely linked to 
a multiple array of RepA binding sites thought to trap 
the protein and prevent further redistribution. An 
extension of this type of model is simply to propose 
that the protein in question has a relatively high affin- 
ity for nonspecific DNA compared with its specific 
DNA-binding site. In this scenario the protein would 
spend extended periods of time associated with non- 
specific DNA, which would contribute to cis preference 
by slowing diffusion away from its site of synthesis. 


More Cis-Acting Proteins? 


To date only extreme cases of cis preference have been 
documented. As other regulatory systems are charac- 
terized and genetic systems developed in other organ- 
isms, it is likely that other examples of cis-acting 
proteins will be described, but perhaps with more 
subtle cis preferences (i.e., an intermediate phenotype). 


Given the different schemes that have been identified 
to date, it would not be surprising if cis action can be 
achieved by yet other novel processes. Further ex- 
amples of cis-acting proteins are likely to be described 
in eukaryotes as new and improved genetic systems 
allow more precise monitoring of complementation 
analyses. The example of LINE elements certainly 
suggests that other retrotransposons may encode pro- 
teins that preferentially act on their RNA templates. 
Mobile group II introns move by a similar mechanism 
to LINE-like elements, and thus might also be expected 
to favor insertion of a copy of the RNA template
from which their proteins were encoded. Preliminary
evidence indicates that a cis-acting protein(s) may be 
required for replication of the RNA-based poliovirus 
and thus suggests that RNA viruses might also be a 
new, untapped source of cis-acting proteins. By exten- 
sion, other RNA-mediated processes may also utilize 
cis preference for regulation. 
The cis configuration refers to two features that are on
the same DNA molecule. For instance, a promoter 
and the coding sequence of a gene are two sequences 
that must occur in cis, since a promoter cannot 
promote transcription of a gene that is located some-
where else. Similarly, a regulatory DNA sequence,
such as the lac operator locus, must be located in cis in
order to influence transcription. ‘Cis-dominance’
refers to the action of a mutation in such a regulatory 
sequence. When the mutation is in cis with the struc- 
tural gene (coding sequence), its effect is observed 
phenotypically, as if it were a dominant mutation. 
When it is located in trans to the structural gene, how- 
ever, there is no effect of the mutation. (Operationally, 
the term is probably not any different from the con- 
cept of a cis-acting mutation.) 


See also: Cis-Acting Locus; Operon; Regulatory 
Genes 
Cis and trans are terms applied to the configuration of
two different mutations in a diploid (including partial
and temporary diploids in bacteria and bacterioph-
age). When both mutations are on the same DNA
molecule or chromosome they are in the cis config-
uration, and when they are on different molecules or
chromosomes they are in the trans configuration. The
term was initially used in reference to studies of
Drosophila by Pontecorvo, Green, and Lewis, and later
adopted by Benzer in his studies of the rII locus of
phage T4.


See also: Complementation Test 
Genes have traditionally been defined, both theoretically
and operationally, by three criteria: biological
function, mutability, and recombination with other 
genes. In his analysis of fine genomic structure, 
Seymour Benzer (1957) defined three elementary 
units corresponding to these criteria: the cistron as a 
unit of function, the muton as a unit of mutation, and 
the recon as a unit of recombination. The term cis- 
tron is hardly used any longer except in combined 
forms (polycistronic: containing more than one cistron),
but the concept is still important. It is based
on the theoretical conception of a gene as a structural
unit, specifically a portion of the genome that
encodes a single protein (polypeptide), and ideally 
a cistron ought to be identical with a gene so con- 
ceived. 

In Benzer’s system, a cistron is defined on the basis 
of a complementation test, performed by putting two 
copies of a gene in the same cytoplasm to observe their 
interactions; with a phage such as T4, the system 
Benzer used, this is done by simultaneously infecting 
bacteria with two mutants, and the idea is most easily 
explained with this system. The bacterial host is 
chosen for restricting the growth of all the mutants 
involved, so no mutant by itself can grow on this host. 
However, sometimes a cell infected simultaneously 
with two distinct mutants produces phage because 
the mutations complement each other. That is, each 
mutant phage is still capable of supplying a function 
that the other is missing. In each experiment, either the 
two mutations affect the same cistron (i.e., the same 
gene as defined above) or two different cistrons (call 
them A and B); also, the mutations can either be in the 
same genome, cis to each other, or in different gen- 
omes, trans to each other. Thus there are four possible 
experimental situations: 


1. One mutant has a defective A gene, the other a 
defective B gene; the mutations are trans to each 
other. Since the A mutant has a wild-type B gene 
and the B mutant a wild-type A gene, the two phages
should complement each other and multiply normally.

2. One mutation affects the A gene, the other the B 
gene, but they have been recombined so both are in 
the same genome, cis to each other; the other gen- 
ome has only wild-type genes. This is a control to 
ensure that a mutant gene does not somehow dom- 
inate a normal wild-type gene, perhaps by produ- 
cing a toxic product; the phage should multiply 
normally. 

3. Both mutations affect the A gene and they are trans 
to each other. Since neither genome has a wild-type 
A gene, the phage should not multiply. 

4. Both mutations affect the A gene and they are cis to 
each other; the other genome has only wild-type 
genes. Since one genome has both wild-type genes, 
the phage should multiply normally. 


Thus, a complementation test is sometimes called a 
cis-trans test, and a cistron is then a region of a genome 
defined by a set of mutations that are located together, 
as determined by mapping experiments, and do not 
complement one another. In practice, it may be diffi- 
cult to carry out unambiguous complementation tests, 
but in a well-controlled system, this is a classical and 
still useful way to determine the limits of a gene. 
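The logic of the four cis-trans situations, and of sorting mutations into cistrons by their failure to complement, can be sketched in a few lines of Python. This is a toy model under stated assumptions: each genome is represented only by the set of genes (here a hypothetical pair, A and B) that carry a mutation, a mixed infection is assumed to yield phage whenever every gene has a wild-type copy in at least one genome, and the mutation names r1 to r5 and their gene assignments are invented for illustration.

```python
from itertools import combinations

GENES = ("A", "B")  # hypothetical two-cistron region


def coinfection_grows(genome1, genome2):
    """Return True if a mixed infection is expected to yield phage.

    Each genome is the set of genes that carry a mutation in it. Growth
    requires that every gene has a wild-type copy in at least one genome.
    """
    return all(g not in genome1 or g not in genome2 for g in GENES)


# The four experimental situations described in the text.
situations = {
    "1. A and B mutant, trans": ({"A"}, {"B"}),
    "2. A and B mutant, cis": ({"A", "B"}, set()),
    "3. two A mutations, trans": ({"A"}, {"A"}),
    "4. two A mutations, cis": ({"A"}, set()),
}


def group_into_cistrons(mutations, complements):
    """Merge mutations that fail to complement in trans (union-find)."""
    parent = {m: m for m in mutations}

    def find(m):
        while parent[m] != m:
            parent[m] = parent[parent[m]]  # path halving
            m = parent[m]
        return m

    for m1, m2 in combinations(mutations, 2):
        if not complements(m1, m2):
            parent[find(m1)] = find(m2)

    groups = {}
    for m in mutations:
        groups.setdefault(find(m), set()).add(m)
    return sorted(groups.values(), key=sorted)


# Hypothetical mutations with a hidden gene assignment.
hidden_gene = {"r1": "A", "r2": "A", "r3": "B", "r4": "B", "r5": "A"}


def trans_test(m1, m2):
    """Simulated trans test: the two mutants are coinfected."""
    return coinfection_grows({hidden_gene[m1]}, {hidden_gene[m2]})


if __name__ == "__main__":
    for label, (g1, g2) in situations.items():
        print(label, "->", "growth" if coinfection_grows(g1, g2) else "no growth")
    print(group_into_cistrons(sorted(hidden_gene), trans_test))
```

The grouping step assumes that failure to complement is transitive. Real data can violate this (for example, a deletion spanning two genes fails to complement mutants in both), which is one reason unambiguous complementation tests can be hard to obtain.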
Clade is the term that denotes a phyletic lineage,
consisting of a stem species and all the species derived
from it. A branch in a cladogram, in a formal
cladification, is termed a cladon.


See also: Cladistics; Cladogenesis; Cladograms 
Cladistics, or phylogenetic systematics, is a systematic
and taxonomic discipline. Hennig (1960, 1966)
founded the discipline, although he was certainly not
the first to use many of its principles. It provides a 
method for reconstructing the phylogenetic relation- 
ships between species and higher taxa. Species are 
grouped into natural, or monophyletic, groups based 
on sharing of synapomorphic homologs while plesio- 
morphic homologs are rejected as valid evidence for 
relationship. Phylogenetic or cladistic classifications 
are characterized by containing only strictly mono- 
phyletic groups (containing an ancestor and all des- 
cendants of the ancestor) while specifically rejecting 
paraphyletic groups (one or more descendants re- 
moved from the group) or polyphyletic groups (ances- 
tor not logically included in the group). 
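The defining property of a monophyletic group, an ancestor together with all of its descendants, can be checked mechanically on a rooted tree. The sketch below assumes a hypothetical representation in which each internal node maps to its list of children and terminal taxa are nodes without children. It tests only monophyly: telling paraphyletic from polyphyletic groups needs more than the set of terminal taxa, because it depends on which ancestor is taken to belong to the group.

```python
def leaves_under(tree, node):
    """All terminal taxa descended from node (a taxon is its own leaf set)."""
    kids = tree.get(node, [])
    if not kids:
        return {node}
    found = set()
    for kid in kids:
        found |= leaves_under(tree, kid)
    return found


def path_to_root(tree, node):
    """Node, its parent, its grandparent, ... up to the root."""
    parent = {child: p for p, kids in tree.items() for child in kids}
    path = [node]
    while path[-1] in parent:
        path.append(parent[path[-1]])
    return path


def most_recent_common_ancestor(tree, taxa):
    taxa = list(taxa)
    shared = set(path_to_root(tree, taxa[0]))
    for taxon in taxa[1:]:
        shared &= set(path_to_root(tree, taxon))
    # The first shared node on the way up from any member is the deepest one.
    return next(n for n in path_to_root(tree, taxa[0]) if n in shared)


def is_monophyletic(tree, taxa):
    """True if the group contains an ancestor and all of its descendants."""
    return leaves_under(tree, most_recent_common_ancestor(tree, taxa)) == set(taxa)


# Hypothetical rooted tree: ((A, B), C)
tree = {"root": ["u", "C"], "u": ["A", "B"]}
print(is_monophyletic(tree, {"A", "B"}))  # True
print(is_monophyletic(tree, {"A", "C"}))  # False: B is left out
```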
This is one of the two great phylogenetic processes,
the other being anagenesis. The study of cladogenesis
is the study of the origin and of the nature of the
branching pattern of the phylogenetic tree. Cladogenesis
also concerns the various methods by
which the phylogenetic tree is reconstructed. It includes
the process of speciation, because every act of
speciation adds a branch, no matter how short, to the 
phylogenetic tree. 
Cladograms are graphic representations of trees. Like
all tree diagrams, cladograms are composed of ter- 
minal branches, nodes, and internodes. In systematics, 
the terminal nodes represent known taxa (species or 
monophyletic groups in the phylogenetic system) 
while the internal nodes and branches represent 
some relationship. For most phylogeneticists the 
internal nodes represent speciation events while the 
branches represent at least one hypothetical common 
ancestor linking to descendants. In the phylogenetic 
system, unrooted cladograms portray neighborhood 
relationships among taxa but do not suggest a particu- 
lar quality of these relationships. In other words, taxa 
that are adjacent on an unrooted tree may or may not 
be closest relatives, but they will be topologically 
closer to each other in a rooted tree than to taxa not 
found in the neighborhood. In systematics, rooted 
cladograms specify a particular direction of evolution 
and thus a specific relationship among the taxa on the 
tree. Designation of some taxa as outgroups will allow 
rooting of part of the tree even though the tree as a 
whole remains unrooted. This allows for specific rela- 
tionships to be hypothesized within the group studied 
without designating a root for the entire cladogram,
which contains both the group studied and one or
more outgroups.
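Rooting by outgroup can be illustrated by taking an unrooted tree, given as a hypothetical adjacency list, and orienting every branch away from the point where a designated outgroup attaches; the ingroup clades implied by that polarization can then be read off. The node names and the single-outgroup rule are assumptions made for the sketch, not a general rooting algorithm.

```python
def root_on_outgroup(adj, outgroup):
    """Orient an unrooted tree away from the node the outgroup attaches to.

    adj maps each node to its neighbours; terminal taxa have one neighbour.
    Returns (root, children) where children maps node -> list of child nodes.
    """
    root = adj[outgroup][0]
    children = {}
    stack = [(root, None)]
    while stack:
        node, came_from = stack.pop()
        kids = [n for n in adj[node] if n != came_from]
        children[node] = kids
        stack.extend((kid, node) for kid in kids)
    return root, children


def ingroup_clades(adj, outgroup):
    """Leaf sets of the internal nodes implied once the outgroup fixes the root."""
    root, children = root_on_outgroup(adj, outgroup)
    clades = set()

    def collect(node):
        if not children[node]:
            return frozenset([node])
        leaf_set = frozenset().union(*(collect(k) for k in children[node]))
        if node != root:
            clades.add(leaf_set)
        return leaf_set

    collect(root)
    return clades


# Hypothetical unrooted tree: A and B join at x, C and D at z, O hangs off y.
adj = {
    "A": ["x"], "B": ["x"], "x": ["A", "B", "y"],
    "y": ["x", "O", "z"], "O": ["y"],
    "z": ["y", "C", "D"], "C": ["z"], "D": ["z"],
}
print(ingroup_clades(adj, "O"))
```

Without the outgroup, the same adjacency list only says that A and B are neighbors of C and D; once O fixes the direction of evolution, (A, B) and (C, D) become hypothesized clades.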


See also: Clade; Phylogeny; Taxonomy, 
Evolutionary; Trees 


Class Switching 
Class switching is a change in the expression of the
constant region of an immunoglobulin heavy chain 
during lymphocyte differentiation, resulting in the 
production of a different antibody type. 


See also: Antibody; Immunoglobulin Gene 
Superfamily 


Cleavage 


See: Nuclease, Restriction Endonuclease 
The human lip and palate form as a result of cell
proliferation (growth), apposition, and fusion of 
embryonic facial processes between the fifth and 
twelfth weeks of gestation. This requires that the pro- 
cesses appear in the correct place, achieve the correct 
shape and size, and have no obstruction to fusion. 
Given the complex nature of this oral development, 
one can readily imagine a long list of potential mis- 
haps. Indeed, oral clefts are a major public health 
problem worldwide. 

Cleft lip with or without cleft palate (CL + P) has 
an incidence at birth of about 1 in 500-1000 that varies 
by population; persons of Asian descent are often 
at higher risk than those of Caucasian or African
descent. In all populations there are significantly more
males born with CL + P than females. The incidence 
at birth for cleft palate alone (CP) is relatively uniform 
across populations at about 1 in 2000; significantly 
more females are born with CP than males. It has 
clearly been established that CL + P and CP are 
etiologically distinct. Persons with CL + P very rarely 
have relatives with CP and vice versa. What CL + P 
and CP do share is that despite over 50 years of intense 
study the etiologies of both are largely an enigma. 


Inheritance of Oral Clefts 


In 1942 Poul Fogh-Andersen published his ground- 
breaking study of hundreds of CL + P and CP famil- 
ies from which he concluded that oral clefts are 
Mendelian autosomal dominant disorders with greatly 
reduced penetrance. Sixty years later we are
marginally more knowledgeable than Fogh-Andersen about
the etiologies of CL + P and CP. From the weight of 
the evidence it is clear that there are important major 
gene effects; these tentatively appear to involve genes 
related to growth or fusion of facial processes. Never- 
theless, the inheritance patterns of CL + P and CP are 
not classically Mendelian, exhibiting phenocopies, 
incomplete penetrance, genetic heterogeneity within 
and between populations, and the influence of modi- 
fier genes and diverse environmental factors. This 
is well illustrated by the Fraser–Juriloff paradigm of
differences in susceptibility to an environmental ter- 
atogen resulting from a genetically determined differ- 
ence in normal oral development (Figure 1). 
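The threshold idea behind the Fraser–Juriloff paradigm can be put in numbers. In this sketch, each genotype's developmental value (here, the timing of palate closure) is assumed to be normally distributed, a teratogen is assumed to shift the whole distribution toward the threshold by a fixed amount, and the affected fraction is the area beyond the threshold. All means, the standard deviation, and the shift are hypothetical values in standard-deviation units, not measurements.

```python
import math


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def fraction_affected(genotype_mean, sd, threshold, teratogen_shift=0.0):
    """Share of embryos whose value lies beyond the threshold.

    Values are assumed normally distributed around genotype_mean; a teratogen
    is assumed to move the whole distribution toward the threshold.
    """
    shifted_mean = genotype_mean + teratogen_shift
    return 1.0 - normal_cdf((threshold - shifted_mean) / sd)


# Hypothetical values in standard-deviation units, threshold at 0.
late_closing = -2.0   # distribution sits close to the threshold
early_closing = -5.0  # distribution sits well to the left
delay = 4.0           # same teratogen effect for both genotypes

for name, mean in [("late", late_closing), ("early", early_closing)]:
    before = fraction_affected(mean, 1.0, 0.0)
    after = fraction_affected(mean, 1.0, 0.0, delay)
    print(f"{name}-closing genotype: {before:.3f} -> {after:.3f}")
```

With these invented numbers, the same teratogen shift affects roughly 98% of embryos of the late-closing genotype but only about 16% of the early-closing one, the contrast the model is meant to capture.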


Recurrence Risk 


Because the etiologies of CL + P and CP are so largely 
undefined, the counseling of affected families relies 
almost entirely on empirical studies of recurrence 
risk. For Caucasians, it has been found that if the 
proband has other affected first and/or second degree 
relatives, the risk to subsequent siblings or offspring is 
about 15%. If the proband has no other affected first 
and/or second degree relatives, the risk is about 3-5%. 
Unfortunately, similar empirical risk determinations 
for other racial groups have not been made, but it is
generally agreed that the above estimates are reasonable
for non-Caucasians as well.

Figure 1 Fraser–Juriloff model of CP susceptibility. The roof with holes in it represents the maternal
barrier between teratogen (arrows) and embryo. The x-axis represents the phenotypic distribution, normal to the left
of the vertical threshold and abnormal to the right; the threshold separates palate closure from palate nonclosure.
(A) Palate closure is normally late (slow growth), so the phenotypic distribution for this genotype (dashed curve) is near
the threshold, and the delaying effect of the teratogen causes all embryos (solid curve) of this genotype to fall beyond
the threshold and be affected (hatched area). (B) In an early closing (faster growth) genotype, the same delay causes a
minority of embryos to be affected. Of course, these two cases are the outer boundaries of the model, and there will be
many genotypes (dashed curves) at varying distances to the left of the threshold. (Reproduced with permission from
Fraser FC (1980) Animal models for craniofacial disorders. In: Melnick M, Bixler D and Shields ED (eds) Etiology of Cleft
Lip and Cleft Palate, pp 1-23. New York: Alan R. Liss. Reprinted with permission of John Wiley & Sons, Inc.)


See also: Dysmorphology 


Clinical Genetics 
Clinical genetics (also termed medical genetics) is the
science of human biological variation as it relates to 
health and disease. Although people have long been 
aware that individuals differ, that children tend to 
resemble their parents, and that certain diseases tend 
to run in families, the scientific basis for these obser- 
vations was only discovered during the past 125 years. 
The clinical applications of this knowledge are even
more recent, with most progress confined to the past
35 years. The term clinical genetics is also applied to
the clinical specialty concerned with the
delivery of medical genetics services. These services
involve clinicians, genetic counselors, nurses, scientists,
and support staff, and provide genetic testing,
genetic assessment, and counseling.


See also: Genetic Counseling; Genetic Diseases 
The genetic analysis of circadian 24-hour biological
rhythms is an exciting and fast-moving field. Its
popularity is partly due to the fact that everyone can
relate to their own circadian sleep-wake cycle, and so
this subject has an instant ‘street credibility’ for both
students of biology and lay(wo)men alike. Central to
the approach has been, and still is, the use of mutagenesis
to generate clock variants in the organism of choice. 
The identification of clock mutants was first docu- 
mented in 1971 with Drosophila melanogaster, but a 
number of other model organisms have more recently 
come into prominence, particularly cyanobacteria, 
Neurospora, and mice. However, Drosophila takes
center stage historically, and the molecular mechan- 
isms that provide circadian cycles of behavior and 
physiology to the fly have striking similarities to 
those described in other organisms, so we shall focus 
on the fruit fly as the model model. 


The Fly Model: Circadian Phenotypes 
and the Period Gene 


Drosophila means “dew lover”, and this inspired taxonomical
insight describes the behavior of an adult
fly as it emerges from its pupal case at dawn, when
humidity is at its peak. This allows the fly a few hours
to tan its cuticle and pump out its wings before the
midday sun of sub-Saharan Africa (fruit flies evolved
in this part of the continent) desiccates the fly. If a fly is
ready to emerge in mid-afternoon, it will wait for the 
next morning ‘gate,’ and so a population of mixed-age 
pupae will show several cycles of morning eclosion, 
giving a circadian 24-h rhythm which persists even 
in constant conditions of darkness and temperature. 
In 1971, a chemical mutagenesis of D. melanogaster 
performed by Ronald Konopka and Seymour Benzer 
(see Behavioral Genetics; Neurogenetics in Dros- 
ophila; Benzer, Seymour) generated three sex-linked 
mutants, whose rhythmic eclosion profiles were
dramatically altered. In constant (or ‘freerunning’)
conditions, one mutant had a fast 19-h cycle, another
showed a long 29-h rhythm, and the third was
arrhythmic. The three mutations mapped to the same
spot on the X-chromosome, thereby defining the
period (per) locus, and became known as per^S, per^L,
and per^0, respectively. These mutations also affected
the circadian rhythms of individual flies, as measured
by their locomotor activity (or ‘sleep-wake’) cycles;
per^S had 19-h cycles, per^L 29-h, and per^0 flies were
insomniacs. Surprisingly, these mutations also had parallel
effects on the following: 


1. A very short (ultradian) one-minute rhythm found
in the male fruit fly’s courtship song (per^S males
showed 40-s cycles, per^L 80-s, and per^0 males were again
arrhythmic or supershort with 20-30-s cycles).

2. A very long (infradian) 10-day rhythm of developmental
time (per^S 9 days, per^L 11-12 days, and per^0
were supershort, with 8-day cycles).

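Freerunning periods such as the 19-h and 29-h locomotor rhythms above are typically estimated from activity records binned by time. One widely used tool is the chi-square periodogram of Sokolove and Bushell; the sketch below computes its Q_P statistic for each trial period by folding the record into rows of that length and comparing the variance of the column means with the total variance. The hourly binning, the search range, and the synthetic sine-wave "activity" are assumptions for illustration; a real analysis would also compare Q_P with the chi-square critical value for P - 1 degrees of freedom rather than simply taking the maximum.

```python
import math


def chi_square_q(counts, period):
    """Sokolove-Bushell Q_P for one trial period, in bins.

    The record is folded into k rows of `period` bins (any incomplete final
    cycle is dropped) and Q_P = k * n * sum((M_h - M)^2) / sum((X_i - M)^2).
    """
    k = len(counts) // period
    if k == 0:
        raise ValueError("record shorter than one trial period")
    n = k * period
    data = counts[:n]
    mean = sum(data) / n
    total_ss = sum((x - mean) ** 2 for x in data)
    if total_ss == 0:
        return 0.0  # a flat record carries no rhythm
    column_means = [
        sum(data[row * period + h] for row in range(k)) / k for h in range(period)
    ]
    between = sum((m - mean) ** 2 for m in column_means)
    return k * n * between / total_ss


def best_period(counts, shortest, longest):
    """Trial period (in bins) with the largest Q_P in the search range."""
    return max(range(shortest, longest + 1), key=lambda p: chi_square_q(counts, p))


def synthetic_activity(period_h, days=10):
    """Hypothetical noise-free hourly activity with the given period."""
    return [10 + 5 * math.sin(2 * math.pi * t / period_h) for t in range(24 * days)]


for tau in (19, 29):  # the short and long mutant periods mentioned in the text
    print(tau, best_period(synthetic_activity(tau), 16, 32))
```

The search range is kept below twice the shortest expected period because a perfectly periodic record folds equally well at multiples of its true period.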

A second clock gene, timeless (tim), was identified 
by mutagenesis in 1994, and this forward genetics ap- 
proach has more recently isolated a number of other 
clock genes including cycle (cyc), Clock (Clk), and 
doubletime (dbt). Mutations in these genes can gener- 
ate short, long or arrhythmic phenotypes, but only in 
dbt is there a lethal allele. Thus dbt is predicted to
encode a protein with vital housekeeping functions, 
whereas per is a genuine behavioral gene; when it is 
removed, rhythms simply disappear but the flies are 
perfectly viable. 


Molecular Analysis: the Negative 
Feedback Loop 


The PER protein cycles in abundance in many tissues, 
but in the central brain it is localized predominantly in 
a small group of nerve cell bodies termed the ‘lateral 
neurons’ (LNs), where it oscillates with a peak late in 
the night phase, and a trough during the day. Further- 
more, during the night, PER can be seen building up in 
the cytoplasm of these cells before moving into the 
nucleus. Experiments with mosaic flies reveal that if 
these neurons do not express PER, the fly is behavior- 
ally arrhythmic, so the LNs constitute the fly’s behav- 
ioral pacemaker. The TIM protein colocalizes with 
PER with similar dynamics, and both proteins are 
phosphorylated during the circadian cycle, PER exten- 
sively, TIM less so. The transcripts from both per and 
tim also cycle, so as the protein levels rise during the 
night, the transcript levels fall, and vice versa. This 
inverse phase relationship between mRNA and protein 
suggests a negative feedback of the two clock proteins 
on their own mRNAs, presumably mediated by the 
entry of PER and TIM into the nucleus. The delay 
between peak mRNA and protein levels (about 6 h) 
provides the permissive conditions whereby the clock 
proteins can accumulate before exerting their negative
regulatory effects within the nucleus. Without the 
delay, the molecular feedback cycle would damp out. 
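The claim that the delay is what keeps the loop oscillating can be explored with a generic delayed negative-feedback equation, dx/dt = alpha / (1 + (x(t - tau)/k)^n) - beta x, in which a product x represses its own synthesis after a lag tau. This is a textbook-style toy model, not a model of PER/TIM kinetics; all parameter values are hypothetical, and the equation is integrated with a simple Euler scheme.

```python
def simulate(tau, alpha=10.0, beta=1.0, k=1.0, n=4, dt=0.01, t_end=100.0, x0=0.0):
    """Euler integration of dx/dt = alpha / (1 + (x(t - tau) / k)**n) - beta * x(t).

    x is held at x0 for all t < 0 (constant history).
    """
    lag = int(round(tau / dt))
    steps = int(round(t_end / dt))
    xs = [x0]
    for i in range(steps):
        j = i - lag
        x_delayed = xs[j] if j >= 0 else x0
        synthesis = alpha / (1.0 + (x_delayed / k) ** n)  # repression by delayed product
        xs.append(xs[i] + dt * (synthesis - beta * xs[i]))  # minus first-order decay
    return xs


def late_swing(xs, last=2000):
    """Peak-to-trough range over the final `last` steps."""
    tail = xs[-last:]
    return max(tail) - min(tail)


print("no delay:", late_swing(simulate(tau=0.0)))
print("delay of 2:", late_swing(simulate(tau=2.0)))
```

With these settings the run without delay settles to a steady level, whereas a lag of 2 time units produces sustained oscillations, in line with the point that without the delay the cycle would damp out.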

The importance of nuclear translocation is revealed 
in arrhythmic tim mutants, in which PER remains 
cytoplasmic. TIM therefore acts as a nuclear translo- 
cator for PER by physically associating with a region 
of PER known as PAS, which is also found in many 
other proteins, including CLK and CYC, which 
themselves dimerize via this motif. These latter pro- 
teins also have bHLH (basic helix-loop-helix) 
domains by which they bind to specific DNA 
sequences called E-boxes found in the per and tim
promoters, and, during the latter part of the day, the
CLK-CYC dimer activates per and tim transcription
(Figure 1A). CLK and CYC are therefore the positive
regulators for per and tim, whereas PER and TIM act 
as the negative regulators. So as PER and TIM move 
into the nucleus late at night the PER PAS domain 
interacts with the CLK PAS domain and sequesters 
the CLK-CYC dimer from the per and tim pro- 
moters, blocking per/tim transcription. As PER and 
TIM degrade during the day, this releases the CLK-CYC
dimer to reactivate per and tim transcription, and
the relentless molecular oscillation begins again.

The critical 6-h delay between peak per and tim
transcription and the PER/TIM dimer’s role as a negative
autoregulator is mediated by the dbt gene, which
encodes a homolog of mammalian casein kinase 1ε. As
PER protein begins to be translated during the night,
it is phosphorylated in the cytoplasm by DBT, and
then degraded (Figure 1A). As TIM levels build up in
the cytoplasm, they block the actions of DBT, and 
PER levels can finally accumulate to a level where 
they can associate with TIM before the PER-TIM 
dimer translocates to the nucleus. 


TIM and CRY: Light-Sensitive Clock 
Molecules 


Circadian clocks that are freerunning in darkness are 
very responsive to brief light pulses. A light pulse 
applied early at night will generate a delay in the 
rhythm, whereas one given late at night produces an 
advance. These dynamic changes are collectively 
termed the phase-response curve (PRC), and are 
very similar in all organisms. TIM rapidly degrades 
in response to light, so a light pulse given early at night 
depletes TIM during its rising phase when there is an 
available pool of tim mRNA. The time it takes to 
reconstitute the previous TIM levels generates the
delay in the molecular (and behavioral) cycle. The 
same pulse given late at night, when TIM levels are 
falling, again depletes TIM, but at a time when there is 
little tim mRNA. TIM levels are prematurely reduced
to a level that corresponds to that several hours in the
future, thereby generating a phase advance. This simple
molecular model provides a compelling explanation
for the apparently complex PRC. Entrainment
of circadian cycles to different light-dark regimes and
the PRC can be understood in terms of TIM light
sensitivity, which is itself mediated by the blue-light
receptor CRYPTOCHROME (CRY). Under certain
conditions, mutation in the cry gene gives abnormalities
in entrainment, although in constant darkness
this mutation has little effect on the normal circadian
rhythm. CRY can therefore be thought of as a
circadian photoreceptor and part of the clock input
pathway, because it interfaces between the environmental
light-dark cycle and TIM, but not as a bona fide
clock component.

Figure 1 (A) The Drosophila interlocked circadian feedback loops. The rectangles filled in italics represent the
coding regions of genes, their promoters are shown as a straight line, and transcription is shown as a horizontal arrow
rising from the gene from which transcripts (single-stranded squiggles) arise. The shapes filled with roman letters are
the proteins. ‘+++’ and ‘---’ represent whether the transcription factors activate or inhibit transcription,
respectively. The CLK/CYC transcription factors bind to the per/tim promoters and activate transcription during the
day and early night. The PER protein is initially degraded by the actions of DBT kinase, but TIM protein blocks this
phosphorylation of PER, allowing PER levels to rise. PER-TIM dimerization and translocation of the dimer into the
nucleus late at night block per/tim transcription by sequestering the positive transcription factors CLK/CYC. CLK/CYC
also repress transcription of Clk (either directly or indirectly, hence the ‘?’), so when the PER-TIM dimer moves
into the nucleus, it derepresses Clk, allowing CLK levels to build up during the day, after which they dimerize with
CYC and reactivate per/tim but repress Clk transcription. (B) The mouse interlocked loop. The functionally equivalent
clock genes and molecules are shown for the mouse circadian system. Protein shapes that are similar to those in
Drosophila represent homologous proteins. Thus fly DBT is equivalent to mouse CK1ε (casein kinase 1ε), which is able
to phosphorylate mPER, although whether the delay between mPER translation and its function is mediated by CK1ε
is not proven, hence the ‘?’. The key genes are mPer2, the mCry genes, and bmal1. Dimerization between mCRY and mPER2
allows nuclear translocation, but the repression of mPer2 and mCry transcription is by mCRY alone. Similarly,
activation of bmal1 is by mPER2 acting with an unknown (‘?’) transcription factor.


Interlocked Feedback Loops 


Clk mRNA also cycles, but with a peak late at night. 
In per or tim null mutants, Clk mRNA levels are very 
low, suggesting that PER and TIM activate Clk
transcription. PER-TIM probably does this by
sequestering the CLK-CYC dimer, which itself acts 
as a repressor of Clk transcription, because in arrhyth- 
mic Clk mutants, Clk mRNA levels are held at con- 
stant high levels (Figure 1A). So as the PER-TIM 
dimer moves into the nucleus at night, it represses 
per and tim, but derepresses Clk transcription. The 
Clk mRNA is translated and CLK protein levels 
increase as PER and TIM are degraded. During the 
day CLK, along with its partner CYC, then reactivates
per and tim transcription, and the molecular cycle
begins again (Figure 1A). Thus the Clk cycle is interlocked
with the per/tim cycle.
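The interlocking of the two loops can be caricatured as a four-variable Boolean network updated in discrete time steps. The variables and rules below are a hypothetical simplification of the scheme just described, not a published model: CLK-CYC is taken to be active when CLK protein is present and not sequestered by nuclear PER-TIM; active CLK-CYC switches per/tim transcription on and Clk transcription off; and each mRNA is assumed to give rise to its protein one step later.

```python
def step(state):
    """One synchronous update of a hypothetical Boolean caricature of the loops."""
    per_tim_mrna, nuclear_per_tim, clk_mrna, clk_protein = state
    clk_cyc_active = clk_protein and not nuclear_per_tim  # not sequestered
    return (
        clk_cyc_active,       # active CLK-CYC switches per/tim transcription on
        per_tim_mrna,         # PER-TIM reaches the nucleus one step after its mRNA
        not clk_cyc_active,   # active CLK-CYC represses Clk transcription
        clk_mrna,             # Clk mRNA is translated into CLK one step later
    )


def find_cycle(state):
    """Iterate until a state repeats and return the recurring cycle of states."""
    seen = []
    while state not in seen:
        seen.append(state)
        state = step(state)
    return seen[seen.index(state):]


cycle = find_cycle((False, False, False, True))  # start with only CLK protein
for s in cycle:
    print("per/tim mRNA:", int(s[0]), " Clk mRNA:", int(s[2]))
```

Started from a state with only CLK protein present, the network falls into a four-step cycle in which per/tim mRNA and Clk mRNA are never on at the same time, a crude analog of their opposite phases.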


Clock Output Genes 


How does a molecular oscillation, like the one 
described in the pacemaker cells of the Drosophila 
brain, translate itself into circadian behavior? If the 
clock molecules PER, TIM, and CLK can regulate 
themselves, they can potentially also regulate down- 
stream genes that convey the circadian message to the 
organs and structures that carry out overt rhythmic 
behavior. One such output gene is pigment dispersing 
factor (pdf), which, when mutated, produces arrhythmicity
under constant conditions. The gene is expressed
in a subset of the LN pacemaker cell nerve terminals, 
is positively regulated by CLK-CYC, and encodes 
a small peptide that appears to act as a circadian 
messenger. 


Other Model Systems 


Rodents 

There are two clock mutations in mammals that give 
circadian phenotypes as dramatic as those found in 
Drosophila. The naturally occurring tau mutation in
the Syrian hamster was fortuitously discovered as a 
heterozygous animal with a short, 22-h period in its 
locomotor activity cycle. Homozygous mutants have 
20-h rhythms, and molecular identification of tau 
revealed that it was a mutant allele of the casein kinase
1ε gene (homologous to doubletime in the fly).
Chemical mutagenesis in the mouse identified an 
arrhythmic mutant called mClk, which turned out 
to be the mammalian homolog of Drosophila Clk 
(or dClk). The mammalian circadian
mechanism shares all the components of the fly clock,
but with some significant differences (Figure 1B).
First of all, there are three per genes in the mouse
(mPer1-3) and two mCry genes; the homolog of fly cyc
is called bmal1 in the mouse; and the bmal1 transcript
cycles in the mouse whereas that of mClk does not (in the
fly, dClk cycles but cyc does not). The mammalian tim
gene (mTim) does not appear to play as significant a part
in the mouse clock as it does in the fly, because its role has
been largely taken by the mCry genes, which have lost
their photoreceptor function but instead play a crucial
role in the negative limb of the feedback loop.
mCRY is therefore a clock component in the mouse, 
whereas in the fly it is a photoreceptor. This is dramat- 
ically emphasized by simultaneous mutation of both 
mCry genes, which gives arrhythmicity in the mouse, 
in contrast to the fly, in which the effects of a cry 
mutation can only be detected during entrainment 
to light-dark cycles. The negative feedback loop 
works as in the fly, but instead of PER-TIM dimers, 
mPER2-CRY heterodimers are translocated to the 
nucleus, where mCRY negatively regulates the mPer
and mCry genes by interacting with mCLK-BMAL1
heterodimers. mPER2 activates mClk and
bmal1 transcription, so, as in the fly, the mPER/mCRY
and BMAL1 feedback loops are interlocked.


Bread Mold 

In Neurospora crassa, mutagenesis has identified a 
number of clock genes that control circadian conidia- 
tion rhythms. The frequency (frq) gene has a central 
role, cycles at the mRNA and protein levels, and is 
involved in a negative autoregulatory feedback loop, 
just like per and tim. The positive regulators of frq 
are the white collar genes, wc1 and wc2, which, like
their fly/mammal counterparts Clock and cycle (or
bmal1), encode PAS-domain proteins. Furthermore, FRQ also
has a positive role in that it enhances WC1 syn- 
thesis via a posttranscriptional mechanism. Conse- 
quently the dual role of FRQ leads to interlocking of 
the FRQ and WC1 feedback loops, reminding one 
yet again of the similar mammalian and fly mech- 
anisms. 


Cyanobacteria 

Finally, the cyanobacterium Synechococcus is a uni- 
cellular organism that shows circadian cycles in a 
number of physiological characteristics including 
photosynthesis and cell division. Using a reporter
gene strategy in which circadian cycles of bioluminescence
were targeted for mutagenesis, investigators have isolated
a large number of clock mutants. These reveal an
essential clock gene cluster, kaiA, kaiB, and kaiC,
whose products do not share sequence similarity
with any of the eukaryotic clock proteins described
above. KAIC negatively regulates its own gene, pro- 
viding the basic feedback loop and KAIA enhances 
kaiB and kaiC expression, thereby stabilizing the 
oscillation. None of the KAI proteins have DNA- 
binding capability, so the transcription factors in- 
volved are unknown. 


Conserved Clock Mechanisms 


The striking sequence conservation between insect
and mammalian clock genes, the role of PAS domain
clock proteins in the fly, mouse, and mold, and the
basic negative feedback loop observed in prokaryotes
and eukaryotes reveal a remarkable conservation of
circadian mechanisms across a wide range of organisms.
Plants too have rhythms, and mutagenesis of Arabi- 
dopsis is beginning to identify and isolate the relevant 
molecular clock components. 
A clone is a large number of identical cells or mol-
ecules arising from a single progenitor (ancestral) cell 
or molecule. 


See also: Cloned Organisms; DNA Cloning; 
Whole Organism Cloning 


Cloned Organisms 
D Solter 
A cloned organism is derived by asexual reproduction
from a parental organism and is genetically its exact
copy. Unicellular organisms multiplying by simple
division are regarded as clones, but we usually reserve
the term for multicellular organisms derived from a 
single cell of the parental organism. It is also assumed 
that multiple identical copies can exist, all derived 
from a single parental organism, thus both a single 
cloned organism and a population of identical cloned 
organisms are called clones. In recent times the term 
‘clone’ has also been used for animals produced by 
nuclear transfer, even if such animals are not abso- 
lutely genetically identical to the animal which pro- 
vided the donor cell. 

The Greek word klōn literally means ‘twig’, and
plants derived by grafting, budding, and,
recently, by growing from a single cell of another 
plant are indeed clones in the strict sense of the 
word. The derivation of a mature carrot plant from 
a single cultured cell established definitively that 
somatic cells can be totipotent and serve as progeni- 
tors of the entire adult organism. This type of cloning 
is now fairly common and plants from many different 
species can be derived from a single cell. 

The cloning of multicellular animals, especially ver- 
tebrates, by growth from a single somatic cell is not 
possible, and all clones described so far have been 
produced by nuclear transfer into an enucleated egg. 
It is arguable whether individuals derived by the split- 
ting (natural or experimental) of an embryo should be 
considered clones. Such twins, triplets, etc., are indeed 
genetically identical; however, they are not derived
from another genetically identical organism and the 
number of individual clones is obviously limited, 
while the number of clones produced by nuclear 
transfer is, at least theoretically, without limits. 

The first true animal clones were described in 
amphibians about 50 years ago (Di Berardino, 1997). 
They were produced by transferring the nuclei from 
embryonic cells into eggs whose genetic material was 
mechanically removed. Following this initial success, 
numerous investigators continued nuclear transfer 
experiments in amphibians (mostly frogs and toads), 
and their results can be briefly summarized as follows: 


1. The capacity to support development to an adult, 
sexually mature individual is gradually lost as cells 
from more advanced stages of development are 
used as nuclear donors. 

2. Nuclear transfer from adult differentiated cells can 
result in substantial embryonic development but 
development to an adult animal was never observed. 

3. Embryonic and adult somatic cells can be repro- 
grammed to a greater or lesser degree in the cyto- 
plasm of an enucleated egg. 

4. Failure of adult cells to support entire development 
could be due to some irreversible genetic change 
or, more likely, to chromosomal abnormalities
observed when a slow-dividing somatic nucleus is
forced to undergo very rapid division following 
transfer into the egg cytoplasm. 


Regardless of the interpretation, nuclear transfer ex- 
periments in amphibians did not answer the crucial 
question: are nuclei from differentiated somatic cells 
totipotent or irreversibly changed by differentiation? 

The answer to this question was provided by 
nuclear transfer experiments in mammals. In the last 
20 years, methods for nuclear transfer, embryo activation
and culture, and transfer to foster mothers
have gradually improved, finally resulting in the sheep
Dolly, the first mammal cloned from a differentiated 
adult cell. This success has been rapidly followed by 
others, and sheep, cows, mice, goats, and pigs have all 
been cloned from various adult cell types. While these 
results prove that at least some of the cells in adult 
organisms can be completely reprogrammed in the egg 
cytoplasm, the success rate of the procedure is very 
low (Solter, 2000). 

Summarizing all the results of nuclear transfer 
using adult nuclei, it appears that only about 1% of 
manipulated eggs develop into normal adults. It is 
likely that this low success rate is due to a combination 
of technical and biological problems. The many steps 
of nuclear transfer include egg selection, synchroniza- 
tion of the egg and donor nucleus, egg activation 
following nuclear transfer, embryo culture, and prep- 
aration of the recipient foster mother. It is likely that 
further work on these procedures will result in an 
improved cloning rate. However, solving the biolog- 
ical problem, i.e., reprogramming of the donor nucleus, 
may be more difficult. It seems that the cytoplasm of 
the mature ovulated egg is the only environment in 
which the nucleus can be reprogrammed; but the 
molecular basis of reprogramming is still a mystery. 
If the whole process is essentially random, it may 
never be possible to increase significantly the success 
rate of nuclear transfer, or to predict which of the 
manipulated embryos will develop to adulthood. The 
continuous loss of nuclear transfer embryos through- 
out development and the abnormalities observed soon 
or well after birth indicate that reprogramming fails 
much more often than it succeeds. 

The cloning of laboratory animals, most notably 
mice, will continue in order to explore the basic bio- 
logical problem of gene control and genomic repro- 
gramming. The cloning of farm animals is being 
intensively pursued, as substantial benefits for 
agriculture can be envisioned. One kind of benefit 
would be the production of multiple copies of 
phenotypically (and thus genetically) highly desirable 
individuals. Cloning procedures are still so labor- 
intensive and expensive that only very rare individuals 
will be reproduced by this method. However, cloning
by transferring nuclei from genetically modified
cells is likely to become very common and very
valuable. A specific gene (e.g., one coding for a
desired human protein) can be introduced into cells
in culture. After confirming that the gene has integrated
at the desired site and that its expression is
properly controlled, nuclei from such genetically
modified cells are being used, and will continue to be
used, for cloning, yielding an animal that produces the
desired protein. For this approach the low success rate
of cloning is irrelevant, since a single genetically
modified animal can be bred and the desired genetic
modification propagated.

The cloning of humans is a very controversial subject,
and the consensus seems to be against it. At present,
safety issues are probably sufficient to discourage
any attempts, but if these are resolved and human
cloning is regarded as one aspect of reproductive
freedom, the entire subject may be discussed in a
new light. So-called ‘therapeutic cloning,’ involving
nuclear transfer, embryo culture to the blastocyst stage,
and derivation of embryonic stem cells, is currently
attracting a lot of attention. This procedure would
enable the establishment of individualized embryonic
stem cells whose differentiated derivatives could be
used in cell and tissue therapy (Solter and Gearhart,
1999). If the hopes pinned on the use of embryonic
stem cells are realized, therapeutic cloning may become
acceptable and a common procedure.
Cloning Vectors


I Schildkraut 
A cloning vector is a DNA molecule used to carry,
and replicate in a host cell, another DNA fragment
that has been joined to it. The vector has attributes
that allow the joined DNA to replicate in a host cell,
and it usually carries a selectable characteristic, such
as an antibiotic resistance gene, so that host cells
that harbor the vector can be distinguished from
cells that do not.


See also: Vectors 
A closed reading frame is a sequence containing
in-frame termination codons that prevent its
translation into protein.
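Whether a given frame is closed can be checked by scanning its codons for stop codons. The sketch below assumes the standard genetic code (stops TAA, TAG, TGA) and a DNA string on the coding strand; the function names are hypothetical, and treating a stop codon in the final position as normal termination rather than as closing the frame is a design choice made here for illustration.

```python
# Standard-code stop codons (an assumption: alternative genetic codes differ).
STOP_CODONS = {"TAA", "TAG", "TGA"}

def in_frame_stops(seq, frame=0):
    """Return 0-based start positions of stop codons read in the given frame."""
    seq = seq.upper()
    return [i for i in range(frame, len(seq) - 2, 3) if seq[i:i + 3] in STOP_CODONS]

def is_closed(seq, frame=0):
    """True if a stop codon interrupts the frame before its last complete codon."""
    n_codons = (len(seq) - frame) // 3
    last = frame + 3 * (n_codons - 1)
    return any(pos < last for pos in in_frame_stops(seq, frame))
```

For example, `ATG TAA GGC` is closed in frame 0, whereas `ATG GCC TAA` is not, because its only stop codon is terminal.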


See also: Genetic Code; Open Reading Frame 
Recent advances in molecular biology have made
large-scale studies of molecular variability within 
populations a reality. Data from such studies are 
often obtained as random samples of DNA sequences, 
or as samples of single nucleotide polymorphisms. 
Because the individuals in the sample are related, 
these data are highly dependent; understanding the 
nature of this dependence is crucial for the analysis 
of the variability in the sample. 

In contrast to data collected from pedigrees, the 
precise nature of the ancestral relationships among 
the DNA sequences in a random population sample 
is not known, and must be modeled. The coalescent, 
introduced by Kingman in 1982, describes one class of 
models for the genealogical relationships among a 
random sample of chromosomes. 

The use of genealogical or coalescent methods is 
now central to the analysis of much genetic data. They 
allow for efficient simulation of the molecular struc- 
ture of a sample of chromosomes; instead of simulat- 
ing the entire population and then sampling from that, 
one only needs to keep track of the ancestors of the 
sample. Furthermore, they provide a natural framework
for estimation and inference about population
parameters, such as mutation rates and recombination
rates, as well as about features of the ancestry of the
sample or population.


The Ancestral Process 


The Neutral Case 

To describe the genealogy under a neutral model, we
assume that the population is haploid and of fixed
size $N$. Furthermore, we assume that the
population evolves according to the discrete time
Wright-Fisher model. In this model $N$ descendants
are chosen in each generation according to a
multinomial distribution that reflects the gene
frequencies in the previous generation. For instance, in
the case of a single locus with two alleles $A_1$ and $A_2$
with respective frequencies $x_1$ and $x_2$, the probability
that there are $k$ descendants of type $A_1$ in the following
generation is given by

$$P(k) = \binom{N}{k} x_1^{k} x_2^{N-k}, \qquad k = 0, 1, \ldots, N,$$

if one ignores the possibility of mutation.
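A minimal sketch of this neutral two-allele Wright-Fisher step, using only the Python standard library: `wf_transition_prob` evaluates the binomial probability of $k$ descendants of type $A_1$, and `wf_next_generation` draws the next count by letting each of the $N$ offspring pick a parental gene independently. Both names are hypothetical illustrations, not part of any established package.

```python
from math import comb
import random

def wf_transition_prob(k, N, x1):
    """P(k offspring of type A1 | parental A1 frequency x1), ignoring mutation."""
    x2 = 1.0 - x1
    return comb(N, k) * x1**k * x2**(N - k)

def wf_next_generation(i, N, rng):
    """Draw the next A1 count: each of N offspring copies a random parental gene."""
    p = i / N
    return sum(1 for _ in range(N) if rng.random() < p)
```

The probabilities over $k = 0, \ldots, N$ sum to 1, and the draw is exactly a binomial$(N, i/N)$ sample.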

In the neutral case, demography and the mutation 
process can be separated. This allows one to determine 
the ancestral relationships in the sample without refer- 
ence to the allelic types. 

When the population size $N$ is large compared to
the sample size, the genealogy of a sample of size $n$
can be approximated by a continuous time Markov chain
$A(t)$, in which time $t$ is measured in units of $N$
generations. The process starts from $A(0) = n$ and goes
through the states $n, n-1, \ldots, 2, 1$. A value of
$A(t) = j$ means that the sample had $j$ distinct ancestors at
time $t$ ago. The amount of time $T_j$ for which there
are $j$ ancestors is exponentially distributed with mean
$2/[j(j-1)]$, and these times are independent of
one another. This Markov chain $A(t)$ is called the
‘coalescent process.’
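Because the waiting times are independent exponentials with rates $j(j-1)/2$, the coalescent process can be simulated directly from this definition. The sketch below (hypothetical function names, time in units of $N$ generations) draws $T_n, \ldots, T_2$ and reads off $A(t)$ for a given time $t$ in the past.

```python
import random

def coalescent_times(n, rng=None):
    """Draw [T_n, T_{n-1}, ..., T_2]; T_j ~ Exponential(rate j(j-1)/2)."""
    rng = rng or random.Random()
    return [rng.expovariate(j * (j - 1) / 2) for j in range(n, 1, -1)]

def ancestor_count(t, times):
    """A(t): number of distinct ancestors of the sample at time t in the past."""
    j = len(times) + 1          # A(0) = n
    elapsed = 0.0
    for T_j in times:
        elapsed += T_j
        if t < elapsed:
            return j
        j -= 1
    return 1                    # beyond the MRCA a single ancestor remains
```

Summing the drawn times gives one realization of the time to the most recent common ancestor.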

Of interest is the time to the most recent common
ancestor (MRCA) of the sample. This time is denoted
by $T_{MRCA}$. It can be represented as the sum of the
coalescence times $T_j$, that is,

$$T_{MRCA} = T_n + T_{n-1} + \cdots + T_2.$$

It follows that the expected time to the MRCA is
$2(1 - 1/n)$. Thus in a large sample, the time to the
MRCA is on average about $2N$ generations.
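The expectation follows from linearity and a telescoping sum over the exponential means $2/[j(j-1)]$:

```latex
\begin{aligned}
E[T_{MRCA}] &= \sum_{j=2}^{n} E[T_j] = \sum_{j=2}^{n} \frac{2}{j(j-1)}
             = 2\sum_{j=2}^{n}\left(\frac{1}{j-1} - \frac{1}{j}\right) \\
            &= 2\left(1 - \frac{1}{n}\right).
\end{aligned}
```

For $n = 5$, for example, this gives $1.6$ units of $N$ generations, and the limit as $n \to \infty$ is $2$.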

The genealogy can be visualized as a coalescing
tree. A realization is shown in Figure 1. A tree that
corresponds to a sample of size $n$ has $n$ tips and one
root. The root is the location of the most recent
common ancestor.

A characteristic of the neutral genealogy for fixed
population size is that the last two branches dominate
the height of the tree. This can be seen by comparing
the expected coalescence time of the last two branches,
$E[T_2]$, and the expected time to the most recent common
ancestor, $E[T_{MRCA}]$. The expected time until the last two
ancestors coalesce is 1, which is more than half of the
total expected time to the most recent common ancestor,
$2(1 - 1/n) < 2$, regardless of the sample size.

Figure 1 Coalescent tree of sample of five individuals.

Since under neutrality demography and the mutation
process can be separated, to obtain a sample of
size $n$, one can first construct its genealogy and then
superimpose the mutation process on the genealogy.
This provides an extremely efficient way to simulate
observations from complicated demographic and
mutation scenarios.

We assume the simplest mutation process, in which
mutations occur independently in all genes with
probability $u_N$ per gene per generation. If time is scaled in
units of $N$ generations and if

$$\lim_{N\to\infty} 2Nu_N = \theta,$$

then mutations occur along the branches of the
coalescent tree according to a Poisson process with rate
$\theta/2$, independently in each branch of the tree.

The distribution of the total number of mutations
in the sample since their most recent common ancestor
follows readily. Given the total length $T$ of the
branches in the tree, which is

$$T = \sum_{j=2}^{n} j\,T_j,$$

the total number of mutations in the tree follows a
Poisson distribution with mean $\theta T/2$.
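The two-step recipe (build the genealogy, then superimpose mutations) can be sketched as follows. While $j$ lineages exist, each of the $j$ branches accumulates length $T_j$, so $T = \sum_j j T_j$; the mutation count is then Poisson with mean $\theta T/2$. Python's `random` module has no Poisson sampler, so the hypothetical helper `poisson` counts unit-rate exponential arrivals; all function names are illustrative.

```python
import random

def poisson(mean, rng):
    """Poisson(mean) draw: count unit-rate exponential arrivals in [0, mean]."""
    count, total = 0, rng.expovariate(1.0)
    while total < mean:
        count += 1
        total += rng.expovariate(1.0)
    return count

def total_branch_length(n, rng):
    """T = sum_j j * T_j for one neutral coalescent tree (units of N generations)."""
    return sum(j * rng.expovariate(j * (j - 1) / 2) for j in range(n, 1, -1))

def num_mutations(n, theta, rng=None):
    """Genealogy first, then mutations at rate theta/2 along every branch."""
    rng = rng or random.Random()
    T = total_branch_length(n, rng)
    return poisson(theta * T / 2, rng)
```

Averaging over trees, $E[T] = 2\sum_{i=1}^{n-1} 1/i$, so the mean number of mutations is $\theta\sum_{i=1}^{n-1} 1/i$.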


The Selection Case 
In the neutral case, the demography and the mutation 
process can be separated. This is reflected in the fact 
that the genealogy of a sample can be reconstruct- 
ed without reference to the mutation process. This 
separation of demography and mutation process no 
longer holds true when natural selection is incorpor- 
ated into the model. Under selection reproductive suc- 
cess depends on the allelic type. This is reflected in 
the more complicated structure of the ancestral graph. 
The simplest case of a population model with selection
and mutation is a discrete time haploid
Wright-Fisher model with two alleles $A_1$ and $A_2$ at one locus.
Mutations from $A_1$ to $A_2$ or the reverse occur with
probability $u_N$ per gene per generation. Genes of type
$A_2$ have a selective advantage with selection parameter
$s_N$. That is, if $Y_N(k)$ denotes the number of genes of
type $A_1$ at generation $k$, then

$$P\big(Y_N(k+1) = j \mid Y_N(k) = i\big) = \binom{N}{j}\,\psi_i^{\,j}\,(1 - \psi_i)^{N-j}$$

with

$$\psi_i = \frac{p(1 - u_N) + (1 - p)(1 + s_N)u_N}{1 + s_N(1 - p)},$$

where $p = i/N$ is the fraction of genes of type $A_1$ in
generation $k$.
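One generation of this selection-mutation model can be sketched in a few lines: selection reweights $A_2$ by $1 + s_N$, mutation moves genes in either direction with probability $u_N$, and the $N$ offspring are then sampled binomially. The function names `psi` and `next_count` are hypothetical.

```python
import random

def psi(i, N, u, s):
    """Probability that an offspring gene is A1, given i copies of A1 among N."""
    p = i / N
    return (p * (1 - u) + (1 - p) * (1 + s) * u) / (1 + s * (1 - p))

def next_count(i, N, u, s, rng):
    """Draw Y_N(k+1) given Y_N(k) = i: a binomial(N, psi_i) sample."""
    q = psi(i, N, u, s)
    return sum(1 for _ in range(N) if rng.random() < q)
```

With $u = s = 0$ this reduces to the neutral step ($\psi_i = i/N$); with $s > 0$ and no mutation, the expected fraction of $A_1$ drops, reflecting selection in favor of $A_2$.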

Again, when the population size is large, the
genealogy of a sample of $n$ genes can be approximated by
a continuous time Markov process $G(t)$, $t \ge 0$. This
limiting object is called the ‘ancestral selection graph.’
Time $t$ is measured in units of $N$ generations and

$$\lim_{N\to\infty} 2Nu_N = \theta \quad\text{and}\quad \lim_{N\to\infty} 2Ns_N = \sigma.$$

As in the neutral case, the genealogical process can be
most easily explained when visualized as a graph. The
ancestral graph has a coalescing/branching structure.
An ancestral graph is shown in Figure 2. The ancestral
graph is a stochastic process whose dynamics are as
follows. If there are $k$ branches in the graph, then a
coalescence event occurs at rate $k(k-1)/2$, and a
branching event occurs at rate $k\sigma/2$. Coalescing events
correspond to the merging of two ancestral lines as in
the neutral case. Branching events are a characteristic
of genealogies under selection. They reflect the fact
that the fitter type has a higher reproductive success
than the less fit type. Following an ancestral line back
on the ancestral graph, at a branching point the two
branches coming out of a point constitute possible
ancestral paths. The branch that branches off the
straight branch in the graph is called the incoming
branch, while the straight branch is called the original
branch. If the ancestor on the incoming branch is of
the fitter type, then the ancestral path follows the
incoming branch; if not, it follows the original branch.
Paths in the ancestral selection graph are thus possible
ancestral paths. As long as $\sigma < \infty$, the size of the graph
will eventually reach 1. The ancestor at this instant is
called the ultimate ancestor. Which of the paths are
contained in the embedded genealogy can be
determined once the ultimate ancestor is found.

Figure 2 Ancestral selection graph for a sample of five
individuals. Mutations denoted by X.
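The branch-count part of these dynamics is a birth-death chain, which makes it easy to see why the graph reaches size 1 whenever $\sigma < \infty$: coalescence grows quadratically in $k$, branching only linearly. The sketch below tracks only the number of branches (not topology or allelic types) until the ultimate ancestor is reached; the function name is hypothetical.

```python
import random

def asg_ultimate_ancestor_time(n, sigma, rng=None):
    """Branch-count process of the ancestral selection graph.

    With k branches: coalescence at rate k(k-1)/2, branching at rate k*sigma/2.
    Returns (time to the ultimate ancestor in units of N generations,
    largest number of branches reached).
    """
    rng = rng or random.Random()
    k, t, k_max = n, 0.0, n
    while k > 1:
        coal = k * (k - 1) / 2
        branch = k * sigma / 2
        t += rng.expovariate(coal + branch)
        if rng.random() < coal / (coal + branch):
            k -= 1
        else:
            k += 1
            k_max = max(k_max, k)
    return t, k_max
```

With $\sigma = 0$ the chain never branches and reduces to the neutral coalescent; larger $\sigma$ lets the graph grow temporarily before collapsing.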

The type of the ultimate ancestor needs to be chosen 
according to the allele frequencies at the time of the 
ultimate ancestor. For instance, if the gene frequencies 
were in equilibrium at that time, the type of the ultim- 
ate ancestor would be chosen from the stationary dis- 
tribution. 

Mutation events can be treated as in the neutral
case: mutation events are superimposed on the
ancestral graph at rate $\theta/2$, independently in each branch.

Embedded in an ancestral selection graph is
the true genealogy of the sample, called the ‘embedded
genealogy.’ To find the embedded genealogy, one
starts at the ultimate ancestor and follows the graph
forward in time. At mutation events the type changes
accordingly. At coalescing events, the two branches
coming out of the coalescing point receive the same
type as the branch entering the coalescing point. At
branching points, if the incoming branch has the fitter
allele, then the gene on the incoming branch continues;
otherwise, the gene on the original branch continues.
Following these rules one eventually arrives at the
present time and obtains a sample of size $n$. Going
back up the graph, one can then extract the embedded
genealogy and identify, for instance, the most recent
common ancestor. As shown in Figure 3, this may
differ from the ultimate ancestor.

Figure 3 Embedded genealogy from Figure 2.
Mutations denoted by X.


Robustness of the Genealogy 


The coalescent is remarkably robust. It provides a 
good approximation for a large class of reproduction 
models when the population size N is large relative to 
the sample size n. 

This class includes both discrete time models in 
which generations do not overlap and continuous 
time models in which generations overlap. One can 
also change the offspring distribution. For instance, if
the variance of the offspring distribution is $v$, then in
the neutral case the time-scale of the coalescent
changes: the average time between coalescing
events changes by a factor $1/v$. This implies
that the time to the most recent common ancestor is
shortened if the variance of the number of offspring is
increased.

Furthermore, genealogies can be formulated for 
diploid populations. In the neutral case when mating 
is random (i.e., a panmictic population), diploidy sim- 
ply means that the number of genes is doubled: if the 
population size is N, then the number of genes is 2N. 
The genealogy in the diploid case is then the same as in
the haploid case with $N$ replaced by $2N$. In the
selective case when mating is random, the ancestral 
graph is more complicated. At branching points, three 
branches now come together. The additional branch is 
used to identify the type of the diploid parent. As in 
the haploid case it is possible to extract the embedded 
genealogy by following the paths in the ancestral 
graph. 
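As a rough arithmetic aid for the neutral case, the scaling rules above can be combined: the expected time to the MRCA is $2(1 - 1/n)$ coalescent units, each unit being the number of genes ($N$ haploid, or $2N$ for a randomly mating diploid population of $N$ individuals) divided by the offspring variance $v$. The helper below is hypothetical and only restates these rules; $v = 1$ corresponds approximately to the Wright-Fisher model, whose binomial offspring variance is $1 - 1/N$.

```python
def expected_tmrca_generations(n, N, offspring_var=1.0, diploid=False):
    """Approximate E[T_MRCA], in generations, for a neutral sample of n genes.

    One coalescent time unit = (number of genes) / offspring_var generations,
    where the number of genes is N (haploid) or 2N (random-mating diploid).
    """
    genes = 2 * N if diploid else N
    return 2 * (1 - 1 / n) * genes / offspring_var
```

For a pair of genes in a haploid population of 1,000 this gives 1,000 generations; doubling the offspring variance halves it, while diploidy doubles it.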


Varying Population Size 


It is straightforward to incorporate deterministically 
varying population size into the ancestral process. 
This only affects the coalescing rate and is therefore 
the same for both the neutral and the selective case. 

If $N(t)$ denotes the population size $t$ units in the
past, where $t$ is measured in units of $N = N(0)$
generations, and if $N(t)/N = 1/\mu(t)$, then the coalescing
rate is $k(k-1)\mu(t)/2$ if there are $k$ branches present
at time $t$.

The effect of a growing population can be quite
dramatic. For instance, if the population has grown
exponentially, i.e., $N(t) = e^{-\beta t}N$ for some $\beta > 0$, then
$\mu(t) = e^{\beta t}$ and the coalescing rate is $k(k-1)e^{\beta t}/2$. The
resulting graph is stretched near the present time and
compressed in the past (i.e., near the root). In the
neutral case, the resulting graph resembles a star
phylogeny.
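Under exponential growth the waiting times are no longer exponential, but they can be drawn by inverting the integrated coalescing rate. Starting at time $t$ with $k$ branches and $c = k(k-1)/2$, solving $c\,(e^{\beta(t+\tau)} - e^{\beta t})/\beta = E$ with $E \sim \text{Exp}(1)$ gives $\tau = \beta^{-1}\log(1 + \beta E e^{-\beta t}/c)$. The sketch below (hypothetical function name, growth parameter written $\beta$) applies this step by step.

```python
import math
import random

def growth_coalescent_times(n, beta, rng=None):
    """Coalescence epochs when N(t) = exp(-beta*t) * N(0).

    Returns the absolute times (units of N(0) generations, looking back)
    at which the ancestor count drops n -> n-1, ..., 2 -> 1.
    """
    rng = rng or random.Random()
    t, events = 0.0, []
    for k in range(n, 1, -1):
        c = k * (k - 1) / 2
        e = rng.expovariate(1.0)
        if beta == 0:
            t += e / c                      # constant size: ordinary coalescent
        else:
            # Invert c * (exp(beta*(t+tau)) - exp(beta*t)) / beta = e for tau.
            t += math.log1p(beta * e * math.exp(-beta * t) / c) / beta
        events.append(t)
    return events
```

The last entry is $T_{MRCA}$. Comparing many draws with $\beta = 0$ and $\beta > 0$ shows the compression of the older part of the tree described above.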


Recombination 


To describe the genealogy of two linked loci, $L_1$ and
$L_2$, we assume that the population is of fixed size $N$
and evolves according to the neutral Wright-Fisher
model. Recombination occurs independently in each
offspring. In each generation, with probability $1 - r$
each offspring independently inherits the genes at loci
$L_1$ and $L_2$ from the same chromosome; with
probability $r$ the genes are inherited from different
chromosomes (i.e., a recombination event occurred).


When the population is large and

$$\lim_{N\to\infty} 2Nr = \rho,$$

the genealogy of a sample of size $n$ can be
approximated by a continuous time Markov chain $R(t)$, $t \ge 0$,
where time $t$ is measured in units of $N$ generations.
This Markov chain, known as the ‘ancestral
recombination graph,’ can be described as a graph that contains
the lineages of each individual of the sample. Following
a lineage backwards in time on this graph,
recombination events occur at rate $\rho/2$. At such times, the
lineage of the two loci $L_1$ and $L_2$ splits, which results in
a branching event. One branch follows the ancestry of
one locus, the other branch follows the ancestry of the
other locus. Common ancestry is again represented by
the coalescing of branches. An example is given in
Figure 4.

The dynamics of this recombination graph are
given as follows. If there are $k$ branches in the graph,
then a coalescing event occurs at rate $\binom{k}{2}$; that is, each
pair of branches coalesces at rate 1. A branching event,
in which a branch splits into two, occurs at rate $k\rho/2$;
that is, each branch splits into two at rate $\rho/2$.
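The marginal genealogies at $L_1$ and $L_2$ can be sketched by simulating these dynamics while tracking, for each lineage, which loci it carries ancestral material for. As a simplification (standard in two-locus simulators, and a departure from the full graph described above), only lineages carrying both loci are allowed to split, since splits of single-locus lineages create branches with no ancestral material and do not change the marginal trees. The function name is hypothetical.

```python
import random

def two_locus_tmrcas(n, rho, rng=None):
    """Marginal MRCA times at loci L1 and L2 for n >= 2 sampled chromosomes.

    Time is in units of N generations. Each lineage is the frozenset of
    loci for which it carries material ancestral to the sample.
    """
    rng = rng or random.Random()
    lineages = [frozenset({1, 2}) for _ in range(n)]
    t, tmrca = 0.0, {}
    while len(tmrca) < 2:
        k = len(lineages)
        both = [i for i, lin in enumerate(lineages) if len(lin) == 2]
        coal_rate = k * (k - 1) / 2          # each pair coalesces at rate 1
        rec_rate = len(both) * rho / 2       # splits that separate L1 from L2
        total = coal_rate + rec_rate
        t += rng.expovariate(total)
        if rng.random() < coal_rate / total:
            i, j = rng.sample(range(k), 2)
            merged = lineages[i] | lineages[j]
            lineages = [lin for idx, lin in enumerate(lineages) if idx not in (i, j)]
            lineages.append(merged)
        else:
            lineages.pop(rng.choice(both))
            lineages += [frozenset({1}), frozenset({2})]
        for locus in (1, 2):
            carriers = sum(1 for lin in lineages if locus in lin)
            if locus not in tmrca and carriers == 1:
                tmrca[locus] = t             # this locus has found its MRCA
    return tmrca[1], tmrca[2]
```

With $\rho = 0$ both loci share one tree and the two times coincide; as $\rho$ grows the marginal trees, and hence their MRCA times, become less correlated.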

If one adopts the convention that branches that
correspond to the $L_1$ locus are drawn to the left and
branches that correspond to the $L_2$ locus are drawn to
the right at branching points, then the ancestry of each
locus can be traced separately by following the paths
to the left for the $L_1$ locus and to the right for the $L_2$
locus at each branching point. It follows that the
ancestry of each locus is given by the neutral
coalescent process and each subtree has its own most recent
common ancestor. These marginal coalescent trees are
of course not independent of one another.

Figure 4 Two-locus ancestral recombination graph
for sample of five individuals.

The ancestral graph can be adapted to describe
multiple loci by keeping track of where the
breakpoints occur at each recombination event. Just as
earlier, mutations can be superimposed on the ancestral
recombination graph at rate $\theta/2$, independently in
each branch.


Migration and Subdivision 


The assumption of panmixia can be replaced by the 
assumption that the population is geographically 
structured. The simplest case is that of a subdivided 
population in which the population consists of a finite 
number of islands, each populated by a subpopulation. 
The size of the subpopulation on island $i$ is denoted by
$N_i$ for $i = 1, 2, \ldots, K$, where $K$ is the total number of
islands. Reproduction on each island follows the
Wright-Fisher model (possibly with selection). Each
generation, a proportion $m_{ij}$ of the offspring on island
$i$ migrates to island $j$, regardless of their genotype. A
simplifying assumption is to stipulate that the sizes of 
subpopulations are fixed, that is, immigration balances 
emigration at all times. 

When the sizes of all subpopulations are sufficiently
large, the genealogy of a sample of size $n$ can be
approximated by a continuous time Markov chain $S(t)$, $t \ge 0$,
where time $t$ is measured in units of $N = \sum_{i=1}^{K} N_i$
generations. This process is called the ‘structured
coalescent.’ In addition to the coalescent process in
each of the islands, each branch in island $i$, $i = 1,
2, \ldots, K$, ‘migrates’ to island $j$ at rate $\mu_{ij}/2$, where

$$\lim_{N\to\infty} 2N\,\frac{N_j}{N_i}\,m_{ji} = \mu_{ij}.$$

The effect of population subdivision compared to 
the panmictic case is a compression of the coalescent 
near the tips of the tree due to the smaller sizes of the 
subpopulations. However, further back in the past, the 
branches are extended provided the migration rate is 
small enough, since lineages have to be on the same 
island in order to coalesce. 

There are cases where these effects balance each 
other and the mean coalescing time for a pair of 
genes with population subdivision is identical to the 
panmictic case. However, the coalescing time in the 
subdivided population shows much greater variance 
than in the panmictic case. 
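A symmetric island model illustrates both statements. The sketch assumes (hypothetically) $K$ islands of equal size $N/K$, so a pair on one island coalesces at rate $N/N_i = K$, and each lineage ‘migrates’ to each other island at rate $\mu/2$. For two genes sampled on the same island, the mean coalescence time equals 1, the panmictic value, whatever the migration rate, while the variance exceeds the panmictic value of 1 when migration is low. Function names are illustrative.

```python
import random
import statistics

def island_pair_time(K, mu, rng):
    """Coalescence time (units of N generations) of two genes from one island."""
    coal = K                  # pair coalescence rate on a shared island
    leave = (K - 1) * mu      # either lineage moves to any other island
    join = mu                 # either lineage moves onto the other's island
    t, same_island = 0.0, True
    while True:
        if same_island:
            total = coal + leave
            t += rng.expovariate(total)
            if rng.random() < coal / total:
                return t
            same_island = False
        else:
            # Moves to a third island keep the pair apart, so only 'join' matters.
            t += rng.expovariate(join)
            same_island = True

def pair_time_moments(K, mu, reps=20000, seed=1):
    """Monte Carlo mean and variance of the pairwise coalescence time."""
    rng = random.Random(seed)
    times = [island_pair_time(K, mu, rng) for _ in range(reps)]
    return statistics.mean(times), statistics.variance(times)
```

For comparison, in a panmictic population the pairwise time is exponential with mean 1 and variance 1.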


Strong Selection 


Under selection, demography and mutation become
inseparable, which results in a more complicated
ancestral process. However, if selection is sufficiently
strong, one can again separate demography and muta- 
tion, at least approximately. The embedded genealogy 
then becomes approximately a simple time change 
of Kingman’s neutral coalescent. The reason for this 
is that under strong selection the population dynam- 
ics are on a much faster time-scale than coalescing 
events. 

In cases where this separation of time-scale occurs, 
the ancestral process can be modeled as a change in the 
effective population size. In particular, this says that 
not only are the expected times between coalescing 
events a time change relative to the neutral case but the 
distribution of the coalescing events is the same as in 
the neutral case except for the time-scale. 

Strong selection can often be modeled as a sub- 
divided population where the subpopulations corres- 
pond to the different alleles. Migration between 
subpopulations is then governed by the mutation 
process. 


Other Coalescents 


The structure of the coalescent has been identified for 
a wide variety of other phenomena, such as non- 
random mating (e.g., selfing), different sexes, age struc- 
ture, and so on. We have assumed in our exposition 
that mutation, recombination, and selection rates are 
of the order of the reciprocal of the population size. In 
cases where this is not true, other behavior for the 
genealogy is possible; discrete time branching pro- 
cesses arise in this context. 


Inference 


An important use of coalescents arises when using
random population samples to estimate population
parameters such as $\rho$, $\theta$, and $\sigma$. A number of
approaches have been proposed for this purpose,
including those based on the behavior of summary
statistics (for example, the number of segregating
sites observed in a sample of DNA sequences is often
used to estimate $\theta$). Full likelihood methods and
Bayesian approaches are currently of great interest,
particularly as they provide an inferential framework
for mapping disease genes by linkage disequilibrium
mapping, and by haplotype sharing. Importance
sampling and Markov chain Monte Carlo approaches
have proved useful in this context.
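The segregating-sites example can be made concrete with Watterson's moment estimator. Assuming each mutation falls at a new site (an infinite-sites assumption), the number of segregating sites $S$ equals the number of mutations, whose expectation under the neutral coalescent is $\theta a_n$ with $a_n = \sum_{i=1}^{n-1} 1/i$; equating $S$ with its expectation gives $\hat\theta = S/a_n$. The function names are hypothetical.

```python
def harmonic(n):
    """a_n = sum_{i=1}^{n-1} 1/i, so that E[S] = theta * a_n."""
    return sum(1.0 / i for i in range(1, n))

def watterson_theta(segregating_sites, n):
    """Moment estimator of theta from S segregating sites in n sequences."""
    return segregating_sites / harmonic(n)
```

For instance, 10 segregating sites in a sample of 5 sequences give $\hat\theta = 10/(25/12) = 4.8$.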
History


Discrete variation in external morphology or appear- 
ance that segregates in pedigrees provides the corner- 
stone of genetics in all organisms, and therefore it is no 
surprise that coat color variation has played a crucial 
role in animal genetics. 

Pioneering studies in this field were carried out at
the turn of the twentieth century by William Castle and two of
his students, Clarence Cook Little and Sewall Wright.
In particular, Wright published a series of manuscripts 
in 1917 and 1918 in which he argued that coat color 
genetics was a useful way “to assist embryology and 
biochemistry in filling in the links between germ cell 
and adult in specific cases,” because many coat color 
mutations were available for comparative study, and 
because a rudimentary knowledge of pigment chem- 
istry and biochemistry provided a foundation with 
which to interpret genetic interaction experiments. 
For example, in crosses between rabbits that were 
black or yellow, Wright remarked that there was a 
reciprocal biochemical and genetic relationship, indi- 
cative of different alleles acting at the same locus. 
By contrast, in crosses to albino rabbits, it was found 
that albinism could mask, or in genetic parlance, “be 
epistatic to,” either black or yellow. This led to the 
hypothesis, verified biochemically nearly 50 years 
later, that the biochemical process responsible for 
determining whether hairs were black or yellow acted 
on a single substrate produced by the product of the 
albino locus. 

In the early 1900s, availability of many coat color 
mutations for comparative study was driven primarily 
by cultural rather than scientific factors, as new vari- 
ants of spontaneous origin had been collected and 
maintained by communities of animal enthusiasts, 
so-called fanciers, for several hundred years. How- 
ever, the value of mutants to biomedical research 
became increasingly apparent and, by the 1920s, sys- 
tematic attempts were initiated at several academic 
institutions to catalog and preserve different mutations 
and, in addition, to develop inbred strains of animals 
so that the effects of different mutations could be 
studied on a consistent genetic background. Although 
much of Wright’s early work was with guinea pigs, the 
house mouse rapidly became favored due to smaller 
size, ready availability, and rapid generation time. The 
Jackson Laboratory, in Bar Harbor, Maine, founded in
1929 by C.C. Little, has played, and continues to play,
an especially prominent role, providing a repository
and distribution center for different mutations and
strains of mice to scientists around the world. Thus,
most of our knowledge regarding coat color gene 
action has come from mice, although in some cases 
studies in other mammals have confirmed or refined 
our principles of color inheritance. 

Many coat color mutations in mice and virtually all 
those in other mammals are of spontaneous origin. 
However, a special class of mouse mutations at a small 
number of loci have been induced in experiments 
designed to measure and characterize genotoxic effects 
of radiation or chemicals. Supported at large national 
laboratories such as Harwell (UK), Neuherberg 
(Germany), or Oak Ridge, Tennessee (USA), 
most of these experiments have been designed to
detect loss-of-function mutations at one of seven dif- 
ferent loci: Agouti (a), Brown (b), Albino (c), Dilute 
(d), Short ear (se), Pink-eyed dilution (p), and Piebald 
(s), of which all but one, Short ear, affect coat color. 
Typically, mutagenized wild-type animals are crossed 
to a/a, b/b, c/c, d se/d se, p/p, s/s animals, allowing new 
recessive mutations for each of the seven ‘specific 
loci’ to be recognized in the F1 progeny, along with
new dominant mutations at other loci. The availability 
of multiple alleles at a single locus can be a powerful 
tool in any genetic system; consequently experimental 
results based on the specific locus test have played an 
important role in the history of coat color gene action. 

Discussing genes, mutations, and loci can be con- 
fusing, since the meaning of these terms has changed 
somewhat as the era of molecular genetics has ma- 
tured. Experimental geneticists use the term ‘locus’ to 
describe a specific heritable trait whose map position 
can be compared with other heritable traits that 
produce a similar effect. For example, oculocutaneous 
albinism in humans is a recessive condition, but, 
rarely, a mating between albino individuals may
produce children that all have normal pigmentation,
indicating that albinism in the parents is caused by
mutations at two different loci. (One is homologous to the mouse
albino locus, while the other is homologous to the 
mouse pink-eyed dilution gene.) Although the term 
‘mutation’ is frequently used to describe an alteration 
in DNA sequence, here we use ‘mutation’ to describe 
phenotypic variation in a heritable trait. Finally, the 
term ‘gene’ may be used to describe a unit of heritable 
variation (similar to locus) but, in a molecular context, 
usually refers to a contiguous DNA sequence required 
for production of a specific RNA or protein product. 


Number of Coat Color Mutations and 
Coat Color Genes 


Experiments based on the specific locus test have pro- 
duced a large number of recessive mutations for the a, 
b, c, d, p, and s loci, as well as a large number of 
dominant or semidominant mutations for the White
spotting (W) and Steel (Sl) loci. The types of
mutations produced are usually loss-of-function, and the
different inheritance patterns reflect intrinsic
differences between gene action at the two groups of loci.
In most circumstances, the a, b, c, d, p, and s loci are
not very sensitive to gene dosage, such that the
phenotypes of A/a, B/b, C/c, D/d, P/p, or S/s mice are
identical to the phenotypes of A/A, B/B, C/C, D/D, P/P,
or S/S mice, respectively. By contrast, W/w or Sl/sl
mice are easily distinguished from W/W or Sl/Sl
mice (in genetic parlance, these loci are
‘haploinsufficient’).


In addition to a, b, c, d, p, s, W, and Sl, spontaneous coat
color mutations have been observed for approximately 
90 additional loci in mice (generally with a small 
number of mutations per locus) for a total of approxi- 
mately 100 different coat color genes. 

The terminology used here, up to this point, for the 
different loci is historical and reflects the fact that the 
genes were identified originally by virtue of their 
phenotype. However, in modern nomenclature, most 
of the genes have been renamed to reflect their protein 
product. Thus, the b, c, d, s, W, and Sl genes are now
referred to as tyrosinase-related protein 1 (Tyrp1),
tyrosinase (Tyr), myosin 5a (Myo5a), endothelin
receptor B (Ednrb), c-Kit proto-oncogene (Kit), and 
mast cell growth factor (Mgf), respectively (Agouti 
and Pink-eyed dilution have retained their original 
names). 

In most mammals other than mice, a small number 
of loci (fewer than 10) have been recognized as coat 
color mutations. While only a few have been charac- 
terized at a molecular level, in many cases it has been 
possible to assign homologies among different mam- 
mals in the absence of molecular information. For 
example, a temperature-sensitive loss-of-function 
mutation in Tyrosinase produces a distinctive pheno- 
type known as the ‘Himalayan mutation’ in rabbits, 
mice, or guinea pigs, and is also responsible for the 
characteristic appearance of Siamese cats. 


Different Types of Coat Color Mutations 


Coat color mutations are usually classified on the basis 
of cellular and/or developmental processes that are 
disrupted: pigment cell differentiation/migration/sur- 
vival, biochemical synthesis of melanin, intracellular 
trafficking/membrane sorting of pigment granules, or 
pigment type-switching. An alternative approach to 
classification is based on whether the effects of a 
particular mutation are limited to coat color or are 
pleiotropic, affecting multiple processes in tissues of 
different embryonic origins. For example, several coat 
color mutations that disrupt intracellular pigment 
granule sorting also affect the sorting of intracellular 
contents in platelets, leading to prolonged bleeding 
time. 


Mutations that Cause White Spotting 


During embryonic development, pigment cell
precursors, melanoblasts, differentiate from a specialized
region of the neural tube, the neural crest, which also
gives rise to the peripheral nervous system, connective
tissue of the head and neck, and a portion of the
adrenal gland. The melanoblasts proliferate and
migrate from the middorsal region in a lateral direction to
meet at the ventral midline. In general, melanoblasts
are restricted from migrating along the rostrocaudal 
axis, but probably produce paracrine factors that dif- 
fuse beyond the boundaries of migration, which may 
explain why death of a melanoblast during the migra- 
tion process can cause an irregular, localized white 
spot in the adult animal. The developmental history 
of pigment cells also helps to explain why white spots 
are especially common on the ventral body surface, 
and why individual spots never cross the ventral mid- 
line. By contrast, the loss of pigment cells that appears 
in juvenile or adult life, also known as ‘vitiligo,’ is 
caused not by a developmental abnormality, but 
instead by destruction of pigment cells, often by an 
autoimmune process. 

The action of white-spotting mutations is represen- 
tative of many developmental processes that are sto- 
chastic. For example, in animals heterozygous for a 
loss-of-function mutation at Ednrb, which encodes a 
receptor on melanoblasts that helps stimulate migra- 
tion and proliferation, every cell in the animal has 
reduced gene dosage for Ednrb, which lowers the 
threshold for additional factors — environmental, 
genetic, or random — that may cause the death of an 
individual melanoblast. Thus, animals with identical 
Ednrb mutations have different amounts of spotting, 
and their spots are located in different regions of the 
body. In an extreme case, white-spotting mutations 
cause a completely white coat with preservation of 
pigment in the back of the eye, since retinal pigment 
epithelial cells are not derived from the neural crest 
and do not depend on many of the molecular pro- 
cesses used during melanocyte development. This 
phenotype of “one big spot” is easily recognizable in 
many different animals, e.g., white horses, white cows, 
or white cats. 

Many white-spotting mutations are pleiotropic, 
because the molecular process disrupted by the muta- 
tion is used in tissues other than pigment cells. For 
example, Kit encodes a receptor that is required 
for proliferation, migration, and/or survival not 
only for melanoblasts but also for developing blood 
cells and germ cells, therefore some Kit mutations 
cause not only white spotting, but also anemia and 
sterility. Neural-crest-derived melanocytes (though 
not pigment) are also required for proper function of 
the inner ear, therefore some white-spotting mutations 
also cause deafness. 

While most white-spotting mutations produce 
localized deficiency of melanocytes in an irregular 
pattern that varies among genetically identical animals, 
mutations that produce a regular and stereotypic 
pattern of spotting are easily recognized in certain 
species, e.g., panda bears or weasels. Although 
the underlying mechanisms are uncertain, similar 
phenotypes in mice are due to unusual molecular
alterations that cause components of the melanoblast 
migration machinery to be overexpressed in certain 
regions of the body. In some cases, e.g., raccoons and
zebras, regular patterns of white spotting are probably 
caused not by melanocyte deficiency, but instead by 
genes that affect pigment type-switching. Regardless 
of the underlying mechanism, rare genetic variation in 
coat color genes has provided a substrate for environ- 
mental adaptation and selective advantage in certain 
species during mammalian evolution. 


Mutations that Affect Melanin 
Biosynthesis and Different Pigment 
Types 


In mammals, melanin is a complex polymer, derived 
from oxidized derivatives of tyrosine, and is deposited 
in an organized fashion within subcellular organ- 
elles known as ‘melanosomes.’ Melanin biosynthesis, 
which takes place within these organelles, requires a 
series of enzymes for different oxidation steps, struc- 
tural proteins to make up the melanosome matrix, and 
transporters to maintain the appropriate levels of
constituents inside the melanosomes. In contrast to white
spotting, mutations that impair melanin biosynthesis 
affect the entire animal, often including retinal pig- 
ment. The best-known mutation of this type, albino, 
is a complete loss-of-function for tyrosinase, which 
catalyzes the initial step in melanin biosynthesis, oxi- 
dation of tyrosine to dopaquinone. 

Further enzymatic oxidation of dopaquinone pro- 
vides precursors for brown/black eumelanin, whereas 
cysteinyl derivatives of dopaquinone provide precur- 
sors for red/yellow pheomelanin. Thus, tyrosinase 
is required for synthesis of both types of pigment, 
whereas additional melanin biosynthetic genes are 
generally used either for eumelanin or pheomelanin, 
but not both. Genes required for eumelanin but not 
pheomelanin synthesis have been especially well char- 
acterized; loss of function in some, e.g., Pink-eyed 
dilution, blocks nearly all eumelanin synthesis, while 
loss of function in others, e.g., Tyrosinase-related pro- 
tein 1, alters the quality of eumelanin, causing it to 
appear brown instead of black. 

In general, genes required for eumelanin bio- 
synthesis are not used outside of pigment cells, there- 
fore their primary effects are limited to pigmentation. 
However, retinal pigment is required for axons of 
retinal ganglion cells to project to their proper loca- 
tions in the brain. In addition, while neural-crest- 
derived melanocytes may produce eumelanin or 
pheomelanin, retinal pigment cells make only eume- 
lanin. Thus, absence of eumelanin may have variable 
effects on coat color (depending on whether or not 
pheomelanin is synthesized), but always causes a
loss of retinal pigment and secondary defects in 
visual perception. Furthermore, while genetic vari- 
ation in melanin biosynthetic components is probably 
responsible for a wide range of coat color phenotypes 
seen in nature, complete loss of pigmentation as in the 
albino phenotype is generally limited to animals in 
captivity. 


Mutations that Affect Pigment Granule 
Trafficking or Membrane Sorting 


In mice, mutations in a large class of coat color genes 
produce a generalized pigmentary dilution, platelet 
storage pool deficiency, and abnormal lysosomal 
trafficking. Among those whose molecular identity 
is known are several that encode components of 
membrane-sorting pathways. A related class of genes 
encodes components of molecular motors required for 
the intracellular transport of melanosomes. The identi- 
fication and analysis of both types of genes has pro- 
vided both a useful resource for, and molecular insight 
into, basic aspects of cell biology. Because mutations 
in most of these genes have nonpigmentary effects, 
there is little genetic variation outside of laboratory 
animals or human patients. However, in some cases, 
mutations in homologous genes have been identified 
among several domesticated species. For example, 
Chediak-Higashi syndrome, characterized by
pigmentary dilution, abnormal membrane trafficking,
and immunodeficiency, is found in humans, mice, 
cats, mink, and cattle. 


Mutations that Affect Pigment 
Type-Switching 


As described above, hair follicle melanocytes may 
switch between the two basic pigment types, red/ 
yellow pheomelanin and brown/black eumelanin.
Depending on genetic background, switching between
pigment types occurs at specific times during hair 
growth and in particular regions of the body, allowing 
genetic control of pigment type-switching to give rise 
to a diversity of coat color patterns. 

A paracrine signaling molecule that plays a key role 
in pigment type-switching, Agouti protein, is pro- 
duced by specialized dermal cells underneath each 
hair follicle, and causes overlying melanocytes to 
switch from the synthesis of eumelanin to pheome- 
lanin. A commonly observed pattern in many animals, 
including a group of South American rodents after 
which Agouti protein is named, is the presence of a 
subapical band of pheomelanin on a hair that is
otherwise eumelanic. The presence of such a band on most
or all body hairs gives the entire animal a brushed
golden appearance that can provide camouflage in 
some circumstances. 

Mutations in several genes can alter pigment type- 
switching, including the Agouti gene itself, and the 
Melanocortin receptor 1 (Mc1r) gene, which encodes
the receptor for Agouti protein expressed on melano-
cytes. Genetic variation in Agouti and Mc1r is an
important source of natural coat color polymorphisms
that alter the balance between eumelanin and pheo-
melanin, and has been found in several domesticated
species, including dogs, pigs, horses, and cows, as well as in
humans.

The intracellular signaling events responsible for 
switching from the synthesis of eumelanin to pheo- 
melanin are not completely understood, but one 
important component associated with the switch is 
downregulation of tyrosinase activity, since pheo- 
melanin synthesis apparently requires less tyrosinase 
activity than does eumelanin synthesis. However, in 
some genetic backgrounds, Agouti signaling reduces 
tyrosinase activity to a level no longer sufficient to 
maintain pheomelanin synthesis, causing a switch 
from production of black/brown pigment to almost 
no pigment. This phenomenon is probably respon- 
sible for the difference between the appearances of 
brushed golden and brushed gray, the latter being 
characteristic of animals such as the chinchilla or the 
gray wolf. 

Among the most interesting groups of coat color
mutations are those that cause regular patterns of 
stripes or spots, as in zebras, tigers, leopards, or gir- 
affes. Although chemical or biochemical studies have 
not been carried out, the components of such patterns 
are likely to be eumelanin alternating with pheome- 
lanin (as in tigers or leopards), or eumelanin alternat- 
ing with no pigment (as in zebras). 

Thus, the mechanisms operative in pigment type- 
switching — Agouti and Mc1r signaling — may also be 
responsible, in part, for regular pigmentation patterns. 
However, in contrast to most coat color variants, an 
ordered pattern of stripes or spots has not been 
identified in laboratory mice or other rodents, which 
has hampered molecular genetic insight into the 
underlying mechanisms. Nonetheless, some limited 
conclusions can be drawn from genetic studies in 
domestic cats, where different alleles of a single 
gene, Tabby, modify pigment type-switching in regu- 
lar patterns that may resemble tiger stripes or leopard 
spots. Because a single Tabby genotype can produce 
patterns that are either yellow alternating with black 
or white alternating with black, the white areas 
probably represent pigment type-switching rather 
than the absence of pigment cells. Whether a similar 
phenomenon explains alternating patterns of black
and white in ungulates, e.g., zebras, is less clear,
however, since the Tabby gene is clearly recognized only in
the Carnivora. 


Insight from Coat Color Mutations into 
Human Pigmentation 


As an increasing number of genomes are sequenced, it 
is becoming clear that the genomes of different mam- 
mals show relatively little variation in gene content or 
gene identity. It is no surprise, then, that many of the 
coat color mutations identified in mice or other furred 
animals have also been found in humans. Genetic 
variation in human pigmentation genes can be classi- 
fied into rare, disease-causing variants such as albin- 
ism or piebaldism, or common variation in eye, hair, 
and skin color that may distinguish individuals of 
different ancestry. 

In medical genetics, albinism refers to a generalized 
dilution or loss of pigmentation and is broadly 
grouped into conditions that affect eyes, skin, and 
hair (oculocutaneous albinism) or just the eyes 
(ocular albinism). In both cases, defects in retinal
pigmentation frequently lead to visual impairment.
Approximately 10 different genes have been identified 
that, when mutated, cause human albinism, including 
some involved in melanin biosynthesis such as 
Tyrosinase, Tyrosinase-related protein 1, Pink-eyed 
dilution, and others involved in vacuolar sorting
or transport, e.g., those mutated in Hermansky-Pudlak or
Chediak-Higashi syndromes. In addition, several genes
identified because of white spotting in mice are also sources
of mutations that cause localized loss of melanocytes 
in humans, occasionally associated with deafness, a 
condition termed Waardenburg syndrome. 

Mutations that affect pigment type-switching are 
also found in humans but, in contrast to the conditions 
described previously, are relatively common and a 
source of normal variation in many human popula- 
tions. In particular, loss-of-function mutations in the 
human Mc1r gene account for the majority of individuals
in populations of European ancestry that have
carrot-red hair, fair skin, and freckling. 

The genetic causes of blond versus brown versus 
black hair, or those responsible for skin pigment 
phenotypes characteristic of individuals of African,
Asian, or European ancestry, have not been identified. 
However, biochemical and histological studies 
suggest these determinants are likely to have a rela- 
tively minor effect on pigment type-switching, and 
instead are more likely to modulate overall levels 
of melanogenesis. Identifying and understanding 
how these genes act remains a challenge for the 
future. 
Coding sequences are regions on the DNA that
encode a gene product and are distinguishable from 
regulatory sequences such as promoters or operators. 
The product encoded can be an RNA or a protein; 
therefore, the coding sequences of an organism 
include genes that encode proteins (open reading 
frames, ORFs) and genes that encode stable RNA. In 
most organisms the ORFs greatly outnumber the 
stable-RNA encoding sequences. Occasionally ‘cod- 
ing sequence’ and ‘ORF’ are used as synonyms, but 
this is incorrect. 
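A minimal Python sketch of the ORF idea, assuming the simplest definition: an ATG start codon read in frame to the first stop codon, scanning only the strand as given. The function name find_orfs and the min_codons cutoff are illustrative choices, not a standard tool; real gene finders also scan the reverse complement, allow alternative start codons, and weigh other evidence.

```python
STOPS = {"TAA", "TAG", "TGA"}


def find_orfs(seq: str, min_codons: int = 2) -> list[tuple[int, int]]:
    """Return (start, end) pairs of ORFs on the strand as given.

    An ORF is taken here to run from an ATG to the first in-frame stop codon;
    `end` is exclusive and includes the stop codon. ATGs nested inside an
    already open ORF are not reported separately.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first ATG in this frame opens a candidate ORF
            elif start is not None and codon in STOPS:
                if (i - start) // 3 >= min_codons:  # sense codons before the stop
                    orfs.append((start, i + 3))
                start = None
    return sorted(orfs)


print(find_orfs("CCATGAAATTTTAGCC"))  # [(2, 14)]
```

Here the reported ORF, positions 2 to 14, is ATG AAA TTT TAG; a start codon with no downstream in-frame stop is not reported.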

Because of the constraints placed on base compos- 
ition by the genetic code, protein-coding sequences 
may have a different base composition from non- 
coding sequences in organisms or genetic elements 
that have a highly biased base composition. In
prokaryotes, coding sequences make up the largest fraction of
the genome, usually 90% or above. However, in the 
higher eukaryotes the fraction is much lower. In 
humans only about 3% of the genome consists of 
coding sequences. 

In eukaryotes, the sequences that encode a single 
product are often interrupted by sequences called 
introns, which do not encode a part of the product of 
the gene. (A few introns do encode separate products.) 
The coding sequences in such an arrangement are 
referred to as exons. Introns in protein-encoding 
genes are the rule in the higher eukaryotes, such 
as mammals, but are much less common in lower 
eukaryotes, such as fungi. Introns are rarely found in
prokaryotes. 

Although the introns found in RNA are encoded 
by DNA, this DNA is not considered a ‘coding se- 
quence.’ Furthermore, the sequences found in mature 
mRNA upstream or downstream from the actual 
ORF are not considered coding sequences, even if 
they are required for translation of the mRNA. How- 
ever, DNA sequences that lead to translated RNA are 
considered coding sequences even if the encoded 
amino acids are not found as part of the final gene 
product. For instance, the DNA encoding the initiat- 
ing methionine of a protein is considered a coding 
sequence even when the methionine is later removed
from the protein. 


See also: Codon Usage Bias; Gene Product; 
Introns and Exons; Open Reading Frame 
A coding strand is the DNA strand with the same
sequence as mRNA (with T in place of U).
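The relationship can be sketched in a few lines of Python; the helper names are hypothetical. The mRNA matches the coding strand with U for T, while the template strand is the reverse complement of the coding strand.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def mrna_from_coding(coding: str) -> str:
    """mRNA has the coding-strand sequence, with U in place of T."""
    return coding.upper().replace("T", "U")


def template_from_coding(coding: str) -> str:
    """Template strand: reverse complement of the coding strand, written 5'->3'."""
    return coding.upper().translate(COMPLEMENT)[::-1]


coding = "ATGGCCTAA"
print(mrna_from_coding(coding))      # AUGGCCUAA
print(template_from_coding(coding))  # TTAGGCCAT
```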


See also: Messenger RNA (mRNA) 
The term codominance describes the relationship
between two alleles at a locus when animals hetero- 
zygous for the two alleles display both of the pheno- 
types observed in animals homozygous for one allele 
or the other. A prominent example of codominance 
occurs with the A and B alleles at the classical blood 
type locus (symbolized as I). People heterozygous for 
the alleles A and B express both the A and B blood type 
antigens. Thus, these heterozygous individuals are 
readily distinguishable from both A homozygotes 
and B homozygotes. The term has also been coopted 
by molecular biologists to describe any DNA marker 
for which alternative alleles can both be readily 
detected with the use of a DNA-based assay of some 
kind. 
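As an illustration, the phenotype rule at the classical blood type locus can be written as a small Python function. Alleles are labeled 'A', 'B', and 'O' for brevity, an assumption of this sketch standing in for the alleles of the I locus; the O allele, recessive to both A and B, comes from standard ABO genetics and is not discussed above.

```python
def abo_phenotype(allele1: str, allele2: str) -> str:
    """Blood group from two alleles: A and B are codominant, O is recessive to both."""
    antigens = {a for a in (allele1, allele2) if a != "O"}  # O contributes no antigen
    if not antigens:
        return "O"
    return "".join(sorted(antigens))  # both antigens expressed in A/B heterozygotes


for genotype in [("A", "A"), ("A", "O"), ("B", "B"), ("A", "B"), ("O", "O")]:
    print(genotype, abo_phenotype(*genotype))
```

The A/B heterozygote is reported as "AB", distinct from both homozygotes, which is the defining feature of codominance.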


See also: Dominance; Overdominance 
The genetic code is degenerate: except for Met and
Trp, all amino acids are encoded by more than one 
nucleotide triplet (codon). The number of alternative, 
or synonymous, codons varies from two to six, with 
the synonyms generally differing at their third pos- 
ition. It might be expected that alternative syn- 
onymous codons would be used in roughly equal 
frequencies, but this is not so. Most genes from most 
species exhibit biased codon usage. Different species 
have different codon usage profiles, and codon usage 
often varies significantly among genes from the same 
genome. 
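The degeneracy described above can be checked directly against the standard genetic code. This Python sketch encodes the code as the 64-letter string of NCBI translation table 1, with codons ordered T, C, A, G at each position and '*' marking stop codons, and counts synonyms per amino acid.

```python
from collections import Counter
from itertools import product

BASES = "TCAG"
# Standard genetic code (NCBI translation table 1); first base varies slowest.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Number of synonymous codons per amino acid; stop codons counted separately.
degeneracy = Counter(CODE.values())
stops = degeneracy.pop("*")

print(stops)                                                  # 3
print(sorted(aa for aa, n in degeneracy.items() if n == 1))   # ['M', 'W']
print(sorted(aa for aa, n in degeneracy.items() if n == 6))   # ['L', 'R', 'S']
```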

The pattern of synonymous codon usage must 
reflect the combined influences of mutation, natural 
selection, and random genetic drift. Investigations of 
codon usage have provided interesting insights into 
basic aspects of cell biochemistry, genetics, and evolu- 
tionary biology. The results can also have useful 
applications. 


Mutation Biases 


Even in the absence of natural selection, DNA 
sequences rarely consist of equal proportions of the 
four nucleotides A, C, G, and T. Mutational biases are 
pervasive and lead to biased patterns of codon usage, 
with different patterns in different species, and some- 
times different patterns in different genes from the 
same genome, dependent on their location. 


Genome G+C Content 

In double-stranded DNA, any occurrence of A (or C) 
in one strand implies a T (or G) in the other, and so the 
base composition of DNA sequences is primarily 
described by their G+C content. The overall genomic 
G+C content of different species varies widely. This 
is particularly true in bacteria, where genomic G+C 
ranges from about 25% (e.g., in species of Myco- 
plasma) to 75% (e.g., in species of Streptomyces) 
(Table 1). Bacterial genomes are mostly composed of
coding sequences, with little intergenic DNA, and so 
constraints on the amino acid sequences of bacterial 
proteins mean that most of the G+C variation among 
species is accommodated at synonymously variable 
sites in genes. Consequently, when only third pos- 
itions of codons are considered the range of G+C 
values among species is even wider, from near zero 
to almost 100%. These base composition biases most
likely reflect the effect of subtle mutational biases
accumulating over millions of generations.


Table 1 Codon usage bias in various species(a)

Columns, by species(b) and gene class:
(1) Escherichia coli, High          (2) E. coli, Low
(3) Bacillus subtilis, High         (4) B. subtilis, Low
(5) Mycoplasma capricolum           (6) Streptomyces coelicolor
(7) Saccharomyces cerevisiae, High  (8) S. cerevisiae, Low
(9) Drosophila melanogaster, High   (10) D. melanogaster, Low
(11) Homo sapiens, A+T              (12) Homo sapiens, G+C

              (1)  (2)  (3)  (4)  (5)  (6)  (7)  (8)  (9)  (10) (11) (12)
Phe  UUU      0.3  1.0  0.4  1.4  1.8  0.0  0.2  1.1  0.2  1.1  1.5  0.2
     UUC      1.7  1.0  1.6  0.6  0.2  2.0  1.8  0.9  1.8  0.9  0.5  1.8
Leu  UUA      0.0  0.9  0.9  1.1  4.5  0.0  0.6  1.2  0.0  0.9  1.4  0.0
     UUG      0.   1.0  0.3  0.9  0.3  0.1  5.1  1.2  0.6  1.4  1.1  0.2
     CUU      0.   0.9  3.9  1.6  0.5  0.1  0.0  0.8  0.2  1.0  1.5  0.1
     CUC      0.2  0.5  0.3  0.7  0.0  2.3  0.0  0.7  0.9  0.5  0.5  1.6
     CUA      0.0  0.2  0.4  0.3  0.6  0.1  0.2  0.8  0.1  0.8  0.7  0.1
     CUG      5.6  2.6  0.2  1.3  0.0  3.4  0.0  1.2  4.2  1.3  0.8  4.0

(a) Values indicate the relative synonymous codon usage, calculated as the observed values divided by those expected if synonyms are used equally.
(b) High and Low indicate genes expressed at high and low levels, for species with codon bias selected for translation. A+T and G+C indicate A+T-rich and G+C-rich genes.


Leading vs. Lagging Strands 

Chromosome replication in bacteria involves leading 
and lagging strands: the leading strand, proceeding 5′
to 3′ bidirectionally away from the origin of replication,
is replicated first. In many bacterial species, base
composition differs between the strands. Generally G 
has a higher frequency on the leading than on the 
lagging strand, which can be alternatively expressed 
as an excess of G over C on the leading strand, or GC 
skew; often there is also an excess of T over A on the 
leading strand. The magnitude of these skews varies 
among species, from highly pronounced in the spiro- 
chaete Borrelia burgdorferi (the agent of Lyme 
disease) to nonexistent in the cyanobacterium Syn- 
echocystis. The effect is most pronounced at synonym- 
ously variable sites in genes, and so codon usage is 
heavily influenced by it. The phenomenon most likely 
reflects mutational biases, in this case differing 
between the two strands because the processes of 
replication (and the errors that they incur) differ. 
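Strand composition skews are commonly summarized as (G − C)/(G + C), the GC skew, and analogously as (T − A)/(T + A) for the excess of T over A. A minimal Python sketch, assuming a single strand read 5′ to 3′; in practice the values are computed in windows along the chromosome, and in many bacteria the sign of the GC skew switches near the replication origin and terminus.

```python
def gc_skew(seq: str) -> float:
    """(G - C) / (G + C); positive values mean an excess of G over C on this strand."""
    seq = seq.upper()
    g, c = seq.count("G"), seq.count("C")
    return (g - c) / (g + c) if g + c else 0.0


def ta_skew(seq: str) -> float:
    """(T - A) / (T + A); positive values mean an excess of T over A on this strand."""
    seq = seq.upper()
    t, a = seq.count("T"), seq.count("A")
    return (t - a) / (t + a) if t + a else 0.0


strand = "GGGTCATGGCTT"  # hypothetical strand: 5 G, 2 C, 4 T, 1 A
print(round(gc_skew(strand), 3), round(ta_skew(strand), 3))  # 0.429 0.6
```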


Intragenomic Regions 

In eukaryotes base composition often varies among 
regions of the genome. For example, in mammals the 
patterns of codon usage in different genes reflect this. 
Human genes vary in G+C content at third positions 
of codons, from about 30% to 90% (Table 1). Genes 
with G+C-rich codon usage also have relatively 
G+C-rich introns and flanking sequences. Neigh- 
boring genes have similar G+C contents. Mutational 
biases are the simplest explanation, in this case imply- 
ing that the biases vary across the genome. Several 
processes may have different impacts on different 
regions of the genome. The spectrum of mutations 
may vary during the replication cycle, such that se- 
quences replicated late suffer a different pattern of 
mutation from those replicated earlier. Also the rate 
of recombination varies around the genome: recombin- 
ation involves DNA repair which is biased towards 
G+C-richness in mammals, so that genes undergoing 
recombination at different rates can incur different 
mutational biases. 
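G+C content at third codon positions (often called GC3) is simple to compute from an in-frame coding sequence. A minimal Python sketch; the example sequence is hypothetical, and this version counts every codon, whereas some analyses restrict the count to synonymously variable third positions.

```python
def gc_content(seq: str) -> float:
    """Fraction of G or C over the whole sequence."""
    seq = seq.upper()
    return sum(base in "GC" for base in seq) / len(seq)


def gc3(cds: str) -> float:
    """Fraction of G or C at the third position of each codon (sequence assumed in frame)."""
    thirds = cds.upper()[2::3]  # every third base, starting at the first codon's third position
    return sum(base in "GC" for base in thirds) / len(thirds)


cds = "ATGCTGTTCGAGAAGTAA"  # hypothetical coding sequence: ATG CTG TTC GAG AAG TAA
print(round(gc_content(cds), 3), round(gc3(cds), 3))  # 0.389 0.833
```

In this toy sequence third positions are far richer in G+C than the sequence overall, the kind of contrast that makes third-position composition a sensitive indicator of regional mutational bias.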


Fine-Scale Variations 

Mutational biases also vary on the fine scale. The 
clearest example concerns the dinucleotide CpG. In 
many species, including mammals, when C has G as its 
3’ neighbor (i.e., in the dinucleotide CpG), the C is 
prone to methylation. This methylated C is then susceptible to
deamination, becoming T. Thus the CpG dinucleotide
is a mutational hot spot, and depletion due to
mutation leads to this dinucleotide occurring much more
rarely than expected based on the occurrence of C and
G individually: as a consequence, codons containing 
CpG are rare in genes subject to methylation. 
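CpG depletion is usually quantified as the ratio of observed CpG dinucleotides to the number expected from the C and G frequencies alone, (CpG count × length)/(C count × G count). A ratio near 1 indicates no depletion, while values well below 1 are typical of genomes subject to CpG methylation, such as those of mammals. A minimal Python sketch over a single strand:

```python
def cpg_obs_exp(seq: str) -> float:
    """Observed/expected CpG ratio: (CpG count * length) / (C count * G count)."""
    seq = seq.upper()
    n = len(seq)
    c, g = seq.count("C"), seq.count("G")
    cpg = sum(1 for i in range(n - 1) if seq[i:i + 2] == "CG")  # overlapping scan for C followed by G
    return cpg * n / (c * g) if c and g else 0.0


print(cpg_obs_exp("CCGG"))      # 1.0
print(cpg_obs_exp("CAGGCTGA"))  # 0.0
```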


Natural Selection 


The effect of natural selection may be superimposed on 
the patterns of codon usage bias caused by mutational 
biases. Alternative synonymous codons are not equiva- 
lent in their translational properties, because of the 
interaction between a codon and its cognate tRNA. 
There are two main reasons. First, for amino acids 
encoded by more than two synonyms there is usually 
more than one tRNA species. These isoaccepting 
tRNAs, with different anticodon sequences for the 
same amino acid, have different abundances in the cell. 
Second, any particular tRNA species often decodes 
more than one codon (typically two, but sometimes 
three or even four), due to ‘wobble.’ The potential 
bonds between one tRNA anticodon and the multiple 
codons it can recognize are not equivalent. Combin- 
ing these two effects, the translationally optimal 
codon for any amino acid is the one best recognized 
by the most abundant tRNA. 


Escherichia coli 

For example, in the gram-negative bacterium Escher- 
ichia coli there are four different species of Leu tRNA 
to decode the six Leu synonyms. The tRNA for CUG 
is very abundant, while that for CUA is very rare. 
The two Phe codons UUU and UUC are decoded by 
a single tRNA species, with anticodon 3′-AAG-5′;
while this anticodon can bind both UUU and UUC, 
it binds more strongly to the latter. Codon usage in 
E. coli reflects these factors: both CUG and UUC are 
strongly preferred over their synonyms (and CUA 
is very rare) in genes expressed at very high levels 
(Table 1), such as those encoding proteins involved 
in translation (ribosomal proteins and translation 
elongation factors), and abundant outer membrane 
proteins. However, genes expressed at low levels ex- 
hibit much less bias, and have codon usage patterns 
largely consistent with the effects of mutational biases. 
The same phenomenon is seen for most of the 18 
amino acids with alternative synonymous codons. 
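The relative synonymous codon usage values in Table 1 follow the definition given in its footnote: the observed count of a codon divided by the count expected if all synonyms for that amino acid were used equally. A minimal Python sketch for one amino acid; the Leu counts are invented for illustration and are not measured E. coli data.

```python
def rscu(counts: dict[str, int]) -> dict[str, float]:
    """Relative synonymous codon usage for the synonymous codons of one amino acid."""
    total = sum(counts.values())
    n = len(counts)  # number of synonyms
    # expected count per codon under equal usage is total / n
    return {codon: x * n / total for codon, x in counts.items()}


# Hypothetical Leu codon counts from a set of highly expressed genes.
leu = {"UUA": 2, "UUG": 3, "CUU": 4, "CUC": 6, "CUA": 1, "CUG": 184}
values = rscu(leu)
print({codon: round(v, 2) for codon, v in values.items()})
```

Values for one amino acid always sum to the number of synonyms (6 for Leu), so a strongly preferred codon such as CUG here (RSCU 5.52) approaches that ceiling, as in the highly expressed E. coli column of Table 1.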


Other Bacteria 

Similar observations have been made for the gram- 
positive bacterium Bacillus subtilis. For some amino 
acids, such as Phe, it is the same codon (UUC) that is 
translationally optimal, but for others, such as Leu, the 
identity of the optimal codon is different (Table 1), 
correlated with a change in the abundance of the re- 
spective tRNAs. Although the abundance of tRNA 
species has not been quantified in other bacteria,
similar observations of strong codon usage bias,
specifically in highly expressed genes, have been made in
a number of other species. Within any species, the 
pattern of codon usage and the abundance of tRNA 
species can be viewed as a highly coadapted system. 

Closely related species, such as E. coli and Salmon- 
ella typhimurium, generally have very similar patterns 
of codon usage because the influences acting on it are similar.
Bacteriophages exploit the translation machinery of their
hosts, and often have similar codon usage patterns to 
their hosts. 

However, selected codon usage bias is not ubiqui- 
tous among bacteria. The human pathogen Helicobac- 
ter pylori does not exhibit preferentially biased codon 
usage in highly expressed genes. Also, in many species 
with extremely biased base compositions, such as 
Mycoplasma and Streptomyces (Table 1), or Borrelia 
which is A+T-rich overall and exhibits a strong skew 
to G+T on the leading strand, there is little evidence 
of differently biased codon usage in highly expressed 
genes. In these species natural selection on codon 
usage has not been effective. 


Codon Bias and Fitness 

Optimal codons are translated faster and with fewer 
errors of misincorporation of an incorrect amino acid 
in the growing polypeptide chain, and so selection 
may act on the speed and/or the accuracy of transla- 
tion. However, the speed of translation is not expected 
to directly enhance the level of expression of any 
particular gene. The rate of production of any one 
protein is expected to be largely determined by the 
rate at which translation of its mRNAs is initiated, 
which in turn is mainly influenced by the rate of 
ribosome binding. So why might faster translation of 
highly expressed genes be adaptive? The answer prob- 
ably lies in the efficiency of protein production when 
considered at the level of the entire cell. During periods 
of rapid growth the rate of overall gene expression 
(and hence growth) is limited by the availability of 
ribosomes. The more rapidly a ribosome moves along 
an mRNA, the sooner it becomes available to translate 
another mRNA. Thus “good” codon usage is adaptive 
because it enables efficient use of ribosomes and maxi- 
mizes growth rate. 

Selection on translational accuracy may also be 
important, influencing the choice of codons for amino 
acids where frequent misincorporation would be detri- 
mental to a protein’s function and thus to the fitness of 
the organism. Other possible influences include selec- 
tion for or against certain DNA sequences, such as 
those with the potential to form secondary structures. 

The selection coefficients (S) involved, i.e., the dif- 
ferences in fitness caused by using one synonym 
rather than another, are expected to be very small, 
and indeed may be the most subtle fitness differences
known. From population genetic theory, it is evident
that selection on synonymous codons can only be effective
in those species where the effective population size
(Ne) is large: more precisely, if the product of Ne and S
is less than 1, natural selection cannot shape codon
usage patterns.
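The dependence on Ne and S can be illustrated with Kimura's diffusion approximation for the probability that a new mutation with selective advantage s (S in the text) is eventually fixed. The sketch below assumes a diploid population whose census size equals Ne and genic (semidominant) selection, so a new mutation starts at frequency 1/(2Ne); it reports the fixation probability relative to the neutral value 1/(2Ne). The exact cutoff quoted for effective selection depends on modeling conventions like these, but the trend is the point: as Ne·s falls well below 1, the ratio approaches 1 and selection becomes nearly invisible.

```python
import math


def fixation_probability(s: float, ne: int) -> float:
    """Kimura's approximation for a new semidominant mutation, initial frequency 1/(2*ne)."""
    if s == 0:
        return 1 / (2 * ne)  # neutral case
    return (1 - math.exp(-2 * s)) / (1 - math.exp(-4 * ne * s))


ne = 10_000
neutral = 1 / (2 * ne)
for ne_s in (0.01, 0.1, 1, 10):
    s = ne_s / ne
    ratio = fixation_probability(s, ne) / neutral
    print(f"Ne*s = {ne_s:>5}: fixation probability relative to neutral = {ratio:.2f}")
```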


Eukaryotes 

There is evidence that natural selection has shaped 
codon usage patterns in a wide range of unicellular 
eukaryotes. The first and best-documented example is 
the budding yeast Saccharomyces cerevisiae (Table 1). 
The phenomenon has also been reported in many 
other unicellular fungi, including Aspergillus nidulans, 
Neurospora crassa, and Schizosaccharomyces pombe, 
the slime mold Dictyostelium discoideum, and in 
parasitic protozoa belonging to diverse phyla, includ- 
ing Giardia lamblia, Plasmodium falciparum, Tricho- 
monas vaginalis, and Entamoeba histolytica. 

The effect of natural selection on codon usage 
in multicellular eukaryotes may be more complex, 
because the tRNA population can vary among tissues 
and at different stages of development. Nevertheless, 
several animals have been found to have codon usage 
patterns shaped by natural selection. These include the 
nematode Caenorhabditis elegans and the insect
Drosophila melanogaster (Table 1). In contrast, natural
selection does not appear to have been effective in 
vertebrate species. This is not unexpected, since for 
example in mammals long-term evolutionary effective 
population sizes have been estimated to be quite small. 

Codon usage in plants has been less extensively 
studied, but there are reports of translationally 
selected codon usage bias in some species. Viruses of 
eukaryotes often have biased codon usage, but it 
appears to be generally due to mutation biases rather 
than the influence of natural selection. 


Applications 


Gene Identification 

Information on the codon usage profile of a species 
can be applied in genome sequencing projects to assess 
whether an open reading frame is indeed likely to be 
a gene. However, particularly in bacteria, mismatched
codon bias may reflect the recent horizontal transfer 
of a gene from a species with different codon bias. In 
species where translational selection is effective it is 
possible to predict whether a gene is likely to be highly 
expressed. 
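One widely used way to make this prediction is the codon adaptation index (CAI) introduced by Sharp and Li. Each codon receives a weight w, its usage in a reference set of highly expressed genes relative to its most used synonym, and a gene's CAI is the geometric mean of w over its codons. A minimal Python sketch, assuming hypothetical reference counts; the pseudocount that keeps unused codons from having w = 0 is a choice of this sketch, and codons missing from the reference table (here ATG and the stop codon) are skipped.

```python
import math


def relative_adaptiveness(reference: dict[str, dict[str, int]],
                          pseudocount: float = 0.5) -> dict[str, float]:
    """w per codon: its count relative to the most used synonym (pseudocount added to both)."""
    w = {}
    for codons in reference.values():
        top = max(codons.values()) + pseudocount
        for codon, count in codons.items():
            w[codon] = (count + pseudocount) / top
    return w


def cai(cds: str, w: dict[str, float]) -> float:
    """Geometric mean of w over the gene's codons; codons absent from w are skipped."""
    logs = [math.log(w[cds[i:i + 3]])
            for i in range(0, len(cds) - 2, 3) if cds[i:i + 3] in w]
    if not logs:
        raise ValueError("no scorable codons")
    return math.exp(sum(logs) / len(logs))


# Hypothetical reference counts from highly expressed genes (illustrative only).
reference = {
    "F": {"TTT": 10, "TTC": 90},
    "L": {"TTA": 1, "TTG": 2, "CTT": 3, "CTC": 4, "CTA": 0, "CTG": 90},
}
w = relative_adaptiveness(reference)
print(round(cai("ATGTTCCTGTTCCTGTAA", w), 3))  # preferred codons only
print(round(cai("ATGTTTCTATTTCTATAA", w), 3))  # rarely used codons
```

Because w is a ratio within each amino acid, using raw counts gives the same weights as using RSCU values; a gene built entirely from preferred codons scores 1.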


Heterologous Gene Expression 
Knowledge of codon bias may have applications in the 
field of biotechnology. Genes are often cloned and 
then inserted into another species for expression. The
codon usage of a heterologous gene is often quite 
different from that of the host genome. Adjusting the 
codon usage of the foreign gene may enhance its 
expression, increasing the amount of protein product 
obtained. This effect has been reported a number of 
times, both in the case of heterologous expression of 
genes for protein production, and in the use of report- 
er genes such as that for jellyfish green fluorescent 
protein (GFP). However, contradictory reports exist. 
Because of the considerations discussed above con- 
cerning the manner in which optimal codon usage 
may be adaptive, it is quite surprising that optimizing 
codon usage can change the expression level of a single 
gene. It is possible that the effect is indirect, for ex- 
ample due to changes in mRNA structure or lon- 
gevity. This area remains controversial and mysterious. 
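The simplest form of such adjustment replaces every codon with the host's most frequently used synonym. The Python sketch below illustrates it with a hypothetical preference table; it ignores mRNA structure, stability, and the other indirect effects mentioned above, so it illustrates the idea rather than offering a recommended design method.

```python
# Hypothetical host preferences: one codon per amino acid, '*' for the stop
# (illustrative values, not measured data for any particular host).
PREFERRED = {"M": "ATG", "F": "TTC", "L": "CTG", "K": "AAA", "E": "GAA", "*": "TAA"}


def back_translate(protein: str, preferred: dict[str, str] = PREFERRED) -> str:
    """Recode a protein using one preferred codon per amino acid, then append a stop codon."""
    return "".join(preferred[aa] for aa in protein) + preferred["*"]


print(back_translate("MFLKE"))  # ATGTTCCTGAAAGAATAA
```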


See also: Codons; Universal Genetic Code 


Codons 


I Schildkraut 


Copyright © 2001 Academic Press 
doi: 10.1006/rwgn.2001.0238


A codon is one of 64 possible triplets of the four 
nucleic acid bases. The triplet of bases in DNA 
encodes an amino acid. After transcription of the 
gene into RNA, the triplets are represented by
the four bases of RNA (thymine being replaced by
uracil). For example, CCC encodes the amino acid
proline and GAA encodes the amino acid glutamate. 
There are as many as six triplets each encoding the 
amino acids arginine, leucine, or serine and only 
one codon each encoding methionine or tryptophan. 
Three of the 64 possible codons (TAA, TAG, and 
TGA) each encode a signal for the ribosome to ‘stop’ 
or terminate the protein being translated. 
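The examples above can be reproduced with a small lookup table. This Python sketch builds the standard code from the 64-letter string of NCBI translation table 1 and translates a coding-strand sequence until the first stop codon; the function name translate and the example sequence are hypothetical.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code (NCBI translation table 1); '*' marks stop codons.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(dna: str) -> str:
    """Translate codon by codon, stopping at the first stop codon."""
    dna = dna.upper().replace("U", "T")  # accept RNA as well as DNA
    protein = []
    for i in range(0, len(dna) - 2, 3):
        aa = CODE[dna[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


print(CODE["CCC"], CODE["GAA"])          # P E
print(translate("ATGCCCGAATGGTAAGGG"))   # MPEW
```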


See also: Genetic Code 


Codons, Invariable 
Codons are three successive nucleotides that specify
an amino acid in some protein. No codon is absolutely 
invariable because a mutation can occur anywhere. 


Figure 1 Observed categories, varied (has varied) and unvaried (hasn't varied), contrasted with inferred categories, variable (can vary) and invariable (can't vary).


However, some amino acids in a particular position
in a protein sequence may have a function so specific 
and so vital that any change in that amino acid is 
so deleterious to that protein’s function, and hence 
to the organism in which it resides, that selective 
forces will surely remove that mutation from the 
gene pool. Thus, when one examines lots of sequences 
from different organisms, one does not see any vari- 
ation in that particular position. Such a position is 
said to be invariable and the observation of its being 
unvaried is a clue to its potential functional import- 
ance. But a position may have no variants, not 
because that amino acid is so functionally important, 
but because that position has, by chance, not received 
one of the allowable variants in that position, or the 
organism that has the variant present was not sampled. 
It is variable but unvaried. Generalized to all the pos- 
itions in the sequence, the unvaried positions (A + B 
in Figure |) comprise two mutually exclusive kinds of 
positions, the invariable (A) and the variable-but- 
unvaried (B) positions. Similarly, the positions that 
are unchanged (B + C) also comprise two mutually 
exclusive kinds of positions, the unvaried (C) and the 
variable-but-unvaried positions (B). 

We can count the varied and the unvaried positions
by inspection; varied and unvaried are observations.
We cannot know the number of variable and invariable
positions without making additional assumptions;
variable and invariable are inferences. Each pair,
varied plus unvaried and variable plus invariable,
adds up to the total number of positions. The
distinction is vital in genetic analyses.
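Because varied and unvaried are observations, they can be counted directly from an alignment, as in this minimal sketch. The function name `classify_positions` and the three aligned fragments are hypothetical, and gaps and alignment quality are ignored. Sorting the unvaried positions into invariable and variable-but-unvaried ones would require the additional assumptions mentioned above, so the code does not attempt it.

```python
def classify_positions(aligned):
    """Split alignment columns into varied and unvaried positions (observations only)."""
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("sequences must be aligned to the same length")
    varied, unvaried = [], []
    for i in range(length):
        residues = {seq[i] for seq in aligned}  # distinct residues seen in column i
        if len(residues) > 1:
            varied.append(i)
        else:
            unvaried.append(i)
    return varied, unvaried


# Hypothetical aligned protein fragments from three organisms.
alignment = ["MKTAYIAK", "MKSAYIAR", "MKTAFIAK"]
varied, unvaried = classify_positions(alignment)
print(varied)    # [2, 4, 7]
print(unvaried)  # [0, 1, 3, 5, 6]
# Whether any unvaried column is invariable (A) or merely
# variable-but-unvaried (B) cannot be read off these data.
```

Note that the two counts always sum to the alignment length, matching the statement that varied plus unvaried add up to the total.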

The word invariant is avoided as it does not distin- 
guish between the two forms, that which is impossible 
(invariable) and that which has not varied (unvaried), 
and thus introduces ambiguity as to the author’s 
intent. 

These distinctions are important because it is now 
clear that the positions that are variable in a protein 
from mammals may differ from those that are variable 
in the same homologous protein in, say, plants. This 
result led to the concept of covarions (concomitantly
variable codons).


See also: Codon Usage Bias; Codons; Covarion 
Model of Molecular Evolution 
Concepts of Coevolution


Although the term ‘coevolution’ occasionally refers to 
the joint evolution of different features of a single 
species (e.g., enzymes in a pathway, sex-specific fea- 
tures used in sexual interactions), it usually (and in 
this article) refers to the joint evolution of two or more 
species or genomes, owing to interactions between 
them. (In some instances, as in the joint evolution of 
nuclear and cytoplasmic genomes, this distinction 
may fail.) Most study of coevolution has focused on 
interspecific competition, mutualism, and interactions 
between predators and prey, herbivores and plants, 
and parasites and hosts (together referred to here as 
‘consumers’ and ‘victims’), as well as a few interac- 
tions, such as mimicry, that do not conveniently fit 
into these major categories. 

In classifications of coevolutionary processes 
(Thompson, 1994), one important distinction de- 
scribes evolution within lineages versus branching of 
lineages. One possible form of coevolution is co- 
speciation, the coordinated branching (speciation) of 
interacting species (such as a host and parasite). 
This may have occurred, for example, in figs (Ficus) 
and the small wasps (Agaonidae) that pollinate and 
develop in fig flowers. Each fig species is pollinated 
by a single host-specific wasp, and related wasps 
appear to be associated with related figs. A series of 
cospeciation events may produce concordant phylo- 
genies of two groups of interacting species. Such 
concordance is the exception rather than the rule, 
but has been described for the lice of pocket gophers, 
the bacterial endosymbionts of aphids, and a few other 
instances. In most such cases, there is little opportunity 
for transmission of the symbiont between different 
species of hosts. Concordance of host and symbiont 
phylogenies implies a longer history of association, 
and of opportunity for reciprocal adaptation, than 
when symbionts frequently have switched from one 
host to another. Phylogenetic studies have revealed 
that considerable switching among hosts has occurred 
in most groups of herbivorous insects, some symbiotic 
bacteria, and various parasite groups. 

Coevolution within two or more interacting line- 
ages consists of genetic changes in the characteristics 
of each, due to natural selection imposed by each on 
the other. That is, it consists of adaptation of lineages 
to each other. Such changes are referred to as specific 
or pairwise coevolution if the evolutionary responses 
of two species to each other have no impact on their
interactions with other species. Diffuse or guild co- 
evolution occurs when the genetic change in at least one 
species affects its interaction with two or more other 
species. For example, cucumber genotypes with high 
levels of cucurbitacin have enhanced resistance to 
mites, but also enhanced attractiveness to cucumber 
beetles — an instance of a negative genetic correlation 
in resistance. Early-season attack by flea beetles makes 
sumac plants more susceptible to stem-boring ceram- 
bycid beetles, so resistance to the former would also 
reduce the impact of the latter. 


Methods of Studying Coevolution 


The methods of studying coevolution correspond 
to those of studying evolution generally. Long-term, 
macroevolutionary patterns of coevolution, for ex- 
ample, are analyzed by paleobiologists and by the use 
of phylogenetic studies of extant species. Phylogenies 
can indicate whether related parasites have speciated 
and diverged in concert with their hosts or have 
shifted among host species, in a process analogous 
to colonizing new geographic areas. They can indi- 
cate whether or not a repeatedly evolved feature 
that affects ecological interactions, such as a defense 
against parasites, has been consistently associated with 
increased diversification of species. By plotting char- 
acters such as defenses on a phylogeny based on other 
data, patterns in the evolution of such features can be 
discerned. Phylogenetic information can be important 
for demonstrating that a character is an adaptation 
for an ecological interaction, rather than a widely 
shared primitive character that happens to confer a 
benefit in a novel context. 

Mathematical and computer models of processes 
of coevolution within populations and species play 
an important role in studying coevolution. Such 
models are based on population genetics, quantitative 
genetics, or optimality theory. Some models couple 
genetic dynamics to population dynamics, based on 
assumptions about how the outcomes of interactions 
between individuals with specified phenotypes will 
affect demography. As more such realism is intro- 
duced, the dynamics and possible outcomes are often 
found to become increasingly complex, and depend- 
ent on initial conditions. 

Many empirical studies document changes in fea- 
tures that mediate ecological interactions (e.g., size of 
bills, teeth, or other trophic structures) by comparing 
features of related species or conspecific populations, 
or by characterizing rapid changes in populations that 
have been moved to new regions by humans and have 
engaged in new interactions. Some comparisons test 
predictions from coevolutionary models; others test 
important assumptions of the models, such as the
importance of ‘costs’ of adaptation. Finally, coevo- 
lution can be studied directly in model systems in the 
laboratory, such as rapidly evolving populations of 
bacteria and bacteriophage. 


Coevolution of Competing Species 


Darwin argued that competition is an important 
agent of natural selection for adaptation to different 
habitats or resources by different species. Indeed, a 
common theme in community ecology is that coexist- 
ing species differ in food or other components of their 
ecological niches, and that such differences are ordin- 
arily necessary for species to coexist in the long term. 

Quantitative genetic models of the evolution of 
competitors assume that a heritable, continuously 
varying trait, such as an animal’s body size or mouth 
size, determines the mean and variance of resources 
(e.g., size of prey) consumed (Taper and Case, 1992). 
For instance, both within and among species of 
finches, the depth of the bill is correlated with the 
size and/or hardness of seeds that are most effectively 
handled and most frequently consumed. Such correl- 
ations support the assumption that for mechanical or 
other reasons, each phenotype performs better on a 
specific, optimal, resource than on other resources, 
and that a generalized phenotype handles a given 
resource less effectively than a phenotype that is 
specialized for that resource (i.e., “a jack of all trades 
is master of none”). The models may assume that 
overlap in resource use, and thus the intensity of
competition, is proportional to the similarity of two
phenotypes, whether of the same or different species.

In these models, a solitary species evolves to the 
mean phenotype (e.g., bill depth) that enables use of 
the most abundant resources, and intraspecific com- 
petition imposes frequency-dependent selection that 
can maintain variation, so that the population consists 
of a variety of phenotypes, each more or less special- 
ized on a different set of resources. If two genetically 
variable species overlap in resource use, then similar 
phenotypes in both species have lower fitness, due to 
the burden of competition both within and between 
species, than phenotypes that suffer only intraspecific 
competition. Hence the means of the two species 
diverge, and the overlap between the two phenotype 
distributions becomes lower. At equilibrium, how- 
ever, some overlap remains, and this residual compe- 
tition between the species may reduce the genetic and 
phenotypic variation within each species. If more than 
two species compete, they may evolve to be spaced out 
along the spectrum of resources. 
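The divergence argument can be illustrated with a minimal selection-gradient sketch. Everything here is an assumption for illustration, not the Taper and Case (1992) model: each species’ mean trait feels stabilizing selection toward a shared resource optimum `theta` plus a competition penalty that falls off as a Gaussian in the difference between the two species’ means, and the hypothetical parameter `rate` stands in for the heritable variance that scales each generation’s response to selection.

```python
import math


def selection_gradient(z, z_other, theta, a, c, sigma):
    """Slope of an assumed fitness function at mean trait z.

    Fitness = -a/2 * (z - theta)**2 - c * exp(-(z - z_other)**2 / (2 * sigma**2)):
    stabilizing selection toward theta, minus competition with the other species.
    """
    d = z - z_other
    stabilizing = -a * (z - theta)
    competition = c * d / sigma**2 * math.exp(-d**2 / (2 * sigma**2))
    return stabilizing + competition


def coevolve_means(z1, z2, theta=0.0, a=1.0, c=1.0, sigma=1.0,
                   rate=0.05, generations=2000):
    """Update both species' means simultaneously along their selection gradients."""
    for _ in range(generations):
        g1 = selection_gradient(z1, z2, theta, a, c, sigma)
        g2 = selection_gradient(z2, z1, theta, a, c, sigma)
        z1, z2 = z1 + rate * g1, z2 + rate * g2
    return z1, z2


z1, z2 = coevolve_means(-0.1, 0.1)
print(round(z1, 3), round(z2, 3))  # -0.589 0.589
```

Started slightly apart on either side of the optimum, the two means diverge but settle only about 1.18 sigma apart, so some overlap, and hence some residual competition, remains; with the competition term switched off (`c=0.0`), both means simply converge on the optimum.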

Divergence, resulting in stable coexistence and 
collective use of a wide variety of resources, is not
inevitable. According to the genetic models, if two
species are initially both close to the optimal pheno- 
type of a solitary population, they will converge 
toward it. The result may be extinction of one, by 
competitive exclusion. Similarly, if competitive effects 
are asymmetrical, as when larger individuals have a 
greater impact on smaller ones than vice versa, the 
species may change in parallel, with one ultimately 
extinguishing the other. Thus coevolution need not 
result in stable coexistence. 

As this theory predicts, closely related sympatric 
species of Darwin’s finches, woodpeckers, and some 
other animals each use a narrower variety of food 
types or microhabitats than do species that occur 
singly on islands. Evidence for evolutionary response 
to competition is provided by some instances of 
character displacement — a greater difference between 
two species where they occur together than where 
each occurs alone. Some lakes left by retreating 
glaciers in northwestern North America are inhabited 
by a single species of stickleback fish (Gasterosteus 
aculeatus complex), which feeds both near the bottom 
and in open water. In other lakes, two coexisting spe- 
cies have evolved. Relative to the solitary form, the 
coexisting species have diverged and specialized in 
morphology and behavior: one feeds on benthic prey 
and the other on plankton. In an experiment under 
seminatural conditions, competition among similar
phenotypes reduced the growth of juveniles more than
competition among dissimilar phenotypes did.

Coevolution explains some patterns in the com- 
munity structure of coexisting species. For example, 
differences in body size among sympatric pairs or 
triplets of species of bird-eating hawks (Accipiter), 
which are correlated with differences in the average 
size of their prey, are greater than expected if pairs or triplets
of species had been assembled at random from the 
47 species of Accipiter in the world. In a remarkable 
example of coevolutionary consistency, ecologically 
and morphologically equivalent species of Anolis 
lizards have evolved independently on each of the 
four islands of the Greater Antilles. Each island, for 
example, has a ‘crown giant’ that inhabits the canopy 
and a small, short-legged, long-tailed species that 
occupies slender twigs. 


Coevolution of Consumers and Victims 


Under this heading are included predators and prey, 
herbivores and plants, and parasites (including fungi, 
bacteria, and viruses) and their hosts. Naively, one 
might expect an evolutionary ‘arms race,’ whereby 
the victim evolves ever greater resistance, defense, or 
evasion, and the consumer evolves ever greater
proficiency in finding and attacking the victim. However,
the coevolutionary dynamics may be much more
complex than this, due to factors such as (1) costs of 
adaptation, (2) diffuse versus pairwise coevolution, 
and (3) selection at multiple levels, as well as the 
biological details of particular interactions. 


Costs of Adaptation 

There is little evidence that evolutionary ‘arms races’ 
continue indefinitely. A major factor tending to estab- 
lish evolutionary equilibrium is the cost of adaptation. 
Considerable evidence supports the assumption that 
greater elaboration of a defensive or offensive feature 
imposes a cost that may be outweighed by the benefit 
it provides in the presence of the interaction. Such 
costs may be due to the character’s interfering with 
another function, or simply to the energy required 
to develop it. They may also be due to correlated 
effects on the organism’s interaction with other spe- 
cies. For example, if a host’s resistance to parasite A 
is correlated with susceptibility to parasite B, then 
resistance carries a ‘cost,’ and selection will vary in 
time and space, depending on the relative abundance 
of the two parasites. Similar considerations apply to 
other characters, such as the ability of a predator to 
handle different prey species. 


Pairwise and Diffuse Coevolution 

Adaptations for finding and attacking prey or hosts, 
or for escaping or resisting attack, account for much 
of the adaptive diversity of organisms. In many cases, 
the features of one or both of two interacting species 
have evolved by diffuse, rather than pairwise, coevolu- 
tion. For instance, furanocoumarin compounds in 
members of the parsnip family deter many herbivorous 
insects, rough-skinned newts (Taricha) produce a 
highly toxic alkaloid that most predators cannot 
tolerate, and vertebrates have evolved an extraordin- 
arily complex immune system for defense against a 
wide variety of microbial and other invaders. Certain 
insects, such as parsnip webworms, can detoxify the 
parsnip toxins; a newt-eating population of garter 
snake is resistant to the newt’s alkaloid; trypanosomes 
and the gonorrhea bacterium are among the parasites 
that evade the immune system by rapidly changing 
their surface proteins. The webworm and snake, and 
perhaps the parasites, have adapted to the defense 
mechanism of a specific species of victim, but the 
victim’s defensive character, in each instance, confers 
resistance against a wide variety of potential con- 
sumers, and probably did not evolve due to selection 
by the particular consumer that has become adapted 
to it. In fact, these defenses are characteristic of the 
higher taxa to which the individual species belong, and 
evolved in an ancestor of these taxa, long before the 
specific interaction cited came into being. 
An instance of pairwise coevolution is afforded
by the common cuckoo (Cuculus canorus) in Eurasia, 
which lays eggs in the nests of other species. The 
young cuckoo is fed by the host birds, and usually kills 
the host’s offspring. Some female cuckoos specialize 
on a particular species of host, and lay mimetic eggs 
that in color pattern resemble the host’s eggs. The 
mimetic patterns are clearly adaptations to the defen- 
sive behavior of certain hosts, which remove recog- 
nizably foreign eggs from the nest. Host species 
whose eggs are mimicked by cuckoos reject eggs 
more often than species whose eggs are not mimicked. 
Moreover, two species of birds accepted artificial 
cuckoo eggs in Iceland, where cuckoos do not occur, 
but rejected them in England, where these species are 
favored hosts of cuckoos. These observations provide 
evidence that the hosts have evolved rejection behav- 
ior in response to parasitism by cuckoos. 


Levels of Selection 

The Darwinian fitness of a genotype of predator or 
parasite is measured by the average reproductive suc- 
cess of an individual of that genotype. Often, repro- 
ductive success is enhanced by consuming more prey, 
or extracting more resource from a host and thereby 
reducing its chance of survival. (The degree of damage 
a parasite inflicts on its host is referred to as its viru- 
lence.) Hence evolution of the predator or parasite by 
individual selection may result in such high profi- 
ciency or virulence that the prey or host population 
is extinguished. Extinction of prey populations does 
not alter the relative fitnesses of individual predator 
genotypes, and so does not select for reduced viru- 
lence or predatory proficiency within the population 
of consumers. However, kin selection or group selec- 
tion may favor lower virulence or proficiency. If 
populations of more proficient predators or virulent 
parasites suffer higher extinction rates than popula- 
tions of less proficient consumers, the species as a 
whole might evolve lower proficiency. Individual 
selection is likely to be stronger than group selection 
in predator evolution, but the population structure of 
some parasites may provide an opportunity for group 
selection to affect their evolution. 


Models of Consumer-Victim Coevolution 

Quantitative genetic models of pairwise coevolution 
between consumers and victims describe change in 
a character in each species, such as fleetness of an 
ungulate and a carnivore. The more complete models 
include equations for change in both the character and 
the population density of each species. Population 
densities and character values affect each other. For 
example, an increase in a character that enables a
predator to capture its prey may increase the predator’s
population density, which may have the effect of
lowering the density of the prey and increasing selec- 
tion for improved defense. 

The population dynamics and the course of char- 
acter evolution depend on many parameters, and are 
often sensitive to starting conditions. An indefinitely 
extended ‘arms race’ or escalation of the two species’ 
characters is unlikely, since the cost of a sufficiently 
elaborated character eventually exceeds its benefit. 
Rather, both the defense character of the prey and 
the ‘offense’ character of the predator may evolve 
to an intermediate stable state. Perhaps counter- 
intuitively, either of these may actually evolve to a 
lower value than it started with; for instance, a prey 
species may evolve a lower level of defense if it is so 
well defended that the predator becomes rare, and 
thus becomes a weaker agent of selection than the 
energetic cost of defense. In many models, both the 
population densities and the character means of both 
species may change indefinitely, either in stable limit 
cycles or chaotically. Theoretically, coevolutionary 
changes can cause rapid fluctuations in population 
density, and may result in extinction. Coevolution 
need not lead to stable coexistence. 

Similar fluctuations in allele frequencies are found 
in some models of ‘gene-for-gene’ interactions, which
have been described for certain plants and their fungal 
or insect parasites. Resistance in the host is conferred 
by dominant alleles at several loci. For each resistance 
allele, a corresponding, usually recessive ‘virulence’ 
allele enables the parasite to overcome resistance. As 
expected from these models, local populations of a 
wild flax and its associated rust vary greatly in geno- 
type frequencies. If, however, the alleles for host resist- 
ance and parasite ‘virulence’ do not have substantial 
costs, the host may become fixed for resistance alleles 
at all loci, the parasite may become fixed for ‘viru- 
lence’ alleles, and further evolution depends on the 
origin of new mutations. Something resembling this 
scenario has occurred in the history of wheat and 
Hessian fly (Mayetiola destructor). Repeatedly, wide- 
spread planting of wheat varieties with additional 
resistance alleles has been followed by the spread of 
fly ‘biotypes’ with recessive alleles that overcome the 
crop’s resistance. 
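The allele-frequency fluctuations described above can be illustrated with a deliberately simplified sketch: a hypothetical haploid, one-locus model (not one from the studies cited here) with a host resistance allele R that costs `cost_r` and blocks avirulent parasites, and a parasite ‘virulence’ allele V that costs `cost_v` and overcomes R. It ignores dominance and the multiple loci of real gene-for-gene systems, and the parameter values are arbitrary.

```python
def gfg_step(p, q, s, cost_r, cost_v):
    """Advance a hypothetical haploid, one-locus gene-for-gene model by one generation.

    p: frequency of the host resistance allele R
    q: frequency of the parasite virulence allele V
    s: fitness lost by a host when it is infected
    """
    # Resistant hosts pay cost_r and are infected only by virulent parasites.
    w_host_resistant = (1 - cost_r) * (1 - s * q)
    # Susceptible hosts are infected by every parasite genotype.
    w_host_susceptible = 1 - s
    # Virulent parasites pay cost_v but can infect any host.
    w_parasite_virulent = 1 - cost_v
    # Avirulent parasites can infect only susceptible hosts.
    w_parasite_avirulent = 1 - p
    p_next = p * w_host_resistant / (
        p * w_host_resistant + (1 - p) * w_host_susceptible)
    q_next = q * w_parasite_virulent / (
        q * w_parasite_virulent + (1 - q) * w_parasite_avirulent)
    return p_next, q_next


def simulate(p, q, generations, s=0.3, cost_r=0.15, cost_v=0.3):
    """Return the list of (p, q) pairs, starting values included."""
    history = [(p, q)]
    for _ in range(generations):
        p, q = gfg_step(p, q, s, cost_r, cost_v)
        history.append((p, q))
    return history


# With costs: the virulence allele both rises and falls as resistance cycles.
costly = simulate(0.5, 0.5, 100)
q_values = [q for _, q in costly]
steps = list(zip(q_values, q_values[1:]))
print(any(b > a for a, b in steps), any(b < a for a, b in steps))  # True True

# Without costs: neither allele declines, and V sweeps toward fixation.
free = simulate(0.1, 0.1, 100, cost_r=0.0, cost_v=0.0)
print(round(free[-1][1], 3))
```

In the cost-free run, once V is essentially fixed, R no longer affects host fitness in this one-locus version, so its frequency stalls rather than necessarily reaching fixation; further change would then depend on new mutations, as the text notes.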

Continual, perhaps cyclic, evolution in response 
to changes in an antagonistic species has been dubbed 
‘Red Queen coevolution’ after the character in Lewis 
Carroll’s Through the Looking-Glass who explained 
to Alice that it was necessary to run as fast as possible 
just to stay in the same place. It has been suggested that 
Red Queen coevolution of antagonists selects for 
recombination and sexual reproduction, and, indeed, 
sexual genotypes have been found to carry lower
parasite loads than asexual genotypes in species of fishes
and geckos. One theory of sexual selection holds
that Red Queen coevolution with parasites selects, in 
host species, for female choice of males with display 
characters that indicate resistance to infection. 


Evolution of Virulence and Avirulence 

The fitness of a parasite genotype may be measured, 
approximately, by the proportion of potential hosts 
it infects, compared with other genotypes. Often, 
the rate of transmission to new hosts is proportional 
to the parasite’s reproductive rate, which in turn 
often (though not always) determines the parasite’s 
virulence to the host. For example, the probability 
that progeny of a virus are transmitted by a mosquito 
is a function of the density of viral particles in the 
host’s blood. However, the probability of transmis- 
sion is reduced if the host dies too soon, i.e., if the 
parasites die before transmission. Hence an equi- 
librium level of virulence is likely to evolve. The 
equilibrium value is affected by several factors. If 
transmission is ‘vertical,’ i.e., only to the offspring of 
infected individuals, then parasite fitness is propor- 
tional to the number of surviving host offspring, and 
selection favors benign, relatively avirulent parasite 
genotypes. If transmission is ‘horizontal,’ i.e., among 
hosts of the same generation, the equilibrium level of 
virulence is likely to be higher, because (1) an individ- 
ual parasite’s fitness does not depend on successful 
reproduction of its individual host, and (2) the likeli- 
hood is higher that an individual host will be infected 
by multiple parasite genotypes. Then competition 
among genotypes for transmission to new hosts favors 
a higher reproductive rate. Moreover, a high parasite 
reproductive rate, and therefore higher virulence, are 
favored if the host has a short life span due to external 
factors, if the parasites are likely to be extinguished 
before transmission (e.g., by antibiotic treatment), or 
if the opportunity for infection of new hosts rapidly 
increases, as during disease epidemics. 
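The equilibrium argument can be made concrete with a widely used textbook formalization of the transmission-virulence trade-off (an assumption, not this article’s own model): a strain’s basic reproductive number is R0 = beta(alpha) / (mu + alpha + gamma), where alpha is virulence (added host mortality), mu is background host mortality, gamma is the recovery rate, and transmission beta is assumed to rise with diminishing returns, here beta = sqrt(alpha). The sketch finds the virulence that maximizes R0 by grid search; the function names are hypothetical.

```python
import math


def r0(alpha, mu, gamma, c=1.0):
    """Basic reproductive number of a parasite strain under an assumed trade-off.

    alpha: virulence (extra host death rate caused by infection)
    mu: background host death rate; gamma: host recovery rate
    Transmission is assumed to rise with diminishing returns: beta = c * sqrt(alpha).
    """
    beta = c * math.sqrt(alpha)
    # An infection ends when the host dies (mu + alpha) or recovers (gamma).
    return beta / (mu + alpha + gamma)


def optimal_virulence(mu, gamma, step=0.001, max_alpha=10.0):
    """Grid-search the virulence that maximizes r0."""
    n = int(round(max_alpha / step))
    candidates = [i * step for i in range(1, n + 1)]
    return max(candidates, key=lambda a: r0(a, mu, gamma))


# Long-lived hosts (low mu) versus hosts with high external mortality (high mu).
print(round(optimal_virulence(mu=0.2, gamma=0.3), 3))  # 0.5
print(round(optimal_virulence(mu=1.0, gamma=0.3), 3))  # 1.3
```

Under these assumptions the optimum is alpha = mu + gamma, so a shorter host life span from external causes (larger mu) favors higher virulence, consistent with the text. The sketch does not model vertical transmission or infection by multiple genotypes.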

Thus, the popular idea that parasites evolve to be 
harmless to their hosts holds true under some circum- 
stances, but not others. As theory predicts, geographic 
populations of a horizontally transmitted protozoan 
parasite of Daphnia (water flea) are better adapted 
to their local host population than to hosts from 
other regions: they reproduce at higher rates, but 
cause higher mortality and reduce host reproduction. 
Among species of nematodes that parasitize fig 
wasps, those that are mostly horizontally transmitted 
cause a greater reduction of their hosts’ fitness than 
those that are vertically transmitted. In experimental 
cultures of bacteria, decreased virulence of a bacterio- 
phage evolved under vertical, compared to hori- 
zontal, transmission. It has been suggested that 
pathogenic strains of normally innocuous bacteria
may become prevalent in hospitals because intensive
antibiotic treatment and/or heightened opportunity 
for transmission favor the evolution of more rapid 
reproduction. 


Evolution of Mutualism 


In mutualistic interactions between species, each 
species uses the other as a resource. That is, each 
exploits the other, and the balance between
exploitation and overexploitation (i.e., parasitism
or predation) may be a delicate one. Mutualisms include
interactions both between free-living organisms, such 
as plants and pollinating animals, and between sym- 
bionts, one of which spends most of the life cycle on or 
in the other. Microbes are partners in many symbiotic 
mutualisms. Mutualists often have adaptations for 
encouraging the interaction or even nurturing the 
associate, such as foliar nectaries in plants, which 
attract ants that defend the plants against herbivores, 
or the root nodules of legumes, which house and 
nourish nitrogen-fixing rhizobial bacteria. Some sym- 
bioses are so intimate that the symbiont functions as 
an organ or organelle, as in the case of host-specific 
bacteria that reside within special cells in aphids 
and supply essential amino acids to their host. Many, 
though by no means all, mutualisms evolved from 
parasitic interactions. 

For each mutualist, the interaction has both a bene- 
fit and a cost. Legumes, for example, obtain nitrogen 
from rhizobia, but expend energy and materials on the 
symbionts. Excessive growth of the rhizobia would 
reduce the plant’s growth to the point of diminishing 
its fitness. Likewise, excessive proliferation of mito- 
chondria or plastids, which originated as symbiotic 
bacteria, would reduce the fitness of the eukaryotic 
cell or organism that carries them. Thus, selection will 
always favor protective mechanisms to prevent over- 
exploitation by an organism’s mutualist. Whether or 
not selection on a mutualist favors restraint depends 
on how much an individual’s fitness depends on the 
fitness of its individual host. When a mutualist can 
readily move from one host to another, as pollinating 
insects can from plant to plant, it does not suffer from 
the reproductive failure of any one host, and selfish- 
ness or overexploitation may be favored. Indeed, 
many pollinating insects ‘cheat.’ The larvae of yucca 
moths (Tegeticula) feed on developing yucca seeds in 
flowers that their mothers actively pollinated. How- 
ever, several species of Tegeticula have independently 
lost the pollinating behavior, having evolved the habit 
of ovipositing in flowers that other species have 
already pollinated. Moreover, the pollinating species 
lay only a few eggs in each flower, so that the few
larvae do not consume all the developing seeds. This
reproductive restraint has evolved in response to a 
defensive tactic of the plant, which aborts developing 
fruits that contain more than a few eggs. However, 
the ‘cheater’ species of Tegeticula circumvent the 
plant’s defense by laying eggs after the developmental 
window for fruit abortion, and they lay so many eggs 
that the larvae consume most or all of the seeds. 

Vertical transmission of a symbiont favors restraint, 
just as it favors lower virulence in parasites, because 
the fitness of the individual symbiont is then propor- 
tional to its host’s reproductive success. This principle 
can explain why internal symbionts such as aphids’ 
bacteria or corals’ zooxanthellae (or eukaryotes’ mito- 
chondria) divide at rates commensurate with their 
host’s growth. By the same token, it has been sug- 
gested that hosts may evolve mechanisms to prevent 
horizontal transmission (mixing) of symbionts, and 
thus maintain conditions under which ‘selfishness’ 
would be disadvantageous to the symbiont. By exten- 
sion, such principles explain the conditions for the 
evolution of coordination versus conflict among dif- 
ferent genes, i.e., the evolution and maintenance of 
integrated organisms. 


Consequences of Coevolution 


We are only beginning to understand the effects that 
coevolution has had on the history and diversity of 
life. Clearly many of the adaptive differences among 
organisms — the many thousands of toxic defensive 
compounds in different plants, insects, and fungi, the 
many forms of flowers, the diverse growth forms of 
plants, the sometimes astonishingly specialized diets of 
animals — have issued from interactions among species. 
The numbers of species, too, may have been augmented 
by coevolution. According to a hypothesis advanced 
by P.R. Ehrlich and P.H. Raven, plant lineages that 
evolved novel chemical defenses against herbivores 
became free to diversify, and gave rise to diverse taxa 
to which insect lineages subsequently adapted and 
diversified in turn. As this hypothesis predicts, plant 
lineages with resin- or latex-bearing canals, which are 
known to deter insect attack, consistently have more 
species than their equally old sister lineages that lack 
these novel defenses. Coevolution among competitors 
can also augment the species diversity in communities,
producing suites of specialized species that finely par- 
tition resources among them. In theory, such coevo- 
lution may result in ecosystem-level effects such as 
higher productivity and resource consumption, but 
the evidence on this subject is very sparse. Coevolution 
may cause rapid changes in the properties, and there- 
fore the population dynamics, of species. It can 
have far-ranging evolutionary consequences, such as 
affecting the evolution of sexual reproduction. A kind
of coevolution occurs when the viruses and bacteria 
that attack the human body and the insects and fungi 
that attack our crops rapidly evolve defenses against 
new antibiotics, pesticides, or genetically altered crops. 
Such coevolution affects us all. 
Cognate tRNAs are those recognized by a particular
aminoacyl-tRNA synthetase. 


See also: Transfer RNA (tRNA) 
Cohesive ends are complementary single-stranded
DNA segments that extend from the ends of linear 
double-stranded DNA molecules. Cohesive ends were 
discovered in 1963 by Hans Ris and A.D. Hershey and 
coworkers, who found cohesive ends in the DNA 
chromosome of lambda, a virus that infects the bacter- 
ium Escherichia coli. Bacterial viruses such as lambda 
are also referred to as bacteriophages or simply phages. 
DNA purified from lambda virus particles is a linear
double-stranded (ds) piece of DNA, 48 502 base pairs
long. Ris, by examining purified lambda DNA in the 
electron microscope, found that upon storage, the 
lambda DNA molecules formed rings. Subsequent work
showed that ring formation was reversible, and DNA
chemistry studies then showed that the rings formed
because of single-stranded DNA segments at the 5’
ends of the DNA strands. These cohesive ends are 12
nucleotides long and complementary to each other,
so that ring formation can occur at low DNA
concentrations. At high concentrations,
lambda DNA molecules form linear and circular 
multimers, called concatemers. Lambda is a member 
of a particular group of medium-sized bacteriophages, 
the tailed dsDNA phages. The tailed dsDNA phages 
have an icosahedral protein shell, the capsid, with a 
protruding protein appendage, or tail, that is used to 
attach to a bacterial cell. The tail serves as a conduit for 
transport of the DNA out of the capsid and into the cell 
to start an infection. About 90% of all phages are tailed 
dsDNA viruses. Among the tailed dsDNA phages, the 
DNA molecules of about 75% have cohesive ends. 
In a cell, the ends of linear DNA molecules are 
subject to digestion by nucleases involved in DNA 
recombination, degradation, and other types of DNA 
processing. This digestion could potentially degrade an 
incoming viral chromosome, thus preventing a success- 
ful infection. Viruses with linear DNA chromosomes 
have a variety of strategies for protecting DNA ends 
from nuclease attack. For example, the DNAs of some 
tailed dsDNA viruses carry protective proteins that 
block nucleases from attacking the ends. When lambda 
DNA is injected into an E. coli host cell, the cohesive 
ends anneal, cyclizing the chromosome. The site of the 
annealed cohesive ends is called cos, for cohesive end 
site. The strand ‘nicks’ (interruptions in the continuity 
of the phosphodiester backbone of the DNA) at the site 
of the annealed cohesive ends are sealed by a host pro- 
tein, DNA ligase. Next, several rounds of replication 
produce progeny rings; during this early period of repli- 
cation, lambda also produces a protein known as Gam 
that inactivates one of the major nucleases of the cell, the 
RecBCD nuclease. At late times, circular lambda DNA is 
replicated by a rolling-circle mechanism, creating long 
linear multimers of lambda DNA. During the assembly 
of progeny virions, a viral enzyme, called terminase, in- 
troduces staggered nicks at the cos sites of the concate- 
mers, thus generating cohesive ends as the viral DNA 
molecules are being packaged into the protein shells. 
The sequences of the lambda cohesive ends, slightly 
separated, are shown in Figure 1. The boxes indicate 
base pairs that have twofold rotational symmetry, i.e., 
some of the bases on the top and bottom strands, 
read 5’ to 3’, are identical to each other. The center of 
symmetry is indicated by the dot. Note that the nick 
positions are symmetric, and that two base pairs outside the 
nick positions also show symmetry. This twofold rotational 
symmetry reflects the fact that the terminase 
enzyme is a dimer, with each of the two catalytic 
subunits nicking one strand. 

Figure 1 Cohesive ends in phage lambda. 

The nicking sites of 
restriction enzymes often show twofold rotational 
symmetry, and structural studies of some of these 
restriction enzymes have demonstrated that the 
enzymes are indeed dimeric. Many restriction 
enzymes also introduce staggered nicks, rather than 
cutting the DNA to produce blunt ends. These 
enzymes produce short cohesive ends that are 1 to 4 
bases in length. These cohesive ends are too short to 
hold the cut ends together within the cell, and so the DNA 
falls apart into pieces. In the laboratory, DNA pieces with these short 
cohesive ends can be annealed at low temperatures and 
joined together with DNA ligase. The ability to create 
new combinations of genes and parts of genes, using 
restriction enzymes to generate the pieces, was one of 
the key discoveries leading to the ability to carry out 
sophisticated recombinant DNA technology. 
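The link between twofold rotational symmetry, staggered nicking, and cohesive ends can be checked mechanically. The sketch below uses the EcoRI recognition site GAATTC, which is cut between G and A on each strand. The helper names are hypothetical, and the overhang rule assumes a symmetric site nicked the same distance from each strand's 5’ end; it is an illustration, not a general restriction-digest tool.

```python
# Sketch: twofold rotational symmetry of a recognition site and the
# 5' overhang left by a symmetric, staggered pair of nicks.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def has_twofold_symmetry(site: str) -> bool:
    """Both strands read 5' to 3' are identical (the site is its own reverse complement)."""
    return site == reverse_complement(site)


def five_prime_overhang(site: str, cut: int) -> str:
    """Single-stranded 5' extension left when each strand is nicked `cut`
    bases from its own 5' end (assumes a symmetric site and cut < len/2)."""
    if not has_twofold_symmetry(site) or not 0 <= cut < len(site) / 2:
        raise ValueError("expected a symmetric site and a 5'-staggered cut")
    return site[cut:len(site) - cut]


def ends_can_anneal(overhang_a: str, overhang_b: str) -> bool:
    """Two 5' overhangs pair if one is the reverse complement of the other."""
    return overhang_a == reverse_complement(overhang_b)


ecori = "GAATTC"  # EcoRI cuts G^AATTC on each strand
overhang = five_prime_overhang(ecori, 1)
print(has_twofold_symmetry(ecori), overhang, ends_can_anneal(overhang, overhang))
# True AATT True
```

Because the AATT overhang is its own reverse complement, any two EcoRI-cut ends can anneal, which is what allows fragments from different sources to be joined with DNA ligase.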

A number of other lambdoid phages such as 
434, 21, and phi 80 have cohesive ends that are identical 
to lambda’s. Another group of E. coli viruses includes 
phage P2 and its relatives 186 and PSP3; the cohesive 
ends of these viruses are at the 5’ strand ends and are 19 
bases long. Pseudomonas aeruginosa virus phi CTX is 
a P2-related virus with similar cohesive ends that 
are 21 bases long, the longest reported to date. The 
shortest reported are 7 bases long; these are the 5’ 
cohesive ends of Haemophilus influenzae virus HP1, 
and the 3’ cohesive ends of Bacillus subtilis phage 
phi 105. Although the lambda cohesive ends show 
twofold rotational symmetry, most cohesive end 
sequences are asymmetric. 

There is a genetic consequence of cutting multimeric 
DNA to generate virion chromosomes: genes on 
opposite sides of the cos sequence end up on different 
virus chromosomes in different virions. Hence the 
cohesive ends define the ends of the chromosome, and 
therefore mark the ends of the genetic map of any virus 
whose chromosome has cohesive ends. 
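The genetic consequence just described can be illustrated with a toy model of packaging from a concatemer. The model is hypothetical: it represents the concatemer as a list of gene names separated by `cos` tokens and keeps only segments bounded by cos on both sides, discarding partial segments at either end. The lambda gene names A, J, N, and R serve only as labels in their left-to-right order; everything else about the genome is omitted.

```python
# Toy model (hypothetical): terminase cuts a concatemer at successive cos
# sites, and each packaged chromosome is the unit between two cuts.
from typing import List

COS = "cos"


def package_chromosomes(concatemer: List[str]) -> List[List[str]]:
    """Return the gene contents of each segment bounded by cos on both sides."""
    chromosomes: List[List[str]] = []
    segment: List[str] = []
    seen_cos = False
    for item in concatemer:
        if item == COS:
            # A cut at cos completes the segment that started at the previous cut.
            if seen_cos and segment:
                chromosomes.append(segment)
            segment, seen_cos = [], True
        elif seen_cos:
            segment.append(item)
    return chromosomes  # any trailing partial segment is discarded


# A rolling-circle product that happens to begin in the middle of the genome.
concatemer = ["N", "R", COS, "A", "J", "N", "R", COS, "A", "J", "N", "R", COS, "A"]
for chromosome in package_chromosomes(concatemer):
    print(chromosome)
# ['A', 'J', 'N', 'R']
# ['A', 'J', 'N', 'R']
```

R and A, which flank cos in the concatemer, always end up at opposite ends of each packaged chromosome, so they also sit at opposite ends of the genetic map.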


See also: Bacteriophages; DNA Replication; 
Genetic Mapping; Nuclease; Restriction 
Endonuclease; Rolling Circle Replication 
Recombination in each of two linked intervals may be 
correlated. The coefficient of coincidence is a measure 
of that correlation. Letting $R_1$ be the recombination 
frequency for interval 1 determined without regard to 
interval 2, $R_2$ be that for interval 2 determined without 
regard to interval 1, and $R_{12}$ be the frequency of individuals 
recombinant simultaneously in the two intervals, 
the expectation for no correlation is $R_{12} = R_1 R_2$. 
When the expectation is not realized, introduction of a 
factor, $C$ (the coefficient of coincidence), allows the 
equation $R_{12} = C R_1 R_2$. A $C$ value of unity implies no 
correlation, values of zero to unity imply negative 
correlation, and values greater than unity imply positive 
correlation. 

$C$ has been applied to adjacent (joint) intervals ($C_J$) 
and to disjoint intervals ($C_D$). Each application has its 
uses: 

$C_J$: Three linked markers define a pair of adjacent 
intervals, 1 and 2, for which $C_J$ can be experimentally 
determined by a three-factor cross in which $R_1$, $R_2$, 
and $R_{12}$ are measured: 

$$C_J = \frac{R_{12}}{R_1 R_2} \tag{1}$$

$C_J$, $R_1$, $R_2$, and the recombination frequency for the 
inclusive interval, $R_3$, are related, through the equations 
above, by 

$$R_3 = R_1 + R_2 - 2 C_J R_1 R_2 \tag{2}$$

With this expression, $C_J$ can be determined (less sensitively) 
by combining the data from the three two-factor crosses: 

$$C_J = \frac{R_1 + R_2 - R_3}{2 R_1 R_2} \tag{3}$$
Figure 1 (A) Three-factor cross to measure $C_J$; (B) 
four-factor cross to measure $C_D$. R values are recombination 
frequencies. 
Agreement between the two methods ensures that the 
positions of the markers, and not the nature of the 
markers, dictate the R values. 
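As a worked illustration of Equations (1) and (3), the sketch below computes $C_J$ both ways from progeny counts of a three-factor cross; the counts are invented for illustration. Because $R_3$ is taken here from the same cross rather than from separate two-factor crosses, the two estimates agree exactly by algebra; with independent two-factor data, their agreement is the empirical check described above. Interference, defined as $1 - C$, is included for convenience.

```python
# Sketch with hypothetical progeny counts from a three-factor cross.
def coincidence(r1: float, r2: float, r12: float) -> float:
    """Equation (1): C_J = R12 / (R1 * R2)."""
    return r12 / (r1 * r2)


def coincidence_from_two_factor(r1: float, r2: float, r3: float) -> float:
    """Equation (3): C_J = (R1 + R2 - R3) / (2 * R1 * R2)."""
    return (r1 + r2 - r3) / (2 * r1 * r2)


def interference(c: float) -> float:
    """Interference is defined as 1 - C."""
    return 1 - c


# Classes: recombinant in interval 1 only, interval 2 only, both, or neither.
only_1, only_2, both, neither = 80, 130, 10, 780
total = only_1 + only_2 + both + neither

r1 = (only_1 + both) / total     # 0.09
r2 = (only_2 + both) / total     # 0.14
r12 = both / total               # 0.01
r3 = (only_1 + only_2) / total   # 0.21: double recombinants keep the outer markers parental

c_three = coincidence(r1, r2, r12)
c_two = coincidence_from_two_factor(r1, r2, r3)
print(round(c_three, 3), round(c_two, 3), round(interference(c_three), 3))
# 0.794 0.794 0.206
```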

$C_D$: Four linked markers define three intervals. The 
coefficient of coincidence ($C_D$) for the two disjoint 
intervals, 1 and 2 (without regard to events in the 
central interval, 3), is $C_D = R_{12}/(R_1 R_2)$. 


Observations on the Coefficient of 
Coincidence 


Both $C_J$ and $C_D$ are observed to vary with the value of 
$R$. For most meiotic data, the length of DNA 
between markers is sufficiently great that recombination 
is dependent primarily on reciprocal exchange 
(crossing-over) and is not appreciably influenced by 
gene conversion. In these organisms, as $R_3$ approaches 
zero, $C$ is observed to approach zero, implying that 
close double exchanges are prohibited. $C$ increases 
toward unity as $R_3$ increases. Equation (1) demands 
that $C_J$ approach 1 as either $R_1$ or $R_2$ (and therefore 
$R_3$) approaches 0.5, the upper limit of observed recombination 
frequencies. Consequently, $C_J$ is uninformative 
regarding any influence of distant exchanges on 
each other. $C_D$, on the other hand, remains informative 
for all values of $R_3$; it typically approaches unity 
when $M_3$, the linkage map length of the central interval, 
is about 40 cM (0.4 nonsister exchanges per chromatid) 
(Foss et al., 1993). 

Equation (2) implies that $R_3 < R_1 + R_2$ by an 
amount that is dependent on $C_J$. For $R_1$ and $R_2$ 
sufficiently small (and $C_J$ not too large), additivity 
will (approximately) obtain because $2 C_J R_1 R_2$ will be 
small compared to $R_1$ and $R_2$. For larger $R$ values, 
the deviation from additivity can be predicted from 
knowledge of $C_J$, facilitating the construction of empirical 
mapping functions (e.g., Amati and Meselson, 
1964). 
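To see how Equation (2) gives near-additivity for short intervals and a growing shortfall for longer ones, the sketch below evaluates it for two equal intervals with a fixed, hypothetical $C_J = 0.5$. Holding $C_J$ constant is a simplification; as noted above, observed $C$ values themselves change with interval length.

```python
# Sketch: Equation (2), R3 = R1 + R2 - 2*C_J*R1*R2, with a hypothetical C_J.
def inclusive_frequency(r1: float, r2: float, c_j: float) -> float:
    """Recombination frequency across the inclusive interval (Equation 2)."""
    return r1 + r2 - 2 * c_j * r1 * r2


C_J = 0.5  # hypothetical coefficient of coincidence, held fixed for simplicity
for r in (0.01, 0.05, 0.2):
    additive = 2 * r
    predicted = inclusive_frequency(r, r, C_J)
    print(f"R1 = R2 = {r:.2f}: additive {additive:.4f}, Equation (2) {predicted:.4f}")
# R1 = R2 = 0.01: additive 0.0200, Equation (2) 0.0199
# R1 = R2 = 0.05: additive 0.1000, Equation (2) 0.0975
# R1 = R2 = 0.20: additive 0.4000, Equation (2) 0.3600
```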

The nonrandomness of exchanges is sometimes 
reported as interference, defined as $1 - C$. When $C < 1$, 
interference is positive; when $C > 1$, interference is 
negative. The mechanism(s) underlying the positive 
interference seen in meiosis of most eukaryotes is 
unknown. 

When markers are within a few kilobases of each 
other, $C_J$, measured in three-factor crosses, is often 
greater than unity. Examination of meiotic yeast tet- 
rads reveals that such ‘localized negative interference’ 
is associated with gene conversion and is influenced by 
mismatch repair of heteroduplexes arising at sites of 
repair of meiotically induced double-strand breaks. 
Localized negative interference in bacteriophage and 
bacterial crosses has a similar basis. 

For more distant markers, bacteriophage crosses 
manifest distance-independent negative interference. 
This is primarily a consequence of heterogeneity in 
recombination opportunities among the lineages of 
phage particles produced during the cross. For phages 
with circular linkage maps, some of this negative inter- 
ference is a consequence of map circularity, which 
demands that recombination in one interval be accom- 
panied by recombination elsewhere to meet the 
requirement for an even total number of exchanges, 
which defines a circular map. 


Further Reading 
Stahl FW (1979) Genetic Recombination: Thinking about It in Phage 
and Fungi. San Francisco, CA: WH Freeman. 
This term refers to an animal, or strain of animals, with 
a whole genome identical to that of an inbred strain, 
except for an alternative allele at a single locus. Coiso- 
genic strains originate in one of two ways: through a 
spontaneous mutation that occurs within an animal 
from an inbred strain, or by direct gene replacement 
by genetic engineering on embryos from an inbred 
strain. A coisogenic pair of strains refers to the original 
inbred strain and the mutant strain. Coisogenic pairs 
provide a powerful set of living tools for characteriz- 
ing the actual effect that a mutation has on a whole 
animal because any reproducible phenotypic differ- 
ence observed between the two strains (raised in the 
same environment) must be a consequence of the 
single allelic difference that distinguishes them from 
each other. 


See also: Inbred Strain 
Bacteria respond to environmental challenges in many 
ways. Competition with different members of the 
microbial community is one such challenge, and the response 
is often to release chemicals that inhibit the 
growth of other microorganisms. These allelopathic substances 
include metabolic byproducts such as hydrogen 
peroxide; ‘classical’ antibiotics such as bacitracin; 
and protein antibiotics (bacteriocins) which include 
the colicins. Colicins are antimicrobial compounds 
produced by, and active against, Escherichia coli and 
other members of the Enterobacteriaceae. This article 
discusses the distribution of colicins and Col factors as 
well as their ecoevolutionary dynamics. 


Colicinogeny 


The colicin phenotype is encoded by three tightly 
linked genes: the colicin, immunity, and lysis genes. 
The genes are found on accessory genetic elements 
(plasmids) called Col factors. Under conditions of 
stress, such as nutrient depletion, some colicinogenic 
cells are induced to produce colicin proteins. Simul- 
taneously, lysis proteins are produced and some time 
after synthesis, the colicin protein is released from the 
cell. Colicin synthesis results in the death of the cell. 
Cells harboring Col factors are protected from their 
own colicin by a specific, constitutively expressed, 
immunity protein. 

Colicins gain entry into susceptible cells by recog- 
nizing specific receptors on the surface of the target 
cell. Once a colicin is translocated into a target cell it 
will, depending on the colicin, kill the cell in one of four 
ways: by altering the permeability of the cytoplasmic 
membrane, cleaving 16S ribosomal RNA, nonspecific- 
ally degrading DNA, or inhibiting peptidoglycan 
synthesis. The available evidence indicates that a single 
colicin molecule is sufficient to kill the target cell. 
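One standard way to express the claim that a single molecule suffices is the single-hit Poisson model, which is our assumption here rather than something stated in the article: if molecules are adsorbed independently at random with an average of m lethal units per cell, the surviving fraction is the fraction of cells that received none. The function names are illustrative.

```python
import math


# Single-hit (Poisson) killing model, assumed for illustration.
def surviving_fraction(m: float) -> float:
    """Fraction of cells receiving zero lethal molecules: P(0) = exp(-m)."""
    return math.exp(-m)


def lethal_units_per_cell(survivors: float) -> float:
    """Invert the model: m = -ln(S) for an observed surviving fraction S."""
    return -math.log(survivors)


for m in (0.5, 1.0, 3.0):
    print(m, round(surviving_fraction(m), 3))
# 0.5 0.607
# 1.0 0.368
# 3.0 0.05
```

Under this model, the logarithm of the surviving fraction falls linearly with colicin dose, the kinetic signature usually read as one molecule sufficing to kill.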


Col Factors 


Col factors may be large self-transmissible plasmids with 
molecular weights of about 10⁷, or nonconjugative but 
mobilizable plasmids with molecular weights of about 
10⁶. The low-molecular-weight Col factor, Col E1, has 
been entirely sequenced. Other than the genes 
required for plasmid replication and mobilization, and 
the three colicin-related genes, no other genes have 
been identified. The large Col factors harbor genes 
unrelated to plasmid replication, transfer, or colicin 
production. For example, Col V encodes aerobactin, 
a putative virulence factor. 

The evolutionary stability of Col factor lineages is 
variable. The large conjugative Col factors, such as Col 
Ia, appear to represent many distinct plasmid lineages 
that carry the same colicin gene cluster. Extensive 
restriction fragment length polymorphisms (RFLPs) 
have been observed in Col Ia, with little of the plasmid 
DNA being homologous between Col Ia isolates. The 
observed differences indicate the transfer of the Col Ia 
operon between plasmids with different evolutionary 
ancestries. By contrast, the small nonconjugative Col 
factors appear to share a common ancestry. RFLP 
analysis of Col E1 isolates has revealed extensive 
sequence homology between isolates. Further, most 
Col E1 factors apparently exhibit a stable long-term 
association with their bacterial host. However, trans- 
fer of Col factors among bacterial hosts does occur. 
Direct evidence for Col factor transfer comes from the 
observation that it is not rare to find two Col factors in 
the same cell. 

The co-occurrence of different Col factors in the 
same host enables subsequent rearrangements at the 
plasmid level. Cointegration of Col factors has been 
observed and cointegrates of Col B and M are common. 
Recombination allows more localized rearrangements 
of colicin genes between Col factors. The Col factors 
E3 and E6 both contain an additional immunity gene 
that shows a high degree of sequence similarity to the 
E8 immunity gene. Recombination between Col E2 
and Col E7 has generated a novel Col factor that, in 
essence, consists of the colicin E2 gene cluster on a Col 
E7 plasmid. 


Distribution of Col Factors 


Typically, 30-35% of E. coli isolates harbor Col factors. 
Over 25 types of colicins have been identified. Several 
different Col factors are present in most E. coli populations, 
although the types of Col factors present 
vary extensively among populations. Some Col 
factors such as Col Ia are frequently isolated, although 
the dominant Col factor in a collection of isolates 
usually differs among populations. The frequency of a 
particular Col factor in a population is not necessarily 
constant. Over 6 months, the frequency of Col E2 
declined by 30% in a population of E. coli isolated 
from feral house mice. 


Resistance to Colicins 


Cells can become resistant to the action of a colicin 
through alterations in the surface receptor that 
binds the colicin molecule, or through changes in cell 
membrane proteins involved in colicin translocation. 
The majority of E. coli isolates are resistant to most co- 
occurring colicins. These high levels of resistance are 
in part because colicins exploit only a limited number 
of surface receptors. For example, all of the E colicins 
and colicin A use as a receptor the BtuB protein which 
is normally involved in vitamin B12 transport. Simi- 
larly, these same colicins exploit the Tol translocation 
system. Colicin resistance results in a fitness cost for 
the resistant cells relative to colicin sensitive cells. 


Col Factor Invasion Dynamics 


Colicins are thought to function as anticompetitor 
agents. Early theoretical and empirical studies investi- 
gated the conditions under which colicin production 
would provide a competitive advantage to the produ- 
cing cell. The probability of a colicin-producing strain 
displacing a colicin-sensitive population is a classic 
example of a frequency-dependent phenomenon. If 
colicin-producers are uncommon then they will not 
be able to invade. This is because colicin production 
imposes a cost due to the death of the producing cell. If 
there are few producers in the population, then small 
amounts of colicin are released and too few sensitive 
cells are killed to offset the number of cell deaths due 
to colicin production. The initial frequency of colicin- 
producing cells required to invade a sensitive-cell 
population depends on the rate of cell lysis, the 
amount of colicin produced per lysed cell, and the 
rate at which colicin molecules adsorb to sensitive 
cells. These characteristics vary depending on the 
type of colicin being produced and the host cell. In a 
common host background the amount of colicin pro- 
duced per cell can vary 100-fold between colicin types. 
Comparable differences in the amount of colicin re- 
leased by a cell can result when the same Col factor is 
in different host backgrounds. 
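The frequency dependence described above can be illustrated with a deliberately minimal model that is our own assumption, not one taken from the studies discussed: producers pay a fixed fitness cost `cost` (standing in for the death of induced cells), and sensitive cells lose fitness in proportion to the producer frequency p, scaled by a hypothetical killing parameter `kill` that lumps together lysis rate, colicin yield, and adsorption. Producers then gain only when `kill * p > cost`, giving an invasion threshold of `cost / kill`.

```python
# Toy frequency-dependent model with hypothetical parameters.
def next_frequency(p: float, cost: float, kill: float) -> float:
    """One generation of selection on the producer frequency p.

    Producers pay a fixed cost; sensitive cells lose fitness in proportion
    to p, because more producers release more colicin.
    Assumes 0 <= cost < 1 and 0 <= kill <= 1 so fitnesses stay non-negative.
    """
    w_producer = 1 - cost
    w_sensitive = 1 - kill * p
    mean_fitness = p * w_producer + (1 - p) * w_sensitive
    return p * w_producer / mean_fitness


def invasion_threshold(cost: float, kill: float) -> float:
    """Producers increase only when kill * p > cost."""
    return cost / kill


def simulate(p0: float, cost: float = 0.1, kill: float = 0.5, generations: int = 200) -> float:
    p = p0
    for _ in range(generations):
        p = next_frequency(p, cost, kill)
    return p


print(invasion_threshold(0.1, 0.5))  # 0.2
print(simulate(0.1) < 0.01)          # True: below threshold, producers are lost
print(simulate(0.3) > 0.99)          # True: above threshold, producers take over
```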


Col Factor Eco-Evolutionary Dynamics 


Col factors are thought to function as anticompetitor 
agents by aiding the producing strain in establishing in a 
bacterial community or preventing its displacement 
by another strain. Col factors are isolated at significant 
frequencies in E. coli populations. The high frequencies 
at which they are found imply that they must 
confer some advantage to the producing cell, but most 
E. coli cells are resistant to the majority of colicins 
and only a small fraction of cells are sensitive to the 
colicins present in the population. How can these 
apparently contradictory observations be reconciled? 
At present, it is thought that the interactions between 
colicin-producing, colicin-resistant, and colicin- 
sensitive cells are extremely dynamic. 


Consider a population of bacteria that consists 
initially of colicin-sensitive cells. A colicin-producer 
cell is likely to invade the colicin-sensitive population 
and the population will quickly become dominated 
by the producer. However, colicin-resistant mutants 
will rapidly arise from the sensitive cell population. 
The resistant cells will increase in abundance at the 
expense of the colicin-producing cell population. The 
rate at which resistant cells displace producers will 
depend on the costs to the producer of colicin syn- 
thesis and Col factor carriage, relative to the costs to 
resistant cells of alterations in surface receptors or 
translocation systems. As the frequency of colicin 
producers declines, the cost of resistance in the 
absence of colicin production will favour the increase 
in frequency of colicin-sensitive revertants. The 
colicin-sensitive cells will then displace the colicin- 
resistant population. 

The displacement of the producer population by a 
resistant population, and of the resistant population 
by a sensitive population, will occur much more 
slowly than the displacement of the sensitive popula- 
tion by a producer population. In the first two cases, 
replacement of one population by another depends 
only on the relative growth rates of the two strains. 
In contrast, the production of colicin by the invading 
producer rapidly eliminates the sensitive cell popula- 
tion. This simple scenario suggests a predictable 
sequence of replacement events: sensitive to producer 
to resistant to sensitive. However, the replacement of 
one population by another need not proceed in such 
an orderly fashion. The producer population might be 
replaced by a new type of colicin producer, or the 
resistant population be replaced by a novel producer 
to which the dominant resistant population is sensitive. 
This conceptual model captures some of the 
features observed in natural populations: the high 
frequency of colicin-producing cells and resistant cells; 
the low frequency of colicin-sensitive cells; and the 
flux in the relative frequency of the different cell 
classes. 
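The cyclic replacement sequence (sensitive to producer to resistant to sensitive) can be formalized as a rock-paper-scissors game. The sketch below uses a discrete-time replicator model, a standard technique swapped in here as an assumption; the payoff values of ±1 and the selection strength `s` are hypothetical. It ignores mutation and reversion and does not reproduce the difference in replacement speeds discussed above; it only shows that the lead passes from type to type in the predicted order.

```python
# Hypothetical rock-paper-scissors sketch of the replacement cycle.
TYPES = ("sensitive", "producer", "resistant")

# PAYOFF[i][j] is the assumed advantage of type i against type j:
# producers kill sensitive cells, resistant cells avoid the producers' cost,
# and sensitive cells avoid the cost of resistance.
PAYOFF = (
    (0, -1, 1),   # sensitive
    (1, 0, -1),   # producer
    (-1, 1, 0),   # resistant
)


def step(x, s=0.1):
    """One discrete-time replicator update with selection strength s."""
    fitness = [1 + s * sum(PAYOFF[i][j] * x[j] for j in range(3)) for i in range(3)]
    new = [x[i] * fitness[i] for i in range(3)]
    total = sum(new)
    return [v / total for v in new]


def leaders(x0, steps=1000):
    """Record which type is most abundant, noting only changes of leader."""
    x, sequence = list(x0), []
    for _ in range(steps):
        lead = TYPES[max(range(3), key=lambda i: x[i])]
        if not sequence or sequence[-1] != lead:
            sequence.append(lead)
        x = step(x)
    return sequence


print(leaders([0.8, 0.1, 0.1])[:3])
# ['sensitive', 'producer', 'resistant']
```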

There is a great deal of interest in the use of colicins 
and related compounds as food preservatives, as 
potential replacements for traditional antibiotics to 
treat disease, and as biocontrol agents in the manage- 
ment of plant and animal diseases. Much remains to be 
learned concerning the role that Col factors play in 
natural populations of bacteria. 
A mutant that is defective in some function at low 
temperature relative to wild-type, but is not defective 
at normal or higher temperatures. 


See also: Temperature-Sensitive Mutant 
Colicins belong to a class of antibiotics called bacteriocins. 
Bacteriocins are produced by various bacterial 
species (colicins by Escherichia coli, pyocins by 
pseudomonads, etc.). They are notable, in particular, for 
their relationship to bacteriophages (phages), i.e., 
viruses that infect bacteria. 

In 1925 the Belgian microbiologist André Gratia 
reported that an E. coli strain (named E. coli V) dis- 
playing high virulence toward rabbits and guinea pigs 
produced a substance that was toxic to another E. coli 
strain (φ) and, to a lesser extent, to Shigella dysenteriae. 
This substance was called the ‘V principle’ and later 
colicin V. It was active at high dilution (107°) and could 
diffuse in agar from the point of inoculation of the 
producing strain. It was thermostable (withstanding 
heating for up to 30 min at 120°C) and could cross a 
cellophane membrane. It precipitated in an active form 
in the presence of acetone but was destroyed by abso- 
lute alcohol. Gratia concluded that it was a low- 
molecular-weight proteinic substance and confirmed 
this later by demonstrating its sensitivity to trypsin. 

This was the first characterization of such an anti- 
biotic. Colicin V has no therapeutic value but is 
extremely interesting to cell physiologists and molecular 
biologists. Colicins differ from other antibiotics 
(β-lactams, quinolones, macrolides, aminoglycosides) 
by several interesting features. Furthermore, as they 
are usually plasmid-encoded, they are also important 
in genetics and biotechnology. 


Diversity and Action of Colicins 


In 1948 Pierre Fredericq described the results of an 
analysis of 881 bacterial strains. Among these, 411 
proved sensitive to colicin V and 254 produced at least 
one colicin active against the indicator strain E. coli b. 
Seventeen colicins were identified, differing by their 
activity spectrum (range of strains against which they 
were active), by the mutations leading to resistance 
against them, by their sensitivity to microbial pro- 
teases, and by the appearance and size of the inhibition 
zones formed on bacterial lawns. Structurally, colicins 
range from simple, low-molecular-weight polypep- 
tides such as colicin V to complex proteinic structures 
such as macrocin G, which resembles a phage tail. 
Colicins bind to specific receptors on the bacterial 
envelope. This in itself, however, is not what causes 
their lethal action, as shown by the existence of mu- 
tations conferring combined tolerance to colicins 
K, E1, and F. Such mutations do not affect the re- 
ceptors for these colicins but rather a common step 
in their lethal action. The latter involves pore forma- 
tion and depolarization of the plasma membrane, 
through an effect on the efflux rate of intracellular 
potassium. This is accompanied by a drop in the ATP 
level and by cessation of macromolecule synthesis and 
β-galactoside transport. Other colicins have other 
modes of action. Colicin E2 acts as a nonspecific endo- 
nuclease, producing single- or double-strand breaks 
in DNA. Colicin E3, like cloacin DF13 produced by 
Enterobacter cloacae, inhibits protein synthesis by 
altering the ribosomes (endonucleolytic rupture of 
16S RNA in the 30S ribosomal subunit of E. coli). 


Relationship to Phages: Colicinogeny and 
Lysogeny 


The first reports on colicins mention analogies to 
phages but stress that colicins, contrary to phages, 
cannot multiply. Colicins and phages sometimes 
share the same receptors: phage BF23 with colicins 
of groups E and F, phage T6 with colicin K, phages 
T1 and phi 80 with colicin M, etc. 

The action of lethal colicin proteins is similar to 
that of virulent phages emptied of their DNA content 
(ghosts) or irradiated with a high dose of UV light so 
as to neutralize the DNA. Comparative kinetic and 
radiobiological studies converge to suggest a similarity 
at molecular level between colicin K and the tail of 
phage T6. 

Like the phages of lysogenic bacteria, the colicins 
studied to date are not secreted during growth, but 
released upon lysis. UV-induced colicin synthesis in 
colicinogenic bacteria is in all ways comparable with 
prophage induction in lysogenic bacteria. Under cer- 
tain conditions, there may appear on a bacterial lawn 
some very small plaques called ‘lacunae,’ due to the 
production of colicin by a single bacterium under- 
going lysis. 


Colicin Receptors and Colicinogenic 
Factors as Tools in Genetics 


From the emergence of bacterial genetics, mutations 
leading to colicin resistance (either through loss of 
specific receptors or through the development of tol- 
erance) have been analyzed genetically by conjugation 
and transduction. Such mutations are numerous and 
scattered over the entire chromosome, and can thus 
serve as markers in genetic analyses. For instance, 
extensive deletions affecting the prophage attachment 
site have been used to demonstrate the validity of 
Campbell’s model of prophage insertion into the bac- 
terial chromosome. 

Fredericq and Betz-Bareau were the first to show, 
in 1954, that the genes encoding various colicins are 
located on plasmids; examples are the important and 
interesting pCol E1 and pCol V. In 1925, E. coli V was 
isolated from a rabbit that had died of septicemia. 
Later, with various old and more recent lyophilized 
strains, it was shown that the factor responsible for 
septicemia was not colicin V but the products of two 
other genes on pCol V, one conferring resistance to 
serum and the other coding for aerobactin, which, by 
capturing iron, enables the bacterium to develop in the 
bloodstream. Hence, E. coli do not necessarily cause 
septicemia upon entering the bloodstream, but pCol V 
is one of the factors enabling them to do so. 


Plasmid Col E1, Source of Gene-Cloning 
Vectors 


Plasmid Col E1 has applications in molecular genetics. 
It has been used to study the mechanism of recombin- 
ation, for instance, and to test Holliday’s model. 
Genetic engineering developed on the basis of this 
plasmid, owing to the following features: (1) the 
Col E1 plasmid is a small, circular molecule that can 
be used to transform a recipient cell without integrating 
into its chromosome; (2) the plasmid has a unique 
restriction site recognized and cleavable by the EcoRI 
restriction endonuclease. When treated in vitro with 
this enzyme, the plasmid opens and, without losing a 


single gene, can integrate a foreign DNA fragment 
obtained by restriction with the same endonuclease; 
(3) the plasmid vector bearing the foreign insert can 
then be introduced into bacterial cells (cloned) and 
amplified. After treatment of the culture with chlor- 
amphenicol, a single cell can produce by replication 
up to 1000 copies of the plasmid, each bearing a copy 
of the foreign gene. The product of the cloned gene 
can thus be overproduced by the bacterial strain. 
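Feature (2) depends on the EcoRI site GAATTC occurring only once in the plasmid. A minimal way to check this on a sequence is sketched below; the function name and the 10-bp "plasmid" are hypothetical. Because GAATTC is its own reverse complement, scanning one strand finds every site, and the search wraps around the numbering origin because the plasmid is circular.

```python
# Sketch: find a recognition site in a circular sequence, including an
# occurrence that spans the arbitrary numbering origin.
def find_sites_circular(plasmid: str, site: str) -> list:
    """Return 0-based start positions of `site` in a circular sequence."""
    n = len(plasmid)
    wrapped = plasmid + plasmid[:len(site) - 1]  # lets a site cross the origin
    return [i for i in range(n) if wrapped.startswith(site, i)]


ECORI = "GAATTC"
toy_plasmid = "ATTCAAAAGA"  # hypothetical; its only EcoRI site spans the origin
positions = find_sites_circular(toy_plasmid, ECORI)
print(positions, "unique" if len(positions) == 1 else "not unique")
# [8] unique
```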

By recombination between Col and R plasmids, 
researchers have made Col-derived cloning vectors 
that bear an antibiotic-resistance gene. This makes it 
easy to screen for transformants, by simply plating on 
a medium containing the antibiotic. 


Colicinotyping: Applications in 
Epidemiology 


Sensitivity to a given colicin reflects both the presence 
of specific receptors for that colicin and a positive 
response to its lethal action. Sensitivity to bacteriocins 
in general and colicins in particular is used in both 
genetics and epidemiology as a marker for typing 
bacterial strains, including strains of clinical interest. 
Colicinotyping is a complement to lysotyping (typing 
according to sensitivity to bacteriophages). 


See also: Antibiotic Resistance; Bacteriophages; 
Col Factors; Holliday’s Model; Plasmids 
The Nature of Gene-Protein 
Colinearity 


One of the principal roles of the genetic material of 
each organism is to specify the amino acid sequences 
of all of its many different proteins. Genetic material is 
mostly composed of distinct genes, each consisting of 
a relatively unique linear sequence of nucleotide pairs. 
Genes that encode proteins are transcribed into com- 
plementary messenger RNAs, each of which is trans- 
lated to yield the corresponding protein. Although 
numerous posttranscriptional and posttranslational 
events can influence the ultimate sequence and char- 
acteristics of many proteins, the organism’s genetic 
material in general serves as the dictionary of specific 
instructions that dictate the amino acid sequence of 
every one of its proteins. The relationship between 
the nucleotide sequence of a gene, and the amino acid 
sequence of its corresponding protein, has been 


described as gene-protein colinearity, a hypothesis 
that was verified experimentally in the early 1960s. 


Early History of the Colinearity Concept 


As the field of genetics developed there was consider- 
able interest in the nature of the gene and how genes 
determine the visible characteristics of each organism. 
Although it was appreciated that enzymes catalyze the 
reactions that proceed in all living things there was 
little understanding of how proteins are synthesized 
or how genes participate in this process. It was the 
studies by George Beadle and Edward Tatum in the 
early 1940s that first clearly focused on the relation- 
ship between gene and protein (Beadle and Tatum, 
1941). At the time they conducted their analyses it 
was not known that the genetic material of most organ- 
isms was DNA, nor had it been established that 
proteins consist of linear sequences of amino acids. 

Beadle and Tatum chose the bread mold Neuro- 
spora crassa for their studies. This haploid organism 
can grow on a simple nutrient medium consisting of a 
mixture of salts, a carbon source, and the single vita- 
min, biotin. They mutagenized Neurospora and isol- 
ated numerous nutritional mutants that would not 
grow on this simple nutrient medium, i.e., mutants 
that required a specific amino acid, vitamin, purine or 
pyrimidine for growth. They analyzed these mutants 
both biochemically and genetically, and concluded 
that all mutants defective in performing a particular 
biochemical reaction were altered in the same gene. 
Their compelling evidence establishing this relation- 
ship led them to propose the one gene, one enzyme, 
one biochemical reaction hypothesis. 

At the time Beadle and Tatum proposed their 
hypothesis we knew so little about the composition 
of genes and proteins that it was not possible to put 
the hypothesis to an experimental test. Somewhat 
later, in the early 1950s, it was established that double- 
stranded DNA serves as the genetic material of most 
organisms. It was also shown that most proteins consist 
of linear sequences of amino acids. These findings 
led to a redefinition of the one gene, one enzyme 
hypothesis as the gene-protein colinearity hypothesis. 
This updated hypothesis stipulated that the nucleotide 
sequence of each gene determines the amino acid 
sequence of the corresponding polypeptide. However, 
in the early 1960s, when the colinearity hypothesis 
was addressed experimentally, it was not yet possible 
to isolate single genes, or determine their nucleotide 
sequences. Similarly, although the technology for 
protein sequencing had been developed, it was not 
obvious how this technology could be applied to 
examining gene-protein colinearity. Because of these 
technical limitations, the research groups concerned 
with examining the colinearity hypothesis were forced
to develop strategies that did not require the sequen- 
cing of genes and proteins. 


Demonstration of Gene-Protein Colinearity


The colinearity of gene structure and protein struc- 
ture was established by two research groups working 
with different material and employing somewhat 
different approaches. One group was led by Charles 
Yanofsky and the other by Sydney Brenner. The 
Yanofsky group performed their studies with the 
trpA gene of the bacterium Escherichia coli, and its 
corresponding protein, TrpA (Yanofsky et al., 1964, 
1967). This protein is essential for tryptophan
biosynthesis in this organism. TrpA is one subunit of a
two-component enzyme complex; the second subunit
is the TrpB protein. To characterize the trpA gene
genetically, a large number of mutants were isolated
and crossed with one another, and the pairwise
recombination frequencies were recorded. From these
frequencies a linear fine-structure genetic map was
constructed representing the relative locations of all
the mutationally altered sites in the trpA gene.
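
The map-building step can be sketched in Python. This is a simplified illustration, not the procedure used in the original studies: it assumes that recombination frequencies are roughly additive over short intervals, takes the two sites that recombine most often as the ends of the map, and places every other site by its distance from one end. The mutant names (m1 to m4) and the frequencies are hypothetical.

```python
from itertools import combinations


def order_sites(rf):
    """Order mutant sites along a linear fine-structure map.

    rf maps frozenset({site_a, site_b}) to the recombination frequency (%)
    observed in a cross between the two mutants. Frequencies are assumed to be
    roughly additive, as over the short distances within a single gene.
    """
    sites = sorted({site for pair in rf for site in pair})

    def dist(a, b):
        return 0.0 if a == b else rf[frozenset((a, b))]

    # The pair that recombines most often marks the two ends of the map.
    end, _ = max(combinations(sites, 2), key=lambda pair: dist(*pair))
    # Every site is then placed by its distance from that end.
    return sorted(sites, key=lambda site: dist(end, site))


# Hypothetical crosses among four mutants (true order m3-m1-m4-m2).
crosses = {("m3", "m1"): 1.0, ("m3", "m4"): 3.0, ("m3", "m2"): 6.0,
           ("m1", "m4"): 2.0, ("m1", "m2"): 5.0, ("m4", "m2"): 3.0}
rf = {frozenset(pair): freq for pair, freq in crosses.items()}
print(order_sites(rf))  # ['m2', 'm4', 'm1', 'm3']
```

A genetic map has no intrinsic orientation, so the result is the true order read from the other end.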

This fine-structure genetic map was constructed 
using the logic employed by Seymour Benzer in his 
demonstration that a genetic map was a valid
representation of the nucleotide sequence of a gene (Benzer,
1957). In the studies with trpA, two classes of mutants 
were recovered, so-called missense mutants and 
nonsense mutants. Missense mutants produce a
full-length, inactive protein, whereas nonsense mutants
produce only a fragment of the protein. The inactive
protein encoded by each missense mutant was purified 
and the amino acid change responsible for inactivity was
determined by a procedure called ‘peptide
fingerprinting.’ The positions of the amino acid changes in a set of
missense mutants were then compared to the positions 
of the genetic alterations in these mutants on the map 
of the trpA gene. As shown in Figure I, the two were 
colinear, i.e., the order and spacing of mutational sites 
within the trpA gene correlated with the order of 
amino acid replacements in the TrpA protein. 
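
The comparison of the two orders amounts to a simple check, sketched here under stated assumptions: each missense mutant has a position on the fine-structure map and the residue number of its amino acid replacement. Sorting mutants by map position and testing whether residue numbers then rise (or fall, since map orientation is arbitrary) checks colinearity of order only; it does not test whether spacing is proportional. The mutant names and all values are hypothetical.

```python
def is_colinear(map_position, residue_position):
    """Return True if map order and amino acid replacement order agree.

    map_position: mutant -> position on the genetic fine-structure map.
    residue_position: mutant -> residue number of its amino acid change.
    Map orientation is arbitrary, so either direction counts as colinear.
    """
    by_map = sorted(map_position, key=map_position.get)
    residues = [residue_position[m] for m in by_map]
    steps = list(zip(residues, residues[1:]))
    return all(a < b for a, b in steps) or all(a > b for a, b in steps)


# Hypothetical mutants: map positions (arbitrary units) and residue numbers.
map_pos = {"mut1": 0.0, "mut2": 0.4, "mut3": 1.1, "mut4": 2.0}
residue = {"mut1": 20, "mut2": 90, "mut3": 150, "mut4": 230}
print(is_colinear(map_pos, residue))  # True
```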

Gene-protein colinearity was also established by 
Sydney Brenner and coworkers in studies of the gene 
encoding the head protein of bacteriophage T4 
(Sarabhai et al., 1964). They analyzed a set of nonsense 
mutants that produced only a fragment of the head 
protein because a translation termination event caused 
by the nonsense mutation interrupted head protein 
synthesis. Their analyses were facilitated by the
finding that approximately 50% of the protein
synthesized in the late stages of infection of a population of
E. coli cells by phage T4 is the phage head protein.

Figure I Representation of the genetic map of the trpA gene and the sequence of the corresponding TrpA protein. The positions of the genetic alterations in a set of trpA mutants are indicated on the upper double strand, representing the gene; the positions of the corresponding amino acid changes are indicated on the bottom line, representing the TrpA protein.

They radiolabeled the protein fragments that were pro- 
duced by a set of head protein nonsense mutants and 
sized the head protein fragments by digestion with 
different proteolytic enzymes and electrophoretic 
separation of the resulting peptides on a gel. They 
also constructed a fine-structure map of the T4 head 
protein gene by recombination analyses with their set 
of nonsense mutants. They demonstrated that the order of
the mutations on the map of the head protein gene
matched exactly the lengths of the head protein fragments
they produced, showing that the two are colinear.
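
The logic of the head protein experiment can be sketched in Python under simplifying assumptions: a nonsense mutation is modeled as replacing one codon of a coding sequence with the stop codon TAG, and translation is assumed to start at the first codon and end at the first in-frame stop. The fragment length then equals the number of codons before the nonsense codon, so mutants listed in map order give fragments of increasing length. The sequence and the mutation positions are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def fragment_length(coding_seq):
    """Number of amino acids made before the first in-frame stop codon."""
    for i in range(0, len(coding_seq) - 2, 3):
        if coding_seq[i:i + 3] in STOP_CODONS:
            return i // 3
    return len(coding_seq) // 3


def nonsense_mutant(coding_seq, codon_index):
    """Replace the codon at codon_index (0-based) with the stop codon TAG."""
    start = 3 * codon_index
    return coding_seq[:start] + "TAG" + coding_seq[start + 3:]


# Hypothetical 20-codon gene followed by a stop codon.
gene = "ATG" + "GCA" * 19 + "TAA"
# Nonsense sites listed in map order (codon numbers are hypothetical).
sites_in_map_order = [4, 9, 15]
lengths = [fragment_length(nonsense_mutant(gene, k)) for k in sites_in_map_order]
print(fragment_length(gene), lengths)  # 20 [4, 9, 15]
```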


Beyond Colinearity 


With the advancement of technology for analyzing 
DNA, it has become routine to isolate specific genes 
and determine their complete nucleotide sequences. 
Knowledge of the genetic code, which relates each
trinucleotide (codon) in DNA to a specific amino acid or
to a stop signal, permits the complete amino acid
sequence of a protein to be predicted from the
nucleotide sequence of its gene.
In higher organisms the coding regions within genes, 
called exons, are generally interrupted by noncoding 
nucleotide sequences, called introns, which are re- 
moved when the primary transcript is processed to 
yield the messenger RNA that is actually translated. 
Despite the additional complexity of having noncod- 
ing blocks of nucleotide sequence within genes, the 
colinearity relationship proven in studies of simpler 
organisms remains valid. The protein product invari- 
ably reflects the linear order of the nucleotides in the 
specifying gene. 
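
The prediction of a protein sequence from a gene can be sketched as follows. The sketch builds the standard genetic code from its conventional compact string form, removes introns given exon coordinates, and translates codon by codon until the first stop codon. The toy gene, its intron, and the exon coordinates are hypothetical; a real analysis would also have to locate the start codon and the exon boundaries.

```python
BASES = "TCAG"
# Standard genetic code, ordered by first, second, then third base (T, C, A, G);
# '*' marks the three stop codons.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    first + second + third: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, first in enumerate(BASES)
    for j, second in enumerate(BASES)
    for k, third in enumerate(BASES)
}


def splice(gene, exons):
    """Join exons given as (start, end) pairs, 0-based and end-exclusive."""
    return "".join(gene[start:end] for start, end in exons)


def translate(coding_seq):
    """Translate codon by codon, stopping at the first stop codon."""
    protein = []
    for i in range(0, len(coding_seq) - 2, 3):
        amino_acid = CODON_TABLE[coding_seq[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


# Hypothetical gene: exon 1, an intron (GT...AG), exon 2.
gene = "ATGGCC" + "GTAAGTCCAG" + "AAATGA"
mrna = splice(gene, [(0, 6), (16, 22)])
print(mrna, translate(mrna))  # ATGGCCAAATGA MAK
```

Because the intron is removed before translation, the residues still appear in the same linear order as their codons in the gene.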
Colony hybridization is a technique for in situ
hybridization of bacterial colonies to identify those
containing DNA homologous to a particular sequence
(probe).


See also: Bacterial Genetics 
Color blindness is the inability to discriminate visually
certain colors. It may be complete, in which case
there is no color sensation and vision is described as
monochromatic, or partial, in which case some color
discrimination is retained in certain regions of the spectrum.

The vertebrate retina contains photoreceptor cells 
that are specialized for the capture of light. These cells 
are subdivided into two classes, the rods and cones. 
Rods are responsible for monochromatic vision in dim 
light and cones for color vision at normal light levels. 
In both cases, the outer segment of the photoreceptor 
is composed of a membranous stack in which the 
key molecules for photon capture, the photosensitive 
visual pigments, are embedded. 

The mechanism of color vision depends critically 
on a comparison of the photon catch of different types 
of cone photoreceptors that are maximally sensitive 
at different wavelengths. This is the process of ‘oppon- 
ency’ whereby different photoreceptors are stimu- 
lated to different extents by light of differing spectral 
content. Comparison of these signals by the brain 
provides the sensation of color. From this it follows 
that color vision requires a minimum of two different 
types of cone photoreceptors to be present. 

In primates, trichromatic color vision is provided by
the presence of three classes of cone photoreceptors
with wavelengths of maximal sensitivity (λmax) in the
yellow-green (around 560 nm, longwave-sensitive, L),
green (around 530 nm, middlewave-sensitive, M) and
blue (around 430 nm, shortwave-sensitive, S) regions
of the spectrum. This sensitivity arises from the
particular visual pigment that is present in each cone type.
Visual pigments are composed of a retinal chromophore
attached via a protonated Schiff base to an opsin
protein. The spectral differences between pigments in
mammals are entirely due to differences in the amino
acid sequence of the opsin protein, which in turn arise
from coding differences in the corresponding opsin
genes. The S pigment is encoded by an S gene on
human chromosome 7 and the M and L pigments are
encoded by genes on the X chromosome.


Red-Green Color Blindness: Deuteranopia and Protanopia


The common forms of color blindness in humans 
affect color discrimination in the red-green region of 
the spectrum and are associated with changes in the 
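X-linked M and L genes. The high frequency of these
defects among males is a direct consequence of the
hemizygosity of X-linked genes: males carry a single X
chromosome and so have no second copy of the array to
compensate for a defective one. The M and L genes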
are organized into a head-to-tail array separated by 
only a short stretch of DNA. These genes show a high 
degree of identity (96% in amino acid sequence) and 
this, together with their close proximity, is responsible 
for the relatively high frequency of mispairing be- 
tween M and L gene sequences and unequal crossing- 
over within the array. Depending on the precise 
location of the crossover, the consequences of this
are gene loss, gene duplication, or the generation of 
hybrid M/L genes. 
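
How a single out-of-register crossover can yield gene loss, gene duplication, or hybrid genes can be sketched with a toy model. Each gene is represented by six labeled segments standing in for its exons, the L gene is placed upstream of the M gene, and the two copies of the array are paired offset by one gene, so that the M gene of one copy lies opposite the L gene of the other. Intergenic DNA, precise crossover sites, and arrays with more than one M gene are ignored; the segment labels are hypothetical.

```python
SEGMENTS = 6  # segments per gene, standing in for exons


def gene(label):
    """A gene as a list of labeled segments, e.g. ['L1', ..., 'L6']."""
    return [f"{label}{n}" for n in range(1, SEGMENTS + 1)]


def unequal_crossover(array, offset, point):
    """Reciprocal crossover between two copies of `array` paired out of register.

    Position `point` on copy A is aligned with position `point - offset` on
    copy B; the two recombinant arrays are returned.
    """
    j = point - offset
    return array[:point] + array[j:], array[:j] + array[point:]


def describe(array):
    """Name each gene in an array as L, M, or a hybrid (5' source/3' source)."""
    names = []
    for i in range(0, len(array), SEGMENTS):
        g = array[i:i + SEGMENTS]
        five_prime, three_prime = g[0][0], g[-1][0]
        names.append(five_prime if five_prime == three_prime
                     else f"{five_prime}/{three_prime} hybrid")
    return names


array = gene("L") + gene("M")
# Crossover between the genes: one array gains a gene, the other loses one.
print([describe(p) for p in unequal_crossover(array, SEGMENTS, 6)])
# [['L', 'L', 'M'], ['M']]
# Crossover within the genes: hybrid genes are formed.
print([describe(p) for p in unequal_crossover(array, SEGMENTS, 9)])
# [['L', 'M/L hybrid', 'M'], ['L/M hybrid']]
```

In this model, an array carrying only the M gene lacks L, as in protanopia, while arrays carrying hybrid genes can give rise to the anomalous pigments discussed in the next paragraph.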

Red-green color blindness can be subdivided into 
two distinct types: dichromacy where color discrimin- 
ation in the red-green region is absent, or anomalous 
trichromacy where some limited color discrimination 
in this region is retained. With dichromacy, either the 
M or L gene may be absent, resulting in deuteranopia 
or protanopia, respectively. Anomalous trichromacy 
arises from the production of hybrid M/L pigments; 
the relative contribution of M or L sequence to the 
hybrid gene will depend on the exact position of the 
exchange between the M and L genes and this will 
in turn determine whether the resulting pigment will
have a λmax similar to that of an L or an M pigment
or somewhere in between. If the latter, the spectral
separation between such a hybrid pigment and a nor- 
mal M or L pigment will be substantially reduced, 
thereby reducing color discrimination in the red- 
green region of the spectrum. 
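
Why a smaller spectral separation between pigments reduces discrimination can be illustrated numerically. The sketch below is a deliberately crude model, not a physiological one: each pigment's sensitivity is approximated by a Gaussian of an assumed 40 nm width around its λmax (real pigment spectra are broader and asymmetric), and the red-green opponent signal is taken as the normalized L-minus-M difference. The spread of that signal across 500-650 nm is compared for the normal L and M pigments (about 560 and 530 nm) and for a normal L pigment paired with a hypothetical hybrid pigment at 550 nm.

```python
import math


def sensitivity(wavelength_nm, lambda_max_nm, width_nm=40.0):
    """Gaussian stand-in for a pigment's spectral sensitivity (an assumption)."""
    return math.exp(-0.5 * ((wavelength_nm - lambda_max_nm) / width_nm) ** 2)


def opponent_signal(wavelength_nm, l_max, m_max):
    """Normalized L-minus-M comparison, ranging from -1 to +1."""
    l = sensitivity(wavelength_nm, l_max)
    m = sensitivity(wavelength_nm, m_max)
    return (l - m) / (l + m)


def opponent_spread(l_max, m_max, band=range(500, 651, 5)):
    """Range of opponent signals across a band of wavelengths.

    A smaller range means that lights in the band produce more similar
    signals and are therefore harder to tell apart.
    """
    values = [opponent_signal(w, l_max, m_max) for w in band]
    return max(values) - min(values)


normal = opponent_spread(560, 530)
anomalous = opponent_spread(560, 550)  # hypothetical hybrid pigment
print(f"normal {normal:.2f}, anomalous {anomalous:.2f}")
```

Under these assumptions the spread for the anomalous pair is markedly smaller, mirroring the reduced red-green discrimination described above; the numbers themselves should not be read as psychophysical measurements.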

The relative frequencies of dichromacy and anomalous
trichromacy are given in Table I.


Tritanopia 


The loss of functional S cones, a condition called tri- 
tanopia, arises from mutation in the S opsin gene. It 
occurs at a much lower frequency than red-green color 
blindness. The absence of the blue-sensitive pigment 
limits blue-yellow color discrimination. Tritanopia is 
inherited as an autosomal dominant disorder; the pres- 
ence of a mutant pigment even in heterozygous indi- 
viduals is sufficient to result in S cone degeneration. 


Achromatopsia 


Total loss of color vision arises in two ways: either by
the absence of both M and L pigments, a rare condition
called blue cone monochromacy in which only S
cones are present, or by the absence of all cone types.
In the latter case, only rod photoreceptors are retained,
so the condition is called rod monochromacy. Since
blue cone monochromacy arises from mutations in the
M/L gene array on the X chromosome, it shows an
X-linked pattern of inheritance, whereas rod
monochromacy arises from mutations in a number of genes
that affect other components of signal processing in
the cone photoreceptors. Rod monochromacy is generally
inherited as an autosomal recessive condition.


Table I Types and frequency of color blindness

Type                       Pigments present     Occurrence in males (%)
Dichromacy
  Protanopia               M only               0.81
  Deuteranopia             L only               0.48
Anomalous trichromacy      L or M + hybrid      6.61


Nonprimate Mammals


Most mammals other than primates have only two
cone types (the minimal requirement for color vision),
one maximally sensitive in the green region of the
spectrum (500-530 nm) and the other in the blue or
ultraviolet region (365-435 nm). Such a system
provides basic dichromatic color vision with
only limited discrimination, particularly at longer
wavelengths.


See also: Sex Linkage 
Epidemiology


There are approximately 135,000 new cases of colon
cancer annually in the United States, and approximately
60,000 people die from the cancer or its
complications every year. Over a lifetime, average-risk
women have a 5% risk of developing colorectal cancer
and average-risk men a 6% risk. This risk is
substantially increased if an individual has a family
history of colorectal cancer. The incidence of sporadic
colon cancer rises with advancing age, especially for
men and women in their 60s and 70s.


Risk Factors 


The vast majority of sporadic colon cancers arise from
the progression from normal colonic mucosa to
adenomatous polyp (a precancerous growth) to cancer.
This process can take 5-10 years, and only
about 10% of adenomatous polyps eventually become
cancer. Factors that favor progression to cancer
within a polyp include the size of the polyp and its
histology (villous features). Other risk factors
for sporadic colon cancer include prior history of
colon cancer; inflammatory bowel disease, especially
ulcerative colitis, where the relative risk increases
with duration and extent of colitis; prior radiation
therapy; acromegaly; and, possibly, prior history of
breast cancer.


While sporadic colon cancer accounts for 80-90% 
of all colon cancers, there are certain inherited diseases 
or conditions that predispose to the accelerated devel- 
opment of colon cancer. These can be classified into 
the polyposis and nonpolyposis syndromes. The poly- 
posis syndromes, which have an autosomal dominant 
mode of inheritance, can be further subclassified into 
the following: 


1. Adenomatous polyposis syndromes, which include
familial adenomatous polyposis (FAP), a condition
of diffuse colonic polyposis (hundreds to thousands
of polyps) with the inevitable development of colon
cancer when a patient is in his or her 20s or 30s. Rare
variants include attenuated adenomatous polyposis
coli (AAPC) and Turcot syndrome (association with
brain tumors).

2. Hamartomatous polyposis syndromes, which
include juvenile polyposis, Peutz-Jeghers syndrome,
Cowden syndrome, and some other rare entities.


The nonpolyposis syndromes include hereditary
nonpolyposis colorectal cancer (HNPCC) types I and II
as well as Muir-Torre syndrome. HNPCC accounts
for about 3-5% of all colon cancers, and its hallmark
is a small number (<100) of adenomatous polyps,
predominantly in the right colon. Apart from these
syndromes, there are individuals with a family history
of colon cancer (e.g., one first-degree relative with
colon cancer at an early age or two first-degree
relatives with colon cancer) who are at increased risk
for colon cancer. Finally, it has been suggested that an
inherited susceptibility to colonic adenomatous polyps
and colorectal cancer may account for a subset of
colon cancers that is independent of the well-defined
inherited colon cancer syndromes mentioned above.


Genetic Basis of Colon Cancer 


Several lines of epidemiological evidence support the
premise that environmental factors are important
in the pathogenesis of colon cancer. These include
diet, in which high fat intake, especially from red meat,
is a contributing factor. Although long held to be
established, the role of low fiber intake as a cofactor in
colon cancer development has recently been challenged.
Other implicated factors, with varying levels of proof,
include obesity, alcohol, and estrogen use.

It is nevertheless likely that a complex interplay
of environmental factors and genetic alterations
provides the proper milieu for colon cancer initiation,
development, and progression. From a genetic
viewpoint, colon cancer serves as the paradigm for
understanding the genetic basis of cancer in general. In this
context, sporadic colon cancer can be viewed as the
orderly accumulation of changes in key oncogenes,
tumor suppressor genes, and DNA mismatch repair
genes.

At the same time, a great deal of information has
been gained through the elucidation of molecular
mechanisms underlying FAP and HNPCC. The gene
responsible for FAP is called the adenomatous
polyposis coli (APC) gene. This gene, located on
chromosome 5q21, comprises 15 exons and encodes a protein
of 310 kDa. Germline mutations in the APC gene are
responsible for the colonic polyposis in FAP as well as
extraintestinal manifestations, such as congenital
hypertrophy of the retinal pigment epithelium,
gastroduodenal polyps, and desmoid tumors. The APC
protein interacts with several intracellular proteins,
notably β-catenin, leading to the sequestration and
degradation of β-catenin. If APC is mutated, however,
β-catenin is stabilized and translocates into the
nucleus, where it interacts with transcription factors
(Tcf, Lef) to transactivate key genes (c-myc, cyclins).

The genetic basis for HNPCC is quite complex. It
involves DNA mismatch repair genes, originally
identified and characterized in bacteria and yeast. These
genes, which include hMLH1, hMSH2, hPMS1,
hPMS2, and hMSH6, maintain the fidelity of DNA
replication. When they are mutated, replication errors
are not corrected, leading to microsatellite
instability. It has been demonstrated that about
50-60% of HNPCC kindreds have germline
mutations in either hMLH1 or hMSH2, and
genotypic-phenotypic correlations are emerging for
HNPCC in the context of colon cancer and also
extracolonic cancers (e.g., endometrial). Overlap exists
between the pathogenesis of FAP and HNPCC and that of
sporadic colon cancer. For instance, about 70-90%
of sporadic adenomatous polyps harbor APC
mutations. About 15-20% of sporadic colon cancers have
evidence of microsatellite instability as a consequence
of inactivation of the mismatch repair genes. Target
genes of microsatellite instability include TGFβIIR,
BAX, and APC, among others.

Among oncogenes, activating mutations in the K-ras
oncogene have been found in 40-50% of polyps and
cancers. Sporadic colon cancers also involve
inactivation of tumor suppressor genes, notably p53 and
genes on chromosome 18q, particularly SMAD4, which is
downstream of TGFβIIR. Thus, it is important to
view colon cancer as the accumulation of multiple
genetic alterations. Further insight has been gained
through the generation and characterization of
transgenic and knockout mouse models that recapitulate
colonic polyps and cancer.

As a separate consideration, it is also clear that
overexpression of cyclooxygenase-2 (COX-2), a key
enzyme in arachidonic acid metabolism, is
important in colon cancer pathogenesis through
inhibition of apoptosis, promotion of cell proliferation and,
perhaps, facilitation of angiogenesis. In fact, cell culture
systems, animal models, and human colon cancer
specimens support the notion that COX-2 overexpression
is an early event, as is APC mutation.


Clinical Applications and Future Directions


Knowledge of the genetic basis for colon cancer has
led to genetic testing for FAP and HNPCC and to
genotypic-phenotypic and molecular pathologic
correlations for both inherited and sporadic colon
cancer, and it provides the basis for chemopreventive
and therapeutic approaches. The interface between
molecular genetics and clinical medicine is ever
expanding, and nowhere is this more apparent than in
colon cancer.

Average-risk patients are defined as those men and
women over the age of 50 without a family history of
colorectal cancer and without symptoms or signs of
the disease. Screening guidelines include annual
fecal occult blood testing (that is, testing for blood in
stool with cards) and flexible sigmoidoscopy every
5 years or, of greater utility, colonoscopy. If an
adenomatous polyp is found during flexible
sigmoidoscopy, colonoscopy is then performed.
Subsequent surveillance is done with periodic colonoscopy.


See also: Cancer Susceptibility; Oncogenes 
A commaless code is one in which amino acids are
specified by a series of codons in consecutive
sequence, as distinct from a code in which, after each
codon, one or more bases indicate a punctuation mark or
comma before the next codon. For example, in a
triplet code with commas, bases 1-3 would specify
one amino acid, base 4 would be a comma, and
bases 5-7 would specify the next amino acid. In a
commaless code, bases 1-3 would specify the first
amino acid, bases 4-6 the second, and so on.
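
The two reading schemes can be contrasted in a short Python sketch. The one-base comma and the example sequences are hypothetical illustrations of the definitions above; the actual genetic code is commaless.

```python
def read_commaless(seq, codon_len=3):
    """Read consecutive, non-overlapping codons with no separators."""
    return [seq[i:i + codon_len]
            for i in range(0, len(seq) - codon_len + 1, codon_len)]


def read_with_commas(seq, codon_len=3, comma_len=1):
    """Read a hypothetical punctuated code in which each codon is followed
    by `comma_len` comma bases that are skipped."""
    step = codon_len + comma_len
    return [seq[i:i + codon_len]
            for i in range(0, len(seq) - codon_len + 1, step)]


print(read_commaless("ATGGCATTT"))       # ['ATG', 'GCA', 'TTT']
print(read_with_commas("ATGAGCAATTTA"))  # ['ATG', 'GCA', 'TTT']
print(read_commaless("ATGAGCAATTTA"))    # ['ATG', 'AGC', 'AAT', 'TTA']
```

Reading the punctuated sequence as if it were commaless shifts every codon after the first, which is why the two kinds of code make different predictions about how a message is read.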


See also: Genetic Code 
Commensal
L Silver 
Animals such as the house mouse that have adapted to
life in close association with people or the structures
that people build are referred to as commensal.
Commensal stands in contrast to ‘feral’, which describes
animals from species that can be commensal but
instead live in natural habitats.


See also: Feral 
Cancer is thought to arise through a stepwise
accumulation of genetic aberrations that gradually change
a normal cell into a malignant cell. Identification and
detailed characterization of these genetic changes are
crucial for understanding cancer development
and progression and, more importantly, provide an
opportunity for improved diagnostic and therapeutic
approaches. Conventional cytogenetic analysis has
been very successful in identifying genetic changes
involved in hematological malignancies, but the
analysis of solid tumors has been less successful because of
technical difficulties and the complexity of the genetic
changes. Comparative genomic hybridization (CGH)
was developed in the early 1990s for comprehensive
screening of DNA sequence copy number changes in
cancer, especially in solid tumors. CGH allows
identification and mapping of gains and losses of DNA
sequences throughout the entire genome. This review
provides a brief overview of the methodological
aspects of CGH and the different applications of
CGH in cancer research.


CGH Methodology 


CGH is based on the simultaneous hybridization of 
differentially labeled test and normal reference DNAs 
to normal metaphase chromosomes. The hybridized 
DNAs are detected with two different fluorescent 
dyes. Differences in binding of the test and reference
DNAs along the target metaphase chromosomes
reflect the copy number differences between the test
and reference genomes at every chromosomal
location. A digital image analysis system is used to collect
fluorescence images from individual metaphases, to
quantitate the fluorescence intensities along
metaphase chromosomes, and to calculate the
test-to-reference fluorescence ratios. Data from several
different metaphases are combined to increase the
sensitivity and to generate relative copy number profiles. An
increased test-to-reference fluorescence ratio at a
given chromosomal location indicates gain or
amplification of DNA sequences in the test genome in that
particular chromosomal region. Similarly, a decreased
test-to-reference ratio indicates loss of DNA sequences
in the test genome.
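
The ratio calculation can be sketched as follows, under simplifying assumptions. Intensities are assumed to be already sampled at matching positions along each chromosome; each metaphase is normalized so that its average test-to-reference ratio is 1, treating the sampled positions as representative of the whole genome; and gains and losses are called with fixed thresholds of 1.25 and 0.75, a choice made here for illustration. Real image-analysis packages differ in both normalization and thresholds, and the intensities below are hypothetical.

```python
from statistics import mean


def cgh_profile(metaphases, gain=1.25, loss=0.75):
    """Combine metaphases into a relative copy number profile.

    metaphases: list of (test, reference) intensity lists, with one value per
    sampled chromosomal position, in the same order for every metaphase.
    Returns the mean normalized ratio per position and a call per position.
    """
    normalized = []
    for test, reference in metaphases:
        ratios = [t / r for t, r in zip(test, reference)]
        scale = mean(ratios)  # make the average ratio of this metaphase 1
        normalized.append([x / scale for x in ratios])
    # Average each position across metaphases.
    profile = [mean(column) for column in zip(*normalized)]
    calls = ["gain" if x >= gain else "loss" if x <= loss else "normal"
             for x in profile]
    return profile, calls


# Two hypothetical metaphases, four positions each.
metaphases = [([100, 100, 150, 50], [100, 100, 100, 100]),
              ([200, 200, 300, 100], [200, 200, 200, 200])]
profile, calls = cgh_profile(metaphases)
print(calls)  # ['normal', 'normal', 'gain', 'loss']
```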

The main advantage of CGH is that it allows 
detection of DNA sequence copy number changes 
throughout the genome in a single hybridization and 
it maps these sometimes very complex changes on- 
to normal metaphase chromosomes. CGH can be 
applied to all kinds of samples where genomic DNA 
is available and a large number of samples can be 
analyzed in a fast and efficient manner making CGH 
an ideal screening tool. Protocols for the analysis of 
paraffin-embedded tissues are in routine use and make 
it possible to study old archival samples. Very small 
specimens, such as a few cells microdissected from a 
specific part of a tumor, can also be examined after 
polymerase chain reaction (PCR)-based DNA
amplification. A limitation of CGH is that it cannot
detect genetic changes where there is no change in
DNA sequence copy number, such as balanced
translocations, inversions, or point mutations. The
resolution of CGH is dependent on the target metaphase
chromosomes and is limited to aberrations involving at
least 10-20 Mb of DNA. Recent developments have
considerably improved the resolution of CGH and will
be discussed below in the section “Comparative Genomic
Hybridization to Microarrays.”


Applications of CGH in Cancer Research 


Comparative genomic hybridization has been suc- 
cessfully used in clinical genetics for identification 
of the origin of extrachromosomal material and in the 
analysis of unbalanced chromosomal aberrations. 
However, most of the applications of CGH come 
from cancer research and in particular from the an- 
alysis of solid tumors. In the early years of the tech- 
nology, CGH was mostly used in the identification 
of common chromosomal abnormalities in various 
tumor types. Combination of CGH data from differ- 
ent studies indicates that, in general, chromosomal 
gains are more frequent than losses in solid tumors 
and that consistent patterns of nonrandom genetic
aberrations exist in various tumor types. Some of the 
most frequent genetic aberrations identified by CGH 
are common to a number of different tumor types, 
such as the gains of 1q and 8q as well as losses of 8p, 
9p, and 13q. Other chromosomal changes seem to be 
more tumor-specific, such as the 6p loss in gastric 
cancer and the 12p gain in testicular cancer, indicating 
that these chromosomal regions might contain genes 
that play an essential role in the development of these 
specific tumor types. 
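
Pooling CGH results to separate aberrations shared by many tumor types from more tumor-specific ones can be sketched as a frequency tally. The aberration labels follow the usual arm notation ('+1q' for a gain of 1q, '-13q' for a loss of 13q), but the tumor types, the data, and the 50% recurrence threshold are hypothetical.

```python
from collections import Counter


def aberration_frequencies(tumors):
    """Fraction of tumors carrying each aberration."""
    counts = Counter(a for aberrations in tumors for a in set(aberrations))
    return {a: n / len(tumors) for a, n in counts.items()}


def shared_and_specific(by_type, threshold=0.5):
    """Split recurrent aberrations into those recurrent in every tumor type
    and those recurrent in only one type."""
    recurrent = {
        t: {a for a, f in aberration_frequencies(tumors).items() if f >= threshold}
        for t, tumors in by_type.items()
    }
    shared = set.intersection(*recurrent.values())
    specific = {
        t: aberrations - set().union(*(o for u, o in recurrent.items() if u != t))
        for t, aberrations in recurrent.items()
    }
    return shared, specific


# Hypothetical CGH results for three tumors of each of two types.
by_type = {
    "type_X": [["+1q", "+5p"], ["+1q", "+5p", "-13q"], ["+1q"]],
    "type_Y": [["+1q", "-4q"], ["+1q", "-4q"], ["-9p"]],
}
shared, specific = shared_and_specific(by_type)
print(shared, specific)
```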

A large number of chromosomal gains and losses 
have been discovered by CGH but the genes involved 
in these aberrations are still largely unknown. How- 
ever, CGH has sometimes played an essential role in 
the identification of putative cancer genes, especially 
genes involved in DNA amplifications. In most cases, 
the target gene was a previously cloned gene that was 
located in the region of involvement indicated by 
CGH. For example, CGH analysis of prostate cancers 
that had recurred during androgen deprivation ther- 
apy showed frequent gain or amplification at chromo- 
somal region Xq11-q12. The androgen receptor gene 
is located in this region and was shown to be amplified 
in these tumors. Similarly, CGH analyses of alveolar 
rhabdomyosarcomas showed amplification at 1p36 
and 13q14 and subsequent studies revealed the exist- 
ence of an amplified PAX7-FKHR fusion gene (fusing 
the PAX7 locus at 1p36 and the FKHR locus at 13q14). 
More recently, traditional positional cloning efforts 
have been successfully utilized to identify target 
genes from chromosomal regions that show frequent 
amplification by CGH. For example, the AIB1,
ZNF217, and BTAK genes were identified as putative
targets for the frequent 20q amplification in breast
cancer, and it is likely that such examples will
become more common in the future. CGH can also be
useful in identification of genes involved in DNA
losses, although at present such examples are rarer.
CGH analysis of intestinal polyps from patients with
Peutz-Jeghers syndrome, a hereditary cancer
syndrome, showed consistent loss at 19p, and the gene
that causes this syndrome was subsequently identified
from this region.

Tumor development and progression, the gradual 
advancement of a local slow-growing tumor to an 
invasive, metastatic, and eventually treatment refrac- 
tory cancer, is caused by a step-wise accumulation of 
genetic changes. CGH is an excellent tool for the 
genome-wide analysis of genetic changes involved in 
tumor progression and several CGH studies have 
shown that the number of genetic changes increases 
during tumor progression, as expected. Analysis of a
large number of tumors at different stages, such as
premalignant lesions, localized tumors, invasive
cancers, and metastases, can be used to highlight
chromosomal changes that are associated with
different stages of tumor progression, such as the 3q gain in
advanced cervical cancer. Comparison of genetic
changes between groups of tumors, such as primary 
tumors from patients with and without metastases, 
can also be used to identify genetic aberrations 
involved in specific steps of tumor progression. The 
analysis of paired samples from the same patient 
allows a more direct evaluation of the clonal evolution 
of cancer. For example, analysis of primary tumors 
and their metastases in breast and renal carcinomas 
showed a clear clonal relationship between the two 
samples and in most cases a hypothetical pathway of 
genetic progression could be constructed. Recent
CGH studies have also correlated genetic aberrations
with patient outcome. For example, gain of
chromosome arms 17q and 20q has been linked to
poor prognosis in breast cancer.
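
The comparison of paired samples can be sketched as simple set operations on the aberrations detected in each sample. The interpretation built into the labels is an assumption for illustration: aberrations present in both samples are taken as evidence of a clonal relationship and as having arisen before the metastatic lineage diverged, and those found only in the metastasis as later events. The aberration lists are hypothetical.

```python
def compare_paired_samples(primary, metastasis):
    """Split the CGH aberrations of a paired primary tumor and metastasis."""
    primary, metastasis = set(primary), set(metastasis)
    return {
        "clonally_related": bool(primary & metastasis),
        "shared_earlier": primary & metastasis,
        "primary_only": primary - metastasis,
        "metastasis_only_later": metastasis - primary,
    }


# Hypothetical aberrations in a primary tumor and its metastasis.
result = compare_paired_samples(["+1q", "+8q", "-16q"],
                                ["+1q", "+8q", "-16q", "+17q", "+20q"])
print(sorted(result["metastasis_only_later"]))  # ['+17q', '+20q']
```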


Comparative Genomic Hybridization to Microarrays


The major limitation of CGH is that its resolution is 
limited by the target metaphase chromosomes. CGH 
is very sensitive in detecting small copy number 
changes affecting large chromosomal regions, such 
as gains or deletions involving several chromosomal 
bands, and high level copy number increases of small 
regions, such as those seen in amplifications. How- 
ever, CGH cannot detect genetic aberrations that 
involve less than 10-20 Mb of DNA. Recent studies 
have illustrated that this problem can be solved by 
replacing the target metaphase chromosomes with 
cloned DNA fragments as hybridization targets. The 
DNA fragments are placed in high density on a solid 
support, typically on glass slides, and in theory the 
representation of the entire genome can be included 
in such an array. At present, arrays containing large 
insert size genomic clones, such as bacterial artifi- 
cial chromosomes (BACs), or complementary DNA 
(cDNA) clones have been used. Both strategies have
been shown to be practical for detecting copy number
changes, and their resolution depends only on
the genomic spacing of the clones in the array.
cDNA clone-based arrays have the advantage that 
they can be used for parallel analysis of gene expres- 
sion changes and thus can provide a very elegant 
approach for simultaneous analysis of gene copy num- 
bers and gene expression levels. 
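
The point that array resolution depends on clone spacing can be illustrated with a minimal segmentation sketch. It assumes one log2 test-to-reference ratio per clone, sorted by genomic position, calls each clone with a fixed threshold of ±0.3 (an illustrative choice), and merges consecutive clones with the same call; published array CGH analyses use more elaborate statistical segmentation. Positions and ratios are hypothetical.

```python
def call_segments(clones, threshold=0.3):
    """Merge consecutive clones with the same copy number call.

    clones: list of (position_mb, log2_ratio) pairs sorted by position.
    Returns (call, first_clone_mb, last_clone_mb) tuples.
    """
    def call(ratio):
        if ratio >= threshold:
            return "gain"
        if ratio <= -threshold:
            return "loss"
        return "normal"

    segments = []
    for position, ratio in clones:
        c = call(ratio)
        if segments and segments[-1][0] == c:
            segments[-1][2] = position  # extend the current segment
        else:
            segments.append([c, position, position])
    return [tuple(s) for s in segments]


# Hypothetical clones spaced 1 Mb apart.
clones = [(10.0, 0.02), (11.0, -0.05), (12.0, 0.61), (13.0, 0.55), (14.0, 0.04)]
for segment in call_segments(clones):
    print(segment)
```

Here the gain is detected on the clones at 12 and 13 Mb, but its breakpoints can only be placed somewhere between 11 and 12 Mb and between 13 and 14 Mb; denser clones would narrow those intervals.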
The process of compartmentalization was discovered
in the imaginal disks (the groups of cells that form
the cuticular structures of the adult) of the fruit fly
Drosophila melanogaster and refers to the process by
which groups of cells become segregated during
development. Compartments are precisely defined
parts of the body characterized by the lineage of their
constituent cells; they are exclusively formed by the
descendants of a small group of neighboring cells, a
polyclone. Each cell of the original polyclone may
contribute to different regions of the adult
compartment in different individuals, but together
the cells of the polyclone will always construct the
same region of the fly. Thus, a compartment is a unit of
cell lineage in development. Compartmentalization is
a mechanism to subdivide an organism into parts. It
provides a cellular and anatomical basis for the old
concept of ‘state of determination’ in development:
once a cell belongs to a particular polyclone, the
developmental fate of the cell and its progeny becomes
fixed to differentiate a particular body part.

Compartments may reflect a general property of 
the organization of the body of multicellular organ- 
isms. They are units of genetic control of development 
and of growth and proliferation, and also play a
critical role in setting up the signaling mechanisms
involved in pattern formation.


Compartments Are a General Feature of the Body of Drosophila and Possibly of Other Organisms


Although they were first identified in the thoracic 
structures, it was soon found that the whole Dros- 
ophila body is formed of compartments. The first 
compartmentalization event, which segregates an 
anterior and a posterior compartment in each body 
segment, takes place in early embryogenesis and 
affects all the germ layers. Subsequently, each original
polyclone may be further subdivided into new ones,
thus generating new compartments within the
original one.

For some time it was thought that compartments
were a peculiarity of insects because they had only
been found in Drosophila. The reason was that
compartments had been demonstrated with a special
cell lineage method, the Minute technique, available
only in Drosophila. This method allowed the production of
marked clones of cells able to proliferate more rapidly
than surrounding cells. These clones reached a very
large size and could nearly fill entire adult regions,
but they would not transgress certain fixed
(compartment) boundaries. Thus, these fast-growing clones
niques made it harder to demonstrate compartment- 
like lineage segregations in other organisms. However, 
there is now evidence for lineage segregations during 
the development of vertebrate limbs, strongly suggest- 
ing that compartmentalization is a common feature of 
multicellular organisms. 


Compartments Are Units of Genetic 
Control of Development 


A principal tenet of the compartment hypothesis is 
that polyclones are the realm of action of some key 
regulatory genes that establish developmental pro- 
grams in groups of cells. For example, the genes of 
the Hox cluster specify the identity of the body segments
along the anteroposterior axis in Drosophila
and in the entire animal kingdom. The functional and
expression domains of the Hox genes are delimited in 
Drosophila by compartment boundaries, indicating 
that these genes recognize polyclones as units of their 
expression. Similarly, other genes involved with the 
specification of more discrete body regions become 
activated in specific polyclones. For instance, the 
subdivision of embryonic segments into anterior and 
posterior polyclones is achieved and maintained by 
the activity of the homeobox gene engrailed in each
posterior polyclone, while engrailed is permanently turned off
in the anterior polyclones. Thus, the posterior polyclone
is the developmental unit of engrailed function,
which confers on posterior cells their specific identity.
A similar phenomenon occurs later during the devel- 
opment of the wing disk when a compartment bound- 
ary appears separating dorsal and ventral polyclones. 
All the cells of the dorsal and none of the ventral 
polyclone acquire activity of the homeobox gene
apterous, which confers on them a specific dorsal identity.


Compartments Are Units of Growth 


Compartments appear to be units of size control in
development. This is indicated by the Minute experiments,
in which a fast-proliferating clone can fill as
much as 80-90% of the compartment and yet the compartment
is of normal size. There must be a mechanism restricting
the proliferation of the other cells within the polyclone
in order to build a compartment of normal size.
This mechanism implies the existence of specific
cellular interactions within polyclones that control
growth rates. This control mechanism appears to
involve the elimination of slowly proliferating cells, a
process called ‘cell competition’ that operates
within compartments.


Compartment Borders Are Sources of 
Morphogens and Therefore Are Borders 
of Positional Information 


Compartment borders play a key role in patterning 
processes. The function of the three major morphogens 
discovered in Drosophila, the products of the genes 
hedgehog (hh), decapentaplegic (dpp), and wingless
(wg), is associated with compartment borders. The
interface of hh-expressing and non-expressing cells
along the anteroposterior compartment border results
in the activation of dpp (a homolog of the TGF-β proteins
of vertebrates) in the anterior cells adjacent to the
border. The diffusion of the Dpp protein patterns
the corresponding structure. Similarly, the dorsoventral
compartment border in the wing is the source of
the signaling molecule Wingless, a homolog of the
Wnt proto-oncogene products of vertebrates.




As morphogens activate different target genes
depending on their concentration, and concentration is a measure
of the distance from the source (the compartment border),
it follows that compartment borders are also borders
of positional information.
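The threshold reasoning above can be illustrated with a toy calculation. This is a minimal sketch, not a model of any measured gradient: it assumes a morphogen that decays exponentially with distance from the compartment border, and the source concentration, decay length, thresholds and target gene names (`target_high`, `target_low`) are all hypothetical.

```python
import math

# Hypothetical parameters, chosen only for illustration.
C0 = 1.0             # concentration at the compartment border (arbitrary units)
DECAY_LENGTH = 20.0  # distance (µm) over which concentration falls by a factor of e

# Hypothetical target genes and their activation thresholds.
THRESHOLDS = [("target_high", 0.5), ("target_low", 0.1)]


def concentration(distance: float) -> float:
    """Morphogen level at a given distance from the border (exponential decay assumed)."""
    return C0 * math.exp(-distance / DECAY_LENGTH)


def active_targets(distance: float) -> list:
    """Target genes whose activation threshold is reached at this distance."""
    level = concentration(distance)
    return [gene for gene, threshold in THRESHOLDS if level >= threshold]


def domain_edge(threshold: float) -> float:
    """Distance from the border at which the level falls to `threshold`,
    i.e. the edge of that target gene's expression domain."""
    return DECAY_LENGTH * math.log(C0 / threshold)


if __name__ == "__main__":
    for d in (0, 10, 30, 60):
        print(f"{d:>3} µm  level={concentration(d):.3f}  active={active_targets(d)}")
    for gene, threshold in THRESHOLDS:
        print(f"{gene} domain ends about {domain_edge(threshold):.1f} µm from the border")
```

Because every domain edge in this sketch is measured from the source, shifting the compartment border shifts all the target-gene domains with it, which is the sense in which the border is also a border of positional information.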


A compatibility group is a group of plasmids containing
members unable to coexist in the same bacterial
cell. 


See also: Plasmids 
The complement system is the fluid-phase effector
arm of the adaptive immune response and its main 
biological function is to coat antigens and injure 
invading organisms identified by bound antibody or 
by foreign carbohydrate. It has three activation path- 
ways which are triggered enzyme cascades culminat- 
ing in the cleavage of C3, the central and most 
abundant component of complement (Figure 1):


1. The classical pathway initiated by bound antibody 
and involving C1, C4 and C2. 

2. A lectin pathway triggered by the binding of 
mannan-binding lectin, which in turn activates the 
proteases MASP-1 and -2 to cleave C4 and C2. 

3. The alternative pathway whose main function is 
positive feedback, involving Factors B and D, mod- 
erated by Factors H and I. 
Figure 1 The complement system (simplified).


Cleaved C3 (C3b) is the initiator of the alternative 
pathway and is also a necessary component of the 
enzymes which cleave C5, the last enzymic step, lead- 
ing to the activation of: 


4. The terminal pathway and the assembly of C6, C7,
C8 and several molecules of C9 to form a membrane-damaging
complex (membrane attack complex, MAC).


In addition, there are a number of proteins, both fluid-phase
and cell-bound, which are involved in the homeostasis
of the system or act as cellular receptors
for activated components.

In all, there are at least 35 proteins which are mem- 
bers of or have a close connection with the comple- 
ment system, some of which are encoded by more 
than one locus. Much of this complex system evolved
early in vertebrate history over a relatively short time- 
span and many of the modern proteins and their genes 
can readily be classified into families which represent 
gene duplications in the distant past. The gene dupli- 
cations which gave rise to the families were both 
polyploid and tandem in character, so that the modern 
genes are found both in clusters and on different 
chromosomes. There is a functional relationship be- 
tween members of some families (e.g., the ‘terminal 
complement components’) while in other cases the 
members of the families have analogous roles in dif- 
ferent pathways. Furthermore, there is some overlap of 
features of the proteins/genes between the families 
since many of the proteins are mosaics including
cysteine-rich ‘domains.’ Many of these domains are
found in quite different proteins, as indicated by
their names, which derive from the ‘canonical’ protein
in which each was first described.
However, two domains are special to the complement
system: the complement control protein (CCP) or
‘Sushi’ repeat and the FIMAC repeat (Factor I, C6
and C7).

Historically, deficiency and polymorphism of the
complement proteins provided an outline of the
genetics, which has now been refined or superseded
by investigations at the DNA level. Deficiency causes
susceptibility not only to infections (though in many
cases the susceptibility is more or less selective in
character than might be expected), but also to
immunological diseases such as nephritis and systemic
lupus erythematosus (SLE), mediated by autoimmunity
probably stimulated by uncleared microbes and
microbial debris.

In this article, I propose to present the facts by 
family rather than by location (Tables 1 and 2).


C1q and Lectins


C1q is historically the ‘founder’ of this family and the 
protein resembles a short version of pro-collagen, 
having a body, six ‘stalks’ and ‘heads,’ the stalks 
being composed of pairs of polypeptide chains with 
the characteristic Gly-Pro-X motif. Other members
of this family include bovine conglutinin (which has a
historic but nonfunctional relationship with the complement
system), and pulmonary surfactant proteins
A and D.

C1q is encoded by three genes: C1qA, C1qB, and 
C1qC. All are located on chromosome 1p34.1-1p36.3, 
in the order A-C-B in 24 kb. The genes are 2.5 (A), 2.6 
(B), and 3.2 (C) kb long and have only one intron, 
located within a homologous Gly codon in each gene, 
apparently at the kink-point seen in electron
micrographs. No polymorphism has been described, but
deficiencies due to various defects in any of the
three genes lead to recurrent infection and immune
complex diseases.

Mannose-binding lectin is a homotrimer encoded 
by a 3.5 kb gene comprising four exons located on
chromosome 10q. Three variants lead to synthesis of 
truncated polypeptides and are associated with sus- 
ceptibility to infections. 


Serine Proteases 


The trypsin-like serine proteases of the complement 
system fall into a number of structurally distinct 
groups. Some are secreted as functionally active 
enzymes which wait for their substrate (Factors I 
and D), others are fairly conventional pre-proteases 
which are activated as required (C1r and C1s,
MASP-1 and MASP-2), while those of the third group are
more complex pre-proteases with additional struc- 
tural elements controlling their specificity (C2 and 
Factor B). Of these three groups, the second and 
third include tandem gene duplicates. 

Factor I has little relationship with any other com- 
plement locus, except that it shares a cysteine-rich 
‘domain’ with C6 and C7 and also has two LDL 
receptor class A repeats. The protease domain is 
C-terminal. Deficiencies, which are rare and cause 
runaway activation of the Alternative Pathway, lead 
to complement depletion and increased susceptibility 
to pyogenic infection. The homeostatic function of 
Factor I and the positive feedback character of the 
Alternative Pathway were largely elucidated by inves- 
tigation of a deficient individual and by in vitro simu- 
lation of Factor I deficiency by antibody depletion. 
Factor I cleaves the biologically active C3b fragment 
in a complex process involving different cofactors 
(initially Factor H) and leading to the removal of 
most of the C3 other than the short polypeptide con- 
taining the reactive thioester by which C3b attaches to 
substrates. A charge polymorphism has so far been
observed only in the Japanese population.

Factor D is also known as adipsin, a structurally 
simple serine protease of adipose tissue. From a com- 
plementological viewpoint, its function is to cleave 
and activate Factor B when this is complexed with 
C3b. This active C3bBb complex then cleaves more 
C3 to C3b, which is, in turn, available to react with 
Factor B and Factor D. Factor D only cleaves Factor B 
complexed with C3b. The gene has not been charac- 
terized, though there is a charge polymorphism of 
the protein among Africans and family studies show 
that it is autosomal (author’s unpublished observa- 
tions). 
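The positive feedback described in the last two paragraphs can be made concrete with a deliberately crude toy model. It is a sketch under stated assumptions, not a kinetic model of the alternative pathway: each round, every surviving C3b is assumed to form a C3bBb convertase (after Factor D cleavage of bound Factor B) that generates `gain` new C3b molecules, and Factor I, with a cofactor such as Factor H, is assumed to inactivate a fixed fraction `inactivated` of the C3b each round. Both parameters are hypothetical.

```python
def amplification_loop(rounds: int, gain: float, inactivated: float, start: float = 1.0) -> list:
    """Amount of active C3b after each round of a toy alternative-pathway loop.

    gain:        new C3b produced per active C3b per round (hypothetical)
    inactivated: fraction of C3b removed per round by Factor I (hypothetical)
    """
    c3b = start
    history = [c3b]
    for _ in range(rounds):
        # Convertases make new C3b, then Factor I inactivates a fraction of the total.
        c3b = (c3b + gain * c3b) * (1.0 - inactivated)
        history.append(c3b)
    return history


if __name__ == "__main__":
    # Factor I present: net factor per round is (1 + 1) * 0.4 = 0.8, so activity dies away.
    print(amplification_loop(5, gain=1.0, inactivated=0.6))
    # Simulated Factor I deficiency: net factor per round is 2, so activation runs away.
    print(amplification_loop(5, gain=1.0, inactivated=0.0))
```

In this toy the loop grows whenever (1 + gain) × (1 - inactivated) exceeds 1; removing Factor I pushes the product above 1, which mirrors, in a very simplified way, the runaway activation and complement depletion described for Factor I deficiency.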

C1r and C1s are very similar at the cDNA level and
are inverted (3’-to-3’) tandem duplicates separated by
9.3 kbp and located on chromosome 12p13. They are
mosaics, having two CUB domains separated by
a calcium-binding EGF (epidermal growth factor)
repeat, followed by two CCP domains and the
C-terminal protease. While the gene structure of C1r
is not yet known, C1s comprises 12 exons spanning
10.5 kb. C1r has polymorphic variants, but none have
been identified for C1s.

MASP-1 and MASP-2 are also very similar at the 
protein level to C1r and C1s and function analogously,
but are on different chromosomes. 

C2 and Factor B are very close tandem gene dupli- 
cates with novel N-terminal regions including CCP 
and von Willebrand Factor Type A repeats. Interestingly,
but probably accidentally, these genes are
located close to the C4 genes in the MHC (major 
histocompatibility complex) region of chromosome 6. 
Table 1 Complement gene locations and sizes

Component             Gene name GDB:#     Chromosome        Gene size (kb)  Exons
C1q                   C1QA      119042    1p36.3-1p34.1     2.5             2
                      C1QB      119043    1p36.3-1p34.1     2.6             2
                      C1QC      128132    1p36.3-1p34.1     3.2             2
MBL                   MBL2      120167    10q22.2           3.5             4
Conglutinin (bovine)  CGN1                28q18 (bovine)    11              9
SP-A                  SFTPA1    119593    10q21-24          4.5             7
                      SFTPA2    6045454   10q21-24          4.5             7
SP-D                  SFTPD     132674    10q21-24          8               8
C1r                   C1R       119729    12p13
C1s                   C1S       119730    12p13             10.5            12
MASP-1                MASP1     361104    3q27-28           >50             >16
MASP-2                MASP2     6071500   1
C2                    C2        119731    6p21.3            18              18
Factor B              BF        119726    6p21.3            6               18
Factor D              DF        132645    ? (autosome)
Factor I              IF        120077    4q24-4q25         63              13
C3                    C3        119044    19p13.3           41              41
C4A                   C4A       119732    6p21.3            21              41
C4B                   C4B       119733    6p21.3            14.6 or 21      41
C5                    C5        119734    9q33 (?9q34.1)    79              41
C6                    C6        119045    5p12-5p14         80              18
C7                    C7        119046    5p12-5p14         80              18
C8                    C8A       119735    1p32              70              11
                      C8B       119736    1p32              40              12
                      C8G       119737    9q34              1.8             7
C9                    C9        119738    5p12-5p14         100             11
Factor H              HF1       120041    1q32              120 (mouse)     22 (mouse)
CR1                   CR1       119800    1q32              133             39
CR2                   CR2       119802    1q32              30              19
DAF                   DAF       119088    1q32              40              11
MCP                   MCP       120169    1q32
C4-bp                 C4BPA     120568    1q32
                      C4BPB     125208    1q32
CR3                   ITGAM     120599    16p11.2 (+)       55              30
                      ITGB2     120574    21q22             40              16
CR4                   ITGAX     119758    16p11.2           25              >30
                      ITGB2     120574    21q22             40              16
C1qRp                 C1QR      9957729                     Small           2
C3aR                  C3AR1     5982182   12p13             7               2
C5aR                  C5R1      128856    19q13             9               2
Clusterin             CLU       125226    8p21              17              9
C1-inhibitor          C1NH      119041    11q12.1-11q13.1   17.5            8
Properdin             PFC       120275    Xp11.3-Xp11.23    6               10
CD59                  CD59      119769    11p13             26              5


Table 2 Complement gene structures and variants

Component  Gene name  Domains*  Deficiency  Disease?  Polymorphism (protein)  Polymorphism (DNA)†
C1q C1QA + Yes
C1QB + Yes
C1QC + Yes
MBL MBL2 + Yes
Conglutinin (bovine) CGN1 ?
SP-A SFTPA1 + +
SFTPA2 + +
SP-D SFTPD +
C1r C1R CUB, EGFCa, CCP, Protease + Yes ++
C1s C1S CUB, EGFCa, CCP, Protease + Yes
MASP-1 MASP1 CUB, EGFCa, CCP, Protease
MASP-2 MASP2 CUB, EGFCa, CCP, Protease
C2 C2 CCP, VWFA, Protease ++ Susceptible + +
Factor B BF CCP, VWFA, Protease (+) ++
Factor D DF Protease (+) (Africans)
Factor I IF LDLRA, FIMAC, Protease + Yes (+)
C3 C3 + Usually ++ ++
C4A C4A +++ Susceptible +++ +++
C4B C4B +++ Susceptible +++ +++
C5 C5 + (+) (+) (Melanesians)
C6 C6 TSP1, LDLRA, EGF, CCP, FIMAC + Susceptible ++ ++
C7 C7 TSP1, LDLRA, EGF, CCP, FIMAC + Susceptible + ++
C8 C8A TSP1, LDLRA, EGF + Susceptible + +
C8B TSP1, LDLRA, EGF + Susceptible + +
C8G +
C9 C9 TSP1, LDLRA, EGF + Susceptible ? ++
Factor H HF1 CCP + Yes ++
CR1 CR1 CCP ++
CR2 CR2 CCP ++
DAF DAF CCP, STP + (Acquired) Yes +
MCP MCP
C4-bp C4BPA
C4BPB
CR3 ITGAM Yes
ITGB2 + Yes +
CR4 ITGAX
ITGB2 + Yes +
C1qRp C1QR CRD, EGF, STP
C3aR C3AR1 Serpentine TM
C5aR C5R1 Serpentine TM
Clusterin CLU + +
CD59 CD59 (+) Yes
C1-inhibitor C1NH STP, Serpin ++ Dominant ++
Properdin PFC TSP1 + Yes


*Domain abbreviations: CUB, C1r/C1s, uEGF and bone morphogenetic protein; EGF, epidermal growth factor (EGFCa, calcium-binding EGF);
CCP, complement control protein; Protease, serine protease; VWFA, von Willebrand factor type A; FIMAC, Factor I and
membrane attack complex; TSP1, thrombospondin type 1; LDLRA, low density lipoprotein receptor class A; STP, serine,
threonine, proline-rich mucin-like domain; CRD, carbohydrate recognition domain. †DNA polymorphism refers to additional
polymorphisms not reflected in phenotype.
C3 and Relatives (C4 and C5)


It would be convenient, but untrue, to refer to these 
three proteins as ‘the thioester family.’ In fact, though 
they are very similar structurally, there is no reactive 
thioester in C5. They are also related to α2-macroglobulin.
The thioester is revealed on activation by the
cleavage of the heavy chain of C3 or C4 and allows the
large fragment to form covalent bonds with -OH or
-NH2 groups of adjacent molecules. These large fragments
serve both as an ‘anvil’ on which
the cleavage or assembly of the next molecule takes
place and as biologically important molecules in their
own right, especially the fixed fragments of C3 which 
are vital to opsonization, the enhancement of phago- 
cytosis of coated particles by cells bearing the appro- 
priate complement receptors. The genes are dispersed, 
probably the products of polyploid duplication. 

C3 is composed of two disulfide-linked polypeptide
chains encoded by a single gene. C3 is the most abun- 
dant of the complement proteins and is pivotal as the 
target for cleavage by all three activation pathways 
and as a necessary component for the activation of C5 
and hence the terminal (MAC) pathway. Discovery of 
C3 polymorphism was the first genetic investigation 
of the complement system. 

C4 genes are structurally similar to C3, while the 
polypeptide is cleaved to yield a γ chain in addition to
α and β. However, C4 is encoded in two tandem gene
copies, both in man (C4A and C4B) and mouse (C4 
and SLP) and the products of these copies have differ- 
ent biological activities. Furthermore, the C4 genes lie 
within the MHC and are probably the most poly- 
morphic plasma proteins known. The polymorphisms 
are observed at several levels: 


1. Polymorphism of locus number. Although two is
the modal value in humans, one or other may be
deleted, rarely both, and occasionally a third locus is
present. The single deletions have some clinical
significance in relation to susceptibility to immune
complex diseases. Double deletion, especially in
combination with single or double deleted loci on
the other chromosome, usually causes disease.

2. Polymorphism of intron size. C4B varies by the
insertion of a 6.4 kb retroposon in intron 9.

3. An unusual degree of sequence variation in both
loci, with the number of protein variants in the
teens, which is close to the resolution limit of the
methodology used (agarose gel electrophoresis).
These variants are due to substitutions on all three
polypeptide chains.


The degree of variation may well be related both to
the presence of the genes in an unstable region and to the
propensity of duplicated genes to undergo gene conversion
events.

C5 is very similar to C3, with the exception of the 
lack of a thioester bond. 

The structure of all three genes is extraordinarily 
conserved, with the exons in homologous places and
the intron phase types identical. 


Terminal Complement Components 


The terminal complement components comprise C6-C9
and are a family of gene duplicates, though C8 is a
heterotrimer composed of the family members C8α
and C8β and the unrelated C8γ. They assemble in
order on the major fragment of C5 (C5b) and develop 
a short-lived binding site, whose chemical character is 
unknown, when C7 is bound. Following attachment 
of C8, multiple C9 molecules (8-12) complete a 
doughnut-shaped structure which inserts into mem- 
brane bilayers, leading to loss of osmotic integrity. 
The genes/proteins are complex mosaics, with the 
unusual feature that many of the exons encode parts 
of two domains (including structurally dissimilar 
domains) and many domains are encoded by more 
than one exon. Like the previous family, the architec- 
ture of the five genes is extraordinarily conserved, 
with the exons in homologous places and the intron 
phase types identical. However, the genes differ in 
numbers of exons and hence in the structural com- 
plexity of the proteins encoded, the largest being 
C6 and the smallest C9. Surprisingly, careful study 
of the gene structures leads to the conclusion that 
the ancestor was more like C6 than the much simpler
C9, and that the structural differences represent
evolution by ‘editing’ rather than by ‘accretion.’
The genes are found on chromosome 1 (C8A and
C8B) and chromosome 5 (C6, C7, and C9). Of these
groups, the C8 genes are in fairly close inverted (3’- 
to-3’) relationship and the C6 and C7 similarly 
inverted, but with a large intergenic space. The rela- 
tionship of C9 to the C6 and C7 pair is not known: 
They are close by linkage, but no molecular map is 
available. 


Regulators of Complement Activity 
(Complement Control Proteins) 


These genes lie in a tandem array on chromosome 1 
and encode proteins which are both membrane-bound 
and secreted into the plasma. They are largely or even 
exclusively made up of repeats of the CCP protein 
domain, each usually encoded by a single exon, although some
are split between exons. The CCP module is also found in several
of the serine proteases and in C6 and C7. 


CR1 is a large integral membrane protein which is a
cellular receptor for C3b and C4b, whose function is
in binding rather than activation. It is found on erythrocytes,
polymorphonuclear leukocytes, and mononuclear
cells, including follicular dendritic cells. The
binding function allows immune complexes to be
transported on red cells to the liver, where they are
removed by Kupffer cells, or to be trapped in lymphoid
follicles for antigen processing and presentation.
CR1 is also a cofactor in the catabolism of C3
by Factor I. The molecule shows polymorphism of 
size, mediated by insertion of extra repeats. 

CR2 is rather similar in many general ways to CR1, 
but binds the ‘rump’ degradation fragments of C3 and 
C4 (C3d and C4d). It is expressed on mature B
lymphocytes, some T lymphocytes, and follicular dendritic
cells. Its function is to enhance humoral immune 
responses through antigen presentation and process- 
ing. Isoforms are produced by alternative splicing 
rather than internal gene duplications. 

DAF (decay-accelerating factor) is a GPI-anchored 
membrane protein, much smaller than CR1 or CR2. It 
is present on all blood cells and its role is to protect the 
cells from damage by bystander deposition of C3 and 
C5 convertases. Clonal deficiency of DAF (together
with other GPI-anchored proteins, including CD59) due to
somatic mutation in the erythropoietic system leads
to susceptibility of that fraction of the red cell population
to lysis, a condition known as paroxysmal nocturnal
haemoglobinuria.

MCP is an intrinsic membrane protein, similar in 
structure to DAF. Ubiquitously expressed, it is a 
cofactor for the breakdown of C3 and C5 convertases. 
There are various alternative-splicing isoforms, in- 
cluding alternative transmembrane segments. 

C4-binding protein is a plasma protein which is a 
cofactor for the breakdown of C4b and hence down- 
regulation of C3 and C5 convertases. It has two poly- 
peptide chains, each of which is principally composed 
of CCP repeats. 

Factor H is a large plasma protein which is made up 
only of CCP repeats and is a cofactor for the break- 
down of C3b, hence limiting both the activation of the 
alternative feedback pathway and of C5 convertase. 
Deficiency leads to uncontrolled alternative pathway activation
and hence complement depletion and increased susceptibility
to infection.


Other Complement Components and 
Receptors 


There are several other complement components
(notably C1-inhibitor and properdin) and a range of
receptors. These do not fall neatly into groups,
with the exceptions of CR3 and CR4, which are
members of the integrin family and similar to each
other, and C1-inhibitor, which is a member of the
serpin group of protease inhibitors. For details, see
Tables 1 and 2.




See also: Proteins and Protein Structure 


Complementary DNA 
(cDNA) 
Complementary DNA (cDNA) is a DNA copy of a
messenger RNA (mRNA) molecule produced by 
reverse transcriptase, a DNA polymerase that can use 
either DNA or RNA as a template. The reverse tran- 
scriptase first copies the RNA, laying down a DNA 
strand, and then uses that DNA strand to make the 
complement, thus giving double-stranded DNA. 
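The base-pairing logic of the two copying steps can be sketched in a few lines. This is a simplified illustration that treats sequences as strings written 5'-to-3' and ignores priming, RNase H activity and the enzymes actually used for second-strand synthesis; the function names are hypothetical.

```python
# Pairing rules when an RNA template is copied into DNA (template base -> new DNA base).
RNA_TO_DNA = {"A": "T", "U": "A", "G": "C", "C": "G"}
# Pairing rules when a DNA template is copied into DNA.
DNA_TO_DNA = {"A": "T", "T": "A", "G": "C", "C": "G"}


def first_strand(mrna: str) -> str:
    """First-strand cDNA (5'-to-3') copied from an mRNA written 5'-to-3'.

    The new strand is antiparallel to its template, so the template is read
    in reverse to write the product in the conventional 5'-to-3' direction."""
    return "".join(RNA_TO_DNA[base] for base in reversed(mrna.upper()))


def second_strand(first: str) -> str:
    """Second strand (5'-to-3') copied from the first-strand cDNA."""
    return "".join(DNA_TO_DNA[base] for base in reversed(first.upper()))


if __name__ == "__main__":
    mrna = "AUGGCCUAA"
    cdna1 = first_strand(mrna)
    cdna2 = second_strand(cdna1)
    print(cdna1)  # TTAGGCCAT
    print(cdna2)  # ATGGCCTAA, the mRNA sequence with T in place of U
```

The second strand reproduces the mRNA sequence with T in place of U, which is why a double-stranded cDNA corresponds to the processed transcript rather than to the intron-containing genomic sequence.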

Complementary DNA differs from genomic DNA 
(the chromosomal copy of the same gene) in that the 
RNA transcript copied from the chromosome has 
usually been processed before the cDNA is made. In 
eukaryotes, this involves splicing out the introns and 
adding a poly(A) tail to the 3’ end. The cDNA
also carries those sequences at the 5’ end of the gene 
that are transcribed but not translated. 

cDNA has been a major tool in
molecular biology, especially for gene discovery. 
Messenger RNA is extracted and used to make 
cDNA. The cDNA is then inserted into a plasmid 
that is introduced into Escherichia coli by transform- 
ation. Each colony of bacteria grown from a single 
transformed cell carries many copies of the plasmid 
containing the same cDNA insertion. If that plasmid 
can be introduced into cells of another species, and is 
found to complement a mutant gene, the mutant gene 
has been identified. Because the plasmid carrying the 
gene is still available in the bacterial colony, the gene 
has also been isolated and can be sequenced. 


See also: Introns and Exons; Reverse 
Transcription 
Complementation
Complementation is the ability of a mutant gene to
restore normal function to a cell that has a mutation at 
a homologous site when a hybrid or heterokaryon is 
produced. This is possible when the mutations are in 
different cistrons such that, between them, a complete 
set of normal information is present. 


See also: Complementation Test 


Complementation Map 


See: Complementation Test 


Complementation Test 


J H Miller 
Mutations can often be assigned to the same or different
genes based on functional tests with diploids (or, in
the case of bacteria and bacteriophages, partial or
temporary diploids) that carry each mutation on a different
chromosome or DNA molecule. In this arrangement,
the trans configuration, the mutations do not destroy
function if they are in different genes, since each
chromosome contributes one wild-type gene that can
direct the synthesis of a functional product. However,
if the combination fails to restore function, the mutations are
assigned to the same gene, since now there is no
wild-type copy of the gene to direct the synthesis of
one of the required gene products. As a control for this
case, the mutations are tested when they are on the
same chromosome, the cis configuration, in the presence
of a chromosome with both wild-type genes.
Now the wild-type character should be restored.
Mutations that destroy function in the trans but not
the cis configuration are assigned to the same gene.
Mutations can be assigned to genes by pairwise complementation
tests, resulting in a complementation
map of the genes and mutations.
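Assigning many mutations to genes from pairwise trans tests is a grouping problem. The sketch below assumes that non-complementation is transitive (it ignores intragenic complementation, in which two alleles of one gene can partially restore function) and uses a union-find structure to merge mutations that fail to complement into the same group. The mutation names and test results are hypothetical.

```python
from itertools import combinations


def complementation_groups(mutations, fails_to_complement):
    """Group mutations into complementation groups (presumed genes).

    fails_to_complement(a, b) should return True when the trans
    heterozygote a/b shows the mutant phenotype.
    """
    parent = {m: m for m in mutations}

    def find(m):
        # Follow parent links to the group representative, halving paths as we go.
        while parent[m] != m:
            parent[m] = parent[parent[m]]
            m = parent[m]
        return m

    for a, b in combinations(mutations, 2):
        if fails_to_complement(a, b):
            parent[find(a)] = find(b)  # same gene: merge the two groups

    groups = {}
    for m in mutations:
        groups.setdefault(find(m), []).append(m)
    return sorted(sorted(g) for g in groups.values())


if __name__ == "__main__":
    # Hypothetical trans-test results: these pairs gave the mutant phenotype.
    no_complementation = {frozenset(p) for p in [("m1", "m2"), ("m2", "m4"), ("m3", "m5")]}
    mutants = ["m1", "m2", "m3", "m4", "m5"]
    print(complementation_groups(mutants, lambda a, b: frozenset((a, b)) in no_complementation))
    # [['m1', 'm2', 'm4'], ['m3', 'm5']]
```

Here m1 and m4 are grouped even though they were never observed together in a failed test, because each fails to complement m2; under the transitivity assumption all three are placed in one gene.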


See also: Cis-Trans Configurations; Cistron 


Complete Penetrance 


See: Penetrance 


Complex Locus 
The complex locus (of Drosophila melanogaster) possesses
genetic characteristics inconsistent with the
function of a gene for a single protein. Complex loci 
are generally very large (over 100 kb). 


See also: Drosophila melanogaster 


Complex Traits 
W N Frankel 
A ‘complex trait’ is a form of genetic inheritance in
which a direct and complete relationship does not
exist between a Mendelian gene and its phenotypic
consequences. This is partly because complex trait
phenotypes are more multifaceted than those of
simple traits. They represent the aggregate effects of
many cellular or molecular processes on outcomes
that are more conveniently classified as single measures,
such as blood pressure, body size, an
unusual behavior, or disease susceptibility. This multifaceted
nature may result from only a few genes which
have pleiotropic effects, or from many genes each 
independently associated with a different process. In 
addition, higher-order factors usually further compli- 
cate the genotype-phenotype relationship, ranging 
from nongenetic factors such as environment, incom- 
plete penetrance or variable expressivity; to genetic 
factors such as allelic or locus heterogeneity, gene 
instability, multigene inheritance, epistasis, imprint- 
ing or mitochondrial inheritance; to combinations of 
factors such as gene × environment interaction.
Nevertheless, whatever the contributory factors may 
be, the genetic component of a complex trait can be 
studied only when allelic variation exists. Indeed, a 
complex trait phenotype that has multiple genetic 
determinants in one population may show simpler 
inheritance in another, depending upon allele distribu- 
tion within each population. 


Analyzing Complex Traits 


In practice, the label ‘complex trait’ is applied broadly 
to cases where there is some hint of Mendelian in- 
heritance, as evidenced by familial clustering or 


concordance amongst relatives, but which cannot be 
explained by a conventional mode of dominant, reces- 
sive or additive inheritance. Thus, what makes com- 
plex traits stand out is the comparative difficulty of 
their analysis — identifying underlying genetic loci and 
determining how each contributes to phenotype. The 
ultimate analysis of complex traits would entail very 
sophisticated statistical designs, accounting for all the 
possible variables and their interactions. This may 
be feasible in some experimental populations such as 
plants and fruit flies, but tends to be problematic for 
humans and laboratory mammals, where it is more
difficult to obtain the very large populations required
to accommodate the many variables.

Even in less tractable systems, however, complex 
trait analysis can be approached using conventional 
designs by (1) reducing the number of phenotypes to 
only the most robust measures, (2) testing models of 
inheritance and gene action that are more likely to 
occur in nature than others, and (3) exploiting the 
high density of genetic markers that exist for humans 
and some model organisms to use as ‘surrogates’ for 
true trait loci. Thus, traits which have continuous 
distributions in a population (e.g., height, blood pres- 
sure) can be analyzed as parametric quantitative trait 
loci (QTL), for example, by examining whether a par- 
ticular marker genotype correlates with phenotypic 
values, and determining the fraction of the variation 
for which it accounts. Marker-phenotype associations 
for traits which are discrete in nature (e.g., diabetic vs 
nondiabetic, drug-resistant vs drug-susceptible) can
be analyzed non-parametrically, such as by χ² tests,
and a risk for phenotypic outcome can be assigned to
each allele. However, the statistical
power gained by the use of simplified designs usually
comes at a high price: relative to simple traits there is 
poor resolution for genetic mapping and for assign- 
ment of a specific phenotypic role for each locus. 
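The two kinds of marker-phenotype analysis described above can be sketched without any statistics library. This is an illustrative sketch, not a full QTL method: `variance_explained` computes the share of phenotypic variance accounted for by marker genotype classes (between-genotype sum of squares over total sum of squares), `chi_square_2x2` gives the Pearson χ² statistic for a discrete trait, and `odds_ratio` is one common way, assumed here, of expressing the risk attached to an allele. All data are invented.

```python
from collections import defaultdict


def variance_explained(genotypes, phenotypes):
    """Fraction of phenotypic variance accounted for by marker genotype class."""
    n = len(phenotypes)
    grand_mean = sum(phenotypes) / n
    total_ss = sum((y - grand_mean) ** 2 for y in phenotypes)
    by_genotype = defaultdict(list)
    for g, y in zip(genotypes, phenotypes):
        by_genotype[g].append(y)
    between_ss = sum(
        len(ys) * (sum(ys) / len(ys) - grand_mean) ** 2 for ys in by_genotype.values()
    )
    return between_ss / total_ss


def chi_square_2x2(table):
    """Pearson chi-square (no continuity correction) for a 2x2 table:
    rows = marker allele or genotype, columns = affected / unaffected."""
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    return stat


def odds_ratio(table):
    """Odds of being affected with the first row's allele relative to the second's."""
    (a, b), (c, d) = table
    return (a * d) / (b * c)


if __name__ == "__main__":
    # Invented data: a quantitative trait in six animals typed at one marker.
    genotypes = ["AA", "AA", "AB", "AB", "BB", "BB"]
    phenotypes = [10, 12, 14, 16, 18, 20]
    print(round(variance_explained(genotypes, phenotypes), 3))  # 0.914

    # Invented counts for a discrete trait: [affected, unaffected] per allele.
    counts = [[30, 10], [10, 30]]
    print(chi_square_2x2(counts), odds_ratio(counts))  # 20.0 9.0
```

A real analysis would also convert the χ² statistic to a p-value and correct for testing many markers; the sketch stops at the quantities named in the text.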


Experimental Approaches to Dissecting 
Complex Traits 


In model organisms the availability of laboratory 
inbred strains allows researchers to control the genetic 
contribution to a complex trait empirically. In turn, 
reducing the genetic complexity can result in higher 
resolution. Although there is no allelic variation 
within an inbred strain, controlled matings amongst 
strains and their progeny create segregating popula- 
tions in which trait loci can be mapped — typically 
backcross (N2 generation), intercross (F2 generation)
or recombinant inbred strains. Correlating genotype 
with phenotype in these populations simplifies the 
initial mapping of complex trait loci because the allelic 
origin is known for each locus. 
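Why the allelic origin is known in these crosses can be shown by enumerating gametes at a single locus. In the minimal sketch below, A and B stand for alleles inherited from two inbred strains, the function name is hypothetical, and simple Mendelian segregation is assumed: a backcross of the F1 to strain A yields two genotype classes in a 1:1 ratio, whereas an intercross yields three classes in a 1:2:1 ratio.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def genotype_frequencies(parent1, parent2):
    """Expected offspring genotype frequencies at one locus.

    Each parent is a two-letter genotype; 'A' marks an allele from inbred
    strain A and 'B' an allele from inbred strain B. Every gamete pairing
    is taken as equally likely (Mendelian segregation).
    """
    counts = Counter("".join(sorted(g1 + g2)) for g1, g2 in product(parent1, parent2))
    total = sum(counts.values())
    return {g: Fraction(n, total) for g, n in sorted(counts.items())}


f1 = "AB"                                   # F1 hybrid of strains A and B
backcross = genotype_frequencies(f1, "AA")  # N2: F1 x parental strain A
intercross = genotype_frequencies(f1, f1)   # F2: F1 x F1
print(backcross)   # {'AA': Fraction(1, 2), 'AB': Fraction(1, 2)}
print(intercross)  # {'AA': Fraction(1, 4), 'AB': Fraction(1, 2), 'BB': Fraction(1, 4)}
```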
Even in controlled matings, however, it can be
difficult to determine precisely a chromosomal
position or assign a phenotypic role to a locus when, for
example, 19 loci on other chromosomes may also 
influence the phenotype. In such cases, inbred strain 
crosses may be exploited further by constructing spe- 
cialized substrains between which smaller numbers of 
genes differ. This may be done after a complex trait locus
has been assigned to an approximate chromosomal 
region, for example, by constructing a ‘congenic’ 
strain pair which differ only at one of the 19 trait 
loci. In such strain pairs, a phenotype can be assigned 
more specifically to the target locus because no other 
loci are segregating. Moreover, because complex trait 
loci are generally distributed throughout the genome, 
some specialized strain constructions can be done 
in advance of knowing chromosomal location. For 
example, researchers can use recombinant congenic or 
consomic strains, where, depending on the breeding 
strategy, only a fraction (typically between 5 and
20%) of all the allelic differences between the parental
strains will distinguish two strains. These latter
strategies, which capture multiple loci simultaneously,
are an advantage when gene × gene interactions
(epistasis) underlie a complex trait, whereas congenic
strains are more likely to capture only single loci.
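The amount of donor genome expected in such strains follows from simple halving. The sketch below assumes the usual generation numbering (N1 is the F1, with 50% donor genome) and that each backcross to the recipient inbred strain halves the expected donor contribution at unlinked loci; it ignores linkage to a selected locus, whose flanking donor segments are lost more slowly. The function name is hypothetical. On these assumptions a strain inbred after two backcrosses (N3) would carry about 12.5% donor genome, within the 5 to 20% range mentioned above, while a congenic at N10 would retain about 0.1%.

```python
def expected_donor_fraction(generation):
    """Expected fraction of donor genome at unlinked loci in backcross generation N_n.

    N1 is the F1 (50% donor); each further backcross to the recipient
    inbred strain halves the expectation. Linkage to a selected locus is ignored.
    """
    if generation < 1:
        raise ValueError("generation numbering starts at N1 (the F1)")
    return 0.5 ** generation


for n in (1, 2, 3, 10):
    print(f"N{n}: {expected_donor_fraction(n):.4%}")
# N1: 50.0000%
# N2: 25.0000%
# N3: 12.5000%
# N10: 0.0977%
```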


See also: Epistasis; Inbred Strain; Multifactorial 
Inheritance; Pleiotropy; QTL (Quantitative Trait 
Locus) 
A concatemer is a length of DNA comprising a series
of tandemly repeated sequences. 


See also: Tandem Repeats 


Concatemer (Genomes) 


E Kutter 
A concatemer is a molecule made up of multiple
copies of the same genome strung together in tandem. 
For viroids and many bacteriophages, this sort of 
structure is a standard part of the replication process. 
Viroid reproduction involves the formation of linear
concatemers of single-stranded RNA, produced
through rolling-circle replication. A number of phages 
also replicate using rolling-circle replication to pro- 
duce concatemers. On the other hand, bacteriophage 
T4 undergoes extensive recombination along with its 
replication, forming a replicating DNA pool that is a 
branched concatemer containing 50 or more phage
genomes. During packaging, the T4 DNA is cut from
this pool and packaged one headful at a time, with
several enzymes responsible for trimming off residual
branches and sealing any nicks during the packaging process.
For linear DNA molecules, the formation of con- 
catemers is important in solving the problem of repli- 
cating the ends (see Bacteriophage Recombination). 
Plasmids also sometimes form concatemers, probably 
generally through recombination. 
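Packaging by the headful from a concatemer can be illustrated with a toy model. In the sketch below (a simplification, with a hypothetical function name), a genome is written as eight segments, the concatemer is simply repeated copies of it, and each head is assumed to hold slightly more than one genome length. Successive headfuls then begin at different points in the genome sequence and repeat their first segments at their ends, which is how the circularly permuted, terminally redundant DNA of phage T4 arises.

```python
def headful_packages(genome, copies, headful, start=0):
    """Cut successive headfuls from a linear tandem concatemer.

    genome: one genome-length unit, e.g. "ABCDEFGH" (each letter one segment).
    headful: length of DNA one head holds (here longer than the genome).
    Any leftover shorter than a headful is not packaged.
    """
    concatemer = genome * copies
    pieces = []
    pos = start
    while pos + headful <= len(concatemer):
        pieces.append(concatemer[pos:pos + headful])
        pos += headful
    return pieces


for piece in headful_packages("ABCDEFGH", copies=5, headful=10):
    print(piece)
# ABCDEFGHAB
# CDEFGHABCD
# EFGHABCDEF
# GHABCDEFGH
```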


See also: Bacteriophage Recombination; 
Bacteriophages; Rolling Circle Replication; 
Viroids 


Concatenated Circles 
Concatenated circles are DNA circles that are interlocked
like the rings of a chain.


See also: DNA Structure 
While many repetitive sequences duplicate and diverge
to evolve new functions, other gene families
undergo a remarkable type of genetic process by 
which family members become highly homogeneous. 
Individual family members within a species do not 
grow increasingly dissimilar over time due to the 
independent accumulation of mutations, but rather 
undergo sequence homogenization. However, during 
speciation each lineage follows an independent tra- 
jectory with the result that families are divergent 
between species (Figure 1). Concerted evolution 
(coined by Elizabeth Zimmer and colleagues in 1980) 
is the name currently used for this process, although a 
variety of terms (coevolution, horizontal evolution, 
coincidental evolution) have been used in the past. 


The hallmark of concerted evolution is that gene 
family members within a species (paralogs) are more 
similar to each other than they are to their functional 
counterparts in other species (orthologs). Concerted 
evolution is a composite of three distinct phases that 
may or may not be the same mechanistically. 


1. The genetic unit is amplified, via unequal exchange 
or a more saltatory mechanism. 

2. Once amplified, intrachromosomal homogeniza- 
tion of family members occurs. 

3. The species-specific repeat type(s) spread between 
chromosomes to become the predominant family 
member in all individuals in that species. 
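Phase 2, intrachromosomal homogenization, can be illustrated with a toy simulation. The sketch below makes strong simplifying assumptions: each event copies one randomly chosen whole repeat over another (a crude stand-in for gene conversion), no new mutations arise, and selection, spacer divergence, and copy-number change are ignored. The function name and parameters are hypothetical. Because variants can only be lost under these rules, the number of distinct repeat types in the array never increases and eventually falls to one.

```python
import random


def simulate_conversion(repeats, events, seed=0):
    """Toy homogenization of a tandem array by repeated copying events.

    Each event overwrites a randomly chosen recipient repeat with a copy of a
    randomly chosen donor repeat. Returns the final array and the number of
    distinct variants after each event.
    """
    rng = random.Random(seed)
    array = list(repeats)
    history = [len(set(array))]
    for _ in range(events):
        donor, recipient = rng.sample(range(len(array)), 2)
        array[recipient] = array[donor]
        history.append(len(set(array)))
    return array, history


start = [f"v{i}" for i in range(10)]  # ten initially distinct repeat variants
final, history = simulate_conversion(start, events=2000, seed=1)
print(history[0], history[-1])
```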


Concerted evolution is a universal genetic process that 
occurs in repeated gene families in all organisms, from 
bacteria to humans. It is an important aspect of the 
function, structure and evolution of genomes. Some 
gene families that are evolving in concert encode prod- 
ucts critical for basic cellular functions, such as his- 
tone and ubiquitin proteins and ribosomal, 5S, and 
small nuclear RNAs. Other sequences that undergo 
this genetic process are localized to centromeres and 
telomeres, and are believed to have significant roles in 
chromosome structure and function (e.g., heterochro- 
matin, chromosome pairing and segregation, and 
determination of nuclear positioning). Concerted 
evolution has importance beyond the inherent cellu- 
lar functions of individual gene families. On the mo- 
lecular evolution front, concertedly evolving genes 
remain among the sequences of choice for generation 
of phylogenetic hypotheses at multiple taxonomic 
levels. 


Concerted Evolution within Arrays of 
Tandemly Repeated Genes 


Concerted evolution is particularly evident in tan- 
demly repeated genes, highly specialized arrangements 
in which tens, thousands, or even millions of individ- 
ual units are repeated in a head-to-tail configuration. 
A repeating unit typically consists of genic regions 
and spacers; it may contain single or multiple genes 
(Figure 2A). Many types of sequences (noncoding 
ones like satellites, transcribed but not translated 
genes like ribosomal DNA, protein-coding genes 
like histones) can be tandemly arrayed. This arrange- 
ment seems to foster concerted evolution independent 
of the precise sequences of the repeating unit. 
Concerted evolution of tandem repeats is recog- 
nized by a classic signature of identical, or nearly 
identical, patterns in whole genome blot analysis 
(Figure 2B). High levels of identity have been con- 
firmed by sequencing representative repeats from 
many gene families in a variety of organisms. Sequence
similarities of 95-100% between family members are
common.
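Percent identity figures such as these come from comparing aligned repeat sequences position by position. A minimal sketch, assuming the repeats have already been aligned to equal length; gap handling conventions differ between programs, and the rule used here (columns with a gap in both sequences are skipped, a gap opposite a base counts as a mismatch) is an illustrative choice.

```python
def percent_identity(seq1, seq2):
    """Percent identity of two already-aligned, equal-length repeat sequences.

    Columns where both sequences have a gap ('-') are skipped; a gap
    opposite a base counts as a mismatch.
    """
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    compared = [(a, b) for a, b in zip(seq1, seq2) if not (a == "-" and b == "-")]
    matches = sum(a == b for a, b in compared)
    return 100.0 * matches / len(compared)


# Hypothetical 20-bp repeats differing at one position.
print(percent_identity("ACGTACGTAC" * 2, "ACGTACGTAC" + "ACGTTCGTAC"))  # 95.0
```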

A seemingly contradictory aspect of the concerted 
evolution of tandem repeats, more apparent than 
real, is the widespread occurrence of length poly- 
morphisms. Repeats of different lengths coexist in 
many eukaryote genomes (Figure 2C), even within a 
single array. Length polymorphisms map to the 
spacers of the basic repeat. Repeat length differences 
may be caused by insertions and excisions of mobile 
genetic elements, as well as by variation in copy 
numbers of embedded microsatellites, minisatellites, 
or other internally repetitive elements. Concerted 
evolution still occurs between homologous sequences, 
even between family members with dramatic length 
differences. 

Concerted evolution is a dynamic process that 
reflects a balance between the forces that introduce 
new mutations and the genetic mechanisms of ampli- 
fication, homogenization, and fixation. As a result, 
identity between family members is not absolute. 
Instead, subtle variation is superimposed on a general 
framework of high identity. The partitioning of vari- 
ation in an array, while only a snapshot in evolutionary 
time, is still a useful characteristic. Two generaliza- 
tions emerge from studies of several gene families in 
a number of organisms: (1) There is an inverse rela- 
tionship between linear distance and the strength of 
sequence homogenization. In other words, repeats 
that are neighbors in a tandem array tend to be more 
similar to each other than are repeats that are more 
distant from one another. (2) Variation within a locus is
greatest at the ends of arrays and lowest in internally
located regions.
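The first generalization can be checked on sequenced arrays by grouping pairwise identities according to how far apart two repeats lie. The sketch below is an illustrative analysis with hypothetical function names. It assumes the repeats are listed in chromosomal order and are equal-length, already-aligned sequences. The toy array is invented so that mean identity falls with separation.

```python
from collections import defaultdict


def identity(a, b):
    """Fraction of identical positions between two equal-length repeats."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def identity_by_distance(array):
    """Mean pairwise identity of repeats, grouped by separation in repeat units.

    array: repeats in chromosomal order, all of the same aligned length.
    Returns {distance: mean_identity}.
    """
    sums, counts = defaultdict(float), defaultdict(int)
    for i in range(len(array)):
        for j in range(i + 1, len(array)):
            d = j - i
            sums[d] += identity(array[i], array[j])
            counts[d] += 1
    return {d: sums[d] / counts[d] for d in sorted(sums)}


# Hypothetical array in which variant positions (T) accumulate toward one end.
toy = ["AAAA", "AAAA", "AAAT", "AATT", "ATTT"]
for d, ident in identity_by_distance(toy).items():
    print(d, ident)
```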


When patterns of variation are analyzed within
repeats, it is evident that concerted evolution often 
results in patchwork patterns in two ways. (1) Repeat 
regions that are highly homogenized (typically the 
genic sequences) are adjacent to regions that are 
much more variable (typically the spacers). (2) Se- 
quence homogenization is sometimes manifested in 
highly mosaic patterns with a shuffling of components 
derived from different repeats. Because not all of these
mosaic patterns can be explained by functional constraints
and selection, they must ultimately be explained by
the predominant homogenizing mechanism operating in that system.


Concerted Evolution between 
Non-Allelic Sites 


Concerted evolution was initially discovered in large, 
tandemly repeated gene families at one locus, and 
has been most intensively studied therein. However, 
concerted evolution is not restricted to this type of 
arrangement, but rather takes place in every type 
of genomic architecture in which gene families are 
found. Multiple tandem arrays of one family can 
occur at nonallelic chromosomal locations. For ex- 
ample, human and primate rDNAs as well as some 
fly histone families are distributed over as many as five 
different chromosomal sites yet also evolve in concert. 
Repetitive genes that are located on both of the sex 
chromosomes in flies (for example, rDNA and the
Stellate/Suppressor of Stellate family) demonstrate 
concerted evolution as well. Members of other, 
much smaller, gene families occur in tightly linked 
clusters but are not in strict tandem arrangement. 
Primate α- and γ-globin genes have this type of
genomic architecture and are proposed to undergo
concerted evolution.

Figure 1 Concerted evolution. Duplicated genes (boxes) are shown with the ancestral sequence (a) at four
positions. Concerted evolution occurs when independent changes in Species A (A) and Species B (B) spread
throughout their respective families and become fixed in the population. As a result, family members within a
species (paralogs) are more similar to each other than to counterparts in a different species (orthologs). For
simplicity of presentation, the ancestral repeated genes are depicted as identical. As indicated in the text, any set
of genes is likely to harbor variants that may be selectively represented as new lineages evolve.

Figure 2 (See Plate 4) Examples of concerted evolution in tandem repeats. (A) Histone repeats are quintets of
coding regions (boxes; one for each of the histone proteins H1, H2A, H2B, H3, H4; arrows show direction of
transcription) and associated intergenic spacers (horizontal lines between boxes). This entire unit is repeated
head-to-tail to form a tandem array. (B) Genomic blot analysis of Drosophila melanogaster histone repeats. Four
different restriction endonucleases (lanes 1–4), each with one recognition site per repeat, generate unit lengths of
4.8 and 5.0 kb (length differences map to the H1–H3 spacer) in the vast majority of the 100 copies per haploid
genome. (C) Genomic blot analysis of D. virilis histone repeats. Four different restriction endonucleases, each with
one site per repeat (lanes 1, 2, 6, 7), reveal extensive length heterogeneity in the 50 copies per haploid genome: the
doublet at 3.6/3.4 kb represents H1-less quartets, and the ladder of larger fragments represents quintets with
variably sized H4–H2A spacers. Predominant fragment patterns with restriction enzymes with multiple sites
(lanes 3, 5) or no sites (lane 8) in the repeat provide further evidence for concerted evolution.

Members of gene families can
also occur at dispersed, nonallelic locations as essen- 
tially solitary copies. Examples of such families that 
may be evolving in concert are color vision genes in 
vertebrates and heat shock genes in many organisms. 
While concerted evolution can occur in all of these 
diverse types of gene families, its tempo and mode 
may be very different. For example, a process of 
birth and death that couples amplification (phase 1) 
and lineage-specific loss/retention (phase 3) appears 
to be the major force in the evolution of major histo- 
compatibility complex, immunoglobulin, and verte- 
brate histone genes. 


Mechanisms of Concerted Evolution 


There is general agreement that the genetic mechanisms
of unequal crossing-over (also called unequal
exchange) and out-of-register gene conversion, both
homologous chromosome and sister chromatid types, 
contribute to concerted evolution, although their 
relative roles are still the subject of debate. Measure- 
ments of rates, mathematical modeling, and com- 
puter simulations all support the capacity of both 
mechanisms to foster concerted evolution. While 
repetitive cycles of either unequal exchange or out- 
of-register gene conversion can result in intra- 
chromosomal homogenization and interchromosomal 
spread, predominance by either predicts different out- 
comes. 

Unequal crossing-over is a reciprocal event that 
results from recombination between family members 
that are paired out of alignment (Figure 3A). It tends 
to generate changes in copy numbers and variable 
lengths of exchange of sequence information, depend- 
ent upon the site of the recombination event. Gene 
conversion is a nonreciprocal, mismatch-repair-type
mechanism that operates over relatively small
distances (Figure 3B). Estimates of the sizes of typical
conversion tracts are several hundred base pairs for
yeast, fruit flies, and humans. Gene conversion
generates small tracts of sequence identity between
repeats and does not change copy numbers of family
members. A further point in favor of gene conversion
is that it is commonly biased (favoring one allele in
the nonreciprocal information transfer), which would
facilitate concerted evolution considerably.

Figure 3 Mechanisms of concerted evolution. Four paralogs (boxes) with one variant repeat (shaded) are depicted
with ancestral sequences at two nucleotide positions (a) and a change (a to b) at one of the sites. For simplicity,
only the two chromatids involved in the genetic exchange are shown. The brackets on the left mark repeats taking
part in the second cycle of each process; copy numbers are shown on the right. Shaded regions approximate
the extent of sequence exchange. (A) Consequences of unequal crossing-over. The point of crossover is indicated
by an X. (B) Consequences of out-of-register gene conversion. Double-sided arrows indicate site and direction of
conversion.
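The contrasting outcomes of the two mechanisms can be made concrete by representing each chromatid as a list of repeats. In this minimal sketch, the function names are hypothetical and whole repeats are exchanged or converted, whereas real conversion tracts may cover only part of a repeat. One round of out-of-register crossing-over between sister chromatids yields one product that gains a repeat, here a duplicated variant, and one that loses a repeat. Gene conversion spreads the variant without changing copy number.

```python
def unequal_crossover(chromatid1, chromatid2, i, j):
    """Reciprocal exchange between two chromatids paired out of register.

    The crossover falls after repeat i on chromatid1 and after repeat j on
    chromatid2 (i != j means misalignment). Returns both recombinant products.
    """
    product1 = chromatid1[:i + 1] + chromatid2[j + 1:]
    product2 = chromatid2[:j + 1] + chromatid1[i + 1:]
    return product1, product2


def gene_conversion(array, donor, recipient):
    """Nonreciprocal transfer: the recipient repeat becomes a copy of the donor."""
    converted = list(array)
    converted[recipient] = array[donor]
    return converted


sister = ["a", "a", "b", "a"]  # four paralogs, one variant (b)
p1, p2 = unequal_crossover(sister, sister, 2, 1)
print(p1, p2)                          # ['a', 'a', 'b', 'b', 'a'] ['a', 'a', 'a']
print(gene_conversion(sister, 2, 3))   # ['a', 'a', 'b', 'b']
```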


Since the mechanism(s) that underlie concerted 
evolution are responsible for the extant partitioning 
of variation, knowledge of the features of a family 
must be available prior to reaching conclusions about 
whether unequal crossing-over or out-of-register gene 
conversion is the predominant genetic mechanism. 
Experimental evidence has been obtained in support of 
each. On the one hand, unequal exchange, but not gene 
conversion, can explain partitioning of polymorphic 
repeats in some gene families (e.g., Responder repeats 
and rDNA in fruit flies). Furthermore, a range of copy
numbers, consistent with unequal exchange, charac- 
terizes some families like rDNA. On the other hand, 
gene conversion, but not unequal exchange, can 
account for the partitioning of variation in other 
gene families (e.g., human U2 arrays). It is likely that 
both mechanisms are important in concerted evolu- 
tion, and that the relative roles of gene conversion and 
unequal exchange are not the same in all gene families. 
In cases in which family members are located at dif- 
ferent chromosomal sites, unequal crossing-over is 
unlikely to be a predominant mechanism due to the 
undesirable results of nonhomologous exchanges. An 
exception to this restriction occurs in gene families at 
the telomeres of chromosomes where such deleterious 
effects are minimized. 

What is responsible for the different tempo and 
mode of concerted evolution among gene families? 
Not surprisingly, numerous features (several of which 
are known to affect unequal crossing-over and/or gene 
conversion) have been implicated in the regulation of 
concerted evolution. Three global features of a locus 
are involved: 


1. There is a direct relationship between copy num- 
bers and the strength of concerted evolution: high 
copy number families are more homogeneous than 
low copy number families. 

2. Chromosome position has an effect as demon- 
strated directly in yeast and as suggested by the 
tendency of tandem repeats to localize in, or adja- 
cent to, classical heterochromatin. 

3. As spacers, introns, and flanking sequences become 
more divergent, homogenization is less effective. 


Three features found in repeats, especially in spacers, 
have been postulated to play roles in the regulation of 
concerted evolution: 


1. Sequences related to mobile genetic elements occur 
within many repetitive units (e.g., rDNA, Stellate, 
Suppressor of Stellate, Responder, Hsr-omega, his- 
tones, U2) as well as at the edges of tandem arrays. 
Given the well-known correlation between sites of 
mobile element sequences and recombination hot- 
spots, as well as the capacity for transposition to 
cause duplication and gene conversion, the recogni- 
tion that mobile elements may play roles in con- 
certed evolution is becoming more commonplace. 

2. Many repetitive units contain simple sequence 
repeats that are themselves subject to instability 
via replication slippage. That microsatellite stability
depends on mismatch repair lends interest to the
suggestion that embedded simple sequence repeats
are crucial components that promote homogenization.


3. There has also been speculation that three- 
dimensional structural features of repeats (such as 
the presence of stem-loops and scaffold attachment 
sites) may promote concerted evolution.

It is probable that many different factors contribute to the
tempo and mode of concerted evolution, and that
each family has its own unique combination of
factors.


Future Prospects 


Most of the empirical approaches to concerted evolu- 
tion focus, by necessity, on its second phase, the 
mechanisms of sequence homogenization. Even for 
this relatively well-studied aspect of concerted evolu- 
tion, there are still gaps in our knowledge, particularly 
of the rates of unequal exchange and gene conversion. 
Recent advances in the analysis of repeated genes and
the detection of polymorphisms should improve our
understanding of both. For repeats ranging from
dinucleotide repeats to ribosomal DNA, however, we have
relatively little understanding of the precise features and
genetic forces that trigger amplification and subsequent
spread throughout a species, that is, the first and third
phases.

The growing availability of appropriate reporter 
genes that can be assembled into artificial arrays and 
introduced into genomes by transgenic technology 
promises new experimental systems for the study of 
concerted evolution. Such powerful approaches have 
been largely confined to yeast and cultured mamma- 
lian cells. Given the wonderful and growing collec- 
tions of genes (and corresponding mutations) known 
to affect chromatin structure, repair, and recombin- 
ation in several model organisms, there will be almost 
endless possibilities for examination of synergistic 
effects, and dissection of mechanisms. 
‘Concordance’ is used in two different ways by
geneticists. In formal genetic studies, it describes the situation
where two expressed traits or alleles that are found 
together in one parent are also found together in 
the offspring of that parent. The level or percent of 
concordance refers to the fraction of total offspring 
characterized from an experimental cross that shows 
concordance. The remaining fraction is considered 
to be discordant. A high rate of concordance is con- 
sistent with pleiotropy, genetic linkage, or association 
between the two loci or traits under analysis. Con- 
cordance is also used in twin studies to describe pairs 
in which both individuals express the same trait. 
By comparing levels of concordance for a particular 
trait in populations of monozygotic and dizygotic 
twins, it is possible to estimate the heritability of the 
trait. The greater the heritability, the more the level
of concordance among monozygotic twin pairs is
expected to exceed that among dizygotic pairs.
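Twin concordance can be computed in more than one way. The sketch below uses pairwise concordance (pairs in which both twins are affected, divided by pairs in which at least one twin is affected); probandwise concordance is a common alternative. The function name and the twin counts are hypothetical. Turning an MZ versus DZ difference into a formal heritability estimate needs further modeling, for example of an underlying liability scale, which is not attempted here.

```python
def pairwise_concordance(pairs):
    """Pairwise concordance for a trait in a sample of twin pairs.

    pairs: iterable of (twin1_affected, twin2_affected) booleans.
    Only pairs with at least one affected twin are informative.
    """
    informative = [(a, b) for a, b in pairs if a or b]
    if not informative:
        raise ValueError("no pair contains an affected twin")
    concordant = sum(1 for a, b in informative if a and b)
    return concordant / len(informative)


# Hypothetical samples of 100 twin pairs each.
mz = [(True, True)] * 6 + [(True, False)] * 4 + [(False, False)] * 90
dz = [(True, True)] * 2 + [(False, True)] * 8 + [(False, False)] * 90
print(pairwise_concordance(mz), pairwise_concordance(dz))  # 0.6 0.2
```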


See also: Heritability; Linkage; Pleiotropy 
Lethal mutations affect genes that are required for life.
Organisms homozygous (genotype = aa) or hemizy- 
gous (genotype = a) for such mutations are not viable. 
Conditionally lethal mutations constitute a subclass 
of lethal mutations in which the functionality of the 
mutant gene depends on ‘conditions’ such as tempera- 
ture, pH, or genetic background. Under ‘permissive’ 
conditions, the mutant gene is at least partially
functional, and the affected individual is viable. Under
‘restrictive’ conditions, the mutant gene is not func- 
tional, and the affected individual dies. Nonmutant, or 
wild-type, individuals (genotype = AA or A) survive
under either set of conditions. 


Dominance 


Geneticists can make little use of a dominant mutation 
that is lethal under all environmental conditions 
because an organism carrying such a mutation cannot 
reproduce. Conditionally lethal mutations, on the 
other hand, are valuable precisely because they can be 
propagated under permissive conditions. Conditional 
lethals can be, and sometimes are, dominant. Thus, the 
Aa genotype may or may not be viable under restrict- 
ive conditions. In practice, the vast majority of con- 
ditionally lethal mutations are recessive. 

Collecting lethal mutations is a relatively simple 
matter for researchers working with diploid organ- 
isms, which can carry recessive lethal mutations in 
the heterozygous state (Aa). However, for those who 
study haploid organisms such as bacteria, bacterial 
viruses, or fungi, the only way to collect lethal muta- 
tions is to use conditional lethals, which can be propa- 
gated under the permissive conditions and then 
studied genetically and physiologically under restrict- 
ive conditions. 


Temperature-Sensitive Mutations 


Mutants whose permissive temperature is lower 
than the restrictive temperature are usually called 
temperature-sensitive, although some investigators 
prefer the term heat-sensitive. Mutants whose per- 
missive temperature is higher than the restrictive 
temperature are said to be cold-sensitive. ‘Hot’ and 
‘cold’ are not used here in the sense that they are used 
on water faucet labels. For example, temperature- 
sensitive mutants of bacteriophage T4, first isolated 
by Robert S. Edgar in the 1960s, were screened for the 
ability to grow at 23°C, roughly room temperature, 
but not at 42°C, roughly the high temperature on a 
sunny midsummer day in Phoenix, Arizona. The 
Escherichia coli host bacteria will grow and wild-type 
phage T4 will make plaques at either temperature. The 
temperature-sensitive (ts) mutants of T4, however, 
make plaques only at 23 °C. 

At the DNA level, almost all temperature-sensitive 
mutations are single-base substitutions, which lead to 
amino acid switches in the protein product of the gene. 
The substitution of one amino acid for another can 
affect the folding of the protein into its active three- 
dimensional structure, the stability of that folded 
structure, or the assembly of several folded protein
subunits into an active protein complex. Thus, the 
temperature sensitivity of a given mutant might reflect 
the effects of temperature on the folding or assembly 
steps leading to the active gene product, or it might 
be due to the instability of the end product itself.

A few ts mutations have been found where 
temperature sensitivity is attributable to the instability
of RNA rather than protein. For example, a mutation 
that weakens the base pairing between inverted re- 
peats in a transfer RNA (tRNA) can cause the tRNA 
to be nonfunctional at elevated temperature. 


Nonsense Mutations 


A second major class of conditionally lethal muta- 
tions is made up of base-substitution mutations that 
introduce premature stop codons into genes. When a 
stop codon, such as UAG, is substituted for a codon 
corresponding to an amino acid found in the wild-type
protein product of the gene, the translation
apparatus synthesizes the protein from its N-terminus
up to the stop codon and then stops. Thus, the product
of the mutant gene is a truncated, nonfunctional poly- 
peptide. Mutations that introduce premature stop 
codons into genes are called nonsense mutations. 
They were given whimsical names that derive from a 
laboratory joke: Mutants that introduce the stop 
codon UAG are called amber mutants; those that 
introduce UAA are called ochre mutants; and those 
that introduce UGA are called opal mutants. 
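Whether a base substitution creates a premature stop codon, and of which class, can be read directly from the mRNA in frame. In the minimal sketch below, the function and sequences are hypothetical and the reading frame is assumed to start at a given offset. A single C to U change at the first position of a glutamine codon (CAG) creates an amber codon (UAG), which here ends the reading frame well before the normal terminator.

```python
STOP_NAMES = {"UAG": "amber", "UAA": "ochre", "UGA": "opal"}


def first_stop(mrna, frame_start=0):
    """Return (codon_index, codon, name) of the first in-frame stop codon, or None.

    mrna: RNA sequence read 5'->3'; frame_start: offset of the first codon.
    """
    for i in range(frame_start, len(mrna) - 2, 3):
        codon = mrna[i:i + 3]
        if codon in STOP_NAMES:
            return (i - frame_start) // 3, codon, STOP_NAMES[codon]
    return None


wild_type = "AUGCAGUGGCAAUAA"  # hypothetical: Met-Gln-Trp-Gln-stop
mutant = "AUGUAGUGGCAAUAA"     # C->U in codon 1 turns CAG (Gln) into UAG
print(first_stop(wild_type))   # (4, 'UAA', 'ochre'), the normal terminator
print(first_stop(mutant))      # (1, 'UAG', 'amber'), a premature stop
```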

One might expect that the introduction of a pre- 
mature stop codon into a gene would invariably 
incapacitate the gene. How can such mutations be 
conditionally expressed? The answer is that a non- 
sense mutation can be suppressed by the presence of 
an abnormal transfer RNA (tRNA) whose anticodon 
pairs with the stop codon. For example, a tRNA with
the anticodon 3’ AUC 5’ will pair with the amber codon (5’ UAG 3’).
The tRNA carries an amino acid that will be incorpor- 
ated into the protein product of the gene at the pos- 
ition corresponding to the nonsense codon. In the 
presence of the suppressor tRNA, the translation 
of the mutant message results in a mixture of two 
products — some truncated polypeptides and some 
full-length protein molecules with an amino acid sub- 
stitution at the position of the nonsense mutation. If 
the substituted amino acid does not greatly alter the 
structure of the protein, the gene product may be suffi- 
ciently active to allow survival of the nonsense 
mutant. 
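The anticodon that reads a given codon follows from antiparallel base pairing. The sketch below uses standard Watson-Crick pairs only and ignores wobble at the third codon position. Both function names are hypothetical. It shows that the anticodon reading the amber codon differs at a single base from that of a tRNA reading the glutamine codon CAG, so one anticodon change can in principle turn such a tRNA into an amber suppressor.

```python
PAIR = {"A": "U", "U": "A", "G": "C", "C": "G"}


def anticodon_3to5(codon):
    """Anticodon written 3'->5' that pairs with a codon written 5'->3'.

    Watson-Crick pairing only; wobble pairing is ignored in this sketch.
    """
    return "".join(PAIR[base] for base in codon)


def anticodon_5to3(codon):
    """The same anticodon written in the conventional 5'->3' direction."""
    return anticodon_3to5(codon)[::-1]


print(anticodon_3to5("UAG"))  # AUC, i.e. 3' AUC 5'
print(anticodon_5to3("UAG"))  # CUA, i.e. 5' CUA 3'
print(anticodon_3to5("CAG"))  # GUC, the anticodon for the glutamine codon CAG
```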

‘Stop’-binding tRNAs are not normal components 
of the translation apparatus. A cell harboring one of 
these supernumerary suppressor-tRNAs is called a 
suppressor-plus (su⁺) cell. Viruses carrying nonsense
mutations in essential genes will be able to replicate on
su⁺ cells, but not on su⁻ cells. The former, therefore,
are permissive hosts for nonsense mutants, and the
latter are restrictive hosts.


Uses of Conditional Lethal Mutations 


A conditionally lethal mutation can affect almost any 
gene in an organism. In a collection of temperature- 
sensitive mutants of a given organism, each mutation 
serves as an identifying marker for one of the many 
genes essential for viability. If the collection of 
mutants is much larger than the number of genes in 
the organism, it is reasonable to assume that the col- 
lection contains representatives of every essential gene 
in the genome. (For example, with four times as many 
mutants as genes, the collection would be expected to 
contain mutants in more than 98% of all the genes.) 
The effort required to gather such a collection depends 
on one’s experimental system. It is practical to under- 
take the identification of every essential gene in a virus 
simply by collecting a large number of conditionally 
lethal mutations. Most viruses have genomes ranging from
fewer than 10 genes to a few hundred. A
similarly exhaustive study of a eukaryote with tens of 
thousands of genes would not be practical, but the 
isolation of conditionally lethal mutations still yields 
valuable information about essential genetic functions. 
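The figure of more than 98% quoted above follows from a Poisson argument. If every gene is equally likely to be hit, the number of mutants per gene has mean m/n for m mutants and n genes, so the expected fraction of genes with at least one mutant is 1 − e^(−m/n). With four times as many mutants as genes this gives 1 − e^(−4) ≈ 0.982. The equal-mutability assumption is an idealization: genes differ in size and mutability, and mutational hotspots lower the real coverage. The function name below is hypothetical.

```python
import math


def expected_gene_coverage(mutants, genes):
    """Expected fraction of genes hit at least once by `mutants` random mutations.

    Assumes each gene is equally likely to be hit, so hits per gene are
    Poisson with mean mutants/genes and P(at least one hit) = 1 - exp(-mean).
    """
    return 1.0 - math.exp(-mutants / genes)


for ratio in (1, 2, 4):
    print(ratio, round(expected_gene_coverage(ratio * 100, 100), 4))
# 1 0.6321
# 2 0.8647
# 4 0.9817
```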

By growing conditionally mutant organisms under 
restrictive conditions, one can often deduce the bio- 
logical function of the mutant gene. For example, 
temperature-sensitive mutants of phage T4 that are 
grown at 42°C show different abnormalities depend- 
ing on which gene is mutated (see Figure 1). Some fail
to replicate their DNA. Some make complete phage 
heads, but fail to make assembled tails. Some make all 
components of the tail, but fail to make heads. Bio- 
chemical fractionation of the contents of cells abort- 
ively infected by these phage mutants reveals the 
exact nature of the defect — the specific enzyme that 
accounts for the failure to synthesize DNA or the 
specific essential protein molecule whose instability 
accounts for the failure to assemble normal com- 
ponents of the phage coat. Careful observation of 
the effects of conditional mutations often reveals 
the order of gene-controlled functions in biosynthetic 
pathways and the nature of interactions between the 
products of genes. 

With temperature-sensitive mutants, it is also pos- 
sible to do temperature-shift experiments in which the 
development of a mutant virus or embryo proceeds 
for some time at the permissive temperature and some 
time at the restrictive temperature. The temperature 
shift can be either an upshift or a downshift, and the 
timing of the shift can be varied at will. Experiments
of this sort can be done to determine whether the
biological function affected by a particular conditionally
lethal mutation is required early in the growth
cycle, late in the growth cycle, or continuously
throughout the cycle. By varying the time of the shift,
one can determine exactly when the temperature-sensitive
process ends or begins.

Figure 1 Genetic control of the biosynthesis of bacteriophage T4, revealed through the use of conditionally lethal
mutations. The figure shows steps in the synthesis of head, tail, and tail fibers. The ‘early’ steps in phage
development (DNA synthesis and related processes) are not shown in this figure. The numbers appearing above the
arrows refer to genes of the phage. (The only gene in the figure that is not identified by a number is wac, the gene
that encodes ‘whisker’ proteins.) Under restrictive conditions, a temperature-sensitive or amber mutation in any gene
whose number appears over a particular arrow will block phage morphogenesis at the step symbolized by that arrow.
Thus, for example, a mutation in gene 23 blocks the formation of empty phage heads, and a mutation in gene 49
prevents the ‘stuffing’ of the heads with phage DNA. (Adapted from Wood WB (1979) Bacteriophage T4 assembly
and the morphogenesis of subcellular structure. The Harvey Lectures 73: 203-223.)
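The logic of temperature-shift experiments can be sketched as follows, under simplifying assumptions: the mutant function is needed only during one temperature-sensitive period (TSP), damage during that period is all-or-none and irreversible, and shifts are instantaneous. An upshift (permissive, then restrictive) allows development only if the TSP is already over at the time of the shift, so the earliest successful upshift marks the end of the TSP. A downshift (restrictive, then permissive) allows development only if the shift comes before the TSP begins, so the latest successful downshift marks its start. All names and times below are hypothetical, and the estimate is only as precise as the sampling interval.

```python
def shift_up_ok(shift_time, tsp):
    """Permissive until shift_time, restrictive afterwards: succeeds if the TSP has ended."""
    _, end = tsp
    return shift_time >= end


def shift_down_ok(shift_time, tsp):
    """Restrictive until shift_time, permissive afterwards: succeeds if the TSP has not begun."""
    start, _ = tsp
    return shift_time <= start


def infer_tsp(times, up_results, down_results):
    """Estimate (start, end) of the TSP from shift experiments at sorted `times`."""
    end = min(t for t, ok in zip(times, up_results) if ok)
    start = max(t for t, ok in zip(times, down_results) if ok)
    return start, end


hidden_tsp = (12, 20)            # hypothetical TSP, minutes after infection
times = list(range(0, 41, 2))    # shift every 2 minutes
up = [shift_up_ok(t, hidden_tsp) for t in times]
down = [shift_down_ok(t, hidden_tsp) for t in times]
print(infer_tsp(times, up, down))  # (12, 20)
```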


Special Cases 


There are many other examples of mutants that sur- 
vive under one set of conditions but not under 
another. For example, auxotrophic mutants of bacteria 
or other microorganisms differ from the prototrophic 
wild-type in their nutritional needs. They cannot 
grow unless their environment contains some specific 
nutrient — an amino acid, a vitamin, a nucleic acid 
precursor — that is not required by the prototroph. 
Some streptomycin-resistant mutants of E. coli are 
actually dependent on the presence of streptomycin 
in the growth medium. Human cells deficient in 
the enzyme hypoxanthine guanine phosphoribosyl 
transferase (HGPRT) are unable to grow on a medium 
containing hypoxanthine, aminopterin, and thymi- 
dine (HAT medium), whereas wild-type cells grow 
perfectly well. In all these cases, the ‘conditions’ that 
determine the viability of the mutant are highly specif- 
ic to the gene function affected by the mutation. 
Nevertheless, the term ‘conditional lethal’ is generally 
reserved for nonsense, temperature-sensitive, and similar
mutations, in which the permissive condition
is permissive because it allows the affected gene to
function, not because the environment has been
manipulated to compensate for abnormal gene function.


See also: Auxotroph; Start, Stop Codons; 
Temperature-Sensitive Mutant 
The term ‘congenic strain’ describes a variant strain of
mice or another animal or plant that is formed by
backcrossing to an inbred parental strain for 10 or more
generations while maintaining heterozygosity at a selected
locus. With the many new tools of molecular genetics,
it has become easier and easier to clone genes defined
by mutant phenotypes. Often, mutant phenotypes
involve alterations in the process of development or
physiology. In these cases, simply having a cloned copy
of a gene is often not enough to examine critically the full
range of effects exerted by that gene on the developmental
or physiological process. In particular, normal development
and physiology can vary significantly from one
strain of mice to the next, and in the analysis of mutants,
it is often not possible to distinguish subtle effects due
to the mutation itself from effects due to other genes
within the background of the mutant strain. To make
this distinction, it is essential to be able to compare 
animals in which differences in the genetic background 
have been eliminated as a variable in the experiment. 
This is accomplished through the placement of the mu- 
tation into a genome derived from one of the standard 
inbred strains. It is then possible to perform a direct 
comparison between mutant and wild-type strains that 
differ only at the mutant locus. Phenotypic differences 
that persist between these strains must be a consequence
of the mutant allele. 

The backcross system of congenic strain creation is 
straightforward in both concept and calculation (see 
Figure 1). The first cross is always an outcross
between the recipient inbred partner and an animal 
that carries the donor allele. The recipient inbred 
partner refers to the inbred strain that will end up pro- 
viding the genetic background for the newly formed 
congenic line. The donor allele is the one that holds the 
interest for the investigator. The donor animals need 
not be inbred or homozygous at the locus of interest, 
but the other partner must be both. 

The second generation cross and all those that fol- 
low to complete the protocol are backcrosses to the 
recipient inbred strain (see Backcross). At each gen- 
eration, only those offspring who have received the 
donor allele at the differential locus are selected for the 
next round of backcrossing. 

The genetic consequences of this breeding protocol 
are easy to calculate. First, one can start with the 
conservative assumption that the donor (D) and re- 
cipient (R) strains are completely distinct with different 
alleles at every locus in the genome. Then, all F1
animals will be 100% heterozygous D/R at every
locus. According to Mendel’s laws, equal segregation
and independent assortment will act to produce
gametes from these F1 animals that carry R alleles at
a random 50% of their loci and D alleles at the remaining
50%. When these gametes combine with gametes
produced by the recipient inbred partner (which, by
definition, will have only R alleles at all loci), they will
produce N2 progeny having genomes in which
approximately 50% of all loci will be homozygous R/R
and the remaining loci will be heterozygous D/R. Thus,
in a single generation, the level of heterozygosity is 
reduced by about 50%. Furthermore, it is easy to see 
that at every subsequent generation, random segrega- 
tion from the remaining heterozygous alleles will cause 
a further ~50% overall reduction in heterozygosity. 

In mathematical terms, the fraction of loci that are
still heterozygous at the Nth generation can be
calculated as $(1/2)^{N-1}$, with the remaining fraction,
$1 - (1/2)^{N-1}$, homozygous for the inbred strain
allele. These functions are represented graphically
in Figure 1. At the fifth generation, after only four
backcrosses, the developing congenic line will be
identical to the inbred partner across ~94% of the
genome. By the 10th generation, identity will increase
to ~99.8%. It is at this stage that the new strain is
considered to be a certified congenic.

Figure 1 A highly schematic representation of the
relative contributions of donor and recipient alleles at
sequential generations of backcrossing. The donor
contribution is indicated in white and the recipient
contribution in black, with the checkerboard pattern
indicating heterozygous loci. By the 10th generation
of backcrossing, the differential segment around the
selected locus will represent the major contribution
from the donor genome.
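The percentages above follow directly from the formula. A minimal sketch, assuming (as the text does) that donor and recipient differ at every locus and considering only loci unlinked to the selected one; the function name `fraction_heterozygous` is hypothetical, and the F1 generation is counted as N = 1.

```python
def fraction_heterozygous(n):
    """Expected fraction of unlinked loci still heterozygous (D/R) at generation N.

    F1 is N = 1 (100% heterozygous); each backcross to the recipient
    halves the remaining heterozygosity.
    """
    return 0.5 ** (n - 1)


for n in (1, 2, 5, 10):
    het = fraction_heterozygous(n)
    identity = 100 * (1 - het)  # percent of loci homozygous R/R
    print(f"N{n}: {het:.4f} heterozygous, {identity:.2f}% identical to recipient")
```

For N = 5 this gives 93.75% identity and for N = 10 it gives 99.80%, the ~94% and ~99.8% quoted above.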

Backcrossing can continue indefinitely after the 
10th generation, but if the donor allele does not ex- 
press a dominant effect that is visible in heterozygous 
animals, it will be easier to maintain it in a homozygous
state. To achieve this state, two 10th-generation
or higher carriers of the selected donor allele are inter- 
crossed and homozygous donor offspring are selected 
to continue the line through brother-sister matings in 
all following generations. The new congenic strain is 
now effectively inbred, and in conjunction with the 
original inbred partner, the two strains are considered 
a ‘congenic pair.’

In some cases, it will be possible to distinguish 
animals heterozygous for the donor allele from sib- 
lings that do not carry it. In a subset of these cases, as 
well as others, a donor allele may have recessive dele- 
terious effects on viability or fertility. In all such 
instances, it is advisable to maintain the congenic 
strain by a continuous process of backcrossing and 
selection for the donor allele at every generation. 
Congenic strains that are maintained in this manner 
are considered to be in a state of ‘forced heterozy- 
gosity.’ There are two major advantages to pursuing 
this strategy whenever possible. First, the level of 
background heterozygosity will continue to be re- 
duced by ~50% through each round of breeding. 
Second, the use of littermates with and without the 
donor allele as representatives of the two parts of the 
congenic pair will serve to reduce the effects of extra- 
neous variables on the analysis of the specific pheno- 
typic consequences of the donor allele. 

The rapid elimination of heterozygosity occurs 
only in regions of the genome that are not linked to
the donor allele, which, of course, is maintained by
selection in a state of heterozygosity throughout the 
breeding protocol. Unfortunately, linkage will also 
cause the retention of a significant length of chromo- 
some flanking the differential locus which is called the 
‘differential chromosomal segment.’ Even for congenic 
lines at the same backcross generation, the length of 
this segment can vary greatly because of the inherently 
random distribution of crossover sites. Nevertheless, 
the expected average length of the differential
chromosomal segment in centimorgans can also be calculated
as $200(1 - 2^{-N})/N$, where N is the generation
number. For all values of N greater than 5, this
equation can be simplified to $200/N$. The average size of
the differential segment decreases very slowly. At the 
10th generation, there will still be, on average, a 20 cM 
region of chromosome encompassing the differential 
locus derived from the donor strain. 
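The slow shrinkage of the differential segment can be checked numerically. This sketch simply evaluates the exact expression and its large-N approximation; the function names are hypothetical.

```python
def differential_segment_cm(n):
    """Expected length (cM) of donor chromosome around the selected locus at generation N."""
    return 200 * (1 - 2.0 ** -n) / n


def differential_segment_approx_cm(n):
    """Simplified form for N > 5, where 2**-N is negligible."""
    return 200 / n


for n in (5, 10, 20):
    exact = differential_segment_cm(n)
    approx = differential_segment_approx_cm(n)
    print(f"N{n}: exact {exact:.2f} cM, approximation {approx:.2f} cM")
```

At N = 10 the exact value is about 19.98 cM against 20 cM from the approximation, and doubling the number of backcross generations to N = 20 only halves the expected segment to about 10 cM.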

It is possible to reduce the length of the differential 
chromosomal segment more rapidly by screening 
backcross offspring for the occurrence of crossovers 
between the differential locus of interest and nearby 
DNA markers. As an example of this strategy, one 
could recover 50 congenic offspring from the 10th 
backcross generation and test each for the presence
of donor alleles at DNA markers known to map at
distances of 1 to 5 cM on both sides of the locus of
interest. It is very likely that at least one member of
this backcross generation will show recombination
between the differential locus and a nearby marker.
The animal with the closest recombination event can
be backcrossed again to the recipient strain to produce
congenic mice of the 11th backcross generation. By
screening a sufficient number of these N11 animals, it
should be possible to identify one or more that show
recombination on the opposite side of the differential 
locus. In this manner, an investigator should be able to 
obtain a founder for a congenic strain with a defined 
differential chromosomal segment of 5 cM or less after 
just 11 generations of breeding. 
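A rough calculation supports the claim that at least one recombinant is very likely among 50 offspring. The sketch assumes, as simplifications not stated in the text, that the per-meiosis recombination frequency between the selected locus and a marker d cM away is about d/100 (reasonable only for small d), that the marker still carries the donor allele in the parent, and that meioses are independent; the function name is hypothetical.

```python
def p_at_least_one_recombinant(n_offspring, distance_cm):
    """Chance that at least one of n offspring carries a crossover
    between the selected locus and a marker distance_cm away.

    Assumes recombination frequency r ~= d/100 per meiosis (small d only)
    and independent meioses.
    """
    r = distance_cm / 100.0
    return 1 - (1 - r) ** n_offspring


# 50 N10 offspring, screened with one marker 5 cM away on one side,
# or with markers 5 cM away on each side (10 cM of interval in total).
one_side = p_at_least_one_recombinant(50, 5)
both_sides = p_at_least_one_recombinant(50, 10)
print(f"one side: {one_side:.3f}, both sides: {both_sides:.3f}")
```

Under these assumptions, a single marker 5 cM away gives roughly a 92% chance of at least one recombinant, and markers 5 cM away on each side raise it above 99%.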


See also: Backcross; Inbred Strain 
‘Congenital adrenal hyperplasia’ (CAH) is a term used
to encompass a series of genetic disorders, each one due
to a mutation affecting one of the enzymes necessary for the
biosynthesis of cortisol from cholesterol. The cortisol 
deficiency results in decreased negative feedback on 
the hypothalamic-pituitary axis. This, in turn, results 
in increased corticotropin-releasing hormone (CRH) 
and adrenocorticotropic hormone (ACTH) secretion 
which is responsible for the adrenal hyperplasia, a 
characteristic common to the various forms of CAH. 

The elevated blood concentrations of ACTH pro- 
duce hypersecretion of steroids formed prior to the 
enzymatic deficiency. Hence, the phenotype of CAH 
reflects not only the cortisol deficiency, but also the 
hypersecretion of cortisol precursors. 


Enzymes Involved in Cortisol 
Biosynthesis 


The adrenal conversion of cholesterol to cortisol and
aldosterone requires six enzymes. Five of these are
cytochromes P-450, members of the P-450 superfamily
of mixed-function oxidases. The sixth enzyme is
3β-hydroxysteroid dehydrogenase. The enzymes, along
with the chromosomal locations of their genes, are listed
in Table 1. Some of these enzymes require the
electron-transfer intermediaries adrenodoxin and
adrenodoxin reductase.


Table 1 Human adrenal steroidogenic enzymes and cofactors

Name | Location/action | Chromosomal location | Result of markedly altered activity
CYP11A (P-450scc) | (Mitochondrial) 20-hydroxylase, 22-hydroxylase, 20,22-desmolase (cholesterol side-chain cleavage) | 15q23–q24 | Congenital lipoid adrenal hyperplasia (female phenotype in all) resulting from StAR^a defect
3β-HSD | (Microsomal) 3β-hydroxysteroid dehydrogenase, Δ4–Δ5 isomerase | 1p13.1 | Salt-losing congenital adrenal hyperplasia (male or female pseudohermaphroditism)
CYP17 (P-450c17) | (Microsomal) 17α-hydroxylase, 17,20-lyase | 10q24–q25 | Hypertensive congenital adrenal hyperplasia (male pseudohermaphroditism)
CYP21, CYP21P (P-450c21) | (Microsomal) 21-hydroxylase | 6p21: active gene and pseudogene | Congenital virilizing adrenal hyperplasia (female pseudohermaphroditism)
CYP11B1 (P-450c11) | (Zona fasciculata/reticularis, mitochondrial) 11β-hydroxylase | 8q22: two homologous CYP11B genes | Hypertensive virilizing adrenal hyperplasia (female pseudohermaphroditism)
CYP11B2 (aldosterone synthase) | (Zona glomerulosa, mitochondrial) 11β-hydroxylase, 18-hydroxylase (CMO I^b), 18-dehydrogenase (CMO II^b) | 8q22 | Aldosterone deficiency (renal salt loss); dexamethasone-remediable aldosteronism (CYP11B1/B2 fusion gene)
Adrenodoxin | Iron–sulfur protein intermediate | 11q22: active gene; 20: pseudogenes | Unknown
Adrenodoxin reductase | (Mitochondrial) flavoprotein intermediate for P-450scc and P-450c11 | 17q24–q25 | Unknown

^a StAR, steroidogenic acute regulatory protein.
^b CMO, corticosterone methyl oxidase.
Forms of CAH


Virilizing Adrenal Hyperplasia 

This form of CAH is due to a mutation of CYP21 
resulting in 21-hydroxylase deficiency. This is by far 
the most frequent type of CAH (over 90% of cases). 
The hypersecretion of adrenal androgens results in 
female pseudohermaphroditism and virilism in both 
sexes. Depending on the gene mutation, one recog- 
nizes three phenotypic variants of this form of CAH: 


1. Salt-losing form: a complete or near complete 21- 
hydroxylase deficiency resulting in total absence of 
secretion of cortisol and the salt-retaining steroid, 
aldosterone. 

2. Simple-virilizing form: a partial 21-hydroxylase 
deficiency that permits the secretion of near normal 
amounts of cortisol and aldosterone. 

3. Attenuated form: a minimal enzyme deficiency in 
which the main abnormality is increased adrenal 
androgen production triggered by adrenarche at 
the time of puberty. 


Hypertensive Form with Virilism 

This form of CAH is due to a mutation of the
CYP11B1 gene resulting in 11β-hydroxylase
deficiency. This form of CAH represents about 5% of all
cases. The enzymatic block produces hypersecretion
of corticosterone, 11-deoxycorticosterone, and adrenal
androgens: the first compensates for the cortisol
deficiency, the second produces salt retention, and the
third produces female pseudohermaphroditism and
virilism.


Hypertensive Form without Virilism 

A mutation of the CYP17 gene results in a deficiency
of 17α-hydroxylase and 17,20-lyase. This results in an
accumulation of 11-deoxycorticosterone which is re- 
sponsible for hypertension, and of corticosterone that 
compensates for cortisol deficiency. In addition, the 
absence of 17,20-lyase results in an inability to synthe- 
size C-19 (androgens) and C-18 (estrogens) steroids. 
Because this enzyme is also deficient in the gonads, the 
result is male pseudohermaphroditism and the absence 
of estrogen secretion by the ovaries at puberty. 


Deficiency of 3β-Hydroxysteroid
Dehydrogenase

This early block in steroid biosynthesis results in low
secretion of cortisol, aldosterone, and androgens.
Steroids accumulating before the block have a Δ5
configuration (pregnenolone, 17-hydroxypregnenolone,
and dehydroisoandrosterone) and have limited
biological activity. Hence, patients are salt-losers and lack
glucocorticoid activity. Because dehydroisoandrosterone
is a poor androgen, male pseudohermaphroditism
and very mild female pseudohermaphroditism result.


Congenital Lipoid Adrenal Hyperplasia

This form is due to a mutation of the steroidogenic acute
regulatory protein (StAR), which is necessary for the
delivery of cholesterol to the inner mitochondrial membrane,
where the cholesterol side-chain cleavage enzyme acts. In such
cases, there is a total absence of all steroids, both
adrenal and gonadal.


Aldosterone Deficiency 

The biosynthesis of aldosterone requires the product
of the CYP11B2 gene (aldosterone synthase), which carries
the corticosterone methyl oxidase I and II activities.
Aldosterone deficiency results in increased renin–angiotensin
activity but not in increased ACTH secretion. Hence, this
disorder does not result in adrenal hyperplasia.
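The pattern of deficient and accumulating steroids in each form can be reasoned out by treating the pathway as a small graph and asking which products remain reachable when one enzyme is missing. The step list below is a simplified textbook-style schematic supplied for illustration (it is not taken from Table 1), and the names `STEPS`, `END_PRODUCTS`, and `reachable` are hypothetical. The StAR defect of lipoid hyperplasia is approximated by blocking CYP11A, since both prevent pregnenolone formation.

```python
from collections import deque

# Simplified adrenal steroidogenic pathway: (substrate, product, enzyme).
# Illustrative schematic only; regulation and minor routes are omitted.
STEPS = [
    ("cholesterol", "pregnenolone", "CYP11A"),
    ("pregnenolone", "progesterone", "3B-HSD"),
    ("pregnenolone", "17-OH-pregnenolone", "CYP17"),
    ("progesterone", "17-OH-progesterone", "CYP17"),
    ("17-OH-pregnenolone", "17-OH-progesterone", "3B-HSD"),
    ("17-OH-pregnenolone", "DHEA", "CYP17"),
    ("17-OH-progesterone", "androstenedione", "CYP17"),
    ("DHEA", "androstenedione", "3B-HSD"),
    ("progesterone", "11-deoxycorticosterone", "CYP21"),
    ("17-OH-progesterone", "11-deoxycortisol", "CYP21"),
    ("11-deoxycortisol", "cortisol", "CYP11B1"),
    ("11-deoxycorticosterone", "corticosterone", "CYP11B1"),
    ("11-deoxycorticosterone", "corticosterone", "CYP11B2"),
    ("corticosterone", "aldosterone", "CYP11B2"),
]

END_PRODUCTS = ("cortisol", "aldosterone", "DHEA", "androstenedione")


def reachable(blocked):
    """Return every steroid that can still be formed from cholesterol
    when the enzymes named in `blocked` are absent (breadth-first search)."""
    made = {"cholesterol"}
    queue = deque(["cholesterol"])
    while queue:
        current = queue.popleft()
        for substrate, product, enzyme in STEPS:
            if substrate == current and enzyme not in blocked and product not in made:
                made.add(product)
                queue.append(product)
    return made


for enzyme in ("CYP21", "CYP11B1", "CYP17", "3B-HSD", "CYP11A"):
    formed = reachable({enzyme})
    print(enzyme, [p for p in END_PRODUCTS if p in formed])
```

In this sketch, blocking CYP21 leaves only the androgen branch open; blocking CYP17 closes the cortisol and androgen branches while the 11-deoxycorticosterone and corticosterone branch stays open; and blocking 3β-HSD leaves only Δ5 steroids such as DHEA, in line with the forms described above. A reachable steroid is not necessarily secreted in normal amounts: the model ignores ACTH and renin–angiotensin regulation and the zonal distribution of CYP11B1 and CYP11B2.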


Other Genetic Considerations 


All forms of CAH are inherited as autosomal recessive
traits. Because of its frequency, 21-hydroxylase defi- 
ciency has been studied most extensively. The fre- 
quency of heterozygotes for this form of CAH is 
quite high, about 1 in 50 individuals, and homozygote 
frequency is about 1 in 10000 births. 
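The two frequencies quoted above are consistent with each other under Hardy–Weinberg equilibrium. A minimal check, assuming random mating and treating all mutant CYP21 alleles as a single class:

```python
import math

homozygote_freq = 1 / 10_000      # affected births, from the text
q = math.sqrt(homozygote_freq)    # frequency of mutant alleles (q^2 = homozygotes)
p = 1 - q
carrier_freq = 2 * p * q          # expected heterozygote frequency

print(f"q = {q:.3f}, carriers = {carrier_freq:.4f} (about 1 in {1 / carrier_freq:.1f})")
```

The expected carrier frequency comes out at about 0.0198, or roughly 1 in 50.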

Because the 21-hydroxylase gene (CYP21) is
located near the genes of the major histocompatibility
complex (MHC), 21-hydroxylase mutations are linked
to HLA type. There are two 21-hydroxylase genes,
one of them being a pseudogene (CYP21P). These
are in tandem with the two genes encoding the fourth
component of complement, C4A and C4B. The
arrangement of these genes, along with the high
homology of CYP21/CYP21P and C4A/C4B, explains
the rather frequent rearrangements in this DNA region.
This may also explain the relatively frequent conversion
of CYP21 sequences to CYP21P sequences.
The term ‘congenital’ signifies a condition that is present
at birth. Structural birth defects are the most
frequent congenital disorders. Other disorders pres- 
enting at birth include genetic, metabolic, and neuro- 
logical disorders, and disorders due to the effects of 
environmental factors such as infections and other 
teratogens. Between 2% and 3% of newborns have a 
congenital disorder that will require medical attention.
Most birth defects are isolated, such as a congenital
cardiac defect, spina bifida, cleft lip, or talipes, but
some, such as chromosomal disorders like Down 
syndrome, affect multiple body systems. The inci- 
dence of anomalies is higher in stillborn babies and 
in spontaneously aborted fetuses. 


Terminology 


Terms such as malformation, deformity, and anomaly 
are often used interchangeably. However, there are 
precise definitions for various terms which aid 
description and scientific analysis. 


Malformation 

Malformations are abnormalities in formation of 
organs or parts of the body that arise because of 
abnormal developmental processes. Structures may 
fail to form altogether, may form incompletely, or 
may develop with the wrong configuration from the 
outset. Most malformations will have occurred by 
8 weeks gestation when formation of the organs is 
complete. 


Malformation Sequence and Syndrome

Malformation sequence refers to a pattern of multiple
defects that results from a single primary malformation.
For example, in spina bifida the neural tube fails
to close as a primary defect, and this may then result in
hydrocephalus and talipes deformity of the foot. A
malformation syndrome is a collection of several 
primary malformations that occur together and that 
are due to the same underlying cause, such as an error 
in chromosomal number. 


Deformation 

Deformations are alterations in the shape or position 
of body parts due to mechanical forces acting on nor- 
mally formed parts. The mechanical forces responsible 


are usually external such as abnormal shape of the 
maternal uterus or lack of amniotic fluid but are 
occasionally internal such as those due to a neuro- 
muscular problem in the fetus. Common deform- 
ations include talipes and abnormalities of skull shape. 


Disruption 

Disruptions are the result of destructive processes that 
alter a structure after it has normally formed. The 
cause may be external such as when strands of the 
fetal amniotic membrane become entangled in body 
parts, or internal such as interruption in blood supply 
in a cerebral blood vessel giving rise to tissue infarc- 
tion and porencephalic cysts. 


Aplasia, Hypoplasia, Hyperplasia, and 
Dysplasia 

Aplasia refers to the absence of a tissue or an organ due 
to absent cellular proliferation. Decreased cellular 
proliferation leading to undergrowth is termed hypo- 
plasia. Hyperplasia refers to the formation of an exces- 
sive mass of tissue due to an increase in the cell 
number. Dysplasia describes the disordered organiza- 
tion of cells within tissues or of tissues within a particu- 
lar structure. Many inherited bone disorders are due to 
abnormal tissue organization and are termed skeletal 
dysplasias. 


Genetic Causes of Malformations 


The causes of many malformations are still not fully 
understood, particularly the commoner single malfor- 
mations. It is artificial to completely separate genetic 
and environmental contributions since environmental 
factors may interact with particular genetic variations 
to cause a defect. 


Single Malformations 

Factors that suggested a partly genetic basis for com- 
mon malformations such as neural tube defects and 
cleft lip and palate were the observed increase in risk 
for first degree relatives and a lesser, although still 
greater than background, risk for second and third 
degree relatives. Also, in malformations in which the
frequency differed between the sexes, such as
congenital dislocation of the hip and pyloric stenosis,
the recurrence risk was found to be greater if the
affected child was of the less commonly affected sex.
Whilst the genetic predisposition to a malformation cannot
be altered, environmental factors can be modified. An
example of this preventive approach is the role of
folic acid in reducing the recurrence risk of neural tube
defects in susceptible mothers after the birth of an
affected child.


Multiple Malformations 

When a child is born with several malformations there 
is an urgent need to find a precise diagnosis in order to 
manage and treat the baby appropriately (see Dysmor- 
phology). 

Abnormalities of chromosomal number or smaller 
duplications or deletions cause malformations by dis- 
turbing the action of multiple genes. There are many 
well-described chromosomal syndromes including 
Down syndrome due to trisomy 21, Edwards’ syn- 
drome due to trisomy 18, and cri-du-chat syndrome 
due to deletion of the short arm of chromosome 5. 
As cytogenetic techniques have improved, smaller
chromosomal duplications and deletions can be
detected, and many recognizable malformation
syndromes such as Prader–Willi syndrome, Angelman
syndrome, DiGeorge syndrome, and Williams
syndrome have been shown to be due to microdeletions
of chromosome material that are not visible on
routine microscopy and require the technique of
fluorescent in situ hybridization (FISH) to detect them.
Other recent studies have shown that small chromosomal
rearrangements involving the terminal bands of
chromosomes (subtelomeric regions) are an important
cause of mental retardation. Studies have estimated
that about 7% of children with moderate
to severe mental retardation have such an abnormality.

Clinical effects of abnormalities of sex chromosome number
can range from minimal to lethal and depend on the
specific defect and on other influences such as X
inactivation and mosaicism. In conceptuses with a 45,X
karyotype (Turner syndrome), it is not fully
understood what determines whether the pregnancy will
abort, as most will do in the first or second trimester,
or whether the child will survive with relatively few
effects such as short stature and streak gonads.

Hundreds of multiple malformation syndromes are 
inherited as single gene traits. Many are well delin- 
eated and the mutated genes identified but some are 
very rare and the phenotypic range not fully understood.
Single-gene multiple malformation syndromes
can be inherited as autosomal dominant traits (for
example, achondroplasia), as autosomal recessive traits
(for example, Smith–Lemli–Opitz syndrome, now
known to be due to a defect in cholesterol biosynthesis),
and as X-linked traits such as several syndromic
X-linked mental retardation disorders.


Genetic Disorders without Malformation 
Presenting at Birth 


There are many inherited disorders that affect the 
function of tissues and organs, which are apparent at 
birth, but which are not associated with malforma- 
tions as such. Cystic fibrosis, an autosomal recessive 
disorder, affects secretions of the lungs and
gastrointestinal tract and may present at birth with meconium
ileus, a bowel obstruction. There are many inborn
errors of metabolism, inherited as autosomal
or X-linked recessive traits, that present soon after
birth with progressive metabolic disturbance leading
to neurological abnormalities and sometimes death.
One such condition is X-linked ornithine
transcarbamylase deficiency, a defect of the urea cycle usually
leading to death of affected males. Abnormalities of
the components of connective tissue such as collagens,
fibrillins, and proteins involved in assembly of these
components may present at birth with unusual tissue
laxity and often deformations due to intrauterine forces.
Severe neonatal Marfan syndrome, usually due to a
new dominant mutation in fibrillin 1, is an example.


Environmental Factors and Birth 
Defects 


A teratogen is an environmental agent that can cause 
abnormalities in an exposed fetus. The effects depend 
on the nature of the teratogen, the timing at which the 
exposure occurs and, most likely, the genetic suscep- 
tibility of the mother and/or the fetus. Teratogenic 
agents can be environmental chemicals, maternal 
metabolic factors, drugs, or infections. 

A number of environmental chemicals have been 
linked with birth defects in exposed fetuses including 
lead, methyl mercury, and polychlorinated biphenyls. 
Maternal metabolic factors associated with a signifi- 
cant risk of birth defects are maternal diabetes and 
maternal phenylketonuria. Excessive alcohol intake 
in pregnancy has been linked with fetal growth retard- 
ation, microcephaly, and cardiac and other malforma- 
tions. Many prescribed drugs can act as teratogens 
including some anticonvulsant agents, lithium, andro- 
gens, retinoids, and misoprostol. 


See also: Achondroplasia; Cri-du-Chat Syndrome; 
Cystic Fibrosis; DiGeorge Syndrome; Down 
Syndrome; Dysmorphology; Genetic Counseling; 
Genetic Diseases; Marfan Syndrome; 
Phenylketonuria; Trisomy 18; Turner Syndrome 
The term ‘conjugation’ is used most often in genetics
to describe two types of systems of genetic exchange.
One involves various species of bacteria that are normally
haploid and which can transfer plasmids and
sometimes portions of their chromosomes to other 
bacterial cells or in some cases to plants. The second 
involves diploid protozoan ciliates such as Para- 
mecium, which can exchange entire haploid nuclei 
between cells, thus giving rise to a new pair of com- 
plete chromosomes in each exconjugant cell, derived 
from both of the parental cells. 

Bacterial conjugation is one of the three major 
known modes of genetic exchange between bacteria, 
the other two being transduction and bacterial
transformation. Of these three modes, conjugation is the
only one that involves cell-to-cell contact. The
phenomenon was first reported in 1946 by J. Lederberg and
E. L. Tatum using Escherichia coli, as the result of a
conscious effort to find sexual recombination in
bacteria. Bacterial conjugation is a sexual mode of genetic
transfer in the sense that chromosomal material from
two sexually distinct types of cells is brought
together in a defined and programmed process.
However, in bacterial conjugation the process involves only
a portion (usually small) of the genome of one of the 
cells (the donor) and the complete genome of its sexual 
partner (the recipient), as opposed to sexual union in 
most higher organisms which involves an interaction 
between the entire set of chromosomes from both of 
the parental cells. Thus, genetic transfer in bacterial 
conjugation is partial, and it is in most cases polar, 
wherein genetic material moves unidirectionally 
from the donor cell into the recipient cell followed 
by separation of the cells and further changes in the 
organization or recombination of the combined 
genetic material within the recipient cell. With a few 
conjugational transfer systems, some transfer can also 
occur from the recipient strain into the donor strain. 
This is known as retrotransfer. The transfer of genetic
material can take from several minutes to several hours.

In contrast to bacterial conjugation, protozoan
conjugation involving Paramecium is more complicated
and prolonged, taking about 20 h to complete. As in
bacteria, conjugation in Paramecium involves cell-to- 
cell contact and transfer of genetic material. However, 
conjugation in Paramecium involves first meiotic and 
mitotic divisions of its diploid germinal set of chromo- 
somes, followed by an exchange of an entire haploid 
nucleus from each of the conjugating cells to the other. 
Occasionally cytoplasmic material, including small 
autonomous genetic components, is also exchanged. 
Thus in Paramecium, conjugation results in a substan- 
tially symmetric and complete exchange of a haploid 
equivalent of genetic material into both of the partici- 
pants, which can then propagate vegetatively, each 
carrying a new combination of chromosomes. 


Of these two varieties of conjugation, the mechan- 
isms of bacterial conjugation are understood in greater 
molecular detail, and are considered below. 


Natural Diversity of Bacterial 
Conjugative Systems 


Conjugative Plasmids 

Thousands of different naturally occurring plasmids 
have been identified from many thousands of variants 
of bacteria. In many cases (up to 50% of natural isol- 
ates tested, particularly in hospitals which use anti- 
biotics extensively), these plasmids carry all the genetic 
information needed to cause their own transfer into 
other cells. Such plasmids are called conjugative (or 
self-transmissible) plasmids. Some conjugative plas- 
mids with a broad host range (able to replicate in 
numerous host species, in particular Pseudomonas or 
Bacteroides species as well as the Enterobacteriaceae)
can move widely between species. Conjugational 
transfer is possible, for instance, between virtually all 
of the members of the Enterobacteriaceae family, 
which includes Escherichia coli, Shigella, Salmonella, 
Klebsiella, and 27 other genera, and encompasses 
tens of thousands of natural variants. Other plasmids 
move between many gram-positive species including 
Streptomyces, Enterococcus, Bacillus, Listeria, Strepto- 
coccus, and Staphylococcus, and low-frequency trans- 
fer between gram-negative and gram-positive strains 
has been documented. Transfer among the Archaea 
has also been observed. More far-ranging conjuga- 
tional transfer between bacterial and plant kingdoms 
is also well known in the case of transfer of tumor- 
causing agents from Agrobacterium to plants. Con- 
jugation between bacteria and yeast has also been 
observed in the laboratory. 

Besides genes necessary for conjugation, most of 
these plasmids carry other genetic determinants such 
as those for antibiotic resistance or colicins (such plas- 
mids are called R factors or Col factors). These plas- 
mids can be broadly classified depending on whether 
or not they belong to the same incompatibility group 
(Inc group). Two plasmids are in the same Inc group if 
they cannot coexist stably in the same bacterial host. 
At least 30 different Inc groups of plasmids have been 
identified among the Enterobacteriaceae alone. 


Other Configurations of Conjugative 
Functions 

Some plasmids carry only a portion of the genetic in- 
formation necessary for conjugation. The one essential 
element that a plasmid requires for it to be transferable 
is a site where one strand of the DNA is cut prior
to transfer. This site, termed the origin of transfer, is
usually known as oriT, or bom (basis of mobility).
Also required in the cell is a gene for a site-specific
nuclease that cuts at this oriT; this gene is usually given
a mob or tra designation, as are other necessary genes.
If the oriT (bom) site and necessary mob genes are 
present, the remainder of the conjugational apparatus 
can sometimes be supplied by a coresident conjugative 
plasmid. This is called mobilization in trans of a non- 
conjugative plasmid by a conjugative plasmid. 

Another configuration in which a set of conjuga- 
tion genes can cause transfer of unrelated DNA is 
when the transfer system (e.g., a conjugative plasmid) 
recombines with another plasmid or with the main 
bacterial chromosome. In this configuration the trans- 
fer of one DNA strand beginning with oriT results in 
transfer of all the DNA that is linked to it on the same 
DNA strand, which includes the other plasmid or 
chromosomal DNA. This is mobilization in cis. In 
the case of a conjugative plasmid stably recombined 
with the chromosome, the transfer system is called 
Hfr (high frequency of recombination for chromo- 
somal genetic markers which are transferred to other 
cells). There is sometimes breakage of DNA during 
extended transfer in Hfr crosses, so that markers 
located far from oriT (distal markers) are transferred 
less frequently than earlier markers closer to oriT 
(proximal markers). This effect is known as the trans- 
fer gradient. 
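One simple way to picture the transfer gradient is to assume that the mating pair (or the transferring DNA) breaks at a constant rate while transfer proceeds, so the chance that a marker is transferred falls off exponentially with its distance from oriT. The sketch below is a minimal Python model under that assumption; the breakage rate, the use of minutes of transfer as the distance unit, and the function name `transfer_frequency` are hypothetical and not taken from the text.

```python
import math


def transfer_frequency(distance_min: float, break_rate_per_min: float) -> float:
    """Fraction of mating pairs that transfer a marker located
    `distance_min` minutes of transfer downstream of oriT.

    Hypothetical model: breakage is a Poisson process with a constant rate,
    so a marker is transferred only if no break occurs before it enters.
    """
    if distance_min < 0 or break_rate_per_min < 0:
        raise ValueError("distance and breakage rate must be non-negative")
    return math.exp(-break_rate_per_min * distance_min)


if __name__ == "__main__":
    # Hypothetical markers at increasing distances from oriT.
    for name, distance in [("proximal", 5.0), ("middle", 20.0), ("distal", 50.0)]:
        print(f"{name:8s} {distance:5.1f} min  {transfer_frequency(distance, 0.05):.3f}")
```

Proximal markers come out close to 1 and distal markers much lower, reproducing the qualitative gradient; real gradients also depend on recombination efficiency in the recipient, which this model ignores.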

In contrast to stable integration of a conjugative 
plasmid, the plasmid can in some cases interact tran- 
siently (but rarely) with the chromosome or another 
plasmid and cause mobilization perhaps by initial 
steps in recombination in regions of limited DNA 
homology, or owing to the activity of a transposable 
element. If there is substantial homology (more than 
several kilobases) between the conjugative plasmid 
and the replicon being mobilized, repeated recombin- 
ation events result in an equilibrium between inte- 
grated and autonomous states of the plasmid, and 
thus in a population of cells a large percentage of each 
mode of transfer (mobilization (in cis) versus simple 
plasmid transfer) will occur. This happens, for instance, 
with a plasmid that carries a sizable piece of chromo- 
somal DNA (derived by an abnormal excision event 
from a once-integrated plasmid such as in an Hfr) 
which is known as an F-prime (or R-prime, Col- 
prime, etc.) factor. These ‘primes’ confer merodi- 
ploidy for the regions of the chromosome that they 
carry (e.g., F-lac, which includes the E. coli lac operon) 
and have been used extensively for dominance studies. 

In some bacteria, for example, gram-positive spe- 
cies such as E. faecalis, Clostridium difficile, Strepto- 
coccus pneumoniae, or Lactococcus lactis, and in the 
gram-negative genus Bacteroides, genetic determin- 
ants for conjugation are located on the main chromo- 
some, and in some cases associated with a transposon 
(conjugative transposon) which can either promote its
own transfer to a recipient cell or interact with a 
plasmid and cause cotransfer, followed by reinsertion 
into the chromosome. 


Requirements for Bacterial Conjugation 


Donors 

In addition to the presence of an oriT (or bom) site on 
the DNA (see above), conjugational donor bacteria 
depend for fertility on a set of genes that together
constitute a fertility factor and that bring about:


1. Cell-to-cell attachment upon contact. The genes 
involved in cell-to-cell contact, at least in gram- 
negative bacteria, include genes for specialized pili 
which bring about the union or aggregation of two 
or more cells involved in gene transfer. For the 
F factor from E. coli (see Figure 1), at least 15 gene 
products (part of the tra/trb gene operon which 
contains at least 37 genes) are involved in producing 
and assembling the F pili. The pili produced by 
other conjugative plasmids are in some cases closely 
similar to F pili (F-like) and in other cases different 
(I-like, etc.). The pili have outside diameters usually 
in the 5-12 nm range, and sometimes have a hollow 
central core, 2-3 nm in diameter. 

In gram-positive organisms such as E. faecalis, 
pili do not seem to be involved in cell pairing; 
rather, a diffusible pheromone produced by recipi- 
ent cells causes donor cells to produce an aggre- 
gation substance that causes clumping, followed by 
gene transfer. In other cases, simple prolonged 
overlapping growth together leads to gene transfer. 

2. Transfer of genetic material in the form of a single 
strand of DNA from the donor into the recipient. 
Functions required for the actual transfer of DNA 
include the assembly of a transfer apparatus (called 
a relaxosome, since one of its functions is to relax 
the supercoiling of the double-stranded DNA in 
the donor) and various functions needed for move- 
ment of the DNA strand between cells and recircu- 
larization after one complete length of plasmid has 
transferred. The number of gene functions involved 
in the overall conjugative process can be as large as 
about 150, as in the case of certain Ti plasmid- 
induced transfer from Agrobacterium to plants. 

3. Control of conjugation. Though not essential for 
conjugation, the genes for most conjugative sys- 
tems are negatively regulated so that they are not 
expressed by most cells in a population. If one cell 
in a thousand, for example, is expressing its transfer 
functions and is able to conjugate and transfer a 
conjugative plasmid into a new host, the transfer 
system will be transiently derepressed in the new 
host, so that rapid successive conjugation events
can take place and allow an epidemic spread of the
plasmid into a new population of cells, after which
repression of the conjugation system will again
slowly build up and maintain the corresponding
gene expression at a low level.


Figure 1 Stages in bacterial conjugation, as deduced
for the F plasmid fertility factor in Escherichia coli. The F-
encoded sex pilus produced on the donor cell surface
aids in binding to the recipient cell, and can then pull the
cells together by retraction and dissociation in the
donor cell. The oriT site on the donor DNA is nicked
and the 5’ end of the nicked strand is transferred into
the cell contact region, followed by more of the same
DNA strand into the recipient cell. If the entire length of
this strand is transferred, the oriT site at the distal end is
joined to the end first transferred, restoring the circular
continuity of the DNA. DNA synthesis normally
provides new complementary DNA strands in both
the donor and recipient. In this figure, the main circular
chromosome of the bacterium (about 50 times as long
as the circular F factor) is not shown. (Reproduced
with permission from Pansegrau and Lanka (1996)
Progress in Nucleic Acid Research and Molecular Biology
54: 197-251.)


Recipients 
The major requirements of cells to act as recipients 
are: 


1. The potential to support replication of incoming 
single DNA strands. 

2. The absence of certain donor genes, in particular 
genes for pili production, and in some cases other 
genes such as those for surface exclusion or entry 
exclusion. These genes act in donor cells to greatly 
diminish pairing or transfer from another identical 
donor cell, which would be genetically unproduct- 
ive. Under certain growth conditions (e.g., late 
stationary phase), the transfer functions of donor 
cells are not expressed, and such donor cells can act 
transiently as recipients (F⁻ phenocopy), until they
are allowed to grow vegetatively again. In the case 
of conjugative transposons in E. faecalis, there 
appears to be no entry exclusion, and donor cells 
can act equally well as recipients.


Relation of Conjugative Systems to Each 
Other and to Other Export Systems 


Many conjugative systems, as mentioned above, must
be able to export protein molecules (for example,
subunits for pilus synthesis) as well as DNA
molecules destined for recipient cells. Systems with
this dual export capacity are known as type IV
secretion systems. Recent comparisons of the DNA
sequences of various transfer systems have shown
remarkable similarities, which indicate evolutionary
relatedness between conjugative systems and
pathogenicity-related export systems that transport
primarily proteins, such as the pertussis toxin
excretion system of Bordetella pertussis, and excretion
systems in Legionella pneumophila and Helicobacter 
pylori. Figure 2 shows a comparison between some of 
the major conjugational functions and protein export 
functions which share numerous components. 


Historical Aspects 


The discovery of bacterial conjugation in 1946 was 
hailed by Salvador Luria in 1947 as “probably among 
the most fundamental advances in the whole history 
of bacteriological science,” even before the most basic 
genetic study and manipulation of diverse Eubacteria.
Recently, conjugation in Archaea was confirmed. 

Remarkably, both F and RP4 can also mediate 
exchange with other kingdoms at low frequency. 
Equally interesting, interkingdom transfer can occur 
very efficiently, as in the case of Agrobacterium 
tumefaciens infection of dicotyledonous plants. The
Ti plasmids of this plant pathogen transfer sequences 
known as T-DNA to the host plants, where they are 
integrated and expressed to facilitate the infection 
process. The ability to transfer DNA is separable 
from the infection-promoting properties in this sys- 
tem. These properties have been extensively used for 
genetic manipulation of plants since the early 1980s. 

Such processes have thus presumably played a
major role in horizontal exchange of information be- 
tween evolutionarily distant organisms as well as with 
close relatives. 


Events in the Donor 


All conjugal systems studied sufficiently well display 
similar features. Information transfer is unidirectional 
(from donor to recipient) — the cells do not fuse, and 
the two genomes do not recombine freely. The mater- 
ial transferred is single-stranded DNA from the donor. 
Transfer requires protein synthesis in the donor for ex- 
pression of a specialized transfer assembly encoded by 
tra genes. In gram-negative bacteria, donors usually 
elaborate long or short surface filaments, known as pili,
detected by electron microscopy. These are required 
for mating pair formation by systems that encode 
them, and may form a DNA transfer channel. How- 
ever, not all microscopically observable pili are asso- 
ciated with conjugal systems, and gram-positive 
conjugal systems seem not to make pili. The donor 
then engages in transfer-specific DNA synthesis, with 
a mechanism similar to that of rolling-circle replica- 
tion of single-stranded plasmids of gram-positive 
bacteria. Normally, the DNA transferred by conjuga- 
tion is that of the conjugal plasmid itself. Transfer is 
initiated by nicking at a specific site known as the mob 
site by TraI or a homolog. The 5’ end of the nicked
strand enters the recipient; transfer proceeds in a 5’ to 
3’ direction accompanied by replacement synthesis 
primed from the 3’ end of the nicked strand in the 
donor; and the process ends when the mob site is 
encountered again. Termination is normally accompanied
by TraI-mediated circularization of the transferred
single strand. 


What Can be Transferred? 


Other DNA can be transferred by the same mechanism,
as long as it is attached to a mob site. Nonconjugal
plasmids often carry such specific sites, and can
be mobilized by compatible Tra functions present 
in the same cell. A defective conjugal plasmid (lack- 
ing a mob site) can still mediate transfer of mob- 
containing non-conjugal plasmids. If a mob site is 
present in the chromosome, chromosomal DNA of 
the donor will be transferred as well. E. coli strains 
with integrated F factors are known as Hfr (high 
frequency of recombination) strains because this 
transfer promotes recombination of chromosomal 
markers. In favorable cases, an entire chromosome 
can be transferred. This property enabled construc- 
tion of the circular genetic map of E. coli, the first such 
map ever devised. 
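Interrupted-mating data from several Hfr strains can be combined into a circular map because each strain starts transfer from a different chromosomal position and in a particular direction. The following Python sketch assumes the usual convention of an E. coli map 100 minutes long and that a marker's time of entry equals its distance in minutes from the Hfr origin; the origins, entry times, marker names, and helper functions are all hypothetical.

```python
MAP_LENGTH_MIN = 100.0  # assumed length of the circular E. coli map, in minutes


def map_positions(origin_min, direction, entry_times):
    """Convert times of entry for one Hfr strain into circular map positions.

    origin_min: hypothetical map position of the Hfr origin of transfer.
    direction: +1 if transfer proceeds toward increasing map position, -1 otherwise.
    entry_times: {marker: minutes after mating began when the marker entered}.
    """
    if direction not in (1, -1):
        raise ValueError("direction must be +1 or -1")
    return {marker: (origin_min + direction * t) % MAP_LENGTH_MIN
            for marker, t in entry_times.items()}


def merge_maps(*maps, tolerance=2.0):
    """Merge per-strain positions, checking that shared markers agree.

    Keeps the first estimate for each marker and returns markers sorted by
    map position, i.e. their order around the circle starting at 0 min.
    """
    merged = {}
    for positions in maps:
        for marker, pos in positions.items():
            if marker in merged:
                gap = abs(merged[marker] - pos) % MAP_LENGTH_MIN
                gap = min(gap, MAP_LENGTH_MIN - gap)  # circular distance
                if gap > tolerance:
                    raise ValueError(f"inconsistent positions for {marker}")
            else:
                merged[marker] = pos
    return dict(sorted(merged.items(), key=lambda item: item[1]))


if __name__ == "__main__":
    hfr_a = map_positions(10.0, +1, {"a": 5.0, "b": 20.0})
    hfr_b = map_positions(40.0, -1, {"b": 10.0, "c": 45.0})
    print(merge_maps(hfr_a, hfr_b))  # {'a': 15.0, 'b': 30.0, 'c': 95.0}
```

The shared marker b anchors the two strains to each other; with enough overlapping strains, the arcs close into a single circle.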


Events in the Recipient 


No active participation by the recipient is required for 
DNA transfer to occur, although it may respond to the 
newly introduced single-stranded DNA with induc- 
tion of DNA repair functions (the SOS response). The 
entering DNA is soon rendered double-stranded, a 
process in some cases promoted by transfer of a 
DNA primase (TraG) from the donor to the recipient. 
The recipient now processes the DNA, e.g., with 
restriction enzymes, or with the machinery of homo- 
logous recombination, or by transcribing it, or all of 
these. If present, transposons carried by the new 
DNA may become active and move into the recip- 
ient genome, even when the rest of the transfer inter- 
mediate is degraded. The early-transferred DNA 
segments of conjugal plasmids frequently encode 
functions able to modulate the recipient responses, 
e.g., by inhibiting restriction enzymes (Ard) or the
SOS response (Psi). 


Evolutionary Relationships 


Transfer functions of diverse origin show similarities 
with each other and (more distantly) with functions 
required for export of protein toxins, suggesting 
that macromolecular export in general makes use of 
homologous machinery. The tra genes of the F factor 
have been best studied and are closely related to genes 
of other enteric low-copy plasmids. RP4 and rela- 
tives typify a second well-studied family, the broad- 
host-range conjugal plasmids of Pseudomonas. These 
express a smaller set of tra functions quite similar to 
each other. The A. tumefaciens Ti plasmids capable of 
transferring DNA into plants, which express no pili, 
also express some genes with similarity to those of 
RP4. At least 10 of the Tra functions of these families 
display a more distant similarity between families. 
Eight of these exhibit similarity to Ptl proteins, which
are responsible for the export of the multiple-subunit
Bordetella pertussis toxin, suggesting a common origin
and similar functional roles in these disparate systems.


See also: Conjugation; Lederberg, Joshua; 
Plasmids 
Conjugative Transposons


Conjugative transposons are genetic elements that 
excise from a donor DNA molecule, transfer from 
the donor bacterium to a recipient bacterium, and 
integrate into a recipient DNA molecule. This process 
is shown in Figure 1. They are typically much larger
than many bacterial transposable elements because, in 
addition to encoding proteins that are responsible for 
DNA cleavage and strand transfer during excision and 
integration, they also encode the proteins required for 
conjugal DNA transfer. They are different from con- 
jugal plasmids in that the circular form of the trans- 
poson that results from excision is not capable of 
autonomous replication. 
Host Range


These elements were originally discovered as agents of 
transmissible antibiotic resistance in enterococci and 
streptococci. Subsequently they have been identified 
in many different gram-positive and gram-negative 
bacteria. Some of these transposons are remarkably 
promiscuous, being able to transfer into dozens of 
bacteria belonging to different species and genera, 
and even to transfer from gram-positive to gram- 
negative bacteria. This broad-host-range behavior is 
the result of a number of different molecular mechan- 
isms that operate during conjugative transposition. 
The antibiotic resistance determinants carried by elem- 
ents with a broad host range are expressed and are 
active in a wide range of bacterial hosts. The transpo- 
sons evade DNA restriction mechanisms because con- 
jugation involves transfer of a single DNA strand 
from donor to recipient. In some cases they encode a 
protein related to phage antirestriction proteins, and 
their DNA sequence has evolved to contain very 
few recognition sequences for DNA restriction endo- 
nucleases. 
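Whether a sequence contains unusually few restriction sites can be checked by comparing the observed number of a recognition sequence with the number expected in a random sequence of the same length. The Python sketch below assumes equal, independent base frequencies, which real genomes do not exactly have; the site GAATTC (the EcoRI recognition sequence) is used only for illustration, and the function names and example fragment are hypothetical.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq: str) -> str:
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def count_occurrences(seq: str, site: str) -> int:
    """Count (possibly overlapping) occurrences of `site` on one strand."""
    seq, site = seq.upper(), site.upper()
    count, start = 0, 0
    while (index := seq.find(site, start)) != -1:
        count += 1
        start = index + 1
    return count


def observed_sites(seq: str, site: str) -> int:
    """Occurrences on both strands; a palindromic site is counted once."""
    forward = count_occurrences(seq, site)
    rc = reverse_complement(site)
    if rc == site.upper():
        return forward
    return forward + count_occurrences(seq, rc)


def expected_sites(seq_len: int, site: str) -> float:
    """Expected count in a random sequence with equal base frequencies."""
    strands = 1 if reverse_complement(site) == site.upper() else 2
    positions = max(seq_len - len(site) + 1, 0)
    return strands * positions / 4 ** len(site)


if __name__ == "__main__":
    seq = "GAATTCAGGCTTACGAATTCGT"  # hypothetical fragment
    obs = observed_sites(seq, "GAATTC")
    exp = expected_sites(len(seq), "GAATTC")
    print(f"observed {obs}, expected {exp:.4f}")
```

Over a long element, an observed-to-expected ratio well below 1 would be consistent with site avoidance; for a short fragment like this one the counts are far too small to mean anything.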


Mechanism of Transposition 


Most conjugative transposons encode an integrase 
enzyme that is a member of the integrase family of 
site-specific recombinases (see Integrase Family of
Site-Specific Recombinases). Therefore, it is assumed
that the mechanisms of DNA cleavage and strand
transfer during conjugative transposition are similar to
the reactions carried out by other integrases. However,
unlike many true site-specific recombinases, the
integrases of conjugative transposons can catalyze
recombination between DNA sequences that are not
identical, allowing integration of the element into
different DNA targets. Although they can integrate
into different target sites, they are distinct from other
bacterial transposons, which use a transposase enzyme
with a different biochemical mechanism to perform
DNA cleavage and strand transfer. In some cases, a
small, basic transposon-encoded protein has been
shown to be required for excision. This protein
(excisionase) presumably plays a role in assembling a
synaptic complex of the two transposon ends and
integrase, similar to the role of lambda excisionase in
the excision of phage lambda (see Phage λ Integration
and Excision).

Figure 1 (A) A donor bacterium with a conjugative
transposon (bold line, ‘Tn’) inserted at a unique site in
the donor chromosome (‘C’). In reality, the donor
chromosome is approximately 200 times longer than
the transposon. (B) The transposon has excised from
the donor chromosome, assumed a circular form, and
is ready to conjugate into a recipient bacterium. (C)
The transposon is present in its circular form in the
recipient bacterium. (D) The transposon has
integrated into the recipient chromosome at a new
site, different from the position it occupied in the
donor bacterium. These donor and recipient bacteria
belong to different species.


Mechanism of Conjugation 


Genetic data suggest that a single strand of the trans- 
poson is transferred from donor to recipient during 
conjugation. Conjugation also requires a segment of 
transposon DNA that is distinct from the transposon 
ends. Recombinant plasmids carrying this transposon 
segment are mobilized by a transposon resident in 
the donor chromosome, and so this segment of 
DNA constitutes an origin of conjugal transfer. The 
segment shows similarities to the origins of conjugal 
transfer of bacterial plasmids, and so it is assumed 
that DNA transfer proceeds in a similar manner. A 
transposon-encoded protein (Mob) recognizes the 
DNA sequence of the origin of transfer and nicks the 
DNA, exposing a DNA end that can serve as a primer 
for rolling-circle DNA replication. If so, then follow- 
ing transfer of a single transposon DNA strand to the 
recipient, the complementary transposon strand must 
be synthesized in the recipient and the circular form 
of the transposon reconstituted prior to integration of 
the transposon into its target site. 


Related Elements 


There are a number of elements that are closely related 
to conjugative transposons, but, rather than integrat- 
ing into different sites in the recipient genome, they 
integrate into a unique site. These include elements 
encoding antibiotic resistance found in Vibrio species 
and enteric bacteria. Presumably their integrases act as 
true site-specific recombinases. Two other kinds of 
element are found in Bacteroides and Clostridium 
species. These are mobilizable transposons and the
nonreplicating Bacteroides units (NBUs). These
elements encode genes required for excision,
recognition and nicking at oriT, and integration, but only
conjugate when a conjugative transposon is present 
in the donor cell. Thus, their movement from donor 
to recipient is reminiscent of the mobilization of non- 
conjugative plasmids by a conjugal plasmid. 




See also: Antibiotic Resistance; Conjugation, 
Bacterial; Rolling Circle Replication; Site-Specific 
Recombination 


Conplastic 


L Silver 
Conplastic strains are a variation on the congenic
theme, except that in this case, the donor genetic 
material is the whole mitochondrial genome which is 
placed into an alternative host. Conplastic lines are 
generated by sequential backcrossing of females from 
the donor strain to recipient males; this protocol is 
reciprocal to the one used for the generation of Y 
chromosome consomics. 
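Because mitochondria are inherited through the female line, every generation of this backcross keeps the donor mitochondrial genome while, on average, halving the donor share of the nuclear autosomes. The short Python sketch below computes that expected autosomal donor fraction; it is an idealized expectation that ignores linkage and selection, and both the function name and the generation numbering (the first donor female × recipient male cross counts as generation 1) are assumptions for illustration.

```python
def expected_donor_autosomal_fraction(generation: int) -> float:
    """Expected share of donor-derived autosomal DNA after `generation`
    successive crosses of donor-line females to recipient males.

    Generation 1 is the F1 (donor female x recipient male). Mitochondrial DNA
    comes only from the mother, so it stays entirely donor-derived throughout.
    """
    if generation < 1:
        raise ValueError("generation must be >= 1")
    return 0.5 ** generation


if __name__ == "__main__":
    for g in (1, 2, 5, 10):
        print(g, expected_donor_autosomal_fraction(g))
```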


See also: Congenic Strain; Consomic 


Consanguinity 
M-P Lefranc and G Lefranc 
‘Consanguinity,’ derived from the Latin consanguineus
(“of common blood”), is defined as the kinship of
two individuals characterized by a shared common 
ancestor(s). It implies the inheritance of genes which 
are identical by descent, i.e., inherited from the com- 
mon ancestor(s). Consequently, consanguinity affects 


the probabilities of occurrence of genotypes. The 
coefficient of consanguinity of an individual, or coef- 
ficient of inbreeding (F), is the probability of finding, 
at a given locus, two genes which are identical by 
descent. At this locus, the individual is homozygous 
by descent (‘autozygous’ is a useful word for this type 
of homozygosity). The coefficient of relationship (r) 
of two individuals is the probability that they have, 
at a given locus, a common gene which is identical 
by descent. The coefficient of consanguinity of an 
individual is equal to the coefficient of relationship 
of the parents (for example, this would be 1/16 for 
the child of a marriage between first cousins).
Averaged over many loci, the coefficient of consanguinity
approximates the proportion of an individual's loci
that are autozygous (again 1/16 in our example).
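The 1/16 figure for first cousins can be checked with Wright's path method, in which each path linking the two parents through a common ancestor contributes (1/2)^(n1 + n2 + 1)(1 + F_A), where n1 and n2 are the numbers of generations separating each parent from that ancestor and F_A is the ancestor's own inbreeding coefficient. The Python sketch below is a minimal implementation of that formula using exact fractions; the pedigrees are encoded by hand, and the function name and input format are hypothetical.

```python
from fractions import Fraction


def inbreeding_coefficient(paths):
    """Wright's path method for the inbreeding coefficient F of a child.

    paths: iterable of (n1, n2, f_ancestor), one entry per path linking the
    two parents through a common ancestor, where n1 and n2 are the numbers of
    generations from each parent up to that ancestor and f_ancestor is the
    ancestor's own inbreeding coefficient.
    """
    total = Fraction(0)
    for n1, n2, f_ancestor in paths:
        total += Fraction(1, 2) ** (n1 + n2 + 1) * (1 + Fraction(f_ancestor))
    return total


if __name__ == "__main__":
    # Parents are first cousins: two shared grandparents, two generations each way.
    print(inbreeding_coefficient([(2, 2, 0), (2, 2, 0)]))  # 1/16
    # Uncle x niece: the uncle's parents are the niece's grandparents.
    print(inbreeding_coefficient([(1, 2, 0), (1, 2, 0)]))  # 1/8
```

If the shared grandparents are themselves inbred (F_A > 0), the result rises above 1/16, which is one way to see how "first cousins" in populations with a long history of consanguinity can be more closely related than the nominal value.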

If we consider a rare, or very rare, autosomal reces- 
sive allele in the general population, and a common 
ancestor who is a carrier, the mutant gene will have 
been transmitted from the ancestor to its descendants 
with the same probability as a more common gene. 
Therefore, the probability of receiving this rare allele 
from the ancestor — i.e., the probability of being 
homozygous (autozygous) for that allele — is greatest 
in the offspring of a consanguineous marriage. Thus, 
rare or very rare autosomal recessive diseases are more 
frequent in the offspring of consanguineous unions. 
The rarer the occurrence of these genetic diseases (i.e., 
the rarer the autosomal recessive allele frequency), the 
higher the proportion of patients found to be consan- 
guineous. This is shown by the formula: 


$$k = \frac{c\,(1 + 15q)}{16q + c\,(1 - q)}$$


where k is the proportion of patients born to
first-cousin parents among all patients homozygous
for an autosomal recessive allele; c is the frequency
of first-cousin marriages in the general population;
and q is the frequency of the recessive allele.
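The formula follows from comparing two groups of children: with F = 1/16, a child of first cousins is affected with probability q² + Fq(1 − q) = q(1 + 15q)/16, while a child of unrelated parents is affected with probability q². Weighting these by c and 1 − c and taking the first-cousin share of all affected children gives k. The Python sketch below (function name hypothetical) evaluates k for a fixed c, here assumed to be 1%, to show how it grows as the allele becomes rarer.

```python
def first_cousin_patient_fraction(q: float, c: float) -> float:
    """k = c(1 + 15q) / (16q + c(1 - q)).

    q: frequency of the recessive allele (0 < q <= 1).
    c: frequency of first-cousin marriages in the population (0 <= c <= 1).
    """
    if not (0 < q <= 1) or not (0 <= c <= 1):
        raise ValueError("need 0 < q <= 1 and 0 <= c <= 1")
    return c * (1 + 15 * q) / (16 * q + c * (1 - q))


if __name__ == "__main__":
    c = 0.01  # assumed: 1% of marriages are between first cousins
    for q in (0.1, 0.01, 0.001):
        print(f"q = {q:<6} k = {first_cousin_patient_fraction(q, c):.3f}")
```

With these assumed values k rises from under 2% at q = 0.1 to about 7% at q = 0.01 and roughly 39% at q = 0.001.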

Marriage between cousins in itself is not necessarily 
harmful and does not always cause genetic disease. 
Children of consanguineous marriages will be at 
increased risk only if both parents carry the same 
mutant gene at a given locus. Mutant genes are present 
in all populations and all people carry one or more 
highly deleterious recessive genes in the heterozygous 
state. As related individuals are more likely to be 
heterozygous for the same mutant gene (identical by 
descent) than unrelated individuals, consanguineous 
marriages such as those between first cousins have a 
higher probability of producing offspring affected by 
an autosomal recessive trait. If the deleterious alleles 
are present at very low frequencies (due to selection, 
for example), the probability of these alleles appearing
in the homozygous state in panmictic (randomly 
breeding) populations will also be lower. If a mutant 
gene causing disease is common in a population, then 
consanguineous marriages will not carry a markedly
higher risk of having an affected child.
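The dependence on allele frequency can be made concrete with an idealized model: for a recessive allele of frequency q, a child with inbreeding coefficient F is affected with probability q² + Fq(1 − q), so the risk relative to a child of unrelated parents is 1 + F(1 − q)/q. The Python sketch below (names hypothetical; assumes random mating apart from the parents' own relationship and ignores selection) compares rare and common alleles for the children of first cousins.

```python
def affected_probability(q: float, F: float) -> float:
    """Probability that a child is homozygous for a recessive allele of
    frequency q, given the child's inbreeding coefficient F."""
    return q * q + F * q * (1 - q)


def relative_risk(q: float, F: float) -> float:
    """Risk for a child with inbreeding coefficient F relative to F = 0."""
    return affected_probability(q, F) / affected_probability(q, 0.0)


if __name__ == "__main__":
    F_first_cousins = 1 / 16
    for q in (0.001, 0.01, 0.3):
        print(f"q = {q:<6} relative risk = {relative_risk(q, F_first_cousins):.2f}")
```

For a very rare allele (q = 0.001) the relative risk is about 63-fold, whereas for a common one (q = 0.3) it is only about 1.15-fold, too small to be noticed in most family data.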

Where a particular trait is both recessive and very 
rare, the occurrence of parental consanguinity may be 
the first pointer to the fact that the trait is genetic. 
Some genetic disorders, previously thought to have 
been inherited as autosomal dominant defects, have 
been clearly revealed to be recessive following a closer 
investigation. 

In some countries, consanguinity is common and 
marriages among relatives occur widely due to social 
and cultural tradition, economic considerations, reli- 
gion, education, and family pressure. Consanguineous 
marriage is commonly favored in North and sub- 
Saharan Africa, the Middle East, and in western, cen- 
tral, and much of southern Asia, where it remains a 
medical problem due to a lack of health awareness and 
premarital testing, as well as limited genetic counsel- 
ing services. In these regions, 20-50% of all marriages, 
particularly in rural areas, are consanguineous, the 
most prevalent type being between first cousins. 
These so-called first cousins are often much more 
closely related, with a coefficient of relationship 
higher than 1/16, because consanguineous marriages 
within the population have occurred for centuries and 
common ancestors are themselves frequently related. 
Therefore, compared with panmictic populations, 
those with high levels of consanguinity experience 
higher levels of genetic diseases. 

The occurrence in consanguineous populations of 
genetic defects which are almost unknown in panmic- 
tic populations, and the generally large size of families 
in these populations, are invaluable starting points 
from which to identify new genes, their products, 
and their functions. Previously unsuspected links to
cell physiology are thus revealed and can be analyzed
to understand the pathophysiology, a prerequisite for
choosing the most suitable treatment.


See also: Identity by Descent; Inbreeding 
Depression 


Consensus Sequence 
A Liljas 
A consensus sequence is a nucleotide sequence of
DNA or RNA, or an amino acid sequence of a protein,
that is generally used for inter- or intramolecular
interactions. Similar molecules in a cell frequently use the
same or highly similar consensus sequences and are 
usually well conserved between species. There are 
numerous examples, some of which are mentioned 
here. 

In the case of DNA, the origin of replication in
Escherichia coli and other bacteria contains two
kinds of short, repeated sequences with consensus
features as well as individual variations. The initiator
protein DnaA and other proteins of the replication
machinery identify these sequences. Regulation is
essential for the transcription of DNA into RNA, and
consensus DNA sequences are at the core of this
regulatory activity. Thus eubacterial gene promoters
have consensus sequences preceding the start site of
transcription. One of these, called the ‘Pribnow box’
or ‘-10’ region, is situated about 10 nucleotides
before the transcription start site. Another is the -35
region. They have the consensus sequences TATAAT
and TTGACA, respectively, which are recognized by
RNA polymerase. In eukaryotic transcription, again,
consensus sequences guide RNA polymerase to the
proper site. Here a large number of transcription
factors participate in the regulation of transcription.
A classical consensus sequence is the so-called TATA
box, recognized by one of the key proteins, the
TATA box-binding protein (TBP).
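In practice a consensus sequence such as TATAAT is derived by aligning many related sites and taking the most frequent base at each position; individual sites then differ from it by a few mismatches. The Python sketch below illustrates both steps on a small, hypothetical set of ungapped -10 regions and scans a hypothetical sequence for its closest match. Real promoter searches usually use position weight matrices and also take the spacing to the -35 region into account, which this sketch does not.

```python
from collections import Counter


def consensus(aligned):
    """Most frequent residue at each column of equal-length, ungapped sequences.
    Ties are broken by whichever residue appears first in the column."""
    if not aligned:
        raise ValueError("need at least one sequence")
    length = len(aligned[0])
    if any(len(s) != length for s in aligned):
        raise ValueError("sequences must have equal length")
    return "".join(Counter(column).most_common(1)[0][0] for column in zip(*aligned))


def mismatches(a, b):
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def best_match(sequence, motif):
    """(start, mismatches) of the closest ungapped match, or None if too short."""
    best = None
    for start in range(len(sequence) - len(motif) + 1):
        score = mismatches(sequence[start:start + len(motif)], motif)
        if best is None or score < best[1]:
            best = (start, score)
    return best


if __name__ == "__main__":
    sites = ["TATAAT", "TATGAT", "TACAAT", "GATAAT", "TATACT"]  # hypothetical
    motif = consensus(sites)
    print(motif)                               # TATAAT
    print(best_match("GCGCTATGATCCG", motif))  # (4, 1)
```

Note that none of the hypothetical sites needs to match the consensus exactly for the consensus itself to emerge, which mirrors how natural promoters vary around TATAAT.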

In the case of RNA, a number of consensus elements
are of great interest. One example is the set of
consensus sequences in RNA transcripts of eukaryotes
and archaea that direct their splicing into mature
mRNAs. Prokaryotic mRNAs have a consensus sequence
preceding the initiation codon. This is called the
‘Shine-Dalgarno sequence’ and is used by a
complementary region of the ribosomal RNA to
distinguish the initiator codon from other methionine
codons. Another case is the 3’ end of tRNA, which has
the consensus sequence CCA. This consensus sequence
is recognized by the aminoacyl-tRNA synthetases, which charge the terminal ribose
with the appropriate amino acid. The individual 
tRNAs are recognized by the appropriate synthetase 
owing to additional consensus features. The ribosomal 
peptidyl transfer site also recognizes the CCA se- 
quence of the tRNAs. 
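The Shine-Dalgarno interaction is an antiparallel RNA duplex, so the rRNA partner of a Shine-Dalgarno sequence is its reverse complement. The Python sketch below checks this for the commonly cited consensus AGGAGG against 5'-GAUCACCUCCUUA-3', taken here as the 3' end of E. coli 16S rRNA; treat both sequences as illustrative assumptions, and the function names as hypothetical. Only perfect Watson-Crick matches are considered, although real pairing tolerates G·U pairs and mismatches.

```python
PAIR = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement_rna(seq: str) -> str:
    """Reverse complement of an RNA sequence written 5' to 3'."""
    return "".join(PAIR[base] for base in reversed(seq.upper()))


def pairing_site(sd: str, rrna_tail: str) -> int:
    """Index in `rrna_tail` (5'->3') where `sd` can form a perfect antiparallel
    Watson-Crick duplex, or -1 if there is none."""
    return rrna_tail.upper().find(reverse_complement_rna(sd))


if __name__ == "__main__":
    sd = "AGGAGG"
    tail = "GAUCACCUCCUUA"  # assumed 3' end of E. coli 16S rRNA
    print(reverse_complement_rna(sd))  # CCUCCU
    print(pairing_site(sd, tail))      # 5
```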

Proteins also have consensus sequences of amino 
acids. This is, e.g., the case for sites of phosphorylation 
in proteins. The pattern recognized by the kinases can 
often be identified as consensus elements in the amino 
acid sequence. One other classical consensus sequence 
in proteins is the GXGXXG motif found in the 
Rossmann fold or nucleotide binding fold at the bind- 
ing site of the nucleotide. Another consensus sequence 
is the leucine zipper, where every seventh amino acid 
residue is a leucine. Such structures are found in
long, dimerizing α-helices in DNA-binding proteins.
Numerous other consensus amino acid sequences are
being identified in proteins, providing an excellent
method for classifying specific folds or different
types of functional sites.
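Simple motifs such as GXGXXG and the heptad leucine repeat can be searched for directly in a protein sequence. The Python sketch below uses a regular expression with a lookahead so that overlapping GXGXXG matches are all reported, and a plain loop for leucines spaced seven residues apart; the example sequences, the function names, and the default of four consecutive leucines are hypothetical choices, and real motif classification relies on profiles and structural context rather than exact patterns like these.

```python
import re

# Lookahead so that overlapping matches are reported; "." stands for X (any residue).
GXGXXG = re.compile(r"(?=(G.G..G))")


def find_gxgxxg(protein: str):
    """List of (start, matched 6-mer) for every GXGXXG occurrence."""
    return [(m.start(), m.group(1)) for m in GXGXXG.finditer(protein.upper())]


def leucine_heptad_starts(protein: str, repeats: int = 4):
    """Start positions i such that residues i, i+7, ..., i+7*(repeats-1) are all leucine."""
    protein = protein.upper()
    span = 7 * (repeats - 1)
    return [i for i in range(len(protein) - span)
            if all(protein[i + 7 * k] == "L" for k in range(repeats))]


if __name__ == "__main__":
    print(find_gxgxxg("MKVAVIGAGSAGLA"))         # [(6, 'GAGSAG')]
    print(leucine_heptad_starts("LKELEDK" * 4))  # [0, 3]
```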

The consensus sequences are conserved between 
species owing to their structural or functional import- 
ance. Likewise these motifs are often used repeatedly 
within a species. They may arise as gene duplications, 
where some part of the macromolecule has diverged 
from the original sequence to fulfil new needs while 
the consensus sequences remain essentially intact, 
forming the structural or functional base for this 
type of molecule. 


See also: Ori Sequences; Shine-Dalgarno 
Sequence; TATA Box 


Conservation Genetics
R Frankham
Conservation genetics is the application of genetics to
reduce the risk of population and species extinctions. It 
deals with genetic factors causing rarity, endangerment, 
and extinction (inbreeding and loss of genetic diver- 
sity), genetic management to minimize these impacts, 
and the use of genetic markers to resolve taxonomic 
uncertainties in threatened species, to understand their 
biology, and to detect illegal hunting or trade in threat- 
ened species. It is an applied discipline that draws on 
evolutionary and molecular genetics. 

The need to conserve species arises because the 
biological diversity of the planet is rapidly being 
depleted as a direct or indirect consequence of 
human actions. An unknown but large number of 
species is already extinct, while many others have 
reduced population sizes that put them at risk. Many 
species now require human intervention to optimize 
their management and ensure their survival. The scale 
of the problem is enormous; 56% of mammals, 58% of 
birds, 62% of reptiles, 64% of amphibians, and 56% 
of fish are categorized as threatened by the World 
Conservation Union (IUCN). 

Four justifications for maintaining biodiversity 
have been advanced: the economic value of biore- 
sources, ecosystem services, aesthetics, and the right 
of living organisms to exist. IUCN recognizes the 
need to conserve biodiversity at three levels: genetic 
diversity, species diversity, and ecosystem diversity. 
Genetics is directly involved in the first two of these. 


What Causes Extinctions? 


The primary factors contributing to decline in the 
numbers within species are habitat loss, introduced 
species, overexploitation, and pollution. Typically 
these factors reduce species to population sizes where 
they are susceptible to accidental (stochastic) effects, 
whether environmental, catastrophic, demographic, or 
genetic (inbreeding depression, loss of genetic vari- 
ation, and accumulation of deleterious mutations). 


Genetics in Conservation Biology 


Sir Otto Frankel, an Austrian-born Australian, was 
largely responsible for the recognition of genetic fac- 
tors in conservation biology. This began only in the 
1970s. Frankel collaborated with, and strongly influenced,
Michael Soulé of the United States, the founding
father of modern conservation biology. Frankel
and Soulé wrote the first book on conservation bio- 
logy that considered genetic factors. 


Genetic Consequences of Small 
Population Size 


Threatened and endangered species have small popu- 
lations. This inevitably results in inbreeding (mating 
together of relatives), reduced reproduction rate and 
survival, loss of genetic diversity, and accumulation 
of deleterious mutations. These increase the risk of 
extinction, so procedures have been devised to mini- 
mize their effects. 

Inbreeding is inevitable in small populations as 
different individuals come to share common ances- 
tors. Inbreeding results in reduced reproduction and 
survival (reproductive fitness) in all well-studied 
populations of naturally outbreeding species of ani- 
mals and plants. This is referred to as inbreeding 
depression. All aspects of reproductive fitness are 
affected adversely by inbreeding. In animals this 
includes litter or clutch sizes, survival, mating ability, 
sperm quality, maternal ability, milk production in 
mammals, and developmental time. However, zoo 
personnel were skeptical that inbreeding depression 
occurred in wildlife. Katherine Ralls and Jon Ballou of 
the National Zoological Park in Washington, DC,
clearly demonstrated the deleterious effects of 
inbreeding. They found that inbred offspring had 
higher juvenile mortality than outbred offspring 
in 41 of 44 mammalian populations they studied 
(Ralls and Ballou, 1982). The effect was large: 
brother-sister matings on average had 33% higher 
juvenile mortality than outbred matings. As this 
represents only one component of the life cycle, the 
overall impact of inbreeding is much greater than this. 
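The link between shared ancestry and inbreeding can be made concrete. The inbreeding coefficient F of an individual equals the coefficient of kinship of its two parents, and kinship can be computed recursively through a pedigree. The Python below is a minimal sketch of that standard recursive method, not a procedure taken from the studies cited above; the pedigree and the individual names A to E are hypothetical, and founders are assumed to be unrelated and non-inbred. It reproduces the textbook value F = 0.25 for the offspring of a brother-sister mating.

```python
from functools import lru_cache

# Hypothetical pedigree: individual -> (sire, dam). Parents are listed before
# their offspring; founders have unknown (None) parents and are assumed unrelated.
PEDIGREE = {
    "A": (None, None),
    "B": (None, None),
    "C": ("A", "B"),   # C and D are full siblings
    "D": ("A", "B"),
    "E": ("C", "D"),   # E is the offspring of a brother-sister mating
}
ORDER = {name: i for i, name in enumerate(PEDIGREE)}


@lru_cache(maxsize=None)
def kinship(x, y):
    """Coefficient of kinship: probability that an allele drawn at random from x
    and one drawn at the same locus from y are identical by descent."""
    if x is None or y is None:
        return 0.0                       # unknown parent: treated as unrelated
    if x == y:
        sire, dam = PEDIGREE[x]
        return 0.5 * (1.0 + kinship(sire, dam))
    if ORDER[x] < ORDER[y]:              # always expand the later-listed individual
        x, y = y, x
    sire, dam = PEDIGREE[x]
    return 0.5 * (kinship(sire, y) + kinship(dam, y))


def inbreeding(x):
    """Inbreeding coefficient F of x: the kinship of its two parents."""
    sire, dam = PEDIGREE[x]
    return kinship(sire, dam)


for name in PEDIGREE:
    print(name, inbreeding(name))
```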
Skepticism about inbreeding depression is now
focused on populations in nature. A growing body 
of evidence has demonstrated inbreeding depression 
affecting wild populations of animals and plants in 
nature. For example, Robert Lacy and colleagues 
from Brookfield Zoo in Chicago demonstrated that 
survival was lower for inbred than outbred native mice 
(Peromyscus) that were reintroduced into their wild 
habitat (Lacy, 1997). 

Small population size elevates the risk of extinction 
due to demographic and environmental fluctuations, 
to catastrophes, and to genetic effects — inbreeding 
and loss of genetic diversity. There has been much 
controversy over the role of genetics in extinctions. 
Ilik Saccheri and colleagues have shown that inbreeding
contributes to the extinction of butterfly populations
in Finland, even when all known ecological
and demographic factors have been accounted for 
(Saccheri et al., 1998). Inbreeding explained 26% of 
the variation in extinction rate in these populations. 
This conclusion is likely to apply more widely as 
there is circumstantial evidence that inbreeding con- 
tributes to the elevated proneness to extinction of 
island populations. 

Captive populations are actively managed to maxi- 
mize their genetically effective sizes so that inbreeding 
and loss of genetic diversity are minimized. Indi- 
viduals are chosen for mating so that the parents are 
those that are least related (kinship is minimized), 
meaning that the level of inbreeding in the next 
generation is minimized, as is the loss of genetic 
diversity. 
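Two quantities behind this kind of management can be sketched in a few lines. The effective size of a breeding group with unequal numbers of males and females is approximately Ne = 4NmNf/(Nm + Nf), so a group founded by one male and three females has an effective size of only 3. The inbreeding coefficient of an offspring equals the kinship of its parents, so choosing the least-related pairs minimizes inbreeding in the next generation. The Python below is a minimal sketch under those assumptions: the kinship values and the names M1, M2, F1 to F3 are hypothetical, and the brute-force search over pairings is only practical for small groups; it is not the algorithm used by any particular zoo program.

```python
import itertools


def effective_size(n_males, n_females):
    """Approximate effective size of a breeding group with an unequal sex ratio."""
    return 4.0 * n_males * n_females / (n_males + n_females)


# Hypothetical kinship coefficients between candidate breeders.
KINSHIP = {
    frozenset({"M1", "F1"}): 0.25,
    frozenset({"M1", "F2"}): 0.0,
    frozenset({"M1", "F3"}): 0.125,
    frozenset({"M2", "F1"}): 0.0625,
    frozenset({"M2", "F2"}): 0.25,
    frozenset({"M2", "F3"}): 0.125,
}


def least_related_pairs(males, females, kinship):
    """Try every one-to-one male-female pairing (requires len(males) <= len(females))
    and return the pairing with the lowest total kinship, i.e. the lowest
    expected inbreeding among the resulting offspring."""
    best_pairs, best_total = None, float("inf")
    for chosen in itertools.permutations(females, len(males)):
        pairs = list(zip(males, chosen))
        total = sum(kinship[frozenset(pair)] for pair in pairs)
        if total < best_total:
            best_pairs, best_total = pairs, total
    return best_pairs, best_total


print(effective_size(1, 3))
print(least_related_pairs(["M1", "M2"], ["F1", "F2", "F3"], KINSHIP))
```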


Loss of Genetic Diversity 


Genetic diversity is required for populations to evolve 
in response to environmental changes, such as climate 
change and new or altered diseases. Genetic diver- 
sity is one of three levels of biological diversity re- 
cognized by the IUCN as deserving conservation. 
The major factor involved in loss of genetic diver- 
sity is small population size. Inbreeding and loss of 
genetic diversity go hand-in-hand in small popula- 
tions of naturally outbreeding species. The sampling 
of gametes that occurs in the reproduction of finite 
populations results in loss of alleles and reduced evo- 
lutionary potential. Species with large populations 
have, on average, higher levels of genetic diversity 
than species with smaller population sizes, and larger 
populations within species typically have more gen- 
etic diversity than smaller populations. Endangered 
species, which by definition have small population 
sizes, generally have lower levels of genetic diversity 
than related nonendangered species. For example, 
the endangered northern hairy-nosed wombat, which
exists in a single population of only about 70 individuals
in Queensland, Australia, has less genetic diversity
than its nearest relative, the southern hairy-nosed
wombat.
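The dependence of genetic diversity on population size can be expressed with the standard idealized-population result: each generation a fraction 1/(2Ne) of heterozygosity is expected to be lost, so H_t = H_0 (1 - 1/(2Ne))^t, and the inbreeding coefficient rises correspondingly, F_t = 1 - H_t/H_0. The Python below is a sketch that assumes an ideal (Wright-Fisher) population with no mutation, migration, or selection; the population sizes and the 20-generation horizon are illustrative values, not data for any species mentioned here.

```python
def heterozygosity(h0, ne, t):
    """Expected heterozygosity after t generations in an ideal population of size ne."""
    return h0 * (1.0 - 1.0 / (2.0 * ne)) ** t


def inbreeding_coefficient(ne, t):
    """Expected F after t generations, starting from an outbred population (F0 = 0)."""
    return 1.0 - (1.0 - 1.0 / (2.0 * ne)) ** t


# Proportion of the initial heterozygosity retained after 20 generations.
for ne in (10, 50, 500):
    print(f"Ne = {ne:>3}: H20/H0 = {heterozygosity(1.0, ne, 20):.3f}")
```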


Accumulation of New Deleterious 
Mutations 


Accumulation of new deleterious mutations has recently
been recognized as potentially important in the
decline of small populations. Selection against
mildly deleterious alleles is ineffective in small
populations, and so some will, by chance, drift to
fixation, resulting in reduced reproductive fitness.
There can be little doubt that this is an important
factor in small populations of asexual species and
habitual self-fertilizing species. However, its role in
sexually reproducing species is controversial.
Theoretical studies have suggested that it should be
important in populations with genetically effective
sizes of up to 1000. However, experiments with fruit
flies have not supported these concerns.
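Why selection fails in small populations can be seen from Kimura's diffusion approximation for the probability that a new mutation, starting at frequency p = 1/(2N), eventually becomes fixed: u = (1 - e^(-4 Ne s p)) / (1 - e^(-4 Ne s)). The Python below is a sketch that assumes genic (additive) selection with coefficient s per allele (negative for a deleterious allele) and, by default, N = Ne. With the illustrative value s = -0.001, a mutation in a population of Ne = 10 fixes almost as often as a neutral one, whereas with Ne = 1000 its fixation probability falls to under a tenth of the neutral value.

```python
import math


def fixation_probability(ne, s, n=None):
    """Kimura's approximation for the fixation probability of a single new
    mutant (initial frequency p = 1/(2N)) under genic selection s.

    s < 0 means the allele is deleterious. Very large 4*ne*|s| can overflow."""
    n = ne if n is None else n
    p = 1.0 / (2.0 * n)
    if s == 0:
        return p                               # neutral case: u = p
    # (1 - e^-x) / (1 - e^-y), written with expm1 for numerical accuracy
    return math.expm1(-4.0 * ne * s * p) / math.expm1(-4.0 * ne * s)


s = -0.001                                     # mildly deleterious (illustrative)
for ne in (10, 100, 1000):
    relative = fixation_probability(ne, s) / (1.0 / (2.0 * ne))
    print(f"Ne = {ne:>4}: fixation relative to a neutral allele = {relative:.3f}")
```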


Genetic Deterioration in Captivity 


Many high-profile species require captive breeding 
to save them from extinction as their natural envir- 
onment is so hostile or degraded that they cannot 
survive in the wild. For example, California condors, 
Przewalski’s horse, Père David’s deer, black-footed
ferrets, and a number of plants exist now or have had 
periods when they existed only in captivity. Zoos, 
wildlife parks, botanic gardens, and arboretums are 
involved in captive-breeding programs for hundreds 
of species. Generally, these programs aim to even- 
tually reintroduce the species into its natural habitat 
when conditions are again suitable; the scenario envis- 
aged is an eventual reduction in the human popula- 
tion, releasing habitat suitable for wild plants and 
animals. 

Four deleterious genetic changes occur in captivity: 
inbreeding, loss of genetic diversity, mutational accu- 
mulation, and genetic adaptation to captivity. The first 
three of these have been described above, and manage- 
ment aims to minimize their impacts. Genetic adapta- 
tion to the captive environment occurs because natural 
selection operates in captivity to improve the ability of 
populations to reproduce and survive in the captive 
environment. However, this results in a corresponding 
reduction in the ability of the population to reproduce 
and survive in its natural environment. While this 
issue has been recognized since Darwin’s work in the 
nineteenth century, it has until recently been considered
a minor problem. Evidence from laboratory species
(Drosophila), insects raised in captivity for release
in biological control programs, and fish indicates that
populations may suffer serious deterioration from 
genetic adaptation. In contrast to the other causes of 
genetic deterioration, this is worse in larger popula- 
tions, rather than in small ones. While it is not com- 
mon practice to do so, this problem can be minimized 
by fragmenting captive populations among, for ex- 
ample, different zoos, and only occasionally exchang- 
ing individuals to control inbreeding. 


Island Populations 


A recurring theme in conservation genetics is the parallel
between populations of conservation concern
and island populations. Endangered species in captivity
are akin to island populations. Fragmented populations
often take on the characteristics of island populations,
even where they previously existed as a continuous
population with immigration across the range. The
disturbing implication of the island analogy is that island
populations are much more prone to extinction than
mainland populations.

Island populations typically experience environments
different from those of their mainland counterparts,
often lacking many of their predators, parasites,
and diseases. The evolution of endangered species in
captivity has features that are akin to this. Island 
populations are often founded from a small number 
of individuals so they have often experienced popula- 
tion size bottlenecks. Further, the populations are 
typically smaller than their mainland counterparts. 
Both these features are shared with endangered species 
in captivity. As a consequence of bottlenecks at foun- 
dation and small size, island populations typically 
have less genetic diversity than mainland populations 
and are often inbred and may have lowered reproduct- 
ive fitness. 


Resolving Taxonomic Uncertainties 


If the classification (taxonomy) of threatened species
is incorrect, threatened species may be denied protection,
resources may be wasted on populations of
nonthreatened species, or different species or subspecies
may be hybridized, with adverse effects on their
reproductive fitness. Genetic markers can usually be
used to resolve the taxonomic status where it is in
question. For example, the tuatara, a unique New Zealand
reptile, had been considered to be a single species, but
recent genetic studies using electrophoresis of proteins
demonstrated that it consists of two species,
one of which had been given no special conservation
protection. Conversely, the endangered colonial
gopher from Georgia, USA, was shown to be
indistinguishable from the common pocket gopher in that
region. Several species of salmonid fish, such as bull
and cutthroat trout, have been shown to hybridize
with introduced trout.

Confirmation of the status of populations within 
species has allowed additional individuals to be added 
to populations of endangered species founded from 
few individuals. For example, the Mexican wolf is 
probably extinct in the wild, and the only certified 
‘pure’ population traces to only three or four found- 
ers. Two other populations of presumed Mexican 
wolves exist, but it was not known whether they 
were ‘contaminated’ with genes from dogs or other
carnivores. Analyses using microsatellite markers
(short tandem repeats in DNA with variable numbers
of repeats, e.g., 10 AC repeats versus 12) established
that these two other populations were ‘pure’ Mexican
wolf (Hedrick et al., 1997). It has been recommended
that the three populations be combined to minimize
inbreeding and maximize genetic diversity in Mexican
wolves. A similar situation occurred in Speke’s
gazelle. The US captive population of this species
was based on only one male and three females, so it
quickly became inbred. Bringing in additional founders
from the wild was not possible, as the species comes
from a region of Africa subject to warfare and may
be extinct in the wild. Captive animals from
Qatar were shown to belong to the same species, 
and to be relatively unrelated to the US population. 
An animal from Qatar has been added to the US 
population. 
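The microsatellite alleles described above differ in the number of tandem copies of a short motif. In practice such alleles are usually scored as PCR fragment lengths, but when a sequence is available the repeat number can be counted directly. The Python below is a minimal sketch that does this for a dinucleotide motif such as AC; the function name and the example sequences are hypothetical.

```python
import re


def repeat_count(sequence, motif="AC"):
    """Largest number of uninterrupted tandem copies of motif in sequence."""
    motif = motif.upper()
    runs = re.findall(f"(?:{re.escape(motif)})+", sequence.upper())
    return max((len(run) // len(motif) for run in runs), default=0)


# Hypothetical alleles with 10 and 12 AC repeats between the same flanking bases.
allele_10 = "GGT" + "AC" * 10 + "TTG"
allele_12 = "GGT" + "AC" * 12 + "TTG"
print(repeat_count(allele_10), repeat_count(allele_12))
```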


Use of Genetic Markers to Understand 
the Biology of Endangered Species 


Genetic markers contribute in a wide variety of ways 
to the conservation of species by helping to reveal 
essential features of their biology. Capture of many 
species of wild animals is difficult and stressful to the 
animals, sometimes resulting in death. Nondestructive 
sampling and genetic typing are now possible since
the advent of the polymerase chain reaction (PCR),
which can amplify DNA from as little as a single copy
of the target sequence. PCR has allowed valuable endangered species
to be studied without having to subject them to stress 
by capturing them. For example, hair was collected 
from endangered northern hairy-nosed wombats in 
Australia by putting frames with sticky tape in front 
of their burrows. DNA was extracted from this 
hair and used to examine dispersal and mating pat- 
terns. DNA for such analyses can be obtained from 
hair, skin, feathers, urine, feces, eggshells, and sub- 
fossils. 

Males and females are often externally
indistinguishable in birds, resulting in cases where two birds of
the same sex have been placed together in unsuccessful
attempts to breed. Genetic markers on the sex
chromosomes can be used to distinguish the sexes in bird
species.

Population structure and dispersal rates are import- 
ant parameters required for conservation purposes. 
These can be worked out indirectly by using genetic 
markers. For example, dispersal patterns of female 
loggerhead turtles have been determined from studies 
of mitochondrial DNA (mtDNA), which is maternally
inherited. 
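Dispersal is often inferred from the degree of genetic differentiation among populations. One classical, heavily assumption-laden route is Wright's island model, under which F_ST ≈ 1/(1 + 4Nm) for a nuclear locus at equilibrium, where Nm is the number of migrants per generation; for a maternally inherited haploid marker such as mtDNA the corresponding approximation is F_ST ≈ 1/(1 + 2N_f m), with N_f the number of females. The text does not say which method the loggerhead turtle studies used, so the Python below is only a sketch of this textbook approximation, and the F_ST values in it are hypothetical.

```python
def migrants_per_generation(fst, inheritance="nuclear"):
    """Invert Wright's island-model approximation to estimate migrants per generation.

    nuclear (diploid):       F_ST ~ 1 / (1 + 4 N m)
    mtDNA (haploid, female): F_ST ~ 1 / (1 + 2 N_f m)
    Assumes equilibrium, equal deme sizes and no selection or mutation."""
    if not 0.0 < fst <= 1.0:
        raise ValueError("F_ST must be in (0, 1]")
    factor = {"nuclear": 4.0, "mtDNA": 2.0}[inheritance]
    return (1.0 / fst - 1.0) / factor


for fst in (0.05, 0.2, 0.5):                   # hypothetical F_ST values
    print(fst, migrants_per_generation(fst), migrants_per_generation(fst, "mtDNA"))
```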

Nocturnal and shy species are difficult to census. 
Scat (feces) counts can be used to census such species. 
However, scats of the species of interest must be 
distinguished from other species with similar feces. 
Analyses of amplified segments of mtDNA have 
been used to distinguish scats of the endangered San 
Joaquin kit fox from those of red foxes, gray wolves, 
coyotes, and domestic dogs. 
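Species identification of this kind amounts to comparing an amplified sequence with reference sequences from each candidate species. The Python below is a minimal sketch that assigns a query to the reference with the fewest mismatches, assuming the sequences are already aligned to the same length. The reference fragments are short made-up strings, not real mtDNA sequences, and a real study would also need a threshold for rejecting poor matches and a way to handle ties.

```python
# Hypothetical, made-up aligned reference fragments (not real mtDNA sequences).
REFERENCES = {
    "kit fox": "ACGTTAGCCTAGGATC",
    "red fox": "ACGTTAGCTTAGCATC",
    "coyote":  "ACCTTGGCTTAGCATG",
    "dog":     "ACCTTGGCTAAGCATG",
}


def hamming(a, b):
    """Number of mismatched positions between two aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    return sum(x != y for x, y in zip(a, b))


def assign_species(query, references=REFERENCES):
    """Return (species, mismatches) for the closest reference sequence."""
    scores = {species: hamming(query, seq) for species, seq in references.items()}
    best = min(scores, key=scores.get)
    return best, scores[best]


print(assign_species("ACGTTAGCCTAGGATT"))    # hypothetical scat-derived sequence
```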

Endangered species are protected from hunting and 
trade by laws within countries and by CITES (the
Convention on International Trade in Endangered
Species). However, illegal hunting may be difficult to
detect. A substantial proportion of the whale meat in
Japan and Korea has been shown to come from protected
whale and dolphin species, rather than from minke
whales, which are subject to legal scientific whaling.
Scott Baker and colleagues amplified a portion of
mtDNA from samples of whale meat purchased in
those countries, then took the amplified DNA back to
their home laboratories and sequenced it. They could not
take the whale meat samples themselves out of Japan and
Korea without risking violation of CITES rules.


Methodology in Conservation Genetics 


An important feature of conservation genetics is the 
methodology used for advancing the field, and for 
resolving issues. The normal means for testing theory 
and resolving issues is to use replicated experiments 
with controls. However, endangered species are 
unsuitable for doing this as they are typically slow 
breeders, expensive to keep, and present in low numbers.
Two approaches are used to resolve issues in
conservation genetics: the use of laboratory species
and combined analyses of small data sets from many
wild species (meta-analyses). Laboratory species, such
as fruit flies (Drosophila), flour beetles (Tribolium),
and mice, have long been used to investigate related
problems, such as inbreeding and the effects of small
population size, in evolutionary genetics and animal
breeding, because the genetics of all outbreeding species
is similar. Such studies are now performing a similar
role in conservation genetics, often in concert
with meta-analyses. An increasing number of issues
in conservation genetics are being resolved using
meta-analyses. For example, a meta-analysis of small
data sets from 44 populations of mammals was used to
establish that inbreeding had deleterious effects in
captive populations, as described above. Similarly,
the relationship between small population size and
low genetic diversity has been resolved using a
meta-analysis.
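Combining many small data sets can reveal an effect that no single one establishes. The simplest version is a sign test on the direction of the effect. The Python below is a sketch that asks how likely it is that inbred offspring would show higher juvenile mortality in at least 41 of 44 populations if inbreeding had no effect (each direction equally likely). This vote-counting illustration is not the analysis used in the original study; it simply shows why the 41-of-44 result is so hard to attribute to chance (the one-sided probability is roughly 8 × 10^-10).

```python
from math import comb


def sign_test_p(successes, n):
    """One-sided exact sign test: P(X >= successes) for X ~ Binomial(n, 0.5)."""
    return sum(comb(n, k) for k in range(successes, n + 1)) / 2 ** n


# 41 of 44 mammal populations showed higher juvenile mortality in inbred offspring.
p = sign_test_p(41, 44)
print(f"P(at least 41 of 44 by chance) = {p:.1e}")
```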
Conservative
Recombination 
Conservative recombination is the breakage and
rejoining of DNA strands without the synthesis of 
new stretches of DNA. 


See also: DNA Repair 


Conserved Synteny 
See: Synteny (Syntenic Genes) 


Consomic 
L Silver 
Consomic strains of animals are a variation on con-
genic strains in which a whole chromosome — rather 
than one local chromosomal region — is backcrossed 
from a donor strain onto a recipient background. 
Consomic production with the Y chromosome is 
readily carried out because of the lack of recom- 
bination along this chromosome in male animals. For 
consomic production with other chromosomes, it is 
necessary to select animals at each generation with the 
use of DNA markers that can demonstrate the trans- 
mission of the whole chromosome intact. Like con- 
genics, consomics are produced after a minimum of 10 
backcross generations. Backcrossing to obtain con- 
somics for the Y chromosome must be carried out in a 
single direction: males that contain the donor chromo- 
some are always crossed to inbred females of the 
recipient strain. 
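The reason for the 10-generation minimum can be sketched with a simple expectation. Counting the F1 as generation N1, each further backcross to the recipient strain halves the expected proportion of donor genome on chromosomes other than the one being selected, giving (1/2)^n at generation Nn. The Python below assumes that counting convention and ignores sampling variation between individuals; under it, about 0.1% of the unselected genome is still expected to be of donor origin at N10.

```python
def donor_fraction(n):
    """Expected fraction of donor genome on unselected chromosomes at backcross
    generation N_n, counting the F1 as N1 (assumed convention)."""
    if n < 1:
        raise ValueError("n must be at least 1 (N1 is the F1)")
    return 0.5 ** n


for n in (1, 2, 5, 10):
    print(f"N{n}: {100 * donor_fraction(n):.3f}% donor genome expected")
```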


See also: Congenic Strain 


Constant Regions 
Constant regions are the nonvariable regions of im-
munoglobulin molecules encoded by C genes. Each of 
the heavy and light chain classes of immunoglobulin 
has a different C gene. 


See also: C Genes; Immunoglobulin Gene 
Superfamily 


Constitutive Expression 


E Thomas 
When an organism faces an environmental change, it
will generally express a different set of genes in order
to respond to the new conditions. However, there are
some basic housekeeping genes that are expressed
under all conditions. These include the genes required
for transcription, translation, and other fundamental
functions. Genes that are always expressed are said to 
be ‘constitutively expressed.’ They are primarily the 
genes that are required for a cell’s normal growth. 

It is sometimes useful in biotechnology to change 
the promoter of a gene so that the gene becomes con- 
stitutively expressed. This can be done by moving 
the promoter of a gene that is naturally constitutively 
expressed upstream of the gene of interest. In this 
way, the gene can be studied under conditions where 
it would not normally be expressed, and large quanti- 
ties of the protein encoded by the gene can generally 
be produced unless such overproduction is lethal to 
the host. 


See also: Biotechnology; Gene Expression 


Constitutive 
Heterochromatin 


See: Heterochromatin 


Constitutive Mutations 
Constitutive mutations are mutations causing the
unregulated expression of genes that are normally 
regulated. 


See also: Gene Expression; Mutation 


Contig 
L Stubbs 
‘Contig’ is a term that is used to describe contiguous
sets of overlapping clones. More recently, contig has 
also been used to describe an assembled set of over- 
lapping DNA sequences; these sequences are often 
themselves derived from contigs of overlapping clones. 
A contig can be built from any type of clone or se- 
quence data set: overlapping cDNA sequences, plasmid 
subclones, cosmids, bacterial artificial chromosomes 
(BACs) or yeast artificial chromosomes (YACs) can 
be organized to form a contig. The most common use 
of this term refers to a contiguous set of overlapping 
genomic clones. For example, BAC clones isolated because
they contain two closely spaced markers
may overlap to form a contig; by design of the process, 
clones identified through chromosome walking pro- 
cedures overlap, and therefore form a contig. Over- 
lapping clones are most often identified by shared 
content of markers, for example, STS markers or 
end-clone sequences. Clone overlap can also be deter- 
mined by the process of restriction enzyme finger- 
printing, which identifies matches between two clones 
by similar patterns of DNA fragment lengths that are 
generated by specific restriction enzymes. 
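At the sequence level, building a contig means finding overlaps between the ends of reads and merging them. The Python below is a sketch of a greedy strategy that repeatedly merges the two reads with the longest suffix-prefix overlap. It is a teaching illustration only, assuming error-free reads and no repeated sequence; the reads are hypothetical, and production assemblers use more robust methods.

```python
def overlap(a, b, min_len):
    """Length of the longest suffix of a that is also a prefix of b (0 if < min_len)."""
    for length in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:length]):
            return length
    return 0


def greedy_assemble(reads, min_len=3):
    """Greedily merge reads into contigs; returns a list of contig sequences."""
    reads = list(dict.fromkeys(reads))                    # remove exact duplicates
    reads = [r for r in reads if not any(r != o and r in o for o in reads)]
    while len(reads) > 1:
        best_len, best_i, best_j = 0, None, None
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    length = overlap(a, b, min_len)
                    if length > best_len:
                        best_len, best_i, best_j = length, i, j
        if best_len == 0:
            break                                         # no overlaps left
        merged = reads[best_i] + reads[best_j][best_len:]
        reads = [r for k, r in enumerate(reads) if k not in (best_i, best_j)]
        reads.append(merged)
    return reads


reads = ["ATTAGACCTG", "AGACCTGCCGG", "CTGCCGGAATAC"]    # hypothetical reads
print(greedy_assemble(reads))
```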


See also: Chromosome Walking; DNA Cloning 


Continuous Variation 
P Sham 
Continuous variation refers to individual differences
of a quantitative rather than qualitative nature. Such 
traits can be quantified numerically, so that each indi- 
vidual in the population can be characterized by a 
certain numerical trait value. The frequency distribu- 
tion of the trait in a population can be examined by 
plotting a histogram of the trait values in a random 
sample. If the distribution of trait values in the popu- 
lation has no clear discontinuities, then the trait is said 
to be continuous. Some continuous traits have a sym- 
metrical, unimodal, bell-shaped frequency distribu- 
tion, while others have a skewed or multimodal 
distribution. Often interest is focused not on one but 
several continuous traits, and then it is necessary to 
visualize the joint distribution of these traits by scatter 
plots or other graphical methods. 

The genetics of continuous variation is the subject 
of a branch of genetics known as quantitative genetics. 
This branch of genetics is based on the recognition 
that, although genes are discrete units, the aggregate 
effects of many genes can take on a continuous dis- 
tribution. The involvement of multiple genes each of 
small effect is called polygenic inheritance, as opposed to
monogenic inheritance for Mendelian traits. Quanti- 
tative genetics has played an important role in agricul- 
tural animal and plant breeding, producing stocks 
with favorable characteristics (such as milk yield in 
dairy cattle). The traditional goal of quantitative
genetics is to uncover and quantify the underlying 
genetic model for one or more continuous traits, and 
to use this model to generate predictions for the effect- 
iveness of different breeding programs. The genetic 
model is typically characterized by a number of 
genetic and environmental sources of variation that
are in turn characterized by both their relationships 
to each other, and by their pattern of effects on the 
continuous traits. The relative contributions of the 
different components of variation in the model are 
typically estimated from summary statistics (such as 
means, variances, and covariances) of the trait values 
in the offspring of elaborately designed crosses. Of 
most interest are additive genetic components that, 
as the name implies, have effects on the continuous 
traits that are additive. Variations due to additive com- 
ponents are correlated between relatives to the same 
extent as the genetic overlap between the relatives. 
The presence of substantial additive genetic compon- 
ents means that offspring will tend to closely resemble 
their parents. This forms the basis of efficient selective 
breeding programs. 
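The claim that many discrete genes can produce a continuous distribution is easy to check by simulation. The Python below is a sketch assuming a purely additive model in which each of the 2 × n_loci allele copies independently carries a '+' allele (frequency 0.5 by default) worth one unit, optionally plus a normally distributed environmental deviation. With one locus only three trait classes exist; with 20 loci and a little environmental noise the histogram becomes smooth and bell-shaped. All parameter values are illustrative assumptions.

```python
import random
from collections import Counter


def simulate_trait(n_loci, n_individuals, allele_freq=0.5, env_sd=0.0, seed=1):
    """Additive polygenic model: each '+' allele (out of 2 * n_loci copies)
    adds one unit; a normal environmental deviation with sd env_sd is added."""
    rng = random.Random(seed)
    values = []
    for _ in range(n_individuals):
        genetic = sum(rng.random() < allele_freq for _ in range(2 * n_loci))
        values.append(genetic + rng.gauss(0.0, env_sd))
    return values


def text_histogram(values, bins=12, width=40):
    """Print a crude text histogram of the values."""
    lo, hi = min(values), max(values)
    step = (hi - lo) / bins or 1.0
    counts = Counter(min(int((v - lo) / step), bins - 1) for v in values)
    top = max(counts.values())
    for b in range(bins):
        print(f"{lo + b * step:7.2f} | {'#' * round(width * counts[b] / top)}")


for loci in (1, 20):
    print(f"{loci} locus/loci:")
    text_histogram(simulate_trait(loci, 2000, env_sd=0.5))
```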

In humans, quantitative genetics is limited by the 
need to use naturally occurring crosses. Extensive 
uses are made of twins and adoptees, in order to 
untangle genetic and environmental sources of vari- 
ation. The need to rely on natural variation and covari- 
ation has also necessitated the use of the sophisticated 
statistical technique of linear structural equation mod- 
eling. In humans, quantitative genetics has application 
in the prediction of illnesses that can be regarded 
as representing the extremes of certain continuous 
variables. Examples are hypertension (high blood 
pressure) and diabetes (high blood glucose level). If it 
is possible to predict one’s genetic risk of developing 
an illness, then it may be possible to target preventa- 
tive measures to high-risk groups. Estimates of the 
relative importance of different genetic and environ- 
mental sources of variation may also help to provide 
directions for research into specific causative factors. 
For example, if quantitative genetic analysis indicates 
a substantial environmental component shared by
siblings, then further research would be directed toward
parenting style and other factors likely to be shared by
siblings.
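A simpler relative of the structural equation models mentioned above is Falconer's classical decomposition of twin correlations, swapped in here for illustration. Assuming additive genetic effects, equally shared environments for identical (MZ) and fraternal (DZ) twins, and no assortative mating, the additive genetic share is a² = 2(r_MZ - r_DZ), the shared (common) environment c² = 2r_DZ - r_MZ, and the unique environment e² = 1 - r_MZ. The Python below implements these formulas; the twin correlations used are hypothetical. A large c² is the kind of result that would point research toward family-level factors such as parenting.

```python
def falconer(r_mz, r_dz):
    """Falconer-style estimates from MZ and DZ twin correlations.

    Returns (a2, c2, e2): additive genetic, shared environment, unique environment.
    With real data the estimates can fall outside [0, 1]."""
    a2 = 2.0 * (r_mz - r_dz)
    c2 = 2.0 * r_dz - r_mz
    e2 = 1.0 - r_mz
    return a2, c2, e2


# Hypothetical twin correlations for a continuous trait.
a2, c2, e2 = falconer(r_mz=0.8, r_dz=0.5)
print(f"a2 = {a2:.2f}, c2 = {c2:.2f}, e2 = {e2:.2f}")
```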

In recent years, the study of continuous variation 
has been revolutionized by developments in mo- 
lecular genetics. The ability to analyze DNA se- 
quences in individuals has provided genetic markers 
that can be used to map the genes that determine a 
quantitative trait. Such genes are called quantitative 
trait loci (QTLs). The identification and charac- 
terization of QTLs relevant to common diseases pro- 
mise to provide important leads to the development of 
new methods of prevention and treatment of these 
diseases. 


See also: Additive Genetic Variance; 
Multifactorial Inheritance; QTL (Quantitative 
Trait Locus) 


Contractile Ring 
A contractile ring is a ring of actin microfilaments that
forms around the equator at the end of mitosis and 
diminishes in diameter, probably by contraction, thus 
pinching the daughter cells apart. 


See also: Mitosis 


Controlling Elements 


P Lu 
‘Controlling elements,’ when used in the genetics lit-
erature, refers to maize transposons. More recently, 
‘X-controlling element’ (X-ce) refers to a locus found 
on the mouse X chromosome that influences the 
choice of X chromosome inactivation; this is not likely 
to be a transposon. 


First Report of Transposable Genetic 
Elements 


‘Controlling element’ contains two words that seem 
fundamental. This expression describes observations 
in the variation and variegation in the color of kernels 
and leaves of maize. These observations were made 
by Marcus Rhoades in 1938, and a parallel system
was observed by Barbara McClintock in the 1950s. 
The basic observation was the high reversion rate in 
some circumstances of mutations, as well as the non- 
Mendelian distribution of phenotypes in the progeny. 
Those original observations are understood today in 
the context of transposons and insertion sequences.
Descriptions of controlling elements and their various
related observable phenotypes can easily confuse the
reader. The following definitions should help relate
them to the vocabulary we associate with transposons today:


Target gene: the gene or site where transposons 
insert. Usually it will express some visible or select- 
able phenotype. 

Receptor element: the transposon DNA sequence 
itself. 

Regulator gene: the gene that codes for a site-specific
protein (e.g., transposase) that can move the transposon
sequence, which is flanked by appropriate
inverted and/or direct repeat DNA sequences.


The controlling element is made up of the receptor 
element and the regulator gene. 

Nonautonomous elements: elements whose insertion
into a target gene can be reversed only if a regulator
gene is present at another, nonadjacent genomic
location, e.g., a transposon that has lost its transposase.

Autonomous elements: the elements that create 
disruptions in target genes which can appear to cor- 
rect themselves without the help of regulator genes. 
Simply put, autonomous controlling elements carry 
their own regulator genes, or transposase. 


Other Considerations 


It should be noted that these complex genetic phe- 
nomena were first discovered in maize for the same 
reason that our understanding of genetics started with 
peas. This is a restatement of the fact that horticul- 
ture preceded animal husbandry. Plants are sessile, are 
visible to the naked eye, and are easy to store and 
catalog. Genetics depends on the analysis of offspring 
from breeding parents of known genetic traits. Every 
step of the process in this analysis is simpler with 
plants. 

The existence of transposons and the notion that 
they are parasitic or selfish DNA remains an evolu- 
tionary riddle. 


Further Reading 

Fedoroff NV (1999) The suppressor-mutator element and the
evolutionary riddle of transposons. Genes to Cells 4: 11-19.

Griffiths AJF, Miller JH, Suzuki DT, Lewontin RC and Gelbart
WM (2000) An Introduction to Genetic Analysis, 7th edn, pp.
602-605. New York: WH Freeman.

Heard E, Clerc P and Avner P (1997) X-chromosome inactivation
in mammals. Annual Review of Genetics 31: 571-610.

Keller EF (1983) A Feeling for the Organism: The Life and Work of
Barbara McClintock. San Francisco, CA: WH Freeman.

Lewin B (2000) Genes VII, pp. 473-479. Oxford: Oxford
University Press.


See also: Insertion Sequence; McClintock, 
Barbara; Transposable Elements; 
X-Chromosome Inactivation 


Convergent Evolution 
J Read and S Brenner 
Convergent evolution is the gradual mutation of two
or more genes, not derived from a common ancestor,
resulting in the production of similar DNA sequences.
This generally occurs as a result of gene products 
acquiring similar functions. 


See also: Evolution of Gene Families 


Conversion Gradient 
F W Stahl 
Meiotic Gene Conversion


In meiosis, alleles at a given locus are normally segre- 
gated into the haploid products in such a way that 
two cells carry one allele and two carry the other (2:2 
segregation). Since each haploid cell carries genetic 
information twice, by virtue of the duplex nature of 
DNA, such normal segregation is often called 4:4. 
Of the occasional deviations from normality, the 
most common are 6:2 (full conversion, in which 
three haploid cells carry one allele, and one carries 
the other) and 5:3 (half conversion, in which two 
haploid cells carry one allele, one carries the other, 
and one carries a heteroduplex, in which the com- 
plementary DNA strands have information from 
different parents). 
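The segregation classes can be stated precisely by recording, for each of the four meiotic products, the allele carried on each of its two DNA strands. The Python below is a sketch that assumes this representation (the tuple layout and the allele labels "A" and "a" are hypothetical) and counts the eight strand-level alleles; a product whose two strands differ is a heteroduplex. Aberrant 4:4 (ab4:4), in which the allele ratio is normal but some products are heteroduplex, is classified separately. Ratios are reported with the majority allele first.

```python
from collections import Counter


def classify_tetrad(products):
    """products: four (strand1, strand2) allele pairs, one per meiotic product.
    Returns '4:4', 'ab4:4', '5:3', '6:2', etc. (majority allele first)."""
    if len(products) != 4:
        raise ValueError("expected four meiotic products")
    counts = Counter(allele for product in products for allele in product)
    if len(counts) > 2:
        raise ValueError("expected at most two alleles")
    major = max(counts.values())
    minor = 8 - major
    heteroduplex = sum(1 for a, b in products if a != b)
    if major == 4:
        return "ab4:4" if heteroduplex else "4:4"
    return f"{major}:{minor}"


print(classify_tetrad([("A", "A"), ("A", "A"), ("A", "a"), ("a", "a")]))
```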


Gradient 


For some genes, the meiotic conversion frequency for 
different genetic markers (mutations) within the same 
gene varies depending on the position of the marker 
within the gene. Commonly, conversion frequencies 
are relatively high near one end of the gene and fall 
monotonically to a lower value at the other end (this 
is known as the conversion gradient, often called 
‘polarity gradient,’ defining a segment of a chromo- 
some as a polaron). 

Conversion gradients can be detected indirectly, as 
well, by examining random haploid products from 
heteroallelic meioses. Wild-type recombinants from 
such crosses are approximately half the time parental 
type with respect to markers that flank the hetero- 
alleles. If the mutant site in one heteroallele is more 
frequently converted to wild-type than is the other, 
the two parental types will be unequally represented. 
The gradient is revealed by comparing the results 
of several such crosses between various heteroalleles 
and noting that the more frequently converted site is 
always the one located toward the same end of the 
gene. 
Origin of the Gradient


The conversion gradient implies that meiotic recombination
events are focused around hot spots. In Saccharomyces
cerevisiae, for which studies on eukaryotic recombination
are most detailed, the high conversion rate is usually at
the end of the gene adjacent to the transcription promoter
(the 5’ end of the gene). These promoters are hot spots
for recombination by virtue of being regions that undergo
a high rate of meiosis-specific, enzymatically catalyzed
double-strand DNA breaks.

Figure 1 Model for a meiotic conversion gradient based on heteroduplex rejection. Of the four chromatids in a
meiotic bivalent, only the pair of chromatids involved in the illustrated recombination event are shown. Arrowheads
signify 3’ ends of polynucleotide strands. Broken lines indicate newly synthesized DNA. A chromatid (A) cut on both
polynucleotide strands (B) is repaired with the aid of a homolog acting as jig and template. The repair steps involve
resection of the 5’-ended strand (C; shown here only on one side of the break), followed by invasion of the homolog
(D). Transfer of the invading strand into the homology may be extended (H) or may be interrupted when the
mismatch-repair enzyme system recognizes a heteroduplex, stops the strand transfer (E), and resolves the
heteroduplex (F). Examples of meiotic segregation ratios are indicated (G, K). On the right, repair of a mismatch (J)
has resulted in a full conversion (6:2). An unrepaired mismatch to its left results in half conversion (5:3 segregation).
Outward sliding (branch migration) of the Holliday junction (I) can result in aberrant 4:4 segregation (ab4:4, in which
the ratio of alleles is normal but two of the chromatids are heteroduplex) when the mismatch repair system fails both
to reverse formation of a resulting mismatch and to repair it.


On both sides of the double-strand break, nucleotides
are removed from the 5’-ended strand. The exposed
3’-ending single strands invade the homolog and prime
DNA synthesis, using the homolog as template. This
replaces the lost DNA and results in half conversion
of markers located in this region of primary
heteroduplex. Such markers may be subjected
to mismatch repair, with full conversion as a conse- 
quence (Figure |). 

The conversion gradient in S. cerevisiae may be a 
consequence of the following postulated features of 
the double-strand-break repair event: 


1. The extent of degradation of the 5’-ended chain is 
variable (with a mean length of several hundred 
nucleotides), so that the probability of a marker 
being involved in the primary heteroduplex falls 
with distance from the double-strand break. 

2. Secondary heteroduplex, also of variable length, 
may then form. This secondary heteroduplex might 
arise by continued degradation of the 5'-ended 
chain and processive pairing (strand transfer) of 
the 3’-ended strand with its complement in the 
homolog (as shown in Figure 1). Alternatively, it 
might arise by outward sliding (branch migration) 
of a Holliday junction. In either event, formation 
of secondary heteroduplex may be interrupted by 
the mismatch repair system — when a heteroduplex 
recognized by that system arises, strand transfer (or 
junction sliding) is reversed (Figure 1), resolving 
the heteroduplex. In support of this view, a gradient 
of lesser slope is observed when markers that are 
poorly recognized by the mismatch repair system 
when in heteroduplex are used. S. cerevisiae mu- 
tants with genetically deficient mismatch repair 
capacity show a similar reduction in the slope of the 
gradient. The mechanism of interaction of hetero- 
duplex DNA arising during recombination with the 
mismatch repair system remains to be elucidated. 

3. The lesser gradient slope seen in the absence of 
mismatch repair provoked the proposal that the 
gradient results from the direction of mismatch 
repair (Figure 2). In this proposal, heteroduplexes 
at sites close to the recombination-initiating double- 
strand break tend to be mismatch-repaired in favor 
of the DNA sequence of the invaded, intact 
chromatid, resulting in 6:2 segregation; hetero- 
duplexes at sites farther from the initiating break 
site tend to be repaired in favor of the sequence 
on the invading strand, restoring 4:4 segregation 
(Kirkpatrick et al., 1998). The proposed dependence of the direction of conversion on distance from the initiating double-strand break suggests that mismatch repair is directed by the ends of strands and often extends from the mismatch to a nearby end. Sites near the initiating break are thereby prone to full conversion, while sites near the breaks that resolve the Holliday junctions are prone to restoration (Figure 2).

Figure 2 Model for a meiotic conversion gradient based on the direction of mismatch correction. As in Figure 1, a double-strand break (B) is followed by resection (C) and invasion (D). DNA synthesis completes the formation of the double Holliday junction joint molecule (E). Resolution of the joint molecule involves cutting of Holliday junctions. In (F), the right junction has been cut. Mismatch repair of a heteroduplex site close to the cut junction can remove DNA from the invaded chromatid (white), which is then replaced using the invading (black) chromatid as template. Such a repair event restores normal marker segregation to a site that otherwise would have segregated 5:3.
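Postulate 1 above can be illustrated numerically. The sketch below assumes, purely for illustration, that resection length is exponentially distributed with a mean of 500 nucleotides; the text says only "several hundred." Under that assumption, the chance that a marker falls in primary heteroduplex decays with its distance from the break. The sketch deliberately ignores secondary heteroduplex and the direction of mismatch repair, which postulates 2 and 3 add.

```python
import math
from typing import Dict, Iterable


def primary_heteroduplex_probability(distance_nt: float,
                                     mean_resection_nt: float = 500.0) -> float:
    """P(a marker distance_nt from the break lies in primary heteroduplex).

    Illustrative assumption: resection length ~ Exponential(mean), so the
    probability that resection reaches at least distance d is exp(-d / mean).
    """
    if distance_nt < 0 or mean_resection_nt <= 0:
        raise ValueError("distance must be >= 0 and mean must be > 0")
    return math.exp(-distance_nt / mean_resection_nt)


def gradient(distances: Iterable[float],
             mean_resection_nt: float = 500.0) -> Dict[float, float]:
    """Map each marker distance to its predicted primary-heteroduplex probability."""
    return {d: primary_heteroduplex_probability(d, mean_resection_nt)
            for d in distances}


for d, p in gradient([0, 250, 500, 1000]).items():
    print(f"{d:>5} nt from break: {p:.2f}")
```

Any distribution whose survival function falls with length would give a monotonic gradient. The exponential is chosen here only because it is the simplest such choice.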


Conversions occurring in mitotic cells do not mani- 
fest a conversion gradient, presumably because they 
are initiated not at hot spots but by accidents occur- 
ring at random. 


Further Reading 
Nicolas A and Petes TD (1994) Polarity of meiotic gene conver- 
sion in fungi: contrasting views. Experientia 50: 242-252. 


Reference 

Kirkpatrick DT, Dominska M and Petes TD (1998) Conversion-type and restoration-type repair of mismatches formed during meiotic recombination in Saccharomyces cerevisiae. Genetics 149: 1693-1705.


See also: Gene Conversion; Heteroduplexes; 
Hot Spots; Polarity; Transposons as Tools 
Coordinate regulation is the common control of a
group of genes. 


See also: Gene Regulation 


Copy-Choice Hypothesis 
P J Hastings 
Copy-choice is a proposed mechanism of recombin-
ation by which conservative DNA replication fol- 
lowing one chromosome could switch templates and 
copy the other. Originally proposed to explain bacter- 
ial recombination, it was developed into a general 
recombination model on the basis of recombination 
data from Neurospora. The proposal was that, if the 
point at which template switching occurred was not 
precisely reciprocal when two chromosomes were 
being copied, there would be more copies of the geno- 
type of one chromosome than of the other over a 
length, thus accounting for gene conversion and for 
its association with crossing-over. Demonstration of 
semiconservative replication of DNA caused the 
model to fall into disfavor. It also had the problem that the mechanism predicted that, contrary to observation, all recombination would be confined to the two new chromatids. A mechanism of this sort may occur in special situations such as recombination of RNA viruses.


See also: Embryo Transfer; Gene Conversion; 
Genetic Recombination 


Cordycepin 
Cordycepin (3'-deoxyadenosine) is an inhibitor of polyadenylation of RNA.


See also: Soluble RNA 


Core Particle 
A Liljas 
The ribosomal core particle has been under investigation for decades and is related to the assembly and disassembly of ribosomes. The question is how the ribosomal RNAs (rRNAs) bind ribosomal proteins to form the functional particle. Ribosomes from most species have more rRNA than protein. Thus the majority of ribosomal proteins are likely to bind primarily to the rRNA, with fewer direct interactions between the ribosomal proteins themselves. Another relevant
observation is that the ribosomal RNA from which 
essentially all proteins have been removed is able to 
perform the so-called puromycin reaction, which is an 
assay for the central ribosome function of peptide 
transfer. This clearly indicates that at least part of the 
rRNA has the correct functional conformation, essen- 
tially independent of ribosomal proteins. This is in 
agreement with observations using electron micro- 
scopy that the ribosomal RNAs without the protein 
complement have shapes related to the complete subunits. However, the rRNAs cannot possibly have adopted their final structure, since only a limited set of ribosomal proteins are able to bind specifically to the rRNA alone. Thus there seems to be an ordered pathway for ribosome assembly.

From assembly experiments on the Escherichia coli small subunit, M. Nomura established that the proteins that bind to the 16S RNA in the absence of other proteins, and that form the core particle, are S4, S7, S8, S15, S17, and S20. In the large subunit of E. coli ribosomes, the corresponding proteins are L4, L13, L20, L22, and L24, according to the findings of K. Nierhaus. Since most of these proteins are conserved in all types of ribosomes, they may be important for the assembly of ribosomes in all species.

The term ‘core particles’ also relates to the particles 
formed and the proteins that remain on the ribosome 
when treated with increasing concentrations of salt. In this type of disassembly of the small subunits from E. coli, proteins S4, S7, S8, S15, S16, and S17 remain until there is a very high salt concentration. The discrepancy between such disassembly and the assembly of the small subunit is that S16 is strongly bound to a site that is dependent on proteins S4 and S17, whereas protein S20 is more weakly bound to a site that does not depend on the presence of other proteins. Similarly, if large subunits from E. coli are incubated with increasing concentrations of salt, proteins L2, L3, L4, L13, L17, L20, L21, L22, and L23 together with the 23S RNA still form a compact particle. Here, too, there is a good correspondence with the proteins that have binding sites on the rRNA independent of other ribosomal proteins.

The core particles are thus formed by proteins that 
bind to the rRNA in the absence of other ribosomal 
proteins. These proteins affect the folding of the rRNA in such a way that binding sites for additional ribosomal proteins are generated, and these additional proteins in turn generate the binding sites for yet other proteins. It is possible that the binding of these later proteins depends not only on the correct conformation of the rRNA but also on direct interactions with previously bound ribosomal proteins.
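The idea of an ordered assembly pathway can be expressed as a dependency graph. The sketch below uses Python's standard `graphlib` module (Python 3.9 or later). It includes only the small-subunit relationships stated in this entry: the proteins that bind 16S rRNA directly, and S16 depending on S4 and S17. It is a partial illustration, not a complete assembly map.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set

# Partial E. coli small-subunit assembly map, limited to relationships stated
# in the text: primary binders need only 16S rRNA; S16 needs S4 and S17.
assembly_map: Dict[str, Set[str]] = {
    "S4": set(), "S7": set(), "S8": set(),
    "S15": set(), "S17": set(), "S20": set(),
    "S16": {"S4", "S17"},
}


def assembly_waves(graph: Dict[str, Set[str]]) -> List[List[str]]:
    """Group proteins into successive binding 'waves' (hypothetical helper).

    Each wave contains proteins whose prerequisites were all bound in
    earlier waves.
    """
    sorter = TopologicalSorter(graph)
    sorter.prepare()
    waves: List[List[str]] = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)
    return waves


for n, wave in enumerate(assembly_waves(assembly_map), start=1):
    print(f"wave {n}: {', '.join(wave)}")
```

Adding further dependencies to the dictionary would extend the pathway without changing the helper.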


See also: Ribosomal RNA (rRNA); Ribosome 
Binding Site; Ribosomes 


Corepressor 
A corepressor is a molecule that helps to elicit repres-
sion of transcription by binding to a regulator protein. 


See also: Repressor 


Correlated Response 


G P Wagner 
Phenotypic characters exhibit various degrees of
interdependency in their variation. In the case of
normally distributed quantitative characters the
degree of interdependency is conveniently measured 
as genetic and phenotypic correlations between char- 
acters. The phenomenon of correlated response is a 
consequence of this interdependency among characters (Falconer and Mackay, 1996; Roff, 1997). Selection on a given character, say X, may lead to a change in another character, say Y, if the two are genetically correlated. In other words, a correlated response is the change in a character Y due to selection on another character X. Correlated responses have important practical consequences for breeding and far-reaching theoretical implications for the understanding of evolution. This article summarizes the theoretical explanation of correlated responses and the limitations of quantitative genetic theory in predicting them. In addition, an overview of the practical and theoretical implications of correlated responses is given.


Quantitative Genetic Theory of 
Correlated Response 


Consider a character X that is under direct selection and another character Y that is not selected but is correlated with the first character. The first character X experiences a selection differential $S_X = \bar{X}_w - \bar{X}$, where $\bar{X}$ is the character mean before selection and $\bar{X}_w$ is the character mean after selection in the parental population. It is thus expected to show a selection response $R_X = \bar{X}' - \bar{X}$ proportional to its heritability, $R_X = h_X^2 S_X$. For predicting the correlated response of Y to selection on X we need a measure of the degree of dependency of Y on X. The relevant measure of dependency between Y and X is the covariance between the midparental value of X and the average offspring value of Y, $\mathrm{Cov}_{PO}(X,Y)$, where the index PO indicates that this covariance is among parent and offspring values. Using the standard linear regression model, the expected change in Y due to selection on X, i.e., the correlated response $CR_Y$, is then given by

$$CR_Y = \frac{\mathrm{Cov}_{PO}(X,Y)}{V_X}\, S_X \qquad (1)$$

where $CR_Y = \bar{Y}' - \bar{Y}$ is the difference between the character mean in the offspring generation, $\bar{Y}'$, and that in the parental generation, $\bar{Y}$. As is the case for the direct selection response, the correlated response depends on additive genetic effects. Noting that the parent-offspring covariance $\mathrm{Cov}_{PO}(X,Y)$ is caused by the additive genetic covariance between X and Y, equation (1) can be rewritten as

$$CR_Y = r_A\, h_X h_Y\, i_X \sqrt{V_Y} \qquad (2)$$

where $r_A$ is the additive genetic correlation between X and Y, $h_X = \sqrt{V_{A,X}/V_X}$ is the square root of the heritability of X (and analogously for $h_Y$), and $i_X = S_X/\sqrt{V_X}$ is the selection intensity on character X. The factor $r_A h_X h_Y$ is also called the 'co-heritability,' in analogy to the heritability in the corresponding equation for the direct selection response, $R_X = h_X^2\, i_X \sqrt{V_X}$.
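Equations (1) and (2) can be checked against each other numerically. The sketch assumes, as the text does, that the parent-offspring covariance equals the additive genetic covariance $r_A \sqrt{V_{A,X} V_{A,Y}}$. The function names and the numbers in the example are illustrative only, not data.

```python
import math


def correlated_response_eq1(cov_po_xy: float, var_x: float, s_x: float) -> float:
    """Equation (1): CR_Y = Cov_PO(X, Y) / V_X * S_X."""
    return cov_po_xy / var_x * s_x


def correlated_response_eq2(r_a: float, h2_x: float, h2_y: float,
                            i_x: float, var_y: float) -> float:
    """Equation (2): CR_Y = r_A * h_X * h_Y * i_X * sqrt(V_Y)."""
    return r_a * math.sqrt(h2_x) * math.sqrt(h2_y) * i_x * math.sqrt(var_y)


def direct_response(h2_x: float, i_x: float, var_x: float) -> float:
    """Direct response: R_X = h_X^2 * i_X * sqrt(V_X) (equivalently h_X^2 * S_X)."""
    return h2_x * i_x * math.sqrt(var_x)


def additive_covariance(r_a: float, h2_x: float, var_x: float,
                        h2_y: float, var_y: float) -> float:
    """Assumed Cov_PO(X, Y) = r_A * sqrt(V_A,X) * sqrt(V_A,Y)."""
    return r_a * math.sqrt(h2_x * var_x) * math.sqrt(h2_y * var_y)


# Illustrative values: V_X = 4, V_Y = 9, h_X^2 = 0.25, h_Y^2 = 0.36, r_A = 0.5,
# selection intensity i_X = 1, hence S_X = i_X * sqrt(V_X) = 2.
cov = additive_covariance(0.5, 0.25, 4.0, 0.36, 9.0)
print(correlated_response_eq1(cov, 4.0, 2.0))
print(correlated_response_eq2(0.5, 0.25, 0.36, 1.0, 9.0))
```

Both calls should give the same value (about 0.45 in units of Y), which illustrates that (2) is a rescaling of (1) once $S_X = i_X\sqrt{V_X}$ is substituted.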

Equation (2) can be used either to predict the cor- 
related response given estimates of the heritabilities, 
the genetic correlation, and the selection intensity, or 
as a way to estimate the genetic correlation from a 
measured selection response and independent esti- 
mates of the heritabilities. It is even possible to esti- 
mate the genetic correlation from the correlated 
responses of both characters without estimating heri- 
tabilities. In the latter case it is necessary to measure 
the correlated response of Y to selection at X and vice 
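versa. The additive genetic correlation can then be estimated from

$$r_A^2 = \frac{CR_X\, CR_Y}{R_X\, R_Y} \qquad (3)$$

where $R_X$ and $CR_Y$ are the direct and correlated responses under selection on X, and $R_Y$ and $CR_X$ are those under selection on Y.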
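The estimate of the additive genetic correlation from a reciprocal selection experiment, $r_A^2 = CR_X\,CR_Y/(R_X\,R_Y)$, can be sketched as below. The formula yields only $r_A^2$, so this sketch takes the sign of $r_A$ from the sign of $CR_Y/R_X$, which under equation (2) has the sign of $r_A$. That sign rule and the helper name are our own additions. With real data, sampling error can push the estimate outside [-1, 1].

```python
import math


def genetic_correlation_from_responses(cr_x: float, cr_y: float,
                                       r_x: float, r_y: float) -> float:
    """Estimate r_A from r_A^2 = CR_X * CR_Y / (R_X * R_Y).

    r_x, cr_y: direct response of X and correlated response of Y when selecting on X.
    r_y, cr_x: direct response of Y and correlated response of X when selecting on Y.
    """
    ratio = (cr_x * cr_y) / (r_x * r_y)
    if ratio < 0:
        raise ValueError("the two correlated responses imply opposite signs of r_A")
    # CR_Y / R_X carries the sign of r_A when heritabilities and SDs are positive.
    return math.copysign(math.sqrt(ratio), cr_y / r_x)


# Illustrative responses consistent with r_A = 0.5 (not real data).
print(genetic_correlation_from_responses(0.3, 0.45, 0.5, 1.08))
```

Because only ratios of responses enter, the heritabilities never need to be estimated separately, which is the point made in the text.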
Limitations of the Quantitative Genetic
Theory of Correlated Response


The utility of quantitative genetic parameters like 
heritability and genetic correlations depends on their 
projectability, which means our ability to apply the 
measurement done in one instance to another instance 
of the phenomenon. In genetics we are primarily interested in projectability across generations. If genetic variation is caused by a large number of segregating loci with individually small effects, the projectability of heritability is quite good, up to about 20 generations or more. There are a number of theoretical arguments suggesting that genetic correlations are much less projectable than heritabilities. Projectability is further decreased if genetic variation is caused by few genes with large effects (Bohren et al., 1966). These results suggest that the projectability of genetic correlations, and hence our ability to predict the correlated response, depends on the genetic architecture of the characters. There are cases where the genetic correlation remained essentially unchanged over as many as 50 generations, for instance the correlation between wing and thorax size in Drosophila melanogaster (Reeve and Robertson, 1953). On the other hand, there are cases where genetic correlations have changed substantially over 22 generations, as with litter weight and 8-wk weight of mice (Eisen, 1972). In the latter cases
only detailed information about the number, effects, 
and interactions of alleles underlying the traits will 
allow reliable predictions of the correlated response. 


Practical Uses of Correlated Response 


An interesting consequence of correlated response theory is that under certain circumstances the correlated response can exceed the direct response. Consider, for instance, the ratio of the correlated response of Y to selection on X over the direct response of Y to selection on Y itself:

$$\frac{CR_Y}{R_Y} = \frac{i_X h_X r_A}{i_Y h_Y} \qquad (4)$$

Assuming equal selection intensities, the correlated response can be greater than the direct response if $h_X r_A > h_Y$. Hence it might be more economical to select for a trait X to obtain an improvement of a trait Y than to select directly for the desired trait itself. This strategy is called indirect selection; the character under direct selection is called the secondary character and the other can be called the target character. There are also other practical situations in which indirect selection is preferable to direct selection. For instance, if the target trait is either difficult or expensive to measure, indirect selection may be more effective and/or more economical. Another situation in which indirect selection is to be preferred is where a trait is expressed in only one sex, like milk production in cattle, but has other, correlated traits that are expressed in both sexes. Indirect selection can then be applied to both sexes, and breeding will be more effective than selection on only the sex in which the target trait is expressed.
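The comparison in equation (4) reduces to a one-line calculation. This is a minimal sketch with hypothetical function names and illustrative parameter values. It treats a ratio above 1 as favoring indirect selection and assumes $r_A > 0$; with a negative correlation, one would select on X in the opposite direction.

```python
import math


def indirect_to_direct_ratio(i_x: float, h2_x: float, r_a: float,
                             i_y: float, h2_y: float) -> float:
    """Equation (4): CR_Y / R_Y = (i_X * h_X * r_A) / (i_Y * h_Y)."""
    return (i_x * math.sqrt(h2_x) * r_a) / (i_y * math.sqrt(h2_y))


def prefer_indirect_selection(i_x: float, h2_x: float, r_a: float,
                              i_y: float, h2_y: float) -> bool:
    """True if selecting on secondary character X is expected to change target
    character Y more than selecting on Y directly (assumes r_A > 0)."""
    return indirect_to_direct_ratio(i_x, h2_x, r_a, i_y, h2_y) > 1.0


# Highly heritable secondary trait, strong correlation, poorly heritable target.
print(indirect_to_direct_ratio(1.0, 0.64, 0.9, 1.0, 0.25))
print(prefer_indirect_selection(1.0, 0.64, 0.9, 1.0, 0.25))
```

With equal intensities, this checks the condition $h_X r_A > h_Y$ stated above. Unequal intensities, for example when X can be measured in both sexes, enter through `i_x` and `i_y`.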


Figure 1 Influence of correlated response on adaptive evolution. The two curves show the evolutionary trajectories for the approach to an adaptive optimum of characters X and Y for two genetic correlations, $r_A = 0$ and $r_A = -0.75$. (Simplified after Via and Lande, 1985.)


Implications for Evolutionary Theory 


The correlated response can retard adaptive evolution, depending on the circumstances (for a discussion of this point see p. 433 ff. of Futuyma, 1998). Adaptive evolution will be retarded if the direction of natural selection is not parallel to the direction of genetic covariation between characters. In this case natural selection will lead to a correlated response that can drag the phenotype away from the direct path to an adaptive optimum (Figure 1; see trajectory for $r_A = -0.75$). As a consequence, the evolution of one or both of the characters may be slowed or even prevented from reaching the adaptive optimum (Lande, 1979). Correlated response can thus explain the origin and maintenance of nonadaptive features of the phenotype even in the presence of natural selection.
Cosmids are plasmids into which phage lambda cos sites have been inserted. They are used in recombinant DNA studies, where the resultant plasmid DNA can be packaged in vitro in a phage coat. Cosmids are often used for construction of genomic libraries since they are able to carry relatively long pieces of inserted DNA.


See also: Genomic Library; Plasmids; Vectors 


Cotransformation 


S A Lacks 
Transformation refers to a genetic change in a cell that
is mediated by free DNA. Cotransformation, there- 
fore, refers to the simultaneous change of two or more 
genetic markers. Although two markers may reside on 
a single genomic structure in the donor cell, several 
dispersive mechanisms reduce the frequency of simul- 
taneous integration of both markers into the recipient 
genome. 


Dispersive Mechanisms 


DNA Disruption 

When DNA is extracted from a cell and purified, shear forces randomly break the originally intact chromosome into fragments with an average length of approximately 30 kb. The genome of Bacillus subtilis, for example, would be broken into about 150 fragments.


Processing on Uptake 

In naturally occurring bacterial transformation, a 
bound DNA molecule is fragmented on the surface of 
the recipient cell, and only one strand of each fragment 
enters the cell. In Streptococcus pneumoniae, these 
strands have an average length of approximately 3 kb. 
If entering strand segments from adjacent parts of the 
donor molecule come from complementary strands, 
they will not be integrated into the same recipient 
strand, and they will not be present in the same daugh- 
ter chromosome. 


Integrative Recombination 

Separation could also occur during chromosomal integration of the donor strand segment if it is not integrated in its entirety, for example as a result of displacement competition between the incoming donor and resident homologous strands.


Mismatch Repair 

Donor markers that produce certain base mismatches 
in the heteroduplex transformation intermediate are 
subject to elimination by a DNA mismatch repair 
system. Depending on the mismatch, marker integration efficiencies can be reduced by as much as 20-fold. This will reduce the frequency of cotransformation involving such markers relative to that of high-efficiency markers, on which the mismatch repair system does not act.


Genetic Mapping 


The effect of dispersive processes and the limited amount of DNA taken up per cell (usually less than 1% of the genome) limit the frequency of cotransformation. Because dispersive processes depend on
distance between markers, cotransformation fre- 
quency (ctf) can serve as a measure of linkage. Con- 
versely, the frequency at which markers separate, that 
is (1 — ctf), is a measure of distance between markers. 
For example, if two markers show 20% linkage, they 
might each transform 1% of the cells in a population, 
but 20% of the cells transformed for one marker 
(0.2% of total cells) will also carry the other marker. 
If the markers are not closely linked, they will be 
separated by the dispersive processes and enter cells 
separately and randomly. For an individual marker 
transformation frequency of 1%, the cotransforma- 
tion frequency would be 0.01% of total cells (or 1% 
of cells transformed for one marker would be trans- 
formed for both markers). 
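The arithmetic of the two examples above can be written out as follows. These helpers are a hypothetical sketch, not standard routines. The unlinked case assumes that every cell is competent and that the markers enter independently, as the text assumes at this point.

```python
def linked_cotransformants(single_freq: float, ctf: float) -> float:
    """Fraction of all cells transformed for both of two linked markers.

    single_freq: fraction of cells transformed for one marker.
    ctf: cotransformation frequency, the fraction of those transformants
         that also carry the second marker (a measure of linkage).
    """
    return single_freq * ctf


def unlinked_cotransformants(freq_a: float, freq_b: float) -> float:
    """Fraction transformed for both when the markers enter cells separately
    and randomly (assumes the whole population is competent)."""
    return freq_a * freq_b


def separation_measure(ctf: float) -> float:
    """The distance measure (1 - ctf) described in the text."""
    return 1.0 - ctf


print(linked_cotransformants(0.01, 0.20))    # 20% linkage, 1% single transformation
print(unlinked_cotransformants(0.01, 0.01))  # unlinked, 1% each
```

The two printed values correspond to the text's 0.2% and 0.01% of total cells. The twentyfold difference between them is what makes cotransformation usable as a linkage measure.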


Competent Population 


In the above discussion of genetic mapping, it was 
assumed that the entire population was competent; 
that is, all cells were equally able to take up DNA 
and be transformed. This is not always the case. The 
proportion of a population that is competent can be 
calculated by reversing the above approach and 
assuming that a pair of markers enters randomly, 
which will be true for most arbitrarily chosen pairs. 
Measurement of the cotransformation frequency, 
when divided into the square of the single marker 
frequency, gives the proportion of the population 
that is competent. In this way it was found that all of 
the cells in populations of S. pneumoniae and Haemo- 
philus influenzae are competent, whereas in B. subtilis 
only approximately 10% are competent. The molecular 
mechanism by which the competent population of 
B. subtilis becomes differentiated from the non- 
competent majority is not known. 
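The reasoning behind this estimate can be made explicit. Suppose a fraction c of cells is competent, and two unlinked markers transform fractions $p_a$ and $p_b$ of competent cells independently. Over the whole population, the single-marker frequencies are then $c\,p_a$ and $c\,p_b$, and the cotransformation frequency is $c\,p_a p_b$. The product of the single-marker frequencies divided by the cotransformation frequency therefore equals c; with equal single frequencies this is the square described in the text. A minimal sketch, with a hypothetical function name:

```python
def competent_fraction(freq_a: float, freq_b: float,
                       unlinked_cotransform: float) -> float:
    """Estimate the competent proportion c of a population.

    Assumes two unlinked markers entering competent cells independently:
    freq_a = c * p_a, freq_b = c * p_b, unlinked_cotransform = c * p_a * p_b,
    so freq_a * freq_b / unlinked_cotransform = c.
    """
    if unlinked_cotransform <= 0:
        raise ValueError("cotransformation frequency must be positive")
    return freq_a * freq_b / unlinked_cotransform


# Hypothetical numbers: 1% single transformation for each marker.
print(competent_fraction(0.01, 0.01, 0.0001))  # consistent with a fully competent population
print(competent_fraction(0.01, 0.01, 0.001))   # consistent with about 10% competent
```

The numbers are invented to show the two regimes; they are not the published measurements for S. pneumoniae or B. subtilis.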


Congression 


Congression refers to the tendency of cells to be 
transformed by multiple markers, even when they 
are unlinked. There are two molecular bases for this 
phenomenon. Both have been used in practice to
facilitate the screening for markers that are not directly
selectable.


Bacillus subtilis 

As indicated above, because only a fraction of cells in a culture of this bacterium is transformable, the frequency of a second transformation in that subpopulation is much greater than in the population as a whole. Cells transformed for one marker, therefore, will be enriched for another marker. Cotransformation due to linkage can be distinguished from congression of unlinked markers by its independence of donor DNA concentration: ratios of cotransformants to single transformants will not change for truly linked markers as the concentration is reduced and less DNA enters each cell, whereas for unlinked markers the ratio falls.
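The concentration test can be illustrated with a toy model. The following are assumptions made for this sketch, not statements from the text:

- Uptake per competent cell follows $p = 1 - e^{-k\,[\mathrm{DNA}]}$.
- Linked markers travel together with a fixed cotransformation frequency.
- Unlinked markers enter competent cells independently.

The competent fraction cancels in the ratio of cotransformants to single transformants. As a result, only the unlinked ratio tracks DNA concentration.

```python
import math
from typing import Optional, Tuple


def uptake_probability(dna_conc: float, k: float = 1.0) -> float:
    """Chance that a competent cell is transformed for one marker.

    Illustrative assumption: Poisson uptake, p = 1 - exp(-k * dna_conc).
    """
    return 1.0 - math.exp(-k * dna_conc)


def transformant_counts(dna_conc: float, competent_fraction: float,
                        linked_ctf: Optional[float] = None,
                        k: float = 1.0) -> Tuple[float, float]:
    """Return (single, double) transformant fractions of the whole population.

    If linked_ctf is given, the second marker rides on the same donor DNA and
    accompanies a fixed fraction of first-marker transformants. Otherwise the
    two unlinked markers enter competent cells independently (congression).
    """
    p = uptake_probability(dna_conc, k)
    single = competent_fraction * p
    if linked_ctf is not None:
        double = single * linked_ctf
    else:
        double = competent_fraction * p * p
    return single, double


for conc in (2.0, 0.5, 0.1):
    s_link, d_link = transformant_counts(conc, 0.1, linked_ctf=0.3)
    s_free, d_free = transformant_counts(conc, 0.1)
    print(f"DNA {conc}: linked ratio {d_link / s_link:.2f}, "
          f"unlinked ratio {d_free / s_free:.2f}")
```

In this model the linked ratio stays at the assumed cotransformation frequency at every concentration. The unlinked ratio equals $p$ and so shrinks as less DNA enters each cell.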


Yeast and other Eukaryotes 

Most cells have no natural mechanism to take up DNA 
for transformation. However, nearly all cells can be 
transformed by artificial means. Removal of cell walls 
is generally required, and the donor DNA presented 
to the protoplasts is usually complexed with calcium 
phosphate or positively charged polymers. Thus, the 
DNA is taken up by the cell in an aggregated form. 
This increases the likelihood of a second, unselected 
marker entering by congression. Such cotransform- 
ation has been useful in the genetic engineering of mam- 
malian cells to produce proteins of therapeutic value. 


See also: Bacterial Transformation 


Counterselection 


K B Low 


Copyright © 2001 Academic Press 
doi: 10.1006/rwgn.2001.0284


Counterselection (also known as ‘contraselection’) is a 
special case of selection, i.e., the preferential survival 
of one group of organisms based on increased fitness 
under a particular environmental condition imposed 
on a mixture of organisms of different genotypes. 
The term ‘counterselection’ usually refers to bacterial 
conjugative crosses, wherein a donor and a recipient 
strain are mixed and incubated together for a certain 
length of time to allow DNA transfer, then subjected 
to a growth condition (for example, on agar in a petri 
dish which contains and/or omits certain key growth 
factors or inhibitors) which prevents the growth of 
either parental strain, but allows growth of (selects 
for) exconjugants that received certain genes from 
each parent in the cross. Since in bacterial conjugation 
only a portion of the donor genome is transferred to 
the recipient cell, the recipient usually contributes 


the majority of the eventual recombinant genotype, 
with the main ‘selected’ allele(s) (genetic markers) sig- 
nifying the new marker(s) added to the recipient 
chromosome from the donor. In contrast, the ‘coun- 
terselected’ marker(s) are alleles within the recipient 
which allow recipient-derived recombinants to grow 
under the imposed growth conditions, but prevent the 
donor cells from growing. 
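The plating logic of a counterselected cross can be captured in a toy model. The histidine marker and the three genotypes are hypothetical illustrations; only streptomycin is taken from the text. The model ignores cross-feeding, transfer order, and expression effects in merozygotes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Genotype:
    his_plus: bool                # hypothetical selected marker: grows without histidine
    streptomycin_resistant: bool  # counterselected recipient marker


def grows_on_selective_plate(g: Genotype) -> bool:
    """Illustrative medium: lacks histidine and contains streptomycin.

    Growth needs the donor-derived selected allele (his+) and the
    recipient-derived counterselected allele (streptomycin resistance).
    """
    return g.his_plus and g.streptomycin_resistant


donor = Genotype(his_plus=True, streptomycin_resistant=False)      # killed by streptomycin
recipient = Genotype(his_plus=False, streptomycin_resistant=True)  # cannot make histidine
recombinant = Genotype(his_plus=True, streptomycin_resistant=True)

for name, g in [("donor", donor), ("recipient", recipient), ("recombinant", recombinant)]:
    print(name, grows_on_selective_plate(g))
```

Only the recombinant satisfies both conditions, which is why neither parental strain forms colonies on the selective plate.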

The most commonly used markers for counterselection against donor cells are ones that permit bactericidal selection, e.g., an antibiotic such as streptomycin or nalidixic acid, or lysis using a bacteriophage. In each case the recipient strain must harbor a mutation that confers resistance to the antibiotic or bacteriophage, and the donor strain must be sensitive to the same bactericidal agent. Bactericidal counterselection prevents the donor cells from cross-feeding the recipient cells when they are mixed, e.g., on selective agar. The choice of a counterselective marker is also dictated by the chromosomal location of the mutation that confers growth ability upon the recipient (e.g., antibiotic resistance, bacteriophage resistance, or ability to grow without a certain nutrient). It
is desirable to avoid the transfer of the donor allele 
of this counterselective marker early during the con- 
jugative transfer, in order to avoid expression in the 
merozygotes (or in the ultimate progeny of the cross) 
of the donor allele. This eventuality could kill the 
merozygotes owing to the counterselective environ- 
mental condition (antibiotic, bacteriophage, etc.). By 
using an appropriate counterselective marker, the transfer of the relevant donor allele by the particular donor (Hfr, F', etc.) can be avoided, or at least reduced to a very low frequency.


See also: Conjugation, Bacterial; Lytic Phage; 
Resistance to Antibiotics, Genetics of 


Covarion Model of 
Molecular Evolution 
The covarion model of molecular evolution integrates knowledge of protein evolution from both primary sequence and three-dimensional structures. Its main postulate is that, because of continuing small changes in secondary and tertiary structures during evolution, some amino acid sites in a protein may be free to evolve in some taxa, but fixed in others. The model was proposed by Walter Fitch (1971) soon after protein sequences became available from many different organisms. It certainly allows that some sites in a macromolecule are critical to function and can never change; all mutations at such sites are lethal. In contrast, a site in cytochrome c of plants may function equally well with any of several different amino acids, whereas in vertebrates any mutation at the same site might be lethal. Conversely, for other sites it may be the other way around: they are variable in mammals, but fixed in plants.

The model was first proposed to explain the results from a study of cytochrome c. The neutral rate of evolution of a protein is its rate of evolution if no sites were constrained by selection; this neutral rate is directly proportional to the mutation rate. The overall rate of evolution of cytochrome c was about 10% of the neutral rate, consistent with 10% of sites being free to vary at any one time. However, 15% of sites had changed in mammals, and about 70% of positions had changed when a range of eukaryotes was examined. The conclusion drawn was:


because of the structural restraints imposed by functional 
requirements, mutations that will not be selected against are 
available only for a very limited number of positions .... 
However, as such acceptable mutations are fixed they alter 
the positions in which other acceptable mutations may be 
fixed. Thus, only about ten codons, on the average, in any 
cytochrome c may have acceptable mutations available to 
them but the particular codons will vary from one species to 
another. We shall term those codons at any one instant in 
time and in any given gene for which an acceptable mutation 
is available as the concomitantly variable codons. 


‘Covarion’ is a contraction of concomitantly variable codons, and the principle is applied to nucleotides as well as to amino acids.
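The contrast between about 10% of sites being free at any one time and a much larger fraction having changed across eukaryotes can be illustrated with a simple two-state (on/off) switching model. This formalization is an assumption for illustration; Fitch's proposal as quoted does not specify it. The rate value and time units are likewise arbitrary. An 'off' site can become 'on' at a constant rate. A site therefore remains never free to vary only if it starts off and never switches.

```python
import math


def fraction_ever_free(t: float, on_fraction: float = 0.10,
                       off_to_on_rate: float = 0.05) -> float:
    """Fraction of sites free to vary at some time along a lineage of length t.

    Assumptions (illustrative only): at time 0 a site is 'on' with probability
    on_fraction (the stationary share of free sites), and an 'off' site turns on
    at rate off_to_on_rate per arbitrary time unit. A site never becomes free
    only if it starts off and does not switch: (1 - on_fraction) * exp(-rate * t).
    """
    return 1.0 - (1.0 - on_fraction) * math.exp(-off_to_on_rate * t)


def fraction_ever_free_fixed(t: float, on_fraction: float = 0.10) -> float:
    """Rates-across-sites contrast: the same sites stay free forever,
    so the fraction does not grow with t."""
    return on_fraction


for t in (0.0, 10.0, 50.0):
    print(t, round(fraction_ever_free(t), 3), fraction_ever_free_fixed(t))
```

Under switching, the fraction of sites that have ever been free to vary grows with elapsed time, while the instantaneous share stays at 10%. Under fixed rates it never exceeds that share.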

Despite its sound biochemical basis and its poten- 
tial importance for evolutionary studies, the covarion 
model has taken a long time to be fully developed; 
from a statistical viewpoint it appears to have far too 
many parameters to be useful. Consider the following 
reasoning. If most amino acid positions are constant 
over some portions of the tree and variable in others, 
then it appears that one could include as many para- 
meters as desired “in order to fit the data to the model”! 
It seems that you could say an amino acid site was 
constant here (or wherever you liked), and variable 
somewhere else. In general, invoking more and more 
parameters to “explain” the same dataset weakens 
the power of any model. Indeed, in the case of evolu- 
tionary trees it has been proven that, in principle, with 
enough variability of rates between sites any data 
could be derived from any tree. Thus the covarion 
model appeared to lack desirable mathematical prop- 
erties, and as a matter of (statistical) necessity, the first 
models of molecular evolution had every site evolving
at its own characteristic rate throughout all of evolu- 
tion. The rates could differ between sites (rates- 
across-sites models), but each site obediently kept to 
its own rate. 


The Biochemical Basis for the Covarion 
Model 


In fact, biochemical information predicts the opposite. 
Sites in widely different lineages should not neces- 
sarily have the same rate because the constraints may 
vary as the three-dimensional structure evolves; there 
should be some variation in rates at a site. Indeed, it is 
difficult to find a biochemical mechanism that would 
maintain the same potential rate of evolution at a site, 
irrespective of whether the gene was in eukaryotes, 
archaebacteria or eubacteria, or within thermophiles 
or mesophiles. One of the strongest conclusions of 
structural biology is that the three-dimensional struc- 
ture of a protein does vary during evolution. The 
standard measure is to compare two homologous proteins by the positions of the alpha carbon atoms (Cα)
along the backbone of the three-dimensional struc- 
ture. To make the measure quantitative, the root- 
mean-square (rms) of the difference is used. For a 
variety of proteins the average rms difference in 
three-dimensional structure increases with sequence 
divergence — even if only considering the core of the 
proteins. Thus the more different the sequence, the 
larger the difference in tertiary structure. The effect 
is nonlinear, with increasing difference in three- 
dimensional structure at higher sequence divergence. 
Other measures of protein structure give similar effects, 
and the conclusion about structure evolving through 
time is especially marked when insertions and/or dele- 
tions are examined. Examples of both divergence and 
convergence of protein domains are also found. 
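The rms measure described above can be illustrated with a minimal Python sketch that computes the root-mean-square deviation between two lists of matched Cα coordinates. The sketch assumes something the text does not state: the two structures have already been optimally superimposed and their residues are paired one to one. Real comparisons also need a structural alignment and superposition step, which is omitted here. The coordinates below are invented for illustration.

```python
import math

def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between paired C-alpha coordinates.

    Assumes the structures are already superimposed and that the k-th point
    of coords_a corresponds to the k-th point of coords_b. Units follow the
    input (typically angstroms).
    """
    if not coords_a or len(coords_a) != len(coords_b):
        raise ValueError("need two non-empty coordinate lists of equal length")
    squared = sum((xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
                  for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b))
    return math.sqrt(squared / len(coords_a))

# Two invented three-residue "structures" (hypothetical coordinates).
struct_a = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
struct_b = [(0.0, 0.5, 0.0), (3.8, 0.0, 1.0), (7.6, 0.0, 0.0)]
print(rmsd(struct_a, struct_b))
```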

The same conclusion also comes from studies 
on specific proteins. The X-ray crystallographic 
structures of a fish and human hemoglobin have a 
rms difference of 1.4 Å, though the closeness of the
match varies throughout the protein. Comparing 
repeated units of a protein gives other examples. For 
example, the ‘regulator of chromosome condensation’ 
protein (RCC1) is a seven-blade propeller structure, 
but the seven repeating units deviate slightly in three- 
dimensional structure. We can think of this structure 
of a protein as evolving through a fitness landscape of 
three-dimensional structure (Lesk, 2000, Chapters 5 
and 6). 

The overriding conclusion is that, although a few 
essential sites may be invariable over long periods of 
evolutionary time, most sites do change their functional environment during evolution. Indeed, with
many noncatalytic proteins (such as those involved
in regulation, ribosome structure, chaperones, and 
the like) there may be very few sites absolutely con- 
served. Consequently, the functional constraints on an 
amino acid site are expected to change over time. This 
is perhaps one of the best-substantiated facts of struc- 
tural biology — individual amino acid sites are not in 
the same environment over all of evolution. This is 
certainly consistent with expectations from the covar- 
ion model. 

This variation through time in tertiary structure 
makes it more difficult to develop simple mathemat- 
ical models of sequence evolution, though it does mean 
that information on the history of a protein can be 
retained longer. It means that it is easier to recover older
divergences, because not all sites saturate from multiple mutations at a site. This aspect of sites changing
between fixed and potentially variable is therefore
important for inferring evolutionary trees. Thus far,
the covarion model appears accurate biochemically
but undesirable statistically, because it seems to require
too many parameters. One powerful solution is to use
a hidden Markov chain for the covarion model, which
incorporates the knowledge of evolving tertiary structure and is mathematically tractable.


A Hidden Markov Model for Covarions 


Tuffley and Steel (1997) reported a hidden Markov
version of the covarion model that requires only two
parameters in addition to those of the basic Kimura model.
This solves the main problem of the original covarion
model, which appeared to require several parameters
per site: the hidden Markov model requires only two
additional parameters, irrespective of the length of the
sequences. In its simplest form for nucleotides it has
two main processes:


1. A standard Kimura 3ST model of molecular evolu- 
tion (explained below). 

2. A second process with the two additional parameters, φ (the proportion of sites that are free to
vary; these are the covarions) and δ (the rate of
interchange between variable and invariable sites).
These two parameters are not observable directly;
they are ‘hidden.’


These two parts are discussed in turn. 


The Kimura Model 

This is a simple Markov process on an evolutionary 
tree. Once described, it is easily extended to include 
the two unobserved (hidden) parameters. A scientific model generally has three parts: structure,
mechanism, and initial conditions. In this evolutionary model:


1. The structure is an evolutionary tree. 

2. The mechanism is the Kimura 3ST process. 

3. The initial conditions are weights on the edges 
(branches) of the tree. 


The weights are a function of time and mutation rate, 
and all we need to know is the relative numbers of 
changes on each edge. An example of an evolutionary model is illustrated in Figure 1 for four species, t1
to t4.

Consider each of the three parts. The structure of
the model is straightforward, but does have interesting
implications. There are four sequences (t1 to t4) that are
linked through common ancestors; this is important,
implying a continuous series of intermediates that
unite the observed sequences. The structure of the
model is shown here as an unrooted tree; for this
model the same patterns in sequence data are generated irrespective of where the root is placed. In
this case, the tree can also be drawn as a rooted
tree that approximately follows a molecular clock.
The mechanism we are using is the Kimura three-parameter model, which has one rate for transitions
(namely α) and two for transversions (β and γ). The
diagonal entries of the rate matrix are $x = -(\alpha + \beta + \gamma)$, so that each row sums to zero.
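The mechanism can be written as a 4 × 4 rate matrix. The minimal Python sketch below builds it with the bases ordered A, G, C, T. Which transversions receive β and which receive γ is an illustrative assumption: here β covers A↔C and G↔T, and γ covers A↔T and G↔C. The diagonal is the x = −(α + β + γ) of the text, so every row sums to zero.

```python
BASES = "AGCT"  # row/column order of the matrix

def k3st_rate_matrix(alpha, beta, gamma):
    """Kimura 3ST instantaneous rate matrix; entry [i][j] is the rate i -> j.

    alpha: transitions A<->G and C<->T.
    beta, gamma: the two transversion classes; assigning beta to A<->C, G<->T
    and gamma to A<->T, G<->C is an illustrative assumption.
    """
    rate = {frozenset("AG"): alpha, frozenset("CT"): alpha,
            frozenset("AC"): beta, frozenset("GT"): beta,
            frozenset("AT"): gamma, frozenset("GC"): gamma}
    x = -(alpha + beta + gamma)  # diagonal entry, so each row sums to zero
    return [[x if a == b else rate[frozenset((a, b))] for b in BASES]
            for a in BASES]

for base, row in zip(BASES, k3st_rate_matrix(1.0, 0.5, 0.25)):
    print(base, row)
```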

There are several ways of describing the initial
conditions of the model, but it is necessary to convert
from the instantaneous rate matrix (the mechanism,
Figure 1C) to a Markov transition matrix (Figure
1D). The mechanism describes the instantaneous rate of change
at any point along the edge of the tree; the
transition matrix gives the overall probability of a
change between the two ends of the edge. This
depends on both the rate matrix and the length
(time) of the edge.
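Formally, the transition matrix for an edge of length t is the matrix exponential P(t) = exp(Qt) of the rate matrix, called Q here. The minimal pure-Python sketch below uses scaling and squaring with a truncated Taylor series. That is a standard numerical technique, chosen here only so the example is self-contained; a library routine such as SciPy's `scipy.linalg.expm` would normally be used instead. Assigning β and γ to particular transversions is an illustrative assumption.

```python
import math

BASES = "AGCT"

def k3st_rate_matrix(alpha, beta, gamma):
    # Assumed classes: alpha for A<->G and C<->T (transitions),
    # beta for A<->C and G<->T, gamma for A<->T and G<->C.
    cls = {frozenset("AG"): alpha, frozenset("CT"): alpha,
           frozenset("AC"): beta, frozenset("GT"): beta,
           frozenset("AT"): gamma, frozenset("GC"): gamma}
    return [[-(alpha + beta + gamma) if a == b else cls[frozenset((a, b))]
             for b in BASES] for a in BASES]

def mat_mult(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def expm(q, t, terms=20):
    """exp(Q*t) by scaling and squaring plus a truncated Taylor series."""
    n = len(q)
    m = [[x * t for x in row] for row in q]
    norm = max(sum(abs(x) for x in row) for row in m)
    # Choose s so that the scaled matrix has row-sum norm <= 1/2.
    s = max(0, math.ceil(math.log2(norm)) + 1) if norm > 0 else 0
    m = [[x / 2 ** s for x in row] for row in m]
    result = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms + 1):  # term becomes m**k / k!
        term = [[x / k for x in row] for row in mat_mult(term, m)]
        result = [[r + d for r, d in zip(rr, tr)] for rr, tr in zip(result, term)]
    for _ in range(s):  # undo the scaling: exp(A) = exp(A / 2**s) ** (2**s)
        result = mat_mult(result, result)
    return result

P = expm(k3st_rate_matrix(1.0, 0.5, 0.25), 0.3)
for base, row in zip(BASES, P):
    print(base, [round(p, 4) for p in row])
```

Each row of P is a probability distribution over the base found at the far end of the edge, so each row sums to 1.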

To return to the transition matrices, there will be 
one for each edge of the tree. In most models there is a 
single mechanism for the entire tree, but the amount of 
change on an edge of the tree varies depending on both
the time and mutation rate. In the example used here it 
is possible to go in either direction along the edge of a 
tree, and thus the tree is shown as unrooted. More 
detailed models will allow for differences in nucleo- 
tide composition, and for asymmetries in the rate of 
conversion between nucleotides. 
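Structure, mechanism, and edge weights together give the probability of each pattern of bases at the four tips. The minimal Python sketch below makes several assumptions. The topology is taken to pair t1 with t2 and t3 with t4. Base frequencies are equal. Transition probabilities come from the closed form obtained by diagonalizing the K3ST rate matrix, not from a numerical exponential. Because this model is symmetric, rooting the tree at an internal node does not change the result. The names `k3st_prob`, `pattern_prob`, and `edges` are illustrative, and the edge weights are hypothetical.

```python
import math
from itertools import product

BASES = "AGCT"  # indices 0..3; XOR of two indices gives the substitution class

def _parity(x):
    """+1 if x has an even number of set bits, otherwise -1."""
    return -1 if bin(x).count("1") % 2 else 1

def k3st_prob(i, j, t, alpha, beta, gamma):
    """P(state j at the end of an edge | state i at its start), edge weight t.

    XOR class 1 (A<->G, C<->T) has rate alpha; class 2 (A<->C, G<->T) beta;
    class 3 (A<->T, G<->C) gamma (the beta/gamma assignment is an assumption).
    Closed form built from the four eigenvalues of the K3ST rate matrix.
    """
    rates = {1: alpha, 2: beta, 3: gamma}
    total = 0.0
    for k in range(4):
        eigenvalue = sum(r * (_parity(k & h) - 1) for h, r in rates.items())
        total += _parity(k & (i ^ j)) * math.exp(eigenvalue * t)
    return total / 4.0

def pattern_prob(pattern, edges, alpha, beta, gamma):
    """Probability of a tip pattern (t1, t2, t3, t4) on the tree ((t1,t2),(t3,t4)).

    Rooted at the internal node u joining t1 and t2; v joins t3 and t4.
    """
    x = [BASES.index(b) for b in pattern]

    def p(a, b, t):
        return k3st_prob(a, b, t, alpha, beta, gamma)

    total = 0.0
    for u, v in product(range(4), repeat=2):  # unobserved ancestral states
        total += (0.25 * p(u, x[0], edges["t1"]) * p(u, x[1], edges["t2"])
                  * p(u, v, edges["internal"])
                  * p(v, x[2], edges["t3"]) * p(v, x[3], edges["t4"]))
    return total

# Hypothetical edge weights (expected changes per site on each edge).
edges = {"t1": 0.1, "t2": 0.2, "t3": 0.15, "t4": 0.1, "internal": 0.05}
print(pattern_prob("AAGG", edges, 1.0, 0.5, 0.25))
```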


Figure 1
The Two Unobserved Parameters

Under the covarion model we cannot be sure if a 
particular nucleotide is, or is not, potentially variable 
at a site, but all we need to estimate is the proportion 
of sites that are free to vary. In some cases this may be
inferred from the rate of evolution, calibrated by the
fossil record; the cytochrome c example in the introduction is one such case. In other cases it might be
estimated by a maximum likelihood method.

What we observe is an A, G, C, or T. The variability
status is represented by a superscript plus when the
states are free to change (A⁺, G⁺, C⁺, and T⁺) and a
minus when they are fixed (A⁻, G⁻, C⁻, and T⁻). The rate of
interchange between the fixed and variable states is set
to maintain the proportion φ of variable sites. The
instantaneous rate matrix (K′) for the hidden Markov
model is shown in Figure 2. The rate matrix is now
8 × 8 because there are now eight character states. In
this rate matrix, it is impossible to go directly between
some states, for example from A⁻ to C⁺, as shown by
a zero entry. Over longer periods, however, the A⁻ to C⁺
change is possible by two or more steps. All sites
have the same probability (φ) of being variable, and
thus the model is still stationary and
i.i.d. (independent and identically distributed). The
model is ‘stationary’ in the sense that the basic process
is unchanged over the whole tree. Whether a particular
site in a sequence is able to change is unknown, hence
the name ‘hidden’ Markov model.
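A minimal Python sketch of the 8 × 8 matrix K′ follows. It rests on assumptions that go beyond the text. First, the K3ST rates apply only among the variable (+) states, and β and γ are assigned to particular transversions for illustration. Second, switching from variable to fixed occurs at rate δ, matching the δ shown leaving A⁺ in Figure 2. Third, switching from fixed to variable occurs at rate δφ/(1 − φ), a choice that keeps the proportion of variable sites at φ. Other parameterizations of the switching rates are possible.

```python
BASES = "AGCT"
STATES = [b + "+" for b in BASES] + [b + "-" for b in BASES]

def covarion_rate_matrix(alpha, beta, gamma, phi, delta):
    """Hypothetical 8 x 8 covarion rate matrix over STATES (rows = from)."""
    if not 0.0 < phi < 1.0:
        raise ValueError("phi must lie strictly between 0 and 1")
    cls = {frozenset("AG"): alpha, frozenset("CT"): alpha,
           frozenset("AC"): beta, frozenset("GT"): beta,
           frozenset("AT"): gamma, frozenset("GC"): gamma}
    q = [[0.0] * 8 for _ in range(8)]
    for i, a in enumerate(BASES):
        for j, b in enumerate(BASES):
            if i != j:
                q[i][j] = cls[frozenset((a, b))]  # substitution while variable
        q[i][i + 4] = delta                      # a+ -> a- (site becomes fixed)
        q[i + 4][i] = delta * phi / (1.0 - phi)  # a- -> a+ (site becomes free)
    for i in range(8):
        q[i][i] = -sum(q[i][j] for j in range(8) if j != i)  # rows sum to 0
    return q

def stationary(phi):
    """Equal base frequencies; a fraction phi of sites in the variable states."""
    return [phi / 4.0] * 4 + [(1.0 - phi) / 4.0] * 4

q = covarion_rate_matrix(1.0, 0.5, 0.25, phi=0.4, delta=0.1)
print(q[STATES.index("A-")][STATES.index("C+")])  # no direct A- to C+ change
```

With these switching rates the flux from variable to fixed states, φδ, balances the flux back, (1 − φ) · δφ/(1 − φ), which is why `stationary(phi)` is the equilibrium distribution.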

A covarion model of the type described here 
increases by 50-100% the time over which current 
methods of tree reconstruction are reliable. This need 
not be the limit for increased performance. Many 
other combinations of parameters could be tested in 
future, though it is preferable to explore theoretical 
properties first in order to test predictions more con- 
structively. In a sense, the covarion model increases 
the ‘effective number’ of variable sites. The covarion 
model could also explain why a particular molecule 
might have a range of times for which it is most 
suitable for evolutionary reconstructions. This is 
because the length of time it takes a particular pro- 
tein to saturate depends on the rate of evolution of 
its tertiary structure. If tertiary structure does not
change, the protein is expected to saturate sooner.
Others have suggested that, in practice, some macromolecules
lose resolution at intermediate dates of divergence but
improve again for even older divergences.
Such a result could occur if slight changes to
secondary and tertiary structure occurred only very
occasionally (that is, low values of δ, or a model that is no longer
stationary). Under such circumstances, new
invariable positions that helped recovery of the tree
would arise occasionally.

Figure 1 (A) The structure of the model (an unrooted evolutionary tree) plus initial conditions (weights). (B) The model as a rooted tree; this does not affect the calculations. (C) The mechanism of Kimura’s three-parameter model. (D) A Markov transition matrix for a specific edge for a defined time.

Figure 2 Parameters for a hidden Markov model for
nucleotide evolution. (A) The instantaneous rate matrix
K′, and (B) a graphical representation. The diagonals
(labeled x) are set so that each row of the rate matrix
sums to 0. The arrows on the graphical form
correspond to the positive entries in the rate matrix.
The rates from A⁺ (α, β, γ, and δ) are shown on the
graph.


Tests for the Covarion Model 


What has been described thus far is evidence from 
structural biology for a covarion model. It is desirable 
to have quantitative tests on sequence data to test 
whether a covarion model is applicable in a particular 
case. A quantitative test based on Tuffley and Steel 
(1997) shows that for some data a covarion model fits 
better than a model where each site always has the 
same rate of evolution, even if different sites are evolving
at different rates (a rates-across-sites model). More
precisely, the test can reject any rates-across-sites
model where each site always has the same rate
throughout evolution. The test estimates a distance
function $d_{\mathrm{cov}}$ between any two groups of taxa, i and j.
It subdivides the $n$ sites in the dataset into five categories: $n_1$ are constant in all taxa; $n_2$ are constant
within each group (but have a different character
state in the two groups); $n_3$ and $n_4$ are variable in one
group and constant in the other; and $n_5$ are sites
that vary in both groups. If a covarion model is operating, then $d_{\mathrm{cov}} = (n_3 + n_4)/n$. Under a covarion
model this value is expected to increase with time,
whereas it is expected to remain zero if sites always
evolve at the same rate.
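The site classification behind the distance function d_cov can be sketched in Python as follows. Several details are assumptions made for illustration: the alignment format (a dictionary from taxon name to aligned sequence), the function name `dcov`, and the convention that n3 counts sites variable in group i while n4 counts sites variable in group j. Only their sum enters the statistic. Gap handling and the significance test itself are not shown, and the toy alignment is invented.

```python
def dcov(alignment, group_i, group_j):
    """Return ([n1, n2, n3, n4, n5], d_cov) for two groups of taxa.

    alignment maps taxon name -> aligned sequence (all the same length).
    n1: constant in all taxa; n2: constant within each group, different states;
    n3: variable in group i only; n4: variable in group j only;
    n5: variable in both groups.
    """
    length = len(alignment[group_i[0]])
    counts = [0, 0, 0, 0, 0]
    for site in range(length):
        states_i = {alignment[t][site] for t in group_i}
        states_j = {alignment[t][site] for t in group_j}
        const_i, const_j = len(states_i) == 1, len(states_j) == 1
        if const_i and const_j:
            counts[0 if states_i == states_j else 1] += 1
        elif const_j:  # variable in i, constant in j
            counts[2] += 1
        elif const_i:  # constant in i, variable in j
            counts[3] += 1
        else:          # variable in both groups
            counts[4] += 1
    return counts, (counts[2] + counts[3]) / length

toy = {"a1": "AAACT", "a2": "AAGCG", "b1": "ACATA", "b2": "ACAGC"}
print(dcov(toy, ["a1", "a2"], ["b1", "b2"]))
```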

More elaborate tests are possible, but it is too early
to know how useful they will be in practice. A test
can be ‘correct’ yet of little practical use if it has low
power. A test based on $d_{\mathrm{cov}}$ may be most useful for a nonstationary
covarion model, when there is a small number of
larger changes in tertiary structure. Tests that can be
applied directly to sequence data are an area for further
study.

It is interesting to note that the covarion model 
gives some biochemical justification for the use of, for 
example, a gamma distribution of rates. The gamma 
distribution compensates, in part, for some sites being 
invariant, and for pairs of sequences a covarion model 
can always be mimicked by a gamma distribution. 
Further work is required to determine when the 
gamma distribution is a useful approximation to the covarion model. It is an interesting question whether it is
useful to identify faster and slower sites, rather than 
assuming a site is sampled from a distribution of rates- 
across-sites. Finally, the covarion model is perhaps a 
justification for the common practice of discarding 
sites that are difficult to align. Such difficult-to-align
sites are expected to occur where there has been a 
change in three-dimensional structure of the macro- 
molecule. 


Conclusion 


On the biochemical side the covarion model is well 
established as a realistic description of protein evolu- 
tion through time. In addition, it appears important in 
explaining how sequences allow the recovery of older 
divergences during evolution. Despite its biochemical 
realism, and potential importance for evolutionary 
studies, it is still difficult to use the covarion model 
in practice. The hidden Markov approach has potential 
but still requires more evaluation. One maximum like- 
lihood program for the hidden Markov approach has 
been implemented for nucleotides (A. Rambaugh, per- 
sonal communication), but additional experience with 
such an approach is urgently required. The covarion
model is a good idea that still requires more research
to be fully implemented. To return to the opening
sentence, the covarion model allows the integration 
of molecular evolution at the sequence and tertiary 
structure levels; this is its rationale. 
The mammalian genome is approximately 40% G or C
and 60% A or T nucleotides. (According to Chargaff’s 
rules, the number of G nucleotides is equal to the 
number of C nucleotides, and the number of A 
nucleotides is equal to the number of T nucleotides, 
because, in double-stranded DNA, A pairs with T and 
G pairs with C.) Dinucleotides are two consecutive 
nucleotides on the same strand of a nucleotide chain, 
generally represented in the 5’ to 3’ direction. For 
instance, the dinucleotide CpG represents a C on the 
5’ side of a G, joined by a phosphodiester bond. In a 
genome consisting of 20% G and 20% C, the expected
frequency of CpG dinucleotides is 0.2 × 0.2 = 0.04
(4%). There are 16 possible dinucleotides, and they
generally occur at the expected frequencies in the 
genome. The CpG dinucleotide, however, is excep- 
tionally rare throughout most of the genome, repre- 
sented at only a fraction of the expected frequency. 
The reverse dinucleotide, GpC, which has the G on 
the 5’ end of the chain, occurs at about the expected 
frequency. In CpG islands, however, the overall fre- 
quency of G and C is much higher than the average 
40%, and the CpG dinucleotide occurs at the expected 
frequency based on the overall G and C content of the 
region. CpG islands are defined as regions of the 
genome in which the G+C content exceeds 50%,
and the CpG frequency is approximately equal to the 
GpC frequency. As an example, consider the human 
phosphoglycerate kinase gene. The G+C frequency in 
the promoter region is approximately 64%, while that 
in the coding region is about 48%. The expected 
number of CpG dinucleotides in the coding region is
89, but only 11 are found. In the promoter region, by contrast,
83 CpG dinucleotides are found, in close agreement
with the expected value of 83. The promoter region of
the Pgk gene is a CpG island.
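The quantities in this paragraph can be computed directly from a sequence. The minimal Python sketch below reports G+C content, observed CpG and GpC counts, and the expected CpG count under independence, C × G / N. That expectation is an approximation that treats the N − 1 dinucleotide positions as N. The island test follows the definition given here: G+C above 50% and CpG roughly equal to GpC. The 25% tolerance for "roughly equal" is a hypothetical choice. Widely used operational definitions also impose a minimum length and an observed/expected CpG threshold, which are not applied here.

```python
def cpg_stats(seq):
    """G+C fraction, CpG and GpC counts, and expected CpG count (C*G/N)."""
    seq = seq.upper()
    n = len(seq)
    c, g = seq.count("C"), seq.count("G")
    pairs = [seq[i:i + 2] for i in range(n - 1)]  # overlapping dinucleotides
    return {
        "gc_fraction": (c + g) / n,
        "cpg": pairs.count("CG"),
        "gpc": pairs.count("GC"),
        # e.g. 20% C and 20% G gives 0.2 * 0.2 = 0.04 expected CpG per position
        "expected_cpg": c * g / n,
    }

def looks_like_cpg_island(seq, tolerance=0.25):
    """Hypothetical test: G+C > 50% and CpG within `tolerance` of GpC."""
    s = cpg_stats(seq)
    return (s["gc_fraction"] > 0.5 and s["gpc"] > 0
            and abs(s["cpg"] - s["gpc"]) <= tolerance * s["gpc"])

print(cpg_stats("CGGC" * 10), looks_like_cpg_island("CGGC" * 10))
print(cpg_stats("GGCCA" * 8), looks_like_cpg_island("GGCCA" * 8))
```

The second sequence is G+C rich but contains no CpG at all, mimicking CpG depletion, so it fails the test.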

The rarity of CpG dinucleotides is a result of the
high frequency of mutations occurring in CpG dinucleotides that are methylated (5-methylcytosine in
place of C). Most methylation in mammals occurs in
CpG dinucleotides rather than in CpX, where X is
any other nucleotide. Methylated CpGs are therefore prone to
mutation: spontaneous deamination of 5-methylcytosine
produces thymine, converting CpG to TpG. CpG
islands form in regions of the genome that are not
subject to methylation in the germline. (Mutations
in somatic tissues are not passed on to subsequent
generations and are therefore irrelevant.) For this
reason, the CpG frequency in these regions is close to the expected value. It is not
clear, however, why their G+C content is often more than 75%.

Within CpG islands, the tetranucleotide CCGG
may occur several times. This site happens to be
cleaved by the restriction endonuclease HpaII. HpaII
does not cleave this site when the internal C (which is
part of a CpG) is methylated. Because CpG islands
contain several unmethylated CCGG sites, these regions
of the genome may be cleaved into many small fragments
by HpaII. For this reason, CpG islands are also referred
to as HTF islands (for HpaII tiny fragment). Many
rare-cutting restriction endonucleases have recognition
sites that are rich in G and C and may be sensitive to
methylation. Rare cutters such as NotI cleave within
CpG islands but at very few other locations, and
clusters of their sites may be found within CpG islands.
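An idealized HpaII digest shows why unmethylated islands yield "tiny fragments". In this minimal Python sketch, HpaII cuts C^CGG on the given strand, and a site is skipped when its internal C is listed as methylated. The methylated positions are 0-based and hypothetical. Only one strand is scanned, and partial digestion is ignored.

```python
def hpaii_fragments(seq, methylated=frozenset()):
    """Fragment lengths from an idealized, complete HpaII digest of one strand.

    HpaII recognizes CCGG and cuts between the two Cs (C^CGG). A site is left
    uncut if its internal C (0-based index site + 1) is in `methylated`.
    """
    seq = seq.upper()
    cuts = [i + 1 for i in range(len(seq) - 3)
            if seq[i:i + 4] == "CCGG" and i + 1 not in methylated]
    bounds = [0] + cuts + [len(seq)]
    return [end - start for start, end in zip(bounds, bounds[1:])]

island_like = "AACCGGTTCCGGAA"
print(hpaii_fragments(island_like))        # both sites unmethylated
print(hpaii_fragments(island_like, {3}))   # first site's internal C methylated
```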

CpG islands are found in certain regulatory regions 
of the genome, including the promoter regions of 
many housekeeping genes such as the phosphoglycer- 
ate kinase gene. Since DNA methylation is involved 
in the repression of gene expression, it is usually not 
seen in association with housekeeping genes, which 
are expressed in all tissues. In tissue-specific genes, 
CpG islands are much less common, probably because 
these genes are frequently methylated, resulting in 
potential loss of CpG dinucleotides due to mutation. 
There are about 30000 CpG islands in the human 
genome. The fact that many are associated with 
genes suggests that CpG islands might be useful in 
locating genes within DNA sequences. 

G+C content may also be higher in Giemsa-light 
staining regions (also called ‘R-bands’) of the genome, 
which replicate during the first half of S-phase, than 
in Giemsa-dark regions, which replicate later in S- 
phase. 


See also: Chargaff’s Rules; Codon Usage Bias 
Craniosynostosis is a congenital malformation caused
by premature fusion of the cranial sutures, the seams 
that separate the individual bones of the skull. This 
prevents skull growth in the direction perpendicular 
to the fused suture, causing compensatory overgrowth 
at unaffected sutures. A specific genetic cause can 
be identified in about 20% of cases, predominantly 
accounted for by heterozygous mutations in four 
genes, encoding three members of the fibroblast 
growth factor receptor family (FGFR1, FGFR2, and 
FGFR3), and the transcription factor TWIST. These 
mutations cause the autosomal dominant syndromes 
of Crouzon, Apert, Pfeiffer, Muenke, and Saethre- 
Chotzen. Several FGFR mutations exhibit the highest 
point mutation rates currently known; these muta- 
tions originate exclusively during spermatogenesis. 
FGFR mutations are notable for the unusually com- 
plex series of allelic and nonallelic mutations that 
cause distinct phenotypes, through a variety of gain- 
of-function mechanisms. 


Classification of Craniosynostosis 


Craniosynostosis affects about 1 in 2500 individuals 
and is a significant medical problem. Without surgical 
treatment, the consequent distortion of skull growth 
may lead to altered blood flow in the brain, raised in- 
tracranial pressure, and cosmetic deformity; in more 
complex cases, involvement of the facial skeleton may 
cause additional problems with vision, hearing, nasal 
breathing, and dental development. 

Two methods of classification of craniosynostosis 
are used: anatomical and etiological (i.e., by cause). 
The anatomical classification identifies the fused cra- 
nial suture. There are six major sutures, comprising 
single metopic and sagittal sutures and paired coronal 
and lambdoid sutures (Figure 1). Single suture synos- 
tosis most commonly involves the sagittal suture 
(50% of cases), followed by coronal (20%, one third 
of which are bilateral), metopic (10%), and lambdoid 
(3%). Multiple suture synostosis accounts for the 
remainder. Alternatively, an etiological classification
emphasizes the primary cause of the craniosynostosis.
The two most common causes of craniosynostosis are
restriction of fetal head movement during pregnancy
and single gene disorders (syndromes) that
predispose to suture fusion. These syndromes may
often be recognized by their characteristic clinical
features, particularly the combination of facial appearance and limb abnormality. Bilateral coronal and multiple suture synostosis occur with disproportionate
frequency in syndromic cases, whereas sagittal synostosis is underrepresented in this group. The following
section summarizes the key diagnostic features of the
common craniosynostosis syndromes.

Figure 1 … craniosynostosis. (A) Skull viewed from above showing
the names and locations of the cranial sutures. (B)
Alterations in the skull shape caused by sagittal
synostosis (above) and bicoronal synostosis (below).
The involved suture(s) is denoted by the thicker line.


Craniosynostosis Syndromes 


Apert Syndrome 

First described in 1906, Apert syndrome has a preva- 
lence of 1 in 65 000. The clinical features are a distinc- 
tive facial appearance, with high forehead, prominent 
eyes (caused by shallow orbits), prominent beaked 
nose and underdeveloped midface, and characteristic 
complex fusions of the digits (syndactyly) of the hands 
and feet (Figure 2). 


Crouzon Syndrome 

First described in 1912, Crouzon syndrome has a pre- 
valence of 1 in 60000. The facial appearance is similar 
to Apert syndrome but the hands and feet appear 
normal. Crouzon syndrome is sometimes accompan- 
ied by the specific skin disorder, acanthosis nigricans, 
characterized by pigmented, thickened, felty skin. 


Pfeiffer Syndrome 

First described in 1964, Pfeiffer syndrome has a pre- 
valence of approximately 1 in 100 000. It is similar to 
Crouzon syndrome, but the big toes and sometimes
the thumbs are broad and turned away from the
other digits.


Muenke Syndrome 

Muenke syndrome was only recognized in 1996, but is 
probably the commonest craniosynostosis syndrome 
(approximately 1 in 30000). The nonspecific features 
make this disorder difficult to diagnose clinically, but 
it is readily identified by molecular genetic testing. 
Muenke syndrome is defined by the presence of a 
specific C→G transversion in the FGFR3 gene, corresponding to a proline 250 to arginine substitution. This
mutation is present in about 30% of patients with 
coronal synostosis. 


Saethre—Chotzen Syndrome 

First described in 1931, Saethre-Chotzen syndrome 
has a prevalence of approximately 1 in 100000. The 
facial features include a low frontal hairline, facial 
asymmetry, drooping eyelids (ptosis), and small ears. 
Diagnostic limb abnormalities, which are not always 
present, are webbing between the digits and a broad 
big toe with a duplicated terminal phalanx. 


Craniofrontonasal Syndrome 

First described in 1977, craniofrontonasal syndrome 
has a prevalence of less than 1 in 100 000. The craniosynostosis involves the coronal sutures and is associated
with very wide-spaced eyes, a grooved nasal tip, slop- 
ing shoulders, and longitudinally split nails. This dis- 
order is X-linked but, unusually for an X-linked 
condition, females are more severely affected than 
males. The explanation for this awaits identification 
of the causative gene, which has been mapped to Xp22. 
Mutations in Craniosynostosis


Mutations of four genes, FGFR1, FGFR2, FGFR3, and 
TWIST, are common causes of craniosynostosis. Key 
features of these genes, their corresponding proteins 
and the syndromes with which they are associated are 
summarized in Table 1; for completeness this includes
a rare disorder termed Beare-Stevenson syndrome. 
An additional gene, MSX2, is of historic interest 
because the first molecularly defined cause of cranio- 
synostosis, described in 1993, was an MSX2 mutation 
in a single family with Boston syndrome. However, 
MSX2 mutations usually give rise to a different 
phenotype with symmetric holes in the skull bones 
(parietal foramina). 


Mutations of Fibroblast Growth Factor Receptors (FGFRs)

The four FGFRs, members of the receptor tyrosine 
kinase superfamily, are transmembrane proteins that 
bind extracellular fibroblast growth factors (FGFs). 
FGF binding promotes FGFR dimerization, resulting 
in trans-autophosphorylation by the intracellular tyr- 
osine kinase domains. This in turn activates specific 
intracellular signaling pathways, leading to alterations 
in cell growth, division, migration, or death. Analysis 
of FGFR genes in craniosynostosis has identified 
mutations in receptor types 1, 2, and 3, and has 
revealed that both Crouzon and Pfeiffer syndromes 
are genetically heterogeneous (Table 1). In addition to
craniosynostosis, other mutations of FGFR3 cause the
bone dysplasia syndromes thanatophoric dysplasia I
and II, achondroplasia, hypochondroplasia, and SADDAN
(severe achondroplasia with developmental delay and
acanthosis nigricans). The positions of the
most important FGFR mutations in craniosynostosis
are illustrated in Figure 3. The mutations tend to
be localized, specific, and sometimes highly recurrent
missense amino acid substitutions. In the case of
FGFR2 and FGFR3, different (allelic) missense mutations are associated with different phenotypes,
suggesting that these mutations act by a variety of
gain-of-function mechanisms. The three mechanisms
identified for the craniosynostosis mutations are constitutive activation by covalent FGFR dimerization,
increased FGF binding affinity, and altered splicing of
alternative FGFR isoforms.

Figure 2 Clinical features of Apert syndrome. The facial appearance (left) combined with the syndactyly of the hands (right) is characteristic.

Table 1 Genes mutated in craniosynostosis and their corresponding syndromes
Gene | Chromosomal location | Amino acids in protein | Mutation first described (year) | Associated disorder(s)
FGFR1 | 8p11.2–p11.1 | 822 | 1994 | Pfeiffer syndrome (mild)
FGFR2 | 10q26 | 821 | 1994 | Apert, Crouzon, Pfeiffer, Beare-Stevenson syndromes
FGFR3 | 4p16.3 | 806 | 1994 | Muenke syndrome; Crouzon syndrome with acanthosis nigricans; also short-limbed bone dysplasias (see text)
TWIST | 7p21.1 | 202 | 1997 | Saethre-Chotzen syndrome
MSX2 | 5q34–q35 | 267 | 1993 | Boston craniosynostosis

Two particularly notable sites of mutation are high- 
lighted in Figure 3. First, a key cysteine residue in the 
IgIII domain of FGFR2 is one of the few amino acids in 
any human protein for which all possible substitutions
obtained by altering a single base of the triplet codon 
(in this case TGC, encoding cysteine) have been 
observed in nature (the substituted amino acids are 
arginine, glycine, phenylalanine, serine, tryptophan, 
and tyrosine). Second, a conserved proline residue in 
the linker between the IgII and IgIII extracellular
domains in each of the three FGFRs commonly 
mutates to arginine in all three proteins. This mutation 
causes Pfeiffer syndrome in FGFR1, Apert syndrome 
in FGFR2, and Muenke syndrome in FGFR3. The 
specific C→G mutations causing Apert and Muenke
syndromes have the highest known rates of any nucleot- 
ide transversion in the human genome (~10~ per 
haploid genome). In the case of the FGFR2, it has 
been shown that mutations causing Apert, Crouzon,
and Pfeiffer syndromes arise exclusively from the
father (this has also been demonstrated for the
FGFR3 mutation in achondroplasia). These fathers
tend to be older than average, but the mechanism of
the excessive paternal mutations is not fully understood.

Figure 3 … transmembrane proteins are shown traversing the cell membrane (pair of dashed lines), with the extracellular side to the left. Open rectangles delineate the principal domains. Symbols denote specific, recurrent amino acid substitutions or splicing mutations, and the shaded rectangle shows a broader region of mutation, as indicated in the key. The arrowhead shows the position of the hypermutable cysteine in FGFR2, and the arrows indicate the positions of equivalent proline-to-arginine mutations in all three receptors. Additional mutations of FGFR3 (not shown) are important causes of short-limbed bone dysplasia.
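The point about the FGFR2 cysteine codon can be checked by enumerating every single-base change of TGC under the standard genetic code. The Python sketch below does this. It excludes the synonymous change (TGT, still cysteine) and the stop codon TGA. The six amino acids listed in the text remain, with serine arising twice, from AGC and TCC. The helper names are illustrative.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... ("*" = stop).
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {a + b + c: AMINO[16 * i + 4 * j + k]
        for i, a in enumerate(BASES)
        for j, b in enumerate(BASES)
        for k, c in enumerate(BASES)}

def single_base_missense(codon):
    """Map each missense single-base mutant of `codon` to its amino acid."""
    ref = CODE[codon]
    out = {}
    for pos in range(3):
        for base in "ACGT":
            if base == codon[pos]:
                continue
            mutant = codon[:pos] + base + codon[pos + 1:]
            if CODE[mutant] not in (ref, "*"):  # skip synonymous and stop
                out[mutant] = CODE[mutant]
    return out

print(single_base_missense("TGC"))
```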


Mutations in TWIST 

TWIST encodes a transcription factor of the basic 
helix-loop-helix family, required for cranial neural 
tube formation and the control of muscle and bone 
differentiation. The orthologous gene was originally 
identified in Drosophila melanogaster, in which twist 
plays a key role in mesoderm formation. It has been 
demonstrated that a Drosophila FGFR ortholog, htl, is 
a transcriptional target of twist, raising the possibility 
that bone differentiation in the cranial suture utilizes a 
developmental pathway conserved from flies. 

Unlike the FGFRs, the heterozygous TWIST 
mutations in Saethre-Chotzen syndrome cause loss 
of function of the protein (haploinsufficiency). TWIST 
mutations are correspondingly more diverse and 
include complete gene deletions, chromosome trans- 
locations, and intragenic insertions, deletions, mis- 
sense, and nonsense mutations. 
The application of molecular techniques to the genetic
manipulation of plants and animals has contributed
greatly to the understanding both of the genetic basis
for a variety of human diseases and of fundamental 
biological control mechanisms. Nevertheless, eukary- 
otic biology presents a number of challenging 
obstacles to precise genetic manipulation. For example, 
transgenic plants and animals are commonly gener- 
ated by incorporation of foreign DNA into the gen- 
ome in a more or less random fashion. This often 
results in unexpected patterns of transgene expression 
due to position effects that can compromise the inter- 
pretation of gene function. Homologous recom- 
bination can be used to ameliorate this problem by 
targeting DNA to a known locus in the genome with 
a predictable expression pattern. Currently such 
procedures are inefficient and require the use of
selectable marker genes that can themselves distort 
expression patterns of neighboring genes. A second 
complication arises from the multicellular nature of 
the organism itself. In metazoans not only is there a 
diversity of cell types present in the various tissues 
and organs, but different genetic control processes 
may be operative at different developmental stages. 
Hence, the comprehensive assessment of a particular 
gene’s function necessitates its analysis in many differ- 
ent tissues throughout development and also in the 
adult. 

Site-specific DNA recombination provides one 
solution to these problems. Particularly useful is the 
Cre recombinase protein from bacteriophage P1. Cre 
is a member of the large Int family of site-specific 
DNA recombinases (named after the canonical and 
founding member, Int recombinase of bacteriophage 
lambda). Early on it was determined, both genetically
and biochemically, not only that Cre was a potent
DNA recombinase but also that, in contrast to many
members of the Int family, it required no Escherichia
coli host proteins for efficient recombination. These
observations directly led to the demonstration that the 
prokaryotic Cre protein was active as a DNA recom- 
binase not only in bacteria, but also in eukaryotes. 

The alacrity and precision with which Cre carries 
out site-specific recombination in eukaryotic cells have
had a significant impact on the genetic manipulation
of transgenic animals and plants. Using site-specific 
DNA recombination strategies, molecular switches 
can be designed and placed into either transgenic or 
embryonic stem (ES) cell-derived animals to turn on a
specific gene or, alternatively, to ablate genes in a
tissue-specific and developmentally regulated manner.
Such precise control of gene activity allows a truer assessment
of a gene’s role in a particular organ or tissue. Site- 
specific DNA recombination strategies are becoming 
valuable in cell lineage analysis, in gene targeting, 
and in the engineering of defined chromosome rear- 
rangements. 
Cre/lox Recombination


The Cre (cyclization recombination) protein of phage 
P1 catalyzes recombination at a specific site on the P1 
genome called loxP (locus of X-over of phage P1), and
plays at least two roles in the biology of the phage. 
First, Cre enhances the stability of the single-copy P1 
plasmid replicon in E. coli by efficiently resolving 
dimeric circles generated by occasional homologous 
recombination between daughter molecules after plas- 
mid replication. In the absence of dimer resolution by 
Cre, partition of the single-copy dimeric DNA to 
only one of the two daughter cells at cell division 
results in cells that no longer carry the P1 replicon, 
i.e., plasmid loss. In addition, Cre ensures the prompt 
cyclization of the linear, terminally redundant virion
DNA of phage P1 after infection should the host 
recombination system fail. 

Cre is a 343 amino acid protein related to the Int 
recombinase of phage lambda. Unlike many other Int 
family members, Cre requires neither accessory pro- 
teins for activity nor any special topology of the DNA 
substrate. The loxP recombination site is 34 bp in size, 
consisting of two 13 bp inverted repeats flanking an 
asymmetric 8 bp spacer region that imparts an overall 
directionality to the site. Conservative site-specific 
DNA recombination occurs within the spacer region. 
Recombination between two directly repeated loxP 
sites on the same DNA molecule results in precise 
excision of the intervening DNA as a covalently 
closed circular molecule (Figure 1). DNA inversion 
occurs between oppositely oriented loxP sites. 
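The site architecture and the two intramolecular outcomes described above can be modeled in a few lines. The following Python sketch is a toy illustration, not a sequence-analysis tool: the 34 bp string is the commonly cited loxP sequence, and the list-of-segments representation (`>` and `<` for loxP sites in the two orientations, a trailing `'` for an inverted segment) is an assumption made here purely for illustration.

```python
# Toy model of loxP architecture and Cre-mediated recombination (illustrative only).

# 13 bp inverted repeat + 8 bp asymmetric spacer + 13 bp inverted repeat
LOXP = "ATAACTTCGTATA" + "GCATACAT" + "TATACGAAGTTAT"

_COMPLEMENT = str.maketrans("ACGT", "TGCA")
LOX_SITES = (">", "<")  # a loxP site in forward or reverse orientation


def revcomp(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def describe_site(site):
    """Split a 34 bp site into arms and spacer and check their symmetry."""
    left, spacer, right = site[:13], site[13:21], site[21:]
    return {
        "length": len(site),
        "arms_are_inverted_repeats": revcomp(left) == right,
        # an asymmetric spacer is what gives the site its direction
        "spacer_is_asymmetric": revcomp(spacer) != spacer,
    }


def flip(segment):
    """Reverse the orientation of a lox site or a named segment."""
    if segment == ">":
        return "<"
    if segment == "<":
        return ">"
    return segment[:-1] if segment.endswith("'") else segment + "'"


def recombine(molecule):
    """Recombine the first two lox sites on a linear molecule.

    Directly repeated sites -> excision: returns (remaining_molecule, excised_circle).
    Inverted sites -> inversion: returns (inverted_molecule,).
    """
    i, j = [k for k, s in enumerate(molecule) if s in LOX_SITES][:2]
    if molecule[i] == molecule[j]:
        remaining = molecule[:i] + [molecule[i]] + molecule[j + 1:]
        circle = molecule[i + 1:j] + [molecule[j]]  # each product keeps one site
        return remaining, circle
    middle = [flip(s) for s in reversed(molecule[i + 1:j])]
    return (molecule[:i + 1] + middle + molecule[j:],)


print(describe_site(LOXP))
print(recombine(["promoter", ">", "neo", ">", "gene"]))       # direct repeats
print(recombine(["promoter", ">", "A", "B", "<", "gene"]))    # inverted repeats
```

Excision leaves one site in the chromosome and one on the released circle, which is why a single 34 bp loxP site remains at an engineered locus after marker removal.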


Removal of Selectable Marker Genes 


In general, stable gene transfer into cultured mamma- 
lian cells occurs at low efficiency. To facilitate identi- 
fication of clones that have incorporated exogenous 
DNA into the genome, a selectable marker gene such 
as neo (conferring resistance to the antibiotic G-418) 
is often used. Modification of a specific gene in ES 
cells by homologous recombination (gene targeting) 
typically occurs at a frequency of only 0.1-10% that 
of random or illegitimate recombination, so that over- 
all only a few correctly targeted clones are obtained 
per million cells. Thus, the use of a selectable marker 
and sophisticated screening procedures has been man- 
datory for identifying null or ‘knockout’ mutations by 
gene targeting in ES cells. At times, however, it would 
be advantageous to rid cells of the selectable marker 
after gene targeting, either to facilitate a second round 
of gene targeting, or simply because it may be undesir- 
able to have a functional drug-resistance gene in the 
final engineered transgenic. 
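The frequencies quoted above explain why screening is unavoidable. As a back-of-the-envelope sketch, assuming that targeted and randomly integrated clones survive drug selection equally well (and ignoring any enrichment scheme), if homologous targeting occurs at a fraction r of the random integration frequency, the proportion of drug-resistant clones that are correctly targeted is r/(1 + r):

```python
def targeted_fraction(r):
    """Fraction of drug-resistant clones that carry the targeted event.

    r = (homologous targeting frequency) / (random integration frequency).
    Assumes both kinds of integrant survive selection equally (a simplification).
    """
    return r / (1.0 + r)


# The 0.1-10% range quoted in the text
for r in (0.001, 0.01, 0.1):
    print(f"r = {r:<5} -> {targeted_fraction(r):.2%} of resistant clones are targeted")
```

Even at the favorable end of the range only about 1 resistant clone in 11 is correct, and at the unfavorable end about 1 in 1000.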

Cre-mediated site-specific recombination provides 
a simple way of attaining this goal. The selectable
marker gene, such as neo, is embedded between two
directly repeated loxP sites (a loxP neo cassette, also
informally referred to as a ‘floxed’ neo gene). Subsequent
expression of Cre in cells carrying the loxP neo
cassette, for example by a second transfection with a 
Cre expression vector, results in efficient removal of 
the selectable marker from the genome. 


Marker Recycling 

Because ES cells are diploid, both gene copies must 
be disrupted to determine the null phenotype. One 
way of doing this is by sequentially targeting each 
allele with a different selectable marker, but two dif- 
ferent DNA targeting constructs must then be made. 
Moreover, there exists only a limited number of select- 
able marker genes that work well in ES cells, and 
multiple targeting events using different markers 
would soon exhaust this repertoire of marker genes. 
Alternatively, a loxP-flanked marker gene is used for
the first round of homologous targeting, and it is 
removed by Cre-mediated recombination (Figure 1). 
The same selectable marker gene can thus be used for 
subsequent rounds of gene targeting. Marker recycling
is particularly useful in situations that require
genetic modification of two or more autosomal genes
in ES cells and thus numerous rounds of targeting.


Introduction of Point Mutations 

Classical knockout mutations introduced into mice by 
molecular techniques are most often deletions and/or 
insertions that give a complete null phenotype. How- 
ever, many mutations that cause human disease are 
either point mutations or small deletions that may 
alter a gene’s activity, but not completely eliminate it. 
Strategies incorporating Cre recombinase have facili- 
tated the engineering of such ‘subtle’ mutations into 
mice. Homologous recombination in ES cells is used 
to replace the endogenous wild-type allele in the genome
with the desired point mutation, along with
an adjacent loxP-flanked selectable marker gene. Because the
presence of a marker gene can have adverse effects on 
the expression of the target gene and/or neighboring 
genes, it is prudent to remove it by Cre-mediated 
recombination. The final gene-modified locus carries 
the desired point mutation and, in addition, the 34 bp 
loxP site. The small 34 bp site itself has not been shown
to have any deleterious effect on gene expression,
provided that it is not placed within a critical gene
element.

Eviction of the marker is achieved by transfection 
of the loxP-marked ES cells with a Cre expression
vector. Alternatively, the gene-modified ES cells are 
injected into blastocysts to generate a mouse that can 
then be mated to a second mouse that expresses Cre. 
By designing the Cre transgenic to express Cre in the
zygote or germline lineage, progeny mice are produced
that have ‘automatically’ deleted the marker
gene from the genome.

Figure 1 Cre-mediated excisive recombination of a selectable marker. Disruption/deletion of a target gene (striped
box) is achieved by homologous recombination using a targeting vector carrying the loxP neo selectable marker
embedded within flanking target gene homology (striped box). The neo marker in the resulting gene knockout (KO) is
removed as a covalently closed circle by Cre-mediated site-specific recombination at the loxP sites (open arrows)
flanking the neo gene. The excised circular DNA is not maintained in mammalian cells because it lacks an origin of
DNA replication and other DNA sequences required for stability.


Conditional Mutations 


The binary nature of the Cre/lox system naturally
gives rise to a conditional system for regulating gene
expression. Simply put, DNA excision at loxP sites is
dependent on Cre expression. Hence, recombination
will have occurred only in those cells that express Cre
recombinase, or in the descendants of a progenitor cell
that had expressed Cre. By placing the cre gene under
the control of a promoter with the desired type of
regulation, recombination is directed to a particular
cell type or developmental stage. Evaluation of the
effects of gene expression in a particular cell type or
tissue among the many different ones present in
metazoans can thus be achieved by designing
recombination-based genetic switches either to
turn genes on or to eliminate target gene expression
in a tissue-specific and/or temporal manner.


Gain of Function 

Recombination-based genetic switches that result in a 
gain of function are valuable for a variety of transgenic 
strategies, including the targeting of transgene mis- 
expression to a specific tissue, the maintenance of 
transgenic lines expressing potentially lethal genes, 
and cell lineage analysis. To make expression of a 
transgene dependent on Cre-mediated recombination, 
a loxP STOP cassette is placed between the promoter
and the gene to be regulated, where STOP is a DNA 
sequence designed to thwart downstream gene expres- 
sion by preventing proper transcription and transla- 
tion. Excision of STOP to permit transgene activation 
occurs only in cells that have expressed Cre. An addi- 
tional level of control over transgene expression is 
thereby attained. Transgene expression is confined to 
the overlap of two separate expression patterns, that of 
the lox-modified transgene, and that of Cre recombinase,
and is temporally restricted to occur only after
prior Cre expression. Such recombinational strategies 
render potentially embryonic lethal transgenes quies- 
cent until activated at a desired later time by a suitably 
regulated cre gene. Propagation of transgenic models 
that might otherwise be impossible to maintain can 
thus be achieved. Because activation of a reporter gene 
by Cre-mediated recombination indelibly marks Cre-
expressing cells and their descendants, Cre-based
strategies are becoming increasingly important for cell
lineage analysis. 
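Two rules from this paragraph can be expressed as a walk over a cell lineage tree: transgene expression requires both an active transgene promoter and prior excision of STOP, and excision, once it has happened, is inherited by all descendants of the Cre-expressing cell. In this Python sketch the `Cell` class, its fields and the example lineage are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Cell:
    name: str
    expresses_cre: bool = False           # hypothetical: cre promoter active in this cell
    transgene_promoter_on: bool = False   # hypothetical: promoter upstream of the loxP STOP cassette
    children: list = field(default_factory=list)


def transgene_expression(cell, stop_excised=False, result=None):
    """Map each cell name to whether the lox-STOP transgene is expressed.

    STOP excision is irreversible and inherited, so once Cre has acted in a
    cell, every descendant carries the activated allele.
    """
    if result is None:
        result = {}
    stop_excised = stop_excised or cell.expresses_cre
    # expression is confined to the overlap: excised STOP AND active promoter
    result[cell.name] = stop_excised and cell.transgene_promoter_on
    for child in cell.children:
        transgene_expression(child, stop_excised, result)
    return result


# Hypothetical lineage: Cre is expressed only in "progenitor", not in its daughters.
daughter1 = Cell("daughter1", transgene_promoter_on=True)
daughter2 = Cell("daughter2", transgene_promoter_on=False)
progenitor = Cell("progenitor", expresses_cre=True, transgene_promoter_on=True,
                  children=[daughter1, daughter2])
other = Cell("other", transgene_promoter_on=True)
zygote = Cell("zygote", children=[progenitor, other])

print(transgene_expression(zygote))
```

In this model "daughter1" expresses the transgene although it never expressed Cre itself, which is the property exploited in lineage tracing; "other" does not, despite an active promoter, because no Cre ever acted in its lineage.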


Loss of Function 

Null mutations ablate gene function in all cells of a 
multicellular organism and provide important insight 
into a gene’s biological role. In many cases, though, a 
gene may play different roles in different cells. To gain 
a finer understanding of gene function in a particular 
cell type, site-specific recombination is used to delete 
the gene specifically in the target cell type. Two lines of mice
must be engineered for this strategy: one that ex- 
presses Cre with the desired tissue or developmental 
specificity, and the other a mouse that has been modi- 
fied by homologous recombination to carry loxP sites 
flanking the gene or gene segment to be deleted. Mating
of these two mouse lines generates progeny carrying
both Cre and the lox-modified locus. In these mice
expression of Cre in the target tissue deletes the
lox-modified gene in the desired cells without disrupting
gene expression in other tissues, so that the biological 
role for that gene can more unambiguously be deter- 
mined. 
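The breeding arithmetic behind this two-line strategy is ordinary Mendelian segregation. The sketch below assumes that the cre transgene and the lox-modified locus are unlinked and that each parent is hemizygous or heterozygous as written; the genotype notation and the second, follow-up cross are illustrative assumptions rather than a protocol from the text.

```python
from fractions import Fraction
from itertools import product


def gametes(genotype):
    """All equally likely gametes for unlinked loci.

    genotype: tuple of (allele, allele) pairs, one pair per locus.
    """
    return list(product(*genotype))


def offspring_fraction(parent1, parent2, wanted):
    """Fraction of offspring for which wanted(gamete1, gamete2) is True."""
    g1, g2 = gametes(parent1), gametes(parent2)
    hits = sum(1 for a, b in product(g1, g2) if wanted(a, b))
    return Fraction(hits, len(g1) * len(g2))


# Locus 0: cre transgene ("cre" present, "-" absent); locus 1: target gene ("flox" or "+").
cre_line = (("cre", "-"), ("+", "+"))
flox_line = (("-", "-"), ("flox", "+"))


def carries_cre_and_flox(a, b):
    return "cre" in (a[0], b[0]) and "flox" in (a[1], b[1])


print(offspring_fraction(cre_line, flox_line, carries_cre_and_flox))

# Assumed follow-up cross to make both copies of the target gene floxed.
double_het = (("cre", "-"), ("flox", "+"))
flox_homozygote = (("-", "-"), ("flox", "flox"))


def cre_and_homozygous_flox(a, b):
    return "cre" in (a[0], b[0]) and a[1] == b[1] == "flox"


print(offspring_fraction(double_het, flox_homozygote, cre_and_homozygous_flox))
```

Both crosses yield the desired genotype in one quarter of the progeny under these assumptions; linkage between the two loci would change the numbers.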


Chromosome Rearrangements 


Cre-mediated site-specific recombination is both con- 
servative and reciprocal, proceeds either intra- or 
intermolecularly, is remarkably efficient in eukaryotic 
cells, and is undeterred in recombining loxP sites that 
are quite far from each other (90 kb on Cre’s natural 
substrate, the P1 genome). It has thus become apparent
that Cre can be used for genome engineering, effecting
large-scale deletions, inversions,
and even chromosome translocations. Unlike spontaneous
or mutagen/radiation-induced chromosome
rearrangements, Cre-mediated rearrangements can be 
designed with nucleotide precision by using homo- 
logous recombination to place loxP sites exactly as 
desired in the genome, and then having Cre catalyze 
recombination between the loxP sites. 


Deletions, Duplications, and Inversions

Classical genetics, particularly that of Drosophila, has
benefited enormously from the use of large deletions
and inversions in such diverse areas as gene mapping,
saturation mutagenesis of a particular chromosome or 
chromosomal region, and strain construction. Cre- 
mediated genome rearrangements allow similar strat- 
egies to be implemented in the mouse, and will be 
useful in the functional dissection of complex genetic 
loci. Cre-based strategies have been used to generate 
large chromosomal deletions difficult or impossible to 
obtain by homologous recombination-only strategies. 
This has been the case, for example, for a 400 kb
interval carrying the gene for the
Alzheimer-disease-associated amyloid precursor protein. Because a variety
of human genetic diseases stem from deletions of
chromosomal regions carrying multiple genes, precise 
engineering of the same deletions into the mouse will 
help generate better models of these disorders. Since 
Cre-mediated deletion in the megabase range may not 
proceed as efficiently as for smaller kilobase intervals, 
the desired deletions are selected in ES cells by incor- 
porating a negative selectable marker, such as the 
herpes simplex virus thymidine kinase gene, into the region to be
deleted. 

Note that Cre-mediated recombination at loxP 
sites placed in opposite orientations with respect to 
each other will result in a chromosomal inversion. 
Tandem duplications will be formed when similarly 
oriented loxP sites are placed on each of the chromo- 
somal homologs instead of on the same homolog 
(Figure 2). In this case, ‘unequal’ intermolecular
recombination between loxP sites on each homolog
gives rise to one chromosome with a deletion of the
target interval balanced by another carrying a tandem
duplication of that interval. Mating such animals
with wild-type animals gives rise to progeny with either
a partial trisomy or a partial monosomy of the target
interval. Design of targeted partial trisomies should
prove useful in understanding gene dosage effects,
such as those described for Down syndrome.

Figure 2 Generation of a balanced chromosome
deletion/duplication by Cre-mediated recombination.
Homologous recombination is used to place a loxP site
(open arrow) distal to the centromere (filled circle) on
one chromosome homolog, and a second loxP site
proximal to the centromere on the other chromosome
homolog. Cre-mediated recombination between the
loxP sites will generate one chromosome carrying a
deletion of genes C and D, and a second chromosome
with a tandem duplication of genes C and D.
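The unequal exchange shown in Figure 2 can be traced with a toy model in which a chromosome is a list of gene names and `loxP` marks the single site on each homolog. This representation is illustrative, and both sites are assumed to lie in the same orientation.

```python
from collections import Counter


def exchange_at_lox(chrom1, chrom2, site="loxP"):
    """Reciprocal exchange at the loxP site on each of two homologs."""
    i, j = chrom1.index(site), chrom2.index(site)
    return chrom1[:i] + chrom2[j:], chrom2[:j] + chrom1[i:]


wild_type = ["cen", "A", "B", "C", "D", "E"]
distal = ["cen", "A", "B", "C", "D", "loxP", "E"]    # loxP distal to C and D
proximal = ["cen", "A", "B", "loxP", "C", "D", "E"]  # loxP proximal to C and D

duplication, deletion = exchange_at_lox(distal, proximal)
print(duplication)  # ['cen', 'A', 'B', 'C', 'D', 'loxP', 'C', 'D', 'E']
print(deletion)     # ['cen', 'A', 'B', 'loxP', 'E']

# Gene dosage in offspring of a mating with wild type (one homolog from each parent)
for name, chrom in (("duplication", duplication), ("deletion", deletion)):
    dosage = Counter(g for g in chrom + wild_type if g not in ("cen", "loxP"))
    print(name, {gene: dosage[gene] for gene in "CDE"})
```

Offspring receiving the duplication chromosome carry three copies of C and D (partial trisomy), while those receiving the deletion chromosome carry one (partial monosomy); genes outside the interval stay at two copies.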


Translocations 

Spontaneous and mutagen-induced chromosomal 
translocations have played an important role in clas- 
sical genetics by providing tools to generate partial 
trisomies and monosomies and to obtain uniparental
disomies (arising from chromosomal nondisjunction
events). Synthetic reciprocal translocations
covering precise genomic segments are designed in a 
two-step process: homologous recombination is used 
to place loxP sites at the desired loci on chromosomal 
heterologs, and the balanced reciprocal translocation 
is then generated by Cre recombinase. In this strategy 
both loxP sites must be oriented similarly with respect 
to the centromere. If present in opposite orientations, 
site-specific recombination will lead to formation of 
unstable dicentric and acentric chromosomes. Cre- 
mediated translocations in the mouse have also been 
used to mimic somatic translocations associated with 
various human cancers. The design of mouse analogs 
of these translocations is helping to clarify the
contribution of chromosomal position effects on gene
expression to tumorigenesis.
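The orientation rule can be checked with the same kind of toy model. Here each chromosome is a list read from the centromere end, `cen1`/`cen2` are centromeres, and `>`/`<` are loxP sites in the two orientations. For recombination the two sites must be aligned in parallel, which for oppositely oriented sites means pairing one chromosome in reversed order. All names and layouts are illustrative.

```python
def flip(segment):
    return {">": "<", "<": ">"}.get(segment, segment)


def reverse(chrom):
    """Read a chromosome in the opposite direction (site orientations flip too)."""
    return [flip(s) for s in reversed(chrom)]


def lox_index(chrom):
    return next(k for k, s in enumerate(chrom) if s in (">", "<"))


def cre_exchange(chrom1, chrom2):
    """Reciprocal exchange between single loxP sites on two heterologous chromosomes."""
    if chrom1[lox_index(chrom1)] != chrom2[lox_index(chrom2)]:
        chrom2 = reverse(chrom2)  # align the sites in parallel before exchange
    i, j = lox_index(chrom1), lox_index(chrom2)
    return chrom1[:i] + chrom2[j:], chrom2[:j] + chrom1[i:]


def centromere_count(chrom):
    return sum(s.startswith("cen") for s in chrom)


chr_a = ["cen1", "A", ">", "B"]
same = ["cen2", "X", ">", "Y"]       # same orientation relative to its centromere
opposite = ["cen2", "X", "<", "Y"]   # opposite orientation

for label, partner in (("same", same), ("opposite", opposite)):
    products = cre_exchange(chr_a, partner)
    print(label, products, [centromere_count(p) for p in products])
```

Similarly oriented sites give two monocentric products (a balanced reciprocal translocation); opposite orientation gives one dicentric and one acentric product, as described in the text.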


Genomic Targeting of DNA 


Position effects on transgene expression can be miti- 
gated by targeting DNA to a predetermined genomic 
site. For example, homologous recombination has 
been used to ‘knock-in’ an altered allele or other 
transgene to a desired locus so that it is now under 
the developmental and tissue-specific control of that 
locus. Higher efficiencies of targeting can be achieved 
using Cre-mediated integrative recombination. The 
process requires two steps: first, a loxP site is placed 
at the desired locus by homologous targeting in ES 
cells; next, two plasmids, one a targeting plasmid car- 
rying a loxP site and the other a Cre expression con- 
struct, are transfected into cells carrying the genomic 
loxP site to obtain integrants of the targeting plasmid 
at the chromosomal target; integration is simply
the reverse of excision (Figure 1). Cre-mediated
targeting is valuable when the same locus is to be targeted
repeatedly with different alleles or other transgenes. 
Stable integration is obtained by providing only a
burst of transient Cre expression so that the targeting 
vector is trapped in the genome after the first (per- 
mitted) round of (integrative) recombination. A vari- 
ation of this procedure further increases targeting 
efficiency: two heterospecific lox sites that cannot 
recombine with each other are placed both at the 
chromosomal target locus and onto the targeting vec- 
tor. Because lox sites of the same specificity are profi- 
cient for recombination with each other, Cre catalyzes 
a double crossover exchange at the pairs of hetero- 
specific lox sites to integrate the transgene on the 
targeting vector into the chromosome. 
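The double-crossover exchange with heterospecific sites can be sketched as follows. Because only identical sites recombine, the two crossovers together swap the segment between the sites on the chromosome for the segment between the same two sites on the vector. The names `loxA` and `loxB` are placeholders for any pair of mutually incompatible lox variants, and writing the circular vector as a list starting at `loxA` is a simplifying assumption.

```python
def cassette_exchange(chromosome, vector, site_a="loxA", site_b="loxB"):
    """Double reciprocal exchange at two pairs of heterospecific lox sites.

    site_a recombines only with site_a, and site_b only with site_b, so the
    two crossovers together swap the intervals between the sites.
    """
    ca, cb = chromosome.index(site_a), chromosome.index(site_b)
    va, vb = vector.index(site_a), vector.index(site_b)
    if not (ca < cb and va < vb):
        raise ValueError("sites must appear in the same order on both molecules")
    new_chromosome = chromosome[:ca + 1] + vector[va + 1:vb] + chromosome[cb:]
    new_vector = vector[:va + 1] + chromosome[ca + 1:cb] + vector[vb:]
    return new_chromosome, new_vector


target_locus = ["5' flank", "loxA", "placeholder", "loxB", "3' flank"]
targeting_vector = ["loxA", "transgene", "loxB", "plasmid backbone"]

print(cassette_exchange(target_locus, targeting_vector))
```

In this model the plasmid backbone stays on the vector, and the chromosome receives only the transgene flanked by the two sites.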


Modified Cre Proteins 


Modification of Cre has further refined the utility 
of Cre recombinase in eukaryotic cells. For example, 
although wild-type Cre protein itself efficiently 
localizes in the nucleus of eukaryotic cells, the nuclear 
localization signal of the SV40 T-Ag was fused to Cre 
to guarantee nuclear entry, and the resulting chimeric 
protein was shown to be recombinationally active. 
Fusion with other proteins or protein motifs has also 
given functional recombinase derivatives. 


Green Fluorescent Protein 

Since recombination can only take place in cells 
expressing Cre, knowing which cells these are allows 
prediction of the cell population in which recombin- 
ation will occur. Fusion of the naturally fluorescent 
green fluorescent protein (GFP) of the jellyfish 
Aequorea victoria to Cre provides a handy way to 
identify cells expressing the recombinase in living 
cells and has been particularly useful in identifying 
loxP-modified ES cells committed to excisive recom- 
bination. After transfection with DNA coding for the 
GFPcre fusion gene, cells that express the fusion pro- 
tein, even transiently, are fluorescent and are easily 
recovered using a fluorescence-activated cell sorter 
(FACS). Because only a few cells actually take up 
DNA using standard transfection protocols, FACS 
sorting allows isolation of the productively trans- 
fected cell population in which the vast majority is 
destined for recombination. Expression of the GFPcre 
gene in transgenic animals may also help in determin- 
ing which tissues express Cre, information critical for 
success in conditional activation/gene ablation strat- 
egies. 


Regulation by Steroids 

For conditional genetic strategies using Cre, induction 
of recombination by simple administration of a drug 
or other small molecule to an animal would be quite 
valuable. One way of achieving this goal is to control
synthesis of the Cre protein using an inducible pro- 
moter system. Placing the cre gene under the control 
of an interferon- or tetracycline-responsive promoter,
for example, prevents synthesis of the Cre protein and 
ensuing recombination until induced by either inter- 
feron or tetracycline, respectively. Alternatively, the 
activity of the Cre protein itself can be regulated by
fusing a steroid receptor ligand-binding domain
to Cre. In the absence of its ligand, the ligand-binding
domain disables Cre recombinase activity in mammalian cells. Treatment
of cells or animals with the appropriate steroid acti- 
vates the Cre fusion protein so that recombination can 
occur. One advantage of this approach is that a tissue- 
specific promoter can be used to target expression of 
the fusion protein to a desired tissue. Temporally con- 
trolled recombination is achieved by dosing the ani- 
mal with the proper inducer. Such strategies permit 
the application of powerful pharmacological method- 
ologies to the understanding of specific gene func- 
tion by allowing gene activation or inactivation in a 
particular target organ after simple administration of 
an inducer. 
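A toy timeline makes the two levels of control explicit: the tissue-specific promoter decides where the fusion protein is present, the inducer decides when it is active, and excision, once it has occurred, persists. The tissue names and dosing day below are hypothetical, and the model assumes that recombination is complete on the day the inducer is given.

```python
def allele_timeline(fusion_expressed, dosing_days, n_days):
    """Daily state of a loxP-flanked allele under a ligand-dependent Cre fusion."""
    excised = False
    states = []
    for day in range(n_days):
        if fusion_expressed and day in dosing_days:
            excised = True  # fusion protein active only while the inducer is present
        states.append("deleted" if excised else "floxed")
    return states


# Hypothetical: fusion protein expressed in liver only; inducer given on day 2.
for tissue, expressed in (("liver", True), ("brain", False)):
    print(tissue, allele_timeline(expressed, dosing_days={2}, n_days=5))
```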


Further Reading 

Bethke B and Sauer B (1997) Segmental genomic replacement 
by Cre-mediated recombination: genotoxic stress activation 
of the p53 promoter in single-copy transformants. Nucleic 
Acids Research 25: 2828-2834. 

Hoess RH and Abremski K (1990) The Cre-lox recombination 
system. In: Eckstein F and Lilley DMJ (eds) Nucleic Acids and 
Molecular Biology, vol. 4, pp. 99. Berlin: Springer-Verlag. 

Justice MJ, Zheng B, Woychik RP and Bradley A (1997) Using 
targeted large deletions and high-efficiency N-ethyl-N- 
nitrosourea mutagenesis for functional analyses of the mam- 
malian genome. Methods 13: 423-436. 

Marth JD (1996) Recent advances in gene mutagenesis by site- 
directed recombination. Journal of Clinical Investigation 97: 
1999-2002. 

Rajewsky K, Gu H, Kuhn R et al. (1996) Conditional gene tar- 
geting. Journal of Clinical Investigation 98: 600-603. 

Sauer B (1998) Inducible gene targeting in mice using the Cre/lox 
system. Methods 14: 381-392. 


See also: Integrase Family of Site-Specific 
Recombinases; Knockout; Site-Specific 
Recombination; Transposons as Tools 
See: GSD (Gerstmann-Straussler Disease)
Francis Harry Compton Crick (1916-) was trained
as a physicist and joined the Cavendish Laboratory 
after World War II to study X-ray crystallography of
proteins. There he joined forces with J. D. Watson and 
in 1953 they produced the famous double-helix struc- 
ture of DNA. Although primarily a theoretician, in 
the late 1950s Crick began a study of mutants in the rII 
region of bacteriophage T4, and in a collaboration 
with S. Brenner, showed that certain mutations pro- 
duced frameshifts in the reading of the message. This 
allowed them to deduce that the genetic code was a 
triplet code. He was responsible for many significant 
theoretical contributions to several areas of molecular 
and cell biology. In the mid-1970s he changed direction
and joined the Salk Institute for Biological Studies
in La Jolla, California where he entered the field of 
neurobiology, primarily in the area of consciousness 
research. 


See also: Brenner, Sydney; Watson, James Dewey 
Cri-du-chat syndrome is a chromosomal disorder
characterized by a deletion of the short arm of chromo- 
some 5 encompassing the mid-portion of the terminal 
band 5p15. In younger patients, the characteristic cry 
similar to the mewing of a cat is the key diagnostic 
feature of the syndrome. Confirmation of the syn- 
drome is made by karyotype analysis, which shows 
the deletion. The incidence of cri-du-chat syndrome is 
estimated as 1 in 20 000 to 1 in 50 000 newborn infants,
making this a relatively common genetic disorder.
Among mentally retarded individuals with IQs
below 50, its prevalence is around 1 in 350.

The clinical features of cri-du-chat syndrome 
evolve with age. In newborn infants the most common 
findings are prenatal growth retardation, low birth 
weight, microcephaly, facial abnormalities, severe 
hypotonia, and a high-pitched monochromatic cat- 
like cry. The facial anomalies include a round face 


with hypertelorism and epicanthal folds, a broad nasal
base, and micrognathia. Ears are low-set or poorly
formed. Severe respiratory or feeding difficulties soon 
after birth are frequent. In infancy and early childhood
the phenotype includes psychomotor retardation, a
high-pitched cry, microcephaly, growth retardation, 
poor weight gain, a round face, hypertelorism, a 
broad nasal bridge, downslanting palpebral fissures, 
and micrognathia. Coordination problems are always 
present. The gait is unsteady and broad-based, stoop- 
ing with bent knees. With advancing age the pheno- 
type becomes less striking and the clinical picture is 
difficult to establish. The face lengthens with a poorly 
angulated mandible, the nasal bridge normalizes, and 
the hypertelorism and epicanthal folds attenuate. 
Teeth are decayed and abnormally erupted with fre- 
quent malocclusion. Marked growth retardation 
results in short stature, poor weight gain, and signifi- 
cant microcephaly. Hypertonia of the limbs with 
strong reflexes and spastic gait may appear. Scoliosis 
and premature graying of hair are observed. 

Chronic medical problems in childhood include 
upper respiratory tract infections, otitis media, severe 
constipation, and hyperactivity. Minor anomalies such 
as strabismus, deficient tears, dental malocclusion, 
gastroesophageal reflux, inguinal hernia, hip disloca- 
tion, or clubfoot may be present and are amenable 
to various medical and surgical interventions. Scoliosis 
is relatively frequent after 8 years of age. Major mal- 
formations are rare and include mostly cardiac and 
gastrointestinal tract anomalies. They are more fre- 
quent in patients with unbalanced translocations 
and therefore additional chromosomal imbalances.
Except in patients with major anomalies, mortality is
low, and many patients survive into adulthood.

Most patients are severely to profoundly mentally 
retarded. The sitting posture is usually acquired only 
after the age of 2 years and independent walking after 
the age of 4. Some patients never learn to walk. Lack of 
speech was considered to be a characteristic of the 
syndrome; however, cri-du-chat children who are
raised at home and who benefit from early, intensive 
programs of special education are ambulatory and 
can communicate either verbally or through gestural 
language. Stimulation programs including forms of 
communication training could prevent potential 
behavioral problems, which mostly relate to the 
patients’ inability to express themselves.

About 85% of patients have a de novo deletion,
either terminal or, less frequently, interstitial. Ring
chromosomes, de novo unbalanced translocations, and
mosaicism with a normal clone are sometimes observed. The size of the
deletion is variable. In 10-15% of patients the cause of 
the deletion is a parental rearrangement, which is a 
translocation in more than 90% of cases. Pericentric
inversions, insertions, or complex rearrangements have 
also been described. 

Karyotyping of the parents is needed in order to 
provide genetic counseling. The risk of recurrence in 
a patient’s younger sibling is low unless one parent is 
a carrier of a chromosomal rearrangement involving 
5p. If a parental translocation is present, the risk of 
having another child with a chromosomal imbalance 
involving 5p could be 15-25%. Prenatal diagnosis is 
possible by fetal karyotyping. The risk of recurrence 
in a patient’s child could be as much as 50%, but no
affected individuals are known to have reproduced 
to date. 

Through the analysis of numerous patients at the 
cytogenetic and molecular level, the chromosome 
region that is deleted in all cri-du-chat patients has 
been localized to the mid-portion of the terminal band 
of 5p, more precisely 5p15.2–p15.3. Patients who have
the characteristic facial features and severe mental 
retardation all have deletions that encompass a portion 
of 5p15.2. Patients with the cat-like cry but lacking 
the characteristic facial features and severe mental 
retardation have deletions that only encompass the 
proximal part of 5p15.3. These results suggest that 
there are two noncontiguous critical regions involved 
in the etiology of cri-du-chat syndrome. Most 5p dele- 
tions encompass both critical regions and give the 
typical cri-du-chat phenotype. However, a 5p deletion 
does not necessarily indicate a diagnosis of cri-du-chat 
syndrome. Probes have been developed to determine 
the extent of the deletion in patients with small 5p 
deletions or an atypical cri-du-chat phenotype. 


See also: Genetic Counseling; Genetic Diseases; 
Idiogram; Translocation 
A cross is an experimental protocol where organisms
of one defined genotype and sex are mated with organ- 
isms of a second defined genotype and sex. The 
number of actual matings that are carried out can 
be as low as one or as high as 100 or 1000. Each of 
the matings is considered to be equivalent, and data 
obtained on all offspring are combined together for 
genetic analysis. 


See also: Breeding of Animals 
Crossing-over is the reciprocal exchange of corresponding
segments between homologous chromo-
somes. It occurs as a regular event in the prophase of 
the first division of meiosis, and occasionally during 
mitosis. Its physical basis is seen in the form of chias- 
mata between homologous chromosomes at the diplo- 
tene stage of meiotic prophase. Crossovers, if they 
occur between chromosomal loci marked by allelic 
difference, result in genetic recombination. 

The reciprocal nature of crossing-over was first 
inferred from the finding that reciprocally constituted
recombinant classes generally occur at statistically equal
frequencies among randomized meiotic products, but it is demonstrated
more rigorously when the reciprocal recombinants can 
be recovered together in the same meiotic tetrad. Tet- 
rad analysis also provides the most direct evidence that 
each crossover involves only one chromatid from each 
divided chromosome, and therefore generates two 
recombinants and two nonrecombinants (a ‘tetratype’ 
tetrad). Full tetrad analysis is possible only in numer- 
ous fungi and a few algae, but similar conclusions can 
be drawn from experiments with attached-X chromo- 
somes in the fruit fly Drosophila, which amount to 
half-tetrad analysis since each viable egg receives two 
of the four X chromosome copies. 
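The tetrad argument can be made concrete with two marked loci. In this Python sketch (allele names are hypothetical) a crossover between the loci exchanges the distal segments of one chromatid from each homolog; classifying the resulting tetrad shows the two-recombinant, two-parental ‘tetratype’ outcome.

```python
def crossover(tetrad, c1, c2, after_locus):
    """Exchange everything distal to `after_locus` between chromatids c1 and c2."""
    chromatids = [list(ch) for ch in tetrad]
    k = after_locus + 1
    chromatids[c1][k:], chromatids[c2][k:] = chromatids[c2][k:], chromatids[c1][k:]
    return [tuple(ch) for ch in chromatids]


def classify(tetrad, parentals):
    """Parental ditype, nonparental ditype, or tetratype."""
    recombinants = sum(ch not in parentals for ch in tetrad)
    return {0: "parental ditype", 2: "tetratype", 4: "nonparental ditype"}[recombinants]


parents = {("A", "B"), ("a", "b")}
# After DNA replication: two sister chromatids of each homolog.
tetrad = [("A", "B"), ("A", "B"), ("a", "b"), ("a", "b")]

# One crossover between locus 0 and locus 1, involving nonsister chromatids 1 and 2
after = crossover(tetrad, 1, 2, after_locus=0)
print(after, classify(after, parents))
```

Because each exchange involves only two of the four chromatids, a single crossover can never make all four products recombinant; that requires a second exchange involving the other two chromatids.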

Meiotic recombination can also occur by gene con- 
version, which is essentially the replacement of a patch 
of chromosomal DNA by a corresponding sequence 
from the homologous chromosome, with the donor 
chromosome undergoing repair back to its original 
constitution. Up to about 50% of conversion events 
(but sometimes a much smaller proportion) tend to be 
associated with nearby crossovers, and there is evi- 
dence, still not conclusive, that conversions and cross- 
overs stem from the same kind of interchromatid 
interaction, which always involves some local nonre- 
ciprocal transfer of DNA but only sometimes results 
in a crossover. 

The relative importance of crossing-over and con- 
version in recombination depends on the spacing of 
the markers being recombined. In classical linkage 
studies, with genes usually separated by some hun- 
dreds or thousands of kilobases, crossing-over is all- 
important, and the effects of conversion negligible. 
Whereas any single crossover falling between two 
marked genes will generate recombinants whatever 
the distance between them, a local nonreciprocal event 
will have an observable effect only if the transferred
patch happens to include one of the markers. Only
when the recombination events are very close to the 
markers being recombined does conversion become 
significant. Studies both on fungi and Drosophila 
have shown that when recombination is selected for 
within a gene, with markers only on the order of 
kilobases apart, most recombinants are due to con- 
version, with or without crossing-over between any 
markers flanking the gene. 


See also: Attached-X and other Compound 
Chromosomes; Gene Conversion; Genetic 
Polarity; Meiosis; Meiotic Product; Recombination, 
Models of; Tetrad Analysis; Tetratype 


Crossover Suppressor 


JRS Fincham


Copyright © 2001 Academic Press
doi: 10.1006/rwgn.2001.0293


Crossover suppressors were identified early in the 
history of Drosophila genetics as genetic elements 
which, when heterozygous, had the effect of greatly 
reducing the frequencies of crossovers within blocks 
of linked loci. They were shown to be inversions of 
the chromosome segments within which crossing-over 
was suppressed. When the suppressors were made 
homozygous, normal frequencies of recombination 
were restored, but the loci concerned mapped in an 
inverted order relative to the wild-type. 

The consequences of crossing-over within a hetero- 
zygous inversion depend on whether or not the 
inversion includes the centromere (i.e., whether it is 
pericentric or paracentric, respectively). Single crossovers
in either case result in inviable chromosomes
with duplications and deficiencies (Figure 1). But in the
case of a paracentric inversion a single crossover will 
lead to the formation at first anaphase of terminally 
deleted chromatids linked together to form a bridge 
between the two centromeres, with the deleted seg- 
ments forming an acentric fragment which is usually 
lost (Figure 1B). Apparently for mechanical reasons, 
the chromatids forming the bridge are excluded from 
the egg nucleus, which instead receives one of the two 
non-crossover chromatids. Thus, on the female side, 
crossing-over within the paracentric inversion is sup- 
pressed without cost. The paracentric inversion does 
not affect the viability of the spermatozoa either, since 
there is no crossing-over in the Drosophila male in any 
case. This may explain why paracentric inversions are 
rather common in wild populations of Drosophila. 

Crossing-over within pericentric inversions, on the
other hand, always reduces egg viability, since it
produces no bridge and fragment and hence offers no way of
keeping deletion-duplication chromosomes out of the
egg nucleus. Pericentric inversions are rare in wild
populations.

Figure 1 Why segmental inversions are crossover suppressors when heterozygous. (A) Pericentric inversion: the products of crossing-over within the inversion have duplications and deficiencies of chromosome segments and will all be inviable, but they are free to enter the meiotic products. (B) Paracentric inversion: crossing-over within the inversion creates crossover products that are trapped in an anaphase I bridge and fragment, and in Drosophila are usually excluded from the egg (there is no crossing-over in the male). Symbols: V, viable; IV, inviable; L, lost.

Crossing-over within heterozygous inversions is 
not suppressed completely, since a second crossover, 
involving the same two chromatids as the first, will
restore normal chromatid structure. However, cross- 
over interference will generally make this a rare event. 
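The bookkeeping behind Figure 1 can be followed with a small symbolic model, offered as a sketch rather than a standard tool. Chromosomes are written as lists of hypothetical segment labels with "o" standing for the centromere; a single crossover is placed between two loci that are adjacent on both homologues, and within the inversion loop the recombinant is assumed to continue along its partner in the reverse direction. The function names and segment labels are invented for illustration.

```python
CENTROMERE = "o"


def crossover(normal, inverted, a, b):
    """Reciprocal exchange between loci a and b (adjacent in both homologues).

    Returns the two recombinant chromatids as lists of segment labels.
    """
    i = normal.index(a)
    if normal[i + 1] != b:
        raise ValueError("b must follow a on the normal homologue")
    pa, pb = inverted.index(a), inverted.index(b)
    if pb == pa + 1:
        # Same orientation (outside the inversion): ordinary reciprocal exchange.
        return normal[: i + 1] + inverted[pb:], inverted[: pa + 1] + normal[i + 1 :]
    if pb == pa - 1:
        # Inside the inversion loop the homologue is paired in reverse, so each
        # recombinant continues along its partner in the opposite direction.
        return normal[: i + 1] + inverted[pb::-1], inverted[pa:][::-1] + normal[i + 1 :]
    raise ValueError("a and b are not adjacent on the inverted homologue")


def describe(chromatid, reference):
    """Centromere count plus duplicated and missing segments relative to reference."""
    kinds = {0: "acentric", 1: "monocentric", 2: "dicentric"}
    kind = kinds.get(chromatid.count(CENTROMERE), "polycentric")
    genes = [g for g in reference if g != CENTROMERE]
    duplicated = [g for g in genes if chromatid.count(g) > 1]
    missing = [g for g in genes if chromatid.count(g) == 0]
    return kind, duplicated, missing


# Hypothetical chromosomes: paracentric inversion of B-C-D, pericentric inversion of B-o-C.
cases = [("paracentric", list("oABCDE"), list("oADCBE"), "B", "C"),
         ("pericentric", list("ABoCD"), list("ACoBD"), "o", "C")]

for label, normal, inv, a, b in cases:
    for product in crossover(normal, inv, a, b):
        print(label, "".join(product), describe(product, normal))
```

The paracentric case yields a dicentric chromatid (the anaphase bridge) and an acentric fragment, while the pericentric case yields two monocentric chromatids, each carrying a duplication and a deficiency, which is why nothing keeps them out of the egg nucleus.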

Not only in Drosophila, but in sexual organ- 
isms generally, segmental interchanges can also have 
crossover-suppressing effects when heterozygous. 

In principle, any mutation that prevented normal 
recombination could be called a crossover suppressor, 
but such mutations usually have more conspicuous 
consequences such as radiation sensitivity and/or 
sterility. 


Further Reading 
Sturtevant AH (1961) Selected Papers. San Francisco, CA: WH 
Freeman. 


See also: Crossing-Over; Inversion; Segmental 
Interchange 
See: Craniosynostosis, Genetics of
James F. (Jim) Crow (1916-) continues into the
twenty-first century more than six decades of active 
investigations of population genetics, particularly in 
humans. The long skein of his contributions includes 
both experimental work, using Drosophila, and theoretical
analysis, using published data on humans
and Drosophila. His chain of contributions is linked
by a series of strong collaborations with established 
geneticists from around the world, postdoctoral fel- 
lows, graduate students, and even undergraduates at 
the University of Wisconsin, where Crow taught gen- 
eral genetics for many decades. 
Jim Crow’s early experimental work with Drosophila
explored isolating mechanisms between species
and the polygenic determination of insecticide resist- 
ance. His student Yuichiro Hiraizumi discovered the 
segregation distortion (SD) system of meiotic drive in 
Drosophila. Further work by Crow’s colleagues Larry 
Sandler, Dan Hartl, Terry Lyttle, Barry Ganetzky, and 
others has subsequently brought the understanding of 
the SD system forward to molecular analysis. 

Jim Crow’s studies of mutation rate, genetic load, 
and the structure of human populations began in 1956 
when he worked with Newton Morton and Hermann 
Muller to measure the impact of inbreeding. These 
studies continued on several fronts: the analysis of 
the Hutterite population with Arthur Mange (1965); 
the impact of assortative mating with Joe Felsenstein 
(1968); the role of recombination (1988); the theory of 
genetic loads (1976); the nature of effective population 
size (1984); isonymy (1980); mutation component 
(1998); and, most recently, studies on the pronounced 
elevation of human mutation rates with age of the 
male parent (1997). 

Jim Crow and his colleagues have contributed 
seminal experimental investigations in Drosophila 
that inform some of the issues for human populations. 
A monumental experiment by Terumi Mukai indi- 
cated that the frequency of new mutations with 
minor deleterious effect may be as high as 50% per 
zygote. With Rayla Greenberg Temin and then with 
Michael Simmons, Crow showed that, typically, the 
detrimental effects of spontaneous mutations and of 
EMS-induced mutations are partially dominant. Mukai 
and Crow’s Drosophila work and more recent esti- 
mates for mammalian species predict an enormous 
mutation burden on these populations. Extinction is 
avoided by eliminating deleterious alleles in groups, 
as elaborated by mathematical analyses that Crow 
carried out with Motoo Kimura on the efficiency of 
truncation selection (1979). 
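Why such mutation rates imply an enormous burden can be seen from a standard population-genetics result, sketched here under stated assumptions: if new partially dominant deleterious mutations arise at a total rate U per zygote and their fitness effects multiply independently across loci, mean fitness at mutation-selection balance is approximately exp(-U), whatever the size of the individual effects. Reading the Mukai estimate as U of about 0.5 is an interpretation made for this illustration, and the function name is hypothetical. Truncation selection, by removing mutations in groups, lets a population bear such rates at lower cost than this independent-selection baseline.

```python
import math


def equilibrium_mean_fitness(u_per_zygote):
    """Mean fitness at mutation-selection balance under multiplicative selection.

    Assumes mutations arise at total rate U per zygote, act independently
    across loci (fitnesses multiply) and are partially dominant, so each
    locus contributes a load of roughly its diploid mutation rate.
    """
    return math.exp(-u_per_zygote)


for u in (0.1, 0.5, 1.0, 2.0):
    w = equilibrium_mean_fitness(u)
    print(f"U = {u}: mean fitness {w:.2f}, mutation load {1 - w:.2f}")
```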

These formal population genetic studies are com- 
plemented in Jim Crow’s work by more molecular 
insights. With his student Kimura, Crow developed 
Sewall Wright’s formalism of random sampling of 
alleles in small populations to propose, in 1964, the 
‘infinite allele model.’ This foreshadowed Kimura’s 
influential neutral theory of molecular evolution. 
Crow and Kimura also collaborated on their now 
classic textbook An Introduction to Population Genet- 
ics Theory (Crow and Kimura, 1970). In the same 
spirit of “gladly learn and gladly teach,” Jim Crow 
published his Genetics Notes in eight editions (Crow, 
1950-1983), his Basic Concepts in Population, Evolu- 
tionary, and Quantitative Genetics (Crow, 1986), and 
has edited, with William Dove, Perspectives on Genet- 
ics (Crow and Dove, 2000). 
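The central quantitative result of the infinite allele model is the equilibrium homozygosity F = 1/(1 + 4Neu), where Ne is the effective population size and u the mutation rate per gene per generation; its reciprocal, 1 + 4Neu, is the effective number of alleles. The short sketch below simply evaluates that formula. The function names and the population sizes and mutation rate in the loop are hypothetical illustration values.

```python
def expected_homozygosity(n_e, u):
    """Equilibrium probability that two alleles drawn at random are identical.

    Infinite-allele model: every mutation (rate u per gene per generation)
    creates a new allele, while drift in a population of effective size n_e
    removes variation.  F = 1 / (1 + 4 * n_e * u).
    """
    return 1.0 / (1.0 + 4.0 * n_e * u)


def effective_number_of_alleles(n_e, u):
    """Reciprocal of homozygosity, i.e. 1 + 4 * n_e * u."""
    return 1.0 / expected_homozygosity(n_e, u)


for n_e in (10_000, 100_000, 1_000_000):
    f = expected_homozygosity(n_e, u=1e-6)
    print(f"Ne = {n_e:>9,}: F = {f:.3f}, effective alleles = {1 / f:.2f}")
```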


Jim Crow’s impact on the science of genetics is 
enriched by his service to the community of science 
and society. This activity ranges from local (Chairman 
of the Laboratory of Genetics) to national (NIH 
Study Sections on Genetics and Mammalian Genetics; 
Department of Energy and National Research Coun- 
cil Panels on Radiation Hazards; and panels on DNA 
forensics for the National Research Council and, now, 
the US Department of Justice). Internationally, he 
served on the first US Committee for Scholarly Com- 
munication with the People’s Republic of China. His 
appointment as an honorary member of the Japan 
Academy reflects his long-term synergy with genet- 
icists in Japan. Crow’s science and his service are 
recognized by elected memberships in the National 
Academy of Sciences, the National Academy of
Medicine, the American Philosophical Society, the 
American Academy of Arts and Sciences, the World 
Academy of Art and Science, and the Wisconsin 
Academy of Sciences, Arts, and Letters. 

The science of genetics and its impact on human 
well-being are Jim Crow’s life work. His work is 
complemented by his lifelong love of music, shared 
with his friends and family. Even here, he has served 
the community by playing in the Madison Symphony 
Orchestra and by serving as its president between 
1984 and 1986. He can often be found playing viola
in a string quartet of local musicians.
Crown gall is one of several plant tumor diseases
typified by a non-self-limiting tissue overgrowth,
usually on the roots and lower trunks of mainly
woody plants. The tumors have a rough surface and
semi-soft, smooth, spongy inner tissue. With age,
the tumors are easily dislodged and their outer layers
become friable. Unlike other tumor diseases, crown gall
is the result of genetic transformation caused by
Agrobacterium tumefaciens, a Gram-negative, rod-shaped
bacterium that lives in soil, preferentially on root
surfaces. Unique among bacteria is the ability of
A. tumefaciens to transmit tumor-forming genes
(oncogenes) into its host plant cell, culminating in the
integration of the oncogenes into the plant chromosomes
at one or more sites. The products of the integrated
oncogenes synthesize the plant growth hormones
cytokinin and auxin, which cause the abnormal
proliferation of the transformed cells. The
appearance of crown gall is shown in Figure 1.

Figure 1 Crown gall tumors developing at the base of
the trunk of a young cherry tree.


Historical Background of Crown Gall


The crown gall disease was described in biblical times
on trees and grapevines as galls and nodules. The first
scientific description of galls on grapevines was
reported in France by Fabre and Dunal (1853). The
causal agent of crown gall was first isolated in 1895
from galls on grapevines in Naples, Italy, by Cavara
(1897), who cultured the bacterium on agar medium and
showed it to cause the tumor disease that he called
‘tubercolosi della Vite.’ In the United States, George
G. Hedgcock in 1904 isolated bacteria that produced
white colonies on agar medium and caused the same
galls as those from which he had isolated the organism
(Hedgcock, 1905). In 1907, Erwin F. Smith and C.O.
Townsend named the bacterium Bacterium
tumefaciens and showed that the white-colony-producing
bacterium causes tumors in chrysanthemum,
marguerite daisy, tobacco, tomato, potato, sugar beet,
and on peach roots (Smith and Townsend, 1907).
Smith continued exploring the range of susceptible
and ‘immune’ plants to the crown gall disease. By
1920, numerous reports had appeared describing
crown gall disease on fruit trees, primarily apple
and stone fruit trees. The original name of the
organism was changed from Bacterium tumefaciens to
Phytomonas tumefaciens and subsequently to Agrobacterium
tumefaciens. Between 1930 and 1950, a
number of investigators sought to identify the oncogenic
material produced by A. tumefaciens. There
were lengthy debates on whether the bacterium itself
or a ‘tumor-inducing principle’ causes the crown gall
disease. Plant tissue culture studies provided evidence
that the tumor tissue remained in a transformed state 
in the absence of bacteria. The transforming agent was 
subsequently sought, with a number of studies directed
toward the physiological and biochemical differences
between the crown gall tumor and its surrounding
healthy tissues, and between A. tumefaciens and other 
tumor-causing bacteria such as Pseudomonas savasta- 
noi (now called Pseudomonas syringae pv. savastanoi). 
Avirulent strains were found when A. tumefaciens was 
cultured at 37°C or when treated with ethidium 
bromide, suggesting that an extrachromosomal elem- 
ent is required for virulence. In support of this 
notion, A. radiobacter, a naturally occurring avirulent 
relative of A. tumefaciens, was shown to be converted 
to the virulent form when mixed with the virulent 
strain and inoculated on plants. The direct analysis 
of A. tumefaciens and A. radiobacter revealed the 
presence of a large virulence-conferring plasmid, 
called the Ti (for tumor-inducing) plasmid (see Ti 
Plasmids). Though A. radiobacter also contained 
large plasmids, it is remarkable that the early work 
concluded correctly that the plasmid in A. tumefaciens 
conferred virulence. Subsequent DNA hybridization 
studies in the late 1970s and early 1980s confirmed the 
original hypothesis that genetic elements were trans- 
ferred from A. tumefaciens into the plant chromo- 
somes. The transmission of genetic material across 
kingdom boundaries by A. tumefaciens is the first 
bona fide case in evolutionary biology of active hori- 
zontal gene transfer between living organisms of dif- 
ferent kingdoms (Prokarya to Eukarya). The research 
on A. tumefaciens gave rise to the modern technology 
of plant genetic engineering whereby any piece of 
DNA placed in the T-DNA can be transferred into 
and expressed in plants. 


Horizontal Transmission of Oncogenes 
and Opine Genes 


Transmission of the oncogenes is mediated by a pro- 
miscuous DNA transfer system in A. tumefaciens. The 
oncogenes are located in the T-DNA, a specific por- 
tion of an extrachromosomal element called the Ti 
plasmid that is resident in all tumor-forming strains 
of A. tumefaciens. Also contained in the T-DNA are 
genes whose products are involved in the production 
of unusual amino acid derivatives composed of a basic 
amino acid such as arginine, and an organic acid such 
as pyruvic acid or 2-ketoglutaric acid to form octopine 
and nopaline, respectively. Additional genes on the
T-DNA encode products that form disaccharides
linked by a phosphate bond. These sugar phosphates
are known as agrocinopines. Collectively, these unusual
compounds are called ‘opines.’ The type of
opine consumed by A. tumefaciens depends on the
type of Ti plasmid that resides in the organism. The
Ti plasmid possesses the genes needed to take up and
catabolize a specific opine. Thus, the type of opine
utilized defines the type of Ti plasmid present in the
bacterial cell. 
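The composition of octopine and nopaline can be checked with simple formula arithmetic. The sketch assumes, as is usual for opine synthesis, a reductive condensation: the amino acid and keto acid combine with loss of one water and are then reduced by two hydrogens supplied by NAD(P)H. The element counts are for the neutral free acids, and the constant and function names are illustrative choices, not part of any library.

```python
from collections import Counter

# Molecular formulas as element counts (neutral free acids).
ARGININE = Counter({"C": 6, "H": 14, "N": 4, "O": 2})
PYRUVIC_ACID = Counter({"C": 3, "H": 4, "O": 3})
KETOGLUTARIC_ACID = Counter({"C": 5, "H": 6, "O": 5})
WATER = Counter({"H": 2, "O": 1})
TWO_H = Counter({"H": 2})  # assumed to come from NAD(P)H in the reduction step


def reductive_condensation(amino_acid, keto_acid):
    """Amino acid + keto acid, minus H2O (condensation), plus 2 H (reduction)."""
    product = amino_acid + keto_acid + TWO_H
    product.subtract(WATER)
    return +product  # unary plus drops any zero counts


def formula(counts):
    """Write a formula in C, H, N, O order, omitting counts of 1."""
    return "".join(f"{el}{counts[el] if counts[el] > 1 else ''}"
                   for el in ("C", "H", "N", "O") if counts[el])


print("octopine:", formula(reductive_condensation(ARGININE, PYRUVIC_ACID)))
print("nopaline:", formula(reductive_condensation(ARGININE, KETOGLUTARIC_ACID)))
```

The results, C9H18N4O4 for octopine and C11H20N4O6 for nopaline, match the arginine-plus-keto-acid composition given above.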

Along with auxin and cytokinin, the opines pro- 
duced in crown gall tumors serve as a specific food 
base for A. tumefaciens. Thus, crown gall tumors serve 
as specialized ecological niches for A. tumefaciens. In 
essence, A. tumefaciens is a natural genetic engineer, 
uniquely equipped to horizontally transfer foreign 
genes into plants and genetically transform plant cells 
into cells that benefit and enhance the survival of the 
A. tumefaciens cells. Experimentally, A. tumefaciens 
was found to have a very broad host range, capable of 
causing crown gall tumors in a wide variety of plants,
including some monocotyledons. Herbaceous plants 
such as sunflower (Helianthus annuus) and succulent 
plants such as Kalanchoé daigremontiana have been 
widely used by researchers to assay A. tumefaciens 
virulence. The sensitivity of plants used to assay the 
virulence of A. tumefaciens varies considerably. For 
example, members of the Solanaceae such as Datura 
stramonium (Jimson weed) are 50-fold more sensitive 
than members of the Crassulaceae such as K. daigre- 
montiana. 


Dissemination and Control of Crown 
Gall Disease 


Crown gall disease is spread primarily through infected
stock. Secondary spread originates through cultivation
practices. Soil surrounding crown gall-diseased
tissue becomes infested with A. tumefaciens
cells and can serve as a reservoir of the pathogen.
Selective media designed to culture A. tumefaciens 
from soil are used to monitor the presence of this 
bacterium in orchards. Many fruit and nut trees are 
highly susceptible to A. tumefaciens. The disease is 
most severe on young trees since crown gall tumor 
growths on their roots and small trunks restrict the 
flow of water and nutrients. Unless caught very early 
in tumorigenesis, mechanical elimination of crown 
gall tumors from infected material is a relatively 
fruitless way to control the disease. Prophylactic 
measures using antagonistic soil-borne bacteria such 
as A. radiobacter have proven successful in certain 
cases where the antagonist inhibits the growth of 
the A. tumefaciens strain. Strain specificity of the 
biological control agent therefore limits its use to
A. tumefaciens strains that are sensitive to the
antagonist. Other prophylactic strategies include
maintaining clean propagation nurseries free of crown gall
diseased plants, and sanitary cultural practices. The 
recent rise of genetically engineered crop technology 
has opened the way for developing crown gall resist- 
ant lines of fruit and nut trees, including grapevines 
and canes. 


T-DNA Transfer Mechanism 


Depending on the Ti plasmid type, the T-DNA is 
located as one or more adjacent DNA segments on 
the Ti plasmid; for example, the T-DNA is one con- 
tiguous segment in nopaline-type Ti plasmids while 
the T-DNA can occur in three adjacent segments in 
octopine-type Ti plasmids. Regardless of the Ti 
plasmid type, the T-DNA is recognized by its nucleo- 
tide sequences at its borders. These border sequences 
are composed of 25-bp repeats that are recognized 
by processing enzymes that cleave at the left and 
right borders, releasing a single-stranded T-DNA 
molecule on to which a pilot protein called VirD2 
is covalently attached at the 5’ end. T-DNA processing 
is initiated by A. tumefaciens recognizing specific 
phenolic compounds and simple sugars that promote 
the expression of virulence (vir) genes located near the 
T-DNA on the Ti plasmid. The processed T-DNA 
bearing VirD2 protein is transferred by means of 
a transmembrane nucleoprotein transport system 
composed of VirB proteins. There are 11 proteins 
encoded by the virB operon, 10 of which comprise 
the nucleoprotein secretion system. The remaining 
VirB protein, VirB2, is cleaved by a signal peptidase,
and the processed peptide is cyclized into a circular
peptide that is the subunit used in the biogenesis of 
an extracellular appendage called the T-pilus. The 
T-pilus is a long flexuous filament of 10 nm diameter. 
The T-pilus forms when A. tumefaciens cells interact 
with plant cells and is essential for T-DNA transfer. 
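Locating a T-DNA by its flanking 25-bp border repeats is, computationally, an approximate motif search. The sketch below is a minimal illustration under stated assumptions: the 25-mer used in the example is a made-up placeholder, not a real Ti plasmid border, and a tolerance of a couple of mismatches is allowed in case the two border copies are not identical. The function names are hypothetical, and the sketch ignores strand, orientation and the exact nicking position within each border.

```python
def mismatches(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def find_border_repeats(sequence, border, max_mismatches=2):
    """Start positions where `border` matches `sequence` with few mismatches."""
    k = len(border)
    return [i for i in range(len(sequence) - k + 1)
            if mismatches(sequence[i:i + k], border) <= max_mismatches]


def extract_t_region(sequence, border, max_mismatches=2):
    """Segment from the first border repeat to the end of the last, or None."""
    hits = find_border_repeats(sequence, border, max_mismatches)
    if len(hits) < 2:
        return None
    return sequence[hits[0]:hits[-1] + len(border)]


# Hypothetical 25-bp border and a toy "plasmid" carrying two imperfect copies.
border = "GATTACAGGCTCAAGTCCATGACTT"
variant = border[:11] + "A" + border[12:]  # one-base difference
plasmid = "T" * 30 + border + "G" * 40 + variant + "T" * 30

print(find_border_repeats(plasmid, border))
print(len(extract_t_region(plasmid, border)), "bp between and including the borders")
```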


A cruciform structure contains a helical branchpoint of
four double-stranded helical segments joined by the 
covalent continuity of the four strands (formally 
defined as a 4H junction). The strands pass between 
adjacent helices in a cyclical manner around the junc- 
tion. This junction is equivalent to the Holliday junc- 
tion formed by homologous genetic recombination 
and by the integrase family of site-specific recombin- 
ation events. A cruciform structure is often taken to 
mean a twin-hairpin structure formed by intrastrand 
pairing of the strands at an inverted-repeat sequence, 
and indeed this was the original meaning of the term
(Figure 1). Such a structure is invariably less stable than
the perfect duplex from which it forms, but can be 
stabilized in a negatively supercoiled DNA molecule. 
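A sequence can form the twin-hairpin cruciform only if it contains an inverted repeat: an arm, an optional central spacer, and then the reverse complement of the arm, so that each strand can pair with itself. The sketch below checks a sequence for that arrangement. The function names and the minimum arm length of 6 are arbitrary assumptions, and only perfect inverted repeats are recognized.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def inverted_repeat_arms(seq, min_arm=6):
    """If seq is arm + spacer + reverse_complement(arm), return (arm, spacer).

    Tries the longest possible arm first, so the arm returned is maximal;
    returns None when no arm of at least min_arm bases fits.
    """
    for arm_len in range(len(seq) // 2, min_arm - 1, -1):
        left, right = seq[:arm_len], seq[-arm_len:]
        if right == reverse_complement(left):
            return left, seq[arm_len:len(seq) - arm_len]
    return None


seq = "AGCTTCCGA" + "TTTT" + reverse_complement("AGCTTCCGA")
print(seq, inverted_repeat_arms(seq))
```

Here the two 9-base arms would form the stems of the hairpins and the four-base spacer their loops.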


Structure of Four-Way DNA Junction 


Like many nucleic acid species, the structure of the 
four-way DNA junction is highly dependent on 
the presence or absence of metal ions (Figure 2). In the 
absence of added metal ions, the junction adopts an 
open structure in which the axes of the four helices are 
directed toward the corners of a square. This conform- 
ation is probably approximately planar, though it is 
unlikely to be exactly so, since the two sides have a 
different character, with major and minor groove char- 
acteristics. On addition of divalent metal ions, the 
junction undergoes a folding transition based upon 
the pairwise coaxial stacking of helices. The structure 
adopted is termed ‘the stacked X-structure.’ Folding 
reduces the fourfold pseudosymmetry of the junction,
dividing the strands into two distinct types. The two
continuous strands each follow a single axis along the
length of the stacked helices, while the two exchanging
strands pass between axes at the junction. The point where the
strands exchange is variously called ‘the crossover’ or
‘the point of strand exchange.’ The resulting structure
is antiparallel, and thus the two continuous strands
run in opposite directions. However, the axes are not
exactly antiparallel and lie at a right-handed angle of
40°-60°. Like the extended structure, the stacked
X-structure has dissimilar sides, with major and minor
groove characteristics. The structure of the four-way
junction was deduced in the late 1980s by the
application of biophysical methods, and the stacked
X-structure has recently been confirmed by X-ray
crystallography.

Figure 1 Formation of a cruciform structure from an inverted repeat sequence by extrusion. The structure is extruded by intrastrand base-pairing, forming two stem-loop structures. Inverted repeats are often referred to as ‘palindromes’; this term is incorrect and should be avoided.

Figure 3 Formation of alternative stacking conformers by the four-way junction.

Two alternative conformers are possible for the
stacked X-structure, which depend on the choice of
stacking partner (Figure 3). If the arms are numbered
1-4 sequentially around the junction, then one
conformer is formed by stacking helix 1 on 4,
and 2 on 3. Alternatively, a distinct conformer can
be formed by stacking helix 2 on 1, and 3 on 4. The
nature of the strands is exchanged if the stacking
partners are changed: exchanging strands become
continuous strands and vice versa in a transition
between the two conformers. The relative stabilities
of the two forms depend on local sequence, and most
junctions consist of populations of both forms with 
dynamic interconversion. 
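The swap between continuous and exchanging strands follows from simple combinatorics, sketched below. Each strand of the junction is assumed to pair in two neighbouring arms (a hypothetical numbering in which strand i runs from arm i into arm i + 1, with arm 4 followed by arm 1); a strand is continuous when those two arms are coaxially stacked on each other. Enumerating the ways of stacking neighbouring arms in two pairs gives exactly the two conformers described above.

```python
from itertools import combinations

ARMS = (1, 2, 3, 4)


def strand_arms(strand):
    """Hypothetical numbering: strand i pairs in arm i and then in arm i + 1 (4 wraps to 1)."""
    return frozenset({strand, strand % 4 + 1})


def stacking_conformers():
    """All ways of stacking the four arms as two coaxial pairs of neighbouring arms."""
    neighbours = [strand_arms(arm) for arm in ARMS]  # {1,2}, {2,3}, {3,4}, {4,1}
    return [set(pair) for pair in combinations(neighbours, 2) if not pair[0] & pair[1]]


def continuous_strands(conformer):
    """A strand is continuous when the two arms it joins are stacked on each other."""
    return [s for s in ARMS if strand_arms(s) in conformer]


for conformer in stacking_conformers():
    stacks = sorted(sorted(pair) for pair in conformer)
    cont = continuous_strands(conformer)
    exch = [s for s in ARMS if s not in cont]
    print(f"stacks {stacks}: continuous strands {cont}, exchanging strands {exch}")
```

Each strand is continuous in one conformer and exchanging in the other, which is the interconversion described in the text.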


Figure 2 Ion-dependent folding of the four-way DNA junction from the extended, low-salt structure into the stacked X-structure.


Branch Migration


When a junction is formed by strand exchange between
two homologous duplexes, it can undergo a sequential
exchange of base-pairing in which the branchpoint
becomes effectively displaced along the DNA sequence.
This is termed ‘branch migration.’ When the
junction is folded into the stacked X-structure in
the presence of divalent metal ions, this process is
relatively slow, with a rate of a few steps per second.
Thus the process requires protein-mediated acceler- 
ation inside the cell. 
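Why a rate of a few steps per second is too slow can be illustrated by treating spontaneous branch migration as an unbiased one-dimensional random walk of 1-bp steps, a common simplifying assumption. For such a walk the mean number of steps needed to first move n bp from the start in either direction is n squared, so the time grows with the square of the distance. The rate of 3 steps per second, the 1 kb distance, and the function names below are illustrative assumptions.

```python
import random


def expected_steps(distance_bp):
    """Mean number of +/-1 bp steps for an unbiased walk from 0 to first reach +/- distance_bp."""
    return distance_bp ** 2


def simulate_steps(distance_bp, rng):
    """One simulated excursion of the branchpoint; returns the number of steps taken."""
    position = steps = 0
    while abs(position) < distance_bp:
        position += rng.choice((-1, 1))
        steps += 1
    return steps


def hours_to_migrate(distance_bp, steps_per_second):
    return expected_steps(distance_bp) / steps_per_second / 3600


rng = random.Random(1)
trials = [simulate_steps(10, rng) for _ in range(2000)]
print("simulated mean steps to move 10 bp:", sum(trials) / len(trials))  # close to 10**2
print(f"expected time to move 1 kb at 3 steps/s: {hours_to_migrate(1000, 3):.0f} h")
```

Under these assumptions an unaided junction would take days to travel a kilobase, consistent with the need for branch migration proteins.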


Interaction with Proteins 


Four-way DNA junctions are subject to structure- 
specific recognition by a number of proteins. These 
include the junction-resolving enzymes (junction-selective
nucleases that resolve the junction into component
duplexes) and branch migration proteins. The
former have been obtained from a wide variety of 
sources that include bacteriophage, eubacteria, yeast, 
and mammalian viruses. 


Cruciform Structures in Supercoiled 
DNA 


Cruciform structures (twin hairpin-loop structures) 
can enjoy a stable existence in negatively supercoiled 
DNA molecules, but there is little or no evidence 
that they do so inside the living cell. Indeed, the 
instability of long inverted repeats in bacteria suggests
that their formation may be strongly deleterious. In
addition to their low stability relative to the duplex
form (cruciform structures are characterized by a
large, positive free energy of formation from
duplex DNA, greater than 14 kcal mol⁻¹), there is a large kinetic
barrier to the extrusion of most cruciform structures 
(with alternating adenine-thymine sequences as a 
prominent exception). Extrusion occurs by one of 
two contrasting mechanisms. Most sequences ex- 
trude by the S-type mechanism, in which the center 
of the cruciform forms intrastrand base pairs, fol- 
lowed by branch migration. C-type cruciform for- 
mation occurs in AT-rich sequences at low ionic 
strength and involves the opening of a large region of 
DNA and the formation of the cruciform in a single 
step. 
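The stabilization of cruciforms by negative supercoiling is a matter of energy balance, which the sketch below estimates. It assumes the widely used empirical expression for supercoiling free energy, ΔG = (1100 RT/N)(ΔLk)², a helical repeat of 10.5 bp per turn and 37 °C; extruding a cruciform removes the twist of the extruded base pairs from the main duplex, making ΔLk less negative. The 4363-bp plasmid, the 30-bp inverted repeat and the superhelix densities are hypothetical example values, and the threshold of 14 kcal mol⁻¹ is taken from the text; the kinetic barrier is ignored.

```python
R_KCAL = 1.987e-3   # gas constant, kcal mol^-1 K^-1
BP_PER_TURN = 10.5  # assumed helical repeat of B-DNA


def supercoiling_energy(n_bp, delta_lk, temp_k=310.15):
    """Free energy of supercoiling (kcal/mol): (1100 RT / N) * (delta Lk)^2."""
    return 1100 * R_KCAL * temp_k / n_bp * delta_lk ** 2


def energy_released_by_extrusion(n_bp, sigma, extruded_bp, temp_k=310.15):
    """Supercoiling free energy relieved when extruded_bp base pairs form a cruciform.

    Extrusion removes extruded_bp / 10.5 turns of duplex twist, which (for
    negative supercoiling) makes the linking difference less negative.
    """
    before = sigma * n_bp / BP_PER_TURN
    after = before + extruded_bp / BP_PER_TURN
    return supercoiling_energy(n_bp, before, temp_k) - supercoiling_energy(n_bp, after, temp_k)


for sigma in (-0.02, -0.06):
    released = energy_released_by_extrusion(4363, sigma, extruded_bp=30)
    verdict = "favorable" if released > 14 else "unfavorable"
    print(f"sigma = {sigma}: {released:.1f} kcal/mol released vs 14 needed -> {verdict}")
```

With these assumptions extrusion does not pay for itself at low superhelix density but does at a density typical of plasmids isolated from bacteria, which is the sense in which supercoiling stabilizes the cruciform.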


Further Reading 

Murchie AIH and Lilley DMJ (1992) Supercoiled DNA and cruci- 
form structures. Methods in Enzymology 211: 158-180. 

Lilley DMJ (2000) Structures of helical junctions in nucleic acids. 
Quarterly Reviews of Biophysics 33: 109-159. 

White MF, Giraud-Panis M-JE, Pöhler JRG and Lilley DMJ (1997)
Recognition and manipulation of branched DNA structure 
by junction-resolving enzymes. Journal of Molecular Biology 
269: 647-664. 


See also: DNA Supercoiling; Holliday Junction; 
Site-Specific Recombination 
Cryptic Satellite
A cryptic satellite is a satellite DNA sequence not
identified as a separate peak on a density gradient 
but remaining present in main-band DNA. 
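Why a satellite can stay hidden in the main band follows from how buoyant density in CsCl depends on base composition, approximately ρ = 1.660 + 0.098 × (G+C fraction) g/cm³. The sketch evaluates that empirical relation for hypothetical G+C contents: a satellite whose composition is close to the bulk DNA has almost the same density and so does not form a separate peak. Other factors can also shift satellite density; this sketch considers composition only.

```python
def buoyant_density(gc_fraction):
    """Approximate CsCl buoyant density (g/cm^3) from G+C fraction (0-1)."""
    return 1.660 + 0.098 * gc_fraction


main_band = buoyant_density(0.40)  # hypothetical main-band G+C content
cryptic = buoyant_density(0.42)    # hypothetical satellite close in composition
distinct = buoyant_density(0.60)   # hypothetical satellite far from the main band
print(f"main band {main_band:.4f}, cryptic-like {cryptic:.4f}, "
      f"distinct {distinct:.4f} g/cm^3")
```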


See also: DNA Structure 


Cryptic Splice Sites 
and Cryptic Splicing 
T Scholl 
A cryptic splice site is a consensus recognition
sequence for the cellular RNA splicing machinery
that is used, or used more prevalently, as a result of genetic
variation. A cryptic splice site shares homology with
the splice donor, the splice acceptor, or the branch
point, which are all consensus sequences utilized in
the course of RNA splicing. Most commonly, cryptic
splice sites are utilized when a point mutation occurs 
within one of the above consensus sequences that are 
used to create the normal splice junction. These muta- 
tions reduce the fitness of the normal site for recogni- 
tion by the splicing system and result in the activation 
or increased use of the cryptic site. While mutations 
that reduce the fitness of normal splice sites cause the 
majority of splicing at cryptic sites, mutations that 
increase the fitness of cryptic sites also induce cryptic 
splicing. In these cases, a point mutation creates a site 
with strong homology to the consensus sequence. 
This results in the preferential recognition of the new 
site by the cellular splicing system and the formation 
of abnormally spliced transcripts. Genetic variants 
outside of splice consensus sites can also result in the 
activation of cryptic splicing. This can occur with 
mutations nearby, but outside of the recognition 
sequences themselves. This effect presumably occurs 
through changes in RNA secondary structure that 
interfere with accessibility to the normal sites by the 
splicing machinery. The use of alternative cryptic sites 
is thereby favored. All of the preceding examples of 
cryptic splicing involve mutations that occur within 
the RNA molecule in question. Genetic variants that 
occur elsewhere and operate to induce cryptic splice 
sites in trans have also been reported. These mutations
presumably occur within components of the splicing 
machinery and alter its recognition preferences in 
favor of cryptic splice sites over the normal recognition 
sites. In summary, a cryptic splice site is a nucleotide 
sequence with homology to a normal consensus splice 
site whose activity is increased by genetic change. 


RNA Products Derived from Cryptic 
Splice Sites 


Most splice mutations occur within the splice donor 
or the splice acceptor sites and result in transcripts that 
exhibit ‘exon skipping,’ where splicing deletes the 
affected exon from the mRNA. Cryptic splice sites 
are activated in only a minority of cases. Also, muta- 
tions that occur within splice donor sites seem more 
likely to activate cryptic splice sites than mutations 
that occur in splice acceptor sites. Cryptic sites can 
occur in either introns or exons. Cryptic splice sites 
located in introns create processed transcripts that are 
longer than normal and include a region of intronic 
sequence. The longer this insertion, the greater the like- 
lihood that a nonsense codon will be included that will 
prematurely terminate the open reading frame. Short 
intronic insertions can maintain the reading frame, but 
upon translation will result in the incorporation of 
abnormal amino acids that could impact the protein’s 
function. Cryptic splice sites also occur within exons 
and result in transcripts with deletions. If the resultant 
splice junction maintains the reading frame then the 
deletion will cause the loss of amino acids from the 
protein. If the junction alters the reading frame then 
abnormal amino acids will be encoded and a stop 
codon could be encountered. In summary, the activa- 
tion of cryptic splice sites results in either the prema- 
ture termination of the open reading frame, or in small 
insertions or deletions that alter the length of the 
transcript and modify the primary structure of the 
encoded protein. 
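The reading-frame logic of this paragraph reduces to arithmetic on the length change, plus a check for stop codons in any retained intronic sequence. The sketch below assumes the standard genetic code, a DNA alphabet, and (for simplicity) that an inserted segment begins at a codon boundary; the function name and return labels are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def classify_cryptic_splice(length_change, inserted_seq=""):
    """Coding consequence of using a cryptic site instead of the normal one.

    length_change: nucleotides gained (+, intronic insertion) or lost (-, exonic
    deletion) in the spliced mRNA.  For insertions, inserted_seq is the retained
    intronic sequence, assumed to start at a codon boundary.
    """
    if length_change > 0 and len(inserted_seq) != length_change:
        raise ValueError("inserted_seq must have length_change nucleotides")
    if length_change % 3 != 0:
        return "frameshift"
    if length_change < 0:
        return "in-frame deletion"
    if length_change == 0:
        return "no length change"
    codons = (inserted_seq[i:i + 3] for i in range(0, length_change, 3))
    if any(codon in STOP_CODONS for codon in codons):
        return "in-frame insertion with premature stop"
    return "in-frame insertion"


for change, insert in [(-9, ""), (-4, ""), (6, "GCTGCA"), (6, "GCTTAA")]:
    print(change, insert or "-", classify_cryptic_splice(change, insert))
```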


Detection of Cryptic Splicing 


RNA splicing from cryptic sites is commonly identi- 
fied during research into the biochemical mechan- 
isms of RNA splicing or through the identification 
of genetic mutations in research and clinical settings. 
Detection is usually accomplished by PCR amplifica- 
tion from cDNA followed by nucleotide sequence 
analysis to characterize the precise primary structure 
of the RNA splicing junctions. The interpretation of 
these results can be complicated by the presence of 
multiple RNA species. Many genes normally produce 
alternative splice products, most commonly from 
‘exon-skipping.’ It is also possible that splicing from
a cryptic site occurs normally, but that a mutation can
increase the prevalence of its products. Therefore, 
mutations that disrupt normal splicing can induce 
the formation of novel RNA species, as well as 
increase the prevalence of normal alternative splicing 
products that may not encode functional proteins. 
Furthermore, these various RNA molecules can have 
widely different stabilities that complicate evaluation 
of the prevalence of use of cryptic splice sites. This 
occurs because cells possess mechanisms termed 
‘RNA surveillance’ or ‘nonsense-mediated decay’ 
that can rapidly degrade mRNA containing premature 
termination codons commonly found in transcripts 
spliced from cryptic sites. 
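Which aberrant transcripts are likely to be degraded is often estimated with a positional rule of thumb for mammalian cells: a premature termination codon lying more than about 50-55 nucleotides upstream of the last exon-exon junction tends to trigger nonsense-mediated decay. The sketch below applies that heuristic to a toy transcript. The threshold, the hypothetical sequence and the junction positions are assumptions, and real decay depends on more than position.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def first_stop(mrna, cds_start):
    """Index of the first in-frame stop codon at or after cds_start, or None."""
    for i in range(cds_start, len(mrna) - 2, 3):
        if mrna[i:i + 3] in STOP_CODONS:
            return i
    return None


def predicts_nmd(mrna, cds_start, exon_junctions, threshold_nt=50):
    """Heuristic: decay is predicted if the stop lies > threshold_nt upstream of the last junction."""
    stop = first_stop(mrna, cds_start)
    if stop is None or not exon_junctions:
        return False
    return max(exon_junctions) - stop > threshold_nt


# Hypothetical transcript: start codon, 30 sense codons, a stop, then more sequence.
mrna = "ATG" + "GCT" * 30 + "TAA" + "GCT" * 40
print(first_stop(mrna, 0), predicts_nmd(mrna, 0, [150, 180]), predicts_nmd(mrna, 0, [120]))
```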


Clinical Significance of Cryptic Splicing 


Increased availability and efficacy of clinical molecu- 
lar genetic tests have emphasized the medical import- 
ance of understanding splice mutations, including 
cryptic splice sites. At present, the technology to 
detect genetic variants has outpaced the ability to 
interpret their clinical significance. Currently, it is 
possible to identify genetic variants accurately within
conserved splice sites or variants that could activate 
cryptic sites, but understanding the clinical signifi- 
cance of these variants is difficult. Research projects 
that utilize genetic or biochemical approaches could 
determine the clinical significance of variants that 
impact splicing. However, these difficult and costly 
approaches exceed the expertise of most clinical 
laboratories, and the time required to implement
them could delay clinical test results.

Information regarding nucleotide use within splice 
consensus sites has been combined with mathematical 
models to produce computer programs that attempt to 
assess the potency of the sequence at a given splice site. 
These programs can be used to gauge the severity of 
splice site mutations by comparing the values calcu- 
lated for the normal and mutant sequences. This 
approach could also signal the possible activation of 
cryptic sites when mutations create sequences with 
strong consensus values. Notably, ‘silent’ mutations
that replace a codon with a synonymous codon
have activated cryptic sites in disease genes. Unfortunately,
splice mutation prediction with computer
programs is crude, since the analysis is limited to
only the nucleotides that comprise the consensus
sites, and splice site selection is certainly more complicated.
Indeed, biochemical analysis of transcripts in
BRCA1 shows discrepancies between splice site use
and the strength of sites predicted with some algo- 
rithms. Despite these shortcomings, computer pro- 
grams are among the limited tools available to assist 
clinical interpretation of potential splice mutations. 
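The scoring idea described above can be sketched as a position weight matrix (PWM). Per-position nucleotide frequencies from aligned donor sites are converted into log-odds scores against a uniform background, and a candidate sequence scores the sum of its per-position values. The six "training" sites below are invented for illustration, not taken from any real program or database; actual tools are built from much larger alignments and often use more elaborate models.

```python
import math

BASES = "ACGT"

# Hypothetical aligned donor sites: 3 exonic + 6 intronic nucleotides (-3..-1 | +1..+6).
# Invented for illustration only; not drawn from any real splice-site dataset.
TRAINING_SITES = [
    "CAGGTAAGT",
    "AAGGTAAGA",
    "CAGGTGAGT",
    "GAGGTAAGG",
    "AAGGTATGT",
    "CAGGTAGGT",
]


def build_pwm(sites, pseudocount=0.5, background=0.25):
    """Convert aligned sites into per-position log2-odds scores vs. a uniform background."""
    pwm = []
    for i in range(len(sites[0])):
        column = [site[i] for site in sites]
        total = len(column) + 4 * pseudocount
        pwm.append({
            base: math.log2((column.count(base) + pseudocount) / total / background)
            for base in BASES
        })
    return pwm


def score_site(seq, pwm):
    """Sum the log-odds contribution of each nucleotide in a site-length sequence."""
    if len(seq) != len(pwm):
        raise ValueError("sequence length must match the matrix length")
    return sum(pwm[i][base] for i, base in enumerate(seq.upper()))


PWM = build_pwm(TRAINING_SITES)
normal = "CAGGTAAGT"
mutant = "CAGATAAGT"  # hypothetical G>A substitution at intron position +1

print(f"normal: {score_site(normal, PWM):.2f}")
print(f"mutant: {score_site(mutant, PWM):.2f}")
```

A large drop from the normal to the mutant score is the kind of signal such programs report. As the text cautions, the score reflects only the consensus positions and says nothing about the wider sequence context that also influences splice site selection.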


The question of cryptic splice site activation can be distilled to one of context. The outcome of a mutation within a normal splice site may depend on the proximity and strength of a nearby cryptic site. Since most disease genes are well characterized, potential cryptic splice sites could be identified within their nucleotide sequence, flagging where cryptic splicing is a risk. In summary, the clinical interpretation of genetic variants that could affect splicing in disease genes is problematic. Definitive genetic and biochemical approaches are beyond the scope of clinical laboratories, while modeling the effects of mutations is uncertain for all but the most conserved bases within the normal splice sites.
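Scanning a well-characterized gene for potential cryptic donor sites can be sketched with a plain consensus match. Every GT dinucleotide is treated as a candidate donor, and its flanks are compared with the commonly cited donor consensus MAG|GTRAGT (M = A or C, R = A or G). The threshold of seven matching positions and the test sequence are arbitrary choices for illustration; a weighted matrix would normally replace the simple match count.

```python
# IUPAC codes needed for the donor consensus (M = A/C, R = A/G).
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "M": "AC", "R": "AG"}
DONOR_CONSENSUS = "MAGGTRAGT"  # exon -3..-1 | intron +1..+6


def consensus_matches(window):
    """Count positions where the window agrees with the donor consensus."""
    return sum(base in IUPAC[code] for base, code in zip(window, DONOR_CONSENSUS))


def candidate_donors(seq, min_matches=7):
    """Return (index of the G in GT, window, matches) for each plausible donor site."""
    seq = seq.upper()
    width = len(DONOR_CONSENSUS)
    hits = []
    for start in range(len(seq) - width + 1):
        window = seq[start:start + width]
        if window[3:5] != "GT":  # a canonical donor starts the intron with GT
            continue
        matches = consensus_matches(window)
        if matches >= min_matches:
            hits.append((start + 3, window, matches))
    return hits


# Hypothetical sequence containing two consensus-like donor sites.
for position, window, matches in candidate_donors("TTTCAGGTAAGTCCCAAGGTGAGTTT"):
    print(position, window, matches)
```

Here both GT sites at (0-based) positions 6 and 18 match all nine consensus positions, while the GT at position 10 matches only three and is discarded.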


See also: Alternative Splicing; Eukaryotic Genes; 
Pre-mRNA Splicing 
ctDNA is the abbreviation for chloroplast DNA.


See also: Chloroplasts, Genetics of 


CTP (Cytidine 
Triphosphate) 
Cytidine-5’-triphosphate (CTP) is an energy-rich,
activated precursor for RNA synthesis. It is formed 
in the cell by amination of uridine triphosphate 
(UTP). The carbonyl oxygen at C4 of the uracil moi- 
ety is replaced by an amino group. The amide donors 
differ among organisms. In mammals, for example, 
glutamine is the amide donor, but, in the bacterium 
Escherichia coli, the ammonium ion is used in this 
reaction. 

For the synthesis of deoxycytidine triphosphate (dCTP), a precursor of DNA, the 2’ hydroxyl group of the ribose moiety is replaced by a hydrogen atom. This reduction is catalyzed by ribonucleotide reductase, which in most organisms acts on cytidine diphosphate (CDP); the resulting dCDP is then phosphorylated to dCTP.


See also: RNA 
Cutis laxa
Cutis laxa (CL) is a clinical term referring to overstretched and inelastic skin, which forms loose folds, especially of the neck, face, and flexures. Whilst common in old age, it is very abnormal any earlier, especially when generalized or very widespread as opposed to more localized forms of sagginess. For a long time, CL was confused with cutis hyperelastica, which characterizes most Ehlers-Danlos syndrome (EDS) variants (Pope, 1993). Early cases of EDS were often described as showing ‘cutis laxa,’ as was the original description of EDS by Danlos himself. In EDS, in contrast to the bloodhound-like, jowly, melted-wax appearance and loss of elasticity of true cutis laxa, the skin is overextensible: after being stretched or otherwise deformed, it immediately snaps back to normal. Rather confusingly, some EDS subtypes also show genuine CL, either very early, as in EDS types VIIa-c, or very much later, as a complication after middle age in some EDS I/II variants.

CL is classified into three subsets: primary CL, of which there are several variants; secondary CL, in which the lax skin complicates other inherited defects of connective tissue; and acquired CL, in which disorders of systems other than primary connective tissue components induce obvious cutaneous laxity and redundancy (Pope, 1993, 1995).


Primary Cutis Laxa 


Classical Types 

This was first described in the late nineteenth century under a variety of names, such as generalized dermatolysis, géromorphisme cutané, atrophie idiopathique de la peau, peau ridée sénile, etc. Published in 1887, Dubreuilh’s case showed the striking premature aging of an adolescent French girl, who looked old enough to be her own grandmother. Other contemporary cases were confused with progeria, and even when recognized as different from EDS, CL was not easily differentiated from neurofibromatosis, calcinosis, and various types of scleroderma (Pope, 1993). In 1972, Beighton clearly distinguished autosomal dominant and recessive variants, observing that in general the recessive variant was both earlier in onset and more dangerous. He studied a large autosomal dominant family, in which the CL occurred from infancy onwards. Emphysema had been recognized as early as 1938 and in general is much more severe in the recessive form, in which patients can die from respiratory failure in their teens. Beighton’s original autosomal dominant proband survived to late middle age before requiring lung transplantation to treat her progressive emphysema. Her skin histology showed elastic fragmentation. In the autosomal recessive forms, elastic stains of the skin show either elastic depletion or, less frequently, proliferation. Mutations of the elastin gene have now been demonstrated in both variants (Zhang et al., 1997).

Figure 1 (See Plate 6) Two examples of generalized CL complicating pseudoxanthoma elasticum (PXE). (A) Predominantly in a female of Afro-Caribbean origin; (B) generalized in a Japanese female.


Other Primary Variants 

These include CL with joint laxity and developmental delay, also called autosomal recessive cutis laxa type II. Another variant includes wormian bones and generalized osteoporosis; when these features are combined with corneal clouding, developmental delay, and CL, the eponym De Barsy syndrome is applied.


Secondary to other Disorders 


Cutis Laxa Complicating other Inherited 
Connective Tissue Syndromes 

Generalized CL is a rare complication of both the Ehlers-Danlos syndrome and pseudoxanthoma elasticum (PXE) (Figure 1A, B), whilst more localized forms are very much commoner. It is also a well-recognized complication of the occipital horn syndrome.

In EDS types VIIa-c there is premature cutis laxa, which ranges from generalized in EDS VIIc to more subtle localized CL in EDS types VIIa and b, usually most obvious on the face and less so on the trunk. The degree of laxity correlates with the severity of the mispacked collagen fibres, caused by mutations either of the N-propeptide cleavage sequence or of the N-propeptidase that cleaves it. Unlike in true primary CL, the skin is hyperelastic. Transient CL has also occasionally been observed in babies with EDS IV caused by collagen type III mutations. Late-onset laxity complicates EDS types I and II. A beautiful example was illustrated by Beighton, who showed the contrast between the smooth-skinned hypermobile youth and his very weathered facial appearance in old age. Here a bloodhound-like appearance very similar to that of premature autosomal dominant CL occurs, except that in EDS this is a problem of old age.

PXE produces true cutis laxa, in which the affected axillary, neck, and flexural skin becomes truly lax and inelastic. Reflecting this laxity, it looks like plucked-chicken skin, and the abnormal skin contains degenerate, fragmented, mid-dermal deposits of abnormal elastic fibres. The same elastic fibre abnormalities also affect the blood vessels and the retina. Very occasionally there is snowstorm calcification of the lungs.

Cutis laxa also complicates the occipital horn syndrome, in which lysyl oxidase deficiency is caused by abnormal copper metabolism. Bladder diverticula and dilatation of the urinary tract are also features, and there may be phenotypic overlap with Menkes syndrome. However, since the lysyl oxidase gene is autosomal, the X-linked inheritance of lysyl oxidase-deficient CL is doubtful. Our two recently reported cases presented with infantile CL, later developing wormian bones. One had severe obstructive uropathy with renal failure. This phenotype resembles autosomal recessive CL with wormian bones (OMIM 219200).


Acquired Cutis Laxa Caused by Other 
Systemic Disorders 

Here the end result is the same, i.e., there is true laxity of the skin, but the cause is an incidental systemic disease that happens to infiltrate dermal connective tissue. A good example is amyloid disease, either as a primary disorder or occurring secondary to multiple myeloma. In hereditary neuropathic amyloidosis of the Finnish type, the CL is predominantly facial. Similar generalized elastolysis can also occasionally complicate urticaria, generalized eczema, or Sweet syndrome (Pope, 1993, 1995). None of these has anything in common pathogenetically except for a general predisposition to affect the skin. Idiopathic generalized elastolysis occurs in the absence of any of the listed secondary causes. Unlike primary cutis laxa, it is of adult onset, from the third to the sixth decades. Characteristic changes include esophageal diverticula, esophageal or inguinal hernias, severe generalized emphysema, colonic diverticula, and
progressive joint laxity. Pulmonary hypertension or 
aortic dilatation and rupture have been documented in 
various patients. Whether this is sometimes a late- 
onset allelic variant of primary CL is unknown. 


Blepharochalasis 

Although strictly speaking confined to the orbits, eyelids, and eyebrows, blepharochalasis often accompanies generalized CL. It can also segregate as specific
autosomal dominant traits, with or without lip invol- 
vement (OMIM 109900 and 11000). 


Pathogenesis of Cutis Laxa 


In most varieties of CL, elastin itself, or another com- 
ponent of elastic fibres is either fundamentally fragile 
or degraded by virtue of other secondary factors. 
Otherwise, the closely related microfibrillar constitu- 
ents are abnormal, as is the case in lysyl oxidase 
deficiency (Khakoo et al., 1997) and in some cases of 
acquired CL. Less commonly, collagen fibres are dis- 
torted as occurs in those EDS variants with CL, whilst 
in amyloidosis other elastic microfibrillar abnormal- 
ities are produced by amyloid microfibrils. 

Structurally, elastic fragmentation or even gross deficiency is very obvious in primary autosomal dominant CL, whilst the changes vary in primary autosomal recessive CL.
Cyclic AMP was discovered in the 1950s by Earl Sutherland in the course of studying how certain hormones elicit the conversion of glycogen to glucose in liver cells. In essence, the binding of a hormone to the external face of a highly specific transmembrane receptor triggers the action of an intracellular enzyme, adenylate cyclase. This enzyme then converts ATP to cyclic AMP. The latter compound (sometimes referred to as a ‘second messenger’) acts in a variety of ways, most notably by stimulating the activity of various broad-specificity protein kinases. The phosphoprotein products of the kinase reactions participate in signal transduction cascades in such a way as to greatly amplify the effect of very slight amounts of hormone.

Cyclic AMP was identified in bacteria in 1965, also 
by Sutherland. As in animal cells, the precursor mole- 
cule is ATP. The best-understood role of cyclic AMP 
in bacteria is to modulate the utilization of carbon 
sources. This is accomplished via the action of an accessory protein, the cyclic AMP binding protein (CAP or CRP). This protein can potentially act as a transcription factor by engaging specific target sequences in DNA. When physiological circumstances lead to a rise in cyclic AMP, the concentration of binding protein-cyclic AMP complexes also increases. As a result, there is enhanced occupancy of a set of target sites in DNA, many of which are situated within or near promoters that drive the expression of proteins needed for carbon source uptake and breakdown, provided that certain other conditions are met (e.g., a carbon source is actually available in the environment).


Further Reading 

Busby S and Kolb A (1996) The CAP modulon. In: Lin ECC and Lynch S (eds) Regulation of Gene Expression in Escherichia coli, pp. 255-279. Austin, TX: RG Landes.


See also: ATP (Adenosine Triphosphate); Kinases 
(Protein Kinases) 
Cyclin-dependent kinases (CDKs) are a family of
protein serine/threonine kinases whose activity 
depends on association with a noncatalytic regulatory 
subunit called a cyclin. The genes that encode CDKs 
were initially identified in screens for conditional 
mutants of Saccharomyces cerevisiae which reversibly 
arrested at characteristic points in the cell cycle upon 
transfer to the restrictive conditions. Cyclins were 
initially named for their periodic accumulation and 
degradation (cycling) through the early cell cycles of 
fertilized sea urchin eggs. The fusion of these two lines 
of research identified a ubiquitous cell cycle engine 
responsible for coordinating cell growth, DNA repli- 
cation, and mitosis in the orderly fashion required to 
ensure the viability of progeny cells. 


Discovery 


The events of mitosis that generate two daughter cells 
are readily visible with the light microscope. Apart 
from growth, there is little apparent cellular activity 
in the gap phases between successive rounds of cell 
division. DNA is replicated in a discrete period between successive rounds of mitosis. This phase is called S (for synthesis). The preceding phase is termed G1, and the gap before the next round of mitosis is termed G2. Within G1, growth and the synthesis of material necessary for DNA replication occur, while in G2 the cell prepares for mitosis (M-phase), which may involve further growth. Cell fusion experiments,
initially carried out with unicellular protozoa and 
subsequently refined by Rao and Johnson with studies 
on cultured human HeLa cell lines, suggested the 


existence of an M-phase-promoting factor and an 
S-phase-promoting factor. M-phase was shown to be 
dominant over all other phases of the cell cycle: cells at 
any cell cycle stage could be induced to undergo 
chromosome condensation by fusion with cells undergoing mitosis. S-phase was shown to be dominant over G1, but not G2. This result suggested that once the DNA is replicated, the cell imposes a block to rereplication. The synchrony with which the multinucleate
cells entered mitosis (or S-phase) strongly suggested 
that there must be feedback controls within the cell to 
coordinate cell cycle progression. These experiments 
created the theoretical landscape on which studies of 
cell cycle control were to develop but offered no route 
to isolating the factors responsible. 

The factor responsible for promoting M-phase was 
first indicated by experiments in which Hunt and 
Jackson monitored the proteins present in a single 
fertilized sea urchin egg as a function of time after 
fertilization. These experiments identified a component
that accumulated up to the time of mitosis, whereupon 
it became entirely degraded, reappearing only at the 
initiation of the subsequent round of cell division. 
Using unfertilized Xenopus laevis eggs as a source of 
material, Maller and colleagues were able to purify 
maturation-promoting factor (MPF) by following its 
ability to cause both germinal vesicle breakdown 
when injected into Xenopus oocytes and chromosome 
condensation (metaphase) in a cell-free system. The 
MPF had protein kinase activity and cofractionated 
with two proteins of 45 and 32 kDa apparent molecular weight.

Concurrently with this biochemical approach, genetic studies using the yeasts Saccharomyces cerevisiae and Schizosaccharomyces pombe had led to the isolation of a series of cell division cycle (cdc) mutants which arrested under restrictive conditions at specific stages in the cell cycle. Cells harboring these mutations were blocked for progression through the cell cycle but not for cell growth and macromolecular synthesis. The application of genetics, coupled with molecular biological methods, allowed the cdc genes to be cloned and characterized. A major breakthrough came when it was recognized that the 32 and 45 kDa proteins that constituted MPF activity were the respective products of the S. pombe cell division cycle genes cdc2+ and cdc13+. The cdc2+ gene encodes a protein kinase that requires association with a cyclin subunit (in this case the product of the cdc13+ gene) for activity. CDKs form a remarkably well-conserved protein family: the human CDC2 gene was cloned by complementation of an S. pombe cdc2 mutant. Since this pioneering work of the late 1980s, multiple members of both the CDK and cyclin families have been characterized in many eukaryotic species.


Function 


In the early synchronous divisions of frog embryos, 
MPF activity oscillates without reference to other cell 
cycle events such as the successful completion of 
DNA replication. This observation suggests the exist- 
ence of a ‘cell cycle engine,’ which drives the cell 
through consecutive rounds of division by means of 
periodic activation and inactivation of MPF. 

Somatic cell fusion experiments, together with genetic experiments on yeast, provide a rather different view of the role of CDK activity in regulating the cell cycle, one in which CDK-driven progression depends on the successful completion of earlier cell cycle events. Certain mutations in the S. pombe cdc2+ and cdc13+ genes allow inappropriate cell cycle progression, a phenotype that Hartwell and Weinert called ‘relief of dependence.’ From this observation they argued that control mechanisms, termed ‘checkpoints,’ must exist. These checkpoints are composed of a surveillance system that detects when a particular cell cycle event has not been correctly executed, and a signal transduction pathway whose ultimate target can be a CDK. The cell cycle of dividing cells has two major points of commitment at which CDK/cyclin pairs are active in determining cell fate. These are the G1/S and G2/M transitions, passage through which results in DNA replication and mitosis, respectively. If essential molecular events have not been successfully completed, cells will arrest at these transitions as a result of inhibition of CDK activity. For example, cells will arrest at the G2/M boundary if damaged DNA or incompletely replicated chromosomes are present.

The multiplicity of roles played by CDKs in timing and coordinating cell division led David Morgan to describe them as “engines, clocks and microprocessors.”


Structure and Activity 


CDK molecules catalyze the transfer of the γ-phosphate of ATP onto the side chain hydroxyl groups of serine or threonine residues of target proteins. CDKs in general constitute a protein kinase ‘catalytic core’ of approximately 300 amino acids, not elaborated by N- or C-terminal extensions (Figure 1). They share the fold observed in the broad family of protein serine/threonine kinases, protein tyrosine kinases, and certain phospholipid kinases such as PI3 kinase. This fold is formed from an N-terminal domain of approximately 85 amino acids composed largely of β-sheet, and a C-terminal domain of approximately 215 amino acids composed primarily of α-helix. ATP is bound between these two domains, while peptide substrates associate mainly with the C-terminal domain. Binding of both nucleotide and peptide substrates is dependent on an appropriate conformation of a stretch of amino acids (residues 145-172 of human CDK2) termed the activation segment or T-loop.

Figure 1 The structure of CDK2 in complex with cyclin A. CDK2 (white, left) and cyclin A (grey, right) are shown in ribbon representation. The CDK-specific insert that mediates interactions with CKS proteins and KAP follows the G-helix. The sites where protein-protein interactions direct substrate- and inhibitor-binding have been determined by peptide-binding studies.

CDKs are defined by their sequence similarity and 
their dependence upon cyclin binding for obtaining 
full activity. Particular features of their sequence 
include a degenerate PSTAIRE motif (single-letter 
amino acid code), the glycine-loop, and a CDK- 
specific insert. The PSTAIRE motif constitutes the 
first turns of the ‘C-helix’ in the N-terminal kinase 
domain, while the CDK-specific insert maps after the 
‘G-helix’ in the C-terminal kinase domain. The gly- 
cine-loop contains the GXGXXG motif that is con- 
served within the protein kinase family and is 
important for ATP binding (Figure 1). In terms of 
primary structure, CDKs resemble most closely the 
family of mitogen-activated protein kinases with 
which they share the characteristic of preferring sub- 
strate serine/threonine residues immediately upstream 
of proline residues. 
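The sequence features named above lend themselves to simple pattern searches. The sketch below looks for the glycine-loop GXGXXG pattern and the PSTAIRE motif (exact match only, so it would miss degenerate variants) in a fragment assumed to be the N-terminal 51 residues of human CDK2, and lists candidate phosphoacceptor sites in a made-up substrate peptide: serine or threonine immediately followed by proline, flagged when a basic residue sits at +3 (the S/T-P-X-K/R pattern often cited as the full CDK consensus). The CDK2 fragment is an assumption and should be checked against a sequence database before any real use.

```python
import re

# Assumed N-terminal 51 residues of human CDK2 (verify before relying on it).
CDK2_N_TERM = "MENFQKVEKIGEGTYGVVYKARNKLTGEVVALKKIRLDTETEGVPSTAIRE"


def motif_positions(seq, pattern):
    """1-based start positions of (possibly overlapping) regex matches."""
    return [m.start() + 1 for m in re.finditer(f"(?={pattern})", seq)]


def cdk_sites(peptide):
    """(1-based position, residue, fits S/T-P-X-K/R) for every S/T followed by P."""
    peptide = peptide.upper()
    sites = []
    for i in range(len(peptide) - 1):
        if peptide[i] in "ST" and peptide[i + 1] == "P":
            full = i + 3 < len(peptide) and peptide[i + 3] in "KR"
            sites.append((i + 1, peptide[i], full))
    return sites


print("glycine loop (GXGXXG):", motif_positions(CDK2_N_TERM, "G.G..G"))
print("PSTAIRE:", motif_positions(CDK2_N_TERM, "PSTAIRE"))
print("S/T-P sites:", cdk_sites("GSPKKAATPEGLSAP"))  # hypothetical peptide
```

On this fragment the glycine loop starts at residue 11 and PSTAIRE at residue 45; in the hypothetical peptide, Ser2 fits the longer consensus while Thr8 fits only the minimal S/T-P motif, and Ser13 (followed by Ala) is not reported.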


CDK Protein Family 
CDKs in Yeasts 


CDK1

The CDK family contains multiple members in both 
Saccharomyces cerevisiae and Schizosaccharomyces 
pombe, but only one CDK is directly involved in cell 
cycle regulation. This protein is the product of the 
CDC28 and cdc2+ genes, respectively, though in later literature it is sometimes referred to as CDK1. Different forms of the Cdc28 and Cdc2 kinases are generated by their association with different cyclins. Sc. pombe offers the simplest paradigm for this phenomenon, with Cdc2 pairing with Cig2 or Cig1 to initiate DNA synthesis, and with Cdc13 to direct entry into mitosis. In Sa. cerevisiae, a wider range of cyclin molecules pair with Cdc28: Clns1-3 during G1 phase, Clbs1-6 (although predominantly Clb5 and Clb6) at the start of S-phase, and Clbs1-4 at mitosis.

Under certain conditions, Sc. pombe cells can be induced to undergo both DNA replication and mitosis by the activity of a single CDK/cyclin complex. Under these conditions, S-phase is initiated by low levels of CDK activity, while the higher levels of CDK activity that accompany G2 phase serve to block rereplication and promote mitosis. Following cyclin destruction at exit from mitosis, CDK activity is reset to the low level that is a prerequisite for the initiation of DNA replication.
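The single-complex scheme just described can be caricatured as a threshold model: one CDK activity rises as cyclin accumulates, a low threshold fires DNA replication, a high threshold triggers mitosis, and cyclin destruction resets activity to zero. The threshold values and the linear rise are arbitrary, and the block to rereplication, which the text attributes to high CDK activity, is represented simply by a flag. This is a toy for following the logic, not a model of fission yeast kinetics.

```python
S_THRESHOLD = 2    # hypothetical: low CDK activity suffices to initiate replication
M_THRESHOLD = 10   # hypothetical: high CDK activity drives entry into mitosis


def run_cycles(n_cycles, increment=1):
    """Return the sequence of 'S' and 'M' events driven by a single rising CDK activity."""
    activity = 0
    replicated = False
    events = []
    while events.count("M") < n_cycles:
        activity += increment  # cyclin accumulates, raising CDK activity
        if not replicated and activity >= S_THRESHOLD:
            events.append("S")
            replicated = True   # stands in for the block to rereplication
        if activity >= M_THRESHOLD and replicated:
            events.append("M")
            activity = 0        # cyclin destruction at mitotic exit resets activity
            replicated = False  # low activity again permits a new round of replication
    return events


print(run_cycles(3))
```

Because a single variable controls both transitions, each S is followed by exactly one M and replication never repeats within a cycle; the ordering emerges from the two thresholds alone.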


CDK-activating kinase (CAK)
CDKs are only fully active following phosphoryl- 
ation of a conserved threonine residue within the 


activation segment (Thr160 in the human CDK2 
sequence). In Sc. pombe the enzyme responsible, called 
CDK-activating kinase (CAK), is a CDK/cyclin pair, 
Mop1(Crk1)/Mcs2. This pair is functionally similar 
to CDK7/cyclin H of higher organisms, described 
below. In addition to their CAK activity, both 
enzymes can phosphorylate the C-terminal domain 
of RNA polymerase II and so play a role in the reg- 
ulation of transcription. In Sa. cerevisiae cells, activa- 
tion of Cdc28 results from phosphorylation by CIV1. 
CIV1 is distantly related to the CDK family and does 
not require a cyclin partner for activity. Another 
CDK/cyclin pair in Sa. cerevisiae, Kin28/Ccl1, phos- 
phorylates the C-terminal domain of RNA polymer- 
ase II. 


Pho85 

Pho85 is an Sa. cerevisiae CDK which was identified 
as a negative regulator of the PHO phosphate metab- 
olism system. Pho85 phosphorylates the transcription 
factor Pho4, which results in Pho4 transport from the 
nucleus. Pho85 has subsequently been found to have 
multiple activities, as a result of its ability to form 
complexes with different Pho85 cyclins (Pcls), of 
which there are at least 10. Although not involved in 
cell-cycle progression, Pho85 is able to functionally 
substitute for Cdc28 in cells where the genes encoding 
cyclin-binding partners of Cdc28 are disrupted. 


CDKs in Metazoans 


CDKs 1 and 2

A number of CDK/cyclin pairs regulate the cell cycle 
in metazoan cells (Figure 2). Cell cycle progression is 
largely driven by CDKs 1 and 2 and their associated 
regulatory proteins. CDK2 activity is first detected in 
late G1 phase following transcription of cyclin E, and subsequently in complexes with cyclin A, which are required for progression through S-phase. Whereas cyclin E expression rises and falls rapidly in late G1, cyclin A is first detected in late G1/early S-phase and its expression rises steadily through S and G2. Cyclin A can also form a complex with CDK1 that, together with CDK1/cyclin B, controls entry into M-phase. Cyclin A degradation by ubiquitin-mediated proteolysis, which occurs just before the metaphase-to-anaphase transition, precedes that of cyclin B. Exit from mitosis requires both mitotic cyclins to be degraded. The cellular environment that prevails as cells move into G1 maintains a state of low CDK activity, which is a prerequisite for DNA replication.

Figure 2 Key regulatory events in the higher eukaryotic cell cycle. CDKs are labeled by their number (1, 2, 4/6, 7) and cyclins are identified with a single letter (A, B, D, E, H). Light grey arrows denote phosphorylation. Dark bars denote inhibition of CDK activity.


CDKs 4 and 6
While complexes containing CDK1 and CDK2 direct cycling cells through S-phase and mitosis, CDK4 and CDK6 in complex with D-type cyclins stimulate cells of multicellular eukaryotes to pass from a quiescent G0 phase into G1 (Figure 2). These complexes act as a link
to transduce information from mitogenic signaling 
pathways to the cell cycle engine. In response to sti- 
mulation of cells by a variety of mitogens, transcrip- 
tion of genes encoding D-type cyclins (D1, D2, and 
D3) is upregulated through the mitogen-activated 
protein kinase pathway. The formation of complexes 
between D-type cyclins and CDK4 or CDK6 requires 
stoichiometric association with a member of the Cip1/ 
Kip1 family. As cyclin D accumulates, CDK4 and 
CDK6 are activated, resulting in phosphorylation of 
the product of the retinoblastoma gene, pRB. Early in 
G1 phase, pRB is found in complex with the heterodimeric transcription factor E2F-1/DP-1. This complex represses the transcription of E2F-1-dependent genes, the products of which are required for S-phase. Later in G1, pRB is phosphorylated by CDK2 in complex with cyclin E. This cumulative phosphorylation leads to dissociation of the pRB/E2F complex and activation of E2F-dependent gene transcription. Formation of active CDK2/cyclin E is dependent on the presence of D-type cyclin complexes in two ways: firstly, for the synthesis of cyclin E, since the cyclin E gene is an E2F target; and secondly, because cyclin D-dependent CDK complexes sequester members of the cyclin-dependent kinase inhibitor (CKI) Cip/Kip family, including p21Cip1 and p27Kip1, which would otherwise inhibit CDK2. The capacity of CDK2/cyclin E to upregulate its own expression creates a positive feedback loop that sustains pRB phosphorylation. Once activated, CDK2/cyclin E can also phosphorylate p27Kip1 to target it for degradation by the ubiquitin-mediated proteolytic pathway. Together these events act to ensure irreversible progression through the G1/S transition. Whereas cyclin D-dependent CDKs appear to have only one
major cell cycle target, CDK2/cyclin E complexes 
have been shown to phosphorylate a number of pro- 
teins including histone H1, CDC6, and proteins that 
are required for the firing of replication origins. 
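The positive feedback loop described above is what lets the G1/S transition behave like a one-way switch. A minimal sketch, with an entirely hypothetical functional form and parameters not derived from the text: let x stand for a lumped E2F/CDK2-cyclin E activity, driven by a mitogen input plus a cooperative self-activation term 3x²/(1 + x²), and removed at rate x. With no input, this system has stable states at 0 and about 2.62, separated by an unstable point near 0.38. Simple Euler integration shows that a sustained mitogen pulse flips x into the high state, where it stays after the input is withdrawn, whereas a brief, weak pulse decays back towards zero.

```python
def simulate(pulse_level, pulse_duration, total_time=40.0, dt=0.01):
    """Euler-integrate dx/dt = input + 3x^2/(1 + x^2) - x; input is on for pulse_duration."""
    x = 0.0
    steps = int(total_time / dt)
    for step in range(steps):
        t = step * dt
        mitogen = pulse_level if t < pulse_duration else 0.0
        dxdt = mitogen + 3 * x**2 / (1 + x**2) - x  # input + self-activation - removal
        x += dxdt * dt
    return x


strong = simulate(pulse_level=0.5, pulse_duration=20.0)
weak = simulate(pulse_level=0.1, pulse_duration=1.0)
print(f"after strong pulse: {strong:.2f}")
print(f"after weak pulse:   {weak:.3f}")
```

The point of the toy is the hysteresis: once feedback carries x past the unstable point, withdrawing the input no longer reverses the transition, which mirrors the irreversibility the text describes.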


CDK7 

In higher organisms, CAK activity has been attributed 
to CDK7 in complex with cyclin H (Figure 2). This 
complex is also able to phosphorylate the C-terminal 
domain (CTD) of the large subunit of RNA polymerase II, to regulate transcription. When active in CTD phosphorylation, CDK7 is found as part of a CDK7/cyclin H/MAT1 (ménage à trois) complex, which associates with core subunits of the transcription factor TFIIH to form holo-TFIIH. MAT1 plays multiple roles in this context: firstly by promoting the association of CDK7 with cyclin H, which would otherwise require CDK7 phosphorylation; secondly by recruiting the CDK7-cyclin H pair to the TFIIH complex; and thirdly by stimulating CTD phosphorylation. CTD phosphorylation, which stimulates
the initiation and early elongation stages of transcrip- 
tion, is shared by other CDKs, notably CDK8 and 
CDK9. The CTD activity of CDK9 is subverted by 
the human immunodeficiency virus HIV-1, which 
recruits CDK9 to HIV pre-initiation complexes 
through the viral protein Tat. 


CDK5 

CDK5 is active in the control of cell morphology 
rather than cell division. CDK5 is expressed in post- 
mitotic neuronal cells, localized to the growth cone. 
The cyclin partner of CDK5 is p35, and mice that lack 
this subunit show severe defects in neuronal migration 
and neurite formation. p35 is subject to proteolysis to 
generate a deregulated 25 kDa fragment which may be
responsible for the hyperphosphorylation of micro- 
tubule-associated proteins such as tau in a number of 
neurodegenerative diseases. 


CDK Regulation 


The paramount role of CDKs in determining the phases of the cell cycle means that they themselves
have to be tightly regulated and responsive to a variety 
of inputs. Monomeric CDK molecules possess no 
detectable protein kinase activity. A structure of 
monomeric CDK2 determined by Sung-Ho Kim and 
his coworkers provides an explanation for this in- 
activity, since key elements of the substrate recognition 
and phosphotransfer apparatus are inappropriately 
arranged. The molecular mechanisms that are used to 
regulate CDK activity are diverse, but can be charac- 
terized as depending on either reversible phosphor- 
ylation, or reversible association with a regulatory 
partner. These two phenomena affect in turn the abil- 
ity of the CDK to select an appropriate substrate, or to 
adopt an active conformation. The surface of a CDK 
presents multiple sites through which it interacts with 
substrates and regulators (Figure 3). 


Processes that Activate CDKs 

As described below, CDK activity requires cyclin 
binding, which may be aided by the cooperation of 
assembly factors, phosphorylation within the activa- 
tion loop, and correct localization within the cell. 
CDK activity is also controlled by its association 
with other regulatory proteins, for example members of the CKS family (Figure 3). The exact role of
this protein family in regulating CDK activity is not 
fully understood. 


Cyclin binding 

Cyclins, which activate cell cycle CDKs, accumulate 
at characteristic points in the cell cycle, and are speci- 
fically degraded by the ubiquitin-directed action of 
the proteasome. This irreversible step contrasts with 
reversible control of CDK activity by phosphorylation. A series of CDK2 and cyclin structures provides a model for the role of cyclin binding in CDK activation. On formation of the CDK2/cyclin A complex there are no changes in the structure of cyclin A, but substantial conformational changes in CDK2 create the ATP triphosphate recognition site. The PSTAIRE helix swings into the active site cleft, and the short αL12 helix in the monomeric structure melts to form a β-strand (Figure 3). Both of these regions include
conserved residues important for ATP binding. 
CDK2/cyclin A exhibits about 0.2% of the activity 
of the fully activated phosphorylated binary complex. 


Assembly factors 

The pairing of CDK7 with cyclin H is stabilized by 
phosphorylation of CDK7 within its activation seg- 
ment. In the absence of this phosphorylation, stable 
association of CDK7 with cyclin H is promoted by 
Figure 3 CDK-protein interaction sites. The deter-
mination of the structures of a number of complexes 
containing either CDK2 or CDK6 has identified the sites 
of interaction of members of the CDK family with their 
key regulators. The CDK subunit is shown in ribbon 
representation looking onto the N-terminal domain. 
Each regulatory protein-binding surface is highlighted by 
a curved line. p27Kip1 binding extends round onto the
cyclin subunit.


MAT-1. A similar mechanism may exist to promote 
the formation of other cognate CDK/cyclin pairs. For 
example, the CKI p21Cip1 has been proposed to assist
in the formation of CDK4/cyclin D complexes. Meas- 
ured CDK/cyclin association constants suggest that 
CDKs have a low intrinsic ability to discriminate 
between cognate and noncognate cyclin partners. 
Assembly factors may play a role in promoting the 
formation of appropriate pairings. 


Phosphorylation 

CDK/cyclin complexes require phosphorylation of a 
conserved threonine residue (T160 in CDK2) for full 
catalytic activity. Thr160 phosphorylation leads to a 
rearrangement of the activation loop so that it adopts a 
conformation that can recognize substrate. 


Localization 

In addition to the temporal control provided by the 
cell cycle engine, the subcellular location of CDKs 
and of the proteins that regulate them plays an
important role in regulating their activity. Specific CDK
complexes can be localized to substructures within 
organelles: a phenomenon that either results from, or 
serves to promote, CDK-substrate interactions. 
CDKs complexed with cyclin E and cyclin A are 
constitutively nuclear, whereas the location of cyclin 
B1 varies through the cell cycle. During interphase, 
the protein shuttles between the nucleus and the cyto- 
plasm. At M-phase, cyclin B1 accumulates rapidly in 
the nucleus, as phosphorylation of its cytoplasmic 
retention sequence creates a nuclear import signal. 
Where DNA damage is detected, this accumulation 
does not occur. 


Processes that Inactivate CDKs 


CKI binding 

In eukaryotic cells, the activity of CDKs can be inhib- 
ited by the binding of proteinaceous CDK inhibitors 
(CKIs). In Sc. pombe, rum1+ was identified as a gene
without which mitosis becomes uncoupled from 
DNA replication. This gene was found to encode a 
CKI of 25kDa, which is an important regulator of 
G1 progression. The slightly larger protein Sic1 is 
the functional homolog of Rum1 in Sa. cerevisiae. 
Sa. cerevisiae uses the CDK inhibitor Far1 to induce 
G1 cell cycle arrest in response to mating pheromones.
When phosphorylated by Fus3, a MAP kinase, Far1 
inhibits three Cdc28/Cln complexes. Two major 
classes of proteins inhibit CDK activity in higher 
eukaryotes (Figure 2). These are the INK4 (inhibitors 
of CDK4) family, which specifically inhibits CDK4 
and CDK6, and the more promiscuous Cip/Kip 
family, which also inhibits cyclin A and cyclin 
E-dependent CDKs. As mentioned above, Cip/Kip
family members also act as assembly factors to promote
cyclin D-dependent CDK activity. 

The four INK4 family members contain multiple 
ankyrin repeats, a structural motif associated with 
protein-protein interactions. INK4 binds to both the 
N- and C-terminal lobes of CDK6, disrupting the 
constellation of conserved residues that would be 
expected to bind to ATP. Although the CDK6/INK4 
binding site does not overlap with the anticipated 
binding site of cyclin D, the structure of CDK6 in 
complex with INK4 does not appear compatible 
with further functional cyclin D binding. This struc- 
tural result supports models of INK4 function which 
require that INK4 association with CDK4 or CDK6 
is incompatible with stable cyclin D binding. 

Recruitment of p27Kip1-family inhibitors to CDK/
cyclin pairs is through an RXL sequence motif shared
by p27Kip1-family members and a number of CDK
substrates. This motif interacts with a conserved
hydrophobic patch called the ‘recruitment site’ on
the cyclin molecule. Inhibition by the p27Kip1 family
is achieved by competition with substrates for binding 
at the recruitment site, and by further interaction with 
structural elements of the CDK. 

In addition to inhibiting CDKs, p21Cip1, a member
of the Cip/Kip family, is able to bind to proliferat-
ing cell nuclear antigen (PCNA). PCNA is an acces-
sory subunit of DNA polymerase δ, and binding of
p21Cip1 to PCNA inhibits DNA synthesis. p21Cip1
can bind to PCNA and CDKs simultaneously, pro-
viding one way in which the cell cycle and DNA
replication are coordinated.


Phosphorylation 

Phosphorylation of residues within the glycine-loop 
motif that forms part of the ATP binding site inhibits
CDK activity. CDKs 1, 2, 4, and 6 are phosphorylated
on a tyrosine of this loop in vivo. CDKs 1 and 2 are
also phosphorylated on the preceding residue, Thr14.
Members of the wee1 kinase family phosphorylate
CDK1 on Tyr15. In Sc. pombe, wee1 can also phos-
phorylate Thr14, but in higher eukaryotes Thr14 is
phosphorylated by Myt1, a membrane-associated
kinase. Tyr15 of CDK1/cyclin B is dephosphorylated 
by members of the Cdc25 family of dual-specificity 
phosphatases, providing the rate-limiting step for 
entry into mitosis. Cdc25 is activated by CDK1, creat- 
ing a positive feedback loop that leads to a rapid 
increase in CDK1 activity at the onset of mitosis. 
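The positive feedback between CDK1 and Cdc25 can be illustrated with a deliberately crude toy model, sketched here in Python. The model is not taken from the source and rests on loudly stated assumptions. A single variable stands for the fraction of active (Tyr15-dephosphorylated) CDK1. Cdc25 activity rises linearly with that fraction, Wee1/Myt1 inactivation occurs at a constant rate, and all parameter values are made up. The sketch integrates with forward Euler steps and compares the steady state reached with and without the loop.

```python
def simulate_cdk1(feedback: float, steps: int = 5000, dt: float = 0.01) -> float:
    """Return the active CDK1 fraction after `steps` forward-Euler steps.

    Hypothetical toy model (not a published one):
      Cdc25 activity    = basal + feedback * active
      d(active)/dt      = Cdc25 activity * (1 - active) - wee1_rate * active
    """
    basal = 0.05      # hypothetical basal Cdc25 activity
    wee1_rate = 0.5   # hypothetical Wee1/Myt1 re-phosphorylation rate
    active = 0.0      # start with all CDK1 Tyr15-phosphorylated (inactive)
    for _ in range(steps):
        cdc25 = basal + feedback * active          # CDK1 activates Cdc25
        d_active = cdc25 * (1.0 - active) - wee1_rate * active
        active += dt * d_active                    # forward Euler update
    return active

without_loop = simulate_cdk1(feedback=0.0)
with_loop = simulate_cdk1(feedback=2.0)
print(f"no feedback:   {without_loop:.3f}")
print(f"with feedback: {with_loop:.3f}")
```

With these hypothetical numbers the active fraction settles near 0.09 without feedback and near 0.76 with it. The point is only qualitative: the loop strongly amplifies CDK1 activation.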

In higher eukaryotes and in Sc. pombe, glycine-loop 
phosphorylation is an essential element of both the 
DNA damage and DNA replication G2/M check- 
points. These checkpoints delay CDK activation to 
provide an opportunity for DNA repair before the 
cell enters mitosis. Levels of the protein kinase Chk1
become elevated in response to DNA damage induced
by UV radiation, leading to Cdc25C phosphorylation.
A 14-3-3 protein binds to the phosphorylated site on
Cdc25C, causing it to translocate to the cytoplasm,
where it is unable to activate nuclear CDK1/cyclin
B. Chk1 also activates wee1 kinase, thus ensuring a
high level of CDK1 Tyr15 phosphorylation. Follow- 
ing cyclin degradation, the phosphorylated activation 
segment of the monomeric CDK becomes acces- 
sible to phosphatases and CDK inactivation is com- 
pleted by its dephosphorylation. The identity of the 
phosphatase responsible is still unresolved. Possible 
candidates include protein phosphatase 2C and 
kinase-associated phosphatase (KAP) (Figure 3). 


Protein degradation 

The degradation of many key activators and inhibitors 
of cell division by the ubiquitin-dependent proteoly- 
tic pathway is an important mechanism controlling 
cell cycle progression and CDK activity. CDKs are 
not subject to ubiquitin-mediated proteolysis, but 
several of their key regulators are, notably members 
of the cyclin and CKI families. Proteins are targeted to 
the 26S proteasome by the attachment of a multi- 
ubiquitin tag. This is carried out by a cascade of 
three enzymes (E1, E2, and E3). The SCF complex and the
APC/cyclosome are two multiprotein ubiquitin 
ligases (E3s) that have essential roles in cell cycle 
regulation, amongst which is regulation of CDK 
activity. SCF complexes are active during the early 
part of the cell cycle, particularly during G1 and S 
phases. The Sa. cerevisiae SCF complexes SCFCdc4
and SCFGrr1 mediate ubiquitination of CKIs and G1
cyclins, respectively. The APC/cyclosome is required
for passage through, and exit from, mitosis. The activ- 
ities of both complexes are regulated by phosphoryla- 
tion. Specific phosphorylation of SCF target proteins 
provides the signal for their degradation, whereas 
the APC is activated by cell-cycle-dependent phos- 
phorylation. 


CDK Dysfunction and Role in Disease 


Two important pathways that involve CDKs neg- 
atively regulate mitotic cell cycle progression and are 
mediated by the activities of the proteins pRb and p53. 
Functional inactivation of these pathways is a frequent 
and possibly universal event in human carcinogenesis. 
In addition to their roles in cell cycle control, both 
pRb and p53 are involved in directing cells to differ- 
entiation or apoptosis. 


Aberrant CDK function can lead to inappropriate 
control of pRb. Levels of active CDK4/cyclin D in cancer
cells can increase as a result of increased cyclin D
expression, mutations in CDK4 and/or mutations in
the CKI p16INK4a, giving rise to elevated pRb phos-
phorylation. This in turn releases E2F, which initiates
the transcription of genes required for S-phase. p53 is
a short-lived transcription factor which is stabilized in
response to DNA damage or E2F-driven expression
of the protein p19ARF. As well as being able to pro-
mote apoptosis, p53 can direct transcription of
p21Cip1, leading to CDK inhibition and pRb-depend-
ent G1 arrest. When CDK activity is compromised,
p53 is unable to perform this function. p53 can also
arrest the cell at G2/M by elevating levels of p21Cip1
and 14-3-3σ (a 14-3-3 family member that does not
bind to Cdc25C), both of which bind to and inhibit
CDK1/cyclin B.

The strong genetic link between aberrant CDK 
control and the molecular pathology of cancer has 
provided the rationale for developing small molecule 
CDK inhibitors as anticancer agents. The inherent 
complexity of CDK regulation offers a number of 
possible routes to their inhibition. Peptidomimetics 
of CKIs offer one such route. Directly interfering 
with CDK catalytic activity by binding ATP-com- 
petitive ligands is another attractive and successful 
strategy. Despite the high degree of sequence con- 
servation among protein kinases, small molecule in- 
hibitors selective for different CDK family members 
have been identified. Flavopiridol and UCN-01 (7-
hydroxystaurosporine) were the first to enter clinical
trials, although their efficacy does not result solely 
from CDK inhibition. Second generation inhibitors 
are exciting much interest, their identification being 
the result of a combination of combinatorial chemis- 
try and a detailed knowledge of inhibitor-binding 
mode. 


See also: Apoptosis; Cancer Susceptibility;
Cell Cycle 

Cysteine

Cysteine (Figure 1) is one of the 20 amino acids
commonly found in proteins. Its abbreviation is Cys 
and its single-letter designation is C. As one of the 
nonessential amino acids in humans, it is synthe- 
sized by the body and so need not be provided in the 
individual’s diet. Cys residues positioned at specific 
sites in the polypeptide chain can play a role in the 
higher order structure of the molecule through the 
formation of disulfide bridges.


Figure 1 Cysteine.


See also: Amino Acids 

Cystic Fibrosis

Cystic fibrosis (CF), also known as mucoviscidosis in
some parts of Europe, is a genetic disorder that affects 
a number of different organs. Patients with CF gener- 
ally suffer from obstructive lung disease with chronic 
bacterial infection, pancreatic enzyme insufficiency, 
and high salt content in their sweat. A particular species
of Pseudomonas bacteria is commonly found in the 
airways of CF patients. Male patients with CF also 
suffer from infertility, due to the absence or obstruc- 
tion of the vas deferens. Treatment of the lung disease 
includes aggressive antibiotics and physiotherapy. For 
patients with pancreatic insufficiency, enzyme treat- 
ment is given to ensure nutrient uptake. Salty sweat 
secretion does not generally lead to any illness but the 
sweat test is a diagnostic standard for CF. More
recently, nasal potential difference has also been used 
in resolving cases with borderline sweat values. 

Cystic fibrosis shows a typical autosomal reces-
sive pattern of inheritance. Patients inherit two CF gene
mutations, one from each of their parents, who them-
selves are free of any CF symptoms. Carrier parents
have a 1 in 4 chance of having a CF child at each 
pregnancy. 
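The 1 in 4 figure follows from Mendel's rules for a single autosomal locus. The Python sketch below enumerates the four equally likely allele combinations from two carrier parents. The allele labels 'N' (normal CFTR) and 'c' (CF mutation) and the function name are illustrative choices, not standard notation.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

def offspring_genotypes(parent1: str, parent2: str) -> dict[str, Fraction]:
    """Mendelian offspring genotype probabilities at one autosomal locus.

    Each parent is a two-character genotype; every allele of one parent is
    paired with every allele of the other, all four pairings equally likely.
    """
    # sorted() makes 'cN' and 'Nc' the same genotype
    counts = Counter("".join(sorted(a + b)) for a, b in product(parent1, parent2))
    total = sum(counts.values())
    return {genotype: Fraction(n, total) for genotype, n in counts.items()}

probs = offspring_genotypes("Nc", "Nc")  # two unaffected carrier parents
print(probs)
```

For two carriers the function gives 1/4 for 'cc' (affected), 1/2 for 'Nc' (carrier) and 1/4 for 'NN'. Two out of three unaffected children of carrier parents are therefore themselves carriers.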

CF is most prevalent in the Caucasian populations 
of Northern European ancestry, at a frequency of about
1 in 2500 live births (and a carrier frequency of 1 in 25), 
but is relatively infrequent among people of Asian or 
African descent. If untreated, affected children usually 
die at an early age because of severe lung infection 
and malnutrition but, as a result of advances in clinical 
management, the lifespan of patients has increased 
markedly and many of them now live to adulthood. 
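The two frequencies quoted above are linked by the Hardy-Weinberg relation. This sketch assumes random mating and equilibrium; those are assumptions of the illustration, not claims of the text. On that basis, an incidence of q² = 1/2500 gives a mutant allele frequency q = 1/50 and a carrier frequency 2pq of roughly 1 in 25. The function name is hypothetical.

```python
import math

def carrier_frequency(incidence: float) -> float:
    """Heterozygote (carrier) frequency 2pq for an autosomal recessive
    disorder, assuming Hardy-Weinberg equilibrium so that incidence = q**2."""
    q = math.sqrt(incidence)   # frequency of the mutant allele
    p = 1.0 - q                # frequency of the normal allele
    return 2.0 * p * q

cf_incidence = 1 / 2500
carriers = carrier_frequency(cf_incidence)
print(f"q = {math.sqrt(cf_incidence):.3f}, carriers = 1 in {1 / carriers:.1f}")
```

This prints a carrier frequency of about 1 in 25.5, consistent with the rounded 1 in 25 given above.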

The basic defect in CF resides primarily in the 
secretory epithelia; the transport of water, electro- 
lytes, and other solutes across the cellular membranes 
is defective, due to the absence or deficiency of a 
chloride ion channel. The underlying mechanisms 
have only been uncovered in detail since the isolation 
of the cystic fibrosis gene in 1989. 


CF Gene 


Based on the familial inheritance of the disorder, it was 
possible to use genetic methodologies to locate the 
defective gene to a specific region on the long arm of
human chromosome 7. Molecular techniques were 
then used to isolate and characterize the gene, which 
was found to encode a protein molecule of 1480 amino 
acids. This molecule appears to span the cellular mem- 
brane of an epithelial cell and function as a channel for 
chloride ion conductance, although it may also regu- 
late the functions of other ion channels and transport 
activities. The protein is named the cystic fibrosis 
transmembrane conductance regulator (CFTR). 

CFTR dysfunction is thought to cause an imbalance 
of salt and fluid secretion. In the sweat gland, the 
inability to reabsorb the chloride and sodium ions 
from the secreted sweat fluid results in an elevated 
salt content of the patient’s sweat. The elevated salt 
concentration in the upper airways is also thought to 
inactivate the antibacterial function of normal airway 
fluids that serves to fight off infections. The excessive 
quantity of mucus found in the lung of CF patients is 
partly due to the dehydration of the normal lubrica- 
tion mechanism (mucin) of the lung and the DNA 
released from the patient’s dead immune cells as a 
result of chronic bacterial infection. However, the 
exact mechanism of Pseudomonas (or Burkholderia)
colonization is unknown. 


CF Gene Mutations 


A single mutation named ΔF508 accounts for 70% of
the mutant CFTR genes in the world; it corresponds 
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to the deletion of phenylalanine at position 508 of 
the CFTR protein. There are, however, over 950 
other CF mutations reported at the time of writing 
this article, although most of them are rare. There is 
significant variation in the spectrum of the relatively 
more frequent mutations among different popula- 
tions. For example, the prevalence of ΔF508 follows
a north-west to south-east gradient (80-40%) in con- 
tinental Europe. There are also over 150 different 
types of alterations in the gene that do not appear to 
cause any CF disease. 

The molecular defect of CFTR mutations may 
be classified into different categories, ranging from 
defects in biosynthesis to defects in the regulation of 
channel activities. Class I mutations are defective in 
the early steps of biosynthesis such that essentially no 
CFTR is made. Class II includes ΔF508, for which the
mutant protein is made in its entirety but fails to fold 
properly inside the cell, resulting in the absence of 
mature CFTR in the cellular membrane to perform 
its specific chloride ion conductance function. Class 
III mutant proteins can reach the cell membrane but 
the resulting channels fail to open on receiving normal 
physiological signals. Class IV mutant proteins reach 
the cell membrane and respond to signals, but the 
mutant channels are less effective in chloride ion con- 
ductance. Class V mutations cause a reduction in bio- 
synthesis of otherwise functional CFTR. Additional 
classes of CF mutations may be assigned but they are 
less common. 


Genotype—Phenotype Correlation 


The genotype of a CF patient refers to the description 
of the CFTR mutation(s) at the DNA level. About half 
of the CF patients worldwide carry two ΔF508 muta-
tions (one from each parent) and 40% have one ΔF508
and another mutation affecting a different part of the 
CFTR gene. The remaining patients have other CFTR 
gene mutation combinations. The ability to associate 
disease severity with the CFTR genotype can improve 
patient management and treatment. Indeed, a strong 
association between CF pancreatic enzyme status and 
genotype can be established; patients with sufficient 
(residual) pancreatic function are found to have one or 
two mutations of class IV or V, which appear to confer 
residual CFTR activity. Therefore, CF patients who 
have two copies of ΔF508 (homozygotes) are expected
to be pancreatic insufficient and require dietary sup- 
plements of pancreatic enzymes. Proper nutrition is 
important for management of CF patients. There is a 
general correlation between pancreatic sufficiency and 
overall mild disease. 
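The genotype proportions quoted at the start of this section are roughly what one would expect under a simple model. In that model, the two mutant alleles carried by a patient are drawn independently from the pool of CF alleles, of which ΔF508 makes up 70%. The Python sketch below makes that independence assumption explicit; it is an assumption of this illustration, not a statement from the text. 'dF508' is an ASCII stand-in for ΔF508, 'other' lumps together all remaining mutations, and the function name is hypothetical.

```python
from itertools import product

def patient_genotype_shares(allele_freqs: dict[str, float]) -> dict[tuple[str, ...], float]:
    """Expected genotype shares among patients, assuming each patient's two
    mutant CFTR alleles are drawn independently from the pool of CF alleles."""
    shares: dict[tuple[str, ...], float] = {}
    for first, second in product(allele_freqs, repeat=2):
        genotype = tuple(sorted((first, second)))  # unordered pair of alleles
        shares[genotype] = shares.get(genotype, 0.0) + allele_freqs[first] * allele_freqs[second]
    return shares

# 'dF508' stands for ΔF508; 'other' lumps all remaining CF mutations.
shares = patient_genotype_shares({"dF508": 0.70, "other": 0.30})
for genotype, share in sorted(shares.items()):
    print(genotype, round(share, 2))
```

The model gives 0.49 for two ΔF508 alleles, 0.42 for one ΔF508 plus another mutation, and 0.09 for two other mutations. These figures are close to the "about half" and "40%" quoted above.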

Unfortunately, there is no direct correlation be- 
tween CFTR genotype and the other CF symptoms. 


Not only do patients with the same CFTR mutations 
have rather different disease presentations (except for 
pancreatic function status), but there is clinical hetero- 
geneity between patients within the same family. The 
prognostic value of CFTR genotype is limited. 

The poor correlation between CFTR genotype and 
CF phenotype is primarily due to the effects of other 
genetic and environmental factors. For example, the 
presence or absence of meconium ileus (intestinal 
obstruction at birth) appears to be due to the effect 
of several modifier genes, one of which is located 
on chromosome 19. Studies with mice also suggest 
the presence of modifier genes for lung disease and 
ion channel regulation in general. CF genotype- 
phenotype correlation will remain incomplete until 
the full biochemical pathway for CFTR and the physi- 
ology of the whole human body are understood. 


Genetic Diagnosis and DNA Testing 


Genetic diagnosis for CF has been in practice since the 
discovery of the closely linked DNA markers in 1985. 
In families with affected children, one can identify the 
DNA marker alleles that are associated with the 
mutant genes by analyzing the patients and both of 
their parents, and then use the information to predict
the status of the unknown relatives (confirming dis- 
ease status, carrier detection, and prenatal diagnosis). 
With the ability to define CF mutations at the DNA 
sequence level, it is now possible to perform genetic 
testing to identify the disease status of any
individual.

The large number of CFTR mutations and popu- 
lation variations make DNA testing in CF difficult. 
Most genetic testing laboratories are only equipped to 
test a subset of more prevalent CFTR mutations in the 
population. While the efficiency may reach over 95% 
for some relatively homogeneous populations, the 
general detection rate is about 85% for most Cauca- 
sian populations. Despite extensive CFTR mutation 
research, the coverage for certain populations (such as 
Latin Americans) remains low (50-60%). There are 
also ethical and legal issues associated with genetic 
diagnosis and DNA testing in CF. Different guidelines 
have been established for different countries and com- 
munities. 
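Incomplete detection also matters when counseling a person whose screen comes back negative. One common way to express this is the residual carrier risk obtained from Bayes' theorem. The sketch below assumes a prior carrier frequency of 1 in 25 (the figure quoted earlier for populations of Northern European ancestry), a panel detecting 85% of carrier alleles, and no false-positive results. The function name is hypothetical.

```python
def residual_carrier_risk(prior: float, detection_rate: float) -> float:
    """Bayesian probability that a person who tests negative on a mutation
    panel is nevertheless a carrier (assumes no false positives)."""
    missed = prior * (1.0 - detection_rate)  # carrier whose mutation is not on the panel
    true_negative = 1.0 - prior              # non-carrier, who always tests negative
    return missed / (missed + true_negative)

risk = residual_carrier_risk(prior=1 / 25, detection_rate=0.85)
print(f"residual risk after a negative test: 1 in {1 / risk:.0f}")
```

Under these assumptions a negative result lowers the risk from 1 in 25 to about 1 in 161.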


Atypical Diseases 


Mutations in the CFTR gene have also been found in 
congenital bilateral absence of vas deferens (CBAVD), 
obstructive azoospermia, idiopathic pancreatitis, chro- 
nic obstructive pulmonary disease, diffuse bronchi- 
ectasis, allergic bronchopulmonary aspergillosis, chronic 
Pseudomonas bronchitis, neonatal hypertrypsinemia,
and asthma, all of which constitute a subset of the CF
phenotypes. Although not all of the patients with the 
above CF-related diseases have CFTR mutations, the 
frequencies found are much higher than those 
expected for CF carriers in the population. For ex- 
ample, 80% of patients with CBAVD have one or two 
typical mutations found in CF and 10% of patients 
with asthma have a CFTR mutation. 

A number of atypical mutations are also found in 
the CF-related diseases. Many of them represent 
DNA sequence alterations that are relatively frequent 
in the population and are found in normal control 
individuals. These atypical mutations are presumably 
more susceptible to modifier gene effects, resulting in 
varied outcome in different patients. Therefore, these 
mutations present major challenges in genetic coun- 
seling as well as in the ethical and legal guidelines. 


Prospects for Treatment 


While most of the current treatments for patients are 
based on treating the symptoms of the disease, the 
discovery of the defective gene in CF is the first step 
toward the development of a more effective means of 
treatment. Specific pharmacological reagents or gene 
therapy are both realistic possibilities. In addition, 
because of the observed correlation between genotype
and phenotype, CF health professionals may be able
to tailor treatment plans to each patient's likely
future needs.


High Frequency of CF in Caucasians 


Various hypotheses have been proposed to explain the 
relatively high frequency of CF in the Caucasian 
population. Heterozygote (carrier of CF mutation) 
advantage appears to be supported by current scientif- 
ic evidence. The selective advantage may be due to 
increased resistance to diseases, such as cholera or 
tuberculosis, in CF heterozygotes who may have re- 
duced fluid secretion or reduced bacterial propagation 
on infection. Studies in mice and analyses of the 
genetic background (DNA marker haplotypes) of sev- 
eral frequent CFTR mutations have provided data that 
are consistent with the selective advantage hypothesis, 
although a complete explanation for this hypothesis is 
difficult to find. 




See also: Gene Therapy, Human; Genetic 
Counseling; Genetic Diseases 

Cytogenetics

Cytogenetics is the study of genetic phenomena
through the cytological analysis of chromosomes 
under the light or electron microscope. It has develop- 
ed over the years from the crude analysis of mitotic 
cells using simple stains, to an analysis of extended 
DNA fibers using digital fluorescence microscopy 
and image analysis where the resolution may be on 
the order of 1 kilobase. Cytogenetic techniques are 
central to the assignment and localization of genes to 
chromosomes and thus to the construction of genetic 
maps. They have played an important role in the veri- 
fication of gene order in such maps and have contri- 
buted to the effort to sequence the human genome. 

Clinical cytogenetics is concerned with the diagno-
sis and management of constitutional chromosomal
aberrations and, increasingly, with the diagnosis of leu-
kemia and other malignancies. Cancer cytogenetics is
concerned with the classification of tumors and the 
identification of oncogenes and tumor suppressor 
genes. Comparative cytogenetics studies chromosome 
homology between species and contributes to the 
determination of phylogenetic relationships. 


See also: Chromosome Aberrations; Oncogenes; 
Physical Mapping; Tumor Suppressor Genes 

Cytokinesis

Cytokinesis is the process by which the cytoplasm of a
cell is divided after nuclear division (mitosis) is com- 
plete. 


See also: Cytoplasm; Mitosis 


Cytoplasm 
The cytoplasm is the protoplasm outside the
nucleus, between the nuclear membrane and the cell
membrane. It contains organelles and various mem-
branes.


See also: Organelles 


Cytoplasmic Genes 
Cytoplasmic genes are genes normally existing outside
the nucleus, e.g., within mitochondria or chloroplasts. 


See also: Chloroplasts, Genetics of; Mitochondria,
Genetics of 


Cytoplasmic Inheritance 
Cytoplasmic inheritance refers to a property of extra-
nuclear genes, e.g., those located in mitochondria or 
chloroplasts. 


See also: Chloroplasts, Genetics of; Mitochondrial 
Inheritance; Mitochondria, Genetics of 

Cytosine

Cytosine is a pyrimidine, and one of the nitrogenous
bases found in ribonucleic acid (RNA) and deoxy- 
ribonucleic acid (DNA). In nucleic acids, cytosine 
can form three hydrogen bonds to base pair with 
guanine. However, the bases are present in nucleic 
acids as nucleotides. When combined with the sugar 
ribose in a glycosidic linkage, cytosine forms a nucleo-
side called cytidine. Cytidine can be phosphorylated
at the 5’ position of the sugar with one to three
phosphoric acid groups, yielding three nucleotides: 
cytidine 5’ monophosphate (CMP), cytidine 5’ diphos- 
phate (CDP), and cytidine 5’ triphosphate (CTP). 
Free cytosine is not synthesized directly in cells; 
instead CTP is typically synthesized by amination of 
uridine 5’-triphosphate (UTP). CTP is a substrate of
RNA polymerase and is the source of the cytosine 
found in RNA. Analogous nucleosides and nucleo- 
tides are formed from cytosine and deoxyribose, and 
dCTP, a substrate of DNA polymerase, is the source 
of the cytosine in DNA. The deoxyribonucleotides 
are formed by the reduction of ribonucleoside diphos- 
phates. In addition to its role in nucleic acid synthesis, 
CTP is also involved in both carbohydrate and lipid
metabolism.
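A G-C pair is held by three hydrogen bonds, as noted above, and an A-T pair by two; the A-T figure is standard base-pairing knowledge rather than a statement of this entry. On that basis, the number of hydrogen bonds in a fully paired DNA duplex can be counted from the sequence of one strand. The following is a minimal Python sketch; the table and function names are illustrative only.

```python
# Hydrogen bonds contributed by each Watson-Crick pair, keyed by the base on
# the given strand: G-C and C-G pairs have three, A-T and T-A pairs two.
HBONDS_PER_PAIR = {"G": 3, "C": 3, "A": 2, "T": 2}

def duplex_hydrogen_bonds(strand: str) -> int:
    """Total hydrogen bonds in a perfectly paired DNA duplex, given one strand.

    Raises KeyError for any symbol other than A, C, G, or T.
    """
    return sum(HBONDS_PER_PAIR[base] for base in strand.upper())

print(duplex_hydrogen_bonds("GCAT"))  # prints 10
```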


See also: Bases; Nucleic Acid; Nucleotides and 
Nucleosides; Pyrimidine 

Cytoskeleton

The cytoskeleton comprises the internal compon-
ents of animal cells that confer structural strength
and motility; these components are predominantly 
microfilaments (of actin), microtubules (of tubulin), 
and intermediate filaments. 


See also: Cell Cycle 


Cytosol 
The cytosol is the general volume of cytoplasm that
remains when organelles and internal membrane sys- 
tems are removed. 


See also: Cytoplasm 

d’Herelle, Félix Hubert

Félix Hubert d’Herelle (1873-1949), a French-
Canadian microbiologist, is best known as the 
co-discoverer of bacteriophages and a strong early pro-
ponent of their use as antibacterial agents. He was born
23 April 1873 in or near Montreal; after receiving a 
baccalaureate from a lycée in Paris he was essentially 
self-taught and led a peripatetic existence for most of 
his professional life. His first position was as a micro- 
biologist for the government of Guatemala from 1901 
to 1907. From there he went to Yucatan (Mexico) to 
work for the Ministry of Public Works, and then to 
Paris in 1911, where he joined the Pasteur Institute as a 
research scientist. In 1922 he left the Pasteur Institute 
to spend two years in the Institute of Tropical Medi- 
cine in Leiden. He was then appointed chief of the 
bacteriology service of the League of Nations quar- 
antine service in Alexandria (Egypt), a post he held 
until 1928 when he went to Yale as Professor of Proto- 
biology. He left Yale in 1933, spent two years at the 
Institute for Microbiology in Tbilisi (Georgia), and 
then returned to Paris in retirement in 1937. He died 
in 1949. Among many honors, d’Herelle received the 
Leeuwenhoek Medal of the Royal Dutch Academy of 
Medicine in 1925, and the MD honoris causa from 
Leiden. 

D’Herelle’s initial work was on various practical 
problems of fermentation: in Guatemala he studied 
the fermentation of the excess banana crop to produce 
commercially viable, potable liquor, which he termed 
“my banana whiskey.” In the Yucatan, he devised a 
commercially successful fermentation process based 
on the residue from the sisal crop (bagasse) to make 
industrial alcohol. In both these endeavors, he was 
especially concerned to find new and useful yeast 
strains that were specific to the particular substrates 
to be fermented. 

While in the Yucatan, he noted that the periodic 
locust plagues were sometimes accompanied by a dis-
ease that infected the locusts; he took up the study of
these epizootics of locusts, eventually isolating an
organism which was pathogenic for locusts. This 
organism, called at the time Coccobacillus acridiorum 
d’Herelle, was employed in antilocust campaigns 
both in South America and in North Africa with vari- 
able success. While he has been recognized as the 
founder of modern biological pest control, his original 
organism has been supplanted by Bacillus thuringien- 
sis which is more reliable because, as a spore-former, it 
can be prepared in a stable form for field use. 

In the course of his work on this intestinal disease 
of locusts, d’Herelle noted occasional cultures that 
failed to grow or were not pathogenic for the locusts. 
He surmised that there was some other organism 
associated with the Coccobacillus that was altering its 
pathogenicity. Later in Paris he was investigating an 
outbreak of dysentery among French soldiers in 
World War I, and he again observed this phenomenon 
of variation in growth of cultures. In this instance he 
found that a bacteria-free filtrate of the dysentery 
samples could cause the complete lysis of fresh bacter- 
ial cultures, and because this lysis could be serially 
transmitted indefinitely, he hypothesized that there 
were invisible microbes in these filtrates that were 
growing on the bacteria at their expense. These invis- 
ible microbes caused clear spots of lysis on bacterial 
films spread on agar surfaces. These clear spots he 
called ‘plaques’ and interpreted them as colonies of 
the invisible microbes which were growing from the 
initial infection of single particles. He called these in- 
visible microbes ‘bacteriophage.’ His assay by plaque 
counting is the standard method still in use today. 
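The plaque-count assay converts plaques on a plate into a titer of the original lysate. The sketch below uses the conventional calculation: plaque count divided by the dilution and by the volume plated. It relies on the one-plaque-per-particle interpretation described above. The plate values (150 plaques, a 10^-6 dilution, 0.1 mL plated) and the function name are hypothetical.

```python
def phage_titer(plaques: int, dilution: float, volume_ml: float) -> float:
    """Plaque-forming units (PFU) per mL of the undiluted lysate.

    Assumes each plaque arose from a single infectious particle, so the titer is
    the plaque count divided by the amount of original lysate actually plated.
    """
    if plaques < 0 or dilution <= 0 or volume_ml <= 0:
        raise ValueError("plaques must be >= 0; dilution and volume must be > 0")
    return plaques / (dilution * volume_ml)

# Hypothetical plate: 150 plaques from 0.1 mL of a 10^-6 dilution of the lysate.
titer = phage_titer(plaques=150, dilution=1e-6, volume_ml=0.1)
print(f"{titer:.2e} PFU/mL")  # prints 1.50e+09 PFU/mL
```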

When he investigated the origin and distribution of 
these bacteriophage, d’Herelle noted that they were 
most often detected in patients who were recovering 
from infectious diseases. From this observation he 
proposed that bacteriophage were responsible for the 
usual course of recovery from infections and that they 
were responsible for a type of exogenous immunity. 
He was quick to exploit this idea and employed 
phages in clinical trials as therapy for various infec- 
tious diseases. In the era before antibiotics there was 
much excitement about the possibility of phage ther- 
apy and several major pharmaceutical companies 
offered bacteriophage preparations for use in human 
beings. The development of bacterial strains that were
resistant to bacteriophage infection soon became
apparent, and d’Herelle and others investigated this 
phenomenon. Often the resistant strains had antigenic 
determinants, growth characteristics, and virulence 
properties quite different from the initial organism. 
D’Herelle took the position that bacteriophage altered 
the genetic potential of the bacteria, somewhat like a 
mutagen which was also a selective agent. He advo- 
cated a neo-Lamarckian view which was popular 
among French biologists at the time. 

D’Herelle’s interpretation of bacteriophage as 
microbes was not shared by most of the influential 
scientists of the period. Most biologists, led by Jules 
Bordet from Brussels, believed that the “bacterio- 
phage phenomenon” was due to an inducible lytic 
enzyme that was present in a latent or inactive form 
in the bacteria prior to treatment with the active phage 
lysate. Their position was no doubt influenced by the 
recent discoveries of autocatalytic enzyme activation 
such as the conversion of pepsinogen to pepsin. 
D’Herelle was engaged in a long controversy about 
the “nature” of bacteriophage, but it was only in about 
1940 that he was vindicated by the visualization of 
phage as discrete particles with definite morphology 
when the electron microscope was perfected. 


See also: Bacteriophage Therapy; Bacteriophages 

Cyril Dean Darlington (1903-1981) made major
discoveries about the chromosome theory of heredity,
unifying biology through the fundamental principles 
of evolution, cytology, genetics, and biochemistry, 
and was an influential figure in the biology of
the twentieth century. Building on diverse work from 
animals and plants, he was the first to describe clearly 
the nature of the partitioning and segregation of chro- 
mosomes at mitosis, and the recombination events be- 
tween chromatids that occur at meiosis. He advanced 
the dictum that looking at chromosomes was another 
way of looking at genes, a view that profoundly influ- 
enced fundamental biological thought. His work was 
marked by the ability to synthesize technical excel- 
lence and intellectual insight into the behavior of his 
experimental materials and the meaning of the struc- 
tures he visualized by microscopy. 

Brought up in London, he graduated with a BSc 
in agriculture from the college that became Wye
College, University of London (1923), although he
later attributed his success to a lack of academic
training. Having been inspired by the book The Physical
Basis of Heredity by Morgan, Sturtevant and Bridges 
(Morgan et al., 1919), Darlington moved to the John 
Innes Horticultural Institution as a volunteer to work 
under William Bateson. At the John Innes Institution,
he abandoned his plan to become a farmer in Australia
and progressed to a position as a member of staff,
eventually becoming Director in 1939, before moving
to the Sherardian Chair of Botany in Oxford, a post he
held from 1953 until his retirement to an Emeritus
Professorship in 1971.

From the start, the work of Darlington was 
strongly theory-, hypothesis-, and model-driven, inte- 
grating a wide variety of facts and observations to 
make a unified science of cytogenetics. Working with 
Len LaCour, Kenneth Mather, W. C. F. Newton, and 
others at the John Innes Institution, he was respon- 
sible for many technological advances in cytology (e.g., 
Darlington, 1939), developing the methods of chromo- 
some spreading for investigations of meiosis and 
mitosis. These entirely replaced earlier laborious 
embedding and sectioning methods which were 
much more difficult to interpret. In the late 1920s, 
there was considerable controversy about the nature 
of events of meiosis, but he built on his own observa- 
tions in polyploids to show that all chiasmata result 
from crossing-over between chromatids of partner 
chromosomes. He also concluded that the chromo- 
some consists of a single strand of duplex DNA, rea- 
soning that the double structures visible at mitosis 
arose from replication, while those at meiosis arose 
by pairing of parental chromosomes. This clear, 
although originally controversial, model was
supported by genetical experiments as well as by
observation, and fitted well with his view that “Hypothesis
based on comparative inference has often proved
more reliable than the ‘facts’ of direct observation.”
His structural observations of chromosomes at mitosis
and meiosis, placed in the broader context of the cell
cycle, allowed him to discover the now-accepted role
of the ‘centromere’, adopted from Waldeyer, in
chromosome segregation.

His first notable book, Recent Advances in Cyto- 
logy (Darlington, 1932), was a remarkable synthesis of 
large amounts of data about chromosomes in mitosis 
and interphase, from plants and animals, organizing 
disparate observational data about the nucleus. For
the first time, he presented the concept of the central
importance of genetic control of breeding systems
and genetic mechanisms. Parts of this book were
expanded into The Evolution of Genetic Systems
(Darlington, 1939 and later editions). The theories
propounded in this volume are central to the integration
of cytology and genetics into population and
evolutionary biology.

Despite his sometimes abrasive personality, Dar- 
lington was a shrewd leader, teacher, and mentor. Dur- 
ing the 14 years of his directorship of the John Innes, 
there was an average of about 17 staff, producing
more than 40 papers a year, and of the colleagues and 
students, 11 became Fellows of the Royal Society, 
with many becoming professors or directors. He 
held strong views on publication, regarding work not 
published as work not done. Darlington made import- 
ant contributions to debates on science and politics, 
where his strongly held views and a number of mis- 
understandings, particularly about the importance of 
the genetic component to behavioral characteristics, 
led to many disputes. While being generous to his 
friends and collaborators, he would go out of his 
way to enrage his enemies: it has been stated that, as 
much as friends, his enemies were a source of great 
inspiration and happiness to him! He bore grudges 
with pleasure and did not forget or forgive, having 
an overriding mistrust of authority and the nature of 
committees. He held strong and controversial views on 
the teaching of biology, including the need to build 
on the unifying characteristics of the study of genetics, 
which now underpin most university biology courses 
outside medicine. He was concerned with the suppres- 
sion of science for political and ideological ends, and 
the conflict of science and society. Many of his com- 
ments are relevant today, for example, in pointing out 
the lack of scientific basis to policy formulation, par- 
ticularly in agriculture; fortunately, other points, such 
as the lack of application of genetics to livestock im- 
provement and forestry, have been corrected. 


Further Reading 

Anonymous (1981) Obituary: Professor CD Darlington. The 
Times, 27 March 1981. 

As one of the most celebrated natural historians of
Victorian England, Charles Darwin (1809-82) made 
major contributions in geology, paleontology, zoo- 
logy, botany, and psychology. His theory of evolution 
by means of natural selection is one of the great, 
unifying concepts in biology. As the preeminent 
twentieth-century geneticist and evolutionary bio- 
logist Theodosius Dobzhansky (1900-75) said, “Noth- 
ing in biology makes sense except in the light of 
evolution.” (Cited in Mayr, 1991, p. 105.) The 
implications of this work extend far beyond the natural 
sciences. Darwin has profoundly altered our percep- 
tion of nature and our relationship to it. 


The Life of Charles Darwin 


Charles Robert Darwin was born to Robert and 
Susanna (Wedgwood) Darwin in Shrewsbury, England,
on 12 February 1809 (on the same day as Abraham 
Lincoln). He was their second son and the fifth of their 
six children. His grandfather was the noted physician 
and poet Erasmus Darwin (1731-1802). Darwin was 
born into a wealthy family, one with a prominent 
scientific heritage and connections to the prestigious 
Wedgwood pottery firm. 

By all accounts, Darwin was an unremarkable stu- 
dent who showed no aptitude for classical studies or 
mathematics while attending a local private boarding 
school. He rejected medicine as a career (after observ- 
ing surgical procedures performed without benefit of 
anesthesia at Edinburgh) and spent little time on his 
academic work while studying to be a clergyman at 
Christ’s College, Cambridge. Instead, he went riding 
and shooting, walked in the countryside and collected 
beetles. During his last year at Cambridge, Darwin 
met John Stevens Henslow, an innovative teacher of 
botany and mineralogy who became his mentor. It was 
Henslow who recommended Darwin to Captain 
Robert Fitzroy of HMS Beagle. 

In 1831, the surveying ship HMS Beagle was sched- 
uled to return to South America where it would 
complete charting the coastal waters. Because the voy- 
age would be long and strenuous, and the captain’s 
position was socially isolated, Captain Fitzroy wished 
to include a gentleman companion, someone of his 
own social status with whom he could interact. This 
person could also collect specimens and make natural 
history notes, although not as a professional naturalist.
Darwin’s father allowed him to go and agreed to 
pay his expenses. At the age of 22, Charles Darwin 
embarked on a voyage that would change his life. The 
Beagle left Plymouth in December 1831. While at sea, 
Darwin was often sea-sick. When the ship was making 
detailed coastal surveys, he spent time on land explor- 
ing, studying local geology, and collecting samples of 
the native flora and fauna. He shipped biological, 
fossil, and geological specimens back to England 
whenever he could. Once the survey of South America 
was complete, the expedition headed west; they 
stopped at the Galapagos Islands before heading 
across the Pacific. As a result of these many and varied 
experiences, Darwin gradually realized that he did not 
want to enter the clergy. Instead he wished to become 
an independent scholar, as his older brother had. 

Upon his return to England in October 1836, 
Darwin’s life was a flurry of activity that centered on 
organizing and examining his collections. He also 
published his journals, which brought him both praise
and fame. He married his cousin Emma Wedgwood 
in January 1839. For the first few years they lived in 
London. As their family grew, they purchased a home 
and moved to Down(e), Kent in 1842, where they 
resided for the rest of their lives. Between 1838 and 
1854, Darwin edited a five-part series on his zoologic- 
al collections from the Beagle and wrote several 
books, including the journal of his researches on that 
voyage, three books on geology (the formation of 
coral reefs, volcanic islands, and South America), 
and two monographs on barnacles. By 1842, Darwin 
had developed the major outline of his ideas about 
evolution and natural selection. He expanded this into 
a 230-page manuscript in 1844 and left specific instruc- 
tions about its publication in the event of his death. 
He did not return to it until 1856, after completing 
an exhaustive study of living and fossil barnacles. 

In 1858 a young, unknown naturalist, Alfred Russel 
Wallace (1823-1913), sent Darwin a manuscript in 
which he also proposed the idea of evolution by 
means of natural selection. Recognizing that Wallace 
had developed the same idea independently, Darwin 
chose to release both his and Wallace’s work simultan- 
eously. Darwin’s friends Sir Joseph Hooker and Sir 
Charles Lyell presented Wallace’s paper and excerpts 
from Darwin’s letters and 1844 manuscript to the 
Linnean Society of London in July 1858. The next 
year (1859) Darwin published an abstract of his 
work in the book entitled On the Origin of Species 
by Means of Natural Selection, or the Preservation of 
Favored Races in the Struggle for Life (now simply 
referred to as On the Origin of Species).

His later work, from 1859 to 1872, focused on 
evolution, beginning with On the Origin of Species
(which went through six editions), as well as books on
domesticated plants and animals, human evolution 
and sexual selection, emotional expression in humans 
and animals, several books on plants, and one on 
earthworms. 

Despite his prolific scientific work, Darwin’s health 
was not good from 1838 until the end of his life. No 
specific causes have been identified, although stress 
and overwork certainly contributed; some speculate 
that he may have acquired American trypanosomiasis 
after being bitten by the vector reduviid bug in South 
America. Others have speculated that he had ten- 
dencies towards psychosomatic illness. He died at 
Down on 19 April 1882 and is buried in Westminster 
Abbey beside another great British scientist, Sir Isaac 
Newton. 


Influences on Darwin and the 
Development of his Ideas 


Among the many influences on Darwin, three are 
especially important. 


Experiences on the Voyage of the Beagle 
The five years Darwin spent on the voyage of the 
Beagle represent a period of exploration, discovery, 
and personal growth that changed him in many ways 
and shaped his future intellectual development. Dur- 
ing this period, Darwin was strongly influenced by 
the work of the geologist Charles Lyell (1797-1875). 


The Work of Geologist Charles Lyell 

Darwin brought the first volume of Lyell’s book Prin- 
ciples of Geology with him on the Beagle and had later 
volumes sent to him during the voyage. Lyell pro- 
posed that natural processes working now are also 
responsible for past geological change (uniformitari- 
anism). This philosophical position was in opposition 
to catastrophism, which held that Earth's history was
punctuated by unpredictable, sudden events that changed its
course. As Browne (1995, p. 186) states, “Lyell’s book 
taught Darwin how to think about nature.” Darwin 
applied Lyell’s principles in his geological studies and 
found they helped him analyze his observations and 
test his conclusions. Darwin adopted Lyell’s concept 
of slow, gradual change of the earth and extended it 
to include the organisms living on it. 


An Essay on Population... by economist 
Thomas Malthus 

Upon his return to England, Darwin reviewed all of 
his notes and collections, and his ideas about the 
‘transmutation’ of species began to take shape. In 
1837 he began the first of many notebooks which
document how his ideas developed. While he
considered these questions, Darwin read Malthus’ 1798
Essay on Population. This work discussed population
growth and emphasized the conflict between limited
resources and robust reproductive potential; it
suggested to Darwin the mechanism of natural selection.


On the Origin of Species by Means of 
Natural Selection 


On the Origin of Species is an extraordinary book. The 
first printing was issued and sold out on 24 November 
1859. It offers a natural explanation for the unity and 
diversity of life, with detailed supporting evidence. 
Throughout this work, Darwin introduces a different 
vision of the living world by using what Ernst Mayr 
(1904– ) terms ‘population thinking.’ Essentialism
has been an integral part of Western tradition for 
over 2000 years; this philosophy considers differences 
among individuals as unimportant because they are 
interpreted as mere imperfections of some essential 
type. With population thinking, Darwin emphasized 
the uniqueness of individuals, the importance of vari- 
ability within populations, and the importance of 
competition among members of the same species. 


The Major Ideas 

This book includes not one, but several major ideas. 
Darwin’s contemporaries accepted some of them, but 
not others. Mayr (1991) recognizes five major ideas: 


1. Life evolves. The world is neither constant nor recent;
instead the world and organisms change over time.

2. Common descent of species. Groups of species alive
today share common ancestors that lived in the past;
this explains their similarities.

3. Multiplication of species. One species can split into
two or more independent lineages; this leads to the
diversity of life.

4. Gradualism. Change occurs slowly, in small amounts,
rather than in single bursts of large change.

5. Natural selection. Genetically variable individuals
and local environmental conditions interact. This
leads to greater reproductive success of those
individuals with certain variants and a change in
the characteristics of a population over many
generations.


The Argument for Natural Selection

The first four chapters of the book outline the argument 
for natural selection as the mechanism leading to evo- 
lution. It is based on several common observations 
and inferences. The potential for population increase,
stable population sizes, and limited resources together 
suggested that there must be competition among 
the members of a population, with only a small pro- 
portion of the offspring surviving in each generation. 
It was also well known that individuals differed from 
each other and that many of these differences were 
inherited. This led Darwin to suggest that individuals 
with specific inherited variants might have a greater 
chance of surviving, reproducing, and passing those 
variants on to the next generation. Over many genera- 
tions, the average character of the population would 
change (evolve). This is the process of natural selec- 
tion. If the environment changes, different variants 
may have an advantage and will be favored to repro- 
duce more successfully. This will lead to further 
alterations in a population whose members inherit 
the advantage. Evolution is neither accidental nor 
planned; it is opportunistic. Evolution is not about 
progress; it is about change. 
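The verbal argument above was later made quantitative by the population geneticists discussed below. As an illustration only (a standard textbook model, not Darwin's own formulation), the sketch below assumes a very large haploid population, non-overlapping generations, and two heritable variants with constant relative fitnesses 1 + s and 1; the function names are hypothetical.

```python
def next_frequency(p: float, s: float) -> float:
    """Frequency of the favored variant after one generation of selection.

    Haploid model: carriers of the favored variant have relative fitness 1 + s,
    all others have fitness 1. Dividing by the mean fitness keeps the
    variant frequencies summing to one.
    """
    mean_fitness = p * (1.0 + s) + (1.0 - p)
    return p * (1.0 + s) / mean_fitness


def selection_trajectory(p0: float, s: float, generations: int) -> list[float]:
    """Variant frequency in every generation, starting from p0."""
    freqs = [p0]
    for _ in range(generations):
        freqs.append(next_frequency(freqs[-1], s))
    return freqs


if __name__ == "__main__":
    # A rare variant (1% of the population) with a 5% reproductive advantage.
    traj = selection_trajectory(p0=0.01, s=0.05, generations=300)
    for gen in range(0, 301, 50):
        print(f"generation {gen:3d}: frequency {traj[gen]:.3f}")
```

Under these assumptions the odds p/(1 - p) are multiplied by (1 + s) every generation, so a small but consistent advantage can carry a rare variant to high frequency over a few hundred generations, while a negative s makes the variant decline. Nothing in the model is accidental or planned; the outcome follows from heritable variation and differential reproduction.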


Evidence Cited to Support the Idea of 
Evolution 

The rest of the book summarizes the lines of available
evidence supporting the ideas of evolution and com- 
mon descent. This evidence focused on six areas: 


1. Systematics. The hierarchical structure of the Linnaean
classification system could reflect the pattern of
relationships among species. Members of taxonomic
groups higher in the system would be descended from
more distant ancestors, while members of lower groups
shared more recent ancestors.

2. The geographic distribution of organisms. He noted
that related species usually live in areas that are
geographically connected. Species living on islands
usually are most closely related to species from the
nearest mainland.

3. Comparative anatomy. Comparisons of different
species demonstrated how inherited structures
could retain a basic underlying structure and yet be
modified for different functions in various lineages
or be reduced in size (vestigial structures).

4. Embryology and the similarity of the earliest stages
of development suggested there is a common
inherited developmental pattern upon which later
modifications are made.

5. Fossils suggested a continuity in time between
organisms that lived in the past and those alive
today in the same area.

6. Artificial selection. The physical changes in
domesticated plants and animals achieved by human
intervention are analogous to natural processes acting
over longer periods of time.


The Response to On the Origin of
Species


The initial response to Darwin’s work was varied. 
Most of his early supporters accepted the idea of 
evolution and common descent, and they defended 
this idea. They were less enthusiastic about some of 
his other ideas, especially natural selection. Various 
critics objected to his use of indirect evidence 
(especially for natural selection), his focus on small, 
continuous variation (rather than large, discrete differ- 
ences), his emphasis on slow, gradual change (rather 
than rapid change), the implication that the earth was 
much older than contemporary physicists estimated, 
and a perception that he was suggesting “evolution by 
accident.” 

A major criticism focused on the lack of an ad- 
equate mechanism of heredity. Darwin was keenly 
aware of this limitation. None of the ideas current in 
the mid-1800s (blending inheritance; inheritance of 
acquired characteristics) really explained Darwin’s 
model of natural selection. In his 1868 book The
Variation of Animals and Plants under Domestication,
Darwin proposed pangenesis as a hypothesis about
inheritance, but experimental evidence did not
support it. Gregor Mendel (1822-1884) did not complete
or publish his work on inheritance until 1866. Few of 
Mendel’s contemporaries understood the significance 
of his work, and it was not until the twentieth century 
that biologists came to appreciate how these two great 
ideas complement one another. 


Philosophical Implications of Darwin’s 
Work 


The philosophical implications of Darwin’s ideas were 
immediately apparent, and they remain unresolved for 
many to this day. His proposals challenge several basic 
tenets of Judeo-Christian tradition and clearly place 
humans within the natural order instead of separate 
from it. His views suggested there was no design, 
no progress, and no perfection. In contrast to the 
philosophy of essentialism, Darwin emphasized the 
uniqueness of individuals and the importance of dif- 
ferences among them. He challenged the influence of 
the physical sciences, with their emphasis on mathe- 
matical universal laws, and introduced the concepts 
of probability and chance to scientific explanation. 


From Darwin to the ‘Evolutionary 
Synthesis’ 


Darwin’s work, both On the Origin of Species and his 
many other books published after that, stimulated 
an immense amount of commentary, debate, and
experimentation. The ideas of evolution, common
descent, and multiplication of species were quickly 
accepted, but Darwin’s mechanism of natural selection 
was not. The great German biologist August Weis- 
mann (1833-1914) was an early and influential propon- 
ent of natural selection who also made important 
contributions to our understanding of genetics. 
When Mendel’s work was “rediscovered” by Hugo 
de Vries (1848-1935) and others in 1900, it was thought
to be inconsistent with Darwin’s ideas. The differences 
that Mendel described were large and discontinuous 
rather than small and continuous. It was not until the 
1920s that geneticists began to demonstrate how Men- 
del’s work complemented that of Darwin. 

In the 1920s the great mathematical population 
geneticists, R.A. Fisher (1890-1962), J.B.S. Haldane 
(1892-1964), and S. Wright (1889-1988) developed 
theoretical models that demonstrated how natural 
selection worked. Experimental data could be tested 
with these models. In the 1930s to 1940s, apparent 
conflicts among genetics, systematics, and paleon- 
tology were resolved, leading to a more unified view 
of evolutionary biology; this is referred to as the 
‘evolutionary synthesis.’ 


Evolutionary Biology Today 


Darwin’s work has withstood the test of time. Evolu- 
tionary principles now are applied in such disparate 
fields as agriculture, conservation, medicine, and psy- 
chology. There is a greater appreciation for the role 
of random factors in population change (genetic drift, 
founder effect, bottlenecks); models of speciation and 
divergence encompass situations of geographic isola- 
tion and situations in which populations become dif- 
ferent while living in the same area. Many examples of 
natural selection have been documented in nature, and 
the role of sexual selection is being examined more 
closely. The potential for human activity to alter 
the environment of other organisms in ways that 
lead to their evolution in “unwanted” directions is a 
major concern in agriculture (resistance to pesticides, 
insecticides, or herbicides), conservation (changes 
in behavior), and medicine (antibiotic resistance of 
pathogens). 
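The random factors listed above can be illustrated with the Wright-Fisher model, a standard idealization of genetic drift associated with two of the population geneticists named earlier. A minimal sketch, assuming a haploid population of constant size, non-overlapping generations, and no selection, mutation, or migration; the function names and parameter values are hypothetical.

```python
import random


def wright_fisher(p0: float, pop_size: int, generations: int, seed: int = 0) -> list[float]:
    """Frequency of a neutral variant under pure genetic drift.

    Each generation, pop_size gene copies are drawn at random (with replacement)
    from the previous generation, so the frequency changes by chance alone.
    Once the variant is lost (0.0) or fixed (1.0), it stays there.
    """
    rng = random.Random(seed)
    freqs = [p0]
    p = p0
    for _ in range(generations):
        # Binomial sampling: each new copy is the variant with probability p.
        carriers = sum(1 for _ in range(pop_size) if rng.random() < p)
        p = carriers / pop_size
        freqs.append(p)
    return freqs


def fraction_absorbed(p0: float, pop_size: int, generations: int, replicates: int) -> float:
    """Share of independent replicate populations in which the variant was lost or fixed."""
    absorbed = 0
    for seed in range(replicates):
        final = wright_fisher(p0, pop_size, generations, seed)[-1]
        if final in (0.0, 1.0):
            absorbed += 1
    return absorbed / replicates


if __name__ == "__main__":
    for n in (20, 2000):
        share = fraction_absorbed(p0=0.5, pop_size=n, generations=100, replicates=20)
        print(f"population size {n}: variant lost or fixed in {share:.0%} of replicates")
```

Comparing a small and a large pop_size over the same number of generations should typically show that chance alone fixes or eliminates a variant quickly in small populations, such as those founded by a few individuals or passing through a bottleneck, whereas large populations drift far more slowly.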

Twentieth-century discoveries in biochemistry, 
genetics, cell biology, molecular biology, and develop- 
mental biology all offer additional support for 
Darwin’s ideas about evolution and the unity of life. 
The discovery of the structure of DNA in 1953 and the
deciphering of the genetic code, which is shared
almost universally by living organisms, suggest that
all life on this planet is descended from one common
ancestor. With modern genetic technologies (restriction
fragment length polymorphism analysis, DNA
fingerprinting, and DNA sequencing) we can compare
modern (and in a few cases, fossil) species and inves- 
tigate the relationships among them. Today, evolu- 
tionary biology is a thriving field of study that 
ranges from studies of DNA to the ecosystem, encom- 
passes all living (and near-living) things on this planet, 
and includes analysis of changes in the past, changes 
underway, and predictions about changes in the future. 

The complete genome of the bacterium Haemophilus
influenzae was published in 1995. For the first time, 
it became possible to look at the complete DNA 
sequence of the whole circular chromosome of a bac- 
terium. Since then, many more bacterial (and archaeal 
and eukaryotic) genomes have been sequenced and 
deposited into GenBank. At the time of writing
(September 2000), about 86 prokaryotic genomes
have been sequenced, of which 52 (9 archaeal and 43
bacterial genomes) are publicly available. The number of sequenced genomes
will continue to grow quickly, as it is now possible to 
sequence a bacterial genome in a single day. This is a 
mixed blessing for researchers in that it often feels as if 
there is too much information. 

The purpose of this article is to provide an overview 
of the genome databases currently available. Because
such lists change rapidly, all of the databases described
here are websites, which can be updated easily and
regularly as more genomes are sequenced.

Genome databases can be divided into four broad 
categories: 


“Archival” Databases Which Contain 
Sequences of Published Genomes 


There are several databases which contain sequences 
of published genomes, in various formats. Perhaps the 
most common format for many molecular biologists is 
GenBank, although many people also use the European 
Molecular Biology Laboratory (EMBL) or DDBJ
(DNA Data Bank of Japan) format. GenBank, EMBL, 
and DDBJ all contain the same data, in slightly differ- 
ent formats. In all of the databases in this group, it 
is possible to download the complete genomic se- 
quence, either with or without annotation of the coding 
regions. 

The NCBI web page is updated regularly, and 
provides a good overview of the sequences available, 
with lists sorted either alphabetically or by taxonomic 
group. In addition, large plasmids which are part of 
the genome are usually included in the entries. The 
GenBank site is simply an ftp site, with little informa- 
tion about the individual genomes, although it is good 
for downloading the genome sequences. The EMBL 
page allows one to download genome sequences in 
a variety of formats, including a “segment” format, 
where it is possible to obtain a sequence of only a 
small region of the chromosome. The DDBJ site uses
a Java applet to give the user a graphical view of
a particular region of the chromosome.


Databases at Major Sequencing Centers, 
Which Contain Access to “Ongoing” 
Genome Projects 


How does one find information about which of the 
genomes that have been sequenced are similar to a 
particular organism being studied? There are a couple 
of good places to start. For published sequences, the 
NCBI page and TIGR website, mentioned at the top 
of the list in Table I, are very good resources. How- 
ever, there are many additional genomes that have 
been sequenced and are publicly available, even 
though they have not yet been published. This infor- 
mation can be spread amongst several different data- 
bases, and the best current method seems to be 
checking a number of websites on a regular basis to 
keep updated. The Sanger Centre regularly updates its 
web pages with progress on sequenced genomes and 
allows access to the “raw data” before it has been fully
assembled. Most of the bacterial genomes have been
sequenced either by the Sanger Centre or TIGR. The
TIGR website is also updated regularly, and
preliminary sequencing data can be downloaded with
permission from TIGR. Preliminary data, including
sequenced but unpublished genomes, are also
available from the University of Oklahoma and
Washington University in St Louis. There are other
sequencing centers; this list is meant to be an
overview, and is not exhaustive.


Table I A list of genome databases(a)

Type / Name of database (URL)

1. Lists of published genomes
NCBI list of sequenced genomes
  http://www.ncbi.nlm.nih.gov/PMGifs/Genomes/org.html
GenBank
  ftp://ncbi.nlm.nih.gov/genbank/genomes/
EMBL
  http://www.ebi.ac.uk/genomes/
DDBJ (DNA Data Bank of Japan)
  http://gib.genes.nig.ac.jp/

2. Links to genome sequencing centers
Sanger Centre
  http://www.sanger.ac.uk/Projects/
TIGR
  http://www.tigr.org/tdb/mdb/mdbcomplete.html
TIGR Latest Update for Unfinished Microbial Genome Data
  http://www.tigr.org/cgi-bin/BlastSearch/ReleaseDate.cgi
TIGR’s “ongoing projects”
  http://www.tigr.org/tdb/mdb/mdbinprogress.html
University of Oklahoma’s Advanced Center for Genome Technology
  http://www.genome.ou.edu/
Washington University in St Louis Genome Sequencing Center
  http://genome.wustl.edu/gsc/C_elegans/navcelegans.pl

3. Lists of links about sequenced genomes
NCBI list of bacterial genomes that are complete but not published
  http://www.ncbi.nlm.nih.gov/Entrez/Genome/org.html
NCBI list of completed and ongoing projects
  http://www.ncbi.nlm.nih.gov/PMGifs/Genomes/bact.html
Blast NCBI genomes
  http://www.ncbi.nlm.nih.gov/Microb_blast/unfinishedgenome.html#GENOMES
Complete genomes in KEGG (Kyoto Encyclopedia of Genes and Genomes)
  http://www.genome.ad.jp/kegg/catalog/org_list.html
GOLD: Genomes OnLine Database
  http://216.190.101.28/GOLD/completegenomes.html
GOLD: “Ongoing” Genomes OnLine Database
  http://216.190.101.28/GOLD/prokaryagenomes.html
Infobiogen list of complete genomes
  http://www.infobiogen.fr/doc/data/complete_genome.html
Infobiogen list of incomplete genomes
  http://www.infobiogen.fr/doc/data/uncomplete_genome.html
The Enhanced Microbial Genomes Library
  http://pbil.univ-lyon1.fr/emglib/emglib.html
NIH (National Institute of Allergy and Infectious Diseases) supported projects
  http://www.niaid.nih.gov/dmid/genomes/genome.htm
Department of Energy (DOE) funded microbial genomes, completed and ongoing projects
  http://www.er.doe.gov/production/ober/EPR/mig_cont.html

4. Lists of genome analysis web pages
KEGG: Kyoto Encyclopedia of Genes and Genomes
  http://www.genome.ad.jp/kegg/
BMERC: Completed genomes search and analysis
  http://bmerc-www.bu.edu/bioinformatics/bioinformatics.html
Comparative sequence analysis of whole genomes
  http://www.bork.embl-heidelberg.de/Genome/
“What Is There?”: Interactive Metabolic Reconstruction on the Web
  http://129.15.12.51:8080/WIT2/CGI/index.cgi?user=
NCBI’s Complete Genomes Page
  http://www.ncbi.nlm.nih.gov/Complete_Genomes/
CBS DNA Structural Atlases for Complete Genomes
  http://www.cbs.dtu.dk/services/GenomeAtlas/

(a) An updated version of this list can be found at the following URL:
http://www.cbs.dtu.dk/services/GenomeAtlas/Table | .html


Databases Which Contain a Centralized 
Set of Links to Sequenced Genomes 


There are many websites which contain lists of 
sequencing projects, with links about the various
genomes, such as which lab the genome was se- 
quenced in, who funded it, and taxonomic classifica- 
tion of the organism. The NCBI list is well maintained 
and current. The INFOBIOGEN website is a good 
place to check the status of sequencing projects; this 
site also has links to FASTA files of the sequences 
from the various genomes. The GOLD website con-
tains a listing of all sequenced genomes, including those
done by industry, which are unlikely to enter
the public domain for several years. The Enhanced
Microbial Genomes Library not only contains lists of 
genomes, but provides “improved and corrected anno- 
tations.” Finally, the last two websites in this section 
are lists of microbial genomes funded by the National 
Institute of Allergy and Infectious Diseases (NIAID) 
and the Department of Energy (DOE), both in the 
USA. We also maintain a list of completed genomes
that is updated on a regular basis (http://www.cbs.dtu.dk/services/GenomeAtlas/).


Bioinformatic Databases Which Analyze 
Various Forms of Data from Genome 
Projects 


We maintain the DNA Structural Atlas of Genomes 
web page, which is updated on a regular basis (http:// 
www.cbs.dtu.dk/services/GenomeAtlas/); we use a 
graphical representation of the whole genome on a 
single page to summarize structural properties. An
example of this is shown in Figure 1, which is a DNA
Genome Atlas for chromosome 3 of Plasmodium
falciparum. Note that the telomere regions contain a
curved band (deep blue in band A in the figure), and 
are generally more thermostable, i.e., will melt at a 
higher temperature (green in band B) and are more 
rigid (dark green in band C). Also they contain a direct 
repeat (blue in band E) and a different, larger region of 
inverted repeats (red in band F). Although both telo- 
meres are GC rich (lighter red in band H), one end 
contains primarily Gs (turquoise at the right hand end 
of band G) whilst the other is enriched in Cs (purple in 
the left hand side of band G). The atlases are a method 
of obtaining an overview of an entire genome. 
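Each band of an atlas is a statistic computed in windows along the sequence. As a rough illustration only, the sketch below computes two of the simplest such quantities: GC content (assumed here to correspond to band H) and GC skew, (G − C)/(G + C) (assumed here to underlie the G-versus-C pattern in band G). The window and step sizes are arbitrary choices for the example, not the settings used for the published atlases, and the structural bands (curvature, stability, rigidity) depend on DNA structural models that are not reproduced here.

```python
from typing import List, Tuple


def gc_windows(seq: str, window: int, step: int) -> List[Tuple[int, float, float]]:
    """Return (start, GC fraction, GC skew) for each full window along seq.

    GC skew is (G - C) / (G + C): positive where G dominates the strand,
    negative where C dominates. Windows with no G or C get a skew of 0.
    """
    seq = seq.upper()
    results = []
    for start in range(0, len(seq) - window + 1, step):
        win = seq[start:start + window]
        g, c = win.count("G"), win.count("C")
        gc_fraction = (g + c) / window
        skew = (g - c) / (g + c) if g + c else 0.0
        results.append((start, gc_fraction, skew))
    return results


if __name__ == "__main__":
    # Toy sequence (illustrative only): G-rich end, AT-rich middle, C-rich end.
    toy = "G" * 10 + "AT" * 10 + "C" * 10
    for start, gc, skew in gc_windows(toy, window=10, step=10):
        print(f"{start:>3}  GC={gc:.2f}  skew={skew:+.2f}")
```

On real chromosomes the window would typically span thousands of bases, so that each point reflects regional composition rather than individual bases.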

There are many other web sites devoted to bioin- 
formatics of whole genomes. One of the most com- 
prehensive projects for analysis of complete genomes 
is the Kyoto Encyclopedia of Genes and Genomes 
(KEGG) database, which has entries on metabolic 
pathways, regulatory pathways, and gene expression 
in whole genomes. The BioMolecular Engineering 
Research Center (BMERC) contains tools for com-
parison of different genomes, as do the next
two web sites in the table. The final link (“What Is
There?”) attempts to produce metabolic reconstruc- 
tions for sequenced (or partially sequenced) genomes. 


Summary 


There are hundreds of genome databases available; key
web pages are shown in Table 1. Many of these allow
BLAST searches against both the published genomes
and the ongoing genome projects. DNA Structural
Atlases are a way of viewing
whole genomes, in terms of DNA structures, and are 
useful for finding regions of unusual DNA structures. 
The number of sequenced genomes will soon reach 
more than a hundred. Genome databases are necessary 
to track and better utilize this information. 


Further Reading 

Baxevanis AD (2000) The Molecular Biology Database Collec-
tion: an online compilation of relevant database resources.
Nucleic Acids Research 28: 1-7. (Note: Nucleic Acids Research
traditionally devotes the first issue in January to sequence
databases.)
Figure 1 (See Plate 9) DNA “Genome Atlas” for Plasmodium falciparum. The different colored lines are as described in the text and at our website (http://
www.cbs.dtu.dk/services/GenomeAtlas/). (Note: this figure can also be seen at the following URL: http://www.cbs.dtu.dk/services/GenomeAtlas/Pfalciparum/
pfal_3.genomeatlas.lin.html.)
Nelson KE, Paulsen IT, Heidelberg JF and Fraser CM (2000)
Status of genome projects for nonpathogenic bacteria and 
archaea. Nature Biotechnology 18: 1049-1054. 

Pedersen AG, Jensen LJ, Stærfeldt HH, Brunak S and Ussery
DW (2000) A DNA structural atlas for Escherichia coli. Journal
of Molecular Biology 299: 907-930.


See also: Genome Organization; Genome Size 
Although the laws governing the segregation of her-
editary traits in hybrid crosses of cultivated plants were
first published by Gregor Mendel in the 1860s, they
remained virtually unknown until they were re-
discovered some 35 years later by three men working inde-
pendently: Carl Correns, of Germany, Erich von
Tschermak, of Austria, and Hugo de Vries (1848- 
1935), of the Netherlands. Of this trio, de Vries is the 
one considered to have made the greatest contribution 
to laying the foundations for the discipline that (fol- 
lowing a proposal made by William Bateson in 1906) 
came to be known as ‘genetics.’ 

Hugo de Vries was born in Haarlem in 1848. After 
graduating from the University of Leiden in 1870, he 
taught at a school in Amsterdam for four years before 
studying plant physiology at the German University 
of Halle, where he was awarded a doctoral degree in 
1877. On his return to Holland, de Vries was ap- 
pointed lecturer in plant physiology at the University 
of Amsterdam, where, rising rapidly through the 
academic ranks, he became a full professor in 1881. 
His early research was concerned with plant respira- 
tion and osmosis, and he did not begin his studies of 
hereditary variation in plants until 1880. De Vries 
remained on the Amsterdam faculty until 1918, when 
he retired to Lunteren, a small town 30 miles to the 
south-east of Amsterdam, where he died in 1935. 

De Vries had been troubled that Darwin’s theory of 
“descent with modification” lacked a plausible explan- 
ation for the source of the variations in hereditary 
traits on which natural selection was supposed to act. 
He was not satisfied with Darwin’s ‘pangenesis’ the- 
ory of heredity, according to which the number and 
relative proportion of diverse ‘gemmules’ present in a 
creature’s body determine its characteristic traits. 
Thus, under pangenesis the phenomenon of heredity 
would be attributable to the transmission from parent 
to fertilized ovum of a representative sample of the
parental ensemble of gemmules. Darwin imagined
two distinct causes of variation in hereditary traits:
changes in the number and relative proportions of
various gemmules present in the parental body and
typological changes in the gemmules themselves. In
1889, de Vries wrote:


If one considers the species characters in the light of the 
doctrine of descent, then it quickly appears that they are 
composed of separate, more or less independent factors. 
Almost every one of them is found in numerous species, 
and their changing combinations and association with 
rarer factors determine the extraordinary variety of the 
world of organisms. ... These factors are the units that the 
science of heredity has to investigate. Just as physics and 
chemistry are based on molecules and atoms, even so the 
biological sciences must penetrate to these units in order to 
explain by their combinations the phenomena of the living 
world. 


So de Vries devised a new theory, according to which 
the traits that distinguish individuals belonging to 
different varieties of the same species can vary in- 
dependently of one another, being attributable to vari- 
ous combinations of diverse units of heredity. He 
named these units ‘pangenes’ since he imagined that 
they, rather than Darwin’s ‘gemmules,’ are the factors
that account for Darwin’s theory of pangenesis. 

To demonstrate the existence of pangenes, de Vries 
crossed different varieties of garden plant species. He 
found, as he had expected, that the parental traits that 
the varieties did not have in common segregated 
among the hybrid progeny plants according to some 
law-like regularities. By the year 1900, he had done 
enough crosses to feel sure that the rules of segregation 
of pangenes he had worked out were correct and that 
he was ready to announce what he thought was an 
entirely new discovery. Before he sent off his paper for 
publication, however, M.W. Beijerinck, professor of 
bacteriology at the University of Delft, showed him a 
reprint of Mendel’s 1865 paper, which had been unknown to de Vries.
So in his paper, de Vries presented the results of his 
crosses as merely confirming the findings that Mendel 
had published 35 years earlier. De Vries’s paper, in 
turn, caused Carl Correns and Erich von Tschermak 
to publish their own independent rediscoveries of 
Mendel’s laws. 

De Vries’s accidental discovery of the sporadic 
appearance of novel, hereditarily stable traits of the 
American evening primrose (Oenothera lamarckiana) led to his developing the doctrine
of evolution proceeding by sudden changes, or ‘muta- 
tions,’ of pangenes (soon after to be renamed ‘genes’ 
by W. Johannsen, who refined de Vries’s concept of the 
pangene). As a botanist, de Vries was unaware that 
cattle breeders, who had often noticed the sporadic 
appearance of freak specimens in their herds, were
very familiar with such sudden hereditary changes of 
traits. 

By rediscovering the Mendelian laws of inheritance 
and adding to them his own theory of mutation, 
de Vries provided the missing elements for completion 
of the Darwinian theory of evolution by “descent with 
modification.” Thus de Vries was not a mere “redis- 
coverer” but a creator of broad general principles who 
transcended Mendel in at least two ways: He showed 
that an independent segregation of hereditary units 
occurs in a wide variety of plant species, and he identi- 
fied the mutability of those units as the source of their 
evolutionary diversification. The scientific stature of 
de Vries steadily rose as (premolecular) genetics devel- 
oped during the first half of the twentieth century 
along the lines that he conceived in the 1880s: The 
invariance as well as the variance of living creatures 
is attributable to the properties and activities of mater- 
ial units that implement the transmission of hereditary 
traits in evolution and in development. 


See also: Mendelian Genetics; Mendelian 
Inheritance 
The genetic code is considered degenerate because
most amino acids are specified by more than one 
codon. In the genetic code used by most organisms, 
once called the “universal” genetic code, 61 of the 
codons are called sense codons and each encodes one 
of 20 amino acids. In this code only two amino acids 
are specified by a single codon each: Met (AUG) and 
Trp (UGG). All the other amino acids are encoded by 
2, 3, 4, or 6 codons each. The codons encoding a single 
amino acid are termed synonymous codons and are 
related to each other by degeneracy. That is, changing 
a base at some positions of a codon does not neces- 
sarily change the amino acid encoded. Degeneracy 
most commonly results from an equivalency in the 
third position of a codon. For example, the codons
GGN all encode glycine, whatever the identity of the
base N. There are eight such groups of codons. There
are also several codon groups where degeneracy is not
complete in the third position, i.e., it matters whether 
the third base is a purine or a pyrimidine. Other 
patterns of degeneracy are also known. Degeneracy 
of the code allows for the possibility of a single trans- 
fer RNA being able to recognize or “read” more than 
one codon, and insert the correct amino acid, a situa- 
tion which is quite common. 
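The counts given above can be checked directly against the standard code table. The sketch below encodes the standard genetic code (the table NCBI lists as translation table 1) as a 64-letter string in UCAG order, using one-letter amino acid codes and '*' for the three stop codons. It then tallies synonymous codons per amino acid and lists the codon boxes whose third position is fully degenerate, like GGN for glycine.

```python
from collections import Counter
from itertools import product

BASES = "UCAG"
# Standard genetic code, one letter per codon in the order UUU, UUC, UUA, UUG, UCU, ...
# ('*' = stop); same ordering as NCBI translation table 1, written with U instead of T.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

SENSE = {codon: aa for codon, aa in CODE.items() if aa != "*"}  # the 61 sense codons
CODONS_PER_AA = Counter(SENSE.values())  # synonymous codons per amino acid


def fourfold_boxes(code):
    """Codon boxes XYN in which all four third-position bases give the same amino acid."""
    boxes = []
    for first, second in product(BASES, repeat=2):
        encoded = {code[first + second + third] for third in BASES}
        if len(encoded) == 1 and "*" not in encoded:
            boxes.append(first + second + "N")
    return boxes


if __name__ == "__main__":
    print("sense codons:", len(SENSE))
    print("amino acids:", len(CODONS_PER_AA))
    print("single-codon amino acids:", sorted(aa for aa, n in CODONS_PER_AA.items() if n == 1))
    print("codons per amino acid:", sorted(set(CODONS_PER_AA.values())))
    print("fourfold-degenerate boxes:", fourfold_boxes(CODE))
```

Run as a script, it should report 61 sense codons, 20 amino acids, M (Met) and W (Trp) as the only single-codon amino acids, group sizes of 1, 2, 3, 4, and 6, and eight fourfold-degenerate boxes, GGN among them.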

Degeneracy is one example of structure, or non- 
randomness, in the genetic code. This structure seems 
to minimize the effect of point mutations within an 
open reading frame. For example, many base-pair 
substitutions at the third position of a codon will be 
silent because the amino acid encoded by the mutated 
codon will not have changed. 
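The point about third-position substitutions can be made concrete by enumerating every possible single-base change in every sense codon of the standard code and asking which changes leave the encoded amino acid unchanged. The sketch below does this separately for each codon position. Changes to a stop codon are counted as non-silent, and every substitution is weighted equally, which real mutation spectra are not, so the numbers describe the structure of the code rather than observed mutation rates.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code in UCAG order ('*' = stop), as in NCBI translation table 1.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def silent_substitutions_by_position(code):
    """Map codon position (1-3) to (silent changes, all changes) over every sense codon."""
    tally = {1: [0, 0], 2: [0, 0], 3: [0, 0]}
    for codon, aa in code.items():
        if aa == "*":
            continue  # only sense codons are mutated
        for pos in range(3):
            for base in BASES:
                if base == codon[pos]:
                    continue
                mutant = codon[:pos] + base + codon[pos + 1:]
                tally[pos + 1][1] += 1
                if code[mutant] == aa:  # same amino acid: a silent change
                    tally[pos + 1][0] += 1
    return {pos: tuple(counts) for pos, counts in tally.items()}


if __name__ == "__main__":
    for pos, (silent, total) in silent_substitutions_by_position(CODE).items():
        print(f"position {pos}: {silent}/{total} silent ({silent / total:.0%})")
```

With this table, roughly 69% of third-position changes are silent, compared with about 4% at the first position (the Leu and Arg codon pairs) and none at the second.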


See also: Base Pairing and Base Pair Substitution; 
Codons; Codon Usage Bias; Genetic Code; 
Mutation, Silent; Sense Codon; Transfer RNA 
(tRNA) 
Max Delbrück (1906-1981), a German-born, Ameri-
can molecular biologist, was born on 4 September
1906 in Berlin. His family, originally from Halle,
had been active in German political and intellectual
affairs for several generations. He was a son of Hans
Delbrück, professor of history in Berlin, and a great-
grandson of the famous German chemist Justus
Liebig. After starting to study astronomy in Tübingen
in 1924, young Delbrück moved to Göttingen, where
he eventually abandoned astrophysics for quantum 
mechanics and received a PhD degree in theoretical 
physics in 1930 under Max Born. He carried out 
advanced work in physics in Bristol, England, Switzer- 
land, Berlin, and Copenhagen. While in the laboratory 
of Niels Bohr in Copenhagen, he developed his inter- 
ests in biological problems with the hope that new 
principles of physics and chemistry might be revealed 
by the study of life processes. In Berlin, Delbrück col-
laborated with Nicolai Timoféeff-Ressovsky, a Russian
geneticist, and Karl Zimmer, a German radiobiologist,
on a celebrated paper examining the physical nature of
the gene (mainly with reference to Drosophila) by
radiobiological approaches. Delbrück became con-
vinced that genetics, and the stable nature of the gene
and its incredibly accurate replication, were aspects of 
biology most likely to provide the paradoxes that 
might reveal the new principles he was seeking. 


In late 1937 Delbrück moved to the California
Institute of Technology to work in collaboration with 
Thomas Hunt Morgan on Drosophila genes. How- 
ever, soon after he arrived at Caltech, he met Emory 
Ellis, a research associate who was already developing 
bacteriophage as a model system to study the basic 
biology of viruses. Ellis had characterized the basic 
process of bacteriophage growth, and had confirmed 
the one-step growth patterns described a decade earl- 
ier by Félix d’Herelle. Delbrück was immediately 
struck by the usefulness of bacteriophage as a tool to 
study heredity, and he abandoned his plans for Dros- 
ophila and joined Ellis’s laboratory to work on phage. 
With the onset of World War II, Delbrück left Caltech
to take up a position in the physics department at 
Vanderbilt University where he taught physics and 
continued his research on phage multiplication. Ellis 
went into military-related research and had a distin- 
guished career in rocketry. 

In 1941 Delbrück met Salvador Luria and they began 
a life-long collaboration and friendship. In an ongoing 
research program on bacteriophage, they, along with 
Alfred Hershey, initiated the research school now 
known as the “American Phage Group.” In 1947 
Delbrück moved back to Caltech, where, except for
a brief hiatus in Cologne from 1961 to 1963, he was on 
the faculty until his death in 1981. His research 
focused on the multiplication and genetics of bacterio- 
phages, and later on the phototropism of the mold 
Phycomyces blakesleeanus. Among his many honors, 
Delbrück received a Nobel Prize in Physiology or 
Medicine in 1969, which he shared with Luria and 
Hershey. 

Delbrück’s published work does not represent his
full impact on the field of genetics and molecular 
biology. His own research papers are highly focused 
on specific problems, and while they are models of 
clarity and care, they do not deal with ground-breaking 
issues of the time. In his collaborations with others 
and in his scientific leadership and intellectual guid- 
ance, he was much more influential. Several examples 
suffice to indicate this influence. 

An old problem in phage biology, that of the ap- 
pearance of phage-resistant bacteria, interested Luria. 
He and Delbrück devised a way to test whether the phage-
resistant bacteria were produced spontaneously and
subsequently grew out under selective conditions, or
conversely, whether the phage somehow induced the phage
resistance to appear. Their approach was both sound
and elegant, but indirect, relying, as it did, on prob-
abilistic arguments similar to those they had often
used in their radiobiological target theory work. This
experimental approach, which came to be known as
the Luria–Delbrück experiment, has been widely
hailed as a landmark in the development of bacterial
and molecular genetics.
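The probabilistic argument behind the experiment (often called the fluctuation test) can be illustrated with a small simulation. If resistance mutations arise at random during growth, before phage is added, a mutation that happens early in one culture leaves a large clone of resistant cells (a "jackpot"), so the number of resistant bacteria varies far more from culture to culture than its mean would suggest. If phage instead induced resistance on contact, each cell would respond independently and the counts would be close to Poisson, with variance roughly equal to the mean. The sketch below compares the variance-to-mean ratio under the two hypotheses. The inoculum size, number of generations, mutation rate, and the assumption of synchronous doublings are illustrative choices, not values from Luria and Delbrück's work.

```python
import math
import random
import statistics


def poisson(rng: random.Random, lam: float) -> int:
    """Draw a Poisson variate by Knuth's multiplication method (adequate for modest lam)."""
    threshold = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > threshold:
        k += 1
        prod *= rng.random()
    return k


def spontaneous_culture(rng, n0, generations, mu):
    """Resistant mutants arise at random during growth, before any phage is present."""
    sensitive, resistant = n0, 0
    for _ in range(generations):
        # every cell divides; some sensitive divisions yield a mutant (rate mu per division)
        new_mutants = min(poisson(rng, sensitive * mu), sensitive)
        resistant = 2 * resistant + new_mutants
        sensitive = 2 * sensitive - new_mutants
    return resistant


def expected_mutants(n0, generations, mu):
    """Approximate mean of the spontaneous model: each round adds about n0 * mu * 2**(g - 1)."""
    return generations * n0 * mu * 2 ** (generations - 1)


def induced_culture(rng, mean_resistant):
    """Phage-induced resistance: each plated cell converts independently, giving ~Poisson counts."""
    return poisson(rng, mean_resistant)


def variance_to_mean(counts):
    return statistics.variance(counts) / statistics.mean(counts)


def fluctuation_test(cultures=200, n0=100, generations=20, mu=1e-7, seed=1):
    rng = random.Random(seed)
    spontaneous = [spontaneous_culture(rng, n0, generations, mu) for _ in range(cultures)]
    # give the induced model the same expected mean, so only the spread differs
    mean = expected_mutants(n0, generations, mu)
    induced = [induced_culture(rng, mean) for _ in range(cultures)]
    return variance_to_mean(spontaneous), variance_to_mean(induced)


if __name__ == "__main__":
    spont_ratio, induced_ratio = fluctuation_test()
    print(f"variance/mean, spontaneous mutation: {spont_ratio:.1f}")
    print(f"variance/mean, induced resistance:   {induced_ratio:.2f}")
```

A ratio far above 1 in the first case, against a ratio near 1 in the second, is the kind of contrast the experiment used to argue that resistance arises spontaneously rather than being induced by the phage.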

When Delbrück, Luria, and Hershey met to discuss
phage biology in the early 1940s, it was usual for each
laboratory to isolate phages from local sources and to
select them on their favorite host strains and species.
There was a bewildering variety of phages being stud-
ied, and because there were many controversies, even
as to the particulate nature of phage, it was too easy to
explain away the conflicting results by ‘strain differ-
ences.’ Delbrück realized the critical need for inter-
laboratory comparability, and in building the small
school of phage workers that he envisioned, he
insisted that studies be limited to a group of seven
lytic phage strains that he, Luria, and Hershey de-
veloped, the famous ‘T-phages’ (T for ‘type’).

Each summer from 1945 until he left phage work in 
the mid-1950s, Delbrück organized (or authorized
others to organize) a course on bacteriophage work
with a series of seminars at the Cold Spring Harbor
Laboratory on Long Island. This course allowed
Delbrück to proselytize and indoctrinate new converts
to phage research, to standardize techniques and experi-
mental protocols, and to set the intellectual agenda of
the field. Alumni of “The Phage Course” went forth
into the research laboratories and classrooms of
America to spread the gospel of phage molecular bio-
logy as the wave of the future. Delbrück’s efforts and
strategies stand as a model for discipline-building:
shared research goals and approaches, networks of 
formal and informal communication, development 
of lineages of mentors and students, and enforcement 
of standards and ethical behavior. 

By the early 1950s the physical nature of the gene 
and its chemistry were becoming clear. With the new 
understanding of the problem of gene duplication that 
came with the discovery of the double-helical nature 
of DNA with its complementary base-pairing rules (a 
straightforward consequence of the stereochemistry 
of the structure), Delbrück realized that deep para-
doxes were not likely to arise in genetics. He turned
his attention to the last great mystery in biology, that
is, the brain. Hoping to find a simple biological “gad-
get” that might allow him to probe the complexities of
brain and mind in a clean and clear way, Delbrück
undertook a study of phototropism in a fungus, Phyco-
myces blakesleeanus. He attempted to use Phycomyces
as a model organism just as he employed bacterio- 
phages. He recruited disciples, organized courses, and 
carried out experiments on this organism, which he 
termed the most intelligent of the simple eukaryotes. 
While his legacy in this field is carried on by a devoted 
group of scientists, Phycomyces has yet to prove as use- 
ful as did phage in revealing the basic principles of life. 
Further Reading

Cairns J, Stent GS and Watson JD (eds) (1966) Phage and the 
Origins of Molecular Biology. Plainview Press, NY: Cold Spring 
Harbor Laboratory. 

Fischer EP and Lipson C (1988) Thinking about Science: Max
Delbrück and the Origins of Molecular Biology. New York: W.W.
Norton.

Summers WC (1993) How bacteriophage came to be used by 
the Phage Group. Journal of the History of Biology 26: 255-267. 


See also: Hershey, Alfred; Luria, Salvador; 
Deletion is the loss of a contiguous duplex DNA
segment from a genetic locus or chromosome; the 
deleted segment may range in size from a single base 
pair to many thousands of base pairs. 

Deletions occur through errors of replication, or by 
DNA breakage followed by imprecise repair. Dele- 
tions also result from specialized recombination pro- 
cesses. For example, intramolecular transposition of 
transposable DNA elements results in either the dele- 
tion (or the inversion) of the DNA segment between 
the initial end of the transposon and its target site. 
Similarly, during excision of integrated phages, the 
phage integrase may, in error, occasionally use an 
incorrect target sequence, excising a segment of the 
bacterial chromosome along with the phage DNA, 
leaving behind a deletion of the host genes that abutted 
the phage attachment site. 


See also: Gene Rearrangements, Prokaryotic; 
Phage λ Integration and Excision; Resolvase-
Mediated Deletion; Site-Specific Recombination; 
Specialized Recombination; Transposable 
Elements; Transposon Excision 
Deletion mapping is a method of genetic mapping
to determine whether or not two or more genetic
markers fall within the same region of DNA. Typ-
ically a genetic cross is set up between a recipient
and a donor where one of the genetic elements is
known to harbor a deletion. A point mutation that
lies within the bounds of a deletion cannot recombine
with it to regenerate a wild-type allele, and therefore
function will not be restored through a recombinational
event. Deletion mapping does not determine the relative
order of mutations; rather, it answers the question of
whether recombination between the two elements
restores function. If the genetic cross does restore
function, then the point mutation does not lie within the
genetic element defined by the deletion mutation. If
the cross does not restore function, then the point
mutation does lie within the deletion.
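Each cross answers a single yes-or-no question, but the answers from a set of overlapping deletions can be combined to narrow down where a point mutation lies: it must fall inside every deletion that fails to restore function and outside every deletion that restores it. The sketch below does that bookkeeping on a hypothetical map of integer positions. The deletion names, coordinates, and mutation position are invented for illustration, and the sketch ignores complications such as mutations lying very close to a deletion endpoint.

```python
from typing import Dict, List, Tuple

Interval = Tuple[int, int]  # inclusive (start, end) of a deletion on a hypothetical map


def restores_function(deletion: Interval, point: int) -> bool:
    """A cross restores wild-type function only if the point mutation lies outside the deletion."""
    start, end = deletion
    return not (start <= point <= end)


def localize(deletions: Dict[str, Interval], restored: Dict[str, bool],
             map_length: int) -> List[int]:
    """Positions consistent with every cross (inside non-restoring, outside restoring deletions)."""
    candidates = set(range(1, map_length + 1))
    for name, (start, end) in deletions.items():
        covered = set(range(start, end + 1))
        if restored[name]:
            candidates -= covered  # wild type recovered: mutation is not in this deletion
        else:
            candidates &= covered  # no wild type: mutation is within this deletion
    return sorted(candidates)


if __name__ == "__main__":
    panel = {"D1": (10, 60), "D2": (40, 90), "D3": (50, 100)}  # hypothetical deletions
    hidden_mutation = 45                                         # hypothetical point mutation
    results = {name: restores_function(span, hidden_mutation) for name, span in panel.items()}
    interval = localize(panel, results, map_length=120)
    print(results)
    print(f"mutation lies between positions {interval[0]} and {interval[-1]}")
```

Here the two deletions that fail to restore function bound the mutation on one side, and the deletion that restores it trims the other, leaving positions 40 to 49.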


See also: Deletion Mapping, Mouse; 
Gene Mapping 
Basic Information


Deletions (also known as deficiencies) are aberrations 
in which intervals of varying length are missing from a 
chromosome, resulting in segmental monosomy for 
the affected region. Deletions are typically employed 
for fine-structure mapping of genetic loci within small 
chromosomal regions, and the principles of deletion 
mapping in the mouse are essentially the same as
those applied to other experimental organisms, such
as Drosophila melanogaster. Deletions are useful for 
mapping recessive mutations defined by phenotype, as 
well as codominant genetic markers defined by bio- 
chemical or molecular methods, but they are not typ- 
ically employed for mapping of mutations specifying 
dominant phenotypes. Chromosomal deletions occur 
spontaneously at a low frequency, or are induced by 
treatment of germ cells (most efficiently, mature or 
maturing oocytes in the female, and postmeiotic 
spermatogenic cells in the male) with chromosome- 
breaking agents, such as acute radiation or certain 
chemicals. The first suggestion that intrachromosomal 
deletions occur in mice came in 1958 from W. L. Russell
and coworkers: in the first-generation progeny of
irradiated males, they found simultaneous induction
of mutations at the tightly linked (0.16 cM) dilute
(d; now called Myo5a) and short-ear (se; now called
Bmp5) loci. That is, they recovered animals that were
both dilute and short-eared among the progeny of
irradiated ++/++ males that had been crossed to
d se/d se females. Further genetic, cytogenetic, and 
molecular studies have since shown that these, and a 
high proportion of other radiation-induced muta- 
tions, are deletions. 

Because the mouse genome is diploid, one can 
recover deleted chromosomes in the heterozygous 
state in the progeny of mutagenized mice. Hetero- 
zygous deletions often do not exhibit an obvious exter- 
nally visible phenotype, despite the partial monosomy 
for the chromosomal segment corresponding to what 
was deleted. Many times, however, individuals hetero- 
zygous for a deletion, especially a large one, do mani- 
fest a phenotype (albeit not a very precise one) usually 
characterized by poor postnatal fitness and/or survival, 
runting, abnormalities in breeding performance, etc. 
Deletion heterozygotes can also manifest specific 
phenotypes if the deletion (even a small one) removes 
a gene(s) that is required in two doses for a normal 
phenotype. Good examples of such ‘haploinsuffi- 
ciency’ phenotypes are deletions of the c-kit (W; 
dominant white spotting) locus in chromosome 5, 
the mast-cell growth factor (Mgf; Steel [Sl]) locus
in chromosome 10, and the Brachyury (T) locus in 
chromosome 17. In some cases, haploinsufficiency of 
certain chromosomal segments can result in lethality 
because the organism requires both doses of certain 
genes to continue normal development. Such aberr-
ations constitute one class of so-called ‘dominant
lethals,’ and this class of deletions cannot be recovered 
and propagated in breeding stocks. 

Deletions, particularly those induced by radiation
or chromosome-breaking chemicals and recovered in 
the heterozygous state, are often, but not always, 
lethal when homozygous. This is because such dele- 
tions, especially large ones, can remove a large number 
of genes within a local area of the chromosome. If any 
one of the deleted genes is necessary for normal 
embryonic, neonatal, or juvenile development, its 
homozygous deletion results in the loss of an essential 
function, and normal development is derailed. It is 
these ‘recessive-lethal’ deletions that are extremely 
useful for fine-structure mapping of recessive muta- 
tions and codominant genetic markers. 


Mapping Recessive Mutations: 
Pseudodominance and 
Complementation 


The use of deletions for genetic mapping in the mouse
or in any other organism rests on the ability to
recognize some kind of loss. For deletion mapping of
recessive mutations, one tests for the loss of wild-type
gene function by a so-called ‘pseudodominance test’
(Figure 1). The r locus is defined by an abnormal
phenotype exhibited by r/r homozygotes; +/r
heterozygotes are normal, showing that r is a recessive
mutation. If one has reason to believe (from other
mapping data, e.g., linkage analysis) that locus r maps
near locus m, and if deletions of locus m [Del(m)] are
available, one can determine whether the r locus is
encompassed by deletions of the m locus by crossing
Del(m)/+ heterozygotes to r/r homozygotes. As is
shown in Figure 1, if ~50% of the progeny from such a
cross with Del(m)¹ display the r mutant phenotype, one
can conclude that the Del(m)¹ deletion chromosome fails
to complement the recessive mutation at the r locus.
This result is consistent with the r locus being deleted
by Del(m)¹.

This type of deletion complementation test is 
termed a ‘pseudodominance test’ because the nor- 
mally recessive r phenotype is detectable in the first 
generation, owing to the wild-type allele not being 
provided by the deleted chromosome (i.e., the nor- 
mally recessive phenotype is now ‘pseudodominant’). 
If a similar cross, this time with the Del(m)² deletion,
fails to produce r phenotype progeny, one can con-
clude that Del(m)² can indeed provide wild-type r locus
function (i.e., can complement) and, therefore, does 
not include the r locus. This relatively simple genetic 
analysis places the r locus within a segment bounded 
on either side by the breakpoints of these two Del(m) 
deletions (see arrowheads in Figure 1), greatly facili- 
tating the subsequent identification of the r locus by 
molecular methods (see below). 
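The ~50% expectation follows from simple Mendelian bookkeeping: the Del(m)/+ parent passes the deletion chromosome to half of its offspring, and every offspring receives an r allele from the r/r parent. The short enumeration below makes the two possible outcomes explicit. It assumes equal transmission of the deletion and wild-type chromosomes and full viability of the progeny that carry the deletion, which real crosses may not show.

```python
from fractions import Fraction
from itertools import product


def pseudodominance_fraction(deletion_covers_r: bool) -> Fraction:
    """Expected fraction of Del(m)/+ x r/r progeny that show the recessive r phenotype."""
    deletion_parent_gametes = ["Del", "+"]  # assumed to be transmitted equally
    tester_gametes = ["r", "r"]             # the r/r parent can only transmit r
    progeny = list(product(deletion_parent_gametes, tester_gametes))
    # An offspring is mutant only if its other chromosome supplies no wild-type r allele,
    # i.e. it inherited a deletion that removes the r locus.
    mutant = sum(1 for from_del_parent, _ in progeny
                 if from_del_parent == "Del" and deletion_covers_r)
    return Fraction(mutant, len(progeny))


if __name__ == "__main__":
    print("deletion removes r:", pseudodominance_fraction(True))   # fails to complement
    print("deletion spares r: ", pseudodominance_fraction(False))  # complements
```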


[Figure 1 table: progeny of Del(m)/+ × r/r crosses. With Del(m)¹, ~50% of progeny show the r phenotype; with Del(m)², 100% of progeny are normal and 0% show the r phenotype.]
Figure 1 A pseudodominance test mapping a hypo-
thetical r locus between two deletion breakpoints.
Del(m)¹ and Del(m)² represent two independent dele-
tions of the m locus. Mice homozygous for mutant
alleles at the r locus (r/r) exhibit an abnormal phenotype.
The open boxes below the simple chromosome map
represent the extent of each deletion. The arrowheads
designate the interval on the deletion map containing
the r locus, based on the progeny data given in the Table.
Figure 2 Deletion mapping of codominant genetic markers. Enzˢ and Enzᶠ represent alleles at a locus, Enz,
specifying slow and fast electrophoretic forms, respectively, of a hypothetical enzyme. The large box represents an
electrophoretic gel; the lines represent the loaded wells (the origin); and the dark boxes represent the protein (or
DNA) bands. Two crosses of deletion heterozygotes to Enzᶠ homozygotes are indicated. For each cross, lanes 1 and 2
represent the deletion parent and the Enzᶠ/Enzᶠ parent, respectively; the other seven lanes represent their progeny.
Note that individuals 4, 5, 7, and 9 in the right part of the gel do not receive an Enzˢ allele from the deletion parent,
indicating that the Enz locus is included in the Del(m)² deletion. In the deletion map at the bottom of the figure, the
open boxes represent the extents of each deletion with respect to the r locus (from Figure 1) and the Enz locus,
mapped here. This same strategy can be used to map loci defined by DNA polymorphisms (see text). The stippled
boxes represent the map positions for essential genes (l1 and l2) that are removed by each deletion (see text for
definition and mapping of l1 and l2).


Deletion Mapping of Codominant 
Genetic Markers 


The concept of phenotypically recognizing loss is 
likewise appropriate in the mapping of loci exhibiting 
codominant alleles. For example, Figure 2 modifies 
the situation from Figure 1 to show how loci specify-
ing protein electrophoretic variants or loci defined by
DNA clones can be placed into a fine-structure deletion 
map. In Figure 2’s example, the Enz locus encodes 
an enzyme with an activity that can be assayed in situ 
in an electrophoretic gel, and previous mapping has 
indicated a map position near the m locus. If the mice 
carrying either of the two Del(m) deletions exhibit 
only the ‘slow’ variant (‘slow’ as defined by rate of mi- 
gration in an electric field), and these mice are crossed 
to mice homozygous for a ‘fast’ Enz variant, two out-
comes are possible for each deletion. As an example,
Figure 2 shows that, in the Del(m)¹ cross, all progeny
exhibit both fast and slow variants, but, in the Del(m)²
cross, one-half of the progeny exhibits both variants and
one-half exhibits only the fast variant. Thus, Del(m)²,
but not Del(m)¹, deletes the Enz locus. Combining
this information with that from the Figure 1 analysis,
one can conclude that the Enz and r loci are on oppo- 
site sides of the m locus (Figure 2). 
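Reading the gel in this way reduces to one check: does any progeny lane lack the slow band, which only the deletion parent can contribute? The sketch below applies that check to the progeny lanes described in the Figure 2 caption (lanes 3 to 9, with individuals 4, 5, 7, and 9 in the right-hand cross showing only the fast band). Representing each lane as a set of band names, and treating the left-hand cross as the one in which every offspring shows both bands, are assumptions made for illustration.

```python
from typing import Dict, List, Set


def deletion_removes_enz(progeny_lanes: Dict[int, Set[str]]) -> bool:
    """True if some progeny lane lacks the slow band the deletion parent would transmit.

    The deletion parent shows only the slow variant and the tester is homozygous fast,
    so every offspring carries 'fast'; a lane without 'slow' means that offspring
    received the deletion chromosome, which carries no Enz allele.
    """
    return any("slow" not in bands for bands in progeny_lanes.values())


def lanes_lacking_slow(progeny_lanes: Dict[int, Set[str]]) -> List[int]:
    return sorted(lane for lane, bands in progeny_lanes.items() if "slow" not in bands)


if __name__ == "__main__":
    both, fast_only = {"slow", "fast"}, {"fast"}
    # Lanes 3-9 are progeny (lanes 1 and 2 are the parents); left-cross pattern is assumed.
    left_cross = {lane: both for lane in range(3, 10)}
    right_cross = {lane: (fast_only if lane in (4, 5, 7, 9) else both) for lane in range(3, 10)}
    for label, lanes in (("left", left_cross), ("right", right_cross)):
        print(label, deletion_removes_enz(lanes), lanes_lacking_slow(lanes))
```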

The mapping of loci defined by DNA sequences, 
either by restriction fragment length polymorphisms 
(RFLPs) or simple sequence length polymorph- 
isms (SSLPs, microsatellites) follows exactly the same 
strategy. Here, one is looking for loss of a particular 
RFLP/SSLP associated with the deletion-heterozygote 
parent (analogous to the ‘slow’ enzyme variant in the 
example above). To increase the amount of poly- 
morphism so that RFLP or SSLP analysis can be 
feasible with just about any DNA clone, investigators 
have prepared banks of DNAs from animals that 
carry a deletion chromosome from the laboratory 
mouse (Mus musculus) heterozygous with a chromo- 
some from a wild species or subspecies of mouse, 
usually M. spretus or M. musculus castaneus.
Mapping such DNA-defined loci by this strategy has 
been important for locating DNA sequences to dele- 
tion intervals harboring loci defined by phenotypes. 


Importantly, once such molecular access has been 
achieved (to the r-locus region in Figure 1, for example),
the deletion breakpoints surrounding the locus pro- 
vide excellent landmarks on a DNA physical map for 
determining the genomic segment that must contain at 
least a part of the gene of interest. Of course, continued 
analysis with additional deletions can often narrow 
down the ‘critical region’ where the gene of interest 
can map, thus simplying its molecular identification. 


Deletion Complexes and High- 
Resolution Deletion/Functional Maps 


As the above discussions indicate, deletion mapping in 
the mouse provides a very important complement 
for fine-structure mapping of genomic regions of 
less than 5 cM, where backcross-mapping techniques 
suffer from low resolution. Deletion mapping is also 
a preferred method for mapping loci defined by 
recessive-lethal mutations, where it would be difficult 
to genotype backcross progeny based on phenotype. 
Such fine-structure mapping of genomic regions is 
being accomplished for several regions of the mouse 
genome where there have been many deletions re- 
covered as induced mutations at specific genetic loci. 
These so-called ‘deletion complexes’ exist for a number 
of regions of the genome, including the chromosome 7 
albino (c; now called Tyr) and the pink-eyed dilution 
(p) loci; the chromosome 4 brown (b; now called 
Tyrp1) locus; the chromosome 9 dilute and short-ear 
loci discussed above; the piebald spotting (s; now 
called Ednrb) locus; the chromosome 2 agouti (a) locus; 
and several loci mapping within the t region of
chromosome 17. At least for the c, p, b, s, and d-se regions,
dozens of deletions exist. Genetic analyses of these 
deletions, incorporating strategies outlined above, as 
well as several additional strategies described below, 
have made these chromosomal regions among the 
functionally best characterized in the mouse genome. 
In addition to the deletion mapping of flanking 
loci as discussed above, one can also gain regional 
functional-map information by subjecting individual 
deletions to pairwise complementation analyses. This 
strategy was first reported for the mouse in 1971 by 
L. B. Russell for chromosome 9 d-se region deletions. 
We can return to Figure 2 for an example. Suppose
both Del(m)¹ and Del(m)² are lethal when
homozygous, that is, no mice with the m-locus mutant
phenotype are found in a cross of Del(m)¹/+ ×
Del(m)¹/+ or Del(m)²/+ × Del(m)²/+ mice. This
would indicate that each deletion encompasses at least
one gene that is essential for normal development (it
could even be the m locus itself). Now, if one performs
a complementation test, in which Del(m)¹/+ mice are
crossed to Del(m)²/+ mice, and recovers mice with
the mutant m phenotype, one can infer immediately
that the m locus itself is not an essential locus, because
it can be homozygously deleted with no adverse effect
on the health of the animal (other than the phenotype
normally associated with m mutations). Moreover,
these two deletions must each delete a distinct
gene important for some developmental
process(es), and these ‘lethals’ (call them l1 and l2)
must map on opposite sides of m (stippled boxes in
Figure 2). If this type of complementation analysis is
combined with the mapping of recessive phenotypes 
and codominant markers (especially loci defined by 
DNA clones) discussed previously, one can begin to 
build integrated functional and physical maps of large 
(megabase) stretches of the mouse genome. 
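The inference from pairwise complementation can be checked with a toy model in Python. Each deletion is represented as the set of loci it removes, and a compound heterozygote is assumed to be viable only if no essential locus is removed from both chromosomes. The locus names and deletion contents below are hypothetical, chosen to match the scenario just described.

```python
def compound_heterozygote(del_a: set, del_b: set, essential: set) -> tuple:
    """Predict the outcome for a Del_a/Del_b compound heterozygote.

    Each deletion is the set of loci it removes. Only loci removed by both
    deletions are absent from both homologs (homozygously deleted).
    Returns (viable, homozygously_deleted_loci).
    """
    both = del_a & del_b
    viable = not (both & essential)
    return viable, both


# Hypothetical loci: l1 and l2 are essential genes on either side of m.
essential = {"l1", "l2"}
del_1 = {"l1", "m"}           # removes m and the lethal on one side
del_2 = {"m", "Enz", "l2"}    # removes m, Enz, and the lethal on the other side

print(compound_heterozygote(del_1, del_1, essential))  # Del1/Del1
print(compound_heterozygote(del_2, del_2, essential))  # Del2/Del2
print(compound_heterozygote(del_1, del_2, essential))  # Del1/Del2
```

If m itself were added to the essential set, the Del1/Del2 animals would also be inviable and no m-phenotype mice would be recovered; recovering them is what rules that possibility out.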

This functional dissection can be carried one step 
further by exploiting the deletions as tools in addi- 
tional mutagenesis experiments, this time employing 
a chemical point mutagen effective in mouse, such as 
N-ethyl-N-nitrosourea (ENU), instead of radiation 
or chromosome-breaking chemicals. Breeding proto- 
cols have been developed where a marked chromo- 
some (marked with the m mutation, for example) can 
be mutagenized and then placed, by a simple series of 
crosses, opposite a large m-locus deletion. This allows 
the identification, in a phenotype-driven way, of new 
recessive point mutations that map within the large 
deletion, and each point mutation has the potential 
for defining new genes (or series of alleles at known 
genes). The new point mutations (even lethal or detri- 
mental ones) can be recovered from the parent or sibs 
of these ‘test-class’ animals and propagated in breed- 
ing stocks. Then, by the pseudodominance strategy 
described above, each new point mutation (even 
lethals) can be placed into intervals of the deletion 
map (and therefore into the physical map) with just 
one cross. The greater the number of deletions avail- 
able for any particular region, the finer the mapping. 
In this way, very detailed functional/mutation maps 
can be built up from relatively simple deletion maps. 
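The way a pseudodominance test places a new mutation into a deletion map can be sketched in Python. The sketch assumes that each deletion is a single contiguous interval with known breakpoints on an arbitrary coordinate and that the complementation results are error-free; the deletion names and coordinates are hypothetical.

```python
def map_mutation(deletions: dict, uncovered_by: set) -> list:
    """Return breakpoint-bounded segments consistent with a complementation pattern.

    deletions: name -> (left, right) breakpoint positions on an arbitrary scale.
    uncovered_by: names of deletions over which the new recessive mutation
    shows its mutant phenotype (deletions that fail to complement it).
    """
    breakpoints = sorted({x for span in deletions.values() for x in span})
    segments = []
    for left, right in zip(breakpoints, breakpoints[1:]):
        mid = (left + right) / 2
        # Deletions that remove this elementary segment.
        removed_by = {name for name, (a, b) in deletions.items() if a <= mid < b}
        # The mutation can lie here only if exactly these deletions uncover it.
        if removed_by == uncovered_by:
            segments.append((left, right))
    return segments


# Three hypothetical overlapping deletions.
deletions = {"DelA": (0, 10), "DelB": (4, 14), "DelC": (8, 20)}
print(map_mutation(deletions, {"DelA", "DelB"}))
print(map_mutation(deletions, {"DelC"}))
```

Each additional deletion adds breakpoints and so splits the map into more, smaller segments, which is why more deletions give finer mapping. A pattern that matches no segment (for example, failure to complement DelA and DelC but not DelB) points to inconsistent data or a map that needs revising; segments outside the outermost breakpoints are not considered here.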

One major problem in applying the general stra- 
tegies discussed here efficiently to the entire mouse 
genome has been that panels of deletions were avail- 
able only for the specific chromosomal regions out- 
lined above. However, recent encouraging results 
from several laboratories have indicated that deletion 
complexes may be created anywhere in the genome by 
genetic manipulation of embryonic stem (ES) cells. ES 
cells are derived from the early mouse embryo, can be 
propagated in tissue culture (and therefore manipu- 
lated in vitro), and, importantly, can be introduced 
back into developing mouse embryos where they
contribute to both somatic and germ-line tissues. Thus, a
mutation introduced into ES cells in vitro can
eventually wind up in heterozygotes, and then be bred to
homozygosity in a living mouse. Two major strategies
have been applied that can induce deletions in ES
cells: one involves gene targeting followed by Cre
recombinase-mediated intrachromosomal recombination
between loxP sites, whereas the other involves exposing
ES cells to radiation and selecting, in vitro, for chromosomal
deletions. In both cases, mice can eventually be created
from ES cells carrying deletions, so that panels of 
deletions, for fine-structure mapping of the entire 
genome in the mouse, will be available in the not- 
so-distant future. 


Further Reading 

Rinchik EM and Russell LB (1990) Germline deletion mutations 
in the mouse: tools for intensive functional and physical 
mapping of regions for the mammalian genome. In: Davies 
K and Tilghman S (eds) Genome Analysis, vol. 1, pp.
121-158. Plainview, NY: Cold Spring Harbor Laboratory 
Press. 

Silver LM (1995) Mouse Genetics: Concepts and Applications. 
Oxford: Oxford University Press. 


See also: Chromosome Aberrations; 
Complementation Test; Embryonic Stem Cells; 
Restriction Fragment Length Polymorphism 
(RFLP) 
Deletion mutation results in the removal of one or
more base pairs from a region of DNA. This can 
remove an entire gene or even a group of linked 
genes. Deletion is also referred to as ‘deficiency.’ 


See also: Mutation; Mutagens 


Demes 
Most widespread species are composed of many pop-
ulations that are somewhat isolated from one another. 
A deme is one of these geographically localized popu- 
lations. It is a group of individuals belonging to a 
single species occurring in one place at one time. 


Table 1 Correlation in flower color of Linanthus
parryae

Distance     Correlation
25 feet      0.899
75 feet      0.875
250 feet     0.817
750 feet     0.723
0.5 mile     0.599
1.0 mile     0.505
1.5 miles    0.115
2.0 miles    0.096


In a species that is predominantly outbreeding, 
each deme is roughly panmictic. In a species that is 
predominantly inbreeding, demes are far from pan- 
mictic. In both cases, however, members of the same 
deme are generally part of a gene pool that does 
not include all members of the species. If there is 
no possibility of gene exchange between individuals 
in a particular place, as a result of strict asexual re- 
production for example, then the individuals in that 
place do not form a deme. The defining property of a 
deme is that it consists of a group of interbreeding 
individuals. 

In a large, continuously distributed population, 
individuals that occur close together are more likely 
to mate than those that occur far apart. As a result 
of this isolation by distance the population is not 
panmictic, even if the individuals of which it is com- 
posed are predominantly outbreeding. Neither are 
there discrete, localized demes that are panmictic 
within themselves. Instead, such populations are com- 
posed of smaller, overlapping demes (genetic neighbor- 
hoods) within which mating occurs essentially at 
random. 

In Linanthus parryae, a small plant of the Califor- 
nia desert, there is a polymorphism for flower color. 
Some individuals produce white flowers; others produce
blue flowers. Epling and Dobzhansky sampled the
frequency of blue-flowered plants at 427 locations along
approximately 200 miles of roadway in an area just north
of the San Bernardino and San Gabriel mountains. They found
that the frequency of blue-flowered individuals was 
highly correlated at distances of up to 1 mile, but that 
the correlation in frequency could not be distin- 
guished from zero at distances of 1.5 and 2 miles 
(Table 1). Although hundreds of thousands of indi- 
viduals occur in this area, Wright calculated that each 
deme consisted of only 15-25 individuals covering an 
area of 2-3 square feet. 
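Wright described the size of a genetic neighborhood with a simple formula: roughly N = 4πσ²δ breeding individuals, where σ is the standard deviation of parent-offspring dispersal distance along one axis and δ is the density of breeding individuals. The Python sketch below evaluates that formula; it is offered as one standard way of sizing a neighborhood, not as a reproduction of the specific calculation behind the Linanthus figures quoted above, and the input values are hypothetical.

```python
import math


def neighborhood_size(sigma: float, density: float) -> float:
    """Wright's neighborhood size, N = 4 * pi * sigma**2 * density.

    sigma: standard deviation of parent-offspring dispersal distance along
        one axis.
    density: breeding individuals per unit area, in units matching sigma
        (e.g. sigma in feet, density per square foot).
    """
    return 4 * math.pi * sigma ** 2 * density


# Hypothetical values for illustration only; not Wright's Linanthus estimates.
print(neighborhood_size(sigma=0.5, density=2.0))
```

Because N scales with σ², halving the typical dispersal distance cuts the neighborhood size by a factor of four at the same density, which is how a very large, continuous population can still consist of small neighborhoods.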


See also: Gene Pool; Isolation by Distance 
The denaturation of a protein is its conversion from its
biologically functional native state to a denatured 
state. The native state of a globular protein is usually 
compact with a well-defined structure that is construc- 
ted around a closely packed hydrophobic core. The 
core is formed from hydrophobic side chains that pro- 
trude from the different elements of secondary struc- 
ture in the protein. A common element of secondary 
structure, the α-helix, is often amphipathic, that is, having a
hydrophobic face on one side and a hydrophilic face 
on the other. The hydrophobic face tends to be buried 
in proteins, often being part of the core. The denatured 
state was once thought to be a random polypeptide 
chain. Modern spectroscopic studies show that this is 
rarely so. The denatured state is usually a mixture of 
rapidly interconverting loose structures rather than a single
state. In small proteins, the denatured state can, at one
extreme, approximate a mixture of conformations
that are close to being random. In most proteins,
however, the denatured state is more or less compact, with
elements of secondary structure, such as α-helices, being
weakly formed but having much of the hydrophobic 
core exposed to solvent. Sometimes, the denatured 
state approximates more closely to a single structure, 
termed a molten globule. The signature of the de- 
natured state is that hydrophobic surfaces that are 
normally buried are partly or fully exposed to solvent. 

Denaturation can be induced by chemical
denaturants, such as urea or guanidinium chloride. These
denaturants stabilize denatured states relative to the 
native structure because the amino acid side chains and 
peptide backbones of proteins are more stable in guani- 
dinium chloride or urea solutions, and the denatured 
state has more exposed structural features than the com- 
pact native state. Denaturation is also induced by heat- 
ing because the process of denaturation is endothermic 
at higher temperature. In addition, because the de- 
natured state of a protein has a much higher specific 
heat than the native structure, proteins also denature 
on cooling; however, the temperature of cold
denaturation is generally below the freezing point of water.
Most proteins have maximal stability around physiological
temperature.
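The statements about heat and cold denaturation follow from the standard two-state model, in which the free energy of unfolding, ΔG, varies with temperature according to the Gibbs-Helmholtz equation with a constant heat-capacity change ΔCp. The Python sketch below assumes a strictly two-state protein. The parameter values (Tm = 330 K, ΔHm = 400 kJ/mol, ΔCp = 8 kJ/mol/K) are hypothetical, chosen only to show the shape of the stability curve, and the linear dependence on denaturant concentration is the commonly used linear extrapolation approximation rather than an exact law.

```python
import math

R = 8.314e-3  # gas constant, kJ mol^-1 K^-1


def delta_g_unfold(T: float, Tm: float, dHm: float, dCp: float) -> float:
    """Free energy of unfolding (G_denatured - G_native) in kJ/mol.

    Gibbs-Helmholtz equation with a temperature-independent dCp.
    Positive values mean the native state is favored.
    """
    return dHm * (1 - T / Tm) + dCp * ((T - Tm) - T * math.log(T / Tm))


def temperature_of_max_stability(Tm: float, dHm: float, dCp: float) -> float:
    """Temperature at which d(dG)/dT = 0, i.e. the peak of the stability curve."""
    return Tm * math.exp(-dHm / (Tm * dCp))


def fraction_denatured(dG: float, T: float) -> float:
    """Two-state fraction of molecules in the denatured state."""
    K = math.exp(-dG / (R * T))  # equilibrium constant [D]/[N]
    return K / (1 + K)


def delta_g_with_denaturant(dG_water: float, m: float, conc: float) -> float:
    """Linear extrapolation model: stability falls linearly with [denaturant]."""
    return dG_water - m * conc


# Hypothetical parameters for a small protein.
TM, DHM, DCP = 330.0, 400.0, 8.0
for T in (230.0, 273.0, 298.0, 330.0, 350.0):
    dG = delta_g_unfold(T, TM, DHM, DCP)
    print(T, round(dG, 1), round(fraction_denatured(dG, T), 4))
print(temperature_of_max_stability(TM, DHM, DCP))
```

With these illustrative numbers the stability curve peaks near 284 K and falls to zero again well below 273 K, mirroring the cold denaturation described above; the large positive ΔCp is what bends the curve back down at low temperature.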

Denaturation may be reversible or irreversible.
Small denatured proteins frequently revert
spontaneously to their native structure once the
denaturant has been removed or the solution has been
cooled below the protein's melting temperature. At high
enough concentrations, all proteins denature irreversibly
because their denatured state aggregates and
precipitates. Some proteins will
not spontaneously renature because they have been 
subjected to modification after biosynthesis, such as 
the removal of stretches of amino acids or just simple 
cleavage. Proteins that have multiple disulfide bridges 
are particularly intractable to renaturation if those 
bridges are cleaved by reduction as part of the de- 
naturation process. Incorrect bridges may be formed 
on reoxidation. This problem is overcome during 
protein biosynthesis by disulfide-shuffling enzymes. 
Some proteins do not renature because a partly de- 
natured state is kinetically stable and the structure 
becomes trapped. Denaturation can also be caused by 
chemical changes in a protein, such as oxidation of 
cysteine or methionine residues or by deamidation 
of glutamine or asparagine. 

There are several biological consequences of pro- 
tein denaturation. The first is that many proteins are 
on the verge of being denatured at physiological
temperature, and so heat shock will cause them to
denature. The denatured protein may be rescued by heat
shock proteins that specifically recognize the distinctive
feature of denatured proteins: their exposed
hydrophobic surfaces. The heat shock proteins bind to
the exposed hydrophobic surfaces and prevent aggre- 
gation. Heat shock proteins also function as molecular 
chaperones in protein biosynthesis. They bind partly 
denatured states of proteins during biosynthesis and 
prevent their aggregation and precipitation in the same 
way as they do in heat shock. The hydrolysis of ATP is 
usually required to release the bound protein and 
allow it to fold successfully. The molecular chaperones 
can also have another role by causing proteins to unfold
temporarily in order to be transported across
membranes or to reassemble as parts of larger protein
complexes. A probable mechanism for the formation of
amyloid deposits invokes the partial denaturation of
proteins: sequences of the protein that have a tendency
to form strands of β-sheet can be exposed on partial or
full denaturation so that they are able to associate and
form long fibrils of β-sheet. There is also a hypothesis
that the interconversion of the normal prion protein to 
its scrapie form requires denaturation of the protein. 


Further Reading 

Fersht A (1999) Structure and Mechanism in Protein Science: A
Guide to Enzyme Catalysis and Protein Folding. New York: WH 
Freeman. 


See also: Heat Shock Proteins; Proteins and 
Protein Structure; Spongiform Encephalopathies 
(Transmissible), Genetic Aspects of 


Derepression 


C Yanofsky 
Derepression is the activation of transcription of an
operon as a result of the dissociation of a repressor 
from its cognate operator or operators. 


See also: Operon; Repressor 


Detoxification (SOD) 
Detoxification is the neutralization of potentially
damaging compounds before they interact with
DNA. For instance, superoxide radicals can damage
DNA. However, the enzyme superoxide dismutase
(SOD) converts the radicals into hydrogen peroxide,
which is then enzymatically converted to water and
oxygen by catalase.
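The two detoxification steps can be written as balanced reactions; note that each step releases molecular oxygen as well as its other product.

```latex
\begin{align}
2\,\mathrm{O_2^{\bullet-}} + 2\,\mathrm{H^+} &\xrightarrow{\text{SOD}} \mathrm{H_2O_2} + \mathrm{O_2} \\
2\,\mathrm{H_2O_2} &\xrightarrow{\text{catalase}} 2\,\mathrm{H_2O} + \mathrm{O_2}
\end{align}
```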


Deuteranopia 


See: Color Blindness 


Developmental Genetics 
From a historical perspective, modern genetics was
born out of developmental biology, and developmen- 
tal genetics therefore represents the reunion of two of 
the major threads in modern biology. One of the 
founders of genetics, Thomas Hunt Morgan, began 
his research career as a developmental biologist but 
realized that it would be essential to understand the 
nature and mechanism of heredity, in order to make 
any real progress with the investigation of
development. In about 1908 he therefore began studying Drosophila,
as a tractable genetic system, with consequences that 
have played out during the twentieth century. 

For most of that time, the impact of genetic methods
on developmental biology was relatively small. One
reason for this is that the organisms favored for
experimentation by developmental biologists were those
with large eggs and embryos, such as amphibia and 
mollusks. These are ideal for the traditional methods 
of embryological manipulation such as ablation and 
transplantation, but none of them is easily subjected to 
genetic analysis. Conversely, organisms favored by 
geneticists tend to be small and rapidly developing, 
and therefore hard to manipulate physically. Only 
since 1970 has developmental genetics come into its 
own, aided in particular by the advent of efficient 
cloning methods. 

Understanding the development of an organism 
depends first on description of the events involved and 
then on experimental intervention. Genetic methods 
can contribute to both phases, though they are more 
important in the second. Description of development 
is greatly aided by techniques for marking a single cell 
and all of its descendants, and the best markers for 
such purposes are genetic, since these can be min- 
imally invasive and transmitted indefinitely. Fate map- 
ping by genetic techniques has produced fundamental 
observations such as the existence of compartments in 
insect development. The primary experimental inter- 
vention for the geneticist is the creation of mutants 
and the study of their altered development. However, 
the genetic toolkit permits other kinds of manipula- 
tion, some of which are equivalent to conventional 
embryological manipulation. The creation of genetic 
mosaics, in which cells of different genotype coexist in 
the same individual, provides similar information to a 
transplantation experiment. The use of conditional 
mutations allows gene activity to be switched on or 
off at different points in development. Temperature- 
sensitive mutants illustrate this approach: shifting 
organisms under study from a permissive temperature 
to a restrictive temperature or vice versa can effect- 
ively allow genes to be switched on and off, thereby 
providing information about when gene activities are 
required and in what order. 
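The logic of a temperature-shift experiment can be sketched in Python. The sketch assumes a single, contiguous temperature-sensitive period (TSP) and consistent results in each experiment; the function name and the example data are hypothetical.

```python
def ts_period_bounds(shift_up: dict, shift_down: dict) -> tuple:
    """Bound the temperature-sensitive period (TSP) from shift experiments.

    shift_up: time of shift from permissive to restrictive temperature ->
        True if the animal still develops normally.
    shift_down: time of shift from restrictive to permissive temperature ->
        True if the animal develops normally.
    Assumes a single, contiguous TSP and at least one normal outcome in each
    experiment (otherwise min()/max() raise ValueError).
    """
    # A normal result after a late shift-up means the gene product was no
    # longer needed: the TSP has ended by the earliest such time.
    end = min(t for t, normal in shift_up.items() if normal)
    # A normal result after an early shift-down means the gene product was not
    # yet needed: the TSP begins no earlier than the latest such time.
    start = max(t for t, normal in shift_down.items() if normal)
    return start, end


# Hypothetical results: hours after fertilization -> normal development?
shift_up = {10: False, 20: False, 30: True, 40: True}
shift_down = {0: True, 10: True, 20: False, 30: False}
print(ts_period_bounds(shift_up, shift_down))
```

Comparing the bounds obtained for different genes is one way such experiments indicate the order in which gene activities are required.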

The general strategy adopted by developmental 
geneticists has been to concentrate on a well-described 
developmental process and carry out systematic 
screens for mutants with abnormalities in that process. 
Characterization of the phenotypic defects in these 
mutants and study of the genetic properties and inter- 
actions of the genes thereby implicated can then be 
used to deduce the approximate nature and function 
of the underlying genetic program. Molecular cloning 
methods permit the isolation of these genes and iden- 
tification of their products, leading in turn to a bio- 
chemical description and explanation of the process. 

The molecular data provide an additional bonus, 
because gene sequences can be used to hunt for related 
genes in other organisms. These can then be studied 
biochemically, without the need for developing a
genetic system for that organism, in order to see if 
they perform corresponding functions. The rapid pro- 
gress in developmental biology over the past three 
decades has been heavily dependent on this general 
approach, particularly in vertebrates. 

Different genetic systems have been used to con- 
centrate on different developmental problems. The 
question of cell-type specification has been addressed 
in the greatest detail in various bacterial and fungal 
systems, most notably in the analysis of mating-type 
specification in the budding yeast Saccharomyces cere- 
visiae. Simple cases of pattern formation have been 
examined in bacteria, fungi, protozoa, and slime 
molds. For more complex examples of multicellular 
development, genetic studies of animal development 
have been especially powerful, exploiting the advan- 
tages of Drosophila, Caenorhabditis elegans, and the 
mouse. In recent years, the zebrafish Danio rerio has 
been developed as a promising additional system, 
which combines the power of the genetic approach 
with the advantages of a relatively large and transpar- 
ent embryo, amenable to physical manipulation. 

A major general finding from studies on these 
organisms is that the developmental mechanisms are 
remarkably conserved across the animal kingdom, 
despite the great differences in anatomy and ontogeny 
between different groups. Perhaps the most striking 
example is provided by homeobox clusters, which 
control patterning along the anterior-posterior axis
in all animals. The homeobox cluster was first discov- 
ered in Drosophila, but has proved to be conserved to a 
remarkable degree in all metazoan animals, both in 
function and in genetic and molecular organization. 
Many other examples of conservation in the molecular 
biology of animal development can be cited. 

In contrast, plant development involves different
processes and different molecules.
These differences may arise from distinctive proper- 
ties of plants such as the lack of cell movement and the 
presence of a cell wall. Such factors must greatly affect 
the available strategies for generating a complex three- 
dimensional structure. However, the genetic approach 
has been equally successful in unraveling some com- 
ponents of plant development, again by concentrating 
on one or two favorable species. This can be illustrated 
by the investigation of flower development, which has 
been analyzed in parallel studies using Arabidopsis and 
Antirrhinum (snapdragon). 

The forward genetic approach to developmental
analysis has been enormously successful, but is now likely
to be supplanted or at least greatly supplemented by 
genome-based approaches. Microarray methods, in 
which all of the many thousands of genes in an organism
can be assayed for expression at once, are enormously
increasing the amount of information available on
cell-specific and tissue-specific gene expression. Systematic
studies of protein-gene and protein-protein interactions
will have similar impact. These projects will,
however, build on the frameworks laid down by 
previous genetic investigation. Also, the follow-up 
studies to test the involvement and importance of 
new genes and predicted interactions will continue 
to rely on the general strategies and techniques of 
developmental genetics. 


See also: Developmental Genetics of 
Caenorhabditis elegans; Homeotic Mutation; 
Neurogenetics in Caenorhabditis elegans; 
Neurogenetics in Drosophila; Pattern Formation 
The nematode Caenorhabditis elegans was chosen as
an organism for the intensive study of neurobiology 
and developmental biology, using genetic dissection 
as a primary experimental tool. The results obtained 
from this research program have had a major impact 
on many areas of developmental biology, some of 
which are discussed below. 

The general strategy used can be best illustrated 
by the analysis of one of the most closely examined 
parts of C. elegans development, the formation of the 
hermaphrodite vulva. This process has the major 
advantage that it takes place over a short period of 
time (a few hours) during postembryonic develop- 
ment, and can be followed noninvasively by direct 
observation using Nomarski (differential interference 
contrast) light microscopy. The vulva is also very sim- 
ple, containing only 22 nuclei, which arise from three 
precursor cells. Early in larval development, a line of 
six ‘vulval precursor cells’ form along the ventral mid- 
line of the animal. Ablation experiments, in which 
single cells were killed by a laser microbeam, showed 
that any of these six had the potential to adopt a vulval 
fate, but normally only three of the six do so. Laser 
ablation also showed that vulva formation is induced
by a signal emanating from a single cell in the
developing gonad called the anchor cell, immediately dorsal to
the precursor cells. Under this influence, one of the
precursor cells undergoes a characteristic and invariant
pattern of cell divisions, called the primary (1°) fate,
and its two neighbors divide in a different pattern,
called the secondary (2°) fate. The other three cells
adopt a default, or tertiary (3°), fate, which does not
contribute to the vulva. Thus, a simple linear pattern,
(3° 2° 1° 2° 3° 3°), is created from the six equipotential
precursor cells. The simplicity and invariance of
this pattern, and the fact that vulva formation is not
essential for viability, offered the opportunity for a
saturation screen for mutants with abnormal vulva
formation. The two main classes of mutant were
‘vulvaless’ mutants, with a default pattern
(3° 3° 3° 3° 3° 3°), and ‘multivulva’ mutants, with excess
vulval differentiation (for example, 2° 1° 2° 1° 2° 2°).
Ultimately, many hundreds of mutants, defining over
25 genes, were recovered from these screens.
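The fate pattern described above can be caricatured in a few lines of Python. This is a toy model under stated assumptions, not a description of the actual molecular pathway: each cell either is or is not induced by the anchor-cell signal, an induced cell becomes primary unless its neighbor already has, cells next to a primary cell become secondary, and all others default to tertiary. The function name is hypothetical.

```python
def vulval_fates(induced: list) -> list:
    """Toy fate assignment for a row of vulval precursor cells.

    induced[i] is True if cell i receives enough inductive (anchor-cell)
    signal to adopt the primary fate. Fates are returned as 1, 2, 3 for
    primary, secondary, tertiary.
    """
    n = len(induced)
    fates = [3] * n  # default: tertiary fate
    # Scanning left to right, an induced cell becomes primary unless its
    # left neighbor already did (a crude stand-in for lateral inhibition).
    for i in range(n):
        if induced[i] and not (i > 0 and fates[i - 1] == 1):
            fates[i] = 1
    # Any non-primary cell next to a primary cell adopts the secondary fate.
    for i in range(n):
        neighbors = [j for j in (i - 1, i + 1) if 0 <= j < n]
        if fates[i] != 1 and any(fates[j] == 1 for j in neighbors):
            fates[i] = 2
    return fates


wild_type = [False, False, True, False, False, False]  # signal reaches one cell
print(vulval_fates(wild_type))     # wild-type-like pattern
print(vulval_fates([False] * 6))   # no signal: vulvaless-like
print(vulval_fates([True] * 6))    # every cell induced: multivulva-like
```

Removing the signal gives an all-tertiary, vulvaless-like row, and inducing every cell gives an alternating multivulva-like row; a model this simple does not reproduce every mutant pattern actually observed.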

Detailed phenotypic analyses of the mutants, 
followed by molecular cloning and biochemical
analysis of the genes involved, have created a very detailed
picture of how vulval patterning is achieved. This 
includes genes encoding the signal produced by the 
anchor cell (a growth factor-related protein), the re- 
ceptor for the signal in the precursor cells (a receptor 
tyrosine kinase), proteins needed for the correct posi- 
tioning of the receptor, a signal transduction kinase 
cascade, and nuclear transcription factors responsible 
for executing the different fates. Other genes involved 
are responsible for lateral inhibition between the six 
precursors, so that only one cell adopts a primary fate, 
and for the initial spatial specification of the six pre- 
cursors. The signaling and specification systems were 
found to involve exactly the same kinds of molecule 
that were being identified at the same time in parallel 
studies of pattern formation and cell specification 
in Drosophila, most notably in eye and segment 
formation. The dramatic conservation of these de- 
velopmental mechanisms, which have been found to 
operate also in vertebrate development, demonstrates 
the effectiveness of the whole research program. 

Many other components of C. elegans develop- 
ment have been subjected to the same kind of genetic 
attack, with similarly productive results. For example, 
pattern formation in the early embryo has been exam- 
ined in great detail. Embryogenesis begins with a 
pattern of invariant and unequal divisions, in which the 
three major axes of development are laid down at
successive divisions: first anterior/posterior, second dorsal/
ventral, and third left/right. A series of cell-cell signal- 
ing events occurs at this time, again involving signal 
transduction pathways (LIN-12/NOTCH and Wnt) 
that are conserved throughout the animal kingdom. 

Much later in development, axons and myoblasts 
need to migrate correctly in order to generate the pre- 
cise and complex order of the nervous system and mus- 
culature. The genetic approach has proved successful
here also, for example in the identification of proteins
that are responsible for dorsoventral patterning in 
the nervous system. One of the earliest described 
C. elegans mutants, unc-6, has a nervous system in 
which all dorsoventral organization is lost; the unc-6 
gene encodes the nematode version of netrin, which 
performs the homologous function in vertebrate ner- 
vous systems. 

The strategy of concentrating on one component of 
development, and attempting to understand it in 
detail, has led to an emphasis on fate decisions: how 
the choice between two different developmental
outcomes is controlled. This has included examination of
choices such as cell death (programmed cell death 
versus survival), sex (male versus female), and timing 
(early versus late). Each of these three choices has been 
examined in great detail in C. elegans. The cell death 
pathway was first elucidated in the nematode, and 
found to be conserved in vertebrates. The sex deter- 
mination pathway provides a contrast: Surprisingly, 
there is almost no conservation of genes involved 
in sex determination between C. elegans, Drosophila, 
and mouse, although the sex determination mechan- 
isms are now fairly well understood in each of these 
three species. One sexual differentiation gene provides 
an exception: mab-3 (named for its Male Abnormal 
phenotype) plays a role in part of male development 
of C. elegans and has been found to be homologous to 
dsx (doublesex) in Drosophila, a gene that produces 
both male-specific and female-specific transcripts. 
These two genes define a family that also appears to be 
important in male sexual development in vertebrates. 

Developmental timing may also be poorly con- 
served in evolution, since most of the regulatory 
genes so far defined in C. elegans lack obvious coun- 
terparts in other animal groups. However, here again 
an exception is encountered in the heterochronic gene 
lin-42, which encodes a protein related to the period 
family, molecules that are involved in circadian timing 
in Drosophila and other animals. It may be that the 
apparently unconserved pathways in C. elegans, such 
as sex determination and timing control, have simply 
undergone more evolutionary divergence than other 
aspects of development, and that further similarities 
will come to light as these pathways are studied in 
detail in other animal systems. In general, however, 
the message is that developmental mechanisms are 
strongly conserved between nematodes and other 
animal groups, and that C. elegans provides an excel- 
lent model for the detailed examination of these pro- 
cesses. 


See also: Apoptosis; Cell Division in 
Caenorhabditis elegans; Developmental Genetics; 
Pattern Formation 
See: Embryonic Development, Mouse


Dicentric Chromosome 


Copyright © 2001 Academic Press 
doi: 10.1006/rwgn.2001.1821 


A dicentric chromosome is the product of the fusion 
of two chromosome fragments, each of which carries a 
centromere. It is an unstable construct and may be 
broken when the two centromeres are pulled to oppos- 
ite poles in mitosis. 


See also: Centromere 


Dictyostelium 
Dictyostelium is a genus of the Acrasidae, the cellular
slime molds. The most commonly used member in 
genetic studies is D. discoideum. 


Dideoxy Sequencing 
See: DNA Sequencing 


Dideoxynucleotide 
See: DNA; Nucleotides and Nucleosides 


Differential Segment 
L Silver 
In the genome of a congenic mouse, the differential
segment is the region of chromosome surrounding the 
selected locus that is derived together with it from the 
donor genome. The differential segment represents a 
short region of foreign genetic material within the host 
inbred background. 


See also: Congenic Strain 
In 1965 DiGeorge recognized the association of hypo-
calcemia secondary to parathyroid hypoplasia and 
absence of the thymus. As additional cases were 
reported it became clear that cardiac malformations 
and facial dysmorphism were often present as well. It 
was reported in 1991 that a proportion of children 
with DiGeorge syndrome had chromosomal deletions 
within band 22q11. As techniques were developed to 
detect submicroscopic deletions it became apparent 
that 95% of affected children had such deletions. It 
has also become clear that there is a very wide pheno- 
typic spectrum associated with this deletion, classical 
DiGeorge syndrome being at the most severe end of 
the spectrum and the majority of cases having the 
clinical features of velocardiofacial syndrome (dis- 
cussed below). What is the cause of DiGeorge syn- 
drome in the children who do not have deletions 
within chromosome band 22q11? Although many 
genes have been identified within the commonly 
deleted region it is not yet clear which of these con- 
tribute to the phenotype. It is possible that smaller 
deletions or mutations in a gene in this region could 
produce the phenotype. A second genetic locus has 
been identified, submicroscopic deletions in chromo- 
some band 10p13 being associated with the syndrome. 
DiGeorge syndrome has also been reported in the 
offspring of diabetic mothers but there remain cases 
where the cause is unknown. 


Phenotype Associated with 
Chromosome 22q11 Deletion


Although the children DiGeorge described had ab- 
normalities of thymus and parathyroids, it is unusual 
for these to cause the presenting features in children 
with the deletion. Severe immunodeficiency is very 
unusual, occurring in less than 1% of individuals 
with the deletion, but T lymphocyte numbers are 
often low, this largely being due to low CD4 counts. 
However, most patients generate good antibody re- 
sponses following immunization. It is important to 
check the calcium levels to prevent hypocalcemic
seizures, but hypocalcemia responds well to oral
supplements. Congenital heart defects are present in
75% of patients with the deletion. The heart defects
most commonly seen are tetralogy of Fallot, pulmonary
atresia with ventricular septal defect, ventricular
septal defect, interrupted aortic arch, and truncus
arteriosus. As the first three of these are relatively
common, the chance of a child with one of these heart
defects having the deletion is small. However, almost
half of the children with type B interrupted aortic
arch or truncus arteriosus have the deletion. Both of
these defects carry a significant mortality.

Figure 1 (See Plate 7) Facial features associated with
DiGeorge syndrome: side view.

The dysmorphic facial features associated with 
the deletion can be very subtle and as with many 
syndromes the facial appearance changes with age 
(Figures 1 and 2). In young children the mouth is
small. The palpebral fissures may be short and narrow 
with lateral placement of the inner canthi. The ears 
have a round appearance because of a deficient upper 
helix and small lobe. The root and bridge of the nose 
are wide, this feature being most obvious in the older 
child and adult. Affected individuals are constitution- 
ally small. 

A third of the patients have velopharyngeal
insufficiency, which presents either in the neonatal
period, with feeds regurgitating through the nose, or
later, with nasal speech. Overt clefting of the palate
occurs in 10% of cases. A wide range of genitourinary
abnormalities have been reported, including renal
agenesis and dysplasia. The majority of affected
individuals have an intelligence quotient (IQ) below
100, and almost half have an IQ below 70, most of
these in the range of mild mental retardation.
Schizophrenia is more common in adults with the
deletion than in the general population.

Figure 2 (See Plate 8) Facial features associated with
DiGeorge syndrome: frontal view.

The phenotype is very variable; parents with minor 
features may have children with severe heart defects. 
This led to debate about whether the term DiGeorge 
syndrome should be applied to all children with the 
deletion or reserved for the pattern of features
described by DiGeorge, whether or not a deletion is
present. One suggestion, intended to avoid family
members with the same deletion receiving different
diagnoses, was the acronym CATCH 22 (Cardiac
defect, Abnormal facial appearance, Thymic
hypoplasia, Cleft palate, and Hypocalcemia resulting
from chromosome 22q11 deletion), but this has not
been widely accepted because of its connotation, from
Joseph Heller’s novel Catch-22, of a no-win situation.
The term velocardiofacial syndrome can be used for
the majority of cases.


Prevalence of Deletion 


The standard method of detecting this deletion is 
fluorescent in situ hybridization (FISH). A minimum 
estimate of prevalence is 13 per 100 000 (95% confidence
interval 4.5 to 21.5). This figure was based on
cases presenting in infancy and therefore misses the 
milder end of the spectrum. 
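The interval quoted above is symmetric about 13 (13 ± 8.5), which is the shape a normal (Wald) approximation to a Poisson count produces. The source does not say how its interval was calculated, so the following is only a minimal sketch of that kind of calculation; the helper name `wald_rate_ci` and the case and birth counts are hypothetical.

```python
import math

def wald_rate_ci(cases: int, population: int, per: int = 100_000, z: float = 1.96):
    """Rate per `per` people with a Wald (normal-approximation) interval.

    Hypothetical helper: treats `cases` as a Poisson count, so its
    standard error is sqrt(cases); z = 1.96 gives a 95% interval.
    """
    rate = cases / population * per
    half_width = z * math.sqrt(cases) / population * per
    return rate, rate - half_width, rate + half_width

# Hypothetical example: 9 cases among 100 000 births.
rate, low, high = wald_rate_ci(9, 100_000)
print(f"{rate:.1f} per 100 000 (95% CI {low:.2f} to {high:.2f})")
```

With small counts a Wald interval can extend below zero, so exact Poisson intervals are often preferred; this is a general caveat, not a statement about the study behind the quoted figure.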


Further Reading 

Ryan AK, Goodship JA, Wilson DI et al. (1997) Spectrum of 
clinical features associated with interstitial chromosome 
22q11 deletions: a European collaborative study. Journal of
Medical Genetics 34: 798-804. 


See also: Deletion; Genetic Diseases 
Diploidy is the condition of having two complete sets
of chromosomes in the same cell nucleus, as distinct 
from the haploid state, with only one set. Since a 
diploid organism has two copies (two alleles) of 
every gene, and one copy is often enough to supply 
the needs of the organism, defective mutant alleles are 
often recessive in the diploid. Thus, diploidy has the 
advantage of covering the effects of deleterious muta- 
tions, but the disadvantage of allowing such mutations 
to accumulate in the population. 
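The trade-off described above can be made concrete with a short calculation. This minimal sketch assumes random mating (Hardy-Weinberg proportions) and a fully recessive deleterious allele at frequency q: in a haploid every carrier expresses the defect, whereas in a diploid only the q² homozygotes do, and most copies of the allele sit unexpressed in heterozygotes. The function names are illustrative.

```python
def expressed_fraction_haploid(q: float) -> float:
    """Fraction of haploid individuals showing the recessive defect: every carrier."""
    return q

def expressed_fraction_diploid(q: float) -> float:
    """Fraction of diploids showing the defect under Hardy-Weinberg: homozygotes only."""
    return q * q

def hidden_copies_fraction_diploid(q: float) -> float:
    """Share of deleterious allele copies carried by unaffected heterozygotes."""
    p = 1.0 - q
    heterozygote_copies = 2 * p * q   # frequency 2pq, one copy each (counted per 2 alleles)
    homozygote_copies = 2 * q * q     # frequency q^2, two copies each
    return heterozygote_copies / (heterozygote_copies + homozygote_copies)  # simplifies to p

q = 0.01
print(expressed_fraction_haploid(q))      # every carrier is affected
print(expressed_fraction_diploid(q))      # only homozygotes are affected
print(hidden_copies_fraction_diploid(q))  # most copies are masked in heterozygotes
```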


Diploid and Haploid Phases in Life Cycles 


The alternation of haploid and diploid phases is the 
essential feature of the sexual cycle: haploid gamete 
nuclei fuse to initiate the diploid phase, and haploidy 
is restored, either sooner or later, by the process of 
meiosis. The relative durations of the haploid and 
diploid phases vary greatly from one group of organ- 
isms to another. 

Animals, with some exceptions such as males in 
wasps and other Hymenoptera, are entirely diploid 
except for the haploid gametes, eggs and spermatozoa, 
which are the immediate products of meiosis. 

In flowering plants the haploid phase is only 
slightly less abbreviated, consisting of one mitotic 
nuclear division in the pollen tube and three in the 
embryo sac prior to the formation of the egg cell. The 
coniferous gymnosperms (pine trees, etc.) have a 
slightly more prolonged haploid phase on the female 
side, but it is still entirely contained within the seed. 
In the so-called lower plants haploidy is more pro-
minent. Meiosis in ferns produces haploid spores 
which germinate to give rise to miniature but free- 
living green plants (prothalli) which in turn produce 
eggs and sperm for restoration of diploidy; in mosses 
the leafy shoots are all haploid and only the green 
spore-bearing capsules borne on the shoots are diploid. 

The fungi, with the exception of some sections of 
the Phycomycetes which are diploid, are nearly all 
haploid, with meiosis in the sexual forms following 
immediately after the fusion of haploid nuclei. The 
budding yeasts, Saccharomyces and allied genera, can 
propagate vegetatively either as diploids or as haploids, 
but unless the haploid cells are artificially prevented 
from mating, the diploid phase predominates. 

In the algae all possibilities are found: entirely 
haploid except for the zygotes (e.g., green flagellates 
such as Chlamydomonas and Euglena), predom- 
inantly diploid, with a very brief haploid phase (e.g., 
the Fucales, brown seaweeds), and alternation of 
morphologically similar haploid and diploid phases 
(e.g., green algae such as Ulva spp., the sea lettuce). 

True diploidy hardly occurs in bacteria, although 
there are several ways in which bacterial cells can carry 
a part of their genome in duplicate. 


Artificial Diploidy in Haploid Fungi 

Although, apart from budding yeast, all the fungi
commonly used in experimental genetics are naturally
haploid, diploid strains can often be obtained
artificially. Thus, in the fission yeast
(Schizosaccharomyces pombe) occasional diploid cells,
formed by fusion of haploids or by an extra round of
chromosome replication without division
(endoreduplication), do not undergo meiosis to form
spores, and can be picked out and propagated as
diploids.

In other fungi selection has been made for occa- 
sional nuclear fusions that combine in the same 
uninucleate cell complementary functions initially 
separated in different mutant haploid genomes. 
Usually the selection is for growth on unsupplemen- 
ted medium, starting with two different auxotrophic 
mutant haploids. With Ustilago maydis, the corn smut 
fungus, the starting material for this kind of selection 
has been the dikaryotic mycelium of corn gall tissue. 
In the green mold Aspergillus nidulans, forced het- 
erokaryons between auxotrophs have been used, but 
another convenient strategy has been to make a yel- 
low/white heterokaryon (the two colour mutations 
being in different genes) and to look for the occasional 
green (wild-type) patches of growth, which will have 
the two wild-type genes together in the same nuclei. 
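The logic of these selections is complementation: each haploid parent lacks a different function, and in a uninucleate cell only a nucleus carrying both genomes has a functional copy of every required gene. A minimal sketch, with hypothetical gene names (`adeA`, `bioA`), assuming each mutation is recessive and that growth on unsupplemented medium needs a wild-type copy of every listed gene:

```python
# Hypothetical biosynthetic genes required for growth on unsupplemented medium.
REQUIRED_GENES = {"adeA", "bioA"}

def functional_genes(*genomes: dict[str, bool]) -> set[str]:
    """Genes with at least one wild-type (True) allele among the genomes in one nucleus."""
    return {gene for genome in genomes for gene, wild_type in genome.items() if wild_type}

def grows_on_minimal(*genomes: dict[str, bool]) -> bool:
    """A nucleus is prototrophic if every required gene has a wild-type allele."""
    return REQUIRED_GENES <= functional_genes(*genomes)

haploid_1 = {"adeA": False, "bioA": True}   # adenine-requiring auxotroph
haploid_2 = {"adeA": True, "bioA": False}   # biotin-requiring auxotroph

print(grows_on_minimal(haploid_1))             # False: haploid needs adenine
print(grows_on_minimal(haploid_2))             # False: haploid needs biotin
print(grows_on_minimal(haploid_1, haploid_2))  # True: diploid nucleus complements
```

The same reasoning covers the green sectors in a yellow/white heterokaryon: a nucleus containing both genomes carries a wild-type copy of each colour gene.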

Artificially constructed fungal diploids tend 
to be unstable, and this instability has been used 
in genetic analysis, especially in A. nidulans. In 
Aspergillus, recessive marker mutations can segregate
out in two ways. Firstly, chromosomes tend to be lost 
in mitosis, and in the resulting aneuploids there is 
strong natural selection for further loss leading 
ultimately to the stable haploid condition. The effect 
is like that of meiosis but without the crossing-over 
between homologous chromosomes. Consequently, 
linked markers always segregate together without re- 
combination, and linkage groups can be defined with- 
out ambiguity. The second cause of instability, greatly 
stimulated by UV radiation, is mitotic crossing-over, 
which, with 50% probability, makes homozygous, 
and hence visible, any originally heterozygous reces- 
sive marker further from the centromere in the same 
chromosome arm. This type of diploid segregation has 
been extensively used in Aspergillus for the ordering of 
genes within chromosome arms (Figure 1). 
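The behaviour of mitotic crossing-over described above, and illustrated in Figure 1, can be sketched as a small simulation. It assumes a single crossover per arm, markers listed in order outward from the centromere, and a probability of 1/2 that the recombinant chromatids pass to opposite poles; it records only whether a locus becomes homozygous, not for which allele (one daughter is homozygous mutant, the other homozygous wild type). Function names are illustrative.

```python
import random

def daughter_genotypes(loci, crossover_after, recombinants_separate):
    """Genotype at each locus in a daughter nucleus after one mitotic crossover.

    loci: marker names ordered outward from the centromere.
    crossover_after: index of the last locus proximal to the crossover
        (-1 means the crossover lies between the centromere and the first locus).
    recombinants_separate: True if the two recombinant chromatids pass to
        opposite poles; only then do loci distal to the crossover become homozygous.
    """
    return {
        locus: "hom" if recombinants_separate and i > crossover_after else "het"
        for i, locus in enumerate(loci)
    }

def fraction_homozygous(loci, crossover_after, trials=10_000, seed=0):
    """Simulate many crossovers, each with a 50% chance of homozygosing distal loci."""
    rng = random.Random(seed)
    counts = dict.fromkeys(loci, 0)
    for _ in range(trials):
        genotypes = daughter_genotypes(loci, crossover_after, rng.random() < 0.5)
        for locus, state in genotypes.items():
            counts[locus] += state == "hom"
    return {locus: n / trials for locus, n in counts.items()}

# Loci as in Figure 1: b nearer the centromere, a furthest from it.
loci = ["b", "a"]
print(fraction_homozygous(loci, crossover_after=0))   # crossover in the a-b interval
print(fraction_homozygous(loci, crossover_after=-1))  # crossover between b and centromere
```

A segregant homozygous for one marker is also homozygous for every marker distal to it, which is why collections of such segregants can be used to order genes along an arm.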


Bivalent Formation at Meiosis as a 
Criterion of Effective Diploidy 


The presence in diploids of pairs of similar (homolo- 
gous) chromosomes is seen most clearly at the first 
division of meiosis. At the pachytene stage homologs 
become closely paired point-for-point along their
length; at diplotene they are seen to have undergone
reciprocal single-chromatid exchanges (crossovers), 
and at first metaphase they come to the equator of 
the division spindle in pairs (bivalents) joined at the 
crossover points (chiasmata). The regular disjunction 
of a single set of divided chromosomes to the poles of 
the spindle at the first anaphase of meiosis depends 
on the equal and opposite attractions to the poles of 
the two centromeres of each bivalent. 

Polyploidy, which is rare in animals but very com- 
mon in flowering plants and ferns, tends to disrupt the 
normal course of meiosis, at least when the multiple 
sets of chromosomes are fully homologous. In an 
autotetraploid, that is a tetraploid arising from chromo- 
some doubling within a diploid species, associations 
of four are formed at pachytene, and the quadrival- 
ents, trivalents, and univalents which may result at 
first metaphase often fail to disjoin two-and-two to 
give regular diploid meiotic products. This generally 
results in some infertility and, in so far as the plants are 
fertile, they exhibit tetrasomic inheritance — a depart- 
ure from normal Mendelian rules. 
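The departure from Mendelian ratios can be seen by enumerating gametes. A minimal sketch, assuming chromosome segregation in which each diploid gamete receives a random two of the four homologs and ignoring double reduction; the function name is illustrative. For a duplex autotetraploid AAaa this gives gametes AA : Aa : aa in the ratio 1 : 4 : 1, so on selfing about 1/36 of progeny are aaaa, compared with 1/4 aa from a selfed diploid Aa.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations

def tetrasomic_gametes(genotype: str) -> dict[str, Fraction]:
    """Diploid gamete frequencies from an autotetraploid such as 'AAaa'.

    Assumes each gamete receives a random 2 of the 4 homologs
    (no double reduction). Illustrative helper only.
    """
    if len(genotype) != 4:
        raise ValueError("expected four alleles, e.g. 'AAaa'")
    pairs = Counter("".join(sorted(pair)) for pair in combinations(genotype, 2))
    total = sum(pairs.values())  # C(4, 2) = 6 ways to choose two homologs
    return {gamete: Fraction(n, total) for gamete, n in pairs.items()}

gametes = tetrasomic_gametes("AAaa")
print(gametes)
nulliplex_selfed = gametes["aa"] ** 2  # both uniting gametes must be aa
print(nulliplex_selfed)
```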

However, if, as is commonly the case in flowering
plants and ferns, the tetraploid arose by chromosome
doubling in an interspecific hybrid, it usually
functions in meiosis as a diploid. Meiotic pairing will
usually occur far more readily between the fully
homologous chromosomes from the same species than
between equivalent chromosomes from different
species, and bivalents will then be formed virtually
exclusively. The tetraploid will then behave in meiosis
like a diploid and will obey ordinary Mendelian rules,
though it may well have functional duplication of
many genes. Tetraploids of this kind are called
allotetraploids, or sometimes amphidiploids, since they
have two different diploid chromosome sets. Wheat
(Triticum aestivum) is an allohexaploid, with three
diploid genomes originating from different species.
Functionally, and with minor variations, it has all its
genes in triplicate, but it behaves in meiosis like a
regular diploid.

Figure 1 The generation of homozygous cells from heterozygotes by mitotic crossing-over. One pair of chromosomes
is represented, with heterozygosity at two loci a and b in one arm, with a furthest from the centromere (shown as
small circle). (A) before chromosome replication (G1 stage of the cell cycle); (B) chromosomes replicated (G2
stage), with a rare crossover between chromatids in the a-b interval; (C) anaphase of mitosis with crossover
products passing to opposite poles of the spindle (50% probability); (D) daughter diploid nuclei, now homozygous
with respect to a but still heterozygous for b. A crossover between b and the centromere would lead (again with 50%
probability) to homozygosity for a and b together. This kind of segregation of homozygotes from heterozygous
diploids has been demonstrated particularly in the fly Drosophila melanogaster and the fungus Aspergillus nidulans.


Further Reading 
Fincham JRS, Day PR and Radford A (1979) Fungal Genetics, 4th 
edn. Oxford: Blackwell Scientific Publications. 


See also: Dominance; Meiosis; Plasmids; 
Polyploidy; Transduction 
Direct repeats are identical DNA sequences present in
two or more copies in the same orientation and within 
the same molecule. 
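Because a direct repeat is the same sequence recurring in the same orientation, repeats of a chosen length can be found by indexing every substring of that length. A minimal sketch; the function name and the fixed repeat length `k` are illustrative assumptions, and detecting inverted repeats would instead require comparison with the reverse complement.

```python
from collections import defaultdict

def direct_repeats(sequence: str, k: int) -> dict[str, list[int]]:
    """Map each length-k subsequence that occurs at least twice, in the same
    orientation on the same strand, to its 0-based start positions."""
    positions = defaultdict(list)
    for start in range(len(sequence) - k + 1):
        positions[sequence[start:start + k]].append(start)
    return {kmer: starts for kmer, starts in positions.items() if len(starts) > 1}

print(direct_repeats("ACGTTTACGTGG", 4))  # {'ACGT': [0, 6]}
```

Overlapping occurrences are counted, so a run such as "AAAA" reports the dinucleotide "AA" at positions 0, 1 and 2.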


See also: Repetitive (DNA) Sequence 


Directed Deletion in
Developmental Processes 


See: Gene Rearrangements, Prokaryotic 


Directed Mutagenesis 


See: Complement Loci 


Directed Mutation 


J H Miller 
Any mutation that is targeted to a specific gene.


See also: Mutation 
Disassortative Mating


See: Assortative Mating 


Discontinuous Replication 
Discontinuous replication is the synthesis of DNA in
short (Okazaki) fragments that are later joined to form 
a continuous strand. 


See also: Okazaki Fragment 


Discordance 


L Silver 
Discordance is the opposite of concordance. This
word is used in two different ways by geneticists. In 
formal genetic studies, it describes the situation where 
two expressed traits or alleles that are found together 
in a parent are separated in the offspring of that parent. 
The level or percent of discordance refers to the frac- 
tion of total offspring characterized from an experi- 
mental cross that show discordance. The remaining 
fraction is concordant. Discordance is also used in 
twin studies to describe twin pairs that differ in their 
expression of a particular trait under analysis. 
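The percent of discordance defined above is a simple proportion. A minimal sketch for the formal-genetics usage, in which each offspring is scored for two traits that were together in the parent; the function name and the scored cross are hypothetical.

```python
def percent_discordance(offspring: list[tuple[bool, bool]]) -> float:
    """Percent of offspring in which two traits that were together in the
    parent are separated, i.e. one is present without the other."""
    if not offspring:
        raise ValueError("no offspring scored")
    discordant = sum(1 for has_x, has_y in offspring if has_x != has_y)
    return 100.0 * discordant / len(offspring)

# Hypothetical cross: each tuple records whether trait X and trait Y were inherited.
scored = [(True, True), (True, False), (False, False), (False, True), (True, True)]
print(percent_discordance(scored))  # 40.0 (the remaining 60% are concordant)
```

The same comparison can be applied to twin pairs, with each tuple recording whether each twin shows the trait; twin studies often restrict the denominator to pairs in which at least one twin is affected, which this sketch does not do.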


See also: Concordance 


Disequilibrium 


See: Gametic Disequilibrium; Linkage 
Disequilibrium 


Disjunction 
L Silver 
During the anaphase I stage of the first meiotic division,
the two homologs of every chromosome “disjoin”
from each other and are pulled to opposite poles by
spindle fibers that attach to the centromeric regions. This
disjunction of chromosomes is the physical basis for 
the genetically observed segregation of alleles accord- 
ing to Mendel’s first law. In animals with a normal 
karyotype, the segregation of any one pair 
of homologs will not affect the segregation of any 
other pair of homologs. Thus, individual homo- 
logs of different chromosomes that came into the 
animal together from one parent will go out into 
the offspring in an independent manner. This is the 
physical basis for Mendel’s second law of independent 
assortment. 
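Because each homolog pair disjoins independently of the others, the frequency of each gamete type is a product of one-half for every heterozygous pair. A minimal sketch, assuming a normal karyotype and loci on nonhomologous chromosomes (so no linkage); the function name is illustrative.

```python
from fractions import Fraction
from itertools import product

def independent_gametes(genotype: list[tuple[str, str]]) -> dict[str, Fraction]:
    """Gamete frequencies for loci on different (nonhomologous) chromosomes.

    genotype: one (allele, allele) pair per chromosome pair, e.g. [("A", "a"), ("B", "b")].
    Each pair sends either homolog to a gamete with probability 1/2 (segregation),
    independently of every other pair (independent assortment).
    """
    freqs: dict[str, Fraction] = {}
    per_choice = Fraction(1, 2 ** len(genotype))
    for combo in product(*genotype):
        gamete = "".join(combo)
        freqs[gamete] = freqs.get(gamete, Fraction(0)) + per_choice
    return freqs

print(independent_gametes([("A", "a"), ("B", "b")]))
```

For the dihybrid AaBb this yields AB, Ab, aB and ab at 1/4 each; genes linked on the same chromosome would not follow these proportions.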


See also: Mendel’s Laws; Nondisjunction 


Disruptive Selection 


W G Hill 
Disruptive selection occurs when individuals of both
extreme high and extreme low performance for a
trait are fitter than are intermediates. This could arise
if, for example, there are two environmental niches in
which the population exists, with small animals
being fitter in one niche and large animals in the 
other. A nice example has been shown by Grant and 
colleagues for Galapagos finches, where birds with 
small narrow bills feed more efficiently on small, soft 
seeds, and those with deep bills more efficiently on 
large, hard seeds. In laboratory experiments, disrup- 
tive selection is practiced by selecting only the high 
and low scoring individuals as parents of the next 
generation. Disruptive selection can be contrasted 
with directional and with stabilizing selection, 
where, respectively, individuals at one end of the dis- 
tribution and individuals at the middle of the distribu- 
tion are fitter in nature or are selected in laboratory 
experiments. With disruptive selection, the variance 
among the selected parents is higher than in the popu- 
lation as a whole. Consequently the variance among 
the offspring is also increased, to an extent depend- 
ing mainly on the heritability of the trait and the 
strength of the disruptive selection. This increase 
in variance can be predicted simply when the trait 
is assumed to be affected by many loci each of small 
effect (infinitesimal model), and arises from gametic 
(linkage) disequilibrium as Bulmer has shown. In 
contrast to stabilizing selection, where variance is
reduced by selection but soon reaches an asymptote,
with intense disruptive selection the variance can 
in principle increase without bounds. In such a case, 
however, it is probably better to regard the selected 
individuals as comprising two subpopulations, H(igh) 
and L(ow): with subsequent random mating, selected 
H individuals of the next generation all come from 
H x H matings, and all L from L x L matings; no 
H x L or L x H offspring are selected. There is then, in 
effect, a divergent selection experiment of low inten- 
sity (because half the matings never contribute) being 
conducted in the same, nominal, single line; and the 
properties can be deduced from those of directional 
selection. 
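The first step of the argument above, that parents chosen from both tails are more variable than the population they come from, can be checked by simulation. A minimal sketch, assuming a normally distributed trait and truncation selection of the top and bottom 10%, with stabilizing selection of the central 20% included for contrast; it does not model the offspring generation, whose response depends on heritability as described above. Function names and parameter values are illustrative.

```python
import random
import statistics

def select_tails(values, fraction):
    """Disruptive selection: keep the lowest and highest `fraction` of values."""
    ranked = sorted(values)
    k = int(len(ranked) * fraction)
    if k == 0:
        raise ValueError("fraction too small for this sample")
    return ranked[:k] + ranked[-k:]

def select_middle(values, fraction):
    """Stabilizing selection: keep the central 2 * fraction of values."""
    ranked = sorted(values)
    n = len(ranked)
    k = int(n * fraction)
    start = (n - 2 * k) // 2
    return ranked[start:start + 2 * k]

rng = random.Random(1)
population = [rng.gauss(0.0, 1.0) for _ in range(10_000)]

pop_var = statistics.variance(population)
disruptive_var = statistics.variance(select_tails(population, 0.10))
stabilizing_var = statistics.variance(select_middle(population, 0.10))

print(f"population:  {pop_var:.2f}")
print(f"disruptive:  {disruptive_var:.2f}")
print(f"stabilizing: {stabilizing_var:.2f}")
```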

Interest in disruptive selection has arisen because it 
was suggested, e.g., by Mather, as a possible route to 
sympatric speciation: the disruptive selection would 
become more efficient if H x L and L x H matings 
occurred less frequently than with random mating, 
thereby leading to reproductive isolation. The 
expected increase in variance has been observed in 
laboratory experiments in which disruptive selection 
has been practiced. The evidence for reproductive 
isolation is more equivocal: it was observed by Tho- 
day and colleagues in selection experiments for sterno- 
pleural bristle number in Drosophila melanogaster 
conducted in the 1950s and 1960s, but attempts to 
repeat these, for example, by Scharloo and colleagues, 
have generally been unsuccessful. 


Further Reading 

Falconer DS and Mackay TFC (1996) Introduction to Quantitative 
Genetics, 4th edn. Harlow, UK: Longman. 

Roff DA (1997) Evolutionary Quantitative Genetics. New York: 
Chapman & Hall. 

Thoday JM (1972) Disruptive selection. Proceedings of the 
Royal Society of London, Series B, Biological Sciences 182: 
109-143. 


See also: Additive Genetic Variance; Artificial 
Selection; Genetic Variation 


Distal 


L Silver 
Distal is a relative term meaning closer to the telomere
along a chromosome. It is the opposite of proximal. 


Divergent Evolution 
Divergent evolution is the development of a family of
proteins from the duplication and mutation of a single 
ancestral gene to produce related proteins with differ- 
ent functions. 


See also: Evolution of Gene Families 


Divergent Transcription 


Copyright © 2001 Academic Press 
doi: 10.1006/rwgn.2001.1824 


Divergent transcription is the initiation of transcrip- 
tion at two promoters facing in opposite directions, 
such that transcription proceeds away in both direc- 
tions from a central region. 


See also: Transcription 


D-Loop 
Y Yamamoto 
The name ‘D-loop’ is derived from ‘displacement
loop,’ a structure found in mitochondrial DNA (mtDNA)
as an early replication intermediate. In some cases,
this segment of the mitochondrial genome is called
the control region. The mechanism of mtDNA
replication has been well characterized in cultured
mouse cells (Boore, 1999). Replication of the H-strand
starts at a fixed point (oriH) in the D-loop region,
and the newly replicated DNA displaces the
nonreplicated single strand to form the D-loop.
Replication of the L-strand initiates at oriL, within a
tRNA cluster far from the D-loop, after the H-strand
replication fork has passed through (Figure 1).
Consequently the replication of the two complementary
strands of mtDNA is asynchronous. Transcription of
mtDNA also starts within the D-loop region, and
several RNAs are transcribed from both the H- and
L-strands. The L-strand transcript is responsible for
priming the replication of the H-strand. Thus the
D-loop is a noncoding region that controls both
transcription and replication, and it lies between two
tRNA genes on mtDNA. Its length is approximately
1 kb in vertebrates, excluding the tandem and/or direct
repeated sequences that are frequently found in
various species.

Figure 1 The D-loop formation in a mitochondrial
genome. The dashed line between ‘a’ and ‘b’ shows the
L-strand transcript, which serves as the RNA primer
of DNA replication. The curved arrow indicates
newly synthesized L-strand DNA and its direction of
replication.

Mitochondrial DNA is compact, circular, and 
double-stranded. For the most part, mtDNA consists 
of coding regions for proteins and tRNAs, except for 
the D-loop region. As mentioned above, the D-loop 
region contains several promoters and an initiation site 
for H-strand replication. Specific sequences needed 
for transcription and replication should be properly 
spaced within this region. The nucleotide sequence of
the rest of the D-loop region is therefore considered to
be free to vary without affecting transcription and
replication. In fact, mtDNA evolves about 10 times
more rapidly than nuclear DNA, and the D-loop is the
most variable region of mtDNA.

Substantial genetic variation is found in the D-loop 
region, even among individuals within a given species. 
Nucleotide variations in the D-loop among indi- 
viduals have been well studied in various species includ- 
ing humans. For example, 604 unrelated Caucasians 
differed on average at eight bases in hypervariable 
regions I and II (HV-I and HV-II) within the D-loop.
Because mitochondria are maternally inherited, each
individual normally carries a single mtDNA sequence,
with no paternal copy, and this sequence can be
treated as a haplotype.
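A figure such as "differed on average at eight bases" can be read as a mean number of pairwise differences. A minimal sketch of that calculation for sequences that have already been aligned to equal length; it treats every mismatch alike, ignores gaps and ambiguity codes, and the sequences shown are hypothetical rather than real HV-I/HV-II data.

```python
from itertools import combinations

def mean_pairwise_differences(sequences: list[str]) -> float:
    """Average number of differing positions over all pairs of aligned sequences."""
    if len(sequences) < 2:
        raise ValueError("need at least two sequences")
    if len({len(s) for s in sequences}) != 1:
        raise ValueError("sequences must be aligned to the same length")
    pairs = list(combinations(sequences, 2))
    total = sum(sum(x != y for x, y in zip(s1, s2)) for s1, s2 in pairs)
    return total / len(pairs)

# Hypothetical aligned fragments.
print(mean_pairwise_differences(["ACGT", "ACGA", "TCGA"]))
```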

Haplotype analysis of the D-loop region is a useful 
tool for revealing genetic diversity, which is essential 
for the preservation of species. Nowadays many spe- 
cies are endangered as a result of the destruction of 
habitat. Decreases in population lead to reduced 
genetic diversity, which can cause a population sur- 
vival crisis. Determination of the haplotype number in 
endangered species reveals the level of endangerment. 
Haplotypes in the D-loop region of mitochondrial 
DNA have been analyzed with DNA samples extracted
from blood, tissue, hair roots, and feathers in natural 
and captive populations. At present, a reintroduction 
plan for captive Oriental white storks is under way in 
Toyooka, Japan. In this case, haplotype analysis is 
used to prevent inbreeding among captive birds. If the 
destruction of habitat were to go on further, various 
natural populations might disappear, leaving only cap- 
tive populations. Therefore, it is important to main- 
tain genetic diversity even in captive species by 
avoiding inbreeding using haplotype data. Such prac- 
tices are likely to facilitate the future reintroduction of 
such endangered species into the wild. 
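Two simple summaries used in work of this kind are the number of distinct haplotypes and Nei's haplotype diversity, the probability that two haplotypes drawn at random from the sample differ. The source does not say which statistics were used for the storks, so this is only a minimal sketch; the function name and haplotype labels are hypothetical.

```python
from collections import Counter

def haplotype_summary(haplotypes: list[str]) -> tuple[int, float]:
    """Return (number of distinct haplotypes, Nei's haplotype diversity).

    Diversity h = n / (n - 1) * (1 - sum(p_i ** 2)), where p_i is the
    frequency of haplotype i among the n sampled individuals.
    """
    n = len(haplotypes)
    if n < 2:
        raise ValueError("need at least two sampled individuals")
    counts = Counter(haplotypes)
    sum_sq = sum((c / n) ** 2 for c in counts.values())
    return len(counts), n / (n - 1) * (1 - sum_sq)

# Hypothetical D-loop haplotype labels for four sampled birds.
print(haplotype_summary(["H1", "H1", "H2", "H3"]))
```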


References 
Boore JL (1999) Animal mitochondrial genomes. Nucleic Acids 
Research 27(8): 1767-1780.


See also: Mitochondrial DNA (mtDNA); 
Mitochondrial Genome 
DNA, or deoxyribonucleic acid, is the molecule that
serves as the genetic material in living cells. In the
reproductive process characteristic of all life, a copy
of the parent cell’s DNA is passed to the next
generation. The DNA contains information necessary for
construction of the daughter organism. 

Although DNA was first identified in the nucleus 
of cells as early as the late nineteenth century, its role 
as genetic material was not established until 1944. In 
that year, Oswald Avery and colleagues reported that, 
when introduced into certain bacterial cells, highly 
purified preparations of DNA were able to cause a 
visible change in the cell surface. Thus, the outward 
appearance of the cells, or phenotype, was shown to 
depend upon the presence of a specific molecule, 
DNA. This finding marked the beginning of the field 
of molecular genetics and provided an explanation in 
chemical terms for the classical genetic experiments of 
Gregor Mendel and others. 

DNA consists of a repetitive chemical structure in 
which phosphate groups are alternately linked with
deoxyribose sugars to form the backbone of the mol- 
ecule. However, joined to each sugar is an additional 
chemical group known as a base, which may be one of 
four types: adenine (A), thymine (T), guanine (G), or 
cytosine (C). The combination of a phosphate, sugar, 
and base is called a nucleotide (Figure 1). A human 
chromosome may contain in excess of 100 million
nucleotides and can be described by the sequence of
the bases in a single long strand. In 1953 James Watson 
and Francis Crick, relying on the experimental data of 
Rosalind Franklin and Maurice Wilkins, discovered 
that two such DNA strands are paired to create a 
double-helical structure, with the individual chains 
coiled around each other (see DNA Structure). The 
second strand contains a complementary base sequence 
to the first, in accordance with base-pairing rules: A in 
the first strand is always paired with T in the second, 
while G is always paired with C (Figure 2). Thus, if the 
sequence of one strand is known, that of the second, 
complementary strand is immediately specified as well. 
The overall helical structure of DNA is independent of 
the particular sequence of its component nucleotides. 
This is possible because the A-T base pair is nearly 
identical in size and shape to the G-C base pair. 
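The base-pairing rule above is enough to derive one strand from the other. A minimal sketch; by the usual convention both strands are written 5' to 3', and because the two strands of the duplex run in opposite (antiparallel) directions, the partner strand is the complement read in reverse. The function name is illustrative.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def complementary_strand(strand: str) -> str:
    """Partner strand written 5'->3', given one strand written 5'->3'.

    Pairs A with T and G with C, then reverses the result because
    the two strands of the duplex are antiparallel.
    """
    return strand.upper().translate(COMPLEMENT)[::-1]

print(complementary_strand("ATGC"))  # GCAT
```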

The diameter of the double helix is about 2 nm, or 
two-millionths of a millimeter. However, its length 
greatly exceeds the dimensions of the cell nucleus, 
which is only about 1 µm in diameter, or one-thousandth of a millimeter.
The total length of DNA found inside the nucleus of 
human cells is estimated to be 1.8m. Because the 
length of the DNA is approximately 1.8 million times 
greater than the diameter of the nucleus, it must be very 
tightly folded and packaged. This is accomplished by 
formation of the large macromolecular assemblies 
known as chromosomes. In addition to DNA, chro- 
mosomes also contain small, positively charged pro- 
teins known as histones. The negatively charged 
phosphate groups of the DNA bind to the histones, 
facilitating the necessary wrapping and tight compac- 
tion of the polynucleotide chains. Human somatic 
cells contain 46 chromosomes arranged in 23 pairs. 


Figure 1 Chemical structure of part of a DNA
molecule, showing the phosphate-sugar-phosphate
backbone linkages. The base shown joined to the sugar
is adenine (A). The sugar-phosphate backbone is
identical throughout the entire molecule, while the base 
linked to each sugar may be either A, G, C, or T. 


Figure 2 The two types of base pairs commonly 
found in DNA. Dotted lines indicate hydrogen-bonding 
interactions between the bases. 


The double-helical structure of DNA explains its 
dual functions of copying itself and providing the in- 
formation necessary for the construction of the cellular 
machinery. Replication is the biochemical process by 
which DNA dictates the synthesis of progeny daugh- 
ter molecules identical to itself. In the replication 
mechanism, the two strands of DNA unwind, and a 
complementary strand is synthesized on each to yield 
two daughter duplexes. The information transfer 
needed for construction of progeny organisms is 
accomplished through the genetic code, by which 
sets of three nucleotides in DNA are associated with 
specific amino acids that are then incorporated into 
proteins. By virtue of its ability to specify its own 
replication as well as the synthesis of protein mol- 
ecules that are different in chemical structure, DNA 
may be said to possess both autocatalytic and hetero- 
catalytic properties. 

Genes are short segments of DNA ranging in size 
from hundreds of base pairs to tens of thousands of 
base pairs, which contain the information encoding 
specific proteins. A very complex cellular machinery 
exists to “read out” or express the information encoded 
in the specific sequence of the base pairs. In this two- 
part process, the DNA duplex is first partially 
unwound, and a molecule of ribonucleic acid (RNA) 
is synthesized from one of the DNA strands, begin- 
ning at the start and terminating at the end of the gene. 
RNA contains only very small chemical differences 
compared with DNA, but it exists primarily in a single-
stranded form rather than as a duplex. Except for the
substitution of the closely related uracil (U) base in 
place of T, the RNA produced in this transcription 
process has an identical sequence to one strand of the
duplex DNA parent molecule. In the second stage of 
gene expression, this messenger RNA (mRNA) is 
transported out of the cell nucleus to the cytoplasm, 
where the synthesis of protein occurs on an enzyme 
known as the ‘ribosome.’ The process of protein 
synthesis from mRNA templates is also known as 
‘translation.’ 
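The two stages of gene expression described above can be sketched in a few lines. This minimal sketch assumes the standard genetic code, a coding-strand sequence that begins at a start codon, and one-letter amino-acid abbreviations; it ignores introns, splicing and the biochemistry of RNA polymerase, and the function names are illustrative.

```python
# Standard genetic code, with codons enumerated in the order T, C, A, G
# at the first, second and third positions ('*' marks a stop codon).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

def transcribe(coding_strand: str) -> str:
    """mRNA carries the coding-strand sequence with U in place of T."""
    return coding_strand.upper().replace("T", "U")

def translate(mrna: str) -> str:
    """Read successive non-overlapping triplets until a stop codon."""
    protein = []
    for start in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[start:start + 3].replace("U", "T")]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)

mrna = transcribe("ATGGCCTAA")
print(mrna)             # AUGGCCUAA
print(translate(mrna))  # MA
```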

Genes make up only a small proportion of the total 
DNA in a human cell. The majority of the DNA, 
some of which contains repetitive sequences of 
unknown function, does not encode proteins. Indeed, 
in any given cell, even the majority of genes are not 
expressed. The specific combination of genes from 
which mRNA and protein are synthesized is charac- 
teristic of a given cell type and leads to construction of 
highly specialized cells such as those found in muscle 
and in the nervous system. The regulation of gene 
expression is largely accomplished by the action of 
proteins that bind to the DNA duplex and either 
activate or repress the synthesis of mRNA from cer- 
tain genes. Thus, while the overall helical structure of 
duplex DNA is independent of the base sequence, 
small chemical differences among the DNA base 
pairs are sufficient to allow proteins to distinguish 
where specific genes begin. 

The processes of replication, transcription, and 
gene regulation show how the structure of the 
DNA duplex is eminently suitable for performing its 
biological role as the genetic material. The overall 
structure-function relationships established for 
DNA remain the focus of active investigation to this 
day and provide the central conceptual basis for the 
explosion of scientific knowledge that forms the 
underpinning for breakthroughs in biotechnology 
and medicine. 




See also: DNA, History of; DNA Structure; 
Genetic Code 
DNase is an enzyme that digests DNA.


See also: DNA Structure; Endonucleases 
DNA-binding proteins serve two principal functions: to organize and compact the chromosomal DNA, and to regulate and effect the processes of transcription, DNA replication, and DNA recombination. The organization of chromosomal DNA is accomplished by abundant proteins that can bind to many sites and either lack any sequence specificity or have minimal sequence-recognition requirements. In enterobacteria this function is performed by highly abundant proteins including the factor for inversion stimulation (FIS), H-NS, Dps, and HU, and in eukaryotic nuclei by the histone octamer and the linker histones H1 and H5. By contrast, regulating the enzymatic processes that manipulate DNA requires precise targeting to particular DNA sequences, which involves specific recognition of the base sequence by one or more proteins. Such proteins act genetically as repressors or activators, either by themselves or in combination with corepressors or coactivators. These targeting proteins act in concert with the enzymes that act on DNA in transcription, replication, and recombination. This class includes such diverse entities as the RNA polymerases, the DNA polymerases, and the enzymes that effect recombination, for example, the invertases and resolving enzymes. In addition to enzymes of this type there are others, such as restriction endonucleases, in which specific sequence recognition is combined with an enzymatic function. Other DNA-binding enzymes recognize chemical and structural modifications of DNA. These include the demethylases, which remove methyl groups from DNA, and processive single-strand exonucleases such as the phage lambda and T7 exonucleases.


DNA-Binding Domains (DBDs) 


DNA binding is specified by a large number of disparate, and often modular, protein motifs. Within any particular class of motif the degree of sequence selectivity is highly variable. For example, among proteins with the helix-turn-helix motif, the lac repressor shows a high degree of sequence specificity, whereas the FIS protein has little. Commonly encountered types of motif and their properties are listed in Table 1.

Some proteins contain more than one distinct DNA-binding motif, which allows them to make different types of interaction with a single binding site. One example is the LEF-1 transcription factor, in which sequence-specific binding is mediated by an HMG domain binding in the minor groove, and further charge neutralization is accomplished by the binding of a short basic region in the opposite major groove. Conversely, in the Escherichia coli Hin invertase, sequence-specific interactions are mediated by a helix-turn-helix domain binding in the major groove, and further interactions are made by a short extended peptide in the floor of an adjacent minor groove.


Table 1 Commonly encountered types of motif and their properties

Motif                     Recognition               Specificity            Examples
Histone fold              DNA backbone              None                   Core histones, TBP-associated factors (TAFs)
HMG domain                Minor groove              Variable               HMG1, lymphoid enhancer-binding factor (LEF-1)
AT-hook domain            Minor groove              A/T-rich sequences     HMG-I(Y)
TBP domain                Minor groove              High (TATAAA)          TATA-box binding protein (TBP)
HU class                  Minor groove              Variable               HU, IHF
Helix-turn-helix class    Major groove              Highly variable        FIS, phage lambda cI repressor
  Homeodomain                                       Moderate               Hox proteins, Drosophila Repo
  Winged helix                                      Highly variable        Histones H1 and H5, hepatocyte nuclear factor 3
  POU domain                                                               OCT-1
Zn-containing motifs
  Zinc finger             Major groove              Variable, can be high  TFIIIA
  Receptor DBD            Major groove                                     Estrogen receptor
  Gal4 DBD                Major groove                                     Gal4
  GATA                    Major and minor grooves                          GATA-1
bZip                      Major groove              High                   GCN4, c-Fos
Rel                       Major groove              High                   NF-κB

DBD, DNA-binding domain; HMG, high mobility group. The HMG proteins are a disparate group of abundant nuclear DNA-binding proteins: the HMGA group contains an AT-hook, the HMGB group contains a canonical HMG domain, and the HMGN group (not listed above) binds to core nucleosome particles.
Typically a DNA sequence recognition motif recognizes a sequence of 3-5 bp. This is insufficient to allow highly selective discrimination between all sequences in a genome. In practice the effective site size for recognition can be increased by using a larger protein assembly or, as in the POU domain, by combining two sequence-specific DNA-binding motifs in the same polypeptide. Larger assemblies often comprise a stable homo- or heterodimer (as in helix-turn-helix and bZip proteins). In other cases, however, cooperative interactions between proteins bound at contiguous or distant DNA sites are required for the formation of an assembly. One example is the cooperative binding of lambda cI repressor dimers to the leftward and rightward operators in lambda DNA, each of which contains three cI binding sites. Because stable binding to any two of these sites depends on interaction between separate dimers, occupation is sensitive to small changes in repressor concentration over a certain range.

Interactions can also occur between distant binding sites on the DNA, such that simultaneous occupation generates a loop of intervening DNA. Loop formation of this type may require only a single stable protein assembly (as in the case of the lac repressor tetramer, which contains four helix-turn-helix motifs) or may require cooperative interactions between proteins bound at the separate sites (as has been postulated for interactions between eukaryotic enhancer and promoter elements). When proteins are bound at distant sites there is frequently a requirement that they be bound on the same face of the double helix. One example of this phenomenon is the binding of the E. coli AraC protein to two sites separated by about 230 bp on the araBAD promoter. Altering the separation of these sites by integral numbers of double-helical turns has little effect on loop formation, whereas altering it by 0.5, 1.5, 2.5 turns, etc., severely impairs loop formation and regulatory function. The requirement for binding on the same face of the double helix arises because the torsional rigidity of DNA prevents the untwisting or overtwisting that would be needed to bring the two proteins into the appropriate spatial register when they are initially bound on opposite faces of the duplex.

The presence of more than one DNA-binding motif in a protein can also permit interactions with more than one DNA duplex. A good example is the globular domain of histones H5 and H1 (Figure 1). These domains comprise a winged helix variant of the helix-turn-helix motif, in which the recognition helix is inserted in the major groove of one DNA duplex. The interaction with this duplex is further stabilized by a highly conserved lysine residue. In addition, however, the H1 and H5 winged helix domains contain a second basic patch on the face of the protein opposite the first DNA-binding surface. This second DNA-binding region can bind to an adjacent turn of DNA on the nucleosome core particle or, when free in solution, to a second DNA duplex.
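Two quantitative points in this passage, that a 3-5 bp site cannot be unique in a genome and that only half-turn changes in spacing put two bound proteins on opposite faces of the helix, can be checked with back-of-the-envelope arithmetic. The sketch below assumes a random genome with equal, independent base frequencies (real genomes only approximate this) and a B-DNA helical repeat of about 10.5 bp per turn, a standard textbook value not given in the text; the function names and the genome size are illustrative.

```python
HELICAL_REPEAT_BP = 10.5  # approximate bp per turn of B-DNA in solution (assumed value)


def expected_occurrences(site_len: int, genome_bp: float) -> float:
    """Expected chance matches of a non-palindromic site, counting both strands.

    Assumes the four bases are equally frequent and independent.
    """
    return 2 * genome_bp / 4 ** site_len


def phase_offset_degrees(spacing_change_bp: float,
                         repeat_bp: float = HELICAL_REPEAT_BP) -> float:
    """Angle (0-180 degrees) by which changing the spacing rotates one site relative to the other."""
    angle = ((spacing_change_bp / repeat_bp) % 1.0) * 360.0
    return min(angle, 360.0 - angle)


if __name__ == "__main__":
    genome_bp = 3e9  # illustrative, roughly the size of a mammalian genome
    for n in (4, 5, 10, 15):
        print(f"{n}-bp site: ~{expected_occurrences(n, genome_bp):,.0f} chance matches")
    for turns in (1.0, 1.5, 2.0, 2.5):
        extra_bp = turns * HELICAL_REPEAT_BP
        print(f"+{turns} turns ({extra_bp:.1f} bp): offset {phase_offset_degrees(extra_bp):.0f} deg")
```

Under these assumptions a 5-bp site is expected millions of times in a genome of a few billion base pairs, and integral-turn changes give an offset of 0 degrees while half-integral changes give 180 degrees, matching the AraC observations described above.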


Protein-DNA Recognition 


The binding of a protein to a specific DNA sequence depends largely on two types of interaction. The principal basis for sequence selectivity is direct contact between the polypeptide chain and the exposed edges of the base pairs, primarily in the major groove of B-DNA. These contacts may involve either hydrogen bonds or van der Waals interactions, the latter particularly with the methyl group of thymine. Small molecules such as water, when tightly and rigidly bound to a protein and thus integral components of the macromolecular structure, may also participate in these interactions and so confer binding specificity on the protein by proxy.

The binding energy available from direct interactions with the base pairs, although significant, is not in general sufficient by itself to allow the formation of a stably bound complex for binding sites of average length (6-15 bp). The additional binding energy required may be provided by direct electrostatic interactions between basic amino acid residues and the negatively charged sugar-phosphate backbone. The spatial constraints imposed by this type of interaction may also restrict the configuration of the DNA when bound to protein. The sequence selectivity of a DNA-binding protein is measured by the difference between the binding energies of the sequence-dependent and sequence-independent components of the interaction.

Although DNA charge neutralization is usually necessary for compaction, not all proteins that compact DNA are positively charged. The abundant E. coli nucleoid-associated protein Dps, for example, is, like DNA, negatively charged, and is believed to mediate DNA compaction by facilitating the formation of a bridge of positively charged ions between itself and the DNA backbone.

Figure 1 (See Plate 10) Two DNA-binding sites on the globular domain of histone H5. The recognition helix lies in the major groove of DNA at the top of the molecule.

Figure 2 (See Plate 11) An HMG domain induces a bend of ~95° in DNA. (Courtesy of Dr A. Hillisch, IMB, Jena, Germany.)
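The measure of sequence selectivity described above, the difference between the sequence-dependent and sequence-independent binding energies, can be turned into a ratio of binding constants using the standard relation K = exp(-ΔG/RT). The following is a minimal sketch, assuming free energies in kcal/mol (more negative meaning tighter binding); the free-energy values in the example are hypothetical and chosen only for illustration.

```python
import math

R_KCAL_PER_MOL_K = 1.987e-3  # gas constant in kcal/(mol*K)


def selectivity_ratio(dg_specific: float, dg_nonspecific: float,
                      temp_k: float = 298.15) -> float:
    """K_specific / K_nonspecific implied by two binding free energies (kcal/mol).

    Since K = exp(-dG/RT), the ratio depends only on
    ddG = dG_specific - dG_nonspecific.
    """
    ddg = dg_specific - dg_nonspecific
    return math.exp(-ddg / (R_KCAL_PER_MOL_K * temp_k))


if __name__ == "__main__":
    # Hypothetical free energies for a specific site and a random site.
    print(f"{selectivity_ratio(-15.0, -9.0):.2e}")
```

Because the ratio is exponential in ΔΔG, a few kcal/mol of extra sequence-dependent binding energy translates into a preference of several orders of magnitude for the specific site.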

In many DNA-protein complexes the DNA is distorted from the canonical B-form structure. Such distortions can involve substantial DNA bending, DNA untwisting, or a combination of both. DNA bends can be induced by a variety of mechanisms: spatial constraint on a rigid protein surface (the histone octamer, factor for inversion stimulation (FIS)), the insertion of hydrophobic residues between adjacent bases in the DNA duplex (HMG domain proteins, TATA-binding protein, lac repressor, integration host factor (IHF)), and charge neutralization on one face of the double helix (histone octamer, cAMP receptor protein (CRP)). DNA bending accompanies DNA packaging in chromatin and is also necessary for bringing transcription factors bound at separate sites on a linear DNA into close spatial proximity. Proteins performing this latter function often bend the DNA substantially: for example, the heterodimeric IHF can induce a bend of 180° over one double-helical turn, while the HMG domain induces a bend of ~90-100° over six base pairs (Figure 2).

In addition to proteins that induce distortions of DNA structure, certain DNA-binding proteins recognize lesions in DNA, such as UV-induced pyrimidine dimers, cisplatin adducts, and single-stranded nicks, as well as enzymatically generated structures including four-way junctions and fork junctions. One example is the A domain of HMG1, which specifically recognizes a cisplatin adduct by binding in the minor groove and inserting a phenylalanine residue into the cisplatin-induced kink between two adjacent guanine bases.


See also: DNA Structure 
Definition


The word ‘clone’ comes from the ancient Greek, meaning ‘bud’ or ‘twig.’ It was introduced into biomedical science to describe a group of genetically identical organisms. The members of a clone (often themselves called clones) are expected to be identical because they descend from a single organism. By ‘cloning’ we mean the ability to generate several identical copies of (1) a DNA molecule (‘molecular cloning,’ achieved either in vivo or in vitro), or (2) a cell, either eukaryotic or prokaryotic (‘cellular cloning’). Recent attempts to clone fully developed mammals have stirred interest among scientists and emotions among laymen: in this case the term ‘somatic’ or ‘reproductive cloning’ is often used, since the starting materials are the nuclei of somatic, possibly differentiated, cells and the aim is to generate copies of an entire organism.


History 


All the information and techniques necessary to identify and retrieve a DNA fragment or gene of interest from any genome have essentially been available since the 1970s. In the early 1970s several techniques were developed that launched molecular cloning, also known as ‘genetic engineering’ or ‘recombinant DNA technology.’ Briefly, they allowed the covalent joining (or ligation) of any given fragment of DNA (called an ‘insert’) contained in an unresolved mixture (such as that produced by a restriction endonuclease acting on genomic DNA) to a replication-competent and selectable DNA element, known as a ‘molecular vector.’ When these ligated products were introduced (or transferred) into host cells, each could be propagated separately in the host cell that received it.

The first experiment showing that eukaryotic DNA could be persistently propagated in bacteria was the cloning of the ribosomal RNA genes of a frog (Xenopus laevis) in Escherichia coli. The potential of cloning for the study of genes and for industrial development was immediately foreseen. Production of somatostatin (a hypothalamic hormone) was one of the first examples of hormone production by bacteria.

The technique was soon extended to plant and animal systems, leading to the creation of what were later to be known as ‘transgenic organisms’: organisms carrying, in their germ cells as well as their somatic cells, genes foreign to their evolutionary history (exogenous genes). Transgenic organisms were thus capable both of expressing the foreign gene (transgene) in some or all of their somatic cells and of passing it down to ensuing generations through their germ cells. In most cases the transgene was integrated randomly into the host genome, so that its overall effect on the well-being of the host could not be fully predicted; targeted transgene insertions are difficult to achieve. Although 25 years of research have produced significant accomplishments, the often invoked ‘replacement’ of defective human genes by their correct versions remains a dream for future generations of gene therapists.

We will now examine the purpose of the cloning 
experiments and how to perform them. The cloning 
strategies, hosts and vectors will depend on the goals 
to be achieved and on the systems employed for that 
purpose. 


Cloning Objectives 
Three major objectives can be considered: 


1. Production of large amounts of a DNA sequence. 

2. Study of a single gene product (RNA and/or protein).

3. Construction of transgenic organisms carrying and 
expressing a transgene. 


Production of Large Amounts of a DNA Sequence

One of the main purposes of cloning is to collect large amounts of DNA fragments in genomic libraries. These may be needed for sequencing projects, for biophysical characterization of DNA structural features or DNA-ligand interactions, or as starting materials for further specific cloning applications. Genomic libraries are stocks of host cells, each harboring a recombinant structure that contains a single genomic fragment, different in each transformed cell. There are specialized libraries for DNA fragments corresponding to single chromosomes, and for DNA complementary to messenger RNA (cDNA libraries). These libraries are the starting material for gene or genome sequencing projects.
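How large a genomic library must be is usually estimated with the Clarke-Carbon formula, which the text does not give: N = ln(1 - P) / ln(1 - f), where P is the desired probability that any given sequence is represented and f is the insert size as a fraction of the genome. The sketch below assumes randomly distributed inserts of uniform size; the genome and insert sizes in the example are illustrative only.

```python
import math


def clones_needed(probability: float, insert_bp: float, genome_bp: float) -> float:
    """Clarke-Carbon estimate of the number of random clones needed so that a
    given sequence is in the library with the stated probability.

    N = ln(1 - P) / ln(1 - f), with f = insert size / genome size.
    """
    if not 0 < probability < 1:
        raise ValueError("probability must be strictly between 0 and 1")
    fraction = insert_bp / genome_bp
    if not 0 < fraction < 1:
        raise ValueError("insert must be smaller than the genome")
    return math.log(1 - probability) / math.log(1 - fraction)


if __name__ == "__main__":
    # Illustrative values: a ~4.6 Mb bacterial genome, 20 kb inserts, 99% probability.
    print(math.ceil(clones_needed(0.99, 20_000, 4_600_000)))
```

The estimate grows with the desired probability and shrinks as inserts get larger, which is one reason vectors accepting very large fragments are preferred for mapping whole genomes.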

The host of choice for sequencing projects is E. coli. Amplification of a large amount of DNA requires specialized vectors present at high copy number per cell (multicopy plasmids or phages). For some goals, such as genomic mapping, the vectors must be able to accommodate very large DNA fragments (see ‘Preparation of the Vector’ below).

Genomic libraries have made possible the complete determination of the DNA sequences of tens of bacterial genomes (many of them pathogens) and of the entire yeast genome and, with ever-improving sequencing technologies, of the genomes of a worm (Caenorhabditis elegans), an insect (Drosophila melanogaster), a plant (Arabidopsis thaliana), and ultimately the human genome.


Study of a Single Gene Product 

Another goal of cloning is the study of single genes. Cloning techniques have been crucial for isolating genes, for unraveling their regulation and expression, and for biophysical studies of specific DNA molecules. The construction of a very large number of specialized vectors, and the development of techniques for delivering them to particular hosts, have made this genetic approach very powerful. Vectors have been constructed that can replicate in a variety of host cells, ranging from bacteria to yeast and many other types of eukaryotic cells. So-called shuttle vectors can replicate in two different hosts, for example, in E. coli and in a mammalian host. Such vectors carry two origins of replication: one that allows the recombinant structure to replicate in E. coli (the favored host for DNA manipulation because it is easy to handle), and a second one specific for replication in the host of the planned study.

Specialized vectors have been designed for sustained expression of the transgene, allowing functional analysis of the transcription product (mRNA) and/or the translation product (protein). Both goals require placing the transgene next to host-specific transcriptional signals (e.g., a promoter) and translational signals (e.g., a ribosome binding site). Where possible, codons are chosen to optimize synthesis of the protein in the given host, and the protein may be modified for either extracellular export or periplasmic compartmentalization, so increasing its function, yield, and recovery. Fusing the transgene to tags that lend themselves to easy isolation can further favor rapid purification of the desired gene product, e.g., by affinity chromatography. Dissection of genes into parts has allowed functional studies of protein domains, and mutagenesis techniques have been designed to make precise nucleotide changes in transgenes in order to analyze or improve their function further.
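Codon choice and tag fusion can both be illustrated as simple string operations on a coding sequence. This is a minimal sketch, assuming a hypothetical expression host whose "preferred codon" for each amino acid is given in a small table; every codon listed is valid for its amino acid, but the preferences themselves are illustrative, not real usage data. The hexahistidine tag is one common affinity tag, chosen here only as an example.

```python
# Hypothetical one-codon-per-residue table for an imaginary host (illustrative only).
PREFERRED_CODON = {
    "M": "ATG", "K": "AAA", "L": "CTG", "E": "GAA",
    "A": "GCG", "G": "GGC", "S": "AGC", "H": "CAT",
}
STOP_CODON = "TAA"
HIS_TAG = "HHHHHH"  # hexahistidine tag, purifiable by metal-affinity chromatography


def back_translate(protein: str, table=PREFERRED_CODON) -> str:
    """Return a DNA coding sequence using the table's codon for each amino acid."""
    try:
        return "".join(table[aa] for aa in protein.upper())
    except KeyError as missing:
        raise ValueError(f"no codon listed for residue {missing}") from None


def tagged_construct(protein: str) -> str:
    """Coding sequence for the protein fused to a C-terminal His tag, then a stop codon."""
    return back_translate(protein + HIS_TAG) + STOP_CODON


if __name__ == "__main__":
    print(tagged_construct("MKLEAGS"))  # hypothetical short peptide
```

Real codon optimization weighs codon usage frequencies, mRNA structure and other factors; this single-codon lookup only shows the idea of rewriting a sequence for a chosen host.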

Studies of the regulation of gene expression have been made possible by cloning promoter regions and fusing them to reporter genes, which are easily detected because they produce colored or fluorescent products. Expression of a gene may also be placed under the control of an inducible promoter: this delays the appearance of a product that, if present during earlier growth phases, might interfere with host cell functions. More recent advances in our understanding of gene regulation have led to the development of systems better suited to studying interactions among proteins in vivo, particularly in yeast. In addition, cloning techniques have in some cases allowed replacement of the wild-type gene with mutated forms carried on specialized vectors, e.g., ‘knockout’ of predetermined genes by homologous recombination in yeast and mouse. In these systems, a conceptually similar procedure called ‘gene knockin’ can restore the original function in specific mutants.


Transgenic Organisms 

The most recent development of cloning techniques has been the production of higher plants and animals that carry exogenous genes capable of being transmitted to their descendants. This achievement has drawn on all the techniques of gene manipulation (detailed below), with an additional step required to insert the new gene into the germ cell chromosome(s) of the organism, so that it will usually be transmitted through the generations.

Plants are readily amenable to such manipulations with the help of natural vectors, for instance the Ti (tumor-inducing) plasmid of the bacterium Agrobacterium tumefaciens, part of which is naturally transferred to the plant genome. Sheep and cattle have been genetically manipulated to produce hormones encoded by genes placed under the control of mammary gland transcriptional signals: the gene product is therefore secreted in the milk, allowing easy purification.

Gene therapy, which raises great expectations for curing genetically transmitted diseases as well as other conditions such as cancer or AIDS, is a form of cloning. Viral vectors are currently used to introduce the normal gene (or one endowed with therapeutic potential) to compensate for the deficient one. This is usually done in somatic cells that are extracted from the patient, manipulated, and finally reimplanted in the patient (‘ex vivo treatment’). Great care is taken to prevent the recombinant structures from finding their way into organs or tissues not under treatment. In view of the many unknowns of even the best gene therapy protocols, it is of paramount importance to prevent the transplanted genes from entering the germline and being passed to the next generations.


Strategies of Molecular Cloning 


In spite of the large variety of cloning objectives and of the ways to achieve them, the basic steps in molecular cloning are similar:

1. preparation of the insert and vector;
2. ligation;
3. transformation;
4. selection of the transformants;
5. screening of the clones.

These steps are depicted in Figure 1; very comprehensive presentations can be found in Winnacker (1987), Old and Primrose (1989), and Micklos and Freyer (1990).


Preparation of the Insert 

The first step is the preparation of sufficient amounts of the DNA fragment to be cloned. For simple organisms this can be achieved by digesting genomic DNA with restriction enzymes. Most restriction enzymes recognize short (4-6 bp), frequently palindromic DNA sequences and cleave them (or an immediately adjacent sequence), often in a staggered way. This method generally produces DNA fragments ranging from a few hundred to several thousand base pairs, depending on the restriction enzyme used (Figure 2).
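The link between the length of a recognition site and the size of the fragments it produces, and what "palindromic" means for DNA, can be sketched as follows. This is an illustration under stated assumptions: a site occurs on average once every 4^n bp only in random sequence with equal base frequencies, and the digest function (a hypothetical helper) cuts the top strand alone, ignoring the staggered cut on the bottom strand. EcoRI's site GAATTC and its cut after the G are real; the sequences are made up.

```python
from typing import List

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def is_palindromic(site: str) -> bool:
    """A DNA 'palindrome' reads the same 5'->3' on both strands."""
    return site == reverse_complement(site)


def digest_top_strand(seq: str, site: str, cut_offset: int) -> List[str]:
    """Split a linear top strand wherever `site` occurs, cutting `cut_offset`
    bases into the site (assumes occurrences of the site do not overlap)."""
    fragments, start = [], 0
    i = seq.find(site)
    while i != -1:
        fragments.append(seq[start:i + cut_offset])
        start = i + cut_offset
        i = seq.find(site, i + 1)
    fragments.append(seq[start:])
    return fragments


def mean_fragment_length(site_len: int) -> int:
    """Average spacing of a site in random sequence with equal base frequencies."""
    return 4 ** site_len


if __name__ == "__main__":
    print(is_palindromic("GAATTC"))                                  # EcoRI site
    print(digest_top_strand("AAGAATTCTTGAATTCCC", "GAATTC", 1))     # EcoRI cuts G^AATTC
    print(mean_fragment_length(4), mean_fragment_length(6))
```

The 4^4 = 256 and 4^6 = 4096 bp averages match the "few hundred to several thousand base pairs" range in the text, although real genomes deviate from these values because their base composition is not random.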

For more complex organisms there are two well-established approaches: (1) to clone expressed genes, mRNA can be used in vitro as a template for reverse transcriptase and DNA polymerase to generate a double-stranded cDNA; (2) large amounts of a given fragment can be obtained from any source by amplification with a technique known as the polymerase chain reaction (PCR).
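Both routes can be summarized with a small sketch, one helper per route. The first assumes an idealized PCR in which each cycle multiplies the target by (1 + efficiency), with efficiency = 1 meaning perfect doubling; real reactions run below this and eventually plateau, which the model ignores. The second shows that the first cDNA strand made by reverse transcriptase is the reverse complement of the mRNA. Function names are illustrative.

```python
def pcr_copies(initial_copies: float, cycles: int, efficiency: float = 1.0) -> float:
    """Idealized PCR yield: each cycle multiplies the target by (1 + efficiency)."""
    return initial_copies * (1.0 + efficiency) ** cycles


# RNA base -> complementary DNA base (U pairs with A).
RNA_TO_CDNA = str.maketrans("ACGU", "TGCA")


def first_strand_cdna(mrna_5to3: str) -> str:
    """DNA complementary to the mRNA, written 5'->3' (i.e., its reverse complement)."""
    return mrna_5to3.upper().translate(RNA_TO_CDNA)[::-1]


if __name__ == "__main__":
    print(f"{pcr_copies(1, 30):.2e} copies after 30 ideal cycles from one template")
    print(f"{pcr_copies(1, 30, efficiency=0.9):.2e} copies at 90% efficiency")
    print(first_strand_cdna("AUGGCU"))  # hypothetical short mRNA
```

The exponential term explains why PCR can yield large amounts of a fragment from very little starting material, and why modest drops in per-cycle efficiency compound over many cycles.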

Physical methods (such as sonication or shearing) are also used to prepare DNA for cloning; these methods can generate fragments of hundreds or thousands of base pairs. Whatever the technique used, the ends of the DNA fragments must be made compatible with the ends of the vector for the ligation step to be possible (see below).


Preparation of the Vector 

The most commonly used vectors have been plasmids. These are extrachromosomal elements able to replicate within their hosts abundantly and independently of the replication of the host chromosome. A large number of vectors are derived from bacterial plasmids, particularly those of E. coli. Many of them have been modified in vitro to accomplish their task more efficiently. They are generally (1) small, 3-5 kb; (2) present in large numbers (up to 500 copies) per cell; (3) carriers of a gene for resistance to one or more antibiotics, so that cells harboring them can be positively selected; and (4) endowed with several unique restriction sites to facilitate insertion of the desired gene.

Figure 1 The steps of a cloning experiment: (A) vector and insert preparation; (B) vector and insert ligation; (C) transformation; (D) selection of the transformation mixture; (E) screening of the clones. Shaded areas indicate the ends of the linearized vector or of the insert; ori indicates the origin of replication of the vector; marker indicates resistance to an antibiotic. In (C), chromosomal DNA inside the host cell is not shown and only one copy of recombinant DNA is indicated.
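The requirement that a vector carry unique restriction sites can be checked computationally by counting each enzyme's sites around the circular plasmid and keeping those that occur exactly once. The sketch below uses the real recognition sequences of a few enzymes named in this article. Because all of these sites are palindromic, scanning one strand is sufficient. The toy plasmid sequence and the helper names are hypothetical.

```python
# Recognition sequences of a few common enzymes (all palindromic).
SITES = {
    "EcoRI": "GAATTC",
    "BamHI": "GGATCC",
    "HindIII": "AAGCTT",
    "PstI": "CTGCAG",
    "BglII": "AGATCT",
    "EcoRV": "GATATC",
}


def count_sites_circular(plasmid: str, site: str) -> int:
    """Count occurrences of a site in a circular sequence, including one that
    spans the arbitrary start/end point of the written sequence."""
    n = len(plasmid)
    search_space = plasmid + plasmid[: len(site) - 1]  # wrap around the circle
    count, i = 0, search_space.find(site)
    while i != -1 and i < n:
        count += 1
        i = search_space.find(site, i + 1)
    return count


def unique_cutters(plasmid: str, sites=SITES) -> list:
    """Names of enzymes that cut the circular plasmid exactly once."""
    seq = plasmid.upper()
    return [name for name, site in sites.items() if count_sites_circular(seq, site) == 1]


if __name__ == "__main__":
    toy_plasmid = "GAATTCAAAAGGATCCAAAAGGATCCAAAA"  # hypothetical toy sequence
    print(unique_cutters(toy_plasmid))
```

An enzyme that cuts once linearizes the vector at a single position and so opens one place for the insert, whereas an enzyme that cuts twice would also excise part of the vector.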

Plasmids can easily accommodate DNA fragments up to 2-3 times their own size. Another vector frequently used for cloning in E. coli is phage lambda, which has been modified to allow the insertion of relatively large (10-15 kb) fragments. Other phages, mostly phage M13, also considerably modified in vitro, have been used extensively, especially when the main purpose of cloning was to sequence the fragment. Some eukaryotic vectors, such as the 2 µm plasmid of yeast or systems based on simian virus 40 (SV40), have been used essentially to clone transgenes in eukaryotic cells. In more recent years, cosmids (hybrid structures constituted partly of plasmid and partly of phage sequences), yeast artificial chromosomes (YACs), and bacterial artificial chromosomes (BACs) have been used to clone fragments in the range between 0.05 and 2 Mb, mostly for large sequencing projects.


Ligation Step 

This step is generally catalyzed by phage T4 DNA ligase, prepared from phage-infected E. coli cells. Immediately upon its discovery it was postulated that this enzyme could only form a phosphodiester bond between a 5′-phosphate end and a 3′-hydroxyl end of two polynucleotide chains held in juxtaposition by a complementary sequence acting as a template (Figure 2A). It was subsequently shown to ligate two different DNA duplexes having short, complementary single-stranded ends (Figure 2B). Finally, despite some initial skepticism, it was demonstrated that two DNA duplexes with no protruding ends (called flush or blunt ends) could be ligated to each other, albeit with considerably lower efficiency (Figure 2C).

Figure 2
(A)  5′ G-T-G-A-T*C-T-A-G 3′
     3′ C-A-C-T-A-G-A-T-C 5′
(B)  5′ G-T-C-T-G *
     3′ C-A-G-A-C-T-T-A-A
(C)  5′ G-T-T-C-G-A-T *
     3′ C-A-A-G-C-T-A *
(A) Polynucleotide chain interrupted on a single strand at the phosphodiester bond between T and C. The reaction catalyzed by the ligase is indicated by an asterisk. (B) Two different fragments obtained by digestion with the restriction enzyme EcoRI (which leaves 5′ protruding ends) are indicated. The EcoRI recognition site is shown in bold; the other nucleotides are examples. The reactions catalyzed by the ligase are indicated by two asterisks. (C) Two different fragments obtained by digestion with the restriction enzyme EcoRV (which leaves flush ends) are indicated. The EcoRV recognition site is shown in bold; the other nucleotides are examples. The reactions catalyzed by the ligase are indicated by two asterisks.

Irrespective of the origins of the DNA fragment and the vector, the ends of both must be compatible for ligation to occur. As indicated above, restriction enzymes cut DNA at specific sequences, frequently in a staggered way: either the 5′ or the 3′ ends may protrude, but always in a complementary way. For cloning, the most favorable case is when both the vector and the fragment have been cut by the same restriction enzyme (for instance EcoRI, PstI, or HindIII) or at least by two enzymes that create complementary protruding ends (for instance BamHI and BglII). Under these circumstances the efficiency of the ligation step is very high. If, however, the vector and the DNA to be cloned have been digested by different enzymes generating non-compatible ends, other steps are taken to make them compatible:


• The simplest procedure is to give both duplexes blunt ends. This can be achieved with several enzymes that either fill in or remove the protruding strand. This ‘polishing’ of the ends of the DNA fragments prior to cloning is almost mandatory when the fragments have been obtained by physical or enzymatic methods (such as sonication or shearing, or PCR amplification) that leave the fragments frayed at their ends and thus unsuitable for joining. After this step all the fragments are suitable for cloning in a vector cut with an enzyme (such as HindII or EcoRV) that leaves flush ends on its DNA substrate.

• Another simple procedure is to perform the ligation in the presence of short synthetic oligonucleotides, called ‘adaptors,’ which are designed to ligate with one terminus to the fragment and with the other to the vector, and are then appropriately cleaved.
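Which ends can be joined directly follows from where each enzyme cuts within its palindromic site. The sketch below derives the end type from the top-strand cut position, using the fact that for a palindromic site the bottom-strand cut mirrors the top one, and treats two ends as compatible when they have the same type and the same protruding sequence. The recognition sites and cut positions are the real ones for these enzymes; the `Enzyme` record and the function names are hypothetical.

```python
from typing import NamedTuple, Tuple


class Enzyme(NamedTuple):
    name: str
    site: str  # palindromic recognition sequence, top strand 5'->3'
    cut: int   # number of bases of the site lying 5' of the top-strand cut


ENZYMES = {
    "EcoRI": Enzyme("EcoRI", "GAATTC", 1),      # G^AATTC
    "BamHI": Enzyme("BamHI", "GGATCC", 1),      # G^GATCC
    "BglII": Enzyme("BglII", "AGATCT", 1),      # A^GATCT
    "HindIII": Enzyme("HindIII", "AAGCTT", 1),  # A^AGCTT
    "PstI": Enzyme("PstI", "CTGCAG", 5),        # CTGCA^G
    "EcoRV": Enzyme("EcoRV", "GATATC", 3),      # GAT^ATC
}


def end_type(enz: Enzyme) -> Tuple[str, str]:
    """Return ("5'", overhang), ("3'", overhang) or ("blunt", "")."""
    n, top = len(enz.site), enz.cut
    bottom = n - top  # mirror-image cut on the bottom strand of a palindrome
    if top < bottom:
        return ("5'", enz.site[top:bottom])
    if top > bottom:
        return ("3'", enz.site[bottom:top])
    return ("blunt", "")


def compatible(a: str, b: str) -> bool:
    """Ends can be ligated directly if they have the same type and overhang."""
    return end_type(ENZYMES[a]) == end_type(ENZYMES[b])


if __name__ == "__main__":
    for name, enz in ENZYMES.items():
        print(name, end_type(enz))
    print("BamHI + BglII:", compatible("BamHI", "BglII"))
    print("EcoRI + BamHI:", compatible("EcoRI", "BamHI"))
```

BamHI and BglII both leave a 5′ GATC overhang, which is why they are given above as an example of different enzymes with complementary ends. All blunt ends come out as mutually compatible, consistent with the "polishing" strategy in the first bullet.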


At the end of the ligation step only a fraction of the 
vector molecules has been ligated to a foreign DNA to 
give rise to a recombinant or hybrid molecule. The 
vector by itself is able to recircularize by ligation of its 
ends and transform cells with an efficiency higher than 
that of the molecules carrying an insert: therefore a 
mixed population of vectors either carrying or devoid 
of an insert has been produced. Selection of the recom- 
binant clones of interest will be performed after trans- 
formation as described below. 


Transformation 
Transformation is the step by which the hybrid molecule comprising the vector and the ligated DNA duplex (as described above) is introduced into the host. Large DNA molecules do not normally enter host cells such as E. coli, except when carried by viruses. The cells therefore have to be subjected to chemical or physical treatments that make them ‘competent’ to accept foreign DNA. The chemical method is based on the observation, made in the 1970s, that treating E. coli cells with calcium chloride (which permeabilizes the cell wall) under cold-shock conditions considerably enhanced the uptake of plasmid DNA by these bacteria. All the improved methods subsequently developed for bacteria are based on this observation. The frequency of transformation is relatively low: even under the most favorable conditions only one cell in 500-1000 takes up the plasmid. However, the efficiency is quite sufficient for cloning, because 1 microgram of vector (containing hundreds of billions of molecules) can easily generate tens or even hundreds of millions of transformed cells. This method is not applicable to other types of cells; eukaryotic cells can be transformed with agents such as calcium phosphate, which coprecipitates DNA and cells, or, better, polyethylene glycol, which is known to favor cell fusion and, in cloning protocols, to permeabilize membranes and thus promote DNA uptake.
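The number of vector molecules in a microgram of DNA follows from the molar mass of the plasmid. This is a minimal sketch, assuming an average mass of roughly 650 g/mol per base pair of double-stranded DNA (some sources use 660); the transformant count in the example is hypothetical.

```python
AVOGADRO = 6.022e23
MEAN_BP_MASS_G_PER_MOL = 650.0  # approximate mass of one base pair of dsDNA (assumed)


def molecules_per_microgram(length_bp: int) -> float:
    """Number of double-stranded DNA molecules of the given length in 1 microgram."""
    grams = 1e-6
    molar_mass = length_bp * MEAN_BP_MASS_G_PER_MOL  # g/mol for this molecule
    return grams / molar_mass * AVOGADRO


if __name__ == "__main__":
    n = molecules_per_microgram(3_000)  # a 3 kb vector, the small end of the 3-5 kb range
    print(f"{n:.1e} molecules per microgram")
    # Hypothetical outcome: 1e8 transformants from that microgram.
    print(f"fraction of molecules yielding a transformant: {1e8 / n:.1e}")
```

For a 3 kb plasmid this gives roughly 3 × 10^11 molecules per microgram, so even when only a small fraction of the molecules produce colonies, a microgram of vector can yield very large numbers of transformants.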

A physical method is based on an electric shock: electroporation. It is very efficient and can be used for all types of cells, including bacteria. For plant cells a very efficient technique has been developed: bombardment of cells or tissue with metal particles coated with the DNA.


Selection of the Transformants 

Since transformation is rather inefficient, only a very
small fraction of the cells receives a plasmid, and these
transformed cells therefore have to be selected. The
selection procedures depend on the nature of the host
cell and on the properties of the vector. For cloning
in bacteria, selection is easily achieved because cells
devoid of the vector are sensitive to antibiotics. As
described under “Preparation of the vector,” most
plasmid vectors carry one or more antibiotic resistance
genes. Host cells that have received the plasmid are
resistant to the antibiotic and grow on an agar medium
containing it in a petri plate, whereas the untransformed
host cells are unable to grow under these conditions.

In yeast, an easy selection relies on nutrient
requirements. The yeast cells used for transformation
carry a mutation in one of the many genes involved in
the biosynthesis of a nutrient (such as an amino acid or
a nucleic acid base) and are therefore able to grow only
when this nutrient is provided in the medium. Yeast
vectors can carry the corresponding wild-type gene, so
that only transformed cells grow when the medium lacks
the nutrient. In plant and animal cells few general
selection procedures are available. Eukaryotic cells are
generally insensitive to the antibiotics used to select
bacteria, so resistance to DNA base analogs or other
drugs is sometimes used instead.


Screening of the Clones 

After ligation, a mixture of recombinant DNAs can be
generated (see above), and after transformation the
cells carrying the recombinant of interest must be
identified. This can be a difficult task. For this reason,
cloning in eukaryotic cells is almost always performed
in two steps: (1) transformation with a shuttle vector
and screening in bacteria (generally E. coli), and (2)
introduction of the selected recombinant plasmid into
its final host. Different screening techniques have been
developed. Some vectors have been designed with
built-in features to detect the presence of the insert:


• insertion of the foreign fragment into an antibiotic
resistance gene inactivates that gene, allowing
recombinants to be identified by their loss of
resistance to this antibiotic (counterselection);

• insertion of the foreign DNA into a coding sequence
whose function makes bacterial colonies colored
inactivates that function, so recombinant colonies
are colorless.


These two methods are very convenient, as they can
be applied to a large population of clones. Other
methods, useful for the analysis of a limited number of
transformants, involve:


• the preparation of plasmid DNA from individual
colonies and detection of the insert from the sizes of
the fragments obtained by restriction analysis; or

• detection of the insert directly in the plasmid DNA
by PCR.
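Restriction analysis reveals an insert through a change in fragment sizes. The minimal sketch below assumes a hypothetical 3,000 bp vector cut by one enzyme at positions 400 and 700, on either side of a cloning site, and a hypothetical 1,200 bp insert placed between those sites. The positions and sizes are illustrative only.

```python
def circular_fragments(length_bp: int, cut_positions: list[int]) -> list[int]:
    """Sizes of the linear fragments produced by cutting a circular DNA molecule.

    Positions are measured from an arbitrary origin on the circle. A single cut
    just linearizes the circle; with no cut there is no linear fragment at all.
    """
    cuts = sorted({p % length_bp for p in cut_positions})
    if not cuts:
        return []
    # Fragments lying between consecutive cut sites ...
    sizes = [b - a for a, b in zip(cuts, cuts[1:])]
    # ... plus the fragment that wraps around the origin of the circle.
    sizes.append(length_bp - cuts[-1] + cuts[0])
    return sorted(sizes, reverse=True)


if __name__ == "__main__":
    vector = circular_fragments(3000, [400, 700])          # empty vector
    recombinant = circular_fragments(4200, [400, 1900])    # vector + 1,200 bp insert
    print(vector, recombinant)  # [2700, 300] [2700, 1500]
```

In this setup the vector backbone fragment stays the same, while the small fragment grows by exactly the length of the insert. A gel of plasmid DNA from individual colonies therefore separates recombinants from empty vectors.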


These last methods, some of them rather laborious, are
suitable only when a limited number of recombinants
has to be screened. Other procedures allowing the
screening of plates containing up to several thousand
recombinants have been developed. In these cases the
colonies are transferred to a nitrocellulose filter, which
is tested for the presence of the insert by hybridization
with a radioactively labeled probe containing the
fragment of interest, or part of it. When the inserted
fragment yields an easily detectable phenotype, or can
complement some host function, the relevant property
can be exploited to identify the correct clone. This last
approach is used mainly in yeast, in the same way as
already described under “Selection of the Transformants.”

At this point, the selected clones are ready for
further structural studies, or may be used as a source
of recombinant plasmid to be introduced into another
specialized cell. Ultimately, the recombinant DNA may
be microinjected into fertilized oocytes and become
incorporated into the genome of the resulting transgenic
organism. A slightly different route to a transgenic
organism involves introducing the recombinant
construct into somatic cells by chemical means. In this
case the transgenic somatic nucleus may be transferred
into an enucleated oocyte and eventually give rise to a
transgenic version of the organism that donated the
somatic cell.


Conclusions 


The ability of cloning to multiply DNA molecules
exponentially, in vivo through vector-mediated
transformation as well as in vitro via PCR, makes it a
step adopted in almost all research protocols in
experimental genetics (Sambrook et al., 1989).
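In PCR, where each cycle can at most double the number of copies, this exponential multiplication is easy to make concrete. The minimal sketch below assumes a constant per-cycle efficiency. The efficiency parameter is an illustrative assumption, and real reactions level off in late cycles.

```python
def pcr_copies(initial_copies: float, cycles: int, efficiency: float = 1.0) -> float:
    """Copies after a number of PCR cycles, assuming a constant per-cycle efficiency.

    efficiency = 1.0 means perfect doubling each cycle; 0.9 means 90% of the
    templates are copied per cycle. The plateau reached in late cycles is ignored.
    """
    if not 0.0 <= efficiency <= 1.0:
        raise ValueError("efficiency must be between 0 and 1")
    return initial_copies * (1.0 + efficiency) ** cycles


if __name__ == "__main__":
    print(pcr_copies(1, 30))                  # 1073741824.0, i.e. 2**30
    print(f"{pcr_copies(1, 30, 0.9):.2e}")    # lower when copying is imperfect
```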

DNA cloning has brought about a wealth of 
knowledge by enabling the study and sequencing of 
single genes from many organisms. Our knowledge 
of the relationship between structure and function of 
gene products has been enriched by the comparison of 
a high number of homologous genes from diverse 
origins. Structural domains have been established
which now allow the rapid identification of genes in
sequenced genomes. Evolutionary studies have also
benefited from the large amount of data, which have
revealed new relationships among organisms. This
knowledge extends to applied fields such as the
industrial manufacturing of proteins of medical
interest, the production of transgenic organisms
(particularly in agriculture), and ultimately human
gene therapy (Cavazzana-Calvo et al., 2000).

The recent development of somatic cloning has
attracted attention, and its potential to produce large
numbers of identical individuals has been questioned.
So-called ‘therapeutic cloning’ has been presented as a
procedure which, again through nuclear transfer into
enucleated oocytes, may lead to the creation of
embryonic cells whose further development could be
directed toward the production of specific cells or
tissues to replace defective ones (Colman and Kind,
2000).

DNA denaturation refers to the melting of double-stranded
DNA to generate two single strands. This
involves the breaking of hydrogen bonds between the 
bases in the duplex. From a thermodynamic point of 
view, the most important contribution to DNA helix 
stability is the stacking of the bases on top of one 
another. Thus, in order to denature DNA, the main
obstacle to overcome is the stacking energy that
provides cohesion between adjacent base pairs. In
general, stacking energies are smaller for
pyrimidine/purine (YR) steps and for AT-rich regions.
Thus, the
sequence TATATA would be expected to melt quite 
readily, and this is indeed what happens, both in a test 
tube and inside cells. 

There are a variety of ways in which to denature 
DNA. Perhaps one of the most common (and oldest) 
methods used in the laboratory is simply to heat the 
DNA to a temperature above its Tm or melting point. 
The unstacking of the DNA base pairs can be readily 
monitored spectrophotometrically. DNA absorbs 
strongly at 260 nm, and as the DNA melts, the absorb- 
ance will increase until all of the DNA is melted, 
and then remains constant on further heating. (This
is called the ‘hyperchromic effect’; the absorbance
of single-stranded DNA is usually around 50%
greater than that of the corresponding duplex DNA.)
The process is reversible, and the renaturation of
DNA can be used to estimate its base composition as
well as the presence of repetitive fractions within the
sequence. This method was used in the 1960s to
monitor differences in the base composition of DNA
from different organisms, and also to demonstrate that
eukaryotic DNA contains a large fraction of repeated
sequences. Figure 1 shows the melting temperatures of
genomic DNA from several different microorganisms
as a function of the AT content of the genome.
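The roughly linear relation between melting temperature and AT content has a classic empirical counterpart, the Marmur-Doty relation Tm = 69.3 + 0.41 x (%GC) °C. It was measured for long DNA in standard saline citrate (about 0.2 M Na+). It is not the best-fit line of the figure, whose buffer conditions are not given here. The sketch below simply rewrites the relation in terms of AT content to show the expected slope.

```python
def marmur_doty_tm(percent_at: float) -> float:
    """Empirical Tm (deg C) of long genomic DNA in standard saline citrate.

    Marmur-Doty relation Tm = 69.3 + 0.41 * %GC, rewritten with %GC = 100 - %AT.
    Only meaningful for long DNA under those buffer conditions.
    """
    percent_gc = 100.0 - percent_at
    return 69.3 + 0.41 * percent_gc


if __name__ == "__main__":
    # The two extremes of AT content quoted in the figure caption
    for at in (33, 75):
        print(at, marmur_doty_tm(at))
```

Each 1% increase in AT content lowers the predicted Tm by 0.41 °C. This matches the qualitative point that AT-rich genomes melt at lower temperatures.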

The actual Tm of a given piece of DNA will depend
on several factors. These include the length of the DNA
sequence (shorter pieces of DNA tend to melt more
easily than longer pieces), the base composition of the
DNA (in general, regions with alternating
pyrimidine/purine steps and AT-rich regions melt more
readily), the topological condition of the DNA (e.g.,
whether it is a closed circle that is relaxed or
supercoiled, a linear piece, or heavily nicked), and the
composition of the buffer (the amount of salt and which
ions are present). Given the roles of all of these
parameters, it is difficult to predict accurately the exact
melting temperature of a given sequence, although it is
generally easy to say which region within a long piece
of DNA will melt first.

Figure 1 Melting temperature of DNA in various
microorganisms. Melting temperatures of the genomic
DNA from various organisms, as a function of AT
content, ranging from 33% AT (Deinococcus
radiodurans) to 75% AT (Ureaplasma urealyticum). A
straight line representing the best fit through the points
is also shown.

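For short oligonucleotides, a widely used rule of thumb called the Wallace rule makes the dependence on both length and composition explicit. It adds about 2 °C for each A or T and 4 °C for each G or C. This is only a rough approximation for oligomers of up to about 20 nucleotides in relatively high salt. It ignores nearest-neighbor stacking, topology and buffer details, which is exactly why exact predictions are difficult. The sequences below are arbitrary examples.

```python
def wallace_tm(oligo: str) -> int:
    """Rough Tm (deg C) of a short oligonucleotide by the Wallace rule: 2(A+T) + 4(G+C)."""
    seq = oligo.upper()
    if set(seq) - set("ACGT"):
        raise ValueError("sequence may contain only A, C, G and T")
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc


if __name__ == "__main__":
    print(wallace_tm("TATATATA"))          # 16: short and AT-rich
    print(wallace_tm("GCGCGCGC"))          # 32: same length, GC-rich
    print(wallace_tm("TATATATATATATATA"))  # 32: twice as long, AT-rich
```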
Denaturation of small regions of DNA within a 
much longer sequence can be estimated by using 
enzymes or chemicals that modify or cut single- 
stranded DNA more readily than duplex DNA. 
Some enzymes, such as methylation enzymes and 
certain single-strand-specific nucleases, can be used 
to monitor the denaturation status of a particular 
region of DNA, either in a test tube or in a living 
cell. Some chemicals will react preferentially with 
single-stranded DNA, such as haloacetaldehydes 
(e.g, chloroacetaldehyde), permanganate, diethyl 
pyrocarbonate (DEP), or osmium tetroxide. These 
chemicals can be used in a similar way to the enzymes, 
and the location of the modified bases can be detected 
using polymerase chain reactions (PCRs). As an alter- 
native, fluorescently labeled oligomers specifically 
designed to hybridize to a suspected region of single- 
stranded DNA can be used both im vivo as well as in 
vitro. Another method is to use a cross-linking agent, 
such as psoralen, to cross-link the single strands 


together, followed by electron microscopy to monitor 
the single-stranded regions. 

There are at least two major biological reasons for 
denaturing the DNA within a cell: DNA replication 
and transcription. In both cases, proteins bind to spe- 
cific DNA sequences, strongly bend the DNA helix, 
and then use the localization of torque to force the 
double-stranded DNA to open (denature) at a specific 
point. In promoters, this is often at the TATA box, 
which melts quite readily. In addition, there are 
specific proteins that bind to single-stranded DNA 
and stabilize denatured regions; this is important, 
for example, in DNA replication and transcription. 
Figure 2 shows the AT content and stacking energy 
for the lac operon in Escherichia coli. Note that the 
regions that melt most readily are upstream of the genes. 
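A crude version of the AT-content track in such a DNA atlas can be computed with a sliding window. The window with the highest AT content is a first guess at the region that will melt most readily. This sketch uses AT content only, because the stacking-energy track would need a table of dinucleotide stacking values that is not given here. The test sequence is invented.

```python
def at_profile(seq: str, window: int) -> list[float]:
    """Fraction of A+T in every window of the given length (sliding by one base)."""
    seq = seq.upper()
    if not 0 < window <= len(seq):
        raise ValueError("window must be between 1 and the sequence length")
    is_at = [1 if base in "AT" else 0 for base in seq]
    count = sum(is_at[:window])
    profile = [count / window]
    for i in range(window, len(seq)):
        # Slide the window: add the base entering on the right, drop the one leaving on the left.
        count += is_at[i] - is_at[i - window]
        profile.append(count / window)
    return profile


def most_at_rich_window(seq: str, window: int) -> tuple[int, float]:
    """Start position (0-based) and AT fraction of the most AT-rich window."""
    profile = at_profile(seq, window)
    start = max(range(len(profile)), key=profile.__getitem__)
    return start, profile[start]


if __name__ == "__main__":
    demo = "GCGCGCGC" + "TATATATA" + "GCGCGCGC"   # invented test sequence
    print(most_at_rich_window(demo, 8))            # (8, 1.0)
```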

Experimentally, there are times when it is important to
keep DNA in a single-stranded state. This can be done
by a variety of methods. Single-stranded DNA can be
isolated by using a PCR primer that is ‘tagged,’ and
then separating the tagged strand by denaturing gel
electrophoresis. High-percentage acrylamide gels (e.g.,
12%) can be used to purify oligomers. These gels have
such small pore sizes (around 1.2 nm, depending on the
bis:monomer ratio) that double-stranded DNA simply
will not fit (the width of double-stranded DNA is about
2 nm). Urea can be added to help stabilize the
single-stranded conformation, and running gels at
higher temperatures facilitates DNA denaturation.
Glyoxal agarose gels can also be used to stabilize
single-stranded DNA or RNA. This is particularly
important for Southern and Northern blotting methods.

Figure 2 (See Plate 12) A ‘DNA atlas’ for the lac operon in Escherichia coli. Genes oriented in the ‘forward direction’ are shown in blue, whilst genes in the ‘reverse direction’ are indicated in red (B). (C) Color-coded bar representing the calculated stacking energy in kcal mol⁻¹ of the DNA sequence: green indicates regions that will require more energy to melt (i.e., more negative numbers), and red indicates regions that will melt more readily (that is, the stacking energy values are smaller or closer to zero). Color-coded bar indicating the AT content of the region: blue represents lower AT content, and red indicates more AT-rich regions. Note that near the beginning of the operon there are more AT-rich regions, which also correspond to regions that will melt more readily. For more information on DNA atlases and melting profiles of regions upstream of genes for whole genomes, see Pedersen et al. (2000) and DNA Genome Atlas (http://www.cbs.dtu.dk/services/GenomeAtlas/).


Further Reading 
Thomas R (1993) The denaturation of DNA. Gene 135: 77-79. 

Off to a Good Start


The central role of DNA as the location of genetic
information is today unquestioned. This has not always
been the case since the discovery of nucleic acid in the
1860s, and the story gives an interesting insight into
how scientific knowledge evolves. The 1860s is also the
decade in which Gregor Mendel was conducting his
famous genetic crossing experiments with garden peas.
Johan Friedrich Miescher had moved from Göttingen,
where he had been a medical student, and was working
in Tübingen between 1868 and 1869 with
Felix Hoppe-Seyler, who was one of the founding 
fathers of physiological chemistry. Miescher decided to 
investigate the cells present in the pus of postoperative 
bandages. This material, which is rich in white 
blood cells, was available in large quantities in the 
years before antibiotics became available and wound
suppuration was common. It was used extensively for
scientific investigation to avoid having to request
blood and tissue samples from patients. Miescher
observed that, when treated with mild alkali, the nuclei
swelled up and burst, and he was able to isolate the
constituent material, ‘nuclein,’ which was highly
viscous and behaved biophysically very unlike the
protein material commonly isolated from cells. Miescher
returned to his birthplace, Basel, and the publication 
of this work was delayed until 1871 owing to the 
Franco-Prussian war and Hoppe-Seyler’s desire to 
repeat the work (Miescher, 1871a, b). Miescher’s own
work in Basel continued with the study of salmon
sperm nucleic acid (the term ‘nucleic acid’ was coined
by his student, Richard Altmann). The biologists
Oskar Hertwig in Germany and Hermann Fol in
Switzerland observed under the microscope the entry
of sperm into eggs and their fusion. Zacharias’s work
in 1881 indicated that chromosomes contain nuclein.
Hence, at the turn of the century, nucleic acid became a
strong candidate for the genome-encoding molecule
(Rattray-Taylor, 1963; Olby, 1974; Portugal and
Cohen, 1977; Gribbin, 1985; Judson, 1995).


Chemical Characterization of DNA and 
a Misleading Hypothesis concerning the 
Information Content of the DNA 
Molecule 


Great strides in the characterization of the chemical
constituents of DNA were made by Phoebus A. Levene,
a brilliant Russian émigré working at the Rockefeller
Institute in New York. He showed that DNA was an
acid polymer consisting of sugar, phosphate, and base.
Unfortunately, he proposed a ‘tetranucleotide
hypothesis’ (Levene and Bass, 1931), which became
dogma and held that DNA was made up of a
monotonous repeat of the four bases. However, it was
unclear how such a structure could contain the
information required to encode a cell. At the time it
was known that proteins were present in the nucleus
and were made up of repeating units (amino acids),
and this led to a swing in belief that proteins might
be the important molecules that contain the genetic
information.


Experiment Overturns Dogma 


Fred Griffith at the Ministry of Health laboratories in
London had made an intriguing observation. When
heat-killed virulent smooth pneumococci were mixed
with live avirulent rough pneumococci and the mixture
was used to inoculate mice, the mice died. Their blood,
surprisingly, was found to contain the smooth-coated
virulent form, which could be used to transmit
infectivity further.

Oswald Avery and coworkers at the Rockefeller 
Institute in New York repeated this work extremely 
carefully and isolated the ‘transforming principle’ 
from the contents of the pneumococci. After growing 
up tens of liters of culture and performing extensive 
controls to eliminate proteins and other possible bio- 
chemical contaminants, the ‘transforming principle’ 
was found to be DNA. The paper describing these
results (Avery et al., 1944) was very cautiously worded
and far from effusive about this pivotal result, and its
full significance was only slowly appreciated. This is
probably because a culture of great scientific caution
prevailed at the Rockefeller Institute at the time.
Northrop at the Rockefeller Institute had shown that the
famous German organic chemist Willstätter, owing to
minor protein contaminants present in his reaction
tubes, had incorrectly pronounced that proteins did not
catalyze enzymic reactions! Avery was awarded the
Copley Medal by the Royal Society for his work, but
sadly died before a Nobel Prize could be awarded to
him.

Strong supporting evidence that DNA is the ‘trans- 
forming principle’ came from the famous Waring 
blender experiment performed by the members of 
the phage group, Alfred Hershey and Martha Chase 
(Hershey and Chase, 1952). They labeled phage with
radioactive sulfur and phosphorus. Because DNA
contains no sulfur, the sulfur label marks only the
protein capsids of the phage. On infection, phages
attach to the surface of the bacteria and inject their
DNA into them. Vigorous agitation with a blender
removes the capsids from the bacterial surface. Hershey
and Chase showed that the bacteria contained DNA
labeled with ³²P, whereas the dislodged capsids carried
³⁵S. Although not as rigorous as the Avery experiments,
this simple experiment showing the injection of
radioactive DNA caught the imagination. Max Delbrück
rapidly disseminated it to other members of the phage
group, one of whom was James Watson, who had
studied for his PhD with Salvador Luria. In one of the
earliest applications of electron microscopy, Anderson
showed the presence of empty, DNA-less capsids after
infection, but this result was not as readily assimilated
(Watson, 1969; Olby, 1974; Gribbin, 1985; Crick,
1989; Chomet, 1995; Judson, 1995).


Peace-Time Pursuit of Physics and the 
Structure of DNA 


In the post-war years a number of physicists had 
started to turn their sights toward the application of 
physics to biological problems. Schrödinger’s book
What is Life?, which discussed how the molecule that
carries genetic information has to survive thermal
bombardment at blood temperature, was widely read
by physicists. Also, the presence of the quantum
physicist Max Delbrück in the phage group acted as a
great catalyst for this movement toward biological
problems. At King’s College London, Professor Sir John
Randall had received funding from the Medical 
Research Council to set up a biophysics group to 
study, amongst other problems, the structure of DNA. 
Maurice Wilkins, Rosalind Franklin, and Raymond
Gosling at King’s College worked on the problem of
solving the structure of DNA using X-ray
crystallographic methods. Rosalind Franklin succeeded
in precisely controlling the humidity of her
DNA fiber sample using hydrogen bubbled through 
ammonium sulfate solutions, which resulted in her ob- 
taining the beautiful and now famous X-ray diffraction 
pattern No. 51 of the high-humidity biologically rele- 
vant B-form. She also obtained excellent pictures of 
the lower-humidity A-form and showed that the much
lower-quality diffraction patterns recorded earlier, in
1938, by Astbury were a mixture of the A- and
B-forms. Personality clashes, an erroneous value for
the DNA density of the B-form, and time spent on 
trying to fully index the diffraction pattern and solve 
the Patterson function for the A-form slowed down 
the elucidation of the more biologically relevant 
B-form at King’s. 

The approach taken by James Watson and Francis 
Crick in Cambridge was that developed by Linus 
Pauling to solve the structure of the α-helix, namely
to build a model to fit the experimentally relevant data 
incorporating the relevant chemical constraints. That 
the B-form is a helix was appreciated by both the 
London and Cambridge groups. Stokes at King’s and 
Crick, Vand, and Cochran had shown independently 
that the X (cross) pattern in the diffraction pattern of 
the B-form arose from diffraction by a helical object. 
An early model of a triple DNA helix was built by 
Bruce Fraser in the London group on the basis of the 
erroneous density value and the knowledge that the 
structure must be a helix. The key point observed by
James Watson and Francis Crick was that the
symmetry of the B-form diffraction pattern dictated
that the helix had to have a twofold axis perpendicular
to the helix axis, and hence antiparallel strands and an
even number of strands. In addition, Watson and Crick
pulled together a wide range of disparate information:
Chargaff’s rules, stating that the A:T and G:C ratios in
a variety of organisms are each one to one; the correct
tautomers for the bases, which allowed correct
hydrogen-bonded base-pairing, gleaned from Jerry
Donohue; the correct chemical linkages of base, sugar,
and phosphate from Lord Todd’s organic chemical
group at Cambridge; a pitch of ten base pairs per
helical turn over 34 Å, with a base-pair separation of
3.4 Å (1 Å = 10⁻¹⁰ m), from the reflections on the
helical cross and the strong meridional reflection in the
B-form diffraction pattern; potential charge
neutralization of the phosphate groups on the outside
of the helix; and Furberg’s result that base and sugar
are noncoplanar. The result was the construction of
their double-helical model for the B-form of DNA 
(Watson, 1969; Olby, 1974; Portugal and Cohen,
1977; Chargaff, 1980; Gribbin, 1985; Crick, 1989;
Chomet, 1995; Judson, 1995), which was published
in 1953 (Watson and Crick, 1953) together with two
papers from the King’s College London group
describing the experimental data (Franklin and Gosling,
1953; Wilkins et al., 1953). Watson and Crick’s model
immediately indicated how DNA could be replicated 
and had an immense impact on genetics and biochem- 
ical experiments. The DNA structure was a huge cata- 
lyst for further experiments to elucidate transcription 
and also the biochemistry of replication. Severo Ochoa,
Arthur Kornberg, and other researchers went
on to work out how nucleic acids are synthesized 
(Kornberg, 1991; Kornberg and Baker, 1991). 


Further DNA Structural Studies 


Rosalind Franklin went on to publish the structure of
the A-form of DNA, a form found in RNA-DNA hybrid
structures such as those at the start point of
replication. The King’s group went on to
refine further the DNA model, study fibers from 
chromosomal material, and also study the structure 
of polymers of repeating sequences using X-ray fiber 
diffraction techniques. This work was a collaboration 
of Maurice Wilkins, Watson Fuller, Struther Arnott, 
Herbert Wilson, Don Marvin, Bob Langridge, Mike 
Spenser, and coworkers (Chomet, 1995). With the 
pioneering work in the organic synthesis of DNA by
Gobind Khorana, Marvin Caruthers, and others, it
became possible from the late 1970s to synthesize
and crystallize a specific DNA sequence, using initially
phosphotriester chemistry in solution and subsequent- 
ly by phosphodiester chemistry on a solid support to 
make the DNA. Initial studies of longer DNA frag- 
ments were limited to crystallographic groups in 
collaboration with a few organic groups keen to
synthesize DNA on a milligram scale. Richard Dickerson
and Horace Drew solved the single crystal structure of
the sequence d(CGCGAATTCGCG) (Wing et al.,
1980) using DNA synthesized by Itakura’s group. 
This structure was a B-DNA helix, but resolution of 
the structure showed how sequence determines local 
structural variation. Alex Rich, Andrew Wang, and 
coworkers solved the structure of the sequence 
d(CGCGCG) (Wang et al., 1981) using DNA synthe- 
sized by Jacques van Boom and Gijs van der Marel, 
and this turned out to have a novel left-handed Z- 
conformation. The advent of commercially available 
DNA synthesizers greatly increased the number of
structural studies of defined sequence DNA both by 
crystallography and in solution by nuclear magnetic 
resonance (Neidle, 1994; Calladine and Drew, 1997), 
giving a wealth of information on new structures such 
as the G-tetrad observed at the end of chromosomes 
and the structure of a Holliday junction, as well as 
information on the effect of sequence on DNA con- 
formation. This huge field of single-crystal DNA and 
DNA-drug structures has been reviewed recently
(Neidle, 1999). Single-crystal studies of DNA-binding 
protein-DNA complexes have paved the way for a 
fuller understanding of gene regulation and control. 
In recent years the structure of the nucleosome (Luger 
et al., 1997) present in the eukaryotic chromosome has 
been solved by Timothy Richmond’s group at the 
ETH in Zurich to high resolution by X-ray
crystallography. These studies, which were initiated in
Sir Aaron Klug’s laboratory in Cambridge, will both
pave the way for understanding eukaryotic gene control
and act as a stepping stone to elucidating the detailed
higher order structure within the chromosome.


Further Reading 

Branden C and Tooze J (1999) Introduction to Protein Structure,
ch. 7. New York: Garland Press.

Sayre A (1978) Rosalind Franklin and DNA. New York: WW 
Norton. 

DNA is double-stranded. Any one (single) strand is
held together by strong chemical bonds, forming a long
chain of nucleotides. The double strand comprises two
single strands held together by weak physical bonds 
(such as hydrogen bonds) that are easily ruptured. 
DNA has stability because there are so many of 
these weak bonds. Nevertheless, as the temperature 
of the liquid in which DNA occurs rises, more and 
more of the weak physical bonds are ruptured and 
eventually the two strands come apart. The tempera- 
ture at which half of the physical bonds are ruptured is 
called the melting temperature. Once the double 
strands have all become pairs of single strands, one 
can slowly lower the temperature and the single 
strands will reanneal to each other, re-forming the
original double-stranded DNA.

Now consider the possibility of melting, in the 
same test tube, DNA from two different species such 
as humans and chimpanzees. After melting them and 
then reannealing them, you will recover some of the
original duplexes, but in some cases you will get
double-stranded DNA that has one strand from the
human DNA and the other strand from the chimpanzee
DNA. The result is a hybrid DNA, and the process
is called DNA hybridization. One can study the
properties of the hybrid DNA, and one discovers
that, among other things, the melting point of the 
hybrid is lower than that of the original DNA. This is
because the chimpanzee DNA is not identical to the
human DNA. Mismatched bases in the hybrid form fewer
hydrogen bonds, and hence the hybrid is less stable.

It is well established now that the melting point is 
lowered by close to 1 degree Celsius for each 1% of the 
nucleotides that differ between the two DNAs being 
investigated. That leads to a simple method for deter- 
mining which organisms are closer to each other. 
For example, if the hybrid DNA between human and 
chimpanzee had its melting point lowered by 1 degree 
while that of chimpanzee (or human) DNA with 
baboon DNA had a hybrid melting point lowering 
of 2 degrees, one could conclude that the baboon diver- 
ged from the human-chimpanzee lineage before the 
human-chimp divergence. Moreover, because this 
divergence is reasonably linear with time for small 
changes in melting point, one might easily conclude 
that the divergence time for the baboon and human 
was twice as long ago as the human and chimpanzee 
divergence. 
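The reasoning above can be written out as a short calculation. This sketch uses the text's rule of about 1 °C of melting-point depression per 1% of mismatched nucleotides, together with its hypothetical numbers: 1 °C for human-chimpanzee and 2 °C for human-baboon. Like the text, it assumes the depression stays roughly linear with time for small values.

```python
DEG_C_PER_PERCENT = 1.0   # approx. Tm depression (deg C) per 1% mismatched nucleotides


def percent_divergence(delta_tm: float) -> float:
    """Estimated percentage of differing nucleotides from the Tm depression of a hybrid."""
    return delta_tm / DEG_C_PER_PERCENT


def relative_divergence_time(delta_tm: float, reference_delta_tm: float) -> float:
    """Divergence time relative to a reference pair, assuming a linear clock."""
    return percent_divergence(delta_tm) / percent_divergence(reference_delta_tm)


if __name__ == "__main__":
    # Hypothetical Tm depressions from the example in the text (deg C)
    delta_tm = {("human", "chimpanzee"): 1.0, ("human", "baboon"): 2.0}
    # Pairs with the smallest depression diverged most recently
    for pair, dt in sorted(delta_tm.items(), key=lambda item: item[1]):
        print(pair, percent_divergence(dt), relative_divergence_time(dt, 1.0))
```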


See also: Molecular Clock; Nucleotides and 
Nucleosides 


DNA Invertases 


See: Hin/Gin-Mediated Site-Specific DNA 
Inversion 

Lesions in DNA result from damage that occurs
spontaneously or as a result of external agents. Often these
lesions result in blocks to DNA replication and/or 
mutagenesis. Some spontaneous lesions result in alter- 
ed bases such as the products of deamination. Cyto- 
sine deaminates to uracil, adenine to hypoxanthine, 
guanine to xanthine, and 5-methylcytosine to thymine. 
Each of these altered bases has different pairing prop- 
erties from the original base and will lead to mutations 
if left unrepaired. Other altered bases result from 
oxidative damage caused by reactive oxygen species 
generated during metabolism and cellular respiration.
A wide variety of altered bases can be formed, of which
ring-opened purines, thymine glycol, and 8-oxoguanine
are the most widely studied, the latter two being
implicated in mutagenesis if left unrepaired.
Spontaneous lesions are also produced by depurination
and depyrimidination, which involve cleavage of the
N-glycosidic bond and leave an abasic site. If the cell is
forced to replicate past these lesions, mutagenesis
frequently results, since there is no way to determine
the correct pairing partner for the lost base.
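Two rounds of replication are enough to show how an unrepaired deamination product becomes a mutation. The minimal sketch below assumes simplified pairing rules. Uracil pairs like thymine (with A), and hypoxanthine pairs with C. 5-Methylcytosine, written here with the made-up symbol M, pairs with G until it deaminates to thymine. Xanthine is left out for brevity. Strand orientation is ignored, so bases are simply aligned by position.

```python
# Deamination products; "I" = hypoxanthine, "M" = 5-methylcytosine (ad hoc symbols).
DEAMINATION = {"C": "U", "A": "I", "M": "T"}

# Base inserted opposite each template base during replication (simplified rules).
PARTNER = {"A": "T", "T": "A", "G": "C", "C": "G", "U": "A", "I": "C", "M": "G"}


def deaminate(strand: str, pos: int) -> str:
    """Return the strand with the base at pos replaced by its deamination product."""
    base = strand[pos]
    if base not in DEAMINATION:
        raise ValueError(f"no deamination product modeled for {base!r}")
    return strand[:pos] + DEAMINATION[base] + strand[pos + 1:]


def replicate(template: str) -> str:
    """New strand synthesized on the template (orientation ignored)."""
    return "".join(PARTNER[base] for base in template)


if __name__ == "__main__":
    original = "GACGT"
    damaged = deaminate(original, 2)          # C -> U at position 2
    daughter = replicate(replicate(damaged))  # two rounds of replication
    print(original, damaged, daughter)        # GACGT GAUGT GATGT: a C:G -> T:A transition
```

In the same way, hypoxanthine produced from adenine converts an A:T pair into G:C. The sketch models only the unrepaired case described in the text.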

Radiation and chemicals are among the treatments 
that can damage DNA. Ionizing radiation generates a 
myriad of DNA lesions, including thymine glycol, 
8-oxoguanine, thymine dimers, 5-hydroxymethyluracil,
and many others. Damage to sugars in the DNA and
strand breaks have also been detected. UV radiation
results in numerous photoproducts. Two different
pyrimidine-pyrimidine dimers have been implicated
in mutagenesis. These are the cyclobutane dimer,
and the 6-4 photoproduct (also called the pyrimidine- 
pyrimidone (6-4) photoproduct). Thymine glycols are 
also generated. Many different kinds of chemicals 
damage DNA in a manner that can lead to mutagen- 
esis. Some agents such as methylmethanesulfonate, 
ethylmethanesulfonate, N-ethyl-N-nitrosourea, and 
N-methyl-N’-nitro-N-nitrosoguanidine alkylate
different positions on the DNA. Alkylations at the O-6
position of guanine and the O-4 position of thymine are 
best correlated with mutagenesis. Other agents cause 
interstrand cross-links, blocking DNA replication. 
Cis-platin (cis-platinum (II) diaminodichloride) is an 
example. Many chemical agents make large adducts to 
DNA bases, blocking replication and often resulting 
in mutagenesis; the carcinogens aflatoxin B1,
4-nitroquinoline 1-oxide, benzo[a]pyrene diol epoxide
and N-2-acetylaminofluorene are examples. The adducts
are not all formed at the same site: aflatoxin B1
forms its principal adduct at the N-7 position of
guanine, benzo[a]pyrene diol epoxide at the exocyclic
(N-2) amino group of guanine, and
N-2-acetylaminofluorene at the C-8 position of guanine.
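The cyclobutane dimer and the 6-4 photoproduct can form only where two pyrimidines lie next to each other in the same strand. As an illustration, the following Python sketch (the function names are hypothetical, not from any library) lists such dipyrimidine positions in a sequence written 5' to 3'; adjacent purines in that sequence correspond to adjacent pyrimidines on the complementary strand. It says nothing about how often a given site actually forms a photoproduct, which depends on sequence context and dose.

```python
PYRIMIDINES = {"C", "T"}
PURINES = {"A", "G"}

def dipyrimidine_sites(strand: str) -> list[int]:
    """0-based positions i where strand[i] and strand[i + 1] are both pyrimidines.

    These are the only places in this strand where a cyclobutane dimer or a
    6-4 photoproduct could form.
    """
    s = strand.upper()
    return [i for i in range(len(s) - 1)
            if s[i] in PYRIMIDINES and s[i + 1] in PYRIMIDINES]

def complementary_strand_sites(strand: str) -> list[int]:
    """Positions (in this strand's coordinates) of adjacent purines here,
    i.e. of adjacent pyrimidines on the complementary strand."""
    s = strand.upper()
    return [i for i in range(len(s) - 1)
            if s[i] in PURINES and s[i + 1] in PURINES]

print(dipyrimidine_sites("ATTCGA"))          # [1, 2]: the TT and TC steps
print(complementary_strand_sites("ATTCGA"))  # [4]: GA here, a TC step on the other strand
```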


See also: Mutagens; Mutation 
DNA ligases are enzymes that join two molecules
of DNA together. They act on double-stranded DNA,
sealing a nick between a 5'-phosphate end and an
adjacent 3'-hydroxyl end. The enzymes require
either ATP or NAD+ as a cofactor. They are widely distributed in
living matter and are an important component of 
DNA replication and repair processes. They are used 
in recombinant DNA research for joining DNA 
molecules together, for example, when segments of 
DNA are cloned in plasmid and other vectors. 


See also: DNA Cloning 


DNA Mapping 


See: Chromosome Mapping 
A DNA marker is a chromosomal locus that exhibits
allelic variation within a breeding population of
animals or plants, and for which a cloned DNA probe or
sequence-specific assay is available that allows direct
detection of the different alleles within any genomic
sample by a method such as Southern blot hybridization
or the polymerase chain reaction (PCR). DNA
markers are used extensively in mapping studies of 
loci defined solely by phenotype. 


See also: DNA Hybridization 


DNA Modification 


S Brenner 


Copyright © 2001 Academic Press 
doi: 10.1006/rwgn.2001.0362


In addition to the four bases, guanine, cytosine, ade- 
nine, and thymine, many genomes contain chemically 
modified derivatives such as 5-methylcytosine and 
N-methyladenine. Enzymes catalyzing such modifications
at a wide range of different sites are found in
bacteria, where they are usually paired with
restriction enzymes that cut unmodified DNA at the
same sites the cognate modification enzymes protect.
A large range of specificities exists and more are
constantly being discovered. The DNA of eukaryotic
cells, especially those of vertebrates and plants, is
heavily modified, but in these cases the function of
the modification is not well understood.
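The protective logic of a bacterial restriction-modification pair can be sketched in a few lines of Python. This is a toy model under stated assumptions: the EcoRI recognition site GAATTC (cut between the G and the first A) serves as the example specificity, and "modification" is represented simply as a set of protected site positions rather than as methylation of a particular base. All function names are hypothetical.

```python
SITE = "GAATTC"    # EcoRI recognition sequence, used as the example specificity
CUT_OFFSET = 1     # EcoRI cuts between the G and the first A

def find_sites(dna: str, site: str = SITE) -> list[int]:
    """Start positions of every occurrence of the recognition site."""
    positions, start = [], dna.find(site)
    while start != -1:
        positions.append(start)
        start = dna.find(site, start + 1)
    return positions

def modify(dna: str, site: str = SITE) -> set[int]:
    """Cognate modification enzyme: mark every site in this DNA as protected."""
    return set(find_sites(dna, site))

def restrict(dna: str, protected: set[int], site: str = SITE) -> list[str]:
    """Restriction enzyme: cut every site that is not protected; return fragments."""
    cuts = [p + CUT_OFFSET for p in find_sites(dna, site) if p not in protected]
    fragments, prev = [], 0
    for cut in cuts:
        fragments.append(dna[prev:cut])
        prev = cut
    fragments.append(dna[prev:])
    return fragments

phage = "AAGAATTCAAGAATTCAA"
print(restrict(phage, set()))          # ['AAG', 'AATTCAAG', 'AATTCAA']: foreign DNA is cut
print(restrict(phage, modify(phage)))  # ['AAGAATTCAAGAATTCAA']: modified DNA is spared
```

A partially modified molecule is cut only at its unprotected sites, which is why the two enzymes must share the same specificity.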


See also: DNA 
DNA polymerase η (pol η) is a translesion DNA
polymerase, encoded by the gene RAD30A (chromosome
6). The enzyme replicates undamaged DNA with low
fidelity because it lacks proofreading activity. Unlike
polymerases δ and ε, it inserts adenines opposite
thymine-thymine dimers, the most frequent pyrimidine
dimer lesion induced in DNA by UV. In this way it
protects the cell against UV-induced mutagenesis.
Inherited deficiency of pol η results in a variant form
of the skin-cancer-prone disorder xeroderma pigmentosum.


See also: DNA Polymerases; Pyrimidine Dimers; 
Xeroderma Pigmentosum 


DNA Polymerases 


J H Miller 


Copyright © 2001 Academic Press 
doi: 10.1006/rwgn.2001.0364


The first DNA polymerase activity detected was 
described by Arthur Kornberg in the late 1950s. This 
Escherichia coli polymerase, now termed DNA poly- 
merase I or simply Pol I, is but one of at least five 
DNA polymerases in this same bacterium. Pol I and 
Pol III carry out normal DNA replication, with Pol 
III carrying out continuous synthesis on the leading 
strand and discontinuous synthesis on the lagging 
strand, leaving gaps that are filled in by Pol I and 
sealed by ligase. Pol I and probably Pol II are active in 
DNA repair. Different polymerases can have different
activities. For instance, Pol I has not only a 5'→3'
polymerase activity, which adds deoxynucleoside
triphosphates to a primer annealed to a template DNA,
but also a 3'→5' exonuclease activity that removes
mismatched bases, and a 5'→3' exonuclease activity
that operates on double-stranded DNA. Recently, two additional
polymerases, Pol IV and Pol V, have been described 
that can replicate past certain noncoding DNA
lesions, such as photodimers. However, these
polymerases replicate with lowered fidelity and
therefore generate mutations more frequently. Pol IV and Pol V
are among the SOS-induced functions. Polymerases 
can be relatively simple, as in the case of some
bacteriophage polymerases, or can be more complex, as in
the case of E. coli Pol III, which has as many as 20 


individual polypeptide subunits. Eukaryotic DNA 
polymerases can be similarly complex. Mutants with 
lowered replication fidelity resulting from altered 
polymerase subunits have been described. For
instance, mutD strains of E. coli have a defective ε
subunit of Pol III and lack proofreading, yielding a
strong mutator phenotype.


See also: Mutator Phenotype; SOS Repair 
The term DNA recombination is often used inter-
changeably with genetic recombination, but there are 
two contexts in which it may have a distinct meaning. 
First, DNA recombination is distinguished from RNA 
recombination, which may occur during the replica- 
tion of RNA viruses or in some in vitro reactions. 
Second, the term is sometimes used to describe recom- 
bination of DNA that has been introduced into cells, 
as distinct from natural recombination processes that 
involve endogenous chromosomes. As with genetic re- 
combination, a distinction is made between legitimate 
recombination, which involves extensive sequence 
homology, and illegitimate recombination, which is 
supported by little or no sequence homology between 
the interacting DNAs. 


See also: Genetic Recombination; Illegitimate 
Recombination 
The genome of all cells is subject to constant chemical
alteration (Friedberg et al., 1995). These alterations 
constitute DNA damage and arise as a consequence 
of reactions that occur both spontaneously in living 
cells, and as a result of their exposure to numerous 
environmental agents. Spontaneous DNA damage can 
occur by multiple mechanisms. First, as is the case 
with other biological macromolecules, the chemical 
stability of DNA at physiological temperatures and 


pH is no greater than that of the collective chemical
bonds of which it is composed. Hence, the nitrogenous
bases adenine, cytosine, guanine and thymine are
subject to alterations due to spontaneous tautomeric
shifts as well as the spontaneous loss of exocyclic
amino and methyl groups. Additionally, spontaneous
hydrolysis of the N-glycosylic bonds linking the bases to
the sugar-phosphate backbone of DNA results in the
loss of free purines and pyrimidines, leaving sites of 
base loss in the DNA, which are themselves subject to 
further spontaneous chemical alterations. Second, the 
fidelity of the process of DNA synthesis is limited by 
the accuracy of the replication machinery with respect 
to correct base pairing. Hence, DNA replication is
intrinsically error-prone. The frequency of such
base-pairing errors depends on the DNA polymerase
in question and its associated accessory proteins.

Probably the most prevalent endogenous source of
DNA damage derives from various reactive oxygen
species (ROS), which are products of both normal and
abnormal oxidative metabolism in animal cells. These
ROS can interact with and chemically modify the
nitrogenous bases, the sugars and the sugar-phosphate
linkages. Oxidative alterations to DNA represent an
extensive and under-appreciated source of DNA 
damage which is likely the source of many mutations 
in cells, and hence of diseases that derive from muta- 
tions in somatic cells, such as cancer. Sunlight consti- 
tutes another prevalent source of exogenous DNA 
damage. Some of the UV radiation that filters through 
the earth’s atmosphere is readily absorbed by the 
nitrogenous bases in DNA resulting in the formation 
of multiple distinct photoproducts which interfere 
with normal DNA replication and transcription. 
Finally, DNA reacts with many diverse chemical
agents. Some of these agents derive from normal (or
abnormal) cellular metabolism. Others derive from
natural organic sources, typically other life forms,
and in recent decades yet others have been produced
in increasing quantities by synthetic industrial
processes.

The last-mentioned category aside, all the sources
of DNA damage mentioned above have exerted powerful
selective pressure for the evolution of multiple and
diverse mechanisms for the repair of damaged DNA.
Not all DNA damage is intrinsically deleterious;
rather, DNA damage, especially spontaneous base
damage, also provides a source of genetic variability
in germline cells, which serves as the essential basis
for Darwinian evolution. Hence, DNA repair mechanisms
perfect enough to restore all forms of DNA damage
completely would be antithetical to the generation of
genetic diversity and to evolution by natural selection.
Mechanisms of DNA Repair


For the purposes of this discussion the term ‘DNA
repair’ is strictly confined to cellular responses to
DNA damage by which the chemistry and structure
of the genome are restored to their native state. Most
of the mechanisms discussed here concern the repair of
damaged bases, the coding elements of DNA. However,
some mention will be made of the repair of damage to
the sugar-phosphate backbone, especially the repair of
DNA strand breaks.


DNA Repair by the Reversal of Damage 
Several DNA repair mechanisms comprise relatively 
simple single-step enzyme reactions catalyzed by 
monomeric proteins which directly reverse base 
damage to DNA (Friedberg et al., 1995). Some of
these reversal reactions require specific cofactors;
others do not.


Enzymatic photoreactivation 

One of the quantitatively major photoproducts pro- 
duced in DNA exposed to UV radiation is the cyclo- 
butane pyrimidine dimer. The covalent joining of 
adjacent stacked pyrimidines in the DNA duplex 
derives from saturation of their respective C5-C6 double
bonds following the absorption of UV radiation at
~260 nm, resulting in the formation of a cyclobutane
ring structure. The first DNA repair mode to be dis- 
covered is one in which such dimerized pyrimidines 
are restored to their normal monomeric state in 
the presence of visible light. Such repair, called enzy- 
matic photoreactivation, is catalyzed by a class of 
enzymes called DNA photolyases. All DNA photo- 
lyases contain chemical chromophores which absorb 
visible light of specific wavelengths. This light absorp- 
tion facilitates a series of photochemical reactions 
which split the cyclobutane ring and restore the
normal C5-C6 double-bonded structure of the
monomeric pyrimidines (Figure 1).

Microbial photolyases have strong amino acid 
sequence homology with a blue-light photoreceptor 
from plants that is devoid of photolyase activity. It is 
likely that such photoreceptor proteins evolved into
photoreactivating enzymes under the selective
pressure of the lethal effects of UV radiation.


The repair of O6-alkylguanine in DNA

DNA that is exposed to synthetic alkylating agents,
such as methylmethanesulfonate, becomes alkylated
at various reactive sites in the nitrogenous bases.
A repair mechanism operates in cells to remove small
alkyl groups adducted specifically to the O6 position
of guanine and the O4 position of thymine (Friedberg




[Figure 1 panels: (A) native DNA; (B) pyrimidine dimer in UV-irradiated DNA; (C) complex of DNA with photoreactivating enzyme; (D) absorption of light (>300 nm); (E) release of enzyme to restore native DNA.]

Figure 1 Schematic illustration of the enzyme-
catalyzed monomerization of pyrimidine dimers by DNA
photolyase; an example of DNA repair by the reversal
of base damage. The colored symbols (square and
triangle) represent the two chromophores which are
required for catalytic activity in all DNA photolyases.


et al., 1995). The existence of this DNA repair mode 
suggests that it evolved in response to the selective 
pressures imposed by aberrations of natural alkylation 
pathways, such as the methylation of proteins or other 
sites in DNA. The repair of O6-alkylguanine or
O4-alkylthymine occurs by a single-step reaction
with no cofactor requirement: alkyl (methyl or ethyl)
groups are transferred from these specific sites to a
particular cysteine acceptor site in the enzyme that
catalyzes the reaction (Figure 2). The enzyme is
designated O6-alkylguanine-DNA alkyltransferase.
When a single molecule of the transferase accepts a 
single alkyl group it is inactivated. Hence, this trans- 
ferase reaction is not enzymatic in the kinetic sense 
since it is stoichiometric rather than catalytic. 
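Because each transferase molecule accepts only one alkyl group, a cell's capacity to repair these lesions is set by how many active molecules it has, not by a turnover rate. The hypothetical Python sketch below (class and function names are illustrative) makes that stoichiometry explicit; the molecule and lesion counts in the usage line are arbitrary, not measured values.

```python
from dataclasses import dataclass

@dataclass
class TransferasePool:
    active: int            # molecules whose cysteine acceptor is still free
    inactivated: int = 0   # molecules that have already accepted an alkyl group

    def repair_one_lesion(self) -> bool:
        """Transfer one alkyl group to one active molecule, if any remain."""
        if self.active == 0:
            return False
        self.active -= 1
        self.inactivated += 1   # the acceptor is now alkylated: no turnover
        return True

def expose(pool: TransferasePool, lesions: int) -> int:
    """Return how many lesions remain unrepaired once the pool is used up."""
    return sum(1 for _ in range(lesions) if not pool.repair_one_lesion())

pool = TransferasePool(active=100)
print(expose(pool, 150), pool.active, pool.inactivated)   # 50 0 100
```

A catalytic enzyme would instead return to the active pool after each reaction, so its capacity would scale with time rather than with the number of molecules.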


DNA Repair by the Excision of Damage 
Cells have also evolved multiple biochemical
pathways by which damaged or inappropriate bases
(such as uracil, which is not normally present in DNA)
are cut out (excised) from the genome. Damaged or
inappropriate bases can be excised as free bases, as 
mononucleotides or as oligonucleotide fragments 
(Friedberg et al., 1995). 


Base excision repair 

The excision of free bases is catalyzed by a class of 
enzymes called DNA glycosylases. Each known
DNA glycosylase is largely specific for a
particular damaged or inappropriate base in DNA and
facilitates the hydrolysis of the N-glycosylic bond 
linking the base to the sugar-phosphate backbone 
(Friedberg et al., 1995). Since the offending base is 
released as a free base, this repair reaction is called 
base excision repair (BER) (Figure 3). 

The action of a DNA glycosylase effectively trans- 
lates one type of DNA damage to another, since the 
removal of bases leaves sites of base loss; so-called 
apurinic or apyrimidinic (AP) sites (Figure 3). As
noted earlier, such DNA damage can also arise from
the spontaneous loss of bases. AP sites are recognized by another class of
repair-specific enzymes called AP endonucleases, 
which catalyze the hydrolysis of phosphodiester 
bonds at such sites. These incisions place the baseless
sugar-phosphate residues at free ends in DNA, where
they are accessible to enzymes that degrade DNA
from such ends. In this way, what started out as a site 
of base damage (or an inappropriate base such as 
uracil) is converted to a small gap of one or several 
nucleotides in the DNA (Figure 3). These gaps are 
repaired by one of several DNA polymerases which 
use the intact opposite DNA strand as an informational
template. When all missing nucleotides have been
restored, the last inserted one is joined to the
adjacent DNA by DNA ligase.
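The four steps of base excision repair can be mimicked on a toy string model in Python. Assumptions: the damaged strand and its complementary template are aligned position by position, uracil ("U") is the only inappropriate base, "-" stands for an AP site, and the incision, dRpase and ligation steps are recorded as snapshots rather than modelled chemically. All names are hypothetical.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

def base_excision_repair(strand: str, template: str, damaged: str = "U"):
    """Toy BER: template[i] is the base paired with strand[i] on the opposite strand."""
    repaired = list(strand)
    events = []
    for i, base in enumerate(strand):
        if base != damaged:
            continue
        repaired[i] = "-"                                   # (1) glycosylase leaves an AP site
        events.append(("DNA glycosylase", "".join(repaired)))
        # (2) AP endonuclease incision and (3) dRpase removal of the baseless
        # sugar-phosphate leave a one-nucleotide gap at the same position
        events.append(("AP endonuclease + dRpase", "".join(repaired)))
        repaired[i] = COMPLEMENT[template[i]]               # (4) fill from the intact template
        events.append(("DNA polymerase + DNA ligase", "".join(repaired)))
    return "".join(repaired), events

fixed, steps = base_excision_repair("GAUTC", "CTGAG")
print(fixed)   # GACTC: the uracil opposite G is replaced by C
for name, snapshot in steps:
    print(f"{name:30s} {snapshot}")
```

The final step works only because the opposite strand is intact, which is the point made in the next paragraph about the absolute requirement for an informational template.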

The creation of gaps in the DNA duplex and their 
subsequent repair by DNA synthesis and DNA liga- 
tion is common to all forms of excision repair. Since 
this mode of DNA synthesis (called repair synthesis to 
distinguish it from semiconservative synthesis) has an 
absolute requirement for an intact informational tem- 
plate, it is likely that the double-stranded nature of the 
DNA in most genomes evolved specifically for this 
purpose. 


Mismatch excision repair 

A second mode of excision repair operates more 
or less exclusively for the excision of mismatched 
(mispaired) bases that arise from errors during 
DNA replication (Friedberg et al., 1995). As is
the case during BER, mismatch repair (MMR) 
involves the incision of DNA near sites of 
Figure 2 An enzyme activity called O6-methylguanine-DNA methyltransferase (O6-MGT) transfers a methyl group
from the O6 position of guanine in DNA to a cysteine residue in the protein, thereby restoring the native chemistry
of guanine and repairing the base damage by direct reversal.


mispaired bases, once again allowing for the gener- 
ation of free ends from which a DNA strand can be 
degraded, providing for the release of the mispaired 
base. Since in principle any base in a mismatched pair 
can be either the ‘correct’ or the ‘incorrect’ base, the 
special problem that cells had to solve in the evolution 
of the repair of mismatched bases was how to distin- 
guish the DNA strand containing the mismatched 
base from that with the normal base. Since all
mispaired bases that arise during DNA replication have
the ‘incorrect’ base in the newly replicated strand,
cells solved this problem by evolving a means of
‘marking’ newly replicated DNA strands and hence
distinguishing them from the parental template DNA
strands.

In many prokaryotes, including Escherichia coli, 
this strand discrimination is effected by the methyla- 
tion of GATC sequences in DNA (Figure 4). When 
DNA is replicated the daughter strands are transiently 
undermethylated prior to methylation of the GATC 
sequences. During this kinetic window specific
proteins designated MutH, MutL and MutS recognize
both the hemimethylated DNA and the mispaired
base pair, which are brought into very close physical




[Figure 3 panels: (1) DNA glycosylase, leaving an AP site; (2) 5' AP endonuclease; (3) dRpase; (4) DNA polymerase + DNA ligase.]

Figure 3 Schematic representation of base excision
repair. For simplicity only the relevant DNA strand is
shown in most of the figure. Certain forms of base
damage are recognized by DNA glycosylases (1) that
catalyze excision of the free base by hydrolysis of the
N-glycosyl bond linking the base to the sugar-phosphate
backbone. This reaction leaves an apurinic or
apyrimidinic (AP) site in the DNA. Attack at such sites
by a 5' AP endonuclease (2) results in a strand break
with a 5' terminal deoxyribose-phosphate moiety. These
are excised by the action of a DNA deoxyribophospho-
diesterase (dRpase) (3). The resulting single-nucleotide
gap is filled by repair synthesis and DNA repair is
completed by DNA ligase (4).
proximity by a DNA looping mechanism (Figure 4).
The nonmethylated (newly replicated) DNA strand is
then cut at the GATC site, once again providing a free
end in the genome for directed degradation of the
DNA, repair synthesis and DNA ligation. Homologs
of the MutL and MutS proteins have been identified in
eukaryotic cells, including human cells. However,
most, possibly all, eukaryotes use a strand
discrimination system that is not based on methylation
of GATC sites. The exact mechanism of this
discrimination remains to be understood.
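The strand-discrimination logic of methyl-directed mismatch repair in E. coli can be sketched as follows. This is a simplified Python model with hypothetical names: the parental (template) strand is assumed fully methylated and the daughter strand not yet methylated at GATC, the nick is placed at the GATC site nearest the mismatch, and the daughter strand between the nick and the mismatch is simply resynthesized from the template. It does not apply to eukaryotes, which do not use GATC methylation for strand discrimination.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

def find_mismatches(daughter: str, template: str) -> list[int]:
    """Positions where the daughter base does not pair with the template base."""
    return [i for i, (d, t) in enumerate(zip(daughter, template)) if COMPLEMENT[t] != d]

def gatc_sites(strand: str) -> list[int]:
    """Start positions of GATC; GATC is palindromic, so each site is on both strands."""
    return [i for i in range(len(strand) - 3) if strand[i:i + 4] == "GATC"]

def methyl_directed_mmr(daughter: str, template: str) -> str:
    """Correct mismatches in the daughter strand only.

    Assumes the template (parental) strand is methylated at GATC and the
    daughter strand is not yet, so the daughter is always the strand corrected.
    """
    corrected = list(daughter)
    sites = gatc_sites(daughter)
    for m in find_mismatches(daughter, template):
        if not sites:
            raise ValueError("no GATC site available to direct repair")
        nick = min(sites, key=lambda p: abs(p - m))    # nearest hemimethylated GATC
        start, end = sorted((nick, m))
        for i in range(start, end + 1):                # excise, then resynthesize
            corrected[i] = COMPLEMENT[template[i]]
    return "".join(corrected)

template = "CTAGTTTGCA"     # parental strand, written base-for-base under the daughter
daughter = "GATCAAATGT"     # replication error at position 7 (T instead of C)
print(methyl_directed_mmr(daughter, template))   # GATCAAACGT
```

Without the methylation asymmetry, nothing in the model would say which base of the G:T pair to replace; that choice is exactly what the GATC mark supplies.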


Nucleotide excision repair 

A final mode of excision repair to be considered here is 
designated nucleotide excision repair (NER) since it 
results in the excision of damaged bases as compon- 
ents of oligonucleotide fragments (Figure 5). NER 
Figure 4 Model for initiation of methyl-directed
mismatch repair. Following the binding of MutS and
MutL proteins to a mismatched site in DNA, the DNA
is looped to bring the nearest nonmethylated GATC site
into intimate physical proximity with this protein-DNA
complex. The MutH protein then catalyzes a nick at the
nonmethylated GATC site.


deals with base damage such as cyclobutane pyrimi- 
dine dimers caused by UV radiation, and with a large 
and diverse spectrum of chemical damage. All NER
substrates share the property of forming bulky
distortions of the DNA duplex. In eukaryotes such as
the yeast Saccharomyces cerevisiae and in
human cells, some (as yet undefined) feature of such 
bulky base damage determines its recognition by sub- 
units of a large multiprotein complex, the repairosome. 
Once the repairosome is tightly bound to DNA at sites
of base damage, additional subunits with DNA
helicase activity effect localized denaturation of
the DNA duplex, thereby generating excision
repair ‘bubbles’ that incorporate the
sites of base damage on one of the two strands. The 
helicase subunits which perform this function are 
members of a subcomplex called transcription factor 
IIH (TFIIH), which is also required for the initiation
of RNA polymerase II transcription. Whereas the gen- 
eration of transcription bubbles facilitates the initiation 
of mRNA synthesis, during NER these bubbles are cut 
by structure-specific endonucleases which recognize 
the junctions between duplex and single-stranded 
DNA at the margins of the bubbles, thereby
generating nicks on each side of the lesion, about
30 nucleotides apart (Figure 5). It is
not known how these endonucleases discriminate 
between such junctions on the damaged and un- 
damaged DNA strands. A reasonable speculation is 
that strand specificity derives from the precise archi- 
tecture of the repairosome when properly bound to 
DNA. 
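The dual incision and oligonucleotide excision can be illustrated with a toy Python model. Assumptions, all hypothetical: a lesion is marked by "X" characters, the two nicks are placed so that a 30-nucleotide window is centred on the lesion (the real incision positions relative to the lesion differ between organisms and are not modelled here), and the gap is refilled by copying the complementary strand, standing in for repair synthesis and ligation.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

def dual_incision(length: int, lesion_start: int, lesion_end: int, span: int = 30):
    """Place two nicks `span` nucleotides apart around the lesion (toy, symmetric).

    Returns (five_prime, three_prime) so that strand[five_prime:three_prime] is excised.
    """
    centre = (lesion_start + lesion_end) // 2
    three_prime = min(length, max(0, centre - span // 2) + span)
    five_prime = max(0, three_prime - span)
    return five_prime, three_prime

def nucleotide_excision_repair(strand: str, template: str, span: int = 30):
    """Excise the oligonucleotide containing 'X' (damaged) bases and refill the gap."""
    damaged = [i for i, b in enumerate(strand) if b == "X"]
    if not damaged:
        return strand, None
    five, three = dual_incision(len(strand), damaged[0], damaged[-1] + 1, span)
    patch = "".join(COMPLEMENT[t] for t in template[five:three])  # repair synthesis
    return strand[:five] + patch + strand[three:], (five, three)

original = "ACGT" * 15                                   # 60-nt strand
template = "".join(COMPLEMENT[b] for b in original)      # intact opposite strand
damaged = original[:40] + "XX" + original[42:]           # e.g. a dimer at 40-41
repaired, window = nucleotide_excision_repair(damaged, template)
print(window, repaired == original)                      # (26, 56) True
```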

The bimodal incision of DNA during NER is com- 
mon to both prokaryotes and eukaryotes. However, 
the biochemical mechanism just described is unique to 
eukaryotes. Prokaryotes utilize a somewhat different 
mechanism for bimodal incision that involves a group 
of proteins which are not conserved in eukaryotes. It 
would appear that the elaboration of a more complex 
mechanism for RNA polymerase II transcription 
initiation in eukaryotes led to the co-option of TFIIH
for NER in such organisms. An interesting implica- 
tion of the observation that TFIIH proteins are 
required for both NER and RNA polymerase II 
transcription is the potential for competition for 
these proteins during these processes. Recent in vitro 
studies in yeast have indeed demonstrated inhibition 
of transcription initiation in the presence of active 
NER. Assuming that such inhibition also operates in 
vivo, it provides another means for limiting the elab- 
oration of mutant transcripts as a consequence of 
DNA damage. 

The bimodal incision of DNA generates oligonu- 
cleotide fragments which incorporate sites of base 
damage (Figure 5). Our understanding of the precise
biochemical events that accompany oligonucleotide
excision, and of the repair synthesis needed to fill
the resulting gaps, remains incomplete.


Strand-Specific Excision Repair 

The biochemical mechanism of NER described
above derives in large measure from in vitro
experiments in which NER is strictly independent of RNA
polymerase II transcription. However, it has been
consistently observed in both prokaryotic and
eukaryotic cells that NER is somewhat faster in
actively transcribed genes (transcribed by RNA
polymerase II in eukaryotes) than in transcriptionally
silent regions of the genome
(Friedberg et al., 1995). This phenomenon is largely, if
not exclusively, attributable to a kinetic preference for 
repair of the transcribed (template) strand compared 
to the nontranscribed (coding) strand of transcription- 
ally active genes. To date it has not been possible to 
establish a cell-free system that supports NER which 
is dependent on transcription. Hence, the molecular 
basis of so-called strand-specific repair remains unclear. 
In addition to all the proteins that are indispensable
for transcription-independent NER, strand-specific
NER in human cells requires at least two other gene




[Figure 5 panels: pyrimidine dimer; (1) damage-specific DNA incising activity (2 nicks); (2) oligonucleotide excision; (3) DNA polymerase + DNA ligase.]

Figure 5 Schematic representation of nucleotide
excision repair. For simplicity only the relevant DNA
strand is shown in most of the figure. Base damage such
as pyrimidine dimers is recognized by a damage-specific
endonuclease that nicks the DNA on each side of the
lesion, generating a potential oligonucleotide fragment
(1). (The size of the oligonucleotide shown here is
purely schematic.) Subsequent enzyme-catalyzed events
result in the release (excision) of this fragment (2). The
resulting gap (which is always larger than one nucleo-
tide) is filled by repair synthesis and DNA repair is
completed by DNA ligase (3) as in base excision repair.


products encoded by the CSA and CSB genes. A 
requirement in strand-specific NER for the yeast 
homolog of CSB, designated RAD26, has also been 
shown. 


Defective DNA Repair and Human 
Disease 


One might reasonably anticipate that hereditary 
defects in the many DNA repair modes addressed 
above would lead to an increased mutational load in 
somatic cells and hence a significant predisposition to 
cancer. Defective DNA repair has been implicated in 
several human hereditary diseases (Friedberg et al., 
1995). Defects in the NER that operates independently
of RNA polymerase II transcription lead to a disease
called xeroderma pigmentosum (XP). XP is indeed
characterized by a profound predisposition to skin 
cancer in sun-exposed individuals. Similarly, individu- 
als with hereditary defects in mismatch repair are 
highly prone to certain types of colon cancer, espe- 
cially so-called hereditary nonpolyposis colon cancer 
(HNPCC) (Friedberg et al., 1995). Other relation-
ships between repair-defective phenotypes and cancer 
predisposition are less clear cut. Individuals defective 
in the CSA or CSB genes which are required for 
strand-specific NER in human cells suffer from a 
debilitating developmental and neurological disorder 
called Cockayne syndrome (CS). There is no evidence 
that human CS patients are cancer-prone. However, 
these individuals are frequently incapacitated from an 
early age and often die very young. Thus, it has been 
argued that they might not suffer the same level of UV 
radiation exposure as XP individuals. Consistent with 
this notion, knockout mice defective in the CSB 
gene are skin-cancer-prone when exposed to UVB 
radiation. 

No human patients with hereditary defects in base 
excision repair or in the reversal of base damage have 
been reported. In the case of base excision repair one 
must seriously consider the possibility that an inabil- 
ity to repair the many types of spontaneous base 
damage, especially that produced by ROS, for which 
this repair mode is required, is incompatible with 
normal embryogenesis. Mice which are defective in 
O6-alkylguanine-DNA alkyltransferase activity are
viable, but at the time of writing little is known 
about their cancer predisposition. 


Repair of DNA Strand Breaks 


In addition to base damage the phosphodiester back- 
bones of the DNA double helix are vulnerable to 
damage, especially following exposure of cells to 
ionizing radiation (Friedberg et al., 1995). The repair 
of strand breaks, especially double-strand breaks, is an 
increasingly active area of research in the DNA repair 
field. The simplest mechanism for the repair of 
DNA strand breaks provides yet another example of 
DNA repair by the direct reversal of damage.
Single-strand breaks with 3'-OH and 5'-phosphate
termini can be directly repaired by DNA ligases, of which several
exist in eukaryotic cells. A single case of a compound
heterozygous defect in the gene that encodes
DNA ligase I has been identified. In this condition,
designated the 46BR syndrome after the code of the
cell line derived from this individual, the patient,
who died at an early age, suffered from abnormal
sensitivity to ionizing radiation and from malignant
lymphoma.

A primary mechanism for the repair of double- 
strand DNA breaks is by various recombinational 
events. As is the case with NER, so-called recombin- 
ational repair in eukaryotes requires multiple gene 
products which may also be organized in a multiprotein
complex. The field of recombinational repair has 
received considerable impetus from recent studies 
demonstrating an interaction of the product of the 


BRCA2 gene implicated in hereditary breast cancer 
with at least one component of the recombinational 
repair machinery. Additionally, cells from mutant
mice carrying certain BRCA2 alleles that still
support embryogenesis (mutations in BRCA2 are
typically embryonic lethal in mice) are abnormally
sensitive to killing by ionizing radiation.


Mouse Models for Defective DNA Repair 
and other Cellular Responses to DNA 
Damage 


No updated consideration of DNA repair, especially 
that in mammals, can be complete without a consid- 
eration of the enormous potential of gene replacement 
by homologous recombination in mouse embryonic 
stem cells. In the past, the prokaryote E. coli, and more 
recently the lower eukaryote S. cerevisiae, have pro- 
vided indispensable genetic frameworks for various 
biochemical studies on DNA repair. Now, the
generation of mutant mouse strains carrying heterozygous
and homozygous partial or complete deletions of
selected genes offers the potential for an elaborate
genetic framework directly relevant to many cellular 
responses to DNA damage in humans. Multiple 
mouse strains have now been constructed with defects 
in BER, NER, and MMR. Additionally, as mentioned 
above, mutants defective in the repair of alkylation 
damage by direct reversal are available. Phenotypic
characterization of individual mutants, and especially
of strains bred to carry multiple different mutations,
is expected to be highly informative, particularly in
dissecting the multiple pathways to cancer in mam- 
malian cells. 
DNA is the essential carrier of genetic information in
all living cells. Every human cell contains roughly 2 
meters of DNA that specify the construction and 


hereditary makeup of an entire human being. Each 
human being contains a total length of DNA (if all of 
the cellular DNA components were placed end to 
end) that would wrap around the earth roughly 5 
million times. How is that much DNA maintained 
and protected from the ravages of noxious agents in 
the environment? The chemical stability of the DNA 
molecule is not unusually great. DNA undergoes 
several types of spontaneous modifications, and it 
can also react with many physical and chemical agents, 
of which some are endogenous products of the cellu- 
lar metabolism (e.g., reactive oxygen species) while 
others, including ionizing radiation and UV light, are 
threats from the external environment. The resulting 
alterations of DNA structure are generally incompat- 
ible with its essential role in preservation and transmis- 
sion of genetic information. Damage to DNA can 
cause genetic alterations, usually termed mutations 
and, if genes that control cell growth are involved, 
these mutations can lead to the development of cancer. 
Of course the DNA damage may also result in cell 
death which can have serious consequences for the 
organism of which the cell is a part; for example, loss 
of irreplaceable neurons in the brain. Accumulation of 
damaged DNA has also been considered to contribute 
to some of the features of aging. It is not surprising 
that a complex set of cellular surveillance and repair 
mechanisms has evolved to reverse the potentially 
deleterious damage that would otherwise destroy the 
precious blueprint for life. Some of these DNA repair 
systems are so important that life cannot be sustained
without them. An increasing number of human 
hereditary diseases that are characterized by severe 
developmental problems and/or a predisposition to 
cancer have been found to be linked to deficiencies 
in DNA repair. 
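The two length figures quoted at the start of this article can be checked against each other with simple arithmetic. The Python lines below assume an Earth circumference of about 40,000 km (4 × 10^7 m), a figure not given in the text. Taken together, the quoted values imply on the order of 10^14 DNA-containing cells, which is only an order-of-magnitude estimate; published estimates of human cell number fall in the range of roughly 10^13 to 10^14.

```python
EARTH_CIRCUMFERENCE_M = 4.0e7   # assumption: about 40,000 km
DNA_PER_CELL_M = 2.0            # "roughly 2 meters" of DNA per cell (from the text)
TIMES_AROUND_EARTH = 5e6        # "roughly 5 million times" (from the text)

def implied_cell_count(per_cell_m: float, wraps: float, circumference_m: float) -> float:
    """Number of cells needed for the total DNA length to circle the Earth `wraps` times."""
    total_length_m = wraps * circumference_m
    return total_length_m / per_cell_m

cells = implied_cell_count(DNA_PER_CELL_M, TIMES_AROUND_EARTH, EARTH_CIRCUMFERENCE_M)
print(f"total length ~ {TIMES_AROUND_EARTH * EARTH_CIRCUMFERENCE_M:.0e} m, implied cells ~ {cells:.0e}")
# total length ~ 2e+14 m, implied cells ~ 1e+14
```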


Types of DNA Repair 


Direct Repair 
The simplest DNA repair schemes are those that 
involve the direct reversal of the damage. Thus, the
inappropriate methylation of guanine can be reversed
by an enzyme called a methyltransferase, which simply
removes the offending methyl group and attaches it
to itself. That reaction is irreversible, however, so an
entire protein must be sacrificed for each repair event.
The importance of repairing O⁶-methylguanine is
underscored by the cell's use of such an energetically
expensive mode for dealing with this lesion.
O⁶-methylguanine pairs with thymine rather than
cytosine during DNA replication, so each unrepaired
lesion effectively causes a base change, a potential
G:C to A:T transition mutation.
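A minimal Python sketch of why an unrepaired O⁶-methylguanine is mutagenic, under simplifying assumptions: strands are plain strings read position by position (strand orientation is ignored), and the letter M is a made-up symbol for O⁶-methylguanine that, following the text, templates thymine instead of cytosine. Two rounds of replication turn the original G into an A on that strand.

```python
# Watson-Crick partners for the four normal bases.
PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}
# Hypothetical symbol "M" for O6-methylguanine: it templates T instead of C.
PAIR_WITH_LESION = {**PAIR, "M": "T"}


def copy_strand(template: str) -> str:
    """Return the bases inserted opposite each template position (orientation ignored)."""
    return "".join(PAIR_WITH_LESION[base] for base in template)


def two_rounds(template: str) -> str:
    """Replicate a strand, then replicate the newly made strand once more."""
    first = copy_strand(template)   # the lesion templates T
    return copy_strand(first)       # that T then templates A


original = "ACGTA"     # G at index 2
methylated = "ACMTA"   # the same strand after O6-methylation of that G
print(copy_strand(original))    # TGCAT
print(copy_strand(methylated))  # TGTAT
print(two_rounds(methylated))   # ACATA: the G:C pair has become A:T
```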
Another example of direct repair is photoreactivation,
which deals with a unique type of DNA
damage inflicted by UV light. UV causes adjacent
pyrimidines (e.g., two thymines) in a strand of DNA 
to become covalently linked together to form a dimer. 
These dimers pose blocks to the essential DNA trans- 
actions of replication and transcription. An enzyme 
called photolyase recognizes and binds to the dimer 
and then, upon exposure to visible light, the enzyme 
catalyzes the splitting of the dimer to restore the intact 
DNA. It is not just that the enzyme “needs light to see
what it is doing”; the absorbed light actually provides
the energy for the dimer reversal. Many different 
species contain photolyases, but interestingly, humans 
apparently do not. 
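The substrate for photolyase is a pair of adjacent pyrimidines in the same strand. The hypothetical helper below only lists the positions where such dipyrimidine sites occur in a sequence; it is an illustration and makes no attempt to predict which sites actually form dimers or at what rate.

```python
PYRIMIDINES = {"C", "T"}


def dipyrimidine_sites(strand: str) -> list[int]:
    """Return 0-based positions i where strand[i] and strand[i + 1] are both pyrimidines."""
    return [
        i
        for i in range(len(strand) - 1)
        if strand[i] in PYRIMIDINES and strand[i + 1] in PYRIMIDINES
    ]


print(dipyrimidine_sites("GATTACCTGA"))  # [2, 5, 6]  (TT, CC, CT)
```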

A third example of direct repair involves the
enzyme polynucleotide ligase, which can rejoin
single-strand interruptions at which no nucleotide
is missing and the abutting ends carry a 3’ hydroxyl
and a 5’ phosphate, respectively. There is even at
least one example of a special ligase that can repair a 
double-strand break, if there are no overlapping 
single-strand ends. 


Excision Repair 

The most ubiquitous and versatile modes of DNA 
repair are those in which the damaged or incorrect 
part of a DNA strand is excised and then the resulting 
gap is filled by repair replication using the comple- 
mentary strand as template. In fact, the redundancy of 
genetic information provided by the duplex DNA 
structure is essential to the maintenance of the genome 
by this “cut and patch” mode called excision repair. 
Each DNA strand can serve as a template for repairing 
the other strand, as well as for replicating it. Excision 
repair was discovered in the early 1960s through basic 
studies on the effects of UV irradiation on DNA 
synthesis in bacteria. Richard Setlow and coworkers 
at Oak Ridge National Laboratory found that DNA 
synthesis did not recover after UV irradiation of a 
certain UV-sensitive mutant strain of Escherichia coli 
bacteria, and they then discovered that wild-type cells
(but not the UV-sensitive mutant) could selectively 
remove thymine dimers from their DNA. The discov- 
ery of pyrimidine dimer excision was soon confirmed 
with other UV-sensitive bacterial mutants by Paul 
Howard-Flanders and his colleagues at Yale Univer- 
sity. At the same time, the patching step, called repair 
replication, was revealed by David Pettijohn and 
Philip Hanawalt at Stanford University, in an analysis 
of the qualitative nature of DNA replication in UV- 
irradiated bacteria. Soon thereafter, repair replication 
was also demonstrated at the University of California, 
San Francisco, by Robert Painter in UV-irradiated 
human cells. A postdoctoral researcher in Painter’s 
laboratory, James Cleaver, then discovered the first 
example of a DNA-repair-defective human hereditary
disease, xeroderma pigmentosum, to be discussed in
more detail later. Cleaver was able to show that skin 
cells from the victims of this rare genetic disease, 
characterized by sunlight sensitivity and skin cancer 
predisposition, were deficient in repair replication. 
The excision repair pathway that deals with pyrimi- 
dine dimers and a large variety of cancer-causing 
chemical adducts to DNA is known as nucleotide 
excision repair (NER). The damage is recognized by 
an enzyme complex that recruits nucleases to cut the 
damaged strand on each side of the lesion. Then the 
damaged segment is removed by a DNA-unwinding
enzyme (a helicase), and a DNA polymerase
synthesizes a replacement “patch,” using the
nucleotide sequence information from the intact
complementary strand. Finally, DNA ligase joins the
patch to the contiguous DNA to complete
the repair process and restore the original intact DNA
structure. This excision repair pathway can remove 
DNA damage from sites throughout the genome. 
However, a unique problem arises if the lesion is first 
encountered by a translocating RNA polymerase 
making messenger RNA, before repair enzymes have 
removed the damage and restored intact DNA. The 
polymerase may be arrested at the site of the lesion,
where it also blocks access to the damage by repair
enzymes. Furthermore, the arrest of transcription in
human cells can trigger a programmed death pathway, 
known as apoptosis. A dedicated excision repair path- 
way known as transcription-coupled repair (TCR) 
comes to the rescue and displaces the RNA polymer- 
ase, and then efficiently repairs the blocking lesion so 
that transcription may resume — and so that the cell 
may survive. 
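The “cut and patch” logic of nucleotide excision repair can be sketched as a toy string operation. This is an illustration under stated assumptions, not a model of the enzymology: the two strands are position-aligned strings (orientation ignored), X is a made-up lesion symbol, and the window sizes `left` and `right` are arbitrary illustrative parameters rather than real incision distances.

```python
PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}


def nucleotide_excision_repair(damaged: str, template: str, lesion: int,
                               left: int = 3, right: int = 3) -> str:
    """Toy cut and patch: excise a window around the lesion, resynthesize it from the template.

    `damaged` and `template` are position-aligned: template[i] pairs with damaged[i].
    `left` and `right` are illustrative window sizes, not measured incision distances.
    """
    start = max(0, lesion - left)                 # incision on one side of the lesion
    end = min(len(damaged), lesion + right + 1)   # incision on the other side
    # After excision, repair synthesis copies the intact complementary strand.
    patch = "".join(PAIR[base] for base in template[start:end])
    # Ligation joins the patch to the flanking DNA.
    return damaged[:start] + patch + damaged[end:]


template = "TACGGATCCA"
damaged = "ATGCXTAGGT"   # 'X' marks a bulky lesion at index 4
print(nucleotide_excision_repair(damaged, template, lesion=4))  # ATGCCTAGGT
```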

Mismatch repair (MMR) is another example of 
excision repair. It corrects mismatched nucleotides in
otherwise complementary DNA strands; such mismatches
arise from DNA replication errors and recombination,
as well as from some types of base modification. This
repair mode
can also deal with small loops of single-stranded DNA 
at sites of deletions (missing segments of one strand) in 
the duplex DNA structure. The importance of this 
repair mechanism in maintaining genetic stability is 
illustrated by the observation that its absence results in 
a large increase in the frequency of spontaneous muta- 
tions. Some of these spontaneous mutations arise from 
mistakes introduced during DNA replication, in spite 
of the operation of a “proofreading” system that also 
helps to ensure the high fidelity of replication. In 
humans, genetic defects in several mismatch repair 
genes have been linked to hereditary nonpolyposis 
colon cancer (HNPCC) as well as to sporadic cancers
that exhibit instability in regions of DNA containing
short repetitive sequences of nucleotides. In general,
the types of genetic defects that lead to genomic
instability are those that compromise the efficiency
of, or eliminate, DNA repair pathways. Cancer is
one of the adverse outcomes of genomic instability.
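Detecting a mismatch amounts to finding positions where the two strands are not Watson-Crick partners. The sketch below is a simplification: it assumes position-aligned, equal-length strands, so it ignores the small single-stranded loops mentioned above, and it does not address how the real system decides which strand carries the error.

```python
WATSON_CRICK = {("A", "T"), ("T", "A"), ("G", "C"), ("C", "G")}


def find_mismatches(strand: str, partner: str) -> list[int]:
    """Positions where position-aligned bases do not form a Watson-Crick pair."""
    if len(strand) != len(partner):
        raise ValueError("toy model: strands must be aligned and of equal length")
    return [i for i, pair in enumerate(zip(strand, partner)) if pair not in WATSON_CRICK]


print(find_mismatches("GATTACA", "CTAGTGT"))  # [3]  (a T opposite a G)
```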

The principal strategy for dealing with the spontan- 
eous loss of purines from DNA and some minor base 
alterations is base excision repair (BER), a repair
pathway that is essential for DNA maintenance. Indeed,
the essential nature of BER is highlighted by the fact
that no human hereditary diseases are known in which
genes unique to this pathway have been mutated. In
1 h, humans spontaneously lose on the order of a
trillion guanines from their DNA, and these guanines
must be replaced. Similarly, an unacceptably large
number of cytosines become deaminated spontaneously,
and the resulting product, uracil, must be removed
from DNA and replaced with cytosine to restore
the correct nucleotide sequence. Base excision repair is
usually initiated by a glycosylase that recognizes the
altered or inappropriate base and cleaves it from its
sugar moiety in the DNA. Then the DNA backbone is
cut at the resulting abasic site, and a short patch,
which can be as short as a single nucleotide, is
synthesized. Many DNA glycosylases recognize a particular
form of base damage or a particular inappropriate 
base. For example, uracil-DNA glycosylase (UDG) 
removes uracil that is either incorporated into DNA
inadvertently in place of thymine during
semiconservative replication (since the nucleotide pool
contains dUTP) or formed by the endogenous hydrolytic
deamination of cytosine, as noted above. Uracil
glycosylases are ubiquitous and have one of the most
highly conserved amino acid sequences to be found in
proteins throughout evolution. The universal appearance
of this base excision repair enzyme clearly attests to
the importance of reversing the progressive conversion
of DNA cytosine to uracil. Because uracil pairs with
adenine, an unrepaired uracil derived from cytosine
leads to a C:G to T:A transition after replication.
Some other DNA glycosylases recognize a broader 
spectrum of lesions. For example, the E. coli Fpg 
protein removes oxidized purines, including
formamidopyrimidine lesions and 8-oxoguanine, which arise
in cellular DNA from endogenous reactive oxygen
species. In E. coli, two glycosylases are known that 
excise purines methylated at their N-3 and N-7 posi- 
tions, and there are also several glycosylases that 
recognize oxidized pyrimidines. 

High-resolution X-ray structural analyses of several
DNA glycosylases, together with mutational analysis
and data on the three-dimensional structures of
enzyme-substrate complexes, have provided detailed
mechanistic information on this class of repair
enzymes. UDG features a DNA-binding groove and an
adjacent pocket that tightly fits a deoxyuridine
residue, which can be flipped out from the DNA helix
to be “interrogated” by the surveillance systems.
Another type of glycosylase employs a large
hydrophobic cleft that is rich in electron-donating
aromatic residues, rather than a pocket, to
accommodate flipped-out damaged residues, and it can
act on a variety of lesions in this manner.

The action of DNA glycosylases generates abasic 
sites, the same sort of DNA lesions that originate from 
spontaneous depurination. The repair of these sites 
(which can be considered secondary lesions) is initiated 
by abasic endonucleases. While E. coli contains at least 
two such enzymes, only a single enzyme has thus far 
been characterized in yeast and mammalian cells; it
catalyzes the incision of phosphodiester bonds
exclusively on the 5’ side of abasic sites, leaving 5’ 
deoxyribose-phosphate and 3’ OH residues. Comple- 
tion of the BER pathway requires the removal of the 5’ 
deoxyribose-phosphate residue by a phosphodiester- 
ase, followed by DNA repair synthesis and ligation. 
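The sequence of BER steps described above can be traced on a toy strand. In this sketch the symbols are hypothetical conventions: `-` marks an abasic site, `*` an abasic site after incision on its 5' side, `_` a one-nucleotide gap after removal of the deoxyribose-phosphate, and a lowercase letter a newly synthesized nucleotide not yet ligated. Strands are position-aligned strings with orientation ignored.

```python
PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}


def base_excision_repair(strand: str, template: str, bad_base: str = "U") -> list[str]:
    """Return the toy strand after each BER step (strand and template are position-aligned)."""
    states = [strand]
    s = strand.replace(bad_base, "-")   # glycosylase removes the base: abasic site '-'
    states.append(s)
    s = s.replace("-", "*")             # AP endonuclease cuts the backbone 5' of the site
    states.append(s)
    s = s.replace("*", "_")             # phosphodiesterase removes the 5' dRP: one-nt gap
    states.append(s)
    # Repair synthesis fills the gap from the template (lowercase = not yet ligated).
    s = "".join(PAIR[template[i]].lower() if ch == "_" else ch for i, ch in enumerate(s))
    states.append(s)
    states.append(s.upper())            # ligation seals the nick
    return states


template = "TACGGATC"
deaminated = "ATGUCTAG"   # cytosine at index 3 deaminated to uracil
for state in base_excision_repair(deaminated, template):
    print(state)
# ATGUCTAG, ATG-CTAG, ATG*CTAG, ATG_CTAG, ATGcCTAG, ATGCCTAG
```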

Attempts to engineer mice deficient in the enzymes 
required for BER have typically resulted in early
embryonic death. This attests to the importance of the
repair of DNA lesions from endogenous causes during
embryonic development, and it is also consistent with
the notable absence of human cancer-prone diseases
characterized by defects in BER genes, noted above. 


Double-Strand Break Repair 

A very serious type of DNA damage is that in which 
both DNA strands have been severed. Double-strand
breaks are an important lesion caused by ionizing
radiation, but they are also generated naturally
in the course of genetic recombination. In fact,
double-strand breaks appear to be an essential
intermediate in V(D)J recombination in the
immune system. Genetic recombination is the principal
mechanism for repairing double-strand breaks
in which there are homologous stretches of nucleotides
at the ends to be joined. If such homology is not
present, however, breaks can instead be rejoined by
nonhomologous end joining. An unrepaired double- 
strand break is a highly lethal event, and as few as one 
double-strand break in the entire genome is thought to 
be sufficient to signal cell cycle checkpoints that pre- 
vent attempted DNA synthesis or cell division until 
repair has been completed. 


Relationships between DNA Repair and 
Human Hereditary Diseases 


A complex interplay between intrinsic hereditary fac- 
tors and persisting DNA damage determines the 
susceptibility of humans to cancer. The correlation 
between mismatch repair deficiency and colon cancer 
susceptibility has been referenced above. The discovery 
of microsatellite instability, that is, the frequent
alteration in the tract lengths of certain short repetitive
nucleotide sequences in some hereditary colorectal
cancers, provided the first indication that the etiology
of these cancers might involve a problem in correcting
errors introduced during DNA replication. Because
repetitive sequences tend to form strand-slipped
structures that mispair during replication, small
deletions are generated that give rise to frameshift
mutations. The mismatch repair system normally
corrects these errors. Thus, a defect in mismatch repair 
might be expected to result in microsatellite instabil- 
ity. The finding of mismatch repair gene defects in 
patients with HNPCC established that these defects 
could be the cause of the enhanced cancer incidence. 
HNPCC accounts for 5% of all colorectal cancers and 
the patients are also at some risk for cancers of the 
endometrium, ovary, stomach, and intestine. The cor- 
respondence between a mismatch repair gene defect 
and susceptibility to cancer provides support for the 
original hypothesis of Lawrence Loeb, at the University
of Washington, that tumorigenesis is usually promoted
by a “mutator” phenotype.
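The arithmetic behind the frameshift argument is simple: losing one unit of a dinucleotide repeat removes 2 nucleotides, which is not a multiple of 3, so a coding sequence downstream of the tract is read in a different frame. The helper functions below are illustrative only, and the frameshift check is meaningful only if the tract lies within a coding region.

```python
import re


def repeat_tract_length(seq: str, unit: str) -> int:
    """Length in nucleotides of the longest uninterrupted run of `unit` in `seq`."""
    runs = re.findall(f"(?:{re.escape(unit)})+", seq)
    return max((len(run) for run in runs), default=0)


def causes_frameshift(length_change: int) -> bool:
    """An insertion or deletion shifts the reading frame unless its length is a multiple of 3."""
    return length_change % 3 != 0


normal = "ATG" + "CA" * 10 + "GGT"
slipped = "ATG" + "CA" * 9 + "GGT"   # one CA unit lost through strand slippage
change = repeat_tract_length(slipped, "CA") - repeat_tract_length(normal, "CA")
print(change, causes_frameshift(change))  # -2 True
```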


Xeroderma Pigmentosum (XP) 
As discussed earlier, XP was the first example of a
DNA-repair-deficient disease in humans. XP is a rare
autosomal recessive disease characterized by severe
sun-sensitivity leading to the high incidence of skin
tumors. XP victims exhibit up to a 4000-fold
enhanced risk of cancer in sun-exposed skin but only a
modest increase in internal cancers. Severe
ophthalmological defects and neurological abnormalities
are also associated with the more severe forms of the disease.
XP occurs worldwide, in all ethnic groups, and with a 
frequency varying from 1 to 10 patients per million. 
Cultured XP cells are unusually sensitive to UV. 
The “classical” XP complementation groups are 
defective in the early steps of NER, while an XP vari-
ant class shows no apparent defect in NER. The XP 
variant is deficient in a special DNA polymerase that 
is normally able to carry out translesion replication 
over pyrimidine dimers in the DNA. The importance 
of this polymerase is underscored by the fact that the 
patients lacking it have the same clinical problem of 
high cancer incidence as those who are deficient in 
excision repair. Thus, it is important that replication 
forks be able to bypass pyrimidine dimers, and perhaps 
other lesions as well, that can then be repaired later. 
Genetic analysis based upon fusions between XP 
cells isolated from different patients has revealed the 
existence of seven classical complementation groups, 
clearly demonstrating the genetic complexity of the 
disease. Most of the complementation groups exhibit 
defects in both global genomic DNA repair and
transcription-coupled repair. However, two of the
genes, XPC and XPE, appear to operate only in the
global genomic pathway. Other genes are unique to
the transcription-coupled repair pathway, as
described below.
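Assigning patients to complementation groups from cell-fusion results is a connected-components problem: fused cells that remain repair-deficient indicate a defect in the same gene, while restored repair indicates defects in different genes. The sketch below uses a union-find structure; the patient labels and fusion results are invented for illustration, and it assumes each patient carries a defect in a single gene. Pairs that were never fused and are not linked through other fusions end up in separate groups, which a real analysis would still need to test.

```python
def complementation_groups(patients, fusions):
    """Group patients whose fused cells fail to complement (remain repair-deficient).

    fusions: iterable of (a, b, repair_restored) tuples from pairwise cell fusions.
    Toy assumption: each patient carries a defect in exactly one gene.
    """
    parent = {p: p for p in patients}

    def find(p):
        while parent[p] != p:
            parent[p] = parent[parent[p]]   # path halving keeps trees shallow
            p = parent[p]
        return p

    for a, b, repair_restored in fusions:
        if not repair_restored:             # no complementation: same gene is defective
            parent[find(a)] = find(b)

    groups = {}
    for p in patients:
        groups.setdefault(find(p), []).append(p)
    return sorted(sorted(group) for group in groups.values())


patients = ["P1", "P2", "P3", "P4", "P5"]   # hypothetical patients
fusions = [
    ("P1", "P2", False),  # hybrid still UV-sensitive
    ("P1", "P3", True),   # hybrid repair-proficient
    ("P3", "P4", False),
    ("P2", "P5", True),
    ("P4", "P5", True),
]
print(complementation_groups(patients, fusions))
# [['P1', 'P2'], ['P3', 'P4'], ['P5']]
```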


Cockayne Syndrome 

Cockayne syndrome (CS) is another rare autosomal 
recessive genetic disease in which the victims are 
severely sensitive to sunlight. However, unlike XP, 
the CS patients do not develop unusually high levels 
of skin cancer. Instead, these patients are characterized
by short stature, absence of facial fat, wizened
appearance, and severe neurological deterioration during
early development, resulting in mental retardation,
hearing loss, optic atrophy, and an average life span
of only 12 years. By far the majority of the known CS 
patients are defective in one of two genes, CSA or 
CSB, but there are a few examples of overlap between 
CS and XP. Thus, all three known XPB patients have 
CS, two out of 52 known XPD patients have CS, and 
the six most severely afflicted XPG patients have CS. 
The DNA repair defect that is common to all CS 
patients is a deficiency in transcription-coupled repair. 
This can account for the sunlight sensitivity (i.e., severe 
sunburn) because of the cell death, by apoptosis, due 
to transcription arrest at unrepaired lesions. The lack
of cancer susceptibility may be explained by the fact
that dead cells do not form tumors and that those cells
that do survive exhibit perfectly normal global genomic
repair, so most of the potentially cancer-producing 
lesions are removed. The most likely explanation for 
the severe developmental problems is that endogenous 
oxidative damage in some metabolically active cells 
(e.g., neurons) is blocking transcription. The lack of 
transcription-coupled repair then leads to apoptosis of 
these essential cells. An alternative model derives from 
the realization that both XPB and XPD gene products 
are components of an essential transcription initiation 
factor, TFIIH. It has been suggested that CS is a
‘transcription disease’ in which certain essential genes
cannot be transcribed at adequate frequencies.


Trichothiodystrophy 

Although the photosensitive form of this rare auto- 
somal recessive disease has some features in common 
with CS (e.g., lack of skin cancer predisposition),
trichothiodystrophy (TTD) presents several distinctive
characteristics. These include ichthyosis
(i.e., dry, scaly skin), but most notably brittle hair 
and nails, due to reduced sulfur content in their
component proteins. It is normally the presence of the
amino acid cysteine in certain proteins, and the
crosslinking of these proteins through disulfide
linkages, that gives hair its flexibility. As with CS,
the genes implicated include XPB and XPD, but in
addition there is a third complementation group,
TTD-A, that unlike the others does not appear to be
involved with the structure of TFIIH. The favored model for TTD is
that of a transcription deficiency with respect to the 
genes for the sulfur-containing proteins, noted above, 
as well as others that may be common to CS. It is also 
conceivable that TTD could be a disease of ‘premature 
cell death’ in which the transcription deficiency and 
deficiency in transcription-coupled DNA repair 
could cause the apoptosis of certain classes of cells. 
These could include neurons, as with CS, but addition- 
ally those cells in the hair follicles that produce the high 
sulfur content proteins during the assembly of hair. 


Ataxia Telangiectasia 

Ataxia telangiectasia (AT) was originally identified in 
humans with a severe sensitivity to ionizing radiation, 
but curiously, not to UV. Then it was learned that AT 
also affects the immune system and the cerebellum. 
The neurological defect causes progressive loss of 
motor control leading to lack of coordination and 
balance. Respiratory infections develop in children 
with this disease as a consequence of the deficiency 
in the immune system. The term ‘ataxia’ refers to the 
neurological dysfunction while ‘telangiectasia’ refers 
to the characteristic dilation of the blood vessels in the 
eye in AT patients. A single gene, ATM, is responsible 
for the multiple and surprisingly diverse symptoms 
of this disease that also include a predisposition to 
lymphoma and leukemia. Over 10% of AT patients 
develop cancer at an early age. AT is an autosomal 
recessive disease with an incidence of nearly 1 in 
100 000 live births. Even the AT heterozygotes, com- 
prising 1% of the general population, appear to have 
some predisposition to cancer. 
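The heterozygote figure can be checked against the incidence using the Hardy-Weinberg relation, assuming random mating, a single recessive allele, and a population at equilibrium: if the frequency of affected individuals is q² = 1/100 000, the carrier frequency is 2pq with p = 1 - q.

```python
import math


def carrier_frequency(affected_frequency: float) -> float:
    """Hardy-Weinberg estimate of heterozygote frequency 2pq from the affected frequency q^2."""
    q = math.sqrt(affected_frequency)   # frequency of the mutant allele
    p = 1.0 - q                         # frequency of the normal allele
    return 2.0 * p * q


print(f"{carrier_frequency(1 / 100_000):.4f}")  # 0.0063
```

This gives roughly 0.6%, the same order of magnitude as the 1% quoted above; the estimate is only as good as the equilibrium assumptions.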

Another curious hallmark of AT is what has been 
termed X-ray-resistant DNA synthesis: AT cells continue
to synthesize DNA after irradiation. We now know
that the ATM gene is a key element in controlling the
cell cycle and specifically in delaying the initiation of
DNA replication following DNA damage of the sort
that involves strand breaks, so in its absence
replication is not held back. The ATM protein exerts
its regulatory role as a kinase that phosphorylates a 
number of other important proteins, such as the tumor 
suppressor p53, and several proteins implicated in 
double-strand break repair. 


Diseases Involving Homologs of recQ 


There are at least three cancer-prone diseases in
humans in which the defect is in a homolog of a gene,
recQ, originally discovered in E. coli bacteria. (The
history of recQ is an example of the value of basic
research on “simple” bacterial cells, which can yield
insight into human genetic disease.) The product
of recQ is a helicase that, in E. coli, is involved in
processing the nascent DNA at arrested replication
forks. It is not yet clear what the respective homologs
of recQ do in human cells. However, the effects of
their deficiency can be dramatic and profound.
Thus, Bloom syndrome is characterized by an extremely
high frequency of genetic exchanges (so-called
sister chromatid exchanges) that cause genomic
instability and cancer. In Werner syndrome, the
deficiency in another recQ homolog results in
remarkable features of premature aging as well as
cancer predisposition. A defect in yet another recQ
homolog causes Rothmund-Thomson syndrome, which
presents with growth deficiency and cancer
predisposition. In none of these syndromes does there
appear to be a deficiency in repair of DNA damage of
the sort that is subject to excision repair.

Table 1 Major human genetic diseases involving defects in DNA damage response pathways

| Syndrome | Gene(s) | Biological function | Clinical features | Hypersensitivities |
|---|---|---|---|---|
| Xeroderma pigmentosum (XP) | XPA through XPG; XPV | Nucleotide excision repair (XPA through XPG); translesion DNA synthesis (XPV) | Sunlight hypersensitivity; greatly increased skin cancers; neurological defects (sometimes) | UV, chemical carcinogens |
| Cockayne syndrome (CS) | CSA, CSB, XPB, XPD, XPG | Transcription-coupled repair and transcription | Growth retardation; mental retardation; premature aging; sunlight hypersensitivity; no increase in skin cancers | UV, chemical carcinogens |
| Trichothiodystrophy (TTD) | XPB, XPD, TTDA | Nucleotide excision repair | Brittle hair and nails; dry, scaly skin; mental retardation; photosensitivity (sometimes); no increase in skin cancers | UV |
| Ataxia telangiectasia (AT) | ATM | Damage-responsive kinase | Cerebellar ataxia; telangiectasia; neurological deterioration; immunodeficiency; lymphomas | Ionizing radiation but not UV |
| Bloom syndrome (BS) | BLM | DNA helicase | Sunlight hypersensitivity; growth retardation; leukemias; breast and intestinal cancer | UV |
| Werner syndrome (WS) | WRN | DNA helicase | Premature aging; atherosclerosis; soft tissue sarcomas; melanoma, thyroid cancer | 4-NQO, camptothecin |
| Rothmund-Thomson syndrome (RTS) | RecQ4 | DNA helicase | Growth deficiency; sunlight sensitivity; osteogenic sarcomas; squamous cell carcinomas | UV |
| Li-Fraumeni syndrome (LFS) | p53 | Controls apoptosis, cell cycle checkpoints, and nucleotide excision repair | Early-onset cancers, including breast, brain, leukemia, sarcomas | UV resistance (when both p53 alleles are defective) |
| Lynch syndrome | MSH2, MLH1 | Mismatch repair | Colon cancer; endometrial cancer | 6-thioguanine and cisplatin resistance |
| Breast/ovarian cancer syndrome | BRCA1, BRCA2 | Double-strand-break repair; oxidative damage repair; transcription-coupled repair | Early-onset breast and ovarian cancers | ? X-rays |
| Nijmegen breakage syndrome | NBS1 | Double-strand-break repair | Microcephaly; immunodeficiency; lymphomas, neuroblastoma, rhabdomyosarcoma | X-rays |
| Fanconi anemia (FA) | FAA, FAC | Interstrand crosslink repair | Growth retardation; bone marrow deficiency; leukemia predisposition; anatomical defects | Bifunctional alkylating agents |


Li-Fraumeni Syndrome (LFS) 

The gene defect responsible for Li-Fraumeni syn- 
drome (LFS) is p53, a gene that has been shown to be 
mutated in over 50% of human cancers. The hetero- 
zygote ‘carriers’ of a defective p53 allele do not 
appear to have clinical problems or DNA repair 
defects. However, when the second allele has been 
mutated, or lost, the absence of functional p53 results 
in severe problems for the cell. First of all, the
p53-controlled pathway of apoptosis is disengaged, so
severely damaged cells will survive and be at risk for
carcinogenic transformation because of their genomic
instability. (In fact, LFS cells lacking both functional
p53 alleles are often less sensitive to UV than wild-type
cells.) That genomic instability derives from the fact 
that p53 is also an important regulator of cell cycle 
checkpoints. Thus, as with the situation in AT, the cells 
continue to progress through their growth cycle, 
rather than pausing to allow time for DNA lesions to 
be repaired. Finally, p53 serves an important regula- 
tory function in nucleotide excision repair, and in its 
absence some important mutagenic lesions are simply 
not repaired. That, of course, is a major contributor to 
the genomic instability and the consequent develop- 
ment of tumors. It is of some interest and importance 
that most rodent cells do not express the p53- 
dependent excision repair pathway. Therefore, rodents 
are imperfect surrogates for use in carcinogenicity 
testing protocols for environmental risk assessment.


Other Genetic Diseases 


Table 1 includes a number of other examples of
hereditary syndromes in which DNA damage processing
pathways have been compromised. It is likely, indeed 
certain, that there are additional genes to be revealed 
as participants in the diseases listed as well as some 
new ones. There are also increasing examples of overlap
between the different pathways and gene functions.
Thus, the genes BRCA1 and BRCA2, which predispose
to breast cancer when defective, are involved in the
repair of double-strand breaks, and BRCA1 has also
been implicated in the transcription-coupled excision
repair of certain oxidative lesions in DNA. The
Nijmegen breakage syndrome is also related to
double-strand-break repair, and its gene, NBS1, is one
of the phosphorylation targets of the ATM kinase.
There is yet a third disease (not listed in Table 1),
called ‘AT-like disorder,’ in which the defective gene,
MRE11, is also required for double-strand-break repair.

There is clearly an intricate web of overlapping 
DNA damage surveillance and repair schemes, which
pose an ongoing and exciting challenge for researchers 
and clinicians. The overall goal must be to learn which 
genes are involved in the different pathways, so that 
human predisposition to different genetic diseases 
may be assessed. The knowledge to be gained will 
hopefully be of value in the design of therapeutic 
strategies for improving the health and well-being of 
the victims of these genetic diseases. 
The faithful replication of DNA is one of the
cornerstones of heredity. The process involves a DNA
duplex separating into two strands, with each strand 
serving as a template for the synthesis of a new com- 
plementary strand. A set of enzymes and associated 
factors catalyze the initiation, DNA synthesis, and
termination of replication. For example, in Escherichia 
coli, the DNA replication complex consists of as many 
as 20 proteins, although only a few of them catalyze 
the actual DNA synthesis. Some proteins serve to edit 
out mispairs during replication, and additional pro- 
teins repair damaged DNA or mispairs after replica- 
tion. Replication adheres to the following principles. 
The new strand is synthesized in the 5' to 3' direction.
Replication is semiconservative, in that each duplex 
yields two daughter duplexes containing one old 
strand and one complementary new strand. Synthesis 
is usually, but not always, bidirectional, proceeding in 
both directions from each start site or origin. Because 
all polymerases synthesize new chains in the 5’ to 3’
direction and the two template strands are antiparallel,
at each growing fork one strand is synthesized
continuously, and the other strand discontinuously,
leaving gaps that are then filled in by subsequent
DNA synthesis and ligation by the enzyme DNA 
ligase. DNA polymerases do not initiate DNA 
synthesis, but rather add nucleotides to the 3’ end of a
short stretch of a new strand, called a primer. An RNA primer is used to
initiate DNA synthesis. In E. coli, the enzyme primase 
synthesizes this short stretch of RNA. The primer is 
ultimately removed by the exonuclease activity of one 
of the polymerases. 
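Semiconservative replication can be checked by tracking which strands are old and which are new over successive rounds. In this toy sketch each strand is a (sequence, label) pair, and strand orientation, primers, and discontinuous synthesis are ignored; after two rounds, half of the duplexes are hybrid (one old, one new strand) and half are entirely new.

```python
from collections import Counter

PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}


def complement(strand: str) -> str:
    return "".join(PAIR[base] for base in strand)


def replicate(duplexes):
    """One round of semiconservative replication.

    Each duplex is a pair of (sequence, label) strands; every parental strand is kept
    and paired with a newly synthesized complementary strand labeled 'new'.
    """
    daughters = []
    for strand_a, strand_b in duplexes:
        for seq, label in (strand_a, strand_b):
            daughters.append(((seq, label), (complement(seq), "new")))
    return daughters


def label_composition(duplexes):
    """Count duplexes by the (sorted) labels of their two strands."""
    return Counter(tuple(sorted((a[1], b[1]))) for a, b in duplexes)


parent = [(("ATGC", "old"), ("TACG", "old"))]
gen1 = replicate(parent)
gen2 = replicate(gen1)
print(label_composition(gen1))  # 2 hybrid (new/old) duplexes
print(label_composition(gen2))  # 2 hybrid and 2 entirely new duplexes
```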


See also: DNA Ligases; DNA Polymerases; DNA 
Structure; Semiconservative Replication 
The first methods for sequencing DNA were developed
in the mid-1970s by Fred Sanger, and by
Walter Gilbert and Allan Maxam. Subsequently, 
Sanger developed a new method that forms the basis 
of most DNA sequencing today. This technique uses 
dideoxy nucleotides in DNA synthesis reactions. 
Because they lack a 3’-hydroxyl group, the dideoxy 
nucleotide triphosphates terminate synthesis after 
being incorporated into a growing chain. Labeled
fragments are separated by electrophoresis on a
polyacrylamide gel and visualized by autoradiography.
Four parallel reactions are run in four different tubes,
each tube containing a small amount of one of the
dideoxy nucleotide triphosphates and all four of the
normal deoxy nucleotide triphosphates. A single-stranded
DNA template is copied from a primer, and the newly
synthesized fragments carry a label. In each tube, all
possible fragment sizes are generated that result from the
random incorporation of the respective dideoxy
nucleotide triphosphate, one per molecule. The
autoradiogram of the four reactions can be read to yield
the linear DNA sequence of all four bases. Automated
DNA sequencing machines use fluorescent dyes to
label the DNA fragments, with a different color for
each reaction, so that the products can be read from a
mixture of all four reactions. The sequence is then printed out. The
very latest DNA sequencing machines are so rapid 
that the genomes of certain microorganisms can be 
sequenced virtually in one day, and most of the 
human genome in approximately two years. 
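The logic of reading a dideoxy sequencing gel can be sketched by listing, for each of the four reactions, the lengths at which chains terminate, and then reading the bands from shortest to longest. This assumes that every possible termination product is present, as described above, and ignores strand orientation; the function names are illustrative. The sequence read this way is that of the newly synthesized strand, that is, the complement of the template.

```python
PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}


def dideoxy_reactions(template: str) -> dict[str, list[int]]:
    """Fragment lengths produced in each of the four chain-termination reactions.

    The new strand is the complement of the template (orientation ignored); a fragment
    ends wherever the reaction's dideoxy nucleotide was incorporated.
    """
    new_strand = "".join(PAIR[base] for base in template)
    return {dd: [i + 1 for i, base in enumerate(new_strand) if base == dd] for dd in "ACGT"}


def read_gel(lanes: dict[str, list[int]]) -> str:
    """Read the bands from the shortest fragment to the longest across all four lanes."""
    bands = sorted((length, base) for base, lengths in lanes.items() for length in lengths)
    return "".join(base for _, base in bands)


lanes = dideoxy_reactions("TACGGATC")
print(lanes)            # {'A': [1, 7], 'C': [4, 5], 'G': [3, 8], 'T': [2, 6]}
print(read_gel(lanes))  # ATGCCTAG
```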


See also: Functional Genomics 
The primary structure of DNA, or deoxyribonucleic
acid, consists of three different chemical moieties: the 
phosphate group, the cyclic five-membered deoxyri- 
bose sugar ring, and the bases. DNA is a very long 
polymer with a repetitive backbone structure, in 
which the phosphate groups are joined by phospho- 
diester linkages to the 5’ and 3’ hydroxyl groups of 
successive sugars along the chain. The bases are planar 
aromatic groups joined to the C1’ sugar atom by a 
C-N glycosidic bond, and consist of the single-ring 
pyrimidines, thymine (T) and cytosine (C), and the 
double-ring purines, adenine (A) and guanine (G). 
There is no structural constraint on the sequence of 
bases along a DNA strand, allowing this sequence to 
be the repository of genetic information for the cell. In 
its most common biological form, DNA forms a 
structure in which two strands are coiled around 
each other — the double helix. The strands interact 
via hydrogen-bonding interactions between the bases 
to form base pairs between a purine on one strand and 
a pyrimidine on the other. Adenine is normally paired 
with thymine to form a pair with two hydrogen 
bonds, while guanine and cytosine form a second 
pair with three hydrogen bonds. These two purine- 
pyrimidine pairs are very similar in shape, so that, to a 
first approximation, the overall structure of the double 
helix is independent of the base sequence (Figure 1). 
This permits highly regular packaging into compact 
higher-order structures inside the nucleus of the cell. 
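The pairing rules above can be expressed directly in code. This sketch writes the partner strand 5' to 3' (the reverse complement, since the two strands are antiparallel) and counts inter-strand hydrogen bonds, two per A-T pair and three per G-C pair. Base stacking rather than hydrogen bonding provides the strongest driving force for duplex formation, so the count is not by itself a measure of duplex stability.

```python
PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}
H_BONDS = {"A": 2, "T": 2, "G": 3, "C": 3}   # per base pair: A-T has 2, G-C has 3


def reverse_complement(strand: str) -> str:
    """Partner strand written 5' to 3' (the strands are antiparallel)."""
    return "".join(PAIR[base] for base in reversed(strand))


def duplex_hydrogen_bonds(strand: str) -> int:
    """Total inter-strand hydrogen bonds in a perfectly paired duplex."""
    return sum(H_BONDS[base] for base in strand)


seq = "ATGCGC"
print(reverse_complement(seq))     # GCGCAT
print(duplex_hydrogen_bonds(seq))  # 16
```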

Double-helical DNA contains two grooves desig- 
nated as major and minor, which spiral around the 
outside of the molecule. The bulky outside ‘rails’ of
the helical ladder are formed by the sugar-phosphate
backbones, which run antiparallel to each other. The
antiparallel orientation of the two strands is most
clearly seen in the opposite orientations of the sugar
rings (Figure 1). The floors of the grooves are formed
by the edges of the stacked base pairs. It is the stacking
of the base pairs upon each other, like coins in a roll,
that provides the strongest driving force for formation
of the duplex from the individual single strands.

Figure 1 Schematic drawing of a 3-bp segment of duplex DNA. Note that the 5’ to 3’ direction of the chains is opposite; the chains are antiparallel to each other. The commonly found A-T and G-C pairs are shown with dotted lines indicating hydrogen bonds.

The DNA double-helix can form somewhat differ- 
ent three-dimensional conformations depending on 
environmental conditions and, to some extent, on the 
base sequence. Inside the cell where the DNA is fully 
hydrated, a highly regular structure known as the 
B-form is adopted (Figure 2). In B-form DNA, the 
major groove is approximately twice the width of 
the minor groove, and the helical axis runs through 
the center of the base pairs. The depth of the two 
grooves is approximately the same. The DNA makes 
a complete 360° turn in 10 base pair steps, or 36° per 
base pair. Viewed end-on, the molecule thus exhibits 
an elegant 10-fold symmetry. 
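The B-form numbers above are related by simple arithmetic: the twist per base-pair step is 360° divided by the number of base pairs per turn. A minimal Python sketch, assuming an idealized, perfectly regular helix (real DNA varies locally with sequence); the function names are illustrative only, and the same arithmetic applies to any helical form.

```python
def twist_per_bp(bp_per_turn: float) -> float:
    """Helical twist, in degrees, between successive base-pair steps."""
    return 360.0 / bp_per_turn


def turns_in_segment(n_bp: int, bp_per_turn: float) -> float:
    """Number of complete helical turns spanned by n_bp base pairs."""
    return n_bp / bp_per_turn


if __name__ == "__main__":
    # B-form figures quoted in the text: 10 bp per 360-degree turn.
    print(twist_per_bp(10))            # 36.0
    print(turns_in_segment(150, 10))   # 15.0
```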

A second, nonphysiological form of DNA is desig- 
nated the A-form and occurs under conditions when 
the molecule is dehydrated. In A-form DNA, the base 
pairs are tilted from the perpendicular defined by the 
helix axis, and the axis itself lies in the major groove. 
This causes the major groove to become extremely 
deep and narrow, while the minor groove instead is 
quite shallow and broad. Also in contrast to B-DNA, 
the number of base pairs required for a full 360° turn 
increases to nearly 11. Overall, the A-form is shorter, 
broader, and less symmetrical than the commonly 
found B-form. DNA can also adopt a third form, 
known as Z-DNA, which is favored by sequences that 
alternate purines with pyrimidines along the same 
chain. Z-DNA is highly unusual in that it adopts a 
left-handed orientation, in contrast to the right-handed twist of A- and B-DNA. Z-DNA appears to have some biological significance, as it forms preferentially in the DNA of genes that are being actively transcribed.

Figure 2 Three-dimensional structure of double-helical DNA in the B-form as determined by X-ray crystallography. The sugar-phosphate backbones of the two strands spiral around the outside of the molecule. Phosphates are depicted in white and sugars in black. The base pairs (gray) are stacked down the center of the molecule, and their edges form the bottoms of the major and minor grooves.

The sizes and shapes of the major and minor 
grooves in B-DNA have important implications for 
the regulation of gene expression. The major groove is 
wide enough to allow the close approach of proteins, 
which are then able to bind to specific sequences by 
virtue of their ability to form hydrogen bonds with the 
specific chemical groups on the DNA bases. Fortu- 
nately, the major groove edges of the A-T and G-C 
base pairs differ significantly in the identities of the 
chemical groups which can form hydrogen bonds to 
amino acid side chains of the protein. This provides 
an underlying chemical basis for why a protein can 
attach to a particular sequence of DNA in preference 
to others. By contrast, the opportunities for discrimin- 
ation among base pairs in the minor groove are much 
smaller. Thus, the narrow minor groove, which does 
not permit close approach by proteins, is not a serious 
liability to the sequence-discrimination process. Bind- 
ing of proteins to DNA segments near the beginning 
of genes can influence the efficiency with which they 
are transcribed into messenger RNA and ultimately 
translated into proteins. 
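The sequence-reading argument can be made concrete by listing the chemical groups each base pair presents on the floor of each groove. The sketch below uses the four-letter shorthand found in many biochemistry textbooks (A, hydrogen-bond acceptor; D, hydrogen-bond donor; M, methyl group; H, nonpolar hydrogen), read in one fixed direction across the groove; this encoding is an assumption of the illustration, not part of the original entry. Counting distinct patterns shows why the major groove is the better place to read sequence.

```python
# Groups presented by each base pair, read in one fixed direction across the
# groove floor (textbook shorthand, assumed here):
# A = H-bond acceptor, D = H-bond donor, M = methyl group, H = nonpolar hydrogen.
MAJOR_GROOVE = {
    "A-T": "ADAM",
    "T-A": "MADA",
    "G-C": "AADH",
    "C-G": "HDAA",
}
MINOR_GROOVE = {
    "A-T": "AHA",
    "T-A": "AHA",
    "G-C": "ADA",
    "C-G": "ADA",
}


def distinguishable(patterns: dict[str, str]) -> int:
    """Number of distinct patterns, i.e. how many base-pair types a protein
    reading only this groove edge could tell apart."""
    return len(set(patterns.values()))


if __name__ == "__main__":
    print(distinguishable(MAJOR_GROOVE))  # 4: all four pairs and orientations
    print(distinguishable(MINOR_GROOVE))  # 2: only A-T versus G-C
```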

Although the overall helical structure of DNA is 
independent of the base sequence, studies of a large 
number of different sequences by X-ray crystallo- 
graphy have shown that the extent of variation in the 
helical parameters can be surprisingly large. Thus, the 
detailed local structure of the helix does depend on 
the particular sequence, although the rules governing 
this are not well understood. This also has important 
implications for specific DNA recognition by proteins, 
because discrimination among base pairs has the poten- 
tial to arise from protein binding to distinct sequence- 
dependent conformations of the sugar-phosphate 
backbone, as well as to the bases themselves. 

DNA is a highly flexible molecule. This can some- 
times lead to the formation of unusual structures 
known as cruciforms, which occur when a portion of 
the double helix is broken in favor of base-pairing 
within the isolated single strands. The processes of 
DNA replication, recombination, and transcription 
exploit DNA flexibility to form a variety of different 
structures, including junctions in which four strands 
of DNA from two separate duplexes come together. 
Another biologically crucial global alteration of the 
DNA structure, known as supercoiling, also occurs in 
all cells. If any flexible material (such as rubber tubing
or DNA) is broken, and the ends twisted and then 
rejoined, this will give rise to a structure that is coiled 
upon itself. In eukaryotic cells, the chromosomal 
DNA is organized into distinct domains that are 
firmly attached at each end to the nuclear scaffolding. 
Breakage and rejoining of the DNA in these domains 
is carried out by specific enzymes, which introduce or 
remove twists by passing the DNA strands around 
each other. Supercoiling introduces torsional stress 
into the DNA and alters the global positioning of 
some portions of the chromosome with respect to 
others. This has important effects on both the replica- 
tion of DNA and its expression. 
Topological Basis of DNA Supercoiling


DNA supercoiling is a special property of circular, 
double-stranded DNA that is topological in origin. 
It confers new structural and energetic properties. 
When the ends of a linear DNA molecule are ligated 
to produce a covalently closed circle, the two strands 
become intertwined like the links of a chain, and will 
remain so unless one of the strands is broken. The 
number of times one strand is linked with the other 
is described by a fundamental property of DNA 
supercoiling, the linking number (Lk). This is related 
to two geometrical properties of the molecule, the 
twist (Tw, rotation of the strands about the helical 
axis) and the writhe (Wr, which measures the path of
the helix axis in space). These three properties are 
related by: 


$$Lk = Tw + Wr \tag{1}$$


A relaxed, closed circular DNA molecule has a 
linking number (Lk°) given by: 


$$Lk^{\circ} = N/h \tag{2}$$

where N is the number of base pairs, and h is the
helical repeat under the experimental conditions. If a 
linear DNA molecule with an exact number of turns 
under the prevailing conditions were ligated into a 
planar circle in the absence of torsional force, it 
would have a linking number that equaled the num- 
ber of turns in the original linear molecule. However, 
if the circle were closed following the application of a 
torsional force to the molecule, such that one or more 
turns was added or subtracted, the resulting over- or 
underwinding would become trapped in the molecule 
by the circularization. These molecules would be true 
isomers of the relaxed species, by virtue of the top- 
ology of the molecule, and are called topoisomers. A 
negatively supercoiled molecule has a linking deficit
(ΔLk) relative to the relaxed species, i.e.,

$$\Delta Lk = Lk - Lk^{\circ} < 0 \tag{3}$$


It is often convenient to express the level of super- 
coiling in the form of a density that is effectively 
independent of the size of the molecule considered. 
Superhelix density (σ) is given by:

$$\sigma = \Delta Lk / Lk^{\circ} \tag{4}$$


Many natural bacterial DNA species are circular and 
negatively supercoiled. A plasmid extracted from 
Escherichia coli in mid-exponential growth is typically 
supercoiled to the extent of σ = −0.06, although inside
the cell the unconstrained supercoiling in general takes
about half this value. By contrast, the DNA of some
thermophiles is positively supercoiled (overwound,
σ > 0).
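Equations (2) to (4) can be combined into a few helper functions. This is a minimal Python sketch; the default helical repeat h = 10.5 bp per turn is an assumed typical value for B-DNA in solution (the appropriate h depends on the experimental conditions), and the 4000-bp plasmid is hypothetical.

```python
def relaxed_linking_number(n_bp: int, h: float = 10.5) -> float:
    """Equation (2): Lk° = N / h, with h the helical repeat in bp per turn."""
    return n_bp / h


def linking_difference(n_bp: int, sigma: float, h: float = 10.5) -> float:
    """ΔLk corresponding to a superhelix density sigma (equation 4 rearranged)."""
    return sigma * relaxed_linking_number(n_bp, h)


def superhelix_density(lk: float, n_bp: int, h: float = 10.5) -> float:
    """Equations (3) and (4): sigma = (Lk - Lk°) / Lk°."""
    lk0 = relaxed_linking_number(n_bp, h)
    return (lk - lk0) / lk0


if __name__ == "__main__":
    n = 4000  # hypothetical plasmid size in bp
    print(round(relaxed_linking_number(n), 1))     # 381.0
    print(round(linking_difference(n, -0.06), 1))  # -22.9
```

For a plasmid at the σ = −0.06 quoted above, this means roughly 23 fewer links than the relaxed molecule, out of about 381.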

In the absence of strand breakage Lk is constant,
and therefore the sum of twist and writhe is also
constant in any structural change that maintains
strand integrity. Thus, the linking deficit is partitioned 
between geometric alterations in the molecule of tor- 
sional and flexural character: 


$$\Delta Lk = \Delta Tw + \Delta Wr \tag{5}$$


These changes in the shape and geometry of the super- 
coiled molecule lead to different physical properties, 
such as sedimentation and frictional properties 
(Figure 1). 


Energetics of DNA Supercoiling 


Both twisting and writhing deformations are energet- 
ically unfavorable, and a supercoiled DNA molecule 
has a higher free energy compared to its relaxed isomer. The free energy of DNA supercoiling (ΔG) is quadratically related to the linking difference:

$$\Delta G = 1050\,\frac{RT}{N}\,(\Delta Lk)^{2} \tag{6}$$


where R is the gas constant, T is the absolute tempera- 
ture, and N is the size of the DNA molecule in base 
pairs. Thermal fluctuation about a mean linking dif- 
ference results in a Boltzmann distribution that is 
Gaussian; this is readily observed by separating the 
topoisomers of a bacterial plasmid by gel electro- 
phoresis in the presence of an intercalator like chloro- 
quine. The energy held in a supercoiled circle can be 
substantial. For example, a 4 kb plasmid with a superhelix density of −0.05 has a free energy of supercoiling of 60 kcal mol⁻¹ at 37 °C.
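Equation (6) can be checked against the worked example in the text, and the same expression gives the Gaussian spread of topoisomers produced by thermal fluctuation, since in the Boltzmann factor exp(−ΔG/RT) the RT term cancels. A minimal sketch, assuming a helical repeat of 10.5 bp per turn to obtain Lk° and R = 1.987 × 10⁻³ kcal mol⁻¹ K⁻¹; the function names are illustrative.

```python
import math

R_KCAL = 1.987e-3  # gas constant, kcal mol^-1 K^-1


def supercoiling_free_energy(delta_lk: float, n_bp: int, temp_k: float = 310.15) -> float:
    """Equation (6): ΔG = 1050 (RT/N) ΔLk**2, returned in kcal/mol."""
    return 1050.0 * (R_KCAL * temp_k / n_bp) * delta_lk ** 2


def topoisomer_weights(n_bp: int, offsets=range(-4, 5)) -> dict[int, float]:
    """Relative Boltzmann populations of topoisomers lying `offset` linking
    numbers from the mean: exp(-ΔG/RT) with ΔG from equation (6), so RT
    cancels and the result is a Gaussian with variance N / 2100."""
    raw = {k: math.exp(-1050.0 * k ** 2 / n_bp) for k in offsets}
    total = sum(raw.values())
    return {k: w / total for k, w in raw.items()}


if __name__ == "__main__":
    n = 4000                 # the text's 4-kb plasmid
    lk0 = n / 10.5           # assumed helical repeat of 10.5 bp per turn
    dlk = -0.05 * lk0        # sigma = -0.05
    print(round(supercoiling_free_energy(dlk, n)))  # 59, close to the text's 60
    for k, p in topoisomer_weights(n).items():
        print(k, round(p, 3))
```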

Any local perturbation that is underwound relative 
to B-DNA (such as an unwinding of the helix, or 
the formation of a cruciform structure or a section of 
left-handed DNA) contributes a negative twist change 
that brings about a partial relaxation of the superhelical 
stress. This reduction in the free energy of supercoiling 
offsets the free energy of formation of the new DNA 
structure, and helps stabilize the altered conform- 
ation. Since the free energy of supercoiling increases 
quadratically with linking difference (equation (6)), there exists a level of negative supercoiling above which the new structure has a stable existence.

Figure 1 Separation of topoisomers of a circular DNA plasmid by gel electrophoresis in polyacrylamide. sc, supercoiled plasmid as isolated from Escherichia coli in exponential growth. Under the electrophoresis conditions, the native distribution of supercoiled topoisomers migrates as a single band. rel, plasmid DNA following partial relaxation with topoisomerase. Each band contains a topoisomer of given linking number, and adjacent bands differ in linking number by one (ΔLk = ±1). The separation is made possible because the physical structure of the topoisomers varies with linking number.
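The threshold argument can be written out explicitly. Suppose a local structure costs G_form to create and absorbs a linking difference a (negative for underwinding). Using equation (6) with K = 1050RT/N, the net free-energy change is G_form + K[(ΔLk − a)² − ΔLk²], which is zero at ΔLk* = (G_form/K + a²)/(2a); any more negative ΔLk favors the new structure. This sketch treats a as fixed, which is a simplification, and the values of G_form and a are hypothetical, chosen only to illustrate the calculation.

```python
R_KCAL = 1.987e-3  # gas constant, kcal mol^-1 K^-1


def stiffness(n_bp: int, temp_k: float = 310.15) -> float:
    """K in equation (6), written as ΔG = K * ΔLk**2, so K = 1050 RT / N."""
    return 1050.0 * R_KCAL * temp_k / n_bp


def net_free_energy(delta_lk, absorbed, g_form, n_bp, temp_k=310.15):
    """Free-energy change (kcal/mol) for forming a local structure that costs
    g_form and absorbs `absorbed` turns (negative for underwinding)."""
    k = stiffness(n_bp, temp_k)
    return g_form + k * ((delta_lk - absorbed) ** 2 - delta_lk ** 2)


def threshold_delta_lk(absorbed, g_form, n_bp, temp_k=310.15):
    """ΔLk at which net_free_energy is zero; for absorbed < 0, any more
    negative ΔLk makes formation of the new structure favorable."""
    k = stiffness(n_bp, temp_k)
    return (g_form / k + absorbed ** 2) / (2.0 * absorbed)


if __name__ == "__main__":
    n = 4000              # hypothetical plasmid size (bp)
    a, g = -2.0, 15.0     # hypothetical: absorbs 2 turns, costs 15 kcal/mol
    d_star = threshold_delta_lk(a, g, n)
    sigma_star = d_star / (n / 10.5)  # assumes h = 10.5 bp per turn
    print(round(d_star, 1), round(sigma_star, 3))  # -24.2 -0.063
```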


Enzymatic Manipulation of DNA Topology


Altering the linking number of a topoisomer requires 
the temporary breakage of at least one strand, and 
passage of the other through it before resealing. 
Enzymes called topoisomerases carry out such reactions. These important and ubiquitous enzymes may be classified into two types, called types I and II. The type I topoisomerases interconvert topoisomers by changes in linking number of ±1, while the type II topoisomerases do so by steps of ±2; this difference in step size reflects a fundamental difference in the mechanism of the two classes. All topoisomerases
create temporary breaks in the DNA, and the energy 
of the phosphodiester bond is conserved in the forma- 
tion of a transient covalent linkage between a DNA 
terminus and the protein, usually as a phosphotyro- 
sine linkage. Thus, the topoisomerases are closely 
related to the site-specific recombinases. 
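The different step sizes have a simple consequence that can be checked by enumeration: starting from a single topoisomer, a type I enzyme can in principle reach every linking number, whereas a type II enzyme acting alone can reach only linking numbers of the same parity (odd or even) as the starting one. The toy sketch below models only the step size, not the enzymology; the linking-number window and function name are arbitrary choices for illustration.

```python
def reachable_linking_numbers(start: int, step: int, lo: int, hi: int) -> set[int]:
    """Linking numbers in [lo, hi] reachable from `start` by repeated changes
    of +step or -step (step=1 for type I, step=2 for type II)."""
    reachable = {start}
    frontier = [start]
    while frontier:
        lk = frontier.pop()
        for nxt in (lk - step, lk + step):
            if lo <= nxt <= hi and nxt not in reachable:
                reachable.add(nxt)
                frontier.append(nxt)
    return reachable


if __name__ == "__main__":
    print(sorted(reachable_linking_numbers(375, 1, 370, 380)))  # every value 370-380
    print(sorted(reachable_linking_numbers(375, 2, 370, 380)))  # [371, 373, 375, 377, 379]
```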

Type I topoisomerases relax DNA supercoiling by 
means of a single-strand break. The linking number is 
then changed either by allowing a swivel to occur 
about the nick (eukaryotic enzymes), or by passing 
the unbroken strand through the break before the 
phosphodiester linkage is restored (most eubacterial 
enzymes), thereby leaving a permanent change in the 
linking number of the DNA circle. No energy transduction is involved in the function of topoisomerase I (excluding reverse gyrase). Energy is not required for remaking the phosphodiester linkage, because the free energy for this is preserved by the temporary covalent DNA-protein linkage, and the position of equilibrium is simply allowed to run ‘downhill’; i.e., these enzymes
just relax supercoiling toward the equilibrium state 
under the prevailing conditions. The eubacterial 
topoisomerases I will specifically relax negative super- 
coiling, while eukaryotic enzymes can relax either 
negative or positive supercoiling. 

Type II topoisomerases function by passing duplex 
DNA through a double-stranded break, thereby altering Lk by steps of ±2. DNA gyrase is a type II
topoisomerase of E. coli. This enzyme has the special 
property of coupling the hydrolysis of ATP to the 
introduction of negative supercoiling into DNA. 
Thus, DNA gyrase is an A₂B₂ tetramer, consisting of
specialized subunits for topoisomerization and energy 
transduction. Type II topoisomerases are found in all 
cells and even some viruses, but DNA gyrase is the 
only such enzyme that is known to induce negative 
supercoiling. 

The balance between the opposing activities of 
supercoiling (by gyrase) and relaxation (by topo- 
isomerase I) creates a steady-state level of supercoiling, 
demonstrated in Salmonella by studying mutants in 
the relevant genes. 
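The steady-state idea can be illustrated with a deliberately simple toy model, which is an assumption of this sketch rather than a model from the original entry: gyrase lowers ΔLk by 2 per event at a constant rate k_g, and topoisomerase I removes negative supercoils at a rate proportional to their number, k_t·(−ΔLk). The steady state is then ΔLk* = −2k_g/k_t, so reducing topoisomerase I activity makes the DNA more negatively supercoiled and reducing gyrase activity makes it less so. The rate constants are hypothetical.

```python
def simulate_delta_lk(k_gyrase: float, k_topo1: float,
                      dt: float = 0.01, steps: int = 20_000) -> float:
    """Euler integration of the toy rate law
        d(ΔLk)/dt = -2 * k_gyrase + k_topo1 * max(0, -ΔLk),
    starting from relaxed DNA (ΔLk = 0)."""
    delta_lk = 0.0
    for _ in range(steps):
        gyrase = -2.0 * k_gyrase                    # two links removed per event
        relaxation = k_topo1 * max(0.0, -delta_lk)  # acts only on negative supercoils
        delta_lk += dt * (gyrase + relaxation)
    return delta_lk


def steady_state(k_gyrase: float, k_topo1: float) -> float:
    """Analytic steady state of the same toy model."""
    return -2.0 * k_gyrase / k_topo1


if __name__ == "__main__":
    # Halving k_topo1 loosely mimics a partial loss of topoisomerase I.
    for k_t in (0.1, 0.05):
        print(k_t, round(simulate_delta_lk(1.0, k_t), 2), steady_state(1.0, k_t))
```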


DNA Supercoiling and Transcription 


There is an intimate relationship between dynamic 
events in DNA and supercoiling. For example, the un- 
winding of the DNA template required for the initi- 
ation of transcription can be strongly affected by the 
state of supercoiling. Moreover, an elongating RNA 
polymerase can itself generate DNA supercoiling, 
particularly where the rotation of the DNA-protein 
complex is hindered in some way. This is described by 
the twin supercoiled-domain model of Liu and Wang 
(1987). Transcription-induced supercoiling is generally 
well relaxed by cellular topoisomerases, but can be 
demonstrated very easily in topoisomerase-mutant 
bacteria. 
DNA synthesis is the process whereby deoxyribonucleotides (carrying the bases adenine, thymine, cytosine, and guanine) are linked together to form DNA. In vivo, most DNA synthesis occurs as a result of DNA replication, but nucleotides can also be incorporated into DNA during repair, and retroviruses are able to synthesize DNA from their RNA genomes in virus-infected cells.

DNA replication is initiated by the melting of the DNA double helix. Further local unwinding is catalyzed by the enzyme helicase, which generates regions of single-stranded DNA. The DNA is then primed by the addition of short RNA sequences that provide an initial 3’ hydroxyl group to which deoxynucleotides can be added. These primers are later removed. The extension of the primers requires a group of enzymes called DNA polymerases. Escherichia coli contains DNA polymerases I, II, and III, polymerase III being important for the synthesis of new DNA strands and polymerase I for removing the RNA primers and filling the resulting gaps. The homologous enzymes in animals are polymerases α, β, and γ, with α being responsible for nuclear DNA synthesis and γ for mitochondrial DNA synthesis.
DNA polymerases extend primers by adding nucleotides, one at a time, to the 3’ end of an RNA or DNA primer. This results in the formation of phosphodiester bonds between the 5’ phosphate group of one nucleotide and the 3’ hydroxyl group of the next.
The type of nucleotide added at each point is determined by Watson-Crick base pairing with the template DNA strand. The accuracy of this process is improved by the 3’→5’ exonuclease activity of DNA polymerases I and II, which provides a postsynthetic proofreading mechanism. However, since daughter DNA strands must be synthesized on both strands of the parent DNA, which are antiparallel, and DNA polymerases can add nucleotides only in the 5’→3’ direction, the overall direction of chain growth must be 5’→3’ on one new strand but 3’→5’ on the other. This problem is solved by synthesizing the leading strand continuously in the 5’→3’ direction and the lagging strand, whose overall direction of growth is 3’→5’, as a series of short Okazaki fragments, each of which is synthesized 5’→3’. These Okazaki fragments are then connected by the enzyme DNA ligase to form a continuous strand. Thus this mode of DNA synthesis is known as semidiscontinuous replication.
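The leading/lagging strand logic can be traced on a short sequence. The toy sketch below, which ignores primers, primer removal, and the enzymes themselves, follows one fork moving left to right along a duplex whose top strand is written 5’→3’. The function names and the fixed fragment length are hypothetical choices for illustration; real Okazaki fragments vary in length.

```python
PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq: str) -> str:
    """Watson-Crick partner of seq, written 5'->3'."""
    return "".join(PAIR[base] for base in reversed(seq))


def replicate_fork(top_5to3: str, fragment_len: int):
    """Toy fork moving left to right; the bottom strand runs 3'->5' left to right."""
    bottom_3to5 = "".join(PAIR[base] for base in top_5to3)

    # Leading strand: templated by the bottom strand, read 3'->5' as the fork
    # advances, so the new strand grows 5'->3' continuously with the fork.
    leading = "".join(PAIR[base] for base in bottom_3to5)

    # Lagging strand: templated by the top strand. Each time the fork exposes
    # another fragment_len bases, an Okazaki fragment is made 5'->3', i.e.
    # growing back toward the previously made fragment.
    fragments = [
        reverse_complement(top_5to3[i:i + fragment_len])
        for i in range(0, len(top_5to3), fragment_len)
    ]

    # Ligation: each newer fragment's 3' end is sealed to the 5' end of the
    # older one, so in 5'->3' order the newest fragment comes first.
    lagging = "".join(reversed(fragments))
    return leading, fragments, lagging


if __name__ == "__main__":
    leading, fragments, lagging = replicate_fork("ATGCGTACGTTA", 4)
    print(leading)    # ATGCGTACGTTA
    print(fragments)  # ['GCAT', 'GTAC', 'TAAC']
    print(lagging)    # TAACGTACGCAT
```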

DNA can also be synthesized by reverse transcription. This is the mechanism whereby linear duplex DNA is synthesized from a viral RNA template in the cytoplasm of virus-infected cells, a process that requires the enzyme reverse transcriptase. Reverse transcription is also a useful tool for synthesizing DNA from mRNA templates in vitro. During this process, an oligonucleotide primer, typically oligo(dT), is annealed to the poly(A) tail of the template mRNA. The primer is then extended by the 5’→3’ stepwise addition of nucleotides through the action of reverse transcriptase. The product is a DNA-RNA hybrid, which can be converted into double-stranded cDNA by treatment with RNase and subsequent treatment with DNA polymerase I.
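The base-pairing logic of cDNA synthesis can be sketched as follows, assuming an oligo(dT) primer annealed at the very 3’ end of the poly(A) tail and a reverse transcriptase that copies the whole template. These are simplifications for illustration; the function names and primer length are hypothetical, and the second-strand step simply pairs bases rather than modeling RNase treatment and DNA polymerase I.

```python
RNA_TO_DNA = {"A": "T", "U": "A", "G": "C", "C": "G"}
DNA_PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}


def first_strand_cdna(mrna_5to3: str, primer_len: int = 12) -> str:
    """First-strand cDNA, written 5'->3'. The oligo(dT) primer forms its 5' end;
    reverse transcriptase extends it 5'->3' while reading the mRNA 3'->5'."""
    tail_len = len(mrna_5to3) - len(mrna_5to3.rstrip("A"))
    if tail_len < primer_len:
        raise ValueError("poly(A) tail too short for the oligo(dT) primer")
    return "".join(RNA_TO_DNA[base] for base in reversed(mrna_5to3))


def second_strand(cdna_5to3: str) -> str:
    """Complementary DNA strand, written 5'->3' (idealized second-strand synthesis)."""
    return "".join(DNA_PAIR[base] for base in reversed(cdna_5to3))


if __name__ == "__main__":
    mrna = "GCCAUGUUCGA" + "A" * 15   # hypothetical mRNA with a poly(A) tail
    cdna = first_strand_cdna(mrna)
    print(cdna)
    print(second_strand(cdna) == mrna.replace("U", "T"))  # True
```

The double-stranded product carries the mRNA sequence, with T in place of U, on one strand.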


See also: DNA Ligases; DNA Polymerases; DNA 
Structure; Okazaki Fragment; Replication; 
Reverse Transcription 
Theodosius Dobzhansky (1900-1975) was responsible
for the present understanding of the evolutionary sig- 
nificance of genetic variation within and between popu- 
lations. His experimental and observational work on 
the genetics of natural populations, and the general- 
izations that he made from those observations, estab- 
lished the agendas that still characterize experimental 
population genetics, physical anthropology, and the
study of the genetics of species formation. 

Theodosius Grigorievitch Dobzhansky was born 
in Nemirov, Russia on 25 January 1900 and died in
Davis, California on 18 December 1975. After his 
undergraduate training in Kiev he began his scientific 
career studying the variation in natural populations of 
coccinellid beetles in Europe and Asia. The immense 
morphological polymorphism of these beetles led him 
to reject the usual naming of separate ‘races’ with 
geographical ranges. He substituted the notion of a 
single genetically polymorphic species, subdivided 
geographically into breeding populations character- 
ized by different frequencies of genotypes. This led 
him to the now standard concept of the ‘Mendelian 
population,’ a genetically heterogeneous collection of 
freely mating individuals, recombining genes in repro- 
duction and exchanging genes at some rate with other 
such populations. 

This view of population genetic structure had a 
revolutionary influence on physical anthropology, 
with the elimination of the biological concept of race 
from anthropology and the present emphasis on the 
large intrapopulation variation in contrast with the 
relatively small differentiation between populations. 

Dobzhansky emigrated to the United States in 1927 
and began his genetic studies of barriers between spe- 
cies and of the genetic variation in natural populations 
of Drosophila which he pursued until shortly before 
his death in 1975. His observations on the chromo- 
somal and genic variation in Drosophila and experi- 
ments on crosses between closely related species in 
which chromosomes could be followed by genetic 
markers, produced a coherent view of the steps in 
species formation, outlined in his 1937 book Genetics 
and the Origin of Species. In this theory, genetic vari- 
ation arising from mutations leads to polymorphism 
within local populations that are freely interbreeding 
within themselves but genetically isolated from other 
populations because of extrinsic barriers to migration. 
These local populations diverge from each other either 
because of random genetic drift or differentiating 
natural selection until they have accumulated so much 
genetic difference from each other that they are bio- 
logically incapable of exchange of genes. They are then 
different species, defined as groups that can exchange 
genes within groups but not with each other. 

The chief concern of Dobzhansky’s experimental 
program was to determine what was responsible for 
maintaining the large amount of genetic polymorph- 
ism within populations. From his observations of the 
long-term stability of inversion polymorphisms in 
natural populations of Drosophila, the observation 
of polymorphic equilibria in laboratory populations, 
and the measurement of viabilities of inversion 
homozygotes and heterozygotes in laboratory condi- 
tions, he concluded that inversion polymorphisms 
were maintained by superiority of heterozygotes, 
creating a ‘balanced polymorphism.’ From experi- 
ments on the viability and fertility of homozygotes 
and heterozygotes for random chromosomes sampled 
from nature, he concluded that genic heterozygotes 
were also more fit than homozygotes. At first he 
regarded this as the outcome of a selective retention 
in a population of exactly those alleles that had super- 
ior combining ability, leading to a state of ‘coadapta- 
tion’ within and between loci. Later, as a consequence 
of experiments on crosses between chromosomes from 
different populations and B. Wallace’s experiments on 
the viability of heterozygotes for newly induced muta- 
tions, he concluded that genic heterozygotes per se 
were more fit than homozygotes, without any process 
of coadaptation. 

Dobzhansky’s claim for the adaptive superiority of 
heterozygotes led him to the ‘balance’ theory of popu- 
lation structure that emphasized the normality of 
genetic variation as opposed to what he termed the 
‘classical’ theory that the most fit genotype would be a
homozygote so that genetic variation was a deleterious 
effect of recurrent mutations. This in turn led him to re- 
ject all eugenic programs aimed at genetically purifying 
the human genome and he had a very strong influence 
on the final elimination of eugenic programs of population and ‘race’ improvement. His influence on physical anthropology and human genetics was as great as his effect on the general field of population genetics.


See also: Balanced Polymorphism 


Dogs 


See: Canine Genetics 


Dominance 


M A Cleary 
Dominance is the manifestation of the phenotype associated with a particular allele of a gene even when only one copy of that allele is present in an organism’s genome. When one parent carries a single copy of a dominant allele and the other parent lacks it, each child has a 1 in 2 chance of inheriting the allele; because only one copy of the allele is necessary for the phenotype, on average 50% of offspring will display the phenotype and any associated disease.
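The 50% figure applies to the common situation of one heterozygous affected parent and one unaffected parent. A short Python sketch enumerating a Punnett square makes the reasoning explicit and shows how the proportion changes for other crosses; it assumes a single autosomal locus with complete penetrance, and the genotype strings ("Aa", "aa") are illustrative labels.

```python
from fractions import Fraction
from itertools import product


def fraction_showing_dominant(parent1: str, parent2: str, dominant: str = "A") -> Fraction:
    """Enumerate a Punnett square: each parent passes on either of its two
    alleles with equal probability, and an offspring shows the dominant
    phenotype if it carries at least one copy of the dominant allele."""
    offspring = list(product(parent1, parent2))
    showing = sum(1 for a1, a2 in offspring if dominant in (a1, a2))
    return Fraction(showing, len(offspring))


if __name__ == "__main__":
    print(fraction_showing_dominant("Aa", "aa"))  # 1/2
    print(fraction_showing_dominant("Aa", "Aa"))  # 3/4
```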


See also: Codominance; Incomplete Dominance 
In various disparate animal groups sex determination
is achieved by a pair of heteromorphic sex chromo- 
somes, females being chromosomally XX and males 
XY or XO. Typically the X chromosome is large and 
carries many genes unconnected with sex, whilst the Y 
chromosome is smaller and with fewer genes. Thus, 
potentially males and females have different dosages 
of X-linked gene products. However, in at least three 
groups, mechanisms have evolved that differentially 
modulate transcription of the X chromosomes in the 
two sexes, so that effective dosages of X-linked gene 
products are equalized. This is termed ‘dosage compensation.’

The three well-studied examples of dosage compen- 
sation are found in the fruit fly Drosophila, the nema- 
tode worm Caenorhabditis elegans, and in mammals. 
The three systems differ. In Drosophila transcription 
of the single X chromosome of the male is enhanced to 
equal that of two female X chromosomes. In C. elegans 
the transcription of the two X chromosomes of the 
hermaphrodite is downregulated to equal that of 
the male, and in mammals all X chromosomes except 
one in each cell are transcriptionally inactivated. In all 
three cases the exact mechanism of control of tran- 
scription is not known. In flies and worms the transcrip- 
tional changes are mediated by hundreds of sites 
distributed along the X chromosome. In both groups 
several genes concerned have been identified. In each 
case a single gene on the X chromosome is involved in 
counting the presence of one or two X chromosomes. 
In flies, if there are two X chromosomes the dosage 
compensation machinery is switched off. If there is 
one, in the male, several autosomal genes encode pro- 
teins which form a complex that locates at hundreds of 
sites along the X chromosome. Two noncoding RNAs 
associate with the proteins. Histone H4 all along 
the X chromosome is hyperacetylated. The male X 
chromosome takes on a different chromatin structure, 
being wider and more diffuse than the autosomes 
and female X chromosomes. In the worm C. elegans 
again there are several genes which encode proteins 
that form a complex at multiple sites, this time along 
the two X chromosomes of the hermaphrodite. 
A single gene switches this mechanism off in the 
male. Some of the genes concerned also have a role 
in condensation of chromosomes at mitosis. It is
possible that the transcriptional downregulation 
involves a change in state of the chromatin, and 
that in evolution the organism has made use of 
previously existing mitotic machinery to bring this 
about. 

In mammals, in contrast to the multiple sites in- 
volved in flies and worms, X chromosome inactivation 
requires the presence of a single site, the X-inactivation 
center (XIC), from which the inactivation spreads 
in both directions. Segments of X chromosome lack- 
ing an XIC, through translocation or deletion, do not 
undergo inactivation. A single gene, located at the XIC, 
has so far been identified as essential for the initiation 
of X chromosome inactivation and so for dosage compensation. This is the Xist (X inactive specific transcript)
gene, which is active on the inactive X chromosome 
and inactive on the active X chromosome. It codes for 
a polyadenylated noncoding RNA. This RNA remains 
close to the inactive X chromosome and appears to 
coat its entire length. It may be complexed with a pro- 
tein. Gene knockouts of Xist have shown that se- 
quences in its 5’ region are essential for the initiation 
of inactivation, and sequences 3’ to exon 6 are essential 
for counting of X chromosomes. When transgenes 
of Xist are inserted into autosomes the Xist RNA 
can coat the autosome and bring about inactivation. 
Hence, X-chromosome-specific sequences are not 
essential for the function of Xist. However, the spread
and completeness of inactivation are more restricted in
autosomes. 

As in dosage compensation in flies and worms, the 
exact mechanism by which Xist and its RNA bring 
about inactivation is not known. A feature common to 
flies and mammals is the involvement of a noncoding 
RNA. Differential acetylation of histones is another 
common feature. In Drosophila the doubly active male 
X chromosome is hyperacetylated, whereas in mam- 
mals the inactive X chromosome is hypoacetylated. A 
difference is that methylation of cytosines in DNA is 
required for the silencing of Xist on the active X 
chromosome, whilst differential methylation does 
not occur in Drosophila. In both flies and worms 
a very precise twofold change in transcription is 
required, whereas the all-or-none mechanism of 
mammals appears superficially simpler. The mamma- 
lian mechanism can also cope with supernumerary 
X chromosomes by inactivating them, whilst in 
flies and worms supernumerary X chromosomes are 
lethal. 
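The three strategies can be compared with idealized rules. The sketch below expresses total X-linked output in units of one fully active X; the rules are simplified readings of the description above, and their extension to supernumerary X chromosomes is an assumption for illustration, not measured values. Within each species the two sexes come out equal, but only the mammalian rule keeps the output constant whatever the number of X chromosomes.

```python
def x_linked_output(n_x: int, system: str) -> float:
    """Idealized total X-linked transcription, in units of one fully active X."""
    if system == "drosophila":
        # Compensation is switched on only with a single X (the male),
        # whose transcription is raised twofold.
        return 2.0 if n_x == 1 else float(n_x)
    if system == "c_elegans":
        # With two or more Xs (assumed to extend to extra Xs) each X is
        # downregulated to half; the male's single X is unchanged.
        return 1.0 if n_x == 1 else 0.5 * n_x
    if system == "mammal":
        # Every X except one per cell is inactivated.
        return 1.0
    raise ValueError(f"unknown system: {system}")


if __name__ == "__main__":
    for n in (1, 2, 3):
        print(n, [x_linked_output(n, s) for s in ("drosophila", "c_elegans", "mammal")])
```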


See also: X-Chromosome Inactivation; XIST 
Double minute chromosomes comprise one of the
cytogenetically visible signs of gene amplification, 
the other being ‘homogeneously staining regions.’ 
Gene amplification is the increase in copy number of a 
chromosomal DNA segment (an amplicon) which may 
occur “spontaneously” in many mammalian tumors. 
The DNA amplification will usually lead to a corres- 
ponding increase in expression of the genes contained 
in the amplicon. The amplicon can be quite large 
(commonly the size range is 100 kb to several mega- 
bases) and contain several genes, but it is thought that 
one gene (usually an oncogene) is the major target of 
amplification, providing the cancerous cell with 
a growth or survival advantage when overexpressed 
(Schwab, 1999). Gene amplification may also be in- 
duced in vitro after treatment of a cell culture with 
stepwise increasing concentrations of certain toxic 
drugs. In this case the amplicon will contain a 
gene whose product provides protection against the 
toxic drug (Schimke, 1988). The amplicons can be 
detected in cytogenetic preparations, where they 
appear as numerous small chromatin bodies. In the 
metaphase cell, where the chromosomal DNA has 
replicated, the chromatin bodies will be double, 
hence they have been called ‘double minute chromo- 
somes’ (dmin). The double minutes have no centro- 
meres and do not attach to the mitotic spindle at cell 
division. Hence, they are distributed essentially at 
random to the daughter cells, which may lead to 
large variations in copy number in individual cells, 
providing an efficient mechanism for rapid response 
to external selection. 
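Because double minutes lack centromeres, their random partitioning can be simulated directly. This toy sketch assumes that every double minute replicates once per cell cycle and that each resulting body goes to a given daughter independently with probability 1/2, with no selection; the starting copy number, lineage count, and seed are arbitrary. Following many independent lineages shows the spread in copy number widening with each division while the average stays roughly constant.

```python
import random
import statistics


def divide(copies: int, rng: random.Random) -> int:
    """One division: replicate every double minute, then send each of the
    2 * copies bodies to the daughter being followed with probability 1/2."""
    return sum(1 for _ in range(2 * copies) if rng.random() < 0.5)


def copy_number_spread(initial: int, generations: int,
                       lineages: int = 1000, seed: int = 1):
    """Mean and standard deviation of copy number across independent
    lineages after the given number of divisions, with no selection."""
    rng = random.Random(seed)
    finals = []
    for _ in range(lineages):
        copies = initial
        for _ in range(generations):
            copies = divide(copies, rng)
        finals.append(copies)
    return statistics.mean(finals), statistics.pstdev(finals)


if __name__ == "__main__":
    for g in (1, 5, 10):
        mean, sd = copy_number_spread(50, g)
        print(g, round(mean, 1), round(sd, 1))
```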
Double-strand breaks in DNA are repaired faithfully
by a recombination mechanism that copies the nucleo- 
tide sequence of a sister molecule, or any other homo- 
logous sequence present, to replace the sequence 
lost by the double-strand break. Most examples of recombination that have been studied, whether they are repair processes or whether they are involved in meiosis or in development, are initiated by a double-strand break. This has been elaborated into a general scheme for recombination called the double-strand break repair model.

Figure 1 The double-strand break repair model of genetic recombination. Each line represents a single DNA strand. Molecules of different parental origin are distinguished by thick or thin lines. Newly synthesized strands are shown as discontinuous. Strand polarity is indicated by a half arrowhead on the 3’ end. The diagram is explained in the text.

Figure 2 Three modes of resolution of the double Holliday junction depicted in Figure 1(vi). DNA strand conventions are as in Figure 1. Solid arrows indicate progression. Open arrows indicate endonuclease action. Half arrows show the direction of branch migration. Topoisomerase molecules are shown as circular arrows around DNA molecules. The diagram is explained in the text.


Figure 1 shows a basic form of the model.
The double-strand break shown in part (i) may be
genetically programmed or a consequence of cellular
processes such as replication, or caused by damage 
from DNA damaging agents. The homologous mol- 
ecule is intact. Exonuclease resects the 5’ strands in both 
directions from the break (ii), leaving 3’-ending single 
strands. One single-stranded tail invades a homo- 
logous molecule (a sister chromatid, a homologous 
chromosome, or a region of homology anywhere in 
the same cell). A protein such as RecA in Escherichia 
coli, or its homolog Rad51 in eukaryotes, catalyzes 
the invasion. The invasion forms a D-loop as shown 
in (iii) where the 3’ tail from the damaged molecule 
has replaced the like strand of the homolog and a 
length of hybrid or heteroduplex DNA has been 
formed. (Heteroduplex DNA is hybrid DNA that 
contains mismatched base pairs because the parental 
molecules that formed the hybrid had a genetic differ- 
ence in that region.) 

The 3’ end can now prime DNA synthesis, extend- 
ing its length and displacing more of the like strand of 
the invaded molecule while copying its complement. 
As shown in (iv), this will allow annealing by comple- 
mentary base pairing between the displaced strand and 
the other 3’ tail of the broken molecule, which now 
primes DNA synthesis in the other direction (v). 
This annealing forms a second length of hybrid 
DNA. It is not necessary for the second hybrid length 
to form by annealing with a displaced strand as drawn. 
Instead, there may be two D-loops formed independ- 
ently that then merge as they are extended by DNA 
synthesis. Detachment of polymerase complexes and 
ligation will yield the classical double-strand break 
repair structure shown in (vi). This consists of a 
double Holliday junction with lengths of hybrid 
DNA in between. 

If the original double-strand break involved a 
gap, the missing material will have been replaced by 
the synthesis that copied both strands of the invaded 
molecule. Thus, by this model, double-strand gap 
repair is just as efficient as double-strand break repair, 
as has been observed. As with any other hybrid DNA 
model of recombination, heteroduplex is subject to 
mismatch correction giving various patterns of con- 
version and, if the heteroduplex DNA remains uncor- 
rected, postmeiotic segregation. If a double-strand 
gap has been filled, the gapped molecule will have 
been converted to the genotype of the homologous 
molecule without the formation of heteroduplex 
DNA and mismatch repair at that site. 
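The conversion and postmeiotic segregation patterns mentioned above can be illustrated with a small sketch. It assumes an eight-spored fungus in which each of the four meiotic products divides once by mitosis, so that the two strands of every chromatid end up in separate spores; the alleles a and b and the function names are hypothetical. A chromatid is written as a pair of strand alleles, and a heteroduplex chromatid carries a different allele on each strand.

```python
from collections import Counter

# A chromatid is written as (allele on one strand, allele on the other strand).
# Assumption: after meiosis each product divides once by mitosis, so the two
# strands of every chromatid are segregated into two separate spores.

def octad_ratio(chromatids):
    """Return the allele ratio among the eight spores, e.g. '4a:4b'."""
    counts = Counter(allele for chromatid in chromatids for allele in chromatid)
    return f"{counts['a']}a:{counts['b']}b"

def mismatch_corrected(template_allele):
    """Mismatch correction makes both strands match the template strand."""
    return (template_allele, template_allele)

heteroduplex = ("a", "b")  # a parent-A chromatid that received a parent-B strand

no_event = [("a", "a"), ("a", "a"), ("b", "b"), ("b", "b")]
uncorrected = [("a", "a"), heteroduplex, ("b", "b"), ("b", "b")]
corrected_to_b = [("a", "a"), mismatch_corrected("b"), ("b", "b"), ("b", "b")]
corrected_to_a = [("a", "a"), mismatch_corrected("a"), ("b", "b"), ("b", "b")]

print(octad_ratio(no_event))        # 4a:4b  normal segregation
print(octad_ratio(uncorrected))     # 3a:5b  postmeiotic segregation
print(octad_ratio(corrected_to_b))  # 2a:6b  gene conversion
print(octad_ratio(corrected_to_a))  # 4a:4b  restoration of the original alleles
```

Only one heteroduplex chromatid is modeled here; heteroduplex DNA on both interacting chromatids, which the double-strand break repair model can produce, gives further combinations.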

Originally, the double-strand break repair model specified that the double Holliday junction illustrated in Figure 1(vi) is resolved by endonucleolytic cleavage. Whether the event was resolved as a crossover or as a non-crossover would then depend upon isomerization of the two Holliday junctions followed by random cleavage. This is illustrated in Figure 2A. If the two junctions were cleaved in the same plane, there would be no crossover; cleavage of the two junctions in different planes would result in a crossover. This random process would be expected to yield equal numbers of crossovers and non-crossovers. That prediction is unsatisfactory because the two outcomes are observed to occur at unequal frequencies. Notably, when double-strand break repair occurs in mitosis, fewer than 10% of events give rise to crossovers.
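The expectation of equal numbers of crossovers and non-crossovers under random cleavage can be checked with a short simulation. This is a sketch under the assumptions stated in the text: each of the two junctions is cut in one of two planes with equal probability, independently of the other. The plane labels and function names are illustrative only.

```python
import random

PLANES = ("plane_1", "plane_2")  # hypothetical labels for the two cleavage orientations

def resolve_double_holliday_junction(rng: random.Random) -> str:
    """Cut each junction in a randomly chosen plane.

    Same plane -> non-crossover; different planes -> crossover.
    """
    first = rng.choice(PLANES)
    second = rng.choice(PLANES)
    return "crossover" if first != second else "non-crossover"

def crossover_fraction(trials: int = 100_000, seed: int = 1) -> float:
    """Fraction of simulated resolutions that give a crossover."""
    rng = random.Random(seed)
    crossovers = sum(
        resolve_double_holliday_junction(rng) == "crossover" for _ in range(trials)
    )
    return crossovers / trials

print(f"{crossover_fraction():.3f}")  # close to 0.5
```

The simulated fraction sits near one half, which is the prediction that conflicts with the much lower crossover frequencies seen in mitosis.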

Two other modes of resolution have been suggested. Figure 2B illustrates the removal of the double Holliday junction by the concerted action of a topoisomerase rotating the lengths of DNA between the junctions. This can cause the distance between the junctions to grow shorter, and the junctions to migrate toward each other until they disappear, resulting in a non-crossover outcome. A third method of resolution that would lead to a deficiency of crossovers is to resolve one junction always in the plane that gives nonrecombinant molecules, and then have the remaining junction migrate across the nicks left where the first junction was cut (Figure 2C). This also results in a non-crossover outcome. There may be other mechanisms of resolution yet to be described, and more than one mode of resolution may operate.
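If more than one mode of resolution operates, a simple mixture model shows how strongly the random-cleavage route would have to be restricted to match the mitotic observations. Assume, for illustration only, that a fraction f of double Holliday junctions is resolved by random cleavage (Figure 2A, giving a crossover with probability 1/2) and the rest by the routes of Figure 2B or 2C (never giving a crossover). Then:

```latex
P(\mathrm{crossover}) = f \cdot \tfrac{1}{2} + (1 - f) \cdot 0 = \tfrac{f}{2},
\qquad
\tfrac{f}{2} < 0.10 \;\Longrightarrow\; f < 0.20
```

Under these assumptions, fewer than 10% crossovers in mitosis would require that fewer than one double Holliday junction in five be resolved by random cleavage.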

The resolution mechanisms shown in Figure 2B 
and 2C both give rise to an interesting configuration 
of heteroduplex DNA called trans-heteroduplex. A 
length of heteroduplex DNA spans the original posi- 
tion of the double-strand break, but the linkage re- 
lationships of the parents are not maintained, with 
recombination within the heteroduplex occurring 
at the site of the original break. The distribution of 
markers predicted by this structure has been observed 
genetically. 
Trisomy 21 is the commonest autosomal trisomy in humans; it is also the commonest genetic cause of mental retardation and the only genetic condition that is readily recognized by lay people. In 1866 the English physician John Langdon Down described the characteristic constellation of clinical features (phenotype), although other accurate descriptions preceded his. A chromosomal basis for the syndrome was suggested by Waardenburg in 1932 but was not established until 1959, by Lejeune in France and by Ford and Jacobs working independently in England. Although it is now known that the extra chromosome most frequently arises from maternal nondisjunction at meiosis I, the details of the mechanism are still obscure, apart from some evidence suggesting that the event is related to abnormal genetic recombination. Other factors may be important in causing the paternal meiotic errors or the post-fertilization mitotic errors that occur in a minority of cases.


Frequency 


The incidence of Down syndrome at birth is approxi- 
mately 1 in 750. However, since the majority of 
trisomy 21 pregnancies spontaneously miscarry, the 
incidence at conception must be higher, perhaps as 
high as 1 in 150. The chance of a trisomy 21 conception 
rises with advancing maternal age. Thus, in affluent 
countries where there is a population-based, antenatal 
screening program for trisomy 21 and a trend for 
women to postpone having children until the fourth 
decade, the number of Down syndrome diagnoses has 
increased. 
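The two incidence figures quoted above also bound the proportion of trisomy 21 conceptions that survive to birth. Write N for the number of conceptions and B for the resulting births (so B ≤ N), take the estimates of 1 in 150 at conception and 1 in 750 at birth, and, as a simplification, ignore terminations of pregnancy after screening:

```latex
s = \frac{B/750}{N/150} = \frac{150}{750}\cdot\frac{B}{N} = \frac{1}{5}\cdot\frac{B}{N} \le \frac{1}{5}
```

On these figures, at most about one trisomy 21 conception in five would reach term, consistent with the statement that the majority miscarry.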


Cytogenetics 


In 95% of cases, Down syndrome is due to the presence of an extra, free chromosome 21, and the karyotype is given as 47,XX,+21 or 47,XY,+21. In 4% of cases the phenotype is identical but the extra chromosome 21 is not free; instead it is attached to another chromosome, commonly chromosome 13 or chromosome 14, in a Robertsonian translocation. In the remaining 1% of cases, Down syndrome mosaicism is present, the affected individual’s cells comprising two populations, one with a normal karyotype and the other with trisomy 21. In individuals with mosaicism the phenotype is, on average, less severe than in nonmosaic Down syndrome, but a prediction about an infant’s future development based on the proportions of normal and trisomic cells observed in a small sample of amniotic fluid or peripheral blood is unreliable. After one child is affected by trisomy 21, the risk of recurrence in a sibling is about 1%, but this risk may be considerably higher if the cytogenetic diagnosis is translocation Down syndrome and one parent carries a balanced translocation involving chromosome 21.


Clinical Aspects 


Infants who are affected by Down syndrome are
usually diagnosed very soon after birth because they
have reduced muscle tone (hypotonia) in combination with minor
features including flat occiput, upslanting palpebral 
fissures, epicanthic folds, large or slightly protruding 
tongue, single palmar crease, small fifth finger, and 
wide gap between first and second toes. More import- 
antly, these infants also have an increased chance of 
being affected by one or several different serious con- 
genital malformations or illnesses. Thus, about one in 
five affected children die before age 5 years and two 
in five are affected by conditions such as congenital 
heart defect, bowel atresia, or leukemia. For most but 
not all families of an affected child, cognitive impair- 
ment is the most important complication of the syn- 
drome. This is always present, although of variable 
severity. In general, the type of cognitive impairment 
is not specific to trisomy 21. Delay in development 
is often evident from early infancy and when IQ is 
measured, scores indicate moderate to severe retard- 
ation (IQ range 10-70). Thus, Down syndrome indi- 
viduals achieve variable levels of independence in 
adult life but only a minority are fully independent 
in all daily living skills. In mid to late adulthood there 
is increased prevalence of dementia. Neuropatholo- 
gical studies on brains of deceased trisomy 21 indi- 
viduals over age 40 years always show microscopic 
changes characteristic of Alzheimer disease, but only 
about half of those individuals have clinical evidence 
of dementia. 

Despite increased mortality and morbidity in tri- 
somy 21, the life expectancy of affected children has 
greatly lengthened and nowadays nearly 50% of 
adults with Down syndrome survive to age 60 years. 
The quality of life of Down syndrome individuals has also greatly improved, the most important factors being care of affected individuals at home and their participation in the wider community. Educational opportunities have improved, and the general public has a better understanding of the condition owing to the efforts of lay organizations, such as national Down syndrome associations, which strive to correct misconceptions and dispel prejudice.


Research into Down Syndrome 


Each year, over 500 articles citing Down syn- 
drome are published. A major research theme is 
the identification of pregnancies where the fetus is 
affected by trisomy 21. To accomplish this, antenatal 
screening programs assess independent risk factors 
such as the mother’s age, the maternal serum levels of 
certain pregnancy-related proteins (α-fetoprotein,
human chorionic gonadotrophin), and the appearance 
of the fetus on ultrasound examination. Up to 80% of 
affected pregnancies may be diagnosed before 20 
weeks gestation. In the future, the gestational age at 
which screening and diagnostic testing occurs may be 
brought forward while the specificity and sensitivity 
of the screening tests are improved. Safer diagnostic 
tests such as chromosome analysis of fetal cells in 
maternal blood may replace current tests such as 
amniocentesis and chorionic villus sampling. Ante- 
natal screening for trisomy 21 raises ethical issues 
and heated debate, but the argument for decreasing 
Down syndrome-associated health problems is un- 
opposed. Improved treatments for congenital heart 
disease, leukemia, and infections increase longevity 
and reduce mortality and morbidity. Regular health 
checks identify less severe but treatable chronic 
problems such as glue ear, reduced visual acuity, 
and hypothyroidism. Research is also focusing on 
Down syndrome developmental psychology, which 
is important for teachers, and Down syndrome 
neurology, especially the genetic basis of the increased 
risk of Alzheimer disease. Molecular genetic 
research methods are being employed to identify 
genes on chromosome 21 that cause the above- 
mentioned medical complications of Down syn- 
drome. Overall, in recent years remarkable progress 
has been made in understanding Down syndrome 
and managing its medical complications. As a 
result, affected infants now have a much brighter 
future than the one that was envisaged only a few 
decades ago. 
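The way screening programs combine independent risk factors can be sketched with Bayes' rule in odds form: the age-related prior odds are multiplied by a likelihood ratio for each marker. This is a simplified illustration, not the algorithm of any particular program; the prior risk, the likelihood ratios, and the function names are hypothetical, and real programs derive likelihood ratios from marker distributions and allow for correlation between markers.

```python
def odds(p: float) -> float:
    """Convert a probability to odds."""
    return p / (1 - p)

def probability(o: float) -> float:
    """Convert odds back to a probability."""
    return o / (1 + o)

def combined_risk(prior_risk: float, likelihood_ratios) -> float:
    """Posterior risk after multiplying the prior odds by each likelihood ratio.

    Multiplying likelihood ratios assumes the markers are independent
    given the fetal karyotype.
    """
    posterior_odds = odds(prior_risk)
    for lr in likelihood_ratios:
        posterior_odds *= lr
    return probability(posterior_odds)

# Hypothetical inputs: an age-related prior risk of 1 in 250 and likelihood
# ratios for two serum markers and an ultrasound finding.
risk = combined_risk(1 / 250, [2.0, 1.5, 0.8])
print(f"about 1 in {round(1 / risk)}")
```

A likelihood ratio above 1 raises the risk and one below 1 lowers it, so a woman's final screening result can be higher or lower than her age-related risk alone.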


See also: Alzheimer Disease; Nondisjunction; 
Robertsonian Translocation; Trisomy 
Downstream
Downstream refers to those portions of a nucleic acid that lie further from the initiation site in the direction of transcription or translation (conventionally, toward the 3’ end of the RNA or of the coding strand) and will therefore be transcribed or translated later.
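In sequence coordinates, whether a position lies downstream depends on the direction in which the strand is read. A minimal sketch, assuming genomic coordinates that increase along the plus strand and the common convention of numbering the initiation site +1 with no position 0; the function names are hypothetical.

```python
def relative_position(pos: int, site: int, strand: str) -> int:
    """Number a genomic coordinate relative to an initiation site numbered +1.

    There is no position 0: the base immediately upstream of the site is -1.
    """
    if strand not in ("+", "-"):
        raise ValueError("strand must be '+' or '-'")
    # On the minus strand, transcription runs toward lower coordinates.
    offset = pos - site if strand == "+" else site - pos
    return offset + 1 if offset >= 0 else offset

def is_downstream(pos: int, site: int, strand: str) -> bool:
    """True if pos lies beyond the initiation site in the direction of reading."""
    return relative_position(pos, site, strand) > 1

print(relative_position(1010, 1000, "+"))  # 11
print(relative_position(990, 1000, "+"))   # -10
print(relative_position(990, 1000, "-"))   # 11
```

The same genomic coordinate can therefore be downstream of a site on one strand and upstream of a site at the same coordinate on the other strand.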


See also: Initiation Factors 
Of the many organisms studied by geneticists in the
twentieth century, the fruit fly Drosophila melano- 
gaster has become one of the most widely used. It 
is small (adults a few mm long), fecund (hundreds 
of progeny from a single female), a rapid breeder 
(generation time about 10 days), innocuous, and an 
undemanding laboratory pet. These qualities allowed 
a school of biologists centered around T. H. Morgan 
to discover many details of the chromosomal basis of 
heredity during the first three decades of the century.
In recent decades, genetic tools such as transgenesis, 
insertional mutagenesis, and a sequenced genome, and 
experimental tools for investigation of its cell biology 
and development, have allowed study of the basic 
cellular and developmental mechanisms that it shares 
with other animals, including humans. 


Genetic Tools 


Classical Genetics 

The confirmation of chromosomes as the location of 
the genetic material depended largely on work on 
Drosophila, e.g., on detailed recombination maps, cor- 
relation between sex linkage and inheritance of sex 
chromosomes, and correlation between genetic cross- 
overs and the exchange of material between homo- 
logous chromosomes that could be distinguished 
cytologically. The generation of numerous mutations 
following the discovery of induced mutagenesis by 
Muller in the late 1920s led to the construction 
of genetic maps unrivaled in any other animal. The 
banding patterns of giant polytene chromosomes 
(which are overreplicated but not separated by cell 
division) in larval salivary glands also facilitated gene mapping.
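The recombination maps mentioned above rest on a simple calculation: the percentage of recombinant progeny between two genes is taken as their map distance (1% recombination corresponds to 1 map unit, or centimorgan), and for three linked genes the middle gene is the one not included in the most widely separated pair. The sketch below uses hypothetical genes a, b, and c and hypothetical testcross counts. Recombination frequencies are only approximately additive, because double crossovers are undercounted over longer intervals.

```python
def recombination_frequency(recombinants: int, total: int) -> float:
    """Percentage of recombinant progeny; 1% is treated as 1 map unit (cM)."""
    return 100 * recombinants / total

def middle_gene(distances: dict) -> str:
    """Given map distances for all three pairs of three genes, return the middle gene."""
    widest_pair = max(distances, key=distances.get)
    all_genes = set().union(*distances)
    (middle,) = all_genes - widest_pair
    return middle

# Hypothetical testcross data: (recombinant progeny, total progeny) for each pair.
counts = {
    frozenset({"a", "b"}): (120, 1000),
    frozenset({"b", "c"}): (80, 1000),
    frozenset({"a", "c"}): (200, 1000),
}
distances = {pair: recombination_frequency(*c) for pair, c in counts.items()}

print(distances[frozenset({"a", "c"})])  # 20.0
print(middle_gene(distances))            # b
```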


Transposable Elements 

The P transposable element was successfully used to 
introduce cloned DNA into Drosophila by Rubin and 
Spradling in 1982. This aided a marriage of classical 
genetics with molecular biology. Firstly, cloned DNA 
can be tested for function in transgenic flies. Genes can 
be tested for biological function; regulatory sequences 
of DNA can be assayed by fusing them to reporter 
genes whose expression can easily be visualized; pro- 
teins can be tagged with specific markers (e.g., green 
fluorescent protein) and their subcellular distribution 
visualized, even dynamically in living flies. Secondly, 
integration of a P element into a gene can cause a mu- 
tation, and allow immediate molecular identification 
of the affected gene. Methods for controlling trans- 
posase activity make it possible to mobilize P elements 
to new locations, and hence generate large numbers of 
insertional mutants using simple crossing schemes. 


The Drosophila Genome 

Sequencing of the fly genome, largely completed in 
2000, suggests that Drosophila has some 13 600 genes. 
Most of these have homologs in other eukaryotes. The 
availability of the genome sequence greatly facilitates 
comparisons between it and other model organisms, 
and makes it trivial for drosophilists to perform many 
molecular manipulations of Drosophila DNA. 


Strategies for Using Drosophila as a 
Model Organism 


Studying Biological Processes 

Originally the strength of Drosophila was as a model 
for understanding the mechanisms of heredity. Now 
its strength continues because its genetic tools can be 
used to study biological processes that are widespread 
throughout eukaryotes, including humans. Examples 
include the laying down of the anterior-posterior body
plan, cell signaling pathways, cell division, axon path- 
finding, cell adhesion, cell polarity, and vesicle traffic. 


Genetic Screens 

The ability to mutagenize flies easily, and the growing 
availability of insertional mutants from stock centers, 
make it possible in principle to screen for mutations 
that affect any biological process of interest. Novel 
mutations can identify proteins of central importance 
to that process, without any preconceptions about their 
molecular properties. Understanding the process then 
results from further study of the phenotypes, and from 
cloning affected genes so that they and their products 
can be studied at the molecular level. Spectacularly successful examples were the screens of Nüsslein-Volhard, Wieschaus, and colleagues, around the early 1980s, for mutations that affect embryonic pattern formation. These provided enough material for a generation of drosophilists to address the major processes of embryonic pattern formation, and the cloned genes allowed identification and comparative study of important homologs in other species, including humans.


Reverse Genetics 

Conversely, one may start with a cloned gene, and 
generate a fly mutant to study its function. One 
important source of genes is the few hundred genes 
implicated in human genetic diseases; over 60% of 
these have an orthologous gene in flies. Fly stock 
centers carry P insertions in over a quarter of essential
Drosophila genes, and this proportion is increasing; if 
a gene of interest is represented in these stocks, a 
mutant can be obtained by post in less than a week. 
Alternatively, if a P element lies sufficiently close to a 
gene of interest, transposase-mediated remobilization 
can generate local deletions, or local reinsertions of the 
element, which may mutate the gene of interest. More 
recently, transgenic constructs that can generate an
extrachromosomal copy of a gene of interest carrying a
recombinogenic double-strand break have been
used by Rong and Golic for targeted gene knockouts;
in principle this should now allow the generation
of precise mutant lesions in any gene in the genome.


Drosophila in Evolutionary Biology 


The extensive genetic knowledge of Drosophila has 
also made it a model for many aspects of evolutionary 
genetics. The evolutionary history of P elements, and 
the phenomenon of hybrid dysgenesis in which the 
progeny of a cross between a P-carrying male and a 
non-P-carrying female show germline degeneration, 
have clarified the parasitic nature of most transposable 
elements, and their evolutionary dynamics. Other 
phenomena studied include barriers to interspecific 
courtship and mating, and molecular aspects of popu- 
lation variability. 


See also: Morgan, Thomas Hunt; 
Muller, Hermann J; Neurogenetics in Drosophila; 
P Elements 
The growth or viability of various organisms can be
inhibited by a variety of drugs. In the case of bacteria,
the drug referred to is typically an antibiotic or a
synthetic derivative of an antibiotic. Such drugs are 
usually specific for certain metabolic reactions and 
their effectiveness depends on the sensitivity of the 
cellular target and the organism’s ability to trans- 
port the drug inside the cell. If an organism does 
not normally carry out the targeted reaction or if 
it cannot transport the drug then it will be unaffected 
by the drug. However, typically the term ‘drug 
resistance’ is used to refer to the state of an organism 
that is no longer inhibited by a drug that previously 
inhibited it. 

While resistance to a drug or an antibiotic can arise in an organism through mutations that alter the cellular target of the drug, it is far more common for bacteria to acquire, from other bacteria, entire genes or sets of genes that confer resistance. These genes can in turn mutate, giving rise to altered resistance patterns. Many of them are found on transposons or other transposable elements and/or on resistance plasmids, which facilitate their horizontal transfer from organism to organism. The resistances encoded by these genes usually do not involve changing the cellular target of the drug, but rather mechanisms for inactivating the drug or affecting its uptake. Antibiotic resistance is, moreover, selected for by the widespread use of antibiotics.
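How antibiotic use selects for resistance can be illustrated with a standard textbook model of haploid selection, which is not taken from this entry. Assume that, while the drug is present, resistant cells have relative fitness 1 + s compared with 1 for sensitive cells; the frequency p of resistant cells then changes each generation as p' = p(1 + s)/(1 + ps). The starting frequency and the value of s in the example are hypothetical.

```python
def resistant_fraction(p0: float, s: float, generations: int) -> float:
    """Frequency of resistant cells after repeated rounds of selection.

    Each generation applies p' = p(1 + s) / (1 + p s), so the odds
    p / (1 - p) of resistant to sensitive cells grow by a factor (1 + s).
    """
    p = p0
    for _ in range(generations):
        p = p * (1 + s) / (1 + p * s)
    return p

# Hypothetical example: one resistant cell per million, with resistant cells
# leaving twice as many descendants as sensitive ones (s = 1) under the drug.
for g in (0, 10, 20, 30):
    print(g, f"{resistant_fraction(1e-6, 1.0, g):.4f}")
```

Under these assumptions a resistant variant that starts at one cell in a million becomes the majority after about 20 generations of continued drug exposure.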

Acquired drug resistance among pathogenic bac- 
teria is becoming a threat to the continued use of anti- 
biotics as an effective treatment of many infections. 
Most strains of Staphylococcus aureus, a notorious 
hospital-acquired (nosocomial) pathogen, contain 
plasmid-borne multiple drug resistance but remain sus- 
ceptible to vancomycin. Unfortunately, vancomycin- 
tolerant strains have been observed. Therefore, strains 
of S. aureus may soon appear which will lead to infec- 
tions that cannot be treated by antibiotics. Certain 
pathogenic strains of Enterococcus faecalis, Mycobac- 
terium tuberculosis, and Pseudomonas aeruginosa have 
been identified which are resistant to every clinically 
available antibiotic. 

Because of drug resistance, the death rates of some 
previously treatable communicable diseases such as 
tuberculosis have started to increase again. Note that 
resistant organisms are not more effective at causing 
disease. Indeed, just like their antibiotic-sensitive re- 
latives, many of these strains can be members of 
the normal human flora and are only opportunistic 
pathogens. Unfortunately, the diseases that they cause 
are very difficult to treat. 

Drug resistance is not confined to prokaryotes, nor even to cellular organisms. The nucleoside analog azidothymidine (3’-deoxy-3’-azidothymidine, AZT) can inhibit retroviral reverse transcriptase and is used to treat retroviral infections such as infection with the human immunodeficiency virus, which causes AIDS. However, mutant viruses arise whose reverse transcriptase is resistant to the drug. Similarly, protease inhibitors can be used to treat retroviral infections because they inhibit the cleavage of the virus-encoded polyproteins into functional proteins. However, mutations can also arise that make the viruses resistant to these drugs. Note that, unlike the rapidly spreading antibiotic resistance in bacteria, the resistance of retroviruses to these drugs involves mutations that modify the target of the drug. Because mutations occur in RNA viruses at a very high frequency, this type of drug resistance is also a serious problem.


See also: Antibiotic Resistance; Antibiotic- 
Resistance Mutants; Integrons; Resistance 
Plasmids; Resistance to Antibiotics, Genetics of;
Transposable Elements; Transposable Elements 
in Plants 
Clinical Features


Duchenne muscular dystrophy (DMD) (or Meryon 
disease) is inherited as an X-linked recessive trait and 
is the commonest form of muscular dystrophy with a 
birth incidence of around 1 in 3500. Its clinical fea- 
tures, familial incidence, and muscle pathology were 
first described in detail by the English physician 
Edward Meryon in 1852 and some years later by 
Duchenne de Boulogne. 

It is characterized by enlarged calves (hence the old 
name pseudohypertrophic muscular dystrophy) and 
progressive muscle wasting and weakness, beginning in 
early childhood and mainly affecting the proximal 
limb girdle musculature. Affected boys become chair- 
bound by around age 12 and often succumb by the age 
of 20, usually from cardiac involvement or respiratory 
failure. 

The responsible gene is located at Xp21 and its product is termed ‘dystrophin.’ This remains the largest gene associated with a disease (2.4 Mb) and takes over 24 h to be transcribed. It consists of 79 exons, with introns making up 98% of the gene. There are three full-length transcripts plus five shorter transcripts generated by internal promoters. In skeletal muscle, dystrophin is associated with other membrane proteins (Figure 1), has a molecular mass of 427 kDa, and consists of 3685 amino acids.

Figure 1 [...] dystrophies are caused, for example, by deficiencies of a particular sarcoglycan or merosin.

Mutations that disrupt the reading frame of the gene 
(out of frame) result in a complete absence of dystro- 
phin in DMD, whereas mutations that are in-frame 
result in a partial deficiency of the protein and the 
clinically similar but milder condition of Becker 
muscular dystrophy. Roughly two-thirds of cases are 
caused by large deletions, the remainder being due to a
variety of point mutations, small deletions, or dupli- 
cations. Rarely, two different mutations occur in the 
same family and may result from the insertion of a 
transposon into the dystrophin locus. 
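The distinction drawn above between out-of-frame and in-frame mutations, often called the reading-frame rule, can be expressed as a check on whether the number of deleted coding bases is a multiple of three. The sketch below is a simplification that holds for most but not all deletions; the exon names and lengths are hypothetical and are not those of the dystrophin gene.

```python
def reading_frame_effect(deleted_bases: int) -> str:
    """Predict the consequence of deleting a stretch of coding sequence."""
    if deleted_bases <= 0:
        raise ValueError("deleted_bases must be positive")
    if deleted_bases % 3 == 0:
        # Downstream codons are still read correctly: a shorter, partly functional protein.
        return "in-frame (Becker-like)"
    # The reading frame shifts, usually leading to a premature stop codon.
    return "out-of-frame (Duchenne-like)"

# Hypothetical exon lengths in bases.
exon_lengths = {"exon_p": 120, "exon_q": 100, "exon_r": 131}

def deletion_effect(deleted_exons) -> str:
    """Apply the reading-frame check to a deletion of whole exons."""
    return reading_frame_effect(sum(exon_lengths[e] for e in deleted_exons))

print(deletion_effect(["exon_p"]))            # in-frame (Becker-like)
print(deletion_effect(["exon_q"]))            # out-of-frame (Duchenne-like)
print(deletion_effect(["exon_q", "exon_r"]))  # in-frame (Becker-like)
```

The last case shows that a deletion spanning two exons can preserve the frame even though deleting either exon alone would not; what matters is the total length removed, not the number of exons.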


Counseling and Prenatal Diagnosis 


Around 10% of female carriers have some muscle weakness (so-called ‘manifesting carriers’) owing to skewed X-inactivation, in which the proportion of active X chromosomes carrying the mutant gene is greater than in asymptomatic carriers. For genetic counseling, carriers can
be identified by a raised serum creatine kinase (SCK) 
level in roughly two-thirds of cases, but more pre- 
cisely by DNA studies on peripheral blood leukocytes. 
Prenatal diagnosis is possible from DNA studies on 
cultured amniotic fluid cells obtained at amniocentesis 
around 16-18 weeks gestation or on chorionic villus 
material obtained at chorionic villus biopsy around 10 
weeks gestation. Note however that, because of 
germ-line mosaicism, some mothers may harbor a 
mutation in a proportion of their ovarian tissue but 
not in their somatic (peripheral blood leukocytes) 
cells. For this reason prenatal diagnosis might have
to be considered in any subsequent pregnancy once a
woman has had an affected son because there can be 
no certainty that this is the result of a new mutation. 
Preimplantation diagnosis in future could avoid the 
problems of selective abortion. 
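For counseling, the basic risks for the children of a carrier mother and an unaffected father follow from X-linked recessive inheritance and can be enumerated directly. A minimal sketch; the chromosome labels and function names are illustrative, and it does not model germ-line mosaicism or new mutations, which, as noted above, complicate risk estimates in practice.

```python
from fractions import Fraction
from itertools import product

def classify(maternal_x: str, paternal: str) -> str:
    """'Xd' is an X carrying a dystrophin mutation, 'X' a normal X."""
    if paternal == "Y":
        return "affected son" if maternal_x == "Xd" else "unaffected son"
    return "carrier daughter" if maternal_x == "Xd" else "non-carrier daughter"

def offspring_risks(mother=("X", "Xd"), father=("X", "Y")):
    """Probability of each kind of child, assuming equally likely gametes."""
    share = Fraction(1, len(mother) * len(father))
    risks = {}
    for maternal_x, paternal in product(mother, father):
        kind = classify(maternal_x, paternal)
        risks[kind] = risks.get(kind, 0) + share
    return risks

risks = offspring_risks()
for kind, p in sorted(risks.items()):
    print(f"{kind}: {p}")
sons = risks["affected son"] + risks["unaffected son"]
print("risk that a son is affected:", risks["affected son"] / sons)
```

Each of the four outcomes has probability 1/4, so each son of a carrier has a 1 in 2 chance of being affected and each daughter a 1 in 2 chance of being a carrier.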


Treatment 


Steroids may slow the progression of the disease for a 
time but no drug has yet been found that affects the 
long-term course of the disease. Some form of gene therapy offers hope for the future, either by using a viral vector carrying the normal dystrophin gene to transform muscle cells in vivo, or by upregulating a protein (such as utrophin) to compensate for the deficiency of dystrophin. Stem cell therapy also seems likely to offer another approach to treatment in future. It also remains possible that a drug will be found that interrupts the pathogenic pathways in some way and thereby ameliorates the disease process.




See also: Genetic Counseling; Muscular 
Dystrophies 
Duffy is a human blood group system that includes
the Fyᵃ/Fyᵇ polymorphism. The Duffy glycoprotein,
which is absent from the red cells of most black 
Africans, functions as a chemokine receptor and is ex- 
ploited by merozoites of the malarial parasite Plasmo- 
dium vivax for invasion of host red cells. 


Duffy Antigens and Phenotypes 


The Duffy system consists of three main alloantigens: Fyᵃ, Fyᵇ, and Fy3. Fyᵃ and Fyᵇ are allelic and represent an amino acid substitution in the Duffy glycoprotein. Fy3 is defined by alloantibodies produced by rare people with a total deficiency of Duffy glycoprotein. Anti-Fy3 probably reacts with a variety of epitopes on different regions of the Duffy glycoprotein. In people of European and Asian origin there are three main Duffy phenotypes: Fy(a+b−), Fy(a+b+), and Fy(a−b+); all are Fy3-positive (Table 1). In people of African origin there is a fourth, common phenotype, Fy(a−b−) Fy3-negative. The frequency of Fy(a−b−) varies from about 70% in African Americans to 100% in the Gambia, but the phenotype is extremely rare in people of other races. There are three main alleles at the FY locus: Fyᵃ, producing Fyᵃ and Fy3; Fyᵇ, producing Fyᵇ and Fy3; and Fy, producing no Duffy antigen on red cells.

The Fyᵃ antigen has a frequency of around 67% in people of European origin. The frequency is much lower (10%-20%) in African Americans and much higher in the Far East and South East Asia. Fyᵇ has a frequency of about 80% in Europeans and 23% in African Americans. A weak form of Fyᵇ has been called Fyˣ.
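The antigen frequencies quoted above can be related to Table 1 by summing the frequencies of every phenotype whose red cells carry the antigen. The sketch below does this with the table's figures, writing the minus signs in the phenotype names as ASCII hyphens; the names TABLE_1 and antigen_frequency are hypothetical.

```python
# Phenotype frequencies (%) from Table 1 of this entry ('-' written for minus).
TABLE_1 = {
    "Fy(a+b-)": {"Europeans": 20, "Africans": 10},
    "Fy(a+b+)": {"Europeans": 48, "Africans": 3},
    "Fy(a-b+)": {"Europeans": 32, "Africans": 20},
    "Fy(a-b-)": {"Europeans": 0, "Africans": 67},
}

def antigen_frequency(antigen: str, population: str) -> int:
    """Percentage of people whose red cells carry Fy^a ('a') or Fy^b ('b')."""
    positive = f"{antigen}+"  # 'a+' occurs only in Fy(a+b-) and Fy(a+b+)
    return sum(row[population] for name, row in TABLE_1.items() if positive in name)

for population in ("Europeans", "Africans"):
    print(population, antigen_frequency("a", population), antigen_frequency("b", population))
```

The results, 68% and 80% for Europeans and 13% and 23% for Africans, agree to within about one percentage point with the figures given in the text for people of European origin and for African Americans.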

Fy3 is present on all cells except those of the Fy(a−b−) phenotype. Although the Fy(a−b−) phenotype is common in Africans, they seldom make anti-Fy3, whereas the few non-African people with the Fy(a−b−) phenotype have all been found because of the presence of anti-Fy3 in their sera. Like Fy3, another Duffy antigen, Fy5, is not expressed on cells of the Fy(a−b−) phenotype, but, unlike Fy3, Fy5 is also absent from cells of the very rare Rhnull phenotype, which lack Rh proteins. The reason for this association between Duffy and Rh is not known.


Molecular Basis of Duffy Polymorphism 


The Duffy gene consists of two coding exons. The Fyᵃ/Fyᵇ polymorphism results from a single nucleotide change in the second exon: Fyᵃ encodes Gly42; Fyᵇ encodes Asp42. The coding region of the Fy allele, responsible for the Fy(a−b−) phenotype in Africans, is identical to that of an Fyᵇ allele, but the Fy allele has a mutation in the promoter region of the gene, 67 nucleotides upstream of the main translation-initiating methionine codon. This mutation changes the TTATCT motif of the binding site for the erythroid-specific transcription factor GATA-1 to TTACCT, preventing expression of the gene in erythroid tissue. Fy(a−b−) Africans therefore lack Duffy glycoprotein on their red cells, but they do express it in other tissues. This explains why they only very rarely make anti-Fy3 and never make anti-Fyᵇ. The molecular basis of Fy(a−b−) in two white people and one Native American, all with anti-Fy3, is different: in one there is a 14-bp deletion that results in a reading frameshift and introduction of a premature stop codon; in the other two there are nonsense mutations that introduce translation stop codons. These individuals would not be expected to have Duffy glycoprotein on their red cells or in any of their tissues.
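The effect of the promoter mutation on the GATA-1 site can be illustrated with a minimal sketch that looks for the GATA core on either DNA strand. Only the two motifs TTATCT and TTACCT come from the text; the function names are hypothetical, and real GATA-1 binding depends on more than this four-base core.

```python
def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase DNA sequence (A, C, G, T only)."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]

def has_gata_core(seq: str) -> bool:
    """True if the GATA core occurs on either strand of seq."""
    seq = seq.upper()
    return "GATA" in seq or "GATA" in reverse_complement(seq)

print(reverse_complement("TTATCT"), has_gata_core("TTATCT"))  # AGATAA True
print(reverse_complement("TTACCT"), has_gata_core("TTACCT"))  # AGGTAA False
```

TTATCT is the reverse complement of AGATAA, which contains the GATA core; the single T to C change produces TTACCT, whose reverse complement, AGGTAA, does not.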


Duffy Glycoprotein 


The Duffy glycoprotein, also known as the Duffy antigen receptor for chemokines (DARC), is a receptor for a variety of chemokines including interleukin-8 (IL-8) and melanoma growth stimulatory activity (MGSA). It has a molecular mass of 36-46 kDa and consists of a 336-amino-acid polypeptide that traverses the membrane seven times, with a 63-amino-acid extracellular N-terminal domain containing two potential N-glycosylation sites, and a cytoplasmic C-terminal domain (Figure 1). This structure is characteristic of one of the largest gene families in the mammalian genome, the seven-transmembrane-segment class of the G-protein-coupled superfamily of receptors, which bind many different ligands. DARC is present on endothelial cells lining postcapillary venules throughout the body. It has also been detected on some other vascular endothelial cells, on epithelial cells of renal collecting ducts and pulmonary alveoli, and on Purkinje neurons of the cerebellum.

Table 1 Phenotypes of the Duffy system

Phenotype    Genotype              Frequencies (%)
                                   Europeans   Africans
Fy(a+b−)     Fyᵃ/Fyᵃ or Fyᵃ/Fy     20          10
Fy(a+b+)     Fyᵃ/Fyᵇ               48           3
Fy(a−b+)     Fyᵇ/Fyᵇ or Fyᵇ/Fy     32          20
Fy(a−b−)     Fy/Fy                  0          67

Figure 1 Likely conformation of the Duffy glycoprotein (DARC) in the red cell membrane, showing the seven transmembrane domains, the N-glycosylated N-terminal extracellular domain, and the C-terminal cytoplasmic domain. The arrow shows the position of the Fyᵃ/Fyᵇ polymorphism.

The function of DARC is not known. It has been 
suggested that it may act as a clearance receptor for 
inflammatory mediators and that Duffy-positive red 
cells function as a “sink” or as scavengers for the 
removal of unwanted chemokines. If so, this function 
must be of limited importance as DARC is not present 
on the red cells of most Africans. 


Duffy and Malaria 


The Duffy glycoprotein is exploited by Plasmodium 
vivax merozoites as a receptor and is essential for their 
invasion of red cells. P. vivax is responsible for tertian 
malaria, a form of malaria widely distributed in Africa, 
but less severe than that resulting from P. falciparum 
infection. Fy(a-b-) red cells are refractory to invasion
by P. vivax merozoites. Consequently, the Fy allele,
which prevents expression of DARC on red cells 
whilst permitting expression in other tissues, must 
have a selective advantage in areas where P. vivax is 
present, and this would override any potential dis- 
advantage arising from the absence of the chemokine 
receptor on red cells. 


Further Reading 

Hadley TJ and Peiper SC (1997) From malaria to chemokine 
receptor: the emerging physiologic role of the Duffy blood 
group antigen. Blood 89: 3077-3091. 


See also: Blood Group Systems 
Renato Dulbecco (1914– ) was born in Catanzaro,
Italy; he studied medicine at the University of Turin, 
where he received his MD degree in 1936. After World 
War II, Dulbecco moved to the United States to join 
Salvador Luria’s laboratory at Indiana University as 
Research Associate. There he studied the mechanism 
of bacteriophage multiplicity reactivation following 
treatment with UV light. He showed that Luria’s 
hypothesis of multiplicity reactivation by recom- 
bination was incorrect and proposed that a likely 
explanation was repair of damage in multi- 
complexes. He further discovered the phenomenon 
of photoreactivation: reactivation of bacteriophages 
inactivated by UV radiation by treatment with visible 
light. 

In 1950 Dulbecco joined Max Delbrück’s laboratory at the California Institute of Technology (Caltech)
as Senior Research Fellow. He extended the technique 
of bacteriophage plaque isolation to animal viruses to 
produce genetically pure viral clones, a technique that 
made it possible to study the genetics of animal 
viruses. After establishing his own laboratory as 
Associate Professor at Caltech in 1952, he applied 
the plaque isolation technique to poliomyelitis virus 
in collaboration with Marguerite Vogt. They isolated 
mutants of poliovirus with reduced neuropathogeni- 
city and showed that reverse mutants could be used to 
study mutability in polioviruses. 

In the late 1950s Dulbecco and Vogt began to study 
polyomavirus, a mouse DNA tumor virus. They 
showed that the virus could cause either cell death or neoplastic transformation, altering the growth properties of the cells. Both interactions could be studied in vitro, using cultured animal cells. These discoveries set the stage for
molecular characterization of DNA tumor virus genes 
and the role of viral genes in cell transformation. 

In 1963 Dulbecco moved to the newly established 
Salk Institute for Biological Studies in San Diego. 
Over the next 10 years his laboratory used the small 
DNA tumor viruses, polyoma and SV40, to explore 
mechanisms of neoplastic cell transformation, produ- 
cing a series of fundamental insights into the process. 
He and his collaborators demonstrated that viral 
DNA persisted in transformed cells, and was inte- 
grated into cellular DNA. They showed that there 
were two classes of viral genes, ‘early’ and ‘late,’ 
which were transcribed from opposite strands of the 
viral DNA. Both classes of genes were expressed dur- 
ing productive infection, but only the early genes were 
expressed in transformed cells. They provided evi- 
dence that the activity of viral genes led to transcrip- 
tion of cellular genes. Using viral mutants, they 
showed that viral genes could influence the growth 
properties of transformed cells. In recognition of this 
work, Dulbecco was awarded the Nobel Prize for 
Physiology or Medicine in 1975, together with his 
former associates, David Baltimore and Howard 
Temin, who were honored for their discovery of 
reverse transcriptase. 

From 1972 to 1977 Dulbecco served as Deputy 
Director of Research at the Imperial Cancer Research 
Fund Laboratories in London, where he and his col- 
leagues used antisera directed against polyoma tumor 
antigens to characterize virus-specific proteins in the 
plasma membrane of infected and transformed cells. 
These experiments led to the identification of the poly- 
oma middle T antigen, the viral protein that is primar- 
ily responsible for cell transformation by polyoma. 

Dulbecco returned to the Salk Institute in 1977 to 
pursue a new interest in mammary cell biology and 
breast cancer. At the same time, he continued to reflect 
on the implications of genetic discoveries for under- 
standing cancer. These thoughts led him to propose an 
international undertaking to sequence the human genome, described in an article published in Science in 1986. He interrupted his laboratory research to
assume the presidency of the Salk Institute in 1988, 
serving with distinction in that post until 1993. There- 
after he joined the Istituto di Tecnologie Biomediche 
Avanzate in Milan, where he directed the Italian Gen- 
ome Project while continuing to study genes involved 
in mammary cell differentiation. At present he divides 
his time between the institute in Milan and the Salk 
Institute, where he is Distinguished Research Profes- 
sor and President Emeritus. 


See also: Delbrück, Max; Luria, Salvador
L.C. Dunn (1893-1974) was a naturalist interested in
development and evolution at the organismal level. 
He embarked on his formal studies of biology at 
Dartmouth College less than 10 years after the redis- 
covery of Mendel’s principles. Later, as a graduate 
student he found himself in the laboratory of W.E. Castle, who was one of the very ‘roots’ of the tree of genetics in the United States, being the first to devote himself entirely to this new field. While Dunn was grappling with applying these new principles, his work was interrupted by the outbreak of World War I. After serving in France, he returned to finish his PhD and took
up his first position in 1919 as a geneticist at the Storrs 
Agricultural Station in Connecticut. There, he cut 
his teeth on the analysis of single gene mutations in 
chickens and mice. Later, he was inspired to return to 
a more academic environment as well as make a fresh 
start on what he was by then calling ‘developmental 
genetics.’ In 1928, he was offered what he considered 
to be a prestigious full professorship at Columbia 
University, where he had the awesome task of help- 
ing to fill the vacancies left by the retirement of 
E.B. Wilson and the departures of Morgan, Bridges, 
and Sturtevant to form a new laboratory at the 
California Institute of Technology. “Dunny,” as he 
came to be called by generations of students, was 
only 35 at the time. 

The system he was lucky enough to come to study 
eventually challenged several of Mendel’s principles 
and their later additions. It was a phenotypic reporter 
system that, until ‘knockout’ technology became available, defined most of the known mammalian embryonic lethals. The T locus, as it was once called, was a
region of the mouse genome defined by the dominant 
mutation T (Brachyury) causing a short tail; however, 
when homozygous, it was an embryonic lethal. Thus 
was described one of the first deviations from Men- 
del’s 1:2:1 rule, because when Brachyury hetero- 
zygotes were mated together, one quarter of the 
progeny were lost and a 2:1 ratio resulted. T had 
been given to Dunn by its discoverer Dobrovolskaia- 
Zavadskaia along with two wild trapped mutations (t), 
which caused him to struggle with the concept of 
multiple alleles. The recessive mutations were also 
embryonic lethals that he later dubbed ‘pseudoalleles.’ 
This was because even though they acted like alleles 
of T by interacting with it to cause tailless animals, 


they suppressed normal recombination between T 
and nearby markers. He went on to describe over 
100 such chromosomes that were ultimately found to 
contain six different lethals. Starting in 1935, along 
with his colleague Salome Gluecksohn-Waelsch, he
described the embryology of many different t lethal 
syndromes. 
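The 2:1 ratio follows from simple Punnett-square arithmetic: an intercross of two T/+ heterozygotes gives T/T, T/+ and +/+ offspring in a 1:2:1 ratio, and removing the lethal T/T class leaves two short-tailed heterozygotes for every normal-tailed +/+ animal. A minimal Python sketch of that count follows; the helper names `cross` and `surviving_ratio` are hypothetical, and genotypes are written as two-character strings (for example "+T" for T/+).

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def cross(parent1: str, parent2: str) -> Counter:
    """Punnett square: each parent string lists its two alleles, e.g. 'T+'."""
    # Sorting the two alleles makes 'T+' and '+T' count as the same genotype.
    return Counter("".join(sorted(a + b)) for a, b in product(parent1, parent2))


def surviving_ratio(offspring: Counter, lethal: set) -> dict:
    """Drop lethal genotypes and return genotype fractions among survivors."""
    survivors = {g: n for g, n in offspring.items() if g not in lethal}
    total = sum(survivors.values())
    return {g: Fraction(n, total) for g, n in survivors.items()}


if __name__ == "__main__":
    offspring = cross("T+", "T+")  # Brachyury heterozygote intercross
    print(offspring)               # genotype counts: ++ once, +T twice, TT once
    # T/T embryos die, so only +T (short tail) and ++ (normal tail) are born.
    print(surviving_ratio(offspring, lethal={"TT"}))
```

The surviving fractions come out as 2/3 short-tailed and 1/3 normal, the 2:1 ratio described above.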

The most blatant defiance of Mendel’s rule of equal segregation was the fact that t haplotypes, as they later came to be called, showed transmission ratio distortion through males, so that over 90% of the progeny of heterozygous males, instead of the expected 50%, carried the t haplotype. This phenomenon, which is still poorly understood, explains the maintenance of these mutations in wild populations of mice.
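A rough calculation shows why such distortion can offset the loss of t/t embryos. The sketch below assumes, for illustration only, that heterozygous males transmit t to exactly 90% of their functional sperm while females segregate normally (50%), and that t/t embryos die; the function name `intercross_survivors` is hypothetical.

```python
from fractions import Fraction


def intercross_survivors(male_t: Fraction, female_t: Fraction) -> dict:
    """Offspring of a t/+ male x t/+ female, assuming t/t embryos die.

    male_t, female_t: fraction of each parent's functional gametes carrying t
    (Fraction(1, 2) is normal Mendelian segregation).
    """
    lost_tt = male_t * female_t
    t_plus = male_t * (1 - female_t) + (1 - male_t) * female_t
    plus_plus = (1 - male_t) * (1 - female_t)
    survivors = t_plus + plus_plus
    return {
        "t/+": t_plus / survivors,       # share of surviving offspring
        "+/+": plus_plus / survivors,    # share of surviving offspring
        "t/t lost": lost_tt,             # share of all zygotes
    }


if __name__ == "__main__":
    half = Fraction(1, 2)
    print(intercross_survivors(half, half))              # Mendelian males
    print(intercross_survivors(Fraction(9, 10), half))   # distorting males
```

With Mendelian males, two-thirds of surviving offspring carry t; with 90% transmission through males, ten-elevenths do, even though 9/20 of all zygotes are lost as t/t.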

During his Emeritus days and well into his retire- 
ment at the Nevis Biological Station of Columbia, 
Dunny worked actively in his mouse room and wrote 
extensively. As a young graduate student of his long- 
time colleague D. Bennett, I first encountered him in 
1968 on his knees on the Nevis barn floor chasing an 
escapee wild mouse from Novosibirsk. He labored 
with love in that mouse room until he died there in 
1974 at the age of 80. 

Dunn’s perspective in the history of biology was 
a unique one. His career spanned the period from the rediscovery of Mendel to the birth of molecular biology. Without understanding the significance at the time, he
married the precise study of genetics to developmental 
biology. In that sense he debunked an intellectual 
dichotomy that in some corners was debated seriously 
until the advent of modern molecular biology, which 
unequivocally united the areas of developmental biology and genetics. Prophetically, he commented on
the progress of genetics in a presidential address pre- 
sented at the 1961 meeting of the American Society of 
Human Genetics: 


What we may be witnessing now is only the beginning of a 
kind of renaissance... What seems to be most important, 
especially in its implications for the future, is the growing 
recognition of the logical unity of genetics... being con- 
cerned with a system of elements having similar attributes 
in all forms of life, can be seen to transcend the special 
problems of different categories of organisms. 


Dunn thought and wrote broadly about scientific his- 
tory, philosophy, and the human condition. He was 
indeed the renaissance geneticist. 


Further Reading 
Bennett D (1977) L.C. Dunn and his contribution to T-locus 
genetics. Annual Review of Genetics 11: 1—12. 


See also: Brachyury Locus 
Mutations that stunt growth and produce individuals
with markedly reduced height, or dwarfism, are known 
in many species. One of the seven genetic traits 
originally analyzed by Mendel in his formulation of 
the laws of inheritance was dwarfism in the garden 
pea. Recently, Mendel’s dwarf mutation has been 
shown to affect production of the hormone gibber- 
ellin, which is essential for internode elongation. In 
humans, mutations in at least 320 genes can cause 
short stature, often in conjunction with other abnormalities. The most common form of human genetic
dwarfism, achondroplasia, causes disrupted develop- 
ment of the long bone growth plates, producing dis- 
proportionate shortness of the limbs. In dogs, this 
defect is responsible for the distinctive body form 
of the dachshund and the basset hound. Human 
achondroplasia mutations are dominant mutations 
in the gene for fibroblast growth factor receptor 3
(FGFR3). Many of these human mutations appear 
to be spontaneously generated in a parental germline 
cell. 


See also: Achondroplasia 
Many spontaneous mouse mutants with growth
insufficiency, or dwarfism, phenotypes exist. The 
genes mutated in these mice are important for normal 
growth regulation in mice and other mammals 
(Watkins-Chow and Camper, 1998). Several tissues 
are critical for normal growth. The hypothalamus 
secretes releasing factors that act directly on the 
adjacent pituitary gland. The pituitary, in response 
to hypothalamic signals, secretes hormones into the 
peripheral bloodstream. Finally, target organs act 
in response to the presence of pituitary hormones 
in the bloodstream. Target organs may also sec- 
rete factors that feed back to the hypothalamus and 
pituitary gland in order to regulate secretion of 
hormones. 
Growth Hormone (GH)


Growth hormone (GH) is produced and secreted 
from the pituitary gland in response to hypothalamic 
growth hormone releasing hormone (GHRH). 
Receptors for GHRH in the pituitary gland sense the 
presence of GHRH and become activated. The activa- 
tion of the GHRH receptors (GHRHR) induces the 
secretion of GH from somatotropes in the pituitary 
gland. GH is then carried through the bloodstream to 
peripheral tissues where it increases amino acid uptake 
and protein synthesis. GH promotes the secretion of 
insulin-like growth factors I and II (IGF-I, IGF-II) 
which act on target organs to promote cell prolifer- 
ation, as well as acting on the hypothalamus and pitu- 
itary gland to regulate GH secretion. 


Thyroid Hormone (TH) 


Thyroid hormone (TH) is also regulated by the 
hypothalamus and pituitary gland and is essential for 
normal growth. The hypothalamus secretes thyrotro- 
pin releasing hormone (TRH) which induces secretion 
of thyroid stimulating hormone (TSH) from the 
thyrotropes of the pituitary gland. TSH then acts on 
the thyroid gland to promote TH secretion. TH in the 
peripheral blood acts to increase metabolic rates and 
promotes growth. 


Mouse Mutants 


Spontaneous mutations have been identified in genes 
acting at various levels of growth regulation. The 
little (lit) mouse mutation is an autosomal recessive 
mutation resulting in proportionate dwarfism visible 
at 2 weeks of age. Adult lit mice have a body weight 
two-thirds that of control littermates. Female mice
are fertile but often fail to nurse their first litters, 
whereas male mice have reduced fertility. Dwarfism 
in little mice is due to a missense mutation in 
the GHRH receptor (Ghrhr) gene (Godfrey et al., 
1993). The mutation substitutes a glycine residue for 
a conserved aspartic acid residue within the ligand- 
binding domain of the receptor. This amino acid 
substitution greatly reduces the sensitivity of the 
receptor to GHRH. Therefore, little mice do not 
receive the hypothalamic GHRH signal to secrete 
GH. Serum levels of GH and IGF-I are low and 
the mice are dwarfed. In addition, lit/lit pituitaries
have fewer somatotropes (GH-producing cells), be- 
cause GHRH normally stimulates proliferation of 
these cells. 

The Ames dwarf (df) mouse mutation is an autosomal recessive mutation resulting in a more severe growth defect than lit. df/df mice are proportionately
dwarfed by 3 weeks of age and are half the size of 
control littermates as adults. The mice are infertile 
and hypothyroid as well. At the cellular level, Ames 
dwarf mice are almost completely lacking in pituitary 
somatotropes, lactotropes (prolactin-producing cells), 
and thyrotropes (TSH-producing cells). The lack of 
these three cell types causes the lack of GH, prolactin 
(PRL), and TSH in the mice. A missense mutation in 
the Prophet of Pit1 (Prop1) gene is responsible for 
Ames dwarfism. The Prop1 gene encodes a transcription factor which contains a paired-like homeodomain. Ames dwarf mice have a serine to proline
amino acid substitution within the DNA-binding 
domain of PROP1 (Sornson et al., 1996). Mutant 
PROP1 does not bind DNA effectively to regulate 
transcription of downstream genes, resulting in the 
failure of three pituitary cell types to differentiate and 
proliferate during development. 

Snell dwarf (dw) mice have a phenotype nearly 
indistinguishable from Ames dwarf mice. There are 
two noncomplementing alleles of the Snell dwarf 
mutation that are autosomal recessive and cause 
dwarfism, infertility, and hypothyroidism. dw/dw 
pituitaries completely lack GH-, PRL-, and TSH- 
producing cells. The Snell dwarf phenotype is due to 
mutations in the pituitary-specific transcription factor 1
(Pit1) gene (Camper et al., 1990; Li et al., 1990). One 
of these mutations is a gene rearrangement and the 
other is a point mutation in the DNA-binding domain 
of this transcription factor. PIT1 is necessary for ex- 
pression of the GH, TSH, and PRL genes and for pro- 
liferation of the cells that produce these hormones. 
Both Ames and Snell dwarfs are unable to respond to 
hypothalamic signals to secrete GH and TSH due to 
pituitary defects. 

The hypothyroid (hyt) mouse mutation is an auto- 
somal recessive mutation resulting in growth retarda- 
tion, infertility, elevated TSH levels, undetectable TH, 
and extreme hypothyroidism. hyt/hyt mice have a 
mutation in the TSH receptor (Tshr) gene (Stein 
et al., 1994; Gu et al., 1995). The mutation substitutes a leucine for a conserved proline within the transmembrane domain of the TSHR. The mutant TSHR does not bind TSH; therefore the thyroid gland
does not receive the pituitary signal to secrete TH. The 
end result is an unresponsive thyroid gland and 
reduced TH levels. 

The congenital goiter (cog) mouse mutation is an 
example of target organ failure. The cog mutation is an 
autosomal recessive mutation resulting in hypothy- 
roidism, goiter, and small size early in life. As cog/ 
cog mice age, serum TH levels increase, and they are 
able to overcome their growth retardation (Adkison 
et al., 1990). A point mutation within the thyroglobulin 


(Tgn) gene causes the cog phenotype (Kim et al., 
1998). Thyroglobulin is converted to TH within the 
thyroid gland. Therefore, cog mice receive the pitu- 
itary TSH signal to produce and secrete TH, but are 
unable to produce TH efficiently due to a defect in the 
Tgn gene. 

These examples of spontaneous mouse mutations 
involve the endocrine axis regulating growth. lit, df, 
and dw are pituitary defects; hyt and cog are thyroid
defects. Each produces a similar phenotype of propor- 
tionate dwarfism. Many spontaneous mouse mutants 
not discussed here have skeletal defects resulting in 
nonproportionate dwarfism. 
Dynamic mutation is the process by which certain
naturally occurring polymorphic DNA repeat 
sequences expand and result in human disease or fra- 
gile sites on chromosomes (Table 1). It is distin- 
guished from other (static) forms of mutation by 
being a process which can occur over several gener- 
ations, rather than being a single event. 

The inheritance of genetic material has long been 
assumed to conform to a single set of laws — the 
Mendelian laws of inheritance — that apply equally 
well to all genetic material. While DNA can be con- 
sidered to have remarkably consistent properties, cer- 
tain DNA sequences have features which set them 
apart from the majority of the genetic material. 
These distinct physical properties come about from 
an exceptional interaction between the DNA itself 
and the replicative machinery. Such unusual physical 
properties can manifest as unique genetic behaviour 
giving rise to non-Mendelian inheritance. 

Repeated DNA sequences can act as more or less 
discrete elements in the genome (Table 1). They can 
vary in the number of copies of repeat units from one 
chromosome to the next (polymorphism). The repeat 
copy number can expand through a unique form of 
mutation process referred to as ‘dynamic mutation.’ 
Where this increase in copy number has an effect on a 
gene that spans or includes the repeat then any result- 
ing disease can have unusual inheritance characteris- 
tics which reflect the molecular properties of the 
dynamic mutation process. Dynamic mutation of 
repeat sequences is now known to be a molecular 
mechanism responsible for the non-Mendelian genetic 
phenomenon of ‘anticipation’ — the increasing inci- 
dence/severity and/or decreasing age-at-onset of a 
disease in successive generations of an affected family. 
When the expanding repeat is located in or near a gene 
the dynamic mutation can bring about a change (gain 
or loss) of function of the gene product. 

The first identified unstable expanded repeats were the trinucleotides CCG and AGC, found to cause fragile X syndrome (FRAXA) and spinobulbar muscular atrophy (SBMA), respectively. Subsequently, myotonic dystrophy was also found to be due to an expanded AGC repeat, followed by a series of neurological disorders (see Table 2). For SBMA and the other neurological disorders the AGC repeat is actually translated as CAG into glutamine, and for this reason these disorders are sometimes collectively referred to as ‘polyglutamine disorders.’ The diseases caused by expanded repeats have also been collectively termed ‘trinucleotide repeat disorders’; however, now that repeat motifs of other lengths have been found to undergo the same dynamic mutation mechanism, this term appears to be inappropriate.


Table 1 Unique properties of dynamic mutations

Repeat interruptions stabilize the DNA sequence
Disease severity/age-at-onset is related to copy number
Relationship between repeat copy number and instability or mutation rate
Product of change in repeat copy number has different probability of further mutation
Expanded alleles usually arise from premutations, which can arise from a pool of longer ‘normal’ alleles with ‘perfect’ repeats


Table 2 Diseases and fragile sites due to expanded DNA repeat sequences

Repeat motif  Copy number range*        Disease/fragile site                    Gene
              Normal    Affected
AGC           11-31     40-62           Spinobulbar muscular atrophy            AR
              5-35      >80-2000        Myotonic dystrophy                      DMPK‡
              9-34      37-120          Huntington disease                      Huntingtin
              19-38     40-81           Spinocerebellar ataxia 1                Ataxin
              22-28     37-50           Spinocerebellar ataxia 2                SCA2
              13-36     68-79           Machado-Joseph (SCA3)                   MJD1
              4-20      21-30           Spinocerebellar ataxia 6                CACNL1A4
              7-17      38-130          Spinocerebellar ataxia 7                SCA7
              7-23      49-75           Dentatorubral pallidoluysian atrophy    Atrophin
              16-37     107-?           Spinocerebellar ataxia 8                SCA8
CCG           6-55      >230            FRAXA                                   FMR1
              6-25      ? (>230)†       FRAXE                                   FMR2
              6-29      ? (>230)†       FRAXF                                   -
              7-32      ? (>230)†       FRA11B (Jacobsen)                       CBL2
              16-50     ? (>230)†       FRA16A                                  -
AAG           7-34      >200-1200       Friedreich ataxia                       Frataxin
12mer         2-3       35-70           Myoclonic epilepsy 1                    Cystatin B
24mer§        5         6-14            Creutzfeldt-Jakob disease               PrP
33mer         7-30      ? (>150)†       FRA16B                                  -
42mer         4-75      >75             FRA10B                                  -

*Copy numbers between the normal and affected ranges can act as premutations, expanding in subsequent generations to give full mutations.
†All observed alleles are above this copy number but the threshold is unknown.
‡An additional gene(s) may be involved.
§No evidence for dynamic mutation.
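The first footnote to Table 2 implies a simple three-way classification of alleles at a given locus: normal, premutation (between the two ranges), and affected. The Python sketch below applies it to the Huntington disease ranges from the table. Treating the observed ranges as hard cut-offs is an assumption made for illustration, not a clinical rule, and the names `RepeatLocus` and `classify` are hypothetical.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class RepeatLocus:
    """Observed copy-number ranges for one locus (values copied from Table 2)."""
    name: str
    normal: Tuple[int, int]    # (lowest, highest) copies on normal chromosomes
    affected: Tuple[int, int]  # (lowest, highest) copies in affected individuals


def classify(locus: RepeatLocus, copies: int) -> str:
    """Label an allele as normal, premutation, or in the affected range.

    Illustration only: the published ranges are observations, not sharp thresholds.
    """
    if copies <= locus.normal[1]:
        return "normal"
    if copies < locus.affected[0]:
        # Between the two ranges: may expand in later generations (Table 2, footnote *).
        return "premutation"
    return "affected"


HD = RepeatLocus("Huntington disease", normal=(9, 34), affected=(37, 120))

if __name__ == "__main__":
    for n in (20, 35, 36, 45):
        print(n, classify(HD, n))
```

For this locus, 35 and 36 copies fall in the gap between the two ranges and are labeled as premutations.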

A growing list of diseases and/or chromosomal 
fragile sites (Table 2) is now known to have expanded 
repeats of motif length between 3 and 42 bases as their 
molecular basis. 


Mechanisms of Repeat Expansion 


cis-Acting Elements: Instability Related to 
the Copy Number of Perfect Repeats 


One of the paradoxical features of dynamic mutation 
loci is that they exhibit an apparently high mutation 
rate and yet there is evidence of founder effects — that 
certain chromosomes are predisposed to this form 
of mutation. The basis of this apparent discrepancy 
lies in the fact that the composition of the repeat se- 
quence has a role to play in determining its instability. 


Weber (1990) noted that the more polymorphic dinu- 
cleotide AC repeats were the longer ones that did not 
have interrupting bases in the repeat tract. This same 
rule appears to apply equally well to the
trinucleotide repeats and for much longer repeats (up 
to 42 base pairs) which give rise to certain chromosomal fragile sites. For example, all expanded alleles of the SCA2 locus are perfect repeats, whereas 98% of
normal alleles (that do not expand) have a repeat inter- 
ruption. Furthermore the expanded alleles can be seen 
to have their origin in the pool of perfect alleles from 
the normal population. This is a common theme for 
each locus where the data have been collected, sug- 
gesting that it will be a common property of all 
dynamic mutations. 
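Whether an allele is a ‘perfect’ repeat can be checked by finding the longest uninterrupted run of the motif in its sequence. The sketch below assumes sequences written as plain strings, and its example uses a hypothetical CAG tract with CAA interruptions; the function name `longest_pure_run` is illustrative and not taken from the cited studies.

```python
def longest_pure_run(seq: str, motif: str) -> int:
    """Length, in repeat units, of the longest uninterrupted tandem run of motif."""
    k = len(motif)
    best = 0
    for frame in range(k):  # the run may start in any reading frame
        run = 0
        for i in range(frame, len(seq) - k + 1, k):
            if seq[i:i + k] == motif:
                run += 1
                best = max(best, run)
            else:
                run = 0  # an interruption resets the run
    return best


if __name__ == "__main__":
    # Hypothetical alleles with the same total number of triplets (22).
    interrupted = "CAG" * 8 + "CAA" + "CAG" * 4 + "CAA" + "CAG" * 8
    perfect = "CAG" * 22
    print(longest_pure_run(interrupted, "CAG"))
    print(longest_pure_run(perfect, "CAG"))
```

Both alleles contain 22 triplets, but the interrupted one has a longest pure run of only 8 units against 22 for the perfect allele, which is the property the text links to instability.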


trans-Acting Factors: Components of the 
Replicative Process Contribute to Repeat 
Instability 

The DNA repeat sequences are not thought to be 
unstable on their own. It is the interaction of the 
repeats with the machinery of DNA replication and 
repair which is thought to be the molecular basis of 
instability. Based on the size range of instability a role 
for the Okazaki fragment was postulated (Richards 
and Sutherland, 1994). Okazaki fragments are pieces 
of DNA that are transient components of lagging 
strand DNA synthesis at the replication fork. Once 
the repeat length reaches a size approximating that of 
the Okazaki fragment, the repeat sequences can show 
a dramatic increase in expansion consistent with the 
slippage of an untethered fragment. Repeat interrup- 
tions serve to anchor and therefore stabilize the repeat, 
preventing the Okazaki fragment from slipping 
during replication. Experimental evidence for a role 
for the Okazaki fragment in repeat instability has 
come from studies in yeast where strains mutant in 
the gene rad27 demonstrate increased repeat tract 
instability. rad27 mutant cells are defective in Okazaki 
fragment maturation, the rad27 gene coding for an 
enzyme that metabolizes the flap structure at the 5’ 
end of the Okazaki fragment. 


Pathway from Repeat Expansion to 
Disease 


Loss of Function 

The molecular pathway from genotype to phenotype 
differs from one locus to the next and is primarily 
dependent upon the location of the expanded repeat with respect to the gene(s) which it affects. The principal mechanisms involve loss or gain of function (Figure 1). The loss-of-function pathway confers either a recessive or X-linked mode of inheritance, since a single normal allele could produce sufficient transcript to avoid phenotypic consequences.
For FRAXA and FRAXE the expansion of the repeat 
causes localized methylation and consequent extinction of transcription from the FMR1 or FMR2 gene
promoter. Demethylation experiments demonstrate 
that protein can be produced from unmethylated 
genes with an expansion. Indeed rare males exist who 
carry the FRAXA expansion but for some reason do 
not undergo methylation, and these individuals appear 
to escape the more severe phenotypic consequences of 
the fragile X mutation. 

Both type 1 myoclonic epilepsy (EPM1) and Frie- 
dreich ataxia (FRDA) also involve loss of transcript. 
The EPM1 expansion is located within the promoter 
region of the gene and expansion appears to affect the 
ability of the promoter to function properly. The 
FRDA expansion occurs within an intron of the gene 
and appears to exert its effect either on the normal 
splicing of transcripts containing the expanded repeat 
or on the stability of these transcripts. Either way 
the result is diminished levels of mRNA and as a 
consequence a dramatic reduction in protein levels. 
Consistent with this pathway is the finding that rare point mutations in the FRDA gene are also loss-of-function mutations.


Gain of Function 

Where the expanded repeat is located within the cod- 
ing region the repeat usually encodes polyglutamine. 
Instances where this is not the case (rare cases of 
Creutzfeldt-Jakob disease, CJD, and some polyalanine-associated disorders) may not constitute the same form
of dynamic mutation. The finding that most of the 
polyglutamine expansion diseases have a common 
pathological copy number threshold suggests that a 
common pathway is involved, even though the proteins 
involved appear to be quite unrelated (Table 2). The 
polyglutamine tracts are aggregated by crosslinking 
into higher molecular weight forms. This aggregation 
involves transglutamination of the polyglutamine. 
There is clear experimental evidence for polyglutamine 
tracts inducing apoptosis either in vivo or in vitro.
In addition, several studies have shown that the 
polyglutamine aggregates in the form of nuclear inclu- 
sions in the affected cells, suggesting a pathological role 
in the neurodegeneration process. Experiments using 
transgenic Drosophila have indeed shown that the 
expanded polyglutamine tract is able to initiate the 
neurodegenerative disease and while nuclear inclusions 
do form they do not appear to be sufficient to account 
for the phenotype. An inhibitor of apoptosis does restrict the pathology, providing evidence that the polyglutamine does exert its effects by inducing inappropriate apoptosis in the affected neurons. Since these disorders represent a gain of function, the resultant inheritance pattern is dominant.


(A) Loss of function (loss of transcript): EPM1, FRAXA, FRAXE, FRDA. (B) Gain of function: polyglutamine disorders (SBMA, DRPLA, HD, SCA1-3, 6, 7).
Figure 1 Effect of position of repeat expansion on pathway from genotype to phenotype. Shaded triangles indicate the location of the expanded repeats with respect to the affected gene transcript. Each of these genes has more than one intron, although only one is shown for the sake of clarity. (A) Each of the loss-of-function pathways involves a loss or reduction in the amount of mRNA for the affected gene. For EPM1 the repeat is located in the promoter region and expansion appears to affect transcription. The FRAXA and FRAXE repeats are located in the 5’ untranslated region, with expanded alleles causing methylation of both the repeat and the promoter region, resulting in abolition of transcription. The FRDA repeat is located within an intron and seems to exert its effect by reducing the amount of mRNA, presumably by affecting RNA splicing and/or stability. (B) The diseases due to an expanded repeat coding for polyglutamine are likely to have a common pathway from genotype to phenotype. They exhibit dominant inheritance characteristics consistent with a gain of function.


The Exception: Myotonic Dystrophy


Myotonic dystrophy does not fit into either of the above categories, probably because its molecular pathway(s) is/are quite distinct from the examples above. The repeat is located in the 3’ untranslated region of one gene, DMPK, where it appears to affect RNA compartmentalization. The repeat is also located within the promoter region of at least one other gene and may also be exerting an effect on adjacent genes, perhaps by inducing inappropriate expression. Overall the pattern of inheritance is dominant, and therefore the molecular pathways would appear to require gain-of-function properties, although it is possible that dosage effects may contribute to the phenotype.


Conclusions 


Dynamic mutation involves a novel molecular 
mechanism which can account for the non-Mendelian
phenomena, such as anticipation, exhibited by certain 
human genetic diseases. Dynamic mutation also repre- 
sents the mechanism whereby one class of fragile site 
is generated from loci which normally exhibit copy 
number polymorphism of repeat sequences. The 
molecular pathway from genotype to phenotype is 
largely dependent upon the location of the expanded 
repeat with respect to any gene that it affects and can
therefore result in either recessive, X-linked, or 
dominant inheritance characteristics. 
Dysmorphology is the medical and scientific discipline concerned with the study of birth defects and
syndromes, and with diagnosis, investigation, and 
counseling of affected individuals and their families. 
Dysmorphologists have training in pediatrics and 
clinical genetics and work closely with their col- 
leagues in diagnostic and research laboratories. 


Making a Diagnosis 


When a baby is born with birth defects, the parents
have many questions: What is the problem? What
does it mean for our baby? Why did it happen? Will
it happen again? In order to answer these questions
properly and to manage and treat the baby appropriately,
a precise diagnosis is needed.


Approach to Diagnosis 
A systematic approach to diagnosis includes: 


• History: family, past obstetric, and pregnancy history

• Examination: behavior, size and proportions, specific anomalies

• Measurements

• Investigations: may include chromosomal or DNA analysis, metabolic studies, X-rays, or scans

• Photographs: for record purposes.


Once examination and investigations are complete a 
diagnostic synthesis can be made. Sometimes a clear 
diagnosis is suspected on clinical examination and 
then confirmed by investigations, e.g., by recognizing 
the poor tone and facial features of Down syndrome 
and then confirming the diagnosis by finding trisomy 
21 on chromosomal analysis. In other situations there 
may not be a confirmatory laboratory test so diagnosis 
rests on clinical examination alone. As well as obvious 
structural malformations a child may have a subtle 
characteristic facial appearance or ‘gestalt’, which can
be recognized by those with appropriate aptitude and 
experience. Various computerized systems exist to 
help in syndrome identification and to provide rapid 
access to relevant literature. 


Delineation of Newly Recognized 
Syndromes 

There are over 1000 described dysmorphic syndromes,
and many more if syndromes due to small
chromosomal duplications and deletions are included.
However, many children do not fit into previously
reported syndromes, and new patterns are being recognized
all the time. Dysmorphologists have regular
meetings where undiagnosed patients are discussed 
and new entities delineated. Dissemination of new 
findings is through scientific publications or meetings 
and increasingly through electronic networks. 


Utility of a Diagnosis 


Establishing a precise diagnosis is important for the 
family, for clinical, social, and educational manage- 
ment, and for research. 

Once a diagnosis has been established the under- 
lying cause of the problem (where known) can be 
discussed with the family and information given 
about prognosis and risks of recurrence including 
options for prenatal diagnosis. For many conditions 
there are family support groups. The provision of 
social and educational care for the special needs of a 
child with a syndrome is helped by knowledge of the 
precise condition a child has. 

A precise diagnosis aids early clinical management 
and anticipatory care. For example some syndromes 
may be associated with visual or hearing deficits 
which, if identified early, can be treated. Some malformation
syndromes are lethal, and early diagnosis
may lead to a decision to encourage the parents and
family to maximize their time with the child rather
than to pursue surgical correction of structural malformations
that will not affect the eventual outcome.

For many conditions the underlying mechanism is 
not fully understood and research is continuing. The 
dysmorphologist, by making precise diagnoses and 
ensuring a group to be investigated is as homogeneous 
as possible, maximizes the chance of successful
research. Clinical observations may inform the
research direction.


See also: Clinical Genetics; Ethics and Genetics; 
Genetic Counseling 
Dystrophin is a flexible, rod-shaped protein of 3685
amino acid residues found predominantly in muscle
tissue, in association with the inner surface of the plasma
membrane. There are different isoforms of dystrophin
located in brain cells, Schwann cells, and glial cells. In
muscle, the probable function of dystrophin is to anchor
specific membrane glycoproteins to the inner surface
of the cell membrane.

The dystrophin gene, situated at Xp21, is large
(2400 kb) and complex (78 exons, multiple promoters),
requiring about 16 h to be transcribed. Mutational
alterations (usually deletions) in the dystrophin gene
lead to the sex-linked diseases Becker and Duchenne
muscular dystrophies. In Duchenne muscular dystrophy,
the rate of muscle degeneration exceeds the rate of
regeneration, and the life span of affected
individuals rarely exceeds 20 years.
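The two figures quoted above, a gene of about 2400 kb transcribed in about 16 h, imply an average elongation rate that simple arithmetic can confirm. The sketch below assumes a single RNA polymerase moving at constant speed along the whole gene, ignoring initiation and pausing. The function name is illustrative only.

```python
"""Average transcription elongation rate implied by the dystrophin figures."""

GENE_LENGTH_BP = 2_400_000   # ~2400 kb, as stated in the text
TRANSCRIPTION_TIME_H = 16.0  # ~16 h, as stated in the text


def elongation_rate(length_bp: float, hours: float) -> tuple[float, float]:
    """Return the implied average rate as (nucleotides per second, kb per minute).

    Assumes constant speed over the whole gene (a deliberate simplification).
    """
    seconds = hours * 3600.0
    nt_per_s = length_bp / seconds
    kb_per_min = nt_per_s * 60.0 / 1000.0
    return nt_per_s, kb_per_min


if __name__ == "__main__":
    nt_s, kb_min = elongation_rate(GENE_LENGTH_BP, TRANSCRIPTION_TIME_H)
    print(f"about {nt_s:.0f} nt/s, or {kb_min:.1f} kb/min")
```

On these assumptions the polymerase must average roughly 42 nucleotides per second (2.5 kb per minute) to cover the gene in 16 h.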
See also: Duchenne Muscular Dystrophy (or
Meryon’s Disease); Muscular Dystrophies 


E.coli 


See: Escherichia coli 
Genes that are expressed immediately after phage
infection are termed early genes. Their transcription 
generally requires only the machinery of the host, 
possibly augmented with proteins carried inside the 
phage particle. The products of these genes are pri- 
marily involved in restructuring the host cell to 
become an efficient factory for making new phages — 
blocking host nucleases, adapting the transcription or 
translation machinery, changing membrane proper- 
ties, degrading host DNA or blocking its synthesis. 
Depending on the life cycle of the particular phage, 
preparation of the cellular machinery to make the 
phage DNA may involve either early genes or middle- 
mode genes, which do require new phage-encoded 
proteins for their expression. 


See also: Bacteriophages 
The ‘Epstein-Barr virus-determined nuclear antigen’
or EBNA was discovered by Reedman and Klein (1973)
by anticomplement immunofluorescence, following
the staining of acetone-methanol fixed smears of
EBV-carrying lymphoblastoid cell lines with the sera
of EBV-positive human donors. All EBV-DNA-carrying
cells, but no EBV-negative cells, show brilliant fine
granular nuclear fluorescence. Unlike the T-antigens
of the papova- and adenoviruses, EBNA remains asso- 
ciated with the chromosomes during mitosis. Later 
studies revealed that EBNA is a family of six proteins. 
For their nomenclature and function, see Epstein- 
Barr Virus (EBV). 
See also: Epstein-Barr Virus (EBV); Tumor
Antigens Encoded by Simian Virus 40 


Ectoderm 


See: Developmental Genetics 
The ectodermal dysplasias form two main subgroups:
those in which sweating is deficient and the absence of
sweat glands coincides with varying degrees of hypodontia
and hair and nail deformities, and those in
which sweating and teeth are normal but brittle
hair, nails, and palmoplantar hyperkeratosis
occur. Whilst the former is X-linked recessive or
autosomal recessive, the latter is usually autosomal
dominant. Numerous syndromic variations occur, and
recently there has been substantial progress in understanding
the pathogenesis, with several candidate genes having
been identified.


Pathogenesis 


Unlike many other genodermatoses, such as the ichthyoses,
Ehlers-Danlos syndrome (EDS), pseudoxanthoma
elasticum, epidermolysis bullosa, and cutis
laxa, in which particular structural components are
defective, the ectodermal dysplasias show defective
organogenesis. In these simpler disorders, particular
anatomical structures, such as blood vessels, bones, or
ligaments, have intrinsic weaknesses; in the ectodermal
dysplasias, the orchestration of organogenesis is disturbed.
Thus, instead of errors in simple structural proteins
such as collagen in EDS or keratin in the ichthyoses,
the ectodermal dysplasias are generally caused by
mutations in orchestration proteins.


Anhidrotic Ectodermal Dysplasia 

Because inheritance is X-linked, males are affected,
while heterozygous females show partial forms. Hypotrichosis
and abnormal teeth and sweat glands are
consistent features, while affected females show reduced
or distorted teeth and minor sweat gland or
breast deficiency (Clarke, 1987). Not surprisingly,
given lyonization, heterozygotes have areas of sweating
deficiency that follow Blaschko’s lines. The
chromosomal location at Xq12 was first deduced
from a translocation, and this region was later shown
to be syntenic with the mouse tabby locus. Eventually
a gene responsible for epithelial-mesenchymal signaling
was cloned. This gene is homologous in mouse and
human and, as well as a transmembrane domain, contains
a collagenous domain of 19 Gly-X-Y repeats (Monreal et al.,
1998). Although its function is unknown, it seems
likely that interaction with other matrix proteins
is important for epithelial-mesenchymal
interactions. Short and long isoforms,
with or without the collagenous domain, have been
identified. Isoform II is functionally important in
tooth, hair, and sweat gland morphogenesis. There is
also some evidence that the protein, ectodysplasin, is a
member of the TNF ligand family, which orchestrates the
epithelial-mesenchymal interactions that in turn
regulate epidermal appendage formation. Other similar
transmembrane proteins with collagenous domains
include collagen XVII. Mutations of plakophilin, a
desmosomal protein, cause another form of ectodermal
dysplasia (McGrath et al., 1999).


Hidrotic Ectodermal Dysplasia (Clouston 
Syndrome) 

In contrast to the anhidrotic variant, sweat glands,
sebaceous glands, and teeth are completely normal.
Instead there is severe diffuse alopecia, with dystrophic
nails and patchy hyperpigmentation. Variations
include palmar-plantar hyperkeratosis, eyebrow hypoplasia,
and mental retardation. Linkage analysis has
mapped Clouston-like families to 13q11-12, with
mutations in the gene encoding connexin 30 (Lamartine
et al., 2000).


Autosomal Anhidrotic Ectodermal Dysplasia 
(ED3) 


Autosomal dominant 

Several large American families have been described 
in which mild hair thinning, mild dental hypodontia, 
and variable hypohidrosis segregate with autosomal 
dominant inheritance. The skin is smooth and dry, 
eyebrows are atrophic, eyelashes and scalp hair are 
deficient, and sweating is confined to the axillae,
palms and soles. Mutations have been identified in 
the human homolog of the mouse gene ‘downless’ 
(Majumdar et al., 1998). 


Autosomal recessive 

This is phenotypically indistinguishable from the
X-linked form, except, crucially, that the inheritance
is autosomal recessive rather than X-linked. Mutations
in the EDA1 gene are absent. Instead, the human
homolog of a second mouse gene with a tabby-like
phenotype has been identified; in the mouse, the autosomal
counterparts of tabby are crinkled and ‘downless’
(Majumdar et al., 1998).


Hypohidrotic Ectodermal Dysplasia with 
Immunodeficiency 

This variant, which affects hair, teeth, and sweat glands,
is caused by mutations affecting the C-terminus of the
IKK-gamma (NEMO) gene. It is allelic to incontinentia
pigmenti (Zonana, 2000).


Other Variants 

Other variants include hypohidrotic ectodermal
dysplasia with hypothyroidism and corpus callosum
agenesis, and anhidrotic ectodermal dysplasia with
cleft lip and palate. In the latter, mutations of the
cell-cell adhesion protein PVRL1 have recently been
identified (Suzuki et al., 2000).
Translation, like the other steps in the flow of information
from DNA to protein, is very accurate; the
overall frequency of all types of translational errors is
well below one mistake per 1000 codons read.
Part of this accuracy is achieved through various
energy-dependent editing or proofreading functions,
which operate at the different steps involved in translation.
These functions prevent the formation of a
defective protein, either by preventing the formation of an
error-containing intermediate or by rejecting such an
intermediate at some step in the pathway.
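A per-codon error frequency below 1 in 1000 matters because it limits the chance that a whole polypeptide is made without any misincorporation. The sketch below assumes errors occur independently at a constant per-codon rate ε, so the error-free fraction of a chain of n codons is (1 - ε)^n. The rate 1e-4 is an illustrative assumption, not a figure from the text.

```python
def error_free_fraction(error_per_codon: float, n_codons: int) -> float:
    """Probability that a chain of n_codons carries no misincorporated residue.

    Assumes independent errors at a constant per-codon rate.
    """
    if not 0.0 <= error_per_codon <= 1.0:
        raise ValueError("error_per_codon must be a probability")
    return (1.0 - error_per_codon) ** n_codons


if __name__ == "__main__":
    # 1e-3 is the upper bound quoted in the text; 1e-4 is an assumed lower value.
    for rate in (1e-3, 1e-4):
        for length in (300, 1000):
            frac = error_free_fraction(rate, length)
            print(f"error rate {rate:g}, {length} codons: {frac:.1%} error-free")
```

At the upper bound of one error per 1000 codons, only about 37% of 1000-residue chains would be error-free. At one error per 10,000 codons, about 90% would be.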

The terms ‘editing’ and ‘proofreading’ are often used 
as synonyms to describe mechanisms for preventing 
errors in translation. Such mechanisms can operate 
during the formation of aminoacyl-tRNA or during 
the selection of an aminoacyl-tRNA by the ribosome. 
In this entry, the term editing refers to such events 
during aminoacylation and proofreading refers to 
those that occur on the ribosome. Although these
mechanisms work to increase accuracy, it is important
to note that translation in normal cells is not maximized
for accuracy, since mutations exist that increase
accuracy above that seen in the wild type. However,
such mutations lead to a decrease in the growth rate,
indicating that translation has been optimized as a
trade-off between accuracy and growth rate.


Editing in Aminoacylation 


Aminoacylation, the attachment of an amino acid to a 
tRNA, is typically a two-step process catalyzed by the 
aminoacyl-tRNA synthetases. The first step, termed 
‘activation,’ is the formation of an aminoacyl-AMP 
(aminoacyl-adenylate) on the enzyme through the 
hydrolysis of ATP. The second step is the transfer of 
the activated amino acid residue from the adenylate to 
a tRNA in a reaction referred to as ‘charging.’ Editing
can occur in either of these two reactions.

Because many of the amino acids are similar in 
structure, misactivation or mischarging of an amino 
acid by a given synthetase often involves a subset of 
amino acids structurally related to the cognate amino 
acid. It is believed that in some cases the binding 
energy of closely related amino acids leads to only 
approximately a 100-fold preference for the correct 
amino acid and that editing functions can increase this 
selectivity 1000-fold. 
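The two figures above combine multiplicatively if, as a simplifying assumption, the binding step and the editing step discriminate independently: a 100-fold initial preference followed by a 1000-fold editing gain gives an overall 10^5-fold preference. The sketch below converts such discrimination factors into an expected misacylation frequency. The relative-concentration parameter is a hypothetical addition and defaults to 1 (equal concentrations).

```python
def misacylation_frequency(initial_discrimination: float,
                           editing_discrimination: float,
                           competitor_ratio: float = 1.0) -> float:
    """Expected fraction of charged tRNAs carrying the wrong, near-cognate amino acid.

    Assumes the two stages act independently, so their discrimination factors
    multiply. competitor_ratio is [near-cognate]/[cognate] amino acid
    (a hypothetical parameter; 1.0 means equal concentrations).
    """
    total = initial_discrimination * editing_discrimination
    # The correct product forms `total` times faster per unit concentration than
    # the wrong one, so the wrong share is ratio / (total + ratio).
    return competitor_ratio / (total + competitor_ratio)


if __name__ == "__main__":
    without_editing = misacylation_frequency(100, 1)
    with_editing = misacylation_frequency(100, 1000)
    print(f"binding alone: {without_editing:.1e}; with editing: {with_editing:.1e}")
```

Binding alone would mischarge roughly 1 tRNA in 100. Adding a 1000-fold editing step lowers this to about 1 in 100,000.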

The molecular pathways involved in editing can 
differ for different aminoacyl-tRNA synthetases. For 
instance, in the case of valyl-tRNA synthetase from 
Escherichia coli, misactivated threonine is first charged
to a tRNAVal (releasing AMP) and then hydrolyzed
and released from the tRNA. A more widely used 
pathway seems to be the direct hydrolysis of the mis- 
activated amino acid, which also releases AMP but 
does not involve mischarging. This reaction has been 
shown to be important in vivo in the prevention of 
misincorporation of homocysteine by methionyl- 
tRNA synthetase. It appears that this reaction occurs 
approximately once for every 100 methionine residues 
incorporated. Note that such editing schemes result in 
hydrolysis of ATP and, therefore, occur at a metabolic 
cost to the organism. 

The aminoacyl-tRNA synthetases must also re- 
cognize the correct tRNA. The accuracy of tRNA 
selection is several orders of magnitude greater than 
the selection of amino acids. Indeed, these enzymes 
are involved in the process whereby mature, nonde- 
fective tRNAs are identified and exported from the 
eukaryotic nucleus. This process is also referred to as 
‘proofreading’ but is unrelated to the editing processes 
described above. 


Proofreading on the Ribosome


During elongation, aminoacyl-tRNAs are brought to
a site on the ribosome containing the next codon to be
translated (the A-site) as a ternary complex containing
the aminoacyl-tRNA, guanosine triphosphate (GTP),
and an elongation factor (EF-Tu in bacteria). As in the
case of aminoacylation, the initial selection of the
aminoacyl-tRNA on the ribosome would give only
approximately a 100-fold preference for a correct versus
a nearly correct codon-anticodon interaction. This is, of
course, far lower than the observed accuracy of protein
synthesis. Accuracy is apparently enhanced by proofreading,
a rechecking of the codon-anticodon interaction. As with
editing by aminoacyl-tRNA synthetases,
proofreading is energy driven, in this case by the
hydrolysis of GTP. Every time EF-Tu delivers an
aminoacyl-tRNA to the A-site, there will be GTP
hydrolysis. Proofreading would involve the rejection
of an aminoacyl-tRNA from the A-site with
concomitant GTP hydrolysis. Therefore, proofreading
involves the hydrolysis of GTP in excess of what
would be required per peptide bond formed.
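A simple geometric argument, offered only as an illustration, gives the extra GTP consumed by proofreading. Suppose, as a hypothetical parameter, that each aminoacyl-tRNA delivered by EF-Tu is rejected after GTP hydrolysis with probability q, independently of earlier attempts. The number of deliveries per accepted aminoacyl-tRNA is then geometric with mean 1/(1 - q), so proofreading costs q/(1 - q) extra GTP per peptide bond. The GTP used by the translocation factor is not counted here. The sketch also multiplies the roughly 100-fold initial selection from the text by an assumed proofreading factor to give an overall error frequency.

```python
def ef_tu_gtp_per_peptide_bond(q_reject: float) -> float:
    """Mean EF-Tu GTPs hydrolyzed per accepted aminoacyl-tRNA.

    Each delivery hydrolyzes one GTP and is rejected with probability q_reject
    (a hypothetical, independent per-attempt probability), so the number of
    deliveries until acceptance is geometric with mean 1 / (1 - q_reject).
    """
    if not 0.0 <= q_reject < 1.0:
        raise ValueError("q_reject must be in [0, 1)")
    return 1.0 / (1.0 - q_reject)


def overall_error(initial_selection: float, proofreading: float) -> float:
    """Error frequency when two independent stages each favor the cognate
    tRNA by the given factor, assuming equal cognate/near-cognate supply."""
    total = initial_selection * proofreading
    return 1.0 / (total + 1.0)


if __name__ == "__main__":
    for q in (0.0, 0.1, 0.5):
        print(f"q = {q}: {ef_tu_gtp_per_peptide_bond(q):.2f} EF-Tu GTP per peptide bond")
    # 100-fold initial selection (from the text) x an assumed 50-fold proofreading gain
    print(f"overall error: {overall_error(100, 50):.1e}")
```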

The antibiotic streptomycin, which increases many 
types of translational errors, seems to affect both 
initial selection of aminoacyl-tRNA and proofreading 
on the ribosome. Streptomycin-resistant mutants
of E. coli have an altered ribosomal protein
S12. Many such mutants decrease the level of translational
errors in the cells, i.e., they lead to hyperaccurate
ribosomes. Evidence seems to indicate that this
decrease is related to increased initial selectivity for
aminoacyl-tRNA in such mutants, not to an increase in
proofreading.

It has been postulated that there exists another type 
of editing or proofreading on ribosomes, involving 
loss of the peptidyl-tRNA from the P-site after the 
misincorporation of an amino acid residue in a grow- 
ing peptide chain. Loss of a peptidyl-tRNA is often 
referred to as drop-off. If a proofreading mechanism 
exists that involves drop-off, it would be very expensive
for the cell if it were operative at any time after the
first few peptide bonds had formed, since all the 
energy required for each of the previous elongation 
cycles would be lost. 


See also: Aminoacyl-tRNA Synthetases; 
Elongation; Elongation Factors; Mistranslation; 
RNA Editing in Animals; RNA Editing in Plants; 
Translation 
The concept of effective population number was
introduced by Sewall Wright in his pioneering paper 
‘Evolution in Mendelian populations’ (Wright, 1931). 
Wright was interested in the effect of random changes 
in gene frequencies (random genetic drift) that occur 
in finite populations, especially small ones. In the ab- 
sence of systematic changes (e.g., selection, mutation, 
migration) the process of mating and reproduction can 
be thought of as drawing a sample of genes from an 
infinite pool to which each parent had contributed 
equally. In a diploid population of size N, the sample 
contains 2N genes. The variance of this binomial 


sampling process is $p(1-p)/2N$, where $p$ is the frequency
of the allele of interest in the parent generation.
This expression formed a part of the general
mathematical expression for allele frequency change 
formulated by Wright. Real populations do not con- 
form to this idealization. The number of males and 
females may differ, the population size may fluctuate 
over time, and parents are not equally viable and fer- 
tile. To accommodate these problems, rather than 
write more elaborate equations, Wright introduced 
the concept of effective population number, Ne. Ne is 
a number calculated from an actual population of size 
N, that when substituted into the Wright equations 
leads to the same amount of random genetic drift as is 
occurring in the actual population. The population 
size is most appropriately assessed at the beginning 
of the reproductive period. 
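Simulation can check the binomial sampling variance p(1 - p)/2N. The sketch below assumes an idealized population in which each of the 2N gene copies in the offspring generation is drawn independently and carries the allele of interest with probability p. Substituting N_e for N in the same formula is what the effective population number is designed to allow. The seed, sizes, and function names are illustrative choices.

```python
import random
import statistics


def next_generation_frequency(p: float, n_diploid: int, rng: random.Random) -> float:
    """Draw 2N gene copies, each the focal allele with probability p,
    and return the allele frequency in the sample."""
    copies = 2 * n_diploid
    hits = sum(1 for _ in range(copies) if rng.random() < p)
    return hits / copies


def simulated_drift_variance(p: float, n_diploid: int,
                             replicates: int = 10_000, seed: int = 1) -> float:
    """Variance of the one-generation allele frequency across replicate populations."""
    rng = random.Random(seed)
    freqs = [next_generation_frequency(p, n_diploid, rng) for _ in range(replicates)]
    return statistics.pvariance(freqs)


def binomial_drift_variance(p: float, n_diploid: int) -> float:
    """Expected variance p(1 - p) / 2N from the binomial model."""
    return p * (1.0 - p) / (2 * n_diploid)


if __name__ == "__main__":
    p, n = 0.3, 50
    print("simulated:", round(simulated_drift_variance(p, n), 5))
    print("expected: ", binomial_drift_variance(p, n))
```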

When the number of females, $N_f$, differs from the
number of males, $N_m$, the effective population number
is given by $1/N_e = 1/(4N_f) + 1/(4N_m)$. Notice that when
$N_f$ and $N_m$ are each $N/2$, $N_e = N$, as expected. If the
numbers of the two sexes differ, then the sex with the
smaller number dominates the value of $N_e$. To consider
an extreme example, the effective size of a polygamous
population with one male and 100 females is
about 4, much closer to the number of the rarer sex.
When the population size varies from time to time, the
effective population number is the harmonic mean of
the values at different times. Thus, if $N_i$ is the population
number at generation $i$, the effective population
size averaged over $t$ generations is given by
$1/N_e = (1/t)\sum_{i=1}^{t} 1/N_i$. Again, since $N_i$ appears in the
denominator, the smaller values dominate. Therefore, size
bottlenecks are important factors in assessing the 
importance of random genetic drift in the history of 
a population. A situation that is often more important
and more difficult to calculate occurs when viability
and fertility differ among the members of the
parent generation. The simplest case is one in which the sexes
are equally frequent, mating is at random (including
a random amount of self-fertilization), and the population
is neither increasing nor decreasing. In this case
the effective population number is $N_e = (4N-2)/(2+\sigma^2)$,
where $N$ is the population number and $\sigma$ is
the standard deviation of the number of progeny per
parent, counted as adults. Formulae adjusting for
counting at the wrong stage are available (Crow and 
Morton, 1955). 
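The three formulas in the preceding paragraph translate directly into code. The sketch below implements them as written: unequal numbers of the sexes, fluctuating size as a harmonic mean, and variance in progeny number under the stated assumptions of equal sex numbers, random mating, and a stationary population. The function names are illustrative.

```python
def ne_unequal_sexes(n_females: float, n_males: float) -> float:
    """1/Ne = 1/(4 Nf) + 1/(4 Nm), rearranged as Ne = 4 Nf Nm / (Nf + Nm)."""
    return 4.0 * n_females * n_males / (n_females + n_males)


def ne_fluctuating(sizes: list[float]) -> float:
    """1/Ne = (1/t) * sum(1/Ni): the harmonic mean of the generation sizes."""
    return len(sizes) / sum(1.0 / n for n in sizes)


def ne_progeny_variance(n: float, progeny_variance: float) -> float:
    """Ne = (4N - 2) / (2 + sigma^2), sigma^2 = variance in adult progeny per parent."""
    return (4.0 * n - 2.0) / (2.0 + progeny_variance)


if __name__ == "__main__":
    print(ne_unequal_sexes(100, 1))          # one male, 100 females: about 4
    print(ne_unequal_sexes(50, 50))          # equal sexes: Ne = N = 100
    print(ne_fluctuating([1000, 10, 1000]))  # a one-generation bottleneck dominates
    print(ne_progeny_variance(100, 2.0))     # Poisson-like variance: Ne = N - 1/2
```

In the bottleneck example, a single generation of 10 individuals pulls the effective number over three generations down to about 29, although the arithmetic mean size is 670.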

This aspect of effective population number has had 
a great deal of research in recent years and increasingly 
complicated formulae have been developed to take 
into account such factors as separate sexes, unequal 
numbers of the sexes, increasing or decreasing 
population number, inbreeding, and population struc- 
ture. The small population effect is manifested both in 


a decrease of heterozygosity and fluctuations in allele 
frequency. In a stationary population, the effective 
population numbers are the same for both effects, 
but in a growing or diminishing population they are 
different. For a review, see Caballero, 1994. 
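For an idealized population, the decrease of heterozygosity mentioned above follows a standard result. Each generation, a fraction 1/(2N_e) of the remaining heterozygosity is lost on average, so H_t = H_0 (1 - 1/(2N_e))^t. The sketch below evaluates this expression and the number of generations needed to halve H. It assumes a constant N_e and ignores mutation and migration. As noted above, in a growing or shrinking population the N_e appropriate to this effect may differ from the one governing allele-frequency fluctuations.

```python
import math


def expected_heterozygosity(h0: float, ne: float, generations: int) -> float:
    """H_t = H_0 * (1 - 1/(2 Ne))**t for a constant effective size Ne."""
    return h0 * (1.0 - 1.0 / (2.0 * ne)) ** generations


def generations_to_halve(ne: float) -> float:
    """Solve (1 - 1/(2 Ne))**t = 1/2 for t; roughly 2 Ne ln 2 when Ne is large."""
    return math.log(0.5) / math.log(1.0 - 1.0 / (2.0 * ne))


if __name__ == "__main__":
    for ne in (10, 50, 500):
        print(f"Ne = {ne}: heterozygosity halves in about "
              f"{generations_to_halve(ne):.0f} generations")
```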

Usually the effective size of a population is less than 
the actual size. It clearly is if the population is cen- 
sused at an early stage at which the death rate is high 
(as in most fish), but it is also true if the population is 
censused at the adult stage. Measured values range 
widely, but typically for most animals the effective 
number is between 1/4 and 3/4 of the census number. 

Although this is the case for a population without 
structure, it is not necessarily true if the population is 
subdivided. When the total population is divided into 
subgroups between which migration is limited, the 
effective population number can be, and often is, 
greater than the census number of the whole popula- 
tion. This is found, for example, in prairie dog colonies. 

Random genetic drift, and hence the effective population
number, plays an important role in Sewall Wright’s
‘shifting balance’ theory of evolution. The effective
population number is also a key quantity in Motoo
Kimura’s neutral theory of molecular evolution.
Ehlers-Danlos syndrome (EDS) is essentially a disorder
of collagenous connective tissue in which the skin is
abnormally fragile and hyperelastic. First described by
Tschernogobov and then separately by both Ehlers
and Danlos (Beighton, 1993) around the turn of the
twentieth century, EDS comprises at least seven separate types,
most of which are nonallelic. Recently a consensus
committee has proposed a modified classification,
which merges the old types I and II, separates type
VII into two distinct subsets, and confines types V
and VIII to minority status (Beighton et al., 1998).

There are some very strong phenotypic correlations
in EDS. In EDS types I/II (the old gravis and
mitis forms), misassembled collagen fibers form diagnostic
‘cauliflowers.’ Clinically, the skin is extremely extensible
and tears easily (Figure 1). Epicanthic folds and a
mesomorphic build with broadened hands and feet
are common. Defects occur in either the COL5A1
or the COL5A2 gene, causing faulty type V collagen protein,
although linkage excludes both of these genes in some
families. Known mutations include exon skips and
glycine substitutions.

EDS type IV is specifically caused by collagen III 
mutations (Pope et al., 1996). Typically there is acral 
thinning and premature aging (face, hands, and feet), 
thereby overlapping with metageria (Figure 2a,b). 
Thin skin with prominent capillaries is especially 
widespread over the shoulders and upper chest and 
is generalized in the severest cases. Early talipes is 
also common. Light microscopy of skin shows dermal 
collagen depletion and elastic proliferation, whilst
electron microscopy of skin at 60,000× magnification
shows patchy irregularity of fibril size. Collagen III
protein analysis nearly always shows depletion of
procollagen and collagen in the medium, with overmodified
mutant protein retained intracellularly. Most mutations
are private and unclustered, although exon 24 is a
hot spot for skipping errors, and first-position glycine
substitutions are also very common (Pope et al.,
1996). Helical 3′ mutations have the most abnormal
external appearance and also the highest frequency of
arterial rupture. Arterial rupture is commonest in the fourth
decade of life, but patients are also at risk during
adolescence and pregnancy. Blood pressure and aortic
ultrasound monitoring are of unproven value.


Figure 1 (See Plate 14) Typical late facial scarring of EDS I/II.


Figure 2 (See Plate 13) (A) Face and (B) hands of acrogeric EDS IV patients. The large eyes and lobeless ears are typical,
whilst the hands show pulp atrophy from terminal phalangeal erosions.

EDS type VI is typified by extreme general laxity
and hypotonia, and motor delay owing to ligamentous
laxity is very common. The typical hypotonic child
has good power and a normal muscle biopsy. Early
scoliosis indicates surgical correction in adolescence.
Blue sclerae and osteopenia overlap with osteogenesis
imperfecta (OI), and severe joint laxity overlaps with EDS
VII (see below). EDS VI can be distinguished from these
by type I collagen chemistry (abnormal in EDS VII and OI) or by
electron microscopy of skin (normal in EDS VI).
Type VIb lysyl hydroxylase deficiency can be detected
by gel electrophoresis of collagen type I proteins,
which migrate faster in affected patients, or by measuring
urinary cross-links. The gene has 19 exons, and
most patients are compound heterozygotes.

EDS type VII is caused by persistence of the type I
collagen N-propeptides, owing either to faulty N-proteinase
or to individual structural mutations of exon 6 of the COL1A1
or COL1A2 gene. Clinically, congenital hip dislocation
combines with generalized joint laxity. Fragile
skin and the general external phenotype overlap
with EDS I and II, but type VII is distinguished by abnormal
collagen chemistry (types a and b), in which the
uncleaved propeptide is retained as an extension of either
the α1 or the α2 chain. In type VIIc, both the α1 and
α2 extensions remain (Pope and Burrows, 1997). Types
a, b, and c retain 1, 2, or 3 uncleaved chains, with a gradation of
clinical severity that is greatest in type VIIc, in which there
is spectacular cutis laxa. Types a-c are also distinguishable
by electron microscopy of the skin, which
reflects fibril mispacking caused by retained propeptides,
ranging from angulated to swept-wing to hieroglyphic
fibrils in type VIIc (Pope and Burrows,
1997). The EDS III/benign hypermobile subtype is
the most common, occurring in up to 10% of Caucasian
populations and more frequently still in other populations.
It overlaps with all EDS subtypes and also with many other
inherited connective tissue disorders, such as PXE,
Marfan syndrome, Stickler syndrome, and certain
chondrodysplasias.

Although EDS VIII was relegated to minority sta- 
tus (Beighton et al., 1998), it also overlaps with EDS I/ 
II and IV, in all of which periodontal recession and 
pretibial scarring with hemosiderosis occur. Essential 
criteria include autosomal transmission of premature 
gum recession and bone loss, with granulomatous 
hemosiderin-rich pretibial plaques. Normal type III 
collagen levels exclude EDS type IV, and an absence of 
cauliflowers eliminates type I/II. 

Other minority EDS subtypes include type V. Only 
two families have been described with an external 
phenotype resembling EDS III/BHS, but with 
X-linked inheritance. Type IX EDS with fibronectin 
deficiency has been described only in one family. The 
new classification (Beighton et al., 1998) combines
types I and II, retains types III and IV, and divides types
VI and VII into separate types that split the
enzymic from the structural mutants. Its superiority
to the old numerical classification is doubtful.

With the exception of EDS IV in which uterine 
rupture or lethal arterial fragility during delivery can 
be life-threatening, pregnancy is usually safe. In EDS 
IV, early hospital admission and bed rest with an elective
cesarean section are advisable, preferably in a major
medical centre with vascular surgical cover. Except 
for EDS I/II in which premature rupture of the mem- 
branes is common, all other EDS subtypes have 
perineal fragility, which can usually be minimized by 
controlled delivery, to avoid severe third-degree tears 
and later pelvic prolapse. 
Electron microscopy is the formation of images based
on the interaction of high-energy electrons with a
specimen, typically used at high magnifications to
reveal details that cannot be resolved by light microscopy.

The first electron microscope was built based on the 
principles of light microscope optics by Ruska and 
colleagues in 1931, following the recognition that a 
magnetic lens can focus a beam of electrons much as a 
glass lens can focus a beam of light. Electrons have 
much shorter wavelengths than visible light and, 
because resolution in microscopy is inversely related 
to the wavelength of the illuminating beam, electron 
microscopy (EM) is in practice capable of resolving 
details of biological specimens that are about 100 times smaller
than those resolved by light microscopy, roughly
2 versus 200 nm. Resolution of even finer detail is
possible in principle but is limited primarily by lens 
aberrations and by specimen damage from the electron 
beam. 
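The wavelength argument above can be made concrete. The sketch below computes the relativistically corrected de Broglie wavelength of an electron accelerated through a potential V, λ = h / sqrt(2 m₀ e V (1 + eV / (2 m₀ c²))), and compares it with the Abbe limit d = λ / (2 NA) for visible light. The accelerating voltages, the light wavelength (500 nm), and the numerical aperture (1.4) are illustrative assumptions. The picometer-scale electron wavelength lies far below the ~2 nm achieved in practice, which fits the point that lens aberrations and beam damage, not wavelength, limit EM resolution.

```python
import math

# SI values of the physical constants (CODATA)
H = 6.62607015e-34        # Planck constant, J s
M0 = 9.1093837015e-31     # electron rest mass, kg
E = 1.602176634e-19       # elementary charge, C
C = 299_792_458.0         # speed of light, m/s


def electron_wavelength_m(volts: float) -> float:
    """Relativistic de Broglie wavelength of an electron accelerated through `volts`."""
    kinetic = E * volts
    momentum = math.sqrt(2.0 * M0 * kinetic * (1.0 + kinetic / (2.0 * M0 * C**2)))
    return H / momentum


def abbe_limit_nm(wavelength_nm: float, numerical_aperture: float) -> float:
    """Abbe diffraction limit d = lambda / (2 NA) for a light microscope."""
    return wavelength_nm / (2.0 * numerical_aperture)


if __name__ == "__main__":
    for kv in (100, 300):
        lam_pm = electron_wavelength_m(kv * 1e3) * 1e12
        print(f"{kv} kV electrons: wavelength about {lam_pm:.2f} pm")
    print(f"Light (500 nm, NA 1.4): limit about {abbe_limit_nm(500, 1.4):.0f} nm")
```

The light-microscope limit of roughly 180 nm agrees with the ~200 nm figure in the text.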

The major contributions of EM to genetics have 
followed from signal developments in specimen pre- 
paration to overcome inherent problems. First, speci- 
mens must be durable against beam damage and able to 
withstand high vacuum. This is because electrons inter- 
act so strongly with matter that electron microscopes 
must operate under very high vacuum. Second, speci- 
mens generally require some treatment to increase 
their contrast. Biological molecules scatter electrons 
poorly, being composed of elements with low atomic 
numbers. Thus, heavy metal atoms generally are intro- 
duced as stains, as thin coatings, or as discrete spheres 
(see below) to provide the necessary contrast. Third, 
specimens must remain extremely thin, since electrons 
must pass through the specimen to create the image 
for the most common high-resolution form of elect- 
ron microscopy, transmission electron microscopy 
(TEM). An important extension of this approach is 
to use tomographic techniques to extract two- and 
three-dimensional information from relatively thick 
sections examined in high voltage microscopes. 
Figure 1 (above) Thin sections of ultrarapidly frozen, freeze-substituted Dictyostelium amoebae. (A) Wild-type cell.
(B) CluA⁻ cell, a mutant in which mitochondria associate in a single large cluster. The scale bar represents 1 µm;
nucleus (n), lysosome (l), and mitochondria (m). (Images were kindly provided by S. Fields and M. Clarke, Oklahoma
Medical Research Foundation.)


Determination of cytological phenotypes at high 
resolution generally depends on some variation of 
the following common sequence of methods: 


1. Single cells, tissues, or whole organisms are ‘fixed’ 
to stabilize internal structures by cross-linking 
chemically using, for example, glutaraldehyde. 

2. The samples are embedded by infiltration with 
hardening resins, then sliced into 30–100 nm ‘thin
sections’ using diamond knives mounted in instru- 
ments termed microtomes. 

3. Contrast of the specimens is enhanced by staining 
with solutions of uranium, lead, or osmium salts 
(this step can occur earlier in the process). 


4. The sections are mounted for viewing in the
microscope on thin, stable support films, typically
composed of carbon or plastic, laid over fine-mesh
metal grids.

5. Images are captured photographically or
electronically.


Figure 2 (See Plate 16) (left) Mitotic spindles in the
yeast Saccharomyces cerevisiae, analyzed using thin sections.
Stereo three-dimensional reconstructions of mitotic
spindles from wild-type (A) and a cdc20 mutant (B). Light
gray and dark gray lines represent microtubules; red lines
represent microtubules that are continuous between
the two poles. The cdc20 cell division cycle mutant was
grown at the nonpermissive temperature (36 °C) for 4 h,
conditions under which these cells arrest in mitosis with an
average spindle length of ~2.5 µm and contain many more
microtubules than wild-type spindles of comparable lengths
(see Winey et al., 1995 and O’Toole et al., 1997).
Immunoelectron microscopy localization of Kar3-GFP (C) and
Slk19-GFP (D) fusion proteins (arrowheads). Spindle
microtubules appear as straight structures emanating
into the nucleus from dense spindle pole bodies, which
are embedded in the nuclear envelope. Kar3-GFP is a
motor enzyme of the kinesin family that localizes close
to the spindle poles, whereas Slk19-GFP localizes to
kinetochores and the spindle midzone (see Zeng et al.,
1999). The scale bars represent 250 nm.


In some instances, cells are frozen and then exposed to
chemical fixatives at low temperatures in order to
stabilize morphology; ultrarapid freezing prevents the
formation of ice crystals, which damage ultrastructure
(Figure 1). In single sections, specific molecules are
localized by tagging them with enzymes that are used 
to develop an electron-dense precipitate or with dis- 
crete markers, such as colloidal gold spheres with 
diameters in the nanometer range, usually employing 
an antibody intermediate (hence the term ‘immuno- 
localization’; Figure 2C,D). Three-dimensional in- 
formation can be generated from thin sections by 
collecting and imaging serial sections then arranging 
the image series in a ‘stack’ to reconstruct the original 
sample architecture (Figure 2A,B). 
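The serial-section approach can be sketched in Python with NumPy, assuming the section images have already been aligned to one another (registration is a separate, nontrivial step). The function names, the 70 nm section thickness, and the 2 nm pixel size are hypothetical choices for illustration.

```python
import numpy as np

def stack_serial_sections(sections, pixel_size_nm, section_thickness_nm):
    """Stack aligned 2D section images (ordered top to bottom) into a 3D volume.

    Returns the volume, indexed (z, y, x), and its voxel size in nm as (z, y, x).
    Assumes the images have already been registered to one another.
    """
    if not sections:
        raise ValueError("need at least one section")
    if len({s.shape for s in sections}) != 1:
        raise ValueError("all section images must have the same shape")
    volume = np.stack(sections, axis=0)   # axis 0 = section number (depth)
    voxel_size = (section_thickness_nm, pixel_size_nm, pixel_size_nm)
    return volume, voxel_size

def physical_extent_nm(volume, voxel_size):
    """Size of the reconstructed block in nm along (z, y, x)."""
    return tuple(n * d for n, d in zip(volume.shape, voxel_size))

# Example: 40 hypothetical 70 nm sections, each imaged at 512 x 512 pixels of 2 nm
sections = [np.zeros((512, 512), dtype=np.uint8) for _ in range(40)]
volume, voxel = stack_serial_sections(sections, pixel_size_nm=2.0, section_thickness_nm=70.0)
print(volume.shape, physical_extent_nm(volume, voxel))   # (40, 512, 512) (2800.0, 1024.0, 1024.0)
```

The voxel size makes the main limitation explicit: depth resolution is set by section thickness (tens of nanometers), which is far coarser than the in-plane pixel size.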


Figure 3 Scanning electron microscope image of the 
freshwater dinoflagellate Gymnodinium acidotum, showing 
the typical morphology and positioning of the two 
flagellae. The sinusoidal transverse flagellum (T) sits in an 
equatorial groove that encircles the cell; the longitudinal 
flagellum (L) projects from a longitudinal groove. The 
scale bar represents 10 um. (Image was kindly provided 
by S. Fields, University of Oklahoma, S. R. Noble 
Electron Microscopy Laboratory.) 
Intracellular topology is visualized using freeze-
fracture and deep-etch methods. Following freezing, 
cells are broken open and held under vacuum to allow 
sublimation of the ice to proceed until intracellular 
structures are exposed. The surface is then coated 
with heavy metal to produce a film imprinted with 
the surface topology. This film is examined using TEM 
after being separated from the underlying substrate. 


Figure 4 A single molecule of duplex DNA partially 
unwound by Escherichia coli RecBCD enzyme, visualized 
on a thin nitrocellulose support by rotary shadowing 
with platinum. RecBCD unwinds DNA and produces 
two single-stranded DNA loops which are relatively 
thick due to binding by single-strand DNA-binding 
protein present in the unwinding reaction. (Image was 
kindly provided by A. F. Taylor and G. R. Smith, Fred 
Hutchinson Cancer Research Center, Seattle, USA.) 
Figure 5 (See Plate 15) The surface of a three-dimensional reconstruction of a helical filament of human Rad51
protein on DNA is shown in gold in the foreground. In the background is an electron micrograph of the actual
filaments that Rad51 protein forms on single-stranded DNA in the presence of ATP. (The Rad51 protein is from the
laboratory of Dr Steve West, ICRF, UK.) The inset (right), a portion of such a Rad51–DNA filament (scale bar
represents 400 Å), shows the very poor signal-to-noise ratio present in such images. To surmount this problem, the
reconstruction was generated using an algorithm for processing such images (Egelman (2000) Ultramicroscopy 85:
225–234) and involved averaging images of 7620 segments. The reconstruction shows that the filaments contain
~6.4 subunits per turn of a 99 Å pitch helix.
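Two numbers in the Figure 5 caption can be unpacked with simple arithmetic. With ~6.4 subunits per turn of a 99 Å pitch helix, each subunit rises 99/6.4 ≈ 15.5 Å along the axis and is rotated 360°/6.4 ≈ 56° about it. Averaging many segments helps because independent noise shrinks roughly as 1/√N, so the signal-to-noise ratio improves by about √N. The toy simulation below assumes perfectly aligned, identical copies with Gaussian noise; it is a sketch of the principle, not of the Egelman algorithm, which must also determine alignment and helical symmetry. The function names are hypothetical.

```python
import numpy as np

def helical_parameters(subunits_per_turn, pitch_angstrom):
    """Axial rise (Å) and azimuthal rotation (degrees) per subunit of a helix."""
    rise = pitch_angstrom / subunits_per_turn
    twist = 360.0 / subunits_per_turn
    return rise, twist

def snr_after_averaging(signal, noise_sigma, n_images, rng):
    """SNR of the mean of n_images copies of `signal`, each with independent Gaussian noise.

    Toy model: the copies are assumed to be perfectly aligned and identical,
    which real filament segments are not.
    """
    noisy = signal + rng.normal(0.0, noise_sigma, size=(n_images, signal.size))
    average = noisy.mean(axis=0)
    return signal.std() / (average - signal).std()

rise, twist = helical_parameters(6.4, 99.0)
print(f"rise {rise:.2f} Å, twist {twist:.2f} degrees per subunit")  # rise 15.47 Å, twist 56.25 degrees

rng = np.random.default_rng(0)
signal = np.sin(np.linspace(0.0, 4.0 * np.pi, 256))   # stand-in for a real image profile
single = snr_after_averaging(signal, 5.0, 1, rng)
averaged = snr_after_averaging(signal, 5.0, 7620, rng)
print(f"SNR gain {averaged / single:.0f} (sqrt(7620) = {np.sqrt(7620):.0f})")
```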


Surface topology of even larger structures, including 
whole organisms, is the province of scanning electron
microscopy (SEM), which forms an image from the
secondary or back-scattered electrons emitted as a
beam scans the specimen.
Inherently lower in resolution than TEM, SEM 
nevertheless is valuable for providing great depth of 
focus and uniformly clear images of large specimens 
(Figure 3). 

Single molecules and molecular assemblies are 
visualized on thin, uniform support films by ‘shadow- 
ing’ where the biomolecules cause perturbations in an 
otherwise uniform heavy metal coating (Figure 4) or 
by ‘negative staining’ where the biomolecules are evi- 
dent as the less dense areas in a puddle of dried heavy 
metal salt (Figure 5). 

Future advances in the utility of electron
microscopy are likely to derive from refinements in
specimen preparation, preservation, and contrast
enhancement as well as from developments in digital
image acquisition, processing, analysis, and display.
Electroporation is the introduction of DNA mole-
cules into cells by use of an electric current to tem- 
porarily make the cells permeable. 
There are two processes in gene expression in
which elongation is of primary interest. One is
transcription, the synthesis of RNA from a DNA
template; the other is translation, the synthesis of a
polypeptide from a messenger RNA on the ribosome.
Both processes go through the phases of initiation,
elongation, and termination.


Elongation in Transcription 


Genomic DNA cannot be translated directly but has to
be copied, or transcribed, into RNA by various RNA
polymerases. Here the base-pairing principle discovered
by Watson and Crick applies. One strand of the
double-stranded DNA (the negative, or template,
strand) is copied with Watson–Crick base-pairing into
a positive strand of RNA, which is synthesized in the
5′ to 3′ direction. The double-stranded DNA is opened
up in a ‘bubble’ that travels along the duplex during
transcription, and within this bubble a DNA–RNA
hybrid is formed transiently. Transcription is strongly
regulated in all cases. Some genes are transcribed
frequently, whereas others are transcribed only rarely.
Likewise, some genes are transcribed only during a
brief period in the life of the cell, whereas others are
copied more or less continuously.
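The copying rule can be sketched as a short Python function, assuming both DNA strands are written 5′ to 3′ as is conventional; the function name and example sequence are hypothetical. The template is read from its 3′ end so that the RNA grows 5′ to 3′, and the result matches the coding (positive) DNA strand with U in place of T.

```python
# Watson-Crick pairing used when the template (negative) strand is copied into RNA
DNA_TO_RNA_PAIR = {"A": "U", "T": "A", "G": "C", "C": "G"}

def transcribe(template_5_to_3):
    """Return the RNA, written 5'->3', copied from a DNA template strand written 5'->3'.

    The polymerase reads the template 3'->5' (right to left in the string) and
    extends the RNA 5'->3', adding one base-paired ribonucleotide per step.
    """
    rna = []
    for base in reversed(template_5_to_3.upper()):
        rna.append(DNA_TO_RNA_PAIR[base])
    return "".join(rna)

coding = "ATGGCCTTT"        # hypothetical coding (positive) strand, 5'->3'
template = "AAAGGCCAT"      # its complementary template (negative) strand, 5'->3'
print(transcribe(template))  # AUGGCCUUU
```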


Elongation in Translation on Ribosomes 


The process of translation occurs on the ribosome. 
The ribosome is a complex of a few large rRNA mole- 
cules and between 50 and 90 different proteins. The 
ribosome is made up of two subunits (large and small) 
with different functions that dissociate from each 
other at the end of the process. Translation is tradi- 
tionally divided into three steps: initiation, elongation, 
and termination. Soluble protein factors catalyze the
process by binding to the ribosome transiently. 

In each cycle of elongation, one amino acid is incor- 
porated into the nascent peptide. There are three 
elongation factors in eubacteria, which catalyze two 
of the basic steps in translation: the binding of an 
aminoacyl-tRNA to the A-site, and the translocation 
of the peptidyl-tRNA from the A-site to the P-site. 
During translocation, the mRNA is moved to expose the
next codon in the ribosomal A-site. However, during 
the central event in elongation, peptidyl transfer, no 
protein factor is needed. 

The recognition of the codon by the anticodon of
the tRNA occurs in several steps. In
the initial selection, the anticodon of the aminoacyl- 
tRNA in complex with elongation factor Tu (EF-Tu) 
and GTP is matched against the codon in the A-site of 
the ribosome. When there is a good match, the ribo- 
some induces EF-Tu to hydrolyze its bound GTP 
to GDP and phosphate. The EF-Tu/GDP complex
has a conformation with low affinity for the
aminoacyl-tRNA and the ribosome; accordingly, it
dissociates. While the tRNA is bound to EF-Tu, its
aminoacyl moiety is located far from the peptidyl
transfer center, but it can then reorient itself into the
A-site of the ribosome while the tRNA retains its
interaction with the codon. This process coincides with the proofreading
of the anticodon of the tRNA by the codon of 
the mRNA. An incorrect (noncognate) match of the 
anticodon to the codon increases the likelihood that 
the aminoacyl-tRNA will dissociate before its amino 
acid has reached the peptidyl transfer site of the ribo- 
some. 
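Why two selection steps help can be shown with a toy model in which a noncognate aa-tRNA must pass both initial selection and proofreading, and the two stages act independently, so their discrimination factors multiply. The function name and the survival probabilities below are hypothetical, chosen only for illustration; they are not measured rates.

```python
from math import prod

def misincorporation_fraction(noncognate_per_cognate, survive_cognate, survive_noncognate):
    """Fraction of incorporated amino acids that come from noncognate aa-tRNAs.

    noncognate_per_cognate: ratio of noncognate to cognate ternary complexes sampling the A-site
    survive_cognate / survive_noncognate: probability of passing each selection stage
        (e.g. [initial selection, proofreading]); stages are treated as independent.
    """
    accepted_cognate = prod(survive_cognate)
    accepted_noncognate = noncognate_per_cognate * prod(survive_noncognate)
    return accepted_noncognate / (accepted_cognate + accepted_noncognate)

# Hypothetical probabilities, chosen only to show the effect of adding a second stage
one_stage = misincorporation_fraction(1.0, [0.9], [0.01])
two_stage = misincorporation_fraction(1.0, [0.9, 0.9], [0.01, 0.01])
print(f"{one_stage:.2e} vs {two_stage:.2e}")   # 1.10e-02 vs 1.23e-04
```

In this model a second stage costs only a modest loss of cognate tRNAs but cuts the error fraction by roughly the discrimination factor of that stage.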

Peptidyl transfer is catalyzed by the rRNA of the 
large subunit without direct assistance of ribosomal 
proteins or elongation factors. Once the aminoacyl 
moiety reaches the A-site of the peptidyl transfer 
center, the peptide on the peptidyl-tRNA in the P-site
can be transferred to it. This leads to a peptidyl-tRNA 
in the A-site and a deacylated tRNA in the P-site. 

The final step of elongation is the translocation of 
the peptidyl-tRNA from the A-site to the P-site and 
the movement of the mRNA by three nucleotides so 
that the next codon is exposed in the A-site. EF-G, 
which catalyzes this process, binds to the ribosome in 
complex with GTP. After translocation, it dissociates 
in complex with GDP. A surprising finding is that the
ternary complex of EF-Tu with GTP and aminoacyl-
tRNA closely resembles EF-G in shape. It remains possible
that EF-G, when it dissociates from the ribosome, 
leaves an imprint into which the ternary complex fits 
exactly. 
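The elongation cycle described above can be sketched as a loop that adds one amino acid per cycle and advances the A-site by three nucleotides. This is a hypothetical illustration: it assumes initiation has already placed AUG in the P-site, uses only a small subset of the genetic code, and reduces decoding, peptidyl transfer, and translocation to comments rather than modeling them.

```python
# Small subset of the standard genetic code; None marks stop codons.
GENETIC_CODE = {
    "AUG": "M", "GCC": "A", "UUU": "F", "GGA": "G", "AAA": "K",
    "UAA": None, "UAG": None, "UGA": None,
}

def translate_by_elongation(mrna):
    """Return (peptide, cycles, trace) for an mRNA whose first codon is AUG."""
    if not mrna.startswith("AUG"):
        raise ValueError("this sketch assumes initiation has placed AUG in the P-site")
    peptide = "M"      # nascent chain on the initiator tRNA in the P-site
    a_site = 3         # mRNA position of the codon exposed in the A-site
    cycles = 0
    trace = []
    while a_site + 3 <= len(mrna):
        codon = mrna[a_site:a_site + 3]
        amino_acid = GENETIC_CODE[codon]   # KeyError for codons outside this subset
        if amino_acid is None:
            trace.append(f"{codon}: stop codon in the A-site, elongation ends")
            break
        # 1. Decoding: EF-Tu-GTP delivers the matching aa-tRNA to the A-site.
        # 2. Peptidyl transfer (rRNA-catalyzed): the chain moves onto the A-site tRNA,
        #    leaving a deacylated tRNA in the P-site.
        peptide += amino_acid
        # 3. Translocation (EF-G-GTP): peptidyl-tRNA A -> P, deacylated tRNA P -> E,
        #    and the mRNA advances by three nucleotides.
        a_site += 3
        cycles += 1
        trace.append(f"{codon}: added {amino_acid}, chain now {peptide}")
    return peptide, cycles, trace

peptide, cycles, trace = translate_by_elongation("AUGGCCUUUGGAUAA")
print(peptide, cycles)   # MAFG 3
```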


See also: Messenger RNA (mRNA); Ribosomes; 
RNA Polymerase; Transcription; Translation 
Translational elongation factors are proteins that play
two important roles during the elongation cycle of 
protein biosynthesis on the ribosome. First, elonga- 
tion factors are involved in bringing aminoacyl-tRNA 
(aa-tRNA) to the ribosome during protein synthesis. 
Second, an elongation factor is involved in transloca- 
tion, the step in elongation at which the peptidyl- 
tRNA is moved from one ribosomal site to another 
as the mRNA moves through the ribosome. Both 
steps result in the hydrolysis of guanosine triphos- 
phate (GTP), and the conformation of the elongation 
factors changes depending on whether they are bound 
to GTP or to guanosine diphosphate (GDP). The 
elongation factors of archaea and bacteria (both are 
types of prokaryotes) and eukaryotes are similar in 
structure and function, as are the steps in protein 
biosynthesis in which they participate. The first part 
of this entry will deal with the elongation factors in 
bacteria, and later the factors found in other types of 
organisms will be discussed. 


Factors Related to Aminoacyl-tRNA 
Binding in Bacteria 


Elongation factor Tu (EF-Tu), when bound to GTP, 
brings aa-tRNA to the ribosome during the elonga- 
tion phase of translation. When EF-Tu is bound 
to GTP, it has a high affinity for aa-tRNA and forms 
the ternary complex, EF-Tu-GTP-aa-tRNA. EF-Tu 
must recognize common features of all tRNAs and 
also recognize that the tRNA is aminoacylated. EF-Tu 
is one of the most abundant proteins in bacterial cells, 
often present as 5% of the total cell protein. In Escher- 
ichia coli there are more than five molecules of EF-Tu 
per ribosome, and most of the aa-tRNA in the cell is 
bound to EF-Tu. 

The ternary complex has a high affinity for the 
ribosomal A-site, the site at which incoming aa- 
tRNA must be bound during the elongation step on 
the ribosome. If the aa-tRNA does not match the
codon exposed at the A-site, the ternary complex
leaves the ribosome. If there is a match, the
aa-tRNA is delivered to the site, GTP is hydrolyzed,
and EF-Tu-GDP is released from the ribosome.
Another elongation factor, EF-Ts, catalyzes nucleotide
exchange, whereby the GDP on EF-Tu is replaced by
GTP.
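The EF-Tu cycle described in this section can be written as a small state machine. This sketch follows the classic model of one EF-Tu and one GTP per aa-tRNA; the state and event names are hypothetical labels, not standard nomenclature.

```python
from enum import Enum

class EFTuState(Enum):
    GDP_BOUND = "EF-Tu-GDP"
    GTP_BOUND = "EF-Tu-GTP"
    TERNARY = "EF-Tu-GTP-aa-tRNA"

# (state, event) -> (next state, GTPs hydrolyzed in this step)
TRANSITIONS = {
    (EFTuState.GDP_BOUND, "EF-Ts exchange"): (EFTuState.GTP_BOUND, 0),
    (EFTuState.GTP_BOUND, "bind aa-tRNA"): (EFTuState.TERNARY, 0),
    # No match at the A-site: the intact ternary complex leaves and can try again.
    (EFTuState.TERNARY, "codon mismatch"): (EFTuState.TERNARY, 0),
    # Match: aa-tRNA delivered, GTP hydrolyzed, EF-Tu-GDP released.
    (EFTuState.TERNARY, "codon match"): (EFTuState.GDP_BOUND, 1),
}

def run_cycle(events, state=EFTuState.GDP_BOUND):
    """Apply events in order; return the final state and the number of GTPs hydrolyzed."""
    gtp_used = 0
    for event in events:
        try:
            state, used = TRANSITIONS[(state, event)]
        except KeyError:
            raise ValueError(f"{event!r} is not possible from {state.value}") from None
        gtp_used += used
    return state, gtp_used

final, gtp = run_cycle(["EF-Ts exchange", "bind aa-tRNA", "codon mismatch", "codon match"])
print(final.value, gtp)   # EF-Tu-GDP 1
```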

Interestingly, there is still disagreement on the 
number of GTPs hydrolyzed during binding of each 
aa-tRNA. Models of translation show the involve- 
ment of the classic ternary complex; however, some 
studies indicate that two molecules of EF-Tu are
bound per aa-tRNA and that two GTPs are
consumed during the cycle.
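The two models differ in their energy accounting. Under simplifying assumptions (n − 1 elongation cycles for a chain of n residues, one EF-G-dependent GTP per translocation, no rejected ternary complexes, and initiation and termination ignored), the GTP cost of elongation can be tallied as below; the function is a hypothetical illustration.

```python
def elongation_gtp_cost(n_residues, ef_tu_per_aa_trna=1):
    """GTPs hydrolyzed by EF-Tu and EF-G while elongating a chain of n_residues.

    Assumes n_residues - 1 elongation cycles (the initiator amino acid is placed
    during initiation), one EF-G-GTP per translocation, and no rejected ternary
    complexes; initiation and termination costs are ignored.
    """
    if n_residues < 1:
        raise ValueError("a chain has at least one residue")
    cycles = n_residues - 1
    return cycles * ef_tu_per_aa_trna + cycles

print(elongation_gtp_cost(300))                       # 598 (one EF-Tu per aa-tRNA)
print(elongation_gtp_cost(300, ef_tu_per_aa_trna=2))  # 897 (two EF-Tu per aa-tRNA)
```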

The gene encoding EF-Tu is called tuf, and many 
bacteria have duplicate genes for this protein (tufA and 
tufB). In addition to antibiotic resistance mutants, 
some mutants of EF-Tu alter the error frequency of 
translation (such mutants are also known for the 
homologous protein in yeast). EF-Ts is encoded by 
the tsf gene. 


Factors Related to Translocation in 
Bacteria 


Translocation involves a conformational change of 
the ribosome during elongation, whereby the newly 
formed peptidyl-tRNA is moved from the ribosomal 
A-site to the P-site (the tRNA formerly occupying the
P-site is displaced to the E-site) and the next codon on
the mRNA is moved into the A-site. Translocation, 
then, completes a cycle of elongation and positions the 
ribosome to accept the next incoming aa-tRNA. 
Translocation is catalyzed by the elongation factor 
EF-G, which is encoded by the fus gene. EF-G is 
bound to the ribosome as EF-G–GTP. The binding
site for EF-G overlaps with that of EF-Tu and,
fascinatingly, the structure of EF-G mimics that of the
ternary complex. The hydrolysis of the GTP seems
to provide the energy for translocation, after which 
EF-G-GDP dissociates from the ribosome. Several 
mutants of EF-G are also known, and some of these 
also display altered accuracy of translation. 


Other Factors in Bacteria 


Almost certainly other protein factors, not yet com- 
pletely characterized, are involved in translation. A 
protein called elongation factor P seems to function 
at an early step in protein synthesis, possibly in for- 
mation of the first peptide bond. The gene encoding 
this protein, efp, has been found throughout the 
bacteria. The homologous protein in eukaryotes is 
the initiation factor, eIF5A. There is also a separate 
EF-Tu-like elongation factor specifically for bringing 
selenocysteinyl-tRNA to the ribosome in response to 
a UGA codon in the appropriate context. Such a pro- 
tein is also found in the archaea and eukaryotes. 


Elongation Factors in Archaea and 
Eukaryotes 


The eukaryotes have elongation factors that perform 
the same functions as EF-Tu, EF-Ts, and EF-G. The 
eukaryotic equivalent of EF-Tu is EF-1α, and there is
high sequence conservation between EF-Tu and
EF-1α. EF-1α is also one of the most abundant cytoplasmic
proteins in eukaryotes. Genes for this protein are
often present in more than one copy and may have 
cell-type or stage-specific regulation. 

The eukaryotes have a complex of proteins, EF-1β,
EF-1γ, and EF-1δ, which function in a nucleotide
exchange reaction like that involving EF-Ts. The
factors EF-1β and EF-1δ are closely related to each
other, but none of these proteins is closely related to
EF-Ts.

The eukaryotic equivalent of EF-G is called EF-2. 
Like EF-G, it is responsible for the GTP-dependent
translocation step of the ribosome. It also contains a 
diphthamide residue, a unique posttranslational modi- 
fication of a histidine residue, which is the cellular 
target for ADP ribosylation by diphtheria toxin. 

Interestingly, the elongation factors of the archaea 
are more closely related to those of the eukaryotes 
than they are to those of bacteria, and, therefore, 
factors from the archaea are given the same nomen- 
clature as those from eukaryotes. The only elongation 
factor in the archaea that is more closely related to a 
bacterial factor than to the one from the eukaryotes is 
the elongation factor that brings selenocysteinyl- 
tRNA to the ribosome. 

As in the prokaryotes, there are almost certainly
other eukaryotic protein factors involved in elonga- 
tion. For instance, the fungi have a factor called EF-3 
which has both ATPase and GTPase activities. 


Effects of Antibiotics on Function of 
Elongation Factors 


The elongation factors, or the steps in protein syn- 
thesis catalyzed by the elongation factors, are the 
targets of several different antibiotics, and some of 
these are well studied. The antibiotic kirromycin inhi- 
bits EF-Tu, blocking its exit from the ribosome. 
Kirromycin-resistant alleles of the tuf genes have 
also been isolated. Fusidic acid inhibits EF-G (and 
EF-2) by preventing it from leaving the ribosome. 
Mutants of EF-G that are resistant to fusidic acid
are known, and the gene encoding this factor is named
fus after them. The amino-
glycoside antibiotic kanamycin also inhibits trans- 
location, and this antibiotic can be used to select 
mutants of EF-G (although these do not result in 
high-level resistance to the drug). Thiostrepton is a 
modified peptide antibiotic that binds to a site on
23S rRNA and inhibits elongation factor-dependent 
reactions in both archaea and bacteria. 

Tetracycline inhibits protein synthesis by interfer- 
ing with the binding of aa-tRNA to ribosomes. Bac- 
terial resistance to the tetracyclines is mediated by two 
major mechanisms. One mechanism involves protec- 
tion of ribosomes from the action of the antibiotic by 
one of a group of proteins whose N-terminal amino 
acid sequences are similar to those of elongation fac- 
tors Tu and G. 

The large-subunit ribosomal RNA contains a very 
highly conserved sequence which is cleaved by the 
antibiotic α-sarcin and modified by the antibiotic
ricin, both of which abolish protein synthesis on 
eukaryotic ribosomes (and are somewhat less effective 
against prokaryotic ribosomes). These antibiotics block 
the functions of ribosomes dependent on elongation 
factors, apparently by blocking their binding to the 
ribosome. Sordarins are a new family of highly speci- 
fic antifungal antibiotics which inhibit the action of 
fungal EF-2. 


See also: Translation 
‘Embryo transfer’ refers to the transplantation of a
mammalian preimplantation embryo into the repro- 
ductive tract of a recipient female so that it may 
implant and continue to develop to birth. Mammalian 
embryos of many species can develop in vitro from 
fertilization to the blastocyst stage (approximately 100 
cells), but at this point they must implant in the uterus 
in order for embryogenesis to proceed normally. For 
this reason, the ability to produce live young, or even 
mid-term fetuses, from isolated preimplantation 
embryos depended historically on the development 
of embryo transfer techniques. The first successful 
embryo transfer was performed in 1890 in the rabbit. 
However, the techniques of embryo transfer were not 
perfected and applied to a large number of mammalian 
species until the 1950s and 1960s, when methods for 
the efficient in vitro culture of preimplantation 
embryos were also developed. In 1978, this work cul- 
minated in the first birth of a human from a transferred 
embryo, which had been conceived by in vitro fertili- 
zation. 
In laboratory mice, embryo transfer is usually per-
formed by surgical methods. Mouse embryos at the 
one-cell stage or cleavage stages are transferred to the 
oviduct, while blastocysts, which are ready to implant, 
are transferred to the uterus. The recipient female 
mouse, or ‘foster mother,’ is first made
‘pseudopregnant’ by mating with a male that has been sterilized by
vasectomy, so that her own eggs will not be fertilized 
and cannot compete with the transferred embryos. 
The gestational age of the donor embryos and the 
recipient must be synchronized, with optimal results 
occurring when the embryos are one day more 
advanced than the gestational age of the recipient. 
The recipient female is anesthetized and the oviduct 
or uterus is exposed through a small incision. Under a 
low-power stereomicroscope, the embryos to be 
transferred are loaded in a small volume of liquid 
into a fine glass pipette, which is inserted through a 
small hole in the bursa (the membrane covering the 
ovary and oviduct) into the infundibulum (the open 
end of the oviduct). In the case of a uterine transfer, a 
small hole is made in the side of the uterus with a sharp 
needle, and the transfer pipette is inserted through the 
hole into the uterine lumen. The embryos are then 
expelled and the incision is closed. Under optimal 
conditions, the rate of successful implantation and 
development of the embryos to term can exceed 90%. 

In large animals and in humans, embryo transfer is 
usually performed using a transvaginal approach, in 
which the embryos are inserted through the cervix and 
into the uterus with a catheter. 

The applications of embryo transfer are numer- 
ous and of great importance for basic research in 
genetics and experimental embryology, for animal 
husbandry and genetic manipulation of livestock, 
and for reproductive medicine in humans. Experi- 
ments in which the preimplantation embryo is phys- 
ically manipulated (for example, by microsurgery, 
injection of cells, or cell lineage tracers) depend on 
embryo transfer to determine the effects of the treat- 
ment on the resulting fetus or animal. All of the 
powerful and widely used techniques for genetic 
manipulation (transgenesis) of the mouse and other 
animals involve the introduction of foreign genetic 
material into the preimplantation embryo, either in 
the form of purified DNA injected at the one-cell 
stage, embryonic stem cells introduced at the cleavage 
or blastocyst stages, or nuclei transplanted at the one- 
cell stage. Therefore, embryo transfer is required for 
these genetically manipulated embryos to develop 
into live animals. Rare or valuable strains of mice and 
other mammals are often preserved by embryo freez- 
ing (cryopreservation), and embryo transfer is used to 
revive these strains. In agriculturally important mam- 
mals, in addition to the aforementioned applications, 


embryo transfer has been used for artificial twinning 
(by separating the blastomeres of two-cell embryos) 
and cloning (by nuclear transplantation), and for 
increasing the reproductive yield of valuable donors 
by inducing superovulation and transferring the 
embryos to multiple recipients. In humans, embryo 
transfer has made possible the recent advances in
treatments for infertility, such as in vitro fertilization,
intracytoplasmic sperm injection, and egg donation. 
Another application likely to increase in importance 
in the future is the diagnosis of inherited diseases at 
the preimplantation stage (using DNA isolated from 
one or a few cells), after which embryos selected for 
the absence of disease will be reimplanted. 


See also: Embryonic Stem Cells; Infertility 
An overview of how the nematode Caenorhabditis
elegans embryo develops is given below. Some back- 
ground anatomical information is provided and the 
mechanisms that are involved in specifying the axes 
of the embryo and determining the fate of the first 28 
cells to be born are addressed. A brief discussion on 
how the embryo generates its main tissues and organs 
then follows; and finally how the embryo acquires 
its shape and assembles its muscles is described. An 
effort has been made throughout to explain how our 
understanding of C. elegans development aids our
understanding of other animals. 

Despite the huge evolutionary distance that sepa- 
rates C. elegans from most species, and its lack of the 
anatomical features (limbs, eyes, hair) that identify 
more familiar animals, C. elegans has undeniably illu- 
minated some general principles that govern animal 
development. Readers who doubt that C. elegans is
relevant to humans should realize that a comparison
based on anatomy alone is deceptive. C. elegans,
which has muscles, nerves, skin, and gut, has
provided a wealth of information at another level,
the cellular level. This article aims to convey the
notion that C. elegans has proven a remarkable
model organism for studying the intracellular
machinery that makes development possible, for
example, the machinery that tells the anterior from
the posterior or that generates functional muscles.


Figure 1 Anatomy of the nematode embryo. (A) Nomarski picture of the embryo at mid-embryogenesis; muscles,
neurons, and the gonad cannot be distinguished in this picture. (B) Schematic drawing of a section through the
embryo or the larva showing the positions of the main tissues. (C) Schematic drawing of the young hatchling. The white
scale bar in (A) is 10 µm. In the embryo (A) and the larva (C), anterior is to the left and dorsal is up. The color code
for tissues and organs is the same in (B) and (C); note that only a subset of muscles is drawn and that only the main
nerve is shown (the cell bodies of neurons are not represented).


Anatomy of the Caenorhabditis elegans 
Embryo and Timing of its Development 


The key features that make C. elegans so easy to study 
are its transparency, its extremely rapid rate of devel- 
opment, its simplicity, and the invariance of its division 
pattern. Each individual cell of the embryo can be 
visualized at all times in live specimens, so it is actually 
possible to watch a cell divide or migrate. Embryogenesis
lasts only 14 h at 25 °C, during which time a
fertilized egg becomes a young larva with 558 cells¹ (for
comparison, the fruit fly Drosophila embryo has about
10° cells at the end of embryogenesis). These cells arise
in a reproducible and fixed pattern of cell divisions,
migrations, and fusions. This feature, together with a
great deal of patient work, allowed a group of scientists
led by John Sulston to reconstruct the entire pattern of
cell divisions from the zygote to the adult, which is
now referred to as the C. elegans cell lineage.

¹ 1090 cells are generated during C. elegans embryonic and larval development, of which 131 die by apoptosis and 959
survive. Some of the survivors fuse, so that there are in fact 959 somatic nuclei in adult animals and 558 nuclei in
young hatchlings, but slightly fewer cells.

The anatomy of the late embryo is very simple: it 
comprises two concentric tubes, an inner tube which 
corresponds to the digestive tract (pharynx, intestine, 
and rectum), and an outer tube which corresponds to 
the epidermis (skin). The precursors to the reproduct- 
ive organ, the neurons, and the muscles lie between 
these tubes (Figure 1). There are no appendages or
external organs, so mutants must be identified on
the basis of their gross shape, the appearance of their
internal organs, muscle activity, or the presence
and position of a specific cell.

C. elegans embryogenesis can be conveniently 
divided into three main stages (Figure 2). During 
the first 100 min, five divisions give rise to 28 cells 
(Figure 3). Gastrulation, which corresponds to the
set of cell rearrangements that ultimately gives rise to
the separation between the three germ layers, starts
at the 28-cell stage. The second stage corresponds to
the time of gastrulation, organ formation, and initial
differentiation; it takes 4 h and is accompanied by six
further cycles of cell division. During the final stage,
terminal differentiation and morphogenesis of the
embryo from a ball of cells to a worm-shaped embryo
occur with only very few additional cell divisions.


Figure 2 The main stages of Caenorhabditis elegans embryogenesis. The Nomarski pictures on the right show, from
top to bottom: a zygote after positioning of the two pronuclei at its centre; a two-cell embryo (note that the anterior
AB blastomere is larger than the P1 blastomere); a four-cell embryo (the blastomere names are indicated); a 28-cell
embryo that is initiating gastrulation; an embryo at the beginning of elongation; an embryo at the end of elongation.
Major embryonic events on the left and pictures on the right should be related to the time-scale and cell-number
scale shown in the center. The scales are not linear.


Methods Used to Examine 
Caenorhabditis elegans Embryos 


Microscopy 

Light microscopy using differential interference contrast
optics (Nomarski optics) and the use of an autofluorescent
protein (green fluorescent protein) fused to the protein
of interest play an essential role in analyzing mutant
phenotypes. The development of time-lapse recording
methods, aided in particular by the increasing power of
modern computers, is greatly facilitating the observation
of embryos.


Embryological Methods 

To assess the function of a particular cell in a normal 
embryo, it is possible to eliminate that cell using a laser
microbeam focused through the Nomarski microscope onto
the cell to be killed. It is possible to determine in
which cell a given gene acts by removing the eggshell,
separating the first blastomeres, and reassociating
them in different combinations.
Figure 3 Early embryonic lineage and founder cells. This figure shows the beginning of the embryonic lineage. The
vertical axis shows time, and the horizontal axis shows divisions. The organ or tissue to which each major branch of
the lineage contributes is symbolized by sectored circles, which are roughly proportional to the number of cells
produced. The color code is given on the right. The letters that follow blastomere names normally refer to the axis of
division. For instance, ABal is the left daughter of the anterior daughter of AB (note, however, that AB divides along
the dorso-ventral axis and not the anterior/posterior axis, but that physical constraints push the daughter ABa into an
anterior position, hence its name).
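The naming convention in the Figure 3 caption can be made concrete with a small parser. This is a sketch under stated assumptions: the founder list and the letter-to-axis mapping (a/p, l/r, d/v) follow standard C. elegans usage, but, as the caption says, the letters only normally refer to the division axis, and cells renamed at a division (such as EMS and P2, the daughters of P1) are handled only as listed founders. The function names are hypothetical.

```python
# Founder and early blastomere names; longer names are tried first so that "EMS"
# is not mistaken for "E" followed by suffix letters.
FOUNDERS = ("EMS", "AB", "MS", "P0", "P1", "P2", "P3", "P4", "E", "C", "D")
AXIS = {"a": "anterior", "p": "posterior", "l": "left",
        "r": "right", "d": "dorsal", "v": "ventral"}

def parse_lineage_name(name):
    """Split a name such as 'ABal' into its founder and the list of division axes."""
    for founder in sorted(FOUNDERS, key=len, reverse=True):
        if name.startswith(founder):
            suffix = name[len(founder):]
            break
    else:
        raise ValueError(f"unknown founder in {name!r}")
    try:
        return founder, [AXIS[letter] for letter in suffix]
    except KeyError as err:
        raise ValueError(f"unexpected letter {err.args[0]!r} in {name!r}") from None

def describe(name):
    """Spell out a lineage name, e.g. 'ABal' -> 'left daughter of anterior daughter of AB'."""
    founder, steps = parse_lineage_name(name)
    text = founder
    for step in steps:
        text = f"{step} daughter of {text}"
    return text

print(describe("ABal"))   # left daughter of anterior daughter of AB
```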


Molecular Methods 

The usual range of modern molecular tools, such as 
transgenes and reporter genes, is available to examine 
C. elegans embryos. A powerful tool to assess the 
embryonic function of genes predicted from the 
genome sequence (C. elegans was the first multi- 
cellular eukaryote whose genome was fully se- 
quenced) is called RNA interference. In this method, 
double-stranded RNA specific for a target gene is 
introduced into embryos where it will efficiently and 
specifically inhibit the expression of the endogenous 
target gene, thereby creating a transient knockout of 
that gene. 


Caenorhabditis elegans Embryos Define 
their Anterior/Posterior Axis at the 
One-Cell Stage 


Animal embryos use different strategies to define their anterior/posterior (A/P) axis. In many species the oocyte is already polarized, either along the future A/P axis (e.g., in Drosophila melanogaster) or along a so-called animal/vegetal axis (e.g., in amphibians), which in some respects resembles the A/P axis. In such species, the cytoplasm of the oocyte is not homogeneous, and its composition differs at the two poles. In D. melanogaster, specialized cells called nurse cells deposit mRNAs encoding morphogens² at the future anterior pole of the oocyte; some of these mRNAs are then transported to the posterior pole, while others stay anteriorly. In contrast, the mature C. elegans oocyte is ovoid but has no apparent polarity. The sperm entry point provides the initial cue for A/P polarity. Analysis of the mechanism that transforms this asymmetric cue into A/P polarity has revealed an apparently well-conserved machinery used not only in embryos but also in many different polarized cells.

² A morphogen is a factor (generally a protein) that can induce the formation of different structures depending on its concentration.

As in many species, fertilization triggers intense cytoplasmic movements. Internal cytoplasm flows toward the sperm entry point, while cortical cytoplasm flows in the opposite direction. Meanwhile, the female pronucleus migrates toward the male pronucleus until they meet; the two juxtaposed pronuclei then move back to their final position, slightly off the center of the long embryonic axis, which will become the A/P axis. The first division is asymmetric, giving rise to a large anterior daughter named AB and a smaller posterior daughter named P1.

The cytoplasmic flow, which can be visualized using Nomarski optics, is the manifestation of a complete reorganization of the cytoplasmic content of the zygote. Antibodies against constituents of the early embryo have been used to identify cytoplasmic granules, termed P granules, which may carry germ cell determinants. The P granules are initially distributed uniformly throughout the cytoplasm of the zygote and gradually accumulate at the posterior pole,³ such that they are


³ Drugs that inhibit polymerization of the protein actin prevent the posterior accumulation of P granules, showing that this accumulation depends on the actin cytoskeleton.
Figure 4 The A/P axis is defined by a polarizing mechanism. (A-C) Schematic representation of the major differences observed between wild-type (A), par-3 (B) or par-2 (C) embryos, at the one-cell (top drawings) and two-cell (bottom drawings) stages. In a wild-type embryo, (1) the zygote and then the posterior blastomere are polarized (asymmetric black shading), (2) PAR-3 and PAR-6 proteins are at the anterior cortex (red line) while PAR-1 and PAR-2 proteins are at the posterior cortex (brown line), (3) P granules (red and white dots) are located posteriorly and segregate to the P1 blastomere, and (4) the AB blastomere is larger than the P1 blastomere and divides along the D/V axis. In a par-3 mutant embryo (it would be the same in a par-6 mutant embryo), (1) polarity is abolished, (2) PAR-1 and PAR-2 proteins are found all around the cortex, (3) P granules are uniformly distributed, and (4) the first cleavage is symmetric and generates two blastomeres that divide along the A/P axis. In a par-2 mutant embryo, the situation is similar, except that now (1) PAR-3 and PAR-6 proteins are found all around the cortex and (2) both AB and P1 divide along the D/V axis. Anterior is to the left.


absent from the AB blastomere⁴ after the first division. Thus, four criteria can be used to identify the initial polarity of the zygote: the asymmetric localization of P granules, the unequal size of AB and P1, the fact that AB divides along the future dorso/ventral (D/V) axis while P1 divides along the A/P axis, and the different fates⁵ of the AB and P1 progenies (Figure 4A).

A breakthrough in the understanding of how the zygote acquires its A/P polarity came with the isolation of mutations that affect the distribution of P granules. These mutations define six maternal⁶ genes, called par-1 through par-6 (par stands for partitioning

⁴ Cells in early embryos are called blastomeres.

⁵ The fate of a blastomere refers to its pattern of division and the types of cells (for instance, muscle versus neurons) it generates.

⁶ A mutation is classified as maternal when the mother must be homozygous mutant for the embryo to be affected; it affects genes that are expressed in the oocyte prior to fertilization (the gene product is stored in the oocyte as an mRNA or a protein). A mutation is classified as zygotic when the embryo itself must be homozygous mutant to be affected; it corresponds to genes that are expressed in the embryo after the onset of embryonic transcription.


defective). Although the par genes act in a common pathway, their products can be subdivided into at least two groups. One group includes proteins localized at the anterior cortex (PAR-3 and PAR-6; see Figure 4B), the other includes proteins localized at the posterior cortex (PAR-1 and PAR-2; see Figure 4C). Genetic analysis has shown that PAR-1 is the final effector of this pathway. The nature of the PAR proteins suggests that they act in a signaling process. In particular, PAR-1 and PAR-4 have protein kinase⁷ domains, whereas PAR-3, PAR-6, and PAR-2 have protein-protein interaction modules, suggesting that they could position or tether other proteins in specific places.

It is thought that PAR proteins interpret the polarity cue provided by the sperm, which contributes the centrioles that organize the zygote's centrosomes. One model, supported by the involvement of actin, proposes that PAR proteins act by mediating local changes in the cytoskeleton. How they do so, and what their immediate targets are, remains unknown. Whatever the mechanism, their activity is required to localize several cell fate determinants


⁷ Protein kinases add a phosphate group to certain serine/threonine or tyrosine residues of other proteins, and by doing so modify their activity or their subcellular localization.
(including P granules) to the appropriate blastomeres. The demonstration that Drosophila homologs of par-1 and par-3 play an active role in determining the polarity of the oocyte and of epithelial cells (see below for a description of epithelial cells) suggests that the par genes are part of an ancient mechanism used to polarize cells.


Cell-Cell Interactions Define the Dorso-Ventral and the Left-Right Axes


Remarkable as it is, fertilization is only the beginning of development. Subsequently, all animal embryos undergo rapid cleavage divisions that generate ever smaller cells with ever more restricted potentials, generally with no increase in embryonic volume. In all species, the initial egg cytoplasm is partitioned unequally among these cells, and/or the cells are modified, either through signals sent by one group of cells to their neighbors, as occurs in amphibian and fish embryos, or through the action of localized transcription factors, as occurs in the Drosophila embryo. In C. elegans, the first 28 blastomeres present at the onset of gastrulation acquire distinct fates through both cell-cell interactions and localized transcription factors. The dorso-ventral and left-right axes are specified during this early stage of embryogenesis through strategies that differ from those used in insects and vertebrates.

As described above, the first division, along the A/P axis, generates two cells with different potentials that divide perpendicular to each other. The axis of AB division defines the D/V axis. Owing to physical constraints imposed by the eggshell, the ventral daughter of AB becomes positioned anteriorly to the dorsal daughter (hence their names, ABa and ABp). The ABa and ABp blastomeres subsequently divide along an axis that is neither the A/P nor the D/V axis and that defines the L/R axis. Unlike the first division, the AB division and then the ABa/ABp divisions are symmetric, resulting in daughters that initially have equal potentials. For this reason, and because the axes are defined when the embryo contains very few cells, establishing each axis becomes a matter of generating a difference between initially equivalent cells.


Dorso-Ventral Axis and Left-Right Axis 

The eggshell causes the ABp blastomere, but not the ABa blastomere, to come into direct contact with the P2 blastomere. A signaling cascade between P2 and ABp sets the D/V axis by instructing ABp to become different from its sister ABa. This is achieved by a ligand (encoded by the gene apx-1) expressed at the surface of P2 that interacts with a receptor (encoded by the gene glp-1) present in both ABa and ABp (Figure 5A). As the APX-1 ligand is not diffusible, only ABp can receive it. In apx-1 or glp-1 mutant embryos, ABp is not instructed and generates the cells and tissues normally generated by ABa (Figure 5C).

The L/R axis is also specified through cell-cell interactions, which occur when the embryo reaches the 12-cell stage. At that time, the MS blastomere is in contact with only a subset of ABa descendants, namely ABalp and ABara, and not with their left/right counterparts ABarp and ABala. In this case, a signaling cascade that again involves the gene glp-1, but with a ligand of unknown nature, sets the L/R axis by instructing ABara to become different from ABala, and ABalp to become different from ABarp.


Generation of Cell Diversity until 
Gastrulation by Polarization 

Starting with the division of the EMS, ABal, ABar, ABpl, and ABpr blastomeres (Figure 3), all cells divide along the A/P axis until gastrulation starts. During this time, a common mechanism is used repeatedly to make the anterior daughter different from its posterior sister. This process involves the phosphorylation of a transcription factor that accumulates in an active form only in the nucleus of the anterior daughter. The transcription factor is encoded by the maternal gene pop-1, while the kinase that phosphorylates the POP-1 protein is the product of the maternal gene lit-1 (a MAP kinase-related protein). In pop-1 mutant embryos, the anterior daughters adopt the fates of their posterior sisters. Conversely, in lit-1 mutant embryos the posterior daughters adopt the fates of their anterior sisters, implying that lit-1 is a negative regulator of pop-1 in the posterior daughter. It is not yet known whether POP-1 phosphorylation prevents its accumulation, reduces its stability, or inhibits its activity in the posterior daughter.

The process used to initiate the polarization of these anterior/posterior divisions is well understood in the EMS lineage (see Figure 3). Blastomere reassociation experiments, together with the isolation of specific mutations, have demonstrated that the EMS blastomere can generate intestinal cells only if it contacts the P2 blastomere during a specific time window in its cell cycle. In the absence of such contact, the posterior daughter of EMS (the E blastomere) fails to generate intestinal cells and adopts the MS fate. During this contact, a signaling pathway⁸ polarizes EMS, ultimately inducing the asymmetric localization of the POP-1 protein in the nucleus of MS, which is the

⁸ This pathway involves all the classical components of a Wnt pathway, including a Wnt signal (encoded by the gene mom-2), a Frizzled-type receptor (encoded by the gene mom-5), a GSK-3 kinase (encoded by the gene sgg-1), a β-catenin homolog (encoded by the gene wrm-1), and finally a TCF/LEF transcription factor (encoded by the gene pop-1).
Figure 5 Defining the D/V axis and generating cell diversity in the early embryo. (A) At the four-cell stage, due to the activity of the par genes, the EMS and P2 blastomeres are different from each other and both are different from the ABa and ABp blastomeres; however, ABa and ABp are initially equivalent. (B, upper) In wild-type embryos, the APX-1 ligand in P2 instructs ABp to become different from ABa, while the MOM-2 ligand in P2 polarizes EMS (asymmetric red shading). (B, lower) In turn, the polarization of EMS is interpreted when it divides, such that the POP-1 protein becomes localized mainly in the nucleus of the MS blastomere; this allows the E blastomere to generate the intestine. (C) In apx-1 mutant embryos, ABa and ABp remain identical and express the ‘ABa fate.’ (D) In mom-2 mutant embryos, EMS is not polarized at the four-cell stage, hence MS and E both inherit nuclear POP-1, which causes them to be identical and to express the ‘MS fate’ (the intestine is not made).


anterior daughter of EMS (Figure 5B). The ligand is expressed in the P2 blastomere, while its receptor is expressed in the EMS blastomere. In embryos lacking the ligand or its receptor, the posterior daughter E adopts the fate of its anterior sister MS (Figure 5D).

How POP-1 activity in the anterior daughter ultimately contributes to generating cell fate diversity is beginning to be understood for the EMS lineage. Genetic and molecular analysis has shown that the maternal gene skn-1 encodes a transcription factor that is present and essential in the EMS, MS, and E blastomeres. Hence, at least two transcription factors, POP-1 and SKN-1, are active in MS, where they contribute to specifying the ‘MS fate,’ whereas only one, SKN-1, is active in E, where it activates the endoderm specification program. Presumably, POP-1 together with other transcription factors can similarly specify unique fates among the first 28 cells of the pregastrulation embryo. Genetic analysis in C. elegans has thus uncovered an entirely new mechanism to generate cell diversity, which has now also been shown to exist in vertebrates.
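The combinatorial logic described above, together with the mutant phenotypes given earlier for pop-1, lit-1, and mom-2, can be condensed into a toy truth table. This is a deliberately simplified sketch rather than a published model: it assumes that SKN-1 is present in both EMS daughters, treats POP-1 activity as all-or-none, and assumes that both the MOM-2 signal and LIT-1 are needed to lower POP-1 activity in the posterior daughter. The function names are hypothetical.

```python
# Toy model of the two EMS daughters (MS anterior, E posterior).
# Assumptions, for illustration only: SKN-1 is present in both daughters;
# POP-1 activity is all-or-none; the MOM-2 signal and LIT-1 are both needed
# to lower POP-1 activity in the posterior daughter.


def daughter_fate(pop1_active: bool) -> str:
    """SKN-1 + active POP-1 -> 'MS'; SKN-1 alone -> 'E' (intestine)."""
    return "MS" if pop1_active else "E"


def ems_daughter_fates(mom2: bool = True, pop1: bool = True,
                       lit1: bool = True) -> tuple[str, str]:
    """Return (anterior, posterior) fates; False means that gene is mutant."""
    # EMS is polarized only if it receives the MOM-2 signal from P2, and the
    # posterior daughter needs LIT-1 to lower its POP-1 activity.
    posterior_pop1_lowered = mom2 and lit1
    anterior_active = pop1
    posterior_active = pop1 and not posterior_pop1_lowered
    return daughter_fate(anterior_active), daughter_fate(posterior_active)


print(ems_daughter_fates())            # ('MS', 'E')   wild type
print(ems_daughter_fates(mom2=False))  # ('MS', 'MS')  no intestine
print(ems_daughter_fates(pop1=False))  # ('E', 'E')
print(ems_daughter_fates(lit1=False))  # ('MS', 'MS')
```

The four calls reproduce the phenotypes stated in the text: in pop-1 mutants the anterior daughter takes the posterior fate, while in lit-1 and mom-2 mutants the posterior daughter takes the anterior fate.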


Totipotency of the Germline is Preserved 
by Repressing Gene Expression 


The germline lineage needs to prevent premature differentiation to preserve its totipotency. Historically, nematodes were important in recognizing the special nature of the germline.⁹ A major contribution of C. elegans to biology has been to show that the germline is set aside by repressing gene expression. The most compelling evidence has been provided by the isolation of maternal-effect mutations that lead to

⁹ Theodor Boveri, a German embryologist working more than 100 years ago, was a pioneer in the study of chromosomes. He discovered that in the nematode Parascaris equorum, chromosomes become fragmented during embryogenesis in somatic tissues but not in germ cells. During this phenomenon, which has been called chromatin diminution, different somatic cells inherit different pieces of chromosomes. For some time, chromatin diminution provided a plausible model for terminal differentiation. We now know that in most species, including C. elegans, all cells inherit the same set of chromosomes.
the absence of the germline, either because the germline lineage becomes transformed into a somatic lineage or because germ cells die. The products of the genes defined by these mutations are associated with P granules or the germline. They act in germline precursors to lock the chromatin in a repressed state (mes-2 and mes-6¹⁰), to repress gene expression at the transcriptional level (pie-1), or to repress it at the translational level (pos-1, mex-1, and mex-3). By doing so, they probably prevent premature differentiation of germline precursors. Further evidence in support of the idea that germline precursors are transcriptionally inactive comes from the use of specific monoclonal antibodies that distinguish the active from the inactive pool of RNA polymerase II, the major enzyme involved in gene transcription. In C. elegans embryos, as well as in Drosophila embryos, it has been shown that germline precursors contain only the inactive form of RNA polymerase II.


Mechanisms Used to Generate Tissues 
and Organs 


Zygotic genes that control the formation of several tissues and organs once gastrulation has been initiated have been characterized. These genes fall into two categories: those that specify organ/tissue identity and those that control organ/tissue differentiation. Genes that specify the identity of the intestine, the pharynx, and the epidermis¹¹ share the following characteristics: (1) when they are inactivated, the precursor cells from which the organ or tissue is derived adopt another fate, leading to the absence of the organ/tissue primordium; (2) they are expressed very soon after the onset of gastrulation (the intestine identity gene is even expressed prior to gastrulation); and (3) they can reprogram other cells to develop as if they were part of the organ/tissue. In other words, these ‘identity genes’ confer the potential to form the intestine, pharynx, or epidermis on a group of cells when gastrulation starts. Interestingly, homologous genes that play similar roles have been described in flies and vertebrates. Therefore, it appears that, despite very different strategies for the early steps of embryogenesis, which reflect the necessity to adapt to very different environments, the genetic control of organ/tissue formation has been conserved during evolution and may be very ancient.

¹⁰ The genes mes-2 and mes-6 encode Polycomb-like proteins. In Drosophila, Polycomb is known to negatively regulate gene transcription by binding to chromatin, and it is involved in keeping certain genes of the Hox complex silent in the appropriate segments of the embryo.

¹¹ The names of these genes are end-1 for the intestine, pha-4 for the pharynx, and elt-1 for the epidermis. They encode transcription factors.

Genes that are important for the differentiation of 
organs and tissues differ from organ/tissue ‘identity 
genes’ in two respects. First, their inactivation does 
not lead to the absence of the organ/tissue primor- 
dium but only to the abnormal differentiation of 
cells within the organ/tissue. Second, they are 
expressed slightly later than ‘identity genes’ and their 
expression depends on the latter. These genes are more 
numerous than ‘identity genes,’ and probably act together with them to activate all or a subset of the terminal differentiation genes in the organ/tissue (e.g., genes encoding specific muscle proteins). Further
work in C. elegans should help in understanding the 
cellular and genetic steps that are essential to build 
organs and tissues. 


Caenorhabditis elegans Embryo as a 
Model System in Cell Biology 


C. elegans is particularly well suited to help analyze 
some cellular processes that are not necessarily speci- 
fic to embryogenesis. Three of these will be briefly 
mentioned: the mechanics of cell division, the biology 
of epithelial cells, and the assembly of muscles. 


The First Cell Cycle 

As described above, fertilization of the oocyte induces completion of female meiosis, migration of the female pronucleus toward the male pronucleus, movement of both pronuclei to the center of the embryo, spindle assembly, chromosome separation, and finally cytokinesis. These events can easily be monitored in live embryos because the zygote is a very large cell. Genetic analysis has shown that each of these steps can be affected individually. The C. elegans embryo should therefore provide an invaluable system with which to analyze processes that take place specifically during the first embryonic cell cycle (e.g., pronuclear migration) as well as those common to all cell divisions (e.g., spindle assembly).


Biology of Epithelial Cells

Epithelial cells are polarized and characterized by two 
main membrane domains, the apical surface facing 
the external environment and the basolateral surface 
facing the inside of the animal. Among other roles, 
they are essential to shape organs and tissues. A typical 
example is wound healing, which relies primarily on 
epidermal cells (which are epithelial) changing their 
shapes to extend over the wounded area. Genetic analysis in C. elegans has identified several genes that are important in controlling cell shape changes. During the second half of C. elegans embryogenesis, epidermal cells, which initially have a square shape, stretch out along the A/P axis and narrow along the D/V axis, resulting in a constriction of the internal contents of the embryo and its elongation along the A/P axis (see Figure 2). Contraction of the actin cytoskeleton within epidermal cells provides the driving force for this dramatic cell shape change. As in vertebrate epithelial cells, actin is anchored to specialized junctions that separate the apical surface from the basolateral surface via a complex of proteins known as α-catenin, β-catenin, and cadherin (encoded by the genes hmp-1, hmp-2, and hmr-1, respectively). Mutations affecting the catenin/cadherin complex disrupt actin anchoring and prevent elongation of C. elegans embryos. Genes that regulate actin contraction during elongation are also known (for instance, mutations in the gene let-502 reduce the extent of elongation).


Assembly of Muscle Fibers 

Knowledge of muscle function and assembly is par- 
ticularly detailed in C. elegans. C. elegans muscle 
sarcomeres, which assemble during the second half 
of embryogenesis, are in most respects very similar 
to those observed in other species, except that muscle 
cells do not fuse. They include alternating thick fila- 
ments, which contain myosin, and thin filaments, 
which contain actin, tropomyosin, and troponin. 
Many genes encoding proteins required for sarcomere assembly, and most if not all of those encoding structural sarcomeric components, have been identified, generally by genetic analysis. In vertebrates, skeletal muscles are anchored to bones; in C. elegans, they are anchored to the cuticle, which is secreted by epidermal cells at their apical surface and acts as an external
skeleton (or exoskeleton). Mutations in genes encod- 
ing muscle-anchoring components lead to embryonic 
lethality, prevent full embryonic elongation (see 
above), and often lead to muscle integrity defects. 
Detailed analysis of these mutations suggests that sar- 
comeres are first assembled around a structure called 
the dense body in muscles (Figure 6), which itself 
attaches to a network of proteins in the space separat- 
ing muscles from the underlying epidermis (the extra- 
cellular matrix). This network in turn is anchored to 
the cuticle through other proteins, some of which 
remain to be identified. 


Conclusion 


For a long time, nematodes were thought to be unique 
among animal species owing to their invariant lineage. 
Indeed, early blastomeres have a fixed fate in C. ele- 
gans, implying that if a blastomere is ablated the tissues 
that should normally be generated by this blastomere 
will be missing. In contrast, in many other species early 
Figure 6 How the muscle is attached. Sarcomeres are formed by alternating thin (gray lines) and thick filaments (red lines). Within muscle cells the anchoring structure is called the dense body and consists of a complex formed by several proteins (vinculin, talin, and α-actinin), which interact with actin within thin filaments, and an integrin dimer (dark pink/pale pink pair; genes pat-2 and pat-3) at the muscle membrane (some additional attachment is provided through so-called ‘M lines’ at the center of thick filaments). The integrin itself recognizes a protein from the extracellular matrix called perlecan (intertwined gray lines; gene unc-52). Within epidermal cells the anchoring structures are called fibrous organelles: they are made in part by a long transmembrane protein called myotactin (gray; gene let-805) that extends toward muscles and in turn contacts one or more proteins that probably run across the epidermal cytoplasm from the basal membrane to the cuticle. It is not yet known what provides attachment to the cuticle.


blastomeres do not have a fixed fate, such that if one is 
ablated cell-cell regulatory mechanisms compensate 
for this loss. Embryonic development in C. elegans is 
said to be ‘mosaic,’ whereas in other species it is said to 
be ‘regulative.’ Mosaic development was long thought 
to be strictly under the control of lineage-dependent 
transcription-based mechanisms. It is now clear that 
the invariance of C. elegans lineage does not preclude 
the existence of cell-cell interactions mediated by a 
ligand and its receptor (see above); in parallel it 
appears that classical models with a regulative mode 
of development also use transcriptional control. 
Furthermore, more primitive nematodes, which are generally marine, do not have a fixed lineage. Therefore, nematodes develop like other animal species, and their further study will be relevant to the understanding of human biology, particularly, as repeatedly emphasized throughout this article, for the analysis of cellular processes.
Early Embryonic Development Is Highly Plastic


For the purposes of scientific analysis, mammalian 
development is divided into two distinct stages of 
unequal length that are separated by the moment of 
implantation into the uterus. During the preimplantation phase, which lasts 4.5 days in the mouse, the embryo is a free-floating object within the mother’s body. Because it is naturally free-floating, the preimplantation embryo can easily be removed from its mother’s body and cultured in a petri dish, where it can undergo genetic manipulation before being placed back into a female, where it can continue along the developmental path to a newborn animal. Once the embryo has undergone
implantation, it can no longer be removed from its 
mother’s body and remain viable. The accessibility of 
the preimplantation embryo provides the basis for 
a number of specialized genetic tools that are used 
to study mammalian development, including the 
production of transgenic animals and targeted muta- 
genesis. 

The preimplantation phase starts with the zygote 
(the one-cell fertilized egg or embryo) at the time of 
conception. Development begins slowly with the first 
22 h devoted to the expansion of the highly compacted 
sperm head into a paternal pronucleus that matches 
the size of the original egg (maternal) pronucleus. 
Once this process is completed, the embryo undergoes the first of four equal divisions, or cleavages, which over a period of 60 h increase the number of cells from one to 16 (see Figure 1).
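The cell counts in this entry follow from simple doubling: four equal cleavages take one cell to 2⁴ = 16, and the two further rounds of division that occur in the blastocyst give 16 × 2² = 64, the number quoted for the blastocyst. A minimal sketch, assuming perfectly synchronous divisions and treating 60 h / 4 as an average cycle length rather than a measured one (the function name is hypothetical):

```python
# Cell numbers under synchronous, equal cleavage: every round doubles the count.
# Assumption: all cells divide in each round; the 60 h figure is the one quoted
# in the text, spread evenly over the four cleavages as a simple average.

def cells_after(divisions: int, start: int = 1) -> int:
    """Number of cells after `divisions` synchronous rounds of division."""
    if divisions < 0:
        raise ValueError("divisions must be non-negative")
    return start * 2 ** divisions


CLEAVAGES = 4
CLEAVAGE_PERIOD_H = 60

after_cleavage = cells_after(CLEAVAGES)
print(after_cleavage)                        # 16
print(CLEAVAGE_PERIOD_H / CLEAVAGES)         # 15.0 (average hours per cleavage)
print(cells_after(2, start=after_cleavage))  # 64 (two further rounds, blastocyst)
```

Real embryos do not divide perfectly synchronously, so intermediate cell numbers (e.g., 3 or 5 cells) are observed; the sketch only recovers the round numbers used in the text.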

Throughout this period, known as the cleavage 
stage, all of the cells in the developing embryo are 
equivalent and totipotent. The word totipotent is 
used to describe a cell that has not yet undergone 
differentiation, and still retains the ability, or potency, 
to produce every cell type present in the developing 
embryo and adult animal. The cleavage stage mam- 
malian embryo is also called a morula. 

As a consequence of totipotency, cleavage stage 
embryos can be broken into smaller groups of cells 
that each have the potential to develop into individual 
animals. The outcome of this process can be observed 
in humans with the birth of identical twins or, much 
more rarely, identical triplets. In the laboratory, scien- 
tists have obtained completely normal mice from 
Figure 1 Preimplantation development.


individual cells that were dissected out of the four- 
cell-stage mouse embryo and placed back individually 
into the female reproductive tract. This experimental feat demonstrates that it is possible, at least in principle, to obtain four identical clones from a single mammalian embryo.

It is important to contrast the early developmental 
program of all placental mammals with that of other 
animals including the two model organisms Caenor- 
habditis elegans and Drosophila melanogaster. Identi- 
cal twins can never be obtained from a single 
nematode or fly embryo. During nematode develop- 
ment, individual embryonic cells from the two-cell 
stage onward are highly restricted in their develop- 
mental potential or ‘fate.’ The fly egg is polarized 
even before it is fertilized and different cytoplasmic 
regions are devoted to supporting different develop- 
mental programs within the nuclei that end up in these 
locations. Thus, half a nematode embryo or half a fly embryo could never give rise to a whole animal.


Embryonic Differentiation and 
Postimplantation Development 


During the 16-cell stage of mammalian embryogen- 
esis, the first differentiative event occurs, and the 
developmental potency of individual cells finally 
becomes restricted. The cells on the outside of the 
embryo turn into a trophectoderm layer that will even- 
tually take part in the formation of the placenta. Mean- 
while, the cells on the inside compact into a small clump 
that remains attached to one spot along the inside of the 
trophectoderm sphere. This clump of cells is called, 
appropriately enough, the inner cell mass (ICM). The 
fetus will develop entirely from the ICM. At this stage 
of development, the embryo is called a blastocyst. 


Two more rounds of cell division occur during the 
blastocyst stage before the embryo implants. 

Throughout the process of normal preimplantation 
development, the embryo remains protected within 
the inert zona pellucida. Thus, there is no difference 
in size between the one-cell zygote and the 64-cell 
blastocyst. To accomplish implantation, the blastocyst 
must first ‘hatch’ from the zona pellucida, so that it 
can make direct membrane-to-membrane contact 
with the cells in the uterine wall. Implantation initiates the development of the placenta, which is a mixture of embryonic and maternal tissue that mediates the flow of nutrients in one direction, and of waste products in the other, between the mother and embryo. The placenta maintains this intimate connection between mother and fetus until the time of birth. Internal uterine development is a characteristic of all mammals other than the egg-laying monotremes, such as the platypus.

With the development of the placenta, a period of 
rapid embryonic growth begins. Cells from the ICM 
differentiate into all three germ layers (endoderm, 
ectoderm, and mesoderm) during a stage known as 
gastrulation. The foundation of the spinal cord is put 
into place, and the development of the various tissues 
and organs of the adult animal is initiated. With the 
appearance of organs, the embryo is now called a fetus. The fetus continues to grow rapidly in size, and in the mouse birth occurs at ~21 days after conception.
Newborn mice remain dependent upon their mothers 
during a suckling period which can last another 18 to 
25 days. By 5 to 6 weeks after birth, mice have reached 
adulthood and are ready to begin the reproductive 
cycle all over again. 


See also: Developmental Genetics; Embryonic 
Stem Cells 
Present naturally in the very early mammalian embryo, embryonic stem (ES) cells are members of a special class of cells that have the potential to differentiate into every cell type present in the adult animal. In recent years, scientists have gained the ability to culture and grow ES cells (derived from embryos) in vitro, and also to convert somatic cells into embryonic stem cells. By definition, embryonic stem cells are totipotent, which means that they have the potential to differentiate into every cell type of an animal. Thus, the embryonic stem cell is defined operationally rather than phenotypically. The only way in which an embryonic stem cell can be identified is through the generation of a complete animal from a single cell by the normal process of development.


See also: Cell Lineage 
End labeling is a technique for adding a radioactively
labeled group to one end (5’ or 3’) of a DNA strand. 


See also: Autoradiography 


Endoderm 


See: Developmental Genetics 
The endonucleases are a group of enzymes that cleave
nucleic acids at positions within the chain. Some act 
on both RNA and DNA (e.g., S1 nuclease, specific 
for single-stranded molecules). Ribonucleases (e.g., 
pancreatic, T1, etc.) are specific for RNA, and de- 
oxyribonucleases for DNA. Bacterial restriction 
endonucleases are important in recombinant DNA 
technology for their ability to cleave double-stranded 
DNA at highly specific sites. 
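As a rough illustration of cleavage at highly specific sites, the sketch below simulates a restriction digest of a short, made-up linear sequence. It assumes the recognition site GAATTC of EcoRI, which cuts between G and A (G^AATTC); the function name `digest` and the example sequence are hypothetical, and only the top strand is modeled, which is adequate for a palindromic site like this one.

```python
def digest(seq: str, site: str = "GAATTC", cut_offset: int = 1) -> list[str]:
    """Cut a linear DNA sequence (top strand only) at every occurrence of
    `site`, placing the cut `cut_offset` bases into the site.
    The defaults model EcoRI, which cuts G^AATTC."""
    seq = seq.upper()
    cuts = []
    hit = seq.find(site)
    while hit != -1:
        cuts.append(hit + cut_offset)
        hit = seq.find(site, hit + 1)  # keep scanning past this site

    # Slice the sequence between consecutive cut positions.
    fragments = []
    previous = 0
    for cut in cuts:
        fragments.append(seq[previous:cut])
        previous = cut
    fragments.append(seq[previous:])
    return fragments


example = "TTGAATTCAAAGGGGAATTCCC"  # hypothetical sequence with two EcoRI sites
fragments = digest(example)
print(fragments)                    # ['TTG', 'AATTCAAAGGGG', 'AATTCCC']
print([len(f) for f in fragments])  # [3, 12, 7]
```

The list of fragment lengths is what gel electrophoresis resolves; a sequence lacking the site is returned uncut as a single fragment.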


See also: Nuclease; Restriction Endonuclease 
End-product inhibition is the process whereby a pro-
duct of a metabolic pathway inhibits the activity of an 
enzyme that catalyzes an early step in the pathway. 


See also: Enzymes 
Enhancers are operationally defined as cis-acting ele-
ments that augment the activity of a promoter in an 
orientation- and position-independent manner. Ini- 
tially identified in a sea urchin histone gene and in 
the simian virus 40 (SV40) viral genome as regulatory 
elements that increase transcription from a promoter 
located at a distant position on the same DNA molecule,
transcriptional enhancers are found in most
eukaryotic genes transcribed by RNA polymerase II. 
In lower eukaryotes such as yeast, upstream activator 
sequences (UAS) can also function at variable dis- 
tances from the promoter and in either orientation. 
UAS are thus analogous to enhancers of higher eukary- 
otes, although they differ from enhancers in their 
inability to activate a promoter from downstream 
positions. In bacteria, a simple enhancer has been 
identified upstream of a promoter that is recognized 
by a specific form of RNA polymerase containing 
sigma factor 54. 

Transcriptional enhancers of higher eukaryotes are 
typically composed of multiple modules that cooper- 
ate to augment gene expression. The modules con- 
sist either of binding sites for individual transcription 
factors or of composite binding sites for different 
transcription factors. The multiplicity and modularity 
of transcription factor binding sites in enhancers allow 
for combinatorial control and functional diversity. 
In addition, interactions between multiple enhancer- 
binding proteins can help to increase the accuracy of 
DNA sequence recognition in a large and complex 
genome. 

The modularity of enhancers has been demon- 
strated by experiments in which individual modules 
of an enhancer have been multimerized to generate 
synthetic enhancers. Synthetic enhancers typically 
augment gene expression. However, they do not 
reproduce all regulatory properties of natural enhan- 
cers such as cell type specificity or inducibility. In- 
sight into the modularity and functionality of natural 
enhancers is provided by experiments examining 
interactions between transcription factors bound at 
different modules. At some natural enhancers, multi- 
ple transcription factors have been shown to interact 
with each other, resulting in the assembly of a higher- 
order nucleoprotein complex, termed the ‘enhan- 
ceosome.’ The assembly of such complexes can be 
facilitated by architectural proteins that have no
activation potential by themselves but augment
interactions between other enhancer-binding proteins
and/or bend the DNA helix. 

The mechanisms by which enhancers regulate tran- 
scription appear to be diverse. Transcriptional run- 
on experiments have shown that the SV40 enhancer 
increases the rate of transcription initiation from a 
linked promoter, suggesting that enhancers can regulate
the recruitment and/or activity of RNA polymerase. This effect of enhancers appears to involve
contacts between enhancer- and promoter-bound pro- 
teins in which the intervening DNA is looped out. 
Interactions between enhancer- and promoter-bound 
proteins by DNA looping have been visualized by 
electron microscopy. In support of a looping model, 
enhancers can activate promoters that are located on 
physically linked, but topologically uncoupled DNA 
molecules. In the simplest form of activation, the 
enhancer-binding protein of bacteria, Ntr-C, contacts 
the sigma-factor-54-containing RNA polymerase and 
induces the formation of an open complex in an ATP- 
dependent manner. In an analogous manner, activators 
bound at a eukaryotic enhancer may activate RNA 
polymerase via interactions with the polymerase- 
associated mediator complex or they may augment 
the recruitment of RNA polymerase by interactions 
with general transcription factors bound at the core 
promoter. 

A second mechanism of enhancer function involves 
alterations of chromatin structure. Enhancers have 
been found to increase the accessibility of sequences 
in the context of chromatin. These chromatin alter- 
ations can be detected by an increased sensitivity of 
chromatin toward digestion with deoxyribonuclease 
I (DNase I) and by an increased accessibility of adja- 
cent binding sites for transcription factors or restric- 
tion enzymes. 

Although enhancers of higher eukaryotes are typ- 
ically defined by their potential to activate promoters 
in tissue-culture transfection assays, enhancers alone 
are often inefficient in activating promoters in trans- 
genic mice. Enhancers are also found as components of 
large and complex regulatory regions, known as locus 
control regions, which confer activation upon linked 
promoters independent of the chromosomal position 
in transgenic mice. In locus control regions, enhancers 
act in combination with other less-defined regulatory 
elements and regulate the activity of multiple genes that 
are located within a domain of a chromosome. These 
chromosomal domains represent structural entities 
that display increased DNase I sensitivity. 

Enhancers can typically act on heterologous pro- 
moters. However, the interactions between enhancer 
and promoter can display specificity. In addition, the 
interactions between enhancers and promoters can be 
regulated by insulators, which act as boundaries of
chromosomal domains. An insulator that is placed
between an enhancer and a promoter blocks the interactions
between these elements. In this way, insulators
help to impart promoter specificity in complex gene 
loci in which multiple promoters are located in the 
vicinity of an enhancer. 


See also: Chromatin; Cis-Acting Proteins; 
Promoters 
Enzymes are catalysts that accelerate the rate of
chemical reactions without permanent alteration to 
themselves. Virtually all enzymes are proteins or con- 
jugated proteins, although some catalytically active 
RNAs have been identified. These catalytic/en- 
zymatic activities are essential to the information and 
energy management requirements of a cell. Specific 
enzymatic activities are found within all cellular 
organelles. In comparison with classical catalysts of
chemical reactions, enzymes are characterized by: (1)
higher reaction rates; (2) effectiveness under milder 
reaction conditions in terms of temperature, pressure, 
and pH; (3) greater reaction specificity in terms of the 
reactants, products, and the absence of undesirable 
side reactions; and (4) ability to be regulated either 
by reaction rate control, by catalyst concentration, or 
by specific small molecules. 

A system of classification and nomenclature for 
enzymes has been established by the International 
Union of Biochemistry. This system places all en- 
zymes into one of six major classes based on the 
type of reaction catalyzed. Each enzyme is uniquely 
identified by a four-part Enzyme Commission (EC) number. In practice, this
system is often supplanted by trivial nomenclature that
attempts to give some information concerning the 
reactants (substrates) involved and the type of reaction 
catalyzed. Such names usually end with the suffix 
‘ase.’ For example, histidine decarboxylase removes 
CO₂ from histidine to form histamine.
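For illustration, a minimal sketch of how an EC number breaks down: four integers separated by periods, the first naming one of the six classes of the scheme described above. The class names are the standard IUBMB ones; the function `parse_ec` is hypothetical, and it assumes a complete number (partial entries written with '-' are rejected).

```python
# Top-level classes of the six-class scheme described in the text.
# (The IUBMB added a seventh class, translocases, in 2018; it is omitted here.)
EC_CLASSES = {
    1: "oxidoreductases",
    2: "transferases",
    3: "hydrolases",
    4: "lyases",
    5: "isomerases",
    6: "ligases",
}


def parse_ec(ec_number: str) -> tuple[tuple[int, ...], str]:
    """Split a complete EC number such as 'EC 4.1.1.22' into its four
    integer parts and return them with the name of the top-level class."""
    parts = ec_number.strip().removeprefix("EC").strip().split(".")
    if len(parts) != 4:
        raise ValueError(f"expected four numbers, got {ec_number!r}")
    numbers = tuple(int(part) for part in parts)  # raises ValueError on '4.1.-.-'
    if numbers[0] not in EC_CLASSES:
        raise ValueError(f"unknown top-level class in {ec_number!r}")
    return numbers, EC_CLASSES[numbers[0]]


# Histidine decarboxylase, the example in the text, is EC 4.1.1.22.
print(parse_ec("EC 4.1.1.22"))  # ((4, 1, 1, 22), 'lyases')
```

Requires Python 3.9 or later for `str.removeprefix`.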

The active site of an enzyme (sometimes referred to 
as the catalytic center) is that portion of the molecule 
that interacts with substrate and converts it into pro- 
duct. The initial step is the formation of an enzyme- 
substrate complex (ES). Two distinct models of how 
an enzyme binds its substrate have been proposed: the 
lock-and-key (complementary) model of Fischer and 
the induced fit (conformational change) model of 
Koshland. These models represent extreme cases. Different
enzymes show features of both models. Amino
acid side chain residues at the enzyme’s active site 
interact chemically or physically with substrate to 
lower the energy required for the reaction to occur at 
physiological temperatures. Substrate specificity is 
determined by the chemical properties and spatial 
arrangement of the amino acid residues forming 
the active site of an enzyme. The restriction endo- 
nucleases illustrate enzyme substrate specificity. These 
enzymes are responsible for very specific cutting of 
DNA into unique fragments. Separation of these 
fragments provides a ‘fingerprint’ of an individual 
organism’s DNA for unambiguous identification. 
Restriction enzymes play a critical role in the develop- 
ment of the field of biotechnology. 

Some enzymes require the presence of small non- 
protein units (cofactors), either inorganic ions, organic 
molecules, or both. Some organic cofactors (coenzymes) are
derived from vitamins. Coenzymes covalently attached to the enzyme are called prosthetic
groups; coenzymes that undergo chemical
modification during the reaction are called cosubstrates. An enzyme with
its cofactor is called the holoenzyme; without the 
cofactor, the species is called an apoenzyme. 

Isoenzymes (isozymes) are distinct forms of an 
enzyme that catalyze the same reaction but differ in 
physical or kinetic properties. Different isoenzymes 
are usually encoded by different genes and may occur 
in different tissues of an organism. For example, 
human creatine kinase exists as three isozymes that 
predominate in skeletal muscle, heart muscle, and 
brain tissue, respectively. 

Molecules that act directly on an enzyme to reduce 
its catalytic activity are known as inhibitors. Many 
therapeutically useful drugs, pesticides, and herbicides 
are inhibitors of specific enzymes. Inhibitors are clas- 
sified as either reversible or irreversible. The effect 
of reversible inhibitors may be overcome in various 
ways, whereas irreversible inhibitors lead to a state of 
permanent inactivity as they often form stable covalent 
bonds with reactive amino acid residues of the protein. 
Reversible inhibition is further subdivided into com- 
petitive and noncompetitive types. A competitive 
inhibitor is one whose effect is overcome by the addi- 
tion of substrate. Noncompetitive inhibitors engage a 
site other than the catalytic site, causing a conformational
change that alters catalytic activity. This state cannot
be overcome by substrate addition.
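The distinction can be made concrete with the standard textbook Michaelis-Menten rate laws for the two reversible types. These equations are not given in this entry and are assumed here, with "noncompetitive" taken in its pure (simple) sense, in which the inhibitor binds free enzyme and the enzyme-substrate complex equally well; the constants are arbitrary.

```python
def rate_competitive(s: float, i: float, vmax: float, km: float, ki: float) -> float:
    """Competitive inhibitor: the apparent Km rises by (1 + i/ki); Vmax is unchanged."""
    return vmax * s / (km * (1 + i / ki) + s)


def rate_noncompetitive(s: float, i: float, vmax: float, km: float, ki: float) -> float:
    """Pure noncompetitive inhibitor: the apparent Vmax falls by (1 + i/ki); Km is unchanged."""
    return vmax * s / ((1 + i / ki) * (km + s))


# Arbitrary illustrative constants; the inhibitor is present at a concentration equal to Ki.
VMAX, KM, KI, INHIBITOR = 100.0, 1.0, 1.0, 1.0

for substrate in (1.0, 10.0, 1000.0):
    v_comp = rate_competitive(substrate, INHIBITOR, VMAX, KM, KI)
    v_non = rate_noncompetitive(substrate, INHIBITOR, VMAX, KM, KI)
    print(f"[S] = {substrate:>6}: competitive {v_comp:5.1f}, noncompetitive {v_non:5.1f}")
```

As [S] rises, the competitive rate climbs toward Vmax (100 here), whereas the noncompetitive rate levels off near Vmax/(1 + [I]/Ki), i.e. 50: adding substrate overcomes the first type of inhibition but not the second.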

Enzyme-catalyzed reactions are subject to a variety 
of exquisite control mechanisms. These include feedback
inhibition of allosteric enzymes, covalent modifica- 
tion, proteolytic activation, and regulation of protein 
synthesis and breakdown. Feedback inhibition occurs 
in a metabolic pathway when an early enzyme in 
the pathway is inhibited by the pathway end product.
Inhibition of the first step of a pathway conserves 
metabolic energy and prevents the unnecessary accu- 
mulation of metabolites. Since pathway end products 
may have little structural resemblance to the initial sub- 
strate of a pathway, the active site of the initial enzyme 
of a pathway may not bind the metabolic end product. 
Substances that bind at sites other than the substrate- 
binding site and cause a conformational change in the 
enzyme such that the activity is decreased are referred 
to as allosteric (other site) inhibitors. Enzymes that 
exhibit this behavior are called allosteric enzymes and, 
in some cases, can be activated by positive allosteric 
modifiers. Allosteric enzymes are often, but not 
always, multisubunit proteins. 
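A toy simulation can show why inhibiting the first step limits accumulation of the end product. The two-step pathway, the rate constants, and the Hill-type form of the allosteric inhibition are all hypothetical choices for illustration, not measurements of any real pathway; the code integrates the rate equations with a simple Euler step.

```python
def simulate_pathway(ki=None, v1_max=1.0, k2=0.5, k3=0.1, hill=2,
                     dt=0.01, t_end=500.0):
    """Euler integration of a hypothetical pathway: supply -> A -> P -> (used).

    Step 1 (supply -> A) runs at v1_max, scaled down by the end product P
    through Hill-type allosteric inhibition when `ki` is given.
    Step 2 (A -> P) is first order in A; P is consumed at a rate k3 * P.
    Returns the (A, P) concentrations at t_end.
    """
    a = p = 0.0
    for _ in range(int(t_end / dt)):
        if ki is None:
            v1 = v1_max                             # no feedback
        else:
            v1 = v1_max / (1.0 + (p / ki) ** hill)  # end product inhibits step 1
        v2 = k2 * a                                 # A converted to P
        v3 = k3 * p                                 # P used by the cell
        a += (v1 - v2) * dt
        p += (v2 - v3) * dt
    return a, p


a_free, p_free = simulate_pathway()
a_fb, p_fb = simulate_pathway(ki=1.0)
print(f"no feedback:   A = {a_free:.2f}, P = {p_free:.2f}")
print(f"with feedback: A = {a_fb:.2f}, P = {p_fb:.2f}")
```

With these parameters, P settles near 10 without feedback and near 2 with it (the feedback steady state satisfies P^3 + P = 10), and the intermediate A also stays lower, so less material is committed to the pathway when the end product is plentiful.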

Covalent modification of enzymatic activity can be 
either reversible or irreversible. Irreversible modifica- 
tion is illustrated by the partial proteolysis of the 
zymogen, chymotrypsinogen, to form the active 
digestive enzyme chymotrypsin. Reversible covalent 
modifications include phosphorylation, adenylyla- 
tion, and disulfide reduction. Reaction sequences of 
this type serve as a rapid, reversible switch to turn a 
metabolic pathway on or off as required by the cell. 
This is illustrated by the interrelationship between 
kinases and phosphatases. Kinases phosphorylate 
enzymes and have a key role in regulation of meta- 
bolic pathways, cell cycle control, cellular prolifera- 
tion, and in programmed cell death (apoptosis). 
Phosphatases counter the effects of protein kinases 
by removing phosphate, thereby serving as regulators 
of signaling by kinases. Both the kinases and phos- 
phatases are known to be subject to hormonal regu- 
latory control. 

The ultimate control of enzyme activity is at the 
gene level. Since enzymes are proteins, the amount of 
an enzyme in a cell is regulated by factors that control 
gene expression. Such factors include hormones and 
some metabolic pathway end-products. For example, 
in the synthesis of the heme portion of hemoglobin 
excess heme represses at the gene level the synthesis of 
the first enzyme in the heme biosynthetic pathway. 
Enzymes subject to this type of control are usually 
very unstable and have a short lifetime in the cell. 

The emergence of pharmacogenomics, identifica- 
tion of population subgroups that would benefit 
from a particular drug treatment, and toxicogenomics, 
identification of population subgroups that would 
exhibit adverse responses, illustrates the importance 
of understanding the interrelationship between genes 
and their product enzymes. For example, the genes 
that are differentially expressed in people sensitive to 
penicillin have been identified, cloned, and sequenced. 
About 150 genes have been identified as predictors 
of penicillin hypersensitivity. The categories of genes
induced include those associated with ribosomal,
apoptosis-related, energy generation, and cell cycle 
regulatory enzymes but, surprisingly, not those 
enzymes associated with drug metabolism or detoxi- 
fication. In addition to the practical value in terms of 
human health, multidisciplinary investigations of this 
type should provide a better understanding of the 
biological interrelationships within and between cells. 

For substantial overviews of enzyme structure and 
function see Devlin (1997). More detailed informa- 
tion on enzyme structure and mechanism with an 
introduction to the current concepts of protein engin- 
eering can be found in Fersht (1999). For discussion of 
the chemical basis of enzymatic activity Jack Kyte’s 
book (Kyte, 1995) is recommended. 
Cellular activities are modulated in response to diverse
extracellular stimuli from their surrounding envir- 
onment. In multicellular organisms, growth factors 
represent a subset of external cues that program the 
cellular machinery to proliferate, differentiate, or die. 
Soluble growth factor peptide ligands bind to their 
cognate receptors and initiate a cascade of intracellular 
signals that culminate in an appropriate developmen- 
tal response. Epidermal growth factor (EGF), the 
prototypic member of the EGF family of peptide 
growth factors, represents one such extracellular
signal. The EGF family of peptide growth
factors consists of 12 ligands or growth factors which 
can be broadly classified into five groups: 


1. Growth factors that primarily interact with the 
EGF receptor erbB-1: EGF, transforming growth 
factor α (TGF-α); amphiregulin (AR); vaccinia
growth factor (VGF); Shope fibroma growth factor
(SFGF); myxoma virus growth factor (MGF).

2. The neuregulin or heregulin ligand families, which
primarily interact with erbB-3 and erbB-4: neuregulin
1α and 1β and neuregulin 2α and 2β (NRG-1α, NRG-1β,
NRG-2α, NRG-2β).

3. Ligands that interact equally with erbB-1 and erbB-4:
betacellulin (BTC); heparin-binding EGF-like growth factor
(HB-EGF).

4. Ligands that bind exclusively to erbB-4: neuregulin 
3 and 4 (NRG3, NRG4). 

5. Pan or broad specificity ligands that bind to erbB-1, 
erbB-3, or erbB-4: epiregulin (EPR). 


EGF is synthesized as an inactive transmembrane 
precursor that is processed and released by proteolysis 
into the active soluble form that functions as a signal 
transducer. The prototypic EGF ligand is characterized by six
cysteine residues that define a three-loop secondary structure,
which is both required and sufficient for receptor binding and
activation. Three poxvirus-encoded EGF-like factors (VGF, SFGF, and MGF)
have been isolated. Vaccinia growth factor is synthe- 
sized as a transmembrane precursor glycoprotein after 
infection with the vaccinia virus. The tumorigenic 
viruses, myxoma virus and Shope fibroma virus,
encode MGF and SFGF, respectively, as secreted peptides.
Although the virally encoded ligands have lower
binding affinities than their mammalian counterparts, 
they exhibit equivalent mitogenicity. 

The EGF family demonstrates distinct expression 
patterns: while EGF is found in most body fluids, the
other related family members are secreted as autocrine 
or paracrine factors and so generally act over short 
distances. EGF peptides exhibit distinct expression 
patterns that are either developmentally regulated or 
tissue specific. This is amply demonstrated by the 
highly regulated expression of HB-EGF in the uterine 
luminal epithelium 6-7 h prior to implantation of the 
blastocyst into the uterus. In the adult organism, EGF peptides
play essential roles in the proliferation and differentiation
of the mammary gland (mammopoiesis)
at puberty and mammary gland milk production (lac- 
togenesis) during pregnancy. Targeted inactivation of 
the EGF ligands indicates that they have specific as 
well as overlapping roles in mammary gland development.
For example, absence of AR is associated with
impaired mammary ductal morphogenesis, whilst
inactivation of EGF and TGF-α suggests that both factors
are required for lactogenesis. The virally encoded
EGF-like factors are not required for viral replication.
Genetic inactivation studies suggest that the viral 
encoded EGF ligands are required for the enhance- 
ment of virulence and stimulation of cell proliferation 
at the primary site of infection; therefore they may 
have a role in inflammatory responses. In general, the
controlled expression of the EGF family of ligands 
appears to be one way of determining their signal- 
ing specificity. The significance of the regulated 
expression of the EGF family of ligands is under- 
scored by the fact that aberrant expression of the 
EGF-related peptides underlies the pathogenesis of 
conditions such as cancer and inflammatory disease. 
Co-overexpression of the EGF-related peptides and 
their cognate receptors frequently occurs in human 
breast, pancreatic, endometrial, and ovarian carcin- 
omas as well as in inflammatory conditions such as 
chronic pancreatitis. The deregulated expression of 
the growth factors results in an autocrine pathway 
that drives uncontrolled cell growth and maintains 
the neoplastic transformation. 

The biological effects of the EGF ligand family are 
mediated by its cognate receptors, the erbB receptor 
tyrosine kinase family, which consists of four 
members: erbB-1 (commonly referred to as the EGF 
receptor); erbB-2 (also known as the neu or Her-2 
receptor); erbB-3; and erbB-4. The multiple EGF 
ligands differentially induce certain receptor combin- 
ations probably because each ligand is bivalent, carry- 
ing not only a high-affinity site, but also a low- or 
broad-specificity site that determines the dimerization 
partner. The monomeric form of receptor tyrosine 
kinases is inactive, but upon growth factor binding, 
oligomerization primarily through homodimerization 
results in receptor auto- and transphosphorylation. 
The bivalent nature of the EGF peptides enables the 
simultaneous binding of two identical (homodimeri- 
zation) or different (heterodimerization) erbB recep- 
tors. The dimerization or juxtapositioning of two 
erbB receptors results in the activation of the intrinsic 
tyrosine kinase activity and receptor auto- and trans- 
phosphorylation of specific tyrosine residues. The 
transphosphorylation event creates docking sites on 
the activated receptor, which initiate a diverse range of 
intracellular signaling events through the recruitment
of signaling effectors. The recruitment is highly
specific and is governed by tyrosine-phosphorylated
motifs in the juxtamembrane region and carboxyl tail of
the RTK, which are recognized by effectors containing Src-homology 2
(SH2) or phosphotyrosine-binding (PTB) domains. As a
result, several linear signaling cascades that culminate 
in regulation of gene expression are initiated. The 
EGF ligand family exhibits differential mitogenic 
potency and signaling potential. Both factors are inex- 
tricably linked to the composition of the homo- or 
heterodimeric receptor complex, which determines 
ligand dissociation rates, receptor recycling/degrad- 
ation as well as the temporal duration of the signal. 
In addition, coupling of a given receptor to specific 
intracellular signaling proteins is modulated by the 
EGF ligand dimerization partner and may indeed originate
from differential receptor transphosphorylation.
As a result, the different cellular responses to
the EGF family of peptide growth factors are due to
the array of erbB receptors activated and the reper- 
toire of signaling pathways that are engaged at the 
effector level. 


See also: erbA and erbB in Human Cancer; Neu 
Oncogene; SH2 Domain; Signal Transduction 
The textbook definition of an epigenetic phenomenon
is “a mitotically and/or meiotically heritable change in 
gene function that cannot be explained by changes 
in DNA sequence” (reviews: Russo et al., 1996; 
Chadwick and Cardew, 1998). To expand this succinct 
formula, epigenetics studies genetic censorship, i.e., 
instances of genome control where a particular locus 
is inactivated (or activated) in a very stable manner 
(through multiple mitotic divisions, sometimes for the 
entire life of the organism, and frequently through 
multiple generations, i.e., it cannot be changed even 
by meiosis!). Remarkably, the program to maintain 
the active or inactive state of a given locus is not 
contained in the primary DNA sequence of that 
locus, hence the etymology of the name “epigenetics,”
i.e., heritable phenomena that appear to occur on top 
of, or above, the sequence of the DNA. 


Above and Beyond DNA 


It is useful to distinguish epigenetic regulation of 
genome function from other instances of stable gene 
expression programs. For example, in metazoa, certain 
genes are permanently and exclusively activated in 
unique cell types (in eutherian mammals, globins are 
only expressed in erythroblasts, insulin, in the β cells of the
pancreatic islets of Langerhans, serum albumin, in
hepatocytes, etc.) and are silenced in all other cell 
types. While stable, this regulation is not commonly 
referred to as epigenetic, because it is known to be due 
to action by stretches of regulatory DNA (promoters 
and enhancers) contained within those loci (in con- 
cert, of course, with a host of attending DNA-binding 
proteins). By contrast: 




• A complete chromosome is transcriptionally inactivated
(with the important exception of a single
gene) in each cell of mammalian females, but this
inactivation is not encoded in the primary
sequence of the unfortunate censored piece of 
DNA, but rather is determined by how many such 
chromosomes there are in the nucleus. 

• In the genomes of eutherian mammals, many genes
are ‘imprinted,’ i.e., only expressed from one copy 
(we are functionally hemizygous for a number of 
loci in our genome): which allele is censored into 
silence is entirely determined by whether it was 
inherited by its current genome of residence from 
the mother or the father; thus, it does not matter 
what the allele ‘says,’ but only where it came from. 

• In certain fungi, the mating type (‘gender’) of a
particular cell is not initially decided on by the 
cell’s genotype, but rather is determined by events 
in its grandmother cell. 

• Both in plants and in animals, genomes protect themselves
from parasites such as transposable elements
by maintaining them in stably silenced form. Quite 
contrary to expectation, however, this silencing is 
not determined by some unique primary sequence 
feature of the transposon or endogenous retrovirus, 
but rather by the copy number of that particular sequence
in the genome; thus, stretches of DNA sequence 
reiterated in a given nucleus more than a certain 
‘allowed’ number of times are censored into silence 
not because they say something offensive (which 
they do, in an evolutionary sense, although the cell 
does not have a mechanism for sensing that), but 
because they occur more than a certain number of 
times (which the cell somehow does sense). 


These, and many other, examples of epigenetic regula- 
tion of gene expression have been an understandable 
source of bewilderment and wonder for many years:
it was very clear that conventional models of gene regu- 
lation were inadequate to explain, for instance, how 
fission yeast switch mating type, or how repeated DNA 
is silenced, or how gametes imprint particular loci, but 
virtually nothing was known about the underlying 
molecular mechanisms. The past 5 years have reversed 
this predicament quite emphatically, and epigenetics is 
no longer the proverbial ‘black box’ carrying the familiar
“and then a miracle occurs” logo as a euphemism for 
“we have not the slightest idea how this might work.” 
There follows a brief survey of the history of scholarship
in epigenetics, and then we consider, from a general
molecular standpoint, what challenges a cell faces
in creating a stable domain of gene expression. Recent
molecular evidence regarding the origin and
functional impact of DNA methylation in the regulation
of vertebrate genomes is then described, and aspects of
chromosome structure that collude with DNA
methylation in effecting epigenetic control are discussed.
Possible mechanisms for the tagging of loci
in taxa that do not appear to have DNA methylation in
their genomes (e.g., arthropods) are also reviewed.
Then several representative examples of epigenetic
regulation are presented, focusing in each case on
the presumed evolutionary benefit reaped by the cell 
and the organism from effecting such a mode of gene 
control, and on recent molecular data that offer 
mechanistic explanations (reviews: Russo et al., 1996; 
Chadwick and Cardew, 1998). In conclusion, there is a 
short perspective on the general applicability of epi- 
genetic principles to genome control in eukarya. 


Brief History of Research in Epigenetics 


As is the case for virtually every branch of genetics, 
most major epigenetic phenomena were initially char- 
acterized in plants and insects. In the early 1950s, 
Barbara McClintock (Cold Spring Harbor La- 
boratory) discovered that the suppressor-mutator 
(Spm) transposable element in maize can be inacti- 
vated and kept silent for generations, until this silence 
is suddenly reversed, again in heritable fashion 
(McClintock, 1958). Soon afterwards, R. Alexander 
Brink (University of Wisconsin, Madison) reported 
that the penetrance of particular gene alleles control- 
ling kernel color in maize is sometimes dependent on 
the genotype of the parent plant from which they were 
inherited, and not the genetic constitution of the plant 
currently carrying them (this remarkable phenom- 
enon was dubbed ‘paramutation’; Brink, 1958). In 
the 1930s, Charles Metz (Columbia University) dis- 
covered that female flies of the genus Sciara have a 
most unusual mechanism of sex inheritance: a given 
female has only daughters or only sons. A genetic 
explanation for this peculiarity was provided in 1960 
by Helen Crouse (Columbia University) — once a 
student of Barbara McClintock — who realized that, 
in Sciara, chromosomes of paternal origin are some- 
how heritably ‘marked’ for elimination in future generations;
she dubbed this ‘chromosome imprinting’
(Crouse, 1960). 

The general characteristic that emerged from these 
seemingly unrelated observations was that, in utter 
contradiction to common-sense notions of a relation- 
ship between genotype and phenotype for a given 
organism, epigenetically regulated traits (be it seed 
color in corn or gender in flies) are sometimes defined 
by the genotypic environment that was experienced 
by particular alleles controlling that trait prior to 
being inherited by that organism. Since the primary 
DNA sequence of those alleles remained unchanged 
(McClintock, Brink, Metz, and Crouse did not know 
that at the time of their pioneering studies, but we now
do), something other than the sequence must have
“tagged along” with the DNA to regulate its expression.
The nature and mechanism of action of that something 
is the focus of this article. 

The study of epigenetics on a single-cell level was 
launched by Mary Lyon’s (Medical Research Council, 
UK) studies (in 1961) on coat color in the mouse: She 
insightfully combined earlier cytological observations 
by Susumu Ohno on the compaction of one X chromosome
into a dense ‘Barr body’ with her own genetic
analysis to propose that a stable and random inactiva- 
tion of one of the X chromosomes must occur in 
females (Lyon, 1961). Thus, it became clear that, 
even in the lifespan of a given organism, significant 
portions of the genome can be entirely eliminated 
from expression programs, and that such elimination 
is not based on primary DNA sequence (if it were, 
then only a specific X chromosome would be inactiv- 
ated, whereas the inactivation is random). 

In 1975, after a sufficient number of phenomena of 
this sort had been reported in the literature to warrant 
attempts at a mechanistic explanation, A. Riggs (City 
of Hope NMC, USA) and, independently, Robin 
Holliday and J. Pugh (NIMR, UK) proposed a role
for DNA methylation in controlling vertebrate 
genomes (Holliday and Pugh, 1975; Riggs, 1975). By 
then, it had been established that the chemical modi- 
fication by methyl groups of bases in double-stranded 
DNA of prokaryotic genomes plays an important role 
in the familiar restriction-modification pathways for 
host genome stability. While based on little to no 
experimental data, these investigators’ proposals have 
withstood empirical testing remarkably well. As discussed
at some length in the section “The little methyl
that can”, DNA methylation is particularly well suited
to carrying the censor’s epigenetic mark. Its prominence
and ubiquity in mammalian genomes was revealed by 
Adrian Bird and Edward Southern, who in 1978 
developed an ingenious method for its detection. 

A short time later, Azim Surani (Wellcome Insti- 
tute) and Davor Solter (Max Planck Institute) made an 
experimental observation with far-reaching conse- 
quences: by pronuclear transplantation, they showed 
that the artificial union of two haploid male genomes 
or of two haploid female genomes cannot sustain nor- 
mal embryonic development. Thus, they reasoned, the 
contributions made by the two haploid chromosome 
sets to the final karyotype must be unequal, at least for 
some loci that are required during embryogenesis. A 
dedicated effort from a large number of researchers 
has now led to the firm realization that specific loci in 
mammalian genomes are stably silenced in a gender- 
specific manner (i.e., female gametes always repress 
subset X and male gametes, subset Y of the genome). 
Because this repression is irreversible until the next
passage through the germline, a gynogenetic or andro- 
genetic embryo will be functionally null for subset X 
or Y, respectively; this results in lethality, because at 
least one active copy of each gene in both subsets is 
required for development. Thus, it became clear that 
the phenomenon of chromosome imprinting discover- 
ed by Helen Crouse in Sciara has a close evolutionary 
analog in mammals, except that the parent-of-origin imprint
applies not to entire chromosomes, but to smaller
chromosomal domains, or even individual genes. 

For many scholars of epigenetics and molecular 
biology, it was very intellectually gratifying when 
data from many laboratories obtained over the past 
10 years provided a firm functional link between the 
methylation status of particular alleles and imprinting. 
Some of the most convincing of these observations 
came from the work done in Timothy Bestor’s labora- 
tory at Columbia University: a mouse genetically 
engineered to lack the major enzyme responsible for 
maintaining the genome in a methylated state (DNA 
methyltransferase-1) died owing to an extraordinary 
misregulation of epigenetic pathways, most notably 
X chromosome inactivation and the maintenance of 
transcriptional silencing at imprinted loci. Many of the 
functional pathways involved in the latter two phe- 
nomena were meticulously dissected in work by 
Rudolf Jaenisch’s research group at MIT, and Shirley 
Tilghman’s laboratory at Princeton University. An 
additional piece of the puzzle was supplied by the discovery
in Adrian Bird’s laboratory, at the University of
Edinburgh, that mammalian genomes contain several
proteins that appear to bind very selectively to methylated
(epigenetically regulated) DNA loci (Bird and
Wolffe, 1999). Subsequent work from the Bird labora- 
tory, and from Alan Wolffe’s research group at the 
NIH, has shown that some of these proteins are potent 
repressors of transcription (Bird and Wolffe, 1999). In 
an exciting development, this repression was revealed 
to be mechanistically based on the localized creation of 
an area of highly specialized, inaccessible chromosome 
structure; thus, a hypothetical mechanism whereby 
such a structure could propagate itself through multi- 
ple rounds of cell division became immediately appar- 
ent. Considering the progress made over the past few 
years, it is nevertheless remarkable how much in epi- 
genetics remains obscure, unexplained, and occasion- 
ally beyond the pale of rational explanation; this 
promises many more decades of exciting data. 


How to Keep Your State of Expression 
When All about You Are Losing Theirs 


Gene and genome regulation is at its core a dynamic
phenomenon: All of its key players are bound to each
other, not through covalent or strong electrostatic
bonds into permanent crystalline arrays, but rather
through weaker charge-based, hydrophobic, and van der
Waals interactions. This is not a whim of nature, but a 
response to evolutionary pressure. The eukaryotic gen- 
ome evolved to be rapidly responsive to a great variety 
of internal and external stimuli, so macromolecular 
complexes that control it do not associate with each 
other permanently, but rather engage in much more 
fluid interactions. While familiar textbook images of 
gene control in bacteria present static pictures — e.g., the 
lac repressor firmly bound to the operator in the lac 
operon — the reality is that many protein-DNA inter- 
actions that occur in the nucleus have relatively high off 
rates (i.e., complexes fall apart relatively easily, and then 
quickly reform, and then fall apart again). 
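To put numbers on the idea of a high off rate: for the simplest case of first-order dissociation, a complex with dissociation rate constant k_off has a half-life of ln 2 / k_off, and the fraction still intact after time t is exp(-k_off·t). The Python sketch below assumes exactly that simple kinetics and uses two purely hypothetical k_off values; it deliberately ignores rebinding, which, as noted above, happens quickly in the nucleus.

```python
import math

def half_life(k_off: float) -> float:
    """Half-life (s) of a complex that falls apart by first-order kinetics; k_off in 1/s."""
    if k_off <= 0:
        raise ValueError("k_off must be positive")
    return math.log(2) / k_off

def fraction_still_bound(k_off: float, t: float) -> float:
    """Fraction of the original complexes still intact after t seconds, ignoring rebinding."""
    return math.exp(-k_off * t)

# Hypothetical off rates, for illustration only (not measured values).
for label, k_off in [("long-lived complex", 1e-4), ("dynamic complex", 0.1)]:
    print(f"{label}: half-life {half_life(k_off):.1f} s, "
          f"still bound after 60 s: {fraction_still_bound(k_off, 60):.3f}")
```

A thousandfold difference in k_off translates directly into a thousandfold difference in lifetime, which is the sense in which nuclear complexes "fall apart relatively easily."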

Genomes in all taxa, however, have a firm need to 
impose on particular regions of themselves a relatively 
permanent state of activity; for example, so-called 
‘housekeeping genes’ (i.e., genes whose products are 
indispensable for cell viability, such as enzymes 
involved in anabolic and catabolic pathways, proteins 
that are structural components of the cytoskeleton) 
have to be active at all times, as do tissue-specific 
genes (in the cognate tissue). Conversely, some regions 
of the genome — for example, invading genomic para- 
sites such as transposable elements — must be kept in 
perpetual silence, because their spurious activation 
will lead to an intranuclear epidemic and the destruc- 
tion of the genome. 

A solution very commonly used by cells is to rely
not on single proteins to regulate the expression of
a particular gene, but on many proteins; for example,
tissue-specific genes are well known to be activated 
through the concerted binding and action of at least 
a dozen distinct factors bound to several hundred base pairs of
DNA, both in promoters (i.e., stretches of DNA next
to the transcription start site) and in enhancers (more
distant DNA elements). Work from the laboratory of Tom Maniatis
at Harvard has shown that, in the case of the
human interferon-β gene, all these regulatory complexes
coalesce into a “united we stand” type of structure
called an ‘enhanceosome.’

A serious problem arises, however, when the cell 
needs to divide and therefore replicates its DNA; 
DNA polymerase and its entourage move through 
the chromosome with all the subtlety of a military 
tank, erasing all nucleoprotein organization in their 
path. The carefully assembled regulatory complexes 
that sat over particular loci are therefore destroyed 
and must be recreated de novo, and in two copies, 
instead of just one that existed before replication. 
It is hardly surprising, therefore, that many tissue- 
specific genes are transiently deactivated when mature, 
differentiated cells are induced to divide (for example,
proliferating hepatocytes stop expressing many
liver-specific markers). In general, proliferating cells
in multicellular organisms only very rarely express
genes associated with the differentiated state; such
luxury is allowed only to cells that are replicatively
quiescent and can assemble regulatory complexes on 
DNA with no fear of being swept out of the way by a 
passing megadalton assembly of DNA polymerase. 
Thus, the first major challenge facing the cell in 
enabling stable domains of gene expression is a need 
to maintain them through repeated rounds of genomic 
replication. 

There is an additional problem, however: even in 
cells that are in a state of proliferative arrest, a protein 
complex bound to DNA is no guarantee of stability. In 
large part this is because the eukaryotic nucleus con- 
tains significant quantities of “philandering” transcrip- 
tional regulators; not tethered to any particular DNA 
segment, they can spuriously affect a random gene 
through a “hit-and-run” mechanism. One possible 
kind of insurance against such accidents is the establish- 
ment of a particular kind of “fortified” regulatory 
structure at a given locus that would be impervious 
to such sporadic attacks. The second major challenge 
in stably controlling the genome is to minimize levels 
of regulatory noise due to spurious interactions. 

As discussed in the following two sections, epigen- 
etically regulated genes and loci offer a wonderful 
example of how parsimonious natural selection can be 
in molding regulatory pathways. The solution to both 
challenges is implemented via a remarkably elegant 
integration of simple biochemical mechanisms. 


The Little Methyl that Can 


Nature’s answer to the first challenge — maintenance 
of structure in the face of replication — turns out to 
be a remarkably ancient one; the biochemical system 
used is found in all taxa studied and is commonly used 
to maintain genomic stability in the face of invaders or 
DNA replication. 

As mentioned in the section “A brief history of 
research in epigenetics,” the mid-1970s saw the con- 
current emergence of two independent proposals that 
DNA methylation may be an important regulatory 
mechanism. Before elaborating on the remarkable series
of discoveries about its functional role in the genome,
it would be helpful to take a quick snapshot of
the main agent in question: 5-methylcytosine (m5C;
Figure 1). Of the many different kinds of DNA
methylation that occur in nature, the one most conspicuous
in the genome of higher vertebrates is that on
carbon atom 5 of cytosine (Figure 1); we will focus
on this, bearing in mind that other bases in the DNA
are modified by methylation as well.
Teleological reasoning is dangerous in describing
biological phenomena, but it is nevertheless remarkable
how well suited m5C is to the task of being at the
center of epigenetic regulation. This stems from several
circumstances: (1) the C-C bond is not chemically
labile, and thus provides desired stability to the regulatory
pathway that exploits it; (2) carbon-5 of cytosine
does not engage in Watson-Crick hydrogen
bonding with the guanine on the other strand of
DNA; thus, this modification does not impede the
formation of conventional DNA structure; and (3)
that being said, the hydrogen atom normally attached 
to carbon-5 does project into the major groove of 
DNA, and, as such, is part of a recognition surface 
for the multitude of DNA-binding proteins that read 
the DNA sequence by scanning the electron orbital 
profiles of the bases in the major groove. It is hardly 
surprising, therefore, that replacing a hydrogen atom 
with a methyl group yields a highly distinctive state- 
ment in the molecular Braille of DNA, one that can be, 
and is, recognized as being markedly different from 
unmethylated cytosine. 

To appreciate the final bit of m5C biology relevant
to epigenetics, we must mention a peculiarity of
methylated cytosine in our genomes that has profound
regulatory consequences: the overwhelming majority of it
occurs not on random cytosines, but only in the
context of the dinucleotide 5’-CpG-3’. By itself, this
would not be particularly significant, unless one
appreciates the fact that CpG is quite the genomic
oddity in mammals, for two reasons: (1) because
there are 4² = 16 possible dinucleotides, one would
expect each one to represent ~6.25% of the human
genome, and this simple math holds for all dinucleotides
except CpG, which is remarkably rare in mammalian
genomes compared with all the others; and
(2) within the sequence of the genome, most dinucleotides
occur relatively randomly, except for CpG,
which occurs in clusters (commonly called ‘CpG
islands’). Thus, as one browses through the narrative
of mammalian DNA, CpG will be the rarest two-letter
word, and, when it does occur, it will do so many
times within the course of a single short passage (e.g.,
100 times over a given 1000 bp), only to become scarce
again (i.e., occur once every 200 bp).
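One common way to quantify the CpG deficit described above is the observed/expected CpG ratio: the number of CpG dinucleotides multiplied by sequence length, divided by the product of the C and G counts, so that a sequence whose bases are arranged independently scores about 1. The minimal Python sketch below (function names and example sequences are hypothetical) counts dinucleotides on one strand, computes that ratio, and checks the 4² = 16 arithmetic.

```python
from collections import Counter
from itertools import product

def dinucleotide_counts(seq: str) -> Counter:
    """Count overlapping dinucleotides on one strand."""
    seq = seq.upper()
    return Counter(seq[i:i + 2] for i in range(len(seq) - 1))

def cpg_obs_exp(seq: str) -> float:
    """Observed/expected CpG ratio: CpG count * length / (C count * G count)."""
    seq = seq.upper()
    c, g = seq.count("C"), seq.count("G")
    if c == 0 or g == 0:
        return 0.0
    return dinucleotide_counts(seq)["CG"] * len(seq) / (c * g)

# Four bases give 4**2 = 16 dinucleotides, i.e. 1/16 = 6.25% each if bases were random.
all_dinucs = ["".join(pair) for pair in product("ACGT", repeat=2)]
print(len(all_dinucs), 1 / len(all_dinucs))  # 16 0.0625

# Hypothetical toy sequences: a CpG-dense stretch and one with no CpG at all.
print(cpg_obs_exp("CGCGCGCG"), cpg_obs_exp("ACTGGTACCAGT"))
```

Applied to real sequence, one would expect bulk mammalian DNA to score well below 1 on this measure and CpG islands considerably higher, in line with the text.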
Figure 2 (A) The passage of the replication fork and reestablishment of methylation; (B) the replication of chromatin.


This is very significant for controlling epigenetic 
phenomena, because, in the double helix of DNA, the 
stretch around 5’-CpG-3’ is symmetric (i.e., the other 
strand reads 3’-GpC-5’); in fact, when human DNA is 
examined, all of the methylation is also symmetric: 


5’-...m5C-p-G...-3’
3’-...G-p-m5C...-5’


An immediate consequence of such symmetry is 
that methylation can be endlessly propagated on the 
DNA during replication; all that is needed for such 
maintenance is an enzymatic activity that will recognize
the product of replicating methylated DNA,
‘hemimethylated DNA’ (i.e., DNA where one strand
is methylated and the other is not), and methylate the
currently unmodified strand (Figure 2). As the reader 
will know, such systems exist in bacteria to facilitate 
replication-coupled DNA repair (when a postreplica- 
tive mismatch is encountered, the strand that is 
methylated is assumed to contain the correct sequence 
by default). Cells of higher vertebrates contain an 
enzyme called DNA methyltransferase-1 (DNMT1); 
its enzymatic specialty is to restore methylation to 
the other strand (i.e., perform the reaction shown 
in Figure 2). In a wonderful testament to the evolutionary
unity of life, bacterial, plant, and mammalian
DNA methyltransferases are closely related to each 
other in primary sequence. One of the really inter- 
esting features of this system in our cells is how fast it 
is: DNMT1 is known to be targeted to ‘replication 
foci’ (i.e., portions of the nucleus where the DNA 
replication machinery is located and acts), and
remethylation occurs within 1 min of replication.
Thus, the censors act quickly to suppress any 
unwanted information from being revealed! It is use- 
ful to recall at this point that DNMT1 is required for 
normal mouse development; thus, the organism takes 
the censors’ job seriously. 
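The maintenance logic just described can be captured in a small simulation. The Python sketch below is a toy model under stated assumptions: each CpG site is tracked as a pair of flags (top-strand C methylated, bottom-strand C methylated); replication hands each daughter one parental strand plus an unmethylated new strand; and a hypothetical `maintain` step stands in for DNMT1 by methylating the new strand wherever the parental strand is methylated. Because CpG is its own reverse complement, the same site exists on both strands, which is what makes this copying possible.

```python
import random
from typing import Dict, List, Optional, Tuple

_COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq: str) -> str:
    """Sequence of the opposite strand, read 5' to 3'."""
    return seq.translate(_COMPLEMENT)[::-1]

def cpg_sites(seq: str) -> List[int]:
    """Top-strand positions i with seq[i:i+2] == 'CG' (the bottom-strand C lies opposite i + 1)."""
    return [i for i in range(len(seq) - 1) if seq[i:i + 2] == "CG"]

# Methylation of one duplex: CpG site -> (top-strand C methylated?, bottom-strand C methylated?)
Duplex = Dict[int, Tuple[bool, bool]]

def replicate(duplex: Duplex) -> Tuple[Duplex, Duplex]:
    """Semiconservative replication: each daughter keeps one parental strand and gets
    a new, unmethylated partner, so every methylated site becomes hemimethylated."""
    keeps_top = {site: (top, False) for site, (top, _) in duplex.items()}
    keeps_bottom = {site: (False, bottom) for site, (_, bottom) in duplex.items()}
    return keeps_top, keeps_bottom

def maintain(duplex: Duplex, efficiency: float = 1.0,
             rng: Optional[random.Random] = None) -> Duplex:
    """Hypothetical DNMT1-like step: at each hemimethylated CpG, methylate the unmodified
    strand with probability `efficiency`; fully unmethylated sites are left alone."""
    if rng is None:
        rng = random.Random()
    restored = {}
    for site, (top, bottom) in duplex.items():
        if top != bottom and rng.random() < efficiency:
            top = bottom = True
        restored[site] = (top, bottom)
    return restored

assert reverse_complement("CG") == "CG"  # CpG reads the same on both strands

seq = "ATCGGACGTTACGA"                   # hypothetical sequence with CpGs at 2, 6, and 11
sites = cpg_sites(seq)
parent = {s: (s != 6, s != 6) for s in sites}  # sites 2 and 11 methylated, site 6 not

cells = [parent]
for _ in range(3):                       # three rounds of division
    cells = [maintain(d) for c in cells for d in replicate(c)]
print(len(cells), all(c == parent for c in cells))  # 8 True
```

Lowering `efficiency` below 1.0 is one way to explore how imperfect maintenance would gradually erode a pattern; the value used here is an assumption, not a measured fidelity.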

The molecular weight of the modifying methyl
groups is a combined, minuscule 26 Da. Thus, in the
language of DNA, methylation is little more than a
diacritical mark such as the umlaut (e.g., ä or ü) in
German, a tiny modification of the text; and yet its 
impact on molecular complexes that outsize it by 4-5 
orders of magnitude (e.g., the RNA polymerase II 
holoenzyme) is quite powerful. This, perhaps, is not 
surprising, since in human languages these tiny 
accents, when placed over particular words, can 
change the meaning of entire passages (for example, 
in German, schon, meaning ‘already,’ can become
schön, meaning ‘pretty’). In an analogous way, the 
behavior of the same DNA stretch in methylated and 
unmethylated form is dramatically different, because 
it means quite distinct things. How such alteration of 
meaning is thought to be effected is described in the 
next section. 


Chromatin and Methylation: Large 
Effects from Small Causes 


A general rule that helps one understand the role of
methylation in epigenetic control is as follows: methylated
loci in our genomes are repressed (how epigenetic
control works in organisms whose genomes lack
DNA methylation, e.g., arthropods such as the fruit
fly Drosophila melanogaster and nematodes such as
the roundworm Caenorhabditis elegans, is a fascinating
issue). Thus, if a particular gene has its promoter
methylated, it becomes transcriptionally inert (‘invisible’
to RNA polymerase). DNA methylation
is one of the most potent mechanisms for transcrip- 
tional repression known in biology today. Most sig- 
nificantly, we currently do not know of any way to 
reactivate a chromosomal locus that has been silenced 
by methylation except to remove the methyl residues. 

This is not the academic issue that it might seem; 
advanced forms of human cancer are well known to 
have aberrant DNA methylation; for example, genes 
required for cell-cycle arrest, such as cyclin-dependent 
kinase inhibitors, are erroneously silenced in tumor 
cell lines because their promoters are methylated 
(which they never are in normal, noncancerous cells). 
In addition, as elaborated in the section “Repetition,
mother of epigenetic silencing,” up to one-third of our
genomes consists of parasites (self-propagating DNA 
elements such as transposons). They are kept in check 
(i.e., silenced) by hypermethylation, and woe to the 
genome that unleashes the parasites within it. 

How can DNA methylation, with only the tiny 26 Da of
methyl groups involved, be so powerful in antagonizing
the transcriptional machinery? The answer to
this question also takes care of the other challenge 
to enabling stable domains of gene expression: how to 
prevent noise. It turns out that methylation of a 
particular DNA stretch is read by the cell as a com- 
mand to envelop it in a protective cocoon, a specia- 
lized protein structure that makes the DNA 
physically inaccessible to regulators. This shielding 
occurs in two steps: first, when a DNA stretch is 
methylated, it is immediately bound by specialized 
proteins, discovered by Adrian Bird’s research group,
called methyl-CpG-binding domain proteins (MBDs);
these have the interesting and useful property of being
highly selective for methylated DNA (for example, the
best-studied MBD, a protein called MeCP2, will bind
m5CpG, but not CpG). Once bound, these MBDs
attract other proteins, all of which can remodel chro- 
matin, i.e., alter the structure of the chromosome 
around their binding site. 

To appreciate how chromatin remodeling can lead to 
transcriptional repression, we must recall that there is 
no naked DNA in our cells; all of it is complexed with
highly positively charged proteins called histones.
Every 146 bp of DNA in the genome is wound around
eight molecules of histones to form a nucleosome, the
elementary building block of our chromosomes; the
familiar “beads-on-a-string” array of nucleosomes
winds around itself to form chromosomes. As one 
might expect, the cell uses chromatin to regulate its 
genome: in a general sense, tighter assembly of DNA
into mature chromatin leads to transcriptional repres- 
sion. This “tightness” is, to some extent, regulated by 
changing the charge of the histones: dedicated en- 
zymatic complexes called histone deacetylases can 
promote DNA binding to the histones by increasing 
the amount of positive charge found in the histone 
tails (stretches of the histone proteins that stick like 
tentacles outside of the spool of the nucleosome). The 
positively charged tails then envelop the phosphodi- 
ester backbone of DNA in a web of protein and make 
it inaccessible to other molecules. As discovered in the 
laboratories of Adrian Bird and Alan Wolffe, proteins 
that bind methylated DNA exist in various large com- 
plexes that include, among other things, histone de- 
acetylases, and ATP-dependent molecular machines 
that make chromatin more compact. 
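The charge argument can be illustrated with simple bookkeeping. The Python sketch below counts arginines and unacetylated lysines in the first 20 residues of histone H3 as +1 each; this is a deliberate simplification that ignores the termini, histidine, and pH effects. Acetylation neutralizes a lysine's positive charge, so removing the acetyl groups (the job of histone deacetylases) raises the tail's net positive charge.

```python
def tail_charge(tail: str, acetylated_lysines: frozenset = frozenset()) -> int:
    """Crude net positive charge: +1 per arginine (R) and per unacetylated lysine (K).

    `acetylated_lysines` holds 1-based residue numbers. Termini, histidine,
    acidic residues, and pH effects are ignored in this simplification.
    """
    charge = 0
    for position, residue in enumerate(tail, start=1):
        if residue == "R":
            charge += 1
        elif residue == "K" and position not in acetylated_lysines:
            charge += 1
    return charge

H3_TAIL_1_20 = "ARTKQTARKSTGGKAPRKQL"  # residues 1-20 of histone H3
LYSINES = frozenset(i for i, aa in enumerate(H3_TAIL_1_20, start=1) if aa == "K")

print(sorted(LYSINES))                     # [4, 9, 14, 18]
print(tail_charge(H3_TAIL_1_20, LYSINES))  # fully acetylated: 3
print(tail_charge(H3_TAIL_1_20))           # fully deacetylated: 7
```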

Thus, a stretch of methylated DNA is bound by 
dedicated proteins (MBDs) that, in turn, target special- 
ized enzymatic complexes (histone deacetylases), 
which build a wall of repressive chromatin between 
the DNA and the rest of the nucleus. Therefore, much 
like the pea from H.C. Andersen’s famous fairytale 
The Princess and the Pea, the tiny methyl makes its 
presence known through layer upon layer of proteins 
that assemble over the DNA under its command. 

Most significantly for our purpose, however: (1) 
methylation does not change the primary sequence 
of the DNA, and (2) it can be propagated endlessly 
through many rounds of DNA replication. How that 
manifests itself in the various epigenetic phenomena 
known to occur in nature is described in the following 
section, but at this point, it is useful to consider that, 
while DNA methylation itself is ‘replication-resistant,’ 
a particular chromatin structure associated with a 
hypermethylated locus will also probably segregate 
to the two nascent DNA strands in the aftermath of 
DNA replication fork passage (Figure 2), and thus 
also enforce a particular state of expression onto the 
daughter chromosomes. 


Epigenetic Regulation in Action: The 
Rest Is Silence 


From a few short reports in the 1950s and 1960s, 
epigenetics has blossomed into a very large field of 
study, populous enough to warrant c. 40 separate 
review chapters in a recent compendium (Russo et al., 
1996)! We will briefly survey the wondrous breadth of 
epigenetic phenomena. 


Epigenetic Silencing of Entire Chromosomes 
In the most extreme case of epigenetic regulation, an 
entire chromosome is eliminated from expression. 
The most common instance of this is in ‘dosage
compensation’: a ubiquitous mechanism whereby the
organism ensures that males and females have an equal 
number of active alleles for all loci on their sex chromo- 
somes. By way of a simple example, men express one 
set of X chromosome genes and are genotypically 
hemizygous for them; women also express a single 
set, but are genotypically diploid. This very interest- 
ing predicament comes about by the well-studied 
process of X chromosome inactivation: very early 
in development, female embryos randomly and per- 
manently inactivate one of their X chromosomes. The 
silenced X becomes condensed (and forms the famous 
Barr body), replicates much later in S-phase of the cell 
cycle than its active homolog (a well-known feature of 
transcriptionally inert DNA), has its CpG islands 
hypermethylated, and its histones deacetylated (in 
comforting support of a role for both processes in 
epigenetic regulation). The key point to stress here is 
that the primary DNA sequence of the inactive X can 
be indistinguishable from that of the active X; thus, 
regulation in this system is enabled on a level “above 
the DNA.” 
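The random but stable character of X inactivation can be pictured with a toy simulation. In the Python sketch below, each of a hypothetical number of founder cells independently silences its maternal or paternal X, and every descendant inherits that choice unchanged; the founder count, the synchronous divisions, and the 50:50 odds are all illustrative assumptions. The result is a mosaic of clonal patches, each the size of one founder's progeny.

```python
import random
from collections import Counter

def x_inactivation_mosaic(n_founders: int, divisions: int, seed: int = 0) -> Counter:
    """Toy model: each founder cell silences the maternal or paternal X at random,
    and every descendant inherits that choice unchanged (clonal, stable silencing)."""
    rng = random.Random(seed)
    cells = [rng.choice(("maternal X silenced", "paternal X silenced"))
             for _ in range(n_founders)]
    for _ in range(divisions):
        # Each cell divides; both daughters keep the parental cell's choice.
        cells = [state for state in cells for _ in range(2)]
    return Counter(cells)

patches = x_inactivation_mosaic(n_founders=16, divisions=5)
print(patches)  # 512 cells, in blocks of 32 descended from each founder
```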

The inactive X is transcriptionally silent, with a few 
very important exceptions, the most significant ones 
being genes found in a small portion of the X
chromosome, the XIC (the X-inactivation center).
Of the genes in the XIC, the most interesting one is
Xist (X-inactive specific transcript, pronounced
“exist”). Its product is a 17-kb RNA that does not 
contain open reading frames and is presumed to 
function by physically coating the chromosome 
from which it is transcribed, and thereby inactivating 
all of it, except the gene for itself, which remains active. 
A great number of tantalizing hypotheses and datasets 
have been offered to explain the many questions sur- 
rounding this remarkable phenomenon: How is the 
chromosome to be inactivated chosen from the two 
that are active early in development? Why does only 
one X chromosome express Xist? How does a coat of
RNA lead to hypermethylation and chromatin condensation?
How does the Xist gene on the inactive
X escape from being inactivated by its own product? 

The organism whose study originated the term 
‘imprinting,’ the fly Sciara, goes even further than 
mammals and inactivates an entire chromosomal set 
(Gerbi, 1986): In this organism, the female determines 
the sex of her progeny, and this is enabled by a truly 
remarkable process of chromosome elimination in the 
male: during spermatogenesis, in meiosis I, the entire 
paternal set of chromosomes condenses and is physic- 
ally eliminated, leaving the spermatocyte with only its 
mother’s chromosomes! Thus, the paternal chromo- 
some set, when inherited by the male, carries an epi- 
genetic imprint for the entirety of that male’s lifespan, 
until such a mark instructs the gonads to eliminate
them. How this occurs is largely unknown (insects
are not known to have CpG methylation in their
genomes), but it is interesting that in other insects,
such as Drosophila, dosage-compensated, epigenetically
regulated sex chromosomes have defined alterations
in chromatin structure and histone tail
acetylation.


Fetal Growth as a Casus Belli: Imprinting in 
Mammals 

As mentioned in “A brief history of research in epi- 
genetics,” the maternal and paternal genomes make
unequal contributions to the genomic output of their 
joint product, the progeny: for a certain number of 
genes, one copy in our genome is inactivated for the 
duration of our lifespan. Appropriately borrowing the 
term from Helen Crouse’s study of Sciara, loci regu- 
lated in this way are referred to as ‘imprinted.’ It would 
be helpful to explain the terminology used in this field: 
a gene is called ‘maternally expressed’ if organisms 
always use (i.e., transcribe) the allele they inherited 
from their mother; conversely, a ‘paternally expressed’ 
gene is one in which the allele inherited from the father 
is the one that is active. Imprinting, while of immense 
academic and general intellectual interest, also has medical
relevance: several human disorders, including Prader- 
Willi and Angelman syndromes, are caused by a mis- 
regulation of imprinted loci. 

Of the many questions that spring to mind regarding
imprinting, we will briefly address three: (1) On a
mechanistic level, how is the difference in expression
between the two alleles effected? (2) How do male
and female gonads ensure correct imprinting in future
generations and distinguish sets of paternally and
maternally expressed genes when producing gametic
precursors? (3) What selection pressure could have
led to the evolution of such a peculiar mode of gene
regulation? 


1. The difference in expression is maintained by keeping
the imprinted loci in a state of differential methylation,
such that the expressed allele is demethylated,
and the repressed allele is hypermethylated. We 
emphasize that the primary DNA sequence of the 
two alleles can be identical, and yet profound differ- 
ences in expression levels are observed. A genetic 
ablation of pathways leading to DNA methylation 
abrogates correct regulation on many imprinted loci 
in the mouse genome. One very interesting recent 
development has been the discovery, by the research 
groups of Shirley Tilghman at Princeton and Gary 
Felsenfeld at the NIH, that for the H19/Igf2 
imprinted locus, the effect of such differential 
methylation is to control the ability of a protein 
called CTCF to associate with a regulatory element
found in this area (CTCF binding is prevented by
methylation). Interestingly, the role of CTCF binding
is to block the spread of regulatory information
along the chromosome, i.e., CTCF enables boundary,
or insulator, function in this locus. In other cases,
the effect of methylation is to drive the creation of a 
repressive chromatin structure over the genes being 
regulated. 

Few things in epigenetics are the cause of greater 
wonder and mystery than the establishment of 
methylation patterns relevant to imprinting during 
gametogenesis. By way of example, consider a 
paternally expressed gene in a human female: of 
the two alleles she has in her genome, the allele 
she inherited from her father (let us designate it
as ♂) is demethylated and active; the allele she
inherited from her mother (♀) is methylated and
inactive. During oogenesis, the following happens
(‘m’ stands for ‘methylated’):


Paternally expressed gene in an ovary: ♀m ♂ → ♀m ♂m


Thus, in her germ cells, the maternal allele is kept
methylated, and the paternal allele is methylated
de novo (by definition, her children must inherit
this allele from her in inactive, methylated form).
Remarkably, a maternally expressed locus in that
same woman undergoes the exact opposite process:


Maternally expressed gene in an ovary: ♀ ♂m → ♀ ♂


(i.e., the cell takes the allele this woman inherited
from her father and demethylates it; again, by definition,
this woman’s children have to receive this gene
from her in active, demethylated form).
During spermatogenesis in the male, all maternally 
expressed genes are methylated and all paternally 
expressed genes are demethylated. This process is 
called ‘resetting of gametic marks,’ and we have 
little beyond very weak conjecture on how it is 
enabled. How can the cell possibly scan the vast 
narrative of its entire genome for all loci that are 
differentially methylated on two homologous 
nonsister chromatids? Once these loci have all 
been found, whatever quasi-miraculous machine 
lies behind this search, what can be the mech- 
anism whereby the cell determines whether this 
particular allele pair is maternally or paternally 
expressed (this cannot be based on the difference in 
methylation between the two alleles, of course), and 
decides whether to demethylate or hypermethylate 
both alleles? As if this were not mysterious enough,
how can such programs be perfectly reversed in a
gender-specific way (i.e., the same locus is treated in a
diametrically opposite way in males versus females)?
However this works, it clearly does, and quite well;
but how does the organism benefit? A very attractive
hypothesis proposed by David Haig and colleagues
is that the actual embryo in which imprinting is 
manifested does not benefit from it at all, and that, 
instead, genomic imprinting is the manifestation of 
a genomic arms race, a tug-of-war between its two 
parents. This ‘conflict theory’ is based on the 
known unequal contribution parents make to their 
child in mammals: A father’s investment is fre- 
quently minimal, but he does have an interest in 
having a healthy baby; a mother carries the fetus for
the duration of gestation, but cannot afford to 
devote all her resources to the development of this 
particular infant, because she needs to reproduce 
again. Thus, fathers have an agenda: the embryo 
must grow as large as possible and obtain as many 
resources from the mother as it can, all for the cause 
of propagating the father’s genes. The mother’s 
agenda is more balanced: it cannot allow a given 
fetus to squander away too many of her resources, 
because she has other pregnancies in the future to 
consider. How can these agendas be implemented 
in molecular terms? Ergo imprinting: according to 
the theory, paternally expressed genes promote 
embryo growth, while maternally expressed genes 
stymie it. In the embryo, a genome-wide tug-of- 
war thus occurs: paternally expressed genes 
attempt to make the embryo grow larger, while 
maternally expressed genes try to do the exact 
opposite. 
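The resetting rules given above for oocytes and sperm can be checked for internal consistency with a small model. The Python sketch below is a hypothetical bookkeeping exercise, not a mechanism (the text stresses that the real one is unknown): the germline erases both inherited marks and writes a sex-specific mark on both alleles, and the offspring expresses whichever allele is unmethylated. Following a lineage for several generations shows that these rules alone keep paternally expressed genes active on the paternal allele and maternally expressed genes active on the maternal allele, whichever grandparental allele is transmitted.

```python
import random
from dataclasses import dataclass
from typing import Tuple

@dataclass(frozen=True)
class Allele:
    origin: str       # "maternal" or "paternal", relative to the individual carrying it
    methylated: bool  # True = silenced

Genotype = Tuple[Allele, Allele]

def germline_mark(imprint: str, sex: str) -> bool:
    """Mark written on both alleles in the germline, per the scheme in the text:
    oocytes methylate paternally expressed genes; sperm methylate maternally expressed ones."""
    if sex == "female":
        return imprint == "paternally expressed"
    return imprint == "maternally expressed"

def reset(genotype: Genotype, imprint: str, sex: str) -> Genotype:
    """Erase the inherited marks and write the sex-specific one (modeled as a bare rule)."""
    mark = germline_mark(imprint, sex)
    return tuple(Allele(a.origin, mark) for a in genotype)

def child_of(mother: Genotype, father: Genotype, imprint: str, rng: random.Random) -> Genotype:
    """Each parent transmits one randomly chosen allele bearing that parent's germline mark."""
    egg = rng.choice(reset(mother, imprint, "female"))
    sperm = rng.choice(reset(father, imprint, "male"))
    return (Allele("maternal", egg.methylated), Allele("paternal", sperm.methylated))

def expressed(genotype: Genotype) -> str:
    """Origin of the single active (unmethylated) allele, else 'misregulated'."""
    active = [a.origin for a in genotype if not a.methylated]
    return active[0] if len(active) == 1 else "misregulated"

# Usage: follow a maternal lineage for five generations (pedigree details are hypothetical).
EXPECTED = {"paternally expressed": "paternal", "maternally expressed": "maternal"}
rng = random.Random(1)
results = []
for imprint in EXPECTED:
    mother = (Allele("maternal", True), Allele("paternal", True))  # arbitrary: marks get erased
    for generation in range(5):
        father = (Allele("maternal", False), Allele("paternal", False))  # arbitrary unrelated male
        daughter = child_of(mother, father, imprint, rng)
        results.append((imprint, generation, expressed(daughter)))
        mother = daughter
for row in results:
    print(row)
```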


This elegant theory received a great deal of attention 
and experimental testing, and has largely withstood 
these empirical trials. One of its most interesting pre- 
dictions is that imprinting should have disappeared in 
monogamous mammals (i.e., animals that mate for 
life), because parents have an equal interest in their 
progeny (the father cares about the mother’s welfare, 
because she is the sole carrier of his children). Shirley 
Tilghman’s laboratory tracked down a species of 
monogamous mouse (which was hard to find, because 
true monogamy is very, very rare among mammals), 
Peromyscus polionotus, and discovered that imprinting 
has been preserved. There are sufficient data from 
other studies, however, to keep the conflict theory as 
the best explanation we currently have for the utility 
of imprinting. 


Repetition, Mother of Epigenetic Silencing 
The examples of epigenetic regulation discussed up to 
now all involved pathways for the control of the organ- 
ism’s own genes. In addition, a wide variety of ex- 
amples from fungi, plants, and animals all point to a 
major role for epigenetic silencing in preserving the 
stability of the genome and protecting it from being 
swamped by genomic parasites. 
For humans and other mammals, this is not an
academic issue at all (our genomes are only 5% exons 
of active genes, and a lofty 35% intact or mutated 
genomic invaders such as transposons). A simple 
illustration of the gravity of this matter was provided 
in a study of genomic stability in interspecific hybrids 
by J. Marshall-Graves and colleagues: the offspring of 
two different species of wallaby had its genome 
practically destroyed by an explosion of endogenous 
retroelements (these were successfully kept in check in
the parent species, but the hybrid failed to recognize 
transposons of heterologous origin as a threat). 

The overwhelming bulk of endogenous genomic 
parasites are silenced by DNA methylation; one very 
interesting side effect of such repression is the known 
tendency of methylated CpG (m5CpG) to spontaneously deaminate and
yield TpG. Such point mutations, while relieving 
methylation-driven repression, irreversibly inactivate 
open reading frames within the retroelement required 
for its propagation! A major question in the study 
of this ‘genome defense’ pathway is the mechanism 
whereby the genome recognizes repetitive DNA 
within itself and targets the DNA methylation 
machinery to it. It has been suggested that such
recognition occurs during gametogenesis, when repetitive
DNA tends to associate during the pairing of homologous
chromosomes in meiosis.

It is important to appreciate that the silencing of 
repetitive DNA is a phenomenon that occurs in all 
eukaryotic taxa; for example, it is well known that, of 
the many rDNA repeats found in the genome of the 
budding yeast Saccharomyces cerevisiae, only a few 
are transcriptionally active. Similar processes occur in 
such filamentous fungi as Ascobolus and Neurospora 
(Wolffe and Matzke, 1999), where repetitive DNA is 
actively sought out and epigenetically inactivated in 
processes termed ‘repeat-induced point mutation’ 
(RIP) and ‘methylation induced premeiotically’ 
(MIP). In some cases, the probable utility to the cell 
of such silencing lies not in the abrogation of
transcription per se (after all, rRNA, for example, is
essential for viability) but in the suppression of DNA
recombination (repetitive DNA is a dangerous site
for interchromatid recombination, which leads to
genomic instability). In other cases, the genome's
capacity to seek out and inactivate repetitive DNA is 
clearly a defense mechanism (although somewhat of 
an inefficient one, since mammals, armed to the 
genetic teeth with methylation, have 10 times as 
many genomic parasites as invertebrates, which do 
not methylate their genomes). From a clinical stand- 
point, the irreversible inactivation by hypermethyl- 
ation of transgenes introduced into organisms during 
gene therapy is, however, a poignant illustration of the 
power of such defense. 


Epigenetic Inheritance as a Violation of 
Mendelian Principles 

Whatever the mechanism whereby epigenetic activa- 
tion or repression is enabled, one of its most salient 
features is that traits controlled epigenetically fre- 
quently exhibit non-Mendelian inheritance patterns. 

A classic example is the paramutation phenomenon
that affects maize kernel color: the R locus controls
pigment formation, with the R-r allele producing dark
kernels and the R-st allele, stippled kernels. This would
be yet another case of the exceptional utility of color
inheritance in providing textbook illustrations of
Mendelian inheritance, if it were not for the following:
a testcross of an R-r/R-r plant yields all dark kernels,
and a testcross of an R-st/R-st plant yields all stippled
kernels, in full agreement with expectation. In
overwhelming contradiction to common sense, a
testcross of an R-r/R-st plant yields all stippled
kernels, even though 50% of them are genotypically
R-r. We now know that the R-r allele is somehow
epigenetically modified (weakened) by its passage
through a heterozygous nucleus that contains an R-st
allele, but the mechanistic details are not well
understood (Russo et al., 1996).

As mentioned earlier, studies of epigenetic phenomena
were initiated by Barbara McClintock's experiments
on the Spm transposon in maize (which, incidentally,
also affects pigment formation in the kernel), in which
she showed that the activity of this transposon can
fluctuate through generations, and that it can become
epigenetically inactivated and reactivated (not
surprisingly, the trait affected by the transposon
insertion fails to comply with Mendelian segregation
rules). Subsequent work from Nina
Fedoroff’s laboratory at Penn State University has 
shown this regulation to be due to alterations in the 
methylation status of a stretch of the Spm promoter 
that contains a very high percentage of G/C residues:
a hypermethylated transposon is heritably inactivated 
until passage through a nucleus containing an active, 
demethylated copy of the Spm transposon leads to 
demethylation and activation (Russo et al., 1996). 

A final, wonderful example of the extraordinary 
power of epigenetic regulation in effecting non- 
Mendelian inheritance comes from fission yeast 
Schizosaccharomyces pombe. Work from Amar Klar 
and his colleagues has unraveled the very elegant 
mechanism whereby this organism switches mating 
type: after meiosis, haploid spores reacquire the capa- 
city to mate by assuming one of two mating types 
(‘plus’ and ‘minus’; Figure 3A). The mating type is 
defined through a DNA recombination event in the
spore; only spores of opposite type can mate. Remark- 
ably, when a single spore of the plus mating type 
divides twice, of its four granddaughters, three remain 
plus and one switches to minus. We now know this 
Figure 3 The epigenetic regulation of mating-type
switching in fission yeast. (A) After meiosis, haploid 
spores assume a ‘plus’ or ‘minus’ mating type; (B) after 
invasion of the mating-type locus by a DNA replication 
fork, strand-specific epigenetic modification occurs in the 
grandmother cell. 


to be enabled by a strand-specific epigenetic
modification that occurs in the grandmother. As shown
in Figure 3B, the replication fork enters the mating-type
locus from one specific direction, which creates an
inherent asymmetry: the 'top' strand is replicated by
the leading-strand mechanism and the 'bottom' strand
by the lagging-strand mechanism (i.e., via Okazaki
fragments). As a consequence, a strand-specific
epigenetic modification is introduced into the bottom
strand; this modification is passed on to one of the two
daughter cells and, when that daughter replicates its
own DNA, one of its two progeny will inherit an
epigenetically modified DNA double helix, which
leads to the initiation of recombination and
mating-type switching.
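The one-in-four pedigree rule described above can be reproduced with a toy lineage model. This is a minimal sketch under simplifying assumptions, not Klar's formal model: each cell is labeled by mating type ('P' for plus, 'M' for minus) and by whether it carries the strand-specific imprint. We assume that an unimprinted cell gives one imprinted and one unimprinted daughter, and that an imprinted cell gives one switched, unimprinted daughter and one daughter that is imprinted again. The function names are hypothetical.

```python
from collections import Counter

# Toy lineage model of fission-yeast mating-type switching (assumed rules).
# A cell is (mating_type, imprinted): mating_type is "P" (plus) or "M" (minus);
# imprinted=True marks a cell carrying the strand-specific modification.
OPPOSITE = {"P": "M", "M": "P"}


def divide(cell):
    """Return the two daughters of a cell under the assumed rules."""
    mating_type, imprinted = cell
    if not imprinted:
        # Replication installs the imprint on one daughter's chromatid only.
        return [(mating_type, False), (mating_type, True)]
    # Replicating the imprinted strand triggers switching in one daughter;
    # the sister daughter is assumed to inherit a fresh imprint.
    return [(OPPOSITE[mating_type], False), (mating_type, True)]


def descendants(cell, generations):
    """All cells present after `generations` rounds of division."""
    cells = [cell]
    for _ in range(generations):
        cells = [daughter for c in cells for daughter in divide(c)]
    return cells


spore = ("P", False)  # a freshly germinated, unimprinted plus spore
granddaughters = descendants(spore, 2)
print(Counter(mt for mt, _ in granddaughters))  # Counter({'P': 3, 'M': 1})
```

Under these rules, three of the four granddaughters remain plus and one switches to minus, matching the observation in the text. The model says nothing about the molecular nature of the imprint.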


Epigenetic Regulation: Old Curiosity 
Shop? 


Is epigenetic regulation merely an intellectually amus- 
ing curiosity, or does it illuminate principles in gene 
control of general relevance? While several very speci- 
alized systems certainly use epigenetic regulation, its 
broad applicability is also clear. We present three brief 
examples. 

Work from Kim Nasmyth’s laboratory at the Insti- 
tute of Molecular Pathology in Vienna examined the 
regulation of the budding yeast HO endonuclease 
gene. Using high-resolution analysis, these investiga- 
tors made the remarkable observation that the effects 
of certain transcriptional regulators on the activity of 
this gene persist long after the regulators themselves 
have left the DNA. The likeliest explanation for this 
epigenetic memory is that the regulator effects a stable 
modification of chromatin structure over the gene 
promoter, and that the structure itself is stable enough 
to confer regulation of the gene. 

Studies from the laboratories of Renato Paro, 
Vincenzo Pirrotta, and others have investigated the 
regulation of homeotic genes in Drosophila. These
are required for proper body-plan development dur- 
ing embryogenesis, and are expressed in stable fashion 
in specific segments of the embryo. Biochemical and
genetic analyses showed that Polycomb group proteins
form large, repressive, self-propagating complexes
that epigenetically silence homeotic genes, and that
proteins of the trithorax group act in a similar fashion
but with the opposite effect, keeping their target genes
stably active.

Finally, it is useful to recall that the impact of 
epigenetic regulation on the function of the human 
genome clearly extends beyond imprinted loci and 
the inactivated X chromosome. For example, recent 
genetic data have shown that humans with mutations 
in the methyl-CpG-binding protein MeCP2
develop a progressive and debilitating developmental
and neurological disorder called Rett syndrome. Such
findings suggest that epigenetic regulatory pathways
control many more aspects of our genome's behavior
than we currently appreciate.
The term episome was introduced (Jacob and Wollman,
1958) to describe an accessory genetic element (e.g., 
in Escherichia coli bacteria) similar to a plasmid, 
but which had the additional ability to become inte- 
grated into the chromosome semistably and, further- 
more, could at some point dissociate from the 
chromosome and again replicate independently, or 
could even be totally eliminated (cured) from the cell. 
The definition was also intended to include lysogenic 
bacteriophages that could either integrate into the 
chromosome and persist as prophages or replicate 
extrachromosomally and produce new bursts of 
phage particles on cell lysis. A plasmid, in contrast,
was defined (Lederberg, 1952) as an accessory, extra- 
chromosomal, independently replicated element. 
Most plasmids or bacteriophages do not provide any 
essential function for the survival of the cell except in 
special circumstances, such as a plasmid that carries a 
gene conferring antibiotic resistance when the cell is 
growing in the presence of the antibiotic. The episome 
concept was initially very useful in pointing out that 
two types of element, namely the temperate bacterio- 
phage lambda and the F sex factor for conjugal fertility, 
could both spend part of their life history integrated 
into the chromosome, even though they were first 
discovered as chromosome-independent entities, and 
chromosomal integration was not previously known 
to occur for plasmids or bacteriophages. The term 
episome was sometimes also used for analogous 
mammalian systems such as the simian virus SV40. 
In recent years, the term episome has lost much of its
specific meaning and has been used less. This is because an
increased spectrum of recombination events between 
plasmids and chromosomes has been observed, 
spanning a vast range of frequency and host cell 
dependence, and depending more or less (or not at 
all) on special recombination functions to facilitate 
the integration and/or excision events. Thus, it is 
difficult, if not impossible, to distinguish an episome 
from a plasmid based on some arbitrarily defined 
frequency of integration/excision, etc. Depending on 
the particular element, either the term plasmid or 
lysogenic bacteriophage generally suffices to cover 
the range of naturally occurring elements of this type. 
In its original usage (Bateson, 1909), epistasis referred
to the masking or unmasking of the effects of allelic 
substitution at one locus by the allelic state at a second 
locus. In modern usage, epistasis refers to any rela- 
tionship of nonadditive interaction between two or 
more genes in their combined effects on a phenotype. 
Epistasis is only defined in the context of genetic 
variation at multiple loci. This variation may be nat- 
ural or experimental. 

Epistasis is an important concept in biochemical 
genetics, population genetics, and quantitative genet- 
ics. Although its definition varies somewhat across 
these fields, the underlying concept is that the effects 
of allelic substitution at one gene can be dependent 
on the allelic state of another gene or genes. In bio- 
chemical genetics, analysis of epistatic relationships 
can be used to assign genes to pathways and to define 
the order of gene action within a pathway. In popula- 
tion genetics, epistasis plays a role in theories of fitness 
and adaptation. In quantitative genetics, epistasis has 
taken on a broader meaning that encompasses any 
nonadditive interaction among genes and it is often 
identified with the interaction term in analysis of 
variance. Defining the scale of measurement is import- 
ant when considering epistasis as some statistical 
interactions can be removed by a change of scale, 
e.g., multiplicative effects can be converted to additive 
effects by taking logarithms. Epistatic interactions can 
be synergistic (greater than additive) or antagonistic 
(less than additive). When two genes interact
statistically on a given scale, it is usually inferred
that they also interact functionally, either through
direct (protein-protein) interaction or indirectly
through a network of interacting gene products. Thus statistical epistasis can
provide insights into the genetic architecture under- 
lying complex phenotypes. 
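The point about scale can be checked with a short calculation. The sketch below uses hypothetical genotype means for a two-locus design in which each allelic substitution multiplies the phenotype by a constant factor. The interaction contrast (the effect of substituting at the second locus in one background minus its effect in the other) is nonzero on the raw scale but vanishes after taking logarithms. The function name and the numbers are illustrative assumptions, not data.

```python
import math


def interaction(m00, m01, m10, m11):
    """Two-locus interaction contrast: (m11 - m10) - (m01 - m00).

    m00 is the mean of the reference genotype, m10 and m01 carry one
    substitution each (at the first and second locus), m11 carries both.
    A value of zero means the two substitutions act additively.
    """
    return (m11 - m10) - (m01 - m00)


# Hypothetical means: the two substitutions multiply the phenotype by 2 and 3.
base, f_a, f_b = 10.0, 2.0, 3.0
means = (base, base * f_b, base * f_a, base * f_a * f_b)

raw = interaction(*means)
logged = interaction(*(math.log(m) for m in means))
print(raw)                   # 20.0: epistatic on the raw scale
print(abs(logged) < 1e-12)   # True: additive on the log scale
```

A purely multiplicative model therefore shows "epistasis" only because of the measurement scale, which is why the scale should be fixed before interactions are interpreted biologically.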


Biochemical Epistasis 


Gene products often act together in pathways and 
networks. Examples include biosynthetic pathways, 
signal transduction pathways, and transcriptional 


regulation networks. Epistasis analysis of mutant 
alleles provides a means to assign genes to pathways 
and to determine their order of action. Suppressor and 
enhancer screens are often used to identify epistatic 
mutations. A suppressor is a mutation at a second site 
that causes a reversion to the wild-type phenotype and 
thus masks the mutation at the first site. An enhancer
is a second mutation that, in combination with the first,
produces a more severe or novel phenotype, thus
unmasking an effect that cannot be observed in either
of the single mutants. Synthetic lethality is a special 
case in which both of the single mutants are viable but 
the double mutant is not. This can occur when the two 
loci belong to parallel pathways driving an essential 
function as illustrated in scheme (1): 


$$S \xrightarrow{A} P, \qquad S \xrightarrow{B} P \tag{1}$$


Here A and B represent two enzymes, either of which 
can convert substrate S into product P. If a loss-of- 
function mutation occurs at either A or B, the other 
gene can still provide the function; only the double
mutant lacks P. In a positive regulatory signaling
pathway (scheme 2), loss of function
at any step will result in the inability of the system to 
respond (R) to a signal S. The single mutants and the 
double mutant all have the same phenotype: 


$$S \rightarrow A \rightarrow B \rightarrow R \tag{2}$$
In a negative regulatory pathway (scheme 3), a loss of 
function at B will lead to a noninducible response R 
and a loss of function at A will lead to a constitutive 
response. The double mutant will behave like the 
single mutation in B: 


$$S \dashv A \dashv B \rightarrow R \tag{3}$$
Although epistasis analysis explicitly involves only 
two loci, it can be applied repeatedly to combinations 
of loci to elucidate the structure of larger pathways 
and networks. 
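The logic of schemes (1) to (3) can be made concrete with a small Boolean model. This is a sketch under simplifying assumptions: each gene is either fully functional or knocked out, and each step is all-or-none. The function names are hypothetical. The model reproduces the phenotypes stated in the text: synthetic lethality for two parallel enzymes, identical single and double mutants in a positive pathway, and a double mutant that resembles the B mutant in a negative pathway.

```python
# Boolean sketch of epistasis schemes (1)-(3); all-or-none steps are assumed.

def parallel_pathway(a_ok, b_ok):
    """Scheme (1): A or B alone can convert S into the essential product P."""
    return a_ok or b_ok


def positive_pathway(signal, a_ok, b_ok):
    """Scheme (2): S -> A -> B -> R; every step must be intact to respond."""
    a_active = signal and a_ok
    b_active = a_active and b_ok
    return b_active


def negative_pathway(signal, a_ok, b_ok):
    """Scheme (3): S -| A -| B -> R; S inhibits A, A inhibits B, B drives R."""
    a_active = a_ok and not signal
    b_active = b_ok and not a_active
    return b_active


def classify(pathway, a_ok, b_ok):
    """Describe the response R without and with the signal S."""
    off = pathway(False, a_ok, b_ok)
    on = pathway(True, a_ok, b_ok)
    if on and off:
        return "constitutive"
    if on:
        return "inducible"
    if off:
        return "signal-repressed"  # not produced by schemes (2) or (3)
    return "noninducible"


MUTANTS = {  # (A functional?, B functional?)
    "wild type": (True, True),
    "a": (False, True),
    "b": (True, False),
    "a b": (False, False),
}

for name, (a_ok, b_ok) in MUTANTS.items():
    print(f"{name:10} P made: {parallel_pathway(a_ok, b_ok)!s:6}"
          f"scheme 2: {classify(positive_pathway, a_ok, b_ok):13}"
          f"scheme 3: {classify(negative_pathway, a_ok, b_ok)}")
```

The negative pathway is the informative case for ordering genes: because the a b double mutant matches b rather than a, B is placed downstream of A.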
Statistical Epistasis


Epistasis in Biometrical Genetics 

Cockerham (1954) introduced the idea of partitioning
the genetic variance from inbred line crosses into
additive, dominance, and epistatic components. The
contrasts defining this partitioning are shown in
Table 1. Variance components have proven to be useful
in predicting the response of a population to selective
pressure and have been successfully applied in
breeding programs. The epistatic variance compon- 
ents reflect an average effect over many genes on the 
phenotype distribution in a population. Physiological 
epistasis can contribute to both additive and domin- 
ance components of variance, but physiological epi- 
stasis must be present in order to generate statistical 
epistasis. 


Epistasis in Quantitative Trait Analysis 

The availability of polymorphic DNA marker loci 
distributed throughout the genomes of many organ- 
isms enables us to track the inheritance of specific loci 
in line crosses and in pedigrees. Methods for analyzing 
quantitative trait inheritance using marker data are 
often based on single gene models that do not allow 
for epistasis. In the presence of large epistatic effects, 
these methods may fail to detect important loci or may 
produce misleading results by a statistical pheno- 
menon known as Simpson’s paradox. Epistasis can be 
detected using the F test for interaction in a two-way 
analysis of variance. However, this test can require 
large sample sizes to achieve reasonable power. In 
addition, corrections required to avoid potential false 
results when searching through all locus pairs further 
restrict the power. These difficulties may explain the 
paucity of reports of epistasis in the quantitative traits 
literature. 

The marker-based (measured genotype) approach can be used to
make specific predictions about the phenotype of 
individuals based on their genotype, whereas bio- 
metrical analysis can only make statements about 
population averages. 
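The F test for interaction in a two-way analysis of variance, mentioned above, can be sketched for a balanced design in which every two-locus genotype class has the same number of replicate measurements. This is a minimal, dependency-free illustration with made-up phenotype values and a hypothetical function name. Converting F into a P value would additionally require the F distribution with (a - 1)(b - 1) and ab(n - 1) degrees of freedom, for example from a statistics library.

```python
def interaction_f(cells):
    """F statistic for the A x B interaction in a balanced two-way ANOVA.

    cells[i][j] is the list of replicate phenotypes for genotype class
    (level i of locus A, level j of locus B); all lists have length n.
    Returns (F, df_interaction, df_error).
    """
    a, b, n = len(cells), len(cells[0]), len(cells[0][0])
    cell_mean = [[sum(ys) / n for ys in row] for row in cells]
    row_mean = [sum(row) / b for row in cell_mean]
    col_mean = [sum(cell_mean[i][j] for i in range(a)) / a for j in range(b)]
    grand = sum(row_mean) / a

    # Interaction sum of squares: departures of cell means from additivity.
    ss_ab = n * sum(
        (cell_mean[i][j] - row_mean[i] - col_mean[j] + grand) ** 2
        for i in range(a) for j in range(b)
    )
    # Error sum of squares: scatter of replicates around their cell mean.
    ss_e = sum(
        (y - cell_mean[i][j]) ** 2
        for i in range(a) for j in range(b) for y in cells[i][j]
    )
    df_ab, df_e = (a - 1) * (b - 1), a * b * (n - 1)
    return (ss_ab / df_ab) / (ss_e / df_e), df_ab, df_e


# Hypothetical 2 x 2 example with two replicates per genotype class.
data = [[[0.0, 2.0], [1.0, 3.0]],
        [[1.0, 3.0], [5.0, 7.0]]]
print(interaction_f(data))  # (2.25, 1, 4)
```

With only one interaction degree of freedom and four error degrees of freedom, an F of 2.25 is far from significant. This illustrates the text's point that detecting epistasis typically requires large samples.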


Table 1 Analysis of variance contrasts for variance components in a diallel cross

Component        AABB  AABb  AAbb  AaBB  AaBb  Aabb  aaBB  aaBb  aabb
Additive (A)        2     2     2     0     0     0    -2    -2    -2
Additive (B)        2     0    -2     2     0    -2     2     0    -2
Dominance (A)       1     1     1    -2    -2    -2     1     1     1
Dominance (B)       1    -2     1     1    -2     1     1    -2     1
AxA                 1     0    -1     0     0     0    -1     0     1
AxD                 1    -2     1     0     0     0    -1     2    -1
DxA                 1     0    -1    -2     0     2     1     0    -1
DxD                 1    -2     1    -2     4    -2     1    -2     1
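The rows of Table 1 follow a simple rule that a short calculation makes explicit. Take the single-locus contrasts a = (1, 0, -1) (additive) and d = (1, -2, 1) (dominance) over the three genotypes of a locus. Every two-locus row is then a Kronecker product of single-locus vectors, with a vector of ones standing in for a locus that does not contribute. The additive rows are scaled by 2 to give integers, as in the table. The sketch below uses plain Python and hypothetical helper names. It rebuilds the table this way and checks that the eight contrasts are mutually orthogonal, which is what allows the genetic sum of squares in a balanced design to be split into separate components.

```python
from itertools import combinations

GENOTYPES = ["AABB", "AABb", "AAbb", "AaBB", "AaBb",
             "Aabb", "aaBB", "aaBb", "aabb"]

ONE = [1, 1, 1]    # locus does not contribute to this contrast
ADD = [1, 0, -1]   # additive: one homozygote against the other
DOM = [1, -2, 1]   # dominance: heterozygote against the two homozygotes


def kron(u, v, scale=1):
    """Kronecker product of two single-locus vectors (locus A varies slowest)."""
    return [scale * x * y for x in u for y in v]


CONTRASTS = {
    "Additive (A)": kron(ADD, ONE, 2),
    "Additive (B)": kron(ONE, ADD, 2),
    "Dominance (A)": kron(DOM, ONE),
    "Dominance (B)": kron(ONE, DOM),
    "AxA": kron(ADD, ADD),
    "AxD": kron(ADD, DOM),
    "DxA": kron(DOM, ADD),
    "DxD": kron(DOM, DOM),
}


def dot(u, v):
    return sum(x * y for x, y in zip(u, v))


print(f"{'':14}", " ".join(f"{g:>5}" for g in GENOTYPES))
for name, row in CONTRASTS.items():
    print(f"{name:14}", " ".join(f"{x:5d}" for x in row))

# Each contrast sums to zero, and every pair of contrasts is orthogonal.
assert all(sum(row) == 0 for row in CONTRASTS.values())
assert all(dot(u, v) == 0 for u, v in combinations(CONTRASTS.values(), 2))
```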
Table 2 Epistasis in mouse model of cleft palate

                 clf2
clf1          BB     Bb     bb
AA            -      -      -
Aa            -      -      -
aa            -      +      ++

(-, unaffected; +, mild form; ++, severe form)


Epistasis in Complex Disease Traits 
Many common diseases show familial aggregation but 
do not follow simple Mendelian patterns of inher- 
itance. Evidence for epistasis has been reported for 
traits of medical importance including cancer, hyper- 
tension, kidney disease, epilepsy, and alcoholism. In 
addition, there are many genetic modifiers of disease 
phenotypes that alter the severity of a trait depending 
on the genetic background in which they occur. Back- 
ground effects are an example of epistasis. 
Epidemiological studies of a common birth defect, 
cleft lip and palate, in human populations have sug- 
gested that a single major gene with incomplete pene- 
trance may be responsible for this condition. In a 
mouse model (Juriloff, 1995), the condition appears 
to be determined by an epistatic interaction between 
two loci, clf1 and clf2, as shown in Table 2. When the 
clf1 genotype is aa, the clf2 heterozygote shows a mild 
form of the condition and the clf2 bb homozygotes 
show a more severe form. It is conjectured that these 
two genes have partially overlapping functions and 
that the recessive alleles are loss-of-function. In this 
example, a model of epistatic interaction provides a 
testable prediction about the molecular mechanism. 
Epistasis can be particularly difficult to unravel in out- 
bred human populations, but is often more amenable 
to analysis in the context of a model organism. Con- 
struction of special inbred lines (congenic or nearly 
isogenic lines) is of further use in the analysis of epi- 
stasis by reducing the complexity of the genetic back- 
ground. 


Epistasis in Population Genetics 


The quantitative theory of population genetics, as 
introduced by Fisher (1918), is based on models of
additive genetics in which epistatic effects are repre- 
sented as a ‘noise’ term. However, epistasis is known 
to play a key role in a number of evolutionary pro- 
cesses. Epistasis in traits related to fitness of an indi- 
vidual can lead to the existence of multiple fitness 
peaks and multiple stable equilibria for gene frequen- 
cies in a population. This idea is central to Wright’s 
(1930, 1980) shifting balance theory. Wright proposed 
that population subdivision can lead to the evolution 


of coadapted gene complexes. Incompatibilities among 
sets of genes can lead to genetic isolation and specia- 
tion. It is interesting to note that epistasis in quanti- 
tative traits has been observed more often in crosses 
between widely diverged strains than in crosses be- 
tween closely related strains. 

Epistasis can be beneficial. If epistasis is present in a
population that has been reduced to a very small size, 
inbreeding leads to an increase in additive variance. 
Thus hidden variation is exposed to selection and 
rapid adaptation can occur following a bottleneck. In 
asexually reproducing populations, the gradual accu- 
mulation of deleterious mutations (an effect known as 
Muller’s ratchet) can be slowed significantly by the 
presence of epistasis. Finally, one influential theory of
the advantage of sexual reproduction requires that
deleterious mutations occur frequently and that their
effects be synergistic.
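The difference between independent and synergistic effects of deleterious mutations can be illustrated with a quadratic model often used in theoretical work. In this model, the log fitness of an individual carrying k mutations is -αk - βk²/2. With β = 0, effects are multiplicative and each additional mutation removes the same fraction of fitness. With β > 0 (synergistic epistasis), each additional mutation costs more than the last. The parameter values below are arbitrary and chosen only for illustration, and the function names are hypothetical.

```python
import math


def fitness(k, alpha, beta=0.0):
    """Fitness of a genotype carrying k deleterious mutations.

    log w = -alpha*k - beta*k**2/2; beta = 0 gives independent
    (multiplicative) effects, beta > 0 gives synergistic epistasis.
    """
    return math.exp(-alpha * k - beta * k * k / 2)


def marginal_cost(k, alpha, beta=0.0):
    """Proportional fitness loss caused by adding the (k+1)-th mutation."""
    return 1 - fitness(k + 1, alpha, beta) / fitness(k, alpha, beta)


alpha, beta = 0.02, 0.01  # arbitrary illustrative values
for k in (0, 5, 10):
    print(k,
          round(marginal_cost(k, alpha), 4),        # same for every k
          round(marginal_cost(k, alpha, beta), 4))  # rises with k
```

Under synergy, selection removes individuals with many mutations disproportionately. This is the property the theory mentioned above relies on.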


Future of Epistasis 


What sorts of phenotypes will tend to show epistatic 
effects? Transcriptional regulation of gene expression 
is complex, involving both positive and negative regu- 
lation of multiple factors with varying degrees of 
specificity. Gibson (1996) demonstrated that inherent
properties of such systems lead to epistatic and pleio- 
tropic effects. Traits that are closely related to direct 
regulation of one or a few genes are more likely to 
reveal epistasis than are morphological traits that de- 
pend on the cumulative effects of many genes for their 
expression. The availability of molecular markers and 
technology for monitoring gene expression opens 
up new possibilities for unraveling the network of 
biochemical mechanisms underlying the relationship 
between phenotype and genotype. As our ability to 
study the effects of genes at the biochemical level 
improves, so will our understanding of the mechan- 
isms underlying epistasis. Modern molecular tech- 
niques are helpful in reuniting the biochemical and 
statistical descriptions of this ubiquitous phenom- 
enon. 
Epstein-Barr virus (EBV) is a human lymphotropic
herpesvirus that is carried in a latent, essentially non- 
pathogenic state by 80-90% of all humans. It belongs 
to the gamma herpesvirus subfamily and is regarded as 
the prototype lymphocryptovirus. Such viruses have 
only been found in Old World primates. Humans 
are the exclusive natural host for EBV. Each of the 
other Old World primate species is infected with 
closely related lymphocryptoviruses and is resistant 
to infection with human EBV. In immunologically 
naive New World primate species, EBV can cause 
fatal lymphoproliferative disease. 

In humans, the virus is mainly transmitted by the 
saliva. In low socioeconomic groups early childhood 
infection is the rule, followed by seroconversion but 
no identified disease. Under good hygienic condi- 
tions, where the primary infection is often postponed 
to the teens or to adulthood, the first encounter with 
the virus leads to mononucleosis, a self-limiting lym- 
phoproliferative disease, in about half of the cases. The 
other half undergoes silent seroconversion. In
immunodeficient patients, mononucleosis may follow a
progressive course. EBV-carrying immunocytomas
arise in iatrogenic (e.g., transplant recipients),
congenital (e.g., X-linked lymphoproliferative
syndrome), or infection-related (e.g., HIV) states of
immunosuppression, often with fatal outcome. They
can be cured by adoptive immunotherapy with
appropriately reactive, histocompatible T cells.

Similarly to other herpesviruses, EBV has a toroid- 
shaped protein core, wrapped with DNA, a nucleo- 
capsid with 162 capsomeres, a protein tegument 
between the nucleocapsid and the envelope, and an 
outer envelope with external glycoprotein spikes. The 
major EBV capsid proteins are 160, 47, and 28 kDa in 
size, packaged with a number of minor virion
proteins. The most abundant EBV envelope and tegument
proteins are 350/220 and 152 kDa in size, respectively. 
The EBV genome is carried by the virion as a linear, 
double-stranded 172-kbp DNA. 

The interactions of EBV with the human host are 
seemingly paradoxical. It is the most highly trans- 
forming known virus. It turns resting B lymphocytes 
regularly into immunoblasts that can give rise to 
immortal lymphoblastoid cell lines (LCLs). The 
EBV-transformed immunoblasts closely mimic IL-4 
and anti-CD40 activated blasts morphologically and 
with regard to their repertoire of activation markers. 

In spite of its high transforming ability, EBV 
induces or contributes to malignant disease only 
exceptionally. These exceptions can be seen as bio- 
logical accidents at the level of the host (like immuno- 
suppression as already mentioned) or of the cell 
(oncogene activation). The second, related paradox 
concerns the relationship between the virally infected 
lymphocyte and the host organism. EBV-transformed 
immunoblasts are highly immunogenic. Mononucleo- 
sis can be seen as a somewhat chaotic but nevertheless 
efficient rejection reaction. In vitro exposure of T cells
to autologous EBV-transformed immunoblasts generates
CD8+ killer T cells that lyse their specific targets as
efficiently as allogeneic T cells kill MHC
class I-incompatible targets. In spite of the
highly efficient elimination of the proliferating im- 
munoblasts, the virus regularly succeeds in establish- 
ing its permanent latency in the B cell compartment 
itself, without causing either proliferation or rejection 
of its carrier cell. Both paradoxes have been resolved 
by the analysis of the viral strategy. 


Viral Expression Phenotypes 


Like other herpesviruses, EBV can enter latent (nonlytic)
or lytic interactions with its host cell. The lytic cycle
differs from that of other herpesviruses in detail but
not in principle. The nonlytic, growth-transforming
interactions are specific for EBV.

The course of the primary infection has been 
mainly studied in normal B lymphocytes. The virus 
uses a B-cell-specific membrane component, CD21, 
also known as CR2, or as the B-cell-specific comple- 
ment (C3d) receptor, as its receptor. Following its 
attachment to CD21, the viral envelope fuses with
the host cell membrane and its DNA is internalized. 
The linear viral genome circularizes 12-16 h after 
entry and amplifies to 40-50 episomal copies. The 
infected B cell is activated as after mitogen exposure
and turns into an immunoblast. Viral transcription 
starts at the Wp promoter, at the time of
circularization. A giant transcript is generated, from
which monocistronic messages for six nuclear proteins,
EBNA1-6 (alternative names, in the same order:
EBNA1, EBNA2, EBNA3A, EBNA3B, EBNA-LP, and
EBNA3C), are spliced. EBNA2 and EBNA5 (alternative
name: EBNA-LP) are expressed first, reaching their
peak level in 24-32 h.
EBNA2 transactivates a gamut of cellular genes, 
including immunoblast-associated activation markers 
and the virally encoded membrane proteins LMP1, 
LMP2A, and LMP2B. Meanwhile, the transcriptional 
start of the EBNAs switches to the Cp promoter, as a 
rule. All six EBNAs remain expressed in the 
immortalized lymphoblastoid cell lines (LCLs) that 
emerge. All nine growth-transformation-associated
genes (six EBNAs and three LMPs) are expressed by 32 h.


Function of the Growth Transformation 
Associated Proteins 

Six of the nine proteins expressed in lymphoblastoid cell
lines, EBNA1, EBNA2, EBNA3 (alternative name:
EBNA3A), EBNA5 (alternative name: EBNA-LP),
EBNA6 (alternative name: EBNA3C), and LMP1,
are essential for immortalization. Their functions are
only incompletely known.

EBNA1 is a sequence-specific DNA-binding pro- 
tein that interacts with the latent replication origin 
(oriP) of the virus. This binding is essential for the 
maintenance of the EBV genomes as circular episomes 
and for their replication in synchrony with cellular 
DNA synthesis. EBNA2 is a transcription factor. It 
is essential for the initiation of immunoblastic trans- 
formation and for the maintenance of the immorta- 
lized state. It activates the Cp promoter that generates 
the polycistronic message for the six EBNAs. It also 
activates the viral LMP1/LMP2 promoter and numer- 
ous cellular genes. It is noteworthy that the EBNA2 
responsive LMP1/LMP2 promoter element works, 
like several EBNA2-induced cellular genes, only in B 
lymphocytes. EBNA2 interacts with the transcriptional
regulator RBPJκ, also called CBF1, a DNA-binding
cellular protein that activates, in turn, CD23,
other immunoblast markers, and B cell survival 
factors. 

The EBNA3 family proteins (EBNA3A, B, and C;
alternative names: EBNA3, 4, and 6) share similar
motifs, including binding sites for RBPJκ, a leucine
zipper, acidic domains, proline- and glutamine-rich
repeats, and several arginine or lysine residues
responsible for nuclear translocation. The full
significance of these interactions remains to be
elucidated, but it is noteworthy that RBPJκ belongs to
a conserved group of proteins linked to the Notch
signaling pathway. Ligand-elicited signaling by Notch
can influence differentiation and proliferative
responses. Although all three members of the EBNA3
group are related, only EBNA3 (EBNA3A) and EBNA6
(EBNA3C), but not EBNA4 (EBNA3B), are essential
for transformation.

EBNA6 (EBNA3C) is a transcriptional activator. It
upregulates cellular genes like CD21 and viral genes
like LMP1. Insertion of an amber stop codon after
codon 365 results in recombinants incapable of B-cell
immortalization.

All three members of the EBNA3 family are preferred
targets for cytotoxic T cell responses. It may therefore
be inferred that all three, including EBNA4,
would have been eliminated, were they not essential 
for the viral strategy. 

EBNA5 (EBNA-LP) is one of the earliest viral 
proteins expressed after primary B cell infection. It is 
required for the induction of cyclin D2, in cooper- 
ation with EBNA2. The length of its repetitive part 
(W repeat) varies between different EBV isolates. This 
can be exploited for tracing the origin of viral
substrains. EBNA5 colocalizes with the hsp70, PML, and
retinoblastoma (Rb) proteins in virally transformed 
immunoblast nuclei. 

The major cell-membrane-associated protein, 
LMP1, can transform immortal rodent fibroblasts in 
vitro and is therefore regarded as a viral oncogene. It 
forms patches and caps on the villous surfaces of 
lymphoblastoid cells. It has a short, cytoplasmic, N- 
terminal hydrophilic part and six transmembrane 
loops, followed by the C-terminal cytoplasmic part 
of the protein. The number of transmembrane loops is 
not critical. Important functions are associated with 
the C-terminal part which has to be anchored to the 
membrane by the hydrophobic segment. The struc- 
ture of LMP1 is similar to some ion-channel proteins. 
LMP1 induces many of the changes associated with 
EBV transformation of B lymphocytes, such as cell 
clumping, and the parallel increase of villous projec- 
tions, vimentin expression, cell surface expression of 
CD23 and other activation markers, MHC class II 
proteins, IL-10, and the cell adhesion molecules 
LFA1, ICAM1, and LFA3. It also upregulates several
adhesion molecules on B cells, a calcium-dependent
protein kinase, bcl-2, and NF-κB.

The cytoplasmic domain of LMP1 interacts with 
cellular proteins that mediate cytoplasmic signaling 
from the TNFR family. LMP1 aggregates interact 
with TNFR (tumor necrosis factor receptor)-TRAF 
(TNFR associated factor) aggregates to form large 
complexes. Through this mechanism, LMP1 can 
cause constitutive cell growth, inhibit apoptosis, and 
activate NF-κB.

The transmembrane domains and the carboxy 
terminus are essential for primary B lymphocyte 
growth transformation. The first 44 amino acids of the 
transmembrane domain interact with a protein, LAP1, 
homologous to the TNFR-associated factors (TRAFs). 


LMP1 associates with LAP1 in B lymphoblastoid 
lines and with an EBV-induced cell protein, EBI6,
which is the human homolog of the murine TRAF1,
implicated in cell growth and NF-κB activation.


LMP2A and B (TP1 and TP2)

The first exons of these two membrane proteins are 
unique while all other exons are shared. They encode 
12 hydrophobic integral membrane sequences and a 
27aa hydrophilic domain. Both proteins colocalize in 
the plasma membrane with LMP1. LMP2A associates 
with tyrosine kinases of the src family and can modu- 
late transmembrane signal transduction. LMP2A and 
B are not required for immortalization. Importantly, 
LMP2A blocks the switch from latent to lytic infec- 
tion in B lymphocytes and is therefore believed to 
contribute to the maintenance of latency. 

The EBERs, two EBV-encoded small RNAs, are 
expressed in virtually all EBV-carrying cells. They are 
the most abundant EBV products in latent infection 
and are therefore the preferred targets for the
histochemical detection of EBV-carrying cells
by in situ hybridization. They are localized in the cell
nucleus, where they form a complex with the cellular
La protein. The EBERs are not essential for lymphocyte
transformation and their function is unknown.


Program Switches and Viral Strategy 

Three major forms of latency have been identified in 
EBV-carrying growth transformed and/or neoplastic 
cells. Phenotypically representative type I Burkitt 
lymphoma (BL) cells express a monocistronic EBNA1 
message, initiated from the Qp promoter. In addition 
to EBNA1, they express the EBERs, but none of the 
other growth transformation associated viral products 
(except occasionally LMP2). This expression pattern 
is referred to as latency I and is also found in latently 
infected normal B cells of healthy seropositive persons. 
Latency II is similar to latency I, in that EBNA1 is
expressed from the Qp promoter and EBNA2-6 are
not expressed. LMP1 and 2 are, however, constitutively
expressed. This pattern is found in nasopharyngeal carcinoma
(NPC) and in most other EBV-carrying non-B cells.
In latency III all six EBNAs and all three LMPs are
expressed. This program is only used in immuno-
blasts, such as freshly transformed B cells, established
LCLs, BL lines that have drifted to a more immuno-
blastic (type III) phenotype, proliferating B cells in
mononucleosis, and the immunoblastomas that
arise in immunodefective persons.

The choice between these three main programs (and 
minor variants that will not be discussed here) thus 
depends on the host cell phenotype. The Wp/Cp- 
initiated giant message from which all six EBNA 
mRNAs are spliced is thus only used in cells with an 
immunoblastic phenotype. The LMP1 promoter is
repressed in B cells, but this repression can be
overridden by EBNA2. In latency I, Wp/Cp are inactive
and Qp is active; EBNA2 is therefore not made, and
LMP1 remains repressed. As a rule, non-B cells permit
constitutive LMP1 expression in the absence of EBNA2.

The scenario of the primary B cell infection starts 
with the massive induction of immunoblast prolifera- 
tion. The majority of the virus-carrying blasts are 
rejected after a couple of weeks (see below). A small 
fraction of the EBV-carrying immunoblasts are belie- 
ved to switch to long-lived memory cells with a rest- 
ing B cell phenotype. Concurrently, they switch their 
EBV expression pattern to the more restricted type I 
program. The virus thus hides from immune rejection 
in memory B cells. There is no evidence for other sites 
of latent viral persistence. Ablation of the bone
marrow eradicates the resident virus in bone marrow
transplant recipients. This is consistent with the
exclusive hemopoietic localization of the resident virus.

The lytic cycle can be induced in many but not all 
EBV-carrying cell lines by phorbol esters, butyrate, 
hydroxyurea and, in some B cell lines, by anti-Ig anti- 
bodies. The lytic cycle is initiated by the activation 
of the BZLF1 (also called Z, or Zebra) gene, a viral 
transactivator of multiple early genes. In vivo, infec- 
tious virus matures in the keratinizing cells of the 
pharyngeal epithelium. Oral hairy leukoplakia, fre- 
quently observed in AIDS and other immunosup- 
pressed patients, is a macroscopically visible focus of 
productive EBV infection. It is curable by antiherpes 
drugs, e.g., acyclovir. 

Several EBV genes expressed during the lytic cycle 
are closely homologous to cellular genes. The immedi- 
ate early lytic switch gene, BZLF1, is closely related to 
the jun/fos family of transcriptional activators. The 
early gene BHRF1 resembles the antiapoptotic bcl-2
gene structurally and functionally. The late gene
BCRF1 is nearly identical to human IL-10. 


Immune Responses 


EBV-transformed immunoblasts are highly immunogenic
for autologous T cells. Several immune effectors,
including CD8+ T cells, react to them with proliferative
and cytotoxic responses as intense as those against
allogeneic, MHC class I-incompatible cells. In autologous
T-cell anti-EBV-B-cell mixed lymphocyte cultures, one
or two of the growth transformation-associated
EBV-encoded proteins (with the exception of EBNA1) are
chosen as the main targets, depending on which MHC
class I allotypes of the responder serve as the
preferential restriction specificities. Other effectors,
such as NK cells, LAK-type cells, and macrophages,
are also mobilized and a variety of lymphokines are
released in the course of mononucleosis, but efficient
rejection is probably largely due to the CD8+
CTLs. EBNA3, 4, and 6 and LMP2 are the most
frequent rejection targets.

The exemption of EBNA1 from being targeted by
the CD8+ T cells is due to the long glycine-alanine
repeat that inhibits the proteasome-ubiquitin-dependent
processing of EBNA1, as long as it is in the normal
cis position. This exceptional handling of EBNA1 can
be seen in relation to the fact that it is the only
EBV-encoded protein that can be expressed irrespective
of the cellular phenotype. This is also one of the main
reasons why the memory B cells that carry latent virus
escape the “attention” of the immune system.


Disease Association 


EBV is the causative agent of infectious mononucleosis
and of the immunoblastomas that arise in
immunosuppressed patients, such as transplant
recipients, patients with congenital immunodeficiencies
(particularly the X-linked lymphoproliferative syndrome,
XLP, an inherited immunodeficiency syndrome that
preferentially affects the EBV-specific immune
surveillance mechanism), and HIV-infected persons. The
virus is associated with 98% of endemic Burkitt
lymphomas, but is only present in about 20% of the
sporadic cases. All Burkitt lymphomas carry the
chromosomal Ig/myc translocation, however, which is
believed to provide the proliferative drive of the tumor.
Multiple viral genomes are present in 100% of poorly
differentiated or anaplastic nasopharyngeal carcinomas
(see Nasopharyngeal Carcinoma (NPC)). They are also
present in 50% of Hodgkin’s lymphomas, a variable
but usually low percentage of T-cell lymphomas
(except midline granulomas, where the association is
100%), NK-cell leukemias, a small fraction of gastric
adenocarcinomas, and leiomyosarcomas that arise in
immunosuppressed (e.g., HIV-infected) patients. The
role of EBV in these malignant diseases is not clear.
A system is said to be at equilibrium if it is no longer
changing. Equivalently, an equilibrium is a state at
which the system will remain, barring any perturbations
away from it. The stability of an equilibrium
hinges on whether the system returns to the
equilibrium state following a perturbation. A stable
equilibrium can be either locally or globally stable,
depending on whether it shows stability after only
small or arbitrarily large perturbations, respectively.


Mathematical Criteria 


To formalize the concepts of equilibrium and of stable
and unstable equilibria for a system determined by a
single variable, suppose the value of the variable $x$ (say
the gene frequency at a locus) changes through time
such that after one generation its new value $x'$ is some
function, $f(x)$, of its value $x$ in the previous generation.
The variable $x$ then changes from one generation to
the next according to the recursion equation:

$$x' = f(x) \tag{1}$$


Equilibrium 

At ‘equilibrium,’ the variable $x$ is no longer changing,
i.e., the change in $x$ after one generation, $\Delta x = x' - x$,
equals zero. An equilibrium of this system is thus a
value $\hat{x}$ for $x$ satisfying the mathematical condition

$$f(\hat{x}) = \hat{x}$$


Locally Stable Equilibrium 

An equilibrium state $\hat{x}$ is ‘locally stable’ if the system
always returns to it following a slight perturbation.
This will hold if, whenever the value of the variable $x$ is
near $\hat{x}$, its next value $x'$ is closer to $\hat{x}$ or, equivalently,
after one generation the deviation from the equilibrium
$\hat{x}$ is less than it was previously. Formally,
local stability of $\hat{x}$ requires that:

$$|x' - \hat{x}| < |x - \hat{x}| \quad \text{whenever } x \approx \hat{x},\ x \neq \hat{x} \tag{2}$$

It is shown below that this condition holds if at $x = \hat{x}$
the derivative (rate of change) of the new value
of $x$ with respect to the old has magnitude less than
1, i.e.:

$$-1 < \frac{df(\hat{x})}{dx} < 1 \tag{3}$$

Under this condition, the equilibrium $\hat{x}$ is locally
stable because the value of the variable $x$ will always
return to the equilibrium value $\hat{x}$ if it is perturbed
slightly from that equilibrium.


Globally Stable Equilibrium 

An equilibrium $\hat{x}$ is called ‘globally stable’ if the value
of the variable $x$ always converges through time to $\hat{x}$
from all possible (nonequilibrium) starting values, no
matter how far away the system initially is from this
equilibrium.


Unstable Equilibrium 

An equilibrium $\hat{x}$ is called ‘unstable’ if the system
moves away from it following a slight perturbation.
This will hold if for values of the variable $x$ near $\hat{x}$, its
next value $x'$ is farther from $\hat{x}$ or, equivalently, after
one generation the deviation from the equilibrium $\hat{x}$ is
more than it was previously. Formally, instability of
$\hat{x}$ requires that the inequality in equation (2) fails for
$x$ values near $\hat{x}$. This will be the case whenever at $x = \hat{x}$
the derivative (rate of change) of the new value of $x$
with respect to the old has magnitude greater than 1,
i.e.:

$$\frac{df(\hat{x})}{dx} > 1 \quad \text{or} \quad \frac{df(\hat{x})}{dx} < -1 \tag{4}$$

Under this condition, the equilibrium $\hat{x}$ is unstable
because the value of the variable $x$ moves away from
the equilibrium value $\hat{x}$ if it is perturbed slightly from
that equilibrium.
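Criteria (3) and (4) can be checked numerically for any recursion. The sketch below is a minimal illustration, not part of the original treatment: it estimates $df(\hat{x})/dx$ with a central finite difference (an assumption; an exact derivative is preferable when one is available) and reports which condition holds. The names `slope_at` and `classify_equilibrium` are hypothetical.

```python
from typing import Callable


def slope_at(f: Callable[[float], float], x: float, h: float = 1e-6) -> float:
    """Central-difference estimate of df/dx at x."""
    return (f(x + h) - f(x - h)) / (2.0 * h)


def classify_equilibrium(f: Callable[[float], float], x_hat: float,
                         tol: float = 1e-9) -> str:
    """Apply criteria (3) and (4) to an equilibrium x_hat of x' = f(x)."""
    if abs(f(x_hat) - x_hat) > tol:
        raise ValueError("not an equilibrium: f(x_hat) != x_hat")
    m = abs(slope_at(f, x_hat))
    if m < 1.0 - 1e-6:
        return "locally stable"   # criterion (3): |f'(x_hat)| < 1
    if m > 1.0 + 1e-6:
        return "unstable"         # criterion (4): |f'(x_hat)| > 1
    return "undetermined"         # |f'(x_hat)| = 1: the linear test does not decide


# Usage: f(x) = x**2 has equilibria x_hat = 0 (slope 0) and x_hat = 1 (slope 2).
square = lambda x: x * x
print(classify_equilibrium(square, 0.0))  # locally stable
print(classify_equilibrium(square, 1.0))  # unstable
```

The borderline case $|df(\hat{x})/dx| = 1$ is reported separately because neither (3) nor (4) applies there.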


Example 
A simple selection model provides a useful example of
the concepts of equilibrium and stability. Consider
an autosomal locus with two alleles, $A_1$ and $A_2$, in a
population of haploid organisms where the frequency
of the $A_1$ allele in newborn individuals is $x$ and the
frequency of the alternate allele $A_2$ is $1 - x$. Suppose
a fraction $f_1$ of newborns carrying the $A_1$ allele and a
fraction $f_2$ of newborns carrying the $A_2$ allele survive
to reproduce, where $f_1 \neq f_2$, and that this is the only
evolutionary force acting on this locus.

The frequency of the $A_1$ allele after one generation
of selection is readily derived by working through a
complete generation of this organism. To assist with
this derivation, let us deal in terms of numbers, using
$N$ to denote the current number of newborn
individuals. The number of new adults of each type
after selection is then simply the number of newborns
of that type that survive to reproduce, as shown in
Table I.

The frequency of the $A_1$ allele in the new adults is
simply the fraction of adults carrying that allele:
$$\frac{f_1 x N}{f_1 x N + f_2 (1 - x) N}$$


Since the differential survival rates are assumed to be
the only force acting on this genetic locus, this
expression also gives the new frequency of the $A_1$ allele in the
new generation of zygotes. After canceling the
common factor $N$, the new allele frequency simplifies to:


$$x' = \frac{f_1 x}{f_1 x + f_2 (1 - x)} = f(x) \tag{5}$$

To find the equilibria in this system, we first find after
some straightforward algebra that the change in allele
frequency after each generation of selection is:

$$\Delta x = x' - x = \frac{f_1 x}{f_1 x + f_2 (1 - x)} - x = \frac{(f_1 - f_2)\, x (1 - x)}{f_1 x + f_2 (1 - x)} \tag{6}$$


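Since at equilibrium $\Delta x = x' - x = 0$, and $f_1 \neq f_2$, the
numerator of equation (6) must vanish, which happens
only when $x(1 - x) = 0$. There are thus two equilibrium
allele frequencies, $\hat{x} = 0$ (fixation for $A_2$, with only
$A_2$ alleles and no $A_1$ alleles) and $\hat{x} = 1$ (fixation for
$A_1$, with only $A_1$ alleles and no $A_2$ alleles).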

To determine when these two equilibria are locally
stable, we differentiate the right-hand side of the
recursion equation (5), which yields:

$$\frac{df(x)}{dx} = \frac{f_1 f_2}{[f_1 x + f_2 (1 - x)]^2}$$


This derivative is $f_1/f_2$ at $\hat{x} = 0$ and $f_2/f_1$ at $\hat{x} = 1$. Since
the two survival rates, $f_1$ and $f_2$, are nonnegative
fractions, the derivative is never negative, so only the
upper bound in criterion (3) matters. We conclude that
fixation for $A_2$ ($\hat{x} = 0$) is locally stable and fixation for
$A_1$ ($\hat{x} = 1$) is unstable if $0 < f_1 < f_2$, while fixation for
$A_1$ ($\hat{x} = 1$) is locally stable and fixation for $A_2$ ($\hat{x} = 0$)
is unstable if $0 < f_2 < f_1$. In other words, fixation for a
given allele is locally stable if individuals carrying that
allele have a higher survival rate than the other type in
the population. Fixation for the allele with the lower
survival rate is unstable.
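Because the derivative of recursion (5) is known exactly, the stability of the two fixation states can be read off directly. A minimal sketch follows; the function names and the survival rates $f_1 = 0.5$, $f_2 = 0.8$ are illustrative assumptions.

```python
import math


def selection_map(x: float, f1: float, f2: float) -> float:
    """Recursion (5): frequency of A1 after one generation of haploid selection."""
    return f1 * x / (f1 * x + f2 * (1.0 - x))


def selection_slope(x: float, f1: float, f2: float) -> float:
    """Exact derivative of recursion (5): f1 f2 / [f1 x + f2 (1 - x)]^2."""
    return f1 * f2 / (f1 * x + f2 * (1.0 - x)) ** 2


def fixation_stability(f1: float, f2: float) -> dict:
    """Classify the fixation equilibria x_hat = 0 and x_hat = 1."""
    if f1 <= 0 or f2 <= 0 or f1 == f2:
        raise ValueError("need positive, unequal survival rates")
    result = {}
    for x_hat in (0.0, 1.0):
        slope = selection_slope(x_hat, f1, f2)  # f1/f2 at 0, f2/f1 at 1
        result[x_hat] = "locally stable" if abs(slope) < 1 else "unstable"
    return result


print(fixation_stability(f1=0.5, f2=0.8))  # {0.0: 'locally stable', 1.0: 'unstable'}
```

Swapping the two survival rates swaps the labels, matching the rule that fixation for the better-surviving allele is the locally stable one.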

In this particular biological system, each equilibrium
is actually globally stable whenever it is locally
stable, because analysis of the sign of the allele
frequency change in equation (6) shows that, from any
starting frequency strictly between 0 and 1, the
frequency of the $A_1$ allele will always steadily increase
to 1 if $0 < f_2 < f_1$ and will steadily decrease to 0 if
$0 < f_1 < f_2$. Thus, the frequency of the allele conferring
the higher survival rate will always increase to 1 (and
that of the allele with the lower survival rate will always
decline to zero) under the simple selection scheme
for haploid populations considered here. Some sample
trajectories showing how the frequency of the $A_1$
allele changes through successive generations under
various parameter values are shown in Figure I. In
Figure IA, fixation for the $A_1$ allele is unstable and its
frequency always monotonically decreases to 0. In
Figure IB, fixation for the $A_1$ allele is locally (and
globally) stable and its frequency always monotonically
increases to 1.

Table I Generation cycle under selection

Type of individual        $A_1$          $A_2$
Number of newborns        $xN$           $(1 - x)N$
Survival rate             $f_1$          $f_2$
Number of new adults      $f_1 x N$      $f_2 (1 - x) N$

Figure I Trajectories through time in generations of
the frequency of the $A_1$ allele for various initial
frequencies ($x_0$) of $A_1$ and survival rates ($f_1$, $f_2$) of the
two alleles. [Panels A and B; horizontal axis: generation, 0 to 30.]
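Trajectories like those in Figure I can be generated by iterating recursion (5). The parameter values below are assumptions chosen only to reproduce the two qualitative behaviors; the values used for the figure are not given here. The sketch also checks a closed form that follows from (5): the odds $x/(1-x)$ are multiplied by $f_1/f_2$ every generation, so $x_t/(1-x_t) = (f_1/f_2)^t\, x_0/(1-x_0)$.

```python
def trajectory(x0: float, f1: float, f2: float, generations: int = 30) -> list:
    """Frequencies of A1 in generations 0..generations under recursion (5)."""
    xs = [x0]
    for _ in range(generations):
        x = xs[-1]
        xs.append(f1 * x / (f1 * x + f2 * (1.0 - x)))
    return xs


def closed_form(x0: float, f1: float, f2: float, t: int) -> float:
    """Same value from the odds x/(1 - x), which gain a factor f1/f2 per generation."""
    odds = x0 / (1.0 - x0) * (f1 / f2) ** t
    return odds / (1.0 + odds)


# Assumed parameter sets, in the spirit of the two panels of Figure I:
falling = trajectory(0.9, f1=0.6, f2=0.9)  # A1 disfavored: decreases toward 0
rising = trajectory(0.1, f1=0.9, f2=0.6)   # A1 favored: increases toward 1
print(round(falling[-1], 5), round(rising[-1], 5))
```

The odds form makes the global stability argument concrete: for any $0 < x_0 < 1$ the odds grow or shrink geometrically, so the frequency moves monotonically toward 1 or 0.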


Derivation of Local Stability Condition 

The local stability criterion in equation (3) follows by
noting that if the variable $x$ starts near the equilibrium
value $\hat{x}$, its new value $x'$ after one generation in
equation (1) can be approximated by the tangent line to the
function $f(x)$ at the point $x = \hat{x}$. The latter is also
the first-order Taylor polynomial approximation to
the function $f(x)$ near $\hat{x}$. Under this linear
approximation, we have:

$$x' \approx f(\hat{x}) + \frac{df(\hat{x})}{dx}(x - \hat{x}) \quad \text{for } x \approx \hat{x}$$

Remembering that $f(\hat{x}) = \hat{x}$ at any equilibrium for this
system, we immediately find that:

$$x' - \hat{x} \approx \frac{df(\hat{x})}{dx}(x - \hat{x}) \quad \text{for } x \approx \hat{x}$$

Taking absolute values gives $|x' - \hat{x}| \approx |df(\hat{x})/dx|\,|x - \hat{x}|$,
and thus the condition in equation (2) for local
stability reduces to the criterion given in equation (3).
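The linear approximation says that, near $\hat{x}$, the deviation from equilibrium is multiplied by roughly $df(\hat{x})/dx$ in each generation. A small numerical check, using the haploid selection recursion with assumed survival rates $f_1 = 0.5$ and $f_2 = 0.8$ (so the slope is $f_1/f_2 = 0.625$ at $\hat{x} = 0$ and $f_2/f_1 = 1.6$ at $\hat{x} = 1$); the helper name is hypothetical.

```python
from typing import Callable, List


def deviation_ratios(f: Callable[[float], float], x_hat: float,
                     x0: float, steps: int) -> List[float]:
    """Return |x_{t+1} - x_hat| / |x_t - x_hat| along the trajectory of x' = f(x)."""
    ratios = []
    x = x0
    for _ in range(steps):
        x_next = f(x)
        ratios.append(abs(x_next - x_hat) / abs(x - x_hat))
        x = x_next
    return ratios


f1, f2 = 0.5, 0.8  # assumed survival rates for the illustration


def recursion(x: float) -> float:
    # Haploid selection recursion: x' = f1 x / (f1 x + f2 (1 - x))
    return f1 * x / (f1 * x + f2 * (1.0 - x))


near_zero = deviation_ratios(recursion, 0.0, x0=0.01, steps=5)   # ratios close to 0.625
near_one = deviation_ratios(recursion, 1.0, x0=0.999, steps=3)   # ratios close to 1.6
print([round(r, 3) for r in near_zero])
print([round(r, 3) for r in near_one])
```

Ratios below 1 mean the perturbation shrinks (local stability); ratios above 1 mean it grows (instability), as the derivation predicts.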


See also: Balanced Polymorphism; Hardy- 
Weinberg Law 
When a population is in equilibrium, both genotype
and allele frequencies remain constant from one
generation to the next. If a population satisfies the
conditions necessary to ensure that genotypes are in
Hardy-Weinberg proportions, it follows that it is also
in equilibrium. Even if a population does not satisfy
the Hardy-Weinberg conditions, however, it may still
be in equilibrium. The frequency of recessive alleles
preventing individual monkeyflower plants from
producing pollen, for example, is likely to represent a
balance between the tendency of natural selection to
eliminate the recessive allele and recurrent mutation
that tends to increase its frequency. A population in
which such forces are balanced might be said to be
in dynamic equilibrium.


Populations in the Absence of Selection 


It is easiest to understand the concept of an equilibrium
population by considering what happens to allele
frequencies in a very large population from one
generation to the next. Suppose that the frequency of
an allele, $A_1$, at a particular locus is $p$ and the
frequency of the alternative allele at this locus, $A_2$, is
$q$ ($q = 1 - p$). If there are no differences among
individuals in the probability that they survive or in
the numbers of offspring that they produce, and if
there is no mutation, then clearly we will have the
same frequencies of $A_1$ and $A_2$ alleles in the next
generation as we have in this generation. Putting it
another way:

$$p_{t+1} = p_t$$

where $p_t$ refers to the allele frequency in the current
generation and $p_{t+1}$ refers to the allele frequency in the
next generation. A population is in equilibrium
whenever $p_{t+1} = p_t$.

Suppose we now allow for the possibility that
mutation can occur. Then we would normally expect
the allele frequency in the next generation to be
different from the allele frequency in the present
generation. Specifically, imagine that $A_1$ mutates to $A_2$
with a frequency $u$ and that $A_2$ mutates to $A_1$ with a
frequency $v$. Then:

$$p_{t+1} = (1 - u)\,p_t + v\,(1 - p_t)$$

Clearly, $p_{t+1}$ will not normally equal $p_t$, so the
population is not at equilibrium. But what if $p_t = v/(u + v)$?
Substituting gives $p_{t+1} = (1 - u)\,v/(u + v) + v\,u/(u + v) = v/(u + v)$,
so $p_{t+1}$ also equals $v/(u + v)$. Thus, $p_{t+1} = p_t$ and the
population is at equilibrium. At this equilibrium the rate
at which $A_1$ alleles give rise to $A_2$ alleles ($u\,p_t$) is equal
to the rate at which $A_2$ alleles give rise to $A_1$ alleles
($v\,(1 - p_t)$), so there is no net change in the frequency of
either allele.
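The approach to this mutational equilibrium can be sketched by iterating the recursion. Subtracting $\hat{p} = v/(u+v)$ from both sides gives $p_{t+1} - \hat{p} = (1 - u - v)(p_t - \hat{p})$, so the deviation shrinks by the factor $1 - u - v$ every generation. The rates below are assumed and deliberately exaggerated; real per-generation mutation rates are far smaller, and convergence is correspondingly slower. Function names are hypothetical.

```python
def mutation_step(p: float, u: float, v: float) -> float:
    """One generation of two-way mutation: p_{t+1} = (1 - u) p_t + v (1 - p_t)."""
    return (1.0 - u) * p + v * (1.0 - p)


def iterate_mutation(p0: float, u: float, v: float, generations: int) -> float:
    """Frequency of A1 after the given number of generations of mutation only."""
    p = p0
    for _ in range(generations):
        p = mutation_step(p, u, v)
    return p


u, v = 0.01, 0.005   # assumed, exaggerated rates (A1 -> A2 and A2 -> A1)
p_hat = v / (u + v)  # equilibrium frequency of A1, here 1/3

for p0 in (0.0, 0.5, 1.0):
    print(p0, round(iterate_mutation(p0, u, v, 3000), 6))
```

Every starting frequency, including fixation for either allele, ends at the same value, so this equilibrium is globally stable.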


Populations Undergoing Selection 


In a population undergoing selection the situation is a
bit more complicated, but the basic principle is the
same. A population is in equilibrium if allele
frequencies do not change from one generation to the next.
Norway rats in Great Britain, for example, evolved
partial resistance to the blood anticoagulant warfarin,
which has been used for rat control since World War II.
The resistance results from a mutation that would
normally be deleterious. When warfarin is present,
homozygotes for the susceptibility allele (SS)
survive only 68% as often, and homozygotes for the
resistance allele (RR) only 37% as often, as
heterozygotes (SR). Because heterozygotes are the
most likely to survive, natural selection maintains
both alleles in the population. Moreover, the
population will evolve from any initial frequency of S
other than 0 or 1 to a frequency of 0.66. Once that
allele frequency is attained, it will remain constant.
When the frequency of S equals 0.66, in other words,
the population is in equilibrium.

Warfarin resistance is an example of a general pattern
of selection known as heterozygote advantage or
overdominance. The relative survival abilities of the
genotypes are referred to as relative fitnesses.
Whenever a population is large and heterozygotes are more
likely to survive than homozygotes, natural selection
maintains both alleles in the population at an
equilibrium at which the frequency of $A_1$ is

$$p = \frac{1 - w_{22}}{2 - w_{11} - w_{22}}$$

where $w_{11}$ is the probability that the genotype
homozygous for $A_1$ survives relative to the probability that
the heterozygous genotype survives, and $w_{22}$ is the
probability that the genotype homozygous for $A_2$
survives relative to the probability that the
heterozygous genotype survives. For the warfarin example,
taking S as $A_1$, $w_{11} = 0.68$ and $w_{22} = 0.37$, so
$p = 0.63/0.95 \approx 0.66$.
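The overdominance formula can be checked against the warfarin figures by iterating one generation of viability selection with random mating (zygotes in Hardy-Weinberg proportions, fitnesses relative to the heterozygote). This is a minimal sketch under those standard assumptions; the function names are hypothetical.

```python
def overdominant_equilibrium(w11: float, w22: float) -> float:
    """Equilibrium frequency of A1: (1 - w22) / (2 - w11 - w22).

    Fitnesses are relative to the heterozygote (w12 = 1); overdominance
    means both w11 and w22 are below 1.
    """
    return (1.0 - w22) / (2.0 - w11 - w22)


def next_generation(p: float, w11: float, w22: float) -> float:
    """Frequency of A1 among survivors, starting from Hardy-Weinberg zygotes."""
    q = 1.0 - p
    w_bar = p * p * w11 + 2.0 * p * q + q * q * w22  # mean relative fitness
    return (p * p * w11 + p * q) / w_bar


# Warfarin example from the text: S = A1 (SS survives 0.68), R = A2 (RR survives 0.37).
w_ss, w_rr = 0.68, 0.37
p_hat = overdominant_equilibrium(w_ss, w_rr)
print(round(p_hat, 2))  # 0.66

for p0 in (0.05, 0.95):  # S rare, S common
    p = p0
    for _ in range(300):
        p = next_generation(p, w_ss, w_rr)
    print(p0, "->", round(p, 4))
```

Both starting points converge to the same frequency, which is why the text can say the population evolves to 0.66 from any initial frequency other than 0 or 1.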

Often mutations cause deleterious effects on the
individuals that carry them. If it were not for the fact
that mutation is introducing new copies of these
deleterious alleles, natural selection would tend to
eliminate them from populations. If the mutations recur
repeatedly, however, the population will approach an
equilibrium where the rate at which natural selection
eliminates deleterious alleles is exactly balanced by the
rate at which mutation reintroduces them, a
phenomenon often called mutation-selection balance. If the
relative survival probabilities of the favorable
homozygote, the heterozygote, and the deleterious
homozygote are denoted as $1$, $1 - hs$, and $1 - s$, respectively,
and $u$ is the rate of mutation to the deleterious allele,
then when the deleterious allele is completely recessive
($h = 0$) its equilibrium frequency is approximately:

$$q \approx (u/s)^{1/2}$$

When the deleterious allele is expressed in
heterozygotes ($h$ appreciably greater than 0) its frequency is
approximately:

$$q \approx \frac{u}{hs}$$
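The two approximations can be compared with a direct iteration of selection followed by one-way mutation (rate $u$ from the favorable to the deleterious allele, back mutation ignored). The sketch assumes that life-cycle order and the illustrative values $u = 10^{-5}$ and $s = 0.1$. With this particular ordering the recessive case settles exactly at $\sqrt{u/s}$, while the partially dominant case ($h = 0.1$) lands slightly below $u/(hs)$, within about 1%. Function names are hypothetical.

```python
import math


def next_q(q: float, u: float, s: float, h: float) -> float:
    """Selection (fitnesses 1, 1 - hs, 1 - s), then mutation at rate u to the deleterious allele."""
    p = 1.0 - q
    w_bar = p * p + 2.0 * p * q * (1.0 - h * s) + q * q * (1.0 - s)
    q_sel = (p * q * (1.0 - h * s) + q * q * (1.0 - s)) / w_bar
    return q_sel + u * (1.0 - q_sel)


def iterate_q(u: float, s: float, h: float, generations: int = 20_000) -> float:
    """Deleterious-allele frequency after many generations, starting from q = 0."""
    q = 0.0
    for _ in range(generations):
        q = next_q(q, u, s, h)
    return q


u, s = 1e-5, 0.1                # assumed illustrative values
q_rec = iterate_q(u, s, h=0.0)  # completely recessive
q_dom = iterate_q(u, s, h=0.1)  # partially expressed in heterozygotes (hs = 0.01)
print(q_rec, math.sqrt(u / s))
print(q_dom, u / (0.1 * s))
```

Note how much rarer the partially dominant allele is: selection acting on heterozygotes removes it far more efficiently than selection acting only on rare homozygotes.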


Stationarity in Finite Populations 


The frequency of alleles may change from one
generation to the next in small populations simply because
of random chance, a process referred to as genetic
drift. Over time genetic drift would lead to the loss
of genetic variability within populations. In fact, a
population would lose a fraction $1/(2N_e)$ of the
genetic variability it contains every generation if genetic
drift were the only process affecting the population,
where $N_e$ is the effective population size. Just as
recurrent mutation to a deleterious allele can prevent
its elimination by natural selection, however, mutation
can prevent the loss of genetic variability from small
populations.
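The drift-only loss rate compounds geometrically: after $t$ generations a fraction $(1 - 1/(2N_e))^t$ of the initial variability (measured here as heterozygosity, $2pq$) remains. The sketch below compares that formula with a small simulation of an idealized Wright-Fisher population, an assumed model (constant size, binomial sampling of gene copies, no mutation or selection) in which $N_e$ equals the number $N$ of diploid individuals. Function names and parameter values are illustrative.

```python
import random


def expected_heterozygosity(h0: float, ne: float, t: int) -> float:
    """Drift only: H_t = H_0 * (1 - 1/(2*Ne))**t."""
    return h0 * (1.0 - 1.0 / (2.0 * ne)) ** t


def wright_fisher_mean_h(p0: float, n: int, t: int, reps: int,
                         seed: int = 1) -> float:
    """Mean of 2pq after t generations in an idealized Wright-Fisher population.

    Each generation draws 2n gene copies binomially from the current
    frequency p (n diploid individuals, so Ne = n).
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(reps):
        p = p0
        for _ in range(t):
            copies = sum(1 for _ in range(2 * n) if rng.random() < p)
            p = copies / (2 * n)
        total += 2.0 * p * (1.0 - p)
    return total / reps


# Illustrative values (assumed): N = 50 diploids, p0 = 0.5, 20 generations.
theory = expected_heterozygosity(0.5, 50, 20)
simulated = wright_fisher_mean_h(0.5, 50, 20, reps=500)
print(f"expected {theory:.3f}, simulated {simulated:.3f}")
```

Individual replicates wander and some lose an allele entirely; only the average over replicates follows the smooth geometric decline.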

About $2N_e u$ new mutant alleles are introduced into a
diploid population of size $N_e$ every generation by
mutation (where $u$ is the mutation rate per gene copy per
generation), but there is no equilibrium between
mutation and drift in the same sense as there is between
mutation and selection. In a small population where
allele frequencies are subject to drift, they will tend to
change in every generation. Nonetheless, the
‘probability’ that a population will have a particular allele
frequency will eventually stop changing. When it does, we
say that the population has reached stationarity.

Stationarity in a small population is the analog of 
equilibrium in large ones. Although it may be very 
difficult to calculate the probability that a population 
has a particular allele frequency, populations will 
almost always approach stationarity if rates of mutation,
selection, and migration remain constant and if
the population persists for a long enough period of
time: about $4N_e$ generations, on average.


Applicability of Equilibrium Concepts 


In real populations of plants or animals it is rarely, if 
ever, the case that rates of mutation, migration, and 
selection remain constant for long periods. As a result, 
real populations are rarely, if ever, exactly at equilib- 
rium (or at stationarity, if small). Nonetheless, the 
features of equilibrium populations play an important 
role in evolutionary theory, both because sometimes 
the variation in evolutionary forces is small enough 
that the assumption of equilibrium is not far wrong 
and because the investigation of equilibrium condi- 
tions allows us to infer the direction in which evolu- 
tion is likely to proceed. Even if we knew only that 
rats heterozygous for warfarin resistance were more 
likely to survive than those homozygous for either 
allele, we could predict that natural selection would 
tend to maintain both alleles in populations exposed to 
warfarin. Neither will this part of the outcome be
affected if the survival probabilities of genotypes differ
from one generation to the next, provided that
heterozygotes are always most likely to survive.

In many genetic models of the evolutionary process 
two types of equilibria are encountered: stable equilib- 
ria and unstable equilibria. Although a population 
with an allele frequency that matches the frequency 
of either type of equilibrium will not change in later 
generations, populations tend to evolve away from 
unstable equilibria and tend to evolve toward stable 
equilibria. Small differences between a population’s 
allele frequency and the allele frequency at an unstable 
equilibrium are magnified from one generation to the 
next, while allele frequency differences from a stable
equilibrium are decreased in every generation. Similarly,
small populations will tend to change in ways
that cause them to have allele frequencies that are 
associated with high probabilities at stationarity. 


Further Reading 

Crow JF and Kimura M (1970) An Introduction to Population
Genetics Theory. Minneapolis, MN: Burgess. 

Hartl DL and Clark AG (1997) Principles of Population Genetics, 
3rd edn. Sunderland, MA: Sinauer Associates. 

May RM (1985) Evolution of pesticide resistance. Nature 315: 
12-13. 

Willis JH (1999) The contribution of male-sterility mutations 
to inbreeding depression in Mimulus guttatus. Heredity 83: 
337-346. 


See also: Effective Population Number; 
Equilibrium; Fitness; Genetic Drift; 
Hardy-Weinberg Law; Heterogenote 


erbA and erbB in Human 
Cancer 
J Y-K Lau 


Copyright © 2001 Academic Press 
doi: 10.1006/rwgn.2001.1566 


The tyrosine kinase pathway constitutes a very
important cellular signal transduction pathway.
Tyrosine kinases can be grouped into two classes:
receptor tyrosine kinases and nonreceptor tyrosine
kinases (without extracellular binding domains).
When cellular tyrosine phosphorylation is enhanced,
for instance, by the binding of a growth factor to a
receptor tyrosine kinase, this triggers a cascade of
downstream signals, thereby affecting many different
cellular functions. Importantly, many cellular tyrosine
kinases are products of proto-oncogenes, and their
aberrant expression has been associated with many
different human cancer types. One of the best-studied
families of tyrosine kinases is the epidermal growth
factor receptor (EGFR) family. This erbB family consists
of four different types of receptor tyrosine kinase:
erbB-1 (also known as EGFR), erbB-2 (HER-2/neu),
erbB-3 (HER-3), and erbB-4 (HER-4). The first two types
have been well studied and characterized in human cancer.
Amplification/overexpression of erbB-1 and erbB-2
has been associated with different types of human
cancer, for example, breast cancer, lung cancer, and
head and neck squamous cell carcinoma. erbB-1 is a
transmembrane tyrosine kinase receptor. The erbB-1
protein is composed of two cysteine-rich extracellular
domains and an intracellular tyrosine kinase domain.
It shares extensive sequence homology with erbB-2.
erbB-1 is expressed throughout development and in a
variety of cell types. Several ligands, such as TGF-α
and amphiregulin, can bind to the 170-kDa cell-surface
erbB-1, resulting in activation of its intrinsic
kinase activity. In the presence of its ligands,
overexpression of erbB-1 can transform mouse fibroblasts,
indicating its potential role in oncogenesis.
In some types of human cancer, the expression level of
erbB-1 is significantly associated with the tumor stage
and size. In addition, antibodies against erbB-1 have
been shown to inhibit tumor growth in experimental
studies. This indicates that erbB-1 may play a
significant role in oncogenesis in some human cancer types.
The erbB-2 gene encodes a transmembrane protein
of 185 kDa with intrinsic tyrosine kinase activity.
Amplification/overexpression of the erbB-2 oncogene
is found in 20-30% of cases of human breast
cancer. Its overexpression is also found with high
frequency in ovarian, lung, gastric, and oral cancer,
suggesting that erbB-2 overexpression may play an
important role in the development of human cancer.
In an experimental model, transfection of the normal
erbB-2 gene into cells expressing erbB-2 at low levels
can enhance metastatic potential by promoting
multiple steps associated with metastasis, such as increased
cell migration and in vitro invasiveness. Unlike for erbB-1,
no ligand that binds directly to the erbB-2 protein has
been clearly identified. A mutation in the transmembrane
domain, or overexpression, can result in
constitutively activated erbB-2. This is likely due to
enhanced formation and stabilization of receptor dimers,
which keeps the protein in the activated state. When
erbB-2 protein is activated, it can interact with many
different cellular proteins, such as mitogen-activated
protein (MAP) kinase, Shc, PLC-γ, GAP, and PI3
kinase, that mediate the signal transduction pathway.
Members of the erbB family have been shown to be
able to form heterodimers and transphosphorylate in
response to NDF (also known as heregulin) or EGF. It
has been found that the erbB-3 gene product is a
receptor for NDF and that coexpression of erbB-2 and
erbB-3 reconstitutes a high-affinity receptor for
NDF. NDF can also stimulate cell proliferation in
breast and ovarian cancer cell lines. A recent report
has shown that NDF can stimulate mitogenesis, but not
transformation, in NIH 3T3 cells that express either
erbB-3 or erbB-4. However, when either erbB-1 or
erbB-2 is coexpressed with either erbB-3 or erbB-4,
NDF can induce cellular transformation. These data
indicate that different members of the EGFR family may
have different signaling pathways. The findings also
imply that erbB-1 and erbB-2 may play an important
role in cellular transformation.

With the elucidation of the significant role of
erbB-1 and erbB-2 in the pathogenesis of human
cancer, different approaches have been used to target the
signal transduction pathway of these two oncogenes
and their expression level in cancer cells. For example,
a recombinant humanized monoclonal antibody against
erbB-2, Herceptin (trastuzumab), has been used in
different clinical trials with some encouraging results.
In a series of in vitro and animal experiments,
adenovirus E1A protein successfully repressed the
expression level of erbB-2, and there was a significant
improvement in the survival of mice with
erbB-2-overexpressing tumors that were treated with E1A.
In addition, tyrosine kinase inhibitors, such as
tyrphostin and emodin, have been used to block the
erbB-1 and erbB-2 tyrosine kinase activities. Thus,
different strategies that target either the expression
level of these oncogenes or their signal transduction
pathways have been employed with considerable success,
providing a novel and hopefully better therapeutic
option for those suffering from cancer.


See also: Cancer Susceptibility; Oncogenes 
The error catastrophe is a conjecture in search of
experimental verification. The initial form of the
conjecture arose at an early stage in the analysis of the
genetic code, when Leslie Orgel devised the following
syllogism. (1) Translation of the genetic code must
be afflicted by some nonzero frequency of error. (2)
The devices that translate the genetic code are
themselves proteins (e.g., aminoacyl-tRNA synthetases,
translation factors, ribosomal proteins) and they will
themselves contain errors. (3) Therefore, error rates of
gene expression are intrinsically unstable because they
are autocatalytic. This conclusion follows from the
supposition that the error frequencies of translation
are enhanced by the errors already incorporated into
the proteins of the translational apparatus itself.
Accordingly, the more errors that have been
incorporated into the proteins of the translation system, the
more errors the translation system will make. Clearly,
the catastrophic implication is that at some point the
autocatalysis of translation errors will get out of hand
and the translation system will be unable to generate
canonical gene products. Thus, a destructive, positive
feedback loop fueled by the errors of translation leads
inexorably to the death of cells.

According to the original formulation of the error 
catastrophe, the question is not whether such an error 
catastrophe will occur. It is taken to be inevitable. 
Rather, the question is how long it takes before the
catastrophe erupts. Of course, the underlying appeal of this
scenario is that it provides a simple, molecular explan- 
ation of senescence and death at the cellular level. 

Orgel and others soon recognized that the error 
catastrophe is not inevitable if the magnitude of the 
coupling between errors in proteins and errors of 
translation is sufficiently small. In other words, if the 
feedback between successive rounds of translation 
errors is contained within sufficient bounds, the 
error rate of translation will be stable, i.e., not inclined 
to catastrophes. 
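The difference between a stable and a catastrophic error rate can be illustrated with a deliberately simplified linear feedback model, which is an assumption of this sketch rather than Orgel's own formulation. Suppose each round of protein synthesis has an error frequency equal to a baseline rate `baseline` plus a fraction `alpha` of the error frequency already present in the translation machinery, e[n+1] = baseline + alpha · e[n]. When alpha < 1 the error frequency settles at baseline / (1 − alpha); when alpha ≥ 1 it grows without bound until every residue is wrong. Both parameter names and values are hypothetical.

```python
def error_trajectory(baseline, alpha, rounds, start=None):
    """Iterate e[n+1] = baseline + alpha * e[n].

    baseline: error frequency the machinery would show if its own proteins
              were error-free (hypothetical parameter).
    alpha:    feedback coefficient coupling errors already in the machinery
              to new errors (hypothetical parameter).
    The frequency is capped at 1.0, i.e. every residue wrong.
    """
    e = baseline if start is None else start
    history = [e]
    for _ in range(rounds):
        e = min(1.0, baseline + alpha * e)
        history.append(e)
    return history


def steady_state(baseline, alpha):
    """Fixed point baseline / (1 - alpha); it exists and attracts only when alpha < 1."""
    if alpha >= 1:
        return None
    return baseline / (1 - alpha)


# Weak coupling: errors converge to a stable level.
stable = error_trajectory(baseline=1e-4, alpha=0.5, rounds=50)
# Strong coupling: errors feed on themselves until the cap is reached.
runaway = error_trajectory(baseline=1e-4, alpha=1.5, rounds=50)
print(stable[-1], steady_state(1e-4, 0.5))
print(runaway[-1])
```

In this toy model, the experimental question described below amounts to asking whether the measured feedback term is comfortably below 1.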

From the 1960s through the 1980s, a great variety of 
studies were aimed at testing the prediction that aging 
(either in whole organisms or in cultured cells, a favor- 
ite model system) is accompanied by increasing errors 
in protein synthesis. The overwhelming majority of 
technically adequate studies detected no such increase. 
A smaller number of studies sought to evaluate the 
formal characteristics of error feedback. Most of these 
studies, utilizing bacteria, demonstrated that the error 
feedback term was indeed small, and that normal as 
well as artificially enhanced translation error frequen- 
cies are stable. 

The failure to detect signs of the error catastrophe 
either in aging test subjects or under conditions of 
experimentally enhanced translational error led to a 
frustrating situation for the experimentalist. A propon- 
ent could always argue that it had not been proven that 
the error catastrophe never occurs. The experimental- 
ist, realizing that it is in principle impossible to find 
such a proof, was then obliged to return to theory in 
order to find out what made the error catastrophe so 
elusive. 

The underlying assumption of the error catas- 
trophe is that errors of protein construction inexor- 
ably increase the errors of protein function. Such a 
state of affairs is encountered if the accuracy of protein 
function has evolved to an absolute maximum. At 
such a maximum, changes in protein structure cannot
improve the accuracy of function. Rather, structural
changes can only make the accuracy of function
worse or at best leave it unchanged. In contrast, it has 
been known for many years that hyperaccurate muta- 
tions in ribosomes are easily obtained by selection 
with antibiotics such as streptomycin. These
ribosomes contain alterations in their proteins. If
ribosomal proteins can mutate so that translation is
carried out at much higher accuracy than that
supported by wild-type ribosomes, then wild-type
ribosomes cannot be operating at maximum accuracy.

Since some mutations can increase translation 
accuracy while others decrease it, the net effect of 
errors in the construction of ribosomes could well
be to cancel each other out. This would account for the
apparent stability of the error rates of both wild-type
and error-prone mutant ribosomes. It would also be
consistent with measurements of substitution errors
in ribosomal proteins, which suggest that a normal
ribosome, containing roughly 7800 amino acids in its
proteins, carries on average three to ten erroneous
amino acid substitutions. Thus, no two ribosomes are 
completely alike and the bacterial cell’s entire ribo- 
some population normally provides an experiment in 
error feedback. These ribosomes do not generate a 
catastrophic cascade of errors in translation, presum- 
ably because the influence of these errors is to neutral- 
ize one another. 
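The figure of three to ten substitutions per ribosome can be converted into a per-residue error rate. Under the additional assumption, made only for this sketch, that errors occur independently so that their number per ribosome is Poisson distributed, it also gives the fraction of ribosomes expected to be entirely error-free. With the 7800-residue figure from the text, this works out to roughly 4 × 10^-4 to 1.3 × 10^-3 errors per residue, and to about 5% error-free ribosomes at a mean of three errors but fewer than 1 in 20,000 at a mean of ten. That is consistent with the statement that no two ribosomes are completely alike.

```python
import math

RESIDUES_PER_RIBOSOME = 7800  # figure quoted in the text


def per_residue_error(mean_errors, residues=RESIDUES_PER_RIBOSOME):
    """Average substitution frequency per amino acid residue."""
    return mean_errors / residues


def prob_error_free(mean_errors):
    """Poisson probability of zero substitutions in one ribosome (independence assumed)."""
    return math.exp(-mean_errors)


for mean in (3, 10):  # the range of errors per ribosome given in the text
    print(f"{mean} errors/ribosome: "
          f"per-residue rate ~{per_residue_error(mean):.1e}, "
          f"error-free fraction ~{prob_error_free(mean):.1e}")
```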


See also: Aging, Genetics of; Ribosomes 
Historical Background


The first recorded case of hemolytic disease of the 
newborn was described in 1609, but it was not until 
1932 that hydrops fetalis, jaundice, and kernicterus 
were shown to be part of the same disease associated 
with hemolytic anemia, extramedullary erythropoiesis, 
hepatomegaly, and erythroblastosis. These features 
were collectively known as erythroblastosis fetalis. 
By 1939, Levine and Stetson had demonstrated the 
involvement of the Rh antigen, but it was not until 
1954 that Chown proved that the fetal hemolysis was 
caused by the production of maternal anti-RhD
alloimmune antibodies. Since 1970 prevention of the
disease has been possible by routinely giving anti-RhD
γ-globulin to nonsensitized RhD-negative mothers
immediately following the birth of an RhD-positive
child. The antibody removes fetal cells from the mater- 
nal circulation before they can cause sensitization. 


Rh Blood-Group System 


Depending on the presence or absence of D antigen
on the red blood cell surface, individuals are classified
as Rh-positive or Rh-negative. In addition to the D
antigen there are two other major antigen pairs, C/c and
E/e, which have important clinical implications. Unlike
these, however, D has no apparent antithetical d
antigen: ‘d’ simply denotes the absence of ‘D’.
In the case of C/c and E/e, both the upper- and
lowercase letters indicate the presence of a serologically
definable antigen.

The genes encoding these three sets of antigens are 
inherited together rather than randomly. Earlier in- 
vestigators therefore proposed a single gene where 
recombination would only rarely occur. When indi- 
viduals totally lack these antigens, they usually have 
membrane instability, suggesting that the Rh antigens 
have major physiological importance. The ethnic
incidence of the RhD-negative phenotype varies
considerably, being about 15% in Caucasians, 35% in
Basques, and virtually zero in Chinese and Japanese
populations.

The nomenclature of the Rh system is confusing, 
but the most common system used is the Fisher–Race
system, which was based on the theory that the Rh
system locus consists of three genes with antithetical
alleles C/c, D/d, and E/e. The haplotypes are
described in triplets, CDe, cde, and cDE being the
most frequent.


Molecular Basis of Rh Antigens 


There were found to be approximately 60000 Rh 
polypeptides per erythrocyte. When the isolated 
Rh polypeptides were digested and analyzed by elec- 
trophoresis, the variations in the degradation patterns 
indicated that RhD is distinct from the C/c and E/e 
polypeptides, with the former having a molecular 
weight of 31.9 kDa and the latter two each having a
molecular weight of 33.1 kDa. These studies therefore
showed that the RhC/c, RhD, and RhE/e polypeptides
were very similar though distinct proteins.

Cloning of the Rh polypeptides was complicated 
as the monoclonal antibodies that had been developed 
to identify the different Rh antigen sites on the red 
cell membrane were not suitable for identification of 
Rh polypeptides expressed from cDNA expression 
libraries. Oligonucleotide probes for isolating Rh 
cDNAs were designed from partial amino acid
sequence data of isolated polypeptides. In 1990 two
groups of workers (Avent et al., 1990; Cherif-Zahar et al., 1990)
used the polymerase chain reaction (PCR) with oligo- 
nucleotide primers from segments of N-terminal 
amino acid sequence to amplify cDNA templates pre- 
pared from thalassemic spleen erythroblasts or per- 
ipheral reticulocytes. These PCR products were then 
hybridized to commercially available cDNA libraries. 
The open reading frame sequences were found to
be identical for each group, and in situ hybridization
confirmed localization to chromosome 1p34.3–1p36.1,
a location that had been suggested by linkage data
some years earlier. The first cDNA clone
proved to encode both the C/c and E/e proteins. By 
1992 Le Van Kim and colleagues (Le Van Kim et al., 1992)
reported the isolation of the RhD polypeptide, and 
through restriction fragment length polymorphism 
analysis showed RhD-positive individuals had two 
polypeptide genes, and RhD-negative individuals 
had just one. The conclusion was that the RH gene 
locus consists of two highly homologous, closely 
linked genes, one of which encodes both the C/c and 
E/e proteins. The other gene encodes the RhD pro- 
tein, which is absent in RhD-negative individuals. 


Structure of the Rh cDNAs 


The RhD and RhCcEe cDNAs consist of 10 exons, exons
1–9 being almost identical. Exon 10 of the RhD cDNA
contains regions of divergence with an Alu repeat
element. Subsequently it was demonstrated that
the RhCcEe gene encoded both the E/e and C/c
polypeptides by differential splicing of the primary
mRNA transcript. The RhE polypeptide is synthesized
from a full-length transcript of the RhCcEe
gene. The 417-amino-acid polypeptide is the same
length as the RhD polypeptide and has a very similar
sequence. The difference between the antithetical
E and e epitopes depends on a single point mutation in
exon 5 at codon 226, which places a proline in the E
polypeptide and an alanine in the e polypeptide.

The C/c polypeptides are synthesized from at least
two different truncated transcripts that have exons 4,
5, and 6, or exons 4, 5, and 8 spliced out. The
encoded polypeptides are identical to one another and
to the E/e polypeptide at the N-terminus. The C-terminus is
either identical to the same region of E/e but has
reverse orientation in the membrane, or has a novel
protein sequence as a consequence of the introduction
of a frameshift by the splicing of exon 8. The difference
between C and c is due to a series of six point muta- 
tions in exons 1 and 2, two being silent and four that 
result in amino acid substitutions. 

The molecular basis of a number of rare RhD posi- 
tive/negative variants has now been identified. Many 
are due to substitution of parts of the RhD gene 
sequence into the RhCcEe gene or vice versa to form 
‘hybrid’ genes. 


Function of the Rh Polypeptides 


Even though the predominant interest regarding Rh 
polypeptides is in their role as antigens, it is probable 
that they play a crucial role in the physiology of 
red cell membranes that is quite unrelated to their 
antigenicity. Their function is not clearly defined, but the
multiple membrane-spanning domains of the Rh
polypeptides suggest a transporter protein. Erythrocytes
from all of the common Rh phenotypes are normal, but 
the membrane defects seen in those with the Rhnull 
phenotype provide some clues to their function. 
Rhnull individuals have a mild to moderate hemo- 
lytic anemia (never severe) suggesting that the Rh 
system may act as a fine-tuning mechanism in mem- 
brane stability; however, the exact mechanisms are not 
clear. 


Prenatal Determination of the RhD Phenotype


Approximately 56% of Rh-positive subjects are het- 
erozygous for the D antigen. When the mother is 
RhD-negative and the father is heterozygous RhD- 
positive there is a 50% chance that the fetus will be 
RhD-negative and so not at risk for erythroblastosis 
fetalis. Previously in clinical practice, prenatal deter- 
mination of the RhD phenotype where the father 
was heterozygous for the trait involved fetal blood 
sampling with serological Rh typing, resulting in a
1–2% fetal loss rate and a 40% risk of fetomaternal
hemorrhage, which may also have increased the risk 
of sensitization. An alternative method is serial 
amniocentesis for quantitation of bilirubin in amniotic 
fluid. This technique is unable to distinguish an 
RhD-positive fetus that is mildly affected from an 
RhD-negative one. It also potentially exposes the 
fetus to multiple invasive procedures. 
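The figure of about 56% is consistent with the 15% RhD-negative frequency quoted earlier for Caucasians. This holds if one assumes Hardy–Weinberg equilibrium and treats the RhD-negative phenotype as homozygosity for the RHD-absent haplotype (dd), in line with the molecular findings above; both are assumptions of this sketch. The same arithmetic gives the 50% figure for a heterozygous father. When the father's genotype is unknown, the chance that the fetus is RhD-negative falls to roughly 28%.

```python
import math


def heterozygous_fraction_of_positives(rh_negative_freq):
    """Share of RhD-positive people who are Dd, assuming Hardy-Weinberg proportions."""
    q = math.sqrt(rh_negative_freq)       # frequency of the d (RHD-absent) haplotype
    p = 1.0 - q                           # frequency of D
    heterozygotes = 2 * p * q             # Dd genotype frequency
    rh_positive = 1.0 - rh_negative_freq  # DD + Dd
    return heterozygotes / rh_positive


def chance_fetus_rhd_negative(p_father_heterozygous):
    """dd mother: a Dd father transmits d half the time; a DD father never does."""
    return 0.5 * p_father_heterozygous


het = heterozygous_fraction_of_positives(0.15)
print(f"heterozygous among RhD-positive: {het:.1%}")
print(f"fetus RhD-negative, father known to be Dd: {chance_fetus_rhd_negative(1.0):.0%}")
print(f"fetus RhD-negative, father's genotype unknown: {chance_fetus_rhd_negative(het):.0%}")
```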

The ideal strategy for prenatal determination of fetal 
RhD phenotype would be to generate a pair of PCR 
primers that only amplified a specific region of the 
RhD gene without cross hybridization to the RhCcEe
gene or any other gene. A second amplification, ideally 
of the RhCcEe gene, should be performed in duplex. 
This strategy is difficult to design in practice as there is 
a high degree of homology between the RhCcEe 
and D genes. Exon 10 of the RHD gene demonstrates 
a region of divergence with a copy of the Alu repeat 
motif which is found in a large number of other genes 
and noncoding sequences. Bennett et al. (1993)
reported the first use of PCR for prenatal determina- 
tion of fetal RhD using primers at the 5’ extreme end 
of exon 10 in RHD. The 5’ primer lay within a region 
of 100% homology between RhCcEe and RHD, but it 
was the 3’ primer, designed to an RhD-specific region, 
that gave the specificity to the reaction. Control
primers from sequences in exon 7, which amplified a
134-bp product from both RhCcEe and RHD, served
as an internal control in the duplex reaction.
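The reasoning behind the duplex design can be written out as a small decision rule. The exon 7 product is expected from every sample, because RhCcEe is always present. So absence of the RhD-specific product counts as evidence of an RhD-negative fetus only when the control product has amplified. The function and its names below are a hypothetical illustration of that logic, not a published scoring scheme. In particular, treating any reaction without the control band as uninterpretable is a choice made for this sketch.

```python
from enum import Enum


class Call(Enum):
    RHD_POSITIVE = "RhD-positive"
    RHD_NEGATIVE = "RhD-negative"
    FAILED = "no result (control did not amplify)"


def interpret_duplex(rhd_specific_band: bool, control_band: bool) -> Call:
    """Interpret one duplex reaction: an RhD-specific product plus a control
    product amplified from both RhCcEe and RHD (hypothetical helper)."""
    if not control_band:
        # Without the control product the reaction cannot be trusted,
        # so this sketch reports it as uninterpretable either way.
        return Call.FAILED
    return Call.RHD_POSITIVE if rhd_specific_band else Call.RHD_NEGATIVE


for bands in [(True, True), (False, True), (False, False)]:
    print(bands, interpret_duplex(*bands).value)
```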

The original report was based on 15 samples of
amniotic fluid cells, with the fetal RhD type also
confirmed by fetal blood sampling. Several groups of
workers have used different primers, but it would
appear that the original exon 7 and exon 10 primers are
able to predict the RhD serotype consistently in all
cases. Nevertheless, it is still important to use two
different primer sets, designed from different parts
of the RhCcEe and RhD genes, and these can be
combined successfully in a single multiplex reaction.
Trophoblastic tissue has been shown not to express the
RhD antigen, so it is not useful as a form of prenatal
diagnostic test for genotyping.


Noninvasive or Minimally Invasive Prenatal Determination of RhD Type


As the maternal RhD-negative DNA should not act 
as a template for PCR, PCR should only amplify a 
product if there is fetal RhD-positive DNA present. 
Several groups have attempted to extract fetal DNA 
from the maternal circulation with variable success, 
possibly in some circumstances due to the rapid clear- 
ance of RhD-positive cells from the circulation of 
sensitized RhD-negative women. There is evidence
that fluorogenic PCR analysis of fetal DNA extracted
from the maternal circulation is reliable from the
second trimester but less reliable in the first
trimester, when samples can give false-negative
results, presumably because of the low concentration
of fetal DNA in maternal plasma at that stage. Other
noninvasive tests, such as harvesting fetal cells from
endocervical mucus, have not been shown to be
sufficiently reliable for use in clinical practice.
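Why a low concentration of fetal DNA should produce false negatives can be illustrated with Poisson sampling. This is offered as an illustration, not as something the studies above report. The sketch assumes fetal RHD fragments are scattered at random among reactions, and that any single copy present would be detected. Under those assumptions, the chance that a reaction contains no fetal template at all is exp(−λ), where λ is the mean number of fetal copies per reaction. The copy numbers used are hypothetical. At a mean of 0.5 copies, about 61% of reactions would be blank. At 2 copies the figure is still about 14%, whereas at 10 copies the risk becomes negligible.

```python
import math


def prob_no_fetal_template(mean_copies_per_reaction: float) -> float:
    """Poisson probability that a PCR reaction receives zero fetal RHD copies.

    Assumes fetal DNA fragments are distributed at random among reactions and
    that any single copy present would be amplified and detected.
    """
    return math.exp(-mean_copies_per_reaction)


# Hypothetical mean numbers of fetal RHD copies per reaction, low to high.
for copies in (0.5, 2.0, 10.0):
    print(f"{copies:>4} copies/reaction -> "
          f"false-negative risk {prob_no_fetal_template(copies):.3f}")
```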


Preimplantation Determination of RhD Type


For sensitized women with heterozygous partners
who have experienced recurrent miscarriages or serial
transfusions and are unable to cope with further affected
pregnancies, and also for those who have had severely
affected pregnancies prior to conventional therapy,
preimplantation determination of embryonic RhD
type after in vitro fertilization and before embryo
transfer is a possible option. Molecular diagnosis of
RhD from the DNA present in a single diploid human
cell requires amplification by ‘nested PCR.’ A low
number of cycles is first run with an outer set of
oligonucleotide primers. The second round of PCR is
then performed on a small aliquot from the first
reaction, using more amplification cycles than in the
first reaction and primers nested internally. Using a
common primer in the outer reaction, rather than two
sets of primers for duplex PCR, reduces the incidence
of locus-specific amplification when working with a
single-cell diploid genome.
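The arithmetic behind the two-round design can be sketched with the idealized PCR growth law, copies = start × (1 + efficiency)^cycles. The cycle numbers, aliquot fraction, and efficiency below are hypothetical values chosen only for illustration, since the text does not give them. Real reactions also lose efficiency and plateau. The single starting copy reflects an RhD-positive heterozygous embryo, whose single cell carries one RHD copy.

```python
def copies_after(start_copies: float, cycles: int, efficiency: float = 1.0) -> float:
    """Idealized exponential PCR: each cycle multiplies product by (1 + efficiency)."""
    return start_copies * (1.0 + efficiency) ** cycles


# Hypothetical protocol values, for illustration only.
OUTER_CYCLES = 15          # low number of cycles with the outer primers
ALIQUOT_FRACTION = 0.01    # share of the first reaction carried into the second
INNER_CYCLES = 30          # higher number of cycles with the nested primers
EFFICIENCY = 0.9           # assumed per-cycle efficiency

single_cell = 1  # one RHD copy, e.g. a heterozygous (Dd) embryo
after_outer = copies_after(single_cell, OUTER_CYCLES, EFFICIENCY)
carried_over = after_outer * ALIQUOT_FRACTION
after_inner = copies_after(carried_over, INNER_CYCLES, EFFICIENCY)
print(f"after outer round: {after_outer:.3g} copies")
print(f"carried into inner round: {carried_over:.3g} copies")
print(f"after inner round: {after_inner:.3g} copies")
```

Under these assumptions, the first round turns one target copy into roughly 10^4 copies. Even a 1% aliquot then gives the inner round on the order of a hundred templates rather than a single molecule.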


To differentiate the two genes, one inner primer was
designed to anneal to RhD sequences and the
second to RhCcEe sequences, at different but
overlapping sites, so that the RhCcEe amplification
product differs in size from the RhD product. This
method would eliminate the risk of an RhD-positive
embryo being missed.
The science of genetics has benefited from concentrated
studies on a relatively small number of living
systems — so-called paradigm or model organisms. 
Examples include the laboratory (or house) mouse 
(Mus musculus), the fruit fly (Drosophila melano- 
gaster), the nematode worm (Caenorhabditis elegans), 
the protozoan Paramecium (Paramecium aurelia), and 
the bread mold (Neurospora crassa). The enteric bac- 
terium Escherichia coli has been among the model 
organisms of genetics ever since the middle of the 
twentieth century. Study of E. coli and its viruses has 
contributed much information to fundamental genet- 
ics, including the nature of the genetic material, the 
molecular definition of genes, and the mechanisms of 
their function and regulation. The biotechnology 
industry was founded on the basis of discoveries
about the genetics of E. coli, and the organism itself 
continues to serve many important roles in biotech- 
nology processes. 


Escherichia coli and Its Life 


E. coli is a rod-shaped bacterium measuring a few
microns in length and 0.5 μm wide. Being a
prokaryote, it lacks a nuclear membrane. Its 4290 genes
reside on a single circular, double-stranded DNA
molecule tightly packed within the cytosol of the cell
(Figure 1). E. coli grows rapidly in simple media
(generation time of the order of 1 h) and reproduces
by binary fission. A double-membrane envelope gives 
the cell a gram-negative staining characteristic. 
Because it is a facultative anaerobe (able to grow anaer- 
obically by fermentation and aerobically by oxida- 
tion), it is admirably suited for its main ecological 
niche — the intestine of humans and other animals, 
where it universally constitutes a part of the normal 
flora. The genus to which E. coli belongs was named 
after Theodor Escherich, an early bacteriologist. E. coli 
is said to be an enteric bacterium because its major 
habitat is the intestine (enteron) of humans and 
other animals. There are several other species of 
Escherichia, but none comes close to sharing the 
research spotlight with E. coli. The closest relatives 
to E. coli seem to be the several species of the genus 
Shigella, many of which are human pathogens, but it is 
also quite similar to the mouse and human pathogen 
Salmonella. 

Although resident within the ileum (rather than the 
colon, as its name might imply), cells of E. coli must 
perforce survive conditions external to animals well
enough to ensure successful passage from one
individual to another. Humans are colonized almost
immediately after birth, generally by the E. coli strain
inhabiting the mother; every few months the resident
strain is replaced by another strain from the
environment. A few strains are pathogenic; some cause
genitourinary infections, and some are responsible
for traveler’s diarrhea. Great attention has been
directed toward the exceptional strains that produce a
potent toxin that can cause a fatal or near-fatal
septicemia when ingested from contaminated water
or food.


How Escherichia coli Became a Paradigm for Genetic Studies


That E. coli, or any bacterium for that matter, should
turn out to be a preeminent subject in explorations
of genetics is exceedingly odd, for until the
mid-twentieth century there were scientists who
questioned whether bacterial inheritance follows the
same rules that govern plants and animals.

Figure 1 Thin section of Escherichia coli. The DNA was immunostained, revealing the nucleoid as the convoluted central area. (Reproduced with permission from Kellenberger E (1996) Structure and function at the subcellular level. In: Neidhardt FC et al. (eds) Escherichia coli and Salmonella: Cellular and Molecular Biology, 2nd edn, ch. 4. Washington, DC: ASM Press.)


Do Bacteria Follow the Standard Rules of Genetics?

There were good reasons for suspecting that bacterial 
inheritance was too specialized to serve as a model for 
general cellular genetics. The cardinal rule of genetics, 
that like begets like, seemed often violated. A popula- 
tion (called ‘a culture’) of bacteria produced under one 
set of growth conditions might differ in subtle ways 
(enzyme content, antigenic characteristic, etc.) and 
not-so-subtle ways (cell size and gross chemical com- 
position) from a culture of the same organism grown 
in a second environment. Many of the properties of 
the cells in the second medium would disappear when 
these cells were grown to produce offspring in the 
original medium. This extreme plasticity of bacterial 
cells raised the question of what role heredity played 
in these very small cells. This doubt was reinforced 
by the lack of convincing cytological evidence that 
bacteria had chromosomes and assorted them by 
mitosis. 

The easy manner in which bacteria acquired and 
lost characteristics depending on their growth medium 
was largely explained when it became recognized that 
the genetic makeup of a bacterial cell (its genotype) 
determines a wide range of possible appearances 
(phenotypes). Gene expression is greatly influenced 
by the environment. The enzymatic constitution of 
these cells depends on the activation and repression of 
the individual genes of their genome in response to 
chemical signals from the environment. 

But there was a second problem. Some properties 
acquired by the population in a particular environ- 
ment were retained during subsequent growth in 
other environments; that is, they appeared to breed 
true, as do mutations in higher organisms. Exposure of 
a bacterial population to a deleterious agent, for
example, usually led to the growth of cells resistant
to that agent. In these cases it seemed that environmental
components might induce mutations that favor 
growth in that environment, quite contrary to the 
well-established principle of Mendelian genetics that 
specific mutations are not directed by the environ- 
ment. This was not a preposterous notion, since bac- 
teria grow by binary fission and thus there is no 
distinction between germ cells and somatic cells; 
each bacterium passes on to its daughter cells what- 
ever changes may have occurred to its genetic material. 

But in 1943 Luria and Delbrück concluded, from
measurements of the distribution of cells resistant to 
bacteriophage T1, that spontaneous mutations occur 
at random in a growing E. coli population. In 1952 
Joshua and Esther Lederberg, by means of replica- 
plating, isolated mutants resistant to streptomycin 
without ever exposing the cells to that agent. By 
these and other ingenious ways of demonstrating 
the existence of spontaneous mutants in a popula- 
tion, microbiologists became persuaded by the mid- 
twentieth century that bacterial mutations occur 
essentially randomly within individual cells, and that 
the environment plays a large role in selecting chance 
mutants that have a growth advantage. What had not 
been sufficiently appreciated earlier was that selective 
pressures could bring about changes in population 
composition very quickly in organisms growing 
exponentially, with generation times measured in 
minutes rather than months or years. 
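The logic of Luria and Delbrück's fluctuation argument can be sketched as a small simulation. The simplified model and its parameters are assumptions of this sketch, not their T1 data: 20 doublings from one cell, a mutation probability of 2 × 10^-7 per daughter cell, and 1000 parallel cultures. If resistance arises at random during growth, a mutation early in a culture's history yields a large resistant clone (a "jackpot"), so counts vary far more than their mean. If instead resistance were induced only on exposure, each culture's count would be Poisson-distributed, with variance close to the mean.

```python
import math
import random
import statistics


def poisson(rng: random.Random, lam: float) -> int:
    """Knuth's multiplication method; adequate for the small means used here."""
    limit = math.exp(-lam)
    k, product = 0, rng.random()
    while product > limit:
        k += 1
        product *= rng.random()
    return k


def spontaneous_culture(rng, generations, mu):
    """Mutations arise at random at each division; mutant cells then breed true."""
    wild, mutants = 1, 0
    for _ in range(generations):
        new = min(poisson(rng, 2 * wild * mu), 2 * wild)  # mutant daughters this round
        mutants = 2 * mutants + new
        wild = 2 * wild - new
    return mutants


def induced_culture(rng, generations, mu):
    """Alternative hypothesis: resistance appears only on exposure, so counts
    are Poisson with the same expected value as the spontaneous model."""
    final_size = 2 ** generations
    return poisson(rng, generations * final_size * mu)


def variance_to_mean(counts):
    return statistics.variance(counts) / statistics.mean(counts)


rng = random.Random(1943)
GENERATIONS, MU, CULTURES = 20, 2e-7, 1000  # hypothetical parameters
spontaneous = [spontaneous_culture(rng, GENERATIONS, MU) for _ in range(CULTURES)]
induced = [induced_culture(rng, GENERATIONS, MU) for _ in range(CULTURES)]
print("spontaneous variance/mean:", round(variance_to_mean(spontaneous), 1))
print("induced variance/mean:    ", round(variance_to_mean(induced), 1))
```

The spontaneous model should give a variance-to-mean ratio well above 1, while the induced model stays near 1. The observation of this kind of excess variance is what favored random mutation followed by selection.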

The uneasiness about bacterial inheritance had not 
prevented some fundamental discoveries in bacterial 
genetics even before the issue of mutations was settled. 
In 1944, Oswald Avery, Colin MacLeod, and Maclyn
McCarty demonstrated that the ‘transforming prin- 
ciple’ discovered by Frederick Griffith (1928), which 
conferred new properties on Streptococcus pneumo- 
niae, was DNA. Alfred Hershey and Martha Chase 
in 1952 confirmed the conclusion that DNA, and not
protein, was the genetic material by showing that
only the DNA of bacteriophage T2 is injected into the
host E. coli cells, which then produce a new crop of
phage.


Escherichia coli as a Model Organism for Genetic Studies

The very characteristics that provided the early puz- 
zles about phenotypic plasticity and mutability made 
E. coli enormously valuable once it was realized that 
its genetics would model that of plants and animals. 
The small size of these cells, their rapid growth rate,
and the extensive influence of the environment on their
phenotype provided geneticists with powerful tools.
Small size meant that many millions, even billions of 
individuals could be studied in a single experiment. 
Rapid growth meant that many generations could be 
produced within a single day. The ability to grow 
these cells in chemically diverse media and at different 
temperatures made it possible for biochemical genet- 
ics to flourish. The latter characteristic opened the 
door to the biochemistry of how genes function and 
how inheritance works, and also brought genetic an- 
alysis to bear on discovering the biochemical nature 
and workings of the cell. As the structure and function 
of DNA and the nature of the genetic code became 
known, the biochemical genetics of E. coli evolved 
into the field of molecular genetics. Soon thereafter 
recombinant DNA techniques were developed, aided 
greatly by studies of E. coli and its restriction en- 
zymes. 
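Two points made above, that many generations fit into a single day and that selection can reshape an exponentially growing population within days, can be checked with simple arithmetic. The sketch uses the generation time of about 1 h quoted earlier. It also assumes a hypothetical mutant that starts at one cell in a million and doubles every 0.8 h instead of every 1.0 h. Growth is assumed to stay exponential, as in serial subculture, since dilution does not change the ratio of the two cell types. On these assumptions, one day gives about 24 generations, or roughly 1.7 × 10^7 cells from a single cell. After three days the faster mutant makes up about a fifth of the culture.

```python
def generations_in(hours: float, doubling_time_h: float) -> float:
    """Number of doublings in a period of unrestricted exponential growth."""
    return hours / doubling_time_h


def mutant_fraction(f0: float, hours: float,
                    wt_doubling_h: float, mutant_doubling_h: float) -> float:
    """Fraction of the culture made up of the mutant after `hours`,
    assuming both cell types keep growing exponentially."""
    wild = (1.0 - f0) * 2.0 ** (hours / wt_doubling_h)
    mutant = f0 * 2.0 ** (hours / mutant_doubling_h)
    return mutant / (wild + mutant)


gens = generations_in(24, 1.0)
print(gens, 2 ** int(gens))  # generations per day and cells descended from one cell
print(round(mutant_fraction(1e-6, 72, 1.0, 0.8), 3))  # hypothetical faster mutant
```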

Of particular importance was the early realization 
of the rich opportunities for genetic studies provided 
by the many kinds of bacteriophage (bacterial viruses, 
or ‘phage’) to which E. coli is susceptible. 


Contributions of Escherichia coli to Genetics 
Once the advantages of working with E. coli and 
its bacteriophage were appreciated, genetic studies 
advanced quickly. In fact, the great success of studies 
on E. coli relates to the possibility of bringing the 
power of genetic analysis to any problem studied 
in this organism. Accordingly, the contributions to 
biology made through genetic studies with E. coli are 
extremely impressive. A few examples chosen for 
variety will illustrate the riches harvested over the 
second half of the twentieth century: 


1. Biochemical pathways. In the period from 1950 to 
1965, the enzymatic steps in the synthesis of amino 
acids and nucleotides were established in E. coli, 
largely through the powerful tool of mutant analy- 
sis; this accomplishment provided the framework 
for understanding the biosynthetic pathways of all 
organisms. 

2. Definition of the cistron. Intensive study of one
genetic locus (rII) of the bacteriophage T4 enabled
Benzer in the early 1960s to define the term ‘gene’
with great precision and to distinguish between
genes as units of mutation, of recombination, and
of function. Benzer’s work moved the concept of
the genetic material from an image of genes being
like beads on the chromosome string to a depiction
closely approximating our current molecular
understanding of the gene as a segment of a linear
DNA molecule. He introduced the term ‘cistron’
(defined operationally by the cis/trans test) as the
unit of heredity that encodes a single polypeptide
chain.


3. Regulation of gene function. The elements of gene
regulation were first indicated by the monumental
genetic and biochemical study of the lac genes of
E. coli led by Jacques Monod and François Jacob.
To their work, and that of their many international
collaborators, we owe the discovery of regulatory
genes and their protein products, of operator regions
of DNA where the regulators work, and of mRNA
transcripts, which carry information to the ribosomes
for making proteins. Negative regulation by repressor
proteins was quickly followed by recognition of
positive regulation by activator proteins. Genetic
analysis of the regulation of the trp operon uncovered
still a third mode of regulation, attenuation, which
functions by alterations in the secondary structure of
a leader sequence of mRNA. Other studies with E. coli
have uncovered additional means by which this
bacterium controls its genes, leading to the conclusion
that any step leading from a gene to its ultimate
cellular function can serve as a control point in this
organism.


4. Genetic code. Mutational studies in the early 1960s
with the rII locus of E. coli bacteriophage T4
provided the first experimental evidence that the
genetic code probably related triplets of nucleotide
bases to individual amino acids. In the same era,
work with the bacteriophage T4 coat protein and
with the trpA gene of E. coli independently
demonstrated the colinearity of gene and polypeptide.


5. Global gene control systems. Realization of the
hierarchical nature of gene regulatory networks
came about through discovery of regulons (groups
of genes controlled by the same regulatory protein)
and modulons (groups of operons and regulons that
are subject to a common control system). So-called
global control systems, which govern the activity of
dozens or even hundreds of independent genes,
have been characterized in E. coli; heat shock,
catabolite repression, the stringent response, and
emergency repair of DNA damage are well-studied
examples.


6. DNA repair and recombination. How damage to
DNA is repaired, and how recombination occurs,
has received much attention in E. coli genetics.
Repair of damage by ultraviolet (UV) radiation
has been intensively studied and has led to the
discovery of an excision-based repair of minor
damage and an inducible system that handles
major DNA damage. The latter, called the SOS (or
distress) system, involves a dozen or so genes that
collectively halt cell division, depress metabolism,
and catalyze an efficient, but error-prone, repair of
the damaged DNA.

7. Gene cloning in vivo and in vitro. The use of E. coli
to clone genes of interest, whatever their source, 
began in the 1960s with sophisticated use of viral 
(e.g., bacteriophages M13 and lambda) and plasmid 
(e.g., ColE1 and its derivative pBR322) vectors. 
Used alone or, more commonly, combined with in 
vitro recombinant DNA techniques, these cloning 
procedures have continued to make it possible to 
isolate individual genes, obtain them in multiple 
copies in vivo, and express their products for 
further study. Many specialized techniques have 
been developed in E. coli to aid physiological and 
genetic exploration of cell processes, of which the 
most common may be fusion of genes of interest to 
reporter genes (such as lacZ) with easily recognized 
or measured products. 
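The reading-frame logic behind the triplet conclusion in item 4 can be shown with a toy sequence. This is an illustration of the argument, using a hypothetical 21-base message, not a reconstruction of the rII experiments. Each single-base insertion shifts every downstream codon. After one or two insertions the end of the message is misread, and a stop triplet can even appear. After three, the downstream codons return to the original frame, although the codons between the first and last insertions remain altered. This return to frame with a multiple of three is what a triplet code predicts.

```python
# Hypothetical 21-base message: ATG GCT TCA GCT GAT CGT AAA
ORIGINAL = "ATGGCTTCAGCTGATCGTAAA"


def codons(seq):
    """Split a sequence into consecutive triplets from the first base,
    dropping any incomplete triplet at the end."""
    return [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]


def insert_bases(seq, positions, base="C"):
    """Insert one extra base before each given index of the original sequence."""
    out = list(seq)
    for pos in sorted(positions, reverse=True):  # right to left keeps indices valid
        out.insert(pos, base)
    return "".join(out)


def tail_restored(original, mutant, n_tail_codons=3):
    """True if the last n codons of the original message are read unchanged in the mutant."""
    return codons(mutant)[-n_tail_codons:] == codons(original)[-n_tail_codons:]


for sites in ([4], [4, 7], [4, 7, 10]):
    mutant = insert_bases(ORIGINAL, sites)
    print(len(sites), "insertion(s):", " ".join(codons(mutant)),
          "tail restored:", tail_restored(ORIGINAL, mutant))
```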




Inheritance in Escherichia coli 


The formal genetics of E. coli (i.e., inheritance, as 
distinguished from the biochemical nature, action, 
and regulation of genes) includes the origin of genetic 
variability, the mechanisms of intercellular genetic 
exchange, the intracellular mobility of genes, the 
nature of auxiliary genetic elements, the genetic 
structure of populations, and evolution. 


Mutational Studies 

The advantage of working with millions of cells of 
short generation time is attractive to scientists inter- 
ested in the nature of mutations and mutagenic agents. 
Measurement of mutation rates is rather straightfor- 
ward with bacterial cultures, so different agents and 
treatments can be assessed for their ability to increase 
the frequency of mutations. The specific biochemical 
changes in DNA induced by different chemical 
mutagens and by physical agents such as UV and X- 
ray radiation have been characterized, and the means 
by which the cell repairs the damage has been inten- 
sively studied. Interestingly, the issue of Darwinian 
versus Lamarckian acquisition of mutations has arisen 
anew in E. coli, but with a slightly different twist. The
question today is not whether mutations occur at 
random and are then subjected to environmental 


selection: that certainly is true; the question is 
whether, under stressful circumstances, the rate of 
mutation is ever increased in favor of mutations 
relevant to the environmental stress. Experimental 
results have shown that mutations that restore func- 
tion in a mutant lacZ gene increase during starvation, 
particularly if lactose is present; other examples of so- 
called adaptive evolution have been uncovered. As 
suggested by Margaret Wright, a mechanism can be 
envisioned by which environmental stress can increase 
the mutation rate of genes related to relief of that 
stress; such genes are commonly induced or dere- 
pressed in this circumstance, leading to the formation 
of transcription bubbles, where locally single- 
stranded DNA could be more vulnerable to damage. 
This possibility is under active investigation. 

The usefulness of mutants was greatly increased 
by introduction of the technique of conditionally 
expressed mutations. These mutations, which are expressed only under specified conditions such as high temperature, permit the isolation and growth of mutants defective in growth-essential genes. The mutant cells can be grown under permissive conditions, and the effect of the loss of the gene’s function studied under the restrictive condition.

Suppressors, which partially reverse the effect of a 
mutation, provide another approach to the isolation of 
mutants in essential functions. Suppressors can be 
mutant transfer RNA molecules that at low frequency 
mistakenly insert an amino acid at a nonsense codon 
produced by mutation (nonsense suppression), or 
insert the correct amino acid at an incorrect codon 
(missense suppression). The antibiotic streptomycin, 
at low concentrations or in resistant mutants, causes 
misreading of the genetic code, and this property can 
be used to isolate and grow streptomycin-dependent 
mutants in essential genes. The sequencing of the 
E. coli chromosome (consult the web site http://www.genetics.wisc.edu/) has made it possible to clone and mutate each of the 4290 genes (or open reading frames) to study its function.
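To make nonsense suppression concrete, here is a minimal Python sketch. It uses a deliberately partial, hypothetical codon table and treats a suppressor tRNA as a mapping that lets one stop codon be read as an amino acid. Real suppression happens only at low frequency, which the sketch does not model; the function and variable names are illustrative.

```python
# Hypothetical, partial codon table (RNA codons); None marks a stop (nonsense) codon.
CODONS = {
    "AUG": "Met", "UGG": "Trp", "UUU": "Phe", "CAG": "Gln",
    "UAG": None, "UAA": None,
}

def translate(mrna, suppressor=None):
    """Read codons in frame; stop at a nonsense codon unless `suppressor` reads it."""
    suppressor = suppressor or {}
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        codon = mrna[i:i + 3]
        amino_acid = suppressor[codon] if codon in suppressor else CODONS[codon]
        if amino_acid is None:
            break  # nonsense codon: the polypeptide chain terminates here
        protein.append(amino_acid)
    return protein

wild_type = "AUGUGGUUUUAA"         # Met-Trp-Phe-stop
mutant = "AUGUAGUUUUAA"            # Trp codon UGG mutated to the nonsense codon UAG
amber_suppressor = {"UAG": "Gln"}  # hypothetical suppressor tRNA reading UAG as Gln

print(translate(wild_type))                 # ['Met', 'Trp', 'Phe']
print(translate(mutant))                    # ['Met']
print(translate(mutant, amber_suppressor))  # ['Met', 'Gln', 'Phe']
```

The suppressed chain is full length but carries Gln where the wild type had Trp; whether such a substituted protein is functional depends on the site.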


Genetic Exchange 

At the heart of genetic analysis is the ability of the 
investigator to execute crosses, that is, to mate two 
individuals of differing phenotype and observe the 
phenotypic and genetic properties of the offspring. 
Bacteria are haploid and reproduce by binary fission, 
each cell dividing into two daughter cells when its 
mass has doubled. Bacterial geneticists therefore had to search for ways to carry out crosses outside the normal reproductive cycle of these cells, and for years the search for a way to perform crosses with E. coli was in vain. Eventually three processes were discovered, all of which involve one-way transfer of DNA from a donor cell to a recipient cell.


Conjugation 

In 1946, Joshua Lederberg and Edward Tatum demon- 
strated that two particular strains that had multiple 
nutritional requirements (auxotrophic mutants) dif- 
ferent from each other would, when mixed together, 
give rise to cells able to grow without any nutritional 
supplement. These so-called prototrophic recombin- 
ants had a complete set of wild-type genes, so it was 
logical to think that a mating had occurred by cell 
fusion. But this turned out not to be the case. Cell 
contact between the two ‘parental’ strains was necessary, but, as shown by William Hayes in 1952, the wild-type recombinants did not arise by fusion of a pair of the two different auxotrophs. Rather, one
of the strains, the donor, transferred its DNA into 
the other, or recipient strain, by a process called con- 
jugation. 

How conjugation brings about the transfer of bac- 
terial genes is an odd story having to do with plasmid 
biology. Plasmids are autonomously replicating, cir- 
cular, double-stranded DNA molecules, much smaller 
than the chromosome, found in great variety within 
E. coli and probably all bacteria. They confer a great 
range of properties on the cell. In E. coli, ‘maleness,’ 
the ability to transfer DNA to a recipient, is related to 
the presence of the F (for fertility) plasmid within the 
donor cell. This plasmid carries genes that can bring 
about the plasmid’s transfer from one cell to another. 
Among many related functions of these genes is the 
ability to produce a hair-like protein structure called a 
sex pilus, which helps the male cell (called donor or F+) capture the female cell and maintain a conjugation bridge through which the F plasmid DNA passes. The transfer is initiated by a single-strand break that occurs at a site called oriT (origin of transfer). The linearized strand is driven into the female cell (called recipient or F−) by a special mode of replication called transfer replication of the plasmid. The strand entering the recipient cell directs the synthesis of its complementary strand, the completed plasmid DNA circularizes, and its genes become functional. The formerly F− cell grows a sex pilus and is now functionally a donor, male cell. The donor cell remains F+ because the strand not transferred directs synthesis of its complement. Mixing a population of F+ cells with one of F− cells results in the massive conversion of the latter to F+ (Figure 2).

Figure 2 Transfer of F plasmid from an F+ to an F− cell. Formation of a mating pair triggers transfer replication of F. One strand is nicked at oriT, then replication (at arrowhead) occurs by a rolling-circle mechanism. The newly synthesized DNA displaces a preexisting single strand of F, which enters the F− cell, where its complementary strand is synthesized. (Reproduced with permission from Neidhardt FC et al. (1990) Physiology of the Bacterial Cell: A Molecular Approach. Sunderland, MA: Sinauer Associates.)

Transfer of F by itself does not carry chromosomal genes from donor to recipient; a variation of this process is responsible for chromosomal gene transfer. Every so often, in a population of F+ cells, the plasmid DNA and the chromosome fuse and form a cointegrated DNA molecule. A cell with this cointegrated DNA is called an Hfr cell (for high frequency of recombination). When it encounters an F− cell, conjugation is initiated as usual by the breaking of the cointegrate DNA at the normal oriT site, but in this case entry of a segment of the F genome into the F− cell brings along the integrated bacterial chromosome. Although a portion of the E. coli chromosome enters the F− cell, the conjugation bridge ruptures long before the entire chromosome and the remaining segment of the F genome can be transferred, so the recipient remains F−.

Hfr cells, though a very small proportion of any F+ population, are the ones responsible for bacterial gene transfer. A pure population of Hfr cells mixed with F− cells gives rise to large numbers of recombinants.

Conjugation provided two independent measures 
of the relative locations of genes from which a genetic 
linkage map could be constructed for E. coli. The 
frequencies with which genes were separated by cross- 
over events during conjugation provided one measure; 
the time of entry of genes into the F− cells, in experiments in which the conjugation process was interrupted by vigorous blending of the mating mixtures at intervals after mixing the Hfr and F− populations, provided independent information.
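The time-of-entry logic can be sketched in a few lines of Python. The marker names and entry times below are invented placeholders, not measurements; the sketch assumes each marker has already been assigned the earliest time (in minutes after mixing) at which recombinants carrying it appear, and the function names are illustrative.

```python
def order_by_entry(entry_times):
    """Sort markers by the time (minutes) at which they first enter the recipient."""
    return sorted(entry_times.items(), key=lambda item: item[1])

def map_intervals(entry_times):
    """Gaps, in minutes of transfer, between markers that enter consecutively."""
    ordered = order_by_entry(entry_times)
    return [(left, right, t2 - t1)
            for (left, t1), (right, t2) in zip(ordered, ordered[1:])]

# Hypothetical entry times for one Hfr x F- cross; placeholder values, not data.
entry_times = {"geneC": 18.0, "geneA": 8.0, "geneD": 25.0, "geneB": 11.0}

print([gene for gene, _ in order_by_entry(entry_times)])
# ['geneA', 'geneB', 'geneC', 'geneD']
print(map_intervals(entry_times))
# [('geneA', 'geneB', 3.0), ('geneB', 'geneC', 7.0), ('geneC', 'geneD', 7.0)]
```

Order of entry gives gene order, and differences in entry time give map distances in minutes, independently of the crossover frequencies.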

One of the early signs that the E. coli chromosome 
is circular came from conjugation experiments 
employing Hfr strains, with the F plasmid integrated 
at different sites of the chromosome and in different
orientations. The different patterns of gene transmis- 
sion could be interpreted only with the assumption of 
a circular linkage map resulting from a circular physi- 
cal chromosome. 
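As a rough illustration of that reasoning, the following Python sketch checks hypothetical Hfr transfer orders against a circular map and then, by brute force over all linear arrangements, shows that no single linear map can hold them all. Gene labels A to F, the orders, and the function names are assumptions for illustration; brute force is only practical for a handful of markers.

```python
from itertools import permutations

def is_contiguous(order, line):
    """True if `order` appears as an unbroken run within the list `line`."""
    k = len(order)
    return any(line[i:i + k] == order for i in range(len(line) - k + 1))

def fits_circle(order, circle):
    """True if `order` is an unbroken run of the circular map, read in either direction."""
    for direction in (circle, circle[::-1]):
        unrolled = direction + direction[:-1]  # exposes runs that wrap around
        if len(order) <= len(direction) and is_contiguous(order, unrolled):
            return True
    return False

def fits_some_line(orders, genes):
    """True if one linear map (read in either direction per order) holds every order."""
    for perm in permutations(genes):
        line = list(perm)
        if all(is_contiguous(o, line) or is_contiguous(o, line[::-1]) for o in orders):
            return True
    return False

genes = ["A", "B", "C", "D", "E", "F"]  # hypothetical markers on a circular map
hfr_orders = [
    ["A", "B", "C", "D"],  # hypothetical Hfr strain 1
    ["F", "E", "D", "C"],  # hypothetical Hfr strain 2, opposite orientation
    ["E", "F", "A", "B"],  # hypothetical Hfr strain 3
]
print(all(fits_circle(o, genes) for o in hfr_orders))  # True
print(fits_some_line(hfr_orders, genes))               # False
```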

Later physical studies bore out this interpretation. 
When the F genome occasionally excises from its resi- 
dence on the chromosome of Hfr cells, the excision is 
not always precise, and some adjacent bacterial DNA 
becomes part of the newly formed F plasmid. Such 
hybrid plasmids, called F’ to signal the presence of 
bacterial genes, are widely employed for genetic stud- 
ies, including complementation and cis/trans tests, or 
whenever heterodiploids are useful. 

Valuable as it has been for genetic studies on E. coli, 
conjugation must be regarded not as a bacterial pro- 
cess designed for genetic exchange, but as an acciden- 
tal consequence of the peculiar properties of the F 
plasmid. 


Transduction 

While a plasmid is the mediating factor in genetic 
transfer by conjugation, viruses (bacteriophages) 
bring about the transfer of DNA from donor to reci- 
pient cells in the process called transduction. First 
discovered in Salmonella by Norton Zinder in 1948, 
transduction quickly became both a tool for genetic 
analysis and a phenomenon with which to investigate 
virus–cell interactions in E. coli as well.

E. coli bacteriophages are either virulent or temper- 
ate. Virions (viral particles) of the former kind infect 
cells by injection of their DNA (or RNA), take over 
the host’s synthetic apparatus, and direct the produc- 
tion of a new crop of virions associated usually with 
the lysis of the infected cell. Temperate bacteriophages may initiate a lytic process, but can also produce a lysogen, which is an infected cell that carries in quiescent form a copy of the bacteriophage genome (a prophage), either physically integrated into the cell’s chromosome or maintained as a plasmid. The lysogen and its offspring can grow indefinitely with this viral passenger, but occasionally in a population of lysogens viral multiplication and virion production will be triggered. Transduction is mediated by temperate phages in one of two broad ways. In generalized transduction, any given gene of the host cell has an equal probability of being packaged, by mistake, into the protein capsules of the new virions, forming a pseudovirion (a viral particle containing bacterial instead of viral DNA). Infection of a bacterial population with pseudovirions results in the injection of the bacterial DNA and its subsequent recombination with the recipient cell’s chromosome. Generalized transduction occurs commonly with bacteriophage P1, but can occur with any bacteriophage that forms its mature virions by a process called headful packaging of DNA.

Specialized transduction occurs with those bac- 
teriophages that have a chromosomally integrated 
prophage. When lysogens of this sort are induced to 
the lytic process, imprecise excision of the prophage 
DNA occasionally leads to the incorporation into a 
virion of a small segment of bacterial DNA along with 
the truncated phage DNA. Only genes that border 
the integrated prophage can be picked up in this way, 
and hence the name ‘specialized transduction’ or 
‘restricted transduction’ is used for this process. 
Geneticists have learned, however, to engineer the 
prophage integration site in order to produce transdu- 
cing virions of their genes of interest. 


Transformation 

Transformation is a bacterial process in which DNA 
released into the environment by the lysis of some 
cells is directly taken up by other cells and recombined 
with their DNA. Many bacterial species (notably Streptococcus pneumoniae and Haemophilus influenzae) have natural mechanisms for the uptake of DNA and are thus said to be competent. Despite its widespread use as a host cell in recombinant DNA technology, which necessarily involves the uptake of hybrid plasmids, E. coli has no functional mechanism for natural transformation, i.e., it is not naturally competent. Treatment with salt and temperature shocks, or electroporation, must be employed to bring about entry of DNA into E. coli and thereby achieve artificial transformation.


Gene Transpositions 

How genes move within and between chromosomes 
of E. coli has been an area of great interest for geneticists studying the mechanism of gene rearrangements, and for medical microbiologists exploring the development and spread of antibiotic resistance among bacteria. The considerable intracellular mobility of genes within E. coli is the result in large measure
of transposable elements. Transposable elements are 
genetic elements that have the ability to catalyze their 
own movement (transposition), with or without repli- 
cation, from one DNA site to another, on either the 
same or a different DNA molecule. The simplest 
transposable elements, called insertion sequence 
elements (IS elements), are small (approximately 
1000 bp) segments of DNA consisting of terminal 
inverted-repeat sequences bordering a few genes that 
encode enzymes for transposition. There are six dif- 
ferent IS elements found in the E. coli chromosome, 
each in several copies. IS elements contain only genes for their own transposition, and thus are not readily detectable genetically unless they happen to transpose to a new site within a gene, thereby inactivating it. The second broad class of transposable element consists of transposons, which are segments of DNA containing genes beyond those needed for transposition, frequently genes encoding enzymes for antibiotic resistance. Many transposons include IS elements at their ends. Transposons promote many types of DNA rearrangements.
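The defining feature of IS elements, terminal inverted repeats, can be checked by comparing one end of a sequence with the reverse complement of the other end. The Python sketch below measures the longest perfect terminal inverted repeat. The element sequence is invented and far shorter than a real IS element, the function names are illustrative, and a practical search would probably need to tolerate mismatches.

```python
# Complement table for uppercase DNA; other characters pass through unchanged.
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]

def terminal_inverted_repeat_length(seq):
    """Length k of the longest prefix equal to the reverse complement of the last k bases."""
    k = 0
    # Pair base k from the left with base k from the right, working inward without overlap.
    while 2 * (k + 1) <= len(seq) and seq[k] == seq[-(k + 1)].translate(COMPLEMENT):
        k += 1
    return k

left_end = "GGTTAC"
element = left_end + "CCCCCC" + reverse_complement(left_end)  # hypothetical toy element
print(element)                                   # GGTTACCCCCCCGTAACC
print(terminal_inverted_repeat_length(element))  # 6
```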


Extrachromosomal Genomes 

As noted in our discussion of the F plasmid, most 
if not all E. coli strains found in nature contain one 
or more different plasmids as auxiliary genomes. The 
variety of cellular properties associated with plasmids 
goes far beyond fertility and includes production of 
toxins (including bacteriocins that kill other bacteria), 
resistance to antimicrobial agents and other toxic 
chemicals, and especially properties associated with 
virulence. As a general rule, the plasmids of E. coli 
are involved in interactions of these cells with their 
environment rather than with metabolism and growth. 


Population Structure and Evolution 

Study of the genetic structure of populations of E. coli 
and of the origin and evolution of this organism is relatively recent, dating only from the early 1970s, with the publication of studies of electrophoretic variability of proteins in a large number of strains from around the world. Because bacteria replicate by binary fission, the structure of bacterial populations is essentially clonal, i.e., populations consist of clones of
immense numbers of organisms with an exclusive 
common ancestor. But recombination following inter- 
cellular transfer of genes (mediated by plasmids, 
viruses, or direct uptake of DNA) modifies this clonal 
inheritance. One task of population geneticists is to 
evaluate the contribution of recombination through 
genetic exchange to E. coli evolution. Current work 
benefits greatly from nucleotide sequence informa- 
tion, including whole genome analysis. 


Current Genetic Studies in Escherichia coli


The continued use of E. coli in fundamental cell 
research as well as in applied processes in biotechno- 
logy derives in large measure from the ease of genetic 
manipulation of this organism. Genetic analysis provides the major tool for the current study of complex cellular functions such as motility and chemotaxis, pathogenesis, and cell division in E. coli. The information being generated in the field of bioinformatics, with contributions from genomic and proteomic research, is encouraging attempts on the one hand to construct models of the living E. coli cell, and on the other hand to understand the origins and evolution of this model organism. With the complete genome sequences of three important strains of E. coli now known, the outlook for further discoveries is bright.


Genetics, Ethics, and Eugenics


All discussion of ethics and genetics takes place in the 
shadow of abusive use of supposed genetic knowledge 
in the early and middle years of the twentieth century, 
especially (but not only) in Nazi Germany. The so-called “eugenic movement” sought to improve the genetic characteristics of populations, either by encouraging the supposedly “genetically superior” to have children, or by preventing the supposedly “genetically inferior” from doing so. At its worst, eugenic prejudices led to forced detentions and sterilizations and even to extermination, particularly of mentally ill persons and of racial minorities. These were only the most serious aspects of a more pervasive lack of respect for persons and their rights. The complicity of many doctors with Nazi eugenics constitutes a massive and much-studied example of dereliction of professional duties.

The explosive growth in understanding of human 
genetics at the end of the twentieth century has led to 
public fears that new, sounder genetic knowledge 
might be used for eugenic purposes, yet also to public 
demands that any medical benefits of this knowledge 
be made rapidly available. A realistic look at the eth- 
ical issues raised by genetic knowledge at the start of 
the twenty-first century reveals a wide spectrum of 
issues, of potential benefits and dangers, and of ethical 
difficulties, as well as numerous efforts to devise regu- 
latory structures that will guarantee that the new 
genetic knowledge is used only for ethically accept- 
able purposes. 


Is Genetics Ethically Distinctive? 


Many, but not all, of the ethical problems raised by the 
new genetics resemble other problems in medical 
ethics and research ethics. However, genetics also 
raises some distinctive ethical problems. Some of 
these arise because genetic information is intrinsically 
familial, rather than attached solely to individual 
patients; others because it can sometimes be used to 
make very long-term predictions (predicting late- 
onset illness early in life); yet others because it can 
be used for nonmedical purposes (such as insurance). 
A more general range of concerns arises from the 
widespread sense that genetic knowledge and its use 
may alter our sense of self and family relationships in 
ways that are hard to foresee. 


Genetics and Research Ethics 


Genetic research on human subjects can raise a num- 
ber of distinctive problems. One common problem is 
that researchers (and sometimes experimental sub- 
jects) may acquire genetic information that also per- 
tains to relatives who have not consented to any 
investigation and need not be made aware of its 
results. All accounts of research ethics insist that 
prior consent must be obtained from individual re- 
search subjects, and that data obtained must remain 
confidential. This individualistic position is chal- 
lenged when the results of investigation are relevant 
not only to an individual but to a family. Genetic 
research may also raise distinctive ethical problems if 
it ‘medicalizes’ characteristics previously accepted as 
natural variation. 


Genetics and Medical Ethics 


The ethical problems arising in clinical genetics are 
numerous, and mostly similar to those arising in 


other areas of medicine. When genetic tests are used 
for diagnosis, when genetic conditions are treated, 
even if by use of somatic gene therapy, the ethical 
problems arising will mainly be those that recur 
throughout medicine. Typical problems will be those 
of conveying difficult information to patients and 
their families with adequate care, ensuring that genu- 
inely informed consent to tests and treatment is 
obtained, preserving confidentiality, and identifying 
best available treatments (particularly when some 
treatments are new, risky, or expensive). 

However, when genetic tests yield information (whether certain or probabilistic) about relatives, or about
late-onset conditions, distinctive additional ethical 
problems can arise. Should consent from certain rela- 
tives be sought before genetic tests are undertaken? 
Should relatives have a right to receive genetic inform- 
ation obtained from others, but which pertains to 
them? Should unexpected information about undis- 
closed paternity or nonpaternity be divulged? When, 
if ever, should genetic tests for late-onset conditions be 
done on individuals (children, noncompetent adults) 
who cannot consent for themselves? 


Genetics and Reproductive Ethics 


By contrast, genetics raises numerous distinctive ethical problems in human reproduction. Prenuptial, preconception, and preimplantation genetic testing (the last for those using in vitro fertilization, IVF) are all used in various communities or jurisdictions to
enable those who otherwise risk having a child with 
genetic disease to eliminate this risk by avoiding 
conceiving such children. More controversially, pre- 
natal testing followed by abortion of affected fetuses 
can be and is used for the same purpose. Genetic tests 
can also be used to settle paternity either prenatally or 
later. 

Some people fear that these possibilities might 
revive the old eugenic agenda, others that these prac- 
tices will lead to lack of respect for those who suffer 
genetic diseases. One fairly common view is that 
“negative” uses of genetic tests to avoid disease are 
permissible, but that their “positive” use to have 
“designer babies” with genes chosen for reasons 
other than avoiding disease is wrong. This position is 
problematic insofar as the boundary between disease 
and undesired characteristics is blurred. 

Germline gene therapy, which would eliminate 
the genes for certain diseases not only for a patient 
but for descendants, remains more controversial. 
Eliminating a gene associated with harmful effects 
in certain cases might also eliminate beneficial effects 
it has for carriers or in combination with other 
factors. 


Genetics and Social Issues 


Genetic information may be of value not only to pa- 
tients and their families, and to would-be parents, but 
more widely. Insurers have argued that they need genetic test information to calculate risks and set premiums more accurately. There has been public worry that those whose test results indicate particularly high risks of disease or early death could be priced out of health or life insurance, so creating a “genetic underclass.”

In practice, there is so far limited evidence of the 
actuarial implications of most genetic variations. Risk 
levels are accurately established only for some serious 
single-gene disorders. Moreover, even for single-gene 
disorders, genetic tests for early-onset conditions add 
little of actuarial value, since information about these 
conditions is generally included in medical records. If 
insurers were permitted to request disclosure of all genetic test results (let alone to require that tests be taken), complex ethical problems could arise, particularly in the areas of privacy and data protection.

Genetic test results can also be of relevance in 
numerous other social contexts. For example, they 
can be used forensically to identify criminals and to 
eliminate innocent suspects. They may be of interest 
to employers who want to know whether employees 
face particular health risks. Evidently, in these and 
other contexts, protection of individual rights and 
control both of genetic testing and of the use of test 
results will be ethically and politically sensitive, and 
demand effective regulation. 


Ethics and Nonhuman Genetics


Genetic information about nonhuman animals has long 
been seen as valuable: witness the breeding of pedigree 
animals. As in human genetics, advances in nonhuman 
genetics have raised additional issues. Some of the most 
contentious new issues in this area have been about 
genetic modification or engineering of animals. When 
this is done to treat human disease without harm to 
animals (‘pharming’: e.g., producing sheep that express 
human insulin in their milk) there is considerable pub- 
lic acceptance. When it is done for research purposes or 
harms animals there is considerable public opposition 
and unease (the engineering of animal models for 
human disease: ‘oncomouse’). The prospect of using 
the organs of genetically modified animals 
for transplant to humans (xenotransplantation) has 
aroused both public eagerness about possible benefits 
and public anxiety on safety and other ethical grounds. 

A mixture of ethical concerns also surrounds the 
genetic modification of plants. Some point to the pos- 
sibility of harm to the environment, to nonhuman ani- 
mals, or to human consumers of genetically modified 
plants (e.g., by crops with built-in insecticide); others 
to the possible benefits to the environment, to nonhuman animals, and to humans, for example through reduced use of insecticides and herbicides and through nutritional improvements.


The Ets family of eukaryotic transcription factors, characterized by a strongly conserved DNA-binding domain called the ETS domain, is composed of more than 30 members, classified into 13 subfamilies according to the sequence identity of this domain and the conservation of other domains or motifs. The founding member of the Ets family, ets-1, was discovered in the early 1980s as part of the tripartite oncogene of the E26 avian erythroblastosis virus. In 1990, Ets proteins were found to activate transcription of genes by binding a sequence-specific site in the promoter/enhancer of these target genes. Most Ets proteins are transcriptional activators, but some have been characterized as repressors.


Evolutionary Relatedness 


Ets genes are conserved throughout metazoan species, ranging from diploblastic organisms to Drosophila and vertebrates, but they are absent from the genomes of plants and yeast. Phylogenetic analyses
indicate that the ets genes in contemporary species 
are derived from an ancestral gene early in metazoan 
evolution. The amplification of such families of tran- 
scription factors is viewed as a critical step in the 
evolution of multicellular animals, including higher 
vertebrates. 


The ETS Domain 

The ETS domain identifies all Ets proteins as sequence- 
specific DNA-binding proteins. This motif, composed 
of 85 amino acids, forms a winged helix-turn-helix tertiary structure, which allows Ets proteins to interact with a DNA element approximately 10 bp long containing a central GGAA/T core. This recognition
motif is present in a vast majority of promoters and 
enhancers. 
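One way to see how common the core is: scan a sequence for GGAA or GGAT on the given strand and for their reverse complements, TTCC and ATCC, which mark the core on the opposite strand. This Python sketch looks only at the 4-bp core and ignores the flanking bases that make up the rest of the roughly 10-bp site, so it will overpredict binding. The example sequence and the function name are hypothetical.

```python
CORES_PLUS = {"GGAA", "GGAT"}   # GGA(A/T) core read on the given strand
CORES_MINUS = {"TTCC", "ATCC"}  # reverse complements: core on the opposite strand

def find_ets_cores(seq):
    """Return (position, strand) for every GGA(A/T) core on either strand."""
    seq = seq.upper()
    hits = []
    for i in range(len(seq) - 3):
        word = seq[i:i + 4]
        if word in CORES_PLUS:
            hits.append((i, "+"))
        elif word in CORES_MINUS:
            hits.append((i, "-"))
    return hits

print(find_ets_cores("ccGGAAgtATCCa"))  # [(2, '+'), (8, '-')]
```

With equal base frequencies, a given position starts one of these four words with probability 4/256, so random sequence would show a core about once every 64 bp. That is one way to read the later remark that Ets proteins have relatively nonstringent DNA-binding specificity.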


Regulating Domains 

The other transcriptional modulating domains of the Ets proteins share very little sequence identity, but are characterized by enrichment in certain amino acids, such as proline, glutamine, or acidic residues.
The variability lies in the number and the composition 
of these domains. 


Biological Importance 


Biological Role 

The Ets factors are expressed in almost all tissues of 
the organism and control a vast number of target 
genes. It is expected that different Ets proteins regu- 
late the expression of distinct target genes, thus 
generating biological specificity. Due to the experi- 
mental difficulty of demonstrating this target gene 
selection, there are only tentative lists that link puta- 
tive target genes to Ets regulators. Several Ets proteins 
play an important role in regulating mammalian hematopoiesis and a number of other developmental processes. For example, Ets-1 plays a critical role in the
differentiation of hemopoietic stem cells and Tel is 
critical for fetal angiogenesis. Ets proteins are also 
implicated in the development and regulation of the 
immune system, and also in the regulation of genes 
controlling the cell cycle, neural differentiation, and 
apoptosis. 


Implication of Ets in Cancer 

DNA rearrangements in the loci encoding several Ets 
proteins are associated with tumorigenic processes. 
Chimeric proteins that contain domains of Ets proteins have been identified in certain types of leukemia, such as B-type childhood acute lymphoblastic leukemia (ALL), and in Ewing’s tumors. In these cases, a chromosomal translocation fuses a fragment of an ets gene to an unrelated gene, resulting in the expression of a chimeric oncoprotein.

Ets proteins are also implicated in the onset and/or progression of certain types of cancer, and some of them have their own oncogenic potential. In many
cancers, there is an overexpression of one or several 
Ets proteins. For example, Ets-1 is overexpressed in 
invasive cancer, while the PEA3 subfamily is over- 
expressed in breast carcinoma. 


Regulation of Ets Transcription Factors
Regulation of DNA Binding 


Many Ets transcription factors are subject to autoinhibitory mechanisms, in which domains outside the ETS domain inhibit DNA-binding activity. This may serve to prevent promiscuous DNA binding, given the relatively nonstringent DNA-binding specificity of these transcription factors.
Moreover, posttranslational modifications, such as 
phosphorylation, represent other potential mechan- 
isms for regulating DNA binding. 


Interactions with Co-Regulatory Partners

Ets proteins interact not only with the basal transcriptional complex, but also with other gene-specific transcription factors. For example, direct physical interaction between Ets factors and bZIP proteins represents a conserved mechanism for regulating gene expression in a variety of lymphoid and nonlymphoid cell types.

Ets proteins also functionally cooperate with various transcriptional coactivators, such as the histone acetyltransferase CBP/p300, which modulates chromatin structure.


Regulation by Signal Transduction Pathways

Differential phosphorylation of transcription factors by signal transduction pathways plays a major role in gene expression. Many Ets transcription factors have been demonstrated to be direct targets of the mitogen-activated protein kinase (MAPK) pathway.
Most phosphorylation sites are located in the ETS 
domain and so phosphorylation may play a dual role 
in regulating transcriptional activation and DNA 
binding. 




See also: Signal Transduction; Transcription 


Euchromatin 
Euchromatin is the term for those parts of chromo-
somes, generally the greater proportion, which show a 
normal cycle of decondensation at the end of mitosis. 
Most genes are in euchromatin, which, however, also 
contains a very high proportion of nongenic DNA. 


See also: Chromatin; Heterochromatin 


Eugenics 


See: Ethics and Genetics 


Eukaryotes 
A eukaryote is an organism whose cells have chromosomes with nucleosomal structure, separated from the cytoplasm by a nuclear envelope, and exhibit functional compartmentalization in distinct cytoplasmic organelles.


See also: Prokaryotes 


Eukaryotic Genes 


T M Picknett and S Brenner 
Eukaryotic genes may include additional sequences that lie within the coding region, interrupting the protein-coding sequence. These intervening sequences, called introns, are excised during processing of the RNA transcript, so that the messenger RNA represents the exon (coding) regions.
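Reduced to bookkeeping, splicing keeps the exon segments and concatenates them in order. The minimal Python sketch below assumes the exon coordinates are already known (they are hypothetical here) and does not model how the splicing machinery recognizes intron boundaries, although the made-up intron is written with the common GU...AG ends. The function name is illustrative.

```python
def splice(pre_mrna, exons):
    """Join exon segments given as half-open (start, end) coordinates; introns are dropped."""
    pieces = []
    last_end = 0
    for start, end in exons:
        # Exons must be in order, non-overlapping, non-empty, and inside the transcript.
        if start < last_end or end <= start or end > len(pre_mrna):
            raise ValueError(f"invalid exon coordinates {(start, end)}")
        pieces.append(pre_mrna[start:end])
        last_end = end
    return "".join(pieces)

exon1, intron, exon2 = "AUGGCA", "GUAAGUCCAG", "GAUUAA"  # hypothetical sequences
pre_mrna = exon1 + intron + exon2
exons = [(0, 6), (16, 22)]
print(splice(pre_mrna, exons))  # AUGGCAGAUUAA
```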

In the interrupted genes of eukaryotes, most in- 
trons appear to serve no function, and are removed 
during gene expression. However, some exceptions 
exist, notably in the yeast mitochondrion, where an 
intron itself codes for the synthesis of a protein that 
functions independently from the protein encoded by 
the exons. 

Not all eukaryotic genes are interrupted. Some, like prokaryotic genes, correspond directly to their protein product.


See also: Introns and Exons 


Euploid 
J RS Fincham 
Euploid is the term that denotes the condition of
having a complete normal set of chromosomes, or a
multiple thereof. Thus the term includes haploid (n),
diploid (2n), triploid (3n), etc. Organisms that are not
euploid (2n + 1, 2n − 1, etc.) are called aneuploid.
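The definition amounts to a divisibility test on the chromosome count. The sketch below uses the human haploid number n = 23 for illustration; describing an aneuploid count relative to the nearest whole multiple of n is a simplifying assumption of this example, and `classify_ploidy` is a hypothetical helper, not a standard tool.

```python
def classify_ploidy(chromosome_count: int, n: int) -> str:
    """Classify a chromosome count against a haploid (base) number n."""
    if n <= 0 or chromosome_count <= 0:
        raise ValueError("counts must be positive")
    names = {1: "haploid", 2: "diploid", 3: "triploid", 4: "tetraploid"}
    multiple, remainder = divmod(chromosome_count, n)
    if remainder == 0:
        # A complete set, or an exact multiple of it: euploid.
        return f"euploid, {names.get(multiple, f'{multiple}n')}"
    # Otherwise aneuploid; described here against the nearest whole multiple of n.
    nearest = max(1, round(chromosome_count / n))
    diff = chromosome_count - nearest * n
    sign = "+" if diff > 0 else "-"
    return f"aneuploid, {nearest}n {sign} {abs(diff)}"


for count in (23, 46, 69, 47, 45):
    print(count, classify_ploidy(count, n=23))
```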


See also: Aneuploid; Diploidy; Polyploidy; 
Triploidy 


Evolution 


B Guttman 


Copyright © 2001 Academic Press 
doi: 10.1006/rwgn.2001.0430


Evolution is the process through which organisms
change into new types over time, as lineages
gradually diverge from one another over successive
generations. The fact that evolution has
occurred (and continues to occur) is well documented 
in an enormous fossil record, and it is attested to 
by studies of comparative anatomy and compara- 
tive molecular structure (for instance, amino-acid 
sequences of homologous proteins in various species). 
As the geneticist Theodosius Dobzhansky observed, 
nothing in biology makes sense except in the light of 
evolution. The evolutionary history of life on earth 
may be summarized by a complex branching tree, a 
phylogeny, showing the relationships of all species to 
one another and the probable course of their evolution, 
although, of course, phylogenies are subject to continuing
revision and modification like any other pieces
of scientific information. At least in eukaryotes, similar
phylogenetic trees are generated whether one bases
them on the relationships between morphological 
structures or the sequences of ribosomal RNA or of 
widely distributed proteins like cytochrome c. 
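To show how a tree can be built from sequence comparisons, here is a sketch using UPGMA, one of the simplest distance methods. It is chosen only for brevity: the article does not name a method, UPGMA assumes roughly constant evolutionary rates, and the three short "sequences" are invented, pre-aligned toy data rather than real genes.

```python
from itertools import combinations


def p_distance(a: str, b: str) -> float:
    """Fraction of aligned sites at which two equal-length sequences differ."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and aligned to equal length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def upgma(seqs: dict[str, str]) -> str:
    """Cluster sequences by UPGMA and return the rooted tree as a Newick string.

    Branch lengths are omitted to keep the sketch short.
    """
    sizes = {name: 1 for name in seqs}  # cluster label -> number of leaves in it
    dist = {frozenset(pair): p_distance(seqs[pair[0]], seqs[pair[1]])
            for pair in combinations(seqs, 2)}
    while len(sizes) > 1:
        # Join the two closest clusters.
        a, b = sorted(min(dist, key=dist.get))
        merged = f"({a},{b})"
        na, nb = sizes.pop(a), sizes.pop(b)
        del dist[frozenset((a, b))]
        # Distance to the new cluster is the size-weighted average of the old ones.
        for c in sizes:
            d_ac = dist.pop(frozenset((a, c)))
            d_bc = dist.pop(frozenset((b, c)))
            dist[frozenset((merged, c))] = (na * d_ac + nb * d_bc) / (na + nb)
        sizes[merged] = na + nb
    return next(iter(sizes))


# Invented, pre-aligned toy sequences (not real data).
toy = {"human": "ACGTACGTAC", "chimp": "ACGTACGTAT", "mouse": "ACGAACTTAT"}
print(upgma(toy))  # ((chimp,human),mouse)
```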


Theory of Evolution 


The theory of evolution is based on the fundamental 
nature of organisms. Organisms are genetic systems 
(Guttman, 1999). That is, they are self-reproducing 
systems that operate on the basis of instructions 
encoded in their genomes, and so during the course 
of reproduction, parental genomes must be replicated 
to produce new copies for the offspring. But replication
is inherently an error-prone process, and mutations
continuously introduce genetic novelties.
Furthermore, sexual reproduction entails the reshuf- 
fling of chromosomes into different combinations, 
so individuals acquire variant genomes, thus giving 
them different traits. Organisms inhabit ecosystems 
that afford various opportunities for obtaining the 
resources they need (energy, raw materials, living 
spaces, etc.). Each organism, with its particular com- 
bination of traits, has a particular ability to exploit 
those resources — in other words, to adapt to a particu- 
lar ecological niche — and thus experiences a certain 
level of reproductive success, which is generally quantified
as the organism’s fitness. Those with the highest
reproductive success have, by definition, the
highest fitness, and thus are most successful in passing 
on their particular genotypes. Thus, organisms are 
subject to a process of natural selection as those that 
are most fit for a particular way of life — those that are 
best adapted to a particular niche — are most successful 
in reproducing and their genotypes become most 
common. This, as Darwin recognized, is the most 
central and most critical process governing evolution, 
even though other factors also intervene. (The descrip- 
tion of the process clearly has a certain tautological 
character.) 
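The verbal argument can be made concrete with the standard one-locus recursion for a haploid population with two genotypes, A and B: if p is the frequency of A and w_A, w_B are constant relative fitnesses, the next generation has p′ = p·w_A / (p·w_A + (1 − p)·w_B). The model and the parameter values below are illustrative assumptions, not taken from the article.

```python
def select(p: float, fit_a: float, fit_b: float) -> float:
    """One generation of selection in a simple haploid model.

    p is the frequency of genotype A; fit_a and fit_b are the relative
    fitnesses (expected offspring numbers) of genotypes A and B.
    """
    mean_fitness = p * fit_a + (1 - p) * fit_b
    return p * fit_a / mean_fitness


def trajectory(p0: float, fit_a: float, fit_b: float, generations: int) -> list[float]:
    """Frequencies of A over successive generations, starting from p0."""
    freqs = [p0]
    for _ in range(generations):
        freqs.append(select(freqs[-1], fit_a, fit_b))
    return freqs


# A 1% fitness advantage to A, starting from a frequency of 1%.
freqs = trajectory(0.01, fit_a=1.01, fit_b=1.00, generations=1000)
for g in (0, 250, 500, 750, 1000):
    print(g, round(freqs[g], 3))
```

Even a small, constant advantage carries the fitter genotype from rarity to near fixation within about a thousand generations in this model.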


The Modern Synthesis 


While some major features of evolution, and espe- 
cially the centrality of natural selection, became clear 
from Darwin’s great work, a modern consensus on the 
process only emerged in the early decades of the 
twentieth century. By about 1940, it was possible to 
outline a modern synthetic theory combining the 
essential discoveries of Mendelian genetics with a 
mathematical analysis of genes in populations as 
outlined by R.A. Fisher, S. Wright, and J.B.S. Haldane, 
and with the observations of taxonomists and field 
naturalists, as outlined most clearly by Ernst Mayr 
for animals and by G.L. Stebbins for plants. Since 
that time, some features of the modern theory
have been challenged, and enormous detail has been
added, but the theory as a whole remains successful 
and intact. 

Evolution is fundamentally a population phenom- 
enon. Individuals do not evolve. Populations do. 
Studies of morphological, genetic, and biochemical 
features have shown that natural populations harbor 
enormous variation, and it is generally believed that 
this variability is the basis for further evolution. It acts 
as a kind of genetic insurance, a buffer that allows the 
population to maintain itself by adapting to future 
environmental changes and perils. (A major concern 
for the survival of endangered species is the severe 
reduction in their genetic variability as a result of 
their reduced populations.) All natural populations 
are highly polymorphic, at least at a genetic and 
biochemical level. The classic observations by 
Dobzhansky and his associates of natural populations 
of Drosophila demonstrated that these populations 
carry many chromosome types, identified by the 
inversions they carry, and that the frequencies of chro- 
mosome types vary geographically, apparently reflect- 
ing subtle adaptations to different environments. 
Furthermore, the relative frequencies of different 
chromosomes may change regularly throughout the 
year, reflecting adaptations to conditions that change 
with the seasons. It is generally believed that the mere 
recombination of allelic differences already in a nat- 
ural population (ignoring further variations created by 
mutation) is adequate to produce considerable novelty 
and thus considerable raw material for natural selec- 
tion in the future. 

The evolutionary process is commonly divided into 
three phases. Microevolution refers to the relatively 
small changes that occur within populations and in- 
dividual species; speciation refers to the process in 
which a single species divides into two or more; and 
macroevolution refers to the larger changes observed 
over much longer times as organisms of quite different 
forms develop. This description revolves to a degree 
around the concept of a species, which is in itself a 
matter of considerable controversy at present. Con- 
temporary thinking has been shaped largely by the 
biological species concept delineated most clearly by 
Mayr: a species is a series of populations that are 
actually or potentially capable of interbreeding with 
one another. This definition is only relevant to sexu- 
ally reproducing organisms. The concept of a species 
may be meaningless for those that reproduce asexu- 
ally, since such organisms are related only by an 
ever-expanding family tree of cell division after cell 
division, augmented by occasional lateral genetic 
transfer, often mediated by viruses. The features of 
organisms on the many branches of this tree may 
diverge from one another without limit or may be 
kept somewhat confined by continuing selection. 
The biological species concept has been applied most 
consistently and successfully to certain groups of ani- 
mals; it may be applied with some difficulty in plants, 
which often are able to reproduce in more plastic ways 
and to hybridize with one another quite freely. This 
conception of a species has been challenged by the 
phylogenetic species concept, which is much more 
difficult to define but says, essentially, that a species 
shall be considered a distinct branch of a phylogenetic 
tree that can be distinguished morphologically or 
genetically. The issue may be more anthropological 
(that is, reflective of the human need to categorize 
objects neatly) than biological; it is clear in any case 
that ‘species’ by any conception have diverged from 
one another in the past and continue to do so. 


Speciation 

As described by Mayr and others, speciation probably 
occurs primarily through geographic isolation. Two 
populations are said to be sympatric if their ranges 
overlap and allopatric if they do not. Speciation 
in many well-documented instances has evidently
occurred when one population of a species became
isolated from the rest. During the time of its isolation, 
it acquires differences that result in reproductive isol- 
ation once the populations again become sympatric. 
Reproductive isolating mechanisms may entail eco- 
logical factors, such as occupying slightly different 
habitats so prospective mates do not come into con- 
tact; temporal factors, such as breeding at different 
times; and physical barriers to reproduction such 
as chromosomal rearrangements, incompatibility be- 
tween sperm and eggs, or failure of hybrid embryos to 
develop. The records of intense speciation in the past 
are quite obvious in archipelagos; the ground finches 
(Geospizinae) of the Galapagos Islands or the honey- 
creepers (Drepanididae) of the Hawaiian Islands show 
how one original species has apparently diverged 
into a considerable variety of species, occupying dif- 
ferent ecological niches, as populations probably 
became quite isolated from one another on different 
islands. 

While allopatric speciation may be a common pro- 
cess in animals, many plants have evolved through 
genetic events that may occur sympatrically. Plants 
appear to be much more plastic genetically than ani- 
mals, and plant development seems to be much more 
tolerant of major changes in the genome, such as the 
loss or addition of whole chromosomes and changes 
from a diploid to a triploid or tetraploid condition (or 
even higher ploidy). Related plant species often have
genomes related by such large changes. A great deal of
plant evolution has been explained by introgressive
hybridization, in which related species hybridize and
one or more chromosomes of one parent species
become incorporated into the genome of the other,
eventually resulting in a third species with features 
derived from both parents. 

Detailed mathematical analysis of the behavior of 
genes in populations has shown how the frequencies 
of alleles may be changed by mutation or by various 
regimes of selection. The rate at which allele frequen- 
cies may change depends strongly on the size of the 
population, leading Sewall Wright to point out that in 
small populations gene frequencies may change 
rapidly in directions not determined by natural selec- 
tion. This phenomenon, called genetic drift, may be 
very important in speciation. The individuals that 
become isolated in the first place may themselves 
have genotypes different from the average genotypes 
of the parent population — the founder effect; further- 
more, genetic drift within the small isolated popula- 
tion may produce just those differences that make for 
eventual reproductive isolation. 
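The dependence of drift on population size can be seen in a small simulation. This sketch assumes the Wright-Fisher model (random sampling of 2N gene copies each generation, no selection, mutation, or migration), which is a standard idealization rather than anything specified in the article; the replicate count, generation count, and seed are arbitrary.

```python
import random


def wright_fisher(p0: float, pop_size: int, generations: int,
                  rng: random.Random) -> list[float]:
    """Neutral allele frequencies in a Wright-Fisher population of pop_size diploids.

    Each generation, the 2N gene copies are drawn at random, with replacement,
    from the previous generation's gene pool; there is no selection, mutation,
    or migration, so all change is due to drift.
    """
    copies = 2 * pop_size
    count = round(p0 * copies)
    freqs = [count / copies]
    for _ in range(generations):
        p = count / copies
        count = sum(1 for _ in range(copies) if rng.random() < p)
        freqs.append(count / copies)
    return freqs


def share_fixed_or_lost(pop_size: int, replicates: int = 200,
                        generations: int = 50, seed: int = 1) -> float:
    """Fraction of replicate populations (starting at p = 0.5) that drifted to 0 or 1."""
    rng = random.Random(seed)
    finished = 0
    for _ in range(replicates):
        final = wright_fisher(0.5, pop_size, generations, rng)[-1]
        if final in (0.0, 1.0):
            finished += 1
    return finished / replicates


for n in (5, 20, 100):
    print(f"N = {n}: {share_fixed_or_lost(n):.2f} of populations fixed or lost")
```

In the smallest populations most replicates lose or fix the allele within 50 generations, while in the largest almost none do.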


Extinction 
The fossil record reveals three major patterns in evo- 
lution: speciation, extinction, and phyletic evolution. 
Speciation has already been described. Extinction is 
clearly a major feature of evolution. Although a few 
species have apparently persisted for very long times 
(in relatively stable environments, such as the depths 
of the ocean), most species have appeared in the fossil 
record, have persisted for periods on the order of 
100 000 years to a few million years, and then have
become extinct. The paleontologist G.G. Simpson 
estimated that 99.9% of all species have become 
extinct. Phyletic evolution refers to a gradual change 
in morphology in a certain direction; for instance, 
hominid (human) evolution, while involving apparent 
instances of speciation, has also entailed a gradual 
increase in height and cranial capacity, and certain 
trends in anatomical details. However, some paleonto- 
logists have proposed that phyletic evolution is illu- 
sory and that evolution is more properly described as 
punctuated equilibrium — that is, a species generally 
endures with little or no change until it becomes 
extinct, but occasional instances of speciation occur 
rapidly, so it may appear that a single species has 
gradually changed. This may be a non-issue. Instances 
of both phyletic evolution and punctuated equilib- 
rium can apparently be documented in the fossil 
record. 

The synthetic theory pictured evolution as being 
driven largely by gradual selection of alleles with small 
effects, over relatively long times. This viewpoint has
been challenged by champions of punctuated equilibrium
with rapid speciation and by proposals that more
drastic genetic events might be responsible for quite 
dramatic changes in morphology. It is clear that very 
small effects, as demonstrated by selection experi- 
ments with animals such as Drosophila, can account 
for the large morphological changes observed in fossil 
series (Stebbins and Ayala, 1981). Furthermore, spe- 
ciation that appears to be rapid on the geological time 
scale may actually require tens of thousands of years, a 
period perfectly consistent with small, slow genetic 
events. On the other hand, studies of developmental 
genetics have revealed genes, such as homeotic genes, 
that govern major morphological changes, and the 
growing marriage of developmental biology with 
evolution may reveal ways that rapid evolutionary 
change might result from changes in these regulatory 
genes. 


Microbial Evolution 


There are clearly instances of lateral transfer, where 
viruses, for example, carry genes from one species to 
another or insert themselves in the middle of a gene,
making substantial changes. That may further cloud
the picture at times, and it clearly plays a far greater
role in the complexities of microbial evolution. The
current sequencing of large numbers of microbial 
genomes is facilitating rapid growth in our under- 
standing of that process, presenting very different 
pictures of the early stages of cellular evolution and 
the development of the three kingdoms than those 
based primarily on ribosomal RNA data. The pos- 
sibility has been raised that viruses may even provide 
windows into some of the ancient organisms that dis- 
appeared in the bottleneck of the ‘last common ances- 
tor’ of the three kingdoms; only a fraction of the genes 
of the large viruses look like anything seen to date 
in cellular organisms, and a significant number of 
similarities have been seen between genes of bacterio- 
phages and eukaryotic viruses. There is much evi- 
dence, at least, that most families of viruses are very 
ancient in origin and have coevolved with their vari- 
ous hosts. 


Genomic Complexity Increases by Gene Duplication and Selection for New Function


Mice, humans, the lowly intestinal bacterium Escheri- 
chia coli, and all other forms of life evolved from the 
same common ancestor that was alive on this planet a 
few billion years ago. We know this is the case from 
the universal use of the same molecule, DNA, for
the storage of genetic information, and from the
nearly universal genetic code. But E. coli has a genome
size of 4.2 megabases (Mb), while the mammalian
genome is more than 700-fold larger at ~3000 Mb. If one
assumes that our common ancestor had a genome size 
that was no larger than that of the modern-day E. coli, 
the obvious question one can ask is where did all of 
our extra DNA come from? 

The answer is that our genome grew in size and 
evolved through a repeated process of duplication and 
divergence. Duplication events can occur essentially at 
random throughout the genome and the size of the 
duplication unit can vary from as little as a few nucleo- 
tides to large subchromosomal sections that are tens, 
or even hundreds, of megabases in length. When the 
duplicated segment contains one or more genes, either 
the original or duplicated copy of each is set free to 
accumulate mutations without harm to the organism,
since the other copy, retaining the original function,
will still be present.

Duplicated regions, like all other genetic novelties, 
must originate in the genome of a single individual and 
their initial survival in at least some animals in each 
subsequent generation of a population is, most often, a 
simple matter of chance. This is because the addition 
of one extra copy of most genes — to the two already 
present in a diploid genome — is usually tolerated 
without significant harm to the individual animal. In 
the terminology of population genetics, most duplicated
units are essentially selectively neutral and
thus are subject to genetic drift,
inherited by some offspring but not others derived 
from parents that carry the duplication unit. By 
chance, most neutral genetic elements will succumb 
to extinction within a matter of generations. But 
even when a duplicated region survives for a sig- 
nificant period of time, random mutations in what 
were once-functional genes will almost always lead 
to nonfunctionality. At this point, the gene becomes a 
pseudogene. Pseudogenes will be subject to continu- 
ous genetic drift with the accumulation of new muta- 
tions at a pace that is so predictable (~0.5% divergence 
per million years) as to be likened to a ‘molecular 
clock.’ Eventually, nearly all pseudogene sequences 
will tend to drift past a boundary where it is no longer 
possible to identify the functional genes from which 
they derived. Continued drift will act to turn a once- 
functional sequence into a sequence of essentially ran- 
dom DNA. 
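The ‘molecular clock’ reasoning reduces to simple arithmetic: if divergence accumulates at a steady rate, elapsed time is divergence divided by rate. This sketch assumes, as the text states, about 0.5% divergence per million years, and treats accumulation as strictly linear; that assumption fails at high divergence, where repeated hits at the same site make the linear estimate too low. The function names are hypothetical.

```python
def percent_divergence(seq_a: str, seq_b: str) -> float:
    """Percentage of aligned sites that differ between two equal-length sequences."""
    if len(seq_a) != len(seq_b) or not seq_a:
        raise ValueError("need two non-empty aligned sequences of equal length")
    return 100 * sum(a != b for a, b in zip(seq_a, seq_b)) / len(seq_a)


def divergence_time_myr(divergence_pct: float, rate_pct_per_myr: float = 0.5) -> float:
    """Naive clock estimate: time (millions of years) = divergence / rate.

    Assumes divergence accumulates linearly at a constant rate.
    """
    if rate_pct_per_myr <= 0:
        raise ValueError("rate must be positive")
    return divergence_pct / rate_pct_per_myr


# A pseudogene differing from its functional parent at 10% of sites:
print(divergence_time_myr(10.0))  # 20.0
```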

Miraculously, every so often, the accumulation of a 
set of random mutations in a spare copy of a gene can 
lead to the emergence of a new functional unit — or 
gene — that provides benefit and, as a consequence, 
selective advantage to the organism in which it resides. 
Usually, the new gene has a function that is related to 
the original gene function. However, it is often the 
case that the new gene will have a novel expression 
pattern — spatially, temporally, or both — which must 
result from alterations in cis-regulatory sequences that 
occur along with codon changes. A new function can 
emerge directly from a previously functional gene or 
even from a pseudogene. In the latter case, a gene can 
go through a period of nonfunctionality during which 
there may be multiple alterations before the gene 
comes back to life. Molecular events of this class can 
play a role in ‘punctuated evolution’ where, according 
to the fossil or phylogenetic record, an organism or 
evolutionary line appears to have taken a ‘quantum 
leap’ forward to a new phenotypic state. 


Duplication by Transposition 


With duplication acting as such an important force in 
evolution, it is critical to understand the mechanisms 
by which it occurs. These fall into two broad cat- 
egories: (1) transposition is responsible for the disper- 
sion of related sequences; (2) unequal crossing-over is 
responsible for the generation of gene clusters. Trans- 
position refers to a process in which one region of the 
genome relocates to a new chromosomal location. 
Transposition can occur either through the direct 
movement of original sequences from one site to 
another or through an RNA intermediate that leaves 
the original site intact. When the genomic region itself 
(rather than its proxy) has moved, the ‘duplication’ of 
genetic material actually occurs in a subsequent generation
after the transposed region has segregated into
the same genome as the originally positioned region 
from a nondeleted homolog. In theory, there is no 
upper limit to the size of a genomic region that can 
be duplicated in this way. 

A much more common mode of transposition 
occurs by means of an intermediate RNA transcript 
that is reverse-transcribed into DNA and then 
inserted randomly into the genome. This process is 
referred to as retrotransposition. The size of the retro- 
transposition unit — called a retroposon — cannot be 
larger than the size of the intermediate RNA tran- 
script. Retrotransposition has been exploited by vari- 
ous families of selfish genetic elements, some of 
which have been copied into 100 000 or more locations
dispersed throughout the genome by means of a self-encoded
reverse transcriptase. But examples of functional,
intronless retroposons, such as Pgk2 and Pdha2,
have also been identified. In such cases, functionality 
is absolutely dependent upon novel regulatory ele- 
ments either present at the site of insertion or created 
by subsequent mutations in these sequences. 


Duplication by Unequal Crossing-Over 


The second broad class of duplication events results
from unequal crossing-over. Normal crossing-over, 
or recombination, can occur between equivalent 
sequences on homologous chromatids present in a 
synaptonemal complex that forms during the pachy- 
tene stage of meiosis in both male and female mam- 
mals. Unequal crossing-over — also referred to as 
illegitimate recombination — refers to crossover events 
that occur between nonequivalent sequences. Unequal 
crossing-over can be initiated by the presence of re- 
lated sequences — such as highly repeated retroposon- 
dispersed selfish elements — located nearby in the 
genome. Although the event is unequal, in this case, 
it is still mediated by the homology that exists at the 
two nonequivalent sites. 

So-called nonhomologous unequal crossovers can 
also occur, although they are much rarer than homo- 
logous events. They are “so-called” because even these 
events may be dependent on at least a short stretch of 
sequence homology at the two sites at which the event 
is initiated. The initial duplication event that produces 
a two-gene cluster may be either homologous or non- 
homologous, but once two units of related sequence 
are present in tandem, further rounds of homologous 
unequal crossing-over can be easily initiated between 
nonequivalent members of the pair, as illustrated in
Figure 1. Thus, it is easy to see how clusters can
expand to contain three, four, and many more copies 
of an original DNA sequence. 
Figure 1 Unequal crossing-over generates gene families. The left side illustrates an unequal crossing-over event and the two products that are generated. One product is deleted and the other is duplicated for the same region. In this example, the duplicated region contains a second complete copy of a single gene (B). The right side illustrates a second round of unequal crossing-over that can occur in a genome that is homozygous for the original duplicated chromosome. In this case, the crossover event has occurred between the two copies of the original gene. Only the duplicated product generated by this event is shown. Over time, the three copies of the B gene can diverge into three distinct functional units of a gene family cluster.


In all cases, unequal crossing-over between homo- 
logs results in two reciprocal chromosomal products: 
one will have a duplication of the region located be- 
tween the two sites and the other will have a deletion 
that covers exactly the same region (Figure 1). It is
important to remember that, unlike retrotransposition,
unequal crossing-over operates on genomic regions 
without regard to functional boundaries. The size of 
the duplicated region can vary from a few base pairs to 
tens or even hundreds of kilobases and it can contain 
no genes, a portion of a gene, a few genes, or many. 
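The reciprocal products described above can be reproduced with a segment-level toy model. Chromosomes are lists of labels; the gene B follows the figure, while the flanking labels A and C are hypothetical. Breakpoints are allowed only between whole segments here, which is a simplification: as the text notes, real events can fall anywhere, including inside genes.

```python
def crossover(chrom1: list[str], chrom2: list[str],
              cut1: int, cut2: int) -> tuple[list[str], list[str]]:
    """Reciprocal exchange between two chromosomes given as lists of segment labels.

    chrom1 is broken just before index cut1 and chrom2 just before index cut2.
    On aligned homologs, cut1 == cut2 is an ordinary (equal) crossover. For
    identical homologs cut at nonequivalent positions, one product gains
    exactly the segments the other loses.
    """
    return chrom1[:cut1] + chrom2[cut2:], chrom2[:cut2] + chrom1[cut1:]


# Round 1: misaligned exchange between two copies of the same chromosome.
original = ["A", "B", "C"]
duplicated, deleted = crossover(original, original, cut1=2, cut2=1)
print(duplicated, deleted)  # ['A', 'B', 'B', 'C'] ['A', 'C']

# Round 2: in a homozygote for the duplication, the two B copies mispair.
triplicated, restored = crossover(duplicated, duplicated, cut1=3, cut2=2)
print(triplicated, restored)  # ['A', 'B', 'B', 'B', 'C'] ['A', 'B', 'C']
```

Once two related copies sit in tandem, each further round of mispairing can add (or remove) one copy, which is how a cluster grows.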


Genetic Exchange between Related 
DNA Elements 


There are many examples in the genome where genetic 
information appears to flow from one DNA element 
to other related — but nonallelic — elements located 
nearby or even on different chromosomes. In some 
special cases, the flow of information is so extreme as 
to allow all members of a gene family to coevolve with 
near-identity as in the case of ribosomal RNA genes. In 
at least one case — that of the class I genes of the major 
histocompatibility complex (MHC or H2) — informa- 
tion flow is unidirectionally selected, going from a 
series of 25 to 38 nonfunctional pseudogenes into two 
or three functional genes. In this case, intergenic infor- 
mation transfer serves to increase dramatically the 
level of polymorphism that is present at the small 
number of functional gene members of this family. 
Information flow between related DNA sequences 
occurs as a result of an alternative outcome from the 
same exact process that is responsible for unequal 
crossing-over. This alternative outcome is known as 
intergenic gene conversion. Gene conversion was ori- 
ginally defined in yeast through the observation of 
altered ratios of segregation from individual loci that 
were followed in tetrad analyses. These observations 
were fully explained within the context of the Holli- 
day model of DNA recombination which states that 
homologous DNA duplexes first exchange single 
strands that hybridize to their complements and 
migrate for hundreds or thousands of bases. Resolu- 
tion of this ‘Holliday intermediate’ can lead with 
equal frequency to crossing-over between flanking 
markers or back to the status quo without crossing- 
over. In the latter case, a short single strand stretch 
from the invading molecule will be left behind within 
the DNA that was invaded. If an invading strand 
carries nucleotides that differ at any site from the 
strand that was replaced, these will lead to the produc- 
tion of heteroduplexes with base pair mismatches. 
Mismatches can be repaired (in either direction) by 
specialized ‘repair enzymes’ or they can remain as-is 
to produce non-identical daughter DNAs through the 
next round of replication. 

By extrapolation, it is easy to see how the Holliday 
model can be applied to the case of an unequal cross- 
over intermediate which can be resolved in one of two 
directions with equal probability. With one resolution, 
unequal crossing-over will result; with the alternative 
resolution, gene conversion can be initiated between 
nonallelic sequences. Remarkably, information transfer,
presumably by means of gene conversion, can
also occur between related DNA sequences that are
located on different chromosomes. There have
been numerous modifications of the Holliday model,
including those proposed by Meselson and Radding,
that allow a better fit to the actual data, and there is
still a lack of consensus on some of the details
involved. However, the central feature of the Holliday 
model — single-strand invasion, branch migration, and 
duplex resolution — is still considered to provide the 
molecular basis for gene conversion. 
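The outcome of mismatch repair within a heteroduplex tract can be sketched as follows. This is a deliberately simplified model: the two sequences are assumed to be already aligned, the tract boundaries are given, each mismatch is repaired toward the donor with a hypothetical probability `p_to_donor` (otherwise back to the acceptor), and mismatches left unrepaired until replication are not modeled. No flanking sequence is exchanged, which is what distinguishes conversion from crossing-over.

```python
import random


def resolve_heteroduplex(acceptor: str, donor: str, start: int, end: int,
                         rng: random.Random, p_to_donor: float = 0.5) -> str:
    """Outcome of mismatch repair over a heteroduplex tract [start, end).

    At each site in the tract where the aligned donor and acceptor differ,
    repair copies the donor base with probability p_to_donor and otherwise
    restores the acceptor base. Sites outside the tract are untouched.
    """
    if len(acceptor) != len(donor):
        raise ValueError("sequences must be aligned to equal length")
    if not 0 <= start <= end <= len(acceptor):
        raise ValueError("tract out of range")
    result = list(acceptor)
    for i in range(start, end):
        if acceptor[i] != donor[i] and rng.random() < p_to_donor:
            result[i] = donor[i]  # converted toward the donor sequence
    return "".join(result)


acceptor, donor = "ACGTACGTAC", "ACCTAGGTTC"
# With every mismatch repaired toward the donor, only sites inside the tract change.
print(resolve_heteroduplex(acceptor, donor, 1, 7, random.Random(0), p_to_donor=1.0))  # ACCTAGGTAC
```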


See also: Gene Conversion; Holliday’s Model; 
Major Histocompatibility Complex (MHC); 
Molecular Clock; Unequal Crossing Over 


Evolutionarily Stable Strategy
When natural selection acts on several different alternative
behaviors, the optimal one should be favored.
If costs and benefits of alternatives depend on choices 
made by other individuals, optimal solutions are not 
always as obvious as they are in simpler situations. An 
evolutionarily stable strategy, or ESS, is a mathemat- 
ical definition for an optimal choice of strategy under 
such conditions. 

Interactions between two individuals can be de- 
picted as a mathematical game between two players. 
A branch of mathematics, called game theory, seeks to 
find the best strategy to play in any given carefully 
defined game. The central problem of game theory is 
to find the best strategy to take in a game that depends 
on what other players are expected to do. 

Originally used in studies of economics and human 
conflicts of interest, game theoretical thinking was 
first used in biology by Hamilton (1967) to study evo- 
lution of sex ratios. Later, game theory was explicitly 
applied to behavioral biology by Maynard Smith (1972) 
and Maynard Smith and Price (1973). Maynard Smith 
coined the term ESS for a refinement of the Nash 
equilibrium used by economists to define a solution 
to a game. 

The notion of a Nash equilibrium makes some tacit 
assumptions about rational foresight on the part of the
players. An ESS must meet a stricter set of requirements
than Nash equilibria; the mathematical difference
boils down to whether a tie between strategies
leads to a new strategy being considered better. An 
ESS attempts to define conditions under which blind
evolution will return to the strategy in question, rather 
than requiring rational foresight to dissuade the ex- 
ploration of alternatives. 

An ESS is a strategy that, once adopted by most
members of a population, cannot be beaten by any
rare alternative strategy: in such a population,
individuals adopting it have higher reproductive
success than individuals adopting any alternative
tactic. Such an unbeatable tactic can go to fixation
(100%) in a population, and such a population cannot
be invaded by any other tactic. Inevitably, an ESS ends
up encountering itself more often than it confronts any
other strategy, and it must therefore perform better
against itself than any other strategy can perform
against it (or, in the case of a tie, perform better
against the alternative than the alternative does
against itself).
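Maynard Smith's conditions can be checked mechanically for pure strategies. Writing E(X, Y) for the payoff to X against Y, strategy I is an ESS if, for every alternative J, either E(I, I) > E(J, I), or E(I, I) = E(J, I) and E(I, J) > E(J, J); the second clause is the tie-breaking rule mentioned above. The Hawk-Dove game used to exercise it is Maynard Smith's standard example, not one discussed in this entry, and the resource value v and cost c are arbitrary.

```python
def is_ess(strategy: str, payoff: dict[tuple[str, str], float]) -> bool:
    """Check Maynard Smith's conditions for a pure strategy I.

    payoff[(X, Y)] is the payoff to a player using X against an opponent using Y.
    I is an ESS if, for every alternative J:
      E(I, I) > E(J, I), or
      E(I, I) == E(J, I) and E(I, J) > E(J, J).
    """
    strategies = {x for x, _ in payoff}
    for alt in strategies - {strategy}:
        home = payoff[(strategy, strategy)]
        invader = payoff[(alt, strategy)]
        if home > invader:
            continue
        if home == invader and payoff[(strategy, alt)] > payoff[(alt, alt)]:
            continue
        return False
    return True


def hawk_dove(v: float, c: float) -> dict[tuple[str, str], float]:
    """Standard Hawk-Dove payoffs for resource value v and injury cost c."""
    return {
        ("Hawk", "Hawk"): (v - c) / 2,
        ("Hawk", "Dove"): v,
        ("Dove", "Hawk"): 0,
        ("Dove", "Dove"): v / 2,
    }


for v, c in ((4, 2), (2, 4)):
    game = hawk_dove(v, c)
    print(v, c, {s: is_ess(s, game) for s in ("Hawk", "Dove")})
```

When the prize outweighs the cost of fighting, Hawk is an ESS; when it does not, neither pure strategy is, and the stable solution is a mixture, which this pure-strategy check does not search for.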

Game theory involves conflicts of interest in which 
the value of a given action by a decision maker 
depends both on its own choices and on those
of others. A ‘payoff’ matrix of values of outcomes is
postulated based on the respective behaviors of two or 
more contestants under all possible situations. Payoffs 
are frequency dependent. Decision rules that repre- 
sent an evolutionarily stable solution to such an 
evolutionary game constitute an ESS (Axelrod and 
Hamilton, 1981). 

As an example, consider a well-known game theor- 
etical model called the ‘prisoner’s dilemma.’ In this 
hypothetical situation, two partners in crime have 
been arrested. The police interrogate each person 
alone. Each party could cooperate with the other and 
steadfastly refuse to squeal on their friend. If both 
cooperate and remain silent, the authorities cannot 
establish guilt and both get off scot-free (loyalty
pays off). Alternatively, each could betray their part- 
ner and confess. Now consider respective rewards and 
punishments received by each partner for making each 
decision. If only one party confesses while the other 
remains quiet, this betrayal is rewarded by giving the 
confessor a light sentence for providing ‘state’s evi- 
dence’ and testifying as to the guilt of their loyal silent 
partner, who is then found guilty and receives a much 
longer prison term (he gets the ‘sucker’s payoff’). 
However, if both partners tell, the authorities put 
both on trial and both receive moderate, but not 
long, sentences of imprisonment. In a ‘zero-sum’
game, all losses add up to equal all gains. Not so in 
this game, where each partner can gain considerably 
without as much loss to the other (indeed, by working 
together, both could escape conviction altogether). 
But they are not allowed to work together and neither 
knows what the other will do. 

Here then, is the classic ‘prisoner’s dilemma’: each 
prisoner must decide what to do without knowing 
what decision the other will make. What is the best
strategy? Confess to the crime! Any attempt to co- 
operate could lead to the ‘sucker’s payoff,’ but confes- 
sion results either in a light sentence or a moderate 
one. Avoid the worst situation. In such a symmetric 
nonzero-sum game, both partners betray the other’s
confidence and both do moderate ‘time.’ Although 
both partners would have been better off if they had 
cooperated, the best solution for each person indi- 
vidually in isolation is to defect rather than take the 
risk of being loyal but being betrayed and ending up 
with the inglorious ‘sucker’s payoff.’ 
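The reasoning above (‘avoid the worst situation’) is a worst-case, or maximin, rule, and it can be checked by enumerating the payoff table. The sketch below assumes hypothetical sentence lengths in years (0 if both stay silent, 1 for a lone confessor, 10 for the ‘sucker,’ 5 each if both confess); the text gives no numbers, and only their ordering matters.

```python
from typing import Dict, Tuple

# Hypothetical prison terms (years) for "me", keyed by (my_move, partner_move).
# Lower is better; only the ordering of the values matters for the argument.
SENTENCE: Dict[Tuple[str, str], int] = {
    ("silent", "silent"): 0,    # both loyal: no conviction
    ("confess", "silent"): 1,   # lone confessor: light sentence
    ("silent", "confess"): 10,  # betrayed: the 'sucker's payoff'
    ("confess", "confess"): 5,  # both confess: moderate sentence
}
MOVES = ("silent", "confess")

def worst_case(my_move: str) -> int:
    """Longest sentence I could receive for this move, over the partner's possible choices."""
    return max(SENTENCE[(my_move, partner)] for partner in MOVES)

def maximin_move() -> str:
    """Choose the move whose worst case is least bad ('avoid the worst situation')."""
    return min(MOVES, key=worst_case)

if __name__ == "__main__":
    for move in MOVES:
        print(f"{move}: worst case {worst_case(move)} years")
    choice = maximin_move()
    print("chosen move:", choice)
    print("outcome if both reason this way:", SENTENCE[(choice, choice)], "years each")
```

With these numbers the worst outcome of staying silent (10 years) is worse than that of confessing (5 years), so two players who both reason this way each serve 5 years instead of going free. In many textbook versions mutual silence still earns a short sentence, which makes confessing the better reply whatever the partner does; the worst-case argument does not depend on that detail.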

The ‘prisoner’s dilemma’ game involves just one decision. Suppose instead that participants interact repeatedly and that each knows the other will be encountered again and again. Now many decisions must be made in sequence. In such a situation, “the future can cast a long shadow backwards onto the present” (Axelrod, 1984), and cooperation can evolve. Consider the evolutionary strategy “tit-for-tat,” whose rules are to cooperate on the first encounter and then copy the behavior of the other player on all subsequent encounters. Using this strategy, a player always cooperates on its first encounter, but if player B defects, player A retaliates on its next move. In a population composed of a mixture of players with a variety of behavioral strategies, an individual employing the tit-for-tat strategy does well. When it interacts with cooperative individuals, both players always cooperate, to their mutual advantage. If the other player does not cooperate, the tit-for-tat player retaliates on every subsequent move; it then receives none of the advantages of cooperation, but its initial attempt at cooperation costs it only a little. Under these conditions tit-for-tat can be highly profitable and can spread quickly to fixation. When the entire population employs the tit-for-tat strategy, it cannot be invaded by individuals employing most other tactics: tit-for-tat is normally an ESS (but see below for an exception).
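A short simulation makes the contrast concrete. It plays tit-for-tat against itself and against an always-defect player for a fixed number of rounds. The per-round scores (5 for exploiting a cooperator, 3 each for mutual cooperation, 1 each for mutual defection, 0 for the exploited cooperator) are the values commonly used in Axelrod-style tournaments; they are an assumption here, and they are points to be maximized rather than prison terms.

```python
from typing import Callable, List, Tuple

C, D = "C", "D"  # cooperate, defect

# Points for (my_move, other_move); higher is better (assumed Axelrod-style values).
PAYOFF = {(C, C): 3, (C, D): 0, (D, C): 5, (D, D): 1}

# A strategy sees the opponent's past moves and returns its next move.
Strategy = Callable[[List[str]], str]

def tit_for_tat(opponent_history: List[str]) -> str:
    """Cooperate first, then copy the opponent's previous move."""
    return C if not opponent_history else opponent_history[-1]

def always_defect(opponent_history: List[str]) -> str:
    """Defect on every move, whatever the opponent has done."""
    return D

def play(a: Strategy, b: Strategy, rounds: int) -> Tuple[int, int]:
    """Play an iterated game and return the total points of each player."""
    hist_a: List[str] = []
    hist_b: List[str] = []
    score_a = score_b = 0
    for _ in range(rounds):
        move_a, move_b = a(hist_b), b(hist_a)
        score_a += PAYOFF[(move_a, move_b)]
        score_b += PAYOFF[(move_b, move_a)]
        hist_a.append(move_a)
        hist_b.append(move_b)
    return score_a, score_b

if __name__ == "__main__":
    print("TFT vs TFT:  ", play(tit_for_tat, tit_for_tat, 10))
    print("TFT vs ALLD: ", play(tit_for_tat, always_defect, 10))
    print("ALLD vs ALLD:", play(always_defect, always_defect, 10))
```

Over ten rounds two tit-for-tat players score 30 each. Against an always-defector, tit-for-tat loses only the first round (9 points to the defector’s 14), while two defectors manage just 10 each, so a population of tit-for-tat players does far better than a population of defectors.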

Axelrod (1984) identified three behavioral tendencies that would favor the evolution of cooperation: (1) being ‘nice’ (never the first to defect); (2) being ‘provocable’ (retaliating against defection); and (3) being ‘forgiving.’ The first two are the hallmarks of tit-for-tat. The third, letting bygones be bygones and resuming cooperation, characterizes the strategy known as ‘generous tit-for-tat,’ which is unusual in that it can invade tit-for-tat under certain conditions. Possession of these three behavioral traits makes it more likely that both parties will reap the benefits of mutual cooperation. Many highly social animals do indeed display these three behaviors.
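One condition often discussed in this context is that players occasionally make mistakes; the text does not specify the conditions, so this is an assumption. The sketch below (hypothetical function names; the forgiveness probability `g` is a free parameter) forces a single erroneous defection into a game between two identical players and compares strict tit-for-tat (`g = 0`) with a generous variant that, after a defection, cooperates anyway with probability `g`.

```python
import random
from typing import List, Optional, Tuple

C, D = "C", "D"

def generous_tft(opponent_history: List[str], g: float, rng: random.Random) -> str:
    """Copy the opponent's last move, but forgive a defection with probability g.
    With g = 0 this is ordinary tit-for-tat."""
    if not opponent_history or opponent_history[-1] == C:
        return C
    return C if rng.random() < g else D

def play_with_error(g: float, rounds: int, error_round: int,
                    seed: Optional[int] = 0) -> Tuple[List[str], List[str]]:
    """Two identical players; player A defects by mistake once, at error_round."""
    rng = random.Random(seed)
    hist_a: List[str] = []
    hist_b: List[str] = []
    for r in range(rounds):
        move_a = generous_tft(hist_b, g, rng)
        move_b = generous_tft(hist_a, g, rng)
        if r == error_round:
            move_a = D  # the accidental defection
        hist_a.append(move_a)
        hist_b.append(move_b)
    return hist_a, hist_b

if __name__ == "__main__":
    for g in (0.0, 1 / 3):
        a, b = play_with_error(g, rounds=12, error_round=2)
        print(f"g={g:.2f}  A: {''.join(a)}  B: {''.join(b)}")
```

With `g = 0` the single mistake echoes back and forth indefinitely, one player defecting in every round; with any `g > 0` the echo eventually stops at the first act of forgiveness, after which both players return to mutual cooperation.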

The above examples illustrate ‘pure’ strategies: always adopt a single, best rule of behavior. Such an outcome often arises in contests with just two contestants. However, when an individual must play against an entire population of other individuals, ESS solutions are often ‘mixed,’ with probabilistic rules determining the chosen strategy. In a particular situation, be a bully with probability p but be cowardly with probability q = 1 − p. At equilibrium, a fraction p of the population will be bullies and the remaining fraction q will be cowards, with each tactic doing equally well overall: the average payoff to a bully equals the average payoff to a coward. If the proportions in the population deviate toward too many bullies, cowards outperform bullies, whereas if there are too many cowards, bullies perform better. This is the classic hawk-dove game. Sex ratios are similar: if males are in short supply, on average an individual male will contribute more genes to the next generation than an individual female (and vice versa if females are scarce). These are also examples of frequency-dependent selection.
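The hawk-dove case can be made quantitative with the standard textbook payoffs, which the text does not give and which are assumed here: a contested resource is worth V, an escalated fight costs the loser C, a hawk meeting a hawk wins or is injured with equal chance, a hawk takes everything from a dove, and two doves share. When C > V, the mixed equilibrium has hawks (bullies) at frequency p* = V/C. The sketch checks that both tactics earn the same average payoff there, and that the rarer tactic does better away from it.

```python
from fractions import Fraction
from typing import Tuple

def average_payoffs(p: Fraction, V: Fraction, C: Fraction) -> Tuple[Fraction, Fraction]:
    """Average payoff to a hawk and to a dove when a fraction p of opponents are hawks."""
    hawk = p * (V - C) / 2 + (1 - p) * V   # fight another hawk, or take all from a dove
    dove = p * 0 + (1 - p) * V / 2          # retreat from a hawk, or share with a dove
    return hawk, dove

def mixed_ess(V: Fraction, C: Fraction) -> Fraction:
    """Equilibrium hawk frequency; valid only when injury costs exceed the prize (C > V)."""
    if C <= V:
        raise ValueError("with C <= V, pure hawk is the ESS")
    return V / C

if __name__ == "__main__":
    V, C = Fraction(2), Fraction(4)   # hypothetical values
    p_star = mixed_ess(V, C)
    print("p* =", p_star)
    for p in (Fraction(3, 10), p_star, Fraction(7, 10)):
        hawk, dove = average_payoffs(p, V, C)
        print(f"hawk frequency {p}: hawk earns {hawk}, dove earns {dove}")
```

Exact fractions are used so that the equality of payoffs at p* is exact rather than approximate. With V = 2 and C = 4, hawks at frequency 3/10 out-earn doves, hawks at 7/10 earn less than doves, and at 1/2 both earn 1/2.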

ESS rules can also be ‘conditional,’ taking a form like “if hungry, be a bully, but if satiated be a coward” (Enquist, 1985). In the real world, most behaviors are probably closely attuned to such immediate environmental situations. Often, combatants are not equal, leading to conditional rules such as “fight if I’m bigger” but “flee if I’m smaller” (Hammerstein, 1981). Such rules lead to pecking orders, with larger animals dominant over smaller ones. Because even the winner can be injured in a fight, fights are best avoided by both contestants if the outcome is already relatively certain. Often, ritualized appeasement behaviors and postures are adopted by the loser, effectively curtailing the aggressive behavior of winners. Indeed, fights only make evolutionary sense when two contestants are closely matched and each is equally likely to win (Enquist and Leimar, 1983). In such a situation, fights escalate and serious injuries can occur. Often the loser holds its stance, almost as a bluff, right up until the end, and then gives up abruptly and flees. Among many animals, residents typically win in encounters with vagrants: the first animal to arrive seems to acquire ownership and the motivation to defend its turf. Game theory easily accommodates such flexible behavior (Maynard Smith and Parker, 1976). The ESS approach has been particularly useful in analyzing the evolution of communication (Johnstone, 1997).
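Ownership-based rules like “residents win” are often analyzed with a textbook extension of the hawk-dove game, the ‘bourgeois’ strategy: play hawk when resident and dove when intruder. The sketch below is that textbook version, not a model taken from the cited papers. It assumes each animal is the resident in half of its contests, uses hypothetical values for V (value of the territory) and C (cost of injury), and tests a sufficient condition for an ESS: strategy S does strictly better against itself than any alternative T does against S.

```python
from fractions import Fraction

def contest(me: str, other: str, V: Fraction, C: Fraction) -> Fraction:
    """Expected payoff of one pure move against another (H = escalate, D = display and retreat)."""
    if me == "H" and other == "H":
        return (V - C) / 2
    if me == "H":
        return V
    if other == "H":
        return Fraction(0)
    return V / 2

def move(strategy: str, resident: bool) -> str:
    """Bourgeois ('B') plays hawk as resident and dove as intruder; H and D ignore ownership."""
    if strategy == "B":
        return "H" if resident else "D"
    return strategy

def payoff(s: str, t: str, V: Fraction, C: Fraction) -> Fraction:
    """Average payoff of strategy s against t; each is resident in half of the contests."""
    as_resident = contest(move(s, True), move(t, False), V, C)
    as_intruder = contest(move(s, False), move(t, True), V, C)
    return (as_resident + as_intruder) / 2

def is_strict_ess(s: str, V: Fraction, C: Fraction, strategies=("H", "D", "B")) -> bool:
    """True if s earns strictly more against itself than any rare alternative earns against s."""
    home = payoff(s, s, V, C)
    return all(home > payoff(t, s, V, C) for t in strategies if t != s)

if __name__ == "__main__":
    for V, C in ((Fraction(2), Fraction(4)), (Fraction(4), Fraction(2))):
        stable = [s for s in ("H", "D", "B") if is_strict_ess(s, V, C)]
        print(f"V={V}, C={C}: strict ESS = {stable}")
```

When injury costs exceed the prize (C > V), ‘respect ownership’ is stable and no fights occur; when the prize is worth more than the risk, escalating regardless of ownership (pure hawk) is stable instead.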
An evolutionary rate is used to describe the dynamics
of change in a lineage across many generations. The 
changes of interest may be in the genome itself or 
in the phenotypic expression of underlying genetic 
events. 

For example, one might be interested in the evolu- 
tionary rate during the domestication of corn (Zea 
mays) from its teosinte ancestor (Z. parviglumis or a 
related species). One would first need an estimate of 
the time since their divergence from a common ances- 
tor, which in this case is approximately 7500 years ago 
based on archeological evidence from Mesoamerica 
(where corn was domesticated). The evolutionary 
rate of genetic change could be ascertained by com- 
paring DNA sequences, ideally for several genes from 
a number of individuals of each species. To a first 
approximation, the rate of change can be expressed 
as the number of differences, per base pair sequenced, 
per year of divergence, where the time of divergence of 
two lineages is twice the time since their common 
ancestor. In practice, several issues may necessitate 
more complex analyses, such as the possibility that a 
single difference in DNA sequence may reflect multi- 
ple evolutionary changes; this particular effect is most 
pronounced when the sequences are highly divergent. 
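The first-approximation calculation, and one common correction for multiple changes at the same site, can be sketched as follows. The correction used here is the Jukes-Cantor formula, d = −(3/4) ln(1 − 4p/3), which assumes all nucleotide substitutions are equally likely; it is a standard choice but is not named in the text. The two short sequences are toy data, far more divergent than real corn and teosinte genes, and the 7500-year figure is the archeological estimate quoted above.

```python
import math

def p_distance(seq_a: str, seq_b: str) -> float:
    """Proportion of aligned sites that differ (no correction for multiple hits)."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    diffs = sum(a != b for a, b in zip(seq_a, seq_b))
    return diffs / len(seq_a)

def jukes_cantor(p: float) -> float:
    """Estimated substitutions per site, correcting for multiple changes at one site."""
    if p >= 0.75:
        raise ValueError("too divergent for the Jukes-Cantor correction")
    return -0.75 * math.log(1 - 4 * p / 3)

def rate_per_site_per_year(distance: float, years_since_ancestor: float) -> float:
    """Divide by the total divergence time, i.e. twice the time since the common ancestor."""
    return distance / (2 * years_since_ancestor)

if __name__ == "__main__":
    corn = "ACGTACGTAC"       # toy aligned sequences, not real data
    teosinte = "ACGTTCGTAA"
    p = p_distance(corn, teosinte)
    d = jukes_cantor(p)
    print(f"p = {p:.3f}, corrected d = {d:.4f}")
    print(f"rate = {rate_per_site_per_year(d, 7500):.2e} substitutions per site per year")
```

The corrected distance always exceeds the raw proportion of differences, and the gap widens as sequences diverge, which is the effect described above.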

The evolutionary rate of phenotypic change could be obtained by comparing the values of one or more traits of interest, such as the number of seeds produced per ear or the concentration of oil in the seeds. These traits may depend on environmental influences, such as soil fertility, as well as on genetic changes; it is therefore important that the corn and teosinte plants be grown under the same conditions, to isolate the effect of evolutionary changes in genotype from the direct effects of environment. The rate of change in a given phenotype could then be calculated as the difference in the average value of the trait in the two species, divided by the time of divergence (twice the time since their common ancestor). Note, however, that this calculation may give a misleading picture, as the common ancestor may not have had a trait value intermediate between the values in modern corn and teosinte. Indeed, it is likely in this case that the ancestor was much more like present-day teosinte, with most of the phenotypic change having occurred as a consequence of rapid evolution of corn under domestication.
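The caveat about the ancestor can be quantified. In the sketch below the trait means are hypothetical placeholders, not measurements. The symmetric calculation spreads the difference over both lineages; if instead the ancestor resembled teosinte, the whole difference accumulated along the corn lineage alone, and the rate along that lineage is twice the symmetric estimate.

```python
def symmetric_rate(mean_a: float, mean_b: float, years_since_ancestor: float) -> float:
    """Trait difference divided by the total divergence time (2 x time since ancestor)."""
    return abs(mean_a - mean_b) / (2 * years_since_ancestor)

def lineage_rate(ancestor_mean: float, descendant_mean: float, years_since_ancestor: float) -> float:
    """Rate along a single lineage when the ancestral trait value is known or assumed."""
    return abs(descendant_mean - ancestor_mean) / years_since_ancestor

if __name__ == "__main__":
    corn_seeds, teosinte_seeds = 500.0, 10.0   # hypothetical seeds per ear
    years = 7500.0
    print("symmetric estimate:", symmetric_rate(corn_seeds, teosinte_seeds, years))
    # Assume the ancestor looked like teosinte: all change happened in the corn lineage.
    print("corn lineage:", lineage_rate(teosinte_seeds, corn_seeds, years))
    print("teosinte lineage:", lineage_rate(teosinte_seeds, teosinte_seeds, years))
```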

Evolutionary rates differ quite substantially from 
one case to the next, and for a variety of reasons. In 
the broadest terms, evolutionary change at the genetic 
level depends on the interplay of several processes, 
including mutation, which produces new genetic vari- 
ation, and natural selection, which influences the fate 
of any particular genetic variant. A few examples serve 
to illustrate two of the most important factors that 
influence rates of genetic evolution. 


Replication and Repair 


All cellular organisms have DNA as their hereditary 
material, but some viruses, including HIV (which 
causes AIDS) and influenza virus, use RNA instead. 
These RNA viruses undergo extremely rapid sequence 
evolution because RNA replication lacks the proof- 
reading and repair processes that increase the fidelity 
of DNA replication. Even among the DNA-based 
bacteria, there exist mutants that are defective in 
DNA repair, and these ‘mutators’ should evolve 
much faster at the level of their DNA sequence. 


Functional Constraints 


A mutation may be deleterious, neutral, or beneficial 
in terms of its effect on an organism’s reproductive 
success. Deleterious and neutral mutations are both 
very common, whereas beneficial mutations are much 
rarer and thus have less effect on variation in evolu- 
tionary rates at the genetic level. Because of the redun- 
dancy of the genetic code, some point mutations in 
protein-encoding genes (especially those at the third 
position in a codon) will not actually alter the amino 
acid sequence of the protein. Such synonymous muta- 
tions are therefore likely to be neutral. By contrast, 
nonsynonymous mutations cause a change from one 
amino acid to another, and such mutations often have 
deleterious consequences for the protein’s function
and, ultimately, the organism’s performance. The 
extent to which nonsynonymous mutations are dele- 
terious depends on their particular position within a 
gene as well as on the particular gene. Mutations that 
alter critical positions in a protein’s structure are 
usually more harmful than those that affect a less 
crucial site. Evolutionary approaches can be used to 
identify conserved sequences, which in turn suggest 
potentially important features of protein structure and 
function. Among different genes, those that encode 
essential and highly constrained proteins can tolerate 
fewer mutations than those that encode less con- 
strained proteins, which may accept a wider range of 
mutations without compromising the organism’s per- 
formance. For example, the rate of amino acid substi- 
tution in fibrinopeptides (proteins involved with 
blood-clotting) is more than 100 times faster than the 
corresponding rate in histones (proteins used to pack- 
age DNA in eukaryotic chromosomes). 
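The effect of codon position can be checked directly against the standard genetic code. The sketch below tries every possible single-base change in every sense codon and records the fraction that leave the amino acid unchanged at each codon position; changes to a stop codon are counted as nonsynonymous. It ignores the differing frequencies of codons and mutation types in real genomes.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code: amino acids for codons in TCAG order of first, second, third base ('*' = stop).
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

def is_synonymous(codon: str, position: int, new_base: str) -> bool:
    """True if substituting new_base at position (0, 1 or 2) leaves the amino acid unchanged."""
    mutant = codon[:position] + new_base + codon[position + 1:]
    return CODE[codon] == CODE[mutant] and CODE[codon] != "*"

def synonymous_fraction(position: int) -> float:
    """Fraction of all single-base changes at this codon position that are synonymous."""
    sense = [c for c, aa in CODE.items() if aa != "*"]
    changes = [(c, b) for c in sense for b in BASES if b != c[position]]
    return sum(is_synonymous(c, position, b) for c, b in changes) / len(changes)

if __name__ == "__main__":
    print("CTT -> CTC:", is_synonymous("CTT", 2, "C"))   # Leu -> Leu
    print("GAA -> AAA:", is_synonymous("GAA", 0, "A"))   # Glu -> Lys
    for pos in range(3):
        print(f"position {pos + 1}: {synonymous_fraction(pos):.2f} of changes synonymous")
```

In the standard code no change at the second position is synonymous, a few first-position changes are (for example, between leucine codons TTA and CTA), and the third position accounts for most synonymous changes.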

Neutral mutations serve as a sort of benchmark for understanding evolutionary rates, and they lead to the notion of a ‘molecular clock’ to describe genetic evolution. Population genetic theory shows that the expected evolutionary rate of genetic change for neutral mutations depends only on the underlying rate at which these mutations occur, and not on population size or natural selection. This simple result can be understood as follows. Let $u$ be the rate of neutral mutation and $N$ be the population size, so that each generation $2Nu$ new neutral mutations arise in a diploid population. Because they are neutral, each of these mutations has no greater or lesser chance of eventually being substituted in the population than any other of the $2N$ alleles present at a locus; in other words, a neutral mutation has a probability of $1/(2N)$ of becoming substituted. Given these considerations, the overall rate of substitution of neutral mutations is $2Nu \times \frac{1}{2N} = u$. In other words, the rate of genetic evolution would, in the case of neutral mutations, behave like a stochastic molecular clock which ticks at the rate $u$.
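The claim that a new neutral allele is eventually substituted with probability 1/(2N) can be checked numerically without random sampling. The sketch below assumes a Wright-Fisher model (binomial resampling of 2N gene copies each generation, a model the text does not name), propagates the exact probability distribution of the allele’s copy number for many generations, and reads off the probability that has reached fixation. Multiplying by the 2Nu new mutations per generation then gives a substitution rate equal to u for any population size.

```python
from math import comb
from typing import List, Optional

def fixation_probability(two_n: int, copies: int = 1, generations: Optional[int] = None) -> float:
    """Probability that a neutral allele starting at `copies` out of `two_n` gene copies
    becomes fixed, computed by iterating exact Wright-Fisher transition probabilities."""
    if generations is None:
        generations = 40 * two_n  # unfixed probability decays geometrically, so this is ample
    # transition[i][j]: probability of going from i copies to j copies in one generation
    transition = [[comb(two_n, j) * (i / two_n) ** j * (1 - i / two_n) ** (two_n - j)
                   for j in range(two_n + 1)] for i in range(two_n + 1)]
    dist: List[float] = [0.0] * (two_n + 1)
    dist[copies] = 1.0
    for _ in range(generations):
        new = [0.0] * (two_n + 1)
        for i, prob in enumerate(dist):
            if prob:
                for j, t in enumerate(transition[i]):
                    new[j] += prob * t
        dist = new
    return dist[two_n]

def substitution_rate(u: float, n: int) -> float:
    """New neutral mutations per generation (2Nu) times their fixation probability."""
    return 2 * n * u * fixation_probability(2 * n)

if __name__ == "__main__":
    u = 1e-8  # hypothetical neutral mutation rate per gene copy per generation
    for n in (5, 20):
        p_fix = fixation_probability(2 * n)
        print(f"N={n}: P(fix)={p_fix:.6f}, 1/2N={1 / (2 * n):.6f}, rate={2 * n * u * p_fix:.3e}")
```

Small populations are used only to keep the exact calculation fast; the cancellation of N holds for any size.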

Rates of phenotypic evolution are even more com- 
plex and variable. Whereas the balance between neu- 
tral and deleterious mutations is especially important 
for understanding rates of genetic evolution, neither of 
these classes is thought to play much role in pheno- 
typic evolution — neutral mutations because they have 
no outward manifestation, and deleterious mutations 
because they will be eliminated by natural selection. 
Instead, phenotypic evolution depends on beneficial 
mutations, which are rare but extremely important 
because they provide raw material for organisms to 
adapt evolutionarily to their environments. Species 
that live in environments that hardly change over
long periods of time typically show very slow rates
of phenotypic evolution. Such organisms have pre- 
sumably run out of ways to become better adapted 
to their environment, accounting for their phenotypic 
stasis. The horseshoe crab (Limulus polyphemus) is 
one such ‘living fossil’; its outward appearance is 
very similar, although not identical, to fossils from 
more than 200 million years ago. At the other extreme, 
organisms in new environments often experience dif- 
ferent selective agents and constraints from their 
ancestors, thus promoting rapid phenotypic evolution 
as they adapt genetically to their new environment. 
The conspicuous differences between domesticated 
plants and animals and their wild progenitors provide 
many examples of very rapid change. Another inter- 
esting example is the morphological divergence of 
Darwin’s finches (Geospiza spp.) in the Galapagos 
Islands, where these birds experienced an environ- 
ment different from their mainland ancestor. A critical 
factor in their rapid evolution was their release from 
competition with other species, which presented the 
island populations with the opportunity to fill eco- 
logical roles that would otherwise not have been 
available. 


See also: Genetic Drift; Molecular Clock; 
Mutators; Natural Selection; Retroviruses 


Ewing’s Tumor 
Ewing’s tumor is a malignant neoplasm of bone and
soft tissues, known by several alternative names, 
including peripheral primitive neuroectodermal 
tumor, neuroepithelioma, and Askin’s tumor (when 
affecting the chest wall). The tumor usually develops 
in children and adolescents. The neoplastic cells are 
primitive, although there is varying evidence of neuro- 
ectodermal differentiation. The cells contain charac- 
teristic chromosomal translocations, producing fusions 
between the EWS gene at chromosome 22q12 and 
several members of the Ets family of transcription 
factors, most frequently FLI1 at chromosome 11q24. 
The Ews-Ets family fusion genes are likely to contri- 
bute to neoplastic progression by induction of a range 
of secondary transforming genes. 


See also: Ets Family 
In the context of genetics, the term ‘exchange’ usually means the exchange of segments between chromosomes. Exchanges between equivalent segments of paired homologous chromosomes occur regularly in meiosis, and occasionally in mitosis (see Crossing-Over). Exceptionally, crossing-over can occur between chromosomes that are paired out of register, to give products of unequal size (see Unequal Crossing Over).

Exchanges between nonhomologous segments to 
give structurally rearranged chromosomes occur as 
rare aberrations, the frequency of which is greatly 
increased by chromosome-breaking agents such as 
X-rays (see Segmental Interchange). 

The term exchange may also refer to the point in a 
single recombinant chromosome or nucleic acid mole- 
cule where the sequence switches from one parental 
type to the other. Thus a bacteriophage particle emer- 
ging from a mixedly infected bacterial cell may be said 
to have undergone one or more exchanges in its gen- 
ome without any implication as to the nature, recipro- 
cal or nonreciprocal, of the recombination process. 


See also: Crossing-Over; Genetic Recombination; 
Segmental Interchange; Unequal Crossing Over 


Exchange Pairing 


See: Exchange, Segmental Interchange 
Faithful maintenance of the genome is important for
survival of both the species and the individual. While 
stability is the hallmark of genome maintenance, the 
DNA molecule itself is susceptible to alterations as it 
is the target for a variety of reactive molecules that 
damage and modify DNA. Typically we are not aware 
that such damage has occurred because cells have 
mechanisms for the error-free removal of DNA 
damage and restoration of the DNA molecule to its 
original unmodified state. If DNA damage is not 
removed, mutations (permanent changes in the DNA
sequence) may result, and mutations in critical genes are
important events in cancer initiation and progression. 
DNA Damage


At some point in time virtually all cells are exposed to endogenous and environmental agents that damage the genome. Genetic damage is nonetheless a rare event, as cells possess multiple mechanisms for eliminating or neutralizing genotoxic substances before they damage DNA. But damage does occur, and sources of modification include rare misincorporation events during DNA replication, normal cellular metabolism involving oxygen and water, which generates DNA-damaging free radicals, and extracellular sources such as environmental chemicals and sunlight (UV radiation). Damage may include base modification or cleavage of the phosphodiester backbone, such that RNA and DNA polymerases are blocked at the lesion (modified base or strand break) and unable to translocate along the helix, thus interrupting normal DNA replication and RNA transcription.


General Comments on DNA Repair 


In both prokaryotes and eukaryotes, a major cellular 
mechanism for the removal of DNA damage is 
nucleotide excision repair (excision repair), an en- 
zymatic pathway that recognizes and corrects a wide 
spectrum of structural anomalies (DNA lesions) ran- 
ging from bulky, helix-distorting adducts to nonhelix- 
distorting lesions. The modifications that transform 
normal bases into damaged bases corrected by nucleo- 
tide excision repair are so diverse that it is unlikely 
that a specific chemical structure is recognized. 
Rather, it appears that any abnormal DNA structure 
that destabilizes (denatures) the double helix is recog- 
nized as damage both in Escherichia coli and human 
cells. 

The primary function of nucleotide excision repair 
is removal of bulky adducts generated by chemicals or 
UV radiation, while base excision repair is the major 
pathway for correction of non-helix-distorting lesions 
such as those introduced by ionizing radiation or 
cellular metabolic events. Additional pathways exist 
for direct reversal of certain types of damage (e.g.
photolyase and methyltransferase), correction of mismatched
bases, removal of interstrand crosslinks, and
repair of DNA strand breaks. Excision repair involves 
removal of a damaged nucleotide by dual incisions 
bracketing the lesion; this is accomplished by a multi- 
subunit enzyme referred to as the excision nuclease or 
excinuclease. The basic mechanism of excision repair 
involves: (1) damage recognition; (2) subunit assem- 
bly; (3) dual incisions that result in excision of the 
damage-containing oligomer; (4) resynthesis to fill 
in the gap; and (5) ligation to regenerate an intact 
molecule. 
Excision Repair in Escherichia coli


UvrA, UvrB, and UvrC constitute the E. coli excision 
nuclease, (A)BC excinuclease. UvrA binds specifically 
to both damaged DNA and UvrB and, by virtue of 
these interactions, delivers UvrB to the damage site 
(see Figure 1). UvrA, a molecular matchmaker, then 
dissociates from the UvrB-DNA complex. UvrC 
interacts with UvrB bound to DNA and the two acting 
together make the 3’ incision, and then UvrC makes the 
5’ incision on the damaged strand. These concerted 
reactions begin with hydrolysis at the fourth or fifth 
bond 3’ to the damage, followed within a fraction of 
a second by incision at the eighth phosphodiester 
bond 5’ to the lesion. UvrD is a helicase that releases 
both UvrC and the 12-13 nucleotide-long damage-containing oligomer. Repair is completed by DNA polymerase I, which synthesizes the repair patch and displaces UvrB, and by DNA ligase, which ligates the newly synthesized DNA to the parental DNA.

Figure 1 Models for nucleotide excision repair in humans (A) and in Escherichia coli (B). (i) DNA damaged by UV radiation or chemicals is a substrate for excision repair, and repair is initiated by recognition of the damage and formation of a stable complex at the damage site. This is accomplished by RPA, XPA, XPC, and TFIIH in humans and by UvrA and UvrB in E. coli. (ii) Each system has a protein that functions as a molecular matchmaker. In humans XPC-HR23B recruits XPG to the preincision complex, and in E. coli UvrA delivers UvrB to the damage site; both XPC and UvrA then leave, having matched the 3’ endonuclease to the damaged DNA. (iii) In both pathways the last member of the excinuclease to assemble is the enzyme responsible for the 5’ incision event. This is the XPF-ERCC1 heterodimer in humans and UvrC in E. coli. (iv) Dual incisions follow rapidly and the damage-containing oligomer is released as the excision nuclease dissociates. In humans this is accomplished without additional proteins, while E. coli requires an accessory repair helicase (Hel II, product of the uvrD gene) to release the damage-containing oligomer and UvrC. (v) In humans resynthesis of the gap is accomplished by polymerases δ and ε, their accessory factors RFC and PCNA, and a DNA ligase to generate a 30-nucleotide repair patch. In E. coli a 12-mer patch is resynthesized by polymerase I and ligated to the parental DNA.
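The excision steps, with the E. coli incision spacing described above, can be traced on a toy sequence. This sketch assumes the lesion is a two-nucleotide photoproduct (such as a thymine dimer), marked by lowercase letters, and counts phosphodiester bonds outward from the lesion, so the excised oligomer is the lesion plus (k5 − 1) nucleotides on its 5’ side and (k3 − 1) on its 3’ side. All function names are hypothetical, the sequence is not real data, and assembly of the excinuclease (step 2) has no counterpart here.

```python
from typing import Tuple

PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}

def find_lesion(strand: str) -> Tuple[int, int]:
    """Damage recognition: locate the run of damaged (lowercase) nucleotides as [start, end)."""
    damaged = [i for i, base in enumerate(strand) if base.islower()]
    if not damaged:
        raise ValueError("no lesion found")
    return damaged[0], damaged[-1] + 1

def dual_incision(start: int, end: int, bond_5p: int, bond_3p: int) -> Tuple[int, int]:
    """Cut at the bond_5p-th bond 5' and the bond_3p-th bond 3' to the lesion;
    return the [left, right) span of the excised oligomer."""
    return start - bond_5p + 1, end + bond_3p - 1

def excision_repair(damaged: str, template: str, bond_5p: int = 8,
                    bond_3p: int = 4) -> Tuple[str, str]:
    """Return (repaired strand, excised oligomer). `template` is the undamaged
    complementary strand, aligned so that template[i] pairs with damaged[i]."""
    start, end = find_lesion(damaged)
    left, right = dual_incision(start, end, bond_5p, bond_3p)
    if left < 0 or right > len(damaged):
        raise ValueError("lesion too close to the end of the toy sequence")
    oligomer = damaged[left:right]                            # excision of the oligomer
    patch = "".join(PAIR[b] for b in template[left:right])    # resynthesis from the template
    repaired = damaged[:left] + patch + damaged[right:]       # ligation of the patch
    return repaired, oligomer

if __name__ == "__main__":
    original = "ACGTACGTACGTTTACGTACGTAC"
    damaged = original[:11] + "tt" + original[13:]            # hypothetical dimer at 11-12
    template = "".join(PAIR[b] for b in original)
    for k3 in (4, 5):
        repaired, oligo = excision_repair(damaged, template, bond_3p=k3)
        print(f"3' cut at bond {k3}: excised {oligo} ({len(oligo)} nt), restored: {repaired == original}")
```

Under these assumptions the fourth- or fifth-bond 3’ cut gives the 12- or 13-nucleotide oligomer quoted for E. coli.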


Excision Repair in Mammalian Cells 


Biological Relevance of Excision Repair 

The physiological importance of nucleotide excision 
repair is illustrated by a rare human hereditary disease, 
xeroderma pigmentosum (XP), caused by mutations 
in any of seven genes named XPA through XPG. XP 
patients are extremely sensitive to sunlight and have 
an increased incidence of skin and certain internal 
cancers. Cultured cells derived from XP patients are 
hypersensitive to both killing and mutation induction 
by UV light and chemicals and, biochemically, this 
hypersensitivity has been correlated with defects in 
nucleotide excision repair. 


Mechanism of Nucleotide Excision Repair 

In contrast to the three-subunit (A)BC excinuclease employed in E. coli, the human excision nuclease utilizes 15 polypeptides in six repair factors for the basal steps, which include damage recognition and sequential assembly of subunits leading to the dual incision event. These six factors are XPA, RPA, TFIIH (XPB and XPD plus four additional polypeptides), XPC-HR23B, XPG, and XPF-ERCC1. Following damage recognition by XPA and RPA, XPC and TFIIH are recruited to the damage site (see Figure 1) to form the first stable preincision complex. The initial, localized helical denaturation resulting from DNA damage is extended both 5’ and 3’ by the helicase activities of two TFIIH subunits, XPB and XPD. XPC helps to stabilize this open complex; furthermore, XPC is a molecular matchmaker that dissociates after recruiting and positioning XPG 3’ to the DNA damage. The last factor to assemble is the XPF-ERCC1 heterodimer. Dual incisions follow rapidly, with XPG nicking the DNA at the 6th ± 3 phosphodiester bond 3’ to the damage and XPF-ERCC1 hydrolyzing the 20th ± 5 bond 5’ to the lesion. The 24-32 nucleotide-long oligomer containing the damaged base is released from the DNA (excision), and repair factors rapidly dissociate following the dual incision event, leaving a gapped substrate. In subsequent steps, DNA polymerases δ and ε and their accessory factors, PCNA and RFC, assemble at the gapped molecule, and the undamaged strand is used as a template for precise resynthesis of the DNA. The repair patch size matches the size of the excision gap and, when the gap is filled to the 3’ end, the repair patch is ligated to the parental DNA by a ligase.
See also: DNA Repair; Xeroderma Pigmentosum


Exon 


See: Introns and Exons 


Exonucleases 
Exonucleases are enzymes that digest a piece of DNA from its ends. The digestion usually proceeds from a specific end (e.g. 5’ or 3’ exonucleases, which digest from the 5’ or the 3’ end, respectively). Exonuclease III (exo III), for example, is used to prepare deletions in cloned DNA, or for DNA footprinting.


See also: Endonucleases; Footprinting; Nuclease 


Expression Vector 
An expression vector is a vector designed for the
expression of inserted DNA sequences propagated in 
a suitable host cell. The inserted DNA is transcribed 
and translated by the host’s cellular machinery. 


See also: Vectors 


Expressivity 
J A Fossella 
Expressivity refers to the variation seen among individuals expressing a particular trait or mutant phenotype. ‘Variable expressivity’ is the term used to describe a trait or mutant phenotype that varies in degree or severity from individual to individual in a population. For example, all individuals of a population expressing a trait or mutant phenotype such as ‘spotted’ may show an identical number of spots. This would be an example of low or nonvariable expressivity. Alternatively, some individuals may have many spots, others only a few, and many an intermediate number of spots. This would be an example of variable expressivity, since all the individuals express the trait or mutant phenotype of ‘spotted’ but vary in the degree of spotting.

Expressivity is related in meaning to ‘penetrance,’ and the two terms are often used together when describing mutations. For example, certain weak alleles of the W locus in mice result in white coat color spots. These mutant alleles are said to show reduced penetrance and variable expressivity. The distinction is that penetrance refers to whether individuals carrying a given genotype show the mutant phenotype at all, whereas expressivity refers to the degree to which the phenotype is expressed in those individuals that do show it. In this example, only some of the mice that carry the W/+ genotype show any spots at all. This is an example of reduced penetrance. Of the animals that show the spotted phenotype, however, some show much spotting while others show very little. This is an example of variable expressivity.
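The two measures can be computed from the same set of animals. The spot counts below are invented for illustration, not data from W/+ mice. Penetrance is taken as the fraction of carriers showing any spots, and expressivity is summarized here by the range and mean of spot counts among the affected animals only; other summaries, such as the variance, would serve equally well.

```python
from statistics import mean
from typing import Dict, List

def penetrance(phenotype_scores: List[int]) -> float:
    """Fraction of carriers of the genotype that show the phenotype at all (score > 0)."""
    return sum(score > 0 for score in phenotype_scores) / len(phenotype_scores)

def expressivity_summary(phenotype_scores: List[int]) -> Dict[str, float]:
    """Degree of expression among affected carriers only; unaffected carriers are excluded."""
    affected = [score for score in phenotype_scores if score > 0]
    if not affected:
        return {"min": 0.0, "max": 0.0, "mean": 0.0}
    return {"min": min(affected), "max": max(affected), "mean": mean(affected)}

if __name__ == "__main__":
    spots_per_mouse = [0, 0, 3, 12, 1, 0, 7]   # hypothetical W/+ carriers
    print("penetrance:", round(penetrance(spots_per_mouse), 3))
    print("expressivity among affected:", expressivity_summary(spots_per_mouse))
```

Keeping the two calculations separate mirrors the distinction in the text: the unspotted carriers lower the penetrance but play no part in the expressivity summary.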

The phenomena of variable expressivity and 
reduced penetrance have a similar root cause. The 
phenotypic effects of a specific gene are highly con- 
tingent on the environmental conditions that exist 
during the development of an organism and during 
maturity. The effects of a specific gene are also depend- 
ent on other modifier genes in the same developmental 
or physiological pathway. Hence, variation in the 
environment and in modifier loci among individuals
in a population may alter the phenotypic effects of a
specific gene or mutation resulting in reduced pene- 
trance and variable expressivity. 


See also: Penetrance; W (White Spotting) Locus 


Extranuclear Genes 


See: Cytoplasmic Inheritance 


F Factor 


S M Rosenberg and P J Hastings 
The F (for fertility) factor is a conjugative plasmid of
Escherichia coli. It was the first plasmid discovered 
and has been significant in the development and prac- 
tice of bacterial genetics. Like other conjugative plas- 
mids, the F factor encodes the machinery for its own 
conjugative transfer and for the transfer of other DNA 
molecules that contain transfer origins — specific 
sequences that allow them to be mobilized (recruited 
and transferred) by the F-encoded transfer proteins, 
during bacterial conjugation. 


Structure of the F Factor 


The F factor is 100 kb of duplex DNA with two replication-origin regions (Figure 1). The oriV, or vegetative replication, region contains two replication origins, one of which is used for bidirectional maintenance replication of the plasmid when it is not being transferred to another cell. oriT, the transfer origin, promotes a special mode of unidirectional, single- (leading) strand replication used during conjugative transfer of the F factor to another cell. The copy number control of the F factor is similar to that of the chromosome, such that there are one or two copies per bacterial chromosome. This feature has made the F factor useful to workers wishing to perform complementation and dominance tests with their gene of interest in a single-copy replicon in E. coli, which allows creation of a state of partial diploidy (also called ‘merodiploidy’). Originally, this was done by isolation of F’ plasmids: F factors that have incorporated often large segments of DNA from the bacterial chromosome by homologous recombination with the chromosome. Formation of F’ plasmids is described (below) in “Importance of the F Factor in Bacterial Genetics” (see Figure 2). Since the advent of recombinant DNA technology, smaller derivatives of the F factor have been constructed, including roughly 9-kb mini-F plasmids, containing just the oriV region, and the 55-kb pOX plasmids, containing DNA from oriV clockwise to the far end of the transfer region.

The F factor encodes genes for sexual pili, thin rodlike structures with which F-carrying (male or donor) bacteria attach to F⁻ (female or recipient) cells for conjugative transfer. The F factor carries an operon of about 30 genes, encoding Tra proteins promoting transfer (Figure 1). Importantly for bacterial genetics, the F factor also contains four transposable genetic elements: two copies of the insertion sequence IS3, one IS2, and one transposon Tn1000 (also called γδ).
These elements are important in two respects. First, 
because they are also present in the E. coli chromo- 
some, the transposable elements provide regions of 
DNA at which homologous recombination occurs 
between the F factor and the chromosome. The F factor can integrate into the chromosome, forming an Hfr strain by this route (Figure 2). The F factor is therefore an episome, that is, a replicon that can exist either outside, or integrated into, the bacterial chromosome. Second, the Tn1000 insertion interrupts the finO (fertility inhibition) gene. In other, similar conjugative plasmids, the FinO protein represses expression of the tra (transfer) operon genes such that they are inducible upon mating. In the F factor, their expression is constitutive.

Figure 1 The F factor. The F factor is a 100-kb conjugative plasmid. The tra operon encodes functions required for conjugative transfer of the F factor. Transposable elements are indicated: IS3, IS2, and Tn1000, and the direction of transfer is indicated by the thin arrow. (Modified from Firth et al., 1996.)

Figure 2 Formation of Hfr and F’ molecules by homologous recombination of the F plasmid with the bacterial chromosome. Transposable elements are represented as triangles, and single lines represent duplex DNA. The transposable elements present in the F plasmid provide regions of sequence identity with the E. coli chromosome and so allow the F plasmid to become incorporated into the chromosome via homologous recombination, to form an Hfr. Once incorporated, recombination may occur between transposable elements other than those that recombined upon integration of the F plasmid. This can produce an F’ plasmid.


Interesting F Factor Products that May 
Affect DNA Metabolism 


Other genes carried by the F factor encode proteins 
that probably affect DNA metabolism in the recipient 
bacterium during conjugative transfer. The leading 
region of the F factor, that is, the region that is 


transferred first, encodes a single-stranded DNA 
binding protein, Ssb, a protein (PsiB) that inhibits 
the SOS response by modifying RecA protein, and 
Flm, the F leading maintenance protein (also called 
ParL and Stm). The F factor also encodes the Ccd 
plasmid addiction system. Plasmid addiction systems 
consist of a stable toxin protein and a labile antidote 
protein. If the plasmid is lost from a cell, degradation 
of the antidote leads to killing of the cell by the stable 
toxin. The Ccd toxin binds the topoisomerase DNA
gyrase, resulting in it functioning like a double-strand 
endonuclease. Flm is part of a different plasmid addic- 
tion system, with a different postsegregational killing 
mechanism. The plasmid addiction systems, plus the 
infectivity of the F factor between cells, species, 
genera, and domains, give an impression of a selfish 
DNA element. Although the F was once considered a 
narrow-host-range conjugative plasmid, the discovery 
of its transfer to distantly related bacteria and even to 
yeast has changed this classification. 


Conjugative Transfer 


Cells carrying the F factor are called male or donor 
cells. They express long, rod-like pili on their surfaces 
and use these to attach to female cells for transfer of 
the F factor. Once attached, the pili retract, bringing 
the mating pair into close contact. The TraI endonuclease
makes a single-strand nick at oriT, and, with its
helicase activity, peels back the 5’ end to which it 
remains covalently bound. The 3’ end primes leading 
strand synthesis that displaces the 5’-ending strand. 
The displaced strand is transferred into the recipient 
cell. Whether the DNA is transferred through a pilus 
or via some other close contact is not yet clear. 
The synthesis and strand displacement end when the 
whole single-strand length of the circle has been displaced
and the 3’ growing end again reaches oriT. TraI
is hypothesized to nick again, releasing the end of the 
displaced strand, and to assist recircularization of the 
ends in the recipient cell. Meanwhile, the complement 
of the transferred single-strand is synthesized in the 
recipient cell, such that a duplex circle is reestablished. 
The recipient thus becomes an F-carrying male, and 
the donor remains male. The Tra proteins can act on 
other bacterial plasmids with similar origins of trans- 
fer, including the ColE1 plasmids (from which 
pBR322, pUC, and many other cloning vectors are 
derived). The process of recruiting and transferring 
other plasmids is called mobilization. The sites on 
those plasmids that allow mobilization are called 
mob, and the nick site itself (oriT, which is necessary
but not sufficient for transfer) has also been called bom
and nic. pBR322 lacks mob but carries bom, and
cannot be mobilized by the F factor unless a third
plasmid, apparently supplying mob function (ColK),
is present. pUC plasmids contain neither mob nor
bom sites and so cannot be mobilized.

Figure 3 Conjugative transfer of the F plasmid. Each line represents a DNA strand; dashed lines represent newly
synthesized DNA, and arrowheads indicate 3’ ends. During transfer, the F-encoded TraI endonuclease cleaves one strand of
DNA at the transfer origin, oriT, and remains covalently bound to the 5’ end. Leading-strand synthesis primed from
the 3’ end displaces the cleaved strand, which is transferred into a recipient cell. Lagging-strand synthesis and
recircularization occur in the recipient, regenerating an F plasmid there.
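The mobilization requirements described above can be summarized as a simple decision rule. The sketch below is a deliberately simplified model rather than a description of the molecular machinery: it assumes that F-encoded Tra functions are available, that the nick site (oriT/bom) must lie on the plasmid being transferred, and that mob function may come either from that plasmid or in trans from a third plasmid such as ColK. The function and parameter names are hypothetical.

```python
def can_be_mobilized(has_bom: bool, has_mob: bool,
                     f_tra_present: bool = True,
                     mob_in_trans: bool = False) -> bool:
    """Simplified (hypothetical) rule for mobilization of a plasmid by F.

    has_bom       -- the plasmid carries its own nick site (oriT/bom/nic)
    has_mob       -- the plasmid itself supplies mob function
    f_tra_present -- F-encoded Tra proteins are available in the donor
    mob_in_trans  -- a third plasmid (e.g. ColK) supplies mob function
    """
    if not f_tra_present:
        return False
    if not has_bom:  # the nick site is necessary but not sufficient
        return False
    return has_mob or mob_in_trans


# pBR322: carries bom but lacks mob
print(can_be_mobilized(has_bom=True, has_mob=False))                      # False
print(can_be_mobilized(has_bom=True, has_mob=False, mob_in_trans=True))   # True
# pUC: lacks both bom and mob, so a mob-supplying helper does not rescue it
print(can_be_mobilized(has_bom=False, has_mob=False, mob_in_trans=True))  # False
```

In this model the bom check comes first because, as the text notes, a plasmid lacking its own nick site cannot be transferred even when mob function is supplied.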


Importance of the F Factor in Bacterial 
Genetics 


The original isolate of E. coli K12, from Stanford, 
carried an F plasmid. When Edward Tatum turned to 
E. coli for generalization of his biochemical genetic 
studies with Beadle (which led to the “one gene, one 
enzyme” hypothesis) in the fungus Neurospora, he 
made auxotrophic mutants of E. coli K12. To bring 
about mutagenesis, he used large doses of radiation, 
which caused loss of the F factor in some of the 
derivative strains. Joshua Lederberg’s interest in test- 
ing whether mating could occur between different 
E. coli auxotrophic mutant strains, to give proto- 
trophic recombinants (the selection of which he 
invented), led to his joining Tatum and using K12-derived
strains (Lederberg and Tatum, 1946b). Because
some of the strains had lost their F factor and others 
had not, Lederberg discovered mating and recombin- 
ation in bacteria. In strains that retained the F factor, the 
F factor could integrate into the bacterial chromosome. 
The integrated F factor can transfer segments of 
chromosomal DNA contiguous with its integration 
site during conjugation, and these can be recombined
into the recipient chromosome, resulting in the prototrophic
recombinant bacteria reported by Lederberg
and Tatum in 1946 (Lederberg and Tatum, 1946b;
see Hfr). The results encouraged the idea that bacteria,
like other organisms, had genes, and led to much of
our current understanding of DNA recombination.
Strains with the F factor integrated are called Hfr
(high-frequency recombination) strains (see Hfr). The
integrated F factor can be excised from the chromo- 
some using homologous recombination with the 
same insertion sequences used upon its integration to
regenerate an F plasmid (a wild-type F plasmid with
no bacterial DNA incorporated into it). If different 
insertion sequences from the bacterial chromosomal 
DNA are used for direct repeat recombination excising 
the F factor, then the F factor brings with it chromo- 
somal DNA, forming an F’ factor (Figure 2). 

The discoveries by William Hayes, Elie Wollman, 
and François Jacob that the F factor is a plasmid, and
the subsequent discoveries of other bacterial plasmids, 
made possible the development of plasmid vectors for 
molecular cloning (Hayes, 1952; Wollman et al., 1956; 
Jacob and Wollman, 1958). F’s are important replicons
used in single-copy gene-complementation experi- 
ments and in tests of dominance. As discussed in Hfr, 
the use of Hfrs for studies of bacterial recombination led 
to the characterization of mechanisms and proteins used
in homologous genetic recombination in E. coli, and, 
because the DNA transferred in Hfr crosses is linear, 
the enzymes used in double-strand break-repair were 
illuminated in these studies. Descriptions of those 
proteins, bacterial recombination, and double-strand 
break-repair are given in Rec Genes, Recombination 
Pathways, RecA Protein and Homology, RecBCD 
Enzyme, Pathway, RuvAB Enzyme, RuvC Enzyme. 


Further Reading 

Brock TD (1990) The Emergence of Bacterial Genetics. Plainview, 
NY: Cold Spring Harbor Laboratory Press. 

Firth N, Ippen-Ihler K and Skurray RH (1996) Structure and
function of the F factor and mechanism of conjugation.
In: Neidhardt FC, Curtiss III R, Ingraham JL et al. (eds)
Escherichia coli and Salmonella: Cellular and Molecular Biology, 
2nd edn, vol. 2, pp. 2377-2401. Washington, DC: ASM 
Press. 

Holloway B and Low KB (1996) F-prime and R-prime factors. 
In: Neidhardt FC, Curtiss III R, Ingraham JL et al. (eds)
Escherichia coli and Salmonella: Cellular and Molecular Biology, 
2nd edn, vol. 2, pp. 2413-2420. Washington, DC: ASM 
Press. 
The F1 generation is the first generation resulting
from a cross between two dissimilar parental lines. 


See also: Mendelian Genetics; Mendelian 
Inheritance 


F1 Hybrid
L Silver 
The most obvious advantage of working with inbred
strains is genetic uniformity over time and space. 
Researchers can be confident that the inbred animals 
of a particular strain used in experiments today are 
essentially the genetic equivalent of animals from the 
same strain used 10 years ago. Thus, the existence of 
inbred strains serves to eliminate the contribution of 
genetic variability to the interpretation of experimen- 
tal results. However, there is a serious disadvantage to 
working with inbred animals in that a completely 
inbred genome is an abnormal condition with detri- 
mental phenotypic consequences. The lack of genomic 
heterozygosity is responsible for a generalized decrease
in a number of fitness characteristics including body
weight, life span, fecundity, litter size, and resistance
to disease and experimental manipulations. 

It is possible to generate organisms that are genet- 
ically uniform without suffering the consequences of 
whole genome homozygosity. This is accomplished 
by simply crossing two inbred strains to each other. 
The resulting F1 hybrid organisms express hybrid 
vigor in all of the fitness characteristics just listed 
with an overall life span that will exceed that of both 
inbred parents. Furthermore, as long as both of the 
parental inbred strains are maintained, it will be pos- 
sible to produce F1 hybrids between the two, and all 
F1 hybrids obtained from the same cross will be 
genetically identical to each other over time and 
space. Of course, uniformity will not be preserved in 
the offspring that result from an “intercross” between 
two F1 hybrids (see Intercross); instead random seg- 
regation and independent assortment will lead to F2 
animals that are all genotypically distinct. 


See also: Hybrid Vigor; Intercross 


FAB Classification of 
Leukemia 
From 1976 onward, a French-American-British (FAB)
cooperative group of hematologists formulated a 
series of classifications of acute myeloid leukemia 
(AML), acute lymphoblastic leukemia (ALL), the mye- 
lodysplastic syndromes, chronic lymphoid leukemias, 
and the leukemic phase of non-Hodgkin’s lymphoma. 
These classifications were initially based only on 
cytology and cytochemistry, but immunophenotypic 
analysis was later incorporated. Subsequently it became 
apparent that several FAB categories of leukemia 
identified specific cytogenetic/molecular genetic 
entities, e.g., M3 AML (hypergranular promyelocytic 
leukemia) and L3 ALL (Burkitt’s lymphoma-related 
acute leukemia). Other FAB categories included 
more than one specific cytogenetic/molecular genetic 
entity, e.g., M5 AML was found to include not
only various acute monocytic/monoblastic leukemias 
associated with t(9;11)(p21-22;q23) and other trans- 
locations with an 11q23 breakpoint, but also the 
completely different entity, acute monoblastic leuke- 
mia associated with t(8;16)(p11;p13). The FAB classi- 
fications were important in advancing knowledge 
of hematological malignancies, since they provided a 
framework for cytogenetic and molecular genetic
research and also, by providing widely accepted terminology
and definitions, facilitated clinical trials and
international collaboration. 


See also: Leukemia; WHO Classification of 
Leukemia 


Fabry Disease 
(α-Galactosidase A
Deficiency) 


R J Desnick 
Fabry disease, an X-linked lysosomal storage disease,
is caused by the deficient activity of α-galactosidase
A (α-Gal A; EC 3.2.1.22), a lysosomal exoglycohydrolase
which catalyzes the hydrolysis of terminal
α-galactosyl residues from glycosphingolipids, primarily
globotriaosylceramide. The primary site of
pathology is the vascular endothelium. Patients with
the classic form of Fabry disease have no detectable
α-Gal A activity and typically present in childhood
with acroparesthesias, angiokeratoma, hypohidrosis,
and characteristic corneal and lenticular opacities. With
increasing age, the progressive glycosphingolipid
deposition results in renal failure, cardiac disease, and
strokes. Death usually results from vascular disease of
the kidney, heart, or brain. Patients with the clinically
milder ‘cardiac variant’ have residual α-Gal A activity
and present in mid to late adulthood primarily with
cardiac manifestations. The disorder is panethnic and
its estimated incidence is about 1 in 40 000 males. Over
160 mutations in the α-Gal A gene that cause Fabry
disease have been identified. Clinical trials of enzyme
replacement therapy are underway and effective treatment
may be available in the near future.


Further Reading 

Desnick RJ, Ioannou YA and Eng CM (2001) In: Scriver CR,
Beaudet AL, Sly WS and Valle D (eds) The Metabolic
and Molecular Bases of Inherited Disease, pp. 3733-3774.
New York: McGraw-Hill. 

Eng CM, Banikazemi M, Gordon R et al. (2001) A phase 1/2 
clinical trial of enzyme replacement in Fabry disease: pharmacokinetic,
substrate clearance, and safety studies. American
Journal of Human Genetics 68: 711-722.


See also: Sex Linkage 
1-, 2-, 3-Factor Crosses
A genetic marker is a nucleotide sequence difference
that has phenotypic consequences, so that its trans- 
mission to progeny of a genetic cross can be moni- 
tored. In a genetic cross, one, two, or three (or more) 
markers (factors) may distinguish the parents. 


1-Factor Crosses


When only one marker distinguishes two sexually
reproducing eukaryotic parents, three Mendelian 
principles can be illustrated: 


1. The F1, diploid offspring from a cross between two 
pure-breeding diploid parents (P, the parental gen- 
eration), may resemble either one parent or the 
other, illustrating dominance/recessiveness. 

2. The population of haploid cells (gametes) produced 
by meiosis in the F1 is composed equally of cells 
containing one or the other of the markers that 
distinguished the parents, illustrating Mendel’s law 
of segregation. 

3. The diploid generation resulting from the union of 
F1 gametes (F2) will have a phenotypic ratio of 3:1, 
favoring the type determined by the dominant 
gene. This ratio is expected if gametes unite with 
each other at random, without regard to their
genotype.
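The 3:1 ratio in point 3 follows from combining the two equally frequent F1 gametes at random. A minimal Python sketch, assuming a single locus with a fully dominant allele A and a recessive allele a (hypothetical symbols), enumerates the four equally likely gamete unions:

```python
from collections import Counter
from itertools import product


def monohybrid_f2(dominant: str = "A", recessive: str = "a"):
    """Enumerate F2 genotypes and phenotypes from an Aa x Aa (F1 x F1) cross."""
    f1_gametes = [dominant, recessive]  # segregation: half A, half a
    genotypes = Counter(
        "".join(sorted(pair))  # 'aA' and 'Aa' are the same genotype
        for pair in product(f1_gametes, repeat=2)  # random union of gametes
    )
    phenotypes = Counter(
        dominant if dominant in g else recessive  # one dominant allele suffices
        for g in genotypes.elements()
    )
    return genotypes, phenotypes


genotypes, phenotypes = monohybrid_f2()
print(genotypes)   # Counter({'Aa': 2, 'AA': 1, 'aa': 1})
print(phenotypes)  # Counter({'A': 3, 'a': 1})
```

The 1:2:1 genotype ratio collapses to the 3:1 phenotype ratio only because AA and Aa look alike; with incomplete dominance all three genotype classes would be visible.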


2-Factor Crosses 


When the P generation differs by two markers located 
in different genes, additional Mendelian principles are 
illustrated: 


1. The frequency of recombinants among haploid 
cells produced by meiosis in the F1 illustrates 
Mendel’s principle of independent assortment 
when, as is likely to be true, the two markers 
involved are on separate chromosomes. If the two 
markers are on the same chromosome, they may 
illustrate linkage, by the production of fewer than 
50% recombinant haploid products of meiosis. 

2. When the factors involved influence different 
phenotypes, the F2 usually manifests independent 
expression of those phenotypes. If the two genes 
are on separate chromosomes, this results in relative 
frequencies of the four phenotypes of 9:3:3:1, illus- 
trating the mosaic nature of phenotype determin- 
ation. 
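When the two genes assort independently, each F1 produces four gamete types in equal proportions, and their random union gives 16 equally likely combinations. The sketch below assumes two unlinked loci, A/a and B/b, with complete dominance at each (hypothetical symbols); each class label records the phenotype shown at each locus, uppercase for the dominant phenotype.

```python
from collections import Counter
from itertools import product


def dihybrid_f2() -> Counter:
    """Phenotype classes from AaBb x AaBb with independent assortment."""
    gametes = ["".join(g) for g in product("Aa", "Bb")]  # AB, Ab, aB, ab
    classes = Counter()
    for g1, g2 in product(gametes, repeat=2):  # 16 equally likely unions
        a_shown = "A" in (g1[0], g2[0])        # dominant phenotype at locus A?
        b_shown = "B" in (g1[1], g2[1])        # dominant phenotype at locus B?
        label = ("A" if a_shown else "a") + ("B" if b_shown else "b")
        classes[label] += 1
    return classes


print(dihybrid_f2())  # Counter({'AB': 9, 'Ab': 3, 'aB': 3, 'ab': 1})
```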
3-Factor Crosses


When the P generation differs by three markers, new 
principles emerge if the three markers are linked. 


1. The recombination frequencies for the factors 
taken two at a time determine the order of the 
markers on the linkage map. In the absence of com- 
plications, this corresponds to the order of the mark- 
ers on the chromosome and genetically defines the 
concept of locus. 

2. The frequency of double crossovers may be different
from that expected if simultaneous crossing-over in
the two adjacent intervals were the result of
statistically independent exchange events (interference).
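Both points can be illustrated numerically. In the sketch below, the flanking (outer) markers are taken to be the pair with the largest pairwise recombination frequency. The expected double-crossover frequency is the product of the two adjacent-interval frequencies; the coefficient of coincidence is observed divided by expected, and interference is 1 minus that coefficient. These are the standard textbook definitions, assumed here rather than stated in the text. The marker names, frequencies, and counts are invented for illustration.

```python
def middle_marker(rf: dict) -> str:
    """rf maps frozenset({m1, m2}) to the recombination frequency of that pair."""
    markers = set().union(*rf)   # all three marker names
    outer = max(rf, key=rf.get)  # the most distant pair flanks the third marker
    (middle,) = markers - outer  # the remaining marker lies between them
    return middle


def interference(rf_left: float, rf_right: float,
                 double_crossovers: int, total: int):
    """Return (coefficient of coincidence, interference) for adjacent intervals."""
    expected = rf_left * rf_right * total  # if the two exchanges were independent
    coincidence = double_crossovers / expected
    return coincidence, 1.0 - coincidence


# Hypothetical data for markers a, b, c
rf = {frozenset("ab"): 0.10, frozenset("bc"): 0.20, frozenset("ac"): 0.28}
print(middle_marker(rf))  # b
c, i = interference(0.10, 0.20, double_crossovers=10, total=1000)
print(round(c, 2), round(i, 2))  # 0.5 0.5
```

With these numbers the outer a-c frequency (0.28) is lower than the sum of the two intervals (0.30). Each of the 10 double crossovers per 1000 restores the parental a-c combination, so they are missed in the a-c comparison (0.30 - 2 x 0.01 = 0.28).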




Tetrad Analysis 


Some fungi produce meiotic spore tetrads in which the 
order of the spores in the ascus reflects their origin via 
the two divisions of meiosis. The first two spores in the 
ascus are sister spores from the same second meiotic 
division, as are the last two spores. In a 1-factor cross, 
the frequency with which sister spores carry the same 
marker (first division segregation) is a measure of 
the distance of that marker from the centromere of the 
chromosome on which it is carried. If the frequency is 
close to 100%, the marker is close to its centromere. 
(For those species, like Neurospora crassa, that have 
eight ascospores by virtue of postmeiotic mitosis, 
substitute ‘spore pairs’ for ‘spores’ in the above.) 
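For ordered tetrads, a standard conversion (assumed here; the text above gives only the qualitative rule) puts the marker-centromere distance in map units at half the percentage of asci showing second-division segregation. The factor of one half arises because a single crossover between marker and centromere involves only two of the four chromatids. The minimal sketch below assumes four-spore asci written as strings of spore genotypes (spore pairs for eight-spored asci) and ignores multiple crossovers; the ascus counts are hypothetical.

```python
def first_division(ascus: str) -> bool:
    """True if sister spores (positions 1-2 and 3-4) carry the same marker."""
    return ascus[0] == ascus[1] and ascus[2] == ascus[3]


def centromere_distance_cM(asci: list) -> float:
    """Map distance = 1/2 x percentage of asci with second-division segregation."""
    second = sum(1 for ascus in asci if not first_division(ascus))
    return 50.0 * second / len(asci)


# Hypothetical sample: 90 first-division and 10 second-division asci
asci = ["AAaa"] * 80 + ["aaAA"] * 10 + ["AaAa"] * 5 + ["AaaA"] * 5
print(centromere_distance_cM(asci))  # 5.0
```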

In organisms with unordered spore tetrads (e.g., 
Saccharomyces), linkage of a newly found marker to 
its centromere can be established when another mar- 
ker tightly linked to a different centromere is avail- 
able. In a 2-factor cross involving the new and the old 
markers, the frequency of tetratype tetrads is indicative
of the degree of linkage in question. If the
frequency of tetratype tetrads is close to zero, the new 
marker also is close to its centromere. 


Variations on Mendel’s Rules 


Variations from the Mendelian expectations outlined 
above may be encountered. 

In 1-factor crosses between pure-breeding parents, 
the phenotype of the F1 may be intermediate between
that of the parents (incomplete dominance). 

In 2-factor crosses involving unlinked marker pairs 
each of which shows simple dominance, ratios other 
than 9:3:3:1 may be observed, implying that the pheno- 
types of the genes involved are not expressed inde- 
pendently of each other. For instance, a ratio of 9:3:4 
implies that the genotype at one locus interferes with 
phenotypic expression at the other (epistasis). 
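A 9:3:4 ratio arises, for example, when homozygosity for the recessive allele at one locus masks whatever is present at the other. The sketch below assumes, purely for illustration, that aa individuals show the same phenotype regardless of their B/b genotype (recessive epistasis); the locus symbols and class labels are hypothetical.

```python
from collections import Counter
from itertools import product


def recessive_epistasis_f2() -> Counter:
    """AaBb x AaBb where aa masks the phenotype of the B locus."""
    gametes = ["".join(g) for g in product("Aa", "Bb")]
    classes = Counter()
    for g1, g2 in product(gametes, repeat=2):
        has_a_dominant = "A" in (g1[0], g2[0])
        has_b_dominant = "B" in (g1[1], g2[1])
        if not has_a_dominant:
            classes["aa (B locus masked)"] += 1  # 3 aaB_ + 1 aabb pool into one class
        elif has_b_dominant:
            classes["A_B_"] += 1
        else:
            classes["A_bb"] += 1
    return classes


print(recessive_epistasis_f2())
# Counter({'A_B_': 9, 'aa (B locus masked)': 4, 'A_bb': 3})
```

The 16 unions are the same as in a 9:3:3:1 cross; only the mapping from genotype to phenotype has changed, merging two of the four classes.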


In 1-factor crosses, examination of the four haploid 
products coming from individual acts of meiosis 
reveals occasional violations (gene conversions) of 
the expected 2:2 marker ratio. 

In 2-factor crosses, conversion of one marker 
occurs independently of that of other markers unless 
the two markers are tightly linked, in which case they 
may undergo co-conversion. 

In 3-factor crosses with linked markers, conversion 
at the central site is accompanied by a high rate of 
crossing-over of the flanking markers, implying that
conversion and crossing-over are aspects of a common
process. 

1-, 2-, or 3-factor crosses may be conducted with 
bacteria or viruses with similar consequences. 


See also: Deletion Mapping; Epistasis; Gene 
Conversion; Gene Mapping; Incomplete 
Dominance; Interference, Genetic; Mapping 
Function; Marker; Tetrad Analysis 


Facultative 
Heterochromatin 


See: Heterochromatin 


Familial Fatal Insomnia 
(FFI) 


See also: GSD (Gerstmann-Straussler Disease) 


Familial 
Hypercholesterolemia 
Familial hypercholesterolemia (FH) is a prevalent
autosomal codominant disorder that causes elevated 
blood cholesterol levels and premature heart attacks. 
The disease is caused by mutations in the gene encod- 
ing the low density lipoprotein (LDL) receptor, which 
removes LDL, the major cholesterol-carrying lipoprotein,
from blood. FH heterozygotes (1 in 500 in most popu- 
lations) have a 50% reduction in LDL receptors and 
a two- to threefold elevation in plasma LDL levels. 
They frequently experience heart attacks in the fifth 
decade. The rare FH homozygotes (1 in 1 million)
manifest three- to eightfold elevations of plasma LDL,
and they typically have heart attacks in childhood. 
The disorder has been observed in nearly every popu- 
lation of the world, placing FH among the most pre- 
valent single-gene disorders in humans. More than 500 
mutations in the LDL receptor gene have been defined 
by genomic analysis. The LDL receptor was the first 
cell-surface receptor that was recognized to carry a 
protein into cells by receptor-mediated endocytosis. 
Comparison of normal and mutant FH cells helped to 
establish the properties of this fundamental process, 
which is now known to be used for many purposes in 
all animal cells. 
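The two population frequencies quoted above are consistent with the Hardy-Weinberg relationship, which is assumed here (random mating, all mutant alleles pooled into one class) rather than stated in the text. If heterozygotes occur at frequency 2q(1 - q) = 1/500, the mutant-allele frequency q is close to 1/1000 and homozygotes are expected at about q^2, roughly 1 in 1 million. A minimal sketch:

```python
import math


def homozygote_from_heterozygote(het_freq: float):
    """Solve 2q(1 - q) = het_freq for the rare-allele frequency q (smaller root)."""
    q = (1.0 - math.sqrt(1.0 - 2.0 * het_freq)) / 2.0
    return q, q * q


q, homo = homozygote_from_heterozygote(1 / 500)
print(f"q = {q:.6f}, homozygotes about 1 in {1 / homo:,.0f}")
```

The printed value falls just short of 1 in 1 000 000 because the exact root is slightly larger than 1/1000.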


See also: Genetic Diseases 


Fanconi’s Anemia 


C Mathew 


Copyright © 2001 Academic Press 
doi: 10.1006/rwgn.2001.0445


Fanconi’s anemia is an autosomal recessive inherited 
disorder associated with progressive aplastic anemia, 
diverse congenital abnormalities, and a high incidence 
of acute myeloid leukemia. It is genetically hetero- 
geneous, with seven complementation groups (A-G) 
having been described. The genes for six of these 
groups have been identified, but the sequences of 
the encoded proteins have not provided immediate 
insight into the functional pathway that is disrupted 
in this condition. Cells from patients are hyper- 
sensitive to DNA cross-linking agents such as mito- 
mycin C, which suggests that the encoded proteins 
may be involved in the repair of DNA interstrand 
cross-links. 


See also: Leukemia, Acute 


Fate Map 
A fate map is a map of an embryo illustrating the adult
tissues that will be derived from particular embryonic 
regions. 


See also: Cell Lineage; Embryonic Stem Cells 
Favism
The term ‘favism’ is used to indicate a severe reaction
occurring on ingestion of foodstuffs consisting of or 
containing the beans of the leguminous plant Vicia 
faba (fava bean, broad bean). The reaction manifests 
itself, within 6-24 h of the fava bean meal, with pros- 
tration, pallor, jaundice, and dark urine. These signs 
and symptoms result from (sometimes massive) 
destruction of red cells (acute hemolytic anemia), 
triggered by certain glucosides (divicine and convi- 
cine) present at high concentrations in the fava 
beans. These substances cause severe damage to red 
cells only if they are deficient in the enzyme glucose 
6-phosphate dehydrogenase (G6PD); therefore
favism only occurs in people who have inherited 
G6PD deficiency (see Glucose 6-Phosphate Dehy- 
drogenase (G6PD) Deficiency). Favism is more com- 
mon and more life-threatening in children (usually 
boys) than in adults; however, once the attack is over 
a full recovery is usually made. In a person who is 
G6PD deficient favism can recur whenever fava beans 
are eaten, although whether this happens or not is 
greatly influenced by the amount of beans ingested 
and probably by many other factors. From the public 
health point of view, it has been proven that favism 
can be largely prevented by screening for G6PD 
deficiency and by education through the mass media. 


See also: Mutagens 
F-duction is the same as sexduction, i.e., the high-frequency
transfer of a segment of bacterial (for example,
Escherichia coli) DNA incorporated into an
F’ plasmid.


See also: Sexduction 
The Family Felidae includes 37 recognized species
that range over five continents. Human fascination 
with these champions of predatory hunting has led 
to their deification in ancient Egypt and Asia, to 
domestication, to celebration in art and theology, 
and to voluminous literary and scientific descriptions 
dating from Tutankhamen’s tomb and Marco Polo’s 
chronicles. Biologists have studied the Felidae exten- 
sively, producing deep insight into an evolutionary 
process that favored these stunningly efficient carni- 
vores, specialized or adapted in stealth, speed, and 
majesty, and unchallenged in the natural habitat until 
the rise of humankind. And nonspecialists treasure 
cats unashamedly; indeed “cat” is among the first 
words uttered or spelt by English-speaking children. 
The Felidae is one of the eight families of the Car- 
nivora order which began to evolve intrinsic special- 
ities during the lower Eocene, some 40 million years 
ago. Today’s feline species descend from an ocelot- 
sized ancestor named Pseudailurus from which a 
group of large saber-toothed cats and surviving wild 
cats emerged, largely in the last 12 million years. 
The saber-tooths disappeared rather recently in the 
Pleistocene (10 000-20 000 years ago), coincident with
the latest ice ages which saw the extinction of quite a 
few other large mammals such as the mammoths, 
mastodons, dire wolves, and giant ground sloths. 
Living cats, along with the hyena, mongoose, and 
civet families, comprise the aelurid (cat-like) side of 
the carnivore family tree, which was originally recog- 
nized by the presence of an ossified segment in the 
auditory bulla in the cranial inner ear. This is in con- 
trast to arctoid (bear-like) carnivores that do not have 
it. Today a variety of additional morphological and 
DNA-based characters have affirmed the historical 
separation between the two Carnivora suborders. The 
cats share several adaptations inherited from their com- 
mon ancestor, including blunt foreshortened face, large 
eyes with binocular color vision, claws retractile into
a fur-covered sheath, and large sensitive ears. The
tawny color range and patterning serve as adaptive
camouflage for cats, three-quarters of which inhabit 
dense forests and live isolated solitary existences. 
Pelage or coat display among the living cats varies in 
pattern (stripes in tigers, tawny solid in lions and 
pumas, marbled in clouded leopard, king cheetahs, 
and marbled cat), in pigmentation (albino in lions and 
tigers, black in leopards, jaguars, and jaguarundi), and
in hair length (long hair in snow leopard, great mane
in African male lions, short hair in jaguarundi). That 
the same pelage patterns seen among different felid 
species are also observed and selected within domestic 
cat breeds implies that the intrinsic pelage genetic 
diversity may have originated in the ancestors of all 
cats to be reinforced by natural selective pressures 
during species isolation. 

Cat specialists recognize 36 wild cat species 
(Figure 1), and there is little disagreement on their 
identification (there is some; for example a few hold- 
outs consider the Iriomote cat a separate species, but 
most data classify this as a subspecies of leopard cat). 
The domestic cat is considered a separate species for ease
of discussion, even though it was established by artificial
selection from the African wild cat in Egypt
around 2000 BC.

Genus level relationships among the 36 species 
have been contentious, with dozens of taxonomic opinions
over the twentieth century ranging from a minimum of two
genera (Felis and Acinonyx, the cheetah) to a maximum of
19 genera. Molecular genetic data using a
consensus of mitochondrial and nuclear gene compar- 
isons have been a useful new approach that many 
believe will solve this difficult taxonomic puzzle. 
(It is so difficult because the 36 wild species
diverged in a relatively short evolutionary time period 
of 12 million years.) The gene comparisons cluster the 
species into three major lineages (subfamilies): (1) the 
ocelot lineage (7 species); (2) the domestic cat lineage 
(6 species); and (3) the pantherine lineage (23 species). 
Within these lineages, the cats assort into eight mono- 
phyletic groups; that is, each group displays evidence 
for a recent common ancestor subsequent to diver- 
gence from the older common ancestor for all modern 
cats, Prionailuris. One of these groups includes the 
five traditional Panthera species, lions, tiger, jaguar, 
leopard, and snow leopard, plus the clouded leopard, 
all descended from a 3-million-year-old ancestor. 
Another group joins cheetah, puma, and jaguarundi 
to an older 8-10-million-year-old split. The eight 
groups will likely represent a future genus proposal 
for the Felidae, one based on the imputed evolution- 
ary history of species divergence, a type of pedigree of 
Felidae natural history. 

Below the species level, geographical subdivision 
and isolation is probably the best way to identify sub- 
species, an important distinction that Charles Darwin 
considered as preludes to future species isolation. 
Molecular genetic tools are currently being applied to 
develop explicit DNA-based characters to recognize 
subspecies partitions and identification. Such studies 
usually reduce the number of recognized subspecies to
populations with explicit, verifiable criteria for distinction.
Thus leopard subspecies have been reduced from
27 to 8, tigers from 8 to 5, lions from 7 to 2, and pumas
from 32 to 6. Since subspecies recognition forms the 
basis for conservation strategies and protective legisla- 
tion, these distinctions gain added importance. 

Sadly each of the 36 wild cat species is listed as 
endangered or threatened by IUCN and CITES, 
international bodies that monitor global conser- 
vation. The threat to Felidae survival is principally in 
three areas: (1) habitat loss owing to human develop- 
ment; (2) hunting and depredation owing to human 
protection; and (3) poaching for skins and internal 
organs of erroneously perceived medicinal/aphrodi- 
siac benefit. The realization of the sorry state of Feli- 
dae species has spawned conservation initiatives 
across their range and continues to be a high priority 
worldwide to stop or reverse the extinction of these
remarkable species.


See also: Conservation Genetics 
The term ‘female carrier’ usually refers to females who
are heterozygous for X-linked recessive disorders. 
Most will have a healthy phenotype and so carrier
detection is a major role of the genetic clinic. A female 
carrier of an X-linked recessive disorder has a 1 in 2 
chance of passing the mutant allele to each child. Since 
the child has a 1 in 2 chance of being male, her chance 
of having an affected male is 1 in 4. 
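The 1 in 4 figure can be checked by listing the four equally likely combinations of a carrier mother's and an unaffected father's sex chromosomes. In this sketch 'Xm' is a hypothetical label for the X chromosome carrying the mutant allele.

```python
from fractions import Fraction
from itertools import product


def carrier_mother_outcomes():
    """Per-pregnancy risks for a carrier mother and an unaffected father."""
    mother = ["X", "Xm"]  # one normal X, one X with the mutant allele
    father = ["X", "Y"]   # unaffected father
    children = list(product(mother, father))  # four equally likely outcomes
    n = len(children)
    affected_son = Fraction(sum(1 for m, f in children if f == "Y" and m == "Xm"), n)
    carrier_daughter = Fraction(sum(1 for m, f in children if f == "X" and m == "Xm"), n)
    return affected_son, carrier_daughter


print(carrier_mother_outcomes())  # (Fraction(1, 4), Fraction(1, 4))
```

The same enumeration shows that, once a child is known to be male, the risk that he is affected is 1 in 2.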

Although carrier females are heterozygous for any 
X-linked recessive trait they carry, only one allele is 
active in each cell. In early embryogenesis in females, 
one of the two X chromosomes in each cell is randomly and
permanently inactivated. The resulting mosaic of cells, some
expressing the normal allele and some the mutant allele,
usually prevents the development of the full mutant
phenotype but female carriers of an X-linked recessive 
trait are at risk of developing variable expression of the 
disorder. This contrasts with autosomal recessive 
traits where carrier heterozygotes have the normal 
phenotype. 


Pedigree Analysis 


Features of X-Linked Recessive Pedigrees 

Figure 1 shows a family with X-linked recessive
Becker muscular dystrophy (BMD), which is allelic 
to the commoner and more severe Duchenne form 
(DMD). The pedigree shows several features typical 
of X-linked recessive inheritance. Only males are 
affected. Affected males form a ‘knight’s move’ pat- 
tern in the pedigree, i.e., they are related through 
healthy females. There is no apparent male to male 
transmission. An affected male such as II:2 cannot 
transmit the BMD to his sons as he gives them his Y 
chromosome and not the X chromosome with the 
mutant allele. Sons inherit their X chromosome from
their mother. 


Identification of Obligate Carriers 

After drawing the pedigree of an X-linked recessive 
disorder, it may be possible to infer which females, 
with a healthy phenotype, have heterozygote (carrier) 
genotype from their position in the family. Such 
females are obligate carriers and their identification 
is an important first step in assessing the carrier risks 
of the other females in the family. Obligate carriers are 
marked on the pedigree diagram by placing a dot in 
the middle of the pedigree symbol. In the pedigree in 
Figure 1, there are two affected males in generations
II and III, II:2 and III:3. The older affected male’s
daughter, III:2, is an obligate carrier because her father 
must have given her his only X chromosome, which 
carries the mutation (md). Her aunt, II:3, is also an 
obligate carrier as she has an affected son and an 
affected brother. Since two separate new mutations 
in one family would be extremely rare, she must have 
inherited her brother’s mutation from her mother, in 
order to pass it to her son. Her mother, I:2, must also 
be an obligate carrier as she has two offspring who 
have the mutation. The genetic situation of the first 
obligate carrier in a family is complex. She could have 
inherited the mutation, she could be a new mutation, 
or the mutation could have started in her ovaries 
(gonadal mosaicism). In the first situation, her sisters 
and aunts also have a carrier risk and genetic counsel- 
ling should be offered. In the second and third situ- 
ations, only her female descendants are at risk. 
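The inference rules in this paragraph can be sketched as a repeated search over a pedigree until no new obligate carrier is found. This is a minimal sketch, assuming full penetrance, correct paternity and no second new mutation in the family (the pitfalls discussed below). The `Person` record, the three rules as coded, and the small pedigree in the usage example are hypothetical, loosely modelled on the relationships described in the text rather than copied from Figure 1.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Person:
    pid: str
    sex: str                       # "M" or "F"
    affected: bool = False
    father: Optional[str] = None
    mother: Optional[str] = None


def obligate_carriers(people):
    """Return the IDs of healthy females inferred to be obligate carriers.

    Simplified rules (assume full penetrance, true paternity and no second
    new mutation in the family):
      1. she is the daughter of an affected male;
      2. she has an affected son and an affected brother (same mother);
      3. she has two or more children known to carry the mutation.
    """
    by_id = {p.pid: p for p in people}
    children = {p.pid: [] for p in people}
    for p in people:
        for parent in (p.father, p.mother):
            if parent is not None:
                children[parent].append(p)

    carriers = set()
    changed = True
    while changed:                 # repeat until no new carrier is found
        changed = False
        for p in people:
            if p.sex != "F" or p.affected or p.pid in carriers:
                continue
            father = by_id.get(p.father) if p.father else None
            rule1 = father is not None and father.affected
            has_affected_son = any(c.sex == "M" and c.affected
                                   for c in children[p.pid])
            has_affected_brother = p.mother is not None and any(
                q.sex == "M" and q.affected and q.mother == p.mother
                for q in people)
            rule2 = has_affected_son and has_affected_brother
            carrying_children = [c for c in children[p.pid]
                                 if c.affected or c.pid in carriers]
            rule3 = len(carrying_children) >= 2
            if rule1 or rule2 or rule3:
                carriers.add(p.pid)
                changed = True
    return carriers


# Hypothetical pedigree loosely following the relationships in the text.
family = [
    Person("I:1", "M"), Person("I:2", "F"),
    Person("II:2", "M", True, "I:1", "I:2"),
    Person("II:3", "F", False, "I:1", "I:2"),
    Person("II:5", "F", False, "I:1", "I:2"),
    Person("III:1", "M", False, "II:2"),
    Person("III:2", "F", False, "II:2"),
    Person("III:3", "M", True, None, "II:3"),
    Person("III:4", "F", False, None, "II:3"),
]
print(sorted(obligate_carriers(family)))  # ['I:2', 'II:3', 'III:2']
```

II:3 is found first (affected son plus affected brother), which then lets rule 3 identify her mother I:2 on the next pass. II:5 and III:4 are not obligate carriers; they only have a prior risk from their carrier mothers.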


Identification of Females with a Carrier Risk 

The healthy females II:5 and III:4 each have an obligate
carrier mother. Each mother has two X chromosomes, so
the chance that she passes the mutant allele (md) to a
given daughter is 1 in 2. If a female with a carrier risk
has one or more healthy sons, the conditional probability
of each son being born healthy can be used in a Bayes
calculation to reduce her inherited (prior) carrier risk.
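The Bayes calculation described here can be written out for a woman with a prior carrier risk of 1/2 and some number of healthy sons. This is a simplified sketch, assuming full penetrance and ignoring new mutations, which real calculations for disorders such as DMD also take into account; the function name is hypothetical.

```python
from fractions import Fraction


def posterior_carrier_risk(prior, healthy_sons):
    """Bayes calculation for an X-linked recessive carrier risk.

    prior:        prior (inherited) carrier risk, e.g. Fraction(1, 2)
    healthy_sons: number of unaffected sons she already has
    Simplified: assumes full penetrance and ignores new mutations.
    """
    # Conditional probability of n healthy sons under each hypothesis.
    if_carrier = Fraction(1, 2) ** healthy_sons   # each son escapes with 1/2
    if_noncarrier = Fraction(1)                   # all sons are healthy
    joint_carrier = prior * if_carrier
    joint_noncarrier = (1 - prior) * if_noncarrier
    return joint_carrier / (joint_carrier + joint_noncarrier)


for n in range(4):
    print(n, posterior_carrier_risk(Fraction(1, 2), n))
# 0 1/2
# 1 1/3
# 2 1/5
# 3 1/9
```

Each healthy son halves the likelihood under the carrier hypothesis, so the risk falls from 1/2 to 1/3 after one healthy son and to 1/5 after two.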


Pitfalls in Pedigree Analysis 


Nonpaternity 

Before assuming that the daughter of an affected male 
is an obligate carrier, the geneticist should make sure 
that the man is the biological father. 


New mutation 

Very occasionally, perhaps once in several hundred 
pedigrees, a separate new mutation can arise in a 
branch of a family with an existing mutation. Before 
assuming the intervening females are all obligate car- 
riers, it is worth trying to confirm that the affected 
males in each branch of the family carry the same 
mutation. 


Gonadal mosaicism 

Mother of a sporadic affected male
If the mother of
a sporadic affected male has no evidence of her son’s 
mutation in her blood, it cannot be assumed that she is 
not a carrier. She could be a gonadal mosaic and carry 
the mutation in her ovaries. Although her mother, 
sisters, and aunts are not at risk, her daughters will 
have a carrier risk and she is at risk of passing the 
mutation to any future sons. 


Parents of a sibship consisting of only carriers at the top of a pedigree
In a family where the first sibship
known to carry the mutation consists of two or more 
carriers and no affected males, it is tempting to assume 
that the mutation has been inherited from the mother. 
If she has no mutation in her blood it cannot be 
assumed that she is a gonadal mosaic. The carrier 
sisters’ father could be a gonadal mosaic. In this case, 
all sisters in the sibship, including any half sisters he 
has by another partner, have a very high carrier risk 
depending on the proportion of his gonads carrying 
the mutation. The mother of the carrier sisters is not a 
carrier and her aunts, sisters, and any children she has 
by another partner are not at risk. DNA linkage an- 
alysis can assist in determining the parental origin of 
the mutated X chromosome. 


Daughters of normal transmitting males 
In most X-linked recessive disorders, such as BMD or
DMD, the mutation cannot be passed from a healthy
male, such as III:1 in Figure 1, to his daughters. This is
because the mutations tend to be fully penetrant, as
males are hemizygous for X-linked genes. However,
some disorders do not follow classical Mendelian 
inheritance patterns. For example, in fragile X mental 
retardation syndrome, caused by an unstable ampli- 
fied CGG trinucleotide repeat mutation in the FMR1 
gene, some phenotypically normal males inherit a small 
amplified repeat, or premutation, from their mothers. 
The males then transmit this to all their daughters. 
These healthy obligate carriers are at risk of transmit- 
ting an expanded full mutation to their offspring 
because premutations are more unstable when transmitted
through females than through males. In fragile X syndrome,
all healthy males with a prior risk of inheriting
the mutation should have their DNA screened before
their daughters are advised that they are not carriers.


Carrier Tests 


Direct Tests, which Confirm Carrier Status 

Following pedigree analysis, a number of different 
types of test can help determine the carrier status of 
at-risk females. The most accurate methods directly
identify the mutation or the mutated gene product. It 
is preferable to confirm the mutation in the proband. 
Unfortunately, direct tests are currently not possible 
in all circumstances. 


Conditional Tests, which Produce a Carrier 
Risk 

Clinical geneticists can combine the results of sev- 
eral independent conditional carrier tests together 
with conditional pedigree information using a Bayes 
calculation to produce a final carrier risk. Where the 
mutation is not detectable, genetic linkage studies can 
help. These use polymorphic markers to label the
chromosome region around the gene and track that
segment through individuals in several generations
of the family. Linkage studies are not 100% accurate
because recombination at meiosis can separate a marker
from the gene, so that the marker ends up on the
homologous chromosome. Biochemical tests that do not
measure the trait’s gene product directly can also provide
conditional risk information. Examples include serum
creatine kinase in DMD and very-long-chain fatty acids
in fibroblasts in adrenoleukodystrophy. Carrier/noncarrier
risk ratios are available for various values of these
substances.
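Combining several independent conditional results is easiest in odds form: the prior odds are multiplied by the likelihood ratio of each observation. This is a minimal sketch under stated assumptions: the observations are treated as independent, the marker likelihood ratio assumes the phase of the mother's marker alleles is known, and the recombination fraction of 0.05 is hypothetical. A carrier/noncarrier ratio for a creatine kinase value would simply be appended to the same list.

```python
from fractions import Fraction


def combine_carrier_risk(prior_risk, likelihood_ratios):
    """Bayes calculation in odds form.

    posterior odds = prior odds x product of likelihood ratios, where each
    ratio is P(observation | carrier) / P(observation | noncarrier) and the
    observations are assumed to be independent.
    """
    odds = prior_risk / (1 - prior_risk)
    for lr in likelihood_ratios:
        odds *= lr
    return odds / (1 + odds)


def marker_lr(theta, marker_suggests_carrier):
    """Likelihood ratio for a linked marker when the mother's phase is known.

    theta is the recombination fraction between marker and gene: the marker
    gives the wrong answer only if a crossover separated it from the gene.
    """
    if marker_suggests_carrier:
        return (1 - theta) / theta
    return theta / (1 - theta)


risk = combine_carrier_risk(
    Fraction(1, 2),                         # prior: daughter of an obligate carrier
    [Fraction(1, 2),                        # one healthy son
     marker_lr(Fraction(1, 20), False)],    # hypothetical marker, theta = 0.05
)
print(risk)  # 1/39
```

The odds form makes the independence assumption explicit: each test contributes one multiplicative factor, so a test whose result is equally likely in carriers and noncarriers (ratio 1) leaves the risk unchanged.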


Symptomatic Female Carriers: 
Underlying Genetic Mechanisms 


Manifesting carriers are found in many X-linked 
recessive disorders such as DMD, in which about 1 
in 40 carriers have some symptoms. 


Nonrandom X-Inactivation

Studies have shown that most manifesting carriers of
DMD have skewed X-inactivation, with over 70% of the
X chromosomes carrying the normal allele inactivated.
Because muscle makes up a considerable proportion of
total body mass, symptoms appear only when there is a
large shift toward nonrandom X-inactivation across the
whole body. Other traits, such as color blindness, have
much smaller target tissues, and a manifesting color-blind
carrier may have considerable skewing of X-inactivation
in the retina but still have random X-inactivation overall.
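Why a small target tissue can be skewed while the body as a whole is not can be illustrated with a simple binomial model. This sketch rests on assumptions that are not in the text: each tissue is taken to descend from a pool of founder cells, each of which inactivated one X at random with probability 1/2, and the pool sizes of 10 and 100 are hypothetical. It computes the chance that at least 70% of founders inactivated the X carrying the normal allele.

```python
import math
from fractions import Fraction


def prob_skewed(n_founders, threshold=Fraction(7, 10)):
    """Chance that at least `threshold` of n founder cells inactivated the X
    carrying the normal allele, if each cell chose one X independently with
    probability 1/2 (a one-sided binomial tail)."""
    k_min = math.ceil(threshold * n_founders)
    tail = sum(math.comb(n_founders, k) for k in range(k_min, n_founders + 1))
    return Fraction(tail, 2 ** n_founders)


print(prob_skewed(10))            # 11/64
print(float(prob_skewed(100)))    # far smaller for a larger founder pool
```

Under this model about one small tissue in six would show 70% skewing by chance alone, whereas the same degree of skewing across a large tissue is very unlikely without some additional cause.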


Homozygosity 

Homozygous mutant females will express the full 
phenotype of an X-linked recessive trait. A female
may inherit a mutation from a parent who is affected
or a carrier, while the X chromosome she receives from
the other, unaffected parent carries a new mutation.
Some disorders, such as color blindness and
glucose-6-phosphate dehydrogenase deficiency, are common,
particularly in some communities. In this situation, 
homozygosity often occurs because the mutations 
are inherited from an affected father and a carrier 
mother. 
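Why homozygous affected females are seen mainly in common disorders can be made concrete with Hardy-Weinberg expectations. This sketch assumes random mating and equilibrium, which the text does not state, and the allele frequency of 0.08 is hypothetical: with allele frequency q, affected males occur at frequency q and homozygous females at q².

```python
def x_linked_frequencies(q):
    """Expected frequencies for an X-linked recessive allele of frequency q,
    assuming random mating and Hardy-Weinberg proportions."""
    return {
        "affected_males": q,                 # hemizygous: one X is enough
        "carrier_females": 2 * q * (1 - q),  # heterozygous
        "affected_females": q ** 2,          # homozygous: both X chromosomes
    }


f = x_linked_frequencies(0.08)               # hypothetical allele frequency
print(round(f["affected_females"], 4))       # 0.0064
print(round(f["affected_males"] / f["affected_females"], 1))  # 12.5
```

For a rare allele (say q = 0.001) the same calculation gives about one affected female per million, which is why homozygous females are found mainly where the mutant allele is common.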


Turner Syndrome (45,X)

Females with Turner syndrome are hemizygous for all 
X-linked genes and manifest the full phenotype of any 
X-linked recessive disorder they carry. Females with 
partial deletions of the X chromosome will manifest 
symptoms of any recessive trait carried on the non- 
deleted ‘normal’ X, if the corresponding allele is not 
present on the deleted X. 


X-Autosome Translocation 

In a number of X-linked recessive disorders, some 
manifesting carriers have reciprocal X-autosome
translocations. The breakpoint on the X chromosome
disrupts the disorder’s gene locus. In this situation, the
normal X, which carries the healthy allele, is selectively
inactivated in all cells. This happens because inactivation of the
translocated X would spread through the disrupted, 
nonfunctioning gene to the adjoining autosomal 
genes, which would be lethal. 




See also: Sex Linkage; Translocation; 
X-Chromosome Inactivation 
Although the worldwide success of many species depends
on their status as commensals, in some regions with
appropriate environmental conditions animals have
reverted to a noncommensal state, severing their
dependence on humankind. Such animals are referred to
as feral. The return to the wild can occur most readily
with a mild climate, sufficient vegetation or other food
source, and weak
competition from other species. Feral mice, for ex- 
ample, have successfully colonized small islands 
off Great Britain and in the South Atlantic, and in 
Australia, Mus musculus has replaced some indigenous
species. Although feral populations also exist in North
America and Europe, there they seem to be at a
disadvantage relative to other small indigenous rodents
such as Apodemus (field mice in Europe), Peromyscus
(American deer mice), and Microtus (American voles).
In some geographical areas, individual house mice
switch back and forth between a feral and a commensal
state according to the season: in mid-latitude temperate
zones, human shelters are much more essential in
winter than in summer.


See also: Commensal; Mus musculus 
Fertilization is a complex set of events involving
the fusion of two gametes to produce a new individual.
During production of the gametes, the chromosome
number is halved by meiosis. Oogenesis results
in a large, complex oocyte that contains the proteins, 
enzymes, and other factors necessary for the first 
days of development. Spermatogenesis has to ensure 
that the sperm is able to travel through the female 
reproductive tract to meet and fertilize the oocyte. 

Oogenesis is a complex process that starts during 
fetal life. Females are born with primary
oocytes arrested in meiosis I (diplotene stage) and no 
further oocytes will be produced. Each month only 
one oocyte will fully mature and just before ovulation, 
meiosis is resumed, the first polar body is extruded 
(containing one set of oocyte chromosomes), and the 
oocyte arrests in metaphase II. Meiosis is only com- 
plete upon fertilization. Maturation (M-phase) pro- 
moting factor (MPF) causes resumption of meiosis 
and is regulated by the c-mos gene. MPF is high during 
meiosis I and II. 

Sperm are produced by the testis and become 
mature and motile as they travel through the epididy- 
mis. Sperm released on ejaculation have to travel 
through the cervical os, the uterus, and the fallopian 
tubes, where an oocyte may be waiting for fertilization.
The mature sperm consists of a head piece,
neck, and flagellum. The flagellum is responsible for 
initiation and maintenance of motility through the 
female reproductive tract. The head of the sperm con- 
tains the sperm DNA and is the area involved in recog- 
nition of the zona pellucida (the glycoprotein coat that 
surrounds the oocyte) and sperm-oocyte fusion. The
head is covered by a membrane-bound vesicle called 
the acrosome. Before a sperm can fertilize an oocyte it 
needs to go through capacitation and the acrosome 
reaction. Capacitation occurs in the female reproduct- 
ive tract and involves a range of poorly understood 
processes that do not alter the ultrastructure of the 
sperm, but enable the sperm to fertilize the oocyte. 

At ovulation the oocyte is surrounded by a dense 
array of cumulus cells that play a vital role in oogen- 
esis. The sperm swim through the cumulus cells until 
they reach the zona pellucida. It is thought that there 
are two stages to sperm binding to the human oocyte. 
The first involves the binding of acrosome-intact
sperm to the primary binding site on the oocyte,
termed ZP3. The acrosome matrix consists of a number
of enzymes, the most important of which is acrosin,
a serine protease that is packaged as the inactive
precursor proacrosin. The acrosome reaction exposes a
second binding site on the sperm head that binds to
the secondary binding site, ZP2, on the oocyte. The
sperm is then able to penetrate the oocyte, and two
reactions are stimulated: the cortical reaction and 
oocyte activation. 

The cortical reaction may be involved in blocking
additional sperm from penetrating the oocyte. In vitro
analysis of human zygotes shows many sperm bound to
the zona pellucida, but usually only one sperm fertilizes
the oocyte. The cortical reaction involves cortical
granules, Golgi-derived vesicles that contain enzymes and
mucopolysaccharides. The granules break open, releasing
their contents into the perivitelline space, which causes
the zona pellucida to harden. Oocyte activation is
caused by calcium oscillations within the oocyte.
Calcium oscillations occur in mammals, and blocking
this calcium increase with chelators blocks fertilization.
Calcium causes a decrease in MPF, and MPF needs to
be low for the oocyte to exit meiosis I and II. There are
two hypotheses for how sperm cause calcium oscillations.
The first is the surface receptor-mediated model,
which compares the sperm to a giant ligand that binds 
to a receptor on the oocyte surface resulting in activa- 
tion of the polyphosphoinositide pathway. The se- 
cond is the soluble sperm factor hypothesis, which 
suggests that sperm contain a soluble factor that is 
released into the ooplasm causing calcium oscillations. 
This hypothesis would explain why intracytoplasmic 
sperm injection (ICSI) is able to work (see below). 

Upon sperm entry, the oocyte chromosomes 
undergo the final stages of meiosis and the oocyte 
extrudes the second polar body. The fertilized oocyte, 
or zygote, contains two pronuclei; one from the 
oocyte and one from the sperm, and two polar bodies 
which are the waste product of oogenesis. The meiotic 
spindle of the oocyte breaks down and the sperm 
contributes factors that are involved in establishing 
the first mitotic spindle. The sperm produces astral 
microtubules that migrate through the ooplasm and 
pull the female pronucleus close to the male pronucleus.
The zygote then undergoes syngamy (union of the parental
genomes), in which the nuclear membranes of the male and
female pronuclei break down and the chromosomes 
condense separately and line up on the first mitotic 
spindle. The zygote cytoplasm will divide in half 
(cleavage) to give two identical daughter cells, each 
with a complete diploid set of chromosomes. 


IVF 


In vitro fertilization (IVF) is a technique developed 
for the treatment of some forms of infertility. The first 
successful birth was that of Louise Brown in 1978. 
IVF can be used for the treatment of several types of 
male and female infertility. In the female, one of the 
most common causes of infertility is tubal blockage, 
but IVF can also be used in the treatment of ovula- 
tory disorders, including polycystic ovarian disease 
(PCO), immunological disorders such as antisperm 
antibodies, endometriosis, coital problems, and
“unexplained” infertility, where no etiological factor has
been identified.

In an IVF treatment cycle the female’s menstrual
cycle is usually downregulated to block her normal 
cycle and injections of follicle stimulating hormone 
(FSH) are administered to stimulate multiple follicle 
development. Follicle development is tracked using 
ultrasound and when the follicles have developed suf- 
ficiently (i.e., three follicles over 18 mm), a single dose 
of hCG (human chorionic gonadotrophin) is adminis- 
tered to mimic the luteinizing hormone (LH) surge 
and ensure maturation of the oocytes. The oocytes are 
collected under light sedation with ultrasound guid- 
ance, by aspiration of the follicles. The oocytes can 
easily be identified under a dissecting microscope and 
are placed in a simple culture medium and stored at 
37 °C. The male partner’s sperm is collected and pre- 
pared to remove seminal plasma and enrich for motile 
sperm. This can be performed using the traditional 
swim-up technique in which medium is laid over the 
sperm and motile sperm swim up into the medium, or 
using density gradient centrifugation, which separates 
motile from immotile sperm. The prepared sperm 
are used to inseminate the oocytes (approximately 
100 000 sperm ml⁻¹ of culture medium). On the day
after insemination, the cumulus cells are
removed from the oocytes and the oocytes checked 
for fertilization. Normal fertilization can be seen by 
the presence of the male and female pronuclei. Nor- 
mally fertilized oocytes (zygotes) are returned to 37 °C 
and examined over the next 1-2 days, during which 
stage they should undergo cleavage. Cleavage is the 
halving of the ooplasm to produce daughter cells or 
blastomeres. Human embryos are graded taking into 
account the size and shape of the blastomeres, the 
number of cell divisions, and the degree of fragmenta- 
tion. Normally fertilized embryos can be transferred 
to the uterus of the female either on day 2 post in- 
semination (2-4 cell stage) or day 3 (6-8 cell stage). 
Good-quality embryos can be cryopreserved for a 
future cycle. Recently, some groups have reported an 
improvement in the pregnancy rate and a decrease in 
the multiple pregnancy rate by using blastocyst trans- 
fer (day 5-6 of embryo development). This procedure 
has been hindered because very few human embryos
develop to the blastocyst stage in vitro, but recently
improved IVF culture media have raised this proportion.
Most research in IVF is aimed at improving the
IVF success rates. Other recent advances include 
assisted hatching and aneuploidy screening. Assisted 
hatching involves making a hole in the zona pellucida 
to ensure that the embryo can successfully hatch. 
Aneuploidy screening involves removing 1-2 blasto- 
meres from the 6-10 cell embryo and testing for the 
chromosomes commonly involved in aneuploidy (13, 
16, 18, X, and Y) so that embryos normal for these 
chromosomes are transferred. 
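Two numerical criteria in this paragraph, the hCG trigger rule and the insemination concentration of about 100 000 sperm ml⁻¹, amount to simple arithmetic. This sketch is illustrative only: the function names, the prepared-sample concentration of 20 × 10⁶ motile sperm ml⁻¹ and the 1 ml culture volume are hypothetical, and clinical protocols vary between centers.

```python
def ready_for_hcg(follicle_diameters_mm, min_diameter_mm=18.0, min_count=3):
    """Example trigger rule from the text: at least three follicles over 18 mm."""
    large = sum(1 for d in follicle_diameters_mm if d > min_diameter_mm)
    return large >= min_count


def insemination_volume_ml(prepared_sperm_per_ml, culture_volume_ml,
                           target_sperm_per_ml=100_000):
    """Volume of prepared sperm to add so that the culture reaches the target
    concentration. Ignores the small extra volume that the sperm add."""
    return target_sperm_per_ml * culture_volume_ml / prepared_sperm_per_ml


print(ready_for_hcg([19.0, 18.5, 20.0, 14.0]))    # True
vol_ml = insemination_volume_ml(20_000_000, 1.0)  # hypothetical inputs
print(round(vol_ml * 1000, 1), "µl")              # 5.0 µl
```

Because the added volume is tiny compared with the culture volume, ignoring it changes the final concentration by well under 1% in this example.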


For male infertility, if the sperm count is very 
low (the World Health Organization guideline for a 
normal sperm count is 20 million sperm ml⁻¹),
intracytoplasmic sperm injection (ICSI) can be used. This
technique involves injection of a single sperm into the 
cytoplasm of a metaphase II oocyte. The surrounding 
cumulus cells are removed and the oocyte is posi- 
tioned with the first polar body at the 6 or 12 o’clock 
position to minimize damage to the meiotic spindle. A 
single sperm is taken up into a fine-bore pipette and 
the pipette inserted directly through the zona pellu- 
cida into the ooplasm. A small amount of ooplasm is 
gently aspirated into the pipette and the sperm is 
expelled. This procedure has proved to be very suc- 
cessful for most forms of male infertility. 

In cases where there is no sperm in the ejaculate, it
may be possible to aspirate sperm from the epididymis
(MESA, microsurgical epididymal sperm aspiration, or PESA,
percutaneous epididymal sperm aspiration) in cases of
obstructive azoospermia, or from the testis (TESA,
testicular needle aspiration, or TESE, testicular extraction
by open biopsy). More recently, several IVF centers
worldwide have reported injecting spermatids, and some
successful pregnancies have been obtained. However,
spermatid injection is still controversial. In some cases
of male infertility, a genetic cause is known. Some men
have deletions of regions of the long arm of the Y
chromosome and will therefore pass the deletion, and
with it the infertility, to any sons they father (for
example, by ICSI). Other genetic abnormalities include
mutations in the androgen receptor gene, expansion or
reduction of the triplet repeat in the androgen receptor
gene, cystic fibrosis mutations, which can lead to
congenital absence of the vas deferens, and chromosomal
translocations, which have been shown to be associated
with an increased risk of infertility. If the genetic cause
of the infertility is known, genetic counseling is required
to ensure that these families are aware of the risk of
transmitting these abnormalities to their offspring.


See also: Fertilization, Mammalian; Genetic 
Counseling; Oogenesis, Mouse; Spermatogenesis, 
Mouse; Zygote 
Introduction to Mammalian Fertilization


Fertilization is the means by which sexual reproduc- 
tion takes place in nearly all multicellular organisms 
and is fundamental to the maintenance of life. It is defined
as the process of union of two germ cells, egg and
sperm, whereby the somatic chromosome number 
is restored and the development of a new individual 
exhibiting characteristics of the species is initiated. 
Both mammalian eggs and sperm are designed to 
ensure that fertilization takes place reliably. Accord- 
ingly, mechanisms are in place to support species- 
specific interactions between gametes and to prevent 
fusion of eggs with more than one sperm (polyspermy). 


Egg Development 


Oogenesis begins during fetal development when 
primordial germ cells are transformed first to oogonia 
(mitotic) and then to oocytes (meiotic). The pool of 
small, nongrowing oocytes, present at birth, is the 
sole source of unfertilized eggs in the sexually mature 
female mouse. These oocytes are arrested at the dip- 
lotene (dictyate) stage of the first meiotic prophase. 
Each oocyte (~15 µm in diameter) is contained within
a cellular follicle that grows concomitantly with the 
oocyte for about 2 weeks, from a single layer of a few 
epithelial-like cells to three layers of cuboidal granu- 
losa cells by the time the oocyte has completed its 
growth (~80 µm in diameter). Over several days,
while the oocyte remains the same size, follicular 
cells undergo rapid division, increasing to more than 
5 × 10⁴ cells in the Graafian follicle. The follicle exhibits
a fluid-filled cavity, or antrum, when it consists
of ~6 × 10³ cells and, as the antrum expands, the oocyte
takes up an acentric position surrounded by two or 
more layers of granulosa cells (cumulus cells). 
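The growth figures in this paragraph imply a large increase in oocyte volume. Assuming, for illustration only, that the oocyte is a sphere, the change from ~15 µm to ~80 µm in diameter corresponds to roughly a 150-fold increase in volume:

```python
import math


def sphere_volume_um3(diameter_um):
    """Volume of a sphere (µm^3) from its diameter (µm)."""
    return math.pi * diameter_um ** 3 / 6


start, end = 15, 80    # approximate mouse oocyte diameters from the text (µm)
fold = sphere_volume_um3(end) / sphere_volume_um3(start)
print(round(fold))     # 152
```

Because volume scales with the cube of the diameter, the factor is simply (80/15)³; the constant π/6 cancels.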

Fully grown oocytes in Graafian follicles complete
the first meiotic reductive division, called meiotic 
maturation, just prior to ovulation in response to 
a surge in the level of luteinizing hormone (LH). 
Oocytes progress to metaphase II of the second meiotic 
division, with separation of homologous chromo- 
somes and emission of a first polar body, and become 
unfertilized eggs. Oocytes must complete meiotic 
maturation in order to be capable of being fertilized 
by sperm. The ovulated egg completes meiosis, with 
separation of chromatids and emission of a second 
polar body (i.e., becomes haploid, 1n), only upon
fertilization by sperm (sperm chromosomes restore a 
diploid, 2n, state to the zygote). 


Sperm Development 


In mice, it takes ~35 days for each spermatogonial 
stem cell, already present in the fetus, to progress 
through meiosis as a spermatocyte, become four 
haploid spermatids (1n), and be transformed into
spermatozoa. Spermatogenesis takes place within the 
seminiferous epithelium lining the tubules of the testes 
and is supported by Sertoli cells, a secretory cell type
and major site of testosterone action. Spermatozoa 
initially move passively from the seminiferous epithe- 
lium to the rete testis, to the epididymis, and to the 
vas deferens. During this period of transport, sperm 
become motile and fully functional. The final and 
essential maturation of sperm, called capacitation, 
occurs in the female genital tract following ejacula- 
tion. Capacitation involves removal of inhibitory 
factors from sperm, as well as biochemical changes in 
sperm proteins (e.g., tyrosine phosphorylation). Only 
capacitated sperm are capable of binding to eggs, 
undergoing exocytosis (acrosome reaction), and 
producing zygotes. 


Eggs and Sperm in Oviduct 


Eggs released from Graafian follicles enter the open- 
ing (ostium) of the oviduct (fallopian tube) and move 
to the lower ampulla region where fertilization takes 
place. It has been estimated that mouse eggs and sperm 
in the oviduct remain capable of being fertilized and 
giving rise to normal offspring for 8-12 h following
ovulation. Typically, very few ovulated eggs are found
in oviducts of mice (~10) and human beings (~1).
Similarly, very few sperm are found at the site of
fertilization (~100-150) as compared to the number
of sperm deposited into the female reproductive tract
(~10⁷); an extremely low percentage of ejaculated
sperm make their way to the position of unfertilized 
eggs in the oviduct. It takes ~15 min for ejaculated 
mouse sperm and ~30 min for human sperm to tra- 
verse the female genital tract and reach the oviduct. 
Whether binding of mammalian sperm to eggs occurs 
due to a chance encounter in the oviduct or is promoted 
by a chemical gradient stimulus (chemotaxis) remains 
to be determined. Today, there is evidence for human 
sperm chemotaxis mediated by an egg follicular factor. 
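The numbers in this paragraph show how strongly sperm are winnowed on the way to the egg. A minimal calculation, using the approximate mouse values given above (about 10⁷ sperm deposited and 100 to 150 at the site of fertilization); the helper function is hypothetical:

```python
def fraction_reaching_site(at_site, deposited):
    """Fraction of deposited sperm that reach the site of fertilization."""
    return at_site / deposited


deposited = 10 ** 7    # ~10^7 sperm deposited in the mouse female tract
for at_site in (100, 150):
    f = fraction_reaching_site(at_site, deposited)
    print(at_site, f, round(1 / f))
# 100 1e-05 100000
# 150 1.5e-05 66667
```

Roughly one deposited sperm in 70 000 to 100 000 reaches the unfertilized eggs in the oviduct.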


Pathway to Mammalian Fertilization 


The pathway to fertilization in mice follows a com- 
pulsory order (Figure 1): 


1. Capacitated, acrosome-intact sperm bind in a 
species-specific manner to the egg zona pellucida 
(ZP). 

2. Bound sperm undergo the acrosome reaction 
(cellular exocytosis). 

3. Acrosome-reacted sperm penetrate the ZP. 

4. Sperm that penetrate the ZP bind to the egg plasma 
membrane. 

5. Bound sperm fuse with the egg plasma membrane 
to form a zygote (fertilization is completed). 

6. Following fusion, blocks to polyspermy are 
instituted. 
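Because the steps above occur in a compulsory order, an observed sequence of events for one sperm can be checked against it, for example when annotating time-lapse observations. The enum names and the check function below are hypothetical labels for the six listed steps, not an established nomenclature.

```python
from enum import IntEnum


class Step(IntEnum):
    """Steps of the mouse fertilization pathway, in their compulsory order."""
    BIND_ZP = 1                # acrosome-intact sperm bind the zona pellucida
    ACROSOME_REACTION = 2      # bound sperm undergo exocytosis
    PENETRATE_ZP = 3           # acrosome-reacted sperm cross the ZP
    BIND_EGG_MEMBRANE = 4      # sperm bind the egg plasma membrane
    FUSE = 5                   # gamete fusion forms the zygote
    BLOCK_POLYSPERMY = 6       # blocks to polyspermy are instituted


def follows_compulsory_order(events):
    """True if `events` (a list of Step) is the pathway so far: in order,
    with no step skipped or repeated."""
    return list(events) == list(Step)[: len(events)]


print(follows_compulsory_order([Step.BIND_ZP, Step.ACROSOME_REACTION]))  # True
print(follows_compulsory_order([Step.ACROSOME_REACTION, Step.BIND_ZP]))  # False
```

A sequence that skips a step, for instance binding to the ZP and then penetrating it without an acrosome reaction, is rejected, mirroring the statement that only acrosome-reacted sperm can penetrate the ZP.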
Figure 1 The mammalian fertilization pathway includes a series of steps taken in a compulsory order. Acrosome-intact sperm bind to sperm receptors in the zona pellucida (ZP) by using egg-binding proteins on the sperm head plasma membrane. Sperm then undergo the acrosome reaction (cellular exocytosis), penetrate through the ZP, and reach the perivitelline space between the ZP and plasma membrane. A single sperm then binds to and fuses with the egg plasma membrane. Fusion of sperm and egg triggers the cortical reaction that, in turn, triggers the zona reaction. The zona reaction alters the properties of the ZP, making it a barrier to other sperm and preventing polyspermic fertilization. (Adapted with permission from Wassarman, 1988.)


Some of the egg and sperm molecules that support 
each step in this pathway to fertilization have been 
identified and characterized. A description of these 
molecules and the manner in which they participate 
in mammalian fertilization follows below. 


Binding of Sperm to Unfertilized Eggs 

All mammalian eggs are surrounded by a thick extra- 
cellular coat, called the zona pellucida (ZP). Conse- 
quently, sperm must first bind to and then penetrate 


the ZP in order to reach and fuse with the egg plasma 
membrane (Figure 2). Removal of the ZP (e.g., by using 
acidic buffers or proteases) exposes the egg plasma 
membrane directly to sperm and, as a result, virtually 
eliminates any barriers to fertilization between species 
in vitro. This has made the ‘hamster test’ (penetration
of zona-free hamster eggs) a routine method of assessing
the fertilizing capacity of human sperm in in vitro
fertilization (IVF) clinics.

Figure 2 Light photomicrograph of mouse sperm bound to the ZP of an unfertilized mouse egg in vitro.

The ZP consists of only a few glycoproteins, called
ZP1-3, that are organized via noncovalent bonds into
cross-linked filaments. Apparently, the ZP of eggs from
all mammalian species, from mice to human beings, 
consists of ZP1-3. Even the vitelline layer surround- 
ing eggs from many nonmammalian species, includ- 
ing fish, birds, and amphibia, contains glycoproteins 
structurally related to ZP1-3. Each glycoprotein 
possesses a unique polypeptide that is hetero- 
geneously glycosylated with both asparagine-linked 
and serine/threonine-linked oligosaccharides. Genes 
encoding ZP1-3 polypeptides from a wide variety of 
mammalian species have been cloned and character- 
ized. In addition, targeted mutagenesis of ZP genes 
has been carried out by using homologous recombin- 
ation in embryonic stem (ES) cells and ‘knockout’ 
mice produced. 

In mice, acrosome-intact sperm bind exclusively to 
the glycoprotein ZP3, which is therefore called the 
sperm receptor. Sperm recognize and bind to specific 
oligosaccharides linked to serine residues in a region 
of ZP3 polypeptide near the C-terminus (encoded by 
exon 7). These oligosaccharides have been isolated
and shown to possess sperm receptor activity. Thus, 
binding of mammalian sperm to eggs is another ex- 
ample of carbohydrate-mediated cellular adhesion. 
Whether species-specific binding of sperm to eggs 
can be attributed to changes in oligosaccharide struc- 
ture (composition, sequence, linkage, and modifica- 
tion) is currently under investigation. 


When acrosome-intact sperm bind to the ZP they 
do so by using one or more proteins associated with 
plasma membrane overlying the sperm head. These 
proteins recognize and bind to sperm receptors in 
the ZP and are called egg-binding proteins. Many 
such proteins have been described during the past 20 
years. Some of these are integral membrane proteins, 
while others are peripheral proteins associated with 
integral membrane proteins. Examples of these 
include β1,4-galactosyltransferase, sperm proteins
sp56 and sp17, zonadhesin, spermadhesin, mannose-
and galactose-binding proteins, and many others. 
Some of these candidate proteins can be considered 
to be lectins. It is unclear to what extent the diversity 
of these proteins is attributable to misleading experi- 
mental evidence. It is possible that a single class of 
sperm proteins may eventually emerge as the bona fide 
egg-binding protein in many, if not all, mammals. 


Acrosome Reaction 

The acrosome is a large secretory vesicle that appears 
in spermatids as a product of the Golgi apparatus and, 
in certain respects, is biochemically similar to a lyso- 
some. It is located at the anterior portion of the sperm 
head, just under the plasma membrane and above the 
nucleus. Acrosomal membrane underlying the plasma 
membrane is called the ‘outer’ acrosomal membrane 
and that overlying the nucleus is called the ‘inner’ 
acrosomal membrane. During the acrosome reaction,
multiple fusions occur between plasma membrane and 
outer acrosomal membrane at the anterior region of 
the sperm head. Extensive formation of hybrid mem- 
brane vesicles takes place. As a result, the egg ZP 
is exposed to the inner acrosomal membrane and 
acrosomal contents of bound sperm. 

Among the many different acrosome reaction- 
inducers, which include progesterone, is the sperm 
receptor ZP3. It is now generally accepted that ZP3 
is the natural agonist that initiates the acrosome reac- 
tion following binding of sperm to the ZP. It appears 
that multivalent interactions between egg-binding 
protein(s) and ZP3 may trigger this Ca²⁺-dependent
reaction. As in secretion by somatic cells, intracellular
Ca²⁺ is necessary and sufficient to initiate the acrosome
reaction. ZP3 stimulation of sperm activates
voltage-sensitive T-type Ca²⁺ channels, resulting in
depolarization of the sperm membrane from ~−60
to ~−30 mV, and increases intracellular Ca²⁺ concentration
from ~150 to ~400 nM. It is likely that opening
of sperm T-type channels leads to a sustained
release of Ca²⁺ from an internal store, perhaps via
inositol 1,4,5-trisphosphate (IP3) and IP3 receptors.
In addition to these changes, ZP3-stimulated sperm
exhibit a transiently elevated pH (alkalinization) that
may activate Ca²⁺/calmodulin-dependent adenylyl
cyclase, protein phosphatases, protein kinases, tyrosine
kinases, and phospholipases.

It is clear that ZP3 stimulation of sperm also 
activates G proteins, and activation of Gi1 and Gi2
accounts for the pertussis toxin sensitivity of the acrosome
reaction. Participation of another G protein,
Gq/11, has also been suggested. Receptors that activate
G proteins have remained elusive, although aggregation
of β-galactosyltransferase on the sperm head
by ZP3 or antibodies has been reported to lead to 
activation of a pertussis toxin-sensitive G-protein 
complex and induction of the acrosome reaction. 


Penetration of Zona Pellucida by Sperm 

Only acrosome-reacted sperm can penetrate the ZP 
and fuse with egg plasma membrane. The course taken 
by sperm is indicated by a narrow slit left behind in the 
ZP of the fertilized egg. In mice, it takes ~15-20 min
for acrosome-reacted sperm to penetrate the ZP and reach
the egg plasma membrane. Until relatively recently, it was
thought that the acrosomal serine protease, acrosin, was
essential for penetration of the egg ZP by bound,
acrosome-reacted sperm. However, sperm from mice that are
homozygous nulls for acrosin (Acr-/-) penetrate the ZP and
fertilize eggs, suggesting that acrosin may not be essential
for these steps. On the other hand, the absence of acrosin
does cause a delay in penetration of the ZP by sperm, which
may be due to a delay in dispersal of acrosomal proteins
during the acrosome reaction. It is possible that other
acrosomal proteases either replace acrosin or are themselves
responsible for sperm penetration through the ZP. Sperm
motility is also an important contributing factor to ZP
penetration, and schemes have been proposed whereby sperm
penetrate the ZP solely by mechanical shear force.


Fusion of Sperm and Egg 

In mice, plasma membrane above the equatorial 
segment of acrosome-reacted sperm fuses with egg 
plasma membrane. Fusion between gametes nearly 
always involves egg microvillar membrane (i.e., all 
but the region where the second metaphase plate and 
first polar body are located), since it permits max- 
imum apposition of sperm and egg. As proposed for 
other biological systems, localized dehydration at the 
site of membrane contact and establishment of hydro- 
phobic interactions are critical steps for fusion. As 
mentioned earlier, there is little evidence for barriers 
to interspecies fertilization once sperm have pene- 
trated the ZP and reached the plasma membrane. In 
most mammals, fusion of the sperm head with the egg 
is closely followed by entry of the sperm tail into the 
egg cytoplasm. 

Several sperm proteins have been implicated in binding of
sperm to, and fusion of sperm with, egg plasma membrane. One
of these proteins, PH-30 or fertilin, has received the most
attention. Fertilin is a heterodimer of glycosylated α and β
subunits and is a member of the ADAM (a disintegrin and
metalloprotease domain) family of transmembrane proteins.
Peptides based on sequences in the disintegrin domain of
fertilin-β and, perhaps, fertilin-α can prevent binding of
sperm to eggs from which the ZP has been removed in vitro. It
has been proposed that binding of acrosome-reacted sperm to
egg plasma membrane is supported by interactions between
fertilin's disintegrin domains and integrin (e.g., α6β1)
receptors on unfertilized eggs.

Fertilin-α possesses a moderately hydrophobic sequence,
~17-25 amino acids long, in its cysteine-rich domain that may
function as a fusion peptide following binding of
acrosome-reacted sperm to egg plasma membrane. The peptide
can be modeled as an α-helix having a strongly hydrophobic
face (an amphipathic helix), similar to several viral fusion
peptides. Experimental evidence suggests that this peptide
and related peptides can bind to membranes and induce fusion.
Despite such evidence, sperm from mice that are homozygous
null for fertilin-β and possess reduced levels of fertilin-α
can fuse with egg plasma membrane in in vitro assays, albeit
with reduced efficiency. This, as well as other observations
with the null mice, suggests either that an additional
fertilin-β-independent pathway to fusion exists or that
fertilin-α and -β are not essential components of the gamete
fusion pathway. In this context, it has been reported that,
although the fertilin-α gene is expressed in humans, it does
not produce a functional protein. Further experimentation
should clear up these issues.


Prevention of Polyspermy Following 
Fertilization 

Once an egg has fused with a single sperm to become 
a zygote it is imperative that no additional sperm 
fuse with the zygote’s plasma membrane. In 
mammals this is achieved by immediate changes in 
the electrical properties of the plasma membrane 
(‘fast block,’ within seconds) and by slower changes 
in the properties of the ZP (‘slow block,’ within mi- 
nutes). The latter is a result of the so-called ‘zona 
reaction.’ 

The zona reaction occurs within minutes of fertilization. It
is induced by the contents of cortical granules (small
membrane-bound organelles that underlie the egg plasma
membrane), which are deposited into the ZP during the
'cortical reaction.' There are ~4000 cortical granules in
each mouse egg. The cortical reaction involves fusion of
cortical granule and plasma membranes, with exocytosis of
cortical granule contents into the ZP. Apparently, this
occurs as a result of localized release of Ca2+ from egg
cytoplasmic stores. Among the contents are a variety of
enzymes and other proteins. These components cause a
hardening of the ZP (i.e., a decrease in solubility), perhaps
due to proteolytic modification of ZP2, and a loss of sperm
binding, perhaps due to modification of ZP3 by glycosidases.
Consequently, movement of bound sperm through the ZP and
binding of additional sperm to the ZP are prevented.


Final Considerations 


Reproduction of the species is a fundamental property 
of all living things. Fertilization activates the mam- 
malian egg to initiate a complex program of develop- 
ment, transforming a single cell into a multicellular 
organism. Accordingly, development of eggs and 
sperm and the interactions between gametes that cul- 
minate in fertilization are highly regulated. Some of 
the egg and sperm molecules that participate in the 
fertilization pathway have been identified and their 
mechanisms investigated. This relatively new infor- 
mation has already contributed to our ability to con- 
trol reproduction and will continue to have an impact 
on medical aspects of human reproduction for years 
to come. 
Filamentous bacteriophages (phages) are long, flexible rods
that are simply extruded through the surface of their host
bacteria rather than killing them during productive
infection. Each consists of a circular, single-stranded (ss)
DNA molecule encased in a sheath. The best-studied are three
related 'Ff' coliphages (f1, M13, and fd) and the Vibrio
cholerae phage CTXΦ, which encodes cholera toxin (CT). Like
most filamentous phages, they use the tip of a conjugative
pilus as an initial receptor in recognizing their target
bacteria; they thus are relatively specific for bacteria
containing appropriate conjugative plasmids. Attachment
occurs via the N-terminal domain of a specific protein, pIII,
at one end of the phage. It leads to retraction of the pilus,
probably by depolymerization into the membrane, bringing the
phage tip into contact with the bacterial cell-envelope
proteins TolQ, TolR, and TolA, which are required to
translocate the DNA into the cytoplasm. As long as these
three proteins are present, filamentous phages can infect at
a very low efficiency even in the absence of a fertility
plasmid.
The Ff phages are approximately 7 nm in diameter and 690 nm
long, with a mass of 16.3 MDa, 13% of it DNA. The sheath
contains 2700 molecules of a 50-amino acid protein, pVIII,
which is highly α-helical. It has a basic C-terminal domain
facing the DNA phosphate backbone, a hydrophobic central
region, and an acidic N-terminal domain exposed to the
outside; the molecules are arranged in an overlapping,
shingle-like fashion. One tip contains five copies each of
the pIII recognition protein (406 AA) and a 111-AA protein,
pVI, forming knobs. The other end has about five copies each
of pVII (33 AA) and pIX (32 AA). A 78-nucleotide hairpin loop
of the DNA located at this latter end serves as the packaging
signal. The capsid proteins reside in the inner membrane
until assembly. pVIII is synthesized with a 23-AA signal
sequence that is cleaved after it is inserted into the
membrane, leaving the N-terminal domain in the periplasmic
space. An 18-residue signal sequence helps pIII get through
the membrane into the periplasmic space, where it remains
anchored to the inner membrane by a 23-AA C-terminal
hydrophobic sequence. The other three capsid proteins simply
contain membrane-spanning hydrophobic regions.

Six additional proteins are encoded in the Ff phage genomes,
along with a regulatory intergenic region. Three of the
proteins are required for replication: pII (409 AA), a
site-specific endonuclease required for both phage and
replicative-form (RF) DNA synthesis; pX (111 AA), synthesized
from an internal start in gene II and needed for synthesis of
the ss viral DNA; and pV (87 AA), an ssDNA-binding protein
(SSB). A specific hairpin site in the intergenic region of
the entering viral DNA is recognized by the host SSB and RNA
polymerase (RNAP); RNAP synthesizes a primer used by the host
DNA polymerase III to initiate synthesis of the
double-stranded replicative form. DNA pol I and ligase are
needed to close the complementary strand. The DNA is then
supercoiled by gyrase to become replicative form I (RFI), the
template for transcription. Further replication is rather
complex. The phage pII has to nick a specific site, the
viral-strand origin, and the 3'-OH end thus formed acts as
the primer for rolling-circle replication carried out by
pol III together with the host SSB and Rep helicase. After
one round of replication, pII cleaves and circularizes the
single-stranded viral-strand 'tail,' which can initiate
formation of a new RFI, while the RFI containing the new
viral DNA strand is resealed and supercoiled to act again as
a substrate for pII, as well as for transcription.

Once sufficient pV accumulates in the cell, it binds
cooperatively to some of the ssDNA molecules; pX somehow
helps regulate the balance between progeny DNA and RF
formation. Three additional phage-encoded proteins aid in
viral assembly. Outer-membrane protein pIV (405 AA), a
'secretin,' is synthesized with a 21-AA signal sequence.
Gene I encodes a 348-AA inner-membrane protein, pI, with its
N-terminal 253 residues in the cytoplasm. An internal
translational start produces pI* (108 AA), which still
contains the membrane and periplasmic domains of pI. As shown
genetically, the cytoplasmic part of pI interacts with
thioredoxin and the packaging signal during assembly, while
the outer portion appears to interact with pIV in the outer
membrane to form the extrusion passage. pV helps form the DNA
into a linear antiparallel structure that facilitates
assembly; about 1500 molecules of pV are required per phage
for this process. Assembly of the virus particle can then be
initiated by an interaction of the packaging signal with the
cytoplasmic domain of pI and with the membrane-associated
pVII and pIX. During elongation and extrusion, pVIII from the
membrane displaces pV, with some sort of assistance from
reduced host thioredoxin. When the end of the DNA is reached,
pVI and pIII are added and the particle is released from the
cell; the C-terminal region of pIII is particularly important
for particle stability.

Two classes of filamentous phages have been described; within
each class, the DNA is largely homologous, but phage IKe,
prototypic of the second class, is only 55% homologous to the
Ff phages. Pseudomonas phages Pf1 and Pf3 (possibly members
of the second class) have been shown to package their DNA and
protein very differently from the Ff viral particles.

In 1996, Matthew Waldor and John Mekalanos showed that the
structural genes for CT are actually encoded by a filamentous
bacteriophage (designated CTXΦ) related to coliphages M13 and
f1. The CTXΦ genome either replicates as a plasmid or
integrates into the chromosome. CTXΦ uses as its receptor the
toxin-coregulated pili (TCP) that are required for intestinal
colonization, and it infects V. cholerae cells within the
gastrointestinal tracts of mice more efficiently than under
laboratory conditions. Thus, the emergence of toxigenic
V. cholerae involves horizontal gene transfer that may depend
on in vivo gene expression. Although the genome of CTXΦ
closely resembles that of coliphage f1, CTXΦ lacks a homolog
of f1 gene IV; instead of encoding its own outer-membrane
'secretin,' it uses EpsD, the putative outer-membrane pore of
the host type II secretion system, which is also used for
secreting CT as well as protease and chitinase.

Because the length of the phage simply depends on the amount
of DNA being packaged, the filamentous phages have become
popular as cloning vectors. Up to 6 kb of DNA can be inserted
into appropriate intergenic regions without affecting
packaging efficiency, and significantly longer inserts are
possible. Either ss DNA or ds DNA can be readily obtained for
various purposes. A variation on this theme has been the
construction of 'phagemids,' vectors incorporating the
intergenic packaging and replication signal of a filamentous
phage in addition to a plasmid origin of replication. They
thus replicate as plasmids until the cell is infected by a
helper filamentous phage, which activates the phage origin of
replication and provides the proteins to package the plasmid
containing the clone into a transducing phage.
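Since particle length tracks the amount of DNA packaged, the size of a recombinant particle can be estimated by simple proportion. The sketch below is only a rough illustration: it assumes a wild-type Ff genome of 6407 nucleotides (the M13 figure, not given in this entry) and strictly linear scaling of particle length and pVIII copy number with genome length, using the 690 nm and 2700 copies quoted above. The helper name `scaled_particle` is hypothetical.

```python
# Rough size estimate for a filamentous phage particle carrying an insert.
# Assumption: length and pVIII copy number scale linearly with genome length.

FF_GENOME_NT = 6407     # assumed wild-type M13 genome length (nucleotides)
FF_LENGTH_NM = 690.0    # particle length quoted in the text (nm)
FF_PVIII_COPIES = 2700  # pVIII copies per particle quoted in the text


def scaled_particle(insert_nt: int) -> tuple[float, int]:
    """Return (estimated length in nm, estimated pVIII copies) for a genome
    carrying an extra insert of `insert_nt` nucleotides (hypothetical helper)."""
    if insert_nt < 0:
        raise ValueError("insert length must be non-negative")
    factor = (FF_GENOME_NT + insert_nt) / FF_GENOME_NT
    return FF_LENGTH_NM * factor, round(FF_PVIII_COPIES * factor)


if __name__ == "__main__":
    for insert in (0, 1000, 6000):
        length_nm, copies = scaled_particle(insert)
        print(f"{insert:>5} nt insert: ~{length_nm:.0f} nm, ~{copies} pVIII")
```

Under these assumptions a 6 kb insert roughly doubles the particle, to about 1.3 µm and some 5200 pVIII molecules.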

Filamentous phages have also been used exten- 
sively as vehicles for ‘phage display’ by cloning se- 
quences encoding small peptides into the N-terminal 
region of pIII. Libraries made in this fashion can read- 
ily be screened for a large variety of binding activities. 
Larger proteins can be incorporated into specific places in
either pIII or pVIII as long as the fusion protein is
expressed from a plasmid in the infected cell and replaces
only a small fraction of the given protein molecules in the
final phage.
Filial generation is the term for a particular generation in
a sequence of brother-sister matings that can be carried out
to form an inbred strain. The first filial generation,
symbolized as F1, refers to the offspring of a cross between
animals having nonidentical genomes. When F1 siblings are
crossed to each other, their offspring are considered to be
members of the second filial generation, or F2, with
subsequent generations of brother-sister matings numbered
with integer increments.
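A series of brother-sister matings is used because it drives a strain steadily toward homozygosity. The sketch below illustrates how quickly this happens, assuming the standard recurrence for the inbreeding coefficient under continued full-sib mating, F_t = (1 + 2F_(t-1) + F_(t-2))/4 (a result not stated in this entry), and assuming that neither the parental animals nor the F1 are inbred. The function name is hypothetical.

```python
from fractions import Fraction


def sib_mating_inbreeding(last_generation: int) -> dict[str, Fraction]:
    """Inbreeding coefficient of filial generations F1..F<last_generation>
    under continued brother-sister mating (hypothetical helper).

    Assumes the parental (P) animals and the F1 are not inbred (F = 0) and
    applies the full-sib recurrence F_t = (1 + 2*F_{t-1} + F_{t-2}) / 4.
    """
    if last_generation < 1:
        raise ValueError("filial generations are numbered from F1")
    prev2, prev1 = Fraction(0), Fraction(0)  # coefficients of P and F1
    result = {"F1": prev1}
    for t in range(2, last_generation + 1):
        current = (1 + 2 * prev1 + prev2) / 4
        result[f"F{t}"] = current
        prev2, prev1 = prev1, current
    return result


if __name__ == "__main__":
    for generation, value in sib_mating_inbreeding(8).items():
        print(generation, value, f"{float(value):.3f}")
```

The first values are 0, 1/4, 3/8, 1/2, and 19/32 for F1 through F5, after which the coefficient continues to climb toward 1.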


See also: F1 Hybrid; Inbred Strain


Filter Hybridization 
Filter hybridization is a technique for in situ solid-phase
hybridization whereby denatured DNA is immobilized on a
nitrocellulose filter and incubated with a solution of
radioactively labeled RNA or DNA.


See also: In situ Hybridization 


Fingerprinting 
JH Miller 
The chromatographic pattern of spots produced by
proteolytic digestion of a protein followed by electro- 
phoresis. 


See also: Proteins and Protein Structure 


First and Second Division 
Segregation 
JRS Fincham
During the first division of meiosis in a diploid cell the
chromosomes are each divided into chromatids, but sister
chromatids remain attached together at the centromere. At
first anaphase, the centromeres do not split as they do in
anaphase of mitosis; instead, the centromeres of homologous
chromosomes separate (segregate) from each other toward the
two poles of the division spindle as wholes, each taking two
chromatids with it. Centromeres always segregate at the first
division of meiosis and do not split to allow their two
halves to separate into different meiotic products until the
second division.
Figure 1 First division (A) or second division (B) segregation
of alleles A and a, depending on whether or not a crossover
occurs between the A/a locus and the centromere (vertical bar
in the left panel), the point at which sister chromatids
remain connected until the second division of meiosis.


When two homologous chromosomes are distinguished by a
genetic marker, say with allele A on one chromosome and
allele a on the other, the A/a difference will segregate at
the first division provided that the alleles remain attached
to their original centromeres. However, when a single
crossover, which always involves just one chromatid of each
chromosome, occurs between the A/a locus and the centromere,
different alleles become joined to the same centromere, and
the separation at first anaphase will not be between A-A and
a-a but rather between A-a and A-a (Figure 1). Segregation of
A from a will then be delayed to the second division.

The effect of two crossovers in the locus-centromere interval
depends on whether the same chromatids are involved in the
second crossover as in the first. If the same two chromatids
cross over twice (a two-strand double), or if the second
crossover involves the two chromatids not involved in the
first crossover (a four-strand double), the effect in either
case is to restore first division segregation. If one
chromatid crosses over twice, two cross over once each, and
the other not at all (a three-strand double), the effect is
second division segregation. Because, with random choice of
chromatids, two-, three-, and four-strand doubles occur in
the ratio 1:2:1, double crossovers give, on average, 50%
second division segregation. An indefinitely large number of
crossovers will give, on average, two-thirds second division
segregation, a result most easily understood by imagining the
four alleles as totally uncoupled from their centromeres and
distributed two-and-two to the first division spindle poles
at random. An A allele will then be twice as likely to be
accompanied by an a allele as by the other A allele.
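The averages quoted above (all meioses for one crossover, half for two, two-thirds in the limit) can be checked by brute force. The sketch assumes no chromatid interference, meaning that each crossover in the locus-centromere interval involves one chromatid from each homolog, chosen independently and at random. It also assumes that every crossover lies proximal to the locus, so that a crossover simply swaps the alleles carried by the two chromatids involved. It enumerates every equally likely choice of chromatids and compares the result with the closed form (2/3)(1 - (-1/2)^n). That formula follows from the recurrence y_(n+1) = 1 - y_n/2: any crossover turns first division into second division segregation, and a further crossover restores first division segregation half the time. Function names are illustrative.

```python
from fractions import Fraction
from itertools import product

# Chromatids 0 and 1 share the centromere of the homolog that carried A;
# chromatids 2 and 3 share the centromere of the homolog that carried a.
NONSISTER_PAIRS = [(i, j) for i in (0, 1) for j in (2, 3)]


def second_division_fraction(n_crossovers: int) -> Fraction:
    """Exact fraction of meioses showing second division segregation of A/a
    when n crossovers fall between the locus and the centromere, assuming
    each crossover picks one of the four nonsister pairs with equal chance."""
    second_division = 0
    total = 0
    for events in product(NONSISTER_PAIRS, repeat=n_crossovers):
        alleles = ["A", "A", "a", "a"]
        for i, j in events:
            # A crossover proximal to the locus exchanges the locus alleles.
            alleles[i], alleles[j] = alleles[j], alleles[i]
        total += 1
        # Sister chromatids 0 and 1 now differ -> A and a part at division II.
        if alleles[0] != alleles[1]:
            second_division += 1
    return Fraction(second_division, total)


def closed_form(n_crossovers: int) -> Fraction:
    """(2/3) * (1 - (-1/2)**n), the solution of y_{n+1} = 1 - y_n/2, y_0 = 0."""
    return Fraction(2, 3) * (1 - Fraction(-1, 2) ** n_crossovers)


if __name__ == "__main__":
    for n in range(6):
        print(n, second_division_fraction(n), closed_form(n))
```

The two columns agree (0, 1, 1/2, 3/4, 5/8, 11/16 for n = 0 to 5) and oscillate toward 2/3.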

First and second division segregation can be dis- 
tinguished by tetrad analysis (see Tetrad Analysis). 
When the marker is a gross chromosomal feature 
such as a large terminal deletion, first and second 
division segregation can also sometimes be seen 
under the microscope (Figure 2). 

The second division frequency of a genetic marker is a
measure of the frequency of its crossing-over with the
centromere, and hence of the map length of the
marker-centromere interval. To make second division
segregation percentages equivalent to recombination
percentages, on which map units (centimorgans, cM) are
conventionally based, they need to be divided by two. This is
because a single crossover in a marker-centromere interval
will always give second division segregation, whereas a
single crossover in a marker-marker interval will recombine
only two out of the four chromatids. In fact, second division
segregation and recombination frequencies relate linearly to
true map distance (total average number of crossovers per
chromosome pair × 50) only when there is never more than one
crossover in the interval concerned. Both measures approach a
maximum value as the number of crossovers in the interval
becomes large, and this maximum value is different in the two
cases: 50% recombination and 67% second division segregation,
which, without correction, would convert to 50 and 33.3 cM,
respectively. Thus, as distance increases, both measures
increasingly underestimate true map distance, but second
division segregation does so to a greater extent.

Figure 2 First and second division segregation made visible
in a lily heterozygous for a chromosome length difference.
Panels show the interpretation and actual appearance of
bivalents at first metaphase, with anaphase I and anaphase II,
for two cases: no chiasma between the centromere and the
deletion in one homolog (the length difference segregates at
the 1st division) and a chiasma formed between centromere and
deletion (the length difference segregates at the 2nd
division). (Reproduced with permission from Fincham JRS
(1983) Genetics after Brown and Zohary (1955) Genetics 40:
850.)
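The comparison between the two measures can be made quantitative under one commonly used idealization, which is assumed here and not stated in this entry: crossovers in the interval are Poisson-distributed with mean λ per chromosome pair (so the true map distance is 50λ cM), and there is neither chiasma nor chromatid interference. Averaging the formula (2/3)(1 - (-1/2)^n) over the Poisson distribution gives a second division frequency of (2/3)(1 - e^(-3λ/2)). Recombination follows Haldane's mapping function, (1/2)(1 - e^(-λ)). The sketch converts both to apparent cM, halving second division segregation as described in the text, and prints them beside the true distance.

```python
import math


def apparent_distances(true_cm: float) -> tuple[float, float]:
    """Return (cM from recombination, cM from halved second division
    segregation) for a true map distance `true_cm`, assuming Poisson
    crossovers and no interference (illustrative model)."""
    lam = true_cm / 50.0  # mean crossovers per chromosome pair in the interval
    recombination = 0.5 * (1.0 - math.exp(-lam))
    second_division = (2.0 / 3.0) * (1.0 - math.exp(-1.5 * lam))
    return 100.0 * recombination, 100.0 * second_division / 2.0


if __name__ == "__main__":
    print("true cM   from recombination   from 2nd division / 2")
    for d in (1, 10, 25, 50, 100, 200):
        rec_cm, sds_cm = apparent_distances(d)
        print(f"{d:7.0f}   {rec_cm:20.1f}   {sds_cm:22.1f}")
```

In this model both apparent distances match the true one for short intervals. As the interval lengthens, they level off at 50 and 33.3 cM, and the second division estimate always lies below the recombination estimate.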


See also: Centimorgan (cM); Centromere; 
Crossing-Over; Map Distance, Unit; Meiosis; 
Tetrad Analysis 


FISH (Fluorescent in situ 
Hybridization) 
J Read and S Brenner 
Fluorescent in situ hybridization (FISH) is a technique used
to identify the chromosomal location of a
particular DNA sequence. A DNA probe is fluores- 
cently labeled and hybridized to denatured metaphase 
chromosomes spread out on glass slides. 


See also: Physical Mapping 


Fisher, R.A. 
AWF Edwards
Sir Ronald Fisher (1890-1962), the father of modern
statistics, was for most of his life a professor of genet- 
ics, first in London and then at Cambridge. He made 
lasting contributions to mathematical and evolution- 
ary genetics as well as to statistical theory applied to 
genetics, and experimented widely, studying espe- 
cially linkage in the mouse and in polysomic plants 
and natural selection in the wild. 

Fisher was born in London on 17 February 1890, 
the son of a fine-art auctioneer. His twin brother was 
stillborn. At Harrow School he distinguished himself 
in mathematics despite being handicapped by poor 
eyesight, which prevented him from working by artificial
light. His teachers used to instruct him by ear, and
Fisher developed a remarkable capacity for pursuing 
complex mathematical arguments in his head. This 
manifested itself later in life in an ability to reach a 
conclusion whilst forgetting the argument, to handle 
complex geometrical trains of thought, and to develop 
and report essentially mathematical arguments in 
English (only for students to have to reconstruct the 
mathematics later). Fisher’s interest in natural history 
was reflected in the books chosen for special school 
prizes at Harrow, culminating in his last year in the 
choice of the complete works of Charles Darwin in 13 
volumes. 

Fisher entered Gonville and Caius College, 
Cambridge, as a scholar in 1909, graduating BA in 
mathematics in 1912. At college he instigated the 
formation of a Cambridge University Eugenics 
Society through which he met Major Leonard Darwin, 
Charles’s fourth son and president of the Eugenics 
Education Society of London, who was to become 
his mentor and friend. Prevented from entering war 
service in 1914 by his poor eyesight, Fisher taught in 
schools for the duration of the war and in 1919 was 
appointed Statistician to Rothamsted Experimental 
Station, an agricultural station at Harpenden north of 
London. In 1933 he was elected to succeed Karl Pearson 
as Galton Professor of Eugenics (i.e., of Human Genet- 
ics, as it later became) at University College, London, 
and in 1943 he was elected Arthur Balfour Professor 
of Genetics at Cambridge and a Fellow of Gonville 
and Caius College. He retired in 1957 and spent his 
last few years in Adelaide, Australia, where he died 
of a postoperative embolism on 29 July 1962. His 
ashes lie under a plaque in the nave of Adelaide Cathedral.


Fisher married Ruth Eileen Guinness in 1917 and 
they had two sons and six daughters, and a baby girl 
who died young. He was elected a Fellow of the Royal 
Society in 1929 and was knighted in 1952 for services 
to science. He was the founding President of the Bio- 
metric Society, and served as President of the Royal 
Statistical Society, the International Statistical Insti- 
tute, and the Genetical Society. He received many 
honorary degrees and memberships of academies, 
and the Royal, Darwin, and Copley Medals of the 
Royal Society. 

Fisher made profound contributions to applied and 
theoretical statistics, to genetics, and to evolutionary 
theory. This account concentrates on genetics and 
evolution. Attracted to natural history at school, in 
his first term as an undergraduate at Cambridge Fisher 
bought Bateson’s book Mendel’s Principles of Hered- 
ity, with its translation of Mendel’s paper. Before 
graduating he had already remarked on the surprisingly good
fit of Mendel's data, and by 1916, encouraged by Leonard
Darwin, he had completed the founding paper of biometrical
genetics and the analysis of variance, The Correlation
between Relatives on the Supposition of Mendelian
Inheritance, eventually published in 1918.

From his post of statistician at Rothamsted Fisher 
made advances which revolutionized statistics, but his 
advances in genetics and evolution were hardly less 
revolutionary. In a single publication in 1922 he 
proved that heterozygotic advantage in a diallelic sys- 
tem gives rise to a stable gene-frequency equilibrium, 
introduced the first stochastic model into genetics 
(a branching process), and initiated the study of 
gene-frequency distributions by means of the diffu- 
sion approximation, and in another paper he applied 
the method of maximum likelihood to the estimation 
of linkage for the first time. Other papers dealt with 
variability in nature, the evolution of dominance, and 
mimicry, and in 1926 he started his long association 
with E.B. Ford with whom he later measured the 
effect of natural selection in wild populations. 

In 1930 Fisher’s The Genetical Theory of Natural 
Selection was published, containing a wealth of new 
evolutionary arguments, from the fundamental the- 
orem of natural selection to ideas about sexual selec- 
tion, inclusive fitness, and parental expenditure. More 
than any other work The Genetical Theory established 
a firm basis for the modern view that evolution 
by natural selection is primarily a within-species 
phenomenon. 

After Fisher took up his appointment at University College in
1933, his pace did not slacken. Experimental organisms
included mice, poultry, the purple loosestrife, and even
dogs, under the auspices of the Genetical Society. But it is
in human genetics that he made the most lasting contribution.
In 1935 he
secured funds from the Rockefeller Foundation to 
establish a Blood-Group Serum Unit at his Galton 
Laboratory with the express purpose of initiating the 
construction of a linkage map for man, for he had 
already seen the connection between “Linkage studies 
and the prognosis of hereditary ailments” (to use the 
title of his lecture to the International Congress on 
Life Assurance Medicine in that year). Here is the 
intellectual origin of the Human Genome Project. 
At the same time Fisher, with J.B.S. Haldane and 
L.S. Penrose, was advancing the special statistical 
theory required in the estimation of human linkage. 

In 1943 Fisher moved to Cambridge, where he was 
reunited with his colleagues from the Blood-Group 
Unit who had been evacuated there during the war. An 
immediate consequence was his brilliant solution of 
the Rhesus blood-group puzzle, involving three close- 
ly linked loci which between them explained the array 
of serological reactions which to everyone else had 
appeared chaotic: Fisher did for Rhesus what Mendel 
did for round and wrinkled. 

After World War II ended in 1945 Fisher attempted 
to establish bacterial genetics in his Cambridge 
department and to retain for Cambridge the Blood- 
Group Unit, but without success. Work in his small 
department revolved around linkage in the mouse, and 
studies on purple loosestrife, wood sorrel, and prim- 
roses, always with a strong background of mathemat- 
ical and statistical developments. His Theory of 
Inbreeding was published in 1949, and in 1950 he 
published the first paper applying a computer to a 
biological problem. Fisher retired from Cambridge 
in 1957. 

Fisher was one of the great intellects of the twen- 
tieth century. In statistics he keeps company with 
Gauss and Laplace whilst in biology he has been com- 
pared with Charles Darwin as “the greatest of his 
successors.” In the intersection of the two fields of 
statistics and biology he was the outstanding pioneer, 
and as the first person to recognize both the desirabil- 
ity and the practicability of constructing the human 
genome map he initiated one of the major scientific 
achievements of the century. 
Definition


Fitness is a concept that is often considered to be 
central to population genetics, demography, and the 
synthetic theory of evolution. In population genetics, 
it is technically a relative or absolute measure of re- 
productive efficiency or reproductive success. The 
absolute or Darwinian fitness of a certain genetic
constitution living in a defined homogeneous environment
would be equated with the mean (or expected) number of
zygotes, sufficiently similar to itself, that it produces
during its entire lifetime, whereas relative fitness is a
measure of the reproductive efficiency of a certain genotype,
as defined above, compared with that of another genotype from
the same population.
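Because only ratios of fitness values enter the equations of allele-frequency change, absolute fitnesses can be replaced by relative ones without altering the predictions. The sketch below assumes the textbook one-locus, two-allele model with random mating and selection acting on diploid genotypes. That model is not spelled out in this entry, and the genotype labels and fitness numbers are hypothetical. Dividing by the largest value is only one common convention for relative fitness.

```python
from fractions import Fraction


def relative_fitness(absolute: dict[str, Fraction]) -> dict[str, Fraction]:
    """Divide each genotype's absolute fitness by the largest one, so the
    fittest genotype has relative fitness 1 (one common convention)."""
    best = max(absolute.values())
    return {genotype: w / best for genotype, w in absolute.items()}


def next_allele_frequency(p: Fraction, w: dict[str, Fraction]) -> Fraction:
    """Frequency of allele A after one generation of selection on genotypes
    AA, Aa, aa that start in Hardy-Weinberg proportions."""
    q = 1 - p
    mean_w = p * p * w["AA"] + 2 * p * q * w["Aa"] + q * q * w["aa"]
    return p * (p * w["AA"] + q * w["Aa"]) / mean_w


# Hypothetical absolute fitnesses (expected zygotes per individual).
absolute = {"AA": Fraction(3), "Aa": Fraction(3), "aa": Fraction(2)}
relative = relative_fitness(absolute)  # AA and Aa -> 1, aa -> 2/3

p0 = Fraction(1, 2)
print(next_allele_frequency(p0, absolute), next_allele_frequency(p0, relative))
```

Both calls give 6/11: rescaling every fitness by the same factor cancels in the ratio, which is why relative values suffice for short-term predictions.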

The term ‘sufficiently similar’ needs explanation: it 
does not mean that the offspring of a certain genotype 
would necessarily share the same genotype with their 
parent (actually, in many instances, Mendelian segre- 
gation would prohibit this); rather, it indicates that 
genetic effects seriously affecting fitness with one gen- 
eration delay should be taken into consideration and 
should affect the value of fitness of the parent geno- 
type exclusively responsible for these delayed effects. 
Thus, the grandchildless mutants in Drosophila sub- 
obscura and D. melanogaster have such an effect: 
female homozygotes for the mutant allele produce 
sterile offspring (regardless of the genotype of the 
male parent or that of the offspring). The reason for 
this is that in their fertilized eggs the posterior pole cells
are not formed. The mean number of offspring
may not be sufficient to define the fitness of a geno- 
type: the distribution of the number of its offspring 
may also be of importance. Thus, Gillespie has shown 
that genotypes having the same mean number of progeny, but
differing in variances, have a different evolutionary fate.
Everything else being equal, an increase in variance of the
number of progeny (from generation to generation, or
spatially, or developmentally) is, in the long run,
disadvantageous. To arrive at an estimate of fitness, we
should subtract from the actual mean expected number of
progeny a quantity equal to σ²/2 for the case of temporal
variation and σ²/N for the case of developmental variation,
where σ² is the variance in offspring number and N is the
effective population size. However, with the exception
explained above, concerning the one-generation delayed
effects of a certain genotype affecting reproduction, we will
restrict the fitness definition to only one generation, thus
avoiding the temporal variation. (The long-term evolutionary
fate addressed by Thoday and Cooper will not be considered
here.) Furthermore, we will restrict
the definition of fitness to a certain homogeneous 
selective environment, thus avoiding complications 
such as those described by Brandon (1990), where a 
genotype having two different fitnesses in two environ- 
ments, both lower than the respective two fitnesses of 
a different genotype, may end up having a higher mean 
fitness due to an unequal distribution of individuals of 
the two respective genotypes in these two environments. The reason for these restrictions is that fitness values serve to put some flesh onto the models describing allelic frequency changes from generation to generation, thus allowing short-term genetic predictions. Fitness is a useful device for quantifying the kinetics of a genetic change; it is otherwise devoid of any independent meaning and cannot serve as a substitute for the nebulous concept of adaptation. Of course, in some models one may consider complex fitness functions, e.g., the weighted mean fitness in two environments.
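The two quantitative points in this paragraph can be checked with a few lines of Python. The sketch assumes fitness values close to 1, where the $\sigma^2/2$ correction for temporal variation approximates the geometric-mean growth rate, and uses made-up fitness values for a Brandon-type two-environment case; all function names are illustrative and not taken from the source.

```python
import math
import statistics

def geometric_mean(fitnesses):
    """Long-run growth factor per generation when fitness fluctuates between generations."""
    return math.exp(sum(math.log(w) for w in fitnesses) / len(fitnesses))

def temporal_penalty_fitness(fitnesses):
    """Approximation: arithmetic mean minus half the variance (assumes values near 1)."""
    return statistics.fmean(fitnesses) - statistics.pvariance(fitnesses) / 2.0

def developmental_penalty_fitness(mean, variance, n_effective):
    """Within-generation (developmental) variance: subtract variance / N."""
    return mean - variance / n_effective

def mean_over_environments(fitness_by_env, share_by_env):
    """Genotype fitness averaged over environments, weighted by the share of its carriers in each."""
    return sum(w * s for w, s in zip(fitness_by_env, share_by_env))

# Temporal variation: arithmetic mean 1.0, yet the fluctuating genotype grows more slowly.
fluctuating = [1.1, 0.9]
print(geometric_mean(fluctuating), temporal_penalty_fitness(fluctuating))

# Brandon-type case (hypothetical numbers): A is less fit than B in both environments...
fit_A, fit_B = (0.9, 0.5), (1.0, 0.6)
# ...but most A individuals live in environment 1 and most B individuals in environment 2.
print(mean_over_environments(fit_A, (0.9, 0.1)), mean_over_environments(fit_B, (0.1, 0.9)))
```

With fitnesses 1.1 and 0.9, both the geometric mean and the corrected value come out at about 0.995; in the two-environment case genotype A averages about 0.86 against 0.64 for B, although B is fitter in each environment taken separately.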

Medawar (in Krimbas, 1984) expresses this view in 
the following statement: 


The genetical usage of ‘fitness’ is an extreme attenuation of 
the ordinary usage: it is, in effect, a system of pricing the 
endowments of organisms in the currency of offspring; i.e. in 
terms of net reproductive performance. It is a genetic valu- 
ation of goods, not a statement about their nature or quality. 


Historical Overview 


The first use of ‘fitness’ with a loosely similar mean- 
ing is found in Darwin’s On the Origin of Species. 
From the first to the sixth edition, Darwin employed 
the verb ‘fit’ and the adjective ‘fitted’ as synonyms
for ‘adapt’ and ‘adapted,’ respectively. The noun first 
appears in 1859: 


Nor ought we to marvel if all the contrivances in nature be 
not, as far as we can judge, absolutely perfect; and if some of 
them be abhorrent to our idea of fitness. 

(Paul, 1992) 


Of course, Darwin inherited the concept of a fitness 
between the organism and its environment from nat- 
ural theology and from the concept of adaptation. In 
1864, Herbert Spencer used the expression “survival of the fittest” as a synonym for natural selection, an expression later adopted by Darwin. Thus, from the beginning, fit and fitness were seen to be semantically closely related to the process of natural selection and to ‘adaptation.’ Even today, Brandon (1990) equates fitness with adaptedness.

In 1798, Malthus compared the rates of increase of 
population size with the amount of food produced. 
According to Tort (1996), the ratio $\lambda$ of the number of individuals of one generation ($N_{t+1}$) to that of its parental generation ($N_t$) is the Darwinian fitness or the Malthusian fitness. No differences among individuals are considered in this formulation, which describes a geometric, or rather an exponential, increase of population size if $\lambda$ is constant. In 1838, Verhulst gave another formulation, taking into consideration the change in this ratio as the population approaches its carrying capacity, $K$. Thus Verhulst distinguishes $w$, the biological fitness (the Verhulstian fitness, i.e., the number of offspring produced by an individual at its sexual maturity), from population fitness, which also varies according to $K$ and to the present population size, $N_t$:


$$N_{t+1} = w N_t - \frac{(w-1)}{K} N_t^2$$

Thus, the relation between a nonconstant Malthusian fitness and a Verhulstian fitness is:

$$\lambda = \frac{N_{t+1}}{N_t} = w - \frac{(w-1)}{K} N_t$$


Let $b$ be the per-capita birth rate, so that during a small time interval $\Delta t$ a fraction $b\Delta t$ of the individuals in a population each give birth to one individual, and let $d$ be the per-capita death rate, so that a fraction $d\Delta t$ of the individuals die in the same interval. The net change in the number of individuals during this interval is then:

$$\Delta N = (b - d)\,N\,\Delta t$$

By substituting $m$ for $b - d$ (where $m$ is, according to Fisher, the Malthusian parameter) and integrating, we obtain the form of population increase:

$$N_t = N_0 e^{mt}$$


Lotka, as well as Fisher, used mortality and fertility tables for the different biological ages to estimate fitness from $m$. The Darwinian fitness is related to the Malthusian parameter as follows:

$$\lambda = e^{m}$$
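The Verhulst and Malthus relations can be checked by direct iteration. The sketch below assumes illustrative values ($w = 1.5$, $K = 1000$, $N_0 = 10$) that do not come from the text; the function names are hypothetical. It shows that $\lambda$ starts near $w$ when the population is small and falls to 1 as $N_t$ approaches $K$, and that continuous growth with $m = b - d$ multiplies the population by $e^m$ per unit time.

```python
import math

def verhulst_next(n_t, w, k):
    """Verhulst's recursion: N_{t+1} = w*N_t - (w - 1)*N_t**2 / K."""
    return w * n_t - (w - 1.0) * n_t ** 2 / k

def malthusian_ratio(n_t, w, k):
    """Nonconstant Malthusian fitness: lambda = N_{t+1}/N_t = w - (w - 1)*N_t/K."""
    return w - (w - 1.0) * n_t / k

def exponential_growth(n_0, b, d, t):
    """Continuous growth with Malthusian parameter m = b - d: N_t = N_0 * exp(m*t)."""
    return n_0 * math.exp((b - d) * t)

# Illustrative values (not from the text): Verhulstian fitness w = 1.5, carrying capacity K = 1000.
n = 10.0
for generation in range(6):
    print(generation, round(n, 1), round(malthusian_ratio(n, 1.5, 1000.0), 3))
    n = verhulst_next(n, 1.5, 1000.0)

# Per unit time, the Darwinian fitness of exponential growth is lambda = e**m (here m = 0.2).
print(exponential_growth(100.0, 0.3, 0.1, 1.0) / 100.0, math.exp(0.2))
```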


Furthermore, Fisher considered that Malthusian 
parameters, and thus fitnesses, are inherited, different 
genotypes having different fitnesses. In his view, the course of evolution is to maximize population fitness, that is, the (weighted) mean value of the individual fitnesses of a population.


The rate of increase in fitness in any organism at any time is 
equal to its [additive] genetic variance in fitness at that time. 


Since variances are never negative, the change is always in the direction of an increase in this quantity. Fisher considered this ‘fundamental theorem of natural selection’ a general law, equivalent to the second law of thermodynamics, which stipulates that a physical quantity, entropy, always increases.
The generality of Fisher’s law was questioned and, in 
some cases, it was shown not to hold true. Further- 
more, as Crow and Kimura remarked, 


One interpretation of the theorem is to say that it measures 
the rate of increase in fitness that would occur if the gene 
frequency changes took place, but nothing else changed. 


Thus an environmental deterioration that would affect 
fitness values, and thus decrease mean population fit- 
ness, is not considered by Fisher. 
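In the simplest setting, haploid types with constant fitnesses, where all genetic variance in fitness is additive, the theorem holds exactly in discrete time as $\Delta\bar{w} = \mathrm{Var}(w)/\bar{w}$. The sketch below checks this numerically; the haploid setting, the numbers and the function names are assumptions chosen for illustration, and, in line with Crow and Kimura's remark, nothing in it lets the environment change.

```python
def select_haploid(freqs, fitnesses):
    """One generation of selection on haploid types: p_i' = p_i * w_i / mean fitness."""
    w_bar = sum(p * w for p, w in zip(freqs, fitnesses))
    return [p * w / w_bar for p, w in zip(freqs, fitnesses)]

def mean_and_variance(freqs, fitnesses):
    """Mean fitness and variance in fitness among types, weighted by their frequencies."""
    w_bar = sum(p * w for p, w in zip(freqs, fitnesses))
    var = sum(p * (w - w_bar) ** 2 for p, w in zip(freqs, fitnesses))
    return w_bar, var

freqs = [0.2, 0.3, 0.5]          # illustrative type frequencies
fitnesses = [1.0, 1.1, 0.9]      # constant (unchanging) fitnesses
w_bar, var = mean_and_variance(freqs, fitnesses)
w_bar_next, _ = mean_and_variance(select_haploid(freqs, fitnesses), fitnesses)
print(w_bar_next - w_bar, var / w_bar)   # the two numbers agree up to rounding
```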

Wright treated population fitness as varying with the gene frequencies in the population. Excluding competition among individuals, Wright states that every genotype is characterized by a fitness value and that each individual belonging to that genotype has an expected number of progeny, which is the fitness of that genotype. The population fitness, W, is the expected mean number of progeny per individual of the parental generation. W is a composite function: the sum of the products of all genotype frequencies and their specific fitnesses (or adaptive values). Contrary to Fisher, Wright, in his shifting balance theory, envisages most species as consisting of many small and more or less isolated populations, each with its specific gene frequencies. Populations occupy the peaks of an adaptive surface formed by the values of W (population fitnesses) at every point corresponding to certain gene frequencies. These peaks are positions of stable local equilibria. Owing to drift, gene frequencies may change and, thus, populations may cross a valley of the adaptive surface and be attracted by another peak. The equilibrium points are local maxima of population fitness.
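For a single locus with two alleles, Wright's W can be written down directly. The sketch assumes Hardy–Weinberg genotype proportions and uses illustrative fitnesses with heterozygote advantage; function names are hypothetical. A grid search then finds the allele frequency at the top of this one-dimensional surface, which for these values is the familiar equilibrium $p = 2/3$.

```python
def population_fitness(p, w_AA, w_Aa, w_aa):
    """Wright's W: genotype frequencies (Hardy-Weinberg, assumed) times genotype fitnesses, summed."""
    q = 1.0 - p
    return p * p * w_AA + 2.0 * p * q * w_Aa + q * q * w_aa

def fittest_frequency(w_AA, w_Aa, w_aa, steps=1000):
    """Grid search for the allele frequency at which W is highest."""
    grid = [i / steps for i in range(steps + 1)]
    return max(grid, key=lambda p: population_fitness(p, w_AA, w_Aa, w_aa))

# Heterozygote advantage (illustrative fitnesses): W peaks at an intermediate frequency.
print(fittest_frequency(0.9, 1.0, 0.8))
print(population_fitness(0.5, 1.0, 1.0, 0.8))
```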


Components of Fitness: Inclusive Fitness


It is often stated that selection acts on survival and reproduction. This is not an exact phrasing: fitness is the mean number of progeny left; therefore viability components (survival, longevity) are important only insofar as they affect the net reproductive output. Selection is blind to longevity at a postreproductive age. This is why Huntington disease, an inherited pathological syndrome that appears after the reproductive years, seems not to be selected against.

According to Hartl and Clark (1989), the components of fitness, starting from the zygote stage, are as follows: first viability; then sexual selection, which favors or hinders a genotype in finding mates; and then fecundity, which in the general case may be specific to every combination of genotypes in the mating pair. Thus, fecundity depends on the genetic constitution of both partners. For the simple case of one gene with two alleles, nine different fecundity values are defined. Before the formation of the zygote, gametic selection (one aspect of which is meiotic drive) may take place and sometimes counteracts the direction of selection exercised in the diploid phase. Fitness is estimated by counting the zygotes produced by a zygote. One proposal to overcome the difficulty of counting zygotes in many animals was to start from another well-recognized stage of the life cycle and complete the cycle to the same stage in the progeny. This proposal is, however, mistaken, since the progeny of a genotype do not necessarily have the same genotype as their parent. As a result, such an estimate combines fitness components that belong to different genotypes.
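The nine fecundity values for one gene with two alleles are simply the ordered (female, male) genotype combinations. The sketch below enumerates them and computes the mean fecundity of a female genotype under random mating; the fecundity table and male frequencies are hypothetical numbers, and the function names are illustrative.

```python
from itertools import product

GENOTYPES = ("AA", "Aa", "aa")

def mating_pairs():
    """All ordered (female, male) genotype combinations for one gene with two alleles."""
    return list(product(GENOTYPES, repeat=2))

def expected_fecundity(female, male_freqs, fecundity):
    """Mean fecundity of one female genotype under random mating with the given male frequencies."""
    return sum(male_freqs[male] * fecundity[(female, male)] for male in GENOTYPES)

# Hypothetical fecundity table: every pair produces 2.0 offspring except aa x aa, which produces 1.0.
fecundity = {pair: 2.0 for pair in mating_pairs()}
fecundity[("aa", "aa")] = 1.0
male_freqs = {"AA": 0.25, "Aa": 0.5, "aa": 0.25}

print(len(mating_pairs()))                              # 9 mating combinations
print(expected_fecundity("aa", male_freqs, fecundity))  # 1.75
```

Because the table is indexed by both partners, the same female genotype can have different fecundities depending on which males are common, which is why fecundity cannot in general be assigned to one genotype alone.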

Developmental time is an important, but generally neglected, component of fitness in populations with overlapping generations during a phase of population growth (e.g., at the beginning of the colonization of a new, unoccupied territory; r-selection). Lewontin (1965) examined the case of insects that follow a triangular schedule of oviposition (a triangular egg-productivity function, plotted with age on one axis and the number of eggs produced on the other, is defined by three points: the age of first production, the age of peak production, and the age of last production). In his specific model, a shortening of developmental time can be equivalent, in its effect on the rate of population increase, to a doubling of total net fecundity. The equivalent shortening amounts to a 1.55-day advance of the entire egg-production program (what Lewontin calls a transposition of the triangle to an earlier age), or to a 2.20-day decrease
only of the age of sexual maturity (the age at which 
the first egg is produced), leaving the other ages 
unchanged as well as the total number of eggs 
deposited. It is also equivalent to a 5.55-day decrease 
of the age of the highest egg production only (the 
peak of the triangle), other things remaining unchanged 
or, finally, to a 21-day decrease of the age at which the 
last egg is deposited, other variables remaining the 
same. 
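Lewontin's comparison rests on the intrinsic rate of increase r defined by the Euler–Lotka equation. The sketch below assumes survival of 1 up to the last egg, a continuous-time triangular schedule, and hypothetical ages and egg numbers (not Lewontin's published parameters); function names are illustrative. It demonstrates one exact consequence of this kind of model: advancing the whole triangle by $d = \ln 2 / r$, where r is the rate obtained with doubled fecundity, gives the same r as doubling fecundity, because the shift multiplies the Euler–Lotka integral by $e^{rd} = 2$.

```python
import math

def triangle_eggs_per_day(x, first, peak, last, total_eggs):
    """Triangular oviposition schedule whose area equals total_eggs (requires first < peak < last)."""
    if x <= first or x >= last:
        return 0.0
    height = 2.0 * total_eggs / (last - first)
    if x <= peak:
        return height * (x - first) / (peak - first)
    return height * (last - x) / (last - peak)

def euler_lotka_residual(r, first, peak, last, total_eggs, dt=0.02):
    """Integral of exp(-r x) m(x) dx minus 1 (midpoint rule); survival assumed to be 1."""
    n = int(round((last - first) / dt))
    step = (last - first) / n
    total = 0.0
    for i in range(n):
        x = first + (i + 0.5) * step
        total += math.exp(-r * x) * triangle_eggs_per_day(x, first, peak, last, total_eggs) * step
    return total - 1.0

def intrinsic_rate(first, peak, last, total_eggs):
    """Solve the Euler-Lotka equation for r by bisection (assumes total_eggs > 1, so r > 0)."""
    lo, hi = 0.0, 1.0
    while euler_lotka_residual(hi, first, peak, last, total_eggs) > 0:
        hi *= 2.0
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if euler_lotka_residual(mid, first, peak, last, total_eggs) > 0:
            lo = mid          # residual decreases with r, so the root lies above mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Hypothetical schedule (ages in days), not Lewontin's published parameters.
first, peak, last, eggs = 12.0, 15.0, 55.0, 500.0
r_base = intrinsic_rate(first, peak, last, eggs)
r_double = intrinsic_rate(first, peak, last, 2 * eggs)
shift = math.log(2.0) / r_double          # advance of the whole triangle equivalent to doubling
r_shift = intrinsic_rate(first - shift, peak - shift, last - shift, eggs)
print(r_base, r_double, shift, r_shift)
```

The last two rates agree to many decimal places; the size of the equivalent shift depends entirely on the assumed schedule, which is why the day values quoted above belong to Lewontin's specific model.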


Hamilton’s concept of ‘inclusive fitness’ was for- 
mulated to provide a Darwinian explanation for al- 
truistic actions that may endanger the life of the 
individual performing such acts. An individual may multiply its genes in two different ways: directly, through its own progeny, and indirectly, by protecting the lives of other individuals whose genetic constitution is similar to its own. If the danger encountered is outweighed by the gain (all calculated in genes), then the performance of such acts may be fixed by natural selection. Estimates of inclusive fitness take into account not only the individual’s fitness but also that of its relatives (of similar genetic constitution): it is the sum total of two selective processes, individual selection and kin selection. In this case, the counting shifts from the number of individuals in the progeny to the number of genes preserved by altruistic acts, in addition to those transmitted directly through the individual’s own progeny.
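The "gain outweighs the danger, counted in genes" condition is usually written as Hamilton's rule, $rb > c$, where r is the relatedness of actor and recipient, b the extra offspring gained by the recipient and c the offspring lost by the actor. The rule is the standard textbook form rather than a formula stated in the text, and the numbers below are illustrative.

```python
def inclusive_fitness_gain(relatedness, benefit, cost):
    """Net gain counted in genes: r * b - c."""
    return relatedness * benefit - cost

def altruism_favoured(relatedness, benefit, cost):
    """Hamilton's rule: the altruistic act can spread when r * b > c."""
    return inclusive_fitness_gain(relatedness, benefit, cost) > 0

# Illustrative numbers: the act costs the actor 1 offspring and gives the recipient 3 extra.
print(altruism_favoured(0.5, 3.0, 1.0))    # full sibling: 0.5 * 3 = 1.5 > 1 -> True
print(altruism_favoured(0.125, 3.0, 1.0))  # first cousin: 0.375 < 1 -> False
```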


Adaptation, Adaptedness and the Propensity Interpretation of Fitness


Natural selection acts on phenotypes; certain traits of 
these phenotypes are the targets of selection. The 
individuals bearing some traits are said to be adapted. 
However, no common and general property may char- 
acterize adaptation. A search through the literature of 
all the important neo-Darwinists reveals that, in spite 
of the suggestion that adaptation has an autonomous 
meaning, it is used in fact as an alternative to selection. 
Van Valen seems to differ from all other authors in equating adaptation with the maximization of energy appropriation, both for multiplication and for increase in biomass, thus solving the problem posed by lianas and other clonal organisms. The concept of adaptation was shown to be completely dependent on that of selection (Krimbas, 1984). Brandon provided an argument demonstrating the impossibility of establishing a criterion or trait for adaptation that is independent of selection. He argued that we may be able to select in the
laboratory against any character except for one, fit- 
ness. There is no reason to exclude from natural selec- 
tion the selection experiments performed in the 
laboratory, since the laboratory is also part of nature. 
Thus there is no character or trait in the diploid organ- 
ism that could be taken in advance as an indication of 
adaptation independently of selection. Fitness is a 
variable substantiating and quantifying the selective 
process. 

While one would expect adaptation to disappear 
from the evolutionary vocabulary, it is still used 
for describing the selective process that changes or 
establishes a phenotypic trait as well as the trait 
itself. Sometimes an engineering approach is used: adaptation, it is argued, is in every case the optimal solution to an environmental problem. The difficulties with such an approach are twofold. First, we are often unable to define precisely the problem that the organism faces (it might be a composite problem) in order to determine the optimal solution in advance; as a result, we tend to adapt the ‘solution’ encountered to the nature of the problem the organism faces. Second, many products of selection are evidently not optimal solutions: evolutionary change resembles a process of tinkering more than the application of an engineering design.

Recently, several authors (Brandon, Mills and
Beatty, Burian, and Sober; see Brandon, 1990) have 
supported the propensity interpretation of fitness (or 
adaptedness). In so doing they try first to disentangle 
‘individual fitness’ (something we are not considering 
here; as mentioned earlier we have taken into consid- 
eration only the fitnesses of a certain category or 
group of individuals) from the fitness that is expected 
from its genetic constitution. Indeed, all kinds of acci- 
dents may drastically modify the number of progeny 
one individual leaves behind. A sudden death may reduce an individual’s contribution to the next generation to zero. But selection is a systematic process in the sense
that in similar situations similar outcomes are ex- 
pected. Thus, in order to pass from the individual or 
actual fitness to the expected one, these authors are 
obliged to consider two different interpretations of 
‘probability.’ The first interpretation considers prob- 
ability as the limit of a relative frequency of an event in 
an infinite series of trials, but since this series is never 
achieved, the observed frequency in a finite series of 
trials might be used instead. The second interpretation 
is that of propensity, where the very constitution, i.e., 
the physical properties, of the individual underlies the 
propensity for performing in a given way. This may be
a dispositional property, i.e., it might be displayed in a 
certain way in some situations and in another way in 
others. The propensity interpretation of fitness attri- 
butes to physical causes, linked to the very structure of 
the individual, the tendency to produce a specific 
number of offspring in a particular selective environment. This is another way of reifying fitness and, via fitness, relative adaptedness and, finally, adaptation. It
is reminiscent of the Aristotelian potentia et actu, 
where the propensity is ‘potentia’ and the actual mean 
number of offspring corresponds to the ‘actu.’ In some 
situations of viability selection this interpretation 
seems quite satisfactory (e.g., in mice resistant to war- 
farin). No one would deny that the selection process 
depends most of the time on the properties of a geno- 
type performing in a certain environment. But this 
may not be as general as one may think. There are situations in which the organism’s own contribution to fitness is not clear or does not seem preponderant. Thus, it is more difficult and much less satisfactory to attribute to a certain genetic constitution the mating advantage of males when they are rare and their mating disadvantage when they are frequent. This is a case of frequency-dependent selection.
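A toy model makes the point concrete: when a type's mating success falls as its own frequency rises, the "fitness" of that type is a property of the population composition, not a fixed property of its genetic constitution. The linear frequency dependence, the parameter s and the function names below are hypothetical choices for illustration, not a model of any particular species.

```python
def next_frequency(q, s):
    """One generation of a hypothetical rare-type mating advantage.

    q: frequency of type 1; s: strength of frequency dependence (0 < s < 2 keeps values positive).
    Each type's relative mating success declines linearly with its own frequency.
    """
    w1 = 1.0 + s * (0.5 - q)            # type 1 is favoured when q < 0.5
    w2 = 1.0 + s * (0.5 - (1.0 - q))    # the alternative type, frequency 1 - q
    return q * w1 / (q * w1 + (1.0 - q) * w2)

def iterate(q, s=0.2, generations=500):
    """Follow the frequency of type 1 for a number of generations."""
    for _ in range(generations):
        q = next_frequency(q, s)
    return q

print(iterate(0.1), iterate(0.9))  # both approach 0.5
```

The same type gains when rare (starting at 0.1) and loses when common (starting at 0.9), so its fitness changes sign with frequency and the population settles at a stable polymorphism.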
On the other hand, the definition of genotypic 
fitness might also suffer from some disadvantages. 
Let us consider the case of dextral and sinistral coiling in the shells of certain species of snails. The direction of coiling is genetically determined by one gene with two alleles. The allele d (for dextral coiling) is dominant to the allele for sinistral coiling, which is recessive. But the phenotype of an individual is determined exclusively by the genotype (not the phenotype) of its mother and not by its own genotype. Thus,
there is a delay of one generation in phenotypic 
expression. Selection operates on phenotypes (the 
interactors of D. Hull). In the case of selection for 
dextral or sinistral direction of coiling, the phenotypic 
fitnesses may be clearly understood and simple but 
useless, while the genotypic fitnesses would be a com- 
plicated function depending on the frequency of the 
alleles in the population and the mating system. 
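A minimal sketch of the maternal effect described above. The allele label "d" follows the text; "s" for the sinistral allele is a label chosen here for illustration. The population-level function assumes random mating, Hardy–Weinberg proportions among mothers and equal fecundity, so the sinistral share of the next generation equals the frequency of sinistral-genotype mothers in the present one.

```python
def coiling(mother_genotype):
    """Maternal effect: an offspring's coiling depends only on its mother's genotype.

    'd' = dextral allele (dominant, as in the text); 's' = sinistral allele (label assumed here).
    """
    return "dextral" if "d" in mother_genotype else "sinistral"

def sinistral_share_next_generation(q_s):
    """Share of sinistral offspring = frequency of (s, s) mothers = q_s**2 (Hardy-Weinberg assumed)."""
    return q_s ** 2

# A heterozygous (d, s) mother mated to an (s, s) male can produce (s, s) offspring.
mother = ("d", "s")
offspring = ("s", "s")
print(coiling(mother))      # dextral: the (s, s) offspring coil dextrally, like all this mother's young
print(coiling(offspring))   # sinistral: every offspring of that (s, s) individual coils sinistrally
print(sinistral_share_next_generation(0.5))
```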
Thus, it seems better to consider genotypic fitness 
as a useful device in performing some kinetic studies 
regarding changes in gene frequencies or attraction to 
an equilibrium point. It is useless to attribute other 
qualities or properties to this device. Modern evolutionary theory is basically historical in nature (although some processes may be repeated). A complete and satisfactory explanation of a specific case should comprise a historical narrative including information on the phenotypic trait that is the target of selection, the ecological, natural-history, or other reason driving the selective process (why this trait is being selected), the genetics of the trait, the resulting change in the genetic structure of the population under selection, and the corresponding change in the phenotypes. In
natural history, generality and the search for hidden 
and nonexisting entities and properties may only con- 
tribute to an increase in the metaphysical component of 
evolutionary theory inherited from natural theology. 


On Population Fitness 


It is much more difficult to define population fitness: 
population geneticists customarily calculate the mean adaptive value or the mean individual fitness in a population. But this exercise is quite futile when comparing
two different populations. A group of adapted organ- 
isms is not necessarily an adapted group of organisms. 
Demographers earlier equated size (or increase in size) 
with population fitness. However, as Lewontin once 
remarked, it is not certain that a greater or denser 
population is better adapted, since it may suffer from 
parasites and epidemics; on the other hand, a population depleted of individuals may suffer collapse and
extinction. I have argued (Krimbas, 1984) that accord- 
ing to the ‘Red Queen hypothesis’ of Van Valen, all 
populations (at least of the same species) seem to have, 
a priori, the same probability of extinction, and thus 
possess, a priori, the same long-term population fitness. 

In addition, it is not clear how we should define a group: a group is not an organism that survives and reproduces. Although individuals of the group interact in complex ways and thus give some impression of cohesion, the ‘individuality’ of groups most of the time seems to be quite a loose notion. Should we consider group extinction per unit of time to determine group fitness? What about group multiplication? To model group selection, one may resort to different population selective coefficients, or population adaptive coefficients (something related to population fitness).
In these cases, the search for the nature of population 
fitness becomes even more elusive. As a result, popu- 
lation fitness is a parameter useful exclusively for its 
expediency; no search for its hidden nature is justified. 

A fitness landscape, or adaptive surface, is a geometrical construct in which the fitness or adaptive value of a genotype is the ordinate and the two abscissae are gene (or sometimes genotype) frequencies. It is
usually pictured in three dimensions, but conceptually 
can involve a larger number. In some models it has an 
exact mathematical meaning, in others it is employed 
as a metaphor. 

The idea of an adaptive surface was introduced by 
Sewall Wright. He thought of a surface on which each 
point on the surface corresponded to a combination of 
allele frequencies on the abscissae. Figure 1 shows a simple two-locus example. Random mating proportions and linkage equilibrium are assumed. The two abscissae are the frequencies of the dominant A and B alleles. The relative fitnesses of genotypes aa bb, A– bb, aa B–, and A– B– are 1, 1 − s, 1 − s, and 1 + t, where s and t are both positive and A– (or B–) indicates that the second allele can be either A or a (or B or b). The ordinate represents the average fitness of a
population with particular allele frequencies. There 
are two peaks, one when the genotype AA BB is 
fixed, the other a lower peak for the genotype aa bb. 
Genotypes AA bb and aa BB are at the other two 
corners and are least fit. Ordinarily a population, 
located at a point on the surface, climbs the nearest 
peak, but not necessarily in a straight line. The com- 
plications of mutation, linkage, and epistasis may 
cause the path upward to be circuitous. And, as these 
complications are introduced, along with more loci, 
the mathematics becomes more difficult. 
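The two-locus example can be computed directly. Under the stated assumptions (random mating proportions, linkage equilibrium), the frequency of aa is $(1-p_A)^2$ and that of A– is $1-(1-p_A)^2$, and likewise for B. The values s = 0.1 and t = 0.05 and the function name are illustrative. The printed grid shows the higher peak at $p_A = p_B = 1$, the lower peak at $p_A = p_B = 0$, and the low corners in between.

```python
def mean_fitness(p_A, p_B, s, t):
    """Average fitness for the two-locus example (random mating and linkage equilibrium assumed).

    p_A, p_B: frequencies of the dominant alleles A and B.
    Genotype fitnesses: aa bb = 1, A- bb = 1 - s, aa B- = 1 - s, A- B- = 1 + t.
    """
    aa = (1.0 - p_A) ** 2          # frequency of aa
    bb = (1.0 - p_B) ** 2          # frequency of bb
    A_ = 1.0 - aa                  # frequency of A- (AA or Aa)
    B_ = 1.0 - bb
    return (aa * bb * 1.0
            + A_ * bb * (1.0 - s)
            + aa * B_ * (1.0 - s)
            + A_ * B_ * (1.0 + t))

s, t = 0.1, 0.05   # illustrative values
for p_A in (0.0, 0.25, 0.5, 0.75, 1.0):
    row = [round(mean_fitness(p_A, p_B, s, t), 3) for p_B in (0.0, 0.25, 0.5, 0.75, 1.0)]
    print(p_A, row)
```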

This is the situation envisioned by Sewall Wright. 
A population cannot change from the lower peak to 
the higher one, because it has to pass through a less 
fit region. It was this dilemma that led Wright to 
propose his shifting-balance theory, whereby a combination of random drift and differential migration makes it possible to cross the valley and reach a higher peak. Wright regarded the fitness surface more as a metaphor than as a mathematical model. As a result, his papers present different, often confusing, concepts. Sometimes the abscissae are allele frequencies, sometimes they are genotype frequencies, and sometimes they are phenotypes. How rugged the fitness surface is has been a matter of continual discussion since Wright first introduced his ideas in the 1930s. Wright thought of the multidimensional surface as quite rugged, with numerous peaks and valleys. Others, R. A. Fisher in particular, have suggested that the surface is more like an ocean with undulating wave patterns. Furthermore, as the number of dimensions increases, only a small fraction of the stationary points are maxima, so a population is much more likely to be on a ridge than on a peak. The debate was not settled while Wright was alive and still continues. Wright summarized his lifetime view of the subject in a paper entitled “Surfaces of selective value revisited,” published in 1988, shortly before his death (Wright, 1988).

Figure 1 Example of a fitness landscape, with two loci.

Although Wright, more than anyone else, was 
responsible for introducing random processes into 
population genetic theory, he never attempted to 
model the whole shifting-balance process stochastic- 
ally. Recently there has been considerable mathemat- 
ical work in this area, partly as a way of developing 
and testing Wright’s theory. The entire process has 
been treated stochastically, something that was miss- 
ing in Wright’s formulations. 

The landscape idea has been extended to concepts 
other than fitness, such as developmental morphology 
and protein structure. The ruggedness of the landscape 
determines whether orderly change is possible or 
whether alternatives, such as stasis or chaos, emerge. 
In evolution, the lower the peaks and the higher the 
valleys, the more likely it is that selection can carry a 
population, if not to the highest peak, at least to one 
that has a respectable fitness. Similar considerations 
apply to the study of morphological development in 
the presence of various constraints. The ruggedness of 
the landscape can be deduced from parameters, such 
as the number of factors involved, and especially the 
degree to which they are coupled, or in genetic terms, 
the degree of epistasis. 
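The text does not name a specific model; Kauffman's NK model is one standard way of tuning the number of factors (N) and their coupling (K), and it is swapped in here as an assumption. In the sketch, each locus's contribution depends on its own allele and those of its next K neighbours (a cyclic neighbourhood, also an assumption), with random contributions drawn from a seeded generator. Counting genotypes that are fitter than all one-step neighbours gives a rough measure of ruggedness: with K = 0 there is a single peak, and more coupling typically yields many local peaks.

```python
import itertools
import random

def nk_fitness_function(n, k, seed=0):
    """Random NK landscape: returns a function from a 0/1 genotype tuple to fitness."""
    rng = random.Random(seed)
    # One lookup table per locus, keyed by the alleles at that locus and its k neighbours.
    tables = [
        {key: rng.random() for key in itertools.product((0, 1), repeat=k + 1)}
        for _ in range(n)
    ]

    def fitness(genotype):
        total = 0.0
        for i in range(n):
            key = tuple(genotype[(i + j) % n] for j in range(k + 1))
            total += tables[i][key]
        return total / n

    return fitness

def count_local_peaks(n, k, seed=0):
    """Count genotypes fitter than every genotype differing from them at exactly one locus."""
    fitness = nk_fitness_function(n, k, seed)
    values = {g: fitness(g) for g in itertools.product((0, 1), repeat=n)}
    peaks = 0
    for g, w in values.items():
        one_step = [g[:i] + (1 - g[i],) + g[i + 1:] for i in range(n)]
        if all(w > values[h] for h in one_step):
            peaks += 1
    return peaks

for k in (0, 2, 4, 7):
    print("K =", k, "local peaks:", count_local_peaks(8, k))
```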

The process of biological nitrogen fixation is extremely energy-demanding, requiring, under ideal conditions, approximately 16 moles of ATP per mole of N₂ fixed. The ability to reduce atmospheric dinitrogen to ammonia is found among a wide variety of free-living, associative, and strictly symbiotic bacteria. Genetic analysis of nitrogen fixation was initiated in the free-living diazotroph Klebsiella pneumoniae and led to the identification of 20 nif (for nitrogen fixation) genes. The identification of these nif genes has substantially facilitated the study of nitrogen fixation in other prokaryotes such as Sinorhizobium meliloti (formerly Rhizobium meliloti) and Bradyrhizobium japonicum, allowing the identification of genes that are both structurally and functionally equivalent to K. pneumoniae nif genes, including nifHDK, nifA, nifB, nifE, nifN, nifS, nifW, and nifX. In addition, genes essential for nitrogen fixation were identified in these organisms for which no homologs are present in K. pneumoniae. These were named fix genes and are often clustered with nif genes or regulated coordinately. Table 1 summarizes the properties and functions of fix genes.


Regulation of Nitrogen Fixation 


Owing to the extreme oxygen sensitivity of the nitrogenase enzyme, a major trigger for nif and fix gene expression in all systems studied so far is low oxygen tension. For instance, in the legume nodule, the dissolved oxygen concentration is 10–30 nmol l⁻¹, creating a hypoxic environment. Accordingly, all nitrogen-fixing bacteria deploy a complex regulatory cascade preventing aerobic expression of nif and fix genes. In addition, the deprivation of fixed nitrogen also controls the process of nitrogen fixation in free-living fixers but not in symbiotic bacteria (except Azorhizobium caulinodans). Many, but not all, nif and fix genes, including the nitrogenase structural genes and accessory functions, are preceded by a characteristic type of promoter, the −24/−12 promoter, recognized by the alternative sigma factor σ54 (or RpoN). Activation of this promoter requires the presence of an activator protein, i.e., the nitrogen regulatory protein NtrC or the nitrogen fixation regulatory protein NifA. While NtrC regulates gene expression in response to the nitrogen status, NifA senses the oxygen concentration either directly, as in S. meliloti, or indirectly through the activity of the NifL regulatory protein, as observed in K. pneumoniae. In addition to the basic NifA-mediated regulatory mechanism, most symbiotic diazotrophs have evolved additional control mechanisms of nif and fix gene expression.

Table 1 Function or putative function of fix genes

Gene or operon | Homology and/or function | Renamed
fixABCX | Required for nitrogen fixation, function unknown; FixX shows similarity to ferredoxins |
fixD | Transcription activator of nif, fix, and additional genes | nifA
fixF | Codes for a polypeptide homologous to the nifK gene product | nifN
fixGHIS | Required for the formation of the high-affinity cbb3-type cytochrome oxidase; FixI shows similarity to the Cu transporter CopA |
fixLJ | Regulatory two-component system involved in oxygen regulation of fixK and nifA (S. meliloti) transcription |
fixK | Regulatory protein; belongs to the Crp/Fnr family of prokaryotic transcriptional activators |
fixNOQP | Microaerobically induced, membrane-bound, high-affinity cbb3 cytochrome oxidase | cytNOQP
fixR | Sequence similarity to NAD-dependent dehydrogenases; not essential for symbiotic nitrogen fixation in B. japonicum |
fixT | Negative regulator of FixL |
fixU | Function unknown |
fixW | Function unknown |
fixY | Deduced amino acid sequence of the sequenced part of fixY shows similarity to the regulatory NifA protein from K. pneumoniae |
fixZ | May contain an iron-sulfur cluster; the sequence of FixZ is very similar to K. pneumoniae NifB |


The FixL-FixJ Regulatory Cascade


The activation of nitrogen fixation genes in S. meliloti 
involves a regulatory cascade of which the fixLJ genes 
are the primary controllers (Figure 1). The FixL and FixJ proteins are members of the ubiquitous two-component family of regulatory systems in which
the sensor (in casu FixL), a histidine kinase, activates 
the response regulator (FixJ) by phosphorylation 
in response to a specific environmental signal. The 
S. meliloti FixL protein is a membrane-anchored 
hemoprotein that acts as an oxygen sensor. Oxygen 
binds to a heme group joined to a histidine residue that 
is located within a PAS structural motif. These motifs 
are found in a wide variety of protein modules that 
sense diverse stimuli such as the redox potential, light, 
and oxygen. Under hypoxic conditions, FixL autophosphorylates on a conserved histidine residue with the γ-phosphate from ATP. In the absence of bound oxygen, the kinase activity of FixL is turned on, and the phosphate group is subsequently transferred to an aspartate residue in the cognate receiver protein, FixJ. Upon phosphorylation, FixJ becomes a transcriptional activator of two regulatory genes, nifA and fixK. In addition, FixL also has phosphatase activity, effectively reducing the amount of FixJ-phosphate under aerobic conditions.

Besides nifA, FixJ-phosphate activates the expres- 
sion of the fixK gene. FixK is homologous to members 
of the Crp/Fnr family of prokaryotic transcriptional 
activator proteins. FixK activates the fixNOQP genes, which code for a high-affinity respiratory oxidase complex, as well as the fixGHIS and fixT genes. In other rhizobia, such as B. japonicum, Rhizobium leguminosarum biovar viciae, and Rhizobium etli, this function can be taken over by the FnrN protein. The latter
protein possesses a distinct cysteine signature believed 
to play a role in redox sensing as does a similar motif in 
Fnr. In contrast, the FixK protein does not show con- 
served cysteines and the activity of this protein is not 
subject to oxygen control. FixK and FnrN bind to 
conserved DNA motifs called anaeroboxes in the 
promoter of their target genes. 

Figure 1 Model of the FixL-FixJ regulatory cascade in Sinorhizobium meliloti. Low oxygen conditions stimulate the autophosphorylation activity of FixL and repress the phosphatase activity of FixL-P. In contrast, the phosphatase activity of the unphosphorylated FixL as well as phosphoryl transfer from FixL-P to FixJ protein are independent of the oxygen concentration. Activity of the transcriptional regulator NifA is repressed by oxygen (see text for details). Genes regulated by NifA and FixK are marked in grey and black, respectively. Proteins marked in a black oval are transcriptional activators. Not all known open reading frames (ORFs) associated with nif and fix genes are shown.

In S. meliloti, a repressor of nitrogen fixation gene expression was also identified. The fixT gene codes for a small protein that modulates the activity of the two-component system FixLJ. The target of FixT is the C-terminal domain of the FixL protein, and the interaction of both proteins leads to inhibition of FixL-phosphate synthesis and, consequently, a decrease in nifA and fixK transcription.
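The cascade described in this section can be summarized as a qualitative on/off sketch. It is a deliberate simplification: kinetics are ignored, FixT is treated as an external input rather than as a FixK-induced feedback, and the function and key names are hypothetical. It encodes only the logic stated in the text: low oxygen switches on the FixL kinase, FixJ-phosphate activates nifA and fixK, FixK activates fixNOQP, fixGHIS and fixT, NifA activity is itself repressed by oxygen, and FixT inhibits FixL-phosphate synthesis.

```python
def fix_cascade(low_oxygen, fixt_active=False):
    """Qualitative sketch of the S. meliloti FixL-FixJ cascade (True = on, False = off)."""
    # No bound O2 -> FixL kinase on; under air, FixL's phosphatase also clears FixJ-P,
    # so both effects collapse to "off" here. FixT inhibits FixL-phosphate synthesis.
    fixl_kinase_on = low_oxygen and not fixt_active
    fixj_phosphorylated = fixl_kinase_on          # phosphoryl transfer FixL-P -> FixJ
    return {
        "FixJ-P": fixj_phosphorylated,
        "nifA transcribed": fixj_phosphorylated,  # FixJ-P activates nifA ...
        "fixK transcribed": fixj_phosphorylated,  # ... and fixK
        "NifA active": fixj_phosphorylated and low_oxygen,   # NifA activity is repressed by O2
        "FixK targets (fixNOQP, fixGHIS, fixT)": fixj_phosphorylated,
    }

for label, low, fixt in (("air", False, False),
                         ("microaerobic", True, False),
                         ("microaerobic + FixT", True, True)):
    print(label, fix_cascade(low, fixt))
```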


Conclusion 


To date, many nitrogen fixation genes distinct from the previously characterized nif genes have been identified. These fix genes are involved either in
basic cellular functions in nitrogen fixing conditions 
(e.g., respiration), or in processes more directly linked 
to the nitrogen fixation process (e.g., electron trans- 
port to the nitrogenase enzyme), or have a regulatory 
role (oxygen sensing). fix genes are present not only in 
symbiotic bacteria but also in free-living nitrogen 
fixers. Genes homologous to fix genes are even 
found in non-nitrogen-fixing bacteria (e.g., fixABCX 
homologs in Escherichia coli). It is to be expected that 
ongoing sequencing projects and the associated gene 
expression and functional analyses of prokaryotic 


organisms will ultimately lead to complete models 
describing processes as complex as biological nitrogen 
fixation. 
Fixation of Alleles
The fixation probability of a mutant allele is the
probability that it becomes fixed and substitutes for the
original allele in a population. The probability depends
on the initial frequency and selective value of the
mutant as well as on the effective population size.
Fixation probability becomes larger as the advantageous
effect of the mutant increases. However, selectively
neutral as well as very slightly deleterious
mutations also have a nonzero probability of fixation. Many
mutants at the molecular level have very small effects,
and neutral and slightly deleterious mutations are
prevalent. Fixation probability is therefore a most basic
quantity for discussions of evolution, particularly at the
molecular level.

In the following, let us consider the simple case of
genic selection. Let A and a be the original and mutant
alleles, and let s be the selection coefficient of allele a,
such that the relative fitnesses of the genotypes AA, Aa,
and aa are 1, 1+s, and 1+2s, respectively. $N_e$ and p
denote the effective population size and the initial
frequency of a. M. Kimura has shown that the fixation
probability of a, u(p), is as follows:


$$u(p) = \frac{1 - e^{-4N_e s p}}{1 - e^{-4N_e s}} \qquad (1)$$


Most mutations are unique, so the initial frequency,
p, is the reciprocal of the number of genes in the population,
1/(2N), where N is the actual population size:


$$u\left(\frac{1}{2N}\right) = \frac{1 - e^{-2(N_e/N)s}}{1 - e^{-4N_e s}} \qquad (2)$$


For a neutral mutant (s = 0, taken as the limit s → 0), the
fixation probability is equal to the initial frequency, p.
Equation (1) shows that, for a given p, u(p) depends only on
the product $N_e s$. When there is dominance, the formula
becomes more complicated.

The relationship between u(p) and $N_e s$ is given in
Figure 1. As seen from the figure, the fixation
probability is a monotonically increasing function of $N_e s$.
In other words, for advantageous mutants (s > 0) at a given
initial frequency, u(p) increases as $N_e$ and/or s get larger.
If the selective advantage of a mutant remains the same as
population size varies, the mutant has a greater chance of
becoming fixed in large populations than in small
ones. For slightly deleterious mutations (s < 0), u(p)
decreases as $N_e$ and/or the absolute value of s get larger.
Therefore, slightly deleterious mutations have less
chance of fixation in large populations than in small
ones.
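Equation (1) and its dependence on $N_e$ can be checked numerically. The Python sketch below is only an illustration: the function name fixation_probability and the parameter values (p = 0.01, s = ±0.001, $N_e$ from 100 to 10 000) are arbitrary choices, not taken from the text. It treats s = 0 as the neutral limit u(p) = p and rearranges the deleterious case so that large values of $N_e|s|$ do not overflow.

```python
import math


def fixation_probability(p: float, ne: float, s: float) -> float:
    """Kimura's u(p), equation (1), for genic selection (fitnesses 1, 1+s, 1+2s)."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a frequency between 0 and 1")
    x = 4.0 * ne * s
    if x == 0.0:
        return p  # neutral limit: fixation probability equals initial frequency
    if x > 0:
        # (1 - e^{-xp}) / (1 - e^{-x}), written with expm1 for accuracy at small x
        return math.expm1(-x * p) / math.expm1(-x)
    # s < 0: multiply numerator and denominator by e^{x} so nothing overflows
    return math.exp(x * (1.0 - p)) * math.expm1(x * p) / math.expm1(x)


p = 0.01
sizes = [100, 1_000, 10_000]
beneficial = [fixation_probability(p, ne, 0.001) for ne in sizes]
deleterious = [fixation_probability(p, ne, -0.001) for ne in sizes]

for ne, up, down in zip(sizes, beneficial, deleterious):
    print(f"Ne = {ne:>6}: u(p) = {up:.3g} for s = +0.001, {down:.3g} for s = -0.001")
```

With p held fixed, the values for the advantageous mutant rise and those for the deleterious mutant fall as $N_e$ grows. Calling fixation_probability(1 / (2 * N), Ne, s) gives equation (2) for a single new mutant.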

When $4N_e s \gg 1$, mutants are said to be definitely
advantageous, and, provided $N = N_e$ and s is small,
equation (2) reduces approximately to 2s. In other words,
the fixation probability of a definitely advantageous mutant
is about twice its selective advantage, as found by
J. B. S. Haldane a long time ago.

Figure 1 Fixation probability, u(p), of mutant genes as
a function of $2N_e s$.

In natural populations, the values of $N_e$ and s are
not constant but vary in space and in time, so the
above simplified treatment is an approximation. The
theory is applied mostly to molecular evolution. Data
on the rate of gene substitution are available for many
proteins and DNA sequences, and by examining such data
one may infer the various selective forces at work from
their relation to fixation probability. The rate of
substitution may be obtained in terms of fixation
probability. If v is the rate of occurrence of mutations
per gamete per generation, 2Nv new mutations appear in
the population in each generation. The rate of
substitution (k) is the product of 2Nv and the fixation
probability:


$$k = 2Nv \, u\left(\frac{1}{2N}\right) \qquad (3)$$


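This equation is useful in interpreting data. For example,
for a neutral mutant the fixation probability is 1/(2N), so
k = 2Nv × 1/(2N) = v: the rate of neutral substitution equals
the mutation rate and does not depend on population size.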
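Equations (2) and (3), together with Haldane's 2s approximation, can be illustrated with a small Python sketch. The names u_new_mutant and substitution_rate and the numbers used (N = $N_e$ = 10 000, v = 10⁻⁸ per gamete per generation, s = 0.01) are illustrative assumptions only.

```python
import math


def u_new_mutant(n: int, ne: float, s: float) -> float:
    """Equation (2): fixation probability of a single new mutant, p = 1/(2N)."""
    x = 4.0 * ne * s          # 4 Ne s
    a = 2.0 * (ne / n) * s    # 4 Ne s p with p = 1/(2N)
    if x == 0.0:
        return 1.0 / (2 * n)  # neutral case: u equals the initial frequency
    if x > 0:
        return math.expm1(-a) / math.expm1(-x)
    # s < 0: rearranged so that exponentials of large positive numbers are avoided
    return math.exp(x - a) * math.expm1(a) / math.expm1(x)


def substitution_rate(n: int, ne: float, s: float, v: float) -> float:
    """Equation (3): k = 2N v u, with v the mutation rate per gamete per generation."""
    return 2 * n * v * u_new_mutant(n, ne, s)


N = NE = 10_000
v = 1e-8
s = 0.01                      # 4 Ne s = 400, i.e. definitely advantageous

u_adv = u_new_mutant(N, NE, s)
print(f"u = {u_adv:.5f} versus Haldane's 2s = {2 * s}")
print(f"neutral k = {substitution_rate(N, NE, 0.0, v):.3g} (equals v)")
print(f"advantageous k = {substitution_rate(N, NE, s, v):.3g} (about 4 N v s)")
```

The neutral case returns k = v whatever the value of N, because the 2Nv new mutations per generation are exactly offset by their fixation probability of 1/(2N); for the advantageous mutant, u comes out within about 1% of 2s.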
The flagellum is a molecular machine whose function is
to generate motility, a common characteristic of many 
species of bacteria. Motility gives cells the freedom to 
move into a wider world, but, in order to confer a 
survival advantage, movement must be coupled to 
some form of sensory machinery that allows move- 
ment toward a favorable environment. The sensory 
machinery is a protein complex consisting of various 
receptors and signal-transducing enzymes. The behav- 
ior imparted by a harmonious combination of motility 
and chemical sensing is called chemotaxis. 

Of the 50 genes required for chemotactic behavior
in Salmonella enterica serovar typhimurium, 10 genes
encode the sensory complex (four genes for
chemoreceptors, six genes for components of
signal-transducing enzymes). The remaining 40 genes are
required for the biogenesis of the flagellum (Figure 1).
Phenotypes caused by mutations in each gene are divided
into three groups: Che⁻ (chemotaxis-deficient), Mot⁻
(motility-deficient), and Fla⁻ (flagella-deficient).

Defects in genes of the chemosensory system give
rise to Che⁻ mutants, which show either smooth (no
change of direction) or tumbling (continuous changes
of direction) swimming. There are two authentic mot
genes (motA and motB), and three further genes
(fliG, fliM, and fliN) are also necessary for torque
generation. Most of the remaining genes are responsible
for flagellar construction and were originally called fla
genes. In 1988, when the number of fla genes surpassed
26, the fla genes were assigned to four groups according
to the gene clusters on the chromosome: flg, flag (A-N;
23 min); flh, fluh (A-E; 40 min); fli, fly (A-T, Y, Z;
42 min); and flj, flaj (A, B; 56 min). This unified
nomenclature was proposed for Escherichia coli and
Salmonella enterica serovar typhimurium and is now
widely applied to many other bacterial species.

In Salmonella and related species, a three-tier
regulatory hierarchy governs the transcription of the
flagellar genes. The master operon flhDC (comprising
flhD and flhC), the only operon in class 1, activates
the class 2 genes (37 genes in 8 operons), which mostly
encode structural proteins of the hook-basal body
(HBB). The class 2 level contains two regulatory
genes, fliA and flgM: FliA is a sigma factor (σ28) for
initiating class 3 gene transcription, and FlgM is an
anti-sigma factor that binds FliA to halt its action.
FlgM is secreted through the central channel of the
complete HBB, resulting in release of FliA, which can
then freely interact with RNA polymerase and direct 
transcription of the class 3 operons. FlgM is transcribed 
from class 2 as well as class 3 operons. The amount 
of FlgM expressed at class 3 is much higher than that 
at class 2, indicating an autogenous regulation of the 
class 3 operons. Flagellar gene regulation is thus
strictly coupled to flagellar construction, which
prevents unnecessary production of the abundant class 3
proteins. Upon initiation of class 3 gene expression,
flagellar filaments are formed by flagellin (FliC) export 
and polymerization, and the sensory system (com- 
posed of 10 Che proteins) is organized. At the same 
time, the MotA/B complex is assembled on the peri- 
plasmic side of the membrane to rotate the motor, 
thereby completing a functional flagellum. 
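The checkpoint described above can be summarized as a toy Boolean model. The Python sketch below is a deliberate simplification, not a model from the source: the class and field names are hypothetical, expression levels and timing are ignored, and "FliA is free" is reduced to the rule that FlgM is either absent or has been exported through a complete HBB.

```python
from dataclasses import dataclass


@dataclass
class FlagellarState:
    flhdc_expressed: bool      # class 1 master operon flhDC is on
    hbb_complete: bool         # hook-basal body finished, so FlgM can be exported
    flgm_present: bool = True  # anti-sigma factor FlgM is made (class 2)


def active_classes(state: FlagellarState) -> set:
    """Return the promoter classes transcribed in this toy model."""
    if not state.flhdc_expressed:
        return set()           # without FlhDC, no class 2 or class 3 transcription
    active = {1, 2}            # FlhDC activates the class 2 (HBB, fliA, flgM) operons
    # FliA (sigma-28) directs class 3 transcription only when FlgM no longer holds it
    flia_free = (not state.flgm_present) or state.hbb_complete
    if flia_free:
        active.add(3)
    return active


print(active_classes(FlagellarState(flhdc_expressed=True, hbb_complete=False)))  # {1, 2}
print(active_classes(FlagellarState(flhdc_expressed=True, hbb_complete=True)))   # {1, 2, 3}
```

In this sketch, setting flgm_present=False lets class 3 transcription proceed before the HBB is finished, which shows the coupling that FlgM export normally enforces.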

There are three multifunctional genes (fliG, fliM, 
and fliN) that show three different phenotypes 
depending on the mutational sites. These gene
products, FliG, FliM, and FliN, form a cup-shaped
complex (called a C ring) at the cytoplasmic side of the
MS ring complex (see Figure 1). In fact, the C ring is
multifunctional: It works as a part of the export appar- 
atus, it generates torque by the interaction with Mot 
complexes, and it switches the rotational direction of 
the motor by binding to a signal protein, the phos- 
phorylated form of CheY. 

Salmonella species have two sets of flagellin genes:
fliC and fljB. The hin gene region upstream of the fljB
gene flip-flops (inverts), allowing fljB gene expression
in only one orientation. When the fljB gene is expressed,
the fljA gene downstream of fljB produces a repressor of
the fliC gene, preventing simultaneous expression of
fliC. By switching between these flagellins (a property
known as phase variation), cells can evade the immune
system of the host.

Flagellar construction is not independent of the
cell division cycle. The number of flagella on a
peritrichously flagellated cell must stay constant from
one cell division to the next; otherwise, within several
generations the number would quickly fall to zero or
grow without limit. Hence, it is reasonable to assume
that the flagellar system is under global regulation,
occurring synchronously with cell division. The gene or
genes directly controlling the flagellar master genes
have not yet been identified.

Although as many as 70% of the bacterial species
studied so far show flagellar motility, flagella are not
necessarily expressed at all stages of the
life cycle. Some E. coli cells do not grow flagella in rich
medium because of catabolite repression. Many
species living in water grow flagella only at lower
temperatures. Some soil and freshwater bacteria
produce flagella only during early log phase. The master
operon flhDC is transcribed with the help of the
housekeeping sigma factor σ70. The sequences
upstream of the flhDC operon are fairly diverse
among bacterial species. In E. coli, several genes and
physiological factors affecting flhDC expression
are known: the heat-shock proteins (DnaK,
DnaJ, and GrpE), the pleiotropic response regulator
OmpR activated by acetyl phosphate, and the
DNA-binding protein H-NS. More such genes and factors
have been discovered in other species, and their
mechanisms are under investigation.


FLI1 (Friend leukemia virus integration 1), identified
as a common mouse viral integration site, encodes
a protein that is a member of the Ets family of
transcriptional regulators. As a consequence of the
t(11;22)(q24;q12) translocation found in Ewing's
sarcoma (EWS) and primitive neuroectodermal
tumours, FLI1 at 11q24 becomes joined to the EWS
gene, producing a fusion protein in which N-terminal
EWS protein sequences are fused to the
C-terminal FLI1 DNA-binding domain.


See also: Ets Family; Ewing’s Tumor 


Flower Development, 
Genetics of 


G Theissen 


Copyright © 2001 Academic Press 
doi: 10.1006/rwgn.2001.1674 


Essentials of Flower Developmental 
Genetics 


Flowers are the well-known reproductive structures 
of flowering plants (angiosperms) which are by far the 
largest group of extant plants. Flowers are composed 
of up to four different types of specialized floral 
organs: green, leaf-like sepals; showy petals which 
may attract pollinators; stamens, being the male repro- 
ductive organs which produce the pollen; and carpels, 
being the female reproductive organs inside which 
the ovules and seeds develop (Figure 1). The number, 
arrangement and morphology of these organs are
diverse, but species-specific, since flower development
is under strict genetic control. This guarantees
that flower development is initiated only under con- 
ditions favorable for reproduction, but that once 
started it proceeds in a highly standardized way. 
Flower development can be subdivided into several 
major steps, such as floral induction, floral meristem 
formation, and floral organ development. Accurate 
genetic control of the different steps of flower devel- 
opment is achieved by a hierarchy of interacting regu- 
latory genes, most of which encode transcription 
factors (Figure 1). Close to the top of that hierarchy
are ‘flowering time genes’ which are triggered by 
developmental cues and environmental factors such 
as plant age, day length, and temperature. ‘Flowering 
time genes’ mediate the switch from vegetative to 
reproductive development by activating meristem 
identity genes. ‘Meristem identity genes’ control the 
transition from vegetative to inflorescence and floral 
meristems and work as upstream regulators of ‘floral 
organ identity genes.’ Combinatorial interactions of 
these genes specify the identity of the different floral 
organs by activating organ-specific ‘realizator genes.’ 
Most of the genes controlling flower development 
belong to highly conserved gene families, such as the 
MADS-box, FLO-like, and AP2/EREBP-like genes,
which are assumed to encode transcription factors. 

Our current knowledge about the genetics of 
flower development has been mainly worked out in 
two model plants, thale cress (Arabidopsis thaliana) 
and snapdragon (Antirrhinum majus). While Arabi- 
dopsis has been of great importance for studies on all 
different kinds of genes involved in flower formation, 
Antirrhinum was of special importance during cloning 
of the first floral meristem and organ identity genes. 
Therefore, the descriptions outlined below focus
on these predominant model systems, unless stated
otherwise.


Floral Induction 


When flowering plants have reached a critical age, 
environmental signals may trigger a switch to floral 
development. The shoot apical meristem, a small 
group of progenitor cells, ceases production of leaf 
primordia and switches to the production of floral 
meristems which develop into flowers. Since flower- 
ing at the wrong time may seriously hamper repro- 
ductive success, the angiosperms have evolved 
multiple genetic pathways to regulate the timing of 
the floral transition in response to environmental stim- 
uli and developmental cues. Since plants live under 
very different environmental conditions and follow 
diverse life strategies, the mechanisms controlling the 
transition to flowering vary considerably, often even within
a single species.

The analysis of natural variants (ecotypes) and of 
mutants that flower later or earlier than wild-type has 
revealed more than 80 gene loci that affect flowering 
time in Arabidopsis. These flowering time genes may 
contribute to two different components of the floral 
transition: the production of flowering signals and the 
competence of the shoot apical meristem to respond 
to these signals. The flowering time mutants can be 
grouped into different classes defining different path- 
ways of floral induction. Arabidopsis is a facultative 
long-day plant which responds to long days (indicat- 
ing spring and summer) by flowering earlier than when 
grown in short days. One class of mutants displays 
a reduced response to changes in photoperiod (day 
length) when compared to wild-type. The correspond- 
ing genes, therefore, may participate in a photoperiod 
promotion pathway. A second class of late-flowering 
mutants are unaffected in their response to photo- 
period. The corresponding genes thus may be 
involved in an autonomous promotion pathway. 
This pathway monitors the signals of an internal 
developmental clock that measures plant age. A third 
pathway, termed the vernalization promotion pathway,
confers susceptibility to vernalization, i.e., a prolonged
exposure to cold signaling the passage of winter and
the onset of spring. A fourth pathway that mediates
floral induction, the gibberellic acid promotion
pathway, depends on the plant hormone gibberellic acid
(Figure 1).

Figure 1 A simplified and preliminary depiction of the genetic hierarchy that controls flower development in
Arabidopsis thaliana. Examples of the different types of genes within each hierarchy level are shown. ‘Gibberellic acid’,
‘vernalization’, ‘autonomous’, and ‘photoperiod’ refer to the different promotion pathways of floral induction.
‘Intermediate genes’ summarizes a functionally diverse class of genes including ‘cadastral genes’. MADS-box genes are
shown as squares, non-MADS-box genes as circles, and genes whose sequence has not been reported up to now as
octagons. Some regulatory interactions between the genes are symbolized by arrows (activation), double arrows
(synergistic interaction), or barred lines (inhibition, antagonistic interaction). For clarity, only a fraction of
the known genes and interactions involved in flower development is shown. In the case of the downstream genes, just
one symbol is shown for every type of floral organ, though whole cascades of many direct target genes and further
downstream genes are probably activated in each organ of the flower. A flower structure is shown in the lower region
of the figure. At the bottom of the figure, the classical ‘ABC model’ of flower organ identity is depicted. According to
this model, flower organ identity is specified by three classes of ‘floral organ identity genes’ providing ‘homeotic
functions’ A, B, and C, which are each active in two adjacent whorls. A alone specifies sepals in whorl 1; the combined
activities of A + B specify petals in whorl 2; B + C specify stamens in whorl 3; and C alone specifies carpels in whorl
4. The activities A and C are mutually antagonistic, as indicated by barred lines: A prevents the activity of C in whorls
1 and 2, and C prevents the activity of A in whorls 3 and 4.

Abbreviations of gene names used: AG, AGAMOUS; AGL, AGAMOUS-LIKE GENE; AP, APETALA; ASK1, ARABIDOPSIS SKP1-
LIKE1; CAL, CAULIFLOWER; CO, CONSTANS; FLC, FLOWERING LOCUS C; FRI, FRIGIDA; FUL, FRUITFULL; LD,
LUMINIDEPENDENS; LFY, LEAFY; LUG, LEUNIG; NAP, NAC-LIKE, ACTIVATED BY AP3/PI; PI, PISTILLATA; SEP, SEPALLATA;
SHP, SHATTERPROOF; SOC1, SUPPRESSOR OF OVEREXPRESSION OF CO1; SVP, SHORT VEGETATIVE PHASE; UFO,
UNUSUAL FLORAL ORGANS; TFL1, TERMINAL FLOWER1.

Quite a number of flowering time genes have
already been cloned, among them GA1, LUMINIDEPENDENS
(LD), CONSTANS (CO), FCA, FHA,
FPA, FLOWERING LOCUS C (FLC), and
SHORT VEGETATIVE PHASE (SVP). Mutations
in the first six genes mentioned confer late-flowering
phenotypes; hence they are termed ‘late flowering
genes’, indicating that they normally function to
promote the floral transition. While CO and FHA belong
to the photoperiod promotion pathway, FCA, FPA,
and LD are involved in the autonomous flowering
pathway. GA1 is a key gene of the gibberellic acid
promotion pathway, which eventually may activate
the floral meristem identity gene LEAFY (see below).

The late flowering genes encode proteins with very
diverse biochemical or biophysical properties. GA1
encodes ent-kaurene synthetase, a key enzyme of
gibberellin biosynthesis. The FHA gene encodes the blue
light receptor CRYPTOCHROME2 (CRY2), which
is probably involved in photoperiod perception. The
FCA and FPA gene products show similarity to
RNA-binding proteins, suggesting that they promote
flowering via a posttranscriptional mechanism. CO
and LD encode putative transcription factors, which
promote flowering by activating early target genes
such as SUPPRESSOR OF OVEREXPRESSION
OF CO 1 (SOC1) and, in the case of CO, FLOWERING
LOCUS T (FT). FT encodes a protein with
similarity to Raf kinase inhibitor protein. SOC1, like
many other genes involved in flower development
(Figure 1), is a member of the MADS-box gene family
encoding transcription factors. MADS-box genes
share a highly conserved DNA sequence approximately
180 bp long, termed the MADS-box, which
encodes the DNA-binding domain of the respective
MADS-domain proteins.

In contrast to the late-flowering phenotypes of the 
genes mentioned above, flc and svp null mutations 
result in early flowering, indicating that FLC and SVP 
are repressors of flowering. Reduction of FLC
expression is an important component of the vernalization
response. Both FLC and SVP are MADS-box genes.


Floral Meristem Formation 


As a consequence of floral induction, shoot meristems 
become committed to flowering. In Arabidopsis and 
Antirrhinum, floral meristems arise at the flanks of 
the inflorescence meristems at the shoot apices. 
In Antirrhinum, two key genes (‘floral meristem identity
genes’), FLORICAULA (FLO) and SQUAMOSA (SQUA),
are responsible for the transition from inflorescence to
floral meristems and for the specification of floral
meristem identity. The putative orthologs
and functional equivalents from Arabidopsis are
LEAFY (LFY) and APETALA1 (AP1), respectively.
The function of these genes is indicated by the
phenotype of loss-of-function mutants. In these mutants,
floral meristems often fail to form, and secondary
inflorescences form instead, indicating that the
transition from inflorescence to floral meristems does not
take place. Three other floral meristem identity
genes, APETALA2 (AP2), FRUITFULL (FUL),
and CAULIFLOWER (CAL), have little effect on
meristem identity as single mutations, but the ap2,
ful, and cal mutations enhance the effects of lfy and ap1
mutations on floral meristem identity.

In the apical inflorescence meristem of Antirrhi- 
num the action of the floral meristem identity genes 
is antagonized by the CENTRORADIALIS (CEN) 
gene. Therefore, loss-of-function of this gene results 
in ectopic expression of FLO and SQUA in the inflor- 
escence meristem, thus transforming it into a floral 
meristem which generates a terminal flower. The 
putative ortholog and functional equivalent of 
CEN from Arabidopsis is TERMINAL FLOWER1 
(TFL1). TFL1 and CEN encode putative membrane-
associated proteins which may be involved in a signal
transduction chain required to repress the expression
of the floral meristem identity genes in the
inflorescence meristem. In contrast, all known meristem
identity genes that promote floral fate encode putative
transcription factors. AP1, CAL, FUL, and SQUA
belong to the family of MADS-box genes. LFY and
FLO are members of a small family termed FLO-like
genes. AP2 is a founder member of a large gene family
called AP2/EREBP-like genes.


Flower Formation and Floral Organ 
Development 


When the transition from inflorescence to floral mer- 
istems has taken place, floral organs arise at defined 
positions from within these meristems under the
control of different types of genes. In Arabidopsis, ‘floral
meristem size genes’ such as CLAVATA1 (CLV1),
CLV2, CLV3, and WIGGUM (WIG = ERA1)
regulate the size of the floral meristem and also influence
floral organ number. CLV1 encodes a receptor protein
kinase, CLV3 encodes the presumed extracellular
protein ligand for CLV1, and CLV2 encodes a receptor-
like protein that may form a heterodimer with CLV1.
WIG encodes a farnesyltransferase β-subunit involved
in numerous aspects of plant development. ‘Cadastral
genes’ like LEUNIG (LUG), AP2, and AG are
involved in setting the boundaries of floral organ
identity gene functions. ‘Floral organ pattern genes’ 
such as PERIANTHIA (PAN), which encodes a 
bZIP-type transcription factor, act to establish 
floral organ primordia in specific numbers and posi- 
tions. These primordia develop into the different 
types of floral organs under the control of specific 
homeotic selector genes, termed ‘floral organ identity 
genes.’ 

The function of floral organ identity genes was 
recognized during the study of homeotic mutants in 
which the identity of floral organs is changed. In 
Arabidopsis and Antirrhinum such mutants come in 
three classes, A, B, and C. Ideal class A mutants 
have carpels in the first whorl instead of sepals, and 
stamens in the second whorl instead of petals. Class 
B mutants have sepals rather than petals in the 
second whorl, and carpels rather than stamens in 
the third whorl. Class C mutants have petals instead 
of stamens in the third whorl, and replacement of the 
carpels in the fourth whorl by sepals. In addition, 
these mutants are indeterminate, i.e., there is con- 
tinued production of mutant floral organs inside the 
fourth whorl. 

Based on these classes of mutants and all combin- 
ations of double and triple mutants the ‘ABC model’ 
proposes three classes of combinatorially acting floral 
organ identity genes, called A, B, and C, with A spe- 
cifying sepals in the first floral whorl, A + B petals in 
the second whorl, B + C stamens in the third whorl, 
and C carpels in the fourth whorl (Figure 1). The 
model also maintains that the class A and class C 
genes negatively regulate each other. Based on studies 
in petunia (Petunia hybrida), the ABC model was later 
extended by class D genes, specifying ovules. More
recently, it has been demonstrated by a reverse genetic
approach that yet another class of floral organ iden- 
tity genes, tentatively termed class E genes here, is 
involved in specifying petals, stamens and carpels. 
The floral organ identity genes can be interpreted as 
acting as major developmental switches that activate 
the entire genetic program for a particular organ. 
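The combinatorial logic of the ABC model, including the mutual antagonism of A and C, can be written out as a short Python sketch. It is a simplification assumed for illustration: only the A, B, and C functions are modeled (class D and E genes and the indeterminacy of class C mutants are ignored), and the names ORGAN and organs are hypothetical. It reproduces the ideal wild-type and single-mutant phenotypes described above.

```python
# Organ specified by each combination of homeotic functions in the ABC model.
ORGAN = {
    frozenset("A"): "sepal",
    frozenset("AB"): "petal",
    frozenset("BC"): "stamen",
    frozenset("C"): "carpel",
}


def organs(lost: frozenset = frozenset()) -> list:
    """Organ identity in whorls 1-4 when the functions in `lost` are missing."""
    a_ok, b_ok, c_ok = ("A" not in lost), ("B" not in lost), ("C" not in lost)
    result = []
    for whorl in (1, 2, 3, 4):
        active = set()
        # B is active in whorls 2 and 3
        if b_ok and whorl in (2, 3):
            active.add("B")
        # A normally acts in whorls 1-2, but spreads to all whorls if C is lost
        if a_ok and (whorl in (1, 2) or not c_ok):
            active.add("A")
        # C normally acts in whorls 3-4, but spreads to all whorls if A is lost
        if c_ok and (whorl in (3, 4) or not a_ok):
            active.add("C")
        result.append(ORGAN.get(frozenset(active), "not specified by the ABC model"))
    return result


print(organs())                # ['sepal', 'petal', 'stamen', 'carpel']
print(organs(frozenset("A")))  # ['carpel', 'stamen', 'stamen', 'carpel']
```

Encoding the antagonism as "the surviving function spreads into the other's whorls" is what turns the wild-type pattern into the class A and class C mutant phenotypes.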

In Arabidopsis, class A genes comprise APETALA1 
(AP1) and APETALA2 (AP2). The class B genes are 
represented by APETALA3 (AP3) and PISTILLATA 
(PI), and the class C gene is AGAMOUS (AG). 
In Antirrhinum, the class B genes comprise DEFI- 
CIENS (DEF) and GLOBOSA (GLO), and the class 
C gene is PLENA (PLE). Class D genes have been 
recognized only in petunia so far, where they have 
been termed FLORAL BINDING PROTEIN7 
(FBP7) and FBP11. The class E genes in Arabidopsis 
comprise SEPALLATA1 (SEP1), SEP2, and SEP3, 
which have highly redundant functions. All these
genes have been cloned, which revealed that they
all encode putative transcription factors. Thus the
products of the floral organ identity genes probably 
all control the transcription of other genes (‘target 
genes’) whose products are involved in the formation 
or function of the different floral organs. Except 
for AP2, all floral organ identity genes are MADS- 
box genes. 

Among the regulators of the floral organ identity 
genes in Arabidopsis is the transcription factor LFY. 
LFY alone can induce expression of the class A gene 
AP1, i.e., other, flower- or region-specific coregulators 
are not needed. In contrast, the class B gene AP3 and the 
class C gene AG are activated by LFY in region-specific 
patterns within flowers, depending on other factors 
such as the F-box gene UNUSUAL FLORAL 
ORGANS (UFO) in the case of AP3 and an unknown factor
‘X’ in the case of AG (Figure 1). Recently, it was shown
that AP1 and AG are direct downstream targets of LFY. 

Not much is known so far about the downstream
targets of the ‘floral homeotic genes’ themselves. How floral
organ identity is realized at the molecular level is, 
therefore, not well understood. The first proven direct 
target gene of a floral homeotic gene (AP3), termed 
NAP, was identified just recently. It may play a role in 
the transition between growth by cell division and cell 
expansion in stamens and petals. 

In contrast to the actinomorphic (polysymmetric) 
flowers of most angiosperms, including Arabidopsis, 
the flowers of Antirrhinum and many other species are 
zygomorphic, meaning that they have only one plane 
of reflectional symmetry. Genetic analyses revealed 
that the development of zygomorphic Antirrhinum 
flowers requires the interaction between several genes 
that affect the upper (dorsal) region (CYCLOIDEA, 
RADIALIS, DICHOTOMA) or the lower (ventral) 
region (DIVARICATA) of the flower. CYCLOIDEA
(CYC) and DICHOTOMA (DICH) have been 
cloned and were shown to encode quite similar and 
functionally partially redundant transcription factors 
that are expressed in dorsal regions of the flower. CYC 
and DICH are founder members of a small group of 
transcription factors termed the TCP family. 

The formation of seeds, which are just ripened 
ovules, could be considered as the final goal of any 
flower development. Quite a number of genes involved 
in different stages of ovule development have been 
identified. Some of these genes have already been 
cloned, among them several encoding transcription 
factors such as the AP2-like gene AINTEGUMENTA
(ANT) and the homeobox gene BELL1 (BEL1). 


Future Prospects 


In the future, the genes involved in flower development
will be studied less and less individually, and
more and more as components of complex gene
networks. Since most human food is derived from 
flower parts or products, such as fruits and grains, 
there will be intensive attempts to apply the know- 
ledge obtained with the model plants (which are higher 
eudicots) to commercially important crop plants 
(which are predominantly monocots). The goal will 
be to design these plants according to our desires with 
respect to traits such as time to flowering, and inflor- 
escence, flower, and fruit structure. Comparative stud- 
ies on genes controlling reproductive development in 
a diverse range of phylogenetically informative taxa, 
including monocotyledonous and basal angiosperms, 
but also nonflowering plants, will provide a better 
understanding of flower evolution and the origin of 
biodiversity. 
Site-Specific DNA Recombination


Recombination is a universal strategy employed by 
life forms to reshuffle and reorganize their genetic 
information from time to time. Recombination can 
be classified broadly into two types: homologous 
and site-specific. The former is dependent on rather 
long stretches of homology between the participant 
DNA substrates (as in mitotic or meiotic recombin- 
ation between chromosomes in eukaryotic cells). 
By contrast, the latter uses much shorter segments
of homology embedded within sequence-specific
DNA targets. In more extreme forms of site-specific
recombination, such as DNA transposition, the
recombination partners often share little or no homology.


Families of Site-Specific Recombinases 


Two families of site-specific recombinases have been 
well characterized: the resolvase/invertase family and 
the integrase family (see Site-Specific Recombination). 
Members of these two families bring about recombin- 
ation by breaking specific phosphodiester bonds within 
their DNA targets, and reforming them across substrate
partners. The reaction does not require an exogenous 
energy source such as ATP, and proceeds without 
degradation or synthesis of DNA. Hence these recom- 
binases have been classified as ‘conservative’ site- 
specific recombinases. While the resolvase/invertase 
family appears to be confined to the prokaryotic 
world, the integrase family (named after the Int 
protein of phage lambda) includes members from 
bacteriophage, bacteria, and yeasts. The Flp recom- 
binase, the subject of this article, is an Int family 
member from the yeast Saccharomyces cerevisiae. 

Recombination mediated by the Int family pro- 
teins can lead to DNA fusions or dissociation, DNA 
deletions or inversions, and DNA translocations. A 
particular outcome depends on whether the DNA
substrates are circular or linear, whether the two sites 
partaking in a recombination event are present on a 
single DNA molecule or two separate DNA molecules, 
and, for the intramolecular case, whether they are in the 
same (head-to-tail) or opposite (head-to-head)
orientations. The DNA rearrangements resulting from
recombination have profound genetic and physiological
consequences, including phage integration into
and excision from bacterial genomes; developmental
regulation of gene expression in specific cell types;
stable segregation of unit-copy or low-copy circular
genomes by the resolution of dimers and higher
oligomers into monomers; and (as will be discussed here
for the Flp system) copy number amplification of
yeast plasmids.


2-Micron Plasmid and Flp Site-Specific 
Recombination 


The 2-micron plasmid is a circular, multicopy extra- 
chromosomal element present in most strains of Sac- 
charomyces yeasts (Figure 1). The steady-state copy 
number of the plasmid is approximately 60 per yeast 
cell. Under normal growth conditions, the plasmid 
does not appear to confer any advantage to its host 


rather more and more as components of complex gene 
networks. Since most human food is derived from 
flower parts or products, such as fruits and grains, 
there will be intensive attempts to apply the know- 
ledge obtained with the model plants (which are higher 
eudicots) to commercially important crop plants 
(which are predominantly monocots). The goal will 
be to design these plants according to our desires with 
respect to traits such as time to flowering, and inflor- 
escence, flower, and fruit structure. Comparative stud- 
ies on genes controlling reproductive development in 
a diverse range of phylogenetically informative taxa, 
including monocotyledonous and basal angiosperms, 
but also nonflowering plants, will provide a better 
understanding of flower evolution and the origin of 
biodiversity. 
Site-Specific DNA Recombination


Recombination is a universal strategy employed by 
life forms to reshuffle and reorganize their genetic 
information from time to time. Recombination can 
be classified broadly into two types: homologous 
and site-specific. The former is dependent on rather 
long stretches of homology between the participant 
DNA substrates (as in mitotic or meiotic recombin- 
ation between chromosomes in eukaryotic cells). 
By contrast, the latter uses much shorter segments of homology
embedded within sequence-specific DNA targets. In DNA transposition,
a more extreme form of site-specific recombination, the recombination
partners often share little or no homology.


Families of Site-Specific Recombinases 


Two families of site-specific recombinases have been 
well characterized: the resolvase/invertase family and 
the integrase family (see Site-Specific Recombination). 
Members of these two families bring about recombination by breaking
specific phosphodiester bonds within their DNA targets and re-forming
them across substrate partners. The reaction does not require an exogenous
energy source such as ATP, and proceeds without 
degradation or synthesis of DNA. Hence these recom- 
binases have been classified as ‘conservative’ site- 
specific recombinases. While the resolvase/invertase 
family appears to be confined to the prokaryotic 
world, the integrase family (named after the Int 
protein of phage lambda) includes members from 
bacteriophage, bacteria, and yeasts. The Flp recom- 
binase, the subject of this article, is an Int family 
member from the yeast Saccharomyces cerevisiae. 

Recombination mediated by the Int family proteins can lead to DNA
fusions or dissociations, DNA deletions or inversions, and DNA
translocations. The particular outcome depends on whether the DNA
substrates are circular or linear, whether the two sites participating
in a recombination event are present on a single DNA molecule or on two
separate DNA molecules, and, for the intramolecular case, whether they
are in the same (head-to-tail) or opposite (head-to-head) orientations.
The DNA rearrangements resulting from recombination have profound genetic
and physiological consequences, ranging from phage integration into and
excision from bacterial genomes, to developmental regulation of gene
expression in specific cell types, to stable segregation of unit-copy or
low-copy circular genomes by the resolution of dimers and higher oligomers
into monomers, and (as will be discussed here for the Flp system) to copy
number amplification of yeast plasmids.
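The dependence of the outcome on site orientation can be illustrated with a toy model. The Python sketch below is a simplification under stated assumptions, not a biochemical simulation: a molecule is a list of labeled elements, each with an orientation; FRT sites are the only recombination targets; and the function name `recombine_intramolecular` is hypothetical. Recombination between inverted (head-to-head) sites inverts the DNA between them, while recombination between direct (head-to-tail) sites excises that DNA as a circle carrying one site. Intermolecular fusion, the reverse of excision, is not modeled.

```python
from typing import List, Tuple

# A molecule is a list of (label, orientation) elements; orientation is +1 or -1.
Element = Tuple[str, int]
Product = Tuple[List[Element], bool]  # (elements, is_circular)


def flip(segment: List[Element]) -> List[Element]:
    """Invert a segment: reverse the element order and flip each orientation."""
    return [(label, -orientation) for label, orientation in reversed(segment)]


def recombine_intramolecular(molecule: List[Element], i: int, j: int,
                             circular: bool) -> List[Product]:
    """Toy recombination between the FRT sites at indices i < j of one molecule."""
    assert i < j and molecule[i][0] == molecule[j][0] == "FRT"
    if molecule[i][1] != molecule[j][1]:
        # Head-to-head (inverted) sites: the DNA between them is inverted.
        inverted = molecule[:i + 1] + flip(molecule[i + 1:j]) + molecule[j:]
        return [(inverted, circular)]
    # Head-to-tail (direct) sites: the DNA between them is excised as a circle
    # carrying one site, and the parent molecule keeps the other site.
    excised = molecule[i:j]
    remainder = molecule[:i] + molecule[j:]
    return [(remainder, circular), (excised, True)]
```

For two head-to-head FRT sites on a circle, the sketch returns a single circle in which the region between the sites has been flipped; for two head-to-tail sites on a circular dimer, it returns two monomeric circles, and on a linear molecule it returns a deleted linear molecule plus an excised circle.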


2-Micron Plasmid and Flp Site-Specific 
Recombination 


The 2-micron plasmid is a circular, multicopy extra- 
chromosomal element present in most strains of Sac- 
charomyces yeasts (Figure 1). The steady-state copy 
number of the plasmid is approximately 60 per yeast 
cell. Under normal growth conditions, the plasmid 
does not appear to confer any advantage to its host 
cell; nor is it a burden on the cellular metabolic machinery. The plasmid
may be regarded as a typical ‘benign parasite genome’ that has optimized
functions for its stable inheritance and its copy number maintenance.

The 2-micron circle molecules exist in the yeast 
nucleus as minichromosomes, and are replicated dur- 
ing the S phase of the cell cycle by the same replication 
apparatus that duplicates the chromosomes. Replica- 
tion is initiated at the origin (ORI; Figure 1), and the 
replication forks proceed bidirectionally along the 
circular contour of the plasmid genome. Normally, 
each plasmid molecule is restricted to one round of 
replication per cell cycle. Equal partitioning of the 
duplicated circles is achieved by the Rep1 and Rep2 
proteins (coded for by the REP1 and REP2 plasmid 
genes) acting in concert with the partitioning locus 
STB (Figure 1). 

The 2-micron circle contains a 599 bp sequence that is duplicated in
head-to-head (inverted) orientation (indicated by the parallel lines in
Figure 1). These inverted repeats divide the plasmid into two unique
regions, represented by the circular arcs in Figure 1. The Flp
site-specific recombinase is the product of the FLP locus and acts on the
FRT (Flp Recombination Target) sites located within the inverted repeats.
Recombination inverts the left unique region with respect to the right
unique region. As a consequence, the plasmid population within the yeast
cell consists of an equilibrium mixture of the two forms, A and B, present
in roughly equimolar amounts (Figure 1). The relative flipping of the DNA
by recombination is what gives the recombinase its name, Flp (pronounced
either as ‘flip’ or as the letters F-L-P).


Mechanism of Flp Recombination 


The FRT site consists of three 13 bp Flp-binding elements (1a, 1'a, and
1'b) and an 8 bp strand exchange region (or spacer), arranged as shown in
Figure 1 (bottom). The two phosphodiester bonds that take part in
recombination, at the 1a-spacer junction and at the 1'a-spacer junction,
are indicated in Figure 1. Note that 1a and 1'a, bordering the spacer at
its left and right ends, respectively, are oriented head-to-head. The
third element, 1'b, is not directly involved in the recombination
reaction, although it may modulate the reaction efficiency in vivo in
yeast. For simplicity, the mechanism of Flp recombination will be
described for the ‘minimal’ 34 bp FRT site, consisting of the 1a and 1'a
Flp-binding elements and the spacer sequence between them.
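The 13 + 8 + 13 bp architecture of the minimal FRT site can be checked directly on a sequence. The sketch below uses the 34 bp minimal FRT sequence commonly given in the literature and in vector maps (the sequence itself is not given in this article); the helper names are illustrative. It confirms that the two 13 bp elements form a near-perfect inverted repeat around the spacer and that the 8 bp spacer is not palindromic, which is what gives the site a direction, and hence what makes "head-to-head" and "head-to-tail" meaningful for a pair of sites.

```python
# Complement table for A/C/G/T sequences.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Read the other strand 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]


# Minimal 34 bp FRT site (top strand, 5'->3'), written as 13 + 8 + 13 bp.
MINIMAL_FRT = "GAAGTTCCTATTC" "TCTAGAAA" "GTATAGGAACTTC"


def dissect_frt(site: str):
    """Split a minimal FRT into 1a, spacer, and 1'a and compare the two arms."""
    assert len(site) == 34, "the minimal FRT is 34 bp"
    left, spacer, right = site[:13], site[13:21], site[21:]
    # 1a and 1'a are head-to-head, so 1'a read on the other strand should
    # (nearly) match 1a; count the positions where it does not.
    mismatches = sum(a != b for a, b in zip(left, reverse_complement(right)))
    spacer_is_palindromic = spacer == reverse_complement(spacer)
    return left, spacer, right, mismatches, spacer_is_palindromic
```

For this sequence the arms differ at a single position, and the spacer (TCTAGAAA) reads TTTCTAGA on the other strand, so the site is asymmetric.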

The Flp recombination reaction follows the typical Int family
recombination pathway (Figure 2A). The reaction is initiated by the
synapsis of two DNA substrates, each bound by two Flp monomers. To
appreciate the geometry of the recombination complex, it is useful to
divide each substrate into a left DNA arm (corresponding to 1a) and a
right DNA arm (corresponding to 1'a). The results from a number of
studies are most easily accommodated by arranging the two substrates,
L1R1 and L2R2, in an antiparallel configuration, with L1 and L2 (and
likewise R1 and R2) placed at opposite ends of the synaptic structure.
The bend introduced into each substrate results from the interaction of
the bound Flp monomers. This left-to-right dimeric interaction is
essential for assembling the Flp active site. During this functional
interaction, one Flp monomer orients the scissile phosphate using an
active site cleft that includes three invariant Int family residues,
Arg191, His305, and Arg308. This phosphate is then attacked by Tyr343,
the fourth invariant family residue, supplied by the second Flp monomer,
to break the DNA strand (see Figure 2B). Cleavage produces a
3'-phosphotyrosine bond and a 5'-hydroxyl group in each substrate at one
end of the spacer (the left end in Figure 2A). This transesterification
mechanism, as opposed to a hydrolytic cleavage mechanism, conserves the
energy of the phosphodiester bond for the strand-joining reaction.

Figure 1 In a schematic representation of the 2-micron plasmid, the 599 bp inverted repeat (shown by the parallel lines) divides the circular genome into two unique regions (circular arcs at the left and at the right). Flp-mediated recombination at the FRT sites is responsible for interconversion between forms (A) and (B) by DNA inversion. The products of the REP genes, the Rep1 and Rep2 proteins, together with the STB locus are responsible for plasmid partitioning at cell division. ORI is the plasmid replication origin.

Figure 2 (A) Recombination between two DNA substrates, L1R1 and L2R2, is initiated by strand cleavage and exchange at one end of the spacer (at the left end in the scheme shown here). The interactions between recombinase monomers bound to each substrate (the two shaded monomers in substrate 1, and the two unshaded monomers in substrate 2) are responsible for the first strand cleavage and exchange reaction. The resulting Holliday intermediate is resolved into the recombinants, L1R2 and L2R1, by cleavage and exchange at the right end of the spacer. During the resolution step, the catalytic dimers are formed between Flp monomers bound on the left and right arms of partner substrates. Each ‘active dimer’ is constituted by a darkly shaded and a lightly shaded monomer. Note that, throughout the reaction pathway, a cyclic peptide connectivity is maintained among the four Flp monomers. The catalytically active and inactive associations between pairs of Flp monomers are indicated by the solid and dashed arcs, respectively. The switch in the configuration of the active Flp dimers involves the isomerization of the Holliday junction from H1 to H2. These junctions have an approximate fourfold symmetry, but are strictly only twofold symmetric. The circles and split arrowheads indicate the 5' and 3' ends, respectively, of DNA strands. (B) The strand exchange reaction at the initiation and termination steps of recombination involves the formation of a covalent protein-DNA intermediate in which the 3'-phosphate end of a cleaved strand is linked to the active site tyrosine of Flp.

Attack by the 5’-hydroxyl groups on the phospho- 
tyrosine bonds across substrates (Figure 2B) results 
in the first exchange of strands and the formation of the 
Holliday intermediate (H1; Figure 2A). The junction 
rearranges (isomerizes) to the H2 form in preparation 
for its resolution, and thus the termination of re- 
combination. In H2, the two Flp dimers constituted 
by the R2 and L1 arms and the R1 and L2 arms are 
in the proper geometric configuration for strand 
cleavage and exchange at the right end of the spacer. 
The outcome of Holliday resolution is the formation of the two
reciprocally recombinant products, L1R2 and L2R1. The mechanism of the
reaction as outlined in Figure 2A is supported by the X-ray structure of
a Flp-DNA complex solved recently by P. Rice and colleagues at the
University of Chicago (Figure 3).
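The strand bookkeeping of this two-step pathway can be followed with labels alone. The toy Python sketch below is not a structural model, and its function names are hypothetical: each strand is written as the pair of arms it joins through the spacer. The first exchange swaps one pair of strands and leaves a Holliday intermediate in which the two strands of each former duplex disagree about their right arm; the second exchange, after isomerization, swaps the other pair and yields the reciprocal recombinants L1R2 and L2R1.

```python
from typing import Tuple

Strand = Tuple[str, str]      # (left-arm label, right-arm label) joined through the spacer
Duplex = Tuple[Strand, Strand]  # (strand exchanged first, strand exchanged second)


def exchange(a: Strand, b: Strand) -> Tuple[Strand, Strand]:
    """Cut both strands at the same spacer end and rejoin each left segment
    to the partner's right segment (the phosphotyrosine attack across substrates)."""
    return (a[0], b[1]), (b[0], a[1])


def two_step_exchange(sub1: Strand, sub2: Strand) -> Tuple[Tuple[Duplex, Duplex], Tuple[Duplex, Duplex]]:
    """Return (Holliday intermediate, products) for two parental duplexes."""
    first1, second1 = sub1, sub1  # both strands of a parent carry its own arms
    first2, second2 = sub2, sub2
    # Step 1: exchange at one end of the spacer -> Holliday intermediate (H1).
    first1, first2 = exchange(first1, first2)
    holliday = ((first1, second1), (first2, second2))
    # Step 2 (after isomerization to H2): exchange at the other end of the spacer.
    second1, second2 = exchange(second1, second2)
    products = ((first1, second1), (first2, second2))
    return holliday, products
```

In the intermediate each pair of strands is mismatched (the branched junction); after the second exchange both strands of each product agree, giving L1R2 and L2R1.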

In addition to the Arg-His-Arg triad and the tyrosine nucleophile, two
other amino acids in Flp (Lys223 and Trp330) are thought to be active site
residues that assist or participate directly in catalysis. Amino acid
sequence comparisons indicate that the conserved residue corresponding to
Trp330 of Flp is a histidine in most Int family members. The lysine
corresponding to Lys223 of Flp is located in a three-stranded β-sheet
region in solved X-ray structures of the Int family recombinases. An
equivalent lysine is also seen in the crystal structures of human and
vaccinia topoisomerases. The overall similarity between the Int family
recombinases and type IB topoisomerases in their active site architecture
is consistent with the common chemical mechanism they employ for strand
cutting.

Figure 3 In the structure of a Flp-DNA complex, two Flp monomers execute strand cleavage by providing the tyrosine nucleophile, while the other two monomers assist cleavage by orienting the target phosphodiester bonds. Note that recombination is completed in two steps, each step exchanging one pair of strands between the DNA partners. Hence the ‘cleavage-active’ and ‘cleavage-assisting’ monomers switch roles during a recombination event. The Flp-DNA structure has a roughly fourfold symmetry and is consistent with the reaction pathway drawn in Figure 2A. (From P. Rice, University of Chicago.)


Relevance of Flp Recombination to 
Plasmid Physiology 


The Flp recombination reaction serves an important function in the
physiology of the 2-micron plasmid. In the event of a stochastic drop in
copy number, caused, for example, by a missegregation event, the
amplification system constituted by the Flp protein and the FRT sites is
brought into play to restore the copy number quickly to its steady-state
value. Thus, the Flp recombination system, together with the plasmid
stability system constituted by the Rep proteins and STB (see Figure 1),
provides a dual strategy to ensure the persistence of the plasmid as a
benign parasite genome.

A clever model for how the recombination reaction can be used to mediate
plasmid amplification has been proposed by Bruce Futcher. The essential
features of the ‘Futcher model’ are illustrated in Figure 4. The model
depends critically on the asymmetric location of the plasmid replication
origin (ORI) with respect to the FRT sites (see Figure 1), and on the
bidirectional mode of replication by which plasmid molecules are
duplicated during the yeast cell cycle. One of the two replication forks
initiated at the ORI sequence will traverse the proximal FRT site well
before the second fork crosses the distal FRT site. Suppose that a Flp
recombination reaction (as illustrated in Figure 2) occurs within a
replicating plasmid when only the proximal FRT site has been duplicated.
The result is the inversion of one fork with respect to the other.
Instead of meeting head-on and terminating replication, as they do during
a normal cell cycle, the forks now chase each other around the plasmid
contour, spinning out multiple copies of it. The tandemly linked copies
can be reduced to monomeric units, also by Flp-mediated recombination.
This reductional recombination occurs between alternate (as opposed to
adjacent) FRT sites, which are in direct (head-to-tail) orientation.
Thus, the recombinational inversion of a bidirectional replication fork
allows a single initiation event (dictated by the cell cycle control of
replication) to be transformed into a multiple plasmid copying mechanism.
Note that amplification can be terminated when a second recombination
event reinverts the forks, thereby restoring their bidirectional movement.
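The orientation rule for reductional recombination can be checked with a toy model. In the Python sketch below, the names are hypothetical, and the amplicon is treated as a free linear tandem array rather than a structure still attached to the replicating template. Each plasmid unit carries the two FRT copies of the inverted repeat in head-to-head orientation. Adjacent FRT sites are therefore inverted relative to each other, whereas alternate sites are direct repeats, so recombination between alternate sites releases exactly one unit at a time.

```python
from typing import List, Tuple

Element = Tuple[str, int]  # (label, orientation): +1 forward, -1 reverse

# One hypothetical plasmid unit: the two FRT copies of the inverted repeat
# (head-to-head), separated by the left (UL) and right (UR) unique regions.
UNIT: List[Element] = [("FRT", 1), ("UL", 1), ("FRT", -1), ("UR", 1)]


def build_amplicon(copies: int) -> List[Element]:
    """A tandem array of `copies` plasmid units, as spun out by chasing forks."""
    return [element for _ in range(copies) for element in UNIT]


def frt_positions(molecule: List[Element]) -> List[int]:
    return [i for i, (label, _) in enumerate(molecule) if label == "FRT"]


def resolve_to_monomers(amplicon: List[Element]) -> List[List[Element]]:
    """Release one unit at a time by recombination between alternate FRT sites."""
    molecule = list(amplicon)
    monomers = []
    while len(frt_positions(molecule)) > 2:
        sites = frt_positions(molecule)
        first, alternate = sites[0], sites[2]
        # Adjacent sites are inverted: recombining them would only invert DNA.
        assert molecule[sites[0]][1] != molecule[sites[1]][1]
        # Alternate sites are direct repeats: recombination excises one unit.
        assert molecule[first][1] == molecule[alternate][1]
        monomers.append(molecule[first:alternate])
        molecule = molecule[:first] + molecule[alternate:]
    monomers.append(molecule)  # the last unit left behind
    return monomers
```

An array of four units is resolved into four copies identical to `UNIT`, each retaining both FRT sites of the inverted repeat.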

Although the Futcher model has not been exhaustively verified, it has been
clearly demonstrated that the act of recombination per se is essential
for amplification. When the Flp protein is mutated to a catalytically
inactive variant, or when the FRT site is altered to a
recombination-incompetent state, a plasmid substrate present at an
initially low copy number fails to amplify.


Control of Copy Number Amplification 


Under steady-state growth conditions, when the plasmid is at its normal
copy number, the amplification system is unnecessary and may even be
disadvantageous to the 2-micron plasmid. A runaway increase in plasmid
copy number by unregulated expression of Flp would be harmful to the
host, and hence, indirectly, to the benign parasite it harbors. It is
therefore logical to suppose that the amplification system is tightly
controlled, at the level of Flp expression, at the level of the
recombination reaction, or both. For the system to act beneficially and
efficiently, it must not only be silenced at normal copy number but also
be rapidly brought into action when there is a downward fluctuation in
copy number. Preliminary genetic evidence suggests that the 2-micron
circle Rep proteins may provide an indirect readout of the plasmid levels
in a cell and act as negative regulators of FLP gene expression in a
concentration-dependent manner. However, the details of this regulatory
circuit remain to be resolved.

Figure 4 Replication of the 2-micron circle is initiated at the replication origin (ORI; located close to one FRT site and away from the other) and proceeds bidirectionally. A plasmid molecule is restricted to one round of replication during a normal cell cycle. In the amplification mode proposed by the Futcher model, a Flp-mediated recombination event (indicated by the DNA crossover) inverts one fork with respect to the other. As the two forks chase each other around the circular template, multiple tandem copies of the plasmid are made from the single replication event initiated at ORI. Amplification can be terminated by a second recombination event that redirects the forks toward each other. In the example shown, n + 1 copies of the plasmid are made before replication is terminated. A single plasmid unit in the ‘amplicon’ is indicated by the square brackets, with the arrows representing the 2-micron circle inverted repeats. After resolution to individual copies, there would be a total of n + 2 plasmid molecules.
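The two requirements of this control system are that amplification stay off at normal copy number and switch on rapidly after a drop. Both can be captured by a toy on/off feedback model. The Python sketch below is not the actual regulatory circuit, which remains unresolved. The set point of 60 echoes the steady-state copy number given earlier, while the fixed burst of extra copies and the sharp switch are assumptions chosen for illustration.

```python
from typing import List


def simulate_copy_number(start: int, set_point: int = 60, burst: int = 10,
                         cycles: int = 15) -> List[int]:
    """Toy on/off feedback model of plasmid copy number (hypothetical parameters).

    Each cycle, every plasmid replicates once (x2). If the copy number is below
    the set point, Flp-driven amplification adds `burst` extra copies. The cell
    then divides and each daughter receives half (integer division)."""
    history = [start]
    copies = start
    for _ in range(cycles):
        # Amplification is silenced at (or above) normal copy number.
        amplification = burst if copies < set_point else 0
        copies = (2 * copies + amplification) // 2
        history.append(copies)
    return history
```

Starting from 5 copies, the model gains 5 copies per cycle until it reaches 60 and then holds there, while a cell that starts at 60 never triggers amplification. A graded (for example, concentration-dependent) repression function could replace the sharp threshold without changing this qualitative behavior.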


Site-Specific Recombination in 
Evolution: the Means to Many Ends 


The circular geometry of the 2-micron plasmid, its 
structural organization, and its genetic potential are 
all part of the elegant biological design of a successful 
selfish DNA element. One central outcome from this 
molecular architecture is that a carefully controlled 
site-specific recombination event can be exploited to 
promote replicative amplification of the genome. It is 
not surprising therefore that circular plasmids found 
in yeasts that are rather distantly related to Saccharo- 
myces are structurally similar to the 2-micron plasmid 
(despite their large diversity in nucleotide sequences), 
and harbor their own individual site-specific recombin- 
ation systems. Furthermore, the observed kinship 
among site-specific recombination systems found in 
phage, bacteria, and yeasts attests to the axiom that 
evolution is adept at reutilizing or retooling the same 
basic biochemical strategy to bring about widely varied 
end results under distinct biological contexts. 
The FMS oncogene was identified on the basis of its homology to the v-fms
gene, transduced from the Susan McDonough strain of feline sarcoma virus
(SM-FeSV) (McDonough et al., 1971). Sequencing showed that the gene codes
for a transmembrane glycoprotein whose C-terminal region is homologous to
protein tyrosine kinases. Human FMS was isolated in 1983 and localized to
chromosome 5. FMS was shown to code for the colony-stimulating factor-1
receptor (CSF-1R) (Sherr et al., 1985). It is expressed in monocytes
produced by the bone marrow, where it is required for monocytic
differentiation and for the survival of macrophages, and is also
expressed in the spleen, liver, brain, and placenta. v-fms exhibits
constitutive tyrosine kinase activity in the absence of ligand and
transforms cells. The differences between oncogenic v-fms and normal
cellular FMS are a number of scattered point mutations and the
replacement of 50 amino acids at the C-terminus of the human gene with 11
unrelated amino acids in the viral gene. The effects of FMS mutations are
cell type dependent. In NIH3T3 mouse fibroblast cells, the sequence
responsible for transformation was localized to codon 301 in the
extracellular domain. Mutations at the regulatory position 969 enhance
transformation mediated by mutations in codon 301. However, in
hematopoietic FDCP-1 cells, the 969 mutations transform the cells,
rendering them anchorage-independent and tumorigenic in nude mice,
whereas the 301 mutant construct is not transforming. Cells infected with
the 969 mutant construct cannot be saturated with concentrations of CSF-1
that saturate the wild-type receptor (McGlynn et al., 1998). Screening of
patients with myeloid (pre)leukemia for these mutations revealed that
mutations at codon 969 were more frequent than those at codon 301 (Ridge
et al., 1990; Tobal et al., 1990), suggesting that the FMS oncogene may be
involved in the pathogenesis of this disease (see Gallagher et al., 1997
and references therein).
Single-stranded DNA with sequences that allow it to form stable secondary
structures by folding back upon itself and forming hydrogen bonds between
complementary bases.
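Fold-back is possible wherever a stretch of the strand is followed, after a short loop, by its own reverse complement (an inverted repeat), so that the two stretches can base-pair as a hairpin stem. The Python sketch below scans for exact stems of this kind. The function name and the default stem and loop lengths are illustrative assumptions, and the sketch ignores mismatches and the thermodynamic stability that actually determines whether a hairpin forms.

```python
from typing import List, Tuple

COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq: str) -> str:
    """Reverse complement of an A/C/G/T sequence."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def find_hairpins(seq: str, stem: int = 6, min_loop: int = 3,
                  max_loop: int = 8) -> List[Tuple[int, int]]:
    """Return (stem_start, loop_length) for every exact stem of `stem` bases that is
    followed, after a loop of min_loop..max_loop bases, by its reverse complement."""
    seq = seq.upper()
    hits = []
    for start in range(len(seq) - 2 * stem - min_loop + 1):
        partner = reverse_complement(seq[start:start + stem])
        for loop in range(min_loop, max_loop + 1):
            downstream = start + stem + loop
            if downstream + stem > len(seq):
                break
            if seq[downstream:downstream + stem] == partner:
                hits.append((start, loop))
    return hits
```

For example, GCGCAA followed by a TTTT loop and TTGCGC is reported as a single 6 bp stem with a 4-base loop, whereas a homopolymer such as poly(A) cannot fold back and yields no hits.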
Follicular lymphoma is a neoplasm of germinal center B cells that
recapitulates the histology of reactive B-cell follicles. It is one of the
commonest non-Hodgkin lymphomas in Western countries. Follicular lymphoma
is characterized by the translocation t(14;18)(q32;q21), which leads to
overexpression of the apoptosis-inhibitory protein bcl-2. It is clinically
indolent but ultimately incurable.


See also: Cancer Susceptibility 
Footprinting is a technique used to identify the binding site of, for
example, a protein on a nucleic acid sequence by virtue of the protection
that the bound protein gives its binding site against nuclease attack.


See also: Nuclease 
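In a footprinting experiment, the bound protein appears as a gap in the nuclease cleavage ladder compared with naked DNA. The Python sketch below locates such gaps from per-position band intensities. The function name, the protection threshold (0.5), the minimum footprint length, and the example intensities are all hypothetical, and real data would first need normalization for differences in lane loading.

```python
from typing import List, Sequence, Tuple


def protected_regions(naked: Sequence[float], bound: Sequence[float],
                      threshold: float = 0.5, min_length: int = 3) -> List[Tuple[int, int]]:
    """Return half-open (start, end) position ranges where cleavage of the
    protein-bound DNA falls below `threshold` times that of naked DNA for at
    least `min_length` consecutive positions (a candidate footprint)."""
    if len(naked) != len(bound):
        raise ValueError("both lanes must cover the same positions")
    regions = []
    start = None
    for i, (n, b) in enumerate(zip(naked, bound)):
        protected = n > 0 and b / n < threshold
        if protected and start is None:
            start = i  # a run of protected positions begins
        elif not protected and start is not None:
            if i - start >= min_length:
                regions.append((start, i))
            start = None
    if start is not None and len(naked) - start >= min_length:
        regions.append((start, len(naked)))  # run extends to the last position
    return regions
```

With illustrative intensities of 10 at every position of the naked lane and a bound lane reading 10, 9, 10, 2, 1, 1, 2, 10, 9, 1, 10, 10, the sketch reports positions 3 to 6 as the footprint and ignores the isolated weak band at position 9.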
The first of the many significant contributions of Charles Ford
(1912-1999) to mammalian cytogenetics was his involvement in the 1956
correction of the human diploid chromosome number. For over 30 years
there had been debate as to whether it was 47 or 48 before Ford and
Hamerton (1956) unequivocally showed the presence of 23 pairs of
chromosomes at meiosis in direct preparations obtained from the germ
cells of normal men, and so corroborated the mitotic counts of 46
obtained by Tjio and Levan in the same
year. Ford, with others, went on to show correlations 
between aberrant chromosome numbers and pheno- 
type in known human syndromes such as Turner 
(XO) and Klinefelter (XXY). These revelations led 
to a worldwide surge of interest in human cytogen- 
etics but also gave rise to increasing conflict in the 
reporting of observations. To resolve the disparities, 
Ford was instrumental in convening a study group to 
decide on an acceptable international nomenclature 
system. Their recommendations were published in 
1960 as the Denver Report (after the venue of the 
meeting) and this has served as a model for nomen- 
clature and further updates to this day. 

Although widely recognized as one of the initiators 
of a golden era of mammalian cytogenetics, before 
1956 Ford was exclusively involved with plant mater- 
ial. After graduating in botany from King’s College, 
London, he studied the chromosome translocation 
complexes in the genus Oenothera before departing 
in 1938 for what became a war-interrupted seven-year 
period as the geneticist at the Rubber Research 
Scheme in the then Ceylon (Sri Lanka). An increasing 
postwar concern about the genetically damaging effects of radiation and
radiomimetic chemicals saw his recruitment to work with one of the
classic tools of chromosome breakage study, the root tips of Vicia faba. He
started at the Atomic Energy Laboratory at Chalk 
River, Canada and then returned to the UK to head 
the Cytogenetic Section at the newly founded Medical 
Research Council Radiobiology Unit at Harwell. 
Here a failure of root-tip growth (subsequently 
found to result from the toxic effect of copper leaching 
from the new pipework) played a significant role in his 
destiny. While awaiting new pipework, he experimented
with the more readily available supply of animal tissue 
and perfected the technical methods that were used to 
correct the human chromosome number. At the same 
time he became aware of the potential value to radio- 
biology of combining the use of these new methods 
with the expertise of other scientists both within and 
outside the Unit. Mice with induced chromosome 
aberrations were produced by the geneticists and 
these yielded valuable information in assessing genetic 
risk and also in the study of the effects of gross genome 
imbalance on survival. One of these aberrations, an 
unequal reciprocal translocation, proved additionally 
useful in that it presented a derived chromosome 
much smaller than the smallest normal chromosome. 
In the then absence of any convenient cell marker, 
Ford realized its value in tracking donor cell contributions in the
ongoing experiments by the immunologists to ‘rescue’ lethally irradiated mice by bone
marrow injection. The small chromosome, named 
T6, was and is still used worldwide as a convenient 
cell marker and the early experiments laid the founda- 
tions of the basic principles of immunosuppression 
and tissue transplantation such as for human bone 
marrow replacement. 

In over 20 years of involvement with animal cyto- 
genetics, first at Harwell and then at the University of 
Oxford, Ford worked with the chromosomes of innu- 
merable species in a variety of situations. Only a few 
examples can be cited. The chromosome marker studies 
were continued in analyzing cellular contributions in 
mouse chimeras ‘created’ by morula fusion or blasto- 
cyst injection. The chimeras also produced insights 
into the masculinizing effect of the mammalian 
Y chromosome in XX:XY combinations, an interest 
that was extended into studies of the natural secondary 
chimeras found in cattle (freemartins) and marmoset 
monkeys. An earlier interest in the Robertsonian 
translocation systems discovered in the common 
shrew broadened with the discovery of similar systems 
in feral mice and their property to induce high levels of 
nondisjunction and zygotic imbalance when crossed 
to laboratory mice. At the same time, human cyto- 
genetics was not neglected with such studies as meiosis 
in XYY males and the chromosomal screening of 
cultured blood from athletes competing in the Mexico 
City Olympic Games. 

Ford was renowned for his inspirational enthu- 
siasm in all branches of cytogenetics and his many 
contributions were acknowledged by his election 
to a Fellowship of the Royal Society of London in 
1965 and in the compilation in 1978 of a special issue 
of an international journal in honor of his 65th birth- 
day. The contents, by friends and associates, reflect 
many of his interests and the esteem in which he was held.
Forward mutations are those that inactivate a wild-type gene.


See also: Wild-Type (WT) 


Fosmid 
A fosmid is a low-copy-number cosmid vector based
on the Escherichia coli F factor, which is present in 
only a few copies in each bacterial cell. Eukaryotic 
DNA cloned into vectors that are present in many 
copies per cell is sometimes unstable, tending to 
undergo deletion or rearrangement. Unstable inserts 
of this type can often be stably propagated as fosmid 
clones. 


See also: F Factor 
The founder effect implies that a small number of
individuals have a significant and lasting effect on the 
gene pool of a population. Since the genes can migrate 
only when carried in or out of a location by indi- 
viduals, the founder effect is linked to the history of 
a population. Typically, the genes in the current popu- 
lation originated from a well-defined, restricted group 
of individuals that became separated from a larger 
initial population and migrated to a new location. 
The gene pool of the migrating population represents 
a small sample from the original population, since 
only a small number of the original population 
migrated. The migration event is an example of what 
is called a bottleneck in population genetics. The selec- 
tion of particular alleles of the genes that moved to the 
new population is entirely a matter of chance. 
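Because the founders are a random sample of the original population, the allele frequency they carry can be simulated by drawing chromosomes at random. The sketch below is a hypothetical illustration. The source-population frequency (p = 0.005), the founder number (20 diploid individuals), and the function names are assumptions, and the model covers only the single sampling event of the bottleneck, not subsequent drift or selection.

```python
import random
from typing import List


def founder_allele_frequency(p: float, n_founders: int, rng: random.Random) -> float:
    """Draw the 2 * n_founders chromosomes of a diploid founder group at random
    from a large source population in which the allele has frequency p."""
    chromosomes = 2 * n_founders
    carriers = sum(1 for _ in range(chromosomes) if rng.random() < p)
    return carriers / chromosomes


def sample_bottlenecks(p: float, n_founders: int, trials: int, seed: int = 0) -> List[float]:
    """Repeat the founding event `trials` times and collect the founder frequencies."""
    rng = random.Random(seed)  # seeded so that runs are reproducible
    return [founder_allele_frequency(p, n_founders, rng) for _ in range(trials)]
```

With these assumed values, roughly four out of five simulated founder groups lose the allele entirely, while the rest carry it at 0.025 (one copy in 40 chromosomes) or higher, a fivefold or greater enrichment by chance alone.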
Identification of evidence of the founder effect at the gene level does
not necessarily require DNA analyses of the population. The founder effect can
become evident through the observation of some dis- 
eases. Some populations show an exceptionally high 
prevalence of recessive diseases, which are rare else- 
where. The frequency of a recessive disease allele 
might have been very low in the initial population, 
but in a small subset comprising the new migrating 
population this allele might have a relatively high 
frequency due to the small number of founders. Its 
frequency thus becomes markedly higher than in the initial population.
(Recessive disease alleles are a good example: because they ‘remain
silent’ in heterozygous carriers, they are subject to less selection
pressure, and their prevalence therefore reflects the history of the
founders of the population better than the prevalence of dominant disease
alleles, which can be selected against because they are expressed in the
carrier's disease phenotype.)
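The arithmetic behind this enrichment is simple. Suppose, as an illustration, that the population is in Hardy-Weinberg proportions (an idealization not stated in the text) and that the founder group consists of 20 diploid individuals. A single carrier then contributes one disease allele out of 40 chromosomes (q = 0.025), so about 1 in 1,600 births would be affected. For comparison, a hypothetical source-population frequency of q = 0.001 gives about 1 in a million. A minimal Python sketch:

```python
def allele_frequency_in_founders(carrier_chromosomes: int, founders: int) -> float:
    """Frequency of an allele carried on `carrier_chromosomes` of the 2N founder chromosomes."""
    return carrier_chromosomes / (2 * founders)


def recessive_incidence(q: float) -> float:
    """Expected fraction of affected (homozygous) births under Hardy-Weinberg proportions."""
    return q * q
```

Calling `recessive_incidence(allele_frequency_in_founders(1, 20))` reproduces the 1-in-1,600 figure above. The ratio of the two incidences (625-fold here) grows as the square of the enrichment in allele frequency, which is why founder effects are most conspicuous for recessive disorders.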

Good examples of populations exhibiting founder 
effect are small, isolated, or remote populations, such 
as the Sardinians or Finns, which exhibit a uniquely 
high prevalence of some disease genes and a very low 
prevalence of others. Some 30 recessive diseases are 
more common in Finland than elsewhere in the world 
and diseases like cystic fibrosis (CF) and phenylketo- 
nuria (PKU), which are common in other Caucasian 
populations, are extremely rare. Characterization of the molecular
background of the diseases enriched in the Finnish population showed that
they exhibit striking locus and allelic homogeneity. Although some of
these diseases, such as Meckel syndrome (an early lethal malformation
syndrome) or PLOSL (an early adulthood-onset progressive dementia), show
locus heterogeneity globally (that is, multiple genes can cause the same
clinical phenotype), all Finnish patients share the same chromosomal
locus. Furthermore, one major mutation has been systematically identified
in the vast majority of these diseases, with the prevalence of a single
mutation as high as 98% (Table 1). These findings strongly support the
hypothesis that each founder mutation was brought to this population in
the genome of a single immigrant generations ago, and that the Finnish
patients living today originate from one common ancestor. A similar
founder effect has been demonstrated in the French Canadian population.
One mutation causing tyrosinemia type I (due to deficiency of the enzyme
fumarylacetoacetate hydrolase) was found in 90% of disease alleles,
whereas this mutation accounts for only 28% of the tyrosinemia alleles in
the rest of the world.

The founder effect can be further exemplified by
the fact that some Finnish disease alleles show major
regional variations in their population frequencies, as
well as in the number of affected individuals. This is
the result of an internal migration after initial
settlement in the country. Some 2000 years, or 100
generations, ago, small immigrant groups settled in
Finland. Later, small subgroups of this initial population
moved to still more remote regions of Finland and
established small population subisolates. Perhaps
only 20-40 families moved to remote areas 200-300
years ago, and the founder effect and chance (genetic
drift) resulted in the enrichment of some disease genes
in these subisolates.

Table 1 Examples of disease mutations demonstrating the founder effect in the Finnish population

Disease (OMIM number) | Defective protein | Major mutation, occurrence in Finland (%)
APECED (240300) | Novel nuclear protein | 82
Aspartylglucosaminuria (AGU, 208400) | Aspartylglucosaminidase | 98
Congenital chloride diarrhea (CCH, 214700) | Product of the gene downregulated in adenoma | 100
Congenital nephrosis (CNF, 256300) | Nephrin | 78
Diastrophic dysplasia (DTD, 222600) | Sulfate transporter | 90
Familial amyloidosis, Finnish type (FAF, 105120) | Gelsolin | 100
Gyrate atrophy of choroid and retina (HOGA, 258870) | Ornithine aminotransferase | 85
Hypergonadotrophic ovarian dysgenesis (ODG1, 233300) | Follicle-stimulating hormone receptor | 100
Infantile neuronal ceroid lipofuscinosis (INCL, 256730) | Palmitoyl protein thioesterase | 98
Lysinuric protein intolerance (LPI, 222700) | y+L amino acid transporter | 100
Nonketotic hyperglycinemia (NKH, 238300) | Glycine cleavage system, protein P | 70
Progressive myoclonus epilepsy (PME, 254800) | Cystatin B | 96
Retinoschisis (RS, 312700) | XLRS1 | 70
Sialic acid storage disease (SIASD, 268740) | Novel transporter | 94
Finnish variant of late infantile neuronal ceroid lipofuscinosis (vLINCL, 256731) | Novel membrane protein | 94

A founder effect involving a single ancestral mutation
makes the mapping and identification of disease genes
comparatively straightforward. Genome-wide searches
for disease genes are based on identifying a chromosomal
region containing genetic markers that co-segregate
with the disease because they lie close to the mutated
gene. Ordinarily, families with multiple affected
children are needed to reveal this co-segregation. In
the presence of a founder effect, however, mapping
strategies based on the analysis of affected individuals
alone can be applied. Monitoring of shared marker
alleles among affected individuals has been highly
successful in identifying genes and alleles causing
inherited diseases in genetic isolates. The shared
chromosomal regions indicate that the alleles are
identical by descent (IBD), since they derive from a
common ancestor. In the case of recessive diseases,
this strategy has been called homozygosity mapping.
Typically, for disease alleles showing a founder effect,
linkage disequilibrium (the nonrandom association of
alleles) is seen over a long genetic interval flanking
the disease gene. The length of this interval is
negatively correlated with the number of generations
that have passed since the founder event and with the
expansion rate of the population.
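The two ideas in the paragraph above can be sketched in Python. This is a minimal illustration under simplifying assumptions (markers in map order, genotypes given as allele pairs, a single ancestral haplotype, no genotyping error): `shared_homozygous_run` looks for the longest stretch of consecutive markers at which every affected individual is homozygous for the same allele, the signal homozygosity mapping relies on, and `ld_remaining` applies the standard textbook decay of linkage disequilibrium, D_g = D_0(1 − θ)^g, where θ is the recombination fraction between marker and mutation and g the number of generations. Both function names and the genotype data are hypothetical.

```python
from typing import List, Tuple

Genotype = Tuple[str, str]  # the two alleles seen at one marker


def shared_homozygous_run(affected: List[List[Genotype]]) -> Tuple[int, int]:
    """Longest run of consecutive markers at which every affected individual
    is homozygous for the same allele, as (start, end) with end exclusive.
    Returns (0, 0) when no marker qualifies."""
    n_markers = len(affected[0])
    best = (0, 0)
    run_start = None
    for m in range(n_markers + 1):  # the extra step closes a run at the end
        shared = False
        if m < n_markers:
            a, b = affected[0][m]
            shared = a == b and all(ind[m] == (a, a) for ind in affected)
        if shared and run_start is None:
            run_start = m
        elif not shared and run_start is not None:
            if m - run_start > best[1] - best[0]:
                best = (run_start, m)
            run_start = None
    return best


def ld_remaining(theta: float, generations: int) -> float:
    """Fraction of the founder's marker-disease association left after
    `generations` generations, using D_g = D_0 * (1 - theta) ** g."""
    return (1.0 - theta) ** generations


# Hypothetical genotypes of three affected individuals at six ordered markers.
affected = [
    [("1", "2"), ("3", "3"), ("5", "5"), ("7", "7"), ("1", "1"), ("2", "4")],
    [("1", "1"), ("3", "3"), ("5", "5"), ("7", "7"), ("2", "2"), ("2", "2")],
    [("2", "2"), ("3", "3"), ("5", "5"), ("7", "7"), ("1", "1"), ("4", "4")],
]
print(shared_homozygous_run(affected))    # (1, 4): markers 1-3 are shared
print(round(ld_remaining(0.01, 100), 3))  # 0.366
```

With θ = 0.01 (a marker roughly 1 cM away) and about 100 generations, roughly 37% of the original association survives; more distant markers retain less, which is why the shared interval shrinks as the founder event recedes.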

The founder effect has been invoked to explain 
the exceptionally high prevalence of some worldwide 
genetic disorders in specific populations. Good ex- 
amples are cystic fibrosis in Northern Europeans and 
Tay-Sachs disease in Eastern European Jewish popu- 
lations. Furthermore, in some genetic isolates, such as 
in Sardinia, the prevalence of some common diseases 
like type I diabetes is exceptionally high. One hypo- 
thesis for this phenomenon is a founder effect. This 
concept of limited variation in the genetic background 
caused by a founder effect has raised significant inter- 
est in those projects designed to map genes contribut- 
ing to complex diseases using population isolates. 
Examples are studies of asthma in Tristan da Cunha 
or schizophrenia in Palau, Micronesia. 

The founder effect has some practical consequences 
for DNA testing and disease diagnostics. If one mutation
is found in 90% of the disease alleles, diagnostic
DNA tests providing high specificity and reliability 
are easy to develop. This is different from tests for 
mutations in other, more heterogeneous populations, 
in which the value of DNA diagnostics has remained 
limited due to the high number of disease mutations. 
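The diagnostic advantage can be quantified with a short calculation. The sketch below assumes a recessive disease and a test that detects only the single major mutation, present on a fraction f of disease alleles; it further assumes the patient's two disease alleles are independent draws (an assumption: consanguinity or subisolate structure would raise the chance of homozygosity). Under these assumptions the test finds both mutant alleles with probability f² and at least one with probability 1 − (1 − f)². The function name is hypothetical.

```python
def detection_probabilities(f: float) -> dict:
    """Probabilities for a patient with a recessive disease when a DNA test
    detects only one mutation that is present on a fraction f of disease
    alleles. Assumes the patient's two disease alleles are independent draws."""
    if not 0.0 <= f <= 1.0:
        raise ValueError("f must lie between 0 and 1")
    neither = (1.0 - f) ** 2
    return {
        "both_alleles": f * f,  # patient homozygous for the tested mutation
        "at_least_one": 1.0 - neither,
        "neither": neither,
    }


# 0.9 mirrors the founder-population figure in the text; 0.28 mirrors the
# tyrosinemia allele share reported for the rest of the world.
for f in (0.9, 0.28):
    p = detection_probabilities(f)
    print(f"f={f}: both={p['both_alleles']:.3f}, "
          f"at least one={p['at_least_one']:.3f}")
```

With f = 0.9 both alleles are identified in about 81% of patients and at least one in about 99%; with f = 0.28 the corresponding figures fall to about 8% and 48%.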
Fragile Chromosome Site
Fragile sites are specific points on chromosomes that
show nonrandom gaps or breaks when the cells from
which the chromosomes were prepared have been 
exposed to a specific chemical agent or condition of 
tissue culture. The fragile site is an area of chromatin 
that is not compacted when seen at mitosis. 

Fragile sites are classified as rare (on less than 1 in 
40 chromosomes) or common (on all chromosomes) 
and by the conditions under which they are seen. 
There are more than 120 recognized fragile sites in 
the human genome (Sutherland et al., 1996). 
Fragile X Syndrome
Fragile X syndrome is the most common form of
familial mental retardation. It is so called because it


is associated with a fragile site (FRAXA) on the end of 
the long arm of the X chromosome. The most promin- 
ent feature of the condition is moderate to severe 
mental retardation in most affected males and milder 
intellectual deficits in a proportion of females. In addition
to the mental retardation, there is a pattern of subtle
minor malformations, again more evident in
males than in females.

The syndrome was first described, together with the
fragile X chromosome, in 1969, but its relatively common
occurrence (about 1 in 4000 boys and 1 in 6000 girls)
was not recognized until the early 1980s. This was
largely because it was discovered only in 1977 that
chromosome studies needed to be performed in a
specific way for the fragile X chromosome to be seen.
It was recognized in the mid-1980s that fragile X
syndrome had anomalous inheritance patterns and was
not a simple X-linked recessive disorder. The reasons
for this remained unknown until the molecular basis of
the disease was elucidated in 1991. The fragile site was
shown to be due to expansion of a naturally occurring
polymorphic CCG trinucleotide repeat in the 5’
untranslated region of the FMR1 gene. The number of
copies of the repeat can change on transmission from
parent to child, and when the number exceeds about
230, expression of the FMR1 gene is extinguished; this
is the molecular cause of fragile X syndrome.


Clinical Features 


There are many physical and behavioral features of 
fragile X syndrome. Those which occur in more than 
50% of males are listed in Table 1. 

These features are shown by those with a full muta- 
tion. Individuals with a premutation are intellectually 
and physically normal. The only significant exception 


Table 1 Clinical signs present in more than 50% of
fragile X males*

Physical signs | Behavioral signs
Long face | Hand flapping
Prominent ears | Hand biting
High arched palate | Hyperactivity
Hyperextensible fingers | Perseveration
Double-jointed thumbs | Aggression
Flat feet | Shyness
Macroorchidism | Anxiety
Strabismus | Poor eye contact
Soft smooth skin | Tactile defensiveness
Mitral valve prolapse |
Tall as children, short as adults |
Large heads as children, small as adults |

*From Hagerman and Cronister (1996).


to this is that females with premutations appear to be
prone to premature ovarian failure, which can occur
from the age of 30 years onwards, although most
premutation carriers do not have premature menopause.


Treatment 


Cure of fragile X syndrome is not possible. A number
of the behavioral difficulties seen in fragile X
syndrome are amenable to both pharmacological and
behavioral treatments. Integrated approaches to
treatment will maximize the potential of affected
individuals and minimize the disruption to family life
that this condition can produce (Hagerman and Cronister, 1996).


Cytogenetics 


The appearance of the fragile X chromosome is shown 
in Figure 1. This is most easily seen in chromosomes
prepared from lymphocyte cultures. The lymphocytes 
need to be cultured in media which have a relative 
deficiency of thymidine or deoxycytidine. This can 
be achieved by using special commercially available 
media, using medium TC199, or by adding a variety of 
inducing agents such as the antifolate aminopterin, the 
thymidylate synthetase inhibitor fluorodeoxyuridine, 
or high concentrations of thymidine which inhibit 
the availability of deoxycytidine (Sutherland, 1991). 
Cytogenetic testing for fragile X syndrome has largely 
been replaced by DNA testing. 


Molecular Genetics 


The molecular basis of fragile X syndrome is lack of
the FMR1 protein, the product of the FMR1 gene. Within
the 5’ untranslated region of the FMR1 gene there
is a polymorphic CCG repeat, which on normal
chromosomes varies in size from about 5 to 55 copies.
Once this repeat exceeds about 55 copies the
chromosome is said to have a fragile X premutation.
Beyond about 230 copies of the repeat a full mutation
is present (Warren and Nelson, 1994). The full mutation
results in CpG methylation of the DNA in both the
promoter region of the FMR1 gene and the expanded
repeat, and this results in transcriptional silencing of
the gene. Males with the full mutation have fragile X
syndrome.
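The size categories just described can be written as a simple classifier. This is a sketch only: the cut-offs below are the approximate values given in the text ("about" 55 and "about" 230 copies), and the constant and function names are hypothetical. Diagnostic laboratories apply their own validated boundaries, so the code should not be read as a clinical rule.

```python
NORMAL_MAX = 55        # approximate upper end of the normal range (from the text)
PREMUTATION_MAX = 230  # approximate upper end of the premutation range


def classify_fmr1_repeat(copies: int) -> str:
    """Classify an FMR1 CCG repeat count with the approximate cut-offs above."""
    if copies < 0:
        raise ValueError("repeat count cannot be negative")
    if copies <= NORMAL_MAX:
        return "normal"
    if copies <= PREMUTATION_MAX:
        return "premutation"
    return "full mutation"


print([classify_fmr1_repeat(n) for n in (30, 90, 500)])
```

Keeping the thresholds as named constants makes it easy to substitute a laboratory's own boundaries.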

The fragile X chromosome is subject to random X 
inactivation. This, and possibly other factors, influ- 
ences the clinical picture in females with a full muta- 
tion on one of their X chromosomes. About 60% of 
such females will be mildly mentally impaired or 
worse. This presents a difficulty at prenatal diagnosis 
as the phenotype of a female fetus with a full mutation 
cannot be accurately predicted. 

Some individuals (males and females) show somatic
instability of the expanded repeat and can be termed
‘mosaics.’ This means that they have populations of
cells that differ in the number of copies of the CCG
repeat. In extreme cases a single patient may have
normal, premutation, and full mutation cells.
Full mutations are inherited via the ovum and
apparently exhibit somatic instability (‘breakdown’)
very early in embryonic development.

More than 99% of fragile X syndrome mutations 
are due to expansion of the CCG repeat by a mechan- 
ism known as dynamic mutation (see Dynamic Muta- 
tions). The other 1% or so are due to a variety of 
mutations, primarily deletions of various sizes, although
point mutations have also been recorded. The function of
the FMR1 protein is not fully understood, but it is an 
RNA-binding protein (Oostra, 1996). The protein is 
widely expressed during development and, later on, in 
brain, testis, and uterus. There appears to be extensive
alternative splicing of the FMR1 mRNA, and some
forms of the protein are located in the cytoplasm and
others in the nucleus of the cell.

Figure 1 Sex chromosome complements from individuals expressing FRAXA. A female (left) showing the fragile X and a normal X chromosome, and a male (right) showing the fragile X and a normal Y chromosome.

Diagnosis of fragile X syndrome is now made primarily
by measuring the number of CCG repeats in the 
FMR1 gene. This is usually performed by Southern 
blot analysis to estimate the size of a DNA restriction 
fragment, with increases in the size being due to add- 
itional copies of the CCG repeat. This can be per- 
formed either postnatally or on DNA extracted from 
chorionic villus samples for prenatal diagnosis. 
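The sizing logic behind the Southern blot estimate is simple arithmetic: each extra CCG copy adds three base pairs to the restriction fragment. The sketch below assumes a reference allele whose fragment size and repeat count are already known; the 2800-bp reference carrying 30 repeats is an invented number for illustration, not a value from any real assay, and the function name is hypothetical. Because gel sizing is coarse and the mosaicism described above can spread the signal over a range of sizes, real estimates are approximate.

```python
REPEAT_UNIT_BP = 3  # one CCG copy adds three base pairs


def estimate_repeat_count(fragment_bp: float, ref_fragment_bp: float,
                          ref_repeats: int) -> float:
    """Estimate CCG copy number from a restriction fragment size, assuming the
    only difference from a reference allele of known size and repeat count
    lies in the repeat itself."""
    return ref_repeats + (fragment_bp - ref_fragment_bp) / REPEAT_UNIT_BP


# Hypothetical reference: a 2800-bp fragment known to carry 30 repeats.
for observed in (2800, 2875, 3400):
    print(observed, estimate_repeat_count(observed, 2800, 30))
```

Under these invented numbers, a 75-bp increase corresponds to about 55 copies and a 600-bp increase to about 230.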


Genetics 


The paradoxical nature of the fragile X chromosome 
was documented by Sherman et al. (1985). They 
showed that normal males could ‘carry’ the condition, 
an anomalous situation for an X-linked disease but 
now known to be because of the premutations being 
clinically harmless. They showed that the mothers and 
daughters of normal fragile X carrier males had differ- 
ent risks of having children with fragile X syndrome 
(‘the Sherman paradox’). 

It is now recognized that when women transmit the 
fragile X mutation it usually increases in size and the 
risk of going from a premutation to a full mutation 
depends upon the size of the premutation (Fisch et al., 
1995). When a male with a premutation transmits it, 
the size of the premutation changes little. When a male 
with a full mutation transmits his fragile X chromo- 
some (to a daughter) she always receives it as a 
premutation. 

It is worth noting that whenever a child is identified 
with fragile X syndrome, the mother is always a car- 
rier (either pre- or full mutation) as is one of the 
maternal grandparents. 


Conclusion 


Fragile X syndrome is a common disorder. Its genetics 
are reasonably well understood but much remains to 
be learned about the molecular pathway from geno- 
type to phenotype. Diagnosis by DNA analysis is very 
reliable, and prenatal diagnosis is appropriate and 
available to women who are carriers of this disorder. 
A nucleic acid sequence is translated into the protein it
encodes by means of transfer RNAs (see Transfer RNA 
(tRNA)) interacting with the ribosomal apparatus. 
Transfer RNAs bind to three nucleotides at a time 
and thus divide the nucleic acid sequence into codons, 
each specifying one amino acid. However, depending 
on the point at which division into codons begins, the 
nucleic acid can be read in three distinct phases (three 
distinct reading frames) and, aside from the signal for 
initiation of translation, the sequence does not contain 
‘punctuation signals’ to indicate which frame should 
be used. A frameshift mutation is an alteration in the 
nucleic acid sequence, generally an addition or dele- 
tion, that shifts the translation mechanism from one 
reading frame to another. 
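The idea of three reading frames can be shown directly: the same string of nucleotides, split into triplets starting at offset 0, 1 or 2, yields three different codon series. This is a minimal sketch with a hypothetical helper name; it ignores start and stop codons and simply discards an incomplete final triplet.

```python
def codons(seq: str, frame: int) -> list:
    """Split seq into complete triplets starting at offset `frame` (0, 1 or 2)."""
    if frame not in (0, 1, 2):
        raise ValueError("frame must be 0, 1 or 2")
    return [seq[i:i + 3] for i in range(frame, len(seq) - 2, 3)]


seq = "CATCATCATCAT"
for frame in range(3):
    print(frame, " ".join(codons(seq, frame)))
```

Read from offset 0 the sequence gives CAT codons, from offset 1 ATC codons, and from offset 2 TCA codons.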

In hypothesizing possible coding mechanisms, 
Crick and his colleagues suggested in 1961 that the 
code might be commaless; in other words, that there 
are no intrinsic ‘commas’ to show the proper reading 
by marking off groups of three nucleotides as being 
the correct codons. In this case, they suggested, a 
short insertion or deletion might act as a frameshift 
mutation, and it might be corrected by a nearby sup- 
pressor mutation that would shift the reading frame 
back into the proper phase. Suppose a gene encoding a 
certain protein is properly divided into codons as 
shown by the following spaces (which do not exist in 
reality): 


CAT CAT CAT CAT CAT CAT CAT CAT CAT... 


A deletion of one nucleotide would shift the reading 
frame one space to the left and encode the wrong 
peptide after a certain point: 


CAT CAT CAC ATC ATC ATC ATC ATC ATC... 


However, a nearby insertion of one nucleotide would 
shift the reading frame back into its proper phase: 


CAT CAT CAC ATC ATX CAT CAT CAT CAT... 


Although a few codons still specify the wrong amino 
acids, in many proteins this would make little differ- 
ence and the double mutant will still exhibit the 
normal phenotype. 
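The worked example above can be reproduced in a few lines. The sketch starts from nine CAT codons, deletes the T of the third codon (0-based position 8), and then inserts one base at position 14; the positions are chosen only to match the printed sequences, and "X" stands for any inserted nucleotide, as in the text. The helper names are hypothetical.

```python
def delete_at(seq: str, pos: int) -> str:
    """Remove the single nucleotide at 0-based index pos."""
    return seq[:pos] + seq[pos + 1:]


def insert_at(seq: str, pos: int, base: str) -> str:
    """Insert `base` before 0-based index pos."""
    return seq[:pos] + base + seq[pos:]


def in_frame(seq: str) -> str:
    """Show seq as space-separated triplets read from the first base."""
    return " ".join(seq[i:i + 3] for i in range(0, len(seq) - 2, 3))


wild = "CAT" * 9
deleted = delete_at(wild, 8)          # lose the T of the third codon
double = insert_at(deleted, 14, "X")  # add one base a few codons later
print(in_frame(wild))     # CAT CAT CAT CAT CAT CAT CAT CAT CAT
print(in_frame(deleted))  # CAT CAT CAC ATC ATC ATC ATC ATC
print(in_frame(double))   # CAT CAT CAC ATC ATX CAT CAT CAT CAT
```

Only the three codons between the two mutations are misread in the double mutant; downstream of the insertion the original CAT frame is restored.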

Crick et al. (1961) tested this hypothesis by collecting
rII mutants of phage T4 caused by the mutagen
proflavine, which was known to produce insertions
and deletions. (T4 rII mutants are particularly suited
for this study because wild-type phage multiply in 
bacteria that are lysogenic for phage lambda but 
mutants do not.) They started with one mutant, 
which we may designate arbitrarily as having a phase 
shift to the left (L). Proflavine-induced suppressors of 
this mutation must therefore have a phase shift to the 
right (R). In turn, suppressors of these R mutants must 
be L mutants. After collecting several mutants,
arbitrarily designated L or R, they showed that in general
a phage will have a wild-type phenotype if it bears 
an L and an R mutation that are quite close together. 
Furthermore, they confirmed that a phage with three 
L mutations or three R mutations close together also 
has the wild-type phenotype, as expected if the code is 
triplet, since three frameshifts in one direction will 
then restore the proper reading frame. 
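The logic of the experiment can be captured as arithmetic modulo 3. In this sketch each L or R mutation is treated as a one-base shift in opposite directions (which physical change corresponds to L and which to R does not matter, since the designations were arbitrary), and the reading frame downstream is restored whenever the net shift is a multiple of three, as expected for a triplet code. The function names are hypothetical, and the model ignores the misread stretch between the mutations, which is why the mutations must be close together for the phage to look wild-type.

```python
def net_shift(mutations: list) -> int:
    """Net reading-frame displacement, mod 3, for a list of 'L' and 'R'
    mutations, each treated as a one-base shift in opposite directions."""
    shift = 0
    for m in mutations:
        if m == "L":
            shift -= 1
        elif m == "R":
            shift += 1
        else:
            raise ValueError(f"unknown mutation type: {m!r}")
    return shift % 3


def frame_restored(mutations: list) -> bool:
    """True if translation returns to the original frame downstream of the
    mutations, assuming a triplet code."""
    return net_shift(mutations) == 0


for combo in (["L"], ["L", "R"], ["L", "L"], ["L", "L", "L"], ["R", "R", "R"]):
    print("".join(combo), frame_restored(combo))
```

The combinations LR, LLL and RRR restore the frame, while L or LL alone do not, matching the pattern Crick and colleagues observed.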
Degrees of freedom are part of the specification of the χ²
distribution and of certain other statistical distributions
such as the t and the F. We only discuss the χ² test here.
In practice, degrees of freedom (abbreviated df) need
to be known when carrying out a χ² test, in order to
identify the appropriate column of a table of critical 
values to consult or to calculate the appropriate p 
value. 

If the df are not known a priori, then the question 
arises of determining their correct value in a given 
context. In some cases there are straightforward rules 
that can be followed, but in general this is not a simple 
question to answer. The general determination of the
df of a χ² test is embedded in the statistical theory
underpinning a particular test in a given context, and 
is thus only accessible to those familiar with this the- 
ory. Many computer programs calculate the df 
automatically using rules, not always correctly. 

The most common example of a χ² test and its
associated single degree of freedom (equivalently,
1 df) comes with the 2 × 2 contingency table. This
can arise when comparing two binomial proportions 
or when cross-classifying units according to two bin- 
ary characteristics. A familiar genetic example is as 
follows. Suppose that we have a random sample of 
individuals who are classified as affected or not in 
relation to some disease, and that we also classify 
them as aa or not aa (i.e., Aa or AA) at a biallelic 
locus. A statistical test of the null hypothesis of no 
association between disease status and this particular 
genetic dichotomy can be carried out by organizing 
the data in a 2 × 2 table, and computing a χ² test
statistic. As indicated above, this test will have 1 df,
and this is used in the assessment of significance. If we
did not collapse the genotypes as described, but kept
all three separate, we would have a 2 × 3 classification:
2 disease states (affected, unaffected) and 3 genotypes
(aa, Aa, and AA). A χ² test of the null hypothesis of no
association could still be carried out, but in this case
the df would typically be 2, failing to be so only if one
of the rows or columns had no entries. More generally,
a χ² test of no association based on data from a table
with r rows and c columns normally has (r − 1)(c − 1) df,
though different df can be appropriate if not all 
cells have positive counts. The two-way contingency 
table is an example of the situation in which the 
calculation of the df is usually but not always by a 
simple rule. 
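The contingency-table rule can be illustrated with a short Pearson χ² routine. This is a sketch under stated assumptions: the counts are invented, rows or columns with a zero total are dropped before counting df (mirroring the exception noted above), and no check is made for small expected counts. The 1-df p value uses the identity P(χ²₁ > x) = erfc(√(x/2)); for other df a statistics library would normally be used. The function names are hypothetical.

```python
import math
from typing import List, Tuple


def chi2_independence(table: List[List[float]]) -> Tuple[float, int]:
    """Pearson chi-square test of no association for an r x c table.

    Rows or columns with a zero total are dropped first, so df is
    (r - 1)(c - 1) for the rows and columns that actually have entries."""
    rows = [row for row in table if sum(row) > 0]
    cols = [j for j in range(len(rows[0])) if sum(row[j] for row in rows) > 0]
    rows = [[row[j] for j in cols] for row in rows]
    n = sum(sum(row) for row in rows)
    row_tot = [sum(row) for row in rows]
    col_tot = [sum(row[j] for row in rows) for j in range(len(cols))]
    stat = 0.0
    for i, row in enumerate(rows):
        for j, observed in enumerate(row):
            expected = row_tot[i] * col_tot[j] / n
            stat += (observed - expected) ** 2 / expected
    return stat, (len(rows) - 1) * (len(cols) - 1)


def p_value_1df(stat: float) -> float:
    """Upper-tail probability of a chi-square variable with exactly 1 df."""
    return math.erfc(math.sqrt(stat / 2.0))


# Hypothetical counts: rows = affected, unaffected; columns = genotype classes.
two_by_two = [[10, 20], [30, 40]]           # aa vs not aa
two_by_three = [[10, 15, 5], [20, 30, 20]]  # aa, Aa, AA
empty_column = [[10, 0, 20], [30, 0, 40]]   # no Aa individuals observed

stat, df = chi2_independence(two_by_two)
print(round(stat, 4), df, round(p_value_1df(stat), 3))
print(chi2_independence(two_by_three)[1])  # 2 df
print(chi2_independence(empty_column)[1])  # 1 df: the empty column is dropped
```

Dropping empty margins before counting df is one reasonable convention; as the text notes, the appropriate df in such edge cases depends on the context.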

Another such example arises with the χ² test of
goodness-of-fit. Here the χ² statistic might be the
familiar sum over all cells of (observed minus expected
cell count) squared, divided by expected cell count. If
no unknown parameters need to be estimated to
calculate the expected cell counts, then the df are the
number of cells minus 1. When k parameters have to
be estimated to calculate the expected cell counts, the
df are typically the number of cells minus (k + 1). This
rule is not universally true, for there are conditions
that need to apply, but they are beyond the scope of
this entry.
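A familiar genetic case of the goodness-of-fit rule, used here purely as an illustration, is testing genotype counts at a biallelic locus against Hardy-Weinberg proportions. There are three cells and one parameter (the allele frequency) is estimated from the same data, so k = 1 and the rule gives 3 − (1 + 1) = 1 df. The sketch below assumes the usual Pearson statistic; the counts and function name are hypothetical.

```python
def hwe_chi2(n_AA: int, n_Aa: int, n_aa: int) -> tuple:
    """Chi-square goodness-of-fit of genotype counts to Hardy-Weinberg
    proportions. The allele frequency is estimated from the same data,
    so k = 1 and df = cells - (k + 1) = 3 - 2 = 1."""
    n = n_AA + n_Aa + n_aa
    p = (2 * n_AA + n_Aa) / (2 * n)  # estimated frequency of allele A
    if p in (0.0, 1.0):
        raise ValueError("locus is monomorphic; expected counts would be zero")
    q = 1.0 - p
    observed = [n_AA, n_Aa, n_aa]
    expected = [n * p * p, 2 * n * p * q, n * q * q]
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    k = 1  # number of estimated parameters
    df = len(observed) - (k + 1)
    return stat, df


print(hwe_chi2(30, 40, 30))  # (4.0, 1)
```

Here p = 0.5, the expected counts are 25, 50 and 25, and the statistic is 1 + 2 + 1 = 4 on 1 df.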

In summary, the degrees of freedom of a χ² distribution
will usually be determined in a particular context
by a simple rule. The rule will cover most but not
all cases that arise in practice. 


See also: Null Hypothesis 
Frequency-dependent selection means that the fitness
of a genotype is a function of its rarity or commonness
relative to other genotypes. Several types of such
selection have been reported, including rare male
mating advantage, rare male fertility advantage in the
histocompatibility system in mammals, rare type
predator resistance, rare type survival advantage, rare
type allele advantage in self-incompatibility systems
in plants, a similar system of sex determination in bees
and some other hymenoptera, and mimicry. These
cases will be discussed in turn.

Rare male mating advantage was first reported for 
laboratory experiments with Drosophila by Petit (1954) 
and additional cases by Ehrman and others (reviewed 
by Ehrman and Probber, 1978). The competing geno- 
types were visible mutants, inversions, and strains 
from different locations. Usually the alternative com- 
petitors had an advantage when rare, but equal mating 
success when common, which would result in a poly- 
morphic equilibrium. Rare male advantage has also 
been observed in the wasp Nasonia, the beetle Tribo- 
lium, the ladybird beetle Adalia, the guppy Poeciliop- 
sis and the mosquito fish Gambusia. These last two 
were observed under natural and seminatural condi- 
tions. The laboratory experiments with Drosophila 
have been criticized with respect to experimental conditions
and statistics (Bryant et al., 1980; Merrell,
1983; Knoppien, 1985; Partridge, 1988) so, at this 
point, these Drosophila results can be considered 
controversial. 


There have been many experiments showing 
that competition for resources other than mates is 
frequency dependent. Such experiments have been 
done with Drosophila and different strains of crop 
plants. In most of the Drosophila experiments differ- 
ent allozyme genotypes or inversion karyotypes were 
placed as young larvae in the medium at different 
frequencies and the survival to adulthood recorded. 
In many cases the rare type had higher survival than 
the common type surrounding it. 

Many such experiments have been done with dif- 
ferent strains of crop plants (reviewed by Donald and 
Hamblin, 1983). For seed crops, at least, yield is
fitness (survival × fertility). In many cases a strain
performs much better when surrounded by competitors
than in a pure stand, and sometimes better than its
competitors, showing a rare type advantage. In some
cases simulations based on the data have shown a
stable polymorphic equilibrium (Allard and Adams, 1969).
However, theory shows that yield is not maximized at
equilibrium.

The self-incompatibility system in plants is another
case of rare type advantage. There is a single
incompatibility locus, and if the pollen and style carry
the same allele the cross is sterile. A rare new allele
therefore has an advantage, because pollen carrying it
is accepted by almost every style in the population.
Plant species with this system have a large number of
alleles; one clover (Trifolium) species has 100 alleles.
A similar system occurs in bees and some other
hymenoptera, in which there is a ‘sex locus’ where a
homozygote is a male that dies. Normal males (drones)
are haploid, and heterozygotes are workers or a queen;
this system favors rare alleles. There are 14-20 sex
locus alleles in bees. There is an analogous system in
the major histocompatibility complex (MHC) in
mammals, which in humans is called human leukocyte
antigen (HLA). It has been shown that if the embryo
has the same genotype as the mother, abortion results,
which favors fathers with different genotypes. There
are 100 alleles in the HLA system. In mice, females
prefer males with a different MHC genotype, which the
female recognizes by odor.
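The rare-allele advantage under self-incompatibility can be made concrete with a deliberately simplified calculation. The sketch assumes the system described above (pollen is rejected by any style carrying the same allele) and that every plant is heterozygous, since such a system prevents S homozygotes from forming; the share of plants carrying allele i is then 2pᵢ, so pollen carrying allele i is accepted by a fraction 1 − 2pᵢ of plants. The allele counts and the function name are hypothetical.

```python
def compatible_fraction(allele_counts: dict) -> dict:
    """For each S allele, the fraction of plants whose styles accept pollen
    carrying it, assuming every plant is a heterozygote so that a share
    2 * p_i of plants carries allele i."""
    total = sum(allele_counts.values())
    result = {}
    for allele, count in allele_counts.items():
        p = count / total
        result[allele] = max(0.0, 1.0 - 2.0 * p)
    return result


# Hypothetical population: three common alleles and one rare newcomer.
fractions = compatible_fraction({"S1": 8, "S2": 8, "S3": 8, "S4": 1})
for allele, frac in fractions.items():
    print(allele, round(frac, 2))
```

In this toy population, pollen carrying a common allele (frequency 0.32) is accepted by about 36% of plants, while pollen carrying the rare allele (frequency 0.04) is accepted by about 92%, which is the advantage that lets new alleles spread.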

Frequency-dependent selection has been proposed 
in predator-prey interactions, where the predator is 
conditioned to favor the most common prey pheno- 
type. The resulting rare type prey advantage is termed 
‘apostatic advantage.’ A number of experiments have 
been carried out in which the prey population is con- 
trived, usually but not always with artificial prey, and 
bird predation has been measured. Both apostatic
selection and predator preference for the rare type
(anti-apostatic selection) have been observed in these
experiments.

Finally, mimicry shows frequency-dependent 
selection. In the case of Batesian mimicry, the mimetic 
morph resembles a bad-tasting model which predators 
are conditioned to avoid. Here the mimic has a rarity
advantage: while it is rare, predators seldom encounter
a good-tasting mimic and so remain conditioned to avoid
the shared pattern. This results in an equilibrium at a
low frequency of the mimic. If the abundance of the
model varies, then there should be a positive correlation
between it and the frequency of the mimetic morph
within the mimetic species. This relation has been
shown by Edmunds (1966) in Africa for temporal
variation in frequencies of the butterfly model and
mimic, Danaus chrysippus and Hypolimnas misippus,
respectively, and by Brower and Brower (1962) in North
America for spatial variation in frequencies of the
butterfly model and mimic, Battus philenor and Papilio
glaucus, respectively.

Müllerian mimicry is where distasteful or harmful
species conspicuously resemble each other. The
poisonous coral snakes of South America are all striped
in black, white, and red; there are 50 species, all
similar in appearance. Another case of Müllerian
mimicry is the Heliconius butterfly complex in South
America. In this case there are several different
warning designs in different places. Some species are
polymorphic for different warning designs in different
places, with steep clines between them. Because the
different designs occur within the same species, the
genetics of the designs has been studied. Mallet and
Barton (1989) performed a field study in which they
released members of one design group into the range
of a different design group and showed a rare type
disadvantage. Individuals with a locally rare design
were conspicuous to bird predators that were
conditioned to avoid a different design. This case of
frequency-dependent selection is the opposite of those
discussed up to this point, being a rare type
disadvantage. The steep clines between these design
regions have been studied and successfully modeled by
Mallet et al. (1990).
The rare male mating advantage, representing
frequency-dependent selection, has long fascinated
population geneticists. The term implies that the
fitness of a given genotype depends on its frequency in
a population. Frequency dependence may be positive
(in favor of the common type) or negative (in favor of
the rare type). A situation is conceivable in which the
advantage, or disadvantage, holds only for one type
when rare; this is called one-sided frequency
dependence. When the rare type has a higher fitness
than the common type, selection is balancing, because
as soon as the rarer type becomes more common
its advantage disappears. Models implying some kind
of balancing selection can explain the high levels of
genetic variability routinely maintained in natural
populations. (For definitions of assorted types of
natural selection, see Natural Selection.) The model
most commonly employed for this purpose is the
overdominance model, in which the heterozygote has
a higher fitness than either homozygote.
(However, this model implies the occurrence of
genetic load. Two-sided negative frequency dependence,
on the other hand, can maintain genetic variation
even without any genetic load.)

As a consequence of frequency-dependent fitness 
values, the frequency of the rare type will increase 
until an equilibrium value is reached, wherein all geno- 
types have equal fitnesses. For this reason frequency- 
dependent selection, with advantage for the rare type, 
is proposed as a possible mechanism for the mainten- 
ance of genetic variation in nature. It is claimed that 
there is strong evidence for frequency-dependent 
selection with an advantage for the rare type among 
prey as a result of predation, as an aspect of mimicry, 
among hosts as a result of parasitism, and also due to 
competition. Consideration of the rare male advan- 
tage from the viewpoint of population genetics leads 
to the hypothesis that an initially rare genotype will 
increase in frequency if there are no other selective 
forces operating against it. As the rare type becomes 
more common, its advantage diminishes, leading 
to equilibrium (see Figure 1, where this has been
recorded as happening in competitions between epi- 
static eye color mutants). 
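The approach to equilibrium described above can be simulated with a simple two-type model. The sketch assumes hypothetical linear fitness functions in which each type does worse the more common it is: type A has fitness 1 − a·p and type B has fitness 1 − b·(1 − p), where p is the frequency of A. Fitnesses are equal at p* = b/(a + b), and the rare type always has the higher fitness, so the frequency moves toward p* from either side. This is an illustrative model, not one fitted to the Drosophila data in the figure; the function names and coefficients are invented.

```python
def next_freq(p: float, a: float, b: float) -> float:
    """One generation of selection with negative frequency dependence:
    type A has fitness 1 - a*p, type B has fitness 1 - b*(1 - p)."""
    w_a = 1.0 - a * p
    w_b = 1.0 - b * (1.0 - p)
    mean_w = p * w_a + (1.0 - p) * w_b
    return p * w_a / mean_w


def iterate(p0: float, a: float, b: float, generations: int) -> list:
    """Trajectory of the frequency of type A over the given generations."""
    traj = [p0]
    for _ in range(generations):
        traj.append(next_freq(traj[-1], a, b))
    return traj


a, b = 0.4, 0.2       # hypothetical selection coefficients
p_star = b / (a + b)  # fitnesses are equal here: 1 - a*p = 1 - b*(1 - p)
for start in (0.05, 0.95):
    print(start, round(iterate(start, a, b, 200)[-1], 4), round(p_star, 4))
```

Starting from either a rare or a common type A, the trajectory settles near p* = 1/3, where both types have equal fitness, so variation is maintained without a lasting fitness difference between the types.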

A successful experimental approach employed to detect frequency dependence of mating success in Drosophila in the laboratory has involved two types of flies in mating chambers, the frequency of the types of flies being varied in different replicates (Ehrman and Parsons, 1981). For an excellent and comprehensive review see Knoppien (1985) and the references therein. We also recommend articles by Ehrman et al. (1991) and Lofdahl et al. (1992), which deal with toxic media and with newer approaches to geotaxis, respectively.

Figure 1 Distribution of matings in mass cultures of Drosophila pseudoobscura in which orange-eyed (or) and purple-eyed (pr) females had a choice of or and pr males, showing that as the minority male becomes more common, its advantage diminishes, leading to equilibrium (similar results were obtained in reciprocal experiments reversing rarity). These unlinked marker genes are useful in determining paternity because: or/or +/+ or or/or +/pr = orange-eyed; +/+ pr/pr or +/or pr/pr = purple-eyed; or/or pr/pr = white-eyed; and +/or +/pr or +/+ +/pr or +/or +/+ or +/+ +/+ = wild-type red-eyed. □ indicates rare male advantage.
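The genotype list in the figure caption amounts to a simple rule. Both markers behave as recessives, and a fly homozygous for both is white-eyed. The sketch below encodes that rule. The function name and the allele spellings ("or", "pr", "+") are illustrative assumptions. Dominance of the wild-type alleles is inferred from the caption's genotype list, not stated in it.

```python
def eye_color(or_alleles, pr_alleles):
    """Eye phenotype from the two unlinked marker loci in the figure caption.

    Each argument is a pair of alleles at one locus: "or" or "+" at the
    orange locus, "pr" or "+" at the purple locus. Both mutants are treated
    as recessive, as the caption's genotype list implies.
    """
    orange = or_alleles.count("or") == 2
    purple = pr_alleles.count("pr") == 2
    if orange and purple:
        return "white"
    if orange:
        return "orange"
    if purple:
        return "purple"
    return "red (wild type)"


if __name__ == "__main__":
    for or_locus in [("or", "or"), ("+", "or"), ("+", "+")]:
        for pr_locus in [("pr", "pr"), ("+", "pr"), ("+", "+")]:
            print("/".join(or_locus), "/".join(pr_locus), "->", eye_color(or_locus, pr_locus))
```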

A number of gene and chromosomal polymorphisms have been documented as maintained by such frequency-dependent equilibria. The magnitude and reproducibility of the effect appear to depend on the species, but it has been observed in insects other than Drosophila (house flies), as well as in a vertebrate (the guppy). Because these are polymorphisms for which minimal fitness differentials between competing component genotypes are expected at equilibrium, a different sort of selection from that of the heterozygote advantage model would prevail. Therefore, frequency dependence may represent a way of maintaining a high level of genetic variability without obviously associated fitness differentials. This could be of considerable evolutionary significance, since it has been argued that there is a limit to the amount of variability a population can maintain under the classic heterozygote fitness advantage model (see Dobzhansky, 1970; Dobzhansky et al., 1977; Ehrman and Parsons, 1981).
Functional genomics is the development and implementation of techniques to examine, both in time and space, the global patterns by which genes and their protein products act in concert to effect function.

The human genome consists of a DNA complement approximately 3 billion nucleotide pairs in length, divided into 24 distinct chromosomal segments (autosomes 1-22 plus X and Y). Contained within this linear array of nucleotides are an estimated 30 000 to 40 000 genes. The protein products of genes interact in complex pathways to effect cellular function. The combination of genes expressed in a cell at any particular point in time determines the protein complement of the cell and hence its functional mechanics.

During development, different subsets of cells in the body contain different subpopulations of activated genes. As a result of the different combinations of proteins they produce, these cells differentiate to form distinct tissues with their own specialized physiology. Proteins, as mobile functional elements, allow communication between cells, both locally and remotely. They are hence responsible for integrating the various cell types into the coordinated, highly complex physiological systems that make up a living organism.

Differences in the genetic complement between individuals can cause differential expression of genes or the production of proteins that function in slightly different ways. In extreme cases, some individuals possess harmful variants that cause disease directly. Such variation also underpins disease processes in more subtle ways, influencing an individual's susceptibility to disease and response to therapeutic interventions. In addition, intracellular protein systems allow cells to respond to changes in their environment. Certain environmental stimuli perturb the normal cellular functions of proteins and cause changes in gene expression, and such environmental factors can also contribute to disease pathology. Often the development of a disease is the result of a complex mix of factors, including inherent genetic susceptibility and a series of environmental changes or challenges.

The determination of the full sequence of the human genome gives the international research community a tremendous opportunity to begin to get the “full picture” of how cells work and of the mechanisms underlying disease. Hidden within this DNA sequence is the information that underpins the biochemistry by which our cells function, interact, and differentiate during development to effect the complex physiology that makes the human body function.

However, although characterizing the sequence is a critical first step, the primary sequence itself leaves us a long way from understanding the complex mechanisms by which genes interact to impart this function. Knowledge of the sequence does not in itself elucidate, for instance, the mechanisms that control gene expression so that the correct proteins are present in our cells at the correct time during development. Nor does it reveal the ways in which gene expression changes during the development of certain types of disease. Further studies are also needed to shed light on how the protein products of genes interact, both temporally and spatially, within the cell to form the complex pathways that effect cellular processes. Until we find ways of dissecting these processes and making sense of these mechanisms, we will never truly understand how genes act or the factors that underpin the development of complex diseases.

The new discipline of functional genomics aims to develop and apply technologies that use the information generated by characterizing human and other genomes to dissect the complexity of function.


See also: Gene Expression; Genetic Diseases; 
Genome Organization; Human Genome Project 
Ever since it was first put forward by Fisher in 1930, the ‘fundamental theorem of natural selection’ (henceforth referred to as FTNS) has provoked as much controversy, and caused as much misunderstanding, as perhaps any other result in evolutionary population genetics. The misunderstandings arise from several sources:

- Fisher's cryptic writing style;
- the fact that the precise statement of the theorem was never clear;
- typing errors in almost every account he gave of the theorem;
- the leaps of faith apparently made in the mathematical derivations.

The position was not helped by the appearance of Fisher's 1958 book, in which none of these problems was remedied, and in which various further printing errors added to the confusion.

Fisher (1958) gives the following statement for the FTNS: “The rate of increase in fitness of any organism at any time is equal to its genetic variance in fitness at that time.” The contemporary statement of the classical version of the theorem would be, approximately: “The rate of increase of mean fitness of any population at any time is equal to its additive genetic variance in fitness at that time.” This statement does not imply any change to the content of the theorem, but is intended to clarify three points. First, the result relates to a population, not to some given organism in that population. Second, it relates to the mean fitness of that population. Finally, the contemporary expression ‘additive genetic variance,’ denoted here $V_A$, clarifies the meaning of the perhaps ambiguous term ‘genetic variance.’

Why did Fisher place so much weight on this theorem, claiming that it holds “the supreme position among the biological sciences”? Fisher's central aim was to restate the Darwinian theory in Mendelian terms: that evolution by natural selection requires variation, and that it is a process of ‘improvement.’ A parent passes on to an offspring a gene at each locus, not his or her genotype at that locus. Whole-genome genotypes are also regularly broken up over successive generations by recombination. For both reasons, Fisher focused on the gene as the fundamental unit of transmission, and thus as the natural entity for describing evolution as a Mendelian process. This is why $V_A$, the component of overall genotypic variation in fitness ascribable to genes, was relevant to him. Showing that this component equals the increase in mean fitness must surely have seemed to him to capture the restatement he wanted.


‘Classical’ Interpretation of the Theorem


Since its foundation in the 1920s by Fisher, Wright, and Haldane, population genetics theory has consisted in large part of results that assume random mating in the population considered; that is, that the choice of one's mate is made at random, independent of the genetic constitution of the mate. A second assumption often made is that, in studying the evolution of gene frequencies at any locus through the effects of mutation and selection, all other loci can be ignored and the locus of interest treated in isolation. A third assumption, often made in connection with the second, is that the fitness of an individual of any given one-locus genotype is a fixed quantity, independent of the genes in the remainder of the genome. All three assumptions were initially made in large part to simplify the theory, which otherwise would have encountered almost insuperable mathematical obstacles. It was, however, recognized from the start that a complete theory would eventually relax these assumptions.

This set of assumptions led to the following ‘classical’ interpretation of the FTNS. Suppose that:

- an arbitrary number of different allelic types is allowed at some gene locus;
- the fitness of any individual depends only on its genotype defined by these alleles;
- these genotype fitnesses are fixed constants;
- mating is random.

Then the population mean fitness will increase from one generation to the next, or at least remain constant. The increase in mean fitness from one generation to the next is approximately equal to the additive genetic variance at that locus. A proof of the classical version of the theorem, under these assumptions, appears in almost every textbook in population genetics, the formal result being:


$$\Delta\bar{w} \approx V_A \qquad (1)$$


where $\Delta\bar{w}$ is the change in the mean fitness $\bar{w}$ between parental and offspring generations and $V_A$ is the parental-generation additive genetic variance. Further, it is also easy to show, if the various single-locus genotype fitnesses differ from each other by a small term of order $\delta$, that the actual increase in mean fitness differs from $V_A$ by a term of order $\delta^3$. The reason for the random-mating requirement is that, when mating is not random, it is easy to find examples for which mean fitness decreases between parental and offspring generations.
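Equation (1) can be illustrated numerically. The sketch below assumes a single locus with two alleles, random mating, and constant viability fitnesses; the values 1.02, 1.015, and 1.0 are arbitrary choices. Under random mating the average effect of each allele equals its average excess, so $V_A = 2\sum_i p_i \alpha_i^2$. The computed one-generation change in mean fitness comes out close to $V_A$ but not exactly equal to it.

```python
def classical_check(p, w11, w12, w22):
    """Return (change in mean fitness, V_A) for one generation of viability
    selection at a two-allele locus under random mating."""
    q = 1.0 - p
    wbar = p * p * w11 + 2 * p * q * w12 + q * q * w22
    # Under random mating the average effect of each allele equals its
    # average excess: its marginal fitness minus the mean fitness.
    alpha1 = p * w11 + q * w12 - wbar
    alpha2 = p * w12 + q * w22 - wbar
    va = 2.0 * (p * alpha1 ** 2 + q * alpha2 ** 2)
    # Standard random-mating selection recursion for the frequency of A1.
    p_next = p * (p * w11 + q * w12) / wbar
    q_next = 1.0 - p_next
    wbar_next = p_next ** 2 * w11 + 2 * p_next * q_next * w12 + q_next ** 2 * w22
    return wbar_next - wbar, va


if __name__ == "__main__":
    delta, va = classical_check(0.3, 1.02, 1.015, 1.0)
    print(f"change in mean fitness = {delta:.4e}, V_A = {va:.4e}")
```

Making the fitness differences larger widens the gap between the two numbers, in line with the higher-order error term noted above.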

The fact that $\Delta\bar{w}$ is not exactly equal to $V_A$ appears to contradict Fisher's claims that “the rate of increase in fitness ... is exactly equal to the genetic variance” and that “the theorem is exact.” It also throws doubt on “the rigor of the demonstration ...” and on the use of the word “theorem” to describe the result. This observation prompted some of those involved in the exegesis of the theorem to doubt the correctness of Fisher's calculations. That view was apparently supported by the “failure” of the theorem in the multiple-locus case, described below. Others took the view that at best Fisher intended his result to be approximate, a view hard to reconcile with his words quoted above. As shown below, two modern interpretations of the theorem hold that the FTNS, correctly understood, is an exact statement involving no approximations.

The classical version of the theorem implies that mean fitness is a potential function, that is, a mathematically defined time-dependent quantity that, in a dynamic process, increases steadily (or at worst remains constant) as time goes on. Yet Fisher, who was well aware of the properties of potential functions, steadfastly disclaimed any such interpretation of the FTNS. Thus in a 1956 letter to Kimura (Bennett, 1983) he said:


... I preferred to develop the theory without [the] assumption [of a potential function], which ... is a restriction. ... I should like to be clear that the expression I have obtained ... does not depend on the existence of any potential function.


This claim, of course, merely adds to the mystery: what did Fisher claim to be discussing, and what did he claim to have proved?

Strict mathematical proofs that mean fitness does increase (under the above assumptions) were obtained independently by various authors around 1960. The most direct proof was given by Kingman, who showed further that mean fitness strictly increases unless gene frequencies are at equilibrium values.
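Kingman's result can be checked numerically for a single locus with several alleles. The sketch below iterates the standard random-mating viability recursion $p_i' = p_i \sum_j w_{ij} p_j / \bar{w}$ and records $\bar{w}$ in each generation. The three-allele fitness matrix is hypothetical. Under the one-locus, constant-fitness assumptions, the recorded sequence should never decrease.

```python
def select_once(p, w):
    """One generation of random-mating viability selection at a multi-allele locus.

    p: allele frequencies; w: symmetric matrix of genotype fitnesses w[i][j].
    Returns the new allele frequencies and the mean fitness before selection.
    """
    k = len(p)
    marginal = [sum(w[i][j] * p[j] for j in range(k)) for i in range(k)]
    wbar = sum(p[i] * marginal[i] for i in range(k))
    return [p[i] * marginal[i] / wbar for i in range(k)], wbar


def mean_fitness_path(p, w, generations):
    """Mean fitness in each of the given number of successive generations."""
    path = []
    for _ in range(generations):
        p, wbar = select_once(p, w)
        path.append(wbar)
    return path


if __name__ == "__main__":
    # Hypothetical fitnesses for genotypes A_iA_j at a three-allele locus.
    w = [[0.90, 1.00, 0.95], [1.00, 0.80, 1.05], [0.95, 1.05, 0.85]]
    path = mean_fitness_path([0.6, 0.3, 0.1], w, 50)
    print([round(x, 5) for x in path[:5]], round(path[-1], 5))
```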

Equation (1) is derived under the assumption of a nonoverlapping-generation model: there is a distinct parental generation, giving rise to a distinct offspring generation, and so on. Continuous-time models, in which discrete generations do not arise, have also been studied, with conclusions similar to those given. The sex-linked case has also been analyzed. Further, the theorem has been generalized to refer to any character, not only fitness; this generalization is sometimes called (Robertson's) ‘secondary theorem’ of natural selection. It states, roughly, that the between-generation increase in any character is equal to the parental-generation covariance between the additive effects of that character and fitness.

Of course, the assumptions made for the proof of (1) describe a situation that is often far from biological reality. Mating might not be random. Fitnesses will usually involve fertility as well as viability and will depend on all genes in the genome. They can change for extrinsic ecological reasons, and will often be frequency-dependent rather than fixed constants. Later versions of equation (1) were devised that incorporated several of these factors, leading to formulas of increasing complexity. All versions, however, had much the same flavor as equation (1), differing from it in details but not in fundamentals.

The classical version was, for many years, the accepted statement of the theorem. What influence has the classical interpretation had on evolutionary thinking? The classical version of the FTNS is attractive in that it appears to quantify in Mendelian terms the two prime themes of the Darwinian theory, namely that variation is needed for evolution by natural selection, and that evolution by natural selection is a process of steady improvement in the population.

A variety of views exists about the biological value of the classical version of the theorem. Whatever its biological value might be, the theorem received continual attention from the purely mathematical point of view. The most interesting discussion concerned the ‘multiple-locus case.’ Fisher's statement of the FTNS clearly claimed that it was derived assuming that the fitness of any individual depends on its complete genetic make-up. The Kingman analysis showing that mean fitness does increase assumes, however, that fitness depends on the genotype at one locus only. Thus, immediately after this result was firmly established, attempts were made in the literature to remove the ‘one-locus’ assumption. The aim was to derive a mathematical theorem for the case where fitness depends on an individual's genotype at two loci. This was the first step toward a multiple-locus result, and thus toward Fisher's claimed general statement.

It is not possible to obtain the multiple-locus generalization of the approximation (1) by summing both sides over all loci in the genome. When epistasis exists, the total additive genetic variance is not the sum of the single-locus marginal values. Nor is it possible to obtain the desired result using only gene frequencies in the analysis. When fitness depends on the genotype at many loci, the evolution of a Mendelian population must be followed through its gametic frequencies, even under random mating.

Eventually an analysis of the theorem was carried out using these gametic frequencies. It was found that in the multilocus case the population mean fitness can decrease from one generation to the next. In such cases the change in mean fitness is negative, so it cannot be equated with any form of variance. The classical version of the FTNS therefore fails in the multiple-locus case. This reinforced the views of those who had claimed that Fisher's calculations were always at best approximate.

The reason why mean fitness can decrease in the 
two-locus case, even under random mating, derives 
from the existence of recombination. Recombination 
can cause an offspring chromosome to differ from 
either parental chromosome, and to this extent the 
offspring does not resemble the parent. It is thus not 
unexpected that the FTNS, in its classical form, will 
fail in the multiple-locus case. 
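A small two-locus example shows how recombination can lower mean fitness. The sketch below uses a hypothetical model chosen for illustration. Zygotes form by random union of gametes, and viability selection acts with the epistatic fitness $1 + s(n_A - 1)(n_B - 1)$, where $n_A$ and $n_B$ count the A and B alleles in the genotype. The population starts with only AB and ab chromosomes. With no recombination ($r = 0$), mean fitness stays at 1.05. With free recombination ($r = 0.5$), the favored AB/AB and ab/ab combinations are broken up, and mean fitness falls to about 1.026 in one generation.

```python
import itertools

# Gametes listed as (allele at locus A, allele at locus B).
GAMETES = [("A", "B"), ("A", "b"), ("a", "B"), ("a", "b")]


def fitness(g1, g2, s):
    """Hypothetical epistatic fitness 1 + s*(nA - 1)*(nB - 1) of genotype g1/g2."""
    n_a = (g1[0] == "A") + (g2[0] == "A")
    n_b = (g1[1] == "B") + (g2[1] == "B")
    return 1.0 + s * (n_a - 1) * (n_b - 1)


def mean_fitness(x, s):
    """Mean fitness when zygotes form by random union of gametes with frequencies x."""
    return sum(x[i] * x[j] * fitness(GAMETES[i], GAMETES[j], s)
               for i, j in itertools.product(range(4), repeat=2))


def next_generation(x, s, r):
    """Gamete frequencies after viability selection and recombination at rate r."""
    wbar = mean_fitness(x, s)
    out = [0.0] * 4
    for i, j in itertools.product(range(4), repeat=2):
        g1, g2 = GAMETES[i], GAMETES[j]
        weight = x[i] * x[j] * fitness(g1, g2, s) / wbar
        for g in (g1, g2):                              # parental gametes
            out[GAMETES.index(g)] += weight * (1 - r) / 2
        for g in ((g1[0], g2[1]), (g2[0], g1[1])):      # recombinant gametes
            out[GAMETES.index(g)] += weight * r / 2
    return out


if __name__ == "__main__":
    x0 = [0.5, 0.0, 0.0, 0.5]       # only AB and ab chromosomes present
    s = 0.1
    for r in (0.0, 0.5):
        x1 = next_generation(x0, s, r)
        print(f"r = {r}: mean fitness {mean_fitness(x0, s):.5f} -> {mean_fitness(x1, s):.5f}")
```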

Despite these comments, cases where mean fitness decreases have the nature of comparatively rare oddities. When fitness differentials are small and linkage is loose, mean fitness ‘usually’ increases, and the increase is ‘usually’ approximately equal to $V_A$. Several important results of this type are given, for example, by Nagylaki (1991, 1992). Thus in the multiple-locus case the classical version of the theorem is often ‘almost true.’ However, a mathematical theorem, being an exact statement, must always be true. Further, from the biological and evolutionary point of view, the fact that mean fitness could decrease is disturbing. The position of the theorem as an exact mathematical statement was thus still unresolved.


Recent Versions of FTNS 


The ‘classical’ version of the FTNS, described in detail above and the subject of so much writing, cannot have been what Fisher meant by the theorem. This is most clearly seen in the case of nonrandom mating. Fisher frequently emphasized that mating (in human populations in particular) is not random, and claimed that the FTNS holds even for nonrandom-mating populations. For example, in an acerbic comment on Wright's evolutionary work he said Wright's formulas are “foredoomed to failure just as soon as the simplifying, but unrealistic, assumption of random mating is abandoned.” It is easy to find cases where the population mean fitness decreases when mating is not at random. Fisher was well aware of these, so the theorem in its classical version cannot have been what he had in mind.

Unfortunately, however, his writings do not make clear, with any degree of certainty, what he did have in mind, and they sometimes seem to state results that cannot be what he meant. The focus therefore shifts from the exegesis of Fisher's written work to the more dangerous undertaking of reading Fisher's mind. The aim becomes to find what must have been his interpretation of the FTNS, camouflaged though it may be in his writings. This change of direction has led to two recent interpretations of the theorem, both quite different from the classical interpretation.

The breakthrough in this direction came with a 
little-appreciated paper by Price (1972). Price claimed 
that Fisher was not interested in the actual change of 
mean fitness, but rather only in that part of the change 
“due to natural selection [rather than] due to environ- 
mental change, [where we regard] dominance and 
epistasis as environmental effects.” This ‘natural selec- 
tion’ change was also thought of as the change in mean 
fitness due to changes in gene frequencies. Difficult 
though it is to make immediately concrete the concept 
of “change due to natural selection and gene frequen- 
cies,” this insight nevertheless led to both modern 
interpretations of the theorem. The “change due to 
natural selection” has been called the “partial change” 
in mean fitness, and the interpretation of this change is 
clarified by considering the case where fitness values 
depend on the genotype at one gene locus only. 


Suppose then that the fitness of any individual depends entirely on his genotype at a single locus ‘A’ at which may occur genes (here and elsewhere, more exactly ‘alleles’) $A_1, A_2, \ldots, A_k$. Denote the frequency of the genotype $A_iA_j$, at the time of conception of the parental generation, as $P_{ii}$ (when $i = j$) and $2P_{ij}$ (when $i \neq j$). This notation implies that the frequency $p_i$ of the gene $A_i$ is $\sum_j P_{ij}$. If the fitness of an individual of genotype $A_iA_j$ is $w_{ij}$, the mean fitness $\bar{w}$ of the parental generation, at this time, is then given by:


$$\bar{w} = \sum_i \sum_j P_{ij}\, w_{ij} \qquad (2)$$


As noted above, Fisher's main evolutionary focus was on the genes at any locus, not the genotypes, and a key concept for Fisher was the average effect in fitness of any gene. The average effects $\alpha_1, \alpha_2, \ldots, \alpha_k$ of the genes $A_1, A_2, \ldots, A_k$ are defined as the values that minimize the quadratic function $\sum_i \sum_j P_{ij}(w_{ij} - \bar{w} - \alpha_i - \alpha_j)^2$, subject to the constraint $\sum_i p_i \alpha_i = 0$. These average effects may be thought of as roughly the ‘fitnesses’ of the various genes, and $\bar{w} + \alpha_i + \alpha_j$ as the best additive approximation to the fitness $w_{ij}$ using these average effects. The additive genetic variance $V_A$ is the amount removed from the above quadratic function by fitting these $\alpha_i$ values.
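For two alleles, the least-squares definition of the average effects reduces to a weighted regression of fitness on the number of $A_1$ genes carried. With regression slope $\beta$, the average effects are $\alpha_1 = q\beta$ and $\alpha_2 = -p\beta$, which automatically satisfy $\sum_i p_i\alpha_i = 0$. The sketch below assumes this two-allele case and arbitrary genotype frequencies; Hardy-Weinberg proportions are not required. The function names are illustrative. The sketch also lets one check that $V_A$ equals the amount by which fitting the $\alpha$ values reduces the quadratic function.

```python
def average_effects(f11, f12, f22, w11, w12, w22):
    """Least-squares average effects at a two-allele locus.

    f11, f12, f22: frequencies of A1A1, A1A2, A2A2 (need not be in
    Hardy-Weinberg proportions); w11, w12, w22: genotype fitnesses.
    Returns (alpha1, alpha2, V_A, mean fitness).
    """
    freqs = (f11, f12, f22)
    fits = (w11, w12, w22)
    counts = (2, 1, 0)                      # copies of A1 in each genotype
    p = f11 + f12 / 2.0                     # frequency of A1
    q = 1.0 - p
    wbar = sum(f * w for f, w in zip(freqs, fits))
    mean_n = 2.0 * p
    var_n = sum(f * (n - mean_n) ** 2 for f, n in zip(freqs, counts))
    cov_nw = sum(f * (n - mean_n) * (w - wbar) for f, n, w in zip(freqs, counts, fits))
    beta = cov_nw / var_n                   # weighted regression slope on A1 count
    alpha1, alpha2 = q * beta, -p * beta    # satisfies p*alpha1 + q*alpha2 = 0
    va = beta ** 2 * var_n                  # variance of the additive values
    return alpha1, alpha2, va, wbar


def residual_sum(f11, f12, f22, w11, w12, w22, alpha1, alpha2, wbar):
    """The quadratic function from the text, evaluated at the given alphas."""
    terms = [(f11, w11 - wbar - 2 * alpha1),
             (f12, w12 - wbar - alpha1 - alpha2),   # both orderings A1A2 and A2A1
             (f22, w22 - wbar - 2 * alpha2)]
    return sum(f * r ** 2 for f, r in terms)


if __name__ == "__main__":
    # Random-mating (Hardy-Weinberg) frequencies with p = 0.3.
    print(average_effects(0.09, 0.42, 0.49, 1.02, 1.015, 1.0))
    # An excess of homozygotes, as under inbreeding, with the same p.
    print(average_effects(0.20, 0.20, 0.60, 1.02, 1.015, 1.0))
```

Under random mating the computed $\alpha_i$ coincide with the average excesses $\sum_j p_j w_{ij} - \bar{w}$. With an excess of homozygotes they differ.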

The next step in the argument is to note that, for evolutionary analyses, Fisher appears to have conceived of the fitness of the typical genotype $A_iA_j$ not as the actual fitness $w_{ij}$, but rather as the additive approximation $\bar{w} + \alpha_i + \alpha_j$. This interpretation is justified by excerpts such as the following from his 1958 book:


... for any specific gene combination we build up an ‘expected value’ ... by adding [to the mean] appropriate [α values] according to the ... genes present. This expected value will not necessarily represent the real [fitness] ... but its statistical properties will be more intimately involved in the inheritance of real [fitness] than [fitness] itself.


This additive approximation is called the ‘breeding 
value’ in animal breeding programs. 

This change of viewpoint implies that Fisher 
thought of the mean fitness not as in equation (2), 
but rather as: 


$$\sum_i \sum_j P_{ij}(\bar{w} + \alpha_i + \alpha_j) \qquad (3)$$


This change of viewpoint is purely conceptual, since the expression (3) is numerically identical to the mean fitness defined in (2). Despite this identity, the new conceptualization leads to the concept of the partial change in mean fitness. This is the change in expression (3) over one generation that is brought about by changes in the genotype frequencies $P_{ij}$. The quantities $\bar{w}$, $\alpha_i$, and $\alpha_j$ are held unchanged at their parental-generation values. This partial change in mean fitness is then:


$$\sum_i \sum_j (P'_{ij} - P_{ij})(\bar{w} + \alpha_i + \alpha_j) \qquad (4)$$


where $P'_{ij}$ is the offspring-generation frequency of the genotype $A_iA_j$, defined as for the parental-generation value. It is straightforward to show, with minimal evolutionary assumptions, that:


$$\text{partial increase in mean fitness} = V_A/\bar{w} \qquad (5)$$


whether or not random mating occurs. The additive genetic variance $V_A$ in this expression may be computed as:


$$V_A = 2\bar{w}\sum_i \alpha_i\,\Delta p_i \qquad (6)$$


This exact single-locus, discrete-time result, which involves no approximations, is in reasonable agreement with Fisher's wording, as is its analogous continuous-time version.
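Equation (5) can be checked numerically at a multi-allele locus without random mating. The sketch makes the following assumptions:

- Nonrandom-mating genotype frequencies are generated with an inbreeding coefficient $f$. This is a hypothetical choice, as are the fitness values.
- The average effects are obtained from the normal equations of the least-squares problem that defines them. These equations are $p_k\alpha_k + \sum_j P_{kj}\alpha_j = \sum_j P_{kj}(w_{kj} - \bar{w})$. Summing them over $k$ shows that the constraint $\sum_i p_i\alpha_i = 0$ then holds automatically.
- Gene frequencies after selection are $p_i' = \sum_j P_{ij}w_{ij}/\bar{w}$.
- The offspring genotype frequencies may use a different $f$, since only their gene frequencies affect (4).

The partial change (4) should then equal $V_A/\bar{w}$, and (6) should reproduce $V_A$.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def genotype_matrix(p, f):
    """Hypothetical nonrandom-mating genotype frequencies with inbreeding coefficient f.

    Entry [i][j] is P_ij in the text's notation: the A_iA_i frequency on the
    diagonal and half the A_iA_j frequency off the diagonal.
    """
    k = len(p)
    return [[(1 - f) * p[i] * p[j] + (f * p[i] if i == j else 0.0) for j in range(k)]
            for i in range(k)]


def partial_change(P, w, f_offspring):
    """Return (partial change (4), V_A, mean fitness, V_A computed from (6))."""
    idx = range(len(P))
    p = [sum(row) for row in P]
    wbar = sum(P[i][j] * w[i][j] for i in idx for j in idx)
    # Normal equations of the least-squares problem defining the average effects.
    a = [[P[i][j] + (p[i] if i == j else 0.0) for j in idx] for i in idx]
    b = [sum(P[i][j] * (w[i][j] - wbar) for j in idx) for i in idx]
    alpha = solve(a, b)
    va = sum(P[i][j] * (alpha[i] + alpha[j]) ** 2 for i in idx for j in idx)
    # Gene frequencies after viability selection, assuming Mendelian transmission.
    p_next = [sum(P[i][j] * w[i][j] for j in idx) / wbar for i in idx]
    P_next = genotype_matrix(p_next, f_offspring)
    partial = sum((P_next[i][j] - P[i][j]) * (wbar + alpha[i] + alpha[j])
                  for i in idx for j in idx)
    va_eq6 = 2 * wbar * sum(alpha[i] * (p_next[i] - p[i]) for i in idx)
    return partial, va, wbar, va_eq6


if __name__ == "__main__":
    w = [[0.90, 1.00, 0.95], [1.00, 0.80, 1.05], [0.95, 1.05, 0.85]]
    P = genotype_matrix([0.5, 0.3, 0.2], f=0.25)
    partial, va, wbar, va_eq6 = partial_change(P, w, f_offspring=0.1)
    print(f"partial change = {partial:.6e}, V_A / wbar = {va / wbar:.6e}")
    print(f"V_A = {va:.6e}, from (6): {va_eq6:.6e}")
```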

The parallel multiple-locus statement of the theorem is given below. It holds when fitness depends in an arbitrary way on an arbitrary number of genes at an arbitrary number of loci, with an arbitrary recombination structure, and with no assumption made about random mating:


$$\text{partial increase in mean fitness} = V_A^{(m)}/\bar{w} \qquad (7)$$


where $V_A^{(m)}$ is the full multiple-locus additive genetic variance. This equation, again exact and embodying no approximations, is the statement of one of the modern interpretations of the FTNS.

Equation (7) is not obtained by simply summing both sides of equation (5) over all loci. Despite this, a summation result of a different form does hold. An expression parallel to that in (6) is that the multiple-locus additive genetic variance may be written as:


$$V_A^{(m)} = 2\bar{w}\sum_j \sum_i \alpha_{ij}\,\Delta p_{ij} \qquad (8)$$


where $\alpha_{ij}$ is the multiple-locus average effect of gene $A_i$ at locus $j$, $\Delta p_{ij}$ is the one-generation change in the frequency of that gene, and the sum is over all genes at all loci in the genome.

Equations (7) and (8) imply that this version of the 
FTNS can be restated in the form: 


$$\text{partial increase in mean fitness} = 2\sum_j \sum_i \alpha_{ij}\,\Delta p_{ij} \qquad (9)$$


the double sum being over all alleles at all loci. The expression on the right-hand side of (9) derives from the following argument. All multiple-locus genotypes are thought of as being listed in order, the typical such genotype being described as genotype $g$. The partial change in mean fitness is then


$$\sum_g \Delta P(g)\, w(g)_A \qquad (10)$$


where $\Delta P(g)$ is the between-generation change in frequency of the multiple-locus genotype $g$, and $w(g)_A$ is the best additive approximation to the fitness of $g$. That approximation is the mean fitness $\bar{w}$ plus the sum of the average effects of all genes at all loci in genotype $g$. Any average effect is counted twice in the sum if the corresponding gene occurs twice in genotype $g$. The expression (10) may be shown to be identical to the expression $2\sum_j \sum_i \alpha_{ij}\,\Delta p_{ij}$ arising on the right-hand side of (9), so that the above interpretation of the FTNS can be written as


$$\sum_g \Delta P(g)\, w(g)_A = V_A^{(m)}/\bar{w} \qquad (11)$$
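The multiple-locus statement can be checked the same way for two loci with two alleles each. The sketch makes the following assumptions:

- zygotes form by random union of gametes;
- fitnesses come from an arbitrary, hypothetical epistatic table;
- the parental gametes are in linkage disequilibrium;
- recombination occurs at fraction $r$.

The additive approximation $w(g)_A$ is obtained by a weighted least-squares regression of fitness on allele counts at both loci. It is taken here to include $\bar{w}$, which does not affect the sum because $\sum_g \Delta P(g) = 0$. The partial change $\sum_g \Delta P(g)\,w(g)_A$ is then compared with $V_A^{(m)}/\bar{w}$ from (7).

```python
import itertools

GAMETES = [("A", "B"), ("A", "b"), ("a", "B"), ("a", "b")]
PAIRS = list(itertools.product(range(4), repeat=2))   # ordered gamete pairs = genotypes g

# Hypothetical epistatic fitnesses, indexed by (copies of A, copies of B).
FITNESS = {(2, 2): 1.10, (2, 1): 1.00, (2, 0): 0.85,
           (1, 2): 1.02, (1, 1): 1.05, (1, 0): 0.95,
           (0, 2): 0.90, (0, 1): 1.00, (0, 0): 1.08}


def counts(g):
    """Numbers of A and B alleles in genotype g (a pair of gamete indices)."""
    g1, g2 = GAMETES[g[0]], GAMETES[g[1]]
    return (g1[0] == "A") + (g2[0] == "A"), (g1[1] == "B") + (g2[1] == "B")


def genotype_freqs(x):
    """Genotype frequencies P(g) from gamete frequencies x (random union assumed)."""
    return {g: x[g[0]] * x[g[1]] for g in PAIRS}


def next_gametes(x, r):
    """Gamete frequencies after viability selection and recombination at rate r."""
    P = genotype_freqs(x)
    wbar = sum(P[g] * FITNESS[counts(g)] for g in PAIRS)
    out = [0.0] * 4
    for g in PAIRS:
        g1, g2 = GAMETES[g[0]], GAMETES[g[1]]
        weight = P[g] * FITNESS[counts(g)] / wbar
        for h in (g1, g2):
            out[GAMETES.index(h)] += weight * (1 - r) / 2
        for h in ((g1[0], g2[1]), (g2[0], g1[1])):
            out[GAMETES.index(h)] += weight * r / 2
    return out


def additive_fitness(P):
    """Return (wbar, {g: w(g)_A}) from a weighted regression of fitness on allele counts."""
    n_a = {g: counts(g)[0] for g in PAIRS}
    n_b = {g: counts(g)[1] for g in PAIRS}
    fit = {g: FITNESS[counts(g)] for g in PAIRS}
    wbar = sum(P[g] * fit[g] for g in PAIRS)
    m_a = sum(P[g] * n_a[g] for g in PAIRS)
    m_b = sum(P[g] * n_b[g] for g in PAIRS)
    d_a = {g: n_a[g] - m_a for g in PAIRS}
    d_b = {g: n_b[g] - m_b for g in PAIRS}
    d_w = {g: fit[g] - wbar for g in PAIRS}

    def wsum(u, v):  # weighted sum of products of centered values, i.e. a (co)variance
        return sum(P[g] * u[g] * v[g] for g in PAIRS)

    s_aa, s_ab, s_bb = wsum(d_a, d_a), wsum(d_a, d_b), wsum(d_b, d_b)
    s_aw, s_bw = wsum(d_a, d_w), wsum(d_b, d_w)
    det = s_aa * s_bb - s_ab ** 2
    beta_a = (s_aw * s_bb - s_bw * s_ab) / det
    beta_b = (s_bw * s_aa - s_aw * s_ab) / det
    return wbar, {g: wbar + beta_a * d_a[g] + beta_b * d_b[g] for g in PAIRS}


if __name__ == "__main__":
    x = [0.4, 0.1, 0.2, 0.3]          # gametes AB, Ab, aB, ab (linkage disequilibrium present)
    P = genotype_freqs(x)
    wbar, w_add = additive_fitness(P)
    va_m = sum(P[g] * (w_add[g] - wbar) ** 2 for g in PAIRS)
    P_next = genotype_freqs(next_gametes(x, r=0.2))
    partial = sum((P_next[g] - P[g]) * w_add[g] for g in PAIRS)
    print(f"partial change = {partial:.6e}, V_A(m) / wbar = {va_m / wbar:.6e}")
```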


A second modern interpretation of the FTNS, due to Lessard (1997), appears initially to be similar to (11). However, it is arrived at by a quite different analysis from that leading to (11), and it differs from (11) in several important ways. Lessard's equation is


$$\sum_g \Delta P(g)_A\, w(g) = V_A^{(m)}/\bar{w} \qquad (12)$$


The difference between the two expressions (11) and (12) is the following. In (11), $\Delta P(g)$ is the actual change in frequency of genotype $g$ over one generation, and $w(g)_A$ can be thought of as the best estimate of the fitness of genotype $g$, given the genes in this genotype. In (12), $w(g)$ is the actual fitness of genotype $g$, and $\Delta P(g)_A$, defined as $P(g)\,w(g)_A/\bar{w} - P(g)$, may be thought of as the best estimate of the change in the frequency of genotype $g$, given the genes in this genotype.

Lessard’s interpretation of the theorem appears to 
agree more closely with Fisher’s words than does the 
interpretation deriving from (9) and (10), and may 
very well be the correct interpretation of the theorem. 
If so, a final resolution of the interpretation of the 
FTNS has been reached. A full discussion of this 
point is given in Lessard (1997). 

The above discussion is in terms of a discrete-time model with viability fitnesses only. Lessard (1997) and Ewens (1989) show that the two modern interpretations hold, with appropriate changes, for continuous-time models. Lessard also discusses models with age structure and with fitness defined as the mean number of offspring produced. Lessard and Castilloux (1995) show that the modern interpretations also hold when fitnesses relate to fertility differences among couples. Frank (1997) discusses the relation of the FTNS to Price's equation.
Fungal genetics is the experimental study of the properties of genes and chromosomes carried out with filamentous fungi (such as Neurospora, Aspergillus, and Ascobolus) or with yeasts (such as Saccharomyces, Schizosaccharomyces, and Candida). These organisms have been important in basic genetics because they are eukaryotes but are also amenable to the elegant methods of bacteriology.


See also: Ascobolus; Aspergillus nidulans; 
Neurospora crassa; Saccharomyces cerevisiae 
(Brewer’s Yeast); Schizosaccharomyces pombe, the 
Principal Subject of Fission Yeast Genetics 


Fungi 
A group of simple, nongreen organisms, traditionally classified with the plants, that includes molds, mushrooms, rusts and smuts, and sometimes yeasts.


See also: Ascobolus; Aspergillus nidulans; 
Neurospora crassa 


FUS-CHOP Fusion 


See: Myxoid Liposarcoma and FUS/TLS-CHOP 
Fusion Genes 


Fusion Gene 
P Riggs 
A gene fusion is defined as two genes that are joined so that they are transcribed and translated as a single unit. Gene fusions can occur in vivo, both naturally and as a result of genetic manipulations, and can be constructed in vitro using recombinant DNA techniques. They occur in nature over the course of evolution, for example, where two genes whose products are part of a metabolic pathway fuse, giving rise to a fusion protein that carries out both steps of the pathway.


History 


The first gene fusions created by design were between the rIIA and rIIB genes of phage T4, studied by Champe and Benzer. They used the effects of missense, nonsense, and frameshift mutations in the rIIA gene on rIIB activity to elucidate the properties of the genetic code. Subsequently, fusions were created in Escherichia coli using in vivo genetic techniques to join various genes to the lacZ gene, which codes for the easily assayed enzyme β-galactosidase. These fusions were used as a way to examine the expression level and regulation of the gene fused to lacZ. Fusions were originally limited to genes that were located near the β-galactosidase gene, but later Casadaban and coworkers pioneered in vivo and in vitro techniques that allowed fusion to virtually any gene.


Current Uses 


The major current use of gene fusions is still the study of gene expression, including levels of expression and location of gene products. Both gene fusions and reporter constructs (where the gene of interest is replaced by a ‘reporter’ gene instead of being fused to it) are used for this purpose. Fusions to lacZ are common, but any gene whose product is active as a fusion and can be assayed is suitable. In this method, an extract of a cell or tissue containing a gene fusion is prepared and the level of gene expression is measured by assaying the fusion product. Gene fusions can also be used to study the differential expression of a gene in different tissues of an organism, by histochemical staining for the fusion product in sections, tissues, or the whole organism.

Two genes commonly used for this technique are lacZ and gfp. The lacZ gene has been used primarily because of the vast experience researchers have with β-galactosidase fusions and the many substrates available for this enzyme. One of these substrates, X-gal, produces a dark-blue insoluble product when cleaved by β-galactosidase. Because the blue color does not diffuse away from the site of cleavage, one can infer the location and level of expression from the intensity of the blue color. The gfp gene codes for green fluorescent protein, which fluoresces green when excited by blue or UV light. This allows direct visualization and, in many cases, can be used on intact, live organisms.


Further Reading 

Casadaban M J, Martinez-Arias A, Shapiro D K and Chou J
(1983) β-galactosidase gene fusion for analyzing gene
expression in Escherichia coli and yeast. Methods in
Enzymology 100: 293-307.

Champe S P and Benzer S (1962) An active cistron fragment. 
Journal of Molecular Biology 4: 288-292. 


See also: Beta (β)-Galactosidase; Fusion Proteins


Fusion Proteins 
P Riggs 
A fusion protein is a protein consisting of at least two
domains that are encoded by separate genes that have
been joined so that they are transcribed and translated
as a single unit, producing a single polypeptide. Fusion
proteins can be created in vivo, but are usually created
using recombinant DNA techniques. The fusion often
consists of the protein that is being studied joined to
one of a small number of proteins that have useful
properties to aid in the study.
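The requirement that two genes be "translated as a single unit" can be illustrated with a minimal sketch. It assumes both coding sequences are uppercase DNA strings whose lengths are multiples of three. The function `fuse_in_frame` and the example sequences are hypothetical, and the linkers and regulatory sequences used in real constructs are omitted. The upstream stop codon is dropped and the downstream sequence is appended in the same reading frame. The join is rejected if a stop codon would end translation early.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def codons(seq: str) -> list[str]:
    """Split a coding sequence into consecutive triplets (frame starting at base 1)."""
    return [seq[i:i + 3] for i in range(0, len(seq), 3)]


def fuse_in_frame(upstream: str, downstream: str) -> str:
    """Join two hypothetical open reading frames into one translatable unit.

    Assumes uppercase DNA strings whose lengths are multiples of 3.
    The upstream stop codon, if present, is removed so that translation
    continues into the downstream sequence; the downstream stop codon
    is kept and ends the fusion polypeptide.
    """
    for name, orf in (("upstream", upstream), ("downstream", downstream)):
        if len(orf) % 3 != 0:
            raise ValueError(f"{name} sequence length is not a multiple of 3")
    if upstream and codons(upstream)[-1] in STOP_CODONS:
        upstream = upstream[:-3]
    fused = upstream + downstream
    # A stop codon anywhere before the last codon would truncate the fusion.
    if any(c in STOP_CODONS for c in codons(fused)[:-1]):
        raise ValueError("premature stop codon in the fused reading frame")
    return fused


print(fuse_in_frame("ATGAAATAA", "ATGGGCTGA"))  # ATGAAAATGGGCTGA
```

In the example the upstream TAA is removed. Translation therefore reads ATG AAA ATG GGC straight through and stops only at the downstream TGA.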


History 


Some of the first fusion proteins were created in 
Escherichia coli using in vivo genetic techniques to 
join various proteins to the β-galactosidase enzyme.
These fusions were used initially as a way to assay the
expression level of the protein of interest. Fusions
were originally limited to proteins whose genes
were located near the β-galactosidase gene, but later,
Casadaban and coworkers pioneered in vivo and in
vitro techniques that allowed fusion to virtually any
protein. Researchers were originally surprised that
some of the fusions were bifunctional; i.e., when the
carboxyl terminus of a protein was fused to the amino
terminus of β-galactosidase, both proteins retained
activity. As more and more fusions to β-galactosidase
were obtained and found to be active, researchers
began to make fusions to proteins other than
β-galactosidase and found that these could be
bifunctional as well.


Uses of Fusion Proteins 


The technique of creating fusion proteins has been 
extended to other fusion partners, and additional 
uses have been developed for the fusion partner. 
Three of the most important uses of fusion proteins
are: as aids in the purification of the products of cloned genes, as
reporters of expression level, and as histochemical 
tags to enable visualization of the location of proteins 
in a cell, tissue, or organism. 

For purification, a protein that can be easily and 
conveniently purified by affinity chromatography 
is fused to a protein that the researcher wishes to 
study. A number of proteins and peptides have been
used for this purpose, including staphylococcal
protein A, glutathione-S-transferase, maltose-binding
protein, cellulose-binding protein, chitin-binding
domain, thioredoxin, streptavidin, RNase I,
polyhistidine, human growth hormone, ubiquitin, and
antibody epitopes.

The proteins used most often as fusion partners for
reporter constructs are β-galactosidase, luciferase, and
green fluorescent protein (GFP). β-Galactosidase has
the advantage of numerous commercially available
substrates, including some that produce a colored
product and some that lead to the production of
light. Luciferase and GFP both produce light, and
can be visualized directly or quantitated using a
luminometer or a fluorometer, respectively. GFP has
an advantage in that it does not require a substrate,
whereas luciferase requires its substrate, luciferin, as
well as ATP, O₂, and Mg²⁺. GFP emits green light
when excited by blue or UV light, and in many cases
can be used on live, intact cells and organisms.

A useful extension of fusion proteins as reporters is
the two-hybrid system. In this method, two separate
fusions are employed to test for interaction between
two proteins: binding of the two proteins brings
together their fusion partners (typically a DNA-binding
domain and a transcriptional activation domain) and
results in activated transcription of a reporter gene.


See also: Beta (β)-Galactosidase; Fusion Gene


G1
The G1 phase of the eukaryotic cell cycle is that
between the end of cell division and the start of
DNA synthesis. G1 refers to the first gap phase.


See also: Cell Cycle 


G2
The G2 phase of the eukaryotic cell cycle is that
between the end of DNA synthesis and the start of
cell division. G2 refers to the second gap phase.


See also: Cell Cycle 


Galactosemia 
Galactosemia is the most common form of abnormal
galactose metabolism and is a recessively inherited
disorder with an incidence of 1:20 000 to 1:60 000 live
births. Although the deficient enzyme,
galactose-1-phosphate uridyltransferase, is known, the
etiology of the clinical syndrome is enigmatic. The
clinical picture evolves in two phases. The first occurs
after birth, when the feeding of milk and other formulas
containing lactose produces a galactose toxicity
syndrome manifested by hyperbilirubinemia, failure
to thrive, vomiting, cataract formation, blood
coagulation defects, and renal tubule dysfunction. With
a galactose-restricted diet these abnormalities regress.
The second phase develops later despite diet therapy,
with speech abnormalities, mental retardation,
neurological ataxias, and ovarian failure.

The diagnosis is suggested by abnormally high
galactose levels in blood and urine and by elevated red
blood cell galactose-1-phosphate. It is confirmed by a
red blood cell enzyme activity of less than 7% of
normal and by determination of the genotype.
Heterozygotes express about 50% of normal enzyme
activity. Compound heterozygotes carrying one classic
galactosemia allele and one Duarte allele express about
25% of normal red cell activity.

The normal disposition of dietary galactose involves
the conversion of the sugar to glucose via a series of
three enzymes known as the Leloir pathway:
(1) galactokinase catalyzes the phosphorylation of
galactose with ATP to form galactose-1-phosphate;
(2) galactose-1-phosphate uridyltransferase reacts
the sugar phosphate with UDPglucose to form
UDPgalactose and glucose-1-phosphate; and
(3) UDPgalactose-4-epimerase converts the UDPgalactose
to UDPglucose. The net result of the last two steps,
in which UDPglucose is regenerated, is the conversion
of galactose-1-phosphate to glucose-1-phosphate.
Inherited deficiencies of galactokinase and
UDPgalactose-4-epimerase are known but occur much less
frequently than transferase deficiency galactosemia.
The main manifestation of galactokinase deficiency is
cataract formation when a galactose-containing diet is
ingested. Epimerase deficiency occurs in two forms: a
benign one with reduced red blood cell enzyme activity,
and an extremely rare one that exhibits a toxicity
syndrome similar to transferase deficiency.
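The three Leloir reactions described above can be written out as follows, using the same abbreviations as the text with P for phosphate. Adding the second and third steps shows that UDPglucose is regenerated, so their net effect is the conversion of galactose-1-phosphate to glucose-1-phosphate.

```latex
\begin{align*}
\text{galactose} + \text{ATP} &\xrightarrow{\ \text{galactokinase}\ } \text{galactose-1-P} + \text{ADP}\\
\text{galactose-1-P} + \text{UDPglucose} &\xrightarrow{\ \text{uridyltransferase}\ } \text{UDPgalactose} + \text{glucose-1-P}\\
\text{UDPgalactose} &\xrightarrow{\ \text{4-epimerase}\ } \text{UDPglucose}\\[4pt]
\text{net of steps 2 and 3:}\qquad \text{galactose-1-P} &\longrightarrow \text{glucose-1-P}
\end{align*}
```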

With a block in the pathway due to absence of
transferase, galactose-1-phosphate and galactose
accumulate. As a consequence, two alternative routes of
galactose disposal are activated. The first is reduction
of the sugar by aldose reductase to galactitol, which is
not further metabolized. The second involves oxidation
to galactonate, which can be further metabolized
to CO₂ and xylulose. Both galactitol and galactonate
are excreted in urine in large quantities, and red blood
cell galactose-1-phosphate remains elevated despite a
galactose-restricted diet. The explanation for the
elevation of these abnormal metabolites appears to be a
large endogenous synthesis of galactose, presumably
from turnover of galactose-containing complex
glycoconjugates. Short-term (2-h) oxidation of isotopic
galactose to CO₂ is very slow in most patients, but
their 24-h oxidative capacity is similar to that measured
in normal subjects after 5 h. The oxidative pathways
involved have not been completely defined. This
capacity to oxidize the sugar, together with urinary
excretion of its metabolites, maintains the patient in a
steady state with plasma galactose levels in the low
micromolar range.

The 4-kb human galactose-1-phosphate
uridyltransferase (GALT) gene, located on chromosome 9,
has been cloned and sequenced and consists of 11 exons.
The cDNA codes for a 374 amino acid protein, with
about 49% conservation between the human and
Escherichia coli enzymes. The active enzyme is a dimer
with a molecular mass of 96 kDa. Over 100 mutations
are known to occur in galactosemic patients. Most
are missense mutations with a single base change, but
stop mutations, splice site changes, frameshifts, and
large deletions are also found. The most common
mutation, accounting for over 60% of mutant alleles, is
Q188R, in which arginine is substituted for glutamine
in the highly conserved region of exon 6. About 45% of
patients are homozygous for Q188R, and a number of
Q188R alleles are compounded with other mutations.
In African American and South African black
galactosemic patients the prevalent mutation is S135L.
The Q188R mutant is believed to be devoid of enzyme
activity, whereas the S135L mutation results in residual
liver enzyme activity. Heterodimer formation may be a
significant determinant of enzyme activity. The N314D
mutation, an asparagine-to-aspartic acid substitution,
is prevalent and is the basis of the Duarte variant. It
results in diminished but not absent erythrocyte enzyme
activity and is itself benign.

There appears to be no clear genotype-phenotype
correlation. However, oxidation of less than 2% of an
administered dose of 1-¹³C galactose to ¹³CO₂ within 2 h
appears to indicate a more severe disorder, as observed
in many Q188R homozygotes, than that present in
compound heterozygotes.

Neither the pathobiochemical basis of galactose
toxicity in the newborn period nor that of the late-onset,
long-term, diet-independent complications is known.
Accumulation of galactose-1-phosphate and galactitol is
believed to be responsible, but the mechanism of
multiorgan involvement is unclear. Cataract formation is
associated with galactitol accumulation. A knockout
mouse with absent transferase activity shows no
manifestations of the human phenotype, suggesting that
absence of transferase is necessary but not sufficient to
cause disease. This points to epigenetic factors and
abnormal alternative pathway metabolites as the possible
basis of the human disease.


The only known treatment of galactosemia has been 
restriction of lactose and other galactose-containing 
foods. Although the postnatal toxicity is alleviated, 
the long-term complications have not been averted. 
Speech therapy, special schooling, and hormonal 
therapy of ovarian failure are indicated and may be 
helpful. The disorder remains an enigma requiring the 
search for new therapeutic strategies. 


See also: Lactose 
Galton, Francis
The Victorian intellectual Francis Galton (1822-1911)
was one of the chief founders of the science of 
biometry, or the statistical and quantitative study of 
living things. He described its chief objective to be “to 
afford material that shall be exact enough for the dis- 
covery of incipient stages in evolution,” stages that are 
“too small to be otherwise apparent.” His goal was to 
establish the foundations upon which to base a policy 
to control and direct the future of mankind. In 1883 he 
coined the word ‘eugenics’ for the science of improv- 
ing stock, a term, he remarked, 


that is not confined to questions of judicious mating, but 
which especially in the case of man, takes cognizance of all 
influences that tend in however remote a degree to give to 
the more suitable races or strains of blood a better chance of 
prevailing speedily over the less suitable than they otherwise 
would have. 


Galton was particularly concerned to show that the 
behavioral as well as the physical traits of mankind are 
inherited, that what is acquired in life cannot be passed 
to the offspring, and that nature (that which is inherit- 
ed) has much more influence on the individual than 
has nurture (that which is gained from experience and 
education). He was, in other words, an ‘hereditarian.’ 
But he did not simply make hereditarian claims. He 
developed the statistical techniques of regression and 
correlation to analyze the biometric data he collected 
from sampling human populations. He also applied his
passion for quantification to establishing weather
patterns, testing the effectiveness of prayer, and
exploring sensory perceptions, imagery, and memory.

The zoologist Raphael Weldon and the mathematician
Karl Pearson became devoted followers of
Galton. From the 1890s they vigorously developed the
science of biometry, and in the early years of the 
twentieth century they opposed the newly rediscov- 
ered science of Mendelian heredity. Nearly two dec- 
ades passed before biometry and Mendelism were 
effectively united to form what we call population gen- 
etics. Meanwhile Galton’s science of eugenics passed 
through phases of popular approval and disapproval. 
The intensive study of the DNA sequence of our genes,
ongoing since the 1980s, has once more brought the
subject of eugenics to popular attention. Could new and
powerful techniques now make possible a kind of eugenics
“by the backdoor”? That is, not public legislation
enforcing eugenic policies, but covert pressures of the
market through discrimination and limited access to
resources.


Francis Galton’s Life 


Galton came from a wealthy and well-connected family,
his mother being the daughter of Charles Darwin’s
grandfather, Dr Erasmus Darwin. As the youngest in
the family, Francis received much affection, especially
from his three sisters. Adèle, the youngest of them,
acted as his tutor, and those around him soon
considered him an infant prodigy. However, formal
education proved not to his liking, neither in France,
where he was sent at the age of eight, nor in England
from the age of 14. At 16 he began to study medicine,
but two years later he turned to mathematics and moved
to Cambridge. Four years later he gained a BA without
honors and prepared to return to his medical studies.
Then his father died, and he came into an inheritance
that permitted him to forget medicine and indulge in
his love of exploration.

The resourceful and courageous young Galton 
traveled through Egypt, Syria, and South West Africa, 
where he covered some 1700 miles of uncharted coun- 
try and came to know the Damara, Namaqua, and 
Ovampo tribes. He was struck by their distinctive 
behavioral and physical characteristics, and those of 
their domesticated animals. On his return to England 
in 1852 the Royal Geographical Society awarded him 
their gold medal for his achievement. During the 
1850s he worked to promote geographical explor- 
ation, published his guide The Art of Travel (1855), 
introduced weather maps, discovered anticyclones, 
and worked for the British Association for the 
Advancement of Science. When in 1859 his cousin 
Charles Darwin published his book On the Origin of 
Species, Francis read it and was greatly impressed. If all 
life is the product of evolution, we should be able, 
given sufficient knowledge, to control our own evolu- 
tion — our future. But, like his cousin, he realized there 
was a weak spot in the theory, the lack of sound
knowledge of the nature of heredity. Accordingly, he
turned in the 1860s to address this subject. Between
1865 and 1889 he worked, often obsessively, on
gathering material. Publications on the subject flowed
from his pen, the most important being his two books:
Hereditary Genius (1869) and Natural Inheritance
(1889). He lived on to 1911, long enough to seize
the opportunity that the changing political climate
of the new century afforded him to appeal publicly
for the establishment and support of the science he
called eugenics.


Human Heredity 


Conceptual 

During the nineteenth century Herbert Spencer and
Francis Galton gave up the term ‘inheritance’ and,
following the French, substituted the term ‘heredity’
(hérédité). This signaled Galton’s conception of
heredity as based upon the continuity from generation 
to generation through an unbroken line not of persons 
but of the elements in the fertilized eggs from which 
they came. The term ‘inheritance’ suggested the legal 
concept of the transmission of a person’s estate to his 
descendants. Here the link is between the visible char- 
acteristics of the grown person (the parent) and the 
corresponding features of the offspring. But heredity 
is often indirect. The offspring bear similarities to 
many ancestors, not just to the parents. Moreover 
Galton was convinced that nothing we acquire in our 
organic constitution can thus be passed on. If we 
behave more virtuously, will our children do likewise? 
Do the sons of old soldiers, he asked in 1865, learn 
their drill more quickly than others, or the sons of 
fishermen escape sea-sickness? And if acquired char- 
acters are inherited, why have the many tribes of 
American Indians, though scattered over the vast 
range of different climates and situations of the Amer- 
icas, remained much the same? Yet, if heredity is so 
unyielding, why is it that a father’s characters are 
sometimes revealed in the son, sometimes in the 
daughter, or the child may bear the character seen 
only in a grandparent or more distant ancestor? How 
can so hard a process be so fickle? Galton saw that the 
answer lay in the statistical study of large numbers of 
ancestors and descendants, and in making an analysis 
of their statistical relations one to another. This is the 
heart of his project for what was later to be called 
biometry. 


Observational 

First he wanted to gather evidence that behavioral as
well as physical characters are inherited. He chose to
study what he called ‘genius,’ or as he defined it, an
ability that is “exceptionally high, and at the same time
inborn.” It excludes any ability that can be attributed
to the effects of education, but it includes an energetic
disposition: brilliance without application, persistence,
and stamina is of little use. Then he made the
questionable assumption that ability correlates with
eminence in public and professional life. Noting that
great ability seemed to cling, as it were, to particular
families, like his cousin’s, the Darwins, or the Bachs
with their musicians and the Bernoullis with their
mathematicians, he turned to the legal profession and
extracted the names of 109 judges sufficiently eminent 
to be mentioned in Foss’s Lives of the Judges (1865). 
Then he tracked the 85 families involved to establish 
how many relatives of these judges also achieved emi- 
nence in the legal or other professions. He found that 
one in every nine of these judges was either father, son, 
or brother to another judge, not to mention the rela- 
tions of judges that attained higher legal office. He set 
out his results in tabular form (Table 1). The table 
illustrates how fewer and fewer relations of the most 
gifted member of a family attain to eminence the more 
distant is their kinship to that member. The percent- 
ages, wrote Galton, “are quartered at each successive 
remove.” He concluded that the data show “in the 
most unmistakable manner the enormous odds that a 
near kinsman has over one that is remote, in the chance 
of inheriting ability.” To consolidate this claim he 
turned to another eight professions, and to oarsmen 
and wrestlers. Most of the data were supportive of his 
claim, though he noted that some sons of very pious 
parents occasionally turn out extremely badly! 


Methods in Population Studies 


Pedigrees 

To the objection that Galton was ignoring the effects
of nepotism, and the advantages of privileged upbringing
and expensive education, he replied with the
names of great men who, despite their lowly origin,
had become eminent. This criticism, of course, struck
at Galton’s assumption that public eminence is a
measure of native ability. He was aware of another
problem, namely, the incomplete representation of
family members in the data. This is the Achilles heel
of the pedigree method, i.e., the use of family pedigrees
for genetic data collection. Have some of the ‘failures’
in life been left out? Are more representatives of the
male kin included than of the female? And how does one
assess the contribution to ability coming from the
females in the line using
professional achievement, at a time when the
professions studied by Galton were not open to them? By
the 1890s whole-population studies were being undertaken
in Germany to escape such criticisms in the debate over
the supposed inheritance of tuberculosis, and in the 1870s
Galton developed his famous method of twin studies in his
effort to gather reliable evidence concerning the relative
power of heredity and environment in shaping the
offspring. This has become one of the classic approaches
to the study of human traits, for which the experimental
approach is excluded.


Table 1 The judges from 1660 to 1865. (From data in
Galton (1889) Ancestral Inheritance.) Percentages of
eminent relatives, by kinship to the most eminent
members of 100 distinguished families.

   ½  Great-grandfathers
  7½  Grandfathers           ½  Great-uncles
  26  FATHERS               4½  Uncles
  23  BROTHERS              1½  First cousins
  36  SONS                  4½  Nephews
  9½  Grandsons              2  Great-nephews
  1½  Great-grandsons


Twin Studies 

For his study of what he called “The history of 
twins” Galton used the questionnaire method. 
Darwin had circulated a questionnaire in his study of 
heredity and variation in the 1830s, and Galton had 
followed his example in his investigation into the 
upbringing and personal characteristics of Fellows of 
the Royal Society. His appeal for information about 
twins resulted in 35 adequately answered responses 
from parents of ‘closely similar’ twins, and 20 from 
parents of ‘exceedingly unlike’ twins. These allowed 
him to distinguish between what we call identical and 
non-identical twins. From this comparison of the two 
groups he concluded that “nature prevails enormously 
over nurture when the differences of nurture do not 
exceed what is commonly to be found among persons 
of the same rank of society and in the same country.” 
This was a wise qualification because he did not have 
data on twins reared apart either in identical or non- 
identical environments. Subsequent researches by 
Galton’s successors did extend the data collection in 
this way, but it is questionable how different were the 
environments of the separate homes in which the two 
members of each pair of twins grew up. In the 1970s 
the most extensive collection of twin data, that of the 
British psychologist, the late Sir Cyril Burt, was 
exposed as fraudulent. On a subject as politically
sensitive as the heredity-environment equation, this
revelation had a damaging impact upon the field, but
careful twin studies continue, particularly as a tool in 
the study of hereditary predispositions to diseases, 
including mental illnesses. 


Regression 

Another method of central importance in the study of 
populations is that of the statistical distribution of 
traits. Galton was aware of the curve of ‘normal’ dis- 
tribution, also known as the Gaussian or error curve 
after the mathematician Gauss who applied it to the 
study of errors in astronomical measurement. Following
Gauss, the Belgian Adolphe Quetelet found that the
measurements of the chests of 5738 Scottish soldiers
and the stature of 100 000 French conscripts, when
compared with the expectation from Gaussian curves,
showed a “marvelous concordance.” The graph is
bell-shaped, its peak representing the median of the
data (Figure 1), the median being the value that
divides the data into two equal halves; because the
curve is symmetric, the peak also marks the mean. As an
error curve, the sides of the bell represent the
‘population’ of error measurements, and the peak
itself is, one hopes, the ‘true’ measurement. As a
representation of the distribution of the soldiers’
heights, Quetelet envisioned the peak as marking the
height of the ‘average man.’ Those taller or shorter
than this measure were ‘errors,’ as it were, in attempts
to copy the ideal of the race. The fact that these data
fitted the error curve demonstrated, in his view, that
they were homogeneous.
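For reference, the Gaussian curve described above has the density shown below, written in modern notation rather than Quetelet's own. Here μ is the center of the curve and σ its spread. Because the curve is symmetric about μ, the mean, the median, and the peak all coincide at x = μ. About 68% of observations fall within one σ of the center.

```latex
f(x) = \frac{1}{\sigma\sqrt{2\pi}}\,\exp\!\left(-\frac{(x-\mu)^{2}}{2\sigma^{2}}\right),
\qquad
\Pr(\mu-\sigma < X < \mu+\sigma) \approx 0.683
```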

Galton focused his attention less on the homogeneity
of the population than on its variability. He was already
convinced that the median remained the same in successive
generations; how, in spite of variability, could this be?
To find out, he wanted to dissect the curve into its parts
and follow the progeny of those parts. So he devised an
exploratory study in which he asked friends to grow sets
of sweet pea seeds (Lathyrus odoratus), which he had
divided into seven classes by weight. They returned the
crop to him, and he was then able to plot the progeny
seed weights against the weights of their respective
parents. The result revealed a tendency of the progeny
of heavy seeds to be lighter than their parents, and of
the progeny of light seeds to be heavier. There was a
‘reversion’ toward the ancestral mean. Since the aggregate
mean remained the same, and because his helpers all lived
in different parts of the British Isles, he was confident
that the data did not reflect the effects of environment.
This tendency of the progeny to counteract the extremes of
individual variation, ‘shrinking’ the excesses of dwarfs
and giants alike, he called ‘reversion,’ and later, more
wisely, ‘regression,’ since reversion was the term already
in use for the return of the progeny of hybrids to their
originating species. Now he could understand why
variability does not change the median or ‘center of
gravity’ of the population.

Figure 1 … distribution’ (Redrawn from data in Quetelet
(1817) Edinburgh Medical and Surgical Journal: 260-264.)

Having established in a rough manner evidence for
this regression in plants, Galton cast around for data
on human characteristics, but in vain. However, when
he advertised prizes of £500 for those who best filled
in the elaborate set of questions that he prepared
concerning themselves, their grandparents, parents,
sisters, brothers, children, and other relatives, he was
rewarded with a good response. These family records
included the stature of family members, so he was able
to plot the statures of parents against those of their
offspring, from which he calculated the regression
(Figure 2). He expressed what he called the ‘coefficient
of regression’ as the ratio between the deviation of the
offspring and that of the mid-parent (the average of
the two parents’ heights) from the population mean.
This is measured on the graph by the distances AB and
AC or EF and EG. Since the data fall approximately on a
straight line, the ratio is constant throughout its
length, giving a coefficient of regression of two-thirds.
Now he had measured a statistical relation between two
generations.
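Galton read his coefficient off the graph as a ratio of deviations. In modern terms it corresponds to the least-squares slope of offspring deviation on mid-parent deviation. The minimal sketch below uses hypothetical heights, not Galton's data. They are constructed so that each child's deviation from the mean is exactly two-thirds of the mid-parent's, and the sketch recovers a coefficient of 2/3. The function name `regression_coefficient` is illustrative.

```python
from statistics import fmean


def regression_coefficient(midparents: list[float], offspring: list[float]) -> float:
    """Least-squares slope of offspring height on mid-parental height.

    Deviations are taken from each group's mean, so the slope is
    sum(dx * dy) / sum(dx * dx): the typical ratio of offspring
    deviation to mid-parent deviation.
    """
    mx, my = fmean(midparents), fmean(offspring)
    dx = [x - mx for x in midparents]
    dy = [y - my for y in offspring]
    return sum(a * b for a, b in zip(dx, dy)) / sum(a * a for a in dx)


# Hypothetical heights in inches; the mean is 68 in both generations.
midparent = [62.0, 65.0, 68.0, 71.0, 74.0]
children = [64.0, 66.0, 68.0, 70.0, 72.0]  # deviations shrunk to two-thirds

print(round(regression_coefficient(midparent, children), 4))  # 0.6667
```

With real data the points scatter about the line, and the least-squares slope summarizes their average trend. Galton instead read that trend directly from the plotted means.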


Correlation 

Initially he considered regression in one direction
only, but later realized that regression also runs the
other way: the parents of children of a given stature
are, on average, closer to the population mean than
those children are. Then in 1885 he hit upon the
concept of ‘correlation,’ namely that where the
variation of one entity is related to that of another,
the two can be considered to share, at least in part,
common causes. This important conception was
developed more fully later by Karl Pearson.
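In modern notation, due to Pearson rather than Galton, the regression of y on x and the regression of x on y are two different slopes: b_yx = S_xy/S_xx and b_xy = S_xy/S_yy. Here S denotes sums of products of deviations from the means. Their product equals r², the square of the correlation coefficient r = S_xy/√(S_xx·S_yy). The two slopes are therefore reciprocals of each other only when the correlation is perfect (|r| = 1). The sketch below checks this on a small hypothetical data set. The helper name `sums_of_products` is illustrative.

```python
from math import isclose, sqrt
from statistics import fmean


def sums_of_products(xs: list[float], ys: list[float]) -> tuple[float, float, float]:
    """Return (S_xx, S_yy, S_xy): sums of squared and cross deviations from the means."""
    mx, my = fmean(xs), fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxx, syy, sxy


# Hypothetical paired measurements, e.g. a trait in parents (x) and children (y).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 1.0, 4.0, 3.0, 5.0]

sxx, syy, sxy = sums_of_products(x, y)
b_yx = sxy / sxx            # regression of y on x
b_xy = sxy / syy            # regression of x on y
r = sxy / sqrt(sxx * syy)   # Pearson correlation coefficient

print(b_yx, b_xy, r)                 # 0.8 0.8 0.8
print(isclose(b_yx * b_xy, r ** 2))  # True: product of the slopes is r squared
```

Here both slopes are 0.8 rather than reciprocals (0.8 and 1.25), so regression toward the mean appears in both directions.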
Figure 2 Graph of the relation between the mid-parental
heights of parents and the mean height of their
children. The diagonal line represents all points on the
graph corresponding to hypothetical parents and their
children where the means of the children’s heights are
identical with their parents’. The steeper line is plotted
from Galton’s data. The ratios AB:AC or EF:EG give the
coefficient of regression. (Redrawn from Galton (1869)
Hereditary Genius: An Inquiry into its Laws and
Consequences, p. 83. London: Macmillan.)


Natural Selection 


Stabilizing Selection 

Darwin had focused upon the slight individual differ- 
ences that constitute the variation to be found among 
members of a family and of a species. “They afford 
material,” he said, “for natural selection to accumu- 
late, in the same manner as man can accumulate in any 
given direction individual differences in his domesti- 
cated productions.” But Galton was convinced that 
natural selection cannot work effectively against her- 
edity, and he considered that his work was showing 
heredity maintaining the racial mean. According to 
Galton, the mean is the adapted form and deviations 
from it will be less adapted to the conditions of life. 
Therefore, natural selection will be aiding heredity in 
preserving it. In other words, he granted natural selection
its ‘stabilizing’ role, but not its ‘creative’ role. As
the reason for this state of affairs he turned to the 
physiology of reproduction. The fertilized egg is com- 
posed of hereditary material from two parents, so that 
each time a new generation is produced there is a 
bringing together of two such materials. Inevitably 
the contributions of each parent and each ancestor 
will be diluted. So he accepted the long-held tradition
of fractional inheritance, according to which the
parents collectively contribute one-half, the grandparents
one-quarter, and the great-grandparents one-eighth to 
the hereditary constitution of the offspring. These 
ancestral contributions exert their influence and tend 
to bring back the progeny of deviants toward the mean 
of the ancestral population as a whole. 
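The fractions in this scheme form a geometric series, which is why the ancestry as a whole accounts for the entire hereditary constitution. In the usual modern way of writing it, let n be the number of generations back. Generation n then contains 2ⁿ ancestors, each contributing (1/4)ⁿ, so that the generation as a whole contributes (1/2)ⁿ:

```latex
\underbrace{2^{n}}_{\text{ancestors}} \times \underbrace{\left(\tfrac{1}{4}\right)^{n}}_{\text{each}} = \left(\tfrac{1}{2}\right)^{n},
\qquad
\sum_{n=1}^{\infty} \left(\tfrac{1}{2}\right)^{n} = \tfrac{1}{2} + \tfrac{1}{4} + \tfrac{1}{8} + \cdots = 1 .
```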


Discontinuous Variation 

Having thus restricted the role of natural selection, he 
turned to ‘sports’ of nature, those marked deviations 
that possess a stability shown by the absence of regres- 
sion to the existing type among their progeny. These 
deviations create a new mean toward which any pro- 
geny will tend to regress instead of regressing toward 
the mean of the original population. Hence, he 
explained, these sports may give rise to a new race 
with but little help from natural selection. He was
thus opposed to his cousin Darwin, who in On the
Origin of Species had stressed how unlikely it was that
such sports could serve as the starting point for new
species. Granted, they were strongly inherited, but
most sports were closer to monstrosities than to 
newly adapted forms. In any case, thought Darwin, 
their rarity would result in the dilution of their type in 
successive generations of breeding with other mem- 
bers of the species. Darwin proved to be largely right 
on his first objection, but wrong on his second. 

It should not be assumed that Galton’s apostasy 
over natural selection was unusual for the nineteenth 
century. The consensus was in favor of evolution by 
descent, but not under the principal agency of natural 
selection. Therefore, it is ironic that those who most 
strongly supported the pre-eminence of natural selec- 
tion considered themselves Galton’s successors. Thus 
Karl Pearson and Raphael Weldon developed Galton’s 
statistical techniques and corrected his errors. But 
when Weldon sought to demonstrate natural selection 
in its creative role, shifting the mean of a population, 
he only succeeded in demonstrating its stabilizing 
role. Pearson exposed some of the confusions in the 
several differing representations of the ancestral law 
offered by Galton. He corrected the figure of two- 
thirds for the regression of offspring on parents, 
explained why regression was not the barrier to selec- 
tion that Galton claimed, and explored the effects of 
‘assortative’ mating, i.e., the choice of mates based on 
similarities of ability and background, which pro- 
motes the shifting of the mean of the resulting off- 
spring further and further away from that of the 
general population. 


Finger Prints 
The minute and distinctive patterns of ridges on the
skin were used to make finger prints before Galton
took up the subject. But it was he who conducted
a systematic study leading to his classification of 
the differing types and it was he who persuaded the 
police to adopt the practice of fingerprinting for 
personal identification of criminals. For Galton the 
subject had a compelling theoretical interest because 
the trait appeared to have no function such that 
natural selection could act upon it. Marriage selection 
does not depend on it; the different patterns are 
not confined to particular classes or races. Therefore, 
there is complete ‘promiscuity’ with respect to this 
trait. Yet the varieties remain distinct. Here, then, 
we have a trait whose varieties do not blend and 
are not subject to selection. This, he believed, was an 
example of the existence and persistence of distinctive 
types independent of selection. Of course he could not 
really have known whether the trait was connected to 
some other trait that is subject to natural selection. 


Eugenics 


Controlling our Evolution 

The driving force behind Galton’s extensive and long- 
continued research was not just his curiosity, great as 
that was, but his vision of a future in which mankind 
would attain to greater energy and coadaptation. But 
he realized how easy it was to follow the wrong 
course. To accept the evolutionary process passively 
would be to surrender to “blind and wasteful pro- 
cesses” in which raw material is produced extrava- 
gantly and all that is superfluous is rejected “through 
the blundering steps of trial and error.” He favored the 
alternative that we should take control of our evolu- 
tion, for it may be that we are the “only executives on 
earth.” Hence the importance of eugenics in providing 
the proper scientific basis for action. To support such 
work he settled an endowment on University College 
London so that in 1905 a Research Fellow in eugenics 
could be appointed. Further expansion led to the creation
of the Eugenics Laboratory. In 1911, again through
Galton’s munificence, a chair of eugenics was estab- 
lished at the College, the first occupant being Pearson. 


Victorian Attitudes 

Galton’s attitude to racial differences, to women, and 
to the indigent was typical of a wealthy Victorian. 
Though he was mild of manner and gentle in disposition,
his attitude to the less fortunate was unquestionably
harsh. Many at the time endorsed the policy of nega- 
tive eugenics, i.e., to discourage the marriage and 
procreation of offspring by the exceptionally unfit, 
but Galton went further and wanted to favor those 
families that were “exceptionally fit for citizenship.” 
He asked: since there was substantial giving to the
poor and destitute, could not support also be forthcoming
to promote “the natural gifts and the national
efficiency of future generations”? His concern, like 
Pearson’s, was over the differential between the repro- 
ductive rates of the upper middle and lower classes in 
favor of the latter. 


Publicizing Eugenics 

In 1904 the time was judged opportune for Galton to 
address the Sociological Society on eugenics. Here he
argued for the maintenance of diversity, but with “each
class or sect” represented “by its best specimens,”
who were then to be left to “work out their common
civilization in their own way.” The best were the
healthy, the energetic, the able, the manly (!), the 
courteous, but he advised leaving out the cranks and 
refusing the criminals. Eugenics should study the con- 
ditions that cause families to thrive and leave more 
descendants, so that the most useful members of 
society could be encouraged to adopt such conditions. 
The tasks ahead were, first, to establish eugenics as an
academic question, second, to bring about consideration
of its practical development, and third to introduce 
eugenics “into the national conscience, like a new 
religion.” He ended by cautioning his audience against 
too much zeal which could lead to hasty action. A 
golden age is not round the corner, he warned. Such 
expectations would lead to the discrediting of the science.
In the event it took Hitler’s treatment of the Jews in 
World War II to achieve that. 


Conclusion 


Galton was the confident English gentleman, well 
aware of the superiority of his nation and his class, 
condescending to the former colonies, and dedicated 
to turning back the degeneration of his countrymen. 
But he disparaged the institution of the aristocracy, 
rejected the Christian religion, and considered many 
of our behavioral characteristics as outworn relics 
from a primitive stage in our social evolution. Al- 
though his mathematical skills were limited, his im- 
agination, insight, and inventiveness were remarkable. 
Allied to his incessant curiosity, these talents made 
him one of the founders of the statistical revolution 
that occurred in his lifetime. His book Natural Inherit- 
ance (1889) proved an inspiration and a turning point in 
the lives of several of those who became important 
contributors to the development of biometry, statis- 
tics, and evolutionary biology. The imaginative psycho- 
logical studies that he published in Inquiries into 
Human Faculty and its Development (1883) proved
an important influence among psychologists. 

Gametes are the haploid cells that fuse in the sexual life
cycle to form the diploid zygote. Not all sexual organ- 
isms have gametes in the sense of specialized un- 
inucleate cells, but they nevertheless contrive to 
bring haploid nuclei together for fusion (karyogamy), 
and we can refer to these as gamete nuclei. Not all 
gametes and gamete nuclei are differentiated into male 
and female, and not all are the immediate products of 
meiosis. In the majority of sexually reproducing 
organisms other than animals there is usually a haploid 
phase of mitotic division interposed between meiosis 
and sexual nuclear fusion, and in ferns, mosses, and 
liverworts, and most fungi and algae, the haploid 
phase is free-living. 

This article briefly reviews the variations found in 
different groups of sexually reproducing organisms. 


Animals 


All groups of animals, other than the unicellular forms 
(Protozoa), have differentiated female eggs and male 
spermatozoa, the former contributing both nucleus 
and cytoplasm to the zygote and the latter little more 
than the nucleus. Both are the immediate products of 
meiosis, but whereas all four spermatozoa formed in a
sperm mother cell (spermatocyte) are potentially 
viable, only one nucleus from meiosis in the oocyte 
survives in the egg, which is generally released from 
the ovary for fertilization as a free cell. The spermato- 
zoa are each propelled by a single flagellum. 


Variations in the Form of Gametes in 
Other Organisms 


All Gametes Motile in some (not all) Algae 
In algae we see all the stages in the hypothetical evolution
of male and female gametes from the supposed
primordial state of gametes of similar form and size. In
some green algae, including the unicellular, motile 
Chlamydomonas reinhardtii, the gametes are motile 
biflagellate cells all the same size, though of two dif- 
ferent mating types. In some related species, sexual 
fusion is between larger and smaller motile cells, 
which may be called male and female; the female 
gametes may, as in the colonial genus Volvox, lose 
their flagella and become nonmotile, so becoming 
more like eggs. 

Among the brown algae, some forms, such as the 
filamentous Ectocarpus, have equal-sized biflagellate 
motile gametes. The large brown seaweeds, exempli- 
fied by the genera Laminaria and Fucus and their 
allies, have nonmotile free-floating ova and motile 
sperm (called antherozoids). These genera are predom- 
inantly diploid, and the gametes in Fucus are the 
immediate products of meiosis. 

In the red algae there is another variation: the male
gametes are not motile at all, but are nonmotile
spermatia, released into the water in great numbers
to fuse with female receptive filaments connected
to the ova, which are retained within the female organ
rather than being allowed to drift.

One well-studied group of Fungi, the Blasto- 
cladiales (e.g., Blastocladiella, Allomyces) can be men- 
tioned here because of their remarkably alga-like 
mode of reproduction, with motile uniflagellated 
‘male’ and ‘female’ gametes of different size. 


Gametes and Vegetative Cells 
Interchangeable 

In the budding yeasts such as Saccharomyces cerevi- 
siae, the haploid products of meiosis are ready to 
function as gametes immediately, provided that differ- 
ent mating types come together (as they always do in 
strains with mating-type switching), but if restricted 
to one mating type they can bud indefinitely as vege- 
tative haploid cells. 


Ferns, Mosses, etc.: Male Gametes Motile 
In both ferns and mosses, the female gametes are eggs 
held within female receptive structures (archegonia), 
while the male gametes are motile sperms, biflagellate 
in mosses but with many fine cilia in ferns. Two orders 
of much larger plants, the Cycadales and Ginkgoales, 
sometimes classified as distantly related to the gym- 
nosperms (pine trees, etc.), also have multiciliated 
motile male gametes. 


Seed Plants: Female Eggs and Male Gamete 
Nuclei 

The two main groups of seed plants, the angiosperms
(flowering plants) and the gymnosperms, have egg cells
within their respective female reproductive structures,
but, strictly speaking, they do not have male gametes in
the sense of separate cells, only gamete nuclei.

In the angiosperms the product of meiosis on the 
female side is a megaspore, which undergoes haploid 
mitosis to produce eight nuclei, one of which becomes 
the nucleus of the egg. On the male side, the pollen 
grain (microspore) germinates to give a pollen tube 
which, after two mitotic divisions, contains three hap- 
loid nuclei. The pollen tube grows down the style of 
the flower to the embryo sac, where one of its nuclei 
fuses with the egg nucleus while another fuses with 
two other embryo sac nuclei to found the (usually) 
triploid tissue of the seed endosperm, which has a 
nutritive function. The pollen tube nucleus that fertil- 
izes the egg can certainly be called a gamete nucleus, 
and the one that contributes to the endosperm, which 
has no genetic future, is a gamete nucleus in a more 
special sense. 

The gymnosperms are different in that the haploid 
tissue derived from the megaspore is much more 
extensive than in the angiosperms. It includes the 
endosperm of the seed, which here is purely maternal, 
and, embedded within it, the archegonia that contain 
the eggs. The pollen grains make no genetic contribu- 
tion to the endosperm and, of the few haploid nuclei 
(usually four) in the pollen tube, only the one that 
fuses with the egg can be called a gamete nucleus. 


Fungi: Gamete Nuclei in Gametangia and 
Dikaryons 

In fungi of the important group Mucorales, which 
include the bread-mold genus Mucor, and Phycomyces 
blakesleeanus (much worked on by Max Delbrück for
its response to light) the cells which fuse sexually are 
called gametangia, borne as club-shaped branches on 
the filamentous mycelium. They appear to be multi- 
nucleate and generally similar in size, though often of 
different mating types. Cell fusion is followed by 
nuclear fusion (karyogamy), but whether of one or 
several pairs of nuclei is not completely clear. The 
multinucleate gametangia cannot be described as 
gametes in themselves, but the nuclei that they contain 
can be termed gamete nuclei. 

Most of the fungi that have been used for genetics 
belong either to the Ascomycetes or the agaric (mush- 
room) division of the Basidiomycetes. In both of these 
groups the growth that gives rise to the diploid cells, 
within which meiosis occurs, is haploid and, except in
the yeasts, dikaryotic, that is to say, consisting of
binucleate cells, with the pairs of nuclei dividing in 
synchrony. In the mushrooms the dikaryon is the 
major proliferative phase of the life cycle. Haploid 
basidiospores, the immediate products of meiosis, 
germinate to give mycelia which remain monokaryotic 
only so long as it takes them to find another monokaryon
of compatible mating type with which they
can fuse to form a dikaryon. The dikaryon produces 
the mushrooms which bear the basidia — specialized 
cells within which fusion of the mutually compatible 
nuclei finally takes place, with meiosis following 
immediately. There are no gametes in the life cycle, 
but the nuclei fusing within the basidium might be 
called gamete nuclei, though they are not usually so 
termed. 

In the filamentous Ascomycetes (which include 
such genetically important genera as Aspergillus, 
Neurospora, Sordaria, Podospora, and Ascobolus) 
dikaryon formation follows the fertilization of female 
structures (ascogonia), which have receptive filament- 
ous outgrowths called trichogynes. The male fertiliz- 
ing elements come in various forms: as specialized 
fertilizing spores (microconidia), as conidia of the 
same kind as propagate the fungus vegetatively, or as 
vegetative hyphal tips. Following fusion with a tricho- 
gyne, a male nucleus migrates into the ascogonium to 
establish a dikaryon in partnership with the ascogonial 
nucleus. The dikaryon proliferates briefly within the 
developing fruit body (as ascogenous hyphae) but soon
forms ascus initials, within each of which a pair of
nuclei, the descendants of the original pair, undergoes
fusion. Meiosis in the ascus follows immediately, with 
the formation of haploid ascospores. In this system the 
term gamete, if used at all, should be reserved for the 
nuclei within the dikaryon which finally fuse, rather 
than the cells which initiate the dikaryon. 


Contrasting Styles in the Protozoa 

The Protozoa — single-celled animals — are a vast and 
diverse group. The ciliates Paramecium and Tetra- 
hymena are probably the most extensively studied 
from the genetic point of view. 

Paramecium and Tetrahymena spp. are diploid 
organisms, and a cell about to enter into sexual fusion 
(conjugation) undergoes meiosis; three of the four 
haploid nuclei degenerate, and the survivor divides 
once mitotically to give two haploid nuclei. Conjugat- 
ing pairs of cells remain joined for long enough for one 
haploid nucleus from each cell to pass into the other, 
where it fuses with the resident nucleus. This is 
another example of gamete nuclei, rather than gamete 
cells. 

In a very different protozoan, Plasmodium falci- 
parum, the mosquito-transmitted cause of malaria, dif- 
ferentiated male and female gametes are formed in the 
mosquito; the female gametes are nonmotile spherical 
cells and the motile male gametes are whip-like, not 
with distinct head and flagellum as in higher animals. 


See also: Meiotic Product 

Gametes, Mammalian
L Silver 
Gamete is the general term used to describe the reproductive
cells of animals or plants. Thus, in animals,
sperm and eggs are both considered gametes. 


See also: Meiosis 

Definition and Relationship to
Recombination under Neutral Model 


The description of genetic variation at the population 
level usually begins with consideration of allelic vari- 
ation at a single gene locus. The next step is to consider 
genetic variation at two or more loci simultaneously, 
including nonrandom associations. Gametic disequi- 
librium (also referred to as linkage disequilibrium) 
describes the nonrandom association of alleles at different
genetic loci. The pairwise gametic disequilibrium
parameter, usually denoted D, is given
by the difference between the observed frequency of a 
gametic type and the frequency expected on the basis 
of random association of alleles in gametes. Gametic 
disequilibrium can occur in populations as a conse- 
quence of mutation, selection, migration or admix- 
ture, and random genetic drift. The amount of 
gametic disequilibrium observed in a population is 
affected by recombination, selection, nonrandom 
mating, and the demographics of the population. 
Consider two genes (denoted A and B), with two alleles each (A, a and B, b), and four gametic types (also referred to as haplotypes): AB, Ab, aB, and ab. The frequencies of the four gametes (denoted f(AB), etc.) can be described in terms of the allele frequencies $p_A$ ($p_a = 1 - p_A$) and $p_B$ ($p_b = 1 - p_B$) at the two loci, and the gametic disequilibrium parameter D, as follows:

$$f(AB) = p_A p_B + D, \qquad f(Ab) = p_A p_b - D,$$
$$f(aB) = p_a p_B - D, \qquad f(ab) = p_a p_b + D,$$

where $D = f(AB) - p_A p_B = f(AB)\,f(ab) - f(Ab)\,f(aB)$. If D = 0 (a state referred to as gametic or linkage equilibrium), the alleles at the two loci are randomly associated. If D > 0, allele A occurs with allele B more often than expected by chance (and hence a with b), while if D < 0, alleles A and b (and hence a and B) are preferentially associated. The possible values of D in a two-locus system are constrained by the requirement that all haplotype frequencies be ≥ 0. The normalized gametic disequilibrium $D' = D/D_{\max}$ is often considered, where $D_{\max}$ is the lesser of $p_A p_b$ and $p_a p_B$ if D is positive, and the lesser of $p_A p_B$ and $p_a p_b$ if D is negative. The advantage of this measure over D is that it has a range from −1 to +1, regardless of the allele frequencies. The correlation coefficient $r = D/(p_A p_a p_B p_b)^{1/2}$, with a range from −1 to +1, is also often used, as well as $r^2$, with a range from 0 to +1.
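The definitions above translate directly into code. The following is a minimal Python sketch, not a published implementation: the function name `ld_stats` and the example frequencies are hypothetical. It computes D, D′, r, and r² from the four haplotype frequencies of a two-locus, two-allele system.

```python
import math


def ld_stats(f_AB, f_Ab, f_aB, f_ab):
    """Return D, D', r and r^2 for a two-locus, two-allele system.

    Arguments are the haplotype (gamete) frequencies of AB, Ab, aB and ab.
    """
    if not math.isclose(f_AB + f_Ab + f_aB + f_ab, 1.0, abs_tol=1e-9):
        raise ValueError("haplotype frequencies must sum to 1")
    p_A, p_B = f_AB + f_Ab, f_AB + f_aB  # allele frequencies of A and B
    p_a, p_b = 1.0 - p_A, 1.0 - p_B
    D = f_AB - p_A * p_B  # equivalently f(AB)f(ab) - f(Ab)f(aB)
    # D_max is set by the haplotype frequencies that D pushes toward zero.
    if D > 0:
        d_max = min(p_A * p_b, p_a * p_B)
    else:
        d_max = min(p_A * p_B, p_a * p_b)
    d_prime = D / d_max if d_max > 0 else 0.0
    denom = p_A * p_a * p_B * p_b
    r = D / math.sqrt(denom) if denom > 0 else 0.0  # zero if either locus is monomorphic
    return {"D": D, "D_prime": d_prime, "r": r, "r2": r * r}


print(ld_stats(0.4, 0.1, 0.1, 0.4))
```

For example, haplotype frequencies of 0.4, 0.1, 0.1, and 0.4 (for AB, Ab, aB, ab) give allele frequencies of 0.5 at both loci. They also give D = 0.15, D′ = 0.6, and r² = 0.36.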

It is possible, although relatively rare, that unlinked 
loci can be in significant gametic disequilibrium, and 
that very closely linked loci may be in gametic equilib- 
rium. The value of gametic disequilibrium is often 
expected to change each generation. Further, a popu- 
lation at genetic equilibrium can have significant 
gametic disequilibrium (e.g., under various selection 
schemes). The term linkage disequilibrium is thus an 
unfortunate choice and the term gametic disequilib- 
rium is preferable, although also not perfect; however, 
the term linkage disequilibrium is commonly used. 

There is a relationship between D and the recombination fraction c between two loci. Under random mating and a neutral model, gamete frequencies change over generations only through recombination in individuals heterozygous at both loci under consideration. In fact, D is multiplied by (1 − c) each generation; that is, it decreases by a fraction c. Thus the gametic disequilibrium in this case converges to zero (random association of alleles) with time as $D_n = (1 - c)^n D_0$, where $D_0$ is the initial value and n is the number of generations. The more loosely linked two loci are, the faster gametic disequilibrium decays. On the other hand, for very tightly linked loci gametic disequilibrium may persist for a very long time.
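The decay described above can be checked numerically. The following is a minimal Python sketch that assumes an infinitely large, randomly mating population with no selection. The function names and the starting haplotype frequencies are illustrative choices, not taken from the text. Each generation, a fraction (1 − c) of gametes carry a parental haplotype unchanged, and a fraction c are recombinants whose haplotype frequencies equal the products of the allele frequencies. Under these assumptions D follows $(1 - c)^n$ exactly.

```python
import math


def next_generation(freqs, c):
    """Haplotype frequencies after one generation of random mating (neutral model).

    freqs maps "AB", "Ab", "aB", "ab" to frequencies; c is the recombination fraction.
    """
    p = {"A": freqs["AB"] + freqs["Ab"], "B": freqs["AB"] + freqs["aB"]}
    p["a"], p["b"] = 1.0 - p["A"], 1.0 - p["B"]
    # A fraction (1 - c) of gametes carry a parental haplotype unchanged; the
    # remaining fraction c are recombinants, which pair alleles at random.
    return {h: (1 - c) * f + c * p[h[0]] * p[h[1]] for h, f in freqs.items()}


def disequilibrium(freqs):
    """D = f(AB)f(ab) - f(Ab)f(aB)."""
    return freqs["AB"] * freqs["ab"] - freqs["Ab"] * freqs["aB"]


def half_life(c):
    """Generations needed for D to fall to half its initial value: (1 - c)^n = 1/2."""
    return math.log(0.5) / math.log(1 - c)


freqs = {"AB": 0.5, "Ab": 0.0, "aB": 0.0, "ab": 0.5}  # only AB and ab present
c = 0.1
d0 = disequilibrium(freqs)
history = [d0]
for _ in range(10):
    freqs = next_generation(freqs, c)
    history.append(disequilibrium(freqs))
print(history)
print(half_life(0.5), half_life(0.01))
```

The half-life shows the contrast in the text. Unlinked loci (c = 0.5) halve their disequilibrium in a single generation. Loci with c = 0.01 take roughly 69 generations to do so.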

The definition of gametic disequilibrium is easily 
extended to accommodate more than two alleles at a 
locus. When considering three or more loci, higher 
order disequilibrium terms are also needed. For example,
in a three-locus system a gametic frequency can
be expressed in terms of the three allele frequencies,
the three pairwise gametic disequilibria, and a single 
measure of third-order gametic disequilibrium. 


Evolutionary Forces Creating Gametic 
Disequilibrium 


Historical 
When a new mutant arises, it occurs in one individual
and is in gametic disequilibrium with all polymorphic
loci in the population. For example, if the A locus
is monomorphic initially (allele A), and the B locus is
polymorphic (alleles B and b), then when a new
mutant (allele a) arises it will occur on a chromosome
carrying either B or b, but not both, so the alleles
are nonrandomly associated, with D ≠ 0. Although
the absolute value of D is relatively small in this case,
the normalized disequilibrium D′ is +1 or −1. The
new allele may increase in frequency due to, for ex- 
ample, genetic drift or selection and although recom- 
bination will break down this nonrandom association, 
significant gametic disequilibrium may be maintained 
for a long time between very closely linked loci. 
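A short worked example makes this case concrete. It rests on illustrative assumptions: the population contains 2N chromosomes, B and b each have frequency 1/2, and the single new mutant a arises on a B-bearing chromosome, so that its frequency is x = 1/(2N):

$$p_a = x,\qquad p_A = 1 - x,\qquad p_B = p_b = \tfrac{1}{2},\qquad f(aB) = x,\qquad f(ab) = 0,\qquad f(AB) = \tfrac{1}{2} - x$$

$$D = f(AB) - p_A p_B = \left(\tfrac{1}{2} - x\right) - \tfrac{1}{2}(1 - x) = -\tfrac{x}{2}$$

$$D_{\max} = \min(p_A p_B,\; p_a p_b) = \min\!\left(\tfrac{1-x}{2},\; \tfrac{x}{2}\right) = \tfrac{x}{2} \quad \left(x \le \tfrac{1}{2}\right), \qquad D' = \frac{D}{D_{\max}} = -1$$

With 2N = 1000, for instance, x = 0.001 and D = −0.0005, while D′ = −1. Here D is negative because the new allele a occurs only with B. Had it arisen on a b-bearing chromosome, D′ would be +1.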


Selection (Direct and Hitchhiking) 

Selection for different combinations of alleles can produce
D ≠ 0. If selection is acting directly on the
two loci being considered, gametic disequilibrium can
be maintained in an equilibrium state. While this can
apply to unlinked loci, it is expected more often for 
closely linked loci. Transient gametic disequilibrium 
can also be created with neutral loci via a hitchhiking 
event. If an allele, say b, at a neutral locus is in gametic 
disequilibrium with an allele favored by selection at 
another locus, say a, then changes in the frequency of 
the allele b will occur due to this nonrandom associ- 
ation. Such hitchhiking can noticeably increase the 
absolute value of the gametic disequilibrium, if selec- 
tion in favor of the new mutant is greater than the 
recombination rate between the neutral and selected 
loci. Further, gametic disequilibrium can be generated 
between two neutral loci via hitchhiking at a third 
closely linked locus. Gametic associations built up via 
hitchhiking events are expected to decline in strength 
as recombination breaks up haplotypes bearing the 
selected allele. 


Migration or Admixture 

Mixing of two genetically different populations can
create gametic disequilibrium. As an extreme example, consider
two populations, one monomorphic for alleles A and
B and the other monomorphic for alleles a and b, so that
initially, when the populations are mixed, there are
only two gametic types, AB and ab; hence D ≠ 0. For
pairwise gametic disequilibrium to be generated by 
migration or admixture, the allelic frequencies of both 
loci in the two populations must be different, and the 
difference in allele frequencies must be substantial in 
order to generate very much gametic disequilibrium. 
Again recombination will break down this association 
over time. 
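The following is a minimal Python sketch of the admixture case. It assumes that each source population is itself in gametic equilibrium before mixing, an assumption the text does not state. The helper names are hypothetical. Algebraically the result equals $m(1 - m)(p_{A1} - p_{A2})(p_{B1} - p_{B2})$, where m is the fraction contributed by population 1. This makes explicit why the allele frequencies at both loci must differ between the two populations.

```python
def admixed_disequilibrium(m, pop1, pop2):
    """D immediately after mixing two populations.

    m is the fraction of the mixture contributed by population 1, and pop1 and
    pop2 are (p_A, p_B) allele-frequency pairs. Assumes each source population
    is in gametic equilibrium, so its f(AB) is simply p_A * p_B.
    """
    (pA1, pB1), (pA2, pB2) = pop1, pop2
    f_AB = m * pA1 * pB1 + (1 - m) * pA2 * pB2  # AB frequency in the mixture
    p_A = m * pA1 + (1 - m) * pA2               # allele frequencies in the mixture
    p_B = m * pB1 + (1 - m) * pB2
    return f_AB - p_A * p_B


def admixed_disequilibrium_closed_form(m, pop1, pop2):
    """Same quantity written as m(1 - m)(p_A1 - p_A2)(p_B1 - p_B2)."""
    (pA1, pB1), (pA2, pB2) = pop1, pop2
    return m * (1 - m) * (pA1 - pA2) * (pB1 - pB2)


# The extreme case from the text: one population fixed for A and B, the other for a and b.
print(admixed_disequilibrium(0.5, (1.0, 1.0), (0.0, 0.0)))
```

Take m = 0.5 and the extreme case in the text. The sketch then gives D = 0.25, the largest value possible for two biallelic loci.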


Finite Population Size 
Genetic drift can cause nonrandom associations 
between alleles at different loci. While the expected 
value of pairwise gametic disequilibrium due to drift
over many generations is zero, the variance is large 
for closely linked loci in small populations. The 
demographic structure of a population will affect the 
amount of gametic disequilibrium observed. A small 
founder population or a bottleneck in the recent 
past can cause significant gametic disequilibrium for 
closely linked loci. While less gametic disequilibrium 
will be generated by genetic drift in a rapidly grow- 
ing population, gametic disequilibrium present before 
or during the early phase of the expansion will 
persist. 


Nonrandom Mating 

The mating or reproductive system can retard the rate 
of approach to random allelic association. For ex- 
ample, a high level of self-fertilization leads to a reduc- 
tion in the proportion of double heterozygotes, from 
which recombinants are subsequently formed, and 
hence retards the decay to gametic equilibrium. 


Population Level Observations 


Significant nonrandom association (gametic dis- 
equilibrium) between alleles at two loci can be tested 
using the chi-square (χ²) test and Fisher’s exact test
on the contingency table of gametic types. Algorithms
are available to perform Fisher’s exact test, and Monte
Carlo methods to approximate its results, so that cases
in which the expected numbers of some gametic types are
small can be analyzed. While genetic drift and demographic effects
should be randomly distributed over the genome, the 
effects of natural selection are expected to be nonran- 
domly distributed. 
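The two tests mentioned above can be sketched for a 2 × 2 table of haplotype counts. This is a minimal, standard-library-only Python illustration. The function names are hypothetical, it assumes haplotypes are observed directly (phase known), and it does not implement the Monte Carlo approximation. For a 2 × 2 table the Pearson statistic equals n r², where n is the number of haplotypes sampled. The two-sided Fisher p-value sums the probabilities of all tables with the observed margins that are no more probable than the observed table.

```python
import math


def chi_square_ld(n_AB, n_Ab, n_aB, n_ab):
    """Pearson chi-square (1 df, no continuity correction) for haplotype counts.

    Assumes haplotypes are observed directly and no row or column total is zero.
    """
    n = n_AB + n_Ab + n_aB + n_ab
    margins = (n_AB + n_Ab, n_aB + n_ab, n_AB + n_aB, n_Ab + n_ab)  # A, a, B, b
    if 0 in margins:
        raise ValueError("a locus is monomorphic in the sample")
    stat = n * (n_AB * n_ab - n_Ab * n_aB) ** 2 / math.prod(margins)
    p_value = math.erfc(math.sqrt(stat / 2))  # upper tail of chi-square, 1 df
    return stat, p_value


def fisher_exact_ld(n_AB, n_Ab, n_aB, n_ab):
    """Two-sided Fisher's exact test for the same table (hypergeometric null)."""
    r1, r2 = n_AB + n_Ab, n_aB + n_ab  # A-bearing and a-bearing haplotypes
    c1 = n_AB + n_aB                   # B-bearing haplotypes
    n = r1 + r2

    def prob(x):  # probability that the AB cell equals x, given all margins
        return math.comb(r1, x) * math.comb(r2, c1 - x) / math.comb(n, c1)

    p_obs = prob(n_AB)
    lo, hi = max(0, c1 - r2), min(r1, c1)
    # Sum all tables at least as improbable as the observed one (small tolerance for rounding).
    return sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-7))


print(chi_square_ld(30, 10, 10, 30), fisher_exact_ld(30, 10, 10, 30))
```

The exact test avoids relying on the chi-square approximation. That approximation is unreliable when some expected counts are small.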

General observations are that there is an overall 
proportionality between gametic disequilibrium and 
the inverse of the recombination distance, although this 
breaks down in very closely linked regions. Gametic 
disequilibrium is nonrandomly distributed through- 
out the genome. Some regions, such as the immune 
response human leukocyte antigen (HLA) system on 
chromosome 6, show strong evidence of selection and 
significant gametic disequilibrium which may span 3 
centimorgans (cM) or more. 


Disequilibrium (Association) Mapping of 
Disease Loci 


The existence of gametic disequilibrium has been a 
very powerful tool in mapping over 200 diseases to 
the HLA region. An increased frequency of an HLA 
antigen (allele) in patients over that in an ethnically 
matched control population is inferred to be due
either to the direct effect of the HLA antigen itself 
on disease, or to gametic disequilibrium (association) 
of the HLA allele with the actual disease-causing 
allele at a separate locus. Stratification analyses 
can be used to distinguish between these two pos- 
sibilities. 

For monogenic traits, most disease genes mapped 
to date show gametic disequilibrium with markers
sufficiently close to the disease gene, 0.5 cM or
more, e.g., cystic fibrosis, Huntington disease, Wilson 
disease, Batten disease, Friedreich ataxia, myotonic 
dystrophy, torsion dystonia, hemochromatosis, di- 
astrophic dysplasia, adult onset polycystic kidney 
disease, and many others. The familial breast can- 
cer gene BRCA1 is an exception to this rule; gam- 
etic disequilibrium is not seen with closely linked 
markers since each family usually has a unique mu- 
tation. 

For complex diseases involving multiple loci, in- 
complete penetrance, and genetic heterogeneity, asso- 
ciation mapping has been successfully applied in the 
study of candidate regions. The greater number of 
markers needed for an association genome scan to 
detect disease-predisposing genes compared with 
standard linkage analysis techniques (LOD score 
analysis and affected sib pair methods) has prevented 
their wide-scale implementation to date. However, the 
use of DNA pooling for the study of microsatellite 
variation in patients and controls, and the current 
development of DNA chip technology for the study 
of single nucleotide polymorphisms (SNPs) have
opened the way for future routine disequilibrium 
mapping of disease genes. 


See also: Linkage Disequilibrium; Linkage Map 


Gametogenesis 
J Hodgkin 
Gametogenesis is the process leading to the production
of specialized reproductive cell types, either eggs
or sperm, collectively known as gametes. It entails 
meiotic division, to generate haploid cells, together 
with maturation into the appropriate functional 
gamete. 


See also: Oogenesis in Caenorhabditis elegans; 
Oogenesis, Mouse; Spermatogenesis in 
Caenorhabditis elegans; Spermatogenesis, Mouse 


Gamma Distribution 
N Saitou 
The gamma distribution is based on the gamma function Γ(a), so this function must be explained first. For a > 0, the gamma function is defined as

$$\Gamma(a) = \int_0^\infty e^{-t}\, t^{a-1}\, dt.$$

This function has various interesting properties. For example, $\Gamma(a+1) = a\,\Gamma(a)$. Therefore, when a is a positive integer n, $\Gamma(n+1) = n!$. Hence, the gamma function is also called the ‘factorial’ function.
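The definition and the two identities above can be checked numerically. This is a minimal Python sketch. `gamma_by_integration` is a hypothetical helper that approximates the integral with the midpoint rule, truncated at t = 60, which is adequate only for modest a ≥ 1. The identities are checked against the standard library's `math.gamma` and `math.factorial`.

```python
import math


def gamma_by_integration(a, upper=60.0, steps=100_000):
    """Midpoint-rule approximation of the integral of e^(-t) t^(a-1) over (0, upper).

    Truncating at t = upper and using a fixed grid is adequate only for modest a >= 1.
    """
    h = upper / steps
    total = 0.0
    for i in range(steps):
        t = (i + 0.5) * h
        total += math.exp(-t) * t ** (a - 1)
    return total * h


for a in (1.0, 2.5, 5.0):
    # Numerical integral vs. the library value, and the ratio Gamma(a+1)/Gamma(a) = a
    print(a, gamma_by_integration(a), math.gamma(a), math.gamma(a + 1) / math.gamma(a))

# For positive integers, Gamma(n + 1) = n!
print([round(math.gamma(n + 1)) for n in range(6)], [math.factorial(n) for n in range(6)])
```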

The gamma distribution f(r) is defined as

$$f(r) = \frac{b^a}{\Gamma(a)}\, e^{-br}\, r^{a-1} \qquad (r > 0),$$

where $a = \mathrm{mean}(r)^2/\mathrm{var}(r)$ and $b = \mathrm{mean}(r)/\mathrm{var}(r)$; mean(r) and var(r) are the mean and variance of the variable r, respectively. The shape of the gamma distribution f(r) is determined by a, while b is a scaling factor. The gamma distribution is known to be very flexible and takes various shapes depending on the value of a; therefore, a is often called the ‘shape parameter’ of the gamma distribution. When a is small, the distribution is highly skewed, with most of its mass near zero and a long tail to the right. As a becomes infinite (with the mean held fixed), r takes only one value (Dirac’s delta function).
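The following is a minimal Python sketch of the density and of the moment relations given above; the helper names are hypothetical. It draws a sample with the standard library's `random.gammavariate`, whose second argument is a scale parameter, i.e., 1/b in the notation used here. It then recovers a and b from the sample mean and variance.

```python
import math
import random


def gamma_params_from_moments(mean, var):
    """Shape a = mean^2 / var and scaling factor (rate) b = mean / var."""
    return mean ** 2 / var, mean / var


def gamma_pdf(r, a, b):
    """Density b^a / Gamma(a) * exp(-b r) * r^(a-1) for r > 0; returns 0 for r <= 0.

    Evaluated on the log scale so that large a does not overflow.
    """
    if r <= 0:
        return 0.0
    return math.exp(a * math.log(b) - math.lgamma(a) - b * r + (a - 1) * math.log(r))


# Simulate strongly heterogeneous values (small a), then recover a and b from moments.
rng = random.Random(1)
a_true, b_true = 0.5, 2.0
sample = [rng.gammavariate(a_true, 1 / b_true) for _ in range(100_000)]  # 2nd arg is scale
mean = sum(sample) / len(sample)
var = sum((x - mean) ** 2 for x in sample) / (len(sample) - 1)
a_hat, b_hat = gamma_params_from_moments(mean, var)
print(f"a_hat = {a_hat:.3f}, b_hat = {b_hat:.3f}")
```

With a = 1 the density reduces to the exponential distribution $b e^{-br}$. This is a quick sanity check on the formula.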

In molecular evolutionary studies, the gamma distribution is sometimes used when a quantity is empirically known to be quite heterogeneous. For example, in simple evolutionary models the rate of amino acid or nucleotide substitution is often assumed to be the same at every site, whereas in reality the rate varies greatly among sites. In this case, the gamma distribution may be used to describe the variation in rate among sites. If the value of the shape parameter a can be estimated, the gamma distribution can be applied. An evolutionary distance estimated in this way is often called a ‘gamma distance.’
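The entry does not give a specific gamma distance, so the following is only an illustration. One widely used example is the gamma version of the Jukes-Cantor distance for nucleotide sequences. It assumes equal substitution rates among the four nucleotides and gamma-distributed rates among sites with shape parameter a: $d = \tfrac{3}{4} a \left[(1 - \tfrac{4}{3}p)^{-1/a} - 1\right]$, where p is the proportion of differing sites. As a → ∞ this reduces to the ordinary Jukes-Cantor distance $-\tfrac{3}{4}\ln(1 - \tfrac{4}{3}p)$. The following is a minimal Python sketch; the function names are hypothetical.

```python
import math


def jukes_cantor(p):
    """Jukes-Cantor distance, which assumes every site evolves at the same rate."""
    if not 0 <= p < 0.75:
        raise ValueError("p must lie in [0, 0.75)")
    return -0.75 * math.log(1 - 4 * p / 3)


def jukes_cantor_gamma(p, a):
    """Gamma version of the Jukes-Cantor distance with shape parameter a."""
    if not 0 <= p < 0.75:
        raise ValueError("p must lie in [0, 0.75)")
    return 0.75 * a * ((1 - 4 * p / 3) ** (-1 / a) - 1)


# Smaller a (stronger rate heterogeneity) gives a larger corrected distance.
for a in (0.5, 1.0, 2.0, 1e6):
    print(a, round(jukes_cantor_gamma(0.3, a), 4))
print("JC", round(jukes_cantor(0.3), 4))
```

Ignoring rate variation (effectively a = ∞) underestimates the distance. The error grows as the sequences become more divergent.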


See also: Evolutionary Rate 


GAP (RAS GTPase 
Activating Protein) 


A C Lloyd 
Ras-GAP is a 120-kDa, ubiquitously expressed cytosolic
protein. It was initially identified as an activity in
cell extracts able to stimulate the intrinsic GTPase 
activity of p21Ras (Trahey and McCormick, 1987). 


The catalytic GAP activity resides in the carboxy 
region of the protein and acts as a negative regulator 
of Ras signaling, modulating the levels of Ras-GTP 
(van der Geer et al., 1997). There is increasing evi- 
dence for functions independent of GAP activity via 
the N-terminus that consists of two SH2 domains 
flanking an SH3 domain which mediate interactions 
with other cellular proteins such as p190 and p62 
(Kulkarni et al., 2000). A central region contains
a pleckstrin homology (PH) domain, and a CaLB
domain thought to be important in regulating mem- 
brane interactions. Mice homozygous for a mutant 
GAP allele die at about day 10.5 of embryogenesis 
displaying a variety of defects including vascular 
abnormalities and increased apoptosis in the nervous 
system. 

See also: Ras Gene Family


Gastrulation 


See: Developmental Genetics 


Gaucher’s Disease 

Frequency of Gaucher’s Disease


The overall frequency of lysosomal diseases in the general population worldwide is estimated to be about 1 in 5000 live births, of which Gaucher’s disease is the most common, with an estimated frequency of 1 in 50000 to 60000 live births. In selected populations the frequency appears to be much greater: although the predicted frequency of Gaucher’s disease in the Ashkenazi population is unknown, homozygotes for the N370S mutation and compound heterozygotes with the N370S/84GG genotype would occur at an overall frequency of about 1 per 855 individuals in the population at large.


Definition 


Gaucher’s disease is a multisystem disorder principally affecting macrophages and is classified in the Online Mendelian Inheritance in Man (OMIM) database as OMIM 230800, 230900, and 231000. It is a prototype of the glycosphingolipidoses, an important group of lysosomal disorders characterized by deficiency of specific acid hydrolases responsible for the degradation of complex membrane glycolipids.

Gaucher’s disease is caused by a recessively inherited deficiency of an acid β-glucosidase, glucocerebrosidase (EC 3.2.1.45). As a consequence of this deficiency, N-acyl-sphingosyl-1-O-β-D-glucoside and other minor glycolipid metabolites such as glucosylsphingosine accumulate. All the accumulated glycolipids represent metabolic intermediates derived from the cellular turnover of membrane lipid macromolecules of the ganglioside and globoside classes.


Genetics 


The human glucocerebrosidase locus has been
mapped to chromosome 1q21, where it lies in close
proximity to a nonprocessed cognate pseudogene that
is absent in several other vertebrates. The human
acid β-glucosidase gene also lies close to two other
genes, metaxin and thrombospondin 3. Expression of
mRNA encoding human acid β-glucosidase is
constitutive in nearly all cells but varies in abundance.
In the 5’ untranslated region of the functional acid
β-glucosidase gene in humans there are two CAAT
boxes and two sequences encoding putative CAAT
boxes. Promoter/expression studies have implicated
several transcription factors: octamer-binding
transcription factor 1 (OCT binding protein), oncogene
Jun activator protein-1 (AP-1), the ets-related
transcription factor Polyomavirus Enhancer Activator-3
(PEA3), and the CAAT binding protein. These and other
transcription factors yet to be characterized indicate
that the expression of acid β-glucosidase may be
regulated by transcriptional activation associated with
proliferative cell responses.


Clinical Spectrum of Gaucher’s Disease 


Gaucher’s disease may be associated solely with
systemic (nonneuronopathic) features or with
neuronopathic (neurological) features. In the
nonneuronopathic (type 1) form of Gaucher’s disease,
partial enzymatic deficiency of acid β-glucosidase is
associated with the accumulation of glycolipids in
macrophages that belong to the mononuclear phagocyte
system located principally in the liver, bone marrow,
and spleen. Pathological macrophages containing excess
lysosomal stored lipid may also be found within the
lung and, on rare occasions, the pericardium and kidney.
In the neuronopathic forms of Gaucher’s disease (type 2
and type 3), severe deficiency of glucocerebrosidase
caused by disabling or inactivating mutations is
additionally associated with disease of the nervous
system. The pathogenesis of neuronopathic Gaucher’s
disease is complex. In many instances failure to
degrade endogenous glycosphingolipids present in
brain tissue is a contributing factor, although the
accumulation of Gaucher’s cells around adventitial
spaces in cerebral blood vessels, as a result of uptake
of circulating glucosylceramide present in plasma,
may also contribute.


Genotype—Phenotype Correlations 


In a particular variant of neuronopathic Gaucher’s 
disease described in Arabic, Japanese, and Spanish 
populations, an intermediate phenotype associated 
with corneal opacities and endocardial thickening 
with mitral and aortic valve disease of the heart has 
been identified and related to homozygosity for a 
particular missense mutation, D409H, in the
glucocerebrosidase gene. Other genotype/phenotype
correlations are less close, but of the common
widespread mutations, L444P is associated with
disease severity and has been described in all three
phenotypes of Gaucher’s disease. Homozygosity for
L444P is a significant cause of neuronopathic
Gaucher’s disease in the so-called Swedish or
Norrbottnian variant, with slowly progressive
neurological symptoms and survival to adult life. In
type 2 Gaucher’s disease, the rapid onset of bulbar
paresis, spastic paraparesis, and opisthotonus with
swallowing difficulties is noted within the first few
months of life, and survival beyond the first few years
of life is very unusual.

A recently recognized and rare variant of Gaucher’s
disease is associated with premature fetal loss and
stillbirth, as well as with infants who have a
desquamating and dehydrating skin lesion and die of
dehydration shortly after birth. This condition,
frequently associated with an abnormal ‘collodion’
appearance of the dermis, is associated with severely
inactivating lesions in the human glucocerebrosidase
gene. It has parallels with the short-lived, lethal murine
glucocerebrosidase deficiency state generated by
targeted disruption of the glucocerebrosidase gene in
embryonic stem cells. Homozygous animals die within
24 h of birth, and although scant Gaucher’s cells are
present within systemic organs and excess
glucosylceramide accumulates, the principal cause of
death appears to be skin desquamation and
dehydration. Ceramides released by the action of acid
β-glucosidase appear to be essential for the
maintenance of dermal integrity and the prevention of
water loss.

Several mutations in the glucocerebrosidase gene
appear to result from genetic rearrangements between
the functional gene and the human pseudogene through
gene conversion or recombination events. These events
transfer multiple missense or point mutations, and the
presence of the closely related glucocerebrosidase
pseudogene may create difficulties for the facile
detection and precise identification of causal
mutations. Definitive genomic and cDNA sequencing
procedures are recommended. Apart from the widely
distributed L444P allele, the N370S allele, harboring a
missense mutation, is widespread in several
populations and may have arisen on a background of
several haplotypes. This mutation appears to cause
only a mild catalytic impairment of the encoded
enzyme polypeptide.

The presence of at least one copy of the N370S
allele militates against the occurrence of
neuronopathic Gaucher’s disease. Several mutations, in
particular the N370S mutation as well as the 84GG
mutation, occur with particular frequency in the
Ashkenazi Jewish population. Population studies
reveal diverse phenotypes associated with
homozygosity for N370S. The N370S mutation is also
widespread in populations throughout the world,
including South America, Spain, and Portugal, and is
found in patients with no known Ashkenazi ancestry.
The allele frequencies of N370S and 84GG in the
Ashkenazi population have been estimated at
approximately 0.03 and 0.002, respectively. The basis
for these high allele frequencies has not been fully
explained. The operation of selective evolutionary
pressure has been postulated: it has been suggested
that homozygotes or heterozygotes for the N370S
mutation may have constitutive activation of
macrophages in target organs such as the spleen,
conferring resistance against infection with
pathogenic microorganisms, particularly tuberculosis.
No experimental evidence to support this speculation
has yet been provided.
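As a rough check on the figure quoted under Frequency of Gaucher’s Disease, the expected proportion of affected genotypes can be computed from these allele frequencies under Hardy-Weinberg assumptions (random mating, no selection, all other alleles ignored). This is an illustrative sketch only, and the function name is hypothetical. With q(N370S) = 0.03 and q(84GG) = 0.002, N370S homozygotes plus N370S/84GG compound heterozygotes are expected at q1² + 2·q1·q2, roughly 1 in 980; the difference from the quoted 1 per 855 may reflect rounding of the published allele frequencies or a different underlying estimate.

```python
def affected_genotype_frequency(q_n370s: float, q_84gg: float) -> float:
    """Hardy-Weinberg expected frequency of N370S/N370S plus N370S/84GG genotypes.

    Assumes random mating and ignores all other glucocerebrosidase alleles
    (a simplification made for illustration only).
    """
    homozygotes = q_n370s ** 2             # N370S/N370S
    compound_hets = 2 * q_n370s * q_84gg   # N370S/84GG, either parental origin
    return homozygotes + compound_hets


freq = affected_genotype_frequency(0.03, 0.002)
print(f"expected frequency {freq:.5f}, about 1 in {1 / freq:.0f}")
```

The factor of 2 in the compound heterozygote term counts both ways a child can receive the two different alleles, one from each parent.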


Clinical Presentation 


Symptoms of type 1 Gaucher’s disease usually result
from splenic enlargement: either the enlarged viscera
are noted by the patient, or the consequences of
hypersplenism (anemia, thrombocytopenia, or
leukopenia) declare themselves through spontaneous
bruising or unexplained sepsis. Abnormal blood counts
combined with enlargement of the liver and spleen
(hepatosplenomegaly) may lead ultimately to bone
marrow examination or tissue biopsy, which may
reveal the characteristic Gaucher’s cells. Bone marrow
or tissue biopsy is no longer necessary for the
diagnosis, however; it can be made easily by enzymatic
assay of circulating leucocytes using fluorescent
substrates, which reveals a profound deficiency of acid
β-glucosidase in affected homozygotes. Retrospective
enquiry may reveal a history of bone pain attributable
to so-called bone infarction crises, which result from
marrow infiltration and occur particularly at the
growing ends of long bones (epiphyses). A prior
diagnosis of Perthes’ disease is common. Pallor,
fatigue, and palpitations often presage the diagnosis.
Occasionally, massive enlargement of the liver and
spleen occurs in infancy. Patients with neurological
disease may present at any age. In the more indolent
type 3 neuronopathic forms, abnormalities of gaze,
including disturbances of vertical gaze, result from
neuronophagia and other localized injury within the
brain stem nuclei that are key to the control of
conjugate eye movements. Later, ataxia, mild
spasticity, myoclonic or complex epilepsy, and slowly
progressive dementia may become apparent. These
patients have a very variable degree of systemic
involvement with hepatosplenomegaly and bone
marrow infiltration, ranging from massive
hepatosplenomegaly with a bleeding tendency and
gross abdominal swelling to only subtle enlargement of
the liver and spleen detectable by ultrasonic
examination.

In the acute neuronopathic variant, type 2, difficulty
in swallowing, paralytic squint, and persistent
hyperextension of the head are common, followed by
spasm of the jaw (trismus), generalized spasticity, and
psychomotor retardation; respiratory obstruction due
to laryngospasm also occurs, with aspiration
pneumonia, myoclonus, and generalized seizures in the
late stages of the illness.


Treatment of Gaucher’s Disease 


As emphasized earlier, Gaucher’s disease is in many
respects the prototypic lysosomal disorder. It affects
all ages and, as the most common lysosomal disorder,
has been the subject of intensive investigation of its
biochemistry and genetics, and of definitive methods
for therapy.


Marrow Transplantation 

Because macrophages, the principal focus of
Gaucher’s disease, are derived from granulocyte-
monocyte progenitor cells in the bone marrow, it was
anticipated that bone-marrow transplantation would
provide a population of cells competent in the
degradation of glycosphingolipids and thereby correct
the defect. Marrow transplantation has been
successfully carried out in infants, children, and young
adults with Gaucher’s disease. The donors have been
HLA-matched siblings, either normal or heterozygous
for the glucocerebrosidase defect. Successful
engraftment of the bone marrow stem cells has been
associated with clinical regression of the disease, with
catch-up growth in stunted children and, ultimately,
almost complete disappearance of the pathological
storage cells in the tissues, including the liver.
Although only a minority of patients with Gaucher’s
disease will be suitable candidates for bone-marrow
transplantation, particularly since the emergence of
enzyme replacement therapy (see below), its success
in eradicating the disease demonstrates that a
complement of tissue macrophages derived from the
bone marrow with at least 50% of normal β-glucosidase
activity is sufficient to correct the nonneuronopathic
manifestations of this systemic disease.


Enzyme Replacement Therapy 

Early studies using preparations of human glucocere- 
brosidase prepared from placental tissue were con- 
ducted at the National Institutes of Health by 
Roscoe Brady and colleagues. Infusion of the native 
protein was associated with a reduction of erythrocyte 
and plasma glucocerebroside over a few days. No 
convincing clinical improvement was demonstrated. 
However, since the pioneering discovery of the lyso- 
some and its access to the aqueous phase by Christian 
de Duve and contemporaneous studies on the uptake 
of glycoproteins by parenchymal and nonparenchy- 
mal hepatic cells, it was considered that native human 
glucocerebrosidase may lack the critical recognition
signals for uptake and delivery to the diseased
macrophages of Gaucher’s tissue. With the identification of a
mannose receptor on the surface of macrophages and 
the preferential uptake of mannosylated proteins by 
human alveolar macrophages, experiments were 
undertaken to modify the terminal carbohydrate resi- 
dues of placental glucocerebrosidase by sequential 
enzymatic deglycosylation. Mannose-terminated pre- 
parations of human glucocerebrosidase were then 
shown to be taken up preferentially by nonparenchy- 
mal (Kupffer-cell-rich) rather than parenchymal 
hepatic cells in rats and prompted further studies 
of enzyme replacement therapy in patients with 
Gaucher’s disease. 

Early clinical trials of mannosylated human placental
glucocerebrosidase (alglucerase) showed rapid
regression of symptoms and visceromegaly, with
improvement in blood counts and other parameters of
Gaucher’s disease activity. The preparation was
approved as an Orphan Drug by the Food and Drug
Administration of the USA in 1990.
With advances in genetics and the cloning of the
human glucocerebrosidase gene by several groups, the
development of a recombinant enzyme replacement
strategy was a key element of pharmaceutical
investment by the Genzyme Company, the commercial
partner in this pioneering work. Recombinant human
glucocerebrosidase, imiglucerase (Cerezyme®), is now
produced by Genzyme as a recombinant product
purified from Chinese hamster ovary cells transfected
with the human glucocerebrosidase gene. Many
thousands of patients worldwide with Gaucher’s
disease are now able to receive this agent, which also
appears to relieve some aspects of the mild
neuronopathic (type 3) forms of Gaucher’s disease.
Immunological and hypersensitivity reactions to the
infusions are rare, which is accounted for by the
observation that most patients with Gaucher’s disease
harbor mutations that allow expression of residual
glucocerebrosidase polypeptide antigens.

With the commercial success of enzyme therapy in 
Gaucher’s disease preparations to treat other lyso- 
somal disorders such as Fabry’s disease (an X-linked 
endothelial disorder with principal effects on the 
heart, peripheral nerves, and kidneys) and MPS-1 
Other Therapeutic Opportunities

Although enzyme therapy has proved to be effective, 
appropriate safety and stability profiles for human
deoxynojirimycin and N-butyl deoxygalactonojiri-
in the biosynthesis of glycolipids without affecting
glucocerebrosidase and other acid glucosidases. The
administration of these iminosugars to genetically
modified animals that represent experimental models
of the debilitating glycosphingolipidoses, such as
Tay-Sachs disease and Sandhoff disease, has been
shown to reduce glycolipid storage, with partial delay
or arrest of the ineluctable progression of these
lysosomal storage diseases affecting brain tissue.
viously used in clinical trials in an attempt to arrest
the proliferation of human immunodeficiency virus
sion of the major disease parameters of Gaucher’s
synergize with enzyme replacement therapy and
value for patients suffering from the otherwise
intractable neuronopathic forms of this disorder.


Gene Therapy 

Since Gaucher’s disease can be corrected by
transplantation of allogeneic bone marrow providing a
source of granulocyte-monocyte progenitor cells, the
possibility of gene therapy directed toward
hematopoietic stem cells is raised. Several trials have
been approved in the USA for the genetic transduction
of CD34+ hemopoietic stem cells, which are corrected
by transfer of the human glucocerebrosidase gene in
retroviral vectors. This approach has already been
successful in normal mice, where prolonged high-level
expression of human glucocerebrosidase has been
achieved in the macrophages of mice that have
received primary and secondary marrow transplants.
At present it is not clear how, in humans, transduced
cells would gain a selective survival advantage and
populate the entire diseased marrow in Gaucher’s
disease, thereby providing long-term remission of
glycolipid storage through the metabolism of
endogenous glucocerebroside. However, high-efficiency
vectors for long-term expression in grafted autologous
cells are currently being studied to secure corrective
expression of the wild-type glucocerebrosidase gene.
The means to secure a selective advantage within the
marrow population continue to be explored actively.


Genetic Studies of Pathophysiology 


The Gaucher’s cell, a pathological macrophage, is a
striking feature of Gaucher’s disease, but the
connection between the pathological storage of
glycosphingolipid and the diverse manifestations of the
disease remains unexplained. Gaucher’s disease is
accompanied by weight loss, fatigue, increased
metabolic rate, and a sustained acute inflammatory
reaction with B-cell proliferative responses, as well as
massive enlargement of the spleen and liver and tissue
destruction in the bone, lung, liver, and brain stem.
Although the visceral organs may enlarge 50- to
80-fold, the pathological lipid that accumulates within
the tissues accounts for less than 2% of the additional
tissue mass. Thus the link between the macrophage
abnormality and the complex phenotype that
characterizes Gaucher’s disease and other lysosomal
disorders due to glycolipid activation remains
unknown. Studies are under way to understand better
the pathogenesis of Gaucher’s disease and related
glycolipid disorders. cDNA microarray analysis would
clearly offer the opportunity for cluster analysis of
genes upregulated and downregulated as part of the
cellular response to stored glycolipid. The author’s
group has reported studies based on the polymerase
chain reaction to identify genes whose transcriptional
products are increased in Gaucher’s disease tissue, as a
first step toward understanding the pathogenesis of
this condition and opening up new avenues of
therapy. Several genes, including those encoding a
chemokine and three lysosomal cysteine proteinases,
which are known to participate in tissue remodeling,
antigen presentation, and bone matrix destruction,
respectively, were shown to be upregulated in
Gaucher’s disease tissue. The proteinases were also
present in excess in the plasma and serum of affected
patients. Expression of several proteinases appears to
be correlated with Gaucher’s disease activity and
severity score indices, and serum levels of the cysteine
proteases decreased upon reduction of Gaucher’s
disease activity with enzyme replacement treatment.

Thus the study of the secondary genetic abnormalities
in lysosomal diseases such as Gaucher’s disease may
prove revealing in identifying the pathological
cascades that are activated as a result of abnormal
lipid storage, and may ultimately provide avenues for
additional therapy. Proinflammatory cytokine pathways
such as that mediated by interleukin 6 have been
implicated in Gaucher’s disease. Since this cytokine
influences gene expression of several cathepsins and
has been shown to be increased in the serum of
patients with Gaucher’s disease, it may represent one
critical triggering factor for disease activation. The
identification of increased expression of the cathepsin
K proteinase, with preferential activity against type I
collagen, the principal bone matrix protein, is also of
interest and provides an example of how a new
candidate for therapeutic attack can emerge from the
genetic study of Gaucher’s disease. Specific inhibitors
of cathepsin K have been developed for pharmaceutical
use in the treatment of metabolic bone diseases
including osteoporosis: enhanced cathepsin K
expression associated with active Gaucher’s disease
and lytic bone lesions immediately suggests the
potential use of selective cathepsin K inhibitors for
those patients afflicted.

With the introduction of cluster analysis of patho- 
logical gene expression profiling and systematic pro- 
teome analysis further opportunities for studying the 
pathogenesis of Gaucher’s disease and related glyco- 
sphingolipid disorders will undoubtedly come to 
light. From every aspect, therefore, Gaucher’s disease 
represents a landmark condition as a prototype for 
the glycosphingolipid storage disorders and provides 
a vivid example of many productive interactions 
between clinical, biochemical, and genetical research. 


Further Reading
Online Mendelian Inheritance in Man (OMIM) http://www3.ncbi.nlm.nih.gov/Omim/


See also: Fabry Disease (a-Galactosidase A 
Deficiency); Hurler Syndrome; Tay-Sachs 
Disease 
G-banding (Giemsa banding) is a technique that gen-
erates a banded pattern in metaphase chromosomes, 
thus allowing identification of the separate chromo- 
somes. It involves brief treatment with protease and 
staining with Giemsa. 


See also: Giemsa Banding, Mouse Chromosomes 
Electrophoresis, initially described by Arne Tiselius
in 1937, is the process by which charged particles
move through a medium in the presence of an electric
field at a given pH. The charged particles move at a
constant velocity, at which the electric force (Eq) is
equal to the frictional force or viscous drag (fv), as
defined by the relationship:
Eq = fv


where E is electric field strength (volts per centimeter), 
q is net charge of the particle (electrostatic units), f is 
frictional coefficient (a function of the size and shape 
of the particle), and v is velocity of the particle (centi- 
meters per second). 

Ohm’s Law states that voltage (V), current (I), and
resistance (R) are related by the relationship:


V=IR 


The electric field strength (E) is defined either by the
voltage (V) or the current (I), one of which typically 
is held constant. Therefore, the velocity (v) of a par- 
ticle in an electric field is defined by the relationship: 


v = Eq/f 


If the electric field strength (E) is kept constant, then 
the velocity (v) of the particle depends on its net 
charge (q) and its frictional coefficient (f). The fric- 
tional coefficient is directly proportional to the par- 
ticle’s Stokes radius, i.e., the radius of a spherical 
particle with equivalent hydrodynamic properties. 
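The relationships above can be put into numbers with a short sketch. It assumes a uniform field, so that E is the applied voltage divided by the electrode separation, and it takes the frictional coefficient from Stokes’ law for a sphere, f = 6πηr (η is the viscosity of the medium, r the Stokes radius), which is consistent with f being proportional to the Stokes radius. It describes an idealized particle in free solution and ignores ionic screening, electroosmosis, and gel sieving, so it is not a predictive model for a real gel. SI units are used instead of the units given above, and all function names and example values are hypothetical.

```python
import math

ELEMENTARY_CHARGE_C = 1.602176634e-19  # charge of one proton, in coulombs


def field_strength(voltage_v: float, electrode_distance_m: float) -> float:
    """E = V / d for a uniform field, in volts per metre."""
    return voltage_v / electrode_distance_m


def stokes_friction(stokes_radius_m: float, viscosity_pa_s: float = 1.0e-3) -> float:
    """Stokes' law f = 6*pi*eta*r; the default eta is roughly that of water."""
    return 6.0 * math.pi * viscosity_pa_s * stokes_radius_m


def steady_velocity(field_v_per_m: float, charge_c: float, stokes_radius_m: float,
                    viscosity_pa_s: float = 1.0e-3) -> float:
    """At constant velocity E*q = f*v, so v = E*q / f, in metres per second."""
    return field_v_per_m * charge_c / stokes_friction(stokes_radius_m, viscosity_pa_s)


# Hypothetical example: 100 V across 10 cm, net charge of 10 elementary charges.
E = field_strength(100.0, 0.10)
v_small = steady_velocity(E, 10 * ELEMENTARY_CHARGE_C, 3e-9)  # Stokes radius 3 nm
v_large = steady_velocity(E, 10 * ELEMENTARY_CHARGE_C, 6e-9)  # Stokes radius 6 nm
print(f"E = {E:.0f} V/m, v(3 nm) = {v_small:.2e} m/s, v(6 nm) = {v_large:.2e} m/s")
```

At the same charge and field, doubling the Stokes radius doubles f and therefore halves v, which is the size dependence described in the text.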

In practical terms, if a mixture of different proteins
or of nucleic acids of varying size is electrophoresed,
the higher molecular weight proteins or nucleic acids
will generally have larger effective diameters and
higher frictional coefficients, and will travel more
slowly than lower molecular weight proteins or nucleic
acids. Over a useful range, the distance traveled then
decreases approximately linearly with the logarithm of
the molecular weight of the particle. There are
exceptions, especially in the case of very small
polypeptides and oligonucleotides or very large
proteins or nucleic acids. Also, if a protein is rich in
proline or charged amino acids, its solution structure
will be distorted, and such a protein will have an
electrophoretic mobility different from that predicted
by its molecular weight. Similarly, double-stranded
DNA can exist as a supercoiled circle, a relaxed circle,
or a linear molecule. Although all three forms would
have the same molecular weight, each would have a
different electrophoretic mobility on the same agarose
gel.
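Because migration distance falls roughly linearly with the logarithm of molecular size, the size of an unknown band is commonly estimated from a standard curve fitted to marker bands run on the same gel. The sketch below fits log10(size) against distance by ordinary least squares and then inverts the fit. The ladder sizes and distances are hypothetical values chosen to lie exactly on a line, and, for the reasons given above, such an estimate is only trustworthy within the range spanned by the markers.

```python
import math
from typing import Sequence, Tuple


def fit_standard_curve(distances: Sequence[float],
                       sizes: Sequence[float]) -> Tuple[float, float]:
    """Least-squares fit of log10(size) = intercept + slope * distance."""
    if len(distances) != len(sizes) or len(distances) < 2:
        raise ValueError("need at least two marker bands with matching sizes")
    xs = list(distances)
    ys = [math.log10(s) for s in sizes]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx  # negative: larger molecules travel a shorter distance
    intercept = mean_y - slope * mean_x
    return slope, intercept


def estimate_size(distance: float, slope: float, intercept: float) -> float:
    """Invert the fitted curve for a band that migrated `distance`."""
    return 10 ** (intercept + slope * distance)


# Hypothetical marker ladder: sizes in bases, migration distances in mm.
ladder_sizes = [10000, 1000, 100]
ladder_distances = [10.0, 30.0, 50.0]
slope, intercept = fit_standard_curve(ladder_distances, ladder_sizes)
unknown = estimate_size(40.0, slope, intercept)
print(f"slope = {slope:.3f} per mm, unknown band of about {unknown:.0f} bases")
```

A log-linear fit is used rather than a linear one because equal steps in distance correspond to roughly constant ratios of size, not constant differences.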

The major electrophoretic medium and apparatus used
today for protein separations is polyacrylamide gel
electrophoresis (PAGE) on ‘slab’ gels. If the separation
voltage for PAGE is kept constant, then, based on the
preceding discussion of Stokes radius and frictional
coefficient, the major factors that affect the separation
of proteins of varying molecular weight are the pore
size and the amount of cross-linking of the gel
medium; typically, a 6% PAGE gel will resolve almost
all proteins in the 10000 to 100000 molecular weight
range. There are two types of PAGE gel, a ‘native’ gel
and a ‘denaturing’ gel. A native gel will separate
proteins that are monomers from dimers, from
tetramers, and so on. For example, hemoglobin, which
has two copies each of two different subunits and the
structure α2β2, will be resolved as a single band on a
native gel. However, a denaturing gel typically contains
the detergent sodium dodecyl sulfate (SDS). Prior to
loading, the protein mixture usually is heated in the
presence of a reducing agent such as β-mercaptoethanol
and a chelating agent such as EDTA to dissociate the
subunits. Thus hemoglobin, with its α2β2 structure,
will be resolved into two bands on a denaturing gel,
where the faster moving band is the smaller α-subunit
and the slower moving band is the larger β-subunit.
Proteins are detected on both native and denaturing
gels by staining either with Coomassie blue or with the
more sensitive silver stain.

Nucleic acid electrophoretic media typically are either
a polyacrylamide gel, for nucleic acids shorter than
1000 bases, or an agarose gel, for nucleic acids that
contain more than several hundred or several thousand
bases but fewer than a few hundred thousand bases. In
both instances, the nucleic acid can be either
single-stranded or double-stranded. The pore size of
the electrophoretic medium is adjusted by varying the
percentage of medium and the amount of
cross-linking. For example, a 0.8% or 1% agarose gel,
which can separate larger nucleic acids, would have a
much larger pore size and less cross-linking than a 4%
or 6% polyacrylamide gel used to resolve smaller
nucleic acids. Extremely large nucleic acids that
contain more than several hundred thousand bases
can be resolved on pulsed-field agarose gels. Large
nucleic acids such as plasmids, cosmids, and
restriction endonuclease-digested DNA usually are
separated on agarose gels, and the DNA bands are
detected by ethidium bromide staining. Mixtures of
nucleic acids shorter than 1000 bases, such as
DNA-sequencing reaction nested fragment sets or
multiply restriction endonuclease-digested DNA, often
are separated on polyacrylamide gels. Here the nucleic
acids are either fluorescently or radioactively labeled.
The fluorescently labeled nucleic acids can be detected
by a photomultiplier tube or CCD camera after laser
excitation of the associated fluorescent dye.
Alternatively, radioactively labeled nucleic acids can be
detected by direct exposure to X-ray film, or the gel
can be sliced and the radioactivity measured in a liquid
scintillation counter.
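The size ranges in the paragraph above can be summarized as a simple lookup. In this sketch the 1000-base boundary comes from the text, while the 300000-base upper limit for conventional agarose is an assumed stand-in for “several hundred thousand bases”; the ranges overlap in practice, so this is a rough guide rather than a protocol, and the function name is hypothetical.

```python
def suggest_gel_medium(length_bases: int,
                       polyacrylamide_max: int = 1_000,
                       agarose_max: int = 300_000) -> str:
    """Rough choice of separation medium for a nucleic acid of a given length.

    polyacrylamide_max follows the text; agarose_max is an illustrative
    assumption, not a published cutoff.
    """
    if length_bases <= 0:
        raise ValueError("length must be positive")
    if length_bases < polyacrylamide_max:
        return "polyacrylamide gel"
    if length_bases < agarose_max:
        return "agarose gel"
    return "pulsed-field agarose gel"


for n in (250, 5_000, 1_000_000):
    print(n, "->", suggest_gel_medium(n))
```

The thresholds are parameters so that they can be adjusted to a particular laboratory’s gels rather than treated as fixed limits.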

More recently, slab gels have begun to give way to
capillary electrophoretic gels, which are much thinner,
contain less medium, require protein or nucleic acid
samples several orders of magnitude smaller, and
resolve the samples in minutes rather than hours.
The capillary electrophoresis instrument also is
coupled with automated detection equipment and
an associated computer, on which the results can be
stored for further analysis. Various media have been
described that can provide single-base resolution of
either single- or double-stranded nucleic acids as large
as 500-1000 bases. These media include linear
polyacrylamide, methyl cellulose, hydroxyethyl
cellulose either alone or mixed with polyethylene
oxide, and hydroxypropyl cellulose either alone or
mixed with polyethylene oxide. The capillaries are
either coated with a siliconizing reagent to reduce the
charges on the glass capillary or used directly without
coating. Capillary electrophoretic-based
instrumentation now is quite robust and has gained
wide acceptance in both the gene-mapping and
DNA-sequencing communities. In the case of protein
separation by capillary gel electrophoresis, media
similar to those used in nucleic acid separations have
been described. However, coated capillaries almost
always are required because of the greater tendency of
proteins to bind to the capillaries, which alters the
observed electrophoretic mobility and makes
quantitation of any resolved samples irreproducible.

There are numerous manufacturers and resellers 
of electrophoretic equipment that range from simple 
but effective Plexiglas acrylamide apparatus and 
power supplies to self-contained PAGE gel or capillary
electrophoresis instruments. These suppliers also
provide detailed protocols for optimal use of their
apparatus or instrumentation, which are easy to
follow and typically yield reproducible results.
Finally, the electrophoresis literature is extensive and 
provides many detailed procedures. However, the 
reader is encouraged to investigate the following two 
books: Molecular Cloning: A Laboratory Manual 
(Sambrook and Russell, 2000); and Proteins (Walker, 
1984), as they provide an almost complete review of 
the existing literature. 
The gene is the unit of heredity. While this definition
may seem to stand on its own and not need further 
explanation, in fact it represents a continuing evolu- 
tion in the way we view the biological process of 
inheritance. Mendel’s view of the gene (the mechanism 
of inheritance) was an important conceptual change 
from the established view in the nineteenth century. 
Darwin, the establishment figure, promulgated the 
idea, first voiced by Hippocrates (400 BCE), that
inheritance derived from miniature body parts or 
characters transmitted through copulation. Darwin’s 
theory of ‘pangenesis’ saw semen as being replenished 
by ‘gemmules’ derived from all the somatic tissues of 
the body. Mendel’s explanation instead makes it clear 
that it is information about the characters rather than 
the characters themselves that are transmitted. 

Mendel recognized that a pangenesis-like model could
not explain his experimental observations or those of
earlier plant hybridizers. They saw that recessive traits
could be carried unchanged through several
generations, that the traits could reappear in the F2
generation, and that the recessive homozygotes
extracted from such crosses could form pure breeding
stocks indistinguishable from the original parental
strains. Mendel’s explanation, with genes present in
pairs that segregate during gamete formation, is the
basis of the present science of genetics. The coupling
of the idea of single-gene inheritance with differences
in individual traits was the breakthrough that unified
the disciplines of biology, ranging from evolution to
physiological function.
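Mendel’s explanation can be illustrated with a minimal enumeration of a monohybrid cross, using his convention of an upper-case letter for the dominant allele and a lower-case letter for the recessive one. In this sketch each parent passes one member of its gene pair to each gamete with equal probability, and every pairing of gametes is counted once; the function names are hypothetical. The output reproduces the observations above: a uniform F1, the recessive trait reappearing in a quarter of the F2, and recessive homozygotes that breed true.

```python
from collections import Counter
from itertools import product


def cross(parent1: str, parent2: str) -> Counter:
    """Enumerate equally likely offspring genotypes of two diploid parents.

    Each parent is a two-letter genotype such as "Aa"; each gamete receives
    one of the pair, and every pairing of gametes is counted once.
    """
    offspring = Counter()
    for allele1, allele2 in product(parent1, parent2):
        # Sorting puts the upper-case (dominant) letter first, so "aA" becomes "Aa".
        offspring["".join(sorted(allele1 + allele2))] += 1
    return offspring


def phenotype(genotype: str) -> str:
    """A single dominant (upper-case) allele is enough to show the dominant trait."""
    return "dominant" if any(c.isupper() for c in genotype) else "recessive"


def phenotype_counts(genotypes: Counter) -> Counter:
    counts = Counter()
    for genotype, n in genotypes.items():
        counts[phenotype(genotype)] += n
    return counts


f1 = cross("AA", "aa")   # every offspring is Aa
f2 = cross("Aa", "Aa")   # AA : Aa : aa in the ratio 1 : 2 : 1
print(f1, f2, phenotype_counts(f2))
print(cross("aa", "aa"))  # recessive homozygotes breed true
```

The 1 : 2 : 1 genotype ratio collapses to the familiar 3 : 1 phenotypic ratio because AA and Aa individuals look alike.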

After the rediscovery of Mendel’s work in 1900, 
Bateson introduced many of the terms in current 
usage: genetic, zygote, homozygote, heterozygote, 
allelomorph (later shortened to allele), and F1 and F2
generations. Mendel’s term ‘Merkmal’ was translated 
as either character, unit character, or factor (Bateson’s 
choice). Johannsen proposed using the word gene in 
1909. 
Mendel introduced a convenient symbolism to
describe genetic relations between parents and their 
offspring. One character could abstractly be referred 
to as A, another as B, and so forth. He used the upper 
case letter to indicate which was the dominant trait 
and the lower-case letter for the recessive trait. Subse- 
quently, workers extended this to name the gene after 
the character, initially describing the dominant trait 
but subsequently shifting to usually naming genes 
after recessive traits. The reason for the shift is that 
most genes were identified by recessive mutations that 
departed from the ‘wild-type’ standard appearance. It 
was understood that each gene so identified had two 
alleles, one mutant and the other representing the 
wild-type. 

The discovery of multiple alleles called into ques- 
tion what is meant by the gene. Two eminent geneti- 
cists, A. H. Sturtevant and G. W. Beadle, who had 
coauthored a textbook on genetics, discovered that 
each had used the term ‘gene’ differently. The white 
gene to Sturtevant was the specific white mutant but 
to Beadle it represented the group of white alleles 
including the wild-type allele. While most geneticists 
today follow Beadle’s usage, medical geneticists 
frequently refer to those mutations associated with 
genetic diseases as genes, e.g., the Duchenne muscular 
dystrophy gene or the Huntington disease gene, 
without mentioning the normal alleles. Many geneti- 
cists, however, are adopting the compromise language 
of referring to the Duchenne muscular dystrophy gene 
‘mutation’ or the Huntington disease gene ‘mutation’ 
to distinguish between gene and allele. 

By this definition it is alleles, not genes, that are
observed to be the units of segregation. Allelism is also
inferred from the shared properties of similar mutant
phenotypes and from the failure of mutant alleles to
complement one another. Alleles almost invariably
segregate from each other in trans heterozygotes,
at least in multicellular organisms with low levels
of recombination. Occasionally, however, wild-type
recombinant progeny are observed, indicating that the
parental alleles cannot be mutant at the same site.
Such alleles are instead described as ‘pseudoalleles’:
mutations at different sites that are still regarded
as marking the same gene. Other
examples pose more of a challenge to the idea of one 
gene, however. In the phage T4, mutants of the rII 
class, named after their phenotype, map to one section 
of the linkage group in a cluster that is over seven map 
units long! Do rII mutants mark one gene? Employing 
what he called the cis-trans test, Seymour Benzer 
described the mutants in the cluster as forming two 
complementation groups, or cistrons. He showed that 
mutants in one cistron localized by recombination 
studies to one side of the cluster and mutants of the 
other cistron localized to the other side. Moreover, 
Benzer showed that the T4 linkage group is linear 
both inside and outside the rII cluster. Mutation pos- 
itions are continuously distributed along this line with 
no obvious demarcations between the flanking adja- 
cent genes or the rII cistrons. This means that genes 
cannot be separated solely by mutant position or by 
mutant phenotype. The remaining alternative, defin- 
ing a gene by complementation testing, implies that 
genes can be separated on the basis of biochemical 
function. 
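As a rough illustration of how complementation testing sorts mutants into cistrons, the sketch below places any two mutants that fail to complement each other (mutant phenotype in the trans configuration or, for phage, in a mixed infection) into the same group. It assumes that non-complementation is transitive and ignores intragenic complementation, which real data can violate; the function name and the mutant labels are hypothetical.

```python
def complementation_groups(mutants, fails_to_complement):
    """Sort mutants into complementation groups (cistrons).

    mutants: list of mutant names.
    fails_to_complement: set of frozenset pairs {a, b} whose trans test
        gives the mutant phenotype. Pairs not listed are assumed to
        complement. Non-complementation is assumed to be transitive,
        so the groups are connected components.
    """
    parent = {m: m for m in mutants}

    def find(x):
        # Follow parent links to the group representative (path halving).
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for pair in fails_to_complement:
        a, b = tuple(pair)
        parent[find(a)] = find(b)  # merge the two groups

    groups = {}
    for m in mutants:
        groups.setdefault(find(m), []).append(m)
    return sorted(sorted(g) for g in groups.values())


if __name__ == "__main__":
    # Hypothetical data: r1/r2 fail to complement, as do r3/r4.
    mutants = ["r1", "r2", "r3", "r4"]
    fails = {frozenset({"r1", "r2"}), frozenset({"r3", "r4"})}
    print(complementation_groups(mutants, fails))  # [['r1', 'r2'], ['r3', 'r4']]
```

Here the two hypothetical groups stand in for the two rII cistrons; note that grouping by complementation says nothing about map position, which is why Benzer combined it with recombination mapping.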

The function of genes, known since the work of 
Beadle and Tatum in 1945 and Linus Pauling in 1949, 
is to code for the structure of proteins (more accur- 
ately, polypeptide chains). The disciplines of bio- 
chemistry and genetics are united in the DNA 
nucleotide sequence of a gene coding for the amino 
acid sequence of a polypeptide chain. The latter 
sequence determines not only how a protein folds 
into three dimensions but also the specific enzymatic 
reaction(s) or other chemical role(s) of the protein in 
the organism. A considerable body of effort over the 
past 50 years consisted of finding which proteins were 
coded by which genes. Starting with an isolated pro- 
tein, the amino acid sequence was used to predict the 
nucleotide coding sequence. Eventually an oligo- 
nucleotide with this sequence could be made and 
used by hybridization to isolate or locate the chromo- 
somal gene. Starting with a mutant gene in an organ- 
ism, biochemical assays are used to determine which 
protein is aberrant, or mapping studies are used to 
determine which candidate nucleotide sequence is 
mutated. Automation in sequencing techniques has 
led to great advances in linking genes and proteins. 
The goal of organism genome projects is to place every 
gene/protein-coding unit on the complete sequence of 
that organism’s genome. 

Predicting genes from long nucleotide sequence 
tracts is a matter of identifying open reading frames 
(ORFs). Of the six possible reading frames in any 
interval, a genuine protein-coding region is expected 
to have one reading frame consisting of sense or amino 
acid codons, that is open long enough to specify a 
candidate polypeptide chain. Reading frames that are 
interrupted by stop codons do not qualify. With three 
stop codons out of 64 possible triplet codons, random 
noncoding sequence is expected to be interrupted 
by stop codons every 21 codons on average. The 
complete sequence of the Saccharomyces cerevisiae 
genome yields 6023 predicted ORFs. About a third 
could be connected initially with known mutant genes 
or known biochemical products. About a third were 
experimentally verified to be genes by showing a 
mutant phenotype after knocking the gene out. In 
S. cerevisiae homologous recombination is used to 
replace the normal allele with a nonfunctional allele 
or ‘knockout’ in order to test for function. The remain- 
ing third of the predicted ORFs cannot be confirmed 
as being genes by these tests. It is more difficult still to 
apply this approach to the genome sequences of multi- 
cellular plants and animals because of the compli- 
cations introduced by introns interrupting ORFs. 
Computer-generated predictions taking into account 
species-specific codon usage preferences and the pre- 
ferred splice donor and splice acceptor sequences (to 
recognize the ends of introns) still do not recognize all 
of the known genes. The race to guess the number 
of genes from the human sequence has assumed the 
status of a TV game show, with lotteries and prizes 
promised to the winners. The estimates range between 
20 000 and 120 000 genes. 
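The stop-codon arithmetic and the six-frame scan described above can be sketched in a few lines of Python. This is a deliberately naive illustration, not the pipeline used to annotate the yeast or human genomes: it assumes the standard genetic code (stops TAA, TAG, TGA), treats an ORF as any stop-free stretch of at least `min_codons` codons (an arbitrary threshold), and ignores start codons, introns, and codon usage. The helper names are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def expected_codons_between_stops(n_stops: int = 3, n_codons: int = 64) -> float:
    """Mean spacing of stops if each codon is a stop with probability
    n_stops / n_codons (uniform random sequence assumed)."""
    return n_codons / n_stops


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def open_stretches(seq: str, min_codons: int = 100):
    """Scan all six reading frames and return (strand, frame, start, end)
    for stop-free stretches of at least min_codons codons.

    start/end are 0-based, end-exclusive coordinates on the scanned strand
    ('+' = seq as given, '-' = its reverse complement); a terminating stop
    codon, if present, begins at `end`.
    """
    seq = seq.upper()
    hits = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            run_start = frame
            for i in range(frame, len(s) - 2, 3):
                if s[i:i + 3] in STOP_CODONS:
                    if (i - run_start) // 3 >= min_codons:
                        hits.append((strand, frame, run_start, i))
                    run_start = i + 3
            # A stretch that runs off the end of the sequence is still reported.
            end = frame + 3 * ((len(s) - frame) // 3)
            if (end - run_start) // 3 >= min_codons:
                hits.append((strand, frame, run_start, end))
    return hits


if __name__ == "__main__":
    print(round(expected_codons_between_stops(), 1))  # 21.3
    demo = "ATG" + "GCT" * 30 + "TAA"
    for hit in open_stretches(demo, min_codons=20):
        print(hit)
```

Because a random frame hits a stop roughly every 21 codons, requiring a stretch several times that long is what makes an open frame a candidate gene rather than chance.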

There are other complications that get in the way of 
precisely identifying genes on the basis of DNA 
sequence alone. Equating genes with biochemical 
functions has problems because of multiple gene 
families and/or multiple gene products. Are two 
ORFs that code for the same protein counted as one 
gene or two? One example among many is the two
α-globin-coding regions on human chromosome 16.
Many genes, perhaps most, produce more than one 
mRNA product through alternative transcription 
starts or alternative splicing. Awareness is growing 
that proteins with some shared domains and some dif- 
ferent domains play an important role in fine-tuning 
tissue-specific development. Still other polypeptide 
chains are subsequently cleaved or modified to yield 
different kinds of products. Mutation locations can 
vary to include hits in the regulatory regions outside 
the recognized coding intervals. What should be coun- 
ted as a gene at the DNA level: the code for each func- 
tion (chemical reaction), the template for each mRNA 
(cDNA) transcript, the code for each polypeptide 
chain, or each mutation location? Epigenetic regula- 
tion by imprinting means that some aspects of gene 
expression are above the sequence; the implication, 
reminiscent of Goldschmidt’s 1946 argument, is that 
individual genes cannot be separated from functioning 
of the larger chromosomal unit. A modern synthesis 
might state that reproductive success, natural selection, 
and evolution value only what works for the organ- 
ism. There is no design to biology, just history in the 
form of inheritance and tinkering through the noise of 
mutations and environmental variations to result in 
the individual, for better or worse. 

If gene as a concept is not as well founded as, say,
atoms or molecules, does this mean the term should be 
discarded? Probably not. Gene still has heuristic value. 
We mean by it the awareness of the origin of specific 
molecules that are well founded, as in the gene for 
telomerase, or any other enzyme under active inves- 
tigation. We mean also the importance of inheritance 
over environment for individual differences, not to
close discussion but to stimulate investigation of the 
mechanisms influencing biology. And gene codifies 
what is known about the mechanism of inheritance, 
as the statements “it’s in the genes” or “genes run in 
families,” cannot be imagined with gemmules re- 
placing genes. Ultimately it is the responsibility of 
authors to make clear how they are using the term in 
order to convey their message. 


See also: Alleles; Benzer, Seymour; Linkage 
Group; Mutation; Nomenclature of Genetics 
Gene action is the consequence(s) of the presence and
activities of the product of a gene. 


See also: Operon 
A fundamental property of living cells is their orderly
transmission of genetic information from generation 
to generation. One aspect of this property involves a 
mechanism which controls replication and ensures 
one complete doubling of each replicon during each 
cell generation. Another aspect of this property is the 
placement of genes such that they are expressed prop- 
erly and that each daughter cell receives the genes in 
the appropriate configuration. It is now appreciated 
that deviations from this principle occur commonly 
and that amplification of DNA sequences as well as 
rearrangement of sequences occurs often. 
Amplification of DNA sequences, the differential
increase in a specific portion of the genome in
comparison with the remainder, occurs during
development as well as during the vegetative growth of cells.
The processes of polyploidization and endoreduplica- 
tion, where the entire chromosome complement is 
multiplied in one nucleus, will not be discussed in 
this section. Aneuploidy (e.g., trisomy) can
also result in a differential increase in a portion of the
genome, but this is distinct from DNA amplification
and will not be further discussed. Developmental 
amplification has been documented extensively in 
both germline cells and somatic cells of many organ- 
isms. Clearly these changes in the DNA content of the 
nucleus of cells are carefully regulated and lead to the 
appearance of the extra DNA at a predetermined time 
and, in many cases, the dissipation of this DNA on 
cue. Another type of DNA amplification (sporadic) is 
detected when cells are overcoming adverse environ- 
mental conditions. Amplification of this type, usually 
visualized by selection for a desired phenotype, has 
been found in bacteria, yeast, insects, and vertebrates. 
Salient features of this amplification process are being 
studied in several laboratories and several general 
characteristics have emerged. These include: (1) a multi- 
step process leading to the generation of highly 
amplified sequences; (2) a karyotype characterized 
by chromosomal abnormalities; (3) a genetic instabil- 
ity of the resistant (amplified) phenotype which may 
extend into a marked clonal variation among cells; (4) 
a spontaneous rate of detection which varies; and (5) a 
spontaneous rate of detection which may be increased 
by manipulations of the cellular growth conditions 
(among these, treatment with carcinogens). Several 
excellent reviews of amplification have appeared in 
the literature (Hamlin et al., 1984; Stark and Wahl, 
1984; Stark, 1986; Schimke, 1988). 


Development 


In Germline Cells 

The earliest suggestion of developmental DNA amp- 
lification was made by King in 1908 when she 
described the extra ‘chromatin’ (associated with form- 
ing nucleoli) which arose in the oocytes of the toad, 
Bufo sp., during pachytene (King, 1908). Later studies 
show that these ‘masses’ contain DNA, and hybrid- 
ization studies demonstrate a great excess of sequences 
coding for rRNA (Gall, 1968). This differential synthe- 
sis of rRNA genes is correlated with the appearance of 
hundreds of nucleolar organizers during the pachytene 
stage of meiosis. Similar examples of this phenomenon 
are seen in other amphibians: Xenopus, Rana, 
Eleutherodactylus, Triturus (Gall, 1968), as well as in 
an echiuroid worm, the surf clam (Brown and Dawid,
1968), and many insects (Gall et al., 1969). Documen- 
tation of rDNA amplification is particularly amen- 
able, because the rDNA forms nucleoli which are 
distinctive in appearance and because most rDNA 
sequences have a relatively high guanine-cytosine
content, which allows their separation from bulk DNA 
on CsCl gradients. These two aspects of rDNA have 
been important in the recognition of this and other 
phenomena (magnification and compensation) in 
which the rDNA copy number changes under various 
genetic conditions. 

The most complete study of oogenic rRNA gene 
amplification has been made in Xenopus laevis. In this 
animal the early primordial germ cells do not contain 
amplified rDNA. Amplification is initiated in both 
oogonia and spermatogonia of the tadpole during sex- 
ual differentiation and germ cell mitosis and results in 
a 10- to 40-fold increase in rDNA genes. The pre- 
meiotic amplification is lost at the onset of meiotic 
prophase. This loss seems to be permanent in male 
germ cells but temporary in female germ cells. Early 
during meiotic prophase, the oocyte nucleus under- 
goes a second burst of rDNA amplification which 
results in a 1000-fold increase in ribosomal genes 
(Kalt and Gall, 1974). The mechanism by which the 
first extrachromosomal rDNA copies are produced in 
the premeiotic stage is unknown, although circular 
structures have been observed (Bird, 1978). Since the 
number and placement of chromosomal ribosomal cistrons do
not change, a mechanism involving disproportionate
replication is favored. The second burst of rDNA 
amplification probably involves a rolling-circle inter- 
mediate. Such structures have been visualized by 
Hourcade et al. (1973) and could account for the 
increase in rDNA in the oocyte during the given 
period of time (Rochaix et al., 1974). While the
rDNA content is constant, the size and number of 
the rings are variable, suggesting fission and fusion of 
nucleoli during oogenesis (Thiebaud, 1979). The cir- 
cular molecules exhibit sizes which are integral mul- 
tiples of a basic unit. Using molecular techniques, this 
basic unit was found to be the DNA segment coding 
for one precursor rRNA molecule plus the accom- 
panying nontranscribed spacer region. Extrachromo- 
somal, circular molecules containing rDNA are also 
found at a low frequency (0.05-0.15% of total number
of molecules) in Xenopus tissue culture cells and in 
Xenopus blood cells (Rochaix and Bird, 1975). It has 
been suggested that the difference in the state of 
rDNA between a somatic cell and an amplified germ 
cell may only be one of degree (Bird, 1978). 

The cytological literature contains many other 
references to extrachromosomal DNA in oocytes, of 
which one of the most striking examples is found 
in dytiscid water beetles such as Dytiscus. The
nucleus of each oocyte contains a large chromatin
mass, termed ‘Giardina’s body,’ in addition to the
chromosomal complement. In older oocytes this 
DNA is associated with multitudes of nucleoli and 
has been shown to contain an increase in rDNA 
sequences (Gall et al., 1969). Hence, amplification of 
rDNA occurs during the maturation of the oocyte. 
However, hybridization studies indicate that only a 
fraction of the extrachromosomal DNA is made up 
of rDNA sequences, indicating that other DNA 
sequences (of unknown function) are also amplified 
(Gall et al., 1969). 

The phenomenon of gene amplification in Tetra- 
hymena, while also resulting in a considerable increase 
in rDNA sequences, is somewhat different from that 
in oocytes described above. Tetrahymena has two
types of nuclei in each cell: a transcriptionally quies- 
cent micronucleus, which is responsible for genetic 
continuity; and a macronucleus, which is derived 
from the micronucleus in a developmentally regulated 
process whereby micronuclear sequences are elimin- 
ated, rearranged, and amplified (Yao and Gall, 1974; 
Yao and Gorovsky, 1974; Yao et al., 1978). Consider- 
able amplification of many micronuclear sequences 
occurs, which results in the macronucleus containing 
45 times the haploid amount of DNA. Some of this 
increase is accounted for by amplification of rDNA 
sequences which are present as a single integrated 
copy in the micronucleus but are present in 200 copies 
in the macronucleus (Gall and Rochaix, 1974; Yao and 
Gall, 1974; Yao et al., 1978). These amplified rDNA 
sequences are present as linear, extrachromosomal, 
palindromic molecules (Yao et al., 1978; Yao, 1981). 
The present data favor a model whereby excision of
the single rDNA copy is followed by amplification 
(Yao et al., 1978). 

The amplification, rearrangement, and elimination 
of sequences is also developmentally regulated in the 
slime molds Physarum and Dictyostelium, as well as 
another ciliated protozoan, Stylonychia, to varying 
extents. Studies concerning the molecular structure 
of these amplified DNA sequences have been under- 
taken and have revealed interesting structures at the 
termini. 


In Somatic Cells 

Gene amplification during the differentiation of 
somatic cells also occurs and was again first detected 
by morphological criteria. Several regions of the poly- 
tene chromosomes found in the larval salivary glands 
of the fly Rhynchosciara americana showed a puffing 
response to hormone treatment which was carefully 
defined both temporally and spatially. These puffs 
were found to contain greater amounts of DNA than 
surrounding regions (Breuer and Pavan, 1955) and to code
for peptides utilized in the synthesis of the cocoon. 
Amplification and puff formation are dependent on 
the developmental stage of the cell as well as the cell’s 
position within the gland (Glover et al., 1982). Puff 
formation is not peculiar to the salivary gland chromo- 
somes of Rhynchosciara, because it has also been ob- 
served in the cells of Malpighian tubules and intestinal 
cells of the same insect. In addition, puff formation has 
also been observed in different tissues of Chironomidae,
Drosophila, Sciara, Hybosciara, and other Diptera
(Breuer and Pavan, 1955), in some cases thought to be 
under hormonal control (Pavan and da Cunha, 1969; 
Bostock and Sumner, 1978). 

In many cases of differentiation, somatic cells 
acquire adequate amounts of mRNA for production 
of abundant proteins by accumulation of stable 
mRNA molecules over a period of days. Examples of 
such are the silk fibroin genes, the ovalbumin genes, 
and the β-globin-chain genes. This is thought to be
controlled at the transcriptional or posttranscriptional 
level. However, in other cases, such as during the 
synthesis of the insect eggshell by the ovarian follicle 
cells of Drosophila, little time is allotted for the pro- 
duction of the specific mRNAs which are needed 
in large quantities. In these latter cases, the rates of 
transcription and translation do not seem to be high 
enough for the production of adequate amounts of 
protein. Spradling and Mahowald (1980) found that 
this need is met by differential amplification of the 
chorion gene sequences in the ovarian follicle cells. 
Spradling and Mahowald (1981) found that the chorion
genes s36 and s38, which lie in a cluster on the X
chromosome, are amplified 15-fold, whereas the s15 and
s18 genes, which lie in a cluster on the third chromosome,
are amplified 60-fold. The genes in both clusters are
amplified at the same time and both homologs 
are amplified equally. Sequences which flank these 
genes are also disproportionately replicated but not 
to as great an extent. This results in a gradient of 
amplification which spans 90 kb of DNA, is maximal 
in the center, and does not evidence any discrete ter- 
mination sites. 

Changes in the DNA content of the nucleus of 
germline cells and somatic cells during development 
are common. Amplification of rRNA sequences, as well
as others coding for proteins which are needed in large 
amounts during that particular phase of development, 
have been extensively documented. In still other cases, 
the nature of the amplified DNA cannot be totally 
accounted for by known sequences and probably con- 
tains amplified sequences of unknown function. Such 
sequences have been implicated in the differentiation 
of the orchid Cymbidium sp. (Nagl et al., 1972), the
differentiation of peas (Van’t Hof and Bjerknes,
1982), and in the flowering of the tobacco plant, 
Nicotiana sp. (Wardell, 1977). 


Acquisition of a Selected Phenotype 


Duplication and amplification of genetic material in
cells have long been documented as a means of
overcoming deleterious growth conditions. Unlike
amplification events that are specifically regulated
during development, amplification as a means of
survival is a sporadic event, usually detected at much
lower frequencies than developmental amplification.


Bacteria 

The first well-documented instance of gene duplica- 
tions in bacteria which provided an adaptive advan- 
tage was seen by Novick and coworkers (Novick and 
Horiuchi, 1961; Horiuchi et al., 1962, 1963). Bacterial 
strains were grown for long periods of time in 
limiting concentrations of lactose in the chemostat. 
The bacterial strains which emerged were able to 
synthesize four times the maximal normal amount of 
β-galactosidase. The ability to produce large amounts
of enzyme was unstable and could be transferred by 
conjugation at a time when the lactose operon genes 
were expected to be transferred. It was concluded that 
the ability to overproduce β-galactosidase was due to
extra copies of the lactose genes present in these 
strains. The spontaneous rate of duplication was 
estimated to be 107° (Horiuchi et al., 1963) and, in a 
similar system, 1074 (Langridge, 1969). Overpro- 
ducers with similar characteristics were subsequently 
reported for other enzymes such as ribitol dehydrogenase,
β-lactamase (Normark et al., 1977), and several
others (for an excellent review, see Anderson and
Roth, 1977). Roth and coworkers have demonstrated 
that duplications of up to a quarter of the Salmonella 
typhimurium chromosome occur at large homologous 
segments such as the rRNA genes (Anderson and 
Roth, 1977, 1981). Such large duplications are depend- 
ent on the recA system. On the other hand, dupli- 
cations in the range of 10-30kb appear to be 
independent of recA and do not involve very large 
homologous segments of DNA (Emmons et al., 1975; 
Anderson and Roth, 1977; Emmons and Thomas, 
1981). Edlund and Normark (1981) have detected 
tandem duplications of 10-20 kb at the E. coli
chromosomal ampC locus, which codes for β-lactamase
and confers resistance to ampicillin. After stepwise
selection in increasing concentrations of ampicillin,
30-50 copies of the duplication events were analyzed
at the restriction-fragment level and were found to
have different endpoints. The junction point of the
amplified unit was sequenced in one case, and the
original duplication event was found to have occurred
at a 12-bp sequence repeated on each side of the ampC
locus, 10 kb apart. Tlsty et al. (1984a) have detected
amplification events as revertants of certain leaky Lac
mutants. The amplified DNA sequences contained the
lactose operon and anywhere from 7 to 32 kb of
flanking sequences. These regions were amplified
100-fold. Virtually all of the duplications detected and
subsequently studied in bacteria have been tandem in nature.

Several bacteriophages are also known to 
amplify genetic markers. The phage lambda (Edlund 
and Normark, 1981), T4 (Kozinski et al., 1980), and 
P1 phage (Meyer and Iida, 1979) may use mechanisms
that parallel those used by viral sequences in mamma- 
lian cells. At the present time, the mechanism is un- 
known. 


Yeast 

Resistance to the toxic effects of copper in Saccharo- 
myces cerevisiae is mediated by tandem gene ampli- 
fication of the CUP1 locus (Fogel et al., 1983) but 
is different from sporadic amplification events char- 
acterized in other organisms. The CUP1 locus of 
yeast codes for a low-molecular-weight copper-binding
protein. Copper-sensitive strains contain one
copy of this locus and, when grown in elevated
concentrations of copper, fail to produce resistant
derivatives with a higher gene copy number of CUP1 
genes on one chromosome. Infrequently, copper- 
resistant strains were isolated in the laboratory and 
were found to carry up to 10 tandem duplications 
of the region, which is 2 kb in size. Further ampli- 
fication could be achieved by growing the copper- 
resistant strains in elevated copper concentrations. 
The authors postulate that the mechanism of ampli- 
fication in this instance proceeds through the forma- 
tion of a disomy for chromosome VIII (which carries 
the CUP1 locus). Copper-resistant mutants of the 
sensitive strain were found to be disomic for
chromosome VIII. The amplification or tandem iteration
could then result primarily from subsequent unequal
chromosome or sister chromatid exchanges. The
formation of a disomy may constitute an initial event in
the process. 

A second example of gene amplification in yeast 
exhibits molecular structures that are more similar to 
those described in other organisms. A yeast strain 
resistant to antimycin A, an inhibitor of mitochondrial
respiration, has been found to contain multiple copies
of a nuclear gene, ADH4, an isoenzyme of alcohol 
dehydrogenase. The amplified copies are 42kb in 
length, display a linear, extrachromosomal, palindrom- 
ic structure and contain telomeric sequences. Their 
structure resembles that of the amplified rDNA 
genes in the macronucleus of Tetrahymena and related 
ciliated protozoa, except that the nuclear copy
remains within the chromosome in this situation. In 
contrast to what is often observed in mammalian 
amplification, the extrachromosomal copies of this 
gene were stable during mitotic growth. Ampli- 
fication of the ADH4 gene is a relatively rare 
event (~ 10 '° mutations/cell/generation); alternative 
mutations compose the majority of antimycin 
A-resistant events. 


Protozoans 

Drug resistance in protozoan parasites is a com- 
mon occurrence and presents a serious problem for 
the chemotherapy of diseases caused by such patho- 
gens as Trypanosoma, Leishmania, and Plasmodium 
(Browning, 1954; Peters, 1974; Rollo, 1980). Recently, 
Leishmania strains resistant to the well-known
chemotherapeutic agent methotrexate (MTX) have been
isolated and analyzed for their mechanism of resistance.
Organisms that were resistant to high concentrations
of MTX (1 mM) had a 40-fold increase
in dihydrofolate reductase (DHFR), which in this 
organism is associated with thymidylate synthetase 
(Beverley et al., 1984). 

Recent studies have shown that Plasmodium falci- 
parum contains genes that are analogous to the multi- 
drug resistance genes in mammalian cells. Parasites 
that become resistant to chloroquine have also proven
to be resistant to other antimalarial drugs, similar
to the phenomenon of multidrug resistance seen
in human tumors. Wilson et al. (1989) found that
sequences similar to the mammalian P-glycoprotein
gene exist in P. falciparum. Their studies
showed that drug-resistant parasites contained ampli- 
fied copies of these specific DNA sequences when 
compared with their drug-sensitive siblings. 


Invertebrates 

Amplification of rRNA genes in Drosophila during
oogenesis does not occur as has been described in 
Xenopus (see previous section, “Development”). How- 
ever, sporadic amplification of the rRNA genes during 
one generation has been observed under specific 
genetic conditions. Amplification of the rDNA genes 
in this situation results in a reversion from a mutant to 
a wild-type phenotype. Each sex chromosome carries 
approximately 130-150 rRNA genes (Ritossa et al., 
1966). The phenotype is wild type if the diploid cell carries
at least one normal locus (~130 genes), while the
phenotype is altered (bobbed) if the genome carries
fewer than 130 genes (Ritossa and Scala, 1964; Ritossa
et al., 1966). The intensity of the bobbed phenotype 
(slow development, thin chitinous cuticle, reduced 
body traits, and short bristles) is inversely propor- 
tional to the number of genes for rRNA. Ribosomal 
DNA magnification, the increase in rDNA copy 
number, is observed in the progeny of phenotypically 
bobbed males. It involves rapid accumulation of 
rDNA by unknown mechanisms at either nucleolus 
organizer. The rDNA, which is accumulated during 
the first generation, does not have a noticeable effect 
on the phenotype of the fly. The phenotypic effects of 
the magnified rDNA become evident in the F2 progeny
if the rDNA has been transmitted by males and if
the genotype of the F2 generation is again characterized
by rDNA deficiencies. In other words, the extra
copies of rDNA are eliminated if the magnified fly is 
crossed with a normal (bb+) female. The phenotypic
inheritance of magnified rDNA requires its integra- 
tion and inheritance through the male germline. Two 
hypotheses have been proposed to explain rDNA 
magnification: disproportionate replication of rDNA 
(Ritossa and Scala, 1964; Ritossa et al., 1971) or un- 
equal sister chromatid exchange (Tartof, 1974). Recent 
experiments demonstrating the decrease in magnifica- 
tion frequency in organisms which carry the rDNA 
on a ring chromosome strongly suggest unequal crossover
as the mechanism.
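The unequal-exchange model (Tartof, 1974) amounts to simple bookkeeping, sketched below under idealized assumptions: both sister chromatids carry the same number of identical tandem repeats, they pair out of register by a fixed number of repeat units, and a single crossover occurs within the paired region. Real rDNA arrays, flanking homology, and ring-chromosome constraints are not modeled, and the function name is hypothetical. One product gains exactly as many repeats as the reciprocal product loses.

```python
def unequal_exchange(n_copies: int, offset: int) -> tuple[int, int]:
    """Copy numbers of the two products of one unequal sister chromatid exchange.

    Idealized model: both chromatids carry n_copies identical tandem repeats,
    pair out of register by `offset` repeat units, and cross over once
    within the paired region.
    """
    if not 0 < offset < n_copies:
        raise ValueError("offset must satisfy 0 < offset < n_copies")
    gained = n_copies + offset  # chromatid that receives the extra repeats
    lost = n_copies - offset    # reciprocal product; total is conserved
    return gained, lost


if __name__ == "__main__":
    print(unequal_exchange(130, 20))  # (150, 110)

    # Hypothetical ratchet: keep the larger product after each round.
    copies = 4
    for offset in (1, 2, 3):
        copies = unequal_exchange(copies, offset)[0]
    print(copies)  # 10
```

Retaining the larger product over successive rounds, as in the hypothetical ratchet above, is how an unequal-exchange model can account for stepwise increases in copy number; the reciprocal loss is what distinguishes it from disproportionate replication, which adds copies without a matching deficient product.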

Another type of amplification occurs in Drosophila 
melanogaster, which differs from rDNA magnifica- 
tion in several characteristics. This amplification, 
called ‘compensation,’ occurs when one nucleolus 
organizer of the two homologs is completely deleted 
(X/O or X/X-no females). In such mutants, the 
remaining organizer ‘compensates’ for the deletion 
of the rDNA sequences by a disproportionate repli- 
cation of the remaining sequences on the intact 
homolog. Compensation may only occur on the X 
chromosomal nucleolus organizer, and the extra 
rDNA is not inherited in subsequent generations. 
Utilizing various deficiencies for the X-chromosomal 
heterochromatin, evidence has been presented for the 
existence of a genetic locus that regulates rDNA com- 
pensation (Procunier and Tartof, 1978). This locus, 
called the ‘compensatory response’ (cr), is located out- 
side the ribosomal cluster and in the X-chromosomal 
heterochromatin. The locus acts in trans to sense the 
presence or absence of its partner locus on the oppos- 
ite homolog. If only one cr locus is present, it acts in 
cis by driving compensation (disproportionate replication)
of adjacent rRNA genes. Not all embryos with
the proper genotype undergo compensatory amplifi- 
cation to emerge with an increased number of rRNA 
genes. Only a small fraction undergoes the putative 
compensatory amplification. In this respect the ampli- 
fication behaves like a mutagenic reversion event to 
restore the functional phenotype. 

Resistance to environmental agents such as pesti- 
cides and toxic chemical waste has been documented in 
laboratory stocks and natural populations of inverte- 
brates. Selection of Drosophila larvae in increasing 
concentrations of cadmium yields strains that contain 
duplications of the metallothionein gene (Otto et al.,
1986). The duplication is stably inherited in the 
absence of selection pressure and produces a
corresponding increase in metallothionein messenger RNA.
A survey of natural populations found that this event 
is common (Maroni et al., 1987) and may signal the
early stages of the evolution of a gene family. The 
mosquito Culex quinquefasciatus develops resistance 
to various organophosphorus insecticides by over- 
producing the enzyme esterase B1. Molecular studies 
have demonstrated that the overproduction of the 
enzyme is the result of amplification of the esterase 
B1 gene some 250-fold (Mouches et al., 1986). The 
resistant mosquitoes were reported to develop normally
and to retain their reproductive capacity. This
observation raises questions about the evolutionary
significance of duplication and amplification events,
at least in invertebrates.


Plants 

DNA changes in plants during response to environ- 
mental stress have been reported for flax (see Cullis, 
1977, 1979, 1983). The suggestion that these changes in 
DNA content are induced by the environment awaits 
further studies for verification. 


Vertebrates 

DNA amplification in mammalian cells was first 
detected when murine tumor cell populations became 
resistant to chemotherapeutic drugs. Methotrexate
(MTX), a widely used chemotherapeutic drug, inhibits
the action of dihydrofolate reductase (DHFR), which is
required for the biosynthesis of thymidylate, glycine,
and purines.
Step-wise selection of cells in increasing con- 
centrations of MTX generated highly resistant cells 
(Hakala et al., 1961; Alt et al., 1976; Flintoff et al., 
1976a,b; Haber et al., 1981). Biedler and Spengler
(1976) detected chromosomal abnormalities in the
cells that overproduced DHFR and suggested that
they reflected an increase in gene dosage. Schimke
and coworkers obtained a cDNA for the DHFR
sequence and were able to show that the
overproduction of DHFR enzyme was the result of
amplification of the DHFR DNA sequence (Alt et al., 1978).
It is now known that amplification of the DNA 
sequence coding for the target enzyme of a metabolic 
inhibitor is a common mechanism for overcoming 
growth restriction (Dolnick et al., 1979; Melera et al., 
1980; Tyler-Smith and Bostock, 1981; Flintoff et al., 
1983; see Stark and Wahl, 1984, for review).

Other examples of this phenomenon were subse- 
quently found, the best studied being amplification 
of the CAD gene. The CAD gene codes for a multi- 
functional protein which catalyses the first three steps 
in the synthesis of pyrimidines. The aspartate
transcarbamylase activity can be inhibited by the
transition state analogue N-phosphoacetyl-L-aspartate
(PALA). PALA-resistant cells overproduce not only the
aspartate transcarbamylase but the other two enzymes
as well (carbamyl phosphate synthetase and
dihydroorotase; Kempe et al., 1976). Wahl et al. (1979)
have shown that overproduction of these enzymes is
the direct result of
amplification of DNA coding for these proteins. 

Numerous other instances of DNA amplification 
have now been described (for a comprehensive list and 
references, see Stark and Wahl, 1984). In all cases the 
growth of cells is inhibited either by metabolic inhibi- 
tors, toxic agents, or altered enzymes with reduced 
efficiency. Of clinical importance has been the discov- 
ery that multidrug resistance in cancer chemotherapy 
is, in some cases, mediated by amplification of the mdr 
locus (Roninson et al., 1984a,b). Selection pressure 
can also lead to the amplification of sequences with 
initially unknown functions. 

A selected phenotype may also arise under selection
pressures that are unknown at the time. In such cases
the phenotype may be accompanied by the
manifestations of gene amplification even though the
amplified sequences are unknown. One such instance
involves the sequences carried on the double-minute
chromosomes (DMs) and in the homogeneously
staining regions (HSRs) of neuroblastoma cells, in
which these structures were first described. The
sequences amplified in these lines proved to be a
cellular oncogene, the N-myc gene (Schwab et al.,
1983). Amplification of oncogenes has now been found
in several tumor types containing DMs and HSRs.


Mammalian Gene Amplification 


Many of the systems above share several
characteristics of amplification. To illustrate these
characteristics, the general properties of MTX-resistant
cells that result from amplification of the DHFR gene
will be described.

Classically, mammalian cells containing amplified
DHFR genes were obtained by a stepwise selection
for cells that were highly resistant to MTX. High
MTX resistance (by virtue of amplification of the
DHFR gene) cannot be obtained by a large,
single-step selection protocol; it is a multistep process. The
initial step seems to be rate-limiting, since cells with a 
low copy number can be rapidly stepped up to a high 
level of resistance and a high copy number. When the 
initial increase in gene copy was examined more close- 
ly, Brown et al. (1983a,b) and Tlsty et al. (1984b) found 
that the stringency of selection is critical in obtaining
cells that have amplified DHFR. It was found that
incremental increases in drug concentration not only 
promote the rapid emergence of resistance but also 
specifically promote the rapid amplification of the 
DHFR gene (Rath et al., 1984). 

The second property of MTX-resistant cells that
have amplified their DHFR gene is the frequent
presence of karyotypic abnormalities in the cells. As
indicated previously, abnormal chromosomal
structures were associated with overproduction of
DHFR in the early studies of Biedler and Spengler
(1976). They described a marker chromosome in
overproducing cells that contained an elongated
chromosomal arm. The term ‘homogeneously staining 
region’ (HSR) was coined to describe a region of this 
chromosome which banded abnormally when stained 
with Giemsa and which was subsequently shown to be 
the site of the amplified DHFR sequences (Alt et al., 
1978). This structure was associated with stable resist- 
ance to MTX; that is, retention of the resistant pheno- 
type even after subsequent growth in the absence of 
selection pressure. This is in contrast to the
karyotype of cells that were unstably resistant to MTX;
i.e., with extended growth in nonselective medium, 
the resistant phenotype (amplification) diminished 
rapidly and disappeared. HSR structures were not 
found in unstably resistant cells. Close examination 
of the karyotype of unstably resistant cells did, how- 
ever, bring to light the presence of small chromosomal 
fragments known as DMs. These structures (as well as 
HSRs) had been described by Balaban-Malenbaum 
and Gilbert (1980) in cell lines obtained from human 
neuroblastoma. Subsequent work demonstrated that 
unstably resistant cells contained the amplified copies 
of DHFR on the DMs. The lack of centromeric 
structure in these fragments leads to their random 
(unequal) segregation at mitosis and a diminution in 
their number if selection pressure is no longer exerted 
on the cells (see Kaufman and Schimke, 1981). 
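The effect of random segregation can be sketched with a small neutral simulation. It assumes, for illustration only, that every double minute is replicated exactly once per cell cycle and that each copy goes to either daughter with probability 1/2. One daughter is followed per division, and no fitness cost is imposed on DM-carrying cells, so the sketch shows only how random partitioning by itself produces DM-free cells once selection is withdrawn. The function names and parameter values are hypothetical.

```python
import random


def segregate(n_dm: int, rng: random.Random) -> int:
    """Replicate n_dm acentric double minutes and partition the 2 * n_dm
    copies at random; return how many the followed daughter receives."""
    return sum(1 for _ in range(2 * n_dm) if rng.random() < 0.5)


def fraction_without_dms(start: int, generations: int, lineages: int,
                         seed: int = 1) -> list[float]:
    """Follow independent lineages in nonselective medium and record,
    after each division, the fraction that have lost all of their DMs."""
    rng = random.Random(seed)
    counts = [start] * lineages
    history = []
    for _ in range(generations):
        counts = [segregate(n, rng) for n in counts]
        history.append(sum(1 for n in counts if n == 0) / lineages)
    return history


if __name__ == "__main__":
    trace = fraction_without_dms(start=5, generations=100, lineages=300)
    for gen in (1, 10, 50, 100):
        print(f"generation {gen}: {trace[gen - 1]:.2f} of lineages lack DMs")
```

Because a lineage with no copies has nothing left to replicate, the DM-free fraction in this model can only rise from one generation to the next.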

The molecular structure of HSRs and DMs has 
been studied. The first obstacle in characterizing the 
amplified unit derives from its large size. The DHFR 
gene, which is amplified to confer MTX resistance, is 
large: 31 kb including introns. The amplified region
is larger still; rough estimates of the size of the
amplified unit range from 120 to 1000 kb.
Analysis of the end points of the amplified units pro- 
vided information on the structure of the amplified 
unit. Although the sequence of DNA which needs to 
be characterized is long, cloning of neighboring frag- 
ments (‘chromosomal walking’) has been accom- 
plished by several laboratories in both mouse and 
hamster model systems (Zeig et al., 1983; Federspiel 
et al., 1984; Giulotto et al., 1989). The information 
derived from these endeavors has not provided the 
desired portrait of the amplified unit because of 
another obstacle. The amplified structure, at the 
molecular level, seems to be continually changing, as 
evidenced in the chromosomal walking studies. In
each amplified cell studied, the amplified sequences
correlate with the cloned map only up to a certain 
point and then diverge. Rearrangements of DNA 
accompany the amplification of genes. The basic
molecular event of DNA amplification is obscured 
by the dynamic aspect of the process. HSRs, DMs, 
and translocations are karyotypic abnormalities 
which have been detected in cells which are highly 
resistant to a given metabolic inhibitor (i.e., cells 
which have already progressed through much of the 
multistep process). Contrasting results have been 
obtained by Hamlin and Montoya-Zavala (1985) in 
their study of DHFR gene amplification in Chinese 
hamster ovary (CHO) cells. They found the amplified
unit to be uniform in size and arranged in head-to-head
and head-to-tail tandem repeats.

A third characteristic of MTX-resistant cells that
have amplified the DHFR gene is the initial genetic
instability of the resistant (amplified) phenotype, which
is accompanied by a marked heterogeneity in the 
population. Cells newly selected for MTX resistance 
are unstable with respect to DHFR levels, and the loss 
of the elevated DHFR levels varies among the
progeny of different cloned cells. The initial instability 
of the amplified DHFR genes in emerging, resistant 
CHO cells is consistent with the hypothesis that they 
are present as extrachromosomal pieces of DNA. 
Stabilization of the resistant phenotype could be 
the result of integration of these sequences into the 
chromosome, either at the site of amplification or 
elsewhere in the genome, or could be the result of 
processes that are unknown at the present time. 

A final characteristic is that the frequency of DNA 
amplification can be manipulated. Several agents have 
been found that increase the incidence of DNA ampli- 
fication. Pretreatment with hydroxyurea, ultraviolet 
light, or MTX itself increases the incidence of the 
initial amplification of the DHFR sequences (Brown 
et al., 1983a,b; Tlsty et al., 1984b). Similar observa- 
tions have been made using an SV40-transformed cell 
system to detect amplification of SV40 sequences 
(Lavi, 1981). Viral sequences undergo an amplification 
process that is enhanced by pretreatment with carcino- 
genic agents. Lavi and coworkers (Lavi and Etkin, 
1981) observed dramatic increases in viral sequences 
after the cells were treated with agents such as
benzopyrene, aflatoxin, methyl methanesulfonate, and a host
of other carcinogens. The extent of the enhancement 
of amplification can be as little as a few-fold or exceed
1000-fold. The basis for the enhancement of gene
amplification by carcinogen pretreatment is not 
known at the present time. 


Frequency of Sporadic Amplification in 
Mammalian Cells 


In the last few years, it has become obvious that the 
frequency of gene amplification in different cells can 
vary dramatically. Initially, gene amplification was
measured in the model systems that were used to 
study the phenomenon: established rodent cell lines 
such as S180, BHK cells, CHO cells, and 3T6 cells.
Reported values for the rodent model systems were 
incidences of 107% or 10°“ or rates that approached 
107° events/cell/generation. Several laboratories have 
begun examining the incidence of gene amplification 
in different cell populations. Early results suggested 
that tumorigenic cells could amplify more frequently
than nontumorigenic cells (Sager et al., 1985; Otto
et al., 1989). In general, highly tumorigenic cells amplify
at a greater frequency than nontumorigenic cells.
Earlier studies of gene amplification used immortalized
cell lines and biopsied tumor samples. However, in two 
studies, the amplification potentials of primary diploid 
cells, both human and rodent, were examined and 
quantitatively compared with the amplification po- 
tentials of their transformed counterparts. Strik- 
ingly, the difference in amplification incidence 
between ‘normal’ cells and their transformed counter- 
parts (in some cases tumorigenic) is immense (Tlsty, 
1990; Wright et al., 1990). Amplification potential was 
measured at two loci, the CAD gene and the DHFR 
gene. Comparative quantitative data for both
normal (<2x10~8) and transformed cell lines (10~*) 
indicated a difference in frequency which is greater 
than four orders of magnitude (Tlsty, 1990). These 
studies suggest that there is some fundamental differ- 
ence between normal cells and transformed cells that 
affects their ability to amplify; diploid cells lack a 
detectable frequency of gene amplification, while 
tumorigenic cells readily amplify DNA sequences (a
difference of at least four orders of magnitude).
Subsequent studies have identified the p53 tumor
suppressor gene as a regulator of the amplification
event in mammalian cells (Livingstone et al., 1992;
Yin et al., 1992).

Experiments with tissue culture cells have shown 
us that a wide variety of loci may be amplified in 
mammalian cells. The amplification is usually mani- 
fested as an overproduction of the protein product 
that is targeted by the chemotherapeutic agent.
Luria-Delbrück fluctuation analysis has demonstrated that
the amplification events are occurring spontaneously
at a constant rate; it is the selective environment that
allows them to be visualized. A recent study has
compared the amplification rate in nontumorigenic 
and tumorigenic cells and found that the tumorigenic 
cells amplified the endogenous locus 100 times more
frequently than the nontumorigenic cell line (Tlsty et al., 1989).
Restrictions on the loci that can spontaneously 
amplify have not been encountered. Studies have also 
shown that more than one locus can be amplified at 
the same time (Giulotto et al., 1989). 
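The logic of a Luria-Delbrück fluctuation analysis can be sketched by simulating many parallel cultures. In this sketch the helper names, rate, and culture sizes are hypothetical. Every cell divides once per generation, each division of a sensitive cell yields an amplified, resistant daughter with a fixed probability, and resistant cells breed true. The rate is then recovered with the P0 method (m = -ln p0, where p0 is the fraction of cultures with no resistant cells). This is one standard estimator, not necessarily the one used in the studies cited. Spontaneous events arising at different times produce a variance-to-mean ratio far above 1, whereas events induced only at the time of selection would give a Poisson ratio near 1.

```python
import math
import random
import statistics


def poisson(lam: float, rng: random.Random) -> int:
    """Knuth's multiplication method; adequate for the small means used here."""
    limit = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1


def grow_culture(n0: int, generations: int, rate: float,
                 rng: random.Random) -> int:
    """Return the number of resistant (amplified) cells in one culture."""
    sensitive, resistant = n0, 0
    for _ in range(generations):
        # Amplification events among this generation's sensitive divisions.
        new = min(poisson(rate * sensitive, rng), sensitive)
        resistant = 2 * resistant + new     # resistant cells breed true
        sensitive = 2 * sensitive - new     # one daughter per event converts
    return resistant


def fluctuation_test(cultures: int, n0: int, generations: int, rate: float,
                     seed: int = 0) -> tuple[float, float]:
    """Return (P0 rate estimate per division, variance-to-mean ratio)."""
    rng = random.Random(seed)
    counts = [grow_culture(n0, generations, rate, rng) for _ in range(cultures)]
    p0 = counts.count(0) / cultures
    if p0 in (0.0, 1.0):
        raise ValueError("P0 method needs some, but not all, cultures without resistant cells")
    divisions = n0 * (2 ** generations - 1)   # total cell divisions per culture
    rate_hat = -math.log(p0) / divisions      # P0 estimate: m = -ln(p0)
    var_to_mean = statistics.pvariance(counts) / statistics.mean(counts)
    return rate_hat, var_to_mean


if __name__ == "__main__":
    estimate, ratio = fluctuation_test(cultures=200, n0=1000, generations=10, rate=1e-6)
    print(f"estimated rate {estimate:.2e} per division; variance/mean {ratio:.1f}")
```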


Summary 


The literature suggests that when gene amplification 
does occur in normal tissues it is developmentally 
regulated. This evidence is mostly compiled from 
studies on Xenopus and Drosophila (see the section
“Development”). In higher organisms, the documen- 
tation of gene amplification as a developmental event 
is lacking. At the present time we do not know if gene 
amplification can be developmentally programmed in 
mammalian cells. 

Sporadic amplification can occur in unicellular 
organisms such as bacteria and yeast, but seems to be 
lacking in the normal somatic tissues of higher
eukaryotes. Sporadic amplification has been reported in
the germline cells of several organisms and has been
shown to be heritable. In all of these
cases, the phenotype demonstrated an increased resist- 
ance to an environmental toxin (Mouches et al., 1986; 
Maroni et al., 1987; Prody et al., 1989). The extensive 
documentation of sporadic amplification in neoplastic 
tissues raises the questions of when the neoplastic
cell acquires the ability to amplify and whether
manipulating this event can aid in the treatment of cancer.
Gene cassettes are small, discrete mobile elements.
A gene cassette generally comprises a single gene and 
a downstream 59-be (59-base element) which is a 
recombination site. Cassettes differ from most other 
known mobile elements in that they do not encode the 
enzymatic machinery responsible for their movement; 
this is supplied by a companion element called an in- 
tegron (see Integrons). Cassette integration involves a 
site-specific recombination reaction between the 59-be 
and the attI site in an integron, which is catalyzed by
an integron-encoded IntI-type integrase. Excision of
cassettes occurs via both attI × 59-be and 59-be × 59-be
reactions. The mobility of gene cassettes is thus depend- 
ent on the presence of an integron in the same cell but, 
as the most common location for gene cassettes is with- 
in an integron, this condition is normally satisfied. 


Structure of Gene Cassettes 


The organization of gene cassettes is very compact. 
They generally include only a single gene (or open 
reading frame) and a downstream recombination site 
called a 59-be (59-base element) and any gene can, in 
theory, be part of a cassette. Occasionally, two open 
reading frames (ORFs) are found in a single cassette. 
Gene cassettes are normally found in a linear form
integrated at the attI site of an integron, but can
also exist transiently in a free, closed-circular form
(Figure 1A), which is created as a product of excision
of a cassette from an integron. Circular cassettes can
be reincorporated at the attI site of an integron, and
IntI1-catalyzed integration of gene cassettes into the
attI1 site of a class 1 integron has been demonstrated
experimentally (see Integrons). As the recombination
crossover has been localized to a unique position
between the conserved G and TT in the 1R core site of
the 59-be, integrated cassettes begin with TT and end
with G (Figure 1B).
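Why integrated cassettes begin with TT and end with G can be shown with a toy string model. In this sketch the sequences and function names are hypothetical (only the GTTRRRY core consensus comes from the text and Figure 1), the circle is written so that its 1R core site does not wrap around the end of the string, and strand exchange is reduced to cutting both partners between G and TT and rejoining them.

```python
import re

# Core-site consensus GTTRRRY (R = A or G, Y = C or T).
CORE = re.compile(r"GTT[AG]{3}[CT]")


def crossover_index(seq: str, core_start: int) -> int:
    """Check that a core site starts at core_start and return the
    crossover point, which lies between its G and its TT."""
    if not CORE.fullmatch(seq[core_start:core_start + 7]):
        raise ValueError("no GTTRRRY core site at this position")
    return core_start + 1


def linearize(circle: str, core_start: int) -> str:
    """Open a circular cassette at the crossover in its 1R core site."""
    x = crossover_index(circle, core_start)
    return circle[x:] + circle[:x]


def integrate_at_att_i(att_i: str, att_core_start: int,
                       circle: str, core_start: int) -> str:
    """Insert the opened cassette at the crossover point of attI."""
    x = crossover_index(att_i, att_core_start)
    return att_i[:x] + linearize(circle, core_start) + att_i[x:]


if __name__ == "__main__":
    # Hypothetical circle: tiny ORF, rest of the 59-be, 1R core GTTAGGC.
    circle = "ATGAAATAA" + "GCTAAC" + "GTTAGGC" + "CC"
    att_i = "AAA" + "GTTAGAT" + "CCC"    # hypothetical attI core site
    linear = linearize(circle, 15)
    print(linear)                        # begins with TT, ends with G
    print(integrate_at_att_i(att_i, 3, circle, 15))
```

The first six bases of the linear form are the last six bases of the 1R core site in the circle, matching the description in the Figure 1 caption.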
As few as 7 bp separate the first in-frame initiation
codon of the gene from the start of the linear form of 
the cassette, and the termination codon normally lies 
very close to the 59-be or even within it. Thus, there is 
usually no space for transcription signals and such 
cassettes rely on the presence of an upstream promoter 
for expression of their genes. This promoter is nor- 
mally supplied by the integron (see Integrons) and the 
correct orientation of the gene in a cassette with 
respect to the promoter in the integron is essential if 
the gene is to be expressed. This is achieved only when 
the 59-be is located downstream of the gene, as is the 
general rule. In rare cases, a promoter is located within 
the cassette. Both a promoter and translational 
attenuation signals have also been found upstream of 
cmlA genes that confer resistance to chloramphenicol, 
and production of the protein is induced by
chloramphenicol. The presence of a promoter within a cassette
will permit expression of genes in cassettes that are 
situated too far from the integron’s promoter to per- 
mit expression from it. 


Cassette-Associated Genes 


It is presumed that any gene can become packaged in 
cassette form, though how and where this happens 
remains a matter for speculation. Many (over 60) of 
the known cassettes contain an antibiotic resistance 
gene, and these genes determine resistance to
various antimicrobial agents (β-lactams,
aminoglycosides, trimethoprim, chloramphenicol, erythromycin,
rifampicin, and antiseptic quaternary ammonium 
compounds) using a variety of mechanisms. Over 
150 further cassettes have been found in the Vibrio 
cholerae small chromosome. Only a few of the ORFs 
(potential genes) contained in these cassettes have 
been identified. They include genes for a toxin, a 
virulence determinant, and a lipoprotein as well as a 
few potential antibiotic resistance genes. Restriction 
and modification enzymes have also been found to be 
encoded in cassettes. 


Cassette-Associated Recombination 
Sites 


The 59-be recombination sites found in cassettes pro- 
vide the signal that permits cassettes to be mobilized. 
They were originally recognized as a consensus of 
59 bp found downstream of several different genes
and were subsequently shown to be recombination 
sites recognized by the integron-encoded IntI
integrases. Each cassette includes a unique 59-be and
members of the 59-be family were later found to 
vary considerably in sequence and length; the shortest
are 57 bp and the longest 141 bp. However, all 59-be
share a set of identifiable features. The term 59-be has 
been retained because it has been widely used and 
is generally understood. The VCRs (Vibrio cholerae
repeats) found in cassettes from the V. cholerae
integron region also share these features and are thus
59-be. Each 59-be is made up of two regions of
25-30 bp located at the outer ends (labeled LH and RH
simple sites in Figure 1) that each have an
organization equivalent to that of the simple sites of other
integrase-type recombinases. Each simple site includes
a pair of inversely oriented core sites of 7 bp (boxed in
Figure 1) that are part of somewhat longer IntI
binding domains. Both simple site regions are needed for
the 59-be to be an effective recombination site, though 
one simple site can participate in recombination at 
greatly reduced efficiency. The overall organization 
of 59-be is unusual as the sites recognized by other 
integrases (tyrosine recombinases) include only one 
simple site. 

The sequences of the LH and RH simple sites are 
only moderately conserved and the consensus regions 
in 59-be are confined to them. Indeed, the variation 
between the sequences of individual 59-be is such that 
only eight bases, four in each simple site region, are 
completely conserved in known 59-be. The sequences 
of the two consensus or simple site regions are imper- 
fect inverted repeats of one another and, in any indi- 
vidual 59-be, complementarity between key bases in 
the two simple site regions appears to be preserved in 
preference to conformity to the consensus. The length 
of the central region of 59-be between the two simple 
sites is highly variable and this accounts for differences 
in the lengths of 59-be. The central sequence is also 
variable, but it commonly includes an inverted repeat. 
The importance of these features in recognition of 
59-be-type sites remains to be established. How- 
ever, differences between the LH and RH simple site 
regions, such as the extra residue in 2L (x in Figure 1), 
may play a role in ensuring that the RH simple site is 
the location for strand exchange and hence that the 
cassette gene is correctly oriented with respect to the 
promoter in the integron. 

A small number of examples of variants of known 
gene cassettes that have lost most of the 59-be have 
been found. In each of these, only one simple site 
remains. This simple site is made up of the 1L and 
1R core sites of the original 59-be separated by a 
spacer. Cases where the spacer is derived from the 
spacer of either the RH or LH simple site of the 
original 59-be or from the attI1 simple site have been
found. While it is probable that cassettes containing 
only a simple site can move, this has not yet been 
demonstrated. 
Figure 1 Structure of a circular gene cassette. A generalized typical cassette in (A) its free, circular form and (B) its
linear integrated form, showing the coding region of the gene and the 59-be recombination site. The extent of the
coding region (not to scale) is delineated by start (ATG) and stop (*) codons. The 7-bp core site sequences, related to the
consensus GTTRRRY, that lie within putative IntI binding sites are boxed, and arrows indicate their relative
orientations. Only bases found in all 59-be are shown, while other consensus bases are represented by dots. An extra
base in 2L is marked by an x. In any individual 59-be, sites numbered 1 are closely related, as are sites numbered 2,
but the bases between them are not. The left hand (LH) and right hand (RH) simple sites consist of pairs of core sites
(1L and 2L; 2R and 1R, respectively) together with flanking sequences. The region located between the LH and RH
simple sites is an inverted repeat (represented by a pair of arrows) which has a variable sequence and length. A vertical
arrow indicates the recombination crossover point. On integration of the cassette into the attI site of an integron, the
1R core site is split at the recombination crossover point so that the last six bases of 1R in the circular cassette
become the first six bases of the integrated cassette.


Cassettes Usually Congregate in
Integrons 


The normal location for gene cassettes is within the
attI site of an integron (see Integrons). Indeed,
cassettes were first identified as discrete entities because
they constituted variable regions found in the sur- 
rounding conserved integron structure, and only 
subsequently was their mobility established experi- 
mentally. Integrons can capture one or many gene 
cassettes to form arrays of cassettes. These arrays can 
include one, a few, or many gene cassettes. The arrays 
can readily be lengthened by incorporation of new 
cassettes, shortened by excision of one or more cas- 
settes, or reshuffled to create new orders. All of these 
events can be effected by IntI1-mediated
recombination between attI and a 59-be or between two 59-be.
Several different classes of integron that include dif- 
ferent intI/attI modules have been found and identical 
cassettes have been found in the cassette arrays of the 
integrons belonging to different classes (1, 2, 3, etc.). 
This indicates that integrons share gene cassettes 
and that the 59-be sites are recognized by all IntI
integrases.
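The array operations described above can be sketched as list operations. This is a deliberately simplified model, assuming that integration at attI places the incoming cassette immediately next to attI (first position) and that each excision releases a single cassette; the function names and cassette labels are hypothetical.

```python
def capture(array: list[str], cassette: str) -> list[str]:
    """attI x 59-be integration: the incoming cassette lands next to attI."""
    return [cassette] + array


def excise(array: list[str], index: int) -> tuple[list[str], str]:
    """Release one cassette as a free circle. At index 0 this stands for an
    attI x 59-be reaction; elsewhere for a 59-be x 59-be reaction between
    the 59-be of the preceding cassette and the cassette's own 59-be."""
    if not 0 <= index < len(array):
        raise IndexError("no cassette at that position")
    return array[:index] + array[index + 1:], array[index]


def reshuffle(array: list[str], index: int) -> list[str]:
    """Excise one cassette and reintegrate it at attI, changing the order."""
    remaining, circle = excise(array, index)
    return capture(remaining, circle)


if __name__ == "__main__":
    array: list[str] = []
    for cassette in ["cassette_A", "cassette_B", "cassette_C"]:
        array = capture(array, cassette)
    print(array)                  # ['cassette_C', 'cassette_B', 'cassette_A']
    print(reshuffle(array, 2))    # ['cassette_A', 'cassette_C', 'cassette_B']
```

Under these assumptions the most recently captured cassette is always the one nearest attI, and a reshuffle simply moves one cassette to that position.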


However, cassettes are not always found associated 
with an integron. The IntI1 integrase and possibly 
other IntI integrases can, at low frequency, catalyze 
recombination between a 59-be site (a primary recom- 
bination site) and a secondary site, and this reaction 
can lead to the integration of a gene cassette at a 
location other than an attI site. The secondary sites
conform to a simple consensus, G(A/T)T, and this
potentially permits incorporation of new genes at 
many different positions. Though this reaction is 
much less efficient than recombination between two 
primary sites, it may be quite important in the evolu- 
tion of bacterial chromosomes. However, when a cas- 
sette is incorporated at a secondary site, the gene it 
contains can only be expressed if the cassette includes 
a promoter or an appropriately oriented promoter is 
located upstream. 
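A quick count shows why so short a consensus permits integration at many positions. Assuming, as an illustrative simplification, that G(A/T)T on one strand is the whole requirement (the low efficiency of the reaction and any flanking preferences are ignored), a random sequence with equal base frequencies should contain about one site every 32 bp, because the chance of a match at a given position is 1/4 × 1/2 × 1/4. The names below are hypothetical.

```python
import random
import re

# Zero-width lookahead so that every matching start position is counted.
SECONDARY_SITE = re.compile(r"(?=G[AT]T)")


def count_secondary_sites(seq: str) -> int:
    """Count positions on one strand that match the G(A/T)T consensus."""
    return len(SECONDARY_SITE.findall(seq))


if __name__ == "__main__":
    rng = random.Random(0)
    length = 64_000
    genome = "".join(rng.choice("ACGT") for _ in range(length))
    print(count_secondary_sites(genome), "sites; about", length // 32, "expected")
```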


Further Reading 

Hall RM and Collis CM (1995) Mobile gene cassettes and
integrons: capture and spread of genes by site-specific
recombination. Molecular Microbiology 15: 593-600.
Gene conversion is an event in which a gene in a
heterozygous diploid appears to have taken on the 
identity of its allele. It is distinguished conceptually 
and operationally from crossing-over by its nonrecip- 
rocal nature. This nonreciprocality can be convin- 
cingly demonstrated for conversion that occurs during 
meioses in which all four products of individual acts of 
meiosis can be recovered, as in many fungi. When the 
heterozygous diploid A/a undergoes meiosis, most of 
the resulting tetrads are composed of two A cells and 
two a cells (normal 2:2 segregation). Gene conversion 
is manifest as occasional violations of this Mendelian 
rule, in which ratios 3A:1a or 3a:1A (3:1 segregation) 
occur (Figure 1). 


Related Aberrant Segregation Ratios 


Other variations in the segregation ratio can be seen in 
fungi that have eight spores due to a postmeiotic 
mitosis. These variations can be seen also in four- 
spored fungi when care is taken to determine separ- 
ately the genotypes of the chromosomes carried in the 
daughter cells of the first mitosis following meiosis. 
The common variations are 5:3 (5A:3a or 5a:3A)
or aberrant 4:4. In 5A:3a tetrads two of the haploid 
cells are A on both strands, one is a on both strands, 
and one is heteroduplex A/a. These tetrads are ‘half- 
conversion’ tetrads by virtue of having an allele ratio 
which is half-way between the normal 2:2 ratio and 
the 3:1 ratio of (full) conversion. In aberrant 4:4 
tetrads, one haploid cell is A on both strands, one is a 
on both strands, and two are A/a heteroduplexes. 
In both 5:3 and aberrant 4:4 tetrads, segregation of 
alleles is completed only at the first postmeiotic 
mitosis. Accordingly, such tetrads are often called
postmeiotic segregation (PMS) tetrads. Tetrads with
segregation other than normal 2:2 are collectively
called aberrant segregation tetrads. The rarity of
tetrads whose ratios are more extreme than 6:2 implies
that two of the four meiotic chromatids are
uninvolved in any given interaction that leads to aberrant
segregation.

[Figure 1: a diploid cell m/+ undergoes meiosis; normal segregation yields haploid cells m, m, +, +, whereas gene conversion yields m, m, m, + or m, +, +, +.]
Figure 1 Gene conversion in meiosis. When a
heterozygous diploid cell (genotype m/+) undergoes
meiosis, the usual outcome is two haploid cells of
genotype m and two of genotype +. Occasionally,
however, this normal, Mendelian segregation is
disturbed, with three of the haploid cells being of one
genotype and one of the other. This aberrant
segregation is a manifestation of meiotic gene conversion.
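The segregation classes defined above can be expressed as a small classifier. In this sketch the function name and input format are hypothetical: each of the four meiotic products is given as the alleles on its two DNA strands, so an A/a heteroduplex is ("A", "a"), and the ratio counts alleles over all eight strands, as an eight-spored ascus would reveal them. Patterns that fit none of the classes in the text (including the rare ratios more extreme than 6:2) fall through to a catch-all.

```python
def classify_tetrad(products: list[tuple[str, str]]) -> str:
    """Classify a tetrad from the alleles on both strands of each product."""
    if len(products) != 4:
        raise ValueError("a tetrad has four products")
    strands = [allele for pair in products for allele in pair]
    if set(strands) - {"A", "a"}:
        raise ValueError("alleles must be 'A' or 'a'")
    n_upper = strands.count("A")
    ratio = f"{n_upper}A:{8 - n_upper}a"
    # A product whose two strands carry different alleles is a heteroduplex.
    heteroduplexes = sum(1 for first, second in products if first != second)
    if heteroduplexes == 0 and n_upper == 4:
        return f"normal 2:2 ({ratio})"
    if heteroduplexes == 0 and n_upper in (2, 6):
        return f"6:2 gene conversion ({ratio})"
    if heteroduplexes == 1 and n_upper in (3, 5):
        return f"5:3 half conversion, PMS ({ratio})"
    if heteroduplexes == 2 and n_upper == 4:
        return f"aberrant 4:4, PMS ({ratio})"
    return f"other aberrant segregation ({ratio})"


if __name__ == "__main__":
    AA, Aa, aa = ("A", "A"), ("A", "a"), ("a", "a")
    for tetrad in ([AA, AA, aa, aa], [AA, AA, AA, aa],
                   [AA, AA, Aa, aa], [AA, Aa, Aa, aa]):
        print(classify_tetrad(tetrad))
```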


Conversion Results in Nonreciprocal 
Recombination 


In two-factor crosses, aberrant segregation at one site 
can occur separately from that at the other site;
aberrant segregation is local. When the aberrant
segregation is 3:1, such tetrads produced from the diploid
AB/ab will be of four kinds depending on which site 
converts and in which direction: (AB AB Ab ab) and 
(AB aB ab ab) have both been converted at the site 
marked by the alternatives A and a; (AB AB aB ab) and 
(AB Ab ab ab) have been converted at the site marked 
by the alternatives B and b. Each tetrad contains a 
recombinant spore, Ab or aB, but not a pair of com- 
plementary recombinants. Thus, conversion produces 
recombinants nonreciprocally. 

In two-factor crosses, the other kinds of aberrant 
tetrads contain recombinants, too. In a 5a:3A tetrad, 
the eight strands of DNA, paired as they would be in a
tetrad, are AB AB, AB aB, ab ab, ab ab. In a tetrad that 
is aberrant 4:4 at the A site, the eight strands are AB 
AB, AB aB, Ab ab, ab ab. Thus, 5:3 tetrads are recom- 
binant (on one polynucleotide strand) nonrecipro- 
cally; aberrant 4:4 are recombinant (on two strands) 
reciprocally. Coaberrant segregation in PMS tetrads is 
common for markers within a few hundred base pairs 
of each other. 
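The distinction between reciprocal and nonreciprocal recombinants can be checked mechanically on the strand lists given above. This sketch (the function name is hypothetical) treats AB and ab as the parental combinations from the AB/ab diploid and asks whether the Ab and aB recombinants occur in equal numbers.

```python
from collections import Counter


def recombinant_pattern(genotypes: list[str]) -> str:
    """Classify the recombinants among strands or spores from an AB/ab
    cross as absent, reciprocal (Ab and aB equally frequent) or
    nonreciprocal."""
    counts = Counter(genotypes)
    n_Ab, n_aB = counts["Ab"], counts["aB"]
    if n_Ab == n_aB == 0:
        return "no recombinants"
    if n_Ab == n_aB:
        return "reciprocal"
    return "nonreciprocal"


if __name__ == "__main__":
    five_three = "AB AB AB aB ab ab ab ab".split()        # 5a:3A tetrad
    aberrant_four = "AB AB AB aB Ab ab ab ab".split()     # aberrant 4:4
    print(recombinant_pattern(five_three))      # nonreciprocal
    print(recombinant_pattern(aberrant_four))   # reciprocal
```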


Conversion Gradient 


The frequency of conversion (half and/or full) for 
markers within a given gene varies with the position 
of the marker. These rates may vary monotonically 
from one end of the gene to the other (conversion 
gradient). 


Aberrant Segregation and Crossing- 
Over 


In three-factor crosses with linked markers, aberrant 
segregation at the central site is accompanied by 
crossing-over of the flanking markers (as long as they 
lie outside the aberrant segregation tract) about half 
the time. Conversion gradients and the correlation of 
aberrant segregation with crossing-over have motiv- 
ated models for meiotic recombination. 


The Double-Strand-Break-Repair Model 
for Conversion and Crossing-Over 


A double-strand-break-repair (DSBR) model has 
enjoyed support both from genetic analysis of tetrads 
and from physical analysis of meiotic DNA. In the 
DSBR model as currently understood (Figure 2), 
one chromatid is cut at a place that, for reasons of 
chromatin structure, is sensitive to a meiosis-specific 
endonuclease. In Saccharomyces cerevisiae, these places 
are often at promoters of transcription. The 5’-ended 
strands on each side of the break are resected by an 
exonuclease. The resulting 3’-ended single strands
bind a protein related to RecA of Escherichia coli, 
which enables them to invade a chromatid of the 
homolog. The invading ends form hybrid DNA with 
the complementary strand of the intact homolog dis- 
placing the resident strand. Heteroduplexes (hybrid 
DNA with point(s) of noncomplementarity between 
the two strands) may activate the mismatch repair 
system, resulting in degradation of some of the invad- 
ing strand. DNA synthesis, primed by the invading 3’ 
ends and using the intact homolog as template, 
replaces DNA lost by the initial resection and by 
mismatch repair. These rounds of DNA destruction 
and replacement can result in 5:3 and 6:2 segregations 
(Figure 2). As a result of synthesis and covalent com- 
pletion of both invading strands, the two participants 
are held together in a joint molecule by two Holliday 
junctions. The junctions may be moved outwards by 
the action of proteins like RuvA and RuvB of E. coli. 
Resolution of the joint molecule to give two duplexes 
can result in either crossing-over or noncrossing-over. 
Crossing-over will occur when one Holliday junction 
is cut (by an enzyme, resolvase) on one pair of strands 
and the other is cut on the other strands (‘vertically’ 
and ‘horizontally’ in Figure 2, left). If both junctions 
are cut on the same strands (both ‘vertically’ or both ‘horizontally’), noncrossing-over will result (Figure 2, right). Noncrossovers will also result if only one junction is cut by resolvase and the other junction slides to the still-open site of the first. A topoisomerase could also effect this alternative noncrossover resolution of the joint molecule. Strand interruptions introduced by resolvase may direct a second round of mismatch repair.
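The resolvase rule above (junctions cut on different strand pairs give a crossover, on the same strand pair a noncrossover) can be written as a small Python sketch. The orientation labels follow Figure 2; the `simulate` helper and its assumption that each junction is cut independently and with equal probability in either orientation are hypothetical simplifications. Under that assumption about half of all resolutions are crossovers, in line with the "about half the time" figure for flanking-marker exchange, although the ARG4 data discussed below suggest that real noncrossovers mostly arise by the alternative route.

```python
import random
from collections import Counter

# Each Holliday junction can be cut by resolvase on one of its two pairs of
# strands, labelled here as in Figure 2: "vertical" or "horizontal".
ORIENTATIONS = ("vertical", "horizontal")


def resolve(junction_1: str, junction_2: str) -> str:
    """Outcome of cutting both Holliday junctions of one joint molecule.

    Different strand pairs at the two junctions -> crossover of flanking markers;
    the same strand pair at both junctions -> noncrossover.
    """
    for cut in (junction_1, junction_2):
        if cut not in ORIENTATIONS:
            raise ValueError(f"unknown cut orientation: {cut!r}")
    return "crossover" if junction_1 != junction_2 else "noncrossover"


def simulate(n_molecules: int, seed: int = 0) -> Counter:
    """Resolve n joint molecules, assuming (hypothetically) that each junction
    is cut independently and with equal probability in either orientation."""
    rng = random.Random(seed)
    return Counter(
        resolve(rng.choice(ORIENTATIONS), rng.choice(ORIENTATIONS))
        for _ in range(n_molecules)
    )


if __name__ == "__main__":
    for a in ORIENTATIONS:
        for b in ORIENTATIONS:
            print(f"{a:>10} + {b:<10} -> {resolve(a, b)}")
    print(simulate(10_000))
```

The sketch deliberately ignores the alternative noncrossover route (cutting one junction and sliding the other, or topoisomerase action), which would simply add noncrossovers that bypass `resolve`.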

In S. cerevisiae, genetic support for the DSBR 
model includes the following: (1) in diploids hetero- 
zygous for an endonuclease-sensitive site, the chro- 
matid that carries the active site loses markers near 
that site; (2) a conversion gradient is demonstrable on 
both sides of the initiating site; (3) when conversion is 
accompanied by crossing-over of flanking markers, 
the exchange effecting the crossing-over cannot be 
located uniquely to one side or the other of the con- 
verted site; (4) in the absence of the major mismatch 
repair system, or when the markers used escape detec- 
tion by that system, the frequency of 5:3 tetrads rises 
at the expense of 6:2 tetrads; and (5) aberrant 4:4 
tetrads are seen for markers at the low end of the 
conversion gradient. 
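The segregation classes named in points (4) and (5) can be made concrete with a short Python sketch. It assumes an eight-spore ascus in which each of the four chromatids yields a pair of sister spores after a postmeiotic mitosis (in four-spored S. cerevisiae asci the same information is read from sectored spore colonies), and it uses hypothetical allele labels A and a. The second function illustrates why losing mismatch repair trades 6:2 for 5:3 tetrads: a heteroduplex that would have been corrected toward the other parent's allele instead persists and segregates at the postmeiotic mitosis.

```python
from typing import List, Optional, Sequence, Tuple

# The two sister spores that one chromatid yields after the postmeiotic mitosis.
SporePair = Tuple[str, str]


def classify_octad(octad: Sequence[SporePair], allele: str = "A") -> str:
    """Label the segregation pattern of one marker in an eight-spore ascus."""
    if len(octad) != 4:
        raise ValueError("an octad consists of four spore pairs")
    n = sum(spore == allele for pair in octad for spore in pair)
    majority = max(n, 8 - n)
    # A pair whose sisters differ came from a chromatid that still carried a
    # heteroduplex at the marker (postmeiotic segregation).
    mixed_pairs = sum(first != second for first, second in octad)
    if majority == 4:
        return "normal 4:4" if mixed_pairs == 0 else "aberrant 4:4"
    return f"{majority}:{8 - majority}"


def heteroduplex_octad(repaired_to: Optional[str]) -> List[SporePair]:
    """Octad from an A, A, a, a tetrad whose last chromatid carries an A/a
    heteroduplex at the marker.

    repaired_to: "A" or "a" if mismatch repair corrects the mismatch toward
    that allele; None if the heteroduplex escapes repair.
    """
    if repaired_to is None:
        last: SporePair = ("A", "a")
    elif repaired_to in ("A", "a"):
        last = (repaired_to, repaired_to)
    else:
        raise ValueError("repaired_to must be 'A', 'a', or None")
    return [("A", "A"), ("A", "A"), ("a", "a"), last]


if __name__ == "__main__":
    for outcome in ("A", "a", None):
        print(outcome, "->", classify_octad(heteroduplex_octad(outcome)))
    # Heteroduplexes on two chromatids, both unrepaired.
    print(classify_octad([("A", "A"), ("A", "a"), ("a", "A"), ("a", "a")]))
```

Repair toward A gives 6:2 (full conversion), repair toward a restores a normal 4:4, and no repair gives 5:3; two unrepaired heteroduplexes on different chromatids give an aberrant 4:4.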

In the ARG4 gene of S. cerevisiae the noncrossover 
resolution of the joint molecule intermediate appears 
to occur only rarely by cutting of the two Holliday 
junctions. In tetrads that segregate 5:3 for markers 
close to and on opposite sides of the initiation site, 
most observed double heteroduplexes are in the same 
chromatid, in the configuration shown in Figure 2 for 
the alternative resolution (Gilbertson and Stahl, 1996). 
Furthermore, 5:3 tetrads of the type shown on the 
right in Figure 2 are so rare in yeast as to be called 
‘aberrant 5:3s.’ They are recognized as noncrossovers 
manifesting quasi-reciprocal exchange of a short seg- 
ment between the two participating chromatids. 

Physical analyses of isolated meiotic yeast DNA 
have supported the DSBR model: (1) double-strand 
breaks occur at hot spots for meiotic recombination 
at rates commensurate with the rates of aberrant 
segregation of markers near those hot spots; (2) the 
5’-ended strands on either side of a double-strand 
break are eroded; (3) joint molecules, in which hom- 
ologous duplexes are held together by a pair of Hol- 
liday junctions, arise near recombination hotspots 
(Schwacha and Kleckner, 1995); and (4) mutations 
that block the progression of physically monitored 
events block meiotic recombination. 

The DSBR model offers the opportunity for full conversion that is independent of mismatch repair of heteroduplexes. If the 3’-ended strand is resected as well as the 5’-ended strand, a double-strand gap arises. The repair of this gap using the homolog as template will result in a full conversion tetrad for any marker that was in the gap. The failure to replace all 6:2 tetrads with 5:3 tetrads upon removal of mismatch-repair systems suggests such gap repair. When the marker examined is a deletion of the initiation site, the only conversions seen are full conversions, which favor the deletion and which occur independently of mismatch repair.

Figure 2 Double-strand-break-repair model. A duplex (a) is cut on both strands at a hot spot (b). Resection of 5’ ends creates 3’ overhangs (c). These single strands of DNA bind proteins like RecA of Escherichia coli and invade the homolog, creating regions of hybrid DNA (d). (Alternatively, invasion by one end, followed by DNA synthesis primed by that end, displaces a strand from the intact chromatid, which can then anneal with the resected end on the other side of the initiating break. The resulting double Holliday junction intermediate is the same for either scenario.) The Holliday junctions (where the strands swap partners) may be pushed outward (branch migration) (e). Mismatch repair of heteroduplexes (shown only on the right side of the initial break) removes invading DNA from the break site to a mismatch. DNA lost by resection is resynthesized (broken lines) using the intact homolog as template, creating a joint molecule (f). Resolution of the joint molecule results in crossing-over if one junction is cut vertically and the other horizontally (g). Noncrossovers result if both junctions are resolved in the same way (h) or if the two participating duplexes are separated from each other by an alternative route that could involve a topoisomerase or could result from the cutting of one junction followed by sliding of the other (i). In the ARG4 gene of Saccharomyces cerevisiae, the rarity of 5:3 segregations of the type shown on the bottom-right (h) and the occurrence of tetrads like those shown on the bottom-middle (i) imply that cutting of the two Holliday junctions is rarely the route to the noncrossover resolution of joint molecules. (Modified from Szostak et al., 1983.)

Figure 3 Single-strand-gap-repair model. Premeiotic replication of chromosomes (a) results in occasional single-strand gaps in one daughter or the other (b). RecA-like protein binds to this single-stranded DNA and catalyzes interaction with a chromatid of the homolog (c). The invading 3’ end primes DNA synthesis using the intact homolog as template (d). The resulting joint molecule may contain two Holliday junctions (i), or only one (e). Resolution of the joint molecule by cutting the single Holliday junction (f) may yield either crossover (g) or noncrossover products (h). Examples of segregation ratios prior to and consequent to mismatch repair are shown. If two Holliday junctions are formed, alternative resolution to form noncrossovers, effected by topoisomerase or by cutting of only one junction, would preserve the homolog genetically intact (j), in keeping with most observations of noncrossovers in Saccharomyces cerevisiae. (Modified from Kuzminov, 1996.)


Conversion Initiated by Single-Strand Gaps?


Hot spot recombination is demonstrably due to hot spots for meiosis-specific double-strand cuts and accounts for a major fraction of meiotic recombination in S. cerevisiae. The possibility of other routes
to conversion, perhaps with attendant crossing-over, 
remains open. A prime candidate for another route is 
single-strand gap repair. When DNA is damaged on 
one strand, replication can skip across the impedi- 
ment, producing two duplexes, one of which is gapped 
in its daughter strand. In mitotic cells, such a gap can 
be repaired, and the impediment removed, with help 
from the sister duplex (West et al., 1981). In meiotic 
cells, a chromatid left gapped on one strand following 
premeiotic DNA replication may be repaired with the 
aid of the homolog, rather than the sister chromatid 
(Figure 3). Such events could be responsible for
crossing-over and conversion that may be unaccounted 
for by hot-spot-initiated recombination. 


Meiotic Conversion Separable from Crossing-Over


Factors that alter the fraction of conversions that are 
accompanied by crossing-over, without altering the 
total rate of conversion, are understood to be operat- 
ing on the resolution of joint molecules. However, 
the existence of factors that alter the frequency of 
meiotic conversion without changing the frequency 
of crossing-over suggests that some conversions are 
formed by a route that does not lead to crossing-over. 
Single-strand gap repair (Figure 3) could be such a 
route if resolution of joint molecules so formed were 
constrained to noncrossover modes. It is plausible, 
however, that some treatments alter conversion rates, 
but not crossover rates, by altering the lengths of 
heteroduplex DNA and/or the probabilities of mis- 
match repair in joint molecule intermediates like those 
of Figure 2 without altering the rate of formation of 
such joint molecules or the mode of their resolution. 


Common Misuse of ‘Conversion’ 


‘Conversion’ is often used to denote just those meiotic 
conversion events that are not accompanied by cross- 
ing-over of flanking DNA. This use tends to create the 
false impression that meiotic conversion and crossing- 
over are mutually exclusive events. 


Other Occurrences of Conversion 


‘Conversion’ is widely used to denote any recombination event that appears to involve nonreciprocal exchange of a segment of DNA, especially when those events are unaccompanied by exchange of flanking markers. Transformation of cells by introduced fragments of genomic DNA qualifies for this use of ‘conversion,’ as does phage-mediated transduction.
In vegetative or somatic cells that contain reverse tran- 
scriptase, a DNA copy of an mRNA molecule can 
‘convert’ a homolog. Conversion (the nonreciprocal 
change of a gene by its homolog) may be responsible 
for maintaining sequence identity between multicopy 
genes. Some of these conversions may occur by the 
same mechanisms as does meiotic conversion. 

Some conversions serve to alter gene expression. Among such events are mating-type switching in some yeasts, surface antigen changes in trypanosomes, and the formation of functionally diverse immunoglobulin genes in chickens. The first two systems involve the nonreciprocal transfer of information from a silent locus to an expression locus and are not normally accompanied by crossing-over.
Gene dosage is the number of copies of a particular gene locus in the genome of a cell. In most cells, this is either one or two.


See also: Dosage Compensation 
Gene duplication occurs, usually rarely, within the genomes of all types of organisms. As the name implies, one or more additional copies of a preexisting gene are generated. The new copy may reside adjacent to the original (tandem duplication) or be inserted at a novel chromosomal location (dispersed duplication). The duplication process may be reiterated a number of times, leading to the production of gene families; the history of various family members can often be deduced by sequence comparisons.

Tandem duplications, i.e., the creation of a new copy of a gene right next to the old copy on a chromosome, probably occur by unequal crossing-over between homologous chromosomes or sister chromatids (Figure 1). If the whole gene and the regulatory sequences that control its expression are duplicated, the new copy will be expressed in the same way as the old one. If one of the two copies accumulates mutations that inactivate the gene product, this will have no consequence for the organism, since the other copy will provide the necessary function. Inactivated gene copies are called pseudogenes. In rare circumstances, mutations in a gene copy will lead to a new function or a new pattern of expression for the gene product. In this fashion, the gene repertoire is augmented without the loss of preexisting functions.

Figure 1 A hypothetical gene duplication event. In step 1, two homologous or sister chromatids undergo unequal crossing-over, as indicated by the x in the left diagram. This creates one chromosome with a deletion of the gene indicated by the open rectangle, and another with a duplication of that gene. In step 2, one of the copies of the duplicated gene is modified by mutation, indicated by the shading. This modification may inactivate the second copy, or it may alter its function or pattern of expression. The circle represents a centromere.
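Step 1 of the Figure 1 scenario can be mimicked with a minimal Python sketch that treats each chromatid as a list of gene names. The gene labels ("cen" for the centromere, "A", "G", "B") and the breakpoint indices are hypothetical; the point is only that when two chromatids pair out of register, the exchange points fall at different gene positions, so one product gains the segment between them and the reciprocal product loses it.

```python
from typing import List, Tuple

Chromatid = List[str]


def unequal_crossover(c1: Chromatid, c2: Chromatid, i: int, j: int) -> Tuple[Chromatid, Chromatid]:
    """Exchange the ends of two chromatids at breakpoint i in c1 and j in c2.

    In an equal (in-register) exchange i == j. When the chromatids pair out
    of register the breakpoints differ, so one product gains the segment
    between them and the other loses it.
    """
    product_1 = c1[:i] + c2[j:]
    product_2 = c2[:j] + c1[i:]
    return product_1, product_2


if __name__ == "__main__":
    # "cen" stands for the centromere; "A", "G", "B" are hypothetical genes.
    chromatid = ["cen", "A", "G", "B"]
    with_dup, with_del = unequal_crossover(chromatid, chromatid, 3, 2)
    print(with_dup)  # gene G is now present twice
    print(with_del)  # gene G is missing
```

Step 2 of the figure (mutation of one copy) would then act on only one of the two "G" entries, leaving the other to supply the original function.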

In the genomes of higher organisms, there are many examples of gene families that have arisen by gene duplication. For instance, the five human genes for the various β chains of hemoglobin that are expressed at different times during development are located in a cluster on chromosome 11 (Figure 2). There is also one pseudogene in this cluster. The four genes for the α chains of hemoglobin reside in a separate cluster on chromosome 16, where there are also three pseudogenes. Examining the sequence relationships among these genes, we can deduce that an ancient duplication event created separate α and β genes, and they were subsequently dispersed. Then, each of these was amplified by several tandem duplication events. Accumulated mutations created the current distinctions among the family members.

Figure 2 Depiction of the human hemoglobin gene family. In the α globin cluster, α1 and α2 are expressed in fetal and adult stages, ζ2 in early embryos. ψζ1, ψα2, and ψα1 are pseudogenes that are no longer functional. The role of the θ gene is not known. In the β globin cluster, ε is expressed in embryos, Gγ and Aγ in the fetal stage, while β and δ are the major and minor adult forms, respectively. ψβ1 is a pseudogene.

Dispersed duplications are sometimes the result of making a DNA copy of a messenger RNA and inserting that copy at a novel chromosomal site. Typically these copies will not be expressed because they do not have the appropriate regulatory sequences around them at the new location. Because the organism does not rely on the new copy for a functional gene product, mutations that accumulate in it will be neutral, i.e., there is no selection against them, and most such duplicates exist as pseudogenes. They can be recognized because, like the mRNAs that are their progenitors, they lack introns, and the sequences surrounding them in the chromosome bear no resemblance to sequences around the real gene from which they were derived.

Sometimes very large segments of a chromosome 
are duplicated at once. This has been observed in 
bacteria, where as much as 25% of the chromosome 
may be duplicated in a single event. Such large tandem 
duplications are relatively unstable because the entire 
duplicated segment is a target for elimination by 
homologous recombination. 

In some instances, gene amplification by tandem 
duplication can give cells a growth advantage. This has 
been observed in some human tumors, where duplica- 
tion of a gene whose product is involved in promoting 
cell proliferation can overcome normal cell cycle regu- 
lation and lead to uncontrolled growth. An example is 
amplification of the N-myc gene in some cancers of 
the nervous system. The amplified copies are arranged 
in tandem and they may be located at the normal 
N-myc chromosomal site or spun off as extrachromo- 
somal elements called double-minute chromosomes. 
These amplification events usually occur in somatic 
cells during the life of the organism and they are not 
passed on to succeeding generations as stable gene 
families. 


Further Reading 

Lewin B (1997) Genes VI. New York: Oxford University Press. 

Romero D and Palacios R (1997) Gene amplification and 
genome plasticity in prokaryotes. Annual Review of Genetics 
31: 91-111. 


See also: Double-Minute Chromosomes; 
Evolution of Gene Families; Gene Amplification 
The genetic material contains the information necessary for an organism to develop, function, and reproduce, but this information must be expressed for any activity, even maintenance, to be carried out. Gene expression can therefore be considered to encompass all the processes that are necessary to produce a gene product from a gene. One can also include all regulatory steps: those necessary to synthesize the gene product in appropriate amounts at an appropriate time and those involved in regulating the activity of the gene product. Because the topic is so broad, the following description is not meant to be an exhaustive account of gene expression, but an overview of some of the processes involved. Many of these processes are explored in more detail in articles elsewhere in this volume.


Transcription 


The genes of all cellular organisms are composed 
of double-stranded DNA (some viruses have single- 
stranded DNA genomes and others even RNA 
genomes) and the first step in their expression is tran- 
scription (see Transcription). Transcription involves 
using one of the two strands of DNA as a template 
to make an RNA copy by an enzyme called RNA 
polymerase (see RNA Polymerase). All RNA poly- 
merases synthesize an RNA chain from the 5’ end to 
the 3’ end while reading the template strand of the 
DNA in the 3’ to 5’ direction. The RNA molecules are 
synthesized from specific starting sites on the DNA 
and also terminate at specific sites. The sites where 
RNA polymerase (using accessory factors) recognizes 
the beginning of a transcriptional unit are termed 
promoters (see Promoters). In higher organisms, the 
unit of transcription is almost always a single gene. 
However, in prokaryotes the transcriptional unit may 
contain several contiguous genes. These genes are often 
related in function and/or belong to one pathway.

Transcription is a target of several regulatory
mechanisms. These can serve to repress or activate 
transcription, or lead to premature termination. One 
common mechanism in bacteria is the binding of 
a repressor protein to a specific region of the DNA 
near the promoter which then blocks transcription 
(see Repressor). The sequence to which the repressor 
protein binds is termed an ‘operator’ (see Operators), 
a term which has given its name to the transcriptional 
unit called an ‘operon’ (see Operon). In bacteria an 
operon may contain one or more genes, all under the 
control of the single operator. Another mechanism for 
regulating gene expression is the binding of a regula- 
tory protein to the DNA which activates transcrip- 
tion. Such positive control is widespread in eukaryotic 
genes. It is not uncommon for genes to be under more 
than one form of regulation, nor is it uncommon, in 
bacteria, for some regulatory proteins to be both repressors and activators for different genes. Attenuation is another form of transcriptional regulation, but in this case the transcript is terminated early in elongation (see Attenuation). The mechanism by which attenuation takes place can vary quite dramatically between different organisms. Also, not all regulatory molecules are proteins; regulatory RNA can also play a role (see Regulatory RNA).
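The template-strand rule described at the start of this section (the polymerase reads the template 3’ to 5’ while building RNA 5’ to 3’) can be sketched in a few lines of Python. This is an illustration of base pairing only, under the assumption that the template is written in the usual 5’ to 3’ convention; promoters, termination, and the accessory factors are ignored, and the example sequences are hypothetical.

```python
# Base pairing used by RNA polymerase: DNA template base -> ribonucleotide added.
TEMPLATE_TO_RNA = {"A": "U", "T": "A", "G": "C", "C": "G"}


def transcribe(template_5_to_3: str) -> str:
    """Return the RNA (written 5'->3') copied from a DNA template strand given 5'->3'.

    The polymerase reads the template 3'->5' and extends the RNA 5'->3', so the
    template is walked from its 3' end, adding the complementary base each step.
    """
    rna = []
    for base in reversed(template_5_to_3.upper()):
        rna.append(TEMPLATE_TO_RNA[base])
    return "".join(rna)


if __name__ == "__main__":
    coding = "ATGGCCAAATTTTAA"      # hypothetical nontemplate (coding) strand, 5'->3'
    template = "TTAAAATTTGGCCAT"    # its complementary template strand, 5'->3'
    print(transcribe(template))     # same sequence as the coding strand, with U for T
```

Because the transcript is complementary to the template, it has the same sequence as the nontemplate (coding) strand, with U in place of T.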

The majority of genes encode proteins, and the 
RNA transcript must then be used as (or processed to 
become) a messenger RNA (mRNA). As mentioned 
above, eukaryotic transcriptional units are almost 
always single genes, but some transcripts from 
protein-encoding genes (particularly from animals) 
can be very long (more than one million bases). The 
great length of these transcripts results from the fact 
that the protein-encoding genes of eukaryotes often 
have several introns (noncoding sequences) inter- 
spersed within the coding sequences (exons), and these 
are transcribed as a unit. Such genes are sometimes 
referred to as ‘split genes’ (see Introns and Exons; Split 
Genes). In genes containing introns, then, one part 
of gene expression is the processing of the transcript 
to remove these introns. Indeed, in eukaryotes 
most transcripts from protein-encoding genes need 
three distinct processing steps to be converted into 
mRNA: capping, splicing, and tailing. Capping 
involves adding a modified guanosine to the 5’ end of 
the pre-mRNA. It is this cap that allows the RNA to 
be recognized by the translational machinery of the 
cell as an mRNA. The RNA splicing process removes 
introns and joins the exons together. Tailing involves 
cutting the transcript at a specific site downstream of 
the region encoding the protein and polyadenylating 
the newly created 3’ end. 

These processing events are coupled to transcrip- 
tion. Capping takes place very soon after transcription 
has started. At least in the higher eukaryotes, where 
genes may have, in the extreme, many large introns, 
splicing is also coupled to transcription. The splicing 
process in eukaryotic pre-mRNA is complex and 
involves ribonucleoprotein particles called ‘spliceosomes’ that contain various protein factors and small nuclear RNA molecules packaged as small nuclear ribonucleoproteins (snRNPs or ‘snurps’; see Pre-mRNA Splicing). Splicing involves recognition of specific sites on the RNA and very precise cleavage and ligation of the RNA (since an error of a single nucleotide will result in a frameshifted message). Splicing is also regulated, and some genes have transcripts that can be spliced in more than one way (alternative splicing) to yield more than one protein from a single gene. Alternative splicing pathways are particularly prevalent in the transcripts from genomes of small animal viruses but occur in other genomes also.
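Why splicing must be precise to the nucleotide can be shown with a minimal Python sketch. The pre-mRNA, its exon coordinates, and the GU...AG intron are hypothetical; the sketch simply concatenates exons and then reads triplets from the first base, so misplacing one splice site by a single nucleotide shifts every downstream codon.

```python
from typing import List, Tuple


def splice(pre_mrna: str, exons: List[Tuple[int, int]]) -> str:
    """Join exons given as 0-based, end-exclusive (start, end) coordinates in
    5'->3' order; everything between them (the introns) is discarded."""
    return "".join(pre_mrna[start:end] for start, end in exons)


def codons(mrna: str) -> List[str]:
    """Split an mRNA into complete triplets, reading from its first base."""
    return [mrna[k:k + 3] for k in range(0, len(mrna) - 2, 3)]


if __name__ == "__main__":
    # Hypothetical pre-mRNA: exon 1 | intron (GU...AG) | exon 2
    pre = "AUGGCC" + "GUAAGUUUUCAG" + "AAAUGA"
    correct = splice(pre, [(0, 6), (18, 24)])
    off_by_one = splice(pre, [(0, 6), (17, 24)])  # 3' splice site misplaced by 1 nt
    print(codons(correct))     # ends with the UGA stop codon
    print(codons(off_by_one))  # downstream codons are read in a shifted frame
```

In the misspliced message the intron's final G is retained, the UGA stop codon is no longer in frame, and every codon after the junction changes.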

The transcripts of protein-encoding genes from prokaryotes do not require processing to be functional; therefore, the transcripts of these genes are mRNAs. Also, as mentioned above, some transcriptional units in prokaryotes contain information from several contiguous genes. The mRNAs produced from such units are said to be ‘polycistronic,’ in contrast to ‘monocistronic’ mRNA, which carries information for only one gene product (see Polycistronic mRNA). In Escherichia coli over 70% of the mRNA is monocistronic and about 30% is polycistronic (with about 6% containing the information from four or more genes).

For some genes the final product is an RNA mol- 
ecule, but even here processing is involved, and in this 
case processing occurs in both prokaryotes and eukar- 
yotes. (Therefore, the only major class of RNA that 
can be used directly as transcribed is mRNA from 
prokaryotes.) The only genes we shall discuss here 
whose final product is RNA are genes encoding trans- 
fer RNA (tRNA) and genes encoding ribosomal RNA 
(rRNA). In both prokaryotes and eukaryotes, some of 
both types of genes may contain introns. Although the 
process by which these introns are removed involves 
excising the intron and ligating the exons, and is called 
‘splicing,’ the machinery which performs these reac- 
tions is not related to that which splices eukaryotic 
mRNA (see Introns and Exons). Some of the introns 
in rRNA and tRNA are self-splicing (and self-splicing introns are also known in a few bacteriophage mRNAs). Self-splicing introns are widely found in nature, and they are the only type of intron found in bacteria and bacteriophages. In both eukaryotes and prokaryotes, tRNAs
and rRNAs are made initially as longer precursors and 
all must be cut to their final size. In addition, tRNAs 
contain many modified bases (and in some cases the 
final conserved CCA sequence at the 3’ end must be 
added enzymatically; see Transfer RNA (tRNA)). 
Modification of rRNAs is less extensive (see Ribo- 
somal RNA (rRNA)). 

All these RNAs, whether they are informational 
intermediates like mRNA or final products of gene 
expression like tRNA and rRNA, are used in the next 
step of gene expression: translation. 


Translation 


In prokaryotes, transcription and translation are 
coupled; that is, translation of an mRNA begins
before its synthesis is complete. There are even some 
regulatory mechanisms which take advantage of this 
coupling (see Attenuation). In eukaryotes, however, 
transcription (and processing) occurs in the nucleus 
and translation occurs in the cytoplasm. Therefore, in 
eukaryotes the mature mRNA must be transported 
to the cytoplasm. In all organisms, most mRNA is 
relatively unstable (in contrast to tRNA and
rRNA), but the stability of mRNAs from different 
genes can vary widely, and mRNA stability is another 
area where gene expression can be regulated. 

Translation itself is the process whereby protein is 
synthesized using the information in the mRNA as a 
template (see Translation). This process takes place on 
large ribonucleoprotein particles called ‘ribosomes’ 
(see Ribosomes) which contain many different pro- 
teins and one copy of each of the different rRNAs that 
the cells make. A large number of different protein 
factors as well as the cell’s tRNAs are involved in the 
overall process. However, it has been demonstrated 
that peptide bond formation (the linkage together of 
the amino acid residues) is catalyzed by the large sub- 
unit rRNA. 

In translation the ribosome (and attendant factors) 
must first recognize the start site of the information 
encoding the protein and then proceed down the 
mRNA (in the 5’ to 3’ direction) until a stop codon is 
reached and chain growth is terminated. The protein 
synthesized will have an amino acid sequence corresponding in identity and order to the three-base codons of the genetic code (see Genetic Code). Prokaryotic ribosomes bind to mRNA at a ribosome-binding site, which is a larger sequence than just the start codon (see Ribosome Binding Site). Eukaryotic ribosomes typically bind to the cap at the 5’ end of the mRNA and travel down the mRNA, initiating protein synthesis at the first possible start codon (AUG, methionine). This difference in the signals for ribosome binding and initiation allows prokaryotic ribosomes to use polycistronic mRNA, because translating the downstream cistrons requires initiation of protein synthesis at start codons toward the middle of such an mRNA molecule.
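The eukaryotic "first AUG" rule and codon-by-codon reading can be sketched in Python as follows. This is a deliberate simplification: the cap, the sequence context around the start codon, and the prokaryotic ribosome-binding site are all ignored, and the codon table is a hypothetical partial subset of the standard genetic code containing only the codons the example needs (a real translator would list all 64).

```python
# Partial standard genetic code: only the codons needed for the example below.
# A real translator needs all 64 codons.
CODON_TABLE = {
    "AUG": "M", "GCC": "A", "AAA": "K", "UUU": "F",
    "UAA": "*", "UAG": "*", "UGA": "*",  # * = stop
}


def translate_first_orf(mrna: str) -> str:
    """Scan from the 5' end to the first AUG, then read codons 5'->3' until a stop.

    Returns the one-letter amino acid sequence, or "" if there is no AUG.
    Raises KeyError for codons missing from the partial table above.
    """
    start = mrna.find("AUG")
    if start == -1:
        return ""
    protein = []
    for k in range(start, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[k:k + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


if __name__ == "__main__":
    print(translate_first_orf("GCUCAUGGCCAAAUUUUAAGC"))  # -> MAKF
```

Bases upstream of the first AUG (the 5’ leader) and downstream of the stop codon are never read, which mirrors why a eukaryotic ribosome scanning from the cap does not normally reach a second cistron.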

The codons on the mRNA are ‘read’ by anticodons 
on aminoacylated tRNAs, and peptide bonds are 
formed between consecutive amino acid residues carried by adjacent tRNAs. The protein being synthesized typically begins folding during synthesis and, when a stop codon is reached, the completed protein is hydrolyzed from the last tRNA and released; its tertiary structure may already be nearly formed. Translation is also a
step at which regulation can occur, and mechanisms of 
translational regulation are known that involve both 
regulatory RNA and regulatory protein. 


Posttranslational Steps in Gene Expression


There are several possible steps that can take place after translation which alter the activity of a protein (and therefore alter gene expression). Of course, many enzymes can be inhibited or activated by a number of noncovalent interactions with small molecules. However, many proteins are subject to covalent modifications which also affect their normal activity, location, or stability. Indeed, the majority of proteins undergo at least some modification, as the initiating methionine (or N-formyl-methionine) is removed. Some proteins require more extensive processing. For instance, trypsin is cleaved from an inactive precursor, and many peptide hormones such as insulin are cleaved in a more complicated pattern from larger molecules. There are also known examples in which the protein must be ‘spliced.’ Protein splicing involves cutting out intervening amino acid residues (called ‘inteins’) and ligating together those portions of the protein required for activity (‘exteins’). Although not as common as RNA splicing, protein splicing has been found in a number of organisms, both prokaryotic and eukaryotic (see Protein Splicing).

One other type of cleavage that may occur relates to 
proteins that are specifically transported into various 
membrane-bound cellular compartments or exported 
from the cell. Such proteins have a signal sequence, or 
leader peptide, at their N-terminus which is cleaved off 
by the cellular machinery during transport of the pro- 
tein across the membranes (see Leader Peptide). 

Finally there are many examples known of small 
molecules being specifically covalently attached to 
proteins and at least some of these have regulatory 
significance. Proteins from higher eukaryotes are 
often extensively glycosylated, but other modifica- 
tions also occur and modifications can also occur in 
prokaryotes. Some modifications, such as the protein phosphorylations involved in signal transduction (see Signal Transduction) and the adenylation of glutamine synthetase, are reversible. Some posttranslational covalent modifications convert a ‘standard’ amino acid inserted translationally into a modified amino acid, such as the iodotyrosine in the thyroid hormones. Although all of these processes are considered ‘posttranslational,’ at least some can occur cotranslationally.


See also: Attenuation; Autoregulation; Cistron; 
Derepression; Enhancers; Genetic Code; 
Induction of Transcription; Introns and Exons; 
Leader Peptide; Messenger RNA (mRNA); 
Operators; Operon; Polycistronic mRNA; 
Pre-mRNA Splicing; Promoters; Protein Splicing; 
Regulatory Genes; Regulatory RNA; Repressor; 
Ribosomal RNA (rRNA); Ribosome Binding Site; 
Ribosomes; RNA Polymerase; Signal 
Transduction; Split Genes; Transcription; 
Transfer RNA (tRNA); Translation; Translational 
Control 
Origins and Examples


Much of the functional DNA in the genome is organ- 
ized within gene families and hierarchies of gene 
superfamilies. The superfamily term was coined to 
describe relationships of common ancestry that exist 
between and among two or more gene families, each of 
which contains more closely related members. As 
more and more genes are cloned, sequenced, and ana- 
lyzed by computer, deeper and older relationships 
among superfamilies have been revealed. Complex relationships can be visualized within the context of branches upon branches in evolutionary trees. All of these
superfamilies have evolved out of combinations of 
unequal crossover events that expanded the size of 
gene clusters and transposition events that acted 
to seed distant genomic regions with new genes or 
clusters. 

A prototypical small gene superfamily is represented by the very well-studied globin genes. All functional members of this superfamily play a role in oxygen transport. The superfamily has three main families (or branches) represented by the β-like genes, the α-like genes, and the single myoglobin gene. The duplication and divergence of these three main branches occurred early during the evolution of vertebrates and, as such, all three are a common feature of all mammals. The products encoded by genes within two of these branches (α-globin and β-globin) come together (with heme cofactors) to form a tetramer, which is the functional hemoglobin protein that transports oxygen through the bloodstream. The product encoded by the third branch of this superfamily, myoglobin, acts to transport oxygen in muscle tissue.

The β-like branch of this gene superfamily has duplicated by multiple unequal crossing-over events and diverged into five functional genes and two β-like pseudogenes that are all present in a single cluster on mouse chromosome 7, as shown in Figure 1. Each of the β-like genes codes for a similar polypeptide which has been selected for optimal functionality at a specific stage of mouse development: one functions during early embryogenesis, one during a later stage of embryogenesis, and two in the adult. The α-like branch has also expanded by unequal crossing-over into a cluster of three genes (one functional during embryogenesis and two functional in the adult) on mouse chromosome 11. The two adult α genes are virtually identical at the DNA sequence level, which is indicative of a very recent duplication event (on the evolutionary time scale).

In addition to the primary α-like cluster are two
isolated α-like genes (now nonfunctional) that have
transposed to dispersed locations on chromosomes 15
and 17. When pseudogenes are found as single copies
in isolation from their parental families, they are
called ‘orphons.’ Interestingly, one of the α-globin
orphons (Hba-ps3 on Chr 15) is intronless and
would appear to have been derived through a
retrotransposition event, whereas the other orphon
(Hba-ps4 on chromosome 17) contains introns and may have
been derived by a direct DNA-mediated transposition.
Finally, the single myoglobin gene on chromosome
15 does not have any close relatives either
nearby or far away. Thus, the globin gene superfamily 
provides a view of the many different mechanisms that 
can be employed by the genome to evolve structural 
and functional complexity. 

The Hox gene superfamily provides an alternative 
prototype for the expansion of gene number. In this 
case, the earliest duplication events (which predate the 
divergence of vertebrates and insects) led to a cluster 
of related genes encoding DNA-binding proteins
that convey spatial information in the developing
embryo. The original gene cluster has been duplicated 
en masse and dispersed to a total of four chromosomal 
locations (on chromosomes 2, 6, 11, and 15) each of 
which contains nine to twelve genes. Interestingly, 
because of the order in which the duplication events 
occurred — unequal crossing-over to expand the 
cluster size first, transposition en masse second — an 
evolutionary tree would show that a single ‘gene 
family’ within this superfamily is actually splayed 
out physically across all of the different gene clusters. 
Some gene additions and subtractions within indivi- 
dual clusters have occurred by unequal crossing-over 
since the en masse duplication so that differences in 
gene number and type can be seen within a basic 
framework of homology among the different whole 
clusters. 

A final example of a gene super-superfamily is the 
very large set of genes that contain immunoglobulin- 
like (Ig) domains and function as cell surface or sol- 
uble receptors involved in immune function or other 
aspects of cell-cell interaction. This set includes the 
immunoglobulin gene families themselves, the major 
histocompatibility genes (called H2 in mice), the T cell 
receptor genes, and many more. There are dispersed 
genes and gene families, small clusters, large clusters, 
and clusters within clusters, tandem and interspersed. 
Dispersion has occurred with the transposition of 
single genes that later formed clusters and with the 
dispersion of whole clusters en masse. Furthermore, 
the original Ig domain can occur as a single unit in 
some genes, but it has also been duplicated intragenic- 
ally to produce gene products that contain two, three, 
or four domains linked together in a single polypep- 
tide. The Ig superfamily, which contains hundreds 
(perhaps thousands) of genes, illustrates the manner 
in which the initial emergence of a versatile genetic 
element can be exploited by the forces of genomic 
evolution, with a consequent enormous growth in
genomic and organismal complexity. 


Tandem Families of Identical Genetic 
Elements 


A limited number of multicopy gene families have 
evolved under a very special form of selective pressure 
that requires all members of the gene family to main- 
tain essentially the same sequence. In these cases, the 
purpose of high copy number is not to effect different 
variations on a common theme, but rather to supply 
the cell with a sufficient amount of an identical prod- 
uct within a short period of time. The set of gene 
families with identical elements includes those that 
produce RNA components of the cell’s machinery 
within ribosomes and as transfer RNA. It also 
includes the histone genes which must rapidly pro- 
duce sufficient levels of protein to coat the new copy 
of the whole genome that is replicated during the 
S-phase of every cell cycle. 

Each of these gene families is contained within one 
or more clusters of tandem repeats of identical elem- 
ents. In each case, there is strong selective pressure 
to maintain the same sequence across all members of 
the gene family because all are used to produce the 
same product. In other words, optimal functioning of 
the cell requires that the products from any one indi- 
vidual gene are directly interchangeable in structure 
and function with the products from all other indi- 
vidual members of the same family. How is this 
accomplished? The problem is that once sequences 
are duplicated, their natural tendency is to drift apart 
over time. How does the genome counteract this 
natural tendency? 

When ribosomal RNA genes and other gene 
families in this class were first compared both between 
and within species, a remarkable picture emerged: 
between species, there was clear evidence of genetic 
drift with rates of change that appeared to follow the 
molecular clock hypothesis. However, within a spe- 
cies, all sequences were essentially equivalent. Thus, it 
is not simply the case that mutational changes in these 
gene families are suppressed. Rather, there appears to 
be an ongoing process of ‘concerted evolution’ which 
allows changes in single genetic elements to spread 
across a complete set of genes in a particular family. 
So the question posed previously can now be nar- 
rowed down further: how does concerted evolution 
occur? 

Concerted evolution appears to occur through two 
different processes. The first is based on the expansion 
and contraction of gene family size through sequential 
rounds of unequal crossing-over between homolo- 
gous sequences. Selection acts to maintain the absolute 
size of the gene family within a small range around an 
optimal mean. As the gene family becomes too large, 
the shorter of the unequal crossover products will be
selected; as the family becomes too small, the longer
products will be selected. This cyclic process will
cause a continuous oscillation in size around a mean.
However, each contraction will result in the loss of 
divergent genes, whereas each expansion will result in 
the indirect ‘replacement’ of these lost genes with iden- 
tical copies of other genes in the family. With unequal 
crossovers occurring at random positions throughout 
the cluster and with selection acting in favor of the 
least divergence among family members, this process 
can act to slow down dramatically the continuous 
process of genetic drift between family members. 
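The first process can be illustrated with a toy simulation. This is a minimal sketch under stated assumptions, not a model taken from the source: it assumes a single tandem array whose copies are labelled by variant, unequal exchange between two copies of that same array misaligned by a few repeat units, no new mutation, and selection acting only on copy number around a hypothetical optimum of 20. The function names and parameters (`unequal_exchange`, `concerted_evolution`, `max_shift`, `optimum`) are illustrative.

```python
import random


def unequal_exchange(array, rng, max_shift=3):
    """One unequal crossover between two copies of a tandem array that are
    misaligned by k repeat units; returns the (longer, shorter) products."""
    n = len(array)
    k = rng.randint(1, min(max_shift, n - 1))  # misalignment, in repeat units
    i = rng.randint(k, n)                      # crossover position
    longer = array[:i] + array[i - k:]         # repeats i-k..i-1 duplicated
    shorter = array[:i - k] + array[i:]        # repeats i-k..i-1 deleted
    return longer, shorter


def concerted_evolution(optimum=20, rounds=2000, seed=1):
    """Start with `optimum` distinct variants, then repeatedly apply unequal
    exchange and keep the product whose size is closer to the optimum
    (hypothetical size selection; ties are broken at random)."""
    rng = random.Random(seed)
    array = list(range(optimum))               # each label is one variant
    for _ in range(rounds):
        longer, shorter = unequal_exchange(array, rng)
        d_long = abs(len(longer) - optimum)
        d_short = abs(len(shorter) - optimum)
        if d_long < d_short or (d_long == d_short and rng.random() < 0.5):
            array = longer
        else:
            array = shorter
    return array


final = concerted_evolution()
print(len(final), "copies,", len(set(final)), "distinct variants")
```

Because each contraction can remove variants permanently while each expansion only copies variants that survived, the number of distinct variants never rises in this model and tends to fall toward one, while selection keeps the copy number within a few units of the optimum. Adding a per-copy mutation step would show the balance between the origin of divergent copies and their removal.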

The second process responsible for concerted evo- 
lution is intergenic gene conversion between ‘nonal- 
lelic’ family members. It is easy to see that different 
tandem elements of nearly identical sequence can take 
part in the formation of Holliday intermediates which 
can resolve into either unequal crossing over products 
or gene conversion between nonallelic sequences. 
Although the direction of information transfer from 
one gene copy to the next will be random in each case, 
selection will act upon this molecular process to 
ensure an increase in homogeneity among different 
gene family members. As discussed above, informa- 
tion transfer — presumably by means of gene conver- 
sion — can also occur across gene clusters that belong 
to the same family but are distributed to different 
chromosomes. 

Thus, with unequal crossing-over and nonallelic
gene conversion (which are actually two alternative
outcomes of the same initial process) along with selec- 
tion for homogeneity, all of the members of a gene 
family can be maintained with nearly the same DNA 
sequence. Nevertheless, concerted evolution will still 
lead to increasing divergence between whole gene 
families present in different species. 


See also: Concerted Evolution; Gene Conversion; 
Globin Genes, Human; Immunoglobulin Gene 
Superfamily; Molecular Clock; Unequal Crossing 
Over 
Gene flow is defined as the movement of genes among
populations. The rate of gene flow, m, is the proportion
of the gene copies in a population that have been
carried into that population by immigrants in a given
generation. Gene flow can be mediated by the dispersal
of either gametes or individuals. But gene flow is not
equivalent to dispersal, for gametes or individuals that
move among populations but fail to incorporate genes
into the gene pool have not mediated gene flow.

Population structure, the pattern of genetic vari- 
ation among populations, is produced by the joint 
action of gene flow, genetic drift, and natural selection. 
Genetic drift is change in allelic frequencies produced 
by accidents of sampling and chance variation in sur- 
vival, mating success, and family size. Natural selection 
is defined as the differential reproduction of genotypes. 


Elaboration 


Genetic Drift Differentiates Populations 

If populations are not connected by gene flow, sto- 
chastic changes will cause them to diverge in time. 
Imagine a large, genetically diverse, randomly mating 
population that is suddenly broken apart into two 
perfectly isolated populations, each considerably 
smaller than the initial population. Initially, these 
populations might share the same alleles at similar 
frequencies. The Hardy-Weinberg Law demonstrates 
that, in the absence of selection, mutation, and migra- 
tion, allelic frequencies will not change in an infinitely 
large population with random mating. But in finite 
populations, allelic frequencies drift over time, with 
stochastic variation in survival and reproduction. The 
stochastic loss of alleles may differ between popula- 
tions, increasing the genetic distance between them. In 
addition, mutations may introduce new alleles into 
populations, further distinguishing them. With suffi- 
cient time, perfectly isolated populations will become 
completely differentiated, so that they do not share 
any alleles. 
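The divergence of isolated populations by drift can be seen in a small simulation. This sketch assumes the standard Wright-Fisher idealization (each generation, the 2N gene copies are drawn at random from the previous generation's gene pool, with no selection, mutation, or migration); the population size, seed, and function name `drift` are hypothetical choices for illustration.

```python
import random


def drift(p0, n_adults, generations, rng):
    """Allele-frequency trajectory under pure drift (Wright-Fisher sampling):
    each generation, 2N gene copies are drawn at random from the current pool."""
    copies = 2 * n_adults
    p = p0
    trajectory = [p]
    for _ in range(generations):
        count = sum(1 for _ in range(copies) if rng.random() < p)
        p = count / copies
        trajectory.append(p)
    return trajectory


rng = random.Random(7)
# Two perfectly isolated populations founded with the same allele frequency.
pop1 = drift(0.5, n_adults=20, generations=200, rng=rng)
pop2 = drift(0.5, n_adults=20, generations=200, rng=rng)
print(f"final p: {pop1[-1]:.2f} vs {pop2[-1]:.2f}")
```

Once a frequency reaches 0 or 1 it stays there, so each population's stochastic losses are permanent; running the sketch with different seeds shows the two populations frequently ending up fixed for different alleles.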

The rate of genetic drift in a population is depend- 
ent on the number of breeding adults in the popula- 
tion. Consider a gene segregating two alleles, A and a, 
at frequencies p and q, respectively, so that: 


$$ p + q = 1 $$


The standard error (SE) of the allelic frequency is a 
measure of the magnitude of drift of the frequency of 
an allele in a single generation. The standard error 
of the frequency of an allele is: 


$$ SE = \sqrt{\frac{pq}{2N}} $$
where N is the number of breeding adults. Most of the 
time (95%), the change in frequency will be less than 
two standard errors. For example, in a population
with 1000 breeding adults, and p and q equal to 0.5,
the standard error of allelic frequency is 0.01. Thus, in
the next generation, 95% of the time, p will be greater
than 0.48 but less than 0.52. However, in a population
with the same frequencies but only 10 breeding adults,
the standard error is 0.11, so the likely range of p in the
next generation will be from 0.28 to 0.72. Thus, the
rate of genetic drift increases with diminishing popu- 
lation size. 
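The worked example can be checked directly from SE = sqrt(pq/2N). The sketch below assumes, as the text does, that roughly 95% of one-generation changes fall within two standard errors of the current frequency; the function name is illustrative.

```python
import math


def drift_standard_error(p, n_breeders):
    """Standard error of allele frequency after one generation of drift,
    SE = sqrt(p*q / 2N), with N the number of breeding adults."""
    q = 1.0 - p
    return math.sqrt(p * q / (2 * n_breeders))


for n in (1000, 10):
    se = drift_standard_error(0.5, n)
    low, high = 0.5 - 2 * se, 0.5 + 2 * se
    print(f"N={n}: SE={se:.3f}, ~95% range {low:.2f}-{high:.2f}")
# N=1000: SE=0.011, ~95% range 0.48-0.52
# N=10: SE=0.112, ~95% range 0.28-0.72
```

Because SE scales with 1/sqrt(N), a hundredfold reduction in the number of breeders widens the likely range tenfold.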


Gene Flow Tends to Homogenize 
Populations 

Gene flow among populations makes them more simi- 
lar. This point can be made intuitively by considering 
an exercise with two glasses of wine, one red and one 
white. Imagine pouring a small amount from the glass 
of white wine into the other glass, then swirling it. The 
red wine will still be red, but careful inspection would 
reveal that the intensity of the color has diminished. 
Now pour some of the red wine into the other glass. A 
few drops of red wine bring a tinge of red to the white 
wine. Now imagine repeating the exchanges many 
times. Ultimately, the colors in the two glasses will 
be identical. Similarly, some gene flow between popu- 
lations will make them more similar, and high gene 
flow will make them indistinguishable. 

The impact of gene flow on population structure 
can be illustrated quantitatively by modeling genetic 
variation at a single gene. Consider gene flow into
a population from populations that have different
allelic frequencies. If the proportion of migrants into a
population is m, the frequency of A in the population
before migration is p, and the frequency of A in the
migrants is p_m, then p′, the new frequency of A in the
population, will be:

$$ p' = p_m m + p(1 - m) $$


If gene flow were unopposed by other forces, the 
populations connected by gene flow would ultimately 
share the same alleles, at the same frequencies. 
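Iterating the migration equation shows the homogenizing effect numerically. This sketch assumes a one-way model in which the migrant pool keeps a fixed frequency p_m from generation to generation, with no drift or selection; the starting values are hypothetical.

```python
def migrate(p, p_m, m):
    """One generation of gene flow: p' = p_m*m + p*(1 - m)."""
    return p_m * m + p * (1.0 - m)


p0, p_m, m = 0.9, 0.1, 0.05   # hypothetical frequencies and migration rate
p = p0
for generation in range(100):
    p = migrate(p, p_m, m)

# The gap between the population and the migrants shrinks by a factor of
# (1 - m) each generation, so after t generations
# p = p_m + (p0 - p_m) * (1 - m)**t.
print(round(p, 3))
```

With m = 0.05, the initial difference of 0.8 has shrunk to well under 0.01 after 100 generations, which is the quantitative version of the wine-glass analogy.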


Natural Selection Can Overcome Gene Flow 
Natural selection can oppose the homogenizing effect 
of gene flow, sustaining genetic differences among 
populations linked by gene flow. For example, the 
blue mussel, Mytilus edulis (Figure 1), exhibits an 
abrupt genetic boundary despite high gene flow. Blue 
mussels are native to the North Atlantic, and are com- 
mon in the rocky intertidal. They are dioecious, i.e., an 
individual is either male or female, and they release 
their gametes into the water. The gametes unite to 
form veliger larvae, which are carried by currents for 
at least 3 weeks. Studies of coastal currents suggest 
that larvae could be carried more than 100 km, and an 
estimate of gene flow from genetic data (see below) 
indicates that blue mussels exchange many individuals 
among populations each generation. Despite high
levels of gene flow, the mussels in Long Island Sound
remain distinctly differentiated from other populations.

Long Island Sound receives water from several 
major rivers (Housatonic, Quinnipiac, Connecticut, 
Thames), which dilutes the salinity of the Sound to 
about one half of the salinity of the open ocean. Thus, 
the Sound is a distinct environment for mussels, which
must make physiological adjustments to maintain cell
osmotic pressure. Variation at the gene coding for leucine
aminopeptidase (Lap) plays an important role in the 
maintenance of cell pressure; some genotypes are most 
efficient at high salinity, while other genotypes are 
most efficient at low salinity. Each spring, millions of 
larvae are carried into Long Island Sound by currents 
sweeping west along the coast of Rhode Island and 
Connecticut. But each fall, mortality in the young 
mussels creates a sharp genetic cline in Lap frequen- 
cies near Guilford, Connecticut, where salinity 
changes abruptly. Although the veliger larvae are capable of
dispersing more than 100 km, the genetic cline is only 
20 km wide. 

In addition, studies of both ribbed mussels, Geukensia
demissa, and acorn barnacles, Semibalanus
balanoides, have reported significant differentiation 
between the samples taken from the upper and lower 
portions of the intertidal zone — distances of one or 
two meters. These species, like the blue mussel, also 
have pelagic larvae, and consequently gene flow 
would homogenize the frequencies of neutral or un- 
selected genes within the intertidal zone. Both cases of 
differentiation were produced by selection differing 
among habitats in a heterogeneous environment. 


Some Generalizations concerning 
Gene Flow 


Dispersal
Although gene flow is not synonymous with dispersal,
it is certainly true that long-distance dispersal
provides the opportunity for long-distance gene flow,
and hence for high levels of gene flow among
populations. The larvae of some marine mollusks have been
documented to be carried by equatorial currents from
the coast of Africa to the Caribbean Sea, and we would
expect those species to have high levels of gene flow
between populations in Africa and in the Caribbean. On
the other hand, some marine mollusks brood their
young, or attach egg cases to the substrate, severely
limiting the opportunity for dispersal, and restricting
gene flow. Species that are philopatric with respect to
breeding sites, such as salamanders and some species
of birds, are characterized by very low gene flow.

Figure 1 (See Plate 17) The blue mussel, Mytilus edulis.


Mating System 

The mating system can have a profound impact on 
gene flow. For example, the mating systems of plants 
can be characterized as predominantly selfing, or pre- 
dominantly outcrossing, or a mixed system, employ- 
ing an intermediate balance of selfing and outcrossing. 
Many species of plants, such as wheat, barley, oaks, 
and pines, are monoecious, meaning that an individual 
produces both male and female gametes. Wheat and 
barley produce their seeds predominantly (> 99%) by 
selfing. This mating system is characterized by very 
low gene flow, for there is no gene flow in the fertil- 
ization of selfed seed, and the seeds typically dis- 
perse less than 2 m. Gene flow is much higher in oaks 
and pines, which are typically outcrossed and wind- 
pollinated. Outcrossed seeds have separate maternal 
and paternal parents, and the wind pollination pro- 
vides the possibility that the parents are distant from 
one another. How far can oak or pine pollen travel? 
Pollen traps on ships 150 km from shore have captured 
pine pollen, confirming long-distance dispersal, and 
providing the opportunity for long-distance gene flow. 

Behavior can have a major impact on gene flow. 
Plants with animal pollinators will have gene flow de- 
termined by the behavior of their pollinators. Plants 
pollinated by bees that visit many flowers on a plant 
before visiting an adjacent plant will have low gene 
flow. Gene flow mediated by various species of hum- 
mingbirds can be low or high, depending on whether 
the birds defend small territories or are ‘trapliners,’ 
flying substantial distances between sequential pollin- 
ations. 

Pods of killer whales around the San Juan Islands, 
Washington State, have distinct feeding behaviors that 
constrain their social systems and limit gene flow 
among pods. Some of the pods prey predominantly 
on marine mammals, such as seals and sea lions, while 
other pods prey almost exclusively on salmon. Long- 
term studies of the behaviors of the pods revealed 
that the pods defend their territories, and are stealthy 
when they trespass into the territories defended by 
neighboring pods. Studies of mitochondrial DNA
identified diagnostic differences between mammal-eating
and fish-eating pods, and suggested that gene flow
between these pods had not occurred for 2000 years.


Direct Measurement of Gene Flow 


The most direct measure of gene flow is to tag perman- 
ently an individual at or near its natal site, and then 
record where it breeds. For example, bird bands,
which are small rings placed on a bird’s leg,
have been used to study gene flow in many species of 
birds. Tags have been attached to the fins of fish, and 
tiny bar code signs have been glued on insects. Radio 
beacons fashioned into collars have revealed the 
movements of wolves and lynx. Fluorescent dyes pre- 
pared as a fine dust have been used to mark birds and 
small mammals for short periods of time. Radio trans- 
mitters have been placed in the stomachs of snakes and 
beneath the skin of sharks. Mammals have had their 
coats numbered with bleach or paint. These marking 
techniques have the advantage of providing clear 
evidence of dispersal and, if the animal breeds at its 
destination, evidence of gene flow. They have the dis- 
advantage that they are often labor-intensive, and 
some of the tags, such as radio transmitters, are both 
expensive and short-lived. Moreover, these techniques
cannot be used in all species, and they provide only
a single estimate of gene flow. Because animal
behavior is flexible, and can vary among years and 
generations, tagging studies may not reflect the aver- 
age gene flow. Finally, population structure may be 
predominantly determined by historical events rather 
than the current rate of gene flow. 


Inference from Genetic Data 


F_ST Measures the Differentiation of Populations

F_ST is a quantitative estimate of the degree of
differentiation of populations. Consider a gene segregating
two alleles, A and a, at frequencies p and q,
respectively. F_ST, a standardized variance of allelic frequencies,
is defined as:

$$ F_{ST} = \frac{S_p^2}{\bar{p}\,\bar{q}} $$


where the numerator is the variance of p among popu- 
lations, and the denominator is the product of the 
means of the allelic frequencies. The variance of allelic 
frequencies among populations is calculated as: 


$$ S_p^2 = \frac{1}{d} \sum_{i=1}^{d} (p_i - \bar{p})^2 $$
where d is the number of populations, p_i are the
frequencies of the A allele in the populations, and p̄ is the
mean of the frequencies. F_ST is zero if all populations
have the same alleles at the same frequencies, and 1.0
for two populations fixed for different alleles. F_ST will
increase over time between isolated populations, and
because genetic drift increases with decreasing
population size, the rate of divergence increases with
decreasing population size.
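The two formulas for F_ST and S_p^2 translate directly into code. This sketch assumes a single biallelic locus and, following the variance formula in the text, divides the sum of squared deviations by d rather than d - 1; the function name and the third example's frequencies are hypothetical.

```python
def f_st(freqs):
    """F_ST = S_p^2 / (p_bar * q_bar) for allele A across d populations,
    where S_p^2 divides the sum of squared deviations by d."""
    d = len(freqs)
    p_bar = sum(freqs) / d
    q_bar = 1.0 - p_bar
    if p_bar in (0.0, 1.0):
        raise ValueError("locus is monomorphic across all populations")
    s2 = sum((p - p_bar) ** 2 for p in freqs) / d
    return s2 / (p_bar * q_bar)


print(f_st([0.5, 0.5, 0.5]))   # 0.0: same alleles at the same frequencies
print(f_st([1.0, 0.0]))        # 1.0: fixed for different alleles
print(round(f_st([0.3, 0.5, 0.7]), 3))
```

The two limiting cases reproduce the bounds stated above; intermediate frequency differences give intermediate values.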

The degree of differentiation among populations 
will come to an equilibrium that reflects a balance 
between genetic drift and gene flow. The relationship 
between differentiation and gene flow is: 


$$ F_{ST} = \frac{1}{4Nm + 1} $$

or, equivalently,

$$ Nm = \frac{1}{4}\left(\frac{1}{F_{ST}} - 1\right) $$

where N is the number of breeding individuals in a
population and m is the rate of gene flow. Thus, if
populations are completely isolated for a long time
(Nm = 0), F_ST will approach 1.0, but if just one member
of a population is a new immigrant in each generation
(i.e., Nm = 1), then the rate of gene flow is

$$ m = \frac{1}{N} $$

and the equilibrium value of F_ST will be 0.20. Higher
rates of gene flow will make the populations even
more similar. For example, if the number of immigrants
is 5 per generation, then F_ST will be less than
0.05, and the populations will be, for all practical
purposes, very similar.

An important threshold is placed at the rate of gene 
flow of Nm = 1.0. Effectively, when Nm < 1, gene 
flow is not sufficient to offset the effects of genetic 
drift. So populations connected by Nm < 1 will 
diverge in time, while for populations connected by 
Nm > 1, gene flow will prevent differentiation by 
genetic drift. 
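The equilibrium relationship and its inverse are easy to tabulate. This sketch assumes the idealized conditions under which the relationship is usually derived (neutral loci at an equilibrium between drift and gene flow, as discussed under the caveats below); the function names are illustrative.

```python
def equilibrium_f_st(nm):
    """Equilibrium F_ST = 1 / (4Nm + 1)."""
    return 1.0 / (4.0 * nm + 1.0)


def migrants_from_f_st(f_st):
    """Inverse relationship: Nm = (1/F_ST - 1) / 4."""
    return (1.0 / f_st - 1.0) / 4.0


for nm in (0.1, 1.0, 5.0):
    if nm < 1:
        regime = "drift outweighs gene flow"
    elif nm > 1:
        regime = "gene flow prevents differentiation by drift"
    else:
        regime = "threshold"
    print(f"Nm = {nm}: F_ST = {equilibrium_f_st(nm):.3f} ({regime})")
```

Nm = 1 gives F_ST = 0.20 and Nm = 5 gives 1/21, just under 0.05, matching the figures in the text; complete isolation (Nm = 0) gives 1.0.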


Inference of Gene Flow in Limber Pine 
The organellar genomes of pines are ideal for measur- 
ing gene flow, as mitochondrial DNA (mtDNA) has 
maternal inheritance and chloroplast DNA (cpDNA) 
has paternal inheritance in pines. These different 
modes of inheritance allow us to explicitly identify 
gene flow mediated by pollen and by seeds. In add- 
ition, pollen and seeds have disparate potentials for 
dispersal. Wind-borne pollen has the potential to
travel great distances; in contrast, the seeds of
pines usually fall within a circle that has a radius 
equal to the height of the tree. 

Figure 2 (See Plate 18) The limber pine, Pinus flexilis.

Limber pine, Pinus flexilis (Figure 2), is native to
western North America, where it is primarily
restricted to windy ridges and scree slopes from the
Sierra Madre of Mexico to the Canadian Rockies,
from Mt Pinos in southern California to the Black 
Hills of South Dakota. The seeds of limber pine are 
dispersed and planted by Clark’s nutcracker, Nuci- 
fraga columbiana. The bird and pine are engaged in a 
mutualism sculpted by evolution. Limber pine relies 
on the bird to harvest, disperse, and plant its seeds. 
Clark’s nutcracker relies on limber pine seeds to get 
through the winter. Both the bird and the pine have 
evolved morphological traits (a sublingual pouch, 
wingless seeds) to better serve and exploit their 
partner. The birds usually cache seeds on windy or 
south-facing slopes that will be free of snow in winter, 
and this explains the curious distribution of limber 
pine. A bird can carry approximately 30 limber pine 
seeds in its sublingual pouch. When its pouch is full, 
the bird flies to a propitious site for caching and har- 
vesting seed. The flight distances are highly variable; 
although the record flight exceeds 20 km, most flights 
are very short, a few meters to a few hundred meters. 

The potentials for dispersal of pollen and seed lead 
biologists to expect high gene flow in genes dispersed 
by pollen (nuclear genes, cpDNA) and low gene flow 
for genes dispersed solely by seed. This hypothesis 
was tested with a study of gene flow among popula- 
tions of limber pine in the Front Range of Colorado. 
The populations were distributed from tree line at 
the Continental Divide to an isolated stand of trees 
100 miles to the east, on an escarpment on the Great 
Plains. Haplotype frequencies were used to calculate
F_ST for both cpDNA and mtDNA, and gene flow was
inferred from F_ST with the equation Nm = (1/F_ST - 1)/4
given above. The values of F_ST were 0.02 and 0.68 for
cpDNA and mtDNA, respectively, suggesting that the
number of migrants among populations per generation
is 12.25 for pollen and 0.12 for seeds. The gene flow
of cpDNA is high, and should
tend to homogenize the frequencies of cpDNA hap- 
lotypes and nuclear genes among populations within 
distances of approximately 100 miles. In contrast, the
gene flow of mtDNA is below the threshold at which
the influence of genetic drift predominates. So mtDNA
is expected to vary more among populations than
nuclear genes and cpDNA, and genetic drift will
cause populations to diverge with respect to mtDNA
haplotypes.
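The limber pine estimates follow from the same inverse relationship. This short check assumes only the equation Nm = (1/F_ST - 1)/4 used in the text and the two F_ST values reported for the Front Range populations; the function name is illustrative.

```python
def migrants_from_f_st(f_st):
    """Nm = (1/F_ST - 1) / 4, the relationship used in the text."""
    return (1.0 / f_st - 1.0) / 4.0


for marker, f in (("cpDNA (pollen)", 0.02), ("mtDNA (seed)", 0.68)):
    nm = migrants_from_f_st(f)
    side = "above" if nm > 1 else "below"
    print(f"{marker}: F_ST = {f}, Nm = {nm:.2f} ({side} the Nm = 1 threshold)")
```

The pollen-borne marker lands well above the Nm = 1 threshold and the seed-borne marker well below it, which is the contrast the study was designed to detect.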


Private Alleles Estimate Gene Flow 

Private alleles, or alleles found only in a single popula- 
tion, can also be used to infer rates of gene flow among 
populations. The private alleles can be from markers 
from mtDNA, cpDNA, nuclear DNA, or allozyme 
markers, and they are usually taken from surveys of 
geographical variation within a species. For example, a 
survey of allozyme variation throughout the range 
might reveal several or many private alleles. The
average frequency of the private alleles, p(1), is plotted on a
regression line in a plot of ln(p(1)) on the ordinate versus
ln(Nm) on the abscissa, and Nm is read from that line.
The regression line was estimated from a computer simulation study examining
the relationship between genetic drift and gene flow in 
the determination of the geographical distribution 
of new mutations. Consider a species that has very 
low gene flow among populations. When mutation 
produces a novel allele in a single population, it 
could drift to moderate or even high frequencies 
before an individual bearing that allele migrated to 
another population and reproduced. However, if 
gene flow in the species was very high, then it is likely 
that the new mutation would still be at a low frequency 
when it was successfully introduced to another popu- 
lation. Thus, low gene flow allows private alleles to 
drift to higher frequencies while high gene flow holds 
private alleles to low frequencies. 
Estimates of gene flow from private alleles are
usually, but not always, consistent with estimates
from F_ST. In a compilation of estimates of gene flow
from private alleles, Slatkin (Table 1) found the very
highest rate of gene flow in the blue mussel, M. edulis
(Nm = 42) (Figure 1). This estimate is probably real- 
istic, for the mussels have pelagic larvae that ride 
ocean currents for weeks. At the other end of the 
scale were four species of salamanders, all with values 
of Nm considerably below 1.0. Once again, this esti- 
mate of gene flow seems reasonable given our know- 
ledge of salamanders. Salamanders forage only short 
distances, and they usually breed in their natal ponds. 
Consequently, movement of individuals among popu- 
lations is rare. 
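Reading Nm off the regression line amounts to inverting a log-log fit. In this sketch the slope and intercept (a = -0.505, b = -2.44, for samples of 25 individuals) and the rescaling by 25/sample_size are the values commonly quoted for Slatkin's simulations, but they are assumptions here, not figures given in this article; check the original source before relying on them. The function name is hypothetical.

```python
import math

# Assumed coefficients of the regression ln(p1) = a * ln(Nm) + b, as commonly
# quoted for samples of 25 individuals per population (not from this article).
A_SLOPE = -0.505
B_INTERCEPT = -2.44


def nm_from_private_alleles(p1, sample_size=25, a=A_SLOPE, b=B_INTERCEPT):
    """Estimate Nm from the mean frequency of private alleles, p1, by inverting
    the log-log regression, then rescaling by 25 / sample_size (an assumed
    correction for samples that differ from 25 individuals)."""
    nm_25 = math.exp((math.log(p1) - b) / a)
    return nm_25 * 25.0 / sample_size


for p1 in (0.008, 0.081, 0.294):   # p(1) values taken from Table 1
    print(p1, round(nm_from_private_alleles(p1), 2))
```

Because the Nm values in Table 1 were adjusted for each study's actual sample size, which is not given here, this sketch will not reproduce them exactly; it does reproduce the qualitative pattern that rarer private alleles imply higher gene flow.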


Caveats concerning the Relationship of
F_ST to Nm


The relationships between Nm and F_ST and between
Nm and the frequency of private alleles are both 
dependent on assumptions that may be frequently 
violated in the data collected in range-wide surveys 
of genetic variation. 


Assumption of ‘Evolutionary Equilibrium’ 

The inference of rates of gene flow from either F_ST or
private alleles depends on the assumption that there
has been sufficient time for population structure to
come to an evolutionary equilibrium determined by
the joint action of gene flow and genetic drift.
Conformity to this assumption is rarely considered, but
some biologists believe that very few species have 
reached equilibrium. For example, limber pines were
displaced from high elevations by the glaciers that
reached their most recent glacial maximum 18,000
years ago. Once the glaciers subsided, limber pine
were able to colonize numerous sites above 10,000
feet in the Rocky Mountains, where limber pines
commonly attain ages in excess of 1000 years. The
populations with ancient trees are certainly not at an
evolutionary equilibrium between drift and gene flow,
for very few of their generations have passed since
they recolonized high elevations. Similar scenarios
apply to the plants and animals that moved northward
in North America and Europe since the last glacial
maximum.

Table 1 Estimates of the number of migrants moving among populations (Nm) from the average frequency of private alleles (p(1))

Common name                 Formal name              p(1)    Nm
Blue mussel                 Mytilus edulis           0.008   42.0
Fruit fly                   Drosophila willistoni    0.014   9.9
Milkfish                    Chanos chanos            0.030   4.2
Desert lizard               Lacerta melisellensis    0.066   1.9
[Annual plant]              Stephanomeria exigua     0.054   1.4
Pacific treefrog            Hyla regilla             0.081   1.4
Valley pocket gopher        Thomomys bottae          0.087   0.86
Pacific slender salamander  Batrachoseps pacifica    0.117   0.64
Red back salamander         Plethodon cinereus       0.200   0.22
Oldfield mouse              Peromyscus polionotus    0.158   0.31
Camp’s slender salamander   Batrachoseps campi       0.338   0.16
Zigzag salamander           Plethodon dorsalis       0.294   0.10

Note: values of Nm have been adjusted for the sample sizes, so there is not a perfect rank-order correlation between p(1) and Nm.

(Adapted from Slatkin, 1985.)


Heterogeneity among Estimates 
In studies of gene flow based on F_ST, the values of F_ST
estimated from different loci are commonly heterogeneous.
This should not be the case for neutral characters, for
migration and drift should influence all loci in similar
ways. The relationship between F_ST and Nm is appropriate
only for neutral genes; selection on a subset of the loci can
produce heterogeneous estimates of F_ST. One of the
most striking cases of heterogeneity of estimates of 
gene flow comes from a series of studies of the Ameri- 
can oyster, Crassostrea virginica. Estimates of gene 
flow from allozyme markers suggest that the larvae 
move great distances, homogenizing allelic frequen- 
cies from Massachusetts to Texas. However, both 
mtDNA and several nuclear DNA markers reveal a 
picture of limited gene flow, with a major barrier to 
gene flow in the vicinity of Cape Canaveral, Florida. 
The authors attribute the heterogeneity of estimates of 
gene flow to balancing selection on the allozyme loci. 

Heterogeneity of estimates of gene flow frequently
involves lower estimates of F_ST from microsatellite loci
than from other nuclear markers. The differences are
particularly pronounced when the populations are
well differentiated, and gene flow between them is
low. This heterogeneity is attributable to heterogeneous
mutation rates. While the mutation rates for
nuclear loci are typically 10^-6 to 10^-8, mutation rates
for microsatellite loci are much higher, often around
10^-3, but reaching 1/20. High mutation rates at
microsatellite loci are due to the nature of the variation at
these loci. Microsatellite alleles differ in their numbers
of tandem repeats, and the different sizes of the
alleles produce chromosomal rearrangements when
chromosomes are unable to synapse perfectly in the
first division of meiosis. The high rates of mutation
generate many size variants in each population. For
microsatellite loci, the sharing of alleles among
populations may be due to independent mutations, rather
than gene flow.

Biologists using genetic data to infer rates of
migration are obliged to be cognizant of the assumptions
underlying their methods. If there are egregious
violations of the assumptions, estimates of gene flow may
be unreliable.
Gene frequency refers to the proportion of all gene
copies at a locus in a population that are of one type of variant, or allele.
More appropriately referred to as ‘allele frequency,’ 
gene frequency ranges from 0 (where the particular 
variant is absent from the population) to 1 (where the 
variant type is the only allele present). In the latter case, 
the population is said to be ‘fixed’ for this particular 
allele. While often defined in terms of a locus or gene 
and, in the early days of genetics, assessed by phenotype 
of the corresponding genotype, the gene frequency is 
now applied to the frequency of any alternative form 
found segregating in a population, e.g., alternative
nucleotides at a single site in a sequence, whether
in coding, intronic, or intergenic regions, as well as
insertion/deletion variants and even alternative gene
rearrangements such as inversion types.

Gene frequency is estimated by taking a random
sample of individuals from what might be considered
a population of the species of interest (e.g., from a
geographic locale). By random, we simply mean that
the individuals are chosen without regard for their 
genotype or phenotype associated with the locus of 
interest. For a haploid organism like E. coli, we could 
estimate the frequency of a particular nucleotide poly- 
morphism (A versus T for example) by sampling 1000 
bacterial cells and assaying them for the sequence 
variant of interest. We might find that 15 of the 1000 
are A. Our estimate of the gene frequency of the A 
allele is thus 15/1000 = 0.015, or 1.5%. The frequency
of the alternative allele in this case, T, would be
985/1000 = 0.985, or 98.5%. We could also calculate the
frequency of T as simply $1 - p$ (where $p$ is the
frequency of the A allele), or $1 - 0.015 = 0.985$. The
larger the sample, the more precise our estimate of the
gene frequency. The sampling variance of this estimate
is $p(1 - p)/n$, where $p$ is the frequency of the allele of
interest and $n$ is the number of alleles sampled (also
equal to the number of haploid individuals sampled,
since each cell has only one copy of the
genome and thus can carry only one allele). In this
case, the sampling variance is $1.4775 \times 10^{-5}$, and the
standard error (SE) of the estimate is the square root of
the variance, or $3.8438 \times 10^{-3}$. We can thus be about 95%
confident that the true population frequency lies within the
interval $p \pm 1.96\,\mathrm{SE}$, or (0.00747, 0.02253).
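The calculation above can be reproduced with a short Python sketch. It uses the binomial sampling variance from the text and the normal-approximation interval $p \pm z\,\mathrm{SE}$; the function name and the default $z = 1.96$ are illustrative choices, and this simple interval is known to behave poorly for very rare alleles or small samples.

```python
import math


def allele_freq_estimate(count: int, n: int, z: float = 1.96):
    """Estimate an allele frequency from `count` copies observed among `n` sampled copies.

    Returns (p, variance, standard error, (lower, upper)), where the interval is the
    normal approximation p +/- z*SE.
    """
    if n <= 0 or not 0 <= count <= n:
        raise ValueError("need n > 0 and 0 <= count <= n")
    p = count / n
    var = p * (1 - p) / n      # binomial sampling variance p(1 - p)/n
    se = math.sqrt(var)        # standard error is the square root of the variance
    return p, var, se, (p - z * se, p + z * se)


if __name__ == "__main__":
    # The E. coli example: 15 A alleles among 1000 haploid cells.
    p, var, se, (lo, hi) = allele_freq_estimate(15, 1000)
    print(f"p = {p}, var = {var:.4e}, SE = {se:.4e}, 95% CI = ({lo:.5f}, {hi:.5f})")
```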

It is important to distinguish between gene and 
genotype frequency for organisms other than hap- 
loids. Consider a diploid like ourselves. At a particular 
site in our DNA, some chromosomes carry a C (in 
frequency p) and some a G (frequency q). Some indi- 
viduals will have two Cs, some a C and a G, and some 
two Gs. Individuals carrying two copies of the same 
allele are called homozygotes (e.g., C/C or G/G) and 
individuals with two different alleles are called hetero- 
zygotes (e.g., C/G). Genotype frequencies represent 
the proportion of each type of genotype in the population
sample. For one set of gene frequencies, there
can be many different genotype frequencies: e.g., for
p = q = 0.5, we could have 0.5 C/C, 0.0 C/G, and 0.5
G/G; or we could have 0.0 C/C, 1.0 C/G, and 0.0 G/G; or
we could have 0.25 C/C, 0.50 C/G, and 0.25 G/G. The
latter is what is expected with random mating and no
selection, drift, mutation, or migration: the
Hardy-Weinberg equilibrium genotype frequencies for these
gene frequencies.
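The point that one set of gene frequencies is compatible with many genotype frequencies can be checked directly. This Python sketch (function names are illustrative) recovers p from each of the three genotype distributions in the text and compares them with the Hardy-Weinberg expectation p², 2pq, q².

```python
def allele_freqs_from_genotypes(f_cc: float, f_cg: float, f_gg: float):
    """Gene frequencies (p for C, q for G) implied by diploid genotype frequencies."""
    if abs(f_cc + f_cg + f_gg - 1.0) > 1e-9:
        raise ValueError("genotype frequencies must sum to 1")
    p = f_cc + f_cg / 2    # each heterozygote carries one C out of two copies
    return p, 1 - p


def hardy_weinberg(p: float):
    """Expected (C/C, C/G, G/G) frequencies under random mating with no other forces."""
    q = 1 - p
    return p * p, 2 * p * q, q * q


if __name__ == "__main__":
    for genos in [(0.5, 0.0, 0.5), (0.0, 1.0, 0.0), (0.25, 0.5, 0.25)]:
        p, q = allele_freqs_from_genotypes(*genos)
        print(genos, "-> p =", p, "q =", q, "HW expectation:", hardy_weinberg(p))
```

All three distributions give p = q = 0.5, but only the last matches the Hardy-Weinberg expectation.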

For a diploid organism like ourselves, we estimate
gene frequency and the associated sampling variance
as we did for haploids, but we must take account of the
fact that, for an autosomal gene, each individual carries
two copies of the gene. Thus, the frequency of an allele is
calculated as twice the number of homozygotes for it (for
example, A/A individuals) plus the number of
heterozygotes (A/T individuals), all divided by twice the
total number of individuals sampled; the number of alleles
sampled, n, is then twice the number of individuals. For X-linked genes, the heterogametic sex
(males in humans) only carries one gene copy, while 
females carry two copies. Similar logic applies to the 
estimation of gene frequencies in polyploid species, or 
in haplodiploid species. 
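The counting rules just described can be written out as follows. This is a minimal Python sketch; the function names and the genotype counts in the example are hypothetical, chosen only to show how homozygotes, heterozygotes, and hemizygous males each contribute gene copies.

```python
def diploid_allele_freq(n_hom_a: int, n_het: int, n_hom_other: int) -> float:
    """Frequency of allele A at an autosomal locus: (2*AA + Aa) / (2*N)."""
    n = n_hom_a + n_het + n_hom_other
    return (2 * n_hom_a + n_het) / (2 * n)


def x_linked_allele_freq(fem_hom_a: int, fem_het: int, fem_hom_other: int,
                         male_a: int, male_other: int) -> float:
    """Frequency of allele A at an X-linked locus: females carry two copies, males one."""
    copies_a = 2 * fem_hom_a + fem_het + male_a
    total_copies = 2 * (fem_hom_a + fem_het + fem_hom_other) + (male_a + male_other)
    return copies_a / total_copies


if __name__ == "__main__":
    # Hypothetical sample: 30 A/A, 50 A/T, 20 T/T individuals.
    print(diploid_allele_freq(30, 50, 20))           # 110 A copies out of 200
    # Hypothetical X-linked sample: 50 females and 50 males.
    print(x_linked_allele_freq(10, 20, 20, 15, 35))  # 55 A copies out of 150
```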

What do gene frequencies tell us about the evolutionary
forces shaping variation? Variation ultimately
has its origin in mutation. Mutation introduces alleles
into populations, and the alleles can be spread to other
populations by gene flow (migration). If the population
of interest is infinitely large, and the variants do
not confer any advantage or disadvantage to their
carriers (so-called selectively neutral variants), then from one
generation to the next there will be no chance sampling
of gene frequencies (‘genetic drift’); they will
change in frequency only by additional mutation.
However, all real populations are finite, and thus 
genetic drift is a process that contributes to allele 
frequency change in all populations. 

If drift is the only factor influencing gene frequen- 
cies, then the higher the frequency of a particular 
variant, the older that variant is likely to be. That 
is, a new variant in a diploid population of size N 
individuals (and thus 2N copies of each locus) starts 
at a frequency of 1/2N and increases or decreases by 
drift. The probability that an allele is eventually fixed 
in the population by drift alone turns out to be simply 
its frequency (1/2N for a new allele), but it will take on 
average 4N generations for this to occur. This is the 
average time it takes for all alleles in a population to 
share the same single common ancestor allele, that is, 
all alleles present now are descendant copies of a single 
allele present in the population on average 4N
generations ago. The probability that a new mutation is
ultimately lost from the population by drift alone
(barring new mutation) is $1 - 1/2N$. For a large
population, this is very high, and most new mutations
are destined to be lost. The continual introduction of
alleles into a population and their inexorable march to
fixation or loss lead to a steady-state (‘equilibrium’)
distribution of gene frequencies expected in a popula- 
tion of size N and with a given mutation rate. Most 
alleles will be of low or high frequency, with relatively 
few of intermediate frequency. 
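The claim that a new neutral allele is fixed with probability 1/2N can be checked by simulation. The Python sketch below assumes a Wright-Fisher model (each generation's 2N gene copies drawn at random, with replacement, from the previous generation); the model choice, population size, and replicate counts are illustrative assumptions rather than anything specified in the text.

```python
import random


def fate_of_new_mutant(n_diploid: int, rng: random.Random) -> bool:
    """Follow one new neutral allele (initial frequency 1/2N) under Wright-Fisher drift.

    Returns True if the allele is eventually fixed, False if it is lost.
    """
    two_n = 2 * n_diploid
    copies = 1
    while 0 < copies < two_n:
        p = copies / two_n
        # Next generation: each of the 2N gene copies is a random draw with probability p.
        copies = sum(1 for _ in range(two_n) if rng.random() < p)
    return copies == two_n


def fixation_fraction(n_diploid: int, replicates: int, seed: int = 1) -> float:
    """Fraction of independent new mutants that reach fixation."""
    rng = random.Random(seed)
    fixed = sum(fate_of_new_mutant(n_diploid, rng) for _ in range(replicates))
    return fixed / replicates


if __name__ == "__main__":
    N = 10
    print("simulated:", fixation_fraction(N, 5000), "expected 1/2N:", 1 / (2 * N))
```

With N = 10 the simulated fraction should lie close to 0.05, and the remaining replicates illustrate that most new mutations are lost.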

Some variants do affect the contribution of their 
carriers to the next generation (e.g., influence survival, 
number of progeny produced, etc.). New mutants that
are favored by selection will increase in frequency to
fixation, provided their selective advantage is large
enough to overcome drift. In
some cases, natural selection favors individuals with
multiple different allelic types (for example, human
heterozygotes for normal and sickle-cell β-chain
hemoglobin in regions of the world with malaria).
Here, selection maintains a stable ‘equilibrium’
frequency of both allele types. Even harmful
(deleterious) mutants will exist in an equilibrium frequency in
populations, due to the balance of the introduction of 
the allele type into the population by mutation and its 
elimination by natural selection. Alleles that are pheno- 
typically recessive, often due to a loss of function, 
can reach moderate mutation-selection balance fre- 
quencies, since they are ‘hidden’ as heterozygotes 
and only selected out as homozygotes. Knowledge of 
the selective disadvantage of such alleles allows the 
estimation of mutation rates for these types of alleles 
and has been widely used to do so, particularly in 
human genetics. 
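The two kinds of equilibrium described above have standard approximate solutions, sketched here in Python. For a fully recessive deleterious allele with mutation rate μ and homozygote selective disadvantage s, the balance frequency is approximately q ≈ √(μ/s), so an observed q gives μ ≈ sq²; for heterozygote advantage with fitnesses 1 − s, 1, and 1 − t for AA, Aa, and aa, the stable frequency of A is t/(s + t). The parameter values in the example are hypothetical, not measured values for any human disease.

```python
import math


def recessive_balance_freq(mu: float, s: float) -> float:
    """Approximate mutation-selection balance frequency of a fully recessive allele: sqrt(mu/s)."""
    return math.sqrt(mu / s)


def mutation_rate_from_freq(q: float, s: float) -> float:
    """Invert the balance to estimate the mutation rate: mu ~= s * q**2."""
    return s * q * q


def overdominant_equilibrium(s: float, t: float) -> float:
    """Stable frequency of allele A when fitnesses are AA = 1-s, Aa = 1, aa = 1-t."""
    return t / (s + t)


def next_freq_overdominance(p: float, s: float, t: float) -> float:
    """One generation of selection on allele A with heterozygote advantage."""
    q = 1 - p
    w_bar = p * p * (1 - s) + 2 * p * q + q * q * (1 - t)   # mean fitness
    return p * (p * (1 - s) + q) / w_bar


if __name__ == "__main__":
    # Hypothetical recessive lethal (s = 1) observed at frequency q = 0.01.
    print("implied mutation rate:", mutation_rate_from_freq(0.01, 1.0))
    # Hypothetical overdominance: iterate selection and compare with t/(s+t).
    p = 0.5
    for _ in range(200):
        p = next_freq_overdominance(p, 0.2, 0.8)
    print("iterated p:", p, "predicted:", overdominant_equilibrium(0.2, 0.8))
```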


See also: Balanced Polymorphism; 
Gene Flow; Genetic Drift; Genetic Equilibrium; 
Hardy-Weinberg Law 
Gene insertion is the term for a gene that has been
altered by the insertion of extra DNA within it. In 
most cases this leads to loss of function. 
The term gene interaction is regrettably ambiguous,
being used with many different meanings in the scien- 
tific literature. Generally, it describes situations where 
the presence of two mutations in an organism leads 
to a phenotype that is different from what might be 
expected from either mutant phenotype alone. The 
mutations may affect the same gene (allelic interac- 
tions) or different genes. 

For allelic interactions (those affecting a single 
gene), one allele may be recessive, dominant, incom- 
pletely dominant or codominant to the other. If a gene 
has a wild-type allele + and a mutant allele x, with a 
mutant phenotype X in x/x homozygotes, then if the 
phenotype of the heterozygote x/+ is wild-type, x is 
recessive to wild-type. If the heterozygote x/+ has the 
phenotype X, then x is dominant to wild-type. If the 
phenotype is intermediate, then x is incompletely
dominant (also known as semidominant). If two
alleles x and y confer two distinguishable phenotypes 
X and Y, then they are said to be codominant if both 
phenotypes are seen in the heterozygote x/y. Other 
forms of interaction include overdominance, or hetero- 
zygote superiority. In the case of overdominance the 
fitness of a heterozygote, x/y, is higher than that of 
either homozygote, x/x or y/y. The reciprocal situ- 
ation, when the fitness of x/y is lower than that of 
either homozygote, is called underdominance. 
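The single-gene definitions above can be expressed as simple comparisons. This Python sketch assumes that the phenotype can be scored on one numeric scale and that fitnesses are known; the function names and tolerance are hypothetical. Codominance, which requires two qualitatively distinguishable phenotypes to appear together in x/y, is not captured by a single number and is left out.

```python
def classify_dominance(wt: float, het: float, mut: float, tol: float = 1e-9) -> str:
    """Classify mutant allele x relative to wild type from phenotypes of +/+, x/+ and x/x."""
    if abs(het - wt) <= tol:
        return "recessive"              # heterozygote looks wild-type
    if abs(het - mut) <= tol:
        return "dominant"               # heterozygote looks like x/x
    if min(wt, mut) < het < max(wt, mut):
        return "incompletely dominant"  # intermediate (semidominant)
    return "outside homozygote range"


def classify_fitness(w_xx: float, w_xy: float, w_yy: float) -> str:
    """Overdominance if x/y is fitter than both homozygotes, underdominance if less fit than both."""
    if w_xy > max(w_xx, w_yy):
        return "overdominance"
    if w_xy < min(w_xx, w_yy):
        return "underdominance"
    return "neither"


if __name__ == "__main__":
    print(classify_dominance(10.0, 6.0, 2.0))   # intermediate heterozygote
    print(classify_fitness(0.8, 1.0, 0.9))      # heterozygote superiority
```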

For interactions involving two genes, a variety of 
possibilities exist. These include suppression, epistasis 
and hypostasis, and synergy. In the case of suppres- 
sion, mutation of a second gene results in the ameli- 
oration of the phenotypic effects of mutation in the 
first gene, either partly or wholly to a wildtype pheno- 
type. In the case of epistasis, if two genes have distinct 
mutant phenotypes, then the double mutant exhibits 
only one of these phenotypes and the other is masked. 
If A is the gene with the masking phenotype, and B the 
gene with the masked phenotype, then in this situation 
gene A is said to be epistatic to gene B. The different 
term hypostasis, which is less frequently used, has the 
opposite meaning, so in this example gene B is hypo- 
static to gene A. The distinction between epistasis and 
suppression is that the suppressing mutation restores 
the wild-type phenotype, and may have no other 
phenotype of its own. 

Synergistic interactions are observed when the 
combination of two mutant genes results in a much 
more severe phenotype than either mutant alone. 
For example, the combination of two viable mutations 
may result in lethality. Synergy is often caused by 
redundancy in gene action, so that loss of function in 
either gene alone has little or no effect on a given 
process, but loss of both blocks the process com- 
pletely, and therefore has much more drastic conse- 
quences. 
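The two-gene categories can likewise be applied mechanically to single- and double-mutant phenotypes. In this Python sketch, phenotypes are labels, and a hypothetical severity ranking stands in for the text's "much more severe"; partial suppression is not modeled, and all names and labels are illustrative.

```python
def classify_double_mutant(ph_a: str, ph_b: str, ph_ab: str,
                           severity: dict, wild_type: str = "wild-type") -> str:
    """Classify the interaction of mutations in genes a and b from the phenotypes
    of the a single mutant, the b single mutant, and the ab double mutant.

    severity maps each phenotype label to a rank (0 = wild-type, larger = more severe).
    """
    # Suppression: the second mutation restores wild type and has no phenotype of its own.
    if ph_a != wild_type and ph_b == wild_type and ph_ab == wild_type:
        return "b suppresses a"
    if ph_b != wild_type and ph_a == wild_type and ph_ab == wild_type:
        return "a suppresses b"
    # Epistasis: two distinct mutant phenotypes, and the double mutant shows only one.
    if wild_type not in (ph_a, ph_b) and ph_a != ph_b:
        if ph_ab == ph_a:
            return "a is epistatic to b"
        if ph_ab == ph_b:
            return "b is epistatic to a"
    # Synergy: the double mutant is more severe than either single mutant.
    if severity[ph_ab] > max(severity[ph_a], severity[ph_b]):
        return "synergy"
    return "unclassified"


if __name__ == "__main__":
    sev = {"wild-type": 0, "short": 1, "long": 1, "lethal": 3}
    # Redundant genes: each single mutant is wild-type, the double mutant is lethal.
    print(classify_double_mutant("wild-type", "wild-type", "lethal", sev))
```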

The term gene interaction has also been used with 
reference to direct physical interactions between gene 
products, most usually protein-protein interactions, 
though protein-RNA, protein-DNA, RNA-RNA, 
and RNA-DNA interactions may also be encoun- 
tered. These physical interactions often involve high- 
affinity, stable binding, which can be easily detected 
by biochemical methods, but may alternatively involve 
transient phenomena, such as protein modification or 
cleavage. Weak or transient interactions may still be 
biologically important, and can often only be detected 
by genetic methods. Genetic approaches, however, 
cannot usually distinguish between direct and indirect 
interactions. It may be that the genetic analysis of two 
genes suggests strongly that their products interact, 
but this interaction may in fact be mediated by some 
additional factor or factors, so that the two products
never come into actual physical contact. The genetic
data can nevertheless provide evidence for involve- 
ment in the same pathway or process. 


See also: Alleles; Dominance; Epistasis; Recessive 
Inheritance; Suppression; Suppressor Mutations 


Gene Library 
A gene library is a collection of single cells (usually
bacteria), each of which has received a single segment 
of DNA usually carried by a plasmid, bacteriophage, 
or viral vector, the DNA segments having been 
derived from genomic DNA or cDNA. 


See also: Genomic Library 
The Detection of Recombination within Genes


In the sense intended here, gene mapping means the 
ordering of sites within genes. In the earlier history of 
genetics, the gene was considered to be a single indi- 
visible unit of mutation and recombination, but from 
the early 1950s onwards it became apparent that dif- 
ferent mutations in the same gene were nearly always 
at different sites and able to recombine to yield wild- 
type and, where they were looked for, doubly mutant 
genes. In eukaryotic organisms (e.g., fungi and Dros- 
ophila) the recombinants were generated in meiosis 
following a sexual cross between mutants. In bacteria, 
recombination occurred in the course of conjugation, 
transduction, or transformation, as the donor genomic 
fragment was integrated into the whole genome of the 
recipient cell. In bacteriophage it occurred during 
mixed infection of bacterial cells. 

The frequency of recombination within genes is 
low compared with that between genes. In the budding 
yeast Saccharomyces cerevisiae it may amount to a few 
percent of the meiotic products, but in most other 
eukaryotes favored by geneticists it is very much 
lower. Consequently, the use of recombination for
mapping within genes depends on some method for 
selecting wild-type recombinants from a large excess 
of nonrecombinant mutants. This is straightforward 
when the mutants have a growth handicap not shared 
by the wild-type, as when they have special nutritional 
requirements (i.e., are auxotrophs) or are sensitive to 
higher temperatures, or (in bacteriophage genetics) 
unable to grow in a particular host. 


Mapping by Recombination Frequency 


It is a general principle of linkage mapping that the 
frequency of recombination between sites of mutation 
increases with their distance apart. But the use of 
recombination frequency for mapping within genes 
is complicated by the fact that the sites being recom- 
bined are close to the recombination event, which is a 
complex process probably always involving local non- 
reciprocality. In fungi and Drosophila, which are the 
eukaryotic organisms most studied in this regard, 
much or most recombination between mutant sites 
in the same gene (usually detected as production of 
wild-types from intermutant crosses) is due to the 
nonreciprocal conversion of one mutant site to wild- 
type. The contribution of reciprocal crossing-over is 
greater when the mutant sites are relatively widely 
spaced. 

However, even if most recombination within genes 
is due to conversion and not to crossing-over, we still 
expect, and generally find, a strong correlation between 
recombination frequency and distance. The reason is 
that conversion involves tracts of DNA rather than 
single base pairs, and recombination between two sites 
by conversion will occur only when the conversion 
tract covers one but not the other, and this will ob- 
viously be less likely the closer the spacing of the sites. 

In practice, recombination frequency is a good gen- 
eral guide to gene mapping but sometimes gives 
ambiguous results. One source of confusion is that the 
nature of the mutational site may strongly influence its 
probability of conversion (see Gene Conversion and 
Mismatch Repair (Long/Short Patch)). This is called a 
marker effect. 


The Use of Flanking Markers 


To the extent that recombination between mutant sites 
within a gene is due to reciprocal crossing-over, it 
should result in recombination of genetic markers 
placed on either side of the gene. Thus, wild-type 
recombinants should be associated with one new flank- 
ing marker combination, and double-mutant recom- 
binants (if they are recoverable) with the reciprocal 
Figure 1 The use of flanking markers to determine the order of mutant sites within a gene. (A) The use of flanking
markers in the study of recombination within a gene. The two mutant strains being crossed (with mutational sites 1
and 2, with corresponding wild-type sites shown as +) are also distinguished by allelic differences A/a and B/b at
closely placed flanking loci. Wild-type (++) recombinants from the A 1 B × a 2 b cross can arise in the following ways:
(1) reciprocal crossing-over between the sites: all ++ recombinants will be a ++ B if the 1-2 order is as shown; (2) and
(3) conversion of either 1 or 2 to wild-type without crossing-over, conveying no information about the order of the
sites; (4) and (5) conversion of 1 or 2 to wild-type, with crossing-over in the interval adjacent to the conversion
event: all ++ recombinants will be a ++ B if the 1-2 order is as shown. If the conversion-associated crossover is on the
other side of the gene (shown as a dotted cross), the outcome will be A ++ b recombinants; this is usually a less
common event. Overall, therefore, the order A-1-2-B will be indicated by a predominance of a ++ B over A ++ b
products. (B) The use of one flanking marker in a transduction cross in bacteria. Mutant sites 1 and 2 are present in
the donor and recipient, respectively; A is a flanking marker, not subject to selection, present in the donor, as opposed
to a in the recipient. If the sites are arranged 1-2-A, integration of a donor phage-borne fragment to give
a ++ recombinant requires one exchange between 1 and 2 and another either (i) between 2 and A or (ii) to the right of A. If the
sites are the other way round, the exchange between 1 and 2 will always exclude A unless there are more than
two exchanges.


flanking marker combination (Figure 1). If the intra- 
genic recombination is due to conversion at one site 
without crossing-over, the flanking markers will 
retain their parental combinations and give no infor- 
mation about order within the gene. 

However, tetrad analysis in fungi, particularly 
in the budding yeast S. cerevisiae, shows that even 
though much, or sometimes nearly all, recombination 
within genes is due to conversion and not to reciprocal 
crossing-over, it is still, with a frequency which is often 
about 40% or 45%, associated with crossing-over 
between flanking markers. Most of the conversion- 
associated crossovers were found to be on the side of 
the gene where the conversion event had occurred and
could have been immediately adjacent to the conversion
tract (Fogel and Hurst, 1967). This result is consistent
with the hypothesis that conversions and crossovers 
have a common origin in hybrid DNA structures, 
formed by interaction between chromatids, that 
always involve local unilateral transfer of DNA (and 
hence gene conversion if the transferred segment 
happens to carry a distinguishing marker), but lead 
to crossing-over only in a certain proportion of cases 
(see Recombination, Models of). To the extent that 
conversion-associated crossovers really are adjacent 
tothe conversion tracts, the wild-type intragene recom- 
bination will be associated with the same crossover 
combination of flanking markers as they would have 
been had they originated by reciprocal crossing-over 
(Figure 1). In this case the relative frequencies of the 


two flanking marker recombinant classes will reveal 
the order of the sites within the gene, even if most or 
all intragenic recombination is due to gene conversion. 

In practice, flanking markers usually give a clear 
order of sites, though in the fungus Neurospora the 
data are often complicated by the occurrence of a 
substantial minority of wild-type recombinants with 
the ‘wrong way round’ flanking marker recombin- 
ation. In some cases, these ‘exceptions’ are too numer- 
ous for an unambiguous ordering of sites within the 
gene (Figure 1A).

This complication does not appear to arise in
Drosophila. In the best-analyzed case, the mapping
of sites within the rosy (ry) gene, which encodes the
enzyme xanthine dehydrogenase, only one of the two
flanking marker crossover combinations occurred in
the ry+ recombinants (Chovnick et al., 1971). In this
study the rare ry+ recombinants were selected through
their ability to survive on purine-containing medium,
which kills ry mutants.

The flanking marker principle has also been used in 
transduction experiments to order sites within bacterial 
genes, though here the gene being mapped has usually 
been flanked by only one marker. If the transductants 
are selected for intragenic recombination, the prob- 
ability of the donor flanking marker being included 
will depend on whether it is more closely linked to the 
selected or to the excluded site (Figure 1B).


Deletion Mapping 


Among any large collection of mutations within a 
particular gene some are likely to be due to deletions 
of gene sequence rather than to changes of single base 
pairs (point mutations). Deletion mutations can in 
general be distinguished from point mutations through 
their inability to back-mutate to wild-type. More 
definitively, they fail to give wild-type recombinants 
in crosses to sets of point mutations that are able to 
recombine with each other. Deletions provide the 
most unambiguous method of mapping within genes. 
The analysis proceeds in two steps. 

First, the deletions are arranged in a linear order 
defined by their overlaps. Nonoverlapping deletions 
can give wild-type recombinants when crossed, 
whereas overlapping deletions cannot. Then, when
the map of deletions has been established, the point 
mutations can all be placed in one or other of the 
segments defined by the deletion overlaps and non- 
overlaps. The ability to recombine with a deletion 
shows that the point mutation falls outside the deleted 
segment; conversely, the failure to recombine with a 
deletion shows that the point mutation falls within, or 
at least very close to, the deleted segment. The prin- 
ciple is explained in Figure 2. 
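The second step of deletion mapping can be sketched in Python. The sketch assumes the first step has already ordered the deletions left to right and that they are staggered (no deletion lies wholly inside another). Each point mutation is then characterized by the set of deletions with which it gives no wild-type recombinants, and mutations sharing that set fall in the same segment. The deletion names and cross results below are hypothetical, not the data of Figure 2.

```python
def map_point_mutations(deletion_order, no_recombinants):
    """Place point mutations into segments defined by deletion overlaps.

    deletion_order: deletion names already arranged left to right (step one).
    no_recombinants: dict mapping each point mutation to the set of deletions with
        which it gives NO wild-type recombinants.
    Returns (placed, unplaced): placed is a left-to-right list of
    (deletions covering the segment, mutations in it); unplaced lists mutations
    that recombine with every deletion.
    """
    index = {d: i for i, d in enumerate(deletion_order)}
    segments = {}
    unplaced = []
    for mut, failed in no_recombinants.items():
        if not failed:
            unplaced.append(mut)   # lies outside every deletion tested
            continue
        idx = sorted(index[d] for d in failed)
        lo, hi = idx[0], idx[-1]
        # With staggered deletions, a site can only be covered by a consecutive run.
        if idx != list(range(lo, hi + 1)):
            raise ValueError(f"{mut}: {sorted(failed)} not contiguous in the given order")
        segments.setdefault((lo, hi), []).append(mut)
    placed = [(tuple(deletion_order[lo:hi + 1]), sorted(muts))
              for (lo, hi), muts in sorted(segments.items())]
    return placed, sorted(unplaced)


if __name__ == "__main__":
    crosses = {"m4": {"d1"}, "m5": {"d1", "d2"}, "m6": {"d2"},
               "m7": {"d2", "d3"}, "m8": {"d3"}, "m9": set()}
    placed, unplaced = map_point_mutations(["d1", "d2", "d3"], crosses)
    for covering, muts in placed:
        print(covering, muts)
    print("outside all deletions:", unplaced)
```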
Figure 2 The principle of deletion mapping. (A) Of
eight mutations within a gene, 1 to 3 are deletions and
4-8 are ‘point’ mutations. Crosses between them in all
combinations yield either some wild-type recombinants 
(+) or none (—). (B) The results determine the order of 
the point mutations (above). 


The deletion method was first used for intragenic
mapping by Seymour Benzer (1959), whose fine- 
structure map of the bacteriophage T4 rII gene was a 
major factor in the demise of the doctrine of gene 
indivisibility. In yeast (S. cerevisiae) probably the 
best example has been the mapping by Sherman et al. 
(1975) of the CYC1 gene, which encodes the major 
cytochrome c protein. 


The Principle of Collinearity


As soon as it became clear that genes determined 
protein structure, it was an obvious hypothesis that 
the gene was a linear code for the sequence of amino 
acids in the protein polypeptide chain. The sequence 
of mutational sites within the gene, determined by one 
of the methods outlined above, should correspond to 
the order in the polypeptide chain of the amino acids 
changed by the mutations (the principle of collinearity;
the double-l is optional). This prediction was confirmed
wherever it was tested, firstly for the Escherichia
coli gene encoding the A subunit of tryptophan
synthetase (Yanofsky et al., 1964), and later in several 
other cases, including the yeast example mentioned 
above. 

Today the ordering of mutational sites by genetic 
crosses has been largely superseded by the direct 
determination of the DNA sequences in wild-type 
and mutants. The principle of collinearity has been 
confirmed by molecular methods in countless cases. 
Gene numbers of free-living organisms range from
about 1500, for the simplest bacteria, to probably 
more than 100000 in some higher eukaryotes, 
although the true upper limit is impossible to deter- 
mine at present. Parasitic organisms can survive with 
much smaller numbers of genes, so bacteriophage 
genomes may contain as few as four genes, in the 
case of some RNA bacteriophages, and mycoplasma 
(bacteria that can live only as intracellular parasites) 
have fewer than 500 genes. 

Gene numbers can only be known accurately for 
species with completely sequenced genomes, and even 
then good numbers can be hard to come up with. At 
the time of writing a rough draft of the human genome 
sequence has been completed, and estimates of the 
number of genes included in this total sequence 
still range from 30000 to 150000. Undoubtedly im- 
provements in sequencing and gene prediction will 
rapidly refine the estimates, but it will probably be 
a long time before the number is known to better
than ±5%.


There are many sources of difficulty in attempting 
to count genes in raw sequence data obtained from 
large eukaryotic genomes. A major problem is that 
exon prediction becomes more and more difficult as 
the size and number of introns increase. Similarly,
knowing where one gene ends and another begins 
may be very difficult. Genes may be embedded within 
the introns of other genes, or overlapped with them. 
Small genes may be missed, especially those that 
encode RNAs rather than proteins. All these factors 
will tend to lead to underestimates; conversely, failing 
to distinguish between functional genes and pseudo- 
genes will lead to overestimates. 

Gene counting in prokaryotes is much easier, since 
introns are usually absent and signals for translational 
initiation and termination are well defined. Some 
bacteria with large genomes, such as streptomycetes 
and myxobacteria, must have more genes than lower 
eukaryotes such as fungi. Complete sequences for 
the budding yeast Saccharomyces cerevisiae and the 
pathogenic bacterium Pseudomonas aeruginosa show 
that both have about 6000 protein-coding genes, so 
there is clearly overlap between the prokaryotic and 
eukaryotic worlds in this respect. The minimal eukary- 
otic gene set may contain as few as 4000 genes, though 
this is still much larger than the minimal prokaryotic 
set. 

A conspicuous failing in the classical genetic analy- 
sis of eukaryotes has been the consistent underesti- 
mates of true gene number. For the two best-studied 
examples, the fruit fly Drosophila melanogaster and 
the nematode Caenorhabditis elegans, predictions of 
gene number derived from genetic studies were low by 
factors of at least two. Observations on banding pat- 
terns on the polytene chromosomes of Drosophila 
created a longstanding bias. Saturation mutagenesis 
of some regions of the fly genome suggested an exact 
correspondence between the number of polytene 
bands and the number of essential genes (as defined 
by lethal mutations), and led to a prediction of about 
5000 genes in all. In hindsight, it is clear that the 
correspondence between bands and essential genes is 
no more than an unfortunate coincidence, and the 
current gene number inferred from genome sequen- 
cing is much higher, about 13 600. Surprisingly, this is 
a lower number than the estimate for C. elegans (about 
19000 protein-coding genes), and lower yet than the 
estimate for Arabidopsis (about 25 000 genes), although 
Drosophila is apparently more complex, in terms of
cell types and anatomical detail, than worm or weed.
The apparent paradox can be explained by larger gene
families and more extensive gene duplication in
C. elegans and Arabidopsis. Also, genes in Drosophila
may be more complex, undergoing more alternative
splicing and therefore generating a greater variety of
final proteins. Total gene number therefore should not
be regarded as a very useful or informative genomic property.


See also: Genome Organization 
The genetic information encoded in all the genes of a
population or species at one time comprises the gene 
pool from which the genes of the next generation are 
derived. The gene pool determines the genetic charac- 
teristics that future generations will have, except to the 
extent that the genetic information currently present is 
altered by mutation in the production of gametes. The 
characteristics of the gene pool determine how readily 
a population can respond to natural selection. 

According to the classical view of population struc- 
ture, as exemplified in the writings of H. J. Muller, 
individuals in a population are homozygous for a 
single wild-type allele at almost every locus. Rare 
alleles are maintained only by continual mutation, 
because they are unconditionally deleterious. Because 
the population is nearly uniform genetically, the power 
of natural selection to provoke a response is quite 
limited. The rate at which adaptation can occur is limit- 
ed by the rate at which favorable mutations arise. 

According to the balance view of population struc- 
ture, as exemplified in work by Th. Dobzhansky, the 
gene pool consists of several to many alleles at many 
loci. Balancing selection, either in the form of hetero- 
zygote advantage or negative frequency-dependent 
selection, is presumed to be responsible for maintain- 
ing large amounts of genetic variability at many loci. 
Because the population is highly variable, natural 
selection can provoke a dramatic response, and the 
rate at which adaptation occurs is not limited by the 
rate at which favorable mutations accumulate, at least 
in the short run. 

The concept of a gene pool is not restricted to 
populations that are panmictic. It will often take 
more generations for particular gene combinations to 
be formed in a population that is inbreeding or divided 
into geographically distinct subpopulations than in 
one that is panmictic. Nonetheless, those gene com- 
binations will eventually be formed. Once formed 
they will persist longer in a population with inbreed- 
ing or geographical structure than in one that is 
panmictic. 
In a panmictic population, genotypes at each locus
will be found in approximately Hardy-Weinberg pro- 
portions, unless genotypes differ substantially in their 
abilities to survive and reproduce. In an inbred popu- 
lation, heterozygotes will be less common and homo- 
zygotes will be more common than in a panmictic 
population with the same allele frequencies. If inbred 
and outbred populations have the same amount of 
genetic diversity in terms of the numbers and types 
of alleles at each locus, the allelic composition of the 
two gene pools is equivalent. The genotypic compos- 
ition of the two gene pools will, however, be different. 
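The contrast between panmictic and inbred gene pools can be expressed with Wright's inbreeding coefficient F. This is a standard parameter, but the text itself does not introduce it. With F, the genotype frequencies become p² + Fpq, 2pq(1 − F), and q² + Fpq. The sketch below computes these frequencies and confirms that the allele frequency stays the same. The allelic composition of the gene pool is therefore unchanged.

```python
def genotype_frequencies(p: float, F: float = 0.0) -> dict[str, float]:
    """Expected genotype frequencies at a two-allele locus.

    p is the frequency of allele A; F is Wright's inbreeding coefficient
    (F = 0 gives Hardy-Weinberg proportions for a panmictic population).
    """
    q = 1.0 - p
    return {
        "AA": p * p + F * p * q,
        "Aa": 2 * p * q * (1.0 - F),
        "aa": q * q + F * p * q,
    }


for F in (0.0, 0.5):
    g = genotype_frequencies(0.5, F)
    freq_A = g["AA"] + g["Aa"] / 2  # allele frequency recovered from genotypes
    print(f"F={F}: {g}, freq(A)={freq_A}")
```

At F = 0.5, heterozygotes fall from 50% to 25%, while the frequency of A stays at 0.5.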


See also: Balanced Polymorphism; Demes; 
Dobzhansky, Theodosius; Hardy-Weinberg Law; 
Heterozygote and Heterozygosis; Natural 
Selection; Panmixis 
The product of a gene is the protein or the RNA which
it encodes. The vast majority of genes encode proteins. 
For instance, the bacterium Escherichia coli has 4288 
possible protein-encoding genes, representing 87.8% 
of the chromosome, while only 0.8% of the genome 
encodes RNA as its final product. Messenger RNAs 
(mRNAs), or in eukaryotes the RNAs which are pre- 
cursors to messenger RNAs, are informational inter- 
mediates in protein synthesis (translation). Since they 
are not the ultimate product of the gene, mRNAs are 
not included in lists of gene products. The RNAs that 
are included are the ribosomal RNAs, the transfer 
RNAs, and other stable RNAs. In prokaryotes these 
include 4.5S RNA, 10S RNA, and the RNA component
of RNase P, while in eukaryotes there are large
numbers of such small RNAs. 

Although they can be considered the products of 
transcription and/or translation, most gene products 
undergo one or more processing steps before they 
reach their final form. This is almost universally true 
for the stable RNAs, which are cut from longer pre- 
cursors and/or require modifications of one or more 
bases. Posttranslational processing of proteins is also 
common, from rather minor changes such as the 
removal of the initiating methionine to much more 
complex processing and modification steps. 


See also: Coding Sequences; Transcription; 
Translation 
The entire pool of genetic information (DNA) in an
organism is referred to as the organism’s genome. This 
genetic information is organized into units called genes 
and the physical location of a gene within a genome is 
called a locus. In general, the position of a gene within 
a genome is fixed. However, in some cases a gene may be
moved from one physical location to another. Such gene 
rearrangements can contribute to several important 
processes, including the regulation of gene expression, 
generation of diversity in a population, generation 
of diversity in proteins, and cellular differentiation. 
Sometimes gene rearrangements can be harmful and 
may lead to inherited disease. On an evolutionary time 
scale, DNA rearrangements can produce gene
duplications, giving rise to repetitive DNA elements,
pseudogenes, and gene superfamilies.


Transposable Genetic Elements 


Transposons are small pieces of DNA (500-1500 bp 
long) capable of moving themselves from one place to 
another within a genome. These mobile genetic elem- 
ents were first recognized in maize (corn), but are now 
known to be present in essentially all organisms. In the 
fruit fly, Drosophila melanogaster, transposons may 
constitute as much as 10% of the entire genome! 
Transposons usually have repetitive DNA sequences 
at each end to facilitate their excision from the genome, 
and include a gene for the enzyme (transposase) that 
catalyzes excision. Once excised, transposons reenter 
the genome at random positions and usually do not 
disrupt the general architecture of the genome. How- 
ever, transposons often have dramatic effects on gene 
expression and may cause deleterious gene rearrange- 
ments if their integration disrupts important regu- 
latory or protein coding sequences, or if pieces of the 
genome surrounding the transposon are inadvertently 
deleted during transposon excision. 
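A toy model of cut-and-paste transposition can help fix the idea of excision followed by reinsertion elsewhere. The model makes several assumptions. First, it assumes that the end sequences are inverted repeats, as they are for many (but not all) transposons. Second, the function name `cut_and_paste` and the short sequences are hypothetical. Third, real features such as target-site duplications are ignored.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA string (A/C/G/T only)."""
    return seq.translate(COMPLEMENT)[::-1]


def cut_and_paste(genome: str, end_repeat: str, target: int) -> str:
    """Excise one transposon and reinsert it at `target` (toy model).

    The element is assumed to start with `end_repeat` and end with its
    reverse complement (terminal inverted repeats). `target` is a position
    in the genome after excision. Target-site duplications are ignored.
    """
    start = genome.find(end_repeat)
    if start < 0:
        raise ValueError("left end of element not found")
    right = genome.find(reverse_complement(end_repeat), start + len(end_repeat))
    if right < 0:
        raise ValueError("right end of element not found")
    stop = right + len(end_repeat)
    element = genome[start:stop]
    donor = genome[:start] + genome[stop:]  # excision closes the gap
    return donor[:target] + element + donor[target:]


genome = "AAAA" + "GGATC" + "TTTT" + "GATCC" + "CCCCGGGG"
print(cut_and_paste(genome, "GGATC", target=8))
```

The element moves as an intact unit and the rest of the sequence is otherwise unchanged. This mirrors the point that transposition usually leaves the general architecture of the genome intact.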


Regulation of Mating Type in Fungi 


Haploid cells of the budding yeast Saccharomyces
cerevisiae are able to repeatedly switch between
two alternate mating types, a and α. The choice
between these two mating types is determined by the
identity of the gene in the mating-type (MAT) locus.
Cells with a MATa gene in the MAT locus become
mating type a, while those with a MATα gene in the
MAT locus become mating type α. Each yeast cell
harbors an unexpressed (“silent”) copy of the a gene
and the α gene. These silent genes are located at the
HMR and HML loci, 100-200 kb away from the MAT
locus. The silent a and α genes are never expressed in
wild-type cells. When a yeast cell switches between 
mating types, the active gene at the MAT locus is 
removed and replaced with a duplicate version of 
one of the silent mating type genes from either the 
HMR or HML locus. Once placed into the MAT 
locus, the newly duplicated mating-type gene is 
turned on and cellular differentiation proceeds, gen- 
erating a cell of the new mating type. Rearrangement 
of the yeast mating type genes occurs over a distance 
of 100-200 kb of DNA and is controlled by stringent 
regulatory mechanisms that are coordinated with cell 
division. Similar gene rearrangements control mating- 
type switching in other fungi. 
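In yeast mating-type switching, the expressed MAT gene is overwritten by a copy of a silent donor, while the donors themselves stay intact. This logic can be sketched as a small data model. The class and function names are hypothetical. The sketch places the α gene at HML and the a gene at HMR; this reflects the usual wild-type arrangement and is an assumption, not something stated above.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class MatingLoci:
    """Toy model of the three mating-type loci of a haploid yeast cell."""
    hml: str  # silent donor copy
    mat: str  # the only expressed locus; its gene sets the mating type
    hmr: str  # silent donor copy

    def mating_type(self) -> str:
        return self.mat


def switch(loci: MatingLoci, donor: str) -> MatingLoci:
    """Overwrite MAT with a duplicate of the gene at the chosen silent donor.

    The donor loci are copied, not moved, so they stay intact and the
    cell can switch again later.
    """
    donors = {"HML": loci.hml, "HMR": loci.hmr}
    return replace(loci, mat=donors[donor])


# Assumed wild-type arrangement: alpha at HML, a at HMR; this cell is mating type a.
cell = MatingLoci(hml="alpha", mat="a", hmr="a")
switched = switch(cell, "HML")
print(switched.mating_type())  # alpha
```

Because switching copies rather than moves the donor gene, a cell can switch back and forth repeatedly, as the text describes.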


Antigenic Variation in African 
Trypanosomes 


African trypanosomes are eukaryotic pathogens that
infect a wide variety of mammals, including humans. 
In order to disguise themselves from the mamma- 
lian immune system, these single-celled parasites 
periodically change the identity of their major surface 
glycoprotein antigen, a process known as ‘antigenic 
variation.’ Although trypanosomes contain several 
hundred genes for these variant surface glycoproteins 
(VSGs) scattered throughout their genome, only one 
VSG gene is expressed at any given time. Expres- 
sion of the active VSG gene occurs exclusively at a 
telomere-linked ‘expression site,’ while silent VSG 
genes can be located internally in the genome, or at 
inactive telomere expression sites. Three types of gene 
rearrangements are generally associated with the acti- 
vation of a silent VSG gene. First, a duplicated copy of 
a silent VSG gene may be transposed into the active 
expression site, displacing the previously active VSG 
gene. This duplicative transposition may include all or 
part of the VSG gene. A variation of this type of gene 
rearrangement is a telomere conversion whereby the 
active telomeric region containing the expressed VSG 
gene is completely replaced with a duplicated copy 
of a silent telomeric region. Finally, two telomeres 
may undergo a reciprocal exchange, activating one 
gene and inactivating the other. These dramatic 
gene rearrangements occur over a distance of 100 kb
or more, and may even occur between different
chromosomes. Similar gene rearrangements are
responsible for antigenic variation in other microbial
pathogens.


Figure 1 Simplified diagram of antibody gene rearrangements, showing the starting and final DNA arrangements. The boxes labeled ‘V,’ ‘D,’ ‘J,’ and ‘C’ represent the four DNA segments of the gene for an antibody heavy-chain subunit. Rearrangement of these gene segments is necessary to produce a functional gene, as shown in the final DNA arrangement.


Generation of Diversity in the 
Vertebrate Immune System 


Perhaps the most elaborate example of gene rearrange- 
ment in eukaryotes occurs during assembly of the 
genes for antigen-recognizing molecules of the verte- 
brate immune system. Invading pathogens are recog- 
nized as foreign by the immune system on the basis 
of the structures of their pathogen-specific macro- 
molecules (proteins, carbohydrates, and lipids). This 
recognition is mediated by two groups of specialized 
proteins of the immune system called ‘antibodies’ and 
‘T-cell antigen receptors.’ Pathogen molecules that
are recognized by antibodies and T-cell receptors are 
collectively referred to as ‘antigens,’ since antibodies 
are generated in response to them. Any given pathogen 
is composed of its own unique set of tens to thousands 
of antigens. Hence, in order to recognize and respond 
to all potential pathogens, the immune system must 
have an extremely large repertoire of antibodies and 
T-cell receptors. Indeed, it is estimated that the human 
immune system has the capacity to produce as many as 
100 billion different antibody molecules! There is a 
similarly large pool of variant T-cell receptor
molecules. Since the human genome is estimated to
contain 30,000-40,000 protein-coding genes, there is not
enough genetic material for every different antibody 
and T-cell receptor to be derived from its own, indi- 
vidual gene. How then, is this great diversity gener- 
ated? It turns out that vertebrate cells employ a 
complex and highly regulated series of gene rearrange- 
ments to generate variant antibodies and T-cell recep- 
tors from a relatively small number of variant gene 
segments. 

Antibodies are built from heavy-chain and light-chain
subunits; a typical antibody molecule contains two
identical heavy chains and two identical light chains. The
genes for each antibody subunit are not present in 
the genome as single, contiguous units. Rather, each
subunit gene is arranged as a linear array of fragmented
segments (V, D, J, and C in Figure 1), each encoding a
different part of the antibody subunit molecule. In this 
initial DNA arrangement, a functional antibody is not 
produced. Instead, the fragmented gene segments are 
first repositioned to generate, in the final DNA 
arrangement, a single gene that encodes one complete 
subunit of the antibody. This shuffling of gene seg- 
ments is accomplished through an ordered series of 
highly regulated gene rearrangements. 

For each type of gene segment, the genome carries
multiple variants that can be mixed and matched with
variants of the other types to construct a complete antibody subunit gene. Each
antibody-producing cell undergoes antibody gene 
rearrangements independently of other antibody- 
producing cells. This process results in a pool of anti- 
body genes with an overall diversity that is several 
million times greater than the diversity of the original 
pool of variant gene segments, and is referred to as 
‘combinatorial diversity.’ Antibody gene rearrange- 
ments are primarily considered in the context of 
their ability to generate diversity in antibody proteins. 
However, these gene rearrangements also cause acti- 
vation of important regulatory elements (called 
‘promoters’ and ‘enhancers’) that control antibody 
gene expression. Similar gene rearrangements are 
responsible for generating diversity in variant gene 
segments that encode T-cell receptor subunits. Thus, 
regulated gene rearrangements are used by the 
immune system to generate immense diversity in anti- 
bodies and T-cell receptors, which in turn are neces- 
sary to effectively combat infection by microbial 
pathogens. 
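Combinatorial diversity follows a product rule. If a cell picks one segment of each type independently, the number of distinct genes is the product of the segment counts. Any heavy chain can then pair with any light chain. The segment counts below are rough, commonly quoted figures for human heavy and kappa light chains; they are used only as illustrative assumptions.

```python
from math import prod


def combinations(segment_counts: dict[str, int]) -> int:
    """Distinct genes formed by choosing one segment of each type independently."""
    return prod(segment_counts.values())


# Rough segment counts for human antibody genes; illustrative assumptions only.
heavy = {"V": 40, "D": 23, "J": 6}
kappa_light = {"V": 40, "J": 5}

heavy_genes = combinations(heavy)        # 40 * 23 * 6
light_genes = combinations(kappa_light)  # 40 * 5
pairs = heavy_genes * light_genes        # any heavy chain may pair with any light chain
segments_used = sum(heavy.values()) + sum(kappa_light.values())

print(f"{segments_used} segments -> {heavy_genes} heavy x {light_genes} light = {pairs:,} pairings")
```

These inputs give 5,520 heavy-chain and 200 light-chain combinations, or about 1.1 million pairings from 114 segments. That is still far below the 100 billion figure quoted above. Mechanisms not covered in this entry, such as imprecise joining at the segment junctions, are generally credited with much of the remainder.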
Gene rearrangements have occurred in prokaryotes
since the dawn of unicellular life. They play a critical 
role in bacterial evolution. The remarkable plasticity 
of bacterial genomes has been revealed by the compari- 
son of restriction maps and, more recently, the com- 
parison of whole genome DNA sequences. We find, in 
the latter, evidence for lateral gene transfer between 
species as well as long- and short-range rearrange- 
ments within strains of the same species. The recom- 
bination events leading to these rearrangements are 
relatively rare, so that most genetic experiments con- 
ducted on the time-scale of years result in a unique 
physical map for a given strain. However, there are 
several examples of rearrangements of bacterial genes 
that occur on a much shorter time-scale, minutes or 
seconds; these are described below. 

Some contemporaneous rearrangements are 
stochastic: they occur at a given frequency throughout 
vegetative growth, providing variant phenotypes 
available for selection when the need arises. Others 
are developmentally regulated, responding to envir- 
onmental cues to provide new proteins for the devel- 
opmental program. 


Stochastic Rearrangements 


The general idea of a stochastic rearrangement is to 
provide a new promoter for a gene encoding a struc- 
tural protein or an enzyme. A classical example is 
the phenomenon of phase variation in Salmonella, 
observed originally in the 1920s and studied further 
in the 1950s. Certain strains of Salmonella can switch 
from a form expressing one flagellar antigen (H1) to a
form expressing a different flagellar antigen (H2) and 


then back. This switch, or phase variation, corres- 
ponds at the molecular level to the inversion of a DNA 
segment by a site-specific recombination enzyme. The 
inverted segment carries a promoter such that, in one 
orientation, it drives the transcription of a gene en- 
coding the H2 flagellar protein and a repressor of 
transcription of the distant H1 antigen gene. In the 
opposite orientation, the promoter points the ‘wrong’ 
way, preventing transcription of both the H2 gene and 
the repressor of H1. Thus, transcription of H1 occurs 
and the phase is switched. Flipping of the promoter 
segment is accomplished by a site-specific recombin- 
ase operating on short inverted repeat sequences at the 
ends of the segment, which also encodes the recombin- 
ase. This antigenic variation occurs in about one cell per 
thousand per generation, allowing the population as a 
whole to survive antibody directed against one or the 
other flagellar antigen. 
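The Salmonella switch has two parts that are easy to model separately:

- which genes each orientation of the invertible segment turns on, and
- how an inversion rate of about one per thousand cells per generation seeds a population with the alternative phase.

The sketch assumes that inversion occurs at the same rate in both directions and that neither phase grows faster. Neither assumption comes from the text. The function names and the "ON"/"OFF" labels are hypothetical.

```python
def expressed_genes(orientation: str) -> set[str]:
    """Flagellar genes transcribed for each orientation of the invertible segment.

    "ON": the segment's promoter drives H2 and the repressor of H1, so H1 is off.
    "OFF": the promoter points the other way; without the repressor, H1 is made.
    """
    if orientation == "ON":
        return {"H2", "H1_repressor"}
    if orientation == "OFF":
        return {"H1"}
    raise ValueError(f"unknown orientation: {orientation!r}")


def h2_fraction(generations: int, rate: float = 1e-3) -> float:
    """Fraction of cells in the H2 phase, starting from an all-H1 culture.

    Each generation, a fraction `rate` of the cells in each phase inverts the segment.
    """
    f = 0.0
    for _ in range(generations):
        f = f * (1.0 - rate) + (1.0 - f) * rate
    return f


print(expressed_genes("OFF"), round(h2_fraction(10), 4))
```

After ten generations, about 1% of an initially uniform culture has switched phase. That is enough for some cells to survive antibody directed against H1.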

A variation of this theme in Escherichia coli has the 
promoter for the fimA gene, encoding the structural 
protein for type 1 fimbriae, alone on the invertible
segment. Two recombinases (FimB and FimE) are 
each encoded nearby. Inversion of the promoter- 
containing segment by FimB results in transcription 
of the fimA gene, while the FimE recombinase flips 
the promoter, shutting off transcription of fimA. 
Other proteins, such as IHF, play a role in this in- 
version. Fimbriae are important in virulence, mediat- 
ing the attachment of E. coli to epithelial and other 
human cells. 

Other rearrangements occur as a consequence of re- 
combination between repeated sequences in the gen- 
ome, catalyzed by the general recombination system, 
or by the movement of elements that encode site- 
specific recombinases that catalyze their own trans- 
position. Any pair of repeated sequences can be found 
in two different relative orientations: direct or inverted. 
Recombination between two identical sequences in in- 
verted orientation results in inversion of the entire DNA 
segment between the recombining repeated elements. 
One such event involves the genes encoding ribosomal 
RNA in E. coli, two of which flank the origin of 
chromosome replication (ORI), oriented away from 
the ORI. Transcription seems to be more efficient, 
for any gene, if it is oriented in the same direction 
as DNA replication. Recombination between the
inverted rRNA operons flanking ORI does not, of course,
change the direction of transcription relative to ORI, 
since DNA replication is bidirectional, but neverthe- 
less growth of cells having one arrangement is slightly 
faster than growth of cells with the other arrangement. 
General recombination between the rRNA operons 
flips the ORI in a few cells per thousand in each gener- 
ation. Under normal circumstances, the rearranged 
chromosomes are lost because the cells containing 
them grow more slowly than their predecessors. 
However, if a selectable gene is inserted such that it 
can be transcribed only in the less preferred orienta- 
tion of the rRNA operons, that orientation can be 
selected and maintained. Relaxation of selection 
results in repopulation of the culture with the more 
preferred orientation. 

Recombination between two identical sequences in 
direct orientation results in deletion, rather than inver- 
sion, of the intervening sequences. Most bacterial 
genomes (Bacillus subtilis is a notable exception)
contain substantial numbers of genetic elements called
‘insertion sequences,’ usually about 1 kb long,
containing an open reading frame encoding a site-specific
recombinase (transposase) flanked by short (up to 
40 bp) sequences themselves in inverted repeat orien- 
tation. Expression of the encoded transposase results 
in flipping of the entire insertion sequence at the same 
locus. But general recombination can occur between 
two copies of the same insertion sequence at different 
chromosomal locations, again leading to either long- 
range inversions or deletions. If the deleted DNA 
segment contains an essential gene, the cell in which 
that rearrangement occurred will die. Insertion se- 
quences are also responsible for recombination events 
between chromosomes and plasmids, such as the 
insertion of the F plasmid into the chromosome of 
F⁺ E. coli, generating the high-frequency conjugating
strains called Hfr. Plasmid-plasmid recombination to
form cointegrates also occurs via insertion sequences. 
Finally, DNA segments flanked by two identical or 
nearly identical insertion sequences, in direct or 
inverted orientation, are capable of transposition 
from one chromosomal location to another as a unit. 
Such ‘transposons’ are also responsible for large-scale 
genome reorganization in bacteria. 
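The orientation of a repeat pair decides the outcome of recombination between the copies: inverted repeats give an inversion, and direct repeats give a deletion. This rule can be checked on toy sequences. The sketch treats a single crossover between two exact copies as a string operation. The sequences and the function name `recombine` are hypothetical. Palindromic repeats, whose orientation is ambiguous, are rejected.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA string (A/C/G/T only)."""
    return seq.translate(COMPLEMENT)[::-1]


def recombine(chromosome: str, repeat: str) -> str:
    """Result of one crossover between the first two copies of `repeat` (toy model).

    A second copy in direct orientation gives a deletion: one copy of the repeat
    stays and the intervening DNA is lost. A second copy in inverted orientation
    (the reverse complement) gives an inversion of the DNA between the copies.
    """
    if repeat == revcomp(repeat):
        raise ValueError("palindromic repeat: orientation is ambiguous")
    first = chromosome.find(repeat)
    if first < 0:
        raise ValueError("repeat not found")
    after = first + len(repeat)
    direct = chromosome.find(repeat, after)
    inverted = chromosome.find(revcomp(repeat), after)
    if direct >= 0 and (inverted < 0 or direct < inverted):
        return chromosome[:after] + chromosome[direct + len(repeat):]
    if inverted >= 0:
        return chromosome[:after] + revcomp(chromosome[after:inverted]) + chromosome[inverted:]
    raise ValueError("no second copy of the repeat")


print(recombine("GGAACGTTTCCCAACGGG", "AACG"))  # direct repeats -> GGAACGGG
print(recombine("GGAACGTTTCCCCGTTGG", "AACG"))  # inverted repeats -> GGAACGGGGAAACGTTGG
```

In the deletion case, the lost segment would leave as a circle carrying the other repeat copy. If that segment held an essential gene, the cell would not survive.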


Developmentally Regulated Gene 
Rearrangements in Bacteria 


Recombination between two directly repeated DNA 
elements leads to deletion of the DNA between the 
elements. Several such events have been described in 
connection with specific developmental programs 
in bacteria: induction of bacteriophage lysogens, 
differentiation of nitrogen-fixing heterocysts in cyano- 
bacteria, and sporulation in bacilli. 

The E. coli bacteriophage lambda has two alterna- 
tive life styles. Upon infection of a naive cell, the linear 
viral DNA is circularized. It then chooses between 
replication, leading to lysis of its host accompanied 
by release of several hundred progeny virus particles, 
or integration by site-specific recombination between 
a special site (attP) on the viral chromosome and a 
corresponding site (attB) on the host chromosome. 
There it is content to rest and be replicated once
each generation by the host’s DNA replication 
machinery. If the host is endangered by any one of 
several insults, the viral DNA is excised from the 
chromosome by reversal of the recombination events 
that inserted it originally. It then replicates, eventually 
yielding several hundred virus particles, as in the lytic 
cycle. The excision of lambda DNA from the chromo- 
some of a lysogen is perhaps the best-studied example 
of a developmentally regulated gene rearrangement 
in bacteria. All of the enzymes and participating 
protein factors have been purified and the role of 
each nucleotide in the insertion and excision sites has 
been determined in vitro. 

Regulation of this rearrangement is essentially 
negative. That is, the inserted viral chromosome 
expresses one gene, yielding a repressor protein that 
effectively blocks transcription of every viral gene 
except its own. Insults to the host cell result in activa- 
tion of a protease that cleaves the repressor protein, 
leading to expression of viral genes encoding the exci- 
sion recombinase and DNA replication proteins; the 
pathway to virus production and cell lysis described 
above is then followed. Induction of this lytic path- 
way requires relief of repression in the lysogen. 
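Negative regulation of prophage excision can be captured as a tiny decision rule. The genes for excision and replication are on unless the repressor blocks them, and a host insult removes the repressor. The gene labels and function names below are hypothetical placeholders, not lambda's actual gene names. The model ignores everything else the text describes.

```python
# Viral genes that the repressor keeps silent in a lysogen (placeholder labels).
REPRESSED_GENES = frozenset({"excision_recombinase", "replication_proteins"})


def derepressed_genes(repressor_intact: bool) -> frozenset[str]:
    """Genes released from repression; none while the repressor is intact."""
    return frozenset() if repressor_intact else REPRESSED_GENES


def prophage_fate(host_insulted: bool) -> str:
    """Decide what an integrated prophage does after a possible host insult."""
    # An insult activates a protease that cleaves the repressor.
    repressor_intact = not host_insulted
    if "excision_recombinase" in derepressed_genes(repressor_intact):
        return "lytic"    # excise, replicate, lyse the host
    return "lysogen"      # stay integrated, replicated with the host chromosome


print(prophage_fate(False), prophage_fate(True))
```

The default state of the controlled genes is "on", so removing a single protein is enough to trigger the lytic pathway. This is what distinguishes negative from positive control.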

Such negative regulation has not been detected yet 
in the case of cyanobacterial heterocyst differentiation.
Cyanobacteria are oxygenic photosynthetic
bacteria; that is, they carry out green plant photo- 
synthesis, evolving oxygen in the light. Although 
some cyanobacteria can utilize fructose or glucose as 
a carbon source, most known species cannot do so, but 
rather are obligate phototrophs dependent upon light 
and the fixation of CO2 for their reduced carbon.
Some species, such as Anabaena, grow in filaments 
of several hundred cells, indistinguishable from one 
another as long as a good source of reduced nitrogen 
(ammonia or nitrate) is available. Deprived of such a 
source, Anabaena differentiates cells specialized for 
nitrogen fixation along each filament, usually spaced 
about ten cells apart (Figure 1). The undifferentiated
vegetative cells continue to fix CO2 and to generate
O2. The specialized cells, called heterocysts, are
anaerobic factories for nitrogen fixation, the reduction of
atmospheric nitrogen gas to ammonia. 

The patterned conversion of a dividing, oxygen- 
evolving vegetative cell to an anaerobic, nitrogen-fixing 
heterocyst requires the orderly expression of many 
genes, up to 20% of the 7300 genes in the Anabaena 
genome. Among the genes needed in the heterocyst 
are those encoding the machinery for nitrogen fix- 
ation, including the polypeptides of the nitrogenase 
complex. These are organized in an operon, nifHDK, 
encoding the protein called dinitrogenase reductase 
(NifH) and the two subunits of dinitrogenase (NifD
and NifK). Most strains of Anabaena contain an 11-kb
DNA element interrupting the nifD gene in vegetative
cell DNA. During heterocyst differentiation, and
only during differentiation, the 11-kb element is
excised by a site-specific recombinase, acting on
directly repeated sequences at the ends of the element.
The resulting circular element does not replicate or
reinsert during the limited life of the heterocyst. These
nitrogen-fixing cells do not divide. Eventually they
die or are diluted out by growth of the vegetative
cells if a new supply of reduced nitrogen is found.
Under nitrogen-fixing conditions, the vegetative cells
can grow by virtue of amino acids supplied directly to
them by the heterocysts. Continued differentiation
of heterocysts, halfway between existing heterocysts
once each vegetative cell generation, maintains the
spacing pattern.


Figure 1 (See Plate 19) Filaments of the cyanobacterium Anabaena 77 h after transfer to nitrogen-free medium. Nitrogen-fixing heterocysts have differentiated at regular intervals along each filament. The image shown is a composite of a fluorescence image showing the location of green fluorescent protein expressed from the promoter of the hetR gene and a DIC image that outlines the cells. The HetR protein is required for heterocyst differentiation. It is expressed early in the differentiation of only those cells destined to develop.

Excision of the 11-kb element in differentiating 
heterocysts is catalyzed by a site-specific recombinase 
encoded by the xisA gene, located within the 11-kb 
element. The repeated sequences at the ends of the 
11-kb element at which recombination occurs have 
the same feature as those of the bacteriophage lambda 
attachment site: a fully conserved core flanked on both 
sides by regions of partial sequence identity. In these 
respects, the 11-kb element looks like a remnant of 
a bacterial virus chromosome, but lacking genes for
head and tail components. The element seems not to
contribute materially to Anabaena vegetative cell life,
because cells cured of the element grow as well as
wild-type cells in medium containing ammonia and
they differentiate and fix nitrogen normally.


5'nifH...nifD..GGCA----T-C---GCCTCATTAGG-----CAC-AA----C..nifD.....nifK.
5'nifB...fdxN..T-G-----A-T-TATTC-AGAA-TTT-C---A..fdxN.....nifS...nifU.
5'hupL..G----CACAGCAGTTATATGG-------T---G-A..hupL.

Figure 2 Nucleotide sequences involved in the excisions that occur during cyanobacterial heterocyst differentiation. In each case, the sequences shown are repeated directly, separated by, from top to bottom, 11 kb, 55 kb, and 10.5 kb, respectively. Recombination, catalyzed by a recombinase encoded within the excised element, occurs within the bold-faced sequence, resulting in the excision of circular elements of the size mentioned. Dashes represent nucleotides that differ in the two copies of the sequence prior to excision. Plain capitals represent nucleotides that are conserved around both copies of the repeated sequences. The recombinases that catalyze the nifD and hupL rearrangements are related proteins. The fdxN recombinase is unrelated to these but is related to the enzyme that excises the skin element during Bacillus subtilis sporulation.
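Excision between direct repeats, as for the 11-kb nifD element, yields two products. One is a rejoined chromosome that keeps one copy of the repeated core; the other is a circle that carries the other copy. The sketch below shows this on short made-up sequences. Everything in it is an illustrative assumption, not the real Anabaena sequences: the function `excise`, the sequences themselves, and the choice to write the circle as a linear string starting at its core.

```python
def excise(chromosome: str, core: str) -> tuple[str, str]:
    """Site-specific excision between two direct copies of `core` (toy model).

    Returns (rejoined chromosome, circular element). The chromosome keeps one
    copy of the core; the circle, written as a linear string starting at its
    core, carries the other copy.
    """
    first = chromosome.find(core)
    if first < 0:
        raise ValueError("core sequence not found")
    second = chromosome.find(core, first + len(core))
    if second < 0:
        raise ValueError("need a second, directly repeated copy of the core")
    rejoined = chromosome[:first] + chromosome[second:]
    circle = chromosome[first:second]
    return rejoined, circle


# Hypothetical toy sequences standing in for the two halves of nifD and the element.
nifd_left, nifd_right = "ATGGCT", "AAATGA"
core = "CATTAG"
element_body = "GGGGTTTTGGGG"
vegetative = nifd_left + core + element_body + core + nifd_right

heterocyst, circle = excise(vegetative, core)
print(heterocyst)  # ATGGCTCATTAGAAATGA
```

Excision restores a continuous nifD sequence in the heterocyst. No DNA is lost overall, because the chromosome and the circle together account for every base of the vegetative-cell arrangement.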

The 11-kb element interrupting the nifD gene is only 
one of three such elements that interrupt Anabaena 
genes whose products are involved in nitrogen fixation. 
A 55-kb element interrupts a nearby operon that 
includes the nifB, nifS, and nifU genes. Just as the
11-kb element prevents transcription through the 
nifHDK operon, the 55-kb element prevents tran- 
scription of nifU and nifS, genes whose products are 
required for formation of iron-sulfur clusters and their 
insertion into dinitrogenase. The 55-kb element is 
excised precisely during heterocyst differentiation, 
using a site-specific recombinase encoded by the elem- 
ent, acting on directly repeated sequences at the ends of 
the element (Figure 2). Both the amino acid sequence 
of the recombinase and the DNA sequences at the 
excision sites differ from those of the 11-kb element. 

These two elements appear to provide ultimate 
examples of selfish DNA. They provide no known 
advantage to the cells carrying them, but they are 
clever enough to get out of the way when the genes 
they invade are necessary for survival. At the time of 
their discovery, each of the excision enzyme sequences 
defined new families of recombination enzymes.
Subsequently, another small element was discovered
interrupting a hydrogenase gene (hupL) in Anabaena.
Like the first two, its excision occurs only during 
heterocyst differentiation. In this case, the sequence 
of the excisase encoded by the new element puts it in 


the same family as the excisase of the 11-kb element. 
The excisase of the 55-kb element remained an orphan 
until the discovery of another element, described 
below, that interrupts a gene required for sporulation 
in B. subtilis. 

Many gram-positive bacteria, such as the soil inhab- 
itant B. subtilis, produce heat-stable spores when con- 
ditions become unfavorable for vegetative growth. The 
process of sporulation involves the regulated expres- 
sion of a very large number of genes, ending with the 
lysis of the mother cell within which the spore develops. 
In response to environmental signals such as carbon 
or nitrogen starvation, a cascade of two-component 
regulators is brought into play. The final step in this 
cascade of phosphorylations is the activation of a 
sigma factor that permits transcription of the earliest 
acting sporulation-related genes. A septum forms 
asymmetrically and one bacterial chromosome parti- 
tions into each of the daughter cells. The smaller of 
these cells pinches off within the intact mother cell and 
is supplied with several layers of protein to provide the 
characteristic tough coat of the spore. This program is 
managed by the differential expression and use of 
sigma factors in the two cell compartments, the devel- 
oping spore and the mother cell. One of the last events 
is the expression in the mother cell of a gene encoding 
the sigma factor that directs transcription of the major 
spore coat protein gene. The sigma factor gene is 
interrupted by a 42-kb element that must be excised 
for the functional sigma factor to be made. As might 
now be expected, the excision is carried out by a site- 
specific recombinase acting on directly repeated 
sequences at the ends of the element. Since this event 
occurs only in the mother cell, which will die, it has to 
be repeated whenever a vegetative cell sporulates. 
Finally, the amino acid sequence of this excisase puts 
it in the same family as the enzyme that excises the 55- 
kb element in Anabaena. 

The similarities between excision of bacteriophage 
lambda DNA from a lysogen and the excision of these 
elements interrupting genes in Anabaena and Bacillus 
suggest that the latter elements entered their respective 
host chromosomes as viral DNA. In the case of Ana- 
baena, many strains from different parts of the world 
have one or more of these elements in their chromo- 
somes, so some comparative sequencing might permit 
analysis of their age and evolution. There are also 
parallels between these bacterial gene rearrangements 
and the transactions in developing lymphocytes that 
generate the reorganized genes responsible for anti- 
body diversity. 
Gene Regulation Occurs Primarily at the
Level of Transcription 


Gene regulation is the highly controlled turning on
and off of gene expression. In single-celled organisms
it directs the efficient use of cellular resources in
response to the cell’s environment. In multicellular
organisms gene regulation defines the cell, its structure
and function, and ultimately the whole organism.
Aberrant gene regulation can result in cancer, birth
defects, and even death. The first step in gene expression
is transcription, which therefore serves as the
primary point of regulation in the process of gene
expression. However, one must keep in mind that 
before a gene can be transcribed the chromatin pack- 
aging must be opened up to allow the transcriptional 
machinery access to that gene. 


RNA Polymerase II General 
Transcription Factors and Basal 
Transcription Mechanism 


Eukaryotes versus Prokaryotes 
In the most basic sense, the mechanism of transcription 
in eukaryotes is very similar to that of prokaryotes. 
The promoter and RNA start site must be recognized 
and the RNA transcript must be initiated, elongated, 
and terminated. In addition, several RNA polymerase 
subunits have conserved structures and functions indi- 
cating that they have a common ancestry. 

In eukaryotes, however, the DNA is in a form 
called chromatin, and only a small proportion of the 
genome is expressed (in higher eukaryotes). The
expressed regions of chromosomes were shown over 
a decade ago to be more open or accessible than 
repressed regions, for example, to nuclease cleavage. 
Other differences between the expressed and unex- 
pressed regions include the presence of an RNA poly- 
merase, the nonhistone proteins, modified histones, 
and undermethylated DNA (mammals). 

Another major difference between prokaryotic and eukaryotic transcription results from the sequestration of chromosomes in the nucleus, or compartmentalization. Transcription and translation occur in separate compartments, the nucleus and the cytoplasm. In the nucleus, RNA is transcribed and processed (for example, hnRNA into mRNA). It is then transported to the cytoplasm, where mRNA is translated by ribosomes (free or on the endoplasmic reticulum, ER) and rRNA and tRNA participate in the process of translation.

Finally, there are differences in genome complexity 
and gene structure. Eukaryotic cells have 10° to 10° 
single-copy genes. In addition, their genes are discon- 
tinuous, containing introns and exons. 


Eukaryotes have Three Classes of Genes, Each Transcribed by a Separate RNA Polymerase

These three RNA polymerases were originally identified by Robert Roeder in William Rutter's laboratory in the late 1960s, by fractionation of nuclear extracts on a DEAE-Sephadex column, and are found in all eukaryotes (yeast, plants, insects, mammals). The enzymes were named RNA polymerase I, II, and III (Pol I, Pol II, and Pol III, respectively). The polymerases were subsequently shown to transcribe pre-rRNA genes (class I), hnRNA genes (class II), and pre-tRNA and 5S RNA genes (class III), respectively. The subunit compositions of the three eukaryotic polymerases are similar (Figure 1). In addition, the two largest subunits of each eukaryotic polymerase show structural and functional conservation with the Escherichia coli subunits β and β′. None of the subunits seems to correspond to the E. coli sigma factor; instead, the activities associated with sigma factor are found in TFIIF and TBP, so these functions are divided among separate proteins.

The largest subunit of RNA polymerase II contains an unusual domain at its C-terminus, referred to as the C-terminal domain (CTD). The CTD is not found in the largest subunit of Pol I or Pol III, or in β′ of E. coli RNA polymerase. It consists of 26-52 repeats of the 7-amino acid sequence Tyr-Ser-Pro-Thr-Ser-Pro-Ser and is highly phosphorylated on its Ser, Thr, and Tyr residues. The less phosphorylated form of the subunit is called IIa and the highly phosphorylated form IIo. Generally, the IIA form of RNA polymerase II, containing subunit IIa, is associated with inactive genes, whereas the IIO form, containing IIo, is associated with active genes.
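The repeat structure described above lends itself to simple arithmetic. The Python sketch below builds an idealized CTD from perfect consensus heptads and counts the Ser, Thr, and Tyr residues that could be phosphorylated. It assumes every repeat matches the consensus exactly, which real CTDs do not always do, and it counts potential sites only, not sites actually phosphorylated in vivo; the function names are hypothetical.

```python
# Idealized consensus heptad of the Pol II largest-subunit CTD (from the text).
HEPTAD = ("Tyr", "Ser", "Pro", "Thr", "Ser", "Pro", "Ser")
# Residue types the text describes as phosphorylated.
PHOSPHO_ACCEPTORS = {"Ser", "Thr", "Tyr"}

def ctd(n_repeats):
    """Return an idealized CTD built from n_repeats perfect consensus heptads."""
    return list(HEPTAD) * n_repeats

def phosphoacceptor_count(residues):
    """Count Ser, Thr and Tyr residues, i.e. potential phosphorylation sites."""
    return sum(1 for residue in residues if residue in PHOSPHO_ACCEPTORS)

for n in (26, 52):  # the range of repeat numbers given in the text
    domain = ctd(n)
    print(n, "repeats:", len(domain), "residues,",
          phosphoacceptor_count(domain), "potential sites")
```

Each consensus heptad contributes five potential phosphoacceptors, so under this idealization a 52-repeat CTD would offer up to 260.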

Each RNA polymerase is regulated independently and has a different promoter structure; all three have more complex promoter structures than prokaryotic promoters. Each polymerase has multiple subunits (8-14) and a molecular weight on the order of 5 × 10^5 Da. In each case, the RNA polymerase itself is not sufficient for promoter recognition and promoter-specific transcription. Alone, the enzymes initiate transcription essentially at random, primarily at ends and nicks in the DNA. To initiate transcription at specific promoters they require additional protein factors, or 'accessory factors', that have been identified by fractionation of cell extracts. These are also called transcription factors, and they can be divided into two groups: the basal or general transcription factors (GTFs), which are absolutely required for promoter-dependent transcription, and the regulatory or promoter-specific transcription factors.


RNA Polymerase Subunits and Ancillary Factors were Identified by Purification from Cell-Free Extracts

The three types of cell extract that have been used are whole cell, nuclear, and cytoplasmic. These extracts were the starting material for the fractionation, purification, and identification of these protein factors. One factor, TBP (TATA-binding protein), is required by all three RNA polymerases. The TATA box is a DNA motif found in the minimal promoter of many class II genes. TBP is a component of the general transcription factors TFIID (Pol II), TFIIIB (Pol III), and TIF-IB (Pol I). Class II genes are more numerous than class I and III genes and are regulated differently. Class I and III genes have simpler promoter structures and require fewer accessory factors, whereas class II genes have much more complex promoter structures and require a much larger number and variety of general transcription factors and transcription regulatory factors.


RNA Polymerase II Basal Transcription Factors and the Basic Mechanism of RNA Polymerase II Transcription

Fractionation of crude cell-free extracts has identified six general (basal, minimal) transcription factors (GTFs), designated TFIIA, TFIIB, TFIID, TFIIE, TFIIF, and TFIIH. These proteins or protein complexes were initially separated by chromatography, and their functions were identified using in vitro transcription assays and electrophoretic mobility shift assays.

The minimal eukaryotic promoter consists of an RNA start site and the TATA element (Figure 2). The RNA start site is often an A surrounded by pyrimidines and is called an initiator (Inr). The consensus TATA element sequence is TATA(A/T)A(A/T), and it tends to be surrounded by GC-rich sequences. The TATA element is found at -25 to -35 in higher eukaryotes and at -40 to -90 in yeast. However, it is absent from many constitutively expressed genes, sometimes called housekeeping genes. The TATA sequences recognized by TBP also vary somewhat from the consensus.
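The consensus given above can be written as a regular expression. The following Python sketch scans a sequence for TATA(A/T)A(A/T) and reports each hit in promoter coordinates, where there is no position 0 and upstream positions are negative. It is illustrative only: it searches just the given strand and only the strict consensus (TBP also tolerates variants), and the function name and example promoter are hypothetical.

```python
import re

# Strict TATA consensus from the text: TATA(A/T)A(A/T).
# The lookahead (?=...) lets overlapping matches be reported.
TATA_PATTERN = re.compile(r"(?=(TATA[AT]A[AT]))")

def find_tata(seq, start_index):
    """Return (promoter position, matched site) for each TATA consensus match.

    start_index is the 0-based index of the +1 nucleotide (the RNA start site)
    in seq. Promoter numbering skips 0, so upstream positions are negative.
    """
    hits = []
    for match in TATA_PATTERN.finditer(seq.upper()):
        offset = match.start() - start_index
        position = offset if offset < 0 else offset + 1
        hits.append((position, match.group(1)))
    return hits

# Hypothetical promoter: 10 nt, a TATA element, 23 nt, then the +1 A.
promoter = "G" * 10 + "TATAAAA" + "G" * 23 + "ACGTCGTCGT"
for position, site in find_tata(promoter, start_index=40):
    in_window = -35 <= position <= -25  # range given for higher eukaryotes
    print(position, site, "within -35..-25:", in_window)
```

For the hypothetical promoter the element is reported at -30, inside the window given for higher eukaryotes.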

Figure 2 The minimal or core promoter sequences.

TBP is a 38-kDa protein that binds the TATA box in the DNA minor groove and has a modular structure (seen in many transcription factors). The C-terminus of this protein has homology to E. coli sigma factor. When TBP binds the TATA element, it bends the DNA by 80°, as shown by the TBP-DNA co-crystal structure. In addition, it kinks the DNA and unwinds it by 110°, opening up the minor groove (Figure 3).

TBP is shaped like a saddle. The ‘saddle’ is perpendicular to the long axis of the DNA, but is parallel with it through the 8 bp of the TATA box. The 80° bend and the 110° unwinding of the DNA nearly compensate; the net result is no measurable change in supercoiling and no measurable bend in the DNA associated with TBP. When TBP binds in the minor groove, it opens the groove up and molds it to the underside of the TBP saddle.

TFIID is the only GTF that makes sequence-specific contact with the DNA template. TFIID consists of TBP plus the TBP-associated factors (TAFs), with a total molecular weight of about 750,000. There are eight TAFs, with molecular weights of 250, 150, 110, 80, 60, 40, 30α, and 30β kDa. TBP is also part of the Pol I and Pol III transcription factors, which have their own Pol I- and Pol III-specific TAFs; hence TBP is sometimes referred to as the universal transcription factor. TFIID TAFs are designated, for example, TAFII250. The TAF proteins are thought to function as adapters, or surfaces for interaction with other protein factors or DNA sequences. TFIIA contacts TAFII110, TAFII250, and TBP. TAFII150 binds the Inr sequence, and TAFII150 and TBP alone cover the same length of DNA as native TFIID (TBP plus all TAFs, Figure 4). TBP alone has an approximately 20-bp footprint directly over the TATA box, whereas TFIID has a 75-bp footprint centered on the RNA start site.
Figure 3 The interactions between TBP and the TATA element DNA.

Figure 4 The complete native or holo-TFIID bound to the TATA element and the Inr.


Role of the RNA Polymerase II GTFs in Transcription

The ‘preinitiation complex’ that is formed is very large, with a molecular weight of more than 2 million Da and as many as 40 polypeptides (Figure 5). On a supercoiled template only a minimal set of factors (TBP, TFIIB, TFIIF, and RNA Pol II) is required for transcription; the free energy of the supercoiled template may promote open complex formation. On TATA-less promoters there is evidence for complex formation on the Inr site mediated by an Inr-binding protein, TAFII150, and RNA polymerase II. In addition, TATA-less promoters have multiple RNA start sites, whereas promoters containing both a TATA element and an Inr often have a single RNA start site.

The following is a summary of RNA polymerase II and its GTFs by function and in order of appearance. TBP (or TFIID) binds to the TATA element, forming a stable, or template-committed, complex. TFIIA stabilizes TFIID binding and counteracts negative factors, but is not required with purified factors. TFIIB also stabilizes TFIID binding; in addition, it functions in docking RNA polymerase II during preinitiation complex formation and in measuring the distance from the TATA element to the RNA start site(s). TFIIF associates tightly with RNA polymerase II and functions both to repress nonspecific DNA binding by the polymerase and in docking the polymerase onto the preinitiation complex. RNA polymerase II binds DNA and is the catalytic component, synthesizing RNA from the DNA template. TFIIE has a regulatory role: it recruits TFIIH, stimulates its CTD kinase activity, and inhibits its helicase activity. TFIIH contains DNA duplex-melting and CTD-phosphorylating activities and functions in promoter clearance.
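The order-of-appearance summary above can be expressed as a dependency table. The following Python sketch is a simplified sequential-assembly model: it assumes each factor can join only after the factors listed for it are bound, and it does not represent the holoenzyme pathway or TATA-less promoters. TFIIA is treated as optional because nothing downstream requires it. Names such as `PREREQS`, `assemble`, and the combined `PolII-TFIIF` entry are hypothetical labels chosen for the sketch.

```python
# Simplified sequential-assembly model of the Pol II preinitiation complex.
# Each factor maps to the factors assumed to be bound before it can join.
PREREQS = {
    "TFIID": set(),                      # TBP/TFIID binds the TATA element
    "TFIIA": {"TFIID"},                  # stabilizes TFIID; nothing below requires it
    "TFIIB": {"TFIID"},                  # stabilizes TFIID, docks Pol II
    "PolII-TFIIF": {"TFIID", "TFIIB"},   # Pol II enters bound to TFIIF
    "TFIIE": {"PolII-TFIIF"},            # recruits TFIIH
    "TFIIH": {"TFIIE"},                  # melting, CTD kinase, promoter clearance
}

def assemble(order):
    """Add factors in the given order; raise ValueError if a prerequisite is missing."""
    bound = []
    for factor in order:
        missing = PREREQS[factor] - set(bound)
        if missing:
            raise ValueError(f"{factor} requires {sorted(missing)} to be bound first")
        bound.append(factor)
    return bound

canonical = ["TFIID", "TFIIA", "TFIIB", "PolII-TFIIF", "TFIIE", "TFIIH"]
print(assemble(canonical))

try:
    assemble(["TFIID", "TFIIE"])  # skips TFIIB and Pol II-TFIIF
except ValueError as err:
    print(err)  # TFIIE requires ['PolII-TFIIF'] to be bound first
```

An order-of-addition experiment corresponds to calling `assemble` with a particular sequence; an order that omits TFIIA still succeeds, mirroring the statement that TFIIA is not required with purified factors.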

The intermediate complexes formed by the sequential binding of the transcription factors were identified by EMSA (gel shift), footprinting, and order-of-addition in vitro transcription assays. Interactions between transcription factors were determined by affinity chromatography and glycerol gradient sedimentation. This multistep, multifactor process provides many opportunities for regulation. Protein-protein interactions between the GTFs and Pol II make up most of the interactions that hold the preinitiation complex together. Elongation is carried out by RNA polymerase IIO and is regulated and stimulated by the elongation factors TFIIF and TFIIS. Termination is difficult to study owing to the rapid processing of the RNA transcript. Termination sites are not well defined, but there is some evidence that termination requires 3′ cleavage of the transcript.

Figure 5 A summary of the basic mechanism of RNA polymerase II transcription initiation.


Regulatory Elements and Factors 


Basic RNA Polymerase II Promoter Structure and the Proteins that Bind the Promoter Motifs

It could be said that there are five steps in gene expres- 
sion (protein coding genes): (1) activation of gene 
chromatin structure; (2) initiation of transcription; 
(3) RNA processing; (4) transport to the cytoplasm; 
and (5) translation. Transcription initiation is an early 
step, and is therefore an important control point. 

Basic RNA polymerase II promoter structure in- 
volves cis-acting sequences that are bound by trans- 
acting protein factors (Figure 6). The cis-acting 
sequences are often identified by deletion and ‘linker- 
scanning’ mutagenesis. 

The proximal, minimal, or core promoter region 
consists of TATA and Inr/Start site (Figure 2). 
Upstream promoter elements often act constitutively 
or in an unregulated manner (Figure 6). These elem- 
ents are often found on the promoters of ‘housekeep- 
ing’ genes. Some examples of constitutive promoter 
elements and the factors that bind them are the GC 
box and the CCAAT box. GC boxes (GGGCGG) are 
bound by Sp1, which is expressed in all cell types in 
humans. There are often multiple GC boxes (G/C 
islands) found in gene promoters. These may function 
in concert with regulated factors to increase their 
effect on transcription. In addition, GC boxes are 
often seen upstream of TATA-less and Inr-less pro- 
moters. CCAAT boxes are bound by several factors 
including CTF/NF1. CTF/NF1 is present in all 
tissues, as well. 
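Because a GC box is a double-stranded site, a search for it has to examine both strands: GGGCGG on one strand reads CCGCCC on the other. This Python sketch reports each GC box together with the strand on which it reads GGGCGG. It looks for exact matches only, with no allowance for variant Sp1 sites, and the function names and example sequence are hypothetical.

```python
GC_BOX = "GGGCGG"
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Return the reverse complement of an A/C/G/T sequence."""
    return seq.translate(COMPLEMENT)[::-1]

def find_gc_boxes(seq):
    """Return (index, strand) for each GC box.

    index is the 0-based start of the 6-bp site on the given (top) strand;
    strand is '+' if the top strand reads GGGCGG, '-' if the bottom strand does.
    """
    seq = seq.upper()
    bottom_strand_box = reverse_complement(GC_BOX)  # CCGCCC on the top strand
    k = len(GC_BOX)
    hits = []
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        if window == GC_BOX:
            hits.append((i, "+"))
        if window == bottom_strand_box:
            hits.append((i, "-"))
    return hits

print(find_gc_boxes("ATGGGCGGTACCGCCCAT"))  # [(2, '+'), (10, '-')]
```

The second hit would be missed by a scan of the top strand alone, although it is the same GGGCGG sequence read on the complementary strand.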

Regulatory promoter elements and the factors that bind them respond to environmental stimuli or are cell-type specific. Inducible elements, or response elements, and their transcription factors include those induced by stress (for example, heat shock): the HSE (heat shock element) is bound by the heat shock transcription factor (HSTF). These elements can also be hormone inducible, for example, the hormone response elements, which are bound by the hormone receptors. The hormones include steroid hormones, derived from cholesterol, and thyroid hormone, derived from tyrosine. A well-studied example is the glucocorticoid receptor (GR), which is bound by glucocorticoid in the cytoplasm, causing it to move into the nucleus. The GR-hormone complex then binds to the GRE (glucocorticoid response element). In contrast, membrane-bound receptors act through second messengers to activate transcription regulatory factors bound to their cognate sequences (e.g., CREB/ATF on the CRE).

Figure 6 A generic example of an RNA polymerase II promoter structure. Proximal elements: constitutive and regulatory promoter elements are often intermixed.

Cell type-specific regulatory elements and tran- 
scription factors regulate cell type-specific gene 
expression. The transcription factors are expressed or 
active only in particular cell types. For example, in the 
B-cell-specific expression of immunoglobulin genes 
the promoter is bound and activated by Oct-2, which 
is only expressed in B cells. 

Enhancers are made of many of the same DNA 
elements, GC boxes, CCAAT boxes, response elem- 
ents, cell-specific elements, and are bound by the same 
factors. However, enhancers can function over very 
large distances. Enhancers have been described as “a 
promoter element that might have been designed by an 
overenthusiastic graduate student” (Gary Felsenfeld). 
These elements can function downstream as well as 
upstream (3’ and 5’ from proximal promoter), and 
they can function in either orientation (can be inverted 
180°). Enhancers are composed of different combin- 
ations and often of redundant regulatory elements, in- 
cluding constitutive elements, providing a wide range 
of regulatory possibilities. They can be cell type- 
specific or respond to external factors. These elements 
provide cell type-specific or factor-regulated expres- 
sion to heterologous genes. 

Transcription factors recognize the bases through chemical groups other than those participating in hydrogen bonding between base pairs (bp); these groups are exposed in the major and minor grooves. In each groove, every bp presents a set of hydrogen-bond donors and acceptors and hydrophobic surfaces. However, only in the major groove is the pattern unique for each bp. Therefore, most sequence-specific factors bind in the major groove.
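The argument can be checked against the chemical 'code' of groups that each base pair exposes in the grooves, as commonly tabulated in molecular biology textbooks (A, hydrogen-bond acceptor; D, donor; M, methyl group; H, nonpolar hydrogen). The Python sketch below is an illustration rather than a structural model: it simply tests whether the four base pairs can be told apart from each groove alone.

```python
# Groups each base pair presents in the grooves, read across the groove:
# A = H-bond acceptor, D = H-bond donor, M = methyl group, H = nonpolar hydrogen.
MAJOR_GROOVE = {"A:T": "ADAM", "T:A": "MADA", "G:C": "AADH", "C:G": "HDAA"}
MINOR_GROOVE = {"A:T": "AHA", "T:A": "AHA", "G:C": "ADA", "C:G": "ADA"}

def all_distinguishable(groove):
    """True if every base pair presents a different pattern in this groove."""
    return len(set(groove.values())) == len(groove)

print("major groove distinguishes all four bp:", all_distinguishable(MAJOR_GROOVE))  # True
print("minor groove distinguishes all four bp:", all_distinguishable(MINOR_GROOVE))  # False
```

In the minor groove A:T and T:A (and likewise G:C and C:G) present the same pattern, so a protein reading only that groove could not tell a base pair from its flipped orientation.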


Structural Families of Regulatory Transcription Factors

Transcription factors have modular structures, a feature common to many eukaryotic proteins. Nearly all regulatory transcription factors are DNA-binding proteins. One of the first transcription factors to be purified and identified was Sp1, which binds many promoters at GC boxes. Sp1 can be purified by affinity chromatography on double-stranded GC box oligonucleotides.

Figure 7 The ‘modular’ structure of transcription regulatory proteins.


These factors must bind to promoters containing 
their cognate binding site to activate transcription. 
They have two to four modules or domains. Nearly 
every transcription factor contains a sequence recogni- 
tion domain, which binds DNA, and a protein-protein 
interaction domain, which binds the general transcrip- 
tion factors or RNA polymerase II (Figure 7). Some 
transcription factors have dimerization domains and 
some have regulatory domains, where they are modi- 
fied or bound by regulatory factors. They also often 
have flexible connector regions between domains. The 
‘modular’ or domain structure of these proteins has 
been demonstrated by the ability to form hybrid pro- 
teins consisting of domains from two different 
proteins that will function in transcriptional activation. 

Activation domains interact with basal transcription factors and RNA polymerase II via protein-protein interactions. A common example is the acidic activation domain, which contains several aspartate and glutamate residues. GAL4, GCN4, VP16 (which binds Oct-1), and the glucocorticoid receptor (GR) contain acidic activation domains. These domains have no specific sequence homology, but all of them carry a net negative charge. Another example is the glutamine-rich domain, found in Sp1, Antennapedia, Oct-1, the Oct-2 N-terminus, and homeobox proteins; these domains contain approximately 25% glutamines and few negatively charged residues. A third example is the proline-rich domain, found in CTF/NF1, Jun, AP2, and the Oct-2 C-terminus, which consists of about 25% proline residues. Activation domains appear to be essentially unstructured, containing hydrophobic amino acids and variously placed characteristic side chains, and may adopt a specific structure only upon binding to their targets, that is, through induced fit. The interaction of activation domains with their targets is driven by hydrophobic forces (as in protein folding), and the order or periodicity of amino acid side chains in the cohesive surfaces determines the specificity of the interaction. This model accounts for both the specificity and the flexibility of the activator-target interactions seen in transcription.
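The compositional descriptions above can be turned into a rough screening rule. The following Python sketch is a toy heuristic, not an established classifier: real activation domains are defined functionally, and the 25% threshold and the order of the tests are assumptions taken loosely from the figures in the text. The function name and example fragments are hypothetical.

```python
def classify_activation_domain(seq, threshold=0.25):
    """Toy composition-based label for a transcriptional activation domain.

    Heuristic built from the text: about 25% Gln (glutamine-rich), about 25% Pro
    (proline-rich), or a net negative charge from Asp/Glu (acidic).
    """
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    n = len(seq)
    if seq.count("Q") / n >= threshold:
        return "glutamine-rich"
    if seq.count("P") / n >= threshold:
        return "proline-rich"
    # Crude net charge: +1 per Lys/Arg, -1 per Asp/Glu (His ignored).
    net_charge = seq.count("K") + seq.count("R") - seq.count("D") - seq.count("E")
    if net_charge < 0:
        return "acidic"
    return "unclassified"

# Hypothetical 10-residue fragments, one per class.
for fragment in ("QQSQQAQQLQ", "PPAPSPLPTP", "DEFDLEMLGD", "AKLRVSAGLK"):
    print(fragment, classify_activation_domain(fragment))
```

Because the composition alone says nothing about the hydrophobic residues and side-chain periodicity that the text identifies as determining specificity, such a rule can at most flag candidates.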

DNA-binding (sequence-specific) domains are generally made up of α-helices and bind in the DNA major groove. One of the first DNA-binding domains identified is the zinc finger domain, of which there are three types (Figure 8). Zinc finger proteins, such as TFIIIA, Sp1, Krüppel, and the steroid hormone receptors, have a group of conserved amino acids that bind a zinc ion (Zn²⁺) to form a particular structure. Class I zinc fingers have a single-finger consensus (Cys2/His2) of 23 amino acids that forms a tetrahedral structure with Zn²⁺, and adjacent fingers are separated by seven to eight amino acids. TFIIIA has nine zinc fingers and Sp1 has three. Members of this class are usually monomeric and have multiple fingers. Class II zinc finger proteins include the steroid hormone receptors and have Cys2/Cys2 (C4) zinc fingers. These proteins have a region on the first zinc finger that determines DNA-binding specificity, and they bind as dimers (for example, the GR). Class III zinc finger domains are typified by the GAL4 DNA-binding domain and have Cys6 (C6) zinc fingers. They contain dimerization α-helices that form a coiled coil, and the Cys6 zinc finger domains themselves are compact globular domains.

Figure 8 The structure of class I and II zinc finger domains.

Other DNA-binding domains include the homeodomain (or basic helix-turn-helix), the helix-loop-helix domain, the leucine zipper domain (Figure 9), and the POU (Pit-Oct-Unc) domain. Homeodomain proteins include the homeotic gene products (Antp, en), Oct-1, Oct-2, and α2. These proteins are well conserved (80-90% similarity among Drosophila factors) in a 60-amino acid domain made up of three α-helical regions; in structure they are related to the CAP protein, lambda repressor, and Lac repressor, but they are monomeric. Basic helix-loop-helix (bHLH) proteins include E12 and E47, MyoD, c-Myc, and Drosophila neuronal development factors. The bHLH domain is a 40-50-amino acid domain made up of two amphipathic α-helices. These transcription factors form homodimers and heterodimers through interactions between the hydrophobic faces of the helices. The bHLH proteins have a basic region just N-terminal to the HLH domain that is required for DNA binding. Whether they are homodimers or heterodimers, and what their partner is, determines whether they bind DNA at their cognate site. Dimerization is required for stable DNA binding, and each dimer contacts the DNA through two basic regions, one from each partner. Non-basic HLH proteins, when dimerized with bHLH proteins, render them unable to bind DNA. Leucine zipper-containing proteins include C/EBP, Jun and Fos (which together form AP1), and Gcn4p. These proteins also have a basic region required for DNA binding and are therefore called bZIP proteins. A leucine zipper is an amphipathic helix in which every seventh amino acid is a leucine protruding from the hydrophobic face, with four to five repeats of this motif per protein. These leucines interdigitate with those of a second bZIP molecule, and the two helices wind around each other. The DNA-binding site consists of two inverted repeats with no separation between them. The bZIP transcription factors are often heterodimers.

Figure 9 The structures of three common sequence-specific DNA-binding domains.

Figure 10 Regulation of steroid hormone receptor transcription activation activity.
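The heptad periodicity of the leucine zipper and the abutting inverted repeats of the bZIP binding site can both be checked mechanically. In this Python sketch, `longest_leucine_heptad_run` measures the longest stretch of leucines at seven-residue spacing, and `is_inverted_repeat` tests whether a DNA site is identical to its reverse complement. Both function names are invented for the sketch, the zipper sequence is hypothetical, and a real zipper assignment would also consider the other heptad positions.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def longest_leucine_heptad_run(protein):
    """Length of the longest run of Leu residues spaced exactly seven apart."""
    protein = protein.upper()
    best = 0
    for start, residue in enumerate(protein):
        if residue != "L":
            continue
        run, pos = 1, start + 7
        while pos < len(protein) and protein[pos] == "L":
            run += 1
            pos += 7
        best = max(best, run)
    return best

def is_inverted_repeat(site):
    """True if a DNA site equals its reverse complement (two abutting inverted half-sites)."""
    site = site.upper()
    return site == site.translate(COMPLEMENT)[::-1]

# Hypothetical zipper segment: four heptads with Leu in the same register.
zipper = "MKQ" + "LEEKVRK" * 4 + "GE"
print(longest_leucine_heptad_run(zipper))  # 4
# The CRE consensus TGACGTCA, bound by the bZIP factor CREB, is a perfect inverted repeat.
print(is_inverted_repeat("TGACGTCA"))      # True
```

Replacing any one of the internal leucines breaks the run, which is why the periodicity, not just the leucine content, marks a zipper.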


Transcription Regulatory Mechanisms 


Regulating the Regulators 

Transcription factors can be regulated at the level of gene expression. They can also be regulated through covalent modification (phosphorylation, etc.). For example, CREB (cAMP response element binding factor) is activated by phosphorylation, whereas AP1 is inactivated by phosphorylation. Some transcription factors are regulated through ligand binding: the steroid hormone receptors are regulated by binding lipid-soluble hormones (Figure 10), which include cortisol, retinoic acid, and thyroxine. The hormone-binding domain of the glucocorticoid receptor (GR) inhibits transcription activation in the absence of hormone. GR is also thought to bind an inhibitor that anchors it in the cytoplasm in the absence of bound hormone. The thyroid hormone receptor (THR) binds DNA and represses transcription in the absence of hormone, and becomes a transcriptional activator on hormone binding. Other factors, such as NF-κB, are regulated by protein inhibitors (Figure 11); once released from its inhibitor, NF-κB enters the nucleus, binds DNA, and activates transcription. Transcription factors that form heterodimers, such as MyoD/Id and MCM1/α2, can be regulated through a change of dimerization partner. E12 homodimers bind DNA poorly, and MyoD dimerizes poorly with itself. Finally, an important regulatory mechanism is the accessibility of binding sites, which can be determined by changes in chromatin structure.

Peptide hormones often act by bringing about posttranslational modification of transcription factors. These hormones first activate a membrane-bound receptor that sends a signal through a signal transduction pathway or a second messenger (a small molecule). The end result is a modification, such as phosphorylation, of a transcription regulatory factor, which can affect its nuclear localization, DNA binding, or transcription activation. A classic example is G-protein-mediated signaling through cAMP: cAMP activates PKA, which in turn phosphorylates and activates CREB.


Regulation of Transcription; Regulation of the Function of GTFs

Transcription regulatory factors can act in at least four ways. First, they can stabilize the binding of general transcription factors to the DNA or the preinitiation complex, or increase their rate of association. Second, they can activate a factor's enzymatic activity, increasing its catalytic rate (e.g., the CTD kinase activity of TFIIH). Third, they can induce a conformational change in basal transcription factors; the several steps and GTFs involved provide many targets. Finally, activators may counteract negative factors, for example, those that are part of or associated with TFIID, nucleosomes, and histone H1.

The protein-protein interactions between the transcription regulatory factors and the GTFs are well conserved. Activation domains from yeast work in Drosophila, plants, and mammals, although most of these experiments used acidic activators.

Synergistic activation is observed when there are 
multiple factors bound to a promoter. Transcription 
factors may interact simultaneously with the same or 
different targets in the complex. Synergism may be the 
result of there being many factors and steps that act 
as targets. A related phenomenon, ‘squelching’ (repres- 
sion that occurs from high concentrations of a tran- 
scriptional activator), indicates that GTFs are targets 
for activators. Squelching is thought to result from 
high concentrations of an activating transcription fac- 
tor titrating out a GTF and inhibiting transcription. 
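The titration explanation of squelching can be illustrated with a deliberately simple toy model (not a published one). Transcription is taken as proportional to the fraction of promoters with activator bound multiplied by the fraction of a limiting GTF left free, with unbound activator assumed to sequester the GTF in solution. The sketch assumes activator is in large excess over the GTF, so that free activator is roughly equal to total activator; the function name, constants, and units are all hypothetical.

```python
import math

def relative_transcription(activator, k_promoter=1.0, k_titration=10.0):
    """Toy squelching model; arbitrary concentration units.

    promoter_occupancy: fraction of promoters with activator bound.
    free_gtf: fraction of a limiting GTF not sequestered by excess activator.
    """
    promoter_occupancy = activator / (activator + k_promoter)
    free_gtf = 1.0 / (1.0 + activator / k_titration)
    return promoter_occupancy * free_gtf

# In this model the optimum lies at sqrt(k_promoter * k_titration).
peak = math.sqrt(1.0 * 10.0)
for a in (0.1, 1.0, peak, 10.0, 100.0):
    print(f"activator={a:7.2f}  relative transcription={relative_transcription(a):.3f}")
```

Transcription rises with activator concentration, peaks, and then falls as excess activator titrates the GTF, which is the qualitative signature of squelching.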

While the process of transcription initiation is often 
discussed as though there are several individual steps, 
many of the GTFs may be associated before promoter 
binding. This pre-assembled transcription complex 
is referred to as the holoenzyme. The holoenzyme 
model has important implications for how transcrip- 
tion regulatory factors work. 

Finally, RNA polymerase II transcription forms a 
cycle of initiation, elongation, termination, and reini- 
tiation. This cycle indicates that transcription acti- 
vators can stimulate multiple rounds of transcription. 
In addition, postinitiation steps like promoter clear- 
ance and elongation are also regulated. 


Adapters, Coactivators, or Mediators 

TAFII40 is required for activation by GAL4-VP16 (an acidic activator), and TAFII110 is required for activation by Sp1 (a glutamine-rich activator). TBP alone is not sufficient for activated transcription by these transcription factors.

Highly purified TFIID (including all the TAFIIs) is, in turn, not sufficient for activated transcription by certain transcription factors. These transcription factors require adapter proteins, or coactivators, which may also be titrated out in ‘squelching.’

Another class of coactivators is the architectural transcription factors. These proteins mediate protein-protein interactions and bend the DNA to promote interactions between transcription factors bound to an enhancer.


Promoter-Proximal Attenuation

Attenuation sites are located 20-30 bp downstream of the RNA start site and are found in the c-myc, hsp70, hsp26, hsp27, α- and β-tubulin, polyubiquitin, and GAPDH genes. The Drosophila hsp70 gene is a model for gene regulation through this mechanism. On the uninduced hsp70 promoter in vivo, RNA polymerase II is paused with a transcript of about 25 nt and is distributed between -17 and +37 (Figure 12). Gene activation by stress (heat shock, etc.) releases the pause, with concomitant phosphorylation of the CTD.

Figure 12 The paused transcription complex on the uninduced hsp70 promoter.


Gene Regulation through Chromatin Structure


Most of what we know about the role of chromatin 
structure in the regulation of specific genes is at the nu- 
cleosome level. A growing number of gene promoters 
have been shown to have positioned nucleosomes that 
play an important role in transcriptional repression 
and activation. In addition, some transcription factors 
function, at least in part, to counteract nucleosomal 
(core histone, H1) repression. 

If cells are depleted of one of the core histones, many genes are deregulated, apparently through the loss of nucleosomal repression. Depletion is accomplished by shutting off the expression of one of the histone genes: when these cells go through S phase and replicate their DNA, they end up with half the amount of one of the core histones and so have only half the nucleosomes needed to assemble two copies of the genome into chromatin. Several genes are derepressed, indicating that histones in the form of nucleosomes are required to maintain these genes in the inactive state. A similar situation is seen when transcription is reconstituted in vitro. Mutations in specific domains of the core histones disrupt repression and activation, suggesting a direct interaction between the transcription regulatory machinery and the core histones. In addition, histone mutations that affect histone stability can suppress the effect of defective promoter elements, further supporting the idea that transcription factors can function to counteract nucleosomal repression.


Examples of Positioned Nucleosomes in Repression and Activation of Specific Genes

Nucleosomes can be positioned such that key promoter 
elements are wrapped around a nucleosome. Some 
regulatory transcription factors can bind their recog- 
nition sequences when they are wrapped around the 
nucleosome. Alternatively, nucleosomes can be posi- 
tioned such that a key promoter element is placed in the 
linker DNA between nucleosomes and constitutively 
available. 
The mouse mammary tumor virus long terminal repeat (MMTV-LTR) promoter is regulated by the GR (glucocorticoid receptor). The MMTV-LTR promoter is incorporated into six positioned nucleosomes (Figure 13). Activation begins when glucocorticoid hormone binds and activates the GR. The activated GR can bind the GREs (glucocorticoid response elements) while they are wrapped around nucleosome B, indicating that some transcription factors can recognize their binding sites on the surface of a nucleosome. GR binding appears to displace histone H1 and recruit the Swi/Snf complex, which disrupts or reconfigures nucleosome B. This allows the transcription factors NF1 and Oct-1 to bind their cognate sequences. All three transcription factors then displace nucleosome A, or help the basal transcriptional machinery to displace it. The preinitiation complex is formed and the promoter is transcribed.

The α2/MCM1 transcription factor complex functions to repress a-cell-specific genes in α-cells of the yeast Saccharomyces cerevisiae. Because α2 is absent in a-cells, these genes are not repressed there. The α2/MCM1 complex binds the α2 operator in the promoters of these genes (e.g., STE6) and positions a nucleosome next to the operator over the TATA element, repressing transcription. The complex is thought to recruit the transcriptional repressors Tup1p and Ssn6p to the STE6 promoter. The histone H4 tail is required for nucleosome positioning and transcriptional repression, and Tup1p and Ssn6p have been shown to bind the histone H4 tail. Insertion of 75 bp between the α2 operator and the TATA element, in the linker DNA between nucleosomes, does not relieve the repression. This may be explained by the fact that an array of nucleosomes is formed on the STE6 promoter; this array of positioned nucleosomes is thought to be stabilized by a backbone of Tup1p/Ssn6p molecules. The transcription factors Mig1p and Rox1p also recruit the Tup1p/Ssn6p complex and repress transcription through nucleosome positioning on the promoters of the metabolic genes SUC2 and GAL1-GAL10.


Role of Core Histone Acetylation in Transcriptional Regulation

The core histone N-terminal tails are unstructured and highly positively charged, containing several lysines and a few arginines. The ε-amino groups on the lysines are posttranslationally modified by acetylation, which removes the positive charge. The unmodified, positively charged core histone tails may interact with the linker DNA and with negatively charged patches in the core histones on the exposed surface of adjacent nucleosomes. These interactions stabilize the folding of the chromatin fiber into higher order structures that repress transcription. Acetylation, by removing the positive charges, disrupts these interactions and tends to derepress transcription. Sequence-specific DNA-binding transcription regulatory proteins can recruit the activities (histone deacetylases and histone acetyltransferases) responsible for maintaining the histone acetylation state.

Figure 13 The modulation of chromatin structure in MMTV-LTR promoter activation.

Figure 14 Transcriptional activation on chromatin vs. free DNA templates.
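The charge argument in the histone-tail paragraph can be illustrated with a simplified count. The sketch below counts lysine (K) and arginine (R) residues in the first 20 residues of histone H4 (SGRGKGGKGLGKGGAKRHRK, the commonly cited N-terminal sequence, assumed here) and treats acetylation of lysines 5, 8, 12, and 16 as neutralizing those residues. It ignores the free N-terminal amine, histidine, and pKa effects, so it is rough bookkeeping of fixed positive charges rather than a physical net charge; the function and constant names are hypothetical.

```python
# Simplified charge bookkeeping for a histone tail (illustrative sketch, not a pKa model).
H4_TAIL = "SGRGKGGKGLGKGGAKRHRK"  # assumed H4 residues 1-20, numbered from S1
POSITIVE = {"K", "R"}              # basic residues counted as +1; His ignored for simplicity


def positive_charges(tail, acetylated=frozenset()):
    """Count K/R residues, skipping lysines whose 1-based positions are acetylated."""
    count = 0
    for position, residue in enumerate(tail, start=1):
        if residue not in POSITIVE:
            continue
        if residue == "K" and position in acetylated:
            continue  # acetylation of the epsilon-amino group removes the positive charge
        count += 1
    return count


if __name__ == "__main__":
    sites = {5, 8, 12, 16}
    assert all(H4_TAIL[i - 1] == "K" for i in sites)  # sanity check: all sites are lysines
    print("unmodified tail:", positive_charges(H4_TAIL))
    print("acetylated at K5/K8/K12/K16:", positive_charges(H4_TAIL, sites))
```

Run as a script, it reports 8 positive residues for the unmodified tail and 4 after acetylation of the four lysines; arginines are unaffected because only lysine acetylation is modeled.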


Biochemical Analysis of the Mechanism of 
Transcription Regulation with Chromatin 
Templates 

Most biochemical studies on the mechanism of transcription and its regulatory factors have used naked DNA templates for simplicity. The level of activation observed with these templates has typically been only 5- to 10-fold, or about 20-fold at most. Reconstitution of the DNA
template into nucleosomes results in a general re- 
pression of transcription (Figure 14). If TFIID or 
sequence-specific DNA binding activators are bound 
prior to nucleosome formation then these templates 
are activated. The net result is a much greater activation (10²- to 10³-fold) on chromatin templates than is seen with free DNA templates, a level similar to that seen in vivo. This finding has led to the hypothesis
that some transcriptional activators function at least 
in part to counteract chromatin-mediated repres- 
sion (‘antirepression’). This antirepression occurs on 
top of ‘true activation,’ which is the result of recruit- 
ment and stimulation of the basal transcription 
factors. 

Gene silencing is defined as an epigenetic modification of gene expression leading to inactivation of previously active genes. Epigenetic modifications do not alter the DNA sequence and, although they are heritable, reversion to expression occurs at variable frequencies. Gene silencing is used in the course of
normal development and differentiation to repress 
genes whose products are not required in specific cell 
types or tissues. This may apply to individual genes 
or larger chromosome regions. In some special situations, such as dosage compensation in mammals, one of the two X chromosomes in females is almost completely repressed. Mechanisms responsible
for repression of genes involve changes in chromatin 
structure and levels of DNA methylation, or destabil- 
ization of mRNA. Modifications of chromatin and 
DNA template make genes inaccessible to the tran- 
scription machinery. Mechanisms of RNA destabil- 
ization are still largely unknown. Aberrant silencing 
of genes may lead to disease in mammals and generate 
developmental variants in plants. For example, methy- 
lation of tumor suppressor genes contributes to the 
onset and progression of cancer, while methylation 
of genes controlling flower development results in 
heritable changes of flower morphology. 

Gene silencing can act at the transcriptional or posttranscriptional level; the two phenomena are referred to as transcriptional gene silencing (TGS) and posttranscriptional gene silencing (PTGS).
Genes affected by TGS are either not transcribed at 
all, or transcripts are produced at very low levels. TGS 
has been observed in fungi, plants, and animals. It is
probably triggered by redundancies of genetic infor- 
mation since its occurrence correlates well with the 
presence of repeated genes or subgenomic fragments. 
As shown in plants, increased levels of ploidy may 
likewise act as a trigger of TGS. In organisms that 
are able to methylate their DNA, levels of DNA 
methylation are significantly increased in genes 
silenced by TGS. In the fungus Neurospora crassa, 
DNA methylation of redundant sequences is followed 
by a modification of their nucleotide sequences in a 
process referred to as repeat induced point mutation 
(RIP). In the fungus Ascobolus immersus, methylation
and inactivation of redundant genes occurs in a spe- 
cific phase of the life cycle, in a process called MIP 
(methylation induced premeiotically). TGS has been 
well studied genetically in yeast, and more recently 
also in plants. These studies revealed a number of 
genes that are required for silencing. Their protein 
products are either chromatin components or post- 
translational modifiers of chromatin proteins. In 
organisms that are able to methylate DNA, TGS regu- 
lators also include DNA methyltransferases and pro- 
teins recognizing methylated DNA. The biological 
role of TGS is still under debate. One postulated 
function is to extinguish transcription of transposable 
elements in order to prevent their movement and pro- 
pagation in chromosomal DNA. In plants, TGS is also 
able to affect single copy genes, giving rise to semi- 
stable epigenetic variants. Creation of reversible epial- 
leles adds to the phenotypic variability important in 
evolving plant populations. 

In PTGS, also referred to as cosuppression in 
plants, or quelling in N. crassa, the affected gene is 
transcriptionally active but its transcripts undergo 
rapid degradation, resulting in the absence of trans- 
latable mRNA. PTGS is frequently observed in trans- 
genic organisms, in particular when multiple copies of 
the transgene are present. Transcripts of both the transgene and any host genes sharing 80% or more sequence identity with the transgene are subject to degradation. In plants, infection with RNA viruses engineered to express sequences homologous to host genes will likewise result in specific degradation of
host and viral RNAs. Available evidence indicates 
that small antisense or double-stranded (ds) RNAs 
are responsible for specific RNA degradation. Such 
aberrant RNAs may be formed as a result of the 
artifactual bidirectional transcription from the trans- 
gene loci, or may be produced from endogenous genes 
modified by ectopic interactions with homologous 
transgenes. The strongest support for the role of 
dsRNA in PTGS comes from RNA interference 
(RNAi) experiments. Injection of dsRNA into the 
nematode Caenorhabditis elegans or into the eggs 
of Drosophila melanogaster leads to potent and sequence-specific PTGS. Injected dsRNA is fragmented into ~23-nt-long RNA pieces that appear to act
as guides hybridizing to endogenous mRNAs and 
targeting them for degradation. In both plants and 
C. elegans, the PTGS/RNAi effect spreads across cel- 
lular and tissue boundaries and small RNA fragments 
are the best candidates for the diffusible silencing 
signals. Genetic screens have identified several genes 
essential for establishing and/or maintaining PTGS. 
Some of them are also required for RNAi, indicating 
that the two phenomena are mechanistically related.
Like TGS, PTGS may also represent a mechanism to 
defend the organism and its genome against invasive 
nucleic acids such as transposons, retroelements, and 
viruses. Certain forms of PTGS, in particular RNAi, 
offer a targeted and efficient way of inactivating 
genes, providing a powerful tool for investigating 
gene function. 
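The guide mechanism described above can be sketched as a toy model. It assumes that the dsRNA is cut into consecutive, non-overlapping 23-nt pieces (real processing is less regular) and that a guide acts only on perfectly complementary mRNA stretches; the sequences and function names are hypothetical. It shows why silencing is sequence specific: a guide finds target sites only in transcripts that share the dsRNA sequence.

```python
PAIR = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna: str) -> str:
    """Return the antiparallel complementary RNA strand."""
    return "".join(PAIR[base] for base in reversed(rna))


def dice(sense: str, length: int = 23) -> list[str]:
    """Cut the sense strand of a dsRNA into consecutive fragments of `length` nt (simplified)."""
    return [sense[i:i + length] for i in range(0, len(sense) - length + 1, length)]


def target_sites(mrna: str, guide: str) -> list[int]:
    """0-based positions where the antisense guide can base-pair perfectly with the mRNA."""
    site = reverse_complement(guide)  # the mRNA sequence complementary to the guide
    n = len(site)
    return [i for i in range(len(mrna) - n + 1) if mrna[i:i + n] == site]


if __name__ == "__main__":
    sense = "AUGCCGUAAGCUUGACGGAUCCAUUGGCAACGUUCAGGAUCCUAGGC"  # hypothetical dsRNA sense strand
    host_mrna = "GGGAAA" + sense + "CCCUUU"                  # hypothetical homologous transcript
    unrelated_mrna = "GCAU" * 15                             # hypothetical unrelated transcript
    for fragment in dice(sense):
        guide = reverse_complement(fragment)  # the antisense strand acts as the guide
        print(guide, target_sites(host_mrna, guide), target_sites(unrelated_mrna, guide))
```

Each guide finds one site in the homologous transcript and none in the unrelated one; real RNAi tolerates some mismatches, which this exact-match sketch does not model.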

Recent experiments point to links between TGS 
and PTGS. In plants, dsRNA which acts as a trigger 
of PTGS can also direct methylation of the homolo- 
gous sequences in DNA, leading to transcriptional 
inactivation of the gene. 


See also: Epigenetics; Transposable Elements; 
X-Chromosome Inactivation 

Gene substitution is the process in which a mutant
allele replaces the original allele in a population. Many 
mutants arise in natural populations, but the majority 
of them are lost within a few generations by chance. 
Those lucky mutants that survive the first few genera- 
tions are tested by natural selection, i.e., selectively 
advantageous mutations increase their frequencies in 
the population, and disadvantageous ones are elimin- 
ated from the population. For selectively neutral 
mutants, their rise and fall in the population is gov- 
erned by random genetic drift. When the frequency of 
a mutant gene in the population reaches one, the gene is said to have become fixed in the population, and a gene substitution
has occurred. 


The process of gene frequency change in a population has been studied by population geneticists using two approaches, deterministic and stochastic. When the
population size is large and random drift is negligible, 
the deterministic model is applicable, i.e., the change 
of gene frequency by natural selection can be pre- 
dicted by simple formulas. However, when the popu- 
lation size is not large, the chance effect becomes 
significant, and stochastic treatments are needed. 
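The contrast between the two approaches can be sketched in Python. The deterministic part assumes the standard recursion for genic (haploid) selection with selection coefficient s, p' = p(1 + s)/(1 + ps); the stochastic part adds random drift by drawing the next generation as a binomial sample of 2N gene copies (a Wright-Fisher model). The selection model, parameter values, and function names are illustrative assumptions, not taken from the text.

```python
import random


def deterministic_step(p: float, s: float) -> float:
    """One generation of genic selection in an infinitely large population."""
    return p * (1 + s) / (1 + p * s)


def wright_fisher_step(p: float, s: float, n_copies: int, rng: random.Random) -> float:
    """Selection followed by binomial sampling of n_copies genes (random drift)."""
    p_sel = deterministic_step(p, s)
    # Count how many of the n_copies sampled genes carry the mutant allele.
    k = sum(1 for _ in range(n_copies) if rng.random() < p_sel)
    return k / n_copies


def trajectory(p0: float, s: float, generations: int, n_copies=None, seed: int = 0) -> list:
    """Allele frequencies over time; n_copies=None gives the deterministic model."""
    rng = random.Random(seed)
    p = p0
    path = [p]
    for _ in range(generations):
        if n_copies is None:
            p = deterministic_step(p, s)
        else:
            p = wright_fisher_step(p, s, n_copies, rng)
        path.append(p)
    return path


if __name__ == "__main__":
    det = trajectory(0.01, 0.05, 200)
    sto = trajectory(0.01, 0.05, 200, n_copies=40, seed=1)  # hypothetical small population
    print(f"deterministic p after 200 generations: {det[-1]:.3f}")
    print(f"small population (2N = 40):            {sto[-1]:.3f}")
```

In the deterministic run the favored allele rises smoothly toward fixation; in the small population the same allele may be lost or fixed by chance, which is why stochastic treatment is needed there.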

The behavior of molecular mutants is often influenced by random genetic drift even in a large population because their selective effects are minute. In other words, many
molecular mutants are selectively neutral or nearly 
neutral, and their behavior depends on random drift. 
The dynamics of a completely neutral mutant have been
theoretically described, i.e., the average course of the 
substitution process is known. For nearly neutral 
mutants, interaction of selection and random drift is 
important. 
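For a completely neutral mutant, the classical result of neutral theory is that a single new copy in a diploid population of size N (2N gene copies) is eventually fixed with probability 1/(2N). Because about 2Nμ new mutants arise per generation at a locus with mutation rate μ, the substitution rate equals μ, independent of population size. The Monte Carlo sketch below checks the fixation probability under an assumed Wright-Fisher model; the population size, replicate count, and function names are illustrative choices.

```python
import random


def neutral_fixation_probability(n_copies: int, replicates: int, seed: int = 0) -> float:
    """Estimate how often a single new neutral mutant reaches frequency 1 (Wright-Fisher drift)."""
    rng = random.Random(seed)
    fixed = 0
    for _ in range(replicates):
        count = 1  # one new mutant copy among n_copies genes
        while 0 < count < n_copies:
            p = count / n_copies
            # Pure drift: binomial resampling of n_copies genes, no selection term.
            count = sum(1 for _ in range(n_copies) if rng.random() < p)
        if count == n_copies:
            fixed += 1
    return fixed / replicates


def neutral_substitution_rate(mu: float, n_diploid: int) -> float:
    """(2N * mu new mutants per generation) x (fixation probability 1/(2N)) = mu."""
    new_mutants = 2 * n_diploid * mu
    return new_mutants * (1 / (2 * n_diploid))


if __name__ == "__main__":
    n_copies = 20  # 2N for a hypothetical diploid population of N = 10
    estimate = neutral_fixation_probability(n_copies, replicates=10_000, seed=42)
    print(f"simulated: {estimate:.4f}  expected 1/(2N): {1 / n_copies:.4f}")
    print("substitution rate for mu = 1e-8:", neutral_substitution_rate(1e-8, 10_000))
```

The simulated fraction should fall close to 1/(2N) = 0.05, with sampling error of roughly ±0.002 for 10,000 replicates.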

The number of gene substitutions at a locus is 
estimated by comparing gene sequences at this locus 
between species. From such comparative studies of 
gene sequences, the rates of gene substitutions at vari- 
ous protein loci and noncoding regions have been 
obtained. The rate is defined as the number of sub- 
stitutions per unit time. As an example, the rate of 
substitution of the α hemoglobin gene is about 0.5 per site for amino acid replacement sites, and about 4 per site for synonymous sites (rates are given per 10⁹ years). In general, functionally unimportant sites evolve rapidly and important sites evolve slowly.
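The arithmetic behind such rate estimates can be sketched as follows. If two species diverged T years ago, the substitutions per site separating them, K, accumulated along both lineages, so the rate is r = K/(2T). Because observed differences underestimate K when a site has changed more than once, a correction is usually applied; this sketch assumes the Jukes-Cantor correction for nucleotide sites, K = -(3/4) ln(1 - 4p/3), where p is the observed proportion of differing sites. The input values are hypothetical, not data from the text.

```python
import math


def jukes_cantor_distance(p: float) -> float:
    """Substitutions per site from the observed proportion p of differing nucleotide sites."""
    if not 0 <= p < 0.75:
        raise ValueError("Jukes-Cantor correction requires 0 <= p < 0.75")
    return -0.75 * math.log(1 - 4 * p / 3)


def substitution_rate(k: float, divergence_years: float) -> float:
    """Rate per site per year; both lineages accumulate changes, hence the factor 2."""
    return k / (2 * divergence_years)


if __name__ == "__main__":
    p_observed = 0.20  # hypothetical: 20% of aligned sites differ
    t_years = 80e6     # hypothetical divergence time
    k = jukes_cantor_distance(p_observed)
    rate = substitution_rate(k, t_years)
    print(f"K = {k:.4f} substitutions/site; rate = {rate * 1e9:.2f} per site per 10^9 years")
```

With these hypothetical inputs, K ≈ 0.233 and the rate ≈ 1.45 substitutions per site per 10⁹ years, the same unit used for the α hemoglobin figures.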


See also: Population Genetics 

From the beginning of medical history, attempts to
treat human disease have necessarily been aimed at 
ameliorating symptoms and easing suffering rather 
than correcting the underlying causes of most human 
disease. The reasons for this are quite clear. In most cases, until late in the twentieth century, apart from attributing human disease and misfortune to irate gods, healers simply did not recognize or understand the true causes of most human affliction and could envision no alternative to bringing relief and comfort in the face of death and suffering. An exception to
this general rule might be represented by even the 
most ancient forms of surgery, in which root causes 
of disease, as imaginary and wrong as they may often 
have been, were identified and invasive surgical 
procedures developed to rid the afflicted patient of 
the offense. 

The emergence of modern medical science over 
the past few centuries, and particularly the last few 
decades of the twentieth century, was to change that 
approach. Epochal advances came, first slowly and then with increasing speed:


• The development of the art/science of human anatomy.

• The discovery of blood circulation by the English physician William Harvey in 1628.

• The use of the light microscope by the Dutch cloth merchant Anton van Leeuwenhoek in 1674 and the identification by Robert Hooke in 1665 of ‘cells’ as the structural basis of life.

• The development of the science of cell biology by Theodor Schwann, Matthias Schleiden, and Rudolph Virchow in the mid-1800s.

• The first public demonstration of anesthesia by William Morton in 1846 and the consequent birth of modern surgery.

• The discovery by Gregor Mendel in 1865 of the laws of genetic inheritance.

• The development of the germ theory by Louis Pasteur and others during the same period.

• The revelation of the concepts of chemical pathology by Archibald Garrod at the beginning of the twentieth century (the concept of ‘inborn’ errors of metabolism: the principle that genetic errors lead to disruptions of normal metabolic processes to produce disease).

• The discovery of antibiotics by Alexander Fleming in London and a group of chemists at Oxford in the 1920s and the 1930s.

• The invention of experimental genetics in Drosophila by Thomas Hunt Morgan in the early 1900s.

• The discovery in the 1940s and 1950s by Oswald Avery, Colin MacLeod, Maclyn McCarty, Alfred Hershey, and Martha Chase that genetic information is carried by deoxyribonucleic acid (DNA).

• The discovery of the chemical rules by which DNA stores and transmits its genetic information during the full flowering of molecular biology in the 1960s and 1970s, under the leadership of the giants of the era: Francis Crick, James Watson, Sydney Brenner, François Jacob, Jacques Monod, Fred Sanger, Max Perutz, and many others.

The results of this explosion of knowledge of the physical and chemical basis of life were applied
very quickly to an understanding of the nature of the 
genetic errors that lead to disease, making the design 
of drugs ever more rational and effective. An import- 
ant product of all these developments was an under- 
standing that most human disease results from a 
combination of inborn genetic factors and environ- 
mental influences, with predominating genetic factors 
in some disorders (such as cystic fibrosis, sickle-cell 
anemia, Tay-Sachs disease, Huntington disease, etc.), 
and a combination of genetic and environmental 
influences in most of the common and severe diseases 
(cancer, heart disease, degenerative disorders, neuro- 
logical diseases such as Parkinson and Alzheimer 
diseases, and even infectious disease). Yet still, until 
the late 1960s and early 1970s, even with all this 
new understanding of the causes of disease, the pre- 
dominant treatment model was still one in which the 
target for therapy was the drug treatment of abnormal 
cellular processes that resulted from the underlying 
defect. Treatment was still not aimed at the defect itself 
— what was being fixed was not what was broken. 

A new and more definitive approach to therapy 
began to surface in the mid to late twentieth century, 
one that is destined to provide a rational attack not 
only on the results of the causative defects but also, for 
the first time, more directly on the causes themselves. 
In 1944, Oswald Avery, Colin MacLeod, and their colleagues at the Rockefeller Institute in New York first demonstrated that purified DNA from one strain of bacteria could be introduced into another strain to transfer genetic traits to the recipient bacteria.
This process came to be called ‘genetic transformation.’ 
Inevitably, as the science of mammalian cell biology 
matured in the 1950s and 1960s, scientists would begin 
to try the same sort of experiment in mammalian 
cells rather than bacterial cells. Could normal human and other mammalian cells be changed permanently (‘transformed’) by exposure to DNA from another mammalian cell? Could cells carrying disease traits be changed to normal cells (‘cured’) by exposure to DNA from normal cells?

The answer was that genetic correction proved much, much more difficult and far less efficient in mammalian cells than in bacteria. There were a number of early experiments indicating that, after exposure to foreign normal
DNA, genetically altered cells could indeed be found 
among defective cells, but only at a frequency of one 
in a million or less — certainly not efficient enough 
to imagine correcting a disease by such an approach. 
But by the mid-1960s, a number of investigators came 
to realize that there were agents all around us in nature 
that were able to do the job of introducing foreign
DNA into human and other mammalian cells with 
very great efficiency. These agents are called viruses. 
Their life cycles depend on their ability to insert their 
genetic material, whether it is DNA or RNA, into tar- 
get cells and to express their genes for varying lengths 
of time in those cells, thereby both reproducing them- 
selves and imparting new genetic traits to those cells. 
They have therefore evolved to carry out such gene transfer with great efficiency. Viruses are essentially packages of DNA or RNA surrounded by protein, sugars, and fat molecules. The functions of these viral ‘coats’ are not only to package and protect the viral
genetic material but also to help the virus identify 
specific molecules on the surface of their target host 
cells (virus receptors) that serve to attach the virus 
to the cells and promote its entry into the cell. It is 
because of this interaction of viruses with their specific 
cell-surface receptors that viruses are so much more 
efficient at transferring their genes into cells than 
other nonviral methods of gene transfer into mamma- 
lian cells. Unfortunately, at least for the infected cell, 
the cell often becomes nothing more than a factory 
more or less single-mindedly devoted to reproducing 
the virus and subverting all other cell functions 
necessary for cell survival, thereby killing the cells. 
However, some viruses have come to a very happy 
accommodation with their host cells and are able to 
exist for long periods in the infected cell without 
producing any apparent damage to the cell. In such 
cells, the foreign piece of new genetic information can 
become integrated into the genetic information 
of the cells, thereby providing the cell stably and 
permanently with new genetic functions without 
killing the cells. Unfortunately, as Renato Dulbecco 
and his colleagues at the Salk Institute in California 
showed in the mid-1960s, those new genes can 
have the effect of causing the cell to forget how to 
stop growing in its usual controlled fashion, thereby 
producing a cell that grows out of control — a cancer 
cell. 

For someone interested in human disease and, by 
good fortune, exposed to the environment of such a 
laboratory, the leap from inefficient gene transfer into 
defective human cells with purified (‘naked’) DNA to 
the use of viruses as agents to carry foreign and poten- 
tially therapeutic DNA into cells seemed obvious to 
several of us in the Dulbecco laboratory. In 1972, my 
colleague and I proposed that such viruses might be genetically modified to make them incapable of replicating and to abrogate their pathogenicity, while
at the same time using them as vehicles to transfer 
therapeutic genes into defective cells (Friedmann and 
Roblin, 1972). We envisioned two general approaches 
to the therapeutic applications: 


• The ex vivo approach, in which the genetic correction would be accomplished by removing target cells from a patient, introducing a therapeutic viral vector into them in vitro, and then returning the genetically corrected cells to the patient.

• The in vivo approach, in which the gene transfer vector is introduced directly into the target defective cells in the patient.


While very attractive in principle, these concepts could 
not be put into practice at the time because no efficient 
viral gene transfer vectors existed and the recombinant 
DNA techniques needed to produce them had not 
yet been developed. Fortunately, over the next few 
years, methods of recombinant DNA manipulation 
were developed and refined and allowed, in the early 
1980s, the design and production of the first truly 
efficient viral vectors for gene transfer into mamma- 
lian cells. These vectors were derived from mouse 
viruses that used RNA as their genetic material and 
that were associated with several kinds of cancer in 
laboratory mice. These original vectors were derived 
from viruses that were called retroviruses because they 
have the property of converting their RNA into DNA 
after infection. They are able to integrate the DNA 
copies of their genomes into the genome of the host 
cell, thereby allowing them to express some of the 
viral genes in a stable and heritable way in the cell for 
the lifetime of the cell. Recombinant DNA methods 
allowed investigators to remove the potentially dele- 
terious genes from the viruses and replace them with 
other genes that could also be expressed perman- 
ently in the infected cells. These methods provided a 
proof of principle that such retrovirus vectors could 
carry out the functions required of a gene therapy vec- 
tor for human disease. Very quickly thereafter, our 
laboratory showed for the first time that a retrovirus 
vector carrying a normal copy of a human disease- 
related gene could correct the abnormal properties of 
cells derived from patients. We transferred a cDNA 
corresponding to the normal allele of the hypoxanthine guanine phosphoribosyl transferase (HPRT) gene via a retrovirus vector into cultured cells from patients
with the rare but devastating Lesch-Nyhan disease, 
and found that we could identify modified cells that 
not only demonstrated restored expression of the nor- 
mal gene but also correction of some of the secondary 
metabolic defects resulting from their HPRT defi- 
ciency (Willis et al., 1984). It was the development 
of the retrovirus vectors that represented the single 
most important early technical advance that opened 
the door to the subsequent explosion of gene transfer 
with many additional disease-related genes. 

One of those other disease-related genes that was applied early to gene transfer studies with retrovirus vectors was the gene encoding adenosine deaminase (ADA), a deficiency of which causes a severe immunodeficiency in human patients. Model studies with the normal ADA gene similar to those with
HPRT demonstrated correction of the enzyme defect 
in cells from ADA patients and it was this disease 
model that was eventually to become the subject of the 
first potentially therapeutic human gene therapy study. 

Retrovirus vectors have many advantages for potential gene therapy applications, but they were also quickly found to have a number of disadvantages. They are unable to infect nonreplicating cells such as neurons or hepatocytes, both important potential target cells for gene therapy. They are also relatively unstable in vivo, and they cannot be produced at sufficiently high titers to make gene delivery efficient in vivo. For these and other reasons, vectors have been
developed from a number of other parent viruses, in- 
cluding human retroviruses such as HIV-1 and HIV-2 
and other lentiviruses, adenoviruses, herpes viruses, 
and adeno-associated viruses. This growing collection 
of vectors now allows gene transfer into virtually any 
and every possible human or other mammalian cell, 
either in vitro or in vivo. Furthermore, gene transfer 
methods using nonviral vectors such as liposomes and 
naked DNA have become increasingly efficient and 
useful in a wide variety of disease models. Some of the 
important properties of the more common of these 
vectors are summarized in Table 1. These represent the properties of the most commonly used versions of each of the major vector systems. It must be kept in mind that major improvements are rapidly being made in each of the systems, which will significantly improve their properties and quickly make some of the disadvantages listed in Table 1 out of date. For example, methods are emerging to permit
the targeting of some of these vectors to specific cells in 
vivo, allowing efficient delivery by the bloodstream. 
Titers and vector concentrations are improving, and 
cytotoxic and immunogenic properties are being
reduced. Methods are emerging for the integrating 
viruses (retroviruses, lentiviruses, AAV) that will 
eventually allow insertion into specified sites in the 
host cell genome, thus abrogating the possibility of 
insertional mutations in the cell. 

Even though the initial disease models were those 
of the single gene defects, the ‘inborn errors of 
metabolism’ such as Lesch-Nyhan disease, adenosine 
deaminase deficiency described above, cystic fibrosis, 
familial hypercholesterolemia, and others, other more 
complex diseases also became targets for gene ther- 
apy studies. Cancer quickly became one of the most 
attractive targets for gene therapy studies because of 
the enormous importance of the public health problem 
posed by cancer, and because of the identification of 
a variety of cancer-causing genes (oncogenes, tumor
suppressor genes, apoptosis and cell death genes, cell- 
cycle-regulating genes, immune-modulatory genes, 
and others) that presented appealing targets for gen- 
etic manipulation and disease intervention. Other 
complex diseases also came to be identified more
and more as potential gene therapy targets, including 
degenerative diseases such as atherosclerosis and 
many forms of cardiovascular disease, arthritis, dia- 
betes mellitus, familial and sporadic forms of neuro- 
logical degenerative disorders such as Parkinson 
and Alzheimer diseases, and others. Even infectious 
diseases such as AIDS became potential targets for 
genetic intervention, and genetic approaches toward 
the control of agents responsible for other infectious 
diseases such as malaria have also become active areas 
of research. In many of the direct human disease 
models, laboratory studies have shown that foreign 
genes introduced into affected cells or into animal subjects by one or another of the gene transfer techniques
could modify or even prevent the disease phenotype. 

Table 1 Some important properties of the more common vectors

Vector | Advantages | Disadvantages
Retrovirus | Noncytotoxic, integrates, stable expression | Requires replicating cells, low titers, unstable in vivo, insertional mutations
Lentivirus (HIV, FIV, etc.) | Noncytotoxic, infects nonreplicating cells, stable expression | Low titers, unstable in vivo
Adenovirus | High titers, efficient expression | Usually transient expression, cytotoxic, immunogenic
Herpes simplex | High titers, latency in some cells, prolonged expression | Cytotoxic
Liposomes | Noncytotoxic | Inefficient
Naked DNA | Noncytotoxic, stable in some cells, vaccination uses | Inefficient

This plethora of gene transfer techniques and the availability of a growing number of convincing disease models made it appear in the late
1980s that the road to successful gene therapy in 
human patients was going to be relatively smooth 
and uncomplicated. Beginning in 1989 and 1990, pro- 
posals for human application of promising laboratory 
gene transfer results began to pour into the federal
regulatory bodies empowered to evaluate human gene 
therapy trials — the Gene Therapy Subcommittee of 
the Office of Recombinant DNA Advisory Com- 
mittee (RAC) at the National Institutes of Health and 
the Food and Drug Administration (FDA). By the 
mid-1990s, several hundred clinical studies had been 
reviewed and approved by the RAC and FDA in the 
US and by their equivalent agencies in Britain, 
France, Japan, Italy, Germany, and a number of other 
countries. Clinical gene therapy trials were under- 
taken in many forms of cancer, ADA deficiency, cystic 
fibrosis, and hypercholesterolemia, involving several 
thousand patients. Despite high levels of expectation 
for some evidence of therapeutic efficacy even in the 
early phase I studies, the results of this first rigorous set of clinical studies, published in 1995, were disappointing, since they failed to provide definitive proof of clinical benefit to any patients. However, the studies did demonstrate clearly that foreign genes could
be introduced into humans without any apparent 
deleterious effects and that such genes could be 
expressed for prolonged periods (up to several years) 
and even that they produced physiological effects that 
were relevant to the disease processes. But no convin- 
cing evidence was presented by any of these studies for 
a cure, reversal, stabilization, or cessation of a disease 
process or for improved quality of life for any of the 
patients. 

These early experiments should not be seen as 
outright failures but rather as experiments that were 
carried out in an atmosphere of unrealistically exag- 
gerated expectations and overstated claims by some 
scientists, by their institutions (including universities) 
and even the National Institutes of Health, and by 
both the lay and scientific media. Several investigators, 
as well as the director of the NIH, became concerned 
that the general field of human gene therapy was 
promising more than it could deliver and began to 
call for more thorough basic and clinical research and
more restraint in public statements from all parties 
regarding immediacy of clinical benefit from a field 
so obviously in its infancy (Friedmann, 1994). 

In the few years since those studies and the criti- 
cisms that followed, all aspects of the basic and clinical 
science of human gene transfer have improved mark- 
edly. New and vastly improved vectors have become 
available and many new disease-related genes have 
been described and their role in disease better under- 
stood. Gene transfer studies in tissue culture systems 


and in the growing number of faithful animal model 
systems for human disease have provided very con- 
vincing evidence for continuously improving effi- 
ciency and stability of gene transfer and expression. 
But most exciting of all is the clear evidence for some 
clinical benefit to patients that is beginning to perco- 
late to the surface through the layer of uncertainty and 
doubt from so many previous inconclusive clinical 
studies. Only the most pessimistic could fail to see or 
believe that the clinical promise of human gene ther- 
apy is about to be delivered, slowly at first but with 
increasing speed and efficiency as our techniques and 
tools improve. 

Gene therapy is actually two things. It is the con- 
cept that much of human disease can and should be 
treated at the level of the underlying genetic mechan- 
isms. That part of the revolution of human gene ther- 
apy is over. Gene therapy is now a widely accepted 
and even a central driving force in modern medicine. 
It will not vanish or fail in the long run. Gene therapy 
is also the translation of that concept into clinical reality. That part
of the revolution is now occurring. Within the coming 
several years, patients will survive who would have 
died without genetic intervention, suffering will be 
eased that could not have been ameliorated by trad- 
itional means, and quality of life will improve for 
many people because of the power of genetic modifi- 
cation. 

This newly justified optimism does not mean that 
the road ahead for gene therapy in humans will be 
entirely smooth. There will be many technical and conceptual obstacles to the treatment of disease, and, inexorably, public policy and ethical problems will be posed by the inevitable extension of disease management to manipulations of traits not so clearly disease-related, e.g., physical stature, memory, cognition, and even
some personality traits. As the technology of gene 
transfer into human somatic cells becomes more and 
more efficient, predictable, and error-free, extension 
of genetic manipulation to the human germline to 
reduce the expression of disease not only in a treated 
patient but also in the patient’s progeny will become 
more and more irresistible. The debates surrounding 
human gene therapy will be far from over with the 
imminent demonstration of therapeutic success in 
current clinical studies. Nevertheless, it is clear that 
medicine is on the verge of being able finally to deliver 
truly definitive therapy for so many diseases that have 
been otherwise intractable scourges since the beginning
of medical history (Friedmann, 1996, 1997). It is a
truly remarkable time for medicine.
One side product of many transgenic experiments is
the generation of mice in which a transgene insertion 
has disrupted an endogenous gene with a consequent 
effect on phenotype. Unlike spontaneous or mutagen- 
induced mutations, ‘insertional mutations’ of this type 
are directly amenable to molecular analysis because 
the disrupted locus is tagged with the transgene con- 
struct. Unexpected insertional mutations have pro- 
vided instant molecular handles not only for 
interesting new loci but for classical loci, as well, that 
had not been cloned previously. 

When insertional mutagenesis, rather than the 
analysis of a particular transgene construct, is the 
goal of an experiment, one can use alternative experi- 
mental protocols that are geared directly toward gene 
disruption. The main strategies currently in use are
based on the introduction into embryonic stem (ES)
cells of β-galactosidase reporter constructs that either
lack a promoter or are disrupted by an intron. The
constructs can be introduced by DNA transfection or
within the context of a retrovirus. Only when a
construct integrates into a gene undergoing transcription
is functional β-galactosidase produced, and
producing cells can be easily recognized by a color
assay. Of course, the production of β-galactosidase
will usually mean that the normal product of the
disrupted gene cannot be made; thus, this protocol
provides a means for the direct isolation of ES cells
with tagged mutations in genes that function in
embryonic cells. Mutant cells can be
incorporated into chimeric embryos for the ultimate 
production of homozygous mutant animals that will 
display the phenotype caused by the absence of the 
disrupted locus. This entire technology, referred to 
as ‘gene trapping,’ is clearly superior to traditional 
methods for the production of mutations at novel 
loci that use chemical mutagens or irradiation. 
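The selection logic of such a promoter-trap screen can be sketched as a toy model: a promoterless reporter is scored as expressed only when its insertion site falls inside a gene that is transcribed in ES cells. The gene names, coordinates, and expression flags below are hypothetical, and the model deliberately ignores insertion orientation and splicing, which matter in real constructs.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Gene:
    name: str
    start: int          # first base of the transcribed region
    end: int            # last base of the transcribed region (inclusive)
    active_in_es: bool  # is the gene transcribed in ES cells?


def trapped_gene(site: int, genes: List[Gene]) -> Optional[str]:
    """Name of the gene that would drive the promoterless reporter, or None.

    Toy assumption: an insertion anywhere inside an active gene yields
    beta-galactosidase; insertions elsewhere stay silent.
    """
    for gene in genes:
        if gene.start <= site <= gene.end and gene.active_in_es:
            return gene.name
    return None


def screen(sites: List[int], genes: List[Gene]) -> List[Optional[str]]:
    """Score each insertion: a gene name for a stained clone, None for a silent one."""
    return [trapped_gene(site, genes) for site in sites]


# Hypothetical genome with one active and one silent gene.
genes = [Gene("geneA", 100, 500, True), Gene("geneB", 800, 1200, False)]
print(screen([50, 300, 900, 1500], genes))  # [None, 'geneA', None, None]
```

Only the insertion into the active gene is recovered, which is why the screen enriches directly for mutations in genes that function in embryonic cells.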


See also: Beta (β)-Galactosidase; Embryonic
Stem Cells 


Gene Trees 


N Saitou 
The phylogenetic trees of genes are called ‘gene trees.’
Reconstruction of gene trees is quite important for 
evolutionary studies, because replication of nucleotide 
sequences automatically produces a bifurcating tree 
of genes. It should be emphasized that the phylo- 
genetic relationship of genes is different from the 
mutation process. The former always exists, while 
mutations may or may not happen within a certain 
time period and DNA region. Therefore, even if 
several nucleotide sequences happen to be identical, 
there must be a genealogical relationship for those 
sequences. However, it is impossible to reconstruct 
that genealogical relationship without the occurrence 
of mutational events. In this respect, the extraction of 
mutations from genes and their products is import- 
ant for reconstructing phylogenetic trees of genes. 
The advancement of molecular biotechnology has 
made it possible routinely to produce nucleotide se- 
quences. 

Phylogenetic trees of genes and species are called
‘gene trees’ and ‘species trees,’ respectively, and there
are several important differences between them. One
such difference is illustrated in Figure 1. Because a
gene duplication occurred before speciation of species
A and B in Figure 1A, both species have two homologous
genes in their genomes. In this situation, we
should distinguish ‘orthology,’ which is homology
of genes reflecting the phylogenetic relationship of
species, from ‘paralogy,’ which is homology of genes
caused by gene duplication(s). Thus, genes 1 and 3 (and
2 and 4) are ‘orthologous,’ while genes 1 and 4 (and 2
and 3) are ‘paralogous,’ as are homologous genes in
the same genome (gene pairs 1-2 and 3-4). If one is
not aware of the gene duplication event, the gene
tree for 1 and 4 may be misrepresented as the species
tree of A and B, and thus a gross overestimation of
the divergence time may occur. Note also that the
divergence time between genes 1 and 3 is identical
to that between genes 2 and 4, since both times
correspond to the same speciation event.


[Figure 1 (A) Gene tree with tips, top to bottom: Gene 1 (species A), Gene 3 (species B), Gene 2 (species A), Gene 4 (species B). (B) Gene tree with tips, top to bottom: Gene 1 (species A), Gene 2 (species A), Gene 3 (species B), Gene 4 (species B).]

Figure 1 Two possible relationships of four homologous
genes sampled from two species. (A) When
gene duplication preceded separation. (B) When
two independent gene duplications occurred after
speciation.
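The distinction between orthology and paralogy becomes mechanical once each internal node of a gene tree is labeled as a speciation or a duplication event: two genes are orthologous when their most recent common ancestor is a speciation node and paralogous when it is a duplication node. The sketch below encodes the two trees of Figure 1 under that labeling; the `Node` class and the function names are illustrative and not taken from any phylogenetics package.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    name: str
    event: Optional[str] = None  # "speciation" or "duplication" for internal nodes
    children: List["Node"] = field(default_factory=list)


def path_to(node: Node, target: str) -> List[Node]:
    """Nodes from `node` down to the leaf named `target` (empty list if absent)."""
    if node.name == target:
        return [node]
    for child in node.children:
        sub = path_to(child, target)
        if sub:
            return [node] + sub
    return []


def relationship(root: Node, a: str, b: str) -> str:
    """'orthologous' if the last common ancestor of a and b is a speciation node."""
    lca = None
    for x, y in zip(path_to(root, a), path_to(root, b)):
        if x is not y:
            break
        lca = x
    if lca is None or lca.event is None:
        raise ValueError("genes must be two distinct leaves of the tree")
    return "orthologous" if lca.event == "speciation" else "paralogous"


def leaf(name: str) -> Node:
    return Node(name)


# Hypothetical encoding of Figure 1A: duplication first, then speciation of A and B.
fig1a = Node("root", "duplication", [
    Node("s1", "speciation", [leaf("Gene 1"), leaf("Gene 3")]),
    Node("s2", "speciation", [leaf("Gene 2"), leaf("Gene 4")]),
])

# Hypothetical encoding of Figure 1B: speciation first, then a duplication in each species.
fig1b = Node("root", "speciation", [
    Node("dA", "duplication", [leaf("Gene 1"), leaf("Gene 2")]),
    Node("dB", "duplication", [leaf("Gene 3"), leaf("Gene 4")]),
])

print(relationship(fig1a, "Gene 1", "Gene 3"))  # orthologous
print(relationship(fig1a, "Gene 1", "Gene 4"))  # paralogous
print(relationship(fig1b, "Gene 1", "Gene 2"))  # paralogous
```

Under this labeling the same four gene names yield different classifications in the two panels, which is why an unrecognized duplication can make a paralogous pair look like a species divergence.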

When two homologous gene copies are found in
species A and B, another situation is possible, as
shown in Figure 1B. Here two gene duplications
have occurred after the speciation of species A and B,
and the two gene copies in the genome of each species
are more closely related to each other than to the
corresponding homologous genes in the other species.
Because the two duplication events occurred
independently, the divergence time between genes 1
and 2 is different from that between genes 3 and 4.


[Figure 2 (A) Gene tree with tips, top to bottom, and the nucleotide at one site: Human-1 (C), Chimpanzee-1 (C), Gorilla-1 (G), Orang utan (C), Gibbon-1 (C), Human-2 (C), Chimpanzee-2 (C), Gorilla-2 (G), [deletion] (orang utan α-2 gene), Gibbon-2 (C), Crab-eating macaque (C). (B) Tips, top to bottom: Human-1, Human-2, Gorilla-1, Gorilla-2, Chimpanzee-1, Chimpanzee-2, Orang utan, Gibbon-1, Gibbon-2, Crab-eating macaque.]

Figure 2 Effect of gene conversion on tandemly
duplicated IgA genes. (A) Plausible gene tree. (B)
Spurious gene tree adapted from Kawamura S, Saitou N
and Ueda S (1992) Journal of Biological Chemistry 267:
7359-7367.


When gene conversion and/or recombination has
occurred within the gene region under consideration,
a gene tree may be different from the species tree.
Figure 2A shows the plausible gene tree for primate
immunoglobulin α genes 1 and 2: the gene
duplication clearly preceded speciation of hominoids
and was followed by deletion of the α-2 gene from the
orang utan genome. However, there are many nucleotide
sites that possibly experienced gene conversion. One
example is shown in Figure 2A: the two gorilla genes
are both G at a particular nucleotide site, while the
remaining genes are C. This suggests that either parallel
substitution in the gorilla lineage or gene conversion
between the two gorilla genes occurred. If this kind of
nucleotide configuration occurs multiple times at sites
close to each other, gene conversion is suspected. The
resulting ‘spurious’ gene tree (Figure 2B) is distorted
from the tree of Figure 2A because of the strong effect
of gene conversion.
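The site pattern described for Figure 2A (both copies in one species sharing a state that no other sequence has) can be scanned for automatically, and clustering of such sites then flags candidate conversion tracts. This is only a heuristic sketch of the reasoning above, not the method of the study cited in the Figure 2 caption; the toy alignment and the `window` threshold are hypothetical.

```python
from typing import Dict, List


def shared_private_sites(alignment: Dict[str, str], copy1: str, copy2: str) -> List[int]:
    """0-based positions where copy1 and copy2 agree on a state absent from all other sequences."""
    others = [seq for name, seq in alignment.items() if name not in (copy1, copy2)]
    sites = []
    for i, (a, b) in enumerate(zip(alignment[copy1], alignment[copy2])):
        if a == b and all(seq[i] != a for seq in others):
            sites.append(i)
    return sites


def looks_clustered(sites: List[int], window: int) -> bool:
    """True if two consecutive candidate sites lie within `window` bases of each other."""
    return any(later - earlier <= window for earlier, later in zip(sites, sites[1:]))


# Hypothetical 10-base alignment of tandem paralogs (not real IgA sequence).
alignment = {
    "Human-1":   "ACGTACGTAC",
    "Human-2":   "ACGTACGTAC",
    "Gorilla-1": "ACGGACCTAC",
    "Gorilla-2": "ACGGACCTAC",
    "Gibbon-1":  "ACGTACGTAC",
    "Gibbon-2":  "ACGTACGTAC",
}

sites = shared_private_sites(alignment, "Gorilla-1", "Gorilla-2")
print(sites)                             # [3, 6]
print(looks_clustered(sites, window=5))  # True
```

A single such site is equally compatible with parallel substitution; it is the repeated, nearby occurrence that points toward conversion.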

When closely related genes, such as genes sampled
from the same species or same population, are
compared, the resulting gene trees are often called
‘gene genealogies.’ Although their basic characteristics
do not differ from those of gene trees in which remotely
related genes are compared, a somewhat different
approach may be necessary. This short-term evolution
has been central to population genetics theories, in
which allele frequency changes are considered. When
the overall divergence time of a gene genealogy is
small, the total number of mutations occurring in that
genealogy may be quite small. In this case, detailed
reconstruction of a gene genealogy is not easy,
especially when only a short nucleotide segment is
examined. Therefore, allele frequency changes can be
more powerful for delineating short-term evolution.
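How few mutations a short-term genealogy may carry can be put in numbers with a standard neutral-coalescent result that the text itself does not state (Watterson's expectation): for n sampled sequences, the expected number of segregating sites is θ(1 + 1/2 + ... + 1/(n − 1)), with θ = 4Nμ for a diploid population of constant size N and mutation rate μ per sequence per generation. The parameter values in the example are illustrative assumptions.

```python
def expected_segregating_sites(n: int, n_e: int, mu_per_site: float, length: int) -> float:
    """Watterson's expectation E[S] = theta * sum_{i=1}^{n-1} 1/i under the neutral coalescent.

    Assumes a diploid population of constant effective size n_e, the infinite-sites
    model, and a mutation rate of mu_per_site per base per generation.
    """
    theta = 4 * n_e * mu_per_site * length  # population mutation rate for the whole segment
    harmonic = sum(1.0 / i for i in range(1, n))
    return theta * harmonic


# Ten sequences of a hypothetical 500-base segment from a population of 10 000:
print(expected_segregating_sites(10, 10_000, 1e-8, 500))  # roughly 0.57
```

Under these assumptions fewer than one variable site is expected in the whole sample, so the branching order of the genealogy is essentially unrecoverable from the sequences alone.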


See also: Homology; Orthology; Paralogy; 
Phylogeny; Species Trees; Trees 


Genetic Code 


S Brenner 
One of the main outcomes of the elucidation of the
structure of DNA was that the gene could be con- 
sidered as a one-dimensional sequence of the four 
bases, adenine, guanine, cytosine, and thymine. It 
was known from the work of Frederick Sanger and 
the protein chemists who followed him that proteins 
were folded versions of linear polypeptide chains, 
i.e., one-dimensional sequences of the 20 different 
amino acids. How the sequence of four bases in
DNA determined the sequence of 20 amino acids in
proteins came to be known as the coding problem.
The physicist G. Gamow proposed a special code in
which he supposed that three bases were used to
specify one amino acid (a triplet code) and that the
triplets overlapped; this was a degenerate code in
which each base was used three times in successive
triplets. By choosing a particular rule to classify the
triplets he showed that the 64 triplets could code for
exactly 20 amino acids. The fact that the magic number
20 could be derived in what seemed to be a natural way
lent encouragement to the idea that the code could be
deduced theoretically. In due course, Gamow’s code
was shown to be wrong and, in fact, the theory of
overlapping triplet codes could be eliminated
simply by showing that there were more dipeptide
sequences than the 256 to which overlapping codes
were limited (in an overlapping triplet code each
triplet can be followed by only four others, so at most
64 × 4 = 256 different pairs are possible). It became
clear that the code would have to be determined
experimentally.

In the 1960s it became known that proteins were
translated in special particles called ribosomes and that
the messenger RNA was read not directly by amino
acids but by special transfer RNAs to which the amino
acids had become linked (see Adaptor Hypothesis).
This provided a way of studying which triplets
corresponded to which amino acids, and in this way the
code was determined experimentally. For example, GCA,
GCU, GCG, and GCC all specify alanine, while histidine
is coded by two triplets, CAC and CAU. Three of
the triplets, UAA, UAG, and UGA, are reserved
as chain termination signals, while AUG, which
normally codes for methionine, also has a special tRNA
for initiating translation of the sequence (see Tables 1
and 2).


Table 1 The genetic code

UUU phenylalanine    UCU serine       UAU tyrosine         UGU cysteine
UUC phenylalanine    UCC serine       UAC tyrosine         UGC cysteine
UUA leucine          UCA serine       UAA stop (ochre)     UGA stop
UUG leucine          UCG serine       UAG stop (amber)     UGG tryptophan

CUU leucine          CCU proline      CAU histidine        CGU arginine
CUC leucine          CCC proline      CAC histidine        CGC arginine
CUA leucine          CCA proline      CAA glutamine        CGA arginine
CUG leucine          CCG proline      CAG glutamine        CGG arginine

AUU isoleucine       ACU threonine    AAU asparagine       AGU serine
AUC isoleucine       ACC threonine    AAC asparagine       AGC serine
AUA isoleucine       ACA threonine    AAA lysine           AGA arginine
AUG methionine       ACG threonine    AAG lysine           AGG arginine

GUU valine           GCU alanine      GAU aspartic acid    GGU glycine
GUC valine           GCC alanine      GAC aspartic acid    GGC glycine
GUA valine           GCA alanine      GAA glutamic acid    GGA glycine
GUG valine           GCG alanine      GAG glutamic acid    GGG glycine
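Table 1 can be held as a 64-entry lookup table and used to translate a messenger RNA. The sketch below builds the table from one-letter amino acid codes listed in the order of Table 1 (first base in blocks, second base across, third base down, each in the order U, C, A, G) and stops at the first termination codon. The function names are illustrative, and initiation is deliberately simplified: reading just starts at the first base given.

```python
from itertools import product

BASES = "UCAG"

# One-letter amino acid codes for the 64 codons, enumerated UUU, UUC, UUA, UUG, UCU, ...
# "*" marks the three termination codons (UAA, UAG, UGA).
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"  # first base U
    "LLLLPPPPHHQQRRRR"  # first base C
    "IIIMTTTTNNKKSSRR"  # first base A
    "VVVVAAAADDEEGGGG"  # first base G
)

CODE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(mrna: str) -> str:
    """Translate codon by codon from the first base (a simplification) until a stop or the end."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        aa = CODE[mrna[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


print(translate("AUGGCACACUAAGGG"))                      # MAH: Met-Ala-His, then UAA stops
print(sorted(c for c, aa in CODE.items() if aa == "A"))  # ['GCA', 'GCC', 'GCG', 'GCU']
```

The dictionary makes the degeneracy of the code explicit: 61 sense codons map onto only 20 amino acids.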
Table 2 Variations on the genetic code

Organism(a)            Genes                    Codon(b)   Universal   Actual    References
                                                           meaning     meaning

Prokaryotes
Various                Selenoproteins(c)        UGA        Stop        SeCys     Low and Berry (1996) Trends Biochem. Sci. 21: 203
Mycoplasma sp.         All genes                UGA        Stop        Trp       Yamao et al. (1985) Proc. Natl Acad. Sci. USA 82: 2306

Organellar genomes
Mammals                All mitochondrial        UGA        Stop        Trp       Anderson et al. (1981) Nature 290: 457
                                                AGA, AGG   Arg         Stop
                                                AUA        Ile         Met
Drosophila             All mitochondrial        UGA        Stop        Trp       Clary et al. (1984) Nucl. Acids Res. 12: 3747
                                                AGA        Arg         Ser
                                                AUA        Ile         Met
Saccharomyces          All mitochondrial        UGA        Stop        Trp       Sibler et al. (1981) FEBS Letters 132: 344
  cerevisiae                                    CUN        Leu         Thr
                                                AUA        Ile         Met
Fungi                  All mitochondrial (?)    UGA        Stop        Trp       Waring et al. (1981) Cell 27: 4
Maize(d)               All mitochondrial (?)    CGG        Arg         Trp       Fox and Leaver (1981) Cell 26: 315

Eukaryotic nuclear genomes
Protozoa               All nuclear              UAA, UAG   Stop        Gln       Caron and Meyer (1985) Nature 314: 185;
                                                                                 Preer et al. (1985) Nature 314: 188;
                                                                                 Horowitz and Gorovsky (1985) Proc. Natl Acad. Sci. USA 82: 2452;
                                                                                 Kuchino et al. (1985) Proc. Natl Acad. Sci. USA 82: 4758
Candida cylindracea    All nuclear              CUG        Leu         Ser
Various mammals        Selenoproteins(c)        UGA        Stop        SeCys     Low and Berry (1996) Trends Biochem. Sci. 21: 203

(a) Where a single species is given it is possible that related organisms also display the same code modifications.
(b) N = any nucleotide.
(c) The following are known to be selenoproteins: formate dehydrogenase (Escherichia coli, Enterobacter aerogenes, Clostridium
thermoaceticum, C. thermoautotrophicum, Methanococcus vannielii), NiFeSe hydrogenase (Desulphomicrobium baculatum, M. voltae),
glycine reductase (C. sticklandii, C. purinolyticum), cellular glutathione peroxidase (human, cow, rat, mouse), plasma glutathione
peroxidase (human), phospholipid hydroperoxide glutathione peroxidase (pig, rat), selenoprotein P (human, cow, rat),
selenoprotein W (rat), type 1 deiodinase (human, rat, mouse, dog), type 2 deiodinase (Rana catesbeiana), type 3 deiodinase
(human, rat, R. catesbeiana). See http://www.tigr.org/tdb/at/at.html
(d) In maize and other plants the CGG codon is probably converted into UGG (the correct codon for tryptophan) by RNA editing.
(Reproduced with permission from Molecular Biology Labfax 1: Recombinant DNA. London: Academic Press.)
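The variant codes in Table 2 amount to a small set of reassignments layered on the standard code. As a sketch, the mammalian mitochondrial entries of Table 2 (UGA read as Trp, AGA and AGG as stop, AUA as Met) can be applied as overrides to a standard lookup table. The dictionary and function names are illustrative, and each other organism in the table would need its own override set.

```python
from itertools import product

BASES = "UCAG"
# Standard code (Table 1), one-letter amino acids, "*" = stop.
STANDARD = {
    "".join(codon): aa
    for codon, aa in zip(
        product(BASES, repeat=3),
        "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG",
    )
}

# Reassignments listed in Table 2 for mammalian mitochondria (other organisms differ).
MAMMAL_MITO_OVERRIDES = {"UGA": "W", "AGA": "*", "AGG": "*", "AUA": "M"}


def variant_code(overrides: dict) -> dict:
    """Copy of the standard code with the given codon reassignments applied."""
    code = dict(STANDARD)
    code.update(overrides)
    return code


def translate(mrna: str, code: dict) -> str:
    """Translate from the first base until a stop codon (as defined by `code`) or the end."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        aa = code[mrna[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


mito = variant_code(MAMMAL_MITO_OVERRIDES)
message = "AUAUGAAGA"
print(translate(message, STANDARD))  # I   (UGA is a stop in the standard code)
print(translate(message, mito))      # MW  (AUA = Met, UGA = Trp, AGA = stop)
```

The same message gives different products under the two codes, which is why the appropriate code must be known before a mitochondrial gene is translated in silico.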


For some time it was thought that the code was universal,
that is, identical for all living organisms from
viruses to humans. However, in certain protozoa and in
the mitochondrial organelles of higher organisms there
are differences. For example, a codon that normally
signifies chain termination can encode an amino acid, or
a codon that codes for a particular amino acid in one
organism can code for a different amino acid in another.


See also: Adaptor Hypothesis; Codon Usage Bias;
Codons; Universal Genetic Code; Variable Codons


Genetic Colonization


JB Mitton
Copyright © 2001 Academic Press
doi: 10.1006/rwgn.2001.0529


Genetic colonization refers to the establishment of
new breeding populations. This process is more than
just the arrival of individuals at an unpopulated site;
‘genetic colonization’ indicates that the colonists
breed and establish a self-sustaining population.


Each spring, the pelagic larvae of the blue mussel, 
Mytilus edulis, colonize the Outer Banks of North 
Carolina from breeding populations further north. 
However, the summer temperatures on the Outer 
Banks exceed the tolerance level of the mussels, so 
the populations go extinct before they have a chance 
to breed. In contrast, the mussel native to the Medi- 
terranean, Mytilus galloprovincialis, successfully col- 
onized sites in southern Africa and Australia during 
the last glacial maximum. 

Genetic colonizations have been common throughout
history, and they can be organized into three
general groups: changes in geographical range as
climates shift, contemporary invasions facilitated
by man, and the normal flux of establishment and
extinction of local populations in species with
metapopulations.


Genetic Colonizations Associated with 
Climate Change 


The waxing and waning of glaciers modifies the dis- 
tributions of species in both temperate terrestrial and 
marine environments. As the glaciers grow, they dis- 
place species from high elevations and high latitudes 
into glacial refugia at lower elevations and latitudes. 
The accumulation of glacial ice lowers sea levels, 
exposing land bridges that connect continents and 
islands. For example, during the last glaciation, Native 
Americans colonized North America by crossing the 
land bridge between Siberia and Alaska. 

At the height of the most recent glaciation, 18 000 
years ago, Scandinavia and parts of the British Isles 
were covered with ice, and tundra and permafrost 
covered central Europe. So it comes as no surprise
that the modern ranges of European plants and animals
were colonized from glacial refugia further
south. Plants and animals occupied at least four glacial
refugia: the Iberian Peninsula and areas in Italy,
Greece, and Turkey. The plants and animals migrated
to the north and into the mountains as the glaciers
receded.


Genetic Colonizations by Invading 
Species 


Biological invasions are genetic colonizations of an 
environment by a non-native species. Man’s activities 
have produced numerous genetic colonizations, often 
with disastrous results. For example, domesticated
cats introduced to New Zealand and Australia have
caused the extinction of ground-dwelling birds and
the local extinctions of some marsupials. When
ocean-going freighters do not have a full load, they
pump sea water into their tanks to adjust their
buoyancy. The next time they take on cargo, they pump
water out of the tanks, often introducing marine pelagic
larvae into new environments. More than
60 marine species have successfully colonized San
Francisco Bay in this way.


Genetic Colonizations in Species with 
Metapopulations 


Some species, such as several species of songbirds in 
the British Isles, do not live in large, continuous popu- 
lations, but in a metapopulation, i.e., a series of small 
populations linked by occasional gene flow. The small 
populations sometimes die out, but migrants from 
nearby populations have the opportunity to recolonize
the site.


Founder Effect 


Genetic variability in a population is a function of the 
number of breeding colonizers. Reduction in the 
genetic variability of a population due to a small number
of breeding colonists is called the founder effect.
Genetic variability in areas once covered by glaciers 
is often reported to be low, probably as a consequence 
of repeated founder effects in successive genetic 
colonizations. 
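The dependence of retained variability on the number of founders can be illustrated with a standard Wright–Fisher result that the text does not spell out: if a new population is founded by N diploid breeders mating at random, the expected heterozygosity in the next generation is the source heterozygosity multiplied by 1 − 1/(2N), and successive founder events compound multiplicatively. The sketch assumes neutrality and ignores population growth after founding and any later gene flow.

```python
def retained_heterozygosity(founders: int, events: int = 1) -> float:
    """Expected fraction of heterozygosity kept after `events` successive founder events.

    Assumption (Wright-Fisher sampling, neutral alleles): each event draws 2 * founders
    genes at random from the previous population, so expected heterozygosity is
    multiplied by (1 - 1 / (2 * founders)) each time.
    """
    if founders < 1:
        raise ValueError("need at least one founder")
    return (1 - 1 / (2 * founders)) ** events


for n in (2, 5, 50):
    print(n, retained_heterozygosity(n))     # 0.75, 0.9, 0.99
print(retained_heterozygosity(5, events=3))  # about 0.729 after three colonizations
```

Repeated small founding events, as in the successive colonizations of formerly glaciated areas, erode variability much faster than a single event does.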


See also: Founder Effect; Gene Flow; 
Phylogeography 


Genetic Correlation 
The phenotypic correlation ($r_P$) is a measure of association
between the observed performance (phenotypic
value) of individuals for a pair of quantitative
traits, for example, stature and body weight in humans.
The genetic correlation is the corresponding measure 
of association between the genotypes of individuals, 
formally their genotypic or breeding values. It is im- 
portant in describing how traits are associated at the 
genetic level and in predicting the effect of selection on 
one trait on changes in other traits. 

For a single trait, the phenotypic variance can be 
partitioned into genetic and environmental compon- 
ents, and the genetic variance into further components. 
In the same way the phenotypic covariance (but not
correlation) $\mathrm{cov}_P$ can be expressed as a sum of covariance
components, $\mathrm{cov}_P = \mathrm{cov}_A + \mathrm{cov}_D + \mathrm{cov}_I + \mathrm{cov}_E$,
of which only the phenotypic, (additive) genetic
($\mathrm{cov}_A$), and environmental ($\mathrm{cov}_E$) are much used, the
dominance ($\mathrm{cov}_D$) and epistatic covariances ($\mathrm{cov}_I$)
typically being subsumed into $\mathrm{cov}_E$. Unless otherwise
qualified, the genetic correlation is usually defined as
the correlation, $r_A$, of breeding values (or sums of
average effects), $A_X$ and $A_Y$, for traits X and Y, rather
than as the correlation of genotypic values, because $r_A$
can be estimated from the correlation between relatives
and is useful in predicting selection response.
Thus $r_A = \mathrm{cov}(A_X, A_Y)/\sqrt{V_{A_X} V_{A_Y}}$. The genetic
correlation is visualized most simply for individuals with
large numbers of progeny, such as dairy sires used in
artificial insemination, where it becomes approximately
equal to the correlation of progeny group
means for the two traits.

The correlation may be caused by the pleiotropic
effects of individual genes on the two traits or by
linkage disequilibrium between genes each affecting
only one of the traits. Correlations due to pleiotropy
are likely to be essentially stable (for example, genes
influencing appetite may affect both size and obesity),
whereas correlations due to disequilibrium are likely
to be transient, arising perhaps from the crossing of or
introgression between populations. For example, the
cross between a line with large body size and high
prolificacy and a line with small size and low
prolificacy will induce a positive genetic correlation
between the traits, which may be sustained by disequilibrium.
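The transience of disequilibrium-based correlations follows from the decay of linkage disequilibrium under random mating, a standard result assumed here rather than stated in the text: the disequilibrium D between two loci with recombination fraction c falls by a factor (1 − c) each generation. A minimal sketch, with hypothetical starting values:

```python
def disequilibrium_after(d0: float, c: float, generations: int) -> float:
    """Linkage disequilibrium left after `generations` of random mating: D_t = D_0 * (1 - c) ** t.

    Assumes an infinite randomly mating population and no selection on the two loci.
    """
    if not 0.0 <= c <= 0.5:
        raise ValueError("recombination fraction must lie between 0 and 0.5")
    return d0 * (1 - c) ** generations


# Unlinked loci (c = 0.5) lose half of the disequilibrium created by a cross every
# generation; tightly linked loci (c = 0.01) keep most of it for many generations.
for c in (0.5, 0.01):
    print(c, [round(disequilibrium_after(0.25, c, t), 4) for t in (0, 1, 5, 10)])
```

A correlation induced by crossing two lines therefore fades quickly unless the loci involved are tightly linked, whereas a pleiotropic correlation does not decay in this way.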

As pointed out by Falconer in 1952, the genetic 
correlation can also be defined where the two traits 
specify performance in two different environments, or 
indeed in two sexes. Because an individual is reared in
only one environment, however, an equivalent
phenotypic correlation cannot be defined. A high genetic
correlation between environments then specifies a lack of
genotype × environment interaction and shows, for
example, that selection in one environment will lead to
genetic change in another.

The genetic correlation can be estimated from 
resemblance among relatives using the same designs 
and similar methods as used to estimate heritability, 
including offspring-parent and sib correlations, and 
maximum-likelihood methods using all relationships 
in the data. The covariances (directly, or scaled as
correlations or regressions) are now computed between
the performance of individuals for one trait
and that of their relatives for another. For example,
if $c(X, Y)$ is the sample covariance between trait X on
the parent and trait Y on the offspring, $2c(X, Y)$ is an
estimate of the (additive) genetic covariance and
$\sqrt{c(X, Y)\,c(Y, X)/[c(X, X)\,c(Y, Y)]}$ an estimate of
the genetic correlation. Unless the data set is large, the
estimate of the genetic correlation typically has a high
standard error.
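The parent–offspring estimators just described can be written out directly. In the sketch below each family contributes a parent record and an offspring record for both traits. Taking the sign of the estimated correlation from the cross-covariances is a common convention that is assumed here, not part of the formula in the text, and the toy data are hypothetical (an idealized case in which offspring reproduce parental values exactly, so the estimate is 1).

```python
from math import sqrt
from typing import Sequence, Tuple


def cov(x: Sequence[float], y: Sequence[float]) -> float:
    """Sample covariance with the n - 1 denominator."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (n - 1)


def parent_offspring_genetic(px, py, ox, oy) -> Tuple[float, float]:
    """Return (additive genetic covariance, genetic correlation) from parent-offspring data.

    px, py: parent values for traits X and Y; ox, oy: offspring values for X and Y.
    c(X, Y) is the covariance of X measured on the parent with Y measured on the offspring.
    """
    c_xy, c_yx = cov(px, oy), cov(py, ox)
    c_xx, c_yy = cov(px, ox), cov(py, oy)
    genetic_cov = 2 * c_xy
    ratio = (c_xy * c_yx) / (c_xx * c_yy)
    if ratio < 0:
        raise ValueError("covariances disagree in sign; estimate undefined for these data")
    sign = 1.0 if c_xy + c_yx >= 0 else -1.0  # assumed sign convention
    return genetic_cov, sign * sqrt(ratio)


g_cov, r_a = parent_offspring_genetic([1, 2, 3, 4], [2, 4, 6, 8], [1, 2, 3, 4], [2, 4, 6, 8])
print(round(g_cov, 3), round(r_a, 3))  # 6.667 1.0
```

With real data the four covariances are each estimated with error, which is why the ratio can even become negative in small samples and why the estimate has a high standard error.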

Estimates of genetic and phenotypic correlations 
have been obtained for many traits and populations. 
Because of real differences among populations and 
species and because of sampling errors they are not 
all consistent, but some patterns emerge: 


1. Genetic correlations between repeat records, such
as milk yield of cattle in different lactations or
number of bristles on two abdominal segments of
Drosophila, are typically close to 1, although the
phenotypic correlations (the repeatability of the
record) may be much lower, say 0.5.

2. Correlations among general size and among
conformation traits are high (over 0.4); correlations
between growth rates and fatness are generally
quite small; further, for such characteristics, genetic
and phenotypic correlations tend to be very
similar.

3. Genetic and phenotypic correlations between
production traits such as milk yield and the
concentration of its components, e.g., fat%, are
negative (say −0.3).

4. Genetic and phenotypic correlations between traits
of growth and reproduction are usually small, but
signs are not consistent. Typically, however, there
is a positive correlation between body size and
offspring number (litter size).
The genetic change or correlated response ($CR_Y$) in
a trait (Y) to selection on another trait (X) is
proportional to the genetic covariance or correlation.
If $S_X = i\sigma_{P_X}$ is the selection differential on X, then
$CR_Y = (\mathrm{cov}_A/V_{P_X})S_X = i h_X h_Y r_A \sigma_{P_Y}$, where $i$ is the
intensity of selection, $\sigma_{P_X}$ and $\sigma_{P_Y}$ are the phenotypic
standard deviations, and $h_X$ and $h_Y$ are the square roots of
the heritabilities. The genetic correlation can therefore
also be estimated from the correlated response to
selection if one of a pair of lines is selected for X and
the other for Y. Correlations often change substantially
over generations in selection experiments, however,
presumably as a consequence of gene frequency change
at pleiotropic loci. If selection intensities are unaffected,
the correlated response in Y from selection on X
compared to the direct response from selecting on Y
alone is given by $r_A h_X/h_Y$. This specifies the relative
effectiveness of indirect selection, for example on growth
rate to improve feed conversion efficiency.
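The correlated-response relations in this paragraph can be checked numerically. The sketch computes $CR_Y$ both from the covariance form and from the form in terms of heritabilities, together with the relative efficiency of indirect selection; the parameter values are hypothetical, h denotes the square root of heritability, and sigma a phenotypic standard deviation.

```python
from math import isclose


def correlated_response(i: float, h_x: float, h_y: float, r_a: float, sigma_py: float) -> float:
    """CR_Y = i * h_X * h_Y * r_A * sigma_PY."""
    return i * h_x * h_y * r_a * sigma_py


def correlated_response_from_cov(i, h_x, h_y, r_a, sigma_px, sigma_py):
    """Same quantity via (cov_A / V_PX) * S_X, with S_X = i * sigma_PX."""
    cov_a = r_a * (h_x * sigma_px) * (h_y * sigma_py)  # r_A * sigma_AX * sigma_AY
    s_x = i * sigma_px
    return cov_a / sigma_px ** 2 * s_x


def relative_efficiency(h_x: float, h_y: float, r_a: float) -> float:
    """Correlated response in Y from selecting on X, relative to selecting on Y directly."""
    return r_a * h_x / h_y


# Hypothetical values: i = 1.0, h_X = 0.5, h_Y = 0.4, r_A = 0.6, sigma_PX = 3, sigma_PY = 10.
cr = correlated_response(1.0, 0.5, 0.4, 0.6, 10.0)
assert isclose(cr, correlated_response_from_cov(1.0, 0.5, 0.4, 0.6, 3.0, 10.0))
print(round(cr, 3), round(relative_efficiency(0.5, 0.4, 0.6), 3))  # 1.2 0.75
```

Here indirect selection achieves three quarters of the direct response; it would outperform direct selection only if $r_A h_X$ exceeded $h_Y$.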

The magnitude of genetic correlations reflects both
the pleiotropic nature of genes present and arising in
the population from mutation, and the evolutionary
forces to which the species or population has been
exposed. Thus it would be surprising to find very
strong correlations between body size or conformation
traits and reproduction traits, on the assumption
that the latter were exposed to natural selection; if
there were any associations they would be nonlinear,
i.e., with intermediates at an optimum. Negative
associations are to be expected among traits that
individually contribute to fitness, because variants
favorable for both traits will have been removed from
the segregating variation (fixed) by selection. The
magnitude of such correlations among life history
traits is a subject of active research.


Further Reading 

Falconer DS and Mackay TFC (1996) Introduction to Quantitative 
Genetics, 4th edn. Harlow, UK: Longman. 

Kearsey MJ and Pooni HS (1996) The Genetical Analysis of Quan- 
titative Traits. London: Chapman & Hall. 

Lynch M and Walsh B (1998) Genetics and Analysis of Quantitative 
Traits. Sunderland, MA: Sinauer Associates. 

Roff DA (1997) Evolutionary Quantitative Genetics. New York: 
Chapman & Hall. 


See also: Artificial Selection; Genetic Variation; 
Heritability; Selection Index 
Kelly (1986) has defined genetic counseling as an
individual- and family-based “educational process 
that seeks to assist affected and/or at-risk individuals 
to understand the nature of the genetic disorder, 
its transmission and the options open to them in 
management and family planning.” A comprehensive 
review can be found in Practical Genetic Counselling 
(Harper, 1998). 


Specialist Genetic Clinic 


In the specialist genetic clinic, counseling frequently 
involves risks of recurrence of genetic disorders and 
reproductive options rather than treatment and has 
five components (see Table 1). A family tree and a
precise genetic diagnosis are important for accurate 
risk estimation. All available clinical records of the 
patient and family and appropriate special investiga- 
tions, including the latest DNA methods, will be used. 
Risk estimation may be a simple process if the diag- 
nosis and family history are known and the genetic 
disorder is consistent in its manifestations, but in 
many cases there is a need to combine such data with
new findings to arrive at a final risk estimation. The 
use that the client makes of information, including 
decisions about reproductive options, may be influ- 
enced by the way it is communicated. Effective com- 
munication requires training in counseling as well as 
knowledge of medical genetics. The counselor needs 
to be aware of the client’s attitudes, level of emotional 
involvement, perception of the facts, and religious or 
other precepts. Counselors are consequently required 
to identify the individual client’s ‘agenda,’ knowledge, 
and needs rather than to deploy standardized explan- 
ations or advice. Continuing support may be highly 
desirable during, for example, prenatal diagnosis, ter- 
mination of pregnancy, or the consequences of adverse 
results of predictive tests. 
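Combining prior genetic risk with new findings is usually done with Bayes' theorem. As an illustration not taken from the text, consider a woman whose mother is an obligate carrier of an X-linked recessive disorder (prior carrier risk 1/2) and who has one or more unaffected sons; each unaffected son halves the likelihood of her being a carrier relative to not being one. The sketch assumes full penetrance and ignores new mutation and any carrier test results.

```python
from fractions import Fraction


def posterior_carrier_risk(prior: Fraction, unaffected_sons: int) -> Fraction:
    """Bayesian carrier risk for an X-linked recessive disorder after unaffected sons.

    Assumptions: if she is a carrier, each son is unaffected with probability 1/2;
    if she is not, every son is unaffected (full penetrance, no new mutation).
    """
    like_carrier = Fraction(1, 2) ** unaffected_sons
    like_noncarrier = Fraction(1)
    joint_carrier = prior * like_carrier
    joint_noncarrier = (1 - prior) * like_noncarrier
    return joint_carrier / (joint_carrier + joint_noncarrier)


for sons in range(4):
    print(sons, posterior_carrier_risk(Fraction(1, 2), sons))  # 1/2, 1/3, 1/5, 1/9
```

Exact fractions keep the arithmetic transparent, which helps when the figures have to be explained to the client.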


Ethical Issues 


The aim is to be nondirective and to supply clients 
with the facts, understanding, and confidence to make 
reproductive or other decisions that are best for them. 
Trained medical geneticists and counselors, especially 
when dealing with reproductive decisions, will always 
attempt to adhere to the gold standard of nondirec- 
tiveness. This means that counselors provide the five 
components of genetic counseling (see Table 1) but 
rarely advise a ‘correct’ or ‘incorrect’ course of action 
for the client to follow. This is not always true of 
other specialists, because they are accustomed to ad- 
vising patients to accept various forms of treatment 
for physical illness. There are shades of opinion 
amongst health professionals (and their patients) 
about giving advice rather than only information, 
but the rule is to reject coercion aimed at the subjection
of an individual patient’s wishes to the public
good (see Ethics and Genetics).


Who Else Does Genetic Counseling? 


The preceding description is based on counseling
developed in specialist centers dealing with individuals
and families who request counseling because
of the birth of an affected infant or a preexisting
genetic disorder. However, there are many other
circumstances requiring genetic counseling in which
the full process is not available because of time
constraints and the absence of fully trained geneticists.
For example, family studies and population screening
identify individuals who may not have sought
genetic counseling themselves but are at increased
risk or are shown to be carriers or to have genetic
susceptibility factors. Genetic counseling is also
needed in many specialities as part of routine
practice, including treatment and prevention. Common
diseases of complex etiology, such as diabetes mellitus,
coronary heart disease, and cancer, may require the
communication of quantitative and probabilistic risks to
currently healthy individuals. Here similar ethical
principles apply, respecting individual autonomy, but
counseling is increasingly likely to be provided by
physicians and others who are concerned with the
management of common disease and more familiar with
therapy than with counseling.


Table 1 The five components of genetic counseling as
developed in specialist genetic clinics

No.  Component
1    Taking a family history
2    Making a diagnosis
3    Estimating risk
4    Empathic communication of facts to the client
5    Follow-up and support


Audit of Genetic Counseling 


Maintaining overall quality, accuracy, respect for 
patient autonomy, and nondirectiveness requires 
continuous audit, especially as medical and nursing 
undergraduate and postgraduate genetic education 
have tended to lag behind scientific advances. Appro- 
priate clinical management of genetic disorders must 
include records of timely counseling so that the avoid- 
ance of genetic disease can always be seen to result 
from informed patient choice. Only when there are 
records to document that counseling was accurate 
and empathic can we be confident that rejection or 
acceptance of screening, prenatal diagnosis, or ter- 
mination of pregnancy are autonomous decisions 
made by adequately informed patients (Harris et al., 
1999). 
Consider a quantitative trait. The phenotype of an
individual X, which is its measurement with respect
to this trait, may be written as


$P_X = G_X + e_X + (Ge)_X$


where $G_X$ is the mean of all individuals with the same
genotype as X, $e_X$ is the effect of the environment, and
$(Ge)_X$ is the effect of the genotype–environment
interaction. Then if it is assumed that $(Ge)_X$ is equal
to 0, that genotypes are randomly distributed among
environments, and that the environmental effects of
relatives are uncorrelated, the covariance between the
phenotypes of pairs of individuals X and Z with a
particular pattern of relationship is


$\mathrm{cov}(P_X, P_Z) = \mathrm{cov}(G_X, G_Z)$


where the right side of the equation is the genetic
covariance between X and Z. The genetic covariance
is the average of cross-products of deviations of $G_X$
and $G_Z$ from the mean of the population when all pairs
of individuals X and Z with the same particular
pattern of relationship are considered.

Let us assume that there is an infinite random mat- 
ing population and independent assortment. Let the 
pairs of parents of X and Z be respectively (P, Q) and 
(R, S) and fag be the probability that independently 
chosen random copies of a gene from individuals A 
and B are identical by descent. Then 


$$\operatorname{cov}(G_X, G_Z) = 2f_{XZ}\,\sigma_A^2 + u_{XZ}\,\sigma_D^2 + \sum_{r+s \ge 2} (2f_{XZ})^{r}\,(u_{XZ})^{s}\,\sigma_{A^rD^s}^2$$


where $u_{XZ} = f_{PR}f_{QS} + f_{PS}f_{QR}$. The variance components
$\sigma_A^2$, $\sigma_D^2$, and $\sigma_{A^rD^s}^2$ are, respectively, the additive
genetic variance, the dominance variance, and the variance
associated with all interactions of single alleles at
r loci and genotypes at s other loci. This general
expression for the genetic covariance was independently
derived by Cockerham and Kempthorne.
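To make the general expression concrete, it can be evaluated for particular relationships. The following is a minimal sketch under stated assumptions, not a published implementation: founders are non-inbred and unrelated, the coancestry of an individual with itself is taken as 1/2, and $f_{XZ}$ is obtained from the standard identity that it equals the average of the four coancestries between the parents of X and those of Z. The function names and the dictionary keyed by $(r, s)$ are hypothetical; the first two terms of the formula are treated as the $(1, 0)$ and $(0, 1)$ cases of the sum.

```python
from fractions import Fraction

def coancestry_from_parents(f_pr, f_ps, f_qr, f_qs):
    """f_XZ as the mean of the four parental coancestries (X has parents P, Q; Z has R, S)."""
    return (f_pr + f_ps + f_qr + f_qs) / 4

def u_coefficient(f_pr, f_ps, f_qr, f_qs):
    """u_XZ = f_PR * f_QS + f_PS * f_QR."""
    return f_pr * f_qs + f_ps * f_qr

def genetic_covariance(f_xz, u_xz, components):
    """Sum of (2 f_XZ)^r (u_XZ)^s * sigma^2_{A^r D^s} over the supplied (r, s) keys.

    (1, 0) is the additive variance, (0, 1) the dominance variance,
    and keys with r + s >= 2 are epistatic components.
    """
    return sum((2 * f_xz) ** r * u_xz ** s * var for (r, s), var in components.items())

half = Fraction(1, 2)   # coancestry of a non-inbred individual with itself (assumption)
zero = Fraction(0)      # unrelated parents (assumption)

# Full sibs: P = R and Q = S, with P and Q unrelated.
f_fs = coancestry_from_parents(half, zero, zero, half)
u_fs = u_coefficient(half, zero, zero, half)

# Half sibs: P = R only.
f_hs = coancestry_from_parents(half, zero, zero, zero)
u_hs = u_coefficient(half, zero, zero, zero)

# Illustrative unit variance components: additive, dominance, additive x additive.
sigma2 = {(1, 0): Fraction(1), (0, 1): Fraction(1), (2, 0): Fraction(1)}
print(genetic_covariance(f_fs, u_fs, sigma2))  # 1 (= 1/2 + 1/4 + 1/4)
print(genetic_covariance(f_hs, u_hs, sigma2))  # 5/16 (= 1/4 + 0 + 1/16)
```

With these inputs the full-sib covariance reduces to the familiar $\tfrac12\sigma_A^2 + \tfrac14\sigma_D^2 + \tfrac14\sigma_{AA}^2$ and the half-sib covariance to $\tfrac14\sigma_A^2 + \tfrac1{16}\sigma_{AA}^2$.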

If there is no epistasis but loci are not in 
gametic phase equilibrium, a general expression for 
$\operatorname{cov}(G_X, G_Z)$ is obtainable, but it has a very complicated
form, as shown by Weir, Cockerham, and
Reynolds. Genetic covariances can also be calculated if 
there is inbreeding, but the resulting expressions con- 
tain covariances that are not present when there is 
random mating. A thorough analysis of this problem
for the case in which two loci are involved was presented by Weir and
Cockerham.

Other models in the literature allow for sex-linked 
loci, maternal effects, effects of cytoplasmic genes that 
are maternally inherited, polyploidy, and covariances 
between relatives when one relative of a pair is measured
in trait $Y_1$ and the other in trait $Y_2$. There is also
some theory that applies when there is assortative 
mating. Discussions of these topics and references to 
the original papers mentioned above can be found in
the books listed below.
In contrast to an infectious disease, which is acquired
adventitiously during an individual’s lifetime as a result 
of invasion by a foreign organism, a genetic disease 
results from a defect (mutation) within an individual’s 
own genetic material (DNA) that causes detectable 
malfunction of certain tissues and organs. 

The first molecular genetic change underlying an 
inherited genetic disease was identified in 1957, when 
Ingram demonstrated that sickle-cell hemoglobin dif- 
fers from normal hemoglobin by a single amino acid 
substitution. Since the advent of recombinant DNA 
technology in the 1970s, the mutated genes respon- 
sible for many genetic disorders have been identified 
and the precise molecular lesions to these genes that 
are produced by individual mutations have been
established.
For a genetic disease caused by a recessive mutation,
both parents must be heterozygous for the
mutation and there will be a one in four chance of 
producing a homozygous affected child. However, 
given that heterozygous individuals (carriers) are 
asymptomatic, in the case of relatively rare genetic 
diseases, carriers can be completely unaware of their 
status and thus unprepared for the birth of a genetically
compromised child. Some recessive genetic diseases,
such as Tay-Sachs disease, show dramatically
increased prevalence in certain ethnic groups. Considerable
effort has been devoted to identifying and
counseling potential carriers within these groups.
For X-linked recessive diseases such as hemophilia, 
where half the sons of a heterozygous mother are 
affected, and for dominant mutation diseases such 
as achondroplasia (see Achondroplasia), individuals 
carrying a single mutant allele will, in general, be 
aware of their problem and thus able to make 
informed choices about parenthood. Unfortunately 
some diseases caused by dominant mutations, such as 
Huntington disease, do not usually manifest until after 
the reproductive years. 
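The inheritance risks quoted above follow from simple Mendelian segregation. The minimal sketch below assumes complete penetrance and no new mutation and enumerates the four equally likely combinations of parental gametes; the helper name and the allele labels ("A"/"a", "XH"/"Xh"/"Y") are illustrative only.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

def offspring_genotypes(mother, father):
    """Each parent is a pair of alleles (or sex chromosomes); each passes on one at random."""
    dist = Counter()
    for m, f in product(mother, father):
        dist[tuple(sorted((m, f)))] += Fraction(1, 4)
    return dist

# Autosomal recessive: two heterozygous carriers (A = normal, a = mutant).
auto = offspring_genotypes(("A", "a"), ("A", "a"))
print(auto[("a", "a")])  # 1/4 chance of an affected (aa) child

# X-linked recessive: carrier mother (XH Xh) and unaffected father (XH Y).
xl = offspring_genotypes(("XH", "Xh"), ("XH", "Y"))
sons = {g: p for g, p in xl.items() if "Y" in g}
p_son_affected = sons[("Xh", "Y")] / sum(sons.values())
print(p_son_affected)  # 1/2 of sons affected
```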

Two complementary approaches to the eradication 
of genetic disorders are ongoing, both of which rely 
on preliminary identification of the affected gene. One 
approach is preventative and involves genetic testing 
of asymptomatic potential carriers or potentially 
affected individuals so that those carrying a mutant 
allele may know their status and, if necessary, avoid 
passing on the mutation in question. In combination 
with genetic testing of embryos produced by in vitro 
fertilization and selection of embryos carrying non- 
mutant genes as potential progeny, this approach can 
permit carriers to produce their own children while 
simultaneously eradicating the disease mutation from 
their family lineage. Some in the medical community 
advocate neonatal genetic testing of all individuals for 
all possible genetic diseases. 

The second approach is a therapeutic one for 
affected individuals and involves supplying a correctly 
functioning version of the mutated gene to the mal- 
functioning tissue(s). The first experiments using this 
approach, termed gene therapy, were initiated in 1990. 
Introducing a gene into the cells of certain body 
(somatic) tissues within an individual still leaves the 
germline sperm or egg progenitor cells mutant and 
thus does not eliminate the potential for disease in 
any offspring. However, current methods for perman- 
ent gene integration into germline cells carry a 
potential for causing further genetic damage and thus 
are ethically unacceptable. The option of genetically 
screening embryos and selecting unaffected embryos 
for implantation (as discussed above) is also possible 
for genetic disease patients. Currently, harmless, 
modified versions of viruses are proving to be the best
vectors for introducing genes into somatic tissues of 
individuals with genetic diseases. However, to date 
gene therapy has had few clinical trials, and very limit- 
ed success. At least one individual has died as a direct 
result of this type of treatment. 

All forms of cancer involve mutations to genes 
within an individual. In contrast to genetic disease 
mutations these defects are not initially present in 
the genome but arise from damage to DNA in a parti- 
cular tissue during the individual’s lifetime. 


See also: Cancer Susceptibility; Clinical Genetics; 
Gene Therapy, Human; Genetic Counseling 
Genetic distance is the degree of genetic difference
(genomic difference) between species or populations 
that is measured by some numerical method. Thus, 
the average number of codon or nucleotide differences 
per gene is a measure of genetic distance. There are 
various molecular data that can be used for measur- 
ing genetic distance. When the two species to be com- 
pared are distantly related, data on amino acid or 
nucleotide sequences are used (Nei and Kumar, 2000).
In the comparison of closely related species or popu- 
lations, however, the effect of polymorphism cannot 
be neglected, and one has to examine many proteins or 
genes. For this reason, it is customary to measure the 
genetic distance between populations in terms of a 
function of allele frequencies for many genetic loci. 
Genetic distances are useful for constructing 
phylogenetic trees of populations as well as for esti- 
mating times of divergence between populations. In 
the past, many investigators have used allele frequency 
data obtained by protein electrophoresis and immu- 
nological methods. In recent years, many different 
types of molecular data such as microsatellite DNA 
and RAPD data have been used, but the basic principle of
computing genetic distances and constructing phylo- 
genetic trees remains essentially the same. Here only 
the basic methods for computing genetic distances 
are discussed. The reader who is interested in more 
detailed information should refer to Nei and Kumar 
(1983). Some results from recent studies of the evolu- 
tion of human populations will also be presented. 


Commonly Used Distance Measures 


Rogers’ Distance 

Suppose that there are q alleles at a locus, and let $x_i$ and
$y_i$ be the frequencies of the ith allele in populations X
and Y, respectively. Each allele frequency may take a
value between 0 and 1. Therefore, it is possible to
represent populations X and Y in a q-dimensional 
space. The distance between the two populations in 
the space is then given by 


$$d_R = \left[\sum_{i=1}^{q} (x_i - y_i)^2\right]^{1/2} \tag{1}$$


This distance takes a value between 0 and $\sqrt{2}$, the
latter value being obtained when the two populations 
are fixed for different alleles. This property is not very 
desirable. So, Rogers (1972) proposed the following 
measure, which takes a value between 0 and 1: 


$$D_R = \left[\frac{1}{2}\sum_{i=1}^{q} (x_i - y_i)^2\right]^{1/2} \tag{2}$$


When allele frequency data are available for many 
loci, the average of this value is used. Note, however, 
that this measure has one deficiency. When the two 
populations are both polymorphic but share no common
alleles, $D_R$ is given by $\left[\left(\sum x_i^2 + \sum y_i^2\right)/2\right]^{1/2}$. This
value can be much smaller than 1 even if the populations
have entirely different sets of alleles. For example,
when there are five nonshared alleles in each population
and all allele frequencies are equal ($x_i = 1/5$;
$y_i = 1/5$), we have $D_R = 0.45$. This property is clearly
undesirable. 
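The deficiency is easy to reproduce numerically. A minimal sketch of equation (2) for one locus (the function name is hypothetical); the five-nonshared-allele example is encoded as two frequency vectors over all ten alleles, each population having zeros for the other population's alleles.

```python
import math

def rogers_distance(x, y):
    """Rogers' D_R for one locus: sqrt(0.5 * sum_i (x_i - y_i)^2), equation (2)."""
    if len(x) != len(y):
        raise ValueError("allele frequency vectors must have the same length")
    return math.sqrt(0.5 * sum((xi - yi) ** 2 for xi, yi in zip(x, y)))

# Populations fixed for different alleles: D_R reaches its maximum of 1.
print(rogers_distance([1.0, 0.0], [0.0, 1.0]))

# Five nonshared alleles per population, each at frequency 1/5.
x = [0.2] * 5 + [0.0] * 5
y = [0.0] * 5 + [0.2] * 5
print(round(rogers_distance(x, y), 2))  # 0.45, although no allele is shared
```

For many loci, the per-locus values would simply be averaged, as the text describes.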


Bhattacharyya’s Distance and its 
Modifications 

Representing two populations on the surface of a 
multidimensional hypersphere, Bhattacharyya (1946) 
suggested that the extent of differentiation of populations
be measured in terms of the angle (θ) between the
two lines projecting from the origin to the two populations
(X and Y) on the hypersphere (Figure 1).
When there are q alleles, we consider a q-dimensional
hypersphere with radius 1 and let each axis represent
the square root of the allele frequency, i.e., $\xi_i = \sqrt{x_i}$
and $\eta_i = \sqrt{y_i}$. Therefore, $\sum \xi_i^2 = \sum \eta_i^2 = 1$. When
there are only two alleles, populations X and Y can
be represented on a circle, as shown in Figure 1.
Elementary geometry shows that in the case of q
alleles the angle θ is given by


$$\cos\theta = \sum_{i=1}^{q} \xi_i \eta_i = \sum_{i=1}^{q} \sqrt{x_i y_i} \tag{3}$$

Figure 1 Bhattacharyya's geometric representation of
populations X and Y for the case of two alleles (axes: allele A1, allele A2).


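Bhattacharyya proposed that the distance between
two populations be measured by

$$\theta^2 = \left[\cos^{-1}\left(\sum_{i=1}^{q} \sqrt{x_i y_i}\right)\right]^2 \tag{4}$$

Because $\cos\theta$ lies between 0 and 1, θ lies between 0 and π/2,
so this measure takes a value between 0 and $\pi^2/4$, the maximum
being reached when the two populations share no alleles. When
there are allele frequency data for many loci, the average
of this quantity is used as a genetic distance measure
as in the case of $D_R$.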

In a computer simulation, Nei et al. (1983) noted 
that the following distance measure is quite efficient in 
recovering the true topology of an evolutionary tree 
when it is reconstructed from allele frequency data. 


$$D_A = \sum_{k=1}^{L}\left(1 - \sum_{i=1}^{q_k}\sqrt{x_{ik}\,y_{ik}}\right)\Big/ L \tag{5}$$


where $q_k$ and L are the number of alleles at the kth
locus and the number of loci examined, respectively,
and the subscript ik refers to the ith allele at the kth
locus. This measure takes a value between 0 and 1, the
latter value being obtained when the two populations
share no common alleles. Since the maximum value
of $D_A$ is 1, $D_A$ is nonlinearly related to the number of
gene substitutions. When $D_A$ is small, however, it increases
approximately linearly with evolutionary time.
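A minimal sketch of equations (3) and (5), assuming allele frequencies are supplied per locus as pairs of lists aligned by allele (the function names are hypothetical):

```python
import math

def cos_theta(x, y):
    """Bhattacharyya's cos(theta) = sum_i sqrt(x_i * y_i) for one locus, equation (3)."""
    return sum(math.sqrt(xi * yi) for xi, yi in zip(x, y))

def nei_da(loci):
    """D_A of equation (5): mean over loci of 1 - cos(theta).

    `loci` is a list of (x, y) pairs of allele-frequency lists for populations X and Y.
    """
    return sum(1.0 - cos_theta(x, y) for x, y in loci) / len(loci)

loci = [
    ([0.5, 0.5], [0.5, 0.5]),   # identical frequencies: contributes 0
    ([1.0, 0.0], [0.0, 1.0]),   # no shared alleles: contributes 1
]
print(nei_da(loci))  # (0 + 1) / 2 = 0.5
```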

The standard error of $D_A$ or the difference in $D_A$
between two pairs of populations can be computed by
the bootstrap method if it is based on many loci. In
this case, a bootstrap sample will represent a different
set of loci, which have been chosen at random with
replacement (Nei and Kumar, 2000). Similarly, the
standard errors of average $D_R$, $\theta^2$, and $d_C$ can be
computed by the bootstrap.
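The locus-resampling bootstrap described above can be sketched as follows. Because average $D_A$ is a mean of per-locus terms, resampling loci with replacement is equivalent to resampling those per-locus terms. The replicate count, seed, frequency data, and function names are illustrative assumptions.

```python
import math
import random
import statistics

def locus_da(x, y):
    """Contribution of one locus to D_A: 1 - sum_i sqrt(x_i * y_i)."""
    return 1.0 - sum(math.sqrt(xi * yi) for xi, yi in zip(x, y))

def bootstrap_se_da(loci, replicates=1000, seed=1):
    """Bootstrap standard error of average D_A, resampling loci with replacement."""
    rng = random.Random(seed)
    per_locus = [locus_da(x, y) for x, y in loci]
    n = len(per_locus)
    # Each replicate draws n loci with replacement and recomputes the average.
    means = [sum(rng.choices(per_locus, k=n)) / n for _ in range(replicates)]
    return statistics.stdev(means)

loci = [  # hypothetical allele frequencies at four loci
    ([0.6, 0.4], [0.5, 0.5]),
    ([0.9, 0.1], [0.3, 0.7]),
    ([0.2, 0.8], [0.25, 0.75]),
    ([0.5, 0.3, 0.2], [0.1, 0.6, 0.3]),
]
print(round(bootstrap_se_da(loci), 4))
```

With only four loci the resulting standard error is large, which is why the text stresses that the method is appropriate when many loci are available.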


Genetic Distance 829 


$F_{ST}$ Distance

The allele frequencies of different populations may dif- 
ferentiate by genetic drift alone without any selection. 
When a population splits into many populations of effective
size N in a generation, the extent of differentiation
of allele frequencies in subsequent generations
can be measured by Wright's $F_{ST}$ (Nei and Kumar,
2000). When there are only two populations but allele
frequency data are available from many different loci,
it is possible to develop a statistic whose expectation is
equal to $F_{ST}$. One such statistic is given by


$$F_{ST}^{*} = \frac{(\hat{J}_X + \hat{J}_Y)/2 - \hat{J}_{XY}}{1 - \hat{J}_{XY}} \tag{6}$$


where $\hat{J}_X$, $\hat{J}_Y$, and $\hat{J}_{XY}$ are unbiased estimators of the
means ($J_X$, $J_Y$, and $J_{XY}$) of $\sum x_i^2$, $\sum y_i^2$, and $\sum x_i y_i$
over all loci, respectively. For a single locus, unbiased
estimates of $\sum x_i^2$, $\sum y_i^2$, and $\sum x_i y_i$ are given by


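$$\hat{j}_X = \left(2m_X \sum_i \hat{x}_i^2 - 1\right)\Big/(2m_X - 1) \tag{7}$$
$$\hat{j}_Y = \left(2m_Y \sum_i \hat{y}_i^2 - 1\right)\Big/(2m_Y - 1) \tag{8}$$
$$\hat{j}_{XY} = \sum_i \hat{x}_i \hat{y}_i \tag{9}$$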


where $m_X$ and $m_Y$ are the numbers of diploid individuals
sampled from populations X and Y, respectively,
and $\hat{x}_i$ and $\hat{y}_i$ are the sample frequencies of allele $A_i$ in
populations X and Y. Therefore, $\hat{J}_X$, $\hat{J}_Y$, and $\hat{J}_{XY}$ are the
means of $\hat{j}_X$, $\hat{j}_Y$, and $\hat{j}_{XY}$ over all loci, respectively. The
expectation of $F_{ST}^{*}$ is given by


$$E(F_{ST}^{*}) = 1 - \left(1 - \frac{1}{2N}\right)^{t} \tag{10}$$


where t is the number of generations after population
splitting. Therefore, we have


$$D_L = -\ln\left(1 - F_{ST}^{*}\right) \tag{11}$$


which is expected to be proportional to t when the
number of loci used is large [$E(D_L) \approx t/(2N)$]. This
indicates that when evolutionary time is short and new
mutations are negligible, one can estimate t by $2ND_L$
if N is known. In practice, however, new mutations
always occur, and this will disturb the linear relationship
between $D_L$ and t when a relatively long evolutionary
time is considered. N is also usually unknown.
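A minimal sketch of equations (6) through (9) and the log-transformed distance $-\ln(1 - F_{ST}^{*})$ of equation (11), assuming sample allele frequencies are supplied per locus as aligned lists and that the same numbers of diploid individuals were sampled at every locus. The function names and the sample sizes in the example are hypothetical.

```python
import math

def j_hat_within(freqs, m):
    """Unbiased single-locus estimate of sum_i p_i^2 from m diploid individuals (eqs. 7-8)."""
    return (2 * m * sum(p * p for p in freqs) - 1) / (2 * m - 1)

def fst_distance(loci, m_x, m_y):
    """Return (F*_ST, -ln(1 - F*_ST)) from (x_hat, y_hat) sample-frequency pairs.

    Assumes the populations are not both fixed for the same allele at every
    locus (otherwise 1 - J_XY is zero and F*_ST is undefined).
    """
    n = len(loci)
    jx = sum(j_hat_within(x, m_x) for x, _ in loci) / n
    jy = sum(j_hat_within(y, m_y) for _, y in loci) / n
    jxy = sum(sum(a * b for a, b in zip(x, y)) for x, y in loci) / n   # eq. 9, averaged
    fst = ((jx + jy) / 2 - jxy) / (1 - jxy)                              # eq. 6
    dist = -math.log(1 - fst) if fst < 1 else math.inf                   # eq. 11
    return fst, dist

fst, dist = fst_distance([([0.8, 0.2], [0.2, 0.8])], m_x=50, m_y=50)
print(round(fst, 4), round(dist, 4))
```

If N were known, t could then be approximated as 2N times the returned distance, subject to the caveats about mutation noted above.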


Standard Genetic Distance 
Nei (1972) developed a genetic distance measure called 
the standard genetic distance, whose expected value is 
proportional to evolutionary time when both effects
of mutation and genetic drift are taken into account. 
It is estimated by 


$$D = -\ln I \tag{12}$$


where
$$I = J_{XY}\big/\sqrt{J_X J_Y} \tag{13}$$


The variances of I and D can be computed by the
bootstrap method. 

When the populations are in mutation-drift balance
throughout the evolutionary process and all
mutations result in new alleles following the infinite- 
allele model (Nei and Kumar, 2000), the expectation of 
D increases in proportion to the time after divergence 
between two populations. That is, 


$$E(D) = 2\alpha T \tag{14}$$


where α is the rate of mutation or gene substitution per
year and T is the number of years after divergence of
the two populations. Therefore, if we know α we can
estimate divergence time from D.

The α value varies with genetic locus and the type
of data used. For the genetic loci that are commonly
used in protein electrophoresis, it has been suggested
that α is approximately $10^{-7}$ per locus per year. If this
is the case, the time after divergence between two
populations is estimated by


$$T = 5 \times 10^{6}\, D \tag{15}$$


This formula is based on the assumption that all loci
have the same rate of gene substitution. In practice, the
α value varies from locus to locus approximately following
the gamma distribution. In this case, the average
value of I over loci is given by


$$I_a = \left(\frac{a}{a + 2\bar{\alpha}T}\right)^{a} \tag{16}$$

where a is the shape parameter of the gamma distribution
and $\bar{\alpha}$ is the mean of α over loci. Therefore, the
number of gene substitutions per locus is given by


$$D_a = 2\bar{\alpha}T = a\left[I_a^{-1/a} - 1\right] \tag{17}$$
When a = 1, this becomes
$$D_a = (1 - I_a)/I_a \tag{18}$$


Here $I_a$ is estimated by equation (13). For a given value of
a (a > 0), T can be estimated by replacing D in equation
(15) by $D_a$. Note that $D_a$ is nearly equal to D when
$I_a > 0.8$ and a = 1.

Figure 2 Stepwise mutation model.
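Equations (12) through (18) can be sketched in a few lines. This is a minimal illustration, not a reference implementation: it assumes the $J$ values are computed directly from the supplied frequencies (no small-sample correction, i.e. large samples), uses the suggested α = $10^{-7}$ per locus per year as a default, and the function names and single-locus data are hypothetical.

```python
import math

def nei_identity(loci):
    """I = J_XY / sqrt(J_X * J_Y) (eq. 13), with each J averaged over loci.

    Uses the frequencies as given, i.e. without small-sample correction.
    """
    n = len(loci)
    jx = sum(sum(p * p for p in x) for x, _ in loci) / n
    jy = sum(sum(p * p for p in y) for _, y in loci) / n
    jxy = sum(sum(a * b for a, b in zip(x, y)) for x, y in loci) / n
    return jxy / math.sqrt(jx * jy)

def divergence_time(d, alpha=1e-7):
    """T = D / (2 * alpha) from E(D) = 2 alpha T; alpha = 1e-7 gives T = 5e6 * D (eq. 15)."""
    return d / (2 * alpha)

def gamma_distance(i_a, a):
    """D_a = a * (I_a ** (-1/a) - 1) (eq. 17); reduces to (1 - I_a) / I_a when a = 1."""
    return a * (i_a ** (-1.0 / a) - 1.0)

loci = [([1.0, 0.0], [0.5, 0.5])]     # one hypothetical locus
i = nei_identity(loci)
d = -math.log(i)                       # eq. 12
print(round(i, 4), round(d, 4), round(divergence_time(d)))
d_a = gamma_distance(i, a=1.0)
print(round(d_a, 4), round(divergence_time(d_a)))
```

Comparing the two printed lines shows how allowing for rate variation among loci (here a = 1) inflates the distance, and hence the estimated divergence time, relative to the uncorrected D.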


$(\delta\mu)^2$ Distance

Microsatellite DNA loci are segments of repeated 
DNA with a short repeat length, usually two to six 
nucleotides. Thus, an allele for a CA repeat locus may 
be represented by CACACACACACACA, where 
the dinucleotide CA is repeated seven times. Micro- 
satellite loci are believed to be subject to a mutational 
change following the slippage model of duplication or 
deletion of repeat units. Therefore, new alleles are 
supposed to be generated by following the stepwise 
mutation model given in Figure 2. Microsatellite loci 
are usually highly polymorphic with respect to the 
number of repeats, and therefore they are useful for 
studying phylogenetic relationships of populations. 
Goldstein et al. (1995) proposed that the following 
distance measure be used for microsatellite DNA data. 


$$(\delta\mu)^2 = \sum_{k=1}^{L} \left(\mu_{Xk} - \mu_{Yk}\right)^2 \Big/ L \tag{19}$$


where $\mu_{Xk}$ ($= \sum_i i\,x_{ik}$) and $\mu_{Yk}$ ($= \sum_i i\,y_{ik}$) are the mean
numbers of repeats at the kth locus in populations X
and Y, respectively. The expectation of $(\delta\mu)^2$ is given
by $E[(\delta\mu)^2] = 2\alpha T$, where α is the mutation rate per
year. Therefore, T can be estimated by $(\delta\mu)^2/(2\alpha)$.
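A minimal sketch of equation (19) and the resulting time estimate, assuming each locus is given as a pair of dictionaries mapping repeat number i to allele frequency. The function names, the repeat data, and the mutation rate `alpha` are all hypothetical values chosen for illustration, not estimates for any real locus.

```python
def mean_repeats(freqs):
    """Mean repeat number mu = sum_i i * p_i; `freqs` maps repeat count i -> frequency."""
    return sum(i * p for i, p in freqs.items())

def delta_mu_squared(loci):
    """(delta mu)^2 of eq. (19), averaged over loci; `loci` holds (x_freqs, y_freqs) dicts."""
    return sum((mean_repeats(x) - mean_repeats(y)) ** 2 for x, y in loci) / len(loci)

loci = [
    ({10: 1.0}, {12: 1.0}),          # means 10 and 12: contributes 4
    ({7: 0.5, 9: 0.5}, {8: 1.0}),    # both means equal 8: contributes 0
]
d2 = delta_mu_squared(loci)          # (4 + 0) / 2
alpha = 1e-4                         # hypothetical mutation rate per year
print(d2, d2 / (2 * alpha))          # distance and estimated T = (delta mu)^2 / (2 alpha)
```

The second locus illustrates one source of the large sampling error discussed next: two populations with quite different allele compositions can still have identical mean repeat numbers.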

In practice, however, there are a number of problems
with this method. First, the α value apparently
varies considerably with locus and organism, and it is
not a simple matter to estimate α for each locus. Second,
the variance or the coefficient of variation of
$(\delta\mu)^2$ is very large compared to that of other distance
measures such as $d_C$ and $D_A$. Therefore, a large number
of loci must be used to obtain a reliable estimate of
T even if α is known. Third, there is evidence that the
actual mutational pattern is irregular and deviates con- 
siderably from the stepwise mutation model on which 
this distance measure is based. 


Genetic Distance and Phylogenetic Trees 


A linear relationship of a distance measure with evo- 
lutionary time is important for estimating the time of 
divergence between two populations. It is also a nice 
property for constructing phylogenetic trees, other 
things being equal. In practice, however, different distance
measures have different variances, and for this
reason a distance measure that is linear with time is not
necessarily better than a nonlinear distance in obtaining
true trees (topologies).

Figure 3 Neighbor-joining trees of human populations obtained by using Bowcock et al.'s (1994) data of 25 microsatellite
loci. (A) Tree obtained by $D_A$ distance. (B) Tree obtained by $(\delta\mu)^2$ distance. The number for each interior branch is
the bootstrap value from 1000 replications. M, Maya; K, Karitiana; S, Surui; CAR, Central African Republic.

A number of authors have studied this problem by 
using computer simulation. The general conclusions 
obtained from these studies are as follows: 


1. For all distance measures, the probability of obtaining
the true topology ($P_T$) is very low when the
number of loci used is less than ten but gradually
increases with increasing number of loci. In general,
$P_T$ is lower for the stepwise mutation model than
for the infinite-allele model. This indicates that a 
larger number of loci should be used for micro- 
satellite DNA data than for electrophoretic data 
when the level of average heterozygosity is the 
same. 

2. Distance measures $D_A$ and $d_C$ are generally more
efficient in obtaining the true topology than other
distance measures under many different conditions.
When the total number of individuals to be studied
is fixed, it is generally better to examine more loci
with a smaller number of individuals per locus
rather than fewer loci with a large number of individuals
in order to have a high $P_T$ value, as long as
the number of individuals per locus is greater than
about 25. When average heterozygosity is as high as
0.8, however, a larger number of individuals per
locus needs to be studied.




Evolutionary Relationships of Human 
Populations 


Bowcock et al. (1994) examined microsatellite DNA 
(mostly CA repeats) polymorphisms for 25 loci from 
14 human populations and one chimpanzee species. 
Figure 3A, B show the phylogenetic trees obtained
by using the $D_A$ and $(\delta\mu)^2$ distances, respectively, from
allele frequency data for the 25 loci.


The tree obtained by $D_A$ distances (Figure 3A)
shows that Africans (Pygmies and Bantu) first sep- 
arated from the rest of the human groups and that the 
bootstrap values for the interior branches connecting 
Africans and chimpanzees and non-Africans and chim- 
panzees are both very high. (A bootstrap value for an 
interior branch is an indicator of the accuracy of dis- 
tinction of the two population groups separated by the 
interior branch.) This result supports the currently 
popular view that modern humans originated in Africa. 
The same tree shows that Europeans first diverged from 
the other non-African people and then the group of 
New Guineans and native Australians separated from 
the remaining group. The first separation of Europeans 
from the rest of non-Africans is well supported by a 
high bootstrap value, but the next separation of New 
Guineans and Australians is less clear, because the boot- 
strap value for one of the two interior branches involved 
is only 53%. In fact, a similar study using classical 
markers (blood group and allozyme data) has suggested 
that New Guineans and Australians are genetically 
close to southeastern Asians (Indonesians, Filipinos, 
Thais). To clarify this aspect of evolutionary relation- 
ships, it seems necessary to examine many more loci. 

Figure 3B shows the tree obtained by $(\delta\mu)^2$ distances.
The topology of this tree is very different from
that for $D_A$ distances and is poorly supported by the
bootstrap test. This unreliable tree was obtained
mainly because the sampling error of $(\delta\mu)^2$ is very
large, as mentioned earlier.


Other Genetic Markers 


In recent years, a number of other genetic markers have
been used for studying the phylogenetic relationships 
of populations. They are restriction fragment length 
polymorphism (RFLP), amplified fragment length 
polymorphism (AFLP), and random amplification of 
polymorphic DNA (RAPD) data. The allele frequency
data obtained by these markers can be analyzed
by the same methods as mentioned earlier. They
can also be used to estimate the average number of 
nucleotide differences per site between two popula- 
tions. In the latter analysis somewhat sophisticated 
statistical methods are required, and they are pre- 
sented in Nei and Kumar (2000). 

During the last two decades, RFLP data for mito- 
chondrial and chloroplast DNA have been used exten- 
sively to study the extent of genetic differentiation of 
closely related species or populations. RFLP data can 
be obtained inexpensively and give sufficiently accurate 
results for studying closely related populations. In 
recent years, however, many authors sequence poly- 
morphic alleles to obtain more accurate results. Statis- 
tical methods for analyzing these polymorphic DNA 
sequences are described in Nei and Kumar (2000). 
Genetic drift is the random variation in gene frequencies
due to sampling. Since populations are finite in
size, those individuals contributing genes to the next 
generation constitute a sample from the population. 


As an example, consider a single genetic locus with two
alleles, A and a, with frequencies p and q in the gene
pool (p + q = 1). If there were 50 A alleles (p = 50/100
= 0.5) and 50 a alleles (q = 50/100 = 0.5) among 50
($N_e$ = 50) diploid parents that interbreed, the sampling
process might yield 48 A alleles (p = 48/100 = 0.48)
and 52 a alleles (q = 52/100 = 0.52) among offspring of
the next generation. Buri (1956) gives an example of
this sampling process in 107 populations of Drosophila
melanogaster segregating for two alleles, both
initially with frequencies of p = q = 0.5 and breeding
size of $N_e$ = 16. As can be seen in Figure 1A, the
empirical density (histogram) of the allele frequency
(p) of the replicate populations gradually drifts or spreads out
from 0.5.

Wright (1931) and Fisher (1930) calculated the predicted
effects of genetic drift on natural and experimental
populations. If f(p; t) is the theoretical density
(histogram) of an allele frequency under drift at time t,
Wright calculated this density of allele frequencies
over time in a population undergoing genetic drift, as
shown in Figure 1B.

The theoretical density of an allele frequency, f(p; t),
flattens out with time: after about $2N_e$ generations
all intermediate allele frequencies become roughly equally
likely, and the height of this flat distribution then declines
by about $1/(2N_e)$ per generation. Eventually a population
is expected to drift to fixation or loss of the A allele.
Thus, if genetic drift goes on long enough, a
consequence is a reduction in genetic variation within
a population.

Under random drift the variance in allele frequency
from one generation to the next is approximately $pq/(2N_e)$,
where $N_e$ is the breeding size of the population
and p and q are the allele frequencies in the current generation.
When the breeding size of the population is small,
there is therefore more variation in an allele frequency
from generation to generation. As shown in both panels
of Figure 1, the eventual outcome of genetic drift is
that the allele frequency p does a random walk to p = 0
or p = 1, so that the population becomes fixed for
either the A allele or the a allele. Another consequence of
drift is that, as the gene pool becomes fixed for A or a,
the heterozygosity ($H_t$) in the population (the frequency
of heterozygotes) in generation t is expected
to decline each generation from the initial heterozygosity
($H_0$) according to the rule:


$$H_t = \left(1 - \frac{1}{2N_e}\right)^{t} H_0$$
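The decline of heterozygosity can be checked with a small Wright-Fisher simulation in the spirit of Buri's experiment. This is a minimal sketch, assuming binomial sampling of $2N_e$ gene copies each generation and measuring heterozygosity as 2pq; the function names, seed, number of replicates, and number of generations are arbitrary choices, and only $N_e$ = 16 is taken from the text.

```python
import random

def wright_fisher(p0, n_e, generations, rng):
    """One replicate population: each generation samples 2*n_e gene copies at frequency p."""
    p = p0
    copies = 2 * n_e
    for _ in range(generations):
        p = sum(rng.random() < p for _ in range(copies)) / copies   # binomial sampling
    return p

def expected_heterozygosity(h0, n_e, t):
    """H_t = (1 - 1/(2 N_e))^t * H_0."""
    return (1 - 1 / (2 * n_e)) ** t * h0

rng = random.Random(1956)
n_e, t, reps = 16, 10, 2000          # N_e = 16 as in Buri's experiment; other values arbitrary
finals = [wright_fisher(0.5, n_e, t, rng) for _ in range(reps)]
mean_h = sum(2 * p * (1 - p) for p in finals) / reps   # average 2pq across replicates
print(round(mean_h, 3), round(expected_heterozygosity(0.5, n_e, t), 3))
```

The simulated average should lie close to the value predicted by the rule, while individual replicates spread toward p = 0 or p = 1 as in Figure 1A.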


Figure 1 (A) Empirical histogram of allele frequency in 107 experimental populations of Drosophila melanogaster, all
started with an allele frequency of p = 0.5. (B) Theoretical histogram of allele frequency f(p; t) after multiples of N
generations, all started with an allele frequency of p = 0.50. (Redrawn from Buri, 1956 and Wright, 1969.)

Genetic drift is one of four factors (mutation,
migration, genetic drift, and natural selection) causing
gene pools to change over time, and genetic drift is at
the heart of several recent theories of evolution. In the
shifting-balance theory of evolution (Wright, 1931)
genetic drift is part of a two-phase process of adaptation
of a subdivided population. In the first phase genetic
drift causes each subdivision to undergo a random 
walk in allele frequencies to explore new combin- 
ations of genes. In the second phase a new favorable 
combination of alleles is fixed in the subpopulation by 
natural selection and is exported to other demes by 
factors like migration between demes. Much of the 
basic theory of genetic drift was developed in the 
context of understanding the shifting balance theory 
of evolution. 

Genetic drift has also played a fundamental role in 
the neutral theory of molecular evolution. In this the- 
ory most of the genetic variation in DNA and protein 
sequences is explained by a balance between mutation 
and genetic drift. Mutation slowly creates new allelic 
variation in DNA and proteins, and genetic drift 
slowly eliminates this variability, thereby achieving a 
steady state. 

Consider, for example, new mutations arising in a
gamete's DNA with probability u each generation.
If there are $2N_e$ alleles in the gene pool, then the
number of new mutations in the population per generation
is $u \times 2N_e$. As the distribution flattens out in Figure 1,
the chance that a new neutral allele becomes fixed is
$1/(2N_e)$, because each of the $2N_e$ gene copies is equally
likely to be the one that is eventually fixed.
The rate at which new substitutions become fixed per
generation is then (number of new mutations) ×
(probability of fixation) = $(u \times 2N_e)(1/(2N_e)) = u$. A
fundamental prediction of genetic drift theory is then that
the substitution rate λ in genes, or replacement rate in
proteins, is constant and equal to the mutation rate.
This prediction amounts to the prediction that there 
is a molecular clock in DNA and protein sequences, 
a prediction for which there is now considerable 
supporting data (Kimura, 1983). 
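The fixation argument above can be checked by simulation. A minimal sketch, assuming Wright-Fisher binomial sampling and a single new neutral mutant starting at frequency $1/(2N_e)$; the population size, seed, replicate count, function name, and mutation rate u are hypothetical values chosen so the example runs quickly.

```python
import random

def fixes(n_e, rng):
    """Follow one new neutral mutant (initial frequency 1/(2 n_e)) until loss or fixation."""
    copies = 2 * n_e
    count = 1
    while 0 < count < copies:
        p = count / copies
        count = sum(rng.random() < p for _ in range(copies))   # binomial sampling
    return count == copies

rng = random.Random(42)
n_e, reps = 5, 20000
fixed = sum(fixes(n_e, rng) for _ in range(reps))
print(fixed / reps, 1 / (2 * n_e))   # simulated vs. theoretical fixation probability

u = 1e-8                             # hypothetical per-gamete mutation rate
print((u * 2 * n_e) * (1 / (2 * n_e)))   # substitution rate per generation, equal to u
```

Because the population size cancels in the last line, the same substitution rate u would be obtained for any $N_e$, which is the basis of the molecular-clock prediction.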
Genetic engineering is the manipulation of genetic
material by either molecular biological techniques or 
by selective breeding. While selective breeding has 
been practiced for thousands of years (domestication 
of the dog; farming corn; brewer’s yeast) the manipu- 
lation of genetic material in vitro was developed in the 
1970s. The DNA is manipulated within a test tube and 
subsequently introduced back into a cell in order to 
change the processes of a cell or organism. In its 
simplest conception a molecular biologist can com- 
bine molecules of DNA from different organisms 
encoding different properties. Most typically, manipulating
DNA in vitro requires first isolating DNA
from cells, cleaving the DNA with sequence-specific
restriction endonucleases, mixing two independently
isolated DNAs, and joining the DNA molecules with
DNA ligase; lastly, the DNA is reintroduced into cells
and the cells that carry the newly joined DNA molecules
are identified. For example, an antibiotic resistance
gene is isolated from one bacterium and combined
in vitro with a plasmid (vector) that is capable of
replicating in another bacterium. This engineered plasmid
is introduced into the bacterium, where it confers
antibiotic resistance on the newly transformed
bacterium.
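The cleavage step described above can be mimicked on a DNA string. This is a toy sketch under simplifying assumptions: a linear molecule, only the top strand modeled, and no sticky-end chemistry. The function name and parameters are hypothetical; the defaults correspond to EcoRI, which recognizes GAATTC and cuts the top strand between G and AATTC.

```python
def digest(sequence, site="GAATTC", cut_offset=1):
    """Cut a linear DNA string (top strand only) at every occurrence of `site`.

    `cut_offset` is the position within the site where the strand is cut;
    the defaults model EcoRI (G^AATTC).
    """
    fragments, start = [], 0
    pos = sequence.find(site)
    while pos != -1:
        fragments.append(sequence[start:pos + cut_offset])
        start = pos + cut_offset
        pos = sequence.find(site, pos + 1)
    fragments.append(sequence[start:])
    return fragments

print(digest("TTGAATTCAAAGAATTCCC"))  # ['TTG', 'AATTCAAAG', 'AATTCCC']
```

Joining fragments from two such digests with DNA ligase would, in this string picture, correspond to concatenating fragments whose cut ends match.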

The term ‘genetic engineering’ is also used to refer 
to the process of altering the expression level of a 
protein; for example, a protein may be overexpressed
by changing its promoter in order to purify large
amounts of the protein. ‘Genetic engineering’ can
also be used as a term synonymous with ‘protein 
engineering’ where the biochemical characteristics of 
a protein are altered by mutating the gene which 
encodes the protein. 


See also: Biotechnology; Breeding of Animals; 
Recombinant DNA; Recombinant DNA 
Guidelines 
Equilibrium is a state in which opposing forces
balance to create a steady state. This steady state may 
be stable, in which case perturbation away from the 
steady state is followed by a return to that state. Alternatively,
the equilibrium state may be unstable, in which case
perturbations lead either to the establishment
of a new equilibrium value or to loss of equilibrium.
The most well-known genetic equilibrium is the
Hardy-Weinberg equilibrium. 


Hardy-Weinberg Equilibrium 


In a sexually reproducing, diploid population of infinite
size in which there is no mutation or migration, no
natural selection, and where mating takes place at
random, the frequencies of alleles and genotypes will
remain unchanged in Hardy-Weinberg equilibrium.
If this hypothetical population is not at equilibrium 
at the outset, it will take only a single generation to 
establish equilibrium under the conditions defined 
above. Imagine a population like the one defined 
above in which we focus on a gene with two alleles, 
h1 and h2. Let us assume that the frequency of h1 is 
defined as 0.1 and the frequency of h2 equals 0.9, since 
f(h2) is always 1 − f(h1) in a two-allele system, because
the sum of allele frequencies must always equal one. 
Since h1 cannot change into h2 by mutation, nor can 
h2 change into h1 and there is no natural selection, 
we may represent random mating by multiplying the 
frequencies of male gametes by the frequencies of 
female gametes: 


$$\left(f(h1) + f(h2)\right) \times \left(f(h1) + f(h2)\right) = f(h1)^2 + 2f(h1)f(h2) + f(h2)^2$$


For this h1 = 0.1 and h2 = 0.9 example this is 


$$(0.1 + 0.9) \times (0.1 + 0.9) = 0.01\,h1h1 + 0.18\,h1h2 + 0.81\,h2h2$$


and the allele frequencies which will produce the next 
generation are ascertained by collecting alleles from 
the diploid individuals on the right side of the equation: 


$$f(h1) = 0.01 + (1/2)(0.18) = 0.1 \quad\text{and}\quad f(h2) = (1/2)(0.18) + 0.81 = 0.9$$


Neither the allele frequencies nor the genotype fre- 
quencies will change under the conditions defined; the
single locus, two-allele genetic system is at equilibrium.
This is an inherently unstable equilibrium, because 
there are no active forces balancing the equilibrium 
state. 

If, for example, we relax the requirement that the 
population size is infinite and instead reduce the 
population size to 505, allele and genotype frequencies 
will change simply because the population is of finite 
size (see Genetic Drift). If we go from the infinite 
population to the population of 505 at the diploid 
stage we would expect to see: 


$$0.01(505)\,h_1h_1 + 0.18(505)\,h_1h_2 + 0.81(505)\,h_2h_2 = 5.05\,h_1h_1 + 90.9\,h_1h_2 + 409.05\,h_2h_2$$


The total number of individuals must equal 505, but
organisms come only as whole units, so one or more of
the genotypic classes will gain and others will lose by
chance. There may, for example, be 6 h1h1, 90 h1h2,
and 409 h2h2 for a total of 505 progeny. When these
reproduce, the allele frequencies which will produce
the next generation will be:


$$f(h_1) = \frac{2(6) + 90}{2(505)} = \frac{102}{1010} = 0.10099 \quad\text{and}\quad f(h_2) = \frac{90 + 2(409)}{2(505)} = \frac{908}{1010} = 0.89901$$
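A minimal Python sketch of the same bookkeeping, plus a crude stand-in for drift. `allele_freqs_from_counts` reproduces the calculation above from genotype counts. `sample_generation` is an assumption of this sketch, not a method described in the text: a simple Wright-Fisher-style model in which each of the n offspring draws two gametes independently from the parental allele pool. Its output varies with the random seed, which is the point: in a finite population, frequencies wander by chance.

```python
import random


def allele_freqs_from_counts(n11, n12, n22):
    """f(h1) and f(h2) from counts of h1h1, h1h2, and h2h2 individuals."""
    total_alleles = 2 * (n11 + n12 + n22)
    f_h1 = (2 * n11 + n12) / total_alleles
    return f_h1, 1.0 - f_h1


def sample_generation(p, n, rng):
    """Draw n diploid offspring from a gamete pool with f(h1) = p.
    Returns genotype counts (h1h1, h1h2, h2h2). Hypothetical drift model."""
    counts = [0, 0, 0]
    for _ in range(n):
        n_h1 = sum(rng.random() < p for _ in range(2))  # h1 alleles received
        counts[2 - n_h1] += 1  # 2 -> h1h1, 1 -> h1h2, 0 -> h2h2
    return tuple(counts)


print(allele_freqs_from_counts(6, 90, 409))  # approximately (0.10099, 0.89901)

rng = random.Random(1)
p = 0.1
for generation in range(1, 6):
    counts = sample_generation(p, 505, rng)
    p, _ = allele_freqs_from_counts(*counts)
    print(generation, counts, round(p, 4))
```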


Using these allele frequencies, it is obvious that the
genotype frequencies will be different, and if the infinite
population assumption is restored, alleles will
remain at these new frequencies until another force
acts to disturb the equilibrium. This Hardy-Weinberg
equilibrium is, thus, an unstable equilibrium. However,
stable equilibria are seen in genetic systems.
For example, some dominant alleles are lethal or
produce sterility in the heterozygous state. In this
case, natural selection acts to remove these dominant
alleles from the population in a single generation,
because the heterozygotes die or fail to reproduce.
The equilibrium state is represented by the balance
of elimination by death or sterility and production
by mutation.


Mutation-Selection, a Balanced 
Equilibrium 


Dominant alleles which confer prereproductive death 
or sterility on their carrier are eliminated from the 
population in a single generation. All the alleles seen 
in a population must be new mutations and the 
equilibrium is simply a balance between elimination 
by death or sterility and new mutations. 
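To see why the equilibrium frequency of such an allele reflects only the mutation rate, here is a minimal Python sketch of one generation of selection followed by mutation. It assumes a dominant allele a whose carriers (Aa and aa) have fitness 1 - s and a per-gamete mutation rate mu from A to a; the parameter values are hypothetical. With s = 1 (prereproductive death or sterility) every a allele is removed by selection, so the frequency in the next generation equals the mutation rate; for s < 1 the frequency settles near mu / s.

```python
def next_q(q, s, mu):
    """One generation for a dominant deleterious allele a with frequency q.
    Carriers (Aa and aa) have fitness 1 - s; surviving gametes then mutate
    A -> a at rate mu."""
    p = 1.0 - q
    w_bar = p * p + (1.0 - p * p) * (1.0 - s)
    q_after_selection = q * (1.0 - s) / w_bar
    return q_after_selection + mu * (1.0 - q_after_selection)


mu = 1e-5  # hypothetical mutation rate per gamete

# Dominant lethal or sterile (s = 1): only new mutations remain.
print(next_q(0.3, 1.0, mu))

# Partial selection (s = 0.5): q settles near mu / s.
q = 0.0
for _ in range(1000):
    q = next_q(q, 0.5, mu)
print(q, mu / 0.5)
```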


Overdominance 


Perhaps the best-known case of stable equilibrium
in human genetics is sickle-cell anemia, where
the equilibrium is determined by the balanced loss of
the normal homozygote, which is susceptible to malaria,
and of the sickle-cell homozygote, which is lost to anemia.
The heterozygote is resistant to malaria and does not
suffer anemia; thus both normal and sickle-cell alleles
are maintained in the population at an equilibrium
value determined by the severity of the anemia and the
prevalence of malaria in specific populations (see Sickle
Cell Anemia and Overdominance).


See also: Balanced Polymorphism; Genetic Drift; 
Hardy-Weinberg Law; Overdominance; 
Sickle Cell Anemia 
In environments that fluctuate and change unpredictably,
the adaptability of a population is critically
dependent upon the maintenance of reserves of genetic 
variability. Genetic homeostasis describes this prop- 
erty of populations that emerges from stabilizing 
selection which operates on individuals. In such envir- 
onments, the adaptedness of individuals is increased 
by the enhanced buffering against developmental 
instability that comes from heterozygosity. Such buf- 
fering enables individuals to produce the proper adapt- 
ive phenotype despite the inevitable environmental 
fluctuations that occur during development. 


What is Homeostasis? 


From an organism’s perspective, the ability to maintain
normal physiological functioning despite inconstant
environmental variables, both internal and
external, is a central aspect of its fitness. Consequently,
homeostasis is one of the fundamental adaptations.
The ability of an animal embryo to develop
normally even in the face of large insulin fluctuations,
for example, or of a mammal to maintain a constant
body temperature despite changing weather, confers a
survival advantage.


Populations as well as Individuals Can Be 
Homeostatic 


Just as individuals exhibit physiological or developmental
homeostasis, populations, too, may exhibit
homeostatic devices. That is, the genetic composition
of populations may have properties that confer on that
population a greater or lesser likelihood of persisting
in the face of environmental change. Among populations
of sexually reproducing organisms, balanced
polymorphism is a likely suspect in achieving population
homeostasis. In nature, heterozygous genotypes
are often more fit than homozygotes. For example,
in a population where all individuals share a single
genotype, e.g., AiAj, individuals may be fertile only
within the temperature range of 20-30 °C. In another
population, all individuals may be AA, and fertile
only between 15 and 25 °C. A population containing
individuals of both genotypes would, of course, have
fertile individuals throughout the larger range of 15 to
30 °C and, consequently, in the face of varying environments
would be more likely to persist than either
monomorphic population.

The converse of this situation is true as well. To the 
extent that balanced polymorphism is best for popula- 
tion homeostasis under conditions of environmental 
fluctuation, long-term environmental stability results 
in populations that exhibit decreased polymorphism. 
In a clever experiment utilizing a naturally occurring 
inversion system in Drosophila, Richard Lewontin 
demonstrated this, documenting the destruction of a 
polymorphic system by natural selection when it was 
maintained under extremely constant conditions for 
more than 40 generations. 

Additional evidence of genetic homeostasis comes 
from artificial selection experiments. From these 
we can observe that natural selection tends to resist 
changes in allele frequencies. In an equilibrium popu- 
lation, selection on one particular feature (say abdom- 
inal bristle number in fruit flies or toe morphology in 
chickens) predictably decreases one or more of the 
major components of fitness as a correlated response. 
Moreover, when the artificial selection is suspended 
before too much of the genetic variance is lost due 
to fixation, the frequencies of those genes for fitness 
components return to equilibrium and the mean value 
of the selected character reverts towards its original 
value. Given this strong property of populations 
to resist change, genetic homeostasis is sometimes 
referred to as genetic inertia. 

In similarly revealing experiments on hetero- 
zygosity and homeostasis, 178 strains of Drosophila 
pseudoobscura were created from a balanced poly- 
morphism population (Lewontin, 1956). Within each 
of these strains, all of the flies were completely homo- 
zygous for the second chromosome. Larval viability 
was then measured for each strain and compared with 
that found among the heterozygotes. Somewhat surprisingly,
more than a dozen of the homozygous
strains showed greater larval viability than the average
of the heterozygotes. This prompted the question of
why the original population would be polymorphic at
all when there were clearly some homozygotes with
higher fitness.

The key is that when the environment (either tem- 
perature or food composition) was altered just slightly, 
the homozygous strains that had exhibited high larval 
viability could no longer match the viability of the 
heterozygotes, which barely changed at all. Occasion- 
ally, under a narrow and specific set of environmental 
conditions, homozygotes may be more fit than hetero- 
zygotes. But like a 100 m sprinter versus a decathlete, 
these superspecialists just cannot compete when the 
playing field varies over time. Hence, the polymorphic 
populations persist. 


Organisms May Achieve Homeostasis via 
Heterozygosity 


An important feature of populations in which balanced
polymorphisms are maintained is that they inevitably 
produce more heterozygous individuals than less 
genetically variable populations. It turns out that just 
as the populations that these heterozygous individuals 
come from exhibit greater genetic homeostasis, the 
heterozygous individuals themselves have greater 
developmental homeostasis in the face of varying 
environments. 

Imagine a gene for a generalized enzyme. Suppose 
that alternative alleles for this enzyme code for slightly 
different forms of the protein, each with a slightly 
different range of conditions (temperature, pH, or 
salt concentration) of optimal activity, perhaps oper- 
ating through slightly different synthetic pathways. 
Heterozygous individuals, with two different forms
of the enzyme, would be better able to accommodate
the vagaries of the environment. Now multiply
this homeostatic effect across hundreds or even
thousands of genes. The greater the number of hetero- 
zygous loci, the greater the biochemical diversity 
and the stronger the potential buffering, homeostatic 
effect. 
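The enzyme argument can be made concrete with a deliberately simple toy model in Python. Everything in it is an assumption for illustration: each allele's enzyme form has a bell-shaped activity curve over temperature with a different optimum (20 and 30 °C here), the two alleles an individual carries contribute additively, and variability is summarized as the coefficient of variation of total activity across 15-35 °C. The genotype labels A1A1, A2A2, and A1A2 are hypothetical. Under these assumptions the heterozygote's activity is much more uniform across temperatures than either homozygote's.

```python
import math
import statistics


def activity(temp, optimum, width=5.0):
    """Hypothetical bell-shaped activity curve for one enzyme form (arbitrary units)."""
    return math.exp(-0.5 * ((temp - optimum) / width) ** 2)


def total_activity(temp, optimum_1, optimum_2):
    """Activity of an individual carrying enzyme forms with these two optima
    (assumed additive; a homozygote carries two copies of one form)."""
    return activity(temp, optimum_1) + activity(temp, optimum_2)


def cv(values):
    """Coefficient of variation: population standard deviation / mean."""
    return statistics.pstdev(values) / statistics.fmean(values)


temps = range(15, 36)  # 15 to 35 degrees C in 1-degree steps
genotypes = {"A1A1": (20, 20), "A2A2": (30, 30), "A1A2": (20, 30)}
variability = {
    name: cv([total_activity(T, o1, o2) for T in temps])
    for name, (o1, o2) in genotypes.items()
}
for name, value in variability.items():
    print(name, round(value, 3))
```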

In the absence of knowledge about a trait’s adaptive 
significance, of course, simple measures of variability 
do not necessarily represent an index of homeostasis. 
For some traits, evolution may actually favor high 
variability rather than uniformity within individuals. 
On the other hand, there are traits such as histone 
structure, for which selection may favor minimal 
variability across changing environments. Because 
there is, unfortunately, no consistent relationship 
between homeostasis and variability, it is not always 
possible to estimate homeostasis by observing pheno- 
typic variability for just a single trait. 


Inbreeding Leads to Genetic Uniformity 
but not Phenotypic Uniformity 


The fitness benefits of maintaining some heterozygosity
in populations are most clearly demonstrated via
inbreeding. The process of inbreeding has two effects:
first, it creates populations with no genetic variance,
in which all individuals are genetically identical; and
second, by reducing the number of unique alleles
occurring at each locus to one, it produces individuals
that are all completely homozygous. The first effect
is usually the desired goal of inbreeding; the second
is an unavoidable by-product.
In an inbred strain, after 20 or more generations of
brother-sister mating, there is essentially no genetic variability,
and individuals can be presumed homozygous at
every locus. In a population of F1 hybrid animals, on
the other hand, there also is no genetic variability, but
all of the individuals are heterozygous at every locus
for which the parent strains had different alleles.
Populations of random-bred and wild-caught animals,
by comparison, usually have some level of genetic
variability, and individuals have some intermediate
level of heterozygosity.
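How quickly brother-sister mating drives a strain toward complete homozygosity can be sketched with Wright's inbreeding coefficient F, the probability that the two alleles at a locus are identical by descent. The Python below uses the standard recurrence for continued full-sib mating, F_t = (1 + 2F_{t-1} + F_{t-2}) / 4, assuming a non-inbred, unrelated founding pair. It illustrates the 20-generation rule of thumb; it is not a breeding protocol.

```python
def full_sib_inbreeding(generations):
    """Inbreeding coefficient F for each generation of continued brother-sister
    mating, starting from unrelated, non-inbred founders (F = 0)."""
    f_two_back, f_one_back = 0.0, 0.0
    history = []
    for _ in range(generations):
        f = (1.0 + 2.0 * f_one_back + f_two_back) / 4.0
        history.append(f)
        f_two_back, f_one_back = f_one_back, f
    return history


F = full_sib_inbreeding(20)
print([round(x, 3) for x in F[:4]])  # [0.25, 0.375, 0.5, 0.594]
print(round(F[-1], 3))
```

After 20 generations F is about 0.986, so on average only around 1.4% of the founding heterozygosity remains, which is why such strains are treated as fully homozygous.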

Phenotypic variability can be compared between 
populations by measuring coefficients of variation 
(CVs) for a variety of physical, biochemical, and beha- 
vioral characters. The CV is simply the ratio of the 
standard deviation to the mean for the group. It serves 
to make the variance measure independent of the mean. 
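A minimal Python sketch of the CV using the standard library's `statistics` module; the body-weight values are invented for illustration. Multiplying every measurement by 10 leaves the CV unchanged, which is the sense in which it is independent of the mean. Whether to use the sample or the population standard deviation is a choice; this sketch assumes the sample version (`statistics.stdev`).

```python
import statistics


def coefficient_of_variation(values):
    """CV = sample standard deviation / mean (requires a positive mean)."""
    mean = statistics.fmean(values)
    if mean <= 0:
        raise ValueError("CV is only meaningful for a positive mean")
    return statistics.stdev(values) / mean


# Hypothetical body weights (g); the second group is the first scaled by 10.
small = [20.1, 22.4, 19.8, 21.5, 23.0]
large = [10 * x for x in small]
print(coefficient_of_variation(small), coefficient_of_variation(large))
```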

For many characters, in many organisms, inbreed- 
ing increases the developmental instability so much 
that it overrides any decrease in phenotypic variance 
that may have been achieved by the decreased genetic 
variance. Somewhat surprisingly to many researchers, 
in these cases, using inbred strains of animals makes it 
more difficult to detect significant differences between 
treatment groups than if F1 hybrids or random-bred
animals were used. In one study utilizing data from
fourteen species, including invertebrate and vertebrate
animals as well as plants, the CVs for 172 characters
were calculated and compared between inbred strains
and F1 hybrids. In more than 80% of the cases, the F1
hybrids exhibited significantly less phenotypic variability,
sometimes several-fold less. The characters
analyzed spanned a broad range and included life 
history characters such as rate of development, repro- 
ductive output, and longevity, physical traits, and 
behavioral traits such as learning, wheel running, and 
open-field activity. 


Fluctuating Asymmetry is Another 
Indicator of Poor Homeostasis 


In order to test whether heterozygosity enhances
homeostasis, it is useful to have a single, standard
measure that can serve as an index of developmental
stability, rather than having to compare the CV for
multiple traits. One such measure is fluctuating asymmetry.
Fluctuating asymmetry (FA) is defined as random
deviations in the expression of normally bilaterally
symmetrical characters and is generally ascribed to
“developmental accidents” or noise.

The empirical calculation of fluctuating asymmetry 
is straightforward. First, the asymmetry of a character 
in an individual is measured by noting the difference 
in the measure of that character between the right and 
left side of the individual as a proportion of the mean 
value of the character. This is then repeated for six to 
ten additional characters for that individual, and by 
summing these values a single composite ‘symmetry 
index’ for the individual is computed. This makes it 
possible to compare the symmetry index between dif- 
ferent populations of individuals, such as those with 
high versus low average heterozygosity. 
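The calculation can be sketched in a few lines of Python. Two details are assumptions of this sketch rather than statements from the text: the right-left difference is taken as an absolute value, and each character's asymmetry is scaled by that individual's own mean of the right and left measurements. The measurements in the example are hypothetical, and only three characters are used for brevity.

```python
def character_asymmetry(right, left):
    """|R - L| as a proportion of this individual's mean for the character."""
    mean = (right + left) / 2.0
    return abs(right - left) / mean


def symmetry_index(characters):
    """Composite index for one individual: the sum of scaled asymmetries over
    its bilateral characters, given as (right, left) pairs."""
    return sum(character_asymmetry(r, l) for r, l in characters)


def mean_symmetry_index(population):
    """Average composite index over the individuals of a population."""
    return sum(symmetry_index(ind) for ind in population) / len(population)


# Hypothetical (right, left) measurements for two individuals.
population = [
    [(2.10, 2.08), (14, 15), (5.0, 5.0)],
    [(2.05, 2.11), (15, 15), (4.8, 5.1)],
]
print(mean_symmetry_index(population))
```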

Because they have virtually no heritability, devia- 
tions from bilateral symmetry do not appear to repre- 
sent genetic differences. Similarly, it seems unlikely 
that any aspect of the external environment differs 
systematically or consistently between the left and 
right side of an organism. Instead, such deviations seem 
to indicate a breakdown in normally well-buffered 
developmental pathways or a lack of homeostasis. 
Thus, the greater the average FA among the indi- 
viduals in a population, the lower their homeostasis. 

Observations of FA and its relationship to hetero- 
zygosity have been made for many traits in a wide 
variety of taxa. For instance, in D. melanogaster, the
differences between right and left wing length, and
between the numbers of bristles on the left and right
sides of the body within an individual, become
significantly larger as homozygosity increases. Similar patterns
are seen for structural features in fish, mammals, and
molluscs. In general, it appears that (1) populations
and individuals with higher heterozygosity
exhibit lower frequencies of FA, and (2) the frequency
of FA increases with the degree of inbreeding. Because
of the apparent link between FA and homeostasis, 
researchers have used it to assess exposure to environ- 
mental stress in humans and other animals. Interest- 
ingly, it has even been noted that some animals 
preferentially choose mates that exhibit greater sym- 
metry than the population average and in humans, 
ratings of attractiveness, too, are significantly correl- 
ated with measures of physical symmetry. 


Increased Variability among Inbreds has 
Practical Implications 


The evolutionary origin of developmental stability is
important in its own right, but it also has significant
practical implications for experimental biologists.
Because it is critically important to obtain research
animals that offer the maximum likelihood of detecting
real differences between treatment and control groups,
it may be unwise to rely on a small number of inbred
strains of animals. Increased developmental instability
among inbred organisms may obscure true relationships
between biological characters and the effects of
experimental manipulations. Testing the response to
an experimental treatment in an inbred strain is the
equivalent of repeated testing on a single individual,
since an inbred strain is a single genotype. Thus, studies
utilizing a single strain (or a small number of
strains) may produce results that do not necessarily
characterize the general pattern of response to the
treatment.

This problem is manifest, for example, as signifi- 
cant differences among strains of mice and rats in rates 
of occurrence of common lesions. An investigator 
using Fischer 344 rats, for instance, might conclude 
that most rat mortality and morbidity is due to ade- 
nomas of Leydig cells, bile duct hyperplasia, and hep- 
atic microabscesses, since they are observed in 51% 
(of males), 56%, and 33% of the animals, respectively. 
An investigator using Brown Norway rats, on the 
other hand, might not observe a single incidence of 
any of these lesions and instead might conclude 
that the pathologies of greatest concern are testicular 
atrophy, chronic dacryoadenitis of the harderian 
gland, and nodular vacuolation of adrenal cortical 
cells (observed in 57% (of males), 52%, and 31%, 
respectively). 

Such strain-specific patterns of pathology can make it
easier to study an individual disease in a single inbred
strain. Inbred animals may not, however, be the best
tools for dissecting a multifactorial process such as
aging or development. Researchers may gain experimental
power by using F1 hybrids in place of any
specific inbred strain. The F1 hybrid genotypes are
equally replicable, and their interindividual phenotypic
variability may be significantly lower.
Genetic load is a measure of the extent to which the
average fitness, viability, or other favorable attribute
of a population is decreased by the factor under consideration.
Thus there are the following types of load:
a mutation load, caused by deleterious mutations; a
segregation (or balanced) load, caused by segregation
of poor homozygotes at loci where the heterozygote is
favored; a recombination load, caused by the breakup
of favorable gene combinations by recombination; a
load due to meiotic drive or gamete selection, in which
these processes produce less favored genotypes; an
incompatibility load, caused by maternal-fetal incompatibility,
as in the Rh blood groups; a drift load,
caused by unfavorable alleles increasing in frequency
by random processes in small populations; and a
migration load, caused by immigrants adapted to a
different environment.
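One common way to formalize this definition is L = (w_max - w_bar) / w_max, where w_bar is the population's mean fitness and w_max is the fitness of the best genotype. The Python sketch below applies it to a segregation load at an overdominant locus, assuming Hardy-Weinberg proportions among zygotes and hypothetical fitnesses 1 - s, 1, and 1 - t for AA, Aa, and aa. At the equilibrium allele frequency f(A) = t / (s + t), the load reduces to st / (s + t).

```python
def genetic_load(genotype_freqs, fitnesses):
    """L = (w_max - w_bar) / w_max: proportional reduction of mean fitness
    below that of the best genotype present."""
    w_bar = sum(f * w for f, w in zip(genotype_freqs, fitnesses))
    w_max = max(fitnesses)
    return (w_max - w_bar) / w_max


# Segregation load at an overdominant locus (hypothetical fitnesses).
s, t = 0.1, 0.8
p = t / (s + t)  # equilibrium frequency of A
q = 1.0 - p
load = genetic_load([p * p, 2 * p * q, q * q], [1 - s, 1.0, 1 - t])
print(load, s * t / (s + t))  # the two values agree
```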

The word ‘load’ was introduced in 1950 by H.J. 
Muller in an article entitled “Our load of mutations” 
(Muller, 1950). His purpose was to quantify the reduc- 
tion in mean fitness caused by recurrent mutation 
using the Haldane-Muller principle, which says that 
the effect of mutation on fitness is to reduce it by the 
total mutation rate per zygote. The word was then 
extended to include all the fitness-reducing processes 
mentioned above. 
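The Haldane-Muller principle can be checked numerically. This Python sketch, under assumed parameter values, iterates selection against a dominant deleterious allele (carrier fitness 1 - s) and mutation at rate mu per gamete until equilibrium, then measures the load relative to the AA genotype. For a diploid, the mutation rate per zygote at this locus is 2mu, and the computed load comes out very close to 2mu whatever the value of s: weaker selection simply lets the allele become more common. (For a fully recessive allele the classical result is a load of about mu instead.)

```python
def next_q(q, s, mu):
    """One generation for a dominant deleterious allele a (fitness 1 - s in
    Aa and aa): selection, then mutation A -> a at rate mu per gamete."""
    p = 1.0 - q
    w_bar = p * p + (1.0 - p * p) * (1.0 - s)
    q_sel = q * (1.0 - s) / w_bar
    return q_sel + mu * (1.0 - q_sel)


def mutation_load(s, mu, generations=20_000):
    """Iterate to equilibrium, then return L = (w_max - w_bar) / w_max,
    with w_max = 1 for the AA genotype."""
    q = 0.0
    for _ in range(generations):
        q = next_q(q, s, mu)
    p = 1.0 - q
    w_bar = p * p + (1.0 - p * p) * (1.0 - s)
    return 1.0 - w_bar


mu = 1e-5  # hypothetical mutation rate per gamete at this locus
for s in (0.01, 0.1, 0.5):
    print(s, mutation_load(s, mu), 2 * mu)
```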

The choice of word, load, is unfortunate in its impli- 
cation that a load is necessarily bad. A genetic load may 
be a reflection of the opportunity of the species 
to undergo further evolution. For example, a variable 
natural population has a lower average fitness than one 
consisting entirely of the genotype of maximum fitness. 
Yet the uniformly high-fit population lacks the genetic 
variability necessary for evolution by natural selection. 
Likewise, mutation is a requisite for evolution. 

The load can be the ‘expressed’ load, i.e., that which 
occurs in a natural, usually randomly mating popula- 
tion. There is also a ‘total’ load, which includes the 
‘hidden’ load, i.e., that which is brought out by special 
circumstances, such as inbreeding. Separating these
loads, usually by studies of inbreeding, has revealed a
great deal about the amount of hidden variability in
natural populations.

The loads that have been the most extensively 
researched and discussed are the mutation and segre- 
gation loads. Beginning in the 1950s, the hidden muta- 
tion load, as revealed by inbreeding, was used as a way 
to estimate the genomic mutation rate in organisms 
such as the human, where experimental measures were 
not feasible. Load principles were also invoked in an 
attempt to assess the impact on the population of an 
increased mutation rate, such as might be caused by 
radiation or environmental mutagens. 

In the 1960s there was controversy between those, 
especially H.J. Muller, who favored the ‘classical’ hypo- 
thesis of population structure, and those, especially 
Th. Dobzhansky, who favored the ‘balance’ hypo- 
thesis. According to the classical hypothesis, most 
genetic variability in a sexually reproducing popula- 
tion is caused by recurrent mutation and overdom- 
inant loci are rare. The balance hypothesis assumes that 
most loci are overdominant and that most variability is 
caused by segregation from superior heterozygotes. 
The genetic load is much larger under the balance 
hypothesis. The reason is that under the classical 
hypothesis deleterious mutants are kept at low fre- 
quency by natural selection, whereas with overdomin- 
ance deleterious homozygotes are relatively common. 
Some argued that the balance hypothesis entails a large 
segregation load, perhaps too large to be realistic. 
Others countered that with rank order selection, the 
load could readily be accommodated. Although the 
issue was strongly debated and many experimental 
studies were undertaken, often with useful by-product 
information, they failed to settle the issue as to how 
much genetic variability a natural population contains. 
The answer came later with the development of molecular
methods, first the study of protein polymorphisms and
later direct measurement of DNA variation. The answer,
curiously, lies between what the two hypotheses predict. The amount
of protein heterozygosity, about 5-10%, is less than 
the balance school would have predicted, but higher 
than had been argued by the classical school. Now 
other causes of genetic variability have been discovered 
and this, along with the realization that much molecu- 
lar variability may be neutral, has caused the debate 
to subside. Genetic load is now a part of population 
genetics theory and not a matter of controversy. With 
the evidence that the mutation rate may be higher than 
was earlier suspected and the number of overdominant 
loci fewer, the question of current interest is not a large 
segregation load but a large mutation load. 

The other kinds of loads mentioned above have had 
much less theoretical treatment, but all are factors in 
the structure and evolution of natural populations. 
Current research on these subjects emphasizes direct
measurements of allele frequencies rather than indirect
assessments from load theory.
Genetic Mapping


See: Chromosome Mapping, Gene Mapping 
A genetic marker is any identifiable allele of interest in
an experiment. 


See also: Marker; Marker Effect; Marker Rescue 
The term genetic material describes the physical substance
that is inherited from parents by offspring.
Generally this refers to DNA. This physical substance
provides a continuous link among all cells in a body or
colony, all individuals in a species, and, ultimately, all
organisms. It carries the information specifying the
enzymes, structural proteins, and other gene products
that are characteristic of life. More abstractly, genetic
material refers to that information, or code, that
directs life processes.


See also: DNA; Universal Genetic Code 
Genetic migration is the expansion of the geographical
distribution of a species by expansion of populations
and the founding of new populations in a previously
unoccupied area. Many of the best-documented cases
involve the movement out of a glacial refugium into an
area from which the species had been excluded by the
glacier.

Between 14 000 and 17 000 years ago, when sea levels were
lowered by the accumulation of glacial ice, the ancestors of
Native Americans migrated from Asia to North America
across the land bridge between Siberia and present-day
Alaska. They continued their migration east to the
Atlantic Ocean and south to the tip of South America.

The most recent cycle of the Wisconsin glaciation 
forced plant species occupying either high latitudes 
or elevations to migrate substantial distances. For 
example, during the Wisconsin glaciation, ponderosa 
pine retracted to refugia in two general areas, northern 
Mexico and the Pacific Coast. At the end of the ice age, 
ponderosa pines began to spread north from Mexico, 
reaching the San Andres Mountains of southern New 
Mexico 14 920 years ago (ya), the Santa Catalina Moun- 
tains of southern Arizona a few centuries later, and the 
Grand Canyon about 10 000 ya. They arrived in eastern 
Nevada 6100 ya, and in northern Colorado 5090 ya. 
Ponderosa pines reached northeastern Wyoming 
about 4000 ya, and continued north around the north- 
ern edge of the Great Basin where they formed a narrow 
transition zone in eastern Montana with the ponderosa 
pines spreading east from their Pacific refugia. 

At the height of the most recent glaciation, 18 000 
years ago, Scandinavia and parts of the British Isles 
were covered with ice, and tundra and permafrost cov- 
ered central Europe. So it comes as no surprise that the 
modern ranges of European plants and animals were 
colonized from glacial refugia further south. Phylogeo- 
graphical studies of eight animals (including a newt, 
Triturus cristatus, a grasshopper, Chorthippus paralle- 
lus, hedgehogs, Erinaceus spp., and bear, Ursus arctos) 
and four plants (alder, Alnus glutinosa, oaks, Quercus 
spp., beech, Fagus sylvatica, and fir, Abies alba) identi- 
fied four refugia: the Iberian Peninsula, and areas in 
Italy, Greece, and Turkey. The northern areas of mod- 
ern distributions generally exhibit less genetic diver- 
sity than the southern areas, almost certainly as a 
consequence of successive population bottlenecks as 
populations spread to the north, trickling over high 
passes and dispersing across inhospitable terrain. 

Concordant plant and animal phylogeographies 
have revealed genetic migrations from a previously 
unappreciated glacial refugium in western North 
America. A comparison of cpDNA phylogenies of 
plants in the Pacific Northwest revealed similar geo- 
graphical patterns of cpDNA variation in six of seven 
species analyzed. These species include three herb- 
aceous perennials (Tolmiea menziesii, Tellima grandi- 
flora, Tiarella trifoliata), a shrub (Ribes bracteosum), 
a tree (Alnus rubra), and a fern (Polystichum munitum).


Similar phylogeographical patterns were found in 
mtDNA of black bear (Ursus americanus), brown 
bear (U. arctos), marten (Martes americana), and 
short-tailed weasel (Mustela erminea). Deep clefts in 
the intraspecific phylogenies separate populations 
north and south of the border between Oregon and 
Washington. The concordance of these phylogeograph- 
ies is attributable to isolation of both the plants and 
animals in the Haida Gwaii refugium, in the present 
Queen Charlotte Islands of British Columbia. The 
clefts in the phylogeographies reveal the differenti- 
ation that evolved between populations in the Haida 
Gwaii refugium, surrounded by ice, and the popula- 
tions occupying the ice-free area south of Washington. 




See also: Allopatric; Phylogeography; Speciation 
In Mendel’s original experiments on the transmission
of different alleles from crosses that segregated various 
mutations, certain ratios of offspring were observed 
consistently. Thus, with a cross between two parents 
both heterozygous for a recessive mutation, the off- 
spring appeared in a 3:1 ratio of wild-type to mutant. 
In a cross between one parent heterozygous for a 
dominant mutation and a second wild-type parent, 
the offspring appeared in a 1:1 ratio of wild-type to 
mutant. More complicated ratios were obtained by 
Mendel in crosses that involved mutations at more 
than one locus. 
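Mendel's ratios follow from enumerating every combination of gametes, as in a Punnett square. The Python sketch below does this by brute force. It assumes one mutant allele "m" and one wild-type allele "+" per locus, that loci assort independently, and that each mutation is either fully dominant or fully recessive; the function names are illustrative.

```python
from collections import Counter
from itertools import product

# "+" = wild-type allele, "m" = mutant allele.
# A genotype is a list of (allele, allele) pairs, one pair per locus.


def gametes(genotype):
    """All equally likely gametes: one allele from each locus (independent assortment)."""
    return list(product(*genotype))


def phenotype(pair, mutant_is_dominant):
    """Phenotype at one locus from the two alleles an offspring carries."""
    n_mutant = pair.count("m")
    if mutant_is_dominant:
        return "mutant" if n_mutant >= 1 else "wild type"
    return "mutant" if n_mutant == 2 else "wild type"


def cross(mother, father, mutant_is_dominant):
    """Tally offspring phenotypes over every egg-sperm combination (a Punnett square).
    mutant_is_dominant gives one True/False flag per locus."""
    counts = Counter()
    for egg, sperm in product(gametes(mother), gametes(father)):
        offspring = tuple(
            phenotype((a, b), dom)
            for a, b, dom in zip(egg, sperm, mutant_is_dominant)
        )
        counts[offspring] += 1
    return counts


het = [("+", "m")]
wild = [("+", "+")]
print(cross(het, het, [False]))  # heterozygote x heterozygote, recessive: 3:1
print(cross(het, wild, [True]))  # heterozygote x wild type, dominant: 1:1
```

With two loci, each heterozygous for a recessive mutation, the same function yields the familiar 9:3:3:1 phenotypic ratio.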


See also: Mendel’s Laws; Punnett Square 
D Carroll


Copyright © 2001 Academic Press
doi: 10.1006/rwgn.2001.0543


Definitions 


Genetic recombination refers to the rearrangement 
of DNA sequences by some combination of the 
breakage, rejoining, and copying of chromosomes 
or chromosome segments. It also describes the con- 
sequences of such rearrangements, i.e., the inheritance 
of novel combinations of alleles in the offspring that 
carry recombinant chromosomes. Genetic recombin- 
ation is a programmed feature of meiosis in most 
sexual organisms, where it ensures the proper segrega- 
tion of chromosomes. Because the frequency of 
recombination is approximately proportional to the 
physical distance between markers, it provides the 
basis for genetic mapping. Recombination also serves 
as a mechanism to repair some types of potentially 
lethal damage to chromosomes. 
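Recombination frequency is estimated by scoring what fraction of meiotic products (or offspring) carry a nonparental combination of two markers. A minimal Python sketch with hypothetical counts; treating 1% recombination as 1 centimorgan is a convention that holds only for closely linked markers, where multiple crossovers between them are rare.

```python
def recombination_fraction(parental, recombinant):
    """Fraction of scored products carrying a new combination of the two markers."""
    total = parental + recombinant
    if total == 0:
        raise ValueError("no products scored")
    return recombinant / total


def approx_map_distance_cm(rf):
    """Approximate distance in centimorgans, taking 1% recombination as 1 cM.
    Only reasonable for closely linked markers."""
    return 100.0 * rf


# Hypothetical counts: 910 parental-type (AB, ab) and 90 recombinant (Ab, aB) products.
rf = recombination_fraction(910, 90)
print(rf, approx_map_distance_cm(rf))
```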

Genetic recombination is often used as a general 
term that includes many types of DNA rearrange- 
ments and underlying molecular processes. Meiotic 
recombination is an example of a reaction that
involves DNA sequences that are paired and homologous
over very extended lengths. This type of process,
which is illustrated in Figure 1, is termed general,
legitimate, or homologous recombination. Recombination
of this type is reciprocal, because each participating
chromosome receives information comparable
to what it donates to the other partner. The event
shown in Figure 1 is also designated as a crossover,
since all the information on both sides of the effective
break has been exchanged.

Gene conversion is a form of homologous recom- 
bination that is nonreciprocal. This is recognized by 
the recovery of unequal numbers of the parental mark- 
ers at a particular locus, and a simple example is 
shown in Figure 2. Conversion events can be accom- 
panied by a crossover, or not (as shown in Figure 2). 
In the latter case, conversion looks like a very local- 
ized double crossover, but it is nonreciprocal and is 
likely the result of a single event. 
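The nonreciprocal signature described above, unequal numbers of parental markers at one locus, can be detected by counting alleles locus by locus among the four products of a single meiosis. In this Python sketch each product is a string such as "AB" (position 0 is the A/a locus, position 1 the B/b locus), and the example uses the products listed for Figure 2 (1 AB, 1 aB, 2 ab); the function names are illustrative.

```python
from collections import Counter


def marker_ratios(products):
    """Allele counts at each locus among the four haploid products of one meiosis."""
    n_loci = len(products[0])
    return [Counter(p[i] for p in products) for i in range(n_loci)]


def converted_loci(products):
    """Indices of loci that do not segregate 2:2, the signature of gene conversion."""
    return [
        i
        for i, counts in enumerate(marker_ratios(products))
        if sorted(counts.values()) != [2, 2]
    ]


conversion = ["AB", "aB", "ab", "ab"]  # products listed for Figure 2
crossover = ["AB", "Ab", "aB", "ab"]   # one reciprocal exchange between A and B
print(marker_ratios(conversion))       # A:a is 1:3, B:b is 2:2
print(converted_loci(conversion), converted_loci(crossover))
```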

Homologous recombination can occur between 
homologous chromosomes or sister chromatids in 
mitotic cells as well. In addition, essentially analogous 
events may take place between homologous sequences 
that are present at different locations on nonhomolo- 
gous chromosomes; this is often called ectopic recom- 
bination. Recombination that involves very limited or 
no homology between the interacting DNA sequences is termed illegitimate or nonhomologous recombination. Sometimes a few matched base pairs are seen precisely at illegitimate recombination junctions, and these are called microhomologies. An event supported by homologies of 100 bp or more would typically be classified as homologous, a match of 10 bp or fewer would be nonhomologous, and there is evidently a gray area in between.

Figure 1 Simplified diagram of a meiotic recombination event. Vertical bars indicate individual chromatids (i.e., double-stranded DNA molecules); shaded ovals are centromeres. We imagine two pairs of sister chromatids after premeiotic DNA synthesis that are distinguished by color and by genetic markers at locations A/a and B/b. If meiosis were to proceed without recombination, the markers would segregate 2:2 in linked pairs in the resulting gametes or spores, as indicated below the left diagram. If one reciprocal recombination event takes place between the two markers, the linkage relationships are changed, yielding two new chromatids as shown and ultimately four distinct haploid products.
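Microhomology at a junction can be measured by asking how far the product matches each parent. The following is a minimal sketch, assuming the product is a simple join of a left-parent prefix and a right-parent suffix with no inserted DNA; the sequences are invented, and `classify_by_homology` merely applies the rough 100 bp and 10 bp thresholds quoted above.

```python
def junction_microhomology(left: str, right: str, product: str) -> str:
    """Return the junction bases that could have come from either parent.

    Assumes `product` is a prefix of `left` joined to a suffix of `right`,
    with no filler DNA inserted at the junction.
    """
    # Longest prefix of the product that matches the left parent.
    lcp = 0
    while lcp < min(len(left), len(product)) and left[lcp] == product[lcp]:
        lcp += 1
    # Longest suffix of the product that matches the right parent.
    lcs = 0
    while lcs < min(len(right), len(product)) and right[-1 - lcs] == product[-1 - lcs]:
        lcs += 1
    # The two matches overlap only where the junction bases are shared.
    overlap = lcp + lcs - len(product)
    return product[len(product) - lcs:lcp] if overlap > 0 else ""


def classify_by_homology(length_bp: int) -> str:
    """Rough classification using the thresholds quoted in the text."""
    if length_bp >= 100:
        return "homologous"
    if length_bp <= 10:
        return "nonhomologous"
    return "gray area"


left = "AAAAAGCTTTTT"
right = "CCCCCGCGGGGG"
product = "AAAAAGCGGGGG"  # left[:7] joined to right[7:]
mh = junction_microhomology(left, right, product)
print(mh, classify_by_homology(len(mh)))  # GC nonhomologous
```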

In conservative recombination events, the number 
of copies of the interacting chromosomes or DNA 
sequences is maintained throughout the process, 
while in nonconservative events, two original copies 
are reduced to one in the product. This distinction can 
be made for both homologous and nonhomologous 
recombination. 

Site-specific recombination events are mediated 
by sequence-specific recombination enzymes often 
encoded by viruses or transposable elements. The 
molecular processes they catalyze may rely on very 
short stretches of homology between the interacting 
DNAs, or they may be entirely nonhomologous. 
Gene conversion products: 1 AB, 1 aB, 2 ab

Figure 2 Illustration of a gene conversion event. Unlike the reciprocal recombination shown in Figure 1, information has been transferred only from one parent to the other, and the extent of the information exchanged is smaller.


Genetic Mapping 


To a first approximation, the probability that a genetic 
recombination event will occur during meiosis is dis- 
tributed equally along the length of each chromosome, 
and in most organisms the number of crossovers in 
each chromosome arm is limited to one or a few. This 
means that it is quite unlikely that an event will occur 
between two genes that are very close to each other on 
a chromosome, but much more likely between distant 
genes. The closer genes A and B are along the DNA, the less likely is an exchange that rearranges the alleles of these genes, as shown in Figure 1. This forms the basis of genetic mapping.

The frequency of recombination is defined as the 
fraction of all cases in which two genetic markers that 
came from the same parent are found separated in the 
offspring. If two markers are on different chromo- 
somes, they will not be linked as they pass through 
meiosis, and their recombination frequency will be 
0.5, i.e., they will segregate into the same gamete by 
chance half the time and into different gametes half the 
time. Markers that are very close to each other on the 
same chromosome arm will be separated very rarely 
and will have a recombination frequency close to zero. 
Markers more distant from each other on the same 
chromosome will show recombination frequencies 
between zero and 0.5. 
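The definition above translates directly into a count. A minimal sketch, with hypothetical offspring counts and the parental haplotypes assumed to be AB and ab:

```python
from collections import Counter


def recombination_frequency(gametes, parental=("AB", "ab")):
    """Fraction of gametes in which markers from the same parent were separated.

    `gametes` lists scored offspring haplotypes; any haplotype not in
    `parental` is counted as recombinant.
    """
    counts = Counter(gametes)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no gametes scored")
    recombinant = total - sum(counts[g] for g in parental)
    return recombinant / total


# Hypothetical counts for two linked markers and for two unlinked markers.
linked = ["AB"] * 45 + ["ab"] * 45 + ["Ab"] * 5 + ["aB"] * 5
unlinked = ["AB"] * 25 + ["ab"] * 25 + ["Ab"] * 25 + ["aB"] * 25
print(recombination_frequency(linked))    # 0.1
print(recombination_frequency(unlinked))  # 0.5
```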

Now imagine a situation in which a third marker,
c, is added to the same chromosome. When the three
markers are monitored in pairwise combinations, the
measured recombination frequencies (if they are not 
too high) are essentially additive, and the numbers are 
consistent with the physical order of the correspond- 
ing genes on the chromosome. For example, if b lies 
between a and c, the recombination frequencies for 
the ab and bc pairs will be smaller than that for ac, and 
the latter will be approximately the sum of the two 
smaller numbers. In this way, measured recombin- 
ation frequencies are used to determine the order of 
genes along chromosomes and the relative distances 
between them. 
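The three-point reasoning above can be sketched as follows, with hypothetical frequencies. The sketch assumes the frequencies are low enough to be roughly additive, as the text requires, and it does not handle ties between pairs.

```python
def order_three_markers(rf):
    """Infer marker order from pairwise recombination frequencies.

    `rf` maps two-letter pair names such as "ab" to frequencies. The pair with
    the largest frequency is taken as the two outer markers; the remaining
    marker is placed between them. Also returns the outer frequency and the
    sum of the two inner intervals so that additivity can be checked.
    """
    pairs = {frozenset(name): value for name, value in rf.items()}
    markers = set().union(*pairs)
    outer = max(pairs, key=pairs.get)
    (middle,) = markers - outer
    left, right = sorted(outer)
    inner_sum = pairs[frozenset((left, middle))] + pairs[frozenset((middle, right))]
    return (left, middle, right), pairs[outer], inner_sum


# Hypothetical frequencies for markers a, b, c.
order, outer, inner = order_three_markers({"ab": 0.10, "bc": 0.08, "ac": 0.17})
print(order)                   # ('a', 'b', 'c')
print(outer, round(inner, 2))  # 0.17 0.18
```

The outer frequency (0.17) is close to the sum of the inner intervals (0.18), which is the approximate additivity the text describes.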

We now know that recombination frequencies are 
not uniform throughout the length of a chromosome. 
When examined very closely, there are hot spots with 
elevated frequencies and relatively cold spots with 
reduced frequencies. This reflects the interaction 
of the recombination machinery with specific DNA 
sequences and chromosomal configurations. None- 
theless, since genetic recombination measures genetic, 
not physical, distances, distant markers usually obey 
the additivity rules. 


Recombination and DNA Structure 


Each cellular chromosome usually consists of a single 
molecule of double-stranded DNA. Genetic recom- 
bination may begin with the exchange of only one of 
the two DNA strands, and recombination outcomes 
often reflect this fact. Some examples are shown in 
Figure 3, which illustrates gene conversion, post- 
meiotic segregation, and DNA repair. We imagine 
two replicated homologous chromosomes (indicated 
by different shading in Figure 3) undergoing hom- 
ologous recombination. At any particular location 
along the chromosomes, one or both strands of DNA 
may be exchanged. When only one strand is exchanged, 
a heteroduplex is formed that contains one strand from 
each parent. If the parents differ in sequence in this 
region, the heteroduplex will be subject to correction 
by the mismatch repair machinery of the cell. Mis- 
match repair is frequently responsible for gene con- 
version, as shown in products 4 and 7 in Figure 3. 
When meiosis is completed, each haploid gamete 
will receive one of the four DNA duplexes shown in 
each of the diagrams in Figure 3. Homoduplex will be 
inherited at all sites that were not directly involved in 
the recombination event and at all sites where a heteroduplex was repaired. In cases where mismatches escape repair, the heteroduplex will be transmitted to a gamete, and information from both parents will be present at the unrepaired site (B/b in diagrams 3 and 6). When the heteroduplex DNA is replicated, after fertilization or germination (depending on the type of organism), homoduplexes of the two parental types will be produced. This phenomenon is referred to as postmeiotic segregation, or PMS.

Figure 3 Illustration of some events that occur following the initial exchange of one strand of DNA between homologous chromosomes. Each DNA single strand is shown as a vertical bar, and the parental chromosomes are imagined to differ in allelic markers in three genes: A/a, B/b, and C/c. In diagram 1, all white strands carry ABC information, while all shaded strands are abc. In subsequent diagrams, the information in the strands of the interacting chromosomes is noted explicitly. After the initiation event (2), in which a segment of one strand invades corresponding sequences in a homologous chromosome, several outcomes are possible. The initial patch, carrying marker b, can be incorporated into the recipient chromosome, and the gap it left behind can be filled by DNA synthesis (3). Mismatches in the resulting heteroduplex at B/b can be repaired (4). An alternative fate of intermediate 2 is the reciprocal exchange of a single strand from the invaded chromosome (5). One possible way this intermediate can be resolved is by cleavage and religation of the nonexchanged strands; this leads to a crossover, with heteroduplexes remaining at site B/b (6). If these heteroduplexes are both repaired to the B allele, the result will be that shown in 7. Below the diagrams of products 3, 4, 6, and 7, the genetic outcomes are indicated. The line labeled ‘Gametes’ shows the status of the double-stranded DNAs, with B/b indicating the persistence of heteroduplexes. The results of postmeiotic segregation (PMS) are indicated, as are the recoveries of the parental alleles at B/b for each of the outcomes. Product 4 would be scored as a gene conversion, while product 7 is a gene conversion associated with a crossover.

As shown in Figure 1, the usual outcome of meiosis would be the recovery of equal numbers of parental alleles, in either parental or recombinant configuration. The processes illustrated in Figure 3 can alter this distribution for markers that are close to the site of the recombination event itself. The number of alleles of each parental type that are present in each of the recombination products is tabulated in the figure. The 2:6 and 6:2 segregation patterns (products 4 and 7) represent gene conversion. The 3:5 (product 3) and aberrant 4:4 (product 6) segregation patterns are revealed as PMS.
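The strand counting behind these ratios can be sketched as follows. Each of the four duplexes from one meiosis is given as a pair of strand alleles at the B/b site, and ratios are written B:b. The duplex compositions used for products 3, 4, 6, and 7 are a reading of Figure 3 and should be treated as assumptions, as should the simple labeling rule.

```python
def segregation_pattern(duplexes):
    """Classify segregation at one site from the eight strands of a meiosis.

    `duplexes` holds four (strand, strand) allele pairs, e.g. ("B", "b") for an
    unrepaired heteroduplex. Returns the B:b strand ratio and a label.
    """
    strands = [allele for duplex in duplexes for allele in duplex]
    n_big, n_small = strands.count("B"), strands.count("b")
    n_hetero = sum(1 for s1, s2 in duplexes if s1 != s2)
    if n_hetero == 0:
        # All duplexes are homoduplex: either normal or a conversion.
        label = "normal 4:4" if n_big == 4 else "gene conversion"
    elif n_big == 4:
        # Balanced strand counts but heteroduplex persists in some gametes.
        label = "aberrant 4:4 (PMS)"
    else:
        label = "PMS"
    return f"{n_big}:{n_small}", label


# Assumed duplex compositions for the products in Figure 3.
products = {
    3: [("B", "B"), ("B", "b"), ("b", "b"), ("b", "b")],
    4: [("B", "B"), ("b", "b"), ("b", "b"), ("b", "b")],
    6: [("B", "B"), ("B", "b"), ("b", "B"), ("b", "b")],
    7: [("B", "B"), ("B", "B"), ("B", "B"), ("b", "b")],
}
for number, duplexes in products.items():
    print(number, *segregation_pattern(duplexes))
```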


DNA Repair by Recombination 


So far we have emphasized recombination events that occur in a programmed fashion during meiosis. In mitotic cells, the principal function of recombination appears to be the repair of double-strand breaks (DSBs) in DNA. By definition, this type of damage must be repaired by recombination: what came apart must be put back together. DSBs can be generated by external agents, like ionizing radiation and some types of chemicals, or during normal cellular processes, like generation of reactive oxygen species and problems in DNA replication. Essentially all types of recombination play a role in DSB repair in some organisms or cell types: homologous and nonhomologous, conservative and nonconservative, reciprocal and nonreciprocal. Some examples are illustrated in Figure 4.

Figure 4 Modes of double-strand break repair by recombination. Each horizontal bar represents a double-stranded DNA molecule. Thin bars indicate no homology, while thick bars denote homologous sequences. In nonhomologous end ligation, broken ends are rejoined precisely without gain or loss of information. Nonhomologous end joining is typically accompanied by deletion (as shown) or insertion of DNA. In conservative homologous recombination, the break is repaired by copying information from homologous sequences elsewhere in the genome, often from a homologous chromosome or sister chromatid, and the repair may or may not be accompanied by crossing over, as shown in the two alternative outcomes. Nonconservative homologous recombination relies on repeated sequences near the break, and repair is accompanied by deletion of one copy of the repeat and all sequences between the two copies.

Some types of breaks in DNA can be rejoined by 
the simple action of DNA ligase without a need for 
extensive sequence homology. Examples would be the 
short, complementary single-stranded tails generated 
by many restriction endonucleases. More frequently, 
however, the broken DNA ends are not ligatable and 
end joining occurs between novel sequences, often 
with concomitant deletions or insertions. Examin- 
ation of the junctions produced in these illegitimate 
events frequently reveals microhomologies (~1-5 
nucleotides) between the parental sequences. 

Two forms of homologous recombination are 
involved in DSB repair. If an unbroken copy of the 
same sequence is available on a homologous chromo- 
some or sister chromatid, the most reliable way to restore the integrity of the broken DNA is to copy that information in repairing the break. Conservative homologous recombination of this sort may result in
a gene conversion, i.e., replacement with information 
from another allele, close to the break, and it may or 
may not be accompanied by a crossover (Figure 4). If 
the donor and recipient sequences were not on hom- 
ologous chromosomes, a crossover would lead to a 
reciprocal chromosome translocation. The conserva- 
tive mechanism operates very efficiently in fungi, where 
it shows considerable similarity to meiotic recombin- 
ation, both in mechanism and in genetic requirements. 
In mammalian cells, a nonconservative homology- 
dependent mechanism seems to predominate. As 
shown in Figure 4, repeated sequences flanking the 
break can recombine with each other. All the DNA 
between these two interacting copies is deleted in the 
process. 
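The deletion that accompanies nonconservative repair between flanking repeats can be illustrated at the sequence level. This is a toy string model, not a description of the mechanism: it assumes perfect direct repeats, uses the copies nearest the break on each side, and the sequences and function name are invented for illustration.

```python
from typing import Optional


def repair_between_repeats(seq: str, repeat: str, break_pos: int) -> Optional[str]:
    """Toy model of nonconservative homology-dependent DSB repair.

    Uses the nearest complete copy of `repeat` on each side of a break at
    `break_pos` (an index into `seq`). The product keeps one copy of the
    repeat and loses the other copy plus everything between them. Returns
    None if no flanking pair of repeats exists.
    """
    left = seq.rfind(repeat, 0, break_pos)  # copy lying entirely left of the break
    right = seq.find(repeat, break_pos)     # copy starting at or after the break
    if left == -1 or right == -1:
        return None
    return seq[:left] + repeat + seq[right + len(repeat):]


seq = "GGG" + "ACGTAC" + "TTTTTTTT" + "ACGTAC" + "CCC"
product = repair_between_repeats(seq, "ACGTAC", break_pos=13)
print(product)                  # GGGACGTACCCC
print(len(seq) - len(product))  # 14: one repeat copy (6 bp) plus the 8 bp between
```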


Genetic Engineering by Recombination 


Genetic recombination is a natural process that plays 
critical roles in DNA metabolism. In the research 
laboratory it is sometimes possible to make use of 
cellular recombination machinery to produce specific 
genetic alterations. For example, a yeast researcher 
may want to replace the normal version of a gene 
with a mutant copy, then examine the effects of the 
mutation on the life of the organism. The mutant version of the gene can be produced and verified using DNA cloning and sequencing techniques. If it is then
introduced into living yeast, some fraction of the cells 
will incorporate it at the homologous chromosomal 
site, using the normal recombination apparatus. 

This type of experiment, called gene targeting, 
works rather well in fungi, but is less efficient in multi- 
cellular organisms. The frequency of homologous 
recombination events between an introduced DNA 
molecule and the corresponding chromosomal target 
can be improved if both the target and the introduced 
DNA are broken. Apparently, the cell sees the DSBs 
as damage that needs to be repaired. Picture a variant 
of the conservative homologous DSB repair illustrated 
in Figure 4, in which the broken chromosome is 
shaded and the white DNA is a linear fragment intro- 
duced into the cells. The non-crossover product 
carries the targeted insertion. As described in the 
preceding section, cells have multiple pathways of 
DSB repair, and in practice both homologous and 
nonhomologous events occur at the broken ends. 
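The replacement outcome of gene targeting can be caricatured as a string operation. In this toy sketch, the introduced fragment carries homology arms matching the chromosome on either side of the target, and the product keeps the fragment's central (mutant) segment in place of the chromosomal one. The arm length, sequences, and function name are hypothetical, and the model ignores the nonhomologous events mentioned above.

```python
from typing import Optional


def targeted_replacement(chromosome: str, fragment: str, arm_len: int) -> Optional[str]:
    """Replace the chromosomal segment lying between two homology arms.

    The first and last `arm_len` bases of `fragment` are treated as homology
    arms. Returns None when either arm is missing from the chromosome, i.e.
    when homologous targeting cannot occur in this model.
    """
    left_arm, right_arm = fragment[:arm_len], fragment[-arm_len:]
    start = chromosome.find(left_arm)
    if start == -1:
        return None
    end = chromosome.find(right_arm, start + arm_len)
    if end == -1:
        return None
    return chromosome[:start] + fragment + chromosome[end + arm_len:]


chromosome = "AAAA" + "GATTACA" + "CCCC" + "TGCATGC" + "AAAA"
fragment = "GATTACA" + "CTCC" + "TGCATGC"  # mutant allele CTCC between the arms
print(targeted_replacement(chromosome, fragment, arm_len=7))
# AAAAGATTACACTCCTGCATGCAAAA
```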

In addition to adding to the arsenal of the experi- 
mental geneticist, gene targeting holds promise for 
human gene therapy. In principle, the disease-causing 
version of a gene could be replaced by the normal 
allele using this same procedure. At present, however, 
the efficiency of gene targeting in human cells is too 
low to make this approach practical. 


Further Reading 

Kucherlapati R and Smith GR (eds) (1988) Genetic Recombination. 
Washington, DC: American Society for Microbiology. 

Low KB (ed.) (1988) The Recombination of Genetic Material. 
San Diego, CA: Academic Press. 

Stahl FW (1979) Genetic Recombination: Thinking about It in Phage 
and Fungi. San Francisco, CA: W.H. Freeman. 

Whitehouse HLK (1982) Genetic Recombination: Understanding 
the Mechanisms. New York: John Wiley. 


See also: Crossing-Over; DNA Recombination; 
Gene Conversion; Gene Therapy, Human; 
Recombination, Models of 
Following the genetic knockout of particular genes
there is often no detectable or ‘scoreable’ change in 
the phenotype of the organism. Such studies have 
alerted biologists to the presence of genes with over- 
lapping or redundant functions. These include, among 
numerous others, the Drosophila genes gooseberry 
and sloppy paired, the mouse tenascin and Hox
genes, and yeast myosin genes. In each of these cases, 
removal of the gene does not result in a quantifiable 
change to the phenotype in the laboratory. 

There are two distinct sets of questions relating to 
genetic redundancy. One set involves determining the 
origins of correlations among gene functions, whereas 
the other relates to the preservation or persistence of 
these correlations through evolutionary time. There 
are two principal theories for the origin of redun- 
dancy, functional shift and genetic duplication. In 
functional shift, two independent genes evolve toward 
some degree of overlap in relation to their current 
functions or a third, novel function. By contrast, 
after a random gene duplication event there are two 
identical genes performing the same function. 

Once a degree of redundancy has emerged, how is it
maintained? This presents a problem because random 
inactivation of one gene from a redundant pair of 
genes is not expected to produce any selectable con- 
sequences. Several mechanisms for preserving redun- 
dancy have been proposed. These include a cumulative 
benefit from gene copy number (dosage effects), 
increased fidelity from overlapping functions (error 
buffering), structural constraints such that genes 
with independent activities and overlapping function 
remain redundant through selection on their independ- 
ent functions (pleiotropy), and convergent functions 
emerging from common structures. 

It can be seen that the problem of functional shift is 
to provide theories for why genes should evolve 
toward correlated function, whereas the problem of 
gene duplication is how to preserve the redundant 
function following the random mutation. Thus the- 
ories for the origin of functional shift are effectively 
equivalent to theories for the persistence of duplicated 
genes. In the discussion that follows, we shall concen- 
trate on the problem of persistence. 


Cumulative Benefit Theories 


When increasing the quantity of a gene product 
increases fitness, genetic redundancy is easy to under- 
stand. Thus, each eukaryotic cell harbors multiple 
copies of the mitochondrial genome, which enables 
cells to metabolize efficiently, and multiple copies of 
tRNA and rRNA genes for efficient translation.
Eliminating copies would reduce net fitness and 
hence redundancy is maintained by stabilizing selec- 
tion acting on the ensemble of identical genes. 


Mutational Error Buffering Theories 


Genetic error buffering is defined as any mechanism 
that reduces mutational load. Consider two identical 
genes with slightly different mutation rates. One 
observes that the gene with the higher mutation rate is eliminated from the population: it becomes a pseudogene. If mutation rates are equal, one of the two genes will eventually become silenced by drift. Only by allowing a slight asymmetry in the two genes' abilities to perform their correlated function might redundancy be preserved. The essential asymmetry is that the less efficient gene also possesses the lower rate of mutation. If the more efficient gene had the lower rate of mutation, there would be no selective reason to maintain redundancy.


Developmental Error Buffering Theories 


Developmental error buffering is defined as any 
mechanism that reduces the deleterious effects of 
nonheritable perturbations of the phenotype during 
ontogeny. An appropriate analogy is that of the dupli- 
cate flight systems employed in aircraft design or an 
external storage device used to back up data from a 
personal computer hard disk. The recurrent risk of 
error during the lifetime of a device can select for 
noise-buffering. Once again, considering two genes, it
can be shown that to preserve a sizeable frequency of 
both genes within the population, we require that 
one gene from the pair must mutate less frequently 
than its duplicate experiences an ontogenetic defect. In 
other words, a gene that acts as a developmental buffer 
must have a high mutation rate and be in support of a 
developmentally unstable gene with a low genetic 
mutation rate. 


Pleiotropic Theories 


Some forms of pleiotropy can also ensure the con- 
servation of redundant function. Recall that pleio- 
tropy refers to cases where a single gene experiences 
selection in more than one context. Consider again 
two genes with two independent functions. Further- 
more, assume that one gene can, in addition to its 
own function, perform the function of the other 
gene, but less efficiently. Redundancy is partial and 
measured in terms of correlated function. When muta- 
tions to the pleiotropic gene can either eliminate its 
unique function or both functions, then redundancy is 
preserved whenever the rate of elimination of the 
pleiotropic function is lower than that of its
unique function. In other words, when one gene’s 
pleiotropic function is more robust than the other 
gene’s unique function, the correlated function can 
be preserved. 


Genetic Regulatory Element Theories 


We should consider not only redundancy among coding regions, but also redundancy among regulatory elements associated with these genes. For example, if
two duplicated genes are each accompanied by dupli- 
cates of subsets of regulatory elements (where each sub- 
set overlaps to some degree through shared elements), 
the redundant genes can be maintained by selection 
acting through their unique patterns of expression and 
the shared regulatory element. The shared element is 
assumed to control the correlated function. If one 
assumes that the shared regulatory element is a smaller 
mutational target than the coding region, this will 
prolong the half-life of the redundant function. It is 
not sufficient, however, to prevent one or more shared 
elements from becoming silenced in the long term. To 
preserve redundancy indefinitely we would require, 
as with the pleiotropic model, some asymmetry in 
mutation and/or efficacy of the regulators. 
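The effect of mutational target size on the half-life of redundancy can be illustrated with a deliberately simple calculation. Assume, for illustration only, that the correlated function is lost as soon as either copy suffers an inactivating mutation, that such mutations arise at a constant per-base rate within each copy's target, and that selection and drift can be ignored. The probability that both copies remain intact after t generations is then exp(-Ut), where U is the summed loss rate, giving a half-life of ln 2 / U. The target sizes and rate below are hypothetical.

```python
import math


def redundancy_half_life(target_sizes_bp, rate_per_bp):
    """Generations until the probability that every copy is intact falls to 1/2.

    Assumes a constant per-base inactivation rate and that losing any copy
    ends the redundancy; selection and drift are ignored.
    """
    total_rate = sum(size * rate_per_bp for size in target_sizes_bp)
    return math.log(2) / total_rate


coding = redundancy_half_life([1000, 1000], rate_per_bp=1e-8)
regulatory = redundancy_half_life([50, 50], rate_per_bp=1e-8)
print(round(regulatory / coding))  # 20: a 20-fold smaller target lasts 20 times longer
```

Under these assumptions the half-life scales inversely with target size, which is the sense in which a small shared regulatory element prolongs redundancy without guaranteeing it indefinitely.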


Summary 


In summary, the problem of redundancy is the problem of its preservation. Evolutionary stability of nontrivial redundancy requires asymmetries in mutation and functional efficiency. However, trivial redundancy such as dosage effects could provide the most parsimonious explanation for the observed data. Assuming that weak selection acting in large populations over long time scales has played an important role in genome evolution, cumulative benefit may appear to be refuted only because experimental assays are not sensitive enough to detect such weak effects.


See also: Gene Regulation; Mutation Load; 
Pleiotropy 
See: Gene Mapping; Gene Therapy, Human;
Genetic Counseling; Pedigree Analysis; Prenatal 
Diagnosis 
A stock center is a repository for strains or varieties or species of organisms for the purpose of preservation and distribution of a wide range of useful organisms. Genetic stock centers collect primarily mutant derivatives of one or more founder strains, along with
related nonmutant strains for long-term preservation 
and distribution to researchers. This contrasts with 
collections of germplasm and type cultures, which 
are more heterogeneous holdings of a wide range of 
species and varieties (see, for example, Comprehensive 
Centers for Microbes, in Berlyn, 2000). A few such 
comprehensive collections are cited in this entry 
because they have incorporated one or more genetic 
stock collections within their aggregate of accessions. 
Many of these general collections in Europe have a 
history predating genetic stock centers, having been 
founded before the rediscovery of Mendel. 


Early Genetic Stock Centers 


In most cases, genetic stock centers originated when 
geneticists working with the species recognized that 
the stocks they were making and using were valuable 
for contemporary and future research in other labora- 
tories and made accommodations for preserving and 
distributing these stocks to colleagues. The recogni- 
tion of the need for the stocks was often accompanied 
by the realization that a means for disseminating new 
scientific results in a rapid, informal way was also 
required for advancing scientific progress with the 
organism. The information function for stock centers 
has made a natural progression from species-specific 
newsletters and published genotypes and linkage 
maps to comprehensive on-line databases (Table 1). 
Many of the genetic stock center databases are part of 
or are linked to genome databases for their species. The 
earliest genetic stock centers include research collec- 
tions of Drosophila and maize in the 1920s, the Jackson 
Laboratory mouse collection in 1929, and, for 
microbes, the Fungal Genetics Stock Center in 1960 
and the E. coli Genetic Stock Center (CGSC) in 1970. 

Contributions of stocks and strains to the stock 
centers have been for the most part from the academic 
research community. Support for public stock center 
operations in the US has primarily come from the 
National Science Foundation (NSF) Living Stock Collections Program (http://www.nsf.gov/pubs/1997/nsf9780/nsf9780.htm), the US Department of
Agriculture (http://www.ars-grin.gov), the National 
Institutes of Health (NIH) National Center for 
Research Resources (http://www.ncrr.nih.gov), and, 
for some centers, in part from industry. 


Drosophila 


T. H. Morgan and his students at Columbia, as part of 
their studies of Drosophila melanogaster mutants that 
began in 1913, preserved their stocks in a collection 
maintained by C. Bridges. These stocks were provided 
to anyone requesting them. In 1928, Morgan, Bridges, 
A.H. Sturtevant, and the collection moved to the
California Institute of Technology. A stock list was 
first published in the Drosophila Information Service 
in 1934 and it consisted of 572 stocks. E.B. Lewis 
directed the stock center from 1948 until his retire- 
ment, with an approximate tripling of the number of 
stocks. In 1987, it moved to Indiana University, under the direction of T.C. Kaufman and K.A. Matthews. It grew from approximately 4500 stocks in 1995 to more than 7700 stocks in the year 2000, through the merger in 1997 of some of the stocks from the Drosophila Mid-America Center (Drosophila Species Collection) at Bowling Green, OH, and subsequent acquisitions. It is
supported by the NSF and NIH. History, information 
about stocks, mutations, and nomenclature, and pro- 
cedures for ordering and culturing the stocks, are 
given on the web site for the Bloomington Drosophila 
Stock Collection, http://flystocks.bio.indiana.edu. 
Links to the stock information as well as many other kinds of molecular, genetic, morphological, and mapping information on Drosophila are found in the comprehensive Drosophila database, FlyBase, at http://fly.ebi.ac.uk:7081 or http://flybase.bio.indiana.edu or http://www.grs.nig.ac.jp:7081.

Collaborative European Drosophila stock centers in Umea, Sweden, and Szeged, Hungary, have been supported by the European Union. The P Insertion Mutant Stock Centre in Szeged has recessive lethal P insertion mutants on chromosomes 2 and 3 and a collection of mobile element insertions causing altered-expression phenotypes, the EP element lines (http://www.bio.u-szeged.hu/genetika/stock). The European Drosophila Stock Center in Umea included general stocks, maternal-effect lethals, zygotic lethals, non-melanogaster species, and wild-type D. melanogaster. It closed at the end of February 2001, with stocks to be transferred to a new center in Kyoto (http://www.grs.nig.ac.jp:7081/.data/doc/refman/refmanB.html).

The National Institute of Genetics in Mishima, 
Japan, also maintains about 700 mutant stocks of 
D. melanogaster and 400 stocks of several other Drosophila species in various locations and distributes them upon the request of researchers (http://www.shigen.nig.ac.jp/fly/nighayashi.html). They also maintain a data depository of information and documents for the Japanese-speaking community, http://jfly.nibb.ac.jp.

There are also regional stock centers: the Drosophila Stock Center, Mexico (http://hp.fciencias.unam.mx/Drosophila/LOSHTML/portada.html), the Moscow Regional Drosophila melanogaster Stock Center, and the Indian Drosophila Stock Centre at Devi Ahilya University (see addresses at http://flystocks.bio.indiana.edu/other-centers.html).
Stocks from the Drosophila Species Stock Center at Bowling Green that were not incorporated into the Bloomington Stock Center have been moved to the University of Arizona (http://stockcenter.arl.arizona.edu).

Table 1 Type, organism, and address (dated March 2001)

Microbial
  Bacillus subtilis: http://bacillus.biosci.ohio-state.edu
  Chlamydomonas: http://www.biology.duke.edu/chlamy
  Escherichia coli: http://cgsc.biology.yale.edu; http://shigen.lab.nig.ac.jp/ecoli/strain; http://www.shigen.nig.ac.jp/cvector/cvector.html
  Filamentous fungi: http://www.fgsc.net; http://www.hgmp.mrc.ac.uk/research/fgsc/intro.html
  Pseudomonas: http://www.pseudomonas.med.ecu.edu
  Salmonella: http://www.acs.ucalgary.ca/~kesander
  Yeast: http://phage.atcc.org/searchengine/ygsc.html or http://www.atcc.org (Berkeley); http://panizzi.shef.ac.uk/msdn/peter (Peterhof); http://www.ifrn.bbsrc.ac.uk/NCYC (UK); see also http://genome-www.stanford.edu/Saccharomyces
  Miscellaneous (having subcollections of Agrobacterium, Escherichia coli, yeast, etc.): http://www.cabi.org/; http://www.belspo.be/-bccm; http://www.dsmz.de; http://www.pasteur.fr/applications/CIP; http://www.ukncc.co.uk/; http://www.jcm.riken.go.jp/JCM/aboutJCM.html; http://wdcm.nig.ac.jp; http://www.atcc.org; http://mgd.nacse.org/ocid/prospect3.html
Plant
  Arabidopsis: http://aims.cps.msu.edu/aims; http://nasc.nott.ac.uk/home.html; see also http://www.arabidopsis.org
  Barley: http://www.ars-grin.gov/ars/PacWest/Aberdeen/hang.html
  Maize: http://w3.ag.uiuc.edu/maize-coop/mgc-info.html; see also http://www.agron.missouri.edu/
  Pea: http://www.ars-grin.gov/ars/PacWest/Pullman/GenStock/pea/MyHome.html
  Rice: http://shigen.lab.nig.ac.jp/rice/oryzabase; see also http://ars-genome.cornell.edu
  Tomato: http://tgrc.ucdavis.edu
  Wheat: http://www.ars-grin.gov/ars/PacWest/Aberdeen/hang.html
Animal
  Axolotl: http://www.indiana.edu/~axolotl
  Chicken: http://danr013.ucdavis.edu/publications/indexa.htm
  Drosophila: http://flystocks.bio.indiana.edu (Bloomington); http://www.bio.u-szeged.hu/stock (Szeged); http://www.grs.nig.ac.jp:7081/.data/doc/refman/refmanB.html (see Section B.11.2.2.); http://www.shigen.nig.ac.jp/fly/nighayashi.html (Japan); http://flystocks.bio.indiana.edu/other-centers.html (Moscow and India); http://stockcenter.arl.arizona.edu; http://hp.fciencias.unam.mx/Drosophila/LOSHTML/portada.html (Mexico); see also http://jfly.nibb.ac.jp, http://fly.ebi.ac.uk:7081 or http://flybase.bio.indiana.edu, or http://www.grs.nig.ac.jp:7081
  Mouse: http://www.jax.org and http://jaxmice.jax.org (Jackson Lab); http://imsr.har.mrc.ac.uk (MRC); http://lsd.ornl.gov/htmouse; http://www.nih.gov/science/models/mouse/resources/ornl.html (Oak Ridge); http://stkctr.biol.sc.edu (Peromyscus); see also http://www.informatics.jax.org
  Caenorhabditis elegans: http://biosci.umn.edu/CGC
  Zebrafish: http://zfin.org/zf_info/stckctr/stckctr.html; see also http://zfin.org/index.html


Maize


At the 1928 Winter Science meetings in New York, a group of maize geneticists discussed organized work on the maize linkage maps, and the idea of a Maize Genetics Cooperation originated in that discussion.
It was formalized in 1932 at the 6th International 
Genetics Congress. It included provision for the 
Maize Genetics Cooperation Newsletter and the Maize 
Genetics Cooperation Stock Center. M. M. Rhoades 
served as first secretary of the Newsletter and first 
director of the stock center. The responsibility for 
these activities rotated among several prominent 
maize geneticists during 1936-1952, while the Stock 
Center was located at Cornell University, and then 
again after 1953, when the collection moved to the 
University of Illinois, Urbana. It was supported by 
grants from the NSF (1953-1981) and then by the 
USDA Agricultural Research Service and Plant 
Genetic Resources Program. Currently, the Maize 
Cooperation Stock Center is at the University of Illi- 
nois under the direction of M.M. Sachs, USDA/ARS 
and University of Illinois, and the Maize Genetics 
Cooperation Newsletter secretary is E.H. Coe of 
the USDA/ARS and the University of Missouri and 
current co-secretaries are M. Polacco and J. Birchler,
also at the University of Missouri. The collection 
includes nearly 80 000 pedigreed samples, including 
alleles of several hundred genes, combinations of such 
alleles, chromosome aberrations, ploidy variants, and 
other variations. Details about the collection, its his- 
tory, available stocks, and request forms can be found 
at http://w3.ag.uiuc.edu/maize-coop/mgc-info.html. 
The Stock Center database is an integral part of the
Maize Genome Database (MaizeDB), which links stock
center data to MaizeDB information on alleles, genes,
molecular markers, maps, probes, and so on. The
newsletter is also found at this site
(http://www.agron.missouri.edu/munil).


Mouse 


The Jackson Laboratory is a nonprofit, independent 
research institution which was founded in 1929 by 
C. C. Little to conduct basic genetic and biomedical 
research and to provide training and genetic resources 
to the scientific community. It has since that time sup- 
plied inbred and mutant strains of mice to the research 
community. Its current resources include over 2500
strains of genetically defined mice, maintained as live
stocks and frozen embryos, as well as a transgenic
mouse resource and DNA resources (http://www.jax.org
and http://www.jaxmice.jax.org). Mouse Genome
Informatics is served from http://www.informatics.jax.org.

The MRC Mammalian Genetics Unit, Harwell,
UK (http://www.mgu.har.mrc.ac.uk), maintains a
Frozen Embryo and Sperm Archive of almost 1000
stocks, as well as live mouse stocks of 200 mutant,
chromosomal anomaly, and inbred lines, available on
request. The Archive provides free cryopreservation and
storage to researchers, with charges for withdrawals
from the Embryo Bank.

The Oak Ridge National Laboratory Mutant Mouse
Collection has several hundred mouse stocks,
propagating mutations induced by radiation or chem-
ical mutagenesis, plus several standard inbred
strains. Stocks include live mice, frozen embryos or
sperm, and frozen tissues (http://www.nih.gov/
science/models/mouse/resources/ornl.html and
http://lsd.ornl.gov/htmouse).

The Peromyscus (deer mouse) Genetic Stock Center 
originated in 1985 at the University of South Carolina, 
under the direction of W.D. Dawson, with support 
from the NSF, and in 1998 contained 35 mutant lines, 
stocks of wild-type animals of 7 species, and 2 inbred 
lines. Planning for a comprehensive database, 
PeroBase, began in 1997. The Stock Center also has re- 
ceived support from the NIH to develop strains to serve 
as animal models for disease (http://stkctr.biol.sc.edu).


The Fungal Genetic Stock Center 


The prominent role of Neurospora and other
ascomycetes in the study of the genetics of nutritional,
biochemical mutants in the 1940s resulted in the isolation
of many important strains for research in biochemical
and molecular genetics. The Fungal Genetic Stock 
Center (FGSC) was organized as a result of recom- 
mendations by the Genetics Society of America in 
1960 and has been funded continuously by the NSF. 
It was originally located at Dartmouth College, 
directed by R. Barratt, then moved to California 
State University at Humboldt and, in 1985, to the 
University of Kansas Medical Center, directed by 
J.A. Kinsey and K. McCluskey. Its holdings include 
nearly 9000 strains of filamentous fungi, mostly 
genetic derivatives of Neurospora crassa and Aspergillus 
nidulans, but also strains of Aspergillus niger, Neuro- 
spora tetrasperma, and isolates of other Neurospora 
and Aspergillus species. The collection also contains 
Fusarium species and mutants, Nectria, and Sordaria 
mutants and species. The stock center publishes the 
Fungal Genetics Newsletter, originally a mailed 
publication, now on-line, as well as meeting abstracts 
and announcements, and a bibliography available on its 
web site. The website includes information on genes, 
alleles, and maps, and on plasmids, clones, and gene 
libraries for N. crassa and A. nidulans that the center 
supplies (http://www.fgsc.net and
http://www.hgmp.mrc.ac.uk/research/fgsc/intro.html).


Escherichia coli 


Sexuality and the ability to make genetic crosses 
between mutant strains of bacteria were discovered 
in E. coli only in the 1940s. Individual laboratories
studying biochemical and molecular genetics then accu- 
mulated large numbers of E. coli mutants. It soon 
became apparent that a national repository would 
greatly aid the free exchange of strains and the advance 
of molecular genetics. In the US, the NSF supported a 
proposal to begin with E.A. Adelberg’s Yale University 
collection of stocks and add important strains and sets 
of strains from laboratories worldwide. This became 
the E. coli Genetic Stock Center (CGSC) at Yale, curated
and directed for 25 years by B.J. Bachmann until her
retirement in 1993, when she was succeeded by M. Berlyn, and
supported continuously since 1971 by the NSF. The col- 
lection holds over 7800 strains and a plasmid library en- 
compassing cloned segments of nearly all of the E. coli 
genome. Unlike other early stock centers, it did not 
establish a newsletter as an integral part of its activ- 
ities, but it soon assumed the functions of registering 
gene names and allele numbers, as set forth in the 
widely accepted guidelines for bacterial nomenclature 
by Demerec et al. (1966), and for registry of designa- 
tions for deletions, insertions, and F’ plasmids. It also 
took on responsibility for periodic publishing of the 
linkage map for E. coli. These information functions 
provided a natural progression to the development 
of an online database, established in 1989 and also 
supported by NSF, covering gene names, functions, 
map locations, strain genotypes, mutation infor- 
mation, and supporting documentation (http:// 
cgsc.biology.yale.edu). 

In Japan, the National Institute of Genetics in 
Mishima established a Genetic Stock Center in 1976 
which has a collection of about 4000 genetic deriva- 
tives of E. coli and 400 cloning vectors. Its reorgan- 
ization in 1997 created the Genetic Strains Research 
Center, the Microbial Genetics Center, and the Center 
for Genetic Research Information
(http://shigen.lab.nig.ac.jp/ecoli/strain and
http://www.shigen.nig.ac.jp/cvector/cvector.html).

In Europe, many of the broader collections, such 
as those cited at the end of the next section, carry large 
numbers of E. coli genetic stocks. Phabagen in particu- 
lar was an early collection of E. coli strains and bac- 
teriophage, which has broadened its range of bacteria 
and also merged with other collections (see BCCM in 
Table 1). The American Type Culture Collection 
(ATCC) in the US also has many E. coli strains and 
is cited in the next section (“Yeast”). 


Other Bacteria, Yeast, and 
Chlamydomonas 


Salmonella 
The Salmonella Genetic Stock Center (SGSC) at the 
University of Calgary, Alberta, Canada, originated in
the laboratory of M. Demerec at Cold Spring Harbor
Laboratory and Brookhaven National Laboratory,
Long Island, NY, in the 1950s and 1960s, as derivatives
primarily of Salmonella typhimurium (also known as
Salmonella enterica subspecies enterica serovar
Typhimurium)
strain LT2. After Demerec’s death, the collection was 
moved and expanded at the University of Calgary 
by K. Sanderson. It currently has several thousand 
strains, cosmid and phage libraries, and a set of cloned 
genes. Many of the mutant strains are organized 
into special-purpose kits, useful for specific genetic 
techniques or analyses. In addition to the mutants, 
it has the Salmonella Reference Collection (SARC) 
representing all subgenera of Salmonella. The Center 
is supported by the Natural Sciences and En- 
gineering Research Council of Canada (http:// 
www.acs.ucalgary.ca/~kesander). 


Agrobacterium, Escherichia coli, and 
Bacteriophages 

Phabagen, the Phage and Bacterial Genetics Collec- 
tion, includes 3500 mutant bacterial strains, 450 
cloning vectors, 800 other plasmids, 2 plasmid- 
containing gene banks of E. coli, and over 100 phages. 
It was established in the early 1960s with deposits of 
bacterial mutants from researchers of the Working 
Community Phabagen. Since 1990 it has been part of 
the Centraal Bureau voor Schimmelcultures (CBS), 
and has merged with the Laboratory for Microbiology 
at Delft (LMD) Collections to become the National 
Culture Collection of Bacteria of the Netherlands, 
which includes mutant derivatives of E. coli K-12 
and B and also mutants of Agrobacterium tumefaciens, 
wild-type and reference strains of other bacteria, 
plasmids, and phages (http://www.cbs.knaw.nl/nb).


Bacillus subtilis 

Bacillus subtilis is a spore-forming bacterium that has
been used particularly to study sporulation in
prokaryotes. The Bacillus Genetic Stock Center at Ohio
State University was established in 1978. It is supported
by the NSF under the direction and management of
D.H. Dean and D.R. Ziegler. The collection includes 
1000 genetically characterized B. subtilis strains and 
300 strains of other Bacillus species, as well as a bac- 
terial artificial chromosome (BAC) library, cloned 
DNA, and shuttle plasmids in E. coli strains. The 
Center publishes a newsletter and a genetic map for 
B. subtilis (http://bacillus.biosci.ohio-state.edu).


Pseudomonas aeruginosa 

The Pseudomonas Genetic Stock Center is a collection 
of genetic derivatives of the prototrophic
Pseudomonas aeruginosa strain PAO1. The collection was
originally created at Monash University in Australia by
B. Holloway and is currently located at the Brody
School of Medicine, East Carolina University (ECU), 
Greenville, NC, under P.V. Phibbs. The Center main- 
tains and distributes, in addition to these strains, gen- 
eralized transducing phages for P. aeruginosa, some 
P. putida strains from the J. Sokatch and R. Gunsalus 
laboratories, and the Holloway cosmid library. It is 
supported by the Department of Microbiology and 
Immunology of the Brody School of Medicine at 
ECU (http://www.pseudomonas.med.ecu.edu). 


Yeast 

The Yeast Genetic Stock Center originated at the
University of California at Berkeley in 1960, founded and
administered by R.K. Mortimer. It included 1200 
strains of Saccharomyces cerevisiae, primarily deriva- 
tives of the stocks of C.C. Lindegren at Southern 
Illinois University. Professor Mortimer and the stock 
center annually published updated linkage summaries 
and linkage maps. After his retirement, the collection 
moved, in 1998, to the ATCC, where it is maintained 
as a separate collection (http://phage.atcc.org/search- 
engine/ygsc.html). The ATCC also has a number of 
the mutant lines of Schizosaccharomyces pombe. In 
addition it will serve as a repository for a complete 
set of deletion strains (http://www.atcc.org). Maps 
and sequence are now presented on-line as part of 
the Saccharomyces Genome Database (http://genome 
-www.stanford.edu/Saccharomyces). 

The Peterhof Genetic Collection of Yeasts (PGC) 
is part of the Biotechnology Center at St. Petersburg 
State University in Russia. It has over 1000 genetically 
marked yeast strains, with an origin distinct from the 
Carbondale/Berkeley collection. The Peterhof lines 
originated from a diploid cell of an inbred strain of 
Saccharomyces cerevisiae. The Collection includes 
mutants derived from this line, other genetically 
marked yeast strains, and segregants of crosses between
the Peterhof-derived and other strains
(panizzi.shef.ac.uk/msdn/peter).

The National Collection of Yeast Cultures 
(NCYC), Institute of Food Research, in Norwich, 
UK, includes brewing yeast strains, genetically 
defined strains of Saccharomyces cerevisiae and Schizo- 
saccharomyces pombe, and general yeast strains, 
totalling over 2700 nonpathogenic yeasts. In addition 
to supplying cultures, it is a patent and safe repository 
and it performs yeast identification services. A
searchable database for the NCYC is found at
http://www.ifrn.bbsrc.ac.uk/ncyc.


Chlamydomonas 

The Chlamydomonas Genetics Center (CGC) at Duke 
University was founded in 1984. It collects, describes, 
and distributes nuclear and cytoplasmic mutant 
strains, and genomic and cDNA clones of Chlamydomonas
reinhardtii. The web and gopher sites provide,
in addition, information on genetic and molecular 
maps of Chlamydomonas, plasmids, sequences, and 
bibliographic citations
(http://www.biology.duke.edu/chlamy).


Large Diverse Collections that Include 
Microbial Genetic Stocks 
A number of the large national collections of micro- 
organisms include genetic stocks. These are described 
in more detail in the Stock Centers entry in the En- 
cyclopedia of Microbiology (Berlyn, 2000). 

For example: 


1. The International Mycological Institute (IMI) for
Culture Collections was founded in 1920 as
an organization supported by 32 governments and
has over 16 500 strains of filamentous fungi, yeasts, 
and bacteria. It is part of Commonwealth Agricul- 
tural Bureaux (CAB) International, a nonprofit 
intergovernmental organization (http://www.cabi. 
bioscience.grc.htm). 

2. The Belgian Coordinated Collections of Micro- 
organisms (BCCM), a consortium of four research- 
based collections, include 50 000 documented 
strains of bacteria, filamentous fungi, and yeasts 
and over 1500 plasmids, supported by the Belgian 
Federal Office for Scientific, Technical, and Cul- 
tural Affairs. They provide patent and safe-deposit 
services, as well as fingerprinting/biotyping and 
identification services, contract research, and
training (http://www.belspo.be/bccm).

3. The Deutsche Sammlung von Mikroorganismen 
und Zellkulturen (DSMZ) is the national culture 
collection in Germany, founded in 1969 and sup- 
ported by the Federal Ministry of Research and 
Technology and the State Ministries. It includes 
genetic stocks of bacteria, filamentous fungi, and 
yeast. It also has plant and animal cell cultures. In 
addition to supplying scientists and institutions 
with its cultures, it acts as a patent and safe repos- 
itory (http://www.dsmz.de). 

4. The Collection of the Institut Pasteur (CIP) traces 
its origin to Dr. Binot’s collection of microbial 
strains in 1891. It now includes genetic stocks of 
E. coli (http://www.pasteur.fr/applications/CIP). 

5. The NCYC has been cited for its yeast collections. 
The National Collection of Type Cultures in 
London (NCTC) is another of the UK National 
Culture Collections (http://www.ukncc.co.uk) and 
it has genetic derivatives as well as natural isolates 
and pathogenic strains of E. coli in its large collec- 
tion, which emphasizes pathogenic bacteria and 
mycoplasmas. It is a patent and safe depository 
and, jointly with the DSMZ, is the resource centre
for plasmid-bearing bacteria for Europe. It is sup-
ported by the UK Public Health Laboratory Ser-
vice and is part of the Central Public Health
Laboratory, Colindale
(http://www.phls.co.uk/services/nctc/index.htm).

6. The Japan Collection of Microorganisms (JCM), in 
the Institute of Physical and Chemical Research 
(RIKEN), has over 6000 strains of bacteria,
filamentous fungi, yeast, and archaea
(http://www.jcm.riken.go.jp/JCM/aboutJCM.html and
http://wdcm.nig.ac.jp).

7. The Cloning Vector Collection at the National 
Institute of Genetics provides vectors of E. coli 
as purified DNA
(gillnet.lab.nig.ac.jp/~cvector/NIG_cvector/aboute.html).

8. The ATCC, in the US, has already been mentioned 
for its Yeast Genetic Stock Collection; it also has 
genetic stocks of E. coli and other bacteria and is a 
patent and safe depository (http://www.atcc.org). 

9. The Microbial Germplasm Database (MGD), at 
Oregon State University, is not a physical collec- 
tion, but a database that contains information on 
collections maintained for research purposes in 
laboratories of universities, industry, and govern- 
ment and on NSF-supported collections, including 
contact information for researchers holding these 
collections. The MGD provides a newsletter and 
maintains a website where queries can be made
(http://mgd.nacse.org/ocid/prospect3.html).


Arabidopsis and Crop Plant Genetic 
Stock Centers 


Arabidopsis 

Research using Arabidopsis thaliana as a model organ- 
ism for flowering plants increased dramatically from 
the mid-1980s through the 1990s. Resource centers 
that included genetic stocks, genomic libraries, and 
cloned DNA were established in response to recom- 
mendations from a series of workshops sponsored by 
NSF and culminating in a long-range plan for a
multinational, coordinated Arabidopsis genome project
presented in 1990. The Arabidopsis Biological Resource
Center (ABRC) at Ohio State University includes 
seeds, restriction fragment length polymorphism 
(RFLP) markers, and yeast artificial chromosome 
(YAC) libraries. The seed collection and distribution 
activities in Europe are performed by the Arabidopsis 
Centre at Nottingham, in England, and there is a clone 
center in Germany. The centers manage the same 
(mirrored) collection of seed stocks and collaborate 
and coordinate their efforts to meet the needs of the 
world Arabidopsis research community. The US
center, directed by R. Scholl, receives funding from the
NSF, and the UK center, directed originally by
M. Anderson and then S. May, is funded by the Bio- 
technology and Biological Sciences Research Council 
and the European Union, in addition to user and local 
institutional support (http://aims.cse.msu.edu/aims 
and http://nasc.nott.ac.uk/home.html). 


Genetic Stocks within Plant Germplasm 
Collections 

The largest and best-known crop plant collections are 
primarily germplasm repositories for cultivars, land- 
races, and plant breeding stocks, rather than genetic 
derivatives of specific stocks. For example, the ex situ 
conservation efforts administered by the USDA are 
centered at the National Seed Storage Laboratory at 
Fort Collins, CO, with a base collection that includes 
over 232 000 accessions of nearly 400 genera and over 
1800 species. It preserves valuable germplasm for the 
US and, by agreement with the International Board 
for Plant Genetic Resources (http://www.ipgri.cgiar. 
org), for the global network of genetic resources cen- 
ters called the Consultative Group for International 
Agricultural Research (CGIAR) (http://www.sgrp. 
cgiar.org) and provides these seeds to researchers 
worldwide. In addition, the USDA National Genetic 
Resources Collections include a number of collections 
of genetically defined mutant strains. Besides the 
USDA/ARS Maize Cooperation Stock Center pre- 
viously described, there are genetic stock collections 
for tomato, wheat, barley, and pea. Rice genetic stocks 
are available through the international Rice Genetic 
Cooperative. 


Tomato 

The tomato collection was started by C.M. Rick in
the Department of Vegetable Crops, the University of
California at Davis, with collections he made of wild
species and mutant marker and cytogenetic stocks
created in the laboratory. Others then contributed
both germplasm and mutant stocks. It has c. 3000
accessions. The C.M. Rick Tomato Genetic Resource
Center has become part of the USDA National Plant 
Germplasm System, the NPGS (http://www.ars- 
grin.gov/npgs) and is supported by them, by the Uni- 
versity of California, and by industry-sponsored 
endowments and grants. Seeds are stored in Davis 
and also, for long-term storage and backup, at the 
National Seed Storage Laboratory (NSSL) in Fort 
Collins and are provided to researchers. The annual 
Tomato Genetics Cooperative Report includes a list of 
stocks, which is also available through the website. 
History, query capability, gene and allele descriptions, 
and links to related sites are provided at the web site, 
http://tgrc.ucdavis.edu. 


Wheat 

The E.R. Sears Wheat Genetic Stock Collection 
originated with the cytogenetic and breeding
work of E.R. Sears at the University of Missouri in 
Columbia and includes aneuploids of Chinese Spring 
wheat — monosomic, trisomic, tetrasomic, nullisomic, 
and more complicated variations — as well as addition, 
subtraction, and translocation lines. There are 334 
accessions of Triticum aestivum subsp. aestivum and 
a total of c. 600 accessions from the Columbia 
collection. Data are available from the GRIN system 
(http://www.ars-grin.gov/ars/PacWest/Aberdeen/hang.html).


Rice 
The international Rice Genetic Cooperative (RGC) 
was founded in 1985 for the purposes of maintaining 
genetic stocks, enhancing rice genetics, publishing a 
Rice Genetics Newsletter that includes gene symbol 
coordination and linkage map information, and hold- 
ing periodic symposia. A Japanese committee con- 
structed a network of the genetic stock centers that 
were located at universities and research stations in 
Japan, and the information is available through the 
National Institute of Genetics. Like the Maize Genetic 
Cooperation stocks, rice stocks include mutant lines, 
polyploids, trisomics, translocation lines, landraces, 
varieties, and wild species (http://shigen.lab.nig.ac. 
jp/rice/oryzabase/Strain.html). The Oryzabase web 
site includes information about the stock centers, 
strains, alleles, linkage maps, genes, and other informa- 
tion. Oryzabase was established in 2000 to bring 
together information ranging from classical genetics 
to genomics and basic descriptions of rice biology 
(http://shigen.lab.nig.ac.jp/rice/oryzabase). 
RiceGenes at Cornell University (http://ars- 
genome.cornell.edu/rice) is also a database of the rice 
molecular marker map and genomic information, 
particularly quantitative trait loci. Its sister databases, 
SolGenes for the Solanaceae (including a periodic 
downloading from the Tomato Stock Center website, 
see above) and GrainGenes, for wheat and relatives, as 
well as other crop and animal genome databases, can 
be reached from the USDA-ARS Center for Bio- 
informatics and Comparative Genomics, http://ars- 
genome.cornell.edu. 


Barley 

The Barley Genetic Stock Center, previously housed at
Colorado State University and the NSSL, moved in
1993 to the USDA-ARS National Small Grains
Germplasm Research Facility in Aberdeen, Idaho. It
includes over 2500 accessions of Hordeum vulgare
subsp. vulgare. The database of information on the
collection is part of the GRIN system. The collection
includes aneuploids (primary trisomics) and desynaptic
mutants
(http://www.ars-grin.gov/ars/PacWest/Aberdeen/hang.html).


Pea 

G. Marx, at Cornell University, collected pea germ- 
plasm and mutants, and upon his retirement the G.A. 
Marx Collection became part of the NPGS, and the 
collection was moved to Washington State University, 
with accessions numbering c. 3000. It includes muta- 
tions affecting foliage, flowers, seeds, pods, product- 
ivity, and photoperiodism, and a special subset 
tagged ‘Mendel’s Genes’ (http://www.ars-grin.gov/
ars/PacWest/Pullman/GenStock/pea/MyHome.html). 


Caenorhabditis, Zebrafish, and Other 
Animal Stock Centers 


Nematode Caenorhabditis elegans 

Use of this model organism for the genetics of devel- 
opment, behavior, and neurobiology began in S. Bren- 
ner’s laboratory in the early 1970s and encompasses 
mutant isolation and analysis, documentation of cell 
lineage and development, and the complete genome 
sequence (see, for example, Cell Division in
Caenorhabditis elegans). The Caenorhabditis Genetic
Center (CGC) at the University of Minnesota keeps
approximately 3500 genetic stocks of C. elegans and a
database linked to the C. elegans genomic database
(http://biosci.umn.edu/CGC).


Zebrafish Resource Center 

The zebrafish, a more recently developed model
system for the study of vertebrate development and
genetics, has a repository for strains at the University
of Oregon, supported by funds from the NIH and the
state of Oregon.

The International Resource Center for Zebrafish 
preserves sperm samples, embryos, and live stocks 
of zebrafish wild-type and mutant stocks submitted 
by researchers and available for distribution to the 
research community, maintains the genetic map and 
information on genetic markers, publishes
information on methods for the maintenance and use of
zebrafish in research, and studies disease and health of
zebrafish strains. The Center maintains the ZFIN, 
the Zebrafish Information Network database for 
disseminating information on genetics, genomics, 
and development of the organism and community 
information. The database project was founded in 
1994, with initial support from the NSF and the 
Keck Foundation and current support from the NIH 
(http://zfin.org/zf_info/stckctr/stckctr.html and
http://zfin.org/index.html).


Domestic Chicken Genetic Stocks

An Avian Genetic Stock Collection for mutants of the 
domestic chicken at the University of California, 
Davis, was funded by NSF in 1997 for preservation 
of existing stocks and planning for future long-term 
preservation of the collection (http://danr013.ucdavis. 
edu/publications/indexa.htm). 


The Axolotl Colony 

The Axolotl Colony, a colony of the Mexican axolotl 
(Ambystoma mexicanum), was founded at Indiana
University in 1957 by R.R. Humphrey and has been 
supported since 1957 by the NSF. It serves as a genetic 
stock center with mutant lines that affect coloration, 
organs, limbs, development, and isozymic variation. It 
has approximately 80 000 axolotls. Embryos, larvae, 
and adults are sent to research scientists and to class- 
rooms. Information on axolotls and methods of care 
and a newsletter, as well as mutant descriptions, are 
found on their web site, http://www.indiana.edu/~axolotl.


Further Reading 

Knutson L and Stoner AK (1998) Biotic Diversity and Germplasm
Preservation, Beltsville Symposia in Agricultural Research. 
Boston, MA: Kluwer. 

Letovsky SI (1999) Bioinformatics: Databases and Systems. 
Boston, MA: Kluwer. 

World Federation of Culture Collections (WFCC) publications:
http://wdcm.nig.ac.jp/wfcc/publications.html.


Genetic translation refers to the process whereby
messenger RNA (mRNA) serves as a template for 
ribosome-mediated protein synthesis. The process of 
translation occurs in the cytoplasm of a cell and can be 
divided into three distinct phases: translation initi- 
ation, polypeptide chain elongation, and chain termin- 
ation. For translation initiation, the small ribosomal 
subunit must bind to the mRNA to form, along with 
initiation factors, an initiation complex. The subse- 
quent formation of a polypeptide chain starts with a 
methionine, which is donated by a unique initiator
transfer RNA (Met-tRNAi). A different Met-tRNA
functions in chain elongation. Once the initiation
process is completed, the initiation factors are released
from the initiation complex and the large ribosomal 
subunit binds. Additional amino acids are then added 
to the growing polypeptide chain, in a stepwise 
manner, where the choice of amino acid is determined 
by consecutive triplets (codons) along the mRNA. 
Chain elongation is terminated when one of the 
three translational stop codons, UAA, UGA, or 
UAG is encountered. 
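The stepwise reading of codons during elongation, and its termination at UAA, UAG, or UGA, can be mimicked with simple string processing. The sketch below is illustrative only: it assumes the standard genetic code, assumes initiation has already fixed the reading frame at index `start`, and uses a hypothetical function name, `translate_from`. Following the text, the first residue is always methionine supplied by the initiator tRNA, whatever the start codon.

```python
BASES = "UCAG"
# Standard genetic code, codons ordered UUU, UUC, UUA, UUG, UCU, ... GGG;
# '*' marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
STOP_CODONS = {"UAA", "UAG", "UGA"}


def translate_from(mrna: str, start: int) -> str:
    """Hypothetical helper: translate from the start codon at `start` to the first stop codon."""
    protein = ["M"]  # the initiator Met-tRNA supplies methionine at the start codon
    # Step along the mRNA one codon (three bases) at a time.
    for pos in range(start + 3, len(mrna) - 2, 3):
        codon = mrna[pos:pos + 3]
        if codon in STOP_CODONS:  # termination: release the finished chain
            return "".join(protein)
        protein.append(CODON_TABLE[codon])  # elongation: add the encoded amino acid
    return "".join(protein)  # ran off the end of the message without a stop codon
```

For example, `translate_from("GGAUGGCUUGGUAAGC", 2)` reads AUG, GCU, UGG and stops at UAA, giving the tripeptide "MAW".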

Many aspects of translation are common to both
prokaryotic and eukaryotic organisms, for example the
existence of two ribosomal subunits with similar overall
structure and the similar biochemical steps involved in
peptide bond formation.
However, there are significant differences in the struc- 
ture of the mRNAs that prokaryotic and eukaryotic 
organisms produce, requiring a different process for 
translation initiation. Bacterial mRNAs are typically 
polycistronic, which means that more than one gene is 
contained on a single mRNA. Since these genes are 
often functionally related or part of a common bio- 
synthetic or degradative pathway, this organizational 
arrangement has the advantage of allowing coordinate 
expression of these genes. Translation initiation of 
bacterial messages is dependent on two elements: an 
initiator codon and a purine-rich sequence that must 
be located approximately 10 bases upstream from the 
initiator codon. The most common initiator codon is
AUG, but others such as GUG, AUU, or UUG are also
used. The purine-rich sequence, also referred to
as the ‘Shine-Dalgarno’ sequence, is complementary to
the 3’ end of the 16S ribosomal RNA and is found
not only upstream of the initiator codon but also in
intercistronic regions or, in some cases, within the 3’
end of the preceding gene. Abolishing this sequence,
either by mutation or deletion, severely reduces or
abolishes initiation of translation at the downstream
initiator codon.
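A toy version of bacterial start-site selection combines the two elements named above: an initiator codon and an upstream purine-rich sequence. This sketch assumes the commonly cited Shine-Dalgarno core AGGAGG (complementary to CCUCCU near the 3’ end of 16S rRNA); its search window and match threshold are arbitrary illustrative choices, not measured parameters, and the function names are hypothetical. Real ribosome-binding sites vary far more than this scoring allows.

```python
SD_CONSENSUS = "AGGAGG"  # assumed core; pairs with CCUCCU near the 3' end of 16S rRNA
START_CODONS = ("AUG", "GUG", "UUG")


def sd_score(mrna: str, start: int, window: int = 15) -> int:
    """Best count of bases matching SD_CONSENSUS within `window` bases upstream of `start`."""
    region = mrna[max(0, start - window):start]
    best = 0
    for i in range(len(region) - len(SD_CONSENSUS) + 1):
        matches = sum(a == b for a, b in zip(region[i:i + len(SD_CONSENSUS)], SD_CONSENSUS))
        best = max(best, matches)
    return best


def find_bacterial_starts(mrna: str, min_score: int = 4, window: int = 15) -> list[int]:
    """Positions of start codons preceded by an SD-like sequence (hypothetical thresholds)."""
    hits = []
    for pos in range(len(mrna) - 2):
        if mrna[pos:pos + 3] in START_CODONS and sd_score(mrna, pos, window) >= min_score:
            hits.append(pos)
    return hits
```

Removing the purine-rich stretch from a test message makes the same start codon disappear from the result, which mirrors the loss of initiation described above.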

By contrast, eukaryotic messages are strictly
monocistronic. Large precursors, synthesized in the
nucleus, are processed post-transcriptionally in the
nucleus before export to the cytoplasm: introns are
removed by splicing, a methylated cap is added to the
5’-terminus of the message, and a poly(A) tail is added
to the 3’-terminus. In eukaryotes, the smaller ribosomal
subunit binds to the capped 5’-terminus, and according
to the scanning model proposed by Kozak (1989), 
migrates linearly until it encounters the first AUG 
codon. At this point, the larger ribosomal subunit 
binds and translation begins. The sequence context in 
which the AUG resides determines the efficiency with 
which translation initiation takes place. In certain in- 
stances, where the AUG is in an unfavorable sequence 
context, ribosomes can bypass the first AUG and pro- 
ceed to the next one. However, this is more the excep- 
tion than the rule. 
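Scanning and leaky scanning can likewise be sketched. The rule used here is a simplification assumed for illustration: an AUG counts as acceptable if it has a purine (A or G) at position -3 or a G at position +4 (numbering the A of AUG as +1), the two positions most often cited as important in Kozak’s consensus; otherwise the scanning subunit bypasses it. The function names are hypothetical.

```python
def kozak_strength(mrna: str, p: int) -> str:
    """Classify the context of the AUG whose A is at index p (simplified rule)."""
    minus3_ok = p >= 3 and mrna[p - 3] in ("A", "G")    # purine at -3
    plus4_ok = p + 3 < len(mrna) and mrna[p + 3] == "G"  # G at +4
    if minus3_ok and plus4_ok:
        return "strong"
    if minus3_ok or plus4_ok:
        return "adequate"
    return "weak"


def scan_for_start(mrna: str) -> int:
    """Index of the AUG where a scanning small subunit would initiate, or -1 if none."""
    p = mrna.find("AUG")  # scan 5' to 3' from the capped end
    while p != -1:
        if kozak_strength(mrna, p) != "weak":
            return p
        p = mrna.find("AUG", p + 1)  # leaky scanning: bypass an AUG in poor context
    return -1
```

In `scan_for_start("CCCUUAUGCCCCGCCAUGG")` the first AUG (index 5) has C at both -3 and +4 and is skipped; initiation is placed at the second AUG (index 15), which has G at both key positions.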

Regulation at the translational level is less well 
understood than regulation at the transcriptional 
level. However, it is clear that the sequences within
the 5’ end of both prokaryotic and eukaryotic messages
have a profound impact on their ability to be translated.
In general, a high G+C content, which promotes
secondary structure formation, reduces translational
efficiency. This is an important consideration for
generating engineered cell lines for the purpose of 
maximizing gene expression. 
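When designing an expression construct, a crude first-pass check is to measure the G+C content of the 5’ leader ahead of the start codon. The sketch below does only that; it assumes the start codon position is already known, the 0.6 cutoff is a hypothetical value chosen for illustration, and the function names are hypothetical. G+C content is only a proxy: predicting actual secondary structure requires an RNA folding program.

```python
def leader_gc_fraction(mrna: str, start: int) -> float:
    """Fraction of G or C in the 5' leader, i.e. the bases before index `start`."""
    leader = mrna[:start].upper()
    if not leader:
        raise ValueError("no 5' leader before the start codon")
    return sum(base in ("G", "C") for base in leader) / len(leader)


def flag_gc_rich_leader(mrna: str, start: int, cutoff: float = 0.6) -> bool:
    """True if the leader's G+C fraction exceeds the (hypothetical) cutoff."""
    return leader_gc_fraction(mrna, start) > cutoff
```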


Further Reading 

Kozak M (1983) Comparison of initiation of protein synthesis 
in procaryotes, eucaryotes, and organelles. Microbiological 
Reviews 47: 1-45.

Schoner B, Belagaje RM and Schoner RG (1987) Expression of 
cistron system. Methods in Enzymology 153: 401-416.


Genetic Variation

Evolution by natural selection in a population can
occur only if genetic variation exists within that
population. However, genetic variation is important not
only in evolution but also in all areas where genetics
is involved, and many empirical and theoretical stud- 
ies have been made into the nature and extent of 
genetic variation and the reasons for its existence and 
maintenance. The subject is indeed a vast one and here 
we can only touch on a few aspects of this important 
topic. 

Perhaps the most important fact concerning the 
maintenance of genetic variation is that the Mendelian 
hereditary system itself is a ‘variation-preserving’ one:
if there are no selective forces, then genetic variation in 
any population is maintained (except for random sam- 
pling effects in small populations) from one gener- 
ation to another. A hereditary scheme in which the 
character of any offspring is a kind of average, or 
blend, of the values of the character in the two parents,
rapidly extinguishes variation in the character. In 
Darwin’s time the hereditary mechanism was assumed 
to be some form of ‘blending,’ and the loss of variation 
in such a scheme was recognized by Darwin as an 
important argument against his theory. The discovery 
of the Mendelian hereditary mechanism immediately 
removed this problem. 
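Darwin’s difficulty, and its Mendelian resolution, can be made concrete with a short calculation. The assumptions are simplifications not stated in the text: parents mate at random, their character values are independent with common variance σ², and there is no environmental noise, new mutation, or selection. Under blending, variance halves every generation; under Mendelian inheritance in a large randomly mating population, the allele frequency p (with q = 1 - p) is unchanged and genotype frequencies settle at Hardy-Weinberg proportions, so heterozygosity 2pq persists.

```latex
\text{Blending: } X_{\text{off}} = \tfrac{1}{2}(X_1 + X_2)
\;\Rightarrow\;
\operatorname{Var}(X_{\text{off}})
  = \tfrac{1}{4}\left[\operatorname{Var}(X_1) + \operatorname{Var}(X_2)\right]
  = \tfrac{1}{2}\sigma^2,
\qquad \sigma_t^2 = \frac{\sigma_0^2}{2^t}

\text{Mendelian, random mating: } p_{t+1} = p_t,
\qquad (P_{AA},\, P_{Aa},\, P_{aa}) = (p^2,\; 2pq,\; q^2)
```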

The amount of genetic variation at any gene locus is usually measured by the degree of heterozygosity at that locus, although other measures (for example, the number of alleles present) are sometimes more appropriate. In subdivided populations, variation both within and between populations can be measured in various ways, the most frequently used measures being Wright’s F-statistics.
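As a minimal sketch of these measures, the code below computes the expected heterozygosity at a locus, H = 1 - Σ p_i^2 (the probability that two alleles drawn at random differ), and a differentiation index F_ST = (H_T - H_S) / H_T, where H_S is the mean heterozygosity within subpopulations and H_T the heterozygosity of the pooled population. This is Nei's formulation of Wright's F_ST, only one of the several ways of measuring between-population variation; weighting subpopulations equally by default is an assumption, and the function names are hypothetical.

```python
from typing import Optional, Sequence


def expected_heterozygosity(freqs: Sequence[float]) -> float:
    """Probability that two alleles drawn at random differ: 1 - sum(p_i^2)."""
    return 1.0 - sum(p * p for p in freqs)


def fst(subpops: Sequence[Sequence[float]],
        weights: Optional[Sequence[float]] = None) -> float:
    """F_ST = (H_T - H_S) / H_T (Nei's formulation; one of several estimators).

    subpops[k][i] is the frequency of allele i in subpopulation k.
    Weights default to equal subpopulation sizes (an assumption).
    """
    k = len(subpops)
    if weights is None:
        weights = [1.0 / k] * k
    n_alleles = len(subpops[0])
    # Allele frequencies in the pooled population
    pooled = [sum(w * sp[i] for w, sp in zip(weights, subpops))
              for i in range(n_alleles)]
    h_s = sum(w * expected_heterozygosity(sp) for w, sp in zip(weights, subpops))
    h_t = expected_heterozygosity(pooled)
    return 0.0 if h_t == 0 else (h_t - h_s) / h_t


print(fst([[0.5, 0.5], [0.5, 0.5]]))  # identical subpopulations -> 0.0
print(fst([[1.0, 0.0], [0.0, 1.0]]))  # fixed for different alleles -> 1.0
```

The two extremes bracket the index: 0 when all variation lies within subpopulations, 1 when all of it lies between them.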

Some characters are determined by the genes at a 
single locus and thus exhibit classical Mendelian seg- 
regation. Other characters are determined by a small 
number of major loci, together with minor effects 
from other loci. In other cases a character is deter- 
mined by a large number of loci, with no one locus 
being predominant in the determination of the char- 
acter. 

The latter case includes many measurable characters, such as height or weight. Characters are often also determined in part by environmental factors, and the attempt to apportion variation in a measurable character to genetic and environmental effects has long fascinated scientists and laymen alike. Artificial selection on any character depends on the variation in that character having, at least in part, a genetic basis. For characters that depend on many gene loci, the store of genetic variation in a population is such that artificial selection can bring about substantial changes in the values of many characters, often well outside presently observed limits.

A quantitative measure of variation of some meas- 
urable character within a population is naturally pro- 
vided by the statistical concept of a variance. This 
variance can be estimated from measurements taken 
from a sample of individuals from a population. The 
similarities between two characters, either two differ- 
ent characters (for example, height and weight) in the 
same individual, or the same character in two related 
individuals, are measured by the covariance, and from 
this by the correlation, between these characters. 
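A minimal sketch of these estimates, using the usual unbiased sample variance and covariance (divisor n - 1) and the Pearson correlation. The height and weight figures are hypothetical and serve only to illustrate the calculation.

```python
from math import sqrt
from typing import Sequence


def sample_covariance(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Unbiased sample covariance of paired measurements (divisor n - 1)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two paired measurements")
    mx, my = sum(xs) / n, sum(ys) / n
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)


def sample_variance(xs: Sequence[float]) -> float:
    """Unbiased sample variance: the covariance of a character with itself."""
    return sample_covariance(xs, xs)


def correlation(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation: covariance scaled by both standard deviations."""
    return sample_covariance(xs, ys) / sqrt(sample_variance(xs) * sample_variance(ys))


# Hypothetical measurements on five individuals
heights_cm = [160.0, 165.0, 170.0, 175.0, 180.0]
weights_kg = [55.0, 60.0, 66.0, 70.0, 77.0]
print(sample_variance(heights_cm))          # 62.5
print(correlation(heights_cm, weights_kg))  # close to 1: strongly associated
```

The same functions apply to the second kind of similarity mentioned above: pairing, say, each parent's height with its offspring's height gives the parent-offspring covariance and correlation.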

The simplest possible variance calculations arise where the character measurement of any individual depends on its genetic constitution at one single gene locus, with no environmental component and with only two alleles, $A_1$ and $A_2$, possible at the locus. Suppose that in diploids (the only case we consider) individuals of the three possible genotypes $A_1A_1$, $A_1A_2$, and $A_2A_2$ have measurement values $m_{11}$, $m_{12}$, and $m_{22}$, respectively. Let the population frequencies of these three genotypes be $P_{11}$, $2P_{12}$, and $P_{22}$. Then the population mean for this measurement is

$$\bar{m} = P_{11}\,m_{11} + 2P_{12}\,m_{12} + P_{22}\,m_{22}$$

and the population variance in the character is

$$\sigma^2 = P_{11}\,(m_{11} - \bar{m})^2 + 2P_{12}\,(m_{12} - \bar{m})^2 + P_{22}\,(m_{22} - \bar{m})^2.$$

In statistical terminology, this variance has two degrees of freedom and can thus be split into two components, each describing a significant contribution to the population variation in the measurement. By far the most useful subdivision of this type is the partition of $\sigma^2$ into the additive genetic variance (see Additive Genetic Variance) and the dominance variance. Roughly speaking, the additive genetic variance is the variance due to genes within genotypes, and the dominance component is the variance not explainable by the separate effects of single genes. The former is important in evolution and in artificial breeding programs because a parent passes on a gene, and not an entire genotype, to an offspring.
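The partition can be sketched numerically for the one-locus, two-allele case. This minimal example assumes Hardy-Weinberg genotype frequencies (P11 = p^2, 2P12 = 2pq, P22 = q^2, with q = 1 - p), which the general formulas above do not require. Under that assumption the additive variance is 2(p·α1^2 + q·α2^2), where the average effect αi is the mean value of individuals receiving an Ai allele from one parent and a random allele from the other, minus the population mean. The dominance variance is the remainder, which equals (2pq·d)^2 with d = m12 - (m11 + m22)/2, the heterozygote's deviation from the homozygote midpoint. Function and field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class OneLocusVariance:
    mean: float
    total: float      # sigma^2
    additive: float   # additive genetic variance
    dominance: float  # dominance variance


def one_locus_variance(p: float, m11: float, m12: float, m22: float) -> OneLocusVariance:
    """Mean and variance partition for one diploid locus with alleles A1, A2.

    Assumes Hardy-Weinberg frequencies: P11 = p^2, 2*P12 = 2pq, P22 = q^2.
    """
    q = 1.0 - p
    f11, f12, f22 = p * p, 2 * p * q, q * q
    mean = f11 * m11 + f12 * m12 + f22 * m22
    total = f11 * (m11 - mean) ** 2 + f12 * (m12 - mean) ** 2 + f22 * (m22 - mean) ** 2
    # Average effects: value of an individual receiving A1 (or A2) from one
    # parent and a random allele (A1 with probability p) from the other.
    alpha1 = p * m11 + q * m12 - mean
    alpha2 = p * m12 + q * m22 - mean
    additive = 2 * (p * alpha1 ** 2 + q * alpha2 ** 2)
    return OneLocusVariance(mean, total, additive, total - additive)


v = one_locus_variance(0.3, 10.0, 8.0, 2.0)
d = 8.0 - (10.0 + 2.0) / 2          # heterozygote deviation from the midpoint
print(v)
print((2 * 0.3 * 0.7 * d) ** 2)     # matches v.dominance up to rounding
```

When the heterozygote lies exactly at the midpoint (d = 0), the dominance variance vanishes and all of the variance is additive.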

When the character under consideration is fitness itself, genetic variation in the two-allele case is preserved by selection when the fitness of the heterozygote exceeds that of both homozygotes. When many alleles are possible, a complicated mathematical criterion is needed to assess whether genetic variation is preserved.
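The two-allele statement can be checked with the standard one-locus selection recursion, sketched here under the assumptions of random mating, an infinite population, and constant genotype fitnesses w11, w12, and w22 (all names hypothetical). When w12 exceeds both w11 and w22, the allele frequency converges from either side to the interior equilibrium p* = (w12 - w22) / (2·w12 - w11 - w22), so both alleles, and hence the variation, persist.

```python
def next_p(p: float, w11: float, w12: float, w22: float) -> float:
    """Frequency of A1 after one round of viability selection (random mating)."""
    q = 1.0 - p
    w_bar = p * p * w11 + 2 * p * q * w12 + q * q * w22  # mean fitness
    # A1 carried by surviving A1A1 homozygotes plus half of surviving heterozygotes
    return (p * p * w11 + p * q * w12) / w_bar


def run(p0: float, w11: float, w12: float, w22: float, generations: int) -> float:
    """Iterate the recursion from an initial frequency p0."""
    p = p0
    for _ in range(generations):
        p = next_p(p, w11, w12, w22)
    return p


def overdominant_equilibrium(w11: float, w12: float, w22: float) -> float:
    """Interior equilibrium; stable when w12 > w11 and w12 > w22."""
    return (w12 - w22) / (2 * w12 - w11 - w22)


w11, w12, w22 = 0.9, 1.0, 0.8   # heterozygote fittest (illustrative values)
print(overdominant_equilibrium(w11, w12, w22))  # about 0.667
print(run(0.05, w11, w12, w22, 500), run(0.95, w11, w12, w22, 500))
```

If instead the heterozygote is the least fit, the interior equilibrium is unstable and the population drifts to fixation of one allele, so variation is lost.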

Generalizations of these ideas to the case where the 
character depends on the genes at many loci are also 
possible. The criteria for the maintenance of genetic 
variation are now far more complicated than in the 
single-locus case. It is also interesting to ask how many 
loci influence the variation in a particular character. 
In the case of genetic diseases, this is associated with 
the distinction between ‘simple Mendelian’ and ‘multi- 
factorial’ diseases. 

Genetic variation is also preserved when a popu- 
lation is divided into small subpopulations, with 
selective forces acting in different directions in the 
subpopulations, provided that there is a small migra- 
tion rate between them. Another agency preserving 
genetic variation is a selective force acting in different 
directions between the sexes. 

Genetic variation is lost, in small populations, by 
random sampling effects. Quantitative expressions for 
the rate of loss of variation through this agency are 
available in simple cases, particularly those where 
genes are not subject to natural selection. Generaliza- 
tions of these expressions in cases where whole sub- 
populations are subject to extinction are also available. 
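For the simplest case, a neutral locus in an idealized (Wright-Fisher) diploid population of constant size N, a standard result is that expected heterozygosity declines by a factor of 1 - 1/(2N) each generation, so that H_t = H_0 (1 - 1/(2N))^t. The sketch below evaluates this expression and the corresponding half-life, roughly 2N ln 2 generations. The idealized population model is an assumption, and the function names are hypothetical.

```python
import math


def expected_heterozygosity_after(h0: float, n: int, t: int) -> float:
    """Expected heterozygosity after t generations of drift (neutral, Wright-Fisher)."""
    return h0 * (1.0 - 1.0 / (2 * n)) ** t


def heterozygosity_half_life(n: int) -> float:
    """Generations until expected heterozygosity falls to half its initial value."""
    return math.log(2) / -math.log(1.0 - 1.0 / (2 * n))


for n in (10, 100, 1000):
    print(n, round(expected_heterozygosity_after(0.5, n, 100), 4),
          round(heterozygosity_half_life(n), 1))
```

The loop makes the qualitative point of the text quantitative: a population of 10 loses most of its heterozygosity within 100 generations, while one of 1000 loses only a few percent.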


See also: Additive Genetic Variance; QTL 
(Quantitative Trait Locus) 
Genetics has been defined as the scientific study of
heredity. It has three major subdivisions: transmis- 
sion genetics, physiological genetics, and population 
genetics. 


Transmission Genetics 


Transmission genetics concerns the germinal sub- 
stance (deoxyribonucleic acid or DNA) and its mode 
of transmission from parent to progeny. DNAs are 
distinguished from one another by the sequence of 
nucleotides along their length. Linear molecules of 
DNA constitute the core of microscopically visible 
structures (the chromosomes) that divide and segre- 
gate at cell division so that each cell of a multicellular 
organism generally has the same chromosome com- 
plement. Sexually reproducing eukaryotes are typ- 
ically diploid: an individual has one chromosome set 
from each of his or her parents. At reproduction, a 
meiotic division produces gametes (sperm or egg), 
each of which has a single chromosome set. 

The discipline of genetics stems from the work of Gregor Mendel, who discovered in 1865 the regular pattern of transmission of units (later called genes) that affect visible properties (later called phenotypes) of organisms. Mendel’s units are segments of linear DNA molecules. Most of the basic rules of genetics were deduced before 1951 (when the germinal substance was shown to be DNA) and long before DNA sequencing. Among the processes fundamental to deducing these rules are mutation, recombination, and the meiotic behavior of structurally aberrant chromosomes. A mutation is a heritable change, almost always a change in DNA sequence. Mutations can be used to mark the chromosomes that enter a cross from the parents and are recovered in the progeny, and genetic recombination between such marked chromosomes allows the construction of linkage maps. All these processes aid in equating genetic determinants with specific chromosomal segments, a goal that is superseded by complete genome sequencing where that is available.
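A minimal sketch of how recombination data can be turned into a linkage map, under stated assumptions: the recombination fraction r between two marked loci is estimated as the proportion of recombinant progeny, and it is converted to a map distance with Haldane's mapping function, d = -½ ln(1 - 2r) Morgans. Haldane's function is one common choice that assumes no crossover interference; it is not specified in the text. For three markers, the pair showing the most recombination lies on the outside. Function names and progeny counts are hypothetical.

```python
import math


def recombination_fraction(recombinants: int, total: int) -> float:
    """Proportion of progeny carrying a recombinant combination of markers."""
    if total <= 0 or not 0 <= recombinants <= total:
        raise ValueError("invalid progeny counts")
    return recombinants / total


def haldane_cm(r: float) -> float:
    """Map distance in centimorgans from Haldane's mapping function (no interference)."""
    if not 0.0 <= r < 0.5:
        raise ValueError("r must lie in [0, 0.5) for linked loci")
    return -50.0 * math.log(1.0 - 2.0 * r)


def order_three(r: dict[frozenset, float]) -> list[str]:
    """Order three markers: the pair with the largest r flanks the middle one.

    Ties in r are not handled; this is an illustrative sketch only.
    """
    outer = max(r, key=r.get)
    (middle,) = set().union(*r) - outer
    left, right = sorted(outer)
    return [left, middle, right]


r_ab = recombination_fraction(10, 200)   # 0.05
r_bc = recombination_fraction(20, 200)   # 0.10
r_ac = recombination_fraction(28, 200)   # 0.14
pairs = {frozenset("AB"): r_ab, frozenset("BC"): r_bc, frozenset("AC"): r_ac}
print(order_three(pairs))                 # ['A', 'B', 'C']
print({"".join(sorted(k)): round(haldane_cm(v), 2) for k, v in pairs.items()})
```

Using a mapping function rather than raw recombination fractions keeps distances roughly additive, because multiple crossovers between distant markers make r underestimate the true distance.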

Transmission genetics includes the study of DNA transfer between individuals by means other than sexual reproduction, and of its incorporation into the recipient genome. This process is conspicuous in prokaryotes, which lack a meiotic cycle and frequently have circular rather than linear chromosomes. Transmission genetics also includes the study of organelles such as mitochondria or plastids, which contain DNA but are not distributed in a regular manner at cell division, and of viruses and related elements that contain RNA rather than DNA (some of which are transmitted vertically from parent to progeny cell).


Physiological Genetics 


Physiological genetics concerns the mechanisms 
whereby genes affect organismal properties through 
transcription of DNA to RNA, translation of RNA to 
protein, and their regulation. Some subdivisions are 
biochemical genetics, developmental genetics, and cell 
genetics. A major tool of physiological genetics has 
been the characterization of mutant organisms. Such 
studies have identified various regulatory elements 
including activators and repressors of transcription 
and translation and nucleases and proteases that affect 
protein concentrations. Mutant studies frequently 
reveal complex pathways leading from primary gene 
functions to visible traits, sometimes allowing genes to 
be classified into regulatory hierarchies that define 
temporal patterns of gene expression during such 
processes as metazoan development and cell cycle 
progression. 


Population Genetics 


The subject matter of population genetics is the dis- 
tribution of heritable variation among the members of 
an interbreeding population. It includes both the gen- 
esis of such variation (through mutation and selection) 
and its maintenance from one generation to the next. 
Much of the variation in natural populations has no 
detected phenotypic consequences and is observed 
only at the level of nucleotide sequence. Development 
and application of appropriate mathematical theory 
has been central to the discipline of population genet- 
ics. The basic rules of population genetics were put 
forward by R.A. Fisher, J.B.S. Haldane, and S. Wright, 
from about 1920 onward. One of their principal goals 
was to explain how Darwinian selection should affect 
diploid populations. 


See also: Developmental Genetics; Mendel, 
Gregor; Mouse, Classical Genetics; Population 
Genetics 

Genome
The term genome has been used traditionally to define
the haploid set of chromosomes in the nuclei of multi- 
cellular organisms. Hence, one sees reference to the 
‘human genome,’ the ‘mouse genome,’ and the ‘fly 
genome.’ Today the term is used more generally, as for 
example, to define the chromosomes in cytoplasmic 
organelles such as mitochondria and chloroplasts, and 
the chromosomes of prokaryotes and viruses. Hence 
one sees reference to the ‘mitochondrial genome,’ the 
‘yeast genome,’ the ‘Salmonella genome,’ and the 
‘SV40 genome.’ 

Genome was formerly used only as a noun, but today the adjectival form ‘genomic’ is also common, as in ‘genomic variability’ or ‘genomic size.’ The adjectival form is in turn used as a noun, as in the title of the journal dealing with genomic matters, Genomics. The study of genomes is referred to as ‘genomics,’ and researchers investigating genomes are referred to as ‘genomicists.’

Genomes vary greatly in size as measured by their 
DNA content. In general there is a positive correlation 
between size and developmental complexity. This cor- 
relation is imperfect, because developmental complex- 
ity cannot be defined in strictly quantitative terms. 
The SV40 virus genome contains approximately 5000 
base pairs. The Escherichia coli bacterial genome has 
4.6 million bp. The yeast Saccharomyces cerevisiae 
genome has been measured at 12 million bp. The 
multicellular worm Caenorhabditis elegans has a gen- 
ome size of 100 million bp, while the simple flowering 
plant Arabidopsis thaliana has a nuclear genome of 
comparable size. The fruit fly Drosophila melano- 
gaster, an important experimental organism, has a 
genome size of 140 million bp. Homo sapiens has 
a genome size of 3 billion bp and the genome of the
laboratory mouse, Mus musculus, is only slightly 
larger at 3.3 billion bp. 

The genomes of nucleated organisms, eukaryotes, 
are generally organized into nuclear organelles termed 
chromosomes. In the somatic cells, the chromosomes 
exist as two sets, one of maternal, the other of paternal 
origin. This is known as the diploid condition. In the 
germ cells, sperm and ova, the nuclei contain a single 
or haploid set of chromosomes. In humans, the hap- 
loid number of chromosomes is 23. Irrespective of cell 
type, the genome always refers to the haploid set of 
chromosomes. The nuclear genomes of males and females are slightly different, since the male possesses an X and a Y sex chromosome, while the female is characterized by the XX condition.

Eukaryotic chromosomes generally consist of a 
linear DNA duplex complexed with histone proteins 
plus a variety of minor proteins. The complex of DNA 
plus associated proteins is termed chromatin. Chro- 
matin has the capacity to alter the compaction of the 
chromosomes over many orders of magnitude, to 
replicate the chromosome, and to appropriately regu- 
late the expression of genes encoded in the DNA. The 
ends of chromosomes terminate in structures termed telomeres that stabilize the DNA strand terminus. Centromeres are structures located between the telomeres; they serve as attachment points for the mitotic spindle and so ensure that replicated chromosomes are distributed to daughter cells. Chromosomes can be identified morphologically on the basis of their overall length and the position of the centromere. An average-sized human chromosome contains approximately 130 million bp.

The chromosomes of cytoplasmic organelles, bacteria, and viruses are generally organized as circular structures, which do not require telomeres, but they frequently contain structures analogous to centromeres. These chromosomes are generally much smaller than eukaryotic nuclear chromosomes. The human mitochondrial chromosome contains approximately 17 000 bp, while the chloroplast genome of rice, Oryza sativa, contains 136 000 bp. The Escherichia coli genome contains 4.6 million bp. The SV40 virus has a genome size of only 5000 bp.

The number of genes residing in a genome can be 
most accurately determined by DNA sequencing and 
sequence analysis. Modern DNA sequencing methods 
are now producing data on the gene content of even 
very large genomes. Currently, the large genomes of 
important eukaryotic research organisms such as the 
yeast S. cerevisiae, the worm C. elegans, and the fly 
D. melanogaster have been completely sequenced. 
The human genome will be completely sequenced 
in the very near future. The complete sequences of 
prokaryotic genomes and organellar cytoplasmic 
genomes are also known. 

The number of genes within a genome does not 
necessarily correlate directly with DNA content. 
This is because of the complex organization of the 
genome into coding and noncoding components. 
Coding components specify the amino acid composi- 
tion and sequence of proteins. The coding elements of 
the genome define the total collection of proteins of a 
cell or organism, termed the proteome. The noncoding elements of DNA fall into a number of categories. One consists of control elements that are essential for the proper expression of coding regions. The principal control elements are promoters, which reside proximal to the coding elements and are the sites at which their transcription is initiated, and enhancers, which may reside at a distance from the coding elements and regulate the spatiotemporal expression of coding regions. Genes may be defined as coding elements producing a particular protein product together with their noncoding control elements. Additional noncoding elements include satellite DNA, consisting of long (1-10 kb) tandemly arranged repetitive elements usually concentrated near centromeres; microsatellite DNA, made up of short repeats of about 20 bp that are generally distributed throughout the chromosome and serve as useful genetic markers; transposable elements, which have the capacity to remodel genomes by recombination; and other repetitive and nonrepetitive DNA elements that have no known function. Higher organisms with large genomes may have noncoding components that are large compared with their coding elements; the human genome contains only 3.0% coding DNA. Smaller genomes have a relatively higher content of coding DNA and are said to be more ‘compact.’ Organellar genomes are highly compact, with little noncoding DNA.

Complete DNA sequencing of genomes allows an accurate estimate of gene number. Haemophilus influenzae, a pathogenic bacterium, has 1709 predicted genes, while S. cerevisiae has 6241, C. elegans 18 424, and D. melanogaster 13 601; the human genome, not yet fully sequenced, has an estimated gene number of approximately 30 000. Interestingly, as genome size increases, gene number increases proportionally less. For example, the human genome is approximately 25 times larger than the worm and fly genomes, but the increase in gene number is only about twofold. One possible explanation for this discrepancy is that genes interact combinatorially, so that fewer genes may, by interaction, accomplish more complex functions.
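The comparison can be checked with a few lines of arithmetic that use only the figures quoted in this entry (worm 100 Mb, fly 140 Mb, human about 3000 Mb; 18 424, 13 601, and about 30 000 genes). The sketch computes fold differences and gene density in genes per megabase; the human values are the estimates given in the text, not final counts, and the names are hypothetical.

```python
# Genome size (Mb) and gene number, as quoted in the text.
GENOMES = {
    "C. elegans": (100.0, 18_424),
    "D. melanogaster": (140.0, 13_601),
    "H. sapiens": (3000.0, 30_000),   # estimates: sequence not yet complete
}


def fold_differences(larger: str, smaller: str) -> tuple[float, float]:
    """Return (genome-size ratio, gene-number ratio) of `larger` over `smaller`."""
    size_a, genes_a = GENOMES[larger]
    size_b, genes_b = GENOMES[smaller]
    return size_a / size_b, genes_a / genes_b


for other in ("C. elegans", "D. melanogaster"):
    size_ratio, gene_ratio = fold_differences("H. sapiens", other)
    print(f"human vs {other}: {size_ratio:.1f}x larger genome, {gene_ratio:.2f}x more genes")

for name, (mb, genes) in GENOMES.items():
    print(f"{name}: {genes / mb:.0f} genes per Mb")
```

The size ratios (30.0 and 21.4) bracket the "approximately 25 times" of the text, while the gene ratios (1.63 and 2.21) correspond to the roughly twofold increase; gene density falls from about 184 genes per Mb in the worm to about 10 in humans.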

Genes increase in number by several mechanisms. 
One such is by unequal crossing-over whereby a gene 
undergoes lateral duplication to give rise to two 
daughter genes residing initially side by side on a 
chromosome. A second mechanism is by whole 
genome duplication whereby the initial gene will 
give rise to daughter genes residing initially on separate chromosomes. Duplicated genes within a genome are termed paralogs and constitute gene families, the members of which are related both structurally and functionally. The large genomes of higher organisms
are characterized by numerous gene families fre- 
quently of large size. For example, the homeobox 
genes concerned with developmental regulation exist 
as large gene families in the worm with 88 members 
and in the fly with 113 members. 

Recent progress in our understanding of genome 
organization in a variety of organisms promises ad- 
vances in a number of important areas. These include 
an understanding of evolution and its associated 
mechanisms, developmental control and the design 
of body plan, mechanisms associated with the aging 
process, and practical advances in medicine, agricul- 
ture, and biotechnology. An expanding genomic 
knowledge base will also generate ethical and legal 
problems which will require political solution and 
cultural adjustment. 

Genome Organization
This review highlights important aspects of the
genome architectures of humans, a number of mam- 
mals, fish, invertebrates, fungi, plants, protoctists, and 
bacteria. Many of the differences in genome size and 
organization among organisms at the same morpho- 
logical grade are due to variations in the amounts of 
tandemly repetitious DNA sequences located around 
centromeres and telomeres, in the amounts of active 
and degenerate transposable elements, and in the sizes 
of introns and the spacing between genes. Distantly 
related genomes differ more by whole and partial 
genome duplications, the piecemeal amplification and 
contraction of gene families, and the evolution of more
complex multidomain proteins. 


Whole Genome DNA Sequencing 


The collective understanding of global genome organ- 
izations was accelerated as a result of the industrial- 
ization of whole genome sequencing technologies. 
Different consortia have completed the genomes of 
31 bacteria, the megabase (Mb)-sized nuclear genomes 
of baker’s yeast Saccharomyces cerevisiae (12 Mb), the 
nematode worm Caenorhabditis elegans (100 Mb), and 
the fly Drosophila melanogaster (180 Mb). The geno- 
mes of humans (Homo sapiens; 3300 Mb) and two 
plants, the wall cress (Arabidopsis thaliana; 125 Mb) 
and rice (Oryza sativa; 430 Mb), have been completed. 


Genomes of Bacteria 


The variation in bacterial genome size and gene 
number is large. Mycoplasma genitalium (0.58 Mb) 
has approximately 470 protein coding genes, whereas 
Myxococcus xanthus (9.5 Mb) probably has in excess 
of 8000 genes. Genome sizes also vary within a taxo- 
nomic group, e.g., from 2.7 to 6.5 Mb in cyanobacteria 
and from 6.5 to 8 Mb in different strains of Streptomyces ambofaciens. Furthermore, the genome organizations of Mycoplasma genitalium (470 genes), Haemophilus influenzae (1709 genes), Synechocystis sp. (3200 genes), Bacillus subtilis (4000 genes), and Escherichia coli (4300 genes) reveal that gene order is not conserved and that there is no absolute functional requirement for specific gene juxtapositions. Bacterial genomes are organized as either linear or circular structures. Linear chromosomes occur in Borrelia burgdorferi, various species of Streptomyces, Agrobacterium tumefaciens, and Rhodococcus fascians. Circular
chromosomes are found or inferred in other species: 
Mycoplasma genitalium, Haemophilus influenzae, 
Escherichia coli, Deinococcus radiodurans, Leptospira 
interrogans, and Rhizobium meliloti. 


Genomes of Placental Mammals 


The variation in genome organization and size is strik- 
ing. The Indian barking deer, Muntiacus muntjac, has 
only three pairs of chromosomes, whereas the black 
rhinoceros, Diceros bicornis, has 67 pairs. Genome size 
varies from 1650Mb in the Italian bat Miniopterus 
schreibersi to 5500 Mb in the South African aardvark, 
Orycteropus afer. In short evolutionary time spans,
these differences in genome size have little effect on 
embryological development, morphology, or physio- 
logy, as revealed by comparisons of the Indian munt- 
jac, Muntiacus muntjac (2400 Mb), with its three pairs 
of chromosomes, and the Chinese muntjac, Muntiacus
reevesi (2900 Mb), with its 23 pairs of chromosomes.
Despite these different genome architectures and 
genome sizes, the species are morphologically similar 
and yield viable hybrids. 


Genomes of Fish, Plants, Yeasts, Ciliates, 
Crustaceans, and an Ant Species 


The African lungfish, Protopterus aethiopicus, has a genome of 130 000 Mb (40 times that of humans), whereas the puffer fish, Fugu rubripes, has a genome of only 400 Mb, yet both are osteichthyan (bony) fish. The lily Lilium henryi has a genome of 33 000 Mb, whereas the wall cress, Arabidopsis thaliana, has a genome of only 125 Mb. The similarly sized genomes of the yeasts Saccharomyces cerevisiae and Schizosaccharomyces pombe are organized in 16 and three chromosomes, respectively. In protoctists, the genes
in the macronucleus of ciliates occur either in large 
chromosomes, as in Tetrahymena pyriformis, or as 
ten thousand or so individual gene-sized pieces in 
Oxytricha similis. Finally, the ultimate reductionist is 
the ant Myrmecia pilosula; its genome consists of just 
one pair of chromosomes. 


Localized Repetitive DNA Sequences in 
Rats, Humans, and Flies 


In genomes of the morphologically similar American 
kangaroo rats, Dipodomys ordii monoensis (5300 Mb) 
and Dipodomys heermani tularensis (3400 Mb), the 
difference of 1900 Mb is largely accounted for by 700 
million copies of just three simple DNA sequences (AAG, TTAGGG, and ACACAGCGGG) located in the centromeric heterochromatin of D. ordii
monoensis. By contrast, D. heermani tularensis has a 
small amount of centromeric heterochromatin and a 
minimal investment in such sequences. These three 
nontranscribed sequences constitute an amount of 
DNA equivalent to over half the human genome and 
are obviously dispensable for centromeric and cellular 
functions and make no significant contribution to 
morphology. In humans, there can be differences of 
many megabases in the size of the Y chromosome 
among different individuals, the differences being 
due to varying amounts of two tandemly repetitious 
DNA sequences. Polymorphisms involving many 
megabases of centromeric heterochromatin occur on 
other human chromosomes, particularly chromosome 
9, and none of these inherited polymorphisms has any 
known clinical manifestation. A similar situation is 
found in Drosophila melanogaster, where the satellite 
DNA-rich centromeric heterochromatin is poly- 
morphic, with differences of many megabases among 
different strains of flies. In addition, deletion analysis 
of the satellite DNA-rich X chromosome heterochro- 
matin reveals that at least 12 Mb can be deleted and 
viability is maintained, attesting to this DNA being 
devoid of essential genes. In populations of the grass- 
hopper, Atractomorpha similis, there are extensive 
polymorphisms in telomeric heterochromatin, with 
differences of the order of tens of megabases between 
individuals in the same population. Finally, in the 
crustacean Cyclops strenuus, 600 Mb of centromeric, 
telomeric, and interstitial heterochromatic DNA is 
excised from the chromosomes during the early cleavage divisions of embryogenesis and degraded. The
remaining DNAs are spliced together to leave a 
somatic genome of 400 Mb. The DNAs of these dis- 
posable heterochromatic segments are clearly not crit- 
ical for embryogenesis or cellular functions. 


Organization of Centromeres in Fungi, 
Worms, Flies, and Humans 


The localized centromeres of the budding yeast, Sac- 
charomyces cerevisiae, consist of a 125-bp region of 
DNA, while those of the fission yeast, Schizosaccharo- 
myces pombe, occupy 40000 to 100000 bp. In con- 
trast, those of the fungus Neurospora crassa are made 
up of degenerate transposable elements. The one cen- 
tromere characterized in Drosophila melanogaster is a 
0.42 Mb region consisting of tens of thousands of 
copies of two simple sequence DNAs. The centro- 
meric regions of human chromosomes typically con- 
sist of 2-4 megabases of different combinations of the 
four satellite DNAs and various other repetitive elem- 
ents. However, stable human chromosomes exist in 
which direct sequencing reveals that their neocentro- 
meres totally lack all repetitive sequences. The worm 
Caenorhabditis elegans does not have localized cen- 
tromeres at all; its chromosomes are holocentric. 
Thus, although the centromeres of humans, flies, and some fungi are embedded in blocks of repetitious sequences, there is no common underlying sequence organization among them, and the centromeres of budding yeast and some human centromeres are totally devoid of repetitive sequences.


Organization of Telomeres in Humans, 
Ciliates, Yeasts, and Flies 


The ends of human chromosomes consist of thou- 
sands of tandemly repeated copies of the simple 
sequence TTAGGG, internal to which are a hetero- 
geneous group of 93 bp repetitive sequences found at 
the telomeres of chromosomes 5, 7, 17, 19, 20, 21, and 
22. The telomeres of trypanosomes also have these TTAGGG repeats, whereas the ciliates Tetrahymena and Euplotes have variants of them: TTGGGG and TTTTGGGG, respectively. The telomeres of Saccharomyces cerevisiae consist of repetitive sequences based on $\mathrm{T(G)}_{2\text{--}3}\mathrm{(TG)}_{1\text{--}6}$, and, in addition, the Y’ family of conserved repetitive sequences is found at 19 of the yeast telomeres. In contrast to these G-rich sequences, the
telomeres of Drosophila melanogaster lack the charac- 
teristic simple TTAGGG-rich repeats of humans and 
other organisms. Instead, fly telomeres are composed of 
a tandem array of elements related to non-LTR retro- 
transposons of the HeT-A and TART families, which 
are related to the LINE families of vertebrate transpo- 
sons. 
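These differences in telomere composition can be illustrated with a minimal sketch that counts how many exact, consecutive copies of a repeat motif end a sequence, for example TTAGGG for humans or TTGGGG for Tetrahymena. It assumes the G-rich strand is written 5' to 3' with the chromosome terminus at the end of the string, and that the repeats are exact and in register; the irregular yeast repeats would need pattern matching rather than exact copies, and retrotransposon-based fly telomeres contain no such simple array. The function name and sequences are hypothetical.

```python
def terminal_repeat_copies(seq: str, motif: str) -> int:
    """Count exact, consecutive copies of `motif` at the 3' end of `seq`.

    Assumes `seq` is the G-rich strand written 5'->3', so the chromosome
    terminus is at the end of the string, and that repeats are in register.
    """
    seq, motif = seq.upper(), motif.upper()
    k = len(motif)
    if k == 0:
        raise ValueError("motif must be non-empty")
    copies, end = 0, len(seq)
    # Step back one motif length at a time while the preceding k bases match.
    while end >= k and seq[end - k:end] == motif:
        copies += 1
        end -= k
    return copies


human_like = "GATTACA" + "TTAGGG" * 8     # synthetic human-type terminus
ciliate_like = "CCAATG" + "TTGGGG" * 5    # synthetic Tetrahymena-type terminus
print(terminal_repeat_copies(human_like, "TTAGGG"))    # 8
print(terminal_repeat_copies(ciliate_like, "TTAGGG"))  # 0
print(terminal_repeat_copies(ciliate_like, "TTGGGG"))  # 5
```

Exact matching is the simplest choice and suits the regular human and ciliate repeats; it deliberately returns 0 for a motif that does not match, mirroring how a TTAGGG search would fail on ciliate or fly telomeres.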


Dispersed Transposable Sequences in 
Humans, Flies, Plants, Worms, Yeasts, 
and Bacteria 


The second major component of eukaryotic genomes consists of the dispersed repetitive sequences, the bulk of which originate from the activities of transposable elements. The sequencing of the human genome reveals that the euchromatic regions of human chromosomes contain a heterogeneous array of transposable elements that were once mobile but are now mostly degenerate and sessile. These elements are finely interspersed with protein-coding genes. The bulk of these elements, of the retrotransposon and DNA transposon types (the Alu, MIR, LINE1, LINE2, HERV, MaLR, mariner, and other miscellaneous transposons), account for approximately 1300 Mb of the human genome. One group, the Alu family, has over a million members dispersed between genes and within intronic regions.

The genes of Drosophila melanogaster are inter- 
spersed with the members of at least 90 different 
families of transposable elements. In addition, 
D. melanogaster has approximately eight times as 
many dispersed transposable elements as its sibling 
species D. simulans, and this accounts for the 20 Mb 
difference between these two genomes. This huge 
imbalance in the amount of dispersed repetitive 
DNA has not manifested itself in significant morphological change, as the two species are nearly identical and
viable hybrids can be produced. 
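The "eight times as many" figure and the 20 Mb difference can be combined in a short worked calculation. The sketch below assumes, for illustration only, that the entire genome-size difference is transposable-element DNA and that the average element has the same length in both species, so that the ratio of element numbers equals the ratio of element DNA.

```python
def split_te_content(ratio: float, difference_mb: float) -> tuple[float, float]:
    """Solve ratio * x - x = difference_mb for x.

    Returns (smaller, larger) transposable-element content in Mb, assuming the
    whole size difference is TE DNA and elements have the same average length.
    """
    if ratio <= 1:
        raise ValueError("ratio must exceed 1")
    smaller = difference_mb / (ratio - 1)
    return smaller, ratio * smaller


simulans_te, melanogaster_te = split_te_content(ratio=8, difference_mb=20)
print(f"D. simulans ~{simulans_te:.1f} Mb, D. melanogaster ~{melanogaster_te:.1f} Mb")
# D. simulans ~2.9 Mb, D. melanogaster ~22.9 Mb
```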

The interdigitation of transposons and other sun- 
dry repetitive elements with protein coding genes is a 
general feature of all genomes, the main difference 
being the types and amounts of sequences involved. 
In the lily Lilium henryii (33 000 Mb), there are 13 000 
copies of just one family of transposons, while in 
Arabidopsis thaliana there are smaller memberships of many different transposable element families: the LTR and non-LTR retrotransposons and the En-like, TNP2-like, and MuDR families. In C. elegans there
are at least 40 families of dispersed repetitive elements, 
most of which probably arose from transposition 
events. The Saccharomyces cerevisiae genome is more 
modest in this regard, with the Ty elements and some 
solo LTRs together constituting about 3% of the 
genome. In bacterial genomes, dispersed repetitive sequences constitute no more than 2% of the genome. In E. coli, 18 repetitive families make up a heterogeneous mixture of autonomously transposable elements, cryptic prophages and phages, and short DNA
sequences (such as the 40 bp elements termed REP/ 
BIME/PU). Family memberships vary from a few to 
approximately 600 and they are dispersed throughout 
the chromosome. 


Gene Numbers in Bacteria, Yeasts, Worms, Flies, Plants, Fish, and Humans


The number of protein-coding genes in fully sequen- 
ced bacterial genomes varies enormously from 470 
genes in Mycoplasma genitalium to a number esti- 
mated to be in excess of 8000 in Myxococcus xanthus. 
In free-living eukaryotes, gene numbers range from 6200 in Saccharomyces cerevisiae through 14 000 in Drosophila melanogaster and 18 000 in Caenorhabditis elegans to 26 000 in Arabidopsis thaliana. Gene numbers in human
beings, and in mammals in general, are still controver- 
sial, with estimates varying from below 40 000 to well 
over this figure. 


Gene Families 


All genomes contain different-sized gene families, the 
members of which have arisen by duplicative processes. For example, in Haemophilus influenzae, while there are 1709 genes in total, 284 of these are duplicated products, or paralogs. Thus, there are only 1425 distinct families, some of which have more than one family member. In E. coli, nearly 50% of the genes
are duplicated, a figure not very different from the 
percentage of paralogs in the genomes of the worm 
(49%) and the fly (41%). Thus, independently of 
their grade of evolutionary organization, genomes 
have undergone a significant degree of duplication of 
their genes. These duplications can be local and form 
a cluster, such as a tandem array of 10 glutathione 
S-transferase genes in the fly and the cluster of 10 
kallikrein serine proteases in the rat. Alternatively, 
the duplicated family members can be dispersed, such 
as the G-protein-coupled receptor genes (GPCRs), 
which are distributed throughout the entire fly gen- 
ome. Furthermore, the extent of these duplication 
events is different in each evolutionary lineage. In 
the case of the trypsin-like (S1) proteases, yeast has 
one gene, the worm has seven, and the fly has 199. 
In the case of the GPCRs, there are 160 in the fly, 1100 in
the worm, and an estimated 700 in the human genome. 
In the case of neurotransmitter-gated ion channels, 
there are 27 genes in the fly, 81 in the worm, and 
none in yeast. In summary, hundreds of gene families, 
with very different membership sizes, characterize the 
various evolutionary lineages. 
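The Haemophilus influenzae figures follow from simple bookkeeping: each paralog adds a member to an existing family rather than founding a new one. A minimal Python sketch of that reasoning, using only the numbers quoted above:

```python
def distinct_families(total_genes: int, paralogs: int) -> int:
    # A paralog joins an existing family instead of founding a new one,
    # so only the non-duplicated products count as separate families.
    return total_genes - paralogs


h_influenzae_total, h_influenzae_paralogs = 1709, 284
print(distinct_families(h_influenzae_total, h_influenzae_paralogs))  # 1425
print(f"{h_influenzae_paralogs / h_influenzae_total:.1%} of genes are paralogs")
# 16.6% of genes are paralogs
```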


Pseudogenes 


In addition to duplicated gene products that are func- 
tional, many metazoan genomes are littered with 
pseudogenes, duplicated copies that have become 
inactivated. For example, while there is only a single 
functional copy of the glyceraldehyde 3-phosphate 
dehydrogenase (GAPDH) gene in humans, mice, 
and rats, there are 10 to 30 nonfunctional GAPDH 
pseudogenes in humans and more than 200 in mice and 
rats. In the completely sequenced human chromo- 
some 22, there are estimated to be 545 genes and 134 
pseudogenes. Furthermore, at least half of the human 
olfactory GPCRs are pseudogenes. In the worm, at 
least 300 of the 1100 GPCRs are pseudogenes. In yeast 
and bacteria, on the other hand, pseudogenes are rare 
(usually less than 1%). 


Orphan Genes 


The most surprising result that has emerged from all 
completely sequenced genomes, be they from bac- 
teria, eukaryotes, or metazoans, is that irrespective of 
their genic content, at least 20% of the genes (and sometimes many more) are orphans, or ORFans. ORFans are
genes whose protein products have no clear sequence 
similarities to proteins encoded from their own gen- 
ome or to any other protein in existing public databases. 
For example, even in Mycoplasma genitalium with 
only 470 genes, 120 are ORFans of totally unknown 
origin or function. In yeast and the fly, the figures are in 
excess of 25%. Whether ORFans constitute an irreducible core of genes that evolve rapidly and whose protein products can maintain old functions, or acquire
new ones, is not yet clear. They remain a mystery. 


Gene Sizes 


A comparison of the partially characterized 400 Mb 
genome of the puffer fish with that of the 3300 Mb 
human genome is intriguing, particularly since these 
two organisms are believed to contain the same number of genes. When the human dystrophin, utrophin, and huntingtin genes are compared with their puffer fish homologs, the human genes are found to be 2500, 1000, and 170 kb in length, respectively, whereas their puffer fish homologs are only 200, 100, and 23 kb. The number and sizes of exons are nearly identical between homologs, but the introns in the human genes are on average almost eight times larger than those in the puffer fish. In addition, the puffer fish has less DNA between adjacent genes than the human genome has. When one takes into consideration that the human genome has 2000 megabases of localized and dispersed repetitive sequences, as well as much larger introns and intergenic distances than the puffer fish, the initial eightfold difference in genome size becomes much less mysterious.
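The fold differences quoted above can be checked directly. The sketch below treats the puffer fish genome as essentially free of repetitive DNA, an assumption made only to show how much of the eightfold gap the 2000 Mb of human repeats can account for.

```python
def fold(a: float, b: float) -> float:
    """How many times larger a is than b."""
    return a / b


human_kb = {"dystrophin": 2500, "utrophin": 1000, "huntingtin": 170}
puffer_kb = {"dystrophin": 200, "utrophin": 100, "huntingtin": 23}
for gene, size in human_kb.items():
    print(gene, round(fold(size, puffer_kb[gene]), 1))
# dystrophin 12.5 / utrophin 10.0 / huntingtin 7.4

human_mb, puffer_mb, human_repeats_mb = 3300, 400, 2000
print(round(fold(human_mb, puffer_mb), 2))                     # 8.25 (whole genomes)
# Assumption: puffer fish repeats treated as negligible.
print(round(fold(human_mb - human_repeats_mb, puffer_mb), 2))  # 3.25 (human repeats removed)
```

The residual roughly threefold difference is then what the larger human introns and intergenic distances must explain.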


Summary 


Eukaryotic genome organization has been dominated by a mixture of whole-genome and piecemeal genome duplications. Thus the genes presently constituting the mammalian lineage likely stem from a combination of whole-genome amplifications of a much smaller ancestral genome and subsequent reductions.
Layered on top of this are the local and dispersed 
duplicative processes that have resulted in the expan- 
sion and contraction of individual protein coding 
families, as well as the expansion and contraction of 
noncoding tandemly repetitious and transposable 
families. Layered on top of this again are the molecular 
processes that generate larger proteins with a greater 
combinatorial complexity of protein domains. Finally, 
it is clear that much of the variation that is seen in 
present-day genomes, particularly in the localized het- 
erochromatic and transposable element compartments 
of the genome, is essentially the flotsam and jetsam of 
genomic turnover events. Most of these processes have 
little effect on phenotype in the short term. 
Gene sequences have remained highly conserved during evolution. Hence, a single set of complementary DNAs (cDNAs), i.e., sequences derived from transcribed genes, can be used as hybridization probes
across a range of related species to construct compara- 
tive genetic maps, outlining their genome relation- 
ships. Comparative mapping within the grass family 
(Poaceae), including the major cereals rice, maize, and 
wheat, has demonstrated that both gene content and gene order have remained highly conserved during 60 million years of evolution. Thus, it is possible to describe
each grass genome, irrespective of its genome size or 
chromosome number, by its relationship to a single 
reference genome, rice. These relationships can be 
depicted by a series of concentric circles with the 
inner and outer circles representing the smallest 
and largest genomes in the comparison, respectively 
(Figure 1). Within a genome, chromosomes are 
ordered so that a minimum number of rearrangements 
are needed in the overall comparison. Corresponding 
genes across the species can be found on the radii. 
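One simple way to score how well an ordering of linkage blocks matches the reference (rice) order is to count breakpoints, that is, positions where two neighboring blocks are not also neighbors in the reference. The Python sketch below uses hypothetical block numbers, ignores block orientation, and uses the breakpoint count only as a rough proxy for the number of rearrangements; it is not the procedure used to build the published grass maps.

```python
def breakpoints(order: list[int]) -> int:
    """Count breakpoints of an ordering of blocks 1..n relative to the reference 1, 2, ..., n.

    Assumes `order` is a permutation of 1..n. Orientation is ignored:
    blocks i and i+1 count as adjacent in either order.
    """
    n = len(order)
    padded = [0] + order + [n + 1]  # sentinels mark the chromosome ends
    return sum(1 for a, b in zip(padded, padded[1:]) if abs(a - b) != 1)


# Hypothetical orderings of five reference linkage blocks in another genome.
candidates = {"inversion": [1, 2, 5, 4, 3], "shuffled": [3, 1, 2, 5, 4]}
for name, order in candidates.items():
    print(name, breakpoints(order))
# inversion 2 / shuffled 4
best = min(candidates, key=lambda name: breakpoints(candidates[name]))
print("fewest breakpoints:", best)  # fewest breakpoints: inversion
```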
Maize (2n = 20; C = 2.5 pg), a species belonging to
the subfamily Panicoideae, originated about 16 to 11 
million years (My) ago through the hybridization of 
two diploid ancestors and subsequent diploidization 
(Gaut and Doebley, 1997). The ancient tetraploid ori- 
gin of the maize genome is revealed in the comparative 
maps. Each of two sets of five maize chromosomes 
(1, 2, 3, 4, 6 and 5, 7, 8, 9, 10) corresponds to a complete
rice genome, albeit with a different order of the rice 
linkage blocks (Figure 1). Some rearrangements rela- 
tive to the rice genome are common to both genomes 
(indicated by red arrows in Figure 1), and also extend 
to other Panicoideae species. These chromosomal 
mutations provide information on species’ rela- 
tionships and evolution. However, the rate at which 
rearrangements occur and are fixed may be species- 
specific, and thus dependent on the genome structure 
rather than on evolutionary divergence time (Zhang 
et al., 1998; Devos et al., 2000). 

Once the comparative maps have identified corres- 
ponding, or orthologous, regions across species, DNA 
sequencing can provide more detailed information on 
the extent to which gene orders have remained con- 
served. DNA sequence analysis of orthologous Adh 
regions in maize and sorghum, which diverged 16-20 
My ago, showed that nine genes were present in the 
same order and orientation, while three had appar- 
ently been deleted in maize (Tikhonov et al., 1999) 
(Figure 1). The difference in physical length of the 
region (78 kb in sorghum and 225 kb in maize) was 
mainly due to the presence of nonconserved retro- 
elements, which inserted within this maize region 
over the last 6 My (SanMiguel et al., 1998). The region 
identified by the most conserved Adh gene in rice 
displayed no colinearity with the maize and sorghum 
Adh regions (Tikhonov et al., 1999; Tarchini et al., 
2000). This indicated that the Adh region had under- 
gone rearrangements in either rice or the Panicoideae 
lineage since their divergence from a common an- 
cestor. Similar studies across the grass family have 
indicated that single gene and small segmental dupli- 
cations and transpositions within otherwise colinear 
regions may be common events in genome evolution 
(Bennetzen, 2000; Devos and Gale, 2000). 
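Colinearity between two sequenced orthologous regions can be summarized as the longest set of shared genes that occur in the same relative order (a longest common subsequence). The sketch below uses made-up gene names g1 to g12 to mimic a region in which nine genes are colinear and three are absent from one species; gene orientation and intervening retroelements are ignored.

```python
def colinear_genes(a: list[str], b: list[str]) -> list[str]:
    """Return one longest list of genes shared by a and b in the same relative order."""
    m, n = len(a), len(b)
    # lcs[i][j] holds the length of the longest common subsequence of a[i:] and b[j:].
    lcs = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m - 1, -1, -1):
        for j in range(n - 1, -1, -1):
            if a[i] == b[j]:
                lcs[i][j] = lcs[i + 1][j + 1] + 1
            else:
                lcs[i][j] = max(lcs[i + 1][j], lcs[i][j + 1])
    # Walk the table from the start to recover the shared genes themselves.
    shared, i, j = [], 0, 0
    while i < m and j < n:
        if a[i] == b[j]:
            shared.append(a[i])
            i += 1
            j += 1
        elif lcs[i + 1][j] >= lcs[i][j + 1]:
            i += 1
        else:
            j += 1
    return shared


# Made-up gene orders: species A has g1..g12; species B lacks g3, g7, and g10.
species_a = [f"g{k}" for k in range(1, 13)]
species_b = [g for g in species_a if g not in {"g3", "g7", "g10"}]
shared = colinear_genes(species_a, species_b)
missing = [g for g in species_a if g not in shared]
print(len(shared), missing)  # 9 ['g3', 'g7', 'g10']
```

A gene that has moved to a new position (a transposition) drops out of the common subsequence, so it shows up as a break in colinearity rather than as a shared gene.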

The main application of the integration of genomic 
data is the transfer of knowledge across species and the 
exploitation of common resources including marker 
sets, mutant collections, and ever-increasing rice geno- 
mic sequence data. For example, a maize dwarf mutant 
that maps in a region orthologous to a plant height 
QTL in sorghum may be the homolog of the gene 
underlying the sorghum trait. If rice genomic se- 
quence data are available for this region, it may even 
be possible to readily identify a candidate gene. The 
small sorghum genome may also be used as a tool for 
the isolation of genes in its large-genome relative, 
maize. Following the identification of the region 
in sorghum that is orthologous to the target region in 
maize, chromosome walking and gene isolation can be 
carried out in the threefold smaller sorghum genome. 
Although this approach circumvents many of the 
problems associated with the presence of highly re- 
petitive DNA elements in maize, any disruption of 
colinearity in the orthologous regions may affect suc- 
cessful isolation of the target gene. 

The high level of conserved colinearity within the 
grass family is in stark contrast with the almost com- 
plete lack of gene order conservation between the 
grass and Arabidopsis genomes (Devos et al., 1999; 
Tikhonov et al., 1999; van Dodeweerd et al., 1999). 
Although the eudicot and monocot species diverged 
some 130-240 My ago, this large erosion of colinearity 
was unexpected and suggests a high rate of genome
rearrangements in the lineage leading to Arabidopsis. 
Although this needs to be confirmed by further com- 
parative data, one can speculate that the extensive 
duplication of the Arabidopsis genome (Bancroft, 
2000; Blanc et al., 2000) may have contributed to its 
faster evolution. Gene duplication and subsequent
divergence of the two copies may also be an important 
mechanism through which species acquire new gene 
functions. 

In conclusion, comparative genome analyses have demonstrated that gene orders in the grasses have remained largely conserved during 60 My of evolution, both at the map and at the DNA sequence level. The wealth of information
provided by the integrated grass maps can now be 
exploited to enhance our knowledge of both well- 
studied major cereals and under-resourced orphan 
crops. 
Genome sizes are usually expressed in terms of the number of base pairs in the haploid genome, either in kilobases (1 kb = 1000 bp) or megabases (1 Mb = 1 000 000 bp). Kilobases are related to other units by the useful 1-2-3 mnemonic: 1 µm of linear duplex DNA has an approximate molecular weight of 2 million daltons and contains approximately 3 kb of DNA. One megabase of duplex DNA has a mass of about 1 fg (10^-15 g). Genome sizes of bacteriophages and viruses range from a few thousand bases to several hundred kilobases. Bacterial genomes range from 0.5 Mb to 10 Mb. Eukaryotic genomes are diverse, from approximately 10 Mb in some fungi to more than 100 000 Mb in certain plants. Genome size in eukaryotes is poorly correlated with organismal complexity. For example, the largest genome known is that of the protozoan Amoeba dubia, at 670 000 Mb. The Database of Genome Sizes contains convenient listings of genome sizes for a large number of organisms.
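The 1-2-3 mnemonic and the rule that 1 Mb weighs about 1 fg follow from two standard approximations, which are assumptions of this sketch rather than figures from the text: a rise of about 0.34 nm per base pair in B-form DNA and an average mass of about 650 daltons per base pair.

```python
AVOGADRO = 6.022e23     # molecules per mole
DALTONS_PER_BP = 650    # assumed average mass of one base pair (1 Da = 1 g/mol)
NM_PER_BP = 0.34        # assumed rise per base pair in B-form duplex DNA


def bp_per_micrometre() -> float:
    return 1000 / NM_PER_BP  # 1 um = 1000 nm


def daltons_per_micrometre() -> float:
    return bp_per_micrometre() * DALTONS_PER_BP


def femtograms(bp: float) -> float:
    grams = bp * DALTONS_PER_BP / AVOGADRO
    return grams * 1e15  # 1 fg = 1e-15 g


print(round(bp_per_micrometre()))                  # 2941 (about 3 kb per um)
print(round(daltons_per_micrometre() / 1e6, 2))    # 1.91 (about 2 million daltons)
print(round(femtograms(1_000_000), 2))             # 1.08 (about 1 fg per Mb)
print(round(femtograms(3_300_000_000) / 1000, 1))  # 3.6 (pg for a 3300 Mb genome)
```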
which the library was derived. A DNA clone is a DNA molecule that is propagated in a host microorganism. The clone is composed of two parts
that are fused into a single continuous DNA molecule. 
One part is the vector, which at a minimum contains 
genes coding for the proteins and other DNA elem- 
ents necessary for the propagation and selection of the 
clone in the host microorganism. The other part of 
the clone is the insert DNA. This is the DNA that is 
isolated from the organism under study and inserted 
into the vector. 


History 


Genomic libraries were constructed in the early days 
of the development of recombinant DNA technology 
in the mid to late 1970s. The libraries were the source 
of clones for the analysis of genes of interest. The first 
libraries were constructed using partial restriction 
digests as the means for fragmenting the genomic 
DNA in a way that generated overlapping fragments 
of length suitable for cloning. Such fragments were 
cloned into plasmid vectors by Clarke and Carbon
(genomic libraries of Escherichia coli and Saccharo- 
myces cerevisiae DNA). Maniatis constructed geno- 
mic libraries of Drosophila, rabbit, and human using a 
bacteriophage lambda vector. 
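A common rule of thumb for deciding how many clones a genomic library must contain is the estimate associated with Clarke and Carbon, N = ln(1 - P) / ln(1 - f), where P is the desired probability that any given sequence is present and f is the insert size as a fraction of the genome. The Python sketch below assumes inserts are a perfectly random sample of the genome and, for illustration, a 3300 Mb haploid genome with insert sizes typical of each vector; in practice libraries are made larger to offset cloning bias.

```python
import math


def clones_needed(genome_bp: float, insert_bp: float, probability: float = 0.99) -> int:
    """Clarke and Carbon style estimate: N = ln(1 - P) / ln(1 - f), f = insert / genome.

    Assumes inserts are a uniformly random sample of the genome.
    """
    if not 0 < probability < 1 or not 0 < insert_bp < genome_bp:
        raise ValueError("need 0 < P < 1 and 0 < insert size < genome size")
    fraction = insert_bp / genome_bp
    # log1p(-f) computes ln(1 - f) accurately when f is tiny.
    return math.ceil(math.log(1 - probability) / math.log1p(-fraction))


GENOME_BP = 3.3e9  # assumed haploid genome size (3300 Mb)
for vector, insert_bp in [("lambda", 20e3), ("cosmid", 40e3), ("BAC", 150e3)]:
    print(vector, clones_needed(GENOME_BP, insert_bp))
# roughly 760 000 lambda clones, 380 000 cosmids, or 101 000 BACs for P = 0.99
```

The inverse dependence on insert size is one reason large-capacity vectors became attractive for whole-genome work.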

Procedures were developed for the rapid screening 
of these libraries for sequences of interest based on 
sequence similarity to a labeled nucleic acid probe. 
Colony hybridization for screening plasmid libraries 
and plaque hybridization for screening bacteriophage 
lambda libraries revealed which clones contained 
DNA sequences with identity or very high similarity 
to the sequence of the probe. In these procedures DNA 
from the high-density plates of colonies or plaques 
was transferred to solid hybridization membranes, 
initially nitrocellulose and subsequently various 
formulations of modified nylon. The initial pattern 
of colonies or plaques on the plates was preserved on 
the membrane. A DNA fragment serving as a probe 
was typically labeled with the ³²P isotope of phosphorus. The membrane was hybridized to the probe
in solution and after washing away the unhybridized 
probe, it was exposed to X-ray film to reveal those 
colonies or plaques that contained DNA identical or 
similar to the probe. Based on the location of the hybri- 
dizing signal on the membrane, the corresponding 
colonies or plaques were recovered for further analysis. 

Recombinant DNA technology led to the explosive development of molecular genetics as numerous
genes of biological and medical importance were iso- 
lated from genomic libraries, characterized, and their 
products expressed in E. coli, other bacterial species, 
S. cerevisiae, and insect and mammalian cultured cells. 

The plan to map and sequence the human genome 
emerged as the Human Genome Project in the late 
1980s, bringing genomic libraries into this new appli- 
cation. Up to this point the libraries were the source of 
clones for studying individual genes or sequences. For 
whole genome scale analysis, the properties required of 
genomic libraries were more rigorous. Three characteristics emerged as requirements for libraries used in genome projects. These characteristics (large cloning capacity, stable propagation of insert DNA, and a minimal incidence of chimeric inserts) are critical features of
vectors and library construction protocols for genome 
mapping and sequencing. For these applications cos- 
mids, yeast artificial chromosome vectors (YACs), 
bacteriophage P1 vectors, P1 artificial chromosome 
vectors (PACs), and bacterial artificial chromosome 
vectors (BACs) have been developed and used. 

The 1990s brought a focus on high-throughput genomic DNA sequencing of many species, including the human, the laboratory mouse, the
roundworm Caenorhabditis elegans, plants, including 
Arabidopsis thaliana, rice and potato, and numerous 
species of bacteria. A critical component of all of these 
projects is the construction of genomic libraries from 
either the entire genome or from a large insert geno- 
mic clone such as a BAC. These libraries are con- 
structed by shearing the genomic DNA to randomly 
generate overlapping fragments of the appropriate 
size. A fraction of the sheared DNA is then selected 
by size for construction of the library. The resultant 
library has a narrow range of insert sizes and clones 
are randomly selected from the library for sequencing 
from both ends (shotgun sequencing). The insert size 
of the clones being sequenced is typically 2 kb and the 
average sequence read length is about 650 bases. The 
randomly collected sequence reads are then assembled 
into the original molecule using assembly software, which finds overlaps between the sequence reads to accomplish the assembly process.
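The overlap-finding step can be illustrated with a toy greedy assembler: repeatedly merge the two sequences with the longest suffix-to-prefix overlap until no overlaps remain. This Python sketch only illustrates the principle; it ignores sequencing errors, reverse-complement reads, reads contained within other reads, repeats, and the paired-end information that real assembly software relies on.

```python
def overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest suffix of a that is a prefix of b (at least min_len), else 0."""
    start = 0
    while True:
        start = a.find(b[:min_len], start)  # candidate start of an overlap in a
        if start == -1:
            return 0
        if b.startswith(a[start:]):
            return len(a) - start
        start += 1


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Merge the pair of sequences with the longest overlap until none remain.

    Returns the resulting contigs; more than one contig means gaps remain.
    """
    contigs = list(dict.fromkeys(reads))  # drop exact duplicate reads, keep order
    while True:
        best_len, best_pair = 0, None
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    olen = overlap(a, b, min_len)
                    if olen > best_len:
                        best_len, best_pair = olen, (i, j)
        if best_pair is None:
            return contigs
        i, j = best_pair
        merged = contigs[i] + contigs[j][best_len:]
        contigs = [c for k, c in enumerate(contigs) if k not in (i, j)] + [merged]


reads = ["CATGCC", "ACGTTG", "TTGCAT"]
print(greedy_assemble(reads))  # ['ACGTTGCATGCC']
```

Greedy merging is easy to follow but can join the wrong reads when a sequence is repeated, which is one reason chimeric or nonrandom libraries make assembly harder.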

The requirements for vectors and library construction protocols for shotgun DNA sequencing are very stringent due to the high cost of sequencing, the technical
challenge of the assembly of shotgun sequence reads, 
and the cost of closing the remaining gaps after the 
assembly. The libraries need to have a very low in- 
cidence of clones that contain no inserts, or the 
no-insert-containing clones need to be readily identi- 
fiable and excluded from the sequencing pipeline. The 
libraries should have a narrow insert size range. This 
allows the assembly software to use the distances 
between the sequences obtained from each end of the 
clone. The libraries must not contain chimeric DNA inserts, as these will confound the assembly process.
The libraries need to be as truly random as is technic- 
ally achievable to minimize the number of gaps in the 
sequence after the assembly of the sequence reads. 
This requires that the insert DNA be sheared instead 
of the more traditional technique of partial restriction 
digestion to reduce the size of the DNA fragments to 
that required for cloning. 
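Why randomness matters can be quantified with the idealized Lander-Waterman model, which assumes that reads start at uniformly random positions. With coverage c = N·L/G (N reads of length L on a target of length G), the expected fraction of the target left unsequenced is about e^-c and the expected number of contigs is about N·e^-c. The Python sketch below applies these expressions; the 150 kb target (roughly one BAC insert) is an assumption, while the 650-base read length is taken from the text.

```python
import math


def lander_waterman(target_len: float, read_len: float, n_reads: int) -> dict[str, float]:
    """Idealized expectations for reads placed uniformly at random.

    Ignores the minimum detectable overlap, repeats, and cloning bias.
    """
    coverage = n_reads * read_len / target_len
    return {
        "coverage": coverage,
        "fraction_unsequenced": math.exp(-coverage),
        "expected_contigs": n_reads * math.exp(-coverage),
    }


TARGET = 150_000   # assumed target: one ~150 kb BAC insert
READ_LEN = 650     # average read length quoted in the text
for fold in (4, 8):
    n = round(fold * TARGET / READ_LEN)
    stats = lander_waterman(TARGET, READ_LEN, n)
    print(fold, n, round(stats["expected_contigs"], 1))
# 4x: ~17 expected contigs; 8x: ~0.6
```

Any bias in the library (regions under-represented by cloning) makes the real number of gaps larger than these idealized figures.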


Vectors for Genomic DNA Libraries 


Vectors for genomic DNA libraries are selected based 
on the projected application for the library as discussed above. Table 1 summarizes the properties of the
vectors covered in this review. A brief introduction to 
each type of vector and the procedure for preparing 
the vector for library construction follows below. 


Plasmid Vectors 

Plasmids are small extrachromosomal circular double- 
stranded DNA molecules that replicate independently 
of the chromosome or chromosomes of a microorgan- 
ism. Their copy number in the cell is maintained by 
control systems built into the plasmid’s gene content 
but varies depending on the replication system of the 
plasmid. For example, pUC-based plasmid vectors are 
maintained at a copy number of 500-700 per cell. 
Plasmid vectors derived from the E. coli F plasmid 
are rigorously maintained at a copy number of one. 

Naturally occurring bacterial plasmids have been 
engineered to serve as vectors for the propagation of 
exogenous DNA fragments. To serve as a vector, the plasmid must have, in addition to a replication system, a selectable marker (typically an antibiotic resistance gene) and a cloning site: a unique restriction site in a nonessential region of the plasmid for the insertion of the exogenous insert DNA.

One of the historical limitations on the use of plas- 
mids for genomic libraries was the inefficient proced- 
ure of chemical transformation for transferring the 
recombinant plasmid DNA constructs into E. coli. For 
library construction, that technique has been replaced 
by the use of very high efficiency electroporation. A 
high-voltage electric field is applied briefly to cells, pro- 
ducing transient holes in the cell membranes through 
which plasmid DNA enters. Electroporation allows 
for the efficient transfer of plasmid DNA as large as 
200 kb into cells. 
Preparation of Plasmid Vectors for Library Construction


Vector preparation in general includes procedures to 
remove unwanted DNA fragments and to generate the 
desired ends at the cloning site. The process of making 
a plasmid genomic library is illustrated in Figure 1. 
Additional steps may be incorporated to reduce or 
eliminate the ability of the vector to be replicated in 
the absence of an insert fragment. 

For plasmid vectors, preparation for use in library 
construction typically involves digestion with the 
appropriate restriction enzyme to give the desired sticky (single-stranded tails of typically four bases) or blunt ends, followed by the removal of the 5’ phosphates using a phosphatase enzyme. The absence of the 5’ phosphates eliminates the ability of the plasmid to be ligated back into a circular molecule. This phosphatase step is required to minimize the number of clones in the library that contain no inserts.

Figure 1 Construction of a genomic library in a plasmid vector.

Table 1 Properties of the vectors used in construction of genomic libraries

Vector                  Cloning capacity (kb)   Applications
Plasmids                0.1-12                  Single gene cloning; shotgun sequencing libraries
Bacteriophage lambda    10-20                   Single gene cloning
Cosmids                 35-45                   Single gene cloning; genome mapping and sequencing
Bacteriophage P1        30-90                   Genome mapping and sequencing
BACs                    30-300                  Genome mapping and sequencing
YACs                    100-1000                Genome mapping and sequencing


Bacteriophage lambda Vectors 

Bacteriophage lambda is a virus that infects E. coli. 
The typical infection cycle results in the lysis of the 
E. coli cell and the release of about 100 progeny phage 
particles, each capable of infecting another cell. When 
lambda is plated at low density on a lawn of E. coli 
cells on agar medium, the resulting pattern of clearings (plaques) in the lawn caused by the lysed cells identifies the location of individual lambda clones. Harvesting
the phage particles from a plaque (picking a plaque) 
provides a stock of the phage clones for subsequent 
rounds of propagation. 

Like the plasmid vectors, wild-type lambda has 
been extensively engineered for use as a vector. 
Genes not essential for the lambda life cycle described 
above have been removed to make room for carrying 
exogenous insert DNA. The early popularity of 
lambda as a cloning vector for genomic library con- 
struction is a consequence of the very efficient path- 
way for getting lambda DNA into E. coli cells. This 
was in contrast to the inefficient chemical trans- 
formation used for plasmids, particularly for larger 
constructs. Bacteriophage lambda DNA, or recombinant lambda DNA containing inserts, is packaged into infectious phage particles using an efficient
in vitro packaging reaction. Once the particles are 
formed, each one can inject its DNA into an E. coli 
cell. The limit on how much exogenous DNA can be 
propagated in a lambda vector results from the pack- 
aging capacity of the phage particle, approximately 
35-50kb of DNA. Because of the requirement for 
lambda genes for a productive infection, the amount 
of insert DNA is restricted to 10-20 kb depending on 
the specific vector. Figure 2 illustrates the process of 
constructing a genomic library in a lambda vector. 


Preparation of lambda Vectors for Library Construction

Lambda vectors come in two different forms because 
of the restriction on DNA size that is a feature of the 
packaging system. Lambda insertion vectors are those 
that simply require that the lambda DNA be cut at a 
unique restriction site for the insertion of exogenous 
DNA. This is why they are termed insertion vectors. 
After the restriction digestion, a phosphatase step to
remove the 5’ phosphates is frequently also included 
to minimize the number of clones in the library without inserts. Since the packaging limits range from about 35 to 50 kb, insertion vectors can handle exogenous DNA fragments of only up to about 15 kb, because the vector itself must be at least 35 kb to be packageable and therefore viable.

Figure 2 Construction of a genomic library in a bacteriophage lambda vector.
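The packaging-window arithmetic for lambda insertion vectors generalizes to a one-line calculation: vector plus insert must fall inside the size range the phage head accepts. In the Python sketch below, the 35-50 kb window and the 35 kb insertion vector come from the text, while the 8 kb cosmid vector size is an illustrative assumption.

```python
def insert_size_window(pack_min_kb: float, pack_max_kb: float, vector_kb: float) -> tuple[float, float]:
    """Smallest and largest insert that keep vector + insert inside the packaging window."""
    high = pack_max_kb - vector_kb
    if high <= 0:
        raise ValueError("the vector alone fills the phage head")
    low = max(0.0, pack_min_kb - vector_kb)
    return low, high


print(insert_size_window(35, 50, 35))  # (0.0, 15): lambda insertion vector, from the text
print(insert_size_window(35, 50, 8))   # (27, 42): cosmid, assuming a hypothetical 8 kb vector
```

The same subtraction explains why removing lambda genes (replacement vectors) or dropping them entirely (cosmids) raises the insert capacity.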

The second form of lambda vector is termed a 
replacement vector. These vectors have a stuffer frag- 
ment that is removable by restriction digestion and 
DNA size fractionation. In the course of removing the 
stuffer fragment, the cloning site sticky ends are also 
generated. The removal of a stuffer fragment allows 
for the insertion of a larger DNA fragment. Typical 
replacement vectors can propagate inserts with sizes up to 25 kb. The phosphatase step is also incorporated into the vector preparation process with lambda replacement vectors. Removal of the 5’ phosphates is used to prevent vector-to-vector ligation, which would reduce the efficiency of the in vitro packaging reaction. After the removal of the stuffer fragment, the remaining arms are sufficiently small that, if ligated together without an insert, they will not be packaged. This prevents the propagation of vectors without inserts.


Cosmid Vectors 

Cosmid vectors are plasmids that can be packaged into 
infectious bacteriophage lambda particles via the 
lambda in vitro packaging system. All that is required 
for this to occur is for the plasmid to contain the 
lambda cos DNA sequence. Since the plasmid does 
not require the lambda genes necessary to form pro- 
geny phage particles on infection, there is more capa- 
city in the cosmid for containing insert DNA than 
with a bacteriophage lambda vector. As a result, cos- 
mids can generally accept 30-40 kb insert fragments. 
Once the phage particle containing the cosmid clone 
injects its DNA into an E. coli cell, the cosmid is 
replicated through its plasmid replication system and 
cells containing the cosmid clone are selected by the 
antibiotic resistance marker in the vector. So a cosmid 
clone is simply a 35-50 kb plasmid which can be effi- 
ciently packaged into bacteriophage lambda particles 
and injected into E. coli cells. One of the advantages of 
cosmids for constructing genomic libraries of organ- 
isms with large genomes is that they have a cloning 
capacity about twice that of lambda vectors, i.e., they 
can accept inserts of up to about 40 kb, whereas lambda vectors are restricted to about 20 kb. A disadvantage
is that some cosmid clones are unstable on propaga- 
tion in E. coli due to the high copy number plasmid 
replication system. 


Preparation of Cosmid Vectors for Library Construction

Cosmid vectors are prepared in much the same man- 
ner as plasmids. The cloning site sticky ends are gen- 
erated by digestion with a restriction enzyme and a 
phosphatase is used to remove the 5’ phosphates from 
the vector to prevent vector to vector ligation. The 
insert fragments must be size selected so that the 
in vitro packaging of the DNA into bacteriophage 
lambda particles occurs efficiently. 


P1 Vectors and PACs

The P1 cloning system was developed by Nat Stern- 
berg for use in large genome mapping and sequencing 
projects. P1 vectors are much like cosmids in that they 
are plasmids that can be packaged in a phage particle 
for efficient injection into E. coli. The bacteriophage 
P1 phage head can hold about 110 kb of DNA. The
vectors designed for use with the P1 packaging system 
are up to about 30 kb in size so that the cloning capacity 
of P1 systems is 70-100 kb. The P1 cloning vectors feature two replication systems. The P1 replicon functions after the DNA is injected into the cell. This replicon maintains the P1 plasmid at a copy number of one, which minimizes the possibility that the insert will rearrange. An inducible replicon is available to increase the copy number 20- to 30-fold immediately before DNA purification. Because the clone is propagated at low copy number in E. coli, the P1 cloning system does not exhibit the instability of the cosmid system.

P1 artificial chromosomes, PACs, use the same P1 
cloning vectors but do not go through the in vitro 
packaging step. Instead, the inserts ligated to the vector
molecules are electroporated directly into E. coli 
cells. Without the need to be packaged into bacteri- 
ophage particles, the size of the inserts can be in- 
creased to greater than 100kb. The properties of 
PAC libraries and the procedures for making and 
manipulating them are similar to the BAC libraries 
discussed below. 


Preparation of P1 and PAC Vectors for Library Construction

P1 vectors are prepared in essentially the same way as 
bacteriophage lambda vectors (see above). PAC vec- 
tors are prepared in essentially the same way as BAC 
vectors (see below). 


Bacterial Artificial Chromosome Vectors 
Bacterial artificial chromosome vectors (BACs) were 
developed to permit the cloning and stable mainten- 
ance of large (100-200 kb) pieces of DNA in E. coli. 
Their stability and ease of handling have made these 
vectors increasingly popular for whole genome map- 
ping and sequencing projects from microbes, plants, 
and animals. The copy number of these cloning vec- 
tors is rigorously maintained at one by the BAC repli- 
cation system derived from the E. coli F plasmid. The 
use of these vectors with a recombination-deficient 
host allows DNA that is unstable in higher copy 
number cloning systems to be propagated without 
incurring deletions or rearrangements. Large insert 
libraries constructed in BAC vectors have served as 
the starting point for the sequencing of several organ- 
isms with large genomes including the human, mouse, 
the model plant Arabidopsis thaliana, and rice. 


Preparation of BAC or PAC Vectors for 
Library Construction 

As a result of the reduced electroporation efficiency of
100 kb circles (BAC clones) relative to 8 kb circles
(BAC vectors), even a minor amount of recircularized
vector present after ligation with inserts will yield a
major fraction of colonies containing only vector
molecules. To avoid this, considerable effort must be
expended to prevent the formation of BACs without
inserts during the library construction process.
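The arithmetic behind this warning is a transformation-weighted average. The sketch below is a simplified model, not a protocol calculation: it assumes that each kind of circle yields colonies in proportion to a single relative electroporation efficiency, and both inputs (the function name and the example values) are hypothetical.

```python
def vector_only_fraction(empty_vector_fraction: float,
                         relative_efficiency: float) -> float:
    """Expected fraction of colonies that carry vector without an insert.

    empty_vector_fraction: molar fraction of circles in the ligation that are
        recircularized empty vector (hypothetical input).
    relative_efficiency: electroporation efficiency of an insert-bearing clone
        relative to an empty vector circle (hypothetical input; 0.01 means the
        large clone transforms 100-fold less well).
    """
    if not 0.0 <= empty_vector_fraction <= 1.0:
        raise ValueError("empty_vector_fraction must lie between 0 and 1")
    if relative_efficiency <= 0.0:
        raise ValueError("relative_efficiency must be positive")
    # Weight each class of circle by how efficiently it transforms;
    # the empty vector's efficiency is taken as the reference value 1.
    empty = empty_vector_fraction * 1.0
    recombinant = (1.0 - empty_vector_fraction) * relative_efficiency
    return empty / (empty + recombinant)


# Hypothetical case: 1% empty vector, clones transform 100-fold less well.
print(f"{vector_only_fraction(0.01, 0.01):.2f}")  # about half of all colonies
```

Under these assumed numbers, only 1% empty vector in the ligation accounts for roughly half of the colonies, which is why the vector is purified so rigorously.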

The vector molecules are digested with the appro- 
priate restriction enzyme and the 5’ phosphates 
removed by treatment with a phosphatase. Linear 
vector molecules of the correct size are obtained by 
recovery from an agarose gel after sizing by electro- 
phoresis. The recovered linear vector molecules are 
then self-ligated in bulk to form multimers from 
those molecules still retaining the 5’ phosphates. 
These ligation products are removed by another 
round of agarose gel electrophoresis and size selection 
for the linear BAC monomer. 

At this point aliquots of the vector are self-ligated 
and ligated with a test insert in separate reactions. The 
reaction products are electroporated into E. coli and 
the colony counts are compared. The vector-only
electroporation should yield few to no colonies, and a large
number of colonies should be obtained with the test
insert. If the vector-only electroporation does give a
substantial number of colonies, then another round
of self-ligation and size purification is required. Once 
the desired result is obtained, the vector is ready to 
receive inserts. 
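The test described above reduces to comparing two colony counts. A minimal sketch, assuming that both reactions used the same amount of vector and identical electroporation conditions; the function names and the acceptance threshold `max_background` are hypothetical, and each laboratory would set its own cutoff.

```python
def vector_background(vector_only_colonies: int, test_insert_colonies: int) -> float:
    """Approximate fraction of library colonies expected to lack an insert.

    Assumes equal vector input and identical electroporation conditions
    in the vector-only and vector-plus-test-insert reactions.
    """
    if test_insert_colonies <= 0:
        raise ValueError("test-insert reaction gave no colonies; check the vector")
    return vector_only_colonies / test_insert_colonies


def vector_ready(vector_only_colonies: int, test_insert_colonies: int,
                 max_background: float = 0.01) -> bool:
    """True if background is acceptable; False means another round of
    self-ligation and size purification. max_background is a hypothetical cutoff."""
    return vector_background(vector_only_colonies, test_insert_colonies) <= max_background


print(vector_ready(3, 1200))    # background 0.0025: vector ready
print(vector_ready(150, 1200))  # background 0.125: purify again
```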


Yeast Artificial Chromosomes 

Yeast artificial chromosomes (YACs) provide the larg- 
est insert capacity of any cloning system. This system, 
developed by Burke and Olson in 1987, supports the 
propagation of exogenous DNA segments hundreds 
of kilobases in length. YACs representing contiguous 
stretches of genomic DNA (YAC contigs) have 
provided a physical map framework for the human, 
mouse, and even Arabidopsis genomes. 

The YAC vector itself provides the essential elem- 
ents for propagation of DNA as a chromosome in 
the yeast Saccharomyces cerevisiae. These elements 
include a yeast centromere, two functional telomeres, 
and auxotrophic markers for selection of the YAC in 
an appropriate yeast host. A problem encountered in 
constructing and using YAC libraries is that they 
typically contain chimeric clones, i.e., single clones
that contain DNA from different locations in the
genome.


Preparation of YAC Vectors for Library 
Construction 

YAC vectors are like P1s and lambda replacement 
vectors in that they require the isolation of two vector 
arms and removal of the 5’ phosphates from the arms 
to prevent vector-to-vector ligation and recircularization.
Since recircularized vector DNA can transform yeast 
with a high frequency, the presence of even a small 
quantity of such molecules can produce a high back- 
ground of vector only transformants. 


Average Insert Size and Representation 
of the Genome 


When a genomic library is intended only for screening
for a single gene, its representation of the genome is
sufficient as long as that gene is present. If the library is
intended for genome-wide studies, the usefulness of 
the library depends on maximizing the fraction of the 
entire genome present in the library. Genomic 
libraries are usually characterized by the size of the 
library, i.e., the number of clones in the library and the 
average insert size of the clones. 

The typical way of presenting library size is to 
determine the ratio of the amount of genomic DNA 
in the library to the amount of DNA in the genome. 
For example, if a human BAC library has an average insert size
of 100 kb and contains 300,000 clones, the library
contains 30 Gb of human DNA. Since the haploid 
human genome contains 3 Gb of DNA, the library is 
10 times larger than the genome. This is sometimes 
referred to as a 10 x library or a 10-hit library. This 
ratio is the library coverage value. 
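The coverage calculation in the example above can be written as a short function, together with its inverse for planning how many clones to pick. This is a minimal sketch; the function names are illustrative, and genome size is taken as the haploid size, as in the text.

```python
import math


def library_coverage(n_clones: int, mean_insert_bp: int, genome_bp: int) -> float:
    """Coverage = total cloned DNA divided by haploid genome size."""
    return n_clones * mean_insert_bp / genome_bp


def clones_for_coverage(coverage: float, mean_insert_bp: int, genome_bp: int) -> int:
    """Number of clones needed to reach a target coverage (rounded up)."""
    return math.ceil(coverage * genome_bp / mean_insert_bp)


# Example from the text: 300,000 BAC clones averaging 100 kb, 3 Gb genome.
print(library_coverage(300_000, 100_000, 3_000_000_000))  # 10.0
print(clones_for_coverage(7, 100_000, 3_000_000_000))     # 210000
```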

The coverage value indicates, on average, how
many times a particular sequence is present in the
library. This does not mean that libraries larger than 
1x in coverage contain the entire genome. Since the 
probability of finding a sequence in a library follows a 
Poisson distribution, assuming random cloning, some 
sequences will be present less often, or even absent, 
and others more often than the number indicated by 
the coverage value. For a 1x library, the probability of 
finding one or more clones containing a particular 
sequence is 0.632. For a 10x library this probability 
increases to 0.99995. A 5x to 7x library, with
probabilities ranging from 0.99 to 0.999, is generally
considered a useful library size.
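Under the Poisson assumption of random cloning, the probability that a sequence appears in at least one clone is 1 minus the probability of zero hits, that is 1 - e^(-c) for coverage c. A minimal sketch of that formula and its inverse follows; the function names are illustrative.

```python
import math


def prob_sequence_present(coverage: float) -> float:
    """Poisson probability that a given sequence is in one or more clones,
    assuming random cloning: 1 - exp(-coverage)."""
    return 1.0 - math.exp(-coverage)


def coverage_for_probability(p: float) -> float:
    """Coverage needed so that a sequence is represented with probability p."""
    if not 0.0 <= p < 1.0:
        raise ValueError("p must be in [0, 1)")
    return -math.log(1.0 - p)


for c in (1, 5, 7, 10):
    print(c, round(prob_sequence_present(c), 5))
print(round(coverage_for_probability(0.99), 2))  # 4.61
```

Evaluating at 1x, 5x, 7x, and 10x reproduces the 0.632, roughly 0.99, roughly 0.999, and 0.99995 figures quoted above, and the inverse shows why 5x to 7x is the usual target.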


Preparation of the Inserts 


The ideal genomic library contains all sequences pre- 
sent in the genome of the subject organism. In addition 
to the considerations of library size discussed above, 
sequences are absent from the library as a result of 
either of two additional circumstances. The first
circumstance is that a particular sequence, once in a
cloning vector, kills the host, or the sequence itself is
unstable in the host. If all of the cells containing a
sequence are killed, then that sequence will not be
present in the library. If the sequence is deleted or
rearranged in the host, it will not be found or recognized
in the library. These kinds of sequences are called
unclonable sequences. The problem of unclonable
sequences will be treated below. The other circumstance
occurs when a nonrandom method such as restriction
digestion is used for fragmenting the genomic DNA.
This results in some specific fragments that are too big
or too small to be cloned in a particular vector. There are
two approaches to the fragmentation of genomic DNA 
for the construction of libraries. One is by partial 
restriction digestion and the other is by physical shear- 
ing. Both methods are reviewed below. 


Genomic DNA Fragmentation by Partial 
Restriction Digestion 

Ideally, the fragmentation of genomic DNA for 
library construction is accomplished by a process 
that breaks DNA randomly. Physical shearing is the 
only way to generate truly random fragments. The 
first genomic libraries and the majority of libraries 
made to date were constructed utilizing fragmentation 
of the genomic DNA by partial digestion with a 
restriction endonuclease. Partial digestion with 
restriction enzymes can be used to break the DNA 
in an approximately random manner. For enzymes 
with a 4-base recognition sequence, the restriction 
site will statistically be present about once every 256
(4⁴) bases, assuming equal frequencies of the four bases.
For 6-base-recognizing enzymes this statistical frequency
is about once every 4096 (4⁶) bases. The primary
advantage of using restriction enzymes is that
the sticky ends and blunt ends generated by the 
enzymes can be efficiently ligated to a vector. Fre- 
quently a 4-base-recognizing enzyme is selected for 
genomic DNA fragmentation that gives a sticky
end compatible with the sticky end of a 6-base recognition
site in the vector that is used as the cloning site.
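The site frequencies above follow from treating the genome as a random sequence. The sketch below extends that to a simple model of a partial digest; it assumes independent bases with a given GC content and assumes that each site is cut independently with a fixed probability (`cut_fraction`), both simplifications, and the function names are hypothetical. GATC (recognized by MboI and Sau3AI) and GGATCC (BamHI) are used as a real 4-base/6-base pair with compatible sticky ends.

```python
def site_probability(site: str, gc_content: float = 0.5) -> float:
    """Probability that a random position starts the given recognition site,
    assuming independent bases with the stated GC content (a simplification)."""
    prob = 1.0
    for base in site.upper():
        if base in "GC":
            prob *= gc_content / 2
        elif base in "AT":
            prob *= (1 - gc_content) / 2
        else:
            raise ValueError(f"unsupported base: {base}")
    return prob


def mean_site_spacing(site: str, gc_content: float = 0.5) -> float:
    """Average distance in bp between occurrences of the site."""
    return 1.0 / site_probability(site, gc_content)


def mean_partial_fragment(site: str, cut_fraction: float,
                          gc_content: float = 0.5) -> float:
    """Mean fragment length if each site is cut independently with
    probability cut_fraction (hypothetical model of a partial digest)."""
    if not 0 < cut_fraction <= 1:
        raise ValueError("cut_fraction must be in (0, 1]")
    return mean_site_spacing(site, gc_content) / cut_fraction


print(mean_site_spacing("GATC"))              # 256.0
print(mean_site_spacing("GGATCC"))            # 4096.0
print(mean_partial_fragment("GATC", 0.006))   # roughly 43 kb
```

In this model, cutting fewer than 1% of GATC sites is enough to push the average fragment into the cosmid or BAC size range, which is why the digestion conditions must be titrated empirically.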

Conditions for the partial restriction digestion are 
determined empirically on an analytical scale. Geno- 
mic high-molecular-weight DNA is incubated with 
limiting amounts of the selected restriction enzyme 
for variable lengths of time. Samples of the digested 
DNA are removed at different time intervals and ana- 
lyzed by agarose gel electrophoresis to determine the 
size range of the digested DNA. The time point of the 
digestion containing the largest amount of DNA in 
the desired size range is used as the guide for the 
preparative digestion reaction. 
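Choosing the guide time point from the pilot digest can be sketched as follows. The gel readings are hypothetical values, not real data, and the sketch assumes equal amounts of DNA were loaded in each lane so that signal is comparable across lanes; the names `Lane`, `signal_in_range`, and `best_time_point` are illustrative.

```python
from typing import Dict, List, Tuple

# One gel lane: (fragment size in kb, relative DNA signal) pairs.
Lane = List[Tuple[float, float]]


def signal_in_range(lane: Lane, low_kb: float, high_kb: float) -> float:
    """Total DNA signal between low_kb and high_kb (inclusive)."""
    return sum(signal for size, signal in lane if low_kb <= size <= high_kb)


def best_time_point(lanes: Dict[int, Lane], low_kb: float, high_kb: float) -> int:
    """Digestion time (min) whose lane has the most DNA in the desired range."""
    return max(lanes, key=lambda minutes: signal_in_range(lanes[minutes], low_kb, high_kb))


# Hypothetical pilot digest read off a gel; not real data.
lanes = {
    5: [(300.0, 0.6), (200.0, 0.3), (100.0, 0.1)],
    10: [(250.0, 0.3), (150.0, 0.5), (80.0, 0.2)],
    20: [(120.0, 0.2), (60.0, 0.5), (20.0, 0.3)],
}
print(best_time_point(lanes, 100.0, 200.0))  # 10
```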

DNA of the desired size is isolated from the pre- 
parative partial digestion reaction by size fractionating 
the DNA by low-melting-point agarose gel electro- 
phoresis or sucrose gradient centrifugation. The agar- 
ose gel technique is the more versatile and can be used 
for obtaining insert fragments from the hundred base 
pair to the Mb size range in the appropriate gel system. 

Digestion with a restriction endonuclease raises
additional considerations for library construction. The
presence of compatible sticky ends on both ends of the
vector molecule and on the inserts allows the vector to
ligate to itself without an insert and also allows the
insert fragments to ligate to each other. This can lead
to vector-only clones and chimeric clones in the library.
Procedures to minimize these outcomes must be employed.


Genomic DNA Fragmentation by Physical 
Shearing 
Physical shearing fragments DNA in a random fashion.
Depending on the desired size of DNA inserts, differ- 
ent procedures are applied to accomplish the frag- 
mentation. Sonication can be used to reduce the size 
of genomic DNA fragments into the hundreds of base 
pairs range. Nebulization using a disposable medical 
nebulizer can be used to achieve reproducible frag- 
mentation to obtain products in the 1500 bp to 10 kb 
size range. The desired size ranges are obtained by 
using different gas pressures to achieve the nebulization.
The minimum pressure that achieves slow nebulization
(about 5-6 lb/sq. in., or roughly 35-41 kPa) is used to obtain
fragments in the 10 kb range. Higher pressures shear the
DNA to smaller fragments. DNA fragments larger than
10 kb can frequently be obtained directly from the 
DNA purification procedure. Alternatively, a BAL31 
exonuclease digestion can be used to reduce the size of 
genomic DNA obtained after extraction from the cells. 

The use of a physical shearing process to obtain 
DNA fragments for library construction necessitates 
additional steps before the inserts can be cloned into 
the library vector. Sheared DNA has ragged ends in 
contrast to the defined sticky or blunt ends generated 
by restriction enzyme cutting. Before sheared frag- 
ments can be cloned into a vector, the ragged ends 
must be repaired. The ends are either repaired to blunt
ends, which can then be blunt-end-cloned into the
vector, or repaired to blunt ends and then modified by
the addition of oligonucleotide adaptors to give sticky
ends, which can then be sticky-end-cloned into the
vector. Since blunt-end cloning is inherently
inefficient, the preferred method is to modify the
fragment ends with adaptors or linkers. Adaptors are
small synthetic pieces of DNA that contain one blunt
end and one sticky end compatible with a restriction
enzyme-generated end. Linkers are small, completely
double-stranded, blunt-ended pieces of DNA containing
the recognition sequence for a restriction enzyme.
An important feature of the adaptor strategy is that
the adaptors must have sticky ends that are not
self-complementary, so that the adaptors cannot ligate
together through their sticky ends. The most commonly
used adaptor for this strategy is the commercially
available BstXI adaptor.
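Whether an overhang can pair with an identical overhang on another molecule comes down to whether it equals its own reverse complement. The sketch below checks that property; the helper names are illustrative. The EcoRI (AATT) and BamHI/Sau3AI (GATC) overhangs are real palindromic examples, while "ACAC" is an arbitrary non-palindromic sequence and is not the actual commercial BstXI adaptor overhang. The BstXI recognition site contains unspecified bases, which is what allows a non-self-complementary overhang to be chosen.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def is_self_complementary(overhang: str) -> bool:
    """True if two identical overhangs can anneal, i.e. the overhang
    equals its own reverse complement (a palindromic sticky end)."""
    return overhang.upper() == reverse_complement(overhang)


print(is_self_complementary("AATT"))  # EcoRI 5' overhang: True
print(is_self_complementary("GATC"))  # BamHI/Sau3AI 5' overhang: True
print(is_self_complementary("ACAC"))  # illustrative non-palindromic overhang: False
```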

The steps in the adaptor strategy process are: 


1. After completion of the genomic DNA preparation,
a quantity of the DNA is sheared to the desired size.
The size range of the sheared DNA is verified by
analytical agarose gel electrophoresis.

2. The DNA is size-fractionated to obtain fragments
of the desired size, typically using preparative
low-melting-point agarose gel electrophoresis.

3. After extraction from the gel, the ragged ends of the 
fragments are repaired to blunt ends using T4 DNA 
polymerase. 

4. The now blunt ends on the fragments are ligated to 
oligonucleotide adaptors. The adaptors are selected 
to have sticky ends compatible with those on the 
library vector. 

5. The excess adaptors and any chimeric fragments 
generated in the ligation reaction are removed 
by recovering the fragments of the desired size 
from low-melting-point agarose. At this point 
the insert fragments are ready for ligation to the 
library vector. 
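Steps 1, 2, and 5 all rely on random breakage followed by size selection. The simulation below is an idealized model that assumes uniformly random break points and a perfectly sharp gel slice; the molecule length, number of breaks, size window, and function names are hypothetical. It illustrates that random shearing produces a broad size distribution, so only part of the sheared DNA is recovered in any one size window.

```python
import random
from typing import List


def shear(length_bp: int, n_breaks: int, rng: random.Random) -> List[int]:
    """Break one molecule at n_breaks distinct, uniformly random positions.

    This idealizes physical shearing as fully random breakage.
    """
    cuts = sorted(rng.sample(range(1, length_bp), n_breaks))
    edges = [0] + cuts + [length_bp]
    return [end - start for start, end in zip(edges, edges[1:])]


def size_select(fragments: List[int], low_bp: int, high_bp: int) -> List[int]:
    """Keep fragments inside the window, as an idealized gel slice would."""
    return [f for f in fragments if low_bp <= f <= high_bp]


rng = random.Random(1)
# Hypothetical: a 1 Mb molecule broken 249 times gives a mean fragment of 4 kb.
fragments = shear(1_000_000, 249, rng)
selected = size_select(fragments, 2_000, 5_000)
mass_recovered = sum(selected) / sum(fragments)
print(len(fragments), len(selected), round(mass_recovered, 2))
```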


Future Genomic Libraries 


There are more genomic libraries being made now 
than at any time in the past. These libraries are being 
made to support genome-wide mapping and sequencing
projects. The scale and scope of these projects demand
very high-quality libraries, as discussed earlier. Most of
these requirements result from the high cost of DNA
sequencing and from the need to assemble the sequence
reads from both ends of a clone into contiguous
sequence. When the sequence is assembled
in these projects, unclonable sequences remain as gaps 
in the assembly. These gaps are expensive and time- 
consuming to fill. At The Institute for Genomic 
Research, Rockville, MD, and elsewhere the issue of 
vector design to minimize the incidence of unclonable 
sequences is being investigated. Some sequences are 
unclonable because the DNA is unstable in E. coli or
because the RNA or protein product of a sequence is 
toxic to E. coli. Using E. coli host strains that are 
recombination deficient, which is common practice, 
minimizes the unstable DNA problem. The deleteri- 
ous consequences of unstable DNA and toxic prod- 
ucts are ameliorated by use of a vector that is 
maintained at a lower copy number. Plasmid vectors 
with replication systems that maintain copy number 
from 500-700 (pUC) down to 1 (BAC), and at many 
copy number levels in between, can be explored for 
genomic library applications. 

An additional issue for clone viability is transcription
of the insert region or transcription originating within
the insert. The first can express toxic products encoded
by the insert; the second may initiate transcription that
interferes with replication as it extends around the
plasmid vector circle. One approach to dealing with this
issue is to design a vector in which the entire cloning
region is isolated from RNA transcription. Strong
promoters oriented toward the cloning site, such as the
lac promoter contained in the pUC series of vectors,
should not be present. Such promoters can lead to
expression of toxic peptides encoded by the insert, and
might contribute to transcription-stimulated
recombination events in the insert region.
In addition, it would be desirable to enclose the insert 
region within strong transcription terminators. The 
terminators serve a dual purpose. Firstly, they prevent 
strong promoters that might be present in the cloned 
insert from transcribing into the vector sequence 
and possibly interfering with plasmid replication. 
Secondly, they prevent transcription arising in the 
surrounding vector sequence from reading into the 
insert. 

As the vectors and associated library construction 
strategies continue to develop in supporting genome 
sequencing projects, the quality of the libraries 
will continue to increase. The level of coverage of the 
genome will improve as more sequences in the gen- 
ome are removed from the unclonable category by 
library vector design and by the use of physical shear- 
ing for fragmentation of the genomic DNA. Addition- 
ally, library construction strategies will be used that 
minimize the incidence of chimeric clones in libraries. 
The development of genomic library technology in 
these directions will result in better libraries being 
available for any application. 


Further Reading 

Birren B, Green ED, Klapholz S et al. (eds) (1999) Genome Analysis:
A Laboratory Manual, vol. 3, Cloning Systems. Plainview, NY:
Cold Spring Harbor Laboratory Press.

Sambrook J and Russell DW (2001) Molecular Cloning: A Laboratory
Manual, 3rd edn. Plainview, NY: Cold Spring Harbor Laboratory
Press.


See also: Genome; Human Genome Project; 
Phage λ Integration and Excision; Plasmids;
Vectors 
Genomics

Genomics is the term for the study of the genome, the
DNA content of a cell. 


See also: Functional Genomics; Genome; 
Genome Organization; Genome Size 


Genotype 
L Silver 
For any one organism, its genotype is the set of alleles
present at one or more loci under investigation. At any 
one autosomal locus, a genotype will be either homo- 
zygous (with two identical alleles) or heterozygous 
(with two different alleles). 


See also: Heterozygote and Heterozygosis; 
Homozygosity 


Genotypic Frequency 
A Clark 


Copyright © 2001 Academic Press 
doi: 10.1006/rwgn.2001.0562


Populations consist of assemblages of individuals each 
having its own genotype. Considering the entire gen- 
ome, in all but exceptional cases, like identical twins or 
clonal organisms, each individual has a unique geno- 
type. If we restrict our attention to one or a few genes 
at a time, then there will be many individuals having 
the same genotype. Considering just a single gene, there 
may be one, two, or more alleles segregating in the 
population. If there is only one allele, then all geno- 
types are the same and the genotypic frequency is 1. If 
there are two alleles, say A and a, then there may be as 
many as three genotypes, AA, Aa, and aa. The
‘genotypic frequency’ is defined as the count of a genotype
divided by the total count of individuals in the sample. 


Numerical Example 


If a sample of genotypes from a population consists of 
16 AA, 48 Aa, and 36 aa, then the frequency of geno- 
type AA is 16/100 = 0.16. Similarly the frequencies of 
Aa and aa are 0.48 and 0.36, respectively. Note that the 
sum of the frequencies of all genotypes is 1. Notice 
also that these are estimates of the genotypic frequen- 
cies. Out of the entire population, the true frequency 
of genotype AA may be slightly different from 0.16. 
We can estimate our statistical confidence in the geno- 
typic frequency estimate by assuming that the sam- 
pling was done by randomly drawing individuals from 
the population. Under this kind of sampling, the variance
of the estimated frequency of a genotype with frequency x
is approximately x(1 − x)/n, where n is the sample size.
It should be clear that the larger our sample is, the
smaller this variance will be, and the better our
estimate of genotypic frequency.
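The numerical example can be reproduced directly. This minimal sketch computes each genotypic frequency as count divided by sample size, and its approximate standard error as the square root of x(1 − x)/n, assuming random sampling as in the text; the function names are illustrative.

```python
def genotype_frequencies(counts: dict) -> dict:
    """Genotypic frequency = count of a genotype / total individuals sampled."""
    n = sum(counts.values())
    return {genotype: count / n for genotype, count in counts.items()}


def sampling_variance(x: float, n: int) -> float:
    """Approximate variance x(1 - x)/n of a frequency x estimated from
    a random sample of n individuals."""
    return x * (1 - x) / n


counts = {"AA": 16, "Aa": 48, "aa": 36}
n = sum(counts.values())
freqs = genotype_frequencies(counts)
for genotype, x in freqs.items():
    se = sampling_variance(x, n) ** 0.5
    print(f"{genotype}: frequency {x:.2f}, standard error {se:.3f}")
```

For genotype AA the standard error is about 0.037; because the variance falls as 1/n, a sample four times larger would halve it.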


Hidden Variation 


Whenever we determine what genotypes individuals 
have, we nearly always restrict attention to one or a 
few genes. Calculation of the genotypic frequencies of 
the genes we observe is done in the same way, whether 
we score only the one gene or many other genes. There 
will always be hidden or unobserved variation lying 
within each of the genotypic classes. Another kind of 
hidden variation that makes calculation of genotypic 
frequencies difficult is dominance. The simplest kind 
of dominance occurs when genotypes AA and Aa both 
have the same phenotype. In this case we cannot
directly count the genotypes, so only indirect estimation
of genotype frequencies is possible, and some additional
assumptions about the population are needed. For
example, if we were willing to assume that the
population is in Hardy-Weinberg equilibrium, then we
could estimate the frequency of the a allele from the
frequency of the recessive aa phenotype, and from the
allele frequencies estimate the frequencies of AA and Aa.
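Under the Hardy-Weinberg assumption, the aa class has frequency q², so q is the square root of the recessive phenotype frequency, p = 1 − q, and the hidden classes follow as p² and 2pq. A minimal sketch, assuming complete dominance of A and Hardy-Weinberg equilibrium; the function name is illustrative.

```python
import math


def genotypes_under_dominance(recessive_phenotype_freq: float) -> dict:
    """Estimate AA, Aa, aa frequencies when AA and Aa look alike.

    Assumes Hardy-Weinberg equilibrium, so the aa class has frequency q**2.
    """
    if not 0.0 <= recessive_phenotype_freq <= 1.0:
        raise ValueError("frequency must lie between 0 and 1")
    q = math.sqrt(recessive_phenotype_freq)  # frequency of allele a
    p = 1.0 - q                              # frequency of allele A
    return {"AA": p * p, "Aa": 2 * p * q, "aa": q * q}


# Only the dominant and recessive phenotypes are observed: 64 vs 36 of 100.
estimates = genotypes_under_dominance(36 / 100)
print({g: round(f, 2) for g, f in estimates.items()})
```

Applied to the sample in the numerical example, this recovers frequencies of 0.16, 0.48, and 0.36, matching the direct counts, because that sample happens to fit Hardy-Weinberg proportions.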


See also: Allele Frequency; Hardy-Weinberg Law 


Germ Cell 
Germ cells are a central component of sexual repro-
duction in animals. They are the route by which the 
genome and cytoplasmic components are transferred 
to the next generation. This route utilizes meiosis and 
gametogenesis, processes that are unique to germ cells. 
Germ cells differentiate to produce male and female 
gametes, sperm and unfertilized eggs (oocytes or ova), 
and undergo meiosis to produce a haploid set of 
chromosomes. Haploid gametes then unite to form a 
diploid zygote that develops into a new individual. 
Germ-cell-mediated sexual reproduction thus creates 
genetic diversity, which is essential for evolution: 
meiosis and gamete fusion generate offspring that
are genetically dissimilar from each other and distinct 
from either parent. In many animals, there is a germline
lineage, composed of germ cells that will form gametes,
and a somatic lineage, containing the majority of cells,
which form the rest of the organism (tissues such as gut,
limbs, etc.). One can take the view that the raison d’être
for an organism’s somatic cells is to facilitate the
function of the germline so that its genetic material is
passed to the next generation. The features that give germ
cells their unique character, and their development, are
described below in general terms and with specific
organismal examples.

Figure 1 Cycle of the germline. Soon after formation of the diploid zygote, germ cells become specified as distinct
from somatic cells that will give rise to the rest of the organism. These primordial germ cells migrate and then
interact with specific somatic cells to form the gonad. Germ cells proliferate and then initiate meiotic development
(enter meiotic prophase). The timing of proliferation and entry into meiotic prophase depends on the species and
the sex. For example, in female mammals all germ cells have entered meiotic prophase prior to birth, while in male
mammals, proliferation and entry into meiotic prophase is continuous in sexually mature animals. Reciprocal
recombination between homologs occurs during meiotic prophase I. Many of the activities of gametogenesis occur
contemporaneously with late stages of meiotic prophase I. For male germ cells, progression through meiotic
prophase I and the divisions of meiosis I and meiosis II occur without pause. Female germ cells of most species arrest
late in meiotic prophase. Following external signals, the oocyte matures and progresses through meiosis I. A number
of species have a second arrest point (e.g., vertebrate oocytes arrest in meiosis II). Fertilization relieves the arrest,
resulting in the completion of meiosis and the initiation of a new round of zygotic development.


Germline Development 


The development of germ cells is similar among ani- 
mals (Figure 1), although the details often differ 
between species and between sexes of the same spe- 
cies. Germ cells are usually separated from somatic 
cells in early development. In a number of nonmam- 
malian species, cytoplasmic ‘germ plasm’ in the unfer- 
tilized egg (also called pole plasm, germinal granules, 
or P-granules, depending on the organism) may spe- 
cify the germ cell identity. In the fruit fly Drosophila, 
the nematode Caenorhabditis elegans, and various 
amphibians, embryonic cells that contain ‘germ plasm’ 
usually develop as germ cells. The ‘germ plasm’ is 
prelocalized in the Drosophila oocyte or becomes 
asymmetrically segregated to certain blastomeres 
during cleavage divisions in C. elegans. By contrast,
in the mouse (and likely other mammals), cell-cell
interactions are important for germline specification 
and localized maternal ‘germ plasm’ appears not to be 
involved. Once specified, these primordial germ cells 
migrate and populate the forming gonad, i.e., the 
female ovary or the male testis. Germ cells in the 
gonad enter the meiotic pathway and undergo either 
oogenesis (development of the egg) or spermatogen- 
esis. The mechanism by which the sex of germ cells is 
determined depends on both germ cell autonomous 
influences (sex chromosomes, X/A ratio, and/or 
maternal factors) and signals from the somatic gonad, 
and varies widely among different species. 


Meiosis 


The process by which diploid germ cells produce 
haploid gametes that contain only one of each homo- 
logous chromosome is called meiosis. Following the 
last mitotic division, germ cells initiate meiosis/game- 
togenesis, undergoing a period of DNA synthesis 
such that both the maternal and paternal homologous 
chromosomes are duplicated, resulting in each containing
two sister chromatids. Prior to the first meiotic division,
the germ cells are in prophase (prophase I) for a
prolonged period, which can last more than 40 years for
mammalian oocytes. While meiotic prophase I resembles
G2 of the mitotic cell cycle, it is distinct in two important
ways: (1) the chromosomes proceed through a series of
stages (leptotene, zygotene, pachytene, diplotene, and
diakinesis) that are associated with the process of
reciprocal recombination (crossover) between homologs;
and (2) there are many synthetic activities and
morphological changes associated with gamete
differentiation. Early in meiotic prophase I, the maternal
and paternal homologous chromosomes pair and synapse
and initiate recombination (leptotene and zygotene
stages). The paired homologs
are assembled into an elaborate structure called the 
synaptonemal complex that is maintained throughout 
the pachytene stage. The pair of homologous
chromosomes, each with two closely opposed sister
chromatids, is called a bivalent or tetrad. The chromosomes
desynapse and the synaptonemal complex dissolves 
during diplotene. At this time, chiasmata are formed 
that are attachment points between recombined 
homologs; chiasmata are considered to be the mor- 
phological consequence of prior crossovers between 
two nonsister chromatids of a bivalent. These pro- 
cesses prepare the chromosomes for the specialized 
two successive cell divisions of meiosis. 

The meiosis I reductional division (MI) separates 
the two homologous chromosomes. There are at least 
three features of the MI reductional division that differ 
from mitosis. First, chiasmata and sister chromatid 
cohesion distal to the chiasmata serve an essential 
function, analogous to a mitotic centromere, of hold- 
ing and aligning the maternal and paternal chromo- 
somes until MI anaphase. Second, the kinetochores 
(the site of attachment of the spindle microtubules) 
of the two sister chromatids of a homolog behave as a 
single unit, ensuring that both proceed to the same
pole. Third, in the transition from metaphase to ana- 
phase of MI, the previously closely opposed sister 
chromatids become unglued and the chiasmata dis- 
solve leading to segregation of the homologs to oppos- 
ite poles. Following MI, the meiosis II equational 
division (MII) occurs, without an intervening period 
of DNA synthesis, where the sister chromatids are 
separated to opposite poles in a similar way to a mito- 
tic division. Meiosis thus generates genetic diversity in 
two ways: random assortment of chromosomes at the 
MI and MII divisions and the reshuffling of genetic 
material through recombination during prophase I. 
Disruption of any of the steps in meiosis can cause
abnormal chromosome segregation (called nondisjunction)
producing aneuploid gametes. The resulting
progeny may have birth defects as a consequence of 
nondiploid chromosome number (e.g., Down syn- 
drome, trisomy 21). 
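The contribution of random assortment alone can be quantified. This sketch counts the possible chromosome sets produced by independent assortment of homolog pairs at MI and ignores crossing over, which raises the real number far higher; the function names are illustrative.

```python
import random


def assortment_combinations(haploid_number: int) -> int:
    """Distinct gametes from independent assortment of homolog pairs at MI,
    ignoring recombination."""
    return 2 ** haploid_number


def draw_gamete(haploid_number: int, rng: random.Random) -> list:
    """For each homolog pair, pick the maternal ('M') or paternal ('P') copy."""
    return [rng.choice("MP") for _ in range(haploid_number)]


print(assortment_combinations(23))  # 8388608 for the human haploid number of 23
print("".join(draw_gamete(23, random.Random(0))))
```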
Gametogenesis


The sperm and egg are highly specialized for their 
different tasks. Sperm are small, highly motile and 
efficient in the process of fertilization. In many spe- 
cies, sperm also provide the centrioles necessary for 
zygotic development. The egg is very large, ranging 
from a thousand- to more than a millionfold the mass 
of a typical somatic cell, depending on the species. The 
egg supplies organelles (e.g., mitochondria), nutrients, 
precursors, RNAs, proteins, and a protective covering 
or shell. The stored RNAs and proteins provide 
the materials necessary to direct embryogenesis until 
expression from the zygotic genome is initiated. For 
many nonmammalian species, the unfertilized egg 
contains molecular determinants that are either pre- 
localized or become localized following fertilization, 
which provide polarity information for the develop- 
ing embryo. In addition, for nonmammalian species, 
the stored materials allow embryogenesis to occur 
externally without further support from the mother. 
In the production of gametes, the nuclear events of 
meiosis and the cellular differentiation of oogenesis 
and spermatogenesis are intimately intertwined. Much 
of the RNA and protein synthetic activity necessary 
for gametogenesis occurs in pachytene and diplotene. 
For oocytes, the massive growth usually occurs in 
diplotene. The large accumulation of material in the 
oocyte is also often assisted by somatic gonad cells 
(follicle cells) and, for many invertebrates, can also be 
aided by other germ cells called nurse cells. For many 
nonmammalian species, yolk is synthesized outside 
the ovary and is transported to growing oocytes.
Spermatogenesis usually occurs continuously, without
arrest, in reproductively mature males. The meiotic 
divisions produce four haploid spermatids, of equal 
size, which then undergo extensive postmeiotic differ- 
entiation to produce mature spermatozoa. Oogenesis 
has a number of features that are distinct from sperm- 
atogenesis. To generate the large size of the egg, the 
meiotic divisions are unequal: MI generates a large
oocyte (often called the secondary oocyte), which retains
one replicated homolog of each pair, and a small first
polar body, and MII produces a large haploid egg and a
small second polar body. Oogenesis
is often arrested in prophase to allow oocyte growth 
and to provide a means of regulating egg release. In 
most vertebrates, the prophase arrest is in diplotene. 
The release from prophase arrest (called meiotic 
maturation) is regulated by external cues (e.g., hormo- 
nal signals from the menstrual or estrus cycle). In 
many vertebrate species, there is a second arrest at 
metaphase of MII. Following ovulation, where the 
egg is discharged from the ovary, fertilization releases 
the arrest resulting in the completion of meiosis and 
the initiation of zygotic development. The point in 
oocyte/egg development at which fertilization occurs
varies from late prophase to after the MII division, 
depending on the species. 


Immortality and Totipotency 


The life history of the germline thus marches from 
fertilization to fertilization, proceeding through the 
stages of germ cell specification, migration and gonad 
formation, proliferation, and entry into and progres- 
sion through meiosis and gametogenesis (Figure 1). 
This cycle of the germline, from generation to genera- 
tion, is a central feature of the continuum of multi- 
cellular life. Because the germline is essentially 
continuous from generation to generation, the germ- 
line lineage can be thought of as being ‘immortal,’ 
although individual germ cells are not. 

The fertilized egg is totipotent as it will give rise to 
all the cell types and cell assemblies that constitute the 
organism. Since germ cells form the zygote, they can 
be considered as carrying the property of totipotency. 
In certain cases (e.g., mouse), cells from cell lines 
derived from primordial germ cells (embryonic germ 
(EG) cells), as well as cell lines from early embryos 
(embryonic stem (ES) cells), have been experimentally 
demonstrated to be totipotent. These cell lines have 
been very useful for genetic manipulations in the 
mouse, allowing targeted mutations to be generated 
and studied in the whole organism. 
Identification of individual chromosomes of the laboratory mouse (genus Mus) was virtually impossible until the development of methods for staining metaphase chromosomes to reveal their differential banding patterns. A method for banding mouse chromosomes was first developed using quinacrine mustard fluorescence by Lore Zech and Torbjörn Caspersson in 1969-1970. During the early 1970s, several laboratories developed methods using Giemsa stain and various combinations of heat and trypsin treatment, called the ASG (acetic acid-saline-Giemsa) or ASG/trypsin methods. Edward P. Evans was one of the key scientists involved in developing high-quality Giemsa banding (G banding) of mouse chromosomes. The Giemsa stain used in these methods is the same as that traditionally used for staining blood smears. In the mid-1990s, fluorescence banding of chromosomes returned with the use of DAPI and related stains to identify mouse chromosomes in fluorescence in situ hybridization (FISH) gene mapping methods. G banding, however, remains the best method for high-resolution identification of banding patterns in mouse chromosomes and of chromosomal aberrations.

The basis of all these banding methods appears to be the relative frequency of A-T versus G-C base pairs in a stretch of chromosomal DNA. An extensive literature on 'chromosomal banding' was published during the mid-1970s. It should be noted that even G-banded mouse chromosomes can be difficult for the novice to identify and classify. Although the banding patterns of individual chromosomes are invariant (except for the pericentromeric heterochromatin C bands), they may appear different at different stages of chromosomal contraction. Cowell produced a good guide to classification, with photographs of mouse chromosomes at different stages of contraction (Cowell, 1984).

A standard method for preparing G-banded metaphase chromosomes from living mice is outlined below; details on technique and sources of reagents may be found in Davisson and Akeson (1987). The same method can be used to prepare G-banded chromosomes from any mitotic tissue in the mouse. For example, suspensions of bone marrow cells can be washed out of femurs with a 23- to 25-gauge needle, or solid tissues such as the spleen can be minced and pipetted to obtain cell suspensions. To prepare metaphase chromosomes from live mice, approximately 70 µl of blood is drawn by retro-orbital or tail vein bleeding and mixed immediately with 0.1 ml sterile sodium heparin (500 USP units ml⁻¹). Blood is cultured in 16 × 125 mm disposable culture tubes. A 0.2 ml volume of the whole blood/heparin mixture is inoculated into 0.95 ml of RPMI 1640 culture medium containing glutamine, HEPES buffer, and gentamicin solution (final concentration 0.1 mg ml⁻¹), supplemented with 0.15 ml of fetal bovine serum, 0.1 ml of 750 µg ml⁻¹ lipopolysaccharide (LPS), and 0.1 ml of 60-90 µg ml⁻¹ purified PHA (phytohemagglutinin; concentration determined by a dose-response curve for each batch of PHA). The cultures are incubated at an angle of approximately 45° for 43 h at 37 °C in a shaking water bath. Colchicine (0.15 ml of a 50 µg ml⁻¹ solution) is added to each culture for the last 15-20 min. Cells are harvested by centrifugation, resuspended in hypotonic 0.56% (0.075 M) potassium chloride for 15 min, centrifuged, and fixed in methanol:glacial acetic acid (3:1). After 30 min the cells are centrifuged and washed three times in fresh methanol:glacial acetic acid fixative.
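The volumes in this protocol imply final working concentrations that can be checked with the dilution relation C1V1 = C2V2. The Python sketch below does only that arithmetic, under stated assumptions: volumes are additive, the stock concentrations are those quoted above (the 60 and 90 µg ml⁻¹ ends of the PHA range are treated separately), the KCl percentage is weight/volume, and the molar mass of KCl is taken as 74.55 g mol⁻¹. The function and variable names are illustrative, and the sketch is a bookkeeping aid, not part of the published protocol.

```python
"""Dilution arithmetic for one blood culture tube (illustrative sketch)."""


def final_conc(stock_conc: float, stock_vol: float, total_vol: float) -> float:
    """Solve C1 * V1 = C2 * V2 for C2; units follow stock_conc."""
    return stock_conc * stock_vol / total_vol


# Volumes in ml, as listed in the protocol text.
blood_mix, rpmi, fbs, lps_vol, pha_vol = 0.2, 0.95, 0.15, 0.1, 0.1
culture_vol = blood_mix + rpmi + fbs + lps_vol + pha_vol  # assumes volumes simply add

lps = final_conc(750.0, lps_vol, culture_vol)      # µg/ml
pha_low = final_conc(60.0, pha_vol, culture_vol)   # µg/ml, low end of PHA stock range
pha_high = final_conc(90.0, pha_vol, culture_vol)  # µg/ml, high end of PHA stock range
fbs_percent = 100.0 * fbs / culture_vol            # % (v/v) serum

# Colchicine (0.15 ml of 50 µg/ml) is added on top of the existing culture volume.
colchicine = final_conc(50.0, 0.15, culture_vol + 0.15)  # µg/ml

# Hypotonic KCl: 0.56% (w/v) = 0.56 g per 100 ml = 5.6 g per litre.
KCL_MOLAR_MASS = 74.55  # g/mol (assumed standard value)
kcl_molar = 0.56 * 10 / KCL_MOLAR_MASS  # g/l divided by g/mol gives mol/l

print(f"culture volume      {culture_vol:.2f} ml")
print(f"LPS                 {lps:.1f} ug/ml")
print(f"PHA                 {pha_low:.1f}-{pha_high:.1f} ug/ml")
print(f"fetal bovine serum  {fbs_percent:.1f} %")
print(f"colchicine          {colchicine:.2f} ug/ml")
print(f"KCl                 {kcl_molar:.3f} M")
```

With these inputs the culture volume is 1.5 ml, LPS ends up at 50 µg ml⁻¹, PHA at 4-6 µg ml⁻¹, serum at 10%, colchicine at roughly 4.5 µg ml⁻¹, and 0.56% KCl works out to about 0.075 M.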

The method of slide preparation is important 
because well-spread metaphases are critical for high 
quality G-banded preparations. Precleaned slides are
soaked in fixative for at least 15 min prior to use. Air-dried
metaphases are prepared by dropping a few small 
drops of cell suspension onto a precleaned slide, 
allowing it to spread, and then rapidly blowing dry 
when the drop begins to contract and rainbow colors 
appear at the edges. Some cytogeneticists believe 
spreading is improved by dropping a very small drop 
of clean fixative onto the preparation just as it starts to 
dry and allowing the slide to dry in a horizontal posi- 
tion. G bands appear sharper if slides are aged at room 
temperature for 7-10 days. 

To prepare G-banded chromosomes, slides are incubated in Coplin jars (no more than five or six per jar) in 2× SSC at 60-65 °C for 1.5 h and transferred to 0.9% NaCl at room temperature; each slide is then rinsed individually in fresh 0.9% NaCl and drained. Thorough rinsing is critical. Slides are stained for 5-7 min in a trypsin-Giemsa solution (1.0 ml Gurr improved Giemsa R66, 45 ml Gurr pH 6.8 phosphate buffer, 4 drops 0.0125% trypsin), transferred to Gurr phosphate buffer diluted 1:1 with distilled water, rinsed individually in two changes of the buffer-distilled water solution, and blown dry. Factors that influence the chromosomal response to trypsin treatment, and therefore G-band quality, include chromosome length (contracted chromosomes are more sensitive than elongated ones), chromosome dryness (recently made preparations are more sensitive than aged ones), and chromosome fixation time (sensitivity is inversely proportional to fixation time or chromosome hardness).


Further Reading 

Akeson EC and Davisson MT (2000) Analyzing mouse chromosomal rearrangements with G-banded chromosomes. In: Jackson IJ and Abbott C (eds) Mouse Genetics and Transgenics: A Practical Approach Series, 2nd edn, pp. 144-153. Oxford: Oxford University Press.

Committee on Standardized Genetic Nomenclature for Mice (1972) Standard karyotype of the mouse, Mus musculus. Journal of Heredity 63: 69-71.

Lyon MF, Rastan S and Brown SDM (eds) (1996) Genetic Variants and Strains of the Laboratory Mouse, 3rd edn. Oxford: Oxford University Press.
Walter Gilbert (1932- ), an American molecular biologist, was born on 21 March 1932 in Boston, Massachusetts. He was educated at Harvard University and the University of Cambridge, receiving the PhD with a thesis on particle physics in 1957. He did postdoctoral work in physics at Harvard, and in 1959 joined the Physics faculty at Harvard. In the summer of 1960 he joined James Watson and François Gros in Watson's laboratory in research on messenger RNA. This initial exposure to molecular biological research redirected his career from theoretical physics to molecular biology, where he has made his major scientific contributions, and he subsequently transferred to the faculty in Biochemistry and Molecular Biology at Harvard. In 1982 he left Harvard to head the Swiss biotechnology company Biogen, but returned to Harvard in 1984. Among his many honors, Gilbert received the Nobel Prize in Chemistry in 1980, sharing it with Frederick Sanger and Paul Berg.

His early research focused on the utilization of mRNA and the mechanisms of protein synthesis, especially the relationships between the messenger RNA, the ribosome, and the transfer RNA. In the mid-1960s, Gilbert and Benno Müller-Hill isolated the protein that functions as the repressor of the lactose operon in Escherichia coli, the first genetic control element to be isolated. This work led to his investigation of the physical basis of gene regulation through studies of the interaction of the lac repressor with RNA polymerase and fragments of DNA. In 1968 Gilbert and David Dressler proposed the 'rolling-circle model' for DNA replication, which gave the first clear indication of how certain small phages might replicate their DNA. This model was quickly extended to many other systems and subjected to experimental tests. In the mid-1970s, Allan Maxam and Gilbert developed an ingenious method to determine the sequence of nucleotides in DNA by base-specific chemical cleavages of end-labeled DNA fragments, followed by size fractionation by gel electrophoresis. This method, often called the chemical method or the 'Maxam-Gilbert' method, was widely used in the early stages of DNA sequence analysis until it was supplanted by the simpler enzymatic methods developed by Fred Sanger.
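The logic of reading a chemical-cleavage sequencing gel can be illustrated with a toy Python model. It assumes the four classic base-specific reactions (G, A+G, C+T and C), a single label at the 5′ end, and on average one cut per labeled molecule, so that cleavage at base i leaves a labeled fragment exactly i nucleotides long. Real gels involve partial reactions, band intensities and mobility effects that this sketch ignores; the function names are illustrative only.

```python
"""Toy model of reading a Maxam-Gilbert sequencing ladder (illustrative only)."""
from typing import Dict, Set

# The four base-specific reactions and the bases at which each one cleaves.
REACTIONS = {
    "G": {"G"},
    "A+G": {"A", "G"},
    "C+T": {"C", "T"},
    "C": {"C"},
}


def run_reactions(seq: str) -> Dict[str, Set[int]]:
    """Return, for each reaction lane, the lengths of 5'-end-labeled fragments.

    Cleavage at base i (0-based) destroys that base and leaves a labeled
    fragment made of bases 0..i-1, i.e. a fragment of length i.
    """
    return {
        lane: {i for i, base in enumerate(seq) if base in bases}
        for lane, bases in REACTIONS.items()
    }


def read_ladder(gel: Dict[str, Set[int]]) -> str:
    """Call bases 5'->3' by reading from the shortest band upward."""
    calls = []
    for size in sorted(set().union(*gel.values())):
        if size in gel["G"]:
            calls.append("G")  # band in G lane (and A+G lane)
        elif size in gel["A+G"]:
            calls.append("A")  # band only in A+G lane
        elif size in gel["C"]:
            calls.append("C")  # band in C lane (and C+T lane)
        elif size in gel["C+T"]:
            calls.append("T")  # band only in C+T lane
    return "".join(calls)


gel = run_reactions("GATTACACGT")
print(read_ladder(gel))  # GATTACACGT
```

Reading upward from the shortest band reproduces the sequence 5′→3′: a band in the A+G lane alone is called A, and a band in the C+T lane alone is called T.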

As an outgrowth of nucleic acid sequencing, 
Gilbert was an early proponent of genomics, the use 
of sequence databases to study genome structures, 
sequences, organization, and evolution. He has writ- 
ten extensively on the evolutionary origins and sig- 
nificance of the intron/exon structure of eukaryotic 
genes as well as the possible relationship of splicing, 
exon shuffling, and gene rearrangements to modular 
protein evolution. 


See also: Genome Organization; Repressor; 
Rolling Circle Replication; Sanger, Frederick 


Glioma 
V P Collins 


Copyright © 2001 Academic Press 
doi: 10.1006/rwgn.2001.1575 


Gliomas are neoplasms composed of tumor cells that on histopathological examination show varying degrees of phenotypic similarity to adult or developing macroglia. The macroglia form the main subgroup of the neuroglia and include astrocytes, oligodendrocytes, and ependymal cells. More than 20 types of glioma are recognized, and the histological criteria for their diagnosis are defined in the World Health Organization (WHO) classification of tumors of the central nervous system. The tumors may in addition be assigned a malignancy grade, from I to IV, on the basis of histological attributes defined by WHO. The malignancy grade is an estimate of the degree of malignancy usually encountered in each type of tumor, where grade I is the least and grade IV the most malignant. Response to contemporary therapy is specific to each tumor type and malignancy grade. The cells of origin for these phenotypically diverse tumors are unknown. The various tumor types have different genetic abnormalities. The commonest form of glioma in adults is the highly malignant glioblastoma, the tumor cells of which show phenotypic similarities to astrocytes. In children, the commonest glioma is the relatively benign pilocytic astrocytoma. Gliomas are more common in males than in females.


See also: Genetic Diseases 
The globin genes determine the structure and synthesis of the globin chains that constitute the different hemoglobins produced in the human embryo, fetus, and adult. Human beings make different hemoglobins as they develop, as an adaptive response to the variation in oxygen requirements between embryonic, fetal, and adult life.

All the normal human hemoglobins have the same basic structure. They consist of two different pairs of globin chains, each a long string of amino acids that folds into a complex three-dimensional structure. Each of the four globin subunits that make up a hemoglobin molecule has a heme group, the oxygen-carrying moiety, embedded in its surface. The different globin chains are named after letters of the Greek alphabet. Adult and fetal hemoglobins have α chains associated with β (hemoglobin A, α₂β₂), δ (hemoglobin A₂, α₂δ₂), or γ chains (hemoglobin F, α₂γ₂), whereas in the embryo, α-like chains called ζ chains combine with γ (hemoglobin Portland, ζ₂γ₂) or ε chains (hemoglobin Gower 1, ζ₂ε₂), and α and ε chains combine to form hemoglobin Gower 2 (α₂ε₂). The embryonic hemoglobins are so named because they were first characterized at University College Hospital in Gower Street, London, and in Portland, Oregon.

Since each globin peptide chain is the product of a gene locus, it follows that there must be α, β, γ, δ, ε, and ζ globin genes.


Hemoglobin Genes Organized in 
Clusters 


The globin genes are organized into two clusters, which are situated on different chromosomes (Figure 1). The α-like genes, on chromosome 16, are found in the order 5′-ζ-ψζ-ψα2-ψα1-α2-α1-θ1-3′. The β-like globin genes, on chromosome 11, occur in the order 5′-ε-Gγ-Aγ-ψβ-δ-β-3′. The 5′ to 3′ notation indicates the order of the genes, from left to right.
Both clusters contain duplicated genes: there are two α chain genes, α2 and α1, and two γ chain genes, Gγ and Aγ. The G and A refer to the amino acids glycine and alanine; the products of the two γ genes are identical except at amino acid residue 136, at which one contains glycine and the other alanine. The products of the pair of α genes are identical. The other feature of these clusters is the presence of pseudogenes, which are given the prefix ψ. They are thought to be evolutionary remnants of once-active globin genes.


Structure of Globin Genes and their 
Clusters 


The structure of the globin genes has been highly conserved throughout evolution. Their transcribed regions, that is, the parts of the gene that form the template for messenger RNA production, contain three coding regions, or exons, separated by two introns, or intervening sequences (IVS), of variable length. From the CAP site, the start of transcription, the first exon encompasses approximately 50 bp of untranslated sequence (UTR) and the codons for amino acids 1-31 in the α and 1-30 in the β globin genes. Exon 2 encodes amino acids 32-99 and 31-104, respectively, those portions of the globin chains that are involved in heme binding and in contacts between the α and β chains that are critical for the normal function of hemoglobin as an oxygen carrier. The third exon encodes the remaining amino acids, 100-141 for the α and 105-146 for the β chains, together with a 3′ untranslated region of about 100 bp. The sizes of the introns vary between different genes. In the α globin genes they are both small, 117-149 bp, while in the ζ gene IVS-I is ~886 bp and IVS-II is ~239 bp. IVS-I in the β-like genes is also small, 122-130 bp, while IVS-II is much larger, 850-904 bp.
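The exon boundaries given above can be captured in a small lookup table. The Python sketch below assumes codons are numbered as in the mature α (141 residues) and β (146 residues) chains and assigns each codon wholly to one exon, using the last codon of exons 1 and 2 from the ranges in the text; the table and function names are illustrative only.

```python
from bisect import bisect_left

# Last codon encoded by exons 1, 2 and 3, taken from the exon ranges in the text.
EXON_LAST_CODON = {
    "alpha": (31, 99, 141),
    "beta": (30, 104, 146),
}


def exon_for_codon(gene: str, codon: int) -> int:
    """Return the exon (1, 2 or 3) that encodes a given codon of an alpha or beta globin gene."""
    last_codons = EXON_LAST_CODON[gene]
    if not 1 <= codon <= last_codons[-1]:
        raise ValueError(f"{gene} globin codons run from 1 to {last_codons[-1]}, got {codon}")
    # The first exon whose last codon is >= the requested codon is the one that contains it.
    return bisect_left(last_codons, codon) + 1


for gene, codon in [("alpha", 31), ("alpha", 32), ("alpha", 100), ("beta", 104), ("beta", 105)]:
    print(gene, codon, "-> exon", exon_for_codon(gene, codon))
```

Storing only the last codon of each exon keeps the table consistent by construction: exon 3 always starts one codon after exon 2 ends.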

As well as the exons, there are other sequences in the globin genes that are highly conserved. Removal of the intervening sequences from the initial messenger RNA transcript, and joining of the exon sequences to form the definitive messenger RNA, depend on the specific sequences at the borders between exons and introns. At the 5′ end of each intron there is always the dinucleotide GT, and at the 3′ end AG. Adjacent nucleotides are also conserved to form a consensus sequence. Mutations that involve these regions in certain inherited disorders of hemoglobin interfere with the normal processing of messenger RNA to such a degree that no gene product is produced. Processing also involves the addition of a tract of adenylic acid (A) residues at the 3′ end of the messenger RNA. The signal in each globin gene for this process is AATAAA, which is conserved in the 3′ untranslated region approximately 10-30 nucleotides upstream of where the initial transcript is cut and polyadenylated.
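The two sequence rules described above, GT...AG at intron borders and an AATAAA signal in the 3′ untranslated region, are simple enough to check by string matching. The Python sketch below does only that: it ignores the wider consensus around each splice site and the other elements that real processing requires, and the sequences used are short hypothetical examples, not actual globin sequence. Function names are illustrative.

```python
import re


def follows_gt_ag_rule(intron: str) -> bool:
    """True if an intron (written 5'->3' on the sense strand) starts with GT and ends with AG."""
    intron = intron.upper()
    return intron.startswith("GT") and intron.endswith("AG")


def polya_signal_positions(utr3: str) -> list:
    """0-based start positions of every AATAAA hexamer in a 3' UTR, overlapping hits included."""
    # A zero-width lookahead lets re.finditer report overlapping matches.
    return [m.start() for m in re.finditer(r"(?=AATAAA)", utr3.upper())]


# Hypothetical toy sequences, for illustration only.
toy_intron = "GTAAGTACTTTTCTCTTTTGCAG"
toy_utr3 = "GCCTAATAAAGCATTC"

print(follows_gt_ag_rule(toy_intron))    # True
print(polya_signal_positions(toy_utr3))  # [4]
```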


How Globin Genes Are Regulated 


A complete account of the regulation of the globin 
genes would explain why they are only active in 
appropriate tissues, that is, in the red cell precursors 
in the bone marrow, how their expression is controlled 
such that they synthesize relatively large amounts of 
globin in a way which ensures that the output of the 
α and β chains is almost synchronous, and how the
different globin genes are activated and repressed 
at different stages of development. Currently, it is 
impossible to answer these questions fully, although 
some progress has been made. 

Transcription of genes is dependent on the attachment of a transcription complex, including the enzyme RNA polymerase, at their 5′ ends. Appropriate positioning of the transcription machinery is brought about by recognition of specific DNA sequences in the region upstream of the transcriptional start site, known as the promoter. Like many genes, the globin genes have conserved sequence boxes, TATA and CCAAT, found about 30 and 70 bp upstream of the CAP site. In addition to these regions, many erythroid-specific genes, including the globin genes, have a CACCC homology box in the promoter, upstream of the CCAAT box. This particular region is found in most of the β-globin-like promoters and is duplicated in the β globin gene but not in the α globin gene promoters. This sequence is also missing from the promoter of the δ gene.

In addition to the promoter sequences, more distal sequences that increase the levels of gene transcription are found in the globin gene clusters. Five regions with this property, called enhancers, have been identified in the α and β globin gene complexes. In addition, both complexes have major regulatory elements which, if deleted, completely inactivate all the genes in the complex. The β globin locus control region (LCR) lies upstream from the ε globin gene and is marked by five DNase hypersensitive sites. Similarly, there is a region 40 kb upstream from the ζ globin gene which is also marked by a site of this kind, and hence is called HS-40. Again, if this is lost by deletion, the entire α globin gene cluster is inactivated. These regulatory regions, and a variety of other regions throughout the globin gene clusters, are marked by DNA binding motifs for a variety of transcription factors, some of which, including GATA-1 and NF-E2, are erythroid-specific, while others are ubiquitous factors, active in many different tissues.

Currently it is believed that the β LCR, together with other enhancers, a variety of transcription factors, and other regulatory proteins, becomes apposed sequentially to the different genes of the β globin gene cluster, resulting in their activation.

The mechanisms for turning on and off the ε and γ globin genes, and for activating the β and δ globin genes at different stages of fetal development, are not understood. There may be developmental-stage-specific transcription factors, although these have not been identified in the case of the human hemoglobin genes.


How Human Hemoglobin Genes Evolved 


Globin genes arose early in evolution and are found in fungi, plants, and invertebrates, as well as in all vertebrate species. It seems likely that gene duplication, followed by selection of adaptive sequence changes, resulted in the production of diverse globin chains with specialized functions. This process presumably allowed what were originally monomeric forms of hemoglobin to evolve into the tetrameric proteins that are now found in all higher animals. Different α and β globin chains are found in all vertebrates, suggesting that they originated before ~450-500 million years ago. In fish and amphibians the genes for the two types of chains are linked together in a single cluster. In other species, chromosomal rearrangements must have resulted in the separation of the α and β gene clusters, certainly by the time that birds evolved.

In the α globin gene cluster, duplication leading to a specialized embryonic (ζ) globin chain occurred ~400 million years ago, while the α gene underwent a further duplication in many species. Duplication of the primitive β chain gene occurred independently in birds and mammals ~180-200 million years ago to give rise to the embryonic ε gene. Before the divergence of the mammals (~85 million years ago), further duplication events of both genes gave rise to the ε and γ proto-genes in one case and the adult proto-δ and proto-β genes in the other. Other duplications must have given rise to the various pseudogenes that are seen in the α and β gene clusters. Interestingly, in most mammals the proto-γ gene has remained an embryonically expressed gene and was only recruited to the fetal stage of development after the emergence of primates (55-60 million years ago). Its duplication occurred about 35-55 million years ago and has been maintained in the lineages leading to the apes.


Normal Variation of Structure of Globin 
Genes 


The globin gene clusters show a considerable amount of variability in their base sequence. This can easily be identified when a single nucleotide change produces or removes a cutting site for a restriction enzyme; these harmless changes are called restriction fragment length polymorphisms (RFLPs). They do not occur at random but form a series of patterns, or haplotypes, which occur at varying frequencies among different populations of the world. In the β globin gene cluster there are two separate haplotype regions, separated by an area where there is frequent recombination. In this gene cluster there are only single nucleotide RFLPs. However, although the α globin gene cluster contains no 'hotspots' for recombination, it is even more highly polymorphic, containing a number of single nucleotide RFLPs and several highly variable regions of DNA, that is, repeat sequences that vary considerably in length and hence provide valuable genetic markers. The RFLP haplotypes of the globin gene clusters are of considerable value for population genetics and for evolutionary studies. They are also useful markers for studying the distribution and evolution of different mutations of the β globin genes.
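How a single nucleotide change becomes a fragment-length difference can be shown with a toy digest. The Python sketch below assumes a complete digest of a linear DNA molecule represented by one strand, uses the EcoRI site (G^AATTC) purely as an example enzyme, and runs on two made-up alleles that differ at a single base; it does not model real globin haplotypes, and the function name is illustrative.

```python
def digest(seq: str, site: str, cut_offset: int) -> list:
    """Fragment lengths from a complete digest of a linear sequence.

    cut_offset is the cut position within the recognition site on this strand
    (1 for EcoRI, which cuts G^AATTC).
    """
    cuts = []
    hit = seq.find(site)
    while hit != -1:
        cuts.append(hit + cut_offset)
        hit = seq.find(site, hit + 1)
    bounds = [0] + cuts + [len(seq)]
    return [end - start for start, end in zip(bounds, bounds[1:])]


ECORI_SITE, ECORI_CUT = "GAATTC", 1

# Two hypothetical alleles; allele_2 carries an A->G change that destroys the second site.
allele_1 = "CCCC" + "GAATTC" + "T" * 10 + "GAATTC" + "CCCC"
allele_2 = allele_1[:22] + "G" + allele_1[23:]

print(digest(allele_1, ECORI_SITE, ECORI_CUT))  # [5, 16, 9]
print(digest(allele_2, ECORI_SITE, ECORI_CUT))  # [5, 25]
```

Losing one site merges two fragments (16 + 9) into one of 25, which is the length difference a Southern blot probe would reveal.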


Mutations of Globin Genes 


The mutations of the globin gene clusters result in the commonest genetic diseases in humans. They cause either structural hemoglobin variants or thalassemias, disorders that are due to a reduced rate of production of either the α or β chains of hemoglobin. The particularly common disorders of the globin genes, sickle cell anemia and the different thalassemias, have reached their high frequency in the world population because of heterozygote advantage against malaria.


The glucose 6-phosphate dehydrogenase (G6PD) gene is a prototype housekeeping gene, as it is ubiquitously expressed in most organisms and cell types, and its product performs a general, important function in cell metabolism. Specifically, G6PD is an enzyme that catalyzes the oxidation of glucose 6-phosphate (G6P) to 6-phosphogluconolactone, coupled with the reduction of the coenzyme NADP+ to NADPH. Because the lactone is hydrolyzed to 6-phosphogluconate (6PG), which can then be decarboxylated to a pentose sugar, the G6PD reaction is often referred to as the first reaction in the pentose phosphate pathway; at the same time, NADPH is essential as an electron donor in numerous biosynthetic pathways and in the defense of cells against oxidative stress. There is evidence from evolutionary data and from genetic inactivation of the G6PD gene in microorganisms and in mammalian cells that G6PD is indeed indispensable for these functions, but not for pentose synthesis.


Formal and Molecular Genetics 


The G6PD gene is highly conserved in evolution. The alignment of all available sequences from a wide range of organisms highlights the regions with the highest degree of conservation, for instance, the active center and the NADP-binding domain. In mammals the G6PD gene is X-linked, and in humans it maps to the tip of the long arm of the X chromosome (cytogenetic band Xq28). The human gene spans some 18 kb, and it consists of 13 exons, encoding a polypeptide chain of 515 amino acids; the active enzyme is a dimer of this polypeptide chain. Each subunit is folded into a globular structure including 9 α-helices and 9 β-sheets; there is no covalent bond between the two subunits, and the subunit interface in the dimer consists of β-sheets and α-helices, which form a kind of barrel. As in many housekeeping genes, the promoter region is highly GC-rich, with several Sp1 and AP-2 binding sites, the functional roles of which have been characterized by deletion analysis and mutagenesis. Within this region, a 630-bp promoter fragment has been shown to retain housekeeping gene expression in transgenic mice.

Since the G6PD gene is X-linked, women hetero- 
zygous for G6PD deficiency are genetic mosaics in 
their somatic cells after X chromosome inactivation. 
For instance, about half of their red cells will be G6PD
normal and the other half will be G6PD deficient. 
However, in some cases, owing to drift or to somatic 
cell selection, there may be an excess of one or the 
other cell types, giving a completely normal or a 
completely deficient phenotype. Thus, the extent of 
clinical consequences of G6PD deficiency (see 
below) will be a function of the proportion of G6PD 
deficient cells. For this reason G6PD deficiency 
should formally be regarded not as recessive but as 
codominant. 


Evolutionary Genetics 


G6PD is very ancient in an evolutionary context: it is 
found in all organisms except in some of the Archaea 
that live in anaerobic environments and some intracel- 
lular microorganisms that seem to be able to exploit 
the G6PD activity of their respective host cells. The 
G6PD sequence shows evidence of conservation 
throughout all living phyla, with some regions being 
identical in disparate organisms, for example, the 
active center (which includes the G6P binding site) 
and the NADP binding site. 
G6PD Deficiency


Investigations of patients who developed acute hemo- 
lytic anemia upon exposure to certain antimalarial 
drugs revealed, in 1956, that their red blood cells had 
a markedly reduced G6PD activity, and that this trait 
was inherited. Thus, G6PD deficiency emerged as the 
first example of a blood cell disease caused by a spe- 
cific enzyme abnormality. It quickly became apparent 
that this inherited abnormality predisposes to hemo- 
lysis in response to several other factors. The wide 
range of factors that can trigger hemolysis in G6PD- 
deficient subjects is related to the fact that all of them 
impose an oxidative stress on red cells. The response to this type of stress involves, in particular, glutathione (GSH). Since G6PD activity, through the NADPH it generates, is rate-limiting for regeneration of GSH, normal red cells can withstand such stress, but G6PD-deficient red cells succumb. G6PD deficiency is due to mutations in the G6PD gene.
There are some 130 mutations known to date: all of 
them are in the coding region, and almost all of them 
are point mutations causing single amino acid replace- 
ments. In most cases of G6PD deficiency the activity 
of the enzyme in red cells is reduced to about 10-20% 
of normal activity; in some cases it may be as low as 
1-2%. However, there is always some residual activ- 
ity. In a few instances these amino acid replacements 
may affect the catalytic function of the enzyme, but in 
the majority of cases they cause G6PD deficiency 
because they cause the protein to become unstable. 
The absence of large deletions, frameshifts, or nonsense mutations supports the notion that complete G6PD deficiency would be lethal. This notion has been confirmed recently by targeted homologous recombination in mouse embryonic stem (ES) cells: when 'G6PD knock-out' ES cells are injected into blastocysts, heterozygous female mice can be obtained, but hemizygous male mutants die in utero at about 10 days of gestation.


Population Genetics 


In many human genes, pathogenic mutations are often regarded as being in a different category from 'polymorphisms.' In the case of G6PD it is quite remarkable that many mutations, which are potentially pathogenic because they cause G6PD deficiency, are also polymorphic. Indeed, these mutant alleles have frequencies of 10-20% or even higher in many human populations. Since the G6PD gene is X-linked, in any population in which G6PD deficiency is common, the frequency of G6PD-deficient hemizygous males will be higher than that of G6PD-deficient homozygous females but lower than that of females heterozygous for G6PD deficiency. Interestingly, different allelic mutants account for the overall prevalence of G6PD deficiency in different parts of the world, and in many populations several polymorphic alleles coexist (see Figure 1). All of these populations are in malaria-endemic areas, or in areas that were malaria-endemic until recently, suggesting that each one of these alleles represents an example of balanced polymorphism. In fact, there is evidence from clinical studies that subjects with G6PD deficiency have a relative resistance to Plasmodium falciparum malaria, which significantly decreases their risk of death from this condition. In vitro studies have shown that G6PD-deficient red cells parasitized by P. falciparum are phagocytosed by autologous macrophages more effectively than G6PD-normal red cells. The fact that so many independently arisen G6PD deficiency mutations have become prevalent wherever malaria has existed for a long time virtually eliminates the possibility that G6PD deficiency has become common merely by genetic drift. Indeed, the multitude of these G6PD-deficient alleles is in itself a strong argument for balanced polymorphism in the sense of convergent evolution.
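The ordering of phenotype frequencies for an X-linked trait follows from the allele frequency. A minimal Python sketch, assuming Hardy-Weinberg proportions, random mating and the same allele frequency q in both sexes (assumptions not stated above, and which ignore the selection discussed there): deficient hemizygous males occur at frequency q, deficient homozygous females at q², and heterozygous females at 2pq, where p = 1 - q. The function name is illustrative.

```python
def x_linked_frequencies(q: float) -> dict:
    """Expected frequencies for an X-linked allele at Hardy-Weinberg equilibrium."""
    if not 0.0 <= q <= 1.0:
        raise ValueError("allele frequency q must lie between 0 and 1")
    p = 1.0 - q
    return {
        "deficient hemizygous males": q,          # one X chromosome
        "deficient homozygous females": q * q,    # two deficient X chromosomes
        "heterozygous females": 2.0 * p * q,      # one normal, one deficient X
    }


# Example: an allele frequency of 0.10, within the range quoted in the text.
for group, freq in x_linked_frequencies(0.10).items():
    print(f"{group:30s} {freq:.3f}")
```

For q = 0.10 this gives 0.10, 0.01 and 0.18. Because q > 2q(1 - q) only when q > 0.5, hemizygous males outnumber heterozygous females only at allele frequencies above one half.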


Clinical Genetics 


As stated above, it was the clinical manifestation of acute hemolytic anemia (AHA) that led to the discovery of G6PD deficiency; AHA can be triggered by a variety of drugs, including antimalarials, aspirin, some sulfa drugs, and some antibiotics such as nalidixic acid. G6PD-deficient subjects can also develop AHA in association with a variety of infections, or after ingestion of fava beans (see Favism, a well-characterized syndrome that can be life-threatening in children). The most important approach to these clinical problems is prevention, by helping people at risk to avoid the offending agents. In cases of severe AHA, blood transfusion may be imperative. In addition, G6PD deficiency can cause a predisposition to severe neonatal jaundice, which can result in long-term neurological damage. Phototherapy is sufficient to prevent such damage in most cases, but exchange transfusion may be required in severe cases.

A small proportion of patients with G6PD deficiency present with a more severe disease, namely chronic nonspherocytic hemolytic anemia (CNSHA), even in the absence of any triggering agent. These patients have anemia and jaundice, and may require regular blood transfusion, which can bring about iron overload and the need for iron chelation. The association of G6PD deficiency with CNSHA and with AHA is an excellent example of genotype-phenotype correlation. Indeed, not surprisingly, the mutations that cause AHA are different from those that cause CNSHA: the latter mutations are invariably those that cause amino acid replacements that compromise the stability of the enzyme most drastically. A large proportion of these mutations map to the region of the molecule involved in the dimer interface, where amino acid replacements make the dimer structure unstable.

Figure 1 Worldwide distribution of polymorphic variants of G6PD. The variants in each country are shown in order of prevalence according to these symbols: U = Union; C = Canton; M = Mediterranean; A = A− (202A); k = Kaiping; t = Taipei; v = Viangchan; m = Mahidol; h = Chatham; | = Coimbra; p = Local variant; S = Seattle; s = Santamaria; a = Aures; z = Cosenza; A = A− (968C). See Color Plate 9.


See also: Balanced Polymorphism; Embryonic 
Stem Cells; Favism 


Glutamic Acid 


Copyright © 2001 Academic Press 
doi: 10.1006/rwgn.2001.2081 


Glutamic acid (Glu or E) is one of the 20 amino acids
commonly found in proteins. It has a negatively
charged side chain and at physiological pH exists as
glutamate. Its chemical structure is shown in Figure 1.


Figure 1 Glutamate.
See also: Amino Acids; Proteins and Protein 
Structure 


Glutamine 


J Read and S Brenner 


Copyright © 2001 Academic Press 
doi: 10.1006/rwgn.2001.2080


Glutamine (Gln or Q) is one of the 20 amino acids 
commonly found in proteins. Its side-chain contains a 
polar amide group, which can interact strongly with 
water by forming hydrogen bonds. Its chemical
structure is shown in Figure 1.
Figure 1 Glutamine.


See also: Amino Acids; Proteins and Protein 
Structure 


Glycine 
J Read and S Brenner 


Copyright © 2001 Academic Press 
doi: 10.1006/rwgn.2001.2074


Glycine (Gly or G) is the smallest of the 20 amino
acids commonly found in proteins. Its side chain is a
single hydrogen atom, so it has no special hydrophobic
or hydrophilic character. Its chemical structure is
shown in Figure 1.


Figure 1 Glycine.


See also: Amino Acids; Proteins and Protein 
Structure 


Glycine max (Soybean) 
P Gresshoff 


Copyright © 2001 Academic Press 
Soybean is the common name for Glycine max
(Merrill), an amphidiploid grain legume (2n = 2x =
40) of the genus Glycine Willdenow, family Leguminosae,
subfamily Papilionoideae, tribe Phaseoleae. The genus
Glycine has its origins in Asia and Australia; it was
first named by Linnaeus (Genera Plantarum, 1737)
based on the Greek glykys = "sweet" (from the
sweet tubers of Glycine apios L., which is now correctly
classified as Apios americana). Glycine max is closely
related to the wild soybean Glycine soja, with which
fertile hybrids can be obtained. Soybean is self-fertile,
but outcrossing at about 1-3% is possible. Biparental
inheritance of some mitochondrial DNA markers
suggests the possibility of mixed cytoplasms. Soybean
is a major crop, used for animal feed, vegetable
oil, lubricants, industrial paints, ink, mayonnaise,
soaps, and pharmaceuticals such as the isoflavone
phytoestrogens (genistein and daidzein) and anticancer
treatments (naranginin, which stimulates cytochrome
P-450 mono-oxygenase). Flowering is controlled by
maturity and daylength; genetic variation has produced
different maturity groups, ranging from 000 (high
latitudes) to X (= 10) in tropical regions. Average yield is
about 1.5 tonnes per hectare. World production (1999)
was 156 million tonnes, selling as a commodity on the
Chicago Board of Trade at a cyclically low price of
about US$ 190 per tonne. Soybean seeds contain about
20% oil and 40% (range 35-45%) protein. Average seed
size is 15 g per 100 seeds. The average soybean plant
grows to 1 m in height and develops determinate
(non-meristematic, spherical) nitrogen-fixing nodules in
symbiosis with bacterial cells of Bradyrhizobium
japonicum and Sinorhizobium fredii. About 100 genes
of the bacterial microsymbiont that are involved in
nitrogen fixation or nodule initiation have so far
been cloned and characterized. Despite this component
of genetic information in the prokaryotic partner,
most of the key regulatory functions of the soybean
nodule symbiosis are encoded in the plant genome.
The haploid genome size of soybean is about 1050- 
1100 Mb, consisting of about 35% highly repeated 
DNA, 30% moderately repeated DNA, and 35% uni- 
que or near-single-copy DNA. The karyotype reveals 
two large, 14 intermediate, and four small chromo- 
somes, with extensive centromeric heterochromatin, 
allowing pachytene discrimination. Trisomics for 
each chromosome are available. Telomere-associated 
sequences have been sequenced and contain the canonical
TTTAGGG sequence. A single nucleolus is visible,
consistent with molecular data indicating one rRNA locus. Two
major satellite DNA types of 92 bp and 132 bp have 
been cloned. The 92 bp satellite is clustered in four 
regions with about 70 000-100 000 copies per haploid 
genome. The 132 bp satellite is dispersed. The genome 
of soybean has been found to contain several transpos- 
able elements, although phenotypic evidence for their 
action is scarce. Numerous retrotransposons have been 
discovered. Isoenzymes and biochemical mutants 
(e.g., nitrate reductase, lipoxygenase) are available as 
markers and tools of molecular physiology. 
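The proportions and copy numbers given above can be turned into approximate absolute amounts with simple arithmetic. The Python sketch below assumes a midpoint haploid genome size of 1075 Mb (the text gives a range of 1050-1100 Mb) and treats the 92 bp satellite as a plain tandem array; the helper names are illustrative only.

```python
# Approximate haploid genome size. The text gives 1050-1100 Mb;
# the midpoint is used here as an assumption.
GENOME_MB = 1075.0

# Proportions of each sequence class, as stated in the text.
FRACTIONS = {
    "highly repeated": 0.35,
    "moderately repeated": 0.30,
    "unique or near-single-copy": 0.35,
}


def class_sizes_mb(genome_mb: float, fractions: dict) -> dict:
    """Approximate size in Mb of each sequence class."""
    return {name: genome_mb * frac for name, frac in fractions.items()}


def satellite_fraction(unit_bp: int, copies: int, genome_mb: float) -> float:
    """Fraction of the haploid genome occupied by `copies` repeats of a unit."""
    return unit_bp * copies / (genome_mb * 1e6)


for name, mb in class_sizes_mb(GENOME_MB, FRACTIONS).items():
    print(f"{name}: about {mb:.0f} Mb")

# The 92 bp satellite at the low and high copy-number estimates.
for copies in (70_000, 100_000):
    share = satellite_fraction(92, copies, GENOME_MB)
    print(f"92 bp x {copies}: {92 * copies / 1e6:.2f} Mb, "
          f"{100 * share:.2f}% of the genome")
```

On these assumptions the clustered 92 bp satellite accounts for well under 1% of the genome, a small part of the highly repeated fraction.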

Several genetic maps are available comprising 
phenotypic markers such as seed coat, hilum, flower 
and pubescence color, root fluorescence, viral, cyst 
nematode and fungal resistance, male sterility, pubes- 
cence density, dwarfism, leaf shape, and nodulation. 
Recessive EMS-induced and fast-neutron-induced mutations leading to
non-nodulation and supernodulation demonstrate 
that the plant genome controls major components of 
the nodulation and nitrogen fixation process. The 
classical genetic maps have been improved through 
the integration of molecular markers such as random 
RFLP clones, EST clones, AFLP, RAPD, and DAF 
polymorphisms, and microsatellites (simple sequence 
repeats, SSRs) allowing marker-assisted breeding as 
well as map-based cloning. The total genetic map
length is about 3300 cM. Physical mapping in one region
(pA36 marker on linkage group H) suggests that
1 cM represents about 400 kb. BAC libraries arrayed
on nylon filters are available, as are expressed
sequence tagged (EST) libraries from different
tissues and developmental stages. EST collections
have been arrayed on microarrays for molecular
expression studies.
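Comparing the local physical-to-genetic ratio with the genome-wide average helps put the 400 kb per cM figure in context. The sketch below divides the physical genome size quoted earlier by the total map length; it assumes, for the average, that recombination is spread evenly along the genome, which real genomes do not satisfy, so the result is only a rough benchmark.

```python
def kb_per_cm(physical_mb: float, map_length_cm: float) -> float:
    """Genome-wide average physical distance (kb) per centimorgan."""
    return physical_mb * 1000.0 / map_length_cm


# Figures quoted in the text: 1050-1100 Mb physical, about 3300 cM genetic.
avg_low = kb_per_cm(1050, 3300)
avg_high = kb_per_cm(1100, 3300)

# Local estimate from physical mapping near marker pA36 (linkage group H).
LOCAL_KB_PER_CM = 400

print(f"genome-wide average: {avg_low:.0f}-{avg_high:.0f} kb per cM")
print(f"local/average ratio: {LOCAL_KB_PER_CM / avg_high:.2f}-"
      f"{LOCAL_KB_PER_CM / avg_low:.2f}")
```

A local value somewhat above the average would suggest slightly less recombination per kilobase near pA36 than in the genome as a whole, although a single mapped region cannot establish a trend.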

Soybean was first transformed by Agrobacterium 
tumefaciens and by biolistic particle bombardment in 
1988, leading to the development of one of the first 
GMO products in agriculture, the Roundup Ready
soybean. This transgenic plant is resistant to lethal
doses of the herbicide Roundup (glyphosate,
N-(phosphonomethyl)glycine) and has led to considerable
public debate and antagonism towards its inventors, the
Monsanto Company. Other transgenic products with altered insect
resistance and oil composition are being developed. 


Further Reading 
http://www.unitedsoybean.org/soystats 
http://www.ag.uiuc.edu/~stratsoy/new/


See also: Nodulation Genes; Symbionts, Genetics 
of; Transfer of Genetic Information from 
Agrobacterium tumefaciens to Plants 


Glycolysis 
F K Zimmermann 
Glycolysis, a centrally important metabolic pathway
in almost all organisms, degrades hexoses to pyruvate 
with the concurrent production of adenosine triphos- 
phate (ATP) and reduced nicotinamide adenine 
dinucleotide (NADH) (Figure 1). 


Glycolysis in Saccharomyces cerevisiae 


The genetics of the glycolytic enzymes has been fully 
explored in the yeast Saccharomyces cerevisiae, where 
many genes coding for regulatory factors have been 
identified. There is a large set of genes coding for 
hexose uptake facilitators with different regulation and 
kinetic parameters. Two hexokinases (genes HXK1 
and HXK2), with a 76% amino acid identity, are not 
only catalysts but also sensors for internal glucose and 
fructose and thus trigger carbon catabolite repression. 
Their activity is modulated by an essential feedback 
inhibition by trehalose-6-phosphate, as shown by the 
drastic effects of mutants deficient in trehalose synthe- 
sis. A specific glucokinase (gene GLK1) accounts for 
about 20% of the total glucose-phosphorylating activity.
It is neither involved in carbon catabolite repression
nor sensitive to trehalose-6-phosphate. Phosphoglucose
isomerase (gene PGI) is required for growth 
not only on glucose but also on fructose, because 
the formation of the essential regulator trehalose- 
6-phosphate starts from glucose-6-phosphate. 
Figure 1 Glycolysis: metabolites, enzymes, and products. Glycolysis can be represented by:

$$\text{Glucose} + 2\,\text{ADP} + 2\,\text{P}_i + 2\,\text{NAD}^+ = 2\,\text{Pyruvate} + 2\,\text{ATP} + 2\,\text{NADH} + 2\,\text{H}^+$$

ATP, adenosine triphosphate; ADP, adenosine diphosphate; NADH, reduced nicotinamide adenine dinucleotide;
Pi, inorganic phosphate; DHAP, dihydroxyacetone phosphate.
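The net equation hides the fact that ATP is first invested and then recovered. A minimal bookkeeping sketch in Python, using the standard textbook assignment of ATP- and NADH-linked steps (these step-by-step details are not spelled out in the text above), reproduces the net yield of 2 ATP and 2 NADH per glucose.

```python
from collections import Counter

# Cofactor changes per glucose for the steps that consume or produce
# ATP or NADH. Steps after aldolase run twice per glucose because
# fructose-1,6-bisphosphate is split into two triose phosphates.
STEPS = [
    # (enzyme, times per glucose, {cofactor: change per event})
    ("hexokinase", 1, {"ATP": -1}),
    ("phosphofructokinase", 1, {"ATP": -1}),
    ("glyceraldehyde-3-phosphate dehydrogenase", 2, {"NADH": +1}),
    ("phosphoglycerate kinase", 2, {"ATP": +1}),
    ("pyruvate kinase", 2, {"ATP": +1}),
]


def net_yield(steps):
    """Sum the cofactor changes over all steps for one glucose."""
    total = Counter()
    for _enzyme, times, changes in steps:
        for cofactor, delta in changes.items():
            total[cofactor] += times * delta
    return dict(total)


print(net_yield(STEPS))
```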


Heterooctameric phosphofructokinase consists of 
two different subunits, with about 50% amino acid 
identity (genes PFK1 and PFK2). Its activity is subject
to numerous effectors, most prominently the activator
fructose-2,6-bisphosphate, which is generated by two
6-phosphofructo-2-kinases (genes PFK26 and PFK27).
A block in glycolysis requires deletion of both PFK1 and
PFK2. A single deletion of PFK2, but not of PFK1,
slightly reduces growth on glucose. Double mutants 
without PFK26 and PFK27 cannot form fructose- 
2,6-bisphosphate but grow normally on hexoses. 
However, all these mutants have altered levels of 
glycolytic metabolites. 

Yeast aldolase (gene FBA1) belongs to the prokaryotic
type II aldolases. Mutants deleted for FBA1 are
inhibited by glucose and grow very poorly on a mixture
of acetate and low amounts of galactose. The deduced
amino acid sequence of triosephosphate isomerase
(gene TPI1) shows about 50% identity to vertebrate
forms. There are three genes coding for glyceralde- 
hyde-3-phosphate dehydrogenases, TDH1, TDH2, 
and TDH3, with over 90% amino acid sequence identity,
which account for 10-15%, 25-30%, and 50-60% of
the total activity, respectively. Mutant strains with
all three genes deleted cannot be obtained, suggesting
that this enzyme activity is essential for growth.
Gene PGK1, coding for phosphoglycerate kinase, 
with 65% amino acid sequence identity to the 
human enzyme, has been used to construct heterolo- 
gous expression cassettes in yeast, and the regulatory 
components of the promoter have been studied in 
great detail. Phosphoglycerate mutase, gene PGM1, 
shares about 50% identical amino acids with the 
human erythrocyte bisphosphoglycerate mutase. 
Two enolases, which differ in only 20 of 436
amino acids, are encoded by the constitutively
expressed ENO1 and by ENO2, which is strongly induced
when glucose-6-phosphate levels increase.


Table 1 Human enzyme deficiencies and genetic disease

Glycolytic enzyme | Mutation-associated demonstrated or possible defects
Hexokinase I | Nonspherocytic hemolytic anemia
Hexokinase II | Nonspherocytic hemolytic anemia; insulin resistance; possible cause of increased glycolysis in cancer cells
Glucokinase | Gestational diabetes; hyperinsulinism of the newborn; maturity-onset diabetes of the young
Phosphoglucose isomerase | Nonspherocytic hemolytic anemia
Phosphofructokinase | Exercise intolerance and compensated hemolysis (Tarui disease)
Aldolase B | Hereditary fructose intolerance
Triosephosphate isomerase | Multisystem disease, lethality in early childhood
Glyceraldehyde-3-phosphate dehydrogenase | Diverse nonglycolytic functions; could be involved in, e.g., prostate cancer, age-related neurodegenerative disease
Phosphoglycerate kinase | Chronic hemolytic anemia
Phosphoglycerate mutase | Exercise intolerance
Enolase I | Deregulation of c-myc oncogene
Pyruvate kinase | Hereditary hemolytic anemia


There are also
two pyruvate kinase genes in yeast that share about 
40% of the amino acids with the mammalian iso- 
enzymes. PYK1 codes for the major enzyme that 
is induced by increased levels of both glucose-6- 
phosphate and fructose-6-phosphate. This enzyme 
requires fructose-1,6-bisphosphate for activation. 
Lack of this enzyme blocks growth on glucose. 
PYK2 codes for a pyruvate kinase with about 70%
amino acid identity to the PYK1-encoded protein.
However, it is fully active without fructose-1,6-bisphosphate,
and its transcription is repressed by glucose. Pyruvate
kinase converts phosphoenolpyruvate to pyruvate under
glycolytic conditions, whereas under conditions of
gluconeogenesis, phosphoenolpyruvate is formed from
oxaloacetate by phosphoenolpyruvate carboxykinase.
Simultaneous activity of both enzymes would create a
futile, ATP-wasting cycle under gluconeogenic conditions.
Strains producing the fructose-1,6-bisphosphate-independent
enzyme at the level of the glycolytic pyruvate kinase
grew at the normal rate under gluconeogenic conditions,
suggesting the existence of an additional control
mechanism preventing such metabolic waste (Boles
et al., 1997). The rate of glycolysis as determined by
the rate of ethanol production could not be increased 
by the overproduction of individual enzymes or several 
combinations of glycolytic enzymes. 
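Why simultaneous pyruvate kinase and phosphoenolpyruvate carboxykinase activity would be wasteful can be made explicit with a small ledger. The Python sketch below assumes the cycle is closed by pyruvate carboxylase, which converts pyruvate to oxaloacetate at the cost of one ATP (this enzyme is not mentioned above), and that yeast phosphoenolpyruvate carboxykinase uses ATP as its phosphoryl donor. Under those assumptions each turn hydrolyzes one ATP net.

```python
# One turn of the PEP -> pyruvate -> oxaloacetate -> PEP cycle in yeast.
# Assumption: pyruvate carboxylase closes the cycle, and yeast PEP
# carboxykinase consumes ATP. Each step: (enzyme, substrate, product, ATP change).
CYCLE = [
    ("pyruvate kinase", "phosphoenolpyruvate", "pyruvate", +1),
    ("pyruvate carboxylase", "pyruvate", "oxaloacetate", -1),
    ("PEP carboxykinase", "oxaloacetate", "phosphoenolpyruvate", -1),
]


def atp_per_turn(cycle):
    """Check that the steps form a closed loop; return net ATP change per turn."""
    for step, next_step in zip(cycle, cycle[1:] + cycle[:1]):
        product, next_substrate = step[2], next_step[1]
        if product != next_substrate:
            raise ValueError(f"cycle broken: {product} != {next_substrate}")
    return sum(step[3] for step in cycle)


def atp_wasted(turns, cycle=CYCLE):
    """ATP hydrolyzed by running the futile cycle `turns` times."""
    return -atp_per_turn(cycle) * turns


print(atp_per_turn(CYCLE))
print(atp_wasted(1000))
```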


Glycolysis in Humans 


The genetics of glycolysis in humans is complicated
(1) by the presence of tissue- and cell type-specific
isoenzymes and (2) by the fact that several glycolytic
enzymes and their genes have additional functions
beyond a strictly catalytic role. The expression of
the glycolytic enzymes is stimulated by glucose in 
several cell types via glucose-6-phosphate and a 
hypoxia-inducible helix-loop-helix transcription fac- 
tor. Numerous genetic diseases are caused by enzyme 
deficiencies in the glycolytic pathway (Table 1). 
Deficiency in hexokinase type I causes hemolytic 
anemia. Hexokinase II is a leading enzyme and glucose 
‘sensor’ in insulin-sensitive tissues, and a defect causes 
type 2 diabetes. Many tumor cells have increased 
rates of glucose catabolism, which can promote cell 
proliferation. Certain tumor-associated p53 mutant 
proteins cause a significant activation of the type II 
hexokinase promoter. Glucokinase is the glucose sensor
of the pancreatic β-cells, whose glucose metabolism
controls insulin secretion; amino acid substitutions
yielding low-activity and low-stability mutant enzymes
have been associated with maturity-onset diabetes of
the young (MODY) and can in part explain this syndrome.
Different amino acid substitutions of the muscle phos- 
phofructokinase cause an exertional myopathy and 
hemolytic syndrome (Tarui disease). A stop codon in 
position 145 of the triosephosphate isomerase locus 
has been associated with neurological disorders. Gly- 
ceraldehyde-3-phosphate dehydrogenase has a sub- 
unit that participates in RNA export and DNA 
replication and repair. Mutant forms of this enzyme 
could be involved in several disease syndromes.
Phosphoglycerate kinase deficiency has been found in
patients with myoglobinuria. The gene coding for
the α-enolase isoenzyme is transcribed into a single
mRNA species which, when translated from the first
initiation codon, yields enolase. Another AUG codon
400 bp downstream starts the translation of a protein,
MBP-1, which binds to and thus downregulates the
promoter of the c-myc gene; c-myc, when overexpressed,
can cause cancer. Thus the human ENO1 gene could be a
tumor suppressor gene. Many well-defined mutations
affecting the enzymic parameters of erythrocyte pyruvate
kinase cause severe hemolytic anemia.

Recent findings support the view that nuclear genes 
for the enzymes of glycolysis in eukaryotes were 
acquired from mitochondrial genomes (Liaud et al., 
2000). 
Numerous intrinsic and extrinsic mutagens induce
structural damage in cellular DNA, and further errors
occur during DNA replication. These DNA lesions are
cytotoxic, miscoding, or both, and are believed to
underlie cell lethality, tissue degeneration, aging, and
cancer. To counteract immediately the deleterious
effects of such lesions, which lead to genomic
instability, cells have evolved a number of DNA repair
mechanisms, including direct reversal of the lesion,
sanitization of the dNTP pools, and three different
DNA excision pathways: mismatch repair, nucleotide
excision repair, and base excision repair (BER). In the
BER pathway, the
process is initiated by a DNA glycosylase excising 
the modified or mismatched base by hydrolysis of 
the glycosidic bond between the base and the deoxy- 
ribose of the DNA, generating a free base and an abasic 
site (AP site), which is cytotoxic and mutagenic. An
AP-endonuclease or an AP-lyase then incises the
phosphodiester bond next to the AP site, and the
resulting strand break is further processed by the
sequential action of a dRPase or another activity that
removes the 5' terminus, a DNA polymerase, DNA
ligase, and other accessory proteins, restoring the
integrity of the information contained in DNA. The
BER pathway is conserved from Escherichia coli to
humans, which underlines how critical it is for cells.
The pioneering investigations were performed using 
bacteria and led to the concept of a new pathway for 
the repair of uracil residues, the deaminated product 
of cytosine, then to the demonstration that the initial 
steps for the repair of alkylated bases were mediated by
the sequential action of two repair proteins, then to the 
identification of the various DNA glycosylases, the 
cloning of the genes coding for the respective proteins, 
and the identification or the construction of mutant 
strains deficient in these activities. These investigations 
greatly facilitated subsequent work in human cells. 
DNA glycosylases remove lesions generated by 
deamination of bases, alkylating agents, oxidative 
stress, ionizing radiation, or replication errors. All 
these lesions cause little perturbation of DNA
structure. Most DNA glycosylases excise a wide variety
of modified bases, while only a few known so far have a
very narrow substrate specificity. The fact that BER
enzymes perform more than one step in the BER
pathway is further evidence of their versatility.
There are two types of DNA glycosylases: the
monofunctional, devoid of any other associated activity,
and the bifunctional, with an associated AP-lyase
activity (β- or β,δ-lyase activity) that incises the
phosphodiester bond 3’ to the AP site, leaving a
5’-phosphate terminus or a gap with 3’-phosphate and
5’-phosphate termini. The
biological role of this latter activity is still unknown. 
As a general rule, the free modified base excised is an 
extremely poor inhibitor of its respective DNA gly- 
cosylase. The best inhibitors known of the activity of 
DNA glycosylases are transition-state analogs of the 
reaction catalyzed by these proteins. DNA glycosylases
must locate the aberrant base quickly and efficiently
amid a huge excess of normal ones, and very little is
known about how they achieve this. Based upon the
known structures of DNA glycosylases bound to their
substrates or inhibitors, it appears that different types
of distortion occur in DNA, leading to the insertion of
the aberrant nucleotide of the DNA substrate into a
pocket of the active site by a process termed base
flipping or nucleotide flipping, first described for a
cytosine-5 DNA methyltransferase acting on DNA. The
comparison of the crystal structures of a number of DNA
glycosylases revealed structural homologies leading to
the concept of a superfamily of BER glycosylases, the
helix-hairpin-helix (HhH) superfamily, whose members
share a similar HhH fold and a Gly/Pro-rich stretch
with a nearby Asp (the GPD motif) despite very little
sequence similarity. This HhH motif plays an important
role in flipping out the modified base.
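The order of events in BER, and the difference between monofunctional and bifunctional glycosylases, can be illustrated with a deliberately simplified Python model. It is a toy sketch under stated assumptions, not a biochemical simulation: bases are letters on one strand, an AP site is `None`, incision is recorded without distinguishing the 5' from the 3' side, end processing by a dRPase and the accessory proteins are not modelled, and every function name is hypothetical.

```python
from dataclasses import dataclass, field

COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


@dataclass
class Strand:
    """One DNA strand as a list of bases; None marks an abasic (AP) site."""
    bases: list
    incised: set = field(default_factory=set)  # positions where the backbone is cut


def glycosylase(strand: Strand, pos: int, bifunctional: bool) -> None:
    """Remove the damaged base, leaving an AP site. A bifunctional enzyme's
    AP-lyase activity also cuts the backbone (side of the cut not modelled)."""
    strand.bases[pos] = None
    if bifunctional:
        strand.incised.add(pos)


def ap_endonuclease(strand: Strand, pos: int) -> None:
    """Incise the backbone at an AP site left by a monofunctional glycosylase."""
    if strand.bases[pos] is not None:
        raise ValueError("no AP site at this position")
    strand.incised.add(pos)


def polymerase(strand: Strand, pos: int, template: str) -> None:
    """Fill the one-nucleotide gap with the base complementary to the template."""
    if pos not in strand.incised:
        raise ValueError("the AP site must be incised before gap filling")
    strand.bases[pos] = COMPLEMENT[template[pos]]


def ligase(strand: Strand, pos: int) -> None:
    """Seal the nick, restoring a continuous backbone."""
    strand.incised.discard(pos)


def repair(damaged: str, template: str, pos: int, bifunctional: bool) -> str:
    """Run the toy short-patch sequence on one lesion and return the strand."""
    strand = Strand(list(damaged))
    glycosylase(strand, pos, bifunctional)
    if not bifunctional:
        ap_endonuclease(strand, pos)
    polymerase(strand, pos, template)
    ligase(strand, pos)
    if strand.incised:
        raise RuntimeError("backbone should be continuous after ligation")
    return "".join(strand.bases)


# Template bases listed position by position opposite the repaired strand.
TEMPLATE = "TGCAT"
print(repair("AUGTA", TEMPLATE, 1, bifunctional=False))  # uracil from deaminated C
print(repair("ACXTA", TEMPLATE, 2, bifunctional=True))   # X: a hypothetical oxidized G
```

The only structural difference between the two calls is whether a separate AP-endonuclease step is needed before gap filling.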

The number of known DNA glycosylases re- 
mained constant for a long time; however, by identify- 
ing the active core region of some of these enzymes 
then searching for homologs to this core, new DNA 
glycosylases have been identified. Improved functional
prediction for uncharacterized genes through evolutionary
analysis may be expected to reveal further DNA
glycosylases.

The BER pathway has been reconstituted in vitro 
with cell-free extracts of E. coli, or human cells, or 
using proteins purified to homogeneity. The major
proteins performing this process are well defined but 
the accessory proteins required to obtain an optimal 
repair are not yet completely identified. Since the 
damage-specific initial step is carried out by either a 
monofunctional or a bifunctional DNA glycosylase, it 
yields abasic sites with different structures. The pro- 
cessing of the resulting AP site, a mutagenic repair 
intermediate, presumably by the major mammalian 
AP-endonuclease, HAP1/APEX, occurs via two 
alternative pathways: the short-patch (filling a one- 
nucleotide gap) and the long-patch (resynthesis of 
two to six nucleotides) BER. These two pathways 
involve some common proteins but also some specific 
ones. For example, in the short-patch pathway, Pol β is
involved in the resynthesis step, whereas PCNA and
Pol β/δ/ε are implicated in the long-patch pathway.
The results obtained so far suggest that lesions re- 
cognized by monofunctional DNA glycosylases are 
processed by both the short- and the long-patch path- 
ways, whereas those recognized by bifunctional DNA 
glycosylases are processed via the short-patch pathway.
Moreover, the selection of the BER pathway could be
cell-cycle dependent; the long-patch pathway might
operate postreplicatively. However, the repair rates
measured so far are not optimal and should be improved
by identifying and using the accessory proteins.
Although some proteins such as poly(ADP-ribose)
polymerase are involved in the repair of lesions induced
by simple alkylating agents, the precise role of this
protein in the resistance of cells to alkylating agents
remains unclear. The recent identification of new DNA
polymerases able to replicate efficiently and accurately
across miscoding and mutagenic modified bases has to be
taken into account in understanding BER.
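The tentative routing rule described above can be written down as a small lookup table. The Python sketch below encodes only what the text states (which initiating glycosylase type feeds which sub-pathway, the gap lengths, and the named proteins); it is not a mechanistic model, the rule itself is provisional, and the names `Glycosylase`, `candidate_subpathways`, and `describe` are hypothetical.

```python
from enum import Enum


class Glycosylase(Enum):
    MONOFUNCTIONAL = "monofunctional"
    BIFUNCTIONAL = "bifunctional"


# Gap lengths (nucleotides) resynthesized by each sub-pathway, from the text.
PATCH_NT = {"short-patch": (1, 1), "long-patch": (2, 6)}

# Proteins the text names for each sub-pathway (not an exhaustive list).
COMPONENTS = {
    "short-patch": ["Pol beta"],
    "long-patch": ["PCNA", "Pol beta/delta/epsilon"],
}


def candidate_subpathways(initiator: Glycosylase) -> list:
    """Sub-pathways suggested so far to process lesions from this initiator."""
    if initiator is Glycosylase.MONOFUNCTIONAL:
        return ["short-patch", "long-patch"]
    return ["short-patch"]


def describe(path: str) -> str:
    """Human-readable summary of one sub-pathway."""
    lo, hi = PATCH_NT[path]
    size = f"{lo} nt" if lo == hi else f"{lo}-{hi} nt"
    return f"{path} ({size}; {', '.join(COMPONENTS[path])})"


for kind in Glycosylase:
    routes = ", ".join(describe(p) for p in candidate_subpathways(kind))
    print(f"{kind.value}: {routes}")
```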


In the case of oxidative damage caused by hydroxyl
radicals generated along the track of ionizing radiation,
clustered multiply damaged sites have been observed,
most of them consisting of modified bases rather than
DNA strand breaks. These modified bases lie within half
a turn of the double helix, i.e., about five nucleotides,
some of them on opposite strands, and they therefore
present a challenge to the cell for their repair. The
precise mechanisms are so far very poorly understood.
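One simple way to make "within half a turn of the double helix" operational is to group lesions whose positions fall within a 5 bp window and flag groups that involve both strands. The Python sketch below is an illustration under assumptions: both strands share one base-pair coordinate, the window is anchored at the first lesion of each group (a greedy rule, not a standard definition), and the lesion positions are invented.

```python
HALF_TURN_BP = 5  # assumption: about 10.5 bp per B-DNA turn, halved and rounded


def find_clusters(lesions, window=HALF_TURN_BP):
    """Group lesions given as (strand, position) pairs, using a shared base-pair
    coordinate for both strands. A cluster is two or more lesions whose
    positions all lie within `window` bp of the first lesion in the group."""
    ordered = sorted(lesions, key=lambda lesion: lesion[1])
    clusters, current = [], []
    for lesion in ordered:
        if current and lesion[1] - current[0][1] > window:
            if len(current) > 1:
                clusters.append(current)
            current = []
        current.append(lesion)
    if len(current) > 1:
        clusters.append(current)
    return clusters


def is_bistranded(cluster):
    """True if the cluster contains lesions on both strands."""
    return len({strand for strand, _ in cluster}) == 2


lesions = [("top", 10), ("bottom", 13), ("top", 40), ("top", 100), ("top", 104)]
for c in find_clusters(lesions):
    print(c, "bistranded" if is_bistranded(c) else "same strand")
```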

Since, so far, no human diseases have been linked to
defects in proteins involved in BER, DNA repair genes
functionally expressed in mammalian cells and, more
recently, transgenic mice carrying a null mutation in a
gene coding for a BER protein are very important tools
to ascertain the biological role of these proteins in
mammalian cells. Surprisingly, apart from a few
examples of targeted deletion of genes encoding BER
proteins in mice that lead to embryonic lethality (for
example, the AP-endonuclease), the phenotype of the
other knockout mice (such as those lacking one of a
number of DNA glycosylases) does not show any striking
particularity in terms of predisposition to cancer or
aging, for example, raising the possibility of
back-up pathway(s) that have yet to be identified. One
could expect important breakthroughs from crosses 
between different strains to produce double knock- 
outs to identify the possible back-up systems, the 
processes involved in regulation, and the interactions 
of the different pathways. 

Detailed understanding of the mechanisms leading 
to the coordination of various proteins involved in 
the molecular reaction of BER is of paramount im- 
portance for gaining insights into the efficiency and 
fidelity of this key pathway for genome stability, 
prevention of cancer, resistance to chemotherapeutic 
agents, degenerative diseases, and more recently in 
some aspects of teratogenicity. 


See also: Excision Repair 
The grass family (Gramineae or Poaceae) is descended
from a single common ancestor, thought to have lived
sometime between 70 and 55 million years ago (mya)
in tropical forest margin habitats. The major radiation
of the grasses was much later, probably around 35 mya,
and correlates with an acquired ability to tolerate
drought. Today there are about 10000 species of 
grasses, occurring on all continents and covering 
about 20% of the earth’s land surface. Members of 
the family provide food for most humans, and include 
rice, maize, wheat, oats, barley, rye, sugarcane, sor- 
ghum, and the various species known as millet. Other 
grasses are the main source of feed for livestock. 
Because of their economic importance, the grasses 
have been studied extensively by biologists and have 
become important model systems on which our 
knowledge of plant biology is based. This is particu- 
larly true for maize, which has an excellent genetic 
map and an enormous collection of mutants, and rice, 
whose genome is now almost entirely sequenced. 

The evolutionary history is now well known thanks 
to numerous investigations by molecular systematists. 
From these studies, a classification has been derived 
that follows the evolutionary history. Because the 
family is so large, it is divided into twelve subfamilies 
for convenience. The most important of these are 
the Panicoideae, which includes about 3200 species, 
the Pooideae, which includes about 3300 species, the 
Chloridoideae, which includes about 1350 species, 
and the Bambusoideae, with about 1000 species. The 
Panicoideae and Chloridoideae include many species 
that exhibit the C4 photosynthetic pathway, which 
appears to be an adaptation to hot, dry environments. 

The nuclear genomes of the grasses are approxi- 
mately colinear, with large blocks of genes in the same 
order in all species investigated. These blocks are,
however, arranged differently in different species, so
that the number of chromosomes varies. For example, the genes on
chromosome 10 of rice are all found in the same 
order in maize and other grasses in the subfamily 
Panicoideae. In the panicoids, however, rice 10 is not 
a separate chromosome, but is inserted into the middle 
of rice 3. Combination of some chromosomes gives 
the panicoids a smaller number of chromosomes (9 or 
10) than rice, which has 12. 
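The relationship between rice chromosomes 10 and 3 in the panicoids can be made concrete by treating chromosomes as ordered lists of genes. The Python sketch below uses invented gene labels and assumes the insertion lies exactly at the midpoint of rice 3; it only illustrates what colinearity means, not how real comparative maps are constructed.

```python
def insert_block(host, guest, at=None):
    """Return a new chromosome with `guest` inserted into `host`
    (by default at the midpoint of `host`)."""
    if at is None:
        at = len(host) // 2
    return host[:at] + guest + host[at:]


def is_colinear(block, chromosome):
    """True if the genes in `block` occur contiguously and in the same order."""
    n = len(block)
    return any(chromosome[i:i + n] == block
               for i in range(len(chromosome) - n + 1))


# Hypothetical gene labels standing in for loci on rice chromosomes 3 and 10.
rice3 = ["r3-a", "r3-b", "r3-c", "r3-d"]
rice10 = ["r10-a", "r10-b", "r10-c"]

panicoid = insert_block(rice3, rice10)
print(panicoid)
print("rice 10 block intact:", is_colinear(rice10, panicoid))
print("rice 3 uninterrupted:", is_colinear(rice3, panicoid))
print("each half of rice 3 intact:",
      is_colinear(rice3[:2], panicoid) and is_colinear(rice3[2:], panicoid))
```

Gene order is preserved within every block even though the fused chromosome no longer matches either rice chromosome end to end, which is the sense in which the genomes are "approximately" colinear.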

Gene order is conserved in spite of large changes in 
genome size. The amount of DNA in the nucleus 
varies among grasses by a factor of 20, with rice and 
foxtail millet having among the smallest genomes and 
wheat and barley among the largest. The greatest dif- 
ferences in size are caused by the amount of noncod- 
ing DNA between the genes. This noncoding DNA 
appears to be largely an accumulation of retrotrans- 
posons. 

The forces that maintain colinearity are unknown. 
Although most grasses have relatively few rearrange- 
ments, a few have extensive changes in gene order. The 
amount of rearrangement does not correlate with
evolutionary relationship. For example, although rye
is more closely related to wheat than it is to barley, 
wheat and barley have nearly identical gene orders, 
whereas rye has multiple differences. 

Colinearity of the genomes is potentially useful in 
positional cloning of genes. The enormous size of the
wheat genome makes chromosome walking virtually 
impossible, even with a very precisely mapped gene. If 
a gene can be localized well enough, however, it is 
possible to find the corresponding region in the rice 
genome and locate the gene in the rice genomic 
sequence. The orthologous wheat gene can then be 
identified by sequence similarity to the rice gene. 
This approach could in principle be used to investigate 
variation in any grass, not just well-studied crop 
species. 
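The colinearity-based strategy just described amounts to projecting a mapped wheat interval onto rice coordinates and listing the rice genes inside it. The Python sketch below shows that lookup step under assumptions: the marker names, rice positions (in kb), and gene identifiers are all hypothetical, and the final step of finding the wheat ortholog by sequence similarity is not modelled.

```python
from bisect import bisect_left, bisect_right


def rice_interval(flanking_markers, marker_to_rice_kb):
    """Project the wheat markers flanking a gene onto rice coordinates (kb)."""
    positions = [marker_to_rice_kb[m] for m in flanking_markers]
    return min(positions), max(positions)


def candidate_rice_genes(interval, rice_genes):
    """Return rice genes whose start (kb) lies in the interval.
    `rice_genes` must be a list of (start_kb, gene_id) sorted by start."""
    starts = [start for start, _ in rice_genes]
    lo = bisect_left(starts, interval[0])
    hi = bisect_right(starts, interval[1])
    return [gene for _, gene in rice_genes[lo:hi]]


# Hypothetical data: two wheat markers with known positions on one rice chromosome.
markers = {"wheat_marker_1": 1200.0, "wheat_marker_2": 1350.0}
rice_genes = [(1100.0, "rice_gene_1"), (1210.0, "rice_gene_2"),
              (1300.0, "rice_gene_3"), (1400.0, "rice_gene_4")]

interval = rice_interval(["wheat_marker_1", "wheat_marker_2"], markers)
print(interval, candidate_rice_genes(interval, rice_genes))
```

Each candidate's rice sequence would then serve as the query for identifying the orthologous wheat gene by similarity, as described in the text.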


See also: Genome Relationships: Maize and the 
Grass Model; Hordeum Species; Oryza sativa 
(Rice); Triticum Species (Wheat) 
Life on earth has evolved in the presence of gravity.
Hence, it is not surprising that many organisms have 
acquired ways to use that inherent vectorial infor- 
mation to guide specific processes. Plants are no 
exception: they have acquired the ability to use gravity 
to orient the growth of their organs. This response, 
named gravitropism, is of primary importance to these 
sessile organisms. Indeed, it allows the shoots to grow 
upward, above the soil, where they can photosynthe- 
size, and the roots to grow downward into the soil, 
where they can take up the water and mineral ions 
required for plant growth and development. 

Gravitropism is also important in agriculture and 
horticulture. It promotes upward growth of crop 
shoots prostrated by the action of wind and rain, 
thereby keeping seeds away from soil moisture and 
pathogens and amenable to mechanical harvest. On 
the other hand, gravitropism is responsible for some 
unwanted shoot bending that occurs during transport 
and/or storage of cut flowers. 

Plant organs grow through a combination of cell
division in their apical meristems and cell expansion in
their subapical regions. Cells that are laid down by the
division of initials in the apical meristem undergo
an expansion process before full differentiation. Cell
expansion is a highly controlled process, and is the
primary target for environmental signals that guide 
organ growth. Thus, when a plant organ is reoriented 
within the gravity field, it responds with differential 
cellular elongation (expansion along the longitudinal 
axis) on opposite flanks of the elongation zone. The 
differential growth results in the development of a 
curvature that brings the organ tip back to an accept- 
able orientation (gravitational set point angle). 
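How much differential elongation is needed to produce a given curvature follows from simple geometry. In an idealized sketch, assuming the elongation zone is an initially straight cylinder of diameter d that bends into a circular arc, the convex and concave flanks become arcs whose radii differ by d, so the bend angle θ (in radians) equals the difference between the flank elongations divided by d. The Python code below applies this; the 150 µm width is an illustrative value, not a measurement from the text.

```python
import math


def bend_angle_deg(outer_growth_um: float, inner_growth_um: float,
                   diameter_um: float) -> float:
    """Bend angle of an initially straight, uniformly curving cylindrical zone.

    The outer and inner flanks become arcs of radii R + d/2 and R - d/2
    spanning the same angle theta, so their length difference is d * theta."""
    if diameter_um <= 0:
        raise ValueError("diameter must be positive")
    return math.degrees((outer_growth_um - inner_growth_um) / diameter_um)


def growth_difference_um(angle_deg: float, diameter_um: float) -> float:
    """Extra elongation of the convex flank needed for a given bend angle."""
    return math.radians(angle_deg) * diameter_um


# Illustrative numbers only: a 90 degree reorientation of an organ 150 um wide.
needed = growth_difference_um(90, 150)
print(f"convex flank must outgrow the concave flank by about {needed:.0f} um")
print(f"check: {bend_angle_deg(needed + 100, 100, 150):.1f} degrees")
```

Because the required difference scales with organ width, a thin organ needs less differential elongation than a thick one to reach the same angle.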

The existence of a gravitropic response implies that 
plant organs can sense a change in their orientation 
within the gravity field, and transduce this physical 
information into a physiological signal. The physiological
signal is then transmitted from a site of sensing
to the site of response (elongation zone), where it 
promotes a differential cellular elongation on opposite 
flanks, responsible for the curvature. A great deal of 
information on the gravitropic response of plant 
organs has recently been obtained through the mo- 
lecular genetic analysis of gravitropism in the model 
plant Arabidopsis thaliana. 


Arabidopsis thaliana as a Model for the 
Study of Gravitropism 


Arabidopsis thaliana is a powerful model for the 
study of growth and development processes in plants. 
It is a small plant that has a short generation time 
(~6 weeks), and grows well under laboratory condi- 
tions, on shelves at room temperature, with limited 
amounts of light. It reproduces by self-pollination, 
although cross-pollination can be easily accomplished.
A single plant generates approximately 10 000-30 000
seeds. Its nuclear genome is small (125 Mb) and has
been completely sequenced. The plant can be trans- 
formed very easily by Agrobacterium tumefaciens, 
and large collections of T-DNA-insertion and trans- 
poson-mobilized lines have been generated and are 
available for forward and reverse genetic studies. 
Importantly for the field of gravitropism, Arabidop- 
sis thaliana is a small plant that generates tiny seeds. 
Upon germination, these seeds give rise to small seed- 
lings that can be grown under sterile conditions in petri 
dishes, under controlled environmental conditions. 
Hence, it is possible to subject individual seedlings to 
changing levels of a specific environmental parameter, 
while maintaining other growth conditions constant. 
This ability to grow a large number of Arabidopsis 
seedlings under highly controlled environmental con- 
ditions has allowed the development of large-scale 
screens to examine many mutagenized plants for iden- 
tification of gravitropic mutants. These screens have 
typically involved growing seedlings on or in vertical 
agar-containing media for a few days. Then, young 
seedlings were gravistimulated by rotating the plates 
by 90°. Under these conditions, wild-type seedlings
reoriented the growth of their primary organs within 
12 h, resuming vertical upward and downward growth 
for hypocotyls and roots, respectively. Gravitropic 
mutant seedlings were not able to reorient well in
response to gravistimulation. Rather, their roots and
hypocotyls grew more randomly with respect to the gravity
vector than those of the wild-type, even before plate rotation.

Similar procedures have been developed to identify 
mutants affected in inflorescence stem gravitropism. 
In this case, plants are germinated and grown in soil 
until bolting. When inflorescence bolts reach a few 
centimeters, they are cut, inserted in a block of solidi- 
fied medium, and placed horizontally. Here again, 
wild-type shoots reorient upward, while mutant shoots 
do not. It is interesting to note at the outset that 
mutations were identified that affect the gravitropic 
response of all three organs (roots, hypocotyls, and 
inflorescence stems), while others were specific to one 
or two of these organs. This reflects both the redun- 
dancy that exists at some steps of the gravity signal 
transduction pathway, and the fact that some of the 
steps in gravity signal transduction are common 
between all three organs, while others are specific to 
one or two of them. 


Gravity Sensing and Signal Transduction 


Gravity sensing appears to occur in a few specialized 
cells of each plant organ, named statocytes. In roots, 
statocytes are located in the center of the cap, an organ 
that covers the root apical meristem. In shoots, the 
statocytes appear to be located in the starch sheath, an 
endodermal cell layer that surrounds the vasculature. 
The statocytes are highly polarized cells that contain 
sedimentable amyloplasts, named statoliths, which are 
starch-filled plastids whose density is 1.5 times higher 
than that of the surrounding cytoplasm. Hence, upon 
reorientation within the gravity field, amyloplasts 
sediment to the new physical bottom of the statocytes. 
The starch-statolith hypothesis proposes that the stato- 
cytes are capable of sensing amyloplast sedimentation, 
or the pressure exerted by these plastids on unknown 
gravity receptors. 

Amyloplast sedimentation appears to be the pri- 
mary gravity-sensing mechanism in higher plants, 
although alternative models have been proposed that 
may account for some aspects of the response. Mag- 
netophoretic studies involving a lateral mobilization 
of the diamagnetic amyloplasts within the statocytes 
by high-gradient magnetic fields have demonstrated 
that amyloplast sedimentation is sufficient for the 
promotion of shoot and root tip curvature. 

Consistent with a primary role of amyloplast sedi- 
mentation in gravity sensing, starch-deficient mutants 
show strong defects in gravitropism. For instance,
mutations in the phosphoglucomutase (PGM) gene of 
Arabidopsis affect both shoot and hypocotyl gravi- 
tropism. Phosphoglucomutase is an enzyme involved 
in starch biosynthesis, and some of the pgm mutants 
are unable to accumulate starch in their statocytes. 
Interestingly, magnetophoresis does not promote sta- 
tolith displacement or organ-tip curvature in these 
mutants. 

The scr (SCARECROW) and shr (SHORT-
ROOT) mutations affect the formation of ground
tissue (cortex and endodermis) in A. thaliana roots
and shoots. Mutant organs lack one cell layer at the
position normally occupied by the ground tissue.
The remaining layer in this position has characteris-
tics of both tissue types in scr, whereas it lacks any
endodermal specification in shr. In both mutants,
ground-tissue cells lack statoliths, while endodermal 
cells in wild-type shoots and hypocotyls do contain 
them. Interestingly, shoots and hypocotyls of scr and 
shr mutant seedlings did not respond to gravistimula- 
tion, while their roots did. As the root statocytes are 
located in the cap, not in the endodermis, the results 
provide good correlative evidence for the starch- 
statolith hypothesis described above. 

Even though amyloplast sedimentation appears 
sufficient to promote the development of a curvature 
at the tip of a plant organ, it is not clear how the 
corresponding physical information is transduced into 
a physiological signal within the statocytes. Physio-
logical evidence points to Ca2+, IP3, and pH as pos-
sible second messengers in this pathway. However,
genetic evidence for this conclusion is still lacking.

So far, mutations in only three genes, ARG1, ARL2,
and RHG, have been shown to affect the signal trans-
duction phase of gravitropism. Mutant seedlings
develop an altered gravitropic response in hypocotyls
and roots, without affecting their phototropic compe-
tency (the ability of hypocotyls and roots to curve toward
or away from a light source, respectively). Because
gravitropism and phototropism appear to involve similar differential cellular
elongation responses promoted by the redistribution 
of a specific plant growth regulator (auxin: see below), 
this result strongly suggests that these genes are 
involved in early phases of gravity signal transduction. 
The ARG1 and ARL2 genes encode similar dnaJ-like
proteins that carry a coiled-coil domain at their
C-terminus. In ARG1, this domain is similar to coiled 
coils found in a number of cytoskeleton-binding pro- 
teins. Hence, it was postulated that ARG1 might regu- 
late gravity signal transduction either by promoting 
the formation of a signal transduction complex in the 
vicinity of the cytoskeleton, or by altering the general 
organization of the cytoskeleton. It is interesting to
note that dnaJ-like proteins have been implicated as
molecular chaperones in a number of signal transduction
pathways, as well as in general protein folding,
translocation, or degradation. Although the molecular
function of ARG1 has not been fully elucidated yet,
this protein is probably not required for general
protein folding, translocation, or degradation, given
the specificity of the arg1 phenotype.

Hence, ARG1 and ARL2 could act in gravitropism 
by serving as chaperones in the folding of specific 
components of the gravity signal transduction path-
way, or in their targeting to specific cellular subcompart-
ments. Interestingly, genetic modifiers of arg1 have
been identified. Modified seedlings appear to develop
a more dramatic phenotype, displaying an almost ran-
dom orientation of their organs, with some tendency
toward the orientation opposite to that of the wild-type. The
molecular analysis of these genetic modifiers promises
to provide important clues to the molecular function
of ARG1 in gravitropism.


Signal Transmission to the Responding 
Zone 


The composition of the physiological signal that is 
generated upon perception of a gravistimulus within 
the statocytes and informs the elongation zone of a 
need to respond to the stimulus has not yet been fully 
elucidated. However, this signal appears to include a 
plant growth regulator, named auxin. Early physiolo- 
gical studies showed that auxin may be redistributed 
across the gravistimulated organ in response to the 
activated gravity signal transduction pathway. The 
corresponding cross-organ gradient is then trans- 
mitted to the elongation zone where it promotes a 
differential growth response. 

In plants, auxin is transported through cell files in 
a polar fashion. It enters successive cells in the file 
through an influx carrier or by passive diffusion 
through the plasma membrane, and exits them through 
a complex auxin efflux carrier. The efflux carrier is 
made of a transmembrane protein, a regulatory pro- 
tein that may bind the cytoskeleton and appears to be 
the target for a number of transport blockers, and a 
putative linker protein. Polarity of transport appears 
to be mediated by the polar distribution of this auxin 
efflux carrier complex within the transporting cells. 

Interestingly, several mutations that affect gravi- 
tropism in A. thaliana were recently shown to affect 
the transport of auxin. Mutations in AUX1 result in
altered root gravitropism and increased resistance of
root growth to auxin. The gravitropism phenotype of
aux1 seedlings can be rescued by adding a low con-
centration of 1-NAA, a synthetic auxin that appears
to diffuse through cellular membranes quite
efficiently, but not by adding 2,4-D or IAA to the
medium. Because the latter two auxins are believed
to require a transporter to enter the cells, it was
hypothesized that AUX1 encodes an auxin influx carrier.
The AUX1 gene encodes a transmembrane
protein that shares homology with tryptophan
(TRP) transporters. Because the molecular structure
of auxin is quite similar to that of TRP, this finding
strengthened the hypothesis that AUX1 encodes an auxin influx carrier
involved in the local transport of auxin at the root tip.
Auxin-transport studies have since confirmed this con-
clusion.

Other gravitropism mutations of A. thaliana have 
been shown to affect a transmembrane component of 
the auxin efflux carrier complex. Indeed, agr1 mutant 
seedlings are more sensitive to high concentrations of 
1-NAA, more resistant to ethylene, and more resistant 
to blockers of the auxin efflux carrier (NPA, TIBA) 
than wild-type plants. Mutant roots are also defective 
in their ability to transport radioactively labeled auxin 
in a basipetal fashion, supporting a role for the corres-
ponding gene in auxin transport in roots. The AGR1
gene (also named EIR1, PIN2, or WAV6) encodes a
transmembrane protein that is localized on the basal
membrane of root elongation-zone cells. When
expressed in yeast, this protein enhances auxin
export activity. Taken together, these results
support a direct role for AGR1 in cellular auxin efflux.

Auxin is a growth regulator that has multiple roles 
in plant growth and development, including embryo 
axis formation, vasculature development, lateral root 
formation and development, apical dominance, and 
tropisms. However, aux1 and agr1 show very specific
defects in gravitropism. In fact, AUX1 and AGR1
belong to large gene families, and one can speculate
that specific members of each family have different 
functions in a subset of these growth and develop- 
mental processes. For instance, the PIN1 gene appears 
to mediate the polar transport of auxin in inflores- 
cence stems. Hence, a better insight into the function 
of each member of these important gene families will 
enhance our understanding of the role(s) played by 
auxin in multiple phases of plant growth and develop- 
ment. 

Although auxin appears to be an important com- 
ponent of the physiological signal that dictates organ 
tip curvature in response to gravistimulation, it is not 
the only player. Indeed, auxin transport and auxin 
response mutants (see below) still appear to develop 
some remnants of a gravitropic response. Also, a 
robust gravitropic response is still observed even 
when corn or Arabidopsis roots are exposed to high 
auxin levels, otherwise sufficient to completely inhibit 
root growth. Furthermore, the differential cellular 
elongation that occurs on opposite flanks of the root 
elongation zone in response to gravistimulation is
very complex, and cannot be explained by a simple 
redistribution of auxin across the root. Hence, it 
appears that gravitropism also involves an auxin- 
gradient-independent process. Although there is no 
clear understanding of this auxin-gradient-independ- 
ent phase of gravitropism, physiological experiments 
suggest that it might involve electrical signals. The 
availability of ion channel mutants in A. thaliana, 
and of efficient reverse-genetic procedures to disrupt 
the expression of other channel genes identified by the 
completed genome-sequencing project, should allow 
experimental testing of this model. 


The Curvature Response 


A number of auxin-response mutants have been isol- 
ated in A. thaliana. Most of these mutants were also 
shown to be defective in their ability to respond to 
gravistimulation. Molecular analysis of the corres-
ponding genes revealed interesting features of the
auxin-response pathway.

Auxin appears to regulate cellular elongation by 
altering the activity of the plasma membrane proton 
pump, by affecting cell wall extensibility and by regu- 
lating the expression of a number of genes important 
for these processes. Auxin has been shown to bind to a 
number of proteins within plant cells. However, only 
the auxin-binding protein ABP1 has been postulated 
to act as an auxin receptor in the control of cell expan- 
sion. Upon auxin binding, this predominantly ER- 
localized protein would somehow regulate the activity 
of the proton pump, and promote cell expansion. The 
details of its mode of action are yet to be elucidated. 

Some aspects of auxin signal transduction leading 
to differential gene expression have recently been eluci-
dated. The AXR1 gene of A. thaliana is important for
gravitropism and other aspects of auxin response. It
encodes a nuclear protein that interacts with ECR1 to
activate members of the RUB/NEDD8 family of
ubiquitin-related proteins. Interestingly, the AXR1/
ECR1 complex appears to mediate the rubination of 
another protein, named cullin. Cullin belongs to a 
protein complex that also includes ASK and the F- 
box containing TIR1 protein, which is also essential 
for gravitropism and auxin response. The ASK/cullin/
TIR1 complex is similar to the yeast Skp1/Cdc53
(cullin)/F-box protein (SCF) complex, which has been impli-
cated in ubiquitin-mediated protein degradation. The
targets of this TIR1-containing SCF-like complex 
appear to be repressors of early auxin-response genes 
that may be targeted to destruction by the proteasome 
in a ubiquitin-dependent manner. The AXR2, AXR3,
and SHY2 proteins may constitute such targets. These
short-lived proteins interact with auxin-response
transcription factors, and may negatively regulate the
expression of other auxin-response genes. These three 
genes are also important for gravitropism and auxin 
response. Hence, a gene-regulation cascade appears to 
be activated by this complex auxin-dependent path- 
way, even though the site of auxin action in the 
pathway remains elusive. 


Future Prospects 


Our understanding of the molecular mechanisms that 
drive gravitropism in plant organs has improved 
through the analysis of gravitropic-response mutants 
in A. thaliana. This analysis has helped to sub-
stantiate the starch-statolith hypothesis, even though
the data remain purely correlative at this time. A role
for auxin as a component of the gravitropic signal 
transmitted from the site of sensing to the site of 
response has been confirmed. Also, some of the 
proteins involved in polar auxin transport have been 
identified and are being characterized, thus opening 
the door to an elucidation of the multiple roles played 
by auxin transport in plant growth and development. 
Finally, a clear involvement of ubiquitin-mediated
proteolysis in the auxin signal transduction pathway
has been demonstrated, and a number of target regulatory
genes for that pathway have been uncovered.

Many things remain to be done, however, before 
one can fully understand the multiple mechanisms 
involved in gravitropism in higher plant organs. The 
gravitropic receptor that is activated by amyloplast 
sedimentation or pressure in the statocytes has to be 
identified and characterized. The molecules involved 
in transducing the corresponding signal within the 
statocytes have yet to be characterized. Physiological
and physicochemical evidence suggests the existence of
an alternative mode of gravity sensing in higher plants, 
possibly involving perception of the pressure exerted 
by whole protoplasts on their cell walls and intracel- 
lular cytoskeleton networks. The relative contribution 
of each gravity-sensing mechanism remains to be eluci- 
dated. A better understanding of the mechanisms 
involved in auxin redistribution is needed, as well as 
the identification of additional components of the 
signal transmitted to the responding zone. Finally, a 
complete elucidation of the mechanisms involved in 
the cellular responses to these signals is needed. 

Fortunately, an unprecedented number of tools 
derived from genetics, reverse genetics, genomics, 
proteomics, and biochemistry in Arabidopsis, rice, 
corn, and other plant species have recently been 
added to an already impressive arsenal of physiolo- 
gical, cytological, and physicochemical techniques. 
A multidisciplinary approach is now possible, and
should improve our ability to answer these important
questions of plant biology. Thus, we can anticipate
some important breakthroughs in our understanding 
of the molecular mechanisms that allow plant organs 
to use gravity and other environmental stimuli to 
control their growth patterns and generate some 
truly amazing growth behaviors. 
Basic Concepts


Natural selection occurs in any system whose mem- 
bers have the properties of replication, variation, and 
heredity (Lewontin, 1970; Maynard Smith, 1976). 
When the system consists of cells, the variation
among cell lineages in replication and death rates,
and the similarity of daughter to mother cells, give
rise to among-cell selection, which determines tissue
shape. When such a process operates among cells
within the germline, it can result in gametic selection 
or ‘meiotic drive,’ one of the strongest evolutionary 
forces known. When selection occurs among indi- 
viduals, among groups, or among species, it is called 
individual selection (sometimes mass selection), group 
selection, or species selection, respectively. 

Group selection has been a controversial topic in 
evolutionary biology for several reasons (Williams, 
1966; Wade, 1978; Wilson, 1980). First, it is difficult 
to establish that groups of individuals have the neces- 
sary properties of replication, variation, and heredity. 
Groups can be formed in so many different ways and 
the processes of group formation determine, in large 
part, whether biologically significant variation among 
groups can exist and, if it exists, whether or not it is 
heritable (Wade, 1996). Secondly, if groups do have 
the requisite properties, it is not clear what category 
of adaptations or patterns in nature can be better 
explained as a unique result of group selection than 
by the more familiar individual selection. It is for this 
reason that much of the group selection controversy 
has been focused on adaptations that are good for the 
group but harmful for the individual or on adaptations 
such as sex which might favor group ‘evolvability’ 
(Williams, 1975; Maynard Smith, 1976). Such adapta- 
tions would be the distinctive signature of group selec- 
tion (Wilson, 1992). Thirdly, whenever individual and 
group selection operate simultaneously, the number of 
episodes of individual selection is likely to be greater 
than that for group selection, because individual birth 
and death rates are higher than group colonization 
and extinction rates. (This criticism does not apply
to D.S. Wilson’s trait group selection, which Wade
(1978) has called ‘intrademic group selection’.)
Fourthly, the common wisdom subscribes to a naive 
form of group selection when it incorrectly describes 
adaptations of all sorts as being “for the good of the 
species.” This attribution is a serious misunderstand- 
ing of the Darwinian logic and evolutionary dynamic. 
Countering this misconception and misuse of naive 
group selection as a causal explanation has instilled a 
profound bias against the entire concept of group 
selection in some biologists (e.g., Williams, 1966; 
Dawkins, 1976). 


Illustration of Group Selection 


Geographic and physical barriers often constrain the 
movements of individuals and thereby impose a degree 
of genetic subdivision or population genetic structure 
on most species. In addition, individuals tend to ag- 
gregate or cluster together whenever resources are 
patchily distributed. It is this spatial aggregation of
individuals and the expression of social behaviors 
within aggregations that results in novel ecological 
and evolutionary processes involving group selection. 
Whenever an individual’s behavior affects its own fit- 
ness and the fitness of conspecifics, group selection will 
affect the evolution of that behavior in a genetically 
subdivided population (Wade, 1978; Wilson, 1980). 

Consider a hypothetical species with two kinds of 
individuals, benefactors and recipients (Figure 1). The
benefactors provide a fitness benefit to other members 
of the group and do so at a cost to their own fitness. 
Recipients do not engage in provisioning behaviors 
but benefit from the behavior of benefactors and 
experience increased fitness whenever they are around 
benefactors. This difference in behavior and its fitness 
effects makes the benefactor-recipient interaction an
example of the frequently discussed altruism-cheater
interactions. Darwin believed that the existence of 
such benefactor adaptations could be “fatal to 
my whole theory” of evolution by natural selection 
(Darwin, 1859, p. 236) because, by definition, the 
benefactor lowers its fitness while increasing the fitness 
of the recipient. Natural selection should operate to 
eliminate such behaviors, yet they appear prevalent in 
some of the major taxonomic groups of insects and 
mammals, e.g., the sterile castes in colonies of bees, 
ants, or wasps, ‘helpers at the nest’ in some birds, or 
group feeding in the social spiders. 

Darwin solved this problem by postulating that 
group selection, among colonies or families, operated 
in opposition to individual selection within colonies 
or families (Darwin, 1859, p. 237). We can illustrate
how selection operates in different directions at dif-
ferent levels using the benefactor-recipient illustra-
tion. First, consider two groups of birds (Figure 2).
Each group consists of five birds, but the groups differ 
from one another in the frequency of benefactors. 
Group 1 is rich in benefactors, with a frequency of 
0.80, while group 2 is relatively poor in benefactors, 
with a frequency of only 0.20. Thus, the groups meet
the first criterion for the existence of group selection,
variation; specifically, variation in the frequency of
benefactors. Indeed, there are two components of vari-
ation in the frequency of benefactors: (1) among birds
within groups, and (2) among groups.

Figure 1 Individual variation in social behavior. Recipients: no cost of altruism, but reap benefits from others. Benefactors: bear cost of altruism and bestow benefit on others.

Each component of variation has a selective effect 
or consequence for both individual and group replica- 
tion. Within each group, benefactors experience re- 
duced fitness (Figure 3). In group 1, the frequency 
of benefactors declines from 0.80 to 0.78. Similarly, in 
group 2, the frequency of benefactors declines from 
0.20 to 0.17. Individual selection operating within 
groups selects against the benefactors and their fre-
quency declines as a consequence. The changes in
benefactor frequency are approximately -0.022 and
-0.033 in groups 1 and 2, respectively (the frequencies
0.78 and 0.17 above are rounded). The total change
in the frequency of benefactors by individual
selection is -0.027. This is the weighted average
change, where the weights are determined by the size of
the group relative to the total after individual selection. 
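The within-group component can be recomputed from bird counts. This is a minimal sketch, assuming the counts implied by the example: 4 of 5 birds are benefactors in group 1 and 1 of 5 in group 2 before reproduction, and 7 of 9 and 1 of 6 afterwards (the after-counts are inferred from the stated frequencies 0.78 and 0.17 and the group sizes of nine and six; the function name is hypothetical). Exact fractions avoid the rounding that makes (-0.02)(0.6) + (-0.03)(0.4) come out as -0.024 rather than -0.027.

```python
from fractions import Fraction

# Assumed counts for the worked example: (benefactors, group size).
BEFORE = {"group 1": (4, 5), "group 2": (1, 5)}
AFTER = {"group 1": (7, 9), "group 2": (1, 6)}


def individual_component(before, after):
    """Weighted average of the within-group changes in benefactor frequency.

    Each group's change is weighted by its size after reproduction
    relative to the total size after reproduction (9/15 and 6/15 here).
    """
    total_after = sum(n for _, n in after.values())
    changes = {}
    weighted = Fraction(0)
    for group, (b0, n0) in before.items():
        b1, n1 = after[group]
        change = Fraction(b1, n1) - Fraction(b0, n0)
        changes[group] = change
        weighted += Fraction(n1, total_after) * change
    return changes, weighted


changes, delta_ind = individual_component(BEFORE, AFTER)
for group, change in changes.items():
    print(group, round(float(change), 3))
print("individual selection:", round(float(delta_ind), 3))
```

Run as is, it prints -0.022 and -0.033 for the two groups and -0.027 for the weighted average, matching the value given in the text.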

The among-group component of variation also has 
a selective effect (Figure 4). A group with a high 
frequency of benefactors has a higher growth rate 
than a group with a lower frequency of benefactors. 
This positive effect of benefactor frequency is the 
opposite of the negative fitness effect of being a bene- 
factor within a group. Group 1, with an initial fre- 
quency of 0.80 benefactors, increased in size from five 
to nine birds, while group 2, with a lower initial 
frequency of benefactors (0.20), increased only from 
five to six birds. The relative fitness of group 1 is 1.2, 
which is calculated as a per-head growth rate of 1.8 
(i.e., 9/5) relative to the mean growth rate of 1.5 (i.e.,
15/10). This is much higher than the relative growth 
rate of group 2, which is 0.80, i.e., a per-head growth 
rate of 1.2 (6/5) relative to the mean of 1.5. This 
difference in growth rate of groups also causes a 
change in the frequency of benefactors. Hence, 
group selection favors benefactors and results in a 
positive change in their frequency equal to +0.06. 
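The among-group component follows from the relative fitness of each group: each group's initial frequency is weighted by its initial share of the population and by its relative growth, and the initial mean frequency is subtracted. A minimal sketch under the same assumed counts (4 of 5 and 1 of 5 benefactors; groups growing to nine and six birds); the function name is hypothetical.

```python
from fractions import Fraction

# Assumed counts: (benefactors, size) before reproduction, sizes afterwards.
BEFORE = {"group 1": (4, 5), "group 2": (1, 5)}
SIZE_AFTER = {"group 1": 9, "group 2": 6}


def group_component(before, size_after):
    """Change in benefactor frequency caused by differential group growth."""
    n0_total = sum(n for _, n in before.values())
    mean_growth = Fraction(sum(size_after.values()), n0_total)  # 15/10 = 1.5
    p_bar = Fraction(sum(b for b, _ in before.values()), n0_total)
    rel_fitness = {}
    weighted = Fraction(0)
    for group, (b0, n0) in before.items():
        # per-head growth of the group divided by the mean per-head growth
        rel_fitness[group] = Fraction(size_after[group], n0) / mean_growth
        # initial share of the population x initial frequency x relative fitness
        weighted += Fraction(n0, n0_total) * Fraction(b0, n0) * rel_fitness[group]
    return rel_fitness, weighted - p_bar


rel_fitness, delta_group = group_component(BEFORE, SIZE_AFTER)
print({g: float(w) for g, w in rel_fitness.items()})
print("group selection:", float(delta_group))
```

The relative fitnesses come out as 1.2 and 0.8 and the group component as +0.06, the values stated above.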

The total change in the frequency of benefactors 
equals the sum of the changes caused by the two 
opposing levels of selection (Figure 5): individual 
selection against benefactors and group selection 
favoring benefactors. The total change in the fre- 
quency of benefactors is positive despite the oppos- 
ition of individual selection against benefactors within 
every group. In this example, group selection is 
stronger than opposing individual selection. This 
kind of interesting interaction between individual 
and group selection and behavioral evolution has 
been experimentally demonstrated in laboratory popu- 
lations of flour beetles (Wade, 1980a), in farm popula- 
tions of chickens (Muir, 1996), in field populations of 
willow leaf beetles (Breden and Wade, 1989; Wade, 
1994), jewelweed (Stevens et al., 1995), and social
spiders (Aviles, 2000). (See Goodnight and Stevens,
1997, for a recent review of experimental studies of 
group selection.) 
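The statement that the total change equals the sum of the two opposing components can be checked with exact arithmetic. The sketch below assumes the same inferred bird counts (4 of 5 and 1 of 5 benefactors before reproduction; 7 of 9 and 1 of 6 after), and its function names are hypothetical. This two-level bookkeeping resembles a Price-equation style partition: the within-group and among-group terms add up exactly to the observed change in the pooled population.

```python
from fractions import Fraction

# Assumed counts (benefactors, group size) before and after reproduction.
BEFORE = {"group 1": (4, 5), "group 2": (1, 5)}
AFTER = {"group 1": (7, 9), "group 2": (1, 6)}


def pooled_frequency(groups):
    """Benefactor frequency in the pooled population of all groups."""
    benefactors = sum(b for b, _ in groups.values())
    size = sum(n for _, n in groups.values())
    return Fraction(benefactors, size)


def partition(before, after):
    """Split the total change into within-group and among-group parts."""
    n0 = sum(n for _, n in before.values())
    n1 = sum(n for _, n in after.values())
    mean_growth = Fraction(n1, n0)
    individual = Fraction(0)
    group = Fraction(0)
    for g, (b0, s0) in before.items():
        b1, s1 = after[g]
        p0, p1 = Fraction(b0, s0), Fraction(b1, s1)
        rel_fitness = Fraction(s1, s0) / mean_growth
        individual += Fraction(s1, n1) * (p1 - p0)   # within-group change
        group += Fraction(s0, n0) * p0 * rel_fitness  # growth-weighted frequency
    group -= pooled_frequency(before)
    total = pooled_frequency(after) - pooled_frequency(before)
    return individual, group, total


individual, group, total = partition(BEFORE, AFTER)
print(float(individual), float(group), float(total))
```

The total is 8/15 - 5/10 = 1/30 (about +0.033), and it equals -2/75 + 3/50 exactly, so the individual and group components account for all of the change.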


Group Genetic Structure 


Group genetic structure is often characterized in hier-
archical terms associated with the components of
genetic variation among individuals within groups
and among groups. When quantified using Wright’s F
statistics (Wright, 1969, 1978), group genetic structure
describes the fraction of the total genetic variance
accounted for at a given level of metapopulation sub-
division. For our example, the total variance in the
frequency of benefactors is $(0.5)^2 = 0.25$. This total variance
can be partitioned into two components: (1) the mean
variance within groups, which is 0.16, i.e., $[(0.8)(0.2) +
(0.2)(0.8)]/2$; and (2) the variance among groups,
which is 0.09, i.e., $[(0.8 - 0.5)^2 + (0.2 - 0.5)^2]/2$. Note
that, in this example, the variance among groups is
only 36% of the total variance (0.09/0.25), so that
F, the fraction of the variance among groups, equals
0.36. In fact, the among-group variance is only about
56% as large as the mean variance within groups
(0.09/0.16). Note also that F, the among-group variance
expressed as a fraction of the total, is also the genetic
correlation among individuals within groups
(Cockerham, 1954). Thus, whenever individuals live
in groups of genetic relatives, there will necessarily be
genetic variation among groups (Wade, 1980b).

Figure 3 Individual selection within groups opposes benefactors. Within group 1, benefactor frequency falls from 0.80 to 0.78 ($\Delta p \approx -0.022$); within group 2, from 0.20 to 0.17 ($\Delta p \approx -0.033$). Weighted average: $\Delta p_{\text{individual}} = (-0.022)(0.6) + (-0.033)(0.4) \approx -0.027$.

Figure 4 Group selection favors benefactors. $\Delta p_{\text{group}} = \frac{(0.8)(1.2) + (0.2)(0.8)}{2} - 0.5 = +0.06$.

Figure 5 Total selection favors benefactors. $\Delta p_{\text{total}} = \bar{p}_{\text{after}} - \bar{p}_{\text{before}} = \frac{8}{15} - \frac{5}{10} = +0.033$, and $\Delta p_{\text{total}} = \Delta p_{\text{individual}} + \Delta p_{\text{group}} = -0.027 + 0.060 = +0.033$.
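For a trait scored as present or absent, the variance within a group with frequency p is p(1 - p), so the partition of the total variance in the benefactor example can be checked directly. A minimal sketch, assuming equal-sized groups as in the example; the function name is hypothetical.

```python
from fractions import Fraction


def variance_partition(freqs):
    """Split the variance in benefactor frequency for equal-sized groups.

    freqs: benefactor frequency in each group. For a binary trait the
    variance within a group with frequency p is p(1 - p).
    """
    k = len(freqs)
    p_bar = sum(freqs) / k
    total = p_bar * (1 - p_bar)
    within = sum(p * (1 - p) for p in freqs) / k
    among = sum((p - p_bar) ** 2 for p in freqs) / k
    return total, within, among


total, within, among = variance_partition([Fraction(4, 5), Fraction(1, 5)])
f_stat = among / total  # fraction of the variance that lies among groups
print(float(total), float(within), float(among))
print("F =", float(f_stat), "; among/within =", float(among / within))
```

The output reproduces the figures in the text: total 0.25, within 0.16, among 0.09, F = 0.36, and an among-to-within ratio of 0.5625 (about 56%).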

Figure 6 Two-locus, ‘additive-by-additive’ epistasis.

The value of F is influenced by a large number of
factors, including the number of breeding adults per
group (‘effective group size,’ Ne), the rate and pattern
of gene flow among groups (m), the extinction and
colonization of local groups (Whitlock and McCauley,
1990), group fission and fusion (Breden and Wade,
1989; Whitlock, 1992), and group density regulation
(i.e., hard versus soft selection: Wade, 1985; Kelly,
1992, 1994). The effects of these factors on the
among-group genetic variance have been reviewed
elsewhere (Wade, 1996). Wright noted that variation
in offspring numbers, variation in breeding sex ratio,
and fluctuations in the size of breeding groups
(Wright, 1931, 1941, 1952) all tend to reduce Ne. It is
important to emphasize that natural selection itself
reduces Ne and thus increases F: whenever natural
selection occurs, the variance in fitness exceeds that
expected at random, by definition, and consequently Ne is
reduced to less than N. This inevitable reduction in Ne that
accompanies natural selection is called the Hill-
Robertson effect (Hill and Robertson, 1966; Barton,
1995).
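To see how excess variance in offspring number lowers Ne, one can use a commonly cited textbook approximation that is not given in this article: for a population of constant size N in which the variance in offspring number per parent is Vk, Ne ≈ (4N - 2)/(Vk + 2). Random reproduction gives Vk ≈ 2 and hence Ne ≈ N; selection inflates Vk and so shrinks Ne. The sketch assumes that formula and an arbitrary N = 100; the function name is hypothetical.

```python
def effective_size(n, offspring_variance):
    """Approximate effective size of a population of constant size n.

    Assumes the textbook approximation Ne = (4N - 2) / (Vk + 2), where Vk
    is the variance in the number of offspring per parent.
    """
    return (4 * n - 2) / (offspring_variance + 2)


n = 100
for vk in (2, 4, 6):  # Vk = 2 is the expectation under random reproduction
    print(vk, effective_size(n, vk))
```

With Vk = 2 the effective size is 99.5, close to N; tripling the offspring variance to Vk = 6 roughly halves it, to 49.75.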


Genetic Structure of Adaptations in 
Relation to Group Selection 


Arguments in the controversy over individual versus 
group selection tend to overlook the genetic architec- 
ture of adaptations. Most adaptations are not deter- 
mined by alternative alleles at single genes, but rather 
by epistasis, the integrated action of many genes. 
Whenever multiple loci determine a trait, individual 
selection becomes significantly less efficient and 
group selection more efficient, an important feature 
unique to interaction systems and not captured by 
single-gene models. Epistasis both enhances the evo- 
lutionary potential of group selection and simultan- 
eously diminishes that of individual selection. To 
see this, consider a simple two-gene interaction (Fig-
ure 6). In Figure 6, there is a simple additive-by-
additive genetic interaction between the A and B loci
in affecting an individual’s phenotypic value. When
the genetic background at the A locus is homozygous
aa, the effect of the B allele is to additively increment
phenotypic value. By ‘additively increment,’ I mean
that a homozygous BB individual with two B alleles
gains twice as much phenotypic value, relative to bb,
as a heterozygous individual Bb with only one B allele. However,
when the genetic background at the A locus is
changed to be homozygous AA, the effect of the B
allele is the opposite: it additively decrements pheno-
typic value.

Figure 7 Variation in the sign of the allelic effect of B with genetic background.
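The sign reversal of the B allele's effect can be reproduced with a hypothetical additive-by-additive coding in which phenotype = -(nA - 1)(nB - 1), where nA and nB count the A and B alleles (0, 1, or 2). This particular coding, including its sign and scale, is an assumption chosen only to match the verbal description: on an aa background each copy of B adds one unit, and on an AA background each copy removes one unit.

```python
def phenotype(n_a, n_b):
    """Hypothetical additive-by-additive coding; n_a, n_b = copies of A and B."""
    return -(n_a - 1) * (n_b - 1)


def b_increments(n_a):
    """Phenotypic change of bb, Bb, BB relative to bb on a fixed A background."""
    base = phenotype(n_a, 0)
    return [phenotype(n_a, n_b) - base for n_b in (0, 1, 2)]


for n_a, label in ((0, "aa"), (1, "Aa"), (2, "AA")):
    print(label, b_increments(n_a))
```

This lists [0, 1, 2] for aa, [0, 0, 0] for Aa, and [0, -1, -2] for AA. The null effect of B on the Aa background is a by-product of this symmetric coding, not a claim of the text.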

The effect of the B allele is not a property of the 
allele itself, but rather a property of the interaction 
system (Figure 7). This has profound effects on the 
evolution of the B allele, especially in genetically sub- 
divided populations where the frequency of the A 
allele changes from group to group. This kind of epi- 
stasis represents a ‘genetic constraint’ on individual 
selection. In groups with a low frequency of the A
allele (mostly aa backgrounds), B will increase by virtue
of its positive effect on phenotypic value. However, in
groups with a high frequency of the A allele (mostly AA
backgrounds), B will decrease in frequency
by virtue of its negative effect on phenotypic value.
Because positive and negative values of $\Delta p_B$ are
combined and averaged to determine the effect of
individual selection (see section “Illustration of group
selection”), the change in the frequency of the B allele
by individual selection is reduced. The greater the
value of F, the greater the variation in genetic back-
ground among groups, and hence the more the direction
of individual selection on B varies from group to group. In contrast, group selection
will favor those groups in which gene combinations 
result in high group fitness (see section “Illustration of 
group selection”). 
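The sign reversal of B’s effect can be illustrated numerically. The sketch below is a minimal illustration under stated assumptions, not the author’s model: it assumes a hypothetical coding in which genotypic value = −(n_A − 1) × n_B, where n_A and n_B count the A and B alleles in an individual, so that each B allele adds one unit on an aa background, subtracts one unit on AA, and has no effect on Aa. It further assumes Hardy-Weinberg proportions at the A locus within each group and groups of equal size. The function names are illustrative only.

```python
from statistics import mean


def genotypic_value(n_a: int, n_b: int) -> int:
    """Hypothetical additive-by-additive genotypic value.

    n_a, n_b: numbers of A and B alleles (0, 1, or 2) in an individual.
    On an aa background (n_a = 0) each B allele adds 1; on AA (n_a = 2)
    each B allele subtracts 1; on Aa (n_a = 1) B has no effect.
    """
    return -(n_a - 1) * n_b


def effect_of_b(p_a: float) -> float:
    """Average effect of adding one B allele within a group whose A-allele
    frequency is p_a, assuming Hardy-Weinberg proportions at the A locus."""
    background = {2: p_a ** 2, 1: 2 * p_a * (1 - p_a), 0: (1 - p_a) ** 2}
    return sum(
        freq * (genotypic_value(n_a, 1) - genotypic_value(n_a, 0))
        for n_a, freq in background.items()
    )


if __name__ == "__main__":
    groups = [0.1, 0.9]            # A-allele frequency in two equal-sized groups
    local = [effect_of_b(p) for p in groups]
    print(local)                   # about +0.8 (low-A group) and -0.8 (high-A group)
    print(mean(local))             # pooled effect seen by individual selection: about 0
```

Under this coding the local effect of B works out to 1 − 2p_A, so two equal-sized groups with A frequencies of 0.1 and 0.9 push B in opposite directions by equal amounts, and the averaged effect on which individual selection acts is close to zero.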


Summary and Conclusion 


Although group selection remains a controversial topic
in evolutionary biology, experimental studies, in both
laboratory and field, have shown that groups have
the necessary properties of replication, variation, and
heredity. Indeed, given the known processes of group
formation, these essential properties must be common
in nature. It is also clear that group selection can affect
the evolution of many traits, especially those with a
complex genetic basis, and not only adaptations that
are good for the group but harmful to the individual.
Even in those circumstances where the number of 
episodes of individual selection exceeds that for 
group selection, epistasis for fitness can severely 
limit the efficiency of individual selection at the same 
time that it opens unique opportunities for group 
selection. 

It remains important, however, to avoid naive group 
selection when attempting to explain the origin of 
adaptations. Recognizing the hierarchy of biological 
levels, to which the Darwinian logic and evolutionary 
dynamic apply, does not constitute an endorsement of 
causal explanations based on “the good of the species.” 

‘Growth factors’ is a generic term applied to define a
specific set of polypeptides that act via association 
with high-affinity transmembrane receptors to induce 
intracellular signals, which mediate cell proliferation, 
differentiation, and survival. Growth factors are the 
principal means of intercellular communication in the 
development and regeneration of metazoan organ- 
isms. Mutation of either growth factors or their cognate 
receptors can have profound effects on organismic 
development and physiological function. 

Although there is considerable structural and func- 
tional diversity amongst growth factors, some com- 
mon themes regarding their mechanism of action can 
be defined. Growth factors, unlike classical endocrine 
hormones, generally act locally within tissues rather 
than between organs. Many growth factors exhibit 
biochemical features which constrain their activity to 
cells in close proximity to the source of synthesis. 
These may include association with nonsignaling 
components such as specific binding proteins or extra- 
cellular matrix components, anchorage to the plasma 
membrane, or a requirement for proteolytic cleavage 
to elicit biological activity. 

Growth factors and their receptors can be grouped
into ‘families,’ based upon shared features of amino
acid sequence, and into ‘superfamilies,’ based upon
shared structural folds. Many growth factor families 
display significant evolutionary conservation in 
sequence; for example, homologs of the fibroblast 
growth factor (FGF), epidermal growth factor (EGF), 
and transforming growth factor beta (TGF-beta) 
families can be found in nematodes, echinoderms, and 
Drosophila, as well as higher vertebrates such as 
mouse and humans. A common finding is that higher 
vertebrates have larger growth factor families than 
invertebrates. For example, there are currently 22
members of the FGF gene family in the human genome,
but only one in Drosophila and Caenorhabditis elegans.
Some growth factor superfamilies, such as the chemokines,
whose primary action is in infection and immunity, are 
found in gene clusters and exhibit significant diver- 
gence in sequence and gene number between closely 
related mammalian species. 

A key feature of the divergence and elaboration of 
growth factor and receptor gene number in higher 
vertebrates is that it results in diversification of recep- 
tor recognition specificity; it is frequently observed 
that each member of a growth factor family has a
unique repertoire of receptors with which it can in- 
teract. In addition, individual members of growth 
factor families can exhibit widely divergent patterns 
of gene expression in vivo. Collectively this means 
that individual members of growth factor families 
can display characteristic physiological defects upon 
mutation; for example, homozygous null mutants 
of FGF-4 result in a peri-implantation lethal defect 
in the mouse, whereas homozygous null mutants 
of FGF-5 are viable but exhibit an ‘angora’ hair 
phenotype. 

As might be expected from their biochemical func- 
tions, mutations in growth factors and their receptors 
have important consequences for human disease. 
Somatic mutations in particular growth factors or 
receptors have been associated with carcinogenesis 
and exhibit the properties of oncogenes. These muta- 
tions are generally dominant in character and result, 
by a variety of different means, in activation of intra- 
cellular signaling pathways. For example, mutations in 
receptors which result in receptor oligomerization 
(such as fusion to a dimeric partner protein) are associ- 
ated with particular human malignancies. Ectopic 
activation of growth factor expression by retroviruses 
has been associated with retrovirus-induced
carcinogenesis in experimental systems. Some inherited
dominant mutations are associated with developmental
dysplasia; for example, Crouzon syndrome is a congenital
craniofacial disorder which results from mutations
in FGF receptor-2, leading to receptor activation in the 
absence of FGF ligand. Achondroplasia is a dominant- 
acting congenital dwarfism syndrome which results 
from specific mutations in FGF receptor-3. Recessive, 
homozygous loss-of-function mutations in growth 
factors and receptors are much rarer in natural popu- 
lations and frequently arise from forced selective 
breeding for desirable physiological traits; for ex- 
ample, the ‘double muscle’ phenotype of Belgian Blue 
cattle results from homozygous recessive mutation 
of the gene encoding the TGF-beta family member 
myostatin. 

Finally, certain growth factors have significant
practical utility in genetics research. The ability to
maintain mouse embryonic stem (ES) cells in culture
depends upon a specific growth factor, leukemia
inhibitory factor (LIF). In the presence of LIF, ES
cells can be selected for specific mutations, which
can be introduced back into the germline by
transplantation of the genetically modified ES cells into a
host embryo.


See also: Achondroplasia; Embryonic Stem Cells; 
Oncogenes 

Gerstmann-Straussler disease (GSD), also known
as Gerstmann-Straussler-Scheinker disease (GSSD),
is a human neurodegenerative disease characterized
by cerebellar ataxia and progressive dementia. Like
the related diseases Creutzfeldt-Jakob disease (CJD)
and familial fatal insomnia (FFI), it is associated with
alterations in the prion protein. In contrast to CJD,
most cases of GSD are familial and are caused by
particular missense mutations in the prion protein gene.


See also: Creutzfeldt-Jakob Disease (CJD);
Familial Fatal Insomnia (FFI); Spongiform
Encephalopathies (Transmissible), Genetic
Aspects of


GT Repeats 


See: Microsatellite, CA Repeats 


GT-AG Rule 


Copyright © 2001 Academic Press 
doi: 10.1006/rwgn.2001.1852 


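The GT-AG rule describes the presence of the highly
conserved dinucleotides GT and AG at the first two and
last two positions, respectively, of introns in nuclear
genes (GU and AG in the pre-mRNA). Rare exceptions
include introns bounded by GC-AG or AT-AC.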
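The rule amounts to a simple check on the two ends of an intron sequence. A minimal sketch, assuming the intron boundaries are already known; the function name is hypothetical, and the check ignores everything except the terminal dinucleotides (it does not locate introns or score splice sites, and it would report the rare GC-AG and AT-AC introns as non-conforming).

```python
def follows_gt_ag(intron: str) -> bool:
    """Return True if an intron sequence begins with GT and ends with AG.

    Accepts DNA or pre-mRNA letters in either case (U is read as T).
    Only the first two and last two bases are examined.
    """
    seq = intron.upper().replace("U", "T")
    return len(seq) >= 4 and seq.startswith("GT") and seq.endswith("AG")


if __name__ == "__main__":
    print(follows_gt_ag("GTAAGTATTTTCCCAG"))   # canonical GT ... AG intron
    print(follows_gt_ag("GCAAGTATTTTCCCAG"))   # GC-AG exception is not matched
```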


See also: Introns and Exons 

Guanosine-5'-triphosphate (GTP) is synthesized in
the cell by phosphorylation of guanosine diphosphate 
(GDP), catalyzed by a nucleoside diphosphate kinase, 
with ATP as the phosphate donor: 


GDP + ATP = GTP + ADP 


For the synthesis of deoxyguanosine triphosphate
(dGTP), a precursor of DNA, the 2’ hydroxyl group
of the ribose moiety is replaced by a hydrogen
atom. In many organisms, including Escherichia coli
and mammals, this reduction is catalyzed by
ribonucleotide reductase acting on GDP; the resulting
dGDP is then phosphorylated to dGTP.

GTP is an energy-rich, activated precursor for 
RNA synthesis that also plays important roles in sev- 
eral other cellular processes such as protein synthesis, 
protein localization, signal transduction, visual excita- 
tion, and hormone action. The free energy of hydro- 
lysis of GTP can be used to drive reactions that 
otherwise are energetically unfavorable. For example, 
for translocation of a protein through a membrane of 
the endoplasmic reticulum, GTP hydrolysis is prob- 
ably needed to insert the signal sequence into the 
channel and is required to release the signal recogni- 
tion particle from its receptor. GTP may act as an 
allosteric effector, causing a protein to change shape 
slightly. Its hydrolysis can then lead to a cyclic vari- 
ation in macromolecular shape and functioning of the 
protein. This is seen in the GTP-dependent release of 
photoexcited rhodopsin from transducin. 

In mRNA-programmed, ribosome-dependent pro- 
tein synthesis, GTP plays a role at all three stages: 
initiation, elongation, and termination. For initiation, 
the binding of GTP to a protein initiation factor leads 
to formation of the small subunit initiation complex. 
Subsequent hydrolysis of GTP results in the associ- 
ation of the large subunit with the complex. In elonga- 
tion, GTP has more than one role. It binds to 
elongation factor (EF) Tu (EF-1 in eucaryal cells) to 
facilitate the delivery to the ribosome of each succes- 
sive aminoacyl-tRNA as dictated by the mRNA 
sequence. The aminoacyl-tRNA is delivered as part 
of a ternary complex composed of itself, EF-Tu, and 
GTP. After GTP hydrolysis, EF-Tu·GDP is released,
EF-Tu·GTP is regenerated by nucleotide exchange with
the help of EF-Ts, and the cycle continues for the next
designated aminoacyl-tRNA. To ensure the accuracy
of the match between the incoming aminoacyl-tRNA 
and the mRNA codon in the A-site of the ribosome, 
the binding of GTP and its EF-Tu-dependent hydro- 
lysis play a role in the process of proofreading. After 
peptide bond formation, translocation of the peptidyl-tRNA
from the A-site to the P-site requires the binding
of GTP to EF-G (EF-2 in eucaryal cells) and its
subsequent EF-G-dependent hydrolysis. 

Finally, after the presence of a termination codon 
(UGA, UAA, or UAG) in the A-site is recognized by 
the combined action of a protein release factor (RF) 
and specific regions of ribosomal RNA, a signal is 
transmitted to the hydrolytic center of the ribosome 
(in the large subunit) to hydrolyze the peptidyl-tRNA 
in the P-site. The binding of GTP to another RF and 
its subsequent RF-dependent hydrolysis functions to
promote the release of the codon-dependent RF from 
the ribosome, preparing the way for dissociation of 
the subunits, release of mRNA, and utilization of the 
ribosomal subunits in another round of polypeptide 
synthesis. 


See also: Protein Synthesis; Ribosomal RNA 
(rRNA) 

Guanine is a purine (molecular formula C5H5N5O)
found within RNA in the form of a ribonucleotidyl 
residue and in DNA in the form of a deoxynucleotidyl 
residue. In DNA, guanine is usually base-paired via 
three hydrogen bonds with the pyrimidine cytosine. A 
number of low-molecular-weight, guanine-containing 
nucleotide coenzymes are also found within cells, 
where they serve as substrates for RNA and DNA 
biosynthesis, energy sources in protein biosynthesis, 
and donors of sugar residues in the synthesis of
polysaccharides. Guanine residues in DNA are especially
susceptible to alteration by reactive oxygen species.
When guanine in DNA undergoes oxidation to
8-oxoguanine, its base-pairing properties change
and it acquires the ability to pair with adenine; if the
lesion is not repaired before replication, a G:C base
pair can thereby be converted to T:A. Most cells have
an active base-excision repair system that removes
8-oxoguanine residues from DNA, thereby avoiding
the potentially hazardous creation of this transversion
mutation.


See also: Purine 


Guide RNA 


See: RNA Editing in Trypanosomes 

Gynogenetic embryos have maternal genomes only
(haploid or diploid) but arise from oocytes that have 
been fertilized by sperm. In natural gynogenesis 
(occurring, for example, in some fish species), the paternal,
sperm-derived genome is inactivated or lost. Experimentally,
gynogenetic embryos can be made using
irradiated sperm. The haploid gynogenetic embryos 
can then be diploidized. In the mouse, gynogenetic 
embryos are made by pronuclear transplantation. 
Following fertilization and formation of pronuclei, 
the male pronucleus is removed by microsurgery and 
replaced by a second female pronucleus. 

Gynogenetic embryos are useful for genetic mapping
or for the rapid recovery of mutations. In mice,
gynogenetic development progresses only midway
through gestation, to early postimplantation stages.
Such embryos show particularly deficient development
of extraembryonic membranes such as the trophoblast
and yolk sac, which may account for their failure to
develop further. This developmental failure is ultimately
explained by the phenomenon of genomic imprinting,
whereby certain genes in eutherian mammals are
expressed from only one of the parental chromosomes.
Gynogenetic embryos thus lack gene products that are
made only from the paternal genome, and overexpress
gene products that are made only from the maternal genome.
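The dosage argument can be made concrete with a small tally. A minimal sketch under a deliberately simplified assumption: an imprinted gene is expressed from every genome of the expressing parental origin and from none of the other. The two example genes and their expressed parental copies (Igf2 paternal, H19 maternal) are taken from the H19 entry of this encyclopedia; the table and function names are hypothetical.

```python
# Hypothetical table: which parental copy of each imprinted gene is expressed.
EXPRESSED_FROM = {"Igf2": "paternal", "H19": "maternal"}


def imprinted_dose(genome_origins: list[str]) -> dict[str, int]:
    """Count expressed copies of each imprinted gene in an embryo.

    genome_origins lists the parental origin of each haploid genome,
    e.g. ["maternal", "paternal"] for a normal fertilized embryo.
    """
    return {
        gene: sum(origin == parent for origin in genome_origins)
        for gene, parent in EXPRESSED_FROM.items()
    }


if __name__ == "__main__":
    print(imprinted_dose(["maternal", "paternal"]))   # normal embryo
    print(imprinted_dose(["maternal", "maternal"]))   # gynogenetic embryo
```

In the gynogenetic case the paternally expressed gene drops to zero copies while the maternally expressed gene doubles, which is the pattern of missing and overexpressed products described above.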


Parthenogenetic embryos are those that have been 
activated to develop without sperm. Diploid parthenogenetic
embryos have the same developmental potential as gynogenetic
ones, showing that imprinting is a
purely nuclear phenomenon. The existence in nature 
of gynogenetically or parthenogenetically reproduc- 
ing species, or normal development following experi- 
mental production of gynogenones, indicates that 
genomic imprinting is largely absent in these species. 


See also: Imprinting, Genomic; Parthenogenesis, 
Mammalian 

Gyrase is a type II topoisomerase of Escherichia coli
that is able to generate negative supercoils in DNA. 


See also: DNA Supercoiling; Topoisomerases 

H19 is an imprinted gene in which one of the parental
copies of the gene is silenced. It encodes a
noncoding RNA and is closely linked to the
reciprocally imprinted gene Igf2, which encodes a
fetal growth factor. On the paternal chromosome,
H19 is not transcribed and Igf2 is active, while on
the maternal chromosome H19 is transcriptionally
active and Igf2 is not. Differences in methylation
distinguish the parental origin of the gene, and
methylation of nearby silencer and enhancer elements
plays an important role in the gene’s regulation. The
differentially methylated domain (DMD) located upstream of
H19 is essential for the imprinting of both H19 and
Igf2. H19 is located on distal mouse chromosome 7
and in the Beckwith-Wiedemann syndrome region of
human chromosome 11p15.5.


See also: Gene Silencing; Igf2 Locus; Imprinting, 
Genomic 


H2 Locus 


See: Major Histocompatibility Complex (MHC) 

Hadulins are proteins that are synthesized during
root-hair development, but especially during the 
deformation and curling that accompanies invasion 
by symbiotic bacteria (rhizobia). An example is a 
nonspecific lipid transfer protein (LTP), the expres- 
sion of which is upregulated in root hairs by rhizobia 
and their Nod-factors (Krause et al., 1994). 

A hairpin is a double-helical region formed by base-pairing
between adjacent (antiparallel) complementary
sequences in a single strand of RNA or DNA. It
consists of a base-paired stem closed by an unpaired loop.
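A hairpin shows up in sequence as a stem segment followed, after a short loop, by its reverse complement. A minimal sketch, assuming perfect Watson-Crick pairing only (no G·U wobble pairs, mismatches, or bulges), a fixed stem length, and a loop of 3 to 8 nucleotides; the function names and default values are hypothetical. Only the letters A, C, G, T, and U are accepted.

```python
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna: str) -> str:
    """Reverse complement of an RNA string (A, C, G, U only)."""
    return "".join(COMPLEMENT[base] for base in reversed(rna))


def find_hairpins(seq: str, stem_len: int = 4, min_loop: int = 3,
                  max_loop: int = 8) -> list[tuple[int, int, int]]:
    """Return (start, stem_len, loop_len) for every perfectly paired stem-loop.

    DNA input is accepted: T is read as U before searching.
    """
    rna = seq.upper().replace("T", "U")
    hits = []
    for i in range(len(rna)):
        for loop in range(min_loop, max_loop + 1):
            j = i + stem_len + loop            # start of the 3' arm of the stem
            if j + stem_len > len(rna):
                break                          # longer loops cannot fit either
            left = rna[i:i + stem_len]
            right = rna[j:j + stem_len]
            if right == reverse_complement(left):
                hits.append((i, stem_len, loop))
    return hits


if __name__ == "__main__":
    # Stem GGAC, loop UUCG, then GUCC (the reverse complement of GGAC).
    print(find_hairpins("GGACUUCGGUCC"))
```

A brute-force scan like this only illustrates the stem-and-loop definition; practical RNA structure prediction scores candidate structures by their free energy instead.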


See also: Antiparallel 

Hairy cell leukemia (HCL) is a malignancy of mature
B lymphocytes with cytoplasmic ‘hairy’ projections,
involving the peripheral blood, bone marrow, and red
pulp of the spleen. HCL comprises about 2% of adult
leukemias and affects predominantly males (male:female
ratio 5:1), with a median age of 52 years. Clinically,
the main features are splenomegaly, anemia,
thrombocytopenia, and leukopenia. The low leukocyte
count, chiefly of neutrophils and monocytes, is
responsible for opportunistic infections in untreated
patients. Large abdominal nodes are a feature in a minority.
Hairy cells express surface Ig (IgM±D, G, or A)
with a single light chain and the B-cell antigens CD19,
CD20, CD22, and CD79a, and strongly express CD11c,
CD25, and CD103. There is no consistent cytogenetic
abnormality, but cyclin D1 is overexpressed in about
50% of cases in the absence of t(11;14) or BCL1
rearrangement. Prolonged remission can be obtained with the
nucleoside analogs pentostatin and cladribine. Median
survival is greater than 10 years. A rare variant form
of the disease has the same histological features as
typical HCL, but the leukocyte count is high (50 ×
10⁹/l), the hairy cells have a prominent nucleolus, and
the response to therapy and overall prognosis are
poor.


See also: Leukemia 

John Burdon Sanderson Haldane (“J.B.S.”) (1892-1964)
(Figure 1) was widely acknowledged as the
last of the polymaths, a renaissance man, and a scholar 
of ancient classics, who contributed significantly to 
physiology, genetics, biochemistry, and biometry 
while possessing no academic qualification in science. 
He was a highly skilled and versatile popularizer of 
science, who regularly contributed to numerous 
magazines and newspapers. 

J.B.S. Haldane was born in Oxford, England, on 5 
November 1892. He was the son of John Scott 
Haldane, a distinguished Oxford physiologist, and 
Louisa Kathleen Trotter, who came from a comfort- 
able south Scottish family whose ancestors served 
with distinction in India. Haldane’s childhood was 
marked by episodes of precocious intellectual feats 
which occasionally, though not always, portend a 
future genius. Haldane was educated at Eton and 
Oxford, graduating with distinction in the classics 
in 1914, but received no formal training in any branch 
of science. From an early age, his father encouraged 
him to assist in physiological experiments and taught 
him the fundamentals of science. The rest was self- 
taught. 


Scientific Work 


Haldane’s contributions to genetics were largely the- 
oretical and mathematical. Yet few scientists have had 
more influence on the steady growth of genetics than 
Haldane during his long career. Haldane’s first con- 
tribution to genetics, which dealt with the measure- 
ment of linkage in mice, was published in 1914. His 
research was interrupted by World War I, but in 1919 
Haldane worked out a more accurate method of
detecting linkage and a way of relating the map
distance to the frequency of recombination (‘mapping
function’). He suggested the use of the ‘centimorgan’
(cM) as a unit of chromosome length. In collaboration
with others, he undertook a series of linkage studies in
the following years, extending linkage theory to
polyploids, demonstrating the effect of age on the
frequency of recombination in the fowl, and
demonstrating partial sex linkage in the mosquito Culex
molestus. In his book New Paths in Genetics (1941),
Haldane introduced ‘cis’ and ‘trans’ to replace the
terms ‘coupling’ and ‘repulsion’ that were in vogue
at that time.

Figure 1 J.B.S. Haldane, arriving in India, 1957.
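The mapping function associated with Haldane is usually written r = (1 − e^(−2d))/2, with d the map distance in Morgans, and it rests on the assumption that crossovers occur independently of one another (no interference). A minimal sketch of the conversion in both directions; the function names are hypothetical, and distances are returned in centimorgans (1 cM = 0.01 Morgan).

```python
import math


def haldane_map_distance(r: float) -> float:
    """Map distance in centimorgans from recombination fraction r.

    Uses d = -1/2 ln(1 - 2r) with d in Morgans, so d_cM = -50 ln(1 - 2r).
    Valid for 0 <= r < 0.5; r = 0.5 corresponds to unlinked loci.
    """
    if not 0 <= r < 0.5:
        raise ValueError("recombination fraction must be in [0, 0.5)")
    return -50.0 * math.log(1 - 2 * r)


def haldane_recombination_fraction(d_cm: float) -> float:
    """Inverse mapping: r = 1/2 (1 - exp(-2d)), with d converted to Morgans."""
    if d_cm < 0:
        raise ValueError("map distance must be non-negative")
    return 0.5 * (1 - math.exp(-2 * d_cm / 100.0))


if __name__ == "__main__":
    for r in (0.01, 0.1, 0.2, 0.4):
        print(r, round(haldane_map_distance(r), 2))
```

For small r the map distance is close to 100r centimorgans, while for r near 0.5 it grows without bound, reflecting multiple crossovers that go undetected between distant loci.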

Perhaps the most famous aspect of Haldane’s 
genetical work is his generalization concerning the 
offspring in interspecific crosses, which he formulated 
in 1922, called ‘Haldane’s rule’: 


When in the first generation between hybrids between two 
species one sex is absent, rare or sterile, that sex is always the 
heterogametic sex. 


This rule has stood the test of time since Haldane first
proposed it, having been shown to hold in different
species across several taxa in the animal kingdom.

As early as 1920, Haldane was already referring to 
the gene as a nucleoprotein molecule, emphasizing 
that enzymes are products of gene action, and intro- 
ducing the concept of one gene-one enzyme. 
Although experimental evidence was not produced until
1941, using Neurospora, Haldane’s early emphasis on
the biochemical interpretation of gene action prepared 
the ground for the ready acceptance in later years of 
the experimental results of Beadle, Ephrussi, Tatum, 
and Lederberg. 


Population Genetics 


Haldane is best remembered as a founder of popula- 
tion genetics, an honor he shared with R.A. Fisher and 
S. Wright. Population genetics is best described as the 
offspring of the union between Mendelian genetics 
and the Darwinian theory of evolution. In a series of 
papers, entitled “A mathematical theory of natural 
selection,” which were published between 1924 and 
1934, Haldane investigated the conditions required to 
maintain a balance between selection intensity and 
mutation pressure, under varying intensities of selec- 
tion, inbreeding, size of the population, frequency of a 
character, reproductive isolation, type of inheritance, 
and environmental interaction. Haldane later com- 
mented that adequate quantitative data are rarely 
available to test the mathematical models that he and 
others had developed. 

Haldane’s contributions to population genetics 
were quite extensive. He showed that the probability 
that a single mutation will ultimately become estab- 
lished in a population of finite size is proportional to 
its selective advantage and, for dominant mutations, is
independent of the population size. He showed
further that mutant genes which are harmful singly, 
but become advantageous in combination, could accu- 
mulate in small, isolated populations, leading even- 
tually to speciation. 
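The result on the establishment of a single mutation is often summarized as a survival probability of about 2s for a new dominant mutant with selective advantage s in a large population. The sketch below recovers that approximation with a Galton-Watson branching process, a standard modern treatment and not necessarily Haldane’s own derivation. It assumes each mutant copy leaves a Poisson-distributed number of copies with mean 1 + s, which ignores mutant homozygotes while the mutant is rare; the function name and iteration count are illustrative.

```python
import math


def survival_probability(s: float, iterations: int = 20_000) -> float:
    """Probability that a single new mutant lineage is never lost.

    The extinction probability q is the smallest root of
    q = exp((1 + s)(q - 1)), the Poisson(1 + s) offspring generating
    function; fixed-point iteration from q = 0 converges to that root.
    """
    q = 0.0
    for _ in range(iterations):
        q = math.exp((1 + s) * (q - 1))
    return 1 - q


if __name__ == "__main__":
    for s in (0.001, 0.01, 0.05):
        print(s, round(survival_probability(s), 5), 2 * s)
```

Population size does not appear anywhere in the calculation, which is the sense in which the establishment probability of a dominant advantageous mutation is independent of population size.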

Haldane further showed, mathematically, that the 
impact of a mutation on a population depends merely 
on the rate of recurrence of the mutation and not on 
the degree of severity of selection against it. This 
principle was later applied to measure the impact of 
genetic damage resulting from high-energy radiation 
by the US National Academy of Sciences. From an 
evolutionary point of view, Haldane’s paper on “The 
cost of natural selection” broke new ground in its 
approach to measuring one of the major factors
determining the rate of evolution. His analysis showed
that, during the course of evolution, the substitution
of one gene by another involves a number of deaths
that is equal (on average) to about 30 times the number
in a generation and that, if about a tenth of each
generation can be devoted to such selective deaths,
the mean time taken for each gene substitution is
about 300 generations. He concluded: “This accords
with the observed slowness of evolution.” Subsequently,
Haldane’s argument became an important motivation
for Motoo Kimura’s neutral theory of evolution.


Human Genetics 


Haldane’s contributions to human genetics were 
of particular importance. He was a pioneer who laid 
its foundations, and shaped and nursed its growth 
from its infancy to a mature discipline. Furthermore, 
through numerous popular writings, Haldane pre- 
pared the ground for the acceptance of human genetics 
and an appreciation of its importance in the public 
domain. He developed statistical methods for the 
study of genetic traits in families and populations 
and the analysis of gene-environment interaction. 
He estimated the first mutation rate of a human gene 
(hemophilia) and prepared the first human gene map, 
involving the traits on the X chromosome, hemophilia 
and color blindness. 

Of special importance was Haldane’s suggestion 
that resistance to malaria and other infectious diseases 
played a significant role in recent human evolution, 
resulting in greater genetic diversity and greater pre- 
valence of certain diseases such as sickle cell anemia. 
This has stimulated a great deal of epidemiological 
research of considerable importance in recent years. 
Haldane’s books include: Daedalus or Science and the 
Future (1923), Possible Worlds and Other Essays 
(1928), The Causes of Evolution (1932), Science and 
the Supernatural (1935), Heredity and Politics (1938), 
New Paths in Genetics (1941), and The Biochemistry 
of Genetics (1954). He was also the author of a popular 
children’s storybook, My Friend, Mr. Leaky (1937). 
For several years during the 1940s, Haldane embraced 
Marxism, but there is no evidence to indicate that it 
had a significant influence on his scientific work. 

The Haldane-Muller principle, as the name suggests,
was discovered independently by J.B.S. Haldane and 
H.J. Muller. It relates the mean fitness of a population 
to the mutation rate. The impact of recurrent harmful 
mutation on fitness is a function, not of the deleterious 
effect per mutation, but of the mutation rate itself. 

This is perhaps counterintuitive. It can be explained 
simply as follows. If a mutation has an effect so drastic 
that it kills the individual carrying it, the mutation 
causes one death. If, in contrast, it causes a 1% prob- 
ability of death, it will persist in the population for an 
average of 100 generations before being eliminated 
and will therefore affect 100 individuals. If, in a system
of mutation cost-accounting, 100 individuals, each with
a 1% probability of death, are equated to one individual
with 100% probability, then each causes one death.
The effect may be reduced fertility rather than sur- 
vival, but the principle is similar. Muller called each 
premature death or failure to reproduce a genetic 
death. 

Algebraically, in a population of size $N$, $2N\mu$
dominant mutations occur per generation, where $\mu$ is
the mutation rate per locus per generation. Each
generation, $NQs$ mutations are eliminated by selection,
where $Q$ is the number of mutations per individual
and $s$ is the individual probability of elimination. At
equilibrium these two processes must balance, hence
$NQs = 2N\mu$, and $Q = 2\mu/s$. Now, if each mutant
causes a fitness reduction equal to $s$, the mutation
load is $Qs = 2\mu$. Summing over all relevant loci, the
mutation load is $2\sum\mu$, or twice the total mutation rate
per gamete. If the mutations are recessive, then two are
eliminated by each genetic death, and the load is only
half as large.
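The independence of the load from the severity of selection can be checked numerically. A minimal sketch, assuming a single locus in an infinitely large, randomly mating population, genotype fitnesses AA = 1, Aa = 1 − hs, aa = 1 − s (h = 1 for a dominant mutant, h = 0 for a recessive one), and one-way mutation A → a at rate mu with no back mutation. The function name, parameter values, and number of generations are illustrative assumptions.

```python
def equilibrium_load(mu: float, s: float, h: float,
                     generations: int = 100_000) -> float:
    """Mutation load (1 - mean fitness) after iterating mutation and selection.

    q is the frequency of the mutant allele a. Each generation: mutation
    A -> a in the gametes, random union (Hardy-Weinberg), then selection.
    """
    q = 0.0
    load = 0.0
    for _ in range(generations):
        q += mu * (1 - q)                          # new mutations in the gametes
        p = 1 - q
        w_bar = p * p + 2 * p * q * (1 - h * s) + q * q * (1 - s)
        load = 1 - w_bar                           # fitness lost relative to AA
        q = (p * q * (1 - h * s) + q * q * (1 - s)) / w_bar   # after selection
    return load


if __name__ == "__main__":
    mu = 1e-5
    for s in (0.01, 0.1, 0.5):
        dominant = equilibrium_load(mu, s, h=1.0) / mu
        recessive = equilibrium_load(mu, s, h=0.0) / mu
        print(f"s = {s}: load/mu = {dominant:.3f} (dominant), "
              f"{recessive:.3f} (recessive)")
```

Across a fifty-fold range of s, the load settles close to twice the mutation rate for dominant mutants and close to the mutation rate itself for recessive ones; only the equilibrium allele frequency, not the load, depends on s.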

The Haldane-Muller principle may then be stated,
as Haldane did, as follows: the total effect of mutation on
fitness is the total haploid mutation rate per generation,
multiplied by a factor of 1 or 2 depending on
whether the mutation is recessive or dominant. If n
mutations are eliminated with each genetic death, as 
might be true with extreme epistasis, then the load is 
1/n as large as if they were eliminated independently. 

J.L. King made this more precise by saying that the 
mutation load is twice the mutation rate divided by 
the difference between the frequency of mutations in 
individuals eliminated by selection and that before 
selection. This principle can be written in more general
form. The mutation load is

$$L = \frac{2U}{z - x + 2U}$$

in which $U = \sum \mu (1 - q)$, summed over loci, $q$ is the
mutant allele frequency, $x$ is the mean number of
mutations per individual before selection, and $z$ is the
mean number of mutations per individual among those
eliminated by selection in each generation
(Kondrashov and Crow, 1988).

With epistasis the mutation load can be consider- 
ably decreased by permitting several mutations to be 
eliminated with each genetic death. Such epistasis is 
generated by truncation selection, which may be the 
way in which many organisms survive a high mutation 
rate (see Mutation Load). 

Selfish Genes and Cooperation


Paradoxically, inheritance is the basis of evolutionary 
change. Without safe transmission of genetic informa- 
tion from one generation to the next, there would be 
random arrangement of the genetic building blocks. 
Constant randomization of information carriers ob- 
viously cannot lead to meaningful information. Thus, 
the cornerstone of evolution is genetics. Only after 
conserving well-tried genes can there be competition 
(selection) between new, yet untested ones (i.e.,
mutations). Charles Darwin (1809-1882) was the first to
formulate a theory of gradual evolutionary change
caused by the selection of adaptive variants out of
a number of other random variants. In a relentless
“struggle for existence,” many slightly different variants
compete with each other and only a few
survive. Darwin’s notion of “survival of the fittest” 
seems to convey the picture of a war in which every- 
one fights everyone. Nature is “red in tooth and claw,” 
a merciless killing in which only the strongest and 
meanest can prevail. Victory (i.e., evolutionary success
or ‘fitness’) is granted according to the reproductive
success of the survivor. Again, only if the trait that led
to successful reproduction is safely transmitted to the
offspring will this trait spread and eventually be
represented as a feature of the species. Of course, if
the trait in addition leads to procreation at a competi- 
tor’s expense, the animal not only gains fitness itself, 
but also reduces the fitness of those animals it is 
exploiting, increasing its odds even further. It is no 
wonder that parasitism and exploitation are wide- 
spread phenomena and virtually universal across the 
living world. Darwin himself emphasized: 


No instinct has been produced for the exclusive good of 
other animals, but each animal takes advantage of the 
instincts of others. 


Indeed this is one of the few truly falsifiable test 
statements in the Darwinian theory. And it seems so 
easily falsifiable: is there not ample evidence of
cooperation in the animal kingdom? Parental care, shoaling
fish, cooperatively hunting wolves or lions, the
mycorrhizal symbiosis between fungi and plants,
the subterranean colonies of the naked mole
rat, coalition formation in primates, and the social insects
are but some of the best-known examples. Darwin
was well aware of the problem and described it as:


One special difficulty, which at first appeared to me insuper- 
able, and actually fatal to the whole theory. 


Group Selection 


When describing the above problem, Darwin was 
referring to the social insects in particular. At that 
time, it was already common knowledge that hymen- 
opteran colonies (honeybees, wasps, bumble bees, 
and ants) usually consist of one reproducing queen 
and a multitude of sterile workers. This particular 
case of sociality is termed ‘eusociality.’ In addition to 
sterile individuals cooperatively helping the fertile 
animals to raise their offspring, eusociality is characterized
by another trait: At least two generations
overlap in life stages in which they are capable of 
contributing to colony labor, so that the offspring 
can assist their parents during part of their life cycle. 

The abandonment of reproduction by the worker 
caste was the huge dilemma to which Darwin devoted 
an entire chapter in his book On the Origin of Species. 
While the omnipresence of exploitation is well in 
accord with the rule “reproduce at the cost of your 
competitors,” the equally obvious existence of all 
degrees of altruism up to the complete sacrifice of re- 
productive success in favor of another organism seemed 
an insurmountable obstacle. How can individuals 
without their own offspring exist if reproduction and 
inheritance are the foundation of the whole theory? 

Darwin’s own solution was to assume that the colonies formed some sort of superorganism that competes against other colonies in much the same way as individuals do. The view of animal colonies as superorganisms, with their members as rough analogs of cells, has a long history and remains a useful concept for certain studies even today. The idea
of family or ‘group selection’ placated Darwin’s con- 
temporaries and was still widely accepted well into the 
twentieth century. According to this idea, the unit of 
selection for altruistic alleles of an originally selfish 
gene would be the colony or deme, not the individual. 
The altruistic, cooperative allele spreads in the species, 
as colonies without a high occurrence (gene fre- 
quency) of this allele become extinct. However, in 
order for interdemic selection to be effective, one has 
to assume that there is no migration between the groups 
and that there is sufficient selection pressure, i.e., the 
rate of colony extinction is very high. Furthermore,
individual selection will always be faster than group 
selection, as the number of individual organisms is 
much larger than that of populations and the turnover 
rate of individuals is much higher. Thus, group selec- 
tion can never counteract individual selection. Because 
of these considerations, group selection was even- 
tually abandoned as the prime explanation for the 
evolution of cooperation. Then, in 1964, William 
Donald Hamilton’s principle of ‘kin selection’ was 
published in the Journal of Theoretical Biology. At 
the time, it was so innovative that it almost failed to 
be published and was largely ignored for a decade. 
When finally noticed, its influence spread exponen- 
tially until it became one of the most cited papers in 
the field of biology. It is the key to understanding 
the evolution of altruistic cooperation among related 
organisms, such as the social insects. Cooperation 
among unrelated individuals is beyond the scope 
of this article and is treated elsewhere (see Further 
Reading). 
Kin Selection


Why should there be a distinction between cooper- 
ation among unrelated individuals and that among 
related individuals? We have learnt that genetics is the basis upon which evolutionary change takes place. Fitness was defined above in terms of successful reproduction, i.e., the number of offspring carrying the selected allele. The more offspring, the ‘fitter’ the parent. Darwin’s “struggle for existence” is a struggle for reproduction. With sexual reproduction, however, each offspring receives only half of a parent’s genome. Therefore, any particular trait, depending on its mode of inheritance, is often
transmitted from a parent to its offspring with a prob- 
ability of less than one. Thus, in order to transmit as 
many of one’s genes into the next generation as pos- 
sible (and hence be evolutionarily successful), an organ- 
ism has to produce as many surviving offspring as 
possible in order to maximize the probability of trans- 
mitting all its genetic information. This might consti- 
tute a difficult task, however, since all its competitors 
try to do the same. But there are other sources of one’s 
own genes available: relatives. 

An ordinary diploid, sexually produced organism 
shares 50% of its genes with either of its parents. 
Accordingly, it shares about 50% of its genes with its 
siblings, 25% with its uncles, aunts, grandparents, 
grandchildren, etc. (coefficient of relatedness, see Figure 1). Hamilton’s stroke of genius was to reformulate the definition of fitness as the number of an
individual’s alleles in the next generation. Or, more 
precisely, inclusive fitness is defined as an individual’s 
relative genetic representation in the gene pool in the 
next generation: 


$$\text{inclusive fitness} = \frac{\text{own contribution} + \text{contribution of relatives}}{\text{average contribution of the population}} \qquad (1)$$


Thus, fitness denotes the capability of an allele to 
spread in a population: if the fitness value for a given 
allele is larger than one it will increase in frequency 
and if it is smaller it will decrease in frequency. It is 
evident that such genic (as opposed to group) selec- 
tion will favor an allele that not only enhances repro- 
ductive success of its carrier, but also of all other 
individuals sufficiently related to it. But could an allele 
that reduces the fitness of its carrier while enhancing 
the fitness of its relatives be adaptive? Would it spread 
in a natural population? This is not a trivial question 
and it takes some computational effort to solve it. 
Let us try to formulate the inclusive fitness wᵢ of an individual i. As noted in equation (1), wᵢ should be composed of the fitness aᵢ of the focal individual and the contribution x of its relatives:


$$w_i = a_i + x \qquad (2)$$


The contribution x to individual i’s inclusive fitness wᵢ is then the sum of all alleles in the gene pool that are shared by i and its relatives j, that is, the fitness of each relative weighted by its relatedness to i:


$$x = \sum_j r_{ij} b_j \qquad (3)$$


where rᵢⱼ is the coefficient of relatedness between individual i and its relative j, and bⱼ is the fitness of j. Note that rᵢⱼ is less than 1 for all but clonal relatives, and therefore j’s contribution to wᵢ depends critically on its relatedness to i. We can thus reformulate equation (2) to:


$$w_i = a_i + \sum_j r_{ij} b_j \qquad (4)$$


Obviously, if the allele in question imposes a fitness cost (i.e., aᵢ < 1), wᵢ will only be greater than one if r is sufficiently high (given that raising the relative’s fitness b generally comes at a higher cost). Setting aᵢ = 1 − C for the altruist and counting only the benefit B that its act adds to the fitness of a single relative, the requirement that wᵢ be greater than one if the allele of interest is to spread yields


$$1 - C + rB > 1 \qquad (5)$$


which can be easily rearranged to produce Hamilton’s 
rule: 


$$rB - C > 0 \quad \text{or} \quad \frac{C}{B} < r \qquad (6)$$


Put into words, the relatedness of the individual that 
profits from the altruistic act of the focal individual 
must be higher than the cost/benefit ratio this act 
imposes. Thus, the question as to whether ‘cooperative’ genes may spread even if the cooperation imposes fitness costs can be answered both by simulation, to find out the critical ranges of the parameters in question, and experimentally, by measuring the relevant parameters and comparing them with the simulated results.
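Equations (4) to (6) can be checked numerically. The following Python sketch is illustrative only: the function names and the example values (C = 0.3, B = 1.0, r = 0.5) are hypothetical and not taken from any study. It simply confirms that requiring wᵢ > 1 in equation (5) and applying Hamilton’s rule in equation (6) give the same answer.

```python
from typing import Sequence


def inclusive_fitness(a_i: float, r: Sequence[float], b: Sequence[float]) -> float:
    """Equation (4): w_i = a_i + sum over relatives j of r_ij * b_j."""
    if len(r) != len(b):
        raise ValueError("r and b must describe the same relatives")
    return a_i + sum(r_ij * b_j for r_ij, b_j in zip(r, b))


def hamilton_rule(r: float, benefit: float, cost: float) -> bool:
    """Equation (6): the altruistic allele can spread only if r*B - C > 0."""
    return r * benefit - cost > 0


# Equation (5): an altruist with fitness a_i = 1 - C helps one relative gain B.
C, B, r = 0.3, 1.0, 0.5
w = inclusive_fitness(1 - C, [r], [B])
print(round(w, 3), w > 1, hamilton_rule(r, B, C))  # 1.2 True True
```

Raising C to 0.6 with the same B and r makes both tests fail (wᵢ = 0.9, and rB − C < 0).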

A very simple example will explain the concept. Consider a pair of brothers (r = 0.5, see Figure 1), one of whom sacrifices all of his fitness (C = 1) by not reproducing, but instead helps his brother to rear offspring successfully. For C/B to become smaller than r = 0.5, B must exceed 2: the altruist’s help must add more than twice as much to his brother’s reproductive success as the altruist himself gave up, if the altruist is to gain representation in the next generation. Evidently, a very high coefficient of relatedness is needed to overcome high fitness costs due to sterility or a decrease in life expectancy, or both. The benefit of altruism decreases rapidly with declining relatedness; between first cousins (r = 1/8), the same sacrifice would pay only if B exceeded 8. It becomes clear that the distinction between cooperation among related and among unrelated individuals is vital for understanding the evolution of cooperation.

Figure 1 The coefficient of relatedness. In diploid organisms, every parent (top row) transmits 50% of its genetic information to each offspring (middle row). On average, therefore, siblings share half of each parent’s contribution to their genome, amounting to a coefficient of relatedness r = 0.5. Consequently, cousins are related by r = 0.125, or 1/8 (bottom row). Likewise, these cousins are related to their common grandparents by r = 1/4 = 0.25. r can also be read as the probability that any given allele is shared by two individuals through common descent.

Figure 2 The coefficient of relatedness with haplo-diploid sex determination. Note how the coefficients are skewed with respect to the diploid system depicted in Figure 1. For example, sisters (middle row) are more closely related to each other (r = 0.75) than they are to their mother (top row; r = 0.5).
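The diploid values in Figure 1 can be reproduced by counting paths through common ancestors, a method due to Sewall Wright that this article does not itself describe. The sketch below assumes no inbreeding; the function name and the encoding of each common ancestor as a pair of generation counts are hypothetical choices made for illustration.

```python
from __future__ import annotations

from fractions import Fraction


def relatedness(paths: list[tuple[int, int]]) -> Fraction:
    """Coefficient of relatedness for non-inbred diploid relatives.

    Each (n1, n2) pair is one common ancestor: n1 and n2 are the numbers of
    generations (meioses) from each relative up to that ancestor. Every meiosis
    passes a given allele on with probability 1/2, so the ancestor contributes
    (1/2) ** (n1 + n2).
    """
    return sum((Fraction(1, 2) ** (n1 + n2) for n1, n2 in paths), Fraction(0))


examples = {
    "parent-offspring": [(0, 1)],        # the parent is itself the ancestor
    "full siblings": [(1, 1), (1, 1)],   # through the mother and the father
    "grandparent-grandchild": [(0, 2)],
    "uncle-nephew": [(1, 2), (1, 2)],    # through the nephew's two grandparents
    "first cousins": [(2, 2), (2, 2)],   # through two shared grandparents
}
for name, paths in examples.items():
    print(name, "r =", relatedness(paths))
```

Exact fractions are used so that the printed values (1/2, 1/2, 1/4, 1/4, 1/8) match the coefficients quoted in the text without rounding.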


Hamilton’s Rule and Social Insects 


Why did Hamilton’s theory have such an impact on 
modern evolutionary biology? The main reason was that it explained the evolution of a significant part of all the cooperation occurring in nature,
without having to resort to group selection and its 
very restrictive assumptions. But there is another piece 
of evidence that adds embellishment to a beautiful the- 
ory: the haplo-diploid sex determination of the social 
hymenopterans, i.e., the bees, ants, and wasps, the very insects that posed such a severe puzzle to Darwin.
While most animal genera have a hetero- and a 
homogametic sex (i.e., a different set of sex chromo- 
somes for the different sexes), hymenopterans univer- 
sally produce males from unfertilized (i.e., haploid) 
eggs and females from fertilized (i.e., diploid) eggs. 
This system skews relatedness in an almost perfect 
way for eusociality to evolve (see Figure 2). Consider 
a female worker. Half of her genome comes from the 
father (haploid) and half from the mother (diploid). 
That means she carries all of her father’s genes and half 
of her mother’s genes, and so does her sister, implying 
that they share the entire genome of their common father (i.e., already 50% of their genome), plus, on average, half of the genes each received from their mother (another 25% of their genome), yielding a coefficient of relatedness of 0.75. Thus, an allele for altruistically helping their mother (the queen) and her offspring (new founding queens and workers) need only yield a small benefit (compared to a ‘normal’ diploid organism) in order to spread through the population.
Accordingly, the hymenopterans are the order with the highest occurrence of eusociality in the animal kingdom: eusociality has arisen at least eleven times independently during the evolution of the hymenopterans. Outside the Hymenoptera, only a few arthropod groups are known to be eusocial, such as the termites (Isoptera) and some aphids (Hemiptera). Outside the Arthropoda, the best-known eusocial species is the naked mole rat (Heterocephalus glaber). This prevalence of eusociality within the hymenopterans strongly suggests that Hamilton’s rule has had a deep impact on their evolutionary path. (Note, however, that haplodiploidy is not sufficient to create sociality, because most hymenopteran species are solitary.) In the light of the theory of kin selection, even Darwin’s notion of family or group selection can be seen differently: the otherwise weak interdemic selection can act together with genic selection, and against individual selection, to spread cooperative genes in a population.
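The contrast between Figures 1 and 2 can be made concrete with a short calculation. In the sketch below (hypothetical function names; it assumes the queen mated with a single male, so that workers are full sisters) the same sacrifice, C = 1, is compared for diploid and haplodiploid sisters using Hamilton’s rule, C/B < r.

```python
from fractions import Fraction

HALF = Fraction(1, 2)


def sister_relatedness(haplodiploid: bool) -> Fraction:
    """Relatedness of full sisters (same mother, same singly mated father).

    A gene picked at random in one sister came from the father or the mother
    with probability 1/2 each. A haploid father gives every daughter his whole
    genome (share = 1); a diploid parent gives each child a random half, so a
    second daughter carries the same gene with probability 1/2.
    """
    paternal_share = Fraction(1) if haplodiploid else HALF
    maternal_share = HALF
    return HALF * paternal_share + HALF * maternal_share


def minimum_benefit(cost: Fraction, r: Fraction) -> Fraction:
    """Hamilton's rule C/B < r, rearranged: B must exceed C / r."""
    return cost / r


for label, haplodiploid in (("diploid sisters", False), ("haplodiploid sisters", True)):
    r = sister_relatedness(haplodiploid)
    print(label, "r =", r, "B must exceed", minimum_benefit(Fraction(1), r))
```

With C = 1, a haplodiploid worker’s help needs to yield only B > 4/3 rather than B > 2, which is the text’s “small benefit” argument expressed in numbers.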


Further Reading 

Brembs B (1996) Chaos, cheating and cooperation: potential solutions to the Prisoner’s Dilemma. Oikos 76: 14-24.
http://brembs.net/ipd 

Futuyma DJ (1986) Evolutionary Biology. Sunderland, MA: Sinauer 
Associates. 

Hamilton WD (1996) Narrow Roads of Gene Land. Oxford: 
Oxford University Press. 

Hölldobler B and Wilson EO (1990) The Ants. Cambridge, MA:
Belknap Press. 


See also: Fitness; Frequency-Dependent 
Selection; Population Genetics 
Handedness, or left-right asymmetry, can refer to
asymmetry at three or more levels of organization. 


Molecular Handedness 


Molecular handedness refers to the chirality of mol- 
ecules resulting from asymmetrically substituted car- 
bon atoms, or to differently handed arrangements of 
larger assemblages of atoms, for example in right- 
handed and left-handed DNA structures. 


Developmental Handedness 


Developmental handedness or laterality usually 
refers to differences between the left and right sides 
of bilaterally organized animals. Some animals, such as 
fruit flies, are bilaterally symmetric in their anatomy, 
but most animals exhibit anatomical differences 
between left and right sides, to a greater or lesser 
extent. How these differences are created is a special 
case of pattern formation in development, and some 
progress has been made in understanding the genetic 
and molecular basis of laterality in vertebrates and 
nematodes, although the mechanisms do not seem to 
be conserved. Some animals, or organs within them, also exhibit helical anatomy that can take either of two hands. The two testes of male fruit flies are bilaterally symmetric in placement, but both develop into left-handed helical tubes, spiraling counterclockwise.
Snail shells can occur in either left-handed or right- 
handed spirals. In this case the direction of the spiral is 
genetically determined by the maternal genotype, 
which affects the handedness of the first spiral cleav- 
age in developing eggs. Many plants exhibit helical 
growth patterns, with one hand or the other preferred. 


Behavioral Handedness 


Behavioral handedness in animals refers to the prefer- 
ential use of one limb or organ as compared to its 
contralateral homolog. The fact that most, but not 
all, humans are right-handed suggests that this behav- 
ioral asymmetry is adopted partly at random, but is 
biased towards right-handedness by developmental 
asymmetry. There is no convincing evidence for a 
separate genetic influence on behavioral handedness 
in humans. 


See also: Maternal Effect; Pattern Formation; 
Right/Left Handed DNA 
The term haploid number refers to the number of chromosomes contained within each gamete. During gametogenesis the chromosome number is reduced to half the number present in somatic cells. This is achieved in the first meiotic (reduction) division, during which the chromosomes in the pregametic cell pair with their homologs to form bivalents; this pairing allows the two members of each pair to separate from one another at first anaphase into different daughter cells, prior to the second meiotic division (see Meiosis).


See also: Diploidy; Meiosis 
Haploinsufficiency is the requirement for two wild-type copies of a gene for a normal phenotype. For haploinsufficient genes, when one copy of a gene is deleted or contains a loss-of-function mutation, the dosage of normal product generated by the single wild-type gene is not sufficient for complete function.
Diseases resulting from haploinsufficiency are usually 
caused by mutations in genes encoding proteins 
required in large amounts, or in genes encoding regu- 
latory molecules whose concentrations are closely 
titrated within the organism. Human diseases asso- 
ciated with haploinsufficiency include Greig syn- 
drome, which results from loss of the transcriptional 
regulatory protein GLI-3, and Williams syndrome, 
which results from a deletion that includes the gene encoding the extracellular matrix protein elastin.
The term ‘haplotype’ refers to a particular set of alleles
at linked loci that are present on one of two homo- 
logous chromosomes. During the course of a gene- 
mapping experiment involving backcross or intercross 
mating schemes, geneticists use a process known as 
‘haplotype analysis’ to place genetic markers in a pre- 
cise order. For purposes of this discussion, we will use 
a backcross as an example. To initiate a backcross 
mapping experiment, two inbred parental strains, A 
and B, are mated to produce an F₁ hybrid. By definition, strain A is inbred and can be considered to be homozygous for A alleles (A/A) at all autosomal loci. Likewise, strain B can be considered to be homozygous for B alleles (B/B) at all autosomal loci. F₁ hybrids derived from these parental strains must be heterozygous (A/B) at all autosomal loci. Meiotic events within the germline of the F₁ hybrid generate recombinant chromosomes in which A and B alleles are placed in new combinations along the length of the chromosome. By backcrossing the F₁ hybrid to one of the inbred parental strains (strain B in this case), one can determine the haplotype of these recombinant chromosomes by genotyping the resulting progeny. If, for example, five closely linked markers, 1-5, were genotyped in a single offspring and had the haplotype 1ᴬ (A at locus 1), 2ᴮ, 3ᴬ, 4ᴬ, and 5ᴮ, one can conclude (assuming a single crossover) that loci 1, 3, and 4 are on one side of a point of recombination, and that loci 2 and 5 are on the opposite side. Similarly, by determining the haplotypes of additional progeny in which recombination has occurred between different sets of markers, one can begin to subdivide these groups and refine the order of markers even further.
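The reasoning in this example can be expressed as a short sketch. It assumes, hypothetically, that each marker in a backcross offspring has already been scored as carrying the strain A allele (an A/B genotype) or only strain B alleles (B/B), and that a single crossover occurred. Grouping markers by parental origin recovers the two sides of the recombination point; counting A/B switches shows that the order 1, 2, 3, 4, 5 would require three crossovers, whereas an order keeping 1, 3, 4 together and 2, 5 together needs only one. The function names are illustrative, and minimizing such switches over many progeny is shown here only as one common way to think about marker ordering.

```python
from __future__ import annotations


def split_by_origin(haplotype: dict[int, str]) -> dict[str, list[int]]:
    """Group markers by the parental strain ('A' or 'B') of the allele they carry."""
    groups: dict[str, list[int]] = {}
    for marker, strain in sorted(haplotype.items()):
        groups.setdefault(strain, []).append(marker)
    return groups


def count_crossovers(haplotype: dict[int, str], order: list[int]) -> int:
    """Number of A/B switches when the markers are read in the proposed order."""
    strains = [haplotype[m] for m in order]
    return sum(1 for left, right in zip(strains, strains[1:]) if left != right)


# The offspring described in the text: 1^A, 2^B, 3^A, 4^A, 5^B.
progeny = {1: "A", 2: "B", 3: "A", 4: "A", 5: "B"}
print(split_by_origin(progeny))                    # {'A': [1, 3, 4], 'B': [2, 5]}
print(count_crossovers(progeny, [1, 2, 3, 4, 5]))  # 3
print(count_crossovers(progeny, [1, 3, 4, 2, 5]))  # 1
```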
By using mapping panels containing DNA from
hundreds of progeny that have been previously geno- 
typed with thousands of markers, one can quickly 
establish the map location of any new genetic marker 
that is polymorphic between the two parental strains. 
Two such well-characterized mouse community map- 
ping crosses include The Jackson Laboratory Back- 
cross Mapping Panels and the European Collaborative 
Interspecific Mouse Backcross Mapping Panel. By 
utilizing distantly related parental strains, these map- 
ping panels provide useful community resources that 
exploit polymorphism at a large number of loci. 

Specialized mapping panels are also frequently 
established that are segregating for an investigator’s 
phenotype of interest. Using these mapping panels, a 
phenotype (for which no molecular basis has been 
elucidated) can also be mapped with respect to nearby 
genetic markers. This provides the basis for position- 
ally cloning the gene underlying the mutant pheno- 
type. 

The term ‘haplotype’ can also be used to describe 
particular sets of alleles present at linked loci within 
naturally occurring populations; for example, t haplotypes occurring within a specialized region of mouse chromosome 17 known as the t complex.


See also: Gene Mapping; Linkage Map; 
t Haplotype 
When Mendel’s laws were rediscovered, many biologists had difficulty understanding why recessive
traits were not lost from populations. “If recessive 
traits are not expressed in the presence of a dominant 
allele,” they reasoned, “they should eventually dis- 
appear from populations.” Observations showed that 
this does not happen. W. E. Castle explained why this 
was so in 1903, although his explanation was ignored 
until G.H. Hardy, a British mathematician, and 
W. Weinberg, a German physician, published papers
independently in 1908 that provided mathematical 
justification for Castle’s intuitive argument. 

Hardy’s and Weinberg’s papers pointed out another 
important consequence of Mendel’s rules as applied to 
populations: if individuals choose mates at random 
and several other important assumptions apply, 
there is a simple relationship between allele frequencies and genotype frequencies. If xᵢⱼ is the frequency of a genotype carrying alleles Aᵢ and Aⱼ, and if pᵢ is the frequency of allele Aᵢ, then:

$$x_{ij} = 2p_ip_j \quad \text{if } i \neq j, \qquad x_{ii} = p_i^2$$


A population in which this relationship between genotype and allele frequencies holds is said to have its genotypes in Hardy-Weinberg proportions.
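For any number of alleles, the relationship can be written as a short function. A minimal Python sketch, with a hypothetical function name and example allele frequencies chosen only for illustration:

```python
from __future__ import annotations

from itertools import combinations_with_replacement


def hardy_weinberg(p: list[float]) -> dict[tuple[int, int], float]:
    """Expected genotype frequencies x_ij from allele frequencies p_i.

    Homozygotes A_iA_i get p_i**2 and heterozygotes A_iA_j (i < j) get
    2*p_i*p_j. Alleles are numbered from 1 to match the text.
    """
    if abs(sum(p) - 1.0) > 1e-9:
        raise ValueError("allele frequencies must sum to 1")
    freqs = {}
    for i, j in combinations_with_replacement(range(len(p)), 2):
        freqs[(i + 1, j + 1)] = p[i] ** 2 if i == j else 2 * p[i] * p[j]
    return freqs


x = hardy_weinberg([0.6, 0.4])
print({genotype: round(f, 4) for genotype, f in x.items()})
# {(1, 1): 0.36, (1, 2): 0.48, (2, 2): 0.16}
```

With three alleles the six genotype frequencies likewise sum to one, because they are the terms of (p₁ + p₂ + p₃)².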


Deriving Hardy-Weinberg Proportions


In many types of population genetic problems it is 
useful to construct a mating table. From this table we 
can calculate the frequency of genotypes and alleles 
among offspring produced according to any specified mating pattern. Table 1 shows the mating table for a population segregating for two alleles at one locus, under the six conditions, described below, that are sufficient to guarantee that genotypes will be found in Hardy-Weinberg proportions.


Meiosis is Fair 

The first of the six conditions required for derivation 
of the Hardy-Weinberg proportions is that segrega- 
tion in heterozygotes produces equal proportions of 
the two types of gametes. While this assumption is 
usually met, there are exceptions. At the t-allele locus in house mice or the segregation distorter locus in Drosophila, for example, some alleles may be found in more than 90% of gametes produced by heterozygotes. If segregation distortion occurs, then the genotype proportions among progeny of any matings involving heterozygotes will be quite different from those shown in Table 1. If A₁ is found in 90% of the gametes produced by A₁A₂ heterozygotes, for example, then 90% of the progeny of a mating between A₁A₁ and A₁A₂ will be A₁A₁ and only 10% will be A₁A₂.
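The effect of distortion on a single mating can be sketched by combining gamete frequencies. In this hypothetical Python sketch the two alleles are numbered 1 and 2, any heterozygous parent transmits A₁ with probability k, and k = 1/2 corresponds to fair meiosis; applying the same k to both sexes is a simplifying assumption.

```python
from __future__ import annotations

from fractions import Fraction


def gametes(genotype: tuple[int, int], k: Fraction) -> dict[int, Fraction]:
    """Gamete frequencies; a heterozygote transmits allele 1 with probability k."""
    a, b = genotype
    if a == b:
        return {a: Fraction(1)}
    return {1: k, 2: 1 - k}


def offspring(mother: tuple[int, int], father: tuple[int, int],
              k: Fraction = Fraction(1, 2)) -> dict[tuple[int, int], Fraction]:
    """Offspring genotype frequencies for one mating; k = 1/2 means fair meiosis."""
    result: dict[tuple[int, int], Fraction] = {}
    for allele_m, f_m in gametes(mother, k).items():
        for allele_f, f_f in gametes(father, k).items():
            child = (min(allele_m, allele_f), max(allele_m, allele_f))
            result[child] = result.get(child, Fraction(0)) + f_m * f_f
    return result


fair = offspring((1, 1), (1, 2))
distorted = offspring((1, 1), (1, 2), k=Fraction(9, 10))
print({g: str(f) for g, f in fair.items()})       # {(1, 1): '1/2', (1, 2): '1/2'}
print({g: str(f) for g, f in distorted.items()})  # {(1, 1): '9/10', (1, 2): '1/10'}
```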


Table 1 Mating table for Hardy-Weinberg proportions

                                   Offspring genotypes
Mating          Frequency      A₁A₁    A₁A₂    A₂A₂
A₁A₁ × A₁A₁     x₁₁²           1       0       0
A₁A₁ × A₁A₂     x₁₁x₁₂         1/2     1/2     0
A₁A₁ × A₂A₂     x₁₁x₂₂         0       1       0
A₁A₂ × A₁A₁     x₁₂x₁₁         1/2     1/2     0
A₁A₂ × A₁A₂     x₁₂²           1/4     1/2     1/4
A₁A₂ × A₂A₂     x₁₂x₂₂         0       1/2     1/2
A₂A₂ × A₁A₁     x₂₂x₁₁         0       1       0
A₂A₂ × A₁A₂     x₂₂x₁₂         0       1/2     1/2
A₂A₂ × A₂A₂     x₂₂²           0       0       1


No Input of New Genetic Material 

If mutations occur while gametes are being produced, 
alleles will be passed from parents to progeny with 
probabilities different from those in the absence of 
mutation. If A₁ can mutate to A₂, for example, there is a chance that some progeny of a mating between two A₁A₁ individuals will be A₁A₂ or even A₂A₂. In fact, if A₁ mutates to A₂ with a frequency u, A₁A₂ progeny will occur in this cross with a frequency 2u(1 − u) and A₂A₂ progeny will occur with a frequency u². Similarly, if new individuals become part of the population through migration, then the frequency with which different types of matings occur will depend on the genotype frequency of migrants, not just on the genotype frequency of the resident population. Thus, either the genotype proportions or the mating frequencies in Table 1 would have to be changed if this assumption were violated.
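The mutation frequencies above follow from treating the two gametes as independent. A minimal sketch (hypothetical function name; the value u = 0.1 is unrealistically large and is chosen only to make the numbers easy to read):

```python
from __future__ import annotations


def offspring_with_mutation(u: float) -> dict[str, float]:
    """Offspring of A1A1 x A1A1 when each transmitted A1 mutates to A2 with probability u.

    The two gametes mutate independently of each other.
    """
    if not 0.0 <= u <= 1.0:
        raise ValueError("u must be a probability")
    return {
        "A1A1": (1 - u) ** 2,     # neither gamete carries a new mutation
        "A1A2": 2 * u * (1 - u),  # exactly one gamete does
        "A2A2": u ** 2,           # both gametes do
    }


freqs = offspring_with_mutation(0.1)
print(freqs, sum(freqs.values()))
```

The three frequencies are the terms of ((1 − u) + u)², so they always sum to one.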


Individuals Mate at Random 

The assumption of random mating is the one most 
commonly identified with the Hardy-Weinberg law. 
In conjunction with the assumption that there is no 
migration into the population (see section “No input 
of new genetic material”), this assumption allows us to 
calculate the frequency with which each type of 
mating occurs. The probability that a particular pair of 
genotypes mates at random is equal to the probability 
that two individuals we select at random from the 
population have those genotypes. For example, the probability that we select an A₁A₂ individual at random is x₁₂. The probability that we select an A₁A₁ individual at random is x₁₁. Thus, the probability of an A₁A₂ × A₁A₁ mating is x₁₂x₁₁, and the probability of an A₁A₁ × A₁A₂ mating is x₁₁x₁₂. (Recall that it is conventional to describe matings with the genotype of the maternal parent first, so these are two different types of matings.) Similarly, the frequency of an A₂A₂ × A₂A₂ mating is x₂₂².


The Population is Effectively Infinite 

In a small population the ‘actual’ frequency of off- 
spring genotypes observed in matings involving a 
heterozygote may be different from the ‘expected’ 
frequency listed in Table 1 for the same reason
that a fair coin tossed four times will not always give 
two heads and two tails. If meiosis is fair, the gametes 
that participate in fertilization are a random sample of 
all gametes produced, and in a small sample the 
observed and expected frequencies may be different 
from one another. Similarly, the ‘actual’ frequency of 
a mating in a small population may differ from the 
‘expected’ frequency. As with sampling of gametes to 
form zygotes, the matings that actually occur are a 
sample of all those that could have occurred. 
The sampling of gametes and the sampling of matings are two
sources of the phenomenon of genetic drift. In a very 
large population the actual and expected frequencies 
will almost always be very close to one another, so we 
can neglect the difference between them. If there are 
50 matings between A₁A₁ and A₁A₂, each of which
produces one offspring, for example, there is a 5% 
chance that the frequency of heterozygotes from 
these matings will either be less than 36% or more 
than 64%. If there were 5000 such matings, however, 
there is a 95% chance that the frequency of hetero- 
zygotes will be between 48% and 52%. 
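These figures can be checked with the normal approximation to the binomial distribution. Each A₁A₁ × A₁A₂ offspring is a heterozygote with probability 1/2, so the heterozygote count among n single-offspring matings is binomial(n, 1/2). The sketch below (hypothetical function name; z = 1.96 for an approximate 95% range) computes that range.

```python
from __future__ import annotations

import math


def heterozygote_range(n_offspring: int, z: float = 1.96) -> tuple[float, float]:
    """Approximate 95% range for the heterozygote fraction among n offspring.

    Each A1A1 x A1A2 offspring is A1A2 with probability 1/2, so the number of
    heterozygotes is binomial(n, 1/2). The normal approximation gives
    1/2 +/- z * sqrt(1/4 / n).
    """
    half_width = z * math.sqrt(0.25 / n_offspring)
    return 0.5 - half_width, 0.5 + half_width


for n in (50, 5000):
    low, high = heterozygote_range(n)
    print(f"{n} matings: {low:.1%} to {high:.1%}")
```

The result is about 36% to 64% for 50 matings and about 48.6% to 51.4% for 5000, comfortably inside the 48% to 52% quoted above.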


All Mated Pairs Produce the Same Number of Offspring

The preceding assumptions allow us to calculate the 
frequency with which different types of matings occur 
and the frequency of different genotypes among those 
matings. If we also assume that all mated pairs produce 
the same number of offspring regardless of their geno- 
type, we can calculate the frequency of the different 
genotypes among newly formed progeny. Specifically: 


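$$\begin{aligned}
x'_{11} &= x_{11}^2 + \tfrac{1}{2}x_{11}x_{12} + \tfrac{1}{2}x_{12}x_{11} + \tfrac{1}{4}x_{12}^2 \\
        &= x_{11}^2 + x_{11}x_{12} + \tfrac{1}{4}x_{12}^2 \\
x'_{12} &= \tfrac{1}{2}x_{11}x_{12} + x_{11}x_{22} + \tfrac{1}{2}x_{12}x_{11} + \tfrac{1}{2}x_{12}^2 + \tfrac{1}{2}x_{12}x_{22} + x_{22}x_{11} + \tfrac{1}{2}x_{22}x_{12} \\
        &= x_{11}x_{12} + 2x_{11}x_{22} + x_{12}x_{22} + \tfrac{1}{2}x_{12}^2 \\
x'_{22} &= \tfrac{1}{4}x_{12}^2 + \tfrac{1}{2}x_{12}x_{22} + \tfrac{1}{2}x_{22}x_{12} + x_{22}^2 \\
        &= \tfrac{1}{4}x_{12}^2 + x_{12}x_{22} + x_{22}^2
\end{aligned}$$

where the prime (′) distinguishes genotype frequencies among offspring from those in their parents.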
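The derivation can be reproduced mechanically by encoding Table 1 and summing over all nine ordered matings, each occurring with frequency equal to the product of the parental genotype frequencies. In this sketch the names are hypothetical, exact fractions avoid rounding, and all six conditions are assumed to hold. Starting from a population with no heterozygotes, a single generation of random mating produces Hardy-Weinberg proportions.

```python
from fractions import Fraction
from itertools import product

# Table 1: offspring proportions (A1A1, A1A2, A2A2) for each ordered
# (maternal, paternal) genotype pair; 0 = A1A1, 1 = A1A2, 2 = A2A2.
TABLE_1 = {
    (0, 0): (1, 0, 0),
    (0, 1): (Fraction(1, 2), Fraction(1, 2), 0),
    (0, 2): (0, 1, 0),
    (1, 0): (Fraction(1, 2), Fraction(1, 2), 0),
    (1, 1): (Fraction(1, 4), Fraction(1, 2), Fraction(1, 4)),
    (1, 2): (0, Fraction(1, 2), Fraction(1, 2)),
    (2, 0): (0, 1, 0),
    (2, 1): (0, Fraction(1, 2), Fraction(1, 2)),
    (2, 2): (0, 0, 1),
}


def next_generation(x):
    """Offspring genotype frequencies (x'11, x'12, x'22) under random mating.

    Each ordered mating (mother, father) occurs with frequency x[mother] * x[father].
    """
    offspring = [Fraction(0)] * 3
    for mother, father in product(range(3), repeat=2):
        mating_freq = x[mother] * x[father]
        for g, share in enumerate(TABLE_1[(mother, father)]):
            offspring[g] += mating_freq * share
    return tuple(offspring)


parents = (Fraction(1, 2), Fraction(0), Fraction(1, 2))  # no heterozygotes
children = next_generation(parents)
p = parents[0] + parents[1] / 2                          # frequency of A1
print([str(f) for f in children], children == (p ** 2, 2 * p * (1 - p), (1 - p) ** 2))
# ['1/4', '1/2', '1/4'] True
```

Applying `next_generation` again to `children` returns the same frequencies, illustrating that the proportions persist as long as the assumptions continue to hold.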


All Genotypes Survive with the Same Probability

If all genotypes survive with the same probability, 
then the frequency of each genotype in the offspring 
generation is equal to its frequency in newly formed 
zygotes. The frequency of A₁A₁ among adults, for example, will be:

$$x'_{11} = x_{11}^2 + x_{11}x_{12} + \tfrac{1}{4}x_{12}^2 = \left(x_{11} + \tfrac{1}{2}x_{12}\right)^2 = p^2$$

where p = x₁₁ + ½x₁₂ is the frequency of A₁ among the parents (and q = 1 − p = x₂₂ + ½x₁₂ is the frequency of A₂). Similarly, the frequency of A₁A₂ among adults will be 2pq and the frequency of A₂A₂ among adults will be q². Thus, p², 2pq, and q² are the Hardy-Weinberg proportions for one locus with two alleles. Notice that the allele frequency in offspring is equal to the allele frequency in parents: p² + ½(2pq) = p(p + q) = p.
Importance of Hardy-Weinberg Proportions


Given the many assumptions needed to derive the Hardy-Weinberg law, it may come as a surprise to learn that it plays a central role in the theory of population genetics. It does so for two reasons. First, it provides a way to estimate allele frequencies for a trait in which heterozygotes are indistinguishable from one of the homozygotes, provided we are willing to assume that all of the assumptions apply to the population in which we are interested. Second, it tells us what will happen in a population in the absence
of any evolutionary forces. As the philosopher Elliott 
Sober has pointed out, it plays a role in population 
genetic theory similar to the role that the first and 
second laws of motion play in Newtonian mechanics. 
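The first use can be illustrated with a hypothetical recessive trait seen in 1 of every 10,000 individuals. Assuming, as the text requires, that all the Hardy-Weinberg assumptions hold, the frequency of the recessive phenotype equals q², so q is its square root. A minimal sketch with a hypothetical function name:

```python
from __future__ import annotations

import math


def from_recessive_phenotype(freq_recessive: float) -> dict[str, float]:
    """Estimate allele and carrier frequencies, assuming Hardy-Weinberg proportions.

    Only A2A2 individuals show the recessive trait, so q**2 is the frequency
    of the recessive phenotype and q is its square root.
    """
    if not 0.0 <= freq_recessive <= 1.0:
        raise ValueError("phenotype frequency must lie between 0 and 1")
    q = math.sqrt(freq_recessive)
    p = 1.0 - q
    return {"p": p, "q": q, "carriers": 2 * p * q}


estimate = from_recessive_phenotype(1 / 10_000)
print({k: round(v, 4) for k, v in estimate.items()})  # {'p': 0.99, 'q': 0.01, 'carriers': 0.0198}
```

The estimate implies that roughly 2% of the population are unaffected carriers, far more than the 0.01% who show the trait; it is only as reliable as the assumption of Hardy-Weinberg proportions.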

The first and second laws of motion tell us that an 
object at rest will tend to remain at rest and an object 
in motion will tend to remain in motion (in a straight 
line at a constant speed) unless acted on by outside 
forces. They are ‘zero-force laws’ that tell us what to
expect when no forces are operating on an object. 
Moreover, they allow us to judge the magnitude and 
direction of any forces operating on an object by the 
acceleration to which it is subject. 

The Hardy-Weinberg law is population genetics’ zero-force law. It tells us what a population will look like if neither genetic drift nor any evolutionary forces affect it. If all of the assumptions of Hardy-Weinberg apply, then the population must have genotypes in Hardy-Weinberg proportions. Moreover, a single generation in which those assumptions apply is sufficient
to put genotypes into those proportions, and neither 
the allele frequency nor the genotype frequencies will 
change so long as they continue to apply. If genotypes 
are not in Hardy-Weinberg proportions, then one or 
more of the assumptions must have been violated in 
this population, and the direction in which genotypes 
depart from Hardy-Weinberg proportions is often a 
clue to the cause of the departure. If, for example, 
fewer heterozygotes are observed than expected, 
some form of inbreeding is a likely cause. 
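The direction and size of a departure can be quantified. The sketch below compares observed genotype counts with Hardy-Weinberg expectations using two standard summaries that this article does not itself introduce: Pearson’s chi-square goodness-of-fit statistic and the fixation index F = 1 − H_obs/H_exp, which is positive when heterozygotes are deficient. The function name and the counts are invented for illustration.

```python
def hw_departure(n11: int, n12: int, n22: int):
    """Compare observed genotype counts with Hardy-Weinberg expectations."""
    n = n11 + n12 + n22
    p = (2 * n11 + n12) / (2 * n)          # frequency of A1 among sampled alleles
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n11, n12, n22)
    chi_square = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    f = 1.0 - (n12 / n) / (2 * p * q)      # > 0 means fewer heterozygotes than expected
    return {"p": p, "expected": expected, "chi_square": chi_square, "F": f}


result = hw_departure(50, 20, 30)
print(round(result["chi_square"], 2), round(result["F"], 3))  # 34.03 0.583
```

For these hypothetical counts the chi-square value far exceeds 3.84, the 5% critical value for one degree of freedom, and F ≈ 0.58 reflects a marked heterozygote deficit of the kind the text associates with inbreeding. As stressed below, a good fit would not prove that all the assumptions hold.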

It is important to remember, however, which infer- 
ences can be made with the Hardy-Weinberg law and 
which cannot: 


1. If the assumptions apply, genotypes will be in 
Hardy-Weinberg proportions. 

2. If genotypes are not in Hardy-Weinberg propor- 
tions, one or more of the assumptions has been 
violated. 


It is tempting to conclude that if genotypes are in 
Hardy-Weinberg proportions, all the assumptions 
apply. But this conclusion is not justified. Suppose, for example, genotypes differ in their ability to survive, but all the other assumptions apply. Then genotypes will be found in Hardy-Weinberg proportions
among newly formed zygotes, but they will not be 
found in Hardy-Weinberg proportions in adults. 
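The last example can be sketched numerically. Here zygotes start in Hardy-Weinberg proportions with p = 0.5, and A₂A₂ zygotes are assumed, hypothetically, to survive half as often as the other genotypes; the function name and viability values are illustrative only.

```python
def adults_after_selection(p, viability):
    """Adult genotype frequencies when zygotes start in Hardy-Weinberg proportions.

    viability gives the survival probabilities of A1A1, A1A2 and A2A2 zygotes.
    """
    q = 1.0 - p
    zygotes = (p * p, 2 * p * q, q * q)
    survivors = [x * v for x, v in zip(zygotes, viability)]
    total = sum(survivors)
    return tuple(s / total for s in survivors)


adults = adults_after_selection(0.5, (1.0, 1.0, 0.5))  # A2A2 survive half as often
p_adult = adults[0] + adults[1] / 2
expected_het = 2 * p_adult * (1 - p_adult)
print(round(adults[1], 3), round(expected_het, 3))  # 0.571 0.49
```

Among adults, heterozygotes make up about 0.571 of the population, whereas Hardy-Weinberg proportions computed from the adults’ own allele frequency would predict about 0.490.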
Harlequin chromosomes are metaphase chromosomes
whose two sister chromatids show reciprocal patterns 
of lightly and darkly stained segments along their 
length. These patterns are the result of multiple sister 
chromatid exchanges that have been made visible 
by incorporating bromodeoxyuridine (BrdU) into 
one strand of the DNA of one chromatid during a 
previous S-phase and preferentially destroying the 
BrdU-containing strand before staining with a 
DNA-binding dye. 


See also: Bloom’s Syndrome; DNA Repair 
Heat shock proteins (Hsps) are specific proteins that are made when cells are briefly exposed to temperatures above their normal growth temperature. Because Hsps may also be produced by cells exposed to harmful chemicals or to other conditions that cause cellular stress, they are sometimes called stress proteins. The synthesis of Hsps results from the induction (turning on) of the genes encoding these proteins following the temperature increase.

It was first observed in the fruit fly Drosophila 
melanogaster that when either isolated tissues or 
whole flies were subjected to a heat shock, new pro- 
teins, not detectable in unshocked cells, were made. 
Furthermore, other specific proteins that were 
present in unshocked cells were made in much 
greater amounts following a heat shock. Both of 
these categories of proteins were defined as heat 
shock proteins (Hsps). The synthesis of Hsps is a 
universal phenomenon, occurring in all plant and 
animal species studied, including humans. Hsps are 
also made by prokaryotic cells, namely the bacteria 
and archaea. The temperature at which heat shock 
proteins are induced varies depending upon the 
normal growth temperature of the species. For 
instance, fruit flies, normally grown at 25 °C in the 
laboratory, are heat shocked at 35-37 °C, whereas 
human or mouse cells are induced to make Hsps 
when the temperature is raised to several degrees 
above their normal body temperature of 37 °C, for 
instance 41-42 °C. Chemicals that can induce Hsps 
in many cell types include heavy metal ions and 
arsenite. 

Heat shock proteins can be distinguished on the 
basis of their molecular masses, and are thus conveni- 
ently named according to their sizes. Major Hsps in 
animal cells have molecular masses of approximately 
90000 daltons (Hsp90), 70000 daltons (Hsp70), 
60000 daltons (Hsp60), and 25000-30000 daltons 
(Hsp25 or Hsp30). These four groups also make up 
distinct families of Hsps which have characteristic 
amino acid sequences, three-dimensional structures, 
and mechanisms of action. 

One of the properties of Hsps is the ability to 
prevent partially unfolded proteins from aggregating 
to form insoluble complexes. Since Hsps are able to 
prevent such undesirable interactions, they are also 
referred to as molecular chaperones. In cells, unfolded 
or partially unfolded proteins may include those in the 
process of being made on ribosomes, and therefore not 
yet folded to their mature state, pre-existing proteins 
that have become unfolded due to physical or chem- 
ical stresses, and proteins that are partially unfolded in 
the process of their transport across a cell membrane. 
Thus most Hsps have roles in interacting with 
unfolded proteins in normal, unstressed cells, and are 
also of particular importance during exposure to heat 
or other stressors. 
With respect to mechanism of action, the best
understood heat shock protein is Hsp60. Hsp60 
occurs in mitochondria and chloroplasts of eukaryotic 
cells, and in the cytoplasm of bacteria. The bacterial 
Hsp60 is also known as GroEL. GroEL/Hsp60 forms 
a barrel-shaped complex made up of two stacked 
seven-membered rings, and acts as a catalyst of protein 
folding. Partially unfolded protein substrates are 
bound inside the barrel, where repeated cycles of 
binding and release lead to their refolding. GroEL/ 
Hsp60 utilizes adenosine triphosphate (ATP) as a 
source of energy to drive the changes in shape that 
cause the binding and release of the protein substrate. 
A large number of proteins made in bacteria rely on 
GroEL/Hsp60 to attain their correct folded shape,
and this chaperone system is essential for the life of 
the bacterial cell under all temperature conditions. 

One of the most prominent Hsps in most cells is 
Hsp70, and most eukaryotic cells contain several types 
of Hsp70 with specialized functions. Hsp70 can bind 
to exposed hydrophobic regions of unfolded protein 
chains, recognizing lengths of seven to eight amino 
acids. Like Hsp60, Hsp70 binds and utilizes ATP as a 
source of energy to power its changes in shape asso- 
ciated with binding and release of substrate proteins. 
Many proteins that are transported from the cyto- 
plasm of the cell into the mitochondrion are bound 
by Hsp70, which keeps them in an unfolded state so 
that they may be threaded through channels in the 
mitochondrial membrane before they become re- 
folded and functional inside the mitochondrion. 

The wide range of functions carried out by heat 
shock proteins in both normal and stressed cells has 
made them objects of intense research. They are of 
interest in a variety of medical studies, including 
investigations of their roles in stress tolerance, 
immunity, aging, and neurodegenerative diseases. 


Heavy/Light Chains 


See: Globin Genes, Human 
Helicases are ubiquitous enzymes that actively
unwind the helical structure of nucleic acids, using 
the free energy of hydrolysis of nucleoside tri- 
phosphates (generally ATP). Both DNA and RNA 
helicases exist and are important in virtually every
transaction undergone by nucleic acids. Unpairing of 
DNA by DNA helicases is essential in replication, 
recombination, repair, and chromatin remodeling, 
while RNA helicases are required in translation, tran- 
scription, splicing, RNA processing, editing, mRNA 
export, and degradation. Common eubacterial DNA 
helicases include: the PriA protein, which is involved in 
the assembly of the primasome; DnaB, which acts at the 
replication fork; Rho, in transcriptional termination; 
UvrAB, in DNA repair; and RecBCD, in homologous 
recombination; while SV40 T-antigen is a eukaryotic 
example. An example of a prominent eukaryotic RNA 
helicase is the eIF4A protein required for the initiation
of translation, one of the common set of RNA heli- 
cases containing the amino acid motif DEAD. 

Helicases exist in superfamilies, containing com- 
mon sequence elements. Some of these are the Walker 
A and B boxes that form nucleotide binding pockets. 
The free energy of hydrolysis of NTP is used to 
unwind the nucleic acid and to translocate along it at 
a high rate. Thus, in general, helicases are DNA- or 
RNA-dependent ATPases that act as molecular 
motors. Translocation of DNA helicases is normally 
unidirectional, but can be either 5’ to 3’ (defined rela- 
tive to the enzyme-bound strand) or the reverse. The 
translocation may be more or less processive. 
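The Walker A box mentioned above is usually summarized by the consensus G-x(4)-G-K-[S/T] (the "P-loop"), and the DEAD motif is the literal sequence Asp-Glu-Ala-Asp. As a rough illustration only, the Python sketch below scans a one-letter protein sequence for both patterns with regular expressions. The function name find_motifs and the toy sequence are hypothetical, and a regex hit is at most a hint of a nucleotide-binding site, not evidence that a protein is a helicase.

```python
import re

# Walker A (P-loop) consensus: G, any four residues, G, K, then S or T.
WALKER_A = re.compile(r"G.{4}GK[ST]")
# DEAD-box signature (Asp-Glu-Ala-Asp) shared by many RNA helicases.
DEAD_BOX = re.compile(r"DEAD")


def find_motifs(protein: str) -> dict:
    """Return 0-based start positions of each motif in a one-letter protein sequence.

    finditer reports non-overlapping matches only, which is adequate for
    these short motifs in a quick scan.
    """
    seq = protein.upper()
    return {
        "walker_a": [m.start() for m in WALKER_A.finditer(seq)],
        "dead_box": [m.start() for m in DEAD_BOX.finditer(seq)],
    }


if __name__ == "__main__":
    # Hypothetical toy sequence, not a real helicase.
    print(find_motifs("MGKSGAGKSTLDEADR"))  # {'walker_a': [1], 'dead_box': [11]}
```

Real motif searches usually rely on profile models rather than a single consensus, because many genuine Walker A boxes deviate from it.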

Most helicases act in multimeric form, and a num- 
ber, exemplified by the Rho protein and T7 Gp4, form 
hexameric ring structures. Others act in dimeric form, 
while some such as PcrA and UvrD act as monomers. 
The structures of some DNA helicases have recently 
been solved by X-ray crystallography. 


See also: ATP (Adenosine Triphosphate); 
Nucleic Acid; Rho Factor 


Helicobacter pylori 


F Carneiro and C Caldas 
Helicobacter pylori organisms are spiral, microaerophilic,
gram-negative bacteria that colonize the human
stomach. H. pylori bacteria were identified for the 
first time in 1982 by Warren and Marshall in Perth, 
Australia. However, there is evidence to suggest that 
these organisms colonized the stomach well before 
we became humans. H. pylori infection is one of 
the most common chronic infections in man. It is 
believed that until the twentieth century nearly all 
humans carried H. pylori or closely related bacteria
in their stomach. It is currently estimated that the
infection chronically affects up to 50% of the world’s 
human population. However, there are wide geo- 
graphical differences in the distribution of H. pylori. 
In most developing countries the infection affects
90% or more of the population, while in developed
countries its prevalence ranges from 20 to 50%. The
decline of the infection in some parts of the world is 
most probably related to the improvement of socio- 
economic conditions, sanitation, and nutrition. H. 
pylori is thus becoming a ‘submerging’ rather than an 
‘emerging’ pathogen. The infection is acquired in 
childhood and, if not treated, persists for the lifetime 
of the host. H. pylori causes acute and chronic inflam- 
mation in the stomach (gastritis). The magnitude of 
the inflammation varies from host to host. Most 
infected individuals remain asymptomatic throughout 
their lives; however, in 20-30% of people, organic 
diseases will develop in the stomach or duodenum 
such as duodenal ulcer, gastric ulcer, gastric cancer 
(adenocarcinoma), or mucosa-associated lymphoid 
tissue (MALT) lymphoma. Diversity of clinical out- 
comes of H. pylori has been attributed to different 
factors, such as environmental factors (mainly diet), 
host factors (characteristics of the mucus layer cover- 
ing the gastric mucosa, immune response, etc.) and 
virulence factors of H. pylori strains. 

Genetic studies indicate that H. pylori strains are 
enormously diverse. The complete genomic sequences 
of two distinct H. pylori strains were published in 
1997 and 1999. About 1500 genes exist in H. pylori 
strains; a large majority of these have been function- 
ally characterized and a good proportion seems to be 
H. pylori specific. A few genes have been shown to 
be associated with virulence of the strains, namely
the vacA, cagA, and iceA genes. The product of the vacA
gene is a protein with cytotoxic activity that induces 
vacuolization of human cells. Two distinct regions 
exist in vacA gene, the s (signal) region and the m 
(middle) region. Within each of these regions several 
variants can be identified (s1a, s1b, s1c, and s2 in the s
region; m1, m2a, and m2b in the m region). Strains 
typed as s1/m1 have the highest cytotoxic activity. The 
gene cagA encodes a high molecular weight protein 
whose function is not fully elucidated. This gene is one 
member of a genomic region that exists in only about 
60% of H. pylori strains. This region is designated as 
the pathogenicity island (PAI) and encompasses
several virulence-associated genes; the cagA gene is
considered a ‘marker’ of this island. The iceA gene is
induced by contact with epithelium and exists as two
variants, iceA1 and iceA2. The genetic constitution of
the strains has clinical relevance. In Western countries
it was shown that a person colonized by a cagA-positive,
vacA s1 (and possibly iceA1) strain is more likely to
develop gastric or duodenal ulcer or gastric cancer. In
contrast, people infected with cagA-negative, vacA s2,
and iceA2 strains
will most probably remain asymptomatic despite 
developing gastritis. There is a wide variation in the 
H. pylori genotypes colonizing different parts of the 
world. The similarities between several populations 
(for instance in the Iberian Peninsula and South Amer- 
ica) with respect to the prevalence of specific H. pylori 
genotypes suggest comigration and coevolution of H. 
pylori and humans. These similarities may reflect his- 
torical, cultural, and socioeconomic relationships 
between different areas of the world. 
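The genotype associations described above can be encoded as a small data structure, which makes the combinations explicit. The Python sketch below is only an illustration of the text's population-level associations in Western countries; the class StrainGenotype, its field names, and the labels "higher risk", "lower risk" and "unclassified" are hypothetical, and the code is not a clinical predictor for any individual.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StrainGenotype:
    cag_a: bool    # True if cagA (marker of the cag pathogenicity island) is present
    vac_a_s: str   # vacA signal-region allele: "s1a", "s1b", "s1c" or "s2"
    vac_a_m: str   # vacA middle-region allele: "m1", "m2a" or "m2b"
    ice_a: str     # "iceA1" or "iceA2"


def highest_cytotoxicity(g: StrainGenotype) -> bool:
    """Strains typed as vacA s1/m1 show the highest cytotoxic activity."""
    return g.vac_a_s.startswith("s1") and g.vac_a_m == "m1"


def western_risk_profile(g: StrainGenotype) -> str:
    """Coarse, hypothetical label following the associations in the text."""
    if g.cag_a and g.vac_a_s.startswith("s1"):
        # iceA1 is not required here because its contribution is less certain.
        return "higher risk"
    if not g.cag_a and g.vac_a_s == "s2" and g.ice_a == "iceA2":
        return "lower risk"
    return "unclassified"


if __name__ == "__main__":
    print(western_risk_profile(StrainGenotype(True, "s1a", "m1", "iceA1")))
    print(western_risk_profile(StrainGenotype(False, "s2", "m2a", "iceA2")))
```

Because the text notes wide geographic variation in genotypes, any such rule would need to be re-derived for each population.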


See also: Adenocarcinomas; Bacterial Genetics 


Helix-Loop-Helix Proteins 
See: DNA-Binding Proteins 


Helix-Turn-Helix Motif


J Read and S Brenner 
A helix-turn-helix motif is a protein motif that is able
to recognize and bind to specific DNA sequences. The
motif comprises two α-helices separated by a short
turn. One helix lies across the DNA, while the other,
the recognition helix, fits into the major groove and
interacts with the bases. Such motifs are commonly
found in transcription factors.


See also: DNA Structure 


Helper Phage 
E Kutter 
Bacteriophage replication normally requires a variety
of functions that are routinely supplied by their bac- 
terial hosts. However, some bacteriophages also lack 
certain additional essential components for their own 
replication. A second homologous or heterologous 
phage that can supply such missing components and 
permit replication and packaging of the other phage 
is termed a helper phage. The best-studied natural 
heterologous system requiring such a relationship 
is the bacteriophage P2-P4 system. Bacteriophage P4
is missing all the genes for structural phage proteins.
It can only complete its replication cycle and make 
phage particles when the cell is simultaneously 
infected with another temperate phage like P2. This 
is a particularly interesting case, since the two phages 
have no sequence homology and the P4 head is only 
about one-third as large as that for P2, reflecting the 
relative sizes of their genomes. Somehow, a P4 protein 
(sid) is able to direct the P2 capsid protein, gpN, to
assemble into a very different structure from the one it
would normally make. The same five P2 genes are
required to make both the P2 and the P4 capsids. The switch
is total; no P2 phage are made under these circum- 
stances when P2 is acting as a helper phage for P4 
assembly. The term ‘helper phage’ can also be applied 
to a second phage in the same family that permits the 
replication of a phage damaged by UV, chemicals, or 
X-rays or one mutated in genes that are essential to 
survival. Molecular biologists have also designed a 
number of clever systems for packaging foreign DNA 
into phage particles; most of these involve especially 
designed helper phages that provide the necessary 
components and packaging machinery without them- 
selves being replicated under the conditions used for 
packaging the foreign DNA. 


See also: Bacteriophages; Temperate Phage 


Hemizygote 
The term ‘hemizygote’ refers to a nucleus, cell, or
organism that possesses only one of a normally diploid 
set of genes. 


See also: Heterogenote; Homozygosity 


Hemoglobin 


See: Globin Genes, Human 


Hemophilia 
F Giannelli 
Hemophilia is the name shared by two X-linked recessively
inherited defects of blood coagulation. These
manifest as spontaneous bleeding or as excessive bleeding
following minor surgery or trauma. The bleeding episodes
may occasionally threaten life and in the long run may 
cause serious disability, especially by damaging joints. 
Hemophilia is due to defects in either the gene for 
coagulation factor VIII or that for factor IX. Muta- 
tions of the factor VIII gene cause hemophilia A, or 
classic hemophilia, while those of the factor IX gene 
cause hemophilia B, or Christmas disease. 


Population Genetics 


Both hemophilias have been maintained in the popu- 
lation by an equilibrium between mutation and selec- 
tion against the affected males. The latter causes a 
loss of hemophilia genes at each generation equal to 
$(1 - f)I/3$, where f is the chance that a patient will
produce offspring relative to that of a normal male,
and I is the incidence of the disease. The value of f was
about 0.5 prior to the introduction of modern treat- 
ment for both hemophilias, so that existing hemo- 
philia genes were lost from the population and were 
replaced by new mutations at a rate of 1/6 per gener- 
ation. As a result both diseases show a high degree of 
mutational heterogeneity. In the second half of the last 
century the value of f is thought to have increased in 
developed countries because of treatment and the improved
health of patients. In this situation, since the mutation
rates are expected to remain unchanged, the incidence 
of the disease rises, eventually to reach a new equilib- 
rium between mutation and selection. Currently, 
in the UK, the incidence of hemophilia A and B is 
respectively 1 per 5000 and 1 per 30000 males. 
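The mutation-selection balance described above can be checked numerically. The Python sketch below is a minimal model, assuming that the incidence I in males equals the frequency of hemophilia alleles among X chromosomes, that the allele is rare, and that new mutations arise at a constant total rate mu per X chromosome per generation (expressed on the same scale as I). Each generation loses $(1 - f)I/3$ and gains mu, so the equilibrium is $I^* = 3\mu/(1 - f)$. The function names, the choice to treat the UK figure as an equilibrium, and the post-treatment value f = 0.9 are hypothetical illustrations, not data from the text.

```python
def loss_per_generation(incidence: float, f: float) -> float:
    """Selective loss of hemophilia alleles per generation, (1 - f) * I / 3.

    The factor 1/3 reflects that one third of all X chromosomes are in males,
    where the allele is exposed to selection.
    """
    return (1.0 - f) * incidence / 3.0


def equilibrium_incidence(mu: float, f: float) -> float:
    """Incidence at which new mutations (mu) exactly replace the selective loss."""
    return 3.0 * mu / (1.0 - f)


def simulate(incidence: float, mu: float, f: float, generations: int) -> list:
    """Iterate I_next = I - (1 - f) * I / 3 + mu and return the whole trajectory."""
    trajectory = [incidence]
    for _ in range(generations):
        incidence = incidence - loss_per_generation(incidence, f) + mu
        trajectory.append(incidence)
    return trajectory


if __name__ == "__main__":
    old_f = 0.5
    incidence_a = 1 / 5000                          # treated as the old equilibrium
    mu = loss_per_generation(incidence_a, old_f)    # equals I/6 when f = 0.5
    new_f = 0.9                                     # hypothetical fitness with treatment
    target = equilibrium_incidence(mu, new_f)
    path = simulate(incidence_a, mu, new_f, generations=200)
    print(f"mu = {mu:.2e}; new equilibrium = 1 per {1 / target:.0f} males")
    print(f"after 200 generations: 1 per {1 / path[-1]:.0f} males")
```

With f = 0.5 the loss is one-sixth of the existing hemophilia genes per generation, matching the text; raising f slows that loss, so the incidence climbs towards a higher equilibrium over many generations.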


Factor VIII Gene 


This gene is in band Xq28, 1.5 Mb from the telomere. 
It spans 186 kb, contains 26 exons, and is oriented so 
that the promoter lies telomeric to the rest of the 
gene. A CpG island in intron 22 of the factor VIII 
gene is the origin of two nested genes: F8A and F8B. 
The first is a 1.8 kb intronless gene entirely contained 
within intron 22 and transcribed in opposite orienta- 
tion to the factor VIII gene. The second is transcribed 
in the orientation of the factor VIII gene, and its 
message contains a specific first exon followed by 
exons 23 to 26 of the factor VIII gene. The CpG island 
at the origin of the F8A and F8B genes is part of a 9503 
bp segment of intron 22 of the factor VIII gene called 
int22h that is found repeated in opposite orientation 
350 and 450 kb telomeric to the factor VIII gene.
These three repeats are designated int22h-1, -2, and
-3 according to their increasing distance from the
centromere. The factor VIII gene produces a mRNA 
of 9028 nucleotides. 


Factor IX Gene 


The factor IX gene is near the boundary between 
Xq26 and Xq27. It spans 33.5 kb, contains 8 exons, 
and produces a message of 2802 nt. This gene appears 
to derive from an ancestral gene that also gave rise to three
more genes encoding proteins of blood coagulation: 
factors VII and X, and protein C. 


Factor VIII and Factor IX 


Factor IX is a serine protease that cleaves and activates 
factor X in the proteolytic cascade that results in the 
conversion of fibrinogen into fibrin, and hence in 
blood coagulation. Factor VIII is the cofactor that 
associates with factor IX to ensure physiologic levels 
of factor X activation. The complex of factors VIII and 
IX is, more generally, responsible for maintaining the 
coagulation cascade after its initiation by factor VII 
and tissue factor. 

Factor IX is synthesized with a signal peptide con- 
sisting of a prepeptide that is cleaved upon transport to 
the endoplasmic reticulum and a propeptide that is 
cleaved prior to secretion. The latter is important for 
interaction with the enzyme that γ-carboxylates the
first 12 glutamates of circulating factor IX. This cir- 
culating protein consists of 415 amino acids organized 
in the following domains: 


1. The gla domain, containing the γ-carboxylated
glutamates important for Ca²⁺ binding and affinity for
phospholipid membranes.

2. Two epidermal growth-factor-like domains
important for protein-protein interactions.

3. An activation domain that is cleaved to release
residues 146-180 and activate factor IX.

4. The catalytic or serine protease domain,
homologous to trypsin and other members of this
family of proteases.

Factor IX posttranslational modification includes
γ-carboxylation, N- and O-glycosylation of different
residues, and partial β-hydroxylation of aspartate 64.


Factor VIII is synthesized with a prepeptide of 19 
residues that is cleaved off prior to secretion. The 
remaining 2332 residues of factor VIII are organized 
in the domain structure A1a1A2Ba2A3C1C2, where
A1-A3 are homologous to the domains of ceruloplasmin
(a copper-binding protein), a1 and a2 are small
acidic peptides, B is a unique domain encoded by an
exon (number 14) of 3106 bp, and C1 and C2 are
homologous to milk-fat globule-binding protein.
Prior to secretion the protein is extensively modified
by N- and O-glycosylation of several residues and
sulfation of six tyrosines. In addition, it is cleaved at
the B/a2 boundary and at variable positions within the
B domain. The heterodimer (A1a1A2B + a2A3C1C2) is
the inactive circulating form of factor VIII and is 
carried by a large multimeric protein: von Willebrand 
factor. This protects factor VIII and slows down its 
clearance. Mutations of von Willebrand factor that 
only affect its factor VIII binding property may there- 
fore mimic hemophilia A. 

Factor VIII is activated by cleavage at the A1a1/A2
and a2/A3 boundaries, while any residual B domain is
eliminated by cleavage at the A2/B boundary. The
heterotrimeric (A1a1 + A2 + A3C1C2) active form of
factor VIII is unstable and may become inactive either
by spontaneous dissociation of the A2 chain or by
enzymatic cleavage at the A1/a1 boundary or within
the A2 chain. Activated protein C performs these
cleavages as part of a negative feedback control on blood
coagulation.

Factor VIII is homologous to coagulation factor V. 
This, however, has a clearly distinct B domain and 
lacks the a1 and a2 acidic peptides.
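The processing and activation of factor VIII described above amount to cutting an ordered list of domains at named boundaries. The Python sketch below models that bookkeeping only; it assumes the domain order A1a1A2Ba2A3C1C2 given in the text and simplifies by treating the B domain as a single unit, although it is also cut at variable internal positions. The function names cleave, label and has_a_domain are hypothetical.

```python
# Factor VIII domain order after removal of the signal prepeptide, in the
# notation used above. The B domain is treated as one unit for simplicity.
DOMAINS = ("A1", "a1", "A2", "B", "a2", "A3", "C1", "C2")


def cleave(chains: list, left: str, right: str) -> list:
    """Cut whichever chain contains the adjacent domains (left, right) between them."""
    result = []
    for chain in chains:
        for i in range(len(chain) - 1):
            if chain[i] == left and chain[i + 1] == right:
                result.extend([chain[: i + 1], chain[i + 1 :]])
                break
        else:  # no matching boundary in this chain: keep it whole
            result.append(chain)
    return result


def label(chain: tuple) -> str:
    return "".join(chain)


def has_a_domain(chain: tuple) -> bool:
    # Capital "A" marks the ceruloplasmin-like domains; "a1"/"a2" are acidic peptides.
    return any(d.startswith("A") for d in chain)


# Processing before secretion: cut at B/a2, giving the circulating heterodimer.
circulating = cleave([DOMAINS], "B", "a2")

# Activation: cut at A1a1/A2 (between a1 and A2), at a2/A3 and at A2/B.
active = circulating
for left, right in [("a1", "A2"), ("a2", "A3"), ("A2", "B")]:
    active = cleave(active, left, right)

heterotrimer = [label(c) for c in active if has_a_domain(c)]
released = [label(c) for c in active if not has_a_domain(c)]

print([label(c) for c in circulating])  # ['A1a1A2B', 'a2A3C1C2']
print(heterotrimer)                     # ['A1a1', 'A2', 'A3C1C2']
print(released)                         # ['B', 'a2']
```

The chains that keep an A domain reproduce the heterotrimer A1a1 + A2 + A3C1C2 named in the text, while B and a2 are released.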


Mutations Causing Hemophilia A 


The severity of hemophilia is a function of the gene 
mutation and is directly related to the deficit of
coagulant factor activity. A quarter of all hemophilia A cases
are due to gross gene rearrangements; in particular,
5% are due to gross gene deletions and 20% to
inversions of 500 or 600 kb breaking intron 22 of the factor
VIII gene. These inversions are due to intrachromosomal
or intrachromatid homologous recombination
between the int22h-1 sequence of the factor VIII
gene and either int22h-2 or int22h-3; int22h-3 is
involved five times more frequently than int22h-2.
The inversions appear to occur at a rate of
$(4\text{-}7) \times 10^{-6}$ per gamete per generation and account for
nearly half the patients with severe hemophilia A.

Approximately 75% of hemophilia A mutations 
are base substitutions or small deletions/insertions. 
These may act by (1) leading to abnormal RNA 
splicing through damage to normal or creation of 
abnormal splicing signals; (2) causing premature ter- 
mination of translation (frameshifts, nonsense 
codons); or (3) producing subtle protein changes 
such as amino acid deletions or amino acid substitu- 
tions. So far, 228 different missense mutations (i.e., 
mutations causing amino acid substitutions) have 
been found in the factor VIII gene of hemophilic 
patients but more than 500 different missense muta- 
tions are expected to be capable of causing hemophilia 
A. Promoter mutations causing hemophilia A have 
not been reported so far. 

Most hemophilia A mutations arise in the male 
germline and this appears especially true of the inver- 
sions breaking intron 22. 


Mutations Causing Hemophilia B


Fewer than 2% of hemophilia B cases are due to gross
rearrangements, represented generally by gross dele- 
tions that may even remove the entire gene. The other 
mutations are base substitutions and small deletions/ 
insertions. 

Three per cent of the mutations affect the promoter 
of the factor IX gene and usually cause a disease that 
markedly improves after puberty and may become 
asymptomatic. This is called Leyden-type hemophilia 
B. The exceptions, so far, are two different
substitutions at nucleotide -26 that cause nonimproving
hemophilia. Since nucleotide -26 is part of an androgen
receptor binding site, it can be argued that binding of the
ligand-saturated androgen receptor at this site may restore
the promoter activity impaired by the mutations causing
Leyden-type hemophilia B, while more serious damage
to the same site prevents this rescue and leaves promoter
activity permanently impaired.

Other mutations damage or create RNA splicing 
signals (~12%); cause frameshifts (~4%); generate 
nonsense codons (~12%); or result in amino acid 
deletion (~2%) or substitution (~63%). So far, 425
different missense mutations have been found in the 
factor IX gene of hemophilia B patients. 

Factor IX mutations occur eight to nine times more 
frequently in the male than in the female germline. 


Genotype-Phenotype Correlations 


In both hemophilia A and B, frameshifts and nonsense
mutations tend to cause severe disease with absence of 
coagulant protein in circulation; in hemophilia B this 
seems to be true irrespective of the position of the 
premature translation stop signal. Splicing and mis- 
sense mutations may cause mild, moderate, or severe 
disease. Missense mutations may simply impair the 
function of the coagulant factor; cause gross reduction 
or virtual absence of the coagulation factor in the 
blood; or decrease the amount of protein in circulation 
as well as reducing its specific activity. 

Mutations expected to prevent the synthesis of 
‘near-normal’ coagulant proteins such as gross or 
complete gene deletions, frameshifts, nonsense muta- 
tions, and inversions breaking intron 22 of the factor 
VIII gene, predispose to the inhibitor complication. 
This entails the development of antibodies against the 
coagulant factor used in replacement therapy, so that 
the patient becomes refractory to standard treatment. 
Predisposition to manufacture such antibodies is 
probably due to failure to develop tolerance to the 
relevant coagulation factor because of inadequate 
exposure to the factor during maturation of the 
immune system. 


Mutational Heterogeneity and its
Relevance to Genetic Counseling 


Most hemophilia A and B mutations are unique, but 
frequent repeats of some mutations may occur. In 
some instances this is due to founder effects and 
tends to be restricted to the populations the founders 
belonged to, while in others it is due to mutational 
hotspots such as the int22h regions for the common
inversions causing hemophilia A, or CpG sites. The 
latter undergo transition mutations at 10 times the rate 
of other sites. 
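CpG hotspots are easy to locate in a sequence, and the tenfold rate quoted above shows why they attract so many recurrent mutations. The Python sketch below lists the CpG dinucleotides in one DNA strand, the transitions expected there (deamination of methylated cytosine gives C>T, which appears as G>A when it occurs on the opposite strand), and, in a deliberately crude model where each CpG base mutates cpg_fold times faster than any other base, the expected share of transitions falling at CpG sites. The function names, the uniform background rate, and the example fragment are hypothetical.

```python
def cpg_sites(seq: str) -> list:
    """0-based positions of the C of every CpG dinucleotide on the given strand."""
    s = seq.upper()
    return [i for i in range(len(s) - 1) if s[i] == "C" and s[i + 1] == "G"]


def cpg_transitions(seq: str) -> list:
    """Transitions expected at each CpG, using 1-based positions on this strand.

    C>T at the C itself, or C>T on the opposite strand, which reads as G>A
    at the G on this strand.
    """
    changes = []
    for i in cpg_sites(seq):
        changes.append(f"{i + 1}C>T")
        changes.append(f"{i + 2}G>A")
    return changes


def share_at_cpg(seq: str, cpg_fold: float = 10.0) -> float:
    """Expected share of transitions hitting CpG bases under a crude rate model."""
    if not seq:
        return 0.0
    cpg_bases = 2 * len(cpg_sites(seq))  # a base cannot belong to two CpGs
    other_bases = len(seq) - cpg_bases
    cpg_weight = cpg_bases * cpg_fold
    return cpg_weight / (cpg_weight + other_bases)


if __name__ == "__main__":
    fragment = "ACGTTCGA"  # hypothetical fragment
    print(cpg_sites(fragment))               # [1, 5]
    print(cpg_transitions(fragment))         # ['2C>T', '3G>A', '6C>T', '7G>A']
    print(round(share_at_cpg(fragment), 3))  # 0.909
```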

In general, hemophilia A and B mutations are of 
recent origin, and a significant proportion is less 
than three generations old. This, together with the 
small size of modern families, allows as many as half 
the hemophilia families to appear sporadic. These 
families are unsuited to methods for carrier and 
prenatal diagnosis based on the analysis of the intra- 
familial segregation of polymorphic markers, and 
instead direct characterization of the gene defect is 
needed. 

A strategy that allows optimal genetic counseling 
and rapid progress in the understanding of the mo- 
lecular biology of the relevant disease is based on the 
construction of national confidential databases of 
mutations and pedigrees. The databases are assembled 
by characterizing the mutation of an index patient 
from each family and collecting the family’s pedigree. 
Carrier and prenatal diagnoses can then be based 
on detection of the defect specific to each family, 
and can be made for all the at-risk blood relatives of 
the index patient, for generation after generation. In 
the UK such a database has been constructed for 
hemophilia B, and that for hemophilia A is being 
assembled. 

The high mutational heterogeneity of the hemo- 
philias makes the analysis of natural mutants a very 
efficient way of investigating the features that are 
important to the function of factors VIII and IX and 
their genes. 


Treatment of Hemophilias A and B 


Replacement therapy is available for hemophilia A 
and B and is based on intravenous administration of 
concentrates of factor VIII and factor IX, respectively. 
These factors are either purified from blood donations 
or from cultures of cells expressing the recombinant 
factors. 

Work on the development of gene therapy is
ongoing but is still at the animal-experimentation 
phase. Human application requires safe and efficient 
methods of gene delivery capable of ensuring satisfac- 
tory and stable gene expression. 


See also: Genetic Counseling; Genetic Diseases; 
Sex Linkage 


Hereditary Diseases 


D E Wilcox 
Hereditary diseases are those whose causation has a
genetic component. This component is caused by 
transmissible change(s) in the genetic material. The 
heritability of a disease is a measure of the relative 
proportions of genetic and environmental factors. A 
single gene disorder with little environmental influ- 
ence, e.g., Duchenne muscular dystrophy in humans, 
will have a high heritability. A multifactorial disorder, 
e.g., congenital heart disease, will have a lower heri- 
tability. The hereditary component of a disease may be 
caused by a single gene, multiple genes (polygenic), or 
various chromosome abnormalities such as deletions 
or translocations. 


See also: Clinical Genetics; Congenital Disorders; 
Genetic Diseases 


Hereditary Neoplasia 


L M Mulligan 
Cancers or groups of related cancers which occur with
an increased frequency in families, as compared to the 
general population, due to genetic risk factors may 
be termed hereditary neoplasia. These diseases result 
from the inheritance of a mutation in a tumor sup- 
pressor gene or (rarely) an oncogene which makes 
the individual susceptible to developing the specific 
tumor type(s). Although risks of developing a
hereditary neoplasm may be very high for individuals
within such families, overall, only 10-15% of cancers
fall into this category. Characteristic features of
hereditary neoplasia include an early age of disease onset
and the occurrence of multiple primary tumors. It 
should be stressed that hereditary neoplasia refers to 
inheritance of a susceptibility allele or alleles but not 
to inheritance of a cancer phenotype per se. 


See also: Cancer Susceptibility; Tumor 
Suppressor Genes 


Heritability 
W G Hill 
Heritability is a commonly used and important term
to describe properties of the inheritance of quantitative
traits, such as stature in man or milk yield of cows.
Informally, heritability ($h^2$) is the proportion of the
variation in the trait due to genetic differences be- 
tween individuals, but a more precise definition of 
heritability is important because the term is both 
widely used and widely misused. Correlations among 
relatives and response to directional selection are pro- 
portional to the heritability. Although a property of a 
specific trait in a specific population, it is found that 
heritabilities of similar traits take similar values in 
different species and populations. The first use of the 
word ‘heritability’ is uncertain, but it is most often 
associated with Jay L. Lush, who applied the theory 
of quantitative genetics of Sewall Wright and R.A. 
Fisher to animal breeding. 


Definition 


The observed performance or phenotypic value, P, of 
an individual for a quantitative trait can be partitioned 
in a simple additive model into two components, 
genotypic value (G) and environmental deviation (E), as:


$$P = G + E$$


A genotype × environment interaction term, GE,
can also be included in the model, but cannot usually 
be distinguished from the environmental deviation, E, 
as each environmental deviation is unique. Because 
individuals transmit only one gene at each locus to 
their offspring, the other copy coming from the sec- 
ond parent, in describing correlations among most 
relatives and in predicting responses to selection it is 
necessary to consider the average performance of indi- 
viduals who receive one copy of a specified gene and 
the other at random from the population. Fisher
described this in 1918 as the average effect of the 
gene. The breeding value, A, of an individual is the 
sum of the average effects of the genes it carries. 
More simply and practically, the breeding value of an 
individual is defined as twice the expected deviation of 
the mean of its progeny, if randomly mated, from the 
population mean; but these definitions are the same 
unless there is epistasis. The dominance deviation, D, 
defines differences between genotypic value and 
breeding value which are due to interactions between 
genes at individual loci, and the epistatic deviation, I,
defines differences due to interactions between
different loci. (I can be further partitioned into additive ×
additive, additive × dominance, and other terms.)
Hence a fuller model is:


$$P = A + D + I + E$$


Variation among individuals in phenotypic value, $V_P$,
can now be partitioned into components:


$$V_P = V_G + V_E = V_A + V_D + V_I + V_E$$


assuming that correlations between or interactions of 
genotype and environment can be ignored or catered 
for in other ways. 

There are two different definitions of heritability: 


• Heritability in the broad sense: $H^2 = V_G/V_P$
• Heritability in the narrow sense, or simply
‘heritability’: $h^2 = V_A/V_P$


In most situations, particularly when describing 
correlations among relatives or in predicting response 
to selection, heritability in the narrow sense is the 
more useful quantity and is implied. Heritability 
appears as a squared term, because h was first defined 
by Sewall Wright in 1918 as the path coefficient from 
genotype (or breeding value, since he included no 
dominance term) to phenotype. As h also equals the
correlation between breeding value (A) and
phenotype (P), h is the accuracy of selection on phenotype.
It also follows that $h^2$ is the regression of breeding
value on phenotype. Further, since $h = \operatorname{corr}(A, P)$, the
variance in A which is not explained by P is


$$V(A \mid P) = (1 - h^2)\,V_A.$$
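The relations above ($H^2 = V_G/V_P$, $h^2 = V_A/V_P$, regression of A on P equal to $h^2$, $\operatorname{corr}(A, P) = h$, and $V(A \mid P) = (1 - h^2)V_A$) can be checked by simulation. The Python sketch below assumes hypothetical variance components and draws A, D, I and E as independent normal variables, so it ignores genotype × environment interaction and any covariance between genotype and environment; the function and variable names are illustrative only.

```python
import random


def simulate(n, va, vd, vi, ve, seed=1):
    """Draw independent normal A, D, I, E with the given variances; return (A, P)."""
    rng = random.Random(seed)
    a_vals, p_vals = [], []
    for _ in range(n):
        a = rng.gauss(0.0, va ** 0.5)
        p = a + rng.gauss(0.0, vd ** 0.5) + rng.gauss(0.0, vi ** 0.5) + rng.gauss(0.0, ve ** 0.5)
        a_vals.append(a)
        p_vals.append(p)
    return a_vals, p_vals


def mean(x):
    return sum(x) / len(x)


def cov(x, y):
    """Sample covariance; cov(x, x) is the sample variance."""
    mx, my = mean(x), mean(y)
    return sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / (len(x) - 1)


VA, VD, VI, VE = 40.0, 10.0, 5.0, 45.0    # hypothetical variance components
VP = VA + VD + VI + VE
H2_broad = (VA + VD + VI) / VP            # H^2 = V_G / V_P = 0.55
h2 = VA / VP                              # h^2 = V_A / V_P = 0.40

A, P = simulate(100_000, VA, VD, VI, VE)
slope = cov(A, P) / cov(P, P)                    # regression of A on P, close to h^2
r = cov(A, P) / (cov(A, A) * cov(P, P)) ** 0.5   # corr(A, P), close to h
resid_var = cov(A, A) * (1 - r ** 2)             # V(A | P), close to (1 - h^2) V_A = 24

print(f"H2={H2_broad:.2f} h2={h2:.2f} slope={slope:.3f} "
      f"r^2={r ** 2:.3f} V(A|P)={resid_var:.1f}")
```

The simulated slope and squared correlation both land near 0.40, illustrating why h is the accuracy of selection on phenotype and $h^2$ the regression of breeding value on phenotype.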


Magnitude 


Because the amount of genetic variance depends on the 
frequencies and effects of genes at many loci, and the 
environmental variation depends on the environment 
in which individuals are kept, heritability differs 
among traits, species, populations within species, and 
over time. In practice, however, it turns out that each
kind of trait has a typical heritability value, which is
often similar among very different species. Some general
values are given in Table 1.


Table 1 Heritability values for different species and traits

Species and traits                                                 h² (%)
Drosophila¹
  Life history traits (longevity, fecundity, development time)        12
  Behavioral traits (locomotion, mating activity,
    geo- and phototaxis)                                              18
  Morphological traits (bristle number, wing and thorax size)         32
Pig²
  Reproductive rate (litter size)                                     10
  Growth rate (daily gain, feed intake, and conversion efficiency)    30
  Morphology (backfat, carcass lean %)                                45
Humans
  IQ (meta-analysis)³                                                 34
    (broad sense)                                                     48
  Stature⁴                                                            65
  Finger ridge count⁵                                                >95

¹Roff and Mousseau (1987).
²Rothschild and Ruvinsky (1998).
³Devlin et al. (1997).
⁴Roberts et al. (1978).
⁵Holt (1955).


Estimation 


Relatives resemble each other because they have 
genes in common, and the closer the relationship the 
more likely they are to share genes and the more 
highly correlated are their phenotypes for quantita- 
tive traits. Similarly, the higher the heritability, the 
more highly correlated are the phenotypes of relatives. 
Heritability is therefore estimated from the resem- 
blance between relatives, scaled to take account of 
the relationship. 

There are two major problems in estimating heri- 
tability. The first is to avoid confounding of the cor- 
relations among relatives by nongenetic causes such 
as shared environment. It is therefore much easier to 
get good estimates in well-designed experiments in 
laboratory animals than in man. The second problem 
is to get sufficient data to provide accurate estimates; 
and while estimates from distant relatives may be less confounded by common environment, they estimate only a fraction of $h^2$ and so have to be scaled up, which scales up their sampling error as well.


Heritability in the broad sense can be estimated 
from the correlation of phenotypes of individuals 
which have the same genotype, i.e., clones or identical 
twins. In plants this can be feasible, whereas in humans identical twins reared together share the pre- and postnatal environment, which inflates their correlation. In cattle, identical twins formed by embryo splitting can be reared in different foster mothers; in humans, identical twins adopted into different homes do not share the postnatal environment.


Parent and Offspring 

The covariance of parent and offspring, which have 
precisely one (autosomal) gene in common at each 
locus, and therefore half their genotype, equals $V_A/2$. Hence if individuals are sampled at random, the correlation of phenotype of offspring and individual parent is $h^2/2$, and similarly the regression of phenotype of offspring on one of its parents is $h^2/2$. (The word regression was coined by Galton to describe the fact that extreme parents had less extreme offspring.) Hence if a set of data on parents and offspring is collected and the regression (± SE) of progeny on parent phenotype is 0.2 (± 0.1), then the estimate of heritability is 0.4 (± 0.2). Because it is easier to deal with large numbers of offspring and because the estimate is not biased by selection on the trait (provided selection is only on that trait), it is usual to use the regression rather than the correlation as the estimator. If the phenotype of offspring is regressed on the parental average for a trait measured on both parents, the regression coefficient estimates $h^2$. Maternal effects can bias estimates from offspring-parent regression or correlation, for example in body weight due to family environment in man or the association between dam’s milk production and weight in cattle. If there is nonrandom mating among parents, as in humans for stature, the regression or correlation of offspring on individual parent is biased (upward with positive assortative mating), but the regression on mid-parent is not.
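The two regressions can be checked on simulated data. This is a minimal sketch assuming a purely additive model with random mating, no maternal effects, and no common environment; the functions `slope` and `simulate_families` are hypothetical helpers. Doubling the single-parent slope also doubles its standard error, as in the 0.2 (± 0.1) example above.

```python
import random


def slope(x, y):
    """Least-squares regression coefficient of y on x: cov(x, y) / var(x)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def simulate_families(n_families, v_a, v_e, seed=1):
    """Toy additive model (assumption): no dominance, epistasis, common
    environment, or assortative mating; one offspring per mating pair."""
    rng = random.Random(seed)
    sires, dams, kids = [], [], []
    for _ in range(n_families):
        a_s = rng.gauss(0.0, v_a ** 0.5)
        a_d = rng.gauss(0.0, v_a ** 0.5)
        # Offspring breeding value: parental average plus Mendelian
        # sampling, whose variance is V_A / 2 in a non-inbred population.
        a_o = 0.5 * (a_s + a_d) + rng.gauss(0.0, (v_a / 2) ** 0.5)
        # Phenotype = breeding value + independent environmental deviation
        sires.append(a_s + rng.gauss(0.0, v_e ** 0.5))
        dams.append(a_d + rng.gauss(0.0, v_e ** 0.5))
        kids.append(a_o + rng.gauss(0.0, v_e ** 0.5))
    return sires, dams, kids


# True h^2 = 0.4 / (0.4 + 0.6) = 0.4
sires, dams, kids = simulate_families(20_000, v_a=0.4, v_e=0.6)
b_single = slope(sires, kids)                  # expected near h^2 / 2
mid = [(s + d) / 2 for s, d in zip(sires, dams)]
b_mid = slope(mid, kids)                       # expected near h^2
print(f"h2 from one parent: {2 * b_single:.3f}")
print(f"h2 from mid-parent: {b_mid:.3f}")
```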


Full and Half Sibs 

Full sibs share 0, 1, or 2 parental genes at each locus, 
with respective probabilities 1/4, 1/2, and 1/4, and half 
sibs 0 or 1, each with probability 1/2. It follows that 
the genetic covariance of full sibs equals $V_A/2 + V_D/4$ and of half sibs $V_A/4$ (plus some epistatic terms). Typically, however, full sibs also share a common environment, for example both pre- and postnatal in mammals, which contributes a variance, $V_C$, to the variance among families or covariance between family members. The environmental correlation is often called the $c^2$ term, where $c^2 = V_C/V_P$. Of course, there are designs in which $V_C$ is eliminated among full sibs (e.g., embryo transfer) and others where it is present among half sibs (e.g., in plants where maternal half sibs are the norm, and in animals where half sibs are raised together). Data from experiments or field trials are typically subjected to analysis of variance, and heritability is estimated from the intraclass correlation. Assuming there is no confounding, this correlation is an estimate of $h^2/2$ for full sibs and $h^2/4$ for half sibs. In mammals in which each male has several mates, both the full- and half-sib correlations can be estimated from the same experiment. The half-sib estimate is usually taken because it is less likely to be confounded by common environment and dominance, although it has a higher sampling error. As for the offspring-parent correlation, positive assortative mating can increase the correlation among sibs.
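A minimal sketch of the analysis-of-variance route for a balanced paternal half-sib design follows. It assumes equal family sizes, a purely additive model, each offspring having a different dam, and no common environment; `intraclass_correlation` and `simulate_half_sibs` are hypothetical helpers. The between-family variance component is $(MS_B - MS_W)/n$, and the half-sib estimate is $h^2 = 4t$.

```python
import random


def intraclass_correlation(groups):
    """One-way ANOVA intraclass correlation t = s2_b / (s2_b + s2_w).

    groups: list of families, each a list of n phenotypes (equal n assumed).
    s2_b = (MSB - MSW) / n is the between-family variance component and
    s2_w = MSW is the within-family component.
    """
    s = len(groups)
    n = len(groups[0])
    if any(len(g) != n for g in groups):
        raise ValueError("this sketch assumes equal family sizes")
    means = [sum(g) / n for g in groups]
    grand = sum(means) / s
    ssb = n * sum((m - grand) ** 2 for m in means)
    ssw = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    msb = ssb / (s - 1)
    msw = ssw / (s * (n - 1))
    s2_b = (msb - msw) / n
    return s2_b / (s2_b + msw)


def simulate_half_sibs(n_sires, n_per_sire, v_a, v_e, seed=2):
    """Toy paternal half-sib families (assumption): additive genes only."""
    rng = random.Random(seed)
    families = []
    for _ in range(n_sires):
        a_sire = rng.gauss(0.0, v_a ** 0.5)
        fam = []
        for _ in range(n_per_sire):
            a_dam = rng.gauss(0.0, v_a ** 0.5)   # a different dam each time
            a_kid = 0.5 * (a_sire + a_dam) + rng.gauss(0.0, (v_a / 2) ** 0.5)
            fam.append(a_kid + rng.gauss(0.0, v_e ** 0.5))
        families.append(fam)
    return families


fams = simulate_half_sibs(1000, 20, v_a=0.4, v_e=0.6)   # true h^2 = 0.4
t = intraclass_correlation(fams)
print(f"half-sib t = {t:.3f}, h2 = 4t = {4 * t:.3f}")
```

For full sibs the same function would give an estimate of $h^2/2$ only if $V_D$ and $V_C$ are negligible, which is why the half-sib value is usually preferred.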


Twins 

There are considerable problems in eliminating common environment effects for heritability estimation in man. The use of twins provides a route, specifically by comparing the correlations of identical (monozygotic, MZ) and nonidentical (dizygotic, DZ) twins. If the MZ correlation is assumed to equal $h^2 + c^2$ and the DZ correlation $h^2/2 + c^2$, then an estimate of heritability is $2[\mathrm{corr(MZ)} - \mathrm{corr(DZ)}]$. This is, however, biased upward by all nonadditive genetic effects (for dominance the MZ covariance includes $V_D$ and the DZ includes $V_D/4$) and epistasis, and by any extra similarity of environment that MZ share over DZ through their treatment or behavior.
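The size of the dominance bias can be made concrete with hypothetical values. This sketch applies the twin estimator $2[\mathrm{corr(MZ)} - \mathrm{corr(DZ)}]$ to correlations generated with a dominance share $d^2 = V_D/V_P$ that the simple model ignores; `falconer` is an illustrative helper name. Here the upward bias in $h^2$ is $2(d^2 - d^2/4) = 1.5\,d^2$, and $c^2$ is underestimated correspondingly.

```python
import math


def falconer(r_mz, r_dz):
    """Twin estimates assuming corr(MZ) = h2 + c2 and corr(DZ) = h2/2 + c2.

    Returns (h2, c2, e2). Dominance, epistasis, and extra MZ environmental
    similarity are not modeled, so they bias the results.
    """
    h2 = 2.0 * (r_mz - r_dz)
    c2 = r_mz - h2          # equivalently 2 * r_dz - r_mz
    e2 = 1.0 - r_mz
    return h2, c2, e2


# Hypothetical true values, including dominance the model leaves out.
h2_true, d2, c2_true = 0.4, 0.2, 0.1
r_mz = h2_true + d2 + c2_true          # MZ twins share all of V_D
r_dz = h2_true / 2 + d2 / 4 + c2_true  # DZ twins share V_D / 4
h2_hat, c2_hat, e2_hat = falconer(r_mz, r_dz)
print(f"estimated h2 = {h2_hat:.2f} (true {h2_true:.2f})")
print(f"estimated c2 = {c2_hat:.2f} (true {c2_true:.2f})")
```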


Combination of Information 

So as to make best use of information on all relatives, 
particularly from field data, sophisticated models 
and computer-intensive statistical methods using 
(restricted) maximum likelihood or Bayes’ theorem 
(via Gibbs sampling) are adopted. These incorporate 
correlations among all relatives, suitably weighted for 
relationship and numbers of records, and account for 
identifiable environmental differences such as farms 
or years of birth. Such methods are replacing simple 
regression or correlation analyses in many applica- 
tions because they are efficient, enable the precision 
of an estimate of heritability to be computed accur- 
ately, and enable successively more complicated 
models to be fitted and tested. Thus the shape of the 
likelihood curve describes the degree of support for a 
particular value of the heritability. In a Bayesian con- 
text, the posterior distribution of heritability fulfills a 
similar role. Also, for example, a likelihood ratio test 
can be used to check whether a nonadditive genetic 
component is important. 


Selection Response 
As the regression of offspring on mid-parent phenotype equals heritability and is linear or close to linear under polygenic inheritance (exactly linear under multivariate normality), the regression of the offspring of a group of individuals on the mean of their parents’ phenotype also equals $h^2$. Hence, if a group of individuals is selected whose mean phenotype differs from that of the population by an amount S, the selection differential, their offspring will be expected to deviate in performance from those of unselected parents by an amount $h^2S$. This is the selection response, given by $R = h^2S$, the classical prediction equation of quantitative genetics. Therefore, provided environmental change over generations can be eliminated, or corrected for by maintaining an unselected control population alongside the selected population, heritability can be estimated from the response to selection as Falconer’s ‘realized heritability,’ $h^2 = R/S$. If selection is practiced over several generations the heritability may not change much, in which case the (realized) heritability can be estimated from the regression of cumulative response over generations on the cumulative selection differential.
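A minimal sketch of both uses of the prediction equation: forward prediction of R from $h^2$ and S, and a realized heritability from a multigeneration experiment. The data are hypothetical, the responses are assumed to be deviations from an unselected control line, and fitting the regression through the origin is an assumption of this sketch (a model with an intercept is also common).

```python
def predicted_response(h2, s):
    """Classical prediction equation R = h^2 * S."""
    return h2 * s


def realized_h2(cum_s, cum_r):
    """Realized heritability as the slope of cumulative response on
    cumulative selection differential, fitted through the origin."""
    sxy = sum(x * y for x, y in zip(cum_s, cum_r))
    sxx = sum(x * x for x in cum_s)
    return sxy / sxx


# Hypothetical five-generation experiment (phenotypic units).
cum_s = [1.1, 2.0, 3.2, 4.1, 5.0]   # cumulative selection differential
cum_r = [0.35, 0.55, 1.0, 1.2, 1.5]  # cumulative response vs. control line
print(predicted_response(0.3, 1.0))
print(round(realized_h2(cum_s, cum_r), 3))  # about 0.30
```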


Uses 


Heritability tells us no more than the additive genetic 
variance and phenotypic variance do separately, but 
it is a useful summary and descriptive parameter. Just 
as the correlation among relatives can be used to esti- 
mate heritability, so the heritability can be used to 
predict the correlation of relatives. Prediction of the 
expected phenotype of offspring of selected individ- 
uals (equal to the breeding values of these individuals) 
and thus of selection response is probably the most 
important practical use of the heritability estimate. A 
comparison between the heritability predicted from 
collateral relatives such as half sibs and the realized 
heritability or selection response provides a check on 
quantitative genetics theory (whereas comparison 
of realized heritability and regression of offspring on 
parent does not, for they are based on the same 
principles). 
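Prediction in the reverse direction, from heritability to the resemblance of relatives, can be sketched with the additive relationship between relatives. The coefficients below are the standard values for non-inbred relatives; the prediction $r = a h^2$ covers only the additive share and ignores dominance, epistasis, common environment, and assortative mating, so it is a lower-bound-style sketch rather than a full model.

```python
from fractions import Fraction

# Additive relationship (coefficient of relationship) for common pairs of
# non-inbred relatives.
RELATIONSHIP = {
    "identical twins": Fraction(1),
    "parent-offspring": Fraction(1, 2),
    "full sibs": Fraction(1, 2),
    "half sibs": Fraction(1, 4),
    "first cousins": Fraction(1, 8),
}


def predicted_correlation(h2, relation):
    """Purely additive prediction of the phenotypic correlation, r = a * h^2."""
    return float(RELATIONSHIP[relation]) * h2


for rel in RELATIONSHIP:
    print(f"{rel:17s} {predicted_correlation(0.4, rel):.3f}")
```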


Discrete Traits 

Although primarily used for traits with continuous 
expression such as stature, heritability can also be 
applied to traits with discrete phenotypes. Traits 
with many categories such as litter size in pigs can be 
treated as continuous. Traits which have only two or so classes but no simple Mendelian expression, such as survival to weaning, incidence of twinning in man or cattle, or incidence of a congenital defect such as club foot, can also be analyzed, and there are alternative methods for doing so. One way is simply to regard the traits as having two values, say 1 (affected) and 0 (unaffected), and ignore any nonlinearity or heterogeneity of variance. More naturally within the quantitative genetics framework, the discrete (all-or-none) trait can be considered as the expression of an underlying continuous variable, the liability, such as level of circulating hormone or strength of immune reaction, with a threshold value above which affected individuals lie. Heritabilities on the all-or-none scale and on the underlying liability scale are functions of each other: the former is always lower, the difference widening the further the incidence of the trait departs from one-half. Methods were developed by Falconer to estimate heritability on the liability scale directly from the frequencies of the trait in the population as a whole and in the relatives of affected individuals, by analogy with a selection experiment in which the latter play the role of offspring in the next generation of selected (affected) individuals.
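The selection-experiment analogy can be written out with a standard normal liability. In this minimal sketch, the affected individuals are the "selected parents" whose selection differential is the mean liability of those above the threshold, $\varphi(x)/q$, and the shift of the relatives' threshold is the "response". The sketch uses Falconer's simple form without later refinements (such as correcting for the relatives' reduced variance) and adds the Dempster-Lerner conversion to show that the 0/1-scale value is always lower. The incidences are hypothetical, and the function names are illustrative.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # liability scaled to mean 0 and variance 1


def falconer_liability_h2(q_pop, q_rel, r):
    """Simple form of Falconer's estimate of heritability of liability.

    q_pop: incidence in the general population
    q_rel: incidence among relatives of affected individuals
    r:     coefficient of relationship (0.5 for first-degree relatives)
    """
    x_pop = STD_NORMAL.inv_cdf(1.0 - q_pop)  # threshold, population scale
    x_rel = STD_NORMAL.inv_cdf(1.0 - q_rel)  # threshold, relatives' scale
    a_pop = STD_NORMAL.pdf(x_pop) / q_pop    # mean liability of affected
    b = (x_pop - x_rel) / a_pop              # "response" / "selection"
    return b / r


def observed_scale_h2(h2_liability, incidence):
    """Dempster-Lerner conversion to the 0/1 scale:
    h2_01 = h2_L * z^2 / (p * (1 - p)), z = normal density at threshold."""
    z = STD_NORMAL.pdf(STD_NORMAL.inv_cdf(1.0 - incidence))
    return h2_liability * z * z / (incidence * (1.0 - incidence))


print(round(falconer_liability_h2(q_pop=0.01, q_rel=0.05, r=0.5), 3))
for p in (0.5, 0.1, 0.01):
    print(p, round(observed_scale_h2(0.5, p), 3))
```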


Some Misinterpretations 


There is much that the magnitude of the heritability does not tell us. For example, as it applies to individuals within populations, it cannot be used to predict genetic differences between races or other populations from phenotypic differences, whether or not they share the same environment.

The prediction formula $R = h^2S$ applies only (other
than in very special circumstances) if selection is prac- 
ticed on the trait on which response is measured. If 
selection is practiced on some trait or combination of 
traits other than the one of interest, the regression of 
response on selection differential is not therefore an 
unbiased estimate of heritability, but depends inter 
alia on the genetic and phenotypic correlations among 
the traits. This is a serious problem in inferences 
about selection in nature, where the actual selection 
applied is not known. Methods exist to overcome this 
problem, but require that records be available on all 
traits on which selection is practiced or to which fit- 
ness is related. 

As the heritability is a summary parameter over 
loci, it does not tell us about either the numbers of 
genes that affect a quantitative trait or the magnitude 
of their effects. Nor is it a constant: it changes as the population changes. But heritability is nevertheless a useful concept when properly used.
A hermaphrodite is an individual that possesses both
male and female gonads, theoretically capable of pro- 
ducing both sperm and ova. The situation is normal in 
some species of plants, can occur uncommonly in some 
amphibia, birds, and fish but only rarely in mammals, 
where it is usually associated with infertility. Humans 
with both testicular and ovarian tissue are usually 
described in the scientific literature as true herm- 
aphrodites, to distinguish them from male and female 
pseudohermaphrodites who may show sex reversal in 
the presence of testes and ovaries respectively. 
Various anatomical varieties of true hermaphrodit- 
ism are described. Lateral hermaphrodites have a testis 
on one side and an ovary on the other. Spermatogonia 
may be observed in the testis and oogonia in the ovary 
in lateral hermaphroditism. More commonly a compound gonad, or ovotestis, is present either unilaterally or bilaterally. Oogonia and developing oocytes
may be present in the ovarian part of the ovotestis but 
the testicular structure is usually devoid of spermato- 
gonia after puberty; in fact, degenerating oocytes are 
occasionally seen within testicular tubules. Differenti- 
ation of the internal genital ducts depends on the 
nature of the ipsilateral gonad. An ovary is always 
associated with a normal fallopian tube and at least 
partial development of the uterus and absence of the 
Wolffian ducts on the same side. In lateral hermaphro- 
ditism this results in a unicornuate uterus and tube 


associated with the ovary, and a vas deferens, seminal 
vesicle, and regression of the uterus and tube on the 
side of the testis. An ovotestis is usually associated 
with development of the Mullerian ducts and regres- 
sion of the Wolffian ducts. In all types of true herm- 
aphroditism, the presence of testicular tissue leads to 
ambiguity of the external genitalia with posterior 
fusion of the labial folds and clitoral enlargement. At 
puberty, there is breast development with the forma- 
tion of both glandular and ductal components and 
menstruation may occur. 

In most patients with true hermaphroditism no 
cause can be found and the chromosome constitution 
is indistinguishable from that of a normal female. A 
small number of cases are described with mosaicism for 
XXY and XX cells. In rare cases there is true chimerism 
in which both normal 46, XY (male) cells and 46, XX 
(female) cells coexist in the same individual. A double 
contribution of alleles from each parent at a number of 
genetic loci confirms an origin from the fusion of two 
fertilized eggs, or the double fertilization of a diploid 
egg. In equally rare cases, the condition is due to abnor- 
mal recombination between the X and the Y during 
paternal meiosis whereby the sex-determining region 
of the Y is transferred to the end of the short arm of 
the X. It is presumed that random X inactivation leads 
to the development of testis-inducing and ovary- 
inducing populations of cells in the early embryo, a 
situation analogous to XX/XY chimerism. It is note- 
worthy that experimental XX/XY chimerism in mice, 
produced either by blastocyst fusion or by injection of 
donor embryonic stem cells into the recipient blasto- 
cyst, may lead to hermaphroditic phenotypes identical 
to those found in true hermaphroditism in humans. It is 
also of interest that most examples of XX/XY chimer- 
ism in mice are associated with an unambiguous male 
phenotype. X-Y interchange in humans also most often leads to a male phenotype, namely infertile so-called XX males with features of Klinefelter syndrome (see Klinefelter Syndrome). Very rarely, XX males
and XX true hermaphrodites have been identified in 
the same pedigree; the cause is so far unexplained. 


See also: Chimera; Intersex; Klinefelter 
Syndrome; Sex Reversal 
Alfred Day Hershey (1908-97), an American geneticist, was born 4 December 1908 in Owosso, Michigan, and received his BS (1930) and PhD (1934) from
Michigan State College (East Lansing). He was a 
faculty member in the Department of Bacteriology at 
Washington University (St Louis) from 1934 to 1950, 
when he joined the Department of Genetics of the 
Carnegie Institution of Washington at Cold Spring 
Harbor, New York. His research focused on the 
genetics of bacteria and bacteriophages and he made 
important contributions to the understanding of the 
nature of genes, their replication and recombination. 
Among many honors, he received the Nobel Prize in 
Physiology or Medicine in 1969, sharing it with Max 
Delbrück and Salvador Luria. He died 22 May 1997. 

Hershey’s early research at Washington University 
was carried out in collaboration with Jacques 
Bronfenbrenner, a well-known immunologist and 
early bacteriophage worker. They studied the meta- 
bolism of bacteria before and after phage infection. 
In 1943 Hershey, Delbrück, and Luria initiated a series 
of periodic meetings to discuss their mutual interests 
in bacteriophage biology, an event which is often 
viewed as the start of the research school now 
known as the “American Phage Group.” 

In his work during the 1940s, Hershey developed 
the bacteriophage T2 as a genetic organism. He found 
both host-range and plaque-morphology mutants and 
showed that coinfection with two different parental 
phage allowed detection of genetic recombination in 
bacteriophage. Through this work, he showed that T2 
phage was an ideal organism to study basic genetic 
mechanisms. One class of his plaque-morphology 
mutants turned out to be an unusual type of host- 
range mutant as well, the rapid-lysis (r) mutants. An- 
alysis of the rII locus in T-even phage provided deep 
insight into the nature of the gene and the genetic code. 

Study of the process of phage infection and multi- 
plication led Hershey to devise methods to interrupt 
phage infection by hydrodynamic shearing of the 
bacterium-bacteriophage complex. With this technique, Hershey and his collaborator Martha Chase carried out their most famous work, an experiment that came to be known as the ‘Hershey-Chase
Experiment.’ (Because they used a common food 
blender to shear the bacterial culture, the experiment 
is also called the ‘Blender Experiment.’) Using newly 
available radioactive tracers for metabolic labeling of 
the protein (³⁵S) and nucleic acid (³²P) components of
phage T2, they sheared the phage-infected complexes 
after a time when shearing would not prevent intra- 
cellular phage production. They found that the pro- 
tein and nucleic acid components of the phage 
dissociated upon infection, with most of the protein 
remaining susceptible to removal by shearing while 
most of the nucleic acid had entered the bacterial cell 
and was thus protected from the external shear forces. 
The interpretation they cautiously presented was that
the proteinaceous phage coat remained outside the 
cell, while the DNA was injected into the cell. This 
result was immediately taken as confirmation that the 
DNA was the substance which was associated with the 
genetic continuity of the phage and that the protein 
coat was merely a transport vehicle. This experiment is 
usually described in idealized terms, although the 
actual data presented by Hershey and Chase certainly 
allowed for some possible protein to accompany the 
DNA into the cell. 

In the 1960s Hershey turned his attention to the 
lysogenic phage lambda and devised simple yet elegant 
approaches to study the physical states of the lambda 
DNA. He pioneered methods for dealing with large 
DNA molecules, which are highly sensitive to break- 
age by shear forces in solutions. His methods for 
DNA extraction (phenol) and zone sedimentation (in 
sucrose gradients) allowed him to show that lambda 
DNA existed in both linear and circular forms, and that 
it has unpaired (presumably complementary) cohesive 
termini. This work was seminal in developing our 
current understanding of lysogeny as well as in the 
applications of lambda bacteriophage in recombinant 
DNA technologies. 


See also: Bacteriophages; Delbrück, Max;
Luria, Salvador 
Heteroalleles are alternative mutant forms of a given
gene resident at the same locus. 


Heteroallelic Complementation 


A heteroallelic diploid is characteristically mutant in 
phenotype. A heteroallelic diploid that has wild-type 
or quasi wild-type phenotype is said to manifest inter- 
allelic (or intragenic) complementation. Such comple- 
mentation often reflects either a multimeric state of 
the functional protein product of that gene or two or 
more domains within the protein manifesting more 
or less independent functions. 


Heteroallelic Recombination 


When the altered nucleotide sequences defining the 
two heteroalleles are not overlapping, interallelic 


(intragenic) recombination can generate the wild-type 
as well as the doubly mutant allele. When genes are 
small (intron-free), recombination between heteroal- 
leles usually occurs by gene conversion. 


History 


In the 1940s and 1950s, demonstrations of interallelic 
complementation and recombination strained the 
classical definition of a gene. Complementation be- 
tween mutations is a classical demonstration that two 
mutations are in separate genes, defined as units of 
function. However, understanding of quaternary
protein structure soon rationalized the exceptional 
cases of heteroallelic complementation. Recombin- 
ation between mutants is a classical demonstration 
that the two mutations are in separate genes, defined 
as units of recombination. However, analysis of the rII
gene of bacteriophage T4 combined with the Watson- 
Crick hypothesis for DNA structure established the 
modern view that a gene is a segment of a continuous 
DNA duplex with recombination possible between 
any pair of adjacent nucleotides (Benzer, 1955). 


Reference 

Benzer S (1955) Fine structure of a genetic region in bacterio- 
phage. Proceedings of the National Academy of Sciences, USA 
41: 344-354. 


See also: Complementation Test; 
Gene Conversion 
Heterochromatin was originally defined by Heitz in
1928 as chromosome segments that failed to decon- 
dense at the end of telophase, but which remained 
condensed throughout interphase, and which appeared 
as condensed segments at the following prophase, that 
is, it showed positive heteropyknosis. Subsequently, it 
was realized that there is more than one class of hetero- 
chromatin. ‘Constitutive heterochromatin’ is found at 
virtually all stages of an organism’s life cycle, in the 
same place on both of a pair of homologs, can be stained 
by specific methods, and generally contains distinctive 
types of DNA. ‘Facultative heterochromatin’, on the
other hand, only occurs in one of a pair of homologs, 
cannot generally be stained distinctively, and necessar- 
ily contains the same type of DNA as that found in 
the nonheterochromatic homolog. The best-known 


example of the latter is the inactive X chromosome of 
female mammals. 


Constitutive Heterochromatin 


Constitutive heterochromatin is most easily demon- 
strated using C-banding; a variety of other chromo- 
some banding methods produce specific staining of 
certain heterochromatic regions of chromosomes in 
certain species. Characteristically, constitutive hetero- 
chromatin consists largely of highly repetitive (‘satel- 
lite’) DNA, although blocks of heterochromatin 
may not necessarily consist exclusively of such 
DNA, and in some species moderately repetitive 
rather than highly repetitive DNA seems to be pres- 
ent. The DNA of constitutive heterochromatin is 
late-replicating, and in mammals, its cytosines are 
often methylated. A number of proteins have been 
described that are either specific to, or concentrated 
in, constitutive heterochromatin; such proteins may 
well be involved in the condensed state of heterochro- 
matin. Heterochromatin has generally been regarded 
as genetically inert. The quantity in the genome can 
vary extensively without any apparent phenotypic 
effects. In Drosophila it is not replicated during poly- 
tenization of chromosomes, and in certain other 
organisms heterochromatin is eliminated in somatic 
cells, and retained only in the germline. The highly 
repetitive DNA sequences found in most heterochromatin could not code for proteins. Nevertheless, constitutive heterochromatin is not without
effects. It can have profound effects on the position 
and number of chiasmata at meiosis; induce the inacti- 
vation of genes close to it (position-effect variega- 
tion); and in Drosophila can contain Y-chromosome 
fertility factors, factors involved in pairing and dis- 
junction of achiasmate chromosomes, and certain 
other unconventional genetic factors such as Respon- 
der and ABO. The genetics of few organisms have 
been studied as intensively as that of Drosophila, and 
it may yet turn out that constitutive heterochromatin 
in many species contains nonconventional factors. 


Facultative Heterochromatin 


The best-known example of facultative hetero- 
chromatin is the inactive X chromosome of female 
mammals, in which one of the X chromosomes is 
permanently inactivated early in development, appar- 
ently as a means of dosage compensation, so that the 
amount of X-chromosome gene products produced is 
similar in males (with only one X) and in females (with 
two X chromosomes). (It should be noted that in 
birds, with an independently evolved ZW/ZZ sex 
chromosome system, there appears to be no dosage 
compensation, and no facultative heterochromatin, 
while in Drosophila dosage compensation is achieved
by increased transcription from the single X chromo- 
some in males.) Like constitutive heterochromatin, 
the facultative heterochromatin of the mammalian 
inactive X is late-replicating, and its DNA is more 
methylated than that of its euchromatic homolog; 
however, the inactive X cannot be stained distinctively 
by chromosome banding techniques. 

The other reasonably well-known system of facul- 
tative heterochromatin occurs in the mealybugs. In the 
males of this insect, the entire paternal set of chromo- 
somes becomes heterochromatinized, although this 
does not appear to be related to sex determination. 
In somatic cells, the heterochromatin replicates less 
than the euchromatin, while in male meiosis, two 
wholly heterochromatic and two wholly euchromatic 
nuclei form, of which only the two latter develop into 
spermatozoa. 


Heterochromatin: Substance or State? 


In the past, it was argued whether heterochromatin 
was a substance or a state. We can now answer that 
question. Constitutive heterochromatin is evidently a 
substance, since it consists of specific DNA fractions 
combined with specific proteins. Conversely, faculta- 
tive heterochromatin is evidently a state, as its DNA 
sequence is identical to that of its euchromatic homo- 
log, and in rare cases its heterochromatinization is 
reversible. Euchromatin inactivated as a result of pos- 
ition-effect variegation, when the inactivation spreads 
from an adjacent region of constitutive heterochro- 
matin, is clearly also a state of chromatin. Neverthe- 
less, there are occasional systems in which typical 
constitutive heterochromatin becomes decondensed, 
for example in the early stages of development in 
Drosophila, when the rate of division is very high, 
and there may perhaps be no time to condense the 
heterochromatin. In spite of these exceptions, it is 
still useful to make the distinction between consti- 
tutive and facultative heterochromatin. 


See also: Chromosome Banding; Heteropyknosis; 
Position Effects; X-Chromosome Inactivation 
The term heterochronic is derived from the Greek heteros, meaning other or different, and khronos, meaning time. Thus, a heterochronic mutation is a
mutation that alters the relative timing of events as 
an organism develops. Heterochronic mutations have 
been identified in many organisms; among the best 
studied are certain cell lineage mutants of the nema- 
tode Caenorhabditis elegans. 


Developmental Timing in Caenorhabditis 
elegans 


Genetic analysis has been used to study the temporal 
progression of pattern formation during postembry- 
onic development in C. elegans. The heterochronic 
mutations identified in these studies alter the timing 
of certain stage-specific postembryonic developmental 
events relative to other unaffected events. One of the 
events studied is the terminal differentiation of lateral 
epidermal cells (called hypodermal cells in C. elegans), 
which is illustrated in Figure 1A and B. The nematode
hatches from an egg and develops through four larval 
stages (L1 to L4) in the process of reaching adulthood 
(A). During the first three larval molts in wild-type 
animals, these lateral hypodermal cells divide and 
synthesize a larval-type cuticle (Figure 1B). During 
the final molt they terminally differentiate; they do not 
divide and they synthesize an adult-type cuticle con- 
taining a set of ridges, termed adult alae, that extend 
along the lateral length of the animal. Heterochronic 
mutations have been identified that cause hypodermal 
cell terminal differentiation to occur too early or too 
late relative to the properly timed gonadal develop- 
ment when compared with wild-type animals. The 
genes defined by these mutations, the heterochronic 
genes, have been analyzed in considerable detail. 

Inactivation of the heterochronic gene lin-14 (lin-14(0)) results in the precocious execution of hypodermal cell terminal differentiation during the L3 molt (Figure 1B). Conversely, a gain-of-function mutation
in lin-14 (lin-14(gf)), which causes inappropriately 
high levels of lin-14 activity at late developmental 
times, results in a ‘retarded’ phenotype, i.e., the 
indefinite delay of hypodermal cell terminal differen- 
tiation. These animals execute a larval-type develop- 
mental program during the fourth molt, and this 
program is repeated during extra molting cycles not 
observed in wild-type animals. 

The biological basis of the altered time of hypodermal cell differentiation in lin-14 mutants has been
traced to cell lineage defects. Mutations in lin-14 
alter the time at which certain stage-specific cell div- 
ision patterns occur. The wild-type hypodermal cell 
division pattern of each stage is denoted as S1-S4, 
with A representing the terminally differentiated 
adult state (Figure 1B). During the L1 stage, the hypodermal cell V6 divides once, the S1 pattern. During the
L2 stage, a double division is executed, the S2 pattern, 


and so on until terminal differentiation occurs in adults 
(A). In lin-14(0) animals, the S1 pattern is deleted and 
the remaining patterns are each executed one stage 
early: S2→S3→S4→A. The net result of this temporal transformation in cell fate is that terminal differentiation occurs during the third, rather than the fourth, molt. In lin-14(gf) mutants, the S1 pattern is reiterated indefinitely. This interpretation of the lin-14(gf) defect
is best illustrated by examining the lineage of a tail 
hypodermal cell (T, Figure 1B). The S1 T cell division 
pattern is characterized by seven cell divisions and one 
programmed cell death (x). The S2 pattern is much 
simpler and consists of a single cell executing a double 
division. Loss of /in-14 activity results in this double 
division during the L1 stage, while in the presence of 
extra /in-14 activity, the cell that normally divides in the 
L2 stage still divides, but instead of undergoing the 
simple S2 division pattern it behaves like its grandpar- 
ent and executes the complex S1 pattern. 
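The stage-by-stage timing shifts described above can be summarized as a toy encoding of the V6 division program. This is a minimal Python sketch of the phenotypes only, not a model of the underlying regulation; the function names, the genotype strings, and the convention that the list position of 'A' equals the molt at which terminal differentiation occurs are illustrative assumptions.

```python
from typing import List, Optional

# Wild-type stage-specific division patterns (illustrative encoding):
# index 0 = L1, 1 = L2, 2 = L3, 3 = L4, 4 = adult (terminal differentiation).
WILD_TYPE = ["S1", "S2", "S3", "S4", "A"]


def lineage_program(genotype: str, n_stages: int = 5) -> List[str]:
    """Return the division pattern executed at each successive stage.

    Hypothetical encoding of the text: lin-14(0) deletes S1 and runs the
    remaining patterns one stage early; lin-14(gf) reiterates S1 indefinitely.
    """
    if genotype == "wild-type":
        program = WILD_TYPE
    elif genotype == "lin-14(0)":
        program = WILD_TYPE[1:]  # S2, S3, S4, A: each one stage early
    elif genotype == "lin-14(gf)":
        return ["S1"] * n_stages  # the adult program is never reached
    else:
        raise ValueError(f"unknown genotype: {genotype}")
    # Once terminal differentiation (A) is reached, it is maintained.
    padded = program + ["A"] * max(0, n_stages - len(program))
    return padded[:n_stages]


def terminal_molt(program: List[str]) -> Optional[int]:
    """Molt at which terminal differentiation (A) occurs, or None if never."""
    return program.index("A") if "A" in program else None


for g in ("wild-type", "lin-14(0)", "lin-14(gf)"):
    prog = lineage_program(g)
    print(g, prog, "terminal differentiation at molt:", terminal_molt(prog))
```

With this encoding, wild-type animals reach the adult pattern at the fourth molt, lin-14(0) animals at the third, and lin-14(gf) animals never.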

Other identified heterochronic genes in C. elegans 
include lin-4, lin-28, lin-29, lin-42, and daf-12. These 
genes are each also required for the correct temporal 
patterning of the lateral hypodermis and mutations in 
these genes cause cells to express developmental pro- 
grams that are normally reserved for a different stage. 
Lateral hypodermal cell lineage patterns for lin-4, 
lin-28, and lin-29 mutants are summarized in Figure
1C. As for lin-14, loss of lin-28 activity results in
precocious execution of hypodermal terminal differ- 
entiation; however, S1 patterns are executed normally 
and the S2 pattern is omitted. In contrast, loss of lin-4 
or lin-29 activity results in a retarded phenotype, 
although the cell lineage defects caused by these muta- 
tions differ. Genetic analysis has demonstrated that 
lin-4 is a negative regulator of lin-14 and lin-28 and 
that these genes in turn negatively regulate lin-29. 
lin-29 activity triggers the switch to the adult pro- 
gram; in its absence, larval cell division patterns are 
observed during the fourth and subsequent molts. 


Molecular Analysis of Heterochronic Genes


The opposite phenotypes exhibited by gain-of- 
function and loss-of-function lin-14 alleles reflect the 
key role that lin-14 plays in the heterochronic gene 
pathway. Molecular analysis of lin-14 has revealed that 
it encodes a nuclear protein (LIN-14) that accumu- 
lates in hypodermal cells of newly hatched L1 larvae 
and decreases to an undetectable level by the early L2. 
This disappearance of LIN-14 is required for the 
switch from the S1 to the S2 cell division pattern. In 
lin-14(gf) mutants, LIN-14 remains present in the 
hypodermis throughout development and the S1 
pattern is reiterated. The normal disappearance of LIN-14 protein in young L2 larvae requires wild-type lin-4 activity. In lin-4 mutants, LIN-14 remains inappropriately high, again resulting in reiteration of S1 patterns. The functional lin-4 product is not a protein, but rather a small RNA molecule with antisense complementarity to sequences present in the 3’ untranslated region (UTR) of the lin-14 mRNA. These complementary sequences are deleted in lin-14(gf) mutants, rendering the mutant lin-14 mRNAs insensitive to lin-4 activity and preventing down-regulation of LIN-14 levels.

Figure 1 Illustration of phenotypes resulting from heterochronic gene mutations in C. elegans. (A) A schematic L1 stage larva is shown indicating the positions of the left lateral hypodermal blast cells. This pattern is repeated on the right lateral side of the animal. (B) The cell lineages of the V6 and T cells are shown for wild-type animals and for lin-14 null (0) and gain-of-function (gf) mutants. The vertical axis indicates developmental time, showing the four larval stages and the adult stage. The marks on the vertical axis indicate the molts. In the lineage diagrams, vertical lines indicate cells and horizontal lines indicate cell divisions. The triple horizontal bars indicate terminal differentiation and synthesis of the adult cuticular ridges termed alae. V1-V4 lineage patterns resemble the V6 lineage and the remaining hypodermal blast cell lineage patterns contain slight variations. Arrows indicate that the division pattern is repeated through additional molting cycles not observed in wild-type animals. Cells that undergo a programmed cell death are indicated with an ‘X’. S1-S4 and A are used to denote the stage-specific division patterns in wild-type animals. (C) The cell division patterns defined in (B) are used to summarize the phenotypes of heterochronic mutants lin-4, lin-14, lin-28, and lin-29.

lin-28 encodes a cytoplasmic protein with RNA-binding motifs and is also downregulated through a lin-4-complementary site within its 3’ UTR. The disappearance of the lin-14 and lin-28 gene products during early larval stages ultimately allows accumulation of LIN-29 in hypodermal cells during the L4 larval stage. lin-29 encodes a transcription factor with five Cys2-His2 type zinc finger motifs and triggers the switch to the adult program by regulating the expression of other genes, including stage-specifically expressed cuticle collagen genes.
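The antisense relationship between lin-4 RNA and the 3’ UTRs of lin-14 and lin-28 can be illustrated with a short search for complementary sites. This is a minimal sketch under loud assumptions: the sequences are hypothetical toy strings (not the real lin-4 RNA or lin-14 UTR), and it looks only for perfect complementarity, whereas natural sites are generally only partially complementary. It shows why deleting the sites, as in lin-14(gf), leaves nothing for lin-4 to act on.

```python
from typing import List

# Watson-Crick partners in RNA.
PAIR = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna: str) -> str:
    """Sequence that would pair, antiparallel, with `rna` (both read 5'->3')."""
    return "".join(PAIR[base] for base in reversed(rna))


def complementary_sites(small_rna: str, utr: str) -> List[int]:
    """0-based start positions in `utr` perfectly antisense to `small_rna`."""
    site = reverse_complement(small_rna)
    return [i for i in range(len(utr) - len(site) + 1)
            if utr[i:i + len(site)] == site]


# Hypothetical toy sequences, NOT the real lin-4 RNA or lin-14 3' UTR.
small_rna = "UCCCUGAG"
site = reverse_complement(small_rna)            # "CUCAGGGA"
wild_type_utr = "AAAA" + site + "GGUU" + site + "AA"
gain_of_function_utr = "AAAA" + "GGUU" + "AA"   # sites deleted, as in lin-14(gf)

print(complementary_sites(small_rna, wild_type_utr))         # [4, 16]
print(complementary_sites(small_rna, gain_of_function_utr))  # []
```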


Coordination of Developmental Time Throughout the Organism


Cell division defects in lin-29 mutants are limited to the hypodermis. Thus lin-29 is a downstream effector of the timing genes in a specific cell type. In contrast, the upstream genes in the heterochronic pathway, lin-4, lin-14, and lin-28, are more global temporal regulators. In addition to controlling stage-specific division patterns in the hypodermis, they also regulate temporal patterning in several other cell types, including muscle, neurons, and intestine. These genes act in the temporal coordination of developmental events throughout the organism, presumably by controlling genes with effector functions analogous to that of lin-29.


Developmental Timing Mutants in Other Organisms


The molecular mechanisms that control the timing of 
developmental events in other organisms are also 
being elucidated. Mutations have been identified in 
several organisms that cause alterations in the time of 
onset of certain developmental events and define genes 
with roles in the temporal progression of patterning. 
As with the heterochronic gene mutations in
C. elegans, mutations in these genes either advance 
or retard the expression of specific developmental 
programs. For example, in Dictyostelium, mutations 
in rde cause premature terminal differentiation of stalk 
and spore cells, and in maize, mutations in the 
Teopod1, Teopod2, and Teopod3 genes retard the tran- 
sition between the expression of juvenile and adult 
characteristics in shoot development, while mutations 
in glossy15 cause premature expression of adult char- 
acteristics. In Drosophila, mutation of the ana gene 
causes certain neuroblasts to proliferate too early. 
Finally, one example of a developmental timing 
abnormality described in humans is altered time of 
onset of puberty. Puberty, or sexual maturation, is a 
developmental event that is normally timed to occur 
in the early teenage years, triggered by the synthesis of 
hormones which must be produced and function at 
the correct developmental time. Individuals have been described in whom puberty is triggered at the wrong time, resulting in premature or delayed puberty. A variety of molecular defects can cause these conditions. In males, precocious puberty can be caused by a dominant gain-of-function mutation in the luteinizing hormone receptor. Luteinizing hormone (LH) binds this receptor, causing specific cells in the testes to synthesize testosterone, thus triggering sexual maturation. The receptor mutation causes the receptor to
behave as if LH is present when it is not and testoster- 
one is produced abnormally early, leading to preco- 
cious sexual maturity. Conversely, individuals with an 
inactive LH receptor fail to undergo sexual matur- 
ation at puberty, an abnormality that may be inter- 
preted as retarded expression of the juvenile program. 


Relationship to Heterochrony 


The term heterochrony is usually applied in an evolu- 
tionary context, referring to a change in the timing of a 
developmental event in an organism relative to when 
that event occurred in its ancestors. Naturally occur- 
ring heterochronic mutations analogous to those 
described here could, if stably incorporated into 
a population, result in heterochrony and provide 
a mechanism for evolutionary variation between 
species. 


Further Reading 

Ambros V (1997) Heterochronic genes. In: C. elegans II, pp. 501-518. Plainview, NY: Cold Spring Harbor Laboratory Press.

Slack F and Ruvkun G (1997) Temporal pattern formation by heterochronic genes. Annual Review of Genetics 31: 611-634.


See also: Caenorhabditis elegans; Cell Division in 
Caenorhabditis elegans 


Heterochrony 


See: Neoteny 


Heteroduplexes 
Hybrid DNA is formed from complementary single
DNA strands from two different parental molecules. 
The parental molecules must be homologous with 
each other, that is, they have the same sequence of 
base pairs overall. This does not exclude the pos- 
sibility that there are allelic differences between the 
parental molecules, in which case, there will be mis- 
matched base pairs within the hybrid molecule. 
Hybrid DNA with such mismatches is called 
heteroduplex DNA. The term heteroduplex is also 
sometimes used to mean hybrid DNA, whether or 
not it contains a mismatch. A mismatched base pair is a pair of bases in complementary nucleotide chains that cannot form the correct hydrogen bonds with each other, even though each base is itself chemically normal. Mismatches distort the DNA molecule, often with the bases swinging into a position outside the double helix (extrahelical bases).
Mismatches also occur as single nucleotides or 
short deletions and insertions, forming loops of 
unpaired single strands. Substantial heterologies (nonhomologous sequences) can be incorporated into heteroduplex, in which case there will be a large unpaired loop.

Figure 1 Resolution of mismatched base pairs in heteroduplex DNA. (A) Resolution by replication. When the replication fork passes a mismatch, the two chains are separated and replicated faithfully, so that each daughter molecule is now of one genotype or the other. (B and C) Resolution by mismatch repair. The mismatch is recognized and excised on one strand or the other. Copying the remaining strand restores a homoduplex.

Evidence for heteroduplex was described in 1952 
based on the occurrence of bacteriophage bursts, 
derived from a single phage particle, that were found 
to contain the genotype of both parents. This is inter- 
preted as each DNA strand having carried one geno- 
type. When the heteroduplex is replicated, each 
strand is copied faithfully and the first two progeny 
each have one of the two genotypes (see Figure 1A).
Not long after heteroduplex was first described, it 
was detected in meiotic tetrads of spores in several 
different fungi. These fungi have eight spores derived 
mitotically from the four meiotic products. Mitotic 
spore pairs were seen that differed in genotype 
from each other. Whereas other pairs of alleles had 
segregated from each other during meiosis, these 
mixed spore pairs were evidence that segregation 
could also occur during the following mitosis. This 
phenomenon is therefore known as postmeiotic
segregation. These observations gave rise to the idea 
that recombination proceeded by the formation 
of hybrid molecules joined by complementary base 
pairing. Those mismatches that do not show postmeiotic segregation have been resolved to homozygosity by a mismatch repair system (see Figure 1B and C).
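Resolution by replication, the basis of postmeiotic segregation described above, can be sketched by treating a duplex as two aligned strings. This is a minimal illustration under stated assumptions: the allele sequences are hypothetical, and the bottom strand is written position by position under the top strand (3'→5') so that pairing can be checked column by column.

```python
from typing import List, Tuple

PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}

# A duplex is stored as (top, bottom), with the bottom strand written
# position by position under the top strand (i.e. 3'->5').
Duplex = Tuple[str, str]


def complement(strand: str) -> str:
    """Position-wise complement, giving the partner strand of a homoduplex."""
    return "".join(PAIR[base] for base in strand)


def mismatches(duplex: Duplex) -> List[int]:
    """Positions at which the two strands do not form a Watson-Crick pair."""
    top, bottom = duplex
    return [i for i, (t, b) in enumerate(zip(top, bottom)) if PAIR[t] != b]


def replicate(duplex: Duplex) -> Tuple[Duplex, Duplex]:
    """Semiconservative replication: each old strand templates a new partner."""
    top, bottom = duplex
    return (top, complement(top)), (complement(bottom), bottom)


# Hypothetical alleles differing at one position (C in parent A, T in parent B).
allele_a = "ATGGCATTC"
allele_b = "ATGGTATTC"
hetero = (allele_a, complement(allele_b))  # top strand from A, bottom from B

print(mismatches(hetero))                  # [4]
daughter_1, daughter_2 = replicate(hetero)
print(daughter_1[0], daughter_2[0])        # ATGGCATTC ATGGTATTC
```

Each daughter is a mismatch-free homoduplex of one parental genotype, which is why an unrepaired heteroduplex yields a mixed burst or a mixed spore pair.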


Natural Occurrence of Heteroduplex 


Ideas on how heteroduplex DNA is formed during the 
process of recombination are discussed in detail else- 
where (Recombination, Models of). Although single 
strands of DNA can anneal spontaneously and quite 
rapidly, in vivo the process is catalyzed by a class of 
proteins of which RecA from Escherichia coli is the 
best-known example. Eukaryotic homologs of RecA 
are known as Rad51, after a RecA homolog found in 
Saccharomyces cerevisiae. These proteins can also cata- 
lyze the invasion of a duplex by a single strand and, 
once the reaction has begun, the reciprocal exchange of 
strands between two duplex molecules. This generates 
hybrid DNA reciprocally on two DNA molecules. 


Making Heteroduplex in the Laboratory 


Heteroduplex is also generated in the laboratory for 
use in experiments on mismatch repair mechanisms. 
This is readily done by use of certain bacteriophage 
DNA that occurs both as duplex DNA while growing 
in the infected cell, and as single strands in the mature 
viral particles. Separation of the strands of the duplex 
and reannealing with an excess of single strand DNA 
from phage of a different genotype yields hetero- 
duplex. Another method is available for use with 
bacteriophage lambda, which has two strands of 
different density. The separated linear single strands
can be isolated individually by density gradient cen- 
trifugation and then annealed with complementary 
single strands of a different genotype producing 
heteroduplex molecules. 


Mismatch Repair 


The best known mismatch repair system is the Mut 
system of E. coli. Homologous systems are found 
in eukaryotes. It is called Mut because mutations in 
this system cause cells to have a mutator phenotype. 
This is because the mismatch repair system acts 
on mismatches generated by replication errors, as 
well as those occurring in heteroduplex DNA. 
The mismatch repair system acts on mismatches in 
heteroduplex at two different levels. It prevents the 
formation of heteroduplex between molecules that 
have substantial divergence in their sequence, as 
would be encountered in interspecific crosses. In 
intraspecific heteroduplex, where mismatches are 
few, the mismatch repair system recognizes the mis- 
match and excises one strand over a distance of a few 
hundred base pairs. The resulting gap is then filled 
by DNA synthesis that copies the remaining strand. 
This results in homoduplex of one genotype or the 
other (see Figure 1B and C). Mismatch repair of
heteroduplex is the major mechanism of gene con- 
version. 
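The excise-and-resynthesize step of mismatch repair, and why it produces gene conversion, can be shown with a toy function. This is a minimal sketch, not the Mut system's actual mechanism: the function `repair`, the hypothetical alleles, and the three-base patch are assumptions (real excision tracts span a few hundred base pairs), and which strand gets excised is simply passed in rather than modeled.

```python
from typing import Tuple

PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}


def complement(strand: str) -> str:
    """Position-wise complement (bottom strand written under the top strand)."""
    return "".join(PAIR[base] for base in strand)


def repair(top: str, bottom: str, mismatch: int,
           excise: str = "top", patch: int = 3) -> Tuple[str, str]:
    """Toy mismatch repair: remove up to `patch` bases on each side of the
    mismatch from one strand, then fill the gap by copying the other strand."""
    lo = max(0, mismatch - patch)
    hi = min(len(top), mismatch + patch + 1)
    if excise == "top":
        return top[:lo] + complement(bottom[lo:hi]) + top[hi:], bottom
    if excise == "bottom":
        return top, bottom[:lo] + complement(top[lo:hi]) + bottom[hi:]
    raise ValueError("excise must be 'top' or 'bottom'")


# Hypothetical heteroduplex: top strand from allele A, bottom from allele B.
allele_a, allele_b = "ATGGCATTC", "ATGGTATTC"
top, bottom = allele_a, complement(allele_b)

print(repair(top, bottom, 4, excise="top")[0])     # ATGGTATTC (converted to B)
print(repair(top, bottom, 4, excise="bottom")[1])  # TACCGTAAG (pairs with A)
```

Either choice leaves a homoduplex, but of opposite genotypes, matching panels B and C of the figure.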

Different mismatches are recognized by the 
mismatch repair system with different efficiency. The 
frequency of DNA-mediated transformation in 
pneumococcus varies with the efficiency of mismatch 
repair. Mismatches that are readily recognized are 
excised from the donor strand so that incorporation 
into the genome is rare, while those that escape recog- 
nition are incorporated very frequently. This obser- 
vation was interpreted as showing the effects of 
mismatch correction as early as 1966. The C-C base 
pair is poorly recognized in several organisms. These 
differences underlie many marker effects, that is, 
situations in which the nature of the heterozygosity 
present in a cross has an effect on the outcome of the 
experiment. 


Further Reading 

Ephrussi-Taylor H and Gray TC (1966) Genetic studies of recombining DNA in pneumococcal transformation. Journal of General Physiology 49 (suppl.): 211-231.

Hershey AD and Chase M (1952) Genetic recombination and 
heterozygosis in bacteriophage. Cold Spring Harbor Symposia 
on Quantitative Biology 16: 471-479. 


See also: Marker Effect; Mismatch Repair (Long/ 
Short Patch); Recombination, Models of 


Heterogenote 
‘Heterogenote’ is a term meaning the same as hetero-
zygote, viz., a diploid organism having different alleles 
for one or more genes that therefore produces dif- 
ferent gametes. 


See also: Heterozygote and Heterozygosis 


Heterokaryon 
A heterokaryon is a cell containing two or more nuclei of different origin or in different states in a common cytoplasm. Examples include: (1) a mouse and a human nucleus as separate and distinct organelles within a single cell; (2) two nuclei in different epigenetic states, one from a liver cell and the other from a pancreatic cell, within a common cytoplasm; or (3) nuclei at different stages of the cell cycle bounded by a single cell membrane. Heterokaryons are produced by bringing two different cells into contact and
then inducing membrane fusion to produce a single 
cell with a common cytoplasm and containing mul- 
tiple donor nuclei. Heterokaryon analysis has been 
useful in determining nuclear cytoplasmic interactions 
and particularly the influence of cytoplasmic factors 
on nuclear gene expression. 


See also: Nuclear Transfer 


Heteropyknosis 


AT Sumner 
Heteropyknosis is the attribute of chromatin that shows condensation behavior different from that of ‘normal’ chromatin (generally equivalent to euchromatin). Heterochromatin typically shows ‘positive heteropyknosis’ by remaining condensed in interphase. Chromosomal regions that show less condensation than the rest of the chromosome during prophase or metaphase are said to show ‘negative heteropyknosis.’


See also: Chromatin; Heterochromatin 


Heterosis 
Heterosis is a synonym for hybrid vigor: the increased size, performance, resistance, and strength of hybrids. Heterosis is particularly pronounced in crosses between inbred strains. Early in the twentieth century, after the rediscovery of Mendelian inheritance, it became obvious that hybrids had greater heterozygosity than their parents. The word ‘heterosis’ was coined by G.H. Shull as a descriptive term to avoid such cumbersome expressions as ‘the stimulus of heterozygosis’; it is not intended to favor any genetic hypothesis.

The weakening effect of inbreeding and the vigor of
hybrids have been known since classical antiquity. The
hardiness and strength of mules were recognized and 
made use of by the Greeks and especially the Romans. 
In the nineteenth century many botanists noticed that 
species hybrids regularly exceeded their parents in 
size. The most thorough analysis was done by Charles 
Darwin, whose book The Effects of Cross- and Self- 
Fertilization in the Vegetable Kingdom (Darwin, 
1876) can still be read with profit. In this he says: 


The first and most important conclusion which may be 
drawn from the observations given in this volume, is that 
cross-fertilization is generally beneficial and self-fertilization 
injurious. 


An understanding of heterosis in genetic terms had to 
await the rediscovery of Mendel’s laws in 1900. It was 
immediately apparent that hybrids are more hetero- 
zygous than their parents. A decrease in the number 
of heterozygotes implied an increase in the number of 
homozygotes. This immediately gave rise to two 
explanations. The ‘dominance’ hypothesis notes that 
most recessive mutants are deleterious, so inbred lines 
are weakened by having an increase in the number of 
homozygous recessive genes. Hybrids, in contrast, are 
stronger because the recessives from each parent are 
usually concealed by dominants from the other. The 
‘overdominance’ hypothesis assumes that there are 
some loci at which the heterozygote is superior to 
either homozygote. Although the two ideas are not 
mutually exclusive, the dominance hypothesis is now 
generally favored. This explanation also applies to
variety and species hybrids, because the hybrids are 
always more heterozygous than their parents, the 
more so as the parents diverge. The contrast is greatest, 
however, when the parents are highly homozygous 
inbred lines. 
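The dominance hypothesis can be made concrete with two hypothetical inbred lines that are each homozygous for different deleterious recessives. This is a minimal sketch of that hypothesis only (overdominance is not modeled); the loci, the lowercase-means-deleterious-recessive convention, and the function names are illustrative assumptions.

```python
from typing import List, Tuple

Genotype = List[Tuple[str, str]]  # one (allele, allele) pair per locus


def deleterious_homozygotes(genotype: Genotype) -> int:
    """Loci homozygous for a deleterious recessive (written in lowercase)."""
    return sum(1 for a, b in genotype if a.islower() and b.islower())


def heterozygous_loci(genotype: Genotype) -> int:
    """Loci carrying two different alleles."""
    return sum(1 for a, b in genotype if a != b)


def f1_hybrid(line_1: Genotype, line_2: Genotype) -> Genotype:
    """Hybrid of two fully homozygous (inbred) lines: one allele from each."""
    return [(l1[0], l2[0]) for l1, l2 in zip(line_1, line_2)]


# Hypothetical inbred lines at five loci; uppercase = dominant normal allele.
line_1 = [("a", "a"), ("B", "B"), ("c", "c"), ("D", "D"), ("E", "E")]
line_2 = [("A", "A"), ("b", "b"), ("C", "C"), ("d", "d"), ("E", "E")]
hybrid = f1_hybrid(line_1, line_2)

for name, g in (("line 1", line_1), ("line 2", line_2), ("F1", hybrid)):
    print(name, deleterious_homozygotes(g), heterozygous_loci(g))
```

Each inbred line expresses two deleterious recessives, while the F1 expresses none because every recessive is concealed by a dominant allele from the other parent.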

The greatest practical impact of heterosis has been 
from hybrid corn. Inbred lines have been developed 
and crossed to produce hybrids that are grown by 
the farmer. The inbred lines are selected not only for 
their own performance, but for producing superior 
hybrids. Since the introduction of hybrid maize in 
the 1930s, the yield of corn has increased about five- 
fold. It represents a high point in modern agriculture. 
About 70% of the improvement is the result of super- 
ior hybrids, while the remainder is due to improved 
agronomic practices. 

Although less widely applied than in maize, other 
horticultural and cereal crops also show heterosis. In 
many cases the corn model of crossing inbred lines has 
been productive. In others the heterosis is not so great 
and greater practical results are obtained by more 
conventional breeding methods. 


Reference 
Darwin C (1876) The Effects of Cross- and Self-Fertilization in the 
Vegetable Kingdom. London: John Murray. 


See also: Overdominance 


Heterotrimeric G Proteins 
Heterotrimeric guanine nucleotide-binding proteins (G proteins) form an ancient family of signaling molecules that connect seven-helical transmembrane receptors (7-TM receptors) to a limited set of intracellular effectors. 7-TM receptors are one of the largest receptor families in vertebrates and function in a variety of cellular processes. For example, 7-TM receptors are
required for the response to hormones and neuro- 
transmitters, but are also required for light detection 
in the visual system and odorant sensation in olfactory 
cells. Downstream effectors of 7-TM receptors and G 
proteins can be enzymes such as adenylyl cyclases, 
phosphodiesterases and phospholipases, ion channels 
or other intracellular proteins. G protein activation 
can stimulate or inhibit such effectors, resulting in 
the generation or breakdown of second messengers. 
An important property of G-protein-coupled signal 
transduction is that at each step in the pathway there is
a considerable amplification of the signal. 

Heterotrimeric G proteins consist of a guanine nucleotide-binding Gα subunit and a closely associated Gβγ subunit, both of which are linked to the plasma membrane through lipid modifications. Based on sequence similarity and shared intracellular effectors, mammalian Gα subunits can be divided into four subfamilies: Gs, Gi, Gq, and G12. Both the Gα and the Gβγ subunit have signaling capabilities and can interact with specific targets in the cell.

Heterotrimeric G proteins act as molecular switches in signal transduction. In the inactive state, the Gα subunit is associated with a molecule of GDP and is complexed with the Gβγ subunit. Ligand binding by an appropriate 7-TM receptor induces the Gα subunit to exchange GDP for GTP, which results in the dissociation of the two subunits, enabling them to interact with their specific targets in the cell. The intrinsic GTPase activity of the Gα subunit hydrolyzes the bound GTP back to GDP, allowing reassociation with the Gβγ subunit to restore the inactive heterotrimeric complex. The relatively slow GTPase activity of the Gα subunit cannot completely account for the fast GTP hydrolysis observed in vivo. A family of proteins containing an RGS (regulator of G protein signaling) domain is responsible for enhancing the slow GTPase activity of specific Gα subunits.
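The on/off cycle described above behaves like a two-state switch. The sketch below is a deliberately simplified state machine, assuming only the two states and two events named in the text; the class, event names, and function are hypothetical, and kinetics (including how much RGS proteins speed up hydrolysis) are not modeled.

```python
from enum import Enum


class GProtein(Enum):
    INACTIVE = "G-alpha-GDP bound to G-beta-gamma (heterotrimer)"
    ACTIVE = "G-alpha-GTP and free G-beta-gamma (dissociated)"


def step(state: GProtein, event: str) -> GProtein:
    """Advance the toy switch by one event; other events leave it unchanged."""
    if state is GProtein.INACTIVE and event == "receptor_activated":
        # Ligand-bound 7-TM receptor induces GDP -> GTP exchange on G-alpha;
        # the subunits dissociate and each can engage its targets.
        return GProtein.ACTIVE
    if state is GProtein.ACTIVE and event == "gtp_hydrolyzed":
        # Intrinsic GTPase (accelerated by RGS proteins in vivo) returns
        # G-alpha to the GDP form, which reassociates with G-beta-gamma.
        return GProtein.INACTIVE
    return state


state = GProtein.INACTIVE
for event in ("gtp_hydrolyzed", "receptor_activated", "gtp_hydrolyzed"):
    state = step(state, event)
    print(f"{event:>20} -> {state.name}")
```

Hydrolysis has no effect on an already inactive heterotrimer, and a second receptor signal has no effect on an already active one; only the alternation of the two events drives the cycle.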

Structural analysis of heterotrimeric G proteins has provided considerable insight into the molecular mechanism of GTPase activity and the molecular interaction of the Gα subunit with its effectors. The crystal structure of Gα shows that the subunit consists of a guanine nucleotide-binding domain that is structurally similar to small G proteins such as Ras and elongation factor Tu, and a helical domain that is unique to heterotrimeric G proteins. The helical domain carries out functions that, for small G proteins, are performed by separate proteins: it prevents dissociation of GDP from the guanine nucleotide-binding core and functions in GTP hydrolysis. The catalytic mechanism of the Gα GTPase activity and the conformational changes necessary for Gβγ dissociation and effector interactions were determined from the structures of Gα-GDP, Gα-GTP, and the complete heterotrimeric complex. The general picture that emerges from these studies is that differential binding of guanine nucleotides induces specific conformational changes in the Gα subunit that allow it to release or bind the Gβγ subunit and enable it to interact with its effectors.

Multiple G-protein-coupled signal transduction pathways may function in a single cell. Consequently, G proteins form complex signal transduction networks in vivo. Insight into the complexity of G-protein-coupled signal transduction pathways can be gained from genetic studies. Model organisms such as the yeast Saccharomyces cerevisiae, the slime mold Dictyostelium discoideum, the nematode Caenorhabditis elegans, the fruitfly Drosophila melanogaster, and the mouse have been used to study G protein signaling in vivo. In yeast and Dictyostelium, G proteins transmit developmental signals such as pheromones and aggregation signals. In the metazoan organisms C. elegans and Drosophila, G proteins have been adapted to transduce a more complex set of developmental, endocrine, and sensory signals. Clear homologs of the four mammalian subfamilies of Gα subunits are present in these organisms, and they serve as important models for conserved G-protein-coupled signal transduction. The powerful genetic tools available for C. elegans and Drosophila allow detailed genetic dissection of G protein signaling. Using genetics, novel components of G-protein-coupled signal transduction pathways have been discovered. An example is the family of RGS proteins, which were first identified as negative regulators of G-protein signaling in yeast and C. elegans.


See also: Signal Transduction 


Heterozygote and Heterozygosis
A heterozygote is an individual whose DNA mol-
ecules in a homologous pair of chromosomes differ in 
sequence at a particular genetic locus. (A homozygote 
is an individual whose DNA sequences at a locus are 
identical.) Usually this locus will be a gene and the 
different forms of the gene are called alleles. A locus is 
said to be in heterozygosis when two alternate alleles 
are present. If the phenotype of the heterozygote is 
normal, the effects of the alternate allele are said to 
be recessive to the normal allele. Conversely, if the 
phenotype of the heterozygote is abnormal then the 
effects of the alternate allele are said to be dominant. A 
major task in human medical genetics is identifying 
whether a patient with a normal phenotype is a het- 
erozygote (carrier of a disease allele). 


Alleles and Heterozygotes in Populations 


The alternate DNA sequence, or allele, may occur 
rarely or commonly in the whole population. When
the alternate allele is very rare, it is often called a
mutant and the common allele is called the normal or 
wild-type. Shorthand for the wild-type homozygote 
is +/+, for the wild-type/mutant heterozygote is 
+/m, and for the mutant homozygote is m/m. If the 
mutant is dominant to normal its symbol is capital- 
ized, i.e., +/M. At some loci the alternate alleles have 
more equal proportions, such as the three common 
alleles of the human ABO blood group. When the 
frequency of a variant allele in a population is too 
high to be explained by recurrent mutation, it is called 
a polymorphism. Even though there may be three or 
more alleles of a locus in the population, an individual 
with normal chromosomes can only have a maximum 
of two alleles at that locus, one for each chromosome. 
(An individual has either one or two alleles at a sex-linked locus, depending on its sex and thus on the number of copies of each sex chromosome.)


Allelic Origins in a Heterozygote 


A heterozygote carrying a common variant such as a 
blood group antigen will have inherited it from a 
parent who also has that variant either as a hetero- 
zygote or as a homozygote. If the alternate sequence 
is unique to that individual, with neither of the parents 
carrying the variant, then it will have arisen as the 
result of mutation. In population terms, an allele that 
is rare will more commonly be present in hetero- 
zygotes than in homozygotes. The exact proportions 
can be calculated using the Hardy-Weinberg law; for example, approximately 1 in 20 people in Scotland are heterozygous carriers of a cystic fibrosis mutation (+/cf), while only 1 in 1600 people are affected and homozygous for two cystic fibrosis mutations (cf/cf).
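The cystic fibrosis figures above follow directly from the Hardy-Weinberg proportions p², 2pq, and q². If q² = 1/1600, then q = 1/40 and p = 39/40, so the carrier frequency is 2pq = 2 × (39/40) × (1/40) = 78/1600, or about 1 in 20.5. A minimal sketch of that arithmetic, assuming random mating and Hardy-Weinberg equilibrium (the function name is illustrative):

```python
import math


def carrier_frequency(affected_freq: float) -> float:
    """Hardy-Weinberg: given q^2 (affected homozygotes), return 2pq (carriers)."""
    q = math.sqrt(affected_freq)   # frequency of the recessive allele
    p = 1.0 - q                    # frequency of the normal allele
    return 2.0 * p * q


carriers = carrier_frequency(1 / 1600)  # q = 1/40, p = 39/40
print(f"carriers: {carriers:.5f}, i.e. about 1 in {1 / carriers:.1f}")
```

Because q is small, almost all copies of the allele sit in carriers rather than in affected homozygotes: here 2pq is roughly 78 times larger than q².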


Single Nucleotide Polymorphisms 


Not all alleles will have an effect on the phenotype of 
the individual. Some DNA sequence changes will have 
no effect on the final structure and function of the 
protein coded by the gene. Nonetheless, when identified, they can be used to track rarer, unidentified disease-causing mutations to which they are linked by being situated nearby on the same DNA molecule. Some silent variants in DNA sequence change only a single nucleotide (single nucleotide polymorphisms, SNPs) and are distributed throughout the genome. The study of genetic components of common diseases such as hypertension will be revolutionized by comparing SNPs in healthy and affected members of the population.
Phenotypic Effects of Alleles in Heterozygotes


Alleles that are recessive have no effect on the pheno- 
type of a heterozygote (+/m). Recessive alleles usually 
involve changes to the coded protein which result in 
loss of normal function. In heterozygotes, the wild- 
type allele on the other chromosome produces suffi- 
cient normal protein to maintain healthy function and 
phenotype and the disease phenotype is only seen 
when an individual is homozygous with two mutant 
recessive alleles (m/m). Dominant mutant alleles will 
affect the phenotype of the heterozygote (+/M). In 
this situation, the mutant protein may have gained a 
new function that affects the phenotype even in the 
presence of the normal protein. Alternatively, the mutant protein may not be processed or broken down at the same rate as the normal protein. Another possibility is that the protein normally functions by forming polymers or chains. In this case, a heterozygote will form polymers that are a mixture of normal and mutant proteins, and the resulting compound polymer will have a different structure and function from the normal polymer.


Reproductive Fitness of Heterozygotes 


The reproductive fitness of a heterozygote is only 
affected if the phenotype is altered. Thus, genetic selec- 
tion can act on heterozygotes for a dominant mutation. 
If the heterozygotes for a disease mutation have a low 
reproductive fitness, the mutant allele will only be 
maintained in the population by the process of new 
mutation. In recessive disorders the heterozygotes 
have a normal phenotype and genetic selection can 
only act on the affected homozygotes. Since the major- 
ity of mutant alleles in a population are present in 
healthy heterozygotes, the frequency of the two alleles 
will change very little from generation to generation, 
even if none of the mutant homozygotes reproduce and 
their alleles are lost to the population each generation. 
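The slow change can be quantified. With genotype frequencies p², 2pq, and q², removing every m/m individual leaves survivors in which the mutant allele frequency is pq / (p² + 2pq) = q / (1 + q); after n generations this gives q₀ / (1 + nq₀). A minimal sketch, assuming random mating and ignoring new mutation; the starting frequency of 1/40 is borrowed from the cystic fibrosis example purely for illustration:

```python
def next_generation(q: float) -> float:
    """Recessive-allele frequency after one generation in which no m/m
    individual reproduces (random mating, no new mutation assumed).

    Among survivors (p^2 of +/+ and 2pq of +/m), the fraction of alleles
    that are mutant is pq / (p^2 + 2pq) = q / (1 + q).
    """
    return q / (1.0 + q)


def frequency_after(q0: float, generations: int) -> float:
    """Iterate complete selection against m/m for the given generations."""
    q = q0
    for _ in range(generations):
        q = next_generation(q)
    return q


q0 = 1 / 40  # illustrative starting frequency
print(frequency_after(q0, 1))
print(frequency_after(q0, 40))
```

Starting from 1/40, one generation lowers the frequency only to 1/41, and 40 generations of complete selection are needed merely to halve it to 1/80.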


Heterozygote Advantage 


In some circumstances, a recessive mutation can affect the phenotype, and thus the reproductive fitness, of heterozygotes. This effect is not always negative, as the example of human sickle-cell anemia shows. In some environments, sickle-cell carriers have greater reproductive fitness than normal homozygotes, a heterozygote advantage. In most populations the sickle-cell mutation is rare, but in malarial regions of Africa as many as one in three people carry the mutation in the hemoglobin gene. The presence of the mutant hemoglobin in heterozygotes interferes with the malarial parasite's life cycle, so heterozygotes are more resistant to the debilitating effects of malaria than normal homozygotes. In these populations, the heterozygote advantage of the many sickle-cell carriers outweighs the severe reproductive disadvantage of the rarer sickle-cell homozygotes. This maintains the mutation at a high frequency as a polymorphism.
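How a heterozygote advantage holds an allele at a stable intermediate frequency can be sketched with the standard overdominance model. The relative fitnesses below (normal homozygote 1 - s, carrier 1, sickle-cell homozygote 1 - t) and the values s = 0.1 and t = 0.8 are hypothetical, chosen only for illustration, not measured values for any population.

```python
def next_q_overdominance(q: float, s: float, t: float) -> float:
    """Sickle-allele frequency after one generation of selection.

    Assumed relative fitnesses: AA = 1 - s (normal homozygote, more harmed by
    malaria), Aa = 1 (carrier), aa = 1 - t (sickle-cell homozygote).
    """
    p = 1.0 - q
    w_AA, w_Aa, w_aa = 1.0 - s, 1.0, 1.0 - t
    mean_fitness = p * p * w_AA + 2 * p * q * w_Aa + q * q * w_aa
    # Sickle alleles after selection: half of each carrier's alleles plus all
    # of each sickle homozygote's alleles, weighted by fitness.
    return (p * q * w_Aa + q * q * w_aa) / mean_fitness


def equilibrium_q(s: float, t: float) -> float:
    """Stable equilibrium frequency of the sickle allele: s / (s + t)."""
    return s / (s + t)


s, t = 0.1, 0.8  # hypothetical selection coefficients
q = 0.01         # start with a rare sickle allele
for _ in range(2000):
    q = next_q_overdominance(q, s, t)
print(round(q, 4), round(equilibrium_q(s, t), 4))  # 0.1111 0.1111
```

Starting from a rare allele, the frequency rises and settles at s / (s + t) rather than being eliminated, because at that point the loss of alleles in affected homozygotes is balanced by the advantage of carriers.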


See also: Balanced Polymorphism; Heterosis; 
Sickle Cell Anemia 
Hfr strains of bacteria such as Escherichia coli are strains carrying an integrated conjugative plasmid such as the ~100 kb F (for fertility) factor (see: F Factor). This enables them to transfer their chromosomal DNA to other bacteria, in which the DNA can recombine. Hfr strains of E. coli were recognized by their high frequency of recombination (hence Hfr) with other bacteria. This was possible because some of the strains mixed by Joshua Lederberg and Edward Tatum in early mating experiments (Lederberg and Tatum, 1946b) contained the F conjugative plasmid and others did not. In cultures of cells carrying an F, some of the cells are Hfrs (have an integrated F). The non-F-carrying strains are called female or recipient bacteria, and the Hfr or F-carrying strains are called male or donor bacteria. Transfer of conjugative plasmid DNA, or of chromosomal DNA in Hfr cells, is unidirectional, i.e., male to female (Hayes, 1952). Males can act as recipients only at much lower efficiency or under special environmental conditions.

Hfr strains form because the F carries transposable elements that are also carried by the E. coli chromosome: two copies of the insertion sequence IS3, one IS2, and one copy of transposon Tn1000 (also called γδ) (Figure 1). These elements are regions of DNA sequence homology between the chromosome and the F that allow the F to recombine with the chromosome and become incorporated into it (Figure 1). The F can integrate at many different sites in the chromosome, producing many different Hfr strains. Each Hfr can transfer the chromosomal DNA next to the integrated F at high frequency or, if given enough time to mate without interruption, the whole 4.7 megabase E. coli chromosome.

Figure 1 Formation of an Hfr by recombination of the F plasmid with the Escherichia coli chromosome. Single lines represent duplex DNA. Triangles represent the transposable genetic elements IS3, IS2, and Tn1000 that are present in the F (double lines) and also in the E. coli chromosome (single lines). These elements provide regions of DNA sequence identity between which homologous recombination can occur (represented by an X), incorporating the F into the chromosome and producing an Hfr.

Figure 2 Conjugative transfer of chromosomal DNA from an Hfr donor bacterium to an F⁻ recipient bacterium. Single lines represent single strands of DNA; dashed lines represent newly synthesized DNA, and arrowheads represent 3′ ends. Transfer begins with single-strand cleavage of F DNA at the transfer origin, oriT, followed by DNA synthesis, which displaces a single strand that is transferred into the recipient. Synthesis of the complementary strand occurs in the recipient, and the duplex fragment can be incorporated into the recipient chromosome by recombination.

Transfer of chromosomal DNA from an Hfr to a female cell is depicted in Figure 2. Transfer begins with the action of an F-encoded single-strand endonuclease and helicase, TraI, at the F origin of transfer, oriT. Leading-strand synthesis is primed from the 3′ end at the nick and displaces the 5′ DNA strand. Continued synthesis extends the displaced strand into the contiguous bacterial DNA, and this single strand is transferred into a female bacterium that has become attached to the male in a mating pair. Transfer stops at random locations when the synthesis tract encounters a DNA break in the donor template, or when the transferred strand breaks. These random disruptions produce a gradient of transfer: DNA near oriT is transferred most efficiently, and transfer efficiency decreases with increasing distance from oriT. Once inside the female, lagging-strand synthesis of the complementary strand creates a double-stranded linear DNA fragment. The transferred DNA will be lost unless it recombines into the recipient chromosome, which it can do (Xs in Figure 2) using the cell's RecBCD system of homologous recombination of linear DNA and double-strand break repair. This results in homologous replacement of a segment of recipient DNA with sequences derived from the donor chromosome. If that segment contains different genetic information (in Figure 2, prototrophic pro⁺ information is shown in the transferred piece entering an auxotrophic pro⁻ recipient), the recipient can become genetically recombinant. Recombinant strains made by Hfr conjugation do not usually become male (Hfr) upon acquiring donor DNA, because the F transfer genes are the last to be transferred and are not homologous with the recipient DNA.
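The gradient of transfer follows naturally if disruptions occur at random. The Monte Carlo sketch below assumes, purely for illustration, a constant chance of disruption per kilobase, so the length transferred in each mating pair is exponentially distributed. The 800 kb mean, the marker positions, and the function name are hypothetical, and the model ignores the later recombination step.

```python
import math
import random


def transfer_gradient(marker_distances_kb, mean_transfer_kb, n_pairs=100_000, seed=1):
    """Fraction of simulated mating pairs in which transfer reaches each marker.

    Hypothetical model: transfer in each pair stops at a random point, with the
    transferred length drawn from an exponential distribution.
    """
    rng = random.Random(seed)
    stops = [rng.expovariate(1.0 / mean_transfer_kb) for _ in range(n_pairs)]
    return [sum(stop > d for stop in stops) / n_pairs for d in marker_distances_kb]


distances = [100, 500, 1000, 2000]  # hypothetical marker positions, kb from oriT
simulated = transfer_gradient(distances, mean_transfer_kb=800)
expected = [math.exp(-d / 800) for d in distances]  # analytic exponential decline
for d, sim, exp_ in zip(distances, simulated, expected):
    print(f"{d:5d} kb  simulated {sim:.3f}  expected {exp_:.3f}")
```

Markers close to oriT enter almost every recipient, while distant markers enter only a small fraction; this is the gradient of transfer described in the text.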

Hfr crosses provided the first demonstration of genetic recombination in bacteria and, in so doing, encouraged the idea that bacteria, like other organisms, possess genes. Hfr crosses were also the first tools used to explore the proteins and enzymes that catalyze DNA recombination, leading to the discovery of, for example, RecA (Clark and Margulies, 1965), a universal recombination and DNA repair protein with orthologs in all eubacterial, eukaryotic, and archaeal species examined to date. For descriptions of the E. coli rec genes discovered using Hfr crosses, of the recombination systems and pathways, and of the double-strand break repair machinery of E. coli, see the related entries.
Inversion Systems


The DNA invertases catalyze a recombination reaction that inverts a segment of DNA between two specific recombination sites. The best-characterized invertases, Hin from Salmonella typhimurium and Gin from bacteriophage Mu, catalyze site-specific inversion reactions that switch gene expression between alternative states. The Hin invertase regulates flagellar phase variation in Salmonella, allowing the bacterium to evade a host immune response (Figure 1A). In one orientation, a promoter located within the invertible segment of DNA directs the expression of the H2 flagellin gene (fljB), as well as a repressor of the H1 flagellin gene (fliC). After Hin catalyzes a site-specific inversion event, the promoter is inverted and can no longer drive the expression of these genes. Consequently, the H1 flagellin gene is expressed from its unlinked site. The Gin invertase of bacteriophage Mu controls the alternate expression of tail fiber genes (Figure 1B). Each orientation of the invertible segment in bacteriophage Mu encodes a different C-terminal portion of the tail fiber protein S. Site-specific inversion catalyzed by Gin switches the expression of the C-terminal part of the protein, which determines the host range of the phage. The Cin-mediated reaction of phage P1 performs a similar function. Because these proteins are homologous and their recombination substrates are similar, the characterized invertases are functionally interchangeable. The invertases belong to the resolvase/invertase (also known as serine recombinase) family, which currently has over 50 members. Site-specific DNA inversions can also be catalyzed by recombinases belonging to the phage integrase (also known as tyrosine recombinase) family.
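The on/off logic of Hin-mediated phase variation can be sketched at the level of gene orientation. In this toy model the layout, the strand assigned to hin, and the helper names are hypothetical; only the orientation of the promoter P matters. An inversion reverses the order of the features in the invertible segment and flips each one's strand, and P drives fljB and fljA only when it faces them.

```python
from typing import List, Tuple

# (feature, strand): +1 means pointing toward fljB, -1 means pointing away.
Feature = Tuple[str, int]


def invert(features: List[Feature], start: int, end: int) -> List[Feature]:
    """Invert features[start:end]: reverse their order and flip each strand."""
    segment = [(name, -strand) for name, strand in reversed(features[start:end])]
    return features[:start] + segment + features[end:]


def flagellin_phase(features: List[Feature]) -> str:
    """P facing fljB: FljB (H2) and the FljA repressor of fliC are made.
    Otherwise fliC (H1), expressed from its unlinked site, is no longer repressed."""
    orientation = dict(features)["P"]
    return "H2 (FljB)" if orientation == +1 else "H1 (FliC)"


# Hypothetical layout; the invertible segment is features[1:3] (hin and P).
phase_h2 = [("hixL", +1), ("hin", -1), ("P", +1), ("hixR", -1), ("fljB", +1), ("fljA", +1)]
phase_h1 = invert(phase_h2, 1, 3)

print(flagellin_phase(phase_h2))           # H2 (FljB)
print(flagellin_phase(phase_h1))           # H1 (FliC)
print(invert(phase_h1, 1, 3) == phase_h2)  # True: a second inversion restores phase H2
```

Because inversion is its own reverse, the same recombination event switches the cell back and forth between the two flagellar phases.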


Site-Specific Inversion Reaction 


Site-specific inversion by Hin and Gin has been studied extensively both in vivo and in vitro. The invertases require a supercoiled DNA substrate that contains two inversely oriented recombination sites. The 26-bp recombination sites have partial dyad symmetry, with the central two base pairs being the site of DNA strand exchange (Figure 1C). For efficient recombination the invertases also require another cis-acting DNA element called the recombinational enhancer. Each recombination site is bound by a dimer of the Hin or Gin recombinase, and the enhancer contains two binding sites for the dimeric protein Fis. Once bound to their respective DNA sites, Hin/Gin and Fis dimers are able to assemble into a higher order nucleoprotein complex called an invertasome (Figure 2iii). The DNA-bending protein HU also aids in the formation of the invertasome complex in the Hin system by facilitating the bending of a small loop of DNA between one recombination site and the enhancer. Once assembled in the invertasome structure, Fis stimulates Hin/Gin to catalyze recombination. The inversion reaction can be broken down into two basic catalytic steps: DNA cleavage and strand exchange. The recombination sites are cleaved in a concerted manner, producing 2 bp staggered double-strand DNA breaks (Figure 1C). In this reaction, a serine nucleophile in each invertase subunit bound to the recombination sites attacks the phosphate backbone, forming a phosphoserine linkage with the 5′ recessed end of the DNA. After DNA cleavage, the DNA ends are exchanged and the recombination sites are religated in a recombinant configuration through a reversal of the phosphoserine linkage.

Figure 1 Regulation of gene expression by site-specific DNA inversion. (A) Salmonella invertible DNA segment. The hixL and hixR recombination sites are shown as dark rectangles. The recombinational enhancer is depicted as a striped rectangle. The 1 kb invertible segment, located between the two recombination sites, contains the hin gene and a promoter (P) that directs the expression of the flagellar genes fljB and fljA. Hin-catalyzed inversion switches the orientation of the invertible segment such that the promoter can no longer direct the expression of fljB and fljA. (B) Phage Mu invertible DNA segment. The gixL and gixR recombination sites are depicted as dark rectangles. The recombinational enhancer, illustrated as a striped rectangle, and the gin gene are located outside the ~3 kb invertible segment. A promoter (P) located outside the invertible segment controls the expression of the S and U tail fiber genes. The constant N-terminal portion of the S tail fiber gene (Sc) is also located outside the invertible segment, while the variable C-terminal portion of the S tail fiber gene (Sv) and the U gene are located within the invertible segment. Gin-catalyzed inversion of the invertible segment alternates the expression of the Sv and U genes with the Sv′ and U′ genes. (C) Sequence of the invertase recombination sites. The recombination site consensus sequence for the invertase family of recombinases is shown at the top. The hixL and gixL recombination site sequences are shown below the consensus sequence. The arrows mark the sites of 2 bp staggered double-strand DNA cleavage. The relative orientation of the recombination sites is determined by these two core nucleotides.
Invertasome


The three DNA sites must synapse in a highly specific fashion to form an invertasome complex. The Fis-bound enhancer interacts with the invertase-bound recombination sites at a branch in plectonemically supercoiled DNA (Figure 2iii). The recombination sites pass on either side of the enhancer such that two negative DNA nodes are trapped within the complex. Immunoelectron microscopy of crosslinked invertasome complexes has provided direct evidence for the three-looped DNA structures containing Hin and Fis. The sizes and positions of the DNA loops in the invertasome complex were consistent with the enhancer associating with both recombination sites at a branch in supercoiled DNA. The formation of these structures was absolutely dependent on DNA supercoiling.

Figure 2 The site-specific inversion reaction. Pathway of invertasome assembly using the Hin system as a model. (i) Supercoiled DNA substrates contain two inversely oriented recombination sites, hixL and hixR, and a recombinational enhancer. (ii) A Hin recombinase dimer binds to each recombination site and two Fis dimers bind to the recombinational enhancer. (iii) Hin and Fis assemble into an invertasome complex with the aid of the DNA-bending protein HU. In the invertasome complex the recombination sites associate with the enhancer at a branch in the supercoiled DNA. (iv) Once assembled in the invertasome complex, Fis activates Hin to catalyze DNA cleavage and strand exchange. (v) Recombination results in an inversion of the segment of DNA located between the recombination sites.

The specific topology of the DNA strands in the invertasome complex has been determined through several experimental approaches. The change in linking number observed in the DNA molecules after inversion by Gin suggested that two negative DNA nodes were trapped in the invertasome complex. In addition, the stereostructure of knotted DNA products generated from iterative rounds of Hin/Gin recombination provided strong evidence for this specific configuration of DNA strands at synapsis. The knotted DNA products also indicated that each round of recombination results in a 180° right-handed rotation of the DNA ends. Since the invertase is covalently associated with the DNA ends during strand exchange, this observation implies that the recombinase subunits must also undergo a rotation. Direct experimental evidence for exchange of subunits between dimers accompanying strand exchange, however, is lacking thus far.


Regulation of Inversion Reaction by Fis and the Enhancer


The DNA invertases catalyze recombination very weakly on their own, even when two recombination sites have formed a complex. When Hin/Gin and Fis assemble the topologically correct invertasome complex, however, Fis activates each of the invertase subunits to initiate the chemical steps of recombination. In the characterized inversion systems, the enhancer is located 90–500 bp from the closest recombination site. However, it can be artificially positioned many kilobases from the recombination sites and still activate the reaction effectively. Although the position of the enhancer relative to the recombination sites is very flexible, the relative positions of the Fis binding sites within the enhancer (48 bp between their centers) are critical for efficient activity. The precise positioning of the Fis dimers on the enhancer enables both Fis dimers to contact the DNA invertases to assemble the invertasome.

Effective regulation of invertase activity is essential to avoid unwanted chromosomal rearrangements at secondary recombinase binding sites found throughout the genome. Fis and the recombinational enhancer perform this control function by (1) limiting recombination to the vicinity of the enhancer and (2) strongly biasing the type of recombination toward DNA inversion rather than deletion or intermolecular fusion. The weak association between the DNA invertase and Fis is overcome by DNA supercoiling, which directs the appropriate three-site collision at the base of a plectonemic branch to form the invertasome structure, where the alignment of the recombination sites specifies inversion.

Mutational and structural studies have shown that the N-terminal region of Fis is responsible for activating the invertases to catalyze recombination. This region contains two mobile β-hairpin arms that extend over 20 Å from the Fis dimer core, although only one of these arms is required to activate the DNA invertase (Figure 3A). A triad of amino acids near the tip of one of these β-arms is believed to form the critical contact region with the invertase. The opposite end of the Fis dimer structure contains helix-turn-helix DNA-binding motifs. The two DNA recognition helices within the Fis dimer are separated by only 25 Å rather than the usual 32–34 Å, requiring the DNA to bend significantly when bound by Fis.


Structure and Mechanism of Activation of DNA Invertases


The 180–190 amino acid DNA invertases are organized in a two-domain structure similar to that of the resolvases (Figure 3B). The crystal structure of the C-terminal 52 amino acid DNA-binding domain of Hin revealed a three-α-helix fold that displays aspects of both a bacterial helix-turn-helix motif and a eukaryotic homeodomain. The N-terminal catalytic and dimerization domain, which is located on the opposite side of the DNA from the DNA-binding domain, is believed to closely resemble the structure of the catalytic domain of γδ resolvase. In the resolvase-DNA crystal, the active-site serines that form an ester linkage with the DNA upon cleavage are not located close to their sites of attack. Thus, it is likely that a conformational change must occur within the recombinase structure to initiate catalysis. Fis-invertase interactions may induce a conformational change upon invertasome assembly that repositions the active sites within each invertase dimer to promote DNA cleavage. Several lines of evidence suggest that this Fis-induced repositioning of the active sites may involve a quaternary change in the invertase dimer interface. A dimer containing a disulfide bond that covalently links the subunits is able to form synaptic complexes but is catalytically inactive. Certain detergents that partially destabilize the Hin dimer increase the rate of DNA cleavage by Hin more than 30-fold. Additionally, a subset of amino acid substitutions within the dimer interface results in hyperactive mutants that can catalyze recombination without a recombinational enhancer or Fis. Reactions performed without Fis using these Fis-independent mutant DNA invertases efficiently produce deletions as well as inversions, since random collisions of recombination sites yield catalytically active synaptic complexes.

Figure 3 Model of Fis and Hin dimers. (A) Structure of a Fis dimer. The N-terminal β-hairpin arms (protruding from the top of the structure in the figure) are responsible for stimulating invertase activity. The helix-turn-helix DNA-binding domains are located in the C-terminal end of the protein. (B) Model of a Hin dimer bound to a recombination site. The structure of the Hin C-terminal DNA-binding domain bound to a recombination half site was determined by X-ray crystallography. The N-terminal catalytic domain is modeled after the structure of the homologous recombinase γδ resolvase. The location of the active-site nucleophile serine 10 is marked with a black ball. In this figure, the catalytic domains are located above the DNA and the DNA-binding domains are located below the DNA.
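Whether a crossover between two sites on one molecule gives an inversion or a deletion depends on the relative orientation of the sites: inversely oriented sites invert the segment between them, while directly repeated sites excise it as a separate circle. The toy sketch below illustrates that rule on a string standing for a circular plasmid. Treating the crossover as occurring at two string indices and ignoring the site sequences themselves are simplifying assumptions, and `recombine` is a hypothetical helper.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Turning a duplex segment around replaces its top strand by the reverse complement."""
    return seq.translate(COMPLEMENT)[::-1]


def recombine(circle: str, a: int, b: int, inverse_sites: bool) -> tuple:
    """Products of one crossover at positions a < b of a circular molecule.

    inverse_sites=True  -> one circle with circle[a:b] inverted (inversion)
    inverse_sites=False -> two circles: the remainder and the excised segment (deletion)
    """
    if inverse_sites:
        return (circle[:a] + reverse_complement(circle[a:b]) + circle[b:],)
    return (circle[:a] + circle[b:], circle[a:b])


plasmid = "AAAACCGTTTTT"
print(recombine(plasmid, 4, 7, inverse_sites=True))   # ('AAAACGGTTTTT',)
print(recombine(plasmid, 4, 7, inverse_sites=False))  # ('AAAATTTTT', 'CCG')
```

An inversion leaves a single molecule of unchanged size, while a deletion splits it in two, which is why controlling site alignment at synapsis determines which product forms.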


Current and Future Research 


Although the first steps are well established in this relatively simple recombination system, there are many questions yet to be answered. Researchers in the field are investigating the precise molecular arrangement of the proteins and DNA sites in the invertasome complex, the conformational changes that accompany catalytic activation, and the mechanics of DNA strand exchange.
In Hirschsprung's disease, the intestine is obstructed because of aganglionosis of the gut. Germline mutations of RET, a receptor tyrosine kinase gene and proto-oncogene, have been found in approximately 50% of familial cases and 30% of isolated cases, but the disorder is a model for a complex disorder. Mutations have been found in a few instances in four other genes, all of which act in functional pathways involving RET. One of these encodes glial cell line-derived neurotrophic factor, a soluble ligand of RET. Two components of a further signaling pathway involving RET, the endothelin B receptor (EDNRB) and its ligand endothelin 3, are mutated in about 5% of cases, as is SOX10, which regulates EDNRB expression. There is preliminary evidence that interactions between variants of these genes affect the penetrance and severity of the disorder.


See also: RET Proto-Oncogene 
Histidine (His or H) is one of the 20 amino acids commonly found in proteins. Its imidazole side chain can carry a positive charge, but it is only a weak base and is largely uncharged at neutral pH. Its chemical structure is:

Figure 1 Histidine.
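The statement that histidine is only a weak base at neutral pH can be made quantitative with the Henderson-Hasselbalch equation. This sketch assumes a side-chain pKa of about 6.0, a typical textbook value for free histidine (in proteins the value shifts with the local environment). The function name is hypothetical.

```python
def fraction_protonated(pH: float, pKa: float = 6.0) -> float:
    """Fraction of imidazole side chains in the protonated (positively charged) form.

    Henderson-Hasselbalch: charged / (charged + neutral) = 1 / (1 + 10**(pH - pKa)).
    """
    return 1.0 / (1.0 + 10 ** (pH - pKa))


print(round(fraction_protonated(7.0), 3))  # 0.091
print(round(fraction_protonated(6.0), 3))  # 0.5
```

At pH 7 only about 1 side chain in 11 carries the extra proton, yet small shifts in pH or pKa change this fraction substantially.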


See also: Amino Acids; Proteins and Protein 
Structure 
Histidine Operon as Model System


Studies of the biosynthetic pathway leading to histidine in prokaryotes and lower eukaryotes began more than 40 years ago. This effort resulted not only in the elucidation of the chemical intermediates in the pathway, but also in the unraveling of many fundamental biological mechanisms. The histidine system was of the utmost importance in the definition and refinement of the operon theory and of the one operon–one messenger theory of transcription. Together with lac and trp, the his operon was used as a model system to study the phenomenon of polarity. The his operon system also played a fundamental role in the study of regulatory mutants and of the mechanisms governing operon expression in general. Together with early studies on the trp operon, these studies were the basis for the characterization of a novel mechanism of gene regulation, termed attenuation. Studies of feedback inhibition of the first enzyme in the pathway provided important insights into the allosteric regulation of biochemical reactions.
Histidine Biosynthetic Pathway


The biosynthesis of histidine has been studied extensively in Salmonella typhimurium and Escherichia coli. In these microorganisms, a single operon composed of eight adjacent genes encodes the complete set of enzymes required for histidine biosynthesis. Three of the eight genes (hisD, hisB, and hisI) encode bifunctional enzymes, and two (hisH and hisF) encode the two subunits of a single enzyme, IGP synthase; together with the products of hisG, hisA, and hisC, these enzymes catalyze a total of 10 enzymatic reactions.

The first step in histidine biosynthesis (Figure 1) is the condensation of ATP and 5-phosphoribosyl 1-pyrophosphate (PRPP) to form N¹-5′-phosphoribosyl-ATP (PRATP). This reaction is catalyzed by N¹-5′-phosphoribosyl-ATP transferase, the product of the hisG gene, and it is the reaction subject to feedback inhibition by the end product of the pathway, histidine. The inhibitory effect of histidine requires the presence of the product of the reaction, PRATP, and is further increased by AMP. Synergistic inhibition by the product of the first reaction and the end product of the pathway is a sophisticated variation of the general principle of feedback control, one that has also been found to regulate the activity of glutamine synthetase. The inhibitory effect of AMP supports the energy charge theory of D.E. Atkinson and is logical in view of the high energy input required for histidine biosynthesis.

The product of the transferase reaction, PRATP, is hydrolyzed to N¹-5′-phosphoribosyl-AMP (PRAMP). This irreversible hydrolysis is catalyzed by an activity associated with the C-terminal domain of the enzyme encoded by the hisI gene. The other activity, localized within the N-terminal domain of this bifunctional enzyme, is a cyclohydrolase that opens the purine ring of PRAMP. This produces an imidazole intermediate, N′-[(5′-phosphoribosyl)formimino]-5-aminoimidazole-4-carboxamide ribonucleotide (abbreviated to 5′-ProFAR). The fourth step of the pathway of histidine biosynthesis is an internal redox reaction, also known as an Amadori rearrangement, involving the isomerization of the aminoaldose 5′-ProFAR to the aminoketose N′-[(5′-phosphoribulosyl)formimino]-5-aminoimidazole-4-carboxamide ribonucleotide (abbreviated to 5′-PRFAR).

Although the pathway of histidine biosynthesis was almost completely characterized by 1965, the biochemical event leading to the synthesis of imidazole-glycerol phosphate (IGP) and 5-aminoimidazole-4-carboxamide ribonucleotide (abbreviated to AICAR or ZMP) from 5′-PRFAR remained unsolved for a long time. The protein products of the hisH and hisF genes were known to be involved in the overall process in eubacteria, but the catalytic events were elusive. This last blind spot of histidine biosynthesis has recently been clarified. The protein encoded by the hisF gene has an ammonia-dependent activity that converts PRFAR to AICAR and IGP, while the product of the hisH gene has no detectable catalytic activity of its own. In combination, however, the two proteins carry out this reaction with glutamine as the nitrogen donor, without releasing any free metabolic intermediate. The hisH and hisF gene products form a stable 1:1 complex that constitutes the IGP synthase holoenzyme. AICAR, which is produced in the reaction catalyzed by IGP synthase, is recycled into the de novo purine biosynthetic pathway. The other product, IGP, is dehydrated by an activity of the bifunctional enzyme encoded by hisB. The resulting enol is ketonized nonenzymatically to imidazoleacetol phosphate (IAP). The seventh step of the pathway is a reversible transamination between IAP and glutamate. This reaction, catalyzed by a pyridoxal-phosphate-dependent aminotransferase encoded by the hisC gene, generates α-ketoglutarate and L-histidinol phosphate (HOL-P). HOL-P is converted to L-histidinol (HOL) by a phosphatase activity situated in the N-terminal domain of the bifunctional enzyme encoded by the hisB gene. In the final steps of histidine biosynthesis, HOL is oxidized to the corresponding amino acid, L-histidine (His). This irreversible four-electron oxidation proceeds via the unstable amino aldehyde L-histidinal (HAL), which is not released as a free intermediate. A single enzyme, L-histidinol dehydrogenase, encoded by hisD, catalyzes both oxidation steps, which prevents decomposition of the unstable aldehyde intermediate. This enzyme is one of the first examples of a bifunctional NAD⁺-linked dehydrogenase.

Figure 1 Structure of the his operon of Salmonella typhimurium and metabolic pathway of histidine biosynthesis. Top: the relative positions of P1 (primary promoter), P2 and P3 (internal promoters), and T (rho-independent bifunctional transcription terminator) are indicated below the genetic map. L represents the leader region preceding the structural genes. Bottom: biosynthetic steps from ATP and PRPP to histidine. Abbreviations are specified in the text.
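The step-to-gene bookkeeping described above can be summarized as a small table in code, which also checks the count of 10 enzymatic reactions. The assignment of the isomerization (step 4) to hisA is the standard one but is not stated in the text; the abbreviations follow the text, and the data structure is only an illustrative sketch.

```python
from collections import Counter

# (gene product(s), reaction, activity) for each enzymatic step, in pathway order.
HIS_PATHWAY = [
    ("hisG", "ATP + PRPP -> PRATP", "phosphoribosyl-ATP transferase"),
    ("hisI", "PRATP -> PRAMP", "pyrophosphohydrolase (C-terminal domain)"),
    ("hisI", "PRAMP -> 5'-ProFAR", "cyclohydrolase (N-terminal domain)"),
    ("hisA", "5'-ProFAR -> 5'-PRFAR", "isomerase (Amadori rearrangement)"),
    ("hisH+hisF", "PRFAR + glutamine -> IGP + AICAR", "IGP synthase (1:1 complex)"),
    ("hisB", "IGP -> enol (ketonizes to IAP nonenzymatically)", "dehydratase"),
    ("hisC", "IAP + glutamate -> HOL-P + alpha-ketoglutarate", "aminotransferase"),
    ("hisB", "HOL-P -> HOL", "phosphatase (N-terminal domain)"),
    ("hisD", "HOL -> HAL (not released)", "histidinol dehydrogenase, 1st oxidation"),
    ("hisD", "HAL -> His", "histidinol dehydrogenase, 2nd oxidation"),
]

# Count how many steps each gene product catalyzes; two steps = bifunctional.
steps_per_gene = Counter(gene for gene, _, _ in HIS_PATHWAY)
bifunctional = sorted(gene for gene, n in steps_per_gene.items() if n == 2)

print(len(HIS_PATHWAY))  # 10
print(bifunctional)      # ['hisB', 'hisD', 'hisI']
```

Counting per gene product makes the arithmetic explicit: three bifunctional enzymes contribute six reactions, and hisG, hisA, hisC, and the HisH-HisF complex contribute one each.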

Mutants lacking any of the enzymatic activities required for histidine biosynthesis grow normally in minimal medium when supplied with exogenous histidine. On the basis of this evidence, the histidine pathway was presumed to lack any branch point leading to other metabolites required for growth. Nevertheless, the two initial substrates of histidine biosynthesis, PRPP and ATP, play key roles in intermediary and energy metabolism and link this pathway to the biosynthesis of purines, pyrimidines, pyridine nucleotides, folates, and tryptophan. Moreover, the purine and histidine biosynthetic pathways are connected through the AICAR cycle. AICAR, a by-product of histidine biosynthesis, is also a purine precursor. The conversion of AICAR to purines involves a folic acid-mediated transfer of a one-carbon unit. After treatments thought to lower the folic acid pool, the unusual nucleotide 5-aminoimidazole-4-carboxamide riboside-5′-triphosphate (ZTP) accumulates in S. typhimurium. On the basis of this and additional evidence, the rare nucleotide ZTP was proposed to be an alarmone that signals C1-folate deficiency and mediates a physiologically beneficial response to folate stress.


Organization of Histidine Genes 


In many of the species in which his genes have been identified and characterized, the genes are not dispersed throughout the genome but are clustered with other genes in complete or partial operons. The same is partly true for operonless fungi, in which some of the his genes arose from the fusion of different segments bearing homology to different bacterial genes. The organization of genes into his operons or clusters varies among species, indicating that during evolution genes were separated or linked apparently without severe constraints. By contrast, in other bacterial operons that have been characterized in several species, such as the trp operon, gene order is largely invariant. The recently determined organization of his gene clusters in several microorganisms is presented in Figure 2.


Regulation of Histidine Biosynthesis 


It has been calculated that 41 ATP molecules are 
consumed for each histidine molecule made. The 
considerable metabolic cost required for histidine 
biosynthesis accounts for the evolution in different 
organisms of multiple and complex strategies to fine 
tune the rate of synthesis of this amino acid in 
response to environmental changes. In S. typhimurium 
and in E. coli, the biosynthetic pathway is under the 
control of distinct regulatory mechanisms that operate 
at different levels. Feedback inhibition by histidine of 
the activity of the first enzyme of the pathway almost 
instantaneously adjusts the flow of intermediates 
along the pathway in response to the availability of 
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exogenous histidine. Transcriptional attenuation at a 
regulatory element, located upstream of the first struc- 
tural gene of the cluster, allows coordinate regulation 
of the levels of the histidine biosynthetic enzymes in 
response to the changing of histidyl tRNA. Two prom- 
inent features of the leader region of the his operon 
account for /zs-specific translational control of tran- 
scription termination, which is the essence of attenu- 
ation control: (1) a short coding region that includes 
numerous tandem histidine codons (7 histidine 
codons in a row of 16); and (2) overlapping regions 
of dyad symmetry that can fold into alternative 
secondary structures, one of which includes a rho- 
independent terminator. In the termination configur- 
ation, base pairing involves regions A and B, C and D, 
and E and F (Figure 3). The stable stem-loop struc- 
ture E:F followed by a run of uridylate residues 
constitutes a strong intrinsic terminator. In the anti- 
termination configuration, base pairing between B and 
C and between D and E prevents formation of the 
terminator, thus allowing transcriptional readthrough. 
The equilibrium between these alternative configur- 
ations is determined by the ribosome occupancy of the 
leader RNA, which in turn depends on the availability 
of charged histidyl tRNA. Low levels of the specific 
charged tRNA will cause ribosomes to stall on the 
leader region at the histidine codons, thereby disrupt- 
ing A:B pairing by masking region A. Under these 
circumstances, the antitermination configuration will 
be favored. Conversely, in the presence of high levels 
of charged histidyl tRNA, ribosomes will rapidly 
move through the histidine regulatory codons, there- 
by occupying both the A and B regions. Pairing 
between C and D and between E and F leads to pre- 
mature transcription termination. 
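The attenuation logic above can be sketched in a few lines of Python. This is a cartoon under stated assumptions, not a kinetic model of ribosome and polymerase coupling: the leader ORF is read from the Figure 3 sequence (ATG to TAG), the codon table covers only the codons in that ORF, and the charged-tRNA threshold is a hypothetical value in arbitrary units, since the text gives no quantitative cutoff.

```python
# Sketch of the his leader-peptide translation and the attenuation switch.
# Assumptions: the ORF is read from the Figure 3 sequence (ATG ... TAG); the
# codon table lists only the codons present in this ORF; the tRNA threshold
# is a hypothetical value in arbitrary units.

LEADER_ORF = "ATGACACGCGTTCAATTTAAACACCACCATCATCACCATCATCCTGACTAG"

CODON_TABLE = {
    "ATG": "Met", "ACA": "Thr", "CGC": "Arg", "GTT": "Val",
    "CAA": "Gln", "TTT": "Phe", "AAA": "Lys", "CAC": "His",
    "CAT": "His", "CCT": "Pro", "GAC": "Asp", "TAG": "End",
}


def translate(orf):
    """Translate codon by codon, stopping after the first stop codon."""
    peptide = []
    for i in range(0, len(orf) - 2, 3):
        residue = CODON_TABLE[orf[i:i + 3]]
        peptide.append(residue)
        if residue == "End":
            break
    return peptide


def longest_run(peptide, residue):
    """Length of the longest block of consecutive `residue` entries."""
    best = current = 0
    for aa in peptide:
        current = current + 1 if aa == residue else 0
        best = max(best, current)
    return best


def leader_configuration(charged_his_trna, threshold=0.5):
    """Toy switch: which stems form, and whether RNA polymerase terminates.

    Low charged His-tRNA: ribosomes stall on the His codons and mask region A,
    so B:C and D:E pair (antiterminator) and transcription reads through.
    High charged His-tRNA: ribosomes cover A and B, so C:D and E:F pair, and
    E:F followed by the U run acts as an intrinsic terminator.
    """
    if charged_his_trna < threshold:
        return ("B:C", "D:E"), "readthrough"
    return ("C:D", "E:F"), "termination"


peptide = translate(LEADER_ORF)
print("".join(peptide))                               # MetThrArgValGlnPheLysHisHisHisHisHisHisHisProAspEnd
print(len(peptide) - 1, longest_run(peptide, "His"))  # 16 7
print(leader_configuration(0.1))                      # (('B:C', 'D:E'), 'readthrough')
```

The translation confirms the 16-codon leader with its run of seven histidine codons; the switch function only encodes the qualitative rule that stalling at those codons favors the antiterminator.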

In addition to histidine, the system is also regu- 
lated by other molecules whose levels reflect the 
energetic and metabolic state of the cell. It has been 
previously mentioned that PRPP and ATP stimulate 
the activity of the first enzyme of the pathway, 
whereas AMP enhances the inhibitory effect of histidine
on this enzyme. Moreover, the alarmone
guanosine 5′-diphosphate 3′-diphosphate (ppGpp), the
effector of the stringent response, positively regulates 
his operon expression by stimulating transcription 
initiation at the level of the primary hisP1 promoter. 
Stimulation occurs under conditions of moderate 
amino acid starvation and in cells growing in minimal 
medium. In addition to hisP1, two weak internal pro- 
moters, designated hisP2 and hisP3, have been local- 
ized proximally to hisB and hisI, respectively. 
Although such internal promoters are quite common 
in large bacterial operons, the physiological signifi- 
cance of these genetic elements is controversial. 


[Figure 2: organization of the his genes in Bacteria (Escherichia coli, Salmonella typhimurium, Haemophilus influenzae, Azospirillum brasiliense, Streptomyces coelicolor, Lactococcus lactis, Mycobacterium tuberculosis, Bacillus subtilis), Archaea (Methanococcus jannaschii), and Eukarya (Saccharomyces cerevisiae). In E. coli and S. typhimurium the cluster reads hisG hisD hisC hisB hisH hisA hisF hisI; the yeast genes are scattered: HIS7 (chr. 2), HIS4 (chr. 3), HIS1 (chr. 5), HIS2 (chr. 6), HIS5 and HIS6 (chr. 9), HIS3 (chr. 15).]

Figure 2 Organization of the histidine genes in different organisms. The single gene encoding a bifunctional
enzyme, formerly known as hisIE, has been renamed hisI in E. coli. I have therefore used hisI for organisms with a single
gene and hisI and hisE for organisms with two independent genes. Another gene encoding a bifunctional enzyme, hisB,
is often split into two separate genes in different organisms. They are referred to as hisB proximal (hisBpx), encoding
the HOL-P phosphatase, and hisB distal (hisBd), encoding the IGP dehydratase. In Mycobacterium tuberculosis, a gene
encoding inositol monophosphatase, impA, is located between hisA and hisF. In Bacillus subtilis the structural gene
encoding the histidyl tRNA synthetase is located proximally to the biosynthetic his cluster.

[Figure 3: nucleotide sequence of the S. typhimurium his leader region from the transcription initiation site (+1) to hisG, with regions A to F marked. Leader peptide: MetThrArgValGlnPheLysHisHisHisHisHisHisHisProAspEnd; the hisG product begins MetLeuAspAsnThr.]

Figure 3 Features of the leader region of the his operon of Salmonella typhimurium. The nucleotide sequence of the
leader region from the transcription initiation site (+1) to the first structural gene (hisG) is reported. The amino acid
sequence of the leader peptide and of the amino-proximal region of the hisG gene product are indicated below the
nucleotide sequence. Solid lines above the nucleotide sequence correspond to regions (A to F) capable of forming
mutually exclusive secondary structures.

Although these promoters may be physiologically
unimportant and their presence merely fortuitous,
their presence in homologous genomic regions of 
related microorganisms supports their physiological 
relevance. They could reinforce the expression of dis- 
tal cistrons of large operons, thereby alleviating the 
effects of natural polarity. Alternatively, they could 
allow regulation of an operon in a noncoordinate fash- 
ion and cause differential expression of certain genes 
under specific growth conditions. Based on several 
features of the nucleotide sequence, the internal pro- 
moters, as well as the primary hisP1 promoter, belong 
to the Eσ70 class of promoters. 

Transcription of the his operon is also modulated at 
the level of intracistronic rho-dependent terminators 
by a nonspecific mechanism operating during the 
elongation step. Terminators account for the polarity 
exhibited by several nonsense and frameshift muta- 
tions. Polarity is a phenomenon observed in poly- 
cistronic operons, by which certain mutations that 
prematurely arrest translation not only affect the 
gene in which they occur, but also reduce the expres- 
sion of downstream genes. Although polarity was 
first described in the lactose system, the coordinate 
effect of polar mutations on downstream gene expres- 
sion and the existence of polarity gradients were 
defined with precision in the his system by using a 
large collection of polar mutations. The phenomenon 
of polarity has been explained by postulating the 
existence of cryptic intracistronic rho-dependent ter- 
minators. According to a general model of transcrip- 
tional polarity, premature arrest of translation would 
favor the binding of rho to the nascent transcript via 
cytosine-rich and guanosine-poor regions. Using the 
energy of ATP hydrolysis, rho moves along the nas- 
cent transcript, overtakes elongating RNA polymer- 
ase, and precipitates release of the transcript. The 
physiological significance of rho-dependent intracistronic
termination is presumably to prevent further
elongation of nontranslated or infrequently translated
transcripts.

Finally, it has recently been documented that
posttranscriptional events contribute substantially to his
operon expression. In S. typhimurium and E. coli, the 
unstable native 7300 nucleotide-long polycistronic his 
message is degraded with a net 5’ to 3’ directionality, 
generating products that decay at different rates. The 
decay process generates three major processed species, 
6300, 5000, and 3900 nucleotides in length (Pr1, Pr2,
and Pr3), that encompass the last seven, six, and five
cistrons, respectively, and have increasing half-lives (5,
6, and 15 min, respectively). RNase E controls the
decay of the native transcript. Active translation of
the 5′-end-proximal cistrons of the processed Pr1
and Pr2 species is required to temporarily stabilize 
these species. The overall process of decay may have 
functional relevance in balancing the expression of the
promoter-proximal and the promoter-distal genes. The
most distal processed species (Pr3, 3900 nucleotides) has
a half-life of about 15 min. The specific processing
event leading to production of this species is mechan- 
istically complex. It requires sequential cleavage by 
two endoribonucleases, RNase E and RNase P. 
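A quick calculation shows how these half-lives could favor the promoter-distal cistrons. The sketch below assumes simple first-order (exponential) decay at the stated half-lives; the text notes that Pr1 and Pr2 stability depends on active translation, so a constant rate is a simplification. Under that assumption, after 15 min about half of Pr3 but only an eighth of Pr1 would remain.

```python
import math

# Half-lives (min) of the processed his mRNA species, as given in the text.
HALF_LIVES_MIN = {"Pr1": 5.0, "Pr2": 6.0, "Pr3": 15.0}


def decay_constant(half_life_min):
    """First-order rate constant k (per min): k = ln 2 / t_half."""
    return math.log(2) / half_life_min


def fraction_remaining(t_min, half_life_min):
    """Fraction left after t_min of first-order decay: exp(-k t) = 2**(-t / t_half)."""
    return math.exp(-decay_constant(half_life_min) * t_min)


for species, t_half in HALF_LIVES_MIN.items():
    print(f"{species}: k = {decay_constant(t_half):.3f} per min, "
          f"fraction left after 15 min = {fraction_remaining(15, t_half):.3f}")
```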

As discussed above, the regulation of his operon 
expression in E. coli and S. typhimurium has been the 
subject of intensive studies and the general mech- 
anisms and molecular details of the process are fairly 
well established. On the other hand, very few studies 
in this area have been performed with other prokaryo- 
tic cells. In general, it seems that while the bio- 
chemical reactions leading to histidine biosynthesis 
are the same in all organisms, the overall genomic 
organization, the structure of the his genes, and the
mechanisms by which the pathway is regulated differ
widely among taxonomically unrelated groups. For
these topics, interested readers are referred to
specialized reviews.
Histocompatibility is required for one individual to 
accept tissue grafts from another individual. It has 
long been recognized that successful blood transfusion
and tissue transplantation depend on matching
donor and recipient red blood cells. This led Gorer to
the identification of a group of antigens in mice which, 
when matched between donor and recipient animals, 
greatly improved the success of a tissue graft. These 
antigens are known as histocompatibility antigens. 
Different antigens are recognized by different T cell 
types. For example, in man, cytotoxic T cells involved 
in the recognition of noncompatible tissue grafts and/or
virally infected cells recognize HLA-A and HLA-B
antigens on the surface of foreign cells and, in
conjunction with T cells, will destroy the foreign cells.


See also: Antigen; Major Histocompatibility 
Complex (MHC) 


Histocompatibility 
Background 


Histone genes were among the first eukaryotic genes 
to be characterized. Their cloning and isolation in the 
1980s was facilitated by their repetition in metazoans, 
their small size, the abundance of their mRNAs, and 
the early sequence characterization of the histone pro- 
teins. Interest in histone genes derives from their regu- 
lated transcription, the control of histone mRNA 
stability, and the regulation of histone mRNA 3’ pro- 
cessing. The histone genes provide a paradigm for the 
study of DNA replication (S-phase)-dependent tran- 
scription. They have also been exceptionally useful in 
the investigation of the determinants of tissue-specific 
and embryo stage-specific transcriptional control. 
Pioneering studies on the assembly of specialized 
architectures within chromatin made effective use of 
histone gene sequences. Research in the 1990s has led 
to recognition of the importance of histone protein 
sequence in the packaging of DNA for transcription, 
replication, recombination, and repair, together 
with the maintenance of chromosome stability and 
chromosome segregation. The histone genes have 
been subjected to an extensive mutational analysis,
and many researchers have investigated the
consequences of histone gene ablation, deletion, and
point mutation for DNA metabolism.


Histone Gene Organization 


Some simple eukaryotes such as the budding yeast 
Saccharomyces cerevisiae have only two copies of 
each gene encoding the core histones H2A, H2B, 
H3, and H4. This lack of diversity and low copy 
number greatly expedite mutational analysis (see 
below). In metazoans, the genes for all four core his- 
tones are normally clustered together and tandemly 
repeated 5 to 20 times. For example, Xenopus laevis, 
the clawed frog, has two predominant types of tan- 
demly repeated clusters that differ in the precise gene 
arrangement and in the presence of particular
linker histone H1 genes. The regulatory DNA
and coding sequence for each core histone gene with- 
in each cluster occupy less than 1 kb. Each cluster 
generally occupies less than 10 kb and appears to 
possess the capacity to assemble a unique regulatory 
nucleoprotein complex within chromatin. The vast 
majority of core histone genes are found clustered 
together. This organizational strategy is likely to facili- 
tate coordinate expression. The clustered majority of 
core histone genes are almost invariably expressed as a 
cohort during S-phase. These replication-dependent 
genes lack introns and utilize a specialized processing 
mechanism for generating their 3’ ends that is distinct 
from polyadenylation. A smaller group of core his- 
tone genes are not primarily regulated in response 
to cell-cycle signals, but are either constitutively ex- 
pressed at low levels in somatic cells, or they can be 
expressed in differentiation-specific patterns during
metazoan development. These core histone variants,
encoded by the replication-independent histone genes,
can accumulate to very high levels only in cells that
have ceased to divide. Nondividing cells have also
stopped synthesizing the
replication-dependent histones. This facilitates the 
replacement of replication-dependent histones by 
replication-independent variants, especially on DNA 
sequences at which regulated chromatin disruption 
might occur during transcription and repair. The 
replication-independent histone genes differ from the
replication-dependent genes in the cis-acting elements
controlling promoter activity. Replication-independent
genes can also have introns and can be
polyadenylated. Thus, the replication-independent 
histone genes look much more like normal genes 
transcribed by RNA polymerase II. They are also 
normally not present in the large clusters. These dif- 
ferences can also be extended to the linker histone 
genes. The normal histone H1 somatic gene in Xeno- 
pus is found in a cluster with the core histone genes, 
lacks introns, and is transcribed in S phase. In contrast, 
the specialized maternal linker histone B4 gene is tran- 
scribed throughout oogenesis in the absence of repli- 
cation and contains introns. The contrast between 
replication-independent and replication-dependent
histone genes serves to emphasize the many unusual
features of specialized organization and control
utilized by the replication-dependent genes to ensure
very high expression at a single time in the cell cycle. 


Transcriptional Control of Histone 
Genes 


Replication-dependent core histone gene transcrip- 
tion is generally regulated through a three- to tenfold 
range during the cell cycle. Control is mediated by cis- 
acting elements that are within 200 bp of the start site 
of transcription. In S. cerevisiae the histone H2A and 
H2B genes share common regulatory elements with 
other genes controlled by the cell cycle, including the 
HO endonuclease (see below). Negative and positive 
regulatory elements have been identified. In humans, 
the H2B gene is regulated by three elements: the 
TATA box, an octamer motif (ATTTGCAT), and a 
distal activating domain including the CCAAT box. 
The TATA box is recognized by the basal transcrip- 
tional machinery including TFIID, the octamer motif 
is recognized by the ubiquitous octamer-binding tran- 
scription factor (OTF-1), and the proteins binding the 
distal activating domain have not yet been fully 
characterized. However, the constitutive activator 
NF-Y is an excellent candidate for interaction with 
the CCAAT box in vertebrate cells. In the replication- 
dependent H4 promoters, the octamer motif is 
replaced by other regulatory elements shared with 
replication-dependent linker histone gene promoters. 
The molecular definition of the transcription factors 
binding to these sites is still at a rudimentary stage of 
development. It appears that the core histone genes 
utilize a diverse group of constitutively expressed 
transcription factors to control transcription. It is 
probable that their expression is coordinated through 
the recruitment of common transcriptional coactiva- 
tors such as the p300/CBP protein. Consistent with 
this hypothesis is the observation that the Drosophila 
and Xenopus core histone genes are assembled into 
specific regulatory nucleoprotein architectures inde- 
pendent of cell-cycle-regulated transcription. This 
result demonstrates that the regulatory DNA for the 
histone genes is always occupied by the DNA-binding 
transcription factors, and that it is the efficiency with 
which these preassembled complexes recruit RNA 
polymerase II that is regulated. 
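The cis-acting elements described above are short sequence motifs, so locating them is a string search on both DNA strands. The sketch below scans for the octamer motif (ATTTGCAT) and the CCAAT box named in the text. It is illustrative only: the promoter string is invented (hypothetical), and exact matching is a simplification, since real transcription factor sites tolerate mismatches.

```python
# Minimal exact-match scan for two promoter elements named in the text.
# The promoter sequence below is invented for illustration; it is not a
# real histone H2B promoter.

MOTIFS = {
    "octamer": "ATTTGCAT",   # octamer motif recognized by OTF-1
    "CCAAT box": "CCAAT",    # candidate NF-Y site
}

COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq):
    """Reverse complement of an uppercase A/C/G/T string."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def find_motif(seq, motif):
    """Return (0-based start, strand) for every exact match on either strand.

    A reverse-strand hit is reported at the position where the reverse
    complement of the motif starts on the given (top) strand.
    """
    seq = seq.upper()
    rc = reverse_complement(motif)
    hits = []
    for i in range(len(seq) - len(motif) + 1):
        window = seq[i:i + len(motif)]
        if window == motif:
            hits.append((i, "+"))
        elif window == rc:  # elif avoids double-counting palindromic motifs
            hits.append((i, "-"))
    return hits


HYPOTHETICAL_PROMOTER = "GGCCAATCAGCGTTATTTGCATAAGCGATGCAAATCCTATAAAAGG"

for name, motif in MOTIFS.items():
    print(name, find_motif(HYPOTHETICAL_PROMOTER, motif))
```

Checking the reverse complement matters because a site such as the octamer can occur in either orientation relative to the transcription start.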

The regulation of the replication-independent and 
differentiation-specific core and linker histone genes 
is more complex. The promoters of these genes, such 
as the oocyte-specific histone B4 gene in Xenopus or 
the erythroid-specific histone H5 gene in the chicken, 
depend upon specific regulatory factors for transcrip- 
tional activation in particular tissues. For example, 
the accumulation of histone H5 protein in avian 
erythrocytes occurs during the differentiation of 
the erythroid cell, correlating with the shutdown
of replication and a decrease in transcriptional
activity. The accumulation of histone H5 mRNA is
predominantly controlled at the transcriptional level.
Erythroid-specific and ubiquitous elements control
expression of this gene in erythroid lineages. The
activity of the gene is low in early erythroid precursors
and rises as differentiation proceeds. Activation during
erythropoiesis depends on three enhancers, two of
which lie upstream and one downstream of the
transcription start site. The tissue specificity of these
enhancers is related to the presence of
several sites for an erythroid-specific transcription 
factor GATA-1. However, the activity of GATA fac- 
tors alone cannot account for the activation of H5 
gene expression and ubiquitous transcription fac- 
tors seem also to play a central role in this process. 
The proximal promoter region of H5 contains a 
segment showing extensive similarity with a region 
of the H4 gene proximal promoter. A positive
transcriptional regulatory element has been identified
in this region that specifically binds the histone
gene-specific factor, H4TF2, in proliferative cells. 
However, it does not seem to be essential for the 
activity of the gene in differentiated cells. In contrast, 
a neighboring GC-rich sequence element is required 
for gene activity in both the proliferative precursors 
as well as in the early stages of cell differentiation. 
Finally, the basal transcription of this promoter 
seems to involve sequences located downstream of 
the initiation site. 


Posttranscriptional Control of Histone 
mRNA 


At the end of S-phase when DNA replication stops, 
the half-life of the replication-dependent histone 
mRNAs decreases from 30-60 min to 10-15 min. 
This destabilization of histone mRNA depends on 
the regulated association of proteins with the 3’ ter- 
minus of histone mRNA. A stem-loop structure in the 
3’ terminus controls the processing, nucleocytoplasmic 
export of histone mRNA, translational efficiency, and 
mRNA stability. Exactly how this is accomplished is 
unknown. There is also a possible autoregulatory con- 
tribution to the regulation of mRNA abundance, since 
individual core histones and linker histones have been 
reported to induce the destabilization of histone 
mRNA in vitro.
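To see what the change in half-life means in practice, assume first-order decay, so that the fraction f of a histone mRNA left after time t is set by its half-life. The worked comparison below (an illustration, not data from the text) uses the half-lives quoted above over a 30 min interval:

```latex
\begin{aligned}
f(t) &= 2^{-t/t_{1/2}} \\
\text{during S phase } (t_{1/2} = 30\text{--}60\ \text{min}):\quad
  f(30\ \text{min}) &= 2^{-30/60} \approx 0.71 \ \text{to}\ 2^{-30/30} = 0.50 \\
\text{after S phase } (t_{1/2} = 10\text{--}15\ \text{min}):\quad
  f(30\ \text{min}) &= 2^{-30/15} = 0.25 \ \text{to}\ 2^{-30/10} = 0.125
\end{aligned}
```

Under this assumption, at least half of the histone mRNA survives a 30 min interval during S phase, but at most a quarter survives once replication stops.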


Importance of Histone Gene Sequences 
for Transcriptional Control in Eukaryotic 
Nucleus 


The core histones, H2A, H2B, H3, and H4, are 
among the most evolutionarily conserved of all 
eukaryotic proteins. They consist of two domains: 
a basic N-terminal domain and a histone-fold
C-terminal domain. The histone-fold domain has two
defined functions: it heterodimerizes with a second 
histone — H3 with H4, H2A with H2B — and, once 
heterodimerized, it wraps DNA in the nucleosome. 
The basic N-terminal ‘tail’ domains lie outside the 
nucleosome and do not have any defined structure. 
Although extensive protein-protein and protein- 
DNA interactions can potentially explain the seq- 
uence conservation of the histone-fold domains, the 
N-terminal tails of histones H3 and H4 show compar- 
able conservation from yeast to man. The reasons 
for this conservation have been enigmatic, but two 
nonexclusive explanations have been proposed. 

The first suggested explanation is that the H3 and 
H4 N-terminal tails represent the sites at which signal 
transduction pathways impact on chromatin struc- 
ture. The N-terminal tails are known to be sites of 
histone phosphorylation, acetylation, and methy- 
lation, and these modifications are closely correlated 
with changes in the functional properties of chromatin. 
Sequence conservation at the N-terminus might be 
required to transduce the activities of various targeted 
and ubiquitous histone modification enzymes in- 
volved in chromatin assembly and transcription. The 
second suggested explanation is that the N-terminal 
tails might represent the sites of interactions between 
histones and regulatory proteins that have direct struc- 
tural and functional roles in the transcription process. 
Such specific interactions have now been shown to 
occur. Histone modifications are predicted not only 
to alter chromatin structure, but also the interactions 
between the N-terminal tails and histone-binding 
regulatory proteins. The first genetic experiments sug- 
gesting that the histone tails play a part in the regulation 
of specific eukaryotic genes concerned the establish- 
ment of silent mating-type loci in S. cerevisiae. 
Subsequent work has firmly established that the N- 
terminal tail domains of histones H3 and H4 are essen- 
tial for repression of the silent mating type loci, as well 
as of genes placed close to the telomeres in yeast. Tran- 
scriptional repression at these chromosomal sites also 
depends on the silent-information regulatory proteins 
SIR2, SIR3, and SIR4. SIR3 and SIR4 interact with each 
other and with the DNA-binding protein RAP1. 
Together, they direct the compartmentalization of 
yeast chromosomal telomeres to the vicinity of the 
nuclear envelope. 

Mutations in the N-terminal tail of histone H4 that 
alleviate silencing can be suppressed by single amino 
acid substitutions in SIR3, suggesting that the two 
proteins directly interact. Biochemical experiments 
have confirmed that SIR3 binds directly to the 
N-terminal tail of H4, and also to the N-terminal tail 
of H3. The data suggest that SIR4 interacts in a 
similar way with these two histones. The specificity 


of these interactions was demonstrated by the failure 
of either SIR3 or SIR4 to interact with the N-terminal 
domains of H3 and H4 are also required for the 
assembly of SIR3 into telomeric chromatin, and con- 
sequently for the association of the telomere with the 
nuclear envelope. 

A model for transcriptional silencing at yeast telo- 
meres predicts that RAP1 interacts with the telomeric 
repeats and recruits SIR3 and SIR4, which polymerize 
along nucleosomal arrays through interactions with the 
N-terminal tails of H3 and H4. At the silent mating 
type loci, a distinct repressive mechanism (yet to be 
definitively characterized) also leads to the recruit- 
ment of SIR3 and SIR4. This model proposes that 
transcriptional silencing is dependent on the assembly 
of an extended domain of repressive chromatin struc- 
ture, where transcription factors and RNA polymer- 
ase are excluded both by SIR3 and SIR4, and by the 
entrapment of this chromatin domain in a perinuclear 
compartment. 

The second set of experiments linking the histones
to the transcriptional regulation of specific genes
concerns the C-terminal histone-fold domain and 
the SWI/SNF general activator complex. A substantial 
component of transcriptional regulation is increas- 
ingly perceived to depend upon the interplay of tran- 
scription factors and histones at specific sites within 
the enhancers and promoters of eukaryotic genes. In 
the yeast S. cerevisiae, the outcome of this interaction 
is influenced by the products of the SWI1/ADR6, 
SWI2/SNF2, SWI3, SNF5, and SNF6 genes. All five 
of these proteins are found within a single ‘general 
activator’ complex, required for the transcriptional 
induction of many yeast genes. Genetic and bio- 
chemical studies of the yeast proteins and their larger 
eukaryotic homologs suggest that the general acti- 
vator complex serves as a molecular machine that 
functions to help transcription factors overcome the 
specific repressive effects of nucleosome assembly on 
transcription. 

In the early 1980s, Herskowitz and colleagues discovered
that mutations in a set of ‘SWItch’ genes (SWI1,
SWI2, and SWI3) reduce expression of the HO
gene, which encodes an endonuclease involved in yeast
mating-type switching. Simultaneous experiments by 
Carlson and colleagues defined sucrose nonfermenta- 
tion mutations of the genes SNF2, SNF5, and SNF6, 
which reduced expression of the SUC2 invertase gene. 
Both sets of mutations reduced target gene induction 
by two orders of magnitude; moreover, SWI2 was 
found to be identical to SNF2, suggesting that both 
the SWI and SNF gene products functioned through a 
common mechanism. 

Over the subsequent decade, a dozen other in- 
ducible genes were found to be dependent on SWI or 


SNF gene activities for transcriptional stimulation. 
More recent experiments have shown that the Dros- 
ophila fushi tarazu and bicoid gene products, mamma- 
lian steroid receptors, and yeast transcription factor 
GAL4 all stimulate transcription through mechanisms 
dependent on SWI/SNF activities. Drosophila, mouse, 
and human homologs of the SWI2/SNF2 subunit exist 
and have similar roles in facilitating transcriptional 
activation of a variety of genes. Taken together, these 
results clearly indicate that the general activator com- 
plex has a central role in the regulation of eukaryotic 
transcription, but how is this transcriptional activa- 
tion function exerted? 

A major clue to the molecular mechanism by which 
the general activator complex exerts its function came 
from a genetic screen for mutations of genes that 
would allow transcription of HO in the absence of 
SWI1. Two genes, SIN1 and SIN2, were identified 
that, when mutated, led to SWI-independent tran- 
scription. Both of the SIN genes isolated in this way 
encode components of chromatin. SIN1 is a highly 
charged nuclear protein, somewhat similar to mam- 
malian HMG1/2 proteins. The HMG1/2 proteins 
have been found to be associated with nucleosomes, 
most probably interacting with linker DNA. Every 
nucleosome contains 165-220 bp of DNA, of which 
146 bp are wrapped in 1.75 turns around the octamer 
of core histones in the nucleosome core. The additional 
DNA that lies between nucleosome cores is the linker 
DNA. Linker histones (such as H1, H5, and H1°) 
normally bind to linker DNA; however, in certain
circumstances, they may be replaced by HMG1/2.
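The figures above fix the length of linker DNA by simple subtraction, and also give the approximate amount of DNA in each superhelical turn around the octamer:

```latex
\begin{aligned}
\text{linker DNA} &= \text{DNA per nucleosome} - \text{core DNA} \\
&= (165\text{--}220~\text{bp}) - 146~\text{bp} = 19\text{--}74~\text{bp} \\
\text{DNA per superhelical turn} &\approx \frac{146~\text{bp}}{1.75~\text{turns}} \approx 83~\text{bp}
\end{aligned}
```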

A more direct association with nucleosomal struc- 
ture is found for SIN2, which encodes histone H3. 
Kruger, Peterson, Herskowitz, and colleagues also 
identified SIN alleles of the H4 gene, after reintroduction
in vivo of the in vitro mutagenized gene. The
location of the amino acid changes in histone H3 and
H4 that lead to the SIN phenotype offers additional
insight into potential roles for the general activator 
complex. However, in order to appreciate the struc- 
tural significance of the mutations, it is important to 
know their position within the nucleosome. The 
carboxy-terminal histone-fold domains of each core 
histone are predominantly α-helical, with a long central
helix bordered on each side by a loop segment and
a shorter helix. Each of the loop segments has some
β-strand character. Histone dimerization leads to the
loop segments from each half of the dimer being
paired to form eight parallel β-bridge segments, two
of which are found within each of the histone heterodimers
(H3 with H4, H2A with H2B). Each β-bridge segment
is associated with at least two positively charged
amino acids, which are available to make contact with
DNA on the surface of the histone octamer.
The second repeating motif within the nucleosome is
assembled from the pairing of the amino-terminal end
of the first helical domain of each of the histones in
the heterodimers. These four ‘paired-ends-of-helices’
motifs also appear to contact DNA. Thus, each of the
four heterodimers within the core can make at least
three pseudosymmetrical, contiguous contacts with
three inward-facing minor grooves of DNA. The eight
parallel β-bridges and four paired-ends-of-helices provide
12 potential DNA-contact sites that are regularly
arranged along the ramp on which the double helix is
wound. The SIN mutants in histones H3 and H4
cluster in one β-bridge motif within the heterodimer.
Because of the juxtaposition of two (H3, H4) hetero- 
dimers at the dyad axis of the nucleosome, the SIN 
mutations have the potential to disrupt histone-DNA 
interactions involving the central turn of DNA at the 
dyad axis. This could have a major impact on the 
integrity of both the nucleosome and higher order 
chromatin structures. 

These two examples of transcriptional regulation 
have in common the highly selective recognition of 
individual core histones by a variety of regulatory 
proteins. These interactions can be targeted by 
sequence-specific DNA-binding proteins, and pro- 
vide an explanation for the highly selective activation 
or repression of particular genes following mutation 
of individual histones. The inclusion of histones as 
architectural components within regulatory nucleo- 
protein complexes further strengthens the evidence 
for their essential role in eukaryotic transcription. 
The reasons for the conservation of the primary 
sequence of the core histones and their genes thus go 
beyond merely conserving the internal architecture of 
the nucleosome, and include the functional require- 
ment of conserving interactions with the regulatory 
proteins that modulate chromatin function. These 
results also suggest that novel families of proteins 
remain to be defined that will contain conserved 
regions capable of specifically recognizing histone 
domains both outside and inside the nucleosome. 
Defining the nature of these proteins that truly ‘hang 
on’ to the histones will offer much insight into how 
regulatory events occur within chromosomes. 
Histones are conserved proteins found in the nuclei 
of all eukaryotic cells where they are complexed to 
DNA forming the nucleosome, the basic subunit of 
chromatin. Histones are of relatively low molecular 
weight and are basic, owing to their high arginine/ 
lysine content. 


See also: Chromatin 
DNA, the genetic material, is packaged into chromo- 
somes ranging in length from thousands to many tens 
of millions of base pairs. Now consider the fate of two 
independent mutations that have occurred by chance 
on one specific copy of a chromosome in a population. 
Imagine one of these mutations is a selectively favor- 
able mutation that natural selection will increase in 
frequency in the population each generation (see Select- 
ive Sweep). The other mutation on this same chromo- 
some is a selectively neutral mutation (see Neutral 
Mutation), one whose fate will be governed under 
normal circumstances by genetic drift (see Genetic 
Drift). Its association with the favorable mutation on 
the same chromosome guarantees that as the adaptive 
mutation increases in frequency by natural selection, 
so the ‘linked’ neutral mutation will also determin- 
istically increase in frequency. The ‘hitchhiking effect’ 


is the associated change in frequency of a nonselected 
mutation resulting from its physical linkage to a dif- 
ferent mutation under selection on the same chromo- 
some. 

The magnitude of genetic hitchhiking depends on
the recombination rate between the mutations under
consideration: the lower the rate, the stronger the
effect. The animal mitochondrial
genome, for example, a maternally inherited circular 
genome, is expected to be particularly susceptible to 
hitchhiking events because it is a nonrecombining 
genome. In one species of the fruit fly, Drosophila 
simulans, a maternally inherited microorganism, called 
Wolbachia, has a mechanism by which it provides a 
strong selective advantage to females carrying the 
infection when they are introduced into a population 
without the infection. This strong selective advantage 
and maternal inheritance of both the advantageous 
bacteria and the mitochondrial genome has been 
shown to cause the mitochondrial variant in the 
infected female to increase in frequency as it hitchhikes 
up along with the frequency of Wolbachia infection. 
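The dependence of hitchhiking on recombination can be made concrete with a deterministic two-locus model. The sketch below is a minimal illustration under stated assumptions, not a method taken from this entry: a haploid population with no drift, a favored allele A with selective advantage s that arises only on chromosomes carrying a neutral allele B, and recombination at rate r applied after selection each generation. The function name `hitchhike` and its default parameter values are hypothetical.

```python
def hitchhike(s, r, p_a0=0.001, p_b0=0.1, target=0.999, max_gen=100_000):
    """Final frequency of neutral allele B once favored allele A reaches `target`.

    Toy haploid two-locus model (no drift, hypothetical defaults): selection on A/a,
    then recombination between the A/a and B/b loci at rate r. A starts on the B background.
    """
    if not 0 < p_a0 <= p_b0 < 1:
        raise ValueError("need 0 < p_a0 <= p_b0 < 1")
    # Haplotype frequencies in the order AB, Ab, aB, ab
    x = [p_a0, 0.0, p_b0 - p_a0, 1.0 - p_b0]
    w = [1.0 + s, 1.0 + s, 1.0, 1.0]  # fitness depends only on the A/a locus
    for _ in range(max_gen):
        if x[0] + x[1] >= target:
            break
        # Selection: weight each haplotype by its fitness and renormalize
        w_bar = sum(wi * xi for wi, xi in zip(w, x))
        x = [wi * xi / w_bar for wi, xi in zip(w, x)]
        # Recombination: linkage disequilibrium D is reduced to (1 - r) * D
        d = x[0] * x[3] - x[1] * x[2]
        x = [x[0] - r * d, x[1] + r * d, x[2] + r * d, x[3] - r * d]
    return x[0] + x[2]  # frequency of B = AB + aB


for r in (0.0, 0.01, 0.5):
    print(f"r = {r}: final frequency of B = {hitchhike(0.1, r):.3f}")
```

With no recombination, B is carried almost to fixation along with A; as r grows, the final frequency of B should fall back toward its starting value of 0.1, which is the contrast drawn above between a nonrecombining genome and loosely linked sites.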

A curious feature of genetic hitchhiking accompanying the fixation of a selectively favored mutation (see Selective Sweep) is that other mutations within a tightly linked interval spanning the site under positive selection will also all go to fixation (if they lie on the chromosome carrying the favored mutation) or to extinction (if they do not), provided that no recombination event separates them from the selected site while the favored mutation is going to fixation. Therefore, one telltale sign of a selective sweep of a favorable mutation is a region of the genome with lower than expected levels of nucleotide polymorphism in a population sample. Several such signatures of a selective sweep have been reported in this manner, especially in Drosophila.
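A reduced level of polymorphism is usually quantified as nucleotide diversity (π), the average proportion of sites that differ between pairs of sequences in a sample. The sketch below computes π in sliding windows along an alignment. It is an illustrative assumption rather than the method used in the studies referred to above: a real sweep scan would also need a neutral expectation to compare against, for example to control for variation in mutation rate. The function names are hypothetical, and the sequences are assumed to be aligned, of equal length, and free of gaps or missing data.

```python
from itertools import combinations


def nucleotide_diversity(seqs):
    """Mean pairwise proportion of differing sites (pi) for aligned, equal-length sequences."""
    n_sites = len(seqs[0])
    pairs = list(combinations(seqs, 2))
    diffs = sum(sum(a != b for a, b in zip(s1, s2)) for s1, s2 in pairs)
    return diffs / (len(pairs) * n_sites)


def windowed_pi(seqs, window, step):
    """Return (start, pi) for each window along the alignment."""
    length = len(seqs[0])
    return [
        (start, nucleotide_diversity([s[start:start + window] for s in seqs]))
        for start in range(0, length - window + 1, step)
    ]


# Toy alignment: the first window is invariant, the second is polymorphic
alignment = ["ACGTACGT", "ACGTACGA", "ACGTTCGA"]
print(windowed_pi(alignment, window=4, step=4))
```

A window whose π is well below that of its neighbors is the kind of pattern the text describes, although on its own it does not distinguish a sweep from other causes of low diversity.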

A second type of hitchhiking is also possible, and it involves the hitchhiking to extinction (rather than to fixation) of mutations that are linked to a selectively deleterious mutation, i.e., one that is doomed to be eliminated from the population by natural selection. When a deleterious mutation arises in a population, it is generally eliminated, but this elimination often requires tens or hundreds of generations to complete. During this time, any other mutation that also arises on this doomed chromosome will also be eliminated in due course, unless it is strongly advantageous or is linked loosely enough to recombine away. Under this scenario, called ‘background selection,’ only those chromosomes in the population that carry no deleterious mutation will contribute to the future ancestry of the population. Theory shows that this fraction of unmutated chromosomes is approximately $f(0) = e^{-u/s}$, where $u$ is the deleterious mutation rate and $s$ is the selective disadvantage of the mutation. In Drosophila, certain regions of a chromosome have much lower recombination rates, measured as recombination rate per kilobase of DNA, than other regions of the same chromosome. In these regions, background selection is predicted to reduce the standing level of neutral variation to a fraction $f(0)$ of its value in the absence of background selection. In fact, strong reductions in variation have consistently been found in these low-recombining regions of the genome, providing a modicum of support for the prevalence of background selection (but see Selective Sweep for an alternative explanation for this observation).
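The background-selection formula above is easy to evaluate directly. The following sketch computes f(0) = exp(-u/s) and, as a simplifying assumption, treats it as a purely multiplicative reduction of the neutral diversity θ expected without background selection. The function names and the example values of u and s are hypothetical.

```python
import math


def background_selection_fraction(u, s):
    """f(0) = exp(-u/s): expected fraction of chromosomes free of deleterious mutations."""
    if u < 0 or s <= 0:
        raise ValueError("need u >= 0 and s > 0")
    return math.exp(-u / s)


def diversity_under_background_selection(theta, u, s):
    """Neutral diversity theta scaled by f(0) (assumed to act as a simple multiplicative reduction)."""
    return theta * background_selection_fraction(u, s)


# Hypothetical values: s = 0.02 and a range of deleterious mutation rates u
for u in (0.01, 0.05, 0.1):
    print(f"u = {u}: f(0) = {background_selection_fraction(u, 0.02):.4f}")
```

Because f(0) depends on the ratio u/s, a larger deleterious mutation rate in a block of nonrecombining sequence predicts a proportionally larger loss of linked neutral variation.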


Further Reading 

Charlesworth B (1996) Background selection and patterns of genetic diversity in Drosophila melanogaster. Genetical Research 68(2): 131-149.

Kaplan NL, Hudson RR and Langley CH (1989) The “hitchhiking effect” revisited. Genetics 123: 887-899.

Kim Y and Stephan W (2000) Joint effects of genetic hitchhiking and background selection on neutral variation. Genetics 155(3): 1415-1427.
See also: Neutral Mutation; Selective Sweep
Hodgkin’s disease (HD) is a collection of disparate lymphoid diseases, defined histologically by the presence of multinucleated Hodgkin or Reed-Sternberg (H/RS) cells. The first eponym derives from the postmortem description of six cases with lymphadenopathy and splenomegaly by Thomas Hodgkin at Guy’s Hospital, London, in 1832. The H/RS cells were described by Sternberg in 1898 and by Dorothy Reed in 1902. There are four distinct histological subtypes: nodular sclerosing (NS), mixed cellularity (MC), lymphocyte depleted (LD), and lymphocyte predominant (LP). NS is the most common and is found mainly in young adults. The LP subtype is distinct, lacking H/RS cells and having instead populations of large ‘lymphocyte and histiocytic’ or L&H cells, which derive from mature B cells. In contrast, the H/RS cells of the other histological subtypes express molecules associated with a number of hemopoietic lineages, including T-cell antigens such as CD2 and CD4, myeloid antigens such as CD15, and B-cell antigens.

The etiology of HD remains unknown, and given the wide differences in histological appearance, the etiology of each subtype is likely to be distinct. Familial clustering of HD has been reported. Whether this represents a common genetic predisposition and/or exposure to some common environmental agent is not clear. Epstein-Barr virus (EBV) may play a role in some cases, as some H/RS cells contain EBV genomes and express LMP1, which is known to have oncogenic potential in B cells. Cytogenetic analysis of primary material and derived cell lines has shown no recurrent abnormalities; HD-derived cell lines are notable for their cytogenetic complexity. Recently, comparative genomic hybridization (CGH) studies have shown gains of chromosome 2p13 and high-level amplification of chromosome regions 4p16, 4q23-q24, and 9p23-p24. The last region, which is also amplified in mediastinal B-cell lymphomas, contains JAK2.

The H/RS and L&H cells represent the malignant cells of HD. The major problem in studying HD is that these malignant cells comprise only a small subset of the tumor, often less than 1%, with the remainder composed of infiltrating reactive T cells, B cells, neutrophils, and fibrotic tissue. For a long time the cell of origin of both H/RS and L&H cells remained unknown. To overcome this problem, microdissection followed by amplification of RNA and DNA has been used. Using these techniques, along with high-throughput sequencing of HD cDNA libraries (http://www.hodgkins.georgetown.edu./) and gene profiling methods, the origins and pathophysiology of HD are being revealed. Analysis of microdissected H/RS cells from patients with NS HD has shown that these cells not only carry rearranged immunoglobulin heavy chain (IGH) gene segments but also have mutations consistent with their exposure to antigen in the germinal center of the lymph node. Furthermore, analysis of rare patients with concurrent HD and B-cell lymphoma showed the same clonal IGH rearrangements, with an overlapping pattern of somatic mutations within the variable region (VH) gene segments. Together, these data indicate a B-cell origin for at least some, if not all, H/RS cells.

Concerning the pathophysiology, studies of HD cell lines have revealed two features: first, constitutive activation of NF-κB, and second, autocrine stimulation via IL-13. Nuclear NF-κB promotes cell survival through the transcriptional upregulation of a number of antiapoptotic genes. In most normal cells, however, NF-κB is retained in the cytoplasm by inhibitory (IκB) proteins. Concurrent deletion and mutation of IκBα alleles, resulting in protein truncation and loss of inhibitory activity, has been reported in a number of cell lines and primary cases. Second, constitutive IL-13 secretion has been detected by gene profiling of HD lines and shown in primary material by in situ hybridization. Moreover, in one HD cell line, neutralizing antibodies to IL-13 blocked proliferation, suggesting that IL-13 might be a new therapeutic target in some cases of HD.


Further Reading 

Jarrett RF and MacKenzie J (1999) Epstein-Barr virus and other 
candidate viruses in the pathogenesis of Hodgkin’s disease. 
Seminars in Hematology 36: 260-269. 

Joos S, Kupper M, Ohl S et al. (2000) Genomic imbalances 
including amplification of the tyrosine kinase gene JAK2 in 
CD30+ Hodgkin cells. Cancer Research 60: 549-552. 

Rose M (1981) Curator of the Dead: Thomas Hodgkin (1798-1866). London: Peter Owen.

Staudt LM (2000) The molecular and cellular origins of Hodgkin’s disease. Journal of Experimental Medicine 191: 207-212.


Also known as the ‘TATA box,’ the Hogness box is an 8-bp AT-rich promoter sequence in eukaryotes and Archaea that is the binding site for the TATA-box binding protein (TBP), a subunit of the TFIID initiation factor in metazoans. TBP functions as an initiation factor without additional TBP-associated factors in Archaea and at many promoters in Saccharomyces cerevisiae. The first base of the sense-strand consensus sequence T-A-T-A-T/A-A-T/A-N lies approximately 30 bp upstream of RNA polymerase II transcription start sites in metazoans, Archaea, and some fungi. In S. cerevisiae, the Hogness (TATA) box occurs ~90 bp upstream of the transcription start site. The match to the consensus sequence, which determines the affinity for TBP, is an important determinant of promoter strength.
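As an illustration of how the consensus can be used, the sketch below scans a DNA sequence for exact matches to T-A-T-A-T/A-A-T/A-N with a regular expression and keeps those whose first base lies about 30 bp upstream of a given transcription start site. The 25-35 bp window, the function name, and the use of exact consensus matches (rather than the scoring matrices that promoter-analysis tools often use) are assumptions made for the example.

```python
import re

# T-A-T-A-T/A-A-T/A-N as a regular expression; the lookahead lets overlapping matches be found
TATA_CONSENSUS = re.compile(r"(?=(TATA[AT]A[AT][ACGT]))")


def find_tata_boxes(seq, tss, min_up=25, max_up=35):
    """Return (start, match) for consensus matches whose first base is min_up..max_up bp upstream of tss.

    Positions are 0-based indices into seq; tss is the index of the transcription start site.
    """
    seq = seq.upper()
    hits = []
    for m in TATA_CONSENSUS.finditer(seq):
        upstream = tss - m.start()
        if min_up <= upstream <= max_up:
            hits.append((m.start(), m.group(1)))
    return hits


promoter = "G" * 10 + "TATAAAAG" + "C" * 30  # toy sequence; the TSS is placed at index 40
print(find_tata_boxes(promoter, tss=40))  # [(10, 'TATAAAAG')]
```

For S. cerevisiae promoters, the window would have to be shifted and widened around the ~90 bp distance mentioned above.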


See also: Consensus Sequence; Promoters 


Holliday Junction 
P J Hastings 


Copyright © 2001 Academic Press 
doi: 10.1006/rwgn.2001.1429 


A Holliday junction is the structure formed by the exchange of single DNA strands between two homologous DNA molecules. The structure is named after Robin Holliday, who first proposed it in 1964 (Holliday, 1964). Figure 1 shows how the two molecules are held together by the presence of hybrid DNA, that is, DNA formed with one strand from one parental molecule and the other strand from the other parent. Physical models of DNA show that it can adopt this structure without strain and with all bases remaining paired.

Figure 1 (i) A Holliday junction with hybrid DNA on both molecules. Single strands of one parent are distinguished from the other parent by the thickness of the line. Hybrid DNA can be seen as thick and thin strands within the same molecule. (ii) Migration of the Holliday junction toward the left (small arrow) has extended the two lengths of heteroduplex. (iii) Cleavage of the junction by resolvase from the structure (ii), cutting the crossing strands, yields two molecules with parental combinations of markers, A and B or a and b, although each includes a length of hybrid DNA. (iv) Rotation of the upper arms, shown by the circle, shows the same structure from a different point of view. (v) A further rotation, this time of the two arms on the right (shown by a circle), reveals the alternative isomer. (vi) Cleavage of the isomer in (v) by cutting the crossing strands gives two molecules with the recombinant combination of the markers, A and b or a and B. Each has a length of hybrid DNA.

The Holliday junction is central to recombination 
theory because it has three interesting properties. 
First, it can isomerize, i.e., take on an alternative 
structure (see Isomerization (of Holliday Junctions)). 
Second, it can migrate, leading to extension or short- 
ening of the lengths of hybrid DNA (see Branch 
Migration). Third, it can be resolved by a special class 
of enzymes (resolvases) that cut the structure symmet- 
rically to give two separate molecules (see Resolvase). 

Isomerization occurs spontaneously. Resolving the 
structure while in one isomer is expected to lead to 
crossing-over. In the other isomer, resolution restores 
the parental combination of flanking regions, but 
lengths of single strands have been exchanged. Migra- 
tion of a Holliday junction may be able to occur by 
random drift, but it is an enzyme-mediated process in 
Escherichia coli, where the RuvABC proteins acting 
together are able to catalyze both branch migration 
and resolution. 
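The relationship between the two isomers and the outcome of resolution can be summarized in a few lines of code. In this toy sketch the parental molecules carry flanking markers A-B and a-b; cutting the crossing strands of one isomer restores these combinations, while cutting them in the other isomer yields A-b and a-B. The probability that the junction is in the crossover-producing isomer when it is cut is a hypothetical parameter: the 0.5 default is only an assumption standing in for unbiased spontaneous isomerization, not a measured value.

```python
import random

# Flanking markers on the two parental molecules: (left marker, right marker)
PARENTS = (("A", "B"), ("a", "b"))


def resolve(isomer):
    """Flanking-marker combinations after cutting the crossing strands of the given isomer."""
    (left1, right1), (left2, right2) = PARENTS
    if isomer == "parental":
        return {(left1, right1), (left2, right2)}  # noncrossover products
    if isomer == "recombinant":
        return {(left1, right2), (left2, right1)}  # crossover products
    raise ValueError(f"unknown isomer: {isomer!r}")


def crossover_fraction(n, p_recombinant=0.5, seed=1):
    """Fraction of n simulated junctions that give crossover products when resolved."""
    rng = random.Random(seed)
    outcomes = [
        resolve("recombinant" if rng.random() < p_recombinant else "parental")
        for _ in range(n)
    ]
    return sum(o == resolve("recombinant") for o in outcomes) / n


print(resolve("parental"), resolve("recombinant"))
print(crossover_fraction(10_000))
```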


Reference 
Holliday R (1964) A mechanism for gene conversion in fungi. Genetical Research 5: 282-304.


See also: Branch Migration; Isomerization 
(of Holliday Junctions); Resolvase 
In 1964, Robin Holliday proposed the basic model of recombination by the formation of hybrid DNA coupled with correction of mismatched base pairs. In this model, recombination is initiated by cutting a single DNA strand (nicking) at identical positions on the like strands of two homologous DNA molecules, as shown in part (1) of Figure 1. Both of these strands become unwound from the nick (2) and anneal with the homolog, so that the two displaced strands change places, thereby forming hybrid DNA (3). The structure so formed is called a Holliday junction. The Holliday junction can migrate in either direction. If it migrates away from the site of initiation, hybrid DNA is extended on both DNA molecules (4). If the Holliday junction migrates toward the initiation site, the lengths of hybrid DNA diminish symmetrically.

Figure 1 Holliday’s model. Each line represents a single DNA strand. Thick and thin lines distinguish the DNA of two homologous molecules. Arrows on the strands indicate polarity. The figure is described in the text.

Holliday proposed that the interacting DNA mol- 
ecules could be separated by strand breakage at the 
Holliday junction. If the breaks occur on the inner 
strands at the positions marked p in (4), the two mol- 
ecules could separate (5) without recombination of 
markers flanking the event, and ligation would yield 
two noncrossover products in which lengths of single 
strands have been exchanged locally (7). If the outer 
strands labeled r in (4) are broken, as seen in (6), 
ligation of the ends would result in a crossover (8). 

If there is an allelic difference between the interacting molecules, the hybrid DNA will contain one or more mismatched base pairs or unmatched nucleotides (single-strand loops). Such a hybrid molecule is called a heteroduplex. Holliday proposed that a mismatch repair system would act on the mismatch in the heteroduplex to excise one genotype or the other and replace it by copying the remaining single strand. This correction process may then either convert a DNA molecule to the genotype of the homolog or restore the parental genotype. Uncorrected heteroduplex DNA would persist until the next replication of the chromosomes, when the two daughter chromosomes would be of different genotypes. This explains the phenomenon of postmeiotic segregation, in which a single meiotic product is seen to carry both parental genotypes even though it has only one copy of any one DNA molecule.
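Holliday’s explanation of conversion and postmeiotic segregation can be illustrated by counting spores in an eight-spored ascus, in which each meiotic product undergoes one further mitosis so that each DNA strand of a chromatid templates one spore. The sketch below is a simplified model under stated assumptions: chromatids 1 and 2 carry allele ‘+’, chromatids 3 and 4 carry ‘m’, and hybrid DNA can form only on the nonsister chromatids 2 and 3. The function name and allele labels are hypothetical.

```python
from collections import Counter


def octad_counts(hdna_on_2=False, hdna_on_3=False, repair_2=None, repair_3=None):
    """Return (number of '+' spores, number of 'm' spores) in a toy eight-spore ascus.

    Each chromatid is a pair of strands. repair_2 / repair_3 give the allele that mismatch
    correction leaves on chromatid 2 / 3 ('+' or 'm'); None means the heteroduplex is uncorrected.
    """
    chromatids = [["+", "+"], ["+", "+"], ["m", "m"], ["m", "m"]]
    if hdna_on_2:
        chromatids[1][1] = "m"  # one strand of chromatid 2 now comes from the 'm' parent
        if repair_2 is not None:
            chromatids[1] = [repair_2, repair_2]
    if hdna_on_3:
        chromatids[2][1] = "+"  # one strand of chromatid 3 now comes from the '+' parent
        if repair_3 is not None:
            chromatids[2] = [repair_3, repair_3]
    # After the post-meiotic mitosis, each strand templates one spore
    counts = Counter(strand for chromatid in chromatids for strand in chromatid)
    return counts["+"], counts["m"]


print(octad_counts())                              # (4, 4): normal segregation
print(octad_counts(hdna_on_2=True, repair_2="m"))  # (2, 6): conversion
print(octad_counts(hdna_on_2=True))                # (3, 5): postmeiotic segregation
```

Counts alone cannot distinguish a normal 4:4 ascus from one in which heteroduplexes on both chromatids remain uncorrected; that distinction requires the spore order, which this sketch does not track.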

By this simple form of the model, lengths of hetero- 
duplex DNA are necessarily symmetrical, that is, 
they have the same length on the two participating 
DNA molecules. However, it was known that the 
distribution of conversion may be asymmetrical. 
Holliday overcame this problem by proposing that 
the Holliday junction might migrate back toward the 
initiation site after mismatch correction has occurred 
on only one chromatid. This could have the effect of 
leaving an asymmetrical length of conversion. 


Further Reading 
Holliday R (1964) A mechanism for gene conversion in fungi. Genetical Research 5: 282-304.


See also: Heteroduplexes; Holliday Junction 
Holocentric chromosomes are distinguished by the structure of the kinetochore, which extends along the poleward face of the metaphase chromosome. Microtubule attachment is distributed along holocentric chromosomes, in contrast to monocentric chromosomes, where the kinetochore, and hence microtubule attachment, is localized to one region. In meiosis, the nonlocalized kinetochore is absent, and the ends of the chromosomes are said to adopt ‘kinetic activity,’ referring to the observation that in the meiotic divisions the chromosomes move end on toward the spindle poles. Holocentric chromosome organization has been described for certain plants, protozoa, nematodes, and insects. A review of the earlier literature describing cytological observations on holocentric chromosome behavior in various groups is available (White, 1973). In recent years, the nematode Caenorhabditis elegans has been the subject of extensive cytological, molecular, and genetic studies, which have contributed to the understanding of various aspects of holocentric chromosome behavior in this organism. Research on mitotic and meiotic segregation in C. elegans indicates that these holocentric chromosomes have features and behaviors in common with the more familiar monocentric chromosomes (Albertson et al., 1997).


Mitotic Behavior 


The nonlocalized kinetochore becomes visible at the 
ultrastructural level in prophase. By metaphase, it is 
typically a well-differentiated trilaminar structure 
resembling the kinetochore of monocentric chromo- 
somes and probably extends the entire length of the 
chromosome. Holocentric chromosomes appear as 
stiff rods under the light microscope and lack the 
primary constriction that demarcates the centromere 
of monocentric chromosomes. At metaphase, the 
chromosomes align parallel to the equator of the meta- 
phase spindle and lie entirely within the spindle. 
Microtubule attachments are distributed along the 
kinetochore, so that at anaphase the chromosomes 
move broadside on to the spindle poles. Studies in C. 
elegans have also demonstrated that these holocentric 
chromosomes terminate in telomere sequences similar 
to those of mammalian telomeres. 


Chromosome Rearrangements 


Holocentric chromosome organization allows the 
stable propagation of chromosome rearrangements 
that are not mitotically and meiotically stable in 
organisms with monocentric chromosomes. Transloc- 
ation chromosomes involving two entire holocentric 
chromosomes align and segregate to a single spindle 
pole, whereas in organisms with monocentric chromo- 
somes, the linkage of two chromosomes results in 
the formation of dicentric chromosomes that fail to 
segregate properly. Fragments of holocentric chromo- 
somes may also be propagated, because they retain the 
capability to attach to the spindle apparatus. In con- 
trast, fragmentation of monocentric chromosomes 
results in the generation of mostly acentric fragments 
that are lost. Indeed, before visualization of the holo- 
centric kinetochore by electron microscopy was pos- 
sible, this differential behavior of holocentric and 
monocentric chromosome fragments formed the 
basis of a test for holocentric organization. 
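The fragment test described above can be expressed as a simple rule: after breakage, a fragment is retained only if it can still attach to the spindle. In the sketch below, which is a deliberate simplification, a holocentric chromosome (no centromere position given) keeps every fragment, whereas a monocentric chromosome keeps only the fragment containing its single centromere. Positions are arbitrary units, and the function name is hypothetical.

```python
def surviving_fragments(length, breaks, centromere=None):
    """Fragments (start, end) that can still attach to the spindle after breakage.

    centromere=None models a holocentric chromosome with a kinetochore along its whole length;
    otherwise centromere is the position of a single localized centromere.
    """
    cuts = [0] + sorted(breaks) + [length]
    fragments = list(zip(cuts[:-1], cuts[1:]))
    if centromere is None:
        return fragments  # every holocentric fragment retains kinetochore activity
    # Monocentric: only the fragment containing the centromere survives; the rest are acentric
    return [(start, end) for start, end in fragments if start <= centromere < end]


print(surviving_fragments(100, [30, 60]))                 # [(0, 30), (30, 60), (60, 100)]
print(surviving_fragments(100, [30, 60], centromere=45))  # [(30, 60)]
```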


Meiotic Behavior 


Holocentric chromosomes typically behave differently in meiosis and mitosis. In meiosis, in most organisms that have been examined at the ultrastructural level, no kinetochore structure is seen. Instead, microtubules appear to project directly into the chromatin. At diakinesis of meiotic prophase, the bivalents of holocentric chromosomes are composed of homologous chromosomes that appear to be held together in an end-to-end association. In earlier literature, this association was attributed to terminalization of chiasmata, but whether terminalization occurs in organisms in which meiosis I is reductional is now being questioned. It seems more likely that the extreme condensation of the chromatin obscures cytological manifestations of distributed crossovers and gives rise to the apparent end-to-end association of the homologs. Furthermore, proper disjunction of the homologs requires a crossover event, and the location of the crossover appears to determine which of the two ends of the homologs are associated in the bivalent.

Figure 1 Orientation and segregation of axially oriented holocentric chromosomes in meiosis. (A) Two holocentric meiotic bivalents are shown in diakinesis. The homologs are associated at the ends labeled b. (B-E) Segregation of meiotic chromosomes on meiotic spindles drawn with the long axis vertical. The spindle microtubules are indicated by lines and are shown converging toward the poles. (B) Alignment of the bivalents on the metaphase I spindle, with the ends labeled a proximal to the spindle poles and the ends labeled b on the spindle equator. (C) At anaphase I, homologs separate to opposite spindle poles, with the ends labeled a leading the way. (D) Axial orientation of sister chromatids, with the ends labeled b proximal to the spindle poles and the ends labeled a on the equator of the spindle. (E) Anaphase II segregation, with ends b leading the way to the spindle poles.

The orientation of the bivalents on the metaphase I 
spindle varies from species to species. The bivalents may 
adopt the equatorial orientation and align parallel to the 
equator of the spindle, or they may align parallel to 
the spindle pole axis, adopting the axial orientation. If 
the bivalent aligns axially, then the sister chromatids 
segregate to the same pole at anaphase I, so that the 
first meiotic division is reductional, as occurs in meiosis 
in species with monocentric chromosomes. For equa- 
torially oriented bivalents, the order is reversed and 
the first meiotic division is equational. In C. elegans and in some heteropteran species, it has been possible to use cytological markers to study the segregation of axially oriented homologs. As shown in Figure 1, the chromosomes align axially at metaphase I and move end on toward the spindle poles at anaphase I. On completion of meiosis I, the sister chromatids remain associated at the ends that were poleward in metaphase I. They align axially with these ends on the equator of the metaphase II spindle, and then at anaphase II, the opposite ends of the chromosomes lead the way toward the spindle poles. Thus, in these organisms, it has been established that both ends of the chromatids adopt ‘kinetic activity’ in meiosis, with one end performing this function at meiosis I and the other at meiosis II.

Our understanding of the behavior of holocentric 
chromosomes in mitosis and meiosis is based largely 
on cytological observations in a variety of species. 
These observations raise a number of questions 
regarding the structure and function of the holocentric 
‘centromere.’ For example, how are kinetochores 
assembled on the metaphase chromosomes, how 
does a holocentric metaphase chromosome become 
oriented toward only one spindle pole, are there 
underlying centromeric DNA sequences distributed 
throughout the genome, and how is kinetic activity 
restricted first to one end and then the other of meiotic 
chromosomes? Future application of molecular and 
genetic approaches should help to provide answers 
to these questions as they relate specifically to holo- 
centric chromosomes and to the behavior of chromo- 
somes in general. 
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Holophyly 
This is the process by which all the descendants of a stem species, no matter how divergent, are combined into a single holophyletic lineage (cladon). Such a cladon was erroneously called monophyletic, but Haeckelian monophyly is a very different concept.
Hennig’s monophyly was therefore renamed holo- 
phyly by Ashlock (1971). Holophyly is a property of a 
branch of the phyletic tree (cladogram), while mono- 
phyly is a property of a taxon in a Darwinian classifi- 
cation. 


Further Reading 
Haeckel E (1866) Generelle Morphologie der Organismen. Berlin: Georg Reimer.
The homeobox was identified independently by Bill
McGinnis and Mike Levine in the laboratory of 
Walter Gehring in Switzerland, and Matthew Scott 
and Amy Weiner working with Thomas Kaufman 
and Barry Polisky at Indiana University in 1983- 
1984. When the sequences of several cloned homeotic 
genes were compared, it was found that they shared 
a common, conserved stretch of approximately 
180 bp. This sequence element has been termed ‘the 
homeobox’ (in previous literature, also ‘homeo box,’ 
‘homoeobox,’ etc.), because it was discovered in 
homeotic genes. The homeobox encodes a protein 
domain, the homeodomain, that has now been found 
in many developmental control genes. In essence, 
homeobox genes code for transcription factors and 
most of them play important roles during the develop- 
ment of multicellular organisms. They have been 
found in plants, fungi, and animals, as well as slime 
molds. 


Structure of Homeodomain 


The typical homeodomain is 60 amino acids long. The structures of several highly divergent homeodomains from yeast, flies, and vertebrates have been determined using X-ray crystallography and nuclear magnetic resonance (NMR) analysis. The different homeodomains are structurally very similar, even though their primary sequence similarity can be very low. The core of the homeodomain consists of three α-helices (Figure 1). Helix 2 and helix 3 are linked via a short turn and form a structural motif called a ‘helix-turn-helix motif’ that is shared with many bacterial DNA-binding transcription factors and repressors. Helix 1 crosses over helix 3 so that the three helices form a hydrophobic core that stabilizes the structure (Figure 1). A substantial part of the DNA-binding activity is located in helix 3, which lies in the major groove of the DNA and provides most of the sequence-specific contacts. In particular, residue 9 of helix 3 is a key residue that provides sequence-specific contacts; most homeodomains have a glutamine at that position. The flexible N-terminal arm of the homeodomain can reach into the minor groove of the DNA and provide additional contacts, although the mode of contact of this arm varies more between different types of homeodomains.

In several different classes of homeobox genes, insertion or deletion events have expanded or contracted the size of the homeodomain. The insertion points for extra residues are either in the loop between helix 1 and helix 2, as in the TALE class of homeobox genes, or in the turn between helix 2 and helix 3.


Classes of Homeobox Genes 


The homeobox genes can be divided into different 
classes depending on their sequence and gene 
structure. Many homeobox genes encode not only a 
conserved homeodomain but also additional con- 
served domains that are located N- or C-terminally 
of the homeodomain. The largest diversity of homeobox genes is found in animals; thus, unless otherwise noted, the classes and families described here are found only in animals.


Hox Cluster Genes 

Perhaps the best-known homeobox genes are those located in the Hox cluster (Figure 2). Some of the first homeobox genes cloned were genes such as Antp, Ubx, and ftz. With the exception of the Abd-B genes, all Hox cluster genes have a recognizable, small, five- to six-amino-acid motif upstream of the homeodomain, called the ‘hexapeptide’. While the genes in the center of the cluster are very similar to each other, the outermost genes can be very different from each other, often sharing less than 50% identity in the homeodomain. The vertebrate Evx genes, though not Hox genes, are part of the vertebrate Hox cluster; in other species such as flies, this gene family has separated from the cluster.

Figure 1 Structure of the homeodomain. Two schematic views of the Antennapedia homeodomain bound to DNA as determined by M. Billeter, G. Otting, and colleagues in the laboratory of K. Wüthrich. The NMR data were modeled in RasMol V2.6; the DNA is shown as a stick model in light gray, while the protein backbone (side chains not shown) is displayed as a dark ribbon. The numbers indicate the three α-helices. Helix 3 sits in the major groove of the DNA.


Dispersed Hox-like Genes and Other Clusters

A number of homeobox gene families share similarities with the Hox cluster genes. For example, the empty spiracles (ems) and caudal (cad) genes play a role in anterior-posterior patterning and have a hexapeptide upstream of the homeodomain. A second group of genes, of the NK-2, NK-1, Tlx (which also has a hexapeptide), and ladybird (lbx) families, resides in another gene cluster in Drosophila, the NK cluster. Several NK and NK-related genes are also linked in vertebrates: analysis of human genome data suggests that an NK cluster was duplicated and subsequently broken up in vertebrate evolution. A further small cluster, termed the ‘ParaHox cluster,’ has been found in amphioxus, with the gene families cad, Xlox, and Gsx. Some other gene families that are dispersed through the genome, but are more similar to the Hox, NK, and ParaHox cluster genes than to other classes, are msh, Mox, Dll, Hlx, en, NEC, ceh19, Bar, Xnot, and Hex; some of these may originally have been part of the Hox or NK clusters.


POU Class 

These homeobox genes encode a POU-specific domain upstream of a distinct type of homeodomain, the POU homeodomain. The POU-specific domain is a DNA-binding domain of about 80 amino acids that contains a helix-turn-helix motif like the homeodomain. The POU domain was first found in the mammalian transcription factors Pit-1, Oct-1, and Oct-2, and the Caenorhabditis elegans gene unc-86. A special feature of the POU homeodomain is the cysteine residue at position 9 of helix 3. Six families, POU-I to POU-VI, have been defined.


prd Class 

The paired class of homeobox genes is named after its first member, the Drosophila gene paired. This class is characterized by a prd domain upstream of the homeodomain. The prd domain is about 130 amino acids in length and binds DNA. The Paired domain actually consists of two similar domains, each containing a helix-turn-helix motif. The homeodomain of this class is distinguished from other homeodomains by a serine residue at position 9 of helix 3.

Figure 2 Hox clusters in Drosophila melanogaster and mouse. The Hox cluster of D. melanogaster contains 12 homeobox genes: labial (lab), proboscipedia (pb), zerknüllt-related (zen2), zerknüllt (zen), bicoid (bcd), Deformed (Dfd), Sex combs reduced (Scr), fushi tarazu (ftz), Antennapedia (Antp), Ultrabithorax (Ubx), abdominal-A (abd-A), and Abdominal-B (Abd-B). The cluster is in fact split into two parts: the Antennapedia complex and the Bithorax complex. Although located in homeotic complexes, several of the homeobox genes in the cluster are not homeotic genes: zen, zen2, bcd, and ftz. In mouse and other vertebrates, there are four paralogous Hox clusters (apart from fish, which have more). The duplications of the cluster from a single ancestral cluster probably happened at the beginning of vertebrate evolution. In the course of evolution, some of the paralogous Hox genes were lost, so that the present-day mammalian clusters contain 39 Hox genes, as well as two genes of the even-skipped family (Evx1 and Evx2), which probably formed part of the ancestral cluster, too. The genes can be grouped into 13 paralog groups. The lines between the mouse and the fly cluster show the evolutionary relationships between the homeobox genes. Thus, the Hox genes of paralog group 1 are orthologous to lab in flies, and paralog groups 9-13 are homologous to Abd-B. Not all genes have 1:1 orthologs: for example, in the central part of the cluster, the fly genes Antp, Ubx, and abd-A and the mouse paralogs Hox6, Hox7, and Hox8 may have arisen through independent duplication events from a single ancestral gene. Likewise, zen2 is a relatively recent duplication from zen, and bcd is also derived from an ancestral zen/Hox3 gene.


prd-Like Class 

This group of homeobox genes is related to the paired class of homeobox genes through its homeodomain.
Some prd-like homeodomains are more than 70% 
identical to prd class homeodomains. However, they 
do not contain a prd domain, and residue 9 of helix 3 in 
the homeodomain is not a serine residue. More than 15 
families have been described. 


LIM Class 

Homeobox genes of the LIM class contain two LIM 
domains upstream of the homeodomain. The LIM 
domain is composed of two so-called zinc fingers, 
which contain conserved cysteine, histidine, and 
aspartate residues that bind zinc. The LIM-domain 
zinc fingers are distinct from other zinc fingers, and, 
unlike many of the other zinc-finger families that are 


involved in DNA-binding, the LIM domains of LIM 
homeobox genes are involved in protein-protein 
interactions with other factors. At least six conserved 
families are found. 


ZF Class 

The ZF (zinc-finger) class of homeobox genes is an unusual group. Its members contain classic zinc-finger domains, like those of DNA-binding zinc-finger transcription factors, plus one or more homeodomains. The combination of these domains can reach striking proportions, as in the mammalian gene ATBF1, which contains 17 zinc-finger domains and 4 homeodomains. 


cut Class 

The cut class genes are characterized by a variable number of cut domains upstream of the homeodomain. Three separate families exist, having one, two, or three cut domains. The cut domain is also a DNA-binding domain. 


SO/SIX Class 

Homeobox genes of the sine oculis/Six class contain a 
large conserved domain of presently unknown func- 
tion N-terminally adjacent to the homeodomain. 
Several families exist. 


HD-ZIP Class 

The HD-ZIP (homeodomain-leucine-zipper) class of homeobox genes is found in plants. Immediately C-terminal of the homeodomain is a so-called leucine zipper, a region that forms coiled-coil structures involved in dimerization. Four different families have been defined. 


TALE Class 

The TALE group of homeobox genes is very ancient; 
their homeodomain is 63 amino acids long. TALE 
stands for “three amino acid loop extension,” because 
of the three extra residues in the loop between helix 1 
and helix 2. TALE homeobox genes are found in 
plants (two classes: KNOX and BEL), in fungi (two 
classes: CUP and M-ATYP), and in animals (four classes: 
PBC, MEIS, TGIF, and IRO). The KNOX, PBC, and 
MEIS classes each contain large conserved domains 
upstream of the homeodomain. Sequence comparison 
has shown that the KNOX, PBC, and MEIS domains 
share weak sequence similarity, suggesting a common 
ancestry. 


pros Class 

The prospero class is a highly divergent class of homeo- 
domain proteins. The homeodomain has three extra 
residues between helix 2 and helix 3, and a pros 
domain of about 100 amino acids follows immediately 
after the homeodomain. 


Evolution 


Homeobox genes are found in plants, fungi, and 
animals, and even in slime molds (Dictyostelium). 
Although now several prokaryotic genomes have 
been sequenced, no true homeobox gene has been 
found in these organisms. Thus, it appears likely that 
the first homeobox appeared sometime in eukaryote 
evolution, probably derived from a helix-turn-helix 
factor. In the ancestral organism from which even- 
tually plants, fungi, and animals were derived, at 
least two different homeobox genes must have existed 
already: one a typical 60-amino-acid homeobox gene, 
and one TALE homeobox gene. This ancestral TALE 
homeobox gene had a conserved upstream domain 
from which the KNOX, MEIS, and PBC domains 
are derived. While in plants and fungi some pro- 
liferation of different types of homeobox genes has 
taken place, by far the largest expansion has happened 
in animals, where there are now dozens of different classes and families of homeobox genes (see Classes of Homeobox Genes). The emergence of the different classes of homeobox genes seems to have happened early in metazoan evolution, since many different types of homeobox genes are found in sponges and cnidarians, and essentially all classes and many families are present in the bilaterian phyla. 


Function 


Given the widespread nature of homeobox genes in 
higher eukaryotes, it is not surprising that their func- 
tion is very diverse. Nevertheless, most of them play 
important roles in the development of their respective 
organisms. Homeobox genes have been found to func- 
tion at the earliest points in development as well as in 
the very latest cell differentiation events. Some ex- 
amples follow. 

The Hox cluster genes of animals are involved in patterning and in specifying the identity of regions along the anterior-posterior body axis. A striking aspect is that the order of the genes in the cluster is colinear with their domains of function along the anterior-posterior axis. Thus, the Drosophila gene labial (lab) functions in the most anterior region of the animal, while the gene Abdominal-B (Abd-B) functions in the 5th to 8th abdominal segments. In vertebrates, the Hox cluster genes are likewise involved in patterning along the body axis. Because the DNA binding site of Hox proteins is rather short, additional cofactors are necessary to provide DNA-binding specificity. Two TALE class homeobox genes have been identified as cofactors for Hox cluster genes: in flies, extradenticle (exd), a PBC class gene, and homothorax (hth), a MEIS class gene, have been shown to form complexes with Hox proteins such as Ubx or lab. 

One of the earliest-acting developmental homeobox genes is found in Drosophila. The gene bicoid (bcd) plays a key role in setting up the anterior-posterior axis of the embryo. Mutant embryos lack head and thorax and instead develop posterior structures at the anterior end. The bcd RNA is provided maternally, and both the RNA and the protein are localized at the anterior pole of the embryo. Despite its crucial role in early Drosophila development, bcd is a relatively new gene in evolutionary terms; it is most likely derived from an ancestral gene of the Hox3 group. Furthermore, while most homeobox genes bind DNA and function as transcription factors, bcd also acts at the level of messenger RNA: it can bind RNA and regulate the expression of other genes at the translational level. 

The C. elegans prd-like gene unc-4 is involved in 
the specification of motor neurons. Mutations in this 
gene lead to abnormal synaptic connectivity, so that the VA neurons receive synaptic input that is normally appropriate only for VB neurons (the sister cells of the VA neurons). As a consequence of these wiring defects, the animals can no longer move backward. unc-4 is expressed in the VA motoneurons, where it confers VA identity; it thus acts in one of the final steps of differentiating a subset of motoneurons. 

The POU genes Oct-1 and Oct-2 were first identified as transcription factors through their biochemical property of binding the octamer sites in the promoters and enhancers of immunoglobulin genes. This provided compelling evidence that homeobox genes can encode transcription factors. 

In yeast, the mating-type locus contains two 
homeobox genes, Mat1 and Mat2, the latter a TALE 
homeobox gene. These two genes are involved in 
mating-type switching, i.e., they regulate and switch 
between the two cell fates that yeast can adopt. 

In plants, the gene shootmeristemless (STM) of Arabidopsis thaliana is a KNOX class homeobox gene. Plants carrying mutations in STM fail to develop a shoot apical meristem. Converse phenotypes have been found when the closely related maize gene Knotted 1 is overexpressed in tobacco. Thus, in plants too, homeobox genes are involved in developmental processes. 
A homeotic gene is one that contains a homeobox, 
whose level of expression is set during embryo- 
genesis in response to positional cues, and which 


subsequently directs the later formation of tissues 
and limbs appropriate to that part of the organism. 


See also: Homeobox; Homeotic Mutation 
The term ‘homeosis’ was coined by William Bateson in 1894 to describe particular types of biological variation whereby “something has been changed into the likeness of something else.” More than 20 years later, the first mutation that causes homeosis, a homeotic mutation, was described by C.B. Bridges in Drosophila, and many more were subsequently discovered. These homeotic mutations lead to partial or complete transformations of particular body regions in the fly. For example, a segment can be transformed so that it resembles its anterior neighbor, as in the case of particular mutations in the Ultrabithorax (Ubx) gene, which cause partial transformations of the third thoracic segment into the second thoracic segment. In the most extreme case, when several Ubx alleles are combined, a fly can have four wings instead of two wings and two halteres, because the halteres of the third thoracic segment are converted into wings. 

Another well-known example is Antennapedia, dominant mutations of which can transform antennae into legs. The first homeotic genes cloned were found to contain a conserved sequence element that was termed the ‘homeobox.’ However, subsequent research showed that not all homeotic genes are homeobox genes, and not all homeobox genes are homeotic genes. For example, the Drosophila homeotic gene spalt encodes a zinc-finger protein, and the homeotic gene fork head was the founding member of another family of transcription factors that contain a fork head domain. 

Several of the homeotic genes in flies are located in 
two gene clusters: the bithorax complex (BX-C) and 
the antennapedia complex (ANT-C). Collectively, 
these two complexes are often referred to as the 
homeotic complex (HOM-C). Genes in these two complexes control the development of the Drosophila body along the anterior-posterior body axis. Intriguingly, the gene order on the chromosome is colinear with the respective gene function along the body axis. In vertebrates, the corresponding clusters of genes are called HOX clusters. The genes in the vertebrate HOX clusters are highly conserved relative to their fly counterparts. Functional analysis of these genes using knock-out techniques revealed that in vertebrates, too, they function in patterning along the anterior-posterior body axis, and their mutation causes homeotic transformations. The HOM-C and HOX clusters harbor perhaps the best-known developmental control genes; Ed Lewis was awarded the Nobel Prize for his ground-breaking studies of the BX-C. 

While the term ‘homeotic mutation’ is best known from mutations in the Drosophila genes that specify segment identity, the original definition of homeosis is very broad. Thus, other mutations that cause transformations have also been termed homeotic. For example, 
in the nematode Caenorhabditis elegans the gene lin- 
12, which encodes a transmembrane receptor, has been 
termed a homeotic gene, because many cell lineages 
(patterns of cell divisions) are transformed into other 
cell lineages. In plants, many homeotic mutations 
are known that cause transformations of leaves and 
flowers. 


See also: Homeobox; Lewis, Edward 
A homogeneously staining region (hsr) is one of the cytogenetically visible signs of gene amplification, the other being ‘double-minute chromosomes’ (dmin). Homogeneously staining regions, like double minutes, contain copies of an amplified DNA segment (the amplicon), leading to cellular overexpression of the genes contained in the segment. A single hsr usually contains many amplicon copies arranged in tandem array. Characteristically, an hsr can be detected after chromosome banding of metaphase preparations as a large block of diffusely staining chromatin within an otherwise ordinary chromosome. The mechanism for generating the hsr 
is not known exactly, but it is generally assumed that 
the amplification can take place during an episomal 
phase, in which a circular DNA molecule is replicat- 
ing autonomously relative to the bulk of chromo- 
somal DNA. The episomes may give rise to chromatin bodies visible in the light microscope (dmin), which may subsequently be integrated at a (random) chromosomal site to generate the hsr. 
However, various other schemes have been proposed for the origin of hsr, and it is quite likely that several different molecular mechanisms lead to the same end result (Schwab, 1999). 


Reference 
Schwab M (1999) Oncogene amplification in solid tumors. Seminars in Cancer Biology 9: 319-325. 


See also: Amplicons; Double-Minute 
Chromosomes; Gene Amplification 
Homologous means having a common origin by descent. Chromosomes that are homologous in this broad sense may have diverged to a very considerable extent. Over long evolutionary time, chromosomes may undergo structural rearrangements, so that homology between different species and genera is often a property of chromosome segments rather than whole chromosomes. Between the chromosomes of mice and humans, for example, there is quite a high degree of patchwork homology, with blocks of similar genes in locally similar order (syntenic genes) within very different larger-scale arrangements. 

However, in the context of experimental genetics, homology most often refers to the close similarity of the pairs of chromosomes in a diploid organism. The classical criterion for assessing homology in this stricter sense is the ability to pair. The two members of a fully homologous chromosome pair are closely associated along their entire lengths at the pachytene stage of meiosis. The 
pairing can be seen under the light microscope in 
organisms with reasonably large chromosomes (e.g., 
very clearly in Zea mays and not at all in yeasts, except 
by fluorescent in situ hybridization, also known as 
FISH), but the synaptonemal complex, which is 
formed between the paired homologues, can usually 
be clearly visualized with the electron microscope, 
even in yeasts, by virtue of its staining with silver 
ions. In Drosophila species (as well as in other flies), 
homologous pairing can be seen in unrivalled detail in 
the giant nuclei of salivary gland cells, where maximally 
extended chromosomes are amplified over 100-fold in 
thickness by repeated replication without separation 
(polytene chromosomes), and homologs are closely 
paired. The close pairing of chromosomes, either at 
pachytene of meiosis or in Drosophila giant nuclei 
(where much more detail can be seen), reveals structural 
differences between homologs due to inversions, inter- 
changes, or deletions of chromosome segments. 

Whereas homology, as judged by pairing, is vir- 
tually complete between the two chromosome sets 
within a diploid species, it is usually much less evident 
between species, even when the species are closely 
related taxonomically. Interspecific hybrids seldom 
show regular chromosome pairing and bivalent forma- 
tion at meiosis; that is the usual reason for hybrid 
sterility. Nevertheless, the chromosomes of related 
species are often similar in number, in relative sizes 
and, so far as it can be determined, in function, and are 
obviously homologous in the sense of related by des- 
cent. A good example is provided by wheat, Triticum 
aestivum, which is a 42-chromosome hexaploid, with 
three different diploid sets of 14 chromosomes, 
derived from three different species. At meiosis, 
wheat regularly forms 21 bivalents, with pairing 
restricted to chromosomes from the same ancestral 
diploid. However, this stringent specificity of pairing 
is under genetic control, and when a certain chromo- 
some (5B, the fifth chromosome of the B genome) is 
removed by selective breeding, pairing also occurs 
between corresponding chromosomes from different 
ancestral diploids. The lower degree of homology so 
revealed is sometimes called homeology, a term used mainly by cereal breeders, though it has found wider application. For example, it is used to describe recombination in Saccharomyces cerevisiae with a chromosome from a closely related species, or between closely related but diverged duplicated genes. 
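The chromosome arithmetic of the wheat example can be written in the conventional notation for polyploids, where x is the basic chromosome number of each ancestral genome and 2n the somatic number. This notation is added here for illustration only; the entry itself does not use it.

$$
x = 7, \qquad 2n = 6x = 3 \times 14 = 42, \qquad \text{bivalents at meiosis} = \frac{2n}{2} = 21
$$

With chromosome 5B present, each of the 21 bivalents pairs two chromosomes from the same ancestral diploid genome.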

Still lower degrees of homology exist between chromosomes, or segments of chromosomes, of species that cannot be hybridized, but these have to be demonstrated by methods based on DNA technology such as in 
situ hybridization of DNA probes to chromosomes 
(‘chromosome painting,’ FISH). 


See also: Meiosis; Polytene Chromosomes; 
Segmental Interchange; Synapsis, 
Chromosomes; Synaptonemal Complex; 
Synteny (Syntenic Genes) 
Homologs are chromosomes that carry the same genetic loci. A diploid cell has two homologs of each chromosome, one derived from each parent. 


See also: Chromosome; Homologous 
Chromosomes 
Although Owen (1843) is generally credited with coining the word ‘homolog,’ the idea that parts of organisms are comparable in some fundamental sense can be traced back at least to Aristotle. Owen used the term ‘homolog’ to denote the comparative similarity in structure between parts of two different organisms “under every variety of form and function.” For example, the right forelimb of a bird 
would be considered homologous with the right 
forelimb of a human in spite of differences in function 
and considerable differences of form. This was contrasted with the term analogy, which denoted similar function without any necessary underlying structural similarity (wings of birds and butterflies). Although many consider these words to have complementary meanings, this was not their original intent (Panchen, 1994). Homologous parts can have analogous functions (wings of birds and wings of bats), just as nonhomologous parts can have analogous functions (wings of bumble bees and wings of birds). After the general acceptance of the theory of evolution (descent with modification), most biologists used the term 
homolog to denote comparable (similar or identical) 
characters shared through common descent. This 
generated a whole new set of terms to denote similar- 
ity of form gained independently (e.g., convergences, 
parallelisms, paralogs, etc.). 


Characters, Character States, and Homology 


It is common to distinguish between characters as a 
general description of the part of an organism or 
taxon, and character state, the specific feature of a 
particular organism or taxon. Thus, one might term 
the character ‘base pair position 148 in cytochrome b’ 
and the character state ‘guanine nucleotide present.’ 
However, this is simply restating two characters: one 
that exists at a higher level (presence of that base 
position in the gene) and another that exists at a 
lower level (guanine nucleotide). Because homologous 
characters have a history that is tied directly to a 
hierarchy of descent, the distinction between charac- 
ter and character state is not necessary (Wiley, 1981; 
Ax, 1987; Patterson, 1988). Essentially, homologous characters are simply recorded features of two different organisms that are thought to have a particular relationship, and some level of similarity in the structure and position of the characters of the two organisms seems necessary before such a relationship can be proposed. The 
practice of distinguishing characters and character 
states grew from the use of data matrices where col- 
umns of data were given a general name and the 
characters of organisms a specific name. But columns 
really represent initial hypotheses of homology. Char- 
acters placed in a single column of data are initially 
thought to be good candidates for having a hom- 
ologous relationship. Whether this is true in the end is 
another matter. 


Homology at the Taxon Level 


Just as with species concepts, concepts of taxic homology are numerous, and what constitutes homology between parts of two organisms is hotly debated. Wagner (1994 and earlier papers) has distinguished three concepts of homology: historical, morphological, and biological. The question is, should there be 
three (or more) kinds of taxic homology, or are some 
kinds simply a manifestation of a larger concept? With 
the rise of the evolutionary paradigm, what we take as 
the fundamental nature (or ontology) of homology 
became associated with descent with modification. 
Homologous parts are comparable, not because they 
are derivations from an archetype per se, but because 
they are inherited from a common ancestor in mod- 
ified or unmodified form. Wiley (1975), Patterson 
(1982), and others have taken this to a logical conclu- 
sion: homologs at the level of taxa are apomorphies 
(derived characters, evolutionary novelties) at some 
point in their history. Perhaps a thought experiment 
is in order. Imagine that we have the entire tree 
of genealogical descent mapped out at our feet. If 
we place all the similarities and differences observed 
among organisms on this tree at the point where they 
arose and followed their fates, we would see the 
coalesced homologies as apomorphies that diagnose 
species (autapomorphies) and monophyletic groups 
(synapomorphies). We would see the homoplasies 
(nonhomologous similarities) and analogies (func- 
tionally similar but structurally dissimilar) scattered 
throughout the tree in different groups. Interestingly, 
what we would not see are symplesiomorphies, shared 
primitive homologies. This is because every sym- 
plesiomorphy is actually a synapomorphy higher in 
the phylogeny and the reason we have the term ‘sym- 
plesiomorphy’ is because we do not consider the entire 
tree at any one time. Symplesiomorphies are simply 
homologies that arose in ancestors more ancient than 
those that are logically included in the restricted tree. 
Under this view, there is a single concept of taxic homology, of which other concepts of homology are special (and perhaps perfectly valid) cases. 
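The thought experiment can be simulated in a few lines: record the node at which each homology arose on the whole tree, then ask how that character looks from inside a restricted subtree. The sketch below is a minimal illustration under stated assumptions: the tree, the clade names, and the mapping of characters to nodes are a hypothetical toy example, and the function `status` is an invented helper, not a standard phylogenetic tool.

```python
# Hypothetical rooted tree of clades (child -> parent); the root maps to None.
PARENT = {
    "Vertebrata": None,
    "Gnathostomata": "Vertebrata",
    "Tetrapoda": "Gnathostomata",
    "Amphibia": "Tetrapoda",
    "Amniota": "Tetrapoda",
    "Mammalia": "Amniota",
    "Reptilia": "Amniota",
}

# Node at which each homology is placed when mapped onto the whole tree.
ORIGIN = {"jaws": "Gnathostomata", "limbs": "Tetrapoda", "amnion": "Amniota"}


def lineage(node):
    """All nodes from `node` up to the root, inclusive."""
    nodes = []
    while node is not None:
        nodes.append(node)
        node = PARENT[node]
    return nodes


def status(character, restricted_root):
    """Label a character relative to the subtree rooted at `restricted_root`."""
    origin = ORIGIN[character]
    if restricted_root in lineage(origin):
        # Arose at or below the restricted root: it diagnoses the clade at `origin`.
        return f"synapomorphy of {origin}"
    if origin in lineage(restricted_root):
        # Arose in an ancestor outside the restricted tree.
        return "symplesiomorphy"
    return "not present in this subtree"


print(status("jaws", "Tetrapoda"))      # symplesiomorphy
print(status("jaws", "Gnathostomata"))  # synapomorphy of Gnathostomata
print(status("amnion", "Tetrapoda"))    # synapomorphy of Amniota
```

The same character (jaws) is a symplesiomorphy when the analysis is restricted to tetrapods and a synapomorphy once the restricted tree is widened to include the node where it arose, which is the point the thought experiment makes.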
Origin of Homologous Characters 


If homology is a concept that extends below the 
level of taxa, then it should be obvious that hom- 
ology at lower levels cannot simply be apomorphy. 
Haszprunar (1991) suggests four levels of homology: 
(1) iterative homology is the correspondence of serial 
homologs in the same individual at the same time; 
(2) ontogenetic homology is the correspondence of 
parts at different times in the same individual; (3) poly- 
morphic homology is the correspondence of parts 
between individuals of the same species lineage; and 
(4) supraspecific homology is the correspondence of 
parts between taxa (taxic homology). Ontogenetic 
and polymorphic homology are directly related to 
the origin and eventual fixation of apomorphies, 
while iterative homology is related to serial homology 
and homonomy (mass homology). Iterative hom- 
ology may or may not be translatable into taxic 
homology (see below, “Conjunction test”). The origin 
of taxic homology, suggests Haszprunar, lies with the 
origin of apomorphies within species where they 
coexist, for a time, with their plesiomorphic homo- 
logs. Further, their origin on the molecular level may 
not be unique but recurrent. It is possible for gene 
alleles to be identical by descent and yet remain poly- 
morphic over speciation events, creating homoplasy at 
the taxic level, while being homologous at the gene 
level. Such phenomena and others create differences 
between gene trees and species trees. 


Nature of Homologs 


Exactly what constitutes homology from the onto- 
logical perspective is also debated. Although we 
understand that homologs gain their ‘comparability’ 
through descent, and we understand that homologies 
appear as apomorphies on phylogenetic trees, we also 
understand that the homologies being compared do 
not actually have descent relationships. That is, right 
hands do not actually give rise to other right hands, 
nor does guanine at position 158 in a cytochrome b 
gene sequence give rise to a descendant guanine at that 
same position in a descendant mitochondrion. Rather, 
the relationship between homologs is always indirect, 
being mediated by ontogeny at the morphological 
level and semiconservative replication at the DNA 
level (and other processes at intermediate levels). 
This has led authors such as Van Valen (1982), 
Hausperger (1991), and Roth (1994) to characterize 
homology as a manifestation of the flow of biological 
information between generations and over phylogeny. 
This concept of information should not be confused 
with sequence information; it includes epigenetic 
information as well. Under this concept, homologous 
structures are the observable manifestation of infor- 
mation flow over time and through descent. If so, then 
this general concept of homology can be easily 
extended to behavioral and functional characters (see 
Greene, 1994 and Lauder, 1994, for examples). 
Further, it solves certain conundrums, such as how to homologize Meckel’s cartilage in vertebrates in which the structure is induced by different tissues (see review in Wagner, 1994). In such cases, epigenetic constraints 
on the developing phenotype may allow for consider- 
able variation in the actual way that a particular struc- 
ture is built during ontogeny. 


Homology and Homoplasy at Different Levels of Organization 


Not all comparable features of organisms are apomor- 
phies at some level in a phylogeny. Even identical 
characters such as the same base residue at the same 
position can evolve independently of each other (and 
thus be apomorphies at different levels or in different 
places in the phylogeny). Given homologies, what of 
similar but nonhomologous characters? The general 
concept of homoplasy can apply to characters that 
show some level of structural, behavioral, ontogenetic, 
or genetic similarity, but that do not qualify as 
homologies because they have independent evolution- 
ary origins. The complexities of homology and homo- 
plasy can be seen in molecular systems where there 
are three levels of homology, two at the taxic level and 
one at the gene level. At the first level, orthologous 
genes (Fitch, 1970) are strictly comparable between 
organisms, so their sequence variation can contain 
homologs. At the second level, the level of the organ- 
ism, orthologs are candidates for taxic homology. That 
is, the presence of orthologous genes in the taxa of 
which the organisms are part can be a synapomorphy 
of a monophyletic group containing these organisms. 

Paralogous genes (Fitch, 1970) are related among 
themselves in gene trees, but because of gene duplica- 
tion, two or more paralogous genes exist in the same 
organism. At the level of the organism, paralogs may 
contain similar bases at the same base position, but 
these similarities are nonhomologous in terms of taxic 
homology. That is why we call the genes by different 
names. Using a mix of sequence data from α- and β-hemoglobin (α from one species, β from another, etc.) would lead to spurious results since positional 
homology between the genes does not exist relative 
to the organisms of which the paralogous genes are 
a part. But as parts of organisms themselves, the 
distribution of paralogous genes can be used to test 
relationships because the presence of various gene 
copies can act as synapomorphies. So, among 
vertebrates, the presence of β-hemoglobin is a synapomorphy of jawed vertebrates, Gnathostomata 
(Goodman et al., 1987), while other paralogs are syn- 
apomorphies at higher and lower levels in the phyl- 
ogeny. Thus, among gene families there are two levels 
of taxic homology relative to organisms. The first level 
is the level of sequence variation among orthologs. 
At this level, analysis of homologous base positions 
leads to an hypothesis of the relationships among taxa 
in the same manner as the analysis of homologous 
morphological characters. The second level is the dis- 
tribution of orthologs and paralogs among the organ- 
isms in a phylogeny. The distribution of members of 
a gene family leads to an hypothesis of relationship 
among organisms in the same manner as sequence 
variation among orthologs or the distribution of mor- 
phological homologs. The third level obtains among 
paralogs and their gene descent. While sequence pos- 
itional homology might not obtain between paralogs 
relative to taxa or relative to organisms within taxa, 
it does obtain between paralogs relative to their own 
descent in gene trees. This level does not pertain to the 
organisms per se, but to the descent of the genes from 
their own gene ancestors. Figure 1 illustrates these 
levels and concepts. 
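Fitch's (1970) distinction can be stated operationally on a gene tree: two genes are orthologs if their most recent common ancestor is a speciation event, and paralogs if it is a duplication event. The sketch below is a minimal illustration of that rule, assuming a hypothetical toy globin gene tree in which the node names, the tree shape, and the event labels are all invented for the example.

```python
# Hypothetical toy gene tree: each node maps to its parent (root maps to None).
PARENT = {
    "dup": None,                 # ancestral duplication giving alpha and beta
    "alpha_anc": "dup",
    "beta_anc": "dup",
    "alpha_human": "alpha_anc",
    "alpha_mouse": "alpha_anc",
    "beta_human": "beta_anc",
    "beta_mouse": "beta_anc",
}
# Internal nodes are labelled with the kind of event they represent.
EVENT = {"dup": "duplication", "alpha_anc": "speciation", "beta_anc": "speciation"}


def ancestors(node):
    """Path from a node up to the root, node first."""
    path = []
    while node is not None:
        path.append(node)
        node = PARENT[node]
    return path


def last_common_ancestor(a, b):
    """Deepest node shared by the root paths of a and b."""
    seen = set(ancestors(a))
    for node in ancestors(b):
        if node in seen:
            return node
    raise ValueError("nodes are not in the same tree")


def relationship(a, b):
    """Orthologs if the genes diverged at a speciation, paralogs if at a duplication."""
    lca = last_common_ancestor(a, b)
    return "orthologs" if EVENT[lca] == "speciation" else "paralogs"


print(relationship("alpha_human", "alpha_mouse"))  # orthologs
print(relationship("alpha_human", "beta_mouse"))   # paralogs
print(relationship("alpha_human", "beta_human"))   # paralogs
```

In these terms, the mixed α/β data set warned against above amounts to aligning sequences that the rule classifies as paralogs, so their positional homology holds only relative to the gene tree, not relative to the organisms.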


Independence of Different Homologs 


Atomization refers to the ability of an investigator to 
gather characters of organisms into suites of sup- 
posedly homologous characters. This activity is best 
seen in the construction of a data matrix for purposes 
of analyzing phylogenetic relationships. Columns of 
data are hypothesized to represent different and inde- 
pendent suites of homologous characters. (Indeed, most phylogenetic algorithms treat different data columns as independent.) At the level of gene sequences, this may be an easy task, because base position in aligned orthologous gene sequences provides a rationale for recognizing data columns that contain homologous 
nucleotides. In morphology, behavior, and function, 
the issue of how to atomize characters can be more 
complex, but in general some judgement is made that 
divides the features observed into the smallest com- 
parable units that the investigator can justify. 

Given this atomization, there remains the issue 
of character independence among different suites of 
homologs. In systematic analysis, this issue can be 
framed rather crisply: how many independent col- 
umns of data actually exist as compared to the total 
number of data columns. For example: if the investi- 
gator is analyzing sequences from a ribosomal gene, 
are the data columns that record base pair comple- 
ments really independent of each other? If we examine 
the distribution of synapomorphies over a phylo- 
genetic tree, we can partly address this question. 
Figure 1 A phylogeny of selected vertebrate groups 
illustrating two levels of homology for orthologous and 
paralogous genes (upper), and a gene tree of the globin 
family of genes illustrating sequence homology between 
paralogous genes at the level of gene trees. 


Synapomorphies from different suites of homologous 
characters that appear at different points on the phylo- 
geny are independent in the evolutionary sense. 
They may come to be dependent where they occur 
together in the same group, but their origins are not 
coupled. The same cannot be said when synapomor- 
phies co-occur on the same branch. In such cases, 
other studies (ontogenetic, for example) would have 
to be applied to demonstrate that they are independent 
characters. In some cases, such as synapomorphies 
from different genes, different gene regions, or differ- 
ent functional complexes, the case for independence 
may seem to be evident. In other cases, such as com- 
plementary base pairs in stem regions of ribosomal 
genes, the case for independence may be suspect. In 
evolutionary studies, especially those concerned with 
phylogeny reconstruction, the issue of independence is 
closely tied to the issue of support for a tree. Four 
synapomorphies that are functionally or ontogenet- 
ically linked may only be one synapomorphy (one evo- 
lutionary event with four manifestations) rather than 
four synapomorphies. If an alternative monophyletic
group is diagnosed by one, two, or three different
synapomorphies that are independent of one another, then
what appears to be the most parsimonious tree (four
dependent synapomorphies) may actually have equal or
less support than the alternative tree.
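The point about linked characters can be made concrete with a small sketch. It assumes, hypothetically, that each synapomorphy has already been assigned to a linkage group (for example, the stem region a paired base belongs to, or a developmental module), and that characters sharing a group count as one evolutionary event. The function name, character labels, and data are illustrative only.

```python
def independent_support(synapomorphies):
    """Count distinct evolutionary events among the synapomorphies of a clade.

    Each synapomorphy is a (character, linkage_group) pair; characters in the
    same linkage group are counted only once.
    """
    return len({group for _character, group in synapomorphies})


# Hypothetical clade A: four paired bases from one ribosomal stem region.
clade_a = [("site12", "stem1"), ("site13", "stem1"),
           ("site40", "stem1"), ("site41", "stem1")]
# Hypothetical alternative clade B: three changes in unlinked genes.
clade_b = [("geneX_site5", "geneX"), ("geneY_site9", "geneY"),
           ("geneZ_site2", "geneZ")]

for name, clade in (("A", clade_a), ("B", clade_b)):
    print(name, len(clade), "characters,", independent_support(clade), "independent")
```

Counted naively, clade A has more support (four characters against three); counted as independent events, clade B has more.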


Recognition and Testing of Homologs 


Patterson (1982, 1988) outlined and discussed the tests 
that can be applied to parts of organisms hypothesized 
to be homologs and has used these distinctions to 
characterize many types of nonhomologous similar- 
ities. His analysis was made under the assumption that 
homologies are apomorphies. 


Similarity Test 


Parts that are dissimilar are not likely candidates for 
hypotheses of homology. Testing may take the form of
Remane's (1956) tests of similarity, topological position,
and special correspondence. Base position forms
the major criterion of similarity in DNA sequence 
data (Remane’s criterion of topological position 
within the gene). Base similarity forms the major cri- 
terion among bases that occupy the same base position 
(similar bases are presumed homologous as an initial 
hypothesis). As Hennig (1966) stressed, characters that 
pass similarity tests must always be assumed hom- 
ologs in the absence of contrary evidence (such as 
that provided by the two additional tests detailed 
below). This assumption of homology is necessary to 
avoid ad hoc dismissal of evidence. 


Conjunction Test 


“If two supposed homologues are found together 
in one organism, they cannot be homologous” 
(Patterson, 1988, p. 605). In morphological characters, 
similarities that are found in two to many ‘copies’ are 
termed homonomies. Homonomies (iterative homo- 
logies at lower levels) may take the form of ‘serial 
homologs’ in the case of metameristic repeats of 
body segments or ‘mass’ or ‘general’ homologies in 
the case of hair in mammals. Homology statements 
mixing parts of homonomous body segments would 
result in spurious taxic homology statements (as in 
paralogous genes). However, the evolutionary novelty 
that produced the serial homology may act as a syn- 
apomorphy at a higher level in the phylogeny. In the 
case of such characters as mammalian hair, general 
presence of the mass homology may act as a synapo- 
morphy in spite of the fact that it may be difficult 
to impossible to provide a one-to-one homology 
statement about individual hairs. 

In genetic systems, similar characters that fail 
the conjunction test may take the form of paralogous 
genes or xenologous genes (paralogy is discussed 
above). Some paralogous gene families show concerted
evolution, and in some circumstances their copies
(plerologs) may be treated as a single gene for the
purposes of phylogenetic analysis (but see Hillis,
1994, for further discussion). Xenology obtains
when genes of the same gene family are spread by 
lateral gene transfer rather than common descent. 
On a phylogeny of organisms, paralogous genes are 
expected to form nested sets of characters that reflect 
descent of the gene family (Figure 1). There is no 
expectation of such a pattern in xenologs whose 
spread is not historically constrained. 


Congruence Test 


Similarities that pass both the similarity and
conjunction tests, and whose distribution on a phylogenetic
tree is congruent with many other similarities, are
deduced to be candidates for the status of uncontested
homologies. Under the assumption that homology
is apomorphy, the congruence test provides
the final arbiter for accepting or rejecting parts
that pass the first two tests. Similarities that fail the
congruence test are frequently termed parallelisms if 
they are very similar, or convergences if they are dis- 
similar upon reinspection. (The distinction between 
parallel and convergent characters is debatable, see 
Homoplasy.) 


Homology Issues in Molecular Genetics 


One basic issue is the extent to which molecular hom- 
ology differs from morphological, behavioral, and 
other kinds of homology. Patterson (1988) suggested 
that there was a difference because similarity was used 
to establish homology and the basis of this similarity 
was statistical (the hypothesis that sequence similarity
is due to chance is rejected). However, if we treat this
issue as one of identity or an issue of relationships 
among comparable entities, then there is no need to 
conclude that homology is fundamentally different on 
the molecular and morphological levels. Statistical 
similarity may lead to the conclusion that the genes 
of two organisms belong to the same class of gene (e.g., 
they are both α-hemoglobin), or to the hypothesis that
they are members of the same gene family (e.g., they 
are members of the globin gene family). However, 
testing the hypothesis that two apparent paralogs are 
members of a gene family would seem to be a matter of 
establishing their gene tree relationships and that 
requires synapomorphies at the gene level. 

Hillis (1994) provides a detailed discussion of issues 
of homology particular to molecular biology: an in- 
complete summary of some of the major issues is given 
below: 


1. Positional homology. Positional homology refers 
to the position of a single nucleotide site within a 
gene, a ribosome, or an amino acid site within 
a protein. Since nucleotides and amino acids have 
the same structure regardless of their evolutionary 
origins, the similarity criterion applied at the
nucleotide and amino acid levels of analysis refers
only to positional homology. An adenine and a
thymine at a well-established homologous position
are regarded as homologous in spite of their
obvious dissimilarity at the structural level of
the nucleotide. For orthologous genes, accuracy of
positional homology depends on correct alignment
of the nucleotide or amino acid sequences. Alignment of
sequences of orthologous genes of the same length 
is relatively easy. Difficulties arise when genes con- 
tain introns or diverge such that they are of differ- 
ent lengths. Such cases require an understanding of 
gene architecture (in the case of exon-intron rela- 
tionships; loop and stem architecture of the func- 
tional ribosomal sequence, etc.) and explicit rules 
for aligning the obtained sequences that include 
costs for introducing gaps. 


2. DNA hybridization. DNA hybridization provides
a measure of overall similarity of cross-hybridized
sequences but does not distinguish between ortho- 
logy, paralogy, and positional homoplasy. DNA 
hybridization is not useful for explicit hypotheses 
of homology. 


3. Restriction enzyme analysis. Restriction site mapping
can yield homologous characters because restriction
sites are specific recognition sequences located at
particular positions along orthologous genes. Restriction
fragment homology determination is more
problematic, especially between species, because of
various sources of error.


4. Random amplified polymorphic DNA (RAPD).
Fragments produced in RAPD studies have the
same sources of error as other fragment data. In 
addition, studies have suggested that amplification 
of paralogs and nonhomologous loci may yield 
fragments of the same size and that RAPD-based 
phylogenetic inferences are incongruent with well- 
established phylogenies. 


5. Allozyme electrophoresis. Allozyme electrophoresis
is particularly valuable for studying the distribution
of paralogs among taxa and the differential
expression of paralogous genes in different
tissues of the same organism. Criteria for
determining the orthologous or paralogous nature
of expressed products are well established. For
electromorphs of an orthologous gene, homology is
determined by electrophoretic mobility coupled 
with the congruence test and works best for closely 
related species. 
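The alignment step mentioned under positional homology (explicit rules that include costs for introducing gaps) can be illustrated with a minimal global alignment in the style of Needleman and Wunsch, using a simple linear gap cost. The scores used here (match +1, mismatch -1, gap -2) are arbitrary assumptions for illustration; real studies use justified scoring schemes and, for ribosomal genes, often structure-aware alignment.

```python
def align(a, b, match=1, mismatch=-1, gap=-2):
    """Global alignment of sequences a and b with a linear gap cost.

    Returns the two aligned strings (gaps shown as '-') and the alignment score.
    """
    def pair_score(x, y):
        return match if x == y else mismatch

    n, m = len(a), len(b)
    # score[i][j] is the best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap  # a[:i] aligned entirely against gaps
    for j in range(1, m + 1):
        score[0][j] = j * gap  # b[:j] aligned entirely against gaps
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + pair_score(a[i - 1], b[j - 1]),  # same column
                score[i - 1][j] + gap,                                 # gap in b
                score[i][j - 1] + gap,                                 # gap in a
            )

    # Trace back from the bottom-right cell to recover one optimal alignment.
    top, bottom = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair_score(a[i - 1], b[j - 1]):
            top.append(a[i - 1])
            bottom.append(b[j - 1])
            i -= 1
            j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            top.append(a[i - 1])
            bottom.append("-")
            i -= 1
        else:
            top.append("-")
            bottom.append(b[j - 1])
            j -= 1
    return "".join(reversed(top)), "".join(reversed(bottom)), score[n][m]


aligned_a, aligned_b, best = align("GATTACA", "GATACA")
print(aligned_a)
print(aligned_b)
print(best)  # 4: six matched positions and one gap
```

Each column of the resulting alignment is a hypothesis of positional homology; changing the gap cost can change which columns are proposed.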
Lankester (1870) introduced the term ‘homoplasy’ to
describe all resemblances that were not homologous. 
Lankester included such resemblances of serial and 
general homologs within the concept, but most mod- 
ern biologists restrict the concept to analogous, con- 
vergent, and parallel similarities shared among species 
or other taxa. In general, taxic homoplasies are similar- 
ities in either form or function that fail one or more of 
the three tests: similarity, conjunction, and congru- 
ence (see Homology). 


Analogous Similarities 


There is considerable confusion concerning the con- 
cept of analogy (see Analogy). Analogous similarities, 
as the term is usually applied in systematics, are 
similarities in function and frequently do not appear 
as similarities in underlying structure. As such, they 
do not usually appear as homoplasies because they are 
screened before analysis and would appear in different 
data columns in a matrix of characters. That is, the 
investigator would not enter the analysis with an 
underlying hypothesis that the wings of bats and the 
wings of insects were homologous. However, analo- 
gous similarities can appear as homoplasies if the 
underlying structures are homologous but modified 
to perform a similar function. For example, one could 
imagine that a matrix of all vertebrates would contain 
a column containing the character ‘wings present’ 
versus ‘forelimbs present,’ and that the resulting an- 
alysis would show ‘wings present’ as homoplastic, 
appearing as a synapomorphy of birds and another 
synapomorphy of bats independently. However, 
even a cursory examination of the character ‘having
wings’ would reveal that the wings of bats and birds
differ in the details of wing architecture. Analogies
fail the similarity test (see Analogy for further
discussion).


Convergence and Parallelism 


Patterson (1988) reviews the history of the distinction 
between parallel and convergent similarities. Some 
authors find the distinction to be arbitrary and use 
only the term homoplasy (e.g., Wiley, 1981; Ax, 1987). 
Others suggest that while the concepts are not easily 
separated, convergences are similarities exhibited by 
groups that are not closely related (e.g., enlarged
canines of marsupial cats and saber-toothed cats), 
while parallelisms are exhibited by groups that are 
closely related. Parallelisms and convergences 
are similar in that both are identified through the 
congruence test. Patterson (1988) suggested that con- 
vergences fail the similarity test at some level (like 
analogies) while parallelisms pass the similarity test. 
Of course, since all characters of organisms, including 
the homologous ones, are dissimilar at some level, 
a certain amount of arbitrariness might be involved 
in the assessment. Hennig (1966) used a special term 
for some parallelisms, homoiology, to denote those 
parallelisms that arise repeatedly from a common genetic
base. Some authors, such as Wagner (1989), consider
parallelisms to be homologous under some concepts 
of homology. (This reasoning finds its analog in the 
idea that paraphyletic groups are a kind of monophy- 
letic group.) Parallelisms are not considered homolo- 
gous under the concept that taxic homologs are 
apomorphies at some level in the phylogeny. 

Parallelisms and convergences can be found at sev- 
eral levels of organization. The most basic level is the 
level of sequence variation in DNA, amino acid vari- 
ation in protein sequences, electromorph variation at 
the allele level, and part descriptions at the morpho- 
logical level. In each case, similar characters (base 
residues, amino acids, electromorphs, flower color,
bone shape, etc.) that pass the criteria of similarity and 
conjunction fail the test of congruence. That is, they 
appear in a phylogenetic tree more than once, indicat- 
ing that they originated in two or more lineages. At the 
level of genes, lateral gene transfer may result in the 
presence of genes in quite distantly related organisms. 
Such genes are termed xenologous genes. Patterson 
(1988) suggested the term ‘paraxenolog’ for the 
case in which more than one copy of a xenologous 
gene family was present in the same organism. He
suggested that this was somewhat analogous to 
homeosis at the morphological level. At morphologi- 
cal levels of organization, parallelism or convergence 
may take the form of presence or absence of an entire 
structure that has been lost or gained independently in 
several lineages. 


Homonomy 


At the level of taxa, homonomies are similarities that fail
the conjunction test because two or more similar
structures are found in the same taxon. Some homonomies
are termed ‘general homologies’ or ‘mass
homologies,’ and their presence versus absence is
treated as a synapomorphy. For example, although it
is difficult to homologize any two mammalian hair
follicles, the presence of hair is treated as a synapomorphy
of Mammalia. In other cases, body parts are
duplicated through metamerism in development
(legs and antennae of insects). The classic example of
homoplasy in this context is the mouthparts of
arthropods, where the function of feeding is allocated to
appendages that belong to different segments in different
groups. Homologous structures can be found
among organisms within a homologous segment, 
but comparison of similar appendages that belong to 
different segments would lead to a mistake in taxic 
homology determination. Paralogous genes are ex- 
amples of homonomous parts at the organism level 
of organization. Just as with ‘mass homology,’ the 
presence of a particular gene family may be treated 
as a synapomorphy if it passes the congruence test 
(see Homology). 
Homozygosity is the genetic state in which a diploid
organism carries two identical alleles at a locus of 
interest. In this situation, the organism is considered 
to be homozygous at this locus. The contrasted state 
is heterozygosity. 


See also: Heterozygote and Heterozygosis 
The genus Hordeum, the barleys, comprises a group
of grass species, the most economically and socially
important of which is the cultivated form, Hordeum
vulgare, which is the fourth most widely grown cereal
after wheat, rice, and maize. Hordeums belong to the
tribe Triticeae, which also includes the wheats and rye,
in the grass family Poaceae, and have a basic chromosome
number of x = 7 (2n = 14 in diploids). However, they exist in
diploid, tetraploid, and hexaploid forms. The mor- 
phology of the Hordeums is rather specialized and 
they are characterized taxonomically by having spike- 
lets with single flowers borne together in triplets on 
the main axis of the spike (the rachis). The central 
spikelet is generally sessile and male and female fertile, 
whereas the two lateral spikelets, which are stalked in
most species, may be fertile (as in six-row cultivated
barley) or sterile (as in two-row cultivated barley). In
Hordeums the glumes are reduced compared to most
Triticeae and situated on the dorsal side of each spikelet,
with long awns on both the lemmas and the glumes.
Species differ in reproductive behavior
and life cycle, with H. vulgare being annual and self-
pollinating, and species such as H. bulbosum being
perennial and obligately cross-pollinating by virtue
of having a self-incompatibility mechanism.


Origins and Phylogeny 


The genus comprises about 30 species distributed 
through the temperate regions of most of Eurasia, 
North and South America, Africa, and Australia. 
With respect to cultivated barley, it is generally recog- 
nized that there are three gene pools that can be 
exploited for barley improvement. The primary gene 
pool consists of cultivated barley, H. vulgare subsp. 
vulgare, and its wild progenitor which grows pre- 
dominantly in the Middle East, H. vulgare subsp. 
spontaneum (known generally as H. spontaneum). 
Crosses between H. vulgare and H. spontaneum are 
easily obtained and hybrids are self-fertile. The secondary
gene pool comprises only H. bulbosum, which
exists in two forms, a diploid form (2n = 14) and an
autotetraploid form (2n = 28). H. vulgare can be
hybridized with both forms, and hybrids are easily
obtained with the use of embryo rescue techniques.
Although the hybrids are generally sterile, seed can be
obtained by backcrossing the hybrid as female to H.
vulgare, and genetic recombination between the
genomes has recently been achieved. One peculiarity
of H. vulgare × diploid H. bulbosum crosses is that
generally, after a hybrid zygote forms, the H. bulbo- 
sum chromosomes are eliminated at cell divisions giv- 
ing rise to the embryo, so that the end product is a 
haploid H. vulgare plant. This process is now used as a 
breeding tool to produce barley doubled haploid 
populations. 

The tertiary gene pool comprises all the other 
Hordeum species. Cultivated barley can be hybridized 
to many of these to give sterile hybrids, but there are
few authenticated reports of gene transfer from these
hybrids into cultivated barley. However, barleys can
also be hybridized to many
other Triticeae species, for example, wheat and rye,
and this is very useful for genetical and cytogenetical 
analysis. 

Cultivated barley is an annual species, but varieties 
have been bred that are suitable for sowing either in 
the autumn (winter barley) or in late winter or early
spring (spring barley). This difference is genetically
determined and particular varieties are adapted to
each of these different life cycles. Winter varieties
have a requirement for a period of low-temperature
treatment (vernalization) before floral initiation can
commence, whilst spring varieties do not. Winter varieties
tend to be more frost tolerant and are generally
adapted to resist or tolerate a different disease spectrum
from that of spring barleys.


Uses of Barley 


The grain of cultivated barley has two major uses, first 
for malting to produce beer and spirits, and second for 
animal feed. Plant breeding has produced varieties that 
are specialized for malting, and all others not suitable 
for this are used for animal feed. These latter varieties 
tend to be the highest yielding. Some grain is used 
directly for human food products, for example, in 
certain countries such as Ethiopia and Nepal, but 
overall, this is a minor use. Malting varieties are bred 
for a particular grain composition which includes low
protein and β-glucan content, and high enzyme activity,
although the final product is also affected by the
environmental conditions under which the variety is
grown. To produce malt for the brewing industry,
grains of barley are germinated so that enzymes are
released for digestion of the cell walls and endosperm.
The digested grains are then heat treated and dried.
This produces malt, which is a mixture of enzymes and
substrates, mainly starch, proteins, and β-glucans. The
malting process thus involves degradation of the cell
wall material by β-glucanases, digestion of starch by
α-amylases, and hydrolysis of the protein matrix. The
malt is then used as a substrate for fermentation by
yeasts in the brewing process. Different malts and
brewing additives result in different types of beers. 
Breeding of feed varieties concentrates on maximiz- 
ing the yield through improved agronomic character- 
istics. Little research has been done on selecting for 
improved nutritional aspects of barley, although some 
research has tried (unsuccessfully) to increase the 
lysine content of barley. Lysine is an essential amino 
acid needed for animal growth and is limiting when 
barley is fed in isolation. Generally, however, barley is 
used as the energy component of the animal diet with 
protein coming from other sources such as legumes. 


Cytogenetics of Barley 


Barleys have a basic chromosome number of seven. 
The chromosomes are large enough to be identified 
individually by light microscopy, particularly if they 
are differentially stained using C-banding or N-band- 
ing where blocks of heterochromatin reveal distinctive 
patterns for each chromosome. This also reveals spe- 
cies relationships, such as the close relationship 
between the genomes of North and South American 
species. The H symbol (with or without a species
superscript) is conventionally used to designate chromosomes
of the genus so as to indicate homoeology
with chromosomes of other species of Triticeae. Cultivated
barley chromosomes are thus designated 1H to
7H, and H. bulbosum and H. chilense chromosomes
H^b and H^ch, respectively. Barley geneticists
originally designated the chromosomes of cultivated
barley from 1 to 7 and the relationship between
the old and new (H) nomenclature is 1 = 7H, 2 = 2H, 
3 = 3H, 4 = 4H, 5 = 1H, 6 = 6H, and 7 = 5H. 
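Because older literature uses the numeric names, a lookup table is a convenient way to convert between the two systems. A minimal sketch, with the correspondence taken directly from the list above; the variable names are illustrative.

```python
# Old numeric barley chromosome names mapped to the current Triticeae (H) names.
OLD_TO_NEW = {1: "7H", 2: "2H", 3: "3H", 4: "4H", 5: "1H", 6: "6H", 7: "5H"}

# Reverse lookup, from the H designation back to the old number.
NEW_TO_OLD = {new: old for old, new in OLD_TO_NEW.items()}

print(OLD_TO_NEW[5])     # 1H
print(NEW_TO_OLD["5H"])  # 7
```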
Cytogenetical and genetical analysis in barley has 
been greatly assisted by the availability of a range of 
aneuploid stocks including a complete barley trisomic 
series and 11 telotrisomic lines. Amongst the most 
useful aneuploid stocks for genetical analysis in barley 
are those obtained by interspecific hybridization, in
particular the addition lines carrying chromosomes of
cultivated barley or H. chilense in bread wheat, and the
substitution lines derived from these. In addition, cultivated barley
has a whole range of other cytogenetically defined 
stocks including over 1000 reciprocal translocations, 
inversions, deletions, and duplications. Deletion 
breakpoints have been used to make comparisons 
between physical and genetic maps of barley. 


Genetic Markers and Genetic Maps of Barley


Various molecular assays have been developed to 
detect polymorphism at the DNA level in barley. 
Restriction fragment length polymorphism (RFLP),
relying on the use of restriction enzymes, has been
complemented by assays arising from development of 
the polymerase chain reaction (PCR). These include: 
random amplified polymorphic DNA (RAPD), 
amplified fragment length polymorphism (AFLP), 
and simple sequence repeats (SSRs) or microsatellites. 
Both RAPD and AFLP allelic polymorphisms are 
inherited in a dominant manner, whereas SSR poly- 
morphisms are transmitted in a codominant manner. 
The convenience and high information content of 
SSRs have resulted in this class of molecular marker 
being very popular with barley researchers. Currently 
there are over 560 functional barley SSRs. The detec- 
tion and quantification of single nucleotide poly- 
morphisms (SNPs) is in its infancy in barley. 
However, it is anticipated that this form of biallelic 
marker has great potential to improve the efficiency of 
marker-assisted selection and provide a means of relat- 
ing sequence diversity to phenotype. 
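The practical difference between dominant (RAPD, AFLP) and codominant (SSR) markers shows up in the classes that can be scored in a segregating F2. A minimal sketch, assuming a single locus with two alleles, A and a, segregating in Mendelian fashion from an Aa F1, and assuming that a dominant marker is scored only as band present or band absent:

```python
from collections import Counter
from itertools import product


def f2_classes(codominant):
    """Scorable classes among the 4 equally likely gamete pairings of an Aa x Aa cross."""
    classes = Counter()
    for egg, pollen in product("Aa", repeat=2):
        genotype = "".join(sorted(egg + pollen))  # 'AA', 'Aa' or 'aa'
        if codominant:
            # e.g. an SSR: both alleles are visible, so every genotype is distinct
            classes[genotype] += 1
        else:
            # e.g. a RAPD or AFLP band: scored only as present or absent
            classes["present" if "A" in genotype else "absent"] += 1
    return dict(classes)


print(f2_classes(codominant=True))   # {'AA': 1, 'Aa': 2, 'aa': 1}
print(f2_classes(codominant=False))  # {'present': 3, 'absent': 1}
```

With the dominant marker, AA and Aa individuals fall into one class, so heterozygotes cannot be identified; this is part of the higher information content of codominant SSRs.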

Using standard segregation analysis, more than 80 
loci for morphological and disease resistance charac- 
ters were assigned to the seven barley chromosomes 
by 1962. Genetic maps have been created by monitoring
the segregation of alleles from F2, backcross,
recombinant inbred, and doubled haploid families. 
Developments in molecular biology, coupled with 
access to computer software and mapping algorithms, 
have resulted in a recent explosion of information. 
Extensive genetic maps, incorporating morphological, 
biochemical, and molecular marker data are now 
being created. In addition, composite maps repre- 
sented by data from multiple mapping populations 
have been generated. 
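The basic calculation behind such maps can be sketched for a doubled haploid population. Each line there represents a single meiotic product, so the recombination fraction between two loci is simply the proportion of lines carrying a non-parental combination of alleles. Converting that fraction to map distance requires a mapping function; Haldane's function is used here as one common choice (Kosambi's is also widely used). The allele coding and the data are hypothetical.

```python
import math


def recombination_fraction(lines):
    """Proportion of lines whose alleles at two loci are a non-parental combination.

    Each line is a pair of parental allele codes ('A' or 'B') at locus 1 and locus 2.
    """
    recombinant = sum(1 for locus1, locus2 in lines if locus1 != locus2)
    return recombinant / len(lines)


def haldane_cm(r):
    """Haldane map distance in centimorgans for recombination fraction r (0 <= r < 0.5)."""
    if not 0 <= r < 0.5:
        raise ValueError("r must be in [0, 0.5)")
    return -50.0 * math.log(1.0 - 2.0 * r)


# Hypothetical doubled haploid population of 100 lines.
population = ([("A", "A")] * 40 + [("B", "B")] * 40
              + [("A", "B")] * 10 + [("B", "A")] * 10)

r = recombination_fraction(population)
print(r)                        # 0.2
print(round(haldane_cm(r), 1))  # 25.5
```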

Barley genetic maps are now viewed as an import- 
ant resource to localize qualitative and quantitative 
traits for marker-assisted breeding. They also provide
a platform for the map-based cloning of genes for
simple and complex phenotypes. 


Breeding Barley 


Barley is a natural inbreeder and most breeding 
schemes follow a pedigree selection scheme with 
minor variations in detail. All schemes are based on 
the principle of identifying the desirable recombinant 
whilst progressing to homozygosity. Conventional 
breeding schemes tend to be lengthy (up to 10 years) 
but have been successful in contributing an average 
1% annual increase in grain yield. Both single seed 
descent and doubled haploid methods are being used 
to augment conventional breeding methods. These 
approaches reduce the time scale and improve the 
efficiency of selection by creating homozygous mater- 
ial for evaluation. Molecular marker technology is 
being used to enhance the effectiveness of barley
breeding by identifying new sources of allelic variability
and by enabling targeted backcross conversion
programmes. Genotype-by-environment interaction is
one of the factors that has limited barley breeding for 
low input environments. Decentralization of the 
breeding process together with farmers’ participation 
is being deployed in developing countries. 
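Why progressing to homozygosity takes several generations, and why doubled haploids shorten the process, follows from the fact that each generation of self-pollination halves the expected proportion of heterozygous loci. A minimal sketch, assuming an F1 that is heterozygous at every segregating locus, continued selfing, and no selection:

```python
def expected_heterozygosity(generation):
    """Expected fraction of loci still heterozygous in generation F<generation>.

    Assumes an F1 heterozygous at every segregating locus, continued selfing
    and no selection; each selfing generation halves heterozygosity.
    """
    if generation < 1:
        raise ValueError("generation must be 1 (F1) or later")
    return 0.5 ** (generation - 1)


for n in range(1, 8):
    print(f"F{n}: {expected_heterozygosity(n):.4f}")
```

A doubled haploid line, by contrast, is completely homozygous in a single step.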


Pests and Diseases 


With respect to fungal pathogens of barley, powdery 
mildew (Erysiphe graminis) is of major significance 
and interest. Major gene resistance loci have been 
located on the seven barley chromosomes, and two 
genes (Mlo and Mla) responsible for resistance have 
been isolated and characterized. Cereal rusts (Puccinia 
graminis, P. hordei, P. striiformis) are a second class of
obligate biotrophic pathogens of economic signifi- 
cance. In addition Rhynchosporium secalis (scald), 
Pyrenophora teres (net blotch), P. graminea (leaf 
stripe), and Cochliobolus sativus (spot blotch) are
important pathogens. For many pathogens, major
resistance genes have been recognized and localized
to chromosomes. Resistance to barley yellow
dwarf virus (BYDV) conferred by Yd2 is located in
the centromeric region of chromosome 3L. The barley
yellow mosaic virus complex comprises two
different viruses: barley mild mosaic virus (BaMMV)
and barley yellow mosaic virus (BaYMV). Cereal 
cyst nematode Heterodera avenae is an important
pest of barley with resistance genes being identified 
on chromosome 2L. 


Genetic Engineering of Barley 


The genetic transformation of barley is now possible 
using a variety of techniques. This has opened up the 
possibility of genetically engineering barley using 
cloned genes from any biological source, be it other 
plants, microorganisms such as bacteria and viruses, 
and even animals. The predominant technique of 
transforming barley is to use ‘biolistics,’ that is, 
shooting isolated pieces of DNA coated onto gold 
particles into target tissue. Target tissues are generally 
isolated microspores (immature pollen grains) or 
immature embryos excised from developing grains. 
After shooting, the target tissue is placed on a medium 
which allows the development of callus tissue, and
transformed callus is selected by the presence of an
introduced selectable marker gene in addition to the
target gene. Usually the selectable marker is the bar
gene, conferring resistance to the herbicide bialaphos,
so that when the callus is cultured on media containing
the herbicide, only transformed tissue grows. Thus,
most barley varieties transformed for a particular
desired trait are also herbicide-resistant. Biolistic
methods of transformation are random with respect
to where the target genes are introduced into the
genome, and often several copies are integrated
at one or a few loci. Recently, barley has also been
successfully transformed using Agrobacterium, and 
this may have the advantage of allowing more control 
of the gene integration process. 

Present commercial targets for the genetic engin- 
eering of barley include modifications for improved 
malting quality, better pest and disease resistance, and 
greater nutritional quality of the grain, although no 
transgenic barley has been released commercially in 
the world up to the beginning of the new millennium. 


See also: Grasses, Synteny, Evolution, and 
Molecular Systematics; Polyploidy; Transfer of 
Genetic Information from Agrobacterium 
tumefaciens to Plants; Triticum Species (Wheat) 
Horizontal gene transfer is generally defined as the
lateral transfer of a gene, or other DNA sequence, 
from one genome to another. Transfer between con- 
temporary individuals of different species is usually 
implied. However, a special case involves the horizon- 
tal transfer of DNA between chloroplast, or mito- 
chondrial, and nuclear genomes. Horizontal transfer 
is distinct from the normal mode of vertical transfer by 
which genetic information is passed from parent to 
offspring. In addition to entire genes, parts of genes, 
such as exons or introns, may also be transferred in 
this way. Sometimes horizontal transfer is also used to 
denote the transfer of a parasite, or endosymbiont, 
from its association with one host species to that of 
another. Although horizontal transfer is more likely to 
be successful between closely related than distantly 
related species, it does occur between species as diver- 
gent as those found in different kingdoms. This review 
focuses on horizontal transfers involving eukaryotic 
organisms. 


Frequency of Horizontal Transfer 


Until quite recently, it was widely believed that hori- 
zontal transfer was mostly restricted to bacteria, and 
that this process, if it occurred at all, had little importance
for the understanding of evolution in eukaryotes.
With the advent of large-scale DNA sequencing it has
become apparent that both the frequency and signi- 
ficance of this phenomenon have been considerably 
underestimated. Not only is there evidence that hori- 
zontal transfer is rampant among contemporary 
bacteria, but it also seems to have dominated the evo- 
lution of early life before modern cells came into 
being. Horizontal transfer appears to have become 
progressively less frequent with the evolution of
increasingly complex eukaryotic cells and the
erection of barriers to the promiscuous exchange of 
DNA between divergent lineages. It is important 
to note that only those transfers affecting the germ 
cells that produce the next generation are of any signi- 
ficance from an evolutionary perspective. When the 
germline is sequestered in specialized organs, as it is in 
humans, its reduced accessibility provides an add- 
itional barrier to horizontal gene transfer. Although 
the frequency of horizontal transfer involving eukary- 
otes appears to be extremely low compared to that 
among prokaryotes, it can, in some instances, have 
important evolutionary consequences, as described 
below. 


Mechanisms of Horizontal Transfer 


Horizontal transfer is an end product of a process,
rather than a specific mechanism. Apart from in- 
stances of horizontal transfer following rare matings 
between closely related species that are usually 
reproductively isolated, mechanisms of horizontal 
transfer are by their very nature mating-independent. 
Transfers from eukaryotes to prokaryotes, by means 
of transformation, commonly occur in contemporary 
molecular biology laboratories. Transfers from pro- 
karyotes to eukaryotes may occur by transformation 
or conjugation in nature. A large variety of bacterial 
plasmids can stimulate conjugal transfer of DNA from 
bacteria to a broad range of organisms, including other 
bacteria, yeast, fungi and plants. Conjugal plasmids 
can survive in host species during normal vertical 
evolution and they have the ability to adapt to their 
new host following horizontal transfer. Other possible
mechanisms of non-plasmid-mediated transfer into
eukaryotic cells include endocytosis-mediated transfer
into mammalian cells and fungus-to-fungus
endoparasitism. In theory, viruses have many of the
properties necessary to enable them to carry DNA 
sequences between species. However, many viruses 
have limited host ranges and well-documented ex- 
amples of viral transfer in nature have been difficult 
to find. Although parasitic wasps and mites may also 
serve as transfer vectors, the identity of the specific 
vector in most cases of horizontal transfer remains 
enigmatic. 


Detection of Horizontal Transfer 


The discovery of a striking discontinuity in the
phylogenetic distribution of a gene or other DNA
sequence, or of incongruence between gene trees
and species trees, often provides reason to suspect
that horizontal transfer may have occurred. However,
there are several pitfalls in drawing quick conclusions
from such observations alone, because a number of
other mechanisms can also lead to incongruent phylo- 
genetic trees. These include unequal rates of nucleotide 
substitution, ancestral polymorphisms, convergent 
evolution and inappropriate comparisons between 
paralogous, rather than orthologous, members of 
multigene families. 
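The incongruence test described above can be made concrete by comparing the clades (the groups of leaves below each internal node) of a gene tree with those of a species tree. The following is a minimal Python sketch, assuming rooted trees written as nested tuples of leaf names; the function names are hypothetical, and the clade comparison is a simple illustration rather than a published detection method. As the paragraph cautions, a conflicting clade only flags a candidate: unequal substitution rates, ancestral polymorphism, convergence, or paralogy can produce the same signal.

```python
from typing import FrozenSet, Set, Tuple, Union

# A rooted tree as nested tuples of leaf names, e.g. (("A", "B"), ("C", "D")).
# This representation is an assumption made for illustration.
Tree = Union[str, Tuple["Tree", ...]]


def clades(tree: Tree) -> Set[FrozenSet[str]]:
    """Return the leaf set found below every internal node of the tree."""
    found: Set[FrozenSet[str]] = set()

    def walk(node: Tree) -> FrozenSet[str]:
        if isinstance(node, str):  # a leaf
            return frozenset([node])
        leaves = frozenset().union(*(walk(child) for child in node))
        found.add(leaves)  # record this internal node's clade
        return leaves

    walk(tree)
    return found


def conflicting_clades(gene_tree: Tree, species_tree: Tree) -> Set[FrozenSet[str]]:
    """Clades supported by the gene tree but absent from the species tree."""
    return clades(gene_tree) - clades(species_tree)


species = (("A", "B"), ("C", "D"))
gene = (("A", "C"), ("B", "D"))  # hypothetical gene tree grouping A with C
print(sorted(sorted(c) for c in conflicting_clades(gene, species)))
```

An empty result means the two trees agree on every clade; a non-empty result lists the groupings that would need explaining, by horizontal transfer or by one of the alternative mechanisms above.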


Transkingdom Horizontal Transfer 


A number of possible instances of horizontal transfer 
between different kingdoms have been proposed, but 
the supporting evidence is much stronger for some 
claims than others. The horizontal transfer of glucose- 
6-phosphate isomerase between a eukaryote (plant) 
and a prokaryote (ancestor of the bacterium Escher- 
ichia coli) provides one well-supported example. 
Another such example is the transfer of Fe-superoxide 
dismutase between a prokaryote and the eukaryotic 
protist Entamoeba histolytica. Endosymbiotic gene
transfer is a special case of transkingdom horizontal
transfer that began when certain bacteria were taken
up into the cells of early eukaryotes. These
endosymbionts later evolved into mitochondria and
chloroplasts, with their own organellar genomes;
both are now components of plant cells, and
mitochondria are also found in animal cells. Although
these organelles still contain most of the proteins
integral to the eubacterial nature of their metabolism,
mitochondria and chloroplasts have subsequently
relinquished most of their genes to the nucleus by
horizontal transfer, so that many organellar proteins
are now encoded in the nucleus. Some genes of eubacterial origin
have replaced their nuclear homologs subsequent to 
transfer. In other instances, the products of other 
transferred genes were rerouted during evolution to 
compartments other than those from which the genes 
were donated. 


Horizontal Transfer between 
Eukaryotes 


In contrast to prokaryotes in which horizontal trans- 
fer is rampant, only relatively few cases involving 
eukaryotes have been well documented. These may 
conveniently be divided into two groups, depending 
on whether or not transposable genetic elements are
involved. Some transposable elements naturally possess
the molecular machinery for inserting their DNA
into different locations of a host genome — an import- 
ant prerequisite for successful horizontal transfer. 
Transposable elements routinely use this machinery 
for transposition to different sites within a single host 
genome, but occasionally it is used for jumping 
between genomes of different host species. However, 
unlike some viruses, transposable elements do not have 
the ability to survive outside of the environment of a 
host cell. Therefore, they are dependent on other organisms
for transfer between species. In most instances the
identity of these transfer vectors is not known.

Prominent among well-documented examples of 
horizontally transferred transposable elements are 
the P and mariner elements that were first described 
in Drosophila species. Both of these elements transpose
by means of a DNA intermediate. The
mariner element is capable of spectacular inter- 
kingdom jumps because it does not depend on host 
factors to integrate into the genome of a new species. 
In contrast, the P element does require host factors 
for integration and has a host transfer range that is 
apparently restricted to a few insect orders. Recent 
evidence indicates that copia and some other
retroelements, which use reverse transcriptase for
transposition, are also capable of horizontal transfer
between species.


Significance 


The existence of horizontal gene transfer in nature has 
important implications for both basic and applied 
science. However, because studies in this area are
still in their infancy, the full significance of this
process is not yet known. The strictly bifurcating tree of life as
envisaged by Darwin assumes no exceptions to the 
vertical transmission inherent in normal parent- 
to-offspring inheritance of genetic material. In con- 
trast, horizontal transfer introduces crosslinks into the 
phylogenetic trees of those genes that are transferred 
and incongruities between the phylogenies of differ- 
ent gene sequences. Thus if horizontal transfer is fre- 
quent, our picture of the tree of life is changed 
significantly and serious practical difficulties can 
arise when attempts are made to infer phylogenies 
from horizontally transferred sequences. The exist- 
ence of natural horizontal transfer also has important 
implications for artificial gene transfer in medicine 
and agriculture. 


See also: Conjugation; Symbiosis Islands; Transfer 
of Genetic Information from Agrobacterium 
tumefaciens to Plants 
Large virulent bacteriophages like coliphage T4 and
Bacillus subtilis phage SPO1 generally have many weapons in
their arsenal, each of which by itself is capable of kill- 
ing or seriously damaging the host cell. The products 
of these genes are involved in shutting off host tran- 
scription, translation, DNA replication and/or cell 
division. The phage may also encode nucleases that 
selectively degrade the host DNA. If cloned into a 
host cell, each of these individual host-lethal genes 
can kill the host or drastically slow its growth if any 
expression of the gene occurs during growth of the 
host cell, even if the gene is not intentionally being 
expressed. Regions encoding such genes are generally 
missing from cloning libraries and/or contain many 
mutations, since only cells where the lethal functions 
have been lost can survive. Understanding the 
mechanisms involved in the virulence of such viral 
genes can provide new insights into key aspects of 
host physiology. The genes they target are presumably 
important to bacterial survival; thus, identifying those 
host genes can suggest potential targets and 
approaches for developing new classes of chemical 
antibiotics involving molecules that can mimic the 
effects of these phage proteins.

Cloning of such host-lethal genes is challenging, 
since readthrough of terminator sites generally 
permits a basal level of transcription of the entire 
plasmid even if the cloned gene is missing its own 
promoter and is put under the control of a promoter 
that can be carefully controlled. Special vectors have 
been developed to aid in cloning such genes in bacter- 
ial systems and to permit very tightly controlled over- 
expression of the cloned protein. For example, many 
of the pET vectors carry the lac operator region adja- 
cent to the cloning site along with the gene for the lac 
repressor in the opposite orientation following the 
cloning site. This blocks readthrough into the cloned 
gene both through binding of the lac repressor and 
through the synthesis of antisense messenger from the 
lac promoter. The pET vectors also have a bacterio- 
phage T7 late promoter, recognized only by the effi- 
cient T7-encoded RNA polymerase, in front of the 
cloning site. Thus, expression can be obtained by 
transferring the plasmid into a host with an inducible 
T7 polymerase gene under tight control or, in the case 
of very host-lethal proteins, by growing the cells into 
mid-log phase and then infecting them with a special 
lambda phage into which the T7 polymerase gene has
been cloned. This method works even for very
host-lethal proteins like the T4 alc gene product
(gpalc), which shuts off the elongation of
transcription on all templates containing cytosine in
their DNA. This is a good strategy for T4, since it uses
hydroxymethylcytosine rather than cytosine in its
DNA. The alc protein provides a valuable tool for
looking at the process of transcription elongation, 
since it is the only factor known that can produce 
termination only when the RNA polymerase is actively 
elongating, not when it is pausing or moving slowly. 


See also: Bacteriophages; Elongation Factors; 
Translation 
The host range of a phage is the spectrum of cells that
it can infect and lyse. For instance, bacteriophage T4
can infect a particular set of Escherichia coli strains,
which constitutes its host range. Host-range mutants of phage such as T4
can be found that change the spectra of strains the 
phage can infect, now allowing infection of certain 
strains that could not be infected before. Often, the 
mutations that cause the altered host-range phenotype 
are in the phage tail fiber protein that adsorbs to 
specific receptor sites on the cellular exterior. 


See also: Bacteriophages 
Enzyme systems for generalized recombination can
effect recombination anywhere along a pair of hom- 
ologous chromosomes. However, the rate of such re- 
combination per internucleotide bond is not uniform. 
A short segment of chromosome with a conspicuously 
higher than average rate of recombination is a hot spot. 


Basic Properties of Meiotic Hot Spots 


Early-described hot spots for meiotic recombination, 
cog in Neurospora crassa and M26 in Schizosaccharomyces
pombe, manifest features that have characterized
most subsequently discovered hot spots: they can
mutate to an inactive state; they can function when the
hot spot is present on only one of the two homologs; 
they can increase recombination up to several kilo- 
bases away; and they promote meiotic gene conver- 
sion unidirectionally — genetic markers near and in cis 
to an active hot spot tend to be lost. 


Molecular Basis of Meiotic Hot Spots


Extrapolating from studies in Saccharomyces cerevi- 
siae, hot spots of meiotic recombination are sites at 
which chromatids are cut on both strands by a meio- 
sis-specific endonuclease. Repair of these cuts is car- 
ried out with the help of an intact chromatid, usually 
from the paired homolog. The homologous chromatid 
serves as a jig to align the two segments of the broken 
chromatid and as a template for the replacement of 
nucleotides lost subsequent to the cutting. The result- 
ing intermediate, which contains two Holliday junc- 
tions, is resolved in a manner that recombines the 
segments of DNA flanking the intermediate approxi- 
mately half the time. Whether the resolution effects 
such crossing-over or not, genetic markers between 
the junctions are subject to recombination by 
gene conversion, a local violation of the 2:2 rule of 
Mendelian segregation resulting from the loss and 
replacement of DNA segments during or after forma- 
tion of the intermediate. 
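Gene conversion is scored as a departure from 2:2 segregation among the four products of a single meiosis. The following minimal Python sketch assumes a heterozygous cross whose alleles are written as strings (here "+" and "m", chosen only for illustration) and a tetrad in which each product carries exactly one of the two alleles; the function names are hypothetical.

```python
from collections import Counter
from typing import Sequence


def segregation_ratio(tetrad: Sequence[str], allele_a: str, allele_b: str) -> str:
    """Return the allele_a:allele_b ratio among the four products of one meiosis."""
    if len(tetrad) != 4:
        raise ValueError("a tetrad has exactly four products")
    counts = Counter(tetrad)
    if set(counts) - {allele_a, allele_b}:
        raise ValueError("tetrad contains an unexpected allele")
    # Counter returns 0 for an allele that is absent (e.g. a 4:0 tetrad).
    return f"{counts[allele_a]}:{counts[allele_b]}"


def is_conversion(tetrad: Sequence[str], allele_a: str, allele_b: str) -> bool:
    """A tetrad violating the 2:2 rule (3:1, 1:3, ...) is scored as a conversion."""
    return segregation_ratio(tetrad, allele_a, allele_b) != "2:2"


print(segregation_ratio(["+", "+", "+", "m"], "+", "m"))  # a 3:1 convertant
```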

In S. cerevisiae, meiotic hot spots are manifested 
physically by the detection of meiosis-specific double- 
strand breaks and genetically by the high rates of 
conversion they impose on markers within a few
kilobases and by the high rates of crossing-over they
impose on markers flanking the region of conversion. 
One known hot spot confers a conversion rate on 
adjacent markers approaching 50%. More commonly, 
rates of 5%-10% are reported. These values are higher 
than those reported in other fungi, and they are atyp- 
ically high for S. cerevisiae. The rate of conversion 
falls with distance from a hot spot, resulting in a 
conversion gradient. 

Meiotic hot spots correspond to regions of the 
chromosome that are highly susceptible to cutting 
in vitro by endonucleases whose rate of cutting 
is limited by chromatin structure. These nuclease- 
sensitive regions tend to correspond to promoters of 
transcription. Large regions of some chromosomes 
have higher rates of recombination than other large 
regions, which may sometimes reflect the relative con- 
centration of transcription promoters. 


Hot Spots in Prokaryotes 


In prokaryotes also, hot spots correspond to DNA 
double-strand cut sites. In phage lambda, whose circular
replicating form is linearized at cos prior to packaging
of the chromosome into a phage head, cos is a hot
spot for recombination. The role of double-strand 
breaks as hot spots is illustrated also by lambda crosses 
in which the chromosome of one parent has a site for 
cutting by a restriction system carried by the host cell. 
Recombination in such a cross is focused close to the 
restriction site. 

In bacteria, the primary recombination pathway is 
dependent on proteins homologous to the RecA and 
RecBCD proteins of Escherichia coli. The pathway is 
activated by double-strand breaks, which serve as 
entry points for the RecBCD enzyme. The enzyme then
unwinds the duplex processively from the
double-strand break, cutting the resulting single strands as it
does so. This destruction stops, with low probability 
per base pair, when the enzyme undergoes a transition 
that diminishes the nuclease but not the helicase activ- 
ity of the enzyme. This transition occurs with high 
(about 50%) probability at species-specific nucleotide 
sequences, called Chi. In E. coli, the fully active Chi 
sequence is 5’ GCTGGTGG 3’. The intact single 
strands resulting from the unwinding of DNA distal 
to Chi are recombinagenic after becoming coated with 
RecA protein. Thus, Chi is a hot spot of recombin- 
ation because it limits the extent of DNA degradation 
occurring at a double-strand break. This recombin- 
ation system helps maintain normal rates of DNA 
replication by promoting recombinational repair of 
accidentally broken replication forks. 
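Because Chi is defined by a short nucleotide sequence, candidate Chi sites can be located by a simple string search. The sketch below is a minimal illustration, assuming the E. coli Chi sequence given above and plain ACGT input; `find_chi` and `reverse_complement` are hypothetical helpers. It reports matches on both strands, labeled "+" and "-", but does not model which of them a RecBCD molecule would actually recognize, since that depends on the direction in which the enzyme travels from the break.

```python
import re
from typing import List, Tuple

CHI = "GCTGGTGG"  # E. coli Chi sequence, written 5' to 3'


def reverse_complement(seq: str) -> str:
    """Reverse complement of an ACGT string (other characters pass through)."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_chi(seq: str) -> List[Tuple[int, str]]:
    """Return (0-based position, strand) for every Chi match in seq."""
    seq = seq.upper()
    # A zero-width lookahead lets overlapping matches be reported too.
    hits = [(m.start(), "+") for m in re.finditer(f"(?={CHI})", seq)]
    rc = reverse_complement(CHI)  # Chi as it reads on the given strand: CCACCAGC
    hits += [(m.start(), "-") for m in re.finditer(f"(?={rc})", seq)]
    return sorted(hits)


print(find_chi("AAGCTGGTGGTTCCACCAGCAA"))
```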

Some recombination systems are specialized to cut 
and rejoin DNA at specific nucleotide sequences. att 
of phage lambda is a specific site for recombination 
effected at high rate by lambda’s Int system. Int can 
also effect homologous recombination at low levels 
near the att site. Int-mediated recombination
requires that both participants have an att site. 

A hot spot in the gene 34-35 region of phage T4 is 
absent in the closely related phage T2. DNA glucosyl- 
ation, which differs between the two phages, is 
required for the hot spot activity in T4. 


See also: Chi Sequences; Gene Conversion; 
RecBCD Enzyme, Pathway; Recombination, 
Models of 
A hot spot is a site in the DNA that is significantly
more mutable than normal. Seymour Benzer first
established the concept of hot spots in his classic
studies of the rII locus in bacteriophage T4 in the
late 1950s and early 1960s. Benzer mapped a very large
series of mutations in the rII locus, assigning each
mutation to a specific site. Two sites had an enormous 
number of recurrences of mutations and were clearly 
extraordinary hot spots. Statistical methods could 
show that other sites were also more mutable than 
normal. Subsequent work from different laboratories 
has revealed that hot spots are a general phenomenon. 
The molecular basis for some spontaneous hot spots
is now understood. Benzer’s hot spots, as well as
others in different genes, result from repeat-tract
sequences, i.e., tandemly repeated mono-, di-, or even
tetranucleotides. For instance, in the lacI gene of
Escherichia coli, the sequence 5'-CTGGCTGGCTGG-3'
appears in the wild-type. More than 70% of
the spontaneous mutations in lacI are the addition
or deletion of one of the tandemly repeated units,
CTGG. In mismatch repair-deficient backgrounds,
repeat-tract sequences represent very powerful hot
spots. In the E. coli xylB gene, 90% of the spontaneous
mutations in a mismatch repair-deficient background
are deletions or additions of a G at a run of
eight Gs (GGGGGGGG) in the wild-type xylB
gene.
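Repeat tracts such as the lacI CTGGCTGGCTGG sequence and the xylB run of eight Gs can be characterized by counting back-to-back copies of the repeat unit. The following is a minimal Python sketch with a hypothetical function name; gaining or losing one unit, the typical mutation at these hot spots, changes the count by exactly one.

```python
def max_tandem_copies(seq: str, unit: str) -> int:
    """Longest run of back-to-back copies of `unit` found anywhere in `seq`."""
    if not unit:
        raise ValueError("repeat unit must be non-empty")
    seq, unit = seq.upper(), unit.upper()
    best = 0
    for start in range(len(seq)):
        copies, pos = 0, start
        # Extend the run while the next copy of the unit follows immediately.
        while seq.startswith(unit, pos):
            copies += 1
            pos += len(unit)
        best = max(best, copies)
    return best


wild_type = "AACTGGCTGGCTGGAA"   # three CTGG units, as in the lacI tract
insertion = "AACTGGCTGGCTGGCTGGAA"  # hypothetical mutant with one extra unit
print(max_tandem_copies(wild_type, "CTGG"), max_tandem_copies(insertion, "CTGG"))
```

The scan is quadratic in the worst case, which is adequate for gene-sized sequences but not for whole genomes.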

5-Methylcytosine residues also result in hot spots in
many cases, since deamination of 5-methylcytosine
yields thymine opposite guanine, a mismatch that can
lead to mutation at the next round of replication if
not repaired. Mutagen-induced mutations
are rarely randomly distributed, resulting in hot 
spots at certain points. Neighboring pyrimidines 
are favored sites of UV-induced mutations, since several
photoproducts occur at pyrimidine-pyrimidine
sequences. Even among these sequences, however, 
hot spots still occur, for reasons that are not presently 
understood. 
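One way to ask whether mutations are distributed non-randomly is to compare the count at each site with what a uniform rate would predict. The sketch below assumes, for illustration only, that under a uniform rate the number of mutations per site follows a Poisson distribution whose mean is the average observed count; this is a simplified null model, not a reconstruction of Benzer's own statistics, and the function names and the threshold `alpha` are hypothetical. Because a strong hot spot inflates the mean, a more careful analysis would estimate the background rate without the candidate sites.

```python
import math
from typing import Dict, List


def poisson_upper_tail(k: int, mean: float) -> float:
    """P(X >= k) for X ~ Poisson(mean), summed term by term."""
    if k <= 0:
        return 1.0
    term = math.exp(-mean)  # P(X = 0)
    cdf = term
    for i in range(1, k):
        term *= mean / i  # P(X = i) from P(X = i - 1)
        cdf += term
    return max(0.0, 1.0 - cdf)  # guard against tiny negative rounding error


def candidate_hot_spots(counts: Dict[str, int], alpha: float = 0.001) -> List[str]:
    """Sites whose mutation count is improbably high under a uniform rate."""
    mean = sum(counts.values()) / len(counts)
    return [site for site, c in counts.items() if poisson_upper_tail(c, mean) < alpha]


# Hypothetical counts of independent mutations recovered at five sites.
print(candidate_hot_spots({"s1": 2, "s2": 1, "s3": 3, "s4": 2, "s5": 60}))
```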

In certain cases, hot spots for mutations are pro- 
grammed into natural DNA sequences, to allow for 
more frequent variation and sometimes to avoid 
host immune responses. For instance, in Haemophilus
influenzae, the intergenic region between the
fimbrial protein-encoding hifA and hifB genes has
10 repeats of the TA dinucleotide in the promoter.
When the sequence mutates to 11 or 9 repeats, tran- 
scription is lowered, or abolished, respectively. The 
number of tandem repeats is so high that the result- 
ing hot spot allows 0.1-1% variation in a typical 
population. 


See also: Mutation, Spontaneous; Tandem 
Repeats 
While different tissues in higher organisms are distinct
phenotypically, they generally have the same set of 
genes. The phenotypic differences are brought about 
by differential regulation of gene expression. The 
genes that are expressed differentially are called ‘tis- 
sue-specific genes’ (or sometimes ‘luxury genes’). 
Housekeeping genes, on the other hand, are expressed 
in all tissues, and are generally assumed to be involved 
in key steps in cellular metabolism such as DNA 
synthesis, protein synthesis, transcription, or energy 
metabolism. 

Because tissue-specific and housekeeping genes must be
regulated in very different ways, it is not surprising
that the promoters of these genes differ as well. Most
housekeeping genes utilize a promoter
lacking the common TATA and CAAT boxes, and 
having instead a series of GC boxes (consensus 
sequence GGGCGG). GC boxes provide binding 
sites for the transcription factor Sp1 and, like the 
TATA box, direct the start of transcription. Since 
there are several GC boxes in the promoters of many 
housekeeping genes, the transcription start site is 
ambiguous. Indeed, many housekeeping gene tran- 
scripts have heterogeneous 5’ start sites. The coding 
function of these genes is not impaired, however, 
because all of the alternative start sites are within the 
5’ untranslated region of the mRNA. As an example, 
the human c-Ha-ras oncogene promoter has about 
80% G+C content, 10 GC boxes, and at least four 
transcription start sites. 
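GC boxes can be located by searching a promoter for the consensus GGGCGG. The minimal Python sketch below assumes exact consensus matches only (real Sp1 binding sites tolerate some variation) and, because a box can lie on either strand, also searches for the reverse complement, CCGCCC; the helper names are hypothetical. The same helpers yield the two figures quoted for the c-Ha-ras promoter: the number of GC boxes and the G+C content.

```python
import re
from typing import List, Tuple

GC_BOX = "GGGCGG"     # consensus given in the text
GC_BOX_RC = "CCGCCC"  # the same box read on the opposite strand


def gc_boxes(promoter: str) -> Tuple[List[int], List[int]]:
    """0-based start positions of exact GC boxes on the + and - strands."""
    seq = promoter.upper()
    plus = [m.start() for m in re.finditer(f"(?={GC_BOX})", seq)]
    minus = [m.start() for m in re.finditer(f"(?={GC_BOX_RC})", seq)]
    return plus, minus


def gc_content(seq: str) -> float:
    """Fraction of G and C bases (0.0 for an empty sequence)."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0


example = "AAGGGCGGTTCCGCCCAA"  # hypothetical promoter fragment
plus, minus = gc_boxes(example)
print(len(plus) + len(minus), round(gc_content(example), 2))
```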

The products of housekeeping genes may be 
needed in all cells, but in limited quantities. Therefore 
the housekeeping gene promoters are often weak, 
representing a baseline level of transcription. 

While some genes such as Hprt and Pgk fall 
squarely into the housekeeping category, as they are 
involved in nucleotide and energy metabolism, others 
are not so easily categorized. The metallothionein 
gene, for instance, is relatively quiescent, but is stimu- 
lated in the presence of heavy metals. This gene, how- 
ever, is available for transcription in all cell types, even 
though it may not actually be transcribed at a particu- 
lar point in time. 


See also: Gene Regulation; TATA Box; 
Transcription 
Hox genes are the homologs of the homeotic genes of
the fruit fly Drosophila. The Drosophila homeotic 
genes were first identified through mutations that 
caused the transformation of a particular segment of 
the fly body into the likeness of another, hence 
the term homeotic, from the Greek homoios,
which means similar. With the advent of molecular
biology these genes were isolated and found to 
encode proteins that play fundamental roles in
controlling the expression of many other genes. The Hox
genes share a 60-amino-acid DNA-binding motif, 
the homeodomain, and in association with other 
homeodomain-containing proteins act as transcription
factors to regulate gene expression. Today we know
that Hox genes have been widely conserved during
metazoan evolution; they are present in organisms
ranging from insects and primitive chordates to humans.
They are generally linked in chromosomal clusters. In
simple ancestral organisms there is a single cluster.
Genome-wide duplications in the vertebrate lineage
expanded it into the four Hox clusters, encompassing
a total of 39 Hox genes, that are found in mammals,
including mice and humans.

A distinguishing hallmark of Hox clusters is the 
correlation between the physical arrangement of 
these genes along the chromosome and their temporal 
and spatial order of expression in the developing 
embryo. Genes located closer to the 3’ end of the 
chromosomal clusters will be expressed earlier and in 
more anterior domains than genes located closer to 
their 5’ ends. This property is known by the term 
temporal and spatial colinearity and is thought to 
reflect the mechanism that regulates the expression 
of these genes. Hox genes encode key developmental 
regulators, which specify the regional character of 
cells along the antero-posterior body axis of all three 
germ layers in both vertebrate and invertebrate 
embryos. 

Studies using the mouse and other vertebrates as 
model systems have shown that genetic mutations in 
some of the Hox genes or changes in their expression 
patterns result in abnormalities in a large number of 
tissues. This can cause defects in the nervous system, 
limbs, skeleton, and many organs. In some cases the 
defects are much milder than expected; genetic
studies have shown that some of these Hox genes
work together and can compensate for each other.
Hence a defect in one gene is corrected by the similar 
activities of other Hox proteins. In humans, specific 
Hox genes have been implicated in genetic disorders 
affecting development of the limbs and the genito- 
urinary tract. Several studies have suggested that 
Hox genes are also required for proper function of 
adult tissues. Specific Hox genes function together to 
control development of the mammary gland in 
response to pregnancy, whereas others may be 
involved in human endometrial development and 
implantation. Recent studies have also shown direct 
involvement of deregulated Hox genes in the develop- 
ment of human leukemias. 

Since the description of the first homeotic muta- 
tions by Bateson in 1894 and the discovery of the 
homeodomain in 1984 there has been tremendous 
progress in understanding the function of these 
important genes. These genes represent important 
control points in the processes that regulate
morphogenesis, that is, how tissues are formed and
patterned. To build a picture of how this entire process
occurs we still need to determine the immediate 
gene and cellular targets of their action in order to 
understand how they regulate cell growth and differ- 
entiation. 


See also: Homeotic Genes 


Hsp 


See: Heat Shock Proteins 
Human T-cell lymphotropic virus 1 (HTLV-1) is a
9032 bp human C-type retrovirus that was isolated in 
1979 from T-cell lymphoma cell lines maintained in 
vitro with IL2. It was the first human pathogenic 
retrovirus to be described. HTLV-1 is the causative 
agent for at least two diseases, firstly a malignancy of 
mature CD4 T cells (adult T-cell lymphoma/leukemia 
or ATLL) and secondly, a neurological disorder 
known as either tropical spastic paraparesis (TSP) or 
HTLV-1-associated myelopathy (HAM); only the 
former is discussed here. 

Like other retroviruses, HTLV-1 contains Env 
(encoding receptor binding protein), Gag (core 
protein), and Pol (RNA-dependent DNA polymerase)
genes, but also Tax and Rex, genes involved in
regulating viral gene expression and the splicing of viral RNA. HTLV-1
lacks an obvious transforming oncogene. HTLV-1 
may infect several different cell types in vitro but 
only replicates efficiently in CD4+ T cells. The virus 
is endemic in the tropics and the prevalence may reach 
over 20% in some areas. Transmission may be vertical, 
from mother to infant by breastfeeding, or horizontal,
via intravenous drug use, sexual contact, or
transfusion of contaminated blood. The percentage 
of infected individuals developing either disease is 
very low, and the factors necessary for disease
development remain unknown.


ATLL 


In 1977, a rapidly progressive and uniformly fatal 
T-cell lymphoproliferative disorder in patients from 
the southwest of Kyushu, Japan, was described by
Takatsuki and Uchiyama. An identical disease in 
patients from the Caribbean was described by 
Catovsky and colleagues in 1982 in London. Subse- 
quent investigations showed the presence of anti- 
bodies to HTLV-1 and monoclonal proviral integration 
in tumor cells. Other cases have now been reported 
from a number of other geographical sources includ- 
ing southeastern USA, South America (Chile and 
Brazil), and West Africa. In southwest Japan, ATLL 
constitutes a major health problem. Various clinical 
types of ATLL have been described, but ultimately, all 
forms progress and are fatal; treatment is often asso- 
ciated with opportunistic infections. Patients present 
with enlarged lymph nodes and skin rash, often
accompanied by hypercalcemia. Diagnosis is usually made
on the basis of cells with a characteristic convoluted
nuclear morphology (‘flower cells’) in the
peripheral blood. These cells are characteristically
CD4+ and CD25+, the latter being a component of
the IL2 receptor. There are no consistent cytogenetic 
abnormalities in ATLL patients and the mechanisms 
that promote transformation of infected CD4+ T cells 
are not known. Proviral integration appears to be 
random. Comparative genomic hybridization studies 
have shown amplification of 14q32 and 2p13 in some 
patients, although the nature of the target genes is not 
known. Recent work indicates that viral Tax may 
cause constitutive NF-κB activation, and therefore
prolonged cell survival, through its interaction with
the IKKβ/IKKγ complex of controlling kinases.
Expression of antiapoptotic proteins such as BCL-xL
may also be upregulated. 


See also: Retroviruses 
Human chromosomes were probably first observed in
cancer cells by Arnold in 1879. Hansemann in 1881 
and Flemming in 1898 attempted to count them
in serial sections of mitotic cells, producing crude
estimates of approximately 24. Quite different results
were produced in 1912 by de Winiwarter. He was 
probably the first to study gonadal material and 
found 47 chromosomes in testis and 48 in ovary. He 
concluded that humans, like the locust, had an XX 
female/X male sex-determining mechanism. Painter 
in 1923 repeated this work on sections of testis mater- 
ial, in which he detected the small Y chromosome 
which de Winiwarter had apparently missed. He con- 
cluded that 48 and not 47 was the correct number for 
humans of both sexes, but mentioned in his publica- 
tion that in the clearest mitotic figures he could only 
count 46. 

There matters stood until 1956 when Tjio and 
Levan, working on colchicinized cell cultures treated 
with hypotonic fluid before fixation, regularly 
counted only 46 chromosomes, in samples from dif- 
ferent cultures. This number was confirmed as the 
correct number by Ford and Hamerton using testis 
material later the same year. 

More widespread interest in human chromosomes 
immediately followed the discovery in 1959 by Lejeune and
colleagues of an additional small chromosome in cells 
cultured from five children with Down syndrome. 
The observation that such a gross genetic abnormality 
could occur in a live, albeit handicapped, individual 
led to a search for similar chromosome abnormalities 
in other clinical syndromes. However, it was the para- 
doxical sex chromatin findings in the Turner and 
Klinefelter syndromes (see Klinefelter Syndrome; 
Turner Syndrome) which led to the next discovery of 
sex chromosome aneuploidy in these disorders later 
in 1959. 

These early findings on human chromosome aberrations
were obtained from fibroblast cultures from skin
biopsies or from bone marrow samples obtained by 
sternal aspiration. A major technical advance was 
made in 1960 when Moorhead and colleagues devel- 
oped the short-term culture of lymphocytes from 
peripheral blood samples. Chromosome analysis 
thus became more widely applicable for the investiga- 
tion of human chromosome aberrations. Cytogenetic 
laboratories have flourished ever since, using
increasingly sophisticated methods for the identification
of ever smaller defects. The latest methodology now
exploits multicolor fluorescent in situ hybridization 
and a wide range of other molecular genetic techniques. 


See also: Klinefelter Syndrome; Sex 
Determination, Human; Turner Syndrome 
Human genetics is the study of genetics and biological
variation in Homo sapiens. Its various branches 
are population genetics, cytogenetics, biochemical 
genetics, and genome studies including biodiversity 
and human evolution. Clinical genetics is that part 
of human genetics that studies genetic variation 
associated with the pathogenesis of disease (see Clin- 
ical Genetics). 


See also: Biochemical Genetics; Clinical Genetics; 
Cytogenetics; Ethics and Genetics; Genetic 
Diseases; Human Chromosomes; Human 
Genome Project; Population Genetics 
The Human Genome Project (HGP) is an inter-
national 13-year effort to sequence and discover all 
human genes (the human genome) and make them 
accessible for further biological study. The collabora- 
tive project began formally in October 1990, and 
involves 20 groups from the USA, UK, Japan, France, 
Germany, and China. Originally, the project was 
expected to last 15 years, but technological advances 
have brought forward the completion date to 2003. 
The total size of the human genome is estimated to 
be about 3 billion base pairs, arrayed in 24 distinct 
chromosomes (autosomes 1-22 plus X and Y). The 
chromosomes range in length from 50 to 250 million
bases (megabases), too large to be sequenced
directly, so each chromosome is first broken into
relatively large fragments about 150 000 bp long. The
large fragments are inserted into bacterial artificial 
chromosomes (BACs), and genome mapping techniques
are used to determine the position of each
of these fragments in the genome. The next stage
involves ‘shotgunning’ — cutting each of the fragments 
into smaller, overlapping pieces for sequencing (about 
500 bp each). Shotgunning at random, but repeatedly, 
ensures that some of the fragments will contain 
overlapping regions. Finally, the small DNA pieces 
are sequenced, the sequences assembled into the full 
sequence of the original BAC fragment, and the 
sequences of the BACs assembled to give the full 
chromosomal sequence. 

An alternative strategy for sequencing a genome 
is termed the ‘whole-genome shotgun’ method. This 
method does not involve mapped bacterial clones; 
instead, the whole genome is broken up into small 
pieces at random, the pieces are sequenced and the 
sequence reassembled. This method can produce 
sequence more rapidly, but reassembly of the infor- 
mation is more difficult, especially since about half of 
the human genome is composed of highly repetitive 
sequences. 

Chromosome 22, the first human chromosome to be sequenced, was completed in December 1999. An initial ‘working draft,’ which covers more than 90% of the euchromatic part of the genome (the part that contains most of the genes) at an accuracy of about 99.9%, was completed in June 2000. The final ‘gold standard’ genome sequence, produced by sequencing each piece of the genome about 10 times to an accuracy of 99.99%, is due for completion in 2003.
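As a rough, back-of-the-envelope illustration using only the figures given above (a genome of about 3 billion bp, BAC fragments of about 150 000 bp, sequencing pieces of about 500 bp, and roughly 10-fold coverage), the minimum number of BACs needed to span the genome once, and the approximate number of pieces sequenced for the final sequence, are estimated below. Real BAC sets overlap one another, so the actual number of clones is larger than this minimum.

```latex
N_{\text{BAC}} \approx \frac{3\times10^{9}\ \text{bp}}{1.5\times10^{5}\ \text{bp}} = 2\times10^{4},
\qquad
N_{\text{pieces}} \approx \frac{10 \times 3\times10^{9}\ \text{bp}}{500\ \text{bp}} = 6\times10^{7}
```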

All the DNA sequence produced by the Human 
Genome Project is released freely onto the Internet. 
The sequence is then analyzed to find the estimated 30 000-40 000 human genes encoded within it, which together make up only about 5% of the entire genome. Other studies are examining variations in
human genome sequences, in particular the single- 
nucleotide polymorphisms (or SNPs) which occur 
about once every 1000 bases and account for most 
of the variation between individuals. Using the 
genome sequence, functional genomics studies are 
examining how and when genes are expressed (tran- 
scriptomics) and the structure and function of the 
proteins encoded by the genes (proteomics). As these 
studies advance, the human genome sequence will 
undoubtedly have a significant impact on our under- 
standing of biological processes and advance the treat- 
ment of disease. 
See also: Artificial Chromosomes, Yeast;
Hunter syndrome (type II mucopolysaccharidosis)
is a rare, recessive X-linked genetic disorder almost 
exclusively limited to Caucasian males. The symp- 
toms arise from a loss of iduronate sulfatase activity, 
an enzyme required for degradation of the muco- 
polysaccharide components of connective tissues. 
Partially degraded mucopolysaccharides accumulate 
in the bones and connective tissues producing 
characteristic developmental defects such as facial 
distortions, dwarfism, and a hunched posture with 
flexed limbs. In the mild form of the disease, average 
life expectancy is about 20 years. Intellectual impair- 
ment is minimal and death is typically due to cardiac 
complications. For the more severe form, life expect- 
ancy is about 12 years. Progressive neurological 
deterioration, seizures, and emaciation characterize 
the later stages and death usually results from pul- 
monary failure. Bone marrow transplantation has 
been attempted as a corrective measure, but enzyme 
replacement, via protein or gene therapy, would 
appear to be the most hopeful future possibility. 


See also: Gene Therapy, Human; Sex Linkage 
Huntington’s disease (HD) is an autosomal dominant neurodegenerative condition associated with abnormal movements, cognitive decline, and psychiatric disturbances. Symptoms most commonly appear between the ages of 35 and 50 years, but the disease can present at any age. Death occurs about 15-20 years after the initial symptoms. HD neuropathology is characterized by neuronal loss in the caudate nucleus, the putamen, and the cerebral cortex. HD is caused by abnormal expansions of a (CAG)n trinucleotide repeat tract in the coding portion of a gene of currently unknown function, which maps to 4p16. The (CAG)n repeats are translated into a polyglutamine tract. This mutation confers a deleterious new function on the mutant protein. The formation of abnormal ubiquitinated protein aggregates, containing the polyglutamine-containing portion of the HD protein, is characteristic of the pathology.


Epidemiology and Clinical Features 


HD varies in prevalence in different populations. It is 
particularly common in the Zulia region of Venezuela, 
near the shores of Lake Maracaibo, where there is a 
cluster of cases derived from a single ancestor. This 
extensive pedigree of about 7000 individuals contains 
over 100 living affected cases. HD is rare in Japan 
(<0.5 per 100000) and among Black South Africans 
(1 per 100000). Its prevalence in the UK and USA 
ranges from about 5 to 10 per 100000. 

HD generally presents insidiously. In adults, 
the motor features include chorea, abnormal eye 
movements, dysphagia, dysarthria, rigidity, and gait 
disturbances. Swallowing difficulties often lead to 
death, either from suffocation or from starvation. 
Juvenile-onset HD often presents with a different 
picture, where bradykinesia, rigidity, and dystonia 
are dominant features and chorea may be absent. 

The overt cognitive features of HD generally start to manifest around the same time as the motor features, although this is not universal. The patients develop a form of subcortical dementia which is progressive and becomes more global in the late
stages of the disease. Subtle neuropsychological 
abnormalities have been detected in HD patients 
before any overt clinical features have manifested. 

HD patients can develop a range of psychiatric 
disturbances. Depression is the most frequent prob- 
lem and may be found in up to 40% of patients. The 
depression seen in HD is often a primary feature of the 
disease process, rather than a secondary reaction to the 
diagnosis or other symptoms. Irritability and apathy 
are also common features, while HD patients can 
develop obsessive-compulsive disorder and, rarely, 
schizophrenia-like features. 


Genetics 


HD is associated with abnormal expansions of a (CAG)n repeat in the 5’ end of the coding region of a large gene called IT15. Normal chromosomes are polymorphic with respect to repeat number and have 35 or fewer perfect repeats, while disease chromosomes are associated with 36 or more repeats. The mutant allele is expressed at the protein level and the (CAG)n repeats are translated into a polyglutamine tract.

HD shows the clinical feature of anticipation, 
where the age at onset of symptoms tends to decrease 
in successive generations. This phenomenon can be 
explained by a combination of the following two 
observations. First, while normal chromosomes have 
low mutation rates, the number of repeats on disease 
chromosomes frequently changes in successive gen- 
erations. Increases in repeat number tend to be more 
common than decreases when the mutation is passed 
through the male line, although this mutational bias is 
not an obvious feature of female transmissions. Sec- 
ond, age-at-onset of symptoms correlates inversely 
with repeat number, with juvenile-onset cases having 
particularly long alleles. The CAG repeats on disease 
chromosomes account for about 70% of the variance 
in the age at onset of symptoms. 

The penetrance of HD is not always complete, as some individuals with 36-39 repeats have lived into their ninth and tenth decades without clinical or neuropathological features of the disease. It has been suggested that genotype variation at the GluR6 kainate receptor locus may modify the age at onset associated with the primary mutation in the HD gene.
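The repeat-length thresholds described above lend themselves to a simple illustration. The sketch below is a hypothetical example, not a diagnostic tool: it counts the longest uninterrupted run of CAG codons in a DNA string and bins the count using the ranges given in this entry (35 or fewer repeats normal; 36 or more associated with disease, with incomplete penetrance at 36-39). The function names and the example sequence are assumptions for illustration; real genotyping and its clinical interpretation are considerably more involved.

```python
import re

# Runs of back-to-back CAG codons. CAG cannot overlap itself,
# so each run is matched unambiguously.
CAG_RUN = re.compile(r"(?:CAG)+")


def longest_cag_run(seq: str) -> int:
    """Return the number of repeats in the longest uninterrupted (CAG)n tract."""
    runs = CAG_RUN.findall(seq.upper())
    return max((len(run) // 3 for run in runs), default=0)


def classify_hd_repeats(n: int) -> str:
    """Bin a repeat count using the thresholds quoted in this entry (illustrative only)."""
    if n <= 35:
        return "normal range"
    if n <= 39:
        return "disease range, incomplete penetrance"
    return "disease range"


if __name__ == "__main__":
    # Hypothetical sequence: 42 perfect CAG repeats followed by an interrupting CAA codon.
    example = "ATG" + "CAG" * 42 + "CAACAGCCG"
    n = longest_cag_run(example)
    print(n, classify_hd_repeats(n))  # 42 disease range
```

Only perfect, uninterrupted repeats are counted here, matching the entry's description of normal chromosomes in terms of "perfect repeats"; the single CAG after the CAA interruption starts a new, shorter run.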

HD is one of the few diseases in which homozygotes do not appear to have a more severe phenotype than heterozygotes in the same family.


Pathology 


Neuronal loss in HD is particularly severe in the 
caudate nucleus, putamen, and cerebral cortex. How- 
ever, in advanced cases, there is overall atrophy and 
brain weight can be reduced by up to 25%. The cell 
loss in the caudate nucleus and the putamen (which 
together comprise the corpus striatum) is selective. 
The earliest loss is in the dorsal and medial regions and this progresses laterally and ventrally as the disease takes its course. Within the striatum, the medium spiny neurons, especially those synthesizing enkephalin and γ-aminobutyric acid (GABA), are particularly sensitive to the HD mutation. In the cortex, the large neurons appear to be most severely affected, with greatest loss in layers VI, V, and III.


Pathological Mechanisms 


The HD mutation confers a deleterious gain-of-function on the mutant protein. This model was suggested, before the gene was cloned, by observations in patients with Wolf-Hirschhorn syndrome. These individuals have hemizygous deletions of the tip of 4p and are hemizygous for the HD gene but do not show the clinical features of HD. Since the HD mutation was identified, the gain-of-function mechanism has been confirmed. A woman with a balanced translocation disrupting the HD gene has been identified and shows no abnormalities. Transgenic mice expressing only one HD allele have no features of the disease, while HD ‘null’ mice die as embryos. This embryonic lethality is rescued by transgenes carrying the HD mutation. Furthermore, a knock-in of the HD mutation into the endogenous mouse HD homolog is not associated with embryonic lethality, even in the homozygous form.

On the other hand, transgenic mice expressing exon 
1 of the human HD gene with expanded repeats do 
show an abnormal neurological phenotype. These 
transgenic mice develop abnormal aggregates contain- 
ing the expanded polyglutamine repeats in the nuclei 
of neurons. Subsequently, such neuronal intranuclear inclusions (NIIs) were found in the brains of HD patients. It is not clear how these NIIs arise or how they relate to the neurodegeneration in HD. Two possible explanations for how the NIIs might contribute to pathogenesis have come from work on the related disease spinocerebellar ataxia type 1, which is also caused by a (CAG)n/polyglutamine expansion mutation. First, the inclusions appear to alter nuclear matrix-associated structures, suggesting that this disease may result from disruption of nuclear function. Second, these inclusions are ubiquitinated and appear to sequester some of the cellular machinery responsible for the degradation of short-lived proteins. Since the levels of short-lived proteins have important regulatory consequences, it is possible that perturbation of these protein levels may result in cell death.

The mechanism of inclusion formation is also uncertain. Polyglutamine stretches in proteins may predispose to aggregate formation, as such sequences can form polar zippers. The formation of NIIs may be partly mediated by transglutaminase, since inhibition of this enzyme partially reduces NII formation in vitro. Aggregates may also form faster from fragments of the mutant HD protein containing the expanded polyglutamines than from the full-length mutant protein. The formation of such fragments appears to be partly mediated by caspases; thus these enzymes may play an important part in the pathogenic pathway.


Relationships with Other Trinucleotide 
Repeat Diseases 


HD is one of a class of diseases caused by abnormal expansions of (CAG)n/polyglutamine repeats, including the spinocerebellar ataxias (SCA) types 1, 2, 3, 6, and 7, spinobulbar muscular atrophy, and dentatorubral-pallidoluysian atrophy (DRPLA). In general, these diseases are associated with repeat expansions above 36-40 glutamines, except for SCA6, which is associated with expansions of fewer than 30 repeats and may operate via a distinct mechanism. The other polyglutamine diseases also appear to be caused by gain-of-function mutations, and intracellular aggregates have been found in patients with SCA1, SCA3, SCA7, and DRPLA and in in vitro models of spinobulbar muscular atrophy. Thus, these diseases are likely to share common pathophysiologies. However, it is not clear why the pattern of neurodegeneration differs among these diseases, particularly since the disease proteins are often widely expressed.
See also: Genetic Counseling; Genetic Diseases;
Hurler syndrome is a genetic disorder resulting in a metabolic defect, named after the German pediatrician Gertrude Hurler. Also known as gargoylism or mucopolysaccharidosis I, Hurler syndrome is one of several rare genetic disorders involving a defect in the metabolism of mucopolysaccharides. Specifically, it is an autosomal recessive mucopolysaccharide storage disease in which α-L-iduronidase is absent, resulting in an accumulation of heparan and dermatan sulfates. Extensive deposits of mucopolysaccharide are found in gargoyle cells and in neurons.

Onset of the syndrome is in infancy or early child- 
hood and affected individuals rarely live beyond ado- 
lescence. The disorder is characterized by severe 
mental retardation, large skull with wide-set eyes, 
heavy brow ridge and depressed nose bridge, hyper- 
trichosis, short neck, large tongue and lips, poorly 
formed teeth, and clouding of the cornea. Individuals 
exhibit dwarfism with hunched back, short limbs and clawed hands, hirsutism, and deafness. Enlarged liver and spleen are common, and the heart valves, coronary vessels, and heart muscle are often affected, leading to death from heart failure.


See also: Genetic Diseases; Inborn Errors of 
Metabolism; Metabolic Disorders, Mutants 


Huxley, Thomas Henry 


K Handyside, E Keeling, and S Brenner 


Copyright © 2001 Academic Press 
doi: 10.1006/rwgn.2001.0648


Thomas Henry Huxley (1825-95) was better known for his defence of Darwin’s theory of evolution by natural selection than for his own scientific research. He did more than even Darwin himself to gain acceptance for the theory among scientists and the public. His passion for the theory earned him the title of “Darwin’s Bulldog.”

His family was not wealthy and his only childhood 
education was two years at Ealing School. However, 
he schooled himself in science, history, philosophy, 
and German. Huxley began a medical apprenticeship 
at the age of 15 and a scholarship at Charing Cross 
Hospital meant that he could continue his studies. 
However, he did not pursue a career in medicine and 
instead joined the British Navy as an assistant surgeon 
on the frigate HMS Rattlesnake which was sent to 
chart waters in the South Pacific. 

Returning to England in 1850, he found that the research he had sent home on marine organisms had gained him entrance into the ranks of the English scientific
establishment. He left the Navy in 1854 to go to the 
School of Mines in London and took up a lecturing 
position. For the next 40 years he was an active teacher, 
writer, and lecturer. 

At first, Huxley was not an outspoken defender of 
Darwin’s theory, disagreeing with certain ideas. But 
later, he began to accept evolutionary views and 
defended the cause in many debates. The most famous 
occasion, in June 1860, saw Huxley face the Bishop of 
Oxford, Samuel Wilberforce, at the British Associ- 
ation meeting in Oxford. All accounts describe it as 
an extremely heated debate, with Huxley declaring he 
would rather be descended from an ape than a bishop. 

By profession he was a biologist, but his work in fact covered the whole field of the exact sciences. His most famous book was published in 1863, four years after Darwin’s On the Origin of Species. Huxley’s Evidence as to Man’s Place in Nature described what was known about primate and human paleontology and ethology, linking evolution to Homo sapiens.

Having had to fight his way into and to the top of the scientific profession, he also helped set in place
procedures for scientists to be awarded salaries. This 
gave all people, rich and poor, a chance to enter the 
scientific ranks. 


See also: Darwin, Charles 
Hybrid is the term for the offspring from two genetically distinct parents. When the two parents have no recent common ancestry, the offspring are referred to as F1 hybrids.


See also: F1 Hybrid


Hybrid-Arrested 
Translation 
Hybrid-arrested translation is a technique used to
identify cDNA representing an mRNA molecule, by 
virtue of its ability to base-pair with the RNA in vitro 
and thus to inhibit translation. 


See also: cDNA; Messenger RNA (mRNA) 
Hybrid dysgenesis is a term used to describe a suite of phenotypic abnormalities, referred to as dysgenic traits, which are simultaneously induced by intraspecific hybridization. These traits were first described in Drosophila melanogaster. They include increased rates of mutation and recombination, chromosomal rearrangements (such as inversions and translocations), and reduced fertility and viability. The genetic abnormalities result from the mobilization of certain families of transposable genetic elements (transposons) by intraspecific hybridization.

In many instances, hybrid dysgenic traits are 
observed to occur nonreciprocally. For example, 
given two interacting strains, A (carrying a particular 
transposon family) and B (lacking the relevant trans- 
poson family), only crosses between males of strain A 
and females of strain B will produce dysgenic hybrids; 
the reciprocal cross, between males of strain B and 
females of strain A, will produce normal offspring. 
Usually, but not always, the mobility of the trans- 
posons is restricted to the germline of the host; the 
somatic, or body, cells are not affected. This is thought 
to be an evolved trait that reduces the likelihood that 
unbridled activity of the transposon will reduce the 
fitness of its host, and thus decrease its own chances of 
survival. 

Hybrid dysgenesis in nature appears to be associated with the arrival of an active transposon family in a new species by horizontal transfer, or introgression. Examples are the P, I, and hobo elements in D. melanogaster and the Penelope element in D. virilis. All four of these transposon families have invaded their new host species within the last century, possibly aided by increased human mobility and trade. Activation of the P, I, and hobo families of transposons is responsible for the P-M, I-R, and H-E systems of hybrid dysgenesis, respectively. There is no evidence
for cross-mobilization of elements among any of these 
three systems. However, in a fourth system, found in 
D. virilis, hybrid dysgenesis results in the simultan- 
eous activation of multiple families of transposons, 
including the Penelope, Ulysses, Paris, Helena, and 
Telemac families. 

Partial or complete sterility is a signal trait com- 
monly associated with hybrid dysgenesis. However, 
this hybrid sterility occurs in two distinctly different 
ways, referred to as GD sterility and SF sterility. GD 
sterility, or gonadal dysgenesis, describes the sterility associated with the P, hobo, and Penelope elements. In this case, one or both gonads of F1 dysgenic hybrids are arrested at an early stage of development. If the arrested development is unilateral, the individual will be fertile; individuals that are bilaterally affected are completely sterile. High temperatures applied at an early stage of development increase the frequency of gonadal dysgenesis. In contrast, the sterility associated with the I-R system of hybrid dysgenesis (SF sterility) is caused by partial or complete inviability of eggs laid by F1 dysgenic hybrids. In this instance, low temperatures, applied early in development, increase the frequency of sterility.
In addition to Drosophila, hybrid dysgenesis-like phenomena are observed in other insects, such as the Mediterranean fruit fly, Ceratitis capitata, and midges of the genus Chironomus. As reports of hybrid dysgenesis have so far been largely restricted to well-studied insect species, it is not clear whether this phenomenon is really phylogenetically limited or whether, with additional study, it will be found to be more widespread.

Hybrid dysgenesis has evolutionary implications 
for the generation of new genetic variability with 
both negative and positive effects on the fitness of 
affected individuals. The discovery of the P-M system 
of hybrid dysgenesis led to the development of a new 
generation of tools for the genetic engineering of 
Drosophila. For example, the P element was de- 
veloped as a transformation vector that allowed the 
production of transgenic flies through the manipula- 
tion of germline DNA. 


See also: Horizontal Transfer; Transposable 
Elements 


Hybrid Sterility, Mouse 
S H Pilder 
Hybrid sterility, the phenomenon in which the hybrid
offspring of parents from different populations fail 
to produce functional gametes, is a postzygotic 
reproductive isolating mechanism (RIM) that impedes 
gene exchange between diverging populations. This 
trait, generally thought to arise as an incidental by- 
product of genetic differentiation, is considered a 
causal hallmark of incipient speciation. Most instances 
of hybrid sterility follow Haldane’s Rule, a general- 
ization proffered by J.B.S. Haldane. He observed that 
when parents from divergent populations produce 
hybrid progeny, the absent, rare, or sterile sex among 
the offspring is always the heterogametic sex. In keep- 
ing with this ‘rule,’ hybrid sterility in the genus Mus 
(mouse) is male specific. 

In Mus, hybrid sterility maps to seven genetic loci, named Hybrid Sterility 1-7 (Hst1-7) and numbered in order of discovery. The Hst1 phenotype appears to be governed by a single gene located in the third inversion from the centromere, In(17)3, of the region of proximal chromosome (Chr) 17 known as the t complex. This infertility trait is exhibited by male progeny of crosses between particular laboratory inbred strains of the species Mus musculus domesticus (domesticus) and some wild mice from the closely related species, Mus musculus musculus. These two incompletely isolated species diverged from a common ancestor nearly one million years ago, and presently form a narrow hybrid zone across Europe through which introgression of some genes continues to occur.

Hst1-affected males suffer from spermatogenic arrest at the pachytene stage of meiosis I, a defect which is germ cell autonomous. While the gene responsible for the Hst1 phenotype has not yet been cloned, it is physically contained on a single 580 kb yeast artificial chromosome (YAC), and several testis-expressed candidate genes mapping to this YAC have been isolated. Because alleles of Hst1 may interact epistatically with other hybrid sterility genes, it may be difficult to determine whether these candidate genes can affect the Hst1 phenotype.

The Hst2 and Hst3 phenotypes were originally 
identified on the basis of backcross analyses between 
Mus spretus and M. domesticus. These species diverged 
approximately three million years ago, and do not 
interact in the wild. However, they will occasionally 
interbreed in the laboratory if caged together. The 
hybrid male progeny of these matings are always 
sterile. The existence of Hst2, originally mapped to 
chromosome 9, has since been questioned, and its 
assignment to chromosome 9 has been retracted. 
Hst3 has been mapped close to the pseudoautosomal 
region (PAR) of the X chromosome, tightly linked to 
the Sxa locus, thought to control X-Y chromosome 
association during meiosis. While Sxa could be Hst3, 
it is possible that the Hst3 phenotype is caused by 
chromosomal rather than genic incompatibility be- 
tween the PARs of different species. As yet, there is 
no definitive evidence in support of one possibility 
versus the other. 

A unique phenotype of male-specific hybrid sterility was discovered when chromosome 17 from Mus spretus (S) was introgressed into the domesticus genetic background. In this case, the affected male offspring carried S and a domesticus homolog known as a t haplotype (t), a peculiar variant of the t complex region. This aberrant chromosome 17 polymorphism has been shown to house genetically interacting factors which perturb spermatogenesis in domesticus, so that +/t heterozygous males express a meiotic drive phenotype in which the t homolog is passed to the progeny of affected males at an abnormally high ratio. Interestingly, the same set of interacting genes that causes meiotic drive in the +/t heterozygote appears to be the basis of t/t homozygous male sterility, a phenotype that is absolute. In retrospect, the singular S/t hybrid sterility trait appears to result from an interaction of alleles on the S chromosome 17 homolog with mutant alleles on the t homolog, rather than with wild-type domesticus alleles.

The gross S/t hybrid sterility phenotype derives from the expression and/or epistatic interaction of four discrete t complex loci, Hst4, 5, 6, and 7. Three of these loci (Hst4, 5, and 6) are tightly linked to each other as well as to the strongest t haplotype meiotic drive locus within the confines of the largest and most distal of the t complex inversions, In(17)4. The fourth locus, Hst7, maps to the smallest and most proximal t complex inversion, In(17)1, to which another powerful enhancer of t-specific meiotic drive has been localized. While Hst4, 5, 6, and 7 also map in close proximity to Hst1, significant differences exist in the way in which Hst1 and these other chromosome 17 genes manifest their effects on spermatogenesis. Unlike Hst1, which appears to be a meiotically expressed defect resulting in almost complete spermatogenic arrest, Hst4, 5, 6, and 7 are all expressed postmeiotically, affecting spermatid differentiation (axonemal assembly and/or mitochondrial sheath maturation) and sperm function (sperm motility, flagellar curvature, and/or sperm-egg penetration).

The most studied of these four loci is Hst6, map- 
ping to a region of less than 1 centimorgan. Three 
genes map within this locus, two of which influence 
sperm flagellar curvature, while the third, sandwiched 
between the other two, plays a role in sperm- 
oolemma interaction. Additionally, in the domesticus 
background, homozygosity for the spretus allele of the 
proximal-most flagellar curvature gene causes a break- 
down in the assembly of the sperm axoneme, the 
functional backbone of the sperm tail. Moreover, 
because both t/t homozygous males and Hst6*/t males express an indistinguishable abnormality in sperm flagellar curvature, it is possible that Hst6 is identical to the strong, distal t haplotype factor causing male meiotic drive and sterility in the domesticus species. Thus, an intensive effort to isolate the Hst6 genes is currently underway.

Considerable work remains to be done in terms of 
understanding the process of speciation and the evo- 
lution of genetic diversity. In particular, a thorough 
molecular analysis of hybrid sterility in the mouse 
would be of benefit in elucidating the biological 
mechanism underlying Haldane’s Rule, the roles of 
natural selection and genetic drift in generating hybrid 
sterility phenotypes in Mus as well as other genera, 
and the relationship between meiotic drive and hybrid 
sterility. 


See also: Meiosis; Reproductive Isolation; 
Speciation 
Hybrid vigor is the unusual health, stature, or fitness of offspring produced from the mating of unrelated inbred strains or between closely related species. ‘Hybrid vigor’ that occurs as a result of mating between unrelated inbred strains, also known as ‘heterosis,’ should be distinguished from hybrid vigor that can occur in matings between closely related species.

In the case of inbred parental strains, stature, health, and reproductive performance are commonly superior in F1 hybrids. In these cases, the increase in fitness may arise from the complementation of deleterious recessive alleles fixed during inbreeding. This is also known as ‘associative overdominance.’ Vigor among F1 hybrids may also arise from the synergistic interaction of alternate alleles at the same locus. This is referred to as ‘true overdominance.’
Both associative and true overdominance are the con- 
sequence of a phenomenon known as ‘inbreeding 
depression,’ the commonly observed decrease in fertil- 
ity, health, and viability that occurs during the process 
of inbreeding. 

In the case of closely related species, an increase in 
growth or stature in hybrids is frequently accompan- 
ied by defects in fertility such as ‘hybrid sterility.’ In 
interspecific crosses, where parents are taken from 
wild, noninbred populations, the complementation 
of deleterious recessive alleles is not responsible 
for hybrid vigor. The causes of hybrid vigor in interspecific hybrids are not well understood. One consistent trend is that crosses between closely related species produce hybrid vigor in one direction but not in the reciprocal cross. One example of this common phenomenon is provided by crosses between closely related species of Peromyscus, or common North American field mice. Crosses between P. maniculatus and P. polionotus yield large, vigorous F1 pups when the father is P. maniculatus, but produce small, less fit F1 offspring when the father is P. polionotus.
The basis for this phenomenon may be any of a num- 
ber of factors that underlie parent-of-origin effects 
such as sex chromosomes, maternal nourishment, 
maternal care, maternally transmitted episomes, and 
genomic imprinting. 


See also: Heterosis; Hybrid Sterility, Mouse; 
Inbreeding Depression; Overdominance 




Hybrid Zone, Mouse 
L Silver 
Although mouse systematicists have reached a
consensus on the structure of the Mus musculus 
group — with the existence of only four well-defined 
subgroups — there is still a question as to whether each 
of these subgroups represents a separate species, or 
whether each is simply a subspecies, or race, within a 
single all-encompassing house mouse species. The 
very fact that this question is not simply answered 
attests to the clash that exists between (1) those who 
would define two populations as separate species only 
if they could not produce fully viable and fertile 
hybrid offspring, whether in a laboratory or natural 
setting, and (2) those who believe that species should 
be defined strictly in geographical and population 
terms, based on the existence of a natural barrier (of 
any kind) to gene flow between the two populations. 

The first question to be asked is whether this is simply a semantic argument between investigators
without any bearing on biology. At what point in the 
divergence of two populations from each other is the 
magic line crossed when they become distinct species? 
Obviously, the line must be fuzzy. Perhaps, the house 
mouse groups are simply in this fuzzy area at this 
moment in evolutionary time, so why argue about 
their classification? The answer is that an understand- 
ing of the evolution of the Mus group in particular, and 
the entire definition of species in general, is best served 
by pushing this debate as far as it will go, which is the 
purpose of what follows. 

Each of the four primary house mouse groups 
occupies a distinct geographical range. Together, these 
ranges have expanded out to cover nearly the entire 
land mass on the globe. In theory, it might be possible 
to solve the species versus subspecies debate by exam- 
ining the interactions that occur between different 
house mouse groups whose ranges have bumped up 
against each other. If all house mice were members 
of the same species, barriers to interbreeding might 
not exist, and as such, one might expect boundaries 
between ranges to be extremely diffuse with broad 
gradients of mixed genotypes. This would be the pre- 
diction of laboratory observations, where members 
of both sexes from each house mouse group can 
interbreed readily with individuals from all other 
groups to produce viable and fertile offspring of both 
sexes that appear to be just as fit in all respects as 
offspring derived from matings within a group. 




However, just because productive interbreeding 
occurs in the laboratory does not mean that it will 
occur in the wild where selective processes act in full 
force. It could be argued that two populations should 
be defined as separate species if the offspring that 
result from interbreeding are less fit in the real world 
than offspring obtained through matings within either 
group. It is known that subtle effects on fitness can have
dramatic effects in nature and yet go totally unrecog- 
nized in captivity. If this were the case with hybrids 
formed between different house mouse groups, the 
dynamics of interactions between different popula- 
tions would be quite different from the melting-pot 
prediction described above. In particular, since inter- 
specific crosses would be ‘nonproductive,’ genotypes 
from the two populations would remain distinct. 
Nevertheless, if the two populations favored different 
ecological niches, their ranges could actually overlap 
even as each group (species) maintained its genetic 
identity — such species are considered to be ‘sympatric.’ 

Species that have just recently become distinct 
from each other would be more likely to demand the 
same ecological niches. In this case, ranges would not 
overlap since all of the niches in each range would 
already be occupied by the species members that got 
there first. Instead, the barrier to gene flow would 
result in the formation of a distinct boundary between 
the two ranges. Boundary regions of this type are 
called hybrid zones because along these narrow geo- 
graphical lines, members of each population can inter- 
act and mate to form viable hybrids, even though gene 
flow across the entire width of the hybrid zone is 
generally blocked. 

The best-characterized house mouse hybrid zone 
runs through the center of Europe and separates the 
domesticus group to the West from the musculus group 
to the East. If, as the one-species protagonists claim, 
musculus and domesticus mice simply arrived in Europe 
and spread toward the center by different routes — 
domesticus from the southwest and musculus from the 
east — then upon meeting in the middle, the expectation 
would be that they would readily mix together. This 
should lead to a hybrid zone which broadens with time
until eventually it disappears, leaving in its place, at
least initially, a continuous gradient of the characteristics
present in the original two groups.

In contrast to this expectation, the European hybrid
zone does not appear to be widening. Rather, it
appears to be stably maintained at a width of less than
20 km. Since hybridization between the two groups
of mice does occur in this zone, what prevents the
spreading of most genes beyond it? The answer
seems to be that hybrid animals in this zone are less fit
than those with pure genotypes on either side. One
way in which this reduced fitness is expressed
is through the inability of the hybrids to protect
themselves against intestinal parasites. It has been shown
through direct studies of captured animals that hybrid
zone mice with mixed genotypes carry a much larger
parasitic load, in the form of intestinal worms. This
finding has been independently confirmed. Superficially,
these ‘wormy mice’ do not appear to be less
healthy than normal; however, one can easily imagine
a negative effect on reproductive fitness through a
reduced life span and other changes in overall vitality.
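The argument that selection against hybrids can hold a hybrid zone at a stable width, whereas free mixing lets it broaden indefinitely, can be illustrated with a minimal deterministic simulation. This is a sketch under simplifying assumptions, not a model of the real European zone: a single locus, a one-dimensional chain of demes with migration only between neighboring demes, and reduced hybrid fitness represented as heterozygote disadvantage. The function names and parameter values (`n_demes`, `m`, `s`) are hypothetical and chosen only for illustration.

```python
def simulate_cline(n_demes=60, generations=500, m=0.5, s=0.0):
    """Allele-frequency cline after `generations` rounds of migration and selection.

    Hypothetical model: allele A marks one parental group, a the other.
    Fitnesses are AA = aa = 1 and Aa = 1 - s (selection against hybrids).
    """
    # Start with a sharp boundary: A fixed in the western half, absent in the eastern half.
    p = [1.0 if i < n_demes // 2 else 0.0 for i in range(n_demes)]
    for _ in range(generations):
        # Migration: a fraction m of each deme arrives from its two neighbors
        # (edge demes use themselves as the missing neighbor, so no alleles are lost).
        p = [
            (1 - m) * p[i] + (m / 2) * (p[max(i - 1, 0)] + p[min(i + 1, n_demes - 1)])
            for i in range(n_demes)
        ]
        # Selection against heterozygotes: p' = p(1 - qs) / (1 - 2pqs), with q = 1 - p.
        p = [x * (1 - (1 - x) * s) / (1 - 2 * x * (1 - x) * s) for x in p]
    return p


def cline_width(p):
    """Width of the cline, taken as 1 / (steepest change in p between adjacent demes)."""
    steepest = max(abs(b - a) for a, b in zip(p, p[1:]))
    return 1.0 / steepest


neutral = cline_width(simulate_cline(s=0.0))   # free mixing: the zone keeps broadening
selected = cline_width(simulate_cline(s=0.5))  # selection against hybrids: zone stays narrow
print(f"width without selection: {neutral:.1f} demes; with selection: {selected:.1f} demes")
```

In this toy model the neutral cline continues to flatten as the run is lengthened, while the selected cline settles at a width set by the balance between migration and selection.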

Nevertheless, for a subset of genes and gene com- 
plexes, the hybrid zone does not act as a barrier to 
transmission across group lines. In particular, there is 
evidence for the flow of mitochondrial genes from 
domesticus animals in Germany to musculus animals 
in Scandinavia with the reverse flow observed in Bul- 
garia and Greece. An even more dramatic example of 
gene flow can be seen with a variant form of chromo- 
some 17 — called a t haplotype — that has passed freely 
across the complete ranges of all four groups. 

In contrast to the stable hybrid zone in Europe, 
other boundaries between different house mouse 
ranges are likely to be much more diffuse. The extreme 
form of this situation is the complete mixing of two 
house mouse groups — castaneus and musculus — that 
has taken place on the Japanese islands. So thorough 
has this mixing been that the hybrid group obtained 
was considered to be a separate group unto itself — 
with the name Mus molossinus — until DNA analysis 
showed otherwise. 

In the end, there is no clear solution to the one- 
species versus multiple-species debate and it comes 
down to a matter of taste. However, the consensus has 
been aptly summarized by Bonhomme: 


None of the four main units is completely genetically isol- 
ated from the other three, none is able to live sympatrically 
with any other. In those locations where they meet, there is 
evidence of exchange ranging from differential introgres- 
sion...to a complete blending. It is therefore necessary to 
keep all these taxonomical units, whose evolutionary fate is 
unpredictable, within a species framework 


Thus, in line with this consensus, the four house 
mouse groups are described by their subspecies names 
M. m. musculus, M. m. domesticus, M. m. castaneus, 
and M. m. bactrianus. M. musculus is used as a generic 
term in general discussions of house mice, where the 
specific subspecies is unimportant or unknown. 


See also: Mus musculus; Mus musculus castaneus; 
Speciation; Sympatric 


Hybridization 
T M Picknett and S Brenner 
Hybridization (of nucleic acids) is a technique in
which single-stranded nucleic acids are allowed to
interact so that those with sufficiently complementary
sequences form complexes, or hybrids. This technique
allows the detection of specific sequences or
may be used to assess the degree of sequence identity.
Hybridization may be carried out in solution or, more
commonly, on a solid-phase support, e.g., nitrocellulose
paper. The hybrid of interest is often identified
with a radioactively or otherwise labeled nucleic
acid probe, or by digestion with an enzyme that
specifically attacks single-stranded nucleic acids.
Hybridization can be performed with combinations
of DNA-DNA (heat-denatured to produce single
strands), DNA-RNA, or RNA-RNA molecules. In
situ hybridization of labeled nucleic acids with prepared
cells or tissue sections is used to identify specific
transcription or to locate genes on specific chromosomes
(e.g., fluorescence in situ hybridization, FISH).
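The dependence of hybrid formation on complementary sequence can be sketched by sliding a probe's reverse complement along a target strand and scoring Watson-Crick matches at each position. This is a deliberately simplified, hypothetical illustration: it handles DNA bases only and ignores hybridization conditions (temperature, salt), G-C content, mismatch position, and gaps. In practice, how many mismatches a hybrid tolerates depends on the stringency chosen.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq):
    """Sequence of the strand that pairs antiparallel with `seq` (DNA only)."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def best_binding_site(probe, target):
    """Return (offset, fraction_paired) for the best-matching site of `probe` on `target`.

    A perfectly complementary site gives fraction_paired == 1.0. (None, 0.0) is returned
    when no position pairs at all, including when the probe is longer than the target.
    """
    site = reverse_complement(probe)  # sequence the target must contain for full pairing
    best_offset, best_fraction = None, 0.0
    for offset in range(len(target) - len(site) + 1):
        window = target[offset:offset + len(site)]
        matches = sum(a == b for a, b in zip(site, window))
        fraction = matches / len(site)
        if fraction > best_fraction:
            best_offset, best_fraction = offset, fraction
    return best_offset, best_fraction


print(best_binding_site("CCTAAGGA", "GGGATCCTTAGGAAA"))  # → (4, 1.0)
```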


See also: DNA Hybridization; FISH (Fluorescent 
in situ Hybridization); Probe 
There are two genetically related types of abnormal
placental morphogenesis, known as complete and
partial hydatidiform moles. Their basic etiology is
illustrated diagrammatically in Figure 1. Complete
hydatidiform mole represents a proliferation of cells
containing 46 chromosomes of paternal origin only,
while partial hydatidiform mole is usually associated
with triploidy (69 chromosomes), in which two paternal
and one maternal haploid complements are present.
The dominance of the paternal sets is a common feature
of both moles, while the presence of a maternal set
in partial mole and its complete absence in complete
mole represent the main difference. It has been
shown that both parental genomes are required for
normal embryogenesis and that the paternal genetic
contribution is essential for the development of placental
(extraembryonic) tissues, whereas the maternal
genetic contribution is more important in the development
of the early embryo. This differential expression
of genetic messages, depending on their maternal
or paternal origin, is known as genomic imprinting
(Hall, 1990).


Complete Hydatidiform Mole (CHM) 


CHM is typically detected between the 11th and 25th 
week of pregnancy with an average gestational age of 
about 16 weeks. Excessive uterine enlargement occurs 
and may be accompanied by severe vomiting and 
pregnancy-induced hypertension. Ultrasonography 
often discloses a classic ‘snowstorm’ appearance. 
CHM is characterized by gross generalized villous
edema, with enlarged placental villi forming ‘grape-like,’
transparent vesicles measuring up to 2 cm, and by
absence of amnion, umbilical cord, and embryo/fetus.
In all instances, when CHM is associated with
an embryo or fetus, this finding represents a twin
gestation (Lage et al., 1992). For microscopic features,
see Table 1.

The majority of complete moles have a 46,XX karyotype,
resulting either from dispermy or from duplication
of a haploid sperm in an anuclear ovum. This process is
known as diploid androgenesis (Kajii et al., 1984). XY
moles, which are indisputably the result of dispermy,
represent only some 4% of complete moles and originate
from the fertilization of an anuclear ovum by two
spermatozoa. No significant difference has been noted in the
gross and microscopic findings between the XY and 
XX complete moles. Studies of invasive moles and 
choriocarcinomas have led to the suggestion that het- 
erozygous complete moles (caused by dispermy) may 
have a more malignant potential than their homo- 
zygous counterparts arising through diploid andro- 
genesis. 


Partial Hydatidiform Mole (PHM) 


PHM is more common than CHM. Morphologically,
a partial mole differs from a complete mole in
three principal respects:


1. An embryo/fetus is usually present. 

2. The microcystic pattern may be diffuse or focal and is
less prominent than in a complete mole, and trophoblastic
hyperplasia is both less prominent and strikingly focal.

3. Genetically, partial hydatidiform moles are usually
triploid, with two paternal and one maternal haploid
complements (Hall, 1990). They result from fertilization
of a normal ovum either by a diploid sperm
or by two different haploid sperm. Occasionally,
tetraploidy, arising as a result of abnormal fertilization
of a haploid ovum by sperm representing three
paternal chromosome sets, is detected. A few trisomic
conceptuses with partial mole-like morphology
have been described.


Figure 1 Origin of complete (CHM, 46 chromosomes) and partial (PHM, 69 chromosomes) hydatidiform moles.


Table 1 Differential features of complete and partial moles

Feature | Complete mole | Partial mole
Clinical presentation | Spontaneous abortion | Missed or spontaneous abortion
Gestational age | 16-18 weeks | 12-20 weeks
Uterine size | Often large for dates | Often small for dates
Serum hCG | ++++ | +
Cytogenetics | XX (over 90%) or XY (<10%); two paternal sets | Triploid XXY (58%), XXX (40%), XYY (2%); two paternal sets and one maternal set
Persistent gestational trophoblastic disease | 10-30% | 4-11%; malignant transformation at the same rate as in nonmolar pregnancies
Embryo/fetus | Absent | Present
Microscopic features | |
Villous outline | Round | Scalloped
Hydropic swelling | Marked | Less pronounced
Trophoblastic proliferation | Circumferential | Focal, minimal
Trophoblastic atypia | Often present | Absent
Immunocytochemistry(a) | |
βhCG | ++++ | +
αhCG | + | ++++
PLAP | + | ++++
PL | ++ | ++++

(a) hCG, human chorionic gonadotropin; PLAP, placental alkaline phosphatase; PL, placental lactogen. (Modified from Silverberg SG and Kurman RJ (1992) Atlas of Tumor Pathology: Tumors of the Uterine Corpus and Gestational Trophoblastic Disease. Washington DC: Armed Forces Institute of Pathology.)


The gross specimen in PHM shows hydropic villi like 
those seen in CHM mixed with nonmolar placental 
tissue. Evidence of an embryo or an amnion is usually 
present; stromal vasculature and vessels may contain 
fetal nucleated erythrocytes. Microscopic and
differential features of CHM and PHM are
summarized in Table 1. However, the only conclusive
means of differential diagnosis is cytogenetics
or, more practically, flow cytometry (Lage et al., 1992).
It is important to distinguish between partial and 
complete moles, as the malignant transformation rate 
in partial hydatidiform mole is the same as in any 
nonmolar pregnancy. 
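Because complete and partial moles differ in the number of haploid sets they carry, their expected chromosome counts and relative DNA content follow from simple arithmetic, and a flow-cytometric DNA index (sample DNA content relative to a normal diploid reference) can be mapped to the nearest ploidy. The sketch below assumes a human haploid set of 23 chromosomes and uses a hypothetical tolerance of 0.1. Ploidy alone does not separate a complete mole from a nonmolar diploid pregnancy, nor does a triploid result by itself reveal the parental origin of the extra set.

```python
HAPLOID_SET = 23  # chromosomes per human haploid complement


def expected_karyotype(paternal_sets, maternal_sets):
    """Chromosome number and expected DNA index for a given combination of haploid sets."""
    total_sets = paternal_sets + maternal_sets
    return {
        "chromosomes": HAPLOID_SET * total_sets,
        # DNA index is measured relative to a normal diploid (two-set) reference.
        "dna_index": total_sets / 2,
    }


def classify_dna_index(dna_index, tolerance=0.1):
    """Assign a measured DNA index to the nearest ploidy, or 'indeterminate' if none is close."""
    reference = {1.0: "diploid", 1.5: "triploid", 2.0: "tetraploid"}
    nearest = min(reference, key=lambda value: abs(value - dna_index))
    if abs(nearest - dna_index) > tolerance:
        return "indeterminate"
    return reference[nearest]


print(expected_karyotype(paternal_sets=2, maternal_sets=0))  # complete mole: 46 chromosomes, index 1.0
print(expected_karyotype(paternal_sets=2, maternal_sets=1))  # partial mole: 69 chromosomes, index 1.5
print(classify_dna_index(1.48))                              # → triploid
```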

The parental origin of the extra haploid set in triploidy
has been shown to have a detectable effect on
fetal phenotype in the second and third trimesters. Two
fetal phenotypes have been delineated: the type I fetus,
with dominance of the paternal sets, is associated with a
large cystic placenta and has relatively normal fetal growth
and microcephaly; the type II fetus, with dominance of the
maternal sets, is associated with a small noncystic placenta,
is markedly growth retarded, and has a disproportionately
large head (McFadden and Kalousek, 1991).
See also: Triploidy


Hyperchromicity 
Hyperchromicity is the increase in optical density
(OD) that occurs when DNA is denatured. 
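Because optical density rises as base pairs are disrupted, a thermal melting curve can be converted into the fraction of DNA denatured, and the midpoint of the transition is reported as the melting temperature (Tm). The sketch below assumes absorbance readings at 260 nm, takes the first and last readings as the fully native and fully denatured baselines (real curves often need sloping baselines), and uses hypothetical data.

```python
def hyperchromic_increase(a_native, a_denatured):
    """Percentage rise in absorbance on complete denaturation."""
    return 100.0 * (a_denatured - a_native) / a_native


def melting_temperature(temperatures, absorbances):
    """Temperature at which half of the hyperchromic rise has occurred (linear interpolation)."""
    a_native, a_denatured = absorbances[0], absorbances[-1]
    fractions = [(a - a_native) / (a_denatured - a_native) for a in absorbances]
    points = list(zip(temperatures, fractions))
    for (t1, f1), (t2, f2) in zip(points, points[1:]):
        if f1 <= 0.5 <= f2:
            if f2 == f1:  # flat segment lying exactly at the midpoint
                return t1
            return t1 + (0.5 - f1) * (t2 - t1) / (f2 - f1)
    return None  # the curve never crosses the midpoint


# Hypothetical A260 readings for a DNA sample heated from 60 to 100 °C
temps = [60, 70, 80, 90, 100]
a260 = [1.00, 1.02, 1.20, 1.38, 1.40]
print(round(hyperchromic_increase(a260[0], a260[-1]), 1))  # → 40.0
print(round(melting_temperature(temps, a260), 1))          # → 80.0
```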


See also: DNA Denaturation 


Hypervariable Region 
A hypervariable region is a region of either heavy or
light chains of immunoglobulin molecules displaying 
great sequence diversity. This region specifies the anti- 
gen affinity of an antibody. 
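Sequence diversity of this kind is classically quantified position by position with the Wu-Kabat variability: the number of different amino acids found at a position divided by the frequency of the most common amino acid at that position. Positions with high values mark the hypervariable regions. This is a minimal sketch, assuming the sequences are already aligned to equal length; the toy sequences are hypothetical and far shorter than real variable domains.

```python
from collections import Counter


def wu_kabat_variability(aligned_sequences):
    """Per-position variability = (distinct residues) / (frequency of the most common residue)."""
    length = len(aligned_sequences[0])
    if any(len(seq) != length for seq in aligned_sequences):
        raise ValueError("sequences must be aligned to equal length")
    n = len(aligned_sequences)
    scores = []
    for i in range(length):
        counts = Counter(seq[i] for seq in aligned_sequences)
        most_common_frequency = counts.most_common(1)[0][1] / n
        scores.append(len(counts) / most_common_frequency)
    return scores


# Hypothetical aligned fragments: position 3 varies in every sequence
toy = ["ACDE", "ACFE", "ACGE", "AKHE"]
print([round(v, 2) for v in wu_kabat_variability(toy)])  # → [1.0, 2.67, 16.0, 1.0]
```

A fully conserved position scores 1; the score grows both with the number of different residues and with how evenly they are represented.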


See also: Constant Regions; Immunoglobulin 
Gene Superfamily 
The ichthyoses (literally, fish-scale dermatoses) are not
only extremely heterogeneous, but are also a spectacular
example of the application of modern molecular
biology and protein chemistry to the wider biology of
the epidermis. The latter has proven to be extraordinarily
diverse and very much more complex and subtle
than would have been thought. Thus, the
molecular pathology of the ichthyoses involves basic
structural components such as keratin intermediate
filaments, cell envelope proteins, sulfatase enzymes,
desmogleins, and desmocollins.

Clinical classification includes autosomal dominant 
ichthyosis vulgaris, X-linked recessive ichthyoses, a 
variety of autosomal recessive erythrokeratodermas, 
and various other localized striate variants, often 
grouped under the term ichthyosis congenita or the 
collodion fetus. In other cases, there is overlap with
proven keratin disorders such as epidermolysis bullosa
simplex of the Weber-Cockayne and Dowling-Meara types.
Some classifications are more complex than others:
Mallory includes the ichthyoses under the term disorders
of cornification (DOC), in which she also
includes Darier disease as DOC 22 (Mallory and
Leal-Khouri, 1994). Her groups DOC 1-7
correspond to ichthyosis vulgaris, steroid sulfatase
deficiency, bullous epidermolytic hyperkeratosis,
collodion baby, congenital erythrodermic, autosomal
dominant lamellar ichthyosis, and the harlequin fetus,
respectively. Types 8 and 9 are ichthyosis hystrix and
Netherton syndrome, respectively, whilst types 10
and 11 are Sjögren-Larsson syndrome and Refsum disease,
respectively. Types 12-24 are either extremely rare or
regarded as disorders of cornification but not strictly
ichthyoses.


Ichthyosis Vulgaris Simplex 


This is the commonest form of ichthyosis, with onset
within the first 3 months of life. There is fine scaling
of the extensor surfaces, sparing the trunk and flexures.
There is increased criss-crossing of the skin markings
of the palms and soles, and histologically the granular
cell layer is deficient, with epidermal hyperkeratosis.
There is frequently associated atopy. Profilaggrin
deficiency has been identified (Sybert et al., 1985).


X-Linked Ichthyosis 


This also has a very early onset, at birth or within
the first 3 months of life. The distribution of scaling
differs substantially from that of ichthyosis vulgaris: it
involves the scalp, ears, neck, and flexures and affects
the abdomen and the anterior trunk. Unlike ichthyosis
vulgaris, the epidermis is hypertrophic, with a normal
granular layer.

Both 3β-steroid sulfatase and arylsulfatase are
deficient, causing estriol deficiency, delayed labor,
and increased fetal loss. Postnatally, affected boys
develop ichthyosis. In some families there is hypogonadism.
The STS gene has been cloned; in most
cases it is completely deleted, and where it is not, there
are deletions at the 5’ end that abolish its function
(Basler et al., 1992). In other cases there are point
mutations. The steroid sulfatase enzyme assay is also
very reliable, and a simple staining assay for hexanol
dehydrogenase provides rapid confirmation of diagnosis
(Lake et al., 1991).


Epidermolytic Hyperkeratosis and 
Ichthyosis Bullosa of Siemens 


In epidermolytic hyperkeratosis (EH) there are 
generally blisters or erosions at birth, followed later 
by generalized infantile or childhood scaling (Figure 
1A,B), closely resembling hyperkeratosis. Histologically,
the upper spinous layer is vacuolated, with
clumping of keratin filaments visible by electron
microscopy (Haneke and Anton-Lamprecht, 1982).
It is thus closely related etiologically to epidermolysis
bullosa simplex (EBS). As in EBS, the defect lies in
keratins, here keratins 1 and 10, with mutations usually
in the highly conserved rod domains (Rothnagel et al., 1992).
Ichthyosis bullosa of Siemens is similar to EH, but
has generalized erythema at birth, followed by erythema
and blistering. Later, large grey hyperkeratoses
develop, with lichenification. Siemens skin is more
delicate than EH skin. However, like EH, it is caused by
keratin mutations, in this case in the rod domain of the
keratin 2e gene on chromosome 12.


Figure 1 (See Plate 21) (A) Generalized hyperkeratosis with background erythema of the lower limbs;
(B) palmoplantar hyperkeratosis extending proximally, typical of epidermolytic hyperkeratosis.


Usually affected infants are collodion babies at birth. 
There are ectodermal dysplastic features, with poor 
sweating, dystrophic nails, alopecia, and ectropion. 
However, the very large branny scales are diagnostic 
and very typical and, furthermore, the teeth are nor- 
mal. Inheritance is usually autosomal recessive. 
Another variant has much more severe erythro- 
derma and more severe collodion changes. Like the 
former type, there are very large adherent scales. The 
two differ histologically, with severe orthokeratosis 
and hyperkeratosis in the milder phenotype. The outcome
is variable, some affected infants dying of dehydration,
sepsis, or hypoproteinemia, whilst others heal
and survive. There are several nonallelic gene loci. One,
at 14q11, is close to the transglutaminase gene
(Russell et al., 1995), and mutations have been
detected (Parmentier et al., 1995). There is a second
locus at 2q33-35 and a third on chromosome
19p12-q12; a fourth locus occurs at 3p21, and there is
even further heterogeneity. The transglutaminases
catalyze the formation of ε-(γ-glutamyl)lysine isopeptide
bonds and are very important for keratin cross-linking.


It is unclear whether this is allelic to the lamellar
ichthyoses or a separate entity ( ). In any
event, there is spectacular hyperkeratosis, with very
severe facial edema and distortion.


These include Netherton syndrome (linear ichthyosis
with pili torti or trichorrhexis), Refsum disease
(ichthyosis vulgaris-like scaling with retinitis pigmentosa,
peripheral neuropathy, and cerebellar ataxia),
trichothiodystrophy, ichthyosiform erythroderma with
sulfur-deficient hair and photosensitivity, and numerous
others.


Figure 2 (See Plate 22) Generalized cutaneous
features of a harlequin fetus.
Identity by Descent
One of the most influential concepts in the theory
of population genetics is ‘identity by descent.’ Two 
alleles of a gene are said to be identical by descent if, 
within the span of some specified number of gen- 
erations, they originated by replication of a single 
allele in a common ancestor. In studies of pedigreed 
populations, the specified span of generations is 
usually short and the beginning often coincides with 
the most remote ancestors in the pedigree. For 
population studies, the span of generations is typically 
the time since the founding of any subpopulation in 
question. A number of important concepts in popula- 
tion genetics are based on the probability that two 
alleles are identical by descent. For example, the 
inbreeding coefficient equals the probability that the 
two alleles at a locus in an individual are identical by 
descent, and the coefficient of kinship (coefficient of 
consanguinity) equals the probability that a pair of 
homologous alleles, drawn at random, one from 
each of two individuals, are identical by descent. Conceived
independently by Charles Cotterman (1940)
and Gustave Malécot (1944), the concept of
identity by descent, together with calculation of its
probability, soon reproduced all of the key results obtained
previously by Sewall Wright using his method of
path coefficients, which are related to partial regression
coefficients. Because of its intuitive simplicity and
ease of calculation, the concept of identity by 
descent soon replaced path coefficients in most appli- 
cations in population genetics, especially in the the- 
ories of inbreeding and hierarchical population 
structure. 
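The two coefficients defined above can be computed from a pedigree with a standard recursive method: the kinship of an individual with itself is (1 + F)/2, where F is its inbreeding coefficient; the kinship of two different individuals is the average of the kinships of the later-born one's parents with the other; and an individual's inbreeding coefficient equals the kinship of its parents. This is a minimal sketch, assuming that founders are unrelated and non-inbred, that every non-founder has both parents recorded, and that parents are listed before their offspring. The function names and pedigree labels are hypothetical.

```python
from functools import lru_cache


def make_pedigree_calculator(pedigree):
    """Return kinship and inbreeding functions for a pedigree.

    pedigree: dict mapping each individual to (father, mother), or None for a founder.
    Parents must appear before their offspring in the dict (insertion order).
    """
    birth_order = {ind: rank for rank, ind in enumerate(pedigree)}

    @lru_cache(maxsize=None)
    def kinship(a, b):
        # Probability that alleles drawn at random, one from a and one from b, are IBD.
        if a == b:
            parents = pedigree[a]
            inbreeding_of_a = 0.0 if parents is None else kinship(*parents)
            return (1.0 + inbreeding_of_a) / 2.0
        if birth_order[a] < birth_order[b]:
            a, b = b, a  # recurse through the later-listed one, who cannot be an ancestor of b
        parents = pedigree[a]
        if parents is None:
            return 0.0  # a founder listed after b is not b's ancestor and is assumed unrelated
        father, mother = parents
        return (kinship(father, b) + kinship(mother, b)) / 2.0

    def inbreeding(ind):
        # Probability that the two alleles at a locus in `ind` are identical by descent.
        parents = pedigree[ind]
        return 0.0 if parents is None else kinship(*parents)

    return kinship, inbreeding


# Hypothetical pedigree: first cousins C1 and C2 have a child K.
pedigree = {
    "G1": None, "G2": None,                # shared grandparents
    "A": ("G1", "G2"), "B": ("G1", "G2"),  # full sibs
    "X": None, "Y": None,                  # unrelated spouses
    "C1": ("A", "X"), "C2": ("B", "Y"),
    "K": ("C1", "C2"),
}
kinship, inbreeding = make_pedigree_calculator(pedigree)
print(inbreeding("K"))  # → 0.0625, i.e., 1/16 for the offspring of first cousins
```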
Idiogram
An idiogram (or ideogram) is the diagrammatic
representation of the karyotype of a cell, individual, 
or species. It is based on measurements of chromo- 
some length and centromere position, and on the 
characteristic banding appearance revealed by staining 
techniques such as Giemsa banding. These bands pro- 
vide landmarks for the identification of individual 
chromosomes and regions of chromosomes and act 
as an aid in the analysis of chromosome rearrange- 
ments. 
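The measurement side of an idiogram can be sketched as follows: each chromosome's relative length (as a percentage of the total haploid length) and centromeric index (short-arm length as a percentage of the whole chromosome) are computed, and the chromosomes are ordered by decreasing length. The arm-ratio cut-offs used to name the centromere position are a commonly used convention assumed here, the arm lengths are hypothetical, and banding patterns are not modeled.

```python
def centromere_position(p, q):
    """Name the centromere position from the long/short arm ratio (cut-offs assumed)."""
    shorter, longer = min(p, q), max(p, q)
    if shorter == 0:
        return "telocentric"
    ratio = longer / shorter
    if ratio < 1.7:
        return "metacentric"
    if ratio < 3.0:
        return "submetacentric"
    if ratio < 7.0:
        return "subtelocentric"
    return "acrocentric"


def idiogram_table(arms):
    """arms: dict chromosome -> (p-arm length, q-arm length), in any consistent unit."""
    total = sum(p + q for p, q in arms.values())
    rows = []
    for name, (p, q) in arms.items():
        length = p + q
        rows.append({
            "chromosome": name,
            "relative_length": 100.0 * length / total,
            "centromeric_index": 100.0 * min(p, q) / length,
            "type": centromere_position(p, q),
        })
    # Idiograms are usually arranged in order of decreasing length.
    rows.sort(key=lambda row: row["relative_length"], reverse=True)
    return rows


# Hypothetical arm measurements (micrometers) for a three-chromosome haploid set
for row in idiogram_table({"A": (5.0, 5.0), "B": (2.0, 5.0), "C": (0.5, 4.5)}):
    print(row)
```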


See also: Chromosome Aberrations; Giemsa 
Banding, Mouse Chromosomes; Karyotype 


Igf2 Locus 
Igf2 is an imprinted gene, in which one of the two
parental alleles is inactivated, or silenced. The gene
encodes a potent fetal growth factor and is
closely linked to the reciprocally imprinted H19
gene. On the paternal chromosome, Igf2 is transcriptionally
active and H19 is not transcribed, while on the
maternal chromosome, Igf2 is not transcribed and
H19 is active. This imprinting, and other aspects of
expression, are regulated by control elements such as
the differentially methylated domain (DMD), the
mesodermal tissue silencer element DMR1, and a
muscle-specific silencer element that is as yet
unnamed. Loss of imprinting of Igf2 is associated
with Beckwith-Wiedemann syndrome, which is characterized
by fetal overgrowth and childhood tumors.
Igf2 is located on distal mouse chromosome 7 and, in
humans, in the Beckwith-Wiedemann region on chromosome
11p15.5.


Further Reading 
Peters J (2000) Imprinting: silently crossing the boundary. 
Genome Biology 1: Reviews 1028.1-1028.4.


See also: Beckwith-Wiedemann Syndrome; 
Imprinting, Genomic 


Igf2r Locus 
The Igf2r locus encodes the insulin-like growth factor
2 receptor (IGF2R) polypeptide. This polypeptide
sequesters, and thus modulates, the level of active
insulin-like growth factor in the developing mammalian
fetus. This modulation, in turn, adjusts the
growth of the fetus. Igf2r is one of a small subset
of mammalian genes that are subject to a process
known as genomic imprinting, in which a gene is active
or inactive depending on its parental origin. In the case
of Igf2r, the maternal copy of the gene is active, while
the paternal copy is suppressed. Genomic imprinting
at Igf2r appears to be the result of an ancient battle
between male and female parents attempting to maximize
the survival and success of their offspring.


See also: Imprinting, Genomic 
The distinction between legitimate and illegitimate
recombination is based on the extent of homology 
between the DNA sequences undergoing recombin- 
ation. Legitimate processes involve extensive homo- 
logy, like that manifested by paired chromosomes in 
the act of meiotic recombination. Illegitimate recom- 
bination relies on very short homologies, and some- 
times none at all. A special type of illegitimate process 
is site-specific recombination, in which particular 
sequences are recognized by proteins that catalyze 
breakage and rejoining events at those sites. 


The definition of illegitimate recombination is 
imprecise, since there is no strict threshold in the 
amount of homology that defines legitimate events. 
When the junctions resulting from illegitimate recom- 
bination are examined — for example, in experiments 
with cultured mammalian cells — they often show 
matches of a few base pairs between the parental 
sequences. These microhomologies are not absolutely 
required, since some joints show no such matches. 
Typically the number of matched base pairs at the junc- 
tion is 1-5, but occasionally longer matches are seen. 
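Given a junction sequence and the two parental sequences it was formed from, the matched base pairs at the joint can be counted by asking how far the junction matches the left parent reading from its start, and the right parent reading back from its end. Any overlap between the two matching stretches is the microhomology, bases that could have come from either parent; a shortfall means some junction bases match neither parent. This is a minimal sketch, assuming single-strand sequences written 5’ to 3’, with the left parent aligned to the start of the junction and the right parent aligned to its end. The function name and example sequences are hypothetical.

```python
def junction_microhomology(junction, left_parent, right_parent):
    """Return (microhomology, unassigned_bases) for a recombination junction.

    junction: the joined sequence; left_parent shares its start, right_parent its end.
    microhomology: bases at the joint that could have come from either parent.
    unassigned_bases: number of junction bases matching neither parent (0 if none).
    """
    # Count how far the junction matches the left parent, reading from the start.
    left_match = 0
    for j_base, p_base in zip(junction, left_parent):
        if j_base != p_base:
            break
        left_match += 1
    # Count how far the junction matches the right parent, reading back from the end.
    right_match = 0
    for j_base, p_base in zip(reversed(junction), reversed(right_parent)):
        if j_base != p_base:
            break
        right_match += 1
    overlap = left_match + right_match - len(junction)
    if overlap > 0:
        return junction[len(junction) - right_match:left_match], 0
    return "", -overlap


# Hypothetical joint containing a 2-bp "CT" microhomology shared by both parents
print(junction_microhomology("GATTACAGGCTTTAGC", "GATTACAGGCTGGGGG", "AAAAAACTTTAGC"))  # → ('CT', 0)
```

A result of `("", 0)` corresponds to a joint with no matched base pairs, which, as noted above, is also observed.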

Illegitimate recombination is observed in most 
organisms. The origin of spontaneous illegitimate 
events cannot be traced, but such joints are clearly 
formed in response to double-strand breaks in chromo- 
somal DNA. In cultured cells from multicellular 
eukaryotes, illegitimate end joining is the most com- 
mon fate of linear DNAs introduced artificially into 
the cells. The yeast Saccharomyces cerevisiae and some 
other fungi have very efficient mechanisms of homo- 
logous recombination that predominate in the proces- 
sing of chromosomal breaks and of DNA introduced 
during transformation; but illegitimate events can be 
detected if no homology is present, or if the capability 
of performing homologous recombination has been 
disabled by mutation. 

The mechanism by which illegitimate recombin- 
ation occurs in cells is not known. Two simple and 
attractive hypotheses describe mechanisms that very 
likely both contribute to the observed junctions. The 
first hypothesis is that DNA ends are simply joined by 
a DNA ligase (Figure 1). When there are complemen- 
tary nucleotides appropriately situated in single- 
stranded regions, they stabilize an association between 
the ends and help set the register for the ligase. Joints 
of this type have been produced in crude extracts from 
eukaryotic cells and with some purified DNA ligases. 
There are also ligases, e.g., that encoded by bacteriophage
T4, that can join blunt DNA ends that have no
single-stranded overlaps.

The second hypothesis is that rather long single- 
stranded tails are formed — presumably by the action 
of exonucleases — at broken ends (Figure 2). If these 
tails have free 3’ ends, microhomologies can support 
transient associations that can be stabilized by DNA 
synthesis through use of the transient joint as a 
primer-template complex by a cellular DNA poly- 
merase. 

While the details of the illegitimate recombination
mechanism remain obscure, some information is available
on proteins that participate in the process. In
mammalian cells, a protein complex called Ku and its
associated DNA-dependent protein kinase (DNA-PK)
are required for efficient end joining. Yeast cells share
the requirement for Ku and for a DNA ligase that is
different from the one used in DNA replication,
namely DNA ligase IV and its associated XRCC4 protein.
These same factors participate in site-specific recombination
during immunoglobulin gene rearrangements
(see below).


Figure 1 Illegitimate recombination by microhomology-directed
ligation. The broken ends of two DNA
molecules are indicated in the top diagram. Horizontal
lines show the phosphodiester backbones and short
vertical lines the Watson-Crick base pairs. In step 1, each
end is partly degraded by a strand-specific exonuclease. In
step 2, the single-stranded tails of the two DNAs come
together, directed by the formation of two base pairs
surrounding a mismatch. In step 3, the strands of the two
DNAs are joined by a cellular DNA ligase.

Figure 2 Illegitimate recombination by microhomology-directed
DNA synthesis. The starting point is the same
as in Figure 1. In step 1, each DNA is degraded more
extensively by a 5’→3’ exonuclease. In step 2, the 3’ single-stranded
tails come together through the formation of two
base pairs. In step 3, the 3’ end of the thinner strand is
extended by DNA polymerase, forming an extended
base-paired region. In step 4, the thicker single strand
is degraded, its 3’ end is used to prime synthesis in the
remaining gap, and nicks at both ends of the new joint are
sealed by DNA ligase.

Site-specific recombination is seen with some 
viruses and transposable elements and in a few 
chromosomal situations that are probably derived 
from transposons. The hallmark of these processes 
is the involvement of at least one element-encoded 
protein that recognizes the DNA sequences that will 
be joined, helps hold them together, and catalyzes the 
breakage and rejoining reactions before releasing the 
products. In each case, the recombination event is a 
directed part of the life-style of the element. 

An example of site-specific recombination is the 
integration of the bacteriophage lambda genome 
into the host Escherichia coli chromosome. The 
lambda-encoded Int protein recognizes the specific 
attachment sites of both DNAs and, in collaboration 
with host proteins, brings them together into a pre- 
integration complex. Recombination proceeds by a 
topoisomerase-like mechanism, in which hydroxyl 
groups on active site tyrosines in the Int protein attack 
specific phosphodiester bonds in the target DNA, 
producing covalent joints between Int and DNA as 
intermediates in integration. A subsequent transes- 
terification reaction generates the new DNA joints 
and releases the protein. 

An example of site-specific recombination in mam- 
malian chromosomes is the generation of functional 
genes for antibodies, or immunoglobulins. Specific 
DNA sequences, called recombination signal se- 
quences, are recognized by the RAG1 and RAG2 pro- 
teins that are expressed specifically in lymphoid cells. 
Recombination at these sites brings coding sequences 
for variable regions into proximity with the constant 
region coding sequences, allowing the production of a 
functional messenger RNA for the complete protein. 
The biochemical mechanism of this recombination has 
striking similarities to the mechanism of transposition 
by mobile DNA elements. Thus, it is hypothesized 
that immunoglobulin gene rearrangement is derived 
from an ancient transposable element. 
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Immunity in phages, plasmids, or transposons refers 
to the ability of a prophage, plasmid, or transposon 
to prevent another molecule of the same type from 
infecting the same cell (or for transposons, transposing 
to the same DNA molecule). Phage immunity
(lysogenic immunity) is due to the synthesis of phage
repressor by the phage genome. The ability of
plasmids to confer immunity usually results from
interference with the ability to replicate; transposon
immunity results from a variety of mechanisms.


See also: Plasmids; Prophage; Transposable 
Elements 
The immunoglobulin gene superfamily is a very large
family of genes present in all vertebrates. This gene 
superfamily consists of a series of gene families that 
each play a distinct role in the immune response. The 
superfamily is named after its most well-known and 
well-characterized gene family, the immunoglobulin 
gene family, which codes for polypeptides that form 
circulating antibodies (or immunoglobulins) in the 
bloodstream. Antibodies are one component of a
two-pronged immune response that an animal mounts
against invading bacteria and viruses. The antibody
component of the immune response has been referred
to as humoral immunity. The other component of the
immune response is cellular immunity, carried out by
T cells.

Each gene member of the immunoglobulin gene
superfamily contains immunoglobulin-like (Ig)
domains and functions as a cell surface or soluble
receptor involved in immune function or other 
aspects of cell-cell interaction. This superfamily 
includes the immunoglobulin gene families them- 
selves, the major histocompatibility genes (called H2 
in mice), the T cell receptor genes, and many more. 
There are dispersed genes and gene families, small 
clusters, large clusters, and clusters within clusters, 
tandem and interspersed. Dispersion has occurred 
with the transposition of single genes that later formed 
clusters and with the dispersion of whole clusters en 
masse. Furthermore, the original Ig domain can occur 
as a single unit in some genes, but it has also been 
duplicated intragenically to produce gene products 
that contain two, three, or four domains linked 
together in a single polypeptide. The Ig superfamily, 
which contains hundreds (perhaps thousands) of 
genes, illustrates the manner in which the initial
emergence of a versatile genetic element can be exploited by
the forces of genomic evolution, with a consequent
enormous growth in genomic and organismal complexity.


See also: Evolution of Gene Families 
A fertilized egg inherits a haploid set of chromosomes
from both the egg and the sperm; however, in mam- 
mals these maternal and paternal gametes do not con- 
tribute equal genetic functions to the developing 
diploid embryo. This functional difference between 
the two sets of parental chromosomes is due to a 
process called genomic imprinting. Genomic imprint- 
ing is a mechanism that differentially ‘marks’ the 
maternally and paternally inherited chromosome 
homologs and results in particular genes being ex- 
pressed or repressed in response to this parent-specific 
modification. Because the imprint affects gene activ- 
ity, some imprinted genes are expressed only from 
the maternally inherited chromosome and others 
are expressed only from the paternally inherited 
chromosome (Figure 1). It is not known why such 
a process evolved, and the precise mechanisms
involved in the regulation of imprinted genes are not
yet fully understood. However, it follows that the
dosage of an imprinted gene can be doubled or lost
completely if there is a uniparental duplication or
deficiency involving the gene or chromosomal region
in which it resides. Expression of an imprinted gene
can also be affected by a mutation that disrupts the
chromosomal modifications responsible for its
regulation. Such changes in the dosage of an imprinted
gene can have profound effects on mammalian embryonic
development and in humans can result in recognized
imprinting disorders.

Figure 1 Schematic representation of a homologous
chromosome pair with both imprinted (A, B) and
nonimprinted genes (C, D). White boxes represent
active alleles and black boxes inactive alleles. Imprinted
genes show activity from one parental allele and
repression at the other. The two neighboring imprinted
genes, A and B, are said to be reciprocally imprinted: A
is active on the maternal homolog and B is active on the
paternal homolog. The nonimprinted genes, C and D, do
not show differences in expression on the two parental
alleles and are representative of the majority of genes in
the genome.
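The dosage argument can be made concrete with a small counting sketch, using the reciprocally imprinted genes A and B and the nonimprinted gene C of Figure 1. It assumes, for illustration only, that imprinting is complete and that each active allele contributes one unit of expression; the function name `active_copies` and the constants are hypothetical.

```python
from typing import Optional, Tuple


def active_copies(expressed_from: Optional[str], homologs: Tuple[str, str]) -> int:
    """Number of active alleles of a gene in a diploid cell.

    expressed_from: "maternal" or "paternal" for an imprinted gene, None if not imprinted.
    homologs: parental origin of the two copies of the region carrying the gene.
    """
    if expressed_from is None:
        return len(homologs)  # nonimprinted: every copy is active
    # Imprinted: only copies inherited from the expressing parent are active.
    return sum(1 for origin in homologs if origin == expressed_from)


BIPARENTAL = ("maternal", "paternal")
MATERNAL_UPD = ("maternal", "maternal")
PATERNAL_UPD = ("paternal", "paternal")

genes = {"A (maternally expressed)": "maternal",
         "B (paternally expressed)": "paternal",
         "C (not imprinted)": None}
for name, expressed_from in genes.items():
    dosages = [active_copies(expressed_from, h) for h in (BIPARENTAL, MATERNAL_UPD, PATERNAL_UPD)]
    print(name, dosages)
```

The printed dosages are [1, 2, 0] for A, [1, 0, 2] for B, and [2, 2, 2] for C: uniparental disomy doubles or abolishes the active dose of an imprinted gene but leaves a nonimprinted gene unchanged.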


Developmental Consequences of Imprinting


Genomic imprinting creates the requirement for both
a mother and a father to produce normal mammalian
offspring, as shown by the failure of bimaternal and
bipaternal conceptuses to complete embryogenesis.
Parthenogenesis, the development of an egg without
fertilization by a sperm, is successful in some lower
organisms. However, it is clear that parthenogenetic
eutherian mammals cannot survive to term. In the
mouse, parthenogenesis can be induced experimentally
to create a diploid egg carrying only maternal
chromosomes. Parthenogenetic embryos survive to
midgestation and appear morphologically relatively
normal, though growth retarded. The extraembryonic
tissues, however, are underdeveloped and do not
proliferate properly. Gynogenetic embryos, which also
contain a diploid maternal contribution, though from
two different mothers, exhibit the same properties as
parthenogenones. Diploid paternal androgenetic
conceptuses are made by replacing the female pronucleus
in a newly fertilized egg with a second male pronucleus
from another egg. These embryos fare worse than
parthenogenones: the embryo proper develops very
poorly and rarely progresses beyond the 4-somite stage.
In contrast to the parthenogenones, the extraembryonic
tissues are well developed, though not completely
normal. In this respect the androgenone is reminiscent
of the complete hydatidiform mole in humans. Complete
moles contain a genome derived solely from paternal
chromosomes, and the mole resembles a mass of
cytotrophoblast without any embryonic components.
Thus it appears that the parental genomes have
reciprocal functions in embryogenesis, with the pre- 
sence of a paternal genome generally being important 
for the development of the extraembryonic lineages 
and the maternal genome being required for the 
development of the embryonal components at these 
early stages. This reflects the properties of imprinted 
genes whose activity is either doubled or lost in the 
uniparental conceptuses. 


Genetic Studies of Imprinting in the Mouse


It has been shown that the requirement for both par- 
ental genomes is limited to a subset of mammalian 
chromosomes. This has become evident using mouse 
translocation breeding experiments which result 
in embryos carrying uniparental duplications and 
corresponding deficiencies of whole chromosomes 
(uniparental disomy, UPD) or particular chromo- 
somal regions. These duplications represent a 
subset of the whole genome duplications seen in 
the parthenogenetic and androgenetic embryos. Nor- 
mal development of a UPD conceptus suggests that 
the duplicated region is not imprinted. These studies 
have shown that regions on mouse chromosomes 2, 6, 
7, 11, 12, 17, and 18 are imprinted and hence the 
biparental requirement applies to a subset of the genome.
When the parental origin of these chromosomes is
perturbed, quite severe phenotypes are observed,
including lethality, growth defects, and behavioral
anomalies. This indicates that developmentally im- 
portant imprinted genes reside within these regions; 
however, it does not rule out the presence of imprinted 
genes elsewhere, which cause more subtle effects 
when their dosage is perturbed by uniparental dupli- 
cation. Around 90% of the imprinted genes identified 
to date map to the regions identified in the genetic 
studies. 


Imprinting in Disease 


It became evident that imprinting had clinical
implications through the study of patients with disorders
that exhibit parental-origin effects in their patterns of
inheritance. Several syndromes are now recognized as
imprinting disorders. Imprinting mutations have also
been implicated in the genesis of some tumors, notably
Wilms's tumor and familial glomus tumors. Some of these
disorders show a normal autosomal dominant pattern of
inheritance, but only when the mutation is transmitted
by a parent of one sex; offspring of an affected
individual of the opposite sex are completely unaffected.
The disorder remanifests itself in a subsequent
generation after inheritance through a phenotypically
normal carrier of the appropriate sex. Males and females
are equally affected, which clearly distinguishes an
imprinting pedigree from that of a sex-linked disorder.
For example, benign familial glomus tumors show
autosomal dominant inheritance but are manifest only in
individuals who inherit the mutant gene from their
fathers. Inheritance of the mutation from the mother
results in unaffected offspring; however, her sons (if
carriers) will have affected offspring at a frequency of
50%. Other imprinting disorders have been associated
with a significant level of UPD. To date, all human
chromosomal regions involved in these syndromes are
evolutionarily conserved with regions identified as
imprinted in the mouse (see above). In addition to
hereditary glomus tumors, imprinting disorders
identified to date include Beckwith-Wiedemann syndrome
and Silver-Russell syndrome, which are growth
disorders; two neurological disorders, Angelman
syndrome and Prader-Willi syndrome; transient neonatal
diabetes; and maternal UPD14 syndrome. The latter is a
rare disorder associated with growth defects and
premature puberty.
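The transmission rule for paternally expressed glomus tumor mutations can be written as a short probability sketch. It assumes a single autosomal locus, a heterozygous carrier parent, a noncarrier other parent, and complete penetrance whenever the allele comes from the father; these assumptions, and the names `Parent`, `risk_carrier`, and `risk_affected`, are illustrative rather than taken from the text.

```python
from dataclasses import dataclass


@dataclass
class Parent:
    sex: str       # "male" or "female"
    carrier: bool  # heterozygous for the mutant allele


def risk_carrier(parent: Parent) -> float:
    """Probability that a child inherits the mutant allele (autosomal, heterozygous parent)."""
    return 0.5 if parent.carrier else 0.0


def risk_affected(parent: Parent, expressed_from: str = "male") -> float:
    """Probability that a child is affected when the allele is expressed only after
    transmission by a parent of sex `expressed_from` (the father, for glomus tumors).
    The child's own sex plays no part, so sons and daughters are equally at risk."""
    return risk_carrier(parent) if parent.sex == expressed_from else 0.0


father = Parent("male", True)
mother = Parent("female", True)
carrier_son = Parent("male", True)  # inherited the allele from `mother`; unaffected himself
print(risk_affected(father), risk_affected(mother), risk_carrier(mother), risk_affected(carrier_son))
```

This prints `0.5 0.0 0.5 0.5`: a carrier mother passes the allele to half her children without any of them being affected, while her carrier sons again have affected offspring at a frequency of 50%.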


Mechanism of Imprinting 


The mechanism causing parental-origin specific gene 
expression must allow the transcriptional machinery 
of the cell to distinguish between two chromosome 
homologs and differentially act on one or the other. 
The imprint is believed to be initiated late in the
development of the egg and sperm and then acted
upon in the zygote and developing conceptus to affect
developmental gene activity. It is therefore likely
that the imprint is a modification to the DNA and/or
chromatin that must have the following properties:


1. It must be able to affect the transcription of the 
gene. 

2. It must be heritable in somatic cells over many cell 
divisions and not lost during chromosome repli- 
cation. This renders the imprint stable and allows 
it to have parental-origin specific memory. This 
step is known as maintenance. 

Importantly, the imprints must be erased in the 
male and female germlines during gametogenesis 
to allow new imprints to be set down which are 
specific to the parental origin of the newly formed 
gametes. 




There are only a few recognized mammalian genome 
modifications that might fulfil the above criteria. By 
far the best studied is DNA methylation. DNA 
methylation of CpG dinucleotides is known to affect 
gene activity. Indeed, methylation of CpG-rich regu- 
latory portions of genes, for example on the inactive 
X chromosome in females, has long been associated 
with gene inactivity. More recently, it has been shown 
that imprinted genes contain regions that are differen- 
tially methylated on the two parental chromosomes; 
however, sometimes methylation is associated with 
the inactive allele and sometimes with the active allele. 
In the absence of the DNA methyltransferase gene, 
which encodes the methylating enzyme, the methylation
imprint is lost from somatic cells and imprinted
gene activity is perturbed. Thus, methylation is
involved, at least, in the maintenance of imprinting.
Whether methylation is the germline imprinting 
initiator remains to be proven; however, several differ- 
ences in methylation have been found in the DNA of 
eggs and sperm in imprinted regions, which suggest 
that CpG methylation may have a role to play in the 
earliest imprinting events. 
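Why the imprint is lost without the methyltransferase can be sketched with a simple recursion for passive demethylation during semiconservative replication. This is a toy model under stated assumptions: it follows a single CpG site, ignores de novo methylation entirely, and treats `maintenance_efficiency` (a hypothetical parameter) as the probability that methylation is copied onto the new strand.

```python
def methylated_fraction(divisions: int, maintenance_efficiency: float, start: float = 1.0) -> float:
    """Fraction of DNA strands carrying methylation at an imprinted CpG after `divisions` replications.

    Replication is semiconservative: each daughter duplex keeps one parental strand, which
    retains its methylation, and gains one new, unmethylated strand. A maintenance
    methyltransferase copies methylation onto the new strand opposite a methylated parental
    CpG with probability `maintenance_efficiency`. De novo methylation is ignored.
    """
    if not 0.0 <= maintenance_efficiency <= 1.0:
        raise ValueError("maintenance_efficiency must lie between 0 and 1")
    f = start
    for _ in range(divisions):
        # Half of all strands are parental (methylated with frequency f); the other half are new
        # and become methylated only where the template was methylated and maintenance succeeded.
        f = (f + maintenance_efficiency * f) / 2.0
    return f


for eff in (1.0, 0.9, 0.0):
    print(eff, [round(methylated_fraction(n, eff), 3) for n in (0, 1, 5, 10)])
```

With perfect maintenance the mark is stable indefinitely; with no maintenance it is diluted by half at every division, so after ten divisions almost no methylated strands remain.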

Other modifications may also be involved in the 
imprinting process. It is apparent that many imprinted 
genes show differences in their chromatin structure 
between the active and inactive alleles. However, in 
the case of imprinting, the relationship, if any, between 
a region’s chromatin conformation and its methy- 
lation status is not understood. It is now well docu- 
mented that modifications to chromatin-associated 
proteins, notably acetylation of core histones, have 
key roles to play in the regulation of gene expression 
and it is possible that these may be involved in the 
imprinting mechanism. Nonetheless, it seems that the
imprints act at both short and long range,
perhaps to provide a particular chromatin context
within which individual genes can be further
modified. This context presumably differs between
the two parental homologs.


Function and Evolution of Imprinting 


Imprinting renders an autosomal gene functionally 
hemizygous and the potential benefit to the organism 
of this costly process remains unclear. Many of the 
imprinted genes identified to date are involved in the 
regulation of fetal and embryonic growth and are 
clustered in the genome. To date the most widely 
discussed theory to explain the evolution of imprint- 
ing is the ‘parent-offspring conflict’ theory. In pro- 
miscuous animals, the father seeks to promote the 
growth of his offspring at the expense of the resources 
of the mother who is likely to procreate with other 
males. The mother, in contrast, must conserve her
resources in order that she can maximize the chances
of future pregnancies and many litters. The model
predicts that in this parental ‘tug-of-war,’ paternally
expressed genes will promote growth and maternally
expressed genes will repress growth. This model
is consistent with many of the growth defects observed 
in imprinted disorders in mouse and man and also 
with the function of many of the imprinted genes 
identified to date. Some disorders and imprinted 
genes do not fit this model and other theories have 
been proposed. These include the idea that imprinting
arose to prevent parthenogenesis in mammals or ovarian
teratomas in females; while this fits with the silencing
of maternal genes, it cannot explain the silencing
of paternal genes. Others have suggested that imprinting
is an extension of the bacterial host defense mechanism
that guards against the invasion of foreign DNA
via DNA methylation. However, while some
imprinted genes are intronless, retrotransposed copies of
X-linked genes, most are not, and furthermore they have
important functions in mammalian development. It is
likely that, as more imprinted genes are discovered 
and analyzed, these and other theories will be further 
scrutinized and the biological significance of this 
remarkable phenomenon will be better understood. 




See also: Androgenone; Chromatin; CpG Islands; 
DNA Modification; Epigenetics; Hydatidiform 
Moles; Igf2 Locus; Igf2r Locus; Parthenogenesis, 
Mammalian; Uniparental Inheritance; 
X-Chromosome Inactivation 
In situ hybridization (ISH) is used to map and order
genes and other DNA and RNA sequences to their
location on chromosomes and within nuclei. The
technique is based on the principle that double-stranded
DNA denatures on heating to single-stranded
DNA. On cooling, the single-stranded DNA reanneals
with its complementary sequence into double-stranded
DNA. If an appropriately labeled DNA fragment (a DNA
probe) is denatured and added to denatured chromosomes
or nuclei on a routine air-dried metaphase or interphase
preparation, some of the labeled DNA will hybridize,
during reannealing, to its complementary sequence in the
chromosomal DNA. Detection of the labeled DNA probe
under the microscope will identify the site of
hybridization and thus the region of chromosomal DNA
complementary to the sequence in the labeled probe. If,
for example, the DNA probe represents a sequence
of more than 1 kb from a cloned gene, ISH can
assign that gene to its chromosomal location.
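The principle can be illustrated by a sketch that finds where a probe would anneal in a target sequence by complementarity alone. It assumes perfect Watson-Crick pairing over the whole probe and a DNA alphabet; real hybridization tolerates some mismatches and depends on stringency, and the function names are hypothetical.

```python
from typing import List, Tuple

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Sequence of the complementary strand, written 5' to 3'."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def hybridization_sites(target: str, probe: str) -> List[Tuple[int, str]]:
    """Positions where a denatured probe can pair perfectly with a denatured target.

    `target` is the top strand of the chromosomal DNA and `probe` one strand of the probe.
    A window identical to the probe is complementary to the probe's other strand, and a
    window identical to the probe's reverse complement pairs with the probe strand itself;
    either way the labeled probe anneals there. Returns (0-based start, "forward"/"reverse").
    """
    target, probe = target.upper(), probe.upper()
    rc = reverse_complement(probe)
    sites = []
    for i in range(len(target) - len(probe) + 1):
        window = target[i:i + len(probe)]
        if window == probe:
            sites.append((i, "forward"))
        if window == rc:
            sites.append((i, "reverse"))
    return sites


print(hybridization_sites("CCCCAGGTACCCCC", "AGGTA"))  # [(4, 'forward')]
print(hybridization_sites("CCCCAGGTACCCCC", "TACCT"))  # [(4, 'reverse')]
```

Because the denatured probe supplies both strands, either orientation of the probe sequence reports the same site, which is why a double-stranded cloned probe marks a single chromosomal location.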

When ISH was introduced in 1970, DNA probes 
made from highly repetitive DNA fragments (satellite 
DNA) were labeled with tritium (³H) or radioactive
¹²⁵I and detected by autoradiography using
photographic emulsion applied directly to the microscope
slide. The technique had poor resolution and was 
difficult to use with single-copy probes, even when 
they were cloned in phage or plasmid vectors. 

Radioisotopic methods for ISH were replaced in 
the 1980s by nonisotopic alternatives such as biotin 
and digoxigenin, which are coupled to nucleotides and 
incorporated into the DNA probes by techniques 
such as nick translation using DNA polymerase. 
These probes are detected by fluorescence microscopy 
using fluorochromes coupled to avidin, streptavidin, 
or anti-biotin antibodies in the case of probes labeled
with biotin. The same fluorochromes coupled to
anti-digoxigenin antibodies are used for probes labeled
with digoxigenin. The fluorochromes most
commonly used are fluorescein isothiocyanate (FITC),
tetramethyl rhodamine isothiocyanate (TRITC), and
aminomethylcoumarin acetic acid (AMCA). More
recently the indirect systems using avidin and
antibodies have been replaced by direct labeling methods in
which fluorochromes such as FITC, Cy3, and Cy5 are
coupled directly to the nucleotides (e.g., FITC-11-dUTP)
that are used in labeling the DNA probes.


When exposed to a UV light source, each fluorochrome
is excited by a different wavelength and
each emits a distinctive fluorescence. To distinguish
the emissions produced by the various fluorochromes,
a series of excitation and emission filters specific
for each fluorochrome is used. Combinations of filters
allow several fluorochromes excited by different
wavelengths to be observed simultaneously, and this,
together with the development of digital fluorescence
microscopy and image analysis, has led to the
introduction of multicolor fluorescence ISH (M-FISH).
M-FISH systems use combinations of up to five
different fluorochromes to label individual DNA probes,
so that a large number of probes can be distinguished
in each preparation. This requires a sensitive,
monochrome, cooled charge-coupled device (CCD) camera
and computerized image analysis. A gray-scale image of
the fluorescence of each fluorochrome is acquired
sequentially, and the images are merged to display a
false color on the computer screen, chosen on the
basis of the relative intensities of the constituent
fluorochromes.
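The combinatorial logic behind M-FISH can be sketched briefly. With five fluorochromes there are 2^5 − 1 = 31 nonempty combinations; restricting each probe to at most three dyes still leaves 5 + 10 + 10 = 25, enough to give each of the 24 human chromosome types (22 autosomes, X, and Y) its own code. The classifier below is a simplification that assumes each channel can be reduced to a present/absent call with a single threshold; the threshold, the probe names, and the function names are hypothetical, and real systems work from relative intensities as described above.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Sequence


def label_codes(n_fluors: int, max_per_probe: int) -> List[FrozenSet[int]]:
    """All distinct nonempty combinations of at most `max_per_probe` of `n_fluors` fluorochromes."""
    codes: List[FrozenSet[int]] = []
    for k in range(1, max_per_probe + 1):
        codes.extend(frozenset(c) for c in combinations(range(n_fluors), k))
    return codes


def classify(intensities: Sequence[float], threshold: float,
             code_to_probe: Dict[FrozenSet[int], str]) -> str:
    """Assign a signal to a probe from its gray-scale intensity in each fluorochrome channel.

    A fluorochrome counts as present when its normalized intensity reaches `threshold`;
    the resulting combination is looked up in the labeling scheme.
    """
    present = frozenset(i for i, value in enumerate(intensities) if value >= threshold)
    return code_to_probe.get(present, "unclassified")


print(len(label_codes(5, 5)))  # 31
print(len(label_codes(5, 3)))  # 25

# Hypothetical scheme: the first 24 codes are given to probes named "probe 1" ... "probe 24".
scheme = {code: f"probe {n}" for n, code in enumerate(label_codes(5, 3)[:24], start=1)}
print(classify([0.9, 0.1, 0.8, 0.0, 0.05], threshold=0.5, code_to_probe=scheme))  # probe 7
```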


DNA Probes Used in FISH 


Total genomic probes are prepared by labeling DNA 
extracted from blood samples, cell cultures, or solid 
tissues. Chromosomes hybridized with these probes 
show an evenly distributed signal along their length, 
referred to as ‘chromosome painting.’ The main 
application of total genomic probes has been in the 
identification of human chromosome material in 
human-to-rodent interspecific somatic cell hybrids, 
including radiation-reduced cell hybrids. 
‘Chromosome-specific paint probes’ are genomic 
probes that were prepared initially from chromosome- 
specific genomic libraries cloned in plasmid vectors. 
They can also be made from single-chromosome 
interspecific somatic cell hybrids. Most are now pre- 
pared from flow-sorted chromosomes and these tend 
to have the highest resolution. Each chromosome-specific
paint is made by sorting 300-500 chromosomes and
amplifying chromosomal DNA fragments by degenerate
oligonucleotide-primed polymerase chain reaction
(DOP-PCR). Flow-sorted chromosomes can be
obtained in high purity, and the PCR procedure
amplifies over 90% of the chromosomal DNA.
Chromosome-specific hybridization, free of background
signal, is achieved by prehybridization of the
probe with itself before application to the test
material. This ensures that signals from highly
repetitive sequences are largely eliminated, and unique,
conserved DNA sequences remain available to paint all
but the heterochromatic regions of the chromosomes.
Chromosome-specific paint probes have wide application in the
analysis of complex chromosome aberrations and are
commercially available from several distributors 
either as single chromosome-specific paint probes or 
as complete probe sets in which each chromosome is 
labeled differently for M-FISH analysis. This allows 
the analysis of a complete cell in one hybridization. 

The main disadvantage of chromosome-specific 
paint probes is that they are unable to identify 
intrachromosomal aberrations such as inversions, 
duplications, and insertions, and that areas containing 
repetitive sequences, especially telomeres and centro- 
meres, are not painted. In these cases, region-specific 
paint probes prepared from amplified chromosome 
segments obtained by chromosome microdissection 
have found some application. 

Chromosome-specific centromeric probes are pre- 
pared from cloned alphoid repeat sequences which are 
located adjacent to centromeres. Almost all human 
chromosomes have chromosome-specific sequences 
of this type. The exceptions are chromosomes 13, 14, 
21, and 22. Chromosomes 13 and 21 have the same 
centromeric sequences, different from 14 and 22, 
which also share the same sequences. These probes 
are used to determine chromosome copy number in 
interphase nuclei. More than 80% of normal diploid 
nuclei will show two distinct signals when hybridized 
with a chromosome-specific centromeric probe. Cen- 
tromeric probes are therefore used for aneuploidy 
detection in uncultured amniotic fluid cells, for pre- 
implantation diagnosis in cells from the blastocyst, for 
the detection of residual disease in the management 
of certain hematological malignancies, and for the 
analysis of nondisjunctional abnormalities in sperm. 
Chromosome-specific sequences cloned in yeast artificial
chromosome (YAC), bacterial artificial chromosome
(BAC), or cosmid vectors compensate for the lack of
specific centromeric probes for aneuploidy detection
involving chromosomes 13, 14, 21, and 22.
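Scoring interphase nuclei with a centromeric probe amounts to tallying signals per nucleus and asking which count predominates. The sketch below borrows the ">80% of normal nuclei show two signals" figure as a hypothetical acceptance threshold; it is an illustration, not a validated clinical criterion, and it ignores split or overlapping signals and mosaicism. The function names and example tallies are hypothetical.

```python
from collections import Counter
from typing import List, Tuple


def modal_signal_count(counts: List[int]) -> Tuple[int, float]:
    """Most frequent number of signals per nucleus and the fraction of nuclei showing it."""
    if not counts:
        raise ValueError("no nuclei scored")
    value, n = Counter(counts).most_common(1)[0]
    return value, n / len(counts)


def call_copy_number(counts: List[int], min_modal_fraction: float = 0.8) -> str:
    """Hypothetical decision rule: report the modal count only if enough nuclei agree."""
    value, fraction = modal_signal_count(counts)
    if fraction < min_modal_fraction:
        return "inconclusive"
    names = {1: "monosomy", 2: "disomy", 3: "trisomy"}
    return names.get(value, f"{value} copies")


normal = [2] * 86 + [1] * 9 + [3] * 5
trisomic = [3] * 88 + [2] * 12
mixed = [2] * 60 + [3] * 40
print(call_copy_number(normal), call_copy_number(trisomic), call_copy_number(mixed))
```

This prints `disomy trisomy inconclusive`; the third sample falls below the agreement threshold and would need more nuclei or another probe.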

The project to map and sequence the human genome has
produced, as one of its by-products, a complete series of
overlapping DNA clones from which reference probes
can be made for use as FISH markers to delineate any
point on any chromosome.
Cloned in a variety of cosmid and other vectors, they
can be used to characterize specific breakpoints and to
detect specific microdeletions (such as the DiGeorge
syndrome deletion on chromosome 22). These single-copy
DNA sequence probes have wide application in 
clinical cytogenetics and in the mapping and cloning 
of disease genes. 

Telomere-specific probes are now available for the 
ends of all human chromosomes. They have proved to 
be particularly valuable in the detection of reciprocal 
translocations which are beyond the resolution of 
conventional diagnostic cytogenetics. 




Other Applications of FISH 


Due to the condensation of the DNA fiber within
metaphase chromosomes, the fluorescent signals
from two cosmid clones can be resolved only if they
are more than 2-3 Mb apart. At interphase the
chromosomes are 10 times more extended than at
metaphase, and so two cosmids more than 50 kb apart
can usually be distinguished from one another. The
order of several closely linked cosmids may be
determined at interphase provided they are more than
50 kb and less than 1 Mb apart. The upper limit arises
from the tendency of a chromosome to coil back on itself.
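Ordering probes from interphase measurements can be sketched as follows, assuming every probe pair lies within the 50 kb to 1 Mb window where mean nuclear distance increases with genomic separation (beyond that, coiling breaks the assumption). The distances in the example are invented for illustration, the function name is hypothetical, and the order can only be recovered up to reversal.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List


def order_probes(distance: Dict[FrozenSet[str], float]) -> List[str]:
    """Infer the linear order of probes from mean interphase distances between every pair.

    Assumes mean nuclear distance increases monotonically with genomic separation for
    all pairs supplied. The order is recovered only up to reversal.
    """
    probes = sorted({p for pair in distance for p in pair})
    # The pair that is farthest apart in the nucleus is taken as the two outermost probes.
    end, _ = max(combinations(probes, 2), key=lambda pair: distance[frozenset(pair)])
    # Every other probe is placed according to its distance from that end.
    return sorted(probes, key=lambda p: 0.0 if p == end else distance[frozenset((end, p))])


# Invented mean distances (micrometers) for probes whose true order is C-A-D-B.
pairs = {("C", "A"): 0.45, ("C", "D"): 0.70, ("C", "B"): 0.95,
         ("A", "D"): 0.55, ("A", "B"): 0.85, ("D", "B"): 0.62}
distances = {frozenset(k): v for k, v in pairs.items()}
print(order_probes(distances))  # ['B', 'D', 'A', 'C']
```

The result is the true order C-A-D-B read from the other end, which is all that distance data alone can provide.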

The elemental DNA fiber may be further decon- 
densed by techniques which release it from its asso- 
ciated histones and other proteins (see Chromosome 
Scaffold). Such preparations of DNA fibers on micro- 
scope slides can be used for hybridization with standard 
DNA probes. The technique permits the ordering of 
very closely linked single-copy DNA sequences and the 
analysis of the intrachromosomal relationships of vari- 
ous repetitive elements. It has also been used to identify 
small duplications and deletions within known genes 
(such as the Duchenne muscular dystrophy gene) and 
distances as short as 1 kb have been resolved. 
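The scale of fiber-FISH resolution follows from the geometry of the double helix: B-form DNA rises about 0.34 nm per base pair, so a fully extended 1 kb segment is about 0.34 µm long, a few tenths of a micrometer and comparable to the resolving power of the light microscope. The sketch converts a measured signal length to kilobases under that assumption; the `stretch_factor` parameter is hypothetical and would in practice need calibration with probes of known size, since fibers are not stretched uniformly.

```python
RISE_NM_PER_BP = 0.34  # axial rise per base pair of B-form DNA


def fiber_length_to_kb(length_um: float, stretch_factor: float = 1.0) -> float:
    """Convert a signal length measured on a DNA fiber (micrometers) to kilobases.

    stretch_factor is the measured length divided by the B-form contour length
    (1.0 means the fiber is stretched exactly to B-form length); in practice it would
    have to be calibrated with probes of known size.
    """
    if length_um < 0 or stretch_factor <= 0:
        raise ValueError("length must be non-negative and stretch_factor positive")
    um_per_kb = RISE_NM_PER_BP * 1000 * 1e-3  # 1000 bp per kb, 1e-3 um per nm -> 0.34 um per kb
    return length_um / (um_per_kb * stretch_factor)


print(round(fiber_length_to_kb(0.34), 2))  # 1.0
print(round(fiber_length_to_kb(3.4), 2))   # 10.0
```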

While the genetic basis of cancer is well established
and complex chromosome rearrangements are a
common feature of malignancy, cytogenetic analysis
of cancer cells has proved technically difficult. In part
this is due to the difficulty in finding suitable
metaphases in tumor material, and in part due to the
complexity of the chromosomal rearrangements observed
when suitable metaphases are found. One of the aims
of cancer cytogenetics is to map regions of the
chromosome complement that have been deleted and
regions that have been duplicated. Consistent patterns
of abnormality may lead to the identification of key
oncogenes or tumor suppressor genes important in the
clonal evolution of the cancer. M-FISH techniques are
now contributing to the detailed cytogenetic analysis
of tumors. Comparative genomic hybridization
(CGH) has been a particularly informative method,
because it has permitted the mapping of DNA
amplifications of over 5-10 Mb and of deletions of
chromosome segments over 10-20 Mb. In brief, the
method involves mixing equal amounts of total
genomic DNA from the tumor tissue, labeled with FITC
(green), and total genomic reference DNA, labeled with
TRITC (red), and hybridizing the mixture to normal
metaphases. The relative amounts of tumor and
reference DNA that anneal to a particular chromosome
region depend on the number of copies of DNA
complementary to that region in the test sample. If
the tumor sample contains relatively more of a
particular DNA sequence than the reference sample, this
will be revealed by an increased green-to-red
fluorescence ratio in the complementary region; similarly,
chromosomal deletion in the tumor sample is revealed
by a decreased green-to-red ratio. The method requires
digital fluorescence microscopy in which the relative
amounts of green and red fluorescence are measured
along the length of the chromosome.
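The ratio analysis can be sketched as follows. This is a minimal illustration assuming intensities have already been measured at successive positions along one chromosome and that most positions are present at equal copy number in tumor and reference, so the median ratio can serve as the normalization point (real analyses normalize over the whole metaphase). The gain and loss thresholds, the function name, and the example intensities are hypothetical.

```python
import statistics
from typing import List, Sequence


def cgh_profile(green: Sequence[float], red: Sequence[float],
                gain: float = 1.25, loss: float = 0.75) -> List[str]:
    """Call gains and losses along a chromosome from CGH intensities.

    green: tumor DNA (FITC) intensity at successive positions along the chromosome
    red:   reference DNA (TRITC) intensity at the same positions
    Ratios are normalized to their median, assuming most positions are at equal copy
    number in tumor and reference; `gain` and `loss` are hypothetical thresholds.
    """
    if len(green) != len(red) or not green:
        raise ValueError("need equal-length, non-empty intensity profiles")
    ratios = [g / r for g, r in zip(green, red)]
    median = statistics.median(ratios)
    calls = []
    for ratio in ratios:
        normalized = ratio / median
        if normalized >= gain:
            calls.append("gain")
        elif normalized <= loss:
            calls.append("loss")
        else:
            calls.append("normal")
    return calls


tumor = [100, 104, 98, 205, 196, 101, 52, 49]
reference = [100, 100, 100, 100, 100, 100, 100, 100]
print(cgh_profile(tumor, reference))
```

Here the printed calls are normal for the first three positions, gain for positions four and five, normal for the sixth, and loss for the last two, mirroring the increased and decreased green-to-red ratios described above.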

Mention should be made of the use of chromosome-specific
paint probes in the study of comparative genomics
and karyotype evolution. The conservation of genes
between mammalian species is widely appreciated,
and even widely divergent species such as the human,
the fruit fly, the nematode worm Caenorhabditis
elegans, and yeast share a number of genes.
Comparative mapping studies reveal that the X chromosome
carries the same transcribed genes in all mammals, and
also that large blocks of linked autosomal genes show
similar conservation between species. In closely
related species, these genetic linkage groups tend to
be more extensive than in more distantly related
species, sometimes representing whole chromosomes that
are shared between the species. Cross-species
chromosome painting has been used to demonstrate the extent
of chromosome homology between species.
Chromosome-specific paints from one species are hybridized
to the chromosomes of a second species. The precise
origin of a particular block of homology revealed by 
a paint probe from the first species can be determined 
by hybridizing chromosome-specific paint from the 
second species back to the chromosomes of the first 
species. In this way simple comparative maps can be 
constructed between species. If one of the species is 
a well-mapped species, such as human or mouse, a 
preliminary genetic map can be constructed for the 
unmapped species. This homology map can assist in 
more detailed mapping using genetic linkage and radiation
hybrid techniques. Phylogenetic relationships
between species can be studied on the basis of chromosome
rearrangements that are revealed by chromosome painting and
shared by species that diverged from a common ancestor.
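The bookkeeping behind reciprocal painting can be sketched by keeping only those chromosome pairs that are confirmed in both directions. This is a simplification that works at the level of whole chromosomes, whereas real comparative maps record the specific segments or bands painted; the painting results and the function name are hypothetical.

```python
from typing import Dict, Set, Tuple


def reciprocal_homologies(forward: Dict[str, Set[str]],
                          reverse: Dict[str, Set[str]]) -> Set[Tuple[str, str]]:
    """Chromosome pairs supported by painting in both directions.

    forward[a]: chromosomes of species 2 painted by the paint for chromosome a of species 1
    reverse[b]: chromosomes of species 1 painted by the paint for chromosome b of species 2
    A pair (a, b) is kept only if each paint lights up the other chromosome.
    """
    return {(a, b) for a, targets in forward.items() for b in targets
            if a in reverse.get(b, set())}


# Hypothetical painting results for illustration only.
forward = {"a1": {"b1", "b4"}, "a2": {"b2"}, "a3": {"b3"}}
reverse = {"b1": {"a1"}, "b2": {"a2"}, "b3": set(), "b4": {"a1", "a5"}}
print(sorted(reciprocal_homologies(forward, reverse)))
```

The result is `[('a1', 'b1'), ('a1', 'b4'), ('a2', 'b2')]`: chromosome a1 is split between b1 and b4 in the second species, while the one-way signal from a3 on b3 is left unconfirmed.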


See also: Chromosome; Chromosome Painting; 
Chromosome Scaffold; FISH (Fluorescent in situ 
Hybridization); Gene Mapping; Genome 
Organization 


In vitro Evolution 
R L Dorit 


Copyright © 2001 Academic Press 
doi: 10.1006/rwgn.2001.0712


For evolution to occur, three conditions must be met. 
First, variation must be present among the evolving
entities. Second, that variation must, to some extent,
translate into differential survival and reproduction 
(fitness) among the evolving entities. Third, the vari- 
ation responsible for differential fitness must be herit- 
able: transmitted from parents to offspring. If these 
three conditions are met, the stage is set for a popula- 
tion to evolve. 
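The three conditions can be seen at work in a toy model of an asexual population with two types, A and B. When type is fully heritable, the frequency of A changes from p to p·w_A / (p·w_A + (1 − p)·w_B), where w_A and w_B are the relative reproductive outputs. The sketch adds a hypothetical heritability parameter `h`: offspring that do not resemble their parent are assigned a type at random in proportion to the current population. Removing any one condition (no variation, no fitness difference, or h = 0) leaves the population unchanged. The model and its names are illustrative assumptions, not taken from the text.

```python
def next_frequency(p: float, w_a: float, w_b: float, h: float = 1.0) -> float:
    """Frequency of type A after one generation in a toy asexual population of types A and B.

    p:        current frequency of A (variation exists only when 0 < p < 1)
    w_a, w_b: relative reproductive output (fitness) of A and B
    h:        fraction of offspring that resemble their parent; the rest are assigned a type
              at random in proportion to the current population (toy model of non-heritability)
    """
    mean_fitness = p * w_a + (1 - p) * w_b
    after_selection = p * w_a / mean_fitness  # share of offspring produced by A parents
    return h * after_selection + (1 - h) * p


def run(p: float, w_a: float, w_b: float, h: float = 1.0, generations: int = 50) -> float:
    for _ in range(generations):
        p = next_frequency(p, w_a, w_b, h)
    return p


print(round(run(0.01, 1.1, 1.0), 3))  # all three conditions met: A rises
print(run(0.0, 1.1, 1.0), run(0.01, 1.0, 1.0), run(0.01, 1.1, 1.0, h=0.0))  # one condition removed
```

With all three conditions met, a 10% fitness advantage takes type A from 1% to roughly 54% of the population in 50 generations; in each of the other three runs the frequency stays where it started.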

In his seminal work, On the Origin of Species, 
Charles Darwin put forth this revolutionary under- 
standing of the mechanisms of evolution. This 
Darwinian insight provides a materialistic account of 
the evolution and diversity of organisms on earth, an 
explanation that has, since its inception, endured. 
Evolution is an ongoing process and its consequences 
are constantly on display. The emergence of antibiotic 
resistance in bacteria, of insecticide resistance in agri- 
cultural pests and of herbicide resistance in weeds are 
but a few obvious examples of evolution at work. 
More desirable instances of the power of selection to 
shape organisms also surround us in the form of crops, 
domesticated animals, and livestock. Darwin dis- 
cerned in nature a parallel to the practice of selective 
breeding, or artificial selection, which humans have 
been practicing for the past 10000 years. Whether 
selecting for faster horses, higher milk yields from 
cows, or showier pigeons, humans have shown that 
the selective breeding of individuals exhibiting the 
desired traits will usually lead to changes in the popu- 
lation — and to an accentuation of the selected trait 
over generations. 

Over the past 30 years, selective breeding has been 
brought to bear on an increasing variety of biological 
entities. Bacteria, viruses, nucleic acids, and proteins 
are now routinely evolved in the laboratory. These 
in vitro experiments in evolution occur in the beakers 
and test tubes of laboratories around the world. The 
motivations behind in vitro evolution experiments 
range from an interest in the mechanisms of adapta- 
tion to the determined pursuit of molecules exhibiting 
a desired feature. All of these varied investigations 
seek to harness the immense creative power of the 
evolutionary process. Such work has taught us much 
about evolutionary responses, about limits to adapta- 
tion, and about the genetic basis of novel features. 
Ultimately, this work may also help us to understand 
the mechanisms responsible for the emergence of life 
on this planet. 


The History of In Vitro Evolution 


The work carried out by Sol Spiegelman in the 1960s 
serves as an early landmark in the effort to examine 
the evolutionary process in vitro. In an elegant set 
of experiments, Spiegelman explored the evolution 
not of organisms but of particular molecules: short RNA templates that can be copied by the enzyme Qβ replicase. This enzyme, the RNA-directed RNA polymerase of phage Qβ, uses an RNA template to synthesize new RNA molecules. The replicase can be made to operate in a cell-free system that contains only rNTPs (ATP, CTP, UTP, and GTP), salts, and a population of diverse short RNA molecules capable of acting as templates for the replicase. After Spiegelman's system had evolved over several generations of replication, he found that the character of the template population had changed significantly: it now consisted almost entirely of a small subset of similar sequences. Those template sequences best suited to copying by the Qβ replicase had increased in frequency throughout the experiment, eventually coming to dominate the system (Figure 1). The conditions for evolution laid out in the introduction had all been met: 1) variation existed in the population (and was constantly resupplied by the errors committed by the Qβ replicase); 2) that variation led to differential reproduction, in this case differential copying by the Qβ replicase; and 3) those sequence features were passed on to the subsequent generation by the Qβ replicase through the template-directed synthesis of the complementary strand. The result was a succession of template RNA strands particularly well suited, in sequence and three-dimensional structure, to serve as templates for the Qβ replicase.
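The selection at work in Spiegelman's system can be illustrated with a minimal sketch. It assumes a simple deterministic replicator model in which each template type is copied at a fixed relative rate per generation; the rates used are hypothetical, not Spiegelman's measurements, and, unlike the real experiment, no new variants arise through replication errors. Even so, it shows how the best-copied template comes to dominate the population.

```python
def serial_transfer(rates, freqs, generations):
    """Track template frequencies under a simple replicator model.

    rates: hypothetical relative copying rates, one per template type.
    freqs: starting frequencies (should sum to 1).
    Each generation, a template's share grows in proportion to its rate and
    the frequencies are renormalized, mimicking transfer of a fixed aliquot.
    Mutation is deliberately omitted from this sketch.
    """
    history = [list(freqs)]
    for _ in range(generations):
        weighted = [f * r for f, r in zip(freqs, rates)]
        total = sum(weighted)
        freqs = [w / total for w in weighted]
        history.append(freqs)
    return history


if __name__ == "__main__":
    hist = serial_transfer([1.0, 1.2, 1.5], [1 / 3, 1 / 3, 1 / 3], 30)
    for gen in (0, 10, 20, 30):
        print(gen, [round(f, 3) for f in hist[gen]])
```

With these illustrative rates, the fastest-copied template makes up more than 99% of the population by generation 30.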

The field of in vitro evolution has expanded, ex- 
ploded, really, since those early experiments. In vitro 
evolution is now an important aspect of both basic and 
applied research in the life sciences. 


In Vitro Evolution: Basic Insights 


The study of evolution is, for the most part, a retro- 
spective endeavor. Until recently, evolutionary bio- 
logists dealt with products of evolution shaped over 
timespans far exceeding the lifetime of the investi- 
gator. The task of the investigator, then, was to re- 
construct the evolutionary process based on its 
contemporary outcomes. 

The idea of controlling and observing evolution 
directly, rather than reconstructing it post hoc, how- 
ever, holds immense appeal. The power of retrospect- 
ive approaches can now be supplemented by results 
obtained from experimental evolution. Furthermore, 
the validity of our methods of evolutionary reconstruction can now be tested directly by comparing reconstructions to observed events. Since the 1970s, evolutionary experiments have increasingly been carried out using bacteria, phages, viruses, and even cell-free systems.

Over the past two decades, a number of investigators have used in vitro evolution to explore molecular function directly. Much of this pioneering work has again focused on RNA, which had been shown to be
both an information-conveying molecule (as most 
nucleic acids are), and a molecule capable of carrying 
out precise biochemical function. The discovery of 
catalytic RNA immediately prompted questions 
about the catalytic range of RNA and about the pos- 
sibility that an entire rudimentary metabolism could 
be based on RNA alone. Central to this conjecture 
of an ‘RNA world’ was the assumption that RNA 
could catalyze a variety of reactions, possibly even including the template-directed synthesis of new RNA molecules.

Figure 1 A diagram of the results obtained by Sol Spiegelman in his early in vitro evolution experiments on Qβ replicase. As can be seen, the particular templates best replicated by the Qβ replicase rise in frequency; as new variation is constantly introduced, new, even better templates emerge and come to dominate the system. The composition of the template population changes and evolves at every generation.

In vitro evolution methods provide a powerful 
tool with which to explore this assumption, and two 
strands of research quickly emerged. The first of these 
searched for RNAs capable of binding with high affin- 
ity to particular molecules or molecular features. 
Such RNA sequences (often referred to as ‘aptamers’) 
would confirm the ability of RNA to adopt the precise 
three-dimensional configuration required to bind sub- 
strates and cofactors in enzymatic reactions. Such 
aptamers would also confirm the ability of RNA to 
stabilize transition states, the critical intermediate 
molecular configuration adopted by reactants in a 
chemical reaction. Unless RNA could be shown to 
be capable of a high-affinity interaction with defined
chemical species, it was impossible to argue for the 
plausibility of RNA-based metabolisms. A series of 
experiments was quickly undertaken to demonstrate 
the existence of high-affinity aptamers capable of 
binding synthetic and naturally occurring molecules. 
The overall design of these SELEX (systematic evolu- 
tion of ligands by exponential amplification) experi- 
ments consists of the generation of starting populations 
containing an immense variety (10¹³–10¹⁵) of RNA
sequences. Such starting pools are created by synthe- 
sizing either fully randomized sequences flanked by 
conserved regions (used in subsequent amplification 
of selected molecules) or by partially randomizing 
a pre-existing functional molecule. This population 
then is passed through a column composed of inert 
material covered in the target molecular species (the
‘ligand’). Those RNA molecules in the population
comparatively best able to bind the ligand would then 
be slowed in their passage through the column; con- 
versely, other RNA molecules would flow through 
freely. After the entire RNA population has been 
passed through the column, bound RNA molecules 
are stripped from the ligand and used as the progenitors 
of the next round of in vitro selection. This simple
cycle, successfully completed multiple times, leads to 
an increase in the mean affinity of the evolving pool for 
the target ligand and to the eventual isolation of RNA 
molecules showing enhanced ligand affinity. Note that 
in vitro ‘selection,’ the eventual isolation of desired 
molecules from a large starting population, should be 
contrasted with in vitro ‘evolution,’ where, in addition 
to a selection step, new variation is constantly reintro- 
duced into the population (see Figure 2). 
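The enrichment produced by repeated rounds of selection and amplification can be sketched numerically. The model below is a deliberate simplification: it assumes just two classes of molecules, binders and nonbinders, each retained on the column with a fixed, hypothetical probability, and it assumes that amplification copies both classes equally, so only the selection step changes their ratio.

```python
def selex_enrichment(f0, p_bind, p_nonbind, rounds):
    """Fraction of binders in the pool after each round of selection.

    f0: starting fraction of binders.
    p_bind, p_nonbind: hypothetical probabilities that a binder or a
    nonbinder is retained on the column in one round.
    Amplification is assumed to be unbiased, so it restores pool size
    without changing the binder fraction.
    """
    history = [f0]
    f = f0
    for _ in range(rounds):
        kept_binders = f * p_bind
        kept_others = (1 - f) * p_nonbind
        f = kept_binders / (kept_binders + kept_others)
        history.append(f)
    return history


if __name__ == "__main__":
    for n, f in enumerate(selex_enrichment(1e-10, 0.5, 0.01, 6)):
        print(f"round {n}: binder fraction {f:.3g}")
```

Starting from one binder in 10¹⁰ molecules with a 50-fold per-round retention advantage (values chosen only for illustration), the binder fraction climbs above 60% after six rounds, showing how a modest advantage per round compounds rapidly.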

One characteristic SELEX experiment began with a pool of 10¹³ versions of an RNA 100-mer, in pursuit of molecules capable of binding a synthetic dye (Cibacron Blue). This pool was estimated to contain 1 in 10¹⁰ molecules capable of binding the dye with noticeable affinity; after six rounds of selection, more than 60% of the pool exhibited high-affinity binding. Similar in vitro evolution strategies have resulted in the isolation of aptamers capable of binding to specific nucleic acid and protein sequences with micromolar or submicromolar affinities (dissociation constants, Kd, of about 1 µM or lower). Aptamer selections have now been directed to a broad variety of compounds, including amino acids, nucleotides, cofactors, and antibiotics.

Figure 2 The basic elements of a simple in vitro evolution system. Mutation introduces variation into the system; a selection step sorts among the available variants, allowing only those best suited for the particular function to emerge and be amplified (‘reproduce’) in the subsequent step. A mutation step then restores variation. This basic cycle, iterated multiple times, results in the evolution of a population of molecules. In vitro selection experiments follow the same basic design, but mutation is only introduced at the outset, and subsequent cycles involve an alternation of the amplification and selection steps.

In a second strand of research, scientists have suc- 
cessfully used in vitro evolution to explore the versa- 
tility and limitations of RNA’s catalytic ability. To do 
this, studies begin with a pool of variants based on an 
existing, catalytically active ribozyme, or, in some 
instances, with a fully randomized pool of longer 
(>60 nucleotides) RNA molecules. These studies have 
different objectives. They may seek to modify an 
existing RNA catalytic activity (e.g., by changing the
ion dependency of the group I intron from Mg²⁺ to Ca²⁺)
or to expand the catalytic activity of a ribozyme to a
new substrate or reaction (e.g., evolving DNA-cleaving
derivatives of the group I and RNase P RNA ribozymes)
(Figure 3). Such studies also aim to isolate RNA
molecules capable of performing a particular function, 
such as self-cleavage, ligation, aminoacylation, and 
peptide bond synthesis. Ongoing experiments also 
explore the dynamics and interactions in molecular 
ecosystems involving multiple molecular species. 

More recently, pools of DNA variants have been subjected to a similar battery of in vitro evolution regimes, resulting in the identification of DNA aptamers and of DNA enzymes (DNAzymes, or deoxyribozymes) capable of enhancing the rate of biochemically important reactions. The success of these DNA selections has broadened the perspective of workers in the field and led to the realization that any biopolymer that can be copied with some degree of fidelity can, in principle, serve as the raw material for in vitro evolution.

Figure 3 An example of in vitro evolution, in which RNase P RNA is evolved to cleave a DNA substrate. The top panel shows a diagram of the selection scheme, in which variant RNA molecules anneal to a DNA substrate, which in turn attaches to a column (via a biotin ‘B’ molecule). Those variants that can cleave the DNA substrate are eluted from the column and amplified for the next generation of selection. The bottom panel shows the response of three parallel RNase P RNA populations under in vitro evolution. Note the increase in the overall activity of the population, as well as the dramatic but transient drop in activity that accompanies the reintroduction of mutations into the evolving population.

Taken together, in vitro results underscore the tremendous functional versatility of nucleic acid polymers. The fact that in vitro evolution experiments frequently succeed attests to the density of functional solutions scattered throughout sequence space. To put this in numbers, a fully randomized pool of RNA 100-mers could theoretically contain 4¹⁰⁰, or ~10⁶⁰, variants. A typical experiment, using about 10¹⁵ molecules, will thus sample only 10¹⁵/10⁶⁰, or 1 in 10⁴⁵, of the possible sequences. Even with this extremely sparse sampling, functional sequences are almost always retrieved. While there may be certain functions for which a viable solution is so rare that
it cannot be captured by in vitro selection, theoretical 
and empirical results paint a different picture of func- 
tional space. Indeed, multiple solutions appear to exist 
for any given catalytic challenge, and these solutions 
seem both to lie in close proximity and to be accessible 
from practically any given starting point. The pres- 
ence of so many peaks on the RNA functional land- 
scape may well account for the rapid emergence of 
organization early in the history of life. 
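The sampling arithmetic above can be checked in a few lines. Working with base-10 logarithms avoids writing out 4¹⁰⁰ in full; the sketch assumes a typical pool of 10¹⁵ molecules.

```python
import math


def log10_sequence_space(length, alphabet_size=4):
    """log10 of the number of distinct sequences of a given length."""
    return length * math.log10(alphabet_size)


if __name__ == "__main__":
    space = log10_sequence_space(100)   # all possible RNA 100-mers, 4**100
    pool = 15                           # log10 of a 10**15-molecule pool
    print(f"sequence space: ~10^{space:.1f}")           # ~10^60.2
    print(f"fraction sampled: ~10^{pool - space:.1f}")  # ~10^-45.2
```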


In Vitro Evolution: Applied Research 


Early in the history of the field, the isolation of aptamers directed to particularly visible targets, such as the Rev protein and reverse transcriptase of HIV, hinted at the applied potential of in vitro evolution methods. Although high-affinity interactions between proteins and nucleic acids occur naturally and are integral to all known metabolisms, it soon became apparent that any protein, not just those that naturally bind nucleic acids, could in principle be targeted using in vitro approaches. In fact, high-specificity aptamers capable of binding to disease-causing molecules or to components of disease pathways in some cases show stronger binding affinities than those typically associated with antibodies. Such binding has obvious implications for the diagnosis of disease conditions. In those cases where binding interferes with the operation of molecules involved in disease pathways, aptamers show significant therapeutic promise. Over the past decade, a number of aptamers of potential diagnostic or therapeutic importance have been developed using in vitro evolution. These aptamers target a diverse collection of disease-related proteins (e.g., thrombin, antibodies involved in autoimmune conditions, proteases). More recently, scientists have isolated aptamers that can be directed not only at specific proteins, but at particular diseased tissues (e.g., sclerotic arterial deposits).


Future Directions 


The uses of in vitro evolution continue to expand, 
limited only by the ability to identify targets of in- 
terest and to design effective selection strategies. 
The raw materials for in vitro evolution are now 
more diverse. For example, synthetic nucleotides and 
nucleotide analogs have been incorporated into the 
sequences constituting the initial pool for in vitro 
evolution. Expanding the chemical alphabet of the
nucleotide sequences in this way increases the number of potential
three-dimensional interactions and, by extension,
the number of shapes that can be assumed by the sampled
pool. Similarly, recent methods have succeeded in
coupling peptides to their coding sequences. This sig- 
nificant advance allows for the in vitro isolation of 
proteins with desirable properties, followed by the 
replication of their cognate coding sequences. The 
substantially wider repertoire of side groups provided 
by a 20 amino-acid alphabet may well allow for the 
in vitro evolution of catalysts capable of a broader 
range of chemical reactions (Figure 3). 

The power of the in vitro approach is now being 
directed toward the more subtle issues of emergence 
and complexity. Studies now underway seek not just 
to evolve novel molecules or to expand the catalytic 
repertoire of single molecular species, but instead to 
evolve metabolic networks. Such enclosed networks, 
composed of multiple interacting molecular species, 
serve as a model for the earliest protocells and their 
rapidly evolving metabolic potential. The field of 
in vitro evolution is still in its early phase, and its 
potential still enormous. In effect, in vitro evolution 
allows us to explore sequence, structure, and function 
space beyond the solutions already present in living 
systems. This ability to compare existing functional 
solutions with possible (but unrealized) solutions, to 
probe not just the actual but the possible, adds a rad- 
ically new tool to the arsenal of comparative biology. 


See also: Bacterial Genetics; Biochemical Genetics; 
Evolution; RNA World; Selection Techniques 

In Vitro Fertilization

In vitro fertilization (IVF) opened up the prospect of many genetic studies on human conception. Some of these were of greater clinical interest, such as embryo transfer for the alleviation of various forms of male and female infertility. Others were more genetic and academic, yet have since been developed clinically. For
example, controlling the growth of the human embryo 
in vitro enabled its various forms of growth to be 
classified and related to differing chromosomal, 
nuclear, and cytoplasmic anomalies typical of early 
human development. Astonishingly, a high proportion of human embryos grown in vitro carry several such anomalies, and limited evidence suggests the same is true for those developing in vivo after natural conception. Some anomalies do not seem to be correlated with identified disorders in the human oocyte and preimplantation embryo. Others are recognizably abnormal and involve chromosomal disorders such as aneuploidy, haploidy, polyploidy, and mosaicism, which effectively terminate development before implantation or soon afterward. A few continue to later stages of gestation. A very significant feature is the surprisingly low implantation rate per embryo, i.e., 20% or even less after growth in vitro or in vivo.
Coupled with this evidence of serious weaknesses in oogenesis and embryogenesis, enormous numbers of human spermatozoa are poorly motile or misshapen, so that a semen sample with as few as 14% normal forms is considered to indicate a highly fertile man! Human gametogenesis and embryogenesis thus seem to be highly flawed, yet women ovulate only one egg per month. Humans seem to have serious flaws in the control of reproductive systems, unlike other mammals, in which strong selective pressure apparently maintains highly effective reproductive systems with implantation rates of 80–90% and few disorders in preimplantation growth. Perhaps highly effective mother-child bonding or some similar highly adaptive and protective system has relaxed the human need for tight controls over the cell cycle and over meiosis, fertilization, and cleavage.

IVF has also provided a deeper understanding of other genetic aspects of conception. Very severe infertility in men has an unusual genetic basis, owing to large deletions in three distinct regions of the Y chromosome. Characterizing these regions has provided a detailed understanding of Y chromosome genetics and of the exact sequences undergoing deletion. For such men, the intracytoplasmic injection of a single spermatozoon into an egg (ICSI) enables their extreme oligozoospermia to be overcome by using the very rare spermatozoa in ejaculates, epididymis, or testis, or even spermatids. Fewer spermatozoa are collected from some of these men than the number of oocytes collected from their wives. This finding has necessitated great care in searching for mutants among the children, although at present more attention is paid to disordered chromosome constitutions. Treating other extreme forms of male infertility has also uncovered genetic defects rarely found in a normally conceiving human population, such as cystic fibrosis variants that distort the formation of the vas deferens. Applying ICSI in these cases can put the health of the child at risk when the wife is a carrier for cystic fibrosis. Human X- and Y-bearing spermatozoa can now be separated reliably; because the sorted spermatozoa are limited in number, they are more easily used with ICSI until techniques improve enough to produce sufficient spermatozoa for artificial insemination.

Preimplantation genetic diagnosis for inherited dis- 
ease is also becoming more widespread in IVF pro- 
grams. A single cell excised from an 8-cell embryo or 
half a dozen cells from the trophectoderm of a blas- 
tocyst can be used to type genetic disorders in human 
embryos. Many single-gene disorders can be identi- 
fied in the embryos, and also highly complex translo- 
cations, chromosome errors, and complex variants, 
such as those involved in Duchenne muscular dystro- 
phy. Improvements in array technology promise to 
permit hundreds or even thousands of genes to be 
identified in preimplantation embryos. Such know- 
ledge might provide a genetic blueprint of the growth 
of the embryo, with considerable social and ethical 
implications. Other genetics-related advances stemming from IVF include the potential cloning of human embryos for spare-parts surgery. Notably, cloned embryos and offspring show severe anomalies and very high death rates, and the effects of such epigenetic changes will presumably be present in embryo stem cells. Cloning has not been attempted in the hundreds of IVF laboratories practicing ICSI, even though ICSI practice could enable cloning to be introduced there. The UK government's decision to permit cloning of human embryos to make immunologically tolerated embryo stem cells for organ repair has just been announced.

Knowledge about the genetic regulation of the 
human oocyte and embryo is now accumulating 
rapidly. Polarities have been identified in oocytes, 
cleaving embryos, and blastocysts, and genes affect- 
ing early growth have been identified. This informa- 
tion, together with that gained from the mouse and 
human genome projects, indicates that hundreds or 
thousands of genes are expressed in preimplantation 
mammalian embryos, with blocks of closely linked 
genes acting in concert to regulate successive cleavage 
stages. 


See also: Ethics and Genetics; Fertilization 

In Vitro Mutagenesis

In vitro mutagenesis methods, especially site-directed
mutagenesis, have revolutionized our understanding 
of protein function and gene regulation. In vitro muta- 
genesis describes the process by which a researcher 
alters one or more base pairs in a cloned gene; expres- 
sion of the gene yields a protein with one or more 
altered amino acids. These mutant proteins may show 
a change in function, such as lost or altered activity. 
The ability to manipulate precisely the chemical nature of a gene, and therefore the protein encoded by this DNA, has enabled biologists to identify protein function, characterize protein structure, and manipulate the activity of a protein in vivo. Furthermore,
‘protein engineers’ have used site-directed and random 
mutagenesis procedures to create new proteins 
designed to have unique or improved function. 
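To make concrete how altering a base pair in a cloned gene changes the encoded protein, here is a minimal sketch using the standard genetic code. The short coding sequence and the helper names are hypothetical; the point is only that a single base substitution (here GAG to GTG) swaps one amino acid (glutamate, E) for another (valine, V).

```python
# Standard genetic code, indexed by codon in TCAG order:
# the first base selects a block of 16, the second a block of 4, the third the entry.
BASES = "TCAG"
AMINO_ACIDS = ("FFLLSSSSYY**CC*W"
               "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR"
               "VVVVAAAADDEEGGGG")
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna):
    """Translate a coding sequence codon by codon ('*' marks a stop codon)."""
    dna = dna.upper()
    return "".join(CODON_TABLE[dna[p:p + 3]] for p in range(0, len(dna) - 2, 3))


def substitute(dna, pos, new_base):
    """Return a copy of dna with the base at 0-based position pos replaced."""
    return dna[:pos] + new_base + dna[pos + 1:]


if __name__ == "__main__":
    wild_type = "ATGGAGTAA"                  # hypothetical: Met-Glu-stop
    mutant = substitute(wild_type, 4, "T")   # GAG -> GTG
    print(translate(wild_type), translate(mutant))  # ME* MV*
```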

In vitro mutagenesis has been enabled by a number 
of breakthroughs in biotechnology. Other articles in 
this encyclopedia describe the discovery and uses of
recombinant DNA, DNA polymerases, the polymer- 
ase chain reaction (PCR), and restriction endonu- 
cleases. This article will describe the application of 
these technologies to the mutagenesis of recombinant 
genes. 


Nonselective Mutagenesis 


Deletions 

Nested deletion mutagenesis has been used to identify functional domains of proteins and RNA. In this method, the plasmid containing the gene of interest is linearized at a restriction site near the gene. The gene is then digested for defined periods of time by the enzyme exonuclease III, which removes nucleotides stepwise from the 3' ends of duplex DNA bearing blunt or 5'-overhanging ends; a single-strand-specific nuclease such as S1 then trims the remaining single-stranded tails. The result is a ‘nested set’ of plasmids in which the gene fragments vary in length from one side of the gene and share a common end. These partially digested genes are then recloned into a plasmid vector and transformed into Escherichia coli. In one early example of this method, researchers studying 5S ribosomal RNA used exonuclease III to delete bases from the 5' end and identified regions within the 5S rRNA gene that control its transcription initiation.
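The logic of a nested set can be sketched as follows. The model is idealized: it assumes that nucleotides are removed from one end of the gene at a constant, hypothetical rate and that every molecule in an aliquot is digested to the same extent, whereas real digests yield a spread of lengths around each time point.

```python
def nested_deletions(gene, rate_nt_per_min, minutes):
    """Idealized nested set: one truncated gene per digestion time point.

    Nucleotides are removed from the start of the sequence (the end exposed
    by linearization) at a hypothetical constant rate; the far end of the
    gene is untouched, so every product shares a common end.
    """
    products = []
    for t in minutes:
        removed = int(rate_nt_per_min * t)
        if removed < len(gene):
            products.append(gene[removed:])
    return products


if __name__ == "__main__":
    gene = "ATGACCGGTTACGATCCGTAA"  # hypothetical 21-nt sequence
    for deletion in nested_deletions(gene, 3, [0, 1, 2, 4]):
        print(f"{len(deletion):2d} nt  {deletion}")
```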


Chemical Damage and Enzymatic 
Misincorporation 

Chemical mutagenesis and enzymatic misincorpor- 
ation techniques cause a small number of mutations 


throughout a piece of DNA. Both methods yield a 
library of mutations which are cloned into a plasmid 
and then screened or selected for function. Commonly 
used chemicals include sodium bisulfite, formic acid, 
and hydrazine. Sodium bisulfite causes the deamin- 
ation of cytosine to uracil; during DNA synthesis, the 
altered base is paired with adenine instead of guanine.
Hydrazine and formic acid remove bases from
the DNA strand, creating abasic sites that can pair 
with any one of the four bases during enzymatic 
synthesis. Nucleotides can also be altered at random 
sites through misincorporation of deoxyribonucle- 
otide triphosphates (dNTPs) during DNA synthesis. 
For example, DNA polymerase runs with impaired 
fidelity in the presence of manganese ions, and occa- 
sionally adds an incorrect base. Alternatively, when 
one of the dNTPs is added in very low concentrations, 
the enzyme will sometimes misincorporate one of 
the other three bases. Certain nucleotide analogs, such as N⁴-hydroxydeoxycytidine triphosphate, are also mutagenic and can cause mispairing mutations. In all cases, the frequency of mutation is increased by using a DNA polymerase without a proofreading (3'→5' exonuclease) function, such as Taq DNA polymerase or an exonuclease-deficient Klenow fragment of E. coli DNA polymerase I. The modern version of enzymatic misincorporation, error-prone PCR, is regularly used to make mutant DNA libraries.
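A mutant library made by error-prone copying can be mimicked in silico, for example to estimate how many substitutions a typical clone will carry. This sketch assumes a single, uniform per-base error rate and treats every substitution as equally likely, which real polymerases do not; it also models one round of copying rather than the accumulation of errors over many PCR cycles. The template and error rate are hypothetical.

```python
import random


def error_prone_copy(seq, error_rate, rng):
    """Copy seq, replacing each base with a different one at probability error_rate."""
    out = []
    for base in seq:
        if rng.random() < error_rate:
            out.append(rng.choice([b for b in "ACGT" if b != base]))
        else:
            out.append(base)
    return "".join(out)


def mutant_library(seq, error_rate, size, seed=0):
    """Return `size` independent error-prone copies (seeded for reproducibility)."""
    rng = random.Random(seed)
    return [error_prone_copy(seq, error_rate, rng) for _ in range(size)]


def count_substitutions(a, b):
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


if __name__ == "__main__":
    template = "ACGT" * 75  # hypothetical 300-nt template
    library = mutant_library(template, 0.005, 1000, seed=1)
    mean = sum(count_substitutions(template, m) for m in library) / len(library)
    print(f"mean substitutions per clone: {mean:.2f} (expected ~{0.005 * 300:.1f})")
```

The expected number of substitutions per clone is simply the error rate times the sequence length, which is the quantity experimenters typically tune when choosing error-prone PCR conditions.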


Site-Directed Mutagenesis 


Site-directed mutagenesis involves the specific substi- 
tution of one DNA base for another. Unlike the non- 
specific mutations described above, site-directed 
mutagenesis allows precise control of the number, 
placement, and identity of the base substitutions. The two
classes of site-directed mutagenesis are methods
that use double-stranded DNA cassettes and those 
that use single-stranded oligonucleotide primers. 
All of the techniques described here can give high 
yields of the desired mutations; the choice of muta- 
genesis method is largely a matter of convenience and 
personal preference. 

Site-directed mutagenesis was made possible by the
invention of automated chemical synthesis of DNA 
and the overexpression of DNA-processing enzymes. 
Through chemical DNA synthesis, defined oligonu- 
cleotides up to ~100 bases can be prepared repro- 
ducibly and inexpensively. Synthetic oligonucleotides 
are used extensively for site-directed mutagenesis, as 
primers for DNA polymerase and as oligonucleotide 
cassettes. Equally important has been the identifica- 
tion and overexpression of DNA-modifying enzymes, 
including restriction endonucleases for cleaving DNA 
at specific recognition sites and DNA polymerases for 
generating double-stranded DNA from a single-stranded template. Furthermore, the discovery of thermophilic DNA polymerases has enabled PCR-based methods for site-directed mutagenesis.


Cassette Mutagenesis 

In cassette mutagenesis, a synthetic double-stranded 
oligonucleotide ‘cassette’ containing the desired 
mutations is docked between two restriction enzyme 
sites on a plasmid vector. In the simplest procedure, 
the restriction sites are separated by no more than 100 
base pairs; the ends of the oligonucleotide duplex are 
complementary to the restriction cleavage sites so that 
the cassette can be readily ligated into the plasmid
(Figure 1A). Since dozens of restriction enzymes are
commercially available, it is often possible to identify 
restriction sites near the sequence of interest. 

One clever cassette design takes advantage of restriction endonucleases such as BspMI and BcgI, which cleave DNA several base pairs away from their recognition sequences. BcgI, for instance, cleaves DNA 10 and 12 nucleotides (on the two strands) to each side of its specific binding site, whatever the sequence there, while BspMI cleaves on one side of an asymmetric recognition sequence. The recognition sequence of BcgI and the positions of its cleavage are shown below, where N is any nucleotide:


N10-CGA-N6-TGC-N12
N12-GCT-N6-ACG-N10

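Locating BcgI sites, and the positions at which the enzyme would cut, is a simple pattern search. The sketch below assumes the (10/12) cleavage geometry shown above, searches both strands (a site on the bottom strand appears on the top strand as its reverse complement, GCA-N6-TCG), and reports cut positions as 0-based boundaries in top-strand coordinates. Function names and the example sequence are illustrative.

```python
import re

# BcgI site read on the top strand, and the same site read on the bottom strand.
FORWARD_SITE = "CGA[ACGT]{6}TGC"
REVERSE_SITE = "GCA[ACGT]{6}TCG"


def bcgi_sites(seq):
    """Return (site_start, top_strand_cuts, bottom_strand_cuts) for each site.

    A cut at position p lies between seq[p - 1] and seq[p]. With
    (10/12)CGA-N6-TGC(12/10) cleavage, the top strand is cut 10 nt before
    and 12 nt after the 12-nt site, and the bottom strand 12 nt before and
    10 nt after it (in top-strand coordinates), for either orientation.
    Sites too close to the ends to be cut on both sides are skipped.
    """
    seq = seq.upper()
    hits = []
    for pattern in (FORWARD_SITE, REVERSE_SITE):
        # Zero-width lookahead so that overlapping sites are all found.
        for match in re.finditer(f"(?=({pattern}))", seq):
            start = match.start()
            if start - 12 >= 0 and start + 24 <= len(seq):
                top_cuts = (start - 10, start + 24)
                bottom_cuts = (start - 12, start + 22)
                hits.append((start, top_cuts, bottom_cuts))
    return sorted(hits)


if __name__ == "__main__":
    region = "A" * 15 + "CGATTTTTTTGC" + "A" * 15  # hypothetical sequence
    for start, top, bottom in bcgi_sites(region):
        excised = region[top[0]:top[1]]
        print(start, top, bottom, len(excised))  # 15 (5, 39) (3, 37) 34
```

The two-nucleotide offset between the top- and bottom-strand cuts is what leaves the short 3' overhangs to which a cassette's ends must be complementary.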

To prepare a BcgI cassette, these recognition sequences are added into a cloned gene by PCR such that the restriction site replaces the region to be mutated (Figure 1A). A cassette is synthesized to contain (1) the gene sequence that was removed from the vector, (2) the desired mutations, and (3) ends complementary to the products of BcgI cleavage. The site-directed mutant is then made by cutting the plasmid with BcgI and ligating in the cassette. The advantage of these vectors is that the restriction enzyme sites are cut out of the gene when the cassette is added. Thus, the recombinant gene does not need to contain unique restriction sites, and the wild-type vector can be readily distinguished from the mutant by restriction digest. Furthermore, since linear DNA is not readily transformed and replicated in cells, precutting with the restriction enzyme before transformation linearizes any remaining parental plasmid, which still carries the BcgI sites, and so increases the yield of mutant clones. This type of cassette has been used in the mutational analysis of HIV reverse transcriptase.

Figure 1 Methods of oligonucleotide-directed mutagenesis. (A) Cassette mutagenesis with a BcgI-containing plasmid. The BcgI-containing plasmid is constructed by removing the region to be mutagenized and replacing it with a BcgI recognition sequence. Short arrows show sites of BcgI cleavage. Following cleavage by the restriction enzyme, the mutagenic cassette is ligated into the gene. Note that the restriction sites are removed during mutagenesis. (B) dU method for primer-based mutagenesis. A dU-containing single-stranded plasmid is prepared from an M13 vector in a dut⁻ ung⁻ strain of Escherichia coli. The mutagenic oligonucleotide is hybridized to the dU template (the mutagenic primer shown will create an insertion in the gene of interest). The rest of the second strand is filled in by DNA polymerase and ligated by DNA ligase. Transformation into a dut⁺ ung⁺ strain of E. coli results in degradation of the dU strand and propagation of the mutant.


Primer-Directed Mutagenesis 


General methods 

Site-directed mutagenesis can also be accomplished 
using an oligonucleotide containing the desired muta- 
tion, called a mutagenic oligonucleotide, as a primer for 
DNA synthesis. By this technique, the single-stranded 
oligonucleotide is hybridized to a single-stranded plas- 
mid, using bases complementary to the wild-type gene. 
The mutagenic region of the oligonucleotide can con- 
tain several single base mismatches, or it can be much 
longer or shorter than the wild-type sequence (yielding 
insertions or deletions in the mutated gene). DNA 
polymerase initiates synthesis of the DNA at the oligo- 
nucleotide and fills in the second strand; addition of 
DNA ligase seals the nick in the newly synthesized 
strand. Transformation of this heteroduplex plasmid 
produces both wild-type and mutant plasmids in 
E. coli, but several methods (see below) have been 
devised to increase the proportion of mutants. 
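Writing out a mutagenic oligonucleotide is mostly bookkeeping: keep enough wild-type sequence on each side for the primer to anneal, and place the change in the middle. The helpers below are hypothetical, not a published design tool; they assume the primer is written with the same sequence as the gene's coding strand (so that it anneals to a single-stranded template carrying the complementary strand), and they ignore melting temperature and secondary structure, which real primer design must consider.

```python
def mutagenic_primer(gene, start, end, replacement, flank=12):
    """Design a primer that replaces gene[start:end] with `replacement`.

    The primer carries `flank` wild-type bases on each side of the change.
    A replacement of the same length gives substitutions; a longer or
    shorter one gives an insertion or a deletion.
    """
    if start - flank < 0 or end + flank > len(gene):
        raise ValueError("not enough flanking sequence for the primer")
    return gene[start - flank:start] + replacement + gene[end:end + flank]


def mismatches(primer, gene, start, flank=12):
    """Count mismatches between a substitution primer and the wild-type gene."""
    wild_type = gene[start - flank:start - flank + len(primer)]
    return sum(p != w for p, w in zip(primer, wild_type))


if __name__ == "__main__":
    gene = "ATGAAAGCCGAGCTTCTGGGCAAGTAA"  # hypothetical coding sequence
    print(mutagenic_primer(gene, 10, 11, "T", flank=6))  # AAGCCGTGCTTCT
    print(mutagenic_primer(gene, 9, 12, "", flank=6))    # AAAGCCCTTCTG (codon deleted)
```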

DNA templates for oligonucleotide-based muta- 
genesis are readily prepared using the single-stranded 
DNA bacteriophage M13. Commercially available 
plasmids contain M13 replication initiation sites as 
well as cloning sites with regulated promoters. Thus, 
a single plasmid can be used for cloning, M13 muta- 
genesis, and protein expression. 


dU Method 

Variations of the primer-based method increase the yield of mutants by preferentially degrading the template strand. A commonly used technique, first described by Kunkel, takes advantage of dut⁻ ung⁻ strains of E. coli (Figure 1B). Whereas most bacteria will degrade DNA containing uracil (dU-DNA), dut⁻ ung⁻ strains are deficient in the degradation of both dUTP (dut) and dU-DNA (ung). Thus, M13 templates isolated from dut⁻ ung⁻ bacteria will contain some dU in place of dT. After hybridization of the mutagenic oligonucleotide, DNA synthesis, and ligation, the heteroduplex plasmid is transformed into a dut⁺ ung⁺ strain of E. coli, which degrades the wild-type, dU-containing template strand but not the newly synthesized mutant strand. Thus, mostly mutant plasmid is propagated. Other methods use similar approaches by adding methyl-dC or thiophosphate-dC during in vitro DNA synthesis; these modifications make the mutagenic strand resistant to cleavage by certain restriction enzymes.


Polymerase Chain Reaction 

PCR-mediated mutagenesis is similar to the oligonu- 
cleotide methods described above, in that a mutagenic 
oligonucleotide is used as a primer for DNA synthesis. 
An advantage of the PCR method lies in the inherent 
amplification of the mutagenic DNA, which requires 
only a small amount of the wild-type DNA as template.
PCR mutagenesis can be performed on linear pieces 
of DNA, such as restriction fragments, as well as 
on circular plasmids. Figure 2 pictures some of the 
methods discussed below. 


PCR Mutagenesis of Linear DNA 

If the desired mutation is found near a restriction 
enzyme site, the mutation can be incorporated by 
preparing one PCR primer containing the mutation 
and the restriction site and a second primer contain- 
ing a downstream restriction site. The PCR product 
is then treated with the restriction enzymes and 
ligated into the plasmid as a DNA cassette. If there 
are no restriction sites near the mutagenic sequence, 
‘overlap-extension’ PCR and ‘megaprimer’ PCR can 
be used to introduce the mutations. Overlap-exten- 
sion PCR requires four primers and three PCR steps 
(Figure 2A). The first two PCR steps produce two 
overlapping DNA fragments, both containing the 
desired mutation. The final PCR step uses the outside 
primers to stitch together the two fragments into the 
full-length cassette. Megaprimer PCR, a variant of the 
overlap-extension method, uses three primers and two 
PCR steps. The first step yields a DNA fragment 
containing one restriction site and the mutations. 
This long DNA fragment is used as a megaprimer in 
the second PCR step along with a primer containing 
the second restriction site. 
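The three PCR steps of overlap-extension mutagenesis can be mimicked on plain strings. The sketch below is a deliberately idealized model, not a protocol: it assumes perfect priming and extension, treats each PCR product as the slice of template between the outer ends of its two primers, and lets the mutagenic primers 2 and 3 write their mismatched base into the product. The helper names `pcr_product` and `stitch`, the 60-nt template, and the primer positions are all hypothetical.

```python
def pcr_product(template: str, start: int, end: int, edits: dict[int, str]) -> str:
    """Idealized PCR product spanning template[start:end].

    `edits` maps template positions to the base carried by a mutagenic primer;
    any such position inside the amplified span is copied from the primer.
    """
    product = list(template[start:end])
    for pos, base in edits.items():
        if start <= pos < end:
            product[pos - start] = base
    return "".join(product)


def stitch(left: str, right: str, overlap: int) -> str:
    """Final overlap-extension step: the shared `overlap` bases anneal, and
    extension with the outside primers yields one full-length product."""
    if left[-overlap:] != right[:overlap]:
        raise ValueError("fragments do not share the designed overlap")
    return left + right[overlap:]


# Hypothetical 60-nt cassette between the sites of outside primers 1 and 4.
template = ("ATGGCTAGCA" "AAGGAGAAGA" "ACTTTTCACT"
            "GGAGTTGTCC" "CAATTCTTGT" "TGAATTAGAT")
mutation = {30: "A"}  # G -> A at position 30, carried by mutagenic primers 2 and 3

frag_12 = pcr_product(template, 0, 40, mutation)   # PCR 1: primers 1 + 2
frag_34 = pcr_product(template, 20, 60, mutation)  # PCR 2: primers 3 + 4
full = stitch(frag_12, frag_34, overlap=20)        # PCR 3: outside primers 1 + 4
print(full)
```

Because both inner fragments already carry the mutation, the stitched cassette carries it too. In the megaprimer variant, the first product (`frag_12` here) would itself serve as one primer of the second reaction instead of being joined in a separate step.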


PCR Mutagenesis of Circular DNA 

PCR mutagenesis can also be used to amplify the 
entire plasmid containing the gene of interest. One 
straightforward method, termed ‘inverted’ or ‘coun- 
ter’ PCR, uses back-to-back primers (Figure 2B); one 
PCR primer serves as the mutagenic oligonucleotide 
and the other oligonucleotide primes from the oppos- 
ite strand, adjacent to the mutagenic primer. The PCR 
product is a full-length, linear plasmid which is then 
phosphorylated and ligated before transformation. 
This method can readily be used to make deletion mutants by creating a gap between the primers. Variants of this method include ‘recombinant circle’ PCR (Figure 2C) and ‘recombination’ PCR, both of which rely on recombination of linear plasmids. In these techniques, two inverse PCRs are performed with gapped primers at different sites. The two linear mutant plasmids are then recombined in vitro by mixing and annealing (recombinant circle PCR) or in vivo (recombination PCR). The gaps are then repaired by the bacterial DNA repair machinery.

Figure 2 Methods of PCR mutagenesis. (A) Overlap-extension PCR generates mutations between two restriction enzyme sites. Four primers are prepared: two containing the restriction sites (primers 1 and 4) and two containing the mutagenic sequence (primers 2 and 3). After two PCRs, fragments 1-2 and 3-4 are combined and stitched together by PCR using primers 1 and 4. The long product 1-4 is restricted and ligated into the vector. (B) Inverted PCR uses two back-to-back primers, one containing the mutations of interest. PCR yields the full-length, linear plasmid, which is made into closed circular DNA by DNA ligase. (C) Recombinant circle PCR uses two sets of primers. Primers 2 and 3 contain the mutagenic sites and prime opposite strands; primers 1 and 4 prime from different positions on the plasmid. The PCR products 1-2 and 3-4 are truncated, linear versions of the plasmid; recombination in vitro gives gapped plasmids, which are repaired in E. coli.
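Inverted PCR can be modeled the same way if the plasmid is treated as a circular string. In this hypothetical sketch, `fwd_start` is where the forward primer's 5' end sits, `rev_end` is the position just past the reverse primer's 5' end (both in top-strand coordinates), and new bases are supplied as a 5' tail on the forward primer. Real mutagenic primers more often carry internal mismatches, and the phosphorylation step is ignored; ligation is represented simply by comparing circles up to rotation.

```python
def inverse_pcr(plasmid: str, rev_end: int, fwd_start: int, fwd_tail: str = "") -> str:
    """Linear product of back-to-back primers on a circular template (idealized).

    Synthesis runs from the forward primer's 5' end (fwd_start) around the circle
    to the reverse primer's 5' end (just before rev_end). Bases in
    [rev_end, fwd_start) are not copied, so a gap between the primers gives a deletion.
    """
    n = len(plasmid)
    rotated = plasmid[fwd_start:] + plasmid[:fwd_start]  # read the circle from fwd_start
    length = (rev_end - fwd_start) % n or n              # 0 means primers are back to back
    return fwd_tail + rotated[:length]


def same_circle(a: str, b: str) -> bool:
    """Ligation closes the linear product; two circles match up to rotation."""
    return len(a) == len(b) and a in b + b


plasmid = "GATCCTCTAG" "AGTCGACCTG" "CAGGCATGCA" "AGCTTGGCAC"  # hypothetical 40-nt circle

full_length = inverse_pcr(plasmid, rev_end=5, fwd_start=5)            # primers back to back
deletion = inverse_pcr(plasmid, rev_end=10, fwd_start=15)             # 5-nt gap -> deletion
point = inverse_pcr(plasmid, rev_end=20, fwd_start=21, fwd_tail="T")  # C -> T substitution

print(same_circle(full_length, plasmid))                      # True
print(same_circle(deletion, plasmid[:10] + plasmid[15:]))     # True
print(same_circle(point, plasmid[:20] + "T" + plasmid[21:]))  # True
```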


Libraries of Mutations 


Combinatorial and random mutagenesis methods cre- 
ate libraries of DNA which are subsequently screened 
or selected for function. By analyzing large numbers 
of clones simultaneously, a small number of active 
mutants can be separated from a pool of millions 
of variants. The preparation of libraries, selection of 
‘winners’ and amplification of these selectants is often 
called ‘in vitro evolution.’ 


Doped versus Saturation Mutagenesis 

Libraries of mutant DNA molecules can be designed such that a small number of random mutations are introduced throughout the gene (analogous to the nonselective mutagenesis described above), or such that a large number of mutations are focused on a small region of a gene. When all possible DNA mutations can be found at a given site with equal frequency, the site is described as ‘saturated.’ When the number of mutations at a given site is small, the site is said to be ‘doped’ with the mutation. Saturation mutagenesis is readily accomplished through automated DNA synthesis. During synthesis, discrete bases are added in sequence to the growing DNA chain; to saturate a position on this chain, equal amounts of all four bases are added simultaneously. Doping can similarly be accomplished by mixing a measured fraction of mutagenic base with the wild-type base at a given site. After these doped or saturated mutagenic oligonucleotides have been synthesized, in vitro mutagenesis proceeds as usual via cassette mutagenesis or primer-based mutagenesis. Error-prone PCR offers an alternative method for doping mutations throughout a gene; the rate of mutagenesis is approximately 0.7% for Taq DNA polymerase. Both saturation and doping strategies have been used to identify critical protein residues and to create novel binding or catalytic functions. Examples include peptides that antagonize or agonize cell-surface receptors, and enzymes that are active in nonaqueous environments.
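The difference between doping and saturation is easiest to see numerically. The sketch below assumes, as a simplification, that each doped position independently carries a non-wild-type base with probability `doping` (a binomial model) and that all members of a saturated library are equally represented. The function names, the example parameters, and the 95% coverage target are hypothetical choices for illustration.

```python
from math import ceil, comb, log


def doped_distribution(n_sites: int, doping: float) -> list[float]:
    """P(exactly k mutations), k = 0..n_sites, in one doped oligonucleotide,
    assuming each position is independently mutant with probability `doping`."""
    return [comb(n_sites, k) * doping**k * (1 - doping) ** (n_sites - k)
            for k in range(n_sites + 1)]


def saturation_library_size(n_sites: int) -> int:
    """Distinct sequences when every position receives an equimolar mix of A, C, G, T."""
    return 4 ** n_sites


def clones_for_coverage(n_variants: int, p: float = 0.95) -> int:
    """Clones to screen so that any one given variant is present with probability p,
    assuming all variants are equally frequent in the library."""
    return ceil(log(1 - p) / log(1 - 1 / n_variants))


doped = doped_distribution(n_sites=30, doping=0.1)
print(f"P(no mutation) = {doped[0]:.3f}")
print(f"mean mutations = {sum(k * p for k, p in enumerate(doped)):.1f}")

variants = saturation_library_size(6)  # e.g. two fully randomized codons
print(variants, clones_for_coverage(variants))
```

Under these assumptions, doping 30 positions at 10% gives about three mutations per oligonucleotide on average, while saturating just six positions already yields 4⁶ = 4096 sequences, and roughly three times that many clones must be screened to be 95% sure of recovering any particular one.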


Mutagenesis and Recombination 

An increasingly popular method for generating 
libraries of a gene utilizes an in vitro recombination 
technique called ‘DNA shuffling.’ In DNA shuffling, 
one or more genes are randomly chopped into smaller 
pieces of DNA by a nuclease and reconnected with a 
DNA polymerase. During this reconstruction phase, 
homologous fragments of DNA can anneal and prime 
each other, creating a recombined gene. Mutations are 
incorporated into the genes via errors during DNA 
polymerization. DNA shuffling has been used to optimize the function of proteins as well as the activity of whole operons and viruses.
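A toy model can show how fragmentation, template switching, and polymerase errors combine to produce recombined genes. This is a hypothetical sketch, not a description of any published shuffling protocol: it assumes aligned, equal-length parents, places breakpoints uniformly at random, and ignores the dependence of annealing on local sequence identity, which in real shuffling strongly influences where crossovers occur. The function name and parent sequences are invented.

```python
import random


def shuffle_genes(parents: list[str], n_fragments: int, error_rate: float,
                  rng: random.Random) -> str:
    """Toy model of DNA shuffling for aligned, equal-length homologous parents.

    Random breakpoints stand in for nuclease fragmentation; each segment is copied
    from a randomly chosen parent, mimicking template switching when homologous
    fragments prime each other; each copied base is redrawn at random with
    probability `error_rate` to mimic polymerase errors (the redrawn base can
    occasionally equal the original).
    """
    length = len(parents[0])
    cuts = sorted(rng.sample(range(1, length), n_fragments - 1))
    bounds = [0] + cuts + [length]
    child = []
    for start, end in zip(bounds, bounds[1:]):
        source = rng.choice(parents)
        for base in source[start:end]:
            if rng.random() < error_rate:
                base = rng.choice("ACGT")
            child.append(base)
    return "".join(child)


parent_a = "ATGAAACGCATTAGCACCACCATTACC"  # hypothetical homologous parents (27 nt)
parent_b = "ATGAAGCGTATCAGTACTACGATCACC"
rng = random.Random(7)
library = [shuffle_genes([parent_a, parent_b], n_fragments=5, error_rate=0.01, rng=rng)
           for _ in range(3)]
for clone in library:
    print(clone)
```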


Prospects 


In vitro mutagenesis has become an integral part of 
genetic analysis. Controlled mutagenesis has identi- 
fied the function of new genes, a process termed 
‘reverse genetics,’ and allowed dissection of the 
mechanism of known proteins. Additionally, site- 
directed mutagenesis has become an important tool 
in biotechnology. For example, the design of non- 
immunogenic antibodies for human therapeutics 
underscores the practical benefits of mutagenesis 
and protein engineering. The availability of DNA-modifying enzymes, cloning vectors, and synthetic DNA makes site-directed mutagenesis straightforward in most laboratories; its applications are limited only by the imagination.
In vitro packaging is the method of reconstituting a virus in vitro by mixing the protein components of the virus with nucleic acid. The protein components of the virus are prepared from extracts of infected cells by eliminating the nucleic acids in the extract. The nucleic acid component to be packaged is usually an in vitro recombinant DNA construct. In vitro packaging is useful as a means to efficiently introduce a DNA fragment recombined with a viral vector into a cell, using the infective properties of viral particles to pass through the cell wall or membrane.


See also: Vectors 


Inborn Errors of 
Metabolism 
An inborn error of metabolism is a biochemical or genetic lesion that gives rise to an inherited metabolic block. Many are due to the inability to synthesize an individual protein or to the production of a biologically inefficient form of a protein.


See also: Genetic Diseases; Metabolic Disorders, 
Mutants 
An inbred strain is a population of animals that result
from a process of at least 20 sequential generations 
of brother-sister matings. The resultant animals are 
essentially clones of each other at the genetic level. 
When two animals have the same strain name — such as 
BALB/c or C57BL/6 — it means that they can both 
trace their lineage back through a series of brother- 
sister matings to the very same mating pair of inbred 
animals. With the use of the same standard inbred 
strain, it is possible to eliminate genetic variability as a 
complicating factor in comparing results obtained from 
experiments performed in any laboratory in the world. 


The Generation of Inbred Strains 


The offspring that result from a mating between two F1 siblings are referred to as members of the ‘second filial generation’ or F2 animals, and a mating between two F2 siblings will produce F3 animals, and so on. An important point to remember is that the filial (F) generation designation is only valid in those cases where a protocol of brother-sister matings has been strictly adhered to at each generation subsequent to the initial outcross. Although all F1 offspring generated from an outcross between the same pair of inbred strains will be identical to each other, this does not hold true in the F2 generation, which results from an intercross where three different genotypes are possible at every locus. However, at each subsequent filial generation, genetic homogeneity among siblings is slowly recovered in a process referred to as ‘inbreeding.’ Eventually, this process will lead to the production of inbred animals that are genetically homogeneous and homozygous at all loci.

The process of inbreeding becomes understandable when one realizes that at each generation beyond F1, there is a finite probability that the two siblings chosen to produce the subsequent generation will be homozygous for the same allele at any particular locus in the genome. If, for example, the original outcross was set up between animals with genotypes AA and aa at the A locus, then at the F2 generation, there would be animals with three genotypes AA, Aa, and aa present at a ratio of 0.25:0.50:0.25. When two F2 siblings are chosen randomly to become the parents for the next generation, there is a defined probability that these two animals will be identically homozygous at this locus. Since the genotypes of the two randomly chosen animals are independent events, one can derive the probability of both events occurring simultaneously by multiplying the individual probabilities together according to the ‘law of the product.’ Since the probability that one animal will be AA is 0.25, the probability that both animals will be AA is 0.25 × 0.25 = 0.0625. Similarly, the probability that both animals will be aa is also 0.0625. The probability that either of these two mutually exclusive events will occur is derived by simply adding the individual probabilities together according to the ‘law of the sum’ to obtain 0.0625 + 0.0625 = 0.125.
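The product and sum rules used above can be checked exactly with rational arithmetic. This small sketch, with hypothetical helper names, first derives the F2 genotype frequencies by enumerating gametes from an Aa × Aa (F1 × F1) mating, then computes the chance that two randomly chosen F2 siblings are identically homozygous.

```python
from fractions import Fraction
from itertools import product


def f2_genotypes() -> dict[str, Fraction]:
    """Genotype frequencies among offspring of an Aa x Aa (F1 x F1) mating."""
    counts: dict[str, int] = {}
    for egg, sperm in product("Aa", repeat=2):   # four equally likely pairings
        genotype = "".join(sorted(egg + sperm))  # 'aA' and 'Aa' are the same genotype
        counts[genotype] = counts.get(genotype, 0) + 1
    return {g: Fraction(c, 4) for g, c in counts.items()}


def p_identically_homozygous(freqs: dict[str, Fraction]) -> Fraction:
    """Law of the product within each case (both AA, both aa),
    law of the sum across the two mutually exclusive cases."""
    return freqs["AA"] ** 2 + freqs["aa"] ** 2


f2 = f2_genotypes()
print(f2)
print(p_identically_homozygous(f2))  # 1/8, i.e. 0.125
```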

If there is a 12.5% chance that both F2 progenitors are identically homozygous at any one locus, then approximately 12.5% of all loci in the genome will fall into this state at random. The consequence for these loci is dramatic: all offspring in the following F3 generation, and all offspring in all subsequent filial generations, will also be homozygous for the same alleles at these particular loci. Another way of looking at this process is to consider the fact that once a starting allele at any locus has been lost from a strain of animals, it can never come back, so long as only brother-sister matings are performed to maintain the strain.

At each filial generation subsequent to F3, the class 
of loci fixed for one parental allele will continue to 
expand beyond 12.5%. This is because all fixed loci 
will remain unchanged through the process of incross- 
ing, while all unfixed loci will have a certain chance of 
reaching fixation at each generation. 

After 20 generations of inbreeding, 98.7% of the 
loci in the genome of each animal should be homo- 
zygous. This is the operational definition of ‘inbred.’ 
At each subsequent generation, the level of hetero- 
zygosity will fall off by 19.1%, so that at 30 gener- 
ations, 99.8% of the genome will be homozygous and 
at 40 generations, 99.98% will be homozygous. 
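The figures quoted here (98.7% at F20, a 19.1% fall in heterozygosity per generation) are consistent with the classical recurrence for expected heterozygosity under full-sib mating, H_t = H_(t-1)/2 + H_(t-2)/4, a standard population-genetics result that this entry does not state explicitly. The sketch below iterates it from an outcross (F1, H = 1) and its intercross (F2, H = 1/2) under the same infinite-genome, independent-assortment simplification; the function name is hypothetical.

```python
def sib_mating_heterozygosity(last_generation: int) -> dict[int, float]:
    """Expected fraction of heterozygous loci per animal at F1..F<last_generation>
    under strict brother-sister mating, via H_t = H_(t-1)/2 + H_(t-2)/4."""
    h = {1: 1.0, 2: 0.5}  # F1 from an AA x aa outcross; F2 from the F1 intercross
    for t in range(3, last_generation + 1):
        h[t] = h[t - 1] / 2 + h[t - 2] / 4
    return h


h = sib_mating_heterozygosity(40)
for gen in (20, 30, 40):
    print(f"F{gen}: {100 * (1 - h[gen]):.2f}% homozygous")
print(f"per-generation decline: {100 * (1 - h[40] / h[39]):.1f}%")
```

The per-generation ratio converges to (1 + √5)/4 ≈ 0.809, which corresponds to the 19.1% decline quoted above. The finite, linked genome discussed next lies outside this model.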

These calculations are based on the simplifying 
assumption of a genome that is infinitely divisible 
with all loci assorting independently. In reality, the 
size of the genome is finite and, more importantly, 
linked loci do not assort independently. Instead, large 
chromosomal chunks are inherited as units, although 
the boundaries of each chunk will vary in a random 
fashion from one generation to the next. As a conse- 
quence, there is an ever-increasing chance of complete 
homozygosity as animals pass from the 30th to 60th 
generation of inbreeding. In fact, by 60 generations, 
one would be virtually assured of a homogeneous homozygous genome if it were not for the continual appearance of new spontaneous mutations (most of which will have no visible effect on phenotype). However, every new mutation that occurs will soon be fixed or eliminated from the strain through further rounds of inbreeding. Thus, for all practical purposes, animals at the F60 generation or higher can be considered 100% homozygous and genetically indistinguishable from all siblings and close relatives.


See also: Homozygosity; Mutation, Spontaneous 
129 is the name given to a group of related inbred strains
of mice that are commonly used in germline genetic 
manipulation experiments. The various 129 strains 
have been used as the source of a series of embryonic 
stem cell lines that can be readily manipulated in tissue 
culture and then directed back into the mouse germline 
through a process of chimera formation. 


See also: Chimera; Embryonic Stem Cells; Inbred 
Strain 


Inbreeding 


See: Inbred Strain 
The major hurdle that must be overcome in the development of new inbred strains from wild populations is inbreeding depression, which occurs most strongly between the F2 and F8 generations (second through eighth generation of sequential brother-sister mating). The cause of this depression is the load of deleterious recessive alleles present in the genomes of wild animals of every species. These deleterious alleles are constantly generated at a low rate by spontaneous mutation, but their number is normally held in check by the force of negative selection acting upon homozygotes. With constant replenishment and constant elimination, the load of deleterious alleles present in any individual mammal reaches an equilibrium level of approximately ten. Different unrelated individuals are unlikely to carry the same mutations, and as a consequence, the effects of these mutations are almost never observed in large randomly mating populations.

However, it is not surprising that during the early stages of inbreeding, many of the animals will be sickly or infertile, because deleterious recessive mutations present singly in one parent are likely to be homozygous in future inbred generations. At the F2 to F8 generations, the proportion of sterile animals is often so great that the earliest mouse geneticists thought that inbreeding was a theoretical impossibility. Obviously they were wrong. But, to succeed, one must begin the production of a new strain with a very large number of independent F1 × F1 lines followed by multiple branches at each following generation. Most of these lines will fail to breed in a productive manner. But an investigator can continue to breed the few most productive lines at each generation; these are likely to have segregated away most of the deleterious alleles. The depression in breeding will begin to fade away by the F8 generation as the deleterious alleles are eliminated. Inbreeding depression will not occur when a new inbred strain is begun with two parents who are themselves already inbred, because no deleterious genes are present at the outset in this special case.


See also: Breeding of Animals; Inbred Strain 
Incompatibility is the inability of certain plasmids to
coexist in the same cell and is a cause of plasmid 
immunity. 


See also: Immunity; Plasmids 
A mutant allele is said to show ‘incomplete dominance’ or ‘semidominance’ when its phenotypic effects as a heterozygote are distinctly dominant but less severe than when homozygous. For example, for a hypothetical locus b affecting hair growth, bb homozygotes have normal hair, Bb heterozygotes show partial baldness, while all BB homozygotes are completely bald; the B allele shows incomplete dominance since the heterozygous phenotype is less severe than that of BB homozygotes. In most cases, the phenotype of heterozygotes is intermediate between the wild-type and the homozygous mutant states.
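Because each genotype of an incompletely dominant allele has its own phenotype, a Bb × Bb cross should show three phenotypic classes in the same 1:2:1 ratio as the genotypes, rather than the 3:1 ratio expected with complete dominance. The short enumeration below makes that concrete for the hypothetical hair-growth locus; the ‘complete dominance’ table is an invented contrast, and the helper names are hypothetical.

```python
from collections import Counter
from itertools import product

# Phenotypes for the hypothetical hair-growth locus b described above.
INCOMPLETE = {"bb": "normal hair", "Bb": "partial baldness", "BB": "completely bald"}
# For contrast only: the same locus if B were fully dominant (hypothetical).
COMPLETE = {"bb": "normal hair", "Bb": "completely bald", "BB": "completely bald"}


def offspring_genotypes(parent1: str, parent2: str) -> Counter:
    """Genotype counts over all equally likely gamete pairings of two parents."""
    return Counter("".join(sorted(a + b)) for a, b in product(parent1, parent2))


def phenotype_counts(genotypes: Counter, phenotype_of: dict[str, str]) -> Counter:
    """Collapse genotype counts into phenotype classes."""
    counts = Counter()
    for genotype, n in genotypes.items():
        counts[phenotype_of[genotype]] += n
    return counts


cross = offspring_genotypes("Bb", "Bb")
print(phenotype_counts(cross, INCOMPLETE))  # three classes in a 1:2:1 ratio
print(phenotype_counts(cross, COMPLETE))    # two classes in a 3:1 ratio
```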

The term incomplete dominance is similar to, but distinct in meaning from, the term codominance. The distinction between codominance and incomplete dominance is that codominance refers to pairs of alleles, while semidominance refers to a single allele. Codominance is observed when individuals that are heterozygous for alternative alleles at the same locus express both phenotypes observed in the corresponding homozygotes, or when all three classes (both classes of homozygotes and one class of heterozygotes) are distinguishable from each other.

Incomplete dominance may also be used with 
respect to fitness, rather than with respect to the vis- 
ible effects of a gene, as described above. A novel 
dominant allele may show no visible phenotypic dif- 
ferences in homozygotes versus heterozygotes, but 
may have an effect on the overall fitness of an organism such that heterozygotes gain only a partial increase in fitness, less than the benefit afforded by homozygosity.


See also: Codominance; Heterozygote and 
Heterozygosis 


Incomplete Penetrance 


See: Penetrance 
A cross between two organisms that have the same
homozygous genotype at designated loci, for example, 
between members of the same inbred strain. 


See also: Backcross; Inbred Strain; Intercross; 
Outcross 
Two homologous molecular sequences are often of unequal length, indicating that either one gene has suffered an insertion or the other a deletion. In the absence of further information, it is hard to tell which of the two possibilities is correct. It is thus convenient to refer to such differences as indels.


See also: Deletion; Insertion Sequence 
Independent assortment is one of the two great principles enunciated by Mendel that underlie our awareness of genes as units of heredity. Mendel proposed that a hybrid individual produces two gamete types in equal frequency for each heterozygous character and that the choice of trait for each character is independent of the other character. We recognize this as producing four gamete types in equal frequency from a dihybrid, resulting in progeny in the ratios 9:3:3:1 from a self-cross or 1:1:1:1 from a test cross. In modern terminology, the observation of independent assortment means that the segregation of alternative alleles at one locus or gene is not influenced by the segregation of the alternative alleles at a second locus.
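The 9:3:3:1 and 1:1:1:1 ratios follow directly from enumerating the four equally frequent gamete types. This is a minimal sketch under the assumptions that the two loci assort independently, all gametes and progeny are equally viable, and an uppercase allele is fully dominant; the genotype-string convention (such as 'AaBb') and the helper names are hypothetical.

```python
from collections import Counter
from itertools import product


def gametes(genotype: str) -> list[str]:
    """Gametes of a genotype written locus by locus, e.g. 'AaBb' -> AB, Ab, aB, ab.
    One allele is taken from each locus, every combination equally often."""
    loci = [genotype[i:i + 2] for i in range(0, len(genotype), 2)]
    return ["".join(combo) for combo in product(*loci)]


def phenotype(gamete1: str, gamete2: str) -> tuple[bool, ...]:
    """True at a locus if the dominant (uppercase) allele is present."""
    return tuple(a.isupper() or b.isupper() for a, b in zip(gamete1, gamete2))


def cross(parent1: str, parent2: str) -> Counter:
    """Phenotype-class counts over all equally likely gamete pairings."""
    return Counter(phenotype(g1, g2)
                   for g1, g2 in product(gametes(parent1), gametes(parent2)))


print(gametes("AaBb"))                                       # ['AB', 'Ab', 'aB', 'ab']
print(sorted(cross("AaBb", "AaBb").values(), reverse=True))  # [9, 3, 3, 1]
print(sorted(cross("AaBb", "aabb").values(), reverse=True))  # [4, 4, 4, 4], i.e. 1:1:1:1
```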

Although generations of students have struggled to keep the laws of segregation and independent assortment clear in their minds, independent assortment is the less important of the two for understanding the biological mechanisms of heredity. What is important is that genes are part of chromosomes. The reductional (first) division of meiosis separates homologous parental chromosomes, providing a mechanism for the Mendelian segregation of alternative alleles into different gametes. This was confirmed by nondisjunction, in which a gamete wrongly inherits both parental homologous chromosomes, and with them both parental alleles at a gene, or neither chromosome and neither parental allele. Genes located on different (nonhomologous) chromosomes show patterns of segregation that are independent of each other. Thus, one meiotic origin of independent assortment is that genes are located on nonhomologous chromosomes. By observing meiosis in species where chromosome size and shape differ sufficiently, individuals heterozygous for two distinct pairs of nonhomologous chromosomes can be seen to produce four gamete types equally often, from two different patterns of reductional division. Genes located on the same pair of homologous chromosomes may not show independent assortment. Linkage, recognized as the exception to independent assortment, locates genes on the same chromosome. The fact that genes can be traced to specific chromosomes and located within them is key to identifying individual genes, a major goal of genetics. Genes are commonly identified through their neighbors or their position on a chromosome. In that sense linkage, rather than independent assortment, is the more useful concept for the modern study of genes.

When comparing segregation patterns of alleles at two genes, such as in linkage studies, independent assortment is the default or null hypothesis. This is because independent assortment predicts fixed ratios, whereas the results expected under linkage cannot be predicted in advance. There are two approaches to testing, by means of the χ² test, whether the observed results fit the predictions of independent assortment. The usual way simply compares the observed number in each progeny class against an expected value derived from the total progeny number × 25% (for a dihybrid test cross). The χ² statistic sums, over all classes, the squared difference between observed and expected numbers divided by the expected number. With four classes this test has three degrees of freedom. The χ² test leads to a conclusion that accepts the null hypothesis when the expected and observed numbers are similar, or rejects the null hypothesis when the observed numbers are too different from expected. In that case it is the expected numbers that are rejected. In some situations, such as reduced viability of some of the progeny classes, the χ² value may be large and lead to rejecting the null hypothesis but may not mean linkage. A further test for linkage or independent assortment is to use a 2-by-2 contingency table to generate the predicted numbers based on the observed subtotals for each row and column. Based on independence, the number expected for one cell in the table is one row subtotal × one column subtotal, with the product divided by the total number of progeny. Comparing the actual and expected numbers usually gives a better fit by this approach, but the number of degrees of freedom is reduced to one.
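Both approaches can be written in a few lines. The counts below are invented purely to illustrate the situation described above, in which reduced viability of one allele (here, fewer a progeny) inflates the 1:1:1:1 goodness-of-fit χ² even though the two genes are independent, whereas the 2-by-2 contingency test is unaffected. The 5% critical values (7.815 for three degrees of freedom, 3.841 for one) are standard table values; the function names are hypothetical.

```python
def chi_square(observed: list[float], expected: list[float]) -> float:
    """Sum over classes of (observed - expected)^2 / expected."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def goodness_of_fit(counts: list[int]) -> float:
    """Dihybrid test cross: each of the four classes expected at 25% (3 df)."""
    total = sum(counts)
    return chi_square(counts, [total / 4] * 4)


def contingency_2x2(table: list[list[int]]) -> float:
    """Expected cell = row subtotal x column subtotal / total (1 df)."""
    (a, b), (c, d) = table
    total = a + b + c + d
    rows, cols = [a + b, c + d], [a + c, b + d]
    expected = [rows[i] * cols[j] / total for i in range(2) for j in range(2)]
    return chi_square([a, b, c, d], expected)


CRITICAL_5_PERCENT = {1: 3.841, 3: 7.815}

# Invented test-cross counts: rows A / a, columns B / b (classes AB, Ab, aB, ab).
table = [[60, 60],
         [40, 40]]
gof = goodness_of_fit([60, 60, 40, 40])
ct = contingency_2x2(table)
print(f"goodness of fit: {gof:.2f} (reject at 5%: {gof > CRITICAL_5_PERCENT[3]})")
print(f"contingency:     {ct:.2f} (reject at 5%: {ct > CRITICAL_5_PERCENT[1]})")
```

In this invented example the shortage of a progeny is spread equally over the B and b classes, so the row-by-column expectations absorb it and no linkage is indicated, while the 1:1:1:1 test rejects independent assortment.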

There are two models to understand the meiotic origin of independent assortment. One is that the two marked genes are located on nonhomologous chromosomes. The other is that the two genes are located on the same pair of chromosomes but are far enough apart that recombination in the interval between them mimics independent assortment. On either model, the observation that half the gametes are parental and half recombinant corresponds to a recombination frequency of 50% (50 map units) between the genes, indicating independent assortment. The reason that measured recombination frequencies cannot exceed 50% for a dihybrid cross is that a crossover involves just two of the four chromatids present in a meiotic prophase bivalent. Every cell with one crossover in an interval potentially yields two crossover-bearing gametes and two non-crossover-bearing products of meiosis. With multiple crossovers within an interval but no preferential distribution of which chromatids are involved with each, half the gametes will contain either zero or an even number of crossovers and half the gametes will contain an odd number (one, three, and so on). The former will be scored as non-crossovers and the latter will be scored as crossover-bearing gametes.
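The odd-versus-even argument can be checked exactly. Assuming no chromatid interference, each crossover in the interval involves any particular chromatid with probability 1/2, independently of the other crossovers, and a chromatid is recombinant for the flanking markers only if it was involved an odd number of times. The sketch below (function name hypothetical) sums the binomial probabilities of the odd counts.

```python
from fractions import Fraction
from math import comb


def recombinant_fraction(k: int) -> Fraction:
    """Fraction of chromatids (gametes) recombinant for an interval that has
    k crossovers in the bivalent, assuming no chromatid interference:
    the number of crossovers involving a given chromatid is Binomial(k, 1/2),
    and only odd numbers produce a recombinant."""
    return sum((Fraction(comb(k, j), 2 ** k) for j in range(1, k + 1, 2)), Fraction(0))


for k in range(5):
    print(k, recombinant_fraction(k))
```

Every k ≥ 1 gives exactly one half, so averaging over any mixture of meioses (some with no crossover in the interval) yields at most 50% recombinant gametes.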

In the end most pairs of genes assort independently 
of each other. That is why linkage is a powerful state- 
ment for genetic investigations. The chance that two 
genes will not assort independently can be assessed 
for a species by taking into account the number of 
chromosomes and the level of recombination. In 
species like the fruit fly Drosophila, with few chromo- 
somes and moderate levels of recombination, the 
chance of independent assortment is perhaps about 
80% for two genes chosen at random. In species like 
humans, with 23 chromosome pairs and low to mod- 
erate recombination levels the chance is closer to 99%. 
In species like the yeast Saccharomyces cerevisiae, with 
16 chromosome pairs and high levels of recombin- 
ation, independent assortment is almost always 
expected. The importance of independent assortment 
rests with shuffling the genome at each meiosis to 
create new combinations of alleles from those making 
up the parental generation. This permits the popula- 
tion to more rapidly change genotypes in response to 
environmental changes. This feature was probably 
necessary for the development of biological complex- 
ity such as multicellularity. And it underlies most 
theories of the origin of sexual reproduction. 


See also: Linkage; Mendel’s Laws; Mendelian Ratio 
When a diploid undergoing meiosis is heterozygous at two or more loci (Aa, Bb, Cc, etc.), the haploid meiotic products will each carry one or other of each of the pairs of alleles. With n heterozygous loci there will be 2ⁿ different kinds of haploid product. If all the allelic differences are segregated independently, all combinations will be equally frequent, apart from sampling error and any differences in viability. For example, with three loci, there will be eight equally frequent meiotic products: ABC, abc, Abc, aBC, ABc, abC, AbC, and aBc.
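Enumerating the meiotic products makes the 2ⁿ count explicit. A short sketch, assuming the loci are labeled A, B, C, and so on and segregate independently (the helper name is hypothetical):

```python
from itertools import product
from string import ascii_uppercase


def meiotic_products(n: int) -> list[str]:
    """All haploid products of a diploid heterozygous at n independent loci
    (Aa, Bb, Cc, ...); each appears once, i.e. all are equally frequent."""
    loci = [(letter, letter.lower()) for letter in ascii_uppercase[:n]]
    return ["".join(alleles) for alleles in product(*loci)]


print(meiotic_products(3))       # the eight products listed above
print(len(meiotic_products(5)))  # 32 = 2**5
```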

Independent segregation, also called independent 
assortment, occurs when the allelic differences are 
associated with different chromosome pairs and hence 
different linkage groups and is explained by the fact 
that different bivalent chromosomes at the first meta- 
phase of meiosis are oriented at random with respect 
to the spindle poles, as are the dyads at second division 
metaphase. Since nearly all eukaryotic organisms 
have several or many chromosome pairs, independent 
rather than linked segregation is the most common 
outcome of meiosis in double or multiple heterozy- 
gotes. It should be noted that allelic differences on the 
same chromosome can also segregate independently if 
their loci are sufficiently far apart. For a discussion of 
linked segregation see Three-Point Cross (Test-Cross). 

When, exceptionally, different chromosome pairs 
fail to show independent segregation it may be because 
they have undergone a reciprocal exchange of seg- 
ments (see Segmental Interchange). 

Independent assortment also occurs when a diploid 
becomes haploid through random loss of chromo- 
somes during mitotic growth, as can happen in such 
normally haploid fungi as Aspergillus nidulans. 


See also: Aspergillus nidulans; First and Second 
Division Segregation; Heterozygote and 
Heterozygosis; Linkage Group; Meiosis; 
Segmental Interchange; Three-Point Cross (Test- 
Cross); Translocation 
An inducer is a small molecule that triggers gene transcription on binding to a regulator protein.


See also: Induction of Transcription 


Inducible Enzyme, 
Inducible System 


See: Induction of Transcription 
Induction of Prophage


Copyright © 2001 Academic Press 
doi: 10.1006/rwgn.2001.1875 


Induction of prophage is the excision of phage DNA 
from the host genome and entry into the lytic (infect- 
ive) cycle. It occurs as a result of destruction of the 
lysogenic repressor. 


See also: Prophage 
All organisms have developed and continue to develop mechanisms to adapt to an ever-changing environment. For microorganisms like bacteria or yeast, the carbon sources they can use for growth may change quickly and drastically. A suitable carbon source that was present in great amounts may disappear and may suddenly be replaced by another carbon source. Sensors may detect the presence or absence of carbon sources. Adjusting the transcription frequency of the genes that code for the relevant permeases and enzymes involved in the metabolism of these carbon sources is the simplest way of responding to such changes. Such adaptation may happen by mutation or induction. The term induction was introduced into bacterial genetics in 1953 by Melvin Cohn, Jacques Monod, Martin Pollock, Sol Spiegelman and Roger Stanier at a time when its mechanism was not known (Cohn et al., 1953). At that time, it was believed that an inactive precursor of the enzyme would interact with the inducer. By folding in the presence of the inducer, the inactive precursor would be transformed into an active enzyme. This was called the instruction theory.

Induction has been studied extensively in Escherichia coli, other bacteria, and yeast. We now know that induction means that a particular compound acts as an inducer and turns on the transcription of one or several genes. A particular enzyme, or at the extreme, a whole system of enzymes and proteins may thus be inducible. An inducer may work either by counteracting repression or by stimulating activation of transcription. A compound may also act as a corepressor, as tryptophan does with Trp repressor. It may also act indirectly and use signal transduction. The fact that the inducer or corepressor is often metabolized may obscure the analysis. A detailed example is illuminating: the lactose system in E. coli may serve as a paradigm of enzyme induction. E. coli grown on lactose produce about 3000 molecules of tetrameric β-galactosidase per cell. In contrast, E. coli grown on glycerol produce about three molecules of β-galactosidase per cell. The steps that lead to induction are listed below for this case.


1. Inducer has to enter the cell in order to induce. 
E. coli is not freely accessible to chemicals from 
the outside. Every molecule on the outside has to 
be transported by a specific transporter or permease 
to the inside. At the start of induction, lactose is 
transported by one of the one or two Lac permease 
molecules which are produced by the lac operon in 
the absence of any inducer. 

2. Lactose (1-4-galactosido-β-D-glucose) itself is not an inducer. It has to be metabolized to allolactose (1-6-galactosido-β-D-glucose), which then acts as an inducer. Lactose that has entered the E. coli cell meets there the very few molecules of β-galactosidase that are produced in the absence of inducer. They isomerize lactose into allolactose before hydrolyzing it into glucose and galactose. That lactose is not the inducer can be demonstrated in Z⁻ (β-galactosidase-negative) cells. Lac permease, which belongs to the same operon as β-galactosidase, is not induced by lactose in such cells. However, it is induced by allolactose.

3. To study the process of induction in detail, one has to use an inducer that is not metabolized, a gratuitous inducer. Such inducers have been synthesized in large numbers for the lac system. In contrast to lactose or ordinary β-D-galactosides, they are all 1-thio-β-D-galactosides, which are not hydrolyzed by the amounts of β-galactosidase present. The structures and tests of such synthetic thiogalactosides indicate that the steric demands for an optimal inducer are very specific. Isopropyl-1-thio-β-D-galactoside (IPTG) is the best inducer of all the 1-thio-β-D-galactosides. If a saturating amount of IPTG (10⁻³ mol l⁻¹) is added to lac wild-type (I⁺O⁺Z⁺Y⁺) cells growing on glycerol, newly synthesized β-galactosidase can be detected 3 min after the addition of the inducer: 3 min is the time it takes to synthesize the four subunits of β-galactosidase that form one molecule. From then on, the rate of synthesis no longer changes. Such kinetic measurements were used by Jacques Monod to argue for de novo synthesis of β-galactosidase and against the instruction theory. Finally, compounds exist that counteract induction: they are called anti-inducers. o-Nitrophenyl-β-D-fucoside (i.e., o-nitrophenyl-β-D-6-deoxygalactoside) is the best-known example of an anti-inducer of the lac operon.
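Monod's argument rests on the shape of the time course described in step 3: no new enzyme for about 3 min, then accumulation at a rate that no longer changes. A toy Python model of that lag-then-linear curve follows; only the 3-min lag comes from the text, and the synthesis rate is a hypothetical value in arbitrary units.

```python
LAG_MIN = 3.0  # minutes until the first complete beta-galactosidase molecule (from the text)


def enzyme_made(t_min: float, rate_per_min: float = 100.0) -> float:
    """Enzyme synthesized t_min minutes after adding saturating IPTG.

    Toy model: nothing during the lag, then accumulation at a constant rate.
    rate_per_min is a hypothetical value in arbitrary units.
    """
    if t_min <= LAG_MIN:
        return 0.0
    return rate_per_min * (t_min - LAG_MIN)


for t in (0, 2, 3, 4, 6, 10):
    print(t, enzyme_made(t))
```

After the lag, equal time intervals add equal amounts of enzyme, which is what a constant rate of synthesis means; a real induction curve is of course measured, not computed.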

Inducers inactivate repressors or activate activators. Lac repressor occurs in two conformations. In the absence of inducer it binds tightly to lac operator DNA and thus represses transcription from the adjacent lac promoter. In the presence of inducer it changes its conformation and binds about 1000-fold less tightly to lac operator. In the wild-type situation, induction of the lac operon depends on the second or third power of the inducer (IPTG) concentration. A close analysis indicates that all four subunits participate in operator binding: two subunits bind the main operator, and the other two bind an auxiliary operator. Thus only one monomer of tetrameric Lac repressor has to be occupied by inducer for repression to decrease drastically. Indeed, induction of Lac repressor does not follow the concerted model of allostery, in which either all four or none of the subunits of Lac repressor would have to change their conformation. Instead, one subunit after the other binds inducer as the inducer concentration increases. Finally, it should be pointed out that the detailed structural changes of Lac repressor during induction are not known exactly. lac mutants have been isolated that still repress but can no longer be induced. This may happen either through destruction of the inducer-binding site of Lac repressor or through destruction of the region where the structural changes caused by inducer binding occur. Such mutants are dominant over the wild-type allele. They are called Iˢ.
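One way to picture a dependence on "the second or third power of inducer concentration" is a Hill-type dose-response curve. The Python sketch below assumes a Hill function with a hypothetical half-saturation constant `K_HALF`; it is an illustrative model, not a fitted description of the lac system. Far below `K_HALF`, doubling the inducer raises the response by roughly 2**n.

```python
def hill_response(inducer: float, k_half: float, n: float) -> float:
    """Fraction of maximal induction for a Hill-type dose-response curve.

    inducer and k_half are in the same (arbitrary) concentration units;
    n is the Hill exponent (2 to 3 corresponds to the dependence in the text).
    """
    if inducer < 0 or k_half <= 0:
        raise ValueError("inducer must be non-negative and k_half positive")
    x = (inducer / k_half) ** n
    return x / (1.0 + x)


K_HALF = 1.0  # hypothetical half-saturation constant, arbitrary units

# Far below K_HALF, doubling the inducer multiplies the response by about 2**n.
for n in (1, 2, 3):
    low = hill_response(0.01, K_HALF, n)
    doubled = hill_response(0.02, K_HALF, n)
    print(f"n={n}: doubling inducer raises induction {doubled / low:.2f}-fold")
```

Running it prints ratios of about 2, 4 and 8: the higher the exponent, the more sharply a small rise in inducer concentration turns the operon on.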




Induction was explained above using the well-analyzed paradigm of the lac operon. Inspection of other systems indicates that they work in a similar manner in principle but often differ in detail. Some examples will illustrate this. As stated at the beginning, induction may work either by counteracting repression or by stimulating activation. Like the lac system, the gal system of E. coli is induced by counteracting repression: D-galactose inactivates Gal repressor. In contrast, the gal system of yeast is controlled indirectly. It is induced by D-galactose, which acts on GAL80 protein (via the GAL3 protein). In the absence of galactose, GAL80 protein binds to the activator GAL4 and thus inactivates it. In the presence of galactose, GAL80 no longer inhibits GAL4, which is then free to activate transcription. Finally, the signal that leads to induction need not be a chemical. E. coli lysogenic for phage lambda can be induced by UV irradiation. UV irradiation leads to the formation of thymine dimers in the DNA. The presence of thymine dimers triggers the SOS response, which in turn leads to RecA-dependent proteolytic cleavage of lambda repressor and hence to the induction of phage lambda, i.e., its release from repression.


Introduction


Infertility is a common human health problem; in fact, it is almost as common as diabetes mellitus. Approximately 10-15% of couples of reproductive age are infertile. In women, common causes of infertility include tubal or pelvic disorders such as endometriosis, ovulatory dysfunction, or anatomical problems. In men, infertility can be caused by dilated veins around the testes (varicoceles), by blockage or absence of the ducts that carry sperm (from infection or from congenital absence of the vas deferens), and by low or no sperm counts (oligospermia and azoospermia, respectively) from testicular failure. Genetic causes of infertility can lead either to defects in sperm or egg production or to defects in the anatomical development of the reproductive tract. This article reviews our present understanding of these two kinds of genetic causes of infertility.


Genetic Infertility: Problems with Egg 
and Sperm Production 


A frequent cause of infertility is the production of sex 
or germ cells (sperm or oocytes) in fewer than normal 
numbers or of poorer than normal quality. Germ cell 
production is complex and differs from that of any 
other cell type. Normal body (somatic) cells replicate 
by a process termed mitosis, in which identical daugh- 
ter cells are created; no reduction in chromosome 
number occurs. However, when germ cells replicate, 
the process involves an extra cell division that reduces 
the number of chromosomes from 46 (diploid) to 23 
(haploid). As a result of this extra step, a single diploid cell gives rise to four haploid products. In males, all four cells derived from the diploid precursor cell become sperm. In females, only one ovum is produced
from this process; the remaining three cell products 
become nonfunctioning polar bodies. This sex cell 
replication pathway is termed meiosis. In well-studied 
organisms such as yeast or flies, meiosis involves hun- 
dreds of genes for its proper execution. 
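The bookkeeping of meiosis described above can be written out as a small Python sketch. It only restates the counts given in the text (46 chromosomes reduced to 23; four haploid products per diploid cell, all four becoming sperm in males but one ovum plus three polar bodies in females); the `meiosis` function and `MeiosisOutcome` type are hypothetical names for illustration.

```python
from dataclasses import dataclass

DIPLOID = 46  # human somatic (diploid) chromosome number


@dataclass
class MeiosisOutcome:
    chromosomes_per_product: int
    functional_gametes: int
    polar_bodies: int


def meiosis(sex: str) -> MeiosisOutcome:
    """Products of one diploid germ-cell precursor passing through meiosis."""
    haploid = DIPLOID // 2  # the reduction division halves the chromosome number
    products = 4            # one diploid cell yields four haploid products
    if sex == "male":
        return MeiosisOutcome(haploid, products, 0)  # all four become sperm
    if sex == "female":
        return MeiosisOutcome(haploid, 1, products - 1)  # one ovum, three polar bodies
    raise ValueError("sex must be 'male' or 'female'")


print(meiosis("male"))
print(meiosis("female"))
```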


Genetic Infertility Associated with Egg 
Production Problems 

Common conditions that directly affect the develop- 
ment of oocytes in the ovary are Turner syndrome, 
premature ovarian failure, and mutations in the fol- 
licle stimulating hormone (FSH) receptor. At present, 
little therapy exists to stimulate the production of 
oocytes in women with these conditions. 


Turner syndrome 

Turner syndrome is a well-studied disorder that is 
associated with structural abnormalities or absence 
of an X chromosome. Most women with Turner syn- 
drome have a fairly characteristic appearance of short 
stature, webbed neck, shield chest, and an increased 
carrying angle of the elbow, associated with primary 
amenorrhea (absence of menses) throughout life. The
ovaries of women with Turner syndrome are des- 
cribed as ‘streak’ ovaries in that they lack oocytes and 
the normal associated follicular structures. In approxi- 
mately 60% of Turner women, the karyotype is pure 
45,X. In the remaining individuals, the karyotype can show mosaicism or structural abnormalities of the X or Y chromosome (e.g., 45,X/46,XY).


Premature ovarian failure 

Premature ovarian failure is also termed premature 
menopause. It is defined by secondary amenorrhea
(absence of menses) before age 40. It is believed that 
women enter menopause when oocyte reserves 
decrease from an initial population of approximately 
500000 at birth to approximately 1000. Premature 
ovarian failure, especially at a young age, can be 
caused by deletions on the long arm of the X chromo- 
some. There are likely three or four different regions 
of the X chromosome required for oocyte production 
and deletions in any of these regions may cause pre- 
mature ovarian failure. The genes that map to these 
regions have not yet been identified. 

Normal development of oocytes is critically 
dependent upon the pituitary hormones, luteinizing 
hormone (LH) and follicle-stimulating hormone
(FSH). The failure of oocyte development in women 
with a normal XX karyotype was considered to be un- 
related to these pituitary hormones until a connection 
was made in studies of ovarian failure in Finnish 
women. Classical human genetic strategies were used to map a locus called ODG1 (Ovarian DysGenesis 1) to a region of chromosome 2, and close examination of this region revealed that it contained the gene encoding the FSH receptor. The open
reading frame of the FSH receptor gene was sequenced 
from Finnish women with ovarian failure and revealed 
a number of mutations that alter the binding of FSH to 
its receptor. This work implies that the FSH receptor 
gene is required for normal oocyte development. 
However, the incidence of mutations in this gene in 
other ethnic groups is not yet known. 


Genetic Infertility Associated with Sperm Production Problems

Genetic conditions that impair the development of sperm in the testis tend to result from structural or numerical chromosomal abnormalities. Even in the most severe cases of low sperm production, however, biological paternity is possible with assisted reproductive technologies such as intracytoplasmic sperm injection (ICSI), illustrated in Figure 1. The use of ICSI in such cases means that the genetic cause of infertility can be transmitted to offspring.


Klinefelter syndrome

Klinefelter syndrome is the most common genetic cause of azoospermia in men, accounting for 14% of cases. In this abnormality of chromosome number, 90% of affected men carry an extra X chromosome (47,XXY) and 10% are mosaic, with a combination of XXY and XY cells (47,XXY/46,XY). The syndrome may present with increased height, decreased intelligence, varicosities, obesity, diabetes, leukemia, an increased likelihood of extragonadal germ cell tumors, and breast cancer (20 times more frequent than in normal males). Paternity with this syndrome is rare and more likely in the mosaic or milder form of the disease. Recently, paternity has been reported in several nonmosaic XXY men with the use of ICSI.

Figure 1 The intracytoplasmic sperm injection (ICSI) procedure. (A) A mature oocyte (left) is readied for injection with a sperm (arrow) in a micropipette under high-power microscopy. (B) The micropipette is placed directly into the oocyte and the sperm deposited in the cytoplasm.


XYY syndrome 

XYY syndrome is another abnormality of chromosome number that can result in infertility. Typically, men with 47,XYY have normal internal and external genitalia but are taller than average. Semen analyses show either severe oligospermia or azoospermia. Testis biopsies often demonstrate arrested germ cell development or complete absence of germ cells (Sertoli cell-only syndrome).


XX male syndrome 

XX male syndrome is a structural chromosomal condition that presents as a male with azoospermia. Typically, the external and internal genitalia are normal male. Testis biopsy usually reveals an absence of spermatogenesis. The most likely explanation for the disease is that the sex-determining region of the Y (SRY), or testis-determining region, has been translocated from the Y to another chromosome. Thus, testis differentiation occurs, but other Y chromosome genes required for sperm production (see below) are not similarly translocated, with resultant sterility.


Noonan syndrome 

Noonan syndrome in men presents with a phenotype resembling Turner syndrome (45,X). However, the karyotype in these men is a normal 46,XY, and the underlying genetic defect has not yet been identified. Typically,
these men have dysmorphic features such as webbed 
neck, short stature, low-set ears and wide-set eyes. At 
birth, 75% will have cryptorchidism (undescended 
testes) that may limit fertility in adulthood. 


Immotile cilia syndromes 

Immotile cilia syndromes are a heterogeneous group 
of disorders in which sperm motility is reduced or 
absent. The sperm defects are based on abnormalities 
in the motor apparatus or axoneme of sperm and 
other ciliated cells. Normally, the ten pairs of microtubules within the sperm tail (nine outer doublets and a central pair) are connected by dynein arms (ATPases) that drive microtubule sliding and, therefore, sperm tail motion. In these conditions, various defects in the dynein arms cause deficits in ciliary motion and sperm activity. Most immotile cilia cases are diagnosed in childhood because of respiratory and sinus difficulties. Cilia within the retina and ear may also be defective and lead to retinitis pigmentosa and deafness (Usher syndrome). Men with immotile cilia
characteristically have completely nonmotile but 
viable sperm in normal numbers. Depending on the 
severity of the ciliary defect, some sperm motility can 
be present. 


Azoospermia gene(s) 

Approximately 10-15% of men with azoospermia 
have structural changes in the Y chromosome. The 
sex-determining region (SRY) of the Y chromosome 
that controls testis differentiation is intact, but dele- 
tions may exist on the long arm of the chromosome 
(Yq) that result in azoospermia or severe oligospermia 
(Figure 2). 

A relationship between the Y chromosome and spermatogenesis was originally postulated on the basis of structural changes in the chromosome detected by karyotyping in a population of men with azoospermia. This led to the hypothesis that the Y chromosome held an ‘azoospermia factor’ (AZF). A mutation in, or absence of, AZF was thought to account for the azoospermia in men with observed deletions of Yq. Since then, more sophisticated analyses of the Y chromosome have indicated that three regions may carry AZF genes. The exact function of these suspected genes in spermatogenesis has not yet been clearly delineated, as their gene products are only beginning to be characterized. Genes identified include RBM (RNA-Binding Motif), DAZ (Deleted in AZoospermia), and a number of others, as shown in Figure 2. Men who carry these deletions will likely pass them to their sons if assisted reproductive technology is used to achieve paternity.

Figure 2 Three regions of the Y chromosome are required for fertility in men. They are termed the AZFa, AZFb, and AZFc regions.


Genetic Infertility Associated with 
Reproductive Tract Abnormalities 


Female Reproductive Tract Abnormalities 
Infertility can often be traced to abnormal develop- 
ment of the female reproductive tract, including the 
ovaries, oviducts, uterus, and vagina. Although it is 
clear that genetic causes for abnormal development 
exist, they are likely to be polygenic or multifactorial 
in nature. Major female reproductive tract abnormal- 
ities include endometriosis, polycystic ovarian syn- 
drome, and anomalies of uterine structure. 


Endometriosis 

Endometriosis is a complex disorder characterized by 
the presence of endometrial glands and stroma outside 
of the uterus. The most frequent sites of endometriosis 
are the ovaries, the uterosacral ligaments, the anterior 
and posterior cul-de-sac, and the posterior broad liga- 
ments. It is estimated that 3-10% of women of repro- 
ductive age have endometriosis and that 25-35% of 
infertile women have endometriosis. No genes that 
cause endometriosis have been identified. Yet it is 
likely that genetic factors influence susceptibility to 
endometriosis. Numerous studies have found a 5- to 
10-fold increase in the incidence of the disorder in 
first-degree relatives of patients with endometriosis 
when compared with control groups. 


Polycystic ovarian syndrome 

Polycystic ovarian syndrome is characterized by ano- 
vulation associated with the persistence of numerous 
cysts and a continuous secretion of gonadotropins 
and sex steroids. Similar to endometriosis, polycystic 
ovarian syndrome is common and may occur in 10- 
15% of women with normal reproductive function. 
Among infertile women with anovulation, polycystic 
ovaries are detected in 75% of cases. The genetics of 
polycystic ovarian syndrome are complex, yet studies 
suggest that, like endometriosis, there may be a 5- to 


10-fold increase in the disorder in first-degree relatives 
of affected patients compared with controls. To date, no 
genes implicated in the disorder have been identified. 


Uterine abnormalities 

Uterine abnormalities due to defects in müllerian development (as in Mayer-Rokitansky-Küster-Hauser syndrome) are a relatively common cause of primary amenorrhea. Abnormalities range from incomplete development of the vagina to the complete absence of all müllerian structures (fallopian tubes, uterus, and upper vagina). Normal development of these structures depends on the absence of müllerian inhibiting substance (MIS) activity, because in males MIS causes the müllerian ducts to regress. Yet mutations or structural alterations in the MIS gene have not been identified in affected women. Some studies suggest that in families affected by Mayer-Rokitansky-Küster-Hauser syndrome, the uterine abnormality is likely to be caused by mutations in three or four different genes.


Male Reproductive Tract Abnormalities 

In 10% of male infertility cases, there is abnormal 
development of the male reproductive tract. Abnormal- 
ities of wolffian duct development may affect the 
epididymis, vas deferens, seminal vesicles or associ- 
ated ejaculatory apparatus, and generally result in 
obstruction to the flow of sperm from the testis. As 
with abnormal development of the female reproduct- 
ive tract, such genetic conditions in men are pre- 
dominantly polygenic or multifactorial in nature. 
This discussion excludes conditions that present at 
birth or childhood with ambiguous genitalia (intersex 
disorders). 


Cystic fibrosis 

Cystic fibrosis is the most common fatal autosomal 
recessive disorder in the United States. It is associated 
with more than 550 known mutations in the CFTR gene. The disease manifests with fluid and electrolyte abnormalities (abnormal sweat chloride test) and presents with chronic lung obstruction and infections, pancreatic insufficiency, and infertility. Notably, 98% of men with cystic fibrosis (CF) also have wolffian duct abnormalities. The body and tail of the epididymis, vas deferens, seminal vesicles, and ejaculatory ducts are atrophic, fibrotic, or completely absent. Pituitary-gonadal hormones and spermatogenesis are usually
normal. Fertility is possible with assisted reproductive 
technology such as ICSI. 


Congenital absence of vas deferens 

Congenital absence of the vas deferens (CAVD) accounts for 1-2% of all cases of infertility and up to 5% of azoospermic men. Men with this condition have no palpable vas deferens (on one or both sides) on physical examination (Figure 3). As in CF, the rest of the wolffian duct system may also be abnormal and is largely unreconstructable. Recently, this disease has been shown to be a genetic forme fruste of CF, even though the vast majority of these men show no symptoms of CF. In men with bilateral vasal absence, 65% harbor a detectable CF mutation. In addition, 15% of these men have renal malformations, most commonly unilateral renal agenesis. In patients with unilateral vasal absence, the incidence of detectable CF mutations is lower, and the incidence of renal agenesis approaches 40%. Pituitary-gonadal hormones are usually normal, as is spermatogenesis.

Figure 3 Illustration of scrotal anatomy. In congenital absence of the vas deferens (CAVD), there is a normal testis (T), but the epididymis and vas deferens (vas) are abnormal. The caput epididymis (caput) is present and attached to the testis, but the corpus and cauda epididymis and the vas deferens are absent (stippled areas).


Young syndrome 

Young syndrome presents with the clinical triad 
of chronic sinusitis, bronchiectasis, and obstructive 
azoospermia. The obstruction is located in the epi- 
didymis, usually near the junction of the head and 
body. Since obstruction may not occur until well 
after puberty, fertility is possible in some patients. 
The pathophysiology of the condition is unclear but may involve abnormal ciliary function or abnormal mucus quality. Pituitary-gonadal hormones and spermatogenesis are normal in these men. Reconstructive microsurgery can be attempted but usually meets with lower success rates than in other obstructive conditions.


Idiopathic epididymal obstruction 

Idiopathic epididymal obstruction is a relatively 
uncommon, but well-recognized condition found 
in otherwise healthy azoospermic men in which the 
small ducts within the epididymis are obstructed. 
It can be successfully treated with microsurgical re- 
construction. There is recent evidence linking this 
condition with CF: in one series, 47% of men so 
obstructed were seen to harbor a gene mutation asso- 
ciated with CF. This implies that up to one-half of 
patients with obstruction in the epididymis may in 
fact have a genetic predisposition for the problem. 


Summary 


Although our understanding of the genetic causes of female and male infertility is still limited, it is already clear that research in this field has the potential to explain the origins of many cases of presently unexplained infertility. Given recent developments in assisted reproduction, it is also important for patients to understand that genetic infertility may be passed to offspring.




See also: Ethics and Genetics; Fertilization 
Influenza, caused by the influenza virus, is a highly contagious infection of the nose, throat, bronchial tubes, and lungs. Its severity and recurrence are due to the ability of the virus to mutate quickly and thus reinfect populations that have already built up antibodies to the virus through a previous infection.

The virus evolves in two ways. Mutations gradually 
build up through continued replication of the viral 
RNA. This antigenic ‘drift’ allows the virus to evade 
the immune system of the host, even if it has been 
previously infected with an older version of the virus. 
The virus can also change abruptly through replacement of the genes for hemagglutinin and neuraminidase, the proteins that make up part of its outer coat. This antigenic ‘shift’ results in a new subtype of virus that has no immunological relation to the previous subtype, which accounts for the disease’s virulence when a new form enters the population.

There are three types of influenza virus: type A causes the most severe infections and type C the mildest. Type A viruses undergo both antigenic drift and
shift, while type B viruses change only through anti- 
genic drift. Type C viruses cause mild illness and 
do not lead to epidemics. Type A viruses are further 
categorized by differences in their hemagglutinin and 
neuraminidase coat proteins. There are at least 15 
varieties of hemagglutinin (H) and nine varieties of 
neuraminidase (N) that can combine to create different strains, so type A viruses are named according to the H and N proteins they carry. Because of the
virus’s ability to mutate quickly and constantly (1% 
change per year in hemagglutinin), inoculations 
against it are only temporarily effective. The World 
Health Organization and the Centers for Disease 
Control and Prevention oversee influenza surveil- 
lance and make recommendations for the next year’s 
vaccines based on the virus’s mutations the previous 
year. The most common subtypes of influenza A are designated A(H1N1) and A(H3N2). These, along with an influenza type B strain, are included in the trivalent vaccines produced each year against the influenza virus.
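The type A naming scheme pairs one hemagglutinin designation with one neuraminidase designation. Using the counts given above (at least 15 H and nine N varieties), a short Python sketch enumerates the possible A(HxNy) labels; the `subtype_names` helper is hypothetical, and it counts name combinations only, saying nothing about which subtypes actually occur or circulate.

```python
from itertools import product

H_VARIETIES = 15  # "at least 15" hemagglutinins (figure from the text)
N_VARIETIES = 9   # nine neuraminidases (figure from the text)


def subtype_names(n_h: int = H_VARIETIES, n_n: int = N_VARIETIES) -> list[str]:
    """All A(HxNy) labels that the H and N designations could combine into."""
    return [f"A(H{h}N{n})" for h, n in product(range(1, n_h + 1), range(1, n_n + 1))]


names = subtype_names()
print(len(names))                              # number of possible H/N pairings
print("A(H1N1)" in names, "A(H3N2)" in names)  # the two common subtypes
```

With 15 × 9 designations there are 135 possible labels, of which the text names two as the most common in humans.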

Influenza infection can be severe or even lethal. 
Death, particularly in the young, elderly, or immuno- 
compromised, is generally caused by cardiopulmon- 
ary or upper respiratory complications associated 
with the infection. Influenza infection peaks during 
the winter months and, when an antigenic shift occurs, 
can spread pandemically. The most notable of these, 
the Spanish Flu Pandemic of 1918, caused between 20 
and 40 million deaths worldwide. One-fifth of the 
world population was infected. More localized epi- 
demics occur frequently and have led to the recom- 
mendation of yearly flu vaccinations, particularly in 
susceptible populations. 




See also: Virus 


Inheritance 


J Merriam 


Copyright © 2001 Academic Press 
doi: 10.1006/rwgn.2001.0691


The evidence for inheritance includes circumstantial anecdotes, such as the ancient statement that “like begets like,” or the more contemporary statement that “traits run in families.” It also includes the sophisticated recognition that the experimental manipulation of traits through breeding, with consistent and predictable results, requires a mechanism of determinative factors, or genes, that are transmitted from parent to offspring. Inheritance refers to the mechanism by which genes, and more specifically alleles (the permanent, distinguishable forms of a gene), are transmitted from parent to offspring. By examining how related individuals share traits, and by following distinct, rare traits, researchers have worked out details of the inheritance mechanism that explain, for instance, how certain traits reappear after seeming to skip generations. This work has progressed so far that inheritance can refer either to the mechanism for the transmission of specific traits or to the transmission of all the biological information required for life, without specifying genotypes.


See also: Mendelian Inheritance; Quantitative 
Inheritance 
Rickets is a disorder in which there is a failure of mineralization of bone and an accompanying defect in remodeling of growing bone in children. The mineral deposit is primarily made up of calcium and phosphate, as hydroxyapatite, so disorders of either calcium or phosphate regulation can be expected to cause rickets.

The calcium disorders that cause inherited rickets are primarily those related to the metabolism or action of vitamin D. Cholecalciferol (vitamin D3) can be formed in the skin by UV irradiation of 7-dehydrocholesterol, or it can be absorbed from the diet. It is metabolized in the liver to 25-hydroxyvitamin D, which can be further hydroxylated in the kidney by a 1α-hydroxylase enzyme to the active form, namely 1,25-dihydroxycholecalciferol. Ergocalciferol (vitamin D2) is a product of plants that can be metabolized similarly. The term ‘1,25-dihydroxyvitamin D’ includes both the vitamin D2 and the vitamin D3 forms. It can be regarded as a hormone in that it is produced by one organ, namely the kidney, and enters the circulation to act on another organ, particularly the intestine, where it increases calcium absorption. In the circulation it is bound to a transport protein, vitamin D-binding protein, so it is carried like other steroid hormones. Within the target cells it also behaves like a steroid hormone, being transported to the nucleus by a vitamin D receptor protein, which, when complexed to ligand, can bind to DNA through its zinc fingers to modulate transcription, for example of calcium-binding protein in the intestine.
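The activation pathway described above is a short linear sequence of steps, each in a different tissue. The Python sketch below records those steps as given in the text and reports which products are still formed when one step is blocked; the list and the `products_formed` function are hypothetical illustrations, not a physiological model.

```python
from typing import Optional

# (precursor, product, site of conversion), in pathway order, as described in the text.
# Dietary vitamin D3 enters at the second step, bypassing the skin.
VITAMIN_D3_PATHWAY = [
    ("7-dehydrocholesterol", "cholecalciferol (vitamin D3)", "skin (UV irradiation)"),
    ("cholecalciferol (vitamin D3)", "25-hydroxyvitamin D", "liver"),
    ("25-hydroxyvitamin D", "1,25-dihydroxyvitamin D", "kidney (1alpha-hydroxylase)"),
]


def products_formed(blocked_step: Optional[int] = None) -> list[str]:
    """Products made in order, stopping before a blocked step (0-based index)."""
    formed = []
    for step, (_precursor, product, _site) in enumerate(VITAMIN_D3_PATHWAY):
        if step == blocked_step:
            break
        formed.append(product)
    return formed


print(products_formed())
# A 1alpha-hydroxylase defect blocks the last (renal) step:
print(products_formed(blocked_step=2))
```

Blocking the renal step leaves 25-hydroxyvitamin D as the last product formed, which matches the biochemical picture of the hydroxylation defect described below.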

The phosphate disorders that cause inherited rick- 
ets are those involving defects in renal tubular phos- 
phate reabsorption. This is an active process that 
involves sodium-dependent phosphate transporters, the genes for which have been cloned. However, mutations in these phosphate cotransporter genes have not been found, so these genes are not relevant to the conditions described here. When there is excessive renal tubular
loss of phosphate, hypophosphatemia develops and 
that leads to defective mineralization of bone and so 
to rickets. 

There are five situations in which rickets develops on the basis of a known gene mutation. Two of these
are related to vitamin D, and one is a consequence of a 
failure of calcium reabsorption in the renal tubule. The 
other two forms are due to failure to reabsorb phos- 
phate in the renal tubules. 


Rickets due to Inherited 
Abnormalities in the Synthesis 
or Action of Vitamin D 


Defect in Vitamin D Hydroxylation

The clinical features of this disorder (OMIM 264 700) 
were first described as pseudovitamin D-deficiency 
rickets. It seems likely to be an autosomal recessive 
disease. The phenotype consists of severe rickets. 
There is hypocalcemia and the characteristic feature 
is finding a normal concentration of circulating 
25-hydroxyvitamin D with a low concentration of 
circulating 1,25-dihydroxyvitamin D. The rickets in 
these patients heals completely after treatment with 
small doses of 1,25-dihydroxyvitamin D. 

The 1α-hydroxylation of 25-hydroxyvitamin D occurs in the renal tubules, under the influence of 25-hydroxyvitamin D 1α-hydroxylase (P450c1α). The human gene maps to locus 12q14 and has been cloned; it consists of nine exons spanning a region of approximately 4.8 kb. A transcript of 2.5 kb has been detected in renal tissue. The 508-amino acid P450c1α protein has a predicted topology similar to that of mitochondrial cytochrome P450 enzymes, with a putative N-terminal mitochondrial signal sequence and conserved ferredoxin- and heme-binding sites. Mutations of this gene (also called CYP27B1) have been found in a number of families of varying ethnic origin. These mutations include single base-pair substitutions, causing alterations in single amino acid residues, as well as deletions, resulting in loss of function of the enzyme.


Vitamin D-Resistant Rickets with End Organ 
Unresponsiveness 

Patients with this condition (OMIM 277 440) have 
resistance to treatment with vitamin D in any form. 
Some of the patients have associated alopecia. It has been suggested that the rickets is more severe when alopecia is present, but this is not the case. The condition is inherited as an autosomal recessive trait and occurs particularly in Arab countries but also in
Japan and the Philippines. Biochemically the hallmark 
of the condition is the presence of high circulating con- 
centrations of 1,25-dihydroxyvitamin D. The disease 
can be treated effectively by infusions of calcium, over- 
coming the defect in calcium absorption, but the infu- 
sions need to continue for a long time, usually daily, for 
about a year. It is remarkable that with this treatment 
the healing can be complete and that relapse does not 
occur for several years after the treatment has stopped. 
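The two vitamin D-related forms differ in their circulating metabolites: normal 25-hydroxyvitamin D with low 1,25-dihydroxyvitamin D in the hydroxylation defect, and high 1,25-dihydroxyvitamin D in end-organ unresponsiveness. The Python sketch below restates that distinction; the `Level` categories stand in for laboratory results and, like the function name, are hypothetical. It is a summary of the text, not a diagnostic tool.

```python
from enum import Enum


class Level(Enum):
    LOW = "low"
    NORMAL = "normal"
    HIGH = "high"


def vitamin_d_rickets_pattern(hydroxy_25: Level, dihydroxy_1_25: Level) -> str:
    """Match circulating vitamin D metabolites to the two patterns in the text."""
    if hydroxy_25 is Level.NORMAL and dihydroxy_1_25 is Level.LOW:
        return "consistent with defective 1alpha-hydroxylation (OMIM 264700)"
    if dihydroxy_1_25 is Level.HIGH:
        return "consistent with end-organ unresponsiveness (OMIM 277440)"
    return "neither pattern described here"


print(vitamin_d_rickets_pattern(Level.NORMAL, Level.LOW))
print(vitamin_d_rickets_pattern(Level.NORMAL, Level.HIGH))
```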

The condition is generally due to mutations in the 
gene for the vitamin D receptor (locus 12q12-q14). The
gene consists of 11 exons, spanning approximately 
75 kb. Exons 2 and 3 encode the two zinc fingers that 
are responsible for binding to DNA, while exons 7, 8, 
and 9 encode the ligand binding domain, which com- 
plexes 1,25-dihydroxyvitamin D. Mutations of either 
domain can cause rickets. However, in one patient 
with the typical phenotype, no mutation was found, 
despite sequencing the whole of the coding region and 
large parts of the noncoding regions. A knockout 
model in mice, with deletion of the vitamin D receptor 
gene, produces a phenotype that includes alopecia as 
well as rickets. The effects of mutations in the DNA 
binding domain can be analyzed at the crystallo- 
graphic level, by comparing the known crystal struc- 
ture of the DNA-binding part of the glucocorticoid 
receptor, which is presumed to have a similar structure 
to the corresponding part of the vitamin D receptor, 
for which only the amino acid sequence is known. All 
the mutations in the DNA-binding domain of the vita- 
min D receptor that cause rickets affect conserved resi- 
dues that have a particular function (such as hydrogen 
bonding between the proteins and the DNA) in the 
crystal structure of the glucocorticoid receptor-DNA 
complex. The larger ligand-binding domain of the vita- 
min D receptor has itself been crystallized and its 
structure when complexed to ligand has been estab- 
lished. There is considerable homology in structure 
between this receptor and that for thyroid hormone 
and for the retinoid receptors. The mutations in its 
ligand-binding domain that cause rickets affect resi- 
dues that are important for dimerization of the vitamin 
D receptor to the retinoid X receptor, dimerization 
which is necessary for action of the vitamin. Thus in 
this case it is possible to consider the effects of muta- 
tions causing rickets at the Angstrom level. 


Hypercalciuric Rickets 


Dent’s disease (OMIM 300009) was originally described as a combination of rickets and hypercalciuria. It later became apparent that in the same families a variety of other phenotypes could occur, including renal tubular proteinuria, nephrocalcinosis and renal calculi, and the development of renal failure. Within any one family the phenotype is variable, and in some families rickets does not occur. As a result of this variable phenotype, the condition has had various names, including ‘X-linked recessive nephrocalcinosis.’

The disease maps to Xp11.22. Mutations in this condition led to the discovery of the voltage-gated chloride channel gene CLCN5. The gene is organized into 12 exons spanning 25-30 kb of genomic DNA. Mutations of the same gene have been found in X-linked recessive Dent’s disease, X-linked recessive nephrocalcinosis, and X-linked recessive hypophosphatemic rickets, implying that these are all variants of the same disease phenotype. How this chloride channel affects the handling of calcium and protein in the renal tubule remains to be established.


Rickets due to Renal Tubular Phosphate Leak


X-Linked Dominant Hypophosphatemic Rickets

This condition (OMIM 307800) is characterized by severe rickets, with a low serum phosphate concentration and an inappropriately raised level of urinary phosphate excretion. Paradoxically, bone density in this condition is raised, even though mineralization is defective. In adults, the increased bone density may be associated with ossification of intraspinal ligaments, and there may occasionally be cord compression due to exostoses. The condition responds partially to treatment with oral phosphate supplements, which have to be accompanied by vitamin D treatment, since phosphate on its own produces hypocalcemia.

In this condition there are mutations of the PEX gene. The acronym refers to a gene involved in phosphate handling, with homology to endopeptidases, on the X chromosome (locus Xp22.2-p22.1). The gene is also known as the Phex gene. The predicted protein has a small intracellular region, a single transmembrane domain, and a large extracellular catalytic domain. The homology with metalloproteinases, particularly neutral endopeptidase, was unexpected, and the mechanism whereby mutation of this gene causes rickets is not clear. By analogy with a tumor-associated form of rickets, it is possible that the enzyme acts on a putative phosphate-regulating hormone, ‘phosphatonin.’ The gene is expressed in bone cells and in the kidney, but its role remains unclear. In X-linked hypophosphatemic rickets, detectable mutations of this gene are found in about three-fourths of patients. These mutations include deletions, which may be large or small, and point mutations leading to single amino acid changes or splice-site alterations. There are two mouse models. The first, the Hyp mouse, arose from a spontaneous mutation; the second is the Gyr mouse, which has hypophosphatemic rickets plus a gyratory movement. Mutations of the mouse homolog, Pex, have been found in both; in the Gyr mouse the mutation is a deletion that also includes an adjacent gene. In the Hyp mouse there is evidence that a hormonal mechanism is involved, which provides some support for the possibility that the PEX gene product acts upon the as yet unidentified hormone.


Autosomal Dominant Hypophosphatemia 
This condition (OMIM 193100) is similar to X-linked dominant hypophosphatemic rickets, but increased bone density does not seem to be a feature; in fact, osteoporosis may become a problem in later life. Remarkably, the biochemical and clinical features of autosomal dominant hypophosphatemic rickets can disappear in late childhood, although they may recur in later life. This can make it difficult to establish the true phenotype in an adult, especially since the severity of the condition can vary within the same family.

The gene causing this condition has recently been 
identified and mutations have been established. The 
gene encodes a protein that is homologous to fibro- 
blast growth factors and has been given the name 
FGF23. The mechanism whereby alterations in such a protein lead to defects in renal tubular phosphate reabsorption is not clear.

In conclusion, of the five genes identified so far whose mutation leads to rickets, the nature of the gene product was quite unexpected in three.


Further Reading 
Online Mendelian Inheritance in Man (OMIM), http://www.ncbi.nlm.nih.gov/omim/.


See also: Growth Factors; Sex Linkage; Vitamins 
Initiation factors are proteins other than RNA polymerase that are required for correct initiation of transcription (transcription initiation factors). The term also refers to proteins which, in addition to ribosomal proteins, are required for initiation of translation (translation initiation factors). In eubacteria, transcription initiation factors are called σ factors. In eukaryotes, transcription initiation factors usually refer to the general transcription factors required for transcription initiation from most promoters. For RNA polymerase II these are TFIIA, TFIIB, TFIID, TFIIE, TFIIF, and TFIIH; for RNA polymerase III they are TFIIIB, together with TFIIIC for tRNA genes and TFIIIA and TFIIIC for 5S rRNA genes; and for RNA polymerase I they are SL1 and UBF in humans, and TBP, Rrn3, core factor, and upstream activating factor in Saccharomyces cerevisiae. Bacterial translation initiation factors are called IF1, IF2, and IF3. Eukaryotic translation initiation factors include eIF1, eIF2, eIF3, eIF4, and eIF5. The eIF4E subunit of eIF4 binds to the 5’ cap structure on eukaryotic mRNAs.


See also: Transcription 


Insertion Sequence 


M Chandler 


Copyright © 2001 Academic Press 
doi: 10.1006/rwgn.2001.0696


Discovery 


Insertion sequences (ISs) are small pieces of DNA 
which move within or between genomes using their 
own specialized recombination systems. They were 
discovered in the mid-1960s in studies of gene expres- 
sion in Escherichia coli and its bacteriophages. Initially 
recognized by their ability to generate highly polar 
but unstable mutations in the gal and lac operons 
and in the ‘early’ genes of bacteriophage lambda, 
they were later identified by electron microscopy as 
short insertions of DNA. The repeated isolation of a limited number of identical DNA sequences associated with these unstable mutations led to their being named insertion sequences. The similarity between ISs and the mobile genetic elements described by Barbara McClintock in Zea mays in the 1940s became
clear when it was realized that ISs formed an integral 
part of the E. coli genome and that their mutagenic 
activity was a result of their movement to new genetic 
locations. At about this time, transmissible resistance 
to antibiotics was also observed. Genetic studies of 
this phenomenon implicated an analogous mechanism 
of gene mobility in the distribution of these drug 
resistance genes among the conjugal plasmids and 
phage involved in this transmission. Subsequently, 
insertion sequences were shown in many cases to 
play a key role in mobilizing these genes. 
General Structure


ISs are genetically compact (Figure 1), typically less than 2.5 kb in length, and carry only the genes necessary for their transposition. They contain one, or sometimes two, open reading frames covering almost the entire length of the element. The products are specialized recombinases called transposases (Tpases). ISs characteristically terminate in short (10-40 bp) inverted repeat sequences (IRs) with imperfect homology. By convention, the terminal inverted repeat proximal to the Tpase promoter is defined as the left repeat, IRL, and the distal IR as the right repeat, IRR. In the majority of cases, ISs are flanked by short direct repeats of target DNA, which they generate on insertion. The length of this duplication is specific for each element and ranges from 2 to 13 bp.
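The insertion geometry described above can be illustrated with a short string model. This is a minimal sketch, not a description of any real element. The sequences, the insertion position, and the helper names (`revcomp`, `insert_with_tsd`, `ir_mismatches`) are hypothetical. The model assumes that staggered cuts `dup_len` bp apart on the two target strands, followed by gap filling, leave the same `dup_len` bases on both sides of the element. It scores the terminal repeats by comparing IRL with the reverse complement of IRR.

```python
# Toy model of IS insertion (hypothetical sequences, not a real element).
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def insert_with_tsd(target: str, pos: int, element: str, dup_len: int) -> str:
    """Insert `element` at `pos`, duplicating the dup_len target bases there.

    Assumes staggered cuts dup_len bp apart followed by gap filling, which
    leaves the same short target segment as a direct repeat on each side.
    """
    if not 0 <= pos <= len(target) - dup_len:
        raise ValueError("the duplicated segment must lie within the target")
    site = target[pos:pos + dup_len]
    return target[:pos] + site + element + site + target[pos + dup_len:]


def ir_mismatches(element: str, ir_len: int) -> int:
    """Count mismatches between IRL and the reverse complement of IRR."""
    irl, irr = element[:ir_len], element[-ir_len:]
    return sum(a != b for a, b in zip(irl, revcomp(irr)))


if __name__ == "__main__":
    element = "GGTACTTTTTGTACC"  # 5 bp IRL + 5 bp interior + 5 bp IRR
    print(insert_with_tsd("AAAACCCGGG", 3, element, 3))
    # AAAACCGGTACTTTTTGTACCACCCGGG  (ACC now appears on both sides)
    print(ir_mismatches(element, 5))            # 0: perfect inverted repeats
    print(ir_mismatches("GGTACTTTTTGTACA", 5))  # 1: imperfect, as in many ISs
```

With real elements, the duplication length would be set per IS family, as listed in Table 1.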


Occurrence and Variety 


ISs form an integral part of the chromosomes of many 
bacterial species and their extrachromosomal elements 
such as plasmids and bacteriophages. They have also 
been found in the genomes of many eukaryotes. ISs 
can represent a significant fraction of genomic and 
plasmid DNA. Although individually each IS is 
mobile at a low frequency (of the order of 1 × 10^-7 to 10^-4 per cell per generation), such movements rarely
become established on a population scale. The localization of many ISs is sufficiently stable within their host genomes to provide a specific and characteristic profile, which has made certain ISs useful markers in epidemiological studies. ISs have been characterized from most bacterial species analyzed to date, and over 600 have been described. They can be grouped into at least 17 families based on their genetic organization, similarities in their IRs and transposase sequences, the number of target base pairs duplicated on insertion, and their preference for given target DNA sequences (Table 1). As more of these elements are characterized, this classification will certainly continue to evolve. Such groupings have provided significant insights into important conserved features, which not only assist in understanding the phylogenetic relationships of ISs but also contribute to understanding different aspects of their function.

Figure 1 General organization of IS elements. The open box represents the IS element. Terminal inverted repeats (IRL and IRR) are shown as shaded boxes. A single open reading frame is shown within the IS. It stretches the entire length of the element and, although this is not always the case, is shown here to terminate within IRR. The indigenous Tpase promoter is shown located (by convention) in IRL. The arrows show that the protein acts on the ends of the element. The domain structure of the IRs is indicated by A (the region recognized by Tpase and involved in cleavage) and B (the region to which Tpase binds in a sequence-specific way). XXX represents the short direct target repeat sequence which is duplicated during the insertion event.


Role of ISs in Gene Transfer and Expression


In early studies of antibiotic resistance, resistance 
genes were often observed to be flanked by DNA 
sequences of between 1 and 2 kb in direct or inverted 
orientation. These segments of DNA proved to be ISs. 
By acting in concert, the flanking ISs are able to mobil- 
ize the intervening DNA segment. Such structures 
are known as compound or composite transposons. 
The mobilized genes are not limited to antibiotic re- 
sistance but can include virulence determinants and 
catabolic genes. ISs are thus important in the seques- 
tration, assembly, and transmission of sets of accessory 
functions in bacteria. Moreover, many elements can 
control expression of neighboring genes either by 
initiating transcription from indigenous IS promoters 
or, more commonly, by formation of hybrid promoters 
as a result of insertion. Many ISs carry outwardly directed -35 hexamers in their IRs and can generate functional promoters when inserted at the correct position with respect to a -10 element upstream of a host gene.
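The way an outward-facing -35 hexamer in an IR can pair with a host -10 element to form a hybrid promoter can be sketched as a simple consensus scan. This is a minimal sketch under stated assumptions. It uses the E. coli σ70 consensus hexamers TTGACA (-35) and TATAAT (-10), treats spacers of 15-19 bp (optimum 17 bp) as acceptable, and ranks candidates only by mismatch count. The function names and the mismatch threshold are hypothetical, and real promoter strength depends on much more than this.

```python
# Hypothetical scan for a hybrid promoter: a -35 box supplied by an IS end
# and a -10 box supplied by adjacent host DNA. The consensus values are the
# standard E. coli sigma-70 hexamers; the scoring is deliberately simplistic.
from typing import Optional, Tuple

MINUS35 = "TTGACA"
MINUS10 = "TATAAT"


def mismatches(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def best_promoter(seq: str, spacers=range(15, 20),
                  max_mm: int = 2) -> Optional[Tuple[int, int, int, int]]:
    """Return (total mismatches, |spacer - 17|, -35 start, spacer) or None.

    Each box may carry up to max_mm mismatches. Ties are broken in favour
    of spacers closer to the optimal 17 bp.
    """
    hits = []
    for i in range(len(seq) - 5):
        mm35 = mismatches(seq[i:i + 6], MINUS35)
        if mm35 > max_mm:
            continue
        for spacer in spacers:
            j = i + 6 + spacer  # start of the candidate -10 box
            if j + 6 > len(seq):
                break
            mm10 = mismatches(seq[j:j + 6], MINUS10)
            if mm10 <= max_mm:
                hits.append((mm35 + mm10, abs(spacer - 17), i, spacer))
    return min(hits) if hits else None


if __name__ == "__main__":
    is_end = "GC" + MINUS35             # toy IR end with an outward -35 box
    host = "A" * 17 + MINUS10 + "GC"    # toy host DNA with a -10 box
    print(best_promoter(is_end + host))      # (0, 0, 2, 17)
    print(best_promoter(is_end + "C" * 30))  # None: no -10 partner
```

The second call shows the point made in the text. The -35 box in the IR is not enough on its own, and a promoter appears only when insertion places it at a suitable distance from a -10 element.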

Note that compound transposons differ fundamentally in organization from a second large class, transposons of the Tn3 family, in which the genes specifying accessory functions form an integral part of the transposon. However, this family also includes elements resembling ISs, in addition to more elaborate elements into which the typical accessory genes have been integrated.


Terminal Inverted Repeats 


Transposition requires DNA cleavage at the ends of the element and transfer of these DNA ends into a target molecule. The signals for recognition and processing by the transposase reside in the terminal IRs. Analysis of several different IRs suggests the presence of at least two functional domains (Figure 1). One, located within the IR (B), is involved in Tpase binding and probably ensures correct sequence-specific positioning of Tpase at the ends. The second (A) corresponds to 2-4 base pairs located at the tip of the IRs and is necessary for efficient cleavage and strand transfer. These bases, generally identical at both ends of the element, are presumably in intimate contact with the catalytic pocket of the Tpase and determine the specificity of the cleavage (and/or strand transfer) reactions. IRL and IRR are tacitly assumed to interact in a similar way with Tpase and to contribute identically to the reaction. This may, however, prove to be an oversimplification, and the subtle sequence differences found between the IRs of certain elements may reflect differential activity of the ends.

Table 1 The IS families

Family | Group | Size range (bp) | Direct target repeats (bp) | Ends | IR | ORF | Tpase
IS1 | - | 770 | 9 (8-11) | GGT | Y | 2 | lambda integrase?
IS3 | IS2 | 1200-1550 | 5 | TGA | Y | 2 | DDE
 | IS3 | | 3 (4) | | | 2 |
 | IS51 | | 3 (4) | | | 2 |
 | IS150 | | 3-5 | | | 2 |
 | IS407 | | 4 | | | 2 |
IS4 | - | 1300-1950 | 9-12 | C(A) | Y | 1 | DDE
IS5 | IS5 | 800-1350 | 4 | GG | Y | 1 | DDE
 | IS427 | | 2-3 | Ga/g | | (2) |
 | IS903 | | 9 | GGC | | 1 |
 | IS1031 | | 3 | GAG | | 1 |
 | ISH1 | | 8 | - | | 1 |
 | ISL2 | | 2-3 | - | | 1 |
IS6 | - | 750-900 | 8 | GG | Y | 1 | DDE
IS21 | - | 1950-2500 | 4 (5, 8) | TG | Y | 2 | DDE
IS30 | - | 1000-1250 | 2-3 | - | Y | 1 | DDE
IS66 | - | 2500-2700 | 8 | GTA | Y | >3 | -
IS91 | - | 1500-1850 | N | - | N | 1 | ssDNA Rep
IS110 | - | 1200-1550 | N | - | N | 1 | Site-specific recombinase
IS200/IS605 | - | 700-2000 | N | - | N | (2) | Complex organization
IS256 | - | 1300-1500 | 8-9 | Gg/a | Y | 1 | DDE; eukaryote relatives
IS630 | - | 1100-1200 | 2 | - | Y | 1 | DDE; eukaryote relatives
IS982 | - | 1000 | ? | AC | Y | 1 | DDE
IS1380 | - | 1650 | 4 | Ccg | Y | 1 | -
ISAs1 | - | 1200-1350 | 8 | Cc | Y | 1 | -
ISL3 | - | 1300-1550 | 8 | GG | Y | 1 | -

Size range in base pairs (bp) represents the typical range of each group. Less frequently observed lengths are included in parentheses. N, none or no; Ends, typical nucleotide sequences at the very ends of the element; IR, presence (Y) or absence (N) of terminal inverted repeats; ORF, number of open reading frames. DDE represents the common acidic triad presumed to be part of the active site of the transposase. ssDNA Rep indicates that the enzyme is a polymerase of the rolling circle type.

Indigenous IS promoters are often partially located in IRL (Figure 1). This arrangement would facilitate autoregulation of transposase expression by transposase binding. In addition to sites for Tpase and RNA polymerase binding, binding sites for other host-specified proteins involved in regulating Tpase expression or in modulating the transposition activity of the ends may be located within or close to the IRs.

Members of a small number of IS families (Table 1) 
do not exhibit terminal IRs and are also the only 
families which do not generate direct target repeats 
on insertion. This is presumably because such elem- 
ents have adopted fundamentally different transpos- 
ition mechanisms. 


Transposases: Domain Structure and Catalytic Site


Many Tpases encoded by ISs share a similar overall 
organization. A region involved in recognition of 
the ends is located in an N-terminal domain, while 
the catalytic core of the enzyme is located toward the 
C-terminal end. These enzymes also function as multimers and carry domains involved in multimerization. Indeed, in several cases, multimerization appears to be essential for DNA binding (see below).

Sequence alignment of most bacterial Tpases and the functionally related retroviral integrases, IN (which catalyze integration of the double-stranded viral cDNA into the host genome), revealed a common triad of acidic amino acids with a characteristic spacing, the DDE motif (Table 2A). This similarity was subsequently shown to include additional conserved amino acids and has also been detected in many other major transposons (Table 2B) and IS families (Table 1). Extensive mutagenesis of both IN and a limited number of Tpases has shown that the DDE motif is intimately involved in catalysis. Determination of the three-dimensional structures of IN and several Tpases confirmed the close juxtaposition of these residues and demonstrated that these enzymes share a related topological fold. This structural similarity is not limited to IN and Tpases but is also seen in RNase H and in RuvC, the endonuclease which processes recombination intermediates. These observations have led to the definition of a ‘superfamily’ of phosphoryltransferases.


Table 2 The DDE motif, showing representative transposases from various insertion sequence families (A) and transposases from other bacterial transposons (B)

Transposase (family) | N2 (D) | spacing | N3 (D) | spacing | C1 (E)
A
HIV-1 (IN) | 64 wqlDcth | (51) | 116 vhtDngsnf | (35) | 152 ynpqsQgviEsmNxKelK
IS911 (IS3) | 207 wcgDvty | (59) | 287 fhsDqgshy | (35) | 323 gnewNspmErffrslK
IS10 (IS4) | 97 vlvDwsad | (63) | 161 ivsDagfkv | (130) | 292 niyskRmqEetfralK
IS50 (IS4) | 119 siqDksr | (67) | 188 avcDreadi | (136) | 326 diythRwriEefHxawK
IS903 (IS5) | 121 iviDstg | (71) | 193 asaDgaydt | (65) | 259 niyskRmqiEeftralK
IS26 (IS6) | 78 whmDety | (59) | 138 intDkapay | (36) | 173 qikylNNviEcdugkiK
IS30 | 237 wegDlvs | (55) | 293 ltwDrgmel | (33) | 327 qspwqRgtnEntNgliR
IS21 (IstA) | 122 ighDwge | (61) | 184 vivDngkaa | (46) | 230 rrartKgkvErmvKylK
IS630 | 181 fyeDeva | (80) | 261 livDnyiih | (35) | 297 vyspwvNhvErlwgal
IS982 | 112 siiDsfp | (79) | 192 vigDmgylg | (45) | 237 nfskrRKviErvsfl
IS256 | 167 lmtDviy | (65) | 233 visDahkgl | (107) | 341 nrikstnliEringevR
Tc1 | 86 iwsDesk | (90) | 177 fqqDndpkh | (108) | 286 spspdiNpiEhmweeleR
B
Mu (MuA) | 269 ingDgyl | (66) | 336 itiDntrga | (55) | 392 kgwgqaKpvErafgvg
Tn7 (TnsA) | 28 (hgkDyip) | (85) | 114 mstDflvde | (34) | 149 erleKlelErrywqqK
Tn7 (TnsB) | 273 yeiDati | (87) | 361 llaDrgelm | (34) | 396 rrfdakgivEstfRel
Tn552 | 166 wqaDhtl | (73) | 240 fytDhgsdf | (35) | 276 gvprgRgkiErffetv
Tn3 | 689 asaDgmr | (75) | 765 imtDtagas | (129) | 895 riltqlNrgEsrHavaR

Uppercase letters indicate highly or partially conserved residues. The figure before each motif gives the position of its conserved residue in amino acids, and figures in parentheses give the number of residues between the conserved D, D, and E. Part A includes the HIV integrase protein to show its similarity to Tpases, and Tc1, a member of the eukaryote mariner/Tc family of insertion elements.
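The arithmetic behind the figures in Table 2 can be checked directly. The number in parentheses is the count of residues lying strictly between two conserved positions, so HIV-1 IN with D64, D116, and E152 gives 51 and 35. The sketch below assumes 1-based residue coordinates as in the table. `dde_spacers` and `find_d_e_pairs` are hypothetical helper names. The toy scan only finds a D and an E at a fixed distance, which is far weaker than the alignment-based comparison used to define the motif.

```python
# Hypothetical helpers illustrating DDE spacing; coordinates are 1-based.
from typing import List, Tuple


def dde_spacers(d1: int, d2: int, e: int) -> Tuple[int, int]:
    """Residues strictly between D1 and D2, and between D2 and E."""
    if not d1 < d2 < e:
        raise ValueError("expected D1 < D2 < E")
    return d2 - d1 - 1, e - d2 - 1


def find_d_e_pairs(protein: str, spacer: int = 35) -> List[Tuple[int, int]]:
    """1-based (D, E) positions with exactly `spacer` residues between them.

    This is a toy scan for the D..(35)..E spacing seen in many DDE enzymes.
    """
    hits = []
    for i, residue in enumerate(protein.upper()):
        j = i + spacer + 1
        if residue == "D" and j < len(protein) and protein[j].upper() == "E":
            hits.append((i + 1, j + 1))
    return hits


if __name__ == "__main__":
    print(dde_spacers(64, 116, 152))   # HIV-1 IN: (51, 35)
    print(dde_spacers(86, 177, 286))   # Tc1: (90, 108)
    toy = "A" * 10 + "D" + "G" * 35 + "E" + "K" * 5
    print(find_d_e_pairs(toy))         # [(11, 47)]
```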


Not all ISs exhibit a well-defined DDE triad (Table 1). For example, the Tpases of members of the IS91 family show strong similarities with replicases involved in rolling circle plasmid and bacteriophage replication. Members of the IS110 family appear to encode a novel type of site-specific recombinase, while the IS1 transposase shows limited similarity to phage lambda integrase.


Transposition Strategies 


Endonucleolytic cleavage of the phosphodiester bonds 
at the ends of the transposable element and their 
transfer into a target DNA molecule generally requires 
the assembly of a synaptic complex including the 
Tpase, the transposon ends, and target DNA. There 
are two principal modes of transposition, conservative 
and replicative, based on whether or not the element is 
copied in the course of its displacement. This is dic- 
tated by the nature and order of the cleavages at the 
ends (Figure 2): whether the transposon is liberated 
from its donor backbone by double strand cleavages 
or whether it remains attached following cleavage of 
only a single strand. 

The DNA cleavage and strand joining reactions 
necessary for transposition of many transposable 
elements with Tpases of the DDE type are remarkably similar. These Tpases catalyze endonucleolytic cleavage at each 3’ transposon end to liberate 3’ OH groups, which are then used in a concerted nucleophilic attack on the target molecule. An important feature
of the transposition reaction is therefore the way in 
which the 5’ end (second strand) is processed. 

Replicative transposition entails cleavage of only 
one strand at each transposon end and transfer into a 
target site in such a way as to create a replication fork 
(Figure 2). Some IS elements do not appear to process 
the second strand and simply undergo replicative 
transposition, or more precisely, ‘replicative integration.’ These include members of the Tn3 and IS6 families and perhaps IS1. If transposition is intermolecular, replication from the nascent fork(s) generates
cointegrates (replicon fusions), where donor and tar- 
get replicons are separated by a directly repeated copy 
of the element at each junction. Resolution of these 
structures to regenerate the donor and target mol- 
ecules, each carrying a single copy of the element, is 
accomplished by recombination between the two 
elements. This proceeds for some transposons by site- 
specific recombination promoted by a specialized 
transposon-specific enzyme distinct from the Tpase, 
the ‘resolvase’ (e.g., Tn3 family), or is carried out by the host homologous recombination system.
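The bookkeeping of replicative transposition can be made concrete with a string model of circular molecules, each written out from an arbitrary origin. This is a minimal sketch under simplifying assumptions. The element is treated as a fixed string that occurs nowhere else in either replicon, and the target site is taken to be the first `site_len` bases of the target. Resolution is modeled simply as recombination between the two directly repeated copies, without distinguishing a resolvase from host homologous recombination. All names and sequences are hypothetical.

```python
# Toy model: circular replicons written as strings from an arbitrary origin.
from typing import Tuple


def cointegrate(donor_backbone: str, element: str, target: str,
                site_len: int) -> str:
    """Replicon fusion: donor and target joined by two direct element copies.

    The target site (assumed here to be the first site_len bases of the
    target) is duplicated, one copy next to each element copy.
    """
    site, rest = target[:site_len], target[site_len:]
    return site + rest + site + element + donor_backbone + element


def resolve(fused: str, element: str) -> Tuple[str, str]:
    """Recombine the two element copies, giving two circles with one copy each."""
    first = fused.find(element)
    second = fused.find(element, first + len(element)) if first >= 0 else -1
    if second < 0:
        raise ValueError("need two direct copies of the element")
    donor = fused[first + len(element):second] + element
    target_with_insertion = fused[:first] + fused[second:]
    return donor, target_with_insertion


if __name__ == "__main__":
    fused = cointegrate("AAAA", "GGGG", "TACGTT", 3)
    print(fused)                   # TACGTTTACGGGGAAAAGGGG
    print(resolve(fused, "GGGG"))  # ('AAAAGGGG', 'TACGTTTACGGGG')
```

After resolution, the donor circle is regenerated with its original copy. The target circle carries a new copy flanked, around the circle, by the duplicated TAC site.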

In conservative or ‘cut-and-paste’ transposition, the element is excised from the donor site and reinserted into a target site without replication. This implies cleavage of both DNA strands at the ends of the element and their joining to target DNA to generate a simple insertion. The original donor DNA molecule is either degraded or repaired by host-specified enzymes. Different IS elements have adopted various strategies to separate themselves from the donor DNA backbone. For the IS4 family members IS10 and IS50 (Figure 2), the two breaks are not equivalent. 3’ cleavage occurs before 5’ cleavage, and the free 3’ OH generated by 3’ cleavage is itself used as the nucleophile to attack the second strand. This generates a hairpin structure at the transposon ends, which is subsequently hydrolyzed to regenerate the final 3’ OH ends that will undergo transfer to the target. The free ends are retained in a relatively stable complex with Tpase and generate a noncovalently closed excised transposon circle. This mechanism is reminiscent of V(D)J recombination, used in generating the immunoglobulin repertoire, although the V(D)J hairpin is generated on what might be considered the donor backbone ends. This chain of controlled consecutive reactions allows the repeated use of a single Tpase molecule bound to each end of the element.

A second strategy is used by IS2, IS3, IS150, and IS911 and presumably by other members of the large IS3 family. Here, Tpase promotes single-strand cleavage at one end of the transposon and its site-specific transfer to the same strand of the opposite end (Figure 2). This circularizes a single transposon strand, leaving the complementary strand attached to the donor backbone. This second transposon strand is then resolved to generate a double-stranded, covalently closed transposon circle in which the transposon ends are abutted. The resolution mechanism is not known but could involve simple cleavage and repair, or replication promoted by host proteins. The covalently attached ends can then undergo simultaneous single-strand cleavage and transfer to a target. This strategy of separating the transposon from its donor molecule may also have been adopted by members of the IS21 and IS30 families. While site-specific strand transfer from one end of the element to the other generates transposon circles, it can also occur between two elements carried by the same molecule. Transfer of ends between the two IS copies in a plasmid dimer, for example, would be expected to generate head-to-tail IS tandem dimers. This type of structure has been observed for IS21, IS2, IS30, and IS911, and is extremely active in transposition.

Of those ISs that do not carry a well-defined DDE triad, only IS91 has been analyzed in detail. As suggested by the similarity of its Tpase to rolling circle type replicases, IS91 appears to have adopted a polarized rolling circle transposition mode requiring a specific tetranucleotide target sequence which abuts IRR (Figure 2). ‘One-ended’ transposition products occur at high frequency in the absence of IRL. They carry a constant end defined by IRR and a variable end defined by a copy of the target consensus located in the donor plasmid. It is thought that donor strand cleavage results in a covalent complex between the 5’ IRR end and Tpase, followed by single-strand transfer into the target DNA at a site containing a consensus tetranucleotide. The attached single strand of the IS is displaced by replication in the donor molecule. Termination is triggered when the complex reaches either the 3’ IRL end or a tetranucleotide consensus sequence in the donor (Figure 2). This scheme does not, however, address how the element is replicated into the target molecule.

Figure 2 Transposition strategies. Transposon DNA is indicated by open boxes, or by shaded boxes for newly replicated transposon DNA. Donor DNA is indicated as stippled lines and target DNA as bold lines. Strand cleavage is shown as small vertical arrows. Nucleophilic attack of the phosphodiester bond (P) by the active 3’ hydroxyl (3’OH), resulting in strand transfer, is also indicated by arrows. The toothed region shown in the target DNA represents target duplications associated with insertion. DNA polarity is shown at the top of each panel. Note that in the case of IS91, the polarity of the target DNA has been inverted to facilitate drawing of the figure.


Target Specificity 


ISs show differing degrees of selectivity in their choice of target DNA sites. Sequence-specific insertion is exhibited to some degree by several elements and varies considerably in its stringency. It is strict in the case of IS91, which requires a GTTC/CTTG target sequence. It is less strict for members of the IS630 family (and the related eukaryotic mariner/Tc family), which require a TA dinucleotide in the target. It is also less strict for IS10, which prefers (but is not restricted to) the symmetric heptanucleotide in 5’-NGCTNAGCN-3’, and for IS231, which shows a preference for 5’-GGG(N)5CCC-3’. In the case of IS10, sequences immediately adjacent to the consensus have also been shown to influence target choice. A demonstration that IS10 Tpase directly influences target choice has been obtained by isolation of Tpase mutants which exhibit altered target preference. Other elements show regional preferences, such as DNA segments rich in GC (IS186) or AT (IS1), which could reflect more global parameters such as local DNA structure. Indeed, the degree of supercoiling (IS50), bent DNA (IS231), replication (IS102), transcription (IS102, Tn5/Tn10), and possibly protein-mediated targeting to (or exclusion from) transcriptional control regions have all been invoked as parameters which influence target choice.
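Consensus target preferences of this kind are easy to express as patterns in which N stands for any base. The sketch below is illustrative only. It converts a consensus into a regular expression with a lookahead so that overlapping sites are reported. It also checks whether a consensus reads the same on both strands, as the IS10 site does. The function names and the test sequences are hypothetical. A realistic analysis would need to allow mismatches, since IS10 prefers, but is not restricted to, its consensus.

```python
# Illustrative scan for target-site consensus sequences (N = any base).
import re
from typing import List

IUPAC_TO_REGEX = {"A": "A", "C": "C", "G": "G", "T": "T", "N": "[ACGT]"}
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def consensus_to_regex(consensus: str) -> "re.Pattern[str]":
    """Compile a consensus into a lookahead pattern that finds overlapping sites."""
    body = "".join(IUPAC_TO_REGEX[base] for base in consensus.upper())
    return re.compile(f"(?=({body}))")


def find_sites(seq: str, consensus: str) -> List[int]:
    """0-based start positions of every exact match on the given strand."""
    return [m.start() for m in consensus_to_regex(consensus).finditer(seq.upper())]


def is_palindromic(consensus: str) -> bool:
    """True if the consensus equals its own reverse complement."""
    return consensus == consensus.translate(COMPLEMENT)[::-1]


if __name__ == "__main__":
    print(find_sites("AAGCTTAGCAA", "NGCTNAGCN"))  # [1]
    print(is_palindromic("NGCTNAGCN"))            # True (IS10 consensus)
    print(is_palindromic("GTTC"))                 # False: not its own reverse complement
```

For a non-palindromic consensus such as the IS91 tetranucleotides, both strands would have to be scanned, for example by also searching the reverse complement of the sequence.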

Another phenomenon which may reflect insertion site specificity is the interdigitation of various intact or partial IS elements noted repeatedly in the literature. These are presumably the scars of consecutive but isolated transposition events resulting from selection for acquisition (or loss) of accessory genes. Some indication of the statistical significance of this is expected to emerge from the many bacterial genome sequencing projects under way. On the other hand, several ISs show a demonstrable preference for insertion into other elements. IS231 inserts into the terminal 38 bp of the transposon Tn4430, which include both the sequence-specific and conformational components described above, while IS21 has been reported to show a preference for insertion close to the end of a second copy. In the latter case, the site-specific DNA-binding properties of the Tpase are presumably implicated. At the mechanistic level, this phenomenon might be related to the capacity of IS10 Tpase to form synaptic complexes with IS10 ends located on separate DNA molecules.


Control of Transposition Activity 


High levels of transposition are likely to be disadvan- 
tageous to the host cell under normal growth condi- 
tions and ISs have adopted a variety of mechanisms 
to restrain this activity. The location of Tpase pro- 
moters partially in IRL would permit autoregulation 
by Tpase binding. Some ISs such as ISZ encode specific 
repressor proteins. Additionally, binding sites for a 
range of host encoded proteins are found within or 
close to IS ends. These proteins include IHF, FIS, and 
DnaA. Not only can their binding regulate trans- 
position activity per se, but can also provide rather 
subtle changes in the type of transposition products 
obtained. In some elements, Tpase promoter activity 
is also regulated by the state of methylation of neigh- 
boring sites. One example is Dam methylation of a 
GATC sequence in the Tpase promoter of IS10. Tran- 
scription directed by the promoter is reduced when 
the site is fully methylated (on both strands) compared 
with hemimethylated DNA. Methylation has the 
double effect of lowering transposition activity of 
the end and of timing bursts of transposase synthesis 
with the passage of a replication fork, which produces
transiently hemimethylated DNA. This ensures duplication of the element prior to transposition, an important consideration for elements that transpose in a conservative mode. Transcription can also be regulated by premature termination and mRNA processing. Transcription terminator sequences have been uncovered within the Tpase genes of IS1 and IS30 and are undoubtedly widespread.

Figure 3 Control strategies. (A) Sequestration of ribosomal binding sites. The figure shows the left end of an IS with its terminal inverted repeat represented by two shaded boxes. The transcript impinging from outside the element is shown as a dotted line above, and transcription driven by the indigenous promoter (Pint) is indicated below. Internal inverted repeat sequences are indicated by bold lines, and their relative orientation is shown by arrows. (B) Programmed translational frameshifting. Two consecutive open reading frames (A and B), together with their relative reading phases (0 and -1, respectively) and the region of overlap, are shown within the IS element. Below (bold line) is shown the overall secondary structure of the corresponding mRNA. The group of codons that permits the ribosome to slide back one nucleotide is also indicated. The bottom of the figure shows how frameshifting can assemble two different functions into one protein. Here this is represented by a helix-turn-helix motif (H-T-H) in the N-terminal region, which permits sequence-specific binding of the Tpase to the ends of the IS, and a DD(35)E motif in the C-terminal region, which is essential for catalysis.
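The Dam-methylation control described above can be summarized as a small state model. This is a minimal qualitative sketch, not a quantitative model: the names GatcSite, replicate, remethylate and tpase_promoter_activity are hypothetical, and the only regulatory behavior encoded is the one stated in the text, that transcription from the IS10 Tpase promoter is reduced when the GATC site is fully methylated compared with hemimethylated DNA.

```python
from dataclasses import dataclass


# Hypothetical qualitative model of one GATC site; names are illustrative only.
@dataclass(frozen=True)
class GatcSite:
    top_methylated: bool
    bottom_methylated: bool

    def state(self) -> str:
        if self.top_methylated and self.bottom_methylated:
            return "fully methylated"
        if self.top_methylated or self.bottom_methylated:
            return "hemimethylated"
        return "unmethylated"


def replicate(site: GatcSite) -> tuple:
    """Semi-conservative replication: each daughter keeps one parental
    strand and receives a new, initially unmethylated strand."""
    daughter_1 = GatcSite(site.top_methylated, False)
    daughter_2 = GatcSite(False, site.bottom_methylated)
    return daughter_1, daughter_2


def remethylate(site: GatcSite) -> GatcSite:
    """Dam methylase restores full methylation of a hemimethylated site."""
    if site.state() == "hemimethylated":
        return GatcSite(True, True)
    return site


def tpase_promoter_activity(site: GatcSite) -> str:
    # Only the comparison made in the text is encoded; other states are not modeled.
    return {"fully methylated": "reduced",
            "hemimethylated": "elevated"}.get(site.state(), "not modeled")


parent = GatcSite(True, True)
for daughter in replicate(parent):
    print(daughter.state(), tpase_promoter_activity(daughter))
```

Both daughters come out hemimethylated with elevated promoter activity until remethylate returns them to the fully methylated, reduced state; that interval is the replication-coupled window for transposase synthesis that the text describes.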

Integration into a highly active gene would also 
be expected to activate expression of IS genes. Many 
elements have adopted specific strategies to reduce 
such adventitious activation. One such strategy is to 
sequester translation initiation signals (Figure 3A). 
Here, an internal inverted repeat sequence carrying 
the Tpase ribosome binding site (rbs) is located close 
to IRL. Transcripts invading IRL from neighboring 
DNA will carry the inverted repeat and form a 
stem-loop structure trapping the rbs. Translation 
initiation signals in transcripts from the resident 
promoter, however, remain accessible since these 
carry only the proximal repeat (Figure 3A). This has 
been demonstrated for the IS4 family members, IS10,
and IS50, but many other ISs carry appropriately 
placed potential hairpin structures. Another level of 
control operates at translation initiation and involves 
synthesis of an antisense RNA which sequesters 
translation initiation signals. This type of control 
has been well documented for IS10, where it is re-
sponsible for multicopy inhibition, in which the pre-
sence of an IS10 copy on a high copy-number plasmid
inhibits the activity of a copy located in the chromo- 
some. 
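The sequestration logic of Figure 3A reduces to one question: does a given transcript carry both arms of the internal inverted repeat? The sketch below assumes hypothetical RNA sequences (the arm carrying a Shine-Dalgarno-like AGGAGG, the loop and flanking bases are all invented for illustration) and ignores folding energetics. It only checks whether a stem trapping the rbs could form.

```python
# Hypothetical sequences; only the logic of the inverted repeat is illustrated.
RNA_COMPLEMENT = str.maketrans("ACGU", "UGCA")


def reverse_complement(rna: str) -> str:
    return rna.translate(RNA_COMPLEMENT)[::-1]


def rbs_sequestered(transcript: str, distal_arm: str, proximal_arm: str) -> bool:
    """True if the transcript carries both arms of the internal inverted repeat
    (distal arm first), so that they can pair into a stem trapping the rbs.
    Folding energetics are ignored."""
    if reverse_complement(distal_arm) != proximal_arm:
        raise ValueError("arms must form an inverted repeat")
    start = transcript.find(distal_arm)
    if start == -1:
        return False
    return transcript.find(proximal_arm, start + len(distal_arm)) != -1


PROXIMAL = "AAGGAGGU"                  # hypothetical arm carrying the rbs (AGGAGG)
DISTAL = reverse_complement(PROXIMAL)  # "ACCUCCUU", the arm close to IRL

invading = "GCGC" + DISTAL + "UUCGAAAA" + PROXIMAL + "AUGAAA"  # read-through from outside
resident = "CGAAAA" + PROXIMAL + "AUGAAA"                       # starts at Pint

print(rbs_sequestered(invading, DISTAL, PROXIMAL))  # True
print(rbs_sequestered(resident, DISTAL, PROXIMAL))  # False
```

Placing the resident promoter between the two arms is what makes the distinction work: only transcripts that enter through IRL contain the distal arm.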

Additional regulation may occur at the level of 
translation elongation. Several ISs carry two partially 
overlapping open reading frames (ORFs). In one case, the IS21 family, this arrangement may give rise to translational coupling. In its simplest form, this may use an overlap of the last base of the termination codon of the upstream ORF (in phase 0) with the first base of the initiation codon of the downstream ORF, arranged in phase -1 (TGATG). A second mechanism that regulates transposase synthesis involves programmed translational frameshifting (Figure 3B). A -1 frameshift occurs by slippage of the translating ribosome one base upstream. Translation then continues in the alternative (-1) phase. This occurs at the position of 'slippery' codons in a heptanucleotide sequence generally of the type Y YYX XXZ in phase 0, which is read as YYY XXX Z in the shifted -1 phase; the tRNAs paired with the YYX and XXZ codons can re-pair with the YYY and XXX triplets after slippage because their first two bases still match. The sequence A AAA AAG is a common example of this type of heptanucleotide. Ribosomal
shifting of this type is stimulated by structures in the 
mRNA that tend to impede the progression of the 
ribosome, such as potential ribosome binding sites 
upstream or secondary structures (stem-loop structures 
and potential pseudoknots) downstream of the slippery 
codons. Translational control of transposition by frameshifting has been demonstrated for IS1 and for members of the IS3 family, but may also occur in several other IS elements (e.g., one subgroup of the IS5 family).
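Both elongation-level mechanisms above can be sketched as reading-frame bookkeeping. This is a minimal sketch under stated assumptions: the function names and the example mRNAs are hypothetical, only the heptamer pattern (Y YYX XXZ read as YYY XXX Z) and the UGAUG stop/start overlap come from the text, and the code shows the shifted reading only, although in reality just a fraction of ribosomes shift and stimulatory mRNA structures are not modeled.

```python
STOP_CODONS = {"UAA", "UAG", "UGA"}


def find_slippery_site(mrna: str, frame: int = 0) -> int:
    """Index of the first heptamer of type Y YYX XXZ whose YYX and XXZ
    triplets are codons in the given frame, or -1 if none is found."""
    # The heptamer's first base is the last base of a codon: k - frame == 2 (mod 3).
    for k in range(frame + 2, len(mrna) - 6, 3):
        h = mrna[k:k + 7]
        if h[0] == h[1] == h[2] and h[3] == h[4] == h[5]:
            return k
    return -1


def read_with_minus1_shift(mrna: str, frame: int = 0):
    """Codons read in phase 0 up to the slippery site, then in phase -1."""
    k = find_slippery_site(mrna, frame)
    if k == -1:
        return [mrna[i:i + 3] for i in range(frame, len(mrna) - 2, 3)], []
    before = [mrna[i:i + 3] for i in range(frame, k - 1, 3)]    # last codon ends at k
    after = [mrna[i:i + 3] for i in range(k, len(mrna) - 2, 3)]  # YYY, XXX, ...
    return before, after


def coupled_stop_start(mrna: str, frame: int = 0) -> int:
    """Position of the first in-frame stop if it is a UGA whose final A is also
    the first base of an AUG in phase -1 (the UGAUG overlap); -1 otherwise."""
    for i in range(frame, len(mrna) - 2, 3):
        codon = mrna[i:i + 3]
        if codon in STOP_CODONS:
            return i if codon == "UGA" and mrna[i + 2:i + 5] == "AUG" else -1
    return -1


mrna = "AUGGCAAAAAAGGGCUAA"  # hypothetical; contains the A AAA AAG heptamer
print(read_with_minus1_shift(mrna))
print(coupled_stop_start("AUGGCUUGAUGCCC"))
```

In the example the unshifted reading is AUG GCA AAA AAG GGC UAA, while after slippage the ribosome continues AAA AAA GGG CUA in the -1 phase.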

Other control mechanisms may occur at translation 
termination. In some cases, the translation termination 
codon of Tpase genes is located within their IRR 
sequences, while in others the transposase gene simply 
does not possess a termination codon. In some of the latter cases, the IS inserts into a specific target sequence, and the target direct repeat generated on insertion itself supplies the Tpase termination codon. This has been observed for certain members of the IS630 family. The significance of these arrangements
may be to couple translation termination, transposase 
binding, and transposition activity. 

Early studies of IS1 and IS50 demonstrated that
impinging transcription from outside reduces trans- 
position activity. Transcription may disrupt the 
formation of the transposition complexes known as 
transpososomes in which transposase and the trans- 
poson ends are intimately bound. 

Tpase stability can also contribute to control of 
transposition since it can limit activity both tempor- 
ally and spatially. This may explain the observation 
that several Tpases function preferentially in cis (see 
below). Derivatives of the IS903 Tpase that are more 
resistant to the E. coli Lon protease than the wild type 
protein are more active and exhibit an increased capa- 
city to function in trans (see below). 

Early studies indicated that transposition of some elements was more efficient if the transposase was provided by the element itself or by a transposase gene located close by on the same DNA molecule.
This preferential activity in cis reduces the probability 
that transposase expression from a given element 
will activate transposition of related copies elsewhere 
in the genome. The effect can amount to several orders of magnitude. It presumably reflects the ability of cognate transposases to bind to transposon ends
close to their point of synthesis and is likely to be 
the product of several phenomena such as expression 
levels and protein stability. Another contributing fac- 
tor may derive from the domain structure of known 
transposases (see above) in which the DNA binding 
domain is located in the N-terminal end of the protein. 
This arrangement would permit preferential binding 
of nascent transposase polypeptides to neighboring 
binding sites. Indeed, the N-terminal portion of sev- 
eral Tpases exhibits a higher affinity for the ends than 
does the entire transposase molecule, suggesting that 
the C-terminal end may mask the DNA binding activ- 
ity of the N-terminal portion. 
See also: Escherichia coli; Transposable Elements


Insertion, Insertional Mutagenesis


See: Chromosome Aberrations; DNA Cloning; In vitro Mutagenesis; Mutation
Insulinoma occurs in two principal forms, sporadic and familial, the latter as one component of multiple endocrine neoplasia type 1 (MEN-1). MEN-1 is a clinical syndrome inherited in
an autosomal dominant pattern, and includes primary 
hyperparathyroidism, multiple duodenopancreatic 
endocrine tumors (of which insulinoma is one type), 
and pituitary adenomas. No specific genetic abnor- 
mality has been consistently identified as the cause of 
sporadic insulinomas, whereas the recently cloned 
gene responsible for inheritance of MEN-1 has been 
mapped to chromosome 11q13. This gene contains 10 
exons that encode a 610-amino acid protein product, 
menin. Research suggests that the MEN-1 gene is a 
tumor suppressor gene. 


The term integrase is used to describe the following
two enzymes: 


1. An enzyme (Int) responsible for catalyzing the breakage and rejoining of DNA during the insertion of a bacteriophage genome into (and its excision from) its chromosomal attachment site by site-specific recombination. Most Int proteins belong to the tyrosine recombinase family of site-specific recombinases, but several examples of serine recombinases are also known.

2. An enzyme (IN) encoded by retroviruses that is 
responsible for the 3’ processing of retroviral 
DNA and insertion of the processed DNA into a 
genomic target. IN proteins are derived by proteo- 
lysis from the C-terminus of the gag-pol poly- 
protein, and belong to the DD(35)E family of 
transposases. 


See also: Integrase Family of Site-Specific 
Recombinases; Phage λ Integration and Excision;
Retroviruses; Site-Specific Recombination; 
Transposable Elements 


Integrase Family of Site-Specific Recombinases
The Int family of recombinases belongs to the general class of proteins that act on specific DNA sequences to effect deletion, insertion, or inversion of large segments of genomic DNA. The approximately 100 known proteins in this family are found in archaebacteria, eubacteria, and eukaryotes. Sometimes referred to as the tyrosine integrase family, they are distinguished by their use of a tyrosine nucleophile and five highly conserved basic residues to catalyze DNA
cleavage and ligation reactions in the absence of high- 
energy cofactors. Another hallmark for the recombin- 
ases in this family is a sequential strand exchange 
mechanism that generates a four-way DNA junction 
(Holliday junction) as a recombination intermediate. 

The biological roles of various Int family members 
include copy number control and stable inheritance of 
circular replicons, the integration and excision of viral 
chromosomes into and out of the chromosomes of 
their respective hosts, the regulation of expression of 
cell surface proteins, conjugative transposition, the 
movement of antibiotic resistance genes into and out 
of transposable elements and plasmids, and the relaxation of positive and negative supercoils during eukaryotic DNA replication, repair, recombination, and transcription.


The Reaction 


The minimal Int family target on DNA consists of a 
single binding site for a topoisomerase monomer. 
DNA strand cleavage involves activation of the scissile 
phosphate by the highly conserved pentad of active site 
residues (Arg, Lys, His, Arg, His) and formation of a 
nick with a 5’ OH and 3’-phosphotyrosine linkage to 
the recombinase. This transient covalent intermediate releases one superhelical turn, via a mechanism that is not completely understood, and the nick is resealed by a simple reversal of the cleavage step.

In the case of Int family-mediated recombination, 
the minimal DNA target consists of two recombinase 
binding sites that are positioned as inverted repeats 
separated by 6-8 bp (called the ‘overlap region’). 
Synapsis and proper alignment of two such recom- 
bination partners generates a tetrameric complex in 
which each recombinase protomer carries out the 
cleavage and ligation of one DNA strand, executed 
as two sequential pairs of cleavage/ligation reactions. 
In the first pair of reactions one strand in each partner 
DNA helix is cleaved and the first three or four bases 
of the free 5’ hydroxyl-terminated strands of the over- 
lap region are swapped and then ligated. This forms a 
Holliday junction with four continuous DNA strands 
(see Figure 1). After some rearrangements within the
Holliday junction, the intermediate is ‘resolved’ by a 
reciprocal strand swapping of the second pair of 
strands so that all four DNA strands have new junc- 
tions and two recombinant DNA helices have been 
generated. The formation of Holliday junction intermediates distinguishes the Int family from the resolvase/invertase family of site-specific recombinases, which use a serine nucleophile to carry out a pair of concerted (rather than sequential) strand exchanges, and from the transposase family of reactions, which do not involve covalent protein-DNA intermediates and additionally require some DNA synthesis to complete recombination.

Figure 1 Holliday junction formation and resolution by Int family members. One strand of each recombining partner duplex is cleaved (open arrowhead) at the left boundary of the overlap region (in this example, 7 bp, denoted by short vertical lines), forming a covalent 3'-phosphotyrosine intermediate (not shown). A short segment (three bases in this example) of single-stranded DNA from each partner is swapped between the duplexes, and the phosphotyrosine linkages are disengaged by the formation of new phosphodiester linkages (heavy vertical bars) at the recombinant joints (second panel). Following this pair of ligation reactions, the Holliday junction intermediate undergoes rearrangements that include movement of the crossover point and some conformational changes (third panel), which set the stage for the second pair of strand exchanges on the other side of the overlap region (filled arrowheads). Now it is the bottom strands that are cleaved, swapped, and religated to form the second pair of recombinant joints and the recombinant product duplexes.
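The outcome of the two sequential pairs of strand exchanges can be tracked at the level of strand composition. This is a minimal sketch that assumes aligned partners of equal length and hypothetical sequences; it records only which parent each strand segment comes from and ignores the geometry of the junction, the short single-stranded swap and branch migration. The names complement and recombine are illustrative.

```python
DNA_PAIR = str.maketrans("ACGT", "TGCA")


def complement(top: str) -> str:
    """Bottom strand written 3'->5' beneath the top strand, so that each
    column is a base pair."""
    return top.translate(DNA_PAIR)


def recombine(top_a: str, top_b: str, ov_start: int, ov_end: int):
    """Strand composition of the two products of sequential strand exchange.

    top_a and top_b are aligned top strands of the partners; the overlap region
    is top[ov_start:ov_end]. Returns ((top, bottom), (top, bottom)).
    """
    if len(top_a) != len(top_b):
        raise ValueError("partners must be aligned and of equal length")
    bot_a, bot_b = complement(top_a), complement(top_b)
    # First pair of exchanges: top strands cut at the left boundary of the
    # overlap and swapped, giving the Holliday junction.
    top_1 = top_a[:ov_start] + top_b[ov_start:]
    top_2 = top_b[:ov_start] + top_a[ov_start:]
    # Second pair: bottom strands cut at the right boundary and swapped,
    # resolving the junction into two recombinant duplexes.
    bottom_1 = bot_a[:ov_end] + bot_b[ov_end:]
    bottom_2 = bot_b[:ov_end] + bot_a[ov_end:]
    return (top_1, bottom_1), (top_2, bottom_2)


# Hypothetical partners sharing a 7 bp overlap (GCTAGCT) at positions 5-11.
product_1, product_2 = recombine("AAAAAGCTAGCTCCCCC", "TTTTTGCTAGCTGGGGG", 5, 12)
print(product_1)  # ('AAAAAGCTAGCTGGGGG', 'TTTTTCGATCGACCCCC')
```

Each product has a left arm from one parent and a right arm from the other, and its overlap pairs a top strand from one parent with a bottom strand from the other, which is the heteroduplex discussed next.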


The Overlap Region 


The sequential pair of reciprocal strand exchanges 
that generate and then resolve a Holliday junction 
are separated not only temporally but also spatially, 
by six to eight base pairs that are referred to as the 
‘overlap’ region. The overlap region is precise and 
characteristic for each recombinase. Because of this 
stagger in cleavages the resulting overlap region in 
the recombinant DNA helices is ‘heteroduplex,’ i.e., 
it has one strand from each of the two parental helices. 
If the overlap region DNA sequence were not identical 
in the two parental helices the recombinant hetero- 
duplex region would have base pair mismatches. 
In most (but not all) Int family pathways such mis- 
matches are not tolerated and there is a strict require- 
ment for sequence identity in the overlap regions of 
recombining partners. Sequence identity is thought to be recognized during the reciprocal (and reversible) swaps of 3-4 bp strand segments, rather than at some earlier step such as the synapsis or alignment of the parental helices.
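The identity requirement can be expressed as a check on the heteroduplex that would form. This sketch assumes the strict rule that holds in most (but not all) Int family pathways, namely that any mismatch blocks recombination; the overlap sequences and function names are hypothetical.

```python
BASE_PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}


def heteroduplex_mismatches(overlap_a: str, overlap_b: str) -> list:
    """Positions that would be mismatched in a recombinant overlap made of the
    top strand of partner b paired with the bottom strand of partner a."""
    if len(overlap_a) != len(overlap_b):
        raise ValueError("overlap regions must have the same length")
    bottom_a = [BASE_PAIR[base] for base in overlap_a]
    return [i for i, (top, bottom) in enumerate(zip(overlap_b, bottom_a))
            if BASE_PAIR[top] != bottom]


def compatible_partners(overlap_a: str, overlap_b: str) -> bool:
    """Strict rule assumed here: any heteroduplex mismatch blocks recombination."""
    return not heteroduplex_mismatches(overlap_a, overlap_b)


print(heteroduplex_mismatches("GCTAGCT", "GCTTGCT"))  # [3]
print(compatible_partners("GCTAGCT", "GCTAGCT"))       # True
```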


Target Specificity 


There are two sources for the target specificity in 
the Int family recombinations. One is the require- 
ment for overlap region identity described above. 
This source of specificity serves primarily to match 
two targets to each other because there is a wide 
latitude of DNA sequences that can function in the 
overlap region for a given recombinase. The second 
source of specificity resides in the DNA-binding 
sites of the recombinase and any required accessory 
proteins. In the simplest Int family reactions this 
amounts to four protein binding sites, one for 
each of the required recombinase monomers. These 
7-9 bp recognition sequences, which occur as inverted repeats flanking the overlap regions, allow some degeneracy, which may even be favored. Thus, the overall target specificity is equivalent to a DNA sequence of approximately 15-20 bp. For some Int family
members target specificity is further enhanced by 
the addition of ‘extra’ recombinase binding sites that 
are not essential for the minimized reaction but 
may play a role in nature. Even higher specificity is 
to be found among those Int family members that contain a second specific DNA-binding domain and/or depend upon several sequence-specific accessory proteins.
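Why an effective 15-20 bp target is sufficient can be shown with a back-of-the-envelope calculation: the expected number of chance matches to a fixed site of n bp in a random genome of G bp, counting both strands, is 2(G - n + 1)/4^n. The sketch assumes equal, independent base frequencies, a non-palindromic site and a roughly E. coli-sized genome of 4.6 Mb; real genomes are not random, so the numbers are only indicative.

```python
def expected_chance_sites(site_length: int, genome_length: int) -> float:
    """Expected number of chance matches to one fixed, non-palindromic site of
    site_length bp in a random genome of genome_length bp, counting both
    strands and assuming equal, independent base frequencies."""
    positions = genome_length - site_length + 1
    return 2 * positions / 4 ** site_length


GENOME_BP = 4_600_000  # roughly E. coli-sized; an assumed value for illustration
for n in (8, 15, 20):
    print(n, expected_chance_sites(n, GENOME_BP))
```

For a single 8 bp binding site the expected count is over a hundred, whereas for 15-20 bp it falls well below one, which is the sense in which the combined site can be unique in a bacterial genome.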


Sub-Families 


Monomeric Targets 

Based upon the number of reaction components, the 
eukaryotic type IB topoisomerases exemplify the 
most basic Int family reaction: a single protomer 
executing one cleavage/ligation reaction on one strand 
of duplex DNA. The two best-studied examples of this subgroup, both with available X-ray crystal structures, are human topoisomerase I and the vaccinia virus-encoded topoisomerase.


Dimeric Targets 

After the topoisomerases, the most basic recombin- 
ation pathway requires four identical protomers and is 
described for the most part by the reaction scheme 
outlined above. The two best-studied examples of this 
group are the Cre recombinase of the Escherichia coli 
bacteriophage P1 and the Flp recombinase of the 2 µm plasmid of the yeast Saccharomyces cerevisiae (see Table 1). Cre recombinase acts on two DNA target sites (lox sites) to reduce multimers of the P1 plasmid to monomeric circles (each containing a single lox site), thereby numerically favoring the passive, dispersive inheritance of P1 by both daughter cells. Flp recombinase also has the biological function of enhancing plasmid inheritance but uses a different strategy. The 2 µm plasmid contains two Flp target sites (frt sites) oriented with respect to each other such that recombination between them results in inversion (rather than deletion) of the intervening DNA. The effect of this inversion is to convert two divergent DNA replication forks into two tandem forks, with a rolling circle mode of replication that generates multiple copies of plasmid DNA. Neither Cre nor Flp exhibits topological or orientation selectivity. That is, they are capable of recombining sites on different molecules or on the same molecule. When the sites are on the same molecule they can be direct repeats, leading to excision of the intervening DNA (called resolution), or they can be inverted repeats, leading to inversion of the intervening DNA, as is found in nature for Cre and Flp, respectively.

Table 1 Levels of complexity in different Int family reactions

Recombinase    Accessory factors    Heterobivalent Int
Cre            None                 -
Flp            None                 -
Xer            ArgR, PepA           -
λ Int          IHF, Xis, Fis        +
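The two intramolecular outcomes described for Cre and Flp, excision between direct repeats and inversion between inverted repeats, can be modeled with oriented elements. In this sketch the lox and frt names come from the text, while the flanking segments A, B and C, the Element type and recombine_sites are hypothetical; intermolecular recombination, which both enzymes also carry out, is not modeled.

```python
from typing import List, Optional, Tuple

Element = Tuple[str, int]  # (name, orientation: +1 or -1)


def invert(segment: List[Element]) -> List[Element]:
    """Inverting DNA reverses element order and flips each orientation."""
    return [(name, -strand) for name, strand in reversed(segment)]


def recombine_sites(molecule: List[Element], i: int, j: int
                    ) -> Tuple[List[Element], Optional[List[Element]]]:
    """Recombine two sites (indices i < j) on the same molecule.

    Direct repeats: the intervening DNA is excised as a circle carrying one
    site (resolution). Inverted repeats: the intervening DNA is inverted.
    Returns (product, excised_circle or None).
    """
    (name_i, strand_i), (name_j, strand_j) = molecule[i], molecule[j]
    if name_i != name_j or i >= j:
        raise ValueError("need two sites of the same type with i < j")
    if strand_i == strand_j:
        return molecule[:i + 1] + molecule[j + 1:], molecule[i + 1:j + 1]
    return molecule[:i + 1] + invert(molecule[i + 1:j]) + molecule[j:], None


direct = [("A", 1), ("lox", 1), ("B", 1), ("lox", 1), ("C", 1)]
inverted = [("A", 1), ("frt", 1), ("B", 1), ("frt", -1), ("C", 1)]
print(recombine_sites(direct, 1, 3))
print(recombine_sites(inverted, 1, 3))
```

Both the remaining molecule and the excised circle keep a single lox site, matching the monomeric circles that Cre produces from P1 multimers.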


Accessory Proteins 

The next step up in Int family complexity is best 
exemplified by the XerC/XerD pathway of E. coli, in 
which the first pair of strand exchanges is executed by 
XerC and the second pair by (the closely related) 
XerD. Additionally, two site-specific DNA-binding 
proteins, ArgR and PepA, which have other roles in 
E. coli, are incorporated as structural elements in the 
synaptic complex between two XerC/XerD recom- 
bination sites. They act at accessory sequences such 
that approximately 180 bp of DNA adjacent to each 
core recombination site are interwrapped approxi- 
mately three times in a right-handed fashion. The 
topology of the interwrapped synapsed sites ensures 
that recombination occurs only between directly 
repeated sites on the same molecule, a constraint con- 
sistent with the role of this pathway in converting 
plasmid multimers into monomers. This pathway is 
also responsible for maintaining the E. coli chromo- 
some as a monomer and does so at a site called dif. It is 
interesting to note that XerC/XerD core recombination sites with a 6 bp instead of an 8 bp overlap sequence do not require the accessory proteins and lose the orientation and topological selectivity. However, when supplied with accessory sequences and accessory proteins, selectivity is restored.


Heterobivalent Recombinases 

The third level of Int family complexity has been best 
studied in the pathways of lysogenic viruses that cata- 
lyze the integration and excision of viral chromo- 
somes into and out of the chromosomes of their 
hosts. Ironically, the first Int family member to be 
identified genetically and characterized biochemically 
was the integrase (Int) of bacteriophage lambda, one of the best-studied exemplars of the most complex
pathways in this family. The distinguishing feature 
of this subgroup is that they possess an additional 
DNA-binding domain that binds with high affinity 
to ‘arm-type’ sites that are different and distant from 
the core-type binding sites where strand exchange takes 
place. The apparent paradox raised by a heterobivalent 
recombinase was resolved by the finding that several 
essential accessory proteins are sequence-specific
DNA-bending proteins with binding sites that fall be- 
tween the two different types of Int binding sites. The 
introduction of ‘U-turn’ bends in the DNA delivers Int 
bound at the high affinity arm-type sites to the lower 
affinity core-type sites where catalysis takes place. In 
the lambda pathway example, two of the accessory 
bending proteins, IHF (integration host factor) and 
Fis (factor for inversion stimulation), are encoded by the E. coli host, where they play important roles in the regulation of DNA transcription and replication, and the third accessory bending protein, Xis (excision factor), is encoded by the viral genome. This additional complexity affords mechanisms by which the viruses can both control the direction of recombination (the presence of Xis is required for excisive recombination and its absence for integrative recombination) and modulate its efficiency (the levels of IHF and Fis have opposite effects on the efficiency of excisive recombination).


Structures 


Crystal structures have been determined for four 
recombinases and two topoisomerases. Each of the 
structures captures a different view of the protein: a 
monomeric catalytic domain of lambda Int, a dimeric 
catalytic domain of HP1 Int, the full length protein of 
XerD, a Cre tetramer covalently bound to a Holliday 
junction recombination intermediate, the catalytic 
core of vaccinia virus topoisomerase, and fragments 
of human topoisomerase I complexed with DNA. The 
most dramatic and informative of the structures are 
those of the cocrystals with their respective target 
DNAs, where many of the biochemical insights into 
Int family reaction mechanisms have been visualized
and extended. 

A number of informative generalizations also 
emerge from a comparison of the structures and espe- 
cially from the structures involving protein-DNA 
cocrystals. Despite the great divergence in primary 
amino acid sequences there are extensive regions 
where the six structures possess a similar tertiary fold, 
but because of the differences in primary sequence 
these structural similarities are punctuated by insertions and deletions. A critical region, involving several of the conserved active site residues and the tyrosine nucleophile, is surprisingly not part of the highly conserved tertiary fold. These differences are thought likely to arise from the different multimerization states and the presence or absence of DNA in the crystals. It remains to be determined whether they reflect structures that are relevant at different steps in the reaction or are idiosyncrasies of the particular crystallization states. The crystal structures of the Int family site-specific recombinases provide the foundation and impetus for further sharpening our understanding of this fascinating class of DNA transactions.


Integration is the insertion of one DNA molecule into
another to form a single product. It is commonly used 
to describe the insertion of a viral genome or a plasmid 
into the chromosome of its host cell. 
In its general sense, the term integron is used to describe genetic entities that are able to capture small mobile elements known as gene cassettes (see Gene Cassettes) and thus have the capacity to incorporate new genes at a specific internal location. Integrons include three characteristic features (Figure 1): an intI gene, an attI site, and a Pc promoter. The intI gene encodes a site-specific recombinase (IntI) belonging to the tyrosine recombinase or integrase family. The adjacent attI site is a recombination site recognized by the integrase. The IntI integrase also recognizes the 59-be (59-base element) recombination sites found in gene cassettes and incorporates the cassette into the attI site. The third key feature of an integron is a promoter (Pc), facing toward the attI recombination site, that directs transcription of the cassette-associated genes. Thus, integrons are natural cloning vehicles that act both as agents of gene capture and as expression vectors for the captured genes.

Figure 1 Integrons capture gene cassettes. (A) An empty integron, showing the key features of an integron: an intI gene that encodes the IntI integrase, an adjacent recombination site, attI (hatched box), and promoters Pc and Pint. (B) A free circular gene cassette consisting of a gene or open reading frame (ORF) and a 59-be recombination site (filled box). (C) An integron containing one gene cassette, showing the boundaries of the integrated cassette below. Gene cassettes are inserted into the integron by IntI-catalyzed recombination between attI in the integron and the 59-be in the circular cassette. The ORF in the inserted gene cassette can now be transcribed from Pc. The 7-bp core sites surrounding the recombination crossover point in the attI site of the integron and in the 59-be of the circular cassette are represented by gttrrry and GTTRRRY, respectively, and the configuration of these bases after incorporation of the cassette is shown in (C). Further cassettes may be inserted at attI in like manner, leading to arrays of integrated cassettes.


There Are Many Different Classes of Integrons


Integrons were discovered relatively recently as a con- 
sequence of observations made in the 1980s. Hetero- 
duplexes formed between DNA derived from 
different bacterial plasmids (or transposons) that con- 
tain one or more antibiotic resistance genes revealed 
that several quite different antibiotic resistance genes 
were flanked by identical, or very closely related, 
regions of DNA. As sequences became available, the 
identity of the flanking regions (the integron) was 
confirmed and the very precise nature of the bound- 
aries between the conserved segments and the various 
regions containing the resistance genes (the gene 
cassettes) was revealed. A site-specific recombination 
mechanism was implied and this was subsequently 
demonstrated experimentally. 


The term integron was originally coined by Stokes 
and Hall in 1989 to describe this specific group. 
However, as other integrons have since been found, this group is now designated class 1
integrons. Class 1 integrons include the characteristic 
features of an integron, as now defined in its more 
general sense, but are also mobile elements. They are 
widely distributed in clinical and environmental 
gram-negative bacteria and are responsible for the 
dissemination of many different cassette-associated 
antibiotic resistance genes. Because the mobility of 
class 1 integrons is an important factor in spreading 
resistance genes, for this group the original definition 
of an integron continues to be used, i.e., they are 
defined as including the whole mobile element (trans- 
poson or defective transposon derivative), which 
includes intI1, attI1, and Pc. Class 1 integrons are therefore both transposons and integrons, and this dual nature allows them to move onto plasmids and hence to become widely distributed in the bacterial world.


Several different intI/attI units that are associated 
with gene cassettes have been found and it is likely that
many more remain to be discovered. To distinguish 
them, integrons are classified using the sequence of the 
intI gene and IntI recombinase. Members of the same 
class have the same (>98% identity) integrase. The 
known IntI proteins all share significant levels of identity and form a distinct family within the integrase (or tyrosine recombinase) superfamily. Overall, integrons
fall into two groups: those that are mobile and those 
that are an integral part of a bacterial chromosome. The 
three classes of integrons found in antibiotic-resistant 
clinical isolates are all mobile. The best-characterized 
example of the chromosomal integrons is situated in 
the small chromosome of Vibrio cholerae. Chromo- 
somal integrons are also found in other Vibrionaceae 
and it appears that an integron found its way into the 
small chromosome of the common ancestor before 
speciation occurred. Other bacterial species also in- 
clude an integron as part of their genome. 
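The class rule stated above (members of the same class have integrases that are more than 98% identical) can be written as a threshold test. This sketch assumes the two IntI sequences are already aligned without gaps, which real classification would not take for granted; the function names and the example sequence are hypothetical.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percent identical positions over a gap-free alignment of equal length."""
    if len(seq_a) != len(seq_b) or not seq_a:
        raise ValueError("expected two non-empty aligned sequences of equal length")
    matches = sum(a == b for a, b in zip(seq_a, seq_b))
    return 100.0 * matches / len(seq_a)


def same_integron_class(intI_a: str, intI_b: str, threshold: float = 98.0) -> bool:
    """Same class if the IntI integrases are more than `threshold` percent identical."""
    return percent_identity(intI_a, intI_b) > threshold


reference = "MKTAYIAKQR" * 10        # hypothetical 100-residue stand-in for an IntI
one_change = "A" + reference[1:]      # 99% identical
three_changes = "AAA" + reference[3:]  # 97% identical
print(same_integron_class(reference, one_change))     # True
print(same_integron_class(reference, three_changes))  # False
```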


Integrons Usually Contain Arrays of Gene Cassettes


An integron does not necessarily include any gene 
cassettes and empty class 1 integrons (Figure 1A) 
have been found in the wild and created experimen- 
tally. However, it is most common for one or more 
cassettes to be found in any individual integron. When 
cassettes are present they are viewed as part of the 
integron though, strictly speaking, such integrons are 
composite structures made up of the integron back- 
bone and an array of gene cassettes. Furthermore, any 
individual integron-cassette combination can be des- 
cribed by listing the cassettes in order. Arrays of one to 
five cassettes containing antibiotic resistance genes are 
most common in mobile integrons but, in the chromo- 
somal integrons, the array can include over 150 gene 
cassettes as is the case for the one in the recently 
sequenced Vibrio cholerae small chromosome. The 
Vibrio cassette array is a highly variable region of the 
chromosome and differs from strain to strain. Though 
most of these cassettes contain an open reading frame 
(ORF) whose function is not known, genes encoding 
toxins, virulence factors, restriction and modification 
enzymes, and a lipoprotein, as well as a few potential 
antibiotic resistance genes, have all been identified 
among them. Thus, these long cassette arrays may 
act as storage depots for cassettes containing a wide 
range of genes. The cassettes in chromosomal inte- 
grons can presumably be picked up and moved out 
into other organisms by passing plasmids that carry a 
mobile integron. Indeed, chromosomal integrons are 
likely to be the source of the cassettes that carry 
antibiotic resistance genes. As there are a vast number 
of gene cassettes, each of which can be incorporated into the attI site of an integron, and as more than one gene cassette can be integrated at the attI site to create arrays containing multiple gene cassettes, a potentially unlimited number of configurations is possible.
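The growth in possible arrays can be made concrete with simple counting. The sketch assumes, purely for illustration, that any cassette type can occupy any position and that repeats are allowed, so the number of ordered arrays of length 1 to k drawn from n cassette types is the sum of n^i for i from 1 to k; real arrays are of course constrained by biology. The cassette names are hypothetical.

```python
from itertools import product


def count_arrays(n_cassettes: int, max_length: int) -> int:
    """Ordered arrays of 1..max_length cassettes drawn from n_cassettes types,
    repeats allowed: the sum of n**k over k = 1..max_length."""
    return sum(n_cassettes ** k for k in range(1, max_length + 1))


def enumerate_arrays(cassettes, max_length):
    """Explicit enumeration, practical only for tiny inputs (cross-check)."""
    for k in range(1, max_length + 1):
        yield from product(cassettes, repeat=k)


small = ["cassA", "cassB", "cassC"]  # hypothetical cassette names
print(count_arrays(3, 2), sum(1 for _ in enumerate_arrays(small, 2)))  # 12 12
```

Even a modest pool of 100 cassette types with arrays of up to five cassettes already gives more than ten billion ordered arrays.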


Integrons Capture Gene Cassettes 


The main function of integrons is to capture gene 
cassettes. Integrons differ from most other int/att 
units in that they do not mobilize the entity in which 
they are contained, rather they act in trans to mobilize 
cassettes. Gene cassettes are the simplest of the known 
mobile elements and consist of a single gene, or occa- 
sionally two genes, and a downstream recombination 
site. These cassette-associated recombination sites are 
called 59-be (see Gene Cassettes) and they have a 
different architecture to that of the attI sites. Available 
information on the integron recombination system and
on cassette uptake and loss is largely restricted to
studies using the class 1 IntI1/attI1 system. IntI1 has
been shown to recognize both attI1 and 59-be sites
and can catalyze integrative site-specific recombination
between any pair of primary sites (attI1 × attI1, attI1
× 59-be, and 59-be × 59-be), as well as excisive recombin-
ation between attI1 and a 59-be or between two 59-be.
Recombination between attI1 and a 59-be is the pre-
ferred integrative reaction catalyzed by IntI1, and inte-
gration of free, circular gene cassettes (Figure 1)
occurs via a single IntI1-mediated site-specific recom-
bination reaction between attI1 in the integron and
the 59-be in the cassette. Though integrative recombin-
ation between two 59-be sites also occurs with high
efficiency, it seems to play no part in the integration of
gene cassettes. When one or more cassettes are already
present in the integron, further cassettes are inserted
preferentially at the attI1 site. Excision of cassettes
occurs via both attI1 × 59-be and 59-be × 59-be
reactions. No accessory factors have been identified 
to date and IntI1 appears to be sufficient for both 
integration and excision reactions. 
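The integration and excision rules above can be modeled, very loosely, by treating the cassette array as an ordered list whose first element is the cassette adjacent to attI1. This is a toy sketch of the bookkeeping only, not of the recombination chemistry: the class and method names are hypothetical, and integration is simplified to always use the preferred attI1 × 59-be reaction.

```python
class CassetteArray:
    """Toy model of a class 1 integron cassette array.

    Index 0 is the cassette adjacent to attI1; higher indices lie
    progressively further downstream.
    """

    def __init__(self) -> None:
        self.cassettes: list[str] = []

    def integrate(self, cassette: str) -> None:
        # Preferred integrative reaction (attI1 x 59-be): the incoming
        # circular cassette is inserted at the attI1 end of the array.
        self.cassettes.insert(0, cassette)

    def excise(self, cassette: str) -> str:
        # Excision (attI1 x 59-be or 59-be x 59-be) releases one cassette
        # as a free circle; here it is simply removed from the list.
        # Raises ValueError if the cassette is not present.
        self.cassettes.remove(cassette)
        return cassette


if __name__ == "__main__":
    arr = CassetteArray()
    arr.integrate("cassette_A")
    arr.integrate("cassette_B")
    print(arr.cassettes)
```

Because each new cassette enters at attI1, the most recently captured cassette ends up first in the array, with earlier arrivals pushed downstream.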


Expression of Cassette-Associated 
Genes 


Gene cassettes are compactly organized and the vast 
majority do not include a promoter. Expression of the 
cassette-associated genes is thus dependent on the pres- 
ence of an upstream promoter (see Gene Cassettes). 
Cassettes are integrated in only one orientation and, 
in general, the relationship of the gene and 59-be is 
such that in this orientation the Pc promoter supplied
by the integron lies upstream as shown in Figure 1.
For class 1 integrons containing more than one cassette,
it has been shown that all of the cassette genes are
transcribed from Pc. Integrons thus create new oper-
ons containing a wide variety of genes and gene
orders. The level of expression is highest for the gene
in the Pc-proximal cassette and falls progressively for
genes in downstream cassettes. Consequently, a cas-
sette needs to be located relatively close to Pc if its
gene is to be expressed. The Pc promoters of other
classes of integrons have not been located, but in some 
cases their presence is implied because antibiotic resist- 
ance genes in associated cassettes are expressed. 
Whether the genes and ORFs found in cassettes that 
are part of the very long cassette array in the V. 
cholerae chromosome are expressed remains to be 
established. However, it is possible that only the genes 
in cassettes located closest to the attI site or in a
cassette that contains a promoter can be expressed, 
while downstream genes remain silent. 
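The positional effect can be illustrated with a deliberately simple model in which each step away from Pc multiplies expression by a fixed attenuation factor. The geometric form and the default factor of 0.5 are arbitrary placeholders chosen for illustration, not measured values; the text states only that expression falls progressively. The function name is hypothetical.

```python
def relative_expression(array: list[str], attenuation: float = 0.5) -> dict[str, float]:
    """Assign a relative expression level to each cassette in a Pc-driven array.

    array[0] is the Pc-proximal cassette. Cassette names are assumed to be
    unique. The geometric decay and the default `attenuation` value are
    hypothetical, used only to show how downstream genes fall off.
    """
    if not 0.0 < attenuation <= 1.0:
        raise ValueError("attenuation must be in (0, 1]")
    return {cassette: attenuation ** i for i, cassette in enumerate(array)}


if __name__ == "__main__":
    print(relative_expression(["first", "second", "third", "fourth"]))
```

Under any monotonically decreasing model of this kind, a gene in a distal cassette can fall below a useful level, which is the qualitative point made above.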


The attI Sites


The structure of the attI1 site has been examined
experimentally and is shown in Figure 2. Cassettes
are incorporated precisely between the G and TT in
the right-hand core site (GTTRRRY) of the attI1 simple
site. This position is indicated by an arrow in Figure 2. 
A region of 65 bp is required for the reaction between 
attI1 and a 59-be. This region includes a simple site, 
made up of two inversely oriented IntI1 binding 
domains, and two further IntI1 binding domains that 
are located to the left and act as recombination en- 
hancers. This enhancement effect is not seen when 
attI1 recombines with a second attI1 site. Differences
in the architecture of 59-be and attI1 sites presumably
underlie these preferences. 
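The core-site consensus is written with IUPAC ambiguity codes (R = A or G, Y = C or T). A minimal sketch of scanning one strand of a sequence for GTTRRRY matches and reporting where the crossover would fall (between the G and the TT) might look as follows. The function name is hypothetical, and a consensus match alone does not make a functional attI1 site, which also depends on the flanking IntI1 binding domains described above.

```python
import re

# GTTRRRY with IUPAC codes expanded: R = [AG], Y = [CT].
# The lookahead allows overlapping matches to be reported.
CORE_SITE = re.compile(r"(?=(GTT[AG]{3}[CT]))")


def crossover_positions(seq: str) -> list[int]:
    """Return 0-based indices of the junction immediately after the G
    of each GTTRRRY match on the given strand (reverse strand ignored)."""
    seq = seq.upper()
    return [m.start() + 1 for m in CORE_SITE.finditer(seq)]


if __name__ == "__main__":
    print(crossover_positions("ccGTTAGACaa"))
```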

The sequences of the other integron-associated att 
sites (attI2, attI3, etc.) are not closely related either to
one another or to attI1, but, like attI1, do not share the
characteristic features of the cassette-associated 59-be
sites. The identifiable features shared by the attI sites
are currently limited to a pair of inversely oriented
putative IntI binding domains equivalent to those
that make up the simple site in attI1. Whether the
other attI sites also include further IntI binding regions
that enhance recombination remains to be established.
Available evidence indicates that the various IntI
recombinases recognize only their adjacent (cognate)
site, though IntI1 is also able to recognize other attI
sites with low efficiency. Thus, each attI site must
include distinctive features that permit this selectivity.


Integrons of Different Classes Share 
Gene Cassettes 


It is known that integrons of different classes share 
cassettes because identical gene cassettes have been 
found in integrons from more than one class. All of 
the cassettes that have been found in the cassette 
arrays of class 2 and class 3 integrons have also been 
found associated with class 1 integrons. Thus, it 
appears that the known IntI-type integrases can all
recognize the same cassette-associated 59-be sites,
though, to date, this has been demonstrated experi-
mentally only for IntI1 and IntI3. This is in contrast to
the strong preference of each integrase for its own attI
site. However, many distinct groups of 59-be have 
been found and the 59-be in the cassette arrays of any 
individual chromosomal integron are generally from 
a single group. In contrast, mobile integrons contain 
cassettes with many different 59-be types. Hence, 
it is possible that the different IntI recombinases 
recognize one type of 59-be more efficiently than 
others. 
Figure 2 Structure of attI1 and the promoter region of class 1 integrons. attI1 contains four IntI1 binding domains
that include a central 7-bp sequence related to the core site consensus sequence GTTRRRY. These core sites are
boxed, and arrows indicate their relative orientations. The simple site of attI1, within which the recombination
crossover (vertical arrow) occurs, is at the right-hand end. The binding sites found to the left of the simple site
enhance recombination efficiency. Bases to the left of the crossover belong to the 5’-conserved segment that is found
in all class 1 integrons. Bases to the right of the crossover point (lower case letters) are part of the first integrated
cassette, if a cassette is present. The genes in cassettes are transcribed from the promoter Pc, located within the intI1
gene, and intI1 is transcribed leftward from Pint.




Structure of Class 1 Integrons


Class 1 integrons have a variety of structures resulting 
from the incorporation of other genes (e.g., the sul1 
sulfonamide resistance gene) and of insertion 
sequences (IS) that have caused subsequent deletion 
and rearrangement events leading to loss of some or all 
of the transposition genes. The presumed progenitor, 
exemplified by Tn402, is a transposon that includes 
both the integron functions (intI1, attI1, and Pc) and a
set of transposition genes (tniA, B, Q, R) and is
bounded by 25-bp inverted repeats IRi and IRt. How-
ever, this structure is rare and most class 1 integrons
are transposition-defective derivatives. Generally, they 
retain the transposon terminal inverted repeats and 
hence they can and do move using transposition pro- 
teins supplied in trans. Because these structures cannot 
legitimately be numbered as transposons, they are 
designated In and numbered to distinguish the many 
variations in the backbone structure. 


Further Reading 

Hall RM and Collis CM (1995) Mobile gene cassettes and inte-
grons: capture and spread of genes by site-specific recombin-
ation. Molecular Microbiology 15: 593–600.

Hall RM and Collis CM (1998) Antibiotic resistance in gram-
negative bacteria: the role of gene cassettes and integrons.
Drug Resistance Updates 1: 109–119.

Partridge SR, Recchia GD, Scaramuzzi C et al. (2000) Definition
of the attI1 site of class 1 integrons. Microbiology 146: 2855–2864.

Recchia GD and Hall RM (1995) Gene cassettes: a new class of
mobile element. Microbiology 141: 3015–3027.


See also: Gene Cassettes; Integrase Family of Site- 
Specific Recombinases; Site-Specific 
Recombination 


Intelligence and the 
‘Intelligence Quotient’ 


T J Crow 
The concept of intelligence developed from the
attempt to quantify human cognitive abilities in the 
early years of the twentieth century. There is no doubt 
that individuals can be ranked in terms of their ability 
to complete tests of verbal and nonverbal ability, that 
is to say, the ability to use words and visuospatial 
constructs, as well as more complex capacities such 
as the ability to read and to handle mathematical 
symbols. These abilities develop in the course of child-
hood. It was discovered early on through multivariate
analysis that a general factor of ability can be extracted 
from the performances of individuals on batteries of 
tests constructed to assess the development of cogni- 
tive ability. From these analyses emerged the concept 
of the ‘intelligence quotient’ which attempts to assess 
the extent to which an individual differs from the 
mean of the population of his/her age group, a calcu- 
lation that is based on a population mean of 100. 
Standard batteries of tests (e.g., the Stanford-Binet
and the Wechsler Adult Intelligence Scale, WAIS) have
been constructed and widely used both for assessing 
learning disability and for the purposes of educational 
and occupational selection. 
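The age-relative score described above can be sketched as a "deviation" IQ: an individual's raw score is expressed as a distance from the mean of their age group, in standard-deviation units, and mapped onto a scale whose mean is 100. The sketch assumes a scale standard deviation of 15 points (the convention of the Wechsler scales; other tests have used other values) and assumes the age group's raw-score mean and standard deviation are already known. The function name is hypothetical.

```python
def deviation_iq(raw_score: float, age_mean: float, age_sd: float,
                 scale_mean: float = 100.0, scale_sd: float = 15.0) -> float:
    """Convert a raw test score to a deviation IQ.

    z is the number of standard deviations the individual lies from the
    mean of their age group; it is then placed on a scale with mean
    `scale_mean` and standard deviation `scale_sd` (15 assumed here).
    """
    if age_sd <= 0:
        raise ValueError("age_sd must be positive")
    z = (raw_score - age_mean) / age_sd
    return scale_mean + scale_sd * z


if __name__ == "__main__":
    # Hypothetical age-group norms: mean raw score 50, SD 10.
    print(deviation_iq(60, 50, 10))
```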

Once generated, the abstract concept of ‘intelli- 
gence’ acquired an autonomous life that left un- 
answered questions concerning its reality and origin. 
Controversy centered on whether intelligence is uni- 
tary or a composite of component abilities and, if the 
latter, which of these are fundamental. Equally 
importantly, the origins of the variation within popu- 
lations and the extent to which it can be regarded as 
genetically determined have been widely and some- 
times acrimoniously disputed, with claims being made 
for differences between populations that cannot be 
accounted for by environmental factors such as edu- 
cational opportunity. 

The possibility that some part of the variation is 
genetic raises the further interesting questions of what 
sort of genes might be responsible and what selective 
pressures these genes might be under. One can also ask 
whether the variation is specific to Homo sapiens or 
whether similar variation might be detected in other 
primates and other mammals. 

These questions suggest a quite different approach 
to human cognitive abilities and that the whole con- 
cept of intelligence can be placed in an alternative 
context. This is the suggestion that what is char- 
acteristic of Homo sapiens is not intelligence (or a 
particular degree of intelligence) but the capacity for 
language, and that this arose as a result of discrete 
genetic changes in the course of hominid evolution. 
Language, according to the linguists N. Chomsky and 
D. Bickerton, for example, is a capacity that has no 
obvious precedents in the communicative abilities of 
other primates. It is the defining feature of modern 
Homo sapiens. 

The salient candidate for the genetic change is that 
the brain lateralized, i.e., that the functions of the 
two hemispheres became differentiated, and that this 
occurred on the basis that development of the 
hemispheres became subtly asynchronous across the 
anteroposterior axis, i.e., from right frontal to left 
occipital lobes. One component of language, probably 
the phonological sequence, is localized in the ‘domin-
ant,’ usually the left, hemisphere. Dominance for this
component of language is reflected in directional
handedness (85–90% of most populations are right-
handed) and this also appears to be a characteristic
that distinguishes humans from the chimpanzee. 

Handedness, reflecting cerebral dominance, is a trait 
that is associated with quantitative variation. Whether 
this variation is a correlate of human cognitive ability, 
as would seem plausible if it underlies the specific 
characteristic of language, has been much debated, 
but it now appears that lesser degrees of lateralization 
(‘hemispheric indecision’) are associated with delay in 
the development of verbal, and also nonverbal, ability 
(Crow et al., 1998). Thus it appears that lateralization 
is associated with significant variation in the rate at 
which words acquire meaning, and that this variation 
reflects a dimension that is specific to Homo sapiens. 
The genetics of lateralization reflects the mechanism 
of transition from a precursor hominid to modern 
Homo sapiens. Of particular note is the fact that 
there are sex differences both in handedness (girls on 
average are more right-handed and less likely to be 
left-handed than boys) and verbal ability (girls acquire 
words faster). There is an obvious possibility that the 
relevant gene(s) is sex-linked and an X-Y homologous 
locus has been suggested. 

These considerations cast the question of human 
‘intelligence’ in a new and perhaps more biological 
perspective. In particular, they emphasize the species- 
bound nature of the variation and the survival value of 
the core characteristic of language. There remains the 
problem of the genetic nature of the variation and its 
persistence. Such questions touch on the evolutionary 
significance of species transitions and the maintenance 
of species boundaries. 


Reference 

Crow TJ, Crow LR, Done DJ and Leask SJ (1998) Relative hand 
skill predicts cognitive ability; global deficits at the point of 
hemispheric indecision. Neuropsychologia 36: 1275–1280.


See also: Heritability 


Intercross 


L Silver 
An intercross is a cross between two organisms that
have the same heterozygous genotype at designated
loci. An example would be a cross between sibling F1
hybrid organisms that were both derived from an out-
cross between two inbred strains.
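For a single locus, the expected offspring genotypes of an intercross follow from pairing every gamete of one parent with every gamete of the other. The sketch below enumerates this for two F1 parents heterozygous A/a, assuming Mendelian segregation with equal gamete frequencies; the allele symbols and function name are placeholders.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def intercross(parent1: str, parent2: str) -> dict[str, Fraction]:
    """Expected offspring genotype frequencies for a one-locus cross.

    Each parent is a two-character genotype such as "Aa". Every allele of
    parent1 is paired with every allele of parent2 with equal probability;
    sorting puts the uppercase allele first, so "aA" is reported as "Aa".
    """
    counts = Counter("".join(sorted(pair)) for pair in product(parent1, parent2))
    total = sum(counts.values())
    return {genotype: Fraction(n, total) for genotype, n in counts.items()}


if __name__ == "__main__":
    print(intercross("Aa", "Aa"))
```

For two heterozygotes this reproduces the familiar 1 AA : 2 Aa : 1 aa ratio, which is why an intercross, unlike a backcross, recovers both homozygous classes.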


See also: Backcross; Incross; Outcross 


Interference, Genetic 


L Silver 
Multiple events of recombination on the same chromo-
some are not independent of each other. Instead, a
recombination event at one position on a chromosome 
will act to interfere with the initiation of other recom- 
bination events in its vicinity. This phenomenon is 
known, appropriately, as ‘interference.’ Interference 
was first observed within the context of significantly 
lower numbers of double crossovers than expected in 
the data obtained from some of the earliest linkage 
studies conducted on Drosophila. Since that time, 
interference has been demonstrated in every higher 
eukaryotic organism for which sufficient genetic data 
have been generated. 

Significant interference has been found to extend 
over very long distances in mammals. The most 
extensive quantitative analysis of interference has 
been conducted on human chromosome 9 markers 
that were typed in the products of 17316 meiotic 
events. Within 10 cM intervals, only two double cross- 
over events were found; this observed frequency of 
0.0001 is 100-fold lower than expected in the absence 
of interference. Within 20 cM intervals, there were 10 
double crossover events (including the two above); 
this observed frequency of 0.0005 is still 80-fold 
lower than predicted without interference. As map 
distances increase beyond 20 cM, the strength of inter- 
ference declines, but even at distances of up to 50 cM, 
its effects can still be observed. 
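The fold reductions quoted above can be checked with a little arithmetic. The text does not say how the no-interference expectation was computed; its figures are consistent with taking it as the square of the interval's recombination fraction (0.1² = 0.01 for 10 cM, 0.2² = 0.04 for 20 cM), and the sketch adopts that reading as an assumption. It also reports the coefficient of coincidence (observed / expected) and the interference I = 1 - coincidence, standard summary measures that the text itself does not use. The function name is hypothetical.

```python
def interference_summary(interval_cm: float, observed_freq: float) -> dict[str, float]:
    """Compare an observed double-crossover frequency with a no-interference
    expectation, assumed here to be (interval_cm / 100) ** 2."""
    r = interval_cm / 100.0
    expected = r * r
    coincidence = observed_freq / expected
    return {
        "expected": expected,
        "fold_reduction": expected / observed_freq,
        "coincidence": coincidence,
        "interference": 1.0 - coincidence,
    }


if __name__ == "__main__":
    meioses = 17316
    # Raw frequencies from the chromosome 9 counts quoted in the text.
    print(2 / meioses, 10 / meioses)
    # Fold reductions using the rounded frequencies the text quotes.
    print(interference_summary(10, 0.0001)["fold_reduction"])
    print(interference_summary(20, 0.0005)["fold_reduction"])
```

With the rounded frequencies (0.0001 and 0.0005) this reproduces the 100-fold and 80-fold figures; the unrounded counts (2 and 10 in 17,316 meioses) give reductions closer to 87-fold and 69-fold, which does not alter the conclusion that interference is strong.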

If one assumes that human chromosome 9 is not 
unique in its recombinational properties, the implica- 
tion of this analysis is that for experiments in which 
fewer than 1000 human meiotic events are typed, mul- 
tiple crossovers within 10 cM intervals will be extre- 
mely unlikely, and within 25 cM intervals, they will 
still be quite rare. Data evaluating double crossovers 
in the mouse are not as extensive, but they suggest a 
similar degree of interference. Thus, for all practical 
purposes, it is appropriate to convert recombination 
fractions of 0.25, or less, directly into centimorgan 
distances through a simple multiplication by 100. 


See also: Linkage Map 


Interphase 
Interphase is the period between mitotic cell divisions,
and is divided into three phases: G1, S, and G2.


See also: Cell Cycle 


Intersex 


M A Ferguson-Smith 
The term intersex is used in clinical genetics to
describe any individual with ambiguity of the internal 
and/or external genitalia. It is used more widely in 
animal genetics to indicate a phenotype in which the 
somatic sex is at variance with the genetic or chromo- 
somal sex. 


See also: Hermaphrodite; Sex Reversal 


Interspecific, Intraspecific 
Cross 


L Silver 
A cross between organisms from two different, but
closely related species (that can produce fertile off- 
spring of at least one sex) for the purpose of taking 
advantage of the increased frequency of genetic differ- 
ences to carry out linkage studies. 


See also: Linkage; Linkage Map 


Intervening Sequence 


Copyright © 2001 Academic Press 
doi: 10.1006/rwgn.2001.1880 


An ‘intervening sequence’ is another term for an intron. 


See also: Introns and Exons 
Intron Homing
M A Gilson and M Belfort
A mobile intron is defined as an intron that moves by
an active mechanism to a new site on DNA, and upon 
establishment in the new site, continues to function as 
an intron. This active movement is mediated by 
an intron-encoded protein, usually an endonuclease. 
There are two types of intron mobility, homing and 
transposition. In the case of homing, an intron is cop- 
ied from one site to the same position at a homologous 
but intronless site. Transposition occurs when an 
intron is copied into a heterologous site. 

The DNA homing site is the segment of the cognate 
gene into which the intron inserts in the process of 
homing. The homing site consists of three parts: the 
endonuclease recognition sequence, the endonuclease 
cleavage site, and the intron insertion site. Table 1
provides a listing of introns for which mobility has 
been demonstrated. 


History 


Mobile introns are widespread. They have been iden- 
tified in bacteria and bacteriophage, archaebacteria, 
and eukaryotes. The RNA of most of these introns 
folds into a series of stems and loops. There are two 
different basic folding patterns, corresponding to the 
group I and group II introns. In addition to different 
RNA structures, introns in the two groups also have 
distinct autocatalytic splicing mechanisms. Mobility 
has been demonstrated for group I and group II 
introns and for a noncatalytic archaebacterial intron, 
but not for nuclear spliceosomal introns. 

The first intron shown to be mobile, in the early 
1970s, was the group I ribosomal large subunit (LSU) 
intron, formerly called the ω intron, of the yeast Sac-
charomyces cerevisiae. The DNA-based homing pro-
cess was elucidated by experiments showing polarity 
of recombination in crosses between intron-plus and 
intron-minus alleles. The intron was mobilized so that 
more than 90% of the progeny were found to carry the 
intron-containing allele. 

The first group II intron shown to exhibit homing 
was the aI1 intron, also of S. cerevisiae. The original
papers refer to this as transposition, but it is in fact 
homing as defined above. Group II intron homing is 
distinguished from homing of group I introns by the 
involvement of the intron RNA in both templating 
and mediating the mobility event. 
Table 1 Mobile introns. Based on the presence of endonuclease-encoding open reading frames and homology to
known mobile introns, many more introns are likely to be mobile.

Intron name    Organism                       Reference
Group I
LSU (ω)        Saccharomyces cerevisiae       Dujon B (1989) Gene 82: 91–114
coxl-30        Saccharomyces cerevisiae
coxl-|4a       Saccharomyces cerevisiae
coxl-|5a       Saccharomyces cerevisiae
bi-2           Saccharomyces capensis
coxl II        Schizosaccharomyces pombe
LSU-3          Physarum polycephalum
LSU-5          Chlamydomonas eugametos
LSU            Chlamydomonas reinhardtii
Cobl-|         Chlamydomonas smithii
td             T4 bacteriophage
sunY           T4 bacteriophage               ibid.
LSU            Desulfurococcus mobilis        USA 92: 12285–12289
DiSSU1         Didymium iridis
cox1           Peperomia polybotrya           USA 95: 14244–14249
Group II
aI1            Saccharomyces cerevisiae
aI2            Saccharomyces cerevisiae       ibid.
Ll.LtrB        Lactococcus lactis
RmInt1         Sinorhizobium meliloti
XIn6           Pseudomonas alcaligenes
PIDNA          Podospora anserina
Cox l.l        Kluyveromyces lactis           Skelly PJ et al. (1991) Current Genetics 20: 115–120


Transposition has not been demonstrated for group I 
introns, but a bacterial group II intron is capable of 
transposition to ectopic sites, in addition to homing. 
Transposition also requires an RNA intermediate. 


Homing Mechanism 


Group I Mechanism 

Intron homing requires homology of flanking exon 
sequences. Although extensive homology is favorable, 
homologous regions as small as 10 bp on either side of 
the intron are sufficient. Group I intron mobility is 
DNA-mediated. The intron-encoded endonuclease 
initiates the homing process by generating a double-
strand break. This process is shown in Figure 1.

The DNA ends are then chewed back to form a gap 
by exonucleolytic activity. This gap is repaired by a 
gene-conversion event, with the intron-containing 
allele as a template. In addition to the insertion of the
intron, genetic markers both upstream and down-
stream of the intron insertion site may be converted
to those of the intron donor.
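The outcome of group I homing (not its enzymology) can be sketched as a string operation: if the recipient carries the intronless junction, with flanking exon sequence matching the donor, the intron is copied in at the same position. The sketch uses the 10-bp minimum flanking homology mentioned above as its threshold, treats the presence of the intronless junction as a stand-in for both endonuclease recognition and homology, and ignores co-conversion of flanking markers. The function name and sequences are hypothetical.

```python
MIN_FLANK = 10  # minimum homology on each side of the insertion site (from the text)


def home_intron(donor_exon1: str, intron: str, donor_exon2: str, recipient: str) -> str:
    """Return the recipient allele after homing, or unchanged if homing fails.

    The donor allele is donor_exon1 + intron + donor_exon2. Homing is
    modeled as requiring the recipient to contain the intronless junction,
    i.e. MIN_FLANK bases of each flanking exon joined directly together.
    """
    if len(donor_exon1) < MIN_FLANK or len(donor_exon2) < MIN_FLANK:
        raise ValueError("donor exons shorter than the required flanking homology")
    junction = donor_exon1[-MIN_FLANK:] + donor_exon2[:MIN_FLANK]
    pos = recipient.find(junction)
    if pos == -1:
        return recipient  # no homologous intronless site: no homing
    cut = pos + MIN_FLANK  # insertion site between the two exon flanks
    return recipient[:cut] + intron + recipient[cut:]


if __name__ == "__main__":
    e1, e2, intron = "AAAACCCCGGGGTTTT", "ACGTACGTACGTACGT", "gtaagtttcag"
    print(home_intron(e1, intron, e2, "NN" + e1 + e2 + "NN"))
```

An allele that already carries the intron lacks the intronless junction and is returned unchanged, mirroring the fact that intron-plus alleles are not targets for homing.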


Group II Mechanism 


Retrohoming 

Group I introns with mutations that block RNA spli- 
cing remain capable of homing, but for group II 
introns splicing is a requirement for intron mobility, 
because the spliced intron RNA is active in the hom- 
ing process. The prefix ‘retro-’ acknowledges the role 
of RNA in the group II homing mechanism. The 
intron-encoded proteins of group II introns are more 
complex than those of group I introns. Each is a
single multifunctional protein that generally has
endonuclease, RNA maturase (for splicing enhance-
ment), and reverse transcriptase activities. In all
group II mobile introns, the open reading frame
(ORF) is located in a large loop in the RNA secondary
structure. If the ORF is deleted from the intron, mobil-
ity is lost, but, when the intron-encoded protein is
provided in trans, mobility is restored.

Figure 1 Group I intron homing pathway. Outlined strands represent sequence of the donor allele. Gray
lines represent sequence of the recipient allele. Black lines symbolize the intron sequence, and the pac-man
symbols represent the intron-encoded endonuclease. Arrowheads at the ends of the lines represent the 3’ end of the
DNA.

Group II intron homing is catalyzed by a ribonu- 
cleoprotein consisting of the intron-encoded protein 
and the spliced intron RNA. This is shown in 
Figure 2. The first step in homing is cleavage of the 
homing site of the intron-minus allele. The top strand 
is cleaved by the intron RNA, in a reverse-splicing 
reaction, while the bottom strand is cleaved by the
endonuclease function of the intron-encoded protein.
Recognition of the target occurs primarily by base-
pairing between intron RNA sequences and the DNA
homing site. The inserted intron RNA is then copied
into DNA by the reverse transcriptase moiety of the
protein, using the 3’ end of the cleaved DNA as a
primer. The mechanism by which this cDNA-RNA
hybrid is resolved has not yet been elucidated, but the 
net result is the duplication of the intron in the intron- 
minus allele. 
Figure 2 Group II intron retrohoming pathway. Outlined strands represent sequence of the donor allele. Gray lines
represent sequence of the recipient allele. Black lines symbolize the intron sequence, with solid and dashed lines
representing DNA and RNA, respectively. The staggered arrowheads mark the sites of intron insertion and
endonuclease cleavage. The pathway shown is for a bacterial group II intron. A similar pathway, with some variations,
occurs for the yeast introns. (dsDNA, double-stranded DNA; RNP, ribonucleoprotein; RT, reverse transcriptase;
M, maturase; E, endonuclease.)


Retrotransposition 

Although transposition appears to occur by multiple 
pathways, the major transposition pathway is 
independent of the endonuclease function of the 
intron-encoded protein, as would be predicted for 
integration of the intron into single-stranded nucleic 
acid or the involvement of cellular nuclease(s). The new 
locations show some degree of homology to the intron 
homing target, specifically at the end of the first exon 
in a region that base-pairs with the intron RNA. 


Intron-Encoded Proteins — Homing 
Endonucleases 


The ORFs of mobile introns encode endonucleases 
that function to nick or cut the DNA of the insertion 
site to allow integration of the intron. Homing endo-
nucleases are quite different from restriction endonu-
cleases, which also cut DNA at a sequence-specific
site. Restriction endonucleases generally recognize
small (4–8 bp), often palindromic sites with strong
sequence specificity at the cleavage site. In contrast, all 
the intron-encoded endonucleases characterized thus 
far have large (in the 12- to 40-bp range) recognition 
sites. The homing endonucleases exhibit a relaxed 
sequence specificity over these lengthy recognition 
sites and can tolerate many base changes. Yet the 
main requirement of homing endonucleases is simply 
to initiate cleavage of the DNA at the target site. An 
intron has been engineered to express the EcoRI 
restriction endonuclease, and this intron can home to 
an engineered EcoRI restriction site. 
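The practical consequence of a long recognition site is easy to estimate. Assuming a random sequence with equal base frequencies (an idealization that real genomes violate) and counting one strand only, a specific n-bp site is expected about (L - n + 1) / 4^n times in L bp. The sketch uses 3.1 × 10^9 bp as an illustrative round figure for the human genome; the function name is hypothetical.

```python
def expected_sites(genome_length: int, site_length: int) -> float:
    """Expected occurrences of one specific site of `site_length` bases in a
    random sequence of `genome_length` bases (one strand, equal base frequencies)."""
    if site_length > genome_length:
        return 0.0
    return (genome_length - site_length + 1) / 4 ** site_length


if __name__ == "__main__":
    genome = 3_100_000_000  # illustrative round figure
    for n in (4, 6, 8, 12, 18, 24):
        print(f"{n:>2} bp site: ~{expected_sites(genome, n):.3g} expected occurrences")
```

Under these assumptions a 6-bp restriction site occurs roughly 760,000 times, whereas an 18-bp site is expected less than once (about 0.05 times), which is why homing endonucleases behave as rare cutters.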

Homing endonucleases constitute a diverse group of 
proteins. While found in both mobile introns and 
inteins (mobile elements that splice at the protein 
level), they also exist in freestanding form, such as the 
HO endonuclease involved in mating-type switching 
in S. cerevisiae. Most homing endonucleases fall into 
four major classes, based on their conserved struc- 
tural motifs: LAGLIDADG, GIY-YIG, H-N-H, 
and His-Cys. The names of the first three classes
are the amino acid sequences of their motifs in
single-letter code. Enzymes in both the H-N-H and
His-Cys classes share a common protein fold, and a 
metal ion involved in catalysis, so it has been pro- 
posed that they are diverse members of a single struc- 
tural class. 


Significance 


Evolutionary Implications 
Mobile introns have been found in all the kingdoms of 
life. They share with transposons, retrotransposons, 
and retroviruses the ability to integrate their DNA at 
new positions in the genome. Mobile introns may re- 
present the ultimate ‘selfish DNA,’ since their mobi- 
lity allows for efficient propagation, while the ability 
of the intron to splice prevents gene inactivation. 

Although there is some controversy, it is generally 
believed that mobile introns arose through the inva- 
sion of introns by endonuclease genes. There are 
several lines of evidence to support this hypothesis. 
First, closely related mobile introns of similar 
sequence have been found to code for highly divergent 
endonucleases from different classes, suggesting inde- 
pendent invasion events of the intron by the endo- 
nuclease gene. Second, endonuclease ORFs are looped 
out of different secondary structure elements in dif- 
ferent introns. Third, in a well-studied intron, the 
endonuclease ORF is flanked by sequence that closely 
resembles the intron homing site. 

It is a provocative fact that the group II intron RNA 
is a ribozyme that acts catalytically to nick the DNA 
at the homing site. The RNA world hypothesis pos-
tulates that the first enzymes were RNA-based. The
group II ribonucleoproteins may represent ancient 
biochemistry and a transitional state between an 
RNA world and the DNA-protein world as we 
know it. 

The splicing reactions of group II introns are 
mechanistically similar to those of the nuclear spliceo- 
somal introns, which comprise about 15% of the 
human genome. It is widely hypothesized that group 
II introns evolved into spliceosomal introns, although 
direct evidence in terms of conservation of sequence 
and structure is lacking. It is also noteworthy that 
group II introns resemble retrotransposons that lack 
a long terminal repeat, both in mechanism of inte- 
gration and in sequence of the reverse transcriptase 
moiety of the intron-encoded protein. These retro- 
transposons make up more than 17% of mammalian 
genomes. Group II introns and their close relatives 
have therefore played a major evolutionary role in 
shaping the human genome. 


Potential Applications 

Mobile introns offer a potentially valuable tool for 
gene manipulation. Because the homing sites are of 
such large size, the endonucleases are very useful as 
rare cutters. This enables specific digestion of DNA 
into large fragments. In group II introns, key features 
of the homing site are recognized by base-pairing 
between the DNA and the intron RNA. Thus it is 
theoretically possible to mutate group II introns to 
recognize and insert into any desired site in the gen- 
ome. This may serve to inactivate a deleterious gene, 
or direct a beneficial gene to a benign location, because 
the intron-encoded protein can be provided in trans, 
and the intron can be engineered to carry a gene of 
interest. These introns are therefore useful both for 
gene targeting and as agents of gene delivery. 
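Because target recognition relies on base-pairing, retargeting reduces, in principle, to writing intron RNA segments complementary to the new DNA site. The sketch below computes the RNA sequence that would pair antiparallel with a chosen stretch of target DNA and checks pairing, counting only Watson-Crick pairs (G·U wobble pairs are ignored). It is a simplification: it ignores any contribution of the intron-encoded protein to target recognition, and the function names are hypothetical.

```python
COMPLEMENT_RNA = {"A": "U", "T": "A", "G": "C", "C": "G"}
WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}


def pairing_rna(target_dna: str) -> str:
    """RNA sequence (5'->3') that base-pairs antiparallel with target_dna (5'->3')."""
    return "".join(COMPLEMENT_RNA[base] for base in reversed(target_dna.upper()))


def pairs_with(rna: str, target_dna: str) -> bool:
    """True if every base of rna (5'->3') forms a Watson-Crick pair with
    target_dna (5'->3') when the two strands are aligned antiparallel."""
    if len(rna) != len(target_dna):
        return False
    dna_as_rna = target_dna.upper().replace("T", "U")
    return all((r, d) in WATSON_CRICK for r, d in zip(rna.upper(), reversed(dna_as_rna)))


if __name__ == "__main__":
    site = "GGATCCTTAG"  # hypothetical new target site
    guide = pairing_rna(site)
    print(guide, pairs_with(guide, site))
```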


Further Reading 

Belfort M, Derbyshire V, Parker MM, Cousineau B and 
Lambowitz AM (2001) Mobile introns: pathways and pro- 
teins. In: Craig N, Craigie R, Gellert M and Lambowitz AM 
(eds) Mobile Elements. Washington, DC: ASM Press.

Lambowitz AM and Belfort M (1993) Introns as mobile genetic
elements. Annual Review of Biochemistry 62: 587–622.

Lambowitz AM, Caprara MD, Zimmerly S and Perlman PS
(1999) Group I and group II ribozymes as RNPs: clues to
the past and guides to the future. In: Gesteland RF, Cech TR
and Atkins JF (eds) The RNA World, 2nd edn. Plainview, NY:
Cold Spring Harbor Laboratory Press. 


See also: Introns and Exons; Retrotransposons; 
Retroviruses; RNA World; Transposable 
Elements 
Introns and Exons
An intron (or ‘intervening sequence’) is a segment of
RNA excised from a gene transcript, with concomi- 
tant ligation of flanking segments called ‘exons.’ This 
process of excision and ligation, known as ‘splicing,’ 
is one of several posttranscriptional processing steps 
that may occur prior to translation. Although ‘intron,’ 
in the strict sense, refers only to segments excised 
from RNA (and, by extension, the DNA segments 
that encode them), there exist developmental analogs 
of introns that are excised from DNA (the ciliate 
IES elements) or from protein (the printrons or 
inteins). 


Diversity and Distribution 


Introns of some type are found in every kingdom of 
cellular life, and also in viruses, bacteriophages, and 
plasmids. Different types of introns have different 
splicing mechanisms and distinctive patterns of dis- 
tribution with respect to gene families, subcellular 
compartments, and taxonomic groups (e.g., protein- 
spliced tRNA introns are known only from tRNA 
genes in archaebacterial genomes or eukaryotic nuclear 
genomes). A single gene may have multiple introns 
and, rarely, introns of multiple types (e.g., some fungal 
mitochondrial genes have both group I and group II 
introns). 

The most familiar introns are the ‘spliceosomal’ 
introns, which are excised by a ribonucleoprotein 
‘spliceosome,’ and which typically have the sequence
GU...AG. Spliceosomal introns are known only from 
genes in the eukaryotic nucleus (or nucleomorph) and 
in eukaryotic viruses. They range in length from less 
than 20 nt (nucleotides) to over 200 kilo-nt, while 
exons range in length from less than 10 nt to over 3 
kilo-nt. The mean density of introns varies widely, 
from over 4 introns per kilo-nt of protein-coding 
sequence in the most intron-dense nuclear genomes 
(including those of vertebrates and vascular plants), to 
0.04 in the yeast Saccharomyces cerevisiae. 
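A toy sketch of excision and ligation may help make the definition concrete. The Python function below is hypothetical: it takes a pre-mRNA and a list of intron coordinates (0-based and end-exclusive, an assumed convention), joins the flanking exons, and flags any intron lacking the typical GU...AG ends. It does not model how the spliceosome actually recognizes splice sites.

```python
def splice(pre_mrna, introns):
    """Excise introns from a pre-mRNA and ligate the flanking exons.

    pre_mrna: RNA sequence (A, C, G, U).
    introns: list of (start, end) pairs, 0-based, end exclusive.
    Returns (mature_rna, non_canonical), where non_canonical lists the
    introns that do not begin with GU and end with AG.
    """
    exons = []
    non_canonical = []
    position = 0
    for start, end in sorted(introns):
        if not (position <= start < end <= len(pre_mrna)):
            raise ValueError(f"bad or overlapping intron ({start}, {end})")
        intron = pre_mrna[start:end]
        if not (intron.startswith("GU") and intron.endswith("AG")):
            non_canonical.append((start, end))
        exons.append(pre_mrna[position:start])  # exon upstream of this intron
        position = end
    exons.append(pre_mrna[position:])  # final exon
    return "".join(exons), non_canonical


if __name__ == "__main__":
    # Hypothetical transcript: exon 1 + a GU...AG intron + exon 2.
    pre = "AUGGCC" + "GUAAGUUUUCAG" + "GCAUAA"
    print(splice(pre, [(6, 18)]))
```

Passing a different set of intron coordinates for the same pre-mRNA yields a different mature RNA, which is the essence of alternative splicing.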

Group I and group II introns are collectively
known as ‘self-splicing’ introns, because the intron
RNA plays a primary role in the biochemistry of
splicing, in some cases being sufficient for splicing in
vitro. Group I introns are the most broadly distri-
buted mobile elements known, being found in the
genomes of eubacteria and their phages, as well as in
the nuclear, mitochondrial, and chloroplast genomes
of eukaryotes. Group II introns are known in eukary-
otic organellar (but not nuclear) genomes, as well as
in eubacterial chromosomes and plasmids. Though
common in some organellar genomes, self-splicing
introns are extremely rare elsewhere, and seem to be
entirely absent from most prokaryotic genomes as
well as many eukaryotic nuclear genomes.


Role in Gene Expression 


In most cases, introns appear to be dispensable. 
Introns can be removed entirely from mitochondria 
of S. cerevisiae without obvious ill effect. Neverthe- 
less, in a variety of cases, introns and splicing figure 
importantly in development. The delay caused by the 
transcription and splicing of a gene with many long 
introns can be important (e.g., the knrl gene of Dros- 
ophila). The intron may contain within itself some 
other feature: a DNA regulatory site (e.g., a promoter 
or enhancer), a structural RNA (e.g., intron-encoded 
snoRNAs in eukaryotic nuclear genomes), or a pro- 
tein-coding region (e.g., intron-encoded maturases in 
organellar group I and II introns and homing endonu- 
cleases in bacteriophage introns). Splicing may join 
parts of two different RNA transcripts, a process 
known as ‘trans-splicing’ that is common in trypano- 
somes but rare or absent in most other organisms. 
Finally, the pattern of splicing of a single transcript 
may be variable, such that different mRNAs, and 
different protein products, are produced from the 
same pre-mRNA. Regulation of such ‘alternative 
splicing’ schemes plays a crucial role in sex determin- 
ation in Drosophila. The frequency and importance of 
alternative splicing in most species is not well under- 
stood. 


Mutation and Evolution 


Introns are passively subject to the same mutational 
lesions that affect other genomic sequences; in some 
cases they contribute actively to the mutational pro- 
cess as mobile elements. Nucleotide substitutions that 
alter splicing have been implicated in many heritable 
diseases in humans. Such changes usually map to 
within a few nt of a splice junction. Over evolutionary 
time-scales, the internal sequences of spliceosomal 
introns diverge rapidly (by nucleotide substitutions 
as well as by short insertions and deletions), presum- 
ably because the demands of splicing impose no con- 
straint on most internal sites. By contrast, group I and 
II introns evolve more slowly, and are densely packed 
with sequences that participate in splicing and mo- 
bility. 

Rearrangement mutations involving introns also
occur, sometimes based on recombination between
repetitive elements within introns. In animal genomes,
intron-mediated rearrangements have contributed 
importantly to the evolution of novel chimaeric 
genes by so-called ‘exon shuffling.’ On the scale of 
millions to hundreds of millions of years, homologous 
genes may diverge by loss and gain of introns. Loss of 
an intron may occur by way of reverse transcription 
and recombinational reincorporation of a spliced gene 
product. Insertion of introns by transposition has 
been observed experimentally for group I, group II, 
and spliceosomal introns. For group I and II introns, 
‘homing’ to (intronless) allelic sites is also observed. 


See also: Eukaryotic Genes; Pre-mRNA Splicing 


Invariants, Phylogenetic 
The method of phylogenetic invariants was first proposed
by Lake (1987). The ‘invariants’ derive from the fact
that certain sums and differences of the counts of
particular nucleotide distribution patterns are expected
to remain constant (at zero) for all incorrect phylo-
genies, and thus can be used to distinguish among
alternative phylogenetic trees. The method is applied
to nucleotide sequences taken four at a time. For
example, suppose that we had four such sequences
that are homologously aligned from left to right, one
under the other:


AGA... 
AGT... 
aC Tie 
CTA... 


so that for any position in the alignment the four 
nucleotides produce a (vertical) pattern such as 
AACC. This might suggest that the first two se-
quences are sister sequences, meaning that they are
more closely related to each other than either of them
is to the second two sequences (see Figure 1A). There
are 256 possible patterns (four nucleotides at each of
four positions), but some of them carry the
same information. For example, the same relationship 
would be inferred if the pattern were GGTT. The 
method restricts itself to positions that have exactly 
two purines (A and/or G) and two pyrimidines (C 
and/or T) in their pattern as all the examples used 
here do. Their relationship is shown by the tree in
Figure 1A, where the arrowhead indicates that only a
single transversion mutation is required to explain the
observed nucleotides at the tips of this tree. (A trans-
version is a historical change from a purine to a
pyrimidine or vice versa; all other interchanges are
called transitions.)

On the other hand, a pattern such as ACCA would
suggest that sequences 1 and 4 were sisters rather than
sequences 1 and 2 (see Figure 1B). The two relation-
ships (trees) cannot both be true, but if sequences 1
and 2 really are the true sister sequences, then this
pattern (ACCA) can only have arisen by virtue of two
transversions having occurred during the history of
these sequences (see Figure 1C).

However, we can estimate how often the mislead-
ing case in Figure 1C arises. Note that in Figure 1D
we have shown only three of the four nucleotides in the
pattern. What could the fourth nucleotide be? As we
only consider those patterns with two purines and two
pyrimidines, it must be a pyrimidine. Which one?
If we assume that there is no bias as to which nucleo-
tide the mutation is to, then it can be either C (as in
Figure 1C) or T (as in Figure 1E) with equal prob-
ability. But that means that, for the wrong tree, the
number of occurrences of a pattern like that in Figure
1C should be the same as the number for the pattern
like that in Figure 1E. Hence, subtracting those two
numbers should give a number not statistically differ-
ent from zero for the two tree structures that are wrong.
(The third possible tree corresponds to the pattern ACAC,
which suggests that sequences 1 and 3 are sisters.)
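The counting argument above can be sketched in Python. This is a simplified illustration under stated assumptions, not Lake's full published procedure: for each of the three possible trees it keeps only columns with two purines and two pyrimidines in which each pair of putative sisters shares a nucleotide class, then adds one when both pairs are identical or both differ and subtracts one otherwise, so that patterns such as ACCA and ACTA cancel when the tree is wrong. The names `invariant_scores` and `TREES` are hypothetical, and no significance test is included.

```python
PURINES = frozenset("AG")
PYRIMIDINES = frozenset("CT")

# The three unrooted trees for four sequences, each written as its two
# pairs of sister sequences (0-based indices into the input list).
TREES = {
    "(12)(34)": ((0, 1), (2, 3)),
    "(13)(24)": ((0, 2), (1, 3)),
    "(14)(23)": ((0, 3), (1, 2)),
}


def invariant_scores(seqs):
    """Simplified Lake-style invariant score for each four-taxon tree.

    Only alignment columns with exactly two purines and two pyrimidines
    are counted. Scores near zero are expected for the wrong trees.
    """
    if len(seqs) != 4 or len({len(s) for s in seqs}) != 1:
        raise ValueError("need four aligned sequences of equal length")
    scores = dict.fromkeys(TREES, 0)
    for column in zip(*(s.upper() for s in seqs)):
        if not all(b in PURINES or b in PYRIMIDINES for b in column):
            continue  # skip gaps and ambiguity codes
        if sum(b in PURINES for b in column) != 2:
            continue  # not a two-purine/two-pyrimidine pattern
        for name, ((i, j), (k, l)) in TREES.items():
            # The pattern bears on this tree only if each sister pair shares
            # a nucleotide class, i.e. a transversion separates the pairs.
            if (column[i] in PURINES) != (column[j] in PURINES):
                continue
            same_ij = column[i] == column[j]
            same_kl = column[k] == column[l]
            # e.g. ACCA (+1) and ACTA (-1) are expected to cancel
            # when the tree is wrong.
            scores[name] += 1 if same_ij == same_kl else -1
    return scores


if __name__ == "__main__":
    # Hypothetical alignment whose columns are AACC, ACCA and ACTA.
    print(invariant_scores(["AAA", "ACC", "CCT", "CAA"]))
```

In the toy alignment the ACCA and ACTA columns cancel for the tree (14)(23), leaving only (12)(34) with a nonzero score; with real data the scores are counts over many sites, which is why very long sequences are needed.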

There are more details to the method, but the pre-
ceding gives its spirit. The method is guaranteed to
give the correct answer given sufficiently long
sequences. This virtue, however, is more than offset
by the answer to the question of how long the sequences
must be to get that correct answer. It turns out that the
sequences need to be incredibly long, sometimes longer
than the genome itself, and as a consequence the
method is not used in practice.


Figure 1 Four-taxon trees for sequences 1–4, with the nucleotide at each tip: (A) 1A, 2A, 3C, 4C; (B) 1A, 2C, 3C, 4A; (C) 1A, 2C, 3C, 4A; (D) 1A, 2C, 3?, 4A; (E) 1A, 2C, 3T, 4A.


Reference 

Lake JA (1987) A rate-independent technique for the analysis of 
nucleic acid sequences: evolutionary parsimony. Molecular 
Biology and Evolution 4: 167-191. 


See also: Phylogeny; Transition; Transversion 
Mutation 


Inversion 
An inversion is a DNA rearrangement in which
a segment of a chromosome is flipped (or reversed), 
so that the sequence reads in the opposite direction to 
the original. Genes contained within an inversion will 
map in the reverse order to normal and will be 
expressed in the opposite orientation. 


See also: Hin/Gin-Mediated Site-Specific DNA 
Inversion; Site-Specific Recombination 


Inverted Repeats 


Copyright © 2001 Academic Press 
doi: 10.1006/rwgn.2001.1881 


Inverted repeats are two copies of the same DNA se- 
quence repeated in opposite orientation in the same 
molecule. 


See also: Repetitive (DNA) Sequence 


Inverted Terminal Repeats 
Inverted terminal repeats are short related or identical
sequences repeated in opposite orientation at the ends 
of some transposons. 


See also: Transposable Elements 
An isochromosome is an abnormal metacentric
chromosome formed by the duplication of one arm 
of a normal chromosome with deletion of the other 
arm. Both arms of the metacentric chromosome are 
thus genetically identical. It may arise from transverse 
instead of longitudinal division of the centromere dur- 
ing cell division or, more often, by an isochromatid 
break and fusion of the daughter chromatids above 
the centromere. In the latter case the isochromosome 
is dicentric. One of the two centromeres of a dicentric 
isochromosome usually becomes nonfunctional, so 
that the chromosome segregates normally during cell 
division. 

The commonest human isochromosome observed 
in livebirths is an isochromosome for the long arm of 
the X chromosome. This results in Turner syndrome
(see Turner Syndrome), and the isochromosome is
found to be preferentially inactivated, forming a
larger-than-normal sex chromatin body (Barr body; see Sex
Chromatin). Isochromosomes of the Y chromosome
are also found in livebirths, and can involve either the
short or long arm. Short-arm Y isochromosomes
cause male infertility: the testis-determining region
is retained, but the spermatogenesis factors on the
long arm are lost. Long-arm isochromosomes
of the Y are associated with female sex determination
unless the isochromatid break lies distal to the
sex-determining region of the Y.

Isochromosomes involving the human autosomes 
usually result in early spontaneous abortion; rare 
exceptions are isochromosomes for the short arms 
of chromosomes 9 and 12, and these are associated 
with severe mental and physical disability. 


See also: Sex Chromatin; Turner Syndrome; X- 
Chromosome Inactivation 


Isolation by Distance 
Sewall Wright pioneered the study of how genetic
similarity declines with geographic distance. His
work was based on a hierarchical model of local popu-
lations (demes) in successively larger regions, each
with its own gene frequencies. This model is difficult
to apply to real populations, and so has been super-
seded by the theory of Gustave Malecot for pairs of
individuals born at a known distance d in a given
region. Genetic similarity is measured by kinship $\varphi$,
the probability that a gene drawn randomly from one
individual is identical by descent with a random allele
in the other individual. If the pair are spouses, their
kinship equals the inbreeding coefficient F of their
children. The Malecot equation is usually written as

$$\varphi(d) = (1 - L)\,a\,e^{-bd} + L$$

where $0 < a < 1$ is kinship within a local popu-
lation ($d = 0$) and $-1 < L \le 0$ is kinship at large
distance. If current gene frequencies are used in
kinship bioassay on genotypes, phenotypes, or sur-
names, $L = -\varphi_r/(1 - \varphi_r)$, where $\varphi_r$ is random
kinship in the sampled region. If kinship in relation to
founder gene frequencies is predicted from migration
or genealogy, $L = 0$. The parameters a and b are functions
of effective population size N and systematic pres-
sure m, largely due to migration. Validity of this equa-
tion depends on discreteness of local populations.
More complicated expressions derived for continuous
distributions and two or three dimensions are less
accurate for real populations. Oceanic islanders
and nomadic populations have small values of b
compared with coastal islanders and agriculturists.
Kinship increases rapidly in populations with pre-
ferential consanguineous marriage but then reaches
a plateau that is not much greater than for isolates
that avoid consanguineous marriage. The effect of
migration is everywhere apparent. This evidence
helped to resolve misunderstanding about the role of
population structure in assessing forensic DNA iden-
tification.
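The Malecot equation is simple to evaluate directly. The Python sketch below uses hypothetical function names and illustrative parameter values (not estimates for any real population); it enforces the parameter ranges given in the text and computes L from random kinship for the case where current gene frequencies are used.

```python
import math


def large_distance_kinship(phi_r):
    """L = -phi_r / (1 - phi_r), for kinship based on current gene frequencies."""
    if not 0 <= phi_r < 1:
        raise ValueError("random kinship phi_r must lie in [0, 1)")
    return -phi_r / (1 - phi_r)


def malecot_kinship(d, a, b, L=0.0):
    """Malecot's phi(d) = (1 - L) * a * exp(-b * d) + L.

    d: distance between the pair; a: kinship term within a local population;
    b: rate at which kinship declines with distance; L: kinship at large
    distance (L = 0 when kinship is predicted from founder gene frequencies).
    """
    if not 0 < a < 1:
        raise ValueError("a must lie in (0, 1)")
    if not -1 < L <= 0:
        raise ValueError("L must lie in (-1, 0]")
    return (1 - L) * a * math.exp(-b * d) + L


if __name__ == "__main__":
    # Illustrative (made-up) parameters: a = 0.01, b = 0.1 per distance unit,
    # random kinship in the sampled region phi_r = 0.002.
    L = large_distance_kinship(0.002)
    for distance in (0, 5, 20, 100):
        print(distance, malecot_kinship(distance, a=0.01, b=0.1, L=L))
```

Kinship falls from roughly a at d = 0 toward the asymptote L; a smaller b, as the text reports for oceanic islanders and nomads, makes that decline slower.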

In recent years the Malecot model has been useful 
for study of linkage disequilibrium. Distance between 
loci or nucleotide polymorphisms is measured along 
the physical or genetic map, usually in kilobases (kb) 
or centimorgans (cM), taking advantage of the fact 
that recombination acts on allelic association in the 
same way as migration acts on kinship. Isolation by 
distance has become a cornerstone of genetic epidemi- 
ology, as it has long been for population genetics and 
anthropology. 


Further Reading 

Wright S (1951) The genetical structure of populations. Annals
of Eugenics 15: 323-354.

Malecot G (1969) The Mathematics of Heredity. San Francisco, 
CA: WH Freeman. 

Lasker GW (1985) Surnames and Genetic Structure. Cambridge: 
Cambridge University Press. 
Morton NE (1992) Genetic structure of forensic populations.
Proceedings of the National Academy of Sciences, USA 89: 2556-2560.


See also: Effective Population Number; Linkage 
Disequilibrium; Wright, Sewall 


Isoleucine 
Isoleucine (Ile or I) is one of the 20 amino acids
commonly found in proteins. Its side chain is purely
hydrocarbon, and it is only slightly soluble in
water. Isoleucine belongs to the group of neutral nonpolar
amino acids, which includes glycine, alanine, valine,
leucine, phenylalanine, proline, and methionine. These
amino acids are usually found on the inside of protein
molecules.


COO 
N-e- H 
N—C— CH; 
Gh 
Gh, 


Figure | Isoleucine. 


See also: Amino Acids; Proteins and Protein 
Structure 


Isomerization (of Holliday 
Junctions) 
A crossed-strand exchange or Holliday junction can
be resolved endonucleolytically to restore the parental
combination of flanking markers, or to give a reciprocal
exchange (a crossover; see Figure 1C). This bifurcating
decision need not require two different activities
of the endonuclease making the cuts (the resolvase).
Model building has shown that the Holliday junction
itself can adopt an alternative form, such that the
same enzyme activity gives the alternative results of
crossover or noncrossover. Alternative forms of a
molecule are called isomers, and the process by which
a molecule adopts an alternative structure is called
isomerization. These terms are applied to Holliday
junctions.

Figure 1 (A) With some base pairs unstacked, the Holliday junction takes on an X-form. The two DNA strands that cross in the middle of this X (labeled I) are those that exchanged places in the formation of the Holliday junction. Rotation of the upper arms, as shown by the circular arrow, reveals that the structure has a hole in its center. (B) A rotation of two side arms of this structure relative to the other two gives a configuration in which the other two strands cross in the center (shown at II). (C) If the crossing strands are cut at I in (A), the structure is resolved as a noncrossover. If the crossing strands are cut at II, a crossover results.

Isomerization of a Holliday junction is conceived 
as beginning with some bases becoming unstacked. 
That is, the eight bases at the junction no longer inter- 
act with their neighbors in the same DNA strand. This 
allows the arms of the junction to open out so that the 
structure takes on the form of an X as shown in the 
first structure in Figure 1A. The process of isomer-
ization of a Holliday junction is described as two
rotations of pairs of arms of the structure, as shown
in Figure 1A and B. The second rotation causes
the two strands that cross each other to be a different 
pair than those that cross in the first structure. If the 
resolvase cuts only the crossing strands, the two iso- 
mers then give rise to the alternative outcomes. These 
rotations are constrained to occur in one direction 
only. DNA has sufficient flexibility for the rotating 
parts of the molecule to be local, rather than involving 
the whole length of DNA molecules. The process is 
reversible and the two isomers are expected to occur in 
a state of rapid equilibrium. 


Further Reading 

Meselson MS and Radding CM (1975) A general model for 
recombination. Proceedings of the National Academy of 
Sciences, USA 72: 358-361. 

Sigal N and Alberts B (1972) Genetic recombination: the nature 
of a crossed strand-exchange between two DNA molecules. 
Journal of Molecular Biology 71: 789-793. 


See also: Cruciform DNA; Holliday Junction; 
Holliday’s Model 
An isotype is a set of macromolecules sharing some
common features, e.g., closely related immunoglobulin
chains. The isotype describes the class, subclass, light
chain type, and subtype of an immunoglobulin. 


See also: Immunoglobulin Gene Superfamily 


Isotype Switching 


See: Recombination in the Immune System 


J Gene 


See: Recombination in the Immune System 


Jackknifing 


See: Trees 


Jacob, Francois 
The French biologist François Jacob (1920- ) shared
the 1965 Nobel Prize for Physiology or Medicine with 
André Lwoff and Jacques Monod for their discoveries 
concerning genetic regulatory mechanisms in bacteria. 

After being severely wounded whilst in active com- 
bat in World War II, Jacob was forced to give up his 
studies for his chosen career as a surgeon. After gain- 
ing an MD degree in 1947 and a PhD in science in 1954 
from the Faculty of Medicine and Faculty of Science 
in Paris respectively, he turned to biology. 

Jacob started as a research assistant at the
Pasteur Institute in Paris in 1950, and it was not long
before he became Laboratory Director. Within
10 years he had been promoted to Head of the
Department of Cellular Genetics. By 1965 he was also
Professor at the Collège de France, where a chair of
Cell Genetics was created for him.

With coworker Jacques Monod, Jacob studied the 
regulation of enzyme synthesis in bacteria. Later they 
made the significant discovery of ‘regulator genes’ and 
the mechanisms for controlling the expression of 
structural genes. The ‘operon’ theory of gene regula-
tion (see Operon) is central to today’s under-
standing of genetic control. This discovery explained
the mechanisms by which cells modulate the expres- 
sion of genes in response to varying environmental 
conditions. Jacob, with Sydney Brenner and Matthew 
Meselson, also proved the existence of messenger 
RNA. 


Jacob has received many scientific
awards and is an honorary member of numerous
societies including the French Academy of Sciences, 
The National Academy of Sciences of the USA and 
the Royal Society of London. He has published many 
books on molecular biology. 


Further Reading 
http://www.nobel.se/medicine/laureates/1965/jacob.bio.html.
http://www.rockefeller.edu/pubinfo/jacob.nt.html. 


See also: Brenner, Sydney; Gene Expression; lac 
Operon; Monod, Jacques; Operon 
The Jukes–Cantor equation provides an estimate of
the actual number of nucleotide substitutions since the
separation of two DNA sequences by correcting
the observed differences for multiple substitutions at
the same site. Two DNA sequences evolve from the
same ancestral sequence by accumulating mutational
differences. The number of nucleotide differences can
be counted by comparing the aligned sequences. The
observed number of nucleotide differences does not
always reveal all the nucleotide substitutions that
have occurred during the evolutionary past, because
multiple substitutions at the same nucleotide position
remain undetected. The Jukes–Cantor method cor-
rects the estimate of sequence differentiation for such
multiple hits. The method is based on a model which
assumes that all four nucleotides are equally frequent,
all types of nucleotide substitutions are equally com-
mon, and all nucleotide sites mutate with the same
probability. Under these assumptions, it is easy to
derive the relationship between the observed propor-
tion (p) of nucleotide differences between two se-
quences and the number of nucleotide substitutions
per site (d) that have occurred. Let the proportion of
nucleotide differences at time t since the common
ancestor be p and the probability of one nucleotide
mutating per unit of time be α. The expected number of
nucleotide substitutions per site in the two sequences,
their evolutionary distance, is then d = 2αt; the factor
of two arises because mutations have occurred in both
lineages. In terms of the observed p, the evolutionary
distance becomes


$$d = 2\alpha t = -\frac{3}{4}\ln\left(1 - \frac{4}{3}p\right)$$


which is the Jukes–Cantor distance of the sequences.
The estimate d is always larger than the observed
differentiation p (for p > 0). For small differences
between the sequences (say p < 0.2), the observed
proportion p approximates the total number of
substitutions reasonably well, as multiple mutations
of the same nucleotide site are unlikely. With
increasing differentiation, d departs increasingly from
p. When p reaches or exceeds 0.75, the sequences
are saturated by mutational differences and d becomes
undefined. The limit of 75% difference (or 25% simi-
larity) is what one gets by comparing two random
sequences built from four equally frequent nucleotides.
Saturation makes any estimate of sequence differenti-
ation unreliable. This is seen from the variance of the
distance estimate d, which is

$$V(d) = \frac{p(1 - p)}{L\left(1 - \frac{4}{3}p\right)^{2}}$$

where L is the length of the sequence (number of
nucleotides). As p approaches 0.75, the variance
increases rapidly.

The Jukes–Cantor method corrects for multiple
substitutions of the same site but not for different
mutational probabilities that depend on the type (A,
G, T, or C) and position of the nucleotide. These can be
taken into account, e.g., by Kimura’s two-parameter
model (allowing different probabilities for transitional
and transversional mutations) or by distinguishing
between synonymous and nonsynonymous sub-
stitutions in a coding sequence. Even though the
Jukes–Cantor model oversimplifies the underlying
evolutionary process, it has the advantage that it is
robust and depends on the minimal number of model
parameters (only the single, equal substitution rate).
Estimators based on more complex substitution
models are superior if the assumptions of the
model are correct, but they can also be more sensitive
to departures between the assumptions of the model
and the reality of molecular evolution.

If one has an estimate of the substitution rate α from a
known time of divergence (e.g., based on fossil
evidence) and the rate is close to constant over time
in different evolutionary lineages, it becomes possible
to estimate the time of separation of any lineages from
t = d/(2α). Rate constancy can be tested by a rela-
tive rate test. A matrix of pairwise Jukes–Cantor dis-
tances can be used for constructing phylogenies with
distance-based methods, such as the neighbor-joining
method.
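The dating formula and the distance matrix mentioned above can be sketched as follows. The functions are hypothetical, the rate in the example is an arbitrary illustrative value, and the resulting matrix would still have to be passed to a tree-building method such as neighbor-joining, which is not implemented here.

```python
import math
from itertools import combinations


def jc_distance(seq1, seq2):
    """Jukes-Cantor distance d = -(3/4) ln(1 - 4p/3) between aligned sequences."""
    if len(seq1) != len(seq2) or not seq1:
        raise ValueError("sequences must be non-empty and of equal length")
    p = sum(x != y for x, y in zip(seq1, seq2)) / len(seq1)
    if p >= 0.75:
        raise ValueError("sequences are saturated (p >= 0.75)")
    return -0.75 * math.log(1 - 4 * p / 3)


def distance_matrix(seqs):
    """Symmetric matrix (dict of dicts) of pairwise Jukes-Cantor distances."""
    names = list(seqs)
    matrix = {a: {b: 0.0 for b in names} for a in names}
    for a, b in combinations(names, 2):
        matrix[a][b] = matrix[b][a] = jc_distance(seqs[a], seqs[b])
    return matrix


def divergence_time(d, rate):
    """t = d / (2 * alpha), assuming a constant per-lineage rate alpha."""
    if rate <= 0:
        raise ValueError("rate must be positive")
    return d / (2 * rate)


if __name__ == "__main__":
    # Hypothetical aligned sequences and an arbitrary rate of 0.01
    # substitutions per site per lineage per unit time.
    alignment = {"x": "ACGTACGTAC", "y": "ACGTTCGTAA", "z": "ACCTACGTAA"}
    matrix = distance_matrix(alignment)
    for a, b in combinations(alignment, 2):
        d = matrix[a][b]
        print(a, b, round(d, 4), round(divergence_time(d, 0.01), 1))
```

Any time estimate obtained this way inherits the assumption of rate constancy, which the relative rate test mentioned in the text is meant to check.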


Further Reading 

Jukes TH and Cantor CR (1969) Evolution of protein molecules. 
In: Munro HN (ed.) Mammalian Protein Metabolism, vol. 3, pp. 
21-132. New York: Academic Press. 

Li W-H (1997) Molecular Evolution. Sunderland, MA: Sinauer 
Associates. 


See also: Kimura Correction; Molecular Clock 
See: Horizontal Transfer; Transposon Excision;
Transposons as Tools 
Lac mutants are organisms that contain mutations in
some part of the lac operon or its controlling ele-
ments. Therefore, they contain some defect in the
metabolism of the disaccharide lactose, or in the regu-
lation of this metabolism, when compared with wild-
type strains. Lac mutants are of historic interest
because they helped to uncover the structure and regu-
lation of the lac operon, the first operon discovered.
They are also of interest because the techniques which
were developed to screen or select these mutants are
still used in the classroom and the laboratory.

Wild-type strains of the bacterium Escherichia coli
are phenotypically Lac+, meaning they have the abil-
ity to use lactose as a sole source of carbon. In order to
be Lac+, E. coli must be able to express a functional
lacZ gene, which encodes β-galactosidase, and a func-
tional lacY gene, which encodes the lactose permease.
Mutants in which either of these genes has been in-
activated are said to be Lac− and cannot utilize lactose.
Joshua Lederberg and his associates were the first to
isolate and map Lac− mutants of E. coli, beginning in
the 1940s. Lac− mutants can be identified by their
failure to grow when lactose is the sole carbon source
or by the use of various types of indicator plates.
Mutations in lacZ or lacY can be differentiated by a
variety of techniques. For example, mutants which
cannot produce the lactose permease also cannot grow
on melibiose under certain conditions.

Mutations are also known in the regulatory genes
or regions controlling the lac operon. Mutants with a
mutation in the lac promoter will typically be Lac−;
that is, the promoter will no longer function or at least
will show decreased expression. However, mutants
which cannot make the lactose repressor, the product
of the lacI gene, or which make a repressor that cannot
bind the operator, will remain Lac+ but will consti-
tutively express the products of the operon. Such
mutants will grow on the sugar raffinose, which
requires the lactose permease for entry into the cells
but is not an inducer of the operon. Constitutive
expression of β-galactosidase can also be monitored
using the chromogenic compound X-gal (5-bromo-4-
chloro-3-indolyl-β-D-galactopyranoside), which is also
not an inducer of the operon.

However, lacI mutants are also known which lead
to repressor binding to lacO, the lactose operator,
even in the presence of an inducer. These mutants
will be phenotypically Lac−, and the mutation will
be dominant to the wild-type lacI allele. Similarly,
most mutations in lacO should diminish or destroy
the ability of this site to bind the repressor and lead to
constitutive formation of the lac operon enzymes.
However, some mutations in lacO lead to enhanced
binding, and the mutants are Lac−. Note that because
lacO is a noncoding regulatory region on the DNA,
mutations in it will only have an effect on the operon
of which they are a part; that is, they will only oper-
ate in cis. On the other hand, lacI mutations will
function in trans, because the repressor is a diffusible
protein. The ability to make partial diploid strains of
E. coli was a very important tool in the analysis of
these Lac mutants.
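The cis/trans logic of these partial diploids can be made concrete with a small Python model. It is a qualitative sketch only: the allele labels ("+", "-", "s", "c"), the class and function names, and the three phenotype categories are hypothetical, and the model ignores lacY, CAP and cAMP, promoter mutations, and operator mutations that enhance repressor binding. It encodes just the rules stated in the text: repressor acts in trans, the operator acts in cis, and a repressor that stays on the operator despite inducer is dominant.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LacCopy:
    """One copy of the lac region in a haploid or partial diploid cell.

    lacI: "+" normal repressor, "-" no functional repressor,
          "s" repressor that stays on the operator even with inducer.
    lacO: "+" normal operator, "c" operator that cannot bind repressor.
    lacZ: True if this copy encodes functional beta-galactosidase.
    """
    lacI: str
    lacO: str
    lacZ: bool

    def __post_init__(self):
        if self.lacI not in {"+", "-", "s"} or self.lacO not in {"+", "c"}:
            raise ValueError(f"unknown allele in {self}")


def repressor_binds(repressor, operator, inducer):
    """Whether one repressor type occupies one operator under given conditions."""
    if operator == "c" or repressor == "-":
        return False
    if repressor == "s":
        return True  # not released by inducer
    return not inducer  # normal repressor is released by inducer


def beta_gal_made(copies, inducer):
    """True if any copy in the cell expresses a functional lacZ."""
    # The repressor is a diffusible protein, so every lacI allele acts in trans.
    repressors = {copy.lacI for copy in copies}
    for copy in copies:
        # The operator acts only in cis, on the genes of its own copy.
        blocked = any(repressor_binds(r, copy.lacO, inducer) for r in repressors)
        if copy.lacZ and not blocked:
            return True
    return False


def phenotype(copies):
    """Classify a genotype as 'constitutive', 'inducible', or 'uninducible'."""
    if beta_gal_made(copies, inducer=False):
        return "constitutive"
    if beta_gal_made(copies, inducer=True):
        return "inducible"
    return "uninducible"


if __name__ == "__main__":
    examples = {
        "I- Z+ / F' I+ Z-": [LacCopy("-", "+", True), LacCopy("+", "+", False)],
        "I+ Oc Z+ / F' I+ O+ Z-": [LacCopy("+", "c", True), LacCopy("+", "+", False)],
        "I+ Oc Z- / F' I+ O+ Z+": [LacCopy("+", "c", False), LacCopy("+", "+", True)],
        "Is Z+ / F' I+ Z+": [LacCopy("s", "+", True), LacCopy("+", "+", True)],
    }
    for genotype, copies in examples.items():
        print(f"{genotype:26} {phenotype(copies)}")
```

The contrast between the second and third genotypes is the classical test for cis action: the same operator mutation gives constitutive synthesis only when it sits on the same copy as the functional lacZ.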

The lac operon, like many others in E. coli, is also
positively controlled by the level of cyclic AMP
(cAMP) and the cAMP binding protein (catabolite
activator protein, CAP), encoded by the crp gene.
Mutations in the genes controlling the level of cAMP
or the production of CAP will also be phenotypically
Lac−. However, such mutations will be highly pleio-
tropic, and it would be unusual to refer to them as
‘Lac mutants.’

Interestingly, amino acid residues can be added to
the amino terminus of β-galactosidase without import-
ant effects on enzyme activity. Therefore lacZ is un-
usually insensitive to insertion mutations in this region
if they maintain the correct reading frame. Because of
this, many cloning vectors have been designed to con-
tain a reporter which consists of a multiple cloning
site, or polylinker, inserted into this region of the lacZ
gene. Essentially all that is required is that the syn-
thetic cloning site does not lead to a frameshift or
termination of translation. DNA fragments which
are subsequently inserted into such a multiple cloning
site will typically introduce such mutations, and
therefore clones which contain inserts can be readily
identified by screening.
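The requirement that a polylinker neither shift the frame nor stop translation can be expressed as a short check. This Python sketch is hypothetical and deliberately simple: it assumes the insert is given on the coding strand and begins exactly on a codon boundary of lacZ, and it does not reproduce the actual cloning site of any particular vector. Inserts that fail the check correspond to the disrupted lacZ that screening (for example, on X-gal indicator plates) is designed to reveal.

```python
STOP_CODONS = frozenset({"TAA", "TAG", "TGA"})


def preserves_lacz_frame(insert):
    """Check whether an insertion into the 5' region of lacZ keeps it readable.

    Assumes the insert (coding strand, 5'->3') begins exactly on a codon
    boundary of lacZ. It then keeps the reading frame only if its length is a
    multiple of three and it contains no in-frame stop codon.
    """
    insert = insert.upper()
    if len(insert) % 3:
        return False  # frameshift: downstream lacZ codons are misread
    codons = (insert[i:i + 3] for i in range(0, len(insert), 3))
    return not any(codon in STOP_CODONS for codon in codons)


if __name__ == "__main__":
    # Hypothetical sequences; EcoRI (GAATTC) and BamHI (GGATCC) sites are
    # used only as familiar-looking examples.
    for seq in ("GAATTCGGATCC", "GAATTCTAAGGATCC", "GAATTCG"):
        print(seq, preserves_lacz_frame(seq))
```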


See also: Constitutive Expression; lac Operon; 
Lederberg, Joshua; Phenotype 


lac Operon 


J Parker 


Copyright © 2001 Academic Press 
doi: 10.1006/rwgn.2001.0738


The lactose or lac operon of Escherichia coli is a cluster
of three structural genes encoding proteins involved in
lactose metabolism, together with the sites on the DNA
involved in regulation of the operon. The three genes are: (1)
lacZ, which encodes the enzyme β-galactosidase
(which splits lactose into glucose and galactose); (2)
lacY, which encodes lactose permease; and (3) lacA,
which encodes a thiogalactoside transacetylase. Functional
β-galactosidase and lactose permease are required for
the utilization of lactose by this bacterium. These
proteins are present in the cell in very low amounts
when the organism is grown on carbon sources other
than lactose. However, the presence of lactose and
related compounds leads to the induction of the
synthesis of these proteins. Interest in understanding
the induction of β-galactosidase by its inducer, lactose,
led Jacques Monod and his associates to begin study-
ing the regulation of lactose metabolism in the 1940s.
These studies were aided by synthetic analogs of
lactose. Of equal importance, genetic
systems (conjugation and transduction) for E. coli
were known which enabled genetic analysis of mu-
tants with alterations in lactose metabolism.

Throughout the 1950s, Jacques Monod, François 
Jacob, and their colleagues performed physiological 
and genetic experiments on lactose metabolism in 
E. coli that led to important breakthroughs in our 
understanding of gene expression and regulation. It
was found that some inducers were not substrates
of β-galactosidase and some substrates were not
inducers. Elegant genetic experiments involving lac
mutants led in turn to the discovery of regulatory
genes such as lacI, which encodes the lac repressor.
These and other experiments led to the operon model
of gene expression proposed in 1961. The power of
this model was widely appreciated; Jacob and Monod
shared the 1965 Nobel Prize with André Lwoff.

The genes in an operon are transcribed into a single, 
polycistronic messenger RNA (mRNA), in this case 
from the lac promoter lacP. The regulatory sites that 
are part of the operon also include the Jac operator 


lacO. When the lactose repressor binds to lacO, a 
region immediately upstream of the structural genes 
of the lac operon, it prevents transcription of the 
operon. This is an example of negative control. In- 
ducers of the operon bind to the repressor and cause 
a conformational change that leads to the dissociation
of the repressor from the operator. Transcription
of the operon then begins. (Although the gene encod- 
ing the lactose repressor is not part of the lac operon, 
it is located next to it on the chromosome.) 

Later it was discovered that there is another regu- 
latory protein, which participates in positive control 
of the lac operon. This is the catabolite activator pro- 
tein (CAP; also called the cAMP receptor protein, 
CRP), which, when bound to cAMP, itself binds to a 
region of the lac operon upstream of the promoter and
promotes binding of RNA polymerase. The CAP protein is
involved in regulation of many operons as part of a 
global control system, catabolite repression, which 
allows the efficient integration of the metabolism of 
different carbon sources. 
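The two layers of control described above, negative control by the repressor and positive control by CAP-cAMP, can be summarized as a small truth table. The sketch below is a deliberately qualitative model, not a kinetic one: the function name `lac_output`, its two Boolean inputs, and the three output labels are hypothetical choices made for illustration, and real expression levels are graded rather than discrete.

```python
def lac_output(inducer_present: bool, cap_camp_bound: bool) -> str:
    """Qualitative lac operon output under negative and positive control.

    inducer_present: an inducer has bound the lac repressor, so the
        repressor has dissociated from the operator lacO.
    cap_camp_bound: CAP carrying cAMP is bound upstream of the promoter.
    The returned labels are illustrative categories, not measured levels.
    """
    # Negative control: without inducer the repressor occupies lacO and
    # blocks transcription, whatever CAP is doing.
    if not inducer_present:
        return "repressed"
    # Positive control: with the repressor gone, CAP-cAMP promotes
    # RNA polymerase binding and strong transcription.
    return "high" if cap_camp_bound else "low"


for inducer in (False, True):
    for cap in (False, True):
        print(f"inducer={inducer!s:5} CAP-cAMP={cap!s:5} -> {lac_output(inducer, cap)}")
```

In this model, catabolite repression corresponds to the second input being false, so even an induced operon stays at the low level.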

The E. coli lac operon is of much more than histor- 
ical importance. Not only has it proved extremely 
useful as a model for studies of gene regulation, it is 
also a powerful tool in genetic analysis. For example, 
the ease of assaying β-galactosidase, both in vitro
using colorimetric assays and on plates using chromo- 
genic substrates, has made lacZ an ideal reporter gene 
in a large variety of experimental situations. In addi- 
tion, the regulatory system consisting of the lac 
repressor and lac operator is often incorporated into 
cloning vectors to provide an easily controlled regu- 
latory system for cloned genes. 


See also: Catabolite Repression; Cloning Vectors; 
Induction of Transcription; Jacob, Francois; lac 
Mutants; Monod, Jacques; Operators; Operon; 
Polycistronic mRNA; Promoters; Regulatory 
Genes 
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A disaccharide (two sugars joined by an O-glycosidic 
bond) commonly found in milk. Lactose is termed a 
β-galactoside because it consists of galactose joined
to glucose via a β(1→4) glycosidic linkage. Lactose
is cleaved by the enzyme β-galactosidase to yield
galactose and glucose. The study of the regulation of
β-galactosidase synthesis in bacteria by Jacques Monod
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The karyotype is the chromosome complement of a 
cell, individual, or species, classified according to 
chromosome length, centromere position, and band- 
ing appearance produced by specific staining tech- 
niques. 

The karyotype of a somatic cell is often arranged to 
show chromosome pairs in order of decreasing length 
and numbered accordingly. A diagram of the karyo- 
type based on the analysis of a number of cells is 
referred to as an idiogram. The process of analyzing 
the chromosomes of a cell or individual and arranging 
them according to the species idiogram is known as 
karyotyping. 


See also: Idiogram 
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kb (kilobase) is the abbreviation for 1000 base pairs. 


See also: Bases; DNA Structure 
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Har Gobind Khorana (1922- ) is one of the most out-
standing geneticists in the world. Khorana may be best
known for contributing to solving the genetic code in
the 1960s. He has solved a number of important genetic 
problems from a chemical standpoint over the past 
five decades. His greatest contributions have been 
in the synthesis of oligonucleotides and small 
nucleic-acid-like molecules which culminated in the 
synthesis of a tRNA gene in the 1970s. His recent 
work has focused on the structure and function of 
rhodopsin and its role in signal transduction across 
membranes. 

He was born in Raipur, India in 1922 and was 
educated in India, England, and Switzerland. He has 
served on the faculty of the University of British 
Columbia (1952-1960), the University of Wisconsin, 
Institute for Enzyme Research (1960-1970), and 
Massachusetts Institute of Technology, Departments 
of Chemistry and Biology (1970—present). He has re- 
ceived numerous awards and prizes including the 
Nobel Prize for Physiology or Medicine (shared 
with R.W. Holley and M.W. Nirenberg) in 1968. He 
has received at least 14 honorary doctorate degrees 
and has been elected to numerous honorary member- 
ships to academic societies. 

During his early work at the University of British 
Columbia, he pioneered the chemical synthesis of 
small ribo- and deoxyribonucleoside triphosphates 
using dicyclohexylcarbodiimide, and his was the foremost
laboratory in the synthesis of dinucleotide and trinu- 
cleotide molecules of the deoxy- and ribo- types. In 
the 1960s at the University of Wisconsin, he developed 
methods for the synthesis of oligonucleotides as tem- 
plates for DNA and RNA polymerases and/or sub- 
strates for kinases and ligases. This work culminated 
in solving the genetic code in 1966. 

The total synthesis of a tyrosine suppressor tRNA 
gene with upstream and downstream control se-
quences was accomplished at MIT in the 1970s. Over the
past 25 years, Khorana and his colleagues have suc-
cessfully investigated mutant forms of bacteriorhodopsin
to identify the amino acid residues involved in transport
of protons across membranes.


See also: Genetic Code; Nirenberg, Marshall 
Warren 
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When we compare two homologous nucleotide 
sequences, we are often interested in estimating the 
number of nucleotide substitutions accumulated 
during the divergence of the two sequences. Let us 
assume that we have obtained a reliable alignment of
those two sequences. The simplest approach is then to
count the number (m) of nucleotide differences
between them. We often divide m by the number (n)
of nucleotides compared; gap positions caused by
insertions and deletions are not included.
The proportion (p = m/n) is called the p distance. 
When the amount of divergence is small, it is intui- 
tively clear that m or p reflects the actual number of 
nucleotide substitutions accumulated since the diver- 
gence of the two sequences. This is because parallel, 
backward, or successive substitutions at the same 
nucleotide site rarely occur under a low divergence. 
When the amount of divergence is relatively large, 
however, the probability of occurrence of those changes
is expected to increase. Therefore, we need some kind 
of correction for m and p. 
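Counting m, n, and p from an alignment takes only a few lines. This is a minimal Python sketch that assumes the two sequences are already aligned to equal length and that gaps are written as '-'; the function name `p_distance` is hypothetical.

```python
def p_distance(seq1: str, seq2: str, gap: str = "-") -> tuple[int, int, float]:
    """Return (m, n, p) for two aligned nucleotide sequences.

    m: number of compared sites that differ
    n: number of sites compared (gap positions are excluded)
    p: m / n, the p distance
    """
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    m = n = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a == gap or b == gap:
            continue  # insertion/deletion positions are not counted
        n += 1
        if a != b:
            m += 1
    if n == 0:
        raise ValueError("no ungapped sites to compare")
    return m, n, m / n


m, n, p = p_distance("ACGT-ACGTA", "ACGTTACTTA")
print(m, n, round(p, 3))  # 1 9 0.111
```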

The simplest mathematical model for the correc- 
tion is the one-parameter model. This model is also 
called the Jukes-Cantor model after the two research-
ers who first used it. Under the one-parameter model,
each nucleotide is assumed to change to any of the
other three with equal probability. This simple
assumption clearly does not match the real pattern of
nucleotide substitution. 
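The text does not give the one-parameter formula, but the standard Jukes-Cantor correction is K = -(3/4) ln(1 - 4p/3), where p is the p distance defined above. A minimal Python sketch (the function name is hypothetical):

```python
import math

def jukes_cantor(p: float) -> float:
    """Estimated substitutions per site under the one-parameter model.

    K = -(3/4) * ln(1 - 4p/3). The correction is undefined once p reaches
    0.75, the expected difference between two unrelated random sequences.
    """
    if not 0.0 <= p < 0.75:
        raise ValueError("p must satisfy 0 <= p < 0.75")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


for p in (0.01, 0.1, 0.3, 0.5):
    print(p, round(jukes_cantor(p), 4))
```

The corrected value always exceeds p for p > 0, and the gap widens as p grows, reflecting the parallel, backward, and successive substitutions that a raw count misses.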

Kimura (1980) proposed a model with two different
rates of nucleotide substitution, which is therefore
called the two-parameter model. In practice, transitions
usually outnumber transversions, so the substitution
rates for those two types are usually assumed to be
different under the two-parameter model. Theoretically,
however, any two substitution types can be considered
in a two-parameter model.

The number (K) of nucleotide substitutions per site 
is estimated as: 


$$K = -\frac{1}{2}\ln\left[(1 - 2P - Q)\sqrt{1 - 2Q}\right]$$


where P and Q are proportions of transitional and 
transversional differences, respectively. 
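The two-parameter estimate can be computed directly from an alignment by classifying each difference as a transition (A↔G or C↔T) or a transversion. The sketch below assumes aligned sequences over A, C, G, and T only, skips any other symbol (including gaps), and uses hypothetical function names.

```python
import math

TRANSITIONS = {frozenset("AG"), frozenset("CT")}  # purine-purine, pyrimidine-pyrimidine
VALID = set("ACGT")

def transition_transversion_proportions(seq1: str, seq2: str) -> tuple[float, float]:
    """Return (P, Q): proportions of transitional and transversional differences."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    ts = tv = n = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in VALID or b not in VALID:
            continue  # gaps and ambiguous bases are not compared
        n += 1
        if a == b:
            continue
        if frozenset((a, b)) in TRANSITIONS:
            ts += 1
        else:
            tv += 1
    if n == 0:
        raise ValueError("no comparable sites")
    return ts / n, tv / n

def kimura_2p(P: float, Q: float) -> float:
    """K = -(1/2) ln[(1 - 2P - Q) * sqrt(1 - 2Q)]."""
    a, b = 1 - 2 * P - Q, 1 - 2 * Q
    if a <= 0 or b <= 0:
        raise ValueError("divergence too large for the two-parameter correction")
    return -0.5 * math.log(a * math.sqrt(b))


P, Q = transition_transversion_proportions("ACGTACGTAC", "GCGTACGTAA")
print(P, Q, round(kimura_2p(P, Q), 4))  # 0.1 0.1 0.2341
```

As a consistency check, setting P = p/3 and Q = 2p/3 (the proportions expected when every change is equally likely) reduces this expression to the one-parameter value -(3/4) ln(1 - 4p/3).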

There is another Kimura correction for amino acid 
sequences (Kimura, 1983). Estimation of the number 
of amino acid replacements based on Dayhoff’s
PAM matrix is approximated by the following simple
equation:

$$K_{aa} = -\ln\left(1 - p - \frac{p^{2}}{5}\right)$$


where Kaa is the number of amino acid substitutions 
per site and p is the proportion of amino acid dif- 
ference. 
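A direct transcription of Kimura's approximation, as a minimal sketch with a hypothetical function name. The logarithm is natural, and the argument must stay positive, which limits the formula to p below (√45 - 5)/2, about 0.854.

```python
import math

P_MAX = (math.sqrt(45) - 5) / 2  # root of 1 - p - p**2/5 = 0, about 0.854

def kimura_protein_distance(p: float) -> float:
    """Amino acid substitutions per site: K_aa = -ln(1 - p - p**2 / 5)."""
    arg = 1.0 - p - 0.2 * p * p
    if p < 0.0 or arg <= 0.0:
        raise ValueError(f"p must satisfy 0 <= p < {P_MAX:.3f}")
    return -math.log(arg)


print(round(kimura_protein_distance(0.2), 4))  # 0.2332
```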
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Motoo Kimura (1924-94) was a leading population 
geneticist, widely regarded as the successor to Wright, 
Fisher, and Haldane in developing the theory of popu- 
lation genetics and evolution. He is best known for his 
neutral theory of molecular evolution. Kimura was 
born in Okazaki, Japan on 13 November 1924. During 
his childhood he had a love of botany, but he also 
displayed a talent for mathematics. He attended 
Kyoto Imperial University during World War II and, 
although not in the military, suffered from wartime 
and postwar food shortages. On graduation he joined 
the staff of the National Institute of Genetics in 
Mishima and remained there for the rest of his life. 
After the war he was able to study in the United States 
and after one year at Iowa State College transferred to 
the University of Wisconsin, where he received his 
doctorate in 1956. In his later years he developed 
amyotrophic lateral sclerosis and died on his 70th 
birthday, 13 November 1994. 

Kimura pioneered the use of the Kolmogorov dif- 
fusion equations. Although others had used the for- 
ward equation, he was one of the first to employ the 
backward equation and was particularly creative in its 
use. While still a graduate student he worked out the 
complete solution to the process of random genetic 
drift in a finite population from an arbitrary starting- 
point. He then proceeded to solve a number of 
important problems, including: the probability of 
fixation of a mutant gene, the time until fixation,
conditions for a stable equilibrium with multiple
alleles, and the evolution of closer linkage. Early in 
his career, he introduced the widely used stepping- 
stone model of population structure. 
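As an illustration of the kind of result these diffusion methods produce, Kimura's well-known formula for the fixation probability of an allele at initial frequency p with selection coefficient s is u(p) = (1 - e^(-4Nsp)) / (1 - e^(-4Ns)). The sketch below assumes genic (additive) selection and that the effective population size equals the census size N; the function name is hypothetical. For a single new mutant in a diploid population, p = 1/(2N).

```python
import math

def fixation_probability(p: float, N: int, s: float) -> float:
    """Diffusion approximation u(p) for genic selection, assuming Ne = N.

    u(p) = (1 - exp(-4 N s p)) / (1 - exp(-4 N s)).
    As s -> 0 the result tends to p, the neutral fixation probability.
    Very strongly deleterious alleles (4Ns below about -709) overflow.
    """
    if s == 0.0:
        return p  # neutral case: fixation probability equals initial frequency
    x = 4.0 * N * s
    # expm1(y) = exp(y) - 1 keeps precision when x * p is tiny
    return math.expm1(-x * p) / math.expm1(-x)


N = 1000
p0 = 1 / (2 * N)  # one new mutant copy among 2N gene copies
for s in (0.0, 0.0001, 0.001, 0.01):
    print(s, fixation_probability(p0, N, s))
```

With s = 0.01 the result is close to 2s, the familiar approximation for a favored new mutant in a large population.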

Kimura undertook a wide variety of problems, 
both deterministic and stochastic. He had a gift for 
formulating and solving problems, always with a par- 
ticular genetic or evolutionary issue in mind. He was 
especially adept with partial differential equations, 
both in finding the appropriate boundary conditions 
and in finding solutions. His numerical solutions, 
often involving difficult approximations and worked 
out in the days before modern computers, have turned 
out to be remarkably accurate. 

In 1968 Kimura became convinced that the rate of 
amino acid and nucleotide change in molecular evolu- 
tion was too rapid to be accounted for by selection, 
and introduced his neutral theory — the idea that most 
molecular change is due to selectively neutral changes. 
Evolutionary change then becomes the result of muta- 
tion and random drift. For a strictly neutral gene, the 
rate of evolution, when viewed over a long time, is 
simply the mutation rate. This happy insight per- 
mitted a large number of tests of the neutral theory. 
At the same time he argued that molecular poly- 
morphisms represent, for the most part, neutral sites 
in the process of fixation. 
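The step from neutrality to "rate equals mutation rate" is a short argument. It assumes a diploid population of constant size N and a neutral mutation rate μ per gene per generation (symbols not used in the text above); each new neutral mutant starts at frequency 1/(2N), which is also its probability of eventual fixation.

```latex
\begin{aligned}
\text{new neutral mutants per generation} &= 2N\mu,\\
\Pr(\text{a given neutral mutant is eventually fixed}) &= \frac{1}{2N},\\
k &= 2N\mu \cdot \frac{1}{2N} = \mu .
\end{aligned}
```

Because N cancels, the predicted long-term substitution rate k does not depend on population size.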

The neutral theory was greeted with great skepti- 
cism at the time it was introduced. Gradually it won 
acceptance, especially from molecular evolutionists. 
Over the years, partly as a result of Kimura’s relentless 
advocacy, the theory has had a fairly wide acceptance. 
It is probably correct to say that the current consensus 
is that most nucleotide changes in higher animals and 
plants are due to random changes, but that the jury 
is still out on the relative number of random versus 
selected changes of amino acids. 

Among biologists as a whole, Kimura is most 
widely known for his theory of molecular evolution. 
Among population geneticists, he is also greatly 
respected for his pioneering work in the mathematical 
theory of population genetics and evolution. 
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Kinases are enzymes that add phosphate groups to 
substrates. The most numerous and most extensively 
studied kinases are protein kinases, which phosphoryl- 
ate specific target proteins and thereby modify their 
activities. Collectively, protein kinases represent one of the
largest gene families in eukaryotes: about 2% of all
genes in the yeast Saccharomyces cerevisiae, the nema- 
tode Caenorhabditis elegans, and the fruit fly 
Drosophila melanogaster are predicted to encode 
kinases — about 120, 400, and 300 genes, respectively. 
Extrapolation to vertebrate genomes suggests that 
these contain more than 1000 kinase genes. 

Biochemically, protein kinases can be distinguished 
on the basis of the phosphorylated residue: histidine, 
serine, threonine, or tyrosine. Histidine kinases are 
primarily important in prokaryotes, in which they 
act as part of ‘two-component’ signaling systems. 
Few histidine kinases are known in eukaryotes, 
although they do occur in the slime mold Dictyoste- 
lium, in fungi, and in plants. The majority of eukary- 
otic kinases are serine/threonine kinases, which fall 
into dozens of different families. Tyrosine kinases 
seem to be absent from the yeast genome and there- 
fore appear to have arisen during the evolution of 
multicellular organisms. They play major roles in 
development and oncogenesis: for example, of 21 
characterized retroviral oncogenes, seven are tyrosine 
kinases (e.g., the abl, src, and yes oncogenes) and three 
are serine/threonine kinases (e.g., the mos and raf 
oncogenes). 

Phosphorylation of target proteins by kinases can 
be reversed by protein phosphatases. Reversibility is a 
general advantage of phosphorylation as a regulatory 
strategy, in contrast to irreversible modifications such 
as proteolysis. However, although there are many spe-
cific protein phosphatases, some of which play signi- 
ficant roles in regulation, kinases are more numerous 
and usually more important. 
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Regulation of protein activities by kinases is 
ubiquitous in eukaryotic development, physiology, 
and metabolism. Control of glycolysis depends on 
phosphorylation, and the regulation of the cell 
cycle is centrally dependent on a variety of different 
kinases, most notably the cyclin-dependent kinases. 
The majority, and perhaps all, of signal transduction 
pathways involve kinases, sometimes in cascades of 
activity such as that discovered for mitogen-activated 
protein kinases (MAPK), which are regulated in turn by MAP
kinase kinases (MAPKK) and MAP kinase kinase 
kinases (MAPKKK). These act to couple events out- 
side the cell, or in the cytoplasm, to cytoplasmic or 
nuclear responses. The activity of many transcription 
factors is modulated, either positively or negatively,
by the action of kinases. Similarly, ion channel 
properties can be altered by phosphorylation. 

In animals, the initial responses of cells to external 
stimuli such as growth factors or other developmental 
signals are often mediated by receptor tyrosine 
kinases, which are membrane-spanning proteins with 
an extracellular ligand-binding domain and a cyto-
plasmic tyrosine-kinase domain. These receptors
act on specific cytoplasmic targets, affecting other
kinases in turn. The multiple steps of phosphorylation
in these signaling cascades, and the opportunity for
crosstalk between different pathways, create the
potential for immensely elaborate modulation of
cellular activity. Much of the complexity of cellular 
and neuronal function in higher eukaryotes appears to 
depend directly on their huge and versatile repertoires 
of protein kinases. 
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The kinetochore is a proteinaceous region within the 
centromere to which spindle microtubules attach dur- 
ing mitosis and meiosis. The kinetochore is an active 
component of the cell-cycle checkpoint machinery that
ensures the correct orientation and segregation of 
chromosomes at cell division. 

Kinetochores behave differently at
mitosis and meiosis. At mitosis, sister kinetochores
of a chromosome attach to spindle microtubules and
orient to opposite spindle poles. The sister chromatids 
separate at anaphase and pass to the spindle poles to 
ensure each daughter cell receives the full chromo- 
somal complement. In the first meiotic division, how- 
ever, the sister kinetochores of one chromosome 
orient to a single pole, while those of its homologous 
partner orient to the other. As a result, daughter cells 
receive half the original number of chromosomes. At 
meiosis II, the kinetochores orient in the same manner 
as mitosis resulting in chromatid segregation. 


Structure 


The somatic vertebrate kinetochore, when viewed by 
standard transmission electron microscopy, is a tri- 
laminar structure on the surface of centromeric 
heterochromatin of each chromatid of a chromosome 
(Figure 1). A fourth, fibrillar layer can also be dis- 
cerned adjacent to the trilaminar structure. 


DNA Composition of Kinetochore-Associated Chromatin


Kinetochores generally form on chromatin with par- 
ticular DNA sequences. However, there is no evi- 
dence that these sequences show evolutionary 
conservation. In the yeast Saccharomyces cerevisiae, 
for example, the minimal centromere contains 125 bp 
of DNA that falls into three distinct elements (CDE I, 
II, and III). All 16 chromosomes of S. cerevisiae carry
this DNA at their centromeres. In Drosophila melano- 
gaster, a 420-kb DNA sequence, composed of satellite 
arrays and various transposable elements, has been 
found at one centromere. Notably, these DNA se- 
quences are also present in other chromosomal regions 
that do not form kinetochores. In humans, kineto- 
chores are associated with alphoid satellite DNA 
(240 kb to several Mb in length); the kinetochore
does not form along the whole array but within a 
restricted zone of this array. 

The apparent absence of any consensus DNA 
sequence associated with kinetochores has led to 
the suggestion that formation of kinetochores may 
depend on particular, higher-order DNA-protein 
structures. Such chromatin might also be subject to 
some form of epigenetic modification that ensures 
formation of the kinetochore at this particular region 
in successive cell generations. 


Centromere-Associated Proteins 


Many kinetochore proteins have been identified,
although their functions have not been fully
characterized. The locations of some of these
proteins within the somatic vertebrate kinetochore
are illustrated in Figure 1. In vertebrates, some cen-
tromere-associated proteins are present as constitutive
elements of the kinetochore throughout the cell cycle,
e.g., CENP-A, -B, -C. Others show a transient pattern
of association, and are termed passenger proteins,
being present usually from late G2 to mitotic
anaphase, e.g., CENP-E, -F, INCENP.

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Figure 1 Schematic representation of somatic kinetochores as viewed by conventional electron microscopy. The various components of the kinetochore are identified and the locations of some of the centromere-associated proteins are also indicated.

CENP-A and -C are essential for kinetochore 
function. In mice lacking these proteins, cell division 
is irregular and embryos die early in development. 
CENP-A is a histone H3-like protein; it may be 
involved in the epigenetic marking of kinetochore- 
associated chromatin and possibly also in the recruit- 
ment of CENP-C to the kinetochore. CENP-C is 
present at active centromeres, including neocentro-
meres (de novo sites of kinetochore activity outside
the centromere), but absent from inactive centromeres
(e.g., in dicentric chromosomes). CENP-B binds to a
specific 17-bp DNA sequence that shows wide conser-
vation in vertebrates. However, the functional role of 
this protein in kinetochore formation and activity is 
unclear. It is present on both active and inactive cen- 
tromeres but is not present in neocentromeres. 


Role in Mitotic Spindle Checkpoint 


Kinetochores are important elements of a mitotic 
checkpoint. Failure of kinetochores to bind to spindle 
microtubules, or incorrect association such as when 
both sister kinetochores attach to microtubules from 
the same spindle pole, results in mitotic delay or arrest. 
Some proteins, for example mitotic-arrest-deficient
protein 2 (MAD2), may monitor microtubule binding
to kinetochores. Others respond to the “tension” 
imposed on the kinetochore by the spindle micro- 
tubules by altering their phosphorylation state. 
These proteins are phosphorylated in misaligned kine-
tochores (and can be detected using an antibody, 3F3/2,
that recognizes such epitopes) but dephosphoryl-
ated when kinetochores are correctly attached to the
mitotic spindle. At present, we have an incomplete 
understanding of the pathway(s) through which 
kinetochores influence the spindle checkpoint. 
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Klinefelter syndrome gets its name from a publication 
in 1942 by Klinefelter, Reifenstein, and Albright
describing a series of patients with gynecomastia,
small testes, aspermatogenesis, androgen deficiency, 
and increased levels of follicle stimulating hormone. 
Of particular interest at that time was the association 
of primary hypogonadism with high levels of gona- 
dotrophins. The condition was soon found to be a 
common cause of male hypogonadism. Klinefelter 
syndrome attracted little attention until 1956 when 
Plunkett and Barr demonstrated the sex chromatin 
body in somatic cell nuclei, suggesting that those affect- 
ed were sex-reversed females. However, in 1959 it was 
shown that this was incorrect and that the 
sex chromatin-positive cases had an XXY sex-chromo- 
some constitution. Later variants of the syndrome were 
observed with XXXY and XXXXY sex chromosome 
complements, and others with sex chromosome
mosaicism, such as XY/XXY and XX/XXY.

The paradoxical sex chromatin findings prompted 
nuclear-sexing surveys of various populations, using 
buccal mucosal cell smears as a readily obtained 
source of test material. Thus, Klinefelter syndrome 
was found to be one of the commonest causes of 
male infertility due to azoospermia and extreme oligo- 
zoospermia, accounting for over 10% of such cases. 
Also, approximately 1% of males with severe learning 
difficulties were found to be affected by Klinefelter 
syndrome. Overall, 1 in 1000 male births is
affected by the disorder.

In adults with XXY Klinefelter syndrome, the 
one invariable clinical finding is small testes, asso- 
ciated with otherwise normal genitalia. The testes are 
less than half the normal size, measuring in length little
more than 2 cm. Gynecomastia is present in less than
half the cases. Most patients show evidence of lack 
of androgens, such as scant body and facial hair, poor 
recession of temporal hair, lack of libido and potency, 
and a small prostate. Patients tend to be taller than 
average with longer legs in relation to trunk length
and wide arm span. These findings are apparent 
before puberty and are therefore not due to delayed 
epiphyseal fusion. The testicular defect is character- 
ized by completely atrophic, hyalinized ‘ghost’ 
tubules devoid of elastic fibers alongside large masses 
of interstitial cells. Among the interstitial cells are
occasional tubules lined solely by Sertoli cells, most of 
which are immature and undifferentiated. In rare 
cases, a single tubule may be found in which complete 
spermatogenesis is present. In the prepubertal testes, 
atrophic tubules are absent and spermatogonia may be 
found in a small proportion of tubules. Larger germ 
cells resembling oogonia at varying stages of calcifica- 
tion may occasionally be seen in prepubertal testes. 

Patients with Klinefelter syndrome and more than 
two X chromosomes have greater physical and mental 
handicap associated with a number of malformations.
These include microcephaly, proximal radioulnar
synostosis, undescended testes, congenital heart 
disease, cleft palate, and short incurved digit V. 
The facies is characteristic, with prognathism, epi-
canthus, hypertelorism, myopia, strabismus, and 
mid-face hypoplasia. The maximum number of sex 
chromatin bodies per nucleus is always one fewer than 
the total number of X chromosomes, indicating that 
X-inactivation ensures that only one X chromosome 
is genetically active. However, abnormal dosage of 
X/Y homologous loci, which normally escape X- 
inactivation, is thought to be responsible for the level 
of clinical disability associated with additional X 
chromosomes. Male differentiation occurs irrespective
of the number of X chromosomes; this attests to the
dominant male-determining effect of the Y chromosome
and its sex-determining region (SRY).
The XXY condition has been observed in a number of 
other species including mouse, cat, horse, and sheep; 
in each case male differentiation is apparent. 

Other variants of Klinefelter syndrome are 
known. SRY+ XX males, in whom the SRY locus 
has been transferred to the X by accidental recombin- 
ation within the differential segments of the X and Y 
chromosomes, show little disability other than infertil- 
ity (see Sex Reversal). Those with sex chromosome 
mosaicism, i.e., XY/XXY or XX/XXY, also tend to 
show less disability than XXY patients. XY/XXY 
patients are occasionally fertile and XX/XXY patients 
may rarely be found to be true hermaphrodites (see 
Hermaphrodite). 

Intracytoplasmic sperm injection (ICSI) has been 
used increasingly to allow some Klinefelter patients to 
father children. In these cases, small numbers of viable 
sperm have been recovered by testicular or epididymal 
biopsy for the IVF procedure using ICSI. Many 
patients with Klinefelter syndrome benefit from rou- 
tine therapy with small doses of testosterone. 


See also: Fertilization; Hermaphrodite; 
Imprinting, Genomic; Infertility; Recombination, 
Models of; Turner Syndrome; X-Chromosome 
Inactivation 
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A knockout is a shorthand term used to describe
a genetically manipulated organism that has had a 
specific gene eliminated or inactivated. A knockout 
allele is, thus, incapable of producing a gene product. 


Knockout alleles are generated by an in vitro process of 
homologous recombination in embryonic stem cells. 


See also: Embryonic Stem Cells 
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The American scientist Arthur Kornberg (1918- ) 
shared the 1959 Nobel Prize for Physiology or Medi- 
cine with the Spanish-American scientist Severo 
Ochoa (1905-1993). These scientists were honored 
“for their discoveries of the mechanisms of the biologic 
synthesis of ribonucleic and deoxyribonucleic acids.” 
(Nobel Prize Foundation) 

The son of Joseph and Lena (née Katz) Kornberg, 
Arthur was born in Brooklyn, New York. His parents 
had immigrated from Austria and his father operated 
sewing machines in sweatshops prior to owning a 
small hardware store. Arthur was a brilliant student, 
with a reputation as the “smart kid on the block.” 
(Henerdson and Kornberg, 1991) His love of biology 
and biochemistry was sparked after he took a premed-
ical course at the City College of New York. He then
enrolled in medical school at the University of
Rochester, New York, and earned his medical degree
there in 1941.

Following an internship and a brief period of ser-
vice as a medical officer in the US Coast Guard,
Kornberg chose a career in biochemistry research 
rather than in medical practice. In 1943, he joined the 
National Institutes of Health in Bethesda, Maryland, 
where he was to conduct much of his prize-winning 
enzyme work. He also received brief but valuable
training under Severo Ochoa at New York University 
College of Medicine, in New York, in 1946, and under 
Gerty and Carl Cori at Washington University, in St. 
Louis, in 1947. 

In 1955, Ochoa and Grunberg-Manago isolated a 
new enzyme from Azotobacter vinelandii that was
capable of synthesizing RNA in test tubes. They named
the enzyme polynucleotide phosphorylase. Some years 
later it was shown that polynucleotides synthesized 
in vitro were also active as messengers in protein 
synthesis. 

Working independently, Kornberg attempted to free 
enzymes from cells by using one of the latest physical 
methods — treating bacteria with sound waves. Subse- 
quent steps in enzyme isolation were long and tedious, 
fraught with many technical difficulties. After isolating 
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reasonably pure forms of the DNA polymerase 
enzyme from the bacterium Escherichia coli, 
Kornberg incubated them with radiolabeled thymine, 
one of the four bases of DNA. He then demonstrated 
that thymine had been incorporated into a chemical 
that had some of the properties of natural DNA. 

However, to produce DNA artificially, Kornberg 
needed exquisitely pure forms of the enzyme. This 
required extensive experimentation that would take 
an additional 4 years. After succeeding in isolating 
the purest forms of polymerase enzyme, Kornberg 
showed that, in addition to the enzyme and the four
nucleotide building blocks of DNA as ‘raw materials,’
small quantities of ‘primer’ DNA were needed for
artificial DNA synthesis.

Along with describing detailed enzymatic steps of 
DNA replication, Kornberg also presented the first 
experimental proof of how polymerase enzymes cata- 
lyzed reactions resulting in the production of new 
strands of DNA, which were virtually identical to 
the natural DNA. 

Thus, nearly 100 years after the discovery of nucleic 
acids, DNA and RNA could be artificially synthe- 
sized. The findings of Ochoa and Kornberg were 
hailed as a milestone in the history of genetics. Hugo 
Theorell of the Royal Caroline Institute, the scientist 
who delivered the presentation address at the Nobel 
Prize ceremonies of 1959, prophetically suggested that,
following Friedrich Wöhler’s synthesis of urea in the
nineteenth century, the discoveries of Ochoa and
Kornberg were the next major steps along the path-
way of bridging the “first gap between the living and
the dead.” (Nobel Prize Foundation). As Theorell
predicted, Ochoa and Kornberg’s contributions were 
to play a central role in the technology of genetic 
engineering of the 1980s and in the Human Genome 
Project of the 1990s. 

Kornberg once said, with characteristic modesty,
that he and Ochoa had simply opened up a tiny
crack in the mystery of the DNA molecule and tried
to drive in a wedge, with the enzyme as the hammer.
When he was asked whether he and his
colleagues had created life in a test tube, Kornberg 
replied that he might be able to answer the question 
“if you'd first care to define life.” 

Although Kornberg wrote extensively, he keenly 
appreciated the difficulties of good writing, which he 
referred to as a “variety of mental torture” (Magner
1991). His autobiography, For the Love of Enzymes: 
The Odyssey of a Biochemist, was published in 1989. 
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Kuru is a transmissible spongiform encephalopathy 
(TSE; see Transmissible Spongiform Encephalo- 
pathy), which reached epidemic proportions in the 
1950s in Papua New Guinea among the Fore tribe. 
When it was first described in 1957, the disease was 
evident in about 1% of a population of more than 
35 000 people. In some areas the disease was prevalent 
in as many as 5-10% of the population. Those affected 
first develop cerebellar symptoms with unsteadiness 
of gait, progressive trembling or shivering of the body 
(termed ‘kuru’ in the Fore language), and dysarthria. 
The ataxia becomes progressively worse and soon the 
patient is unable to walk or stand, muscle tremors and
rigidity become pronounced, incontinence and dys- 
phagia develop, and eventually the patient becomes 
mute and unresponsive. Death occurs within 1 year 
of the onset of the disease. Unlike other TSEs, severe 
dementia is not a feature of kuru. 

Microscopic examination of the brain of affected 
patients revealed loss of neurons, particularly in the 
cerebellum, widespread astrocytosis, and spongiform
change. Amyloid plaques were present in about 75%
of cases. The cause of the condition was obscure until
1959, when the similarity of the neuropathology to 
scrapie was first noticed. This prompted attempts to 
transmit kuru to experimental animals. Intracerebral 
inoculation of brain tissue into chimpanzees led to 
a kuru-like disease within 1.5 years. Other animals 
also proved susceptible both by inoculation and 
by oral feeding, including Old World and New 
World monkeys and goats. Kuru does not transmit 
to sheep. 

The early investigators of kuru noticed that the dis- 
ease was common in women and children, but adult 
males were rarely affected. During the past 30 years, 
the condition has gradually disappeared except in a 
few elderly individuals. This correlates with the aban- 
donment of ritual cannibalism in the early 1960s. Up 
to that time, it was the practice of local tribes to take 
part in consuming various tissues, including the brain 
of deceased relatives, partly as an act of respect and 
mourning. Women did the butchery and prepared 
tissues for consumption. This involved much bodily 
contamination with brain and body fluids, and it is 
likely that infection occurred through body sores in 
addition to oral ingestion. Men were not involved in 
handling the affected corpses and tended to eat the 
flesh rather than the brains, while women and children 
were much more exposed to the infection. Since the 
1960s, the mortuary practices have been abandoned 
and this has been associated with a sharp decline in 
disease prevalence. At the time of writing, only a few 
elderly people develop the disease each year, and this 
suggests that, in these cases, the incubation period 
may be as long as 40 years. Children born to affected 
women in recent years, and since the cessation of 
cannibalism, have not developed the disease, suggesting
that maternal transmission, either in utero or via
breast feeding, does not occur to any significant extent.

It has been suggested that kuru might have originated
from a sporadic case of Creutzfeldt–Jakob disease
occurring early in the twentieth century, which spread
to an increasing number of the population as a result
of the practice of ritual cannibalism. The spread of
bovine spongiform encephalopathy, via animal protein
contained in commercial cattle feed and thence to
humans, closely resembles the spread of kuru.


See also: Transmissible Spongiform 
Encephalopathy 


and François Jacob led to the first breakthrough in 
understanding gene regulation, and resulted in the 
‘operon model’ of gene regulation. 


See also: Beta (β)-Galactosidase
The lagging strand of DNA elongates overall in the
3'→5' direction, but is synthesized discontinuously in the
form of short fragments (each made 5'→3') that are
subsequently covalently linked.


See also: Okazaki Fragment; Replication 
Jean Baptiste, Chevalier de Lamarck (1744-1829), was
the first person to develop a comprehensive theory of 
evolution. The essence of his theory, which he worked 
out at the end of the eighteenth century, was that the 
present-day diversity of living species arose via a 
gradual “transmutation” of ancestral species. Thus in 
conceiving the history of living forms in terms of 
“descent with modification,” Lamarck’s evolutionary 
theory was a precursor to that which Charles Darwin 
presented 50 years later in his On the Origin of Species.

Lamarck was born in the Picardy region of northeastern
France in 1744. As a son of a family of impoverished
aristocrats, he had only two alternative
prospects for an honorable career: the Church or the 
Army, and Jean Baptiste tried both. After briefly 
studying for the priesthood with the Jesuits, he joined 
the Grenadiers and distinguished himself by his 
bravery in the battle at Bergen-op-Zoom in the Seven
Years’ War. Suffering a head wound (not from enemy
fire but from friendly horseplay with his fellow
Grenadiers), he was given a medical discharge and
took up the study of medicine in Paris in 1766.

Lamarck did not become a physician, any more
than he became a priest or professional soldier. 
Instead, he turned to the study of natural history, 
and in 1781, he was appointed to a junior curatorship 
in the King’s Botanical Garden. This position gave 
him the opportunity to undertake field studies, and
in 1788, he published a definitive survey of the flora of
France, presenting a dichotomous diagnostic method 
for the taxonomic classification of plants by scoring 
the presence or absence of alternative traits. 

This novel procedure brought him to the attention 
of Georges Buffon, the foremost French naturalist of 
the time, who sponsored Lamarck’s election to the
French Academy of Sciences and his appointment to a
professorship at the Museum of Natural History in 
Paris. Before long, Lamarck brought out his monu- 
mental Dictionnaire de Botanique, on which his 
scientific reputation would mainly rest during his 
lifetime. 

When the Museum of Natural History was re- 
organized in 1793, in the aftermath of the political 
turmoil of the French Revolution, Lamarck was trans- 
ferred to the chair of zoology and given the assign- 
ment of teaching the taxonomy of insects and worms. 
Being a botanist, he knew very little about animals, 
but he was a fast learner. Between 1815 and 1822, he 
published his great zoological treatise, Histoire nat- 
urelle des animaux sans vertèbres. This contained the 
first subdivision of the phyla of the animal kingdom 
into two grand categories, which he designated as “ver- 
tebrates” and “invertebrates,” according to whether a 
vertebral column was present or absent. Moreover,
among the invertebrates (whose classification had 
flummoxed Linnaeus, the founder of modern taxo- 
nomy) Lamarck identified and named the phylum of 
annelids and the classes of arachnids and crustaceans 
of the arthropod phylum. His depth of knowledge of 
the natural history of both plant and animal kingdoms 
was highly unusual for its time, and led Lamarck to 
put forward another novel idea: that there exists a 
general science of living forms, for which he coined 
the (Greek-derived) compound neologism “biology.” 

In developing his theory of evolution (which is 
treated in these pages in a separate entry; see Lamarck- 
ism) Lamarck took into account the geological studies 
that indicated that the earth has existed for a very long 
time, during which its surface features underwent 
many very gradual changes. Moreover, he inferred 
from the character of fossils that animal life has been 
present for a large fraction of that long time, during 
which it too underwent gradual changes. Hence the 
species have to be transmutable rather than eternally 
fixed, as had been generally believed ever since 
Aristotle developed the species concept in the fourth 
century BC. As Lamarck pointed out, the seemingly
empirical fact of the characterological permanency of 
the species is actually an illusion, attributable to the 
shortness of the human life span relative to the enor- 
mous length of the geological time scale. 

Lamarck had moved evolutionary theory into the 
forefront of biological thinking, for which he received 
hardly any credit during his lifetime. This lack of
appreciation was due in large measure to his having 
been overshadowed by his politically influential 
contemporary Georges Cuvier, the founder of com- 
parative anatomy and leading authority on the 
classification of fossils. Despite his outstanding 
qualifications for the study of evolution, Cuvier was 
a creationist who believed in the literal truth of the 
story told in Genesis 1 of the Five Books of Moses. 
He firmly rejected Lamarck’s theory and explained 
the origin of fossils in terms of a succession of cata- 
strophes in the earth’s history, each of which extermin- 
ated all extant forms of life and was followed by 
another round of de novo creation. 

Lamarck was blinded by an infection for the last 17 
years of his life, and fell into poverty, dying in 1829. 
Even posthumously, he never did receive the recogni- 
tion he deserved as an important pioneer in the devel- 
opment of modern biology. Instead his name became 
the object of ridicule and the term ‘Lamarckist’ an 
invective because his evolutionary theory contained 
a fundamental flaw. Contrary to popular belief
today, Charles Darwin, who, it should be noted,
did hold Lamarck in high regard, was no more able to 
provide a satisfactory explanation of the origin of 
novel hereditary traits than was Lamarck. Such an 
explanation had to await the rise of the science of 
genetics in the first part of the twentieth century and 
the development of neo-Darwinism. 


See also: Darwin, Charles; Lamarckism 
‘Lamarckism’ refers to the first comprehensive theory
of evolution developed by the French natural
historian Jean Baptiste Lamarck and set forth by him
in his treatises Recherches sur l’organisation des corps
vivants (1802) and Philosophie zoologique (1809).
Lamarck’s theory was based on his lifelong direct 
observation of plants and animals, which provided 
him with a sense of the dynamic quality of life, as 
well as of the close interdependence of physical and 
vital processes in which life is grounded. As originally 
formulated, Lamarckism was part of an elaborate sur- 
mise about processes for whose operation Lamarck 
had no direct evidence. 

Lamarckism asserted that all living things have arisen
via a continuous process of gradual modification
throughout geologic history, as a vast sequence of life
forms, ascending a staircase leading from the lowliest 
and simplest to the highest and most complex crea- 
tures. To account for this progressive movement 
Lamarck invoked what then seemed a reasonable 
hypothesis of the inheritance of acquired characteris- 
tics: that organisms develop new traits in response to 
needs created by their environment and pass them on 
to their offspring. The commonly cited example of 
Lamarckism is the evolution of the giraffe, whose 
ancestors were supposed to have acquired their long 
necks by stretching them to reach the upper leaves of 
trees and transmitted that gradually acquired neck 
length to their progeny. Lamarckism also provided 
for the permanent loss of old traits, in case a change 
in the environment eliminated the need for them. In 
Philosophie zoologique Lamarck summarized his the- 
ory in terms of two ‘laws’ governing the evolutionary 
ascent of life to higher stages. One stated that organs 
are improved with repeated use and weakened by 
disuse. The other stated that such environmentally 
determined acquisitions or losses of organs are pre- 
served by transmission from parent to progeny. 

Lamarckism was an important forerunner of the
Darwinian theory of evolution, which, like
Lamarckism, assigned a critical role to the environment
in evolutionary processes. Contrary to a misconception
held widely even among present-day
biologists, Lamarckism is not in conflict with Darwin’s 
theory of natural selection. According to Lamarckism, 
the offspring of those giraffes that did succeed in 
transmitting an acquired extension of their necks to 
the next generation could obtain more food than other 
members of their cohort. They would thus be more 
numerous, which, in turn, would result in an increase 
of the average neck length in successive generations. 
Thus Darwin’s ‘classical’ Darwinism is an improvement
over Lamarckism but not its refutation, since
Darwin had no clearer idea than Lamarck of
the genetic basis of the hereditary variations that are at
the root of the evolutionary process.

Lamarckism fell into disrepute only in the early 
years of the twentieth century, after the rediscovery 
of Mendel’s laws of inheritance, the identification of 
genes as the atoms of heredity, and the recognition 
of gene mutation as the source of the novel hereditary 
features that are responsible for evolutionary change. 
These insights gave rise to the development of
neo-Darwinism, which accounts for evolution in terms of
gene mutation, natural selection for traits, and the 
reproductive dynamics of conspecific populations. 

By the middle of the twentieth century, the designation
of someone as a ‘Lamarckist’ had become a term
of abuse, partly because of its association with one of
the few world-class monsters of twentieth-century
science: the Russian agronomist Trofim Lysenko, who
dominated (not to say destroyed) genetics in the 
Soviet Union and its satellite popular democracies 
from the mid-1930s until the mid-1960s. Lysenko 
was not openly opposed to classical Darwinism,
Karl Marx having been a great admirer of Darwin,
but he declared neo-Darwinism, with its reliance on
Mendelian genetics and gene mutation, to be idealist- 
racist metaphysical speculations propagated by the 
Catholic Church and the Fascists to keep the proletariat 
intellectually enchained. 

At first, in the 1930s, Lysenko denied that he 
was a Lamarckist and declared that “starting from 
Lamarckian positions, the work of remaking the 
nature of plants by ‘education’ cannot lead to positive 
results.” Then, when he became director of the 
Institute of Genetics of the Soviet Academy of 
Sciences in 1940 and had Stalin’s ear, Lysenko declared 
Mendelian genetics erroneous. By 1948, when he had 
ruthlessly silenced any Soviet geneticists who opposed 
him, he no longer concealed his adherence to 
Lamarckism, declaring that: 


the well-known Lamarckian propositions, which recognize 
the active role of the conditions of the external environment 
in the living body and the inheritance of acquired characters, 
in contrast to the metaphysics of neo-Darwinism, are indeed 
scientific. 


Lysenko was finally dismissed in 1965, after having 
gravely hampered scientific and agricultural progress 
in the Soviet Union for more than 25 years. Never- 
theless, ‘Lamarckist’ remains a term of ridicule. This is 
a most regrettable affront to the memory of one of the 
great figures in the history of biology, to whom that 
discipline owes its very name. 


See also: Lamarck, Jean Baptiste; Lysenko, T.D./ 
Lysenkoism 
Lampbrush chromosomes (LBCs) are elongated
diplotene bivalents in prophase of the first meiotic 
division in growing oocytes in the ovaries of most 
animals other than mammals and certain insects. 
Some LBCs reach lengths of a millimeter or more. 
The chromosomes go from a compact telophase 
form at the end of the last oogonial mitosis, become 
lampbrushy, and then contract again to form normal
first meiotic metaphase bivalents. They are character- 
ized by widespread RNA transcription from hun- 
dreds of transcription units that are arranged at short 
intervals along the lengths of all the chromosomes. 

LBCs were first seen in salamander oocytes by 
Flemming in 1882 and in oocytes of a dogfish by 
Rückert in 1892. The name lampbrush originated
from Rückert, who likened the objects to a
nineteenth-century lampbrush, equivalent to the modern
test-tube brush. LBCs are delicate structures and they
must be carefully dissected out of their nuclei in 
order to examine them in a life-like condition. The 
largest LBCs are to be found in oocytes of newts and 
salamanders, animals that have large genomes and 
correspondingly large LBCs. 

The best oocytes for lampbrush studies are those 
that make up the bulk of the ovary of a healthy adult 
female at the time of year when the eggs are actively 
growing. They are about 1 mm in diameter and their 
nuclei are between 0.3 and 0.5 mm in diameter (Figure 
1). The techniques for isolating and looking at LBCs 
from such oocytes are specialized but inexpensive and 
simple; details are available in the sources cited in the 
Further Reading section. 

Since an LBC is a meiotic half bivalent, it must 
consist of two chromatids. The entire lampbrush biva- 
lent will therefore have a total of four chromatids. The 
chromosome appears as a row of granules of deoxy- 
ribonucleoprotein (DNP), the chromomeres, con- 
nected by an exceedingly thin thread of the same 
material (Figure 2). Chromomeres are 0.25–2 µm in
diameter and are spaced 1–2 µm center to center along
the chromosome. 

Each chromomere has two or a multiple of two 
loops associated with it. The loops have a thin axis of 
DNP surrounded by a loose matrix of ribonucleopro- 
tein (RNP). The loops are variable in length, ranging 
from about 5 to 100 µm. Loops vary in appearance.
Loops of the same appearance always occur at the 
same locus on the same chromosome within a species. 


Figure 1 An oocyte (growing ovarian egg) showing
the relative dimensions of the egg, its nucleus, and its
lampbrush chromosomes. Labeled parts: nuclear membrane;
nucleus, containing lampbrush chromosomes; cytoplasm and yolk.
Figure 2 A region of a lampbrush chromosome showing the interchromomeric axial fiber (cf) connecting small
compact chromomeres (c), chromomeres bearing pairs (L) or multiple pairs (LL) of loops, loops of different 
morphologies, polarization of thickness along individual loops, loops consisting of a single unit of polarization (P), and 
loops with several tandem units of polarization having the same or different directions of polarity (PPP). 


Some particularly distinctive loops can be used for 
chromosome identification and the construction of 
LBC maps. Loops arising from the same chromomere 
have the same appearance and are usually, though not 
always, of the same length (Figure 2). 

The general pattern of events during the lampbrush 
phase of oogenesis is one of extension followed by 
retraction of the lampbrush loops and there is a clear 
inverse relationship between loop length and chromo- 
mere size. The longer the loop, the smaller the 
chromomere, and vice versa. 

Most lateral loops have an asymmetrical form.
They are thin where one end inserts into the
chromomere and become progressively thicker
towards the other end (Figure 2). If an LBC is
stretched, breaks first happen transversely across the 
chromomeres so that the resulting gaps are spanned by 
the loops that are associated with the chromomeres 
(Figure 3). This demonstrates the structural continu- 
ity between the main axis of the chromosome — the 
interchromomeric fiber — and the axes of the loops. 

Lampbrush loops are sites of active RNA synthesis 
and RNA is being transcribed simultaneously all 
along the length of the loop. In newts, there are more
than 20 000 RNA-synthesizing loops per oocyte. Particular
loops may be present or absent, in homozygous or
heterozygous combinations, and the frequencies of
these combinations within and between bivalents
show that the presence or absence of these loops assorts
and recombines like pairs of Mendelian alleles. So there
appears to be an element of genetic unity in a
loop-chromomere complex.

By 1960, it was known that an LBC has two DNA
duplexes running alongside one another in the
interchromomeric fiber, compacted into chromomeres at
intervals and extending laterally from a point within
each chromomere to form loops where RNA
transcription takes place. Each duplex represents one
chromatid (Figure 4). New technologies of the late
1970s confirmed this model and extended it.

Figure 3 Breakage of a stretched lampbrush chromosome
across a chromomere, such that loops associated
with the chromomeres come to span the gap between
the two halves of the chromomere.

A technique that removed most of the protein from 
chromosomes, leaving only the DNA and attached 
newly transcribed RNA, and then visualized what 
was left by electron microscopy, showed a lampbrush 
loop as a thin DNA axis with RNA polymerase mol- 
ecules lined up and closely packed along its entire 
length. Each polymerase carried a strand of RNP. At 
one end of the DNA axis, the RNP strands are short.
At the other end, they are much longer and they show
a smooth gradient in size from one end to the other. In
essence, the entire region is polarized and asymmetric,
in the same sense as a loop, as seen with the light
microscope, is asymmetric. The DNA axis outside
the region occupied by polymerases shows the structure
that would be expected of nontranscribing chromatin.
The lengths of the transcribed regions of the
chromosome are about the same as the lengths of
loops as seen and measured with light microscopy.
Lampbrush loops are therefore polarized units of
transcription. The polymerase moves on a stationary
loop axis. A loop is formed by an initial ‘spinning out’
process, probably powered by the continuing attachment
of more and more polymerases to a specific
region of the chromomeric DNA. The loop remains
and is transcribed as a permanent structure throughout
the lampbrush phase. Towards the end of the
lampbrush phase, transcriptive activity declines,
polymerases detach from loop axes, and loops regress
and disappear. The vast majority of the chromomeric
DNA is never transcribed and a loop represents a
short, specific part of the DNA in a loop-chromomere
complex.

Figure 4 Accepted model of lampbrush chromosome
organization showing the interchromomeric fiber consisting
of two chromatids that separate from one
another and become involved in RNA transcription in
the regions of the loops.
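Because the transcribed regions match the loop lengths measured by light microscopy, a loop’s contour length can be turned into an approximate number of base pairs. The sketch below is a minimal illustration, assuming the loop axis is fully extended B-form DNA with a rise of about 0.34 nm per base pair (an assumed textbook value, not a figure given in this entry). Any compaction of the axis would pack more base pairs into each micrometer, so the result is best read as a lower bound. The function name `loop_length_to_bp` is hypothetical.

```python
# Assumed rise per base pair of fully extended B-form DNA, in nanometers.
RISE_NM_PER_BP = 0.34


def loop_length_to_bp(length_um: float, rise_nm: float = RISE_NM_PER_BP) -> int:
    """Approximate minimum number of base pairs in a loop axis of the given
    contour length, assuming fully extended B-form DNA (compaction of the
    axis would only increase this number)."""
    if length_um < 0:
        raise ValueError("length must be non-negative")
    length_nm = length_um * 1000.0  # 1 µm = 1000 nm
    return round(length_nm / rise_nm)


if __name__ == "__main__":
    # Loop lengths spanning the 5-100 µm range quoted for lampbrush loops.
    for loop_um in (5, 50, 100):
        print(f"{loop_um:>4} µm loop ≈ {loop_length_to_bp(loop_um):,} bp")
```

For a 100 µm loop this gives roughly 294 000 bp, close to the figure of about 300 000 nucleotides quoted for the 100 µm HaeIII-resistant loops of N. viridescens.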

In situ nucleic acid hybridization is a means of 
locating specific gene sequences on chromosomes. 
Let us suppose that each loop represents ‘a gene.’ 
The RNA that makes up the loop matrix, the attached 
nascent transcripts, will all be or include transcripts of 
that ‘gene.’ In effect, the loop is a large object, consisting
of hundreds of RNA copies of the gene, all clustered
at one position on the chromosome set. If the DNA
of that gene is isolated, purified, and labeled in some
way, it can easily be made single-stranded and bound
specifically to the complementary single-stranded
RNA attached to the lampbrush loop. The
technique is known as DNA/RNA transcript in situ
hybridization (DR/ISH).
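The specificity of DR/ISH rests on base pairing between the single-stranded DNA probe and the nascent RNA on the loop. The toy sketch below shows only that pairing logic, not the labeling or hybridization chemistry: it computes the RNA sequence a DNA probe can pair with (the reverse complement, with U in place of T, because the two strands pair antiparallel) and checks whether that sequence occurs in a transcript. The function names and all sequences are hypothetical.

```python
# Watson-Crick pairing of each DNA base with the RNA base it binds.
DNA_TO_RNA_PAIR = {"A": "U", "T": "A", "G": "C", "C": "G"}


def rna_target_of_probe(probe_dna: str) -> str:
    """RNA sequence (written 5'->3') that a single-stranded DNA probe
    (written 5'->3') can pair with: the reverse complement, using U for T."""
    return "".join(DNA_TO_RNA_PAIR[base] for base in reversed(probe_dna.upper()))


def probe_binds(probe_dna: str, transcript_rna: str) -> bool:
    """True if the RNA sequence complementary to the probe occurs in the transcript."""
    return rna_target_of_probe(probe_dna) in transcript_rna.upper()


if __name__ == "__main__":
    nascent_transcript = "CUAGGCCAUUAC"  # hypothetical loop transcript
    print(rna_target_of_probe("ATGGCC"))
    print(probe_binds("ATGGCC", nascent_transcript))
    print(probe_binds("TTTTTT", nascent_transcript))
```

In a real experiment every nascent transcript along the loop carrying the target sequence would bind probe, which is why the label appears distributed along the loop.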

The end product of a DR/ISH experiment is a
preparation showing one or more pairs of
loops with label distributed along their lengths. It is 
not uncommon in DR/ISH experiments to find loops 
that are labeled over only part of their lengths. This is 
evidence that the DNA sequence of a loop axis can and 
does change from place to place along the length of the 
loop. Wherever there are partially labeled loops, it is 
usual to find the same partially labeled loops, with 
precisely the same pattern of labeling, in every oocyte 
over quite a wide range of size and stage. So loops are 
permanent structures that transcribe from the same 
stretch of DNA axis throughout the entire lampbrush 
phase. DR/ISH experiments have shown that highly
repeated short DNA sequences, commonly referred to
as ‘satellite’ DNA, which cannot encode functional
polypeptides, are abundantly transcribed on lampbrush
loops along with more complex sequences that
are translated into functional proteins.

The current hypothesis for LBC function is as follows.
At the thin base of each loop, the start of each
transcription unit, there is a promoter site for a
functional gene sequence. RNA polymerase attaches to
this site and moves along the DNA, transcribing the
sense strand of the gene and generating messenger
RNA molecules that remain attached to the polymerase
(Figure 5). In the lampbrush environment the normal
stop signals for transcription are not obeyed, so the
polymerases continue to transcribe past the end of the
functional gene and into whatever DNA sequences
lie ‘downstream’ of the gene. This results in very
long transcription units, very long transcripts, mixing
of gene transcripts with nonsense transcripts in
high-molecular-weight nuclear RNA, and lampbrush loops.
This ‘read-through’ hypothesis predicts that the number
of functional genes expressed as translatable RNAs
should equal the number of transcription units that
are active in a lampbrush set. The hypothesis says, in
effect, that the only unusual feature of an LBC, and
the very reason for the lampbrush form, is that once
transcription starts it cannot stop until the polymerase
meets another promoter that is already initiated or some
condensed chromomeric chromatin that is physically
impenetrable and untranscribable.
Figure 5 Transcription on a lampbrush chromosome
loop where a gene (thick black line) is transcribed from
its promoter (black flag) through to and past its normal
stop signal (white flag) and into the normally
nontranscribed DNA that lies downstream, thus generating very
long transcription units with long transcripts that include
RNA complementary to the sense strand of the gene
(thick parts of the transcripts) and RNA complementary to the
nonsense DNA that lies downstream of the gene (thin parts of the transcripts).


Evidence for the Read-Through 
Hypothesis for LBCs 


LBCs dissected directly into a solution of the enzyme
deoxyribonuclease-1 (DNase-1) fall to pieces and
their loops break into thousands of fragments. This
does not happen with ribonuclease or proteases. If
breakage of the chromosome axis and the loops by
DNase is watched and timed, and the logarithm of the
number of breaks is plotted against the logarithm of
time, the slope of the plot for the chromosome axis is 4
and that for the loops is 2. This supports the model in
which the axis consists of two chromatids (each a DNA
double helix consisting of two nucleotide chains) and
the loop is part of one chromatid (a single double helix
made up of two nucleotide chains), so that breaking the
axis requires four chains to be cut and breaking a loop
requires two.
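The slopes of 4 and 2 follow from a simple ‘multi-hit’ argument: if each nucleotide chain is nicked independently at a constant rate, the chance that all k chains at a given point have been cut grows roughly as t to the power k at early times, so a log-log plot of breaks against time has slope k. The sketch below checks this numerically under that idealized assumption; it ignores the fact that, in practice, nicks in different chains need only be close together rather than exactly opposite one another. The rate and time values, and the function names, are arbitrary and hypothetical.

```python
import math


def break_fraction(t: float, k: int, rate: float = 1e-3) -> float:
    """Probability that all k nucleotide chains at a given point have been
    nicked by time t, if each chain is nicked independently at a constant rate."""
    return (1.0 - math.exp(-rate * t)) ** k


def loglog_slope(k: int, t1: float = 1.0, t2: float = 2.0) -> float:
    """Slope of log(breaks) against log(time) between two early time points.
    The expected number of breaks is this probability times the (fixed)
    number of sites, so the site count drops out of the slope."""
    rise = math.log(break_fraction(t2, k)) - math.log(break_fraction(t1, k))
    return rise / (math.log(t2) - math.log(t1))


if __name__ == "__main__":
    for label, chains in (("loop (one duplex)", 2), ("axis (two duplexes)", 4)):
        print(f"{label}: slope ≈ {loglog_slope(chains):.2f}")
```

At later times the chains become saturated with nicks and the curves flatten, so the slope is only diagnostic in the early, low-dose part of the plot.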

A later experiment used restriction enzymes that 
cleaved DNA only at places along the molecule where 
there was a particular short nucleotide sequence. If a 
loop consisted entirely of identical tandemly repeated
DNA sequences, all with a particular restriction
enzyme recognition site, then the loop would be
destroyed by that enzyme. If, on the other hand, the
DNA sequences all lacked the enzyme recognition
site, then the loop would be totally unaffected and
would remain intact.

An experiment was set up using five enzymes and
the LBCs from N. viridescens. The control enzyme
was deoxyribonuclease-1. DNase-1 and three of
the restriction enzymes destroyed everything. One
enzyme, HaeIII, did likewise, except that it left one
pair of loops completely intact. These HaeIII-resistant
loops were big ones, 100 µm long, equivalent to at least
300 000 nucleotides. Their unique resistance to HaeIII
provided direct evidence that at least one pair of loops
consisted of tandemly repeated short-sequence DNA.
At a later date, the effects of HaeIII were tested again,
with appropriate controls, on the HaeIII-resistant
loops of N. viridescens. Breaks regularly occurred
precisely at the thin beginnings of each loop, but the
rest of each loop stayed intact, as would be
predicted on the basis of the read-through hypothesis.
The start of the transcription unit would be characterized
by a long complex gene sequence that would
almost inevitably include the HaeIII recognition site.
The remainder of the loop would consist entirely of
repeat sequences that lacked the HaeIII site.
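The read-through prediction for the HaeIII-resistant loops can be illustrated with a toy sequence scan. HaeIII recognizes the four-base sequence GGCC (cutting between the G and the C); because GGCC is its own reverse complement, scanning one strand finds every site. The loop sequence below is entirely hypothetical, not the actual N. viridescens sequence: a short ‘gene’ region at the thin start followed by tandem copies of a repeat unit that lacks the site, so every predicted cut falls at the start of the loop. The function name `find_sites` is likewise hypothetical.

```python
HAEIII_SITE = "GGCC"  # HaeIII recognition sequence (cuts GG^CC); palindromic


def find_sites(seq: str, site: str = HAEIII_SITE) -> list[int]:
    """0-based start positions of every occurrence of the site (overlaps allowed)."""
    seq = seq.upper()
    width = len(site)
    return [i for i in range(len(seq) - width + 1) if seq[i:i + width] == site]


# Hypothetical loop: a short 'gene' region at the thin start, then tandem repeats.
gene_region = "ATGGCCTACGATCC"
repeat_unit = "TAACGT"
loop = gene_region + repeat_unit * 50

if __name__ == "__main__":
    sites = find_sites(loop)
    print(sites)  # predicted HaeIII cut positions
    # True when every predicted cut lies within the gene region at the loop base.
    print(all(pos < len(gene_region) for pos in sites))
```

Checking the junction between the gene region and the first repeat matters: a site could in principle be created across it, which is why the scan runs over the whole concatenated loop rather than over each part separately.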


Other Questions We Should Ask about 
Lampbrushes 


Only a small fraction of the entire DNA of a loop- 
chromomere complex forms the transcription unit 
that makes the loop. What about the rest of the 
DNA? Is the DNA segment that makes the loop pre- 
ferentially selected for transcription the same piece at 
the corresponding locus in every egg of every indi- 
vidual of a particular species? This question may be 
approached experimentally. 

Why do loops have different morphologies that 
are heritable, locus-specific, and sometimes species- 
specific? The loop matrix is a site of processing, cleav- 
ing, and packaging of nuclear RNA, so most of the 
variation in gross structure may be expected to reflect 
different modes of binding and interaction involving 
quite a wide range of proteins and RNAs. 

Do LBCs look the same in all animals? They do 
not. The relative lengths of LBCs at the time of their 
maximum development are the same as the relative 
lengths of the corresponding mitotic metaphase 
chromosomes from the same species. The overall
lengths of LBCs are broadly related to genome size.
Birds, with their notably small genomes, have
extremely small, but nonetheless very beautiful, LBCs
that present many extraordinary and hitherto
unexplained features.


Some LBCs have long loops and others have very 
short ones. We have seen that the transcription units of 
LBCs are unusually long because they include inter- 
spersed repetitive elements of the genome. Structural 
genes in large genomes are more widely spaced than 
in small genomes as they are interspersed with non- 
coding DNA. One might therefore expect LBCs from 
large genomes to have longer loops (transcription 
units) than those of smaller genomes, and this is what 
has been observed. 

Many of the very long loops that we see in LBCs
from animals with large genomes show multiple, tan- 
demly arranged thin-thick segments (transcription 
units). The individual transcription units within one 
loop can have the same or opposite polarities and can 
be of the same or different lengths (Figure 6). This 
observation suggests that it is really the transcription 
unit that is the ultimate genetic unit in an LBC and 
not the loop/chromomere complex, as was once 
thought. 

Why do LBCs exist at all? They are characteristic 
of eggs that develop quickly into complex multicellu- 
lar organisms independently of the parent. A frog’s 
egg is fertilized and develops into a complex tadpole 
within a few days. Much of the information and raw 
materials for this process are laid down during oogen- 
esis through activity of LBCs and amplified ribosomal 
genes and the accumulation of yolk proteins imported 
from the liver. LBCs may therefore be regarded as an 
adaptive feature that has evolved to preprogramme the 
egg for rapid early development. Their absence from
mammalian eggs could be regarded as an advanced
feature that is consistent with the slow pace of
mammalian development. A frog’s egg, for
example, will have completed gastrulation and the
differentiation of its central nervous system and
embryonic axis by the time a human embryo has
only reached the 8-cell stage.

Figure 6 The various arrangements of transcription
units that actually occur on lampbrush chromosomes.
The loop on the left comprises a single transcription
unit. In the middle loop there are two transcription units
of the same size and polarity. The right-hand loop has
four transcription units of different sizes and different
directions of polarity.

LBCs provide a uniquely powerful medium 
through which it has been possible to draw valid con- 
clusions at the molecular level from observations and 
experiments carried out mainly with a light micro- 
scope. Their value extends into the fields of compara- 
tive molecular cytogenetics and systematics. Nowhere 
else is it possible to study genome structure, function, 
and diversity by actually looking at the genome itself 
with a light microscope. LBCs are technically chal- 
lenging but not defeating. They are exceptionally 
beautiful to look at and fun to work with. 

Further information on these remarkable struc- 
tures can be found in the literature listed in the Further 
Reading section below and on the internet. 


Further Reading 

Callan HG (1986) Lampbrush Chromosomes. Berlin:
Springer-Verlag.

Macgregor HC (1993) An Introduction to Animal Cytogenetics. 
London: Chapman & Hall. 

Macgregor HC and Varley J (1988) Working with Animal
Chromosomes. New York: John Wiley.


See also: Cytogenetics; Developmental Genetics 
During viral infection, late genes are those that are
transcribed after the commencement of viral DNA 
synthesis. The bulk of these encode either components 
of the capsid, proteins aiding in morphogenesis or 
DNA packaging, or proteins that are to be carried 
with the DNA in the capsid. 


See also: Virus 
The term ‘leader peptide’ (or, less commonly, ‘leader
polypeptide’) refers to a peptide encoded by a DNA
sequence immediately upstream of the sequence
encoding what eventually becomes a mature protein. 
However, this upstream peptide can be encoded in 
two quite different ways, and the term leader peptide 
is used to refer to both. 

In most cases the leader peptide is produced as the 
amino terminus of a longer protein and is released 
from that protein by the action of a protease. That is, 
the sequence encoding the leader peptide is part of the 
open reading frame encoding the rest of the protein. 
Many of these leader peptides are also referred to as 
‘signal peptides’ or ‘signal sequences,’ and these are 
involved in transport of the protein to or through cell 
membranes, transport to different membranous cell- 
ular compartments, or secretion of the protein from 
the cell. Signal peptides are removed from the mature 
protein during this process by a specific peptidase. 
Such signal peptides are typically composed of 16-30
amino acid residues. Signal peptides contain a polar
N-terminal region, a hydrophobic core that can span a
membrane, and a hydrophilic C-terminal
region. However, not all such leader peptides
synthesized as part of a longer protein are signal 
sequences and, in some cases, e.g., the capsid pro- 
teins of certain viruses, their function remains un- 
known. 
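A hydropathy scan gives a rough location for the hydrophobic core of a signal peptide. The sketch below is for illustration only. It assumes the Kyte-Doolittle hydropathy scale and a 7-residue sliding window. The window size and the 20-residue sequence are hypothetical choices, not taken from this entry. Dedicated signal-peptide predictors rely on trained models rather than a single window average.

```python
# Kyte-Doolittle hydropathy values (positive = hydrophobic).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def most_hydrophobic_window(seq: str, window: int = 7) -> tuple[int, float]:
    """Return (0-based start, mean hydropathy) of the most hydrophobic window."""
    seq = seq.upper()
    if len(seq) < window:
        raise ValueError("sequence is shorter than the window")
    best_start, best_score = 0, float("-inf")
    for start in range(len(seq) - window + 1):
        segment = seq[start:start + window]
        score = sum(KYTE_DOOLITTLE[aa] for aa in segment) / window
        if score > best_score:
            best_start, best_score = start, score
    return best_start, best_score


# Hypothetical signal-like sequence: basic N-terminus (MKK), a hydrophobic
# stretch, then a more polar C-terminal end.
leader = "MKKTLLALAVLAAGLSSAQA"
start, score = most_hydrophobic_window(leader)
print(start, leader[start:start + 7], round(score, 2))  # 4 LLALAVL 3.29
```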

The other type of leader peptide is encoded by a
short but independent open reading frame immediately
upstream of the beginning of certain polycistronic
operons in some bacteria. Therefore, this leader
peptide is produced independently of the following
proteins. However, the peptide itself is apparently
not functional. The efficiency of translation of the
sequence encoding the leader peptide is coupled to
the transcription of the downstream genes in a regulatory
mechanism called ‘transcription attenuation.’
Each of the short sequences encoding these peptides 
contains codons related to the function of the enzymes 
encoded by the polycistronic mRNA. For instance, 
the 16-residue leader peptide of the histidine operon 
of Escherichia coli contains seven consecutive histidine 
codons. If a ribosome can translate these codons, the 
transcription of the remainder of the message is 
terminated. However, if the ribosome stalls at one of 
the histidine codons, because of a low concentration 
of histidyl-tRNA, transcription of the rest of the 
operon proceeds. Note that both types of leader pep- 
tide are encoded at the 5’ end of the mRNA, the ‘leader 
sequence.’ 
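To make the his leader example concrete, one can read a leader open reading frame codon by codon and find the longest run of histidine codons (CAT/CAC on the DNA sense strand). The 51-nucleotide sequence below is hypothetical. It encodes a 16-residue peptide with seven consecutive histidines, as described above, but its codons were chosen for illustration and are not the actual E. coli his leader sequence.

```python
HIS_CODONS = {"CAT", "CAC"}  # histidine codons, DNA sense strand


def longest_codon_run(orf: str, targets: set[str]) -> tuple[int, int]:
    """Return (run length, 0-based codon index where it starts) for the
    longest in-frame run of codons drawn from `targets`; (0, -1) if none."""
    orf = orf.upper()
    best_len, best_start = 0, -1
    run_len, run_start = 0, -1
    for i in range(0, len(orf) - len(orf) % 3, 3):
        if orf[i:i + 3] in targets:
            if run_len == 0:
                run_start = i // 3
            run_len += 1
            if run_len > best_len:
                best_len, best_start = run_len, run_start
        else:
            run_len = 0
    return best_len, best_start


# Hypothetical leader ORF: 16 sense codons (7 of them histidine) plus a stop.
leader_orf = (
    "ATG" "ACC" "CGT" "GTG" "CAG" "TTC" "AAA"
    "CAT" "CAC" "CAT" "CAC" "CAT" "CAC" "CAT"
    "CCG" "GAT" "TGA"
)
print(longest_codon_run(leader_orf, HIS_CODONS))  # (7, 7)
```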


See also: Attenuation; Leader Sequence; 
Open Reading Frame 
The mRNA region that precedes the coding sequence
for a gene is called the leader sequence. Leader se- 
quences can regulate downstream expression at the 
levels of transcription or translation in bacteria, and 
can modulate downstream translation in eukaryotes. 


Transcription in Bacteria 


Transcription attenuation comprises one level of regu- 
lation for most amino acid biosynthetic operons in 
enteric bacteria. Nucleotide sequences within the 
leader cause the formation of a domain of secondary 
structure which acts as a transcription termination 
signal for bacterial RNA polymerase. Transcription 
initiated in the upstream promoter terminates within 
the leader so as to prevent RNA polymerase from 
entering the structural genes of an operon. Transcrip- 
tion termination is relieved when the intracellular 
concentration of the end-product amino acid of the 
operon-specified enzymes falls below some minimal 
level. The level of the end-product amino acid is 
sensed by the translation of a short leader-encoded 
open reading frame (ORF) immediately upstream of 
the transcription termination signal; the open reading 
frame contains one or more codons for the operon 
end-product amino acid. Low intracellular levels of 
the end-product amino acid prevent high level charg- 
ing of the cognate tRNA, resulting in ribosomal paus- 
ing at leader codons for the end-product amino acid. 
The paused ribosome interferes with the secondary
structure of the transcription terminator, causing the
formation of a second configuration in the mRNA,
the antiterminator, which allows transcription to enter the
downstream operon coding sequence.
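The sensing logic above can be caricatured with a toy probability model. The model is an assumption made for illustration and does not come from this entry. It supposes that each regulatory codon in the leader ORF is decoded without a pause with probability equal to the charged fraction of the cognate tRNA. It also supposes that the codons behave independently, and that a single pause anywhere in the run is enough for the antiterminator to form.

```python
def readthrough_probability(charged_fraction: float, regulatory_codons: int) -> float:
    """Toy chance that the antiterminator forms and transcription continues.

    Assumptions (illustrative only): each regulatory codon is decoded without
    a pause with probability `charged_fraction`, codons behave independently,
    and a single pause is enough to favor the antiterminator.
    """
    if not 0.0 <= charged_fraction <= 1.0:
        raise ValueError("charged_fraction must lie in [0, 1]")
    if regulatory_codons < 0:
        raise ValueError("regulatory_codons must be non-negative")
    no_pause = charged_fraction ** regulatory_codons  # ribosome never stalls
    return 1.0 - no_pause  # at least one stall -> readthrough


for charged in (0.99, 0.9, 0.5):
    one = readthrough_probability(charged, 1)
    seven = readthrough_probability(charged, 7)
    print(f"charged={charged}: 1 codon {one:.3f}, 7 codons {seven:.3f}")
```

Under these assumptions, a leader with several consecutive regulatory codons responds much more steeply to a fall in tRNA charging than a leader with a single such codon.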

Transcription antitermination involves the forma- 
tion of a transcription termination structure in leader 
mRNA, which is either inhibited or facilitated by the 
interaction of a protein (or a tRNA molecule) with 
leader mRNA sequences. For example, the TRAP
protein, when bound to tryptophan, binds to the leader sequence
of the Bacillus subtilis trp operon, causing the formation
of a transcription terminator. In the absence of tryptophan,
TRAP fails to bind to the leader sequence and
an antiterminator structure forms, allowing transcription
to enter the operon. Other operons that follow
this general pattern of regulation include the bgl
operon of Escherichia coli, the pur, pyr, hut, lic, and 
glp operons and the sac regulon of B. subtilis, the ami 
operon of Pseudomonas, and the nas regulon of 
Klebsiella. Genes encoding aminoacyl-tRNA synthetases in
Gram-positive bacteria are also regulated by antitermination.
The uncharged tRNA interacts with the leader se- 
quences to promote the formation of an antitermin- 
ator structure allowing transcription to enter the 
tRNA synthetase coding sequence. 
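Antitermination works in two directions: in some leaders the effector favors the terminator, and in others it favors the antiterminator. A small data structure can summarize both cases. This is a toy sketch. The class and field names are hypothetical, and each leader is reduced to a two-state switch that ignores kinetics and partial occupancy.

```python
from dataclasses import dataclass

TERMINATOR = "terminator"
ANTITERMINATOR = "antiterminator"


@dataclass(frozen=True)
class LeaderSwitch:
    """Toy two-state model of an antitermination switch in leader mRNA."""
    name: str
    effector: str
    favored_when_bound: str  # structure that forms when the effector acts

    def structure(self, effector_bound: bool) -> str:
        if effector_bound:
            return self.favored_when_bound
        # Without the effector, the alternative structure forms.
        return ANTITERMINATOR if self.favored_when_bound == TERMINATOR else TERMINATOR

    def transcribes_operon(self, effector_bound: bool) -> bool:
        return self.structure(effector_bound) == ANTITERMINATOR


TRP_TRAP = LeaderSwitch("B. subtilis trp leader", "TRAP + tryptophan", TERMINATOR)
SYNTHETASE = LeaderSwitch("Gram-positive aminoacyl-tRNA synthetase leader",
                          "uncharged tRNA", ANTITERMINATOR)

for switch in (TRP_TRAP, SYNTHETASE):
    for bound in (True, False):
        print(f"{switch.name}: {switch.effector} bound={bound} -> {switch.structure(bound)}")
```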


Translation in Bacteria 


Translation attenuation regulates several antibiotic-inducible
antibiotic resistance genes (e.g., cat, erm).
A domain of secondary structure in leader mRNA 
sequesters the ribosome binding site for the down- 
stream resistance determinant, preventing translation 
initiation. Antibiotic-induced ribosome stalling in a 
short open reading frame within the leader causes 
destabilization of the secondary structure, which 
frees the ribosome binding site allowing translation 
of a coding sequence whose protein product can 
neutralize the antibiotic. 

Translational repression is well exemplified by cer- 
tain operons encoding bacterial ribosomal proteins. 
The translational repressor is a single ribosomal pro- 
tein encoded by the operon; the nonregulatory func- 
tion of this protein is to act as a structural component 
of the ribosome. In several examples, the binding 
target for the repressor protein in the operon leader 
sequence mimics the structure or sequence of the 
rRNA target for the same protein. Binding of 
the regulatory protein to leader mRNA is presumably 
of lower affinity than that for rRNA binding in vivo. 
Leader binding by the repressor interferes with trans- 
lation of operon mRNA by occluding the ribosome 
binding site or by changing the secondary structure of
the leader.
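The standard single-site binding relation, fraction bound = [P] / ([P] + Kd), shows how a weaker leader-binding site can still provide feedback control. In the sketch below, both dissociation constants and the free-protein concentrations are hypothetical. They were chosen only so that the leader site binds about 20-fold more weakly than the rRNA site. The sketch also ignores the depletion of free protein by binding.

```python
def fraction_bound(free_protein_nm: float, kd_nm: float) -> float:
    """Fractional occupancy of a single site: [P] / ([P] + Kd)."""
    if free_protein_nm < 0 or kd_nm <= 0:
        raise ValueError("need [P] >= 0 and Kd > 0")
    return free_protein_nm / (free_protein_nm + kd_nm)


KD_RRNA_NM = 1.0     # hypothetical: tighter binding to the rRNA target
KD_LEADER_NM = 20.0  # hypothetical: weaker binding to the mRNA leader

for free_nm in (0.5, 5.0, 50.0):
    rrna = fraction_bound(free_nm, KD_RRNA_NM)
    leader = fraction_bound(free_nm, KD_LEADER_NM)
    print(f"[P]={free_nm} nM: rRNA site {rrna:.2f}, leader site {leader:.2f}")
```

With these numbers, the rRNA site approaches saturation well before the leader site is substantially occupied. Repression therefore sets in mainly when free protein accumulates in excess.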


Translation in Eukaryotes 


Translation in eukaryotes is typically initiated by the 
scanning of a 40S ribosomal preinitiation complex. 
Scanning begins at the 5’ capped end of the mRNA 
and halts at the first initiator codon, usually AUG, 
where translation begins. Translation initiation effi- 
ciency at any particular AUG is affected by the con- 
text of the leader sequence flanking the AUG codon; 
a preinitiation complex may ignore an AUG codon 
located in a region of poor context. Several features of 
the leader sequence can dramatically decrease
translation of the main (downstream) coding sequence:
examples include a region of secondary structure
proximal to the 5’ cap site, or a 5’-proximal AUG
codon that is not followed by an open reading frame.
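The scanning rule can be sketched as a 5'-to-3' search for the first AUG in an acceptable context. Here context is judged with the widely used Kozak rule: a purine at position -3 and a G at +4, counting the A of AUG as +1. That rule, the three-way classification, and the deterministic skipping of weak AUGs are simplifying assumptions that do not come from this entry. Real leaky scanning is probabilistic. The example mRNA is hypothetical.

```python
from typing import Optional


def kozak_context(mrna: str, pos: int) -> str:
    """Classify the AUG whose A is at 0-based index `pos`.

    Simplified Kozak rule (an assumption): purine at -3 and G at +4 is
    'strong', one of the two is 'adequate', neither is 'weak'.
    """
    minus3 = mrna[pos - 3] if pos >= 3 else ""
    plus4 = mrna[pos + 3] if pos + 3 < len(mrna) else ""
    has_purine = minus3 in ("A", "G")
    has_g = plus4 == "G"
    if has_purine and has_g:
        return "strong"
    if has_purine or has_g:
        return "adequate"
    return "weak"


def first_used_aug(mrna: str, skip_weak: bool = True) -> Optional[int]:
    """Scan 5'->3' and return the index of the first AUG a scanning
    preinitiation complex would use, or None if there is none."""
    mrna = mrna.upper().replace("T", "U")
    pos = mrna.find("AUG")
    while pos != -1:
        if not skip_weak or kozak_context(mrna, pos) != "weak":
            return pos
        pos = mrna.find("AUG", pos + 1)
    return None


# Hypothetical leader: a weak-context AUG at 4, a strong-context AUG at 16.
mrna = "CCUUAUGUCCGCCACCAUGGCA"
print(first_used_aug(mrna))                   # 16
print(first_used_aug(mrna, skip_weak=False))  # 4
```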


Regulation by Upstream Open-Reading 
Frames (uORFs) 

In the leaders for many eukaryotic mRNAs, the first 
AUG initiates translation of an upstream open reading 
frame (ORF) which is typically short. Translation of 
the functional protein, therefore, requires translation 
of the uORF followed by scanning of the ribosome to 
the next AUG and reinitiation of translation. Certain 
uORFs enhance downstream translation, probably 
because the uORF sequence facilitates reinitiation 
of translation at downstream AUG codons. uORFs 
which diminish downstream translation are believed
to interfere with ribosome scanning beyond the
uORF. Current evidence from studies of a cytomegalovirus
uORF-encoded peptide indicates that the short
peptide prevents ribosome release from the uORF
termination codon. The stalled ribosome itself cannot
continue scanning, and can block the movement of
other ribosomes attempting to scan along the
mRNA. The most extensively studied example of the
effects of uORFs on downstream translation is seen in
the regulation of the yeast gene GCN4.
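To list the uORFs in a leader, read in frame from each AUG upstream of the main start codon until a stop codon appears. This minimal sketch uses a hypothetical 35-nucleotide mRNA and assumes the position of the main start codon is already known. A uORF that runs past the main start, or that lacks an in-frame stop, is still reported.

```python
from typing import Optional

STOP_CODONS = {"UAA", "UAG", "UGA"}


def find_uorfs(mrna: str, main_start: int) -> list[tuple[int, Optional[int]]]:
    """List (start, end) for every ORF beginning at an AUG upstream of `main_start`.

    `end` is the 0-based index just past the in-frame stop codon, or None if
    the reading frame runs off the end of the sequence without a stop.
    """
    mrna = mrna.upper().replace("T", "U")
    uorfs = []
    pos = mrna.find("AUG")
    while pos != -1 and pos < main_start:
        end = None
        for i in range(pos, len(mrna) - 2, 3):
            if mrna[i:i + 3] in STOP_CODONS:
                end = i + 3
                break
        uorfs.append((pos, end))
        pos = mrna.find("AUG", pos + 1)
    return uorfs


# Hypothetical mRNA: uORF AUG-AAA-CCC-UAA, then the main AUG at index 23.
mrna = "GCAUGAAACCCUAAGCAGCCACCAUGGCUGCUUAA"
print(find_uorfs(mrna, main_start=23))  # [(2, 14)]
```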


Internal Ribosome Entry Site (IRES) 

Certain eukaryotic mRNAs contain an internal 
ribosome entry site (IRES) prior to the coding se- 
quence. An IRES presumably functions in an analo- 
gous manner to a bacterial ribosome binding site in 
allowing translation initiation by directly serving as a 
ribosome-binding target. The presence of an IRES 
preceding a coding sequence in a eukaryotic mRNA 
enables an mRNA that is not capped to be translated. 


Further Reading 

Henkin TM (2000) Transcription termination in bacteria. Current 
Opinion in Microbiology 3: 149-153. 

Hinnebusch A (1994) Translational control of GCN4: an in vivo 
barometer of initiation-factor activity. Trends in Biochemical 
Sciences 19: 409-414. 

Landick R, Turnbough CL and Yanofsky C (1996) Transcription
attenuation. In: Neidhardt FC et al. (eds) Escherichia coli
and Salmonella, 2nd edn, pp. 1263-1286. Washington, DC:
American Society for Microbiology Press.

Lovett PS and Rogers EJ (1996) Ribosome regulation by the 
nascent peptide. Microbiological Reviews 60: 366-385. 


See also: Open Reading Frame; Transcription; 
Translation 
The leading strand of DNA is synthesized continu-
ously in the 5’-3’ direction. 


See also: Lagging Strand; Replication 


Least Squares 


W-H Li and K Makova 


Copyright © 2001 Academic Press 
doi: 10.1006/rwgn.2001.1497 


The least squares method is a well-established statistical
method of parameter estimation. This method
chooses predicted values $e_i$ that minimize the sum of
squared errors of prediction, $\sum_i (d_i - e_i)^2$, over all sample
points $d_i$ (observed values). The least squares method
has been utilized in molecular evolution to estimate
the branch lengths in a phylogenetic (evolutionary)
tree and to estimate the topology of a tree. The least
squares estimates of the branch lengths $b_k$ are the
estimates $e_k$ that minimize the following sum of
squares: $\sum_{i<j} (d_{ij} - e_{ij})^2$, where $d_{ij}$ is the observed
evolutionary distance between taxa $i$ and $j$, and $e_{ij}$ is
the sum of the length estimates ($e_k$'s) of the branches
connecting taxa $i$ and $j$. To choose the best topology
according to the least squares criterion, the above
sum is computed for each possible topology and the
topology with the smallest sum is taken as the best
tree.
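The procedure can be illustrated with four taxa, the smallest case in which the unrooted topology is in question. The sketch assumes NumPy and uses ordinary (unweighted) least squares, as in the sum of squares above. The distances are hypothetical and were generated from a tree ((A,B),(C,D)) with known branch lengths. Branch lengths are not constrained to be non-negative.

```python
import itertools

import numpy as np

TAXA = ("A", "B", "C", "D")
PAIRS = list(itertools.combinations(TAXA, 2))  # AB, AC, AD, BC, BD, CD

# Each unrooted four-taxon topology is defined by the pair of taxa on one
# side of its internal branch.
TOPOLOGIES = {
    "AB|CD": frozenset({"A", "B"}),
    "AC|BD": frozenset({"A", "C"}),
    "AD|BC": frozenset({"A", "D"}),
}


def design_matrix(side: frozenset[str]) -> np.ndarray:
    """Rows are taxon pairs; columns are the four terminal branches, then the
    internal branch. An entry is 1 if that branch lies on the pair's path."""
    rows = []
    for i, j in PAIRS:
        row = [1.0 if t in (i, j) else 0.0 for t in TAXA]
        crosses_internal = (i in side) != (j in side)
        row.append(1.0 if crosses_internal else 0.0)
        rows.append(row)
    return np.array(rows)


def fit_branch_lengths(distances: dict, side: frozenset[str]) -> tuple[np.ndarray, float]:
    """Ordinary least squares branch lengths and the residual sum of squares."""
    x = design_matrix(side)
    d = np.array([distances[pair] for pair in PAIRS])
    b, *_ = np.linalg.lstsq(x, d, rcond=None)
    e = x @ b  # e_ij: summed branch-length estimates along each path
    return b, float(np.sum((d - e) ** 2))


def best_topology(distances: dict) -> str:
    """Topology with the smallest residual sum of squares."""
    scores = {name: fit_branch_lengths(distances, side)[1]
              for name, side in TOPOLOGIES.items()}
    return min(scores, key=scores.get)


# Hypothetical distances, additive on ((A,B),(C,D)) with terminal branches
# A=0.1, B=0.2, C=0.15, D=0.05 and internal branch 0.3.
DIST = {("A", "B"): 0.30, ("A", "C"): 0.55, ("A", "D"): 0.45,
        ("B", "C"): 0.65, ("B", "D"): 0.55, ("C", "D"): 0.20}

lengths, sse = fit_branch_lengths(DIST, TOPOLOGIES["AB|CD"])
print(best_topology(DIST), np.round(lengths, 3), round(sse, 12))
```

These distances are exactly additive on the ((A,B),(C,D)) tree. That topology therefore fits with a residual sum of squares of essentially zero and recovers the generating branch lengths. The two alternative topologies leave positive residuals.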


See also: Phylogeny; Trees 
Joshua Lederberg (1925– ) has made many major
contributions to our understanding of the genetics of 
microorganisms. He was born in Montclair, New 
Jersey, and received the Nobel Prize just 33 years 
later (with George Beadle and Edward Tatum) for 
discovering the mechanisms of genetic recombination 
in bacteria. He has been a member of the National
Academy of Sciences since 1957 and was a charter
member of its Institute of Medicine. 

Lederberg became intensely interested in studying 
biological mechanisms while still in high school
and took advantage of a variety of opportunities in 
the New York area to work in laboratories from an 
early age. He studied at Columbia Medical School, 
including work on adaptation in mutants of Neuro- 
spora, and then did his PhD under Tatum at Yale, 
publishing “Gene recombination in Escherichia coli” 
in Nature in 1948. This work gave the first indication 
that bacteria can reproduce not only asexually, 
through binary fission, but also sexually, resulting 
in a complex shuffling of their genetic systems during 
the mating of bacteria. As he discusses in Annual 
Review of Genetics (1987; 21: 23-46), the choice of 
the K-12 strain was highly serendipitous; only about 
1 in 20 E. coli strains would have given positive results 
in their experiments, and the key extrachromosomal 
elements bacteriophage lambda and the F (fertility) 
factor, important in recombination, were also isolated 
in that system. 

Lederberg taught in the University of Wisconsin 
School of Agriculture from 1947 to 1959, making the 
key decision to join that strong center of research in 
microbiology and biochemistry rather than return to 
Columbia to complete his medical studies. He further 
helped lay the foundations of microbial genetics when 
he and student Norton Zinder discovered the phe- 
nomenon of phage transduction in Salmonella: they 
showed that certain bacteriophage strains could incor- 
porate a piece of the bacterial genome and carry it to 
a different bacterium. There it could recombine into 
the new host’s chromosome, thus providing a major 
new mechanism of lateral genetic exchange that has 
proven extremely important in understanding micro- 
bial ecology and evolution. These studies were soon 
extended to transduction of biochemical pathways 
in E. coli K-12, which was nonpathogenic and more
extensively developed as a genetic system. With these 
discoveries, bacteria took their place along with Dros- 
ophila and Neurospora as key model organisms in 
understanding genetic principles. 

In 1959, Lederberg moved to the new medical school 
at Stanford University, where he became the director 
of the Kennedy Laboratories of Molecular Medicine 
in 1962. He moved to Rockefeller University to 
become its President in 1978, continuing his research 
there as Sackler Foundation scholar and professor 
emeritus of molecular genetics and informatics after 
his retirement from the presidency in 1990. In add- 
ition to his work on the fundamental mechanisms of 
microbial genetics, he has been very interested in the 
expanding field of research in artificial intelligence 
and in the search for life on Mars. 


Lederberg’s interests extend well beyond basic 
science. He has played a number of important roles 
in the international health community, including 
spending years on the World Health Organization’s 
Advisory Health Research Council and serving as 
chairman of the President’s Cancer Panel and of 
the congressional Technology Assessment Advisory 
Council. He also chaired a UNESCO committee on 
improving global internet communications for science 
and helping third-world people get onto the internet 
so they can be more involved in the process. Family 
has played an important role in his life. His father, a
rabbi who emigrated from Palestine (now Israel) shortly before his
birth, had a strong impact. His French-born wife is a
Clinical Professor of Psychiatry at Memorial Sloan 
Kettering Cancer Center, and he has two children, 
David and Annie. His life has exemplified the basic 
advice he gave to young people in a recent inter- 
view (www.almaz.com/nobel/medicine/lederberg- 
interview.htm): 


Try hard to find out what you’re good at, and what your 
passions are, and where the two converge, and build your life 
around that...and make deliberate choices. 


See also: Bacterial Genetics; Conjugation, 
Bacterial; Phage (Bacteriophage); Transduction 
The legume or bean family (Leguminosae or Faba-
ceae), with over 650 genera and 18 000 species, is the 
third largest family of flowering plants (angiosperms), 
behind only orchids (Orchidaceae) and the composite 
or sunflower family (Asteraceae or Compositae). 
Morphologically and ecologically, it is a very diverse 
family, ranging from tiny alpine ephemerals to huge 
tropical rainforest canopy trees. As much as one-third 
of the family’s species are concentrated in a handful 
of large genera, such as Acacia, Astragalus, and 
Mimosa, that have radiated abundantly in disturbed 
habitats. 

The family is characterized by its distinctive (and 
eponymous) fruit, a two-valved pod whose halves 
separate to disperse the seeds; however, this form is 
modified into a wide variety of shapes and sizes, 
including indehiscent dry or fleshy forms. Symbioses 
with nitrogen-fixing soil bacteria (collectively called 
‘rhizobia’), which are housed in specialized organs
called nodules, are common but not universal in the 
family, nor is nodulation limited to Leguminosae. The 
ability to nodulate is thought to be an important 
adaptation in the family, and is a major factor in the 
economic and ecological importance of legumes. 


Phylogeny and Taxonomy 


Relationships with Other Families 

Molecular phylogenetic studies support the natural- 
ness (monophyly, descent from a single common 
ancestor) of the Leguminosae (Figure 1). The closest
relatives of the legumes lie among taxa of the broad ‘rosid’
alliance that includes a major portion of angiosperm 
diversity, among which are other families that partici- 
pate in nitrogen-fixing symbioses. Within this large 
clade (the descendants of a single common ancestor), 
the relationships of the family are more controversial. 
Morphological and chemical data suggest affinities 
with families such as Connaraceae or Sapindaceae, 
but molecular results ally the legumes with families 
previously not suggested as close relatives: Polygalaceae
(milkworts), Surianaceae, and Quillaja, an
anomalous member of the Rosaceae (rose family). 


The Three Subfamilies 

The family is typically divided into three subfamilies 
(Caesalpinioideae, Mimosoideae, Papilionoideae or 
Faboideae), though these are sometimes considered 
to be separate families (Caesalpiniaceae, Mimosaceae, 
Papilionaceae or Fabaceae). Two of the three subfam- 
ilies, Mimosoideae and Papilionoideae, are supported 
as natural groups, whereas Caesalpinioideae is not
(Figure 1). 

Subfamily Caesalpinioideae comprises the earliest- 
diverging elements of the family, a group of separate 
evolutionary lineages, some more closely related 
to the other two subfamilies than to one another 
(Figure 1). The group is therefore very heterogeneous
morphologically and ecologically, and is most easily 
characterized by the absence of the unique features 
that distinguish mimosoids and papilionoids. Its ap- 
proximately 150 genera and 2500 species are mainly 
tropical in distribution and include a number of showy 
species that are planted as ornamentals. 

Mimosoideae has fewer genera (around 65), but 
somewhat more species (around 3000) than Caesalpi- 
nioideae. Most of the genera are small, often with only 
a single species, and around two-thirds of mimosoids 
belong to a few speciose genera such as Acacia and 
Mimosa. Flowers in mimosoids typically are individu- 
ally small but often form showy clusters; petals are 
inconspicuous but the stamens are colored and are
numerous in many species, providing the main floral
display. Unlike other legumes, many mimosoids shed
their pollen in polyads of 16 or 32 grains. Mimosoideae
has some very close allies that are classified as
Caesalpinioideae; this has been suspected for some
time based on morphology, and the hypothesis has
been supported by molecular data (Figure 1).

Figure 1 Phylogenetic relationships of legumes summarized from several phylogenetic studies, mostly using gene
sequences from the chloroplast genome. Vertical heavy lines next to names indicate groups whose relationships are
unresolved, or which themselves represent several lineages that are not closely related. The ancestor of all
Leguminosae is indicated by an arrow. Two of the three subfamilies are also indicated: Mimosoideae as a terminal unit
of the tree and Papilionoideae by an arrow pointing to its ancestor; all other legume taxa (genera or tribes) are
Caesalpinioideae. Major lineages of papilionoid legumes referred to in the text are boxed, with the informal name of
the group in bold type (e.g., ‘phaseoloids’) within the box. Dashed lines connect particular tribes to boxes containing
economically or scientifically important representatives. ‘N’ indicates groups known to be capable of nodulation,
followed by a number in parentheses that refers to one of three potentially independent origins of the syndrome. The
vast majority of Papilionoideae derived from the ancestor indicated as ‘N(3)’ are known to nodulate, but some do
not. Two chloroplast DNA structural mutations are indicated by arrows; the taxonomic distribution of the 50 kb
inversion is not precisely known due to lack of sampling.

Papilionoideae is the subfamily most people visu- 
alize when legumes are mentioned. It is by far the 
largest and ecologically most diverse of the three sub- 
families, with some 450 genera and 12 000 species. Its
members typically have bilaterally symmetrical flow- 
ers like those of pea (Pisum), with two wing petals, 
two keel petals, and a large standard petal. It is this 
butterfly-like (‘papilionoid’) floral morphology that 
gives its name to the subfamily. This morphological 
condition is derived, and some early-diverging mem- 
bers of the subfamily retain the radially symmetrical 
floral morphology of Caesalpinioideae and other 
rosid families. The large papilionoid radiation appears 
to have no particularly close allies among caesal- 
pinioid taxa (Figure 1). 


Relationships within Subfamilies: 
Economically and Scientifically Important 
Groups 

Within each of the three subfamilies, genera are 
grouped in tribes, some of which are further subdiv- 
ided formally into subtribes or informally into generic 
groups. A major focus of phylogenetic work has been 
to identify monophyletic groups of genera and to 
compare these with tribal boundaries, some of which 
have been suspected to be more taxonomically con- 
venient than natural since the taxonomic foundation 
of the family was established in the late 1800s by 
George Bentham. Many of these suspicions have 
been confirmed, both for obviously unnatural amal- 
gamations of genera with mostly ancestral morpholo- 
gies, such as the papilionoid tribe Sophoreae, and for 
groups such as Phaseoleae, which has been considered 
to be among the most advanced papilionoid tribes. 
Molecular data, in particular, are revealing some unex- 
pected relationships, and these findings are beginning 
to be reflected in the taxonomy of the family. 

Genera containing species of economic or scientific 
importance are scattered unevenly throughout the 
three subfamilies and their constituent tribes. Apart 
from their use as ornamentals and tropical timber 
trees, Caesalpinioideae have only a few commonly 
known economic taxa, among them carob (Cerato- 
nia). Relatively few genera are known from north 
temperate regions, exceptions being Gleditsia (honey 
locust) and Cercis (Judas tree, redbud). The unfamiliarity
and general inaccessibility of caesalpinioid genera
are unfortunate from a scientific point of view,
because, as noted, Caesalpinioideae represents the 
earliest diverging lineages of the family, and thus 
much of the genetic and evolutionary variation of 
Leguminosae. Simply put, it is impossible to make 
generalizations about any genetic phenomenon (gen- 
ome organization, nodulation, floral development) 
for legumes without considering Caesalpinioideae. 
Although some caesalpinioid tribes appear to be 
natural, others clearly are not. 

Mimosoideae includes some better-known genera, 
such as the large and ecologically important Acacia, 
one group of which is well known for housing and
feeding ants that, in turn, protect the plants from predation.
Species of Mimosa, aptly named ‘sensitive
plants,’ are famous for the thigmonastic response of
their leaves. Neptunia includes the only truly aquatic 
legumes. Leucaena is a fast-growing tree with promise 
for agroforestry. Other genera include Prosopis (mes- 
quite) and Parkia (locust bean). Tribal boundaries in 
the subfamily are mostly uncertain. 

Nearly all of the legumes familiar to inhabitants of 
the temperate northern hemisphere are members of 
Papilionoideae. For geneticists, these include both
Mendel’s pea (Pisum sativum) and the latest ‘model 
organism’ legumes, Medicago truncatula and Lotus 
japonicus, along with soybean (Glycine max) and 
lupin (Lupinus spp.). As with the legumes as a whole, 
a true appreciation of papilionoid diversity is not read- 
ily obtained from these model groups, all of which are 
relatively advanced in one sense or another. Many of 
the earliest diverging lineages of the family have fea- 
tures in common with caesalpinioids, and comprise a 
number of unrelated lineages whose relationships are 
not fully understood (Figure 1). 

Molecular phylogenetic data suggest that there 
have been several major radiations in the subfamily, 
each of which includes some genera with scientifically 
or economically important species. Among these are 
an ‘aeschynomenoid’ group that includes members of 
tribes Dalbergieae, members of which provide rose- 
wood, and Aeschynomeneae, among whose members
is peanut (Arachis hypogaea). A ‘genistoid’ group
includes Genisteae, with Lupinus and the familiar 
‘brooms’ of the northern hemisphere, as well as south- 
ern hemisphere tribes such as the southern African 
Podalyrieae. 

Two additional large groups apparently share a 
common ancestor. The ‘Hologalegina’ comprises two
sister clades, one of which includes Robinieae, with 
genera such as Robinia (locust) and Sesbania (sesban, 
known for its stem nodulation), and Loteae (Lotus, 
including L. japonicus). The second galegoid lineage 
includes a group of mainly temperate, herbaceous 
tribes among which are: Vicieae, with Pisum (pea), 
Lens (lentil), Lathyrus (grass pea), and Vicia (vetch, 
broad bean); Trifolieae, with Trifolium (clover), Med- 
icago (alfalfa and M. truncatula), and Melilotus (sweet- 
clover); Cicereae (Cicer, chick pea); and Galegeae, 
among whose members is the huge genus Astragalus, 
with around 2000 species. Also part of this group are 
Wisteria and allied genera. 

The final lineage, a ‘phaseoloid’ group, includes 
the largest (in number of genera) papilionoid tribe, 
Phaseoleae. Among the members of this tribe are 
Glycine (soybean), Phaseolus (common bean and 
other ‘beans’), Vigna (cowpea, mung bean), Cajanus 
(pigeonpea), and Canavalia (sword or jack bean, the 
source of concanavalin). This lineage also includes 
many tropical, woody genera of the tribe Millettieae. 
Apparently sister to this entire clade is the small 
tribe Indigofereae, whose members include Cyamop- 
sis, the source of guar gum, and Indigofera, the 
source of indigo dye. Neither Phaseoleae nor
Millettieae is a natural group; tribes Desmodieae and
Psoraleeae are nested among Phaseoleae genera and 
subtribes. 
Fossil Record


Legumes have a well-developed macrofossil record in 
the Eocene (c. 50 million years ago) that includes 
flowers of each of the three subfamilies, suggesting 
that the major radiation of legumes occurred prior to 
that time. The ages of particular genera within each 
subfamily are more problematic, and thus it is difficult 
to say when, for example, pea diverged from soybean, 
or the various species of Phaseolus diverged from their 
common ancestor. 

The divergence of the family from other rosid taxa 
is also difficult to determine with any precision. This is 
at least in part due to the fact that ancient legumes, like 
many modern Caesalpinioideae, were most probably 
fairly stereotypical rosids in much of their morph- 
ology. The rich Cretaceous floral fossil record from 
around 92 million years ago indicates that by that 
time most major lineages of flowering plants had 
diverged from one another. For example, fossil repre- 
sentatives of the lineage that includes Arabidopsis have 
been described from these Cretaceous deposits, indi- 
cating that legumes and Arabidopsis diverged at least 
this long ago. A possible ceiling for this divergence is 
around 110 million years ago, given the paucity of 
angiosperm fossils prior to this period, though this is 
in conflict with some estimates of angiosperm diver- 
gences based on molecular clock assumptions. 


Evolution of Nodulation 


Molecular phylogenies have identified a subgroup 
within the large rosid radiation that includes families 
that participate in nitrogen-fixing symbioses. How- 
ever, within this ‘nitrogen-fixing clade’ the various 
nodulating families do not share a single common 
ancestor, suggesting that the ability to participate in 
these symbioses arose independently in different plant 
groups, but that an ancestor of the entire group may 
have evolved some key (but unknown) innovation 
that facilitates the formation of symbioses with 
diverse nitrogen-fixing microsymbionts. These are 
various ‘rhizobia,’ in legumes and Parasponia (Ulma- 
ceae), or actinorhizal bacteria in other families. Some 
molecular similarities between mycorrhizal and 
nitrogen-fixing symbioses have been noted, and it 
may be that machinery of the pre-existing and more 
widespread mycorrhizal relationship was co-opted 
and modified in the evolution of nodulation. It now 
appears that genes encoding ‘nodulins’ (proteins that 
function in the nodule) are not, strictly speaking, 
novel or uniquely nodular, but have been recruited 
from other functions. Even the quintessential nodulin, 
leghemoglobin, whose presence in legumes was once 
considered so unexpected that it was thought to be a
case of horizontal gene transfer from the animal king- 
dom, is now known to be part of a plant gene family 
whose membership includes paralogous copies that 
are not associated with symbiosis. 

The ability to nodulate is characteristic of most 
mimosoid and papilionoid legumes, but it is not uni- 
versal in the family. Most Caesalpinioideae do not 
nodulate, nor do some early-diverging lineages of 
Papilionoideae. The phylogenetic distribution of 
nodulation in Leguminosae suggests that the ability 
to nodulate is not primitive in the family, and that 
nodulation may have arisen several times (Figure 1). 
There are also cases of loss of the ability to nodulate, 
which complicates the picture. Nodulation involves 
the production of a special organ, the nodule, and also 
what has been called a novel organelle, the symbio- 
some, consisting of nitrogen-fixing bacteroids enclosed 
in a primarily host-derived peribacteroid membrane. 
Independent losses of these structures seem more 
likely than does their independent origin, but the fact 
that nitrogen-fixing symbioses have almost certainly 
arisen multiple times elsewhere in the flowering plants 
is a mitigating consideration. 

The details of nodulation vary even within Papilio- 
noideae, where nodulation almost certainly had a sin- 
gle origin. The ancestral type of nodule appears to be 
an indeterminate, unbranched type that is also found 
in mimosoids and caesalpinioids. This ‘caesalpinioid’
type has been modified in several ways among
papilionoids, including highly branched indeterminate
types such as those common in Trifolieae, the large
girdling indeterminate type of Lupinus, the clustered 
determinate types of aeschynomenoid taxa such as 
Arachis, and the single globular determinate ‘desmod- 
ioid’ nodules of Loteae and many phaseoloids. The 
desmodioid nodule appears to have originated inde- 
pendently in these two groups, and the determinate 
condition itself apparently arose yet another time in 
aeschynomenoids. Nitrogen is transported as amide 
compounds in many legumes, but as ureides in at least 
some phaseoloids. 


Genomes 


In some legumes the chloroplast genome departs from 
the common pattern of highly conserved gene content 
and order typical of photosynthetic angiosperms. 
Most notably, the large inverted repeat typical of land 
plant chloroplast chromosomes has been lost from all 
of the members of the major lineage of Hologalegina 
that contains Vicieae and allied temperate herbaceous 
tribes, plus Wisteria and allies. In some but not all 
species that have lost the inverted repeat there has 
been considerable subsequent rearrangement. Other 
major rearrangements include a 50 kb inversion found
in most papilionoid species, and a 78 kb inversion
found in Phaseolus and close allies (subtribe Phaseo- 
linae). A number of chloroplast gene and intron losses 
have been reported from the family. Several related 
legume genera (e.g., Medicago, Lens, Cicer) depart 
from the typically maternal pattern of chloroplast in- 
heritance found in angiosperms, and exhibit biparental 
or even predominantly paternal transmission. 

What is known about mitochondrial genomes of 
legumes suggests that they are typical of angiosperms 
in being large relative to their counterparts in most 
animals, and exhibit a master/subgenomic structure 
due to recombination among direct repeats. Recent 
transfer of cytochrome oxidase subunit 2 (cox2) from 
the mitochondrial to the nuclear genome has occurred 
in Phaseoleae, with complex patterns of expression 
and, in some cases, subsequent loss from the mito- 
chondrial genome. 

Nuclear genomes of legumes vary greatly in size. 
The smallest legume genomes are only around twice 
the size of that of Arabidopsis thaliana, and are found 
in species of Lablab, Scorpiurus, Trifolium, and Vigna; 
genomes of the model legumes Lotus japonicus and 
Medicago truncatula are only slightly larger than 
these. At the other end of the spectrum, the genomes 
of some diploid (based on chromosome number) Vicia 
and Lathyrus species are nearly 100 times as large as 
that of Arabidopsis. Variation can be extreme within 
genera: diploid species of Vicia vary in their genome
sizes from around 4 pg/2C (a haploid genome size of
around 2000 Mb) to over 50 pg/2C (about 24,500 Mb). As is
true for flowering plants in general, information on 
genome sizes is limited for legumes. A handful of 
papilionoid genera have been surveyed in detail, but 
otherwise there are few published values, with par- 
ticularly sparse sampling in Caesalpinioideae and 
Mimosoideae. The sparse data from these subfamilies 
suggest that relatively small genome sizes are ancestral 
in the family as a whole. Cercis and Bauhinia of the 
caesalpinioid tribe Cercideae, which molecular data 
suggest is one of the earliest diverging legume groups, 
have genome sizes of 1.3 and 1.2 pg/2C, respectively. 
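The pg/2C values above can be turned into approximate haploid (1C) genome sizes in megabases. The sketch below assumes the widely used conversion factor of 1 pg of DNA ≈ 978 Mb, which is not given in the original text. The function name and labels are illustrative only.

```python
# Approximate conversion from nuclear DNA content to sequence length.
# Assumption: 1 pg of DNA corresponds to about 978 Mb (a commonly used factor).
MB_PER_PG = 978.0


def haploid_mb_from_2c(pg_2c: float) -> float:
    """Convert a 2C DNA content in picograms to an approximate 1C size in Mb."""
    if pg_2c <= 0:
        raise ValueError("DNA content must be positive")
    pg_1c = pg_2c / 2.0  # the 1C (haploid) value is half the 2C value
    return pg_1c * MB_PER_PG


if __name__ == "__main__":
    # 2C values quoted in the text (pg/2C).
    examples = {
        "small diploid Vicia": 4.0,
        "large diploid Vicia": 50.0,
        "Cercis": 1.3,
        "Bauhinia": 1.2,
    }
    for label, pg in examples.items():
        print(f"{label}: {pg} pg/2C ~ {haploid_mb_from_2c(pg):,.0f} Mb (1C)")
```

On this basis, the 4 and 50 pg/2C values correspond to roughly 2,000 and 24,500 Mb per haploid genome. The early-diverging Cercis and Bauhinia come out at roughly 600 Mb.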

The legumes as a whole are considered to have a 
base chromosome number of x = 7, but many groups 
of the family are thought to have experienced early 
polyploidization followed in many cases by aneuploid 
reduction. This is true, for example, of the entire sub- 
family Mimosoideae, and its closest allies in Caesalpin- 
ioideae, which as a group is thought to be tetraploid at 
x = 14. Similarly, the entire Detarieae-Amherstieae
lineage of caesalpinioid legumes is considered to
be based on x = 12. Within individual tribes there are
relatively few genera that are wholly polyploid,
among them Glycine, which is 2n = 40, as compared
with most Phaseoleae at 2n = 22; however,
neopolyploidy is frequent. Medicago, for example,
includes both diploids (e.g., M. truncatula) and
tetraploids (among them M. sativa). The same is true of
Glycine, where neopolyploidy is superimposed on a 
fundamentally paleopolyploid base; other examples 
include Lotus, Trifolium, Astragalus, and Lupinus. 

Linkage maps have been constructed for a handful 
of legumes, primarily cultivated papilionoid genera. 
Comparisons among published maps reveal synteny 
conservation among related groups such as the phas- 
eoloid genera Phaseolus, Vigna, and Glycine, or the 
galegoids Pisum, Lens, and Cicer. Identifying con- 
served linkage blocks between such major groups or 
with more divergent taxa such as Lupinus has been 
more difficult, presumably in part because even in 
relatively close comparisons there are often many 
rearrangements. However, in light of increasing
evidence of synteny conservation among angiosperms
as a whole (e.g., between soybean and Arabidopsis) it 
seems likely that it will eventually be possible to trace 
linkage evolution across the entire family. 
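A conserved linkage block can be illustrated by comparing the marker order in two linkage maps. The sketch below reports runs of shared markers that stay adjacent in both maps, in either the same or the inverted order. It is deliberately simplified and assumes that each map is an ordered list of marker names. The function name and marker names are hypothetical. Real comparative mapping has to tolerate gaps, small local rearrangements and mapping error.

```python
from typing import List, Tuple

Block = Tuple[List[str], str]


def collinear_blocks(map_a: List[str], map_b: List[str], min_len: int = 2) -> List[Block]:
    """Return runs of markers adjacent in both maps, with their relative orientation.

    Only markers present in both maps are considered; adjacency is judged
    after dropping markers that were mapped in just one species.
    """
    shared = set(map_a) & set(map_b)
    order_a = [m for m in map_a if m in shared]
    pos_b = {m: i for i, m in enumerate(m for m in map_b if m in shared)}

    blocks: List[Block] = []
    run = order_a[:1]
    step_dir = 0  # +1: same orientation, -1: inverted, 0: not yet determined

    def close_run() -> None:
        # Record the current run if it is long enough to count as a block.
        if len(run) >= min_len:
            blocks.append((list(run), "inverted" if step_dir == -1 else "same"))

    for prev, cur in zip(order_a, order_a[1:]):
        step = pos_b[cur] - pos_b[prev]
        if abs(step) == 1 and step_dir in (0, step):
            run.append(cur)  # still adjacent in map_b, same direction as before
            step_dir = step
        else:
            close_run()      # adjacency broken: start a new candidate block
            run, step_dir = [cur], 0
    close_run()
    return blocks


if __name__ == "__main__":
    # Hypothetical maps: m1-m3 inverted relative to each other, m4-m6 collinear.
    map_a = ["m1", "m2", "m3", "m4", "m5", "m6"]
    map_b = ["m3", "m2", "m1", "x9", "m4", "m5", "m6"]
    print(collinear_blocks(map_a, map_b))
```

With these toy maps, the result is one inverted block (m1-m3) and one collinear block (m4-m6). The marker x9, which is mapped in only one species, is ignored.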
Leiomyoma


See: Lipoma and Uterine Leiomyoma 
Jérôme Lejeune (1926-94) is credited, together with
Gautier and Turpin, with being the first to identify, in
1959, trisomy of a small chromosome as the cause of
Down syndrome. The chromosome concerned was later
designated chromosome 21, and Down syndrome was
subsequently often referred to as trisomy 21
syndrome. This discovery was made possible by
improvements during the 1950s in methods for
studying mammalian chromosomes, using cultured
fibroblasts as a source of dividing cells. These led to
the first accurate count of the number of human
chromosomes, 46, by Tjio and Levan in 1956. The
scene was set for wide-scale application of cytogenetic
analysis in humans. In the same year as Lejeune’s
discovery of trisomy 21, Patricia Jacobs and colleagues
independently confirmed trisomy 21 in Down
syndrome and also demonstrated aneuploidy of the X
chromosome in Klinefelter and Turner syndromes.
These discoveries made an enormous impact in
medical circles. Almost overnight they changed the
perception of medical genetics from an obscure
activity practised by a few hobbyists to a field with
enormous potential for understanding the causes of
mental retardation and other congenital
abnormalities. In France, Lejeune
was at the forefront of applying wide-scale cyto- 
genetic analysis in the development of clinical genetics 
services. His group was one of the first to recognize
the genetic importance of partial deletions of
autosomes, describing in 1963 the ‘cri du chat’ (cat
cry) syndrome, caused by deletion of part of the short
arm of chromosome 5. Lejeune emphasized that
recombination and segregation in phenotypically
normal carriers of balanced chromosome
rearrangements could explain phenotypic defects in
their progeny, owing to cryptic duplications and
deletions. He coined the term ‘aneusomie de
recombinaison’ (recombination aneusomy) to describe
this. He coauthored many publications with his
long-term colleagues Rethoré and de Grouchy
describing a wide range of phenotypes associated with
various forms of partial trisomy or monosomy, of
which trisomy 9p is perhaps one of the most notable.

At the beginning of the 1970s, when chromosome
banding techniques were being developed,
Lejeune and Bernard Dutrilleaux developed a French
form of chromosome banding. It produced a pattern
that was the reverse of the one obtained with the
methods developed elsewhere. This banding was termed
reverse banding (R-banding) and appeared to be
directly complementary to the G-banding technique
used by many others. Besides using reverse banding
to describe human chromosome abnormalities,
Dutrilleaux and Lejeune went on to study the
chromosome banding patterns of primates and to
reconstruct their karyotypic evolution.

The early 1970s saw the introduction of prenatal 
diagnosis on a wide scale in France. Lejeune was a
devout Catholic, and the concept of terminating
genetically abnormal pregnancies was absolutely
abhorrent to him. As an alternative, he advocated 
developing therapies for ameliorating mental retard- 
ation, particularly in Down syndrome, based on the 
surmise that neural development is compromised by a 
metabolic imbalance induced by the activity of genes 
present on the extra chromosome. In particular, 
Lejeune believed that there was disturbed one-carbon
metabolism leading to an excess or deficiency of
some amino acids in the plasma. Among the metabolic
features that Lejeune claimed were characteristic of
Down syndrome was increased in vitro sensitivity to
methotrexate and atropine. He advocated nutritional
compensation for the amino acid
deficiencies and folic acid medication for the increased 
methotrexate sensitivity. Although claims were made 
of an astonishing improvement in the mental capabil- 
ity of individual patients, these were anecdotal and in 
general the treatment strategies were not widely 
accepted by his medical colleagues. Gradually Lejeune 
became more and more isolated from the mainstream 
of human genetics in France and internationally. He
devoted an increasing share of his time to running his
clinic and cytogenetics laboratory according to
pro-life principles. In the 1980s and early 1990s, Lejeune
frequently appeared as a pro-life expert witness in 
court cases in North America. 

Lejeune received many honors and awards in his 
lifetime. He was a member of the academies of 
sciences in the USA, Sweden, Italy, and Argentina, 
of the Royal Society of Medicine in London, the 
Academy of Medicine in France, and the Pontifical 
Academy of Sciences in Rome. In 1963 he received the
Kennedy Prize for his discovery of the cause of Down
syndrome, and in 1969 the William Allan Memorial
Award of the American Society of Human Genetics.
Shortly before his death, he was appointed by the 
Pope to head the newly formed Pontifical Academy 
for Life. 

His funeral, attended by 3000 people, was held in 
Notre Dame, Paris. The service was remarkable in that 
a Down syndrome patient spontaneously stood up 
and gave his personal thanks to Lejeune for giving 
him the courage and dignity not accorded him by 
French society. In 1996, family, friends and colleagues 
of Lejeune created the Jérôme Lejeune Foundation to 
carry on the work on mental retardation according to 
Lejeune’s principles. 

As in life, Lejeune remained a source of controversy
in death. Four years after Lejeune’s death, the Pope
visited France and prayed at Lejeune’s grave. The
Holy Father received an unprecedented public rebuke
from France’s ruling Socialist Party. The party claimed
that the Pope, merely by visiting the grave, was
interfering with the legal right of the French to
abortion, which had been established by statute in 1975.


See also: Down Syndrome; Ethics and Genetics 
The Lesch-Nyhan syndrome (LNS) is a severe X-linked
disorder of purine metabolism, caused by an
almost complete deficiency of the enzyme
hypoxanthine-guanine phosphoribosyltransferase (HPRT).
HPRT catalyzes the recycling reaction in which the 
free purine bases hypoxanthine and guanine are reutil- 
ized to form their respective nucleotides, inosinic and 
guanylic acids. This purine salvage mechanism pro- 
vides an alternative and more economical pathway to 
de novo purine nucleotide synthesis. Uric acid is the 
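end product of purine metabolism. In the absence of
the salvage pathway, hypoxanthine and guanine are not
recycled but degraded to uric acid, and de novo purine
synthesis is accelerated. As a result, excessive amounts
of uric acid are produced.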
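The salvage reaction can be summarized as a small lookup. HPRT transfers a phosphoribosyl group from 5-phosphoribosyl-1-pyrophosphate (PRPP) to hypoxanthine or guanine, giving IMP or GMP and releasing pyrophosphate. The toy model below only illustrates what happens to the two bases with and without HPRT. It is not a kinetic model, and its names are hypothetical. Adenine is rejected on purpose because a different enzyme, adenine phosphoribosyltransferase (APRT), salvages it.

```python
# Toy model of HPRT-mediated purine salvage (illustrative only).
# Reaction: base + PRPP -> nucleoside monophosphate + PPi
HPRT_PRODUCTS = {
    "hypoxanthine": "IMP",  # inosine monophosphate (inosinic acid)
    "guanine": "GMP",       # guanosine monophosphate (guanylic acid)
}


def purine_base_fate(base: str, hprt_active: bool = True) -> str:
    """Return what happens to a free HPRT substrate in this simplified model."""
    if base not in HPRT_PRODUCTS:
        raise ValueError(f"{base!r} is not an HPRT substrate")
    if hprt_active:
        return HPRT_PRODUCTS[base]  # recycled into a nucleotide
    # Without HPRT the base is not recycled; it is degraded via xanthine to uric acid.
    return "uric acid"


if __name__ == "__main__":
    for base in HPRT_PRODUCTS:
        print(base, "->", purine_base_fate(base),
              "| HPRT-deficient ->", purine_base_fate(base, hprt_active=False))
```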

The classical Lesch-Nyhan disease is characterized 
by hyperuricemia, mental retardation, self-injurious 
behavior, choreoathetosis, and spasticity. However, 
there is wide phenotypic heterogeneity in the expres- 
sion of HPRT deficiency. Three overlapping cat- 
egories can be identified, in which the severity of 
clinical manifestations depends on the degree of re- 
sidual enzyme activity: 


1. Classical Lesch-Nyhan syndrome (less than 1.5% 
of residual enzyme activity). Male infants with 
Lesch-Nyhan disease appear normal at birth and 
usually develop normally for the first 6-8 months 
of their lives. Within the first few years of life, 
patients develop dystonia, choreoathetosis, spasti- 
city, hyperreflexia, and extensor plantar reflexes. In 
established patients the overall motor defects are of 
such severity that they can neither stand nor sit 
unassisted. No patient with this disease has learned 
to walk. Most patients are cognitively impaired, but 
mental retardation is difficult to assess because of 
the behavioural disturbance and motor deficits. 
Many patients learn to speak, but athetoid
dysarthria makes their speech difficult to understand.
Self-injurious behavior is the hallmark of the 
disease and occurs in 100% of patients. The most 
characteristic feature is self-destructive biting of 
hands, fingers, lips, and cheeks. Hyperuricemia is 
present in almost all patients. The clinical
consequences of the accumulation of large amounts of
uric acid in body fluids are the classical
manifestations of gout.

2. Neurological variant (1.5-8% of residual enzyme 
activity). The ‘neurological’ picture has been obser- 
ved in a small but important group of patients and is 
characterized by a neurological examination that is 
identical to that of the classic Lesch-Nyhan patient 
(i.e., cerebral palsy or athetoid cerebral palsy).
Patients are confined to wheelchairs and unable to 
walk. However, behavior is normal and intelligence 
is normal or nearly normal. 

3. Hyperuricemic variant (more than 8% of residual 
enzyme activity). The phenotype of the patients 
with this partial variant enzyme consists of mani- 
festations that can be directly related to the accu- 
mulation of uric acid in body fluids (acute attacks 
of gouty arthritis and tophi). The central nervous
system and behavior are normal.
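The three categories can be expressed as a simple lookup on residual enzyme activity. The sketch assumes that activity is given as a percentage of normal. It follows the ranges above, so exactly 1.5% and exactly 8% both fall in the neurological variant. Because the clinical categories overlap, treat the result as a rough guide only. The function name is hypothetical.

```python
def hprt_deficiency_category(residual_activity_pct: float) -> str:
    """Classify HPRT deficiency by residual enzyme activity (% of normal).

    Cut-offs follow the text: < 1.5% classical LNS, 1.5-8% neurological
    variant, > 8% hyperuricemic variant. The clinical categories overlap,
    so this is only a rough guide.
    """
    if not 0 <= residual_activity_pct <= 100:
        raise ValueError("residual activity must be between 0 and 100 percent")
    if residual_activity_pct < 1.5:
        return "classical Lesch-Nyhan syndrome"
    if residual_activity_pct <= 8:
        return "neurological variant"
    return "hyperuricemic variant"


if __name__ == "__main__":
    for pct in (0.5, 1.5, 5.0, 8.0, 20.0):
        print(f"{pct:>5}% -> {hprt_deficiency_category(pct)}")
```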


The HPRT gene is located on the long arm of chromo- 
some X (Xq26-q27). The gene has been cloned and its 
sequence determined: the entire locus spans more than 
44 kb, the coding region consisting of 654 nucleotides 
in nine exons. The protein contains 218 amino acids. 
HPRT is expressed in all tissues, although at different 
levels, and the enzyme is particularly active in basal 
ganglia and testis. The incidence of LNS has been 
estimated to range from 1 in 100,000 to 1 in 380,000.
Characterization of the molecular defect in the HPRT 
gene of a number of HPRT-deficient patients has 
revealed a heterogeneous pattern of mutations, with 
the same alteration rarely being found in unrelated 
pedigrees. About 63% of all the described molecular 
alterations represent point mutations, giving rise to 
either amino acid substitution in the protein sequence 
or stop codons, leading to truncated protein mol- 
ecules. In some instances, the point mutation alters a 
splice site consensus sequence, activating an alterna- 
tive, cryptic splice site, creating aberrant mRNA and 
protein products. It has not been possible to clearly 
correlate different types of mutations (genotype) with 
the various aspects of the clinical manifestations 
(phenotype). However, a rough guide predicts that 
mutations producing complete disruption of HPRT 
enzyme function (stop codons, deletions) are asso- 
ciated with classical LNS, while mutations allowing 
some residual HPRT enzyme activity (conservative 
amino acid substitutions) are associated with a less 
severe phenotype. 
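The rough genotype-to-phenotype guide can be written as a small rule table. The mutation-class labels are hypothetical strings chosen for illustration. As the text stresses, the correlation is not reliable for individual patients. Classes the guide does not address, such as splice-site changes or non-conservative substitutions, therefore return no prediction.

```python
# Rough guide from the text, not a diagnostic rule.
COMPLETE_LOSS = {"stop codon", "deletion"}         # expected to abolish HPRT function
RESIDUAL_ACTIVITY = {"conservative substitution"}  # may leave some enzyme activity


def expected_phenotype(mutation_class: str) -> str:
    """Map a (hypothetical) mutation-class label to the phenotype the rough guide suggests."""
    label = mutation_class.strip().lower()
    if label in COMPLETE_LOSS:
        return "classical LNS"
    if label in RESIDUAL_ACTIVITY:
        return "less severe variant"
    return "no prediction"  # classes the rough guide does not cover


if __name__ == "__main__":
    for m in ("stop codon", "conservative substitution", "splice site"):
        print(m, "->", expected_phenotype(m))
```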

The excessive uric acid production in HPRT- 
deficient patients is effectively treated with daily 
administration of allopurinol. This is the only
specific treatment available for all patients
diagnosed with HPRT deficiency, both classical
Lesch-Nyhan and partial variants. Unfortunately, no
medication has been found to be consistently effective 
in treating the neurological or behavioral manifest- 
ations of the disease in classical Lesch-Nyhan 
patients. The only successful approaches to the self- 
injurious behavior have been physical restraint and 
the removal of teeth, to prevent self-biting. Future 
approaches may include gene therapy: promising 
results have already been obtained in vitro. 


See also: Gene Therapy, Human; 
Genetic Counseling; Purine 
A lethal locus is any gene in which a lethal mutation
can be obtained. 


See also: Conditional Lethality; Lethal Mutation 
Mutations result in permanent alterations or changes
in DNA sequence. Such changes include point muta- 
tions, in which only single base pairs are affected, or 
chromosomal rearrangements, translocations, or dele- 
tions, in which larger regions of DNA are affected. 
When these alterations cripple a gene that is essential 
for an organism’s survival and result in death, they are 
referred to as lethal mutations. 
Leucine is one of the 20 amino acids commonly
found in all proteins. Its abbreviation is Leu and its
single-letter designation is L. As one of the essential
amino acids in humans, it is not synthesized by the
body and so must be provided in the individual’s diet
(Figure 1).


Figure 1 Leucine.
Leukemia is a cancer of the blood that occurs in
several forms. The disease can be chronic or acute;
patients with the former live for a number of years,
whereas patients with the latter live for only a few
weeks or months unless they receive appropriate
treatment. In addition, leukemias are further
subdivided by the type of cell that is involved. Common
forms are chronic lymphatic leukemia and acute
lymphoblastic leukemia (ALL), which affect
lymphocytes of either the B- or T-cell lineage, and
chronic myelogenous leukemia (CML) and acute
myelogenous leukemia (AML), which affect bone marrow
cells of the red cell, granulocytic, monocytic or
megakaryocytic (platelet) lineages.

In this section, I will focus on acute leukemia, both 
ALL and AML. For reasons that are not understood at 
present, ALL occurs much more commonly in chil- 
dren and young adults whereas AML is more frequent 
in older adults. The genetic changes in ALL and AML 
are different and thus it is not surprising that the 
treatments differ as well. In general, children with 
ALL respond much better to present treatments, and
over 70% survive for more than 5 years.
In contrast, adults with AML may respond
initially to treatment but then relapse and die. 
In AML and, to a lesser extent, ALL, the length of 
survival is very closely associated with the types of 
genetic changes that are present in the leukemic cell. 
Present evidence indicates that these genetic changes 
occur de novo in an otherwise normal blood cell. 


Table 1 Cytogenetic-immunophenotypic correlations in malignant B-lymphoid diseases

Phenotype
  Chromosome abnormality  Involved genes

Acute lymphoblastic leukemia
Pro-Pre-B
  t(1;19)(q23;p13)        PBX1-TCF3 (E2A)
  t(12;21)(p13;q22)       TEL-AML1
B(SIg+)
  t(8;14)(q24;q32)        MYC-IGH
  t(2;8)(p12;q24)         IGK-MYC
  t(8;22)(q24;q11)        MYC-IGL
B or B-myeloid
  t(9;22)(q34;q11)        ABL-BCR
  t(4;11)(q21;q23)        AF4-MLL
  t(11;19)(q23;p13.3)     MLL-ENL
  50-60 chromosomes
Other
  t(5;14)(q31;q32)        IL3-IGH
  del(9p), t(9p)          ?CDKN2 (p16)
  t(9;12)(q34;p13)        TEL-ABL
  del(12p)                TEL; ?p27KIP1


Reproduced with permission from Rowley JD (1999) The role of chromosome translocations in leukemogenesis. Seminars in 
Table 2 Cytogenetic-immunophenotypic correlations in malignant T-lymphoid diseases

Phenotype
  Chromosome abnormality  Involved genes

Acute lymphoblastic leukemia
  t(1;14)(p34;q11)        LCK-TCRD
  t(1;14)(p32;q11)        TAL1-TCRD
= TALIP® 
  t(7;9)(q35;q32)         TCRB-TAL2
  t(7;9)(q35;q34)         TCRB-TAN1
  t(7;7)(p15;q11)         TCRG-?
  t(7;14)(q35;q11)        TCRB-TCRD
  t(7;14)(p15;q11)
  t(8;14)(q24;q11)        MYC-TCRA
  inv(14)(q11q32)         TCRA-IGH
  t(14;14)(q11;q32)       TCRA-IGH
  t(10;14)(q24;q11)       HOX11-TCRA
  t(11;14)(p15;q11)       LMO1-TCRD
  t(11;14)(p13;q11)       LMO2-TCRD


Reprinted with permission from Rowley JD (1999). 


That is, there is little evidence for predisposing genetic 
factors as may be found in breast cancer or colon 
cancer. All present evidence indicates that the 
transformation of a normal cell to a leukemic cell 
involves changes in a series of genes only some of 
which are presently known. Thus the challenge for 
the future is to identify all of the genetic changes that 
occur, the order in which they occur and the func- 
tional consequences of these changes. 


Genetic Changes in Acute Leukemia 


Chromosome Translocations 

Most of our information about the genetic changes in 
all forms of human leukemia has come from an analy- 
sis of the chromosome pattern of the leukemic cells. 
The leukemic cells are obtained usually from a bone 
marrow sample or peripheral blood and the dividing 
cells which contain condensed chromosomes are 
Table 3 Recurring structural rearrangements in malignant myeloid diseases

Disease
  Chromosome abnormality                Involved genes

Chronic myeloid leukemia
  t(9;22)(q34;q11)                      ABL-BCR
CML blast phase
  t(9;22), +8, +Ph, i(17q)              ABL-BCR
Chronic myelomonocytic leukemia
  t(5;12)(q33;p13)                      PDGFRB-TEL
Acute myeloid leukemia
AML-M2
  t(8;21)(q22;q22)                      ETO-AML1
APL-M3, M3V
  t(15;17)(q22;q12)                     PML-RARA
atypical APL
  t(11;17)(q23;q12)                     PLZF-RARA
AMMoL-M4Eo
  inv(16)(p13q22) or t(16;16)(p13;q22)  MYH11-CBFB
AMMoL-M4/AMoL-M5
  t(6;11)(q27;q23)                      AF6-MLL
  t(9;11)(p22;q23)                      AF9-MLL
AMegL-M7
  t(1;22)(p13;q13)
AML
  t(3;3)(q21;q26) or inv(3)(q21q26)     RPN1-EVI1
  t(3;5)(q21;q31)
  t(3;5)(q25;q34)                       MLF1-NPM
  t(6;9)(p23;q34)                       DEK-CAN
  t(7;11)(p15;p15)                      HOXA9-NUP98
  t(8;16)(p11;p13)                      MOZ-CBP
  t(9;12)(q34;p13)                      TEL-ABL
  t(12;22)(p13;q13)                     TEL-MN1
  t(16;21)(p11;q22)                     TLS(FUS)-ERG
  -7 or del(7q)
  -5 or del(5q)
  del(20q)
  del(12p)                              TEL, ?p27KIP1
Therapy-related AML
  -7 or del(7q) and/or -5 or del(5q)    IRF1?
  t(11q23)                              MLL
  t(3;21)(q26;q22)                      EAP/MDS1/EVI1-AML1


Reprinted with permission from Rowley JD (1999). 


processed according to standard techniques. Normal 
cells have 46 chromosomes, but leukemic cells can 
contain many abnormalities. Fortunately, a number of 
chromosome changes are recurring and many of these 
recurring changes are associated with certain subtypes 
of leukemia. Moreover, as will be discussed later, some 
chromosome changes provide physicians with very 
important information on the likely response of 
the leukemia cells to the treatment. In fact, leukemias
with certain chromosome changes respond only to
certain types of treatment, and thus analysis of the chromosome
pattern (karyotype) of the leukemic cells helps the
physician select the most effective treatment. The 
chromosome changes in leukemic cells involve 
both gains and losses of whole chromosomes or 
parts of chromosomes. In addition, chromosome 
translocations are important; in translocations, two 
chromosomes are broken and the broken ends are
exchanged. Translocations are a very important
mechanism of genetic change in leukemias, lymph- 
omas, and a few solid tumors. Chromosome transloca- 
tions have one of two consequences. In many of the 
malignant lymphoid tumors, the breaks occur in or 
near the immunoglobulin genes in B cells or the T-cell
receptor genes in T cells. The translocation joins these
very highly active genes to a target gene that is then 
more actively expressed than in a normal cell. The 
protein produced by the target gene is a normal pro- 
tein. In most of the myeloid leukemias, both acute and 
chronic, the two genes involved in the translocations 
are broken and two new genes may be formed as a 
result of a reciprocal exchange. In some situations, 
part of one gene is deleted so that there is only one 
new fusion gene; it is clearly the fusion that is critical 
for malignant transformation. These fusion genes and 
the resultant fusion protein are unique tumor-specific
markers, and they provide special targets for
therapeutic intervention.
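The reciprocal exchange, and the way it can create a fusion gene, can be shown with a toy model. Each chromosome is an ordered list of segments from centromere to telomere, and each gene is split at its breakpoint into a 5' and a 3' part. The function name and segment labels are hypothetical. The example uses the t(9;22) of CML, in which the derivative chromosome 22 (the Philadelphia chromosome) carries 5' BCR joined to 3' ABL. Real breakpoint positions, gene orientation and band structure are not modeled.

```python
from typing import List, Tuple

# Toy chromosome: ordered segment labels from centromere to telomere.
Chromosome = List[str]


def reciprocal_translocation(a: Chromosome, b: Chromosome,
                             break_a: int, break_b: int) -> Tuple[Chromosome, Chromosome]:
    """Break each chromosome before the given index and swap the distal (telomeric) pieces."""
    if not (0 < break_a < len(a) and 0 < break_b < len(b)):
        raise ValueError("breakpoints must fall inside each chromosome")
    der_a = a[:break_a] + b[break_b:]  # proximal part of a + distal part of b
    der_b = b[:break_b] + a[break_a:]  # proximal part of b + distal part of a
    return der_a, der_b


if __name__ == "__main__":
    chr9 = ["cen9", "ABL 5'", "ABL 3'", "9qter"]
    chr22 = ["cen22", "BCR 5'", "BCR 3'", "22qter"]
    der9, der22 = reciprocal_translocation(chr9, chr22, 2, 2)
    print("der(9): ", der9)    # ABL 5' now joined to BCR 3'
    print("der(22):", der22)   # BCR 5' joined to ABL 3': the BCR-ABL fusion
```

Swapping only the distal pieces is what makes the exchange reciprocal: each derivative keeps its own centromere, and the two new junctions are where fusion genes can form.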


Chromosome Abnormalities in Acute 
Lymphoblastic Leukemia 

All types of chromosome abnormalities are seen in 
ALL, often in combination. For the most part, the 
genes that are involved in gains or losses of chromo- 
somes are unknown. Translocations or other re- 
arrangements such as inversions that involve the 
immunoglobulin loci 14q32 (heavy chain), 2p12
(κ light chain), or 22q11 (λ light chain) or the T cell
receptor loci 14q11 (α/δ chain), 7q35 (β chain), or 7p13
(γ chain) are of the first type described above (see
“Chromosome Translocations”). They alter the
expression of the target genes, but the target gene protein
is normal. In fact, the first translocation identified in a B
cell malignant disease was the 8;14 translocation in
Burkitt lymphoma, which was subsequently shown to
involve the immunoglobulin gene at 14q32 and the
MYC gene at 8q24. This translocation, which is also
seen in B cell ALL, leads to the inappropriate expression
of the MYC gene, an important component of
the pathway regulating cell growth. The immunoglobulin
light chain genes are also involved in translocations
with MYC. The other important chromosome
changes are listed in Tables 1 and 2.
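The distinction drawn above suggests a first-pass check: does either breakpoint fall at one of the IG/TCR bands listed in the text? If so, the rearrangement may place a target gene next to an immunoglobulin or T cell receptor locus rather than create a fusion gene. The sketch assumes simple two-way translocations written as t(a;b)(band;band), and the function name is hypothetical. Matching at band level is crude. BCR also maps to 22q11, for example, so t(9;22)(q34;q11) would be wrongly flagged as involving IGL.

```python
import re
from typing import Optional

# Immunoglobulin and T cell receptor loci, using the band assignments given in the text.
IG_TCR_LOCI = {
    "14q32": "IGH", "2p12": "IGK", "22q11": "IGL",
    "14q11": "TCRA/TCRD", "7q35": "TCRB", "7p13": "TCRG",
}

# Matches simple two-way notation such as t(8;14)(q24;q32) or t(11;19)(q23;p13.3).
PATTERN = re.compile(r"t\((\w+);(\w+)\)\(([pq]\d+(?:\.\d+)?);([pq]\d+(?:\.\d+)?)\)")


def ig_tcr_partner(notation: str) -> Optional[str]:
    """Return the IG/TCR locus at either breakpoint, or None if neither matches."""
    m = PATTERN.fullmatch(notation.replace(" ", ""))
    if m is None:
        raise ValueError(f"unsupported notation: {notation!r}")
    chrom_a, chrom_b, band_a, band_b = m.groups()
    for chrom, band in ((chrom_a, band_a), (chrom_b, band_b)):
        locus = IG_TCR_LOCI.get(chrom + band.split(".")[0])  # ignore sub-bands
        if locus is not None:
            return locus
    return None


if __name__ == "__main__":
    for t in ("t(8;14)(q24;q32)", "t(8;14)(q24;q11)", "t(15;17)(q22;q12)"):
        print(t, "->", ig_tcr_partner(t))
```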


Chromosome Abnormalities in Acute 
Myelogenous Leukemia 

As with ALL, all forms of chromosome change are 
seen as recurring abnormalities in AML. The targets of 
these abnormalities are virtually unknown despite 
heroic efforts on the part of many investigators to 
identify the target genes. Identification of chromo- 
some translocations in the 1970s showed that certain 
translocations were closely associated with particular 
subtypes of leukemia; in fact, the association is so 
important that the genetic changes are now used in 
morphologic classification of these leukemias. The 
first consistent chromosome translocation in any 
malignant cell was identified in 1972; it was the 8;21 
translocation seen in AML. Since then several hun- 
dred different translocations have been identified and 
almost 100 of these have been cloned. The majority 
of translocations result in new fusion genes. The 
common recurring aberrations are listed in Table 3. 
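Because particular translocations track particular subtypes, a karyotype report can be screened against a table of recurring abnormalities. The sketch below uses a handful of associations taken from Table 3. The dictionary and function names are hypothetical, and abnormality strings must match exactly apart from spaces. A real classification would use far more than a string lookup.

```python
from typing import List, Tuple

# A few recurring AML rearrangements with their associated subtype and genes (from Table 3).
RECURRING_AML = {
    "t(8;21)(q22;q22)": ("AML-M2", "ETO-AML1"),
    "t(15;17)(q22;q12)": ("APL-M3, M3V", "PML-RARA"),
    "t(11;17)(q23;q12)": ("atypical APL", "PLZF-RARA"),
    "inv(16)(p13q22)": ("AMMoL-M4Eo", "MYH11-CBFB"),
    "t(16;16)(p13;q22)": ("AMMoL-M4Eo", "MYH11-CBFB"),
    "t(9;11)(p22;q23)": ("AMMoL-M4/AMoL-M5", "AF9-MLL"),
}


def recurring_findings(abnormalities: List[str]) -> List[Tuple[str, str, str]]:
    """Return (abnormality, associated subtype, involved genes) for each recognized entry."""
    hits = []
    for abn in abnormalities:
        key = abn.replace(" ", "")  # tolerate stray spaces in the notation
        if key in RECURRING_AML:
            subtype, genes = RECURRING_AML[key]
            hits.append((key, subtype, genes))
    return hits


if __name__ == "__main__":
    karyotype = ["+8", "t(8;21)(q22;q22)", "del(9q)"]  # hypothetical report
    print(recurring_findings(karyotype))
```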


Clinical and Biological Importance of 
Chromosome Abnormalities in 
Leukemia 


Studies of chromosome translocations will assume 
even greater importance in the future because the 
unique fusion genes and proteins that are identified 


Leukemia, Acute 1091 


in many of these rearrangements are tumor-specific 
markers for the malignant cells. With further under- 
standing of the alterations in function of these genes 
and proteins, it should be possible to target cells with 
these fusion genes/proteins specifically and to spare 
the other normal cells in the patient. The major goal 
for the new millennium is to translate our increasingly 
sophisticated understanding of how the translocations 
interfere with the critical function of these genes to 
predict specific therapy that would likely be more 
effective and less toxic than current therapy. This 
requires that we identify the multiple genes that are 
involved in leukemogenesis. 


See also: Leukemia, Acute; Leukemia, Chronic; 
Translocation 
The acute leukemias represent the malignant
transformation of myeloid and lymphoid precursors within
the bone marrow or thymus. All hematopoietic
precursor cells can be transformed. The commonest
leukemia in children from developed countries is B-cell
precursor acute lymphoblastic leukemia (BCP-ALL),
whose cells express the neutral endopeptidase CD10; these
are often referred to as ‘common’ ALL (cALL).
BCP-ALL is the commonest malignancy of child- 
hood. However, as with other malignancies, the 
incidence of other forms of acute leukemia and par- 
ticularly acute myeloid leukemia (AML) increases 
with age. The etiologies of the acute leukemias remain 
unknown. Although many hypotheses have been 
advanced, particularly for childhood BCP-ALL, 
none has been proven, due in large part to the rarity
of the disease. Childhood leukemias may arise in utero 
(see below). Familial acute leukemia is rare, although 
when it occurs, it frequently exhibits genetic antici- 
pation. 

Diagnosis is based on a combination of morph- 
ology (particularly for the myeloid/monocytic 
leukemias), immunophenotype, and molecular cyto- 
genetics. Most acute leukemias exhibit the immuno- 
phenotype of a single hematopoietic lineage, although 
some may coexpress molecules associated with two 
different lineages; these are known as biphenotypic 
acute leukemias. Cytogenetics, increasingly
supplemented by fluorescence in situ hybridization (FISH)
and molecular techniques such as reverse transcriptase
polymerase chain reaction (RT-PCR), plays a


markers and they provide special targets for therapeut- 
ic intervention. 


Chromosome Abnormalities in Acute 
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All types of chromosome abnormalities are seen in 
ALL, often in combination. For the most part, the 
genes that are involved in gains or losses of chromo- 
somes are unknown. Translocations or other re- 
arrangements such as inversions that involve the 
immunoglobulin loci 14q32 (heavy chain), 2p12 
(« light chain), or 22q11 (A light chain) or the T cell 
receptor loci, 14q11 («/6 chain), 7q35 (B chain) or 7p13 
(y chain) are of the first type described above (sec- 
tion “Chromosome Translocations”). They alter the 
expression of the target genes but the target gene protein 
is normal. In fact, the first translocation identified ina B 
cell malignant disease was the 8;14 translocation in 
Burkitt lymphoma that subsequently was shown to 
involve the immunoglobulin gene at 14q32 and the 
MYC gene at 8q24. This translocation which is also 
seenin B cell ALL leads to the inappropriate expression 
of the MYC gene which is an important component of 
the pathway regulating cell growth. The immuno- 
globulin light chain genes are also involved in transloca- 
tions with MYC. The other important chromosome 
changes are listed in Tables | and 2. 


Chromosome Abnormalities in Acute Myelogenous Leukemia

As with ALL, all forms of chromosome change are seen as recurring abnormalities in AML. The targets of these abnormalities are virtually unknown despite heroic efforts on the part of many investigators to identify the target genes. Identification of chromosome translocations in the 1970s showed that certain translocations were closely associated with particular subtypes of leukemia; in fact, the association is so important that the genetic changes are now used in the morphologic classification of these leukemias. The first consistent chromosome translocation in any malignant cell was identified in 1972; it was the 8;21 translocation seen in AML. Since then, several hundred different translocations have been identified and almost 100 of these have been cloned. The majority of translocations result in new fusion genes. The common recurring aberrations are listed in Table 3.


Clinical and Biological Importance of Chromosome Abnormalities in Leukemia


Studies of chromosome translocations will assume even greater importance in the future because the unique fusion genes and proteins that are identified in many of these rearrangements are tumor-specific markers for the malignant cells. With further understanding of the alterations in function of these genes and proteins, it should be possible to target cells carrying these fusion genes/proteins specifically and to spare the other, normal cells in the patient. The major goal for the new millennium is to translate our increasingly sophisticated understanding of how the translocations interfere with the critical function of these genes into specific therapies that would likely be more effective and less toxic than current therapy. This requires that we identify the multiple genes that are involved in leukemogenesis.


See also: Leukemia, Acute; Leukemia, Chronic; Translocation
The acute leukemias represent the malignant transformation of myeloid and lymphoid precursors within the bone marrow or thymus. All hematopoietic precursor cells can be transformed. The commonest leukemia in children from developed countries is B-cell precursor acute lymphoblastic leukemia (BCP-ALL), which typically expresses the neutral endopeptidase CD10 and is often referred to as 'common' ALL (cALL). BCP-ALL is the commonest malignancy of childhood. However, as with other malignancies, the incidence of other forms of acute leukemia, and particularly acute myeloid leukemia (AML), increases with age. The etiologies of the acute leukemias remain unknown. Although many hypotheses have been advanced, particularly for childhood BCP-ALL, none has been proven, in large part because of the rarity of the disease. Childhood leukemias may arise in utero (see below). Familial acute leukemia is rare, although when it occurs it frequently exhibits genetic anticipation.

Diagnosis is based on a combination of morphology (particularly for the myeloid/monocytic leukemias), immunophenotype, and molecular cytogenetics. Most acute leukemias exhibit the immunophenotype of a single hematopoietic lineage, although some may coexpress molecules associated with two different lineages; these are known as biphenotypic acute leukemias. Cytogenetics, increasingly supplemented by fluorescence in situ hybridization (FISH) and molecular techniques such as reverse transcriptase polymerase chain reaction (RT-PCR), plays a prominent role. Detection of certain chromosomal translocations is of major prognostic significance and determines the intensity, duration, and type of therapy; patients with a poor prognosis on conventional therapy may undergo allogeneic stem cell transplantation whilst in first remission. In other patients, it may be possible to reduce chemotherapy without increasing the relapse rate.

Table 1 Some common chromosomal translocations and genes in acute leukemia

Diagnosis              Cytogenetic abnormality          Involved genes
B-cell precursor ALL   t(12;21)(p13;q22)                ETV6/AML1
                       t(9;22)(q34;q11)                 ABL/BCR
                       t(1;19)(q23;p13)                 PBX/E2A
T-cell precursor ALL   t(11;14)(p13-p15;q11)            LMO1/2-TCRD/A
                       t(1;14)(p32;q11)                 TAL1-TCRD/A
AML                    Various 11q23 translocations     MLL fusions
                       t(8;21)(q22;q22)                 CBFα/ETO
                       inv(16)(p13q22)                  MYH11/CBFβ
APL*                   t(15;17)(q21;q22)                PML/RARα

*APL = acute promyelocytic leukemia.

Since the tumor can be readily accessed, and since it is possible to culture human hematopoietic cells in vitro and derive cell lines, cytogenetic analysis of the acute leukemias is well advanced. Moreover, many clones are cytogenetically simple, containing only one chromosomal translocation and lacking the cytogenetic complexity seen in solid tumors and lymphomas. Some of the common translocations are shown in Table 1. Much effort has been made to clone the recurrent chromosomal translocations and identify the involved genes, as these are intimately involved in the pathogenesis of the disease. This has been confirmed experimentally by the creation of 'knock-in' mice in which the translocation is created in embryonic stem cells (Corral et al., 1996). Most genes are involved with a single partner in a single disease. The MLL gene on 11q23 is remarkable for being involved with over 20 other genes in translocations, principally in AML.
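Designations such as t(12;21)(p13;q22) or inv(16)(p13q22) in this article follow the usual cytogenetic convention: the chromosomes involved are given in the first parentheses and the corresponding breakpoint bands, in the same order, in the second. As a reading aid, the sketch below splits such a designation into its breakpoints. It is a minimal illustration, assuming only simple two-breakpoint translocations and inversions; the names `Breakpoint`, `locus`, and `parse_rearrangement` are hypothetical, and complex karyotypes would need a full nomenclature parser.

```python
import re
from typing import NamedTuple


class Breakpoint(NamedTuple):
    chromosome: str  # e.g. "12" or "X"
    band: str        # e.g. "p13", "q32.3", or a range such as "p13-p15"

    def locus(self) -> str:
        # Chromosome and band written together, e.g. "12p13"
        return f"{self.chromosome}{self.band}"


# t(A;B)(bandA;bandB): translocation between chromosomes A and B
_TRANSLOCATION = re.compile(r"^t\((\w+);(\w+)\)\(([pq][\d.pq-]+);([pq][\d.pq-]+)\)$")
# inv(A)(band1band2): inversion within chromosome A; a ";" between bands is tolerated
_INVERSION = re.compile(r"^inv\((\w+)\)\(([pq][\d.]+);?([pq][\d.]+)\)$")


def parse_rearrangement(notation: str) -> tuple[str, list[Breakpoint]]:
    """Split a simple two-breakpoint designation into its type and breakpoints."""
    m = _TRANSLOCATION.match(notation)
    if m:
        chr_a, chr_b, band_a, band_b = m.groups()
        return "translocation", [Breakpoint(chr_a, band_a), Breakpoint(chr_b, band_b)]
    m = _INVERSION.match(notation)
    if m:
        chrom, band_1, band_2 = m.groups()
        return "inversion", [Breakpoint(chrom, band_1), Breakpoint(chrom, band_2)]
    raise ValueError(f"unsupported or complex notation: {notation}")


for notation in ["t(12;21)(p13;q22)", "t(11;14)(p13-p15;q11)", "inv(16)(p13q22)"]:
    kind, points = parse_rearrangement(notation)
    print(notation, kind, [p.locus() for p in points])
```

For the first designation this prints `t(12;21)(p13;q22) translocation ['12p13', '21q22']`, that is, a break at band p13 of chromosome 12 joined to a break at band q22 of chromosome 21. Unsupported forms (deletions, trisomies, three-way translocations) raise `ValueError` rather than being guessed at.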

In most instances, the consequence of translocation in the acute leukemias is the generation of fusion genes derived from the coding regions of genes on the two chromosomes. These fusion transcripts are useful clone-specific markers, allowing the detection of disease with unprecedented sensitivity and redefining the criteria used for 'remission.' Translocations in T-cell precursor ALL, in contrast, involve the T-cell receptor (TCR) gene segments and result in deregulated expression of the incoming oncogene through juxtaposition of transcriptional enhancers within the TCR loci. In both instances, the involved genes are transcription factors controlling development and differentiation. Further dissection of the transcriptional pathways involved may allow the rational introduction of new therapeutic strategies.

A fascinating observation is that many of the childhood leukemias may originate in utero. This conclusion was initially drawn from data on identical twins with concordant leukemia, the leukemic stem cell passing from one twin to the other via the shared placental blood supply. Such twins showed identical translocation breakpoints and antigen receptor gene rearrangements. These data have now been confirmed in other patients through the use of Guthrie blood spots, collected at birth for phenylketonuria screening. These spots contain sufficient DNA to allow retrospective analysis of the leukemic clone at birth, using long-range PCR methods to detect the t(12;21)(p13;q22) breakpoint. The clone may become clinically apparent only several years later, implying that other genetic or environmental events are necessary for the eventual appearance of leukemia. However, at least some chromosomal translocations may occur in normal stem cells that retain a normal capacity to differentiate, without giving rise to overt leukemia.


Further Reading 

Rowley JD (1998) The critical role of chromosome translocations in human leukemias. Annual Review of Genetics 32: 495-519.

Wiemels JL, Cazzaniga G, Daniotti M et al. (1999) Prenatal origin of acute lymphoblastic leukaemia in children. Lancet 354: 1499-1503.
The chronic leukemias comprise a heterogeneous group of malignancies, representing the transformation of mature lymphocytes of B, T, and rarely NK lineages at specific points in their normal differentiation pathways. (Chronic leukemias of the myeloid lineage, including chronic myeloid leukemia (CML), are discussed in the article BCR/ABL Oncogene.) Various subtypes may be recognized on the basis of cytology, immunophenotype, and molecular cytogenetic findings, as summarized in Table 1 (Catovsky, 1999). The causes of these diseases remain unknown, although progress has been made in the identification of key genes through the molecular cloning of chromosomal translocation breakpoints, principally involving the immunoglobulin (IG) or T-cell receptor loci.

B-cell chronic lymphocytic leukemia (CLL) is the commonest form of leukemia. It is primarily a disease of the elderly. CLL is a disease of CD5+ B cells, which may constitute a distinct B-cell lineage. A striking feature is the wide variation in biological behavior: some patients require no therapy for many years, while others have rapidly progressive and chemotherapy-resistant disease. Correspondingly, there is no common genetic abnormality. Unlike other forms of both acute and chronic leukemia, about 5% of cases have a familial component and may exhibit genetic anticipation. Abnormalities of chromosome 13q14 are common but, despite much work, their pathological consequences remain unclear. Patients with either deletions of chromosome 11q23 or mutations of p53 have rapidly progressive/chemotherapy-resistant disease. The status of the immunoglobulin heavy chain variable (IGHV) region gene segments in CLL defines two biologically distinct groups of disease. B cells that have encountered antigen and passed through the germinal centre exhibit IGHV mutations within the DNA sequences that encode the antigen-binding loops of the antibody protein. Patients with unmutated IGHV segments have a worse prognosis than those with IGHV mutations.

Other forms of B-cell chronic leukemia are relatively uncommon. Splenic lymphoma with villous lymphocytes (SLVL) is generally a very indolent disease. Nevertheless, a subset exhibits translocations to the immunoglobulin loci involving either the cyclin D1 (CCND1) or CDK6 gene, both of which are involved in control of cell-cycle progression. B-cell prolymphocytic leukemia (B-PLL) is remarkable amongst hematological malignancies for having a high incidence of p53 mutations. Moreover, the pattern of p53 mutation is distinct from that seen in other diseases.

All T-cell malignancies are relatively rare. T-cell prolymphocytic leukemia (T-PLL) is clinically highly aggressive and is of interest because the same disease is seen at increased incidence in patients with ataxia-telangiectasia (AT). Sporadic T-PLL is characterized by enormous cytogenetic complexity. However, acquired ATM mutations and rearrangements are probably found in all cases of sporadic T-PLL. T-PLL also exhibits deregulated expression of two closely related genes of unknown function, TCL1 and MTCP1; the former locus, on chromosome 14q32.1, contains a number of closely related genes. There are no consistent cytogenetic or molecular markers for the other T-cell leukemias, including the leukemias of large granular lymphocytes (LGL), which may be of either T-cell or NK lineage, HTLV-1-associated adult T-cell lymphoma/leukemia (ATLL), or Sézary syndrome. Comparative genomic hybridization (CGH) has shown amplifications of chromosomes 2p13 and 14q32.1 in ATLL.

Table 1 Subtypes of chronic lymphoid leukemias

Disease   Immunophenotype                Recurrent cytogenetic changes      Molecular abnormalities
CLL       CD5+, CD22−, CD23+, FMC7−      Deletion/translocations of 13q14   Unknown
          surface Ig weak                Deletion of 11q23                  ATM mutation
                                         Deletion of 17p13.1                p53 mutation
                                         Trisomy 12 (secondary)             Unknown
                                         t(14;19)(q32.3;q13)                BCL3 (rare)
SLVL      CD5+/−, CD22++, CD23−,         t(11;14)(q13;q32.3)                CCND1 overexpression
          FMC7+, sIg+/−                  t(2;7)(p12;q22)                    CDK6 overexpression
B-PLL     CD5+/−, CD22++, CD23−,         Deletion of 17p13.1                p53 mutation
          FMC7++, sIg++
HCL*      sIg++, CD5−, CD22++,           No recurrent clonal abnormalities
          FMC7+, CD25+
T-PLL     sCD3+, CD4+, CD25−             inv(14)(q11;q32.1)                 TCL1 overexpression
                                         t(X;14)(q28;q11)                   MTCP1 overexpression
                                         Deletion of 11q23                  ATM mutation
T-LGL     sCD3+, CD8+, CD25−             No recurrent clonal abnormalities
ATLL      sCD3+, CD4+, CD25+             No recurrent clonal abnormalities
Sézary    sCD3+, CD4+, CD25−             No recurrent clonal abnormalities

*HCL = hairy cell leukemia.
Albert Levan (1905-98) is most famous for his discovery that the diploid chromosome number in humans is 46, and not 48 as had been the dogma since 1912. This discovery was made with Joe-Hin Tjio at the Institute of Genetics, University of Lund, Sweden. It resulted from the application to human cells of a methodology for chromosome preparation that Levan had pioneered in plants and animals.

Levan was born and grew up in the Swedish town of Gothenburg, where his father, who was Director of Post Services, passed on to Albert an interest in classical languages and botany. After high school, Levan moved to the University of Lund, where he graduated in botany in 1927. From 1926 to 1931, he held a post as Assistant at the Institute of Zoology. He was awarded a PhD in 1935. At the age of 30, he became Assistant Professor in Genetics and subsequently, in 1947, also in Cytology. In 1961, he was awarded a personal chair in Cytology.

Levan published his first paper in 1929, on the chromosomes of onions. The large size and clear morphology of onion chromosomes made them especially well suited for both descriptive and experimental studies.

In 1938, Levan published a very important paper entitled "The effect of colchicine on root mitoses in Allium." This was the first in-depth study of the influence of colchicine on plant cell division, demonstrating its effect on the mitotic spindle and the concomitant condensation of metaphase chromosomes. This work paved the way for the development of the methodology that would eventually lead to the correct identification of the human chromosome number. Since these first experiments, colchicine (or its synthetic derivative colcemid) has been a central component of the protocols used to obtain chromosome preparations from plants and animals.

Over a long period (1938-51), Levan studied the reactions of chromosomes to treatment with different chemicals. He devised the so-called 'Allium test' to evaluate the effects of both chemicals and ionizing radiation. This work earned him an honorary doctorate from the Sorbonne, Paris, in 1968. Levan also devoted himself with great success to practical plant breeding; among other things, he produced the first tetraploid strains of sugar beet and red clover.

At the end of the 1940s, Levan became fascinated by the similarity between the chromosome aberrations caused by chemical agents and those described and illustrated in the cancer genetics literature of that time. Levan showed that by applying the methods developed for plant chromosomes, he could produce first-class preparations of chromosomes from mouse ascites tumor cells. He realized that this advance opened up a completely new field of chromosome study, namely investigation of chromosome number and morphology during the transition of a normal cell to a cancer cell. His pioneering work paved the way for another large new field of applied research, namely the diagnosis and treatment of malignancies based on their underlying chromosome abnormalities.

The seminal paper by Tjio and Levan, entitled "The chromosome number of man," was published in 1956 (Tjio and Levan, 1956). This publication had a dramatic impact on genetics, becoming the starting point not only for the new discipline of clinical cytogenetics but also for the rapid development of medical and human genetics. The paper also made a significant contribution to veterinary medicine and zoology. Tjio and Levan's study was performed on fetal lung fibroblasts cultured in vitro by Rune Grubb at the University's Medical Microbiology Department. These cells were induced to arrest at metaphase (the best stage for chromosome enumeration) by use of the mitotic spindle poison colchicine. The chromosomes were fixed and stained with the dye acetic orcein, and squash preparations were made. Remarkably, the millions of cytogenetic investigations performed each year still use basically the same methodology as that pioneered by Tjio and Levan.


In 1953, Levan set up the Cancer Chromosome Laboratory at the Institute of Genetics of the University of Lund. Initially, the laboratory had only a few scientists, but it soon attracted researchers from all over the world. When Levan retired in 1976 at the age of 71, the laboratory had grown to 40 scientists, technicians, and students. He and his coworkers made many important contributions to our understanding of cancer cytogenetics, for example:


1. Chromosomal changes in tumor cell lines are not arbitrary but follow particular developmental patterns.

2. Measles virus and Rous sarcoma virus lead to an increased amount of chromosome breakage in human blood cells.

3. Patterns of chromosome rearrangements are related to tumor etiology. Thus, histologically identical tumors can show totally different patterns of chromosome aberrations depending on whether they have been induced by viruses or chemicals.

4. Burkitt's lymphoma is characterized by a particular aberration of chromosome 14; this was the second example of a specific chromosome abnormality in a human tumor cell.
Levan's interest and involvement in chromosome research continued after his retirement from the directorship of the Cancer Chromosome Laboratory. In particular, he studied the phenomenon of 'double minutes' seen in some tumor types. These chromosomal elements result from the massive amplification of genes involved in malignancy.

The inspiring atmosphere that Albert Levan created in his laboratory made every member of staff do their best. He allowed much freedom of research. Although the key subject of the laboratory was cancer research, Levan enthusiastically accepted that some of his younger colleagues devoted their time to the study of shrews, lemmings, hedgehogs, seals, and even whales. His curiosity about scientific matters never waned, and at the age of 85 he learnt to use the computer for writing and correspondence. As a researcher, Levan stands out as an intuitive and creative talent. He had an unusual attitude to work: unlike many others in his laboratory, he adhered stringently to a 9-to-5 working day. This allowed him time for his many other interests, including playing the cello and writing music.


Reference

Tjio JH and Levan A (1956) The chromosome number of man. Hereditas 42: 1-6.
Edward Lewis (1918- ), an American biologist, made substantial contributions to our understanding of the development of animal embryos through his studies of Drosophila melanogaster, or fruit flies. His work won him the Nobel Prize for Physiology or Medicine in 1995 for "discoveries concerning the genetic control of early embryonic development." He shared this award with Christiane Nüsslein-Volhard of Germany and Eric F. Wieschaus of the United States, who were recognized for their independent studies.

Lewis gained a BA degree from the University of Minnesota in 1939, followed by a PhD in 1942 from the California Institute of Technology, where he spent his professional career. He based his studies on Drosophila melanogaster, a popular species for genetic experiments. By use of crossbreeding experiments, Lewis demonstrated that the order of the genes that guide the development of the body segments generally matched the order of the corresponding body segments themselves, i.e., the first set of genes on the chromosome controlled the head and thorax, the middle set the abdomen, and the last set the posterior. This orderliness was termed the colinearity principle. He also discovered that genetic regulatory functions may overlap. For example, a fly with an extra set of wings has a defective gene not only in the abdominal region but also in the thoracic area, where in a normal fly the gene would act as a regulator.

The results of his research helped to elucidate the mechanisms of biological development and shed light on congenital malformations in humans and other species.


See also: Colinearity; Drosophila melanogaster 
A library (or gene or genomic library) is a set of cloned fragments that together represent the entire genome.


See also: Gene Library; Genomic Library 
Ligation is the formation of a phosphodiester bond to join two adjacent nucleotides in DNA or RNA.


See also: DNA Ligases 


Light Receptor Kinases 


See: Photomorphogenesis in Plants, Genetics of 


Light, Heavy Chains 


See: Immunoglobulin Gene Superfamily 
LIM domains are composed of ~55 amino acids with the general sequence C-X2-C-X16-23-H-X2-C-X2-C-X2-C-X16-23-C-X2-C, where C = cysteine, H = histidine, and X = any amino acid. LIM domains bind two Zn2+ ions, the most common tetrahedral coordinations being S3N and S4. Modular LIM domains are found in both nuclear and cytoplasmic proteins, where they function in molecular recognition to assemble multiprotein complexes. The name 'LIM' derives from the first three proteins found to contain two of these domains at their N-terminus and a homeodomain at their C-terminus (lin-11, Isl1, and mec-3) (Freyd et al., 1990). Nuclear LIM domains are found in homeodomain proteins (LIM-HD) and in small proteins containing little additional sequence (nuclear LIM-only, nLMO). Both LIM-HD and nLMO proteins are essential transcriptional regulators whose genetic disruption results in profound defects in hematopoiesis and in development of the nervous system, endocrine system, and limbs. Cytoplasmic LIM proteins consist of variable numbers of LIM domains, either alone (cLMOs) or in association with other functional modules such as PDZ domains, kinase domains, α-actinin-binding sites, and GAP domains. Most, if not all, cytoplasmic LIM domain proteins are associated with the actin cytoskeleton and are essential to its structure and function. LIM domains are thus versatile protein molecular recognition modules that have essential functions in control of gene transcription and cytoskeletal architecture.
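Because the LIM domain is defined by a spacing rule for its zinc ligands, it can be expressed as a pattern search over a protein sequence. The sketch below is a minimal illustration, assuming the consensus C-X2-C-X16-23-H-X2-C-X2-C-X2-C-X16-23-C-X2-C exactly as written, with its eight zinc ligands (C, C, H, C for the S3N site and C, C, C, C for the S4 site). Real LIM domains deviate from any single consensus, so profile-based resources are normally used in practice; the function name `find_lim_motifs` and the test sequence are hypothetical.

```python
import re

# One-letter codes of the 20 standard amino acids; "X" in the consensus means any of these.
X = "[ACDEFGHIKLMNPQRSTVWY]"

# Assumed consensus C-X2-C-X16-23-H-X2-C-X2-C-X2-C-X16-23-C-X2-C.
LIM_CONSENSUS = re.compile(
    f"C{X}{{2}}C{X}{{16,23}}H{X}{{2}}C"          # first zinc finger: C, C, H, C (S3N)
    f"{X}{{2}}C{X}{{2}}C{X}{{16,23}}C{X}{{2}}C"  # second zinc finger: C, C, C, C (S4)
)


def find_lim_motifs(sequence: str) -> list[tuple[int, int]]:
    """Return 0-based (start, end) spans of non-overlapping consensus matches."""
    return [(m.start(), m.end()) for m in LIM_CONSENSUS.finditer(sequence.upper())]


# Synthetic sequence built to fit the consensus (not a real protein).
motif = (
    "C" + "A" * 2 + "C" + "A" * 17 + "H" + "A" * 2 + "C"
    + "A" * 2 + "C" + "A" * 2 + "C" + "A" * 17 + "C" + "A" * 2 + "C"
)
print(len(motif), find_lim_motifs("MG" + motif + "GK"))
```

The synthetic motif is 52 residues long and is reported at span (2, 54). Replacing its single histidine, or shortening the first spacer below 16 residues, abolishes the match, which shows how strictly a plain consensus pattern treats the ligand spacing.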


Structure of LIM Domains 


Each LIM domain contains two Zn2+ fingers. The two Zn2+ ions are bound independently in N- and C-terminal modules, which are packed together via a hydrophobic interface (Perez-Alvarado et al., 1994). The structure consists of four antiparallel β sheets, with the hydrophobic residues that constitute the core of the LIM domain being conserved among family members. The surfaces contain both basic and acidic residues. A short α helix is present at the C-terminus of the cytoplasmic LIM domains of CRP (cysteine-rich protein) and CRIP (cysteine-rich intestinal protein). The residues that coordinate the Zn2+ ions are essential for LIM domain folding, but it is not yet known how molecular targets are recognized by the overall structure.


LIM Domain Transcription Factors 


LIM homeobox genes have been identified in Caenorhabditis elegans, Drosophila, and vertebrates and can be organized by homology into six subclasses; nuclear LIM-only genes have been isolated in flies and vertebrates, but not in worms (Hobert and Westphal, 2000). Through its ability to homodimerize, the nuclear LIM interactor (NLI) protein (the Drosophila ortholog is Chip) forms tetrameric complexes with nuclear LIM proteins (2NLI:2LIM) to coordinate their activity (Jurata et al., 1998). Nearly all LIM-HD and nLMO proteins have unique patterns of expression throughout development and are required for the normal development of many tissue types, especially within the nervous and endocrine systems.


LIM Homeobox Subfamilies 


Lhx1 subfamily

Members of the Lhx1 family, which includes C. elegans lin-11 and mec-3, Drosophila dlim1, and vertebrate Lhx1 and Lhx5, are widely, but not ubiquitously, expressed throughout the nervous system. lin-11 and mec-3 are required for the specification of thermoregulatory interneurons and mechanosensory neurons, respectively, while Lhx5 is necessary for mouse hippocampal neuronal differentiation and migration. Early functions of Lhx1 in anterior patterning during gastrulation were revealed by gene deletion studies in the mouse, in which embryos developed without heads.


Lhx2 subfamily 

Members of this group, C. elegans ttx-3, Drosophila apterous, and vertebrate Lhx2 and Lhx9, are all expressed in subclasses of developing interneurons. ttx-3 is necessary for the development of a thermoregulatory neuron that functionally opposes the lin-11-expressing thermoregulatory neuron. In the fly, apterous is required for appropriate axon pathfinding of interneurons as well as for patterning and outgrowth of the wing. The function of the Lhx2 family is conserved from fly to vertebrates, as Lhx2 also plays a role in outgrowth of the chick limb. Additionally, in the mouse, Lhx2 is required for eye and forebrain development as well as erythropoiesis. While Lhx9 is highly expressed in the developing mouse brain and limbs, the major phenotype resulting from genetic disruption of this gene was failure of male gonad formation.


Lhx3 subfamily 

C. elegans ceh-14, Drosophila dlim3, and vertebrate Lhx3 and Lhx4 comprise the third LIM-HD subfamily. ceh-14 was shown to specify a third type of thermoregulatory interneuron in worms, while dlim3, Lhx3, and Lhx4 were found to be expressed in, and required for the normal axon trajectory of, subclasses of motor neurons. These factors are also expressed in specific classes of interneurons. Lhx3 and Lhx4 are additionally expressed in the developing pituitary, where their coordinate functions are necessary for many aspects of pituitary formation. Mutations in human LHX3 are associated with combined pituitary hormone deficiency.


Lhx6 subfamily 

The members of this group are lim-4 in the worm, arrowhead in the fly, and Lhx6 and Lhx8 in vertebrates. lim-4 is necessary for specification of an olfactory neuron in worms, while in flies, arrowhead is expressed in neuroblasts and is involved in the development of abdominal and salivary imaginal cells. In the mouse, both Lhx6 and Lhx8 are expressed in the developing forebrain and branchial arches, and loss of Lhx8 in the mouse resulted in cleft palate.


Islet subfamily 

lim-7 in worms, fly dislet, and vertebrate Isl1 and Isl2 make up this LIM-HD subfamily. The fly and vertebrate genes are expressed in large classes of motor neurons, where they are required for appropriate axon pathfinding, neurotransmitter identity, and differentiation. In addition, Isl1 expression in the developing pancreas is involved in the formation of both exocrine and endocrine cells.
Lmx subfamily

C. elegans lim-6 and vertebrate Lmx1a and Lmx1b are members of the last group of LIM homeobox genes. lim-6 is expressed in subsets of neurons in the worm and is necessary for differentiation of GABAergic motor neurons. In the vertebrate nervous system, Lmx1a is required for formation of the roof plate and dorsalization of the neural tube. Mutation of Lmx1b resulted in dorsal/ventral patterning defects within the limb as well as kidney defects, and was found to be a cause of the human genetic disease known as nail-patella syndrome.


The dLMO Family 

Drosophila dLMO and vertebrate LMO1-4 are also expressed in the developing nervous system and limb. Genetic analysis of dLMO revealed that this factor functions in wing development to downregulate apterous activity by disrupting functional Apterous/Chip complexes. In humans and mice, misexpression of LMO1 and LMO2 in T cells causes leukemia, while disruption of LMO2 function in mice resulted in failure of erythropoiesis.


Cytoplasmic LIM Domain Proteins 


Most cytoplasmic LIM domain proteins are associated with and regulate the cytoskeleton (Dawid et al., 1998). The cLMO proteins contain from one to more than five LIM domains. Adapter proteins contain one or more LIM domains in addition to protein-binding motifs such as PDZ domains and α-actinin-binding sequences. Both protein kinase and GTPase-activating functions are found in cytoplasmic LIM domain-containing proteins.


LIM-kinase 

There are two human LIM-kinases, each containing two N-terminal LIM domains, a central PDZ domain, and a C-terminal Ser/Thr protein kinase domain. Hemizygous deletion of the LIM-kinase 1 gene (LIMK1) is implicated in the neurological manifestations of Williams syndrome. LIM-kinases regulate the actin cytoskeleton by phosphorylating cofilin at Ser3. This phosphorylation blocks cofilin activity and thus decreases depolymerization of actin filaments, thereby stabilizing them. LIM-kinase functions in a signal transduction pathway through which environmental signals are transmitted through the small GTPases of the Rho family via a protein kinase cascade to regulate actin cytoskeleton responses such as cell movement (Edwards et al., 1999).
Adapter Proteins


Enigma family 

The Enigma (ENG) family of adapters contains a single N-terminal PDZ domain and one to three C-terminal LIM domains. ENG (LMP-1), cypher (Oracle), and the Enigma homolog (ENH) contain closely related PDZ domains and three LIM domains. During development these proteins are preferentially expressed in cardiac and skeletal muscle. The PDZ domain of ENG binds to the skeletal muscle isoform of tropomyosin, and the two are colocalized at the boundary between the Z line and I band. The PDZ domain of cypher binds to α-actinin-2 and colocalizes with it at the Z line of cardiac myofibrils. A related family of proteins that contain a single LIM domain includes Ril, CLP36, and α-actinin-associated LIM protein (ALP). The PDZ domain of ALP binds to the spectrin-like motifs of α-actinin-2 and colocalizes with it at the Z line of myofibers. The binding partners for the LIM domains of these proteins are incompletely defined but may include the protein kinase Ret, the insulin receptor, and protein kinase C.


Focal adhesions 

A number of LIM domain proteins are localized at focal adhesions, which are sites of integrin-extracellular matrix communication. The paxillin family of proteins (paxillin, leupaxin, Hic-5, and PaxB) contains four C-terminal LIM domains that target the proteins to focal adhesions. The N-terminus contains vinculin and focal adhesion kinase (FAK) binding sites. Paxillin is phosphorylated on tyrosine residues by FAK and thus binds both SH2 and SH3 domain proteins in macromolecular complexes present in focal adhesions. Zyxin is also located at focal adhesions and along actin filaments. This protein contains three C-terminal LIM domains and a proline-rich N-terminus that binds to α-actinin. Ajuba and LPP are related proteins that are localized to sites of cell-cell adhesion. The C. elegans unc-115 and human abLIM proteins contain three and four N-terminal LIM domains, respectively, and a C-terminal actin-binding domain related to the villin headpiece and dematin domains. unc-115 mediates axon guidance, while abLIM is specifically expressed in the retina, where it undergoes extensive phosphorylation. Both proteins are proposed to be molecular adapters that link the actin cytoskeleton to extracellular signals. Drosophila prickle, which contains three C-terminal LIM domains and an N-terminal PET domain, is necessary for the development of planar polarity in imaginal discs.


cLIM-Only Proteins (cLMO) 

cLMO proteins consist of one to five LIM domains 
without other identifiable sequence motifs. The 
cysteine-rich protein (CRP) family members contain 
two LIM domains. CRP1-3 proteins bind to the LIM 
protein zyxin and colocalize with actin. Genetic
deletion of muscle LIM protein (MLP/CRP3), which is
normally localized at Z lines, results in disruption of
cardiac myocyte cytoarchitecture and heart failure; 
skeletal muscle fibers are also abnormal. Other 
cLMO proteins are specifically expressed in smooth 
muscle (SmLIM) and in skeletal muscle. PINCH, 
which contains five LIM domains, is implicated in 
integrin-associated protein kinase signaling. The 
C. elegans ortholog of PINCH (unc-97) is necessary 
for structural integrity of the integrin-containing
muscle adherence junctions and contributes to the 
mechanosensory function of touch neurons. 


Future Prospects 


The catalog of nuclear LIM proteins is nearly
complete. One high-affinity target, NLI, is the basis for
combinatorial association of nuclear LIM proteins 
into a transcriptional ‘code’ underlying developmen- 
tal choices. How these complexes operate in the con- 
text of other transcriptional regulators remains to be 
determined. The catalog of cytoplasmic LIM proteins 
is incomplete and other associated protein motifs are 
likely to accompany the LIM domains. Most, if not all, 
interact with and regulate the cytoskeleton. In con- 
trast to nuclear LIM domains, a common high-affinity 
target for cytoplasmic LIM domains has not been 
identified and specific recognition sites remain to be 
determined. In both nuclear and cytoplasmic proteins 
LIM domains function as recognition modules for 
macromolecular assemblies. 
The tetrapod limb is a complex structure that exhibits
considerable morphological diversity between species.
For example, the forelimbs of bats and birds
have been adapted for flying while the limbs of
alligators retain features characteristic of the primitive
tetrapod condition (Shubin et al., 1997). Despite
these variations, all tetrapod limbs exhibit a common
organizational theme that reflects their conserved
evolutionary origin (Figure 1). In this article, the
cellular and molecular events that occur during 
embryogenesis to form this basic tetrapod limb struc- 
ture are described. 

The first morphological indication of limb devel- 
opment is a localized thickening of the lateral plate 
mesoderm in presumptive forelimb and hindlimb
regions of the embryonic flank. This thickening is
achieved through differential proliferation of the
lateral plate mesoderm, with high levels of proliferation
maintained within limb bud-forming regions and
suppressed in interlimb regions. Available
evidence suggests that members of the fibroblast 
growth factor (fgf) family are important mediators 
of limb induction. Implantation of beads soaked in 
recombinant FGF protein into the interlimb region 
results in the formation of an ectopic limb at the site of 
bead implantation (Cohn et al., 1995). Several fgf
family members, including fgf-8 and fgf-10, are
expressed in the intermediate mesoderm at the time
of limb initiation, making them attractive candidates
for mediating limb induction (Cohn et al., 1995;
Ohuchi et al., 1997; Yonei-Tamura et al., 1999). How 
these activities are deployed at specific axial levels is 
not known, but it is likely to involve the action of the 
clustered homeobox genes (Cohn et al., 1997). 

Once limb buds have formed, continued outgrowth 
depends on signaling from a specialized region of limb 
bud ectoderm, the apical ectodermal ridge (AER), to 
the underlying mesenchyme. The AER is a morph- 
ologically visible thickening of limb bud ectoderm 
occurring at the interface between dorsal and ventral 
ectoderm. The function of the AER has been deter- 
mined by microsurgical manipulation of chick 
embryos (Saunders, 1948). Removal of the AER 
leads to limb truncations. The exact level of truncation 
depends on the time at which the AER is removed: 
removal at an early stage leads to proximal truncations 
while later removals lead to progressively more distal 
truncations (Summerbell, 1974b). Hence, the AER is
required for distal limb outgrowth. The AER is also
sufficient for this process, as grafting of additional
AERs leads to the formation of supernumerary limbs
(Johnson and Tabin, 1997).

Figure 1 Key stages of vertebrate limb development as illustrated in the chick embryo. Limb initiation begins within the lateral plate mesoderm, which proliferates to form limb buds at the forelimb (FL) and hindlimb (HL) levels. At this stage outgrowth becomes dependent on the apical ectodermal ridge (AER), and axial pattern is specified through the action of multiple signaling centers (see text). After further development, a cartilage model of the adult limb is formed. Illustrated here is the skeletal pattern of a chick wing at 8 days of incubation. Note the conserved features of all tetrapod limbs: a single proximal long bone (humerus) followed by two long bones (radius and ulna) and capped by carpals and digits. (Adapted from Johnson and Tabin, 1997.)

As in limb initiation, fgfs figure prominently in 
AER formation and function. One of the fgfs, fgf-10,
is expressed in the limb bud mesoderm prior to AER 
formation and mice that lack fgf-10 fail to form the 
AER (Min et al., 1998; Sekine et al., 1999). Evidence 
that fgfs mediate the distal outgrowth activity of the 
AER first came from expression studies and explant 
experiments. The ectoderm of the AER expresses a 
number of fgfs, including fgf-4 and fgf-8 (Niswander 
and Martin, 1992; MacArthur et al., 1995; Mahmood 
et al., 1995; Vogel et al., 1996). Culturing of mouse 
limb buds denuded of their AER in the presence of 
recombinant fgfs results in limited restoration of distal 
outgrowth (Niswander and Martin, 1993). These 
initial findings were extended by in ovo experiments
using chick embryos (Niswander et al., 1993).
Removal of the AER followed by grafting a bead 
soaked in FGF protein to the distal limb bud results 
in near-complete distal limb development. Hence fgfs
are sufficient to replace the AER in directing limb
outgrowth and are expressed in the AER at appropriate
times to mediate this function of the AER. Removal
of fgf function in the AER through gene targeting
methods has yet to reveal a requirement for fgfs in
distal limb outgrowth, most likely because
multiple fgfs are expressed in the AER (Moon et al.,
2000; Sun et al., 2000).

Characteristic features of tetrapod limbs are 
asymmetries along the three cardinal limb axes (see 
Figure 1). Proximal-distal asymmetries are exemplified
by the presence of a single long bone in most
proximal regions followed by two long bones in 
more distal regions and a collection of small bones 
and digits in the distal-most regions. 

Asymmetries along the anterior-posterior axis are
most easily seen by comparing digit morphologies.
For example, the anterior-most digit of the human
hand is the thumb, which contains only two phalanges,
while the other digits contain three phalanges. More
pronounced differences can be seen in the three digits of
the avian wing (Figure 1). Dorsal-ventral asymmetries
can be found both in integument derivatives
(nails and feathers, for example) and in internal tissues
such as the arrangement of muscles and tendons.
These latter asymmetries are essential for coordinated
extension and flexion of the limb. The mechanisms
that specify skeletal morphologies along the
proximal-distal axis are not clear, although the action of
several homeobox-containing genes is thought to be
essential for proper pattern formation along this axis
(Capecchi, 1997; Rijli and Chambon, 1997; Capdevila
et al., 1999; Zakany and Duboule, 1999; Mercader
et al., 2000). In contrast, the mechanisms by which
asymmetry along the anterior-posterior and
dorsal-ventral axes is achieved are better understood and
are detailed below.

The anterior-posterior identity of limb tissues is
controlled by a special group of mesenchymal cells
located in the posterior limb called the zone of
polarizing activity, or ZPA (Saunders and Gasseling, 1968).
The ZPA has the remarkable ability to induce the
formation of mirror-symmetric digits when transplanted
to the anterior margin of a host limb bud. This
observation led to a model in which the ZPA produces a
diffusible signal that gives cells their identity along the
anterior-posterior limb axis (Tickle et al., 1976).
Recently, the molecule responsible for this activity
has been identified as the product of the sonic hedgehog
(shh) gene (Riddle et al., 1993). Shh encodes a secreted
factor that is expressed within the ZPA. Moreover,
ectopic expression of Shh in anterior limb bud tissues
mimics the effects of ZPA transplantation, indicating
that Shh can functionally substitute for the ZPA.
Whether Shh acts as a diffusible morphogen in the
limb is controversial; however, recent studies suggest
that its effects are long-range, consistent with the
morphogen hypothesis (Yang et al., 1997; Drossopoulou
et al., 2000; Wang et al., 2000).

The dorsal-ventral polarity of the limb bud is 
achieved through a cascade of factors expressed within 
the limb bud ectoderm and mesenchyme. Rotation 
experiments indicated that positional identity along 
the dorsal-ventral limb axis is controlled by the 
ectoderm (MacCabe et al., 1974). Inversion of the 
ectoderm, but not the mesenchyme, results in 
dorsal-ventral axis inversion. Hence, these studies 
suggest that the ectoderm sends a signal to the under- 
lying mesenchyme to determine its identity along 
the dorsal-ventral axis. Gene targeting studies in 
the mouse and gain of function experiments in the 
chick indicate that three factors play critical roles in 
dorsal-ventral limb patterning. The secreted glyco- 
protein wnt-7a is expressed in the dorsal ectoderm 
and is both necessary and sufficient for dorsal 
pattern specification (Parr and McMahon, 1995; Riddle 
et al., 1995; Vogel et al., 1995). Localized expression of 
wnt-7a in the dorsal ectoderm is achieved by the action 
of engrailed-1 (en-1) (Loomis et al., 1996; Logan et al., 
1997). En-1 is expressed in the ventral limb ectoderm 
where it directly or indirectly represses expression of 
wnt-7a. The function of wnt-7a is to induce the
expression of a LIM-homeodomain class transcription factor,
lmx1b, in dorsal limb mesenchyme (Cygan et al., 1997;
Loomis et al., 1998). Lmx1b, in turn, is necessary and
sufficient to modify a default ventral limb pattern to
create a dorsal-specific arrangement of limb tissues
(Riddle et al., 1995; Vogel et al., 1995; Chen et al., 1998).
How lmx1b achieves this effect is not understood, but
it likely modulates the expression of a number of genes
within dorsal limb tissues.

Figure 2 Important molecular regulatory interactions in vertebrate limb patterning. Apical ectodermal ridge (AER) formation, distal outgrowth, anterior-posterior (A/P), and dorsal-ventral (D/V) pattern are controlled by a network of transcription factors and signaling molecules. See text for details.

Although it may appear that the three limb axes are 
specified through independent mechanisms, there 
are significant interactions among these pathways 
(Figure 2). Indeed, one might expect that to achieve 
the proper arrangement of limb tissues in three dimen- 
sions, the pathways that control patterning along each 
axis should be coupled to each other. A number of 
experiments indicate that this is the case. For example, 
transplantation of the ZPA at different times leads
to duplication of tissues at different proximal-distal
levels, suggesting an integration of anterior-posterior
and proximal-distal patterning mechanisms
(Summerbell, 1974a). A second example is that
the outgrowth function of the AER and the polarizing
activity of the ZPA are linked through a
reciprocal feedback loop whereby fgf expression in
the AER and shh expression in the ZPA are codependent
(Laufer et al., 1994; Niswander et al., 1994).
Disruption of this loop is observed in the limbs of shh
mutant mice (Chiang et al., 1996), which exhibit distal
truncations resulting from the indirect modulation of
AER function by shh. Finally, the dorsal-ventral and
anterior-posterior pathways are linked through
modulation of shh levels by wnt-7a (Parr and McMahon,
1995; Yang and Niswander, 1995). Removal of wnt-7a
function leads to both dorsal-ventral patterning defects
and loss of posterior-most digits. These examples
provide only an indication of the degree to which axial
limb patterning mechanisms are coupled, and future
experiments are likely to reveal additional complexity.

Once the axial pattern of the limb has been laid
down, a series of additional events is necessary to
achieve the final form of the limb. Prominent among
these events is modulation of cell death, especially
within interdigital regions (Zou and Niswander,
1996; Chen and Zhao, 1998). The spaces between
digits are achieved through induction of cell death
specifically within interdigital zones by a process
that involves signaling by bone morphogenetic
proteins (bmps). A second important event is
endochondral ossification (Karsenty, 1999). The initial
skeletal structures of the limb are laid down as cartilaginous
models that are replaced by bone through this process.

Endochondral ossification is also important for
embryonic and postnatal growth of the long bones
of the limb and other tissues and is mediated through
a complex signaling network involving bmps,
Indian hedgehog, and parathyroid hormone-related peptide
(Vortkamp et al., 1996). Other important events during
limb development that have received comparatively
little attention are the formation of muscles,
joints, tendons, and ligaments. Very little is known
about the molecular basis of these processes; however,
studies of mouse mutants are starting to reveal some
key players that regulate these events (Storm et al.,
1994; Storm and Kingsley, 1996; Thomas et al., 1996).

Many congenital malformations that affect limb 
development are known and it is becoming clear that 
most of these malformations can be interpreted as due 
to mutations in genes affecting pathways of limb 
initiation, axial patterning, or subsequent limb-shaping
events such as cell death, endochondral ossification,
and joint formation (Manouvrier-Hanu et al.,
1999). In cases where the genes responsible for limb
defects have been identified, they can readily be
integrated into known pathways. For example, Greig
cephalopolysyndactyly (GCPS), in which affected
individuals have a single extra preaxial digit, results
from mutations in the GLI-3 gene (Vortkamp et al.,
1991). Studies of mice with gli-3 mutations suggest
that GCPS polydactyly is caused by ectopic anterior
expression of Shh during early limb development
(Vortkamp et al., 1992; Hui and Joyner, 1998; Buscher
et al., 1997; Masuya et al., 1997). Another example is
nail-patella syndrome (nps), caused by mutations in
the dorsal patterning gene LMX1B (Dreyer et al.,
1998; Vollrath et al., 1998; Clough et al., 1999). The
limb phenotype of individuals with nps is small or
absent patellae and misshapen or absent nails, both
dorsal derivatives of the limb. These and other
related studies highlight synergistic interactions
between human genetics and developmental biology
that have led to an understanding of the etiology of
many limb malformations. It is expected that in the
future this synergy will continue, especially with the
identification of novel regulators of limb patterning
through the application of positional cloning methods
to the many existing human and murine genetic limb
malformations.
The general name coined for selfish genetic elements
that disperse themselves through the genome by 
means of an RNA intermediate is retroposon. There 
are two classes of retroposons. The SINE family is 
made up of very small DNA elements that require 
other genetic information to facilitate their dispersion 
throughout the genome. The LINE family is derived 
from a full-fledged selfish DNA sequence with a self- 
encoded reverse transcriptase. 
Selfish genetic elements of the LINE type have
been around for a very long time. Homologous 
LINE elements have been found in a wide variety of 
organisms including protists and plants. Thus, LINE- 
related elements, or others of a similar nature, are 
likely to have been the source material that gave rise 
to retroviruses. 

Full-length LINE elements are about 7 kb long;
however, the vast majority (>90%) are truncated, with
lengths ranging down to 500 bp.
Moreover, of the many full-length LINE elements in any
genome, only a few retain a completely functional 
reverse transcriptase gene which has not been inacti- 
vated by mutation. Thus, only a very small fraction of 
the LINE family members retain ‘transposition com- 
petence,’ and it is these that are responsible for disper- 
sing new elements into the genome. 

Dispersion to new positions in the germline 
genome presumably begins with the transcription 
of competent LINE elements in spermatogenic or 
oogenic cells. The reverse transcriptase coding 
region on the LINE transcript is translated into 
enzyme that preferentially associates with and utilizes 
the transcript that it came from as a template to pro- 
duce LINE cDNA sequences. For reasons that are 
unclear, it seems that the reverse transcriptase usually 
stops before a full-length copy is finished. These 
incomplete cDNA molecules are, nevertheless, cap- 
able of forming a second strand and integrating into 
the genome as truncated LINE elements that are for- 
ever dormant. 

The LINE family appears to evolve by repeated 
episodic amplifications from one or a few progenitor 
elements, followed by the slow degradation, by genetic
drift, of most new integrants into random
sequence. Thus, at any point in time, a large fraction
of the cross-hybridizing LINE elements in any one 
genome will be more similar to each other than to 
LINE elements in other species. In a sense, episodic 
amplification followed by general degradation is 
another mechanism of concerted evolution. 


See also: Repetitive (DNA) Sequence; 
Retroposon; SINE 
In Mendel’s crosses, diploids that were heterozygous
at two loci produced the four possible kinds of haploid
meiotic products in equal numbers: the hereditary
factors (genes) assorted at random, appearing in the
meiotic products without regard to the combinations
(parental types) in which they entered the diploid.
Deviations from random assortment occur when two 
loci are on the same chromosome. Such a deviation 
(linkage) is manifested as an excess of parental types 
over the new (recombinant) types. In a two-factor 
cross involving linked loci, the mutant alleles are said 
to enter the cross in coupling when the diploid is 
formed from the union of a wild-type gamete and a 
double mutant gamete. The mutant alleles are in repul- 
sion if the diploid is formed by the union of gametes 
which are each mutant at one of the loci and wild-type 
at the other. 
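The coupling/repulsion distinction can be made concrete with a short Python sketch. It assumes a recombination fraction r between the two loci (a parameter introduced here for illustration) and uses hypothetical labels: uppercase A and B for wild-type alleles, lowercase a and b for mutant alleles. Parental gametes each occur at (1 − r)/2 and recombinant gametes at r/2, so r = 0.5 reproduces Mendel’s equal frequencies.

```python
def gamete_frequencies(r, phase):
    """Expected gamete frequencies from a double heterozygote (a minimal sketch).

    r     -- recombination fraction between the two loci, 0 <= r <= 0.5
    phase -- "coupling" (diploid formed from AB + ab gametes) or
             "repulsion" (diploid formed from Ab + aB gametes)
    Uppercase = wild type, lowercase = mutant (hypothetical labels).
    """
    if not 0.0 <= r <= 0.5:
        raise ValueError("recombination fraction must lie in [0, 0.5]")
    parental, recombinant = (1 - r) / 2, r / 2
    if phase == "coupling":
        return {"AB": parental, "ab": parental, "Ab": recombinant, "aB": recombinant}
    if phase == "repulsion":
        return {"Ab": parental, "aB": parental, "AB": recombinant, "ab": recombinant}
    raise ValueError("phase must be 'coupling' or 'repulsion'")


print(gamete_frequencies(0.5, "coupling"))   # unlinked: all four classes at 0.25
print(gamete_frequencies(0.1, "repulsion"))  # linked: excess of Ab and aB
```

The same r gives an excess of different gamete classes in the two phases, which is why the parental combination must be known before recombinants can be counted.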


Detection of Linkage 


When two loci on the same chromosome are far apart, 
they may fail to generate meiotic products whose 
frequencies differ significantly from the expectations 
of nonlinkage. Such a failure to demonstrate linkage of 
two loci may be overcome with the demonstration of 
their common linkage to a locus that lies between 
them on the chromosome. 

Much of genetics involves determining the location 
in the genome of a newly identified gene (mapping the 
gene). The first step in such mapping is determining on 
which chromosome the locus is situated. Crosses of
the new mutant with strains that carry mutant alleles at
loci on each of the chromosomes may detect linkage to
one of those loci. Since only a finite number of haploid
meiotic products (or of meiotic tetrads) can be
examined, the χ² (chi-square) test is standardly employed to
determine when deviations from random assortment
should be taken seriously.
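As a minimal sketch of the χ² idea, the Python below tests four observed classes of haploid meiotic products against the 1:1:1:1 ratio expected under random assortment. The counts are hypothetical, and the critical value 7.815 (3 degrees of freedom, 5% level) is hard-coded from standard tables rather than computed.

```python
# Chi-square goodness of fit of a two-factor cross to the 1:1:1:1 ratio
# expected under random assortment (a minimal sketch, not a full linkage test).

CHI2_CRIT_DF3_P05 = 7.815  # tabulated critical value: 3 degrees of freedom, 5% level


def chi_square_random_assortment(counts):
    """Return the chi-square statistic for four observed product classes."""
    if len(counts) != 4:
        raise ValueError("expected counts for four meiotic product classes")
    total = sum(counts)
    expected = total / 4  # equal numbers under random assortment
    return sum((observed - expected) ** 2 / expected for observed in counts)


# Hypothetical counts: two parental classes followed by two recombinant classes.
observed = [65, 60, 40, 35]
stat = chi_square_random_assortment(observed)
print(f"chi2 = {stat:.2f}; significant at 5%: {stat > CHI2_CRIT_DF3_P05}")
```

A significant value shows only that the four classes are not equally frequent. It is the excess of parental over recombinant classes, rather than some other departure such as unequal recovery of one allele, that points to linkage.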

When tetrad data are available, an excess of parental 
ditype meiotic tetrads over nonparental ditype tetrads 
sensitively indicates linkage. 
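The tetrad comparison can be sketched in Python as follows. It assumes unordered tetrads already classified as parental ditype (PD), nonparental ditype (NPD), or tetratype (T), and it uses the standard estimate of recombination frequency, (T/2 + NPD)/total, which this entry does not itself state. The PD > NPD check is only a first indication; it is not a significance test.

```python
def tetrad_summary(pd, npd, t):
    """Summarize two-locus unordered tetrad counts (a minimal sketch).

    pd  -- parental ditype tetrads
    npd -- nonparental ditype tetrads
    t   -- tetratype tetrads
    Returns (pd_excess, recombination_frequency).
    """
    total = pd + npd + t
    if total == 0:
        raise ValueError("no tetrads scored")
    # A tetratype carries 2 recombinant spores out of 4 and an NPD carries 4 of 4,
    # so the recombinant fraction of all spores is (T/2 + NPD) / total.
    recombination_frequency = (t / 2 + npd) / total
    # Random assortment predicts PD and NPD tetrads in equal numbers.
    return pd > npd, recombination_frequency


# Hypothetical data: 70 PD, 2 NPD, 28 T.
excess, rf = tetrad_summary(70, 2, 28)
print(excess, round(rf, 2))  # True 0.16
```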


Linkage to a Centromere 


Since homologous centromeres segregate in the 
first division of meiosis, relatively strong linkage of a 
locus to a centromere is indicated by an excess of 
first division over second division segregations;
weaker linkage is implied as long as the frequency of 
second division segregation is less than 2/3, the fre- 
quency expected for random assortment of a locus 
from its centromere. In the presence of positive 
chiasma interference, second division segregation 
may exceed 2/3, indicating that most of the tetrads 
have a single exchange between that locus and its 
centromere. 
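A short Python sketch of the centromere test follows, assuming ordered tetrads scored as first-division or second-division segregation (SDS) for one locus. Halving the SDS frequency to estimate locus-centromere recombination is a common convention that is not stated in this entry. It assumes at most one exchange in the interval, so it breaks down for the long intervals discussed above, where SDS can exceed 2/3.

```python
def centromere_distance(first_division, second_division):
    """Locus-centromere linkage from ordered tetrads (a minimal sketch).

    Returns the second-division segregation (SDS) frequency and the
    conventional estimate of locus-centromere recombination, SDS / 2,
    which assumes at most one exchange in the interval.
    """
    total = first_division + second_division
    if total == 0:
        raise ValueError("no tetrads scored")
    sds = second_division / total
    return sds, sds / 2


RANDOM_ASSORTMENT_SDS = 2 / 3  # expected SDS frequency for a locus unlinked to its centromere

# Hypothetical counts of ordered tetrads.
sds, rf = centromere_distance(first_division=160, second_division=40)
print(f"SDS = {sds:.2f} (random expectation {RANDOM_ASSORTMENT_SDS:.2f}); "
      f"linked: {sds < RANDOM_ASSORTMENT_SDS}; estimate ~ {100 * rf:.0f} cM")
```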


Linkage Maps 


Maps that reflect the degree of linkage between loci 
can be constructed from observed recombination fre- 
quencies (see Centimorgan (cM)). The distances on 
these linkage maps will accurately reflect physical 
distances on the chromosome only if exchange fre- 
quencies are constant along the chromosome. 


Linkage in Prokaryotes 


In bacteria with a single chromosome, loci that are 
close enough together on the chromosome to be trans- 
mitted together in phage-mediated transduction may 
be referred to as ‘linked.’ Similarly, when transform- 
ation is conducted with chromosomal DNA, markers 
are ‘linked’ if they are cotransformed as a result of 
sometimes being on the same fragment created by 
artifactual breakage of the chromosome. Unlinked 
markers are transduced or transformed into the same 
recipient cell at a frequency about equal to the product 
of the transduction or transformation frequencies of 
the individual markers. 
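The product rule for unlinked markers suggests a simple check, sketched here in Python. The frequencies are hypothetical per-recipient-cell values. The function only reports how far the observed joint frequency exceeds the product expected for unlinked markers; deciding how large a ratio counts as “linked” would need a proper statistical test, which is not attempted here.

```python
def cotransfer_excess(freq_a, freq_b, freq_ab):
    """Observed joint transfer relative to the product expected if unlinked.

    freq_a, freq_b -- transduction (or transformation) frequency of each marker
    freq_ab        -- observed frequency of recipients receiving both markers
    A ratio near 1 fits independent transfer; a ratio far above 1 suggests
    the markers were carried on the same fragment, i.e. 'linked'.
    """
    expected = freq_a * freq_b
    if expected == 0:
        raise ValueError("marker frequencies must be positive")
    return freq_ab / expected


# Hypothetical frequencies per recipient cell.
print(cotransfer_excess(1e-4, 2e-4, 2e-8))  # about 1: behaves as unlinked
print(cotransfer_excess(1e-4, 2e-4, 3e-5))  # about 1500: cotransferred
```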

In crosses with bacteriophages, as standardly con- 
ducted, recombination frequencies less than 50% can- 
not be taken as evidence of linkage because a fraction 
of the progeny phage particles has lacked the oppor- 
tunity to assort its genes. Linkage is implied by a pair 
of loci that gives a significantly lower recombination 
frequency than do the loci with the largest observed 
values. 


See also: Centimorgan (cM); Genetic 
Recombination; Mapping Function; Tetrad 
Analysis 
Dependence of gene frequencies at two or more loci is
called allelic association, gametic disequilibrium, or
linkage disequilibrium (LD). Whereas unlinked loci
reach independence (Hardy–Weinberg equilibrium)
in a single generation, linked loci with recombination
rate $\theta < 0.5$ reduce initial LD in an infinite population
to a proportion $e^{-\theta t}$ after $t$ generations. The time
required to go halfway to equilibrium is therefore
$T = (\ln 2)/\theta$, or more than a million years if $\theta = 10^{-5}$
and there are 20 years per generation. A convenient
but inaccurate rule of thumb is that $\theta = 0.01$ corresponds
to about 1 megabase (Mb). By this approximation,
$\theta = 10^{-5}$ corresponds to 1 kb. If $\theta$ is as small as
$10^{-6}$, the time since apes and hominids diverged is not
long enough to go halfway to equilibrium. Therefore
selection is not required to explain persistence of dis- 
equilibrium, which depends to a considerable extent 
on episodes of population contraction. There have 
been two major bottlenecks in human evolution. The 
first was when two chromosomes that are nonhomo- 
logous in apes fused to form the chromosome 2 inherit- 
ed by our species. The second bottleneck was when 
we migrated out of Africa in the last 100 000 years. As 
a consequence, LD is least for sub-Saharan Africa. 
Lesser bottlenecks have occurred in the history of
particular populations. 
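The half-time arithmetic above can be checked with a few lines of Python. The sketch uses only the quantities given in the text: remaining LD $e^{-\theta t}$, $T = \ln 2/\theta$, 20 years per generation, and the rule of thumb of $\theta = 0.01$ per Mb, i.e. $10^{-5}$ per kb. For $\theta = 10^{-5}$ it gives roughly 69,000 generations (about 1.4 million years), and for $\theta = 10^{-6}$ roughly 690,000 generations (about 14 million years).

```python
import math

YEARS_PER_GENERATION = 20  # value used in the text


def remaining_ld(theta, generations):
    """Fraction of initial LD left after t generations: e^(-theta * t)."""
    return math.exp(-theta * generations)


def half_time(theta, years_per_generation=YEARS_PER_GENERATION):
    """T = ln 2 / theta generations, also returned in years."""
    generations = math.log(2) / theta
    return generations, generations * years_per_generation


def theta_from_kb(kb):
    """Rule of thumb from the text: theta = 0.01 per Mb, i.e. 1e-5 per kb."""
    return kb * 1e-5


for theta in (theta_from_kb(1), 1e-6):
    gens, years = half_time(theta)
    print(f"theta = {theta:.0e}: T = {gens:,.0f} generations, "
          f"about {years / 1e6:.1f} million years")
```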

LD may be measured in many ways. Some are con- 
founded with significance tests, and therefore with 
sample size. All are to some degree confounded with 
allele frequencies. The most reliable and best validated
is the association probability $\rho$, which is made up of
two parts. Association that has diminished from an
initial value $\rho_0$ in founders is $\rho_D = \rho_0 e^{-(\theta + 1/2N)t}$,
where $N$ is the effective size over $t$ generations.
Association that has built up by genetic drift since the
founders is $\rho_A = L\left(1 - e^{-(\theta + 1/2N)t}\right)$, and $\rho = \rho_D + \rho_A$.
If $N$ is constant, the equilibrium value as $t \to \infty$ is
$L = 1/(1 + 2N\theta)$ if $\theta$ is small and $1/(1 + 2N)$ if $\theta = 0.5$.
The latter is negligible in real populations. If $1/2N$ is
small compared to $\theta$, $\rho_D$ follows the Malécot model for
isolation by distance, equating $\theta t$ to $\varepsilon d$, where $d$ is the
distance between loci. On the genetic scale $d$ measures
recombination directly, with relatively larger
sampling error over small distances. On the physical
scale $d$ is only indirectly related to recombination,
but is more accurate if sequence-based. The choice should
be based on goodness of fit to the best available maps.
Analyzed as isolation by distance, LD provides a way 
to compare allelic association for chromosome regions 
in different populations, and therefore to detect vari- 
ations in recombination, selective sweeps that reduced 
haplotype diversity, and effects of population history 
and structure. This information determines the opti- 
mal populations and density of markers for positional 
cloning of genes affecting normal physiology and dis- 
ease. Localization is more precise by LD than by 
linkage. An alternative for multilocus haplotypes is 
cladistic analysis when its assumptions to reduce the 
number of independent variables are valid and the 
causal region has been made small by LD or other 
evidence. 
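The two-part model of association can be sketched in Python, taking the formulas as read from this entry: $\rho_D = \rho_0 e^{-(\theta + 1/2N)t}$ for founder association that decays, $\rho_A = L(1 - e^{-(\theta + 1/2N)t})$ for association built up by drift, and the small-$\theta$ equilibrium $L = 1/(1 + 2N\theta)$. The effective size $N$ is assumed constant, and the parameter values in the loop are hypothetical. This is an illustration of the formulas, not a fitting procedure.

```python
import math


def association(rho0, theta, n_eff, t):
    """Association probability rho = rho_D + rho_A after t generations (sketch).

    rho_D = rho0 * exp(-(theta + 1/(2N)) t)      founder association, decaying
    rho_A = L * (1 - exp(-(theta + 1/(2N)) t))   association built up by drift
    L     = 1 / (1 + 2 N theta)                  small-theta equilibrium value
    n_eff is the effective population size N, assumed constant.
    """
    rate = theta + 1.0 / (2 * n_eff)
    equilibrium = 1.0 / (1.0 + 2 * n_eff * theta)
    decay = math.exp(-rate * t)
    rho_d = rho0 * decay
    rho_a = equilibrium * (1.0 - decay)
    return rho_d + rho_a


# Hypothetical founder association, recombination rate and effective size.
for t in (0, 1_000, 10_000, 100_000):
    print(t, round(association(rho0=0.8, theta=1e-4, n_eff=10_000, t=t), 3))
```

Starting from $\rho_0$, the total association relaxes toward $L$ rather than toward zero, which is the drift contribution the entry describes.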


See also: Bottleneck Effect; Genetic Drift 
Linkage Group


M A Cleary 


Copyright © 2001 Academic Press 
doi: 10.1006/rwgn.2001.0768


Linkage among genes or genetic markers is deter- 
mined by the frequency with which they are inherited 
together. Genes that are frequently inherited together 
tend to lie near each other on the same chromosome. 
The frequency with which genes or markers are inherit- 
ed together is measured by the percentage of recom- 
bination that occurs between them. A linkage group 
is defined by all of the genes and markers for which 
linkage has been established. An entire chromosome is 
considered to be a linkage group. 


See also: Crossing-Over; Independent 
Assortment; Linkage Map 
Linkage maps of the human and mouse genomes have
provided the initial framework for genetic studies, 
including the positional cloning of disease genes and 
the scaffold for building physical maps and contigu- 
ous (contigs) stretches of cloned DNA. Although the 
usefulness of the human genetic maps to the comple- 
tion of the Human Genome Project is nearing or at 
an end, the chromosomal positions of highly poly- 
morphic markers that are necessary for many current 
studies still largely depend on these linkage maps. In 
other species, particularly the mouse, linkage maps 
have been and continue to be the predominant tool for
defining the chromosomal location of genes. 

Linkage maps depend on the relationship between 
locations on a chromosome that are defined by cross- 
overs that occur between homologous chromosomes 
during meiosis. The distance between two linked loci 
on a chromosome is defined by the recombination 
frequency, where 1% recombination is equal to 1 
centimorgan (cM). This ‘genetic distance’ can provide 
an accurate relative positioning of genetic markers. 
However, it only roughly corresponds to actual phys- 
ical distance (number of base pairs of DNA between 
loci), since different regions of the genome may have 
more frequent or less frequent crossover events during 
meiosis. In addition, genetic linkage maps must be
viewed as reflecting a biologic process in which
individual variation may be influenced by a large number
of factors. For example, the recombination frequency 
between homologous chromosomes can be substan- 
tially different in oogenesis than in spermatogenesis. 
Although, in general, there is more frequent recombin- 
ation in female meioses, different regions of the gen- 
ome show different relationships and there are even 
chromosomal segments in which the recombination 
frequency is greater in male meioses. Other studies 
have indicated that recombination frequency is itself
influenced by genetic factors; meiotic recombination 
frequency may differ in crosses between different 
strains of mice and to some extent in different human 
populations. These factors must be considered when 
utilizing information from mammalian linkage maps. 


Construction of Linkage Maps 


The actual linkage maps are derived by a process of
linkage analysis or segregation analysis, in which the
likelihood of a nonrandom relationship between
various loci is measured and maps are determined and/or
verified by the application of sophisticated statistical
algorithms. For human linkage maps, the logarithm
of the odds (LOD) score provides a measure
of the strength of linkage relationships at the
optimized recombination distance. The LOD score is the
log10 of the ratio of the likelihood that two loci are
linked and separated by a specific genetic distance to
the likelihood that the observed results would be
obtained if the two loci were not linked. Although
the concept is simple, the actual generation of the
maps requires advanced algorithms that can determine
LOD scores for a continuous range of possible
recombination frequencies (termed θ) between markers in
multiple complex pedigrees (e.g., Lathrop et al., 1985)
(http://linkage.rockefeller.edu/bib/algorithms/). The
entire human linkage map contains a sex-averaged
map distance of about 3500 cM, with each of the 22
autosomes and the X chromosome varying
considerably in genetic distance.
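The LOD definition can be illustrated for the simplest case, sketched in Python under a loudly simplifying assumption: every meiosis is phase-known and fully informative, so the likelihood is $L(\theta) = \theta^R (1-\theta)^{n-R}$ for $R$ recombinants among $n$ meioses, and the unlinked likelihood is $L(0.5) = 0.5^n$. The grid scan stands in for the continuous range of $\theta$ mentioned above. The pedigree algorithms cited in the text handle unknown phase and missing data, and this sketch is not a model of them. The data are hypothetical.

```python
import math


def _count_log10(count, p):
    """count * log10(p), treating 0 * log10(0) as 0."""
    if count == 0:
        return 0.0
    if p == 0.0:
        return float("-inf")
    return count * math.log10(p)


def lod(recombinants, meioses, theta):
    """Two-point LOD score for phase-known, fully informative meioses."""
    if not 0.0 <= theta <= 0.5:
        raise ValueError("theta must lie between 0 and 0.5")
    nonrecombinants = meioses - recombinants
    log_linked = (_count_log10(recombinants, theta)
                  + _count_log10(nonrecombinants, 1.0 - theta))
    log_unlinked = meioses * math.log10(0.5)  # theta = 0.5 means no linkage
    return log_linked - log_unlinked


def max_lod(recombinants, meioses, step=0.001):
    """Scan a grid of theta values in [0, 0.5]; return (theta_hat, max LOD)."""
    grid = [min(0.5, i * step) for i in range(int(round(0.5 / step)) + 1)]
    return max(((theta, lod(recombinants, meioses, theta)) for theta in grid),
               key=lambda pair: pair[1])


# Hypothetical data: 2 recombinants among 20 scored meioses.
theta_hat, score = max_lod(2, 20)
print(f"theta_hat = {theta_hat:.3f}, LOD = {score:.2f}")
```

Here the maximum falls at θ = 0.1 with a LOD of about 3.2, i.e. odds of roughly 1600:1 in favor of linkage at that distance.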

For the mouse, analysis of haplotypes in defined 
crosses has provided the most accurate relationship 
between markers. This analysis simply involves mini- 
mizing the total number of crossover events between 
linked loci. Here the observation that positive inter- 
ference (the decreased frequency of crossover events 
occurring near other crossover events) is very strong 
in the mouse provides even more confidence in rela- 
tive gene orders determined in a single cross within 
small (<10 cM) intervals. The mouse linkage map of
the 19 autosomes and the X chromosome totals
approximately 1500 cM.
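The “minimize the total number of crossovers” criterion can be sketched by brute force in Python. Each hypothetical backcross haplotype records, for every locus, which parental strain (“A” or “B”) contributed the allele. A crossover is counted wherever adjacent loci switch origin, and the order with the fewest total crossovers is kept. Exhaustive search is only practical for a handful of loci, and this sketch makes no claim about how any particular mapping program works.

```python
from itertools import permutations


def count_crossovers(haplotype, order):
    """Crossovers implied by one backcross haplotype for a given locus order.

    haplotype -- dict mapping locus name -> parental origin ("A" or "B")
    order     -- tuple of locus names along the chromosome
    """
    origins = [haplotype[locus] for locus in order]
    return sum(1 for left, right in zip(origins, origins[1:]) if left != right)


def best_order(haplotypes, loci):
    """Return (order, total) for the order with the fewest total crossovers."""
    best = None
    for order in permutations(loci):
        if order[0] > order[-1]:
            continue  # an order and its reverse imply the same crossovers
        total = sum(count_crossovers(h, order) for h in haplotypes)
        if best is None or total < best[1]:
            best = (order, total)
    return best


# Hypothetical backcross haplotypes for three markers.
haplotypes = [
    {"m1": "A", "m2": "A", "m3": "A"},
    {"m1": "B", "m2": "B", "m3": "B"},
    {"m1": "A", "m2": "B", "m3": "B"},
    {"m1": "A", "m2": "A", "m3": "B"},
    {"m1": "B", "m2": "A", "m3": "A"},
]
print(best_order(haplotypes, ["m1", "m2", "m3"]))  # (('m1', 'm2', 'm3'), 3)
```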


In general, the confidence in a map position can be 
estimated and described by a variety of algorithms. 
These include tests that determine: (1) the likelihood 
of alternative gene orders; (2) a LOD3 interval (indi- 
cating which positions are 1000-fold more likely 
than alternative positions); and (3) a Bayesian 95% 
interval to describe the limits of a particular genetic 
mapping. In the mouse a standard error formula,
$[r(1 - r)/n]^{1/2}$, where $r$ is the recombination frequency
and $n$ is the number of progeny examined, is also commonly
used.
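The standard error formula can be applied directly, as in this small Python sketch with hypothetical counts. The conversion to centimorgans uses the 1% = 1 cM definition given earlier and is reasonable only for small recombination frequencies.

```python
import math


def recombination_se(recombinants, n):
    """Recombination frequency r = R / n and its standard error sqrt(r(1 - r) / n)."""
    if n <= 0:
        raise ValueError("n must be positive")
    r = recombinants / n
    return r, math.sqrt(r * (1 - r) / n)


# Hypothetical cross: 10 recombinants among 100 progeny examined.
r, se = recombination_se(recombinants=10, n=100)
print(f"r = {r:.3f} +/- {se:.3f}  ({100 * r:.1f} +/- {100 * se:.1f} cM)")
```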


Composite Linkage Maps 


In both human and mouse, linkage maps have been 
compiled that contain thousands of markers. Linkage 
maps from most other mammalian species currently 
have limited numbers of precisely defined markers 
and will not be discussed further. Perhaps the most
useful human and mouse linkage maps are composite
maps that contain information from a wide variety of
studies and include traits as well as genes and
‘anonymous markers.’ For humans, this information
can be obtained from the internet in a variety of forms
from multiple sites, including the Genome Data Base
(http://gdbwww.gdb.org/gdb/gdbtop.html), the
Genethon Human Genome Research Centre
(http://www.genethon.fr/genethon_en.html/), the
Cooperative Human Linkage Consortium
(http://www.chlc.org/), and the Marshfield Center for
Medical Research (http://www.marshmed.org/genetics/),
as well as chromosome-specific web sites. In the
mouse, this information can be obtained from the
Mouse Genome Informatics (MGI) web site
(http://www.informatics.jax.org/), including the mouse
chromosome committee report composite maps
(http://www.informatics.jax.org/bin/ccr/index).

The composite linkage maps include a wide array of 
markers defined by many different techniques. The 
most common method relies on examining length 
variation of polymerase chain reaction (PCR) ampli- 
fied segments that contain microsatellite repetitive 
elements. These common repetitive elements are sim- 
ple tandem sequence repeats (SSRs) of primarily di-, 
tri-, or tetranucleotides. Other assays that have been 
used for these linkage maps include detection of 
variable number of tandem repeat polymorphisms
(VNTRs), restriction fragment length polymorphisms
(RFLPs), and other measurements of single nucleotide 
sequence variation including polymorphisms defined 
by random sequence oligonucleotide primers and many 
other methods. 
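Microsatellite alleles differ in the number of repeat units, so the PCR product length equals the flanking sequence length plus the repeat count times the motif length. The regular-expression scanner below is a minimal sketch of locating di-, tri-, and tetranucleotide repeats in a sequence. The threshold of five copies and the test sequence are assumptions for illustration; dedicated repeat-finding tools use more elaborate scoring.

```python
import re


def find_ssrs(seq: str, min_repeats: int = 5):
    """Locate simple tandem repeats of 2-, 3-, or 4-nucleotide motifs.

    Returns a sorted list of (0-based start, motif, repeat count) tuples for
    runs with at least `min_repeats` tandem copies of the motif.
    """
    seq = seq.upper()
    hits = []
    for k in (2, 3, 4):
        # ([ACGT]{k}) captures one motif copy; \1{m,} demands m or more further copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, min_repeats - 1))
        for match in pattern.finditer(seq):
            motif = match.group(1)
            # Skip motifs that are repeats of a shorter unit (e.g. "CACA" is really a CA repeat).
            if any(motif == motif[:d] * (k // d) for d in range(1, k) if k % d == 0):
                continue
            hits.append((match.start(), motif, len(match.group(0)) // k))
    return sorted(hits)


# Hypothetical sequence with a (CA)12 and an (ATG)6 repeat.
seq = "GGATC" + "CA" * 12 + "TTGA" + "ATG" * 6 + "CC"
for start, motif, count in find_ssrs(seq):
    print(f"({motif}){count} at position {start}")
```

An allele carrying (CA)13 instead of (CA)12 would give a PCR product 2 bp longer, which is the length variation that is scored.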

For human maps, the use of the CEPH families (see 
http://www.cephb.fr/), a set of complex pedigrees that 
were distributed as part of a major effort in human 
chromosomal map building, provided a measure of
integration between laboratories and quality control 
in the development of genome-wide linkage maps. 
For the mouse, several large mapping panels were 
developed, in which many markers were mapped 
relative to each other, and have been instrumental in 
the advent of reasonable, representative composite 
maps. However, whenever information from different 
sources and crosses or families is combined there is 
some uncertainty in the precise relationship between 
markers. 

For human linkage maps, the vast majority of 
markers are microsatellite repetitive elements. Since 
these are highly polymorphic, informative meioses 
can be readily identified in a substantial percentage 
of the samples analyzed. Since relatively few genes 
have been characterized to date as having frequent 
polymorphisms, there is a scarcity of genes in present 
linkage maps derived from analysis of meiotic re- 
combination. Attempts at integration of these linkage 
maps and the position of genes and expressed se- 
quence tags (ESTs) will be discussed later. 

In the mouse, several mapping panels of interspecific
or intersubspecific backcross or intercross mice have
been used to generate maps containing hundreds or
thousands of markers (the data for many of these
panels are available from MGI in the mapping data
section, http://www.informatics.jax.org/crossdata.html).
In particular, crosses between laboratory strains of
mice (predominantly Mus musculus domesticus) and
the species Mus spretus have been used. These two
species are estimated to have diverged about 3 million
years ago and are sufficiently different that
polymorphisms can be detected in virtually all genes
or cloned nonrepetitive genomic sequences by RFLP
analysis. Thus, backcross progeny of these crosses can
be typed for large numbers of markers in the same
panel of potentially informative meioses. The resulting
individual interspecific cross linkage maps provide the
most accurate mammalian genetic maps and, more
importantly, include genes.
However, it must be stressed that the map positions
in mouse composite linkage maps combine information
from a wide variety of disparate crosses or other
genetic techniques. These maps typically include data
from recombinant inbred strain analyses, as well as
backcross and intercross breeding schemes. In
addition, since recombination frequency between
disparate strains of mice may vary considerably, the
relative positions of many genes in composite maps
cannot generally be regarded as definitive. Depending
on the underlying data and on how positions have been
interpolated (including whether additional
information, e.g., progeny testing data, is available or
used), the 95% confidence interval for the position of a
marker relative to other markers, genes, or traits may
be large. A measure of the confidence in the
chromosomal linkage map position of each entry is
given in the Mouse Chromosome Committee reports
(Encyclopedia of the Mouse Genome VI, 1997, or see
http://www.informatics.jax.org/bin/ccr/index).


Utilizing Human Linkage Maps 


At present, the prevailing technique utilized for link- 
age studies of human diseases is to map the trait with 
respect to highly polymorphic microsatellites. Most of 
these microsatellite markers are not specifically asso- 
ciated with coding sequences (i.e., they are not derived 
from relatively small clones containing known genes). 
However, for initial localization of a trait, the critical 
factors are how polymorphic a marker is, how easily it 
can be typed (i.e., for reasonably high throughput), 
and how well the marker has been previously mapped. 
Many of the internet sites discussed previously provide 
information on the heterozygosity scores of markers, 
as well as data indicating the two-point or multipoint 
linkage relationships between many of the markers. 
Thus, markers can be chosen that can enable genome- 
wide scans for susceptibility genes in both Mendelian 
and certain complex genetic diseases. In the future,
high throughput typing of single nucleotide
polymorphisms (Wang et al., 1998, and see
http://www.genome.wi.mit.edu/SNP/human/index.html)
may also be used, either to refine regional linkage
studies or in genome-wide scans. However, optimal
use of linkage maps will require establishing the
linkage or physical relationship between these
polymorphisms and the markers currently in use.
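Marker informativeness is usually summarized by heterozygosity, H = 1 − Σpᵢ², where pᵢ are allele frequencies. A closely related measure is the polymorphism information content (PIC), PIC = H − Σᵢ<ⱼ 2pᵢ²pⱼ². PIC is not discussed in the text and is included here as an assumption. The sketch below compares a hypothetical biallelic SNP with a hypothetical eight-allele microsatellite, illustrating why individual SNPs are less informative than highly polymorphic microsatellites.

```python
from itertools import combinations


def heterozygosity(freqs):
    """Expected heterozygosity H = 1 - sum(p_i^2), assuming Hardy-Weinberg proportions."""
    if abs(sum(freqs) - 1.0) > 1e-6:
        raise ValueError("allele frequencies must sum to 1")
    return 1.0 - sum(p * p for p in freqs)


def pic(freqs):
    """Polymorphism information content: H minus sum over allele pairs of 2 p_i^2 p_j^2."""
    return heterozygosity(freqs) - sum(2 * (p * q) ** 2 for p, q in combinations(freqs, 2))


snp = [0.5, 0.5]               # hypothetical biallelic SNP
microsatellite = [0.125] * 8   # hypothetical marker with 8 equally common alleles
for name, f in [("biallelic SNP", snp), ("8-allele microsatellite", microsatellite)]:
    print(f"{name}: H = {heterozygosity(f):.3f}, PIC = {pic(f):.3f}")
```

Even at its best (equal allele frequencies), a biallelic marker reaches H = 0.5, whereas the hypothetical microsatellite reaches H = 0.875. This is one reason several SNPs may be needed to match the information content of a single microsatellite.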


Integration of Human Linkage Maps with 
Genes and Physical Maps 


As discussed above, the human linkage map is largely 
devoid of genes. It is, however, often very useful to 
know the relationship between genes and the an- 
onymous polymorphic markers in the linkage map. A 
major effort utilizing radiation hybrid mapping has 
recently provided a good framework for integrating 
the position of genes with respect to the genetic link- 
age maps. Human radiation hybrids allow the devel- 
opment of a type of linkage map that is based on 
whether any specific segment of DNA from an ir- 
radiated human donor cell has been retained in cross- 
species somatic cell hybrids. In these maps, the
distance between markers is measured in centirays
(cR), where one centiray corresponds to a 1%
frequency of X-ray-induced breakage between two
markers at a specified radiation dose (in rads). These
maps include genes, as well as anonymous sequences
including microsatellites. The results of a
consortium of many investigators allowed the relative 
ordering of several thousand genes, ESTs, and other 
sequence-tagged sites including the polymorphic 
microsatellites used in genetic linkage maps (Schuler
et al., 1996, and see http://www.ncbi.nlm.nih.gov/genemap/).
Therefore, it is possible to determine a
probable range of the Genethon or other marker pos- 
itions with respect to ESTs. This can be extremely 
useful in searching for candidate ESTs for human dis- 
eases, if, for example, a critical interval containing the 
putative ‘disease’ gene has been defined using markers 
included in the Genethon map. If the markers are not 
in the Genethon map, then finding common markers 
between maps and interpolating may be necessary. 
The relationship among anonymous markers can also 
be examined in other radiation hybrid maps, including 
the large compilation of radiation hybrid data at the
Stanford University and Whitehead Institute genome
sites (http://www-shgc.stanford.edu/RH/index.html,
http://carbon.wi.mit.edu:8000/cgi-bin/contig/phys_map).
It should be noted that for most radiation hybrid
mapping results, the high-confidence groupings (bins
in which the relative order is 1000-fold more likely
than other orders) correspond to ranges of
approximately 10 cM on the standard linkage maps
defined by meiotic recombination discussed above.
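Finding common markers between maps and interpolating can be sketched as piecewise-linear interpolation between shared anchor markers. This sketch assumes that the anchors lie in the same order on both maps and that, locally, distances scale roughly linearly between a linkage map (cM) and a radiation hybrid map (cR). That is only an approximation, because the two distance measures are not proportional across the genome. All marker positions and EST names below are hypothetical.

```python
from bisect import bisect_left


def interpolate_position(pos, anchors):
    """Translate a position from map A to map B by linear interpolation.

    anchors: (position on map A, position on map B) pairs for markers typed on
    both maps. The anchors are assumed to be collinear (same order on both maps).
    """
    anchors = sorted(anchors)
    a_pos = [a for a, _ in anchors]
    b_pos = [b for _, b in anchors]
    if any(b1 < b0 for b0, b1 in zip(b_pos, b_pos[1:])):
        raise ValueError("anchor order differs between the two maps")
    if not a_pos[0] <= pos <= a_pos[-1]:
        raise ValueError("position lies outside the anchored region")
    i = bisect_left(a_pos, pos)
    if a_pos[i] == pos:
        return b_pos[i]
    # pos lies strictly between anchors i - 1 and i on map A.
    a0, a1, b0, b1 = a_pos[i - 1], a_pos[i], b_pos[i - 1], b_pos[i]
    return b0 + (pos - a0) * (b1 - b0) / (a1 - a0)


# Hypothetical shared markers: (linkage map position in cM, RH map position in cR).
anchors = [(10.0, 100.0), (14.0, 180.0), (20.0, 240.0)]
# Hypothetical critical interval from a linkage study, 12.5-17.0 cM.
low, high = (interpolate_position(x, anchors) for x in (12.5, 17.0))
# Hypothetical EST positions on the RH map (cR).
ests = {"EST_a": 120.0, "EST_b": 175.0, "EST_c": 230.0}
candidates = sorted(name for name, cr in ests.items() if low <= cr <= high)
print(low, high, candidates)
```

The hypothetical critical interval maps to 150-210 cR, leaving EST_b as the only candidate in range. In practice the interval should be widened to reflect the uncertainty in both maps.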

Other ongoing efforts have incorporated many of 
the genetic linkage map markers in the development 
of contigs (e.g., http://www-genome.wi.mit.edu/, 
http://www.cephb.fr/bio/ceph-genethon-map.html, 
http://www.nhgri.nih.gov/DIR/GTB/CHR7/,
http://gc.bem.tmc.edu:8088/bio/yac_search.html). Thus, the
polymorphic markers used in linkage can begin to be 
integrated within the physical map. As the human gen- 
ome project proceeds through its current sequence- 
intensive step, establishing the precise position of 
these markers will become possible. However, linkage 
maps will, as discussed above, continue to be useful in 
the foreseeable future for efforts at positional cloning 
of genes corresponding to traits. The increasing avail- 
ability of sequencing data will obviously provide 
another method for the integration of putative coding 
sequences and single nucleotide polymorphisms with 
the genetic linkage maps. 


Utilizing Mouse Linkage Maps 


As discussed above, linkage maps of the mouse 
genome contain a variety of markers. For micro- 
satellites, the majority have been placed in a single- 
cross-defined linkage map in which there is strong 
confidence in the relative positions of most of the 
markers for even small genomic intervals (1-2 cM; 
http://carbon.wi.mit.edu:8000/cgi-bin/mouse/index). 


Other linkage maps contain large numbers of genes 
and enough microsatellite markers to allow reasonable 
integration with other markers not included in these 
maps. The chromosomal positions of these markers 
and genes can be used to map specific traits in a man- 
ner similar to that employed in human genome screen- 
ing. The cross-specific linkage maps and their derived 
composite maps (discussed above) have thus allowed a 
large number of traits to be placed in specific intervals 
and have facilitated positional cloning projects. 


Using Linkage Maps for Defining 
Homology Relationships 


Homology relationships can be very valuable in further 
utilizing linkage maps for linking genes to phenotypes. 
Many studies have indicated that mammalian genomes 
are mostly composed of chromosome segments that 
have been conserved over 100 million years of evolu- 
tion. Review of human—mouse homology relation- 
ships suggests that there are over 200 such segments 
(DeBry and Seldin, 1996 and see http://www.ncbi. 
nlm.nih.gov/Homology/). For Mendelian traits, sev- 
eral examples can be cited in which information 
from either human or mouse studies has expedited the 
molecular definition of disease in the other species. 
Although it is less certain that these relationships will
be as useful for complex genetic diseases, they can
provide the first insight into whether or not animal
models are likely to be a major adjunct to human studies.

Homology relationships can allow the use of what 
might be termed ‘virtual maps,’ in which all of the 
genes or ESTs located in a disparate species can be 
putatively placed in a linkage map of the species in 
question. This can suggest candidate genes for traits or 
markers that might be used to test for linkage in the 
other species. However, it is important to apply some 
critical evaluation of homology data to provide some 
assurance that orthologous (the same gene in both 
species) genes/ESTs are utilized, since related (para- 
logous) genes can result in incorrect interpretations. 
Finally, many of the borders of these homology rela- 
tionships are not well defined and will require further 
resolution in one or the other species. 
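One widely used heuristic for the ortholog-versus-paralog problem is the reciprocal best hit: genes a and b from two species are called putative orthologs only if each is the other's highest-scoring match. This criterion is not described in the text and is offered only as an illustration; the gene names and similarity scores below are hypothetical. The heuristic can still mislead when lineage-specific duplications or gene losses have occurred.

```python
def best_hits(scores):
    """Map each query gene to its highest-scoring subject gene.

    scores: {query: {subject: similarity score}}. Ties go to the first subject listed.
    """
    return {q: max(hits, key=hits.get) for q, hits in scores.items() if hits}


def reciprocal_best_hits(forward, reverse):
    """Pairs (a, b) where b is a's best hit and a is b's best hit."""
    fwd, rev = best_hits(forward), best_hits(reverse)
    return sorted((a, b) for a, b in fwd.items() if rev.get(b) == a)


# Hypothetical similarity scores: H1 and H2 are related human genes,
# m1 and m2 their candidate mouse counterparts.
human_vs_mouse = {"H1": {"m1": 950, "m2": 600}, "H2": {"m1": 700, "m2": 640}}
mouse_vs_human = {"m1": {"H1": 950, "H2": 700}, "m2": {"H1": 600, "H2": 640}}
print(best_hits(human_vs_mouse))
print(reciprocal_best_hits(human_vs_mouse, mouse_vs_human))
```

A one-way search sends both H1 and H2 to m1. Placing H2 at the position of m1 in a virtual map would therefore borrow the location of a paralog. The reciprocal test keeps only H1-m1 and leaves H2 unassigned, which is conservative but safer.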
Linker DNA is a short, self-complementary, palindromic
DNA molecule that forms a blunt-ended duplex
containing a recognition sequence for a restriction
endonuclease. The linker DNA is generally blunt-end
ligated between two blunt-ended DNA fragments to
introduce a restriction site.
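A strand is self-complementary when it equals its own reverse complement, so two copies can anneal into a blunt-ended duplex. The sketch below checks this property and tests for an embedded recognition sequence. The 10-mer linker is a hypothetical example; GAATTC is the EcoRI recognition site.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Complement each base, then reverse the strand (5'->3' on the other strand)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def is_self_complementary(seq: str) -> bool:
    """True if the strand can pair with an identical strand along its whole length."""
    return seq.upper() == reverse_complement(seq)


linker = "CGGAATTCCG"  # hypothetical 10-mer linker
site = "GAATTC"        # EcoRI recognition sequence
print(is_self_complementary(linker), site in linker)
```

A self-complementary sequence must have even length, because the central base of an odd-length strand would have to pair with itself.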


See also: Restriction Endonuclease 
It is evident that genetics will have a major influence
on everyday life in our modern society, and in areas 
such as predictive and therapeutic medicine and bio- 
technology the impact will be profound. Identifica- 
tion and characterization of the genes involved in 
genetic diseases have already made significant contri- 
butions to diagnosis and to both an understanding of 
therapy and suggestions for novel therapies (including 
gene therapy). It is well established that in benign as 
well as malignant tumors, recurrent genetic aberra- 
tions are regularly found by cytogenetic analysis, 
which are often translocations involving well-defined 
chromosome regions. Sometimes, such genetic lesions 
are characteristic of a particular tumor type suggesting 
the involvement of tumor type-specific genes, and 
occasionally the same chromosomal region is affected 
in a number of tumors and this possibly indicates a 
common genetic denominator in these diseases. Since 
such recurrent cytogenetic aberrations are often 
the sole chromosomal anomalies present, they are 
believed to represent critical molecular triggers of
aberrant growth control in tumorigenesis. 

Molecular evaluation of the chromosome break- 
point regions in two benign solid tumor types of 
mesenchymal origin, i.e., lipomas and uterine leiomyo- 
mas, has recently led to the identification of the first 
genes that are frequently targeted in these tumors by 
chromosomal defects. The architectural transcription 
factor genes, HMGIC and HMGI(Y), appear to be 
important targets. Furthermore, preferential trans- 
location partner genes of HMGIC have also been 
identified. In uterine leiomyoma, it is almost exclu- 
sively the RAD51L1 gene on the long arm of chromo- 
some 14. Structurally, this gene is listed as a member of 
the recA/RAD51 recombination-repair gene family 
and its protein product displays protein kinase activ- 
ity. The preferential translocation partner of HMGIC 
in lipoma is the LPP gene on chromosome 3, which 
encodes a LIM protein that enables communication 
between sites of cell adhesion and the cell nucleus. It 
should be noted that the HMGIC gene is also targeted 
by chromosomal aberrations in a variety of other 
benign solid tumors, including pleomorphic adeno- 
mas of the salivary glands, hamartomas of lung and 
breast, endometrial polyps, hemangiopericytomas, 
fibroadenomas of the breast, and chondromatous 
tumors. HMGIC clearly is a common genetic
denominator in benign mesenchymal tumor formation.
The precise functions of these genes in tumor
development remain to be established, but these
recently discovered genes form reliable starting points
for further molecular genetic studies of these lesions.
Molecular cytogenetic data of
both uterine leiomyoma and lipoma are discussed in 
more detail in this article. 


Uterine Leiomyoma 


Pathology of Leiomyomas of the Uterus 

Leiomyomas or myomas are benign tumors of smooth 
muscle cells and they are most frequently found in the 
genitourinary and gastrointestinal tracts and less fre- 
quently in the skin and in deep soft tissues (Enzinger 
and Weiss, 1995). Uterine leiomyomas (fibroids) 
represent the most common pathological growth in 
the female reproductive tract, occurring with a report- 
ed incidence of up to 77% of all women of reproduct- 
ive age (Cramer and Patel, 1990). However, these 
mesenchymal tumors are rare in women below the 
age of 18 and, furthermore, they are more frequent 
in black than in white women. Affected women 
complain of fibroid-related symptoms, e.g., abnormal 
uterine bleeding, pelvic pain, or urinary dysfunction. 
Fibroids may also interfere with pregnancy, leading 
to premature delivery or even fetal wastage. Since 
the current long-term nonsurgical management of
leiomyomas (hormone replacement therapy) is associated
with major side effects, more and more women
are directly seeking some form of surgery to remove 
their fibroids. This has led to a situation in which in 
the United States alone, uterine leiomyomas are the 
leading indication for about 300,000 hysterectomies
performed annually. 


Cytogenetics of Uterine Leiomyoma 

Besides a normal karyotype, which is found in
approximately 70% of the cases investigated, several
cytogenetically abnormal subgroups (Figure 1) can be
distinguished (Mitelman, 1998). Excluding the group
with random changes, one of the largest cytogenetic
subgroups (comprising approximately 25% of the
cytogenetically abnormal tumors) is characterized by
the involvement of 12q14-q15 and/or 14q23-q24,
mainly as t(12;14)(q14-q15;q23-q24). Another
subgroup, with a similar incidence, contains deletions
involving the long arm of chromosome 7, with region 
q21-q22 being the commonly involved chromosomal 
segment. Another subset of uterine leiomyomas is 
characterized by numerical aberrations, mainly tri- 
somy 12. This trisomy is found in approximately 10% 
of the cytogenetically abnormal cases. Furthermore, 
chromosome 6p21-pter has been found to be recur- 
rently involved in roughly 5% of the cases studied. 
Finally, a small percentage (approximately 3.5%) of 
uterine leiomyomas shows t(1;2)(p36;p24). As will be 
discussed below, chromosome 12q13-q15 anomalies 
are frequently found in lipomas. In fact, as outlined 
above, they are encountered in a variety of other 
benign solid tumors as well. In general, these karyo- 
typic changes are balanced and simple. The fact that 
these translocations are often the first or sole cytogen- 
etically visible anomalies suggests that, pathogenetic- 
ally, they are of critical importance in these tumors. 


Genes Affected in Uterine Leiomyoma 


Implication of the high mobility group protein genes 
HMGIC and HMGI(Y) 

Using a classical positional cloning approach, the 
chromosome 12q14-q15 breakpoints in a number of 
uterine leiomyomas were mapped first within a 
1.7 Mb DNA region on the long arm of chromosome 
12. In subsequent FISH studies, it was conclusively 
demonstrated that many of the chromosome 12 break- 
points were clustering within a relatively small 
(175 kb) DNA segment, identifying it as a major target 
area. A single transcribed sequence was identified in 
this target area and it appeared to correspond to the 
human HMGIC gene (Schoenmakers et al., 1995), 
which is a member of the high mobility group 
(HMG) protein gene family. The HMGIC gene (for 
review, see Jansen et al., 1999) consists of five exons
and spans about 175 kb (Figure 2A). The gene con- 
tains one large intron, i.e., intron 3, which spans about 
140 kb. The HMGIC protein has three DNA-binding
domains (each a basic DNA-binding motif of nine
amino acids, also referred to as the AT-hook) and an
acidic C-terminal domain. The location of the breakpoints
with respect to HMGIC is variable and has been 
found 5’ to the gene, in its 3’ nontranslated region, as 
well as in one of its introns. The intragenic break- 
points frequently occur in the large third intron. In 
such cases, the three DNA-binding domains in the 
N-terminal region of the protein become separated 
from the acidic, C-terminal domain. Furthermore, 
it is of interest to note that another member of the 
HMG protein gene family, i.e., the HMGI(Y) gene 
(Figure 2B), which maps at chromosome 6p21, is also 
implicated in uterine leiomyoma. 

HMG proteins (Bustin et al., 1990) are named after 
their fast electrophoretic migration at acidic pH, and 
were first discovered in the 1960s as contaminants in 
calf thymus histone H1 preparations. They are
operationally defined as small (mol. wt < 30 kDa),
abundant, 2% TCA/2-5% perchloric acid-soluble,
nonhistone proteins that are extractable from
chromatin with 0.35 M NaCl and have a high content
of acidic and basic amino acid residues. Since this
definition is based on physical and chemical rather
than functional features, it should be clear that the
HMG protein family is an artificial group of proteins
with possibly unrelated functions. Based on their
primary structure, three subfamilies of HMG proteins
can be distinguished, i.e., the HMG1/2, the HMG14/17,
and the HMGI class, to which the proteins encoded by
HMGIC and HMGI(Y) belong.

Figure 1 Cytogenetics of uterine leiomyomas. Schematic representation of the different cytogenetic subgroups of uterine leiomyomas.

The HMGI subfamily consists of three members:
HMGI, HMGY, and HMGIC (Figure 2B). HMGI
and HMGY are isoforms resulting from differential
processing of the same parental messenger RNA
(mRNA). Except for a stretch of 11 contiguous amino
acids, which are present in HMGI but not in HMGY,
the two proteins, often referred to as HMGI(Y), are
identical (Figure 2B). HMGI proteins (mol. wt around
10 kDa) have been shown to display a significant
preference for the narrow minor groove of certain
types of stretches of AT-rich, B-form DNA in vitro,
and conserved (TATT)n motifs in the 3’ untranslated
regions (UTRs) of certain genes have been identified
as preferential binding sites. Furthermore, HMGI
proteins bind specifically to the AT-rich octamer
sequence associated with a number of promoters and
also to AT-rich regulatory
elements of the ribosomal genes. However, it should
be kept in mind that this preference for certain
AT-rich stretches has been shown to be caused by
recognition of substrate structure rather than
nucleotide sequence.

Figure 2 Structure of the human HMGIC gene and its protein product and the human RAD51L1 gene products.
(A) On the HMGIC gene map, the exons are depicted as boxes, with the 5’- and 3’-untranslated regions represented
as shaded areas. The numbers below the map indicate the intron sizes in kilobase pairs. The dashed lines indicate
which regions of the HMGIC protein are encoded by the individual exons, and the amino acid numbering above
the protein map marks the boundaries of the various DNA-binding and acidic domains. (B) HMGIC amino acid
sequence aligned with HMGI and HMGY. (C) Schematic representation of the three alternative RAD51L1 mRNA
splice variants (exons are numbered). The relative positions of two highly conserved nucleotide-binding Walker
domains are marked by asterisks, and the numbers of amino acids encoded by the three alternative terminal coding
exons are indicated. Arrows mark the positions of chromosome breakpoints found in the RAD51L1 gene in various
uterine leiomyomas.

As far as expression patterns of HMGI(Y) and 
HMGIC are concerned, there seems to be a link to 
cell proliferation. Expression of the HMGIC gene is 
tightly linked to growth, since it is mainly expressed 
during early development and in growing cells. 
Furthermore, it responds to serum induction as a 
delayed early response gene (Ayoubi et al., 1999). 
Finally, homozygous disruption of the Hmgic gene 
in mice leads to the pygmy phenotype (Zhou et al., 
1995). The observation that the HMGI proteins are 
developmentally regulated and constitute abundant 
proteins might indicate that they could be involved 
in the regulation of many genes, some possibly 
involved in cell growth. The fact that HMGI(Y) is 
known to cause a more general regulatory effect on 
transcription through modification of chromatin 
structure by inducing DNA bends, thereby facilitat- 
ing the assembly of transcriptionally active nucleo- 
protein complexes (Grosschedl et al., 1994), has
resulted in the definition of so-called ‘architectural
transcription factors’ (Wolffe, 1994; Lovell-Badge,
1995), of which HMGI(Y) is the founding member. 
Indeed, studies on the role of HMGI(Y) in the
induction of IFN-β gene expression (Falvo et al., 1995) point
toward a more architectural role for HMGI(Y), and 
have resulted in the model that the HMGI proteins as 
a group, just like other architectural transcription fac- 
tors, might function as ‘facilitators’ of gene expression 
(Figure 3). The intriguing question remains as to how 
particular genetic changes in such facilitators result in 
aberrant cell proliferation of a benign nature.
Identification of the spectrum of their target genes is
an important objective for future research on the
HMGIC proteins.


Implication of RAD51L1, the Chromosome 14
Translocation Partner Gene of HMGIC

Using a positional cloning approach, the RAD51L1
gene on human chromosome 14q23-q24 was recently
identified as the almost unique translocation partner of
HMGIC in uterine leiomyomas (Schoenmakers et al.,
1999). The RAD51L1 gene (also known as R51H2 and
hREC2) is a member of the recA/RAD51
recombination-repair gene family. The gene, which
contains 11 exons, expresses three distinct mRNA
isoforms, which differ only in the sequences of their
last exons (exon 11) (Figure 2C). Two isoforms are
broadly expressed, and their different last exon
sequences encode only five amino acids. The third
isoform displays a highly restricted expression pattern
but is expressed in the uterus. Studies of uterine
leiomyomas seem to indicate that the pathogenetically
critical sequences reside in the last coding exon
(encoding 80 amino acids, including a putative
membrane anchor) of this third RAD51L1 isoform. It
appears that allelic knockout of the third splice variant
of RAD51L1, resulting in expression of truncated and
C-terminally altered RAD51L1 proteins, is a
tumor-specific feature of uterine leiomyomas with
t(12;14)(q15;q23-q24) translocations. The precise
physiological function(s) of the various isoforms of
RAD51L1 in normal cells and the role(s) of their
truncated variants in uterine leiomyoma remain to be
elucidated. A closely related family member of
RAD51L1, RAD51A, has been shown to promote
ATP-dependent homologous pairing and strand
transfer reactions in vitro, to play an essential role in
mammalian cell viability, and to be linked
etiologically to cancers because of its interaction with
p53, BRCA1, and BRCA2. However, the typical
recombinase activity of members of the recA/RAD51
gene family has not yet been established for
RAD51L1. On the other hand, it has been shown that
overexpression of RAD51L1 in mammalian cells
results in a delay in G1. Recently, it was reported that
RAD51L1 exhibits protein kinase activity and is able
to phosphorylate various substrates, including p53,
cyclin E, and cdk2, but not a peptide substrate
containing tyrosine residues only (Havre et al., 2000).

Figure 3 Signal transduction model for HMGIC and LPP. Schematic representation of the delayed, early response
activation of the HMGIC gene via the growth factor (GF)-receptor (R)-mediated activation of MAP kinases (MAPK).
The HMGIC protein binds to regulatory regions of its respective target genes. The bottom part of this figure
shows the localization of wild-type LPP in focal adhesions, together with other structural components of focal adhesions.


Lipoma 


Pathology of Lipomas 

Lipomas are benign neoplasms of adipose tissue.
Histologically, they belong to the group of lipomatous
tumors that are classified as soft tissue tumors 
(Enzinger and Weiss, 1995). Several types of benign 
lipomatous tumors can be distinguished such as ordin- 
ary benign lipoma, angiolipoma, fibrolipoma, hiber- 
noma, lipoblastoma, spindle cell/pleomorphic lipoma, 
and atypical lipomatous tumors. Lipomas are one of 
the most common soft tissue tumors and form part of 
the daily practice of many surgical pathologists. With 
rare exceptions they may occur at any age and at 
almost any anatomical location. However, in general, 
most of the lipomas become apparent between the 
fourth and sixth decade and most of these are found 
in the subcutaneous tissues of the upper back, neck, 
shoulder, and abdomen, followed in frequency by the
proximal portions of the extremities. In a minority of
cases, multiple lesions are observed, but mostly 
patients have one tumor. Ordinary lipomas (referred 
to as ‘lipoma’ throughout the rest of this article) 
are generally asymptomatic, and are mainly brought 
to the attention of a physician if they reach a large size 
or cause cosmetic problems or complications because 
of their anatomical site. As a consequence of this, the 
reported clinical incidence is probably much lower 
than the actual incidence. Microscopically, there is 
little difference between lipomas and surrounding fat 
tissue. Like fat tissue, lipomas are mainly composed of 
mature fat cells, but the cells vary slightly in size and 
shape and are somewhat larger. The tumors are usually 
thinly encapsulated and have a distinct lobular pat- 
tern. All tumors are well vascularized. Subcutaneous 
lipomas vary in size from a few millimeters to 5 cm or 
more. Occasionally, ‘giant’ cases are reported in the
literature, measuring at least 20 cm (for review, see
Sanchez et al., 1993). Deep-seated lipomas are very
rare compared with their cutaneous counterparts.
These lipomas have been detected in numerous sites 
of the body. They are often detected at a later stage of 
development, and therefore tend to be larger than 
superficial lipomas. 


Cytogenetics of Lipomas 

In the past two decades, lipomas have been studied 
extensively by cytogenetic analysis (Sreekantaiah, 
1998). These studies have demonstrated that more 
than 60% of solitary lipomas have an aberrant karyo- 
type (Mitelman, 1998) (Figure 4). In two-thirds of 
these, chromosomal region 12q13-q15 is affected 
resulting from various types of chromosome
aberrations, mainly translocations. In a quarter of
these cases, chromosome 3 at bands q27-q28 was found
to be the translocation partner of chromosome region
12q13-q15. This means that the most consistent
chromosomal aberration in lipomas is
t(3;12)(q27-q28;q13-q15), present in about 10% of all
solitary lipomas. Studies of the remaining cases
indicated that most, if not all, chromosomes are able
to act as translocation partner of 12q13-q15. The
chromosome regions most frequently involved are
1p34-p32, 2p24-p21, 5q33, 21q21-q22, 2q35, 1p36,
11q13, and 13q12-q14. Finally, supernumerary ring
chromosomes as well as complex karyotypes involving
chromosome region 12q13-q15 have been reported.
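The figure of about 10% follows from chaining the proportions given above. Taking "more than 60%" as 60% makes the product approximate, and slightly conservative:

```latex
\underbrace{0.60}_{\text{abnormal karyotype}}
\times \underbrace{\tfrac{2}{3}}_{\text{12q13-q15 involved}}
\times \underbrace{\tfrac{1}{4}}_{\text{partner at 3q27-q28}}
= 0.10
```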

Lipomas without involvement of chromosome region
12q13-q15 most often display chromosome 13q or
chromosome region 6p23-p21 rearrangements
(Figure 4). Abnormalities of 13q include deletions,
with del(13)(q12q22) being the most frequently found,
and translocations. Rearrangements of 6p23-p21 are
usually due to translocations, inversions, or insertions.
In addition, rearrangements of chromosome region 
1p36 have been found as well as supernumerary ring 
chromosomes and complex karyotypes. 

Apart from the fact that normal karyotypes are 
more common in patients younger than 30 years old, 
there appears to be no significant association between 
the cytogenetic pattern and patient sex, age, or tumor 
localization, size, or depth. Therefore, to date, the 
pathogenetic basis and clinicopathological relevance 
of the cytogenetic subtypes among lipomas remain 
unexplained (Willén et al., 1998). 


About 5-8% of all patients with lipomas have 
multiple tumors, varying in number from a few to 
several hundred lesions. These lipomas are indistin- 
guishable from their solitary counterparts. They occur 
predominantly in the upper half of the body, usually in 
the back, shoulder, and upper arm. There is a definite 
hereditary trait in about one-third of patients with 
this condition. Cytogenetic analysis of these kinds of 
tumors revealed that most multiple lipomas (98%) 
have a normal karyotype. 

Figure 4 Cytogenetics of lipomas. Schematic representation of the different cytogenetic subgroups of lipomas. The
inserted picture represents the partial karyotype from a lipoma showing a t(3;12)(q27-q28;q13-q15). Arrowheads
indicate breakpoints.

Figure 5 Schematic representation of wild-type HMGIC and LPP proteins and related fusion proteins predicted to
be expressed in lipomas. The wild-type LPP protein is predicted to consist of a proline-rich N-terminal domain and 
three LIM domains in its C-terminal region. HMGIC consists of three N-terminal DNA-binding domains and an acidic 
C-terminal tail domain. Hybrid transcripts encoding the two variants of HMGIC/LPP fusion proteins (upper part) and 
the reciprocal LPP/HMGIC fusion protein (lower part) were detected in RT-PCR analysis of primary lipomas and 
lipoma cell lines. DBD, DNA-binding domain; AD, acidic domain; LIM, LIM domain. 


Molecular Genetics of Lipomas with 
t(3;12)(q27-q28;q13-q15)

The most consistent chromosomal aberration in lipo- 
mas is represented by t(3;12)(q27-q28;q13-q15), found 
in about 10% of all solitary lipomas. It was established 
that the genes HMGIC at 12q15 and LPP (LIM con- 
taining lipoma preferred partner) at 3q27-q28 are 
affected by this preferential 3;12-translocation (Petit
et al., 1996). Furthermore, it was demonstrated that, as a
direct result of this translocation, HMGIC/LPP fusion transcripts
are expressed in these tumors (Figure 5). The HMGIC
protein is described above (see also Figure 2).

The LPP protein belongs to a recently identified 
family of proteins, also comprising Zyxin and TRIP6 
(Beckerle, 1997). They are all proline-rich in their 
N-terminal region while in their C-terminal region 
they have three LIM domains that are capable of 
mediating protein-protein interactions. In lipomas, 
two alternative HMGIC/LPP hybrid transcripts 
have been detected so far. They encode fusion proteins 
containing the three DNA-binding domains of 
HMGIC followed by: (1) part of the proline-rich 
domain and all three LIM domains of LPP; or more 
frequently (2) the two most C-terminal LIM domains 
(LIM 2-3) of LPP (Figure 5). 

Recent findings suggest that LPP might play a dual
role in the organization of the actin cytoskeleton and
in gene regulation (Petit et al., 2000). LPP is able to
shuttle between the nuclear compartment and the sites 
of cell adhesion. At the sites of cell adhesion, more and 
more proteins are being identified that not only play a 
role in maintaining cell shape and motility but that, in 
addition to these structural functions, are also impli- 
cated in signaling events. In recent years, there has been 
increasing recognition that signaling events do not 
take place freely in the cytosol of the cell but, rather, 
occur in physically and functionally distinct signaling 
units. These signaling complexes may be organized 
around scaffold/adaptor proteins containing multiple 
protein-protein interaction motifs (Pawson and Scott, 
1997). Because of this dual function, these proteins 
have to interact, via multiple binding motifs, with com- 
ponents of both the actin cytoskeleton and signaling 
pathways that regulate, for example, gene expression. 
Therefore, it is important to note that in contrast 
to wild-type LPP, the tumor-specific HMGIC/LPP 
fusion proteins are exclusively located in the nucleus 
and this may result in aberrant signaling of interacting 
proteins. In the case of the scaffold protein LPP, inter- 
acting proteins have been identified. One of these LPP 
partner proteins is also a scaffold protein, since it con- 
tains multiple protein-protein interaction domains. 
This protein is a member of the novel family of LAP
proteins (Bilder et al., 2000) and interacts via PDZ
domains with the C-terminus of LPP. Detailed
analysis of LPP and its interacting proteins should reveal
the exact nature of the signaling pathway in which
LPP participates (Figure 3).

In summary, LPP participates in a novel signal 
transduction pathway between the sites of cell adhe- 
sion and the nucleus. The ectopic expression of 
tumor-specific HMGIC/LPP fusion proteins could 
deregulate this pathway, resulting in aberrant growth. 
Defining the physiological function of this signaling
cascade in the regulation of growth and differentiation
will provide insight into the molecular mechanism of
benign solid tumor formation and may be instrumen-
tal in the development of potential therapeutic agents
that interfere with tumor growth.

One major contribution of Clarence Little was the
realization of the need for, and development of, inbred
genetically homogeneous lines of mice. The first mat-
ing to produce an inbred line was begun by Little in
1909, and resulted in the DBA strain, so-called
because it carries mutant alleles at three coat color
loci: dilute (d), brown (b), and non-agouti (a). In
1918, Little accepted a position at the Cold Spring
Harbor Laboratory, and with colleagues who followed,
including Leonell Strong and L. and E. C. MacDowell,
developed the most famous early inbred lines of mice,
including B6, B10, C3H, CBA, and BALB/c.
Although an original rationale for their development 
was to demonstrate the genetic basis for various forms 
of cancer, these inbred lines have played a crucial role 
in all areas of mouse genetics by allowing independent 
researchers to perform experiments on the same 
genetic material, which in turn allows results obtained 
in Japan to be compared directly with those obtained 
halfway around the world in Italy. A second, and more 
important, contribution of Little to mouse genetics 
was the role that he played in founding the Jackson 
Laboratory in Bar Harbor, Maine, and acting as its 
first Director. The Jackson Laboratory has become a
crucial center for research, education, and the
production of laboratory mice for other
researchers around the world.


See also: BALB/c Mouse; Coat Color Mutations, 
Animals; Inbred Strain 

The LMO family of genes (Table 1) was uncovered by
the association of LMO1 (previously called RBTN1
or TTG1) with the chromosomal translocation t(11;14)
(p15;q11) in human T cell acute leukemia (T-ALL).
Using LMO1 probes, the two related genes LMO2
and LMO3 were isolated (previously called RBTN2
or TTG2 and RBTN3, respectively), of which LMO2
is located at the junction of the chromosomal trans-
location t(11;14)(p13;q11), also in T-ALL. Subsequently
a fourth member of the family was discovered, LMO4,
but this gene, like LMO3, has no known association
with chromosomal translocations. Although the
LMO genes share a common evolutionary origin, their exon
structures vary; LMO1 and LMO4 have four coding
exons, whilst LMO2 and LMO3 have three coding
exons. Conservation between homologs in different
species is extremely high, suggestive of defined and
crucial roles for these genes. Each of the LIM-only
genes encodes a protein consisting essentially of two
zinc-binding LIM domains. Short stretches at the
N-termini of LMO1 and LMO2 have transcriptional
transactivation activity.


Table 1 The LMO family of genes and chromosomal location

Gene   Chromosome (human)   Chromosome (mouse)   Human translocation
LMO1   11p15                7                    t(11;14)(p15;q11)
LMO2   11p13                2                    t(11;14)(p13;q11)
LMO3   12p12-13             6                    nd
LMO4   1p22.3               3                    nd

The LMO gene family (LIM-only genes, previously called
RBTN and TTG genes) has four known members. LMO1
(previously RBTN1/TTG1) was identified first, followed by
LMO2 (previously RBTN2/TTG2) and LMO3 (previously
RBTN3). Subsequently, a fourth member, LMO4, was iden-
tified. LMO1 and LMO2 are both located on the short arm
of chromosome 11 and are involved in independent
chromosomal translocations in human T cell acute leuke-
mia. As yet, neither LMO3 nor LMO4 has been found in
association with any chromosomal translocation.


LMO Genes Encode Transcriptional 
Regulators in Development 


The unique feature of the LMO-derived protein 
sequences is that they are small proteins comprising 
two tandem LIM domains. These zinc-containing 
finger-like structures have structural similarities to the 
DNA-binding GATA fingers but as yet no case of 
a direct, specific LIM-DNA interaction has been
reported; rather the function of this domain appears 
to be restricted to protein-protein interaction. Gene 
targeting showed that the mouse Lmo2 gene is neces-
sary for yolk sac erythropoiesis in mouse embryogen-
esis. Furthermore, the use of embryonic stem (ES) cells with
null mutations of both alleles of Lmo2 in chimeric
mice has shown that adult hematopoiesis, including 
lymphopoiesis and myelopoiesis, fails completely in 
the absence of Lmo2. In addition, Lmo2 is required 
for the remodeling of existing blood capillary endo- 
thelium into mature blood vessels (the process of 
angiogenesis) but not for the de novo formation of
capillaries (vasculogenesis). 


The Role of the LIM Domain in Protein 
Interaction 


The LIM domain acts as a protein interaction module. 
For instance, Lmo2 and Tal1/Scl proteins (the latter is a
basic helix-loop-helix protein) can interact directly
with each other through the LIM domains.

Figure 1 Lmo2 participates in DNA-binding complexes. (A) Erythroid Lmo2-containing complex. The Lmo2
protein interacts with Tal1 and with GATA1 in a complex comprising a Tal1-E47 dimer, binding an E-box (CANNTG),
and a GATA1 molecule, binding a GATA site, as part of an erythroid complex, which presumably regulates target
genes. (B) T cell Lmo2-containing aberrant complex. An analogous DNA-binding complex comprises bHLH
heterodimers linked by Lmo2 and Ldb1 proteins, binding to dual E-box sites.


The LIM domains of LMO1 and LMO2 can bind
various proteins, such as GATA1, GATA2, and
Ldb1/NLI protein. This array of interactions led to
the observation that Lmo2 can be found in an oligo-
meric complex in erythroid cells which involves Tal1,
E47, Ldb1, and GATA1. This complex is able to bind
DNA through the GATA and bHLH components,
thereby recognizing a unique bipartite DNA sequence
comprising an E-box separated by one helix turn from
a GATA site, with the Lmo2 and Ldb1 proteins apparently
bridging the bipartite DNA-binding complex
(Figure 1A). Different Lmo2-containing complexes
may exist in different hematopoietic cell types,
which may differ in the types of protein factors
expressed and may control distinct sets of target genes.
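The bipartite recognition site can be illustrated with a small motif scanner. This is a hypothetical sketch, not a published tool: it uses the E-box consensus CANNTG given in Figure 1, takes WGATAR as the GATA consensus (an assumption), and treats "one helix turn" as a spacer of 7 to 11 bp between the end of the E-box and the start of the GATA site (also an assumption).

```python
import re

# Hypothetical consensus patterns, written as regular expressions:
#   E-box CANNTG  (N = any base)
#   GATA  WGATAR  (W = A/T, R = A/G), an assumed consensus
# The lookahead (?=...) lets overlapping matches all be found.
E_BOX = re.compile(r"(?=CA[ACGT]{2}TG)")
GATA_SITE = re.compile(r"(?=[AT]GATA[AG])")
E_BOX_LENGTH = 6


def find_bipartite_sites(seq, min_spacer=7, max_spacer=11):
    """Return (ebox_start, gata_start, spacer) for each E-box followed by a GATA site.

    The spacer is counted from the end of the E-box to the start of the GATA
    site; the 7-11 bp window standing in for "one helix turn" is an assumption.
    Only the given strand is scanned.
    """
    seq = seq.upper()
    eboxes = [m.start() for m in E_BOX.finditer(seq)]
    gatas = [m.start() for m in GATA_SITE.finditer(seq)]
    sites = []
    for e in eboxes:
        for g in gatas:
            spacer = g - (e + E_BOX_LENGTH)
            if min_spacer <= spacer <= max_spacer:
                sites.append((e, g, spacer))
    return sites


# Toy sequence: E-box CAGGTG, a 9 bp spacer, then the GATA site AGATAA
example = "TT" + "CAGGTG" + "A" * 9 + "AGATAA" + "TT"
print(find_bipartite_sites(example))  # [(2, 17, 9)]
```

A real scan would also search the reverse complement and weigh partial matches; the fixed spacer window is the main simplifying choice here.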

Protein-protein interactions are crucial control
points for normal cells, and alterations in these are
important components in tumorigenesis after chromo-
somal translocations have taken place. Gain-of-func-
tion transgenic mouse models of LMO gene expression
develop clonal T cell leukemia with a long latency,
indicating that the transgenes are necessary but not
sufficient to cause tumors. These mice show an accu-
mulation of immature CD4−, CD8−, CD25+, CD44−
T cells in transgenic thymuses compared with nontrans-
genic littermates. Thus the role of Lmo2 in T-ALL
appears to be to inhibit T cell differentiation. T-ALL
cells contain an Lmo2 complex which, like its analog in
erythroid cells, binds to a bipartite DNA recognition
site. Analysis of the components of this complex
showed that E47-Tal1 bHLH heterodimeric elements
were present, as well as Lmo2 and the Ldb1 proteins
(Figure 1B). A possible role for the E-box-E-box
binding T cell complex is the regulation of specific
sets of target genes which, based on the difference in
DNA-binding site, would differ from those putative
genes controlled by the Lmo2 multimeric complex in
hematopoietic cells.

See also: Leukemia, Acute; Translocation

A locus is any location on a chromosome, or any
region of genomic DNA (of any length from a few 
base pairs to a megabase-size region containing a large 
gene family), that is considered to be a discrete genetic 
unit for the purpose of formal linkage analysis or 
molecular genetic studies. 


See also: Alleles 


LOD Score 
The LOD score (‘logarithm of the odds’ score) is a
statistical measure of the evidence for linkage between
loci: the base-10 logarithm of the ratio of the likelihood
of the observed data assuming linkage at a given
recombination fraction to the likelihood assuming no
linkage. For non-X-linked genetic disorders in
humans a LOD score of +3 (odds of 1000:1) is generally taken to
indicate linkage (compared with prior odds of about 50:1 that
any random pair of loci will be unlinked).
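For the simplest case, phase-known and fully informative meioses, the LOD score at recombination fraction θ can be computed directly. This is an assumption-laden sketch: real pedigree analyses must handle unknown phase, incomplete penetrance, and missing genotypes, usually with dedicated software. The function names are illustrative only.

```python
import math


def lod_score(recombinants: int, nonrecombinants: int, theta: float) -> float:
    """LOD score for phase-known, fully informative meioses.

    LOD(theta) = log10[ theta^R * (1 - theta)^NR / 0.5^(R + NR) ]
    """
    if not 0.0 <= theta <= 0.5:
        raise ValueError("theta must lie between 0 and 0.5")
    if theta == 0.0 and recombinants > 0:
        # Any recombinant is impossible under complete linkage
        return float("-inf")
    log_linked = nonrecombinants * math.log10(1.0 - theta)
    if recombinants:
        log_linked += recombinants * math.log10(theta)
    # Under no linkage each meiosis is recombinant with probability 1/2
    log_unlinked = (recombinants + nonrecombinants) * math.log10(0.5)
    return log_linked - log_unlinked


def max_lod(recombinants: int, nonrecombinants: int, steps: int = 500):
    """Scan theta on a grid over [0, 0.5]; return (best_lod, best_theta)."""
    grid = [0.5 * i / steps for i in range(steps + 1)]
    return max((lod_score(recombinants, nonrecombinants, t), t) for t in grid)


if __name__ == "__main__":
    # Hypothetical data: 1 recombinant among 20 informative meioses
    score, theta = max_lod(1, 19)
    print(f"max LOD = {score:.2f} at theta = {theta:.3f}")
    print("linkage supported" if score >= 3 else "not significant")
```

With no recombinants in ten meioses the LOD at θ = 0 is 10 log10 2, about 3.01, which shows why roughly ten fully informative meioses are needed to reach the conventional threshold of +3.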


See also: Linkage 


Long-Period Interspersion 
Long-period interspersion is a genomic pattern in
which long stretches of moderately repetitive and 
nonrepetitive DNA alternate. 


See also: Genome Organization 


Long Terminal Repeats 
(LTRs) 
Long terminal repeats (LTRs) are identical DNA
sequences, several hundred nucleotides in length, 
found at the ends of transposons and retrovirus- 
derived DNA. LTRs contain inverted repeats and are 
thought to play an essential role in the integration of 
the transposon or provirus into the host DNA. In 
proviruses the upstream LTR acts as a promoter and 
enhancer and the downstream LTR as a polyadenyl- 
ation site. 
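Because the two LTRs of a provirus are identical copies at either end, the simplest way to spot them is to look for the longest sequence that both begins and ends the element. This is a toy sketch under the assumption of exact, non-overlapping, same-orientation copies; real LTRs are several hundred nucleotides long and diverge with age, so practical tools use alignment rather than exact matching.

```python
def terminal_direct_repeat(seq: str, min_len: int = 1) -> str:
    """Return the longest sequence found at both the start and the end of seq.

    Toy model: assumes the two LTRs are exact, identical copies in the same
    orientation, and that they do not overlap (so at most half the sequence).
    """
    seq = seq.upper()
    min_len = max(min_len, 1)  # k = 0 would compare "" with the whole sequence
    for k in range(len(seq) // 2, min_len - 1, -1):
        if seq[:k] == seq[-k:]:
            return seq[:k]
    return ""


# Hypothetical provirus: the same 11 bp "LTR" on both sides of an internal region
ltr = "TGTTGGGGACT"
provirus = ltr + "ATGCCCAAAGGT" + ltr
print(terminal_direct_repeat(provirus))  # TGTTGGGGACT
```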


See also: Provirus; Retroviruses; Transposable 
Elements 

The development of tumors is associated with genetic
damage confined to the cells of the tumor. This genetic
damage can be visualized by examination of the
tumor karyotype. Solid tumors, particularly those of
epithelial origin, are characterized by highly aneu-
ploid karyotypes with deletions as a common, fre-
quently tumor-specific feature. If a patient’s normal
and tumor DNA are compared at a locus known to be 
heterozygous in that patient’s normal DNA, it is pos- 
sible to determine whether the tumor DNA has suf- 
fered genetic loss (deletion) encompassing that locus. 
If it has, only one of the two alleles will be detectable, 
and the locus will appear to be homozygous in the 
tumor and will show loss of heterozygosity (LOH). 
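The comparison of normal and tumor DNA at one marker can be sketched as follows. Because tumor samples usually contain some normal cells, the lost allele is often reduced rather than absent, so this hypothetical sketch compares the allelic ratio in the tumor with that in normal DNA. The 0.5 cut-off is an assumed, adjustable threshold; laboratories differ in the value they use.

```python
def loh_call(normal: dict, tumor: dict, threshold: float = 0.5) -> str:
    """Classify one polymorphic marker in paired normal and tumor DNA.

    normal, tumor: map each allele (e.g. a microsatellite allele size) to its
    measured signal. The 0.5 cut-off is an assumed, adjustable threshold.
    """
    alleles = sorted(a for a, signal in normal.items() if signal > 0)
    if len(alleles) != 2:
        # Homozygous (or unreadable) in normal DNA: LOH cannot be assessed
        return "not informative"
    a1, a2 = alleles
    t1, t2 = tumor.get(a1, 0.0), tumor.get(a2, 0.0)
    if t1 == 0.0 and t2 == 0.0:
        return "no tumor signal"
    if t1 == 0.0 or t2 == 0.0:
        # Only one of the two alleles is detectable: the locus looks homozygous
        return "LOH"
    # Allelic ratio in the tumor, normalized to the ratio in normal DNA
    index = (t1 / t2) / (normal[a1] / normal[a2])
    if index <= threshold or index >= 1.0 / threshold:
        return "LOH"
    return "retained"


# Hypothetical marker with alleles of 148 and 152 bp
print(loh_call({148: 1.0, 152: 1.0}, {148: 1.0, 152: 0.3}))  # LOH
```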


Sources of Heterozygosity and their 
Detection 


Within the mammalian genome, the majority of DNA 
is not involved in coding for proteins. Lack of selection 
pressure on this noncoding DNA allows inconsequen- 
tial mutations to accrue. A locus at which the two 
parental alleles differ because of mutation is describ- 
ed as heterozygous/polymorphic. Single-nucleotide 
polymorphisms (SNPs) which form part of the re- 
cognition site for restriction enzymes were the first 
source of heterozygosity to be exploited for LOH 
analysis: first by Southern blotting, comparing normal 
and tumor DNA digested with the appropriate 
restriction enzyme, and then using PCR to amplify 
the region flanking the polymorphism followed by 
digestion of the PCR product with the restriction 
enzyme. However, the source of polymorphism 
most often used now exploits the observation that 
repetitive DNA occurs frequently in mammalian gen-
omes. This DNA is often arranged in tandem repeat
units, ranging in size from 8 to 50 bp, referred to as
variable number of tandem repeats (VNTRs) or mini-
satellites. Of most value for LOH analysis are the
repeat units ranging from 2 to 6 bp, called ‘microsatel-
lites.’ Human populations are highly polymorphic
in the number of these repeats, such that the average
rate of heterozygosity is more than 70%. Furthermore
they are abundant and evenly distributed throughout 
the human genome, making them ideal genetic mar- 
kers. They are detected by size fractionation after 
amplification by PCR using priming sites which 
flank the repeat region. 

Recently there has been renewed interest in SNPs 
other than those involved in restriction enzyme sites. 
These are widely and evenly distributed throughout 
the human genome. Their information content is not 
as high as microsatellites, since they are biallelic, but 
the single base-change difference is much more amen- 
able to high-throughput detection than the size differ- 
ences of microsatellites, and they are likely to be the 
markers of choice for future genetic analyses, includ- 
ing LOH. 
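Why a multiallelic microsatellite is more informative than a biallelic SNP can be seen from the expected heterozygosity under Hardy-Weinberg proportions, H = 1 − Σ pᵢ², where pᵢ are the allele frequencies. H is the chance that a given patient is heterozygous, and so informative, at the marker. The allele frequencies below are invented for illustration.

```python
def expected_heterozygosity(freqs):
    """Expected heterozygosity H = 1 - sum(p_i^2), assuming Hardy-Weinberg proportions.

    freqs: allele frequencies (rescaled here so they sum to 1).
    """
    total = sum(freqs)
    if total <= 0:
        raise ValueError("allele frequencies must sum to a positive number")
    return 1.0 - sum((f / total) ** 2 for f in freqs)


# Hypothetical markers
snp = [0.5, 0.5]                 # a biallelic SNP at its most informative
microsatellite = [0.1] * 10      # ten equally common repeat-length alleles
print(expected_heterozygosity(snp), expected_heterozygosity(microsatellite))
```

A biallelic marker can never exceed H = 0.5, whereas a marker with many common alleles easily passes the 70% figure quoted above. This is the trade-off against the easier high-throughput typing of SNPs.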


LOH and Location of Tumor Suppressor 
Genes 


Tumor suppressor genes are recessive and require 
inactivation of both alleles for a phenotypic effect. 
Inactivation is frequently by mutation of one allele 
and loss, through chromosomal deletion, of the sec- 
ond. Chromosomal deletion is often first discovered 
by cytogenetic analysis of a few samples, usually of 
cell lines, and then confirmed by LOH analysis of 
paired tumor and normal DNA from a larger number 
of individual patients. This requires a group of poly- 
morphic loci within and flanking the deleted region 
whose relative chromosomal positions are known. 
Many such loci have been identified and assigned a 
chromosomal location (D number in humans). By 
comparing the delineated stretch of LOH on the 
chromosome in individual patients in a large number 
of tumor/normal pairs, a common, minimally deleted 
region can be defined. This is sometimes small enough 
(less than 1 Mb) to allow the region to be investigated 
for genes which can be evaluated as tumor suppres- 
sor genes. This method of gene isolation, known as 
positional cloning, has been effective in the isolation 
or confirmation of a number of tumor suppressor 
genes. 
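Defining the common, minimally deleted region amounts to intersecting the LOH intervals seen in individual patients. The hypothetical sketch below assumes each tumor contributes one contiguous interval on the same chromosome arm, with coordinates in the same units. In practice the boundaries are set by the nearest retained markers, and tumors whose losses do not overlap the rest must be considered separately.

```python
def smallest_region_of_overlap(deletions):
    """Common region shared by all LOH intervals, or None if they do not all overlap.

    deletions: (start, end) positions of the LOH interval in each tumor,
    on the same chromosome arm and in the same units (e.g. Mb).
    """
    if not deletions:
        return None
    start = max(s for s, _ in deletions)
    end = min(e for _, e in deletions)
    return (start, end) if start <= end else None


# Hypothetical LOH intervals (Mb) from three tumor/normal pairs
intervals = [(10.0, 25.0), (14.5, 30.0), (12.0, 15.3)]
region = smallest_region_of_overlap(intervals)
print(region, round(region[1] - region[0], 2), "Mb")
```

Here the shared region is under 1 Mb, the scale at which the text notes that candidate tumor suppressor genes can be examined directly.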

Some tumors appear to have multiple but distinct 
regions of LOH on the same chromosome arm. It 
is uncertain whether all these regions of LOH indi- 
cate different tumor suppressor genes involved in 
the development of that tumor or whether some of 
the deletions occur as a consequence of the primary 
damage to the chromosome. 


LOH Analysis and Clinical Research 


Where tumor karyotyping is difficult, tumor DNA 
samples can be assessed for regions of allele loss by 
performing LOH analysis using evenly distributed 
markers for all chromosomes: ‘allelotyping.’ Different 
tumor types have regions of LOH in common, indi- 
cating a common defective gene in their etiology. This 
has been confirmed on isolation and mutation analysis 
of a gene within a deletion common to a variety of 
tumors. Despite this overlap, there are distinct pat- 
terns of LOH, sometimes associated with tumor pro- 
gression, and thus loss of particular regions can have 
prognostic significance. The overall pattern of allele 
loss as determined by LOH analysis (together with 
any detected point mutations) can serve as a signature 
of an individual patient’s tumor. The pattern of allele 
loss displayed by a tumor can be detected in material 
exfoliated from the tumor and sometimes in the 
patient’s blood. This pattern, the signature, can be 
used as a means of following the course of disease 
during treatment and can indicate relapse before ob-
vious clinical symptoms appear. 
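An allelotype is usually summarized per marker as the fraction of informative tumors that show LOH. The sketch below assumes a hypothetical data layout in which each tumor has a call of "LOH", "retained", or "not informative" at each marker. Patients homozygous at a marker are excluded from that marker's denominator.

```python
from collections import defaultdict


def allelotype(calls):
    """Per-marker LOH frequency among informative tumors.

    calls: {tumor_id: {marker: status}}, where status is "LOH", "retained",
    or "not informative" (a hypothetical data layout).
    """
    loh = defaultdict(int)
    informative = defaultdict(int)
    for markers in calls.values():
        for marker, status in markers.items():
            if status == "not informative":
                continue  # homozygous in normal DNA: says nothing about loss
            informative[marker] += 1
            if status == "LOH":
                loh[marker] += 1
    return {marker: loh[marker] / n for marker, n in informative.items()}


# Hypothetical panel of three tumors typed at two markers
calls = {
    "T1": {"M1": "LOH", "M2": "retained"},
    "T2": {"M1": "LOH", "M2": "not informative"},
    "T3": {"M1": "retained", "M2": "LOH"},
}
print(allelotype(calls))
```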

Lotus japonicus is a model plant for the legumes. The
Leguminosae (or Fabaceae) family is represented by
approximately 18,000 species and is the third largest
family of angiosperms. With around 700 genera
divided into three subfamilies, Papilionoideae, Cae- 
salpinioideae and Mimosoideae, the Leguminosae pre- 
sent a wealth of diversity. Several legumes, for example 
pea (Pisum sativum), soybean (Glycine max), peanut 
(Arachis hypogaea), and beans (Phaseolus vulgaris) are 
well-known and important crop plants. Others are 
cultivated as ornamentals, vegetables, pulses, or for 
production of protein, oil, and pharmaceuticals. 

Lotus japonicus originates from East Asia and the 
species is distributed over the Japanese islands, the 
Korean peninsula, and east and central parts of China 
and has been reported from northern India, Pakistan, 
and Afghanistan. Two ecotypes ‘Gifu’ and ‘Miyako- 
jima’ have been chosen for model studies. Lotus japon- 
icus is a close relative of the tannin-containing 
tetraploid forage legume L. corniculatus (birdsfoot 
trefoil) cultivated for its antibloating properties. 
Phylogenetically, L. japonicus belongs to the tribe 
Loteae in Papilionoideae, the largest subfamily of the 
Leguminosae. 

Many cultivated legumes like pea and soybean 
have complex genomes or are, for other reasons, not 
amenable to modern molecular genetic methods. The
favorable biological properties of L. japonicus have made it the
model plant of choice for classical and molecular
genetic analysis of legumes. These properties are: a short
seed-to-seed generation time, a small genome of
approximately 450 Mb, diploid genetics, six chromosome
pairs, self-fertile flowers, ample seed production, small
seeds, a simple nonspiral seed pod, large flowers that
allow manual crossing, well-described transformation
procedures using Agrobacterium tumefaciens or
A. rhizogenes, well-described in vitro tissue culture and
regeneration procedures, and effective nodulation and
mycorrhization.

Most legumes develop root nodules in symbiosis 
with nitrogen-fixing soil bacteria belonging to the 
Rhizobiaceae, and nodulated legume plants can use 
atmospheric dinitrogen as their sole nitrogen source. 
The interaction between the bacterial microsymbionts 
and legumes is selective. Individual species of rhizobia 
have a characteristic host range allowing nodulation of 
a particular set of legume plants. Mesorhizobium loti 
and the broad host range Rhizobium sp. NGR234 
induce nitrogen-fixing root nodules on L. japonicus. 
Roots of L. japonicus are also effectively colonized by 
symbiotic arbuscular mycorrhizal fungi, for example 
Glomus intraradices and Gigaspora margarita. These 
fungi invade the root tissue by intercellular and intra- 
cellular hyphal growth and form arbuscules in cortical 
cells where metabolic interchanges take place. Mycor-
rhizal hyphae increase the effective absorptive surface of the root
and improve phosphorus uptake.

Identification of single gene plant mutants impair- 
ed in both colonization by mycorrhizal fungi and 
rhizobial invasion demonstrates that the two interac- 
tions share common steps during the early infection 
processes. Extending this observation may open a 
broader approach to the understanding of plant- 
microbe interactions, where symbiotic studies not 
only contribute to realization of the potential of sym- 
biosis, but also to our understanding of (for example) 
plant-pathogen interactions. 

One of the interests of the plant science community 
is to use L. japonicus in the molecular genetic analysis 
of symbiosis. For this purpose, tools and resources for 
molecular analysis have been established. Insertion 
mutagenesis is possible with T-DNA or the maize 
transposon Ac, and EMS is effective for chemical 
mutagenesis. After mutant screening, more than 40 
symbiotic loci have been identified. The phenotypes 
of these developmental plant mutants divide them 
roughly into three classes: non-nodulating mutants 
arrested in bacterial recognition or nodule initiation; 
nodule development mutants arrested at consecutive 
stages of the organogenic process; and autoregulatory 
mutants where the plant control of root nodule num-
bers is nonfunctional. Development of root nodules
can thus be divided into a series of genetically separ-
able steps. For further studies the following genome
resources are being developed: a general genetic map 
and bacterial artificial chromosomes (BAC) libraries 
for positional cloning of untagged mutants; recombin- 
ant inbred lines; and inventories of expressed 
sequence tags (ESTs) sampling the gene expression 
profiles from several tissues and growth conditions 
(www. Viazusa.or.jp/een/index.html). 

Sequencing of the L. japonicus genome has been 
initiated. The sequences of the bacterial genes required 
for nodulation and nitrogen fixation located on the 
pSym plasmid of NGR234, and the complete genome 
of Mesorhizobium loti, are available, together with a 
wide selection of rhizobial mutants. 

Like soybean, L. japonicus develops the determin-
ate type of nodules. In contrast to, for example, pea
nodules, which have a persistent meristem, determinate
nodules developing on L. japonicus lose their meristematic
activity early. After the initial phase of meri-
stematic cell proliferation, determinate nodules grow
by expansion, giving a typical spherical shape. All
developmental stages from root hair curling to nodule 
senescence are consequently phased in time. Root 
nodule development is a rare example of induced and
dispensable organ formation in plants. Nodulation
mutants can be rescued on nitrogen-containing nutri-
ent solution, so developmental control genes whose loss
would compromise plant development and comple-
tion of the life cycle in other organogenic processes
could be identified from nodulation mutants.
See www.mbio.aau.dk/nchp/table1.html for a list of 
literature on L. japonicus. 

Lung tumors, like all common human epithelial
tumors, have abnormal chromosomes, usually in
both number and structure. Despite this polyploidy
and aneuploidy, cytogenetic analysis has identified a
number of features which occur frequently in lung
tumors, and further study of these abnormal regions
has led to an understanding of molecular genetic 
changes underlying the development of lung cancer. 


Cytogenetic Analysis 


Tumor biopsies are a poor source of material for 
chromosome preparation, and most cytogenetic 
analysis has involved the use of cell lines established 
in tissue culture. Through the development of select- 
ive tissue culture media, hundreds of cell lines have 
been established, making lung tumors one of the most 
extensively studied types of tumor by karyotyping. 
Most work has used traditional G-banding, but more 
recently chromosome-specific paints have been used. 
Comparative genomic hybridization, in which tumor
and normal DNA are competitively hybridized to
normal chromosome spreads, has been used to confirm
and extend observations made by traditional cytogen- 
etics. Molecular genetic analysis using DNA isolated 
from tumors has confirmed the existence of gene 
amplifications and chromosomal deletions and valid- 
ated the cell lines as accurate representations of the 
tumors from which they were established. 


Common Cytogenetic Abnormalities in 
Lung Cancer 


Lung tumors are subdivided into histological subtypes 
which have a different clinical course and require 
different treatment. Nonetheless they are believed to
have a common histogenesis. Most cytogenetic abnormal-
ities have been detected in all the different histological
subtypes, although it is common for an abnormality to
be seen in a higher proportion of small cell carcinomas
than in non-small cell carcinomas. Common deletions
are of 3p (associated genes are FHIT and others),
9p (associated gene, p16INK4A), 17p (associated gene,
TP53), and 13q (associated gene, RB). Other regions
have also been noted (e.g., 5q and 10q), but most studies
now use loss of heterozygosity for revealing and defin-
ing deletions. Homogeneously staining regions and
double minutes are detectable in lung tumor karyo-
types and are sometimes associated with amplification
of members of the MYC gene family. Translocations
have rarely been observed in lung tumors.

Salvador Edward Luria (1912-91), an Italian-born
American geneticist, was born 13 August 1912 in
Turin. His research focused on the genetics of bacteria
and bacteriophages, as well as the action of bacterio-
cins and bacterial membranes. Among many hon-
ors, Luria received the Nobel Prize for Physiology
or Medicine in 1969, sharing it with Max Delbrück
and Alfred Hershey.

Luria received his MD degree from the University 
of Turin in 1935. During his medical training, he 
became interested in physics and its applications to 
biology, leading him to do advanced work in radi- 
ology and physics in Rome, working with such teachers 
as Enrico Fermi and collaborating with Geo Riva, an 
Italian phage biologist. Leaving Italy because of 
Mussolini’s “Racial Manifesto” in 1938, Luria moved 
to Paris where he collaborated with Elie Wollman and 
the well-known physicist Fernand Holweck at the
Radium Institute on radiobiological experiments to 
determine the size of a bacteriophage. Again to avoid 
persecution, he left Paris and joined a group of radio- 
biologists under Frank Exner, a physicist at the 
College of Physicians and Surgeons of Columbia 
University from 1940 to 1942. He taught in the Biology 
Department of Indiana University (Bloomington) 
from 1943 to 1950, at the University of Illinois 
(Urbana) from 1950 to 1959, and then at the 
Massachusetts Institute of Technology until his death 
in 1991. 

In 1941 Luria met Max Delbrück and they began a
lifetime of collaboration and friendship. Luria secured
a faculty position at Indiana University in 1943, and he
and Delbrück, along with Hershey, initiated the
research school now known as the “American Phage
Group.” Much of Luria’s early research was domin-
ated by his orientation toward radiobiological target
theories that fit well with Delbrück’s attempts to make
atomic physics relevant to genetics. One important
line of work was a collaboration undertaken one 
summer at the Cold Spring Harbor Laboratory 
with Raymond Latarjet, a visiting French scientist. 
Latarjet was interested in the use of radiobiological 
target theory to follow the increase in intracellular 
infectious phage as a way to study phage multiplica- 
tion (prior to the availability of radioisotopic tracers). 
Luria and Latarjet showed that this approach worked 
and for the first time obtained a detailed view of 
intracellular phage replication. This approach, later 
known as the Luria-Latarjet (or simply the L-L)
experiment, was widely employed in the late 1940s 
and early 1950s. 

An old problem in phage biology, that of the 
appearance of phage-resistant bacteria, interested 
Luria. He and Delbrück devised a way to test if the 
phage-resistant bacteria were produced spontaneously 
and subsequently grew out under selective conditions, 
or conversely, if the phage somehow induced the 
phage resistance to appear. Their approach was both 
sound and elegant, but indirect, relying as it did on 
probabilistic arguments similar to those they had 
often used in their radiobiological target theory 
work. This experimental approach, which came to be 
known as the Luria-Delbrück experiment, has been
widely hailed as a landmark in the development of
bacterial and molecular genetics. 
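The logic of the test can be illustrated with a small simulation. This is a toy model with assumed parameters, not Luria and Delbrück's own calculation. It assumes synchronous doubling from a single cell, an invented mutation probability per division, and Poisson draws throughout. Under the spontaneous hypothesis, mutants that arise early found large clones ("jackpots"), so counts of resistant bacteria vary far more between parallel cultures than under the induced hypothesis. Under the induced hypothesis, counts should be roughly Poisson, with variance close to the mean.

```python
import random
import statistics


def poisson(lam, rng):
    """Poisson variate: count rate-1 exponential arrivals before time lam."""
    k, t = 0, rng.expovariate(1.0)
    while t < lam:
        k += 1
        t += rng.expovariate(1.0)
    return k


def spontaneous_culture(generations, mu, rng):
    """Resistant mutants arise at random during growth and then multiply."""
    sensitive, resistant = 1, 0
    for _ in range(generations):
        # Each sensitive cell divides once; a division yields a new mutant with
        # probability mu (binomial approximated by a Poisson draw).
        new = min(poisson(sensitive * mu, rng), sensitive)
        sensitive = 2 * sensitive - new
        resistant = 2 * resistant + new
    return resistant


def variance_to_mean(counts):
    mean = statistics.mean(counts)
    return statistics.variance(counts) / mean if mean else float("nan")


def fluctuation_test(n_cultures=200, generations=20, mu=2e-6, seed=1):
    rng = random.Random(seed)
    spontaneous = [spontaneous_culture(generations, mu, rng) for _ in range(n_cultures)]
    # Induced hypothesis: resistance appears only on exposure, independently in
    # each cell, so counts are Poisson; the mean is matched to the spontaneous model.
    mean = statistics.mean(spontaneous)
    induced = [poisson(mean, rng) for _ in range(n_cultures)]
    return spontaneous, induced


if __name__ == "__main__":
    spontaneous, induced = fluctuation_test()
    print("variance/mean, spontaneous:", round(variance_to_mean(spontaneous), 1))
    print("variance/mean, induced:    ", round(variance_to_mean(induced), 1))
```

The variance-to-mean ratio is the key readout. A ratio near 1 is what induction predicts, while a ratio far above 1 points to mutations that arose before selection.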

While trying to better understand phage resistance 
and host-range mutations in bacteriophage, Luria and 
his collaborator Mary L. Human discovered that bac- 
teriophages are subject to subtle “modification” by the 
last host in which they grew so that they might be 
“restricted” in their growth on hosts of different 
strains. In 1952 they described the phenomenon of 
host restriction–modification. The genetics of this
phenomenon, as well as its biochemical explanations, 
were subsequently worked out by others. The restriction
enzymes responsible for it now form the basis of much
current biotechnology.

In his later research, Luria turned to a phenomenon 
that was historically related to bacteriophage, namely 
that of bacteriocins. He investigated the physiology
of these lethal molecules produced by some strains of 
bacteria that kill closely related strains, apparently to 
gain competitive advantages in natural environments. 
Luria and his collaborators focused mainly on the 
effects these proteins have on the functions of bacterial 
membranes, and they made substantial contributions
to this field.
In 1943 Salvador Luria and Max Delbrück published
“Mutations of bacteria from virus sensitivity to virus 
resistance” (Luria and Delbrück, 1943). In this paper 
they presented a novel experimental design aimed at 
answering two questions: Do mutations (to bacterio- 
phage resistance) occur randomly in the absence of 
the selective agent, and if so, how can the mutation 
rate be estimated? The simplicity of its design and its 
wide applicability in microbial and cell genetics for the 
measurement of mutation rates have ensured its eponymous
status as a “classic experiment.”

Since the early work on the existence of mutations 
in bacteria by Beijerinck, Neisser, and De Kruif, 
among others, it was unclear whether the conditions 
used to select or observe the mutations were actually 
inducing the altered state or simply allowing out- 
growth of preexisting variants. Since mutations 
seemed to be rare events, it was difficult to observe 
the infrequent mutants in populations of bacteria 
prior to the application of some selection which in- 
hibited the wild-type and permitted growth of the 
mutants. In the early 1930s this problem was taken 
up by I.M. Lewis, who investigated a lactose-negative
strain of Escherichia coli designated mutabile (because
it was noted to revert to lactose-utilization with some 
observable frequency). Lewis clearly formulated the 
problem and carried out careful plating experiments 
and concluded that the lactose-fermenting colonies 
that developed on lactose-containing medium came 
from the few variants that already existed in the cul- 
ture which had been grown in glucose-containing 
medium. 

When Luria and Delbrück investigated the process
of bacteriophage multiplication, they observed the
common phenomenon of phage-resistant variants.
The origin of such resistance had been uncertain
since it was first noticed, almost as soon as phage had been
discovered in 1917. Some experiments supported the
notion that the phage resistance was acquired only 
after exposure to phage, and thus phage acted as a 
mutagen to change cell properties. Other experiments 
supported the idea that phage resistance occurs spontaneously
even in the absence of exposure to bacteriophage.
With a deeper understanding of both
the genetics of bacteria and the phenomenon of
lysogenic immunity, it is now known that both
mechanisms can occur. In Luria and Delbrück’s studies,
however, the results clearly confirmed that the mutation to
phage resistance had occurred spontaneously, prior
to exposure to the phage.

What Luria and Delbrück realized was that because
of the clonal, exponential growth of bacteria from a 
single cell (or at least a small homogeneous popula- 
tion), any mutation which appears at some stage in the 
exponential growth of the population is propagated 
exponentially as well, and thus a large population 
contains all the mutant progeny descended from each 
mutation event that occurred in the culture. If a muta- 
tion event occurred early in the history of a culture, a 
high fraction of the population would be mutant, 
whereas if a mutation event occurred late in the his- 
tory of a culture, it would be represented by a very 
tiny proportion of the total population. Because of the 
rare occurrence of mutations, one would expect some 
populations to have a high fraction of mutants, some 
to have very few, and some to have in-between frac- 
tions, that is, in a series of replicate populations, the 
variation in the proportion of mutants would be great. 
Under the contrasting hypothesis, that is, if the select- 
ive condition imposed on the final population was 
causing the mutations, then, because the selection 
would be applied to nearly identical numbers of cells 
in the large, final populations, one would expect that 
the number of induced mutants would be about the 
same. So in this case, the expected variation would be 
very small. The difference in the two hypotheses, then, 
would appear in the size of the variations (fluctuation) 
in the proportion of mutants in multiple replicate 
populations grown up from pure wild-type parental 
organisms. Luria and Delbrück formalized the mathematical
analysis of this process and observed that
under the assumption that both the wild-type and 
mutant organisms grow exponentially at the same 
rate, one can calculate from the experimental param- 
eters (number of generations, mutation frequencies) 
the actual mutation rate (as distinct from mutant fre- 
quency), that is, the number of mutation events per 
cell per generation. 
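The fluctuation argument can be illustrated with a small simulation. The sketch below is a minimal model, not Luria and Delbrück’s own calculation: it assumes synchronous generations, equal growth rates for wild-type and mutant cells, no cell death, and detection of every resistant cell. The function names and parameter values are hypothetical, chosen only for illustration. It compares the variance-to-mean ratio of mutant counts in replicate cultures under the two hypotheses, then estimates the mutation rate from the fraction of cultures with no mutants (the p0, or zero-class, method).

```python
import math
import random
import statistics


def binomial(n, p, rng):
    """Draw from Binomial(n, p) by skipping geometric gaps between successes."""
    if n <= 0 or p <= 0.0:
        return 0
    if p >= 1.0:
        return n
    log_q = math.log(1.0 - p)
    count, position = 0, 0
    while True:
        # 1 - rng.random() lies in (0, 1], so its logarithm is finite.
        position += int(math.log(1.0 - rng.random()) / log_q) + 1
        if position > n:
            return count
        count += 1


def spontaneous_culture(generations, mu, rng):
    """Hypothesis 1: mutants arise during growth, before selection is applied."""
    wild, mutant = 1, 0
    for _ in range(generations):
        # Assumption: each wild-type division yields one new daughter that may
        # mutate; resistant mutants breed true and double like the wild type.
        new = binomial(wild, mu, rng)
        mutant = 2 * mutant + new
        wild = 2 * wild - new
    return mutant


def induced_culture(final_size, a, rng):
    """Hypothesis 2: each cell of the final culture is induced with probability a."""
    return binomial(final_size, a, rng)


def fluctuation_test(cultures=500, generations=16, mu=1e-5, seed=1):
    rng = random.Random(seed)
    n_final = 2 ** generations
    spont = [spontaneous_culture(generations, mu, rng) for _ in range(cultures)]
    # Give the induced model the same mean, so that only the spread differs.
    a = statistics.mean(spont) / n_final
    induced = [induced_culture(n_final, a, rng) for _ in range(cultures)]

    def var_to_mean(xs):
        return statistics.variance(xs) / statistics.mean(xs)

    # p0 method: a culture has zero mutants only if no mutation occurred, so
    # p0 = exp(-m), where m = mu * (number of divisions), about mu * n_final.
    p0 = spont.count(0) / cultures
    mu_hat = -math.log(p0) / n_final
    return var_to_mean(spont), var_to_mean(induced), mu_hat


if __name__ == "__main__":
    spont_ratio, induced_ratio, mu_hat = fluctuation_test()
    print(f"variance/mean, spontaneous model: {spont_ratio:.1f}")
    print(f"variance/mean, induced model:     {induced_ratio:.1f}")
    print(f"p0 estimate of mutation rate:      {mu_hat:.2e}")
```

Under the spontaneous model, an early mutation produces a "jackpot" culture, so the variance far exceeds the mean. Under the induced model the counts are binomial, and variance and mean are nearly equal. That contrast is the signature the fluctuation test looks for.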

The design of their experiment was extended to 
studies of both spontaneous mutation rates and induc- 
ed mutations. The measurement of rates rather than 
frequencies of mutations greatly clarified this process 
and its genetic basis. The Luria–Delbrück ‘Fluctuation
Test,’ as it is sometimes called, is indirect and statistical;
because of the importance of the hypothesis of spontan- 
eous mutation with subsequent selection (a basic 
principle of neo-Darwinism), additional research led 
to more direct confirmation of their findings. One such 
example was the replica-plating method of Lederberg 
and Lederberg for studying phage resistance. 
The Lutheran blood group system is a complex system
consisting of 18 red cell antigens, including four pairs
of allelic antigens: Luᵃ, Luᵇ; Lu6, Lu9; Lu8, Lu14; Auᵃ,
Auᵇ. The Lutheran glycoprotein, a member of the
immunoglobulin superfamily of receptors and adhe- 
sion molecules, binds the extracellular matrix glyco- 
protein laminin. 


See also: Blood Group Systems; Immunoglobulin 
Gene Superfamily 
The cultivated tomato (Lycopersicon esculentum Mill.)
and related wild species are members of the Solanaceae 
family, which includes potato, tobacco, and petunia, as 
well as the deadly nightshade. Though native to the 
Andean region of South America, tomato was first 
domesticated in Mesoamerica, to which it owes its 
common name, a derivation of the Nahuatl (Aztec) 
word ‘tomatl.’ Since its introduction to Europe in the 
early sixteenth century (Figure 1), tomato has 
assumed an increasingly important role in the diets 
of many cultures. Despite the relatively low nutrient 
content of its fresh fruit, tomato is a leading source of 
vitamins A and C and antioxidants such as lycopene, 
due in large part to its heavy consumption in both
fresh and processed forms.

Figure 1 Woodcut of tomato from P. A. Matthiolus
(1554) Commentarii in libris sex Pedacii Dioscoridis
Anazarbei, de medica materia. Venetiis. The 1544 edition
of this herbal includes the first recorded mention of the
tomato in Europe, which consists of a brief description
of the plant and its culinary use in Italy at that time.

As an experimental organism for genetic studies,
tomato presents many advantages. The cultigen and
related wild species are true diploids, with a chromosome
number of 2n = 2x = 24. Eleven of the 12 chromosomes
in its haploid nucleus are submetacentric,
while chromosome 2 has an extremely short heterochromatic
short arm consisting primarily of the
nucleolus organizer region. Each chromosome is dis- 
tinguishable from the others during pachytene by the 
pattern and length of chromatic and achromatic 
regions which are illustrated in corresponding cyto- 
logical maps. The tomato genome is also well defined 
by genetic maps based on morphological and/or mo- 
lecular markers; the high-density molecular marker 
map contains over 1000 restriction fragment length 
polymorphism (RFLP) markers comprising a total of 
1276 map units (Tanksley et al., 1992). In addition, the 
genetic maps have been integrated with cytological 
maps by the analysis of induced deletions. 

The relatively low haploid DNA content of 
tomato, ~ 950 Mb, makes it well suited for molecular 
studies. Though larger than Arabidopsis or rice (about 
145 and 425 Mb, respectively), the tomato genome is
smaller than many other model plant species, such as
maize or wheat (2500 and 16000 Mb, respectively). 
The average ratio of physical to genetic distance is 
750 kb/cM, a value low enough to enable the pos- 
itional cloning of genes in most genomic regions. 
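As a rough consistency check, assume that the 1276-map-unit (cM) RFLP map spans essentially the whole genome. Dividing the haploid DNA content by the map length then gives a value close to the quoted average:

```latex
\frac{950\ \text{Mb}}{1276\ \text{cM}}
  = \frac{950\,000\ \text{kb}}{1276\ \text{cM}}
  \approx 745\ \text{kb/cM}
```

This is only a genome-wide average; the ratio in any particular region may be considerably higher or lower.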

Tomato is naturally self-pollinated, which simpli- 
fies the maintenance of stocks, yet hybridizations are 
easy to perform and yield large quantities of seed of 
controlled parentage. Tomato can be grown under a 
wide range of environmental conditions and propa- 
gated through seed or asexually via rooted cuttings. Its 
photoperiodic insensitivity and relatively short gen- 
eration time permit the culture of three or more gen- 
erations per year. The structure of the tomato plant, 
particularly its compound leaves and indeterminate 
sympodial growth habit, allows detection of an 
enormous array of hereditary variations; mutations 
that result in altered growth habit, leaf shape, texture 
and color, flower morphology, color, and function, 
and fruit size, shape, and color have been described. 
Tomato also provides a popular model for physio- 
logical and biochemical studies of fruit development, 
quality, and ripening. 

Protoplasts are easily cultured, fused, and regener- 
ated into whole plants. Transgenic plants are readily 
obtained by cultivation of cotyledon explants with 
Agrobacterium tumefaciens, followed by shoot regen- 
eration. As a result of these and other advantages, the 
first transgenic food plant (GMO) to be marketed in 
the USA was a tomato (FlavrSavr™). For the analysis 
of gene function in tomato, there are several methods. 
The most widely applied are gene silencing, by trans- 
formation with antisense or cosuppression constructs, 
and complementation by transformation with sense 
constructs. Also, the maize transposable elements Ac 
and Ds, which are active in tomato and show the same 
preference for transposition to linked sites as in maize, can be
used to produce insertional mutants. In contrast, 
insertional mutagenesis using the Agrobacterium 
T-DNA element is a relatively inefficient process in 
tomato, unlike Arabidopsis. Finally, the use of radi- 
ation-induced deletions is limited by their generally 
lethal effect during gametogenesis.

Research on tomato has depended to a large extent 
on genetic resources such as mutants, wild species 
populations, and other genetic stocks which are avail- 
able to researchers through genebanks such as the 
C.M. Rick Tomato Genetics Resource Center 
(TGRC) at the University of California, Davis. The 
TGRC maintains over 1000 monogenic stocks, con- 
sisting of spontaneous or induced mutations at 600+ 
loci affecting most aspects of plant development and 
morphology. Over 1400 other genetic and cytogenetic 
stocks, including mutant combinations, translocations,
trisomics, autotetraploids, Latin American
cultivars, and derivatives of wild species such as alien
additions, substitutions, and introgression lines, are 
maintained by the TGRC. Lastly, the collection also 
includes over 1100 wild species accessions, represent- 
ing nine Lycopersicon and four Solanum species, of 
which all but two can be crossed to L. esculentum, 
albeit with varying degrees of difficulty. 

These wild populations contain a vast amount of 
genetic diversity, in contrast to the cultigen which 
is severely depleted, and are important sources of 
enhanced disease resistance, yield, fruit quality, envir- 
onmental stress tolerance, and other desiderata of 
interest to breeders. Resistances to over 42 diseases 
have been detected in the wild relatives, many of 
which have been bred into the cultivated tomato; 
cloning and sequencing of many of these resistance 
genes has contributed to our understanding of the 
molecular basis of plant-pathogen interactions. The 
wild Lycopersicon species are also tolerant of abiotic 
stresses encountered in their native habitats, which 
include extreme aridity (e.g., Atacama desert), flood- 
ing and high humidity (e.g., equatorial jungle), saline 
soils (e.g., coastal bluffs in the Galápagos Islands), and
freezing or chilling temperatures at high elevations in 
the Andes. Though bearing horticulturally unaccept- 
able fruit, the wild species contain alleles that when 
bred into cultivated tomato confer desired character- 
istics such as increased soluble solids, fruit color, size, 
and yield. 

Despite the complex genetic control of these fruit 
traits, the application of molecular marker maps has 
resolved quantitative trait loci (QTLs) for each of 
them. In the case of fruit size, a single QTL (fw2.2) 
accounts for a large portion of the difference between 
wild and cultivated forms; the recent cloning of this 
QTL (Frary et al., 2000) has contributed to our under- 
standing of the molecular basis of plant domestication, 
and has demonstrated that even genes for complex 
traits such as yield can be isolated through the use of 
molecular maps. Levels of diversity in Lycopersicon 
species vary greatly, due in large part to differences 
in mating systems, which include autogamy, faculta- 
tive allogamy, and self-incompatibility of the gameto- 
phytic type; tomato is therefore a rich source of allelic 
variation for evolutionary and molecular studies of 
self-incompatibility, pollination biology, and many 
other reproductive characters. 

Information on tomato germplasm and many types 
of genetic data are available through online databases. 
The TGRC database (http://tgrc-ucdavis.edu) pro- 
vides search tools, gene descriptions, and photos of 
mutants and wild species from its collection. The 
GRIN database (http://www.ars-grin.gov) allows 
users to search the US Department of Agriculture’s 
entire National Plant Germplasm System, which 
includes over 5000 accessions of tomatoes, mostly
cultivated forms, maintained by the USDA at Geneva,
New York. The SolGenes database
(http://ars-genome.cornell.edu/solgenes/) interconnects genetic maps,
gene sequences, probes, marker polymorphisms, QTLs, 
and other data on the tomato, potato, pepper, and 
eggplant genomes. 

In conclusion, tomato has many favorable genetic 
and biological attributes, in addition to its status as 
a crop plant, which contribute to its usefulness as 
an experimental organism for genetic research. With 
excellent germplasm collections, databases, and mol- 
ecular resources, tomato will likely remain an import- 
ant tool for plant geneticists in the era of genomics. 
Trofim Denisovich Lysenko (1898–1976) (Figure 1)
was prominent in the study of heredity in the Soviet
Union, and a major political force in Soviet science
from about 1934 to 1965, beginning under Joseph Stalin. He
believed in mechanisms of heredity that denied the 
primary importance of genes and mutations, and sup- 
ported research predicated on his beliefs about the 
influence of environment on heredity. Because of his 
powerful political positions in the Soviet government,
he dominated the direction of Soviet genetic research
for several crucial decades. His particular doctrine was
termed ‘agrobiology’ and in the West came to be
known as ‘Lysenkoism.’

Figure 1 T.D. Lysenko. (From Lysenko, 1954.)


Lysenko 


Lysenko was born in Karlovka, about 50 miles south- 
west of Kharkov, in Ukraine. His father was a small 
farmer. Lysenko graduated from the Kiev Agricultural 
Institute in 1925 and embarked upon a career of agro- 
nomical research, helped no doubt by the government 
policy (vydvizhentsy) of that time to bring young 
people of peasant and worker backgrounds into posi- 
tions of leadership. He worked on practical breeding 
problems, especially the control of the growing 
periods of agricultural plants. 

In 1925, immediately after his graduation, he went 
to work at the newly established experimental station 
at Kirovabad (in Azerbaijan) and he was entrusted 
with work on breeding legumes for fodder and silage. 
The need for such plants in that region did not corres- 
pond with the availability of reliable water from rains 
or irrigation, so he attempted to find ways to alter the 
growing seasons of legumes to produce fodder in the 
autumn and winter or early spring, when sufficient 
water was present. He sowed varieties of peas, vetch,
beans, and lentils in the fall and observed that some of
the peas and vetch survived the winter and produced a 
crop early in the spring. From this research he con- 
cluded that “By changing the external conditions it is 
possible to change the behaviour of different plants of 
the same variety” (Lysenko, 1954, p. 18). In 1929 the 
term ‘vernalization’ was proposed for this plasticity 
of plant varieties. This work was extended to cereals 
and he claimed that spring-sown varieties could be 
transformed into winter-sown forms by the proper 
environmental manipulations. The results of this 
work were reported first at the All-Union Genetics
Congress in Leningrad in January 1929. 

Lysenko extended his experimental work to actual 
field studies by inducing his father, Denis N. Lysenko, 
to plant winter wheat in the spring. This crop was 
apparently successful and Lysenko reported that: 


In the same summer (1929) the Soviet public learned from 
the press of the full and uniform earing of winter wheat 
sown in the spring under practical farming conditions in 
the Ukraine. (Lysenko, 1954, p. 23) 


This well-publicized work apparently caught the
attention of both agricultural policy planners and
Marxist philosophers, because


the Soviet public came to the support of our explanation of 
the length of vegetative period in plants. By order of the 
People’s Commissariat of Agriculture, a special laboratory, 
later a department, was established at the Ukrainian Institute 
of Selection and Genetics (Odessa) to study this problem. 
(Lysenko, 1954, p. 23) 


Lysenko’s theories of heredity drew on Darwinian 
pangenesis, Marxist ideology, and Lamarckianism. 
He wrote: 


Whenever an organism finds the conditions (materials) in 
the external environment which are suitable for its heredity, 
its development takes the same course it took in the preced- 
ing generations. (Quoted in Dobzhansky, 1952, p. 4) 


He also held that heredity “is inherent not only in the
chromosomes, but in every particle of the living body”
(quoted in Huxley, 1949, p. 17).

By 1932, an agronomy journal, the Bulletin of Ver- 
nalization, began publication to report research in this 
field, and by 1935 Lysenko was its editor, a position he 
held until 1941. From the mid-1930s onward, Lysenko 
became increasingly involved in spreading his beliefs 
about agrobiology and vernalization in opposition 
to what he saw as the erroneous theories based 
on the work of Gregor Mendel, August Weismann,
and Thomas Hunt Morgan. His scientific work was
intimately interwoven with political issues in the
Soviet Union, and he was eventually relieved of most 
of his leadership roles by 1965. In 1966 he was rele- 
gated to directorship of the Lenin Hills Agricultural 
Experiment Station of the Academy of Sciences until 
his death in 1976. 


Lysenkoism 


Lysenko’s beliefs and theories were so at odds with the 
rest of contemporary genetics, both inside (initially) 
and outside the Soviet Union, that his doctrines came 
to be known as Lysenkoism. He did not, however, 
claim sole credit for his position; Lysenko cited a 
rather obscure Russian horticulturist, plant breeder, 
and patriot, Ivan V. Michurin (1855-1935) as his 
inspiration and intellectual forerunner. Thus, he
usually presented his views as “Michurinist,” and he
and his followers became known by that name. 
Michurin worked with fruit trees and developed a 
theory of ‘mentoring.’ 


By grafting twigs of old varieties of fruit trees on the 
branches of a young variety, the latter acquires properties 
which it lacks, these properties being transmitted to 
it through the grafted twigs of the old varieties. (Lenin 
Academy, 1949, pp. 38-39) 


Michurinist doctrine supposed that hereditary 
properties were transferred from graft to host and 
vice versa, clearly a belief inconsistent with chromosomal
theories of genetics. Michurin was a protégé of
Lenin, and Lysenko canonized him as one of the 
founders of the new Soviet biology. 


Soviet Genetics and Politics 


Genetics in the Soviet Union developed along neo-Mendelian
lines starting in the 1920s, and H.J. Muller
brought the first laboratory stocks of Drosophila to 
the USSR in 1922. In the 1930s Muller spent several 
years as Senior Geneticist in the Institute of Genetics 
of the USSR Academy of Sciences, but left in 1937 
after becoming disillusioned by the political controls 
being exerted over genetics. For complex political and 
ideological reasons, Mendelian genetics came to be 
viewed as ‘idealist’ as opposed to ‘realist,’ a serious 
sin in the Marxist ideology of the time. The new Soviet 
emphasis on scientism and the belief that changes in 
the political environment would create the ‘new 
Soviet man’ led to the hope that similarly, in biology, 
changes in the environment of living organisms, 
including humans, could produce long-lasting, herit- 
able changes (of course, all for the better) in the off- 
spring. Thus, a version of Lamarckianism came to be 
aligned with orthodox Marxist political philosophy.
At the same time, internal political struggles in the 
Soviet governing bodies involved important issues 
such as agricultural planning and farm management. 
Lysenko, an ambitious person, allied himself with a 
skilled Marxist philosopher, Isaac I. Prezent, and 
together they attacked Mendelian genetics and its 
practitioners in the USSR in a book published in 
1935. This attack marked the beginning of what later 
became known as “The Lysenko Affair.” Lysenko 
skillfully employed the government press and entered 
politics in 1935 as a member of the Central Executive 
Committee of the Ukrainian Communist Party. In
1936 he was appointed director of the Odessa Institute 
of Genetics and Breeding and that summer the 
presidium of the Lenin All-Union Academy of Agri- 
cultural Sciences (VASKhNIL) initiated public 
discussions on “issues in genetics.” Although the sup- 
porters of Mendelian genetics dominated these discus- 
sions, just a few months later, under the Stalinist Great 
Terror, many senior geneticists were purged and 
Lysenko’s supporters moved in to fill the voids in 
agricultural genetics. By 1938 Lysenko was president 
of VASKhNIL (having replaced Nikolai Vavilov, an 
internationally known geneticist), a member of the 
Supreme Soviet of the USSR, and a deputy head of 
the Soviet of the Union, the highest legislative body in 
the USSR. 

Postwar central planning in agriculture in the 
USSR called for expansion of VASKhNIL, which
Lysenko opposed, and for a time between 1945 and 
1947 there was a period of cooperation between the 
Soviets and the West, during which the geneticists 
recruited international opposition to the Lysenkoists. 
With the onset of the Cold War, however, science 
became part of the “patriotic campaign” and was 
exploited by Lysenko to clamp down on all foreign 
contacts. 

In the summer of 1948 Lysenko staged his famous 
purge of Soviet genetics. Under the guise of open 
discussion of scientific views, he organized a meeting 
to debate “The situation in biological science.” The 
meeting opened with Lysenko reading his carefully 
prepared paper outlining his theories of “Michurinist” 
biology as the basis of the “New Soviet Science.” For 
about a week, many of the leading geneticists in the 
USSR debated and criticized Lysenko’s position paper 
in the spirit of open scientific discussion. At the end of 
the meeting, Lysenko sprang his trap: In his conclud- 
ing remarks, he said 


The question is asked in one of the notes handed to me, What 
is the attitude of the Central Committee of the Party to my 
report? I answer: The Central Committee of the Party ex- 
amined my report and approved it. 
Thus, all the criticism of his secretly pre-approved
position branded the entire Mendelian genetics community
as enemies of State policy, a serious, possibly
fatal error at that time. At the next session, many of the
previously critical geneticists fearfully recanted and 
realized that Lysenko had won the political battle 
for control of hereditary science in the USSR. 
Recent scholarship in newly available archives shows 
that Stalin himself worked with Lysenko on the
draft of his talk to this meeting. The final draft has 
Stalin’s handwritten editing and marginalia, a testi- 
mony to the importance attached to genetics in 
the Soviet Union at that time. It took almost two 
more decades before the cumulative failures of 
“Michurinist” biology, in the form of repeated crop 
failures and food shortages, led to the demise of 
Lysenkoism and the removal of Lysenko from his 
dictatorship of Soviet genetics by Nikita Khrushchev 
in 1965. 
Lysine is one of the 20 amino acids commonly found
in proteins. Its abbreviation is Lys and its single-letter 
designation is K. As one of the essential amino acids in 
humans, it is not synthesized by the body and so must 
be provided in the individual’s diet (Figure 1). 


Figure 1 Lysine.


Lysis 
E Kutter 
Lysis is the bursting of a bacterial cell by the breaking
apart of its cell wall, leading to rupture of the cell 
membrane. An enzyme specialized in this function is 
called a lysozyme. 


See also: Lysozyme 
Lysogeny is a condition in which a bacterial cell carries
the genome of a virus in a relatively stable state.
Investigators of bacteriophage growth during the 
1920s and 1930s were often puzzled by a strange 
phenomenon: while some bacteria would produce 
phage shortly after infection, other bacteria yielded 
no phage and even appeared to be immune to infection 
by the phage. However, in a culture of such resistant 
bacteria, small amounts of phage appeared irregularly. 
These puzzling bacteria were termed lysogenic because 
it was supposed that some cells in a culture were 
capable of lysing and producing the observed phage. 
Max Delbrück refused to believe in the phenomenon
and ascribed the appearance of phage to sloppy technique,
even though some of the investigators, notably
Eugene and Elisabeth Wollman, were known to be
scrupulous workers.

In 1950, Andre Lwoff and Antoinette Gutmann de- 
monstrated the reality of lysogeny through painstaking 


experiments with a strain of Bacillus megaterium. 
They followed individual cells in microdrops of 
broth by microscopic examination; each time a cell 
divided, the daughter cells were separated into their 
own drops by micromanipulation. Occasionally, a cell 
would disappear from a drop, leaving behind phage 
whose presence could be demonstrated by growth 
on susceptible bacteria. In later experiments, Lwoff 
demonstrated that when lysogenic cells are irradiated 
with UV light, they lyse uniformly and liberate phage, a 
phenomenon called phage induction. The hypothetical 
intracellular state of the phage in a lysogenized bacter- 
ium was called a prophage. Mapping experiments by 
François Jacob and Elie Wollman (the son of the above Wollmans,
who were killed by the Nazis) then demonstrated 
that the phage lambda prophage is located at a specific 
site, near the genes for galactose metabolism. Dale 
Kaiser provided strong evidence that the prophage 
genome is integrated into the bacterial DNA so that 
it is continuous with the bacterial DNA on either side. 
Thus, when bacterial DNA replicates during each 
round of reproduction, the prophage DNA is re- 
plicated as part of the whole genome. (The process of 
lambda integration is discussed in an article of its own.) 

The lysogenic state is maintained by a control sys- 
tem intrinsic to the phage. Phage lambda, which has 
been most intensively studied, carries a single gene, cI,
that encodes a repressor protein. In a stable lysogenic 
state, this protein binds to certain sites in the lambda 
genome and represses transcription of all other lambda 
genes. However, establishment of the lysogenic state is 
a complex process involving the products of several 
genes, binding to a series of regulatory sites. The heart 
of the molecular decision between the lytic and lyso- 
genic states involves a competition between the 
repressor (cI) protein, which promotes lysogeny, and 
the Cro protein, which promotes lytic growth. The 
latter choice depends heavily on a complex process 
of antitermination (see Antitermination Factors). 
Furthermore, the decision involves proteins that meas- 
ure the availability of energy, as signalled by the level 
of cyclic AMP (cAMP) (see Cyclic AMP (cAMP)). A 
cell with an adequate supply of glucose has a low level 
of cAMP, and a phage entering such a cell is likely to 
enter the lytic cycle; if the glucose level falls, the level of 
cAMP rises, and a phage entering such a cell is more 
likely to go lysogenic. In effect, the phage is determin- 
ing whether the most prudent strategy for reproduc- 
tion is a ‘short-term tactic’ of using the available energy
for synthesis of a cellful of new phage or a ‘long-term 
tactic’ of producing more copies of its genome through 
bacterial growth. 


See also: Antitermination Factors; Cyclic AMP
(cAMP); Phage λ Integration and Excision
Lysozyme
Bacterial cells are generally protected from lysis induced
by such factors as osmotic shock by having a cell
wall made of peptidoglycan, also called murein. The
entire peptidoglycan sack around each bacterial cell is in
fact one giant, covalently bonded bag-shaped molecule.
Growth of the cell requires that links of this sack be 
opened up long enough to insert new links in between 
them; penicillin leads to the death of growing bacterial 
cells by interfering with the filling and resealing of 
these small gaps in the cell’s armor. Lysozymes are a par- 
ticular class of enzymes that are able to attack this mu- 
rein structure and thus generally effect the destruction 
of the cell. In 1922, the Scottish physician Alexander
Fleming showed that saliva, tears, and sweat all contained
a substance that could destroy bacteria. What he
was observing was in fact lysozyme, the first human
secretion shown to have chemotherapeutic properties.

Peptidoglycans are composed of long polysaccharides
that are alternating copolymers of N-acetylglucosamine
and N-acetylmuramic acid that are
cross-linked through unusual short peptides with
structures such as (L-Ala)-(D-Glu)-(L-Lys)-(D-Ala).

In gram-negative bacteria, the peptidoglycan sack 
is generally only one layer thick and lies just inside an 
outer membrane. In gram-positive bacteria, it has no 
outer membrane cover but is many layers thick; this 
thick sack is able to take up and retain the Gram stain, 
giving these bacteria their name. In both groups of 
bacteria, lysozymes catalyze the hydrolysis of the
glycosidic links between GlcNAc and MurNAc,
dissolving the cell wall. Lysozymes are found
throughout nature: in egg whites, in tears and sweat,
and in mucus. A number of bacteriophages also encode
lysozymes to help them get in and out of cells. Other
phages make other endolysins, enzymes with
peptidoglycan-degrading activity. These have somewhat
different specificity but the same function as
lysozyme, attacking the peptide crosslinker or the bond
on the other side of MurNAc.

Four families of endolysins have been identified:

1. The true lysozymes (glycosidases) that have just
been described, which include the products of
the bacteriophage T4 e gene (for endolysin) and
P22 gp19.

2. Transglycosylases, such as the phage lambda R
protein and the product of the P2 phage K gene, which
attack the same bond as lysozyme but conserve
the glycosidic bond energy by forming a cyclic 1,6-
disaccharide product. They catalyze the intramolecular
transfer of the O-muramyl residue to its
own C6 hydroxyl group.

3. The amidases, such as bacteriophage T7 gp3.5, which
degrade the amide bond between MurNAc and the
adjacent tetrapeptide crosslinker.

4. The endopeptidases, such as the Listeria
monocytogenes phage A500 endolysin Ply500, which
degrade the peptide bond between two tetrapeptides,
cutting between m-DAP and Ala.
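To make the bond specificities of these families concrete, the following toy Python sketch treats a glycan strand as a chain of GlcNAc-MurNAc repeats and records which bond each endolysin family attacks. It is bookkeeping, not chemistry: the names `ENDOLYSIN_TARGETS`, `cuts_glycan`, and `glycan_fragments` are hypothetical, stem peptides are not tracked as separate fragments, and the cyclic transglycosylase product is written as a 1,6-anhydro-MurNAc end ("anhMurNAc").

```python
from typing import Dict, List

# Bond attacked by each endolysin family (toy labels following the four families above).
ENDOLYSIN_TARGETS: Dict[str, str] = {
    "lysozyme": "MurNAc-GlcNAc glycosidic bond",
    "transglycosylase": "MurNAc-GlcNAc glycosidic bond",
    "amidase": "MurNAc-peptide amide bond",
    "endopeptidase": "peptide-peptide bond (m-DAP to Ala)",
}


def cuts_glycan(enzyme: str) -> bool:
    """True if the enzyme breaks the sugar backbone rather than a peptide link."""
    return ENDOLYSIN_TARGETS[enzyme].endswith("glycosidic bond")


def glycan_fragments(n_repeats: int, enzyme: str) -> List[str]:
    """Sugar fragments left after complete digestion of one toy strand.

    The strand is written GlcNAc-MurNAc-GlcNAc-MurNAc-..., so each cut between
    a MurNAc and the next GlcNAc leaves MurNAc at the end of the left fragment.
    """
    if n_repeats < 1:
        raise ValueError("a strand needs at least one repeat")
    unit = "GlcNAc-MurNAc"
    if not cuts_glycan(enzyme):
        # Amidases and endopeptidases free or sever peptides; the backbone stays whole.
        return ["-".join([unit] * n_repeats)]
    # Lysozyme hydrolyzes the bond; a transglycosylase instead closes the cut
    # MurNAc into a 1,6-anhydro ring. The last repeat keeps its original, uncut end.
    cut_unit = "GlcNAc-anhMurNAc" if enzyme == "transglycosylase" else unit
    return [cut_unit] * (n_repeats - 1) + [unit]
```

For example, `glycan_fragments(3, "transglycosylase")` yields two anhydro-ended disaccharides plus the untouched terminal repeat, while an amidase leaves the three-repeat backbone in one piece.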


See also: Lysis 
A virulent phage that cannot establish lysogeny and
whose characteristic mode of multiplication is to pro- 
duce rapidly a large number of new phage particles 
and lyse its host cell. 


See also: Lysogeny; Virulent Phage 
The ciliated protozoa are binucleate organisms that
undergo dramatic DNA rearrangement during the 
process of sexual reproduction. Each ciliate cell con- 
tains one or more micronuclei and macronuclei. The 
micronuclear genome is arranged as conventional 
eukaryotic chromosomes, but is transcriptionally 
inactive during asexual reproduction (binary fission). 
It plays a major role during sexual reproduction and 
is often considered an analog of a ‘germline’ nucleus. 
The second type of nucleus, the macronucleus, is 
responsible for all nuclear transcription during asexual 
growth and is thus often referred to as a ‘somatic’ 
nucleus. Its genome represents a subset of the sequen- 
ces present in the micronucleus organized in the form 
of minichromosomes. Following mating (conjuga- 
tion), the macronucleus is destroyed and a mitotic 
copy of the new micronucleus is transformed into a 
new macronucleus via a complex series of genome 
rearrangements. The rearrangement processes of 
macronuclear development include chromosome frag- 
mentation, DNA amplification, excision of interstitial 
DNA segments, and the reordering of DNA segments. 


Conjugation 


Sexual reproduction is initiated by the pairing of cells
of compatible mating types. The micronuclei in each 
cell then undergo meiosis to form haploid products. 
Next, a haploid nucleus is passed to the mating part- 
ner, where it fuses with a resident haploid nucleus 
to regenerate a diploid nucleus termed the zygotic 
nucleus. The zygotic nucleus divides at least once by 
mitosis, in the absence of cell division. Some of these 
mitotic products are retained as micronuclei when the 
cell resumes asexual growth, while others undergo 
macronuclear development to form new macronuclei. 
The specific factors responsible for determining
micronuclear vs. macronuclear differentiation are unknown,
but the position of nuclei within the cell is important
in determining the fate of the mitotic products.


Chromosome Fragmentation and 
Telomere Addition 


Macronuclear development in all characterized cili- 
ates involves multiple rounds of DNA replication 
that ultimately lead to a polyploid macronucleus (see 
article on Macronucleus for the ploidy levels of repre- 
sentative ciliates). Various types of DNA rearrange- 
ment occur during this DNA amplification process, 
with one of the major events being chromosome
fragmentation (Figure 1A). Following fragmentation,
telomeric repeat sequences are quickly added to the 
DNA ends of the macronuclear-destined sequences. 
The fragmentation/telomere addition process is gen- 
erally reproducible, but varying degrees of heterogen- 
eity in the precise position of telomeric repeat 
addition are observed for different species. In Tetra- 
hymena thermophila, a 15 bp sequence element termed 
the Cbs (Chromosome breakage sequence) is necessary 
and sufficient to direct fragmentation and telomere 
addition. Each Cbs resides in developmentally
eliminated DNA and appears to work in an
orientation-independent manner. A different conserved 10 bp
sequence (E-Cbs) is found adjacent to chromosome 
fragmentation sites in the hypotrich Euplotes crassus. 
E-Cbs can reside in either eliminated or macronu- 
clear-destined DNA and is thought to direct fragmen- 
tation in an orientation-dependent manner, suggesting 
that there may be significant differences in chromo- 
some fragmentation among ciliates. Little is known as 
yet concerning the molecular mechanism(s) of chro- 
mosome fragmentation and the proteins mediating it 
have not been identified. In contrast, de novo telomere 
addition is known to be catalyzed by the
ribonucleoprotein telomerase. The ciliates are unique
in their ability to efficiently ‘heal’ the DNA ends
generated during macronuclear development by
telomere addition.
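Fragmentation followed by telomere healing can be sketched as string operations on one strand. This is a toy model under stated assumptions: `CBS_MOTIF` is a hypothetical 15 bp stand-in, not the real Tetrahymena Cbs; only the motif itself is treated as eliminated at each break; every resulting piece (including the original chromosome ends) is capped; and the cap uses the Tetrahymena 5'-GGGGTT-3' repeat (see Macronucleus). The motif is searched on both strands to mirror the orientation independence of the Cbs.

```python
from typing import List, Tuple

COMPLEMENT = str.maketrans("ACGT", "TGCA")
CBS_MOTIF = "AAAGGGTTTCCCAAA"  # hypothetical 15 bp breakage element (not the real Cbs)
TELOMERE_G = "GGGGTT"          # Tetrahymena G-rich telomeric repeat


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def breakage_sites(seq: str) -> List[Tuple[int, int]]:
    """(start, end) of every motif copy on either strand, in sequence order."""
    sites = []
    for motif in {CBS_MOTIF, revcomp(CBS_MOTIF)}:  # set avoids double-counting palindromes
        pos = seq.find(motif)
        while pos != -1:
            sites.append((pos, pos + len(motif)))
            pos = seq.find(motif, pos + 1)
    return sorted(sites)


def fragment_and_heal(seq: str, n_repeats: int = 3) -> List[str]:
    """Break at each site, discard the motif, and add telomeres to both ends.

    Read 5'->3' on the top strand, a right-hand end carries (GGGGTT)n and a
    left-hand end carries its reverse complement, (AACCCC)n.
    """
    right_cap = TELOMERE_G * n_repeats
    left_cap = revcomp(right_cap)
    pieces, prev = [], 0
    for start, end in breakage_sites(seq):
        pieces.append(seq[prev:start])
        prev = end
    pieces.append(seq[prev:])
    return [left_cap + p + right_cap for p in pieces if p]
```

A sequence carrying one motif copy in each orientation is split into three capped minichromosomes.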


Figure 1 The three major types of DNA rearrangement that occur during ciliate macronuclear development. (A)
Chromosome fragmentation, generating the ends of two macronuclear minichromosomes, is illustrated. Following
fragmentation, species-specific telomeric repeats (C(3-4)A(2-4)) are added to the minichromosome ends. Also shown is
the removal of an IES by DNA breakage and rejoining. Macronuclear-destined DNA sequences are indicated as open
rectangles, the IES as a black rectangle, and a micronuclear-specific ‘spacer’ sequence as a line. (B) Illustration of the
DNA scrambling observed for some oxytrichid genes. Segments of micronuclear DNA are reordered, and sometimes
inverted, during macronuclear development to form a macronuclear minichromosome. (Reproduced with permission
from Klobutcher and Herrick, 1997.)


Internal Eliminated Sequences


Large numbers of interstitial DNA segments, termed
‘internal eliminated sequences’ (IESs), are also excised
during macronuclear development, with the concomitant
rejoining of the flanking DNA (Figure 1A).
In Tetrahymena thermophila, about 6000 IESs are 
excised. The IESs are generally a few kilobase pairs 
in size, are bounded by 4-8 bp direct repeats of vary- 
ing sequence, and share little similarity. There are 
typically more than 50,000 IESs in Paramecium and
hypotrichous ciliates, and they are generally smaller, ranging
from 14 to ~500 bp in size. The hypotrichs Euplotes
crassus, Oxytricha fallax, and O. trifallax also contain 
large families of transposable elements that behave as 
IESs during macronuclear development (i.e., they are 
excised). The termini of the small IESs in Paramecium 
and Euplotes are conserved and similar to the ends of 
the Euplotes Tec transposon IESs. This observation 
has bolstered suggestions that the IES excision process 
originated from transposons that invaded the micro- 
nuclear genome. In this sense, the IESs can be viewed 
as a form of ‘get-out-of-the-way’ transposon, similar 
to mobile introns and inteins that are removed by 
RNA and protein splicing, respectively. The ability 
of all these elements to be removed at some point 
during the process of gene expression enhances their 
ability to coexist with their host genomes. While the 
relationship of IESs to transposons is less evident in 
species such as Tetrahymena and other hypotrichs, 
it is noteworthy that analysis of either excision
intermediates or excision products in these species
has led to excision models that resemble known
transposition mechanisms.
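IES removal with rejoining of the flanking DNA can be sketched as a splice between the two copies of the bounding direct repeat. This minimal sketch assumes, as is commonly described for ciliate IESs, that one copy of the repeat is retained in the macronuclear product; the function name and the coordinate convention are hypothetical.

```python
def excise_ies(mic: str, left_repeat: int, right_repeat: int, repeat_len: int) -> str:
    """Remove an IES bounded by two copies of a short direct repeat.

    left_repeat and right_repeat are 0-based start positions of the two repeat
    copies. Everything from the left copy up to (but not including) the right
    copy is removed, so exactly one repeat copy joins the flanking DNA.
    """
    if repeat_len <= 0:
        raise ValueError("repeat_len must be positive")
    left = mic[left_repeat:left_repeat + repeat_len]
    right = mic[right_repeat:right_repeat + repeat_len]
    if left != right or right_repeat < left_repeat + repeat_len:
        raise ValueError("coordinates do not mark two non-overlapping copies of one repeat")
    return mic[:left_repeat] + mic[right_repeat:]
```

With a 4 bp repeat TAGC flanking an 8 bp IES, `excise_ies("GATTACA" + "TAGC" + "CCCCCCCC" + "TAGC" + "GGAATT", 7, 19, 4)` leaves the flanks joined through a single TAGC.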


DNA Scrambling 


Some hypotrichous ciliates in the genus Oxytricha 
have been found to undergo an additional DNA re- 
arrangement process: unscrambling. The micronuclear 
copies of some of the macronuclear chromosomes 
in this group are not only interrupted, but the segments
that will form the macronuclear DNA molecule are 
not in the correct order and in some cases inverted 
(Figure 1B). The micronuclear copy of the macronu- 
clear DNA molecule containing a DNA polymerase 
alpha gene represents an extreme form of micronuclear 
scrambling: it is split into at least 51 segments, most of 
which are scrambled. Unscrambling during macro- 
nuclear development appears to be guided by 6-19 bp 
repeats which flank the sequences that are ultimately 
joined together. The origin of gene scrambling is un- 
clear, but it may be related to the IES excision process. 
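Repeat-guided unscrambling can be sketched as a greedy overlap walk. This is an illustrative model only: each segment is assumed to end in a repeat of fixed length that reappears, in either orientation, at the start of the segment that should follow it, and the first segment is assumed known. Real repeats vary in length (6-19 bp, as noted above) and can match by chance, which this toy ignores; the function names are hypothetical.

```python
from typing import List, Optional, Tuple

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def next_segment(repeat: str, pool: List[str]) -> Optional[Tuple[int, str]]:
    """Index and correctly oriented sequence of the segment that starts with repeat."""
    for i, seg in enumerate(pool):
        for oriented in (seg, revcomp(seg)):  # segments may be inverted
            if oriented.startswith(repeat):
                return i, oriented
    return None


def unscramble(first: str, scrambled: List[str], repeat_len: int) -> str:
    """Join segments in macronuclear order, keeping one copy of each flanking repeat."""
    result, pool = first, list(scrambled)
    while pool:
        hit = next_segment(result[-repeat_len:], pool)
        if hit is None:
            raise ValueError("no remaining segment carries the next repeat")
        i, oriented = hit
        result += oriented[repeat_len:]
        pool.pop(i)
    return result
```

In the usage case below, the micronuclear order is segment 1, segment 3, inverted segment 2; the walk finds the repeat GCA on the reverse complement of the inverted segment, flips it back, and then uses ACC to attach segment 3.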


Chromatin Structure and DNA 
Rearrangement 


While the machineries responsible for the various
forms of ciliate DNA rearrangement have not yet
been identified, there is increasing evidence that
changes in chromatin structure occur during
macronuclear development. Tetrahymena genes have been
identified that encode development-specific proteins 
which interact with DNA sequences eliminated 
during macronuclear development. The eliminated 
DNA appears to be heterochromatic, suggesting an 
alternative chromatin structure. In addition, a variant 
development-specific histone H3 protein in Euplotes 
has been shown to be targeted to the developing 
macronucleus and its expression correlated with a 
change in nucleosome spacing. Chromatin remodeling 
may be a prerequisite for ciliate DNA rearrangement, 
or, alternatively, may be involved in the subsequent 
process of DNA elimination. 
Unicellular ciliated protozoa (e.g., Tetrahymena,
Paramecium, Oxytricha) possess two types of nuclei 
in each cell. The smaller micronucleus functions pri- 
marily during sexual reproduction and is considered 
a ‘germline nucleus’ (see Micronucleus). The larger 
macronucleus is responsible for all nuclear transcrip- 
tion during asexual reproduction and is often consid- 
ered an analog of a somatic nucleus. During sexual 
reproduction (conjugation), the macronucleus is des- 
troyed and a new macronucleus is generated from a 
mitotic copy of the micronucleus by a process involv- 
ing extensive rearrangement of the micronuclear 
genome (see Macronuclear Development, in Ciliates). 

Macronuclear development involves both the 
fragmentation of the conventional eukaryotic 
chromosomes that were originally present in the
micronuclear genome, as well as extensive DNA elim- 
ination. As a result, the macronucleus contains a sub- 
set of the DNA originally present in the micronucleus 
and it exists in the form of subchromosomal DNA 
molecules. The sizes of the macronuclear DNA mole- 
cules can vary greatly in different ciliate species. At 
one extreme are the hypotrichous ciliates (e.g., Oxy- 
tricha, Stylonychia, Euplotes). The average size of a 
macronuclear DNA molecule in these species is 
~2 kb, and the majority of macronuclear DNA mol- 
ecules contain single genes. In other ciliates, such 
as Tetrahymena and Paramecium, the macronuclear 
DNA molecules are much larger (average sizes of 
600 kb and 300 kb, respectively) and contain many
different genes. The relatively large size of macronu- 
clei, particularly in comparison to the micronucleus, is 
the result of polyploidy. There are approximately 45 
copies of the typical macronuclear DNA molecule in 
Tetrahymena thermophila, while organisms such as 
Paramecium and the hypotrichous ciliates typically 
have >1000 copies of each macronuclear DNA mol- 
ecule. These high copy numbers are attained by the 
DNA amplification that occurs during macronuclear 
development and are maintained during asexual 
reproduction. 

Ciliate macronuclear DNA molecules are often 
called chromosomes or minichromosomes, because 
they are capped by telomeres and contain one or 
more origins of DNA replication. They do, however, 
appear to lack one key component of a typical eukary- 
otic chromosome: a centromere. As a result, the 
macronucleus does not divide by mitosis, but by 
simply pinching in half. This process is referred to 
as amitosis. For heterozygous loci in Tetrahymena 
thermophila, it is clear that amitosis does not result 
in the systematic segregation of the two alleles to 
daughter nuclei. That is, the multiple copies of the 
two alleles are randomly segregated. After many asex- 
ual divisions, this results in the appearance of cells 
in the population that only contain copies of one of 
the original two alleles. This process of ‘phenotypic 
assortment’ provides a means of generating cells with 
different macronuclear and micronuclear genotypes. 
It is also useful in testing whether new mutations 
result in lethal or viable phenotypes. Assortment of 
alleles probably occurs to some degree in all ciliate 
macronuclei, but the large copy numbers of macro- 
nuclear DNA molecules in many species make it un- 
likely that cells bearing one allele exclusively will be 
generated in a reasonable number of generations. 
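Phenotypic assortment can be illustrated with a small simulation. The sketch assumes that each amitotic division first doubles every macronuclear DNA molecule and then hands a random half of the copies to the daughter being followed (sampling without replacement). The function name, starting allele fraction, and division cap are illustrative choices, and the model ignores selection and any copy-number control.

```python
import random
from typing import Optional


def divisions_until_assorted(copies: int, rng: random.Random,
                             start_fraction: float = 0.5,
                             max_divisions: int = 1_000_000) -> Optional[int]:
    """Follow one cell lineage until its macronucleus carries a single allele.

    copies is the number of copies of the macronuclear molecule per nucleus
    (about 45 in Tetrahymena thermophila, per the text above).
    Returns the number of divisions taken, or None if the cap is reached.
    """
    allele_a = round(copies * start_fraction)
    divisions = 0
    while 0 < allele_a < copies:
        if divisions == max_divisions:
            return None
        # Replicate every copy, then give a random half to the daughter we follow.
        pool = [1] * (2 * allele_a) + [0] * (2 * (copies - allele_a))
        allele_a = sum(rng.sample(pool, copies))
        divisions += 1
    return divisions
```

Averaging many runs with larger values of `copies` shows the waiting time growing roughly in proportion to copy number, which is why pure single-allele cells are unlikely to arise in species with more than 1000 copies of each molecule.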

In the hypotrich group of ciliates, replication of 
the macronuclear genome occurs in an unusual man- 
ner. In contrast to other eukaryotic nuclei, where 
replication occurs at multiple foci, macronuclear 
DNA replication is localized to specific structures
termed ‘replication bands.’ These bands originate 
at one or both ends of the macronucleus and pass 
through the nucleus, replicating the DNA they en- 
counter along the way. The biochemical components 
of these unusual replication structures are currently 
poorly characterized. 

The ciliate macronucleus has been instrumental 
in understanding the structure and function of telo- 
meres. Telomeres are structures present at the ends of 
chromosomes that both serve a protective function 
and allow for the complete replication of the chromo- 
some. The key advantage of ciliates as experimental 
systems derives from the huge numbers of telomeres 
present in a single macronucleus. In contrast to a 
human cell nucleus, which contains 92 telomeres, the 
macronucleus of a hypotrichous ciliate can have in 
excess of 10 million telomeres! Telomeres were first 
shown to consist of simple tandem repeats in
ciliates. For example, hypotrich telomeres are
composed of 5'-GGGGTTTT-3' repeats, while
Tetrahymena telomeres are composed of 5'-GGGGTT-3'
repeats. Components of the ribonucleoprotein telo- 
merase, the enzyme responsible for synthesizing 
telomeric repeats, were also first isolated in ciliates. 
Key components of the enzyme include a short RNA 
molecule, which serves as the template for repeat 
synthesis, and a protein subunit that is related to 
the reverse transcriptase proteins of retroviruses. 
Finally, proteins that interact with macronuclear telo- 
meres have also been identified in ciliates and these 
appear to form a complex that renders the termini
nonrecombinogenic.
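The telomere numbers above follow from simple counting: every linear DNA molecule has two ends. The sketch below checks the human figure and derives how many distinct macronuclear molecules a hypotrich would need, at an assumed 1000 copies each (the lower bound given earlier), to exceed 10 million telomeres. It also tells the two repeat types apart at the 3' end of a G-rich strand. The function names and the two-repeat matching threshold are hypothetical choices.

```python
from typing import Optional

REPEATS = {"hypotrich": "GGGGTTTT", "Tetrahymena": "GGGGTT"}


def telomere_count(molecules: int, copies_each: int = 1) -> int:
    """Each linear molecule carries a telomere at both ends."""
    return 2 * molecules * copies_each


def min_molecules_for(telomeres: int, copies_each: int) -> int:
    """Smallest number of distinct molecules giving more than `telomeres` ends."""
    return telomeres // (2 * copies_each) + 1


def repeat_type(g_rich_end: str, min_units: int = 2) -> Optional[str]:
    """Name the repeat that the 3' end of a G-rich strand is built from."""
    for name, unit in REPEATS.items():
        if g_rich_end.endswith(unit * min_units):
            return name
    return None
```

`telomere_count(46)` gives the 92 telomeres of a human nucleus, and `min_molecules_for(10_000_000, 1000)` returns 5001, so a few thousand distinct macronuclear molecules at that copy number already account for the 10 million figure.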




See also: Macronuclear Development, in Ciliates; 
Micronucleus; Telomeres 
The major histocompatibility complex (MHC) is the
region of a genome responsible for producing the
majority of proteins involved in the immunological
rejection process. In mouse, this region is referred to 
as H-2 and is located on chromosome 17. Analogous 
MHCs have been identified in all mammalian species 
studied so far. In man, the MHC is the HLA gene 
cluster located on chromosome 6. 

Although first identified by its role in transplant
rejection, the MHC is now known to encode proteins
required for immunological recognition, for example, 
interactions between lymphocytes and antigen- 
presenting cells. Three major sets of molecules 
are encoded within the MHC: class I, II, and III 
antigens. Class I and II antigens are involved in im- 
munological recognition. Class III genes encode com- 
plement components required for the cleavage of 
C3, a major event in the initiation of an inflammatory 
response. 

Genetic maps of human MHC have revealed that 
class I genes, encoding predominantly classical HLA- 
A, HLA-B, and HLA-C antigen heavy chains, are 
clustered into one region of the MHC. These
transmembrane glycoproteins associate with the
polypeptide β2-microglobulin, which is encoded outside the
MHC. HLA-A and HLA-B act as cell-surface
recognition molecules, recognized by cytotoxic T cells. By
contrast, the human class II genes are arranged in six
subregions, DP, DZ, DO, DX, DQ, and DR, each of
which encodes at least one α and/or one β
polypeptide. These subunits noncovalently associate to form
proteins required for cooperation and interaction 
between cells of the immune system. The class III 
region has genes for the serum complement compon- 
ents C2 and Factor B, the two genes for the serum 
complement components of C4 (C4A and C4B), and 
the two genes for cytochrome P-450 21-hydroxylase 
(21-OHA and 21-OHB). 

HLA occupies around 1/3000 of the total genome 
and contains several hundred individual genes. Since 
there are a large number of extremely polymorphic 
gene loci in the MHC, a normal population has many 
different haplotypes. This ensures that the “perfect 
pathogen” is unable to evolve and spread through a 
population but also renders some individuals more 
susceptible to certain diseases than others. MHC geno- 
types must be matched optimally for successful tissue 
transplantation. 
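Two counting arguments lie behind this paragraph. If alleles at each polymorphic locus combine freely, the number of possible haplotypes is the product of the allele counts. And because the MHC is usually inherited as an intact haplotype, two full siblings are HLA-identical with probability 1/4. The sketch assumes no recombination within the MHC and four distinct parental haplotypes; the allele counts in the usage line are hypothetical.

```python
import math
from fractions import Fraction
from itertools import product
from typing import Sequence


def possible_haplotypes(allele_counts: Sequence[int]) -> int:
    """Distinct haplotypes if alleles at each locus combine freely."""
    return math.prod(allele_counts)


def sibling_hla_identity() -> Fraction:
    """Chance that two full siblings inherit the same pair of parental haplotypes."""
    father, mother = ("a", "b"), ("c", "d")   # four distinct haplotypes (assumed)
    children = list(product(father, mother))  # four equally likely genotypes
    pairs = list(product(children, repeat=2))
    identical = sum(1 for x, y in pairs if x == y)
    return Fraction(identical, len(pairs))


print(possible_haplotypes([10, 20, 5]))  # 1000 (hypothetical allele counts at three loci)
print(sibling_hla_identity())            # 1/4
```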


See also: Haplotype; Immunoglobulin Gene 
Superfamily 
Why Study the Mouse?


The common house mouse Mus musculus has played a 
prominent role in the study of genetics since the 
rebirth of the field at the beginning of the twentieth 
century. This birth occurred with the rediscovery of 
Mendel’s laws by three independent European
scientists: C. Correns, H. de Vries, and E. Tschermak. But
the research of these three scientists, as well as that of Mendel
himself, was performed entirely on plants. As a
consequence, there was initial skepticism in the scientific
community as to whether Mendel’s laws could explain 
the basis for inheritance in animals, and especially in 
human beings. The reason for this skepticism is easy to 
see. People, in particular, differ in the expression of 
many commonly inherited traits — such as skin color, 
eye color, curliness of hair, and height — that show no 
evidence of transmission according to Mendel’s Laws. 
We now understand that all of these traits are con- 
trolled by multiple genes that each individually segre- 
gate according to Mendel’s First Law, even though the 
ultimate trait that they control does not. But, at the 
beginning of the twentieth century, a demonstration 
of the applicability of Mendel’s Laws required the 
analysis of simple traits controlled by single genes. 

The house mouse has a long history of domestica- 
tion as a pet, and over the centuries, mice with numer- 
ous coat color and other gross mutations were selected 
and bred by dealers in the ‘fancy mouse’ trade, first in 
China and Japan, and later in Europe. In contrast to 
the variation that occurs naturally in wild populations, 
new traits that appear suddenly in captive-bred ani- 
mals are always the result of single gene mutations. 
Early animal geneticists appreciated the importance of 
the genetic resource available within the fancy mice 
and these animals were quickly put to use to demon- 
strate the applicability of Mendel’s Laws to mammals, 
and by extrapolation, to humans as well. 

Beyond the readily available fancy mouse muta- 
tions, there are a number of other compelling reasons 
why the house mouse has continued to represent the 
mammal of choice for genetic analysis. Mice have very 
short generation times of just 8-9 weeks, they are 
small enough so that thousands can be housed in 
relatively small rooms, they have large litters of eight 
or more pups, they breed readily in captivity, fathers 
do not harm their young, and after centuries of
artificial selection, they are docile and easily handled.

But why study a mammal at all when animals like the
fruit fly Drosophila melanogaster and the nematode 
Caenorhabditis elegans are even smaller and much 
more amenable to genetic analysis? The answer is that 
a significant portion of biological research is aimed at 
understanding ourselves as human beings. And al- 
though many features of human biology, especially at 
the cell and molecular level, are shared across a broad 
spectrum of life, our most advanced organismal-level 
characteristics are shared in a much more limited fash- 
ion with other animals. In particular, many aspects of 
human development and disease are common only to 
placenta-bearing mammals such as the mouse. 


All Mammals Have Closely Related 
Genomes 


The movement of mouse genetics from a backwater 
field of study to the forefront of modern biomedical 
research was catalyzed by the recombinant DNA 
revolution, which began 25 years ago and has been 
accelerating in pace ever since. With the ability to 
isolate cloned copies of genes and to compare DNA 
sequences from different organisms came the realiz- 
ation that mice and humans (as well as all other pla- 
cental mammals) are even more similar genetically than 
they were thought to be previously. An astounding 
finding has been that all human genes have counter- 
parts in the mouse genome which can almost always 
be recognized by cross-species hybridization. Thus, 
the cloning of a human gene leads directly to the 
cloning of a mouse homolog which can be used for 
genetic, molecular, and biochemical studies that can 
then be extrapolated back to an understanding of the 
function of the human gene. In only a subset of cases 
are mammalian genes conserved within the genomes 
of Drosophila or C. elegans. 

This result should not be surprising in light of 
current estimates for the time of divergence of mice, 
flies, and nematodes from the evolutionary line lead- 
ing to humans. In general, three types of information 
have been used to build phylogenetic trees for dis- 
tantly related members of the animal kingdom - 
paleontological data based on radiodated fossil 
remains, sequence comparisons of highly conserved 
proteins, and direct comparisons of the most highly 
conserved genomic sequences, namely the ribosomal 
genes. Unfortunately, flies (Drosophila) and nematodes 
(C. elegans) diverged from the line leading to
mammals before the earliest fossil records
in the Cambrian period, 500-600
million years ago. Nevertheless, sequence data together 
with taxonomic considerations indicate a distant point 
of departure of C. elegans and vertebrates from a
common ancestor that lived on the order of one billion
years ago. Drosophila diverged from the vertebrate
line somewhat later, approximately
700 million years ago. The divergence of mice and
people occurred relatively recently, about 60 million
years ago. Thus, humans and mice are roughly ten
times more closely related to each other than either is 
to flies or nematodes. 
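The "ten times" figure is a round number. Dividing the divergence times quoted above gives

```latex
\frac{t_{\text{fly}}}{t_{\text{mouse}}} \approx \frac{700\ \text{Myr}}{60\ \text{Myr}} \approx 11.7,
\qquad
\frac{t_{\text{worm}}}{t_{\text{mouse}}} \approx \frac{1000\ \text{Myr}}{60\ \text{Myr}} \approx 16.7
```

so, measured by time since divergence, mice are at least an order of magnitude closer to humans than flies or nematodes are.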

Although the haploid chromosome number asso- 
ciated with different mammalian species varies tre- 
mendously, the haploid content of mammalian DNA 
remains constant at approximately 3 billion base pairs. 
It is not only the size of the genome that has remained 
constant among mammals; the underlying genomic 
organization has also been conserved.
Large genomic segments — on average, 10 to 20 million 
base pairs — have been conserved intact between mice, 
humans, and other mammals as well. In fact, the avail- 
able data suggest that a rough replica of the human 
genome could be built by simply breaking the mouse 
genome into 130-170 pieces and pasting them back 
together again in a new order. Although all mammals 
are remarkably similar in their overall body plan, there 
are some differences in the details of both develop- 
ment and metabolism, and occasionally these differ- 
ences can prevent the extrapolation of mouse data to 
humans and vice versa. Nevertheless, the mouse has 
proven itself over and over again as being the model 
experimental animal par excellence for studies of 
nearly all aspects of human genetics. 
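As a rough consistency check on these figures, dividing a haploid mammalian genome of about 3 × 10^9 bp into 130 to 170 conserved pieces gives an average piece size of

```latex
\frac{3\times10^{9}\ \text{bp}}{170} \approx 1.8\times10^{7}\ \text{bp}
\quad\text{to}\quad
\frac{3\times10^{9}\ \text{bp}}{130} \approx 2.3\times10^{7}\ \text{bp},
```

the same order of magnitude as the 10 to 20 million base pair conserved segments quoted above; both estimates are approximate.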


The Mouse Is an Ideal Model Organism 


Among mammals, the mouse is ideally suited for 
genetic analysis. First, it is among the smallest mam- 
mals known, with adult weights in the range of 25 to 
40 grams, 2000- to 3000-fold lighter than the average 
human adult. Second, it has a short generation time,
on the order of 8-9 weeks from being born to giving
birth. Third, females breed prolifically in the labora- 
tory with five to ten pups per litter and an immediate 
postpartum estrus. Fourth, an often forgotten advan- 
tage is the fact that fathers do not harm their young, 
and thus breeding pairs can be maintained together 
after litters are born. Fifth, for developmental studies, 
the deposition of a vaginal plug allows an investigator 
to time all pregnancies without actually witnessing 
the act of copulation and, once again, without re- 
moving males from the breeding cage. Finally, most 
laboratory-bred strains are relatively docile and easy 
to handle. 

High-resolution genetic studies require the analysis 
of large numbers of offspring from each of the crosses 
under analysis. Thus, a critical quotient in choosing an 


organism can be expressed as the number of animals 
bred per square meter of animal facility space per year. 
For mice, this number can be as high as 3000 pups per
m² per year, including the actual space for racks (five shelves
high) and the interrack space as well. All of the reasons 
listed here make the mouse an excellent species for gen- 
etic analysis and have helped to make it the major model 
for the study of human disease and normative biology. 


High-Resolution Genetics 


With the automation and simplification of molecular 
assays that have occurred over the last several years, it 
has become possible to determine chromosomal map 
positions to a very high degree of resolution. Genetic 
studies of this type are relying increasingly on ex- 
tremely polymorphic microsatellite loci to produce 
anchored linkage maps, and large-insert cloning
vectors, such as yeast artificial chromosomes (YACs),
to move from the observation of a phenotype, to a
map of the loci that cause the phenotype, to clones of
the loci themselves. Thus, many of the advantages that
were once uniquely available to investigators studying 
lower organisms, such as flies and worms, can now be 
applied to the mouse through the three-way marriage 
of genetics, molecular biology, and embryology. 

How should one go about performing a mapping 
project? The answer to this question will be deter- 
mined by the nature of the problem at hand. Is there 
a particular locus, or loci, of interest that you wish to 
map? If so, at what level is the locus defined, and at 
what resolution do you wish to map it? Is the locus 
associated with a DNA clone, a protein-based poly- 
morphism, or a gross phenotype visible only in the 
context of the whole animal? Are you interested in 
mapping a transgene insertion site unique to a single 
line of animals? Do you have a new mutation found in 
the offspring from a mutagenesis experiment? Alter- 
natively, are you isolating clones to be used as poten- 
tial DNA markers for a specific chromosome or 
subchromosomal region with the need to know sim- 
ply whether each clone maps to the correct chromo- 
some or not? The answers to these questions will lead 
to the choice of a general mapping strategy. 


Novel DNA Clones 


Gene cloning has become a standard tool for analysis 
by biologists of all types from those studying protein 
transport across cell organelles to those interested in 
the development of the nervous system. Genes are 
often cloned based on function or pattern of expres- 
sion. With a cloned gene in-hand, how does one deter- 
mine its location in the genome? Today, the answer to 
this question is always through the use of an established 


mapping panel. Mapping with established panels is 
relatively painless and very quick. Furthermore, it can 
provide the investigator with a highly accurate loca- 
tion within a single chromosome of the mouse genome. 
With these results in-hand, it is always worthwhile to 
determine whether the newly mapped clone could 
correspond to a locus previously defined by a related 
trait or disease phenotype. This can be accomplished 
by consulting the most recent version of the genetic 
map for the region of interest in a computer database. 


Mutant Phenotypes 


For loci defined by phenotype alone, rapid mapping is 
usually not possible. Interest in the new phenotype is 
likely to lie within its novelty and, as such, the parental 
strains used in all standard mapping panels are almost 
certain to be wild-type at the guilty locus. Thus, a 
broad-based recombinational analysis can be accom- 
plished only by starting from scratch with a cross 
between mutant animals and a standard strain. Before 
one embarks on such a large-scale effort, it makes 
sense to consider whether the mutant phenotype, or 
the manner in which it was derived, can provide any 
clues to the location of the underlying mutation. Is the 
mutant phenotype similar to one that has been pre- 
viously described in the literature? Does the nature of 
the phenotype provide insight into a possible bio- 
chemical or molecular lesion? 

The most efficient way to begin a search for poten- 
tially related loci is to search through an online data- 
base of genetic mapping information. Phenotypically 
related loci can be uncovered by searching
electronic databases for the appearance of well-chosen
keywords. Finally, one can carry out a computerized
online search through the entire biomedical literature. 
Once again, this search need not be confined to the 
mouse since similarity to a human phenotype can be 
informative as well. 

When a possible relationship with a previously
characterized locus is uncovered, genetic studies should be
directed at proving or disproving identity. This is most 
readily accomplished when the previously character- 
ized locus — either human or mouse — has already been 
cloned. A clone can be used to investigate the
possibility of aberrant expression in mice that carry
the new mutation. One can also follow the segregation of
the cloned locus in animals that segregate the new 
mutation. Absolute linkage would provide evidence 
in support of an identity between the new mutation 
and the previously characterized locus. 

Even if the previously characterized mutant locus 
has not yet been cloned, it may still be possible to 
test a relationship between it and the newly defined 
mutation. If the earlier mutation exists in a mouse strain
that is still alive (or frozen), it becomes possible to
carry out classical complementation analysis. This 
analysis is performed by breeding together animals 
that carry each mutation and examining the pheno- 
type of offspring that receive both. If the two muta- 
tions — m1 and m2, for example — are at different loci, 
then the double mutant animals will have a genotype 
of +/m1, +/m2. If both mutations express a recessive 
phenotype, then this double mutant animal, with 
wild-type alleles at both loci, would appear wild-type; 
this would be an example of complementation. On the 
other hand, if the two mutations are at the same locus, 
then the double mutant animal would have a com- 
pound heterozygous genotype of m1/m2. Without 
any wild-type allele at this single locus, one would 
expect to see expression of a mutant phenotype; this 
would be an example of noncomplementation. 
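The reasoning of the complementation test can be written out as a short Python sketch. It is an illustrative model only, assuming that both mutations are fully recessive and that each parent is homozygous for its own mutation; the locus names and helper functions are hypothetical. In a real test the breeder observes the offspring phenotype and infers the locus relationship; the sketch runs the same logic forward from a known locus assignment.

```python
from typing import Dict, Tuple

# Hypothetical encoding: a genotype maps each locus name to its two alleles;
# "+" is the wild-type allele and any other symbol is a mutant allele.
Genotype = Dict[str, Tuple[str, str]]


def f1_genotype(locus_of_m1: str, locus_of_m2: str) -> Genotype:
    """Offspring genotype from an m1/m1 x m2/m2 cross (both recessive)."""
    if locus_of_m1 == locus_of_m2:
        # Same locus: the offspring is a compound heterozygote, m1/m2.
        return {locus_of_m1: ("m1", "m2")}
    # Different loci: one wild-type allele at each locus, +/m1 and +/m2.
    return {locus_of_m1: ("+", "m1"), locus_of_m2: ("+", "m2")}


def shows_mutant_phenotype(genotype: Genotype) -> bool:
    """A recessive mutant phenotype appears if any locus lacks a '+' allele."""
    return any("+" not in alleles for alleles in genotype.values())


def complementation_result(locus_of_m1: str, locus_of_m2: str) -> str:
    offspring = f1_genotype(locus_of_m1, locus_of_m2)
    if shows_mutant_phenotype(offspring):
        return "noncomplementation"
    return "complementation"


print(complementation_result("locus_A", "locus_B"))  # complementation
print(complementation_result("locus_A", "locus_A"))  # noncomplementation
```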

Even if the previously characterized mutation is 
extinct, it may still be possible to use its previously 
determined map position as a test for the possibility 
that it did lie at the same locus as the newly uncovered 
mutation. This is accomplished by following the 
transmission to offspring of the newly uncovered 
mutation along with a polymorphic DNA marker 
that maps close to the previously determined mutant 
map position. Close linkage between the new muta- 
tion and a DNA marker for the old mutation would 
suggest, although not prove, that the two mutations 
occurred at the same locus. 
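"Close linkage" is usually quantified as a recombination fraction together with a LOD score (the log10 ratio of the likelihood of the data at the estimated recombination fraction to that under free recombination, 0.5). The sketch below is a minimal version assuming a backcross in which every offspring can be scored unambiguously as recombinant or nonrecombinant between the new mutation and the marker; the counts are hypothetical. With 2 recombinants among 50 offspring it gives a recombination fraction of 0.04 and a LOD score of about 11. Even a high LOD score shows only close linkage, not identity of the two mutations.

```python
import math


def recombination_fraction(recombinants: int, total: int) -> float:
    """Observed recombination fraction between the mutation and a marker."""
    return recombinants / total


def lod_score(recombinants: int, total: int, theta: float) -> float:
    """log10 likelihood of the data at `theta` relative to theta = 0.5
    (no linkage), for a backcross scored as recombinant/nonrecombinant."""
    nonrecombinants = total - recombinants
    log_likelihood = 0.0
    # Skip zero-count terms so that theta = 0 with no recombinants is allowed.
    if recombinants:
        log_likelihood += recombinants * math.log10(theta)
    if nonrecombinants:
        log_likelihood += nonrecombinants * math.log10(1.0 - theta)
    return log_likelihood - total * math.log10(0.5)


# Hypothetical data: 2 recombinants among 50 backcross offspring.
theta_hat = recombination_fraction(2, 50)
print(theta_hat, round(lod_score(2, 50, theta_hat), 2))
```

A LOD threshold of 3 is a common convention for declaring linkage; it is not taken from this entry.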

Finally, a similar approach can often be followed 
when the previously characterized mutation is 
uncloned but mapped in the human genome rather 
than the mouse. Most regions of the human genome 
have been associated with homologous regions in the 
mouse genome. Thus, one can choose DNA markers 
from the region (or regions) of the mouse genome likely
to carry the mouse gene homologous to the mutant
human locus. These markers can then be
tested for linkage to the new mouse mutation. Again, 
the data would be only suggestive of an association. 

In some cases, new mutations will be found to be 
associated with gross chromosomal aberrations. This 
is especially likely to be the case if the new mutation 
was first observed in the offspring from a specific 
mutagenesis study. Two mutagenic agents in particular,
X-irradiation and the chemical chlorambucil, often
cause chromosomal rearrangements. Rearrangements
can also occur spontaneously; if the mutant line is
difficult to breed, this hints that a rearrangement
may be present. In any case where the suspicion
of a chromosomal abnormality exists, it is
worthwhile analyzing the karyotype of the mutant 
animals. The observation of an aberrant chromosome 
(with a visible deletion, inversion, or translocation)
should be followed up by a small breeding study to
determine if the aberration shows complete linkage to
the mutant phenotype. If it does, one can be almost cer- 
tain that the mutation is associated with the aberration 
in some way. If the chromosomal aberration is a dele- 
tion, the mutant gene is likely to lie within the deleted 
region. With a translocation or inversion, the mutant 
phenotype is likely to be due to the disruption of a 
gene at a breakpoint. In all cases, the next step would be 
to perform linkage analysis with DNA markers that 
have been mapped close to the sites affected by the 
chromosomal aberration. The aberration itself may 
also be useful later as a tool for cloning the gene. 
This is especially true for translocations since the 
breakpoint will provide a distinct physical marker 
for the locus of interest. 

Another possibility to consider is whether the 
mutation is sex-linked. This is easily recognized
when the mutant phenotype appears in offspring of only one
sex. Sex linkage almost always means X chromosome
linkage. If the mutation is recessive, a female carrier 
mated to a wild-type male will produce all normal 
females and 50% mutant males. If the mutation is 
dominant, a mutant male mated to a wild-type female 
will produce all normal males and all mutant females. 
Finally, if all efforts to map the novel phenotype by 
association fail, it will be necessary to set up a new 
mapping cross from scratch in which DNA markers 
from across the genome can be tested for linkage. 
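The two X-linked crosses described above can be checked by enumerating the four equally likely pairings of maternal and paternal gametes. This is a minimal sketch; the allele symbols ("+" for wild type, "m" for mutant) and the function name are hypothetical, and it assumes full penetrance and no X-Y recombination.

```python
from collections import Counter
from itertools import product
from typing import Tuple


def x_linked_offspring(mother: Tuple[str, str], father_x: str,
                       dominant: bool) -> Counter:
    """Tally (sex, phenotype) over the four equally likely gamete pairings.

    A daughter receives one maternal X and the paternal X; a son receives
    one maternal X and the paternal Y.
    """
    tally: Counter = Counter()
    for maternal_x, paternal in product(mother, (father_x, "Y")):
        if paternal == "Y":
            sex, alleles = "male", (maternal_x,)
        else:
            sex, alleles = "female", (maternal_x, paternal)
        if dominant:
            mutant = "m" in alleles
        else:
            # Recessive: every X the animal carries must be mutant.
            mutant = all(allele == "m" for allele in alleles)
        tally[(sex, "mutant" if mutant else "normal")] += 1
    return tally


# Recessive: carrier female (+/m) x wild-type male.
print(x_linked_offspring(("+", "m"), "+", dominant=False))
# Dominant: wild-type female x mutant male.
print(x_linked_offspring(("+", "+"), "m", dominant=True))
```

The first cross yields only normal daughters and equal numbers of normal and mutant sons; the second yields only mutant daughters and normal sons, matching the patterns given in the text.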


See also: Breeding of Animals; Embryonic 
Development, Mouse; Inbred Strain; Linkage Map 

Genetic maps may be based on either linkage distances
or physical distances. 

The linkage map distance between markers is the 
mean number of exchanges (morgans, symbol M) 
estimated from the observed recombination fre- 
quency. The unit of distance most often used is a 
centimorgan (cM), which corresponds to 1% recom- 
bination frequency. Because of multiple exchanges, 
recombination frequencies are less than map distances, 
except for short intervals. 

Maps based on physical distances are constructed 
from restriction analysis and/or from DNA sequen- 
cing data. The unit of distance is bp (base pair), kb 
(kilobase pair = 1000 base pairs), or Mb (megabase 
pair = 1 × 10⁶ bp), depending on the scale.


Map distances on bacterial chromosomes, deter- 
mined from conjugation between sexually compatible 
strains, are given in minutes, indicative of the time fol- 
lowing cell contact at which a given locus is transferred 
from the donor to the recipient cell. By convention, 
these distances in Escherichia coli are normalized to a 
total map length of 100 minutes. 


See also: Centimorgan (cM); Mapping Function 

The explicit assumption in linkage mapping is that the
recombination frequencies observed are a function of
the distance between the markers and are independent
of the nature of the markers: the markers are presumed
to render the recombination event visible, but not to
perturb it. When recombination frequencies are
small, as in crosses between heteroalleles, double
recombinants are expected to be rare. Consequently,
wild-type recombination frequencies are expected to
be additive: for mutants m1, m2, and m3, linked in that
order, the frequency of wild-type recombinants from
the cross m1 × m3 should equal the sum of the
frequencies of wild-type recombinants from the crosses
m1 × m2 and m2 × m3. Map expansion (Holliday, 1964)
is the term used when the wild-type recombination
frequency from the cross m1 × m3 exceeds that sum.
This observation implies that the genetic markers
themselves, not simply the distances between them,
are influencing the recombination frequencies (R values).


Explanations for Map Expansion 


Map expansion is seen when the markers are so close 
that most of the recombinants arise nonreciprocally, 
by gene conversion. Three explanations are plausible. 
(1) The mutant marker m2 is a deletion (or substitution)
whose size is appreciable compared to the distance
from m1 to m3. (2) Two marked sites that are especially
close together may interfere with the formation of
a recombination intermediate that involves them. (3)
Mismatch repair subsequent to the formation of an
intermediate can corepair closely neighboring sites. If
mismatch repair tracts begin at one site, a second site
included in the same heteroduplex DNA segment at a
distance less than the length of the repair tract will be
corepaired to the same parental type. When markers
are farther apart, one site can more often be repaired
without repair of the other, resulting in the formation
of a genetically recombinant polynucleotide strand of
DNA. If the repair tracts have a more or less fixed
length, map expansion will result.
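Why a fixed repair-tract length produces map expansion can be seen in a deliberately crude toy model, which is an assumption for illustration rather than a model given in this entry: sites closer together than the tract length are always corepaired (never separated), and beyond that length the recombinant frequency rises in proportion to the excess distance. The distances, tract length, and per-base-pair rate are all hypothetical.

```python
def toy_recombinant_frequency(distance_bp: float, tract_bp: float = 10.0,
                              per_bp: float = 1e-4) -> float:
    """Toy model (hypothetical parameters): sites closer than a fixed
    repair-tract length are always corepaired and never separated; beyond
    that length, the recombinant frequency grows linearly with the excess."""
    return per_bp * max(0.0, distance_bp - tract_bp)


d12, d23 = 15.0, 15.0  # hypothetical distances m1-m2 and m2-m3, in bp
r12 = toy_recombinant_frequency(d12)
r23 = toy_recombinant_frequency(d23)
r13 = toy_recombinant_frequency(d12 + d23)
print(r12 + r23, r13)  # r13 exceeds r12 + r23, i.e., map expansion
```

In this toy model, whenever both intervals exceed the tract length, r13 exceeds r12 + r23 by exactly per_bp × tract_bp, so the size of the expansion is set by the fixed tract length.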

In crosses of bacteriophage T4, m1 × m3 gives fewer
recombinants than expected from the sum of the
frequencies from the crosses m1 × m2 and m2 × m3.
This result, the inverse of map expansion, indicates 
that much of the localized negative interference in T4 
is due to exchanges occurring in clusters, apparently 
independently of the genetic markers. 
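The comparison that distinguishes map expansion from additivity and from the T4 pattern is simple arithmetic on three crosses. In the sketch below the fixed relative tolerance is a hypothetical stand-in for a proper statistical test, and the frequencies are invented for illustration.

```python
def classify_additivity(r12: float, r23: float, r13: float,
                        tolerance: float = 0.1) -> str:
    """Compare the wild-type recombinant frequency from m1 x m3 with the sum
    of those from m1 x m2 and m2 x m3 (markers in the order m1, m2, m3).

    `tolerance` is a hypothetical relative margin standing in for a proper
    statistical test of the difference.
    """
    expected = r12 + r23
    if r13 > expected * (1 + tolerance):
        return "map expansion"
    if r13 < expected * (1 - tolerance):
        return "fewer than additive (as in T4)"
    return "approximately additive"


# Invented frequencies, for illustration only.
print(classify_additivity(0.001, 0.001, 0.003))   # map expansion
print(classify_additivity(0.001, 0.001, 0.002))   # approximately additive
print(classify_additivity(0.001, 0.001, 0.0012))  # fewer than additive (as in T4)
```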


Misuse of ‘Map Expansion’ 


‘Map expansion’ has been used to refer to regions of a
chromosome in which the rate of genetic recombin- 
ation per base pair is higher than normal (see Hot Spot 
of Recombination). 


Reference 
Holliday R (1964) A mechanism for gene conversion in fungi. 
Genetical Research 5: 282-304. 


See also: Gene Conversion; Hot Spot of 
Recombination; Marker Effect; Mismatch Repair 
(Long/Short Patch); Negative Interference 

Mapping functions are mathematical expressions
that relate (observed) recombination frequencies to 
(inferred) linkage map distances. 

The order of genetic markers on a linkage map is 
determined from the recombination frequencies ob- 
served in crosses carried out under standard condi- 
tions. Marker pairs that give the largest recombination 
frequencies are placed farthest apart on the map. If 
multiple exchanges are rare, recombination frequen- 
cies are additive and are, themselves, a suitable metric 
for linkage distance (Sturtevant, 1913). By convention, 
two loci manifesting 1% recombination frequency 
with each other in meiosis are said to be linked at 
a map distance of one centimorgan (cM). Multiple 
exchanges are rare when recombination frequencies 
are sufficiently small. For larger frequencies, multiple 
exchanges are rare if interference is positive and suffi- 
ciently strong. When recombination frequencies are 
larger and not subject to strong interference, they are 
not additive because they are influenced by multiple 
exchanges. By extension of the convention defined
above, map distance is defined as the mean number
of exchanges in the interval (M, in units of morgans,
symbol M) and is often expressed in centimorgans
(100 cM = 1 M).
Map distance can be derived from recombination 
frequencies by mathematical expressions (mapping 
functions), which transform recombination frequen- 
cies to an additive metric (map distance) that is equal 
to recombination frequency for small values. 


Haldane’s Mapping Functions 


J.B.S. Haldane addressed the problem of converting 
large meiotic recombination frequencies to linkage 
map distances (Haldane, 1919). Previously, Sturtevant 
had simply equated recombination frequencies (R) 
with map distance (M, the mean number of ex- 
changes), while recognizing the inaccuracy of that 
relationship when R values were sufficiently large to 
admit multiple exchanges (Sturtevant, 1913). Haldane 
bounded the problem of relating R to M by noting 
that realistic functions were likely to lie between 
Sturtevant’s equality, which assumes complete positive
interference, and a function that assumes no
interference.


Haldane’s No-Interference Function 


If the number of points in an interval at which ex- 
changes can occur is large, if exchanges at each such 
point are realized with the same low probability and 
independently of each other, and if only odd numbers 
of exchanges lead to crossing over of the markers 
defining the interval, then: 


$$R = \tfrac{1}{2}\left(1 - e^{-2M}\right) = \sum_{k\ \mathrm{odd}} \frac{e^{-M} M^{k}}{k!}$$


the sum of the odd terms of the Poisson distribution of 
mean M. This expression has come to be known as 
‘Haldane’s mapping function,’ although Haldane 
explicitly recognized that it was not an appropriate 
mapping function (for any organisms then character- 
ized) because of interference. Of course, the function 
is useful for the relatively few organisms whose meio- 
tic recombination lacks interference. For such organ- 
isms, the inverse of the function: 


$$M = -\tfrac{1}{2}\ln(1 - 2R)$$


facilitates the transformation of (nonadditive) ob- 
served recombination frequencies to (additive) linkage 
map distances. This function reduces to R = M for small
values of M and approaches R = 1/2 as M gets large, 
features that are demanded by meiotic data. 
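Haldane's no-interference function, R = ½(1 − e^(−2M)), and its inverse, M = −½ ln(1 − 2R), can be sketched in Python to show the point of the transformation: recombination frequencies of adjacent intervals do not add, but the derived map distances do. The interval lengths below are hypothetical. Without interference, the combined frequency also obeys R_ac = R_ab + R_bc − 2R_ab·R_bc, because recombination across the whole interval requires a recombination in exactly one of the two parts.

```python
import math


def haldane_r(m: float) -> float:
    """Recombination frequency R from map distance M (morgans), no interference."""
    return 0.5 * (1.0 - math.exp(-2.0 * m))


def haldane_m(r: float) -> float:
    """Map distance M (morgans) from recombination frequency R, for 0 <= R < 0.5."""
    if not 0.0 <= r < 0.5:
        raise ValueError("R must lie in [0, 0.5)")
    return -0.5 * math.log(1.0 - 2.0 * r)


# Two adjacent intervals of 0.2 M each (hypothetical values).
r_a = r_b = haldane_r(0.2)
r_ab = haldane_r(0.4)
print(round(r_a + r_b, 4), round(r_ab, 4))  # R values are not additive
print(round(haldane_m(r_a) + haldane_m(r_b), 4), round(haldane_m(r_ab), 4))  # M values are
```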


Haldane’s Interference Function


Haldane offered a function contrived to describe the 
empirical relationship between R and M implied by 
extant Drosophila and plant data: 


$$M = 0.7R - 0.15\ln(1 - 2R)$$


This is not usually the function referred to by the 
phrase ‘Haldane’s mapping function.’ Haldane’s two
functions are compared with Sturtevant’s complete
interference function in Figure 1.
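Haldane's empirical function, M = 0.7R − 0.15 ln(1 − 2R), can be tabulated against Sturtevant's M = R and the no-interference inverse, M = −½ ln(1 − 2R). Its coefficients make the slope at R = 0 equal to 0.7 + 2(0.15) = 1, so M ≈ R for small R. Because −ln(1 − x) > x for 0 < x < 1, its values for 0 < R < 0.5 always fall strictly between the two bounding functions. The sample R values in this sketch are arbitrary.

```python
import math


def haldane_interference_m(r: float) -> float:
    """Haldane's empirical function M = 0.7R - 0.15 ln(1 - 2R), for 0 <= R < 0.5."""
    return 0.7 * r - 0.15 * math.log(1.0 - 2.0 * r)


def haldane_no_interference_m(r: float) -> float:
    """No-interference inverse M = -(1/2) ln(1 - 2R), for 0 <= R < 0.5."""
    return -0.5 * math.log(1.0 - 2.0 * r)


print("R      Sturtevant  interference  no interference")
for r in (0.05, 0.1, 0.2, 0.3, 0.4):
    print(f"{r:<6} {r:<11} {haldane_interference_m(r):<13.4f} "
          f"{haldane_no_interference_m(r):.4f}")
```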


Other Linear Mapping Functions with Interference


Like Haldane’s interference function, some mapping 
functions are contrivances valued solely for their util- 
ity in converting experimentally observed R values to 
M values that are additive. 


Kosambi’s function,

$$R = \tfrac{1}{2}\tanh(2M); \qquad M = \tfrac{1}{4}\ln\frac{1 + 2R}{1 - 2R}$$


is a widely used representative of this class, suitable for 
the linear linkage maps of Drosophila melanogaster 
(Kosambi, 1944). 
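Kosambi's function, R = ½ tanh(2M) with inverse M = ¼ ln[(1 + 2R)/(1 − 2R)], can be sketched in Python. Because tanh(a + b) = (tanh a + tanh b)/(1 + tanh a tanh b), the recombination frequencies of two adjacent intervals combine as R_ac = (R_ab + R_bc)/(1 + 4R_ab·R_bc). The sketch checks this against the route through additive map distances; the interval frequencies are hypothetical.

```python
import math


def kosambi_r(m: float) -> float:
    """Kosambi: R = (1/2) tanh(2M), with M in morgans."""
    return 0.5 * math.tanh(2.0 * m)


def kosambi_m(r: float) -> float:
    """Inverse Kosambi: M = (1/4) ln[(1 + 2R) / (1 - 2R)], for 0 <= R < 0.5."""
    if not 0.0 <= r < 0.5:
        raise ValueError("R must lie in [0, 0.5)")
    return 0.25 * math.log((1.0 + 2.0 * r) / (1.0 - 2.0 * r))


def kosambi_combine(r1: float, r2: float) -> float:
    """R for two adjacent intervals, from the tanh addition formula."""
    return (r1 + r2) / (1.0 + 4.0 * r1 * r2)


r1, r2 = 0.10, 0.15  # hypothetical adjacent-interval frequencies
print(kosambi_combine(r1, r2), kosambi_r(kosambi_m(r1) + kosambi_m(r2)))
```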

Other functions describe biological models pre- 
sumed to relate R to M in the presence of interference. 
Counting models for meiotic recombination suppose 
that attempts at exchange are Poisson-distributed, but 
that each successful attempt is separated from its near- 
est successful neighboring attempt by a fixed number 
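(m) of failures:

$$R = \frac{1}{2}\left[1 - e^{-y}\sum_{i=0}^{m}\frac{m + 1 - i}{m + 1}\,\frac{y^{i}}{i!}\right]$$

where y = 2(m + 1)M. This model assumes chiasma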
interference but a lack of chromatid interference, 
which is justified by most experimental analyses. 
When m equals zero, the function is identical to 
Haldane’s no-interference function. The ratio of successes
to total attempts, 1/(m + 1), that best describes
interference (and therefore best converts the R values
to additive M values) is about equal, in Neurospora
and Drosophila, to the fraction of gene conversions
that are accompanied by exchange of flanking markers
(Foss et al., 1993). For D. melanogaster, m = 4, while
for Neurospora crassa, m = 2. For values of m up to
three, the function has been written in closed form.
For instance, when m = 2:

$$R = \tfrac{1}{2}\left[1 - \left(1 + 4M + 6M^{2}\right)e^{-6M}\right]$$

Figure 1 Haldane’s mapping functions (axes: map distance, M; recombinant frequency, R). (1) Sturtevant’s function ($R = M$); (2) Haldane’s mapping function with interference [$M = 0.7R - 0.15\ln(1 - 2R)$]; (3) Haldane’s no-interference mapping function [$R = \tfrac{1}{2}(1 - e^{-2M})$].

Figure 2 Other linear mapping functions (axes: map distance, M; recombinant frequency, R). (1) Sturtevant’s function ($R = M$); (2) counting model with m = 2; (3) Kosambi’s function; (4) Haldane’s no-interference function.


This counting model and Kosambi’s function are 
compared with Haldane’s no-interference function 
and with Sturtevant’s complete interference func- 
tion in Figure 2. 
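The comparison in Figure 2 can be approximated numerically. The sketch assumes the general counting-model form R = ½[1 − e^(−y) Σ_{i=0}^{m} ((m + 1 − i)/(m + 1)) y^i/i!], with y = 2(m + 1)M. This form reduces to Haldane's no-interference function when m = 0 and to the closed form quoted for m = 2, (1 + 4M + 6M²)e^(−6M) inside the brackets. The sample map distances are arbitrary.

```python
import math


def counting_model_r(dist: float, m: int) -> float:
    """R from map distance `dist` (morgans) under the counting model with m
    failed attempts between successive exchanges; y = 2(m + 1)M."""
    y = 2.0 * (m + 1) * dist
    terms = sum((m + 1 - i) / (m + 1) * y ** i / math.factorial(i)
                for i in range(m + 1))
    return 0.5 * (1.0 - math.exp(-y) * terms)


def kosambi_r(dist: float) -> float:
    """Kosambi's function R = (1/2) tanh(2M)."""
    return 0.5 * math.tanh(2.0 * dist)


print("M     Sturtevant  counting(m=2)  Kosambi  Haldane(m=0)")
for dist in (0.1, 0.25, 0.5, 1.0):
    print(f"{dist:<5} {dist:<11} {counting_model_r(dist, 2):<14.4f} "
          f"{kosambi_r(dist):<8.4f} {counting_model_r(dist, 0):.4f}")
```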


Circular Functions 


Functions for circular maps must depart from 
Haldane’s function because each pair of markers is 
linked by a short arc and a long arc, both of which 
need to be ‘broken’ to effect recombination. A model- 
based function for bacteriophage T4, which has a 
circularly permuted chromosome, achieved linkage 
circularity by assuming a mixed population composed 
of chromosomes that had termini in one or the other 
arc in proportion to the relative linkage lengths of those 
arcs. These chromosomes were assumed to undergo a 
succession of spatially clustered exchanges between 


randomly chosen partners. The assumption of clusters 
accounted for the localized negative interference 
observed in intragenic crosses (Stahl et al., 1964). 

With linear maps, estimates of total map length are 
minimal because the ends of the map are defined by 
the available markers. The closed nature of circular 
maps allows for true, rather than minimal, estimates of 
map length. 


Functions and Chromosomes 


Mapping functions convert recombination frequen- 
cies to linkage distances, but tell us nothing about 
physical distances between markers. For instance, 
regions of a chromosome that have a high density of 
recombination-initiating sites (hot spots) will have a 
high cM/kilobase ratio. However, when recombin- 
ation is approximately uniform per kilobase, an ap- 
propriate mapping function can help determine the 
location of genes on a physical map, facilitating gene 
cloning. 


Further Reading 
Stahl FW (1979) Genetic Recombination: Thinking about It in Phage 
and Fungi. San Francisco, CA: WH Freeman. 

In human, mouse, and other mammalian species a
common method for determining the chromosomal 
location of genes and anonymous sequences is the 
use of a DNA mapping panel. These panels consist 
of a set of individual DNA samples that have been 
typed in previous studies to define an ordered chromo- 
somal map of a single or multiple chromosomes. 
Originally, mapping panels consisted of DNAs char- 
acterized in genetic linkage mapping studies; however, 
radiation hybrid mapping panels are now preferred to
genetic linkage panels for determining
human chromosomal positions. Although radiation
hybrid panels are now available in the mouse and 
may offer a suitable substitute in future studies, cur- 
rently, the use of mapping panels from genetic crosses 
is the major tool for gene localization in this species. 
In general, the usefulness of a mapping panel relies 
on: (1) previous studies that have established an accur- 
ate gene/marker order along the entire length of each 
autosome and the X chromosome; and (2) how easily 
an informative polymorphism can be ascertained. The
map position of any unknown gene or sequence can be
determined in a mapping panel if polymorphisms
can be identified that distinguish the two parental
haplotypes used in a genetic cross, or the two species
in a radiation hybrid fusion. This is done by comparing
the segregation of the parental haplotypes (or of the
species-specific polymorphism) defined by the unknown
sequence with that of the previously defined markers.
In general, if the mapping panel has been previously
characterized by the accurate typing of hundreds or
thousands of markers, and the panel contains over
100 informative meioses or radiation hybrids, the
relative order of the markers and genes can be
established with good confidence.
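Placing a new sequence on a genetic-cross panel amounts to finding the previously typed marker with which it shows the fewest recombinants. The sketch below assumes a backcross panel in which each animal is coded "A" or "B" for the two parental haplotypes ("?" for untyped). The ten-animal panel and the marker names are hypothetical; real panels have far more animals, and ordering is normally done with likelihood-based multipoint analysis rather than simple counting.

```python
from typing import Dict, List, Tuple

# Hypothetical backcross panel: one genotype code per animal, "A" or "B" for
# the two parental haplotypes, "?" for an untyped animal.
Panel = Dict[str, List[str]]


def recombinant_fraction(locus1: List[str], locus2: List[str]) -> float:
    """Fraction of jointly typed animals whose codes differ at the two loci."""
    pairs = [(a, b) for a, b in zip(locus1, locus2) if a != "?" and b != "?"]
    if not pairs:
        raise ValueError("no animal is typed at both loci")
    return sum(a != b for a, b in pairs) / len(pairs)


def closest_marker(panel: Panel, new_locus: List[str]) -> Tuple[str, float]:
    """Previously mapped marker showing the fewest recombinants with new_locus."""
    return min(((name, recombinant_fraction(codes, new_locus))
                for name, codes in panel.items()),
               key=lambda item: item[1])


panel = {
    "marker_1": list("AABBABABBA"),
    "marker_2": list("AABBABABAA"),
    "marker_3": list("ABBAABBBAA"),
}
new_sequence = list("AABBABABAB")
print(closest_marker(panel, new_sequence))  # ('marker_2', 0.1)
```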

Radiation hybrid DNA mapping panels consist of 
DNAs from somatic cell hybrids in which the donor 
cells (e.g., human) have been irradiated and the fusion 
partner (e.g., hamster) has not. Sequence differences
between the donor and recipient species make it
possible to determine whether a particular
species-specific sequence has been retained in the
hybrid cells. A variety of statistical tools are used to
determine the relative order of markers and their
position along a chromosome. Several internet sites
provide typing results of large numbers of markers in
specific radiation hybrid panels (e.g.,
http://www-shgc.stanford.edu/RH/index.html,
http://carbon.wi.mit.edu:8000/cgi-bin/contig/phys_map).
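One of the simplest statistical tools for radiation hybrid data is a two-point estimate of the probability, θ, that irradiation breaks the chromosome between two markers. The sketch assumes a simple model in which both markers share the same retention frequency r and separated fragments are retained independently, so that the expected fraction of discordant hybrids is 2r(1 − r)θ. It also assumes the usual conversion of θ to a distance in rays, −ln(1 − θ). The retention patterns are invented.

```python
import math
from typing import List, Tuple


def rh_two_point(x: List[int], y: List[int]) -> Tuple[float, float]:
    """Estimate breakage probability theta and distance in rays between two
    markers scored 1 (retained) or 0 (absent) in each radiation hybrid.

    Assumes equal, independent retention probability r for both markers,
    so that P(discordant) = 2 r (1 - r) theta.
    """
    n = len(x)
    r = (sum(x) + sum(y)) / (2 * n)
    if not 0.0 < r < 1.0:
        raise ValueError("retention frequency must lie strictly between 0 and 1")
    discordance = sum(a != b for a, b in zip(x, y)) / n
    theta = min(discordance / (2 * r * (1 - r)), 1.0)
    rays = -math.log(1.0 - theta) if theta < 1.0 else math.inf
    return theta, rays


# Hypothetical retention patterns in ten hybrids.
marker_x = [1, 1, 0, 0, 1, 0, 1, 0, 0, 1]
marker_y = [1, 1, 0, 0, 0, 0, 1, 0, 1, 1]
print(rh_two_point(marker_x, marker_y))
```

Here 2 of 10 hybrids are discordant and r = 0.5, giving θ = 0.4. Multipoint methods used on real panels are considerably more elaborate.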

Most mouse mapping panels that are currently util- 
ized derive from crosses between different species or 
subspecies of mice. These types of crosses are very 
valuable because DNA sequence differences between
the parents are easy to identify. Whether
an investigator is using simple sequence repeats, single
nucleotide polymorphisms, restriction fragment
length polymorphisms, or any other sequence-based
technique, the likelihood of defining an informative
marker for segregation analysis can approach 100%.
Extensive typing information on many of these
crosses is available on the internet at the Mouse
Genome Informatics web site (http://www.informatics.jax.org/)
in the maps and mapping data section
(http://www.informatics.jax.org/crossdata.html).


See also: Gene Mapping; Mapping Function 

The Marfan syndrome (MFS) is an autosomal dominant,
heritable disorder of connective tissue characterized
by clinical findings in multiple tissues and
organ systems, including the eyes, skeleton, muscles, 
heart, major arteries, lungs, and skin. The cause is a 
defect in the extracellular microfibril. 


Clinical Manifestations 


The manifestations of MFS show considerable
variability in expression, and people need not show all
features to warrant the diagnosis. In general, people
with MFS have disproportionately tall stature, with
arms and legs particularly long. The ribs also over- 
grow, and push the sternum in (pectus excavatum) or 
out (pectus carinatum). Joint laxity is common, but 
congenital contractures of the elbows and digits also 
occur. Ligamentous laxity also contributes to abnormal 
spinal curvature (scoliosis) and to flat feet (pes planus). 
The palate tends to be narrow and high, and the teeth 
crowded and maloccluded. While skeletal muscle is 
underdeveloped in many, which contributes to the 
asthenic habitus, few people are functionally weak. 
Dislocation of the ocular lens can be present at birth 
or appear at any time during growth of the eye. Near- 
sightedness (myopia), strabismus, and astigmatism are 
common ocular signs. If the ocular problems are not 
detected in early childhood, amblyopia becomes a per- 
manent problem. Glaucoma and cataract frequently 
occur in young and middle-aged adults with MFS. The 
lung is subject to spontaneous pneumothorax from 
rupture of apical blebs (5%). The dura stretches in 
the lumbosacral region producing a dilated thecal sac 
(dural ectasia) and occasionally anterior meningo- 
celes, which can cause pain and weakness. 

The complications that reduce life expectancy in 
MFS by half are mainly cardiovascular. The first seg- 
ment of the aorta (aortic root) may be enlarged at 
birth, and typically dilates progressively throughout 
life. This process is painless, and in the absence of 
appropriate imaging, goes unrecognized until the 
symptoms of aortic regurgitation or life-threatening 
aortic dissection appear. Mitral valve prolapse occurs 
in most people with MFS, and has a tendency to pro- 
gress to mitral regurgitation, which is the most com- 
mon indication for cardiac surgery in children. 


Prevalence 


The MFS is one end of a spectrum of heritable
disorders of connective tissue, and the line of
demarcation between it and these other disorders is
drawn on arbitrary diagnostic criteria. Estimates of the
prevalence of classic MFS in all populations range
from 1 per 3000 to 1 per 10 000. No racial or ethnic group
seems predisposed. About 25-30% of affected people 
are the first case in the family due to a new mutation 
in the egg or the sperm that led to that person’s con- 
ception. 


Cause 


The cause of MFS, defined in 1991, is a mutation in the 
gene encoding fibrillin-1, the principal component of 
the extracellular microfibril. This gene, FBN1, spans
over 100 kb of chromosome 15 and consists of 65
exons specifying a 365-kDa glycoprotein. More than
200 different mutations have been found in people 
with MFS, with very few recurrences in unrelated 
individuals. Fibrillin monomers polymerize with 
other proteins to form microfibrils, which are found 
in the extracellular matrices of most tissues and per- 
form various functions. In the eye, microfibrils are 
the zonules that attach the lens to the ciliary bodies. 
In skin, microfibrils are arrayed perpendicular to 
the epidermal-dermal junction and seemingly have a 
structural role. Deeper in the dermis, and in the media 
of arteries, and in the lung, microfibrils combine with 
tropoelastin to form elastic fibers. The pathogenesis of
MFS reflects the diverse functions of microfibrils.
Dislocation of the lens is directly attributable to
defective microfibrils, but how these structures
control bone growth is unclear. Susceptibility to
aortic root dilatation and aortic dissection undoubtedly
stems from defective elastic fibers, but why the
lung, with considerable elastic tissue, is relatively
mildly affected is unknown.


Management 


Early diagnosis, which is much easier when a family
history of MFS raises suspicion, is crucial to effective
management. Even today, some patients are not
detected until a major complication occurs. Diagnosis 
of the first case in any family should prompt evalu- 
ation of close relatives. 

Early evaluation by an ophthalmologist familiar 
with MFS is key to preventing amblyopia. With 
improved ocular surgery, lens removal for valid indi- 
cations is much less risky. Little can be done to affect 
stature. Screening for scoliosis should begin in early 
childhood. Bracing may be effective, but curves 
greater than about 40° require surgical stabilization. 
Severe pectus excavatum can be surgically repaired to 
improve respiratory mechanics and surgical access to 
the heart and aorta. 

Central to managing the cardiovascular features is 
echocardiography: the size of the aortic root can be 
followed, cardiac and valvular function quantified, 
and the effects of therapy gauged. The most effective 
therapy is early administration of a β-adrenergic
blocking agent. The intent is to reduce both heart
rate and impulse of ejection in order to reduce
hemodynamic stress on the aorta and delay or prevent
dilatation and dissection. When the aortic root
reaches 50-55 mm in the adult, strong consideration
should be given to prophylactic aortic replacement.
With this approach, average life expectancy has risen
over the past three decades from the fourth to the
seventh decade.




Further Reading 

Dietz HC and Pyeritz RE (1995) Mutations in the human gene 
for fibrillin-1 (FBN1) in the Marfan syndrome and related
disorders. Human Molecular Genetics 4: 1799-1809. 

Pyeritz RE (1997) Marfan syndrome and other disorders of
fibrillin. In: Rimoin DL, Connor JM and Pyeritz RE (eds)
Principles and Practice of Medical Genetics, 3rd edn,
pp. 1027-1066. New York: Churchill Livingstone.

Pyeritz RE (1993) The Marfan syndrome. In: Royce PM and 
Steinmann B (eds) Connective Tissue and Its Heritable
Disorders: Molecular, Genetic and Medical Aspects, pp. 437-468.
New York: Wiley-Liss. 

Shores J, Berger KR, Murphy EA and Pyeritz RE (1994) Chronic 
β-adrenergic blockade protects the aorta in the Marfan
syndrome: a prospective, randomized trial of propranolol. 
New England Journal of Medicine 330: 1335-1341. 


See also: Clinical Genetics 


Marker 
B S Guttman 


Copyright © 2001 Academic Press 
doi: 10.1006/rwgn.2001.0793


A marker is a mutation or other distinctive nucleic 
acid sequence that can be used to identify a gene and 
construct a linkage map. Classical mapping techniques 
depend upon the identification of a gene on the basis 
of mutations that occur within it. A single mutation 
that alters or abolishes the gene’s function marks the 
gene. In classical mapping experiments, the (relative) 
distances between genes are determined by crossing 
individuals bearing markers in two or more genes and 
determining the frequency of recombination between 
each pair of markers, on the assumption that the 
greater the distance between the markers, the more 
recombination will occur. In some systems, parti- 
cularly in viruses and microorganisms, it is possible 
to work with multiple distinct mutations within 
a single gene; each mutation can be used as a distinct 
marker, and they can be used to explore the fine struc- 
ture of the gene. 

With the advent of restriction mapping, which 
depends upon ordering the sites cut by restriction 
endonucleases, a new type of marker has come into 
use. A change in a nucleic acid sequence may create or 
abolish a restriction site for some enzyme without 
having any other effect, perhaps because the site occurs 
between genes or within an intron in a gene. The
mutation may also change the third nucleotide in a codon
while still specifying the same amino acid, or substitute
an amino acid similar enough not to affect function.


Such benign changes have occurred frequently in 
populations, creating variations among individuals in 
the lengths of restriction fragments generated from 
their DNA, that is, producing restriction fragment
length polymorphisms (RFLPs). RFLPs in the human 
genome have been particularly useful in mapping. The 
site that generates the polymorphism can be identified 
and located, and it then serves as a useful marker. Some 
human genes have been located initially because of 
their association with particular RFLP sites. 
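The way a single base change creates an RFLP can be shown on a short sequence. This is a toy sketch: the two 30-bp alleles are hypothetical and treated as linear, the enzyme is taken to be EcoRI (site GAATTC, cut after the G), and real RFLPs are usually detected by probe hybridization to digested genomic DNA rather than by scanning sequence.

```python
import re
from typing import List

ECORI_SITE = "GAATTC"  # EcoRI recognition sequence; it cuts between G and A


def cut_positions(seq: str, site: str = ECORI_SITE, offset: int = 1) -> List[int]:
    """Cut positions in a linear sequence (offset = cut point within the site)."""
    return [m.start() + offset for m in re.finditer(f"(?={re.escape(site)})", seq)]


def fragment_lengths(seq: str, site: str = ECORI_SITE, offset: int = 1) -> List[int]:
    """Lengths of the fragments produced by cutting at every occurrence of site."""
    bounds = [0] + cut_positions(seq, site, offset) + [len(seq)]
    return [end - start for start, end in zip(bounds, bounds[1:])]


# Two hypothetical alleles differing at one base (A -> G) inside the second site.
allele_1 = "AAAAGAATTCTTTTTTTTTTGAATTCAAAA"
allele_2 = "AAAAGAATTCTTTTTTTTTTGAGTTCAAAA"
print(fragment_lengths(allele_1))  # [5, 16, 9]
print(fragment_lengths(allele_2))  # [5, 25]
```

The 16- and 9-bp fragments of the first allele are replaced by a single 25-bp fragment in the second. That length difference is the polymorphism that serves as a marker.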


See also: Restriction Endonuclease; Restriction 
Fragment Length Polymorphism (RFLP) 

A marker is an allelic difference between two parental
chromosomes introduced to mark a specific position 
so that one can identify which parent contributed that 
information during recombination. Marker effect is 
the term used to describe the situation when the allelic 
difference has an impact on the outcome of the experi- 
ment. Such marker effects are frequently encountered 
in fine-structure mapping, i.e., in situations where 
recombination results more often from gene conver- 
sion than from crossing-over. In keeping with this, 
some marker effects are understood to result from 
mismatches that are subject to unusual modes of mis- 
match repair. Most mismatches are repaired by the 
mut system, which is the major mismatch repair path- 
way in both prokaryotes and eukaryotes. But other 
mismatches are poorly recognized by this mismatch 
repair system and are either repaired by another sys- 
tem or not repaired at all. 

In Escherichia coli, the G-T mismatch occurring
within a particular sequence context is recognized by an
alternative mismatch repair system called ‘very short patch
repair.’ Recombination between very close alleles
occurs only when a correction tract covers one marker
and not the other. Very short repair tracts would be
expected to increase the frequency with which tracts
end between nearby markers, and hence to give more
recombination per unit length of DNA. In the yeasts
Saccharomyces cerevisiae and Schizosaccharomyces 
pombe, the C-C mismatch is recognized inefficiently, 
so that when heteroduplex occurs at the site of a 
marker producing this mismatch, the mismatch often 
persists until resolved by replication. Co-correction of 
the C-C mismatch instigated by another mismatch
nearby does not occur when the two markers are very
close (the reason for this is not known). The persist- 
ence of heteroduplex gives a very high frequency of 
recombination with other markers nearby because 
one nucleotide strand will always show recombina- 
tion. This effect can also be produced in Saccharo- 
myces cerevisiae by the introduction of a palindrome 
for use as a marker; palindromes within heteroduplex 
appear to be corrected inefficiently. 

Deletions or the inclusion of heterologous se- 
quences influence the amount of recombination in 
several systems, perhaps by interfering with the 
migration of a Holliday junction or other branch struc- 
ture. Another kind of marker effect reported in several 
fungi stems from the creation of a specific DNA 
sequence by the introduction of a mutation for use as 
a marker. This effect can be explained as the creation of 
a binding site for a protein that creates a recombin- 
ation hot spot. 


See also: Hot Spot of Recombination; Map 
Expansion; Mismatch Repair (Long/Short Patch) 


Marker Rescue 


I Schildkraut 


Copyright © 2001 Academic Press 
doi: 10.1006/rwgn.2001.0795


Marker rescue is a method that analyzes gene organization
and establishes the relationship of a mutation
to a physical location on a DNA molecule. In a simple
conception of marker rescue, a mutation carried by a
virus that prevents the virus from forming plaques can
be mapped to one of the many DNA fragments that
are created by digestion of wild-type viral DNA with a
restriction endonuclease. First, the restriction fragments
of the viral DNA are physically separated. Cells are then
simultaneously infected with the mutant virus and
transfected with one of the DNA fragments, a separate
test being made for each fragment. The fragment that
carries the wild-type allele of the gene in which the
mutant virus is defective recombines with the viral DNA
and rescues the virus, which can then form a plaque.
None of the other fragments contains the wild-type
copy of the mutant allele, and when tested they do not
give rise to a plaque. Marker rescue thus determines
which restriction fragment carries the mutation in the
mutant virus. Marker rescue can also show that the order
of genes along a chromosome, as established genetically,
corresponds to their linear order in the DNA molecule.
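The logic of the assay can be sketched as a small Python simulation. This is a minimal sketch under stated assumptions: the fragment names and coordinates, the mutation position, and the helper names `plaque_assay` and `locate` are hypothetical, and every fragment that spans the mutant site is assumed to rescue the virus (in practice rescue efficiency also depends on how much homologous DNA flanks the site).

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical restriction map of a viral genome: (name, start, end) in bp,
# end exclusive. Fragments from one complete digest do not overlap.
Fragment = Tuple[str, int, int]


def plaque_assay(fragments: List[Fragment], mutation_pos: int) -> Dict[str, bool]:
    """Simulate one co-infection/transfection per fragment.

    A fragment restores the wild-type allele, and hence plaque formation,
    only if it spans the mutant site. Simplification: any spanning fragment
    is assumed to recombine successfully.
    """
    return {name: start <= mutation_pos < end for name, start, end in fragments}


def locate(fragments: List[Fragment], results: Dict[str, bool]) -> Optional[Tuple[int, int]]:
    """Infer the physical interval carrying the mutation from the plaque results."""
    hits = [(start, end) for name, start, end in fragments if results[name]]
    # With non-overlapping fragments, exactly one test should give plaques.
    return hits[0] if len(hits) == 1 else None


digest = [("A", 0, 4200), ("B", 4200, 9800), ("C", 9800, 15100), ("D", 15100, 21000)]
results = plaque_assay(digest, mutation_pos=11350)  # site unknown to the experimenter
print(results)                  # {'A': False, 'B': False, 'C': True, 'D': False}
print(locate(digest, results))  # (9800, 15100)
```

Only the plaque results are passed to `locate`, mirroring the fact that the experimenter sees plaques, not the mutation itself.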


See also: Genetic Marker 
‘Masked mRNA’ was the term coined by A.S. Spirin in
the mid-1960s to describe the state of messenger RNA
isolated from early fish embryos and sea urchin eggs.
This mRNA was proposed to be associated with proteins
in ribonucleoprotein particles (mRNPs). Unless steps
were taken to deproteinize the mRNPs, they were
inactive in a translation assay in vitro. However, a
fully active template was obtained following
phenol extraction or trypsinization, implying that
these treatments removed inhibitors (repressors) of
translation that normally hold maternal mRNA in a
masked state.

During oogenesis and spermatogenesis, mRNA is 
synthesized and stored. Subsequent development of 
the germ cells, including meiotic maturation of the
oocyte, spermiogenesis, and early embryogenesis,
occurs in the absence of transcription and is largely
made possible by selective temporal and spatial
activation of translation of the masked mRNAs. The
question of how the initial state of repression is
imposed and how it is relieved to allow expression is
intriguing at several levels. First, in all organisms
examined, from marine invertebrates, worms, and flies
to frog, mouse, and man, early development is
conspicuously a period when gene expression is
governed essentially by translational rather than
transcriptional control. Second,
the control mechanisms target particular RNAs, 
which implies the recognition of specific sequences 
and/or specific RNA-binding proteins. Finally, the 
processes regulated by the translational repressors 
and activators are of fundamental physiological
importance. Thus, masked maternal mRNAs encode
proteins required for entry into and progression through
the cell cycle (such as cyclins, c-mos, and ribonucleotide
reductase) in lower and higher eukaryotes, for
regulation of sexual fate in the Caenorhabditis elegans
hermaphrodite germline, and for specification of pattern
along the anteroposterior body axis in Drosophila by
generation of protein gradients from localized mRNAs.
During spermiogenesis, nuclear compaction relies on 
the ordered substitution of somatic histones by the basic 
transition proteins and protamines, whose levels are 
temporally regulated; premature translation of the pro- 
tamine 1 masked ‘paternal’ mRNA leads to sterility. 
Lessons that are being learnt from this wealth of
examples complement studies of the somewhat rarer
cases of translational control of somatic mRNAs, such
as ferritin, lipoxygenase, and ribosomal protein
mRNAs, as well as those that mediate synaptic plasticity.

Several underlying principles have emerged from 
genetic and biochemical studies of masked germline 
mRNAs. The Y-box family of nucleic acid-binding
proteins, which have relatively low RNA-sequence
specificity, participates in the general packaging of
mRNA as it emerges from the nucleus. Regulatory
elements specifying repression lie in the 3’ untranslated
region (UTR). They are generally short, apparently
unstructured motifs, often present in more than one copy,
that mediate the binding of specific trans-acting factors,
the masking repressors. Full control is sometimes
achieved in conjunction with 5’ UTR elements; mRNAs
that are controlled by both localization and
translation tend to have more complex, structured motifs.

How do repressors located after the termination 
codon prevent ribosome binding to the 5’ cap struc- 
ture and how is translation activated? Models that 
explain several instances of repression/activation use
as a framework the recently developed view of
eukaryotic mRNA as circularized through the 5’ cap and 3’
poly(A) tail by interactions between the cap-binding
initiation factors eIF4E and eIF4G and the poly(A)-binding
protein (PABP). This bridging interaction, documented
physically by atomic force microscopy and functionally
in translation and stability assays, most likely
enhances translational efficiency by permitting
ribosomes to reinitiate promptly following termination
of protein synthesis. Repressors may interfere with
this so-called closed-loop form of mRNA, either
directly or indirectly. Thus CPEB (cytoplasmic
polyadenylation element-binding protein) sequesters the
cap-binding factor eIF4E (indirectly through another
protein termed maskin), thereby preventing productive
eIF4F complex formation and hence ribosome recruitment.
Some repressors, including C. elegans GLD-1, inter- 
fere with poly(A)’s role in translation and/or promote 
deadenylation. However, others (such as Drosophila 
nanos and pumilio, and rabbit lipoxygenase DICE- 
binding protein) can exert their effects in a cap- and/ 
or poly(A)-independent manner, suggesting targets 
downstream of cap recognition and scanning to the 
initiator AUG, e.g., ribosome assembly at the AUG. 

Activation of translation may result from simple 
relief from repression, achieved through repressor 
modification such as phosphorylation and degradation 
or displacement by localization factors. Ample evi- 
dence attests to the view that in many cases derepres- 
sion is coupled to an activation process, and that both 
are required for full unmasking. The best-characterized 
activation process is the conserved cytoplasmic
polyadenylation of maternal mRNAs that contain
one or more U-rich cytoplasmic polyadenylation
elements (CPEs) near the ubiquitous nuclear
polyadenylation signal AAUAAA. Extending a short
poly(A) tail during meiotic maturation or after
fertilization dramatically increases protein synthesis,
most likely by enhancing PABP-eIF4G contacts.
Methylation of the cap structure at N-7 (which enhances
eIF4E binding), in conjunction with polyadenylation,
synergistically stimulates translation during Xenopus
oocyte maturation. Interestingly, some regulatory 
proteins appear to have a dual role in modulating the 
expression of maternal mRNA; CPEB is both a repres- 
sor in the oocyte, and an activator of cytoplasmic 
polyadenylation in the maturing egg. 
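As an illustration of how such elements might be located in a sequence, the following minimal Python sketch scans a 3’ UTR for U-rich CPEs lying upstream of, and within a fixed distance of, the AAUAAA signal. The CPE consensus UUUUAU, the upstream-only rule, the 100-nucleotide window, and the example sequences are assumptions chosen for illustration, not values given in this article.

```python
import re
from typing import List

CPE = "UUUUAU"       # assumed CPE consensus; variants such as UUUUAAU also occur
HEXAMER = "AAUAAA"   # nuclear polyadenylation signal named in the text
WINDOW = 100         # hypothetical maximum CPE-to-hexamer distance, in nucleotides


def find_all(motif: str, seq: str) -> List[int]:
    """Start positions of every occurrence of motif, overlaps included
    (the zero-width lookahead lets matches overlap)."""
    return [m.start() for m in re.finditer(f"(?={motif})", seq)]


def cpes_near_hexamer(utr: str, window: int = WINDOW) -> List[int]:
    """CPE start positions that lie upstream of an AAUAAA hexamer and
    no more than `window` nucleotides from it."""
    utr = utr.upper().replace("T", "U")  # accept DNA or RNA notation
    hexamers = find_all(HEXAMER, utr)
    return [c for c in find_all(CPE, utr)
            if any(0 < h - c <= window for h in hexamers)]


print(cpes_near_hexamer("GCUUUUAUGGCAAUAAACC"))  # [2]: CPE upstream of the hexamer
print(cpes_near_hexamer("AAUAAAGGUUUUAUCC"))     # []: CPE lies downstream only
```

A real analysis would use experimentally defined CPE variants and spacing rules; the fixed consensus here only shows the shape of the search.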

Considerable progress has been made in the
last decade in masked mRNA research. Regulatory
sequences have been delineated in a wide variety of
mRNAs from many lower and higher eukaryotes, and
at least a dozen specific RNA-binding proteins
that mediate repression/activation have been cloned 
and characterized. Strikingly, there is no single path- 
way by which mRNAs are regulated in early develop- 
ment: Control may be exerted by interfering with the 
function of the 5’ cap structure or the 3’ poly(A) tail, a 
mixture of both, or by as yet unknown means. 


See also: Messenger RNA (mRNA); Translation; 
Translational Control 
Maternal effect refers to the influence of maternal gene
expression on embryonic development as revealed by 
the behavior of mutations in maternally expressed 
genes. Although much of embryonic development is 
dependent only on activity of the embryonic genome, 
proteins and RNAs made during oogenesis and de- 
posited in the egg often play a substantial role, espe- 
cially in the earliest events of embryogenesis. 

The most easily observed maternal effect is revealed 
by maternal effect mutations, a class of mutations with 
a characteristic behavior. Females homozygous for 
such mutations appear normal, but all of their progeny 
exhibit a mutant phenotype. This contrasts with the 
more typical behavior of ‘zygotic’ mutations, which 
cause homozygous individuals themselves to exhibit 
the mutant phenotype. (The term ‘zygotic’ derives 
from the fact that the mutations exert their effect 
through the genome created by fusion of sperm and 
egg pronuclei in the zygote.) 


Maternal effects were first discovered in popula- 
tions of snails with two different directions of shell 
coiling: rightward and leftward. A series of crosses 
among the snails revealed inheritance patterns for 
coiling that did not fit with typical mutations. The 
snails produced broods of all left-coiling progeny or 
all right-coiling progeny irrespective of whether the 
parents’ shells coiled leftward or rightward. Further 
analysis showed that the rightward allele of the coil- 
ing gene was dominant and that the coiling direction 
depended on the genotype of the mother and not the 
embryo. If the mother carried one or two copies of the 
rightward allele, all her progeny coiled right; if she 
carried two copies of the leftward allele, all her
progeny coiled left. If rightward coiling is assumed to
be the ancestral form, leftward coiling is a recessive
maternal effect mutation.
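The snail results can be reproduced with a few lines of Python in which each offspring's coiling is read from the mother's genotype, while genotypes themselves are transmitted in the usual Mendelian way. The allele symbols `D` (rightward, dominant) and `d` (leftward) and the function names are labels chosen for this sketch.

```python
from itertools import product
from typing import Dict


def coiling(mother_genotype: str) -> str:
    """Maternal effect: every offspring's coiling depends only on the mother's
    genotype. 'D' = rightward allele (dominant), 'd' = leftward allele."""
    return "right" if "D" in mother_genotype else "left"


def cross(mother: str, father: str) -> Dict[str, str]:
    """Map each possible offspring genotype to its coiling direction.

    Genotypes are two-letter strings such as 'Dd'; alleles segregate in the
    usual Mendelian way, but the phenotype is read from the mother."""
    offspring = {}
    for egg_allele, sperm_allele in product(mother, father):
        genotype = "".join(sorted(egg_allele + sperm_allele))  # 'D' sorts before 'd'
        offspring[genotype] = coiling(mother)
    return offspring


print(cross("dd", "DD"))  # {'Dd': 'left'}: Dd offspring of a dd mother coil left
print(cross("Dd", "dd"))  # {'Dd': 'right', 'dd': 'right'}: dd offspring coil right
```

The two crosses show why the pattern looked puzzling: offspring genotype and offspring phenotype disagree, and the discrepancy is resolved only one generation later.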

Maternal effect mutations have been induced and 
studied most extensively in fruit flies and nematodes. 
Most known maternal effect mutations cause lethality; 
that is, mothers homozygous for the mutations lay 
eggs that can be fertilized but arrest during develop- 
ment. Studies of maternal effect lethal mutations 
have revealed two kinds of maternal effects based 
on tests for maternal sufficiency and necessity. 
Maternal sufficiency is revealed when homozygous 
progeny of a heterozygous mother do not exhibit the 
mutant phenotype. Maternal necessity is revealed 
when introduction of a wild-type allele to progeny 
of a homozygous mother fails to rescue the embryos 
to a wild-type phenotype. If expression of a wild-type 
gene is maternally sufficient but not necessary, the
mutation is said to show a partial maternal effect. This behavior
indicates that the gene in question is expressed both 
by maternal and embryonic genomes and that expres- 
sion by either is sufficient for normal development. If 
maternal expression is both sufficient and necessary,
the mutation is said to show a strict maternal effect.
Strict maternal effect could indicate that the gene is 
not expressed in the embryo, or that it is not expressed 
at an appropriate time or place or in sufficient quan- 
tities to compensate for the absence of the maternal 
contribution. 
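The two tests and the classes they define can be summarized as a small decision function. This is a sketch: the function and argument names are hypothetical, and the "unclassified" fallback (maternal expression not sufficient) is outside the two classes the text defines.

```python
def classify(maternally_sufficient: bool, maternally_necessary: bool) -> str:
    """Classify a maternal effect lethal mutation from the two tests.

    maternally_sufficient: homozygous mutant progeny of a heterozygous mother
        develop normally, so maternal wild-type expression alone is enough.
    maternally_necessary: introducing a wild-type allele (e.g. from the father)
        into progeny of a homozygous mother fails to rescue them, so zygotic
        expression alone is not enough.
    """
    if maternally_sufficient and maternally_necessary:
        return "strict maternal"
    if maternally_sufficient:
        # Either the maternal or the embryonic genome can supply the product.
        return "partial maternal"
    # Maternal expression alone is not enough: outside these two classes.
    return "unclassified"


print(classify(True, True))   # strict maternal
print(classify(True, False))  # partial maternal
```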

Many zygotic lethal mutations exhibit a cryptic 
maternal effect. In this case, gene expression is re- 
quired both maternally and zygotically, but because 
the maternal contribution from the heterozygous 
mother is not adequate to allow the homozygote to 
develop to adulthood, the maternal effect is masked. 
To uncover this cryptic maternal effect, two methods 
are used. The first relies on temperature-sensitive mu- 
tations. A female with a temperature-sensitive embry- 
onic lethal mutation can develop to a fertile adult 
at permissive temperature. If she is then shifted to 
the nonpermissive temperature, inactivation of the 
temperature-sensitive gene product during oogenesis
will result in a maternal effect among the progeny 
if the product is required maternally. The second 
method is the creation of genetic mosaics. Using a 
variety of techniques, females that are otherwise 
wild-type can be made homozygous for a mutation 
of interest in the ovary. Failure to produce a required 
wild-type gene product during oogenesis will produce 
a maternal effect upon the progeny. 

Maternal effects are known in both vertebrates and 
invertebrates, although the extent of maternal effects 
varies greatly among animals. In fruit flies and nema- 
todes, maternal gene expression plays a major role in 
patterning the early embryo. As a result, studies of 
maternal effects have made a significant contribution to 
understanding of early development in these systems. 


See also: Chimera; Developmental Genetics; 
Temperature-Sensitive Mutant 


Maternal Inheritance 


J Poulton 


Copyright © 2001 Academic Press 
doi: 10.1006/rwgn.2001.0798


Maternal inheritance means the inheritance of a DNA 
sequence, with or without associated phenotype, 
exclusively from the mother. This implies extrachromosomal
inheritance. In medical texts, the terms mitochondrial
and maternal inheritance are used synonymously. This
definition excludes epigenetic phenomena, which may 
cause parent of origin effects. In X-linked inheritance, 
skewed inactivation can mimic a maternal inheritance 
pattern. 


Mitochondrial Diseases 


Mitochondrial DNA (mtDNA) mutants cause diverse
phenotypes in different organisms due to impaired
respiratory chain function: the petite colony
morphology with loss of aerobic respiration in yeast,
cytoplasmic male sterility and nonchromosomal stripe
in higher plants, and neurological or multisystem disease
in man. Human disorders include those caused by mtDNA
rearrangements (Holt et al., 1988), such as sporadic
Kearns–Sayre syndrome, Pearson syndrome (Rotig et al.,
1988), and maternally inherited diabetes and deafness
(Ballinger et al., 1994), and those caused by point
mutations, such as mitochondrial encephalopathy with
lactic acidosis and stroke-like episodes (MELAS) (Goto
et al., 1990), myoclonic epilepsy with ragged red fibres
(MERRF) (Shoffner et al., 1990), and Leber’s hereditary
optic neuropathy (LHON) (Wallace et al., 1988).
Heteroplasmy (the presence of both normal and mutant
mtDNA in a single individual) is present in many
such mtDNA diseases, so that the proportion of
mutant mtDNA in any cell or tissue may vary from
0% to 100%. In most disorders, there appears to be a
threshold effect, such that symptoms arise in tissues
with a high proportion of mutant mtDNA, and
preferential accumulation of mutant mtDNAs in affected
tissues appears to explain the progressive nature of
these disorders (Poulton et al., 1995; Weber et al., 1997).


Parent of Origin Effects in Human 
Diseases 


Several other conditions appear to be inherited or 
acquired from the mother without exhibiting a strictly 
maternal inheritance pattern. Environmental factors 
include intrauterine exposure to infections, maternal 
antibodies, or biochemical environment. In congenital 
myasthenia gravis, the fetus of an affected mother is 
exposed to maternal antibodies to acetylcholine recep- 
tors. These antibodies cross the placenta, causing tran- 
sient weakness in the child. In some cases the exposure 
is so prolonged or the titer so high that the child 
develops severe contractures (arthrogryposis) even
though the mother’s symptoms may be mild. In 
other conditions such as type 2 diabetes, it is possible 
that the excess of maternal over paternal transmissions 
is attributable to intrauterine ‘programming’ which 
also occurs in offspring of female mice with chem- 
ically induced diabetes. 

Pseudomaternal inheritance may occur where an
autosomal mutation impairs sperm function. For
example, myotonic dystrophy is a dominant disorder
caused by expansion of a triplet repeat. It seems that
the severe, congenital form is associated with large
expansions that are never seen in sperm, and hence
this form is always maternally inherited.

Parent of origin (‘epigenetic’) effects are also seen 
in disorders associated with mutations in imprinted 
regions of the genome, for example Prader–Willi and
Angelman syndromes. 


Presumed Benefit of Uniparental 
Inheritance of mtDNA 


All mitochondrial genomes encode only a small num- 
ber of polypeptides. The nucleus encodes the vast 
majority of respiratory chain subunits and all of the 
proteins needed for replication, transcription, and 
maintenance of mtDNA. There is therefore poten- 
tially a large number of important interactions between 
mitochondrial and nuclear genomes. The constraints
of these requirements may explain the high level of
uniformity (homoplasmy) among mtDNAs within 
an individual, contrasting with the great diversity 
of mtDNA between individuals. Furthermore, the 
importance of homoplasmy is implied by (1) the 
instability of mtDNA heteroplasmy in unicellular 
organisms and (2) the existence of a genetic bottleneck 
in multicellular organisms as diverse as maize and man. 
The chance of detrimental heteroplasmic mtDNA 
mutants persisting in subsequent generations is min- 
imized by uniparental inheritance combined with a 
genetic bottleneck. 


mtDNA Bottlenecks in Human 
Populations 


Extensive studies of human population genetics and
evolution have failed to provide unequivocal
evidence of mtDNA recombination: if it occurs, it
is probably rare. The unique inheritance pattern of
mtDNA appears to be a consequence of the
mitochondrial bottleneck. When there is a point mutation
difference between a mother and her offspring, there
may be complete switching of mtDNA type in a single
generation: that is, mother and offspring are each
homoplasmic with regard to that base. Because oocytes
contain approximately 100 000 mtDNAs and yet the
mutation probably occurred only once, there must be a
restriction/amplification in the numbers of mtDNAs
whereby the mutant mtDNA becomes the mitochondrial
founder for the child.
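The restriction/amplification argument can be made concrete with a toy model of random genetic drift, the mechanism proposed by Jenuth et al. (1996). This is a sketch under simplifying assumptions made here, not the analysis of any cited study: founder mtDNAs are sampled at random from the mother's pool, there is no selection, amplification is assumed to leave the mutant fraction unchanged, and the bottleneck sizes (5, 50, and 1000 founders), the 50% starting load, and the five generations are arbitrary illustrative values.

```python
import random


def next_load(load: float, founders: int, rng: random.Random) -> float:
    """One generation: sample `founders` mtDNAs at random from a cell whose
    mutant fraction is `load`. The sampled molecules are then amplified,
    which (by assumption) leaves the mutant fraction unchanged."""
    mutant = sum(rng.random() < load for _ in range(founders))
    return mutant / founders


def fraction_homoplasmic(founders: int, generations: int = 5, lineages: int = 300,
                         start_load: float = 0.5, seed: int = 42) -> float:
    """Fraction of maternal lineages that have become homoplasmic (all mutant
    or all normal) after the given number of generations."""
    rng = random.Random(seed)
    fixed = 0
    for _ in range(lineages):
        load = start_load
        for _ in range(generations):
            load = next_load(load, founders, rng)
        if load in (0.0, 1.0):
            fixed += 1
    return fixed / lineages


for n in (5, 50, 1000):
    print(n, fraction_homoplasmic(n))
```

With only a few founders, many lineages switch completely to one mtDNA type within a few generations; with a wide bottleneck, heteroplasmy persists. The exact fractions printed depend on the random seed.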

Studies of oocytes from both controls (Marchington 
et al., 1997) and from patients with mitochondrial 
disease (Marchington et al., 1998) suggest that segre- 
gation of founder mtDNA molecules has prob- 
ably occurred by the time the oocytes are mature 
(Poulton et al., 1998). However, the apparent bottleneck
size may depend on the mtDNA mutation. For
instance, segregation was very marked in a human
family with mitochondrial disease due to the mtDNA
mutation T8993G (Blok et al., 1997) compared
with a patient with an mtDNA rearrangement
(Marchington et al., 1998).

Four groups have constructed heteroplasmic 
mouse models of mtDNA segregation in which the 
major component of the bottleneck occurs between 
the primordial germ cell and primary oocyte stage 
(Jenuth et al., 1996; Laipis, 1996; Meirelles and Smith, 
1997; White, 1999). None of these mice was symptom- 
atic and there are no published analyses of developing 
female germ cells from any animal models using detri- 
mental mtDNA mutations. Taken together, these stud- 
ies suggest that a major bottleneck occurs during 
oogenesis and that mtDNA does not segregate much 
during embryogenesis. 


Prenatal Diagnosis of mtDNA Disease 


Making precise recommendations regarding prenatal
diagnosis for maternally inherited mtDNA diseases is
straightforward if all of the following hold: (1) a close
correlation between load of mutant mtDNA and disease
severity, (2) uniform distribution of mutant in all
tissues, and (3) no change in mutant load with time.
These conditions are fulfilled in a minority of mtDNA
disorders (Poulton and Turnbull, 2000).

Currently the options open to women with 
mtDNA disease are: 


1. receiving donated oocytes; 

2. preimplantation diagnosis: although the tissue
distribution of mtDNA mutants varies postnatally,
current data suggest that in the preimplantation
embryo, heteroplasmic mtDNA is uniformly
distributed (Jenuth et al., 1996; Molnar and
Shoubridge, 1999);

3. chorionic villus sampling (CVS): such evidence
as exists suggests that the mutant load in
extraembryonic tissues, such as chorionic villi, probably
reflects that of the fetus (White et al., 1999).


References 

Ballinger S, Shoffner J, Gebhart S, Koontz D and Wallace D
(1994) Mitochondrial diabetes revisited. Nature Genetics
7(4): 458-459.

Blok R, Cook D, Thorburn D and Dahl H (1997) Skewed
segregation of the mtDNA nt 8993 (T→G) mutation in
human oocytes. American Journal of Human Genetics 60(6):
1495-1501.

Goto Y-I, Nonaka I and Horai S (1990) A mutation in the
tRNALeu(UUR) gene associated with the MELAS subgroup of
mitochondrial encephalomyopathies. Nature 348: 651-653.

Holt IJ, Harding AE and Morgan-Hughes JA (1988) Deletions in 
muscle mitochondrial DNA in patients with mitochondrial 
myopathies. Nature 331: 717-719. 

Inoue K, Nakada K, Ogura A et al. (2000) Generation of mice
with mitochondrial dysfunction by introducing mouse 
mtDNA carrying a deletion into zygotes. Nature Genetics 
26(2): 176-181. 

Jenuth J, Peterson A, Fu K and Shoubridge E (1996) Random 
genetic drift in the female germline explains the rapid segre- 
gation of mammalian mitochondrial DNA. Nature Genetics 
14(2): 146-151. 

Laipis P (1996) Construction of heteroplasmic mice containing
two mitochondrial DNA genotypes by micromanipulation
of single-cell embryos. Methods in Enzymology 264: 345-357. 

Marchington D, Hartshorne G, Barlow D and Poulton J (1997)
Homopolymeric tract heteroplasmy in mtDNA from tissues
and single oocytes: support for a genetic bottleneck.
American Journal of Human Genetics 60: 408-416.

Marchington D, Hartshorne G, Barlow D and Poulton J (1998)
Evidence from human oocytes for a genetic bottleneck in a
mitochondrial DNA disease. American Journal of Human
Genetics 63: 769-775.

Meirelles F and Smith L (1997) Mitochondrial genotype in a
mouse heteroplasmic lineage produced by embryonic
karyoplast transplantation. Genetics 145: 445-451.

Molnar M and Shoubridge E (1999) Preimplantation diagnosis for
mitochondrial disorders. Neuromuscular Disorders 9(6-7): 521.

Poulton J and Turnbull D (2000) Neuromuscular disorder. In: 
74th ENMC International Workshop: Mitochondrial Diseases. pp. 
00- 00. 

Poulton J, O’Rahilly S, Morten K and Clark A (1995)
Mitochondrial DNA, diabetes and pancreatic pathology in
Kearns–Sayre syndrome. Diabetologia 38: 868-871.

Poulton J, Marchington D and Macaulay V (1998) Is the 
bottleneck cracked? American Journal of Human Genetics 62: 
752-757. 

Rotig A, Colonna M, Blanche S et al. (1988) Deletions of blood
mitochondrial DNA in pancytopenia. Lancet i: 567-568.

Shoffner JM, Lott MT, Lezza AM et al. (1990) Myoclonic epilepsy
and ragged-red fiber disease (MERRF) is associated with
a mitochondrial DNA tRNALys mutation. Cell 61(6):
931-937.

Sutherland B, Stewart D, Kenchington ER and Zouros E (1998) 
The fate of paternal mitochondrial DNA in developing 
female mussels, Mytilus edulis: Implications for the mechan- 
ism of doubly uniparental inheritance of mitochondrial DNA. 
Genetics 148(1): 341-347. 

Wallace DC, Singh G, Lott MT et al. (1988) Mitochondrial DNA 
mutation associated with Leber’s hereditary optic neuropa- 
thy. Science 242(4884): 1427-1430. 

Weber K, Wilson J, Taylor L et al. (1997) A new mtDNA 
mutation showing accumulation with time and restriction 
to skeletal muscle. American Journal of Human Genetics 60: 
373-380. 

White S (1999) Molecular Mechanisms of Mitochondrial Disorders. 
PhD thesis, University of Melbourne. 

White S, Collins V, Wolfe R et al. (1999) Genetic counseling and 
prenatal diagnosis for the mitochondrial DNA mutations 
at nucleotide 8993. American Journal of Human Genetics 65: 
474-482. 


See also: Epigenetics; Mitochondria, Genetics of; 
Mitochondrial DNA (mtDNA); Mitochondrial 
Inheritance; X-Chromosome Inactivation 


Mating Types 
L A Casselton 


Copyright © 2001 Academic Press 
doi: 10.1006/rwgn.2001.0799 


The term ‘mating’ can be used to describe the manner 
in which an organism recognizes a compatible sexual 
partner and how this recognition leads to fusion
of gametic nuclei and the initiation of sexual repro- 
duction. 

Sexual reproduction is important for maintaining 
genetic variability within populations. The sexual 
cycle is initiated by cell fusion, followed by nuclear
fusion, and is completed by meiosis. Cell fusion, or
fertilization, brings together available genetic
variation in a population, and meiosis leads to its
recombination. For a sexual cycle to be effective, the
cells that fuse must come from genetically different
individuals. There has been strong selection during
evolution for mechanisms that prevent selfing, i.e.,
fusion of cells from the same individual. These
mechanisms are known as incompatibility systems.
In most animals, selfing is prevented by sexual
dimorphism, and fusion is restricted to highly
specialized gametes, the male sperm and the female egg.
In flowering plants, where a single flower may pro- 
duce both male and female gametes, there are gen- 
etically imposed self-incompatibility systems that 
prevent ovules from being fertilized by pollen from 
the same plant. In simple eukaryotes such as fungi, the 
mating partners may be morphologically indistin- 
guishable, but genetic barriers to selfing still exist 
and these barriers divide a population into different 
‘mating types.’ 

Fungi, because they lack the complication of mor- 
phological differences associated with sex, are excel- 
lent models for understanding some of the molecular 
events that control mating. Mating type is determined 
by genes that reside at a specific position on one of 
the chromosomes known as the mating type locus. 
These few genes are sufficient to determine the mating 
type of an individual, the ability to attract a compatible 
mating partner, and the ability to bring about major 
changes in gene expression once the gametic cells have 
fused. 

The way in which the mating type genes exert their 
effect can best be illustrated by reference to the uni- 
cellular budding yeast Saccharomyces cerevisiae. Here, 
just three genes are sufficient to tell the cells whether 
they are haploid or diploid and, if haploid, which of 
two mating types they have. These three genes reside
at the mating type locus, MAT. There are two versions
of this locus. Haploid cells may be MATa or MATα.
Cells of both types are able to express an array of
genes required for mating, but some of these genes
can only be expressed in a mating type-specific
fashion, i.e., mating functions specific to MATa cells
and mating functions specific to MATα cells. The only
differences between the cells are the genes that reside
at the MAT locus. Cells are MATa if they have a single
gene, a1, whereas they are MATα if they have two
genes, α1 and α2. The proteins encoded by these
three mating type genes are all different, but have the
same general function of being DNA binding proteins
(transcription factors) that regulate transcription of a
specific subset of genes. In both types of cells, genes
encoding general mating functions, such as
components of the signaling pathway described below,
are expressed independently of the mating type
proteins; they are expressed constitutively. In haploid
MATa cells, genes required for MATa-cell-specific
mating functions are also constitutively expressed,
because the a1 protein has no role in haploid cells.
In MATα cells, the MATα genes have two functions:
to repress the transcription of MATa-cell-specific
genes and to activate the transcription of
MATα-cell-specific genes. The α2 protein is the
repressor and the α1 protein is the activator.

The genes that are expressed in a mating type-specific
way enable cells to identify a compatible partner.
MATa and MATα cells detect each other by means
of small, secreted peptide pheromones. Each mating
type produces its distinct pheromone (a-factor or
α-factor) together with a cell-surface pheromone
receptor that will only bind pheromone produced by 
cells of the other mating type. Pheromone binding 
triggers an intracellular signal transduction pathway. 
This is known as the pheromone response pathway, 
and it leads to activation of a transcription factor 
responsible for activating transcription of genes 
required to bring about cell fusion and subsequent 
nuclear fusion. The pheromone signaling pathway of 
yeast is one of five mitogen-activated protein kinase 
(MAPK) pathways found in yeast. Amongst the genes 
that are activated by the pheromone response are 
those that lead to the production of cell surface pro- 
teins that make the compatible cells adhere to each 
other. Another activated protein has the dual role of 
arresting the cell cycle so that cells can fuse before 
DNA replication, as well as being a scaffold protein 
that links the site on the cell surface where receptor 
activation occurred to the proteins that determine the 
orientation of the cytoskeleton. Mating cells are thus 
able to respond to the direction from which the phero- 
mone is coming and to reorient their growth to form 
mating projections that will enable them to fuse. 

Once compatible cells have fused, there is no need 
to send and respond to mating signals. The genes 
required for signaling are repressed, or fail to be acti- 
vated, and the diploid cell follows a new developmen- 
tal program that will ultimately lead to meiosis. A 
remarkably simple mechanism enables a cell to sense 
that it has successfully mated. The a1 protein encoded
by the gene at the MATa locus and the α2 protein
encoded by one of the genes at the MATα locus are
produced in different haploid cells, but following cell 
fusion, these two proteins are in the same cell and form 
a heterodimeric protein complex. This complex is a
diploid cell-specific transcription factor. It represses 
directly or indirectly the transcription of all genes 
required for mating and permits the activation of 
genes required for meiosis. 
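The regulatory logic described above can be summarized as a small Boolean model. This is a simplified sketch: each gene set is treated as simply on or off, the labels ("general mating", "a-specific", "alpha-specific", "meiosis permitted") and the function name are chosen here, and it encodes only the statements in the text: a1 alone has no role, α2 represses a-specific genes, α1 activates α-specific genes, and the a1/α2 heterodimer represses mating genes and permits meiotic genes.

```python
from typing import Dict, Set


def expressed_programs(mat_products: Set[str]) -> Dict[str, bool]:
    """Boolean sketch of yeast cell-type control by MAT gene products.

    mat_products: {"a1"} for a MATa haploid, {"alpha1", "alpha2"} for a
    MATalpha haploid, and all three for a MATa/MATalpha diploid.
    """
    a1 = "a1" in mat_products
    alpha1 = "alpha1" in mat_products
    alpha2 = "alpha2" in mat_products
    heterodimer = a1 and alpha2  # a1/alpha2 forms only after cell fusion

    return {
        # General mating genes: constitutive in haploids, repressed by a1/alpha2.
        "general mating": not heterodimer,
        # a-specific genes: on by default, repressed by alpha2.
        "a-specific": not alpha2,
        # alpha-specific genes: need the alpha1 activator; off once a1/alpha2 forms.
        "alpha-specific": alpha1 and not heterodimer,
        # Genes required for meiosis: permitted only in the diploid.
        "meiosis permitted": heterodimer,
    }


for label, products in [("MATa", {"a1"}),
                        ("MATalpha", {"alpha1", "alpha2"}),
                        ("a/alpha diploid", {"a1", "alpha1", "alpha2"})]:
    on = [name for name, active in expressed_programs(products).items() if active]
    print(label, on)
# MATa ['general mating', 'a-specific']
# MATalpha ['general mating', 'alpha-specific']
# a/alpha diploid ['meiosis permitted']
```

Real regulation is graded rather than all-or-none, but the table captures how three genes suffice to specify three cell types.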

In other fungi, for example, the bread mold Neuro- 
spora crassa, the genes found at the mating-type locus 
are not all homologs of the yeast genes. However, 
the proteins they encode are transcription factors 
and lead to a similar cell-type expression of the signal- 
ing molecules that enable compatible mates to detect 
each other. The mating-type genes of the mushroom 
fungi, such as the ink cap Coprinus cinereus, are 
remarkable in that they are multiallelic and there 
may be several thousands of different mating types 
in a population. Here, the mating-type genes encode 
large families of proteins that are the homologs of 
the yeast a1 and α2 proteins, and mate recognition
depends on a sensitive dimerization domain that per- 
mits proteins from compatible partners to dimerize, 
but not those from incompatible partners. The pher- 
omones and receptors of these fungi help to determine 
mating type and are also members of a large family, 
where subtle differences in amino acid sequence are 
sufficient for a pheromone to distinguish between 
compatible and incompatible receptors. 

The genetic mechanisms that enable fungal cells to 
signal and respond to each other during mating are 
universal and illustrate in relatively simple systems 
some of the complex cellular mechanisms involved in 
mating in all eukaryotic organisms. 


See also: Mating-Type Genes and their Switching 
in Yeasts 
Although nearly all the cells of an organism contain
the same genetic material, they vary in their patterns of 
gene expression to produce multiple arrays of cell 
types during development. To study such fundamental
developmental controls, single-celled eukaryotes have
been exploited, primarily because they are easy to
manipulate experimentally and their decisions can be
monitored at the single-cell level. Classical and molecular
genetic studies carried out with fission yeast (Schizosaccharomyces pombe) and budding yeast (Saccharomyces cerevisiae) have shown that these organisms
have evolved very different mechanisms of cell-type
change, in which individual haploid cells express one or
the other mating cell type during mitotic growth.
Interestingly, the pattern of cell division in both yeasts 
is analogous to the stem cell pattern of cell division 
found in the growth of many self-renewing tissues of 
mammals and in other species. As the mechanism of 
asymmetric cell division is crucial to explain cellular 
differentiation during development in higher systems, 
these yeasts have provided different paradigms for 
mechanisms of asymmetric cell division. This process 
generates cells of opposite type, which can mate and 
undergo meiosis and sporulation. This review dis- 
cusses the mechanism of cell-type change in these 
two distantly related organisms as a model system 
for answering questions about gene regulation in 
eukaryotes. 


Schizosaccharomyces pombe and 
Saccharomyces cerevisiae as Model 
Systems for Cellular Differentiation 


Of the two yeasts, budding yeast is the more import- 
ant economically as it has been used by bakers and 
brewers worldwide for centuries, while fission yeast is 
only used for brewing beer and rum in Africa. How- 
ever, both organisms have been used extensively in 
research because they are eukaryotic, are easy to 
grow in the laboratory, and have well-developed 
genetics. These organisms contain only about three 
times as much DNA as the prokaryote Escherichia 
coli. Entire genomes of both yeasts have been se- 
quenced, showing that these organisms are not closely 
related. They both grow fast, with a doubling time of
about 2.5 h, and may grow in either liquid or solid
media; a single cell grows into a colony of approxi- 
mately 10° cells in just 3 days. 

Fission yeast and budding yeast cells grow in
different patterns. Budding yeast cells grow by producing
a round bud, which pinches off from the oval or round
mother cell after the bud reaches nearly the size of
the mother cell; fission yeast cells, which are rectangular
in shape, elongate and then divide in the
middle to produce daughter cells of nearly equal size.
Because of their single-cell nature, both yeasts have 
been extensively used to study control of the cell cycle 
and growth, and meiosis and sporulation. Another 
major area of research addresses the mechanism of 
cellular differentiation, as haploid cells of each yeast 
exist in one of two cell types. 

Sa. cerevisiae haploid cell types are called the a and α
mating types, or sexes, which are controlled by the
alternate alleles of the mating-type locus (MAT),
MATa and MATα, respectively. These two alleles differ
by a substitution of DNA sequence: ~650 bp are unique to
MATa and ~750 bp are unique to MATα. While
MATa codes for a single regulatory protein, the
homeodomain protein a1, MATα codes for the protein
α1 and the homeodomain protein α2. These proteins
control expression of many other cell-type-specific
genes to confer an a or α cell type. The α1-α2 hypothesis
established that the α1 factor turns on the
α-specific genes, while the α2 factor turns off the
a-specific genes. Accordingly, lack of both α functions confers
an a cell type. The a1 factor has no known function in
haploid cells. But the diploid cells resulting from mating
of haploid MATa and MATα cells acquire an additional
cell type, which is unable to mate but capable
of undergoing meiosis and spore formation. Meiosis
culminates in the production of asci, each of which
contains two MATa and two MATα meiotic products
called ascospores.

Likewise, Sc. pombe has Plus (P) and Minus (M) cell
types conferred by the alternate mat1-P and mat1-M
alleles of the mating-type locus (mat1), which consist
of ~1.1 kb P- and M-specific DNA regions, respectively.
Each mat1 allele encodes two divergently transcribed
genes; one gene from each allele is required
for mating, but both genes are required for meiosis
and sporulation. Notably, one of the P genes encodes a
homeodomain protein, while one of the M gene products
shows homology to the HMG-box domain shared by HMG
proteins and by the Tdy and TDF genes located in the
testis-determining region of the Y chromosome of
mice and humans, respectively. The existence of such
domains suggests that the mat1 genes encode transcription
factors. Indeed, these mat1 factors control
several other genes located elsewhere in the genome to
regulate cell type, meiosis, and sporulation. P cells can
mate only with M cells, and mating requires nutritional
starvation. The resulting zygote undergoes meiosis to produce
two mat1-P and two mat1-M ascospores in each
ascus. Unlike in Sa. cerevisiae, deletion of mat1 causes
sterility in Sc. pombe.

Following studies of these yeasts, whose mating-type
loci were molecularly cloned and shown to encode
master regulatory transcription factors, the mating-type
loci of many other fungi have been cloned. These
include Schizophyllum commune, Pyrenopeziza brassicae,
Ustilago maydis, Neurospora crassa, Podospora
anserina, and Candida albicans. Most of their mating-type
loci encode members of the homeodomain family
of transcription factors. Such factors frequently heterodimerize
on mating to generate an active transcription
regulator such as a1-α2. In many of these fungi,
the mating-type locus is stable and sometimes exists in
many alternate alleles; cells of the same type do
not mate, but confrontation of any pair of different
types results in conjugation and development.


Mating-Type Switching Occurs by 
Directed DNA Rearrangement at the 
mat1/MAT Locus


Naturally occurring strains of both Sa. cerevisiae and
Sc. pombe efficiently switch mating type spontaneously.
As the alleles of the locus contain different
DNA sequences, the ability to switch implies that
new information must have been substituted during
mating-type interconversion. This was first established
genetically and confirmed by molecular studies
with the budding yeast. Earlier genetic studies had
identified three loci essential for MAT switching.
Each of the HO (homothallism), HML (homothallic
locus on the Left arm of chromosome 3), and HMR
(homothallic locus on the Right arm of chromosome
3) loci was found in two alternate, naturally occurring
forms or alleles (Figure 1). The HO-containing
strains are able to switch from a to α and vice versa,
while the ho derivatives are stable as one or the other.
The transposition model (Figure 1) predicted that
strains containing mutations in the HM loci should
produce only mutant MAT alleles reflecting the mutations
of the HML or HMR donor loci. According
to the specific cassette model, HMLα contains unexpressed
MATα information while HMRa contains
unexpressed MATa information. To activate this
information, a replica of the donor locus is transposed
into MAT, where it substitutes for the existing allele by
gene conversion. Studies to isolate mutations of HML
and HMR discovered the phenomenon of gene silencing.
These so-called ‘MAT-wounding’ experiments
established genetically the controlling element/cassette
model. The MAT, HML, and HMR loci were
isolated molecularly by complementing mat mutations
following transformation with a plasmid library
containing Sa. cerevisiae genomic sequences. This
work established that intact copies of MAT exist at
HML and HMR, but they are kept silent by the MAR/SIR
repression mechanism.

Subsequent studies with Sc. pombe showed that
the transcriptionally active mat1 allele is switched by
transposing a copy of the donor mat2 or mat3 locus
by gene conversion (Figure 1). The mat2 locus
contains silenced mat-P information, while the mat3
locus contains silenced mat-M information. Analogous
to the silencing phenomenon in Sa. cerevisiae,
the Sc. pombe mat2-P and mat3-M cassettes are silenced
by several trans-acting factors, some of which are
proteins implicated in heterochromatin organization
in other organisms. Thus, both donor loci in both
organisms are silenced; to activate that information, a
copy of a specific donor is transposed into the transcriptionally
active MAT/mat1 locus by recombination.

Figure 1 (A) Arrangement of expressed MAT and silent HMLα and HMRa loci on chromosome 3 of Saccharomyces
cerevisiae. HMR is located about 120 kb to the right, while HML is about 180 kb to the left of MAT. MAT
interconversion occurs by replacing the Y region with that derived from the HMLα (747 bp) or HMRa (642 bp) locus. The W
(723 bp), X (704 bp), Z1 (239 bp), and Z2 (88 bp) boxes represent sequence homology shared by these loci. This
homology is used for pairing these elements during the process of gene conversion. The DSB at the MAT Y/Z1
boundary catalyzed by the HO endonuclease initiates recombination. The dot represents the centromere. (B)
Arrangement of expressed mat1 and silent mat2-P and mat3-M loci on chromosome 2 of Schizosaccharomyces pombe.
mat2 is located 17 kb distal to mat1 and mat3 is situated 11 kb distal to mat2. Switching occurs by replacing the allele-specific
mat1 sequences with those derived from the mat2-P or mat3-M locus. Short sequence homologies H1 (59 bp),
H2 (135 bp), and H3 (57 bp) flank the P-specific 1104 bp (jagged line) and M-specific 1128 bp (straight line) regions. The
vertical arrow indicates a strand-specific imprint that initiates gene conversion for switching of mat1 only. Long
arrows in both drawings represent unidirectional gene conversion.


The Pattern of Mating-Type Switching in 
Cell Lineage Is Highly Regulated 


A remarkable feature of both yeasts is the asymmetry 
of cell division such that only one of two sister cells 
produces progeny with a changed cell type. Each yeast 
follows its own rules of switching. The rules followed 
by Sa. cerevisiae are: 


1. The pairs rule: Whenever a switch has occurred,
both progeny of a cell always switch together.

2. The mother-daughter asymmetry rule: Cells inherently
divide by asymmetric cell division, since
only the ‘experienced’ mother cell produces
switched progeny. The newly born daughter cell
does not; it will do so only after it has produced
its own daughter, thus acquiring mother
status.

3. The directionality rule: Nearly 80% of mother cells
switch.
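The three Sa. cerevisiae rules can be made concrete with a toy pedigree simulation. This is a minimal sketch under stated assumptions, not a published model: each cell is a hypothetical (mating type, has-budded-before) pair, an experienced mother switches with probability `efficiency` (an illustrative stand-in for the ~80% figure), and because the switch happens in G1, before replication, the mother and her new bud carry the same new type.

```python
import random
from collections import Counter

SWITCHED = {"a": "alpha", "alpha": "a"}

def bud_all(cells, rng, efficiency=1.0):
    """One round of budding. Each cell is (mating_type, experienced_mother)."""
    next_gen = []
    for mating_type, experienced in cells:
        # Only a cell that has budded before can switch (asymmetry rule).
        # It switches in G1, before replication, so mother and bud share
        # the new type (pairs rule).
        if experienced and rng.random() < efficiency:
            mating_type = SWITCHED[mating_type]
        next_gen.append((mating_type, True))    # mother, now experienced
        next_gen.append((mating_type, False))   # newly born daughter
    return next_gen

def pedigree_counts(generations, start_type="a", efficiency=1.0, seed=0):
    """Mating-type counts after `generations` rounds starting from one spore."""
    rng = random.Random(seed)
    cells = [(start_type, False)]   # a germinating spore has never budded
    for _ in range(generations):
        cells = bud_all(cells, rng, efficiency)
    return Counter(t for t, _ in cells)

print(pedigree_counts(2))                   # two a and two alpha cells
print(pedigree_counts(6, efficiency=0.8))   # hypothetical ~80% efficiency
```

With `efficiency=1.0`, a spore's first division yields two unswitched cells, and at the second division only the experienced mother switches, giving two a and two α cells among the four.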


Analogous yet different rules for Sc. pombe are: 


1. The pairs rule: Between a pair of sister cells, only
one member is competent to produce switched
progeny. That is, a Pu (P mating type, unswitchable)
cell produces Ps (switchable) and Pu cells in
~80% of cell divisions, and two Ps daughters are
never produced. The remaining 20% of divisions
produce two Pu daughter cells.

2. The one-in-four granddaughters switching rule: A
switchable cell (i.e., Ps or Ms) produces only one
switched daughter. Combining this with rule 1,
only one in four granddaughters of a Pu or Mu cell
switches in ~80% of cases.

3. The recurrent switching rule: The sister of the
recently switched cell is itself competent to switch
in ~80% of cell divisions. By combining the above
rules, a Pu cell produces a Ps daughter (which produces
Mu and Ps daughters of its own) and a Pu
daughter.

4. The directionality rule: A Ps or Ms cell switches in
about 80% of the cases satisfying the above rules.


These patterns of switching suggest that: 


1. While switching occurs in G1 in Sa. cerevisiae to produce
two switched daughters, Sc. pombe switches in
G2 such that only one of two daughters switches.

2. In Sa. cerevisiae, only one type of asymmetric cell
division occurs, such that only mother cells switch.
In Sc. pombe, two consecutive asymmetric cell divisions
are required to produce switches of one in
four granddaughters.


Mechanisms of Asymmetric Cell Division 


As discussed above, most cell divisions in both types 
of yeast produce developmentally different daughters 
as the potential to switch is not equivalently acquired 
by sister cells. Overall, both yeasts switch cell type by 
DNA rearrangements, but the mechanism of initiation 
of recombination in cell pedigrees is fundamentally 
different in these organisms. 

The primary control of restricting the switching to 
mother cells in Sa. cerevisiae lies in the expression of 
the HO gene, which encodes a site-specific endonuclease,
only in mother cells and in a narrow window of
G1. This control is exerted by >1.4 kb of the promoter
region of HO carrying several cis-acting elements 
which respond to different trans-acting factors 
encoded by other unlinked genes, including several 
SWI (for switch) genes. The nonswitching ho strains 
carry a mutation in the endonuclease gene itself. 
Recent studies have shown that sequestration of the 
ASH1 mRNA into the daughter cell prohibits, by an 
unknown mechanism, the expression of HO in the cell 
cycle of the daughter cell. 

The HO endonuclease belongs to a class of unusual
endonucleases whose recognition site is over 16 bp
long. The cleavage occurs with a four-base 3’ extension
terminating in 3’ hydroxyl groups exactly at the
junction of the allele-specific ‘Y’ sequence and the
adjoining homology ‘Z’ sequence shared by all cassettes
(Figure 1). In exponentially growing and
switching cells, about 2% of the cells contain the 
double-stranded break (DSB). This break initiates 
gene conversion where broken ends invade the intact 
homologous sequences of the HM loci by the classical 
double-stranded break repair mechanism. Interesting- 
ly, the same Y/Z junction sequences present at the HM 
loci are inaccessible in vivo to cleavage by the endo- 
nuclease. Consequently, only MAT switches, while 
the donor loci remain intact during recombination. 

In contrast, Sc. pombe switching occurs by a novel 
imprinting event (a DNA lesion) at the mat1 locus 
(Figure 1). This lesion, which may be a single-strand 
nick or one or more ribonucleotides, is thought to 
generate a transient site-specific DSB at the junction 
of the allele-specific sequence and the 59 bp homology 
H1 box found at all these cassettes. Since only one in
four granddaughters of a given cell switches, the decision
for a given switch must be made in the grandparental
cell two generations before the switched cell
is produced. Clearly, two consecutive asymmetric cell
divisions are required to produce a switch of a single
grandchild cell.

To explain asymmetric cell division, a DNA
‘strand segregation model’ was proposed in which the
Watson and Crick strands of mat1 DNA are considered
nonequivalent in their ability to acquire the developmental
potential for switching. Molecular analysis has
shown that switching is initiated by the lesion resulting
from a site- and strand-specific base modification
or a nick (see below). Mutations in the swi1, swi3, and
swi7 genes cause defects in switching by reducing
the level of the lesion. The functions of swi1 and
swi3 are not defined, but swi7 encodes the catalytic
subunit of DNA polymerase α. These genetic
studies established that: (1) gene conversion potential
segregates in cis with mat1; (2) potential is conferred
by the swi1, swi3, and swi7 gene products; and (3)
potential is imparted by the grandparental cell to one
of its daughter chromosomes. These studies led to the
strand segregation model for mat1 switching, in which
only one of the specific chromatids of the grandparental
cell is imprinted during its replication. The
cell inheriting this chromatid becomes switchable
as it has inherited the imprint. During its next replication,
one progeny will switch owing to the conversion
of the lesion to a DSB in S phase, while its sister will remain
unswitched but will acquire switchability. This
model was established genetically with the finding
that strains engineered to contain an inverted duplication
of mat1 switched two cousins, not sisters, among
the four granddaughters of a cell. These studies indicated
that the pattern of inheritance of DNA chains by
progeny cells regulates the pattern of switching.
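The Sc. pombe rules and the strand segregation model can likewise be sketched as a toy state machine. The encoding (a cell as a mating type plus a "switchable" flag standing for an inherited imprinted chromatid) and the parameters `imprint_eff` and `directionality` are illustrative assumptions, not measured quantities or a published implementation.

```python
import random

OPPOSITE = {"P": "M", "M": "P"}

def divide_fission(cell, rng, imprint_eff=1.0, directionality=1.0):
    """Return the two daughters of one Sc. pombe cell (toy model).

    A cell is (mating_type, switchable); switchable=True stands for having
    inherited the imprinted mat1 chromatid (Ps/Ms), False for Pu/Mu.
    """
    mating_type, switchable = cell
    if not switchable:
        # Replication imprints one chromatid; its inheritor becomes switchable.
        imprinted = rng.random() < imprint_eff
        return [(mating_type, imprinted), (mating_type, False)]
    # The imprint becomes a DSB: one daughter is converted from a donor,
    # choosing the opposite cassette with probability `directionality`.
    new_type = OPPOSITE[mating_type] if rng.random() < directionality else mating_type
    # Its sister inherits a newly imprinted chromatid (recurrent switching rule).
    sister_switchable = rng.random() < imprint_eff
    return [(new_type, False), (mating_type, sister_switchable)]

def descendants(start=("P", False), generations=2, seed=0, **params):
    """All cells after `generations` divisions of a single starting cell."""
    rng = random.Random(seed)
    cells = [start]
    for _ in range(generations):
        cells = [d for c in cells for d in divide_fission(c, rng, **params)]
    return cells

print([t for t, _ in descendants()])   # ['M', 'P', 'P', 'P']
```

With both probabilities set to 1.0, a Pu cell yields one switched (M) cell among its four granddaughters, as the one-in-four rule states; lowering them to about 0.8 mimics the observed efficiencies.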


Molecular Mechanism of mat1
Imprinting 


Molecular analysis of DNA isolated from Sc. pombe
cells showed that about 20-25% of chromosomes are
cleaved due to a DSB at mat1, and the level of this
break remains constant throughout the cell cycle.
However, when the DNA was isolated by gentler
means, by embedding spheroplasts in agarose plugs,
the DSB was absent. When that DNA was denatured
by formaldehyde treatment, both DNA chains were
found to be intact. But when the DNA was denatured
with alkali treatment, a strand-specific nick defining
the imprint at mat1 was discovered. Thus, the imprint
is either a nick or an alkali-labile modification in DNA,
creating a fragile site that results in a DSB during
preparation of DNA by conventional means. One
possibility is that the imprint is an RNA moiety originating
from an Okazaki fragment and ligated into
DNA. These studies established the strand-specific
feature of the model and furthermore provided biochemical
evidence for the nature of the imprint.

Interestingly, the imprint is made only when the specific
strand is replicated by the lagging-strand
replication complex, supporting the ‘orientation of
replication model.’ When the mat1 fragment, which
does not contain an origin of replication, is inverted,
imprinting is abolished. When the inverted mat1 is
made to replicate in the opposite direction by
judicious placement of origins of replication or
replication terminators next to the mat1 locus,
imprinting and switching are restored. Also, mat1
is indeed replicated in the specific direction that promotes
imprinting. These results establish the orientation of
replication model and support the general model
whereby the act of DNA replication advances the
developmental program of Sc. pombe.

It is thought that during replication of the
imprinted template by the leading-strand complex, a
transient DSB is created. The resulting 3’-OH-ended
strand may invade the intact donor mat2 or mat3 locus
to prime DNA synthesis. That extended strand is then
ligated at the mat1 locus to provide a template for
synthesis of the other strand. Such a model is analogous
to template switching or a copy-choice mechanism
of repair of the DSB.


Directionality of Switching 


In both Sc. pombe and Sa. cerevisiae donors are chosen 
nonrandomly such that nearly 80% of switches occur 
to the opposite allele, which contains nonhomologous 
allele-specific sequences. Both yeasts have evolved 
mechanisms whereby the specific donor is preferen- 
tially chosen, based on cell type but regardless of the 
donor’s genetic content. This was discovered because 
when the donor’s genetic content was swapped 
experimentally, cells in both yeasts preferentially 
underwent futile, homologous cassette replacements. 
However, the precise mechanism of donor choice is 
different in these organisms. In Sc. pombe, it is thought 
to be regulated by chromatin structural changes of the 
donor loci, probably in a cell-type specific fashion, 
but in Sa. cerevisiae, the control lies with a distantly 
located site on chromosome 3 that influences avail- 
ability of the left chromosome arm for recombination. 
The maximum likelihood procedure has the advantage
of being able to analyze different statistical models
on the same basis. All that is needed is to
formulate a statistical model as a likelihood
function, the probability of obtaining the data at hand.
The larger the likelihood, the better the data fit
the model. Once the log likelihood function has been written down,
a numerical optimization routine such as the Newton-Raphson
method calculates the maximum likelihood
estimates. Owing to the remarkable development
of computers, the maximum likelihood procedure
has become a powerful tool not only for estimating evolutionary
trees but also for answering important
questions of molecular evolution, such as testing the
molecular clock and modeling change in evolutionary
rates, examining neutrality and detecting adaptive
molecular evolution, estimating ancestral sequences
and G+C contents, and combining protein structure
and evolution. This review begins with an introduction
to the likelihood function and discusses statistical
models of evolutionary rates and the statistical comparison
of models based on likelihoods.
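As a minimal illustration of maximizing a log likelihood numerically, the sketch below applies Newton-Raphson to the simplest phylogenetic case: the distance d between two aligned DNA sequences under the Jukes-Cantor (JC69) model, a model choice assumed here for illustration. With n sites of which x differ, ℓ(d) = x log(1 - e) + (n - x) log(1 + 3e) + const, where e = exp(-4d/3). Because this one-parameter case also has a closed-form maximum, d̂ = -(3/4) log(1 - 4x/(3n)), the numerical answer can be checked.

```python
import math

def jc_grad_hess(d, n, x):
    """First and second derivatives of the JC69 log likelihood
    l(d) = x*log(1 - e) + (n - x)*log(1 + 3e) + const, with e = exp(-4d/3),
    for n aligned sites of which x differ."""
    e = math.exp(-4.0 * d / 3.0)
    grad = x * (4.0 / 3.0) * e / (1.0 - e) - (n - x) * 4.0 * e / (1.0 + 3.0 * e)
    hess = (-x * (16.0 / 9.0) * e / (1.0 - e) ** 2
            + (n - x) * (16.0 / 3.0) * e / (1.0 + 3.0 * e) ** 2)
    return grad, hess

def newton_raphson_jc(n, x, d0=0.1, tol=1e-12, max_iter=100):
    """Maximize l(d) by Newton-Raphson; requires 0 < x/n < 3/4."""
    d = d0
    for _ in range(max_iter):
        grad, hess = jc_grad_hess(d, n, x)
        new_d = d - grad / hess
        if new_d <= 0.0:            # stay inside the domain d > 0
            new_d = d / 2.0
        if abs(new_d - d) < tol:
            return new_d
        d = new_d
    raise RuntimeError("Newton-Raphson did not converge")

n_sites, n_diff = 100, 20
d_hat = newton_raphson_jc(n_sites, n_diff)
closed_form = -0.75 * math.log(1.0 - 4.0 * n_diff / (3.0 * n_sites))
print(d_hat, closed_form)   # the two values should agree closely
```

Real tree searches optimize many branch lengths and model parameters at once, but each step follows the same pattern: gradient, second derivatives, update.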
Information Theory and Maximum
Likelihood Procedure 


The likelihood function is related to entropy, or more
specifically to mutual information. Mutual information takes
the value 0 when two random variables are
independent, and can be regarded as a measure
of departure from the model assuming independence.
This interpretation leads to a generalization: a
measure of the distance of a distribution from a
model, called the Kullback-Leibler information:

$$I(g; f) = E_Z\left[\log\frac{g(Z)}{f(Z\mid\theta)}\right] = E_Z[\log g(Z)] - E_Z[\log f(Z\mid\theta)] \qquad (1)$$

Here $g(Z_m)$, $m = 1, \ldots, M$, is the true distribution of a
random variable Z, and $f(Z_m\mid\theta)$, $m = 1, \ldots, M$, is the
distribution under the model. $E_Z[\cdot]$ is an expectation
with respect to the true distribution, and $\theta$ is the
parameter in the model. A statistical model with parameters
minimizing this value is closest to the truth.
Since the first term is common among different
models and parameters, we need only maximize the second term, $E_Z[\log f(Z\mid\theta)]$.

Given the data $Z_1, \ldots, Z_n$, the population mean
$E_Z[\log f(Z\mid\theta)]$ is approximated by the sample mean
$\frac{1}{n}\sum_{i=1}^{n}\log f(Z_i\mid\theta)$, and n times this sample mean,

$$\ell(\theta\mid Z_1,\ldots,Z_n) = \sum_{i=1}^{n}\log f(Z_i\mid\theta) = \log\prod_{i=1}^{n} f(Z_i\mid\theta) = \log L(\theta\mid Z_1,\ldots,Z_n) \qquad (2)$$

is a log likelihood function. This clearly shows an
advantage of the maximum likelihood procedure: it compares
statistical models as well as parameters on the
same basis.
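Equations (1) and (2) can be checked numerically on a toy discrete distribution. In the hypothetical example below, the "true" distribution g and the two candidate models f1 and f2 are made-up numbers: the model with the smaller Kullback-Leibler information also has the larger expected log likelihood, and the sample-mean log likelihood from a large simulated sample approximates that expectation.

```python
import math
import random

def kl_divergence(g, f):
    """Kullback-Leibler information I(g; f) for discrete distributions, Eq. (1)."""
    return sum(gm * math.log(gm / fm) for gm, fm in zip(g, f) if gm > 0)

def expected_loglik(g, f):
    """Second term of Eq. (1): E_Z[log f(Z)] under the true distribution g."""
    return sum(gm * math.log(fm) for gm, fm in zip(g, f) if gm > 0)

def mean_loglik(sample, f):
    """Sample-mean approximation (1/n) * sum_i log f(Z_i), i.e. l / n in Eq. (2)."""
    return sum(math.log(f[z]) for z in sample) / len(sample)

g = [0.5, 0.3, 0.2]           # hypothetical "true" distribution over 3 outcomes
f1 = [0.4, 0.4, 0.2]          # candidate model 1
f2 = [1 / 3, 1 / 3, 1 / 3]    # candidate model 2

rng = random.Random(1)
sample = rng.choices(range(3), weights=g, k=100_000)

for f in (f1, f2):
    print(round(kl_divergence(g, f), 4),
          round(expected_loglik(g, f), 4),
          round(mean_loglik(sample, f), 4))
```

In practice g is unknown, so only the sample-mean column can be computed; ranking models by it approximates ranking them by Kullback-Leibler distance.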


Likelihood of a Tree 


Sequence data consist of homologous sites after
alignment. Assuming a model of substitution with
independence among sites, the log likelihood of a
given tree i is:

$$\ell_i(\theta_i\mid X) = \sum_{h=1}^{n}\log f_i(X_h\mid\theta_i) \qquad (3)$$

where $f_i(X_h\mid\theta_i)$ is the likelihood of the hth site given
tree structure i. It includes the parameter $\theta_i$ representing
evolutionary history.

Assuming the evolutionary processes are independent
along the two lineages after divergence, the two
groups separated by a branch are independent given
the states at the two ends of the branch. In Figure 1,
the conditional likelihood of the group $X_{h0}$,
$L_B(X_{h0}\mid k_2)$, is obtained by

$$L_B(X_{h0}\mid k_2) = \sum_{k_1} p_{k_2 k_1}(v)\, L_{A1}(X_{h1}\mid k_1)\, L_{A2}(X_{h2}\mid k_1) \qquad (4)$$

where $p_{k_2 k_1}(v)$ is the probability of evolving from $k_2$
at node B to $k_1$ at node A along a branch of length v.

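Combining the subsets step by step, we finally obtain,
at some node Q, the three conditional likelihoods
$L_{Q1}(X_{h1}\mid k)$, $L_{Q2}(X_{h2}\mid k)$, and $L_{Q3}(X_{h3}\mid k)$ of the subtrees
carrying the subsets $X_{h1}$, $X_{h2}$, and $X_{h3}$ into which the node divides the tree.
The likelihood of the hth site is obtained by

$$f_i(X_h\mid\theta_i) = \sum_{k}\pi_k\, L_{Q1}(X_{h1}\mid k)\, L_{Q2}(X_{h2}\mid k)\, L_{Q3}(X_{h3}\mid k) \qquad (5)$$

where $\pi_k$ is the equilibrium frequency of state k.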
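Equations (3) to (5) are the core of what is usually called Felsenstein's pruning algorithm. The sketch below implements them for DNA under the Jukes-Cantor model with equal base frequencies $\pi_k = 1/4$, an assumed substitution model since the text does not fix one. The tree format (a leaf is a taxon name, an internal node is a list of (child, branch length) pairs), the taxon names, and the alignment are hypothetical.

```python
import math

STATES = "ACGT"
PI = [0.25, 0.25, 0.25, 0.25]   # JC69 equilibrium frequencies (assumed model)

def p_jc(i, j, v):
    """JC69 probability of evolving from state i to state j along branch length v."""
    e = math.exp(-4.0 * v / 3.0)
    return 0.25 + 0.75 * e if i == j else 0.25 - 0.25 * e

def conditional(node, site):
    """Conditional likelihoods L(X | k), k in ACGT, for the subtree below `node`."""
    if isinstance(node, str):                       # leaf: observed base
        return [1.0 if s == site[node] else 0.0 for s in STATES]
    result = [1.0] * 4
    for child, v in node:
        below = conditional(child, site)
        for k in range(4):
            # Eq. (4): sum over the state k1 at the lower end of the branch.
            result[k] *= sum(p_jc(k, k1, v) * below[k1] for k1 in range(4))
    return result

def site_likelihood(root, site):
    """Eq. (5): weight the root's conditional likelihoods by pi_k."""
    L = conditional(root, site)
    return sum(PI[k] * L[k] for k in range(4))

def tree_loglik(root, alignment):
    """Eq. (3): sum of log site likelihoods, assuming independent sites."""
    names = list(alignment)
    n_sites = len(alignment[names[0]])
    return sum(math.log(site_likelihood(root, {t: alignment[t][h] for t in names}))
               for h in range(n_sites))

# Hypothetical unrooted tree: node Q has three subtrees, as in Eq. (5).
tree = [("human", 0.1), ("chimp", 0.1),
        ([("gorilla", 0.05), ("orang", 0.2)], 0.05)]
alignment = {"human": "ACGTAC", "chimp": "ACGTAT",
             "gorilla": "ACGAAC", "orang": "GCGAAC"}
print(tree_loglik(tree, alignment))
```

Evaluating each subtree once per site and multiplying upward keeps the cost linear in the number of nodes, instead of summing over every combination of internal states.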


The tree with the highest maximum likelihood value
is selected as the maximum likelihood tree. Given the
topology, standard errors of the maximum likelihood estimates
are obtained from Fisher’s information matrix:
the inverse of minus the matrix of second
derivatives of the log likelihood approximates the variance-covariance matrix of the estimates.

Figure 1 Partial tree and conditional likelihood at the
hth site. Since the two subsets $X_{h1}$ and $X_{h2}$ composing $X_{h0}$
are conditionally independent given the value at the internal
node A, the conditional likelihood of $X_{h0}$, $L_A(X_{h0}\mid k_1)$,
is the product of the conditional likelihoods $L_{A1}(X_{h1}\mid k_1)$
and $L_{A2}(X_{h2}\mid k_1)$, and the conditional likelihood
$L_B(X_{h0}\mid k_2) = \sum_{k_1} p_{k_2 k_1}(v)\, L_{A1}(X_{h1}\mid k_1)\, L_{A2}(X_{h2}\mid k_1)$.
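For a one-parameter case, the standard error from the observed information can be compared with a known closed form. The sketch below again assumes the JC69 two-sequence model (our illustrative choice). It approximates minus the second derivative of the log likelihood by central finite differences at the maximum likelihood estimate and checks the result against the familiar JC69 variance p(1 - p) / [n(1 - 4p/3)²]. For a tree with many parameters, the same idea applies to the full matrix of second derivatives, whose inverse is used.

```python
import math

def jc_loglik(d, n, x):
    """JC69 log likelihood of distance d (n sites, x differences), up to a constant."""
    e = math.exp(-4.0 * d / 3.0)
    return x * math.log(0.25 - 0.25 * e) + (n - x) * math.log(0.25 + 0.75 * e)

def observed_information(loglik, theta_hat, h=1e-4):
    """Minus the second derivative of loglik at theta_hat, by central differences."""
    second = (loglik(theta_hat + h) - 2.0 * loglik(theta_hat)
              + loglik(theta_hat - h)) / (h * h)
    return -second

n, x = 100, 20
p = x / n
d_hat = -0.75 * math.log(1.0 - 4.0 * p / 3.0)     # closed-form JC69 MLE
info = observed_information(lambda d: jc_loglik(d, n, x), d_hat)
se_numeric = 1.0 / math.sqrt(info)
se_closed = math.sqrt(p * (1.0 - p) / (n * (1.0 - 4.0 * p / 3.0) ** 2))
print(se_numeric, se_closed)   # should agree to several digits
```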


Nucleotide Substitution and Protein 
Evolution 


Sequence evolution, such as nucleotide substitution or
amino acid replacement, can be regarded as a Markov
process. The likelihoods of sites are described in terms
of transition probabilities from ancestral (internal) nodes to
descendant nodes. Statistical models of evolutionary
processes are obtained by formulating the evolutionary
rate matrix R, noting that the Markov transition
matrix is obtained as $P(t) = e^{Rt}$. Besides the branch
lengths, parameters for the transition/transversion
rate ratio, unequal nucleotide composition, and rate
heterogeneity among sites have been incorporated into
models to approximate biological reality. By allowing
for different G+C contents at interior nodes, the
widely believed hypothesis of a hyperthermophilic
ancestor of extant prokaryotes was persuasively
challenged.
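The relation $P(t) = e^{Rt}$ can be evaluated directly. The dependency-free sketch below computes the matrix exponential by scaling and squaring with a truncated Taylor series, an illustrative choice (production code would normally call a library routine such as `scipy.linalg.expm`). It checks the result against the known closed form for the Jukes-Cantor rate matrix scaled to one expected substitution per unit time, $P_{ii}(t) = 1/4 + (3/4)e^{-4t/3}$.

```python
import math

def mat_mul(A, B):
    """Product of two square matrices stored as lists of lists."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def mat_exp(M, terms=30, squarings=10):
    """exp(M) by scaling and squaring: exp(M) = (exp(M / 2**s)) ** (2**s)."""
    n = len(M)
    scale = 2.0 ** squarings
    A = [[x / scale for x in row] for row in M]
    result = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms):
        term = [[x / k for x in row] for row in mat_mul(term, A)]   # A**k / k!
        result = [[r + t for r, t in zip(rr, tr)] for rr, tr in zip(result, term)]
    for _ in range(squarings):
        result = mat_mul(result, result)
    return result

def jc69_rate_matrix():
    """JC69 rate matrix R scaled to one expected substitution per unit time."""
    return [[-1.0 if i == j else 1.0 / 3.0 for j in range(4)] for i in range(4)]

def transition_matrix(R, t):
    """Markov transition matrix P(t) = exp(R t)."""
    return mat_exp([[r * t for r in row] for row in R])

P = transition_matrix(jc69_rate_matrix(), 0.5)
for row in P:
    print([round(x, 6) for x in row])
```

Richer models (unequal base composition, transition/transversion bias) change only R; the same exponential gives their transition probabilities.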

Protein evolution has also been extensively mod- 
eled. The rates between amino acids of similar 
characters are higher than the rates between very dif- 
ferent amino acids. Large protein data bases enable us 
to calculate the relative rate matrix of amino acid 
replacement. Further, the site rate depends on the 
location in the protein structure, particularly solvent 
accessibility. From the transition rate matrices for 
those categories in the data base, it is possible to pre- 
dict secondary structure for proteins whose structures 
are unknown. 

Neutrality of molecular evolution can be tested by
comparing the rate of nonsynonymous substitutions
with that of synonymous substitutions. Codon-based
models, which include a parameter for the ratio of
nonsynonymous to synonymous substitution rates,
have detected adaptive sites in genes with known
important functions.
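Comparing nonsynonymous with synonymous change starts from classifying codon differences. The sketch below encodes the standard genetic code and tallies single-nucleotide codon differences between two aligned coding sequences as synonymous or nonsynonymous. It is only a raw count under stated simplifications: codons differing at more than one position are skipped, changes to stop codons count as nonsynonymous, and an actual rate ratio would also require normalizing by the numbers of synonymous and nonsynonymous sites, which this sketch does not do.

```python
BASES = "TCAG"
# Standard genetic code; codons ordered TTT, TTC, TTA, TTG, TCT, ..., GGG.
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODE = {a + b + c: AMINO[16 * i + 4 * j + k]
        for i, a in enumerate(BASES)
        for j, b in enumerate(BASES)
        for k, c in enumerate(BASES)}

def count_changes(seq1, seq2):
    """Return (synonymous, nonsynonymous, skipped) codon differences.

    Both sequences are uppercase DNA, in frame and of equal length.
    '*' marks stop codons, so a change to or from a stop is nonsynonymous.
    """
    syn = nonsyn = skipped = 0
    for pos in range(0, len(seq1) - len(seq1) % 3, 3):
        c1, c2 = seq1[pos:pos + 3], seq2[pos:pos + 3]
        n_diff = sum(x != y for x, y in zip(c1, c2))
        if n_diff == 0:
            continue
        if n_diff > 1:          # multiple-hit codons need path averaging
            skipped += 1
            continue
        if CODE[c1] == CODE[c2]:
            syn += 1
        else:
            nonsyn += 1
    return syn, nonsyn, skipped

print(count_changes("ATGCTTGAA", "ATGCTAGAT"))   # CTT->CTA (Leu), GAA->GAT (Glu->Asp)
```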


Comparing Likelihoods of Models 


The validity of statistical models is examined and compared
based on the maximum log likelihoods, i.e., the
log likelihood values at the maximum likelihood estimates.
The larger the maximum log likelihood, the
better the statistical model fits. However, if
the models have different numbers of parameters,
a penalty should be paid for additional parameters.
In the classical setting of hypothesis testing, where
the null hypothesis $H_0$ is embedded in the alternative
hypothesis $H_1$, twice the log likelihood ratio, $2(\hat\ell_1 - \hat\ell_0)$,
follows asymptotically a $\chi^2$ distribution with
degrees of freedom equal to the difference between the
numbers of parameters. Many important problems,
such as the existence of a molecular clock, can be tested
within this framework.
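The likelihood ratio test needs nothing more than the two maximum log likelihoods and the difference in parameter counts. The sketch below uses the closed-form upper tail of the χ² distribution for integer degrees of freedom, an implementation convenience that avoids external libraries. The log likelihood values in the usage line are hypothetical, and the df = 3 there assumes a clock test on five sequences, for which the parameter difference is s - 2 = 3.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for a chi-square variable with integer df >= 1."""
    if x <= 0.0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{j < df/2} (x/2)**j / j!
        term = total = 1.0
        for j in range(1, df // 2):
            term *= half / j
            total += term
        return math.exp(-half) * total
    # Odd df: erfc(sqrt(x/2)) + 2*phi(sqrt(x)) * sum_j x**(j - 1/2) / (1*3*...*(2j-1))
    total = math.erfc(math.sqrt(half))
    phi = math.exp(-half) / math.sqrt(2.0 * math.pi)   # normal density at sqrt(x)
    term = math.sqrt(x)
    for j in range(1, (df - 1) // 2 + 1):
        total += 2.0 * phi * term
        term *= x / (2 * j + 1)
    return total

def likelihood_ratio_test(loglik_null, loglik_alt, df):
    """Return 2 * (l1 - l0) and its asymptotic chi-square p-value."""
    stat = 2.0 * (loglik_alt - loglik_null)
    return stat, chi2_sf(stat, df)

# Hypothetical maximum log likelihoods: clock (H0) vs. no clock (H1).
print(likelihood_ratio_test(-1234.5, -1228.1, df=3))
```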

Since branches cannot have negative lengths, careful
treatment is required when testing the significance of
interior branches. The parameter spaces of different topologies
are separate, and the log of the likelihood ratio
between two topologies,

$$\Delta = \sum_{h=1}^{n}\log\frac{f_1(X_h\mid\hat\theta_1)}{f_0(X_h\mid\hat\theta_0)},$$

follows a normal distribution instead of $\chi^2$.
Its variance is evaluated from the sampling variance
among the log likelihoods of sites:

$$V = \frac{n}{n-1}\sum_{h=1}^{n}\left[\log\frac{f_1(X_h\mid\hat\theta_1)}{f_0(X_h\mid\hat\theta_0)} - \frac{1}{n}\sum_{h'=1}^{n}\log\frac{f_1(X_{h'}\mid\hat\theta_1)}{f_0(X_{h'}\mid\hat\theta_0)}\right]^2$$

The normalized statistic $z = \Delta/\sqrt{V}$ with absolute value
larger than 2 can be regarded as significant.
The maximum log likelihoods of more than two topologies
follow a multivariate normal distribution. When a
topology is compared not with a prespecified tree but
with the maximum likelihood tree, the normalized statistic
z should be compared with the distribution of the
maximum of the multivariate normal random variables.
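The statistic z can be computed directly from the per-site log likelihoods of two topologies, each evaluated at its own maximum likelihood estimates. The function below follows the formulas above literally (this comparison is often called the Kishino-Hasegawa test); the function name and the per-site values in the usage lines are hypothetical. As the text notes, comparing |z| with 2 is appropriate only for two prespecified topologies.

```python
import math

def normalized_topology_test(site_ll_1, site_ll_0):
    """Return (Delta, V, z) for two topologies from per-site log likelihoods.

    site_ll_1[h] = log f_1(X_h | theta_1_hat); site_ll_0[h] = log f_0(X_h | theta_0_hat).
    """
    if len(site_ll_1) != len(site_ll_0) or len(site_ll_1) < 2:
        raise ValueError("need two equal-length lists covering at least 2 sites")
    d = [a - b for a, b in zip(site_ll_1, site_ll_0)]
    n = len(d)
    delta = sum(d)                                          # log likelihood ratio
    mean = delta / n
    var = n / (n - 1) * sum((x - mean) ** 2 for x in d)     # variance V of Delta
    z = delta / math.sqrt(var)                              # fails if all d are equal
    return delta, var, z

# Hypothetical per-site log likelihoods under two topologies.
ll_tree1 = [-5.1, -3.2, -7.4, -2.9, -6.0]
ll_tree0 = [-5.3, -3.1, -7.9, -3.0, -6.4]
delta, var, z = normalized_topology_test(ll_tree1, ll_tree0)
print(delta, var, z, abs(z) > 2)
```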


Empirical Bayes Procedures and 
Bayesian Hierarchical Models 


Different genes often have different evolutionary rates.
Even within a single gene, the evolutionary process is heterogeneous
among sites. Rates also vary among lineages.
Some aspects of heterogeneity are well characterized by
a classification such as the first, second, and third positions
of codons. But it is often difficult to classify sites, genes,
or lineages into prespecified categories a priori.

The hierarchical model allows for uncertainty in classification,
and assumes that sites, genes, or lineages are
allocated to categories with some probabilities. More generally,
it considers a distribution for the parameters in
the likelihood function. Site heterogeneity was modeled
by introducing a gamma distribution for variable
rates at sites. Since protein secondary structure
induces correlation between neighboring sites, it was
modeled by Markov chains regarding sites as ‘time.’
Stochastic processes such as Brownian motion and
compound Poisson processes were introduced to
model the fluctuation of evolutionary rate.
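As a small example of integrating a site-rate distribution out of the likelihood, assume that site rates r follow a gamma distribution with mean 1 and shape α, and use the JC69 two-sequence model (both are illustrative assumptions; the function names are ours). Because E[exp(-s r)] = (1 + s/α)^(-α) for this gamma distribution, the expected proportion of differing sites has a closed form, and inverting it gives a gamma-corrected distance that reduces to the ordinary JC69 distance as α grows large.

```python
import math

def jc_distance(p):
    """JC69 distance assuming a single rate for all sites; requires p < 3/4."""
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)

def expected_p_gamma(d, alpha):
    """Expected proportion of differing sites under JC69 when site rates r follow
    a gamma distribution with mean 1 and shape alpha, using
    E[exp(-s*r)] = (1 + s/alpha)**(-alpha) with s = 4d/3."""
    return 0.75 * (1.0 - (1.0 + 4.0 * d / (3.0 * alpha)) ** (-alpha))

def jc_gamma_distance(p, alpha):
    """Distance obtained by inverting expected_p_gamma; requires p < 3/4."""
    return 0.75 * alpha * ((1.0 - 4.0 * p / 3.0) ** (-1.0 / alpha) - 1.0)

p = 0.3
for alpha in (0.5, 2.0, 1e6):
    print(alpha, jc_gamma_distance(p, alpha), jc_distance(p))
```

Small α (strong rate heterogeneity) yields a larger distance for the same observed p, because many changes are concentrated in a few fast sites where they overwrite each other.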
Formally, these distributions can be regarded as
prior distributions of the parameters in Bayesian statistics.
Priors influence the posteriors, the estimates
given the data, unless the data carry enough information
to determine the result without uncertainty. Possible
bias from an inadequate prior can be avoided in two
ways. An empirical Bayes procedure estimates the
hyperparameter (the parameter specifying the prior)
by maximizing the marginal likelihood, the expected
likelihood with respect to the prior. Pure Bayesians,
on the other hand, take account of uncertainty in the
hyperparameters and introduce a distribution of the
hyperparameters, called a hyperprior. Fortunately,
the two procedures give similar results in most cases.
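A toy empirical Bayes calculation, under assumptions made purely for illustration: the number of substitutions at site h is Poisson with mean λr_h, where λ (expected substitutions at rate 1) is treated as known, and the site rate r_h has a gamma prior with mean 1 and shape α, the hyperparameter. Integrating r_h out gives a negative binomial marginal likelihood; α is estimated by maximizing it over a grid, and each site's posterior mean rate is then (α̂ + x_h)/(α̂ + λ). The counts in the usage lines are made up.

```python
import math

def log_marginal(counts, lam, alpha):
    """log P(counts | alpha), assuming independent sites with
    count ~ Poisson(lam * r) and r ~ Gamma(shape=alpha, rate=alpha) (mean 1).
    Integrating r out gives a negative binomial for each site."""
    total = 0.0
    for x in counts:
        total += (math.lgamma(x + alpha) - math.lgamma(alpha) - math.lgamma(x + 1)
                  + alpha * math.log(alpha / (alpha + lam))
                  + x * math.log(lam / (alpha + lam)))
    return total

def empirical_bayes(counts, lam, alpha_grid):
    """Estimate the hyperparameter alpha by maximizing the marginal likelihood,
    then return each site's posterior mean rate (alpha + x) / (alpha + lam)."""
    alpha_hat = max(alpha_grid, key=lambda a: log_marginal(counts, lam, a))
    posterior_means = [(alpha_hat + x) / (alpha_hat + lam) for x in counts]
    return alpha_hat, posterior_means

# Hypothetical substitution counts per site and an assumed known lam.
counts = [0, 1, 0, 5, 2, 0, 9, 1, 0, 3]
grid = [0.1 * k for k in range(1, 101)]
alpha_hat, rates = empirical_bayes(counts, lam=1.5, alpha_grid=grid)
print(alpha_hat, [round(r, 2) for r in rates])
```

A fully Bayesian treatment would instead place a hyperprior on α and average over it rather than plugging in α̂.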


Genome and Post Genome 


With the remarkable development of computers, the 
maximum likelihood procedure became practical in 
the analysis of sequence data of reasonable size. Now 
that dozens of complete genomic sequences are avail- 
able, it is important to unify the estimated trees of 
genes. Apparent inconsistency may be within the 
level of uncertainty in some cases, and may be real in 
other cases, suggesting horizontal gene transfers. Prob- 
abilistic models of gene duplications, inversions, trans- 
locations, and horizontal gene transfers will become 
indispensable to extract genomic information fully and 
to get a total picture of genomic evolution. One of the 
most important merits of the maximum likelihood is 
that, once appropriate models for these processes 
become available, analyses of the different categories 
of the data can be combined on the same basis as the 
conventional analyses of nucleotide substitutions. 

Keeping in mind that many attractive models of
molecular evolution have been developed in the last
ten years alone, we can expect powerful models of
genomic evolution within a few years. The maximum likelihood
procedure will make full use of such probabilistic
models.
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The avian myelocytomatosis virus, MC29, is an onco- 
genic replication-defective retrovirus that encodes an 
oncogenic fusion protein between viral gag sequences 
and sequences derived from the coding region of
c-myc. Specifically, the hybrid gene arose when 5’
coding sequences of the c-myc gene were deleted and
replaced by a 5’ region of the viral gag gene; in
addition, a small number of base substitutions in
c-myc resulted in a few amino acid changes. The
resulting v-Myc phosphoprotein (p110-gag-myc) transforms avian
hematopoietic target cells and fibroblasts in vitro, 
and induces tumors in vivo, most likely as a result of 
its DNA-binding transcription factor activity that is 
conferred by the Myc-related portion of the fusion 
protein. The activity of v-Myc is required for both 
proliferation and long-term survival of transformed 
cells. 


See also: Retroviruses 


McClintock, Barbara 
N Fedoroff 


Copyright © 2001 Academic Press 
doi: 10.1006/rwgn.2001.0804


Barbara McClintock (1902-92) was one of the great 
geneticists of the twentieth century. She is best known 
for her later work which led to the discovery of trans- 
posable elements or transposons, which are mobile 
genetic elements. McClintock was a pioneering cyto- 
geneticist of maize (corn). She identified the maize 
chromosomes, showed that crossing-over was accom- 
panied by the exchange of chromosome pieces, mapped 
genes, and studied a variety of chromosomal behaviors. 

McClintock was born on 16 June 1902, in Hartford, 
Connecticut. Her father, Thomas Henry McClintock, 
was a physician, and her mother, Sara Handy 
McClintock, was a pianist, poet, and painter. Barbara 
was the third of four children and an odd child by her 
own account. She was self-contained from a very early 
age and later loved sports and solitary occupations 
such as reading and thinking. She attended Erasmus 
Hall High School in Brooklyn, where she discovered 
science. She graduated in 1918 and entered Cornell
University, where she was drawn to the fledgling
science of genetics. McClintock undertook graduate 
study in the Cornell Botany Department, earning a 
PhD in 1927. 

At the time, maize was one of the central experi- 
mental organisms used by geneticists and Cornell had 
a particularly strong group of maize geneticists. 
McClintock’s first major scientific contribution as a 
graduate student was to identify the 10 maize chromo- 
somes, laying the groundwork for many subsequent 
discoveries connecting the behavior of chromosomes 
with the genetic characteristics of the organism. These
included the assignment of linkage groups to individual
chromosomes, in which McClintock played a
major role. Then, working with Harriet Creighton,
McClintock showed that genetic crossing-over was
accompanied by exchange of chromosome segments.
Published in 1931, this contribution was quickly and 
widely recognized. Important milestones in sub- 
sequent years were the discovery of sister chromatid 
exchanges, the physical localization of genes on 
chromosomes, identification of ring chromosomes, 
discovery of the nucleolus organizer region, and a 
description of the behavior of broken chromosomes. 

McClintock continued to work at Cornell for a 
number of years, interrupted by visits to other institutions
and a period spent in Germany on a Guggenheim
Fellowship in 1933. In 1936, McClintock went to 
the University of Missouri as an assistant professor. 
In 1941, she left the University of Missouri to work as
a visiting scientist at the Genetics Department of the 
Carnegie Institution of Washington at Cold Spring 
Harbor, New York. She eventually became a staff 
scientist and remained with the Carnegie Institution 
of Washington until and after her retirement in 1967. 

Continuing to investigate the behavior of broken 
chromosomes at Cold Spring Harbor, McClintock 
used them to produce mutations. Some genetic anom- 
alies surfaced in the progeny of plants that began 
development with two copies of a broken chromo- 
some. One was a genetic locus or site at which a 
chromosome broke regularly. Another was the occur- 
rence of many unstable mutations, which revert to 
wild-type repeatedly during development, giving 
plants that are variegated for mutant and normal tis- 
sue. These investigations led directly to McClintock’s 
discovery of transposition. 

Analysis of the breakage site quickly revealed that 
the affected chromosome broke at the same place 
repeatedly, so McClintock named it the Dissociation 
(Ds) locus. She soon learned that breakage at Ds 
requires a second gene, which she called the Activator 
(Ac) locus. Then she identified a single anomalous 
descendant in which the chromosome broke at a dif- 
ferent place. She analyzed the properties of the new 
strain and concluded that Ds could move to a new 
chromosomal location, publishing her conclusion in 
1948. McClintock had discovered transposition. In 
the following few years, she showed that Ac could 
move too and that both Ds and Ac could insert into 
genes to cause unstable mutations. This was the first 
recognition that unstable mutations, which had been 
studied by others for many decades, were caused by 
the insertion and excision of transposons. 

Through the 1950s and 1960s, McClintock con- 
tinued to study the properties of maize transposons. 
She recognized that they fell into interacting groups, 
some members of which could transpose autono- 
mously, while others could not. Thus Ac is the autono- 
mous transposon of the Ac-Ds family, while Ds is the 
name for any nonautonomous member of the same 
family. The second major transposon family that 
McClintock identified and studied is called the Sup- 
pressor-mutator (Spm) family. 

While McClintock is widely recognized for discov- 
ering transposition, she also adduced some of the 
earliest evidence for regulatory interactions between 
genes. Understanding that a single locus like Ac could 
control the pattern of expression of multiple genes with 
Ds insertions, McClintock promoted the idea that 
transposons were ‘controlling elements’ that regulated 
the behavior of genes. Today it is evident that trans- 
poson sequences occasionally become incorporated 
in the regulatory regions of genes to control their 
expression, but that they are not the essential regulatory
components of most genes. Yet McClintock’s
studies on autonomous Spm elements and genes with 
nonautonomous Spm insertions provided the first 
genetic insights into regulatory gene function. 
McClintock was also the first to describe what is now 
called ‘epigenetic’ regulation, which she discovered 
in her studies on the Spm and Ac transposons. 

Although McClintock’s early contributions were 
recognized by her election to the National Academy 
of Sciences in 1944, the importance of her discovery 
of transposition was not immediately appreciated. 
Transposons were not identified in another organism 
until more than a decade after their discovery in maize. 
Their ubiquity became increasingly apparent through
the 1970s and their pervasiveness was understood in 
the 1980s. The Nobel Prize in Physiology or Medicine 
was belatedly awarded to Barbara McClintock in 1983 
for her discovery of transposition almost four decades 
earlier. 


See also: Transposable Elements; Transposons as 
Tools 
Medicago truncatula Gaertner (or M. tribuloides
Desr.), the so-called barrel medic, is a diploid (2n =
2x = 16) legume, part of the genus Medicago (section
Spirocarpos Ser. subsection Pachyspireae (Urb.) 
Heyn), family Leguminosae, subfamily Papilionoi- 
deae, tribe Trifolieae. Flowers yellow. Fruits with at 
least a few simple trichomes; spines are more obviously 
at right angles to the coil edge than in any other species of
the hard-fruited subsection Pachyspireae. Serration 
of the leaf margins characteristically shows teeth of 
alternating size. Omni-Mediterranean origin. Winter- 
growing autogamous annual species with a generation 
time of 3-6 months. 

One of the most common weedy Medicago of Old 
World rural habitats, M. truncatula hybridizes and 
intergrades with M. littoralis Rohde ex Lois. Several
hundred M. truncatula ecotypes have been collected 
and characterized. Various commercially available 
M. truncatula varieties are commonly grown in rota- 
tion with cereal crops in areas of Australia receiving 
between 275 and 400 mm annual average rainfall on a 
variety of soils. M. truncatula is closely related to the 
agriculturally important forage crop species M. sativa
L. (alfalfa or lucerne), a polymorphic Medicago species
whose genetic analysis is complicated by abundant
repetitive DNA, hybridization, polyploidy, and
domestication. M. truncatula develops root nodules 
(indeterminate nodule type) in symbiosis with the 
nitrogen-fixing bacterium Rhizobium meliloti (Rhizobiaceae),
whose genome has been completely sequenced.
M. truncatula was selected as a model plant
for studying the genetic and molecular processes of 
nodule formation. Moreover, M. truncatula provides 
the opportunity to study symbiotic associations with 
arbuscular-mycorrhizal fungi as well as resistance to 
plant pathogens. 

The genome size of M. truncatula is approximately
5 × 10^8 bp, and the current genetic map comprises 348
markers on eight linkage groups and covers 1400 cM
(about 400 kb/cM). A bacterial artificial chromosome
(BAC) library has been constructed. Genome comparison
between M. truncatula and other legumes revealed
a high level of macro- and microsynteny. Map-based
cloning of a number of symbiosis-related loci is in 
progress. Insertion mutagenesis programs (using 
either transposon or T-DNA tagging strategies) have 
been initiated. Efficient Agrobacterium tumefaciens- 
mediated transformation systems have been develop- 
ed for specific M. truncatula ecotypes (Jemalong, 
R108). M. truncatula plants that overexpress the 
early nodulin gene enod40 exhibited accelerated 
development of root nodules and increased mycor- 
rhizal colonization. Expressed sequence tags (ESTs) 
have been obtained from various tissues including 
roots, nodulated roots, and mycorrhizal roots. Chemical
mutagenesis has generated a range of M. truncatula
mutant lines, including mutants unable to
form root nodules. Among them, a number of mutants 
are blocked at early stages of the symbiosis and do not 
display calcium spiking (sharp oscillations of cyto- 
plasmic calcium ion concentration) in the root hairs, 
a response induced by rhizobial lipochitooligosac- 
charide signals (Nod-factors). These mutants are also 
impaired in their ability to interact with mycorrhizal 
fungi. Another M. truncatula mutant is insensitive to 
the plant hormone ethylene and can be hyperinfected 
by R. meliloti. 
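The physical-to-genetic ratio quoted above is a simple genome-wide average. The short calculation below, using an illustrative helper name, divides the genome size by the map length; it gives roughly 357 kb/cM, which the text rounds to about 400 kb/cM. Crossover rates vary along chromosomes, so local ratios can differ considerably from this average.

```python
def kb_per_cm(genome_bp: float, map_length_cm: float) -> float:
    """Genome-wide average physical distance (in kb) per centimorgan.

    This assumes crossovers are spread evenly along the genome, which is
    only a rough approximation.
    """
    return genome_bp / map_length_cm / 1_000


if __name__ == "__main__":
    ratio = kb_per_cm(5e8, 1400)  # genome size and map length from the text
    print(f"{ratio:.0f} kb/cM")  # prints: 357 kb/cM
```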


See also: Nod Factors; Rhizobium; Transfer of 
Genetic Information from Agrobacterium 
tumefaciens to Plants 
Definition


Meiosis is defined as the cellular and nuclear processes 
that reduce the chromosomal content per nucleus 
from two sets to one set. In most organisms, two sets 
of chromosomes (diploid) are reduced to one set 
(haploid) (see Chromosome Pairing, Synapsis). 
When the haploid cell becomes involved in the process 
of fertilization, it is referred to as a ‘gamete.’ If a cell
with one set of chromosomes goes on to proliferate, the
resulting haploid phase is called the ‘gametophytic
generation.’ This occurs in many fungi and ferns and,
for only a few divisions, in flowering plants.
Many variations in the meiotic process have evolved 
that are of particular adaptive value to specific organ- 
isms. The products of meiosis in organisms with three 
or four sets of chromosomes are usually unbalanced 
because of difficulties in the segregation and assort- 
ment of chromosomes. Some of the mechanics of 
meiosis are presented in the articles on Chiasma and
Synaptonemal Complex. 


Genetic Effects of Meiosis 


Because the genetic information contributed to an off- 
spring by the male parent is likely to be somewhat 
different in detail from the set contributed by the
female parent, the offspring will be different from
either. This is evident from inspection of human families.
The reproductive cells of the offspring contain a mix 
of genetic information derived from both parents so 
that the variability is continued from generation to 
generation. The variability comes not only from the 
random assortment of the parental chromosomes, but 
is further increased by recombination within pairs of 
homologous chromosomes. 
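To give a sense of scale for the random assortment of parental chromosomes, the sketch below counts the distinct gametes that independent assortment alone can produce, assuming the human number of 23 chromosome pairs and ignoring recombination. Each pair contributes a two-way choice between its maternal and paternal copy, so there are 2^23 = 8,388,608 combinations. The helper names are illustrative.

```python
import random

HUMAN_CHROMOSOME_PAIRS = 23


def assortment_combinations(n_pairs: int) -> int:
    """Distinct gametes obtainable from independent assortment alone (no crossing-over)."""
    return 2 ** n_pairs


def random_gamete(n_pairs: int, rng: random.Random) -> tuple:
    """For each homologous pair, independently take the maternal ('M') or paternal ('P') copy."""
    return tuple(rng.choice("MP") for _ in range(n_pairs))


if __name__ == "__main__":
    print(assortment_combinations(HUMAN_CHROMOSOME_PAIRS))  # 8388608
    print("".join(random_gamete(HUMAN_CHROMOSOME_PAIRS, random.Random(0))))
```

Recombination within each pair multiplies the number of possible gametes far beyond this figure.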

Recombination is the process whereby DNA, that 
is, genetic information, is exchanged between parental 
chromosomes. This exchange of information can be 
reciprocal or nonreciprocal. The process involves the 
amazing ability of cells that are undergoing prophase 
of meiosis to induce breaks in their DNA and then 
repair the breaks by molecular association with the 
corresponding sequence of the unbroken chromo- 
some. To detect the breaks and to carry out the repair, 
meiotic cells use the molecular mechanisms that have
evolved in nonmeiotic cells, such as bacteria and the 
somatic cells of multicellular organisms, to prevent 
the damaging effects of accidentally induced breaks. 
The lack of proper detection and repair mechanisms 
leads to genetic instability and sensitivity to radiation 
in bacteria and somatic cells. In meiotic cells, it leads 
to defects in synapsis, recombination, and segregation, 
and generally to infertility. 
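The difference between reciprocal and nonreciprocal exchange can be shown with a toy model in which each chromosome is a string of allele labels (uppercase for one parent, lowercase for the other). This is bookkeeping only, not a model of the break-repair machinery; the nonreciprocal case is represented here by gene conversion, in which a stretch of one homolog is copied into the other while the donor is left unchanged. The function names are illustrative.

```python
def crossover(chrom_a: str, chrom_b: str, point: int) -> tuple:
    """Reciprocal exchange: the two homologs swap everything to the right of `point`."""
    return chrom_a[:point] + chrom_b[point:], chrom_b[:point] + chrom_a[point:]


def gene_conversion(recipient: str, donor: str, start: int, end: int) -> str:
    """Nonreciprocal exchange: donor[start:end] replaces the same stretch of the
    recipient; the donor itself is not modified."""
    return recipient[:start] + donor[start:end] + recipient[end:]


if __name__ == "__main__":
    maternal, paternal = "ABCDEFGH", "abcdefgh"
    print(crossover(maternal, paternal, 3))           # ('ABCdefgh', 'abcDEFGH')
    print(gene_conversion(maternal, paternal, 3, 5))  # ABCdeFGH
```

In the reciprocal case both products are recombinant and no allele is lost; in the nonreciprocal case one allele pair is overwritten, so the products no longer carry the parental alleles in equal numbers.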


Benefits of Meiosis 


Because meiosis can introduce genetic variation, it is 
intuitively assumed to be of biological benefit. This 
concept, however, needs qualification. Clearly, if an
organism or a population is genetically fine-tuned to
a given environment, it would be counterproductive
to break up the balanced genetic makeup by meiosis.
Thus, one would expect to see asexual mechanisms, or 
at least reduced genetic variability, evolve in a sexual 
population under stable conditions. Numerous ex- 
amples have been cited of such adaptations. They 
include vegetative reproduction as an alternative to 
sexual reproduction under stable conditions (e.g., 
strawberries), or temporary asexual reproduction dur- 
ing stable conditions followed by sexual reproduction 
under unfavorable conditions (e.g., aphids). Complete 
asexual reproduction derived from sexual reproduc- 
tion has been reported in cases of wide dispersion in 
which male-female contact becomes highly tenuous. 
Reduced genetic variability through permanent trans- 
location heterozygosity exists in a number of plant 
species, but the relationship to environmental factors 
is not obvious. In some insects and spiders, the sex- 
linked chromosome complex may become extensive 
(e.g., some termites), thereby also limiting genetic 
variability. 


Biological Costs of Meiosis 


In genetic terms, sexual reproduction is biologically 
expensive. The reproductive cells must have some 
recognition mechanism that ensures fertilization
between two genetically different cells, usually one 
male and one female. This carries with it the genetic 
cost of specialized genes for the development of fe- 
males and males or their equivalents in single-celled 
organisms. This in turn requires specialized genetic 
programs for mate detection, courtship, mating, and 
parental investment mechanisms. The most frequently
cited cost of meiosis is the 50% reduction in an individual’s
genetic contribution to the next generation, which
might translate into a 50% loss of Darwinian fitness.
In the absence of sexual reproduction, a female
passes all of her genes to each of her offspring,
whereas with sexual reproduction each offspring
carries only half of her genes. The fact that
sexual reproduction is common among many organ- 
isms, however, suggests that the benefits outweigh the 
costs. The full extent of these costs and benefits as well 
as an appreciation of their balance has still not been 
elucidated. 
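The 50% cost can be made concrete with the standard "twofold cost" toy model, an assumption added here rather than part of the text: every female leaves the same number of offspring, asexual females produce only daughters, sexual females produce equal numbers of sons and daughters, and males are never limiting. Under these assumptions an asexual lineage contributes twice as many breeding females per generation, so its share f of the females grows as f' = 2f / (1 + f). The function names are illustrative.

```python
def next_asexual_fraction(f: float) -> float:
    """Fraction of breeding females that are asexual after one generation.

    Toy assumptions: every female leaves the same number of offspring; asexual
    females produce only daughters, sexual females half daughters and half sons;
    males are never in short supply. Each asexual female therefore contributes
    twice as many breeding females as a sexual one.
    """
    return 2 * f / (2 * f + (1 - f))


def trajectory(f0: float, generations: int) -> list:
    """Asexual fraction over successive generations, starting from f0."""
    fs = [f0]
    for _ in range(generations):
        fs.append(next_asexual_fraction(fs[-1]))
    return fs


if __name__ == "__main__":
    for gen, f in enumerate(trajectory(0.01, 10)):
        print(gen, round(f, 3))
```

In this model the odds f / (1 - f) double every generation, so an asexual minority of 1% becomes a majority by the seventh generation, which is why the prevalence of sexual reproduction is taken to suggest substantial compensating benefits.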


Meiotic Mistakes 


It would be an oversimplification to assume that any 
biological process is executed flawlessly. However, 
most mistakes that take place in the cells of the body are
of minor importance and can be dealt with by
repair or replacement. The consequences of meiotic
mistakes, on the other hand, can be disastrous, because
the resulting individual may be affected in its entirety.
In humans, statistics show that 7.5% of all concep- 
tions carry lethal chromosome defects and 0.5% have 
nonlethal chromosomal aberrations. Half of these are 
the result of missegregation at meiosis in the male or 
female parent. Typically, instead of one of a pair of 
chromosomes going to each nucleus, both end up in 
one nucleus and none in the other. If fertilized by a
normal gamete, the first would give rise to an offspring
with three instead of two copies of the chromosome,
while the second would give rise to an offspring carrying
only the single copy donated by the other parent. For
the larger chromosomes, this imbalance is
lethal, but for the sex chromosomes, X and Y, the 
imbalance is tolerated because of the cell’s ability to 
shut down most of the X chromosome when present 
in more than one copy (as in normal females). How- 
ever, there are still more or less severe developmental 
problems arising from this aneuploidy (incorrect 
numbers of chromosomes). In humans, a trisomy 
(three chromosomes) involving the small chromo- 
some 21 can be viable, resulting in the condition 
known as Down syndrome. However, monosomy (a 
single chromosome) is lethal. 
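The consequences of missegregation can be tallied with a simplified bookkeeping model that follows the copy number of a single chromosome through the four products of meiosis, assuming each product is fertilized by a normal gamete carrying one copy. Nondisjunction at the first division (homologs fail to separate) yields two disomic and two nullisomic products; at the second division (sister chromatids fail to separate in one of the two cells) it yields one disomic, one nullisomic, and two normal products. In females only one of the four products becomes the egg, so there the counts describe possible rather than actual zygotes. The names used are illustrative.

```python
from collections import Counter

# Copies of one particular chromosome in each of the four products of meiosis
# (a simplified bookkeeping model that tracks a single chromosome pair).
MEIOTIC_PRODUCTS = {
    "normal": [1, 1, 1, 1],
    "nondisjunction at meiosis I": [2, 2, 0, 0],   # homologs fail to separate
    "nondisjunction at meiosis II": [2, 0, 1, 1],  # sister chromatids fail to separate in one cell
}
LABELS = {1: "monosomic", 2: "disomic", 3: "trisomic"}


def zygote_copy_numbers(event: str) -> Counter:
    """Fertilize each meiotic product with a normal gamete carrying one copy."""
    return Counter(copies + 1 for copies in MEIOTIC_PRODUCTS[event])


if __name__ == "__main__":
    for event in MEIOTIC_PRODUCTS:
        counts = zygote_copy_numbers(event)
        print(event, {LABELS[n]: k for n, k in sorted(counts.items())})
```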

It has been reasoned that organisms with offspring 
numbering in the thousands, such as plants, tend to 
have less stringent control of meiosis. In exceptional 
cases, organisms such as some crustaceans and plants 
have very large numbers of chromosomes (hundreds), 
all of which appear to be repeat copies of one or a few 
chromosomes. In such cases, there does not appear to 
be a careful assortment at either mitosis or meiosis. 

A variety of plant species can tolerate unbalanced
products of meiosis better than animals such as verte- 
brates. The existence of chromosomal variation has 
resulted in the observable evolution of new species 
by natural or artificial selection of favourable types. 
In wheat, an ancestral wheat-like species with chromosomes
A1, A1, A2, A2, etc., hybridized with a closely
related species with chromosomes B1, B1, B2, B2, etc.
The resulting plant is of type A1, B1, A2, B2, etc., and
is infertile because the A chromosomes cannot prop- 
erly match at meiosis with the B chromosomes and no 
balanced gametes can form. However, after doubling 
of the chromosome number per cell, which is not an 
uncommon event in plants, the plant is of type A1, A1, 
A2, A2, B1, B1, B2, B2,...It is fully fertile because at 
meiosis, the A chromosomes pair with A chromo- 
somes and B chromosomes pair with B chromosomes, 
so that each gamete has a complete set of each type. 
Such a plant is called an ‘allotetraploid’ or ‘amphidi- 
ploid.’ It is a new species in the sense that it is genetic- 
ally isolated from both ancestors, because meiosis in 
crosses between the allotetraploid and the ancestors 
produces chromosomally unbalanced gametes. It is 
estimated that about 8000 years ago, a further new 
species arose through hybridization between
emmer wheat, with the AABB genomes, and a wild
relative of wheat with DD chromosomes. After doubling
of the ABD chromosome set, the present-day bread
wheat, Triticum aestivum, with genomes AA BB DD,
resulted.
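The pairing argument for allotetraploids can be expressed as a toy rule, assumed here for illustration: a chromosome pairs only with an identical homolog (A1 with A1), never with its counterpart from the other genome (A1 with B1), and meiosis is balanced only if every chromosome finds exactly one partner. The sketch checks the F1 hybrid, the doubled hybrid, and a backcross of the doubled hybrid to one ancestor; the function names are illustrative.

```python
from collections import Counter


def can_form_bivalents(chromosomes: list) -> bool:
    """Toy pairing rule: a chromosome pairs only with an identical homolog
    (A1 with A1), never with its counterpart from the other genome (A1 with B1),
    and every chromosome needs exactly one partner."""
    return all(count == 2 for count in Counter(chromosomes).values())


def gamete(chromosomes: list) -> list:
    """One copy of each distinct chromosome, assuming bivalents segregate correctly."""
    return sorted(set(chromosomes))


if __name__ == "__main__":
    parent_a = ["A1", "A1", "A2", "A2"]
    parent_b = ["B1", "B1", "B2", "B2"]
    hybrid = gamete(parent_a) + gamete(parent_b)  # A1, A2, B1, B2
    allotetraploid = hybrid * 2                   # after chromosome doubling
    backcross = gamete(allotetraploid) + gamete(parent_a)

    print(can_form_bivalents(hybrid))          # False: no homologs to pair with
    print(can_form_bivalents(allotetraploid))  # True: A pairs with A, B with B
    print(can_form_bivalents(backcross))       # False: B chromosomes are unpaired
```

The last case corresponds to the genetic isolation described above: crossing the allotetraploid back to an ancestor leaves chromosomes without partners.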


Meiosis and Sex Determination 


In humans, the genetic factors that initiate male development
are located on the Y chromosome. Females
have, in addition to two sets of 22 autosomal
chromosomes, two X chromosomes (XX), while
males are of type XY. At meiosis, the female’s two X
chromosomes pair and then segregate, one to each 
daughter nucleus. The situation in the male is more 
complex. At an early evolutionary stage of the differ- 
entiation of the X and Y chromosomes, the two were 
quite similar. Natural selection then favored differen- 
tiation of the two types of sex chromosomes so that 
male and female development would not get mixed up. 
As a consequence, the X and Y chromosomes have 
retained only a very restricted region of similarity, the 
pseudo-autosomal region, which is capable of synap- 
sis and recombination at meiosis. If synapsis and seg- 
regation occur correctly, then one daughter nucleus 
will receive the X chromosome while the other
receives the Y chromosome. 

The combination of the X-bearing female egg 
nucleus with an X-bearing sperm cell produces an 
XX embryo that will enter the female developmental 
pathway. Fertilization between an X- and a Y-bearing 
gamete will result in male development. Mistakes in 
sex chromosome segregation at meiosis in the male or 
female can give rise to individuals of type XO, XXY, 
XYY, or XXXXY. The reason that such chromosome 
imbalances are not always lethal is that normally the
cell has mechanisms that inactivate the X chromosomes
in excess of one, and the Y chromosome
has very few genes other than the sex-determining
region (SRY). However, development is not entirely
normal in these cases. Unfortunately, the SRY of 
humans is close to the pseudoautosomal region so 
that occasionally, by accident, there is a crossover at 
meiosis that transfers the SRY to the X chromosome. 
The result is an XX individual who expresses male
characteristics. By contrast, in mice, the SRY is far 
away from the pseudoautosomal region, so these ac- 
cidents are less frequent. There are many forms of sex 
determination that are entirely different from the XX/ 
XY type, but they are beyond the scope of this article. 


Male versus Female Meiosis 


In general, males produce large numbers of gametes 
while females produce relatively few. The male 
gametes tend to be little more than a nucleus with a 
minimum number of cellular components, while the 
female cells may contain large amounts of resources 
for the early development of the embryo, as is evident 
from the size of the eggs of a wide variety of organ- 
isms. The process of meiosis reflects these different 
requirements of the sexes. For example, in human 
males, meiosis is a continuing process in the testes 
from puberty to advanced age, and millions of
spermatozoa are formed every day. In
females, on the other hand, only a few hundred thou- 
sand cells of the ovaries enter meiotic prophase some- 
time before birth and they stay arrested at that stage 
for up to 50 years, unless recruited for ovulation. The 
arrest of female meiosis is a common phenomenon not only
in animals but also in numerous plant species
in which the flower buds overwinter. The selective
advantage of large numbers of male gametes is not 
certain, but it is often attributed to between-male 
gamete competition in outbreeding species. 

In males, all four products of meiosis usually 
become gametes capable of fertilization. In females, 
on the other hand, only one of the four products will 
function in reproduction. The other three products 
degenerate or may contribute to accessory tissues or, 
in rare instances, may reenter the oocyte and fuse with 
the oocyte nucleus and thereby simulate fertilization, 
a process known as parthenogenesis. For complex 
reasons, this does not lead to viable offspring in mam- 
mals but can produce viable offspring in other verte- 
brates and in invertebrates. 


Mendelism is a magnificent invention for fairly testing genes
in many combinations, like an elegant factorial experimental 
design. Yet it is vulnerable at many points and is in constant 
danger of subversion by cheaters that seem particularly 
adept at finding such points. (J. Crow, 1988) 


Richard Dawkins popularized the ‘selfish gene’ with 
the notion that the gene, as the unit of selection, is 
inherently selfish and that the individual is simply the 
vehicle in which genes propagate themselves. There 
exists a class of genes, however, which take this passive 
selfishness a step further and which are capable of 
their own active self-propagation. That is, they pos- 
sess characteristics which allow them to enhance their 
own transmission relative to the rest of the individual’s
genome. Such genes, which actively interfere with, or
destroy, other genes in the same nucleus have been 
referred to as the ‘ultraselfish genes.’ 

One class of ultraselfish genes are the meiotic drive 
genes, which attracted the attention of geneticists 
because they ‘cheat’ during meiosis. Meiotic drive 
was the term first used by Sandler and Novitski in 
1957 to refer to segregation distortion resulting from 
an event, or events, associated with meiotic divisions 
per se. It has now come to encompass broadly all ex- 
amples of segregation distortion, regardless of mechan- 
ism and including examples that we now know 
to occur postmeiotically. Meiotic drive is generally 
restricted to one sex (usually the male) and is broadly 
defined as an excess recovery of one allelic alternative 
in the functional gametes of a heterozygous parent. 

Because drive systems rarely have phenotypic markers
and can be difficult to study, known instances of
meiotic drive are restricted to organisms that are well
characterized genetically. Nevertheless, they are taxo- 
nomically widespread and the number of examples 
described continues to grow. Because meiotic drive 
genes often actively destroy their homologs to 
increase their own representation in the gene pool, 
this has earned them colorful names such as ‘spore 
killer’ and ‘gamete eliminator.’ The best-described
examples of drive come from Drosophila (such as 
Sex-Ratio (SR), a meiotic drive system on the X 
chromosome of D. pseudoobscura, and Segregation 
Distorter (SD), an autosomal drive system on the sec- 
ond chromosome of D. melanogaster) and the house 
mouse, Mus musculus, where at least two examples 
have been described. 

In every system that has been analyzed, meiotic 
drive involves interactions among several loci of a gene 
complex, encompassing large chromosomal regions. 
The molecular mechanisms and evolutionary con- 
sequences of meiotic drive are still not well understood, 
although such deviations from Mendelism can have 
profound effects from an evolutionary perspective. 
Simple models of meiotic drive generally predict 
rapid fixation of the driven allele, yet all known ex- 
amples are maintained in natural populations as poly- 
morphisms. Genomes may respond to meiotic drive 
genes in a variety of ways, and strong counterbalan- 
cing selection to prevent their fixation may result in 
the evolution of suppressors, enhancers, sterility, and
lethal alleles.


t Haplotypes 


The best studied example of meiotic drive in the house 
mouse is the t haplotype, which biases DNA transmis- 
sion by disrupting spermatogenesis. t haplotypes are 
a selfish form of chromosome 17 that are found in 
natural populations of all subspecies of the house 
mouse. They comprise a large 20 cM (centimorgan)
region, which is approximately the proximal third
of the chromosome. Within this region is a series of
four major nonoverlapping inversions that suppress
recombination across the region in +/t heterozygotes 
so that t haplotypes are inherited as a single genetic 
unit. t haplotypes show segregation bias in male mice 
only. In +/t females, segregation is normal and off- 
spring are produced in the expected Mendelian ratios. 
In contrast, in +/t males, the t haplotype is transmitted 
to over 90% of the offspring. This is known as transmission
ratio distortion (TRD) and is a consequence
of the production of wild-type sperm that are functionally
inactivated by motility defects. Multiple
independent loci are involved in drive. Three to five 
t complex distorter loci (Tcds) have been described. 
These vary in strength and act additively on a single, 
centrally located t complex responder (Tcr) locus to 
produce the high transmission bias in favor of the 
t haplotype. The mechanism by which this occurs is 
still unclear and investigations are ongoing. 

t haplotypes have not become fixed in natural 
populations owing to several strong counterbalancing
forces. All males that inherit two t haplotypes are
unconditionally sterile, due to the inactivation of all
of their sperm. Additionally, most t haplotypes carry
recessive lethal mutations, which result in homozygous
lethality during early embryogenesis. The
overall frequency of t haplotypes in wild populations
is low, around 10-15%, and additional forces
have also been demonstrated to be acting against 
t haplotypes to maintain such a low frequency. These 
include selection against +/t heterozygotes, reduced 
TRD due to multiple mating, and the social and popu- 
lation behavior of mice, which can result in loss of 
t haplotypes through genetic drift. 
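The contrast between simple models that predict fixation and the polymorphisms actually observed can be reproduced with a single-locus toy model, added here for illustration and not taken from the studies described: random mating, discrete generations, Mendelian transmission in females, transmission of t with probability k from +/t males, and relative viability w for t/t zygotes. The values k = 0.9 (echoing the "over 90%" transmission quoted above) and w = 0 or 1 are illustrative choices. Sterility of t/t males is not modeled separately, because it is irrelevant when t/t zygotes do not survive. All function names are hypothetical.

```python
def apply_viability(zygotes, w):
    """Scale t/t survival by w and renormalize (wild-type hom, +/t, t/t)."""
    wt_hom, het, t_hom = zygotes
    t_hom *= w
    total = wt_hom + het + t_hom
    return (wt_hom / total, het / total, t_hom / total)


def next_generation(adults, k, w):
    """One round of random mating with male-only drive.

    Females transmit t Mendelian-fashion (1/2 from +/t); +/t males transmit t
    with probability k. t/t males are treated as fertile, which only matters if w > 0.
    """
    wt_hom, het, t_hom = adults
    t_egg = t_hom + 0.5 * het
    t_sperm = t_hom + k * het
    zygotes = (
        (1 - t_egg) * (1 - t_sperm),
        t_egg * (1 - t_sperm) + (1 - t_egg) * t_sperm,
        t_egg * t_sperm,
    )
    return apply_viability(zygotes, w)


def t_frequency(adults):
    """Frequency of the t allele among adults."""
    _, het, t_hom = adults
    return t_hom + 0.5 * het


def simulate(t0, k, w, generations):
    """Adult t frequency per generation, starting from Hardy-Weinberg zygotes at t0."""
    adults = apply_viability(((1 - t0) ** 2, 2 * t0 * (1 - t0), t0 ** 2), w)
    freqs = [t_frequency(adults)]
    for _ in range(generations):
        adults = next_generation(adults, k, w)
        freqs.append(t_frequency(adults))
    return freqs


if __name__ == "__main__":
    print("no cost:", round(simulate(0.01, 0.9, 1.0, 100)[-1], 4))
    print("recessive lethal:", round(simulate(0.01, 0.9, 0.0, 500)[-1], 4))
```

With w = 1 the t allele spreads to fixation, as the simple models predict. With recessive lethality (w = 0) it settles instead at a stable polymorphism, in this model at a t frequency of one third, which is higher than the frequencies quoted above and consistent with the additional counterbalancing forces the text describes.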


HSR Inverted Duplication — In 


Most well-known instances of meiotic drive are
confined to males; however, an example
of drive has been described in the Eastern European 
subspecies of Mus musculus, in which an aberrant 
form of chromosome 1, known as In, causes segrega- 
tion distortion during oogenesis. Unlike the t haplo- 
type, this is an example of meiotic drive that actually 
does occur during meiosis, as all interactions are 
known to occur during the second meiotic division. 
In contains two large insertions held together in an 
inversion and behaves unusually during oogenesis.
Chromatid segregation in heterozygous (+/In) females
depends on which sperm enters the oocyte
before the second meiotic division, such that drive in
favor of the In chromosome occurs in heterozygous
females mated to a +/+ homozygous
male. However, if the male himself carries an In
chromosome (+/In), then drive is ameliorated and 
the female’s offspring inherit her two chromosomes 
in Mendelian ratios. Genetic analysis has identified a 
two-component system consisting of postulated distorter
and responder loci, where the distorter is on
chromosome 1, distal to the responder, and acts on the 
responder when in trans. The organizational features of 
this system are very similar to other drive systems, such 
as the t haplotype, including a two-component system 
and inversions, and there is considerable parallelism in 
the way meiotic drive affects various steps in the for- 
mation of gametes and zygotes in the two sexes. In is 
also found at low, variable frequencies in natural popu- 
lations, and studies of the population dynamics of this 
chromosome show that selection again acts against 
homozygous carriers. The viability of homozygotes 
of both sexes is reduced to 55 % and the fertility of 
homozygous females is as low as 10%. There are at 
least three meiotic drive levels (ranging from 50% to 
85%) determined by different allelic variants of dis- 
torter, and population structure and small population 
sizes may also contribute to the loss of the chromo- 
some, particularly at lower levels of drive. 


Deviation from Mendelian Inheritance (DMI)


While no other meiotic drive systems are known for 
the mouse at present, several instances of deviations 
from Mendelian inheritance (DMI) have been de- 
scribed. Modest DMI has been described from linkage 
test crosses on chromosomes 2, 4, and 10, but these 
may be the result of sampling fluctuations from small 
numbers of test mice. These findings are often not 
replicated. Strong and replicated DMI (of 70-90%)
has been described favoring Mus spretus-derived
alleles at several X-linked loci in four mouse
interspecific crosses. The mechanism for this deviation,
however, appears most likely to be the lethality of
embryos carrying particular combinations of alleles,
rather than true segregation distortion during
oogenesis in F1 hybrid females.
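The caution about small test crosses can be made concrete with a binomial calculation. The Python sketch below computes the probability that, under strict Mendelian (50:50) transmission, k or more of n offspring inherit one specified allele. The function name and the sample sizes are illustrative assumptions, not values taken from the crosses described above.

```python
from math import comb


def prob_at_least(k, n, p=0.5):
    """Return P(X >= k) for X ~ Binomial(n, p).

    With p = 0.5 this is the chance that k or more of n offspring inherit one
    specified allele when transmission is strictly Mendelian.
    """
    if not 0 <= k <= n:
        raise ValueError("k must lie between 0 and n")
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# Hypothetical sample sizes: the same 70% excess of one allele in a small
# cross and in a ten-times larger cross.
small_cross = prob_at_least(14, 20)    # 14 of 20 offspring (70%)
large_cross = prob_at_least(140, 200)  # 140 of 200 offspring (70%)
```

With 20 offspring the first probability is about 0.058, so roughly one small cross in seventeen would show a 70% excess of a specified allele by chance alone; with 200 offspring the same proportion has a probability below one in a million. This is why replication, rather than a single small cross, is needed before a deviation is attributed to segregation distortion.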
Meiotic product is a general term that refers to any of
the four haploid cells resulting from a meiotic div- 
ision. The specific names and eventual fates of these 
cells differ between organisms with distinct life cycles. 

In gametic meiosis, characteristic of animals and some
protists, the haploid meiotic products are gametes,
formed by meiosis in a diploid individual. These
gametes fuse to produce a zygote. As a variation on
this theme, gametogenesis in females of many animal
species proceeds in such a way that only one gamete is
produced per diploid cell entering meiosis. The other
three products, known as polar bodies, simply remain
as nuclei with a small amount of cytoplasm.

Fungi and some algae undergo zygotic meiosis. In 
this type of life cycle, a diploid zygote formed by 
syngamy of two gametes immediately enters meiosis. 
This results in production of four haploid cells, which 
divide mitotically and eventually produce multicellu- 
lar haploid organisms (or many single-cell organisms). 
These individuals give rise to gametes by differenti- 
ation of their cells. 

In sporic meiosis, seen in plants and some algae, 
a diploid zygote differentiates into a multicellular 
diploid individual (sporophyte). Certain cells of this
individual undergo meiosis to produce spores, which, 
in turn, divide mitotically to give rise to multicellular 
haploid individuals (gametophytes). These gameto- 
phytes eventually generate gametes, which fuse to 
produce zygotes. 


See also: Tetrad Analysis 
The majority of cytogenetic studies of malignant mela-
noma have been performed on human tumors; a few 
studies have also been performed on transplantable 
melanomas in rodents. Cytogenetic studies on the 
human malignant melanoma have revealed that, as in 
most human cancers, melanoma cells display acquired, 
clonal chromosome aberrations. The most consist- 
ently observed numerical changes have been losses of 
chromosomes 10 and 9, and gain of chromosome 7. 
Among structural aberrations, the most common 
have been del(6q) or other rearrangements, including 
i(6)(p10), that lead to loss of 6q. Results from chromo- 
some transfer experiments have provided functional 
evidence for the presence of a tumor suppressor gene 
on chromosome 6, which may be acting early in the 
pathway of tumor formation. All aberrations men- 
tioned above may play an important role in the tumor- 
igenesis and development of malignant melanoma. 
In addition, various abnormalities of chromosome 1, 
often resulting in loss of 1p material, and, with a 
lower frequency, abnormalities of chromosomes 7, 9 
(mostly affecting 9p), 11, and 17 were observed. How- 
ever, the cytogenetic pattern of cutaneous malignant 
melanoma outlined above concerns predominantly 
metastatic tumors, since only approximately 20% 
of all abnormal malignant melanoma karyotypes 
have been obtained from primary tumors. In general, 
the karyotypes of metastatic melanoma are more 
complex, with higher modal chromosomal numbers 
and higher numbers of structural chromosome 
abnormalities. It seems that rearrangements of
chromosome 11 are later events in tumor progression,
and may be an indicator of a less favorable
clinical outcome. An increase in the copy number of
chromosome 7, often accompanied by enhanced expression
of the EGF receptor, has been detected in advanced
melanomas.

At present, little information is available on
chromosomal aberrations in malignant melanomas
of different subtypes and growth patterns.
More chromosomal changes have been found in nodular
melanomas than in superficial spreading melanomas.
In both subtypes, the number of chromosomal
aberrations and the ploidy of cells increase with
tumor stage.

Aneuploidy seems to be a feature of advanced 
stages of malignant melanoma but it does not replace 
other prognostic factors and should be considered 
together with previously known prognostic de- 
terminants. Some authors are of the opinion that 
cytogenetic analysis may provide useful prognostic 
information about patients with metastatic melanoma. 
For example, patients with structural abnormalities of
chromosomes 7 and 11 in the tumor cells had significantly
shorter survival than patients without these abnormalities.

Cytogenetic studies of uveal melanomas revealed 
that the most characteristic chromosome aberrations 
are gains of 8q, often as a result of an i(8q) formation, 
loss of one copy of chromosome 3 (60% of cases), 
and loss or partial deletions of the short arm of 
chromosome 1. The presence of additional copies 
of chromosome 8q and especially monosomy of 
chromosome 3 in the tumor cells correlated with 
reduced survival. 

Transplantable melanomas in mice and hamsters 
are widely used in oncological research; in some 
cases, cytogenetic studies have also been performed. 
Karyotypes of transplantable rodent melanomas are 
usually stable. However, the spontaneous phenotypic 
variations of transplantable Bomirski hamster mela- 
nomas, including a tendency toward partial or com- 
plete loss of ability to synthesize melanin pigment, 
have been associated with karyotypic changes
of the melanoma cells. Cytogenetic studies of the 
B-16 murine melanoma have provided information 
on the role of chromosome changes in the progres- 
sion and phenotypic diversification of this melanotic 
tumor type. In K-1735 murine melanoma, rearrange- 
ments of chromosome 14 are associated with meta- 
static potential of melanoma cells, and structural 
anomalies of chromosome 4 together with alter- 
ations of chromosomes 1, 3, 12, and 15 may be asso- 
ciated with tumorigenic properties of this murine 
melanoma. 


See also: Aneuploid; Cancer Susceptibility; 
Chromosome Aberrations 
The melting temperature (Tm) is the midpoint of the
temperature range over which DNA is denatured. 
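In practice this midpoint is read from a melting curve, which records the fraction of DNA denatured at each temperature (for example, from the rise in absorbance at 260 nm). The Python sketch below takes the midpoint to be the temperature at which half the DNA is denatured and estimates it by linear interpolation between the two readings that bracket 50%. The function name and the sample readings are hypothetical, chosen only for illustration.

```python
def melting_temperature(temps, fraction_denatured):
    """Estimate Tm as the temperature at which 50% of the DNA is denatured.

    temps and fraction_denatured are parallel sequences sorted by temperature;
    fractions run from 0 (fully double-stranded) to 1 (fully single-stranded).
    Linear interpolation is an assumption of this sketch; fitting a smooth
    curve to all the readings is another common approach.
    """
    readings = list(zip(temps, fraction_denatured))
    for (t1, f1), (t2, f2) in zip(readings, readings[1:]):
        # Use the first interval whose fractions bracket 0.5.
        if f1 <= 0.5 <= f2 and f2 > f1:
            return t1 + (0.5 - f1) * (t2 - t1) / (f2 - f1)
    raise ValueError("the curve never crosses 50% denaturation")


# Hypothetical melting curve: temperatures in degrees C, fraction denatured.
tm = melting_temperature([80, 84, 88, 92, 96], [0.0, 0.1, 0.4, 0.9, 1.0])
```

Here the 50% point falls between the readings at 88 °C (0.4) and 92 °C (0.9), giving an estimate of about 88.8 °C.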


See also: DNA Denaturation 
In the words of Theodosius Dobzhansky:


Genetics, an important branch of biological science, has 
grown out of the humble peas planted by Mendel in a 
monastery garden. 


This entire encyclopedia represents an affirmation, 
transcending time and space, of this tribute to the 
genius of Abbot Gregor Johann Mendel, a man 
whose achievement can without any hint of hyperbole 
be described as unique in the annals of science. The 
uniqueness of his achievement resides in the fact that a 
series of experiments with peas, which occupied only a
brief interlude in his life, squeezed into a few years
amid a multitude of other activities, both scientific
and administrative, now inspires and informs
every aspect of the large areas of biology which are 
associated with genetics. 

It has become a truism to say that if a scientific 
discovery had not been made by a certain scientist in 
a certain place, then it would have been made within a 
very short span of time by another scientist in another 
place. Indeed, this pattern of more or less simultaneous
scientific advance in institutes that are widely
separated geographically is now so well established
that it is unusual for a clear-cut winner to emerge even 
a few months ahead of the field with respect to an 
important discovery; as a result, bitter and rancorous 
controversies about priority are all too common. This 
pattern even applies to the virtually simultaneous 
rediscovery of Mendel’s work in 1900 by Correns, de 
Vries, and Tschermak. In sharp contrast, Mendel had 
no rivals for several decades both before his discovery 
and for several decades afterwards, until the rediscov- 
ery took place. 


There are, of course, other examples of ‘prematur- 
ity’ in scientific discovery, prematurity in this context 
being defined by Stent as follows: “A discovery is 
premature if its implications cannot be connected by 
a series of simple logical steps to canonical, or gen- 
erally accepted, knowledge.” There is a good case, 
nevertheless, for arguing that Mendel’s discovery 
transcends these other instances, both in the quality 
of its ‘prematurity’ and in its importance, which has led
to his name passing into everyday language in
words such as Mendelian and Mendelism.

Although very few of Mendel’s experimental notes 
have survived, we know that between 1857 and 1863
he investigated the laws of the origin and development
of variable hybrids in Pisum, in connection with seven
pairs of traits. It is difficult to conceive how Mendel 
could have had the prescience and good fortune to 
have chosen just these traits in just this species, 
whose study enabled him to demonstrate the basic 
laws of heredity and to create clarity and order out 
of the chaos which had long characterized this area of 
biology. This degree of prescience seems to border
on the preternatural; however that may be, it is
a fact that Mendel was not able to repeat the results 
which he obtained with Pisum in subsequent experi- 
ments involving several other plant species. 

Mendel’s insight was so profound that his concepts
of dominance and recessivity remain entirely valid
today. Thus, he denoted the round shape of the ripe
pea seeds as dominating over the angular, wrinkled
shape, which he denoted as recessive because it
temporarily receded from view in the F1 hybrid
generation and reappeared in a ratio of 1:3 (wrinkled
to round) in the F2 generation. Among the round-seeded
plants of the F2 generation, he showed a ratio of 2:1
when, in the F3 generation bred by self-fertilization,
he distinguished between the “meaning of the dominating
trait as a hybrid (i.e., producing F3 plants with round
and wrinkled seeds in the ratio of 3:1) and as a
parental (i.e., producing only F3 plants with round
seeds) trait.” Thus, in his analysis of this
monofactorial experiment, as it came to be called, he
clearly appreciated the difference between the
appearance of the dominating trait, or phenotype, and
its hereditary basis, or genotype. As a trained
physicist, he commanded combinatorial mathematics to
an extent which enabled him to interpret the ratios
obtained in his bi- and trifactorial experiments, and
to extrapolate these results in mathematical terms to
general predictions involving seven pairs of factors.
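The ratios described above follow from enumerating equally likely gametes. The Python sketch below performs that enumeration for a self-fertilized hybrid: it reproduces the 3:1 ratio of round to wrinkled seeds, the 2:1 ratio of hybrid to parental (true-breeding) plants among the round-seeded F2, and the 9:3:3:1 ratio of a bifactorial cross. It assumes complete dominance and independent assortment; the letter symbols (R/r for seed shape, Y/y for a second trait) and all function names are illustrative choices, not Mendel's own notation.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def gametes(genotype):
    """List the gametes a genotype such as "Rr" or "RrYy" can produce.

    Assumes each allele pair segregates independently, so every combination
    of one allele per pair is equally likely.
    """
    pairs = [genotype[i:i + 2] for i in range(0, len(genotype), 2)]
    return ["".join(combo) for combo in product(*pairs)]


def cross(parent1, parent2):
    """Return offspring genotype frequencies as exact fractions."""
    counts = Counter()
    for g1 in gametes(parent1):
        for g2 in gametes(parent2):
            # Sort each pair so "rR" and "Rr" count as one genotype
            # (uppercase letters sort before lowercase ones).
            genotype = "".join("".join(sorted(a + b)) for a, b in zip(g1, g2))
            counts[genotype] += 1
    total = sum(counts.values())
    return {g: Fraction(n, total) for g, n in counts.items()}


def phenotype(genotype):
    """Call each pair dominant if it carries at least one uppercase allele."""
    return tuple(
        "dominant" if any(c.isupper() for c in genotype[i:i + 2]) else "recessive"
        for i in range(0, len(genotype), 2)
    )


def phenotype_freqs(genotype_freqs):
    """Sum the frequencies of genotypes that share a phenotype."""
    totals = Counter()
    for g, f in genotype_freqs.items():
        totals[phenotype(g)] += f
    return dict(totals)


# Monofactorial cross: F1 hybrids (Rr) self-fertilized to give the F2.
f2 = cross("Rr", "Rr")
round_seeded = phenotype_freqs(f2)[("dominant",)]  # 3/4 round, i.e. 3:1 with wrinkled
hybrid_to_parental = f2["Rr"] / f2["RR"]           # 2, the 2:1 among round seeds

# Bifactorial cross: RrYy x RrYy gives the 9:3:3:1 phenotype ratio.
dihybrid = phenotype_freqs(cross("RrYy", "RrYy"))
```

Because each F2 genotype's share is an exact fraction, the 2:1 result is simply (1/2)/(1/4): half of all F2 plants are hybrids and a quarter are true-breeding round, which is the distinction that self-fertilizing the round-seeded F2 plants brings to light in the F3.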

These arithmetical ratios, through which Mendel 
demonstrated the particulate inheritance of traits in 
the pea, seem in retrospect to be extremely simple. 
However, this simplicity is deceptive, being apparent 
only with the benefit of hindsight; no one had had an 
inkling of these truths before Mendel. Nor did anyone
grasp these truths for several decades after he
reported his results in two lectures given on 8 February
1865 and 8 March 1865 to the Natural Science
Society of Brünn (Brno), a prosperous city of moderate
size in Moravia, then a part of the Austro-Hungarian 
Empire, where he was a monk in the Augustinian 
monastery. 

Thus, in so far as his cardinal discovery of particu- 
late inheritance was concerned, Mendel had no pre- 
decessors and, for several decades, no successors. R.A. 
Fisher performed an extensive analysis of Mendel’s 
work with peas and came to the conclusion that this 
was not based solely on an experimental approach, but 
rather represented an exposition of particulate inherit- 
ance which Mendel had already thought out and 
which he had then demonstrated in his capacity as a 
teacher. He had had no precursors to help him in 
formulating this exposition on the basis of his discov- 
ery of the cardinal principles on which the whole 
discipline of genetics is founded. It is of interest to 
note that Mendel himself showed insight into the 
importance and the uniqueness of his discovery, in 
that in the preamble to his paper, based on his lectures 
and published in 1866 in the Proceedings of the Nat- 
ural Science Society of Brünn (Verhandlungen des 
Naturforschenden Vereines (Brünn)), having surveyed
previous work in the field of ‘plant hybridization,’ he 
stated: 


among all the numerous experiments made, not one has been 
carried out to such an extent and in such a way as to make it 
possible to determine the number of different forms under 
which the offspring of hybrids appear, or to arrange these 
forms with certainty according to their separate generations, 
or definitely to ascertain their statistical relations. 


As far as the lack of immediate successors is con- 
cerned, it would be an error to suppose this to have 
been due to the inaccessibility of the published report 
of Mendel’s 1865 lectures. Mendel corresponded with 
the leading scientists in the field, and sent a reprint of 
his publication to the most prominent among them, 
Nägeli, as well as describing his work to him in detail 
in the course of an extensive correspondence over a 
number of years. In fact, Mendel ordered 40 reprints 
of his publication, and these reached colleagues all 
over Europe; some of these reprints have come to 
light for the first time relatively recently, often 
uncut. In addition, the journal itself, Verhandlungen 
des Naturforschenden Vereines (Brünn), was not an
obscure one, and it is known to have reached the 
libraries of the Royal Society and the Linnean Society 
in London, among many other academies, universities,
and institutes throughout the world of learning.
Despite this, Galton, who, during the years 1872-75,
made the closest approach to Mendelian theory that 
was achieved in the nineteenth century, did not know 
of Mendel’s work. 

In passing, it is of interest to note that Mendel 
visited the International Exhibition in London in 1862, at a
time when he was coming to the end of several years of 
experimentation with Pisum. Although there is no 
evidence that Mendel’s visit to London represented 
anything more than an excursion as a tourist, in the 
company of a large group of fellow Moravians, there 
has been unfounded speculation that Mendel might 
have paid a visit to Darwin. It is astonishing, in general, 
how little the details, both personal and scientific, of 
the life of this modest and retiring priest are docu- 
mented. The main reason is that his personal and 
scientific papers were unaccountably incinerated at 
his monastery soon after his death. 

Had such a meeting occurred during Mendel’s visit 
to England, it might also have included Darwin’s cou- 
sin, Galton. Discussions between these three men 
might well have led to the immediate recognition of 
the importance of Mendel’s work, with momentous 
consequences for the development of the science of 
genetics. Not only did the meeting not take place, but 
it has been claimed, in addition, that an uncut reprint 
of Mendel’s publication was found in Darwin’s library 
at the time of his death. Thus, despite the fact that his 
paper was published in a widely distributed journal 
in 1866, it was not until a third of a century later that 
Mendel’s work was rediscovered. There is no evidence 
that Mendel felt resentful or bitter with respect to the 
failure of his contemporaries to appreciate the import- 
ance of his work. As already indicated, he himself 
appreciated its importance and, in talking with a col- 
league, Niessl, he uttered the prophetic words: “My 
time will come.” 

And his time has indeed come. Throughout the 
twentieth century, his work on Pisum has been sub- 
jected to endless analyses, questioning the reasons 
why it was undertaken, the way in which it was car- 
ried out, and the accuracy of the reporting of the 
results. Perhaps the most appropriate comment on 
these analyses is that of Sturtevant who concluded 
that the best answer to all these problems, or ques- 
tions, is that Mendel was right. 

There is a great deal more to interest us in Mendel’s 
scientific life, especially the failure which met his 
attempts to repeat his results in Pisum during exten- 
sive experiments with several other species of plants. 
He also worked with great skill in the fields of meteor- 
ology and of apiculture, including attempts, ultim- 
ately unsuccessful, to acclimatize members of a 
species of bee indigenous to Brazil, Trigona lineata, 
which had been carried to Brünn by accident in the
hollow of a tree-trunk included in a consignment of
wood imported from that country.

It is a fascinating question to consider how it 
came about that a man who did not form part of the 
scientific establishment of his time was able to make 
a contribution to science of such transcendental 
importance. While he was born in 1822 in humble 
circumstances as the only son of a peasant farmer, 
of mixed Czech and German origin, in Moravian 
Silesia, a province of the Austro-Hungarian Empire, 
Mendel was very far from being a self-taught prodigy, 
as was, for example, Srinivasa Ramanujan, the Indian 
mathematician of similarly humble origin. Thus, 
he showed great talent at school, and his parents,
who had enormous respect for learning, endured 
great financial privations to support him during his 
education. 

From an early age, Mendel had to augment the 
necessarily meager allowance provided by his parents 
through private tutoring. He wrote of himself in 1850 
in the third person in his curriculum vitae: 


His sorrowful youth taught him early the serious aspects of 
life, and it also taught him to work....It was impossible for 
him to endure such exertion further. Therefore, having fin- 
ished his philosophical studies, he felt himself compelled to 
enter a station in life that would free him from the bitter 
struggle for existence. His circumstances decided his voca- 
tional choice. He requested, and received in the year 1843, 
admission to the Augustinian monastery of St Thomas in 
Brno. 


Mendel then led a charmed life for a quarter of a 
century. He was able to study natural sciences, espe- 
cially physics, at the University of Vienna, and, on his 
return to the monastery, as long as he fulfilled his 
duties as a priest and as a secondary-school teacher, 
he was free to devote himself to private study, sur- 
rounded by a stimulating group of gifted colleagues, 
and able to play a full part in the active intellectual life 
of a thriving provincial city of the Austro-Hungarian 
Empire. 

A major change occurred in Mendel’s circum- 
stances when he was elected to be Abbot of the Mon- 
astery of St Thomas in Brno in 1868, a post which he 
was to fill for 16 years until his death in 1884. He had 
to bid farewell to his beloved teaching and he soon had 
to give up his botanical researches. Even though his 
way of life necessarily became more worldly as he was 
loaded with honors and as important functions were 
thrust upon him, his essential humility, compassion, 
and kindliness remained unaltered. Much has been 
made of his longstanding dispute with the authorities 
over the taxation of the monastery. Mendel remained 
steadfast in his refusal to agree to payment, and he
stubbornly declined to consider the compromise
whereby this matter was resolved soon after his 
death, because he firmly believed that he was in the 
right. However, he did not allow himself to become 
embittered by the dispute to the extent of aban- 
doning his many intellectual interests. He continued 
until his last days to pursue his scientific enquiries 
vigorously, mainly in the fields of apiculture and 
meteorology, and, as an extremely skillful practical 
gardener, he remained active in breeding varieties of 
fruits, vegetables, and flowers. He also played chess, 
especially with his nephews who visited him fre- 
quently, and he took great delight in composing 
chess problems. 

This gentle and unpretentious man, who always
remained faithful to his family and to his peasant 
origins, became, as ‘the first geneticist,’ one of the 
tiny band of those responsible for substantial advances 
along humanity’s difficult road towards knowledge of 
itself and of its environment. While this is not a road 
on which the lengths of advances can be exactly meas- 
ured, we can say that the advance which we owe to 
Mendel is among the greatest which has ever been 
achieved by a single individual. The century which 
began with the rediscovery of Mendel’s work ended 
in an unprecedented explosion of science and tech- 
nology. It is impossible to think of the many compon- 
ents of this explosion which are related to genetics 
and which are summarized in this encyclopedia with- 
out thinking also of this unassuming monk tending his 
peas in the peaceful garden of his monastery. Both the 
writers and the readers of this encyclopedia owe their 
profession to him. In return, we should strive to
continue our work in directions of which Mendel,
as its fons et origo (source and origin), would
have approved.

In this connection, Mendel wrote some verses in his 
youth in memory of Gutenberg; these sentiments can 
now be fittingly applied to himself. 


May the might of destiny grant me 

The supreme ecstasy of earthly joy, 

The highest goal of earthly ecstasy, 

That of seeing, when I rise from the tomb, 
My art thriving peacefully 

Among those who are to come after me. 


To go far back in time to the sixth century BC, to the 
fragments which survive of the writings of Xeno- 
phanes on the limitations of human knowledge: 


The gods did not reveal all things to mortals in the 
beginning; but in long searching man finds that which is 
better. 




Mendel’s contribution, even though it occupied only a 
few brief years of his life, is making this search shorter
than it would otherwise have been. All who consider
themselves to be geneticists would do well to
study the life and work of the founder of their science, 
and thus to gain an incomparable insight into the 
manner of its founding. 


The kind permission of the British Medical Journal 
Publishing Group to adapt a book review entitled 
Gregor Mendel: the First Geneticist and published in 
the Journal of Medical Genetics (1997) 34:878-879 for 
the preparation of this contribution is gratefully
acknowledged.
DNA (deoxyribonucleic acid) is the living language of
life, a molecule that provides information, in the form 
of its building block sequence, that tells cells which 
proteins to produce. Identifying the genes that under- 
lie specific inherited disorders leads to new medical 
technologies, from highly accurate diagnostic tests 
that literally probe a person’s DNA to replacement 
genes. However, understanding how a particular med- 
ical condition is transmitted requires not sophisticated 
biochemical tests, but a familiarity with the basic laws 
of inheritance that describe the patterns in which genes 
pass from parents to offspring. In fact, the reports of 
new gene sequences that crowd genetics journals 
today, with their strings of A, C, T, and G, nearly 
always begin where studies of inheritance have always 
begun — with observing the appearance of traits over 
generations. These principles were discovered more 
than 100 years ago, lay dormant for a generation, and 
today, even in the age of genome sequencing, remain at
the core of the science of genetics. 


Gregor Mendel: The Father of Genetics 


The basic principles of inheritance were worked out in 
garden peas by Gregor Mendel (1822-84), an August- 
inian monk living in what is now the Czech Republic. 
During his 7 years of experiments in the monastery 
garden and greenhouse, Mendel, who was not a 
trained scientist, instinctively followed the classic 
steps of scientific investigation. Drawing on course- 
work in mathematics, he methodically tested how 
certain traits seemingly vanish between generations, 
only to reappear. He repeated experiments to rule out 
the possibility of chance causing the results, and car- 
ried out reciprocal crosses so that in one cross a male 
contributed a trait and in another, the female. Al- 
together, he looked at a variety of traits in an estimated 
24 034 plants, even breeding some plants for up to 
seven generations (2 years) to prove that a trait did 
not change with time. 

Unlike investigators and philosophers who had 
pondered heredity before him, Mendel added a quan- 
titative perspective to his experiments. He sought 
similarities and trends in the data, then proposed 
physical explanations for them. But when Mendel 
presented his famous paper “Experiments in plant
hybridization” in 1865 (it was published in 1866), the world was simply not
ready for the mathematical precision and clarity of 
his work. Ironically, his ideas on the inheritance of 
discrete units that could account for the variability 
seen within a species would have explained a major 
omission in the work of his contemporary, Charles 
Darwin (1809-82). Darwin wrote in On the Origin of 
Species in 1859 that “the laws governing inheritance
are for the most part unknown.” 

Mendel’s paper — an astonishingly clear treatise — 
went unnoticed until three botanists ‘rediscovered’ 
it at the turn of the century. James A. Peters in 1961 
wrote: “It is the original classic paper on the theory of the
gene, and the cornerstone of the science of genetics.” 


Setting the Scene: What Mendel Did and Did Not Know


If one understands how chromosomes are equally ap- 
portioned into forming gametes (eggs and sperm), 
then Mendel’s two laws that explain trait transmission 
seem obvious. But when Mendel carried out his 
experiments little was known beyond the cell theory 
expounded by the German botanist Matthias Schleiden 
and the German zoologist Theodor Schwann in 1838 
and 1839, respectively, and the German pathologist 
Rudolf Virchow’s statement 15 years later that
cells come from preexisting cells. It was not until
1882 that yet another German, Walther Flemming, 
described the maintenance of chromosome number 
as a cell divides. It would be another half a century 
before the ‘golden age of cytology,’ when chromo- 
some structure and function would become more 
intensely investigated. 

Interest in inheritance in Mendel’s time centered 
around plant and animal breeding. Horticulturists 
sought new varieties of ornamental plants, after the 
explorers of the sixteenth and seventeenth centuries 
brought many new species to Europe. The late 1700s 
saw botanical gardens and parks proliferate across the 
continent, as interest in plant variants rose. In the late 
1700s and early 1800s, ‘agricultural science’ courses at 
universities considered breeding an outgrowth of 
natural science. As the textile industry flourished in 
Mendel’s adopted home of Brno (Brünn), the capital of
the province of Moravia, breeding sheep for their
wool became a high priority. More art than science,
agricultural experiments sought new varieties or new
ways to better perpetuate existing favorites. The pursuit
of valuable traits at this time was qualitative, in contrast
to Mendel’s statistical analyses.

Mendel was influenced by two researchers in 
particular, J. G. Kölreuter and Andrew Knight, who
pioneered plant breeding by crossing pure varieties
to obtain hybrids. Kölreuter (1733-1806) studied
hybridization in 54 species at the University of
Tübingen, publishing three reports from 1761 to
1766. He controlled crosses by placing pollen
from one plant onto the female parts of another
plant. Kölreuter noted that when he crossed hybrids
to each other, the traits present in the original parental
plants reappeared in the third generation. Although
Kölreuter was the first to hybridize plants systematically,
he did not attempt to explain how these traits
arose. In fact, he supported epigenesis, the idea that
the new organism does not inherit discrete units or 
traits, but forms from a homogeneous mix that spe- 
cializes into distinctive characteristics as development 
proceeds. 

Andrew Knight (1759-1838) recommended artifi- 
cial pollination of fruit trees to increase the prevalence 
of desired varieties. When he became more interested 
in trait transmission than in the particulars of fruit- 
raising, he switched to peas, which offered many traits, 
a short generation time, and a flower form that 
allowed control over breeding. In experiments begun 
in 1787, Knight saw what Kölreuter had seen and what
Mendel would notice much later: the creation of hybrids
in a second generation, and the reappearance of the
parental traits in the third generation. Others would
repeat this observation in melons. Still, no one
sought the mechanism underlying the uniformity of
the hybrids and the reappearance of traits when the
hybrids were crossed.

English biologist William Bateson (1861-1926) 
coined the term ‘genetics’ to denote the science of 
heredity in 1906, but the term had actually been used 
earlier. In 1819, Count E. Festetics, a prominent sheep 
breeder from Hungary, published ‘genetic laws’ which 
included the observations that progeny inherit traits 
from their parents, and that traits of grandparents can 
reappear in the offspring of their offspring. 


Mendel’s Early Life 


Gregor Mendel was born on 22 July 1822 in the tiny 
village of Hynčice, to Anton, a peasant farmer, and
Rosine, who was the daughter of a gardener. His given 
birth name was Johann Mendel. Young Mendel 
learned from an early age to care for fruit trees, both 
because of his mother’s background, and because the 
family needed the fruit to eat. He excelled in school, 
and in the third grade was sent away to a ‘Gymnasium’ 
for gifted students. He received little financial help 
from his ailing father, and supported himself by tutor- 
ing for the 6 years he spent there. By age 16, he was 
completely on his own financially. 

After the Gymnasium, Mendel enrolled in a
‘philosophical study’ (a 2-year preparatory program
before college), but it took him an extra year to
complete because he had to return home to care for 
his father, and his own health was not good. He grew 
intensely interested in physics and mathematics, but 
did not, at that point, continue on to college. His 
parents encouraged him to enter the priesthood, and 
in 1843, at the age of 21, Mendel entered the August- 
inian monastery of St Thomas in Brno, where he took 
the name Gregor. It was an unusual monastery in that 
the members taught in public schools and maintained 
plant and mineral collections, encouraging active in- 
vestigation of nature. From 1843 until 1848, Mendel 
attended lectures in agricultural science, in which he
learned how to use artificial pollination to produce 
higher-yielding varieties of plants. After an unsuccess- 
ful stint working in a hospital and a short period of 
hospitalization for what some sources report as a ner- 
vous breakdown, Mendel received an assignment 
much more to his liking, teaching Latin, Greek, and 
mathematics in the seventh grade. 

Mendel secured the teaching post because a revolu- 
tion had led to an increased interest in education. But 
he had no formal training as a teacher, and had to take 
an exam for certification. A curriculum vitae which he 
attached to his application has supplied much of what 
we know of Mendel’s early years. Mendel wanted to 
teach natural history. But he suffered from test anxiety, 
and failed because he had had no experience taking 
examinations and he had not prepared sufficiently. A
zoologist who graded one essay was especially harsh, 
criticizing Mendel’s ideas on evolution and speciation 
that would turn out, in light of Darwin’s contributions 
8 years later, to have been quite brilliant. Mendel was 
told to retake the examination a year later. 

He never retook the examination; instead, luck
intervened. In 1851, Mendel substituted for a sick
teacher at the Brno Technical School, and made such 
a good impression that he was sent to the University of 
Vienna to finally complete his education. He was 29. 
At the university, Mendel supplemented his know- 
ledge of language and philosophy with courses in 
chemistry, botany, and zoology, becoming very
interested in plant hybridization. A course in
‘combinatorial analysis’ would prove particularly valuable later
on as Mendel devised and carried out his breeding 
experiments with peas. Also at this time, scientists, 
both amateur and professional, were turning more 
toward experimentation than observation. 

Three years later, Mendel switched to a new type of 
institution in Brno for the children of factory workers 
called a ‘Realschule,’ where he taught his beloved
natural history and physics. It was here that Mendel
began to formulate precisely what was missing from
the experiments of Kölreuter, Knight, and others and,
more importantly, to plan how he would reveal the 
mechanisms behind trait transmission through a 
hybrid generation. He recognized the compelling 
need for a statistical analysis of the problem. It was a 
new way to look at an old question. 


Mendel’s Paper 


Mendel read his famous paper describing experiments 
conducted from 1857 to 1863 at two meetings of the 
Brno Natural History Society on 8 February and 8 
March 1865. The paper was published in the proceed- 
ings of that organization the following year, and 
Bateson translated it into English in 1901 (Bateson, 
1901). The paper has been reprinted in many collections 
of historical papers of scientific importance, and can 
be read online at http://ftp.netspace.org/MendelWeb/mendel.html.
A second publication presented experiments
with the hawkweed Hieracium, but the results
were not clear because of a tendency of this plant to 
die in the embryonic stage. 


Choosing Traits to Follow 

Mendel’s paper is very logically organized into 11 
sections. It begins by questioning the nature of hybrid- 
ization, based on observations on ornamental plants. 
Why and how do some parental traits reappear in the 
third generation, and why do some crosses produce 
the same proportion of hybrids time after time? In the 
next section, Mendel extols the virtues of the garden
pea as an experimental organism, citing much the same 
reasons as Andrew Knight had years earlier. The third 
section then lists the ‘differentiating characters,’ or 
traits, that Mendel considered as the subjects of his 
study. He whittled down several dozen possible traits 
to 15 (Table 1), then selected seven to pursue because 
they appeared in two distinct forms, rather than the 
‘more or less’ nature of the others (Table 2). For each 
cross of one type with the second type for a given trait 
he conducted 23 to 60 artificial fertilizations, and var- 
ied whether the female or the male transmitted each 
variant. Then, he selected the ‘most vigorous’ hybrids 
for further study. Mendel used the tools of the
backyard gardener, working with plants “maintained in
their natural upright position by means of sticks,
twigs, and taut strings.” Certain experiments were
replicated in a greenhouse to eliminate possible
disturbances by insects.


Table 1 Traits in the garden pea Pisum sativum considered by Mendel

Stem length*              Unripe pod color*
Stem color                Pod form*
Leaf size                 Pod size
Leaf form                 Seed form*
Flower position*          Seed size
Flower color              Seed coat color*
Flower size               Seed color*
Length of flower stalk

* Those selected for experiments

In his first experiments, Mendel crossed plants 
bearing the two forms of each trait, and observed the 
hybrid progeny. He thus established the concepts of 
dominance and recessiveness. A dominant trait is the 
one that appears in the hybrid, and the recessive trait is 
the one that seemingly vanishes. Mendel’s own words 
describe his conclusions best: 


In the case of each of the 7 crosses the hybrid character 
resembles that of one of the parental forms so closely that 
the other either escapes observation completely or cannot 
be detected with certainty. ... The expression “recessive” has 
been chosen because the characters thereby designated with- 
draw or entirely disappear in the hybrids, but nevertheless 
reappear unchanged in their progeny. 


The First Generation from the Hybrids, and Beyond

The fifth section of Mendel’s paper shows, repeatedly, 
that the dominant and recessive forms of each trait ap- 
pear in a 3:1 ratio in the progeny of hybrids crossed to 
each other. The numbers speak for themselves in 
Table 3. Mendel showed the classic 3:1 phenotypic 
ratio of a monohybrid cross (one trait present in two 
forms, or alleles), although the terms ‘phenotype’ (an 
individual’s appearance) and ‘genotype’ (the gene
variants present) were not yet in use. This phenomenon
would become known as Mendel’s first law, or the law
of segregation, years later.


Table 2 Dominant and recessive traits used in Mendel’s experiments

Trait              Dominant expression        Recessive expression
Seed form          Round (R)                  Wrinkled (r)
Seed color         Yellow (I)                 Green (i)
Seed coat color    Gray or gray-brown (A)     White (a)
Pod form           Inflated (V)               Constricted (v)
Unripe pod color   Green (Gp)                 Yellow (gp)
Flower position    Axial (along stem) (Fa)    Terminal (on top) (fa)
Stem length        Long (6-7 feet) (Le)       Short (3/4 to 1 1/2 feet) (le)


Table 3 The ‘first generation from the hybrids’ experiments reveal a 3:1 dominant to recessive phenotypic ratio

Experiment         Total   Dominant   Recessive   Ratio
Seed form          7324    5474       1850        2.96:1
Seed color         8023    6022       2001        3.01:1
Seed coat color     929     705        224        3.15:1
Pod form           1181     882        299        2.95:1
Unripe pod color    580     428        152        2.82:1
Flower position     858     651        207        3.14:1
Stem length        1064     787        277        2.84:1
Average                                           2.98:1


The gene segregation that
Mendel chronicled is actually the result of the process 
of meiosis, the type of cell division that gives rise to 
gametes (Figure 1). That is, when a sperm or egg 
forms, the chromosome pairs (homologous pairs), 
whose DNA has been replicated, separate. Likewise,
the paired alleles that those chromosomes carry
separate and are distributed into different gametes.
The gene combinations that will enter gametes, and
eventually be expressed in offspring, are set at the
metaphase of the first meiotic division, when the
homologous pairs align down the center of the cell.

Figure 1 Mendel’s observations on the inheritance of
a single trait that had two forms (alleles) became known
as Mendel’s first law, or the law of segregation. The
details of meiosis, worked out after Mendel had published
his paper, provided the physical basis for the segregation of
alleles. Chromosome pairs (homologous chromosomes)
replicate their DNA and then separate at the first meiotic
division. At the second division, each replicated chromosome
splits into its two copies, yielding four gametes. This
illustration follows only one chromosome pair. In reality,
all the pairs replicate, split, and are apportioned into gametes,
generating astounding genetic variability through new
combinations of traits. (Reproduced with permission from
Lewis R (1997) Life, 3rd edn. New York: McGraw-Hill.)

Mendel followed crosses beyond the third generation,
determining that the dominant-appearing
individuals among the progeny of the hybrids had
‘double signification,’ meaning that they were of two 
types. He wrote, “ . . . of those forms which possess the 
dominant character in the first generation, two-thirds 
have the hybrid character, while one-third remains 
constant with the dominant character.” One type 
bred true, always yielding the dominant phenotype 
in further crosses. The second type, when crossed to 
hybrids, produced both the dominant and recessive 
phenotypes. The plants that did not breed true out- 
numbered the other plants two to one. 

Today we call the dominant-appearing plants that 
are ‘constant’ homozygous dominant. They have two 
copies of the dominant allele. The hybrids, called 
heterozygotes, have one dominant and one recessive 
allele. Individuals expressing the recessive trait consti- 
tute the homozygous recessive class, and they too 
breed true, that is, when crossed among themselves, 
they yield only homozygous recessive individuals. A 
monohybrid cross results in a phenotypic ratio of 3:1 
(dominant to recessive), and a genotypic ratio of 1:2:1 
(homozygous dominant to heterozygous to
homozygous recessive).
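Both ratios can be reproduced by enumerating the four equally likely pairings of alleles from two hybrid parents. The sketch below is a minimal illustration, not Mendel's own procedure; it assumes complete dominance and uses the R/r symbols for seed form, and the helper names cross and phenotypes are chosen here for illustration.

```python
from collections import Counter
from itertools import product


def cross(parent1: str, parent2: str) -> Counter:
    """Count offspring genotypes for a one-gene cross such as "Rr" x "Rr".

    Each parent passes on either of its two alleles with equal chance, so
    each (allele, allele) pairing is one cell of a 2 x 2 Punnett square.
    """
    offspring = Counter()
    for a, b in product(parent1, parent2):
        # sorted() puts the uppercase (dominant) allele first, so "rR" -> "Rr".
        offspring["".join(sorted(a + b))] += 1
    return offspring


def phenotypes(genotypes: Counter) -> Counter:
    """Collapse genotypes into phenotypes, assuming complete dominance."""
    result = Counter()
    for genotype, n in genotypes.items():
        form = "dominant" if any(c.isupper() for c in genotype) else "recessive"
        result[form] += n
    return result


f2 = cross("Rr", "Rr")
print(dict(f2))              # {'RR': 1, 'Rr': 2, 'rr': 1}  -> 1:2:1 genotypes
print(dict(phenotypes(f2)))  # {'dominant': 3, 'recessive': 1}  -> 3:1 phenotypes
```

The same function shows why the recessive class breeds true: cross("rr", "rr") yields only rr.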

Mendel carried out crosses for four to six genera- 
tions for each of the seven traits, each time self-crossing 
the individuals that ‘bred true’ (the homozygous dom- 
inants and homozygous recessives) as well as self- 
crossing the hybrids. When he did this repeatedly, 
the proportion of hybrids decreased by 50% at each 
generation. By the tenth generation, only two hybrids 
would remain for every 1023 individuals of each 
homozygous class. 
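The halving of the hybrid class can be checked with exact fractions. The sketch below assumes that every plant leaves the same number of offspring, starts from pure hybrids, and counts the first generation from the hybrids as n = 1. It compares the step-by-step result with the closed form (2^n - 1) : 2 : (2^n - 1), which gives the 1023 : 2 : 1023 figure quoted above at n = 10. Function names are illustrative.

```python
from fractions import Fraction


def selfing_proportions(generations: int) -> tuple:
    """Return (AA, Aa, aa) proportions after `generations` rounds of selfing.

    Starts from pure hybrids (Aa). Homozygotes breed true; each Aa plant's
    offspring are 1/4 AA, 1/2 Aa, 1/4 aa. Equal fertility is assumed.
    """
    dom, het, rec = Fraction(0), Fraction(1), Fraction(0)
    for _ in range(generations):
        # Simultaneous assignment: every right-hand side uses last generation's values.
        dom, het, rec = dom + het / 4, het / 2, rec + het / 4
    return dom, het, rec


def closed_form(n: int) -> tuple:
    """Ratio AA : Aa : aa in generation n, scaled so the hybrid class is 2."""
    return 2**n - 1, 2, 2**n - 1


for n in (1, 2, 5, 10):
    props = selfing_proportions(n)
    scale = 2 / props[1]  # rescale so that the hybrid class equals 2
    print(n, [int(p * scale) for p in props], closed_form(n))
```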


Tracking More Than One Trait 

Next, Mendel set up crosses and followed more than
one trait. Again, he began with a general observation:
when he crossed individuals that were hybrid for two
traits, most of the offspring resembled the original
parent (of the two that gave rise to the hybrids) that
showed the dominant form of both traits.

Mendel’s crosses involving two or more traits 
vividly reveal the detail of his mathematical analyses. 
In one often recounted experiment, he crossed round 
yellow seeds (genotype RRYY in Figure 2) to wrink- 
led green seeds (rryy) and obtained heterozygotes of 
genotype RrYy. (Mendel used the letters A and B to
denote all traits. The round/wrinkled gene was named
‘r’ in 1917, for ‘rugosus.’) He then crossed the
heterozygotes, and found a 9:3:3:1 phenotypic ratio
among seeds of the following types:


315 round yellow
101 wrinkled yellow
108 round green
32 wrinkled green
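How close are these counts to 9:3:3:1? One modern way to ask is Pearson's chi-square goodness-of-fit test, a statistic that Mendel did not use. The sketch below computes the expected counts from the 556 seeds and the statistic by hand. The 5% critical value for 3 degrees of freedom, about 7.815, is taken from a standard table rather than computed, and the variable names are illustrative.

```python
# Mendel's dihybrid seed counts, as listed in the text.
observed = {
    "round yellow": 315,
    "wrinkled yellow": 101,
    "round green": 108,
    "wrinkled green": 32,
}
# Parts of the 9:3:3:1 ratio expected for each class.
ratio = {"round yellow": 9, "wrinkled yellow": 3, "round green": 3, "wrinkled green": 1}


def chi_square(observed: dict, ratio: dict) -> float:
    """Pearson goodness-of-fit statistic: sum of (O - E)^2 / E over classes."""
    total = sum(observed.values())
    parts = sum(ratio.values())
    stat = 0.0
    for cls, obs in observed.items():
        exp = total * ratio[cls] / parts  # expected count for this class
        stat += (obs - exp) ** 2 / exp
    return stat


CRITICAL_5PCT_DF3 = 7.815  # standard table value, 3 degrees of freedom

stat = chi_square(observed, ratio)
print(sum(observed.values()), round(stat, 2), stat < CRITICAL_5PCT_DF3)  # 556 0.47 True
```

A statistic this far below the critical value means the deviations from 9:3:3:1 are well within what chance alone would produce.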

[Figure 2: parental plants with (a) round, yellow seeds and (b) wrinkled, green seeds; hybrids with all round, yellow seeds; their progeny in the ratio 9 round, yellow : 3 round, green : 3 wrinkled, yellow : 1 wrinkled, green]

Figure 2 Mendel’s observations on the inheritance of
hybrids, or heterozygotes, for two genes not located on
the same chromosome became known as his second
law, or the law of independent assortment. Dihybrids
(genotype RrYy) crossed to each other yielded a 9:3:3:1
ratio. The ratio derives from counting the phenotypes
of the peas, not the pods, for these particular traits.
(Reproduced with permission from Lewis R (1997) Life,
3rd edn. New York: McGraw-Hill.)


Further crosses established the genotypes of these 
phenotypic classes. Figure 3 shows a method by which 
to follow what Mendel deduced, using a chart called 
a Punnett square that displays gene combinations 
through gametes. Mendel identified nine genotypic
classes among the 16 combinations. There were four
ways to generate offspring heterozygous for both genes
(RrYy); two ways to produce each of four types of
individuals with one gene heterozygous and the other
homozygous (RRYy, RrYY, Rryy, and rrYy); and one
way to produce each of the four types with no
heterozygous gene (RRYY, RRyy, rrYY, and rryy). An
experiment that also followed seed coat color was even
more complex.
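The counting shown in Figure 3 can be reproduced by listing the four gamete types of each RrYy parent and pairing them. The sketch below assumes the two genes assort independently, so all four gametes are equally frequent; the helper names are illustrative. It recovers nine genotypic classes with cell counts 4, 2, 2, 2, 2, 1, 1, 1, 1, and the 9:3:3:1 phenotypic ratio.

```python
from collections import Counter
from itertools import product


def gametes(genotype: str) -> list:
    """Gametes of a two-gene genotype written like "RrYy".

    One allele is taken from each gene (characters 0-1 and 2-3); under
    independent assortment all four combinations are equally likely.
    """
    return [r + y for r, y in product(genotype[0:2], genotype[2:4])]


def pair(a: str, b: str) -> str:
    # Dominant (uppercase) allele first, so "rR" and "Rr" count as one class.
    return "".join(sorted(a + b))


def punnett(p1: str, p2: str) -> Counter:
    """Count genotypes over every cell of the gamete-by-gamete square."""
    cells = Counter()
    for g1, g2 in product(gametes(p1), gametes(p2)):
        cells[pair(g1[0], g2[0]) + pair(g1[1], g2[1])] += 1
    return cells


def phenotype(genotype: str) -> str:
    form = "round" if "R" in genotype[:2] else "wrinkled"
    color = "yellow" if "Y" in genotype[2:] else "green"
    return f"{form} {color}"


square = punnett("RrYy", "RrYy")
print(gametes("RrYy"))                    # ['RY', 'Ry', 'rY', 'ry']
print(len(square), sum(square.values()))  # 9 genotypic classes, 16 cells
for genotype, n in square.most_common():
    print(genotype, n, phenotype(genotype))

phenotypes = Counter()
for genotype, n in square.items():
    phenotypes[phenotype(genotype)] += n
print(dict(phenotypes))
```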


[Figure 3: parental generation, plants with round, yellow seeds crossed to plants with wrinkled, green seeds; F1 generation, all plants produce round, yellow seeds; F2 generation, Punnett square combining the F1 female gametes (RY, Ry, rY, ry) with the F1 male gametes]

Figure 3 A Punnett square depicts the combinations 
of gametes that arise when dihybrids independently assort. 
(Reproduced with permission from Lewis R (1997) Life, 
3rd edn. New York: McGraw-Hill.) 


Mendel deduced from all these numbers and pro- 
geny classes that the different genes were inherited 
separately, that is, all combinations of the variants 
appeared in predictable ratios. He wrote, “...the 
relation of each pair of different characters in hybrid 
union is independent of the other differences in 
the two original parental stocks.” This observation 
became known as Mendel’s second law, or the law of 
independent assortment. It, too, has its roots in 
meiosis. We know today that Mendel observed these
results because the genes he followed together were
either carried on different chromosomes or lay far
apart on the same chromosome. Had they been close
together on one chromosome, certain traits would have
appeared together more often than predicted, because
they would be physically conveyed to the next
generation on the same chromosome, a phenomenon
called linkage.


Mendel’s Conclusions 

The ninth part of Mendel’s paper reads curiously like 
an introduction — and for good reason. This was the 
first part of his second lecture. Here he related the 
ratios seen in his crosses to events in the pollen and 
eggs, writing that “...the hybrids produce egg cells 
and pollen cells which in equal numbers represent all 
constant forms which result from the combination of 
the characters brought together in fertilization.” 

Mendel used this hypothesis to predict the out- 
come of a cross: round yellow dihybrids (RrYy) 
fertilized with pollen from plants that had wrinkled 
green (rryy) seeds. If the four types of gametes from 
the female plant (RY, Ry, rY, and ry) formed in equal 
numbers, and were then fertilized by ry pollen, then 
four progeny classes (RrYy, Rryy, rrYy, and rryy) 
should appear in approximately equal numbers. They 
did so, as Table 4 shows. 
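Mendel's prediction can be written out directly: pair each of the four gamete types of the RrYy plant with the single ry gamete type of the rryy parent, then compare the Table 4 counts with equal expectations. The chi-square statistic at the end is a modern addition, not part of Mendel's analysis, and the helper names are illustrative.

```python
from itertools import product


def gametes(genotype: str) -> list:
    """One allele from each gene of a genotype like "RrYy" (independent assortment)."""
    return [a + b for a, b in product(genotype[0:2], genotype[2:4])]


def fuse(egg: str, pollen: str) -> str:
    # Put the dominant allele first within each gene, e.g. "rR" -> "Rr".
    return "".join(sorted(egg[0] + pollen[0])) + "".join(sorted(egg[1] + pollen[1]))


eggs = gametes("RrYy")         # ['RY', 'Ry', 'rY', 'ry'], assumed equally frequent
pollen = set(gametes("rryy"))  # {'ry'}: the wrinkled green parent makes one kind
offspring = [fuse(e, p) for e in eggs for p in pollen]
print(offspring)               # ['RrYy', 'Rryy', 'rrYy', 'rryy']

observed = {"RrYy": 31, "Rryy": 26, "rrYy": 27, "rryy": 26}  # Table 4
total = sum(observed.values())
expected = total / len(offspring)  # equal numbers if gametes form equally
chi2 = sum((observed[g] - expected) ** 2 / expected for g in offspring)
print(total, expected, round(chi2, 3))  # 110 27.5 0.618
```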

Table 4 Gametes form in equal numbers

Parental cross:  RrYy (round yellow)  ×  rryy (wrinkled green)
Gametes:         RY, Ry, rY, ry          ry

Progeny:
Phenotype          Genotype   Number
Round yellow       RrYy       31
Round green        Rryy       26
Wrinkled yellow    rrYy       27
Wrinkled green     rryy       26


Table 5 Genetic phenomena that can appear to disrupt Mendel’s laws

Genetic heterogeneity
Epistasis
Multiple alleles
Lethal alleles
Incomplete dominance
Codominance
Penetrance
Expressivity
Pleiotropy


The tenth part of the paper details Mendel’s
less-than-successful attempts to repeat certain pea
experiments with the bean plants Phaseolus vulgaris and
Phaseolus nanus. Although the results were difficult
to interpret because of embryonic lethality and very
variable flower colors, Mendel nonetheless concluded
that the principles he had demonstrated in peas still 
applied, but were obscured by the complexity of pig- 
mentation. The eleventh and final portion of the paper 
eloquently summarizes the overall findings of the 
experiments: “With Pisum it was shown by experi- 
ment that the hybrids form egg and pollen cells of 
different kinds, and that herein lies the reason of the 
variability of their offspring.” 


Mendel is Ignored, then Rediscovered 


It is astounding, in retrospect, that Mendel’s paper 
initially failed to attract attention at a time when 
Darwin’s On the Origin of Species was an overnight 
sensation. Mendel himself sought support for his work 
by a frustrating correspondence with Karl Wilhelm 
von Nägeli, a noted Swiss botanist. Nägeli, whose 
thinking sometimes veered from science to mysticism, 
dismissed Mendel’s work because he was uncomfort- 
able with the mathematics and logic, and because it 
lacked speculation, according to historians of science. 
Noted scientist and science writer Isaac Asimov called 
Nägeli’s harsh treatment of the sensitive Mendel his
“most far-reaching mistake,” and credited him with
single-handedly delaying the recognition of genetics
as a discipline for a full generation. Other biologists
accustomed to more descriptive science, such as the
evolutionary biology of the time, may also have been
uncomfortable with the mathematical nature of the
work.

Another reason cited for the initial disregard for 
Mendel’s paper was that the results were not suffi- 
ciently provoking. People anticipated the discovery 
of some new phenomenon to introduce the traits that 
reappeared after vanishing for a generation, not an 
explanation based on different combinations of pre- 
existing inherited units. Monroe W. Strickberger, a 
geneticist at the University of Missouri at St Louis,
wrote in his classic textbook Genetics (Strickberger, 
1968): “To those biologists who were seeking a source 
of variability in evolution, Mendel’s findings indi- 
cated, on the contrary, an unacceptable ‘constancy’ 
of hereditary factors.” Mendel’s vision of discrete, 
measurable traits also did not fit Darwin’s gradual 
view of evolution. It would be years before geneticists 
would understand that discrete factors can combine 
and interact to produce graded phenotypes. 

Another explanation for why Mendel’s ideas were 
not eagerly embraced was that before the golden 
age of cytology, it was difficult to picture a physical 
basis for his ‘characters.’ It was not until 1903 that 
Walter S. Sutton and Theodor Boveri independently 
deduced that chromosomes carry the units of inherit- 
ance, which would later come to be called genes. And 
the chemical nature of the gene would not be
described until James Watson and Francis Crick 
assembled the clues contributed by many others to 
depict the double helix of DNA, in 1953. 

The rediscovery of Mendel’s laws occurred at the 
turn of the last century, when three botanists independ- 
ently and unknowingly repeated the work. When 
searching the literature for similar efforts, each found 
Mendel’s paper. The Dutchman Hugo de Vries (1848- 
1935) began thinking about inherited variation arising 
from new combinations of existing traits when 
Darwin’s work was published. He experimented 
with several types of plants, eventually demonstrating 
Mendelian ratios in the evening primrose. The
German Karl Franz Joseph Erich Correns, who worked
with peas, later published the correspondence between
Mendel and Nägeli, who was his uncle-in-law. The
Austrian Erich von Tschermak-Seysenegg repeated
Mendel’s experiments using peas in 1898, only
learning of Mendel’s work in 1900. None
of the three knew each other and all graciously cred- 
ited discovery of the principles of inheritance to Gre- 
gor Mendel. 


Confirming and Extending Mendel’s Laws


In the first years of the twentieth century, various 
researchers confirmed Mendel’s work in different spe- 
cies. But it was only a matter of time before further 
experimentation would reveal that gene transmission 
is not always as clear-cut as Mendel’s crosses had in- 
dicated. Various complications do not negate Mendel’s 
laws, but make them more difficult to observe. For 
example, the actions of different genes can con- 
tribute to the same phenotype, a phenomenon called 
genetic heterogeneity. Some effects that seem to blur 
Mendel’s ratios reflect the fact that genes do not func- 
tion alone: The actions of other genes or the environ- 
ment can influence their expression. In epistasis, for 
example, activity of one gene masks the effect of 
another. And some traits may appear to be inherited 
but instead reflect the exposure of several family 
members to the same environmental influence, such 
as infectious microorganisms. Other genetic situations 
that can appear to obscure Mendelian ratios are out- 
lined below. 


Linkage 

Genes carried on the same chromosome do not
independently assort. Bateson and Reginald Crundall
Punnett (1875-1967) described genetic linkage in
several reports to the Evolution Committee of the
Royal Society of London from 1905 to 1908. They
studied flower color and pollen shape in the sweet pea,
demonstrating significant departure from Mendelian
ratios for some gene pairs. Combining these
observations with Sutton’s finding that the number of
genes far exceeds the number of chromosomes led to
the idea of physically linked genes being inherited
together.


Multiple Alleles 

Mendel selected for scrutiny a subset of pea genes
that were easy to work with because each had
two distinctive alternate guises. Analysis of crosses be- 
comes more complicated when a gene exists in several 
forms, or alleles. As the number of alleles increases, so 
does the number of phenotypic classes. If the domin- 
ance relationships among the alleles are understood, 
then Mendel’s ratios can still be observed, but the 
observer must discriminate more phenotypic classes. 
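The growth in classes can be made concrete. With k alleles of one gene, a diploid can carry any unordered pair of them: k homozygotes plus k(k - 1)/2 heterozygotes, or k(k + 1)/2 genotypes in all. How many of these are distinguishable as phenotypes depends on the dominance relationships, which this sketch does not model; the allele names are placeholders.

```python
from itertools import combinations_with_replacement


def genotypes(alleles: list) -> list:
    """Every unordered pair of alleles a diploid individual can carry."""
    return list(combinations_with_replacement(alleles, 2))


for k in range(2, 6):
    alleles = [f"a{i}" for i in range(1, k + 1)]  # placeholder allele names
    n = len(genotypes(alleles))
    print(k, n, k * (k + 1) // 2)  # enumerated count vs. closed form
```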


Lethal Alleles 

Allele combinations that are lethal before an indi- 
vidual has matured sufficiently to be observed may 
also appear to disrupt Mendel’s laws. Mexican hairless 
dogs, for example, are heterozygous for a gene that 
causes lack of hair. Their genotype can be written Hh. 
Homozygous recessive dogs, of genotype hh, are hairy. 
However, the HH homozygous dominant dogs die as 
spontaneous abortions or stillbirths. Because the HH 
class never appears, the genotypic ratio is 2 Hh : 1 hh,
and, phenotypically speaking, hairless dogs outnumber
hairy dogs two to one. Breeders cross Mexican
hairless dogs to hairy dogs to avoid conceiving doomed
HH puppies, whose loss could stress the mother.
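The 2:1 ratio follows from dropping the lethal class from an ordinary 1:2:1 outcome. The sketch below treats HH as lethal, as described above, and compares a hairless × hairless mating with the hairless × hairy mating that breeders prefer; the function name is illustrative.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

LETHAL = {"HH"}  # genotype that dies before birth, per the text


def liveborn(p1: str, p2: str) -> dict:
    """Genotype fractions among surviving offspring of a one-gene cross."""
    conceived = Counter("".join(sorted(a + b)) for a, b in product(p1, p2))
    survivors = {g: n for g, n in conceived.items() if g not in LETHAL}
    total = sum(survivors.values())
    return {g: Fraction(n, total) for g, n in survivors.items()}


print(liveborn("Hh", "Hh"))  # hairless x hairless: expect Hh 2/3, hh 1/3 (2:1)
print(liveborn("Hh", "hh"))  # hairless x hairy: expect Hh 1/2, hh 1/2, no HH conceived
```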


Different Types of Dominance 

Not all traits are as easy to tell apart as plant height 
and pea pod color, with straightforward dominance 
and recessiveness. Sometimes each allele in a hetero- 
zygote is expressed, producing a blended phenotype. 
Such alleles are called incompletely dominant. A red- 
flowered snapdragon plant crossed to a white-flowered 
variant yields pink-flowered offspring for this
reason. Both alleles can also be expressed in a situation 
called codominance. In type AB blood in humans, for 
example, red blood cells have two types of surface 
molecules, A and B. 
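Both cases can be sketched with the same Punnett-square enumeration; only the mapping from genotype to phenotype changes. Below, R and W are hypothetical symbols for the red and white snapdragon alleles. The ABO example assumes the familiar three-allele blood group system, in which A and B are codominant and a third allele, O, is recessive to both.

```python
from collections import Counter
from itertools import product


def offspring(p1: str, p2: str) -> Counter:
    """Genotype counts over the four equally likely allele pairings."""
    return Counter(tuple(sorted(pair)) for pair in product(p1, p2))


# Incomplete dominance: every genotype has its own phenotype.
FLOWER = {("R", "R"): "red", ("R", "W"): "pink", ("W", "W"): "white"}
f1 = offspring("RR", "WW")  # red x white
f2 = offspring("RW", "RW")  # pink x pink
print({FLOWER[g]: n for g, n in f1.items()})  # all pink
print({FLOWER[g]: n for g, n in f2.items()})  # red 1 : pink 2 : white 1


def blood_type(genotype: tuple) -> str:
    """ABO phenotype: A and B codominant, O recessive (assumed model)."""
    alleles = set(genotype)
    if alleles == {"A", "B"}:
        return "AB"  # both surface molecules are made
    if "A" in alleles:
        return "A"
    if "B" in alleles:
        return "B"
    return "O"


abo = Counter()
for g, n in offspring("AO", "BO").items():
    abo[blood_type(g)] += n
print(dict(abo))  # one each of AB, A, B, O
```

Under incomplete dominance the phenotypic ratio simply mirrors the 1:2:1 genotypic ratio, which is why such traits are easy to track once the intermediate class is recognized.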


Degrees of Phenotypic Expression 

The terms penetrance and expressivity refer to
gradations in a phenotype, which can complicate discerning
Mendelian progeny classes. A genotype is completely 
penetrant if every individual who has it expresses the 
associated phenotype; it is incompletely penetrant if 
this is not the case. A genotype is variably expressive 
if it differs in severity among individuals. A good ex- 
ample is polydactyly, the condition of having extra 
fingers and/or toes. It is seen in many mammals, 


most notably humans and cats. Polydactyly is incomp- 
letely penetrant, because some people known to have 
inherited the genes for it (because they have affected 
parents as well as children) are unaffected. It is vari- 
ably expressive because affected individuals vary in 
the numbers of extra digits they have. Imagine how 
difficult Mendel’s experiments would have been to 
interpret if a pea plant inherited the genotype for 
tallness, but didn’t express it, or if the plants assumed 
many different heights! 
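The thought experiment can be quantified. Suppose, hypothetically, that only a fraction p of pea plants carrying at least one Le allele actually grew tall. In a cross of two Le le hybrids, three-quarters of the offspring carry Le, so the tall fraction would be (3/4)p, and the familiar 3:1 would drift toward other values. The penetrance values below are invented for illustration.

```python
from fractions import Fraction


def tall_to_short(penetrance: Fraction) -> Fraction:
    """Observed tall:short ratio among offspring of Le le x Le le.

    3/4 of offspring carry Le; only `penetrance` of those look tall, and
    the rest are scored as short alongside the le le plants.
    """
    tall = Fraction(3, 4) * penetrance
    short = 1 - tall
    return tall / short


for p in (Fraction(1), Fraction(9, 10), Fraction(4, 5)):  # hypothetical values
    print(f"penetrance {p}: tall:short = {tall_to_short(p)} : 1")
```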

Some genes produce several effects, and not all of 
them occur in all individuals who have the same geno- 
type. This phenomenon, called pleiotropy, is seen in 
certain inherited diseases in humans. Consider
porphyria, a blood disorder that affected the British royal
family. King George III suffered from all the symp- 
toms (attacks of abdominal pain, weak limbs, fever, 
racing pulse, hoarse voice, dark red urine, and effects 
on the central nervous system), but other royal rela- 
tives noticed only the telltale red urine, sometimes 
together with abdominal cramps. 


A True Exception to Mendelian Inheritance 
With the ability to determine the DNA sequences of 
genes as well as the parts of chromosomes near them, 
medical geneticists can trace which alleles come from 
each parent in particular families. This ability led to 
the discovery of a true exception to gene segregation 
(Mendel’s first law) in 1988. Arthur Beaudet, at the 
Baylor College of Medicine, noted that a patient with 
cystic fibrosis, who would normally have inherited 
her condition from two heterozygous (carrier) parents, 
instead had only one parent who was a carrier. Further 
analysis of the gene in the woman and her parents 
indeed revealed that her two chromosomes carrying 
the gene both came from her mother. This condition is 
called uniparental disomy, which means ‘two bodies 
from one parent.’ Uniparental disomy is probably 
very rare. It can happen when a sperm with a missing 
chromosome fertilizes an egg with an extra chromo- 
some of the same type, or vice versa. 

Another example of uniparental disomy is seen in 
two disorders with different symptoms that result 
from having a double dose of a certain part of one 
chromosome from either the mother or the father. A 
person with Angelman syndrome has poor muscle
tone and coordination, a protruding tongue, and a large
jaw; laughs uncontrollably; and flaps the arms. In some
cases, the affected chromosomal region is present in 
two copies from the father. (In the other cases, the 
gene is absent on one chromosome.) A person with 
Prader-Willi syndrome eats obsessively and is obese, 
has small feet and hands, and does not mature sexually. 
In nearly half of all cases, the gene is inherited in a 
double dose from the mother. Geneticists do not yet 
understand precisely how the same gene can function
differently depending on the parent of origin. 


A Molecular View of the Traits Mendel Studied


The humble garden pea that was instrumental in 
founding the field of genetics has not been forgotten. 
In Alnarp, Sweden, the Nordic Gene Bank houses the 
Pisum Genetic Stocks Collection of 319 varieties, 
where 492 pea genes have been identified and cata- 
loged. Researchers are also turning the tools of mo- 
lecular biology to some of the traits in garden peas that 
Mendel immortalized in his experiments. This type of 
investigation reveals how the phenotypes that Mendel 
studied arise. 

In 1990, investigators at the John Innes Institute 
and AFRC Institute of Plant Science Research in 
Norwich, UK identified the protein difference that 
distinguishes round (RR or Rr) from wrinkled (rr) 
peas. The functional R allele encodes a form of starch- 
branching enzyme, which normally links sugars into 
longer carbohydrates. Developing seeds (peas) of rr 
plants lack this enzyme, so they contain many free 
sugars. This draws water into the cells, which swells 
the seeds. When the pea matures, the water exits 
the cells, and the seeds wrinkle. Peas of genotype rr 
also have less protein and more lipid than Rr or RR 
peas. 

In 1997, researchers at the University of Tasmania 
in Australia identified the product of the Le gene, 
which determines stem length, and therefore whether 
a plant is short or tall. The functional allele encodes an 
enzyme necessary for synthesis of gibberellin, a plant 
hormone that causes stems to elongate between nodes. 
A change in the gene (a mutation) replaces one amino 
acid with another in the encoded enzyme product at 
its active site, impairing its function. With the enzyme 
disabled, gibberellin is in short supply, and the plant is 
stunted. 


Mendel’s Laws Today 


The beauty of Mendel’s laws is that they apply to all 
diploid organisms, i.e., those with two copies of each 
chromosome. When a family is surprised to learn that 
a child has inherited sickle cell disease, or cystic fibro- 
sis, or any of hundreds of other recessive conditions 
because no other relatives are affected, it is because the 
disease-causing gene has remained hidden in ‘carriers’ 
(heterozygotes) or, in Mendel’s language, hybrids. 
When an extremely rare disease appears in a family 
where blood relatives have had children together, it is 
usually because the parents share a recessive allele that 
each, as a carrier, transmitted to an affected child. 
A testament to Gregor Mendel’s contribution to the
science of genetics is a huge volume and corresponding 
on-line service that is the ‘Bible’ of medical genetics, 
called Mendelian Inheritance in Man. Compiled by 
Victor McKusick at Johns Hopkins University, it is a 
compendium of all known single genes in humans. 
Single gene traits are called Mendelian traits, and the 
study of trait transmission is called Mendelian genetics. 

Although Mendel’s name is tied to genetics as closely
as any scientist’s is to a particular field, with the
possible exception of Charles Darwin’s,
he would no doubt be astounded at the state of our
knowledge today. Genetics journals routinely spell 
out the DNA sequences of the genes behind many 
disorders, and researchers can compare the same 
genes in different species with a few clicks of a com- 
puter mouse. For two decades researchers have trans- 
ferred and expressed human genes in bacteria, and 
have created such unnatural combinations as tobacco 
that lights up with a firefly’s luminescence genes, or 
sheep that produce human proteins in their milk for 
use as drugs. Hundreds of patients are undergoing 
experimental gene therapies, and more than 50
species have had their entire genomes sequenced,
including humans.


Further Reading 

Bhattacharyya MK, Smith AM, et al. (1990) The wrinkled-seed
character of pea described by Mendel is caused by a
transposon-like insertion in a gene encoding starch-branching
enzyme. Cell 60: 115-122.

Henig RM (2000) The Monk in the Garden: The Lost and Found
Genius of Gregor Mendel. Boston, MA: Houghton Mifflin Co.

Lester DR, Ross JJ, Davies PJ, et al. (1997) Mendel’s stem length
gene (Le) encodes a gibberellin 3β-hydroxylase. The Plant Cell
9: 1435-1443.

Online Mendelian Inheritance in Man, www3.ncbi.nlm.nih.gov/ 
omin/searchomim.html 

Orel V (1996) Gregor Mendel, the First Geneticist. New York: 
Oxford University Press. 
Mendelian genetics typically means the recognizable
patterns or rules that are associated with single gene 
inheritance. These patterns include segregation of the 
different gene states, or alleles, from a hybrid indi- 
vidual to produce two types of gametes, one for each 
of the two alleles. Depending on the characteristics of 
the trait associated with the alleles, whether dominant, 
codominant or recessive, and the genotypes for both 
parents, segregation from single gene hybrids leads to 
recognizable patterns such as the well known 3:1 ratio 
among the offspring. The rules of Mendelian genetics 
extend to following two or more hybrid genes simul- 
taneously, each following the patterns known from 
single gene inheritance. The observation that inherit- 
ance of a specific trait fits the patterns of segregation 
and known ratios in the offspring is recognized by 
describing the trait as showing Mendelian genetic 
behavior, with the conclusion that the trait is due to 
an allele difference at a single gene. 
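The single-gene pattern can be checked by brute-force enumeration of gametes. The sketch below is illustrative rather than a standard tool: it assumes complete dominance and writes the two alleles as the hypothetical letters "A" (dominant) and "a" (recessive); `gametes` and `cross` are made-up helper names.

```python
from collections import Counter
from itertools import product

def gametes(genotype):
    """A single-gene hybrid such as "Aa" (hypothetical allele names) makes one gamete type per allele."""
    return list(genotype)

def cross(parent1, parent2, dominant="A"):
    """Pair every gamete of one parent with every gamete of the other."""
    offspring = ["".join(sorted(g1 + g2))
                 for g1, g2 in product(gametes(parent1), gametes(parent2))]
    genotypes = Counter(offspring)
    # Complete dominance assumed: one dominant allele is enough for the dominant trait.
    phenotypes = Counter("dominant" if dominant in g else "recessive" for g in offspring)
    return genotypes, phenotypes

genotypes, phenotypes = cross("Aa", "Aa")
print(dict(genotypes))   # {'AA': 1, 'Aa': 2, 'aa': 1}
print(dict(phenotypes))  # {'dominant': 3, 'recessive': 1}
```

Calling `cross("Aa", "aa")` instead gives two dominant to two recessive offspring, the 1:1 testcross pattern.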


See also: Mendel’s Laws 


Mendelian Inheritance 
Mendelian inheritance typically means that a gene
shows segregation of two alleles from a hybrid individual.
Segregation at the formation of gametes
through meiosis yields two types of gametes from
a hybrid individual, each gamete type distinguished
by the allele it contains. It can be extended to mean
that two genes show independent assortment from
each other in their segregation patterns. Independent
assortment from a dihybrid results in four gamete
types, representing all combinations of the alleles
of the two genes in equal proportions. More generally, Mendelian
inheritance refers to traits shown through crosses to 
appear in the ratios that are consistent with single gene 
inheritance. Such traits are said to exhibit Mendelian 
inheritance and are inferred to result from different 
alleles at a single gene. 
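Independent assortment can be sketched the same way: pairing each allele of one gene with each allele of the other gives the four gamete types in equal proportions. The gene and allele names (`Aa`, `Bb`) and the helper `gamete_types` are hypothetical.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

def gamete_types(gene1, gene2):
    """Gametes from a dihybrid such as Aa Bb (hypothetical names), assuming independent assortment.

    Each gamete carries one allele of each gene; every pairing is equally likely.
    """
    return Counter(a + b for a, b in product(gene1, gene2))

types = gamete_types("Aa", "Bb")
total = sum(types.values())
for gamete, n in sorted(types.items()):
    print(gamete, Fraction(n, total))  # AB, Ab, aB, ab, each 1/4
```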


See also: Mendel’s Laws 
An interbreeding population of sexually reproducing
individuals sharing a common gene pool. 


See also: Gene Pool 
Mendelian ratio usually refers to the expected ratio(s)
of different progeny types from a cross. This can be
the 3:1 ratio of progeny with the dominant trait to
those with the recessive trait from a single-gene hybrid
self cross. It can be the 1:1 ratio of progeny
types from a testcross, which shows that the two gamete
types are made in equal frequency by the hybrid parent.
Mendelian ratios can also refer to the results of
dihybrid inheritance, such as the 9:3:3:1 ratio from a self
cross or the 1:1:1:1 ratio obtained by a test cross. Predicting
or understanding Mendelian ratios requires knowing
whether the gamete type or progeny phenotype is
specified, whether one or both parents are hybrid,
the number of hybrid genes, and whether the trait
phenotypes are dominant, recessive, or codominant.
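The 9:3:3:1 and 1:1:1:1 ratios follow from combining every gamete of one parent with every gamete of the other. This sketch assumes two unlinked genes, each showing complete dominance; genotypes are written as four-letter strings such as "AaBb" (a hypothetical convention), and a phenotype label such as "Ab" means dominant for the first gene and recessive for the second.

```python
from collections import Counter
from itertools import product

def gametes(genotype):
    """Gametes of a two-gene genotype such as "AaBb", assuming independent assortment."""
    return [a + b for a, b in product(genotype[:2], genotype[2:])]

def phenotype(g1, g2):
    """An upper-case (dominant) allele from either gamete masks the recessive one."""
    first = "A" if "A" in (g1[0], g2[0]) else "a"
    second = "B" if "B" in (g1[1], g2[1]) else "b"
    return first + second

def progeny_ratio(parent1, parent2):
    """Count progeny phenotypes over all equally likely gamete combinations."""
    return Counter(phenotype(x, y) for x, y in product(gametes(parent1), gametes(parent2)))

print(progeny_ratio("AaBb", "AaBb"))  # self cross: AB 9, Ab 3, aB 3, ab 1
print(progeny_ratio("AaBb", "aabb"))  # test cross: 4 of each class, i.e. 1:1:1:1
```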


See also: Punnett Square; Test Cross 
The Meselson–Radding model is a model of recombination
that allows the formation of hybrid DNA
on one chromatid only. It was created in response to 
the finding from Saccharomyces cerevisiae that much 
gene conversion is asymmetrical, that is, confined to 
one chromatid. This implies that heteroduplex DNA 
(hybrid DNA containing one or more mismatched 
base pairs) had occurred on a single chromatid. The 
model achieves an asymmetrical distribution of het- 
eroduplex by initiating the recombination event with 
a nick on only one of the two participating DNA
molecules. This is seen in part (i) of Figure 1. The 3’
hydroxyl at the nick then primes DNA synthesis, 
which displaces the identical strand of DNA as the 
new strand is formed. The 5’ single-stranded tail pro- 
duced by this reaction invades a homologous molecule 
with the formation of a D-loop, illustrated in part (ii). 
The D-loop continues to grow, driven by the action 
of the DNA polymerase in making a new strand and 
displacing the original strand. This polymerization 
eventually ceases. The open part of the D-loop is 
subject to endo- and exonucleolytic degradation, re- 
sulting in the structure shown in part (iii). At this 
stage, there is a length of asymmetrical hybrid DNA 
from the site of initiation of the recombination event 
to the position where the DNA synthesis has stopped. 




Figure 1 The Meselson–Radding model. Each line
represents a single DNA strand. The two participating 
parental molecules are distinguished by thin or thick 
lines. Half arrows indicate the 3’ ends of the strands. 
The broken line shows the newly synthesized DNA. The 
figure is explained in the text. 
The structure (iii) in Figure 1 is similar to a Holliday
junction and is proposed to form a Holliday
junction upon isomerization. The product of isomer- 
ization is illustrated in part (iv). As in Holliday’s 
model, the Holliday junction can now migrate, pro- 
pagating symmetrical hybrid DNA. Thus, the struc- 
ture shown in part (v) has a length of asymmetrical 
hybrid DNA near the initiation site and lengths of 
symmetrical hybrid DNA at the side where the 
Holliday junction is expected to be. 

Wherever there is an allelic difference between 
the two parental molecules, the hybrid molecule will 
contain mismatched base pairs. As with other 
recombination models that are based on the formation 
of heteroduplex DNA, these mismatched base pairs 
are subject to correction by a mismatch repair system 
that excises one or other strand over a length that 
includes the mismatch. Repairing the excision gap by 
copying the remaining strand can produce the patterns 
of conversion and postmeiotic segregation seen in re- 
combination data. 
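How mismatch repair of an asymmetric heteroduplex becomes a segregation pattern can be illustrated by counting alleles strand by strand, as in the eight-spored asci (octads) of some fungi. This is a simplified sketch, not part of the original model: it assumes a single marker site with a hypothetical donor allele "A" and recipient allele "a", hybrid DNA on one recipient chromatid only, and repair that keeps the donor strand, keeps the recipient strand, or does not occur. The labels 6:2 (conversion), 5:3 (postmeiotic segregation), and 4:4 (normal segregation) are the conventional octad notation.

```python
from collections import Counter

# Each chromatid is a pair of strand alleles; "A" and "a" are hypothetical allele names.
TETRAD = [("A", "A"), ("A", "A"), ("a", "a"), ("a", "a")]

def octad_ratio(tetrad):
    """Return the A:a ratio counted over all eight strands (an octad)."""
    counts = Counter(allele for chromatid in tetrad for allele in chromatid)
    return f"{counts['A']}:{counts['a']}"

def asymmetric_heteroduplex(tetrad, recipient, donor_allele):
    """Replace one strand of a single recipient chromatid with donor DNA."""
    result = list(tetrad)
    kept = result[recipient][1]
    result[recipient] = (donor_allele, kept)
    return result

def mismatch_repair(tetrad, index, keep):
    """Excise one strand of a hybrid chromatid and copy the remaining strand.

    keep=0 retains the donor-derived strand, keep=1 the recipient strand.
    """
    result = list(tetrad)
    survivor = result[index][keep]
    result[index] = (survivor, survivor)
    return result

hybrid = asymmetric_heteroduplex(TETRAD, recipient=2, donor_allele="A")
print("no repair:        ", octad_ratio(hybrid))                         # 5:3
print("repair to donor:  ", octad_ratio(mismatch_repair(hybrid, 2, 0)))  # 6:2
print("repair to recip.: ", octad_ratio(mismatch_repair(hybrid, 2, 1)))  # 4:4
```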

The Holliday junction seen in Figure 1, part (v), is
subject to rapid isomerization, so that the structures 
seen in (v) and (vi) occur equally. If an endonuclease 
(resolvase) cuts the crossed strands in structure (v), it 
will, upon ligation, give two reciprocal crossover 
molecules. Endonucleolytic cleavage of the crossed 
strands in the other structure, seen in (vi), will yield 
a noncrossover recombination event. Thus, by this 
model, crossover and noncrossover products will be 
equally common and are expected to show the same 
pattern of conversion and postmeiotic segregation. 


Further reading 

Meselson MS and Radding CM (1975) A general model for 
genetic recombination. Proceedings of the National Academy 
of Sciences, USA 72: 358-361. 


See also: Gene Conversion; Genetic 
Recombination; Heteroduplexes; Holliday 
Junction; Mismatch Repair (Long/Short Patch); 
Reciprocal Recombination; Recombination, 
Models of 
One of the most attractive features of the Watson–Crick
model of DNA was its obvious prediction that
the genetic material replicates via a simple process in
which each strand determines the sequence of its com- 
plement through base-pair complementarity. Further- 
more, the model suggested that the most natural mode 
of replication would be for the double helix to sep- 
arate, for each strand to remain intact, and for new 
complementary strands to be synthesized by using the 
old strands as templates, thus forming two new mole- 
cules that are half old and half new. This mode of 
replication is called ‘semiconservative,’ because each
strand is conserved even though the double helix is 
not. However, there are other theoretically possible, 
albeit less likely, modes of DNA replication. For 
example, DNA might undergo conservative replica- 
tion, in which each old strand directs the synthesis of 
its complement but the two old strands stay together 
while the two new strands make a whole new helix. 
DNA might also replicate in more complicated ways, 
perhaps by breaking up into smaller units. 

In 1958, Matthew Meselson and Franklin W. Stahl 
tested the prediction of the Watson—Crick model 
regarding replication. Meselson, Stahl, and Vinograd 
found that, if DNA is dissolved in a dense solution of 
cesium chloride (CsCl) and this solution is centrifuged 
for a few days at top speed in an ultracentrifuge, 
the Cs⁺ ions form a density gradient: the density
increases steadily from a slightly lower value at the 
top (centripetal end) of the cell to a slightly higher 
value at the bottom (centrifugal end). The DNA mole- 
cules in the solution come to an equilibrium position 
where their buoyant density equals that of the gradient.
All DNA has a buoyant density of about 1.7 g ml⁻¹,
but there are small differences between DNAs
from different sources. So, if a mixture of DNAs is 
centrifuged in a CsCl solution, the molecules in the 
mixture separate from one another, because each type 
moves to its own position in the gradient. 

To distinguish parental from replicated DNA, 
Meselson and Stahl made bacteria with dense DNA 
by growing them for several generations in a medium 
containing the heavy isotope ¹⁵N instead of ordinary
nitrogen (¹⁴N). When these bacteria were transferred
into an ordinary medium containing ¹⁴N, they started
making DNA of normal density. Meselson and Stahl 
followed the change in density over two generations 
after the transfer by taking samples periodically, 
extracting the DNA, and centrifuging it in CsCl. The 
analytical ultracentrifuge passes a beam of light 
through the rotating cell, in this case showing the 
position of DNA by its absorbance of UV light. 

Figure 1 Predictions of the conservative and semi-
conservative models of DNA replication, showing how
the models predict different distributions of the original
molecules (dark strands) after two rounds of replication.
Panels show the original parental molecule and the
1st- and 2nd-generation daughter molecules under the
semiconservative and conservative models.

Figure 1 shows how the density of the DNA
should change if it replicates semiconservatively. The
original molecule, synthesized in medium containing
¹⁵N, is all dense; in the first round of replication in
¹⁴N-containing medium, it should separate into two
strands that each acquire a light partner strand, thus
making hybrid molecules of intermediate density.
After a second round of replication, the original 
dense strands should still be combined with light 
strands, and an equal number of molecules made ex- 
clusively of light strands should be formed. Figure 2 
shows what Meselson and Stahl actually saw. Initially, 
all the DNA is dense. After one round of replication, it 
is all half-dense, as it should be if it consists of one 
strand made entirely with ¹⁵N and one made entirely
with ¹⁴N. After a second round of replication, half the
DNA is still half-dense and half is light. These results 
are precisely what the Watson—Crick model predicts, 
supporting the semiconservative mode of replication. 
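The contrasting predictions for the two models can be reproduced by tracking strands through two rounds of replication. In this sketch a molecule is a pair of strand labels, "H" for a ¹⁵N strand and "L" for a ¹⁴N strand (hypothetical labels), and its density class depends only on how many heavy strands it contains.

```python
from collections import Counter

def replicate(molecules, mode):
    """One round of replication in light (14N) medium; strands are "H" or "L"."""
    daughters = []
    for s1, s2 in molecules:
        if mode == "semiconservative":
            # Each old strand is kept and gains a new light partner.
            daughters += [(s1, "L"), (s2, "L")]
        elif mode == "conservative":
            # The old duplex stays together; the two new strands form a new helix.
            daughters += [(s1, s2), ("L", "L")]
        else:
            raise ValueError(f"unknown mode: {mode}")
    return daughters

def bands(molecules):
    """Fraction of molecules in each density class (0, 1, or 2 heavy strands)."""
    labels = {0: "light", 1: "hybrid", 2: "heavy"}
    counts = Counter(labels[(s1, s2).count("H")] for s1, s2 in molecules)
    return {band: n / len(molecules) for band, n in sorted(counts.items())}

for mode in ("semiconservative", "conservative"):
    dna = [("H", "H")]
    for generation in (1, 2):
        dna = replicate(dna, mode)
        print(mode, generation, bands(dna))
```

For the semiconservative mode this prints all hybrid after one round and half hybrid, half light after two, matching the observed bands; the conservative mode never produces a hybrid band.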
Meselson and Stahl then had to show that the units 
being separated in the CsCl gradient were double- 
stranded molecules with each strand either all heavy 
or all light. Their evidence for this took advantage of 
the fact that heat of 80-100 °C will separate the strands 
of a double helix by disrupting (‘melting’) the hydro- 
gen bonds holding it together. Meselson and Stahl 
melted some of their half-dense DNA and showed 
that it separates into one dense fraction and one light 
fraction, as it should. They concluded that each strand 
of half-dense DNA consists of either totally dense 
or totally light nucleotides, not a mixture of dense and 
light nucleotides in one strand. This work provided a 
definitive confirmation of the Watson–Crick model.

Figure 2 Meselson and Stahl’s observations of the
distribution of DNA molecules in a CsCl gradient at
various stages of replication. The bottom of the
centrifuge cell is toward the right. Bands are located
by the absorption of UV light (A); the tracings (B) show
the densities of the bands.

This experiment is a prime example of scientific
reasoning, because the various modes of replication
predict different outcomes from the experiment. The
logic of science is largely a hypothetical logic, in which
a hypothesis, H, has certain testable consequences that
should result in some outcome, O. One then sets up an
empirical situation in which O should be observed. If
O is observed, this strengthens the hypothesis, but
does not prove it, since it is invalid to reason “A
implies B; B, therefore A.” However, if O is not
observed, the hypothesis is disconfirmed or is at least
subject to doubt, since it is valid to reason “A implies
B; not B, therefore not A.”
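The difference between the two argument forms can be checked with a truth table: a form is valid only if no assignment of truth values makes every premise true while the conclusion is false. The helper names below are illustrative.

```python
from itertools import product

def implies(a, b):
    """Material implication: false only when a is true and b is false."""
    return (not a) or b

def is_valid(premises, conclusion):
    """Valid if no assignment makes all premises true and the conclusion false."""
    for a, b in product([True, False], repeat=2):
        if all(p(a, b) for p in premises) and not conclusion(a, b):
            return False
    return True

# "A implies B; B, therefore A" (affirming the consequent)
affirming = is_valid([lambda a, b: implies(a, b), lambda a, b: b], lambda a, b: a)
# "A implies B; not B, therefore not A" (modus tollens)
tollens = is_valid([lambda a, b: implies(a, b), lambda a, b: not b], lambda a, b: not a)
print(affirming, tollens)  # False True
```

The counterexample for the first form is A false and B true: the outcome appears even though the hypothesis is wrong.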

The late physicist John R. Platt pointed out a par- 
ticularly powerful form of hypothetical reasoning, 
which he called ‘strong inference.’ Rather than setting 
up a single hypothesis, an investigator entertains
alternative hypotheses, each of which predicts a different
outcome of some experiment. That is:


If H1, then O1;
If H2, then O2;
If H3, then O3;

and more, if possible. The investigator then does a
critical experiment designed to make all the outcomes
(O1, O2, O3,...) possible. There can only be one
outcome, e.g., O2 in this case.

Then the investigator can reason:


O1 is not observed, so H1 is not true.
O3 is not observed, so H3 is not true.
O2 is observed, so H2 is probably correct.


Again, the experiment does not prove H2 is true; but
H2 has withstood a powerful test, especially if the
hypothesis and experiment are quantitative and 
the results agree closely with the prediction. This 
reasoning is exactly what Meselson and Stahl used in 
testing the replication prediction of the Watson—Crick 
model. 
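Strong inference amounts to listing each hypothesis with the outcome it predicts and discarding every hypothesis whose outcome is not observed. This sketch applies that to the replication problem using second-generation band patterns. The "dispersive" entry stands for replication "by breaking up into smaller units" mentioned above (the usual textbook name), and its single band of one-quarter-heavy DNA is the standard textbook prediction; the band labels are hypothetical.

```python
# Predicted second-generation CsCl bands under each hypothesis (hypothetical labels).
PREDICTIONS = {
    "semiconservative": frozenset({"hybrid", "light"}),
    "conservative": frozenset({"heavy", "light"}),
    "dispersive": frozenset({"quarter-heavy"}),
}

def strong_inference(predictions, observed):
    """Keep only hypotheses whose predicted outcome matches the observation."""
    surviving = [h for h, outcome in predictions.items() if outcome == observed]
    rejected = [h for h in predictions if h not in surviving]
    return surviving, rejected

surviving, rejected = strong_inference(PREDICTIONS, frozenset({"hybrid", "light"}))
print("survives:", surviving)  # ['semiconservative']
print("rejected:", rejected)   # ['conservative', 'dispersive']
```

As the text stresses, the survivor is not thereby proved; it has only withstood a test that eliminated its rivals.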


See also: DNA Structure; Semiconservative 
Replication 
Messenger RNA (mRNA) is an intermediate in
the translation of genetic sequences into protein. 
Genomic DNA is transcribed into mRNA, which, 
when bound to the ribosome, can be translated. 


Genetic Code 


The genetic code is the nearly universal dictionary by
which genetic information is translated into the functional
machinery of living organisms, the proteins. The
words (codons) of the genetic message are three
nucleotides long. Since four different nucleotides are
used in mRNA, this results in a dictionary of 64
(4 × 4 × 4) words. Twenty amino acids are normally
incorporated into proteins during translation. In
addition, translation needs a definition of ‘start’
and ‘stop.’ The start codon also defines the reading
frame (the sequence of nucleotide triplets) that is to be
translated. The start or initiator codon is identical to
the methionine codon, and special mechanisms are used
to identify the correct initiation site. In addition, there
are three stop codons: UAA, UAG, and UGA. Thus 61
codons are available for 20 amino acids, and hence the
genetic code is degenerate. In the case of leucine,
serine, and arginine, there are as many as six codons,
whereas methionine and tryptophan have only one
codon.
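The codon arithmetic above can be verified by building the standard code as a lookup table. The 64-letter string lists the one-letter amino acid for each codon in the order UUU, UUC, UUA, UUG, UCU, and so on, with "*" marking stops; it is transcribed from the standard genetic code, and the variable names are illustrative.

```python
from collections import Counter
from itertools import product

BASES = "UCAG"
# Standard code in codon order UUU, UUC, UUA, UUG, UCU, ... ("*" = stop).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

stops = [codon for codon, aa in CODE.items() if aa == "*"]
sense = {codon: aa for codon, aa in CODE.items() if aa != "*"}
per_amino_acid = Counter(sense.values())

print(len(CODE), len(sense), len(per_amino_acid))  # 64 61 20
print(stops)                                       # ['UAA', 'UAG', 'UGA']
print({aa: per_amino_acid[aa] for aa in "LSRMW"})  # {'L': 6, 'S': 6, 'R': 6, 'M': 1, 'W': 1}
```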


Transcription 


Genomic DNA cannot be translated but has to be 
copied or transcribed into RNA by different RNA 
polymerases. Here the base-pairing principle proposed
by Watson and Crick applies. One strand of the
double-stranded DNA (the negative, or template, strand)
is copied with Watson–Crick base-pairing into a positive
strand of RNA, which is synthesized in the 5’ to 3’
direction. The double-stranded DNA is opened up in a
‘bubble’ that travels along the duplex during transcription.
Here, a DNA-RNA hybrid is formed transiently.
The process of transcription is in all cases strongly
regulated. Some genes are transcribed frequently,
whereas others are transcribed only rarely. Likewise,
some genes are transcribed in a brief period in the
life of the cell, whereas others are copied more or
less continuously.
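Because RNA is synthesized 5’ to 3’ and antiparallel to its template, the transcript is the reverse complement of the template strand and has the same sequence as the other (coding) strand, with U in place of T. A minimal sketch with a made-up nine-nucleotide sequence:

```python
# Base pairing from a DNA template base to the RNA base placed opposite it.
COMPLEMENT = {"A": "U", "T": "A", "G": "C", "C": "G"}

def transcribe(template_5to3):
    """Return the RNA (written 5'->3') made on a template strand written 5'->3'."""
    return "".join(COMPLEMENT[base] for base in reversed(template_5to3))

coding = "ATGGCTTGA"    # hypothetical coding strand, 5'->3'
template = "TCAAGCCAT"  # the complementary template strand, 5'->3'
print(transcribe(template))  # AUGGCUUGA
```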

In eukarya, transcription is performed in the 
nucleus and the transcript is transported into the cyto- 
plasm to be translated. Transcription and translation 
in mitochondria and chloroplasts are performed in
these cellular organelles. In the case of eubacteria and 
archaea, the whole process is performed in the cyto- 
plasm. The eubacterial transcripts frequently contain 
several genes controlled by one operator, i.e., mRNA 
is polycistronic. 


Processing of Transcribed RNA 


Some transcribed RNAs are never translated but
function in the cell as RNA. These
are primarily the ribosomal RNA (rRNA) and
transfer RNA (tRNA) molecules. The transcribed
RNA, called the ‘primary transcript,’ frequently
has to be processed to become mRNA. Several different
processes are involved, and those in eukarya
differ from those in eubacteria. The primary transcripts
normally contain regions of varying length that are
removed before translation, the so-called introns,
while the regions retained in the mRNA form exons. The
splicing machinery removes the introns by cutting and
ligation. Eukaryotic mRNAs are also modified by the
addition of a poly(A) tail at the 3’ end of the
message.
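Splicing and polyadenylation can be pictured as string operations on the primary transcript. In this toy sketch the sequences, exon coordinates, and tail length are hypothetical; real poly(A) tails are much longer, and splice-site recognition is not modeled (the made-up intron merely begins with GU and ends with AG).

```python
def process(primary_transcript, exons, poly_a_length=10):
    """Join the exons (removing introns), then add a poly(A) tail.

    exons: (start, end) half-open coordinates of each exon in the primary transcript.
    poly_a_length: illustrative tail length, far shorter than real tails.
    """
    mature = "".join(primary_transcript[start:end] for start, end in exons)
    return mature + "A" * poly_a_length

exon1, intron, exon2 = "AUGGCU", "GUAAGUCCUUUAG", "AAAUGA"  # hypothetical sequences
primary = exon1 + intron + exon2
exons = [(0, len(exon1)), (len(exon1) + len(intron), len(primary))]
print(process(primary, exons, poly_a_length=5))  # AUGGCUAAAUGAAAAAA
```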

In eukarya the primary transcripts are also fre- 
quently edited to become mRNAs. This is sometimes 
done by changes of U to C or vice versa. More extensive
editing occurs in mitochondria of trypanosomes,
where the mRNAs are extensively modified
by large enzymatic particles that use templates called
‘guide RNAs.’


Translation on Ribosomes 


The process of translation occurs on the ribosome, in 
the cytoplasm or in the cellular organelles, mitochon- 
dria and chloroplasts. The ribosome is a complex of a 
few large rRNA molecules and between 50 and 90 
different proteins. The ribosome is made up of two 
subunits (large and small) with different functions that 
dissociate from each other at the end of the process. 
Translation is traditionally divided into three steps: 
initiation, elongation, and termination. A fourth step, 
ribosome recycling, also belongs to the process. Sol- 
uble protein factors catalyze the process by binding to 
the ribosome transiently. More than 10 factors parti- 
cipate in eubacterial translation, whereas a consider- 
ably larger number participate in eukaryal translation. 
The mRNA is bound to the small ribosomal subunit. 
Since the messenger is bound between the subunits, 
they have to dissociate to be able to bind an mRNA. The
decoding site for interactions between the mRNA and 
the anticodon is part of the A-site for aminoacyl- 
tRNA and located on the small subunit. The neigh- 
boring P-site is the location of the tRNA with the 
nascent peptide. 

The initiation codon is recognized in different ways 
in eukarya and bacteria. In eubacteria a nucleotide 
sequence of the mRNA rich in As and Gs is usually 
found 3-10 nucleotides upstream of the initiator 
codon. These sequences are complementary to a 
region at the 3’ end of the 16S ribosomal RNA.
Binding of this region of the mRNA to the 3’ end of
the 16S rRNA is called the Shine-Dalgarno interaction.
The initiator tRNA (fMet-tRNA) complexed
with initiation factor 2 recognizes the initiation
codon AUG and binds to the P-site of the small sub- 
unit of the ribosome. 
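The Shine-Dalgarno interaction can be sketched as a search, within the 3-10 nucleotide window upstream of the AUG, for a stretch complementary to the 3’ end of the 16S rRNA. The sketch assumes that the relevant 16S rRNA stretch is 5’-CCUCCU-3’ (so the complementary mRNA motif is AGGAGG) and that a 4-nucleotide match suffices; both values, the helper names, and the example mRNA are illustrative assumptions rather than rules from the text.

```python
PAIRS = {"A": "U", "U": "A", "G": "C", "C": "G"}

def reverse_complement(rna):
    return "".join(PAIRS[base] for base in reversed(rna))

# Assumed anti-Shine-Dalgarno stretch near the 16S rRNA 3' end, written 5'->3'.
ANTI_SD = "CCUCCU"
SD_CONSENSUS = reverse_complement(ANTI_SD)  # "AGGAGG"

def find_shine_dalgarno(mrna, aug_pos, core=4):
    """Return (start, motif) of the SD-like k-mer nearest the AUG, or None.

    A k-mer counts if it occurs within SD_CONSENSUS and ends 3-10 nucleotides
    upstream of the initiation codon (the assumed spacing window).
    """
    kmers = {SD_CONSENSUS[i:i + core] for i in range(len(SD_CONSENSUS) - core + 1)}
    for spacer in range(3, 11):
        end = aug_pos - spacer
        start = end - core
        if start >= 0 and mrna[start:end] in kmers:
            return start, mrna[start:end]
    return None

mrna = "UUAAGGAGGUAAUCAUGGCU"  # hypothetical leader with an AGGAGG stretch
aug = mrna.find("AUG")
print(aug, find_shine_dalgarno(mrna, aug))  # 14 (5, 'GAGG')
```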

In eukaryal systems, the binding site on the 
mRNA for the ribosome is recognized quite differ- 
ently. The eukaryal mRNAs are usually capped at the 
terminal 5’ position. This means that they carry an
N7-methylated guanosine linked by a 5’-5’ triphosphate
bridge to the terminal nucleotide. The cap is situated
at a varying distance from the initiation codon, the 
first AUG. Some of the eukaryal initiation factors 
interact with the small subunit, while others interact 
with the capped mRNA. The initiator tRNA binds 
to the small subunit in complex with the eukaryal initi- 
ation factor 2. The small subunit then scans the mRNA 
for the initiator AUG codon, which will be recognized 
by the bound initiator tRNA. In both eukarya and 
bacteria, the large subunit subsequently associates 
with this complex to initiate protein synthesis. 
Reading Frame and Usage of Genetic
Code 


The initiator AUG codon not only defines the start 
but also the reading frame of a mRNA. Translation 
proceeds from this starting point in steps of three 
nucleotides (one codon) by binding a cognate tRNA 
through base-pairing. The frequent occurrence of ter- 
mination codons out of frame prevents translation in 
the wrong frame for more than short stretches. How- 
ever, there are mRNAs for which the correct transla- 
tion needs a change of reading frame. This is the case 
for Escherichia coli termination or release factor-2 
(RF2). 

The readthrough of a stop codon requires a tRNA 
that would decode a stop (nonsense) codon as a sense 
codon and incorporate a specific amino acid. Such 
tRNAs are called suppressor tRNAs. 
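Reading-frame dependence and nonsense suppression can both be shown with a small translation loop. The sketch starts at the first AUG, or at an explicitly given position, reads codons in steps of three, and stops at the first stop codon unless a hypothetical `suppressed` mapping, standing in for a suppressor tRNA, assigns an amino acid to that stop codon. The mRNA and the choice of tyrosine for UAG are illustrative.

```python
from itertools import product

BASES = "UCAG"
# Standard code in codon order UUU, UUC, UUA, UUG, UCU, ... ("*" = stop).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

def translate(mrna, start=None, suppressed=None):
    """Translate codon by codon from `start` (default: first AUG) until a stop.

    suppressed: optional {stop_codon: amino_acid} mapping (hypothetical) that
    models a suppressor tRNA reading that stop codon as a sense codon.
    """
    suppressed = suppressed or {}
    pos = mrna.find("AUG") if start is None else start
    if pos < 0:
        return ""
    peptide = []
    while pos + 3 <= len(mrna):
        codon = mrna[pos:pos + 3]
        aa = suppressed.get(codon, CODE[codon])
        if aa == "*":
            break
        peptide.append(aa)
        pos += 3
    return "".join(peptide)

mrna = "GCCAUGGCUAAAUAGGGCUGGUAA"
print(translate(mrna))                           # MAK
print(translate(mrna, start=4))                  # WLNRAG (wrong frame, different peptide)
print(translate(mrna, suppressed={"UAG": "Y"}))  # MAKYGW (UAG read through)
```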

In a few proteins in eubacteria and eukarya, selenocysteine
(Se-Cys) is required. This is not incorporated
by a posttranslational modification as in other cases of
nonstandard amino acids. Se-Cys is rather incorporated
during translation in response to one of the stop
codons. The mechanism for this involves a special
tRNA (tRNASec) that reads the stop codon and a
specialized version of elongation factor Tu.


Further Reading 

Spirin AS (1999) Ribosomes. New York: Kluwer. 

Garrett RA, Douthwaite SR, Liljas A, Matheson AT, Moore PB 
and Noller HF (eds) (2000) The Ribosome: Structure, Function, 
Antibiotics and Cellular Interactions. Washington, DC: ASM 
Press. 


See also: Anticodons; Genetic Code; Introns and 
Exons; Ribosomes; Transcription; 
Transfer RNA (tRNA) 
MET is the product of the c-met proto-oncogene and
the membrane receptor for the polypeptide growth 
factor hepatocyte growth factor/scatter factor (HGF/ 
SF). The MET locus maps to 7q21-q31 and is tightly 
linked to the cystic fibrosis (CF) locus. MET was 
discovered as an activated oncogene in a human
osteogenic sarcoma cell line treated with
N-methyl-N’-nitro-N-nitrosoguanidine, in which activation
resulted from a genomic rearrangement involving sequences
from chromosome 1 (translocated promoter region,
tpr) and the MET locus. A similar rearrangement is
now known to occur in certain human tumors. The 
product of the unmutated and unrearranged c-met 
proto-oncogene is expressed in a wide range of cell 
types: epithelial and endothelial cells, myogenic pre- 
cursor cells, and certain groups of neurons, where it 
controls a variety of developmental processes and tis- 
sue regeneration. 


MET Receptor 


MET is a protein of 1436 amino acids with intrinsic 
tyrosine kinase activity. It is synthesized as a single 
polypeptide chain and is subsequently cleaved into a 
two-chain heterodimer consisting of an N-terminal 
α-chain located outside the membrane and a C-terminal
β-chain, which encompasses a large extramembrane
domain, a single transmembrane domain, and the
cytoplasmic kinase domain (Figure 1B).


MET Ligand 


HGF/SF was isolated concurrently as a liver mitogen 
(hepatocyte growth factor) and as a motility factor for 
epithelial cells (scatter factor). HGF/SF is a protein of 
728 amino acids that differs from other polypeptide 
growth factors and is closely related to the proenzyme 
plasminogen. HGF/SF and plasminogen have similar 
gene organization and a multidomain structure
(Figure 1A). They also share a common posttranslational
(proteolytic) mechanism of activation
which, in the case of HGF/SF, leads to the formation
of a biologically active heterodimer from a
single-chain, inactive precursor.

Figure 1 (A) Schematic representation of the domain
organization of HGF/SF; (B) the membrane-bound MET
receptor; and (C) a constitutively active and oncogenic
form of MET known as tpr-MET. See text for further
details.


MET Pathway 


Activation of the signal transduction pathways down- 
stream of MET occurs through a short-sequence motif 
(the so-called docking site) located near the
C-terminus of the β-chain. This sequence is sufficient
to induce MET-specific responses when fused to other 
receptor kinases. Early studies on the intracellular 
pathways activated by MET suggested a role for cyto- 
solic adaptors and enzymes known for their role in the 
response to several growth factor receptors. These 
studies have established that the growth response 
requires activation of the Ras-MAP kinase pathway, 
whereas the motility response requires activation of 
the PI3-kinase and the Ras—Rac/Rho pathways. There 
is also evidence for a role of the Signal Transducers 
and Activators of Transcription (STAT) pathway in 
the morphogenetic response to HGF/SF but not in 
the growth or motility responses. Interestingly, re- 
cent studies have led to the identification of at least 
one adaptor (Gab1) that appears to be specific and 
essential for the cell response to HGF/SF. Thus the
current data imply that there are multiple pathways 
downstream of MET and suggest that the final cell 
response may depend on the availability of individual 
transducers. 


Cell Response to MET Activation 


Early studies have indicated that HGF/SF is a product 
of fibroblasts in culture and interstitial tissue in vivo 
and affects the behavior of epithelial and endothelial 
cells which express the MET receptor. The HGF/SF-MET
system is therefore paracrine in its action. The
cell response elicited by HGF/SF and MET is also 
distinctive. Although HGF/SF is a potent mitogen 
for a number of cell types, it does not generally lead 
to simple stimulation of growth in cells expressing the 
MET receptor. The majority of target cells exhibit a 
concurrent and potent motility response characterized 
by ruffling of the plasma membrane followed by cell 
spreading, major changes in cell morphology, loss of 
cell-cell contacts, and a marked increase in local motil- 
ity. Though not the sole cause of the motility response, 
MET activation causes a rapid and transient inhibi- 
tion of junctional communication, downregulation of 
desmosomal proteins and cadherins, and a reorganiza- 
tion of the F-actin cytoskeleton. Movement of iso- 
lated cells is also affected, hence increased motility is 
not simply due to loss of intercellular adhesion or 
cell junctions. Further, in a number of cells, HGF/SF 
and MET mediate a complex growth and motility
response which results in morphogenesis, i.e., the
formation of complex structures such as branched
epithelial tubules or alveoli, which require cell
proliferation as well as cell relocation and a correct
topological relationship between cells. Finally, HGF/SF is
one of the most potent angiogenic factors currently
known. This results from a potent angiogenic activity
of the factor per se combined with the ability to induce
expression of several other key angiogenic molecules
(such as vascular endothelial growth factor (VEGF),
prostaglandin F, platelet growth factor (PGF), and
macrophage inflammatory protein-20 (MIP-20)) that act on
endothelial cells in an autocrine manner.


Physiological Roles of HGF/SF-MET 
System 


There is now extensive evidence for key roles of 
HGF/SF and MET in development. HGF/SF has 
neural-inducing activity in the chick embryo and is 
expressed in the vertebrate organizer Hensen’s node. 
Mouse embryos with null mutation at the HGF/SF or 
MET loci die between 12.5 and 14.5 days of gestation 
with: (1) severe abnormalities of placenta and liver, (2) 
absence of muscle progenitor cells in the limb and 
diaphragm, and (3) defects in the directional growth 
of the axons of spinal and cranial motor neurons. 
There is also considerable evidence for a role of the 
HGF/SF and MET system in promoting cell migration
and/or tissue repair in postnatal life, especially
after injury. HGF/SF promotes liver regeneration 
after partial hepatectomy or chemical injury and anti- 
bodies against endogenous HGF/SF delay liver regen- 
eration. Liver regeneration is also accelerated in 
transgenic mice overexpressing HGF/SF under the 
control of a liver-specific promoter and transfection 
of the HGF/SF gene into muscle increased the plasma 
level of the factor and inhibited fibrosis and apoptosis 
of hepatocytes in a rat model of liver cirrhosis. 


HGF/SF, MET, and Cancer 


Several lines of evidence now indicate that HGF/SF 
and MET are involved in a prominent way in human 
cancer. In vitro HGF/SF induces carcinoma cells to be- 
come invasive and express matrix-degrading enzymes, 
and transgenic mice overexpressing HGF/SF develop 
a variety of tumors. Clinicopathological studies also 
imply the involvement of HGF/SF and MET in human 
cancer. HGF/SF is generally overexpressed in the 
stroma surrounding epithelial tumors and the MET 
receptor is overexpressed in epithelial cancer. Interest- 
ingly, MET overexpression or mutations strongly 
correlate with tumor progression and metastasis. 
Further critical evidence implicating MET in human
cancer has come from the finding that patients with 
certain forms of renal and liver cancer carry both 
germline and somatic missense mutations in the kinase 
domain of the receptor. These mutations are tumori- 
genic when introduced into cells in culture or in trans- 
genic mice. Finally, patients with gastric cancer exhibit 
a tpr-MET rearrangement similar to the one which led 
to the initial discovery of MET. In this rearrangement, 
the promoter and 142 amino acids of the tpr sequence 
are fused with cytoplasmic sequences encoding the 
MET kinase. The resulting tpr-MET fusion protein is 
highly oncogenic as a result of: (1) the strong and 
constitutive promoter activity of the tpr gene, and (2) 
the presence in the 142 amino acids of tpr sequences of 
leucine zipper motifs that lead to constitutive dimer- 
ization of the MET kinase in the absence of ligand 
(Figure 1C). The role of MET in human cancer outlined
above forms the basis for novel anticancer agents
that target this receptor. 


See also: Angiogenesis; Cancer Susceptibility 


Metabolic Disorders, 
Mutants 


N Gregersen 
Metabolic disorders are usually defined as inborn
errors of metabolism, encompassing deficiencies in
enzymes involved in the metabolism of carbohydrates,
amino acids derived from proteins, and
fatty acids liberated from lipids. The enzyme deficiencies
are attributable to inherited mutations in the
genes coding for the respective enzymes. Classical
and prominent examples are phenylketonuria
(PKU), caused by mutations in the phenylalanine
hydroxylase gene; galactosemia, due to mutations in
one of three enzymes of galactose metabolism; and
fatty acid oxidation defects, which may be caused by
mutations in any one of at least 20 different genes.

Most metabolic disorders are inherited in an autosomal
recessive manner. Patients may therefore either be
homozygous for a single mutation, inherited from
both parents, or be compound heterozygous for
two different mutations, one of which is inherited
from the mother and the other from the father. Since
these disorders show large allelic heterogeneity, with
up to 400 different mutations identified (in PKU),
homozygosity for a single mutation is only detected
in consanguineous families and in diseases where a
founder effect has created a prevalent mutation. For
instance, in the fatty acid oxidation disorder
medium-chain acyl-CoA dehydrogenase (MCAD) deficiency,
one prevalent mutation (985A>G) is present in 90% of
alleles in patients of Caucasian descent. Thus, 80% of
patients are homozygous for this mutation, and 18% are
compound heterozygous with 985A>G in one allele
and another MCAD gene mutation in the other
allele. Only 2% of patients are homozygous for one
or compound heterozygous for two non-985A>G
mutations.
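The MCAD percentages follow approximately from the allele frequency if disease alleles pair at random in patients. That random-pairing assumption is not stated in the text and is made explicit here; with 90% prevalent alleles the sketch gives about 81%, 18%, and 1%, close to the rounded figures above. The class labels are illustrative.

```python
def patient_genotypes(p):
    """Expected genotype classes among patients if disease alleles pair at random.

    p: fraction of disease alleles that carry the prevalent mutation (assumption:
    random pairing, Hardy-Weinberg style).
    """
    q = 1 - p
    return {
        "homozygous prevalent": p * p,
        "prevalent + other": 2 * p * q,
        "two other mutations": q * q,
    }

for label, share in patient_genotypes(0.90).items():
    print(f"{label}: {share:.0%}")  # 81%, 18%, 1%
```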

With a slow start in the middle of the 1980s, the rate
of identification of mutations in metabolic disorders
accelerated during the 1990s. To help clinicians and
researchers, the very large number of mutations
identified in patients all over the world has been
collected in publicly accessible databases, e.g., OMIM (Online
Mendelian Inheritance in Man); HGMD (Human
Gene Mutation Database); and Locus Specific
Databases, reachable through the Human Genome
Organisation’s (HUGO) Mutation Database Initiative. The
different types of mutations encountered in metabolic
disorders are predominantly point mutations, small 
deletions, small insertions, and splicing mutations, 
but mutations in promoter elements and large dele- 
tions or insertions are also known. 

The various types of mutations exert their effects by
different mechanisms, which are significant for the
molecular cell pathology of the diseases. According to
their effect, two main groups of mutations predominate.
The first comprises premature termination codon
(PTC) mutations, which, if translated, would produce
truncated proteins. They introduce a stop codon either
directly, as a result of point mutations, or indirectly,
as a result of small deletions or insertions of 1, 2, 4,
5, 7, 8, etc. nucleotides (i.e., numbers not divisible by
three) or of splice mutations, which shift the reading
frame and introduce a stop codon a few amino acids
downstream. If the stop codon is located upstream of
the last exon, mRNA species carrying such stop codons
are detected by mRNA surveillance systems and rapidly
degraded by the nonsense-mediated mRNA decay (NMD)
system. These types of mutations are usually severe and
abolish all enzyme activity, although the effect of
splice mutations may be partial. The effects of this
group of mutations are therefore generally predictable.
The second group comprises point mutations that
create missense mutant proteins, as well as small
in-frame deletions or insertions of nucleotides, which
do not change the reading frame. This group of
mutations may result in severely or mildly altered
proteins, depending on the exact position and nature
of the amino acid change. Although it is possible in
some cases to predict the effect of a given missense
mutation in silico by computer analysis, it is generally
necessary to perform expression studies to assess the
consequence at the protein level.
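Why an indel whose length is not a multiple of three usually creates a premature stop codon can be shown with a short sketch. The coding sequence is hypothetical, and the sketch deliberately ignores introns, splicing, and the exon-position rule that governs NMD.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def is_frameshift(indel_length):
    """An insertion or deletion shifts the reading frame unless its
    length is a multiple of three."""
    return indel_length % 3 != 0


def first_stop_codon(cds):
    """Return the codon index of the first in-frame stop codon, or None."""
    for i in range(0, len(cds) - 2, 3):
        if cds[i:i + 3] in STOP_CODONS:
            return i // 3
    return None


def delete(cds, start, length):
    """Delete `length` nucleotides beginning at 0-based position `start`."""
    return cds[:start] + cds[start + length:]


# Hypothetical CDS: ATG AAA CTG ACC GGG CCC TAA (Met-Lys-Leu-Thr-Gly-Pro-stop)
wild_type = "ATGAAACTGACCGGGCCCTAA"
frameshifted = delete(wild_type, 6, 1)  # 1-nt deletion: ATG AAA TGA ... (stop)
in_frame = delete(wild_type, 6, 3)      # 3-nt deletion removes one codon (Leu)

print(first_stop_codon(wild_type))                           # 6
print(is_frameshift(1), first_stop_codon(frameshifted))      # True 2
print(is_frameshift(3), first_stop_codon(in_frame))          # False 5
```

The single-nucleotide deletion brings an out-of-frame TGA into frame at codon 2, whereas the three-nucleotide deletion merely shortens the protein by one residue and keeps the normal stop codon.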

Most expression studies related to metabolic disorders
are designed to answer simple questions about the
disease-causing nature of a given mutation. Studies of
the molecular mechanism by which missense mutations
exert their effect, however, have been carried out for
only a few selected enzyme deficiencies. From studies of
the biogenesis of phenylalanine hydroxylase (the
defective enzyme in PKU) carrying missense mutations
and of mutant fatty acid oxidation enzymes, a picture
is emerging: with the exception of a few missense
mutations that affect the catalytic site of the enzyme,
missense mutations result in defective folding and
premature degradation of the enzyme protein. This
implies that the effect of missense mutations depends
not only on the nature and position of the mutation,
but also on the efficiency of the folding and
degradation machinery of the cell. This machinery
constitutes the cell's protein quality control systems,
and it is composed of molecular chaperones and
intracellular proteases. Consequently, variation in the
efficiency of the protein quality control systems among
individuals and under different physiological conditions
will affect the residual activity of the missense mutant
enzymes.

Many studies of genotype-phenotype relationships
in metabolic disorders have been carried out. In some
cases an association between the severity of the
mutations and the clinical expression of the disorder
can be detected. In many cases, however, especially
when missense mutations are involved, the association
is weak. This illustrates the modifying effects of
physiological, cellular, and other genetic factors.
See also: Galactosemia; Phenylketonuria
One of the criteria used in the identification and
classification of chromosomes, after measurement of the
chromosome length, is the position of the centromere 
(or primary constriction). The centromere is the site of 
attachment of the microtubules which form part of 
the mitotic apparatus involved in cell division. It has a 
fixed position in each chromosome and is therefore a 
useful landmark. Chromosomes in which the centro- 
mere is present in the mid region of the chromosome 
are termed ‘metacentric.’ When the centromere is at 
the end of the chromosome, the term ‘telocentric’ is 
used. When the centromere is close to one end of the 
chromosome, the term ‘acrocentric’ is used. When the 
centromere is between the middle and the end of 
the chromosome, the term ‘submetacentric’ is used. 
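The terms above can be expressed through the arm ratio (long arm length divided by short arm length). The cut-off values used here (1.7 and 3.0) are assumptions loosely based on one commonly used arm-ratio convention; published schemes differ, and some add further categories such as subtelocentric. The function names are hypothetical.

```python
def arm_ratio(p_arm, q_arm):
    """Long arm length divided by short arm length (infinite if one arm is absent)."""
    short, long_ = sorted((p_arm, q_arm))
    return float("inf") if short == 0 else long_ / short


def classify_by_centromere(p_arm, q_arm, meta_max=1.7, submeta_max=3.0):
    """Classify a chromosome by centromere position.

    meta_max and submeta_max are illustrative cut-offs (assumptions),
    not a fixed standard.
    """
    r = arm_ratio(p_arm, q_arm)
    if r == float("inf"):
        return "telocentric"      # centromere at the very end
    if r <= meta_max:
        return "metacentric"      # centromere near the middle
    if r <= submeta_max:
        return "submetacentric"   # between the middle and the end
    return "acrocentric"          # close to one end


for arms in [(5, 5), (3, 6), (1, 9), (0, 10)]:  # arm lengths in arbitrary units
    print(arms, classify_by_centromere(*arms))
```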


See also: Chromosome; Idiogram 


Methionine 
Methionine (Met or M) is one of the 20 amino acids
commonly found in proteins. Like cysteine, methio- 
nine contains an atom of sulfur. It is classed as one of 
the hydrophobic amino acids since it is only slightly 
soluble in water. Its neutral, unbranched side-chain 
makes it extremely flexible. Its chemical structure is 
shown in Figure 1.


Figure 1 Methionine.


See also: Amino Acids; Proteins and Protein 
Structure 
Metric, Four Point
A distance coefficient that satisfies the following
condition (in addition to those of a metric) for all
objects i, j, k, and l defines a four-point metric space:


$\delta_{ij} + \delta_{kl} \le \max\{\delta_{ik} + \delta_{jl},\ \delta_{il} + \delta_{jk}\}$    four-point condition


If the distance between each pair of objects in a
phylogenetic tree (or any other tree with nonnegative
branch lengths) is taken as the length (sum of the
branch lengths) of the path between them, then these
distances will satisfy the four-point condition. These
distances do not depend on the choice of the root or
even on whether the phylogenetic tree is rooted.
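The four-point condition can be checked by brute force on a small distance matrix. In this sketch the function name and both example matrices are hypothetical: the first matrix holds path lengths on a four-leaf tree, and the second holds shortest-path lengths on a 4-cycle, a weighted graph that is not a tree and therefore can violate the condition.

```python
from itertools import product


def satisfies_four_point(d):
    """Check d[i][j] + d[k][l] <= max(d[i][k] + d[j][l], d[i][l] + d[j][k])
    for every quadruple of indices. Comparisons are exact, so use integer
    distances (or add a tolerance for floating-point data)."""
    n = len(d)
    for i, j, k, l in product(range(n), repeat=4):
        if d[i][j] + d[k][l] > max(d[i][k] + d[j][l], d[i][l] + d[j][k]):
            return False
    return True


# Path lengths on a hypothetical unrooted tree ((A,B),(C,D)) with branch
# lengths A-u = 1, B-u = 2, u-v = 3, v-C = 1, v-D = 2.
tree = [
    [0, 3, 5, 6],  # A
    [3, 0, 6, 7],  # B
    [5, 6, 0, 3],  # C
    [6, 7, 3, 0],  # D
]

# Shortest-path lengths on a 4-cycle a-b-c-d-a with unit edges (not a tree).
cycle = [
    [0, 1, 2, 1],
    [1, 0, 1, 2],
    [2, 1, 0, 1],
    [1, 2, 1, 0],
]

print(satisfies_four_point(tree))   # True
print(satisfies_four_point(cycle))  # False
```

For the cycle, taking i = a, j = c, k = b, l = d gives 2 + 2 on the left but only max(1 + 1, 1 + 1) = 2 on the right.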


See also: Metrics; Trees 


Metric, Manhattan 
The Manhattan distance between two points i and j in
a p-dimensional space is computed as:


$d_{ij} = \sum_{k=1}^{p} |x_{ik} - x_{jk}|$


where $x_{ik}$ and $x_{jk}$ are the kth coordinates of the ith
and jth points, respectively. This coefficient is a
distance coefficient because it satisfies the requirements
of a metric. It is also called the city-block or L1
distance, since it is the distance induced by the L1 norm.
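The formula translates directly into code: sum the absolute coordinate differences over all p dimensions. The function name is illustrative.

```python
def manhattan(x, y):
    """Manhattan (L1) distance: sum over coordinates k of |x_k - y_k|."""
    if len(x) != len(y):
        raise ValueError("points must have the same dimension p")
    return sum(abs(a - b) for a, b in zip(x, y))


print(manhattan((1, 2, 3), (4, 0, 3)))  # 5  (= 3 + 2 + 0)
```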


See also: Metrics 


Metric, Ultra 
A distance coefficient that satisfies the following
condition (in addition to those of a metric) for all
objects i, j, and k defines an ultrametric space:


$\delta_{ij} \le \max\{\delta_{ik}, \delta_{jk}\}$    ultrametric condition


If the distances between pairs of objects in a
phenogram or a hierarchical classification are taken as
the dissimilarity level at which the groups they belong
to first join, then these distances will satisfy the
ultrametric condition. In a phylogenetic tree with all
tips at the same level, if the distance between a pair of
objects is defined as the dissimilarity level of their
most recent common ancestor, then these distances will
also satisfy the ultrametric condition.
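A brute-force check of the ultrametric condition is sketched below; the function name and matrices are hypothetical. The first matrix records the levels at which groups join in a small phenogram; the second holds path lengths among three tips of a tree with unequal branch lengths, which form a valid metric but not an ultrametric.

```python
from itertools import permutations


def satisfies_ultrametric(d):
    """Check d[i][j] <= max(d[i][k], d[j][k]) for all distinct objects i, j, k."""
    n = len(d)
    return all(
        d[i][j] <= max(d[i][k], d[j][k])
        for i, j, k in permutations(range(n), 3)
    )


# Hypothetical phenogram: A and B join at level 2; C joins them at level 5.
phenogram = [
    [0, 2, 5],
    [2, 0, 5],
    [5, 5, 0],
]

# Path lengths among three tips of an unrooted tree with unequal branches.
tree_paths = [
    [0, 3, 5],
    [3, 0, 6],
    [5, 6, 0],
]

print(satisfies_ultrametric(phenogram))   # True
print(satisfies_ultrametric(tree_paths))  # False
```

In the second matrix, the distance between the second and third objects (6) exceeds the larger of their distances to the first object (max(3, 5) = 5).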


See also: Metrics; Trees 


Metrics 
A measure, $\delta_{ij}$, of the dissimilarity between objects
i and j is called a distance coefficient if it satisfies the
following four conditions for all objects i, j, and k:


$\delta_{ij} \ge 0$    positivity condition
$\delta_{ii} = 0$    identity condition
$\delta_{ij} = \delta_{ji}$    symmetry condition
$\delta_{ij} \le \delta_{ik} + \delta_{jk}$    triangle inequality condition


In such cases one can visualize a space, called a
pseudometric space, in which objects, such as i and j,
correspond to points and $\delta_{ij}$ corresponds to the
distance between them. If, in addition, the following
condition is satisfied, then the distance coefficient
defines a metric space:


if $i \ne j$, then $\delta_{ij} > 0$    definiteness condition


This requires that no two different objects be
identical. While this condition can easily be violated
in small data sets, one assumes that no two objects will
be identical if a sufficiently long sequence or enough
descriptive variables are obtained.
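For a finite set of objects, the conditions above can be tested directly on a square dissimilarity matrix. This is a minimal sketch with a hypothetical function name, using exact comparisons (suitable for integer data).

```python
from itertools import product


def classify_distance(d):
    """Return 'metric', 'pseudometric', or None for a square matrix d."""
    idx = range(len(d))
    if any(d[i][j] < 0 for i, j in product(idx, idx)):
        return None  # positivity fails
    if any(d[i][i] != 0 for i in idx):
        return None  # identity fails
    if any(d[i][j] != d[j][i] for i, j in product(idx, idx)):
        return None  # symmetry fails
    if any(d[i][j] > d[i][k] + d[j][k] for i, j, k in product(idx, idx, idx)):
        return None  # triangle inequality fails
    if all(d[i][j] > 0 for i, j in product(idx, idx) if i != j):
        return "metric"  # definiteness also holds
    return "pseudometric"


print(classify_distance([[0, 1, 2], [1, 0, 1], [2, 1, 0]]))  # metric
print(classify_distance([[0, 0, 1], [0, 0, 1], [1, 1, 0]]))  # pseudometric
print(classify_distance([[0, 1, 5], [1, 0, 1], [5, 1, 0]]))  # None
```

In the second matrix, objects 0 and 1 are indistinguishable (distance 0), so only the pseudometric conditions hold; the third matrix violates the triangle inequality (5 > 1 + 1).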


See also: Metric, Four Point; Metric, Manhattan; 
Metric, Ultra 
Classifications of Leukemia
From 1986 to 1990, several related international
collaborative groups formulated a number of
classifications of acute myeloid leukemia (AML), acute
lymphoblastic leukemia (ALL), the myelodysplastic 
syndromes, and the chronic lymphoproliferative dis- 
orders. These classifications are based on morphology, 
immunophenotype, and cytogenetics (MIC). The 
morphological classification adopted was that of the 
French-American-British (FAB) group. The MIC 
classifications resulted from the recognition that im- 
munophenotyping was essential for the diagnosis of 
some subtypes of leukemia and also that recurring 
clonal cytogenetic abnormalities permitted the iden- 
tification of specific disease entities with a greater 
degree of precision than was possible on the basis of 
cytology, cytochemistry, and immunophenotype. The 
application of cytogenetic analysis confirmed that the 
FAB group’s M3 AML and L3 ALL were indeed 
specific biological entities and, furthermore, con- 
firmed that the FAB group were correct in assigning 
hypergranular and hypogranular/microgranular pro- 
myelocytic leukemia to the same category, since they 
showed the same recurring cytogenetic abnormality. 
Cytogenetic analysis also permitted the recognition 
of entities that did not constitute FAB categories, 
such as AML associated with t(8;21)(q22;q22) (usually 
M2 AML), AML associated with inv(16)(p13q22) or 
t(16;16)(p13;q22) (often M4Eo AML) and AML asso- 
ciated with t(1;22)(p13;q13) (usually M7 AML in 
infants and young children). The application of the 
MIC principles of classification led to both scientific 
and practical advances. For example, the recognition 
of AML associated with t(8;21) and inv(16) was im- 
portant not only because it led to new knowledge as to 
mechanisms of leukemogenesis but also because re- 
cognition of the relatively good prognosis of these 
MIC categories meant that unnecessarily intensive 
treatment was not given to these patients. There are, 
however, some MIC categories that are likely to be 
heterogeneous, rather than representing genuine bio- 
logical entities. These include AML associated with 
deletion of the short arm of chromosome 12, and 
B-lineage or T-lineage ALL associated with deletion 
of the long arm of chromosome 6. 

The need to incorporate molecular genetic information
into the classification of hematological neoplasms led
to the proposal, in 1998, of an MIC-M
classification of AML and ALL. The MIC-M classifi- 
cation is a refinement of the MIC classification, being 
based on morphology, immunophenotype, cyto- 
genetic analysis, and molecular genetic analysis 
(MIC-M). This classification recognizes that it is the 
nature of the underlying molecular events that deter- 
mines the characteristics of any neoplasm and that 
identifying the genetic changes that have occurred 
will therefore permit more precise and scientifically 
accurate diagnosis. Furthermore, at a practical level, 
there are some leukemia-associated chromosomal 
rearrangements that can be defined only by molecular 
genetic analysis, either because the banding pattern of 
the chromosomes concerned is not sufficiently dis- 
tinctive to permit recognition of an abnormality 
or because the rearrangement has occurred at a sub- 
microscopic level. The former is the case with 
t(12;21)(p12;q22) associated with B-lineage ALL, 
whereas the latter is the case with a deletion upstream 
of the TAL gene, associated with T-lineage ALL. In 
addition, cytogenetic analysis may fail so that the 
application of molecular genetic analysis will permit 
the accurate diagnosis of more cases of acute leukemia 
than if reliance is placed only on morphology, immu- 
nophenotype, and cytogenetics. An example of the 
value of the MIC-M approach can be seen in relation 
to acute hypergranular promyelocytic leukemia (M3 
AML) and related disorders. M3 and M3 variant AML 
represent a single MIC and MIC-M category, since 
they show the same cytogenetic and molecular genetic 
abnormality, leukemogenic mechanism, and respon- 
siveness to treatment. However, M3-like AML asso- 
ciated with t(11;17)(q23;q21) shows subtle differences 
from M3/M3 variant AML. In the latter there is a
PML-RARα fusion gene, whereas in the former there is a
PLZF-RARα fusion gene; this distinction is of practical
as well as scientific significance, since M3/M3 variant 
AML responds to differentiating therapy with all- 
trans-retinoic acid, whereas M3-like AML does not. 


See also: FAB Classification of Leukemia; 
Leukemia; Leukemia, Acute; Leukemia, Chronic; 
MLL; WHO Classification of Leukemia 
Microarray technology is a powerful technique used
to compare differences in gene expression between
two mRNA samples. Comparing RNA prepared
from diseased cells and normal cells can lead to the
identification of sets of genes that play key roles in
diseases. Genes that are overexpressed or
underexpressed in the diseased cells often present
excellent targets for therapeutic drugs. The process uses
microarray chips, prepared commercially, which
comprise numerous wells, each of which contains an
isolated gene. mRNA is extracted from the 'normal'
sample, and a fluorescently labeled cDNA probe is
generated, representing all of the genes expressed in the
reference sample. A second cDNA probe is generated
using a different-colored fluorescent label and mRNA
extracted from the 'affected' cells. These may be cells
exposed to a drug or toxic substance, cells taken from
a tumor or diseased patient, or cells removed at a
different time from the 'normal' sample. The two
fluorescent probe samples are simultaneously applied to
a single microarray chip, where they competitively
hybridize to the arrayed cDNA molecules. Each well of
the microarray is scanned for the fluorescence intensity
of each probe, which is proportional to the expression
level of that gene in the sample. The ratio of the two
fluorescent intensities provides a highly accurate and
quantitative measurement of the relative gene
expression level in the two cell samples.
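Two-color ratios are commonly analyzed on a log2 scale, so that a twofold increase (+1) and a twofold decrease (-1) are symmetric. The sketch below assumes background-corrected intensities and skips the between-dye normalization that real analyses need; the gene names, intensities, and the twofold threshold are hypothetical.

```python
import math


def log2_ratios(affected, normal):
    """Per-gene log2(affected / normal) intensity ratios."""
    return {gene: math.log2(affected[gene] / normal[gene]) for gene in normal}


def flag_changed(ratios, fold=2.0):
    """Genes whose expression differs by at least `fold` in either direction."""
    cutoff = math.log2(fold)
    return {
        gene: ("up" if r > 0 else "down")
        for gene, r in ratios.items()
        if abs(r) >= cutoff
    }


# Hypothetical background-corrected intensities for three genes.
normal = {"geneA": 1000.0, "geneB": 800.0, "geneC": 500.0}
affected = {"geneA": 4000.0, "geneB": 820.0, "geneC": 125.0}

ratios = log2_ratios(affected, normal)
print(flag_changed(ratios))  # {'geneA': 'up', 'geneC': 'down'}
```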


See also: cDNA; Cell Markers: Green Fluorescent 
Protein (GFP) 
The genetics of bacteria and bacteriophages played a
key role in the development of molecular biology and 
our knowledge of the flow of genetic information in 
biological systems. Microbial genetics also provides 
tools for dissecting many other biological processes. 
Throughout the twentieth century, particularly the 
1940s, these simple organisms provided powerful 
experimental systems for the study of mutation, 
inheritance, the structure of the gene, control of gene 
expression, and the genetic basis of fundamental cel- 
lular processes such as intermediary metabolism and 
DNA recombination and repair. In addition, because 
many of the microbial experimental systems are 
pathogens, microbial genetics has provided a powerful 
approach to the understanding of infectious diseases. 
The best understood microbial systems, Escherichia 
coli, Salmonella typhimurium, Bacillus subtilis, and 
their bacteriophages, were the primary models for
these studies and our deep understanding of their biol- 
ogy derives from the range of genetic manipulations 
that have been developed in these organisms. Today 
these genetic tools have been extended to other bacteria 
that were previously much less tractable to study. 


The Power of Microbial Genetics 


A unique aspect of genetic analysis of microbes is its
extremely high resolving power. A culture of bacteria
contains over $10^{9}$ organisms per milliliter, and it is
easy to observe rare genetic events (mutation,
recombination, rearrangements, gene transfer) that
occur at frequencies of $10^{-10}$ per cell or lower. With
bacteriophages, events at least 10-fold rarer than this
are observable. This allows extremely fine dissection of
genes, an important goal in genetic analysis. In
addition, the ease of growing bacteria (E. coli has a
doubling time of about 30 min in nutrient-rich media)
and bacteriophages (a typical growth cycle is about
1 h) and of quantitating their numbers (by measuring
the turbidity of a bacterial culture or by counting
bacterial colonies or bacteriophage plaques on agar
medium in petri dishes) provides the basic elements for
extremely powerful genetic analysis techniques.
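A back-of-the-envelope calculation shows why such rare events are accessible. The sketch assumes independent events (a Poisson model), so the chance of seeing at least one event among N cells is 1 - exp(-N f); the function names and the 95% target are illustrative choices, not from the source.

```python
import math


def cells_needed(frequency, confidence=0.95):
    """Cells to screen so that P(at least one event) >= confidence,
    assuming events occur independently (Poisson model)."""
    return math.ceil(-math.log(1.0 - confidence) / frequency)


def doublings_to_reach(n_cells, start=1):
    """Number of doublings for a culture to grow from `start` to `n_cells`."""
    return math.log2(n_cells / start)


n = cells_needed(1e-10)
print(f"{n:.2e} cells, about {n / 1e9:.0f} mL of culture at 1e9 cells/mL")
hours = doublings_to_reach(1e9) * 30 / 60  # 30-min doubling time
print(f"about {hours:.0f} h to grow from one cell to 1e9 cells")
```

Roughly 3 x 10^10 cells (about 30 mL of a dense culture) suffice to observe a 10^-10 event with 95% probability, and a single cell can produce a billion descendants in well under a day at a 30-min doubling time.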


The Activities of Microbial Genetics 


Experiments in microbial genetics can be classified
into sets of activities that apply to the study of any
biological process. The five most important activities
are generating mutations, genetic mapping,
complementation, suppression studies, and epistasis
analysis. These can be applied to the whole genome of
an organism or to specific genes. As an example of
analysis applied to the whole genome, a genetic study of
DNA repair in a bacterium would start with
mutagenesis of the bacterium followed by screening for
mutants that are hypersensitive to UV irradiation
because of a defect in a gene encoding a DNA repair
enzyme. Subsequent analysis would involve the other
activities and would be applied to the whole genome,
since there are numerous DNA repair genes located
around the bacterial chromosome. On the other hand,
to study a specific gene involved in DNA repair,
mutagenesis would be directed at that gene (targeted or
localized mutagenesis), and the other activities would
likewise focus on that gene. This latter situation is
sometimes called fine structure genetics, since it
dissects the gene into its components, such as short
control sequences needed for gene expression or
functional domains of the gene product. With modern
techniques, fine structure genetics can be carried to the
ultimate level of mutating every codon in a gene (an
example of saturation mutagenesis) so that each amino
acid in the protein gene product is replaced by each of
the other amino acids.
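At the protein level, saturation mutagenesis of a protein of length L yields 19 x L single-substitution variants. The sketch below enumerates them for a hypothetical three-residue peptide; it works on amino acids only and does not model the codon-level design (for example, degenerate codons) used in practice.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids (one-letter codes)


def single_substitutions(protein):
    """Yield (label, variant) for every single amino acid substitution,
    e.g. 'K2A' means the lysine at position 2 is replaced by alanine."""
    for pos, wt in enumerate(protein, start=1):
        for aa in AMINO_ACIDS:
            if aa != wt:
                variant = protein[:pos - 1] + aa + protein[pos:]
                yield f"{wt}{pos}{aa}", variant


variants = dict(single_substitutions("MKT"))  # hypothetical 3-residue peptide
print(len(variants))    # 57, i.e. 19 substitutions at each of 3 positions
print(variants["K2A"])  # MAT
```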


Genetics Beyond E. coli 


Most microbial genetic techniques were developed in 
a few workhorse bacteria. E. coli became an important 
genetic system because the original strain (K12) that 
was studied fortuitously carried the F episome, which 
allowed conjugal gene transfer, important for genetic 
mapping, constructing strains with multiple muta- 
tions, and complementation. In addition, E. coli K12 
was lysogenic for bacteriophage lambda, and the dis- 
covery of this phage provided another important sim- 
ple system for study. Other important genetic systems 
associated with E. coli also became targets for study, 
such as other bacteriophages (e.g., T4), plasmids (e.g., 
ColE1), and transposable elements. Over time, the
concentrated effort on E. coli yielded many other tools,
such as bacteriophage P1 for generalized transduction 
(used for gene transfer), modified versions of plasmids 
and phage lambda as vectors for recombinant DNA 
techniques, and transposons for mutagenesis, to name 
just a few. The application of these genetic tools to 
other bacteria is an important development for micro- 
bial genetics, since it allows processes in pathogenesis, 
physiology, and other areas not present in E. coli to be 
studied. However, genetic tools often do not transfer
well to other important bacteria. For instance, most
bacterial species cannot be infected by E. coli phages,
E. coli plasmids may not replicate in other hosts,
selectable markers used in E. coli may not work, or
growth requirements may not be compatible with
genetic methodology in other bacteria. In some
cases, fine structure genetics or parts of the analysis 
can be performed on foreign genes that have been 
cloned into E. coli, and E. coli becomes a surrogate 
host for the analysis. However, in other instances the 
methods must be performed in the authentic host. It is 
sometimes possible to find genetic tools (plasmids, 
transposons, phages) that are native to these other 
organisms. In addition, a limited number of the plas- 
mids and transposons used in E. coli have an extended 
host range and will function in other bacteria. These 
broad host range elements are extremely useful and 
allow a reasonable subset of microbial genetic ap- 
proaches to be applied to other organisms. 


Mutagenesis and Phenotype 


Theory and Goals 
All genetic analysis starts with a phenotype, the
observable trait that is followed in each manipulation.


Phenotypes can be natural characteristics such as the
ability to grow on a nutrient like lactose (the Lac+
phenotype), tested by using a minimal medium that
has lactose as the carbon source and few other
growth-promoting compounds. Phenotypic traits can
also be artificial, such as the ability to cause a color
change in a compound that is introduced into the
medium, is unnecessary for growth, and is probably
never encountered by the bacterium in nature. In the
case of lactose utilization, compounds are used that
undergo color changes either because of the acid
produced during lactose metabolism (pH indicators) or
because of their own degradation by an enzyme of the
lactose system (dyes). This allows one to observe which
cells can use lactose by visual inspection of the color of
colonies on agar in a petri dish. For example, the
colorless compound X-gal
(5-bromo-4-chloro-3-indolyl-β-D-galactoside) is
converted into an insoluble blue indigo dye by the
action of the lac system: thus colonies of Lac+ cells are
blue while Lac− colonies are colorless. Such simple
phenotypes are used to screen for mutants (e.g., Lac−
mutants) after mutagenesis, either by visual inspection
of colonies on plates or by testing individual colonies
for their ability to grow on appropriate minimal media
by replica plating.
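The logic of a replica-plating screen can be sketched as a set comparison: colonies that grow on the rich master plate but fail to appear on the replica made on lactose minimal medium are candidate Lac− mutants. The colony coordinates below are hypothetical, and the sketch assumes every colony transfers successfully to the replica.

```python
def candidate_mutants(master_plate, lactose_plate):
    """Colonies present on the master plate but absent from the replica on
    lactose minimal medium (candidate Lac- mutants)."""
    return sorted(master_plate - lactose_plate)


# Hypothetical colony positions (grid coordinates) recorded from each plate.
master = {(1, 1), (1, 2), (2, 1), (2, 2), (3, 3)}
lactose_minimal = {(1, 1), (2, 1), (2, 2)}

print(candidate_mutants(master, lactose_minimal))  # [(1, 2), (3, 3)]
```

Candidates found this way are then picked from the master plate, which still carries living cells of every colony.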

In addition to screening, selection for mutants with 
a desired phenotype can be performed if there is a way 
to inhibit growth of the wild-type cells. For the lac 
system there are compounds whose degradation by a
lactose utilization enzyme yields products that are
toxic to the cell. Thus when a culture of cells is plated
on agar containing such a compound, only the Lac−
cells will grow and form colonies, allowing direct 
selection of mutants. 


Tools and Types of Mutations 
When a genetic selection is used, often the spontan- 
eous mutation frequency is high enough to provide 
enough mutants for analysis. However, a wide variety 
of mutagenesis techniques are available to increase 
the frequency of mutations. These include treatments 
with chemical mutagens or radiation resulting in 
direct alteration of DNA bases and subsequent
mispairing during replication or repair. Sometimes mutator
strains are used in which the normal DNA replication 
and repair processes are aberrant (due to preexisting 
mutations in key genes), resulting in a high spontan- 
eous mutation rate. For cloned genes it is possible 
to create a specific desired mutation in a chemically 
synthesized oligonucleotide which is then used to 
create a new mutant gene with the desired mutation 
(site-directed oligonucleotide mutagenesis). 
Transposable elements (or transposons) provide 
another extremely versatile and useful method for 
mutagenesis. These genetic elements are short DNA
segments carrying genes, with two important properties. First is the
ability to insert into DNA that bears no relation to the 
transposon itself, creating insertion mutations when 
the transposon lands in a gene. Second, transposons 
contain selectable markers such as antibiotic resistance 
genes, which allow the presence of the transposon to 
be identified by the drug-resistant phenotype. These 
two properties allow insertion mutations to be 
selected by introducing a transposon into a cell and 
selecting cells that have become antibiotic resistant. 
Modified transposons are available that have been 
genetically engineered for easy introduction into the 
cell and to carry a range of selectable markers. 

A variety of classes of phenotypes are desirable to 
perform the whole range of genetic analysis that is 
possible in bacteria. An extremely important pheno- 
type is the null phenotype, where there is complete 
loss of function and one is essentially observing the 
properties of a cell lacking the gene of interest. Dele- 
tions are the ideal null mutation but insertions into a 
gene (e.g., with transposons) to create knockout
mutations are often sufficient. Methods that create base
changes allow a whole range of other important muta- 
tional alterations to be isolated: missense, nonsense, 
and frameshift mutations. Among these are various 
conditional phenotypes, such as suppressible non- 
sense and frameshift mutations or temperature- 
sensitive and cold-sensitive phenotypes (due to altered 
activity, stability, or folding of mutant proteins). Base 
change mutations are also essential for studying
regulatory sites, where a gene's expression is turned on
or off.


Mapping 

Theory and Goals 

Following mutagenesis, one has a collection of cells 
with the desired phenotype. How many genes are 
involved in the process under study? What is the rela- 
tion of these gene(s) to other genes in the chromo- 
some? Does altering a single gene cause the phenotype 
or are multiple alterations required? To begin to 
answer these and other questions, the first step is to 
map the location of the mutations in the chromosome. 
The mutated genetic loci can be near each other or far 
apart. For example, the Lac− phenotype can be
caused by mutation of five different genes: the lacZ
(β-galactosidase) gene, the lacY (permease) gene, and
the lacI (repressor) gene, located next to each other at
about 1 o'clock on the circular genetic map; the crp
(cyclic AMP-binding protein) gene, located at 9
o'clock; and the cya (adenylyl cyclase) gene, located at
10 o'clock. The first two genes are required for
degradation and entry of lactose into the cell,
respectively, and their loss of function directly prevents
lactose utilization. The last three genes are all involved in
regulation of the expression of the first two genes 
and their alteration (but not necessarily loss of func- 
tion) impairs production of the lactose utilization 
enzymes, indirectly preventing lactose utilization. 

The key cellular process used in genetic mapping is 
homologous recombination: The ability of the cell to 
take DNA of nearly identical sequence and replace 
one version with the other. If a cell has a mutation in 
the lacZ gene, rendering it Lac−, and a wild-type copy
of lacZ is introduced, homologous recombination can
lead to the replacement of the mutant gene by the
wild-type one, converting the cell from Lac− to
Lac+. This process does not usually replace just a
single gene, but involves the replacement of the
chunk of the chromosome that includes the lacZ
gene. Thus by monitoring what other genes are
replaced when the cell is converted from Lac− to
Lac+, one can identify what region of the
chromosome contains the mutated gene. This type of linkage
analysis is the basis for mapping methods in virtually 
all organisms. More sophisticated and detailed ver- 
sions of this analysis (fine structure analysis) can 
zoom in on the specific location of a mutation within 
a gene or between genes. 


Tools 

In all cases, genetic mapping involves the transfer of 
genes between two organisms that differ in the pheno- 
type under study. Gene transfer is accomplished by 
one of three routes: transformation, transduction, or 
conjugation. In transformation, DNA from one 
organism is isolated and directly added to the other 
organism. Usually there is some treatment of the re- 
cipient organism to make it ‘competent’ to take up 
pure DNA. In transduction, a bacteriophage is used 
to carry genes from one host to another. During infec- 
tion some bacteriophages, such as P1 and P22, pro- 
duce a small number of phage particles that contain 
pieces of the host chromosome instead of bacterioph- 
age DNA. These transducing particles introduce this 
host DNA into the next bacterium that they infect, 
providing a simple method for gene transfer. Finally, 
in conjugation DNA is transferred directly from one 
bacterium to another by a process that certain plas- 
mids use to transfer themselves between hosts. This 
involves genes encoded on the plasmid
that both create a channel between the donor and 
recipient bacteria and also direct the movement of 
DNA between the two bacteria. While all methods 
can be used to map mutations by linkage analysis, 
conjugation also provides a novel mapping method. 
In conjugation the transfer of the chromosome from 
the donor begins at a unique point, and genes are 
transferred sequentially. Thus one can determine the
time of entry of the wild-type gene relative to the
origin of transfer to map its location.
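Time-of-entry mapping amounts to sorting markers by when they first appear in recipients. Assuming transfer proceeds at a roughly constant rate, differences in entry time approximate map distances, which is why the E. coli genetic map is conventionally expressed in minutes. The gene names and entry times below are hypothetical.

```python
def order_by_entry(entry_minutes):
    """Order genes by time of entry (minutes after mating began) and report
    each gene's gap from the first marker, a proxy for map distance
    assuming a constant transfer rate."""
    ordered = sorted(entry_minutes.items(), key=lambda item: item[1])
    first_time = ordered[0][1]
    return [(gene, t, t - first_time) for gene, t in ordered]


# Hypothetical times of entry for markers transferred from one donor.
times = {"geneX": 18.0, "geneY": 9.0, "geneZ": 25.5}
for gene, t, gap in order_by_entry(times):
    print(f"{gene}: enters at {t} min ({gap} min after the first marker)")
```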


Complementation 


Theory and Goals 

While genetic mapping determines the location of 
mutations in the chromosome, it does not define 
genes or functional units. For example, as mentioned 
earlier, the lacZ and lacY genes are adjacent to each 
other in the chromosome. Thus, two mutations, each 
giving a Lac− phenotype, that map to this region could
be in either or both genes. In the absence of any other 
information it is not possible to distinguish between 
these possibilities. However, complementation analy- 
sis can resolve the issue. In a complementation test, the 
two mutated regions of the chromosome are put in 
the same cell, creating a diploid for this region of the 
chromosome (a merodiploid since only a part of 
the chromosome is diploid). Then the phenotype of 
the merodiploid is measured. If it is Lac−, one
concludes that the two mutations are in the same
functional unit (called a cistron), since they do not
complement each other. However, if the phenotype
is Lac+, the mutations must affect different cistrons
since each part of the merodiploid can provide the 
function that the other is missing. 
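When many mutations are tested pairwise, the results can be grouped into cistrons by placing mutations that fail to complement each other in the same group. This sketch (hypothetical names and data) uses a union-find structure and treats failure to complement as transitive; as noted in the text, mutations in different domains of one gene can sometimes complement, so real data need care.

```python
def cistrons(mutations, fails_to_complement):
    """Group recessive mutations into complementation groups (cistrons).

    fails_to_complement: pairs (a, b) whose trans merodiploid showed the
    mutant phenotype; such mutations are placed in the same group.
    """
    parent = {m: m for m in mutations}

    def find(m):
        while parent[m] != m:
            parent[m] = parent[parent[m]]  # path halving
            m = parent[m]
        return m

    for a, b in fails_to_complement:
        parent[find(a)] = find(b)

    groups = {}
    for m in mutations:
        groups.setdefault(find(m), []).append(m)
    return sorted(sorted(g) for g in groups.values())


# Hypothetical Lac- mutations and the trans tests that gave a Lac- merodiploid.
muts = ["m1", "m2", "m3", "m4"]
non_complementing = [("m1", "m3"), ("m2", "m4")]
print(cistrons(muts, non_complementing))  # [['m1', 'm3'], ['m2', 'm4']]
```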

There are a few controls that are done for a com- 
plementation test. First, each mutation must be reces- 
sive. That is, if it is in a merodiploid with a wild-type 
region of the chromosome, the resulting phenotype 
must be wild-type. Second, the complementation test 
is carried out in two parts, one with each mutation on a
different chromosome in the merodiploid (the trans 
configuration) and one with the two mutations on the 
same chromosome in the merodiploid with the other 
chromosome being wild-type (the cis configuration). 
The term ‘cistron’ derives from these two configurations (cis and trans).
While cistrons defined by the complementation test 
are often individual genes, they need not be. A gene 
encoding a multidomain protein could have two 
mutations in different domains that would comple- 
ment each other. Thus, this is a test for defining func- 
tional units, not purely genes. 


Tools 

The construction of merodiploids in a haploid organ- 
ism is the principal challenge in a complementation 
test. With the advent of gene cloning, this has become 
much more straightforward than when the
complementation test was originally devised. One typically
has one mutation in the cell’s chromosome and the 
other in a region of the chromosome that is cloned 
in a plasmid or bacteriophage vector. In the case of 
a plasmid, the other mutation is maintained as an
extrachromosomal element, while bacteriophage
vectors are usually integrated into the cell’s chromo- 
some at the phage attachment site. 


Reversion and Second-Site Suppression 


Reversion of a mutation refers to a second mutational
event that changes the phenotype back to its original
state. Thus a mutant strain with a mutation in the lacZ
gene that causes a Lac⁻ phenotype can be reverted to
the wild-type Lac⁺ phenotype by a second round of
mutagenesis. One type of reversion event changes the
mutated base pair back to the original wild-type base
pair; this is called a true revertant. However, mutations
at other positions (second-site revertants) can also
reverse the phenotype. A second mutation in lacZ might
change the amino acid sequence of the gene product,
β-galactosidase, so that the combination of the two
altered amino acids is now a functional protein. Thus
the second amino acid change suppresses the defect
caused by the first change. Similarly, if the first
mutation reduced (but did not eliminate) enzyme
activity, a second mutation that increased expression
of the lacZ gene could suppress the Lac⁻ phenotype by
producing more of the weaker enzyme. These are
examples of second-site suppressors.

Second-site suppression is most important when it 
is used to find new genes that influence a biological 
process. For example, the LamB and MalE proteins are
involved in uptake of maltose into E. coli. These pro- 
teins must be exported to the cell surface to function. 
Mutations in these genes were isolated that were 
defective because they altered the signal sequence, 
the portion of the protein that is recognized by the 
cell’s secretory apparatus. Second-site suppressors 
that restored function included mutations in the 
genes encoding parts of the secretory apparatus. 
These mutations reduced the specificity of the secre- 
tion system, allowing recognition of the altered signal 
sequences. This allowed new components of the secre- 
tion system to be identified. 


Epistasis and Pathway Analysis 


When a mutation in one gene is epistatic to a mutation 
in a second gene, it masks the phenotype of the second 
mutation when a doubly mutant strain is constructed. 
This is important in determining if the two genes 
affect the same or different pathways, and the order 
in which they act in the pathway. For example, in yeast 
both the ade1 and ade4 mutants are unable to synthe- 
size adenine. In the ade1 mutant the colonies are red 
because of the accumulation of a biosynthetic inter- 
mediate at the blocked step, while in the ade4 mutant 
the colonies are white, like wild-type, because this 
intermediate does not accumulate. The double mutant
ade1 ade4 forms white colonies, indicating that ade4 is
epistatic to ade1 and that the ade4 mutation blocks the
adenine pathway before the step that ade1 affects.
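The reasoning behind this epistasis result can be modeled as a linear pathway in which flux stops at the first blocked step. In this hedged sketch the pathway is reduced to the two steps discussed above, in the order inferred from the double mutant; the intermediate labels and the rule that only the substrate of the ade1 step gives red colonies are simplifying assumptions, not a description of the full adenine pathway.

```python
def accumulated_intermediate(pathway, mutant_genes):
    """Return the substrate of the first blocked step, or None if unblocked.

    pathway is an ordered list of (gene, substrate) pairs; each gene converts
    its substrate into the substrate of the next step.
    """
    for gene, substrate in pathway:
        if gene in mutant_genes:
            # Flux stops here, so any block further downstream is masked.
            return substrate
    return None


def colony_color(pathway, mutant_genes, pigmented):
    """Colonies are red only if the accumulated intermediate is pigmented."""
    intermediate = accumulated_intermediate(pathway, mutant_genes)
    return "red" if intermediate in pigmented else "white"


# Hypothetical two-step model: ade4 acts before ade1.
PATHWAY = [("ade4", "early_intermediate"), ("ade1", "red_precursor")]
PIGMENTED = {"red_precursor"}

for genotype in [set(), {"ade1"}, {"ade4"}, {"ade1", "ade4"}]:
    print(sorted(genotype) or "wild type", colony_color(PATHWAY, genotype, PIGMENTED))
```

Swapping the order of the two steps in PATHWAY would make the double mutant red, which is why a white double mutant places ade4 upstream of ade1.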

A related example involves two mutations in DNA
repair genes, each giving a phenotype of increased
sensitivity to UV radiation. If the two gene products
are in different pathways, then a strain that is mutant
for both genes will be defective in both pathways and
will be more UV-sensitive than either mutant alone.
However, if they are in the same pathway, the double
mutant will be no more UV-sensitive than the more
sensitive of the two single mutants.
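This comparison can be phrased as a simple rule on survival fractions measured at one UV dose. A minimal sketch, assuming survival values between 0 and 1 from the same experiment; the 10% tolerance is an arbitrary illustrative threshold, and real analyses would use full dose-response curves and replicate measurements.

```python
def repair_relationship(survival_a, survival_b, survival_ab, tolerance=0.1):
    """Classify two UV-sensitive mutations from survival fractions at one dose.

    survival_a, survival_b: single mutants; survival_ab: double mutant.
    tolerance: hypothetical allowance for experimental noise (as a fraction).
    """
    worst_single = min(survival_a, survival_b)
    if survival_ab < worst_single * (1 - tolerance):
        # More sensitive than either single mutant: two independent
        # repair routes have been lost.
        return "different pathways"
    # No more sensitive than the worse single mutant: the second mutation
    # disables a pathway that is already blocked.
    return "same pathway"


print(repair_relationship(0.10, 0.05, 0.002))
```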


Analysis of Regulons and Regulatory Circuits


Theory and Goals 

Microbial genetics provides powerful tools for de- 
ciphering the regulation, as well as the functional and 
pathway organization, of cellular processes. This 
involves both discovering the regulatory genes and
sites that control individual gene expression and
determining which genes are coregulated and thus
likely to participate in the same process. Often genes 
that are coregulated are located next to each other in 
the same transcriptional unit (an operon) but there are 
numerous cases of dispersed sets of genes that are 
coregulated (regulons). 


Tools 

Gene fusions are the traditional genetic tool for study- 
ing regulation. The most popular approach is to con- 
struct a hybrid gene (gene fusion) using a truncated 
lacZ gene that contains the coding sequence for the
β-galactosidase enzyme but lacks signals for initiating
transcription and sometimes also lacks its translational 
start signals. In the hybrid, the signals from a gene of 
interest are placed immediately before the truncated 
lacZ gene, so that the regulators of the gene of interest 
will now control lacZ expression. The colorimetric 
screens described above for the lac system can now 
be employed to study the desired regulatory system. 
Mutations can be isolated that increase or decrease 
lacZ activity, and these mutations will be in the regu- 
lators of the gene of interest. 

Identifying coregulated genes can also employ lacZ 
fusions as a reporter system. In this case, lacZ gene 
fusions are constructed randomly throughout the bac- 
terial chromosome, usually with the aid of a transpos- 
able element that can create a lacZ fusion when it 
inserts in a gene in the correct orientation. Each fusion 
strain grows up as a colony and expression is then 
compared under desired conditions (e.g., with or 
without a DNA-damaging treatment for studying 
regulation of DNA repair gene expression) using a
colorimetric indicator plate to assess whether expres- 
sion of each fusion increases, decreases, or is un- 
changed. By this method exhaustive screening of all 
genes can be achieved to identify those that show 
common regulatory patterns. 
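Scoring such a screen amounts to comparing each fusion's expression with and without the treatment. The sketch below assumes quantitative lacZ activities (for example from enzyme assays) rather than plate colors; the twofold cutoff, the fusion IDs, and the function names are hypothetical.

```python
def classify_fusions(activities, fold_cutoff=2.0):
    """Classify each fusion as increased, decreased, or unchanged.

    activities maps a fusion ID to (untreated, treated) lacZ activity;
    untreated activity must be positive.
    """
    calls = {}
    for fusion, (untreated, treated) in activities.items():
        ratio = treated / untreated
        if ratio >= fold_cutoff:
            calls[fusion] = "increased"
        elif ratio <= 1 / fold_cutoff:
            calls[fusion] = "decreased"
        else:
            calls[fusion] = "unchanged"
    return calls


def group_by_response(calls):
    """Group fusions sharing a response: candidate coregulated genes."""
    groups = {}
    for fusion, call in calls.items():
        groups.setdefault(call, []).append(fusion)
    return {call: sorted(fusions) for call, fusions in groups.items()}


calls = classify_fusions({"fus1": (10, 80), "fus2": (50, 10), "fus3": (20, 25)})
print(group_by_response(calls))
```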


Genomics 


No discussion of microbial genetics would be accept- 
able without some mention of microbial genomics and 
its influence on the field. Since the first complete
bacterial genome sequence was published in 1995,
dozens of others have appeared. Armed with the
complete DNA sequence, the microbial geneticist can now
perform many of the tasks described above in a
broader yet more efficient manner. For example, it is
now possible to create knockout mutations in every
(nonessential) gene in an organism, since the sequence
allows directed mutations to be constructed. Mapping
of mutations to the nucleotide can be performed with 
much greater ease because of the availability of com- 
plete reference (wild-type) sequences. And studies of 
regulation can now be performed with DNA arrays 
that measure the changes in transcription of every 
gene in the chromosome at the same time. These and 
other future developments based on genomics will 
continue to extend the power of microbial genetics. 
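At its simplest, comparing a mutant sequence with the wild-type reference reduces to listing mismatched positions. This sketch assumes the two sequences are already aligned and of equal length (base substitutions only, no insertions or deletions); real mapping of sequencing reads requires an alignment step that is not shown.

```python
def substitutions(reference, mutant):
    """List (position, reference_base, mutant_base) using 1-based positions."""
    if len(reference) != len(mutant):
        raise ValueError("sequences must be aligned and of equal length")
    return [
        (position, ref_base, mut_base)
        for position, (ref_base, mut_base) in enumerate(zip(reference, mutant), start=1)
        if ref_base != mut_base
    ]


print(substitutions("ATGGCTAAA", "ATGACTAAG"))
```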
In 1995 the first complete genome sequence and annotation
of a free-living organism, Haemophilus influenzae,
was completed. This accomplishment ushered in the
genomic era for microbiology. Currently the range of 
genome sequencing projects includes representatives 
from all three domains of life, and provides good 
coverage of most major groupings within the Archaea 
and Eubacteria. However, sequencing projects are
disproportionately concentrated on well-studied groups
(e.g., γ-proteobacteria and the low-GC gram-positive
bacteria), while other groupings, such as the
Crenarchaeota, are very underrepresented. Another way to
consider the type of microorganism being sequenced 
is by their ecological role. Considered in this manner, 
pathogenic bacteria and microorganisms from extreme 
environments are well represented in current genome 
sequencing efforts. However, organisms of agricultural
significance and difficult-to-culture organisms
are currently relatively poorly represented, but with
the increasing rate of genome sequencing, it is
anticipated that these deficiencies will be temporary.

The diversity in the representative organisms allows 
for comparative studies of genome composition and 
gene organization within and across the domains. 
Insight has been gained into how genes are acquired 
and shared between organisms (Nelson et al., 1999; 
Heidelberg et al., 2000), and the ability of bacteria to 
change their genome composition rapidly by captur- 
ing and maintaining megaplasmids (White et al., 1999; 
Heidelberg et al., 2000). The latter events have been
suggested to increase the competitiveness of Vibrio
cholerae in the aquatic ecosystem.

In these early days of genomics, the scientific community
faces several challenges: keeping up to date with the
remarkable amount of genomic data being released,
determining the most relevant data for a particular
study, and determining how best to apply these data
to a given research question. This article will review
the current status of the field of microbial genomics, 
discuss hypotheses being addressed in environmental 
microbiology by the use of genomic data, and give an 
overview of where the field of genomics is going. 


Microbial Genome Sequencing and Annotation


The random shotgun sequencing method is currently 
the most efficient and cost-effective strategy for com- 
pletion of microbial genomes (Frangeul et al., 1999). 
This approach has successfully been used to completely
sequence microorganisms with varying genomic
characteristics, including variations in genome size
(560 kb to 6.2 Mb), base composition (19% to 67%
G+C), presence of various repeat elements, insertion
sequence (IS) elements, and multiple chromosomal
molecules and plasmids (Fraser et al., 1995,
1997; White et al., 1999; Heidelberg et al.,
2000). In the random shotgun method, total DNA of 
the organism of choice is isolated, randomly sheared, 
size selected, cloned into a plasmid, and the ends of the 
clones are sequenced to give a predetermined level of 
coverage that represents the entire genome. 
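The random character of the method can be illustrated with a small simulation that places reads at random positions on a circular chromosome and measures how much of it is covered. This is a toy sketch: the genome and read lengths are arbitrary, reads are treated as error-free and uniformly distributed, and cloning biases are ignored.

```python
import math
import random


def simulate_shotgun(genome_length, read_length, n_reads, seed=0):
    """Return the fraction of a circular genome covered by at least one read."""
    rng = random.Random(seed)
    covered = bytearray(genome_length)  # one flag per base, 0 = uncovered
    for _ in range(n_reads):
        start = rng.randrange(genome_length)
        for offset in range(read_length):
            # Wrap around the end, as on a circular bacterial chromosome.
            covered[(start + offset) % genome_length] = 1
    return sum(covered) / genome_length


genome_length, read_length, n_reads = 20_000, 500, 320  # eightfold coverage
observed = simulate_shotgun(genome_length, read_length, n_reads)
expected = 1 - math.exp(-n_reads * read_length / genome_length)
print(f"covered: {observed:.4f} (Poisson expectation {expected:.4f})")
```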

The theory for shotgun sequencing is based on the
Lander–Waterman application of the Poisson
distribution (Lander and Waterman, 1988). This model
allows the number of sequencing reactions needed to be
determined from estimates of the total genome size and
the average sequence length obtained from each
individual reaction. However, in practice more
sequence gaps in the genome are likely to occur than
predicted by the model due to repeat areas, secondary 
structures, and unclonable regions in the genome. The 
successful construction of random sequencing li- 
braries, with complete coverage of the genome and few 
‘no-insert’ or chimeric clones, is the most critical step
for generating good representation of the entire
genome during the random sequencing phase.
Once a sufficient number of sequences are generated
(e.g., eightfold sequence coverage of the genome), the
sequences are assembled into contiguous consensus
sequences (contigs) built from the shorter individual
clone sequences.
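Under the Lander–Waterman model, N reads of length L from a genome of size G give an average coverage c = NL/G; the probability that a given base is never sequenced is about e^(-c), and the expected number of contigs is about N·e^(-cσ), where σ = 1 − T/L and T is the overlap required to join two reads. A minimal sketch of these estimates follows; the 1.8 Mb genome and 500-base reads are illustrative assumptions, and repeats, secondary structures, and unclonable regions leave real projects with more gaps than this idealized model predicts.

```python
import math


def reads_needed(genome_length, read_length, coverage):
    """Number of reads N for a target average coverage c = N * L / G."""
    return math.ceil(coverage * genome_length / read_length)


def lander_waterman(genome_length, read_length, n_reads, min_overlap=0):
    """Idealized Lander-Waterman estimates for a random shotgun project.

    min_overlap is T, the number of bases two reads must share to be joined;
    the contig estimate uses sigma = 1 - T / L.
    """
    c = n_reads * read_length / genome_length
    sigma = 1 - min_overlap / read_length
    return {
        "coverage": c,
        "fraction_unsequenced": math.exp(-c),
        "expected_bases_unsequenced": genome_length * math.exp(-c),
        "expected_contigs": n_reads * math.exp(-c * sigma),
    }


n = reads_needed(1_800_000, 500, 8)
print(n, lander_waterman(1_800_000, 500, n))
```

With these numbers the model predicts roughly ten contigs and about 600 unsequenced bases at eightfold coverage.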

Any unsequenced regions of the genome are closed
by a combination of methods. Contigs that are linked by
forward/reverse clone pairs can usually be closed by se- 
quencing off the spanning clone, or by sequencing a 
polymerase chain reaction (PCR) product generated 
from primers designed at the ends of the contig. Gaps 
for which there is no linking clone information are 
ordered by multiplex or combinatorial PCR (Tettelin 
et al., 1999), or optical maps (J. Lin et al., 1999). Direct 
walking on bacterial DNA can also be used to close 
these gaps. All repetitive sequence regions including 
IS elements, ribosomal RNA regions, or transposons 
are confirmed by walking spanning clones across the 
repetitive regions. 
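Finding the contigs that are linked by forward/reverse clone pairs is essentially a counting problem over read-pair placements. In this hedged sketch, the read names, contig names, and the two-link minimum are hypothetical, and the orientation and insert-size checks that real scaffolding tools apply are omitted.

```python
from collections import Counter


def linked_contigs(read_pairs, read_to_contig, min_links=2):
    """Count clone pairs whose forward and reverse reads lie in different contigs.

    read_pairs: iterable of (forward_read, reverse_read) names.
    read_to_contig: maps a read name to the contig it was assembled into.
    Returns {(contig_a, contig_b): number_of_links} for well-supported links.
    """
    links = Counter()
    for forward_read, reverse_read in read_pairs:
        a = read_to_contig.get(forward_read)
        b = read_to_contig.get(reverse_read)
        if a is None or b is None or a == b:
            continue  # unplaced read, or both ends in the same contig
        links[tuple(sorted((a, b)))] += 1
    return {pair: n for pair, n in links.items() if n >= min_links}
```

Gaps between contigs that have no supported link are the ones left for approaches such as multiplex PCR or optical mapping.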

Sequencing and closure do not represent the end of 
a microbial genome project, as bioinformatic analysis 
of the completed sequence is essential for interpreting 
and understanding the data. Bioinformatic analysis 
involves identification of all open reading frames 
(ORFs), and other features (tRNA, rRNA, repeated 
sequences, etc.) in the genome and subsequent analyses 
of these features. Gene prediction programs using 
Hidden Markov models (HMMs) or Interpolated 
Markov models (e.g., GLIMMER: Delcher et al.,
1999) effectively identify microbial genes in an auto- 
mated fashion. Biological names and functions are 
assigned where possible by a combination of com- 
puter programs and human annotation/curation. 
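Programs such as GLIMMER score candidate genes with statistical models, but the candidates themselves are open reading frames. The sketch below shows only that first step, a six-frame scan for ATG-to-stop ORFs. It is not GLIMMER's algorithm, and the 100-codon minimum and the use of ATG as the only start codon are simplifying assumptions.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def find_orfs(seq, min_codons=100):
    """Return (strand, start, end) for ATG-initiated ORFs ending in a stop codon.

    Coordinates are 0-based and end-exclusive on the scanned strand; for "-"
    they refer to the reverse-complement sequence.
    """
    orfs = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i  # keep the first ATG, giving the longest ORF
                elif start is not None and codon in STOP_CODONS:
                    if (i + 3 - start) // 3 >= min_codons:
                        orfs.append((strand, start, i + 3))
                    start = None
    return orfs


print(find_orfs("ATGAAATTTTAA", min_codons=4))
```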


Functional predictions are based both on traditional
methods such as BLAST or FASTA searches against
sequence databases and on approaches based on
homologous families of proteins, such as HMMs, 
Pfams, and COGs (Bateman et al., 1999; Tatusov 
et al., 2000). In addition to the identification of all 
ORFs, annotation also involves the identification of 
intergenic regions and novel features on the genome 
including nucleotide biases, origins of replication, 
putative regions of horizontal gene transfer, repeat 
structures, insertion elements, and plasmids. More 
detailed analyses of the genomic sequence can allow 
for a reconstruction or complete description of the 
biology of the organism. See for example a recent 
reconstruction of the physiology and transport abil- 
ities of V. cholerae (Figure 1). 

A major problem encountered with the dissemination
of genome data is cascading gene nomenclature
error, also known as transitive catastrophe error. This
occurs when an overly ambitious gene name and
biological function are assigned to an ORF without
experimental evidence. The incorrect assignment can
then be passed on to the next genome during
annotation, and so on. This type of transitive error can
be reduced in several ways, including careful and
consistent reannotation of genomes and the use of
new computational models and phylogenomic
methods for gene naming. It is critical that future
ORF assignments take this potential problem into
account.


Current Applications of Genomic Data 


Genome data are such a powerful tool because they allow
microorganisms to be considered in a more comprehensive
context. Also, in combination with functional genomics,
the genome sequence and annotation information allows
for more complete modeling of an organism’s global
response to changes in its environment.


Understanding Pathogenic Bacteria 

Genomic information can greatly expedite the search 
for new drugs and vaccine candidates to help cure and 
prevent human disease. In an effort to overcome the
obstacles of vaccine development for Neisseria
meningitidis more rapidly, genome sequencing and
vaccine candidate identification were undertaken in
parallel (Pizza et al., 2000; Tettelin et al., 2000).
Developing an effective N. meningitidis vaccine has been
difficult because of sequence variation in
surface-exposed proteins and cross-reactivity of the
serogroup B capsular polysaccharide with human tissue.
During sequencing of the bacterium, DNA fragments
were examined for ORFs potentially encoding novel
surface-expressed or exported proteins. This produced
a list of 570 potential vaccine candidates, from which
seven were selected for extensive study because they
gave positive results in several immunological assays
and were predicted not to be phase variable (Pizza
et al., 2000).

Figure 1 (See Plate 23) Reconstruction of transport and metabolism of Vibrio cholerae based on the annotated
genome sequence. Pathways for energy production and the metabolism of organic compounds, acids, and aldehydes
are shown. Transporters are grouped by substrate specificity: cations (green), anions (red), carbohydrates (yellow),
nucleosides, purines and pyrimidines (purple), amino acids/peptides/amines (dark blue), and other (light blue).
Question marks associated with transporters indicate a putative gene, uncertainty in substrate specificity, or direction
of transport. Permeases are represented as ovals; ABC transporters are shown as composite figures of ovals,
diamonds, and circles; porins are represented as three ovals; the large-conductance mechanosensitive channel is
shown as a gated cylinder; other cylinders represent outer membrane transporters or receptors; all other
transporters are drawn as rectangles. Export or import of solutes is designated by the direction of the arrow through
the transporter. If a precise substrate could not be determined for a transporter, no gene name was assigned and a
more general common name reflecting the type of substrate being transported was used. Gene location on the two
chromosomes, for both transporters and metabolic steps, is indicated by arrow color: all genes located on the large
chromosome (black), all genes located on the small chromosome (blue), all genes needed for the complete pathway
on one chromosome, but a duplicate copy of one or more genes on the other chromosome (complete pathways,
except for glycerol, are found on the large chromosome) (purple), required genes on both chromosomes (red),
complete pathway on both chromosomes (green). Gene numbers on the two chromosomes are in parentheses and
follow the color scheme for gene location. Substrates underlined and capitalized can be used as energy sources.
Abbreviations: PRPP, phosphoribosyl-pyrophosphate; PEP, phosphoenolpyruvate; PTS, phosphoenolpyruvate-
dependent phosphotransferase system; ATP, adenosine triphosphate; ADP, adenosine diphosphate; MCP, methyl-
accepting chemotaxis protein; NAG, N-acetylglucosamine; G3P, glycerol-3-phosphate; glyc, glycerol; NMN,
nicotinamide mononucleotide. *Since V. cholerae does not use cellobiose, we expect this PTS system to be involved
in chitobiose transport. **Complete pathways, except for glycerol, are found on the large chromosome (Heidelberg
et al., 2000).

Genomics has also led to the improvement of vac- 
cines. Research on V. cholerae led to the identification 
of several key virulence factors (e.g., cholera toxin, 
which causes the diarrhea, and the toxin-coregulated 
pilus, required for colonization of the human intes- 
tine). However, even with these virulence genes and 
several other putative accessory toxins deleted from 
the genome, live, attenuated V. cholerae vaccines still 
remain reactogenic in humans, causing diarrhea and 
other symptoms. During the sequencing phase of V. 
cholerae, a new toxin was discovered in the genome. 
This toxin belongs to the repeat in toxin (RTX) family 
of toxins, and it likely plays a role in the reactogenicity 
of some live V. cholerae vaccines (W. Lin et al., 1999). 

An important defense mechanism in humans is iron 
limitation. A central paradigm has been that a bacterial 
pathogen must first overcome the human host iron 
limitation to establish a successful infection. However, 
genome analysis of the Lyme disease pathogen, Borrelia
burgdorferi, suggests that it has very few metalloproteins,
and even those that typically have iron as a cofactor
have significant similarity to manganese-dependent
enzymes (Fraser et al., 1997). Additionally, analysis
of B. burgdorferi membranes indicates that they lack
metalloproteins commonly associated with bacterial
cytoplasmic membranes (Bledsoe et al., 1994). These
findings suggest that B. burgdorferi has evolved a novel
mechanism to overcome host iron limitation, eliminating
proteins that require iron as a cofactor and substituting
manganese for iron in the few metalloproteins
it maintained (Posey and Gherardini, 2000).

Several environmental bacteria have acquired the 
capacity to cause serious human disease. Notable
examples whose genomes have been sequenced
include V. cholerae and Pseudomonas
aeruginosa (Heidelberg et al., 2000; Stover et al.,
2000). The genome sequences of these environmental
pathogens allow a more complete understanding
of how environmental bacteria emerge to become
significant human pathogens. Vibrio cholerae seems to
have acquired its human pathogenicity through several
different mechanisms. These include the chromosomal
integration of a filamentous phage (CTX) containing 
the cholera toxin genes, and other recently acquired 
regions of DNA (the VPI, or Vibrio pathogenicity
island). Both of these regions reside on the large 
chromosome and have a trinucleotide composition 
that is significantly different from the rest of the V. 
cholerae genome. Also, the small chromosome (appar- 
ently a captured megaplasmid) contains an integron 
island which has allowed additional gene capture 
events and several of the ORFs in this region appear
important to the pathogenicity of this bacterium (e.g., 
drug resistance genes and virulence factors). 

Genome analysis also offers additional evidence for
evaluating genes implicated in human pathogenicity.
The mannose-sensitive hemagglutinin (MSHA)
was originally implicated in intestinal colonization;
however, several investigators have recently reported
that MSHA is not required for intestinal colonization
but is instead important in biofilm formation. Therefore,
MSHA may be more important in the ‘environmental
fitness’ of V. cholerae than in its pathogenic potential.
Interestingly, this gene cluster does not appear to be
recently acquired (i.e., there are no integrase,
transposase, or phage homologs that might suggest an
origin other than V. cholerae), and its trinucleotide
composition is similar to that of the rest of the
chromosome. These genome analyses suggest that these
genes have been in the V. cholerae genome longer than
the other pathogenicity genes, implying that they are
more important for the environmental lifestyle of this
bacterium than for its pathogenicity.
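Comparisons of trinucleotide composition like those described above can be sketched by counting overlapping three-base words in a region and in the whole genome and measuring how far apart the two frequency profiles are. The Euclidean distance used here is an illustrative choice, not necessarily the statistic used in the V. cholerae analysis, and the sketch assumes uppercase sequences containing at least one A/C/G/T trinucleotide.

```python
import math
from collections import Counter
from itertools import product

TRINUCLEOTIDES = ["".join(p) for p in product("ACGT", repeat=3)]


def trinucleotide_frequencies(seq):
    """Frequencies of the 64 trinucleotides, counted in overlapping windows."""
    counts = Counter(seq[i:i + 3] for i in range(len(seq) - 2))
    total = sum(counts[t] for t in TRINUCLEOTIDES)  # words with N etc. ignored
    return [counts[t] / total for t in TRINUCLEOTIDES]


def composition_distance(region, genome):
    """Euclidean distance between two trinucleotide frequency profiles."""
    p = trinucleotide_frequencies(region)
    q = trinucleotide_frequencies(genome)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))
```

In practice one would slide a window along the chromosome and flag windows whose distance from the genome-wide profile is unusually large.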


Understanding Phylogeny and Evolution 


Horizontal gene transfer and acquired genes 

Analyses of genomic data from microorganisms sug- 
gest that a single universal phylogenetic tree may not 
be the most accurate depiction of relationships among 
organisms; instead a net-like pattern that reflects the 
frequency and significance of horizontal/lateral gene 
transfer has been proposed (Doolittle, 1999). A very 
recent example of lateral gene transfer has been shown 
from the genome sequences of the two hyperthermo- 
philic Archaea Pyrococcus furiosus and Thermococcus 
litoralis (DiRuggiero et al., 2000). Both organisms
share a 16 kb region of their genomes that differs by
only 173 nucleotides between the two; the 16 kb
insert in P. furiosus is flanked by insertion
elements with inverted and direct repeats. Similarly,
analysis of the Thermotoga maritima genome
sequence suggested that almost one-quarter of the
genome was acquired by lateral gene transfer, with
extensive conservation of gene order with the
thermophilic Archaea (Nelson et al., 1999). These
acquired genes likely confer some selective advantage
on these thermophiles, or alternatively may simply not
be detrimental to the organism and have subsequently ameliorated
into the genome. Lawrence and Ochman (1998) con- 
cluded that subsequent to the divergence of Escheri- 
chia coli and Salmonella (100 million years ago), 10% 
of the E. coli genome was acquired in over 200 events 
of lateral gene transfer. Their data also suggest that a
significant percentage of the E. coli genome might have
been acquired recently, at an average rate of 16 kb per
million years (Lawrence and Ochman, 1998). Complete
genome sequences of closely related Pyrococcus spp. and
Chlamydia spp. also have shown significant differences
in genome structure, organization, and composition
(Kawarabayasi et al., 1998; Read et al., 2000).


Captured megaplasmids 

Another mechanism for a bacterium to rapidly change 
its overall genomic content is by capturing a mega- 
plasmid and all the associated genes. Once captured, 
required genes can be moved from the chromosome to 
the megaplasmid, thus making the captured megaplas- 
mid essential for the survival of the cell, and thereby, 
stabilizing this new replicon. This model was proposed 
for the small chromosome of V. cholerae (Heidelberg 
et al., 2000). In this case, the genome sequence analysis
suggests that the smaller replicon was captured by an
ancestral Vibrio after the replicon had traveled through
a broad range of hosts. The capture of this megaplasmid,
which presumably contained genes that gave the
ancestral Vibrio a competitive advantage in its ecosystem,
was followed by its stabilization through the transfer of
essential genes to this replicon.

In addition to containing genes that make the cell 
more competitive, second chromosomes (and mega- 
plasmids) may increase survivability and speed re- 
covery from hostile environmental conditions. Such 
situations have been suggested from analysis of the 
small chromosomes from Deinococcus radiodurans 
and V. cholerae. For D. radiodurans it appears that 
the small chromosome may have genes involved in 
de novo synthesis and importing precursors (White 
et al., 1999). For V. cholerae, the small chromosome
has been suggested to help cells survive in biofilms and
has been proposed as part of a model to help explain the
viable but nonculturable (VBNC) state (Heidelberg et al., 2000).


Application of Genomic Data to Bioremediation

Genomic analysis is under way for organisms potentially
useful in the bioremediation of waste and radiation-exposed
sites, contaminated soils, ground waters, sewage, and
solvent disposal sites. One example is the recent
sequencing of D. radiodurans as a model organism for
potential remediation of radioactive waste sites.
Deinococcus radiodurans is the most radiation-resistant
organism known and is also capable of reducing
Fe(III)-nitrilotriacetic acid coupled to the oxidation
of lactate to CO₂ and acetate, as well as reducing
uranium and technetium in the presence of humic acids
or synthetic electron shuttle agents (Fredrickson et al., 2000).
Expression of heterologous genes in D. radiodurans
has led to the development of strains that can
reduce Hg(II) to volatile elemental mercury, which is
less toxic (Brim et al., 2000), and that express toluene
dioxygenase, enabling the organism to oxidize toluene,
chlorobenzene, 3,4-dichloro-1-butene, and indole
(Lange et al., 1998). The mercury-resistant strains can
grow in the presence of both radiation and ionic mer- 
cury at concentrations well above those found in radio- 
active waste sites. Thus, engineered D. radiodurans 
strains show substantial promise for bioremediation 
of mixed wastes exposed to radiation. 


Comparative Genomics 


Comparison of the transport capabilities of microorganisms

The completion of substantial numbers of genome 
sequences allows the undertaking of comparative 
genomic studies. One example of such an approach 
was the comparison of membrane transport proteins 
among 18 prokaryotic microorganisms (Paulsen
et al., 2000). Overall analysis of the transporters in
organisms with diverse lifestyles revealed that the total
numbers of transporters and their substrate
specificities correlated well with the likely concentration and
diversity of nutrients in their particular habitat. For 
example, phylogenetically distinct intracellular para- 
sites, such as the chlamydias and Rickettsia prowaze- 
kii, have an extensive set of transporters for amino 
acids and nucleotides, but little ability to transport 
free sugars, which almost certainly reflects the relative 
accessibility of these compounds in an intracellu- 
lar environment. Additionally, the energy-coupling 
mechanisms of transporters correlated well with the 
mode of energy generation in each organism. For
example, the mycoplasmas and spirochetes, which lack a
TCA cycle and an electron transfer chain and hence
can generate ATP only by substrate-level
phosphorylation, were highly dependent on
ATP-dependent rather than proton-dependent
transporters, whereas the converse was true of more
metabolically versatile organisms such as E. coli. Thus,
comparative genomic studies can provide insight into 
the physiological differences and similarities between 
organisms. 


Future Applications for Genomics 


Functional Genomics

Functional genomics, in comparison to traditional 
approaches that investigate the role of a single gene 
or protein, employs high-throughput/large-scale 
approaches to investigate the roles of large numbers 
of genes or proteins systematically. Functional geno- 
mic technologies include microarray expression an- 
alysis, large-scale gene knockouts, and proteomics. 
DNA array or “DNA chip” technology enables the
measurement of expression patterns of thousands of 


genes in parallel. Large numbers of PCR fragments or 
oligonucleotides, typically corresponding to all of the 
genes from a particular organism, are immobilized 
onto a support matrix (glass slide, nylon membrane, 
or silica chip). This matrix is then probed with labeled 
mRNA isolated from cells grown under different con- 
ditions to examine gene expression, or with DNA 
from different strains or isolates to look at genome 
variability between strains. The use of microarrays 
enables the identification of genes expressed under 
similar conditions, and hence provides insight into 
the function of uncharacterized genes. 
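Inferring function from shared expression patterns is often done by correlating expression profiles across conditions. A minimal sketch, assuming log-ratio expression values for each gene measured across the same set of conditions; the gene names, values, and 0.9 correlation cutoff are hypothetical, and constant profiles (zero variance) are not handled.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def coexpressed_with(profiles, query, threshold=0.9):
    """Genes whose expression profile correlates with the query gene's."""
    q = profiles[query]
    return sorted(g for g, p in profiles.items()
                  if g != query and pearson(q, p) >= threshold)


profiles = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.0, 4.1, 6.0, 8.2],
    "geneC": [4.0, 3.0, 2.0, 1.0],
    "orfX": [0.9, 2.1, 2.9, 4.2],  # uncharacterized ORF
}
print(coexpressed_with(profiles, "orfX"))
```

Co-expression only suggests a functional link for the uncharacterized ORF; it is a hypothesis to test, not an assignment.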

Proteomics uses two-dimensional gel electrophor- 
esis to examine protein production and localization, 
and hence provides a complementary approach to that 
of microarray gene expression studies. Matrix-assisted
laser desorption/ionization time-of-flight (MALDI-TOF)
mass spectrometry enables high-throughput and
high-sensitivity screening of protein samples derived from
two-dimensional gel electrophoresis (Traini et al.,
1998). Such approaches are making it practical to
undertake whole-proteome analyses of completely
sequenced microorganisms.

Large-scale gene knockout approaches for study- 
ing gene function are now feasible using complete 
genome sequence data. Possible approaches for con- 
structing gene knockouts on a genomic scale include 
saturation transposon mutagenesis and identification 
of the transposon insert sites by sequencing, or by 
making targeted gene knockouts (Traini et al., 1998; 
Hutchison et al., 1999). The construction of such
knockouts enables the presumptive identification of 
essential genes, and enables the generation of large 
banks of gene mutants, whose function can subse- 
quently be examined by high-throughput phenotypic 
screening (Bochner, 1989). The Mycoplasma minimal 
genome project is one example of such an approach 
(Kanehisa and Goto, 2000). In this study, transposon 
mutagenesis of Mycoplasma genitalium and M. pneu- 
moniae was used to identify nonessential genes under 
laboratory growth conditions (Hutchison et al., 1999). 
As M. genitalium has the smallest known microbial 
genome with only 480 protein-encoding genes, this 
approach indicated that only approximately 265-330 
of these genes were essential for growth under the 
conditions examined, thus providing an estimate of 
the minimal genome required for life. 
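The logic of identifying candidate essential genes from saturation transposon mutagenesis can be sketched as finding genes with no mapped insertion. This is a simplification under stated assumptions: the gene coordinates are hypothetical, each insertion is mapped to a single position, and a gene without insertions is only presumptively essential, since it may simply have been missed by chance.

```python
from bisect import bisect_left


def candidate_essential_genes(genes, insertion_sites):
    """Return genes that contain no mapped transposon insertion.

    genes maps a gene name to (start, end), 1-based inclusive coordinates.
    insertion_sites is an iterable of mapped insertion positions.
    """
    sites = sorted(insertion_sites)
    candidates = []
    for gene, (start, end) in genes.items():
        i = bisect_left(sites, start)  # first insertion at or after start
        if i == len(sites) or sites[i] > end:
            candidates.append(gene)  # no insertion falls within the gene
    return sorted(candidates)


genes = {"g1": (1, 900), "g2": (1000, 2500), "g3": (2600, 3000)}
print(candidate_essential_genes(genes, [450, 2700, 2750]))
```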


Bioinformatics Applications 


Databases 

The tremendous amount of data being generated by 
genome sequencing projects makes the development 
of user-friendly databases and new bioinformatic 
tools essential. In particular, researchers need to 
be able to compare gene content across different
sequenced organisms. A variety of second-generation
biological databases have been developed to address
the particular demands created by the sheer volume of
genomic data. An increasingly important feature of
such databases (e.g., Omniome: http://www.tigr.org,
EcoCyc: http://ecocyc.PangeaSystems.com/ecocyc,
InterPro: http://www.ebi.ac.uk/interpro/) is that
they incorporate detailed manual curation of the data
in addition to sophisticated automated analysis. Other
metabolic databases include WIT
(http://wit.mcs.anl.gov/WIT2) (Overbeek et al., 2000)
and KEGG (http://kegg.genome.ad.jp/kegg/) (Kanehisa
and Goto, 2000). Such metabolic databases are valuable
for the reconstruction of metabolic pathways in newly
sequenced genomes.
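As a minimal illustration of how such databases support metabolic reconstruction, the sketch below checks which enzymatic steps of a reference pathway are represented among a genome's annotated EC numbers. The reference pathway (the first four steps of glycolysis) and the genome annotation are illustrative assumptions, not data taken from any of the databases named above; real reconstructions also have to account for isozymes, pathway variants, and misannotation.

```python
from __future__ import annotations

# EC numbers for the first four steps of glycolysis (illustrative reference pathway).
GLYCOLYSIS_UPPER = {
    "2.7.1.1",   # hexokinase
    "5.3.1.9",   # glucose-6-phosphate isomerase
    "2.7.1.11",  # 6-phosphofructokinase
    "4.1.2.13",  # fructose-bisphosphate aldolase
}


def pathway_completeness(genome_ec: set[str], pathway_ec: set[str]) -> float:
    """Fraction of pathway steps whose enzyme is annotated in the genome."""
    if not pathway_ec:
        raise ValueError("pathway must contain at least one enzymatic step")
    return len(genome_ec & pathway_ec) / len(pathway_ec)


def missing_steps(genome_ec: set[str], pathway_ec: set[str]) -> list[str]:
    """EC numbers required by the pathway but absent from the annotation."""
    return sorted(pathway_ec - genome_ec)


if __name__ == "__main__":
    # Hypothetical annotation of a newly sequenced genome.
    genome = {"2.7.1.1", "5.3.1.9", "4.1.2.13", "1.1.1.27"}
    print(pathway_completeness(genome, GLYCOLYSIS_UPPER))  # 0.75
    print(missing_steps(genome, GLYCOLYSIS_UPPER))         # ['2.7.1.11']
```

A missing step flags a candidate gap for targeted searching, for instance a divergent or nonhomologous enzyme that automated annotation failed to recognize. It is not proof that the organism lacks the pathway.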


Exploring Microbial Diversity 


Non-culturable microorganisms and genomic potential 
One of the most exciting future steps for genomics is 
the analysis of populations of uncultured or unculturable
microorganisms. To date, studies on unculturable
bacteria have been limited primarily to
phylogenetic analysis based on 16S rRNA sequences
and enumeration of cells containing specific 16S rRNA
sequences. These methods have greatly increased our
knowledge of the phylogenetic diversity of many eco- 
systems, but they do not allow accurate determination 
of the functional niche these microorganisms occupy. 
Based on 16S rRNA sequence, the only way that 
biogeochemical function can be assigned to an unchar- 
acterized organism is by relatedness to cultured 
bacteria. However, genomics confers the ability to 
examine both the biogeochemical capabilities of 
uncultured or unculturable bacteria (genomic poten- 
tial) and what specific genes and metabolic pathways 
are being expressed in response to changes in the 
environment (functional genomics, see above). 
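The limitation described above, assigning a putative function only through relatedness to cultured bacteria, can be made concrete with a small sketch. It computes the percent identity between a pre-aligned environmental 16S rRNA sequence and sequences from cultured reference strains, then reports the closest relative. The sequences and strain names are hypothetical and far shorter than real 16S rRNA genes (about 1,500 nucleotides). Real analyses use proper alignment and phylogenetic methods rather than simple identity.

```python
from __future__ import annotations


def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percent identity over aligned columns, ignoring columns with a gap ('-')."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be pre-aligned to the same length")
    columns = [(a, b) for a, b in zip(seq_a.upper(), seq_b.upper()) if a != "-" and b != "-"]
    if not columns:
        return 0.0
    matches = sum(1 for a, b in columns if a == b)
    return 100.0 * matches / len(columns)


def nearest_cultured_relative(query: str, references: dict[str, str]) -> tuple[str, float]:
    """Return (strain name, percent identity) of the most similar cultured reference."""
    if not references:
        raise ValueError("at least one reference sequence is required")
    scores = {name: percent_identity(query, seq) for name, seq in references.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]


if __name__ == "__main__":
    # Hypothetical, very short aligned sequences.
    refs = {"cultured_strain_1": "ACGTACGA", "cultured_strain_2": "ACGAACGA"}
    print(nearest_cultured_relative("ACGTACGT", refs))  # ('cultured_strain_1', 87.5)
```

Any function inferred this way is borrowed from the nearest cultured relative. This is exactly the indirect link that sequencing the environmental DNA itself is meant to replace.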

To determine the genomic potential of an environ- 
ment, the DNA from the environment is isolated, 
cloned into bacterial artificial chromosomes (BACs),
and these BACs are sequenced to closure. The BAC
sequences can then be annotated in the same way as an
entire microbial genome (i.e., genes are identified and
assigned roles, RNA genes are identified, etc.). While this does not
necessarily give the genome of any single uncultured 
bacterium, the gene content of the BACs can give an 
idea of what important biogeochemical processes may 
be going on in an environment. Also, such methodo- 
logies may prove valuable commercially because of the 
large potential for new gene discovery. This also has 
the advantage of providing a mechanism for expres- 
sing genes from unculturable organisms in alternative 
hosts, such as E. coli. 
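Annotation of BAC sequences begins with gene finding. The sketch below shows one deliberately simplified ingredient: scanning all six reading frames for open reading frames (ORFs) that begin with ATG and end at a stop codon. It assumes the standard genetic code, and the default minimum length is a hypothetical threshold. Real microbial gene finders also use statistical models of coding sequence, ribosome-binding sites, and alternative start codons.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(_COMPLEMENT)[::-1]


def find_orfs(seq: str, min_codons: int = 100):
    """Return (strand, frame, start, end) tuples for ATG...stop ORFs in all six frames.

    Coordinates are 0-based and end-exclusive (end includes the stop codon), given on
    the strand scanned: minus-strand hits refer to the reverse-complement sequence.
    min_codons counts codons from the ATG up to, but not including, the stop codon.
    """
    seq = seq.upper()
    orfs = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i  # first ATG in frame opens a candidate ORF
                elif start is not None and codon in STOP_CODONS:
                    if (i - start) // 3 >= min_codons:
                        orfs.append((strand, frame, start, i + 3))
                    start = None
    return orfs


if __name__ == "__main__":
    print(find_orfs("ATGAAATAG", min_codons=2))  # [('+', 0, 0, 9)]
```

Taking the first ATG in each frame reports the longest possible ORF. Choosing the true start codon is one of the refinements that real annotation pipelines add.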
Conclusions


The advent of complete genome sequencing has revo- 
lutionized biology, allowing biologists to ask and 
answer questions on a genome-wide scale, something that
was not previously possible. Current and ongoing bioinformatic
analyses of complete genome sequences have
provided insights into genome organization, gene regu- 
lation, gene content, novel genes, and gene families, 
and the many biochemical pathways that reside in 
these organisms. The availability of complete genome 
sequences has led to the development of new fields 
such as microarray expression analysis and proteomics 
that allow for studies on a global scale. 
Birds tend to have a high chromosome number, with
the majority between 74 and 80 pairs (Christidis, 
1990). The lowest recorded diploid number is 40 for 
the stone curlew (Burhinus oedicnemus) and the high- 
est is 126 in the hoopoe (Upupa epops). Chromosomes 
are divided into macrochromosomes and micro- 
chromosomes based on their size, the typical number 
being 14-16 macrochromosomes and 60-64 micro- 
chromosomes. In the chicken (Gallus gallus), for 
example, the karyotype comprises 39 pairs of chromosomes.
The term macrochromosomes is used only for
the largest chromosome pairs (1, 2, 3, 4, 5, and
the Z sex chromosome). The remaining 33 pairs of 
small chromosomes and the W sex chromosome are 
called the microchromosomes. In the standard chicken 
karyotype (Ladjali-Mohammedi et al., 1999), it is pos- 
sible to distinguish macrochromosomes 1-5 and Z, 
and the microchromosomes 6-8 and W. 
Micro-complement fixation (MCF) is an immunological
technique used to quantitatively estimate
amino acid sequence differences between homologous
proteins from different species. Antisera against highly
purified proteins are raised in rabbits. The resulting
high-affinity, broad-specificity antibodies are used to
measure the degree of reactivity of antigens from pairs
of species. The immunological distance measured
between two antigens is highly correlated with
the degree of similarity of the antigenic sites of the two
homologous proteins and has been shown to be a
linear estimator of the number of amino acid
replacements between the two antigens.

MCF has been used extensively in phylogenetic
studies of vertebrates, permitting estimates
of both branching patterns and timing of speciation
events. MCF was first used to infer the divergence
date of humans and chimpanzees (estimated in 1967
as 3-5 million years BP). Since that time, MCF has
been widely used to investigate phylogenetic
relationships among vertebrates, primarily mammals
and amphibians. Until the advent of direct DNA
sequencing, MCF had the advantage of being an
inexpensive and rapid method of estimating sequence
differences between proteins.
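The arithmetic behind MCF-based dating can be sketched as follows. The sketch assumes the conventional definition of immunological distance as 100 × log10 of the index of dissimilarity. That index is the factor by which antiserum concentration must be raised for a heterologous antigen to give the same peak fixation as the homologous antigen. Following the linearity described above, the sketch also assumes a constant rate of change, calibrated from one reference pair of known divergence time. All concentrations, distances, and dates in the example are hypothetical.

```python
import math


def immunological_distance(homologous_conc: float, heterologous_conc: float) -> float:
    """ID = 100 * log10(index of dissimilarity).

    The index of dissimilarity is taken here as the ratio of antiserum concentrations
    giving equal peak fixation with the heterologous vs the homologous antigen.
    """
    if homologous_conc <= 0 or heterologous_conc <= 0:
        raise ValueError("antiserum concentrations must be positive")
    return 100.0 * math.log10(heterologous_conc / homologous_conc)


def calibrated_rate(reference_id: float, reference_time_myr: float) -> float:
    """ID units accumulated per million years, from one pair with a known divergence date."""
    return reference_id / reference_time_myr


def divergence_time(id_value: float, rate: float) -> float:
    """Estimated divergence time in millions of years, assuming a constant rate."""
    return id_value / rate


if __name__ == "__main__":
    # Hypothetical: antiserum must be twice as concentrated for the heterologous antigen.
    d = immunological_distance(1.0, 2.0)  # about 30.1 ID units
    rate = calibrated_rate(reference_id=100.0, reference_time_myr=50.0)  # 2 ID units per Myr
    print(round(d, 1), round(divergence_time(d, rate), 1))
```

The date estimate is only as good as the calibration pair and the assumption of rate constancy across lineages.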
tematics, pp. 127-155. Sunderland, MA: Sinauer Associates.

Maxson RD and Maxson LR (1986) Micro-complement fixation:
a quantitative estimator of protein evolution. Molecular
Biology and Evolution 3(5): 375-388.

Sarich VM and Wilson AC (1967) Immunological time scale for
hominid evolution. Science 158: 1200-1203.


See also: Molecular Clock 
A defining feature of the unicellular ciliated protozoa
(e.g., Tetrahymena, Paramecium, Oxytricha) is
nuclear dimorphism. Each cell possesses one or more 
micronuclei and macronuclei (Figure 1). Both nuclei 
replicate and divide during vegetative or asexual 
reproduction, which occurs by binary fission, but 
each nucleus serves a distinct function within the 
cell. The genes in the micronucleus are transcription- 
ally inactive during asexual growth. This nucleus 
functions primarily during sexual reproduction (con- 
jugation), and is often referred to as the ‘germline’ 
nucleus of the cell. Following mating, the micronuclei 
in each cell of the mated pair undergo meiosis 
to generate haploid products. One of the haploid 
nuclei is then transferred to the mating partner, 
where it fuses with a resident haploid micronucleus 
to generate a new diploid (zygotic) micronucleus. 
The new diploid micronucleus divides mitotically 
one or more times in the absence of cell division. 
Some of the resulting division products will remain 
as micronuclei when asexual reproduction resumes. 
Other division products are transformed into new 
macronuclei, while the old macronuclei degenerate. 
This process of macronuclear development involves 
extensive rearrangement of the micronuclear genome 
(see Macronuclear Development, in Ciliates), includ- 
ing chromosome fragmentation, excision of interstitial 
DNA segments, and DNA amplification (hence, the 
term ‘macronucleus’). 

The micronuclear genome is organized as typical 
eukaryotic chromosomes and is usually diploid. 
Micronuclear chromosome numbers vary greatly 
in ciliates. The oligohymenophoran Tetrahymena 
thermophila has five pairs of chromosomes, while 
the hypotrich Stylonychia lemnae has more than 
100 micronuclear chromosomes. Similarly, the total 
amount of DNA in the micronucleus is quite variable. 
The micronuclear DNA content of T. thermophila is 
2.1 × 10⁸ bp per haploid genome, while it is ~10¹⁰ bp
in S. lemnae. 

The micronucleus is clearly required for sexual
reproduction, but it is less clear whether it is essential
for asexual reproduction. Amicronucleate strains of
ciliates have been isolated from the wild, and they
sometimes arise in laboratory cultures. Such strains show
no impairment of asexual reproduction. In contrast,
removal of the micronucleus by techniques such as
microsurgery or laser irradiation often results in at
least temporary impairment of growth, and sometimes
death, in some ciliate species. This has led to
suggestions that the micronucleus may also have some
function during asexual growth.


Figure 1 (A) The hypotrichous ciliate Oxytricha nova
as viewed by scanning electron microscopy (by K. G.
Murti). (B) An O. nova cell fixed and stained by the Feulgen
reaction to visualize nuclei. One of the two macronuclei
and one of the four micronuclei characteristic of this
species are indicated. Bars = 30 µm. (Photo courtesy of
D.M. Prescott.)




See also: Macronuclear Development, in Ciliates; 
Macronucleus 
With large-scale sequencing and hybridization analyses
of mammalian genomes came the frequent observation
of tandem repeats of DNA sequences, without
any apparent function, scattered throughout the gen- 
ome. The repeating unit can be as short as two nucle- 
otides (CACACACA, etc.), or as long as 20 kb. The 
number of tandem repeats can also vary from as few as 
two to as many as several hundred. The mechanism by 
which tandem repeat loci originate may be different 
for loci having very short repeat units as compared 
to those with longer repeat units. Tandem repeats of 
short di- or trinucleotides can originate through ran- 
dom changes in nonfunctional sequences. In contrast, 
the initial duplication of larger repeat units is likely to 
be a consequence of unequal crossing-over. Once two 
or more copies of a repeat unit (whether long or short) 
exist in tandem, unequal pairing followed by crossing- 
over can lead to an increase in the number of repeat 
units in subsequent generations. Whether stochastic 
mechanisms alone can account for the rich variety of 
tandem repeat loci that exist in the genome or whether 
other selective forces are at play is not clear at the 
present time. In any case, tandem repeat loci continue 
to be highly susceptible to unequal crossovers and, as a 
result, they tend to be highly polymorphic in terms 
of overall locus size. 
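The way unequal crossing-over changes repeat number can be shown with a small model in which each tandem array is a list of repeat units. When the two homologous arrays pair out of register and exchange at different unit positions, one recombinant product gains exactly the units that the reciprocal product loses. This is an idealized sketch (perfect repeat units, a single exchange, hypothetical array sizes), not a model of any particular locus.

```python
def unequal_crossover(array_a, array_b, i, j):
    """Exchange between array_a after unit i and array_b after unit j.

    Total length is always conserved. For two arrays of equal length n, the products
    have n + (i - j) and n - (i - j) units, so in-register exchange (i == j) leaves
    both lengths unchanged.
    """
    if not (0 <= i <= len(array_a) and 0 <= j <= len(array_b)):
        raise ValueError("crossover positions must lie within the arrays")
    product_1 = array_a[:i] + array_b[j:]
    product_2 = array_b[:j] + array_a[i:]
    return product_1, product_2


if __name__ == "__main__":
    # Two homologous (CA)n arrays of five units, paired two units out of register.
    a = ["CA"] * 5
    b = ["CA"] * 5
    p1, p2 = unequal_crossover(a, b, i=3, j=1)
    print(len(p1), len(p2))  # 7 3
    print("".join(p1))       # CACACACACACACA
```

Repeating such exchanges over generations, with the size shift set by the degree of misalignment, is one way stochastic processes alone could generate the locus-size polymorphism described above.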

Tandem repeat loci are classified according to both 
the size of the individual repeat unit and the length of 
the whole repeat cluster. The smallest and simplest — 
with repeat units of one to four bases and locus sizes 
of less than 100 bp — are called microsatellites. The use 
of microsatellites as genetic markers has revolution- 
ized the entire field of mammalian genetics. Next 
come the minisatellites with repeat units of 10 to 
40 bp and locus sizes that vary from several hundred 
base pairs to several kilobases. Tandem repeat loci of 
other sizes do not appear to be as common, but a 
great variety are scattered throughout the genome. 
The term midisatellite has been proposed for loci 
containing 40-bp repeat units that extend over distances
of 250 to 500 kb, and macrosatellite has been
proposed as the term to describe loci with large
repeat units of 3 to 20 kb present in clusters that
extend over 800 kb. However, the use of arbitrary
size boundaries to ‘define’ these other types of loci
is probably not meaningful, since it appears that, in
reality, no such boundaries exist in the potential for
tandem repeat loci to form in the mouse and other
mammalian genomes.
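The size-based nomenclature above can be summarized in a small classifier. The numeric cut-offs follow the ranges quoted in the text where those are explicit. Three readings are assumptions of this sketch: ‘several hundred base pairs to several kilobases’ is taken as 200 bp to 20 kb, ‘40-bp’ units as 35-45 bp, and ‘extend over 800 kb’ as at least 800 kb. In keeping with the caveat that such boundaries are arbitrary, anything outside these windows is reported as unclassified rather than forced into a category.

```python
def classify_tandem_repeat(unit_bp: int, locus_bp: int) -> str:
    """Assign a size-based name to a tandem repeat locus (some cut-offs are assumed)."""
    if unit_bp <= 0 or locus_bp < unit_bp:
        raise ValueError("need a positive unit size and a locus at least one unit long")
    if 1 <= unit_bp <= 4 and locus_bp < 100:
        return "microsatellite"
    if 10 <= unit_bp <= 40 and 200 <= locus_bp <= 20_000:  # assumed reading of 'several'
        return "minisatellite"
    if 35 <= unit_bp <= 45 and 250_000 <= locus_bp <= 500_000:  # assumed tolerance on 40 bp
        return "midisatellite"
    if 3_000 <= unit_bp <= 20_000 and locus_bp >= 800_000:  # assumed '>= 800 kb'
        return "macrosatellite"
    return "unclassified"


if __name__ == "__main__":
    for unit, locus in [(2, 60), (16, 5_000), (40, 300_000), (3_300, 900_000), (200, 10_000)]:
        print(unit, locus, classify_tandem_repeat(unit, locus))
```

Note that a 40-bp unit falls into the minisatellite or midisatellite class depending only on array length. This illustrates how arbitrary the boundaries are.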


See also: Minisatellite; Tandem Repeats; Unequal 
Crossing Over 


Microscopy 


See: Electron Microscopy 


Microtubules 
Microtubules are cytoplasmic filaments made of tubulin
heterodimers. Interphase microtubules reorganize
into spindle fibers at mitosis, where they are respon- 
sible for chromosome movement. 


See also: Mitosis; Spindle 


Minicells 
I Schildkraut 
Minicells are cells that have lost their chromosome
through defective partitioning during cell division.
Minicells can be used to express foreign genes in the 
absence of host-expressed genes. 


See also: Mitosis 


Minichromosome 
A minichromosome can be either:


1. a chromatin structure resembling a small chromosome,
arising from a complex between the DNA of certain
viruses (e.g., SV40 or polyoma) and the histones
of the infected host cell, or

2. a plasmid that contains a chromosomal origin of 
replication. 


See also: Chromatin; Plasmids 
Minimal Residual Disease
Current cytotoxic treatment protocols induce complete
remission in most cancer patients, but many of
these patients relapse. Apparently, the treatment
protocols are not capable of killing all clonogenic
malignant cells in these patients, although the patients
have reached complete remission according to clinical and
cytomorphological criteria (Figure 1). More sensitive
techniques are required for detection of low frequen- 
cies of malignant cells during and after treatment, i.e., 
detection of minimal residual disease (MRD). Such 
information can provide more insight into the effect- 
iveness of treatment. 

During the last 15 years, several techniques for MRD 
detection have been developed and clinically evalu- 
ated, mostly focusing on hematopoietic malignancies, 
i.e., leukemias and non-Hodgkin lymphomas (NHL). 
The detection limit of cytomorphological techniques
in hematopoietic malignancies is not lower than 1-5%
of malignant cells, implying that these techniques can
provide only superficial information about the
effectiveness of the treatment (Figure 1). MRD techniques
should reach sensitivities of at least 10⁻³ (1 malignant
cell within 1000 normal cells), but sensitivities of
10⁻⁴ to 10⁻⁶ are preferred. Moreover, reliable MRD
techniques should be characterized by leukemia
specificity (discrimination between malignant and normal
cells, without false-positive results), reproducibility,
and feasibility (easy standardization and rapid collection
of results for clinical application), and they should allow
precise quantification of MRD levels. Such characteristics
allow ‘true’ MRD detection and thereby evaluation of
the treatment effectiveness. The application of sensitive
MRD techniques is especially valuable in those
hematopoietic malignancies that can potentially be cured
by cytotoxic therapy and/or bone marrow transplantation
(BMT): acute lymphoblastic leukemia (ALL), acute
myeloid leukemia (AML), several types of NHL, and
chronic myeloid leukemia (CML). In these disease
categories, MRD information might be used for
adaptation of treatment.


Figure 1 The putative relative frequencies of leukemic
cells in peripheral blood or bone marrow of acute
leukemia patients during and after chemotherapy and
during development of relapse. The detection limits of
cytomorphologic techniques and of PCR techniques
are indicated. I, induction treatment; C, consolidation
treatment; II, re-induction treatment.


Techniques and Targets for Molecular 
MRD Monitoring in Hemopoietic 
Malignancies 


Several cellular and molecular techniques have been 
evaluated for their capacity to detect MRD, including 
conventional cytogenetics, cell-culture systems,
fluorescence in situ hybridization, Southern blotting,
immunophenotyping, and PCR techniques. Most of
these techniques appear to have limited sensitivity,
specificity, and/or applicability. However, current
flow cytometric immunophenotyping and PCR-based
approaches for MRD monitoring can reach
sensitivities of 10⁻³ to 10⁻⁶, are sufficiently specific,
and have relatively broad applicability. PCR
techniques can be used for detection of tumor-specific
sequences such as junctional regions of rearranged
immunoglobulin (Ig) and T-cell receptor (TCR)
genes or breakpoint fusion regions of chromosome
aberrations. In the context of this encyclopedia, we
discuss only the PCR-based MRD techniques.
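Simple arithmetic shows why high sensitivities demand large samples. The sketch below rests on two assumptions: roughly 6.6 pg of DNA per diploid human cell, and an idealized assay in which every malignant target in the reaction is amplified. Under those assumptions, the number of malignant cells in a sample follows a Poisson distribution. The chance that a reaction contains any target at all is therefore 1 − exp(−f·N), for MRD fraction f and N cells analyzed. The values are illustrative, not performance figures for any particular assay.

```python
import math

PG_DNA_PER_DIPLOID_CELL = 6.6  # assumption: approximate DNA content of a diploid human cell


def cells_in_sample(dna_ng: float) -> float:
    """Approximate number of cells represented by a given amount of genomic DNA."""
    return dna_ng * 1000.0 / PG_DNA_PER_DIPLOID_CELL


def detection_probability(mrd_fraction: float, n_cells: float) -> float:
    """Probability that at least one malignant cell is present (Poisson, ideal assay)."""
    expected_targets = mrd_fraction * n_cells
    return 1.0 - math.exp(-expected_targets)


if __name__ == "__main__":
    n = cells_in_sample(1000.0)  # 1 microgram of DNA, roughly 1.5 x 10^5 cells
    for f in (1e-3, 1e-4, 1e-5, 1e-6):
        print(f"MRD {f:.0e}: P(target in reaction) = {detection_probability(f, n):.2f}")
```

With about 1 µg of DNA per reaction, an MRD level of 10⁻⁵ is expected to contribute only about 1.5 malignant cells. A negative result at that level is therefore not conclusive, and 10⁻⁶ cannot be reached without analyzing more material.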


Ig and TCR Gene Rearrangements as 
Patient-Specific ‘Fingerprints’ 

During early B- and T-cell differentiation, the germ- 
line V, (D), and J gene segments of the Ig and TCR 
gene complexes rearrange, in order to provide each 
lymphocyte with specific combinations of V-(D)-J 
segments that code for the variable domains of Ig 
and TCR molecules. The random insertion and dele- 
tion of nucleotides at the junction sites of V, (D), and J 
gene segments make the junctional regions of Ig and 
TCR genes into ‘fingerprint-like’ sequences, which are 
most probably different in each lymphocyte and thus 
also in each lymphoid malignancy. These junctional 
regions can be used as tumor-specific targets for PCR-based
MRD studies, for instance by choosing PCR
primers at opposite sides of the junctional region and
subsequent hybridization of a patient-specific junctional
region probe to the obtained PCR products
(Figure 2).


[Figure 2 schematic: Vδ2 and Dδ3 gene segments flanking an approximately 100-bp junctional region, with the positions of the Vδ2 and Dδ3 primers and hybridization with the junctional region probe (CTGTGATCCAGGGTGGGGGA).]


Figure 2 Precursor-B-acute lymphoblastic leukemia
patient with a Vδ2-Dδ3 rearrangement as PCR target
for minimal residual disease (MRD) detection. The
specificity of the junctional region is based on the
deletion of six nucleotides and the random insertion of
seven nucleotides. This sequence information was used
for the design of a patient-specific junctional region
probe. DNA from the ALL cells was diluted into DNA
from normal blood mononuclear cells (MNC) and
subjected to PCR analysis with Vδ2 and Dδ3 primers.
PCR products were size-separated in an agarose gel,
blotted onto a nylon membrane, and hybridized with the
junctional region probe. In all dilution steps and in
the MNC, Vδ2-Dδ3 PCR products were found, but only the
first five dilution steps appeared to contain
leukemia-derived PCR products, i.e., a sensitivity of
10⁻³ was reached.

For this purpose, the various Ig and/or TCR gene 
rearrangements have to be identified in each leukemia 
at initial diagnosis by using various PCR primer sets. 
It should be confirmed whether the obtained PCR 
products are derived from the clonal malignant cells 
and not from contaminating normal polyclonal cells 
with similar Ig or TCR gene rearrangements. There- 
fore the obtained PCR products are analyzed for their 
clonal origin, e.g., by heteroduplex analysis or by gene 
scanning. Subsequently, the precise nucleotide se- 
quence of the junctional regions should be determined. 
This sequence information allows the design of 
junctional region-specific oligonucleotides. MRD 
detection via PCR analysis of Ig and TCR genes is 
applicable in more than 98% of all lymphoid malig- 
nancies, which represent approximately 75% of all 
hematopoietic malignancies (Table 1). 

In some categories of lymphoid malignancies, the 
Ig/TCR gene rearrangement patterns are not fully 
stable during the disease course, because of continuing 
or secondary rearrangements (e.g., in ALL) or somatic 
mutations (e.g., in some types of NHL). In such 
disease categories false-negative MRD-PCR results 
should be prevented by using two rearranged Ig/ 
TCR alleles as PCR targets. The selection of these 
targets should be based on their likelihood of remaining
stable, such as ‘end-stage’ rearrangements in ALL 
and nonfunctional or incomplete (D-J) rearrange- 
ments in NHL with ongoing somatic mutations. 


Fusion Genes as Leukemia-Specific Markers 

Breakpoint fusion regions of chromosome aberrations 
can be employed as unique, tumor-specific PCR tar- 
gets for MRD detection, in which the PCR primers are 
chosen at opposite sides of the breakpoint fusion 
region. PCR-mediated amplification of breakpoint
fusion sequences at the DNA level can only be used for
chromosome aberrations in which the breakpoints of
different patients cluster in a relatively small
breakpoint area of preferably less than 2 kb. This is the case
in t(14;18) in follicular cell lymphoma (FCL), where
most breakpoints are clustered in a few relatively small
regions of the BCL2 gene, which are juxtaposed to one
of the JH gene segments of the IGH locus. Other
examples include T-ALL-associated aberrations such as
t(11;14)(p13;q11), t(1;14)(p34;q11), t(10;14)(q24;q11),
and the TAL1 deletions. Despite the clustering of the
breakpoints, the nucleotide sequences of the
breakpoint fusion regions of the above chromosome
aberrations differ per patient. Therefore these breakpoint
fusion regions represent unique, patient-specific, and
sensitive PCR targets for MRD detection.


Table 1 Applicability of PCR-based MRD detection in hemopoietic malignancies.

                                 PCR-based MRD techniques (sensitivity)
Disease category                 Ig/TCR (10⁻⁴-10⁻⁶)   Fusion genes (10⁻⁴-10⁻⁶)
ALL                              >95%                 ~40%
APL                              ?                    >90%
AML (non-APL)                    <10%                 ~25%
Chronic lymphocytic leukemia     >98%                 ?
CML                              -                    >95%
NHL (high grade)                 >98%                 ~25%
Other NHL                        >98%                 ~20%
Multiple myeloma                 >98%                 >40%


Figure 3 RT-PCR analysis of BCR-ABL fusion gene transcripts for MRD detection. (A) Schematic of the exon-intron
structure of the BCR and ABL genes involved in t(9;22)(q34;q11), with focus on the minor breakpoint cluster region
(m-bcr). The centromeric (cen) and telomeric (tel) orientation, exon numbering, and relevant breakpoint regions are
indicated. The old nomenclature for BCR exon 1 (e1) and ABL exons 2 and 3 (a2 and a3) is also indicated. (B) Schematic
diagrams of the BCR-ABL p190-type fusion gene transcripts. The numbers under the fusion gene transcript refer to the
first (5′) nucleotide of the involved exon, except when the last (3′) nucleotide of the upstream gene is indicated. The
arrows indicate the relative position of the primers; the numbers refer to the 5′ nucleotide position of each primer.
The outer primers A and B (BCR-e1-A and ABL-a3-B) are used for first-round amplification and the internal primers
C and D (BCR-e1-C and ABL-a3-D) are used for the nested RT-PCR reaction. Primer E is the so-called shifted primer,
used exclusively to confirm the positive results obtained with the A and B primers. The five primers were developed by
the PCR laboratories participating in the BIOMED-1 Concerted Action: Investigation of minimal residual disease in
acute leukemia: International standardization and clinical evaluation (Van Dongen et al., 1999). (C) Agarose gel
electrophoresis of first-round amplification of serially diluted leukemic cells derived from precursor-B-ALL patients,
as well as undiluted cDNA from the MIK-ALL cell line as control. In the first round, RT-PCR product can be detected
down to 10⁻³ dilution mixtures. (D) Agarose gel electrophoresis of a nested RT-PCR reaction of the same serially diluted
samples and the undiluted MIK-ALL control sample. RT-PCR products can be detected down to 10⁻⁴ dilution
mixtures in a nested RT-PCR reaction. (E) Agarose gel electrophoresis of a control RT-PCR amplification using
primers for the constitutively expressed ABL gene. Control size markers.

In most translocations, however, the breakpoints of
different patients are more widely dispersed, resulting
in breakpoint regions far larger than 2 kb.
This implies that in each individual patient the exact
breakpoint has to be determined for PCR primer
design, which is technically possible but laborious
and time-consuming. However, several malignancies
with chromosome aberrations have characteristic
tumor-specific fusion genes, which are transcribed
into fusion-gene mRNA molecules that are similar in
individual patients despite distinct breakpoints at the
DNA level. After reverse transcription (RT) into
cDNA, these fusion-gene mRNA molecules can
therefore be used as appropriate RT-PCR targets
for MRD studies (Figure 3). Examples include:
BCR-ABL transcripts in CML or precursor-B-ALL
with t(9;22); TEL-AML1 transcripts in precursor-B-ALL
with t(12;21); E2A-PBX1 mRNA in most pre-B-ALL
with t(1;19); MLL-AF4 transcripts in pro-B-ALL with
t(4;11); PML-RARA mRNA in acute promyelocytic
leukemia (APL) with t(15;17); AML1-ETO mRNA in
AML with t(8;21); and NPM-ALK mRNA in anaplastic
large cell lymphoma with t(2;5).


An advantage of using chromosome aberrations as 
tumor-specific PCR targets for MRD detection is 
their stability during the disease course. However, 
MRD detection of chromosome aberrations by PCR is 
not always applicable, because in many hematopoietic
malignancies no chromosome aberrations with fusion
genes have yet been found (see Table 1). Depending
on the type of tumor-specific PCR target, detection
limits of 10⁻³ to 10⁻⁶ can be reached (Figure 3).

Because of the high sensitivity of PCR techniques, 
cross-contamination of RT-PCR products between 
patient samples is a major pitfall in RT-PCR-mediated 
MRD studies. Such cross-contamination is difficult to 
recognize, since leukemia-specific fusion-gene mRNA 
PCR products are not patient-specific. This is in con- 
trast to PCR products obtained from breakpoint 
fusion regions at the DNA level such as in t(14;18) 
and TAL1 deletions, which can be identified by use of 
patient-specific oligonucleotide probes. Furthermore, 
very low levels of fusion transcripts, particularly
BCR-ABL mRNA, have also been found in healthy
individuals, which occasionally may be the source of 
false-positive results in leukemia patients in long-term 
remission. 


Quantification of MRD by Use of the PCR 
Analyses 

For reliable MRD monitoring, the PCR results should 
be quantified, i.e., directly related to the number of
malignant cells. MRD quantification by PCR analysis is a
complex process. Firstly, the quantity and amplifi- 
ability of the isolated DNA or RNA should be 
ensured. In RT-PCR studies, the number of fusion 
gene transcripts should be normalized to the number 
of transcripts of a housekeeping gene. In DNA-based 
PCR studies, this can be achieved by using a non- 
polymorphic intron-exon region of a single-copy gene 
as control PCR target. Secondly, minor variations in 
RT efficiency, primer annealing, and primer extension 
may lead to major variations at the end of PCR, i.e., 
after 30-35 PCR cycles. The disadvantages of ‘PCR
end-point quantification’ might (partly) be overcome
by using serial dilutions of DNA or RNA isolated 
from the leukemic cell sample at diagnosis into DNA 
or RNA of normal mononuclear cells. The same dilu- 
tion series of diagnosis DNA is generally used to 
determine the tumor load in a follow-up sample in a 
semiquantitative manner by comparison of hybridiza- 
tion signals. This approach gives an indication of the 
tumor burden in the follow-up sample. 
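The normalization step described above can be sketched as follows. In each sample, target copies are divided by control-gene copies, and the follow-up ratio is then expressed relative to the ratio at diagnosis. Expressing MRD relative to the diagnosis sample is an assumption of this sketch, exact only if the diagnosis sample is essentially all leukemic. The copy numbers in the example are hypothetical.

```python
def normalized_ratio(target_copies: float, control_copies: float) -> float:
    """Target (e.g., fusion-gene transcript) copies per copy of the control gene."""
    if control_copies <= 0:
        raise ValueError("control gene not amplified: sample is not evaluable")
    return target_copies / control_copies


def relative_mrd_level(follow_up_target, follow_up_control, diagnosis_target, diagnosis_control):
    """Follow-up tumor load relative to the diagnosis sample (diagnosis = 1)."""
    return normalized_ratio(follow_up_target, follow_up_control) / normalized_ratio(
        diagnosis_target, diagnosis_control
    )


if __name__ == "__main__":
    # Hypothetical copy numbers per reaction.
    level = relative_mrd_level(5, 100_000, 50_000, 100_000)
    print(f"{level:.0e}")  # 1e-04
```

The control gene serves a double purpose here. It corrects for differences in the amount and quality of input material, and a failed control reaction marks the sample as not evaluable rather than MRD-negative.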

A more precise but also more laborious quantifica- 
tion method is based on limiting dilution of MRD- 
positive remission samples. To make this assay reliable, 
it is necessary to perform replicate experiments to 
determine the level of MRD positivity. A less tedious 
strategy for quantitative PCR uses an internal standard
that is coamplified with the target of interest.
Quantification by competitive PCR is performed by 
comparing the PCR signal of the specific target DNA 
with that of known concentrations of an internal 
standard, the competitor. 
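The statistics behind limiting dilution can be sketched with the single-hit Poisson model commonly used for such assays. This model is an assumption of the sketch, not a protocol from the text. Suppose replicate reactions each contain a known number of cells. The fraction of negative replicates, F₀, then estimates exp(−λ), where λ is the mean number of malignant targets per reaction. Hence λ = −ln F₀, and the MRD frequency is λ divided by the number of cells per reaction. The replicate counts below are hypothetical.

```python
import math


def limiting_dilution_frequency(n_negative: int, n_replicates: int, cells_per_reaction: float) -> float:
    """Estimate malignant-cell frequency from replicate PCRs at one dilution (single-hit Poisson).

    All-negative replicates give an estimate of 0, which is really only an upper bound.
    """
    if n_replicates <= 0 or cells_per_reaction <= 0:
        raise ValueError("need positive replicate and cell counts")
    if not 0 <= n_negative <= n_replicates:
        raise ValueError("n_negative must lie between 0 and n_replicates")
    if n_negative == 0:
        raise ValueError("all replicates positive: test a higher dilution before estimating")
    mean_targets = math.log(n_replicates / n_negative)  # lambda = -ln(F0) = ln(n / n_negative)
    return mean_targets / cells_per_reaction


if __name__ == "__main__":
    # Hypothetical: 10 replicates of 10^5 cells each, 6 of them PCR-negative.
    freq = limiting_dilution_frequency(6, 10, 1e5)
    print(f"{freq:.1e}")  # 5.1e-06
```

The estimate is most precise when a substantial fraction of replicates is negative. That is why the text stresses that replicate experiments are needed to make this assay reliable.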

Recently, a novel technology has become available,
the ‘real-time quantitative PCR’ (RQ-PCR). In contrast
to the above-described PCR end-point quantification
techniques, RQ-PCR permits accurate quantification
during the exponential phase of PCR amplification
(see Polymerase Chain Reaction, Real-Time
Quantitative). RQ-PCR is currently the method of choice for
the quantitative detection of MRD using leukemia-specific
chromosome aberrations or junctional regions
of Ig and TCR gene rearrangements as PCR targets.
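RQ-PCR quantification is commonly based on a standard curve. The threshold cycle (Ct) is linear in log10 of the starting copy number, so a dilution series of known standards yields a slope and intercept from which unknown samples are read off. The slope also gives the amplification efficiency as 10^(−1/slope) − 1. A slope of about −3.32 corresponds to 100% efficiency, i.e., doubling every cycle. The sketch below implements that textbook relationship with an ordinary least-squares fit; the standard-series values are synthetic.

```python
import math


def fit_standard_curve(log10_copies, ct_values):
    """Ordinary least-squares fit of Ct = slope * log10(copies) + intercept."""
    n = len(log10_copies)
    if n != len(ct_values) or n < 2:
        raise ValueError("need at least two paired standards")
    mean_x = sum(log10_copies) / n
    mean_y = sum(ct_values) / n
    sxx = sum((x - mean_x) ** 2 for x in log10_copies)
    if sxx == 0:
        raise ValueError("standards must span more than one concentration")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(log10_copies, ct_values))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def copies_from_ct(ct, slope, intercept):
    """Invert the standard curve to estimate starting copies in an unknown sample."""
    return 10 ** ((ct - intercept) / slope)


def amplification_efficiency(slope):
    """Fractional efficiency per cycle (1.0 means perfect doubling)."""
    return 10 ** (-1.0 / slope) - 1.0


if __name__ == "__main__":
    # Synthetic tenfold standard series (10 to 10^5 copies) with a slope of -3.32.
    x = [1, 2, 3, 4, 5]
    ct = [40.0 - 3.32 * v for v in x]
    slope, intercept = fit_standard_curve(x, ct)
    print(round(slope, 2), round(intercept, 2))            # -3.32 40.0
    print(round(copies_from_ct(30.04, slope, intercept)))  # 1000
    print(round(amplification_efficiency(slope), 3))
```

Because Ct is read during the exponential phase, small differences in late-cycle efficiency do not distort the result the way they distort end-point measurements.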


MRD Monitoring in ALL as Example of 
Assessment of Early Treatment 
Response 


Clinical MRD studies, mainly based on Ig/TCR gene 
rearrangements as MRD-PCR targets, have shown 
that the most significant application of MRD moni- 
toring in ALL is the estimation of the initial response 
to single or multiagent therapy. Low levels or absence 
of MRD in bone marrow (BM) after completion of 
induction therapy appears to predict good outcome, 
and the risk of relapse is proportional to detected 
MRD levels. Multivariate analyses showed that the 
degree of MRD positivity after induction therapy is 
the most powerful prognostic factor, independent of 
other clinically relevant risk factors, including age, 
blast count at diagnosis, immunophenotype at diag- 
nosis, presence of chromosome aberrations, response 
to prednisone, and classic clinical risk group assign- 
ment. The results from the large prospective MRD 
study of the International BFM (Berlin-Frankfurt-Münster)
Study Group indicate that combined
information on the kinetics of tumor load decrease 
at the end of induction treatment (at 1 month) and 
before consolidation treatment (at 3 months) is 
superior to analysis of MRD at a single time point, 
because this approach allows recognition of patients 
with poor prognosis as well as patients with good 
prognosis. The combined MRD information distin- 
guishes patients at low risk having MRD negativity 
at both time points (5-year relapse-free survival 98%); 
patients at high risk having high (>10⁻³) or intermediate
(10⁻³) degrees of MRD at both time points
(5-year relapse-free survival 16%); and the remaining 
patients at intermediate risk (5-year relapse-free sur- 
vival 76%; Figure 3). The MRD-based low-risk 
patients make up a group of a substantial size 
(approximately 43%), comparable with the frequency 
of survivors of childhood ALL in the 1970s, before
treatment intensification was introduced. Within the 
MRD-based low-risk group, half of the patients 
already have low (<10~*) or undetectable MRD levels 
after 2 weeks of treatment. This group might profit
particularly from treatment reduction. On the other 
hand, the group of patients at MRD-based high risk is 
larger than any previously identified high-risk group 
(approximately 15%) and has an unprecedentedly 
high 5-year relapse rate of 84%. This group might 
benefit from further intensification of treatment proto- 
cols including BMT during first remission or novel 
treatment modalities. 
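The three-way risk assignment described above can be written as a small decision rule. This is a sketch only, not the study's analysis code. MRD values are given as fractions of malignant cells, with 0 standing for PCR negativity. The high-risk cut-off (10⁻³ at both time points in this sketch) is an adjustable parameter, not a validated clinical threshold.

```python
def mrd_risk_group(mrd_1_month: float, mrd_3_months: float, high_cutoff: float = 1e-3) -> str:
    """Classify by MRD at end of induction (1 month) and before consolidation (3 months).

    0.0 denotes a PCR-negative sample; high_cutoff is an assumed, adjustable threshold.
    """
    if mrd_1_month < 0 or mrd_3_months < 0:
        raise ValueError("MRD levels cannot be negative")
    if mrd_1_month == 0 and mrd_3_months == 0:
        return "low"           # MRD-negative at both time points
    if mrd_1_month >= high_cutoff and mrd_3_months >= high_cutoff:
        return "high"          # MRD at or above the cut-off at both time points
    return "intermediate"      # all remaining patients


if __name__ == "__main__":
    for sample in [(0.0, 0.0), (1e-2, 1e-3), (1e-4, 0.0)]:
        print(sample, mrd_risk_group(*sample))
```

Requiring both time points for the low- and high-risk labels is what allows the combined kinetics to separate patients more sharply than a single measurement.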

Continuous MRD monitoring in childhood ALL 
throughout chemotherapy has shown that steady 
decrease in MRD levels to negative PCR results dur- 
ing treatment is associated with favorable prognosis, 
whereas persistence of high MRD levels or steady 
increase in MRD levels generally leads to clinical 
relapse. However, this information is of limited clin- 
ical value when compared with assessment of early 
treatment response and might only be valuable for 
the small group of MRD-based high-risk patients in 
order to evaluate the effectiveness of BMT or alter- 
native treatment approaches. 


MRD Detection in APL as Example of 
Continuous Monitoring for ‘Treatment 
Titration’ 


With modern treatment protocols, combining all- 
trans-retinoic acid with consolidation chemotherapy, 
clinical remission is achieved in virtually all APL 
patients, but 20-30% of patients will still suffer from 
relapse. MRD studies in APL are based on RT-PCR 
monitoring of PML-RARA fusion mRNA associated 
with t(15;17), which is present in more than 90% of 
APL patients (Table 1). MRD monitoring during 
early treatment phases is of limited clinical value, 
showing variable degrees of positivity. To obtain clin- 
ically relevant information, prospective MRD moni- 
toring is required during the first 6-12 months after 
consolidation treatment. This ‘continuous’ MRD 
monitoring allows early identification of patients at 
increased risk of relapse, because the reappearance 
of MRD usually precedes hematological relapse at a 
median time of 2-3 months. This information led to 
the definition of molecular relapse in APL, which is 
manifested by conversion from RT-PCR negativity to 
positivity in two successive BM samplings during 
follow-up. Patients treated at the time point of mo- 
lecular relapse have significantly better 2-year event- 
free survival as compared to patients treated with the 
same salvage therapy at the time of hematological 
relapse (92% vs 44%). 
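The APL definition of molecular relapse is a simple rule over a series of follow-up results. A minimal sketch, assuming each bone marrow sample is reduced to a single RT-PCR call for PML-RARA (True = positive) listed in chronological order; the function name is hypothetical:

```python
from typing import Optional, Sequence


def molecular_relapse_index(pcr_positive: Sequence[bool]) -> Optional[int]:
    """Return the index of the sample that confirms molecular relapse, or None.

    Relapse is called when, after at least one PCR-negative sample,
    two successive bone marrow samples are PCR-positive.
    """
    seen_negative = False
    for i, positive in enumerate(pcr_positive):
        if not positive:
            seen_negative = True
        elif seen_negative and i + 1 < len(pcr_positive) and pcr_positive[i + 1]:
            return i + 1  # the second of the two successive positive samples
    return None


if __name__ == "__main__":
    print(molecular_relapse_index([False, False, True, True]))   # 3
    print(molecular_relapse_index([False, True, False, True]))   # None
```

Requiring two successive positives guards against a single spurious positive (for example, from PCR contamination) triggering salvage therapy.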


MRD Detection in CML as Example of 
post-BMT Monitoring 


In order to cure patients with CML, chemotherapy
and interferon-α therapy have to be followed by allogeneic
BMT. In virtually all (more than 95%) CML
patients, MRD monitoring is possible based on RT- 
PCR detection of BCR-ABL fusion mRNA (Table 1). 
The most relevant clinical application of MRD in 
CML is the assessment of treatment response after 
BMT. The vast majority of patients are PCR-positive 
in the first 6-9 months after BMT, and in vitro experi- 
ments show that at least some of the BCR-ABL- 
positive cells have a clonogenic potential. Sustained 
PCR negativity within 1 year after BMT is associated 
with cure, while patients with PCR positivity after 
1 year or more post-BMT have significantly greater 
risk of relapse than patients with PCR negativity. 
Not all post-BMT patients with persistent PCR 
positivity relapse. The group of high-risk patients 
can be identified with serial quantitative PCR an- 
alyses; these patients generally show increasing MRD 
levels several months prior to hematological or cyto- 
genetic relapse. Patients who remain in remission gen- 
erally have decreasing or persistently low MRD levels, 
with some patients being BCR-ABL mRNA-positive 
even 10 years after allogeneic BMT. Quantitative 
MRD studies in CML enabled the definition of 
molecular relapse after allogeneic BMT, which is 
equivalent to rising or persistently high MRD levels 
(BCR-ABL to ABL ratio of more than 0.02%) in two 
consecutive specimens more than 4 months after BMT. 
Quantitative MRD analysis is also used for moni- 
toring the response to immunotherapy, i.e., donor 
lymphocyte infusions for patients relapsing after allo- 
geneic BMT. Preliminary data indicate that the outcome 
after immunotherapy is more favorable when immuno- 
therapy is administered at the phase of cytogenetic or 
molecular relapse, when the burden of disease is rela- 
tively low. In some responders such early treatment 
results in conversion into sustained PCR negativity. 
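The quantitative CML criterion can be checked mechanically. The sketch below is an illustration under one stated interpretation: "rising or persistently high" is taken to mean that both consecutive specimens, each taken more than 4 months after BMT, exceed a BCR-ABL to ABL ratio of 0.02%. It does not distinguish a rising from a persistently high profile. `Specimen` and `is_molecular_relapse` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Specimen:
    months_after_bmt: float
    bcr_abl_to_abl_percent: float  # BCR-ABL/ABL ratio expressed in percent


def is_molecular_relapse(specimens: Sequence[Specimen],
                         threshold_percent: float = 0.02,
                         min_months: float = 4.0) -> bool:
    """True if two consecutive specimens taken more than `min_months`
    after BMT both exceed the threshold ratio."""
    # Keep only late specimens, in chronological order.
    late = [s for s in sorted(specimens, key=lambda s: s.months_after_bmt)
            if s.months_after_bmt > min_months]
    # Look at each pair of consecutive late specimens.
    return any(first.bcr_abl_to_abl_percent > threshold_percent
               and second.bcr_abl_to_abl_percent > threshold_percent
               for first, second in zip(late, late[1:]))


if __name__ == "__main__":
    print(is_molecular_relapse([Specimen(6, 0.05), Specimen(9, 0.08)]))  # True
    print(is_molecular_relapse([Specimen(3, 0.5), Specimen(6, 0.5)]))    # False
```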


Conclusion 


Reliable quantitative PCR techniques are currently 
available for MRD detection in most patients with 
hemopoietic malignancies (Table 1). Each MRD 
technique has its advantages and disadvantages, 
which have to be weighed carefully to make an appro- 
priate choice. On the one hand false-positive and 
false-negative results should be prevented, but on the 
other hand the MRD techniques should be sufficiently 
sensitive. These requirements can generally be met 
with PCR analysis of chromosome aberrations, if 
adequate precautionary measures are taken to prevent 


cross-contamination of PCR products. PCR analysis 
of junctional regions of Ig/TCR gene rearrangements 
has the advantage of its broad applicability in all cat- 
egories of lymphoid malignancies as well as the advant- 
age of high sensitivity levels. However, the use of Ig/ 
TCR gene rearrangements as PCR targets requires 
extra efforts per patient for identification of the junc- 
tional region sequences and needs careful selection of 
the most stable rearrangements. 

Although most MRD techniques are relatively sen- 
sitive, one should realize that MRD negativity does 
not exclude the presence of malignant cells. Each 
MRD test only screens 10° to 10° cells, which repre- 
sent a minor fraction of the total amount of hemo- 
poietic cells in a human body. In addition, it might 
well be that the distribution of low numbers of 
malignant lymphoid cells throughout the body is not 
homogeneous and that the investigated cell sample is 
not fully representative. 
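The sampling argument can be made quantitative. Assume, in the best case and contrary to the caveat above, that malignant cells are distributed homogeneously, so the number of malignant cells in a sample is Poisson-distributed with mean equal to cells screened times malignant fraction. The sketch below (hypothetical function names) gives the chance that a sample contains no malignant cell at all, and hence must test negative regardless of assay sensitivity.

```python
import math


def probability_no_malignant_cell(cells_screened: float, malignant_fraction: float) -> float:
    """Poisson probability that the screened sample contains zero malignant cells."""
    expected = cells_screened * malignant_fraction
    return math.exp(-expected)


def cells_needed(malignant_fraction: float, detection_probability: float = 0.95) -> int:
    """Smallest sample size for which at least one malignant cell is present
    with the requested probability (homogeneous-distribution assumption)."""
    return math.ceil(-math.log(1.0 - detection_probability) / malignant_fraction)


if __name__ == "__main__":
    # At a tumor load of 1 malignant cell per 10^5, screening 10^5 cells
    # still misses the disease entirely about 37% of the time.
    print(probability_no_malignant_cell(1e5, 1e-5))
    print(cells_needed(1e-5))  # 299574 cells for 95% confidence
```

Clustering of malignant cells would make the real miss rate higher than this estimate, and the calculation ignores PCR efficiency altogether.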

Finally, the clinical impact of MRD detection in the
various categories of hemopoietic malignancies is not
identical. In ALL, the main application of MRD data
was shown to be the evaluation of early treatment
response, with precise measurement of tumor load
reduction during remission induction therapy. In
contrast, the value of MRD detection in CML and
APL relies on monitoring over a clinically relevant,
disease-specific (and treatment protocol-dependent)
time-span, with possibilities for adapting the treatment
based on MRD results. This is probably also possible
in other subtypes of AML and mature lymphoid
malignancies. However, further studies are needed to
define the disease-specific ‘MRD windows’ (required
sensitivity and time span) for clinically reliable MRD
monitoring in AML, chronic lymphocytic leukemias,
and NHL.

Figure 4 Relapse-free survival of the three MRD-based
risk groups of childhood ALL treated with chemotherapy
protocols of the International BFM Study Group, as
defined by combined MRD information at the end of
induction treatment (at 1 month) and before consolidation
treatment (at 3 months). Patients in the low-risk
group have MRD negativity at both time points (43% of
patients), patients in the high-risk group have MRD
degrees of ≥10⁻³ at both time points (15% of
patients), and the remaining patients form the MRD-based
intermediate-risk group (43% of patients). The
numbers of patients at risk are given in parentheses for
each group at 24 months and 48 months after time
point two. (From Van Dongen et al., 1998.)

The success of MRD studies in hemopoietic malig- 
nancies is related to the easy accessibility of BM and PB 
(peripheral blood), which are seeded with malignant 
cells. In contrast, MRD studies in solid tumors are 
hampered by the difficulty of sequentially sampling the
tissues that are primarily affected by the malignancies.
Nevertheless, MRD information can at least improve 
the staging of solid tumors at diagnosis, such as BM 
analysis in breast cancer and colon cancer patients. 
Preliminary data indicate that in some solid tumors
MRD can be detected in PB during follow-up, and that
its presence is associated with a more aggressive disease course.
Minimum change is the minimum number of evolutionary
steps or changes that are required to explain
the differences among the sequences under study. 
In the case of two sequences, each difference between 
the two sequences must require at least one change, 
so the minimum change (i.e. the minimum number of 
changes) can be estimated as the observed number 
of differences between the two sequences. When 
there are more than two sequences, the situation 
becomes more complex because the minimum number 
of changes depends on the evolutionary relationships 
among the sequences (i.e., the phylogenetic tree). 
However, for each alternative tree, we can infer the 
minimum number of changes required to explain the 
differences among the sequences. A method for mak- 
ing this inference has been developed by Fitch (1971). 
When this is done for all possible alternative trees, one 
chooses the tree with the smallest minimum change as 
the best tree. This principle of tree reconstruction is 
known as the maximum parsimony method and the 
chosen tree, the maximum parsimony tree. Computer 
programs have been developed to undertake this 
tedious procedure (e.g., Swofford, 1999). 
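Fitch's (1971) counting procedure for a single site works from the tips toward the root of a binary tree: each node receives the intersection of its children's state sets if that intersection is non-empty, and otherwise their union, at the cost of one change. Summing over sites gives the minimum change for that tree. The sketch below is a minimal illustration that assumes trees are written as nested pairs of sequence names (a representation chosen for this example, not taken from any program); rooting does not alter the count. For two sequences it reduces to the number of differing sites.

```python
from typing import Dict, Set, Tuple, Union

# A tree is either a leaf name or a pair (left_subtree, right_subtree).
Tree = Union[str, Tuple["Tree", "Tree"]]


def fitch_site(tree: Tree, states: Dict[str, str]) -> Tuple[Set[str], int]:
    """Return (candidate state set, minimum changes) for one site."""
    if isinstance(tree, str):
        return {states[tree]}, 0
    left, right = tree
    left_set, left_changes = fitch_site(left, states)
    right_set, right_changes = fitch_site(right, states)
    common = left_set & right_set
    if common:
        # Children can agree: no extra change needed at this node.
        return common, left_changes + right_changes
    # Children disagree: one change is required somewhere below this node.
    return left_set | right_set, left_changes + right_changes + 1


def minimum_change(tree: Tree, sequences: Dict[str, str]) -> int:
    """Sum Fitch's per-site counts over all aligned sites."""
    length = len(next(iter(sequences.values())))
    return sum(fitch_site(tree, {n: s[i] for n, s in sequences.items()})[1]
               for i in range(length))


if __name__ == "__main__":
    seqs = {"A": "ACGT", "B": "ACGA", "C": "TCGA", "D": "TCGA"}
    # The three possible unrooted topologies for four sequences.
    trees = [(("A", "B"), ("C", "D")), (("A", "C"), ("B", "D")),
             (("A", "D"), ("B", "C"))]
    for t in trees:
        print(t, minimum_change(t, seqs))  # 2, 3, 3
    print("maximum parsimony tree:", min(trees, key=lambda t: minimum_change(t, seqs)))
```

Scoring every tree in this way is practical only for a handful of sequences, because the number of possible trees grows very rapidly with the number of sequences; this is why dedicated programs are used.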
In contrast to traditional restriction fragment length
polymorphisms (RFLPs) caused by base pair changes
in restriction sites, a special class of RFLP loci present 
in all mammalian genomes is highly polymorphic with 
very large numbers of alleles. These ‘hypervariable’ 
loci were first exploited in a general way by Jeffreys 
and his colleagues for genetic mapping in humans. 


Hypervariable RFLP loci of this special class are 
known by a number of different names including 
variable number tandem repeat (VNTR) loci and 
minisatellites. Minisatellites are composed of unit 
sequences that range from 10 to 40 bp in length and 
are tandemly repeated from tens to thousands of 
times. Although various functions have been sug- 
gested for minisatellite loci as a class, none of these 
has withstood the test of further analysis. Rather, it 
appears most likely that minisatellite loci (like micro- 
satellite loci) evolve in a neutral manner through 
expansion and contraction caused by unequal crossing- 
over between out-of-register repeat units. Recombi- 
nation events of this type will yield reciprocal prod- 
ucts that both represent new alleles with a change in 
the number of repeat units. 
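The bookkeeping of unequal crossing-over can be sketched by tracking repeat counts only. Assume the crossover joins the first `break_a` repeat units of allele A to the units of allele B that follow position `break_b`, with the reciprocal product formed from the remaining pieces; the function name and parameters are hypothetical.

```python
from typing import Tuple


def unequal_crossover(repeats_a: int, repeats_b: int,
                      break_a: int, break_b: int) -> Tuple[int, int]:
    """Repeat counts of the two reciprocal products of a crossover.

    When break_a != break_b the alleles were paired out of register.
    """
    if not (0 <= break_a <= repeats_a and 0 <= break_b <= repeats_b):
        raise ValueError("break points must lie within the alleles")
    product_1 = break_a + (repeats_b - break_b)
    product_2 = break_b + (repeats_a - break_a)
    return product_1, product_2


if __name__ == "__main__":
    # Two 20-repeat alleles misaligned by two units give one expanded
    # and one contracted allele: (22, 18).
    print(unequal_crossover(20, 20, 12, 10))
    # In-register exchange only swaps the existing allele lengths: (25, 20).
    print(unequal_crossover(20, 25, 10, 10))
```

The total number of repeat units is conserved, so an out-of-register exchange always yields one longer and one shorter new allele.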

The frequency with which new alleles are created at
minisatellite loci, on the order of 10⁻³ per locus per
gamete, is much greater than the classical mutation
rate of 10⁻⁵ to 10⁻⁶. This leads to a much higher level
of polymorphism between unrelated individuals
within a population. At the same time, one change in
a thousand gametes is low enough not to interfere
with the ability to follow minisatellite alleles in classical
breeding studies.

Length polymorphisms at minisatellite loci are most 
simply detected by digestion of genomic DNA samples 
with a restriction enzyme that does not cut within the 
minisatellite itself but does cut within closely flanking 
sequences. As with all other RFLP analyses, the restric- 
tion digests are fractionated by gel electrophoresis, 
blotted, and hybridized to probes derived from the 
polymorphic locus. However, unlike traditional point
mutation RFLPs, minisatellite polymorphisms are caused by, and
reflect, changes in the actual size of the locus itself. 
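Because the enzyme cuts only in the flanking sequence, each allele's band size is a linear function of its repeat count. A minimal sketch, with hypothetical flank lengths and a hypothetical 30-bp unit (within the 10 to 40 bp range given above):

```python
from typing import Dict, Iterable


def band_sizes(repeat_counts: Iterable[int], unit_bp: int,
               left_flank_bp: int, right_flank_bp: int) -> Dict[int, int]:
    """Map each allele (repeat count) to its restriction fragment size in bp."""
    return {n: left_flank_bp + n * unit_bp + right_flank_bp for n in repeat_counts}


if __name__ == "__main__":
    # Hypothetical locus: 200-bp and 150-bp flanks around 30-bp repeat units.
    print(band_sizes([18, 22], unit_bp=30, left_flank_bp=200, right_flank_bp=150))
    # {18: 890, 22: 1010}
```

Alleles differing by four repeat units are thus separated by 120 bp on the gel, which is what makes the length change directly visible after hybridization.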

The simultaneous detection of 10 to 40 unlinked 
and highly polymorphic loci provides a whole genome 
‘fingerprint’ pattern which is very likely to show 
differences between any two unrelated individuals. 
These DNA fingerprints provide a powerful tool in 
human forensic analysis in the absence of any know- 
ledge as to the map location of any of the individual 
loci that are being detected. DNA fingerprinting per se 
is of much less use in the analysis of laboratory ani- 
mals, who do not bring paternity suits or stand trial 
for rape or murder. However, fingerprinting can allow 
field biologists to follow individual animals in wild 
populations subjected to repeated capture-and-release 
sampling. It can also be used to monitor the integrity 
of inbred strains of mice and for the characterization 
and comparison of different breeds of domesticated 
animals that have commercial importance. 


See also: Microsatellite; Restriction Fragment 
Length Polymorphism (RFLP) 


Mismatch Repair 
(Long/Short Patch) 


S A Lacks 


Copyright © 2001 Academic Press 
doi: 10.1006/rwgn.2001.0832


DNA base mismatches correspond to noncomple- 
mentary base pairing or the absence of base pairing 
due to insertions or deletions in one strand of the 
DNA duplex. Such mismatches result from errors in 
DNA replication, base damage, or the formation of 
heteroduplex products of genetic recombination. A 
generalized mismatch repair system that recognizes a 
variety of base mismatches and short deletions or 
insertions is present in nearly all living species. This 
system removes a long segment of the strand targeted 
for correction, which is then resynthesized with the 
other strand as template to give long patch repair. 
Some species have specialized systems that recognize 
a particular mismatch within a specific DNA sequence 
and always remove the same component of the mis- 
match as a DNA segment just a few nucleotides long 
(short patch repair). 
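The two kinds of lesion named above, noncomplementary pairs and unpaired bases from insertions or deletions, can be illustrated by scanning an aligned duplex. This is a hypothetical sketch of the recognition step only, not a model of any repair enzyme: it assumes the bottom strand is written 3'→5' directly under the top strand, with '-' marking an unpaired position.

```python
from typing import List, Tuple

WATSON_CRICK = {("A", "T"), ("T", "A"), ("G", "C"), ("C", "G")}


def find_mismatches(top: str, bottom: str) -> List[Tuple[int, str, str, str]]:
    """Report positions in an aligned duplex that lack a Watson-Crick pair.

    top is written 5'->3'; bottom is the partner strand written 3'->5'
    so that top[i] faces bottom[i]. '-' marks a gap (insertion/deletion).
    """
    if len(top) != len(bottom):
        raise ValueError("strands must be aligned to the same length")
    found = []
    for i, (a, b) in enumerate(zip(top, bottom)):
        if a == "-" or b == "-":
            found.append((i, a, b, "insertion/deletion"))
        elif (a, b) not in WATSON_CRICK:
            found.append((i, a, b, "base-base mismatch"))
    return found


if __name__ == "__main__":
    print(find_mismatches("GATTACA", "CTAATGT"))  # [] (fully complementary)
    print(find_mismatches("GATTACA", "CTAGTGT"))  # [(3, 'T', 'G', 'base-base mismatch')]
    print(find_mismatches("GATTACA", "CTA-TGT"))  # [(3, 'T', '-', 'insertion/deletion')]
```

Finding a mismatch does not tell which strand is in error; repair systems must additionally decide which strand to target for correction.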


See also: Repair Mechanisms 
Translation is the biosynthesis of protein using the
codons in messenger RNA (mRNA) as a template. 
Therefore, the sequence of a protein should be easily 
predicted from the sequence of the mRNA, or from 
the corresponding region of DNA, by use of a genetic 
code table. Mistranslation is the biosynthesis of a 
protein whose sequence is not predicted from a 
codon by codon reading of the sequence of the 
mRNA. Translation, like the other steps in biological 
information flow, is quite accurate, and the total 
cumulative error frequency is certainly less than one 
error per 1000 codons read. However, an ‘average 
rate’ can be very misleading, since particular mistrans- 
lation events seem to occur at a very wide range of 
frequencies. Indeed, even in cells that use the standard 
genetic code, the synthesis of certain proteins requires 
what might seem like an error in the translation of a 
particular mRNA. These events are referred to as 
alternative readings, and although some occur at frequencies
of a few percent, others occur at, or near,
100%. Both erroneous readings and alternative read- 
ings will be discussed below. 
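Even a low per-codon error rate matters for long proteins. Assuming, for illustration only, that errors occur independently at every codon with the upper-bound rate of one per 1000 codons quoted above, the expected fraction of completely error-free chains is (1 - e)^L for a protein of L codons. The helper name is hypothetical.

```python
def fraction_error_free(codons: int, error_per_codon: float) -> float:
    """Expected fraction of chains with no translational error,
    assuming independent errors at a uniform per-codon rate."""
    return (1.0 - error_per_codon) ** codons


if __name__ == "__main__":
    print(round(fraction_error_free(300, 1e-3), 3))   # about 0.741
    print(round(fraction_error_free(1000, 1e-3), 3))  # about 0.368
```

Because particular codons are far more error-prone than the average, the independence and uniform-rate assumptions make this only a rough guide.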

The kinds of abnormal proteins that are produced 
by mistranslation can also be generated by errors in 
transcription or by mutation (and some of the transla- 
tional errors have names similar to specific types 
of mutations). It is relatively simple to determine 
whether the errors observed were generated by muta- 
tion, because all the molecules of a protein made 
subsequent to the mutation will contain the error 
and mutations are inheritable. An error-containing 
mRNA may also be translated several times, but the 
error is not heritable and the error-containing mRNA 
has a relatively short half-life. Errors in translation can 
often be distinguished from errors in transcription by 
the use of mutants, antibiotics, or growth conditions 
which are known to increase or decrease the fidelity of 
translation. Studies of mistranslation are handicapped 
by the fact that the resulting products may be difficult 
to detect and may be rapidly degraded by the cell. 


Errors in Translation 


There are a few general types of errors (or alternatives) 
and a very large number of specific errors that could 
happen during the synthesis of a protein from an 
error-free mRNA. An amino acid substitution caused 
by mistranslation is called a missense error (the corres- 
ponding mutation is called a missense mutation). 

Missense errors can be caused by misacylation of a 
transfer RNA (tRNA; and subsequent incorporation 
of the incorrect amino acid into a protein) or by a 
properly acylated tRNA misreading a codon. Misacylation
by aminoacyl-tRNA synthetases typically
involves related amino acids, e.g., valine for isoleucine.
Therefore, missense errors caused by these events 
would probably involve conservative amino acid sub- 
stitutions, that is, the substituted amino acid might 
function almost identically to the correct amino acid. 
In vitro measurements would indicate that even the 
most frequent events probably occur at frequencies of 
less than 10 * errors per acylation. 

Misreading of codons is apparently the result 
of a mismatch in one of the three possible codon- 
anticodon base pairs during elongation on the ribo- 
some. Therefore, misreading, like misacylation, will 
often also lead to conservative amino acid substitu- 
tions, because similar amino acids often have related 
codons. Indeed, because of the degeneracy of the code, 
many possible codon misreadings will produce an 
unaltered protein (as is the case for silent mutations). 
A reasonable number of misreading events have been 
measured in vivo, and in bacteria the mean error
frequency seems to be about 5 × 10⁻⁴ errors per amino
acid incorporated. However, some missense errors 
happen at frequencies of almost 1%, and some codons 
are clearly error-prone. Note that proteins containing 
missense errors will be full-length and many will have 
normal or nearly normal activity. 
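The effect of code degeneracy on single-position misreading can be tabulated directly from the standard code table. The sketch below is a simplification: it treats a misreading as if the ribosome had decoded one of the nine codons that differ at a single position, whereas real misreading depends on specific codon-anticodon mismatches. `classify_single_base_misreadings` is a hypothetical name.

```python
from itertools import product
from typing import Dict

BASES = "UCAG"
# Standard genetic code, codons in the order UUU, UUC, UUA, UUG, UCU, ..., GGG.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE: Dict[str, str] = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def classify_single_base_misreadings(codon: str) -> Dict[str, int]:
    """Count outcomes if `codon` were read as each of its 9 one-base neighbours."""
    original = CODE[codon]
    counts = {"silent": 0, "missense": 0, "false_stop": 0,
              "readthrough": 0, "stop_retained": 0}
    for pos in range(3):
        for base in BASES:
            if base == codon[pos]:
                continue
            misread = CODE[codon[:pos] + base + codon[pos + 1:]]
            if original == "*":
                counts["stop_retained" if misread == "*" else "readthrough"] += 1
            elif misread == original:
                counts["silent"] += 1
            elif misread == "*":
                counts["false_stop"] += 1
            else:
                counts["missense"] += 1
    return counts


if __name__ == "__main__":
    print(classify_single_base_misreadings("CUG"))  # 4 silent, 5 missense
    print(classify_single_base_misreadings("UGA"))  # 8 readthrough, 1 stop retained
```

For the leucine codon CUG, four of the nine neighbouring readings still give leucine, which illustrates why many misreadings are silent.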

The codon being misread does not have to be a 
sense codon. Even if it is, misreading does not neces- 
sarily result in an amino acid substitution. Read- 
through of stop codons, observed as leakiness, results 
from the insertion of an amino acid in response to a 
stop codon. This is known to happen at a wide range 
of frequencies and has been observed in prokaryotes 
and eukaryotes. Indeed, readthrough of certain stop 
codons in bacteria can happen at frequencies of a few 
percent. In bacteria, the commonly used UAA stop 
codon seems to be the least leaky, with readthrough 
frequencies of less than 107°. These errors are the 
result of an acylated tRNA erroneously reading the 
stop codon. Therefore, the amino acid inserted often 
has a codon closely related to the stop codon, e.g., 
tryptophan (UGG) inserted at UGA. Most of these 
errors have been studied by examining the suppression 
of a nonsense mutation, although readthrough of stop 
codons at the normal end of a gene is also known to 
occur. The opposite type of error, premature termin- 
ation at a sense codon, is called a false stop and seems 
to occur much less frequently. 
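The C-terminal extension produced by stop codon readthrough can be illustrated with a toy translator. A minimal sketch, assuming a single readthrough event at the first stop codon (here inserting tryptophan at UGA, as in the example above) and normal termination at the next stop; the mRNA sequence and function name are invented for the example.

```python
from itertools import product
from typing import Dict, Optional

BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE: Dict[str, str] = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate_with_readthrough(mrna: str, readthrough_aa: Optional[str] = None) -> str:
    """Translate in frame 0; optionally insert `readthrough_aa` at the first stop."""
    protein = []
    read_through_once = False
    for pos in range(0, len(mrna) - 2, 3):
        aa = CODE[mrna[pos:pos + 3]]
        if aa == "*":
            if readthrough_aa is not None and not read_through_once:
                protein.append(readthrough_aa)  # amino acid inserted at the stop codon
                read_through_once = True
                continue
            break  # normal termination
        protein.append(aa)
    return "".join(protein)


if __name__ == "__main__":
    mrna = "AUGGCUUGACCCAAAUAA"  # AUG GCU UGA CCC AAA UAA
    print(translate_with_readthrough(mrna))       # MA
    print(translate_with_readthrough(mrna, "W"))  # MAWPK (C-terminal extension)
```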

While there are many examples of genes that seem 
to use internal initiation to produce alternative pro- 
teins, there are also examples of what seem quite 
clearly to be errors in initiation. These then are ex- 
amples of sense codons being misread as start codons. 

The protein that results from stop codon read- 
through will have a C-terminal extension which may 
or may not affect activity. For aberrant initiation 
events, the protein produced will be closely related 
to the native protein, but it will be longer or shorter 
depending on the exact position of the error. (Erro- 
neous initiation events in other frames would be diffi- 
cult to detect.) It is necessary to know both the 
location of the event and the protein affected before 
one can predict what the effect on activity might be. 
However, false stops will lead to truncated proteins, 
and the majority of these should be without activity. 
Truncated proteins can also result from frameshift 
errors. 

Frameshift errors are caused by a ribosome shifting 
into another reading frame while translating a par- 
ticular mRNA. Such a ribosomal frameshift will 
yield the same type of defective protein as a frameshift 
mutation. Frameshifted products will be identical to 
the expected product up to the site of the frameshift 
and then differ depending on the direction and extent 
of the shift and the location of stop codons in the other 


frames. Since stop codons are usually abundant in 
other frames, most frameshift products will be trun- 
cated and inactive. The mean frequency of frameshift 
errors in bacteria may be about 10~* errors per trans- 
location event, but some erroneous events may be 
much more frequent. 
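The typical outcome of a frameshift error, a product identical to the normal one up to the shift and then divergent and truncated, can be reproduced with a toy translator. A minimal sketch, assuming a single shift of `shift` bases (+1 skips a base, -1 re-reads one) after a given number of codons; the mRNA and function name are hypothetical.

```python
from itertools import product
from typing import Dict, Optional

BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE: Dict[str, str] = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate_with_shift(mrna: str, shift_after_codon: Optional[int] = None,
                         shift: int = 1) -> str:
    """Translate from position 0, shifting the reading frame once if requested."""
    protein = []
    pos = 0
    codons_read = 0
    while pos + 3 <= len(mrna):
        if shift_after_codon is not None and codons_read == shift_after_codon:
            pos += shift  # ribosome slips into another frame
            if pos < 0:
                raise ValueError("shift moved before the start of the mRNA")
            shift_after_codon = None  # shift only once
            continue
        aa = CODE[mrna[pos:pos + 3]]
        if aa == "*":
            break
        protein.append(aa)
        pos += 3
        codons_read += 1
    return "".join(protein)


if __name__ == "__main__":
    mrna = "AUGCCCAAUGUGACUGGCUAA"  # in frame: AUG CCC AAU GUG ACU GGC UAA
    print(translate_with_shift(mrna))                                # MPNVTG
    print(translate_with_shift(mrna, shift_after_codon=2, shift=1))  # MPM
```

After the +1 shift the ribosome reads AUG and then meets UGA in the new frame, giving the short, truncated product typical of frameshift errors.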

The typical frameshift error seems to be the result 
of slippage of the ribosome by one base (or two) 
toward either the 3’ end or the 5’ end of the mRNA. 
However, events called ‘hops’ are also known. In these 
the ribosome seems to skip over one or more codons, 
resulting in a protein missing a number of amino acids 
internally, but otherwise normal. A hop that did not 
result in a ribosome being back in the correct frame 
would be very difficult to detect. 


Antibiotics and Mutations which 
Influence Error Frequency 


The antibiotics streptomycin and neomycin can be 
used to increase the frequency of translational errors 
in bacteria. Paromomycin has a similar effect on 
eukaryotic cells. As mentioned above, there are also 
mutations that affect the accuracy of translation. 
Many of these are in genes encoding ribosomal pro- 
teins. The first studied were those leading to strepto- 
mycin resistance in Escherichia coli. Many mutations 
in the gene encoding ribosomal protein S12 decrease
the occurrence of many kinds of translational errors,
including missense errors and stop codon readthrough.
Such mutations are said to have a restrictive
phenotype, that is, they restrict certain kinds of errors.
Mutations in other genes that encode ribosomal
proteins, including those for S4 and S5, can increase
errors. These mutations lead to a ribosomal-ambiguity 
(Ram) phenotype. Mutations leading to altered trans- 
lation elongation factors, elongation factor Tu and 
elongation factor G, are also known to affect the 
accuracy of translation, as do some mutations in ribosomal
RNA (rRNA). All such mutations tend to slow
down translation.


Alternative Readings of Standard Code 


There are derivatives of the standard genetic code in 
which some codons call for different amino acids, e.g., 
the codon UGA is a termination codon in the standard 
code but is a tryptophan codon in animal mitochon- 
dria. However, here we will deal not with alternative 
codes, but with alternative readings of the standard 
or ‘universal’ genetic code. Alternative readings differ 
from errors only in their frequency and in their 
obvious programmed nature. An alternative reading 
can also be defined as one that produces a required 
protein. These events have clearly established that the
context surrounding codons can affect the accuracy of
translation and that context-dependent translational 
strategies are used by a variety of organisms. 

One of the earliest alternative readings to be
discovered is the use of start codons other
than AUG. For example, in prokaryotes the codons
GUG and UUG, which typically encode valine and
leucine, respectively, can be used as start
codons and in that case they encode methionine. It is 
the nearby sequence context, including the ribosome 
binding site, on the mRNA that allows the ribosome to 
recognize these codons as start codons. More recently 
it has been discovered that in both prokaryotes and 
eukaryotes the stop codon UGA can call for the in- 
corporation of selenocysteine in certain contexts. In 
mRNAs using alternative start codons or containing 
UGA as a selenocysteine codon, the alternative read- 
ing is the dominant reading. A variety of other alter- 
native translation strategies are known in which there 
is more than one possible outcome. These are not 
considered simply as errors that occur at a very high 
frequency, because in many cases it has been clearly 
demonstrated that functional protein cannot be pro- 
duced in the absence of such events. 

Many of these programmed alternatives are known 
to be involved in the production of essential virus 
proteins. Numerous examples can be found in the 
strategies used to produce RNA replicase or reverse 
transcriptase in viruses and virus-like genomes. These 
complex events involve a required readthrough of a 
stop codon or programmed frameshifts within a par- 
ticular mRNA. Typically a fusion protein is produced, 
with the N-terminal domains of the protein synthe- 
sized normally, while the C-terminal portion, which 
includes the RNA replicase or reverse transcriptase 
domains, can only be synthesized if the ribosome 
uses an alternative translation strategy. In some 
viruses, for instance the alphaviruses of animals, several
plant viruses, and several retroviruses, stop codon 
readthrough is required. Since this readthrough hap- 
pens at most in a few percent of the messenger transits, 
the full-length fusion protein is not produced in large 
amounts. In many other retroviruses, a programmed
ribosomal frameshift is required. Some retroviruses 
require two frameshifts to synthesize the full-length 
fusion protein. Once again, these are typically not 
highly efficient events and the fusion protein is pro- 
duced in small amounts. These frameshifts may serve 
both to regulate the amount and the activity of the 
active enzyme. 

In programmed alternatives for translation of viral
genes or cellular genes, it is clear that the context 
near the stop codon or the frameshift site is critical 
for the event. This context always involves sequence 
specificity and often secondary structure of the 
mRNA. When these sites are moved into other constructs,
they still function. It seems that almost all
‘errors’ can be programmed, e.g., the bacteriophage 
T4 has a programmed ribosomal hop over 50 nucleo- 
tides in the mRNA from gene 60 which occurs nearly 
all the time. 

In some cases, these programmed alternatives are 
susceptible to the same types of mutations that affect 
the frequency of translational errors. However, this is 
not always the case. In E. coli programmed ribosomal 
frameshifts are involved in the translation of at least 
some genes whose protein products are essential to the 
cell. Such events will not be eliminated by mutations 
that generally decrease error frequency such as the 
restrictive streptomycin resistance mutations. How- 
ever, programmed alternatives not used by host 
encoded genes could be sites for specific control func- 
tions. Some plants produce ribosome-inactivating pro- 
teins. One of these, pokeweed antiviral protein, 
modifies the large rRNA and interferes with ribosomal 
translocation. It seems possible that the antiviral activ- 
ity of this protein results from its ability to specifically 
inhibit certain programmed ribosomal frameshifts. 


See also: Degenerate Code; Frameshift Mutation; 
Genetic Code; Mutation, Missense; Ribosome 
Binding Site; Start, Stop Codons; Translation 
Mitochondria, cellular organelles of respiration and
ATP production, are found in almost all eukaryotic 
cells. The mitochondrion has a primary role in energy 
metabolism, a role that is intimately connected with its 
double-lipid membrane structure. Formation of mito- 
chondria (mitochondrial biogenesis) is under the dual 
control of the nuclear and mitochondrial genetic 
systems. The presence of functional DNA in mitochondria
reflects their evolutionary descent from an
endosymbiotic bacterial ancestor.


Structure 


The mitochondrion has two bounding membranes, 
outer and inner, which are structurally and function- 
ally distinct. One major difference is their permea- 
bility properties: the outer membrane permits free 
passage of most molecules of molecular weight less 
than about 10 000 daltons, whereas the inner membrane 
forms an effective barrier to even small molecules and
ions. The inner and outer membranes define two sub- 
mitochondrial soluble compartments, the intermem- 
brane space and the matrix (the latter enclosed by the 
inner membrane). The inner membrane is highly in- 
vaginated, folded into cristae that greatly increase the 
membrane’s surface area. 

When isolated, or when viewed in electron micrograph
thin sections, mitochondria often appear round or
oblong. However, in a living cell, mitochondria
may actually form a dynamic interconnected
network, or syncytium, pieces of which are constantly 
breaking off and re-fusing. 


Function 


The mitochondrial matrix is the site of the tricar- 
boxylic acid (TCA) cycle, a series of enzymatic reac- 
tions initiated by the conversion of pyruvate and fatty 
acids to acetyl coenzyme A (acetyl-CoA). Pyruvate 
and fatty acids are transported into mitochondria 
from the cytoplasm by membrane-bound permeases. 
The acetyl group of acetyl-CoA is oxidized in a 
number of steps to yield carbon dioxide (CO₂) and
the reduced electron carriers nicotinamide adenine
dinucleotide (NADH) and flavin adenine dinucleotide
(FADH₂). These coenzymes are the source of
the electrons that are transported along the respiratory 
chain of the inner mitochondrial membrane, in a path- 
way that ultimately leads to the formation of ATP. 
Electrons are passed through a series of donors and 
acceptors organized into four complexes (I-IV), with 
a variety of electron carriers, including cytochromes, 
serving to shuttle electrons from one complex to the 
next. In the final reaction, catalyzed by complex IV 
(cytochrome c oxidase), electrons are transferred from 
reduced cytochrome c to molecular oxygen (O₂), with
formation of water (H₂O).

Oxidation of substrates through the respiratory 
chain is coupled to formation of ATP through the 
process known as oxidative phosphorylation. Elec- 
tron transport is directly linked to the pumping of 
protons across the inner mitochondrial membrane 
(from matrix to intermembrane space). Proton pump- 
ing sets up an electrochemical proton gradient, also 
known as the proton-motive force. Subsequent dissi- 
pation of this gradient by movement of protons back 
across the inner mitochondrial membrane (down the 
proton concentration gradient) is in turn coupled to 
formation of ATP from ADP and inorganic phosphate 
(Pᵢ). This reaction is catalyzed by the ATP synthase
complex (complex V) of the inner membrane. The
proton-motive force is also used to power the carrier-
mediated transport of ADP and Pᵢ into the mitochondrion
from the cytoplasm, in exchange for ATP.
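The proton-motive force has an electrical component (the membrane potential, matrix side negative) and a chemical component (the pH difference, matrix side alkaline). Each pH unit is worth 2.303RT/F, about 61.5 mV at 37°C. The sketch below combines the two as magnitudes under that sign convention, which is an assumption of this example (textbooks differ in how they sign the terms); the 150 mV and 0.5 pH-unit inputs are illustrative values, not measurements.

```python
import math

R = 8.314    # gas constant, J mol^-1 K^-1
F = 96485.0  # Faraday constant, C mol^-1


def z_factor_mv(temp_kelvin: float) -> float:
    """2.303 RT/F in millivolts: the potential equivalent of one pH unit."""
    return math.log(10) * R * temp_kelvin / F * 1000.0


def proton_motive_force_mv(membrane_potential_mv: float, delta_ph: float,
                           temp_kelvin: float = 310.15) -> float:
    """Magnitude of the proton-motive force.

    membrane_potential_mv: magnitude of the potential across the inner
    membrane (matrix negative). delta_ph: pH(matrix) - pH(intermembrane
    space), positive when the matrix is more alkaline. Both components
    then drive protons back into the matrix, so they add.
    """
    return membrane_potential_mv + z_factor_mv(temp_kelvin) * delta_ph


if __name__ == "__main__":
    print(round(z_factor_mv(310.15), 1))               # about 61.5 mV per pH unit
    print(round(proton_motive_force_mv(150, 0.5), 1))  # about 180.8 mV
```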


Biogenesis 


Most of the proteins required for the formation and 
functioning of mitochondria are encoded in the 
nucleus, synthesized on cytoplasmic ribosomes, and 
imported into the organelle. However, a small but 
critical number of mitochondrial proteins are speci- 
fied by the mitochondrial genome; their messenger 
RNAs are translated by a mitochondrial protein- 
synthesizing system that is distinct from the main 
translation system located in the cytoplasm. Proteins 
encoded by mitochondrial DNA (mtDNA) include 
essential constituents of the inner membrane com- 
plexes I-V. Human mitochondria encode and synthe- 
size 13 such proteins, which interact with imported, 
nucleus-encoded partner proteins to form functional 
respiratory complexes. 

The mitochondrial translation system itself has a 
dual genetic origin. The ribosomal RNA components 
of the mitochondrial ribosome are always encoded by 
and transcribed from mtDNA, but most or all of the 
ribosomal proteins are imported. In some eukaryotes 
(e.g., animals), the mitochondrial genome specifies 
a minimal set of the transfer RNA (tRNA) species 
required to support mitochondrial translation; in 
other cases (e.g., plants), additional nucleus-encoded 
tRNA species must be imported into mitochondria to 
supplement a mtDNA-encoded set that is insufficient 
for mitochondrial protein synthesis. Components of 
the mitochondrial replication and transcription sys- 
tems are encoded exclusively by the nuclear genome in 
virtually all eukaryotes. Nucleus-encoded proteins 
that are destined for the mitochondrion are initially 
synthesized in the cytoplasm. These precursor pro- 
teins contain N-terminal targeting sequences that 
bind to receptors on the mitochondrial surface. The 
precursors then pass through a translocase of the outer 
membrane (the TOM complex), then interact with 
and transit the translocase of the inner membrane 
(the TIM complex). Following transport of the pre- 
cursor into the matrix, a processing peptidase removes 
the N-terminal targeting sequence. Chaperones then 
mediate the folding of the processed protein into its 
mature form. 


Evolutionary Origin 


The discovery and detailed investigation of mtDNA 
and the genes it encodes has provided compelling 
molecular evidence in support of a direct eubacterial 
origin of the mitochondrion and its genome. The 
‘endosymbiont hypothesis,’ first entertained more
than a century ago, proposes that mitochondria
originated as bacteria-like endosymbionts within a
nucleus-containing host cell. Although there is
continuing debate about whether mitochondria
originated at the same time as, or subsequent to, the
nucleus and some other components of the eukaryotic
cell, there is little question that the mitochondrion
traces its origin to a particular group of eubacteria,
the α-Proteobacteria. The molecular evidence supporting
this view includes comparisons of mitochondrial gene
sequences (both rRNA and protein) and of gene
organization. Among extant bacterial groups,
mitochondria appear most closely related to a subgroup
of α-Proteobacteria that includes obligate intracellular
parasites such as Rickettsia prowazekii, the causative
agent of epidemic typhus.

Evolution of mtDNA has involved repeated and 
progressive loss of genes, sometimes with concomi- 
tant transfer of these genes to the nuclear genome. As 
a result, although the genetic function of mtDNA is 
basically the same in all eukaryotes, its actual content 
of genetic information varies quite widely. 


See also: Mitochondrial DNA (mtDNA); 
Mitochondrial Genome; Mitochondrial 
Inheritance; Mitochondrial Mutants 


Mitochondria, Genetics of 


B D Dyer 


Copyright © 2001 Academic Press 
doi: 10.1006/rwgn.2001.1492 


All cells may be characterized as one of two types: 
prokaryotic and eukaryotic. The prokaryotes are the 
bacteria and their cells are relatively simple in that 
they lack nuclei and other membrane-bound compart- 
ments. Eukaryotes include all of the protists, fungi, 
plants, and animals. Their cells, by contrast, do con- 
tain nuclei (a definitive characteristic) as well as other 
compartments such as mitochondria in which specific 
cell processes are contained. Two important metabolic
pathways, the Krebs cycle and electron transport, are
contained within the mitochondria. Mitochondria
(along with chloroplasts; see separate article) are
especially interesting among membrane-bound
compartments in that they have their own semiautonomous
genetic systems, a legacy of their evolutionary
origins. Mitochondria are essentially well-integrated
bacterial symbionts with an estimated two and a half
billion year history of intimate association with their
hosts. Their closest living bacterial relatives are
the α-Proteobacteria, such as Rickettsia and Paracoccus.
Even though mitochondria have greatly reduced
genomes compared to their bacterial counterparts,
they still retain many of their genetic capabilities as
well as some unusual features, such as a modified genetic
code and distinctive patterns of inheritance.


Mitochondrial Genomes 


Mitochondrial genomes are usually circular and are
present in multiple copies per mitochondrion,
characteristics of their bacterial ancestry. However,
mitochondria have lost most of their genes.
Reclinomonas, an obscure protist, has the largest
number of mitochondrial genes, 97, densely arranged on
69 kb of DNA. Plant mitochondria have hundreds to
thousands of kb of DNA and yet relatively few genes.
For example, Arabidopsis has 367 kb but only 57 genes.
Apicomplexans (including malaria-causing Plasmodium)
have the smallest mitochondrial genomes, with five genes
and 6 kb, perhaps a reflection of their highly obligate
relationships as intracellular parasites. The human
mitochondrial genome has 37 genes and about 16.6 kb.

How did these genomes come to be so reduced? A 
general evolutionary trend in obligate symbioses is a 
streamlining of genomes such that many redundant 
and extraneous genes are lost and many others are 
transferred horizontally to the symbiotic partner.
Although mechanisms for loss and transfer of genes 
are not known to be directional, the fixation of such 
events is biased in favor of the nucleus. That is, 
although loss may occur in any genome and hori- 
zontal transfer may occur from mitochondrion to 
mitochondrion and in either direction between mito- 
chondria and nucleus, the net effect is that mitochon- 
dria have lost many genes and the nuclei have gained 
some. This is because nuclear genes are more likely to 
be evenly distributed and inherited in cell division, 
while the mitochondrial mechanism for distribution 
is less precise. During the cell cycle, mitochondria 
(often hundreds per cell) also replicate but are not 
necessarily partitioned evenly to offspring. Also, any
loss of redundant or extraneous genes is likely to con- 
fer a greater advantage to an individual mitochondrion 
by increasing its speed of replication relative to other 
mitochondria in the same cell. 
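The point about imprecise partitioning can be illustrated with a minimal Python simulation. It assumes, purely for illustration, that each mitochondrion independently goes to one daughter cell or the other with equal probability (a binomial model); real cells may have partitioning mechanisms that this sketch does not attempt to capture, and the function names are hypothetical.

```python
import random


def partition_mitochondria(n_mito: int, rng: random.Random) -> tuple[int, int]:
    """Split n_mito mitochondria between two daughter cells.

    Assumption (illustrative only): each organelle independently goes to
    daughter A or daughter B with probability 0.5.
    """
    to_a = sum(1 for _ in range(n_mito) if rng.random() < 0.5)
    return to_a, n_mito - to_a


def daughter_imbalances(n_mito: int, divisions: int, seed: int = 1) -> list[int]:
    """Absolute difference in mitochondrial number between the two daughters,
    for each of several independent simulated divisions."""
    rng = random.Random(seed)
    imbalances = []
    for _ in range(divisions):
        a, b = partition_mitochondria(n_mito, rng)
        imbalances.append(abs(a - b))
    return imbalances


if __name__ == "__main__":
    # A cell with 200 mitochondria, divided 10 times from the same starting state.
    print(daughter_imbalances(n_mito=200, divisions=10))
```

Even with no bias at all, the two daughters typically differ by several organelles under this model (the standard deviation of the difference is √n, about 14 for n = 200), which is the kind of imprecision the text contrasts with the even segregation of nuclear genes.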


Shared Coding 


Horizontal transfer of genes from the mitochondria to 
the nucleus has resulted in several instances of shared 
coding for some essential mitochondrial functions. A 
good example is the mitochondrial F1F0 ATP synthase,
with some of its subunits coded for by the nucleus and
others by the mitochondria. Different lineages of
mitochondria have evolved different combinations
for sharing the coding of the subunits. Thus, while
horizontal transfer seems to occur frequently and is
almost always fixed in favor of the nucleus, the details
(such as which essential genes are transferred) seem to
be more random. One consequence of shared coding is
a cementing of symbiotic relationships, in that neither
partner can survive without the other if crucial
structures are the responsibility of both. Furthermore,
regulatory control of mitochondrial functions and
structures, especially at the transcriptional level,
requires intricate coordination between nucleus and
mitochondria.


Variations on the Genetic Code and 
Editing 


The so-called universal genetic code is not entirely
universal with respect to mitochondria. Slight changes
have occurred in the mitochondria of all four kingdoms
of eukaryotes (animals, plants, fungi, and protists), and
these changes correspond to the phylogenetic tree of
eukaryotes. Changes in the genetic code entail mutations
in tRNA genes, which would be lethal in most cases.¹
The seemingly high tolerance of mitochondria for tRNA
mutations may be due to their reduced number of genes
and therefore the lower likelihood of a mutation being
lethal. Also, most of the changes have involved stop
and start codons, which may be less disruptive to the
formation of complete mRNAs and proteins. Protein-coding
genes often alternate with rRNA and tRNA genes, so
that processing of those non-mRNA transcripts might
actually compensate for the lack of punctuation.

Further complications arise in some plant
mitochondria and trypanosome mitochondria, as well as
in a few scattered taxa, which edit their mRNA
transcripts. Here, changes to amino acid codons have
occurred, and the correct reading is restored by
converting bases (in particular, C to U) in the mRNA, a
process called editing. This would seem to be a rather
convoluted solution to a problem with a tRNA mutation,
and yet editing evolved at least twice. Essentially,
editing comprises two potentially lethal mutations:
one that alters the use of a codon, and another that
compensates by switching C to U but that could itself
be disruptive. An additional oddity is the arrangement
of the trypanosome mitochondrion (present as one large
organelle per cell). In addition to 40-50 copies of the
main genome, there are thousands of copies of
minicircle DNA, which is involved in the editing process.

¹Actually, there are some nonmitochondrial examples of
code changes in a few disparate groups. Some ciliates
and a prokaryote, Mycoplasma, have undergone changes in
their stop codons, and Candida, a yeast, has undergone a
change in an amino acid codon. These rare codon changes
are not well understood, although the examples of
altered stop codons would seem to support the
contention that such changes are more tolerated than
other codon changes, at least under rare circumstances.


Recombination of Mitochondrial DNA 
and Mitochondrial Diseases 


The semiautonomous genetics of mitochondria is 
exemplified in recombination experiments using organ- 
isms (especially yeast) with a high tolerance for mito- 
chondrial mutations as a result of being facultatively 
aerobic. Mitochondria, in keeping with their bacterial
ancestry, may exhibit resistance or sensitivity to
antibiotics, and populations of doubly resistant (or
sensitive) mitochondrial mutants may be obtained via
mitochondrial fusions and gene recombination, which
apparently happen readily. In fact, a cell with tens to
hundreds of mitochondria may be viewed as a sort of
container for the population genetics of mitochondria.
Recombination is especially obvious in organisms in
which gametes are of similar size and both contribute
to the zygotic population of mitochondria. In organisms
with maternal inheritance (see “Maternal Inheritance”
below), recombination is somewhat less likely.

Other mitochondrial mutations that can be shown
to recombine include those that affect respiration.
These include the petite mutations of yeast, which are
expressed as stunted growth due to lack of a complete
aerobic metabolism. In obligately aerobic organisms,
such as mammals, such mutations are in most cases
lethal and therefore difficult to observe. However, new
techniques have revealed more and more mitochondrial
diseases of humans, many of which manifest
themselves by a progressive loss of respiratory
function and deterioration of muscular and nervous
systems. Kearns-Sayre syndrome, for example,
progressively affects nerves and muscles because of
large deletions in mitochondrial DNA that arise during
development in some tissues. Interestingly, a milder
version of the disease may occur in old age due to the
‘normal’ accumulation of mitochondrial mutations.
Indeed, mitochondrial genes are more in the mutational
crossfire than most, because they are exposed
to oxygen and free radicals as a normal part of their
environment.


Maternal Inheritance 


In organisms in which gametes are of unequal size 
(typically with female gametes being larger), there is 
an opportunity for maternal inheritance of mitochon- 
drial genes. The mechanism is simply that male 
gametes (sperm or pollen) do not contribute either 
significantly or at all to the mitochondria of the 
newly formed zygote. The mitochondria are all or
almost all from the egg (or ovum). Thus, pedigrees for
many mitochondrial diseases show inheritance only
through the mother.

Maternal inheritance was the key to the search for a
‘mitochondrial Eve’, in which mitochondrial changes in
various human populations worldwide were examined in
an attempt to construct a family tree. Mitochondrial
DNA has a higher mutation rate than most nuclear DNA
and therefore shows more variability over relatively
few generations. Furthermore, human mitochondrial DNA
is relatively unperturbed by recombination. From such
a study, a ‘mitochondrial Eve’ (or village of related
‘Eves’) was extrapolated to have lived in East Africa
about 200 000 years ago and to have given rise to all of
the migrating populations of humans. However, the
interpretation is not entirely straightforward, because
a small number of sperm mitochondria sometimes do
make a contribution. Recombination events between
maternal and paternal mitochondria may therefore
occur, making the mitochondrial lineage a little less
direct and the timing of the migrations somewhat more
difficult to extrapolate.

Definition


Mitochondrial DNA (mtDNA) is the physical embodiment
of the mitochondrial genome, the sum total of
genetic information encoded in the mitochondrion.
As its name implies, mtDNA is compartmentalized
within the mitochondrion and is therefore physically
and transcriptionally separate from the main, nuclear
genome of the eukaryotic cell. Moreover, mtDNA is
distinct in its evolutionary origin, having been derived
from a eubacterial ancestor through a process of
endosymbiosis.


Isolation 


Mitochondrial DNA is concentrated in and readily 
isolated from a mitochondrial subcellular fraction pre- 
pared by cell disruption and differential centrifuga- 
tion. Alternatively, and especially in those cases in 
which a mitochondrial fraction is not readily isolated, 
mtDNA can often be separated from nuclear DNA 
(and from chloroplast DNA, when that is also present)
by buoyant density centrifugation in a gradient
containing a heavy-metal salt (such as cesium
chloride) and a UV-fluorescent dye. This separation
is possible because mtDNA frequently has a base
composition that is substantially different from
(usually more AT-rich than) that of the bulk of the
nuclear DNA.
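Why base composition permits this separation can be made concrete with the widely used empirical relation for native DNA in cesium chloride, ρ = 1.660 + 0.098 × (G+C mole fraction) g/cm³. The Python sketch below assumes that relation, ignores any contribution of the fluorescent dye, and uses hypothetical G+C values rather than measurements for any particular organism.

```python
def buoyant_density(gc_fraction: float) -> float:
    """Approximate buoyant density (g/cm^3) of native DNA in CsCl.

    Uses the empirical linear relation rho = 1.660 + 0.098 * GC, where GC is
    the G+C mole fraction (0 to 1). Dye binding is not modeled.
    """
    if not 0.0 <= gc_fraction <= 1.0:
        raise ValueError("gc_fraction must lie between 0 and 1")
    return 1.660 + 0.098 * gc_fraction


if __name__ == "__main__":
    # Hypothetical values: an AT-rich mtDNA (20% G+C) vs. nuclear DNA (40% G+C).
    mt = buoyant_density(0.20)
    nuc = buoyant_density(0.40)
    print(f"mtDNA {mt:.4f}, nuclear {nuc:.4f}, difference {nuc - mt:.4f} g/cm^3")
```

In this made-up example the AT-rich molecule bands about 0.02 g/cm³ lighter than the nuclear DNA, which is why the two can be resolved in a single gradient.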


Form 


The physical form of mtDNA is quite variable. In 
most animals, the mtDNA exists as covalently closed, 
circular DNA molecules of uniform size; in a few 
animals (e.g., hydra), the mtDNA is linear. Both linear 
and circular mtDNAs are found throughout the 
fungal and protist kingdoms. In linear mtDNAs, the 
structure of the termini can be quite different. For 
instance, in the green alga Chlamydomonas reinhard- 
tii, the linear mtDNA terminates at each end in a 3’ 
single-strand extension. In contrast, in the ciliate
protozoon Tetrahymena pyriformis, the linear genome is
capped at both ends by a tandem array of small, 
repeated sequences 31 bp long. 

Mitochondrial DNA has a complex structure in 
flowering plants (angiosperms). Through restriction 
site analysis and sequencing, circular maps can be 
constructed for most plant mtDNAs. These ‘master 
circles’ invariably contain large, directly repeated 
regions. Homologous recombination between two 
such repeats resolves a master circle into subgenomic 
circles, each of which contains one copy of the recom- 
bination repeat. In plant mtDNAs containing a large 
number of different repeats, the potential number of 
subgenomic recombination products becomes quite 
large. Some studies suggest that plant mtDNAs that 
map as circles, as well as perhaps some nonplant 
mtDNAs, actually exist and function in the form of 
complex, tandemly repeated, linear arrays. 
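The resolution of a master circle by recombination between one pair of direct repeats follows simple bookkeeping: if the circle is read as repeat R, segment A, a second copy of R, then segment B, recombination between the two R copies yields one circle containing R plus A and another containing R plus B. The Python sketch below computes the two product sizes under that assumption; the coordinates and sizes in the example are hypothetical, and the model ignores inverted repeats and further rounds of recombination.

```python
def subgenomic_circles(master_len: int, repeat_len: int,
                       start1: int, start2: int) -> tuple[int, int]:
    """Sizes of the two circles produced by recombination between two direct
    repeat copies on a circular genome.

    The master circle is modeled as R, A, R, B, with the repeat copies
    starting at 0-based positions start1 < start2. Each product keeps one
    copy of R, so the two sizes always sum to master_len.
    """
    if not 0 <= start1 < start2 < master_len:
        raise ValueError("require 0 <= start1 < start2 < master_len")
    first = start2 - start1        # R + A
    second = master_len - first    # R + B
    if first < repeat_len or second < repeat_len:
        raise ValueError("repeat copies overlap on the circle")
    return first, second


if __name__ == "__main__":
    # Hypothetical 570-kb master circle with 10-kb repeat copies at 0 and 250 kb.
    small, large = subgenomic_circles(570_000, 10_000, 0, 250_000)
    print(small, large)
```

With several different repeat pairs, each pair can be fed through the same calculation, which is one way to see how quickly the number of possible subgenomic products grows.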


Size 


The smallest known mtDNA is found in Plasmodium
falciparum, the human malaria parasite, and other
members of the protist phylum Apicomplexa. This
mtDNA is about 6000 bp long and contains only
three protein-coding genes in addition to small-subunit
and large-subunit ribosomal RNA (rRNA) coding
sequences. The mtDNAs of flowering plants are by far
the largest characterized to date, ranging between
approximately 200 and 2400 kb in size. However, the
greatest number of mtDNA-encoded genes (97) is
contained in the 69 034-bp mtDNA of the protist
Reclinomonas americana. Size ranges for mtDNA are
typically 15-20 kb in animals, 20-100 kb in fungi, and
20-80 kb in protists.


Gene Organization 


Wide variation in gene organization is seen in 
mtDNA. A hallmark of animal mtDNAs is their com- 
pact structure, with coding sequences separated by 
only a few nucleotides, directly abutting one another 
or even overlapping. A noncoding (D-loop, or con- 
trol) region, ranging from a few hundred to a few 
thousand base pairs in size, contains signals directing 
replication and transcription. Some small fungal and 
many protist mtDNAs are similarly compact, with 
less than 10% of the total sequence being noncoding. 
At the other extreme, the large plant mtDNAs consist 
mostly (more than 90%) of noncoding DNA. 

Plant (angiosperm) mtDNAs are extremely vari- 
able in organization: large-scale differences have 
been found even within different varieties of the 
same angiosperm species. This genomic fluidity is
attributed to frequent recombination among
angiosperm mtDNA molecules, with variant organizational
patterns becoming fixed in the population. In contrast, 
gene order is almost invariant in vertebrate animals. 

In some mtDNAs, the structure of certain genes 
borders on the bizarre. In the green alga C. reinhardtii, 
small-subunit and large-subunit rRNA genes are frag- 
mented into separate modules that are interspersed 
with protein-coding and tRNA genes throughout a 
6-kb stretch of the 15.8-kb mtDNA. The rRNA pieces 
transcribed from these rDNA modules are not spliced 
together to form a covalently continuous rRNA; 
rather, they function in fragmented form within the 
mitochondrial ribosome. An opposite situation is seen 
in the mtDNA of two protists, Acanthamoeba castel- 
lanii and Dictyostelium discoideum; here, two protein- 
coding genes are fused into a single open reading 
frame. 


Evolutionary Origin 


The discovery and subsequent investigation of
mtDNA have provided compelling evidence that the
mitochondrial genome derives in evolution from the
α-proteobacterial lineage of eubacteria, and
specifically from a subgroup of α-Proteobacteria that
includes obligate intracellular parasites such as
Rickettsia prowazekii, the causative agent of epidemic
typhus. Phylogenetic reconstructions based on both
mitochondrial rRNA and mitochondrial protein-coding
genes support this evolutionary affiliation.

The Reclinomonas americana mtDNA, a “eubac- 
terial genome in miniature,” is the most bacteria-like 
mtDNA characterized to date. Although gene order is 
highly variable among the broad range of mtDNAs, 
there are some cases in which the arrangement of 
genes is very similar in mitochondrial and eubacterial 
genomes. For example, a number of ribosomal protein 
genes are in the same order in Reclinomonas amer- 
icana and some other protist and plant mtDNAs as 
they are in Rickettsia prowazekii and other eubacterial 
genomes. However, certain of the genes present in 
the eubacterial clusters are specifically missing in the 
mitochondrial clusters. These mitochondrion-specific 
deletions, in concert with other kinds of data, argue in 
favor of a single endosymbiotic origin of mitochon- 
dria. 

Comparison of genes that carry out the same 
function but are encoded in different genomes (mito- 
chondrial or nuclear) has in some cases clearly demon- 
strated that these genes are homologous (related by 
descent from a common ancestor). These observations 
provide evidence of mitochondrion-to-nucleus gene 
transfer in the course of evolution, an ongoing process 
in many eukaryotes that has helped to reduce
substantially the size and overall coding capacity of mtDNA
relative to the endosymbiont genome from which it 
originated. 


Further Reading 
Margulis L (1970) Origin of Eukaryotic Cells. New Haven, CT: Yale 
University Press. 


See also: Mitochondrial Genome 


Mitochondrial Genome 


M W Gray 


Copyright © 2001 Academic Press 
doi: 10.1006/rwgn.2001.0837


Mitochondria, the organelles of cellular respiration, 
contain their own store of genetic information, 
which comprises the mitochondrial genome. Mito- 
chondrial DNA (mtDNA) is the physical entity 
that encodes this organelle-specific information. The 
distinction between ‘mitochondrial genome’ and
‘mitochondrial DNA’ is a subtle one, so much so that
the two terms are often used interchangeably.

The mitochondrial genome is the evolutionary 
remnant of a eubacterial (specifically α-proteobacterial)
genome that became part of the eukaryotic cell
through a process of endosymbiosis. Although the 
genetic information content of the mitochondrial 
genome is limited, it is essential in the formation of a 
functional organelle. Correct expression of the mito- 
chondrial genome is necessary for oxygen utilization 
and ATP formation to occur normally and at normal 
levels. 


Function 


The function of the mitochondrial genome is basic- 
ally the same in all eukaryotes in which it has been 
investigated: it always encodes a limited number of 
proteins involved in electron transport and coupled 
oxidative phosphorylation and (less frequently) one 
or more of the protein components of the mitochon- 
drial ribosome. The RNA species of the mitochondrial 
ribosome are always encoded by the mitochon- 
drial genome; in contrast, the number of mitochon- 
drially encoded transfer RNA (tRNA) species is quite 
variable. Messenger RNAs (mRNAs) transcribed 
from the mitochondrial genome are translated by 
a mitochondrial protein-synthesizing system whose 
components have a dual genetic origin. The recently 
characterized mitochondrial genome of Reclinomonas 
americana, a freshwater protozoan, contains the larg- 
est number of genes (97) so far identified in any 
mtDNA, including 18 protein genes not previously 
known to be encoded in mitochondria. 


Mitochondrial Genes 


Genes Encoding Proteins Involved in 
Electron Transport and Oxidative 
Phosphorylation 

The mitochondrial genome specifies components of 
complexes I-IV of the electron transport chain as well 
as complex V (ATP synthase). The genes correspond- 
ing to these various complexes are abbreviated nad 
(complex I), sdh (II), cob (III), cox (IV), and atp (V).
The number of genes in each class varies among mito- 
chondrial genomes, with the mtDNA of humans 
encoding seven nad, no sdh, one cob, three cox, and 
two atp genes (13 in total). The largest number of such 
genes (24) is found in the R. americana mitochondrial 
genome, whereas the smallest number (3) occurs in the 
mitochondrial genome of Plasmodium falciparum, 
the human malaria parasite, and related members 
of the protist phylum Apicomplexa. In mitochondrial 
genomes harboring smaller numbers of respiratory 
chain genes, the ‘missing’ genes are typically found
in the nuclear genome, with their cytoplasmically syn- 
thesized protein products being imported into mito- 
chondria. 
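The human counts given above can be tallied as a quick consistency check. The Python sketch below simply restates the article's numbers (seven nad, no sdh, one cob, three cox, and two atp genes) and adds the two rRNA genes and 22 tRNA genes described in the next section; the dictionary layout is an illustrative choice, not a standard data format.

```python
# Respiratory-chain genes encoded by human mtDNA, as counted in the text:
# gene family -> (respiratory complex, number of genes).
HUMAN_RESPIRATORY_GENES = {
    "nad": ("I", 7),
    "sdh": ("II", 0),
    "cob": ("III", 1),
    "cox": ("IV", 3),
    "atp": ("V", 2),
}
HUMAN_RRNA_GENES = 2   # large-subunit and small-subunit rRNAs
HUMAN_TRNA_GENES = 22


def protein_gene_total(genes: dict[str, tuple[str, int]]) -> int:
    """Sum the protein-coding genes over all respiratory complexes."""
    return sum(count for _complex, count in genes.values())


if __name__ == "__main__":
    for family, (complex_id, count) in HUMAN_RESPIRATORY_GENES.items():
        print(f"complex {complex_id:>3} ({family}): {count}")
    proteins = protein_gene_total(HUMAN_RESPIRATORY_GENES)
    print("protein-coding:", proteins)
    print("all genes:", proteins + HUMAN_RRNA_GENES + HUMAN_TRNA_GENES)
```

The totals come to 13 protein-coding genes and 37 genes overall, matching the figure for the human mitochondrial genome given in the preceding entry.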


Genes Encoding Components of the 
Mitochondrial Translation Machinery 

All mitochondrial genomes encode the large subunit 
(23S-like) and small subunit (16S-like) RNA compon- 
ents of the mitochondrial ribosome, but only a few 
also encode a 5S ribosomal RNA (rRNA), an other- 
wise ubiquitous constituent of prokaryotic and eukary- 
otic ribosomes. Ribosomal protein genes are absent 
or almost absent from animal and fungal mitochon- 
drial genomes, but are encoded in plant and a number 
of protist mitochondrial genomes. For example, 27 
ribosomal proteins (12 small subunit and 15 large 
subunit) are specified by the R. americana mitochon- 
drial genome. The latter genome also has a gene (tufA) 
for a translation elongation factor. 

Mitochondrial genomes contain a variable number 
of tRNA genes, ranging from 0 in the kinetoplastid 
protozoa to upwards of 30 in other protozoa. The 
human and most other animal mitochondrial genomes 
encode 22 tRNA genes, a number just sufficient to 
support mitochondrial translation through an expand- 
ed codon recognition mechanism. In those cases 
where the number of mitochondrially encoded 
tRNAs is insufficient to support mitochondrial 
protein synthesis, supplementary nucleus-encoded 
tRNA species are imported from the cytoplasm. In 
flowering plants, some of the mitochondrial tRNAs 
are transcribed from portions of ‘promiscuous’ chloro- 
plast DNA that have been transferred to and incor- 
porated into the mitochondrial genome in the course 
of evolution. 


Genetic Code 


Mitochondria provided the first exceptions to the 
standard genetic code, demonstrating that the genetic 
code is not completely universal. In the mitochondria 
of humans, yeast, and many other eukaryotes, the 
codon UGA is translated as tryptophan instead of 
signaling termination. This particular change seems 
to have occurred independently in a number of differ- 
ent mitochondrial lineages. Other changes in the 
mitochondrial code are more limited in distribution, 
occurring in a few, closely related lineages. 
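The effect of the UGA reassignment on reading a message can be sketched in Python. The standard table below is built from the usual 64-letter amino acid string, with codons ordered UUU, UUC, UUA, UUG, UCU, and so on; the "mitochondrial" table changes only UGA to tryptophan, as discussed above. Real mitochondrial codes of particular lineages include further reassignments that this sketch deliberately omits, and the test message is made up.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code; '*' marks a stop codon. Codon order: first base
# varies slowest (UUU, UUC, UUA, UUG, UCU, ...), matching itertools.product.
STANDARD_AA = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
STANDARD = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), STANDARD_AA)}

# Only the UGA -> Trp change described in the text; all other codons as standard.
MITO_UGA_TRP = {**STANDARD, "UGA": "W"}


def translate(mrna: str, table: dict[str, str]) -> str:
    """Translate codon by codon from position 0, stopping at the first stop."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        aa = table[mrna[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


if __name__ == "__main__":
    message = "AUGUGAUUUUAA"   # hypothetical: AUG UGA UUU UAA
    print("standard:      ", translate(message, STANDARD))
    print("UGA = Trp code:", translate(message, MITO_UGA_TRP))
```

Under the standard code this message stops at UGA after a single methionine (M), whereas with UGA read as tryptophan it continues to the UAA stop, giving MWF.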


Gene Expression 


In virtually all mitochondria, mitochondrial
transcription is entirely under the control of the nuclear
genome, the mitochondrial genome being devoid of
genes for the transcription machinery. In almost all
cases, the mitochondrial RNA polymerase is a
nucleus-encoded, single-polypeptide, bacteriophage
T3-like enzyme. A notable exception is the
R. americana mitochondrial genome, which contains
genes (rpoA-D) encoding four subunits of a
multisubunit, eubacteria-like α2ββ′σ RNA polymerase.
It appears that, at an early stage in the evolution of
the mitochondrial genome, the latter enzyme was
replaced by the nucleus-encoded, phage-like RNA
polymerase, with subsequent loss of the rpo genes
from the mitochondrial genome of almost all
eukaryotes.

Self-splicing introns (group I and II) are present in
genes encoded by plant, fungal, and some protist
mitochondrial genomes; in animals, such introns have
so far been found only in a few primitive lineages.
Intron sequences are removed following
transcription in a cis-splicing process that, although
fundamentally autocatalytic, often requires auxiliary
protein factors, encoded by either the mitochon- 
drial or nuclear genomes. In some plant mtDNAs, 
protein-coding genes are fragmented at the genome 
level, with the sub-genic ‘modules’ being scattered 
throughout the genome as a result of recombination, 
sometimes ending up on opposite strands of the 
mtDNA. Such fragmentation invariably takes place 
within intron sequences that are normally cis-spliced 
at the level of the transcript. In the case of these split 
genes, an intermolecular trans-splicing process 
between two separate RNAs (independently tran- 
scribed) serves to join exon sequences together in the 
correct order. 

A notable feature of mitochondrial gene expression 
is the frequent occurrence of ‘RNA editing,’ which 
re-tailors an otherwise nonfunctional transcript. Two 
types of editing, insertional and substitutional, have 
been described. Insertional editing is exemplified 
by the U insertion/deletion editing of mRNAs that 
occurs in kinetoplastid protozoa; in this case, U resi- 
dues are inserted at specific sites within the primary 
transcript, whereas some U residues that are encoded 
in the genome are removed at the RNA level. Such 
editing may involve only a few U residues or it may be 
quite extensive; in the latter case, such ‘pan editing’ 
may account for more than half of the residues present 
in the mature mRNA, with the corresponding ‘gene’ 
being unrecognizable. In this system, small antisense 
‘guide RNAs’ specify where editing should occur, 
providing information in trans through complemen- 
tary base-pairing interactions with the transcript to 
be edited. A C-to-U type of substitutional editing is
prominent in plant mitochondria, with changes usually
occurring at first or second positions of codons; as a
result, a different amino acid is specified by the edited
codon, compared with the corresponding unedited
sequence. Such editing may also create initiation and
termination codons.
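Both kinds of editing described above can be sketched as string operations in Python. The edit sites are supplied directly as hypothetical positions; the guide RNAs and enzymes that select them in real systems are not modeled, the short sequences in the example are invented, and the helper names are illustrative only.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code, codons ordered UUU, UUC, UUA, UUG, UCU, ... ('*' = stop).
AA = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AA)}


def c_to_u_edit(mrna: str, positions: list[int]) -> str:
    """Substitutional editing: convert C to U at the given 0-based positions."""
    seq = list(mrna)
    for pos in positions:
        if seq[pos] != "C":
            raise ValueError(f"position {pos} holds {seq[pos]!r}, not C")
        seq[pos] = "U"
    return "".join(seq)


def u_insert_delete(mrna: str, edits: dict[int, int]) -> str:
    """Insertional editing: at each 0-based site of the unedited transcript,
    insert that many U residues before it (count > 0) or remove that many
    encoded U residues starting there (count < 0)."""
    out = []
    i = 0
    while i < len(mrna):
        n = edits.get(i, 0)
        if n < 0:
            run = mrna[i:i - n]
            if run != "U" * -n:
                raise ValueError(f"cannot delete {-n} U at site {i}: found {run!r}")
            i -= n              # skip over the deleted U residues
            continue
        out.append("U" * n)     # empty string when no insertion at this site
        out.append(mrna[i])
        i += 1
    return "".join(out)


if __name__ == "__main__":
    print(CODE["CGG"], "->", CODE[c_to_u_edit("CGG", [0])])  # first-position edit
    print(CODE["ACG"], "->", CODE[c_to_u_edit("ACG", [1])])  # creates AUG (start)
    print(CODE["CAA"], "->", CODE[c_to_u_edit("CAA", [0])])  # creates UAA (stop)
    print(u_insert_delete("AGCG", {2: 2}), u_insert_delete("AUUUG", {1: -2}))
```

The three C-to-U cases show an amino acid change (Arg to Trp), a new initiation codon, and a new termination codon, the outcomes the text lists; the last line shows two U residues inserted into one transcript and two encoded U residues removed from another.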

Although mRNAs are usually the targets of mito- 
chondrial editing systems, both rRNAs and tRNAs 
are edited in some mitochondrial systems. 


See also: Mitochondrial DNA (mtDNA); 
Mitochondrial Mutants 
The form of extrachromosomal DNA inheritance that
is attributable to mitochondrial DNA.
Extrachromosomal DNA, including mitochondrial and plastid
DNA, is generally maternally inherited, with rare
paternally inherited exceptions. In humans, the term
mitochondrial inheritance is used synonymously with
maternal inheritance. In humans, the mechanism may
be anatomical, because the sperm contributes almost
no cytoplasm to the zygote; there are exceptions in
other organisms, for example Chlamydomonas
reinhardtii.


See also: Maternal Inheritance 
Like other genomes, the mitochondrial genome is
subject to a variety of assaults that may induce
heritable mutations (notably base substitutions and
deletions). Mitochondrial DNA (mtDNA), by virtue of its
localization, is particularly susceptible to the
mutagenic effects of reactive oxygen species generated in
the course of oxidative phosphorylation. In certain
species, intragenomic recombination may delete
large portions of the mitochondrial genome, resulting
in a nonfunctional mtDNA and a respiratory-deficient
phenotype. In obligate aerobes, mutant
mtDNA molecules invariably coexist with their normal
counterparts (a state termed ‘heteroplasmy’), with
deleterious effects often manifested only under certain
growth conditions or in selected tissues, and dependent
on the ratio of mutant to wild-type molecules. Because
most of the genes for mitochondrial biogenesis
are nuclear, mutations in certain of these nuclear genes
may also induce mitochondrial dysfunction.
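The dependence on the mutant-to-wild-type ratio is often described as a threshold effect. The Python sketch below assumes exactly that: a tissue shows a defect once the mutant fraction reaches a tissue-specific threshold. The threshold values and tissue names are hypothetical placeholders, not measured figures.

```python
# Hypothetical, illustrative thresholds: mutant fraction at which a defect appears.
HYPOTHETICAL_THRESHOLDS = {"tissue_a": 0.6, "tissue_b": 0.8}


def mutant_fraction(mutant: int, wild_type: int) -> float:
    """Fraction of mtDNA molecules that carry the mutation."""
    total = mutant + wild_type
    if mutant < 0 or wild_type < 0 or total == 0:
        raise ValueError("need non-negative counts and at least one molecule")
    return mutant / total


def shows_defect(mutant: int, wild_type: int, tissue: str) -> bool:
    """True if the mutant fraction meets the (hypothetical) tissue threshold."""
    return mutant_fraction(mutant, wild_type) >= HYPOTHETICAL_THRESHOLDS[tissue]


if __name__ == "__main__":
    # The same heteroplasmic ratio (700 mutant : 300 wild-type) in two tissues.
    for tissue in HYPOTHETICAL_THRESHOLDS:
        print(tissue, shows_defect(700, 300, tissue))
```

Under these assumptions the same 70% mutant load produces a defect in one tissue but not the other, in line with the observation that effects may appear only in selected tissues.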


Fungi 


Yeast (Saccharomyces cerevisiae) 

The first recognized mutation in mtDNA, identified 
even before the discovery of the mitochondrial genome
itself, was the ‘petite mutation’ in yeast. Petite
(ρ⁻) mutants contain deletions of the wild-type (ρ⁺)
yeast mtDNA, and may even lack mtDNA entirely
(ρ⁰). They arise spontaneously and at high frequency
(1-2% per cell in each generation) as a result of
recombination between small, directly repeated sequences
that are scattered throughout the yeast mitochondrial
genome. As a result, blocks of essential rRNA, tRNA,
and/or protein-coding genes are lost, which
incapacitates mitochondrial translation, electron transport,
and/or coupled oxidative phosphorylation. Petite
mutants are so named because they form small (‘petite’)
colonies relative to wild-type cells when grown on a
solid medium containing glucose. Petite mutants lack a
functional respiratory chain and therefore grow more
slowly than wild-type cells; they obtain their
energy by fermentation. Unlike wild-type yeast, petite
mutants are unable to grow on nonfermentable carbon
sources such as glycerol. The ‘neutral’ petite phenotype
(which may be either ρ⁰ or ρ⁻) is not transmitted in
crosses with ρ⁺ cells, whereas the ‘suppressive’ petite
phenotype appears in a portion of the progeny (ranging
from less than 1% to more than 99%, depending on the
petite strain) in such crosses.
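The cumulative effect of a per-generation petite frequency of 1-2% can be estimated with a simple model. The Python sketch assumes that every wild-type cell independently gives rise to a petite with the same probability each generation, that petites never revert, and that both cell types grow at the same rate; the last assumption ignores the slower growth of petites noted above, so the model describes mutation input only.

```python
def wild_type_fraction(p_petite: float, generations: int) -> float:
    """Fraction of cells still respiratory-competent (wild type) after a
    number of generations, starting from a pure wild-type culture.

    Assumes a constant per-generation conversion probability p_petite, no
    reversion, and equal growth of petite and wild-type cells.
    """
    if not 0.0 <= p_petite <= 1.0:
        raise ValueError("p_petite must lie between 0 and 1")
    if generations < 0:
        raise ValueError("generations must be non-negative")
    return (1.0 - p_petite) ** generations


if __name__ == "__main__":
    for p in (0.01, 0.02):
        print(p, round(wild_type_fraction(p, 10), 3))
```

Under these assumptions, ten generations leave roughly 90% (p = 0.01) or 82% (p = 0.02) of cells wild type.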

Other yeast mitochondrial mutations include: antᴿ
(point mutations conferring antibiotic resistance in
genes encoding the mitochondrial rRNAs and certain
proteins); mit⁻ (point or deletion mutations in
individual mitochondrial protein genes); and syn⁻ (point
or deletion mutations that inactivate genes encoding
components of the mitochondrial translation system).
These three types of mutation display a uniparental
(non-Mendelian) type of inheritance.


Filamentous Fungi 
Like yeast petite mutants, the poky strain of Neuro- 
spora crassa is a slow-growing, respiratory-deficient 
mutant. In this case, 4 bp are deleted just upstream of 
the small subunit rRNA gene, within a 15-bp consen- 
sus sequence that corresponds to the transcription 
initiation site. The poky mutation results in a marked 
reduction in the number of mitochondrial ribosomes 
and hence in translation capacity, a consequence of 
defective assembly of small ribosomal subunits. 

The ‘stopper’ mutants of N. crassa have an unusual 
phenotype in which growth of the mycelium starts, 
stops, then starts again. During the stopped phase of 
growth, aberrant mtDNA molecules arise as a result
of intramolecular recombination between small, re- 
peated segments in the wild-type mtDNA. Selective 
accumulation of defective mtDNA molecules at the 
expense of the normal genome leads to growth arrest 
due to respiratory insufficiency. In senescent mutants 
of another filamentous fungus, Podospora anserina, 
the mycelial mass stops proliferating as the terminal 
hyphae become incapable of further extension. Here, 
hyphal senescence has been correlated with the ampli- 
fication of mtDNA sequences (senDNA), although 
the precise mechanism by which this amplification 
exerts its effect is not understood. Senescence is sup- 
pressive in that mutant mycelia confer a senescent 
phenotype on normal mycelia to which they are 
grafted. 


Flowering Plants (Angiosperms) 


Two types of mitochondrial mutation have been 
described in angiosperms: nonchromosomal striped 
(NCS) mutants in maize, and cytoplasmic male sterile 
(CMS) mutants in a variety of flowering plants. NCS 
mutants, which display poor growth and reduced 
yields, are characterized by yellow and white striping 
of the leaves. This mutation results from deletions that 
involve specific mitochondrial genes. 

The CMS trait is agriculturally important in the 
production of hybrid seed, because it renders plants 
male-sterile, thereby preventing self-pollination. CMS 
has been correlated with mtDNA duplications 
and rearrangements that create novel protein-coding 
genes. In the best-studied case, the Texas (cms-T) 
strain of maize, a unique 13 000-Da protein (URF13) 
is the product of the T-urf13 gene. Portions of this 
chimeric gene are derived from normal protein-coding 
(atp6) and rRNA (large subunit) genes found else- 
where in the mtDNA. Experiments have shown that 
URF13 is directly responsible for CMS, probably 
through its integration into the inner mitochondrial 
membrane as a potential channel-forming protein. 
An intriguing observation is that, under normal con- 
ditions, only the mitochondria of pollen-producing 
cells seem to be affected by the presence of mem- 
brane-bound URF13. 

URF13 is also responsible for susceptibility of cms-T
mitochondria to the toxin produced by Bipolaris 
maydis, a fungus that causes Southern corn leaf blight. 
This plant disease was responsible for a massive failure 
of the corn crop in the US in 1970, at which time about 
85% of the hybrid corn grown was the cms-T variety. 
The B. maydis toxin may exert its effect by interacting 
with URF13 in the membrane to open a channel that 
permits massive leakage of ions and small molecules 
through the inner mitochondrial membrane. 

A remarkable feature of CMS is the role of nuclear
genes in modifying its effects. In many cases, nuclear 
genes (‘restorer loci’) are able to reverse the effects of 
the mitochondrial determinants of CMS, even though 
the novel CMS-inducing genes are still present in the 
mtDNA. In maize, dominant nuclear alleles Rf1 and
Rf2 act to alter expression of T-urf13 so that URF13
abundance is greatly reduced. As a result, cms-T maize
plants carrying these restorer alleles are male-fertile. 


Humans 


Mutations in mtDNA have been associated with a 
variety of human syndromes (mitochondrial diseases). 
Base substitution (missense) mutations in mitochon- 
drial protein genes are the cause of two classes of mito- 
chondrial disease: Leber’s hereditary optic neuropathy 
(LHON) and neurogenic muscular weakness, ataxia, 
and retinitis pigmentosa (NARP). Typically, LHON 
mutations occur in nad genes, which encode components
of respiratory complex I (NADH dehydrogenase).
Mutations in tRNA genes cause myoclonus
epilepsy with ragged red fibers (MERRF), mitochon- 
drial encephalomyopathy, lactic acidosis, and stroke- 
like symptoms (MELAS), and maternally inherited 
myopathy and cardiomyopathy (MMC). Because 
tRNA mutations may have a generalized negative 
effect on mitochondrial translation, their phenotypic 
consequences tend to be more severe than mutations 
in individual protein genes. 

The human mitochondrial genome is also suscep- 
tible to spontaneous deletions leading to the forma- 
tion and propagation of defective mtDNA molecules. 
In some cases, it is clear that the deletion is the result of 
intramolecular recombination between two directly 
repeated sequences in the circular genome, as in the 
case of petite mutants in yeast. Syndromes attribut- 
able to mtDNA deletion in humans include Kearns- 
Sayre syndrome (KSS), chronic progressive external 
ophthalmoplegia (CPEO), and Pearson syndrome. 

These various syndromes illustrate the importance 
of heteroplasmy in the progression to mitochondrial 
disease. Because humans and other animals are obli- 
gate aerobes, and these mitochondrial mutations (par- 
ticularly large deletions) may completely obliterate 
normal mtDNA function, the affected tissue can 
only survive and function if some proportion of 
wild-type mtDNA is present to support normal mito- 
chondrial function. Studies of mitochondrial diseases 
(which are maternally inherited in humans) have led 
to the concept of a ‘threshold effect,’ whereby the 
phenotype of the cell does not change until mutant 
mtDNA molecules reach a sufficiently high proportion
(often more than 90%) to compromise the bioenergetic
capacity of the cell. Because the proportion
of mutant mtDNA molecules may increase with time
owing to factors such as replicative advantage and 
somatic segregation, some mitochondrial diseases are 
typically late-onset syndromes. In view of this pat- 
tern, there is increasing interest in the possibility that 
somatic mtDNA mutations, associated with a pro- 
gressive decline in mitochondrial function, may be 
implicated in aging and cancer. 
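The interplay of heteroplasmy, somatic segregation, and the threshold effect can be illustrated with a toy simulation. The sketch below is a hypothetical model, not a published method: it assumes that a cell holds a fixed number of mtDNA copies, that every copy is replicated once per division, and that each daughter receives a random half of the replicated pool. The 90% threshold is the illustrative figure given in the text, and replicative advantage of mutant molecules is deliberately left out, so any change in the mutant fraction here comes from random segregation alone. All function names are illustrative.

```python
import random


def mutant_fraction(copies: list[int]) -> float:
    """Proportion of mtDNA copies in the cell that carry the mutation (1 = mutant)."""
    return sum(copies) / len(copies)


def phenotype(fraction: float, threshold: float = 0.9) -> str:
    """Toy threshold rule: the cell is affected only once the mutant fraction
    exceeds the threshold (0.9 is an illustrative value, not a fixed constant)."""
    return "affected" if fraction > threshold else "normal"


def divide(copies: list[int], rng: random.Random) -> list[int]:
    """One division under the assumed random-segregation model: every copy is
    duplicated, and the daughter inherits a random half of the doubled pool."""
    doubled = copies + copies
    return rng.sample(doubled, len(copies))


def simulate_lineage(n_copies: int = 100, start_fraction: float = 0.7,
                     generations: int = 50, seed: int = 1) -> list[float]:
    """Follow one cell lineage and record the mutant fraction after each division."""
    rng = random.Random(seed)
    n_mutant = round(n_copies * start_fraction)
    copies = [1] * n_mutant + [0] * (n_copies - n_mutant)
    history = [mutant_fraction(copies)]
    for _ in range(generations):
        copies = divide(copies, rng)
        history.append(mutant_fraction(copies))
    return history


if __name__ == "__main__":
    history = simulate_lineage()
    final = history[-1]
    print(f"mutant fraction after {len(history) - 1} divisions: "
          f"{final:.2f} -> {phenotype(final)}")
```

Running the lineage with different seeds shows the mutant fraction wandering from cell generation to cell generation; because a cell that is entirely mutant or entirely wild type stays that way in this model, some lineages drift to homoplasmy, and a lineage that crosses the threshold switches phenotype even though no new mutation has occurred.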


See also: Aging, Genetics of; Mitochondrial DNA 
(mtDNA); Mitochondrial Genome 

Mitosis, the process of somatic cell division, has been
one of the most closely studied cellular processes 
since microscopists first witnessed dividing cells. Cell 
proliferation through mitosis is fundamental to devel- 
opment, growth, and tissue maintenance and so influ- 
ences human biology and medicine at fundamental 
levels. Mitosis is primarily a large-scale mechanical 
reorganization of the cell in which chromosome seg- 
regation and cytoplasmic fission are precisely choreo- 
graphed to provide error-free cell replication. In 
parallel, an intricate network of regulatory enzymes 
and interactions guides the cell through mitosis. 
Defects in mitotic regulation are central in the estab- 
lishment, growth, and genomic instability of human 
tumors. The geometry of cell division is critical for 
properly partitioning cells with different cytoplasmic 
contents in early development and contributes to tis- 
sue architecture through directed asymmetry. This 
evolving view of mitosis has resolved some of the earli- 
est puzzles of mitosis and defined new questions at 
the molecular and cellular levels that will profoundly 
impact human health. 


Systems 


Advances in microscopy continue to drive studies of 
mitosis. Digital imaging and advanced optical modes 
such as differential interference contrast (DIC) microscopy
yield resolution of individual spindle fibers in living 
cells. Fluorescence microscopy and fluorescent pro- 
tein analogs, facilitated by green fluorescent protein 
(GFP) and its derivatives, enable cell biologists to 
literally illuminate specific molecules and pathways 
in living cells. Genetic approaches in unicellular 
yeasts, fungi, and multicellular animals like flatworms 
and flies have identified many individual genes
involved in mitosis and the pathways that they operate
within. For example, elucidating the network of pro- 
tein kinase, phosphatase, and protease reactions that 
regulate execution of mitosis has depended critically 
on genetic methods, with many of these genes highly 
conserved from yeast to humans. Genetic assays have 
also been essential for dissecting the structure and 
function of centromeres, the chromosomal structures 
that link the chromosomes to the spindle. Biochemical 
approaches have been the central means of examining 
the individual molecular mechanisms that drive mito- 
sis, such as microtubule assembly, motor protein func- 
tion, and chromosome condensation. Methods for 
assembling functional mitotic spindles in soluble 
extracts of frog oocytes now provide a valuable sub- 
strate for deconstructing mitosis at a molecular level. 


Description 


During mitosis nearly all cellular components are 
switched to a form specialized for mitotic function 
or for transport. At a cytological level, chromosome 
condensation, mitotic spindle assembly and function, 
and formation of the cleavage furrow and cytokinesis 
dominate mitosis, which begins in prophase with the 
onset of chromosome condensation. Nuclear envel- 
ope breakdown initiates prometaphase and allows 
microtubules access to the nuclear contents. The 
developing mitotic spindle captures chromosomes by 
their kinetochores at the centromere and then trans- 
ports them toward the spindle equator in a process 
called congression. At metaphase, each chromosome 
is attached to microtubules from both poles of the 
spindle apparatus and is dynamically balanced at the 
spindle equator. The transition to anaphase is initiated 
by separation of sister chromatids, which immediately
results in the poleward chromosome movement of 
anaphase A, followed by spindle elongation in ana- 
phase B. Once segregated into separate cytoplasmic 
domains, chromosome decondensation and nuclear
envelope reassembly initiate reformation of the 
nucleus in telophase, followed soon thereafter by 
cytokinesis and the physical separation of the two 
daughter cells. Other global changes in cell structure 
facilitate segregation of cytoplasmic components. 
Cytoskeletal reorganization couples with changes in 
cell adhesion to cause most cells to adopt a rounded 
configuration, while dissolution of the Golgi appar- 
atus and endoplasmic reticulum into vesicles ensures 
that these components segregate with the cytoplasm. 


Spindle 


The spindle itself is made up primarily of micro- 
tubules, hollow 25 nm diameter protein tubes up to 
several micrometers long. The subunit protein tubulin
is an Mr 120-kDa heterodimeric GTPase that uses the
free energy of GTP hydrolysis not to drive assembly, 
but to destabilize the assembled polymer. The result is 
a highly dynamic polymer array, with individual 
microtubules persisting for 2-15 min, that can be read- 
ily regulated by the cell. A key parameter is selective 
nucleation of microtubules by the spindle poles, 
which contain a specialized microtubule nucleating 
subunit, γ-tubulin. Another is the selective stabiliza-
tion of microtubules, which reinforces functional 
interactions. Chromosomes can glide along the sur- 
face of microtubules or bind to the dynamic ends of 
microtubules to generate motile force. 

A number of proteins interact with microtubules 
to modulate their stability, to promote binding (to
chromosomes or other microtubules, for instance), and
to generate force along the scaffold of the microtubule 
surface. Cytoplasmic dynein, a microtubule-stimulated 
ATPase motor, serves several roles including poleward 
chromosome movement in prometaphase and prob- 
ably anaphase, cross-linking microtubules at the spin- 
dle poles, and linking astral microtubules to the cell 
cortex. A number of kinesin family members such as 
CENP-E, MCAK, and Eg5 function throughout the 
spindle to promote pole formation (Eg5), mediate 
antipoleward forces through chromokinesins and 
CENP-E, link dynamic microtubule ends to the 
kinetochores (CENP-E), and destabilize microtu- 
bules bound at the kinetochore (MCAK). 


Chromosomes 


During prophase the chromosomes condense into 
their familiar rod-like configuration, driven by his- 
tone modifications, specific condensation proteins, 
and other chromosome-associated proteins such 
as DNA topoisomerases. Condensation continues 
through metaphase, and is reversed as cells exit mitosis 
beginning in telophase. The replicated chromatids are 
bound together by specific cohesion proteins and each 
possesses a kinetochore complex formed over the cen- 
tromere, which establishes the primary connection 
between chromosomes and the spindle. Kinetochores 
function as microtubule-binding and force-generating 
elements and also as signal-processing centers that 
integrate chromosome movement with global mitotic 
regulation. Each kinetochore binds the ends of a bun- 
dle of 15-40 microtubules to form the kinetochore 
fiber, readily visible by light microscopy. 


Force and Motility 


Throughout mitosis, kinetochores generate poleward 
force or idle in neutral. Chromosomes that are fully 
engaged on the spindle come under tension as the
sister kinetochores generate opposing poleward 
forces. The cohesion that binds sister chromatids 
together until anaphase provides an essential static 
element in this network of forces. Cohesion, estab- 
lished during S-phase as cohesin complexes are 
assembled on the newly replicated chromatids, main- 
tains the mechanical integrity of the centromere under 
tension. Cohesins are heterotetrameric protein com- 
plexes containing two SMC family ATPase subunits 
and two associated proteins that function to link sister 
chromatids together. Antipoleward forces, known as 
polar winds, push the chromosome arms away from 
poles along spindle microtubules using chromokine- 
sins that decorate the chromosome surface. The 
balance of these counterpoised forces leads to the 
dynamic alignment of chromosomes in the spindle 
equator in metaphase. Proteolytic cleavage of the
cohesin subunit Scc1p is a key event in mitosis
and probably provides the direct trigger for anaphase
chromosome movement. 

Chromosome movement in anaphase is driven by 
poleward forces generated at the kinetochore. This 
occurs through minus-end-directed motor activity, 
such as dynein, or by coupling chromosomes to 
disassembling kinetochore fibers, with CENP-E for
example, or by a combination of these mechanisms. 
Centromere-associated microtubule destabilizing 
proteins could provide the stimulus for the observed 
shortening of kinetochore fibers in anaphase. At the 
end of anaphase A, the only microtubules in the cen- 
tral spindle comprise an array of antiparallel micro- 
tubules. Microtubule sliding drives the opposite poles 
apart, lengthening the spindle in anaphase B and driv- 
ing the chromosomes further apart. 

Cytokinesis is driven by the assembly and function 
of the contractile ring apparatus that forms on the cell 
cortex directly over the metaphase plate. Made largely 
of actin filaments with associated myosin motors, the 
contractile ring shortens, drawing the flexible cell 
membrane down along the plane of the spindle equa- 
tor. Specific membrane- and cortex-associated pro- 
teins link the contractile ring to the membrane, while 
a conserved family of filament-forming proteins, the 
septins, may be involved in regulating membrane 
fusion in cytokinesis. The placement of the contractile 
ring over the spindle equator has long been known to
be a dominant property of the spindle. Thus, the cleav- 
age plane follows the spindle when it is experimentally 
repositioned in cells undergoing mitosis. A molecular 
link between the centromeres and the cleavage furrow 
has been identified among a group of passenger pro- 
teins, which bind the centromere until anaphase and 
are left behind at the equator when chromosomes 
transit to the poles. Some of these proteins relocate
to the cell cortex prior to contractile ring formation
and may thus transmit the spatial footprint of the 
spindle to the cell surface. 

One interaction that has recently been found to 
play an important role in development is the connec- 
tion between the spindle poles and the cell surface by 
specific linkages between cell surface sites, cytoplas- 
mic dynein, and microtubules. These interactions are 
required for placement of the spindle within the cell 
during divisions that result in different cell lineages 
due to asymmetric distribution of cytoplasmic com- 
ponents. It is likely that integration of spindle assem- 
bly and position with cell architecture is a key element 
in somatic mitoses in highly structured tissues. 


Regulation 


Mitosis represents a highly differentiated physiological 
state that, once triggered, evolves along a defined path- 
way that results in progeny cells. Three major regu- 
latory mechanisms control mitosis. The first is protein 
phosphorylation, mediated by cell-cycle-regulated 
protein kinases and phosphatases. The second is tar- 
geted protein degradation mediated by the ubiquitin- 
proteasome pathway, which imparts irreversibility to 
key steps in the process of mitosis. The third consists of
checkpoint mechanisms that monitor the completion of
specific processes and integrate parallel pathways, 
such as chromosome attachment and anaphase onset. 
These events are monitored by the p53 pathway and 
defects in mitosis can trigger apoptosis, presumably 
to prevent survival of cells with potentially harmful 
karyotypic damage. 

The onset of mitosis is triggered by activation of the 
strategic mitosis-directing kinase, Cdc2 kinase, during 
G2. Accumulation of the activator subunit cyclin B 
during G2 and the activity of upstream kinases (Wee1p)
and phosphatases (Cdc25p) regulate the build-up of 
Cdc2 kinase activity. At a threshold level, phosphor- 
ylation by Cdc2 kinase transforms a number of sub- 
strates into their mitotic forms, activating other 
kinases, and directly phosphorylating structural pro- 
teins in the nuclear envelope and spindle poles as well 
as Golgi and endoplasmic reticulum membranes. 
Other mitotic kinases play distinctive roles in mitosis 
and are localized in discrete domains of the spindle, 
for example the Aurora and Polo kinases, found in 
spindle poles, centromeres and midbodies, and regu- 
late centrosome duplication, histone phosphorylation, 
kinetochore function, and cytokinesis. Each phos- 
phorylation site has a corresponding phosphatase, 
and so the interwoven regulatory networks that drive 
mitosis by protein phosphorylation are quite complex. 

Progression through mitosis is governed by ubiquitin-dependent
proteolytic pathways that inactivate
key proteins such as cyclin B and the cohesin subunit
Scc1. Control of mitotic proteolysis is exerted through
the anaphase promoting complex (APC), an evolution- 
arily conserved ubiquitin ligase complex. APC trig- 
gers exit from mitosis by targeting cyclin B and 
securin for degradation. APC activity is at least partly 
regulated by substrate-specific activators. Two of 
these, p55cdc (Cdc20p) and Hct1, mediate early and 
late waves of cyclin B degradation, respectively. Hct1- 
dependent cyclin degradation is dependent on the 
inactivation of Cdk1 (Cdc2 kinase) mediated by the first round of
p55cdc-APC-mediated destruction, exemplifying the 
irreversible sequence of events driven by ubiquitin- 
dependent proteolysis. 

Checkpoints are biochemical circuits that are acti- 
vated when a process fails to take place properly, and 
arrest downstream events to provide the cell an oppor- 
tunity to correct the error. The best-characterized 
checkpoint in mitosis is the spindle assembly check- 
point or anaphase checkpoint. This system, first dis- 
covered in yeast by finding mutants that failed to 
arrest in mitosis after spindle disassembly, appears to 
act as a sensor that reports functional kinetochore- 
microtubule attachment to the cell cycle machinery. 
Indeed, Mad2p is found concentrated at the kineto- 
chore with p55cdc, the APC targeting subunit, and is 
bound to p55cdc in the cytoplasm where the complex 
inhibits APC activity. The consequence of check- 
point-mediated arrest is that the cell pauses in meta- 
phase and waits until the triggering event is corrected. 
The checkpoint system in the kinetochore includes 
protein kinases (Bub1, BubR1) as well as other pro- 
teins such as Mad1p and Bub3p that may facilitate 
formation of APC inhibitor complexes or other func- 
tions of the checkpoint. The presence of checkpoint 
proteins at the kinetochore is sensitive to microtubule 
binding, tension exerted across the centromere, or 
both. Establishment of a functional spindle fiber con- 
nection inhibits kinetochore-dependent APC inhibi- 
tor production. A single free kinetochore is sufficient 
to inhibit APC and prevent mitotic progression until 
a connection is established. This system ensures that 
each chromosome is fully engaged by the spindle prior 
to anaphase onset. 
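The logic of the spindle assembly checkpoint described above, in which a single unattached kinetochore is enough to hold the APC in check, can be expressed as a small toy model. The sketch below is purely illustrative: the class and function names are hypothetical, the kinetochore-derived APC inhibitor is reduced to a single yes/no signal, and tension sensing across the centromere is ignored.

```python
from dataclasses import dataclass, field


@dataclass
class Kinetochore:
    # True once a functional spindle-fiber connection has been established.
    attached: bool = False


@dataclass
class Cell:
    kinetochores: list[Kinetochore] = field(default_factory=list)

    def apc_inhibited(self) -> bool:
        # Any unattached kinetochore keeps generating the (abstracted) APC
        # inhibitor signal, so one free kinetochore suffices to block the APC.
        return any(not k.attached for k in self.kinetochores)

    def can_enter_anaphase(self) -> bool:
        # Anaphase onset is permitted only when the APC is no longer inhibited,
        # i.e. when every kinetochore is engaged by the spindle.
        return not self.apc_inhibited()


def make_cell(n_chromosomes: int) -> Cell:
    """Each replicated chromosome carries two sister kinetochores."""
    return Cell([Kinetochore() for _ in range(2 * n_chromosomes)])


if __name__ == "__main__":
    cell = make_cell(46)
    for k in cell.kinetochores[:-1]:
        k.attached = True
    print(cell.can_enter_anaphase())  # prints False: one kinetochore still free
    cell.kinetochores[-1].attached = True
    print(cell.can_enter_anaphase())  # prints True: all 92 kinetochores attached
```

The model captures only the "all or nothing" character of the checkpoint; the real system also involves the kinase and adaptor proteins named in the text and responds to tension as well as attachment.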


Therapeutics 


Mitosis is an important therapeutic target in cancer 
treatment. Vinblastine has been a staple of chemotherapy
for some time, while paclitaxel (taxol) is a newer
anti-spindle chemotherapy agent. Both of these drugs 
bind to tubulin, affecting microtubule dynamics and 
function throughout the body. The improved under- 
standing of mitosis that is emerging has identified 
numerous potential targets for drug development in 
a variety of mitosis-specific proteins and processes.
New strategies for chemotherapy based on more 
selective or tumor-specific targets may improve the 
efficacy of this key approach to cancer and other pro- 
liferative diseases. 


Summary 


Mitosis is a large-scale physical reorganization of the 
cellular contents to form two new cells that are faithful 
replicas of their mother. It involves the construction of 
specific motility complexes to separate the chromo- 
somes and split the cytoplasm. Regulation of mitosis is 
driven by protein modification and regulated proteo- 
lysis, linked to the execution of key mitotic events by 
a system of checkpoints that ensure quality control. 
Approaches linking biochemistry and genetics with 
cell biology have produced an unprecedented view 
of the complexities of mitosis that will preoccupy 
researchers for some time. Unraveling these complex- 
ities should provide new understanding of mitosis, 
particularly in the context of architecturally distinct- 
ive tissues, and generate new directions for therapeutic 
intervention in proliferative diseases. 

The MLL gene (myeloid/lymphoid leukemia or mixed
lineage leukemia), also termed ‘ALL-1,’ ‘HRX,’ or
‘Htrx,’ is rearranged in somatically acquired reciprocal
translocations and in deletions and inversions
at chromosomal band 11q23. These rearrangements
occur in 5-10% of patients with acute lymphoblastic 
leukemia (ALL) or acute myeloid leukemia (AML) 
and in some patients with acute myelodysplastic 
syndrome (MDS). MLL is involved in the majority
of acute leukemias occurring in children under the
age of 1 year and of therapy-related AMLs, the latter
usually arising from treatment of primary malignancies
with DNA topoisomerase II inhibitors. Neoplasms
with MLL rearrangements are clinically
aggressive and respond poorly to therapy. The preva- 
lence of these leukemias in infants, sometimes only 
several weeks old, and the rapid appearance of the 
therapy-related neoplasms suggest that few, if any, 
additional mutations are necessary to produce the 
malignant phenotype. 

The 100-kb MLL gene is involved in dozens of different
chromosomal translocations. To date, twenty of these
have been cloned, and all have been shown to produce
in-frame fusions. The breakpoints within MLL
occur within an 8.3-kb region delineated by exons
5-11 and containing multiple topoisomerase II sites and
Alu repeats. The derivative chromosome 11 is retained 
and transcribed in all the tumors and encodes the 
leukemogenic hybrid protein composed of MLL N- 
terminus joined in frame to the carboxy portion of 
the partner protein. The most frequent chromosome 
translocation is t(4;11), almost uniformly associated 
with ALL. These tumors, though, usually express at
least one myeloid-associated antigen, reflecting an earl- 
ier stage of differentiation compared with common 
ALL. The second most common abnormality is the 
t(9;11) translocation, usually with an acute monoblast- 
ic phenotype (M5a). The moderately frequent trans- 
locations such as t(6;11) and t(10;11) are often either
monocytic or myelomonocytic (M5, M4). Chromo- 
some band 19p13 carries three MLL partner genes, 
each specifically associated with lymphoid or myeloid 
phenotype. In addition to translocations, MLL is re- 
arranged in some adult AML by a mechanism involving 
partial tandem duplication. The association of MLL 
rearrangements with lymphoid, myeloid, and bipheno- 
typic leukemias, as well as the composition of sur- 
face markers in these malignancies, indicate that the 
recombination events occur in multipotent early cells. 
Nevertheless, the correlation between particular rearrangements
and specific leukemia lineages suggests
that different MLL fusion proteins are restricted in 
their transformation capacity to cells at specific differ- 
entiation stages. This might also explain the finding of 
small nonmalignant clones of cells with MLL rearrangements
in some hematological samples from healthy individuals.

MLL is the human homolog of Drosophila tri- 
thorax (trx). The latter is a transcriptional activator 
and a member of the trithorax-Polycomb gene family,
which plays an important role during development
and adult life by providing a cellular memory for 
transcription of homeotic (Hom/Hox) and other 
genes. The major function of the homeotic genes is 
to specify body segment identity. This is accomplished 
by establishing a precise spatial expression pattern 
early in embryogenesis and maintaining it during 
further cell divisions. The maintenance function is 
carried out by the trithorax and Polycomb genes, act- 
ing as activators and repressors, respectively. The prod- 
ucts of these genes work in multiprotein complexes 
which bind to the chromatin of target genes and mod- 
ify it so as to enable or prevent transcription. 

The 430-kDa MLL protein shares with the trx protein
two major motifs, the C-terminal SET domain,
and a cluster of zinc fingers termed PHD fingers. Both 
domains are evolutionarily conserved and have been 
identified in a large number of proteins associated 
with transcription and chromatin. MLL alone contains
several additional motifs: AT hooks involved
in DNA binding, a region with homology to DNA
methyltransferase with a capacity to repress transcription,
and a region conferring transcriptional
transactivation. 

MLL rearrangements replace the C-terminal two- 
thirds of the protein with polypeptides derived from 
the partner genes. This results in loss of the SET 
domain, PHD fingers, and transactivation motif. The 
deleted molecule might still retain the capacity to 
localize to native MLL target sites, because it spans 
the AT hooks as well as sequences conferring nuclear 
speckled distribution. MLL partner proteins are a
diverse but not random collection. This is indicated
by the strong homology between some of these pro- 
teins such as AF-9 and ENL, AF-10 and AF-17, and 
CBP and p300. Experiments with mouse models engin- 
eered to express MLL/AF-9 by a ‘knock-in’ strategy, 
or retrovirally transduced with MLL/ENL clearly 
show that the partner polypeptides make essential 
contributions to the leukemogenic capacity of MLL 
chimeric proteins. Whereas the nature of these con- 
tributions is not yet resolved, one of the possibilities 
raised is that the partner polypeptides provide a cap- 
acity for homodimerization. 

Future investigations of MLL and its leukemic 
derivatives will try to resolve the normal biological 
processes in which MLL is involved, the genes (in
addition to Hox) it controls, the proteins interacting 
with the major motifs identified within the MLL pro- 
tein, cellular signals regulating MLL activity, and crit- 
ical genes and pathway(s) regulated by MLL fusion 
proteins and directly involved in pathogenesis. 


See also: Alu Family; Leukemia; Leukemia, Acute; 
Leukemia, Chronic 

One major issue in evolution is the evolutionary his-
tory of life or organisms. Before the 1960s, the major 
source of data for inferring the history of organisms 
was fossils. As fossils are often scanty, the evolution- 
ary history that can be inferred from the fossil record 
is limited. Thus, there was a need to have some other 
source of data that is more readily accessible. Protein 
sequencing and DNA sequencing have provided such 
a source of data. (Proteins and DNA are known as 
macromolecules because they are much larger than 
chemical elements or compounds.) 

In the early 1960s, a number of protein sequences 
became available, and some biochemists were inter- 
ested in knowing how protein sequences have evolved 
with time. A surprising finding was that the hemo- 
globin sequences from human, cow, rabbit, and horse 
were roughly equally distant from one another. (The 
distance between two protein sequences is measured 
in terms of the number of amino acid differences per 
site.) Since these mammalian species were thought to 
have radiated at about the same time (about 75 million 
years ago), the approximate equality of pairwise dis- 
tances would suggest that amino acid substitution has 
proceeded at approximately the same rate in all these 
mammalian species. Zuckerkandl and Pauling (1965) 
therefore proposed that the rate of evolution in a 
macromolecule is approximately constant per year 
over time and among different evolutionary lineages. 
This proposal, known as the molecular clock hy- 
pothesis, immediately stimulated much interest in the 
use of macromolecules in evolutionary study, because 
if the hypothesis holds, macromolecules would be 
extremely useful for dating evolutionary events such 
as species divergence times. This method would be 
similar to the dating of fossils by radioactive elements. 
Moreover, macromolecules would be useful for infer- 
ring the relationships among organisms, for the distance 
between two species would be roughly proportional to 
their divergence time. 
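The two quantities this argument relies on, a per-site distance between sequences and a rate that converts distance into time, can be sketched in a few lines. This is a minimal illustration under stated assumptions: the sequences are already aligned, the distance is the uncorrected proportion of differing sites (no correction for multiple substitutions at the same site, which more careful estimators apply), and the clock is taken to hold exactly, so that a distance K accumulated along both lineages since their split T years ago gives a rate r = K/(2T). The function names and the numbers in the usage block are hypothetical.

```python
def p_distance(seq1: str, seq2: str) -> float:
    """Uncorrected distance: number of differences per aligned site."""
    if len(seq1) != len(seq2) or not seq1:
        raise ValueError("sequences must be aligned, non-empty, and of equal length")
    differences = sum(a != b for a, b in zip(seq1, seq2))
    return differences / len(seq1)


def clock_rate(distance: float, divergence_time: float) -> float:
    """Substitutions per site per year, r = K / (2T); the factor 2 appears
    because both descendant lineages accumulate changes after the split."""
    return distance / (2 * divergence_time)


def divergence_time(distance: float, rate: float) -> float:
    """Invert the clock, T = K / (2r), to date a split from a distance."""
    return distance / (2 * rate)


if __name__ == "__main__":
    # Hypothetical numbers, chosen only to show the arithmetic.
    print(p_distance("MVHL", "MVQL"))  # 1 difference in 4 sites -> 0.25
    r = clock_rate(0.15, 75e6)         # calibrate on a split assumed 75 Myr old
    print(divergence_time(0.06, r))    # about 3.0e7 years for a less divergent pair
```

The dating step is only as good as the clock assumption and the calibration date, which is exactly why the constancy of the rate became the point of contention.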

The hypothesis, however, has provoked a great 
controversy because the clock concept does not fit 
well with the erratic tempo of morphological evolu- 
tion. Moreover, it is difficult to imagine why the rate 
of evolution should be constant per year instead of per 
generation because the rates of mutation for different 
organisms appear to be more comparable when meas- 
ured in terms of generation. Consequently, there has 
been a strong controversy over whether differences in 
generation time can have a significant effect on the rate
of evolution, i.e., the molecular clock. 

In the 1970s and early 1980s, numerous studies
tested the molecular clock hypothesis, and analyses
of amino acid sequence data generally failed to
reject it. The hypothesis was therefore widely
accepted by molecular evolutionists. The rapid
accumulation of DNA sequence data has since allowed
a much closer examination of the hypothesis.
Although some scientists still support the molecular
clock hypothesis, strong evidence now suggests that
no global or universal clock exists. Below we briefly
review a commonly used method for testing the
molecular clock hypothesis and then studies of
molecular clocks in mammals; we consider mammals
because they have been studied extensively and are
more familiar to general readers.


The Relative-Rate Test 


The rate of molecular evolution is usually defined as
the number of nucleotide (or amino acid) substitu-
tions per site per year. To estimate the rate, we need
sequences from two species and the date at which the
two species diverged. Unfortunately, species diver-
gence dates are usually uncertain, and as a conse-
quence the controversy over the molecular clock
hypothesis has often involved disagreements about
these dates. To avoid this problem, Sarich and Wilson
(1973) proposed the relative-rate test, which is
illustrated in Figure 1. Suppose that we want to
compare the rates in lineages A and B. We then use a
third species, C, as a reference; the reference species
must have branched off before species A and B
diverged from each other. For example, to compare
the rates in the human and orangutan lineages, we
can use a monkey species as a reference.

From Figure 1, it is easy to see that the number of
substitutions between species A and C, $K_{AC}$, is
equal to the sum of the substitutions that have
occurred from point O to point A ($K_{OA}$) and from
point O to point C ($K_{OC}$). That is,

$$K_{AC} = K_{OA} + K_{OC}$$

Similarly,

$$K_{BC} = K_{OB} + K_{OC}$$

$$K_{AB} = K_{OA} + K_{OB}$$

Figure 1 The rooted tree for species A, B, and C,
assuming that C is the known outgroup. O denotes the
common ancestor of species A and B.

Since $K_{AC}$, $K_{BC}$, and $K_{AB}$ can be directly
estimated from the nucleotide sequences (see Li, 1997),
we can easily solve the three equations to find the
values of $K_{OA}$, $K_{OB}$, and $K_{OC}$:


$$K_{OA} = (K_{AC} + K_{AB} - K_{BC})/2$$

$$K_{OB} = (K_{AB} + K_{BC} - K_{AC})/2$$

$$K_{OC} = (K_{AC} + K_{BC} - K_{AB})/2$$
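Solving the three equations is simple arithmetic. The minimal Python sketch below (the function name is illustrative, not from any library) returns the three branch lengths, applied here to the synonymous distances of Table 1 with mouse as A, rat as B, and hamster as C. As a consistency check, $K_{OA} + K_{OB}$ reproduces $K_{AB}$.

```python
def branch_lengths(k_ab: float, k_ac: float, k_bc: float) -> tuple[float, float, float]:
    """Solve K_AC = K_OA + K_OC, K_BC = K_OB + K_OC, K_AB = K_OA + K_OB.

    Returns (K_OA, K_OB, K_OC) for the rooted tree ((A, B)O, C),
    where C is the outgroup and O the common ancestor of A and B.
    """
    k_oa = (k_ac + k_ab - k_bc) / 2
    k_ob = (k_ab + k_bc - k_ac) / 2
    k_oc = (k_ac + k_bc - k_ab) / 2
    return k_oa, k_ob, k_oc


# Synonymous distances per 100 sites (Table 1): A = mouse, B = rat, C = hamster.
k_oa, k_ob, k_oc = branch_lengths(k_ab=18.0, k_ac=30.3, k_bc=31.1)
print(round(k_oa, 1), round(k_ob, 1), round(k_oc, 1))  # 8.6 9.4 21.7
```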


We can now decide whether the rates of substitution
are equal in lineages A and B by comparing the value
of $K_{OA}$ with that of $K_{OB}$. The time that has
passed since species A and B last shared a common
ancestor is, by definition, the same for both lineages.
Thus, according to the molecular clock hypothesis,
$K_{OA}$ and $K_{OB}$ should be equal; that is,
$d = K_{OA} - K_{OB}$ should not be statistically
different from 0. From the above equations, we obtain
$K_{OA} - K_{OB} = K_{AC} - K_{BC}$. Therefore,

$$d = K_{AC} - K_{BC}$$


An approximate formula for the variance of d has 
been developed (Wu and Li, 1985). A simple way to 
test whether an observed d value is significantly dif- 
ferent from 0 is to compare it with the standard error; 
for example, if the absolute value of d is larger than 
two times the standard error, it may be considered 
significant at the 5% level. Other methods for testing 
the molecular clock hypothesis have also been devel- 
oped (e.g., Li and Bousquet, 1992; Muse and Weir, 
1992). 
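The rule of thumb above can be written as a short Python sketch. It computes $d = K_{AC} - K_{BC}$ and flags it when $|d|$ exceeds two standard errors. The function name is illustrative, and the standard error of d must be supplied by the caller: the Wu and Li (1985) variance formula is not reproduced here. The usage example follows the text's mouse-rat comparison, using the standard error of Ks (1.0) as a rough stand-in for that of d.

```python
def relative_rate_test(k_ac: float, k_bc: float, se_d: float, z: float = 2.0) -> tuple[float, bool]:
    """Return d = K_AC - K_BC and whether |d| exceeds z standard errors.

    se_d is assumed to be known (e.g., from an approximate variance formula);
    z = 2 corresponds roughly to the 5% significance level.
    """
    if se_d <= 0:
        raise ValueError("standard error must be positive")
    d = k_ac - k_bc
    return d, abs(d) > z * se_d


# Ks from Table 1: A = mouse, B = rat, C = hamster; rough standard error 1.0.
d, significant = relative_rate_test(30.3, 31.1, se_d=1.0)
print(round(d, 1), significant)  # -0.8 False
```

A negative d simply means that the branch leading to B (here the rat) is slightly longer; only its size relative to the standard error matters for the test.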


A Local Clock in Mice, Rats, and 
Hamsters 


Since the molecular clock hypothesis is controversial,
the first question to ask is, “Does a molecular clock
exist in any group of organisms?” (Such a clock is
known as a local clock.) The best place to look for a
local clock is a group of organisms with similar
physiology and life histories, including generation
time. The muroid rodents (mice and rats) and their
relatives form such a group, and abundant DNA
sequence data are available for them. We review below
the study of O’hUigin and Li (1992), who used exten-
sive sequence data from mice, rats, and hamsters
(Table 1).

First, let us compare the substitution rates in the
mouse and rat lineages, using the hamster lineage as a
reference. The number of substitutions per synonym-
ous site (Ks) is 30.3% between mouse and hamster and
31.1% between rat and hamster (Table 1). The differ-
ence (0.8%) is not statistically significant because it is
smaller than the standard error of Ks (1.0%). So the
synonymous rates in the mouse and rat lineages are
nearly equal. For the nonsynonymous rate (Ka), the
difference is d = 2.9% − 2.7% = 0.2%, which is equal
to two times the approximate standard error of d and
may be considered statistically significant. Thus, the
nonsynonymous rate seems to be slightly faster in the
mouse lineage than in the rat lineage.

Second, we compare the substitution rates in the
mouse (or rat) and hamster lineages, using the human
lineage as a reference. The Ks value is 53.4% between
mouse and human and 52.3% between hamster and
human, so the difference of 1.1% is smaller than the
approximate standard error (1.5%) and is not statistic-
ally significant. Similarly, the difference between the
two lineages in the rate of nonsynonymous substitu-
tion is also not significant. The same conclusion can be
drawn when the mouse lineage is replaced by the rat
lineage (Table 1). Thus, the mouse, rat, and hamster
lineages have evolved at nearly equal rates in terms of
nucleotide substitution.


Table 1 Numbers of nucleotide substitutions per 100
sites between species

Species pair      Ks            Ka
Mouse-rat         18.0 ± 0.7    1.8 ± 0.1
Mouse-hamster     30.3 ± 1.0    2.9 ± 0.1
Rat-hamster       31.1 ± 1.0    2.7 ± 0.1
Mouse-human       53.4 ± 1.5    5.2 ± 0.2
Rat-human         51.6 ± 1.5    5.0 ± 0.2
Hamster-human     52.3 ± 1.5    5.1 ± 0.2

Ks: number of nucleotide substitutions per synonymous
site; Ka: number of nucleotide substitutions per nonsynon-
ymous site.

Number of synonymous sites compared = 4229; number of
nonsynonymous sites compared = 15217.

From O’hUigin and Li (1992).

In conclusion, there appears to be an approximate
molecular clock in these rodents, at least for synony-
mous substitutions. This clock may be used to date
divergences among these rodents. For example, since
the Ks value is 18.0% between mouse and rat and about
31.0% between the mouse-rat pair and hamster, the
hamster lineage is estimated to have branched off
0.31/0.18 ≈ 1.7 times as long ago as the mouse-rat
divergence.
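Under a local clock, the ratio of two divergence times equals the ratio of the corresponding distances. A minimal sketch (the helper name is illustrative): here the mouse-hamster and rat-hamster Ks values are averaged (30.7) rather than rounded to 31, and the ratio still comes out at about 1.7. Turning this ratio into an absolute date would require an independent date for the mouse-rat split, which is not assumed here.

```python
def relative_divergence(k_outgroup: float, k_ingroup: float) -> float:
    """Ratio of divergence times under a local clock: T_out / T_in = K_out / K_in."""
    if k_ingroup <= 0:
        raise ValueError("ingroup distance must be positive")
    return k_outgroup / k_ingroup


# Ks per 100 sites from Table 1.
k_mouse_rat = 18.0
k_hamster = (30.3 + 31.1) / 2  # average of mouse-hamster and rat-hamster, 30.7
print(round(relative_divergence(k_hamster, k_mouse_rat), 1))  # 1.7
```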


Lower Rates in Humans than in Monkeys 


There has been a longstanding controversy over the
hominoid rate-slowdown hypothesis, which postu-
lates that the rate of molecular evolution became
slower in hominoids (humans and apes) after their
separation from the Old World (OW) monkeys.
This hypothesis, proposed by Goodman (1961) and
Goodman et al. (1971), was based on rates estimated
from immunological distance and protein sequence
data. Wilson et al. (1977) contended that the slowdown
was an artifact, owing to the use of an erroneous
paleontological estimate of the ape-human divergence
time. They conducted relative-rate tests using both
immunological distance data and protein sequence
data and concluded that there was no evidence for a
hominoid slowdown. However, comparative analyses
of DNA sequence data by Koop et al. (1986), Li and
Tanimura (1987), and others provided strong support
for the hominoid-slowdown hypothesis, and the hy-
pothesis was accepted by many molecular evolutionists.

Table 2 shows some comparisons of the substitu-
tion rates in the human and OW monkey lineages. In
the table, $K_{13}$ and $K_{23}$ are the distances between
an OW monkey and a New World (NW) monkey and
between the human and a NW monkey, respectively.
For the introns compared, $K_{13} - K_{23}$ is positive,
except that it is 0 for the ε-globin and interferon-α
receptor introns and slightly negative for the lipo-
protein lipase intron. When all introns are considered
together, $K_{13} - K_{23}$ is significantly greater than 0,
implying that the rate in the OW monkey lineage is
significantly faster than that in the human lineage.
The same conclusion is obtained from the flanking
sequence data (Table 2). Thus, there is indeed evi-
dence for the hominoid-slowdown hypothesis.


Table 2 Differences in the number of nucleotide substitutions per 100 sites and the relative rates of substitution
between the Old World monkey (species 1) and human (species 2) lineages, with the New World monkey (species 3)
as a reference

Sequence compared                Nucleotides  K12^a  K13^a  K23^a  K13 − K23     Rate ratio^b
η-globin pseudogene^c            8,781        6.7    11.8   10.7   1.1 ± 0.3**   1.4
Introns
  IGF2                           1,589        6.4    15.8   14.2   1.6 ± 0.8*    1.7
  ε-globin                       928          4.9    11.5   11.5   0.0 ± 0.8     1.0
  Insulin                        862          9.7    17.0   15.9   1.1 ± 1.3     1.3
  Mast-cell carboxypeptidase     1,275        5.5    13.3   12.5   0.8 ± 0.8     1.3
  Carbonic anhydrase 7           501          7.2    11.1   9.7    1.5 ± 1.4     1.5
  Interferon-α receptor          885          7.6    14.0   14.0   0.0 ± 1.1     1.0
  Apolipoprotein C3              1,270        8.7    18.5   16.9   1.6 ± 1.0     1.5
  Lipoprotein lipase             1,168        7.9    13.6   13.8   −0.3 ± 1.0    1.0
  Total                          8,478        7.1    14.7   13.9   0.8 ± 0.3**   1.3
Flanking and untranslated regions
  ε-globin                       388          5.3    13.5   10.6   2.9 ± 1.4*    3.4
  Insulin                        548          9.8    15.8   12.6   3.2 ± 1.5*    2.0
  Total                          936          7.9    14.9   11.7   3.1 ± 1.1**   2.3

^a K_ij = number of substitutions per 100 sites between species i and j.
^b The ratio of the rate in the Old World monkey lineage to the rate in the human lineage.
^c Excluding Alu sequences.
* Significant at the 5% level.
** Significant at the 1% level.

Data from Bailey et al. (1991), Porter et al. (1995), and Ellsworth et al. (1993 and unpublished).
The intron sequence data suggest that the OW
monkey lineage evolves 1.3 times faster than the
human lineage, which is similar to the ratio (1.4)
estimated from the η-globin pseudogene data. The
flanking sequence data suggest a rate ratio of more
than 2. However, since the latter data set is small, the
ratio estimated from it may not be reliable. Further
data are needed to see whether the ratio varies among
different DNA regions.
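Table 2 does not state how its rate ratios were computed. Assuming that, as for r in Table 3, each ratio is the ratio of the two branch lengths obtained from the relative-rate equations, the sketch below (helper name illustrative) reproduces the listed ratios for the rows shown to one decimal place. Because the tabulated distances are rounded, ratios recomputed for some other rows can differ slightly from the printed values.

```python
def rate_ratio(k12: float, k13: float, k23: float) -> float:
    """Ratio K_O1 / K_O2 of the branch lengths leading to species 1 and 2.

    Uses the relative-rate equations with species 3 as the reference:
    K_O1 = (K12 + K13 - K23) / 2 and K_O2 = (K12 + K23 - K13) / 2.
    """
    k_o1 = (k12 + k13 - k23) / 2
    k_o2 = (k12 + k23 - k13) / 2
    if k_o2 <= 0:
        raise ValueError("branch length for species 2 must be positive")
    return k_o1 / k_o2


# (K12, K13, K23) from Table 2; 1 = OW monkey, 2 = human, 3 = NW monkey.
rows = {
    "eta-globin pseudogene": (6.7, 11.8, 10.7),
    "introns, total": (7.1, 14.7, 13.9),
    "epsilon-globin flanking": (5.3, 13.5, 10.6),
}
for name, distances in rows.items():
    print(name, round(rate_ratio(*distances), 1))  # 1.4, 1.3 and 3.4, as in Table 2
```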


Higher Rates in Rodents than in Other 
Mammals 


From DNA hybridization data, Laird et al. (1969) and 
Kohne (1970) estimated the substitution rates between 
mouse and rat and between human and chimpanzee 
and concluded that the former rate is much higher 
than the latter. They attributed the higher rate in 
rodents to a shorter generation time, i.e., the gener- 
ation-time effect. Sarich and Wilson (1973) argued that 
this difference in rate was based on questionable 
assumptions about the divergence times between spe- 
cies. To avoid controversies over the assumptions of 
divergence times, the following study used the rela- 
tive-rate test. 

To compare the rate of nucleotide substitution in
the rodent lineage with that in another eutherian lin-
eage, the marsupial lineage is used as an outgroup; it
is difficult to use another eutherian as an outgroup
because the eutherian phylogeny remains uncertain.
Only synonymous transversions (changes between
a purine (A or G) and a pyrimidine (C or T)) and
nonsynonymous substitutions were used, because
synonymous transitional changes between a marsupial
and a eutherian appear to have reached saturation in
many genes. Table 3 shows that the nonsynonymous
rate is significantly higher in the rodent lineage
than in the human, rabbit, carnivore, horse, and artio-
dactyl lineages. For synonymous transversions, the
rate in the rodent lineage is also significantly higher
than those in the human, rabbit, and artiodactyl
lineages, though not significantly higher than those
in the carnivore and horse lineages, perhaps owing to
smaller sample sizes. The rate ratio is at least 1.2 and
can be close to 2 (Table 3). Note that these ratios are
averages over the long period since the rodent and
other eutherian lineages diverged. Because rates
would have been similar among lineages during the
early stage of divergence, the rate differences among
lineages in more recent times are likely to be larger
than the long-term average. For example, suppose that
the rodent-human difference in synonymous rate has
increased linearly from zero since the primate-rodent
divergence while the human rate has stayed constant.
A long-term ratio of 1.9 then means that the rodent
rate has exceeded the human rate by 0.9 on average,
so the present-day excess is 2 × 0.9 = 1.8, and the
synonymous rate in mice and rats today would be
about 2.8 times, rather than only 1.9 times, the rate
in humans. At any rate, rodents in fact have a faster
molecular clock than primates, artiodactyls, lagomorphs,
carnivores, and perissodactyls.


Table 3 Difference (d) in the number of nucleotide substitutions per 100 sites between the rodent lineage (species 1)
and a nonrodent lineage (species 2) with the marsupial lineage (species 3) as a reference

Nonsynonymous substitutions
Species 2      Genes (total bp)   L0 + L2   K12    K13    K23    d     d/σ      r
Human          34 (40,641)        34,067    7.1    12.8   11.6   1.2   [?]**    1.4
Rabbit         13 (16,647)        13,851    9.3    15.9   14.4   1.5   4.65**   1.4
Carnivore       8 (6,660)          5,563    12.3   20.1   17.9   2.3   3.71**   1.5
Horse           9 (6,996)          5,729    9.0    17.1   15.3   1.9   3.75**   1.5
Artiodactyls   24 (26,679)        22,316    10.7   16.6   15.4   1.3   4.64**   1.4

Synonymous transversions
Species 2      L4       K12    K13    K23    d     d/σ      r
Human          6,574    19.4   49.8   43.8   6.1   4.75**   1.9
Rabbit         2,796    23.2   53.1   47.6   5.5   2.45*    1.6
Carnivore      1,097    24.7   48.4   45.7   2.7   0.79     1.2
Horse          1,267    23.2   53.1   50.4   2.7   0.81     1.3
Artiodactyls   4,363    24.3   53.5   49.0   4.6   2.46*    1.5

L0 + L2 = total number of nondegenerate and twofold degenerate sites; L4 = total number of fourfold degenerate sites.
A nondegenerate site is one at which every possible change alters the encoded amino acid, while a fourfold
degenerate site is one at which all three possible nucleotide changes are synonymous (i.e., do not change the
encoded amino acid).

K_ij is the number of substitutions per 100 sites between species i and j. Species 1 was mouse (or rat when mouse was not
available). For carnivores, cat (or dog or ferret) was used. For artiodactyls, cow (or sheep or pig) was used. Species 3
(outgroup) was a marsupial (possum, opossum, kangaroo, or potoroo).

d = K13 − K23 and σ = √Var(K13 − K23).

r is the ratio of the evolutionary rate of the rodent lineage (species 1) to that of the nonrodent lineage (species 2),
computed as the ratio of the branch lengths of the two lineages.

* and ** significant at the 5% level (i.e., d/σ > 1.96) and the 1% level (i.e., d/σ > 2.58), respectively.

From Li, Tourasse and Adkins (unpublished).
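Table 3 marks d/σ with one or two asterisks according to standard normal cutoffs. The short Python sketch below applies the same cutoffs and, as an extra illustration not given in the table, converts d/σ to a two-sided p-value with the standard library's complementary error function. Applying the cutoffs to |d/σ| (a two-sided test) is an assumption; the helper names are illustrative.

```python
import math


def two_sided_p(z: float) -> float:
    """Two-sided p-value for a standard normal deviate z: P(|Z| >= |z|)."""
    return math.erfc(abs(z) / math.sqrt(2.0))


def significance_mark(z: float) -> str:
    """Cutoffs used in Table 3: '**' above 2.58 (1% level), '*' above 1.96 (5% level)."""
    if abs(z) > 2.58:
        return "**"
    if abs(z) > 1.96:
        return "*"
    return ""


# d/sigma values for synonymous transversions (Table 3).
for lineage, z in [("rabbit", 2.45), ("carnivore", 0.79), ("horse", 0.81), ("artiodactyls", 2.46)]:
    print(f"{lineage:13s} d/sigma={z:4.2f} p={two_sided_p(z):.3f} {significance_mark(z)}")
```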


Concluding Remarks 


As noted above, a local clock may exist when the
generation times of the organisms under study are
similar. Such local clocks are useful for estimating
the divergence times between species and for recon-
structing phylogenetic relationships. Although no
global clock seems to exist, macromolecules are still
better than morphological characters for estimating
divergence times between distantly related organisms,
because they evolve more regularly. Moreover, because
many macromolecules are available, one can select
those that appear to violate the rate-constancy
assumption least when estimating divergence times.
Note also that methods for estimating divergence
times that do not rely on strict rate constancy are
being developed.
The English geneticist Gabriel A. Dover coined
the term ‘molecular drive’ to cover all of the very
different mechanisms that can lead to similarities
among repetitive sequences. Such sequences form an
important component of eukaryotic DNA. In complex
repetitive sequence families, the copies have not
evolved independently of each other. Rather, their
shared sequence implies that they are homologous,
that is, that the similarities between copies reflect
shared descent.

Which mechanisms create this shared descent? That
depends on how the repeats are organized. Some
sequences, such as the ribosomal RNA genes, are
tandemly repeated. In eukaryotes, they consist of a
cluster of three genes, producing the 18S, 5.8S and
28S ribosomal RNAs, and this cluster is tandemly
repeated hundreds of times, with each copy in the
same orientation. For tandem arrays, the most likely
means of sharing descent is unequal recombination.
Because the array is internally repetitious, the two
homologous chromosomes are likely to pair out of
register in meiosis, with a sequence at a given position
in the array on one homolog pairing with a different
sequence on the other. Recombination then duplicates
one sequence and deletes another. Over many gener-
ations, this random process of unequal recombination
causes sequences throughout the array to be des-
cended from the same sequence.
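The homogenizing effect of repeated unequal exchange can be illustrated with a deliberately simplified Python simulation. It is a hypothetical toy model, not a published one: each copy in a fixed-size array is labelled by the original copy it descends from, and every exchange event duplicates one randomly chosen copy and deletes another, as described above. Array positions, sequence content, and the pairing of two homologs are ignored. Because each event can remove at most one ancestral label, at least n − 1 events are needed before all copies share one ancestor.

```python
import random


def homogenize(n_copies: int = 20, seed: int = 1, max_events: int = 1_000_000):
    """Toy model of unequal exchange in a tandem array (hypothetical, for illustration).

    Each copy carries the label of the original copy it descends from. Every event
    duplicates one randomly chosen copy and deletes another, so the array keeps
    n_copies units; positions within the array are ignored. Returns the number of
    events until all copies share one ancestor, together with the final array.
    """
    if n_copies < 2:
        raise ValueError("need at least two copies")
    rng = random.Random(seed)
    array = list(range(n_copies))  # at the start every copy is distinct
    for event in range(1, max_events + 1):
        donor, lost = rng.sample(range(n_copies), 2)
        array[lost] = array[donor]  # the lost copy is replaced by a duplicate of the donor
        if len(set(array)) == 1:
            return event, array
    raise RuntimeError("array did not become homogeneous within max_events")


events, array = homogenize()
print(events, array[0])  # number of events, label of the surviving ancestral copy
```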

A related process, gene conversion, also plays a
part. This results from an abortive attempt at recom-
bination between two repeated sequences, which in its
early stage involves the invasion of one DNA double
helix by a single strand from another. The process is
resolved without crossing over, but strand invasion
is accompanied by repair of the recipient double helix
so that it comes to match the invader: a gene
conversion.

Interspersed repetitive DNAs, scattered throughout 
the chromosomes, are mobile DNAs — transposable 
genetic elements. For these sequences, it is transpos- 
ition itself which causes repeat copies to share des- 
cent. For example, many transposable elements, the 
so-called retrotransposons, move via an RNA inter- 
mediate, transcribed from the DNA and subsequently 
copied back into DNA (in a process called reverse 
transcription). The new DNA copy is integrated into 
a random location in the chromosomes, and shares 
homology with the donor sequence at the original 
site. Thus, as elements copy themselves to new loca- 
tions and old copies of the elements are lost, the elem- 
ent family as a whole retains a similarity of sequence 
which results from this sharing of ancestry. 

The homology of repetitive sequences implies that 
all of the copies that we see in an organism’s chromo- 
somes must have had, at some time in the past, a single 
common ancestor. (There would, at that time, have 
been many other copies but, by chance, only one has 
descendants surviving today.) Suppose that we looked 
at the same repetitive sequence family in two species, 
A and B, and suppose that, within each species, all 
copies of the repeat had an ancestor 5 million years 
ago. Suppose also that the two species themselves 
separated 10 million years ago. The consequence is 
that there will be differences in sequence between 
all copies of the sequence in A and all copies in B. 
These will be the changes in the first 5 million years 
after the speciation event, before the times of common 
ancestry of the sequences within species. Thus, the 
different repeats of the sequence within a species 
evolve together, showing what has been called con- 
certed evolution. 

The evolution of repetitive sequences depends 
greatly upon whether the sequences are functional 
and tandemly arrayed, such as the ribosomal RNA 
genes. In order to have a role in evolution, a new 
mutation arising in one of the array of copies of 
the gene has to spread in two ways. It has to spread 
through all copies in the array (by unequal crossing- 
over), and also has to spread throughout the popula- 
tion. The rate of the latter process will depend on 
whether the mutation is a neutral change, spreading
by random drift, or selectively advantageous. Some
have argued that the spread of new variants through 
arrays might be biased, with some variants being more 
likely to spread than others. This might be thought of 
as the strongest sense of ‘molecular drive.’ Conceiv- 
ably, even a selectively disadvantageous new mutation 
might spread through a biased process, if the bias was 
strong enough. However, there is no evidence to sup- 
port such speculations. 

The evolution of transposable elements is more 
complicated. Transposition can be replicative — a 
copying of the sequence into a new location. It follows 
that, all else being equal, the number of copies will 
increase. However, copy number appears to be 
roughly constant in time, implying that its expected 
increase is being balanced by some compensatory 
force. The best candidate for this force is natural selec- 
tion, with individuals with an above average number 
of copies having a reduced fitness. This implies that 
transposable elements are parasitic sequences, or 
selfish DNAs, giving no benefit to their host, but 
nevertheless persisting because of their ability to 
overreplicate. They may, indeed, harm the host. In 
Drosophila melanogaster, hybrid dysgenesis is a syn- 
drome of aberrant traits, including sterility at high 
temperatures and elevated rates of mutation, which 
arises in the offspring of crosses between males bear- 
ing a transposable element and females lacking the 
element. Since sterility in hybrids is often seen in 
incipient speciation, perhaps transposable elements 
play a role in speciation, although the evidence so far 
does not favor this hypothesis. 
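The argument that replicative transposition must be offset by selection can be made concrete with a deliberately simple deterministic recursion. The model is an illustrative assumption, not one described in this article: copies are gained by transposition at rate u and lost (e.g., by excision) at rate v per copy per generation, copy number per genome is Poisson-distributed, and log fitness falls as −s n²/2, so selection reduces the mean by about s n² per generation. Under these assumptions the mean copy number stops increasing at n* = (u − v)/s. All parameter values below are hypothetical.

```python
def copy_number_trajectory(n0: float, u: float, v: float, s: float, generations: int) -> list[float]:
    """Deterministic toy recursion for the mean copy number per genome.

    Hypothetical assumptions: gain by transposition at rate u and loss at rate v
    per copy per generation; Poisson copy number (variance = mean); log fitness
    -s * n**2 / 2, so selection changes the mean by about -s * n**2 per generation.
    """
    n = n0
    trajectory = [n]
    for _ in range(generations):
        n = n + n * (u - v) - s * n * n
        trajectory.append(n)
    return trajectory


# Hypothetical parameters; predicted equilibrium (u - v) / s = 90 copies.
traj = copy_number_trajectory(n0=1.0, u=0.01, v=0.001, s=1e-4, generations=20_000)
print(round(traj[-1], 2))  # 90.0
```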

That transposable elements are parasitic DNAs is 
not universally accepted. Since they are inevitably 
mutagenic when they insert into chromosomes, some 
have argued that this mutagenicity benefits the host. 


See also: Hybrid Dysgenesis; Ribosomal RNA 
(rRNA); Transposable Elements 
Molecular genetics differs from classical genetics in
treating genes and their products as chemicals rather
than as abstract entities. Several lines of discovery prior
to 1950 led to the birth of molecular genetics: (1) bio-
chemical studies of genetic traits, which ultimately
supported the one gene-one enzyme hypothesis; (2)
the discovery of chemical mutagenesis, which showed
that genes could be permanently modified by reactive
compounds; (3) studies of bacteriophages, which
behaved chemically as large molecules but had all the
properties of genes, as presciently seen by Muller in
1922; and (4) investigation of the chemical nature of
genes, culminating in Avery’s demonstration that DNA
had the properties expected for the genetic material.
The discoveries of structural molecular biology, in
particular the Watson and Crick proposal for the
structure of DNA, then provided the foundation for
the flowering of molecular genetics in subsequent
decades.

The term molecular genetics is now frequently 
used to describe a collection of simple and powerful 
techniques for the chemical study of genes. The oldest 
of these are the tools of bacteriophage and bacterial 
genetics, which supplied the ability to move selected 
genes around by means of conjugation, transform- 
ation or transduction. Hybridization between DNA 
strands and/or RNA strands provided a sequence- 
dependent means of manipulating and analyzing 
nucleic acids in vitro. Work on restriction enzymes 
produced methods for the precise cutting and map- 
ping of DNA molecules, and polymerases and ligases 
permitted their reassembly into new combinations. 
Gel-based resolution and hybridization methods 
were developed to analyze large DNA and RNA 
molecules (Southern and northern blotting). 

Construction of whole genome libraries became 
possible with the development of cloning vectors. 
Desired genes could then be identified and isolated 
from these libraries by means of hybridization or 
other detection methods. In parallel with these ad- 
vances, methods for sequencing DNA were developed 
and became steadily faster and cheaper. Specific
modifications could be made to cloned genes by means
of site-directed mutagenesis. Efficient synthesis of
oligonucleotides, together with the invention of the 
polymerase chain reaction (PCR), provided a whole 
new battery of techniques for the manipulation of 
DNA molecules. These core technologies of molecu- 
lar genetics are still being constantly added to, refined 
and automated, with no end in sight. 


See also: Genetics; Mouse, Classical Genetics 
The first human-mouse somatic cell hybrids were
established in 1967 by fusing cultured mouse cells
deficient in thymidine kinase (TK) with human cells
containing TK. These hybrids were then cultured
under selective conditions in which only hybrid
cells expressing TK would survive. Such hybrids con-
tain a complete complement of mouse chromosomes
plus a random selection of human chromosomes,
but always contain human chromosome 17, because it
bears the TK gene. Since 1967 many hybrids have been
produced that contain varying combinations of 
human chromosomes on different rodent backgrounds 
(mouse, rat, or hamster). Over time, those human 
chromosomes for which there is no selection are lost 
in a random fashion during cell division. However, if 
cells are cultured under selection for a specific bio- 
chemical, antigenic, or cell surface marker it is possible 
to generate hybrids that retain a single human chromo- 
some on a mouse or other rodent background. 
Alternatively a selectable marker (such as resistance 
to the antibiotic neomycin) can be introduced by 
transfection and integration into the human chromo- 
some prior to cell fusion. This has led to the pro- 
duction of human monochromosomal hybrids for 
each of the human chromosomes. 


See also: Transfection 


Monocistronic mRNA 


Copyright © 2001 Academic Press 
doi: 10.1006/rwgn.2001.1915 


Monocistronic mRNA is a messenger RNA that gives 
rise to a single polypeptide chain when translated. 
All eukaryotic mRNAs are monocistronic, but some 
bacterial mRNAs are polycistronic, e.g., those tran- 
scribed from operons. 


See also: Cistron; Operon 
Background


Antibodies (syn. immunoglobulins) are tetrameric
protein molecules composed of two identical heavy
chains (H chains) and two identical light chains (L
chains) that can bind specifically to antigens. Each
H and L chain contains a variable (V) region and a
constant (C) region; specific binding to antigen is
determined by the H and L chain V regions, which
together form the antigen binding
site. The size of the repertoire of unique specificities in
humans is estimated to be between 10’ and 10°, and
since each unique antibody must be encoded by an H
and an L chain gene, the number of genes needed to
encode the antibody repertoire would be very large.
Since the total human genome is thought to contain
only around 10° genes, the large number of unique
antibody specificities is instead generated by somatic
rearrangement of Ig gene segments, which produces a
large number of functional Ig genes. This somatic
rearrangement of Ig gene segments occurs only in B
lymphocytes (B cells), and each developing B cell has a
unique functional Ig gene.

Each developing B cell carries out random somatic
rearrangement, and the resulting Ig is expressed on
the surface of the B cell as an antigen-specific
receptor. When the B cell encounters the appropriate
antigen, the cell is induced to proliferate and, through
a change in the processing of its heavy-chain mRNA,
begins secreting the Ig molecules as antibodies. The
proliferation results in a clonal expansion of that
particular B cell and thus the production of large
amounts of specific antibody in the serum. Most
antigens are composed of many antigenic determinants
(epitopes), and each epitope can stimulate a different
B cell. Because each of the resulting antibodies reacts
with a different epitope on the inducing antigen,
antibodies to each of these specificities are secreted
into the serum. Such an immune serum is called an
antiserum.

It had long been the goal of immunologists to obtain
pure populations of antibodies against known epit-
opes. This was realized when Köhler and Milstein
(1975) published a method for growing individual
clones of B cells by fusing them with plasmacytomas
(tumors of B cells) and selecting cells producing
antibodies of the desired specificity. The Ig products
of these clones were called monoclonal antibodies, and
the antibodies found in antiserum were called poly-
clonal antibodies to distinguish the source, even
though immunologists realized that each antibody is
by definition the product of a single clone. Since the
original method was published, advances in molecular
biology have allowed monoclonal antibodies to be
produced by a variety of methods.


Monoclonal Antibodies from 
Plasmacytomas 


Cells producing antibodies that react with the immun- 
izing antigen represent a small fraction of the total B 
cells in the animal. After a mouse has been immunized,
its lymphocytes can be fused with plasmacytoma cells.
The antibody-forming cells are incapable of surviving
for more than a few days in tissue culture, and
although the plasmacytomas survive indefinitely
(they are said to be ‘immortal’), they are not making
the antibody of the desired specificity. Köhler
and Milstein developed the methods to fuse the
antibody-forming cells with the plasmacytomas and
capture the desired characteristics of each: secretion of
the specific antibody of interest and ‘immortality.’
Using appropriate screening methods, those ‘immor- 
tal’ cells making the desired antibody can be isolated 
and propagated in vitro and large amounts of mono- 
clonal antibody can then be harvested from these cells. 
But this method has proven to be technically very 
difficult with human B cells. 


Monoclonal Antibodies from 
EBV-Transformed Cells


Monoclonal antibodies are produced from human 
antibody-forming cells by the in vitro infection of a
population of peripheral blood lymphocytes with 
Epstein-Barr virus (EBV). EBV-infected B cells 
become ‘immortal’ and can be propagated in vitro. 
Selection methods similar to those for mouse plasma- 
cytomas are used. Humans obviously can only be 
injected with material that is of potential therapeutic 
value, but the fact that the entire repertoire of anti- 
bodies resides as functional antibody genes clonally 
distributed among the human B cells makes EBV
transformation of peripheral human lymphocytes in 
theory a source of human monoclonal antibodies. In 
practice, however, EBV transformation has been useful
only in probing the nature of antibodies in autoimmune
disease (Nakamura et al., 1988).


Humanized Monoclonal Antibodies 


Because it is difficult to produce human monoclonal 
antibodies, a compromise set of technologies has been 
developed to produce a hybrid antibody that contains 
mouse variable (v) regions and human constant 
regions. These molecules are produced by replacing 
the genes that encode the areas of the v regions involved 
in antigen binding in a human Ig (of unknown binding 
specificity) with the genes encoding the binding 
regions from a mouse monoclonal antibody of known 
(and desired) specificity. (The replaced areas are called 
complementarity-determining regions, CDRs, and the 
process is called CDR grafting.) The process is labori- 
ous but has yielded several monoclonal antibodies of 
potential therapeutic value (Queen et al., 1989). 


Monoclonal Antibodies from Libraries of 
Genes and Gene Segments 


Using the polymerase chain reaction (PCR), it has been
possible to clone both unrearranged and rearranged
germline variable gene segments, and libraries of all
heavy and light chain variable gene segments have been 
constructed. Additional diversity is introduced into 
the libraries by random alteration of a segment unique 
to the heavy chain. By displaying the variable region 
proteins on filamentous phage, libraries of functional 
v genes ranging from 10° to 10'° members have been 
generated (Marks et al., 1991; Winter and Milstein, 
1991). These libraries are then screened and phages 
displaying the variable regions of interest are isolated. 
The desired constant region gene is added and the 
complete functional monoclonal antibody is produced 
in quantity. 


Monoclonal Antibodies from Synthetic 
Libraries 


The human v region germline gene segments can be
grouped into several families based on sequence
homology at both the DNA and protein levels,
and completely synthetic antibody genes have been
constructed from consensus sequences of these
families. Diversity is added to these consensus family 
genes by random modification of the same H chain 
gene segment used in the method described above. 
Members of the resultant library of synthetic variable 
region genes are then joined to the desired c region 
genes and functional genes for H and L chains are 
produced (Knappik et al., 2000). 
Jacques Monod (1910-76) was born in Paris. An
accomplished musician (he played the cello through- 
out his life), Monod seriously considered a career in 
music. However, he graduated from the University of 
Paris with a degree in science in 1931. He was
influenced during a stay at a marine biology station by
his contact with biologists such as Lwoff and 
Ephrussi, and also during a stay in 1935 at Caltech, 
where he interacted with Morgan, Sturtevant, Beadle, 
and McClintock. In 1937, Monod began work in Paris 
on Escherichia coli growth, showing different growth 
rates on different sugars, and discovering in 1940 the 
phenomenon of diauxic growth, which led to his inter- 
est in the phenomenon known at the time as enzyme 
adaptation. During World War II, Monod served in 
the French Resistance in Nazi-occupied Paris, rising 
to the post of Chief of Staff in the Paris area. Monod 
had joined the Communist Party to enable him to 
have influence in the Resistance. Despite being in 
danger of arrest, Monod still continued experiments 
in the laboratory of André Lwoff at the Pasteur
Institute. In 1944 Monod and Alice Audureau completed a
study of the reversion from Lac⁻ to Lac⁺ of a strain of
E. coli, E. coli mutabile, that permitted a determination
of the partly random origin of spontaneous
mutations, independently of the classic work by
Luria and Delbrück. In 1945, Monod joined the
Pasteur Institute, and began working on the extension 
of his experiments with diauxic growth. Within two years
he had shifted his focus to the ‘induction’ problem: why
an enzyme able to metabolize lactose appeared only
after lactose was introduced into the medium. These
experiments were carried out against a background of
additional turmoil, since Monod split with the
Communist Party over the Lysenko affair, which had
forced Soviet biologists to denounce the chromosomal
theory of inheritance and, with it, the existence of
genes. Monod’s courage in writing articles
attacking Lysenko’s ideas drew support from Albert 
Camus, but not from many Communists in France, 
who attacked him. In 1950, François Jacob came to
work in Lwoff’s laboratory, and the historic collab- 
oration that resulted in the operon model for gene 
regulation was born. Monod and Jacob showed
that gene regulation was mediated by cytoplasmic
intermediates (a repressor, in the case of the lactose
metabolism genes) with dual recognition: the repressor
can bind both an inducer and a site on the DNA called
an operator. When the repressor binds an inducer, a
derivative of lactose, its activity changes so that it can
no longer bind to the operator and prevent
transcription of the lactose metabolism genes.
For this work Monod was awarded the 1965 Nobel 
Prize for Physiology or Medicine together with Jacob 
and Lwoff. Monod also helped to develop the concept 
of allostery, in which a protein such as the repressor 
could exist in two states, and shift from one to the 
other in response to binding a small molecule such as 
an inducer. Monod became Head of the Cellular Bio- 
chemistry Department at the Pasteur Institute in 1954 
and became Director of the institute in 1971. In 1970, 
Monod authored the widely discussed book Chance 
and Necessity, which argued that chance was respon- 
sible for evolution and the origin of life. 
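
The repressor-operator logic described above can be written as a small truth table. This is a minimal sketch that reduces the model to two inputs (whether a functional repressor is present and whether inducer is present); the function name is hypothetical, and the sketch deliberately ignores any other inputs to lac regulation. The repressor-absent rows follow from the same logic: with no repressor to occupy the operator, the genes are always transcribed.

```python
def lac_genes_transcribed(repressor_present: bool, inducer_present: bool) -> bool:
    """Simplified repressor logic: the repressor binds the operator and blocks
    transcription unless it has bound inducer, which changes its state so that
    it can no longer bind the operator."""
    repressor_on_operator = repressor_present and not inducer_present
    return not repressor_on_operator


for repressor in (True, False):
    for inducer in (False, True):
        state = "ON" if lac_genes_transcribed(repressor, inducer) else "off"
        print(f"repressor={repressor!s:<5} inducer={inducer!s:<5} -> {state}")
```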


A monomorphic locus is a locus or gene that is
uniform throughout a particular population. There is 
only one allele at the locus, and thus it cannot be 
followed in classical genetic breeding experiments 
(which depend upon segregation of different alleles). 


See also: Polymorphism 
In On the Origin of Species, Darwin stated his view
that “the natural system is founded on descent with 
modification” and that “all true classification is genea- 
logical,” because the characters which the species of a 
taxon share “have been inherited from a common 
parent” (Darwin, 1859, p. 420). An evolutionary
taxonomist can conclude from this that the
characters of all the species included in a higher taxon
must be derivable from the characters of the nearest
common ancestor. Haeckel (1866) called taxa that
conform to this demand monophyletic.

It is not always easy to determine which characters 
were inherited from the nearest common ancestor. 
Owing to special adaptations, some characters of a 
taxon may become strikingly different from the 
equivalent feature of the ancestor while, conversely, 
some characters of unrelated species may become 
exceedingly similar by convergent evolution. 

In order to fully understand how to determine
monophyly, it is helpful to compare it with the
definition of an identical twin. Two individuals are not
identical twins because they are so similar; rather,
they are so similar because they are monozygotic, i.e.,
derived from a single zygote (fertilized egg). The
definition of monophyly follows from analogous
reasoning: “A taxon is monophyletic if all the included
species are derived from their most recent common
ancestor,” not because the species are so similar.
Therefore, to be recognized in a Darwinian
classification, a taxon must consist of species that are
not only similar but also satisfy the conditions of
monophyly. Such a classification is strictly
phylogenetic (since all included taxa are monophyletic,
as defined by Haeckel).
A Darwinian dendrogram may differ considerably 
from a holophyletic cladogram in a cladification (see 
Cladistics). 

How can one determine the derivation of one taxon 
from another one? This can be done by determining 
how many homologous characters they share. Fea- 
tures in two taxa are homologous when they are 
derived from the same (or a corresponding) feature 
of their nearest common ancestor. The test for hom- 
ology consists of similarities of various kinds, such as 
general appearance, position of a structure in relation 
to neighboring structures or organs (inapplicable to 
nonmorphological features), and similarity in on- 
togeny (a comparison of embryonic stages sometimes 
reveals homologies that are not apparent in a compari- 
son of adults). The more homologies two taxa share, 
the greater is the probability of their monophyly. 

The qualification ‘nearest common ancestor’ is 
important, as illustrated by the example of the bird 
wing, which is homologous to the anterior extremity 
of terrestrial vertebrates, but not to the wing of bats. 
The nearest common ancestor of birds and bats had 
no wings. 

Homologous features usually show similarity; they 
generally also perform the same or similar functions, 
particularly in close relatives. However, it is mislead- 
ing to refer indiscriminately to any kind of similarity 
as homology, as is done by some molecular biologists.
Exceedingly similar macromolecules sometimes
originate independently by convergence. Some
homologous features are extremely dissimilar in form and
function, such as the articulating bones of the reptilian 
jaw, which evolved in the mammals into two of the 
middle ear ossicles. The detection of the homology of 
such dissimilar features is one of the most gratifying 
triumphs of comparative research. 


References 

Darwin C (1859) On the Origin of Species. London: John Murray. 

Haeckel E (1866) Generelle Morphologie der Organismen. Berlin: 
Georg Reimer.


See also: Cladistics; Holophyly 


Monosomy 


M A Ferguson-Smith 


Copyright © 2001 Academic Press 
doi: 10.1006/rwgn.2001.0848


Somatic cells contain a specific number of pairs of 
chromosomes, the particular number depending on 
the species. When one member of a pair of chromosomes
is missing from the individual’s karyotype (see
Karyotype), the individual is said to have monosomy
for that chromosome. Monosomy for an autosome
leads to severe developmental abnormality if not 
inviability. In humans, autosomal monosomy appears 
only in spontaneous abortions, or as a somatic defect in 
tumor tissue. Sex chromosomes prove the exception, as 
monosomy X has been observed in a number of mam- 
mals, and leads to Turner syndrome in humans (see 
Turner Syndrome). Most human conceptions with 
monosomy X abort in early pregnancy and in those 
that survive there are interesting differences in cogni- 
tion between those in whom the paternal rather than 
the maternal X is present. Conceptions in which a Y 
chromosome is present without an X are unknown. 


See also: Karyotype; Trisomy; Turner Syndrome 
Thomas Hunt Morgan (1866-1945) was a legendary
leader in genetics who discovered sex-linkage and 
whose work on recombination in Drosophila paved 
the way for the first mapping of genes to linear pos- 
itions on a chromosome. 
Born in Kentucky, Morgan attended the University
of Kentucky and obtained his PhD from Johns
Hopkins in 1890. From 1891 until 1904, he was a
professor at Bryn Mawr College where he studied 
experimental embryology. During this time he became 
an internationally respected developmental biologist, 
studying regeneration in earthworms and develop- 
ment in sea urchins as well as producing the book 
The Development of the Frog’s Egg: An Introduction 
to Experimental Embryology. 

In 1904, Morgan moved to Columbia University, 
where he assembled an exceptional group of students 
and colleagues to work in the “fly room” as it came to 
be known. Soon after this move, he turned his atten- 
tion from developmental biology to genetics and 
around 1909 he began using the fruit fly as an experi- 
mental model. Attracted by its short generation time, 
tremendous fertility, and low cost of maintenance, 
Morgan was able to make rapid progress in his search 
for mutants owing to the huge populations of Dros- 
ophila he could maintain in his laboratory. Another 
important consideration in his selection of this new 
animal model was Morgan’s belief that evolution 
should be studied with non-domesticated organisms. 

After a couple of years of work with fruit flies, 
Morgan was fortunate to find a single male with white 
eyes, the result of a spontaneous mutation. From this 
single fly he made one of his greatest discoveries, 
conducting a classic set of experiments quite
representative of much of his work. First he mated this
mutant fly with wild-type (red-eyed) females: every
one of the offspring (1237 to be exact), regardless of
sex, had red eyes. But when these red-eyed F1
offspring were mated with each other, Morgan found
that all of the female offspring were red-eyed yet 
only half of the males were red-eyed with the remain- 
ing half white-eyed. He then conducted the reciprocal 
crosses (white-eyed females with wild-type males) 
and observed that all female offspring had wild-type 
eyes and all male offspring had white eyes. From these 
simple results, Morgan was able to conclude that the 
white-eyed phenotype was related to the sex of the fly 
and must be carried on one of the sex chromosomes, 
the X chromosome. 
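
The logic of these crosses can be checked by enumerating gametes, assuming (as Morgan concluded) that the eye-color gene lies on the X chromosome, that males carry a single X plus a Y, and that red is dominant to white. The allele labels and function names below are illustrative, and the tallies are equally likely gamete pairings, i.e. expected ratios rather than Morgan's counts.

```python
from collections import Counter
from itertools import product

# Hypothetical allele labels: "w+" (red, dominant wild type), "w" (white).
RED, WHITE = "w+", "w"


def gametes(genotype: tuple[str, ...], sex: str) -> list[str]:
    """Females (two X alleles) pass on either X; males (one X) pass on X or Y."""
    return list(genotype) if sex == "F" else [genotype[0], "Y"]


def eye_color(genotype: tuple[str, ...]) -> str:
    """Red if at least one X chromosome carries the wild-type allele."""
    return "red" if RED in genotype else "white"


def cross(mother: tuple[str, str], father: tuple[str]) -> Counter:
    """Tally (sex, eye color) over all equally likely egg x sperm pairings."""
    tally = Counter()
    for egg, sperm in product(gametes(mother, "F"), gametes(father, "M")):
        if sperm == "Y":
            tally[("male", eye_color((egg,)))] += 1
        else:
            tally[("female", eye_color((egg, sperm)))] += 1
    return tally


crosses = {
    "white male x red female": ((RED, RED), (WHITE,)),
    "F1 female x F1 male": ((RED, WHITE), (RED,)),
    "reciprocal: white female x red male": ((WHITE, WHITE), (RED,)),
}
for label, (mother, father) in crosses.items():
    print(label, dict(cross(mother, father)))
```

The three tallies reproduce the pattern described in the text: all-red offspring from the first cross, all-red females but half-white males from the F1 intercross, and red females with white males from the reciprocal cross.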

Morgan’s discovery that genes reside on the chromo- 
somes was an important breakthrough inasmuch as 
it provided a mechanism by which Mendel’s laws of 
segregation and independent assortment could be 
explained. Moreover, soon after his description of 
sex-linked inheritance, Morgan and his students 
found that numerous other genes were also on the X 
chromosome. With the help of Alfred Sturtevant, an 
undergraduate assistant in his lab at the time, Morgan 
was then able to construct the first chromosome map, 
deducing not only that each gene was located at a
specific position on the linear chromosome but also
calculating the relative distances between them.
(The unit describing distances along a chromosome is 
now called a centimorgan and equals 1% recombin- 
ation between two genes.) Morgan and his students 
expanded on these ideas in their important 1915 work 
Mechanisms of Mendelian Heredity. 
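
The centimorgan definition translates directly into arithmetic. The sketch below, with invented recombinant counts and hypothetical gene names A, B and C, converts recombination frequencies into map distances and uses their approximate additivity along a linear chromosome to place three genes in order.

```python
def map_distance_cm(recombinants: int, total_progeny: int) -> float:
    """Map distance in centimorgans: 1 cM corresponds to 1% recombinant progeny."""
    if total_progeny <= 0:
        raise ValueError("total_progeny must be positive")
    return 100.0 * recombinants / total_progeny


def middle_gene(distances: dict[tuple[str, str], float]) -> str:
    """For three genes, the one absent from the most distant pair lies between the others."""
    (a, b), _ = max(distances.items(), key=lambda item: item[1])
    genes = {gene for pair in distances for gene in pair}
    (middle,) = genes - {a, b}
    return middle


# Hypothetical recombinant counts out of 1000 progeny for genes A, B and C.
distances = {
    ("A", "B"): map_distance_cm(100, 1000),
    ("B", "C"): map_distance_cm(50, 1000),
    ("A", "C"): map_distance_cm(150, 1000),
}
print(distances, "-> middle gene:", middle_gene(distances))
```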

It is remarkable that Morgan and his colleagues 
were able to determine the map locations of individual 
genes at a time when microscopes were not even
powerful enough to observe genes. Most of the fly room
research during the 1910s and 1920s simply applied
Morgan’s elegant and powerful research program based
on analysis of linkage data from experimental crosses.

One of the most important contributions of T.H. 
Morgan was actually nonscientific. It was the unique 
environment he created in which he and his colleagues 
conducted their science. The fly room at Columbia is 
legendary in part due to Morgan’s ability to attract
so many brilliant scientists (his students included,
among others, Sturtevant, Calvin Bridges, Curt
Stern, and Hermann Muller, a future Nobel laureate),
but what really set it apart from the traditional
university research laboratories was the collegial and
generous atmosphere in which the highest value was
placed on the free exchange of ideas and results.

When he was 62, Morgan moved to the California 
Institute of Technology where he recruited numerous 
former students and colleagues to join him as he 
helped establish a critical mass of experimental
geneticists in its young biology department. He remained
at Caltech until his death in 1945. In 1933, Morgan was
awarded the Nobel Prize for Physiology or Medicine
“for his discoveries concerning the role played by the 
chromosome in heredity.” In keeping with his gen- 
erosity of spirit he shared the prize money with 
Sturtevant and Bridges. 


See also: Drosophila melanogaster; Muller, 
Hermann J 
MOS is the cellular homolog of the Moloney murine
sarcoma virus oncogene. In Xenopus oocytes, the
synthesis of MOS is stimulated by the steroid hormone
progesterone, which initiates maturation. MOS, a
serine/threonine kinase, activates a MAPK kinase
cascade that in turn activates maturation-promoting
factor, a complex of Cdc2 and cyclin B. MOS is not
essential for oocyte maturation in other species, but
MOS expression is sufficient to cause meiosis I and is
required for meiosis II. MOS is an active component
of cytostatic factor, an activity responsible for arrest
at metaphase of meiosis II. Ectopic expression of MOS
in somatic cells can induce oncogenic transformation.


Overview


The karyotype of most individuals is established at 
fertilization. This constitutional karyotype is then 
maintained throughout subsequent somatic cell div- 
ision. Cytogenetic analysis of some individuals, how- 
ever, shows the presence of two or more cell lines with 
different karyotypes. Typically this is one normal cell 
line alongside one, or occasionally more, abnormal 
cell lines, although a normal cell line may not always 
be apparent. Analysis of DNA polymorphisms usually 
demonstrates that the various cell lines are derived 
from a common fertilization event, i.e., one individual 
with two or more cell lines (mosaicism), rather than a 
fusion of separate zygotes (chimerism), the latter being 
a much rarer condition in humans. 


Origins of Mosaicism 


Mosaicism arises by one of two general mechanisms: 


1. A somatic error in a postfertilization mitotic div- 
ision in a ‘normal’ conception. This produces mo- 
saicism for trisomies, 45,X, other sex chromosome 
abnormalities, and autosomal structural rearrange- 
ments. 

2. A somatic event in a postfertilization mitotic
division correcting an error present in an ‘abnormal’
conception. Correcting events are primarily restricted
to correction of trisomy arising from meiotic errors
during gametogenesis (these are specifically termed 
reduction to disomy), or loss of a marker chromosome.
How a cell recognizes the presence of an additional
chromosome and then subsequently excludes it is
unknown; the frequency at which
correction occurs (an estimated 5-10% of all aneu- 
ploid conceptions) strongly suggests that it is not a 
random process. Analysis of DNA polymorphisms 
is consistent with correction occurring at a single 
mitotic division rather than being a continuous 
process. 


Somatic errors and correcting events are often referred 
to as mitotic and meiotic errors, respectively, indicat- 
ing the source of the abnormal cell line. 


Timing of Mosaicism 


In humans, most significant somatic errors and cor- 
recting events take place prior to blastocyst formation, 
although somatic errors may continue to occur 
beyond this point of development. It is this timing of 
establishment of a mosaic karyotype that makes the 
situation complex. In a human 64-cell blastocyst, 
the great majority of cells are destined to produce 
the trophoblast cells in the developing chorion. Only 
10-15 cells form the inner cell mass, with perhaps as 
few as three or four of these producing the embryo 
proper, the rest primarily developing into other
components of the chorion and the remainder of the
extraembryonic tissues. Chance distribution of normal
and abnormal cells in a mosaic blastocyst frequently
appears to result in a quite uneven representation of
the different karyotypes in each of these early
embryological cell lineages, an effect presumably
exaggerated by any tendency of daughter cells of like
karyotype to remain in close proximity following
mitotic cell division.
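
How purely random allocation of cells can produce such uneven representation can be estimated with the hypergeometric distribution. This is a crude sketch under stated assumptions: the 64-cell blastocyst and the four embryo-proper precursors come from the text, but the 16 abnormal cells (25%) are a hypothetical figure, and cells are assumed to be allocated at random, ignoring the clustering of daughter cells that would make the result even more uneven.

```python
from math import comb


def hypergeom_pmf(j: int, total: int, abnormal: int, sampled: int) -> float:
    """P(exactly j abnormal cells among `sampled` cells drawn at random,
    without replacement, from `total` cells of which `abnormal` are abnormal)."""
    return comb(abnormal, j) * comb(total - abnormal, sampled - j) / comb(total, sampled)


TOTAL_CELLS = 64      # blastocyst size used in the text
ABNORMAL_CELLS = 16   # hypothetical: 25% of the blastocyst is abnormal
EMBRYO_PROPER = 4     # the text's "as few as three or four" precursor cells

for j in range(EMBRYO_PROPER + 1):
    p = hypergeom_pmf(j, TOTAL_CELLS, ABNORMAL_CELLS, EMBRYO_PROPER)
    print(f"{j} of {EMBRYO_PROPER} embryo-proper precursors abnormal: P = {p:.3f}")
```

Under these assumptions, about 31% of such embryos (194,580 / 635,376) would have an entirely normal embryo proper even though a quarter of the blastocyst is abnormal, while in roughly 4% three of the four precursors would be abnormal.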


Development of Mosaic Embryos 


In cases where the abnormal cell line is present in the 
fetal tissues, development will be affected directly. 
Mosaic forms of the common ‘viable’ trisomies for 
chromosomes 13, 18, and 21 and for the sex chromo- 
some abnormalities are well documented in both live- 
born individuals and pregnancy losses. For these 
abnormalities, high levels of aneuploid cells may still 
be compatible with survival to term. In contrast, most 
other trisomies are relatively lethal in utero, both in
the nonmosaic state and when significant numbers of
abnormal cells are present in the fetus. Liveborn in- 
dividuals, mosaic for these latter trisomies, although 
demonstrating quite profound phenotypic effects, 
will therefore often have only very low levels of
aneuploid cells. Additionally, these aneuploid cells may
have an uneven or restricted tissue distribution, mak- 
ing detection or exclusion of mosaicism, in the clinical 
situation, somewhat problematical. Mosaics for the 
more lethal trisomies are frequently observed among 
chromosomally abnormal first trimester pregnancy 
losses. Overall, of the 30-50% of early pregnancy 
losses that are chromosomally abnormal, around 1 in 
10 are mosaic. 

An alternative scenario, however, is equally com- 
mon; it is also the subject of much interest because of 
its diagnostic implications. Here, the fetus develops 
solely from the normal cell population, all abnormal 
cells being limited to the extraembryonic tissues, a 
condition known as confined placental mosaicism 
(CPM). Correction of a trisomic conception with sub- 
sequent CPM is referred to as trisomic zygote rescue. 
CPM is found in as many as 1 in 50 pregnancies 
undergoing prenatal diagnosis using chorionic villus 
sampling in the first trimester. As this incidence is as 
high as that of ‘true’ cytogenetic abnormalities affect- 
ing the fetus, it is clearly important for diagnostic 
accuracy to differentiate between these two possibil- 
ities. CPM can, however, have other consequences. 
Although the great majority of CPM pregnancies pro- 
ceed uneventfully to term, about 5-10% of cases are 
associated with adverse outcomes of pregnancy, e.g., 
fetal losses, intrauterine deaths, or stillbirths. Others 
result in moderate to severe intrauterine growth 
restriction (IUGR) in the third trimester; indeed, 5- 
10% of severe IUGR (below the 3rd centile for 
weight) may be attributed to this cause. A small
number may result in fetal overgrowth in utero. Adverse
outcomes are almost always associated with trisomic 
zygote rescue. Unlike somatic errors, where the 
abnormal cell line may represent only a minor sub- 
population within the blastocyst, starting with an 
abnormal karyotype inevitably means that a high 
percentage of cells (in practice it often seems the 
majority) will be abnormal. If these abnormal cells 
persist, they will subsequently form a large propor- 
tion of the placenta, in many cases affecting its 
growth, function, and ability to support normal fetal 
development. Such effects are seen in corrected tri- 
somy 2 and 16 pregnancies. Trisomic zygote rescue will 
also result in uniparental inheritance of the remaining 
two homologs in one in three cases. Uniparental 
inheritance of chromosomes carrying imprinted 
genes in this way is one of several causes of certain 
uniparental disomy syndromes in man. The chromo- 
somes of particular significance in CPM are 7 and 15, 
where maternal uniparental inheritance produces 
Silver-Russell syndrome and Prader-Willi syndrome,
respectively. 
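
The "one in three" figure follows from simple enumeration if the trisomy arose from a maternal meiotic error (two maternal homologs, one paternal) and each of the three homologs is equally likely to be the one lost. A minimal sketch under those assumptions, with a hypothetical function name:

```python
from fractions import Fraction


def rescue_outcomes(homologs=("maternal", "maternal", "paternal")) -> list[str]:
    """Remove each homolog in turn (assumed equally likely to be lost) and
    classify the remaining pair as biparental or uniparental disomy."""
    outcomes = []
    for lost in range(len(homologs)):
        remaining = homologs[:lost] + homologs[lost + 1:]
        outcomes.append("uniparental" if remaining[0] == remaining[1] else "biparental")
    return outcomes


outcomes = rescue_outcomes()
p_upd = Fraction(outcomes.count("uniparental"), len(outcomes))
print(outcomes, "-> P(uniparental disomy) =", p_upd)
```

Because the extra chromosome in this scenario is maternal, the uniparental outcome is maternal disomy, the situation the text links to Silver-Russell syndrome (chromosome 7) and Prader-Willi syndrome (chromosome 15).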
Chromosome-Specific Factors


Within the above broad picture of CPM, each chromo- 
some has its own highly individual pattern of fre- 
quency and behavior, with much detail yet to be 
elucidated. Of the aneuploidies commonly seen in 
this form, trisomy 16 CPM is almost exclusively a 
consequence of correction of errors arising in maternal 
meiosis I, whereas trisomy 7 CPM is primarily due to 
mitotic errors in normal conceptions; trisomy 15 CPM 
originates through both mechanisms. CPM for
trisomies 2 and 3 has an additional complication in that
it shows a cell-lineage-specific distribution,
with abnormal cells being preferentially encountered 
in inner cell mass-derived and trophoblast-derived 
components of chorionic tissue, respectively. The 
mechanisms underlying these patterns of
chromosome-specific behavior are not understood. In a
small subgroup of pregnancies, mosaicism within the placenta
may actually result in greater survival of an abnormal 
fetus. Studies of trisomy 13 and 18 fetuses reaching 
term (or late terminations of pregnancy) show a much 
greater incidence of placental mosaicism, with a nor- 
mal karyotype in trophoblast cells, than in studies of 
those pregnancies lost in the first trimester. This indi- 
cates that the presence of normal trophoblast cells 
somehow interferes with the processes by which the 
maternal system recognizes these as abnormal concep- 
tions. Such effects on enhanced survival are not seen 
among trisomy 21 pregnancies. 


Mosaicism in Preimplantation Embryos 


Most of our understanding of mosaicism in humans 
has been deduced from the analysis of diagnostic 
samples derived from continuing and noncontinuing 
pregnancies and from liveborns. Cytogenetic analysis 
of small numbers of nonreturned embryos from IVF 
programs has allowed some direct observation of 
aspects of mosaicism and related phenomena; caution 
should be exercised, however, in extrapolating this to 
‘normal’ conceptions. Most information comes from 
limited fluorescence in situ hybridization (FISH) 
analysis of interphase cells, which detects three over- 
lapping groups of embryos: those with essentially 
uniform normal or abnormal karyotypes; those with
two or more cell lines present; and those with large 
numbers of cells with diverse karyotypes (chaotic 
embryos). Data from fully karyotyped blastocyst 
metaphases support this broad classification but
additionally suggest that the abnormal cells seen,
particularly those in chaotic embryos, may have gross
structural chromosome errors as well as simple gain 
or loss of whole chromosomes. Mosaicism for ploidy, 
notably tetraploidy, is common. In general, although
some of the abnormal karyotypes, e.g., trisomy 16,
seen in IVF-derived embryos are detected at
frequencies broadly comparable to those seen in later
conceptions, many of them are not. Either cells with
more complex abnormalities are being outgrown or
excluded, or the embryos themselves are failing to
produce ongoing pregnancies; it is also not possible
to exclude that some are artifacts of the IVF
process itself.


Conclusion 


Mosaicism is remarkably common in early human 
development. Its clinical effects are well recognized, 
but the mechanisms behind its origins are poorly 
understood. 


See also: Amniocentesis; Nondisjunction; 
Prenatal Diagnosis; Trisomy; Uniparental 
Inheritance 
All mammals are very similar in genetics, embryology,
biochemistry, physiology, anatomy, and even behav- 
ior. Therefore, we could choose any mammal as an 
experimental object for study to understand ourselves 
better biologically and medically, and, of course, the 
closer the relationship the better. However, the mouse, 
although a distant relative of humans, has distinct 
advantages. It is one of the smallest mammals, weigh- 
ing little more than 20 g as a young adult, which per- 
mits large numbers to be raised and bred efficiently. 
Mice breed prolifically as young as 40 days old, which 
enables a few generations a year to be studied. Mice 
age about 30 times as rapidly as humans, so
embryogenesis, development, and aging can be studied
in a relatively short period of time.

The mouse genome comprises 20 pairs of chromo- 
somes; the human genome has 23 pairs. Genes on the 
chromosomes are arranged in a very similar order in 
both species. In fact, if the mouse chromosomal set 
were broken into about 150 specific pieces and re- 
arranged in the correct way, one could just about 
reconstruct the arrangement of the human genome. 
The difference in these arrangements of only about 
150 pieces seems small considering that the two species
shared a common ancestor 65 million years ago. We now
know that all mammals share in this great genomic 
similarity. 
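
Counting conserved pieces is a simple computation on gene orders. The sketch below uses hypothetical single-chromosome gene orders rather than real mouse or human data: it splits one order into maximal runs whose neighboring genes are also neighbors in the other, and it ignores orientation so that an inverted block still counts as one piece. Real genome comparisons must also handle many chromosomes, so this only illustrates the idea behind a figure such as "about 150 pieces".

```python
def adjacencies(order: list[str]) -> set[frozenset[str]]:
    """Unordered pairs of neighboring genes in a linear gene order."""
    return {frozenset(pair) for pair in zip(order, order[1:])}


def conserved_segments(reference: list[str], other: list[str]) -> list[list[str]]:
    """Split `other` into maximal runs whose neighboring genes are also
    neighbors in `reference` (orientation ignored; one chromosome only)."""
    ref = adjacencies(reference)
    segments = [[other[0]]]
    for prev, gene in zip(other, other[1:]):
        if frozenset((prev, gene)) in ref:
            segments[-1].append(gene)
        else:
            segments.append([gene])  # a breakpoint starts a new segment
    return segments


human = list("ABCDEFGHIJ")  # hypothetical gene order
mouse = list("ABCGFEDHIJ")  # hypothetical: block D-G inverted
segs = conserved_segments(human, mouse)
print(len(segs), ["".join(s) for s in segs])
```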


Mice have been cohabitants of humans for thou- 
sands of years. For over a hundred years, and probably 
much longer, mouse fanciers have been in the business 
of selling mice with exotic coat colors and patterns. 
In the process of living and breeding in a human- 
managed environment, mice were inadvertently 
selected for tameness. Only within the last 100 years 
have mice been used seriously for biological research 
and systematically bred for that purpose. Many of the 
founders of present-day mouse stocks and strains have 
their origins in the variety of colored and relatively 
tame mice that were widely available for sale. More 
recently, to explore greater genetic variation, strains 
and stocks have been initiated in laboratories from 
mice caught in the wild. 

The use of mice as good genetic, embryological,
physiological, developmental, and aging models
makes it possible to isolate and examine the various
genetic pathways that lead to different diseases.


The Strategy 


The aim is to find genetic disease in mice that mirrors 
genetic disease in humans. Presuming a very similar 
etiology of disease in both species, an observation in 
one can provide information on the other. Thus, know- 
ing the genetic defect in the mouse and its physio- 
logical and developmental consequences, biomedical 
scientists can devise new ways in which to intervene or 
alter these defective pathways. Successful intervention 
or amelioration in the mouse portends the success of 
the same strategy in humans. The strategy, then, is first 
to find genetic problems in mice. This can be done by 
screening and then mating phenotypic deviants. These 
deviants initially are suspected to be the result of a 
mutation, which must be confirmed by further breeding
of the affected animal with its relatives. Although
the mutation rate is low, there are a large number of
genes that can mutate, so the appearance of a
phenotypic deviant owing to a genetic mutation is not
unusual in a sizeable colony. The mutation rate can be
enhanced by exposing the mice to mutagens, such as
X-rays. A powerful chemical mutagen, ethylnitrosourea
(ENU), injected intraperitoneally, has been found to
cause a relatively high frequency of point mutations
in the male germ cells, thus dramatically increasing the
frequency of new mutants for study. When teams of
scientists collaborate to examine the offspring of these
mice for many different biomedical end points, the
result is an effective way to increase the number of
important models.

The more we know about genes, the proteins they 
encode, and the physiological effects of those proteins, 
the better we can devise schemes to intervene in the 
debilitating effects of damaged genes. Certainly a
variety of gene therapy techniques now contemplated
for humans can be attempted and perfected using
homologous or similar mouse models.


Inbred Strains 


Inbred strains are defined as the product of at least 20
consecutive generations of brother-sister matings.
Under these conditions it has been calculated, and
since observed, that the probability of homozygosity
(genetic identity between the two alleles of a gene) at
any locus is nearly 100%. Once a strain has achieved
inbred status, it receives a name under strictly
agreed-upon nomenclature rules, and through the
scientific literature the strain becomes known
worldwide for its genetic and phenotypic traits. The
strain then usually becomes available to any
researcher in the world. Genetically independent
strains, i.e., strains independently initiated from
different founder populations, are the most useful for
finding phenotypic differences between inbred strains,
because their distinct origins give them the greatest
chance of being genetically different.
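The "nearly 100%" figure can be checked with a short calculation. The sketch below is not from the original text: it uses Wright's standard recurrence for the inbreeding coefficient under full-sib mating, F_t = (1 + 2F_{t-1} + F_{t-2})/4, and assumes unrelated, non-inbred founders (F = 0) and no selection favoring heterozygotes. The function name is hypothetical.

```python
def sib_mating_inbreeding(generations: int) -> list[float]:
    """Inbreeding coefficient F after each of 1..`generations` brother-sister matings.

    Assumes unrelated, non-inbred founders (F_0 = F_-1 = 0) and Wright's
    full-sib recurrence F_t = (1 + 2*F_{t-1} + F_{t-2}) / 4.
    """
    f_two_back, f_one_back = 0.0, 0.0  # F_{t-2} and F_{t-1}
    history = []
    for _ in range(generations):
        f_now = (1 + 2 * f_one_back + f_two_back) / 4
        history.append(f_now)
        f_two_back, f_one_back = f_one_back, f_now
    return history


if __name__ == "__main__":
    for gen, f in enumerate(sib_mating_inbreeding(20), start=1):
        print(f"generation {gen:2d}: F = {f:.3f}")
```

Under these assumptions F climbs 0.25, 0.375, 0.5, ... and reaches about 0.986 by generation 20. Read as the probability that the two alleles at a locus are identical by descent, this is the calculation behind the statement that homozygosity is nearly complete after 20 generations.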


Crosses 


The study of inbred strains is also a powerful method 
for finding genes that cause important biomedical 
phenotypes. If animals are raised in the same environ- 
ment, the phenotypic differences between inbred 
strains are mainly owing to genetic differences 
between the strains. Often there are many genes
contributing to these differences. To determine the
nature and number of the genetic differences, mice of
two strains are crossed, and the offspring (called the
F1) are then either bred back to one of the parental
strains (a backcross) or bred to other F1 mice of the
same parental origin (an intercross, producing the F2).
In the backcross or F2 offspring the intensity of the
trait can be quantified and an estimate made of the
number of genes causing the trait of interest. If the
mice in the backcross or F2 fall into very few
categories, then there are probably very few gene
differences causing the variation in the trait. If the
mice fall broadly across a continuum, then there are
probably several genes involved and possibly a
greater relative influence of environmental factors as
well.
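The logic of counting phenotypic categories can be illustrated by enumerating the expected genotype classes. This is an idealized sketch, not the author's method: it assumes n unlinked loci of equal, purely additive effect, no dominance, and no environmental variance, so a mouse's trait value depends only on how many "increasing" alleles it carries. The function names are hypothetical.

```python
from math import comb


def f2_class_frequencies(n_loci: int) -> list[float]:
    """Expected frequency of F2 mice carrying k 'increasing' alleles, k = 0..2n.

    Each F2 mouse receives 2n alleles at n additive, unlinked loci, each
    allele being 'increasing' with probability 1/2, so k ~ Binomial(2n, 1/2).
    """
    total = 2 * n_loci
    return [comb(total, k) / 2**total for k in range(total + 1)]


def backcross_class_frequencies(n_loci: int) -> list[float]:
    """Expected frequency of backcross mice (F1 x low parent) carrying k
    'increasing' alleles, k = 0..n; only the F1 gamete segregates."""
    return [comb(n_loci, k) / 2**n_loci for k in range(n_loci + 1)]


if __name__ == "__main__":
    for n in (1, 2, 4):
        f2 = f2_class_frequencies(n)
        print(f"n = {n}: {len(f2)} F2 classes; each parental extreme = {f2[0]:.4f}")
```

With one locus the F2 falls into three classes in a 1:2:1 ratio and a quarter of the mice resemble each parent; with four loci there are nine classes and only 1 in 256 resembles a given parent, so the distribution already looks close to continuous. Environmental variance, ignored here, blurs the classes further.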

Crossing mice of inbred strains that differ in specific
characteristics is also a way of uncovering important
genetic effects. Recombinant inbred strains are
derived by continual brother-sister matings from
independent mated pairs from the F2 of a cross
between two inbred strains. The resulting set of strains
provides a powerful method for genetic analysis of
any trait by which the parental strains differ.
Congenic and consomic inbred strains are produced
by repeatedly backcrossing a gene or a chromosome
from one strain onto the background of another.
When a congenic strain is being made, the single
donated gene can usually be recognized by its
phenotype in the backcrossed mice; when a consomic
strain is being made, the animals must be typed for
microsatellite markers or other genetic polymorphisms
to be sure that the donor chromosome of interest has
not recombined with the corresponding host-strain
chromosome.
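The number of backcross generations needed for a congenic strain follows from simple halving. The sketch below is a standard textbook approximation added for illustration, not taken from the original: counting the F1 as generation N1, each further backcross to the recipient strain is expected to halve the donor-strain contribution at loci unlinked to the selected gene. Regions linked to the selected gene are lost more slowly, which is one reason marker typing remains useful. The function name is hypothetical.

```python
def expected_donor_fraction(n_generation: int) -> float:
    """Expected fraction of the genome, unlinked to the selected gene, still
    derived from the donor strain at backcross generation N{n}.

    N1 is the F1 hybrid (half donor); each backcross to the recipient strain
    halves the remaining donor contribution on average.
    """
    if n_generation < 1:
        raise ValueError("N1 (the F1 hybrid) is the first generation")
    return 0.5 ** n_generation


if __name__ == "__main__":
    for n in (1, 2, 5, 10):
        recipient = 100 * (1 - expected_donor_fraction(n))
        print(f"N{n}: {recipient:.2f}% recipient genome expected")
```

By N10 about 99.9% of the unlinked genome is expected to come from the recipient strain, which is why ten backcross generations are the usual benchmark for calling a strain congenic.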


Selection 


Selection experiments can also be carried out to 
greatly exaggerate the population mean of any trait 
with a reasonable heritability, that is, where the
phenotypic variation is due in large part to genetic
differences. Many such experiments have been carried
out. A far greater mean difference for study can
probably be achieved by selection than can be found
among inbred strains. But one can argue that other 
techniques, such as choosing specific strain combin- 
ations to make recombinant inbred lines, are better. 
Usually they provide sufficient strain differences, 
and they offer powerful opportunities for genetic 
analysis, which includes determining the number and 
influence of genes affecting the trait and initial map- 
ping of the genes. 
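How far a selection experiment can shift the mean is often summarized by the breeder's equation, R = h^2 S, where S is the selection differential (mean of the selected parents minus the population mean), h^2 is the narrow-sense heritability, and R is the expected response in the next generation. The equation is a standard quantitative-genetics result, not something the original text states; the sketch below assumes h^2 and S stay constant across generations, which real selection lines rarely sustain. The function names and numbers are hypothetical.

```python
def response_to_selection(heritability: float, selection_differential: float) -> float:
    """Breeder's equation: expected one-generation response R = h^2 * S."""
    if not 0.0 <= heritability <= 1.0:
        raise ValueError("narrow-sense heritability must lie in [0, 1]")
    return heritability * selection_differential


def project_means(start_mean: float, heritability: float,
                  selection_differential: float, generations: int) -> list[float]:
    """Population mean after each generation, assuming h^2 and S stay constant."""
    means = [start_mean]
    for _ in range(generations):
        means.append(means[-1] + response_to_selection(heritability, selection_differential))
    return means


if __name__ == "__main__":
    # Hypothetical trait: adult body weight in grams, h^2 = 0.3,
    # breeders chosen 2 g above the population mean each generation.
    print(project_means(20.0, 0.3, 2.0, generations=10))
```

In practice the response usually slows as favorable alleles approach fixation and heritability falls, so this linear projection is an optimistic sketch rather than a prediction.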

There are hundreds of good mouse models for 
human genetic disease. Our experience so far indicates 
that mouse and human usually share most of the
symptoms of a genetic disease. Some important
examples are
given below. 


Chromosomal Aberrations 


Humans have various kinds of chromosomal defects
and rearrangements. For example, there are trisomies,
duplications, aneuploidies, translocations, and
inversions, all with potentially major effects on
viability, reproduction, and development, including
physical deformities and mental retardation. Mice
have numerous examples of the same conditions,
many of which have been studied to provide insight
into the human condition. There is a mouse segmental
trisomy, T(16;17)65Dn, that emulates Down
syndrome and is currently being widely studied. Of
particular importance to families with these
chromosomal problems is information on the
probability of recurrence in subsequent pregnancies.
Mouse models provide the best material to understand
causation and recurrence.


A Specific Disease Analogy 


Osteoporosis 

The mouse has a number of genes that control bone 
density, a fact originally discovered because differ- 
ences in this trait were found among inbred strains. 
The many possible paths through which these genes
act to regulate bone density are open to study and
understanding, and probably to therapy.


Specific Gene Homologies or Analogies 
Obesity 


Obesity is found in mice just as it is in humans. Since
about 60% of cases of diabetes, 30% of gall bladder
disease, 20% of cardiovascular disease, 10% of
musculoskeletal disease, and 2% of cancer are
attributed to obesity in humans, exploring its genetic
causes in mice is very useful. More than half a dozen
mutations that cause obesity through differing
physiological actions have been found in mice. One
example is the obese mutation (Lep^ob); the gene has
been cloned and is now known to encode a hormone
called leptin. An understanding of this gene may permit
a medical regimen utilizing leptin to control this
aspect of human obesity.


Muscular Dystrophy 

The Dmd^mdx mutation in the mouse causes a
muscular dystrophy similar to Duchenne muscular
dystrophy in humans. The gene is located on the X
chromosome in both mouse and human; therefore, the
disease usually affects males.


Spinal Cord Injury 

Spinal cord injury in humans leading to lower leg and 
sometimes arm immobility is a great problem in terms 
of health, emotional burden, and economic cost. A 
huge research effort has been made in attempting to 
make spinal cords rejoin and mend, thus restoring 
function. There are knockout mice lacking
neurotrophins, proteins that help keep the nervous
system intact. These animals are ideal subjects for new
treatments or regimens to revive neuron growth and
restore function.


Parkinson’s Disease 

In one mouse model, the dopamine receptor gene
Drd1a has been subjected to targeted mutation,
providing a model for Parkinson’s disease,
schizophrenia, and addiction to amphetamine,
cocaine, and alcohol.


Severe Combined Immunodeficiency 

The mouse with severe combined immunodeficiency
(Prkdc^scid) has a severely weakened immune system,
making it difficult for it to fight infections and reject
foreign tissue. These mice can be engrafted with
human lymphocytes, which retain their full
immunological capacity to attack foreign tissue. Thus,
the human immunological response to many foreign
agents, such as HIV, can be studied in the laboratory
mouse. Furthermore, antibiotics and other drugs can
be evaluated relatively easily under these defined
experimental conditions.


Heart Disease 

The build-up of plaque in the arteries is a major cause 
of human heart disease. The Apoe-deficient mouse
shows arterial plaque accumulation as early as 3
months of age, even when raised on low-fat diets.


Cancer 

Mice can develop all the same cancers that humans do. 
There is a mutation of the Apc gene in the mouse that
causes colon cancer similar to that in humans. An
understanding of the malfunction of this gene could
lead to ways to cure the human genetic condition and
also the vast majority of colon cancers caused by
environmental factors.

A major characteristic of cancer is the uncontrolled
growth of cells. The Trp53 gene makes a protein that
controls cell division and therefore restrains
uncontrolled tumor growth. The Trp53 targeted mutant
mouse has a damaged form of this tumor suppressor
gene, making the mutant animal highly susceptible to
many different cancers. The model is important in
studies of breast and ovarian cancers.

Exciting opportunities for a cure lie in immuno- 
therapy, which employs the immune system’s ability 
to recognize ‘foreign,’ including cancerous, tissues 
and to attack and destroy the tumors. The immune
system of the mouse is strikingly similar to that of
humans and can easily be manipulated.


Juvenile Diabetes 

Juvenile diabetes or insulin-dependent diabetes 
mellitus (IDDM) is an autoimmune disease. Mice of
the inbred strain NOD develop this disease and are
currently being widely studied to identify the diabetes
susceptibility genes and the mechanism of the disease.
Mice of the TH stock are a good model for adult-onset
diabetes, or non-insulin-dependent diabetes mellitus
(NIDDM).


Cystic Fibrosis 

Cystic fibrosis is a very widespread, fatal human 
genetic disease. A targeted mutation of the mouse 
Cftr gene causes many of the symptoms of human 
cystic fibrosis. 


Epilepsy

A mouse model exists that shows both major forms of
human epilepsy, i.e., petit mal and grand mal. It will be
particularly useful in the study of the petit mal form,
which usually occurs in children.


Eye Genetics 

There are many genes that cause cataracts in humans,
and a similarly large number is found in the mouse,
from which it may be concluded that we are dealing
with the same spectrum of genes and mutational
events in both species. Furthermore, mice can have
corneal disorders, glaucoma, and retinal degeneration,
which together cause most cases of blindness in
human populations. The DBA/2J mouse strain has
been found to have several symptoms common in
human glaucoma, a leading cause of human blindness.
The strain is now widely used to investigate how
glaucoma develops.


Aging

Inbred strains are well known for their different life
spans. Some strains die young from specific diseases
such as leukemia. But it is more difficult to identify
genes that extend life span beyond the normal range.
Recently, in a selection experiment for life span, a
significant association was found between longevity
and two unlinked genes. Further elucidation of the
effects of these genes will probably reveal mechanisms
for prolonging life.

In addition to genetic models there are hundreds of 
good tissue and developmental models for study. 


Other Models 


Germ Cells 

All germ cells of both male and female mice can be
studied histologically or manipulated in vivo in
situations impossible or difficult to simulate in humans.


Embryogenesis 

The early embryology of the mouse is nearly identical
to that of humans. In fact, except for size, it is
difficult to distinguish the mouse embryo from the
human embryo throughout the first trimester. This
means that the potentially thousands of genes that
bring the embryo to this stage from a fertilized ovum
are doing virtually the same thing in mouse and
humans. Therefore, studying any developmental
anomaly in the mouse caused by a defective gene can
lead us to knowledge of the homologous human
condition. Later in embryogenesis one can see the
human head enlarge extensively, the mouse head
elongate into a snout, the mouse tail elongate, and
the human fingers and toes differentiate in ways that
distinguish them from the mouse paws.
Nevertheless, at birth, similarities are still apparent
in anatomy and continuing developmental patterns.
After birth there are genetic problems associated
with the onset of maturity and reproductive
functioning. Again the mouse emulates humans in
postnatal development and aging, although in the
mouse the process is 30 times more rapid. Finally, as
humans live longer, we more commonly find genetic
disorders that appear in middle age or later in life,
such as hemochromatosis, Huntington disease, type II
diabetes, glaucoma, and many forms of cancer. Again,
the study of mice in their rapid transition through
these developmental periods into old age is of
enormous value.


Advanced Protocols 


With the large amount of information acquired on
mice, several distinct subdisciplines dedicated to the
raising, care, and use of mice have grown up. The
advanced state of their protocols makes the mouse
an even more important laboratory asset.


Nomenclature 


Without a specific nomenclature, it would be impos- 
sible for scientists to communicate about specific 
genes, chromosomal variants, and strains. A 16-member 
International Committee on Standardized Genetic 
Nomenclature for Mice is responsible for ensuring 
cohesive guidelines for nomenclature. Collaboration 
with committees overseeing nomenclature of other 
species is important, especially in the naming of 
genes shown to be orthologous between species. 


Breeding Systems 


The purpose of a breeding system is to preserve and
control the genetic causes of variability in the
biological traits of interest. One important
consideration is the avoidance of inbreeding, which
requires more than random breeding alone. Other
crosses have been designed to manipulate the transfer
of genes and chromosomes from one strain to another,
permitting several kinds of biological analysis.


Record Keeping and Colony 
Management 


One must be able to identify each mouse in a
genetically heterogeneous colony. Thus, it is necessary
to maintain a perfect association between the mouse
and its location, its ancestry, descendants, and
relatives, and all the biological information acquired
on it. Furthermore, animals need an optimally
comfortable environment in which to live. Thus, there
is considerable knowledge, based on tested protocols,
of proper care, feeding, sterilizing of feed, providing
clean water, and cleaning of equipment. Good,
comfortable physical surroundings, along with
continual concern for the health and well-being of
each animal, are essential. A most important concern
is the humane treatment of animals in research.
Protocols for the proper, humane handling of mice are
in common practice and are continually reviewed and
improved.
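The bookkeeping described above, linking each mouse to its location, parents, descendants, and accumulated observations, maps naturally onto a small pedigree data structure. The sketch below is hypothetical: the class and field names are illustrative and do not correspond to any particular colony-management package.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class MouseRecord:
    """One animal in the colony; field names are illustrative only."""
    mouse_id: str
    strain: str
    sex: str                      # "M" or "F"
    cage: str                     # current location
    sire_id: str | None = None    # None for founders
    dam_id: str | None = None
    observations: dict[str, str] = field(default_factory=dict)


class Colony:
    """Keeps every record keyed by a unique ID so pedigree links stay consistent."""

    def __init__(self) -> None:
        self._mice: dict[str, MouseRecord] = {}

    def add(self, mouse: MouseRecord) -> None:
        if mouse.mouse_id in self._mice:
            raise ValueError(f"duplicate ID {mouse.mouse_id}")
        # Parents must already be registered so ancestry is never dangling.
        for parent_id in (mouse.sire_id, mouse.dam_id):
            if parent_id is not None and parent_id not in self._mice:
                raise KeyError(f"unknown parent {parent_id}")
        self._mice[mouse.mouse_id] = mouse

    def get(self, mouse_id: str) -> MouseRecord:
        return self._mice[mouse_id]

    def move(self, mouse_id: str, new_cage: str) -> None:
        self._mice[mouse_id].cage = new_cage

    def record(self, mouse_id: str, trait: str, value: str) -> None:
        self._mice[mouse_id].observations[trait] = value

    def offspring(self, mouse_id: str) -> list[str]:
        """IDs of all recorded animals that have this mouse as sire or dam."""
        return [m.mouse_id for m in self._mice.values()
                if mouse_id in (m.sire_id, m.dam_id)]

    def ancestors(self, mouse_id: str) -> set[str]:
        """All recorded ancestors, found by walking parent links upward."""
        found: set[str] = set()
        to_visit = [mouse_id]
        while to_visit:
            current = self._mice[to_visit.pop()]
            for parent_id in (current.sire_id, current.dam_id):
                if parent_id is not None and parent_id not in found:
                    found.add(parent_id)
                    to_visit.append(parent_id)
        return found


if __name__ == "__main__":
    colony = Colony()
    colony.add(MouseRecord("A1", "C57BL/6J", "M", "cage-1"))
    colony.add(MouseRecord("A2", "C57BL/6J", "F", "cage-1"))
    colony.add(MouseRecord("B1", "C57BL/6J", "F", "cage-2", sire_id="A1", dam_id="A2"))
    colony.record("B1", "coat color", "black")
    print(colony.offspring("A1"), sorted(colony.ancestors("B1")))
```

Refusing to register an animal whose parents are unknown is one simple way to keep the "perfect association" between each mouse and its ancestry; a real system would also need persistent storage and audit history.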


Cryobiology 


Now that the freezing of early-stage embryos is vir- 
tually routine, it is possible to keep colonies of mice 
that have great potential for research but which are not 
at the present time being used. With no adverse effects, 
the embryos can be thawed, transplanted into pseudo- 
pregnant females, and brought to an otherwise normal 
birth. Frozen embryos can also be used as insurance 
against loss of strains or stocks that are kept in very 
small living colonies. The freezing of sperm, which is 
more advanced in human reproductive science, can 
now be done with mice, making possible an effective 
and inexpensive way to preserve specific haplotypes 
for future use. 


Informatics 


The acquisition and assembly of information in
computer-accessible form is greatly advanced for the
mouse. The recent large growth in information on the
genetics and biology of the laboratory mouse has
fortunately coincided with the similarly exponential
development of the computer and its programming
applications. This happy coincidence has made it
possible for the information to be made immediately
available to researchers in laboratories worldwide.
Curators systematically read the scientific literature
and enter selected data into databases that can be
searched through the internet. Major databases, all
accessible online, now include genomic sequence,
gene descriptions, genetic and strain nomenclature,
experimental mapping data, linkage maps,
cytogenetic maps, physical maps, gene homologies
among mammals, phenotypes, allelic variants, strain
data, and committee reports. There is also an index of
various types of gene expression during mouse
development, which will be increasingly important as
biology proceeds beyond genomic sequencing.


The Early Years of Genetic Analysis


Although its significance was not immediately recog- 
nized, the first demonstration of linkage in the mouse 
was published in 1915 by the great twentieth-century 
geneticist J.B.S. Haldane. What Haldane found was 
evidence for coupling between mutations at the albino 
(c) and pink-eyed dilution (p) loci, which we now 
know to lie 15 cM apart on chromosome 7. Since that 
time, the linkage map of the mouse has expanded 
steadily at a near-exponential pace. During the first 
65 years of work on the mouse map, this expansion 
took place one locus at a time. First, each new muta- 
tion had to be bred into a strain with other phenotypic 
markers. Then further breeding was pursued to deter- 
mine whether the new mutation showed linkage to 
any of these other markers. This process had to be 
repeated with different groups of phenotypic markers 
until linkage to one other previously mapped marker 
was established. At this point, further breeding studies 
could be conducted with additional phenotypic mark- 
ers from the same linkage group to establish a more 
refined map position. 
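In a two-point backcross of the kind used in those early studies, the recombination fraction r is simply the proportion of offspring carrying a recombinant combination of the two markers, and a map function converts r into centimorgans. The sketch below uses Haldane's map function, d = -(1/2) ln(1 - 2r) Morgans, which assumes no crossover interference; it is a standard formula chosen for illustration, not a description of how the maps cited in the text were built. The counts in the usage example are hypothetical.

```python
import math


def recombination_fraction(recombinants: int, total: int) -> tuple[float, float]:
    """Estimate r and its binomial standard error from backcross counts."""
    if total <= 0 or not 0 <= recombinants <= total:
        raise ValueError("need 0 <= recombinants <= total and total > 0")
    r = recombinants / total
    return r, math.sqrt(r * (1 - r) / total)


def haldane_cm(r: float) -> float:
    """Haldane map distance in cM: d = -50 * ln(1 - 2r); valid for 0 <= r < 0.5."""
    if not 0 <= r < 0.5:
        raise ValueError("r must lie in [0, 0.5)")
    return -50.0 * math.log(1 - 2 * r)


if __name__ == "__main__":
    # Hypothetical: 15 recombinant offspring among 100 backcross mice.
    r, se = recombination_fraction(15, 100)
    print(f"r = {r:.3f} +/- {se:.3f}; Haldane distance = {haldane_cm(r):.1f} cM")
```

For small r the map distance is close to 100r cM; the correction grows with distance because undetected double crossovers become more frequent.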

In the first compendium of mouse genetic data 
published in The Biology of the Laboratory Mouse in 
1941, a total of 24 independent loci were listed,
of which 15 could be placed into seven linkage 
groups containing either two or three loci each; 
the remaining nine loci were found not to be 
linked to each other or to any of the seven confirmed 
linkage groups. By the time the second edition of 
The Biology of the Laboratory Mouse was published 
in 1966, the number of mapped loci had grown to 
250, and the number of linkage groups had climbed 
to 19, although in four cases, these included only two 
or three loci. 

With the 1989 publication of the second edition of 
Genetic Variants and Strains of the Laboratory Mouse, 
965 loci had been mapped on all 20 recombining 
chromosomes. However, even at the time that this 
map was actually prepared for publication (circa late 
1987), it was still the case that the vast majority of 
mapped loci were defined by mutations that had been 
painstakingly incorporated into the whole genome 
map through extensive breeding studies. 


The Middle Ages: Recombinant Inbred 
Strains 


The first important conceptual breakthrough aimed 
at reducing the time, effort, and animals required to 
map single loci came with the conceptualization and 
establishment of recombinant inbred (RI) strains by 
Donald Bailey and Benjamin Taylor at the Jackson 
Laboratory. A set of RI strains provides a collection 
of samples in which recombination events between 
homologs from two different inbred strains are pre- 
served within the context of new inbred strains. The 
power of the RI approach is that loci can be mapped 
relative to each other within the same ‘cross’ even 
though the analyses themselves may be performed 
many years apart. Since the RI strains are essentially 
preformed and immortal, typing a newly defined locus 
requires only as much time as the typing assay itself. 
Although the RI mapping approach was extremely 
powerful in theory, during the first two decades after 
its appearance, its use was rather limited because of 
two major problems. First, analysis was only possible 
with loci present as alternative alleles in the two inbred 
parental strains used to form each RI set. This ruled 
out nearly all of the many loci that were defined by 
gross phenotypic effects. Only a handful of such loci — 
primarily those that affect coat color — were poly- 
morphic among different inbred strains. In fact, in the 
prerecombinant DNA era, the only other loci that were 
amenable to RI analysis were those that encoded: 
(1) polymorphic enzymes (called allozymes or iso- 
zymes) that were observed as differentially migrating 
bands on starch gels processed for the specific 
enzyme activity under analysis; (2) immunological 
polymorphisms detected at minor histocompatibility
loci; and (3) other polymorphic cell surface antigens 
(called alloantigens or isoantigens) that could be dis- 
tinguished with specially developed allo-antisera. 
In retrospect, it is now clear that RI strains were 
developed ahead of their time; their power and utility 
in mouse genetics only began to be fully unleashed in 
the 1990s. 
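Mapping with RI strains reduces to comparing strain distribution patterns, the string of parental alleles each strain carries at a locus. The sketch below is illustrative only: it counts how often two loci are discordant across a set of strains and converts that fraction R into a meiotic recombination fraction using the Haldane-Waddington expectation for autosomal RI strains made by brother-sister mating, R = 4r/(1 + 6r), i.e. r = R/(4 - 6R). That relation is a standard result brought in here, not stated in the original; the allele codes ("A", "B", with "-" for untyped) and the example patterns are hypothetical.

```python
def discordance(sdp_1: str, sdp_2: str, missing: str = "-") -> float:
    """Fraction of RI strains whose alleles differ between two loci.

    Each strain distribution pattern (SDP) lists one parental allele code per
    strain, in the same strain order; strains untyped at either locus are skipped.
    """
    if len(sdp_1) != len(sdp_2):
        raise ValueError("patterns must cover the same strains")
    typed = [(a, b) for a, b in zip(sdp_1, sdp_2) if missing not in (a, b)]
    if not typed:
        raise ValueError("no strain is typed at both loci")
    return sum(a != b for a, b in typed) / len(typed)


def ri_sib_recombination_fraction(big_r: float) -> float:
    """Invert R = 4r / (1 + 6r) for autosomal RI strains made by sib mating.

    Observed discordance of 0.5 or more is treated as no detectable linkage
    (r = 0.5); sampling noise can push R above 0.5 for unlinked loci.
    """
    if big_r < 0:
        raise ValueError("discordance cannot be negative")
    if big_r >= 0.5:
        return 0.5
    return big_r / (4 - 6 * big_r)


if __name__ == "__main__":
    locus_x = "AABBABABBA"   # hypothetical SDPs for ten RI strains
    locus_y = "AABBABBBBA"
    big_r = discordance(locus_x, locus_y)
    print(f"R = {big_r:.2f}, estimated r = {ri_sib_recombination_fraction(big_r):.3f}")
```

Because RI strains accumulate crossovers over several generations of inbreeding before becoming fixed, R is roughly four times r for tightly linked loci. Once a new locus is typed, its pattern can be compared against every previously typed locus in the same set, which is the "typing only" advantage the text describes; with a small set of strains, however, the estimate of r is imprecise.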


DNA Markers and the Mapping Panel 
Era 


Two events that occurred during the 1980s allowed the 
initial development of a whole genome mouse map 
that was entirely based on DNA marker loci. The 
first event was the globalization of the technology 
for obtaining DNA clones from the mouse genome 
and all other organisms. Although the techniques of 
DNA cloning had been developed during the 1970s, 
stringent regulations in the USA and other countries 
had prevented their widespread application to mam- 
malian species like the mouse. These regulations were 
greatly reduced in scope during the early years of the 
1980s so that investigators at typical biological 
research facilities could begin to clone and character- 
ize genes from mice. The globalization of the cloning 
technology was greatly hastened in 1982 by the
publication of the first highly detailed cloning manual,
by Maniatis and colleagues, from Cold Spring Harbor
Laboratory. Officially entitled Molecular Cloning: A
Laboratory Manual, it is known unofficially as
“The Bible.” Since the original publication of the
Maniatis manual, a second
edition has appeared, other competing manuals have 
been published, and most suppliers of molecular 
biology reagents now also provide detailed accounts 
of molecular techniques. 

Although DNA clones were being recovered at a 
rapid rate during the 1980s, from loci across the mouse 
genome, their general utilization in linkage mapping 
was not straightforward. The only feasible technique 
available at the time for mapping cloned loci was the 
typing of restriction fragment length polymorphisms 
(RFLPs). Unfortunately, the common ancestry of 
the traditional inbred strains made it difficult, if not 
impossible, to identify RFLPs between them at most 
cloned loci. 

The logjam in mapping was broken not through the 
development of a new molecular technique, but rather, 
through the development of a new genetic approach. 
This was the second significant event in terms of 
mouse mapping during the 1980s — the introduction 
of the interspecific backcross. François Bonhomme
and his French colleagues had discovered that two
very distinct mouse species, Mus musculus and
M. spretus, could be bred together in the laboratory
to form fertile F1 female hybrids. Because these two
Mus species diverged about 3 million years ago,
base-pair substitutions have accumulated to the point
where RFLPs can be rapidly identified for nearly every
DNA probe that is tested. Thus, by backcrossing an
interspecific superheterozygous F1 female to one of its
parental strains, it becomes possible to follow the
segregation of the great majority of loci that are
identified by DNA clones through the use of RFLP
analysis.

Although the ‘spretus backcross’ could not be 
immortalized in the same manner as a set of RI strains, 
each of the backcross offspring could be converted 
into a quantity of DNA that was sufficient for RFLP 
analyses with hundreds of DNA probes. In essence, 
it became possible to move from a classical three- 
locus backcross to a several-hundred-locus backcross. 
Furthermore, the number of loci could continue to 
grow as new DNA probes were used to screen the 
members of the established ‘mapping panel’ (until 
DNA samples were used up). The spretus backcross 
revolutionized the study of mouse genetics because it 
provided the first complete linkage map of the mouse 
genome based on DNA markers and because it pro- 
vided mapping panels that could be used to rapidly 
map essentially any new locus that was defined at the 
DNA level. 
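With hundreds of loci typed in the same backcross mice, map distances come from counting, for each adjacent pair of loci, how many offspring switch from one parental type to the other. The sketch below assumes the loci are already in map order and that each backcross mouse is coded per locus as "H" (carrying the M. spretus allele, i.e. heterozygous) or "M" (homozygous for the allele of the backcross parent); these codes and the data are hypothetical. It returns raw interval recombination fractions, without a map function or any check for implausible double crossovers.

```python
def interval_recombination(haplotypes: list[str]) -> list[float]:
    """Recombination fraction for each adjacent interval of ordered loci.

    Each string is one backcross mouse, one genotype code per locus in map
    order; a change of code between neighbouring loci marks a crossover.
    """
    if not haplotypes:
        raise ValueError("no animals typed")
    n_loci = len(haplotypes[0])
    if any(len(h) != n_loci for h in haplotypes):
        raise ValueError("every animal must be typed at every locus")
    crossovers = [0] * (n_loci - 1)
    for h in haplotypes:
        for i in range(n_loci - 1):
            if h[i] != h[i + 1]:
                crossovers[i] += 1
    return [count / len(haplotypes) for count in crossovers]


if __name__ == "__main__":
    panel = ["HHHH", "MMMM", "HHMM", "MMMH", "HHHH", "MMHH"]  # hypothetical panel
    print(interval_recombination(panel))
```

In this toy panel no mouse recombines between the first two loci, while the middle interval shows the most crossovers. One simple way to place a newly typed probe is to try each position in the order and keep the one that implies the fewest crossovers across the panel.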


The Era of Microsatellites 


The most recent major advance in genetic analysis has 
come not from the development of new types of 
crosses but from the discovery and utilization of 
PCR-based DNA markers that are extremely poly- 
morphic and can be rapidly typed in large numbers of 
animals with minimal amounts of sample material. 
These powerful new markers, especially
microsatellites, have greatly diminished the need for
the spretus backcross, and they have breathed new life
into the usefulness of the venerable RI strains. Most
importantly, it is now possible for individual investi- 
gators with limited resources to carry out independ- 
ent, sophisticated mapping analyses of mutant genes 
or complex disease traits. As Philip Avner of the Insti- 
tut Pasteur in Paris stated: 


If the 1980s were the decade of Mus spretus — whose use in 
conjunction with restriction fragment length polymorph- 
isms revolutionized mouse linkage analysis, and made the 
mouse a formidably efficient system for genome mapping — 
the early 1990s look set to be the years of the microsatellite. 


In the new millennium, the new genetic tool of DNA 
arrays is sure to replace microsatellites as the method 
of choice for mapping analysis. 


Further Reading 
Maniatis (1982) Molecular Cloning: A Laboratory Manual. Plain- 
view, NY: Cold Spring Harbor Laboratory Press. 


See also: DNA Cloning; Linkage Map; 
Microsatellite; Mus spretus; Recombinant Inbred 
Strains 


Mouse leukemia viruses (MLV) are members of the
large family of retroviruses, enveloped RNA viruses 
with many shared properties including a signature 
replicative strategy that involves reverse transcription 
of viral RNA into double-stranded DNA and integra- 
tion of that DNA into the cellular genome, the process 
of proviral insertion. MLV genomes — expressed or 
silent, complete or partially defective — comprise as 
much as a few percent of the mouse genome and are 
inherited as well as transmitted horizontally. Genetic
exchanges between related MLV, or between MLV and
cellular genes, are possible, generating altered viruses
responsible for a variety of neoplastic and non-neoplastic
diseases. In addition, genomic proviral insertions of 
MLV cause mutations or alter expression of cellular 
genes. Studies of MLV particles and genomes, their 
mode of replication, and host responses to infection 
provide deep insights into the molecular basis of can- 
cer and lay the foundations for two major directions of 
current retrovirology: the control and prevention of
AIDS and the use of retroviruses for gene delivery.


Basic Virology 


The genome of a prototypic infectious MLV is simple,
consisting of protein-coding sequences for gag (MA,
matrix; p12; CA, capsid; NC, nucleocapsid), pol (PR,
protease; RT, reverse transcriptase; IN, integrase),
and env (SU, surface; TM, transmembrane), flanked
by long terminal repeats (LTRs) (Figure 1A). The
LTRs contain elements that regulate transcription
through the binding of multiple transcription factors.
Full-length and spliced env transcripts are the sources
of the gag/pol and env polyprotein precursors,
respectively; full-length transcripts are also
incorporated into budding virions as the viral genome.
Free virions bind to specific receptors, fuse with the
cell membrane, and deliver a complex of the viral RNA
genome, RT, and IN to the cytoplasm, where reverse
transcription takes place. The resulting preintegration
complex then enters the nucleus, where the
double-stranded viral DNA is incorporated into the
cellular genome.

Infectious MLV are classified based on their ability 
to infect cells of different species. This property is 
determined by env-encoded sequences that mediate 
interactions between the virus and its cellular receptor. 
The receptors differ for each host range class and 
include an amino acid transporter and a phosphate 
symporter, indicating that MLV co-opt normal cellu- 
lar proteins to mediate cell entry. Ecotropic MLV, 
isolated from both inbred and wild mice, infect only 
murine cells. Xenotropic MLV, isolated only from 
inbred mice, infect cells from many species other 
than mice. Amphotropic MLV, recovered only from 
wild mice, have the combined host range of ecotropic 
and xenotropic MLV. Another class of MLV with 
equally broad host range has been termed mink cell 
focus-inducing (MCF) or polytropic MLV. These 
MLV arise via recombination between replication- 
competent ecotropic MLV and other endogenous pro- 
viruses (Figure 1B). Altered host range results from 
acquisition of SU sequences from defective polytropic 
viruses, while changes in expression derive from 
acquisition of xenotropic sequences within the U3 
region of the LTR. 

Sequences for infectious ecotropic and xenotropic 
MLV are found in the genomes of some inbred strains 
and wild mouse populations, while amphotropic MLV 
are only transmitted horizontally. Infectious endogen- 
ous polytropic viruses have not been observed. MLV 
sequences entered the mouse genome by unknown 
means about 1.5 million years ago, with nonecotropic 
viruses probably being the first because they are more 
widely distributed in mouse evolution than ecotropic 
MLV. At the time nonecotropic viruses entered the 
genome, they may have been able to replicate in mouse 
cells. Retroviruses must have entered the genome on 
multiple occasions, because the complement of en- 
dogenous MLV sequences is distinct for mice of 
different geographic and taxonomic origins. These 
introductions were sometimes mutagenic, creating
phenotypic changes such as altered coat color or hair
distribution. Proviruses can also be deleted, albeit
rarely, by homologous recombination between the
two LTRs.


Host-Virus Interactions 


MLV nucleic acids and proteins interact with a large 
variety of cell components during the processes of 
Figure 1 (A) Structure of a normal infectious MLV genome as it is inserted into host DNA is shown on top, with
the two major translational products (gag/pro/pol and env) shown beneath. The expanded structure of the LTR
details the positions of the U3, R, and U5 regions along with the positions of positive (+) and negative (−) regulatory
sequences. The positions of representative transcription-factor-binding sequences in the direct repeats of the 
enhancer are also shown. (B) Comparisons of the genomic structure of a nonpathogenic Akv ecotropic virus and a 
pathogenic MCF virus. Sequences in the MCF acquired by recombination with endogenous xenotropic or polytropic 
viruses are shown. (C) Mechanisms of oncogene activation by MLV. The structure of a hypothetical c-onc gene is 
shown. The acutely transforming MLV containing a v-onc gene has incorporated most of the cDNA version of the 
c-onc gene while replacing the normal regulatory 5’ sequences with regulatory elements in the LTR. Activation by 
proviral insertional mutagenesis is illustrated for promoter insertion and enhancer insertion. The transcription 
orientation of the MLV is indicated by straight arrows and the effect of the enhancer element in the 5’ LTR on the 
c-onc promoter by a looped arrow. 


infection and replication. The long period of evolu- 
tionary coexistence between MLV and their hosts 
has provided numerous opportunities for the host to 
develop mechanisms for resistance to infection. Two
of the best-understood cell resistance genes, Fv4 and
Rmcf, inhibit infection by blocking receptors with
expressed envelope proteins of endogenous ecotropic
and polytropic viruses, respectively. A third resistance
gene, Fv1, with a gag-like sequence possibly derived
from a non-MLV endogenous defective virus, blocks a
step in the virus life cycle subsequent to reverse
transcription but prior to integration. Integration requires
that the preintegration complex pass the nuclear
membrane, a process that is poorly understood. Finally,
transcription of integrated MLV requires interactions
of the LTRs with transcription factors that may be
expressed only in specific cell types or at particular
stages of differentiation within a cell lineage. These
features may contribute to the mechanisms that limit
infections by most MLV to neonatal mice.

The ease with which neonates can be infected as
compared to adults also reflects cell-extrinsic
mechanisms of resistance, including maturation of the
immune system. In adults, expression of immunogenic 
epitopes of env sequences elicits specific humoral 
antibody responses capable of inhibiting cell-to-cell 
spread by virus neutralization and elimination. 
Cellular immune responses with helper CD4+ T 
cells facilitating the response of CD8+ cytotoxic T 
cells (CTL) to both gag and env determinants result 
in the killing of cells that express those determinants. 
Effective cellular responses to MLV can be generated 
by neonatal mice, but only if the exposure is to 
very low doses of virus, which permits induction of 
CD4+ T cell helper responses featuring high-level 
expression of interferon-γ and little or no IL-4.
Otherwise, early infection results in recognition of MLV
determinants as ‘self’ and the absence of either
humoral or cellular antiviral immunity. Resistance to 
a variety of MLV infections maps to genes within the 
major histocompatibility complex (MHC). Both class 
I loci encoding proteins that present antigens to CTL 
and loci encoding class II molecules that present anti- 
gens to helper T cells are implicated in resistance to 
leukemias induced by Friend, Rauscher, and Gross 
MLV as well as neurotropic and immunodeficiency- 
inducing MLV. Genes mapping outside the MHC have 
also been shown to affect the magnitude of antiviral 
antibody responses. 

One mechanism by which adult infection can be 
achieved is induction of immunodeficiency. Profound 
impairment of humoral and cellular immune func- 
tions and progressive lymphoproliferation involving 
T and B cells are central features of a murine retro- 
virus-induced immunodeficiency syndrome termed 
MAIDS. Disease induction requires expression of a 
replication-defective MLV that encodes an altered gag 
gene product with changes in the carboxy terminus of
MA and in p12 that resists normal proteolytic process- 
ing. B cells are the primary targets of infection, but 
disease develops only in the presence of normal T cell- 
B cell interactions. T and B cells from infected mice 
are almost totally nonresponsive to activation by any 
means, a state of anergy with no parallel in other 
infections or immune responses. During the later 
stages of this infection, all mice develop clonally 
expanded populations of B and/or T cells that can be 
transplanted to immunocompromised hosts. Although 
there are many gaps in our understanding of the 
mechanisms contributing to immunodeficiency and 
lymphoma, this condition differs from the immuno- 
deficiencies that characterize AIDS and its feline and 
simian equivalents. 
Another disease induced by MLV but not by other
retroviruses is spongiform encephalopathy of the
spinal cord and brain stem manifested by limb
paralysis. Mice infected neonatally with any of several
ecotropic MLV develop a vacuolar degeneration of 
microglia and neurons and perivascular astrocytosis 
in the absence of an inflammatory response. A distinct 
syndrome of hyperexcitability and ataxia characterized
by astrocytosis and astrocyte degeneration is
induced by an MCF virus. The brain cell types infected
with the different viruses vary somewhat, but for all
viruses, the env gene harbors the major determinants
of virulence. The effects of the env proteins seem to
be indirect, because the neurons exhibiting cytopathology
appear not to be infected. Resistance of mice older
than 10 days to infection with one of the ecotropic
viruses can be overcome by injecting virus-infected
microglia intracerebrally, implicating maturation of
the blood-brain barrier as the mechanism for
developmental resistance. Of interest, mice with MAIDS
also develop an encephalopathy characterized by 
spatial learning and memory defects, inflammatory 
changes in the periventricular spaces, and neuronal 
damage in the striatum. These abnormalities are 
worsened by tumor necrosis factor and reduced by
interferon-γ.


MLV and Neoplasia 


Infections with MLV are associated with a wide var- 
iety of malignancies. Studies of these disorders have 
generated a wealth of information about how MLV 
cause specific tumors and the nature of similar neo- 
plasms in which MLV are not involved. MLV asso- 
ciated with transformation are of two types: acutely 
transforming and slowly transforming. Both are asso- 
ciated with the activation of genes, termed oncogenes, 
that transform cells so that they grow with character- 
istics of tumor cells. Activation results from the genes 
coming under the control of viral rather than cellular 
regulatory sequences (Figure 1C). The acutely transforming
viruses are those that acquired a viral oncogene
(a v-onc gene) as a result of recombination
between a non- or slowly transforming replication- 
competent virus and a cellular proto-oncogene (c-onc 
gene). The genes captured by these MLV are universally
important for controlling cell growth, intracellular 
signaling, differentiation, or programmed cell death. 
Acutely transforming viruses usually cause sarcomas 
or hematopoietic tumors and most often transform 
cells in culture. Because acquisition of v-oncs leads 
to loss of some virus genes, the viruses are defective 
and require the presence of competent helper MLV to 
provide the full complement of viral products needed 
for replication and packaging of the transforming 
genome into virions. Slowly transforming MLV do
not carry v-onc genes, and tumors appear only after 
an extended latent period and are associated with 
integration of the virus near a c-onc gene, effect- 
ing its activation. This process is termed proviral 
insertional mutagenesis. The two most common 
mechanisms of activation are promoter insertion, 
with transcription of the c-onc initiated from the 3’ 
LTR, and activation by enhancer sequences within the 
LTR (Figure 1C). Viruses can activate oncogenes via
viral enhancers by inserting upstream or downstream 
of the oncogene and in either orientation. The slowly 
transforming MLV do not transform cells in tissue 
culture. 

The spontaneous thymic T-cell lymphomas that 
characterize AKR mice develop between 6 months 
and 1 year of age and are due to the activities of slowly 
transforming MLV. Endogenous ecotropic MLV 
(Akv) activated stochastically before and after birth 
give rise to a systemic infection with a virus that does 
not induce lymphoma on transfer to low-leukemia 
strains of mice; however, the immediate preleukemic 
period is characterized by the appearance of recombin- 
ant MCF viruses that rapidly induce disease on trans- 
fer to young AKR mice. The altered host range leads 
to more efficient infection of thymocytes and inser- 
tional mutagenesis of Myc or other c-onc genes in T 
cells. Similarly active MCF viruses are formed after 
infection with ‘laboratory strains’ of ecotropic MLV 
such as Moloney virus. MCF viruses are also gener- 
ated during the development of B cell lineage lymph- 
omas in mice expressing ecotropic virus at high levels, 
although ecotropic rather than MCF MLV usually 
mediate insertional mutagenesis of c-onc genes in 
these lymphomas. 

C-onc genes modified by insertional mutagenesis 
have traditionally been identified by molecularly 
cloning the viruses and cellular flanking sequences 
and using probes derived from the cellular sequences 
to determine whether the site is structurally altered in 
other tumors. Recent studies have demonstrated the 
value of using polymerase chain reaction (PCR) 
amplification and sequencing of virus-cell junction 
fragments. In this system, appearance of the same 
cellular sequence flanking MLV in different tumors 
identifies a common integration site. This technology 
has identified more than 100 candidate disease genes in 
mouse myeloid leukemia and lymphomas. 
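The common-integration-site logic described above can be illustrated with a short Python sketch. It is a minimal illustration under stated assumptions, not the published method: the input format (a hypothetical mapping from tumor IDs to chromosome and position pairs), the 10 kb grouping window, and the two-tumor threshold are all illustrative choices. Published analyses typically also test whether a cluster exceeds what random integration would produce; that statistical step is omitted here.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical representation: tumor ID -> list of (chromosome, position) insertion
# sites, as might be recovered by sequencing virus-cell junction fragments.
Site = Tuple[str, int]
CIS = Tuple[str, int, int, List[str]]  # (chromosome, start, end, tumor IDs)


def _report(chrom: str, cluster: List[Tuple[int, str]], min_tumors: int,
            results: List[CIS]) -> None:
    """Keep a cluster only if its insertions come from enough distinct tumors."""
    tumor_ids = sorted({tumor for _, tumor in cluster})
    if len(tumor_ids) >= min_tumors:
        results.append((chrom, cluster[0][0], cluster[-1][0], tumor_ids))


def common_integration_sites(tumors: Dict[str, List[Site]], window: int = 10_000,
                             min_tumors: int = 2) -> List[CIS]:
    """Single-linkage grouping: consecutive insertions on the same chromosome that
    lie within `window` bp of each other are placed in the same cluster."""
    by_chrom: Dict[str, List[Tuple[int, str]]] = defaultdict(list)
    for tumor, sites in tumors.items():
        for chrom, pos in sites:
            by_chrom[chrom].append((pos, tumor))

    results: List[CIS] = []
    for chrom in sorted(by_chrom):
        hits = sorted(by_chrom[chrom])
        cluster = [hits[0]]
        for pos, tumor in hits[1:]:
            if pos - cluster[-1][0] <= window:
                cluster.append((pos, tumor))
            else:
                _report(chrom, cluster, min_tumors, results)
                cluster = [(pos, tumor)]
        _report(chrom, cluster, min_tumors, results)
    return results


# Invented example data (coordinates are arbitrary, not real loci).
tumors = {
    "T1": [("chr15", 1_000_000), ("chr2", 5_000_000)],
    "T2": [("chr15", 1_005_500)],
    "T3": [("chr7", 2_000_000)],
}
print(common_integration_sites(tumors))
# [('chr15', 1000000, 1005500, ['T1', 'T2'])]
```

Grouping by distance rather than exact position reflects the fact that independent insertions near the same gene rarely land at the identical base pair; the hypothetical window size trades sensitivity against spurious clustering.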


MLV as Genetic Vectors 


The study of acutely transforming viruses demon- 
strated that MLV genomes with large substitutions 
can be propagated by providing in trans MLV 
sequences that permit the defective genome to be
packaged into virions. The presence of replication-competent
viruses is often unacceptable, however,
particularly in the setting of human gene therapy. 
This drawback has been overcome by developing 
packaging cell lines in which the gag-pol and env 
coding regions are introduced on separate plasmids, 
greatly reducing the chance for recombination events 
leading to generation of productive virus. Infection by 
packaged viruses is thereby limited to a single cycle. 
The replication-defective virus is engineered to 
replace most retroviral sequences between the LTRs 
with sequences of interest. The use of multiple
promoters and internal ribosome entry sites allows the
expression of more than one sequence from a single
virus. The use of various env genes and LTRs allows 
transduction of a wide variety of cell types with stable 
integration of vector sequences and generally high 
levels of expression. 


Vaccination against MLV 


Extensive understanding of the immune system of the 
mouse has made mice a productive system for evalu- 
ating vaccines against retroviruses. Passive administra- 
tion of antibody to MLV can block development of 
disease if given to newborn mice. Infection with attenu- 
ated viruses has also proven effective, eliciting both 
strong humoral and cellular responses. In some 
instances, these approaches have not inhibited virus 
infection but have effectively blocked targeting of 
crucial cells, thus immunizing against disease rather 
than infection. Attempts to immunize with SU pro- 
tein, defective viruses expressing SU, or recombinant 
adenoviruses expressing env have generally been less 
effective than exposure to attenuated viruses. 


Future Prospects 


Much research related to MLV focuses on the patho- 
genesis of diseases they cause, including neurodegen- 
eration, immunodeficiency, and lymphomas. Disease 
mechanisms of non-neoplastic disorders are not well 
understood and deserve further study. Identifying 
new common sites of integration in neoplasia should 
provide insights into an expanded array of genes that 
can cooperate to induce transformation. MLV vectors 
will continue to be of great use as tools in gene therapy 
and probes to understand the function of genes in 
health and disease. 


Further Reading 
Coffin JM, Hughes SH and Varmus HE (eds) (1997) Retroviruses. 
Plainview, NY: Cold Spring Harbor Laboratory Press. 


See also: Leukemia; Retroviruses 
Sxr (sex-reversed, formally designated Tp(Y)1Ct) was
discovered in 1971 by Cattanach and was initially
thought to be a dominant autosomal mutation.
Subsequently, Sxr was identified as a Y chromosome
rearrangement in which most of the minute short arm (Yp)
was duplicated and transposed to the telomeric end of
the pseudoautosomal region (PAR) on the long arm
(Yq) (Figure 1). The duplicated region is designated
Sxr^a and includes the Sry gene (sex-determining
region Y, also known as Tdy, testis determining Y)
and a number of male-specific H-Y tissue transplantation
antigen genes. The Sxr^a region is transferred to
half of the X chromosomes during male meiosis
because obligate crossing-over occurs between the
PARs of the X and Y chromosomes. This process
generates XX^Sxr sex-reversed mice that are male
because they carry Sry, and sterile because they
possess two X chromosomes and lack a complete Y
chromosome. However, X^Sxr O males, which possess
only a single X chromosome, undergo all stages of
spermatogenesis and even produce a few malformed
sperm. These data show that Sxr^a contains all the genes
needed to produce immature spermatozoa. X^Sxr O
males, however, cannot complete spermiogenesis
because they lack one or more Y chromosome
spermiogenesis factors located outside Sxr^a.

Fertile females carrying X^Sxr can be produced by
mating XY^Sxr males to females heterozygous for the
T(X;16)16H translocation (T16H). In T16H/X^Sxr
mice, the X^Sxr chromosome is preferentially
X-inactivated, and thus some of these mice develop as
fertile females, presumably because X-inactivation
spreads into and inactivates the Sry gene on the X^Sxr
chromosome. However, these females remain H-Y
antigen positive, probably because X-inactivation
does not reach this locus (or loci). X^Sxr Y^Sxr males
can be produced by mating T16H/X^Sxr females to
XY^Sxr males. These males are both viable and fertile,
and produce only sons.


Deletion Mapping 


Conventional meiotic mapping cannot be used to 
order Y chromosome genes outside of the PAR 
because this region has no meiotic pairing partner 
and therefore is recombinationally inert. Sxr has 
been invaluable for ordering genes resident on Yp
using deletion, meiotic, and physical analyses. A 
number of spontaneous Sxr deletion variants have 
been identified and are designated as Sxr?, Sxr’, Sxr4, 
and Sxr°. The best studied of these, Sxr^b, was identified
by McLaren in a T16H/X^Sxr female: all of the Sxr-carrying
progeny from this female retained Tdy but
lacked the Hya antigen. Sxr^b has an interstitial deletion
between the Zfy1 (zinc finger protein 1, Y-linked)
and Zfy2 (zinc finger protein 2, Y-linked) genes; the
deletion likely arose from an unequal crossover between
these two genes and created a Zfy1/2 fusion gene. The Sxr^b
deletion interval has been termed ΔSxr^b (Figure 1).
XO males carrying Sxr^b have severely anomalous
spermatogenesis resulting from an early postnatal failure
of differentiating type A spermatogonia to proliferate after
exit from mitotic arrest. This finding indicates that at
least one spermatogenesis gene resides in ΔSxr^b, and
this locus has been named Spy (spermatogenesis Y).

Figure 1 A diagrammatic representation of the
mouse Y chromosome and Sxr region (not to scale).
Y% is shown within the box. The Sxr^a region probably
encompasses most of Yp (short arm). The ΔSxr^b interval
is stippled, and the genes identified within it are given to
the right in genetic order. The position of Spy, within
ΔSxr^b, is indicated. Genes listed to the left are within
Sxr^a but outside of ΔSxr^b. The position of genes listed in
parentheses is likely, but not proven. The centromere is
represented by the black oval. PAR, pseudoautosomal
region. Tel, telomere.

Capel and colleagues and Laval and colleagues 
identified a number of spontaneous Y chromosome 
deletion variants as XY fertile females whose fathers 
were X^Sxr Y males. These deleted Y chromosomes
(designated Del(Y)1H, Del(Y)2H, etc.) resulted from
asymmetric, or illegitimate, crossing-over between
Sxr^a on the X chromosome and Yp. Molecular analysis
showed that each variant Y chromosome was deleted
for Sx1 (DYBis4) and Rbm (RNA binding motif pro- 
tein) sequences which lie between the centromere and 
Sry. Because the Sry locus is intact, the most plausible 
explanation for the XY sex reversal is that Sry is sub- 
ject to a position effect induced by increased prox- 
imity to centromeric heterochromatin. 

King and colleagues took a directed approach to
deletion mapping of Sxr^a. Specifically, they irradiated an
XO cell line and immunoselected for loss of H-Y
expression within Sxr^a using H-Y-specific cytotoxic T
lymphocytes. This approach defined up to 16 ordered
deletion intervals, gave a detailed map of the Sxr^a
region using then-current molecular markers, and
suggested that H-Y is encoded by at least five distinct loci
(Hyab, Hydb, Hykk1, Hydk, and Hykk2).


Meiotic Mapping 


Laval and colleagues used meiotic mapping to order
loci in the Sxr^a interval by mating X^Sxr Y males
to normal females. They identified progeny that
had inherited a crossover between Sxr^a and Yp using
restriction enzyme polymorphisms that exist between
the Mus musculus-derived Sxr^a and Mus domesticus-derived
Yp. This approach provided direct proof of
meiotic exchange between Sxr and Yp, oriented
the genes on Yp with respect to the centromere, and
placed the Rbm genes between the centromere and Sry.


Physical Mapping 


Recent molecular and physical mapping approaches
have provided a dramatic increase in the number
of genes localized to Sxr^a, and specifically to ΔSxr^b.
The genes localized to Sxr^a and ΔSxr^b are presented in
genetic order in Figure 1. As mentioned above, Zfy1
and Zfy2 delimit ΔSxr^b and have provided anchors for
positional cloning approaches. It is interesting to note
that all of the functional genes in this region have X 
chromosome homologs. Two H-Y antigens have 
been molecularly identified: Smcy (selected mouse 
cDNA on the Y) is Hykk1 and Uty (ubiquitously 
transcribed tetratricopeptide repeat gene, Y chromo- 
some) is Hydb. Smcy has homology to the human 
retinoblastoma binding protein-2 and is a putative 
transcription factor, and Uty has been proposed to 
play a role in regulating cell division or transcription 
based on sequence homology. However, the exact
function of neither gene is known. A number of
genes within ΔSxr^b have X chromosome homologs
with ‘housekeeping’ functions, such as initiation of
protein translation (eukaryotic translation initiation
factor 2, γ subunit, Y chromosome, Eif2γy; and dead
box gene, Y chromosome, Dby) and ubiquitin metabolism
(ubiquitin activating enzyme E1, chromosome
Y, Ube1y; and ubiquitin specific protease 9, Y chromosome,
Usp9y, also known as Dffry). Three genes have
no X chromosome homologs and are expressed 
pseudogenes (Ras homolog gene family, member A, 
Y chromosome 1 and 2: RhoAy1, and RhoAy2), or are 
nonfunctional (testis specific protein, Y chromosome: 
Tspy). Three loci have been localized to Sxr^a outside of
and proximal to ΔSxr^b: Sry, the Rbm gene cluster, and
a third RhoAy gene (RhoAy3). The functional genes
localized to ΔSxr^b can be categorized as follows. Zfy1,
Zfy2, Ube1y, and Usp9y are expressed exclusively in
the testis, and their X chromosome homologs are 
X-inactivated in females. Smcy, Uty, and Eif2yy are 
expressed ubiquitously, and their X chromosome 
homologs are not X-inactivated. Dby is ubiquitously 
expressed, but the X-inactivation status of its X 
chromosome homolog is unknown. It is unclear if 
any of the genes identified in ΔSxr^b is Spy, and it is
possible that the Spy phenotype results from the loss 
of multiple genes. 


See also: X-Chromosome Inactivation 


mRNA 


See: Messenger RNA (mRNA) 


mtDNA 
mtDNA is the abbreviation for mitochondrial DNA.


See also: Mitochondria; Mitochondrial DNA 
(mtDNA) 


Muenke Syndrome 


See: Craniosynostosis, Genetics of 


Mule 


A C Chandley 
Mule is the F1 hybrid formed by crossing a female
horse with a male donkey. The less common hybrid
of the reciprocal cross is known as the hinny. Both
hybrids are very much a product of artificial selection
and dependent on man for their creation. There appear
to be no records of naturally occurring hybrids: horses
and donkeys roaming together in the wild always
mate preferentially with their own kind.

Mules are bred principally for three tasks: pack 
work, draft work, and riding. As working animals,
they have enormous economic potential: their life span
is almost twice that of a horse, and they can carry
more in proportion to their weight. They
are better able to resist changes in climate, and can 
withstand hunger and thirst better than the horse. 
They can be worked in large teams without difficulty. 
In countries like China, they are still much valued as 
working animals and are created there by artificial 
insemination. In Britain, their use has been chiefly 
for service in the army in India and elsewhere abroad, 
their surefootedness making them invaluable over 
mountainous terrain. 

The chromosomal complements of the horse, don- 
key, mule, and hinny have been studied in somatic 
metaphases from peripheral blood lymphocytes. The 
diploid numbers are horse 2n = 64; donkey 2n = 62;
mule and hinny 2n = 63. In addition to the numerical
difference between the horse and donkey, there are
structural differences, the horse having 26 metacentric
chromosomes and the donkey 38.
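The hybrid diploid number follows from simple gamete arithmetic: a mule or hinny receives one haploid set from each parent, so its chromosome count is the sum of the horse and donkey haploid numbers.

```latex
n_{\text{horse}} = \tfrac{64}{2} = 32, \qquad
n_{\text{donkey}} = \tfrac{62}{2} = 31, \qquad
2n_{\text{mule}} = 2n_{\text{hinny}} = n_{\text{horse}} + n_{\text{donkey}} = 32 + 31 = 63
```

Because the two haploid sets differ in number and in chromosome structure, some hybrid chromosomes lack a fully homologous partner, which is consistent with the pairing difficulties seen at meiosis in these hybrids.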

Mules are famous for their sterility, and when testis 
biopsies have been studied, they show a histological 
picture of severe spermatogenic depletion, many 
testicular tubules lacking germ cells altogether. Break- 
down in the development of the germ cells appears 
to start early and is virtually complete by the pachy- 
tene stage of meiosis. Pairing difficulties are seen 
and ultrastructural studies on spermatocytes reveal 
absence of the synaptonemal complex normally asso- 
ciated with paired homologs. Nevertheless, a few 
spermatozoa have sometimes been recovered from 
the epididymis or ejaculate of mules and hinnies 
although these appear to be smaller than those of 
normal horse or donkey and may not be capable of
normal fertilization. Whether they are motile has not
been established. No pregnancies have been reported
following the mating of a male hybrid to a female
horse or donkey.

By contrast, in spite of earlier skepticism about
anecdotal reports of fertility in female mules and
hinnies, a small number of fertile females have now been
found, and these observations have been supported by sound
cytogenetic investigation. One outstanding example of fertility in
a female mule came from Brazil, where three pregnan- 
cies were achieved over a short period of time in the 
late 1980s. Both horse and donkey sires were used 
successfully. Other good examples come from China, 
where a fertile mule and a fertile hinny each gave birth, 
by donkey sires, to filly foals, and the USA, where a 
mule gave birth to a colt foal, again sired by a donkey. 

Studies into gametogenesis in the ovaries of mules 
and hinnies show that, as in male hybrids, germ cell 
numbers are severely depleted, and even at the time of 
birth, numbers of oocytes are greatly reduced. A few 
oocytes do, however, survive to ovulation, and have 
been recovered from the Fallopian tubes. 

The mule and hinny have helped to shed light on 
the mechanism of action of the sex chromosomes. The 
X chromosomes of the horse and donkey are morpho- 
logically distinct from each other and carry a species- 
specific glucose-6-phosphate dehydrogenase (G6PD) 
locus. It is possible, therefore, to have both a morpho- 
logical and biochemical marker of the paternal and 
maternal X chromosomes inherited by the female 
hybrid. In contrast to random inactivation (Lyoniza- 
tion), it has been shown that preferential inactivation 
of the donkey X chromosome occurs in both the mule 
and hinny, with further selection in culture favoring 
cells carrying a horse X chromosome. 


See also: X-Chromosome Inactivation 


Muller, Hermann J 


J F Crow 
Hermann Joseph Muller (1890-1967), known to his
friends as Joe, was the second geneticist to win the 
Nobel Prize. He was best known for his discovery 
that ionizing radiation produces mutations, but his 
ideas permeated the whole field of genetics during its 
first half-century, 1900-50. 

Muller was born and grew up in New York City. 
When he was nine years old, his father died, leaving
the household quite poor. Yet, by working long hours,
Muller was able to attend Columbia University, and 
he graduated with honors. As a student he was at- 
tracted to the Drosophila group of T. H. Morgan and 
became one of the brilliant group that Morgan was 
able to attract. After obtaining his doctorate, Muller 
taught at Rice University, then at the University of 
Texas where he did his widely acclaimed work on 
mutation. He later tried living in Germany, arriving just
in time to encounter the Nazi regime. He then moved
to Russia, unfortunately at the time when Lysenko 
was rising to power. He left by way of Spain and 
found a temporary position in Edinburgh in 1937. 
His hard luck continued as he was able to obtain 
only a temporary job at Amherst College where he 
spent the remainder of the war years. Finally, in 1945, 
at the age of 55, he was offered a faculty position at 
Indiana University. At last, he had a permanent job, 
with laboratory facilities and graduate students. He 
spent the remainder of his life there. The Nobel Prize 
came the following year. 

From his graduate student days, Muller had an 
interest in mutation. He developed one technique 
after another to measure mutation rates in Drosophila, 
first showing their temperature dependence and 
finally demonstrating that X-rays enormously en- 
hanced the rate. The major technical achievement 
was developing the ClB chromosome for objectively
and quantitatively measuring the mutation rate. This 
system and its successors set the standard for Dro- 
sophila mutation work. Mutation became a subject 
that could be studied experimentally. 

Muller was the chief ideas-man of the first half- 
century of genetics. He was responsible for a large 
share of genetic thinking in the early period. He 
realized as early as 1922 that bacteriophage might 
provide the way to attack the gene. He formulated
the properties that the gene must have: (1) ability
to store and carry information, (2) ability to copy 
itself, (3) ability to copy mistakes (mutation), and (4) 
ability to control development and function. He 
showed how to use deletions to overcome the limita- 
tion that mutation could only substitute, not add or 
subtract information. While in Russia, Muller made 
use of the recently discovered salivary gland chromo- 
somes in Drosophila to pinpoint the location of genes 
and provide the first estimate of the gene’s size. He 
also showed the importance of heterochromatin in 
development, as brought out by the phenomenon of 
position effect. 

Throughout his life, Muller was interested in evo- 
lution. He pointed out an evolutionary advantage of 
sexual reproduction. He explained that polyploidy is 
rare in animals because of incompatibility with the 
sex-determining mechanism. He emphasized the
importance of gene duplications in facilitating evolution
by making possible the acquisition of a new function
while retaining the old. He contrived a way of
getting advanced generations from sterile hybrids 
between two Drosophila species. He pointed out 
ways in which geographical isolation could lead to 
separate species. He used the phenomenon of dosage 
compensation to point out the precision of genetic 
adaptation. 

Muller also had an interest in human genetics. He 
was the first to study identical twins who had been 
reared apart. He estimated the human mutation rate 
from the study of children of consanguineous mar- 
riages. And, in an influential paper entitled “Our 
load of mutations” (Muller, 1950), he introduced a 
quantitative way to assess the impact of mutation 
on the population. Starting with his first paper on 
radiation-induced mutation, he was involved in a cru- 
sade to limit all unnecessary use of radiation. 

Muller’s interest in human genetics did not stop 
with research. He was an enthusiastic advocate of 
eugenics. While rejecting the crudities of the earlier 
eugenics movement and insisting that all such actions 
be voluntary, he advocated artificial insemination as a 
means of genetic improvement. He argued that sperm 
not be used until the donor had died, so that diseases 
of old age would be discovered and lifetime worth 
could be assessed. He thought that such a technique 
might eventually be widely used, and he rejected “the 
stultifying assumption that people would have to be 
coerced rather than inspired” to participate in such a 
program. 

Muller’s two great crusades had different out- 
comes. His advocacy of limitation of radiation expos- 
ure was a great success, and is reflected in the strict 
radiation protection standards now in force in many 
countries. In contrast, his program of positive 
eugenics has been a failure. Although artificial insem- 
ination is regularly employed in medical practice, it 
has not had any appreciable acceptance and use as a 
eugenic measure. 


Multicopy Plasmid


Multicopy plasmids are those present in bacteria with
a copy number greater than one per chromosome. 


See also: Plasmids 


Multifactorial Inheritance 
When the expression of a trait is determined by alleles
of at least two separate genes or by one or more genes 
and environmental factors, it is said to be a multi- 
factorial trait. 


See also: Complex Traits; QTL (Quantitative Trait 
Locus); Quantitative Inheritance 


Multiple Endocrine Neoplasia


The multiple endocrine neoplasia (MEN) syndromes
are familial forms of cancer that affect several endo- 
crine tissues or cell types. MEN includes two unre- 
lated syndromes, MEN type 1 and MEN type 2, 
which are associated with distinct and characteristic 
combinations of tumors affecting specific endocrine 
tissues. These tumors frequently secrete very high 
levels of protein products that would normally be 
expressed by the cell type from which they arise 
(e.g., insulin, gastrin, epinephrine). As a result, the 
affected individual often suffers from serious complications associated with overexpression of these normal molecules, in addition to any effects of the cancer
itself. Both MEN 1 and MEN 2 are inherited as auto- 
somal dominant diseases but they arise by very differ- 
ent genetic mechanisms. 


Multiple Endocrine Neoplasia Type 1
(MEN 1)


MEN 1 is characterized by tumors of the endocrine 
cells of the parathyroid and pituitary glands, and of 
the islet cells of the pancreas, although other, less
common, tumors also occur. The disease affects about 
2 to 20 individuals per 100 000 in the population. The 
penetrance of MEN 1, or the probability of expressing 
symptoms if one has inherited a disease mutation, is 
high. More than 90% of individuals with MEN 1 
develop tumors by their fifth decade and approxi- 
mately 60% develop two or more tumor types. 
Tumors generally appear in the second and third dec- 
ade of life. MEN 1 is caused by mutations of the tumor 
suppressor gene MEN1 which lies on chromosome 
11q13 and encodes the menin protein. Individuals 
with MEN 1 have an inherited mutation of the 
MEN1 gene. A variety of mutations including dele- 
tions, amino acid substitutions, and premature stops 
have been identified. The majority of these result in 
truncation of the menin protein. In tumors from 
MEN 1 patients, there is frequent mutation or loss 
of the remaining (normal) copy of the gene or loss of 
large regions of chromosome 11 including the gene, 
resulting in absence of menin protein in the tumor 
cells. As yet, the functions of menin are not fully 
understood. It is broadly expressed in both adult and 
developing tissues, not just in the cell types affected 
by MEN 1. The protein is chiefly localized in the 
nucleus where it has been shown to bind to the tran- 
scription factor JunD and repress its ability to stimu- 
late expression of its target genes. As yet, it is not 
known whether the menin-JunD interaction contrib- 
utes to the MEN 1 disease phenotype or whether 
menin has other important interactions that have not 
yet been recognized. 


Multiple Endocrine Neoplasia Type 2 
(MEN 2) 


The MEN 2 cancer syndrome is associated primarily 
with medullary thyroid carcinoma (MTC), a tumor of 
the endocrine C cells of the thyroid. This syndrome 
affects approximately 1 in 25 000 individuals and may 
be divided into three subtypes based on the tumors 
that are present. In addition to MTC, MEN 2A is 
characterized by tumors of the adrenal gland, called 
pheochromocytoma, and of the parathyroid gland. In 
MEN 2B we see the same thyroid and adrenal tumors 
but, instead of the parathyroid tumors, there are other physical features, including an elongated body shape
(Marfanoid phenotype) and small bumps caused by 
clumps of nerve cells (neuromas) on the mouth and 
lips. In the third subtype, familial MTC (FMTC), the
only disease feature is the thyroid tumors. All three 
forms of MEN 2 are caused by mutations of the RET 
proto-oncogene which lies at chromosome 10q11.2. 
The RET protein is a receptor molecule found on the 
cell surface of some endocrine, nerve, and kidney cell 
types. Normally, it is stimulated by binding a circulating ligand molecule and a cell surface co-receptor. The
RET mutations in MEN 2 are almost exclusively sin- 
gle amino acid substitutions that result in continuous, 
unregulated activation of the RET receptor. Unlike 
those found in other cancer syndromes, MEN 2 muta- 
tions do not inactivate RET but either render it 
independent of the ligand molecules that normally 
control its activity or cause RET to recognize in- 
appropriate targets that trigger a cascade of inter- 
actions leading to cell proliferation. More than 95% 
of MEN 2 patients inherit a mutation that changes one 
of only 10 amino acids in the protein, making muta- 
tion testing in MEN 2 quite simple to perform. Once 
a RET mutation is identified, the patient generally 
undergoes prophylactic surgery to remove the thyroid 
before tumors can arise, effectively removing the 
major tumor risk. Thus, MEN 2 represents an instance 
where identification of the disease-causing mutation
has greatly improved our ability to both diagnose and 
manage the disease. 


See also: RET Proto-Oncogene 


Multiplicity of Infection 
Multiplicity of infection (MOI) is the ratio between
the number of viruses in an infection and the number 
of host cells. This ratio can be set approximately by adjusting the relative concentration of virus
and host. It cannot be determined exactly for each 
individual host cell, but the average MOI can be cal- 
culated. The ability to adjust this ratio is important. In 
some experiments it is important that there is only one 
virus infecting each host cell. In others, it is most 
important to ensure that virtually all host cells have 
been infected, possibly with each of two different phage mutants if genetic experiments are being conducted; in these cases a high MOI is used.

Both the phage and bacteria diffuse randomly and 
collide and bounce off each other until the phage 
interacts with an appropriate receptor. At any given 
average MOI, there will be a substantial range in the 
number of phage that infect each bacterium; a mathematical function called the “Poisson distribution” can
be used to show the distribution in number of phage per 
cell at any given MOI. Some bacteria will not be infect- 
ed at all, even at high MOI. For very virulent phage 
like T4, where a single phage particle is sufficient to initiate an infection, the zero-order term of the
Poisson distribution can be used to calculate the actual 
MOI of infecting phage particles, given the number of 
surviving (i.e., uninfected) bacteria. This calculation 
makes the assumption that all parts of the culture were 
equally accessible to the bacteria and phage, i.e., that 
the two were thoroughly mixed: 


$$\frac{\text{No. of surviving bacteria}}{\text{original bacterial titer}} = e^{-\mathrm{MOI}}$$

Thus, $\mathrm{MOI} = -\ln(\text{fraction of bacteria that survive the infection})$.
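The zero-order calculation above can be sketched in a few lines of Python. This is a minimal illustration that assumes an ideal, thoroughly mixed culture in which the number of phage adsorbed per cell follows a Poisson distribution; the function names and the example titers are hypothetical.

```python
import math


def poisson_pmf(k: int, moi: float) -> float:
    """Probability that a cell adsorbs exactly k phage at average multiplicity moi."""
    return math.exp(-moi) * moi**k / math.factorial(k)


def moi_from_survivors(surviving: float, original_titer: float) -> float:
    """Actual MOI inferred from the zero-order Poisson term.

    The fraction of uninfected (surviving) bacteria equals exp(-MOI),
    so MOI = -ln(fraction surviving).
    """
    fraction = surviving / original_titer
    if not 0 < fraction <= 1:
        raise ValueError("surviving fraction must lie in (0, 1]")
    return -math.log(fraction)


# Hypothetical numbers: 1e8 bacteria/ml before infection, 5e6 survivors afterwards.
moi = moi_from_survivors(5e6, 1e8)
print(f"inferred MOI = {moi:.2f}")  # 5% survival corresponds to an MOI of about 3.00
for k in range(5):
    # Expected fraction of cells that adsorbed exactly k phage.
    print(k, round(poisson_pmf(k, moi), 3))
```

The same Poisson table also shows why some cells escape infection even at high MOI: the k = 0 term never reaches zero.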

When carrying out genetic crosses, the relative 
numbers of each of the two different phages involved 
can be calculated by using the appropriate term of the Poisson distribution. Such techniques are discussed in
Karam (1994). 
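For a cross, one relevant term is the fraction of cells that receive at least one particle of each parent. A minimal sketch, assuming (as an idealization) that the two phages adsorb independently and each follows its own Poisson distribution; the function names and MOIs are hypothetical.

```python
import math


def fraction_with_at_least_one(moi: float) -> float:
    """Poisson probability that a cell receives one or more particles of a given phage."""
    return 1.0 - math.exp(-moi)


def fraction_mixedly_infected(moi_a: float, moi_b: float) -> float:
    """Fraction of cells infected by at least one particle of each of two phages.

    Assumes independent adsorption, each phage following its own Poisson distribution.
    """
    return fraction_with_at_least_one(moi_a) * fraction_with_at_least_one(moi_b)


# Hypothetical cross: mutant A and mutant B each added at an MOI of 5.
print(f"{fraction_mixedly_infected(5, 5):.3f}")
```

Raising either MOI pushes this fraction toward 1, which is why crosses are usually done at high multiplicity.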


Multisite Mutation

A multisite mutation is a permanent, heritable change in DNA that results from the loss of more than two adjacent nucleotide pairs from the genome. Every multisite mutation is therefore also a deletion,
although not every deletion is a multisite mutation, 
because deletions of 1 bp can occur. The term ‘multi- 
site’ was coined prior to the advent of molecular 
techniques for DNA analysis, when pairwise crossing 
between mutant organisms was a prominent method 
for analyzing genome structure. Operationally speak- 
ing, multisite mutations were distinguished from 
‘point’ mutations by virtue of their inability to yield 
wild-type recombinants in pairwise crosses with more 
than one different point mutant. Mutants were classi- 
fied as ‘point’ if they could be shown to undergo true 
reversion; multisite mutations fail to revert (although 
sometimes they may be phenotypically reversed by 
suppressor mutations). There is no upper limit to the 
number of nucleotide pairs whose deletion can give 


rise to a multisite mutation. Multisite mutations can 
be entirely internal to a gene or can encompass mul- 
tiple genes. 
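The operational criteria above can be written as a small decision procedure. This is a hypothetical sketch: the function names, the map positions, and the "unresolved" category (for a non-reverting mutant that fails to recombine with at most one point mutant) are illustrative assumptions. The extent helper reflects the classical use of such deletions in fine-structure mapping and is included only for illustration.

```python
from typing import Dict, Iterable, Optional, Tuple


def classify_mutant(reverts: bool, no_recombinants_with: Iterable[str]) -> str:
    """Operational (pre-molecular) classification of a mutant.

    reverts: True if true revertants to wild type were recovered.
    no_recombinants_with: names of distinct point mutants that gave no
        wild-type recombinants when crossed with this mutant.
    """
    partners = set(no_recombinants_with)
    if reverts:
        return "point"
    if len(partners) > 1:
        return "multisite"
    return "unresolved"


def deletion_extent(map_positions: Dict[str, float],
                    no_recombinants_with: Iterable[str]) -> Optional[Tuple[float, float]]:
    """Leftmost and rightmost map positions of the point mutants a deletion 'covers'."""
    covered = [map_positions[n] for n in no_recombinants_with if n in map_positions]
    if not covered:
        return None
    return (min(covered), max(covered))


# Hypothetical map of four point mutants (arbitrary units).
positions = {"p1": 1.0, "p2": 2.5, "p3": 4.0, "p4": 7.0}
print(classify_mutant(False, ["p2", "p3"]))       # multisite
print(deletion_extent(positions, ["p2", "p3"]))   # (2.5, 4.0)
```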


See also: Mutation; Mutation Rate 


Mus musculus 
Mus musculus (L.) is the polytypic species which
encompasses all the subspecies and geographical or 
chromosomal races of the house mouse. It is also 
the species that has contributed most to
modern science, since the historical laboratory strains 
(the ‘old inbreds’) stem from this species through 
the old tradition of ‘fancy’ mice which were bred 
in Europe and Asia for their coat color. For this 
reason, there is a great deal of literature on many 
aspects of its biology (see, for example, Berry et al., 
1990; Boursot et al., 1993). We illustrate here what 
geneticists have deduced about its history of differ- 
entiation, which occurred long before its commensal- 
ity with humans. 


Origin and Differentiation 


Because of its vast home range, the variety of eco- 
logical conditions under which it occurs, and its com- 
plex evolutionary history, the house mouse shows 
extensive variation in coat color and other morpho- 
logical characters. Consequently, its systematics has 
attracted much debate for a long time. 

Insights into the origin of the present day diversity 
come from the genetic study of mice from the central 
part of Eurasia, i.e., from the Middle East to the north- 
ern Indian subcontinent. Phylogeographic recon- 
struction of nuclear or mitochondrial gene variation 
and molecular divergences indicate that the radiation 
of M. musculus has occurred from this region of the 
world, most probably within the last few hundred 
thousand years. From there, as illustrated in Figure 1,
the species has radiated outwards in several directions, 
conquering distinct geographical regions separated 
by deserts or mountain ranges, which are abundant 
in this part of the world. This has triggered the rapid 
geographic differentiation of several genetic isolates. 
Such isolates or quasi-isolates are still to be found in 
every distinct montane basin so far examined, such as 
western and northern Iran, Afghanistan or, as shown 
recently (Prager et al., 1998), the eastern part of the 
Arabian peninsula. Naming all these local forms, for which about 150 Latin names have been proposed since Linnaeus (Marshall, 1998), is a nomenclatural nightmare that is perhaps best left unresolved, pending more information on their relationships and interaction in the wild.


Evolution in Association with Humans: 
Secondary Expansion 


After this initial radiation, the evolutionary history of Mus musculus was shaped by a remarkable feature: its association with humans. Probably because of its
steppic origin, it showed an excellent preadaptation 
to profit from grain storage of early Neolithic man- 
kind. Then, when the agricultural revolution started, 
it embarked on a new range expansion through com- 
mensalism, that led the already well differentiated 
local forms at the periphery of its initial range to 
establish themselves over the entire planet within the 
last 12 000 years (Auffray et al., 1990). This occurred 
at least three times independently, giving rise to the 
now well recognized peripheral subspecies: M. m. 
domesticus spread westward from the Near East
(Fertile Crescent) toward Europe and the Medi- 
terranean; M. m. musculus colonized almost all of 
the Palearctic from eastern Europe to China, starting 
at the northern slopes of the Himalayas; M. m. casta- 
neus went eastward from India through southeast 
Asia. 

More recently, an increase in human traffic across 
the oceans has led to the colonization of the rest of the 
world by house mice, most prominently the European 
M. m. domesticus in the Americas, but traces of long- 
distance transportation have also been reported for the 
Asian M. m. castaneus around the Pacific and Indian 
oceans. 


Secondary Admixture, Hybridization, 
and Reproductive Isolation 


The recent expansion of the species range in several 
directions has resulted in secondary contacts between 
peripheral subspecies, which are still able to exchange 
genes wherever they come into contact. In Europe, 
M. m. domesticus and M. m. musculus show limited 
genetic exchange across a narrow hybrid zone (30- 
40 km wide) where they form natural hybrid popu- 
lations. However, patterns of genetic introgression 
across the zone vary depending on the regions of 
the genome considered, and the introgression of sex 
chromosomes appears to be particularly impeded, 
preventing the two subspecies from rehomogenizing. 
Figure 1 Geographical distribution of the three peripheral subspecies of the complex species Mus musculus. Arrows represent the plausible migration routes out of the cradle of the species during its radiation. The inset represents the phylogeographic tree based on genetic distances at 30 protein coding loci. (Modified from Boursot et al., 1996 and Din et al., 1996.)

In contrast, the secondary contact between M. m. musculus and M. m. castaneus in central China
appears to have resulted in a more thorough admixture 
of the two subspecies. Both of these last two subspe- 
cies have contributed to the colonization of the Japan- 
ese archipelago, and this hybrid Japanese population 
is often referred to as M. m. molossinus (Yonekawa 
et al., 1988). M. m. castaneus and M. m. domesticus are 
also known to have formed hybrid populations in the 
Hawaiian islands and in California. These incidents of 
gene exchange, enhanced by human activities, justify 
Mus musculus being considered as a single polytypic 
species, despite partial reproductive isolation of some 
of its components. 

Genetic divergence is also happening between 
closely related populations. In several parts of its 
range, M. m. domesticus populations are fixed for 
major chromosomal mutations that reduce their 
number of chromosomes as a result of chromosomal 
fusions (from one to nine Robertsonian centric 
fusions). These chromosomal races are geographically 
limited, and rarely hybridize with neighbouring 
populations that carry either the ancestral 2n = 40 
karyotype, or karyotypes with different chromosomal 
mutations. This is an emblematic example of rapid 
genetic differentiation that has occurred in a very 
short time, perhaps as little as a few hundred to a thousand years.
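The reduction in chromosome number follows simple arithmetic: each Robertsonian fusion joins two acrocentric chromosomes into one metacentric, so a race homozygous for k fusions has 2n = 40 - 2k. A minimal sketch, assuming homozygous fusions that each involve two different acrocentrics; capping k at nine simply mirrors the range quoted in the text.

```python
ANCESTRAL_2N = 40   # ancestral house mouse karyotype, all acrocentric
MAX_FUSIONS = 9     # range quoted in the text: one to nine centric fusions


def diploid_number(homozygous_fusions: int) -> int:
    """Diploid chromosome number of a race homozygous for k centric fusions.

    Each fusion removes one chromosome per haploid set, i.e. two per diploid set.
    """
    if not 0 <= homozygous_fusions <= MAX_FUSIONS:
        raise ValueError("expected between 0 and 9 fusions")
    return ANCESTRAL_2N - 2 * homozygous_fusions


for k in (1, 5, 9):
    print(k, diploid_number(k))  # prints 1 38, then 5 30, then 9 22
```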


Laboratory Strains and the Wild 
Reservoir of Genetic Variability 


The ‘old inbred’ laboratory mouse strains that are 
used in biomedical research were established at the beginning of the twentieth century from a few founders coming
from the tradition of fancy mice. It has been shown 
that they all share a single mitochondrial DNA 
haplotype which is relatively frequent in wild M. m. 
domesticus populations. However, they also carry a 
single Y chromosome type, which is relatively rare in 
nature and found only in mice from Japan and on the 
Chinese mainland facing Japan. In contrast to their 
fixed mitochondrial and Y chromosomes, these inbred 
strains harbor extensive variation at many nuclear 
genes. Though the nuclear genome appears essentially 
of domesticus origin, such extensive variation is best 
accounted for by contributions from two other sub- 
species (musculus and castaneus). These strains thus 
appear to be complex hybrids, an important fact to 
bear in mind when using them to study genome 
expression. 

The genetic variation present in wild populations represents a huge pool of diversity, only part of which is captured by the ‘old inbreds’. This pool was exploited in the 1970s and
1980s through the production of a new generation 
of wild-derived laboratory strains of known origin, which may give a more faithful image of the diversity
available in Mus musculus, and provide useful variants 
for genetic analyses (Avner et al., 1988). For example, 
the centric fusions from domesticus chromosomal 
races have been convenient centromeric markers in 
genetic mapping experiments, and other contributions 
of these new wild-derived sources of variability are 
now commonplace in mammalian genetics. 


Mus musculus castaneus


Mus musculus castaneus is a subspecies within the
M. musculus group of house mice with a natural range 
across parts of China and Southeast Asia. M. m. cas- 
taneus animals breed readily with traditional inbred 
strains of mice (M. m. domesticus), and the divergence
between the two strains can be used efficiently for 
linkage analysis and mapping studies. 


See also: Hybrid Zone, Mouse; Mus musculus 


Mus spretus


Mus spretus Lataste, known as the western Mediterranean short-tailed mouse (or Aboriginal or grassland
or Algerian mouse), belongs to the youngest clade of 
the subgenus Mus, which also encompasses the house 
mouse (M. musculus) and its subspecies, and the other 
two Palearctic short-tailed mice M. spicilegus (Petenyi) 
and M. macedonicus (Ruzov). 

The range of M. spretus is limited to the Mediter- 
ranean climatic zone of France, Spain, and North 
Africa, as well as the Atlantic coast of Morocco and 
Portugal (Figure 1). It was not until the advent of protein electrophoresis that its status as a species distinct from M. musculus was firmly established (Britton
et al., 1976). M. spretus seems to be more drought- 
tolerant than M. musculus domesticus, the subspecies 
with which it is sympatric, and it is an ecological 
competitor of feral populations of this subspecies in 
those places where there is enough water to support 
both. On the other hand, it seldom enters houses, so it 
is common to find situations where one species occu- 
pies the cellar while the other lives in the garden. 

Figure 1 Distribution of Mus spretus around the Mediterranean (hatched). (After Gray and Hurst, 1997.)

The behavioral and physiological bases of this ecological differentiation have been the subject of a number of comparative studies (Gray and Hurst, 1997
and older references therein). Despite its rather nar- 
row geographic range, populations of M. spretus 
are genetically differentiated. Those of Europe show 
reduced genetic polymorphism, with one major mito- 
chondrial clade, whereas North African mice show 
much more polymorphism and a marked differ- 
entiation between eastern and western populations 
(Boursot et al., 1985). This suggests a possible recent 
postglacial colonization or recolonization of the 
Iberian peninsula. 

Mus spretus is perhaps better known through its contribution to modern mammalian genetics (more than
130 publications within the last 5 years) than through 
its peculiarities as a wild species. Despite a separation from M. musculus that various molecular techniques estimate at between 1.5 and 3 million years (see, for instance, Lundrigan and Tucker, 1994), it has proven possible to obtain F1 hybrids with laboratory strains. Although the
males are sterile, the females are not, and it is thus 
possible to get backcross offspring. Many gene pro- 
ducts have diverged between M. spretus and M. m. 
domesticus but chromosomal organization has re- 
mained virtually identical. This provided a very 
powerful means of analyzing gene segregation, 
at a time when genetic variants were rare. In this 
manner, two protein loci were mapped as early as
1979 (Bonhomme et al., 1979). Subsequently, spretus × domesticus crosses became the basis for the establishment of the first comprehensive mouse genetic map
(see Avner et al., 1988, for a review). However, the existence of hypervariable DNA loci that are polymorphic between laboratory strains has now rendered the use of interspecific crosses almost obsolete. Nevertheless, Mus spretus is still one of the best candidates as a
comparative model, together with the two sibling spe- 
cies M. macedonicus and M. spicilegus which are also 
intercrossable with M. musculus. Various inbred and 
non-inbred laboratory strains exist, as do congenic lines containing a spretus
chromosome fragment embedded in a musculus back- 
ground, and the species offers a unique opportunity to 
understand the evolution of gene interactions. 
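In an interspecific backcross, each offspring can be scored at two loci as carrying either the spretus or the domesticus allele transmitted by the fertile F1 female; the proportion of recombinant offspring estimates the recombination fraction. The sketch below assumes Haldane's mapping function (no crossover interference) to convert that fraction into map distance; this choice, the function names, and the counts are illustrative, not taken from the studies cited.

```python
import math


def recombination_fraction(recombinants: int, total: int) -> float:
    """Proportion of backcross progeny with a recombinant combination of
    spretus- and domesticus-derived alleles at two scored loci."""
    if total <= 0 or not 0 <= recombinants <= total:
        raise ValueError("need 0 <= recombinants <= total and total > 0")
    return recombinants / total


def haldane_distance_cm(r: float) -> float:
    """Map distance in centimorgans from Haldane's mapping function."""
    if not 0 <= r < 0.5:
        raise ValueError("recombination fraction must be in [0, 0.5)")
    return -50.0 * math.log(1.0 - 2.0 * r)


# Hypothetical data: 12 recombinants among 100 backcross offspring.
r = recombination_fraction(12, 100)
print(f"r = {r:.2f}, distance = {haldane_distance_cm(r):.1f} cM")
```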


Muscular Dystrophies


The muscular dystrophies are a group of inherited
disorders of muscle characterized by progressive 
muscle wasting and weakness. A unifying feature is 
the muscle histology which typically includes vari- 
ation in fiber size, muscle fiber necrosis, and eventually 
replacement by fat and connective tissue. On the basis 
of the distribution of predominant muscle weakness, several different types can be distinguished (Figure 1).


Duchenne and Becker Muscular 
Dystrophy (DMD and BMD) 


Duchenne muscular dystrophy (DMD) has been 
described elsewhere (see Duchenne Muscular Dystrophy (or Meryon’s Disease)). Becker muscular dystrophy (BMD) is a clinically similar X-linked recessive condition but is milder, with affected individuals often
surviving into middle age. Both DMD and BMD are 
due to mutations in the dystrophin gene at Xp21, 
which result in a deficiency of dystrophin in DMD,
and a partial deficiency in BMD. 


Emery-Dreifuss Muscular Dystrophy 
(EDMD) 


This form of dystrophy is characterized by proximal 
(scapulohumeral) weakness in the upper limbs and 
distal (peroneal) weakness in the lower limbs, early 
contractures of the postcervical muscles, elbows, and 
tendo Achilles, and cardiac conduction defects, the 
latter often requiring the life-saving insertion of a pacemaker. EDMD may be inherited either as an X-linked recessive or, rarely, as an autosomal dominant trait. The former is due to mutations at the Xq28 locus resulting in an absence of emerin, a ubiquitously expressed nuclear membrane protein. The diagnosis can be established by immunohistochemical staining for emerin in peripheral blood leukocytes or a buccal smear. The rarer autosomal form is due to mutations in the nuclear lamin A/C gene at 1q11-23.

Figure 1 Distribution of predominant muscle weakness in different types of dystrophy: (A) Duchenne and Becker; (B) Emery-Dreifuss; (C) limb girdle; (D) facioscapulohumeral; (E) distal; (F) oculopharyngeal.


Limb Girdle Muscular Dystrophies 
(LGMD) 


These are a very heterogeneous group of disorders 
characterized by predominantly limb girdle weakness. 
So far, six milder autosomal dominant types and nine more severe recessive types have been recognized. Four of
the latter are due to specific deficiencies of various 
sarcoglycans of the muscle cytoskeleton and one is 
due to a muscle-specific protease (calpain 3) deficiency. 
Diagnosis depends on demonstrating deficiency of a specific protein by muscle immunohistochemistry, and on mutation analysis of peripheral blood leukocytes or, for prenatal diagnosis, of amniotic fluid cells or chorionic villus material.


Facioscapulohumeral Muscular 
Dystrophy (FSHMD) 


The essential features are weakness of the facial, 
scapulohumeral, anterior tibial and later pelvic girdle 
muscles. EcoRI restriction fragments associated with 
the gene (at 4q35) are greater than 35 kb in normal 
individuals, but less than this in FSHMD. In this way 
suspected and asymptomatic cases can be diagnosed. 
The shortest fragments are associated with more 
severe disease. This information can be used in coun- 
seling and prenatal diagnosis. 


Distal Muscular Dystrophy 


These rare types are associated with mainly distal 
weakness. Autosomal dominant (largely Scandi- 
navian) and recessive types have been recognized. The 
gene and its protein product have so far been identified 
only in one rare (Miyoshi) type (dysferlin, chromo- 
some 2p) which is allelic with a type (2B) of LGMD. 


Oculopharyngeal Muscular Dystrophy 
(OPMD) 


This autosomal dominant type occurs largely, but not 
exclusively, in French Canadians. It is characterized 
by onset in late adulthood of progressive ptosis and 
dysphagia followed by involvement of other cranial 
and limb muscles. The gene (at 14q11-13) encodes poly(A) binding protein 2; the disease is caused by expansion of a (GCG) triplet repeat in this gene beyond its normal length.
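Because OPMD is caused by a (GCG) triplet expansion, a first step in examining a sequenced allele is simply to count uninterrupted repeat units. A minimal sketch; the sequence is hypothetical, and the code encodes no normal or disease repeat-length thresholds, which this entry does not give.

```python
def longest_triplet_run(seq: str, triplet: str = "GCG") -> int:
    """Length, in repeat units, of the longest uninterrupted tandem run of `triplet`.

    Every start position is tried, so runs in any reading frame are found.
    """
    seq, triplet = seq.upper(), triplet.upper()
    best = 0
    for start in range(len(seq)):
        count = 0
        # Extend the run while the next three bases match the triplet.
        while seq.startswith(triplet, start + 3 * count):
            count += 1
        best = max(best, count)
    return best


# Hypothetical fragment: ATG, then six GCG codons, then a GCA codon.
print(longest_triplet_run("ATG" + "GCG" * 6 + "GCA"))  # 6
```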


Congenital Muscular Dystrophy (CMD) 


This autosomal recessive type presents at birth or 
early infancy with hypotonia and generalized weakness. Around 50% of cases are due to a deficiency of the extracellular muscle protein laminin α2 chain, or merosin (chromosome 6q2). Some cases are due to a deficiency of the merosin receptor (integrin α7).
Prenatal diagnosis is possible from either direct im- 
munohistochemical staining of chorionic villi with 
labeled antibodies to merosin (or integrin) or molecu- 
lar genetic studies. A common recessive form of CMD 
in Japan (Fukuyama CMD), with severe CNS abnor- 
malities, has been mapped to chromosome 9q31-33 
and the gene product named fukutin. 
Results in phage, bacteria, yeast, rodent cells, and
human cells have shown that mutation is highly spe- 
cific both in type and in location. The patterns of 
mutation produced in a gene by mutagenic agents or 
processes have been termed mutational spectra (see 
Figure 1 for an example). The mutational spectrum
induced by an agent is determined by the chemical and 
enzymatic specificity involved in each step of its 
mutagenic pathway. These steps include: (1) initial 
DNA adduction or damage, (2) replication infidelity 
at DNA lesions, and (3) DNA repair, both pre- and 
postreplication of the site of DNA damage. Due to 
this specificity, each mutagenic agent exhibits a characteristic mutational spectrum in a given species or cell type.
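At its simplest, a mutational spectrum is a tally of which changes occur and where. The sketch below assumes each mutant has already been sequenced and reduced to a (position, reference base, mutant base) record on one strand. It covers only single-base substitutions; frameshifts, deletions, and insertions would need their own categories. The data are invented for illustration.

```python
from collections import Counter
from typing import Iterable, Tuple

PURINES = frozenset("AG")
BASES = frozenset("ACGT")


def substitution_class(ref: str, alt: str) -> str:
    """'transition' (purine<->purine or pyrimidine<->pyrimidine) or 'transversion'."""
    ref, alt = ref.upper(), alt.upper()
    if ref not in BASES or alt not in BASES or ref == alt:
        raise ValueError(f"not a base substitution: {ref}->{alt}")
    return "transition" if (ref in PURINES) == (alt in PURINES) else "transversion"


def tabulate_spectrum(mutations: Iterable[Tuple[int, str, str]]):
    """Summarize (position, ref, alt) substitutions by class and by site."""
    mutations = list(mutations)
    by_class = Counter(substitution_class(ref, alt) for _, ref, alt in mutations)
    by_site = Counter(pos for pos, _, _ in mutations)
    return by_class, by_site


# Hypothetical mutants recovered from one gene after treatment with some agent.
observed = [(112, "A", "G"), (112, "A", "G"), (240, "T", "C"), (305, "G", "T")]
classes, sites = tabulate_spectrum(observed)
print(classes)                # counts of transitions vs transversions
print(sites.most_common(1))   # most frequently hit site, a candidate hotspot
```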


Chemical Structure of Mutagens 


Different mutagenic agents possess different chemical 
structures which, when allowed to react with DNA, 
present different modified DNA structures to the 
cell’s repair and replication machinery. Covalent 
modifications to DNA can be simple, such as methyl 
groups deposited on DNA by methylating agents, or 
they can be very large and complicated, such as those 
produced by the electrophilic metabolites of multi- 
ring compounds like aflatoxins. Given the diversity 
in chemical structures, it is not too surprising that 
different mutagens can produce different types of 
mutations. 

The nature of the DNA modification produced by 
a mutagen defines its possible genetic consequences. 
Conceptually, these modifications fall into several 
general classes as follows. 


Base Modifications that Alter Base Pairing 
Preference 

Certain synthetic base analogs such as 5-bromodeoxy- 
uridine (BU) can be incorporated into DNA by living 
cells. BU is an analog of thymidine and is inserted 
opposite adenine in the DNA. However, once in 
DNA, BU can base pair with guanine during subsequent replication, resulting in AT→GC transition mutations. Some covalent modifications produced by exogenous mutagens can also cause the affected base to mispair during replication. A good example is O6-methylguanine, which is produced by reaction of a methylating agent with DNA. During replication, O6-methylguanine behaves more like adenine than guanine, causing it to mispair with thymine. Subsequent replication of the O6-methylguanine:thymine mismatch results in GC→AT transitions.


Uninformative Base Modifications 

Some mutagens can produce modified DNA bases that 
appear ‘uninformative’ to the polymerase. These may 
be bypassed in an error-prone manner with a random 
selection of a base opposite the modified base. The 
ultimate example of this class is an abasic site in 
which the template base has been completely removed. 
DNA polymerases can synthesize past such lesions, albeit inefficiently. In Escherichia coli, there appears to be a preference, though not an absolute one, for the insertion of an adenine opposite the uninformative lesion, while in mammalian cells such a preference is not evident. Consequently, a particular uninformative lesion may give rise to a variety of base substitutions.


Modifications Blocking DNA Synthesis

Some DNA modifications present a block to replica- 
tion. N3-methyladenine is a good example. Because 
synthesis past the lesion is not possible, these base 
modifications cannot directly give rise to a mutation. 
To allow continuation of replication the cell must first 
remove the lesion either by base excision, excision re- 
pair, or recombination. While these repair processes are 
considered to be quite accurate, an indirect mutagenic event during repair remains possible.


Nonmutagenic Base Modifications 


Certain base modifications pose no direct threat to 
the fidelity of DNA replication. A good example is N7-methylguanine, which behaves normally during
DNA replication. However, N7-methylguanine and 
other similar adducted bases may spontaneously 
depurinate or be removed by a glycosylase to form 
an abasic site. Although abasic sites are repaired 
very efficiently in the cell, they can promote muta- 
tions if they persist during the time of DNA 
replication. Therefore, modifications such as 
N7-methylguanine may promote mutations in an
indirect manner. 


DNA Strand Breakage 

Agents capable of reacting with the sugar-phosphate backbone of DNA, either directly or indirectly through the production of reactive oxygen species, are capable of
producing single- and double-strand breaks in 
DNA. Ionizing radiation is a good example of a strand- 
breaking agent. Possible mutagenic outcomes of 
strand breakage include rearrangement and deletion 
of large sections of DNA. 


DNA Intercalation

During DNA synthesis, planar molecules such as
acridines can slip between the stacked bases of the 
DNA helix in a noncovalent manner and stabilize 
loop-out structures in either the template (parental) 
strand or in the newly-synthesized daughter strand. 
This can promote polymerase slippage, resulting in 
either the addition of an extra base (+1 frameshift) if 
the loop-out is in the daughter strand or the deletion 
of a base (—1 frameshift) if the loop-out is in the 
template strand. This is especially prevalent in regions 
of DNA containing runs of repetitive bases. 
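The frameshift outcome can be illustrated with a short Python sketch. It is a toy model, not a simulation of intercalation chemistry: the hypothetical helper `slip` simply adds (+1) or removes (−1) one base in the longest homopolymer run of a coding sequence, and `codons` shows how the codons downstream of the run are read in a new frame. The sequence is invented for illustration.

```python
def longest_run(seq: str) -> tuple[int, int]:
    """Return (start, length) of the longest run of one repeated base."""
    best_start, best_len = 0, 0
    i = 0
    while i < len(seq):
        j = i
        while j < len(seq) and seq[j] == seq[i]:
            j += 1
        if j - i > best_len:
            best_start, best_len = i, j - i
        i = j
    return best_start, best_len


def slip(seq: str, change: int) -> str:
    """Hypothetical slippage model: +1 adds one base to the longest run
    (daughter-strand loop-out); -1 removes one base (template loop-out)."""
    start, _ = longest_run(seq)
    if change == +1:
        return seq[:start] + seq[start] + seq[start:]
    if change == -1:
        return seq[:start] + seq[start + 1:]
    raise ValueError("change must be +1 or -1")


def codons(seq: str) -> list[str]:
    """Split seq into complete codons in the original reading frame."""
    return [seq[k:k + 3] for k in range(0, len(seq) - len(seq) % 3, 3)]


wild_type = "ATGAAGGGGGCTTCA"          # invented sequence with a run of five Gs
print(codons(wild_type))            # ['ATG', 'AAG', 'GGG', 'GCT', 'TCA']
print(codons(slip(wild_type, +1)))  # ['ATG', 'AAG', 'GGG', 'GGC', 'TTC']
print(codons(slip(wild_type, -1)))  # ['ATG', 'AAG', 'GGG', 'CTT']
```

Codons upstream of the run are unchanged; every codon after it is altered, which is why frameshifts usually inactivate the gene product.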


Sequence-Specific Reaction of Mutagens with DNA


Direct-acting mutagens and the reactive metabolites of promutagens are usually electrophilic compounds that form covalent bonds with nucleophilic sites on DNA. There are a total of 17 nucleophilic sites on the four DNA bases; the phosphate groups of the backbone are also reactive. The N7 and N3 positions of guanine and adenine are the most nucleophilic.

The strength of a given electrophile is one of the factors that determine the ratio of reaction products at the nucleophilic sites on DNA. For example, a strong electrophile such as ENU reacts well at both the highly nucleophilic N7 position of guanine and the less nucleophilic O6 position, while a weaker electrophile such as ethylene oxide reacts predominantly at N7. A second important factor is steric hindrance, imposed both by the structure of DNA and by the structure of the mutagenic agent. For example, although the N3 position of adenine is more nucleophilic than the N7 position of guanine, many more adducts are observed at the N7 position of guanine because of its accessibility in the major groove. Smaller mutagens can therefore be expected to reach more sites on the helix than larger, bulky agents. A mutagen's access to, and reactivity toward, a given target base can also be affected by the sequence surrounding it.

Figure 1 Example of mutational spectra: distribution of background mutations and those induced by acridines, bromouracil, and ultraviolet light in the cI gene of bacteriophage lambda. The acridine mutations are +1 or −1 frameshifts in runs of consecutive guanines. Mutations induced by bromouracil (BU) are all AT→GC transitions; the four most frequently mutated sites all contain the sequence 5’-ACGC-3’. Mutations induced by UV occur primarily at pyrimidine-pyrimidine sequences and comprise both transitions and transversions. The background mutational spectrum comprises transitions, transversions, frameshifts, and several large insertions.


Sequence-Specific Repair of DNA Damage


The reactivity and structure of a mutagen define the pattern of damaged bases produced in DNA. It is then the task of the cell's repair systems to restore the DNA to its original state.

Not all DNA damage is repaired at the same rate. Although repair occurs throughout the genome, the cell concentrates its efforts on the transcribed strand of actively expressed genes, in an excision repair process called transcription-coupled repair (TCR). While the immediate goal of this process is to cleanse the transcribed strand of lesions that may block RNA polymerase, the end result is fewer mutations in the transcribed strand than in the nontranscribed strand. This can sometimes be seen in the resulting mutational spectra. For example, if an agent specifically produces promutagenic adducts at guanine bases that are subject to TCR, then most of the GC base pairs that ultimately undergo mutation will have the guanine located in the nontranscribed strand, since adducts in the transcribed strand will have been preferentially removed.

The different types of modified bases produced by a given agent may be repaired with different kinetics, so that with time the ratio of different lesions in the DNA may change. Also, a given type of altered base may be removed with different kinetics when located in different sequence contexts. These differences probably reflect the relative efficiency with which the repair process can detect the damage. The end result of DNA repair is a reduced load of potentially mutagenic adducts, but the distribution of the remaining damage may differ from that originally deposited on the DNA by the agent.
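The shifting lesion ratio can be sketched by assuming, for simplicity, that each lesion type is removed by first-order kinetics with its own half-life. Lesions "A" and "B" and their half-lives below are hypothetical.

```python
import math


def remaining(initial: float, half_life_h: float, t_h: float) -> float:
    """Lesions left after t_h hours, assuming first-order removal."""
    return initial * math.exp(-math.log(2) * t_h / half_life_h)


# Hypothetical lesions A and B formed in equal amounts; A is repaired faster.
for t in (0, 6, 24):
    a = remaining(100.0, half_life_h=3.0, t_h=t)
    b = remaining(100.0, half_life_h=24.0, t_h=t)
    print(f"t={t:>2} h  A={a:6.1f}  B={b:6.1f}  A:B={a / b:.2f}")
```

Because A is removed eight times faster than B in this example, the A:B ratio falls steadily from 1, even though both lesions were deposited in equal amounts.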

If a polymerase makes an error at a DNA modification, the cell can use its mismatch repair system to remove and resynthesize the region containing the misinserted base. This repair system corrects certain types of mismatches more efficiently than others and displays local sequence-context effects; thus another layer of specificity is imposed on the mutations produced.


Specificity of Background Mutation 


Polymerase errors (base misinsertions) during DNA 
synthesis contribute to background mutations. The 
enzymatic discrimination of correct nucleotides by 
polymerase and its ability to back up and excise incor- 
rectly inserted bases result in an amazingly low yet 
finite error rate during DNA replication. The prob- 
ability of inserting the incorrect base during synthesis 
is strongly influenced by surrounding sequence, and 
thus, this class of mutations is expected to display 
distinct patterns of changes. 
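The combined effect of successive fidelity steps can be estimated by multiplying the fractions of errors that escape each step, assuming the steps act independently. The values below are rough orders of magnitude of the kind often quoted for replicative polymerases, used here only as illustrative assumptions; mismatch repair, described in the previous section, is included as a third layer.

```python
# Illustrative order-of-magnitude values (assumptions, not measurements).
base_selection_error = 1e-5  # misinsertions per base by the polymerase alone
proofreading_escape = 1e-2   # fraction of misinsertions not excised by proofreading
mmr_escape = 1e-3            # fraction of remaining mismatches missed by mismatch repair

# Assuming independent layers, the escape fractions multiply.
overall = base_selection_error * proofreading_escape * mmr_escape
print(f"overall error rate ~ {overall:.0e} per base per replication")  # 1e-10
```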

Certain normal cellular processes can produce chemically reactive intermediates such as oxygen radicals and methylating agents. These can react with DNA and promote mutations during replication, just as externally applied agents do.


Target Gene Contribution to Mutational Spectra


The genetic system and method used to measure mutation limit the mutations that can be detected and, as a result, directly affect the observed distribution of mutations produced by a given agent. For example, mutants are often detected under selective conditions based on the activity of the protein product of a particular gene (wild-type = active gene product, mutant = inactive gene product). Here, mutations that do not affect protein activity will not be detected. These ‘invisible’ mutations include those occurring at silent codon wobble positions and those producing amino acid substitutions that spare protein activity. Presently, virtually all mutation detection systems are based on a change in phenotype, but as molecular biology techniques become more sophisticated and sensitive, direct detection of all mutations at the DNA level should become feasible.
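Whether a base substitution is ‘invisible’ to a protein-activity screen depends on how it changes the codon. A minimal Python sketch, using the standard genetic code written in DNA letters, classifies a single-base change as synonymous, missense, or nonsense. `classify` is a hypothetical helper, and whether a missense change spares protein activity is something the code cannot predict.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code; codons ordered TTT, TTC, TTA, TTG, TCT, ... ('*' = stop).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def classify(codon: str, position: int, new_base: str) -> str:
    """Classify a single-base substitution at position 0, 1 or 2 of a codon."""
    mutant = codon[:position] + new_base + codon[position + 1:]
    before, after = CODE[codon], CODE[mutant]
    if before == after:
        return "synonymous"  # undetectable by a screen based on protein activity
    if after == "*":
        return "nonsense"
    return "missense"        # may or may not spare protein activity


print(classify("GGG", 2, "A"))  # synonymous (Gly -> Gly at the wobble position)
print(classify("GGG", 0, "A"))  # missense   (Gly -> Arg)
print(classify("TGG", 2, "A"))  # nonsense   (Trp -> stop)
```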

The primary sequence of the target gene chosen for study may affect the apparent sequence specificity of a given agent. Many agents react preferentially in certain sequence contexts, so a target gene enriched for that sequence will appear more mutable than other targets. Also, certain sequences are methylated by endogenous methylases to form modified DNA bases. Perhaps the most important of these is 5-methylcytosine, which is produced at the second C of the sequence 5’-CC(A/T)GG-3’ in E. coli and at the C of the sequence 5’-CG-3’ in mammalian genes. 5-Methylcytosine can deaminate to form thymine in the DNA; if the resulting thymine is replicated, a GC to AT transition results. 5-Methylcytosine residues are known hot spots for spontaneous mutation in both E. coli and mammalian cells. Furthermore, it has been shown that the reaction of certain mutagens can be enhanced by the presence of 5-methylcytosine in DNA.
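The CpG hot spot mechanism can be sketched in a few lines. The example assumes, hypothetically, that every 5’-CG-3’ cytosine in the invented sequence is methylated, and it follows only the top strand: deamination turns the 5-methylcytosine into T, and after replication the original G-C pair becomes A-T.

```python
def cpg_sites(seq: str) -> list[int]:
    """Positions of the C in each 5'-CG-3' dinucleotide (the C methylated in mammals)."""
    return [i for i in range(len(seq) - 1) if seq[i:i + 2] == "CG"]


def deaminate_5mc(seq: str, pos: int) -> str:
    """Model deamination of 5-methylcytosine at pos: 5mC -> T on this strand.
    After replication, the G:C pair at this site becomes A:T."""
    if seq[pos] != "C":
        raise ValueError("expected a (methylated) C at this position")
    return seq[:pos] + "T" + seq[pos + 1:]


gene = "ATCGGACGTTCA"                  # invented sequence
sites = cpg_sites(gene)
print(sites)                           # [2, 6]
print(deaminate_5mc(gene, sites[0]))   # ATTGGACGTTCA
```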


See also: DNA Repair 


Mutagens 
Mutagens are agents that cause an increase in the rate of mutation, including X-rays, UV irradiation, and a variety of chemicals.


See also: Mutagenic Specificity; Mutation 
The term ‘mutant allele’ is defined differently by population geneticists and other geneticists. In population genetics, a mutant allele is one that is present in a population at an allele frequency (see Allele Frequency) of less than 1%, irrespective of its function or lack thereof. In all other areas of genetics, a mutant allele is one that displays abnormal function as a consequence of a change in the coding or regulatory sequence associated with a particular gene. In both definitions, the term is used in a relative sense. According to the population genetics definition, the same allele that would be considered mutant in one population (where its frequency is less than 1%) could be considered wild-type in another population (where its frequency is greater than 1%). This is the case for alleles that control skin color, when human populations from central Africa and Scandinavia are compared. Similarly, according to the standard definition, an allele that causes harm in homozygous individuals (and would, therefore, be classified as mutant) could provide an advantage in heterozygous individuals (and would, therefore, be classified as wild-type). The sickle cell allele at the β-globin locus is an example of the latter.
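The relative nature of the population-genetics definition can be made concrete with a small helper. `population_genetics_label` is hypothetical, the 1% threshold comes from the text, and the allele counts are invented; an allele at exactly 1% is labeled wild-type here, a boundary case the definition does not address.

```python
def population_genetics_label(allele_count: int, total_alleles: int,
                              threshold: float = 0.01) -> str:
    """Label an allele by the population-genetics convention: 'mutant' if its
    frequency is below threshold (1% by default), otherwise 'wild-type'."""
    frequency = allele_count / total_alleles
    return "mutant" if frequency < threshold else "wild-type"


# The same allele in two hypothetical samples of 1,000 chromosomes each.
print(population_genetics_label(5, 1000))    # mutant (0.5%)
print(population_genetics_label(600, 1000))  # wild-type (60%)
```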


See also: Allele Frequency; Sickle Cell Anemia 
Mutations are sometimes due to DNA damage inflicted by external agents, but mutagenic damage is often inflicted endogenously by the chemical reactions that occur normally inside the cell. Other mutations are independent of DNA damage and are instead caused by enzymatic errors during normal DNA metabolism. Mutations are kept from accumulating by the inherently high accuracy of the enzymes that act on DNA and by efficient proofreading systems that correct errors before a mutation becomes fixed in both strands of the DNA. Cells deal with most DNA damage by repairing it accurately. Nonetheless, the repairs themselves create opportunities for error. Sometimes the errors represent enzymatic processing of damage that has not yet been corrected. The need for low mutation rates (high fidelity) lies in the adverse biological consequences (disease and death) of many mutations; however, mutations also provide the building blocks for molecular evolution. Thus, despite the negative consequences of some mutations, they are also necessary for life.


What Is a Mutation? 


Heritable changes in the genomic nucleic acid 
sequence of an organism are called mutations. If a 
mutation becomes fixed in the genome of a cell, the 
progeny of that cell will carry the same mutation. The 
genomic nucleic acid is usually DNA but is RNA in 
some viruses. Mutations include substitutions, add- 
itions, deletions, or rearrangements in the linear 
array of bases in the nucleic acid. 

Changes in the sequence of DNA are considered to 
be mutations regardless of whether the mutation pro- 
duces any detectable effect on the gene product or its 
function in the organism. Indeed, mutations some- 
times have no discernible influence on anything other 
than the DNA sequence. Since mutations often cause 
deleterious effects on the ability of a gene to function 
properly, the consequences may be death or disease in 
the organism suffering the mutation. Mutations that have small effects or that occasionally have beneficial effects can be important contributors to the molecular evolution of DNA sequences.

Pre-existing mutations in DNA molecules may be 
found in new combinations following the process of 
homologous recombination. This kind of recombin- 
ation happens during meiosis in the creation of haploid 
eggs and sperm during the reproduction of diploids 
such as humans. For example, each chromosome in the 
haploid egg is a recombined mixture of the DNA (and 
its mutations) from the two chromosomes that were 
present in the mother. This recombination process, 
which is highly accurate, gives rise to a new arrange- 
ment of pre-existing nucleotide sequences and is not 
considered to be a mutation. In extremely rare cases, 
errors may occur during meiotic recombination to 
produce a DNA sequence that is not present in either 
of the original DNA molecules undergoing recombin- 
ation. In these rare cases, the new sequence is a 
mutation. 


What Kinds of Mutations Occur and What Are Their Consequences?


A mutation can be described by its genotype, that is, by the kind of sequence change that occurred. For convenience, mutations are frequently subdivided into categories by both size and kind of change. This convention derives in part from the fact that the mechanisms that lead to different kinds and sizes of mutations are often different.

Point mutations modify one or a small number of 
base pairs, but larger DNA sequence deletions, add- 
itions, or rearrangements also occur. These more ex- 
tensive mutations may be restricted to a single gene 
but when they exceed the size of a single gene they are 
called multilocus mutations. In organisms whose gen- 
omes consist of multiple chromosomes, the infor- 
mation in a whole or part of a chromosome can be 
lost, duplicated, rearranged, or translocated to a different chromosome. The long-term consequences of these chromosomal changes can be appreciated by comparing evolutionarily related sequences. For example, human and mouse have different numbers of chromosomes, and the order of genes on their chromosomes is sometimes different, even though many genes are very similar.

Mutations are sometimes also described by their phenotype, that is, by the consequences of the mutation. Phenotype is a secondary characteristic of a mutation and depends on the position of the mutation in the DNA of the organism. For example, the substitution of an A-T for a G-C base pair might have absolutely no discernible consequence, or it might change an amino acid in a protein, which may or may not alter its function. Such a substitution at a different location might also alter the expression level of a protein or prevent the production of a protein entirely. The differences depend on where in the DNA sequence the base substitution occurs.

Point mutation genotypes include base pair substi- 
tutions, additions, or losses, and base pair sequence 
inversions or complex changes. 


Base Substitution Mutations 

There are four bases in DNA. Thus, at any given site there are three possible substitutions (Figure 1). Substitutions are called transitions if the pyrimidine bases (T or C) substitute for each other in one strand or if the purine bases (A or G) substitute for one another. Notice that when a pyrimidine is substituted in one DNA strand, the complementary pairing of bases in the duplex results in a purine substitution in the other strand. In contrast, substitutions are called transversions if a purine (A or G) substitutes for either of the pyrimidines (T or C) and vice versa. At any given site there is the possibility of one transition and two transversions. If base substitution mutations happened randomly, one might expect twice as many transversions as transitions. However, mutations are not random, and in fact transition mutations regularly occur more frequently than transversions, a feature that is likely to reflect the greater similarity of a purine to a purine than of a purine to a pyrimidine.
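The count of one transition and two transversions per site follows directly from the purine and pyrimidine classes, as this short Python sketch shows. Under the naive assumption that all three substitutions at a site are equally likely, transitions would make up only one-third of substitutions.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def substitution_type(old: str, new: str) -> str:
    """Classify a base substitution as a transition or a transversion."""
    if old == new:
        raise ValueError("not a substitution")
    same_class = {old, new} <= PURINES or {old, new} <= PYRIMIDINES
    return "transition" if same_class else "transversion"


for base in "ACGT":
    kinds = [substitution_type(base, other) for other in "ACGT" if other != base]
    print(base, kinds.count("transition"), kinds.count("transversion"))  # e.g. "A 1 2"
```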


Addition or Deletion Mutations 

Additions of 1 bp or many base pairs may occur within a gene. Additions are often tandem duplications of adjacent sequence; when this is the case, they are typically called duplications. Insertions of DNA from elsewhere are also found. In deletion mutations information is totally lost, and thus a deletion cannot be reversed by a second mutation. In contrast, additions can be reversed to the original sequence if a deletion precisely removes the added bases.


Figure 1 Three different substitutions are possible at any position in the DNA. The more common transition mutations substitute purines for purines and pyrimidines for pyrimidines. The less common transversion mutations substitute purines for pyrimidines and vice versa.
Figure 2 (A) A DNA polymerase typically incorporates C opposite G, so no mutation results. A polymerase incorporates an A opposite a G in the syn configuration more often than it incorporates A opposite a G in the correct anti configuration. When A is incorporated, a transversion is produced. (B) The normal environment within the cell leads to substantial production of 8-oxoG damage, necessitating multiple repair mechanisms to keep mutagenesis at a low level. Repair mechanisms act at three different stages, as described in the text, to minimize the frequency of 8-oxoG occurrence in the DNA and to repair the mutations that do occur.


In organisms that splice mRNA, base substitution 
mutations may cause splicing errors. Such events 
sometimes produce an mRNA containing fewer bases
than normal. The resulting mRNA has a deletion, 
which sometimes shifts the translational reading 
frame. Nonetheless, the mutation is classified on the 
basis of its effect on DNA (i.e., it is a base substi- 
tution); it is neither a deletion nor a frameshift, even 
when the result is a deleted mRNA sequence. 


Inversions and Complex Mutations 
Inversions of DNA sequences are typically found to be associated with specific sequences that mediate their production through locally misaligned base pairing. Some arise through ectopic homologous recombination or through ectopic DNA synthesis (related to the mechanism described in Figure 5) (Rosche et al., 1997; Slupska et al., 2000). Technically speaking, G-C to C-G (and A-T to T-A) transversions can be considered to be single-base inversions.

‘Complex mutation’ is a general term applied to multiple, not necessarily contiguous, sequence changes that happen as a result of a single mutational event.
Although the absolute frequency of these mutations is 
typically lower than single base changes, the fact that 
they impart multiple changes at the same time may 
play a quantitatively important role in the molecular 
evolution of DNA sequences (Ripley, 2001). 


Large Deletion/Addition Mutations 

The deletion of large segments of DNA is potentially 
deleterious, especially if the deleted genes encode one 
or more proteins that carry out key roles in physi- 
ology. The mechanisms that lead to deletion mutations 
can produce deletions that include multiple genes 
(multilocus deletions). Some humans with multilocus 
deletions display disorders called contiguous gene 
syndromes. The multiple gene deficiencies produced 
by the deletion cause combinations of symptoms that 
are a mixture of the symptoms displayed by individuals with different single gene deficiencies. For example, the combined loss of the cellular receptor for the hormone androgen and mental retardation can be attributed to deletions spanning two genes on the X chromosome (Schueler et al., 2000). Advances in molecular cytology now allow
more precise visualization of human chromosomes, 
including improved detection of deletions (and other 
DNA rearrangements). These techniques have sub- 
stantially improved the diagnosis of patients and 
have also been used to help correlate the genetic map 
of human DNA with the physical map of human 
chromosomes. 

Very large duplications can occur, but their phenotypes are often more subtle. Multigenic duplications are likely to result in chromosomes with wild-type copies of all genes. Thus, phenotypes are expected primarily when too much of a gene product is deleterious. An interesting example of a gene duplication associated with neurological disease in humans is Charcot-Marie-Tooth disease (OMIM 118220; OMIM, 2000). The phenotype of the duplication is distinct from that of deletions of the same gene, which encodes peripheral myelin protein 22. Deletions are generally associated with a different neurological disease called hereditary neuropathy with liability to pressure palsies, or HNPP (OMIM 162500).

Some addition mutations are due to insertion of other pieces of DNA. Human chromosomes contain inactive insertions of mitochondrial DNA, for example. An important class of insertions in most organisms consists of transposable DNA elements. When insertions disrupt coding or regulatory regions, adverse phenotypes are likely to occur.


Chromosomal Mutations 

Chromosome loss or gain results in a massive change in DNA content that is expected to have major genetic consequences. Indeed, such mutations in the germline are often lethal. One of the most common chromosomal abnormalities in live-born children is Down syndrome, caused by trisomy (three rather than the usual two copies) of chromosome 21 (OMIM 190685). The majority of chromosome 21 trisomies arise in mothers due to improper segregation (chromosomal nondisjunction) at meiosis I. The molecular basis of this preference (mothers rather than fathers, and meiosis I rather than meiosis II) is unknown. Interestingly, trisomy of chromosome 18 arises most often due to maternal chromosomal nondisjunction at meiosis II.

Chromosomal segregation errors also arise in som- 
atic tissue and play a role in human cancer. The 
fidelity of chromosome distribution to daughter cells 
in mitosis and meiosis is clearly important to the 
health and survival of an organism. 


How Frequently Do Mutations Occur? 


The accurate transmission of genomic DNA from one 
cell to the next is required for the existence of life. For example, after replication of the genome there are two genomic copies, which are divided equally between the two daughter cells. If the new copy were full of deleterious mutations, the cell would be unable to reproduce. In multicellular organisms, if division of the fertilized egg did not lead to the accurate transmission of DNA during subsequent divisions, the organism would be unable to grow and function. Thus, mutations must be rare to enable life to continue.

On the other hand, life would not exist without mutations. The diversity and evolution of living organisms depend not only on the ability to develop genes with new functions but also on the ability to come up with new combinations of genes to take advantage of, or combat, changes in the environment. For example, mutations allow disease-producing microbes to become resistant to antibiotics. The widespread use of antibiotics has resulted in selection for antibiotic-resistant organisms.

Comparisons of mutation rates in organisms ranging from microbes to mammals suggest that mutation rates may be quite similar when compared on the basis of mutations per coding region of the genome per cell division (or sexual generation) (Drake et al., 1998). However, because the sizes of the genomes differ (mammalian genomes are larger and have more genes than bacterial genomes), the mutation rate per base pair of genomic DNA is substantially lower in mammals than in bacteria.

It has been estimated that new mutations arise in humans at a frequency of about 10⁻¹⁰ to 10⁻¹¹ per bp, or about 1-100 per person, on average. Although new
mutations in each generation represent only a small 
fraction of the differences that distinguish the 
sequences in the DNA of unrelated people, their 
impact on the health of the population as a whole 
is not likely to be beneficial. Current regulatory ap- 
proaches to minimizing new mutations have focused 
on minimizing human exposure to known mutagens. 

Although mutations are rare, they are not randomly distributed in the DNA sequence. This was first demonstrated in the rII genes of bacteriophage T4, where frameshifts at just three sites account for two-thirds of all inactivating mutations arising in ~1500 bp of DNA. Many genes in many organisms
have subsequently been shown to have hot spots that 
often account for most of the mutations that inactivate 
gene function. 

The nonrandom distribution of mutations in the 
DNA reflects two important features of mutations, 
phenotype and mechanism. Mutations that produce 
no change in phenotype are usually not observed. 
For example, inherited human disorders are generally 
recognized by phenotype but sometimes only certain 
mutations in a gene cause a recognizable phenotype. 
An extreme example is achondroplasia (OMIM 
100800), a specific type of dominantly inherited 
dwarfism. DNA sequencing of unrelated individuals 
has identified only three specific base pair substitutions 
in the entire fibroblast growth factor receptor-3 gene 
to be associated with this phenotype. However, other 
mutations in this same gene (OMIM 134934) have 
different phenotypes. 

Mutational mechanisms contribute to nonrandom patterns of mutagenesis as well. Specific DNA sequences are required for, or stimulate, specific mechanisms. Thus, mutations in these sequences are the most frequent when the mechanism is at work, and the prevalence of particular sequences in the gene is expected to have a major influence on the distribution of mutations.

Inherited mutations that are frequent in the popu- 
lation are called polymorphisms. Polymorphic differ- 
ences between unrelated individuals are frequent 
(1/1000-1/10000 bp of DNA). Polymorphic DNA 
variation is the basis for the use of DNA analysis in 
forensics and for identifying individuals at risk of 
inherited disease in susceptible families. The frequen- 
cies of specific polymorphisms reflect the structure of 
the population and are not a direct reflection of the 
frequency of newly arising mutations at that site. 
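To put the polymorphism frequency in perspective, a rough calculation applies it to the size of the human haploid genome (approximately 3.1 × 10⁹ bp, a rounded figure used here as an assumption). The result is the approximate number of differing sites between two unrelated haploid genomes.

```python
HAPLOID_GENOME_BP = 3.1e9  # approximate size of the human haploid genome (assumption)


def expected_differences(spacing_bp: int, genome_bp: float = HAPLOID_GENOME_BP) -> float:
    """Number of differing sites if one difference occurs every spacing_bp base pairs."""
    return genome_bp / spacing_bp


for spacing in (1_000, 10_000):
    print(f"1 per {spacing:,} bp -> about {expected_differences(spacing):,.0f} differences")
# 1 per 1,000 bp -> about 3,100,000 differences
# 1 per 10,000 bp -> about 310,000 differences
```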


Why Mutations Occur 


Mutational mechanisms contribute to nonrandom distributions of mutations in DNA. For example, deamination of cytosine occurs spontaneously at high rates in DNA and, if left unrepaired, results in a C to T substitution. Multiple repair pathways minimize mutations due to deamination. One repair pathway, which removes the uracil produced by deamination of cytosine, cannot act at methylated C sites, because deamination of 5-methylcytosine produces thymine rather than uracil. In humans, CpG sequences are frequently methylated. As a result, there is a higher rate of C to T changes at these sequences than at other sequences. Nonrandomness is also associated with the addition or deletion of bases. These mutations are often seen in repeated sequences. As will be seen below, misalignments of the repeat sequences can play a role in this kind of mutagenesis.

Mutational mechanisms can be divided into two broad kinds. The first reflects errors that occur during ‘normal’ DNA metabolism of ‘normal’ DNA and are due to enzymatic errors and/or unusual conformations of undamaged DNA: unforced errors. For example, DNA replication and recombination involve a multitude of enzymes that cut, copy, and rejoin DNA. When the enzymes carrying out these essential functions malfunction, mutations sometimes occur. But not all mistakes produce mutations. For example, many mistakes are corrected by additional proofreading enzymes called mismatch repair enzymes, whose function is to detect the mistakes and correct them. In many cancers, elevated mutation rates contribute to the development of the disease, and in some cases this has been shown to be due to mutations that inactivate enzymes whose normal job is to correct errors in DNA (Jiricny and Nystrom-Lahti, 2000).

The second kind of mutational mechanism is triggered by damaged DNA. DNA is not perfectly stable. It accumulates damage from exposure to its biological environment. Damage can be due to DNA hydrolysis or to modifications mediated or stimulated by oxygen, water, heat, and chemical byproducts of normal cellular metabolism. Additional damage can be due to exposure of DNA to external environmental assaults such as the chemicals in cigarette smoke, or to physical agents such as radiation from X-rays or sunlight. One can think of the damage as an event that makes errors more likely.

Nonetheless, most DNA damage is processed in a nonmutagenic way by ‘normal’ or ‘special’ DNA metabolism. Some DNA damage, if not repaired, has no major adverse effect on the stability of DNA or its ability to be replicated, but is still directly mutagenic because the damage changes the properties of a DNA base so that the base is interpreted ‘incorrectly’ by DNA polymerase, after which copying produces a base substitution mutation. Other DNA damage is not directly mutagenic but has a severe negative effect on the stability of the DNA or acts as a roadblock to the replication machinery. Under these circumstances, ‘special’ enzymes are typically called upon to repair the damage and/or to aid in copying past the damaged DNA. Sometimes these special repair processes generate mutations.

DNA repair processes usually accurately remove a 
great deal of both types of base damage from the DNA 
before it is copied. Thus, DNA repair plays an import- 
ant role in reducing mutation frequencies by removing 
the damage before it is converted into a mutation. 
Indeed, there are regulatory functions in cells that can 
delay the start of replication when there is a great deal 
of damage to be repaired. Mutation frequencies would 
be far too high for life to be sustained if DNA damage 
were not constantly being repaired. 

DNA damage is not limited to base damage but also includes modifications of the DNA sugar-phosphate backbone. Nicks and breaks are especially important in producing addition/deletion mutations and chromosomal rearrangements. DNA nicks are also normal intermediates in DNA replication, recombination, and repair, and thus may be produced by enzymes during DNA metabolism, or they may be due to physical damage to the DNA.


Examples of How Mutations Can Occur 


Our current understanding of the biochemical events 
responsible for mutations is incomplete. Because 
mutations reflect the diverse ways in which the gen- 
erally accurate transmission of genetic information 
from generation to generation is subject to error or is 
regulated to create mutations, it comes as no surprise 
that there are a large number of mechanisms produc- 
ing mutagenesis. 

Studies of DNA polymerases and how they interact with template and precursor substrates have led to considerable insight into the mutational mechanisms that depend on this important enzyme of the DNA replication process. All studies to date suggest that different DNA polymerases preferentially make different kinds of mistakes. Most organisms have multiple DNA polymerases. Prokaryotes like Escherichia coli have at least five different DNA polymerases (Tang et al., 1999; Wagner et al., 1999). At least nine polymerases have so far been identified in eukaryotes. These different polymerases play different, but sometimes overlapping, roles in DNA metabolism (Hubscher et al., 2000). Their diverse roles in mutagenesis are not fully known, but the importance of polymerization errors to mutagenesis is well established.

DNA polymerization errors produce a diversity of 
mutations. Mistaking one base for another is one 
source of base substitutions. When the DNA poly- 
merase encounters a template base, for example a G, it 
usually incorporates a C, the complementary base; the 
base pair formed is that for a ‘normal’ DNA structure. 
However, occasionally another base is incorporated. 
In general terms, this may be for either of two reasons: 
(1) the template base (G) is misread, leading to a mis- 
incorporation; or (2) an incorrect substrate base (not 
C) is mistakenly incorporated. 

One way in which misreading can occur is when 
a base assumes an unusual conformation. For example, 
when a G base is rotated relative to the sugar to which 
it is attached so that it is in a syn rather than its 
usual anti configuration, the DNA polymerase sees a 
different part of the base than normal and sometimes 
creates a G-A rather than the usual G-C pair. If left
unrepaired, the next round of replication produces a 
G-C to T-A change in the DNA (a transversion) 
(Figure 2A). 
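A minimal sketch of the two rounds of replication described above follows. The helper names and the two-letter pair notation (base on the strand of interest written first) are illustrative assumptions; the code only encodes Watson-Crick complementarity and the purine/pyrimidine rule that separates transitions from transversions.

```python
# Hypothetical helpers for illustration. A base pair is written as two
# letters, the first being the base on the strand of interest.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}
PURINES = {"A", "G"}


def mutant_pair_after_two_rounds(template_base: str, misinserted_base: str) -> str:
    """Return the base pair produced by one misincorporation.

    Round 1: `misinserted_base` is placed opposite `template_base`
    (e.g. an A opposite a syn G) instead of the complementary base.
    Round 2: the strand carrying the misinserted base is copied
    accurately, so it receives its normal Watson-Crick partner.
    The result is read from the strand that originally carried
    `template_base`.
    """
    if misinserted_base == COMPLEMENT[template_base]:
        raise ValueError("that is a correct insertion, not a misincorporation")
    return COMPLEMENT[misinserted_base] + misinserted_base


def classify_substitution(old_pair: str, new_pair: str) -> str:
    """Purine<->purine or pyrimidine<->pyrimidine is a transition; else transversion."""
    old, new = old_pair[0], new_pair[0]
    if old == new:
        return "no change"
    return "transition" if (old in PURINES) == (new in PURINES) else "transversion"


original = "G" + COMPLEMENT["G"]                 # the normal G-C pair
mutant = mutant_pair_after_two_rounds("G", "A")  # G-A mispair, then copied
print(original, "->", mutant, classify_substitution(original, mutant))
# prints: GC -> TA transversion
```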

Some chemical modifications of G can favor the 
adoption of the miscoding syn configuration, thus 
elevating mutation rates. A common spontaneous lesion that acts by this mechanism is 8-oxoguanine (8-oxoG). Not surprisingly, there are multiple repair
mechanisms in cells designed to prevent 8-oxoG- 
induced mutations (Figure 2B). Three classes of 
repair mechanisms are shown in Figure 2B. Class 1 
degrades 8-oxoG DNA precursors preventing incor- 
poration by DNA polymerase. Class 2 removes the 
8-oxoG that is in the DNA; 8-oxoG may be present in 
the DNA because it was incorporated by the poly- 
merase, or it may have directly formed in the DNA. 
Class 3 scans for mistakes made by mispairing with 
8-oxoG and corrects them. Together these three
repair processes can reduce base pair substitution fre- 
quencies by several orders of magnitude. 
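One way to see how three pathways can together give "several orders of magnitude" is to treat them as independent filters acting one after another, so that the fractions of events each one misses multiply. This is a simplifying assumption, and the escape fractions and starting frequency below are hypothetical numbers chosen for illustration, not measured values.

```python
import math


def residual_frequency(initial_frequency: float, escape_fractions: list[float]) -> float:
    """Frequency left after repair pathways that act one after another.

    Each entry in `escape_fractions` is the (hypothetical) fraction of
    premutagenic events that a pathway fails to remove. Treating the
    pathways as independent, the fractions simply multiply.
    """
    remaining = initial_frequency
    for escape in escape_fractions:
        if not 0.0 <= escape <= 1.0:
            raise ValueError("escape fractions must lie between 0 and 1")
        remaining *= escape
    return remaining


# Hypothetical values: each of the three classes lets 10% of events escape.
start = 1e-6
end = residual_frequency(start, [0.1, 0.1, 0.1])
print(f"{start:.0e} -> {end:.0e} ({math.log10(start / end):.1f} orders of magnitude)")
```

Under this toy model, three pathways that each remove 90% of events give a thousandfold reduction.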

Base substitutions are also induced by DNA lesions that act as roadblocks. Because the normal polymerases are unable to pass such lesions, special DNA polymerases copy for a short distance to bypass the damage. These special polymerases appear to be considerably less accurate, so mutations tend to occur at or near the lesion sites. Figure 3 shows how a thymine dimer, a form of DNA damage produced by sunlight, stops many DNA polymerases. The human XPV (eta) polymerase is a special polymerase that can replicate past the site of dimer damage. Humans with mutations in both copies of the gene encoding this polymerase have a version of the disease xeroderma pigmentosum (OMIM 278750), in which patients are sensitive to light and have an increased incidence of skin cancer.

Figure 3 Some lesions in DNA, such as thymine dimers, act as a block to DNA polymerization. However, other special polymerases in the cell are called upon in such emergencies to copy past the lesion, leading to more mutations in the immediate vicinity of the lesion.

Mutations can occur not only when enzymes make 
mistakes or when DNA is damaged, but also when 
DNA fails to maintain the proper alignment of its 
complementary strands during DNA metabolism. 
The misalignment of an elongating DNA strand with its template, in conjunction with otherwise accurate DNA polymerization, is an important source of mutations. DNA is not static but is constantly undergoing
physical transformations to accommodate the en- 
zymes responsible for gene expression, DNA repli- 
cation, recombination, and repair. For example, the 
Watson-Crick base pairs that make up a duplex
DNA structure are transiently disrupted during 
these processes. 

Figure 4 DNA repeats are among the most frequently mutated sequences in DNA. DNA misalignments are responsible for some of the mutations in these sequences. Here, a polymerase produces an extra (4th) copy of the repeat due to a misalignment of the DNA strand being elongated on the strand being copied. Converse misalignments produce a deletion of the repeat.

Figure 5 Misalignments between imperfectly repeated DNA sequences, even when they are not adjacent, can promote mutations. Here an imperfect repeat produces a templated substitution of a circle for a diamond. The sequence change may be a complex mutation or a simpler one. The hallmark of the event is that the mutant sequence is identical to the sequence from which it was templated. Palindromic DNA sequences allow misalignments between otherwise complementary strands (not shown). Misalignments of these types produce inversions as well as complex mutations.

Occasionally the DNA sequence allows the re-formation of duplex DNAs that are locally complementary but are nonetheless ‘misaligned.’ A repeated DNA sequence is an excellent example. If a misalignment occurs in a repeat (Figure 4), and DNA polymerases or other enzymes act as though the DNA strands were correctly aligned, mutations are produced. Some of the most frequent base duplications occur when a substantial number of identical base pairs are adjacent, but the repeat may be of any kind. For example, in Figure 4 the repeats labeled 1-3 might be a 4-bp sequence. The first sequencing of mutants at a hot spot, carried out in the lacI gene of E. coli, showed that mutations at such a site dominate spontaneous mutagenesis (Coulondre et al., 1978). Related misalignments involving looping out of the template rather than the elongating strand produce deletions in these sequences. However, not all duplications and deletions depend on DNA misalignments, so finding a mutation in a repeat is not sufficient evidence to confirm the mutational mechanism (Ripley, 1990).
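The slippage outcomes described for tandem repeats can be sketched as string operations: a loop in the elongating strand adds a copy of the repeat unit, and a loop in the template removes one. The function name and the hypothetical 4-bp repeat "CAGT" are illustrative assumptions; the sketch models only the net sequence change, not the strand-pairing chemistry.

```python
def slip(sequence: str, unit: str, delta: int) -> str:
    """Add (delta > 0) or remove (delta < 0) copies of a tandem repeat unit.

    Models a misalignment within the run of `unit` that begins at its first
    occurrence in `sequence`: a loop in the elongating strand adds copies
    (duplication), a loop in the template removes them (deletion).
    """
    if not unit:
        raise ValueError("repeat unit must be non-empty")
    start = sequence.find(unit)
    if start < 0:
        raise ValueError("repeat unit not found")
    end, copies = start, 0
    while sequence.startswith(unit, end):  # count adjacent copies of the unit
        copies += 1
        end += len(unit)
    if copies + delta < 0:
        raise ValueError("cannot delete more copies than are present")
    return sequence[:start] + unit * (copies + delta) + sequence[end:]


print(slip("AAAA", "A", +1))                       # AAAAA (duplication)
print(slip("AGAGAG", "AG", -1))                    # AGAG (deletion)
print(slip("TT" + "CAGT" * 3 + "AA", "CAGT", +1))  # hypothetical 4-bp unit gains a 4th copy
```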

DNA misalignments may occur between more dis- 
tant sites (even different chromosomes). When the 
misaligned DNA is elongated by DNA polymerase 
at the wrong site but then returns to the original site 
and extension continues, mutations occur at all sites at 
which the distant and original sites differ. Thus, mutations can be as simple as a single base substitution or deletion, or as complex as the substitution of multiple new bases (complex mutations) (Figure 5).
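The templated event can be sketched as copying a window from a distant, imperfect repeat into the target site. Mutations then appear exactly where the two repeats differ within the copied window, and the mutant stretch matches the donor. The sketch assumes the two repeats are already aligned base for base; the sequences and function names are hypothetical.

```python
def templated_switch(original: str, donor: str, start: int, end: int) -> str:
    """Replace original[start:end] with the matching stretch of a donor repeat.

    Sketch assumption: the distant, imperfect repeat (`donor`) has already
    been aligned to `original` base for base, so both strings have equal length.
    """
    if len(original) != len(donor):
        raise ValueError("this sketch expects pre-aligned, equal-length repeats")
    if not 0 <= start <= end <= len(original):
        raise ValueError("invalid copying window")
    return original[:start] + donor[start:end] + original[end:]


def changed_positions(before: str, after: str) -> list[int]:
    """Positions where two equal-length sequences differ."""
    return [i for i, (a, b) in enumerate(zip(before, after)) if a != b]


original = "ACGTTAGCAT"  # hypothetical target site
donor = "ACGATAGGAT"     # hypothetical imperfect repeat; differs at positions 3 and 7
complex_mutant = templated_switch(original, donor, 2, 9)  # window covers both differences
simple_mutant = templated_switch(original, donor, 2, 5)   # window covers only one
print(complex_mutant, changed_positions(original, complex_mutant))
print(simple_mutant, changed_positions(original, simple_mutant))
```

A long copying window yields a complex mutation; a short one yields a single base substitution, matching the range described above.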


A back mutation, also known as a reverse mutation, is a mutation that restores the wild-type sequence of a gene. The term is usually reserved for exact reversion (a change back to the original nucleotide sequence), but may also be used for equivalent reversion (a change back to a synonymous codon).
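The distinction between exact and equivalent reversion can be checked mechanically with the standard genetic code. In this sketch the table is encoded compactly as a 64-letter string in TCAG order; the function name and the example codons (wild-type AGA, arginine, mutated to GGA, glycine) are hypothetical illustrations.

```python
BASES = "TCAG"
# Standard genetic code; codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    first + second + third: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, first in enumerate(BASES)
    for j, second in enumerate(BASES)
    for k, third in enumerate(BASES)
}


def classify_reversion(wild_type_codon: str, revertant_codon: str) -> str:
    """Compare a revertant codon with the wild-type codon it should restore."""
    if revertant_codon == wild_type_codon:
        return "exact reversion"
    if CODON_TABLE[revertant_codon] == CODON_TABLE[wild_type_codon]:
        return "equivalent reversion"
    return "not a reversion"


# Hypothetical case: wild-type AGA (Arg) mutates to GGA (Gly). A second
# single-base change can restore AGA or give the synonymous Arg codon CGA.
print(classify_reversion("AGA", "AGA"))  # exact reversion
print(classify_reversion("AGA", "CGA"))  # equivalent reversion
print(classify_reversion("AGA", "GGA"))  # not a reversion (still the mutant)
```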


See also: Mutation, Silent; Mutation, 
Spontaneous; Reverse Mutation; Reversion Tests 


Mutation Frequency 
Mutation frequency is the frequency at which a particular mutation occurs. It is usually expressed as the
mutation rate per replication. 


See also: Mutation Rate 
Mutation, Leaky
A leaky mutation is one that allows some residual level
of gene expression. 


See also: Gene Expression; Mutation 


Mutation Load 
The mutation load is the decrease in fitness or viability (or other trait of interest) caused by recurrent harmful mutations. As pointed out independently by J.B.S. Haldane and H.J. Muller, the effect of mutation on fitness does not depend on how harmful the individual mutations are; rather, it is equal to the total mutation rate per gamete, multiplied by a factor of 2 if the mutants are dominant. This formulation assumes that mutations at different loci act independently. When there is epistasis the formula is modified (see Haldane-Muller Principle). The mutation load theory
was used in the 1960s in an attempt to assess the total 
impact of mutation on the population, particularly the 
human population, and its possible increase from 
radiation and chemical mutagens. 
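The Haldane-Muller result can be written as a one-line calculation. A minimal sketch, assuming (as the text does) that loci act independently, so per-locus losses combine multiplicatively: mean fitness is then about $e^{-U}$ for a total rate $U$ per gamete (or $e^{-2U}$ for dominant mutants), and the load $1 - e^{-U}$ is approximately $U$ when $U$ is small. The rates in the loop are hypothetical.

```python
import math


def mutation_load(total_rate_per_gamete: float, dominant: bool = False) -> float:
    """Equilibrium mutation load from the total deleterious mutation rate U.

    Each locus reduces mean fitness by its own mutation rate (twice that if
    mutants are dominant), whatever the size of the harmful effect.
    Assuming loci act independently (multiplicatively), mean fitness is
    about exp(-U), or exp(-2U) for dominant mutants, so the load is
    1 - exp(-U), which is close to U for small U.
    """
    if total_rate_per_gamete < 0:
        raise ValueError("mutation rate cannot be negative")
    effective = 2 * total_rate_per_gamete if dominant else total_rate_per_gamete
    return 1.0 - math.exp(-effective)


for u in (0.01, 0.1, 1.0):  # hypothetical total rates per gamete
    print(u, round(mutation_load(u), 4), round(mutation_load(u, dominant=True), 4))
```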

In a sexual population the mutation load can be 
greatly reduced if selection and recombination operate 
in such a way that several mutations can be eliminated 
at once. This can happen with synergistic epistasis. 
It is also accomplished by truncation selection, in 
which all individuals with more than a certain number 
of deleterious mutations are eliminated by selection. 
Although truncation selection can be practiced by 
breeders, it is unlikely that nature truncates. However, 
the truncation need not be exact and the approximate 
process is called quasi-truncation selection. It is 
almost as effective in mutation elimination as strict 
truncation and it is likely that in many populations 
selection is of this form. To the extent that this kind of 
selection occurs, the population can tolerate a much 
higher mutation rate without risk of greatly reduced 
fitness or possible extinction. In contrast, there is no 
such load reduction in asexual species, since there is 
no recombination to facilitate elimination of mutants 
in groups. The ability to tolerate a high mutation rate has been used as an argument for the value of sexual reproduction.
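The sense in which truncation selection removes "several mutations at once" can be illustrated numerically. This sketch assumes, as a modelling convenience not stated in the text, that the number of mutations per individual is Poisson-distributed, and applies strict truncation above a hypothetical threshold. Every individual removed carries more mutations than the threshold, so each selective death eliminates many mutations together.

```python
import math


def poisson_probabilities(mean: float, n_max: int) -> list[float]:
    """P(X = n) for n = 0..n_max, computed by the usual recurrence."""
    probs = [math.exp(-mean)]
    for n in range(1, n_max + 1):
        probs.append(probs[-1] * mean / n)
    return probs


def truncation_summary(mean_mutations: float, threshold: int) -> tuple[float, float]:
    """Strict truncation: remove everyone carrying more than `threshold` mutations.

    Returns (fraction of the population removed,
             average number of mutations carried by each removed individual).
    """
    n_max = int(mean_mutations + 20 * math.sqrt(mean_mutations) + 20)
    probs = poisson_probabilities(mean_mutations, n_max)
    removed = sum(p for n, p in enumerate(probs) if n > threshold)
    if removed == 0:
        raise ValueError("threshold too high: nobody is removed")
    carried = sum(n * p for n, p in enumerate(probs) if n > threshold)
    return removed, carried / removed


# Hypothetical population: 10 mutations per individual on average,
# with truncation above 12.
fraction, per_death = truncation_summary(10.0, 12)
print(f"{fraction:.3f} of individuals removed, {per_death:.1f} mutations each")
```

The sketch covers only one round of removal; it does not model the mutation-selection equilibrium or quasi-truncation.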


See also: Genetic Load; Haldane-Muller Principle
Like all mutations, a missense mutation is a heritable alteration in the sequence of the genetic material of an organism. At the DNA level, a missense mutation is one class of base pair substitution; it occurs in the portion of a protein-encoding gene that actually encodes the amino acid residues. Therefore, missense mutations cannot occur in genes that encode rRNA or tRNA, or in noncoding portions of the genome. Specifically, missense mutations are those that alter the base sequence in such a way that the final protein product of the mutated gene contains a different amino acid at a specific residue, relative to the wild-type protein. A missense mutation therefore changes one sense codon in an mRNA to another sense codon calling for a different amino acid. Note that because of the degeneracy of the genetic code, other base pair substitutions in the same region may change a codon to a synonymous codon that calls for the same amino acid. Mutations that result in a synonymous codon substitution are one class of silent mutations and are not generally considered missense mutations.
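The distinctions drawn above (missense versus synonymous, with nonsense changes as a separate case) can be checked against the standard genetic code. This is a minimal sketch with hypothetical function names. The GGG to AGG example illustrates a glycine-to-arginine change of the kind mentioned below for achondroplasia, but it is not claimed to be the actual human codon.

```python
BASES = "TCAG"
# Standard genetic code; codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    first + second + third: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, first in enumerate(BASES)
    for j, second in enumerate(BASES)
    for k, third in enumerate(BASES)
}


def classify_base_substitution(codon: str, position: int, new_base: str) -> str:
    """Classify a single-base change at `position` (0-2) of a coding-strand codon."""
    if position not in (0, 1, 2):
        raise ValueError("position must be 0, 1 or 2")
    mutant = codon[:position] + new_base + codon[position + 1:]
    if mutant == codon:
        raise ValueError("the new base is the same as the old one")
    before, after = CODON_TABLE[codon], CODON_TABLE[mutant]
    if before == after:
        return "silent (synonymous codon)"
    if after == "*":
        return "nonsense (stop codon)"
    if before == "*":
        return "stop codon lost"
    return f"missense ({before} -> {after})"


print(classify_base_substitution("GGG", 2, "A"))  # GGA: still glycine
print(classify_base_substitution("GGG", 0, "A"))  # AGG: glycine -> arginine
print(classify_base_substitution("TGG", 2, "A"))  # TGA: stop
```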

However, the change in phenotype brought about 
by a particular missense mutation depends very 
much on the nature and the location of the amino 
acid substitution. Some missense mutations are also 
phenotypically silent. This could be because the 
amino acid residue which is changed is not involved 
in the activity of the protein and the new residue 
does not interfere with the activity or folding of the 
protein. The mutation could also be phenotypically 
silent, or nearly so, because the amino acid substitu- 
tion is very conservative. These are examples of neu- 
tral mutations. Such mutations are typically 
discovered by sequencing, or by examining the activ- 
ity of alleles of a gene which has been the target of site- 
directed mutagenesis. 

However, many missense mutations do lead to an 
altered phenotype because the amino acid substitution 
does, in fact, affect the activity, folding, or assembly of the protein. Such missense mutations can be discovered through selection or screening techniques, or by examining alleles that are known to be associated with heritable genetic traits. For example,
almost all the causative mutations leading to achon- 
droplasia, a frequently occurring autosomal domin- 
ant form of dwarfism in humans, are missense 
mutations leading to the substitution of an arginine 
for a glycine at a particular residue in a fibroblast 
growth factor receptor. Missense mutations can 
also be constructed by site-directed mutagenesis. 
The mutant protein formed may have nearly normal 
activity or may be totally inactive, once again 
depending on the nature and the location of the 
amino acid substitution. 

In our discussion of missense mutations, and the 
silent mutations which yield synonymous codons, we 
are assuming that the presence or absence of pheno- 
typic change results only from a change in the primary 
sequence of the protein. However, different organisms 
use synonymous codons with very different efficiencies, which may result in changes in the level of product. In addition, of course, mutations which may be
‘silent’ at the level of translation may change RNA 
processing steps or messenger RNA stability, leading 
to phenotypic change which would be difficult to 
predict. 

Typically, a full-length, stable protein will be pro- 
duced from a gene having a missense mutation. Even if 
inactive, this protein can often be detected by anti- 
bodies to the normal protein and even purified, a fact 
which Yanofsky took advantage of in his classic 
experiments demonstrating the colinearity of the 
gene and the protein, using the protein encoded by 
the trpA gene of Escherichia coli.


See also: Base Substitution Mutations; 
Colinearity; Introns and Exons; Mutation; 
Mutation, Silent; Neutral Mutation; Phenotype; 
Sense Codon; Yanofsky, Charles 
A null mutation is one that entirely eliminates the
function of a gene, usually via deletion, so that there 
is no gene product. 


See also: Mutation 
Mutation rates describe the speed of the mutation process. In practice, one usually counts mutant organisms and total organisms and calls the ratio the mutation frequency, $f$. Theory is then used to calculate a mutation rate from the mutation frequency and the population history. The theory depends on the organism and on the way that the observations were conducted. The symbol $\mu$ will denote the mutation rate, often with subscripts to indicate which kind of rate is meant. Only rates of spontaneous mutation will be considered. $\mu$ has dimensions such as ‘per genome replication’ or ‘per sexual generation.’

The pitfalls in determining mutation rates are both experimental and theoretical. $f$ is often underestimated because some mutations produce no readily detectable change in the trait under observation, even though the large majority of mutations are deleterious and are eventually eliminated by natural selection. If mutations manifest themselves only some time after they arise, $f$ will be underestimated. If mutants arising in growing populations grow either faster or slower than nonmutants, $f$ will be estimated inaccurately. If either the topology of genome replication or the dynamics of population growth is insufficiently understood, the theory will not correctly connect $f$ and $\mu$.

Just as with any other heritable trait, mutation rates 
are highly evolved entities. Natural selection acts on 
mutations throughout the genome. As a result, the 
major regularities in rates of mutation are found among mutation rates per genome replication, $\mu_g$.
These regularities represent an evolutionary balance 
between the deleterious effects of the great majority 
of mutations, and the cost of further reducing the 
mutation rate. Beneficial mutations are very rare 
and do little to increase average mutation rates in a 
species. 

The riboviruses (simple RNA viruses such as polio- 
virus) have the largest rates of spontaneous mutation. 
These viruses reproduce by first repeatedly copying the 
infecting genome into complementary sequences, then 
repeatedly copying those complementary sequences 
back into sequences identical to that of the infecting 
genome. The final RNAs are then packaged into virus 
particles and released from the cell. Assuming that mutation rates are the same in both rounds of copying, this topology of reproduction leads to the equation $f = 2\mu_g$ for a single cycle of infection. When multiple cycles occur (the released viruses infecting other cells), the equation becomes $f = 2c\mu_g$ for $c$ rounds of infection, or $\mu_g = f/(2c)$. The characteristic value of a riboviral $\mu_g$ is about 0.76 per genome replication, or about 1.5 per infection cycle. The result is that a population of riboviruses is extremely heterogeneous, with sibling particles usually differing genetically. This high mutation rate contributes to the difficulty of developing an optimal antibody response. Even a small increase in this rate extinguishes the viral population because all particles soon accumulate deleterious mutations.

Retroelements consist of retroviruses and those 
transposons that alternate between genomes made of 
DNA and of RNA. In DNA form, retroelements in- 
habit the chromosomes of their hosts. These inserted 
elements are occasionally transcribed by RNA poly- 
merase, whereupon that transcript encodes proteins 
that include a reverse transcriptase (RT). Later the 
RT (which in the case of retroviruses is packaged 
into the viral particle) makes a DNA copy of the 
RNA genome, and then synthesizes the complemen- 
tary DNA strand. Finally, this double-stranded DNA 
inserts itself into a host chromosome, sometimes 
mutating a host gene in the process. Thus there are 
three successive rounds of copying, considering only 
the time when the retroelement is on its own and not 
passively replicating as part of the host chromosome. 
Although the mutation rate probably varies at each of these three steps, the relation between average mutation rate and observed mutant frequency is simply $\mu_g = f/3$. The characteristic value of a retroelement $\mu_g$ is about 0.15 per genome replication, roughly five times lower than for a ribovirus. Thus, although the HIV viruses that cause AIDS are sometimes touted as the most highly mutable of organisms, they are in fact less mutable than influenza viruses and the rhinoviruses that cause common colds.
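The two copying topologies described above differ only in the number of copying rounds per infection cycle (two for riboviruses, three for retroelements), so one small sketch covers both. It inverts $f = r\,c\,\mu_g$ for $r$ rounds per cycle and $c$ cycles. The numbers in the text ($\mu_g \approx 0.76$ giving about 1.5 per cycle) imply that $f$ here counts new mutations per genome, which is why it can exceed 1. The function names are hypothetical.

```python
# Copying rounds per infection cycle, as described in the text.
ROUNDS_PER_CYCLE = {"ribovirus": 2, "retroelement": 3}


def rate_per_replication(kind: str, mutant_frequency: float, cycles: int = 1) -> float:
    """Invert f = rounds * cycles * mu_g to obtain mu_g."""
    if cycles < 1:
        raise ValueError("need at least one infection cycle")
    return mutant_frequency / (ROUNDS_PER_CYCLE[kind] * cycles)


def mutations_per_cycle(kind: str, mu_g: float) -> float:
    """Expected new mutations per genome in one infection cycle."""
    return ROUNDS_PER_CYCLE[kind] * mu_g


# Characteristic per-replication rates quoted in the text.
for kind, mu_g in [("ribovirus", 0.76), ("retroelement", 0.15)]:
    print(kind, round(mutations_per_cycle(kind, mu_g), 2))
print("ratio of per-replication rates:", round(0.76 / 0.15, 1))
```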

The DNA-based microbes, including DNA 
viruses, archaea, bacteria, yeasts, and fungi, also dis- 
play a characteristic mutation rate. The theory relating 
$\mu$ to $f$ is complex for these organisms because of their exponential mode of replication, in which DNA molecules double in number in each replication. One simple formulation, valid for large populations in which many mutations have occurred, is $\mu = f/\ln(N\mu)$, where $N$ is the final population size. (Here, $\mu$ must be calculated by trial and error or by the computer equivalent.) The characteristic value of $\mu_g$ for DNA-based microbes is 0.0034, or one mutation per 300 genome replications or cell divisions. Unlike the riboviruses and retroelements, the DNA-based microbes vary greatly in the sizes of their genomes. In order to maintain a constant $\mu_g$, their mutation rate per average base or base pair, $\mu_b$, must vary inversely with genome size (Figure 1). The range of this variation is about 7000-fold, so that $\mu_b$ is vastly higher in a small DNA virus than in a fungus.
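Because $\mu = f/\ln(N\mu)$ has $\mu$ on both sides, it must be solved numerically. A minimal sketch of the "computer equivalent" uses bisection on $g(\mu) = \mu\ln(N\mu) - f$, which increases steadily for $N\mu > 1/e$ and so has a single root in the many-mutation regime the formula is meant for. The choice of bisection, the function names, and the example numbers are illustrative assumptions, and $f$ and $\mu$ are taken to refer to the same mutational target. The last function applies the inverse relation to genome size, $\mu_b = \mu_g/G$ for a genome of $G$ base pairs.

```python
import math


def solve_rate(mutant_frequency: float, final_population: float, tol: float = 1e-12) -> float:
    """Solve mu = f / ln(N * mu) for mu by bisection on [1/N, 1].

    g(mu) = mu * ln(N * mu) - f is -f at mu = 1/N and ln(N) - f at mu = 1,
    and it increases in between, so there is exactly one root when 0 < f < ln(N).
    """
    f, n = mutant_frequency, final_population
    if not 0.0 < f < math.log(n):
        raise ValueError("need 0 < f < ln(N) for a root in [1/N, 1]")
    lo, hi = 1.0 / n, 1.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if mid * math.log(n * mid) < f:
            lo = mid
        else:
            hi = mid
        if hi - lo < tol * hi:  # stop at a small relative bracket width
            break
    return 0.5 * (lo + hi)


def per_base_rate(rate_per_genome: float, genome_size_bp: float) -> float:
    """mu_b = mu_g / G: spread a per-genome rate over G base pairs."""
    return rate_per_genome / genome_size_bp


# Hypothetical numbers: mutant frequency 1e-7 in a final population of 1e9.
mu = solve_rate(1e-7, 1e9)
print(f"mu = {mu:.2e}")
# The text's mu_g of 0.0034 spread over a hypothetical 5-Mb genome:
print(f"mu_b = {per_base_rate(0.0034, 5e6):.1e} per base pair")
```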

Mutations in DNA-based microbes that raise the 
mutation rate produce ‘mutator mutants.’ Because the 
standard mutation rate is low, mutators are often 
viable. In the race to adapt mutationally to new envir- 
onments, these mutators sometimes win. However, 
they are clearly disadvantaged in the long run, and 
must either mutate back to the standard rate or be 
overtaken by a nonmutator competitor. 

In higher eukaryotes, mutation rates can be ex- 
pressed in different ways. Their dimensions can be per 
sexual generation, or per cell division along the cell lin- 
eage that generates the gametes. Higher eukaryotic 
genomes are often heavily loaded with nongenic 
DNA, including intergenic DNA and introns, in which most mutations have little deleterious effect. Mutation rates tend to vary greatly depending on how many germline cell divisions occur between generations and how much extraneous DNA is present. The least variable rate is $\mu_{eg}$, the rate per germline cell division (including the more flexible plant equivalent) per effective genome, the latter being that fraction of the genome in which most mutations are deleterious. The characteristic value of $\mu_{eg}$ is roughly 0.01, with perhaps twofold uncertainty. In mammals, where the
number of germline cell divisions is large and occurs 
mostly in the male, the mutation rate per sexual gen- 
eration is roughly 1. Thus, any general and sustained 
increase in the human mutation rate would be likely to 
extinguish the species. 
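The link between the per-division and per-generation rates is a multiplication by the number of germline cell divisions per generation. A back-of-envelope sketch, assuming that both figures in the text (about 0.01 per division and about 1 per generation) refer to the effective genome: taken at face value, they imply on the order of 100 germline divisions per generation. This is a consistency check, not a measured value, and the 50-division lineage in the example is hypothetical.

```python
def rate_per_generation(rate_per_division: float, germline_divisions: float) -> float:
    """Mutations per effective genome per sexual generation."""
    return rate_per_division * germline_divisions


def implied_divisions(rate_per_gen: float, rate_per_division: float) -> float:
    """Germline divisions per generation implied by the two rates."""
    return rate_per_gen / rate_per_division


print(round(implied_divisions(1.0, 0.01)))  # divisions implied by ~1 and ~0.01
print(rate_per_generation(0.01, 50))        # hypothetical lineage with 50 divisions
```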

At a far finer scale, regularity gives way to irregu- 
larity. At each of the scores to thousands of base pairs 
in a gene, mutation rates vary by factors as large as 
1000-fold. The most mutable sites are called muta- 
tional hotspots. Some of these reside in repeating 
sequences, such as AAAA or AGAGAG, within 
which the replication apparatus frequently slips (gen- 
erating, for instance, AAAAA or AGAG). Other hot- 
spots are prone to base substitutions, for reasons that 
are only imperfectly glimpsed at present. 




See also: Evolutionary Rate; Mutation; Mutation, 
Spontaneous
A silent mutation is a change in the nucleotide sequence of a gene that does not alter the amino acid sequence of the encoded protein. Usually, this is a change from one codon to a different, synonymous codon (for example, from GGG to GGA, both of which encode glycine). Changes in untranslated regions of a gene, such as introns, may also be silent mutations.


See also: Mutation, Spontaneous; Neutral 
Mutation 
Mutations are permanent changes in the sequence of the heritable genetic material (DNA or RNA) of a cell or organism. Spontaneous mutations are those mutations that occur in the absence of an exogenous chemical or physical agent. Spontaneous mutations commonly arise from errors made during DNA replication. The fidelity of DNA replication is normally high due to the combination of base incorporation specificity, proofreading, and postreplicative repair. DNA polymerase inserts the incorrect base at a rate of one per $10^4$ to $10^5$ bases replicated. Many DNA
polymerases also have an associated exonuclease that 
recognizes mispaired bases and ‘proofreads’ the newly 
synthesized DNA. If a mispaired base is detected, the 
newly synthesized DNA is removed and the polymer- 
ase resynthesizes the region. 

Proofreading increases the fidelity of DNA repli- 
cation 10- to 200-fold. Enzymes that recognize and 
repair mismatches after replication further increase 
replication fidelity 10- to 1000-fold. Despite these 
mechanisms, some replication errors remain uncorrected and give rise to spontaneous mutation at a rate of about one error per $10^9$ to $10^{10}$ bases replicated.
Spontaneous mutations may also result from en- 
dogenous DNA damage. For example, methylcyto- 
sine can spontaneously deaminate to thymine; if the 
thymine is not removed from the DNA it will pair 
with A, creating a G:C to A:T mutation. Some spon- 
taneous mutations may be due to error-prone poly- 
merases replicating damaged or undamaged DNA. 
Spontaneous mutations also result from the move- 
ment of mobile genetic elements such as transposons 
and viruses, and from large-scale chromosomal re- 
arrangements. 
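The three fidelity steps described above act in series, so their effects multiply: the overall error rate is the insertion error rate divided by the proofreading and mismatch-repair improvement factors. A minimal sketch using the ranges quoted in the text (with the insertion error read as $10^{-4}$ to $10^{-5}$ per base) gives the extremes of the combined range; the quoted spontaneous rate of $10^{-9}$ to $10^{-10}$ falls toward its accurate end. The function name is hypothetical.

```python
def overall_error_rate(insertion_error: float, proofreading_fold: float,
                       repair_fold: float) -> float:
    """Errors per base replicated after three fidelity steps acting in series."""
    return insertion_error / (proofreading_fold * repair_fold)


least_accurate = overall_error_rate(1e-4, 10, 10)    # weakest end of every range
most_accurate = overall_error_rate(1e-5, 200, 1000)  # strongest end of every range
print(f"between {most_accurate:.0e} and {least_accurate:.0e} errors per base")
```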


See also: DNA Repair; DNA Replication 
The connection between protein structure and function is a central question in biology, and mutational
analysis has proven to be a powerful method for 
describing structure-function relationships. In this 
approach, mutant proteins are either identified from natural systems or synthesized through in vitro mutagenesis; characterization of these mutant proteins
reveals the amino acids required for proper protein 
function. Functions analyzed by mutation include 
protein stability, molecular recognition, enzymatic 
activity, and drug susceptibility. The term ‘mutational 
analysis’ can also be used to describe the process of 
characterizing mutations by DNA sequencing or 
hybridization studies; these topics are discussed else- 
where in this volume. Mutational analysis using 
in vitro evolution techniques is also discussed in a 
separate article. 

The following systems exemplify important prin- 
ciples of mutational analysis, including experimental 
strategies and biological consequences of mutagenesis. 
In many cases, the functional information provided by 
mutational analysis is combined with structural infor- 
mation obtained through X-ray crystallography and 
nuclear magnetic resonance spectroscopy (NMR) to 
generate a very detailed picture of how a protein works. 
However, mutational data on its own can provide 
insight into the structure of proteins, such as trans- 
membrane receptors, which are not amenable to high- 
resolution structure determination by direct methods. 


Naturally Occurring Mutations and
Tumorigenesis: p53 


Analysis of disruptive mutations can provide import- 
ant information about the molecular mechanisms of 
diseases such as cancer. The tumor suppressor protein 
p53 plays a critical role in preventing transformation 
of normal cells into cancerous cells, and mutations 
in p53 leave tissues vulnerable to tumor formation. 
In fact, approximately 50% of human tumors contain 
mutations in p53, and over 10 000 such mutations have 
been documented. p53 has three protein domains, one 
each for activation, sequence-specific DNA binding, 
and self-association (p53 is a tetramer). The large 
majority of p53 mutations found in tumors are single 
amino acid substitutions in the 200-residue DNA- 
binding domain; most of these mutations have been 
found to reduce sequence-specific DNA binding. 
Three observations can be made about DNA 
damage and carcinogenesis from mutational analysis 
of p53 genes isolated from tumors. First, the p53 
mutations found in different tissues may be correlated 
with the cause of cancer. For example, G to T transver- 
sions are common in smoking-associated lung cancers, 
likely due to chemical reaction of DNA with the 
polyaromatic hydrocarbons (PAHs) found in tobacco 
smoke. Second, carcinogenic mutations show evi- 
dence of selection. For instance, one-fourth of all 
mutations found in p53 are C to T transitions arising from deamination of 5-methylcytosine at CpG sequences. These mutations are isolated with very different frequencies, however, being 100-fold more common at codon 273 (see below) than at codon
202. Third, several common sites of mutation, called 
mutational hot spots, are involved in sequence- 
specific DNA binding. This observation can be made 
by comparing mutational data with an X-ray structure 
of the DNA-binding domain bound to an oligo- 
nucleotide containing the p53 recognition sequence. 
Structure-function analysis indicates that arginine 
273, found in ~9.6% of cancers, binds directly to 
the DNA and also plays a structural role in orienting 
another DNA-binding residue (arginine 280). In sum- 
mary, analysis of mutations has helped to characterize 
the DNA-binding function of p53, the link between 
p53 mutation and disease, and the mechanism of 
mutagenesis in different cancers. 


Enzyme Activity and Engineering: 
Protease Enzymes 


Proteases are an abundant class of enzymes which 
catalyze the cleavage of peptide bonds in proteins and peptides
(proteolysis). Proteases are divided into four classes — 
serine, cysteine, aspartic, and metallo — based on the 
structure of the enzyme’s active site. For each class, 
extensive mutational analysis, coupled with structural 
data and kinetic measurements, has helped determine 
the mechanisms of catalysis and inhibition as well as the
origins of substrate specificity. 


Catalytic Mechanism 

The catalytic mechanism of the serine protease sub- 
tilisin involves a precise interplay between the active 
site residues Asp32, His64, and Ser221 (Figure 1). Mutation of any of these residues to alanine reduces enzyme activity $10^4$- to $10^6$-fold, demonstrating the importance of the ‘catalytic triad.’ However, subtilisin
variants containing a mutated catalytic triad are still 
weakly active; these mutants function by binding to 
the tetrahedral intermediate formed during catalysis. 
Mutational analysis indicates that the NH group on 
asparagine 155 mediates the stabilization of this inter- 
mediate by hydrogen bonding to the oxyanion; muta- 
tion of Asn155 to several different residues reduces the 
activity of the protease 100-2000-fold. 
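Fold reductions in activity are often easier to compare once converted to free energies. The sketch below assumes 25 °C and that the fold reduction in activity reflects a change in activation free energy, ΔΔG‡ = RT ln(k_wt/k_mut); the function name is ours. On these assumptions a 100- to 2000-fold effect corresponds to roughly 2.7 to 4.5 kcal mol^-1, and a 10^6-fold effect to about 8.2 kcal mol^-1.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal mol^-1 K^-1


def ddg_from_fold(fold_reduction: float, temp_k: float = 298.15) -> float:
    """Free-energy cost (kcal/mol) implied by a fold reduction in activity.

    Assumes the activity ratio reflects a change in activation free energy,
    so ddG = RT * ln(k_wild_type / k_mutant).
    """
    if fold_reduction <= 0:
        raise ValueError("fold reduction must be a positive ratio")
    return R_KCAL * temp_k * math.log(fold_reduction)


# Illustrative fold reductions
for fold in (1e2, 2e3, 1e4, 1e6):
    print(f"{fold:9.0e}-fold -> {ddg_from_fold(fold):.1f} kcal/mol")
```

Because the relationship is logarithmic, each additional 10-fold loss of activity adds only about 1.4 kcal mol^-1.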


Substrate Specificity 

The binding interactions between proteases and pep- 
tides have been characterized by mutational analysis 
of both the enzyme and the substrate. A protease 
recognizes the N-terminal side of a peptide substrate 
using a series of four pockets, named S1-S4, that bind 
to the substrate residues, named P1-P4. P1 is adjacent to the cleaved (scissile) bond. In general, it has been difficult to predict the specificity changes in a protease resulting from mutations in the S1-S4 pockets. In one
notable exception, subtilisin, which has weak sub- 
strate selectivity, was converted to a highly selective 
protease with the sequence specificity of the mamma- 
lian protease furin. By installing acidic residues in the 
S1, S2, and S4 subsites, a mutant was generated that 
was selective for furin-like substrates having basic 
residues at P1, P2, and P4. 

Substrate mutations are often used to characterize 
enzyme-substrate interactions. Typically, substrate 
selectivity is determined by comparing biological sub- 
strates and then measuring the activity of synthetic 
peptides containing the consensus sequence(s). Positional scanning mutagenesis has recently been described as an efficient method for determining optimal protease substrates (Figure 1B). A positional scanning library of tetrapeptides comprises four sublibraries in which one position in the peptide is fixed and the other three positions contain mixtures of all 20 amino acids. This method was first demonstrated for the cysteine protease interleukin-1β converting enzyme (ICE). For ICE, aspartic acid was already known to be required at P1, and screening showed maximal activity for histidine at the P2 position, glutamate at P3, and tryptophan at P4. The consensus peptide, Trp-Glu-His-Asp, is a very active substrate and a scaffold for potent inhibitors of ICE. Interestingly, this optimal substrate does not have the same sequence as the only known in vivo substrate. Thus, a
thorough mutational analysis can uncover new sub- 
strates as well as novel inhibitors. 
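The library arithmetic can be checked with a short sketch. It assumes that each well fixes one residue at the scanned position and uses equimolar mixtures of all 20 amino acids at the other scanned positions. Holding P1 constant, as with Asp for ICE, reproduces the 400 compounds per well and 8000 per sublibrary given in the figure caption; that reading is ours rather than a stated detail. The function names and the activity dictionary format are hypothetical.

```python
def sublibrary_sizes(length: int, pre_fixed: int = 0, alphabet: int = 20):
    """Return (sublibraries, wells per sublibrary, compounds per well,
    compounds per sublibrary) for a positional scanning peptide library.

    length    -- peptide length (4 for a P4-P1 tetrapeptide)
    pre_fixed -- positions held constant in every compound (e.g. Asp at P1)
    alphabet  -- number of amino acids used at each position
    """
    scanned = length - pre_fixed   # each scanned position gets its own sublibrary
    mixed = scanned - 1            # in every well, the other scanned positions are mixtures
    per_well = alphabet ** mixed
    return scanned, alphabet, per_well, alphabet * per_well


def deconvolute(activity: dict) -> dict:
    """Pick the fixed residue giving the highest activity at each position.

    activity maps a position label to {residue: measured rate}.
    """
    return {pos: max(rates, key=rates.get) for pos, rates in activity.items()}


print(sublibrary_sizes(4))               # (4, 20, 8000, 160000)
print(sublibrary_sizes(4, pre_fixed=1))  # (3, 20, 400, 8000)
```

Deconvolution then reads off the best residue per position independently, which is why the consensus peptide is assembled position by position; this assumes the positions contribute roughly independently to cleavage.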


Fighting Drug Resistance: HIV Protease 

HIV protease, an aspartyl protease, is the target of several important anti-HIV drugs; however, drug-resistant mutants of the protease have emerged. Mutational analysis of clinical isolates has uncovered some of the mechanisms of induced drug resistance. In several cases, HIV protease has acquired active-site mutations that weaken its binding to the inhibitor.
A commonly found active-site mutation, valine 82 to 
alanine (V82A), results in ~10-fold reduced binding 
of three marketed drugs. In addition to active-site 
mutations, many clinical isolates contain mutations 
distant from the active site. These mutations can effect 
drug resistance by reducing drug binding, increasing 
enzyme activity, or acting synergistically with active- 
site mutations. One example of a synergistic mutation 
is leucine 63 to phenylalanine, which partially restores 
the function of the handicapped active-site mutant 
V82F/I84V. X-ray crystallographic studies are be- 
ginning to provide details about the structural 
consequences of such drug-resistant mutations. Even- 
tually, structure-function studies of HIV protease may 
speed discovery of new therapies by suggesting ways to 
circumvent mutationally induced drug resistance. 
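The variant names used here follow the one-letter convention: V82A means that valine at residue 82 is replaced by alanine, and V82F/I84V denotes a double mutant. A small parser makes the convention explicit. It is a hypothetical helper, and no HIV protease sequence is included, so apply_variant is only meant for toy input.

```python
import re
from typing import List, NamedTuple

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
_TOKEN = re.compile(rf"^([{AMINO_ACIDS}])(\d+)([{AMINO_ACIDS}])$")


class Substitution(NamedTuple):
    wild_type: str  # one-letter code of the original residue
    position: int   # residue number
    mutant: str     # one-letter code of the replacement


def parse_variant(label: str) -> List[Substitution]:
    """Parse 'V82A' or a compound variant such as 'V82F/I84V'."""
    subs = []
    for token in label.split("/"):
        match = _TOKEN.match(token.strip())
        if match is None:
            raise ValueError(f"not a single-letter substitution: {token!r}")
        wt, pos, mut = match.groups()
        subs.append(Substitution(wt, int(pos), mut))
    return subs


def apply_variant(sequence: str, subs: List[Substitution], start: int = 1) -> str:
    """Apply substitutions to a sequence, checking each wild-type residue."""
    residues = list(sequence)
    for s in subs:
        i = s.position - start
        if residues[i] != s.wild_type:
            raise ValueError(f"expected {s.wild_type} at {s.position}, found {residues[i]}")
        residues[i] = s.mutant
    return "".join(residues)
```

A synergistic combination such as the one described above would simply be written as a three-part label, for example "L63F/V82F/I84V".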


Scanning Mutagenesis 


Scanning mutagenesis involves the sequential muta- 
tion of a series of protein residues. This method sur- 
veys functionally important regions of a protein and 
requires minimal structural information. Once the 
protein sequence has been mapped by scanning muta- 
genesis, interesting regions can be analyzed in more 
detail by saturation mutagenesis, in which a single site 
is mutated to several residues. 
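The two-stage strategy (scan first, then saturate interesting sites) can be sketched as variant generation on a protein sequence. The helper names and the toy sequence are hypothetical, and a real experiment would also need codon-level primer design, which is not shown.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def alanine_scan(sequence: str, start: int = 1, skip: str = "A"):
    """Yield (label, mutant_sequence) for every single alanine substitution.

    Positions whose residue is in `skip` are left out. By default only
    alanine itself is skipped; excluding other residues (such as Gly or Pro)
    is a design choice left to the caller.
    """
    for i, wt in enumerate(sequence):
        if wt in skip:
            continue
        yield f"{wt}{i + start}A", sequence[:i] + "A" + sequence[i + 1:]


def saturation(sequence: str, position: int, start: int = 1) -> dict:
    """All 19 single substitutions at one site, keyed by variant label."""
    i = position - start
    wt = sequence[i]
    return {
        f"{wt}{position}{aa}": sequence[:i] + aa + sequence[i + 1:]
        for aa in AMINO_ACIDS
        if aa != wt
    }
```

The scan grows linearly with protein length, while saturation at a single site always yields 19 variants, which is why the cheaper scan is used to choose where saturation is worth doing.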


Alanine Scanning: Human Growth Hormone

Alanine scanning mutagenesis is often used to map
protein-protein interfaces in hormone-receptor and 
antibody-antigen systems. Alanine scanning is a sub- 
tractive technique, since it identifies important resi- 
dues by replacing larger side-chain functional groups 
with small methyl groups. It is important to note that 
while side-chain interactions are directly studied by 
this method, amino acid substitutions can have signifi- 
cant effects on local backbone structure as well. 


Figure 1 Protease mechanism. (A) Top: mechanism of the serine protease subtilisin. Panel 1 shows the catalytic triad Asp-His-Ser and a generic peptide substrate. Ser221 is activated by hydrogen bonding to His64, which is also hydrogen-bonded to Asp32. In the first step of catalysis, Ser221 attacks the peptide carbonyl, forming a tetrahedral intermediate (panel 2) which collapses to a covalent acyl-enzyme intermediate (panel 3). In the second step, a water molecule, activated by hydrogen bonding to His64, attacks the covalent intermediate to yield the hydrolyzed product (panel 4). Bottom: mutagenesis of subtilisin active-site residues. Sequential mutation of the catalytic triad shows dramatic reductions in catalytic activity. Asn155, while not part of the catalytic triad, is involved in stabilizing the tetrahedral intermediate (panel 2). (B) Positional scanning mutagenesis of substrates for ICE. The positional scanning library contains a sublibrary for each position in the peptide substrate. X denotes a mixture of 20 amino acids and O represents the fixed position. Each sublibrary contains 20 wells, one for each amino acid at the fixed position. Each well contains 20 × 20 = 400 compounds, for a total of 8000 compounds per sublibrary. As shown here, peptides are written from the N-terminus (P4) to the C-terminus (P1). The C-terminus carries a coumarin group that becomes fluorescent when cleaved off by the enzyme. (Adapted from Rano et al., 1997.)

The human growth hormone-growth hormone
receptor (hGH-hGHbp) complex has been character- 
ized in detail using alanine-scanning mutagenesis. 
Before structural information was available for this 
complex, a scan of 49 residues on hGHbp predicted 
the hGH binding site. Subsequent alanine scans, 
guided by X-ray crystallography, focused on hormone 
and receptor residues at the protein interface (Figure 2). 
Four conclusions have been drawn from mutagen- 
esis of hGH-hGHbp and other protein-protein sys- 
tems. First, protein-protein interfaces often have a 
‘hot spot,’ a subset of residues at the structural inter- 
face which confer most of the binding energy. In the 
hGH-hGHbp complex, ~30 residues of each protein 
are buried upon binding, but only eight residues 
on each face provide ~85% of the binding energy. 
Second, hot spots on each side of the interface are 
complementary, since important residues on each
side of the interface interact with each other. Third, 
hot spots tend to have hydrophobic residues in the 
center and hydrophilic residues around the edges. 
Finally, reductions in binding tend to result from per- 
turbations in the dissociation, rather than the associ- 
ation, of the protein-protein complex. 
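The hot-spot idea can be made concrete by ranking alanine-scan results and asking what share of the total binding-energy loss the strongest residues carry. The 1.5 kcal mol^-1 cut-off follows the shading in Figure 2; summing single-mutant values assumes the effects are roughly additive, which is an approximation. The residue labels and values below are hypothetical placeholders, not measured hGH data.

```python
def hot_spot_summary(ddg: dict, threshold: float = 1.5):
    """Rank alanine-scan results and estimate the share of binding energy
    carried by hot-spot residues.

    ddg maps a residue label to the loss in binding free energy (kcal/mol)
    on mutation to alanine. Residues above `threshold` count as hot spots.
    Summing single-mutant values assumes the effects are roughly additive.
    """
    total = sum(v for v in ddg.values() if v > 0)  # ignore mutations that improve binding
    hot = sorted((r for r, v in ddg.items() if v > threshold), key=lambda r: -ddg[r])
    share = sum(ddg[r] for r in hot) / total if total else 0.0
    return hot, share


# Hypothetical values for illustration only
hot, share = hot_spot_summary({"X1": 2.0, "X2": 3.0, "X3": 0.5, "X4": 0.5})
print(hot, round(share, 2))  # ['X2', 'X1'] 0.83
```

Applied to a full interface scan, a result like "eight residues give ~85% of the binding energy" is exactly this share computed over the buried residues.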


Homolog and Cysteine Scanning 

Other scanning mutagenesis techniques have been 
helpful in characterizing proteins for which little 
structural data is available. Homolog scanning, for 
example, requires two closely related proteins with 
different binding specificities, such as two hormones 
or two DNA-binding proteins. Short sequences of 
one protein are grafted onto the homologous regions 
of the other; sequences which alter the binding of the second protein to its receptor are likely to be part of the protein-protein (or protein-DNA) interface.


Figure 2 Alanine-scanning mutagenesis of the hGH-hGHbp 1:1 complex. The structures shown are space-filling models of hGH (right) and hGHbp (left) generated from the X-ray crystal structure of the complex. The two molecules are separated to show the complementary effects of single alanine substitutions on the binding affinity of the complex. The loss in binding energy upon alanine mutation is shown by shading, with ΔΔG > 1.5 kcal mol^-1 in dark gray and < 1.5 kcal mol^-1 in light gray. (Adapted from Clackson and Wells, 1995.)

Another method, cysteine scanning, has been used to 
predict the conformations of membrane-bound pro- 
teins, such as the bacterial aspartate receptor. In this 
technique, pairs of cysteine mutants are tested for 
their ability to form covalent disulfide bonds. If two 
cysteines can react with each other, they are probably 
nearby in the folded protein structure. Furthermore, 
the activity of the disulfide-linked proteins gives 
information about the conformational requirements 
for protein function. 
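Homolog scanning can be pictured as building one chimera per grafted segment. This sketch assumes that the two homologs are already aligned without gaps and uses fixed-size windows; real designs usually choose segments where the sequences differ. All names are hypothetical.

```python
def homolog_scan(host: str, donor: str, segments) -> dict:
    """Build homolog-scanning chimeras.

    host, donor -- pre-aligned sequences of equal length (gapped alignments
                   are not handled in this sketch)
    segments    -- (start, end) pairs, 1-based and inclusive
    Each chimera is `host` with one segment replaced by the donor's residues.
    Segments identical in both proteins are skipped, since swapping them
    changes nothing.
    """
    if len(host) != len(donor):
        raise ValueError("sequences must be aligned to equal length")
    chimeras = {}
    for start, end in segments:
        i, j = start - 1, end
        chimera = host[:i] + donor[i:j] + host[j:]
        if chimera != host:
            chimeras[f"{start}-{end}"] = chimera
    return chimeras


def windows(length: int, size: int):
    """Consecutive non-overlapping segments covering a sequence."""
    return [(s, min(s + size - 1, length)) for s in range(1, length + 1, size)]
```

Each chimera is then assayed for binding; segments whose graft changes binding mark candidate interface regions, as described above.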


Protein Stability 


The structure of a protein determines its function, and 
it is therefore important to understand how a stable 
protein structure is formed. Protein stability studies 
often start with an alanine scan of the whole protein 
followed by saturation mutagenesis of important residues. Several careful studies, involving some 130 mutations (for the 110-residue protein barnase), have
sought to define the rules governing secondary struc- 
ture and the kinetics of protein folding. While our 
understanding of protein structure is still rudimentary, 
these studies have provided a guideline for generating 
hyperstable proteins and even for designing new pro- 
teins de novo. 

The importance of side-chain functionality in pro- 
tein tertiary structure has been analyzed for the 
P22 arc repressor, a 53-residue homodimer. Approxi- 
mately half of the residues in arc repressor are 
only mildly destabilizing when mutated to alanine. 
Strongly destabilizing alanine mutations tend to be 
in the interior of the protein, where hydrophobic 
residues are well packed and polar groups are involved 
in desolvated hydrogen bonds and salt bridges. These 
buried polar interactions can be very stabilizing. 
Mutagenesis of a buried salt bridge involving arginine 31, glutamate 36, and arginine 40 suggests that the interaction between Arg31 and Glu36 stabilizes the protein by 1.7 kcal mol^-1 relative to alanines at these positions, while the interaction between Glu36 and Arg40 provides 4.3 kcal mol^-1 of stability. In contrast, solvent-exposed salt bridges usually provide less than 0.5 kcal mol^-1 of stabilization energy because the specific interaction has to compete with water. Hydrophobic residues can be even more stabilizing in the protein core; random mutagenesis and selection at the same three positions recovered a triple mutant, containing methionine, tyrosine, and leucine, which was 4 kcal mol^-1 more stable than the wild-type protein.
This improved stability was due to an increased rate of dimerization for the mutant, where the hydrophobic residues pay a smaller penalty for desolvation relative to the wild-type hydrophilic residues. This example demonstrates that mutational analysis improves our understanding of how native proteins are stabilized and suggests how more stable mutants may be designed.
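Statements such as "stabilizes the protein by 1.7 kcal mol^-1 relative to alanines" are typically derived from a double-mutant cycle, which compares the wild type, both single alanine mutants, and the double mutant. The text does not spell out the calculation, so the sketch below shows the standard cycle under that assumption, with hypothetical stabilities. For a three-residue network such as Arg31-Glu36-Arg40, each pair would be analyzed with its own cycle.

```python
def coupling_energy(dg_wt: float, dg_a: float, dg_b: float, dg_ab: float) -> float:
    """Interaction (coupling) energy from a double-mutant cycle, kcal/mol.

    Each argument is a stability (free energy of unfolding, positive = stable)
    for the wild type, the two single mutants, and the double mutant.
    The result equals dg_wt - dg_a - dg_b + dg_ab: the part of the
    wild-type stability that depends on both side chains being present.
    A positive value means the interaction between them is stabilizing.
    """
    return (dg_wt - dg_a) - (dg_b - dg_ab)


# Hypothetical stabilities (kcal/mol), not arc repressor data
print(coupling_energy(5.0, 3.0, 3.5, 3.0))
```

If the two side chains did not interact, removing one would cost the same whether or not the other were present, and the coupling energy would be zero.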


Prospects 


In vitro mutational analysis has helped define the 
mechanisms of enzymes, the binding sites of hor- 
mones, and the energetics of protein folding. An accur- 
ate description of how proteins function, and why 
they malfunction, will be critical for fighting diseases 
such as cancer and for circumventing drug resistance 
by infectious agents. Furthermore, insight gained 
through mutational analysis directs the synthesis 
of proteins with new functions and inhibitors with 
improved efficacy. 


Further Reading 

Ballinger MD, Tom J and Wells JA (1996) Furilisin: a variant of 
See also: Artificial Selection; DNA Hybridization;
DNA Sequencing; In vitro Evolution; In vitro 
Mutagenesis; P53 Gene; Proteins and Protein 
Structure 
‘Mutational site’ is the term for the smallest segment of a gene whose alteration can produce a mutant phenotype. In terms of DNA structure, a mutational site corresponds to a base pair. If two mutants can recombine with each other (usually detected by the production of wild-type recombinants), their mutations are at distinct mutational sites. A mutant that cannot recombine with either of those two mutants is a multisite mutant, such as a deletion.
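The recombination test can be expressed with a simple interval model in which each mutant alters a run of base pairs. This is an idealization we assume: it ignores the very low recombination frequency between adjacent sites and treats any overlap as blocking wild-type recombinants. Coordinates and names are hypothetical.

```python
from typing import NamedTuple


class Mutant(NamedTuple):
    name: str
    start: int  # first altered base pair
    end: int    # last altered base pair (equal to start for a point mutation)

    @property
    def multisite(self) -> bool:
        """True when more than one base pair is altered."""
        return self.end > self.start


def can_recombine(a: Mutant, b: Mutant) -> bool:
    """Simple interval model: wild-type recombinants can arise only if the
    altered segments of the two mutants do not overlap."""
    return a.end < b.start or b.end < a.start


m1 = Mutant("m1", 1200, 1200)
m2 = Mutant("m2", 1350, 1350)
d1 = Mutant("d1", 1100, 1400)  # a deletion spanning both point mutations
print(can_recombine(m1, m2), can_recombine(d1, m1), can_recombine(d1, m2))
```

In this model the two point mutants recombine, while the deletion fails to recombine with either of them, which is the pattern that identifies it as a multisite mutant.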


See also: Heteroallele; Multisite Mutation 
The mutator phenotype refers to the effect of mutations that destabilize the genome and lead to an increase in mutation rate. A variety of genes with many
different cellular functions may increase mutation rate 
to some extent; however, the genes most frequently 
involved are the families of genes required for repairing 
damaged DNA or for maintaining chromosomal stabil- 
ity. Mutations that give rise to a mutator phenotype 
tend not to be transforming or lethal for a cell, but they result in an inability of the cell to repair acquired
damage that affects its genome or to maintain the 
integrity of its genetic material. Over time, this results 
in the accumulation of mutations that damage DNA 
and can contribute to inactivation of tumor suppressor 
genes or activation of oncogenes. As a result of the 
accumulation of gene damage, phenotypes such as 
cancer arise. Genes that are associated with a mutator 
phenotype fall into one of several groups, depending 
on the type of genomic damage with which they are 
associated. 

Mismatch repair (MMR) genes encode proteins 
responsible for repairing errors that occur during the 
normal replication of DNA. As new DNA strands are 
synthesized, errors such as insertion of an incorrect 
(mismatched) base or small loops of DNA may occur. 
The MMR proteins recognize these errors and form a 
protein complex which first removes the mismatched 
bases and then corrects the sequence. At least six 
MMR genes are known to contribute to these repair
complexes in humans: MSH2, MLH1, PMS2, PMS1, 
MSH6, and MSH3. Each of these genes functions as an 
autosomal recessive at the cellular level. The inactiva- 
tion of both alleles of even one of these genes can 
prevent the cell from repairing mistakes in the normal 
replication process. As a result, there is a rapid accu- 
mulation of somatic mutations with each round of 
DNA replication, resulting in genomic instability 
and errors in other cancer-related genes. Inherited 
mutations of the MMR genes, and particularly of 
MSH2 and MLH1, are found in patients with heredi- 
tary nonpolyposis colon carcinoma (HNPCC). This 
is an autosomal dominant form of colon carcinoma 
which may also be associated with tumors of the 
endometrium, stomach, ovaries, and other tissues. 
Loss of MMR genes in HNPCC can result in a phenotype termed ‘microsatellite instability.’ Throughout
the genome there are many regions where simple 
repeated sequences occur. These are very prone to 
changes in repeat copy number during DNA replica- 
tion. MMR deficiency, as seen in HNPCC, is often 
associated with instability of these repeat numbers, 
because insertion or deletion of repeat copies is 
not recognized and repaired. Thus, microsatellite 
instability is frequently used as a marker for a mutator 
phenotype. 
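As a sequence-level illustration, microsatellite instability can be viewed as a change in repeat copy number at the same locus between normal and tumor DNA. Clinical testing typically sizes PCR products from a panel of markers rather than comparing sequences, so the functions below are a hypothetical sketch of the idea only.

```python
import re


def repeat_copies(seq: str, unit: str) -> int:
    """Length, in copies of `unit`, of the longest uninterrupted run in `seq`."""
    unit = unit.upper()
    runs = re.finditer(f"(?:{re.escape(unit)})+", seq.upper())
    return max((len(m.group(0)) // len(unit) for m in runs), default=0)


def shows_instability(normal: str, tumor: str, unit: str) -> bool:
    """Flag a microsatellite whose repeat count differs between matched
    normal and tumor sequences of the same locus."""
    return repeat_copies(normal, unit) != repeat_copies(tumor, unit)


normal = "G" + "A" * 7 + "C"  # a mononucleotide run of seven A's
tumor = "G" + "A" * 5 + "C"   # two copies lost, as after an unrepaired slippage
print(repeat_copies(normal, "A"), repeat_copies(tumor, "A"), shows_instability(normal, tumor, "A"))
```

The same comparison applies to dinucleotide units such as "CT"; a locus is scored as unstable only by its change in copy number, not by any other sequence difference.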

Nucleotide excision repair (NER) genes are 
responsible for repairing damage to DNA caused by 
exogenous agents such as chemicals, UV light, or 
ionizing radiation. NER proteins recognize the 
damaged DNA, excise the error, and repair the DNA 
strand. Loss of the proteins required to identify or 
repair these errors can lead to increased sensitivity to 
many agents that may damage DNA. In xeroderma 
pigmentosum (XP), patients develop frequent skin 
tumors because they are deficient in one of the pro- 
teins required for repair of the sequence errors in their 
DNA caused by UV light. Cockayne syndrome 
patients are deficient in a distinct group of genes 
which specifically repair damage to actively tran- 
scribed genes, making these individuals very sensitive 
to UV light and other DNA-damaging agents. Patients 
with Cockayne syndrome have a distinctive pheno- 
type, including growth failure and developmental 
deterioration, but do not have an increased risk of 
cancer. 

The mutator phenotype may also be associated 
with chromosomal instability such as that observed 
in Bloom syndrome and Fanconi anemia. Individuals 
with these syndromes have very high recombination 
rates, which result in increased chromosomal re- 
arrangements and chromosome breakage due to loss of 
genes involved in stabilizing chromatin. Patients with 
Bloom syndrome have an increased risk of leukemia 
and intestinal cancer. Fanconi anemia patients have reduced numbers of circulating blood cells and a high risk of leukemia, and they are very sensitive to radiation-induced cancers.


See also: Bloom’s Syndrome; Fanconi’s Anemia; 
Mutation Rate; Xeroderma Pigmentosum 
Mutator cells are those which display a higher incidence of spontaneous mutations than a normal or
wild-type cell. Cell lines, or strains of microorganisms 
derived from mutator cells, will display a population- 
wide increased mutation rate, as would a whole organ- 
ism. The first mutators were detected in Drosophila in 
the early 1940s and then in bacteria in the early 1950s. 
Mutators are useful for the study of mutation avoid- 
ance and repair pathways, and of certain human dis- 
eases including cancer and cancer susceptibilities. In 
fact, much of our knowledge of repair systems em- 
anates from work with mutators. For instance, the 
discovery of a locus in Escherichia coli, mutD, confer- 
ring a very strong mutator phenotype was instrumen- 
tal in demonstrating that the epsilon subunit of DNA 
polymerase III played a key role in replication error 
correction in vivo. This subunit has an exonuclease 
activity that allows it to serve as a proofreading moni- 
tor of replication. Mutations in the mutD gene en- 
coding epsilon, renamed dnaQ, that interfere with 
the proofreading function confer the mutator charac- 
ter. Complementing the biochemical work elucidating 
the methyl-directed postreplication mismatch repair 
system in bacteria was the discovery of several muta- 
tors that defined the genes involved in expressing this 
system. The dam locus encodes adenine methylase, 
which methylates the adenines on each strand of a 5’- 
GATC-3’ sequence. This allows the cell to recognize 
the hemimethylated pattern of recently replicated 
DNA and to mark the methylated strand as the tem- 
plate strand. Mutations at the dam locus are mutators, 
as are mutations in the mutH, mutL, mutS, and uvrD genes, all of which encode proteins that together recognize mismatches, determine the template and newly synthesized strands, and excise the mismatch, leaving a patch that is filled in by repair synthesis. Mutators stemming from inactivation of the E. coli mismatch repair system (MMR⁻) have greatly elevated rates of base substitution transitions (G:C→A:T and A:T→G:C) and of frameshifts at runs of repeated short sequence units (28, 29, 37), such as mono- or dinucleotides (e.g., -AAAAAAA- or -CTCTCTCTCT-). Their counterparts in human cells are involved in certain types of cancer (see below).

Another example of how mutators have led to the 
discovery of repair systems involves the response to 
oxidative damage to DNA and its precursors. Muta- 
tors were sought and found that stimulated specific 
transversions, and two such mutator loci, mutY and mutM, resulted in an increase of only the G:C→T:A transversion. Further biochemical characterization revealed that mutY encodes a glycosylase that removes
A residues from mispairs with G and the oxidation 
product, 8-oxodGuanine, and that mutM encodes 
a previously described glycosylase that removes 
8-oxodGuanine and certain degraded purines such as 
ring-opened guanines across from C. These results 
defined a two-component system in which oxidatively 
damaged G residues are removed by the mutM prod- 
uct as a first line of defense. 8-oxodG residues that 
persist specify A during replication most of the time, 
generating a premutational mispair that, if left unrepaired, will yield the observed G:C→T:A transversion. The mutY product then excises the A, allowing repair synthesis to operate, which restores a C most of
the time. The regenerated 8-oxodG:C pair can now be 
acted on again by the mutM product. Normally, this 
two-part system prevents transversions arising from 
8-oxodGuanine generated in double-stranded DNA 
extremely well. Knocking out either mutY or mutM 
results in a small or moderate mutator effect, since the 
remaining glycosylase still operates. Inactivating both 
the mutY and mutM genes results in a very high mutation rate, all due to G:C→T:A transversions resulting from the creation of 8-oxodG. The mutT gene encodes an additional component of this repair system for oxidatively damaged guanines. The product of this gene, the first mutator described in bacteria, hydrolyzes the oxidized precursor 8-oxodGTP to the monophosphate, eliminating it from the precursor pool. Otherwise, incorporation of 8-oxodG across from A could result in A:T→C:G transversions, which is precisely what is observed in mutT strains.
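The mutator signatures above hinge on the difference between transitions (purine to purine, or pyrimidine to pyrimidine) and transversions (purine to pyrimidine, or the reverse). The lookup table below records only the associations stated in this article; it is a hypothetical helper, not a spectrum-analysis tool, and it ignores frameshifts.

```python
PURINES = frozenset("AG")
PYRIMIDINES = frozenset("CT")

# Base-pair changes characteristic of the E. coli mutators described above
SIGNATURES = {
    "G:C->T:A": ["mutY", "mutM"],
    "A:T->C:G": ["mutT"],
    "G:C->A:T": ["mismatch repair (MMR-)"],
    "A:T->G:C": ["mismatch repair (MMR-)"],
}


def substitution_type(ref: str, alt: str) -> str:
    """Return 'transition' (purine<->purine or pyrimidine<->pyrimidine)
    or 'transversion' (purine<->pyrimidine)."""
    ref, alt = ref.upper(), alt.upper()
    bases = PURINES | PYRIMIDINES
    if ref not in bases or alt not in bases or ref == alt:
        raise ValueError(f"not a base substitution: {ref}->{alt}")
    return "transition" if (ref in PURINES) == (alt in PURINES) else "transversion"


def classify_pair_change(change: str):
    """Classify a base-pair change written like 'G:C->T:A' and list the
    mutators whose characteristic spectrum it matches."""
    before, after = change.split("->")
    kind = substitution_type(before.split(":")[0], after.split(":")[0])
    return kind, SIGNATURES.get(change, [])
```

Classifying by the first base of each pair is sufficient, because the complementary base always changes in the same class.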

A number of human genetic diseases result from 
repair defects that lead to mutator phenotypes under 
certain conditions. For example, individuals with xero- 
derma pigmentosum (XP) lack one of a number of 
the XP complementation group excision repair pro- 
teins and display greatly increased UV-induced skin 
cancer, as a result of the inability to repair UV damage 
and the resulting increased mutation rate. The finding that an inherited form of ovarian and colon cancer susceptibility is due to the presence of a defective copy of one of the genes involved in the human counterpart to the bacterial mismatch repair system underscores the importance of mutators. Here, a mutator cell presumably arises when one somatic cell loses or mutates the second copy of the mismatch repair gene. Such cells display strong mutator effects, similar in form to those of bacterial mismatch-repair-deficient (MMR⁻) cells, and are characterized by a high propensity to generate additions or deletions at sequence repeat tracts, often termed microsatellites. This microsatellite instability, described previously in bacteria and yeast, is the hallmark of MMR⁻ cells. The involvement of mutators in the development of some cancers
might be because a series of mutations that inactivate 
tumor suppressor and negative growth factor genes 
are required for cancer cells to break free of multiple 
growth restrictions. Thus, mutator lines would be able 
to generate these changes more readily than normal 
cells. 

Mutators also play a role in the generation of 
biodiversity, offering advantages under a variety of 
selective conditions, since they generate diverse pheno- 
types more rapidly than normal cells. Also, mutators 
lacking the mismatch repair system lack the barrier to 
recombination between divergent chromosomes, 
allowing interspecies (horizontal) transfer to occur 
more readily. 


See also: Mismatch Repair (Long/Short Patch); 
Mutation, Spontaneous; Xeroderma 
Pigmentosum 
The Myb oncogene encodes a transcription factor that is highly conserved in vertebrates and is expressed in a number of proliferative tissues during development and in the adult. It is indispensable in the formation and functioning of the adult hemopoietic system, regulating transcription in both progenitors and specific differentiating blood cell types. c-Myb controls proliferation, differentiation, and cell survival, but the extent of its involvement in each process is cell-type specific. c-Myb has been oncogenically activated by transduction in two avian acute leukemia viruses (AMV and E26) and is likely to be involved in human leukemia.


Further Reading 
Weston K (1998) Myb proteins in life, death and differentiation. 
Current Opinion in Genetics and Development 8: 76-81. 


See also: Leukemia 


myc Locus 


See: Oncogenes 
Myxoid liposarcoma (MLS)/round cell liposarcoma (RCL) is the most common subtype of malignant adipose tissue tumor. The tumor is characterized by lipoblasts and preadipocytes at various stages of maturation. Most MLS/RCLs carry translocations
involving chromosome bands 12q13 and 16p11 or, 
less frequently, 12q13 and 22q12. The translocations 
result in fusion genes consisting of the 5’ half of either 
FUS/TLS on chromosome 16 or EWS on chromosome 
22 fused to the transcription factor gene CHOP on 
chromosome 12. The FUS/TLS-CHOP fusion pro- 
tein probably acts as an abnormal transcription factor, 
and transgenic mice expressing it develop MLS but no 
other tumors. 


See also: Cancer Susceptibility; Translocation 


N2, N3, N4, etc.

These terms describe organisms obtained from a multigenerational protocol of backcrossing used to generate a congenic strain, and can also be used to describe the generation itself. The N2 generation describes offspring from the initial cross between an F1 hybrid and one of the parental strains used to produce the F1 hybrid. Each following backcross generation is numbered in sequence. There is no N1 generation.
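A common rule of thumb for congenic breeding is that each backcross to the recipient strain halves the expected share of the donor genome still present. The sketch assumes the F1 is counted as generation 1 and considers only loci unlinked to any selected gene; regions flanking a selected locus are retained longer. The function name is hypothetical.

```python
def donor_heterozygosity(n: int) -> float:
    """Expected fraction of unlinked loci still heterozygous for donor-strain
    alleles in backcross generation N_n.

    Counts the F1 hybrid as generation 1 (all loci heterozygous); each
    backcross to the recipient parent halves the expected fraction.
    """
    if n < 1:
        raise ValueError("generation index starts at 1 (the F1)")
    return 0.5 ** (n - 1)


for n in (2, 5, 10):
    print(f"N{n}: {donor_heterozygosity(n):.4f}")
```

Under these assumptions an N10 animal is expected to retain only about 0.2% of unlinked donor loci, which is why roughly ten backcross generations are often used for congenic strains.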


See also: Backcross; Congenic Strain; F1 Hybrid;
Hybrid; Parental 
Nasopharyngeal carcinoma (NPC) is a rare tumor in
the Western world but is common in Southern China, 
North Africa, Alaska, and Greenland. In Guangzhou 
and other areas of Southern China it is the most com- 
mon tumor among males. Different Chinese dialect 
groups in Singapore have similar incidences of NPC as 
the corresponding dialect regions in China. Family 
clusters were reported in Alaska and Greenland Inuit 
and in both Chinese and American families. They 
suggest a genetic contribution, but the relative role of 
genetic and environmental factors has not been ana- 
lyzed by modern epidemiological methods. Ethnically 
related environmental factors, particularly the con- 
sumption of salted fish, are also believed to contribute. 
Other suspected environmental factors include chloro- 
phenol, cigarette smoking, and N-nitrosamines. 

Poorly differentiated or anaplastic NPCs carry Epstein-Barr virus (EBV) in nearly 100% of cases in both the high-incidence (Chinese) and low-incidence (Western) groups. Anaplastic carcinomas at other sites, such as the salivary glands and the thymus, may also carry EBV. Differentiated squamous cell carcinomas of the oro- and hypopharynx do not carry EBV.

NPC cells carry the virus as multiple episomal copies. They express EBNA1, EBERs, and LMP2A and LMP2B, while LMP1 is expressed in only 60% of the
cases (for explanation of the EBV products, see 
Epstein-Barr Virus (EBV)). About one-third of the 
tumors are LMP1-negative. LMP1 expressors and 
nonexpressors differ with regard to the methylation 
status of the LMP1 promoter. 

LMP1 protein expressed from EBV genomes 
carried by LMP1-negative tumors can render a 
nonimmunogenic mouse carcinoma immunogenic 
(rejectable) in syngeneic hosts, in contrast to LMP1 
expressed from LMP1-positive tumors. This was 
taken to suggest that the LMP1 expressors carry 
LMP1 protein that has been modified by immunoselection in vivo. Sequence information is consistent
with this but the critical sequence difference has not 
been identified. 

The role of the virus in the genesis of NPC is not 
understood. Progress has been hampered by the lack 
of appropriate EBV-carrying NPC lines and of EBV- 
epithelial cell transformation systems in vitro. 

The EBV-associated anaplastic form of NPC is 
highly infiltrated with lymphocytes, as a rule. The 
tumor has been often referred to as ‘lymphoepithe- 
lioma,’ implying that the lymphoid elements play a 
part in the neoplastic process. This is, however, not 
supported by the fact that NPC metastases have a 
reduced lymphocytic component, and nude mouse-passaged NPCs are entirely free from human lymphocytes. In the primary tumors, most of the lymphocytes
are small, nonactivated T cells. There is no indication 
that they play an immunological role. 

Electron microscopic studies showed the presence 
of tonofilaments and desmosomes in NPC, confirm- 
ing its epithelial origin. The high radiosensitivity of 
NPC is a distinctive feature, compared to other head 
and neck carcinomas. 

It is not clear how the virus enters into the naso- 
pharyngeal epithelium. Unlike B cells, epithelial cells 
do not carry high affinity EBV receptors. Conceivably, 
the virus may be carried into the NPC precursor cell
by lymphocytes, entering by emperipolesis. Each 
NPC carries the clonal progeny of a single viral infec- 
tion event, as proven by the presence of a single ter- 
minal repeat band, reflecting a unique circularization. 

Neomycin-tagged EBV can convert established 
EBV-negative carcinoma lines of gastric or pharyngeal 
origin into EBV-carrying lines. Infection is a low- 
probability event and requires continued drug selec- 
tion both for initiation and maintenance. Convertants 
may show increased clonability and occasionally, in- 
creased tumorigenicity. 

Cytogenetic changes in NPC include frequent deletions of the short arm of chromosome 3 at 3p14 or 3p21.1. Deletions and other chromosomal anomalies have also been found on chromosomes 7, 9p, 11q, 13q, 3p21, 25, and 26.

No p53 mutations were found in NPC. Virtually all tumors express high levels of the p53-related protein p63. The truncated ΔN isotype, which can block p53-mediated transactivation, is the dominant p63 species. It has been suggested that ΔN-p63 may act as a suppressor of wild-type p53 function.


Further Reading 

Crook T, Nicholls JM, Brooks L et al. (2000) High level expres- 
sion of delta N-p63 in NPC. Oncogene 19: 3439-3444. 

Hu LF, Eiriksdottir G, Lebedeva T et al. (1996) Loss of heterozygosity on chromosome arm 3p in NPC. Genes, Chromosomes and Cancer 17: 118-126.

Nicholls J (2000) A century of nasopharyngeal carcinoma. Epstein-Barr Virus Reports 7: 73-82.


See also: Epstein-Barr Virus (EBV); Genetic 
Diseases 
Daniel Nathans (1928-99), an American molecular
geneticist, pioneered the use of restriction endonu- 
cleases for the genetic analysis of viruses and cells. 
For this seminal work, he shared the 1978 Nobel 
Prize in Physiology or Medicine with his colleague 
Hamilton O. Smith and the Swiss microbiologist 
Werner Arber. 

Nathans was born in Wilmington, Delaware, the 
youngest of eight children of Russian immigrants. He 
attended the University of Delaware and Washington 
University in St. Louis, Missouri, where he earned a medical degree in 1954. He completed an internship in
internal medicine at the Columbia Presbyterian 
Hospital in New York City and then spent 2 years as 
a Clinical Associate at the National Cancer Institute. 
After 2 additional years of residency training at the 
Presbyterian Hospital, Nathans gave up plans to prac- 
tice medicine in favor of a career in medical research. 
In 1959 he joined the laboratory of Fritz Lipmann at
the Rockefeller Institute as a Guest Investigator.

In Lipmann’s laboratory, Nathans initiated studies
of the mechanisms of protein synthesis in bacteria. His
first important contribution was the development of a
cell-free translation system in extracts of Escherichia
coli. Following the discovery of RNA bacteriophages
by Norton Zinder and colleagues, Nathans demon- 
strated that addition of phage RNA to this system 
resulted in the synthesis of the phage coat protein. 
This was the first demonstration of the in vitro synthe- 
sis of a specific protein with a purified mRNA and led 
to a number of important insights into the mechanism 
of protein synthesis. 

In 1962 Nathans joined the faculty of the Depart- 
ment of Microbiology at the Johns Hopkins Univer- 
sity School of Medicine. He continued his studies of 
protein synthesis for several years. Among his numer- 
ous contributions was the demonstration that the 
antibiotic puromycin is incorporated into growing 
polypeptide chains and inhibits protein synthesis by 
causing premature chain termination. For his work on 
mechanisms of bacterial protein synthesis, Nathans 
received the Selman Waksman Award in 1967. 

In the late 1960s, Nathans turned to the study of the
small DNA tumor virus SV40. The genome of this
virus consists of a single circular double-stranded
DNA molecule of about 5200 bp. SV40 multiplies in
cultured simian cells, but causes tumorigenic
transformation of rodent cells, thus providing a simple
model system for viral carcinogenesis. While on
sabbatical leave at the Weizmann Institute in Israel in
1969, Nathans received a letter from his colleague
Hamilton Smith, describing the identification and
characterization of a novel restriction endonuclease from
the bacterium Haemophilus influenzae that cleaved DNA
molecules at specific nucleotide sequences. Nathans
immediately realized that the enzyme could provide a 
powerful approach to dividing the viral genome into 
smaller fragments whose functions could be more 
readily studied. Upon his return to Johns Hopkins, 
Nathans and his colleagues developed methodology 
to separate the fragments produced by digestion of 
SV40 DNA with Smith’s restriction endonuclease. 
By analyzing the products produced by partial diges- 
tion, they were able to determine the order of the 
fragments in the viral genome and to construct a
so-called cleavage map of SV40 DNA. This map provided
physical points of reference for localizing viral genes
and other genetic elements. Over the next several 
years, Nathans and his collaborators demonstrated 
the power of this new genetic method by mapping 
the location of the SV40 origin of DNA replication, 
determining the positions of mutations that caused 
defects in virus multiplication or transformation, and 
mapping the mRNAs encoding the various viral gene 
products. The method was soon applied by many 
laboratories to other viruses and plasmids. The tech- 
niques developed by Nathans also proved to be an 
important foundation for the subsequent recombinant 
DNA revolution. 

In addition to demonstrating that restriction endo- 
nucleases could be used to dissect the functional or- 
ganization of the viral genome, Nathans was among 
the first to show that these enzymes could also be used 
to generate deletion and point mutations at specific, 
predetermined sites in the genome. Such site-directed 
mutagenesis methods represented a fundamental change
in the way genetics was practiced and allowed a more
precise definition of the functions of proteins and 
regulatory signals than had previously been possible. 

Nathans remained at the Johns Hopkins University 
School of Medicine throughout his scientific career. 
He served as Director of the Department of Micro- 
biology from 1972 to 1981 and in 1982 was named to 
the Directorship of the newly formed Department of 
Molecular Biology and Genetics. Nathans served 
as Interim President of the Johns Hopkins University 
in 1995-96. From 1982 until his death in 1999, he 
was also a Senior Investigator of the Howard Hughes 
Medical Institute. Nathans was the recipient of 
numerous awards in addition to the Nobel Prize. 
He was a member of the US National Academy of 
Sciences and received the Academy’s Award in 
Molecular Biology in 1976. He served on the 
President’s Council of Advisors on Science and 
Technology from 1990 to 1993 and was awarded the 
US National Medal of Science in 1993. 


See also: Restriction Endonuclease 
Natural selection occurs when differences among
individuals cause differences in survival and reproduc- 
tion. Evolution by natural selection occurs when these 
differences in survival and reproduction cause the 
population to evolve. In some populations of bacteria
a few cells carry plasmids that make them resistant 
to penicillin. Because the plasmid is transmitted to 
daughter cells when they divide, the descendants of 
resistant cells are also resistant. If a population that has 
a few resistant cells in it is exposed to penicillin, the 
resistant cells will become more common and suscep- 
tible cells will become less common. Eventually, the 
entire population will be composed of resistant cells. 
The change in the composition of bacterial popula- 
tions exposed to penicillin is an example of evolution 
by natural selection. The process of evolution by nat- 
ural selection explains why so many features of plants 
and animals are well-adapted to the circumstances in 
which they live. 


Darwin, Wallace, and the Theory of 
Natural Selection 


Charles Darwin (1809-1882) spent nearly 5 years 
(December, 1831 through October, 1836) as a natural- 
ist aboard the Beagle during its expedition around the 
world, in addition to spending many years studying 
natural history in south-eastern England. Alfred 
Russel Wallace (1823-1913) spent 4 years in the tropi- 
cal forests of Brazil and 8 more years in the forests of 
south-eastern Asia. Their years of work and that of 
many biologists who preceded them revealed many 
examples of plants and animals whose physiology and 
habits made them well-adapted to their environment. 
Darwin had completed the first draft of a book on 
species and speciation when he received a letter from 
Wallace including the draft of a paper entitled ‘On the 
Tendency of Varieties to Depart Indefinitely from 
the Original Type.’ In this paper Wallace described 
the theory of natural selection, a theory Darwin had 
independently discovered 20 years earlier but was 
only now preparing for publication. In 1858 Wallace’s 
paper was published in the Journal of the Linnean
Society together with extracts from an essay Darwin 
wrote in 1844 but never published. One year later 
Darwin’s On the Origin of Species appeared. 

The theory of natural selection that Darwin and 
Wallace presented is beguiling in its simplicity, yet it 
is sufficient to explain the many intricate adaptations 
of plants and animals. 


1. In every population of every species on earth more 
individuals are born than survive to reproduce. 

2. In most populations individuals differ from one
another in characteristics that cause them to differ
in their chances of survival, in the number of
offspring they produce if they survive to reproduce, or
both.

3. Offspring tend to resemble their parents. 
From these simple observations follows the obvious
conclusion: Any characteristic that increases an indi- 
vidual’s chance of survival or its fecundity will tend to 
become more common. Similarly, any characteristic 
that lessens an individual’s chance of survival or 
reduces its fecundity will tend to become less com- 
mon. Thus, individuals will tend to be well adapted to 
the circumstances in which they find themselves. 

Mutations introduce variation into populations 
that may lessen the chance that some individuals 
survive. Genetic drift may have a greater influence 
than natural selection on the transmission of genetic 
variation from one generation to the next in small 
populations. A population may not have had time 
to adapt to recent changes in its environment, or its 
environment may be constantly fluctuating so that 
no single characteristic is always favored. For these 
and other reasons organisms are not perfectly 
adapted to their environment. But the process of 
evolution by natural selection guarantees that most 
characteristics of most organisms will be well-suited 
to the conditions in which they are found most of the 
time. 


Genetic Consequences of Natural 
Selection 


The genetic consequences of natural selection are 
easiest to understand if we study how allele fre- 
quencies at one locus with two alleles change when 
genotypes differ in their probability of survival. An 
individual’s fitness is its contribution to the compos- 
ition of later generations, relative to the contribution 
of other individuals in the same population. Fitness 
differences may arise because individuals differ in 
their probability of survival, in their ability to find 
mates, in the number of offspring they produce 
when mated, and in many other ways. Differences in 
probability of survival are the easiest to understand. 
Fortunately, many of the genetic consequences of 
natural selection do not depend on whether fitness 
differences arise from differences in survival prob- 
ability or differences in some other component of 
fitness. 

Suppose an individual with genotype $A_1A_1$ survives
to reproduce with probability $w_{11}$, and suppose that
the survival probabilities for genotypes $A_1A_2$ and
$A_2A_2$ are $w_{12}$ and $w_{22}$. If individuals choose their
mates at random, then genotypes in newly formed zygotes
will be found in Hardy-Weinberg proportions. So if the
frequency of allele $A_1$ in generation $t$ is $p_t$ and the
frequency of allele $A_2$ is $q_t$, we can calculate the
frequencies of the three genotypes in zygotes and adults
as shown in Table 1. The quantity
$\bar{w} = p_t^2 w_{11} + 2p_tq_t w_{12} + q_t^2 w_{22}$ is
known as the mean fitness. It is the average survival
probability in the population.


Table 1

Genotype                  $A_1A_1$               $A_1A_2$                 $A_2A_2$
Zygote frequencies        $p_t^2$                $2p_tq_t$                $q_t^2$
Probability of survival   $w_{11}$               $w_{12}$                 $w_{22}$
Adult frequencies         $p_t^2w_{11}/\bar{w}$  $2p_tq_tw_{12}/\bar{w}$  $q_t^2w_{22}/\bar{w}$


From the adult frequencies in the last line of Table 1,
we can calculate the allele frequency among newly formed
zygotes of the next generation, namely:


$$p_{t+1} = \frac{p_t^2 w_{11} + p_t q_t w_{12}}{\bar{w}}$$


Suppose the frequency of the $A_1$ allele in newly
formed zygotes is 0.4 and that $w_{11} = 0.9$, $w_{12} = 0.8$,
and $w_{22} = 0.7$. Then the above equation allows us to
predict that the frequency of $A_1$ in newly formed
zygotes of the next generation will be 0.43. Now
suppose that the survival probabilities were all cut in
half, i.e., $w_{11} = 0.45$, $w_{12} = 0.4$, $w_{22} = 0.35$. Using
the same equation again, we predict that the frequency
of $A_1$ in newly formed zygotes of the next generation
is 0.43, exactly what we predicted before. These
calculations illustrate a very important fact about
natural selection: the change in allele frequency from
one generation to the next as a result of natural
selection depends only on the fitnesses of genotypes
relative to one another. Even a genotype with a
low probability of survival can be favored by natural
selection if its probability of survival is higher than
that of other genotypes in the population.
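The worked example can be checked numerically. The following is a minimal Python sketch, assuming random mating and selection on survival only; the function name next_allele_freq is illustrative rather than taken from the text. It applies the one-generation update to the original fitnesses and to the halved fitnesses.

```python
def next_allele_freq(p, w11, w12, w22):
    """Frequency of A1 among zygotes of the next generation after viability selection."""
    q = 1.0 - p
    # Mean fitness: average survival probability of Hardy-Weinberg zygotes.
    w_bar = p * p * w11 + 2 * p * q * w12 + q * q * w22
    # A1 copies carried by surviving adults, divided by mean fitness.
    return (p * p * w11 + p * q * w12) / w_bar


p_full = next_allele_freq(0.4, 0.9, 0.8, 0.7)
p_half = next_allele_freq(0.4, 0.45, 0.4, 0.35)  # every fitness cut in half
print(f"{p_full:.4f} {p_half:.4f}")
```

Both calls give about 0.4308, because multiplying every fitness by the same constant multiplies numerator and mean fitness alike and cancels.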

Since natural selection favors characteristics that 
increase the probability of survival, it is not surprising 
that the mean fitness of the new progeny generation is 
greater than the mean fitness of the one that preceded 
it, unless the mean fitness is as great as it can be under 
the current conditions. When mean fitness is at a 
maximum, the allele frequency will not change from 
one generation to the next, even though genotype 
frequencies will differ between newly formed zygotes 
and reproductive adults. The population is at equilib- 
rium. 

We can predict characteristics of the population at 
equilibrium simply by knowing which genotype is 
most likely to survive, which is least likely to survive, 
and which has an intermediate probability of survival. 
Three patterns of selection are possible: 


Directional selection                               $w_{11} > w_{12} > w_{22}$
Directional selection                               $w_{11} < w_{12} < w_{22}$
Disruptive selection (heterozygote disadvantage)    $w_{11} > w_{12}$ and $w_{22} > w_{12}$
Stabilizing selection (heterozygote advantage)      $w_{11} < w_{12}$ and $w_{22} < w_{12}$


Directional Selection 

Directional selection occurs when individuals homo- 
zygous for one allele have a fitness greater than that of 
individuals with other genotypes and individuals 
homozygous for the other allele have a fitness less 
than that of individuals with other genotypes. At 
equilibrium the population will be composed entirely 
of individuals that are homozygous for the allele 
associated with the highest probability of survival. 
The rate at which the population approaches this 
equilibrium depends on whether the favored allele 
is dominant, partially dominant, or recessive with 
respect to survival probability. An allele is dominant 
with respect to survival probability if heterozygotes 
have the same survival probability as homozygotes for 
the favored allele, and it is recessive if heterozygotes 
have the same survival probability as homozygotes for 
the disfavored allele. An allele is partially dominant 
with respect to survival probability if heterozygotes 
are intermediate between the two homozygotes in 
survival probability. This pattern of selection is refer- 
red to as directional selection because one of the two 
alleles is always increasing in frequency and the other 
is always decreasing in frequency. 

When a dominant favored allele is rare most indi- 
viduals carrying it are heterozygous, and the large 
fitness difference between heterozygotes and dis- 
favored homozygotes causes rapid changes in allele 
frequency. When the favored allele becomes common 
most individuals carrying the disfavored allele are 
heterozygous, and the small fitness difference between 
favored homozygotes and heterozygotes causes allele 
frequencies to change much more slowly (Figure 1). 
For the same reason changes in allele frequency occur 
slowly when an allele with recessive fitness effects is 
rare and much more rapidly when it is common. A 
deleterious recessive allele may be found at different
frequencies in isolated populations even if it has
the same fitness effect in every population, because
natural selection is relatively inefficient when
recessive alleles become rare, allowing the frequency to
fluctuate randomly as a result of genetic drift.


Figure 1 Dynamics of directional selection.
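The contrast between dominant and recessive favored alleles can be reproduced with the one-locus recursion. This is a sketch under stated assumptions: the fitness values below are hypothetical (the favored allele $A_1$ has fitness 1, $A_2A_2$ survives 10% less often), and the code simply counts generations for $A_1$ to rise from 0.01 to 0.5 and from 0.5 to 0.99.

```python
def next_p(p, w11, w12, w22):
    """One generation of viability selection at a diallelic locus."""
    q = 1.0 - p
    w_bar = p * p * w11 + 2 * p * q * w12 + q * q * w22
    return (p * p * w11 + p * q * w12) / w_bar


def generations_between(p_start, p_end, w11, w12, w22):
    """Generations needed for the favored allele A1 to rise from p_start to p_end."""
    p, gens = p_start, 0
    while p < p_end:
        p = next_p(p, w11, w12, w22)
        gens += 1
    return gens


# Hypothetical fitnesses (w11, w12, w22); A1 is favored in every scheme.
schemes = {
    "dominant": (1.0, 1.0, 0.9),
    "partially dominant": (1.0, 0.95, 0.9),
    "recessive": (1.0, 0.9, 0.9),
}

results = {}
for name, (w11, w12, w22) in schemes.items():
    rare_phase = generations_between(0.01, 0.5, w11, w12, w22)
    common_phase = generations_between(0.5, 0.99, w11, w12, w22)
    results[name] = (rare_phase, common_phase)
    print(f"{name:>18}: 0.01->0.5 in {rare_phase} gens; 0.5->0.99 in {common_phase} gens")
```

A dominant favored allele covers the rare phase quickly and the common phase slowly; a recessive one shows the reverse pattern.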


Disruptive Selection 

Disruptive selection occurs when heterozygous indi- 
viduals are the least likely to survive. For that reason 
this fitness pattern is also referred to as heterozygote 
disadvantage. If a population happened to start with 
an allele frequency exactly equal to: 


$$p^* = \frac{w_{12} - w_{22}}{2w_{12} - w_{11} - w_{22}}$$


the allele frequency would not change, i.e., the
population would be in equilibrium. But the equilibrium
is not stable. Selection magnifies even a tiny change
in allele frequency until eventually one allele or the
other is lost from the population. Which allele is lost
depends on whether the initial allele frequency is
greater or less than $p^*$. If the initial allele frequency
is greater than $p^*$, $A_2$ will be lost and the population
will be composed entirely of $A_1A_1$ homozygotes at
equilibrium. If the initial allele frequency is less than
$p^*$, $A_1$ will be lost and the population will be composed
entirely of $A_2A_2$ homozygotes at equilibrium. This
pattern of selection is referred to as disruptive selection
because selection will cause two populations with
similar allele frequencies to evolve in opposite
directions if one has an allele frequency slightly less
than $p^*$ and the other has an allele frequency slightly
greater than $p^*$.
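The instability of $p^*$ under heterozygote disadvantage can be illustrated with a short simulation. This is a minimal sketch assuming hypothetical fitnesses $w_{11} = 0.9$, $w_{12} = 0.7$, $w_{22} = 0.8$, for which $p^* = 1/3$; two populations starting 0.01 on either side of $p^*$ are iterated for 500 generations.

```python
def next_p(p, w11, w12, w22):
    """One generation of viability selection at a diallelic locus."""
    q = 1.0 - p
    w_bar = p * p * w11 + 2 * p * q * w12 + q * q * w22
    return (p * p * w11 + p * q * w12) / w_bar


def evolve(p0, w11, w12, w22, generations=500):
    p = p0
    for _ in range(generations):
        p = next_p(p, w11, w12, w22)
    return p


# Hypothetical fitnesses with heterozygote disadvantage (w12 lowest).
w11, w12, w22 = 0.9, 0.7, 0.8
p_star = (w12 - w22) / (2 * w12 - w11 - w22)  # unstable equilibrium (1/3 here)

above = evolve(p_star + 0.01, w11, w12, w22)
below = evolve(p_star - 0.01, w11, w12, w22)
print(f"p* = {p_star:.4f}; start above -> {above:.4f}; start below -> {below:.4f}")
```

Starting exactly at $p^*$ leaves the frequency unchanged, but the slightly higher start goes to fixation of $A_1$ and the slightly lower one to loss of $A_1$.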


Stabilizing Selection 

Stabilizing selection occurs when heterozygous indi- 
viduals are the most likely to survive. For that reason 
this fitness pattern is also referred to as heterozygote 
advantage. As with disruptive selection, if a popula- 
tion happened to start with an allele frequency exactly 
equal to: 


$$p^* = \frac{w_{12} - w_{22}}{2w_{12} - w_{11} - w_{22}}$$


the allele frequency would not change. When hetero- 
zygotes are more likely to survive than either homo- 
zygote, however, p* is a stable equilibrium. Selection 
causes small departures from p* to become even 
smaller with time. Moreover, the allele frequency in 
the population will evolve toward p* regardless of the 
initial allele frequency, as long as both alleles are ini- 
tially present. In Figure 2, for example, w11 = 0.72, 
w12 = 0.9, and w22 = 0.81, and the population evolves 
toward p* = 0.33 regardless of whether the initial 
allele frequency is 0.01 or 0.99. 
Figure 2 Dynamics of heterozygote advantage.
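The behavior described for Figure 2 can be checked with the same one-locus recursion. This sketch uses the fitness values quoted in the text ($w_{11} = 0.72$, $w_{12} = 0.9$, $w_{22} = 0.81$); the helper names are illustrative, and 500 generations is an arbitrary run length chosen to be long enough to converge.

```python
def next_p(p, w11, w12, w22):
    """One generation of viability selection at a diallelic locus."""
    q = 1.0 - p
    w_bar = p * p * w11 + 2 * p * q * w12 + q * q * w22
    return (p * p * w11 + p * q * w12) / w_bar


def evolve(p0, w11, w12, w22, generations=500):
    p = p0
    for _ in range(generations):
        p = next_p(p, w11, w12, w22)
    return p


# Fitness values quoted for Figure 2 (heterozygote advantage).
w11, w12, w22 = 0.72, 0.9, 0.81
p_star = (w12 - w22) / (2 * w12 - w11 - w22)  # stable equilibrium

for p0 in (0.01, 0.99):
    print(f"start {p0:.2f} -> {evolve(p0, w11, w12, w22):.4f} (p* = {p_star:.4f})")
```

Both starting frequencies converge on $p^* = 0.09/0.27 = 1/3$.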


Selection on Continuous Traits 


Darwin and Wallace proposed the theory of natural 
selection more than 40 years before Mendel’s rules were
rediscovered. The logic is incontrovertible. If there 
are heritable differences among individuals that cause 
differences in reproduction and survival, traits that 
increase the probability of survival and reproduction 
will become more common and those that decrease 
the probability of survival and reproduction will be- 
come less common. When geneticists study the in- 
heritance of a continuous trait they use the heritability 
of that trait to describe the extent to which offspring 
resemble their parents. 


Response to Selection 

Differences among individuals may arise because they 
have the same genotype but were exposed to different 
environments, because they were exposed to the same 
environment but have different genotypes, or because 
they have different genotypes and were exposed to 
different environments. The heritability of a trait is 
the proportion of phenotypic variation that can be 
transmitted from parents to offspring. In Figure 3, 
the x-axis is half the summed body weight (in grams) 
of paired male and female laboratory mice.* The y-axis 
is the body weight of offspring. The slope of the 
regression line running diagonally through the figure 
is equal to the heritability of body weight in this 
population of mice. 

The regression line allows us to predict the body
weight of offspring from the body weight of parents.
Specifically, let $x_p$ be the mid-parent body
weight of a particular pair of parents in the population
and $\bar{x}$ be the average mid-parent body weight in the
population as a whole. Then the expected body weight
of their offspring is:


$$x_o = h^2 (x_p - \bar{x}) + \bar{x}$$


*This quantity is known as the mid-parent body weight.


Suppose that natural selection causes a difference
between the mean of a trait in those individuals that
reproduce and the mean in the population as a whole.
We can apply the above equation to the whole
population. Now we interpret $\bar{x}$ as the mean mid-parent
body weight ‘before selection’ and $x_p$ as the mean
mid-parent body weight ‘after selection.’ We can also
rearrange the equation so that it directly predicts how
much the mean body weight will change from parents
before selection to offspring before selection (in the
next generation):


$$x_o - \bar{x} = h^2 (x_p - \bar{x})$$


The quantity $x_p - \bar{x}$ is called the selection
differential, $S$. The quantity $x_o - \bar{x}$ is called the
response to selection, $R$. The response of a trait to
selection depends both on the heritability of the trait
and on the selection differential: $R = h^2 S$. If a trait
lacks heritable variation ($h^2 = 0$), there will be no
response to selection no matter how strong the selection is.
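The two steps above (estimating $h^2$ as the slope of offspring on mid-parent values, then predicting $R = h^2 S$) can be sketched as follows. The body weights are hypothetical numbers invented for illustration, not the mouse data of Figure 3, and the function names are illustrative.

```python
def regression_slope(xs, ys):
    """Least-squares slope of ys regressed on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


def response_to_selection(h2, mean_before, mean_selected_parents):
    """Breeder's equation: R = h^2 * S, with S the selection differential."""
    s = mean_selected_parents - mean_before
    return h2 * s


# Hypothetical mid-parent and offspring body weights (g), for illustration only.
midparent = [20.0, 22.0, 24.0, 26.0, 28.0]
offspring = [23.0, 23.8, 24.6, 25.4, 26.2]

h2 = regression_slope(midparent, offspring)  # heritability estimate
r = response_to_selection(h2, mean_before=24.0, mean_selected_parents=28.0)
print(f"h^2 = {h2:.2f}, S = 4.0 g, R = {r:.2f} g")
```

With $h^2 = 0.4$ and $S = 4$ g the predicted response is 1.6 g; with $h^2 = 0$ the response is zero however large $S$ is.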


Fisher’s Fundamental Theorem of Natural 
Selection 

We can use these results to study how fitness itself 
changes from one generation to the next. Intuition 
tells us that if fitness-enhancing traits become more 
common the average fitness in the whole population 
should also increase over time, and that is just what we 
find. Let $\bar{w}_t$ be the mean fitness before selection, $\bar{w}^*$
be the mean fitness after selection, and $\bar{w}_{t+1}$ be the
mean fitness in the next generation. Then:


$$\bar{w}_{t+1} - \bar{w}_t = h^2 (\bar{w}^* - \bar{w}_t)$$


The mean fitness of a population after selection
($\bar{w}^*$) is greater than its mean fitness before selection
($\bar{w}_t$), because individuals with a high probability of
survival are more common in the population after
selection than they were before selection. In addition,
$h^2$ cannot be negative. Thus, $\bar{w}_{t+1}$ must be greater
than $\bar{w}_t$. The only time when this inequality will not
hold is when $h^2 = 0$, which means that the population
has reached the maximum possible fitness. This equation
embodies Fisher’s fundamental theorem of natural
selection, and it implies three things about the
process of evolution by natural selection:


1. The change in mean fitness between generations is 
proportional to the heritability of fitness. 

2. The mean fitness of a population will never 
decrease from one generation to the next and will 
remain constant only when the population has 
reached the maximum possible fitness. 
3. Evolution by natural selection gradually depletes
the heritable variation in fitness that is required for
it to continue.


Figure 3 Selection on body weight in laboratory mice.


Fisher’s Fundamental Theorem of Natural Selection
has many exceptions. For example, in small populations
genetic drift may be a more important influence on the
evolution of populations than natural selection,
selection on fecundity differences need not follow the
same pattern as selection on viability differences, and
fitness differences are rarely the same for more than a
few generations. Nonetheless, Fisher’s fundamental
theorem validates our intuition: as fitness-enhancing
traits become more common, the average level of
adaptation in a population will increase.
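Point 2 of the theorem can be illustrated in its simplest special case: one locus, two alleles, random mating, and constant viabilities, where mean fitness is known never to decrease. This sketch reuses the heterozygote-advantage values quoted for Figure 2 and records $\bar{w}$ each generation; the small tolerance only absorbs floating-point round-off.

```python
def mean_fitness(p, w11, w12, w22):
    q = 1.0 - p
    return p * p * w11 + 2 * p * q * w12 + q * q * w22


def next_p(p, w11, w12, w22):
    q = 1.0 - p
    return (p * p * w11 + p * q * w12) / mean_fitness(p, w11, w12, w22)


w = (0.72, 0.9, 0.81)  # heterozygote-advantage values quoted for Figure 2
p = 0.01
history = [mean_fitness(p, *w)]
for _ in range(200):
    p = next_p(p, *w)
    history.append(mean_fitness(p, *w))

# Allow for floating-point round-off when checking that mean fitness never falls.
never_decreases = all(later >= earlier - 1e-12 for earlier, later in zip(history, history[1:]))
print(f"mean fitness {history[0]:.3f} -> {history[-1]:.3f}; never decreases: {never_decreases}")
```

Mean fitness climbs from about 0.812 to 0.84, its maximum at $p^* = 1/3$, and then stays constant, matching the claim that it stops changing only when no further increase is possible.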


Detecting Natural Selection 


When biologists say that a trait is subject to natural 
selection they mean that: 


• Individuals differ in the probability with which
they survive (viability selection).

• Individuals differ in their ability to attract mates
(sexual selection).

• Pairs of individuals differ in the number of
offspring they produce (fecundity selection), or

• Individual alleles differ in the probability with
which they are incorporated into gametes (gametic
selection).


Viability Selection 
Differences in probability of survival are most often
associated with natural selection. When we say that an
organism is well-adapted to its environment, we often
mean that it has a high probability of survival.
Detecting differences in survival probability is
relatively straightforward.

When a wild-type female of Drosophila melano- 
gaster heterozygous for the allele causing white eye 
color is crossed with a white-eyed male, we expect 
half of the offspring to be white-eyed and half to 
be wild-type, regardless of sex. In one such set of 
crosses experimenters obtained the results shown in 
Table 2. 
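Whether the deficit of white-eyed flies in Table 2 could be due to chance can be checked with a chi-square goodness-of-fit test against the 1:1 Mendelian expectation. The source does not report a test; this is a minimal sketch using only the counts in Table 2, and it assumes (as the text does) that equal numbers of each zygote type were formed, so the ratio of survivors estimates relative viability.

```python
import math

observed = {"red-eyed": 2652, "white-eyed": 2088}
total = sum(observed.values())
expected = total / 2  # 1:1 Mendelian expectation for each class

chi_sq = sum((obs - expected) ** 2 / expected for obs in observed.values())
# With 1 degree of freedom, P(chi-square > x) = erfc(sqrt(x / 2)).
p_value = math.erfc(math.sqrt(chi_sq / 2))

# Survivors of each class relative to red-eyed (wild-type) survivors.
relative_viability = observed["white-eyed"] / observed["red-eyed"]
print(f"chi-square = {chi_sq:.2f}, p = {p_value:.1e}")
print(f"white-eyed survival relative to red-eyed = {relative_viability:.3f}")
```

A chi-square of about 67 on one degree of freedom is far beyond what sampling error would plausibly produce, and white-eyed flies survive at roughly 79% of the wild-type rate.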


Table 2

              Red-eyed   White-eyed   Total
Observed      2652       2088         4740
Expected      2370       2370         4740

Since the rules of Mendelian genetics tell us that equal
numbers of wild-type and white-eyed zygotes were formed,
the deficiency of white-eyed flies must be the result of a
lower probability of survival. This experiment shows that
there was viability selection against white eyes in this
laboratory population of Drosophila melanogaster.


Sexual Selection

The most obvious way in which individuals differ in
their ability to attract mates is when males compete
for control of a harem, as in North American elk
(Cervus canadensis). This type of sexual selection is
known as male-male competition. Males and females
of most species differ in many characteristics that
are not directly related to the reproductive process.
Such characteristics are known as secondary sexual
characteristics. When differences among males in
secondary sexual characteristics cause females to choose
some as mates in preference to others, the result is a
type of sexual selection known as female choice.

The pied flycatcher (Ficedula hypoleuca) breeds in 
Europe, northern Africa, and western Asia. Two color 
forms of male pied flycatchers are found in Europe. 
The black-and-white form has black feathers on its 
head, the nape of its neck, and its back. It has white 
feathers on its chin, the front of its neck, and its under- 
side. The brown form has brown feathers instead of 
black ones on its head, the nape of its neck, and its 
back. 

To determine whether a male’s color affects the 
female’s choice of mate, experimenters placed pairs 
of males, one black-and-white and one brown, in 
outdoor aviaries. The aviaries contained three com- 
partments. Each male was placed in a separate 
compartment and prevented from seeing the other 
male. After the males were habituated to the aviary, 
a female was placed in the third compartment with 
two nest boxes, one close to each of the males. The 
female could see both males. When the female 
built a nest, the experimenters noted whether she 
built it in the nest box associated with the black- 
and-white male or in the one associated with the 
brown male. 

Females for the experiment were collected from an 
area in central Europe where the closely related
black-and-white collared flycatcher (Ficedula albicollis)
also occurs. Ten of twelve females (a 5:1 ratio) built their nests
in boxes associated with the brown male, showing that 
female choice leads to sexual selection in favor of the 
brown form in this region. In areas where the black- 
and-white collared flycatcher does not occur, similar 
experiments showed that sexual selection through 
female choice favors the black-and-white form of the 
pied flycatcher. 
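The text reports 10 of 12 females choosing the brown male but no test of significance. One way to gauge the result is a one-sided exact binomial test against the null hypothesis of no preference (probability 0.5 per female). This sketch assumes each female's choice is independent; the function name is illustrative.

```python
from math import comb


def binomial_upper_tail(successes, trials, p=0.5):
    """P(X >= successes) for X ~ Binomial(trials, p)."""
    return sum(comb(trials, k) * p**k * (1 - p) ** (trials - k)
               for k in range(successes, trials + 1))


p_value = binomial_upper_tail(10, 12)
print(f"P(at least 10 of 12 choose brown | no preference) = {p_value:.4f}")
```

The tail probability is 79/4096, about 0.019, so a split this lopsided would be unusual if females had no preference.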


Fecundity Selection 

Experiments in laboratory populations of Drosophila 
melanogaster have repeatedly shown that differences 
among individuals in the number of offspring they 
produce contribute more to fitness differences among 
individuals than do differences in the probability 
of survival. Experiments measure the magnitude of 
fecundity selection by counting the number of offspring 
produced from different types of matings. 

Adults heterozygous for Cy have curled wings 
when pupae are raised at 25°C. To determine whether 
females with curled wings produce fewer offspring 
than those with normal wings, experimenters allowed 
each female to mate with one male and calculated the 
mean number of adult offspring each type of mating 
had produced 18 days after mating (Table 3). 


Table 3 Fecundity selection in Drosophila melanogaster

                        Male genotype
Female genotype         Cy        Wild-type
Cy                      90.0      111.8
Wild-type               114.2     117.1

The fecundity of Cy females was 95% of the fecundity of 
wild-type females when mated with a wild-type male and 
only 79% of the fecundity of wild-type females when mated 
with a Cy male. Similarly Cy males produced fewer offspring 
than wild-type males regardless of whether they were 
mated with Cy or wild-type females. Fecundity selection 
favors wild-type in both females and males. Moreover, the 
differences between Cy and wild-type in female fecundity 
(5-20%) are much greater than those in viability (< 1%). 
(Data from Clark and Feldman, 1981.) 
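The percentages in the note to Table 3 are ratios of mean offspring numbers for matings that differ only in one parent's genotype. This sketch recomputes them from the table; the dictionary layout and function names are illustrative.

```python
# Mean number of adult offspring per mating (Table 3); keys are (female, male).
offspring = {
    ("Cy", "Cy"): 90.0,
    ("Cy", "wild-type"): 111.8,
    ("wild-type", "Cy"): 114.2,
    ("wild-type", "wild-type"): 117.1,
}


def relative_female_fecundity(male):
    """Cy females relative to wild-type females mated to the same type of male."""
    return offspring[("Cy", male)] / offspring[("wild-type", male)]


def relative_male_fecundity(female):
    """Cy males relative to wild-type males mated to the same type of female."""
    return offspring[(female, "Cy")] / offspring[(female, "wild-type")]


for male in ("wild-type", "Cy"):
    print(f"Cy females x {male} males: {relative_female_fecundity(male):.0%} of wild-type")
for female in ("wild-type", "Cy"):
    print(f"Cy males x {female} females: {relative_male_fecundity(female):.0%} of wild-type")
```

The female ratios come out at 95% and 79%, as stated in the note, and both male ratios are below 1.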


Gametic Selection 

Mendel’s rules tell us that half of the gametes produced 
by a heterozygous individual will carry one allele and 
half will carry the other, but Mendel’s rules are some- 
times broken. In mice (both Mus musculus and Mus 
domesticus) the genes of the major histocompatibility 
complex and many others are tightly linked in a region 
near the centromere of chromosome 17. Because re- 
combination between these genes is rare, the entire 
region is usually transmitted as if it were a single 
Mendelian gene. Mutations in the t complex, as this 
chromosomal region is known, often affect viability. 
In addition, over 90% of the gametes transmitted by 
males heterozygous for a ‘complete’ t haplotype and 
a wild-type t haplotype carry the complete t haplotype.
The great excess of complete t haplotypes in the
progeny of heterozygous males shows that gametic
selection favors the complete t haplotype over
wild-type t haplotypes.


Levels of Selection 


It is natural to refer to organisms when discussing 
natural selection and its consequences. But the ex- 
ample of the t haplotype in mice illustrates that natural 
selection can operate at the level of gametes too. In 
fact, the t haplotype illustrates that natural selection
may act simultaneously at several different levels of
biological organization: the gamete or gene, the
individual organism, and the population or group.

We have just seen how biased segregation in favor
of the complete t haplotype leads to gametic selection
in favor of the complete t haplotype. If gametic
selection were the only evolutionary force affecting
this trait, the complete t allele would rapidly sweep
through mouse populations and eliminate the wild- 
type allele. But gametic selection is not the only force. 
Many complete t haplotypes carry recessive lethals, 
and many of those that do not carry lethals cause 
sterility when homozygous. Selection at the level of 
the individual organism favors the wild-type haplo- 
type. If it were the only evolutionary force affecting 
this trait, the complete t allele would be rapidly elimin- 
ated from mouse populations. The balance between 
these opposing forces leads to maintenance of both 
alleles in mouse populations. 
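The balance between gametic and individual selection can be sketched with a simple deterministic model. Its assumptions are hypothetical simplifications: t/t zygotes always die (a recessive lethal), heterozygous males transmit t with probability k (0.95 here, in line with "over 90%"), females segregate normally, and the population is infinite, so drift and group selection are ignored.

```python
def next_t_freq(x, k):
    """One generation of a simple drive-versus-lethality model of the t haplotype.

    x: frequency of t among adults (every carrier is +/t, because t/t dies).
    k: fraction of a heterozygous male's sperm carrying t (0.5 = Mendelian).
    """
    carriers = 2 * x              # frequency of +/t adults in each sex
    t_sperm = carriers * k        # gametic selection (segregation distortion) in males
    t_egg = carriers * 0.5        # normal Mendelian segregation in females
    tt = t_sperm * t_egg          # t/t zygotes, assumed to die before breeding
    het = t_sperm * (1 - t_egg) + (1 - t_sperm) * t_egg
    return (het / 2) / (1 - tt)   # t frequency among surviving adults


def run(k, x0=0.05, generations=2000):
    x = x0
    for _ in range(generations):
        x = next_t_freq(x, k)
    return x


x_drive = run(k=0.95)   # hypothetical strength of the male transmission bias
x_mendel = run(k=0.5)   # same lethality, no gametic selection
print(f"with drive: t = {x_drive:.3f}; without drive: t = {x_mendel:.5f}")
```

With drive the t haplotype settles at a stable intermediate frequency (roughly 0.39 among adults for k = 0.95); without drive, lethality alone drives it steadily toward loss.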

Mathematical models that combine gametic and 
individual selection, however, predict that the com- 
plete t haplotype should be found much more fre- 
quently than it is. Selection at the level of groups or 
populations may be responsible for the discrepancy. 

Mouse populations are often small and founded by 
only a few individuals. As a result, genetic drift may 
have a large influence on allele frequencies within 
them. In a few populations the complete t may become 
very common. When it does there is also a possibility 
that, by chance, all the offspring produced will be 
homozygous for a complete t haplotype. If they are, 
the population is doomed to extinction. If the popula- 
tion is recolonized, the new colonists will probably 
have a lower frequency of the complete t haplotype. 
Selection among groups favors groups with a low 
frequency of the complete t haplotype, reinforcing 
selection at the level of individual organisms. 

Evolution by natural selection among groups or 
populations is possible when: 


1. Groups differ from one another in their probability 
of extinction, in the probability that migrants from 
them found new populations, or in the probability 
that migrants from them are incorporated into 
existing groups. 

2. Migrants that form new groups or are incorporated 
into existing groups resemble the groups from which 
they were drawn. 


As the example of the complete t haplotype makes 
clear, evolution by natural selection need not produce 
the best possible results for a whole species or even for 
individual populations. The ‘best’ result for mouse 
populations would be if the complete t haplotype 
were completely eliminated. Selection in favor of 
the complete t haplotype at the level of gametes en- 
sures that this will not happen, and the result is an 
equilibrium (a compromise) between two extremes. 
‘Nature,’ as a word by itself, is simply a proxy for the
terms ‘genetics’ or ‘heredity’; ‘nurture,’ by itself, is
simply a proxy for the terms ‘environment’ or ‘experience,’
broadly construed. Separating the proxies by
‘versus’ (vs.) or adding the word ‘controversy,’ as in the
title of this entry, opens a Pandora’s box going back
to post-Darwinian times and to the neo-Mendelian
period that followed the rediscovery of Mendel’s work
in 1900. An alias for the nature-nurture controversy
is the heredity-environment debate, giving rise to
prototypes known as ‘hereditarians’ and ‘environmentalists’,
whose descendants can still be found today,
fortunately as outliers.


Roots of the Controversy 


Francis Galton (1822-1911), an English gentleman-scholar
with an impeccable pedigree that included
Charles Darwin (1809-1882) as his half-cousin (they
shared Erasmus Darwin as their grandfather), is
widely credited with using ‘nature’ and ‘nurture’ in
apposition. He did so first in his 1874 book ‘English
Men of Science: Their Nature and Nurture’ (Galton,
1874), a follow-on to his famous book of 1869,
‘Hereditary Genius’ (Galton, 1869); both books focused
on what makes men eminent, and some more
eminent than others (Galton, 1874, pp. 12, 16):


The phrase ‘nature and nurture’ is a convenient jingle of 
words, for it separates under two distinct heads the innumer- 
able elements of which personality is composed. Nature is 
all that a man brings with himself into the world; nurture 
is every influence from without that affects him after 
birth....In the competition [sic] between nature and nur- 
ture, when the differences in either case do not exceed those 
which distinguish individuals of the same race living in the 
same country under no very exceptional conditions, nature 
certainly proves the stronger of the two. 


His self-confidence in such a formulation was bol- 
stered by his anecdotal observations on twins’ mental 
and physical characteristics, even though he did not 
know then that identical twins and same-sex fraternal 
twins must be kept separate in order to make scientific 
use of the information on twins. 

Returning to his quest for a simple-minded dichotomy
of forces, with nature and nurture competing
for ‘supremacy’ to explain variation in human
behaviors, Galton published a paper entitled “The
History of Twins, as a Criterion of the Relative Powers
of Nature and Nurture,” first in Fraser’s Magazine
(November, 1875) and, virtually verbatim, that same
year in the Journal of the Anthropological Institute
(Galton, 1875). He was still misled by his assumption
that all same-sex pairs came from one ovum and hence
were monozygotic, or identical, twins. From 94 pairs
of twins whom he had contacted with questionnaires,
he selected for special attention 35 pairs that were
said to have been closely similar in childhood, of
whom some 20 pairs grew to be unalike, and combined
those observations with anecdotal descriptions of
twins’ behaviors from other observers to formulate an
even stronger statement than his earlier one (Galton,
1875, p. 404):


The impression that all this evidence [sic] leaves on the mind 
is one of some wonder whether nurture can do anything at 
all, beyond giving instruction and professional training.... 
There is no escape from the conclusion that nature prevails 
enormously [italics added] over nurture when the differences 
of nurture do not exceed what is commonly to be found 
among persons of the same rank of society and in the same 
country. 


Thus were planted the seeds to perpetuate an empty 
controversy which advances in knowledge over the 
past 125 years should have laid to rest. 

It is curious that this gentleman, fond of quoting 
Shakespeare throughout his writings, did not ac- 
knowledge priority of authorship for the alliteration 
on which we focus. In The Tempest, Shakespeare gives 
his own views on the unmodifiability of behavior, 
given a strong hereditary predisposition. Prospero, 
describing Caliban, says, “A devil, a born devil, on 
whose nature nurture can never stick, on whom my 
pains, humanely taken, all, all lost, quite lost” 
(4.1.187-190). 

Although neither Darwin nor Galton made use of 
a valid theory for explaining inheritance, Gregor 
Mendel (1822-1884), their contemporary, labored 
without fanfare in central Europe to produce a viable 
theory of heredity from his observations on pea 
plants; it would not be until 1900 that his ideas were 
rediscovered in Holland, Germany, and Austria, too 
late to influence the initial shape of the controversy. 
The fact that Galton coined the term and concept of
‘eugenics,’ and that he held overvalued ideas about
the superiority of the white race over black South
Africans, makes it an uphill struggle to disentangle
from the nature-nurture controversy the useful ideas
he generated: the importance of studying individual
differences, his contributions to measurement
(regression and correlation), and his founding of what
came to be called behavioral genetics. Also absent
from the array of concepts that could, in the aggregate,
have prevented the simple dichotomy from taking
root and proliferating were the concepts of ‘gene’
(1909), ‘genetics’ (1906), ‘mutation’ (1907), and
the critical distinction we now take for granted,
between genotype and phenotype, as made
in 1909 by W. Johannsen, a Danish botanist. It is
noteworthy that Mayr (1982), in his magnificent book
The Growth of Biological Thought: Diversity, Evolution,
and Inheritance, relegates the phrase ‘nature vs.
nurture’ to one line of one footnote and describes
Galton as a “dilettante and maverick,” albeit also as a
pioneer of population approaches to
human variation.


Polarization in the Political and Policy Arenas


With the rise of behaviorism as a dominant theory for
explaining human behavior and its variation in the
1920s, J. B. Watson threw down the gauntlet for
environmentalism with his biology-free battle cry, to the
effect that, given a dozen healthy infants, he would
guarantee that any one taken at random could be trained to
be anything from a physician to a thief, regardless of
any “raw material” (Watson, 1924, 1928). The battle was then
joined for some two decades, focusing on individual
differences in intelligence test scores and school
achievement, with claims and counterclaims about
the power of family influences and education to
eradicate the variation seen in the general population.
Twin studies came into their own after being put on 
a sound scientific basis by Siemens in Germany and 
Merriman in the USA in 1924. A rich harvest of adoption
studies contributed results, both for and
against, on the effects of rearing, in decent middle-class
homes, children who came from poor environments
and phenotypically inadequate parents. Regrettably,
the studies were conducted in an adversarial
atmosphere by researchers uninterested in compromise;
seldom was it appreciated that the phenotype was an 
echo of a series of distal causes including the genotype, 
and that the same phenotype, for example, mental 
retardation, could arise from disparate causes ranging 
from poor prenatal nutrition or diseases to hundreds 
of rare dominant and recessive gene loci. Protagonists 
in the first half of the twentieth century were as insensi- 
tive as those in the last half of the nineteenth century 
to the reality that familiality per se could result from 
gene-sharing, experience-sharing, or culture-sharing 
and, most often, from all three to different degrees 
for different traits. Two different volumes of the 
National Society for the Study of Education (vol. 27 in 
1928, and vol. 39 in 1940), as well as an effort at 
compromise by R. S. Woodworth (1940) for the Social 
Science Research Council, provide a rich source of
materials and attitudes of the times. The science of
genetics has much to teach the social scientists,
provided that geneticists are not themselves held captive
by some ideology from either the political right or
the political left. Observed variation for complex
traits in human populations will always arise from
different combinations of three major causes: genetic
variance, environmental variance, and variance
resulting from genotype-by-environment interaction.
Such facts by themselves do not provide sufficient
guidance for determining the roles of these components
in the development of a particular trait in a
particular individual.


Dialectic Reconciliation 


Given the history of the concept of the nature-nurture 
controversy, it is easy to embrace the advice proffered 
by Dobzhansky (1962): “The complexity of nature 
should not be evaded. The only way to simplify nature 
is to study it as it is, not as we would have liked it to 
be.” A systems approach to the study of genetically 
influenced traits and diseases is now widely accepted 
within the field of genetics, but the specifics are yet to 
be detailed and they may be expected to be different 
across traits and across species. A start can be made,
however, that places the controversies about the
ancient controversy squarely among concepts of
historical interest only. Figure 1 is an effort to combine
experiences in research into behavioral genetics, in this 
case cognitive abilities, with those from coronary 
artery disease (Sing et al., 1994). The schema, or car- 
toon if you wish, tries to accommodate the various 
causes that are known or suspected that account for 
the variation in general human cognitive ability (also 
known as ‘g’). It provides indicators of causes and 
contributors all along the complex pathways from 
genotype to phenotype, highlighting, at the most dis- 
tal end of a pathway, the genes themselves by name; the 
latter can be considered to be quantitative trait loci 
(QTLs) in the ‘g-relevant’ system and have been
discovered in the course of research using linkage and
association strategies to understand one or another aspect
of brain functioning in humans, mice, and Drosophila. 
Many more such genes and gene regions will be uncov- 
ered with the rapid advances being made in mapping 
the entire genomes of these species. Obviously, each 
trait-relevant gene can harbor mutations for one or 
more of the functional polymorphisms that can 
enhance or diminish the phenotype specified at the 
far end of the system. 

Four different, trait-relevant ‘endophenotypes’ 
are conjectured that mediate the indirect influences 
of the named genes’ products and regulatory functions 
on the phenotype of interest. Such a separation of 
levels of influence in the gene-to-behavior pathway 
should facilitate research from a bottom-up approach, 
as it restrains complexity to the shorter path from genes 
to endophenotypes. The realms of developmental 
genetics are sketched in the upper part of the figure, 
once the possible combinations of endophenotypes 
are launched at zygote formation. Pre-, peri-, and 
postnatal influences are then free to play their roles 
in the development of the trait of general cognitive 
ability along the time (age) dimension shown, while at 
the same time (cf. Dobzhansky, 1962) environmental 
forces that can be construed as falling along a dimen- 
sion of stifling to facilitating come into play. The net 
result of all these influences changes over time; the
resulting levels (values) of the trait are indicated
by points on the ‘reaction surface’ in Figure 1.
Each person would have a value at each age that 
depends on his or her individualized history for each 
of the elements in the system, e.g., genotype, ex- 
pressed genotype, endophenotype, developmental 
history, and so forth. No simple-minded or simplistic 
model that contains the vague terms ‘nature’ and ‘nur- 
ture’ can have the heuristic power to design and imple- 
ment research in genetics that is possible with the 
systems approach sketched here. 
Figure 1 (See Plate 24) Schematic representation of a systems approach for explaining individual differences for the
trait of general cognitive ability encompassing genes (quantitative trait loci, QTLs) at the distal end of the gene-to-
behavior pathway, endophenotypes, and the developmental genetic aspects of the reaction surface. (Reproduced with
permission from Gottesman II (1997) Twins: en route to QTLs for cognition. Science 276: 1522. Copyright © 1997
American Association for the Advancement of Science.)


Prototype for a Complex Systems Approach


What is the role of genetic factors in the causation of 
cancers? What is the role of environmental factors in 
the causation of cancers and are they those shared 
within a family or those unique to the individual and 
not shared with other family members? The answers 
to such very difficult questions can be approached, in 
the context of this article, by looking at a unique study
(Lichtenstein et al., 2000) of 45 000 twin pairs from Sweden,
Denmark, and Finland, of whom 9500 pairs had at least 
one member with cancer at one of the most common 
28 sites. Major mendelizing genes that cause cancer, 
e.g., BRCA1 and BRCA2 for breast cancer, are quite 
rare, so the investigators used genetic models appro- 
priate for complex traits/diseases that have unknown 
amounts of genetic, environmental, and interactional 
variance to estimate such values. Absolute levels of 
concordance for same-site cancers were quite low for 
both identical and fraternal twins, for example, 13%
and 9%, respectively, for female breast cancer, and 18%
and 3% for prostate cancer. Nonetheless, 67%, 6%, and
27% of the variance in the liability to developing breast 
cancer could be attributed to unique environmental 
exposures, shared family environmental factors, and 
heritable factors, respectively. The proportions of vari- 
ance for liability to prostate cancer were 58%, 0%, and 
42%; for liability to lung cancer, 62%, 12%, and 26%; 
and, lastly, for colorectal cancers, 60%, 5%, and 35%. 
Thus, for these cancers, the role for unique environ- 
mental risk factors predominated, but the modern 
strategy revealed appreciable roles for genetic factors 
that make searching for them worthwhile, with a great 
potential for prevention. In sum, may the nature- 
nurture controversy rest in peace, and may the systems 
approach to complex causality with genetics move 
forward. 
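The variance proportions quoted above can be connected to twin similarity through the standard ACE model of quantitative genetics. The sketch below is an illustration under textbook assumptions (monozygotic twins share all, and dizygotic twins half, of the additive genetic variance A; both twin types share the common environment C equally; no interaction terms), not a reproduction of the authors' liability-threshold analysis. It computes the twin correlations in liability implied by each set of proportions and inverts them with Falconer-style formulas. These correlations are not the same quantities as the concordance rates quoted above. The function names are hypothetical.

```python
import math

# Variance proportions (unique environment E, shared environment C,
# heritable A) as quoted in the text from Lichtenstein et al. (2000).
COMPONENTS = {
    "breast":     {"E": 0.67, "C": 0.06, "A": 0.27},
    "prostate":   {"E": 0.58, "C": 0.00, "A": 0.42},
    "lung":       {"E": 0.62, "C": 0.12, "A": 0.26},
    "colorectal": {"E": 0.60, "C": 0.05, "A": 0.35},
}


def expected_twin_correlations(a, c, e):
    """Liability correlations implied by a standard ACE model.

    Assumes MZ twins share all, and DZ twins half, of the additive genetic
    variance, and that both twin types share C equally.
    """
    if not math.isclose(a + c + e, 1.0, abs_tol=1e-9):
        raise ValueError("variance proportions must sum to 1")
    return a + c, 0.5 * a + c


def falconer(r_mz, r_dz):
    """Recover (A, C, E) from MZ and DZ correlations (Falconer-style)."""
    a = 2 * (r_mz - r_dz)
    c = 2 * r_dz - r_mz
    return a, c, 1 - r_mz


for site, v in COMPONENTS.items():
    r_mz, r_dz = expected_twin_correlations(v["A"], v["C"], v["E"])
    print(f"{site:11s} r_MZ = {r_mz:.3f}  r_DZ = {r_dz:.3f}")
```

For breast cancer, for example, the assumed model implies liability correlations of 0.33 for identical and 0.195 for fraternal twins; running the inversion recovers the original proportions, which is a quick consistency check rather than an estimate.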
Further Reading

Carey G. Human Genetics for the Social Sciences. Thousand Oaks,
CA: Sage Publications (in press). 

Jones BC and Mormede P (eds) (1999) Neurobehavioral Genetics: 
Methods and Applications. Boca Raton, FL: CRC Press. 

Plomin R, DeFries JC, McClearn GE and McGuffin P (2001) 
Behavioral Genetics, 4th edn. New York: WH Freeman. 

Plomin R and McClearn GE (eds) (1993) Nature, Nurture and 
Psychology. Washington, DC: American Psychological Asso- 
ciation. 

Rowe DC (1994) The Limits of Family Influence: Genes, Experience, 
and Behavior. New York: Guilford Press. 


References 

Dobzhansky T (1962) Mankind Evolving. New Haven: Yale 
University Press. 

Galton F (1869) Hereditary Genius: An Enquiry into its Laws and 
Consequences. London: Macmillan. 

Galton F (1874) English Men of Science: Their Nature and Nurture. 
London: Macmillan. 

Galton F (1875) The History of Twins, as a Criterion of the
Relative Powers of Nature and Nurture. Journal of the
Anthropological Institute 6: 391-406.

Lichtenstein P et al. (2000) Environmental and heritable factors in
the causation of cancer: analyses of cohorts of twins from
Sweden, Denmark and Finland. New England Journal of
Medicine 343: 78-83.

Mayr E (1982) The Growth of Biological Thought: Diversity,
Evolution, and Inheritance. Cambridge, MA: Harvard University
Press.

Sing C, Zerba KE and Reilly SL (1994) Traversing the biological
complexity in the hierarchy between genome and the CAD
endpoints in the population at large. Clinical Genetics 46:
6-14.

Watson JB (1924) Behaviorism. New York: Norton and Co. 

Watson JB (1928) Psychological Care of Infant and Child. New 
York: Norton and Co. 


See also: Heritability; Intelligence and the 
‘Intelligence Quotient’; QTL (Quantitative Trait 
Locus) 


Nearly Neutral Theory 
History


Figure 1 Diagram to show how new mutants are
classified under the selection, the neutral, and the nearly
neutral theories.


The nearly neutral theory of molecular evolution is an
extension of the neutral theory. It was put forward by
T. Ohta in the early 1970s. The theory contends that
the borderline mutations whose effects lie between the
selected and the neutral classes are important at the
molecular level. Their fate is influenced by both ran- 
dom genetic drift and selection. The rate of molecular 
evolution is highly dependent upon selective con- 
straints of proteins or nucleic acids; highly constrained 
proteins like histone IV evolve very slowly, whereas 
little constrained ones like fibrinopeptides change 
rapidly. Under the neutral theory, it is assumed that a 
certain fraction of new mutations are free of constraint 
or are selectively neutral, while the rest have deleteri- 
ous effects and are eliminated from the population. 
The nearly neutral theory regards the borderline 
mutations as most significant in molecular evolution, 
and is directed toward understanding the interaction 
between random genetic drift and selection. Figure 1
compares how new mutations are classified under the
selection, the neutral, and the nearly neutral theories.
During the 1970s and the early 1980s, data on protein
evolution and polymorphisms accumulated, but the
debate between the neutral and the selection theories
continued. In the late 1980s, comparative studies of
DNA sequences became possible, and functionally
unimportant parts of DNA were found to evolve rapidly.
This result favored the neutral theory over the
selection theory, and supporters of neutralism
increased. At the same time, molecular systematics
expanded, and the molecular clock was thought to be
consistent with the strict neutral theory. In the 1990s,
DNA sequence data increased, enabling detailed
analyses of the pattern of nucleotide substitutions.
Deviations from strict neutrality have often been
found, and detailed study of the nearly neutral theory
has become necessary to account for these observations.


Implications of Theory 


The nearly neutral theory is summarized as follows.
Random drift and selection both influence the
behavior of very weakly selected mutations, with
drift predominating in small populations and
selection in large populations. Most new mutations are
deleterious, and most mutations with small effects are
likely to be slightly deleterious. Such mutations are 
selected against in large populations, but behave as if 
neutral in small populations. They are called nearly 
neutral mutations, and entail a negative correlation 
between evolutionary rate and population size. Quan- 
titative treatment may be pursued in terms of the 
principle that the rate of gene substitution equals the 
number of new mutations multiplied by their fixation 
probability. 
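The principle stated above, rate of substitution = (number of new mutations) × (fixation probability), can be sketched numerically. The fixation probability used here is Kimura's diffusion approximation for a new mutant with additive effect s in a diploid population of size N, with effective size assumed equal to census size; that formula, the function names, and the parameter values are assumptions added for illustration. With 2Nμ new mutations per generation the neutral rate reduces to μ whatever N is, while the rate for a slightly deleterious mutation falls as N grows, which is the negative correlation described in the text.

```python
import math


def fixation_probability(s, n):
    """Kimura's diffusion approximation for a new mutant.

    Assumes a diploid population of size N (effective size = census size),
    initial frequency 1/(2N), and additive selection with coefficient s
    (s < 0 for deleterious mutations).
    """
    if s == 0:
        return 1.0 / (2 * n)
    x = -4.0 * n * s
    if x > 700.0:
        # Strongly deleterious: expm1(x) would overflow; expm1(x) ~ exp(x) here.
        return math.expm1(-2.0 * s) * math.exp(-x)
    return math.expm1(-2.0 * s) / math.expm1(x)


def substitution_rate(mu, s, n):
    """Rate per generation = (new mutations, 2*N*mu) x (fixation probability)."""
    return 2 * n * mu * fixation_probability(s, n)


if __name__ == "__main__":
    mu = 1e-8  # hypothetical mutation rate per site per generation
    for n in (1_000, 10_000, 100_000):
        neutral = substitution_rate(mu, 0.0, n)
        slightly_bad = substitution_rate(mu, -1e-5, n)
        print(f"N = {n:>7}: neutral k/mu = {neutral / mu:.3f}, "
              f"s = -1e-5 k/mu = {slightly_bad / mu:.3f}")
```

The guard for large 4N|s| only avoids floating-point overflow; in that regime the fixation probability is effectively zero anyway.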

An important prediction of the nearly neutral the- 
ory is related to the molecular clock of amino acid 
substitution, which is dependent on the chronological 
time rather than on the generation number. Mutation 
rate depends on the number of cell generations, and
DNA regions without genetic information should
evolve at a rate that directly reflects the number of generations. As
an empirical observation, large organisms with long 
generation time tend to have small population size and 
vice versa. Then, under the nearly neutral theory, the 
generation-time effect of mutation rate partially can- 
cels the population-size effect of fixation probability. 
On the other hand, for DNA regions without gene- 
tic information, such cancellation is not predicted. 
The prediction was tested by comparing the patterns 
of synonymous and nonsynonymous substitutions. 
Forty-nine gene sequences of three orders, primate, 
artiodactyl and rodent, were analyzed. The results 
show that the generation-time effect is more conspicu- 
ous for synonymous substitutions than for non- 
synonymous substitutions, i.e., the rodent branch is 
much longer than the primate branch for synonymous 
changes, but the difference between the two branches is not
so large for nonsynonymous ones. Primates generally 
have longer generation times, and the difference in the 
patterns of the two types is consistent with the nearly 
neutral theory. 
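The generation-time argument can be illustrated with the same ingredients. In this hypothetical sketch, two lineages have the same mutation rate per generation; the ‘rodent-like’ lineage has a short generation time and a large population, the ‘primate-like’ lineage the reverse, with N times generation time held constant. Synonymous changes are treated as strictly neutral (s = 0) and nonsynonymous changes as slightly deleterious (s = -1e-5). All numbers, labels, and the use of Kimura's fixation formula are assumptions for illustration, not values from the study described above.

```python
import math


def fixation_probability(s, n):
    """Kimura's approximation for a new additive mutant at frequency 1/(2N)."""
    if s == 0:
        return 1.0 / (2 * n)
    x = -4.0 * n * s
    if x > 700.0:  # avoid overflow for strongly deleterious mutants
        return math.expm1(-2.0 * s) * math.exp(-x)
    return math.expm1(-2.0 * s) / math.expm1(x)


def rate_per_year(mu_per_generation, s, n, generation_time_years):
    """Substitutions per site per year: 2*N*mu*u(s, N) / generation time."""
    per_generation = 2 * n * mu_per_generation * fixation_probability(s, n)
    return per_generation / generation_time_years


# Hypothetical lineages (N, generation time in years); N * g is held constant
# to mimic the tendency for long-lived organisms to have small populations.
LINEAGES = {"rodent-like": (100_000, 0.5), "primate-like": (5_000, 10.0)}
MU = 1e-8          # per generation, assumed equal in both lineages
S_NONSYN = -1e-5   # assumed slightly deleterious nonsynonymous change


def branch_ratio(s):
    """Per-year rate in the rodent-like lineage divided by the primate-like."""
    (n_r, g_r), (n_p, g_p) = LINEAGES["rodent-like"], LINEAGES["primate-like"]
    return rate_per_year(MU, s, n_r, g_r) / rate_per_year(MU, s, n_p, g_p)


if __name__ == "__main__":
    print(f"synonymous (s = 0) rodent/primate ratio:   {branch_ratio(0.0):.1f}")
    print(f"nonsynonymous (s = {S_NONSYN}) ratio:       {branch_ratio(S_NONSYN):.2f}")
```

With these assumed values, the per-year synonymous rates differ by the full ratio of generation times (20-fold), while for the slightly deleterious class the larger fixation probability in the small primate-like population largely cancels the generation-time effect, leaving a much smaller ratio.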


Population Genetic Studies 


Data on DNA polymorphisms within populations are 
rapidly accumulating. Under the neutral theory, most 
polymorphisms are phases of gene substitution, and 
quantitative predictions can be made. Again, by separ- 
ately measuring synonymous and nonsynonymous 
polymorphisms, some departures from the neutral 
prediction were reported. When the neutral theory is
rejected, however, it is often difficult to discriminate
between the selection theory and the nearly neutral
theory as alternatives, because various patterns of
polymorphism may be explained under both theories.
For example, if the selection coefficient of a nearly
neutral mutant differs in sign between local colonies,
and migration is limited, the very weak selection would
be effective in maintaining polymorphisms, and it
would be difficult to distinguish this from balancing
selection.

With the progress of the genome diversity project,
data on DNA polymorphisms are rapidly accumulating.
These data have revealed a prevalence of slightly
deleterious or nearly neutral amino acid substitutions,
again by comparing the patterns of synonymous and
nonsynonymous single nucleotide polymorphisms.

Precise formulation of the nearly neutral theory is 
difficult. Much depends on the assumption of the 
fitness distribution of mutations. Epistatic interaction 
at various levels, such as among amino acid sites within 
a protein and between regulatory regions of DNA and 
proteins, is another important factor that needs to be 
studied in shaping the nearly neutral model. Variation
in the evolutionary rate of proteins, which was found to
be too large to fit the neutral prediction, reflects such
interactive systems.

Although there are differences between the neutral
and the nearly neutral theories, some scientists use the
former term to include the latter, because random
drift is the driving force in both theories. It is
suggested here that, in professional discussions, the
term ‘nearly neutral theory’ should be used as such.
In general discussions, however, the neutral theory in
the broad sense may include both theories. In particular,
selective neutrality usually means the set of all
mutations around strict neutrality.


See also: Epistasis; Fixation Probability; Gene 
Substitution; Molecular Clock; Neutral Theory; 
Selective Neutrality 
Negative complementation refers to interallelic com-
plementation where a mutant subunit suppresses the 
activity of a wild-type subunit in a multimeric protein. 


See also: Complementation 
In the recombination of linked markers, the coefficient
of coincidence ($C$) measures the degree of correlation
of recombination in two intervals on the same linkage
group as $C = R_{12}/(R_1 R_2)$, where $R_1$ and $R_2$ are the
recombination frequencies for the two intervals and
$R_{12}$ is the frequency of individuals recombinant
simultaneously in the two intervals (double recombinants).
Interference is defined by $I = 1 - C$. $I$ is positive when
double recombinants are fewer than expected on the null
hypothesis that exchanges in the two intervals occur
independently of each other; $I$ is negative when
exchanges in the two intervals are positively correlated
with each other.
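As a worked illustration of these definitions, the function below computes C and I from the progeny classes of a three-factor cross. The counts are hypothetical, and the function assumes each progeny individual has already been classified as parental, recombinant in interval 1 only, recombinant in interval 2 only, or recombinant in both.

```python
def coincidence_from_counts(parental, single_1, single_2, double):
    """Return (C, I) for two adjacent intervals from three-factor cross counts.

    single_1, single_2: progeny recombinant in interval 1 only / interval 2 only.
    double: progeny recombinant in both intervals (double recombinants).
    """
    total = parental + single_1 + single_2 + double
    r1 = (single_1 + double) / total   # R1 counts every exchange in interval 1
    r2 = (single_2 + double) / total   # R2 counts every exchange in interval 2
    r12 = double / total               # R12: observed double recombinants
    c = r12 / (r1 * r2)                # observed / expected under independence
    return c, 1.0 - c


if __name__ == "__main__":
    # Hypothetical counts: doubles slightly fewer than expected, then clustered.
    for counts in [(800, 120, 70, 10), (900, 40, 30, 30)]:
        c, i = coincidence_from_counts(*counts)
        kind = "positive" if i > 0 else "negative" if i < 0 else "no"
        print(f"{counts}: C = {c:.2f}, I = {i:.2f} ({kind} interference)")
```

Note that $R_1$ and $R_2$ each include the double recombinants; omitting them would inflate $C$.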


Bacteriophage Crosses 


In bacteriophage crosses, as conventionally con- 
ducted, interference is characteristically negative. A 
portion of this negative interference can be accounted 
for by heterogeneities in opportunities for recombin- 
ation and, for some phages, by the circularity of the 
linkage map, which requires a minimum of two ex- 
changes per recombinant chromosome. The magni- 
tude of this interference is largely independent of R 
values. 

For markers in the same or adjacent genes, however, 
negative interference is seen to increase in absolute 
value as the markers employed are ever closer. From 
a formal point of view, that observation implies a 
clustering of exchange events. This localized (or 
‘high’) negative interference is a result of several 
factors: 


1. A recombinational interaction may result either in
the splicing together of segments of two DNA
duplexes or in the patching of a short segment (less
than about 1 kb) of a single strand from one DNA
duplex into another (see Figure 1). The latter
event contributes to localized negative interference
by producing a close double exchange.




2. When two duplexes differ at two or more close
sites, both sites may be included in the splice or
patch. Mismatch repair operating on only some of the
sites, or in opposite directions at different sites, can
contribute to the apparent clustering of exchanges.

3. In T-even phages, chromosome ends, which are
recombinogenic, are at different loci in different
particles. In a given host cell in which a cross is 
conducted, there is a higher than average rate of 
recombination in regions of the phage chromosome 
that are near an end of an infecting chromosome. 


Meiosis 


In meiosis, interference between reciprocal exchanges 
leading to crossing-over is absent in some organisms 
and positive in others. However, when markers are in 
the same or neighboring genes, recombination mani- 
fests localized negative interference as it does in phage. 
Meiotic tetrad analysis reveals that crosses showing 
such localized negative interference are producing 
recombinants primarily by gene conversion, a non- 
reciprocal process that violates the Mendelian segre- 
gation ratio of 2:2. The underlying mechanisms of 
conversion are much the same as those accounting 
for localized negative interference in phage crosses. 
Figure 1 Simplified scheme of recombination between a pair of homologous DNA duplexes, one black and one
white. With comparable probability, segments of DNA from the two duplexes can be spliced together (left), or a
patch (right) may be donated from one duplex to the other. Splices and patches can be either reciprocal or
nonreciprocal, depending on a variety of factors. Patches are an important source of negative interference. Repair of
any mismatches within the segments of hybrid DNA (black on one strand, white on the other) can make further
contributions to negative interference.

The coefficient of coincidence for a pair of adjacent
intervals is most sensitively measured in a three-factor
cross, which allows direct determination of $R_{12}$, the
frequency of double crossover chromosomes. The
three separate two-factor crosses allow a less
sensitive estimate of the coefficient of coincidence through
the relation $R_3 = R_1 + R_2 - 2CR_1R_2$, where $R_3$ is the
recombinant frequency for the outside markers. In
some circumstances, the two methods give
significantly different estimates of $C$, indicating that the
markers themselves, rather than only the distances
between the markers, are influencing recombinant
frequencies. In extreme cases, the two-factor cross
method gives negative values for $C$ (map expansion).
Such discrepancies are seen with crosses between
heteroalleles, where mismatch repair of heteroduplex
recombination intermediates is a major determinant of
recombinant frequencies.
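Rearranging the two-factor relation for $C$ gives $C = (R_1 + R_2 - R_3)/(2R_1R_2)$, where $R_3$ is the recombinant frequency for the outside markers. This makes it easy to see when a negative estimate arises: whenever $R_3$ exceeds $R_1 + R_2$. A minimal sketch with hypothetical frequencies:

```python
def coincidence_from_two_factor(r1, r2, r3):
    """Estimate C by rearranging R3 = R1 + R2 - 2*C*R1*R2.

    r1, r2: recombinant frequencies for the two adjacent intervals.
    r3: recombinant frequency for the two outside markers.
    """
    return (r1 + r2 - r3) / (2.0 * r1 * r2)


if __name__ == "__main__":
    r1, r2 = 0.10, 0.20   # hypothetical two-factor results
    for r3 in (0.26, 0.28, 0.30, 0.32):
        c = coincidence_from_two_factor(r1, r2, r3)
        note = "  (R3 > R1 + R2: map expansion)" if r3 > r1 + r2 else ""
        print(f"R3 = {r3:.2f}: C = {c:.2f}{note}")
```

Because the estimate depends on a small difference between measured frequencies, modest sampling error in $R_3$ moves $C$ considerably, which is why the three-factor cross is the more sensitive method.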


Further Reading 
Stahl FW (1979) Genetic Recombination: Thinking about It in Phage
and Fungi. San Francisco, CA: WH Freeman.


See also: Coincidence, Coefficient of; 
Heteroallele; Heteroduplexes; Interference, 
Genetic; Map Expansion; Mismatch Repair (Long/ 
Short Patch); Recombination, Models of 
Negative regulators are regulatory molecules that
function by switching off transcription or translation. 


See also: Regulatory Genes 


Negative Supercoiling 
Negative supercoiling is the twisting of a DNA duplex
in space in the opposite sense to that of the turns of the
double helix.


See also: DNA Supercoiling 


Neoteny 
Neoteny has at least two interrelated meanings. In the
general literature of evolutionary biology, neoteny is
one of a set of terms that define the relationship
between development and evolution. The most general
of these terms is heterochrony, which may be defined
as a phylogenetic shift in the timing of expression of a
feature between ancestors and descendants. Descendants
may be relatively juvenilized compared to their
ancestors (a phenomenon known as pedomorphosis) 
or they may demonstrate an extension of the ancestral 
ontogeny beyond the normal endpoint seen in an 
ancestor (peramorphosis). Neoteny is one way of 
achieving pedomorphosis, in which somatic develop- 
ment is slowed down, resulting in a sexually mature 
descendant adult that is relatively juvenile with 
respect to its immediate ancestor. 

Neoteny also has a much more specialized mean- 
ing, and refers to a pedomorphic condition found in 
many salamanders and newts (Urodela). The ancestral 
condition for urodeles is a biphasic life cycle, with an 
aquatic larval phase, a distinct metamorphosis, and a 
prolonged, postmetamorphic juvenile and adult phase. 
However, in many salamanders and newts the post- 
metamorphic phase is eliminated, and individuals, 
populations, or entire species go through life as sexu- 
ally mature larval individuals. This phenomenon of 
larval reproduction is often referred to as ‘neoteny,’ 
particularly in the older amphibian literature. Since 
the early 1980s many evolutionary biologists have 
referred to larval reproduction in urodeles simply as 
pedomorphosis, because it is clearly a condition in
which descendants are juvenile in most features with 
respect to their metamorphosing ancestors. 

The most famous example of neoteny (or pedo- 
morphosis) is the Mexican axolotl, Ambystoma 
mexicanum. Often referred to simply as ‘the axolotl,’ 
this species has been a model study system in develop- 
ment, including the interface of development and evo- 
lution, for decades. Ambystoma mexicanum is closely 
related to the tiger salamanders (A. tigrinum) from the 
United States and Mexico, and both species are mem- 
bers of a group of 15 species (the tiger salamander 
complex) in which pedomorphosis is extremely com- 
mon. Because the mechanistic basis of amphibian 
metamorphosis is reasonably well understood, and 
the axolotl is such a well-established model system, 
recent research on amphibian ‘neoteny’ has focused 
on the genetics and molecular basis of metamorphic 
failure in A. mexicanum. 

Recent genetic work using artificial crosses between
A. mexicanum and wild-caught, metamorphosing eastern
tiger salamanders (A. tigrinum) has confirmed that
metamorphosis is dominant to metamorphic failure and
may be controlled by one or a few genes. Environmental
conditions can also influence the expression of the
metamorphosis phenotype, with both food level and
temperature having an influence in the laboratory.
Quantitative trait locus (QTL) mapping experiments
with laboratory axolotls show a similar result, with a
single QTL explaining over 90% of the variance in
completion of metamorphosis.
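A claim that a single QTL "explains over 90% of the variance" refers to the share of phenotypic variance accounted for by genotype at that locus. The sketch below illustrates the idea under stated assumptions. It uses a simple single-marker analysis in which the variance explained is the between-genotype sum of squares divided by the total sum of squares. The genotype labels ("Mm", "mm") and the scores are hypothetical toy values, not data from the axolotl studies.

```python
from statistics import mean


def variance_explained(genotypes, phenotypes):
    """Fraction of phenotypic variance explained by genotype at one locus.

    Computes SS_between / SS_total (a one-way ANOVA on a single marker).
    """
    if len(genotypes) != len(phenotypes) or not phenotypes:
        raise ValueError("need equal-length, non-empty inputs")
    grand = mean(phenotypes)
    ss_total = sum((y - grand) ** 2 for y in phenotypes)
    if ss_total == 0:
        return 0.0  # no phenotypic variance to explain
    # Group phenotype scores by genotype class.
    groups = {}
    for g, y in zip(genotypes, phenotypes):
        groups.setdefault(g, []).append(y)
    ss_between = sum(len(ys) * (mean(ys) - grand) ** 2 for ys in groups.values())
    return ss_between / ss_total


# Hypothetical toy data: 1 = completed metamorphosis, 0 = remained larval.
genos = ["Mm"] * 5 + ["mm"] * 5
pheno = [1, 1, 1, 1, 1, 0, 0, 0, 0, 1]
print(round(variance_explained(genos, pheno), 3))  # 0.667
```

In this toy case one "mm" individual still metamorphoses, so genotype explains only two-thirds of the variance. A locus explaining over 90% would leave very few such exceptions.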


When these experiments were repeated using wild-caught
A. mexicanum, metamorphic failure appeared to have a
more complex genetic basis, suggesting that some
evolution has occurred in laboratory lines. Candidate
gene analysis has failed to identify the precise
mechanisms by which metamorphosis is blocked in the
axolotl, although some feature of the thyroid hormone
cascade is probably involved.

At a somewhat more phenomenological level, neo- 
teny has been suggested as playing a major role in the 
evolution of groups as diverse as humans, flowers, 
insects, trilobites, and most groups of amphibians. 
As increasingly sophisticated molecular tools are 
used to unravel the mechanistic basis of developmen- 
tal shifts during evolution, studies of neoteny should 
continue to contribute to our understanding of how 
the diversity of form evolves over time. 


See also: QTL (Quantitative Trait Locus) 
Mathematically speaking, a network is a concept in
graph theory in which the relationships among nodes
(abstract objects) are described by edges (lines). A
network is a graph defined as follows: all the nodes
are connected, so no node may be left without an edge
linking it to the rest of the network, and no edge may
connect a node to itself (self-connection is
prohibited). Figure 1 shows various types of network.
It should be noted that in graph-theoretical notation a
tree is also a network. In some older literature on
molecular evolution, however, rooted and unrooted trees
are called trees and networks, respectively. In recent
molecular evolution literature, 'networks' usually
means non-tree networks, which must contain at least
one reticulation (loop structure). Depending on the
data set, a network method may still yield a tree
structure, so it may be better to consider networks
first. When networks are constructed for phylogenetic
study, they are often called 'phylogenetic networks.'
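The definition above can be checked mechanically. The sketch below is a minimal version under stated assumptions. It takes an undirected graph as a node list and an edge list; the function name and arguments are illustrative. It rejects self-connections and disconnected nodes. It then applies the standard graph-theory fact that a connected graph with V nodes is a tree exactly when it has V - 1 edges, so any additional edge implies at least one reticulation (loop).

```python
from collections import deque


def classify_network(nodes, edges):
    """Return 'tree' or 'reticulate' for a graph meeting the network definition.

    nodes: iterable of hashable labels; edges: iterable of (u, v) pairs.
    Duplicate edges are counted once. Raises ValueError on a self-loop,
    an edge to an unknown node, or a graph that is not connected.
    """
    nodes = set(nodes)
    if not nodes:
        raise ValueError("a network needs at least one node")
    adjacency = {n: set() for n in nodes}
    unique_edges = set()
    for u, v in edges:
        if u == v:
            raise ValueError(f"self-connection on {u!r} is prohibited")
        if u not in adjacency or v not in adjacency:
            raise ValueError(f"edge ({u!r}, {v!r}) uses an unknown node")
        unique_edges.add(frozenset((u, v)))
        adjacency[u].add(v)
        adjacency[v].add(u)
    # Breadth-first search from an arbitrary node to test connectivity.
    start = next(iter(nodes))
    seen, queue = {start}, deque([start])
    while queue:
        for neighbor in adjacency[queue.popleft()]:
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    if seen != nodes:
        raise ValueError("every node must be connected to the rest of the graph")
    # A connected graph is a tree exactly when |E| = |V| - 1; extra edges form loops.
    return "tree" if len(unique_edges) == len(nodes) - 1 else "reticulate"


tree = [("1", "a"), ("2", "a"), ("a", "b"), ("3", "b"), ("4", "b")]
print(classify_network("1234ab", tree))  # tree
box = [("1", "a"), ("2", "b"), ("3", "c"), ("4", "d"),
       ("a", "b"), ("b", "c"), ("c", "d"), ("d", "a")]
print(classify_network("1234abcd", box))  # reticulate
```

The second example is a hypothetical four-leaf network whose internal nodes form one rectangular loop. The loop is the kind of reticulation that parallel changes or recombination can produce.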

The biological mechanisms responsible for causing 
reticulations (loops) are as follows: 


1. Parallel changes: when changes of the same type
occur at the same nucleotide or amino acid site but in
different lineages, a reticulation appears. Figure 1A
explains this schematically for four sequences.
Because parallel changes (designated as a star in the
phylogenetic tree) occurred in the lineages leading to
sequences 2 and 4, a long rectangle appears in the
phylogenetic network shown below the phylogenetic
tree. If the long edge at the bottom is omitted, we
obtain the correct unrooted tree corresponding to the
phylogenetic tree above.

Figure 1 (A) Above: phylogenetic relationship of four
sequences. Below: the corresponding network. Because
of parallel changes, designated by star symbols, a
reticulation appears in the network. (B) Above:
crossing-over between two alleles or haplotypes (1 and
2) produced two recombinants (3 and 4). Below: the
corresponding network.
Table 1 Example of mutually incompatible sites for
four sequences (W-Z)

Site  1  2  3  4  5  6  7
W     A  G  T  C  A  G  T
X     A  G  T  T  A  C  G
Y     A  T  A  C  A  C  G
Z     A  T  A  T  G  G  G

Figure 2 Three-dimensional phylogenetic network for the
data of Table 1.


2. Recombination: when a crossing-over occurs
between two alleles, the two parental alleles and the
two newly created recombinant alleles are related by a
reticulation. Figure 1B shows this relationship:
alleles 1 and 2 are parental, while 3 and 4 are
recombinants.

3. Gene conversion: when gene conversion occurs, a
reticulation may be created, as in the case of
recombination.

4. Admixture: when the evolutionary history of
populations, local races, or subspecies is considered,
populations that have already diverged genetically may
exchange genes (admixture) and may create a new hybrid
population. In this case, a reticulation appears in the
population history.


All the above events are particularly important when
closely related nucleotide sequences are considered.
As sequence divergence increases, the effects of these
phenomena weaken, and the resulting phylogenetic
network is expected to have a tree structure.

When the maximum parsimony method is applied to many
closely related sequences, we often obtain a large
number of equally parsimonious trees. If we apply the
phylogenetic network method instead, only one network
is obtained. A phylogenetic network is therefore a good
way of visualizing data structure. However, phylogenetic
networks have a serious drawback. Theoretically, a
source of reticulation is the existence of mutually
incompatible sites, as shown in Table 1 for the case of
four nucleotide sequences. Sites 2 and 3 have identical
nucleotide configurations, and these two sites are
mutually incompatible with sites 4 and 6, which are in
turn incompatible with each other.
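For two-state sites such as those in Table 1, pairwise incompatibility can be checked with the four-gamete test. Two sites are incompatible when all four combinations of their states occur among the sequences. In that case no single tree can explain both sites with only one change at each. The sketch below is a minimal version that takes the Table 1 sequences transcribed as strings. As a simplifying assumption, sites with only one state or with three or more states are skipped rather than handled.

```python
from itertools import combinations

# Sequences W-Z from Table 1 (sites 1-7).
TABLE_1 = {
    "W": "AGTCAGT",
    "X": "AGTTACG",
    "Y": "ATACACG",
    "Z": "ATATGGG",
}


def incompatible_pairs(seqs):
    """Return 1-based site pairs that fail the four-gamete test.

    Only sites with exactly two nucleotide states are compared; invariant
    sites and sites with three or more states are ignored in this sketch.
    """
    rows = list(seqs.values())
    length = len(rows[0])
    two_state = [i for i in range(length) if len({r[i] for r in rows}) == 2]
    pairs = []
    for i, j in combinations(two_state, 2):
        # Distinct combinations of states observed at the two sites.
        gametes = {(r[i], r[j]) for r in rows}
        if len(gametes) == 4:
            pairs.append((i + 1, j + 1))
    return pairs


print(incompatible_pairs(TABLE_1))
# [(2, 4), (2, 6), (3, 4), (3, 6), (4, 6)]
```

Sites 2 and 3 define one partition ({W, X} versus {Y, Z}), site 4 a second, and site 6 a third. The three partitions are pairwise incompatible, which is consistent with a network that needs three dimensions to draw. Sites 5 and 7 each separate a single sequence and conflict with nothing.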


Figure 2 is the resulting three-dimensional phylo- 
genetic network for the data of Table 1. When there 
are many incompatible sites in the given sequence data, 
however, many dimensions are required to visualize 
those complex data structures. 


See also: Gene Trees; Species Trees; Trees 
Neu, also known as p185/Her-2 and erbB-2, is a member
of a family of transmembrane tyrosine kinase receptors
that are involved in the regulation of growth and
development. The erbB receptor family consists of four
members, erbB-1 (EGFR), erbB-2 (Her-2/neu), erbB-3,
and erbB-4; together with a multitude of ligands, it
constitutes a signaling network in which activation
requires homo- or heterodimerization or
oligomerization. As an orphan receptor (see Orphan
Receptor), erbB-2 functions either as a homodimer or
as a heterodimer/oligomer, and appears to upregulate
the function of other family members. Dimerization/
oligomerization leads to kinase-dependent
cross-phosphorylation of the erbB binding partners,
providing docking sites for signaling molecules,
including proteins containing SH2 domains. ErbB-2 was
first identified as a cause of transformation in a
chemically induced rat neuroglioblastoma, in which a
mutation in the transmembrane domain was found to
result in constitutive dimerization, constitutive
signaling, and hence malignant transformation.
Although no such mutation has been regularly observed
in human tumors, the principle that increased
dimerization, and therefore increased signaling, leads
to transformation has held true. Her-2/neu is
overexpressed, with or without concomitant gene
amplification, in a wide variety of human tumors, most
notably breast and ovarian tumors; once a critical
expression threshold is reached, spontaneous
dimerization occurs, resulting in increased signaling
in the absence of normal regulation. The critical role
of Her-2/neu in human tumors has been demonstrated by
Her-2/neu-directed clinical therapy: a monoclonal
antibody against Her-2/neu has been approved by the
FDA for use in a subset of breast cancers and has
shown marked success with minimal side effects.


See also: Breast Cancer; Cancer Susceptibility 
Neurofibromatosis is the collective term for a group
of genetic disorders with overlapping clinical features. 
The definition of the different types depends on the 
occurrence, number, and distribution of flat, brown 
marks on the skin (called café au lait spots), be- 
nign tumors of the nervous system (neurofibromas 
and Schwannomas), and ophthalmological findings 
(which are frequently asymptomatic). The different 
forms are classified using a numerical system; at the 
present time only type 1 (Nf1) and type 2 (Nf2) are 
defined sufficiently to be classified. There are other 
forms of the disease but these are extremely rare. 

It is important to note that until the early 1980s, the 
medical profession did not widely appreciate the dis- 
tinction between Nf1 and Nf2. Patients were simply 
told that they had neurofibromatosis or von Recklin- 
ghausen disease (von Recklinghausen was a German 
pathologist who originally described Nf1 in 1882). 


Neurofibromatosis Type 1


Epidemiology 
Nf1 is one of the commonest autosomal dominant
disorders in humans. It has a birth incidence of
around 1 in 2500-3000.


Clinical Features 
The major features of Nf1, present in almost every 
patient, consist of specific kinds of skin pigmentation, 
benign tumors on the nerves supplying the skin, and
hamartomas of the iris called Lisch nodules. The skin
pigmentary changes are the first disease features to
appear, usually within the first two years of life.
They consist of café au lait spots (flat, coffee-colored
marks, usually 2-3 cm in diameter) and freckles that
develop in places not seen in the general population
(in the armpits, groins, and around the base of the
neck). Ten percent of the general population have one
or two café au lait spots, but children with Nf1
always have six or more. The skin tumors are called
dermal neurofibromas; they appear as small purplish
swellings, only a few millimeters in diameter.
They rarely cause symptoms but depending on their 
number, can present a significant cosmetic burden for 
the patient. The neurofibromas usually begin to 
develop in the teens. Iris Lisch nodules are entirely 
asymptomatic and often only visible on slit lamp 
examination. 

If Nf1 patients have only the major features, the
disease is effectively a dermatological condition, and
only the cosmetic burden can be a problem. However,
there is a wide variety of disease complications that
can affect almost any organ system in the body. The
occurrence of the complications cannot be predicted,
even within families. The major complications and
their frequencies are listed in Table 1. Even patients
with only mild skin changes are at risk of these.


Table 1 Summary of the clinical and genetic features of Nf1 and Nf2

Nf1
Inheritance: dominant; variable even within families
Gene location: chromosome 17
Major features: café au lait spots and skin fold
freckling; dermal neurofibromas; Lisch nodules
Other features: learning disability (30-65%);
macrocephaly (50%); slight shortening of stature (33%);
plexiform neurofibromas (26%); scoliosis (6%);
disease-related cancer and brain tumors (5%);
pseudarthrosis (2%)

Nf2
Inheritance: dominant; families divide into broad
categories of mild and severe, with relatively strong
intrafamilial correlation
Gene location: chromosome 22
Major features: vestibular Schwannomas
Other features: cataracts (usually asymptomatic, 87%);
peripheral nerve Schwannomas (68%); meningiomas (45%);
café au lait spots (nearly always fewer than six, 43%);
spinal Schwannomas (26%); other nervous system tumors
(6%)


Diagnostic Criteria

The diagnosis of Nf1 is based on clinical features. The
diagnostic criteria are met in an individual who has
two or more of the following: (1) six or more café au
lait macules of over 5 mm in greatest diameter in
prepubertal individuals and over 15 mm in greatest
diameter in post-pubertal individuals; (2) two or more
neurofibromas of any type or one plexiform
neurofibroma; (3) freckling in the axillary or inguinal
regions; (4) optic glioma; (5) two or more Lisch
nodules (iris hamartomas); (6) a distinctive osseous
lesion such as sphenoid dysplasia or thinning of the
long bone cortex with or without pseudarthrosis; (7) a
first-degree relative (parent, sibling, or offspring)
with Nf1 by the above criteria.
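The Nf1 rule under Diagnostic Criteria is a simple threshold: any two of the seven listed features suffice. The sketch below shows only that counting logic and makes several assumptions. Each finding is taken as already assessed clinically and recorded as a number or a true/false value. The field names are hypothetical, and the café au lait size thresholds follow the text (over 5 mm before puberty, over 15 mm after). It illustrates the rule and is not a clinical tool.

```python
from dataclasses import dataclass, field


@dataclass
class Nf1Findings:
    """Hypothetical record of assessed findings (field names are illustrative)."""
    post_pubertal: bool
    cafe_au_lait_mm: list = field(default_factory=list)  # greatest diameters
    neurofibromas: int = 0            # neurofibromas of any type
    plexiform_neurofibromas: int = 0
    axillary_or_inguinal_freckling: bool = False
    optic_glioma: bool = False
    lisch_nodules: int = 0
    distinctive_osseous_lesion: bool = False
    affected_first_degree_relative: bool = False


def nf1_criteria_count(f: Nf1Findings) -> int:
    """Count how many of the seven Nf1 diagnostic criteria are satisfied."""
    min_size = 15 if f.post_pubertal else 5  # mm; macules must exceed this
    large_spots = sum(1 for d in f.cafe_au_lait_mm if d > min_size)
    criteria = [
        large_spots >= 6,
        f.neurofibromas >= 2 or f.plexiform_neurofibromas >= 1,
        f.axillary_or_inguinal_freckling,
        f.optic_glioma,
        f.lisch_nodules >= 2,
        f.distinctive_osseous_lesion,
        f.affected_first_degree_relative,
    ]
    return sum(criteria)


def meets_nf1_criteria(f: Nf1Findings) -> bool:
    """The diagnostic criteria are met when two or more criteria are present."""
    return nf1_criteria_count(f) >= 2


child = Nf1Findings(post_pubertal=False, cafe_au_lait_mm=[6] * 7,
                    axillary_or_inguinal_freckling=True)
print(nf1_criteria_count(child), meets_nf1_criteria(child))  # 2 True
```

The same seven 6-mm spots would not count in a post-pubertal individual, because the size threshold rises to 15 mm.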


Genetics 

Nf1 is an autosomal dominant disorder. Approximately
half the patients presenting will be the first case in
their family. Genetic counseling is complicated because
the severity of the disease in offspring cannot be
predicted. Affected people have a 1 in 2 risk of having
an affected child and a 1 in 12 risk of having a child
with one of the severe complications.
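The two risk figures are linked by simple probability. The sketch below works from the assumption that the 1 in 12 figure is the product of the 1 in 2 transmission risk and the chance that an affected child develops a severe complication. That conditional chance (1 in 6) is inferred here from the two quoted numbers and is not stated in the text. The allele symbols "N" and "n" are hypothetical.

```python
from fractions import Fraction
from itertools import product


def offspring_risk(parent1, parent2, dominant_allele="N"):
    """Probability that a child inherits at least one copy of a dominant allele.

    Each parent genotype is a two-character string, e.g. 'Nn' (affected
    heterozygote) or 'nn' (unaffected). Each allele is transmitted with
    probability 1/2 (Mendelian segregation), so the four gamete pairings
    are equally likely.
    """
    outcomes = list(product(parent1, parent2))
    affected = sum(1 for pair in outcomes if dominant_allele in pair)
    return Fraction(affected, len(outcomes))


p_affected = offspring_risk("Nn", "nn")          # transmission risk
p_severe = Fraction(1, 12)                       # risk quoted in the text
p_severe_given_affected = p_severe / p_affected  # inferred, not quoted
print(p_affected, p_severe_given_affected)       # 1/2 1/6
```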

The gene for Nf1 is on chromosome 17 and was cloned
in 1990. It is a large gene, spanning over 350 kb of
genomic DNA, and has 60 exons. It encodes the protein
neurofibromin, whose full range of functions is still
being determined. The function best studied to date is
its activity as a GTPase-activating protein (GAP). This
activity was identified because a portion of the coding
sequence of the Nf1 gene shows close homology to the
GAP family. Nf1 acts as a tumor suppressor gene: loss
of function of the second allele results in loss of
regulation of ras activity, because neurofibromin
normally accelerates the conversion of active,
GTP-bound ras to its inactive, GDP-bound form. Although
the tumor suppressor action is likely to account for
the different tumors that can occur in Nf1, a single
abnormal copy of the gene must have some form of
systemic effect to give rise to problems such as
learning difficulties and short stature.

Molecular genetic diagnosis for Nf1 mutations is 
just becoming available in the National Health Service 
setting in the UK. Analysis has been hampered by the
large size of the gene and the lack of a mutation
hotspot. There is no clear genotype-phenotype
correlation except in the small subgroup of patients in
whom the entire Nf1 gene is deleted.


Management 

Children and adults with Nf1 should have an annual 
clinical review, monitoring for the occurrence of dis- 
ease complications. This is particularly important in 
childhood when the majority of severe complications 
will present. The children need to be seen by a
pediatrician with some experience in Nf1 management.
Routine screening tests for the complications are not 
recommended. Many countries have lay neurofibro- 
matosis associations which are an important source of 
information and support for families. 


Neurofibromatosis Type 2 


Epidemiology 

Nf2 is much less common than Nf1, with an estimated
birth incidence of around 1 in 33,000 and a symptomatic
prevalence of 1 in 210,000.


Clinical Features 

Nf2 was only established as a separate entity in 1970. 
The overlap with Nf1 arises because café au lait spots 
and peripheral nerve tumors occur in both conditions. 
However, it is extremely unusual for Nf2 patients to
have as many as six café au lait spots, and the nerve
tumors are Schwannomas, not neurofibromas. Iris Lisch
nodules do not occur in Nf2, but specific, often
asymptomatic, eye changes do occur in the form of
cataracts.

The major clinical feature of Nf2 is the occurrence 
of bilateral vestibular Schwannomas (also known as 
acoustic neuromas). These are benign tumors but 
because they develop in a critical place on the eighth 
cranial nerve, they cause hearing and balance difficul- 
ties. As the tumors enlarge they cause pressure on the 
brain stem and cerebellum. 

The other tumors that can develop in Nf2 are listed
in Table 1. The average age of symptomatic presentation
is around the mid-twenties. However, some patients have
a severe form of Nf2 that usually presents in
childhood.


Diagnostic Criteria 

Nf2 is also diagnosed using a set of clinical criteria,
originally defined at an NIH consensus conference in
1987. Unlike the Nf1 criteria, which have stood the
test of time well, the Nf2 criteria have been found to
be too stringent. In 1997 revised criteria were
proposed that allow a diagnosis of either definite or
presumptive/probable Nf2, as follows.

An individual with either of the following clinical
features has definite Nf2: (1) bilateral vestibular
Schwannomas (VS); (2) a family history of Nf2
(first-degree relative) plus either a unilateral VS
diagnosed before the age of 30 years or any two of the
following: meningioma, glioma, Schwannoma, juvenile
posterior subcapsular lenticular opacities/juvenile
cortical cataract. Nf2 is probable in individuals
with: (1) a unilateral VS diagnosed before the age of
30 years plus at least one of the following:
meningioma, glioma, Schwannoma, juvenile posterior
subcapsular lenticular opacities/juvenile cortical
cataract; or (2) multiple meningiomas (two or more)
plus either a unilateral VS diagnosed before the age of
30 years or one of the following: glioma, Schwannoma,
juvenile posterior subcapsular lenticular
opacities/juvenile cortical cataract.
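The 1997 revised Nf2 criteria form a small decision procedure with "definite" and "probable" outcomes. The sketch below encodes the criteria as quoted here, under several assumptions. Findings are taken as already established clinically. "Unilateral VS before 30" is represented by the age at diagnosis of a unilateral VS. A "Schwannoma" means one other than the vestibular tumor. The field names are hypothetical. It illustrates the rule structure and is not a clinical tool.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Nf2Findings:
    """Hypothetical record of assessed findings (field names are illustrative)."""
    bilateral_vs: bool = False
    unilateral_vs_age: Optional[int] = None  # age at diagnosis of a unilateral VS
    first_degree_relative_with_nf2: bool = False
    meningiomas: int = 0
    glioma: bool = False
    schwannoma: bool = False                 # assumed: a Schwannoma other than the VS
    juvenile_cataract: bool = False          # posterior subcapsular or cortical


def nf2_status(f: Nf2Findings) -> str:
    """Return 'definite', 'probable', or 'not met' under the quoted criteria."""
    young_uvs = f.unilateral_vs_age is not None and f.unilateral_vs_age < 30
    # Lesion types named in the criteria, each counted once.
    lesions = [f.meningiomas >= 1, f.glioma, f.schwannoma, f.juvenile_cataract]
    non_meningioma = [f.glioma, f.schwannoma, f.juvenile_cataract]

    if f.bilateral_vs:
        return "definite"
    if f.first_degree_relative_with_nf2 and (young_uvs or sum(lesions) >= 2):
        return "definite"
    if young_uvs and any(lesions):
        return "probable"
    if f.meningiomas >= 2 and (young_uvs or any(non_meningioma)):
        return "probable"
    return "not met"


print(nf2_status(Nf2Findings(unilateral_vs_age=25, meningiomas=1)))  # probable
```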


Genetics 

Nf2 is also an autosomal dominant condition and 
again about half the cases are the first affected person 
in their family. Nf2 is much less variable than Nf1 and 
tends to follow a relatively similar course within 
families. 

The Nf2 gene was cloned in 1993. It encodes a 
protein that has been named alternatively Merlin or 
Schwannomin. The Nf2 gene sequence shows no over- 
lap with the Nf1 gene and no shared functions of the 
proteins have been found. The Nf2 gene spans 110 kb 
and comprises 16 constitutive exons and one alterna- 
tively spliced exon. The gene sequence shows strong 
homology to the highly conserved protein 4.1 family 
of cytoskeleton-associated proteins. These proteins
interact with the Rho family of GTPases in a signaling
cascade that controls the organization of the
spectrin-actin cytoskeleton and cell adhesion.

In the UK, National Health Service mutation ana- 
lysis of the Nf2 gene is available but present techno- 
logy only identifies mutations in about half the cases. 
There is some genotype-phenotype correlation. 


Management 

Tumors that occur in Nf2, particularly the vestibular 
Schwannomas, need management by an experienced 
team. It is recommended that all patients with Nf2 are 
followed in specialist centers by a multidisciplinary 
team including neurosurgeons, otolaryngologists, oph- 
thalmologists, and clinical geneticists. 


Other Forms of Neurofibromatosis 


The only ones of these that occur at any frequency are 
variant phenotypes of Nf1 and Nf2 that result from 
somatic mutation of the genes involved. These pa- 
tients are described as having segmental/localized 
disease. A patient with localized Nf1 may, for example,
have a quadrant of the body affected by café au lait
spots and neurofibromas but no abnormalities elsewhere.
Likewise, in patients with mosaic localized Nf2, one
might see a unilateral VS with ipsilateral meningiomas
but no signs of disease elsewhere.

Other forms of Nf are exceedingly rare and account 
for only a handful of families, even in specialist cen- 
ters. These include families with dominant inheritance 
of café au lait spots only and Watson syndrome. 




See also: Cancer Susceptibility; Mosaicism in 
Humans 
Overview


The human brain is an extraordinarily complex and
beautiful organ. It has about one hundred billion
neurons and an inestimable number of synaptic
connections. Because there are very few opportunities
to test experimental hypotheses ethically on the human
brain, we rely on model organisms to understand how
neurons work in humans. One such organism, the
nematode Caenorhabditis elegans, with a total of only
302 neurons, has contributed substantially to our
understanding of the development and function of the
human nervous system.


Using a Soil Nematode to Study 
Neurobiology 


In 1974 Sydney Brenner introduced C. elegans as a 
genetic model organism that could be used to eluci- 
date the molecular nature of the nervous system 
(Brenner, 1974). The main advantages of C. elegans 
as a model organism for the study of genetic pathways 
in general include the simplicity of worm mainten- 
ance, the ease of isolating mutants, and the availability 
of molecular reagents for gene analysis. In addition, 
the nematode possesses a number of features that 
make it particularly well suited for the study of its 
nervous system. First, the number and positions of the 
neurons are invariant between individuals (Figure 1). 
This feature allowed John White and others to recon- 
struct the connectivity of the nervous system from 
serial electron micrographs (White et al., 1986). 
Second, the worm is transparent, so individual cells 
can be identified using a light microscope and can be 
killed by firing pulses from a laser microbeam into the 
cell. By using cell-ablation studies, researchers can
infer the role that each cell plays in the behavior of
the worm. Third, the nervous system is largely
nonessential under laboratory conditions. The worm
does not need a functional nervous system to eat: it
ingests bacteria using a muscular pump called the
pharynx, and the pharynx will pump even in the absence
of neuronal input. In addition, C. elegans is a
self-fertilizing hermaphrodite, so it does not need a
nervous system to search for mates and to reproduce.
However, laser ablation studies have demonstrated that
there are three neurons, M4, CANL, and CANR, whose
ablation causes the animal to die. The M4 motor neuron
regulates the peristaltic movements of the pharynx,
and the CAN neurons are required for osmoregulation.

Figure 1 The adult nervous system of C. elegans. There
are 302 neurons in an adult nematode.


Genetic Dissection of Neuronal 
Development 


The development of the nervous system can be 
divided into four steps: the determination of neuronal
cell fates, the specification of cell identity, the out- 
growth of axons, and the differentiation of synaptic 
connectivity (extensively covered by Riddle et al., 1997 
and reviewed by Chalfie and Jorgensen, 1998). In C. 
elegans the cell lineage is invariant, that is, every cell 
division generates two daughter cells and the fates of 
the daughter cells of each division are largely fixed. 
Early divisions generate six founder cells (Figure 2); 
individual founder cells generally give rise to one type 
of tissue. For example, all daughter cells of the P4 
founder cell are germ cells, all daughter cells of the 
E founder cell are intestinal cells, and the D founder 
cell only gives rise to muscle cells. Thus, tissue-type 
determinants are likely to be expressed very early in 
these lineages. On the other hand, neuronal tissues are 
derived from AB, MS, and C founder cells. Thus, there 
is no founder cell that gives rise to only neuronal 
tissue. In fact, neuronal cell fates are frequently
segregated in the terminal cell division of lineages
that also generate other ectodermal derivatives such
as epidermal or glial cells (Figure 2). Thus, neuronal
determinants are likely to be expressed late during
embryonic cell divisions.
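The invariant lineage also gives every cell a name that encodes its ancestry: a founder-cell prefix followed by one letter per division. The caption of Figure 2 defines the suffixes 'a', 'p', and 'l' as the anterior, posterior, and left daughter. The sketch below decodes such names. It assumes that 'r' (right) is also used, as in standard C. elegans lineage nomenclature. It recognizes only the six founder cells named in the text, and it flags only the three the text names as sources of neurons. Other axes and special-case names are not handled, and the function name is hypothetical.

```python
# Founder cells named in the text; True marks those said to give rise to neurons.
FOUNDERS = {"AB": True, "MS": True, "C": True, "E": False, "D": False, "P4": False}

# 'a', 'p', 'l' per the Figure 2 caption; 'r' (right) is an assumption based on
# standard C. elegans lineage nomenclature.
DIRECTIONS = {"a": "anterior", "p": "posterior", "l": "left", "r": "right"}


def decode_lineage(name):
    """Split a lineage name such as 'ABpla' into its founder cell and divisions."""
    # Match longer founder names first, in case two names share a prefix.
    for founder in sorted(FOUNDERS, key=len, reverse=True):
        if name.startswith(founder):
            suffix = name[len(founder):]
            break
    else:
        raise ValueError(f"unknown founder cell in {name!r}")
    try:
        steps = [DIRECTIONS[ch] for ch in suffix]
    except KeyError as exc:
        raise ValueError(f"unrecognized division letter {exc.args[0]!r}") from None
    return founder, steps


founder, steps = decode_lineage("ABpla")
print(founder, steps, "neurons possible:", FOUNDERS[founder])
# AB ['posterior', 'left', 'anterior'] neurons possible: True
```

Because each letter records one division, the length of the suffix gives the number of divisions separating a cell from its founder.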

How then does a cell adopt a neuronal cell fate? 
The mechanisms that determine neuronal cell fate 
seem to be conserved among C. elegans, Drosophila, 
and mice. Specifically, proneural genes related to 
members of the basic helix-loop-helix (bHLH) family 
of transcription factors function in neurogenesis in 
C. elegans. For example, lin-32, an ortholog of the
Drosophila atonal gene, is required for the generation
of many sensory cells. In lin-32 mutants, these cells
become epidermal cells instead. Sensory cell lineages
expressing LIN-32 are then modified by the unc-86 
gene. UNC-86 is a member of the POU homeo- 
domain family of transcription factors. UNC-86 
expression prevents a daughter cell from adopting its 
mother’s fate. For example, the Q neuroblast divides 
to generate an anterior daughter which will eventually 
produce a sensory cell called AQR. The unc-86 gene is 
expressed in the posterior daughter, and this lineage 
will eventually generate a mechanosensory neuron 
called AVM. In unc-86 mutants, the posterior daugh- 
ter retains the Q neuroblast identity and continues to 
generate an anterior AQR progenitor cell and a pos- 
terior Q neuroblast. In the wild-type, the UNC-86 
protein remains expressed in a subset of these cells, 
after cell divisions are complete, and acquires a new 
function. In particular, it is required to specify the 
correct cell identity in conjunction with other tran- 
scription factors. One such partner, MEC-3, a LIM 
homeodomain transcription factor, is required for the 
specification of six mechanosensory neurons. MEC-3 
expression in these cells activates genes required for
mechanosensory cell function, such as the gene
encoding a specific tubulin required for mechanosensory
neurites. In the absence of unc-86 expression, there
are no mechanosensory precursors and also no mec-3
expression. In the absence of mec-3 expression,
mechanosensory cells are produced but cannot function
as mechanosensory neurons. These POU and LIM
homeodomain transcription factors were originally
identified in C. elegans and have since defined new
families of transcription factors found in other
invertebrates and vertebrates.

Figure 2 The early cell lineage of C. elegans. The
founder cells and the tissues derived from their
descendants are shown (the names of individual cells
are noted in parentheses). Horizontal lines represent
cell divisions. A very small sublineage from the AB
founder cell is shown. The 'a', 'p', and 'l' suffixes
attached to AB refer to the anterior, posterior, and
left daughters, respectively, of the previous cell in
the lineage. Adapted from Riddle et al. (1997).

After acquiring a cell fate, a neuron must send out
axons to form connections with other neurons. The
direction of axonal outgrowth is determined by
chemoattractants and chemorepellents; in some cases,
these can be the same molecule. For example, responses
to the secreted netrin/UNC-6 protein are mediated by
the UNC-40 and UNC-5 receptors. The UNC-40 receptor
causes axons to be attracted to secreted UNC-6
molecules, whereas axons expressing both the UNC-40
and the UNC-5 receptors are repelled by UNC-6. Neurons
of the head ganglia appear to rely on partially
redundant gradients produced by three signal
transduction pathways, which play similar roles in
Drosophila and vertebrates: the UNC-6 pathway, the
Robo/Slit pathway, and the Eph pathway. For example,
axon migrations of the amphid sensory neurons require
the parallel action of all three of these guidance
systems. Worms carrying mutations in any two of these
pathways show stronger mutant phenotypes than worms
mutated in only one.
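The receptor logic for UNC-6/netrin described above amounts to a small lookup. The sketch below encodes only the two cases the text states: UNC-40 alone gives attraction, and UNC-40 together with UNC-5 gives repulsion. Any other receptor combination, including UNC-5 on its own, is deliberately left unclassified rather than guessed. The function name is hypothetical.

```python
def unc6_response(receptors):
    """Predicted growth-cone response to secreted UNC-6 (netrin).

    receptors: a collection of receptor names expressed by the neuron.
    Only the two cases described in the text are encoded.
    """
    r = set(receptors)
    if "UNC-40" in r and "UNC-5" in r:
        return "repulsion"
    if "UNC-40" in r:
        return "attraction"
    return "not specified"  # e.g. UNC-5 alone: not covered by the text


print(unc6_response({"UNC-40"}))           # attraction
print(unc6_response({"UNC-40", "UNC-5"}))  # repulsion
```

The same cue (UNC-6) produces opposite behaviors depending only on which receptors the axon expresses. This is how one molecule can act as both a chemoattractant and a chemorepellent.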

The growth cone converts these guidance cues into 
changes in the cytoskeleton which will redirect its 
trajectory. The conversion of these cues is mediated 
through signaling pathways made up of Rho-family 
guanosine triphosphatase (GTPase) proteins. These 
proteins exist in an inactive guanosine diphosphate
(GDP)-bound state or an active guanosine triphosphate
(GTP)-bound state. Switching from the inactive to the
active state requires a guanine nucleotide exchange
factor (GEF), which exchanges bound GDP for GTP.
Mutations in GEFs such as UNC-73, a homolog of human
Trio, result in abnormal axonal guidance and premature
termination of axons. There are several Rho-family
GTPases that are possible targets of UNC-73. Mutations
in any one of these produce weak migration defects;
for example, mutations in the MIG-2 GTPase result in
migration defects only in the Q neuroblasts. However,
mutations in multiple GTPases produce severe
phenotypes, reminiscent of unc-73 defects. It is
likely that these GTPases regulate growth cone
activity via the WASP actin nucleation and
polymerization proteins. UNC-34, a homolog of the
Drosophila Enabled and mammalian Mena proteins,
appears to act in parallel with these pathways.
Mutations in unc-34 result in partially disrupted cell
migrations and axon outgrowth. However, loss of WASP
expression in an unc-34 null background results in
synthetic lethality. Thus, multiple pathways converge
on the regulation of actin dynamics during cell and
axon migrations.
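The Rho-family GTPases described above act as two-state molecular switches. The sketch below models that switch in simplified form. A GEF converts the inactive GDP-bound form to the active GTP-bound form, and GTP hydrolysis returns it to the inactive state. Hydrolysis is not described in this section and is included as a standard assumption. The class and method names are hypothetical. The second GTPase name in the usage example is a placeholder.

```python
from enum import Enum


class Nucleotide(Enum):
    GDP = "GDP"  # inactive state
    GTP = "GTP"  # active state


class RhoGTPase:
    """Toy two-state switch for a Rho-family GTPase (hypothetical model)."""

    def __init__(self, name):
        self.name = name
        self.bound = Nucleotide.GDP  # start in the inactive state

    @property
    def active(self):
        return self.bound is Nucleotide.GTP

    def gef_exchange(self):
        """A GEF (e.g. UNC-73) swaps bound GDP for GTP, activating the switch."""
        self.bound = Nucleotide.GTP

    def hydrolyze(self):
        """GTP hydrolysis to GDP returns the switch to its inactive state."""
        self.bound = Nucleotide.GDP


def active_switches(gtpases, gef_functional=True):
    """Activate each GTPase if a functional GEF is present; list the active ones."""
    for g in gtpases:
        if gef_functional:
            g.gef_exchange()
    return [g.name for g in gtpases if g.active]


# "Rho-X" is a placeholder for another, unnamed Rho-family target of UNC-73.
print(active_switches([RhoGTPase("MIG-2"), RhoGTPase("Rho-X")]))
print(active_switches([RhoGTPase("MIG-2")], gef_functional=False))  # []
```

In this toy model, removing the GEF leaves every downstream switch inactive at once. That loosely mirrors why unc-73 mutants resemble mutants lacking several GTPases rather than any single one.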

After outgrowth, neurons need to activate genes 
specific to neuronal classes and to form the proper 
connections with their target cells. For example, 
UNC-30 and UNC-4, members of the orthodenticle 
class of homeodomain transcription factors, are 
required to specify a subset of GABA and acetylcho- 
line neurons, respectively. unc-30 is required to spe- 
cify D-type GABA neurons of the ventral nerve cord. 
UNC-30 controls the expression of GABA-specific 
genes such as unc-25 and unc-47 in these D neurons. 
unc-25 encodes glutamic acid decarboxylase (GAD), 
the biosynthetic enzyme required to synthesize the 
neurotransmitter GABA, and unc-47 encodes the 
GABA vesicular transporter, required to load GABA 
into synaptic vesicles. In addition, UNC-30 is re- 
quired to specify the correct synaptic connections 
for the GABA neurons. Similar to unc-30, unc-4 
is required to specify A-type acetylcholine neurons, 
another class of ventral cord neurons. Specifically, 
unc-4 is required for VA motor neurons to form 
synaptic connections with the correct set of inter- 
neurons. In worms mutant for unc-4, the A-type VA 
neuron is functionally transformed into the B-type VB 
neuron. Since VA motor neurons control backward 
movement, the loss of UNC-4 activity results in an 
animal that is unable to move backward. 


Genetic Dissection of 
Neurotransmission 


Communication between neurons is mediated by 
neurotransmitters. When a neuron is depolarized,
calcium enters the neuron via voltage-sensitive calcium
channels; the influx of calcium then causes the synap- 
tic vesicles to fuse with the plasma membrane and 
release neurotransmitter into the synaptic cleft. Once 
the vesicle has released its contents, the vesicle and its 
associated proteins are retrieved from the membrane 
and prepared for another cycle of fusion. Biochemical 
studies of yeast and mammalian cells have identified 
many of the proteins required for vesicle dynamics. 
Genetic studies in C. elegans have identified addi- 
tional components and have elucidated the functions 
of many of these proteins at the synapse. The proteins 
uncovered in these genetic studies can be divided into 
two categories: those required for a specific neuro- 
transmitter type; and those required for the functions 
of all synapses, for example, proteins required for 
synaptic vesicle kinetics. We will emphasize those 
components which were discovered in C. elegans
(see Chalfie and Jorgensen, 1998 and Bargmann and
Kaplan, 1998 for review).

The genes required for the two neurotransmitters 
which function at neuromuscular junctions, GABA 
and acetylcholine, are the best studied. The
behavioral phenotype associated with the loss of 
GABA neurons was determined by laser ablation. 
Screens for mutants which mimicked the loss of 
GABA identified six genes required for GABA func- 
tion. Three of the proteins identified in these screens 
were UNC-25, the biosynthetic enzyme for synthe- 
sizing GABA, UNC-47, the vesicular GABA trans- 
porter, and UNC-49, the GABA receptor. Screens 
assaying for altered levels of acetylcholine led to the 
identification of CHA-1, the biosynthetic enzyme 
required for acetylcholine synthesis and UNC-17, 
the vesicular acetylcholine transporter. The discovery 
of the vesicular transporters for GABA and acetylcho- 
line led to their subsequent identification in verte- 
brates. 

Proteins required for the functioning of all synapses 
include proteins required to transport materials from 
the cell body to the synapse and proteins required to 
dock, fuse, and recycle synaptic vesicles at the active 
zone. The cell body of the neuron is often far from the 
synapse. To transport synaptic vesicle precursors to the 
synapse, neurons use a kinesin-like motor protein 
encoded by unc-104. Worms with reduced function 
of unc-104 accumulate vesicle precursors in the cell 
body. Vertebrate homologs have been discovered and 
comprise the KIF1 family of kinesins. 

The fusion of synaptic vesicles with the plasma
membrane requires the formation of the SNARE
complex. The SNARE complex is composed of three
proteins: syntaxin/UNC-64, SNAP-25/RIC-4, and
synaptobrevin/SNB-1. Both syntaxin and SNAP-25
are associated with the plasma membrane, whereas
synaptobrevin is an integral membrane protein of the
synaptic vesicle. These three proteins form a helical
bundle that pulls the vesicle close to the plasma
membrane at the active zone, which is thought to induce
membrane fusion. Consistent with an essential role for
the complex, null mutations in syntaxin and
synaptobrevin abolish synaptic vesicle release. Two
proteins implicated in regulating the formation of the
fusion complex are UNC-18 and UNC-13. The
discovery of unc-18 and unc-13 in C. elegans led to
the identification of homologs in vertebrates. Null
mutations in either gene result in a severe
decrease in neurotransmission. Both proteins
have been demonstrated to bind to syntaxin/UNC-64.
UNC-18 plays both a facilitatory and an inhibitory role
in vesicle fusion. In the absence of UNC-18, there is a
severe decrease in the release of synaptic vesicles; thus
it must play a facilitatory role in neurotransmission.
In addition, UNC-18 stabilizes syntaxin in a
conformation that prevents binding to synaptobrevin;
thus UNC-18 inhibits the formation of the SNARE
complex and subsequent fusion events. Unlike unc-18,
unc-13 plays only a facilitatory role in synaptic release.
In animals with reduced UNC-13 function, synaptic 
release is abolished. Since vesicles dock normally to 
the plasma membrane in these mutants, unc-13 is not 
required for vesicle docking. Rather, unc-13 is 
required for the priming step that makes synaptic 
vesicles competent for fusion. Moreover, this protein 
is the target of modulatory cascades that increase 
neurotransmission. Specifically, UNC-13 acts in a G- 
protein signaling pathway downstream of Gq alpha/ 
EGL-30 and phospholipase C/EGL-8. Activation of 
these modulatory pathways stimulates the association 
of UNC-13 with the plasma membrane and a 
concomitant increase in vesicle priming. Once the 
vesicle is primed, the actual exocytotic fusion event
is triggered by calcium influx. The Ca2+ sensor is
likely to be synaptotagmin/SNT-1. Synaptotagmin is
an integral membrane protein of the synaptic vesicle,
contains two C2 Ca2+-binding domains, and has been
demonstrated to bind to the SNARE complex. 
Absence of synaptotagmin causes a loss of calcium- 
dependent neurotransmitter release in mice. 

Surprisingly, synaptotagmin plays a dual role in the 
synaptic vesicle cycle. Once exocytosis is complete, 
the synaptic vesicle and its associated proteins are 
retrieved from the plasma membrane through endo- 
cytosis. Endocytosis is mediated by the formation of a 
clathrin cage, which buds membrane into the cell. 
Synaptotagmin appears to be required to recruit
clathrin adaptor proteins to the plasma membrane. In
snt-1 mutants, there is a striking loss of vesicle
endocytosis. An adaptor protein called AP180, encoded
by unc-11, recruits the synaptic vesicle protein
synaptobrevin, as well as clathrin, to the membrane
targeted for endocytosis. When the clathrin coat is
assembled around the invaginating vesicle, dynamin,
encoded by dyn-1, cleaves the vesicle from the
plasma membrane. To complete vesicle recycling, the
clathrin coat must be removed. One protein
implicated in this process is synaptojanin/UNC-26, a
polyphosphoinositide phosphatase that converts
phosphatidylinositol 4,5-bisphosphate (PIP2) to
phosphatidylinositol (PI). Mutations in unc-26 result
in an accumulation of coated vesicles, presumably
because the adaptor proteins, which bind PIP2, remain
attached to synaptic vesicle lipids. In addition, budded
but uncleaved vesicles accumulate. Thus, the lipid 
composition of synaptic membranes plays an im- 
portant role in regulating progress through endo- 
cytosis. 

Genetic Dissection of Behavior


Nematodes monitor environmental signals such as 
odorants, salt concentrations, temperature, and 
hormones to migrate to favorable conditions and to 
retreat from unfavorable ones. Sensory neurons 
mediate either an attractive response or an aversive 
response to the compounds which they sense. For 
example, the AWA chemosensory neuron mediates 
attractive responses. AWA expresses ODR-10, a 
seven-transmembrane G-protein-coupled chemore- 
ceptor, which detects the volatile compound diacetyl. 
When a worm senses diacetyl, it moves up a
gradient of the compound. However, when odr-10 was
misexpressed in the AWB chemosensory neuron, a
neuron that normally mediates aversive responses,
diacetyl became a repellent. Therefore the attractive
or aversive nature of the odorant was determined by
the sensory neuron in which the receptor was expressed,
not by the receptor itself.

Worms can change their response to odorants or 
chemicals in a process called sensory adaptation. 
Through adaptation, worms become less responsive 
to particular odorants or tastes when exposed to a 
stimulus for long periods of time. Adaptation is likely
to reflect a change in the signaling pathway downstream
of a particular receptor rather than in the neuron as a
whole, since adaptation to one compound does not affect
the responses to other compounds sensed by that same
cell. Worms can also exhibit long-term changes in their
responses to temperature and chemicals. Plasticity in
thermotactic and chemotactic responses has been
demonstrated with classical conditioning paradigms
designed to test associative
learning ability. For example, C. elegans can associate 
a specific temperature or ion with the presence of food 
(see Mori, 1999 for review). 


Genetic Dissection of Brain Diseases 


In certain circumstances, C. elegans can act as a model 
system for human disorders of the brain. Homologs of 
genes implicated in human brain diseases have been 
identified in C. elegans. The analysis of these genes
may help identify the molecular pathways underlying these
diseases. For example, sel-12 was identified in a screen
for suppressors of lin-12 mutants; lin-12 encodes a
Notch-family receptor involved in cell fate determination.
SEL-12 is a transmembrane protein that functions as 
part of the LIN-12/Notch signaling pathway. SEL-12 
is the C. elegans homolog of presenilin, a protein 
implicated in Alzheimer’s disease. Since human pre- 
senilin can substitute for SEL-12 in the worm, it is 
probable that presenilin also functions in a Notch 
signaling pathway in the human brain. 

Future


The C. elegans genome contains over 19 000 genes; 
however, only about 2000 have been identified by 
mutations. The absence of known mutations in these 
other 17 000 loci might be due to either redundancy or 
ignorance. First, redundancy has been observed for a
number of loci: a phenotype is observed only when
several genes are mutated together. Second,
we are still very naive about the many biochemical 
processes which are required by an organism and 
therefore have not yet designed screens capable of 
revealing them. For example, the C. elegans genome 
contains up to 1000 G-protein-coupled chemo- 
receptors for which specific functions remain 
unknown. In the future, clever screens may begin to 
identify genes for which functions have not yet been 
assigned. 

The C. elegans genome, of course, is the most 
important resource for further study. However, 
another largely unexplored resource is the completely 
determined neuronal connectivity of the C. elegans 
nervous system. Electrophysiological techniques 
have been developed which allow one to record 
from identified cells in the central nervous system of 
C. elegans. These methods will allow researchers to 
explore how neural circuits with known connect- 
ivities, and with defined molecular components, such 
as voltage-sensitive ion channels and ligand-gated 
receptors, function together to generate an electrical 
and behavioral output. 

In the mid-1960s, Seymour Benzer, a physicist-turned-geneticist,
who had already made major
contributions to the fine structure mapping of bacterio- 
phage genes in the 1950s, began to study Drosophila. 
He became fascinated with how the flies behave, and 
how genes build the nervous system that mediates their 
actions. Benzer advocated a new approach to studying 
fly behavior, one that used the gene mutation like a 
scalpel to dissect the nervous system. He suggested 
the use of chemical mutagenesis, allied to ingeniously 
simple genetic and behavioral screening techniques, 
to rapidly isolate new mutations that disrupted the 
phenotype of choice. Initially he searched for X-linked
mutants by mutagenizing males and then crossing
them to females. In a normal cross, the mutagenized
X chromosome of the male parent ends up in his
diplo-X daughters. These daughters will
also carry the unmutagenized X from their mothers,
and this will conceal the effects of any induced recessive
mutation. To overcome this problem, he crossed the
mutagenized males to females that carried a pair of
nonsegregating X chromosomes that were physically
attached to each other, and a free Y chromosome. X-XY
individuals are normal females in Drosophila, unlike
their human counterparts with this chromosomal
karyotype, who are masculinized and have Klinefelter’s
syndrome. From this cross, the male progeny
inherit their father’s mutagenized X chromosome
and their mother’s Y. The beauty of this scheme is that
it allows recessive mutations on the X to be expressed
in the males of the next generation (see Figure 1).
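The logic of the attached-X cross can be checked by enumerating the four possible zygotes. The sketch below is a hypothetical illustration: it assumes that each parent transmits its two gamete types with equal frequency and uses the viability rules stated in the Figure 1 legend (X-XX* and YY zygotes die). The names `MATERNAL`, `PATERNAL`, `classify`, and `cross` are invented for this example.

```python
from itertools import product

# Gametes each parent can contribute (assumed to be transmitted with equal frequency).
MATERNAL = ("X-X", "Y")  # attached-X mother: the fused X-X pair travels as one unit
PATERNAL = ("X*", "Y")   # mutagenized father: X* carries any induced mutations


def classify(maternal: str, paternal: str) -> str:
    """Return the fate of one zygote from the attached-X cross."""
    if maternal == "X-X" and paternal == "X*":
        return "lethal (X-XX*)"
    if maternal == "Y" and paternal == "Y":
        return "lethal (YY)"
    if maternal == "X-X":
        return "female (X-XY), like her mother"
    # Remaining case: maternal Y + paternal X*
    return "male (X*Y), expresses recessive mutations on X*"


def cross() -> dict:
    """Map each maternal/paternal gamete combination to the fate of the zygote."""
    return {f"{m} + {p}": classify(m, p) for m, p in product(MATERNAL, PATERNAL)}


if __name__ == "__main__":
    for zygote, fate in cross().items():
        print(f"{zygote:10s} -> {fate}")
```

Every surviving son carries his father's X* as his only X, so a recessive mutation on it is expressed directly in his phenotype, while the surviving daughters simply reconstitute the attached-X stock.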
The behavioral screens were also simple. The
‘countercurrent apparatus,’ for example, allowed
Benzer to fractionate those flies that did not respond
to light by repeatedly testing a large sample. The fly’s
normal behavior is to walk toward light, and Benzer
soon sorted out those flies that showed defective
phototaxis. Many visual mutants were obtained in this
way, as were mutants with sluggish locomotor
behavior. Mutations affecting flight were similarly
isolated by dumping flies into a glass cylinder whose
inside was coated with oil. Those that attempted to
fly stuck to the sides, while those that could not were
recovered at the bottom. Some mutants, such as drop-dead,
would suddenly die, while others, such as ether-a-go-go,
would shake their legs rapidly in response to
ether. Still others, such as stuck, would not
disengage from the female after copulation, and it is hardly
necessary to describe this coitus interruptus phenotype.

Figure 1 (A) Mutagenesis of the X chromosome in
Drosophila. Males are fed a powerful chemical mutagen.
X* denotes the mutagenized X chromosome. The male
is crossed to females carrying the attached-X (X-X)
chromosome and a free Y. In the next generation, the
male progeny inherit the mutagenized X and express any
recessive behavioral mutation. X-XX* and YY
combinations are lethal. (B) Mosaics are formed by
chromosome loss. A female carrying the unstable
ring-X chromosome is crossed to a male carrying a
behavioral mutation on the X linked to recessive
anatomical markers (X*). In the female zygote that
carries the ring-X, one of the two nuclei produced by
the first mitotic division loses the ring chromosome,
giving an XO karyotype, which then gives rise to a male,
haplo-X lineage expressing the behavioral and
morphological mutations. The nucleus that does not lose
the ring-X is diplo-X and gives a female lineage that is
heterozygous for the behavioral and anatomical markers.
Thus a gynandromorph, or mosaic, is formed.

The next step was to find out whether the nervous
system was disrupted in these mutants, or whether
peripheral tissue, such as muscle or glands, was
involved. Benzer refined another clever technique,
originally devised by Sturtevant in the 1920s, called fate
mapping, and applied it to behavior. First he
recombined the X-linked behavioral mutation with other
recessive sex-linked anatomical markers such as yellow
(body color), white (eye color), and singed (bristle
shape), and then crossed these males to females that
carried an unstable ring-X chromosome (Figure 1B).
This ring-X chromosome carried the wild-type
alleles of the behavioral and marker genes, but had the
unusual property that it would often be lost from one
nucleus of a female zygote, usually at the first or second
mitotic division. Thus nuclei carrying both X chromosomes
would be female and wild-type for the behavioral and
anatomical markers, but nuclei that had lost the ring-X
would be haplo-X and male, and would express these
recessive mutations. As development proceeded, a
mosaic fly would be formed, which would be half
male/half female if the ring-X had been lost in the first
division, or a quarter male if it had been lost in the
second (Figure 1B).
These gynandromorphs could then be used to fate map
the anatomical focus for the mutant behavioral gene.
This technique pinpoints the behavioral mutant focus
to an area on the blastoderm, the single layer of cells
that forms at the surface of the embryo 2-3 h after
fertilization, by which time each cell’s developmental
fate is largely determined. The method relied on taking
a large number of mosaics, each with a unique
distribution of male and female tissue, and scoring each
one for the behavior (mutant, implying male tissue at the
focus, or wild-type, implying female tissue) and for the
anatomical markers (male or female). The method
reveals which part of the fly had to be mutant in order
for the mutant behavior to be expressed. Thus a mosaic
with a drop-dead mutant head but normal body would
express the sudden-death phenotype, placing the mutant
focus in the part of the blastoderm that generates the
cephalic nervous system. This ruled out the possibility
that the cause of this dramatic phenotype was, say, a
circulating substance originating from an organ in the
fly’s abdomen. The fate map technique thus provided an
anatomical correlate for the mutant behavior.
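The bookkeeping behind a mosaic fate map can be illustrated with a small calculation. In Benzer's behavioral fate mapping, the distance between two sites was expressed as the percentage of mosaics in which the two sites differ in genotype (one male, one female), a unit called the sturt after Sturtevant. The sketch below is a simplified, hypothetical illustration: it assumes a single behavioral focus whose genotype can be read directly from the fly's behavior (mutant behavior taken to mean male tissue at the focus). The `Mosaic` record, its fields, the `sturts` function, and the four example mosaics are invented purely to exercise the code; they are not data from any study.

```python
from dataclasses import dataclass


@dataclass
class Mosaic:
    """One scored gynandromorph (hypothetical record format).

    Each field is True when that site is male, i.e. haplo-X and therefore
    expressing the recessive X-linked mutations.
    """
    behavior_mutant: bool  # fly showed the mutant behavior: focus assumed male
    head_male: bool        # recessive cuticle marker (e.g. yellow) visible on the head
    thorax_male: bool      # recessive cuticle marker visible on the thorax


def sturts(mosaics: list, site_a: str, site_b: str) -> float:
    """Percentage of mosaics in which two sites differ in genotype.

    Small values mean the two sites rarely fall on opposite sides of the
    male/female boundary, i.e. they lie close together on the fate map.
    """
    if not mosaics:
        raise ValueError("need at least one scored mosaic")
    differ = sum(getattr(m, site_a) != getattr(m, site_b) for m in mosaics)
    return 100.0 * differ / len(mosaics)


# Four invented mosaics, purely to exercise the function.
flies = [
    Mosaic(behavior_mutant=True, head_male=True, thorax_male=False),
    Mosaic(behavior_mutant=False, head_male=False, thorax_male=True),
    Mosaic(behavior_mutant=True, head_male=True, thorax_male=True),
    Mosaic(behavior_mutant=False, head_male=True, thorax_male=False),
]
print(sturts(flies, "behavior_mutant", "head_male"))    # 25.0
print(sturts(flies, "behavior_mutant", "thorax_male"))  # 50.0
```

In this toy input the behavior agrees with the head genotype more often than with the thorax genotype, which is the pattern expected for a focus in the head, as in the drop-dead example. Real analyses also had to deal with bilateral foci and with the fact that the genotype of the focus is only inferred, both of which this sketch ignores.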
Benzer’s students modified the genetic and behav- 
ioral screens, extending them to the autosomes, and 
they improved the resolution of the fate mapping 
technique itself by also using internal markers that 
could distinguish male (XO) from female (XX) 
neurons. They also began to look at much more com- 
plex behavioral phenotypes, such as circadian rhythms 
(see Clock Mutants), sexual behavior, and learning. 
This gave rise to the field that was initially called 
‘molecular ethology’ by some, and ‘neurogenetics’ 
by others. The latter term has tended to stick, and 
the field can now be divided into three main over- 
lapping areas. The first is the analysis of behavior, 
championed most forcefully these days by Jeff Hall, 
one of Benzer’s original students at Caltech. The se- 
cond is the study of neurogenesis, which is the process 
by which neurons are first formed, and in which the
gene Notch, whose embryonic role was first analyzed by Poulson in the 1940s,
plays a central role. The third major area has been 
developed by Corey Goodman and his colleagues, 
and focuses on how the nervous system wires itself 
up during development. The last two areas are covered 
in other articles (Neuronal Specification, Neuronal 
Guidance), and will be mentioned here only briefly. 


Behavioral Neurogenetics 


Simple Behavioral Phenotypes 

The analysis of learning and memory, sexual behavior, 
and circadian rhythms has provided remarkable in- 
sights into the neurogenetic basis of these extremely 
complex phenotypes. However, equally important is 
the work that originated from mutants with much
simpler behavioral phenotypes. The Shaker mutants,
for example, have been pivotal in defining the
potassium channel in Drosophila, and have launched a
cottage industry of molecular neurophysiology. The
initial behavioral phenotype of these mutants, shaking
of the legs, can hardly be considered a sophisticated
form of behavior, and it defines normal behavior only
as ‘not shaking the legs.’ Even so, Shaker has played a
cardinal role in the development of physiological
neurogenetics, and its story, like many in the field,
begins with a behavioral mutant. We can
contrast Shaker with the period (per) gene, for example
(see Clock Mutants), in which mutations change the
period of the circadian behavioral cycle. Deleting the
per gene leaves perfectly viable flies that are
arrhythmic, whereas deleting Shaker is lethal, providing the
two extreme classes of behavioral genes: the essential
and the nonessential. In between are genes that affect 
metabolic processes, so that flies carrying mutations of 
these genes are generally sick, and therefore their 
behavior is sluggish. Again, this does not mean that 
they are any less interesting from a neurogenetic per- 
spective. For example, the inactive mutant sounds 
(and is) rather boring and shows reduced levels of 
the enzyme tyrosine decarboxylase, as well as sluggish 
locomotor behavior. However, inactive is one of the
rare mutants that fail to become sensitized on repeated
exposure to cocaine, thereby providing an important link
between this gene and drug craving in flies, with all
the implications that this carries for human behavior.

In general, screens for ‘simpler’ phenotypes, for
example phototaxis, tend to identify genetic lesions
that affect the sensory or motor systems. Such is the
case for screens targeting the flight, locomotion,
olfactory, visual, and mechanoreceptor apparatus. The
genes they uncover, such as Shaker, although initially
classed as ‘behavioral,’ then take on a life of their own
outside the behavioral field. Another good example is
sevenless, which was originally isolated as a visual mutant in
a phototaxis screen, but which is now understood to 
play a key role in the development of the R7 photo- 
receptor. The message is that Drosophila neuroge- 
netics often starts with a behavioral gene, but the 
more one finds out about it, the less of a behavioral 
gene it becomes. 


Complex Phenotypes: Sexual Behavior 

The neurogenetic analysis of much more complex 
behavioral phenotypes such as sexual behavior was 
also initiated by Benzer when he used his part-male/ 
part-female gynandromorphs to ask the question: 
which part of the fly must be male to produce the male 
elements of the courtship display? The stereotyped 
behavioral sequence (called a fixed-action-pattern by 
ethologists) begins with the male first tapping the
female’s posterior with its forelegs, then following
the female, before extending one wing at a time and 
vibrating it. This produces the lovesong, a species- 
specific acoustic signal that arouses the female, and 
also provides her with the species signature of the 
male that is courting her. The male then licks the 
female’s posterior, before attempting to copulate. 
The male’s behavioral program is usually repeated 
many times before successful copulation. Benzer, and
later Hall, used mosaics to map, with some precision,
those regions of the nervous system that had to carry a
male genotype for the gynandromorph to show the
male courtship elements. Following the female and
extending the wing required male tissue unilaterally 
in the dorsal part of the brain close to the mushroom 
bodies. For wing vibration to occur, a unilateral male 
focus in the ventral part of the central thoracic gan- 
glion area was required, whereas copulation required a 
more diffuse focus in the thoracic and abdominal 
regions. Similarly, for a gynandromorph to show sex 
appeal and stimulate courtship from other males, the 
posterior part of the abdomen had to be female. Later 
it was discovered that this region carries glands that 
produce the female aphrodisiac pheromones. 

These early studies identified neuroanatomical 
regions that played key roles in the sexual behavior 
program. The fruitless mutant, in which males are 
bisexual and sterile, had been identified in the 1960s
by Gill. Not only are they bisexual, but their courtship 
song is completely abnormal and the mutants are 
missing a male-specific muscle in the abdomen. The 
gene for fruitless was cloned in the mid-1990s, and was 
found to encode a zinc finger transcription factor. The 
gene was expressed in regions of the dorsal brain that 
the mosaic studies of Hall had indicated were
important for male sexual behavior. Importantly, fruitless
was regulated directly by transformer, one of the
critical sex-determining genes involved in
morphological sexual differentiation. fruitless thus heads
the branch of the sex-determination hierarchy that gives
the nervous system its sexual identity.

Modern mosaic studies do not rely on chromosome 
loss, as is the case with the unstable ring-X mentioned 
above. In the early studies, each mosaic was a unique 
mixture of male and female tissue. The second-generation
enhancer-trap methodology, in which the yeast
activator GAL4 is used to misexpress genes in specific
tissues, provides a technique in which each fly is a
mosaic, yet is identical to its siblings. This provides
additional statistical power when attempting to
correlate brain anatomy with behavior. For example,
Greenspan and his colleagues used the enhancer-trap
system to misexpress the sex-determining gene
transformer in males (transformer is active in females
and inactive in males). By using different enhancer-trap
GAL4 lines, the brains of males were feminized to
different extents. Males that showed bisexual behavior 
were feminized only in regions of the brain that 
included the antennal lobes and the mushroom bodies. 
These are regions associated with the processing and 
integration of olfactory (pheromonal) input from the 
female. The bisexual behavior of these males suggests
either that these parts of the brain carry an inhibitory
center that normally prevents male-male interactions, or
that females have a structure in this part of the brain
that detects male aphrodisiac pheromones, and that
activating transformer in males is enough to build it.

Genes that determine the lovesong pattern have 
also been identified by mutagenesis, for example,
no-on-transient A (nonA) and cacophony (cac). The
former gene encodes an RNA-binding protein, whereas
the latter encodes a calcium channel subunit. Mutants 
for both these genes not only have abnormal love- 
songs, but they also show defective vision, indicating 
a pleiotropic requirement for both these gene pro- 
ducts in two apparently unrelated phenotypes. 

The molecular revolution has clearly enhanced 
Benzer’s neurogenetic approach in that genes can be 
readily cloned, their products identified, and their 
spatial and temporal patterns of expression can be 
visualized by in situ hybridization or with the use of
antibodies. This has been applied to genes involved in
sensory and motor systems, as well as to more central
‘cognitive’ behaviors. The interested reader is referred to
the excellent work of Heisenberg and his colleagues 
on the role of memory and how the fly processes 
information during flight and locomotion, and that 
of Tully and his coworkers on learning. 


Neurogenic Genes 


The study of the developmental pathway by which 
neurons and sensory organs are initially formed pro- 
vides another major avenue of exploration. Central to 
neurogenesis is Notch, so called because the original 
mutants had notches in the wing margin. Null muta- 
tions, however, caused a massive neural hyperplasia in 
the embryo; this was due to a change in the fate of cells 
destined to become epidermis, which instead devel- 
oped into neural tissue, giving the ‘neurogenic’ pheno- 
type. Thus Notch is the key to whether cells will
develop as neurons or epidermis, and it does so by
acting as the receptor in a signaling system that
amplifies and maintains molecular differences between
adjacent cells.

Notch encodes a transmembrane receptor with 
an extracellular domain composed of 36 tandem epi- 
dermal growth factor (EGF)-like repeats and three 
cysteine-rich repeats. The intracellular domain con- 
tains six tandem ankyrin repeats and a glutamine-rich 
region (opa). The ligands that bind to the extracellular 
domain and stimulate the Notch receptor include
Delta and Serrate, which are expressed on the surface 
of neighboring cells, although Notch and Delta 
may also be found in the same cell. The extracellular 
domain of Delta binds to specific EGF repeats, trig- 
gering the Notch receptor, which activates the tran- 
scription factor Suppressor of hairless (Su(H)) via an 
interaction with Notch’s intracellular ankyrin repeats. 
Su(H) binds to regulatory sequences of the Enhancer- 
of-split (E(spl)) genes, which encode nuclear basic 
helix-loop-helix (bHLH) proteins, and these in turn 
bind to the regulatory sequences called E-boxes of 
the proneural genes, such as those of the achaete- 
scute complex, which define the neural cell lineages 
(Figure 2A). There is also evidence that Notch itself is
cleaved, releasing its intracellular domain (ICD; see
Figure 2A), which translocates to the nucleus, where it
participates in nuclear events, perhaps acting as a
partner of Su(H).

Delta thus provides the primary signal to the Notch 
receptor, but how can a group of cells that are initially 
equivalent generate a spatial pattern whereby one 
group becomes a precursor for nervous tissue, and 
the other for epidermis? It is known that development 
is very sensitive to Notch and Delta gene dosage, so 
that anything that alters the ratio of ligand to receptor,
either within a single cell or between cells, may
have important consequences. Imagine that a random
event causes one cell to produce slightly more Delta
ligand than its neighbors. This will activate
Notch signaling in adjacent cells, which feeds back and
downregulates Delta (Su(H) upregulates E(spl), which 
represses achaete-scute, which positively regulates 
Delta). Thus the cells immediately surrounding the 
signaler have lower levels of Delta. The cells sur- 
rounding these cells have relatively more Delta and 
are more likely to become signalers, activating Notch 
in neighboring cells and so on (see Figure 2B). This 
process of lateral inhibition via Notch signaling pro- 
vides spatial patterns to ectodermal cells, and gives rise 
to clusters of signalers and receivers. It is the prior 
expression of proneural genes in clusters (or stripes) 
that initiates the patterning (see Figure 2B). Each
cluster can give rise to a neural precursor that, in the
case of the Drosophila sensory bristle, generates the
four basic cell types of the bristle: the hair cell, the
socket cell that supports the hair, the neuron, and the
glial cell that supports the neuron.
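The feedback loop described above can be explored with a toy simulation. The sketch below assumes a ring of initially equivalent cells, each of which sees the average Delta level of its two neighbors: Notch activity rises steeply with neighboring Delta, and Delta production falls steeply with the cell's own Notch activity, standing in for the Su(H), E(spl), achaete-scute chain. The Hill-type functions, the parameter values `A` and `B`, and all function names are illustrative assumptions of the kind used in theoretical models of lateral inhibition, not measured quantities.

```python
import random

# Illustrative parameters (assumptions chosen for this sketch, not measured values).
A = 0.01   # Notch is half-activated when neighboring Delta is about sqrt(A) = 0.1
B = 100.0  # Delta production halves when Notch activity is about 1/sqrt(B) = 0.1


def notch_activation(mean_neighbor_delta: float) -> float:
    """Notch activity rises steeply with the Delta displayed by neighboring cells."""
    x2 = mean_neighbor_delta ** 2
    return x2 / (A + x2)


def delta_production(notch: float) -> float:
    """Delta production falls as Notch activity rises (Notch -> E(spl) -| achaete-scute -> Delta)."""
    return 1.0 / (1.0 + B * notch ** 2)


def simulate(n_cells: int = 12, steps: int = 10_000, dt: float = 0.02, seed: int = 1) -> list:
    """Integrate the Notch/Delta feedback loop on a ring of cells; return final Delta levels."""
    rng = random.Random(seed)
    # Initially equivalent cells: Delta near the uniform steady state, plus small random noise.
    delta = [0.074 + rng.uniform(-0.005, 0.005) for _ in range(n_cells)]
    notch = [0.35] * n_cells
    for _ in range(steps):
        # Each cell sees the mean Delta of its two neighbors; delta[-1] wraps around the ring.
        mean_nb = [(delta[i - 1] + delta[(i + 1) % n_cells]) / 2 for i in range(n_cells)]
        notch = [n + dt * (notch_activation(m) - n) for n, m in zip(notch, mean_nb)]
        delta = [d + dt * (delta_production(n) - d) for d, n in zip(delta, notch)]
    return delta


if __name__ == "__main__":
    levels = simulate()
    # S = signaler (high Delta), . = receiver (low Delta, high Notch)
    print("".join("S" if d > 0.5 else "." for d in levels))
```

Starting from almost uniform Delta levels, the small random differences are amplified until each high-Delta signaler is flanked by low-Delta receivers. Changing `seed` changes which cells become signalers, but in this sketch the settled pattern always shows the signaler/receiver arrangement produced by lateral inhibition.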


Wiring up the Fly Nervous System 


In the embryo, each neuron sends out an axon which is 
guided through the masses of growing tissues until 
it finds its target. What are the molecular signposts 
that convey direction to the axonal growth cone? The 
analysis of these molecules via mutagenesis forms
the third major area within Drosophila neurogenetics.
Four types of guidance mechanisms are used:
chemorepulsion, in which the secreted semaphorins and
netrins are prevalent; chemoattraction, in which the
bifunctional netrins play an important role; and the
shorter-range cues of contact attraction and repulsion,
which utilize molecules such as the cadherins and the
transmembrane semaphorins, respectively. A detailed
description of the mechanisms of axon pathfinding
can be found in the article on Neuronal Guidance.

Figure 2 (A) The Notch signal transduction pathway.
The core pathway has four main components: a
transmembrane ligand, Delta (Dl); a transmembrane
receptor, Notch (N); a transcription factor, Suppressor
of hairless (Su(H)); and Enhancer of split (E(spl)).
Initially, Su(H) is tethered to the cell membrane by
interactions with the intracellular portion of Notch.
Activation of the pathway is initiated by the binding of
Delta to the Notch receptor on an adjacent cell. This
interaction results in the nuclear localization of Su(H)
and possibly of an intracellular portion of Notch (ICD).
Nuclear Su(H), possibly in association with the Notch
ICD, activates the transcription of the E(spl) genes.
(B) Lateral inhibition and Notch/Delta signaling. All
cells of the proneural cluster initially express
Achaete-Scute (AS-C) genes, Notch, and Delta. After
binding of the Delta ligand to Notch and expression of
E(spl) genes, the E(spl) proteins inhibit the AS-C
products. The level of Delta transcription is controlled
by AS-C proteins, thus closing the feedback loop
between Notch and Delta.

Overview


The nervous system provides a communication
network for an organism and is composed of
specialized cells, called neurons, that exchange
information through synaptic connections. During
development, neurons project axons that migrate long
distances along stereotypic pathways to find their
appropriate targets and establish the initial
connectivity of the nervous system. The trajectory of
an individual axon is determined by the motile tip of
the axon, the growth cone, responding to the
appropriate spatial signals along its route. These
signals include cell surface molecules and extracellular
matrix molecules that provide short-range or local
guidance cues, as well as
secreted molecules that diffuse from their source and 
provide long-range or global guidance information. 
These signals can act to attract as well as to repel a 
migrating growth cone and, through their combined 
action, these signals orchestrate correct axon out- 
growth and pathfinding. Several guidance molecules 
have been identified, and current efforts are directed 
toward further understanding the molecular mechan- 
isms underlying axon guidance. 


Historical background 


A little over 100 years ago, Ramón y Cajal (1893) 
discovered the motile tips of projecting axons, which 
he named growth cones, and observed that they often 
take roundabout routes to reach their targets. He sug- 
gested that growth cones function in axon guidance; 
further experimental evidence supporting his hypoth- 
esis was provided later by Harrison (1910) and Speidel 
(1941). Although alternative models for the establish- 
ment of neuronal connectivity were prevalent during 
the 1930s and 1940s, the work of Sperry in the 1950s 
firmly reestablished the notion that neuronal connect- 
ivity is generated by the directed migration of axons. 
From axon regeneration studies in amphibians and 
related experiments, Sperry (1963) postulated the 
‘chemoaffinity theory,’ which proposed the existence 
of specific surface markers that growth cones use for 
both pathway and target recognition. More recent 
studies in various model systems, including verte- 
brates, insects, and nematodes, have clearly estab- 
lished that axon pathfinding is highly specific and 
that common guidance mechanisms are conserved across
these organisms. These studies have also led to a greater
understanding of the cellular and molecular basis of 
axonal guidance. 


Cellular Sources of Guidance 
Information 


Specific cells or groups of cells along the path of an 
extending axon provide guidance cues directing the 
axon to its final target. These cells are called guidepost 
cells in insects and act as intermediate targets for the 
migrating growth cone. The growth cone navigates to 
each intermediate target, one after the other, to reach 
its ultimate destination. Thus, the final trajectory of an 
individual axon, which can be long and complex, is 
composed of many short, sequential segments that are 
perhaps a few hundred microns in length. Although 
guidepost cells are important for correct pathfinding, 
additional cues that are provided by other cells in the 
axon’s environment are also essential.
Many growth cones extend along preexisting axons
for all or part of their migration. The first axon in 
a nerve tract is called the pioneer axon and later- 
growing axons can bundle or fasciculate with the 
pioneer to form a nerve tract. Axons are highly organ- 
ized within a nerve bundle: A particular follower 
axon will always associate with a specific preexisting 
axon in the bundle. The selective affinity of axons 
within a nerve bundle has suggested that different 
types of axons have qualitative differences or labels 
that allow for the recognition of specific axon path- 
ways. The elimination of a pioneer axon often causes 
errors in the growth of the followers, indicating 
that they are important for the initial assembly and 
organization of nerve tracts. However, pioneer axons are not
absolutely required, as followers can partially com- 
pensate for their loss and form nerve tracts later in 
development. 


Attractive and Repulsive Guidance 
Forces and Target Recognition 


Four types of guidance forces act in concert to guide 
growth cone migrations: short-range (local) cues and 
long-range (diffusible) cues, each of which can be 
either attractive or repulsive. Short-range guidance 
involves the direct interaction of the growth cone 
with molecules on the surface of cells or in the sur- 
rounding extracellular matrix. Growth cones prefer 
to extend on an attractive or permissive substrate. 
The selective fasciculation of an axon within a nerve 
bundle is an example of an attractive, short-range 
interaction. Local repulsive or inhibitory cues can act 
to channel the growth of axons and prevent them from 
straying from their correct course or from extending 
past their target. 

Some guidance cues are released or secreted from 
their source and can diffuse to establish a gradient 
within the surrounding environment. These long- 
range, diffusible signals (chemoattractants and chemo- 
repellents) can provide global and position-dependent 
guidance information. Chemoattractants, which can 
be derived from the target or an intermediate target, 
direct the growth of the axon towards their source, 
whereas chemorepellents promote or redirect axon 
growth away from their source as well as cause axon 
growth to stall or stop. 
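The difference between a chemoattractant and a chemorepellent can be sketched with a toy model. It assumes, hypothetically, a point source at the origin whose cue decays exponentially with distance, and a growth cone that compares the cue level at points a short distance apart (a crude stand-in for sampling by its filopodia) and then takes a fixed-length step up the gradient (attractant) or down it (repellent). The names `concentration` and `steer`, the decay length, and the step sizes are illustrative assumptions, not measured parameters.

```python
import math

DECAY_LENGTH = 50.0  # hypothetical decay length of the cue (arbitrary units)


def concentration(x, y):
    """Steady-state cue level at (x, y) for a point source at the origin."""
    return math.exp(-math.hypot(x, y) / DECAY_LENGTH)


def steer(start, sign, steps=200, step=1.0, probe=2.0):
    """Move a growth cone up the local gradient (sign=+1, chemoattractant)
    or down it (sign=-1, chemorepellent) and return its final position.
    The gradient is estimated by comparing concentrations `probe` units
    apart, a crude stand-in for sampling by filopodia."""
    x, y = start
    for _ in range(steps):
        gx = (concentration(x + probe, y) - concentration(x - probe, y)) / (2 * probe)
        gy = (concentration(x, y + probe) - concentration(x, y - probe)) / (2 * probe)
        norm = math.hypot(gx, gy)
        if norm == 0.0:  # no detectable gradient: the growth cone stalls
            break
        x += sign * step * gx / norm
        y += sign * step * gy / norm
    return x, y


start = (100.0, 40.0)
attracted = steer(start, sign=+1)
repelled = steer(start, sign=-1)
print("distance from source at start:", round(math.hypot(*start), 1))
print("after following attractant:   ", round(math.hypot(*attracted), 1))
print("after following repellent:    ", round(math.hypot(*repelled), 1))
```

With the attractant the growth cone ends up hovering near the source; with the repellent it keeps moving away. Real growth cones integrate many short- and long-range cues at once, which this sketch ignores.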

The coordinated, collective action of these four 
guidance forces steers the growth cone along its appro- 
priate path to its target. Some neurons will extend 
axons along a common pathway to reach a shared target
consisting of an array of many neurons, and each 
arriving axon will make a unique connection within 
that array. For example, in the vertebrate visual sys- 
tem, retinal ganglion cells make an orderly projection 
onto the optic tectum in fish, amphibians, and birds or
the superior colliculus in mammals. Target recogni- 
tion involves two mechanisms: topographic maps of 
graded cues and unique tags marking different targets. 
In the visual system, several gradients of both ligands 
and receptors define a topographic map that provides 
positional information for the formation of correct 
neuronal connections. In other, less complex contexts, 
individual axons can recognize specific cellular labels 
expressed by their target. 
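The graded-cue mechanism can be illustrated with a one-dimensional toy model. It assumes, hypothetically, that each retinal axon carries a repulsive receptor at a level set by its retinal position, that the target carries the matching ligand in a gradient rising along one axis, and that an axon stops where receptor level times ligand level first reaches a fixed threshold. In the retinotectal system, EphA receptors and ephrin-A ligands are the usual example of such opposing gradients, but the functions, constants, and threshold rule here are illustrative only.

```python
import math

# Hypothetical parameters for a one-dimensional retinotectal toy model.
LIGAND_STEEPNESS = 2.0  # target ligand L(x) = exp(k * x), x in [0, 1]
THRESHOLD = 1.0         # repulsion level at which an axon stops


def ligand(x):
    """Ligand level along the target axis (low at x = 0, high at x = 1)."""
    return math.exp(LIGAND_STEEPNESS * x)


def termination_site(receptor, n_steps=1000):
    """Advance an axon across the target until receptor * ligand reaches
    the threshold; return the position where it stops."""
    for i in range(n_steps + 1):
        x = i / n_steps
        if receptor * ligand(x) >= THRESHOLD:
            return x
    return 1.0  # never repelled enough: grows to the far end


# Retinal axons with receptor levels graded from high to low.
receptor_levels = [math.exp(-LIGAND_STEEPNESS * i / 4) for i in range(5)]
sites = [termination_site(r) for r in receptor_levels]
for r, x in zip(receptor_levels, sites):
    print(f"receptor {r:.3f} -> stops at x = {x:.3f}")
```

Axons with more receptor stop earlier on the ligand gradient, so neighboring axons end up at neighboring positions and the order of the retina is preserved in the target.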


Guidance Molecules 


The molecular characterization of several guidance 
signals and their receptors revealed that guidance 
molecules and their functions are highly conserved 
across species. For example, netrins, which are 
secreted laminin-related signaling molecules, have 
been discovered in worms, flies, frogs, fish, birds, 
and mammals and act in conserved signaling pathways 
in all these organisms. A netrin is an example of a 
bifunctional signal, as it can act to attract as well as 
to repel axon growth. Whether netrin attracts or repels 
an individual growth cone depends on the types of 
netrin receptors expressed by that neuron and on
the substrate on which the axon is growing. Some types of
signals, such as the semaphorin family, contain both 
cell surface and diffusible members that are implicated 
in short- and long-range guidance, respectively. Thus, 
depending on the specific context, the same or related 
molecules can mediate more than one of the four 
guidance forces described earlier. 

Guidance signals and their receptors share se- 
quence and structural motifs with extracellular matrix 
and cell adhesion molecules. Two major families of 
cell adhesion molecules have been identified: the 
immunoglobulin (Ig) gene superfamily and the cad- 
herin superfamily, which contain both transmembrane 
and lipid-anchored proteins. The extracellular region 
of the Ig proteins consists of tandem arrays of Ig and 
fibronectin type III domains. Many neural cell adhe- 
sion molecules and guidance receptors are transmem- 
brane proteins and members of the Ig superfamily. 
The intracellular region of some guidance receptors 
contains a protein tyrosine kinase or protein tyrosine 
phosphatase domain, and their signaling function 
depends, at least in part, on these catalytic activities. 
Other receptors lack an obvious catalytic domain and 
presumably signal through association with other
molecules.

In summary, axon guidance is highly specific and 
conserved in both form and function in all organisms. 
Specific receptor proteins present in a growth cone 
allow it to recognize and respond to the appropriate 
guidance cues in its environment that direct it to its
correct target and establish the initial connectivity of
the nervous system. 


See also: Immunoglobulin Gene Superfamily; 
Neurogenetics in Caenorhabditis elegans; 
Neurogenetics in Drosophila 
The nervous systems of both invertebrates and
vertebrates are composed of a large variety of distinct
cell types. The specification of cell fate results in the 
generation of the various types of neurons and deter- 
mines their distinct structures, interconnectivity, 
neurotransmitters, surface receptors, and other features
characteristic of their individual functions.


The Analysis of Gene Mutants Facilitates 
our Understanding of Neuronal 
Specification 


Mutations that significantly disturb the development 
of the nervous system reveal that this program is pre- 
dominantly genetically determined. The character- 
ization of mutants has allowed researchers to analyze 
and dissect the various steps required for the gener- 
ation and differentiation of a neuron. Therefore, the 
mechanisms of neuronal specification are best
studied in model organisms in which the following 
two prerequisites are fulfilled: 


1. mutants can be isolated and characterized easily; 
2. the fate of particular cells or cell groups can be 
followed during development. 


There are just a few organisms that are accessible to
genetic and cellular analysis, among them the verte- 
brate models zebrafish (Brachydanio rerio) and mouse 
(Mus musculus). Currently the best characterized 
cell-fate decisions, however, have been described in in- 
vertebrates. These include the generation of mechano- 
receptor cells in the nematode Caenorhabditis elegans 
and bristle hair and eye development in the fruitfly 
Drosophila melanogaster. 


Neurogenesis Functions in a Hierarchical Manner


A progressive determination model that accounts
for the formation of various sensory organs during
neurogenesis has been developed from studying
Drosophila melanogaster (Figure 1). The principles
and genetic determinants involved are evolutionarily
conserved, suggesting that similar molecular mechanisms
determine cell diversity in other organisms as well.

[Figure 1 diagram: singling out of cell cluster (proneural genes); selection of stem cells (neurogenic genes); differentiation of cell (neuron-specific genes)]

Figure 1 Progressive genetic control of neuronal
determination. The expression of proneural genes
results in the competence of a cell cluster to develop
a neural fate. Lateral inhibition singles out one (blast) cell
that then develops and differentiates through the activity
of neuron-specific genes.

The first decision in neurogenesis is whether a 
given cell is going to become a neuron or another cell 
type. This decision is initiated by the activation of the 
proneural genes in selected clusters of cells (Figure 1). 
These provide the cells they are expressed in with the 
competence to develop into neurons or neuronal pre- 
cursors. The best studied factors that act in the switch 
between neural and non-neural (epidermal) fate are 
encoded by the genes of the achaete-scute complex 
and the atonal gene. These factors were originally 
discovered in Drosophila, but both their structures 
and functions are conserved in evolution. They in
turn activate other genes responsible for the
differentiation of particular neurons.

At the next level of control, a subset of cells (or one 
cell) is singled out from the cluster to develop as a 
sensory organ precursor. Interactions between neigh- 
boring cells mediate this process through the activity 
of the neurogenic genes. Of central importance in this 
cellular cross-talk are the multifunctional membrane 
receptor Notch and its ligands Delta and Serrate that
have been identified from both invertebrate and vertebrate
organisms. Small variations in the expression of
ligands and receptor in neighboring, initially equipo- 
tent, cells are reinforced by a signal transduction cas- 
cade of the activated Notch receptor and a positive 
feedback mechanism. This eventually results in a 
stable condition termed lateral inhibition, where one 
cell predominantly expresses the inhibiting signal 
Delta and develops a neural fate, whereas in the 
other cell signaling by the activated Notch first results 
in the downregulation of the proneural genes and then 
in an antineurogenic (epidermal) phenotype. 
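The reinforcing feedback described in this paragraph can be sketched as a pair of differential equations for two adjacent cells, following the general form of the lateral-inhibition model of Collier and colleagues (1996). The assumptions: Notch activity n in each cell rises with the Delta level d of its neighbor, Delta production falls as the cell's own Notch activity rises, and the two cells start almost, but not exactly, equal. The Hill-type functions `f` and `g` and the constants A, B, K, H are illustrative choices, not measured values.

```python
# Hypothetical parameters in the style of Collier et al. (1996).
A, B, K, H = 0.01, 100.0, 2, 2


def f(d):
    """Notch activity induced by the neighboring cell's Delta level d."""
    return d**K / (A + d**K)


def g(n):
    """Delta production, repressed by the cell's own Notch activity n."""
    return 1.0 / (1.0 + B * n**H)


def simulate(d0, n0, dt=0.01, t_end=100.0):
    """Forward-Euler integration of dn_i/dt = f(d_j) - n_i and
    dd_i/dt = g(n_i) - d_i for two cells i, j that signal only to each other."""
    d, n = list(d0), list(n0)
    for _ in range(int(t_end / dt)):
        dn = [f(d[1]) - n[0], f(d[0]) - n[1]]
        dd = [g(n[0]) - d[0], g(n[1]) - d[1]]
        n = [n[i] + dt * dn[i] for i in range(2)]
        d = [d[i] + dt * dd[i] for i in range(2)]
    return d, n


# Two nearly equivalent cells: cell 0 starts with slightly more Delta.
d, n = simulate(d0=(0.08, 0.07), n0=(0.35, 0.35))
for i in range(2):
    print(f"cell {i}: Delta = {d[i]:.3f}, Notch activity = {n[i]:.3f}")
```

The small initial difference is amplified until the cells settle in opposite states: cell 0 keeps high Delta and low Notch activity (the inhibiting, neural cell), while cell 1 has high Notch activity and low Delta (the epidermal fate).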

The last regulatory level involves neuron-specific 
genes that specify the type (sensory neuron, inter- 
neuron or motor neuron) and function of the respect- 
ive cell. These genes also control the expression and 
function of factors that are involved in establishing 
neural interconnectivity by controlling neurite out- 
growth and the subsequent formation of synapses. 
They regulate the expression of factors controlling 
neural differentiation, the expression of neurotrans- 
mitters and of trophic factors that affect survival and 
synaptic plasticity. This entire process requires a hier- 
archy of genetic events involving both interactions 
between neighboring cells and cell-autonomous gene 
activity. 


Neuronal Specification is Mediated by 
both Cell-intrinsic and Extracellular 
Components 


The mechanisms responsible for generating diversity 
have been subject to intense studies in a wide variety 
of organisms. Two principal mechanisms determine
the differences between two cells (Figure 2). 

As we have seen above, the decision to develop into 
a particular type of neuron may be controlled by 
lateral interactions between spatially related cells. In 
such a case, factors from one cell influence the fate of a 
neighboring cell. As a consequence, cell specification 
is controlled by the position of both cells, and, gen- 
erally, by their cellular environment. These extrinsic 
mechanisms require an exchange of instructive
information between cells through receptor-ligand
interactions like the already described Notch/Delta
cross-talk. Signaling thus can influence a neuronal 
precursor cell to subsequently divide asymmetrically, 
or it can trigger the differential fates of both daughter 
cells after the division of the precursor cell. Most ex- 
trinsic specification programs involve spatially very 
restricted signals between adjacent cells. Therefore, 
non-autonomous signaling is indicated if an experimental
repositioning of a particular cell within the
organism results in a change of cell fate. 
Figure 2 Cell diversity is generated by asymmetric
cell divisions. (A) A cell-intrinsic factor is distributed 
asymmetrically upon a cell division. Both daughter cells 
inherit different concentrations of the factor. (B) An 
extrinsic factor induces an asymmetric division of the 
mother cell (a) or an asymmetric development of one of 
the daughter cells (b). 


A second essential means to generate diversity of 
neuronal cell types involves asymmetric cell divisions 
by cell-intrinsic mechanisms. The asymmetry is ini- 
tiated by an unequal distribution of a cytoplasmic 
determinant in the mother cell. For example, factors 
like the proteins encoded by the Drosophila genes 
numb and prospero are first asymmetrically localized 
in a neural progenitor cell. Upon cell division, they are 
then distributed only to one of two daughter cells and 
help to specify its fate. In practice, cell autonomy may 
be suspected whenever the elimination of one of the
neighboring sister cells or an alteration of its posi- 
tion by means of experimental manipulation does 
not affect the fate of the other cell. 

Regardless of whether a cell-autonomous or a
non-autonomous differentiation program controls 
asymmetry, it is in the end one or (more likely) a series 
of transcription factors that are activated. Several 
classes of homeodomain proteins with remarkable 
evolutionary conservation have been implicated in 
the terminal differentiation of neuron type and func- 
tion. Among the best characterized factors are the 
classes of POU and LIM homeodomain proteins. In 
chick embryos, the identities of various types of motor
neurons seem to be generated by the expression of 
different combinations of LIM proteins in these 
cells. In Drosophila and C. elegans, the differentiation
and specificity of certain motor neurons and interneurons
(and also of some muscle cells) depend on the activity of
the apterous/ttx-3 LIM protein. The C. elegans gene
mec-3, encoding another LIM protein required for 
mechanoreceptor differentiation, is controlled by 
unc-86, encoding a cell-autonomous POU factor. A 
mutational loss of either unc-86 or mec-3 activity, for 
example, completely prevents the development of 
mechanoreceptor cells in the nematode. 

These transcription factors in turn control the
expression of downstream target genes that specify
further cellular development and identity. Their
targets have been identified in only a few cases;
these encode both structural components of the cell
(membrane proteins, receptors) and additional
transcription factors. One experimental problem in
identifying the target genes of homeodomain
transcription factors is that, in most cases, they
bind only to poorly defined DNA target sites.
Therefore, to acquire specificity, they have to
combine with other (cell-intrinsic) factors, many of
which are still unknown.


Further Reading 

Hawkins N and Garriga G (1998) Asymmetric cell division: from 
A to Z. Genes and Development 12: 3625-3638.

Jan YN and Jan LY (1994) Genetic control of cell fate specifica- 
tion in Drosophila peripheral nervous system. Annual Review 
of Genetics 28: 373-393. 


See also: Cell Lineage; Embryonic Stem Cells 
Neurospora crassa is an ascomycete fungus that has
been used extensively in genetic research. Cultures 
of Neurospora are recognized by the orange color of 
the vegetative spores. The first genetic studies of 
Neurospora were carried out by Carl Lindegren in 
the 1930s. He isolated several morphological mutant 
strains and constructed the first linkage maps. He 
demonstrated that the analysis of the ordered tetrads 
of Neurospora permitted the mapping of gene loci 
with respect to their centromeres. Lindegren and 
others used tetrad analysis in Neurospora to determine 
the basic properties of meiotic crossing-over: (1) it 
occurs at the four-strand stage and involves two of 
the four chromatids; (2) a crossover at one site in a 
tetrad diminishes the frequency of crossing-over in 
neighboring regions but does not influence which
chromatids are involved in a nearby crossover (i.e.,
Neurospora manifests chiasma interference but not 
chromatid interference). 
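Gene-centromere mapping from ordered tetrads can be sketched as follows. A single crossover between a locus and its centromere produces a second-division segregation (SDS) pattern in the ordered ascus; because each such crossover involves only two of the four chromatids, the map distance is half the percentage of SDS asci. The sketch assumes each eight-spore Neurospora ascus is written as its four spore pairs in order (e.g. `AAaa`) and ignores asci with aberrant ratios; the function names and the example counts are hypothetical.

```python
from collections import Counter


def segregation_division(ascus):
    """Classify an ordered ascus, written as its four spore pairs from one
    end to the other (e.g. "AAaa"), as first- (FDS) or second-division
    (SDS) segregation for a single locus."""
    if sorted(ascus) != sorted("AAaa"):
        raise ValueError(f"expected 2:2 segregation of A and a, got {ascus!r}")
    # FDS: each half of the ascus is homozygous (AAaa or aaAA).
    if ascus[0] == ascus[1] and ascus[2] == ascus[3]:
        return "FDS"
    return "SDS"  # AaAa, aAaA, AaaA or aAAa


def gene_centromere_distance(asci):
    """Map distance in cM = 100 * (1/2) * (fraction of SDS asci)."""
    counts = Counter(segregation_division(a) for a in asci)
    total = counts["FDS"] + counts["SDS"]
    return 100.0 * 0.5 * counts["SDS"] / total


# Hypothetical sample: 80 FDS asci and 20 SDS asci.
asci = (["AAaa"] * 40 + ["aaAA"] * 40
        + ["AaAa"] * 5 + ["aAaA"] * 5 + ["AaaA"] * 5 + ["aAAa"] * 5)
print(gene_centromere_distance(asci), "cM")
```

The estimate is reliable only for loci fairly close to the centromere, because two crossovers in the interval can restore a first-division pattern and make the locus appear closer than it is.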

In 1941, George Beadle and Edward Tatum chose 
Neurospora as the system in which to demonstrate the
generality of the one-gene one-enzyme hypothesis.
Neurospora was, at that time, the only eukaryote with a
known genetic system that would grow on a simple, 
defined medium. Beadle and Tatum reasoned that if 
their hypothesis were correct, they should be able to 
detect mutations in many different genes controlling 
the synthesis of the various amino acids, nucleotides,
and vitamins. The success of this project
ushered in the era of biochemical genetics of fungi that 
continues today. 

Mary Mitchell’s analysis of tetrads from a cross 
between two allelic mutants at a locus for pyridoxine 
requirement in 1955 revealed the relationship of gene 
conversion to intragenic recombination and to the 
exchange of flanking markers. The pioneering work 
of Lindegren and of Mitchell led to many investiga- 
tions with Neurospora which have contributed to our 
knowledge of the properties of the meiotic recombin- 
ation event. 

When two vegetative cultures of Neurospora are 
grown in close proximity, they may fuse to form a 
single individual (mycelium) containing nuclei from 
both contributing strains. This is a ‘heterokaryon,’ and 
it continues to grow using gene products from both 
components. Tests for dominance of mutant genes, 
normally performed in heterozygous diploids in most 
genetic systems, are based on the growth properties of 
heterokaryons in Neurospora. There is no exchange of 
genes between nuclei in a heterokaryon, and the separ- 
ate components (homokaryons) can be reisolated 
from vegetative spores. This system has permitted 
the design of many studies of the separate roles of 
nucleus and cytoplasm in determining traits and
controlling metabolic processes. One result was the
discovery of the slow-growing poky mutants, which were
controlled by cytoplasmic determinants, and ultim- 
ately proved to result from changes in the mito- 
chondrial genome. Much of our knowledge of the 
interactions of nuclear and mitochondrial genes has 
come from studies of Neurospora. 

The growth of filamentous fungi by linear extension 
has permitted a vivid demonstration of the changes 
which take place over time. When a culture is observed 
growing down the length of a long tube, changes in 
growth rate and growth habit can be observed over a 
period of days. Such preparations led to the discovery 
of diurnal cycles of growth habit in Neurospora and of 
mutations that altered the length of the cycle. Further 
studies of this system have contributed greatly to our 
knowledge of the mechanisms of biological clocks. 
Further Reading
Perkins DD (1992) Neurospora: the organism behind the 
molecular revolution. Genetics 130: 687-701. 


See also: Molecular Clock; Tetrad Analysis 
Neutral drift is the process of change of genotypes by
random genetic drift without phenotypic alteration in 
evolution. It occurs when many genotypes give rise to 
the same phenotype. In such cases, genotype may 
change within a given phenotype. An example is the 
evolution of the secondary structure of tRNA starting 
from a random sequence, where many RNA seq- 
uences lead to the same secondary structure. Through 
simulation studies, M.A. Huynen, W. Fontana, and 
P. Schuster have shown that the evolution of the 
tRNA secondary structure is characterized by several 
discrete steps, each corresponding to a transition from 
one shape of secondary structure to another. Neutral 
drift occurs within a given shape, and provides an 
opportunity for further progress toward the final 
structure. A shape change may occur by a single 
point mutation if the current sequence is one step 
away from a sequence with a neighboring shape. 
Neutral drift allows a population to explore sequence
space until it reaches such a sequence.
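The pattern of neutral drift punctuated by occasional shape changes can be sketched with a toy genotype-to-phenotype map instead of RNA folding. In this hypothetical map a genotype is a string of 16 bits and its phenotype is the number of complete blocks of four 1s, so many genotypes share each phenotype (similar in spirit to the ‘royal road’ functions used in evolutionary computation). A single lineage accepts neutral and improving point mutations and rejects deleterious ones. This is not the RNA secondary-structure model of Huynen, Fontana, and Schuster; it only illustrates the qualitative behavior.

```python
import random

L, BLOCK = 16, 4  # genotype length and block size (illustrative)


def phenotype(genotype):
    """Hypothetical many-to-one map: the number of complete blocks of 1s.
    Mutations inside incomplete blocks leave the phenotype unchanged."""
    return sum(all(genotype[i:i + BLOCK]) for i in range(0, L, BLOCK))


def adaptive_walk(seed=1, max_steps=100_000):
    rng = random.Random(seed)
    genotype = [0] * L
    current = phenotype(genotype)
    neutral_steps = 0
    transitions = []  # (step, new phenotype) whenever the phenotype changes
    for step in range(1, max_steps + 1):
        mutant = genotype[:]
        mutant[rng.randrange(L)] ^= 1  # single point mutation
        p = phenotype(mutant)
        if p < current:
            continue  # deleterious: rejected by selection
        if p == current:
            neutral_steps += 1  # genotype drifts, phenotype unchanged
        else:
            transitions.append((step, p))
        genotype, current = mutant, p
        if current == L // BLOCK:
            break
    return current, neutral_steps, transitions


current, neutral_steps, transitions = adaptive_walk()
print("final phenotype:", current)
print("accepted neutral mutations:", neutral_steps)
print("phenotype transitions (step, phenotype):", transitions)
```

Most accepted mutations are neutral; each phenotype change happens in a single step, once drift has brought the genotype within one mutation of a neighboring phenotype.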

The above picture of tRNA evolution comes from 
the model in which the secondary structure is the only 
target of selection. In nature, many other factors such 
as stability or reactivity with other molecules affect 
fitness, and most evolutionary studies have concerned
the subsequent modification of the molecule after
establishment of a certain structure. For example, the
properties of the hemoglobin molecules of various
organisms, which share the same higher-order structure,
have been investigated in detail. The sequences that
map to a given shape belong to a set of sequences 
connected by nearly neutral mutations. On the other 
hand, a mutation that results in transition between 
different shapes may have a large effect, and possibly 
even correspond to a lethal mutation if the gene pro- 
duct is essential for the organism. 

Gene regulatory networks are very important in
determining morphological characters, and their
genotype-to-phenotype mapping may again be many-to-one.
One would then expect neutral or nearly neutral drift,
with occasional transitions of phenotype, during the
evolution of regulatory systems. Punctuated equilibrium
of morphological evolution may reflect such a process.


Further Reading 

Fontana W and Schuster P (1998) Continuity in evolution: on 
the nature of transitions. Science 280: 1451-1455.

Huynen MA, Stadler PF and Fontana W (1996) Smoothness 
within ruggedness: The role of neutrality in adaptation. 
Proceedings of the National Academy of Sciences, USA 93: 
397-401. 


See also: Nearly Neutral Theory; Neutral Theory; 
Selective Neutrality; Shifting Balance Theory of 
Evolution 
Any mutation in the genetic material that neither
increases nor decreases the survival probability or 
reproductive potential of an individual carrying that 
mutation is said to be a selectively ‘neutral’ mutation. 
Neutral mutations cannot by definition contribute to 
adaptive evolution, nor can they lead to the evolution- 
ary improvement of a trait or species. Nevertheless, 
neutral mutations may be a large component of the 
standing crop of genetic variation found segregating 
within natural populations and species, including 
humans, and may be an equally important component
of the genetic differences between species. For this
reason, population-genetic and evolutionary theories
of neutral mutation are very well developed, and
considerable empirical effort has gone into testing
predictions of neutral mutation theory.


Dynamics Within Populations 


By definition, natural selection cannot act to
increase or decrease the frequency of a neutral
mutation in a population. Instead, the dynamics of
allele frequency change of a neutral mutation are
governed entirely by a process known as ‘genetic
drift.’ Genetic drift can be defined as the chance
change in the frequency of a mutation in a population
from one generation to the next resulting from the
finite size of a population. Genetic drift will be
strongest in populations of small size and decreases
in strength with increasing population size. For a
selectively neutral mutation at frequency p in a
diploid population of size N, the expected change in
frequency from one generation to the next is zero,
but the variance of that change is p(1 - p)/(2N),
which is inversely proportional to the population
size.
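One generation of drift can be sketched with a Wright-Fisher model, the standard idealization in which the 2N gene copies of each generation are drawn at random from the previous one. The sketch repeats a single generation many times for a neutral allele at p = 0.5 in two hypothetical population sizes; the function names, seed, and replicate count are illustrative choices.

```python
import random
import statistics


def next_frequency(p, N, rng):
    """One Wright-Fisher generation: each of the 2N gene copies in the
    offspring is drawn independently and is the mutant type with probability p."""
    copies = 2 * N
    mutant = sum(rng.random() < p for _ in range(copies))
    return mutant / copies


def one_generation_changes(p, N, reps=10_000, seed=0):
    """Change in mutant frequency over a single generation, in many replicates."""
    rng = random.Random(seed)
    return [next_frequency(p, N, rng) - p for _ in range(reps)]


p = 0.5
results = {}
for N in (10, 100):
    changes = one_generation_changes(p, N)
    results[N] = (statistics.fmean(changes), statistics.pvariance(changes))
    mean, var = results[N]
    print(f"N={N:3d}: mean change {mean:+.4f}, variance {var:.5f}, "
          f"p(1-p)/(2N) = {p * (1 - p) / (2 * N):.5f}")
```

The average change stays close to zero in both cases, while the variance falls roughly tenfold when N increases tenfold, in line with p(1 - p)/(2N).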

Motoo Kimura, the great modern population 
genetics theorist, formulated many results about the 
properties of neutral mutations. Consider an idealized 
diploid species of size N, where each individual pos- 
sesses two copies of every gene. Therefore, there are 
2N copies of each gene, and a newly arising mutation,
which occurs as a unique mutation in a single
offspring, will have initial frequency p = 1/(2N). Kimura
proved, by diffusion approximation, that the prob- 
ability of eventual fixation of a neutral mutation is its 
current frequency in the population. This means that 
for populations of large size, the probability that a 
neutral mutation will increase in frequency in a
population and eventually completely replace the ancestral
allele from which it arose is very small. Most neutral
mutations never become common, remaining rare in a 
population for a period of time before eventually 
‘drifting’ to extinction. Kimura also showed that the 
expected time to fixation of a neutral mutation
destined for fixation is approximately 4N generations.
For species such as insects, where population sizes
can easily be in the millions, this means that neutral
mutations destined
for fixation spend a very long time segregating as 
genetic polymorphisms in populations. In contrast, a 
selectively favored mutation will be driven to fixation 
by positive natural selection much more quickly than 
a neutral mutation, and the time spent segregating in 
the population will be correspondingly shorter. This is 
one reason to suppose that at any given time in the 
history of a species, a large proportion of genetic
variation will be selectively neutral.
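Kimura's two results for a new neutral mutation, a fixation probability of 1/(2N) and a mean time to fixation of about 4N generations among those that fix, can be checked with a small Wright-Fisher simulation. The sketch uses a hypothetical population of N = 10 diploids so that it runs quickly; since Kimura's values come from a diffusion approximation, the simulated numbers should agree only approximately.

```python
import random
import statistics


def run_until_absorbed(N, rng):
    """Follow one new neutral mutation (initial frequency 1/(2N)) under
    Wright-Fisher drift until it is lost or fixed.
    Returns (fixed, generations)."""
    copies = 2 * N
    count, generations = 1, 0
    while 0 < count < copies:
        p = count / copies
        count = sum(rng.random() < p for _ in range(copies))
        generations += 1
    return count == copies, generations


def simulate(N, reps=20_000, seed=42):
    """Fraction of replicates that fix, and mean fixation time among them."""
    rng = random.Random(seed)
    fixation_times = []
    for _ in range(reps):
        fixed, t = run_until_absorbed(N, rng)
        if fixed:
            fixation_times.append(t)
    return len(fixation_times) / reps, statistics.fmean(fixation_times)


N = 10
p_fix, mean_t = simulate(N)
print(f"fixation probability {p_fix:.4f} (theory 1/(2N) = {1 / (2 * N):.4f})")
print(f"mean time to fixation {mean_t:.1f} generations (theory about 4N = {4 * N})")
```

About one mutation in 2N fixes, and those that do take on the order of 4N generations, far longer than the few generations a typical doomed mutation survives before being lost.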


Evolutionary Dynamics 


In a diploid population with 2N gene copies and a
per-generation mutation rate to neutral alleles, u, the
total number of new neutral mutations entering the
population each generation will be 2Nu. Now, if the
probability of a given neutral mutation ever reaching
fixation is 1/(2N), as given above, then it follows that
the rate, K, of neutral evolution will be simply the
product of these two terms, K = 2Nu × 1/(2N) = u.
This implies that the rate at which neutral mutations
will fix in a species will be constant (assuming the
mutation rate is relatively constant), and that it will
be independent of the population size of a species.
Because u is a rate per generation, the rate of neutral
evolution per unit of absolute time is expected to be
relatively constant, depending only on generation time.
For species with similar generation times, a given
gene that evolved primarily by the accumulation of
neutral mutations will have a characteristic and
constant rate of evolution, and is said to obey a
‘molecular clock.’
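The cancellation of N in K = 2Nu × 1/(2N) can be checked with exact arithmetic. The sketch below multiplies the two terms for several population sizes using a hypothetical neutral mutation rate, then converts the per-generation rate into a rate per year for two hypothetical generation times, which is why the clock is expected to tick at different speeds in species with different generation times.

```python
from fractions import Fraction


def neutral_substitution_rate(N, u):
    """K = (new neutral mutations per generation, 2N*u)
         * (fixation probability of each, 1/(2N)).
    Exact fractions make the cancellation of N visible."""
    new_mutations_per_generation = 2 * N * u
    fixation_probability = Fraction(1, 2 * N)
    return new_mutations_per_generation * fixation_probability


def rate_per_year(K_per_generation, generation_time_years):
    """Convert a per-generation rate into a rate per year of absolute time."""
    return K_per_generation / generation_time_years


u = Fraction(1, 10**8)  # hypothetical neutral mutation rate per gene copy per generation
for N in (10**3, 10**5, 10**7):
    print(f"N = {N:>10,}: K = {neutral_substitution_rate(N, u)}")  # equals u each time

# Two hypothetical species with the same u but different generation times.
for name, g in (("short-lived", Fraction(1, 10)), ("long-lived", Fraction(20))):
    print(name, rate_per_year(u, g), "substitutions per year")
```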


Empirical Evidence for Neutral 
Evolution 


Location of Mutations and Rates of 
Evolutionary Changes in Different 
Functional Components of the Genome 
Neutral mutations are expected to accumulate at loca- 
tions in the genome where changes are least likely to 
affect the ontogenetic instructions for making an 
organism. The genome is composed, roughly speaking, 
of three components: a functional component
containing the instructions for producing all the proteins
and other structural elements of cells (such as ribo- 
somal RNA), another functional (but poorly charac- 
terized) component containing the cis-regulatory 
signals that control the spatio-temporal expression of 
these structural components, and a nonfunctional 
component consisting of much of the remaining 
genetic material. The noncoding component includes
both the DNA between genes and the spacer DNA within
genes (i.e., introns). The structural components of the
genome are generally thought to be highly evolved 
through the process of natural selection, and from 
this it follows that most mutations in the structural 
component will be selectively deleterious rather than 
selectively neutral. Natural selection will act to elim- 
inate these deleterious mutations, leaving the sequence 
of the structural component relatively unchanged over 
evolutionary time. In contrast, the lack of so-called 
‘functional constraints’ acting on the noncoding
component of the genome will allow mutations to
accumulate because they are selectively neutral.
Compared to functional DNA, nonfunctional DNA is
therefore expected to have a higher density of
polymorphic mutations segregating within species,
and a correspondingly greater rate of substitution
between species. This is precisely what is found in
all organisms examined, and this prediction and
observation is one of the cornerstones of Kimura’s
neutral theory of molecular evolution.


Tests of Neutral Variation and Evolution 

The greatest attention has been given to protein- 
coding regions of the genome and to proteins in parti- 
cular. There is considerable polymorphism in natural 
populations in the amino acid sequence of proteins, 
and most characterizations of the frequency spectrum 
of protein variants fail to show strong departures from 
theoretical neutral expectations. Proteins have also 
been shown to evolve at a roughly constant rate with 
absolute (i.e., geological) time, consistent with a
molecular clock. But the effect of generation time on
the rate of protein evolution that neutrality predicts
is weak at best in the data, and much smaller in
magnitude than that seen for changes in the noncoding
component of the genome. Thus a key prediction of
neutrality is violated by the data on rates of protein
evolution. In addition, neutral
theory makes strong predictions about the expected 
variability (or imprecision) of the molecular clock, and 
Kimura was the first to point out that the measured 
variance in the rate of protein evolution exceeds the 
predicted value. In mammals, it is now believed that 
the variance in the rate of protein evolution is approxi- 
mately 5-10 times greater on average than that expected 
under neutrality, again signaling an incompatibility of 
protein evolution with neutral mutation theory. 

On the other hand, noncoding portions of the 
genome have variability and evolve at fast rates that 
are consistent with selective neutrality. Indeed, the 
bulk of changes in genomes over evolutionary time 
occur in noncoding portions of the genome, and are 
likely to be selectively neutral changes. 


Related Theories 


It might be supposed that there is no such thing as a 
neutral mutation, with fitness effect exactly equal to 
zero. Instead, many mutations may have very small 
fitness effect, so close to zero that they are ‘effectively’ 
neutral. From this supposition, a theory of nearly 
neutral mutations has been developed, largely asso- 
ciated with the theoretical work of T. Ohta (but also 
Kimura). According to this theory, mutations whose 
fitness effects are smaller than the reciprocal of the 
population size, or s << 1/N, will behave as if they are 
neutral mutations. Of greatest interest are those muta- 
tions whose fitness effects are close to the boundary 
s = 1/(2N) (in diploids), because the fate of these mutations will be very sensitive to population size. Synonymous mutations in codons may be one such class of
mutations. Many organisms, mostly species with large population sizes (such as bacteria, weedy plants, yeast, and insects), have genes with highly biased (nonrandom) usage of degenerate codons within amino acid codon families. Biased codon usage in these species has been shown to be governed by extremely weak selection, and in the fruitfly it is almost certainly on this critical neutrality-selection interface. However,
the evidence to support this contention is highly tech- 
nical, and is beyond the scope of this introduction 
to the subject of neutral mutation. Arguments have 
also been made for the ‘near’ neutrality of amino acid 
substitutions in protein evolution, but the evidence to 
support this claim remains contentious. 
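The boundary between drift and selection can be made concrete with Kimura's diffusion approximation for the fixation probability of a single new mutant, $u = (1 - e^{-4Nsp})/(1 - e^{-4Ns})$ with $p = 1/(2N)$. The sketch below assumes additive (genic) selection with coefficient s and an effective size equal to the census size N; the function name is hypothetical. It reports u relative to the neutral value 1/(2N).

```python
import math


def fixation_probability(s: float, n: int) -> float:
    """Kimura's diffusion approximation for a single new mutant.

    Assumes a diploid population of size n (Ne = N), additive selection
    coefficient s, and initial frequency p = 1/(2n).
    """
    p = 1.0 / (2 * n)
    if s == 0:
        return p  # strictly neutral: u = 1/(2N)
    # 1 - exp(-x) is computed as -expm1(-x) for accuracy when x is tiny
    return -math.expm1(-4 * n * s * p) / -math.expm1(-4 * n * s)


if __name__ == "__main__":
    n = 1000
    for s in (1e-6, 1e-4, 1 / (2 * n), 1e-2, -1e-4):
        ratio = fixation_probability(s, n) * 2 * n  # relative to neutral 1/(2N)
        print(f"s = {s:+.1e}  4Ns = {4 * n * s:+.2f}  u/(1/2N) = {ratio:.3f}")
```

When |4Ns| is much smaller than 1 the ratio stays near 1 and the mutant behaves as if neutral; once |4Ns| reaches order 1 the ratio departs sharply in either direction, which is why the fate of mutations near s = 1/(2N) depends so strongly on population size.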


Further Reading 
Kimura M (1983) The Neutral Theory of Molecular Evolution. Cam- 
bridge: Cambridge University Press. 


See also: Codon Usage Bias; Nearly Neutral 
Theory; Neutral Theory 
History


The neutral theory of molecular evolution contends 
that at the molecular level most evolutionary changes 
and polymorphisms within species are not caused by 
natural selection, but by random genetic drift. The 
theory was first put forward by M. Kimura in 1968. 
He compared the amino acid sequences of hemo-
globin α and cytochrome c in several mammalian
species, and found that the number of mutant substi- 
tutions was too large to be tolerable within Haldane’s 
theory of natural selection if the substitution number 
was extrapolated to the total genome. Based on this 
discrepancy, Kimura proposed the neutral theory. 
By considering more biochemical facts than Kimura, 
J.L. King and T. Jukes published a similar theory in 
1969. Although Kimura’s original argument for the 
neutral theory depended on the concept of the cost 
of natural selection, subsequent discussion of the neu- 
tral theory became almost independent of the cost, 
and has put more emphasis on the constancy of the 
rate of molecular evolution, i.e., the molecular clock. 
Here, the argument for the neutral theory was the 
apparent disconnection between molecular and pheno- 
typic changes. Another important observation for the 
neutral theory was the inverse relationship between 
the importance of a protein and its rate of evolution, 
first noted by King and Jukes. Under the neutral theory, functionally important proteins are more constrained, so their amino acid changes are less likely to be neutral. During the 1990s, DNA sequence data increased rapidly, enabling comparison of the
patterns of substitutions at selectively important 
(such as nonsynonymous) and unimportant (such as 
synonymous) sites. Many unimportant sites evolve as 
predicted by the neutral theory, whereas important
sites are more influenced by natural selection, and the
difference in the patterns provides an opportunity to 
detect selection. The neutral theory has been tested 
through such analyses. Special attention has been directed toward clarifying the interaction of random drift and selection, i.e., the nearly neutral theory.


Behavior of Mutants in a Population


According to the neutral theory, the behavior of 
mutant genes in populations is determined by ran- 
dom genetic drift. The evolutionary process in which mutant genes are substituted one after another at a locus is therefore quite different under this theory from that envisaged by neo-Darwinism. In every generation, many
mutants appear in a population, but the majority are 
lost by chance, and only lucky mutants spread and fix 
in the population. This is true for both theories. How- 
ever, under the neutral theory, the whole process is 
governed by chance, whereas under the selection the- 
ory, selection plays a major role and only selectively 
advantageous mutants can fix in the population. 

The behavior of neutral mutants has been analyzed, and the process of successive mutant substitutions in a finite population is illustrated in Figure 1. In the figure, the changes in frequency of mutants destined for fixation are depicted by thick paths; on average, fixation takes a number of generations equal to four times the effective population size, Ne. The numerous unlucky mutants that are lost are shown by thin paths. If we denote the neutral mutation rate at a locus per generation by v, and the actual population size by N, 2Nv new mutations arise in the population in each generation. Among them, only the fraction 1/(2N) is lucky and fixes in the population. Therefore, the number of mutations that fix per generation is 2Nv × 1/(2N), which equals the mutation rate, v. Letting k be the substitution rate, we have


$$k = v \tag{1}$$
The above formula also tells us that the average interval between successive substitutions is the reciprocal of the mutation rate, 1/v, as shown in the figure. Polymorphism is simply a phase of substitution, as seen in the intermediate-frequency phase of the figure.

Figure 1 Course of change of frequencies of neutral mutants in a finite population of size Ne. The rate of neutral mutation is denoted by v (from Kimura, 1983).
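The argument for k = v can be checked with a small Monte Carlo sketch of an idealized Wright-Fisher population, in which Ne equals N. It follows single new neutral mutants until loss or fixation, then estimates the fixation probability (expected 1/(2N)), the mean time to fixation of the lucky mutants (roughly 4N generations), and the substitution rate 2Nv × P(fix). The population size, mutation rate, and function names are hypothetical choices for illustration.

```python
import random


def follow_new_mutant(n_diploid: int, rng: random.Random) -> tuple[bool, int]:
    """Track one new neutral mutant under Wright-Fisher sampling (pure drift).

    Returns (fixed, generations_until_loss_or_fixation).
    """
    two_n = 2 * n_diploid
    count = 1  # a newly arisen mutation is present in a single gene copy
    generations = 0
    while 0 < count < two_n:
        freq = count / two_n
        # Each of the 2N gene copies of the next generation is drawn at random
        # from the current generation: binomial sampling with no selection.
        count = sum(1 for _ in range(two_n) if rng.random() < freq)
        generations += 1
    return count == two_n, generations


def estimate_fixation(n_diploid: int, trials: int, seed: int = 1) -> tuple[float, float]:
    """Monte Carlo estimate of P(fixation) and mean time to fixation."""
    rng = random.Random(seed)
    fixation_times = []
    for _ in range(trials):
        fixed, gens = follow_new_mutant(n_diploid, rng)
        if fixed:
            fixation_times.append(gens)
    p_fix = len(fixation_times) / trials
    t_fix = sum(fixation_times) / len(fixation_times) if fixation_times else float("nan")
    return p_fix, t_fix


if __name__ == "__main__":
    n, v = 10, 1e-6  # hypothetical population size and neutral mutation rate
    p_fix, t_fix = estimate_fixation(n, trials=10000)
    print(f"P(fix) ~ {p_fix:.4f}   (1/2N = {1 / (2 * n):.4f})")
    print(f"mean time to fixation ~ {t_fix:.1f} generations   (4N = {4 * n})")
    # 2Nv new mutations per generation, each fixing with probability ~1/(2N)
    print(f"k ~ 2Nv * P(fix) = {2 * n * v * p_fix:.2e}   (v = {v:.0e})")
```

Because the estimated P(fix) is close to 1/(2N), the product 2Nv × P(fix) comes out close to v whatever N is; a small N simply keeps the pure-Python binomial sampling fast.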


Selective Constraint 


Not all mutations at the molecular level are neutral. 
Some amino acid changes of a protein are known to 
seriously impair the structure and function of the 
protein. Such changes are eliminated by natural selection. In fact, the rate of protein evolution varies among proteins, and the evolutionary rate is negatively correlated with the degree of constraint. For example, there are very few amino acid substitutions in the strongly constrained histone IV, whereas pseudogenes, with no known constraint, evolve rapidly. Equation (1) can be modified to include such an effect:


$$k = f_0 v_T \tag{2}$$


where $f_0$ is the fraction of mutations that are neutral, and $v_T$ is the total mutation rate. Note that $1 - f_0$ is the fraction of mutations that impair the structure and function.
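Equation (2) is simple arithmetic. The sketch below, using a hypothetical total mutation rate and hypothetical neutral fractions, shows how strong constraint (small $f_0$) lowers the substitution rate relative to an unconstrained sequence such as a pseudogene ($f_0 = 1$); the function name is illustrative.

```python
def neutral_substitution_rate(f0: float, v_total: float) -> float:
    """Equation (2): k = f0 * v_T, where only a fraction f0 of all
    mutations (total rate v_T per generation) is selectively neutral."""
    if not 0.0 <= f0 <= 1.0:
        raise ValueError("f0 must lie between 0 and 1")
    return f0 * v_total


if __name__ == "__main__":
    v_total = 1e-8  # hypothetical total mutation rate per generation
    # Hypothetical neutral fractions: a strongly constrained protein vs. a pseudogene
    for label, f0 in (("constrained protein", 0.01), ("pseudogene", 1.0)):
        print(f"{label:>20}: k = {neutral_substitution_rate(f0, v_total):.1e}")
```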


Molecular Clock 


A most significant observation supporting the neutral 
theory is the so-called molecular clock, i.e., the rough 
constancy of the evolutionary rate at each locus. During the 1970s and 1980s, based on comparative studies of amino acid sequences, the molecular clock was thought to be fairly general. However, some researchers found significant variations among lineages in the
evolutionary rate. In particular, J.H. Gillespie noted 
that the pattern of substitution appeared to be episodic, 
with bursts of substitutions separated by periods of 
quiescence. Another problem of the molecular clock is the generation-time effect. Under the neutral theory, the substitution rate is directly proportional to the mutation rate, from Equations (1) and (2). However, the substitution rate is usually measured per year, whereas the mutation rate is usually measured per generation; if mutations accumulate per generation, species with shorter generation times should show faster substitution rates per year. Together with the variation of evolutionary rate among lineages, the generation-time problem encourages further examination of the neutral theory.
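The generation-time problem can be seen by converting the per-generation rate k = v into a per-year rate, dividing by the generation time. The sketch below assumes the same per-generation mutation rate in two species; all values and the function name are hypothetical.

```python
def substitutions_per_year(v_per_generation: float, generation_time_years: float) -> float:
    """Convert a per-generation neutral substitution rate (k = v) to a per-year rate."""
    if generation_time_years <= 0:
        raise ValueError("generation time must be positive")
    return v_per_generation / generation_time_years


if __name__ == "__main__":
    v = 1e-8  # hypothetical mutation rate per generation, equal in both species
    short_gen = substitutions_per_year(v, 0.5)  # hypothetical short-lived species
    long_gen = substitutions_per_year(v, 25.0)  # hypothetical long-lived species
    print(f"per-year rates: {short_gen:.1e} vs {long_gen:.1e}; "
          f"ratio = {short_gen / long_gen:.0f}")
```

With these numbers the short-generation species would be expected to accumulate substitutions 50 times faster per year, so a clock that ticks at a roughly constant rate per year across such species is hard to reconcile with Equation (1) alone.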


Further Reading 

Kimura M (1983) The Neutral Theory of Molecular Evolution. Cambridge: Cambridge University Press.

Sawyer SA, Dykhuizen DE and Hartl DL (1987) A confidence interval for the number of selectively neutral amino acid polymorphisms. Proceedings of the National Academy of Sciences, USA 84: 6225-6228.


See also: Fixation Probability; Gene Substitution; 
Molecular Clock; Nearly Neutral Theory; 
Selective Neutrality 
A nick is a point in a double-stranded DNA molecule
where there is no phosphodiester bond between adja- 
cent nucleotides of one strand, which typically arises 
through damage or enzyme action. 


See also: Nick Translation 
Nick translation is the process of replacing a DNA strand in a double-stranded DNA molecule; this is carried out by DNA polymerase. The polymerase initiates its
action at a nick in a DNA strand. It displaces one 
DNA strand and digests that strand with its 5’ to 3’ 
exonuclease activity and coordinately polymerizes a 
DNA strand by using the undigested strand as a
template. The polymerase initiates synthesis at the 3’
hydroxyl terminus located at the position of the nick.
As a result of coordinate action of the exonuclease and 
polymerase activities the position of the nick trans- 
lates or moves along the DNA molecule as the DNA is 
digested and synthesized. Here the term translation 
should not be confused with the translation of mRNA 
into protein. Escherichia coli DNA polymerase I can 
carry out nick translation because it contains a 5’ to 
3’ exonuclease activity, while most DNA polymer- 
ases do not and so are incapable of performing nick 
translation. 

Molecular biologists use the nick translating 
activity of E. coli DNA polymerase I in vitro to in- 
corporate radioactivity into DNA. DNA that has 
undergone nick translation maintains its integrity. In 
vitro the process is started by the addition of a nonspecific
DNase, which introduces nicks along
the DNA molecule. Then E. coli DNA polymerase I 
and all four deoxynucleotide triphosphates are added. 
To incorporate radioactive label into the DNA efficiently, generally only one of the deoxynucleotide triphosphates is radioactive and is added
at a much lower concentration than the other three 
unlabeled triphosphates. The reaction
is stopped and the nick-translated DNA is separated
from the unincorporated nucleotides. Radioactively 
labeled DNA probes are often made by this process 
and are used in many different analytical methods for 
the detection of DNA that specifically hybridizes to 
the probe. 
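The coordinated excision and resynthesis described above can be illustrated with a toy string model. It assumes an idealized polymerase that removes and replaces exactly one nucleotide per step, and a single labeled dNTP (dCTP here, an arbitrary choice); the template is written 3' to 5' so that list indices run 5' to 3' along the strand being resynthesized. The function name and sequence are hypothetical.

```python
# Watson-Crick pairing used to synthesize the new strand from the template
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def nick_translate(template: str, nick: int, steps: int,
                   labeled_base: str = "C") -> tuple[str, list[int], int]:
    """Toy model of nick translation by E. coli DNA polymerase I.

    The nicked strand starts as the exact complement of `template`, with a
    nick just before index `nick`. Each step excises the old nucleotide
    downstream of the nick (5'->3' exonuclease) and adds its replacement at
    the 3'-OH (polymerase), so the nick moves one position.

    Returns (new_strand, labeled_positions, final_nick_position).
    """
    strand = [COMPLEMENT[b] for b in template]  # old strand before the reaction
    labeled = [False] * len(template)
    pos = nick
    for _ in range(steps):
        if pos >= len(template):
            break  # the polymerase has reached the end of the molecule
        strand[pos] = COMPLEMENT[template[pos]]     # old nucleotide replaced by a new one
        labeled[pos] = strand[pos] == labeled_base  # only one dNTP carries the label
        pos += 1                                    # the nick moves downstream
    return "".join(strand), [i for i, flag in enumerate(labeled) if flag], pos


if __name__ == "__main__":
    template = "ATGCGCTTAG"  # hypothetical template sequence
    print(nick_translate(template, nick=2, steps=5))
```

The returned sequence is identical to the starting strand, mirroring the point that nick-translated DNA keeps its integrity; only the label positions and the location of the nick change.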


See also: DNA Hybridization 
The biological conversion of atmospheric dinitrogen
to ammonia is known as nitrogen fixation. From the 
energetic perspective, nitrogen fixation is very costly; 
each mole of nitrogen gas reduced to ammonia re- 
quires an input of 16 mol of ATP and 8 high-potential 
electrons. The ability to fix nitrogen is limited to prokaryotes, among them the
cyanobacteria and members of the bacterial genera
Klebsiella, Azotobacter, Rhizobium, and Azorhizo- 
bium. The latter two genera live symbiotically on the 
roots or stems of (usually leguminous) plants. 
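The energetic cost quoted above scales linearly with the amount of nitrogen fixed. The sketch below uses only the stoichiometry given in the text (16 mol ATP and 8 mol high-potential electrons per mol N2, yielding 2 mol NH3) plus the standard molar mass of N2; the function name is illustrative.

```python
# Stoichiometry from the text: 16 mol ATP and 8 mol high-potential electrons per mol N2.
ATP_PER_N2 = 16
ELECTRONS_PER_N2 = 8
NH3_PER_N2 = 2           # each N2 yields two NH3
N2_MOLAR_MASS = 28.014   # g per mol of N2


def fixation_cost(grams_nitrogen: float) -> dict[str, float]:
    """Moles of N2 reduced, NH3 formed, and ATP and electrons consumed for a mass of fixed N."""
    mol_n2 = grams_nitrogen / N2_MOLAR_MASS
    return {
        "mol_N2": mol_n2,
        "mol_NH3": mol_n2 * NH3_PER_N2,
        "mol_ATP": mol_n2 * ATP_PER_N2,
        "mol_electrons": mol_n2 * ELECTRONS_PER_N2,
    }


if __name__ == "__main__":
    for name, value in fixation_cost(1.0).items():  # one gram of fixed nitrogen
        print(f"{name:>14}: {value:.4f}")
    print("ATP per NH3:", ATP_PER_N2 / NH3_PER_N2)
```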

Nitrogen fixation requires the protein products of 
many genes. In Klebsiella, the nif gene cluster occupies 
24 kb of DNA and consists of 17 contiguous genes
organized in seven operons. The nif genes are posi- 
tively regulated in response to nitrogen limitation and 
negatively regulated in response to oxygen and the 
presence of fixed nitrogen. 

The key enzyme of nitrogen fixation is nitrogenase, 
a multisubunit, oxygen-sensitive protein encoded by the nifD, nifK, and nifH genes. Essential to catalysis by nitrogenase is an iron-molybdenum
cofactor whose assembly is mediated by the products 
of nifB, nifN, and nifE. Other nif genes play critical 
roles in transporting electrons into nitrogenase, in the 
maturation of the initial translation product of the 
nifH gene, regulation of nif gene transcription, and 
the sensing of oxygen. 


Further Reading 

Gussin GN, Ronson CW and Ausubel FM (1986) Regulation 
of nitrogen fixation genes. Annual Review of Genetics 20: 
567-591. 


See also: Bacteria 
Perhaps no other discovery of the twentieth century
was as central to biology as the elucidation of the 
genetic code. The code, a language written into the hereditary chemical DNA, provides the very instructions that result in all forms of life on earth. Genetic
instructions are inscribed in a class of chemicals called 
nucleic acids, generally in the form of DNA. These 
instructions direct the synthesis of proteins, relatively 
large molecules that give form and action to all forms 
of life. The genetic code thus provides the basis for 
‘translating’ nucleic acid instructions into the func- 
tional building blocks of life. Moreover, the genetic 
code is a general solution of the translation problem, 
as valid for simple bacteria and even simpler viruses as 
it is for human beings and giant sequoias. 

Marshall Warren Nirenberg (1927- ), a young bio- 
chemist working with a small group of colleagues at 
the National Institutes of Health near Washington 
DC, solved the coding problem in the 1960s. His 
work proceeded in two electrifying stages. The first 
was carried out with Heinrich Matthaei, a German 
agricultural biologist. It depended on the development of a cell-free protein-synthesizing system to which synthetic mRNAs could be added. DNA is built from four nucleic acid bases (A, T, G, C), and its copy, messenger RNA, from four corresponding bases (A, U, G, C). With this system, Nirenberg and Matthaei and their colleagues were able to
show that the amino acid phenylalanine was encoded 
by some combinations of uridylic acid residues (Us). 
Other combinations of nucleic acid bases yielded other amino acids; for example, valine was coded by a ratio of two Us and one G. The order and length of a code word were not determined; the assignments were purely compositional.

The second step, developed with Philip Leder, a 
young physician at the National Institutes of Health, 
involved a simple binding assay in which each amino 
acid attached to its cognate tRNA could be tested 
against a specific codon. Using this assay, they showed 
that the code was composed of three bases. Each 
‘word’ was a triplet, and using all 64 triplet combinations of the four nucleic acid bases, they and their colleagues assigned a ‘meaning’ to each of the 64 codewords (61 specifying amino acids and three serving as stop signals), thus fully elucidating the genetic code.
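The result of this work, a meaning for each of the 4 × 4 × 4 = 64 triplets, can be written compactly as a lookup table. The sketch below assumes the standard genetic code as tabulated today (one-letter amino acid symbols, '*' for stop codons, codons ordered with the first base varying slowest over U, C, A, G); it illustrates the finished code, not Nirenberg's experimental method, and the names are illustrative.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code: amino acids for UUU, UUC, UUA, UUG, UCU, ... GGG; '*' = stop
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(mrna: str) -> str:
    """Read an mRNA from its first base, one triplet at a time, until a stop codon."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        aa = CODON_TABLE[mrna[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


if __name__ == "__main__":
    print(len(CODON_TABLE))           # 64
    print(translate("UUUUUUUUUUUU"))  # FFFF (poly(U) read as phenylalanine codons)
    print(translate("AUGUUUGUUUAA"))  # MFV
```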

Nirenberg shared the 1968 Nobel Prize in Physiology or Medicine with Robert W. Holley, then at Cornell University, and Har Gobind Khorana, then at the University of Wisconsin, for work related to solving the
code. Their work had opened an enormous window 
on the most fundamental aspects of genetics and biol- 
ogy. For Nirenberg, a native New Yorker raised in 
Florida, and first attracted to science as a young 
naturalist trapping snakes and assorted insects in the 
swamps near Orlando, it was an irony. Observation of 
nature in its most integrated form, the ecology of a 
Florida swamp, had led him to the most fundamental, 
reductionist biological discovery of our age. That 
scientific odyssey had led from the University of 
Florida to graduate school in Biochemistry at the 
University of Michigan, to the National Institutes of 
Health, which has been Nirenberg’s scientific home 
for over 40 years. 

After the code work, Nirenberg pioneered studies 
of the brain — the most complex and fascinating of all 
organ systems. 


See also: Amino Acids; Genetic Code 
The rhizobial nodulation (nod) genes (see Nodulation
Genes) are induced by plant produced flavonoid com- 
pounds in a process that requires the NodD activator 
protein, a member of the LysR family of transcrip- 
tional regulatory proteins. The inducible nod genes 
are arranged in operons that are preceded by a well- 
conserved DNA sequence that has been termed the 
‘nod-box.’ These nod-box sequences are found -26 to
-76 bp upstream of the nod operon transcriptional
start sites. The nod-box promoter is essential for the 
activation of nod gene expression. Gel retardation 
experiments and DNA footprinting have shown that 
the nod-box is the site of NodD binding. 

As originally described, the nod-box was a conserved 47 bp region found 26 bp upstream of the transcriptional start site of the Sinorhizobium meliloti nodABC operon. This 47 bp region was subdivided into
highly conserved regions of 5, 7, and 25 bp. DNA
footprint analysis of S. meliloti NodD1 and NodD3
proteins binding to nod promoters showed that 
approximately 50 bp were protected. The protected
sequence was approximately from -20 to -75 bp upstream of the transcriptional start site, overlapping the
nod-box sequence. The extensive region of DNA pro- 
tected by NodD binding is surprising since the NodD 
protein is approximately 35 kDa. These results suggested that NodD binds to the DNA as a multimer.
The known native molecular weights of other LysR- 
type proteins indicate that TrpI, CysB, and NahR 
likely exist as tetramers, while MetR, CatR, IlvY, 
IciA, and NodD3 are likely dimers. The N-terminal
portion of NodD has been suggested to play a role in 
multimerization. 

Initial studies showed that NodD bound to the 
nod-box with equal affinity both in the presence and 
absence of the flavonoid inducer. However, studies in 
Azorhizobium caulinodans showed that the NodD of 
this organism had a higher binding affinity for the nod-box in the presence of the inducer. S. meliloti strain AK631 possesses a repressor, NolR, which binds to the nod promoter and inhibits NodD binding. Addition of the inducer resulted in displacement of NolR and binding of NodD. However, there are no in vitro data showing that the inducer interacts directly with NodD; such an interaction is assumed because changes in the primary sequence of NodD can alter inducer specificity.

Analysis of the promoters of nod operons from a 
variety of rhizobial species revealed nod-box sequences that diverged significantly from the 47 bp sequence originally identified. For example, the B. japonicum nodD1 promoter possesses a nod-box that matches the consensus only in its most 3’ region. This led to the suggestion that the nod-box actually is composed of a series of 9 bp repeats. Four 9 bp repeats are found in most nod-boxes, but the divergent nod-boxes contain only two 9 bp repeats. This model suggested that NodD binds to the promoter either as a tetramer, contacting four repeats, or as a dimer, contacting two repeats. However, comparison of a larger number of nod-box sequences revealed considerable variation among the 9 bp repeat sequences. Alternatively, the nod-box was proposed to consist of two inverted repeats with the sequence ATC-N9-GAT, found in several nod-box sequences but lacking in the divergent nod-boxes (e.g., 5’ of the Bradyrhizobium japonicum nodD1 gene). Both the 9 bp repeat model and the inverted repeat model possess the T-N11-A motif that has been proposed as a general feature of the DNA binding sites for LysR-type proteins.
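Both sequence models above can be written as regular expressions and scanned along a sequence. The sketch below searches one strand only, reports overlapping matches via a lookahead, and uses a made-up test sequence rather than a real promoter; the pattern and function names are illustrative. It also shows why the two models share the LysR motif: any ATC-N9-GAT site contains a T-N11-A spacing.

```python
import re

# T-N11-A: proposed general feature of LysR-type binding sites
LYSR_MOTIF = re.compile(r"(?=(T[ACGT]{11}A))")
# ATC-N9-GAT: inverted-repeat model of the nod-box
NODBOX_INVERTED_REPEAT = re.compile(r"(?=(ATC[ACGT]{9}GAT))")


def find_motif(seq: str, pattern: re.Pattern) -> list[int]:
    """0-based start positions of (possibly overlapping) matches on the given strand."""
    return [m.start() for m in pattern.finditer(seq.upper())]


if __name__ == "__main__":
    seq = "gg" + "ATC" + "G" * 9 + "GAT" + "cc"  # hypothetical sequence
    print(find_motif(seq, NODBOX_INVERTED_REPEAT))
    print(find_motif(seq, LYSR_MOTIF))
```

A scan like this would miss the divergent nod-boxes that lack ATC-N9-GAT, which is one reason neither consensus model fully captures nod-box variation.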

Interference DNA footprinting was used to study the binding of the S. meliloti NodD3 protein to three different nod-box promoters. NodD3 is somewhat unusual since it activates transcription independently of the presence of host-produced flavonoids. These studies showed that NodD3 binds to two regions of the nod-box located on the same face of the DNA helix. The insertion of 4 bp between these two binding regions disrupted NodD3 binding. Such an insertion would rotate the two binding sites by roughly half a turn of the helix relative to each other. An insertion of 10 bp, resulting in a nearly full-turn rotation of the helix, had little effect on NodD3 binding. Thus, interaction of NodD3 with the nod-box requires that the two contact points be located on the same face of the helix. The affinity of NodD3 binding to the wild-type nod-box was determined (Kd = 1.8 x 107° M).
Other experiments revealed that NodD3 binding
resulted in the formation of a bend in the DNA that 
likely plays a role in transcription initiation. The 
induction of such a bend in the promoter region is 
thought to be a general feature of transcriptional 
activation by LysR family members. 
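The helical-phasing argument can be checked with simple arithmetic. The sketch below assumes a B-DNA helical repeat of 10.5 bp per turn (a typical value, not stated in the text) and computes how far an insertion rotates one binding site relative to the other; the function names are illustrative.

```python
BP_PER_TURN = 10.5  # assumed B-DNA helical repeat (bp per turn); not given in the text


def rotation_degrees(inserted_bp: int, bp_per_turn: float = BP_PER_TURN) -> float:
    """Rotation of one binding site relative to the other caused by an insertion."""
    return inserted_bp * 360.0 / bp_per_turn


def angular_offset(inserted_bp: int, bp_per_turn: float = BP_PER_TURN) -> float:
    """Smallest angle (0 to 180 degrees) between the faces presented by the two sites."""
    angle = rotation_degrees(inserted_bp, bp_per_turn) % 360.0
    return min(angle, 360.0 - angle)


if __name__ == "__main__":
    for bp in (0, 4, 10):
        print(f"{bp:>2} bp inserted: rotation {rotation_degrees(bp):6.1f} deg, "
              f"offset from the original face {angular_offset(bp):5.1f} deg")
```

Under this assumption a 4 bp insertion turns one site about 137 degrees away, onto a different face of the helix, whereas 10 bp brings it back to within about 17 degrees of its original face, consistent with the binding results described above.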


See also: Nod Factors; Nodulation Genes; 
Nodulins 
Nod factors are lipochito-oligosaccharide molecules,
excreted by rhizobia (family Rhizobiaceae), which are
major determinants of host range and nodulation in 
the symbioses between these soil bacteria and legume 
plants. 


Signaling and Host Range in Rhizobium- 
Legume Symbioses 


Rhizobium-legume symbioses are of great ecological 
and agronomic importance, due to their ability to fix 
large amounts of atmospheric nitrogen. These sym- 
bioses result in the formation on legume roots of 
differentiated organs called nodules, in which the bacteria reduce nitrogen to ammonia used by the host
plant. Infection of legumes by rhizobia generally 
involves the curling of root hairs, formation of in- 
fection threads within root hairs and the root cortex, 
and induction of a meristem in the inner root 
cortex, giving rise to the nodule.

Figure 1 Generalized Nod factor structure.

An important feature of rhizobium-legume symbioses is their specificity:
each rhizobium has a defined host range varying 
from a few legume genera to more than a hundred. 
For example, Sinorhizobium meliloti nodulates only
Medicago, Melilotus, and Trigonella species, while 
Rhizobium sp. NGR234 nodulates plants in more 
than 110 legume genera. Genetic analysis of nodulation 
in several rhizobium species has identified a number of 
nodulation (nod) genes which specify host range, infec- 
tion, and nodule formation. Some of these genes such as
nodD and nodABC are present in all rhizobia, while 
others, called host-specific nod genes, are found in 
various combinations in the different rhizobium spe- 
cies. The nod genes control an exchange of signals 
between the rhizobium and its host plant. The regula- 
tory nodD genes, in the presence of flavonoid plant 
signals, activate the expression of the other (structural) 
nod genes, which are involved in the synthesis and 
excretion of extracellular signals, called Nod factors, 
which are specifically active on host plants. 


Structure and Biosynthesis of Nod 
Factors 


The structure of Nod factors produced by a number 
of rhizobia has been determined. In all cases they 
are lipochito-oligosaccharides made of a backbone 
of three to five N-acetylglucosamine residues, N-acylated at the non-reducing end (Figure 1). In addition, chemical groups such as sulfate, fucose, and acetate, which vary according to the rhizobial strain, can be attached to the oligosaccharide backbone (Table 1).
While the common nodABC genes determine the 
synthesis of the lipochito-oligosaccharide core com- 
mon to all Nod factors, the various substitutions are 
encoded by the host-specific nod genes (Figure 2). 
These substitutions confer on Nod factors their specificity toward the legume host plants. For example, the sulfate on S. meliloti Nod factors is required for nodulation of Medicago plants.
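The division of labor between common and host-specific nod genes can be pictured as a small lookup. The sketch below is a deliberately simplified, hypothetical model: it maps a few decorations to the genes credited with them in the Figure 2 caption and the table of nod gene functions, ignores sulfate activation by NodPQ, and encodes only the one host rule stated in the text (Medicago requires the sulfate), as a necessary rather than sufficient condition. All names are illustrative.

```python
# Illustrative subset: decoration -> nod gene credited with adding it
DECORATION_GENE = {
    "sulfate": "nodH",    # sulfotransferase (sulfate activation by NodPQ ignored here)
    "acetate": "nodL",    # acetyltransferase
    "fucose": "nodZ",     # fucosyltransferase
    "methyl": "nodS",     # methyltransferase
    "carbamoyl": "nodU",  # carbamoyltransferase
}
CORE_GENES = {"nodA", "nodB", "nodC"}  # common genes for the lipochito-oligosaccharide core


def predicted_decorations(nod_genes: set[str]) -> set[str]:
    """Decorations expected on the Nod factor of a strain carrying `nod_genes`."""
    missing = CORE_GENES - nod_genes
    if missing:
        raise ValueError(f"no Nod factor core without {sorted(missing)}")
    return {deco for deco, gene in DECORATION_GENE.items() if gene in nod_genes}


def meets_medicago_sulfate_rule(nod_genes: set[str]) -> bool:
    """Necessary (not sufficient) condition from the text: Medicago requires the sulfate."""
    return "sulfate" in predicted_decorations(nod_genes)


if __name__ == "__main__":
    strain = {"nodA", "nodB", "nodC", "nodH", "nodL"}  # hypothetical gene set
    print(sorted(predicted_decorations(strain)), meets_medicago_sulfate_rule(strain))
```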


Figure 2 Role of Sinorhizobium meliloti Nod proteins in Nod factor biosynthesis. NodA: acyltransferase, NodB: N-deacetylase, NodC: glucosaminyltransferase, NodE: keto-acylsynthase, NodF: acyl carrier protein, NodH: sulfotransferase, NodL: acetyltransferase, NodPQ: ATP sulfurylase and APS kinase.

Biological Activity of Nod Factors

Purified Nod factors, at very low concentrations (down to $10^{-12}$ mol l$^{-1}$), induce on the roots of host plants a number of developmental responses which
are similar to those induced by rhizobial cells: root 
hair deformation, division of cortical cells, and forma- 
tion of nodule primordia. At cellular and molecular 
levels, several responses to Nod factors have been 
characterized, such as ion fluxes, reorganization of 
the cytoskeleton, and induction of nodulin gene 
expression. Elicitation of some of these responses 
requires only very low concentrations of Nod factors 
and is highly dependent on Nod factor structure, 
which suggests that high-affinity receptors are 
involved in Nod factor perception. The molecular 
mechanisms which allow perception and transduction 
of the Nod factor signal to the different root cell layers 
are currently under study by a variety of genetic, bio- 
chemical, and pharmacological approaches. 


See also: Nodulation Genes; Root Development, 
Genetics of; Symbionts, Genetics of 
Gram-negative soil bacteria of the family Rhizo-
biaceae (e.g., Allorhizobium, Azorhizobium, Bradyrhi- 
zobium, Mesorhizobium, Rhizobium, Sinorhizobium) 
have the ability to infect and establish a nitrogen-fixing 
symbiosis in the roots of specific legume species. The 
bacterial genes that are important to this infection 
process are termed ‘nodulation genes.’ These genes
are distinguished from other symbiotically relevant
genes that might control features such as nitrogen 
fixation, electron transport, nutrient uptake, etc. The 
nodulation (nod) genes were first identified by their 
ability to complement mutants of Sinorhizobium 
meliloti, a symbiont of alfalfa, that were completely 
defective in their ability to nodulate (i.e., Nod− mutants). Operationally (see below), nodulation
genes have been named based on either their ability 
to affect the synthesis of the lipo-chitin Nod signal or 
their coregulation with such genes (e.g., within an 
operon controlled by NodD). However, exceptions 
to this definition do exist in the literature. Indeed, an 
all-inclusive definition would have to include all genes 
that, when mutated, affect the nodulation response. 
Such a definition is too broad to be of practical use and 
would not have widespread support in the research 
community. 

The nodulation genes can be grouped into two 
general classes (Table 1). The first class involves 
genes whose protein products biosynthesize, modify, 
or transport the lipo-chitin nodulation signal (see Nod 
factor). The lipo-chitin Nod signal is essential for 
nodulation and is the bacterial signal that triggers 
de novo organogenesis of the root nodule, which is 
intracellularly colonized by the bacterial symbiont. 
Core synthesis of the Nod signal involves the pro- 
ducts of the nodABCMFE genes. The products of 
the nodIJ genes have been implicated in transport of 
the Nod signal to the exterior of the bacterial cell. 
NodT is a bacterial outer membrane protein. NodO 
is excreted and probably acts by inserting itself into 
the plant membrane. Some of the nod genes have 
counterparts involved in normal bacterial metabolism, 
e.g., nodM, encoding glucosamine synthase, which is a homolog of glmS; of the two, only nodM is coregulated with the other nodulation genes. The other nodulation
genes in this first class carry out a variety of biochem- 
ical reactions that modify the chemistry of the core 
Nod signal structure. These chemical modifications 
are important since they determine the host specificity 
of the signal. It should be stressed that not all of the 
nod genes listed in Table 1 are found in a single
rhizobium. The specific complement of genes in an 
organism helps determine its host range. 

The second general class of nodulation genes in- 
cludes those that act to regulate transcription of the 
nod regulon. The first such gene to be identified was 
nodD. The NodD protein, a member of the LysR family of transcriptional regulatory proteins, binds to a conserved promoter element (see Nod-box) 5’ of the various nod operons and activates transcription in the presence of a plant-produced signal.
This signal is different for each legume species. With a 
few exceptions, all nod gene inducers are members of 
the flavonoid family of secondary plant products. 
Specific examples include luteolin (induces S. meli- 
loti), genistein/daidzein (induce Bradyrhizobium
japonicum), and naringenin (induces Rhizobium 
leguminosarum bv. viciae). There are no direct bio- 
chemical data that show that the flavonoid inducer 
interacts directly with NodD. However, different 
NodD proteins differ in their specificity and muta- 
tions of a single nodD gene can result in protein pro- 
ducts that vary in their flavonoid specificity. Hence, the 
primary structure of the NodD protein appears to 
dictate specificity and it is assumed that this is due to a 
direct interaction with the inducer. Different rhizobial 
species can have from one to three or more nodD genes or homologs (e.g., syrM). A full explanation for the need for such redundancy is not available. However, it is generally thought that the different NodD proteins produced by a given rhizobium recognize a different
repertoire of flavonoid inducers and allow the bacteria 
to infect a larger variety of plant hosts. 

Rhizobium leguminosarum bv. viciae provides
perhaps the simplest regulatory model for nod gene 
expression, where a single nodD gene is present. 
Mutations of this gene result in the complete loss of 
nodulation ability. However, nod gene expression in 
most other rhizobia is controlled in a much more 
complex way. An organism such as S. meliloti, with 
three nodD genes and syrM, has added complexity due 
to the interaction of these proteins and their response 
to a variety of flavonoid inducers. Regulation of nod 
expression in S. meliloti is also under negative control 
mediated by a repressor encoded by the nolR gene. 

Table 1 Proposed functions of the known nodulation (nod, nol, noe) genes

Gene           Proposed function

Regulatory genes
nodD           Transcriptional activator
nodD2, nodD3   Transcriptional regulator
nodV           Two-component regulator
nodW           Two-component regulator
nolA           Transcriptional regulator
nolR           Transcriptional repressor
syrM           Transcriptional regulator

Nod signal core synthesis
nodA           Acyltransferase
nodB           Deacetylase
nodC           Chitin synthase
nodM           D-glucosamine synthase
nodE           β-Ketoacyl synthase
nodF           Acyl carrier protein

Nod signal modifications
nodG           3-Oxoacyl-[acyl carrier protein] reductase
nodH           Sulfotransferase
nodL           Acetyltransferase
nodS           Methyltransferase
nodU           Carbamoyltransferase
nodP           ATP-sulfurylase subunit
nodQ           ATP-sulfurylase subunit/APS kinase
nodX           Acetyltransferase
nodZ           Fucosyltransferase
nolK           NAD-dependent sugar epimerase
nolL           O-acetyltransferase activity
nolO           Carbamoyltransferase
nolIXWBTUV     Cultivar-specific nodulation
nolYZ          Unknown
noeC           Arabinosylation
noeD           Genotype-specific nodulation
noeE           Sulfotransferase
noeI           2-O-methylation
noeJ           Phosphate guanyltransferase
noeK           Phosphomannomutase
noeL           Dehydratase

Nod signal transport
nodI           ATP-binding protein
nodJ           Integral membrane protein
nodT           Outer membrane protein
nodO           Calcium-binding, pore-forming protein

Perhaps the most complex regulatory scheme for control of nodulation gene transcription is found in the bacterium B. japonicum, which possesses two nodD genes. One of these, nodD1, is required for nod gene induction in the presence of the isoflavone inducer (e.g., genistein). The product of the second nodD gene, nodD2, appears to repress nod gene
expression. The mechanism for this repression is 
unknown. It is possible, for example, that NodD2 binds to the promoter, blocking access by NodD1, or, alternatively, that it forms an inactive heteromer with NodD1. B. japonicum appears to be unique since it
also possesses a second system for recognition of the 
isoflavone plant inducer. This system, NodV and 
NodW, shows similarity to the well-characterized 
two-component regulatory systems found in a variety 
of bacteria, as well as yeast and plants. The addition of 
genistein has been shown to cause the autophosphoryl- 
ation of NodV with the subsequent transfer of this 
phosphate to a conserved aspartate (D70) residue on 
NodW. A mutant NodW, in which D70 was converted 
to an asparagine, could not be phosphorylated by 
NodV and was unable to activate nod gene transcrip- 
tion. Unlike NodD, there is no evidence that NodW 
can directly interact with the nod promoter. Mutations 
in either nodD, or nodVW do not result in a complete 
loss of nodulation on soybean. However, a double 
mutant, nodD,nodW, is completely defective for 
nodulation. The model proposed suggests that 
NodD, and NodW are individually dispensable for 
nodulation of soybean, but are required for nodula- 
tion of other B. japonicum hosts. 

As in S. meliloti, nod gene expression in B. japonicum is also under negative control. This control is mediated by NolA, which acts in a way similar to the MerR family of transcriptional regulatory proteins. NolA is required for expression of NodD2 and probably exerts its repressive effects through NodD2. The unique feature of nolA is that it encodes three distinct polypeptides. The longest, NolA1, contains a helix-turn-helix DNA-binding motif in its N-terminus and is therefore probably the transcriptional regulator. Genetic studies support this view. For example, NolA1 is required for transcription of the two shorter nolA peptides, acting through one of the two promoters that control their expression.

The surprising complexity of nod gene regulation 
probably reflects the importance of this process to the 
physiology and ecology of the rhizobia. It is clear that 
such regulation plays a critical role in host range deter- 
mination. Moreover, fine control of the production of 
the potent lipo-chitin Nod signal is essential for estab- 
lishment and maintenance of the symbiotic state. 

Although nod gene research has focused on the lipo-chitin signals, there is some evidence that nod gene products affect other processes in the cell. For example, in S. fredii, Rhizobium NGR234, and B. japonicum, mutations in nodD1 can affect cellular polysaccharide synthesis. Work in both S. fredii and Rhizobium NGR234 has implicated a Type III secretion system in the export of proteins that affect the nodulation response. There have also been reports that mutations in genes that affect lipo-chitin Nod signal modification can change the chemistry of other cellular components. For example, mutations in nodH have been reported to affect sulfation of the lipopolysaccharide. The importance of these secondary effects is unknown.


See also: Nod Factors; Plant Growth Promoting 
Rhizobacteria (PGPR); Rhizobium 
Nodulins are the products of genes expressed at elevated levels in nodules or during angiosperm root nodule development. Nodulins are studied functionally to understand the molecular biology of nodule development and symbiotic nitrogen fixation. Monitoring the transcriptional activity of nodulin gene promoters is a useful way to analyze signal transduction cascades associated either with plant-microbe interactions or with plant cell morphogenesis.

Root nodules are fully differentiated plant organs 
that harbor procaryotic nitrogen-fixing endosym- 
bionts. The capacity to establish a root nodule sym- 
biosis is confined to a single group of higher plants, the 
Rosid clade I. Within this clade, six tribes have 
evolved the capacity to nodulate with Frankia (a 
Gram-positive endosymbiont) while, within the Legu- 
minosae (Fabaceae), symbiosis is established exclu- 
sively with members of the Rhizobium superfamily 
(which are all Gram-negative endosymbionts). The study of nodulins has mainly involved crop legumes (e.g., soybean, peas, beans, and alfalfa). In future, it will focus increasingly on two ‘model legumes’, Medicago truncatula and Lotus japonicus, which are more amenable to genetic and molecular analysis.

When first used in 1980, the term ‘nodulin’ 
described the products of genes that were expressed 
exclusively in nodules, but currently the term is taken 
to include nodule-enhanced gene products. In some 
cases nodulin genes may be essential for nodule de- 
velopment, but in other cases they may function co- 
operatively to enhance the process. Some legume 
nodulins have counterparts in non-nodulating 
plants, e.g., rice, and some may have corresponding 
roles in the development of the Frankia root nodule
symbiosis. Perhaps more surprisingly, several nodulins
are expressed during the development of the arbuscular 
mycorrhizal symbiosis in legumes. In evolutionary 
terms, symbiosis with mycorrhizal Glomales spp. pre- 
dates the origin of root nodule symbiosis by several 
hundred million years. However, legume mutants 
have been found that are defective both in nodule 
initiation and in the initiation of the mycorrhizal 
symbiosis. Phenotypic analysis suggests the existence 
of common developmental processes, some operating 
early and others operating at later stages in the develop- 
ment of the symbiotic interface. 

The operational definition of a nodulin has changed 
with changing techniques of plant molecular biology. 
Originally, nodulins were identified serologically 
using a tissue-specific antiserum that had been preadsorbed against an excess of root antigens. More recently, this approach has been superseded by a ‘proteomics’
approach (designed, for example, to identify all the 
proteins of fractionated nodule membranes) or by 
the use of monoclonal antibodies to identify and 
purify plant proteins or glycoconjugates that are 
expressed at a particular time or at a particular place 
during nodule development. Differential expression 
of mRNAs in nodules has been analyzed through 
subtractive hybridization procedures coupled to 
cDNA cloning, or by the use of differential-display 
PCR protocols. Increasingly, however, nodulin 
sequences are isolated from enormous libraries of 
‘expressed sequence tags’ derived from nodule 
mRNA. (These are screened to identify clones giving 
nodule-enhanced gene expression and subjected to 
random DNA sequencing to find homology matches 
for the gene products identified.) Finally, there is the 
‘promoter-trapping’ approach whereby gene expres- 
sion from tissue-specific promoters can be identified 
following random insertional mutagenesis with a pro- 
moterless reporter gene (e.g., glucuronidase). 

Nodulins can be classified according to their time 
of expression, their site of expression, or according to 
their biochemical function. A simple distinction is 
between ‘early nodulins’ (referred to as ENODs) 
expressed prior to the onset of nitrogen fixation and 
‘late nodulins’ (expressed synchronously with or later 
than the onset of nitrogen fixation). Some early nodu- 
lins are induced simply by application of the Rhizo- 
bium-derived lipochitin oligosaccharide signal 
molecule. Among these are ENOD2 (a cell wall pro- 
tein expressed in the outer uninfected tissues of the 
nodule); ENOD40 (encoding a regulatory RNA and 
an oligopeptide that is apparently involved in cell 
cycle activation); and ENOD12 (a proline-rich 
cell wall protein involved in the process of tissue and 
cell invasion by the microsymbiont). Among the late 
nodulins are components that adapt the physiology of 
the nodule for its specialized role in biological nitrogen fixation. Examples include (leg)hemoglobin
(involved with facilitated oxygen diffusion in the 
host cell cytoplasm), sucrose synthase (involved with 
carbon metabolism), glutamine synthetase (involved 
with assimilation of ammonia, the product of nitrogen 
fixation), and nodulin-26 (a membrane channel 
protein with homology to aquaporins, involved in 
regulating the microenvironment of endosymbiotic
rhizobia).


See also: Nodulation Genes; Rhizobium; 
Symbionts, Genetics of 
Genetic description and experimentation involve
the analysis of genes, alleles, transcripts, proteins, 
genotypes, phenotypes, and strains. Systematic nomen- 
clature for the objects within each of these sets is 
highly desirable, in order to achieve unambiguous 
communication and annotation, as well as efficient 
storage and documentation. This need has become 
ever greater, as the amount of knowledge has ex- 
panded, along with the ability to handle information 
digitally. Geneticists working on a variety of different 
organisms have proposed systematic rules or recom- 
mendations for genetic nomenclature for each system. 
Unfortunately, for reasons of history, practicality and 
accident, the recommendations are not uniform be- 
tween different systems. 

To illustrate this, Table I lists how eight different 
organism nomenclature systems would deal with a 
hypothetical gene (named hypothetical one), its protein product, the wild-type allele, a mutant allele, and
the mutant phenotype caused by the allele. None of 
these systems uses completely identical notation. 
Some systems, designed to be readily parsed by com- 
puter, adhere to strict three-letter formats and avoid 
superscripts (bacteria, Caenorhabditis elegans); others 
are more flexible in length of gene name (Drosophila, 
vertebrates). One general rule is that genes and their 
alleles are always written in italic script, and proteins 
and phenotypes are never italicized. 

The words, abbreviated words or acronyms that are 
used as the core of gene names have two main sources. 
First, genes can be named after a mutant phenotype, in 
cases such as w (white, white-eyed) in Drosophila, unc 
(uncoordinated) in C. elegans, and shaker in the 
Table I Nomenclature systems for eight different organisms for a hypothetical gene, its protein product, the wild-type allele, a mutant allele, and the mutant phenotype caused by the allele (superscripts are shown in angle brackets)

Organism  Gene      Protein  Wild-type  Allele      Phenotype
E. coli   hypA      HypA     +          hypA1       Hyp−
S. cer.   HYP1      Hyp1p    HYP1       hyp1-1      Hyp
S. pom.   hyp1<+>   Hyp1p    hyp1<+>    hyp1-x1     hyp−
A. tha.   HYP1      HYP1     HYP1       hyp1-1      Hyp
C. ele.   hyp-1     HYP1     hyp-1(+)   hyp-1(x1)   Hyp
D. mel.   hyp1      HYP1     hyp1<+>    hyp1<x1>    hyp
M. mus.   Hyp1      HYP1     -          Hyp1<x1>    hyp
H. sap.   HYP1      HYP1     -          HYP1*X1     HYP


Organisms: E. coli (bacterium), S. cer. (Saccharomyces cerevisiae, budding yeast), S. pom. (Schizosaccharomyces pombe, fission 
yeast), A. tha. (Arabidopsis thaliana, plant), C. ele. (Caenorhabditis elegans, nematode), D. mel. (Drosophila melanogaster, fruitfly), 
M. mus. (Mus musculus, mouse), H. sap. (Homo sapiens, human). 


mouse. Second, genes can be named after the biochem- 
ical product, in cases such as adh (alcohol dehydro- 
genase) in Drosophila and rrn (ribosomal RNA) in 
C. elegans. Originally almost all gene naming was 
based on mutant phenotypes, but with the advent of 
molecular cloning and genome sequencing, the bal- 
ance has shifted over to naming on the basis of pre- 
dicted gene product. This has the advantage that it 
is easier to perceive the homologous relationships 
between genes in different organisms, because the 
gene encoding a particular enzyme can be given a 
similar or identical name in each species, whereas the 
mutant phenotypes resulting from defects in this 
enzyme may be different from organism to organism, 
which would lead to dissimilar names. 

Additional rules for naming genotypes, chromo- 
somal aberrations, suppressors, transposons, trans- 
genes and so on have also been developed, when 
necessary, in each experimental system. 

One may note two serious problems in genetic 
nomenclature, which create endless confusion and 
seem unlikely ever to be solved in any general way. 
First, in a single organism multiple different names 
may be used for the same object, usually a gene. This 
often occurs as a result of several research groups 
converging on the same gene from different angles, 
and naming it on the basis of different mutant pheno- 
types or different descriptions of the same phenotype 
or different biochemical properties. Alternative names 
may persist in the scientific literature indefinitely, 
because the abandonment of a gene name may be 
seen as the cession of priority. In principle, nomenclat- 
ure authorities and scientific journals should together 
be able to encourage simplification and the universal 
adoption of a single name, but in practice this rarely 
happens. 

Second, the same name may be used to refer to 
different objects. This situation rarely occurs within
the nomenclature system used for any particular
organism, and is usually quickly rectified. However, 
it is a frequent and difficult problem when dealing 
with different organisms. An example is provided by 
the cell division cycle genes of budding yeast (Saccharo- 
myces cerevisiae) and of fission yeast (Schizosaccharo- 
myces pombe). There are dozens of identified and 
well-studied genes affecting cell division in each 
organism, and both sets are called CDC (cdc), but 
there is no connection between the numbering used 
for the two sets. Consequently, it is particularly hard 
to keep track of the correspondences between organ- 
isms in this area, especially when names have been 
indiscriminately borrowed from both yeasts for use 
in other experimental systems. Ultimately, stable and 
uniform nomenclature may solve this problem, but 
not in the near future. 


Further Reading 
Wood R (ed.) (1998) Genetic nomenclature guide. Trends in 
Genetics 14 (Supplement). 


See also: Human Genetics 
Standard genetic nomenclature is essential for communication among scientists studying mouse biology,
mouse genomics, or comparative genomics and be- 
tween databases or different fields of science. A 
unique name for each mouse strain and gene is critical
to their identification in research and in the scientific
literature. Mouse nomenclature guidelines are based 
upon the premise that the primary purpose of a genetic 
symbol is to provide a brief and universally acceptable 
symbol that uniquely identifies a specific gene, locus, 
strain or chromosomal anomaly. Complex informa- 
tion about genetic entities is conveyed in the descrip- 
tions accompanying them. Nevertheless, the correct 
genetic nomenclature often provides basic informa- 
tion. For example, using the approved gene symbol 
for a mutated gene and the correct genetic nomenclat- 
ure, a knowledgeable user can identify the gene 
mutated, the type of mutation and the genetic back- 
ground on which the mutation is maintained. The 
gene symbol also links a spontaneous or genetically 
engineered mutation with information on that same 
gene in databases or in the literature. The Mouse 
Genome Database (MGD) assigns and registers 
approved gene symbols for the International Commit- 
tee on Standardized Genetic Nomenclature for Mice 
and serves as a contact point for obtaining symbols
for strains and chromosomal anomalies (URL:
http://www.informatics.jax.org).

Standard genetic nomenclature has been a corner- 
stone of mouse genetics almost since the field began in 
the early 1900s. The first ad hoc Nomenclature Com- 
mittee was established in 1919 with Clarence Cook 
Little as its chair. The first permanent rules for gene 
nomenclature were published in 1940. Names asso- 
ciated with mouse genetic nomenclature over the 
years since 1919 include Sewall Wright, G. H. Shull, 
O. E. White, A. H. Sturtevant, Prof. H. de Haan, 
George Snell, L. C. Dunn, Hans Grüneberg, Margaret
C. Green, Mary F. Lyon, and Muriel T. Davisson. 
Members of the International Committee on Standard- 
ized Genetic Nomenclature for Mice are scientists 
actively working in the field of mouse genetics and 
biology, elected by the mouse research community. 
They represent many different countries as well as 
many different areas of research using mice. The 
Committee promotes the use of standard genetic 
nomenclature, provides information to colleagues, 
and revises or adds to the rules whenever new tech- 
nologies or types of genes require it. 

The remainder of this entry is a synopsis of the 
rules for naming and symbolizing mouse genes, trans- 
genes, strains and chromosomal anomalies. The 
complete rules may be found at the Mouse Genome
Informatics web site (URL: http://www.informatics.jax.org/nomen/).

A key feature of mouse genetic nomenclature is the Laboratory Registration Code (Lab Code). It is a three- to four-letter designation for an institution, a laboratory research group or an investigator. It is used in the symbols for DNA markers (loci), targeted or chemically induced mutations, transgenes, chromosomal anomalies and strain sublines. It identifies the investigator who created or developed a mutation, transgene or strain. Lab Codes are assigned from a central registry at the Institute for Laboratory Animal Research (ILAR) in Washington DC, USA (ILAR http://www4.nas.edu/cls/afr.nsf/LabCodeSearch?OpenForm).


Genes 


Names of genes and loci should be brief and chosen to 
convey as accurately as possible the character by 
which the gene is usually recognized, e.g., a visible 
phenotype, a protein, disease susceptibility, or a 
DNA sequence. Genes are functional units, whereas 
a locus can be any distinct, recognizable DNA seg- 
ment. Symbols for genes are typically two-, three-, or 
four-letter abbreviations of the name, although a sym- 
bol may have up to ten characters, and are always 
italicized. Except in the case of genes only known as 
recessive mutations, the initial letter of the gene sym- 
bol is upper case, and all others lower case. Identifica- 
tion of new genes should not be assumed from the 
discovery of variation between individuals or strains, 
and appropriate genetic tests must be made to show Mendelian segregation and to establish whether the gene is identical to a known gene. Genes and loci also may be identified
by any other method that defines a unique map posi- 
tion, but cloning a DNA segment does not necessarily 
identify a new locus. 

Symbols for quantitative trait loci (QTL) genes 
may end in ‘q’ and those affecting the same complex 
trait are given the same stem symbol and serially 
numbered. Other letters that are used for, although 
not exclusive to, specific types of genes include ‘v’ for 
virus-related genes, ‘r’ for receptor or related, ‘l’ for
like and ‘p’ for protein. Proved pseudogenes are desig- 
nated by the active gene symbol followed by a hyphen 
and the suffix ‘ps.’ Related sequence loci, defined as 
“any locus that is recognized by the same probe as 
the active gene” (which may include related sequences 
not yet proved to be pseudogenes and uncharacter- 
ized members of a gene family) are designated by 
adding a hyphen and ‘rs.’ Genes that are members of 
a series or of a gene family (usually demonstrated by 
sequencing) are designated by a stem symbol and 
numbered serially. Genes encoded by the opposite 
(antisense) strand of a known gene are given their 
own symbols. However, alternative transcripts and 
splice forms from the same gene are not given different 
gene symbols. A proposed new symbol must not 
duplicate one already used for another locus, even if 
the gene effect is very different. New gene or locus 
symbols should be registered with the MGD (URL 
above). 
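The format rules for gene symbols are simple enough to check mechanically. The Python sketch below encodes only the rules stated in this entry:

* no more than ten characters;
* an upper-case initial letter, except for genes known only as recessive mutations;
* lower-case letters thereafter;
* the hyphenated ‘ps’ and ‘rs’ suffixes.

The function names are hypothetical. This is not an MGD tool, and it ignores the many further conventions in the complete rules.

```python
from __future__ import annotations


def check_gene_symbol(symbol: str, recessive_only: bool = False) -> list[str]:
    """Return rule violations for a proposed mouse gene symbol (empty list = OK).

    Hypothetical helper covering only the simple format rules in this entry.
    """
    problems: list[str] = []
    if not 1 <= len(symbol) <= 10:
        problems.append("symbol must have 1 to 10 characters")
    if not symbol or not symbol[0].isalpha():
        problems.append("symbol must begin with a letter")
    elif recessive_only and not symbol[0].islower():
        problems.append("genes known only as recessive mutations start in lower case")
    elif not recessive_only and not symbol[0].isupper():
        problems.append("initial letter must be upper case")
    # Every letter after the first must be lower case; digits are unaffected.
    if any(ch.isupper() for ch in symbol[1:]):
        problems.append("letters after the first must be lower case")
    return problems


def pseudogene_symbol(active_gene: str) -> str:
    """Proved pseudogene: active gene symbol, a hyphen, then 'ps'."""
    return f"{active_gene}-ps"


def related_sequence_symbol(active_gene: str) -> str:
    """Related sequence locus: active gene symbol, a hyphen, then 'rs'."""
    return f"{active_gene}-rs"


print(check_gene_symbol("Hyp1"))   # []
print(check_gene_symbol("HYP1"))   # one violation: later letters upper case
print(pseudogene_symbol("Hyp1"))   # Hyp1-ps
```

Hyp1 is the hypothetical gene used in Table I of the preceding entry, not a registered symbol.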
Genes already identified in another species are
given the same symbol in the mouse if the symbol is 
available. Do not insert the letter ‘m’ (for mouse) as 
the first letter of the symbol for a locus with homo- 
logs in other species. Note: ‘synteny’ should not be 
used to describe conservation between species. Use 
‘conserved synteny’ for genes on homologous chro- 
mosomes and ‘conserved linkage’ or ‘conserved seg- 
ment’ for genes positionally mapped within the two 
species’ chromosomes. 

When a gene has been recognized initially by a 
mutation and later the structural gene is identified, 
the gene is identified by the symbol for the structural 
gene and the mutant allele symbol is designated as 
a superscript to the structural gene symbol. For example, W, which is a mutation in Kit, becomes Kit<W> (W written as a superscript).

D symbols are used for loci identified as segments 
of DNA. A D locus may be an anonymous locus or 
may exist within an identified gene. When the latter 
(termed aliases) are used to follow the gene in linkage 
tests or on genetic maps, the gene symbol is used to 
denote the gene; intragenic D symbols should only be used to describe intragenic recombination. D symbols consist of (1) D for DNA, (2) 1 to 19, X or Y for the chromosomal assignment (0 for unmapped loci),
(3) a Lab Code indicating the laboratory or scientist 
describing the locus, and (4) a unique serial number. 
When describing genetic mapping results, the allele 
type of a specific strain should be given by fragment size(s), with a description of the assay used, but linkage
data may be tabulated using single uppercase letters to 
denote the strain alleles. D symbols are also used for 
mini- or microsatellites (simple sequence repeats), 
genetically mapped clone ends or sequence tagged 
sites. Expressed sequence tagged (EST) loci, when 
mapped to chromosomes and not identified to an 
already known gene, should be designated by the 
sequence database accession number. Novel genes 
identified by genome sequencing and validated as 
expressed by some assay are identified by the BAC,
etc., clone name and a serial number assigned from 
MGD. 
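The four-part D symbol can be assembled as shown in the minimal Python sketch below. It assumes the chromosome is passed as a string and that a Lab Code consists only of letters. "Xyz" is a hypothetical Lab Code, and the validation is a simplification for illustration.

```python
# Chromosomal assignments allowed in a D symbol: autosomes 1-19, X, Y, or 0 if unmapped.
VALID_CHROMOSOMES = {str(n) for n in range(1, 20)} | {"X", "Y", "0"}


def d_symbol(chromosome: str, lab_code: str, serial: int) -> str:
    """Compose a D symbol: 'D' + chromosome + Lab Code + unique serial number."""
    if chromosome not in VALID_CHROMOSOMES:
        raise ValueError(f"invalid chromosome assignment: {chromosome!r}")
    if not lab_code.isalpha():
        raise ValueError(f"Lab Code should consist of letters: {lab_code!r}")
    if serial < 1:
        raise ValueError("serial number must be a positive integer")
    return f"D{chromosome}{lab_code}{serial}"


print(d_symbol("7", "Xyz", 12))  # D7Xyz12
print(d_symbol("0", "Xyz", 3))   # D0Xyz3 (unmapped locus)
```

The chromosome number and the serial number are both numeric, and the Lab Code sits between them. This keeps the symbol unambiguous: D1Xyz12 cannot be confused with D11Xyz2.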

Alleles are usually designated by the locus symbol 
with an added superscript, also in italics. In the case of 
mutant genes for which there is clearly a wild-type, the 
symbol for the first discovered mutant allele becomes 
both the gene symbol and the symbol for that allele 
until the gene is cloned. Induced mutations are desig- 
nated by a superscript consisting of ‘m’ for mutation, a 
serial number and a Lab Code. Targeted mutations are 
similarly identified except that the prefix is ‘tm’ for 
targeted mutation. Wild-type alleles may be designated 
by a ‘+’ sign when the gene is clear from context, or by ‘+’ as a superscript to the mutant symbol; reversions to wild-type are designated by the mutant symbol with a superscript ‘+’. When an existing gene is
replaced with a different, functional gene, called a 
‘knock-in,’ the symbol is written as an allele of the 
original gene. Much more detailed information on 
allele symbols is given in the MGD. The term haplo- 
type may be used to define a set of DNA sequence 
variants within a gene, a complement of alleles at 
multiple loci within a complex or the complement of 
alleles at several loci along a chromosome, typically 
when typed in linkage analysis. Phenotype symbols 
for protein type loci are the gene symbols written all in 
upper case and not italicized. 
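The superscripts for induced and targeted alleles follow a fixed pattern. The Python sketch below builds them under two assumptions: superscripts are written between angle brackets in plain text, and "Hyp1" and "Xyz" are hypothetical gene and Lab Code values.

```python
# Superscript prefixes for induced and targeted mutations, as described above.
ALLELE_PREFIXES = {"induced": "m", "targeted": "tm"}


def allele_symbol(gene: str, kind: str, serial: int, lab_code: str) -> str:
    """Gene symbol plus a superscript of prefix + serial number + Lab Code.

    Superscripts are written in angle brackets (plain-text convention assumed here).
    """
    if kind not in ALLELE_PREFIXES:
        raise ValueError(f"unknown mutation kind: {kind!r}")
    return f"{gene}<{ALLELE_PREFIXES[kind]}{serial}{lab_code}>"


def wild_type_allele(gene: str) -> str:
    """Wild-type allele written as a superscript '+' on the gene symbol."""
    return f"{gene}<+>"


print(allele_symbol("Hyp1", "targeted", 1, "Xyz"))  # Hyp1<tm1Xyz>
print(allele_symbol("Hyp1", "induced", 2, "Xyz"))   # Hyp1<m2Xyz>
```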


Transgenes 


DNA sequences experimentally and stably intro- 
duced into mouse chromosomes are transgenes. They 
are named according to the following conventions, 
developed by an interspecies committee sponsored 
by the Institute for Laboratory Animal Research 
(ILAR) in 1992 and revised in 2000. A transgene symbol consists of four parts: Tg(YYYYYY)#####Zzz, where Tg is the mode of insertion, (YYYYYY) briefly
describes the insert, ##### is a Laboratory assigned 
number and Zzz is the Lab Code. Transgenic nomen- 
clature is used for homologous recombination inser- 
tions only when it is used as a mechanism to insert a 
transgene and it is the transgene itself that is of pri- 
mary interest. The insert designation, contained 
within parentheses, is the official gene symbol of the 
inserted DNA. If it is critical to identify the promoter, 
the gene symbol from which it is derived may precede 
the coding gene, separated from it by a hyphen. Fusion genes may be designated by the symbols for the two genes separated by a slash (/). The characters in parentheses may be deleted after first use in manuscripts to abbreviate the symbol. It may be up to six characters, or it and the laboratory-assigned number together may be up to eleven characters.
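The transgene conventions can be sketched in Python as below. The sketch covers the Tg(insert)#####Zzz pattern, the optional promoter prefix, the slash for fusion genes, and the abbreviated form without the parenthesized part.

The helper names and the "Hyp1", "Hyp2" and "Xyz" values are hypothetical. Length limits are not enforced.

```python
from __future__ import annotations

import re


def transgene_symbol(insert: str, lab_number: int, lab_code: str,
                     promoter: str | None = None) -> str:
    """Build Tg(insert)<number><Lab Code>; an optional promoter gene precedes the insert."""
    designation = f"{promoter}-{insert}" if promoter else insert
    return f"Tg({designation}){lab_number}{lab_code}"


def fusion_insert(gene_a: str, gene_b: str) -> str:
    """Fusion genes: the two gene symbols separated by a slash."""
    return f"{gene_a}/{gene_b}"


def abbreviate(symbol: str) -> str:
    """Drop the parenthesized insert designation, as allowed after first use."""
    return re.sub(r"\(.*?\)", "", symbol, count=1)


full = transgene_symbol("Hyp1", 12, "Xyz", promoter="Hyp2")
print(full)                                                   # Tg(Hyp2-Hyp1)12Xyz
print(abbreviate(full))                                       # Tg12Xyz
print(transgene_symbol(fusion_insert("Hyp1", "Hyp2"), 5, "Xyz"))  # Tg(Hyp1/Hyp2)5Xyz
```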


Strains 


Since laboratory strains are neither pure Mus domes- 
ticus nor musculus, they should be referred to as 
‘laboratory mice’ or by the inbred strain name when 
known. Mouse strain symbols consist of uppercase 
letters and occasionally numbers. Strain symbols are 
not italicized. Special types of strains include standard 
symbol components that identify the type of strain. 
Strain symbols are followed by a forward slash and 
one or more Lab Codes identifying the originator 
of the strain and subsequent holders. This section 
includes definitions of strain types and the way they 
are symbolized. 


An inbred strain is defined as one created by 20 or more (F20) generations of sibling matings that can be traced to a single ancestral breeding pair. However, some residual heterozygosity may persist up to F40. Strains with a common origin that separated before F20 are given symbols that indicate their relationship, e.g., NZB, NZC, NZO. Inbred strains derived from only two parental strains may be designated using abbreviations for the two strains separated by a semicolon, e.g., B6;129. Substrains are designated by adding a forward slash and a holder Lab Code.

An established inbred strain is considered to be divided into substrains in any of three cases:
- known or probable genetic differences become established in separate branches;
- branches are separated before F40;
- a branch is known to have been maintained separately from other branches for more than 100 generations from their common ancestor.

Existing strain symbols are listed in MGD, which can also provide the contact for obtaining new symbols.
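The inbred-strain and substrain criteria above amount to a small decision rule. The Python sketch below is one reading of them. It treats branches split before F20 as related strains rather than substrains, which follows the NZB/NZC/NZO case.

The function names and the strain "ABC" and Lab Code "Xyz" are hypothetical.

```python
def is_inbred(sib_mating_generations: int) -> bool:
    """Inbred once there have been at least 20 (F20) generations of sibling matings."""
    return sib_mating_generations >= 20


def classify_branches(genetic_difference: bool, split_generation: int,
                      generations_apart: int) -> str:
    """Classify two branches of one origin using the criteria described above."""
    if split_generation < 20:
        return "related strains"      # separated before F20, e.g. NZB, NZC, NZO
    if genetic_difference or split_generation < 40 or generations_apart > 100:
        return "substrains"           # any one criterion is sufficient
    return "not yet substrains"


def substrain_symbol(strain: str, holder_lab_code: str) -> str:
    """Substrain: strain symbol, a forward slash, then the holder Lab Code."""
    return f"{strain}/{holder_lab_code}"


print(classify_branches(False, 30, 50))   # substrains (split before F40)
print(substrain_symbol("ABC", "Xyz"))     # ABC/Xyz
```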

Recombinant inbred (RI) strains are formed by 
crossing two inbred strains, followed by 20 or more 
generations of sibling mating. The names of RI strains 
consist of an abbreviation of both parental strain 
names separated by a capital X (e.g., CXB is a set of 
recombinant inbred strains derived from a cross of a 
BALB/c female × a C57BL male). Recombinant congenic (RC) strains are formed by crossing two inbred strains, followed by a few (usually two) backcrosses to one of the parental strains (the recipient strain), with subsequent inbreeding without selection for specific markers. RC strains are designated by abbreviations of the names of the two parental strains (with the recipient strain given first, followed by the donor strain) separated by a lower case ‘c’ (e.g., CcS, a set of recombinant congenic strains from a cross between BALB/c and STS, backcrossed to BALB/c). Individual strains of an RI or RC series are distinguished by appending numbers to the strain symbols.
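The RI and RC naming patterns reduce to simple string templates, sketched below in Python. The abbreviations "C" (BALB/c), "B" (C57BL) and "S" (STS) follow the examples above. Appending the series number with no separator is an assumption, and the member "CcS4" is purely illustrative.

```python
def ri_set_symbol(strain_a: str, strain_b: str) -> str:
    """Recombinant inbred set: parental abbreviations separated by a capital X."""
    return f"{strain_a}X{strain_b}"


def rc_set_symbol(recipient: str, donor: str) -> str:
    """Recombinant congenic set: recipient first, a lower-case 'c', then donor."""
    return f"{recipient}c{donor}"


def series_member(set_symbol: str, number: int) -> str:
    """Individual strains of an RI or RC series get an appended number."""
    return f"{set_symbol}{number}"


print(ri_set_symbol("C", "B"))                     # CXB
print(series_member(rc_set_symbol("C", "S"), 4))   # CcS4
```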

Two strains that are genetically identical (i.e., iso- 
genic), except for a difference at a single gene, are 
called coisogenic. True coisogenicity can be achieved 
only by mutation within an existing inbred strain. 
Segregating inbred strains are developed and main- 
tained by inbreeding with forced heterozygosis. Co- 
isogenic and segregating strains are designated by a 
strain symbol followed by a hyphen and the gene 
symbol of the segregating locus, followed by /+ for 
segregating strains. Congenic strains are produced by 
crossing a differential gene onto an inbred strain by 
repeated backcrosses to the inbred strain. They are 
designated by the abbreviated symbol of the back- 
ground (host) strain followed by a period, the abbre- 
viated symbol of the donor strain, a hyphen and the 
symbol of the differential gene or genes (in italics) 
(e.g., B10.129-m). Although a congenic strain is often 
useful after five generations of backcrossing (N5), it is
only considered congenic at N10 or the equivalent. 
New methods known as marker-assisted selection 
breeding or ‘speed congenics’ allow a congenic strain 
to be produced in as few as five backcross generations 
by using markers to identify second backcross gener- 
ation progeny that by chance have the largest contri- 
bution of inbred host strain and selecting mice with 
recombinants closely flanking the segment or gene of 
interest. Consomic strains are produced by repeated 
backcrossing of a whole chromosome onto an inbred 
strain. The symbol for a consomic strain is indicated 
by the symbol for the host strain followed by a 
hyphen and the chromosome number with the donor strain as a superscript (e.g., C57BL/6J-Y"*®). Conplastic strains are developed by backcrossing the nuclear genome from one strain into the cytoplasm of another; the mitochondrial parent is always the female parent in backcrosses. A sample designation is C57BL/6J-mt?”^ 8^. F1 hybrids are designated by listing the female progenitor first and the male progenitor second (e.g., B6D2F1 mice are the offspring of a C57BL/6J female mated to a DBA/2J male; D2B6F1 mice are offspring of the reciprocal mating).
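The designations for coisogenic, segregating, congenic and F1 hybrid mice are also fixed patterns. The Python sketch below illustrates them, using the B6 and D2 abbreviations from the F1 example above.

The helper names, the strain "ABC" and the gene "Hyp1" are hypothetical. The N10 threshold is the one stated in the text and ignores "the equivalent" achieved by marker-assisted breeding.

```python
def coisogenic_symbol(strain: str, gene: str, segregating: bool = False) -> str:
    """Strain symbol, a hyphen and the gene symbol; segregating strains add '/+'."""
    symbol = f"{strain}-{gene}"
    return f"{symbol}/+" if segregating else symbol


def congenic_symbol(host: str, donor: str, gene: str) -> str:
    """Host (background) strain, a period, donor strain, a hyphen, differential gene."""
    return f"{host}.{donor}-{gene}"


def is_congenic(backcross_generations: int) -> bool:
    """Considered congenic at N10 (simplified: backcross generations only)."""
    return backcross_generations >= 10


def f1_symbol(female: str, male: str) -> str:
    """F1 hybrid: female progenitor first, male second, then 'F1'."""
    return f"{female}{male}F1"


print(f1_symbol("B6", "D2"))                 # B6D2F1
print(f1_symbol("D2", "B6"))                 # D2B6F1 (reciprocal cross)
print(congenic_symbol("B10", "129", "Hyp1")) # B10.129-Hyp1
```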

Non-inbred stocks are sometimes given specific designations if they meet specific criteria as defined in ICLA (1972). Symbols are composed of the holder Lab Code followed by a colon and characters identifying the stock. A special type of outbred stock is the advanced intercross line (AIL). An AIL is made by producing an F1 generation between two inbred strains. In each subsequent generation, mice are intercrossed but sibling matings are avoided. The purpose is to increase the chance that tightly linked genes will recombine. Symbols contain the Lab Code followed by a colon, abbreviations for the two strains separated by a comma, a hyphen and the generation number (G#). The G number increases with each generation.
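The outbred stock and AIL symbols can be composed as in the Python sketch below. The Lab Code "Xyz" and the stock name "Stk" are hypothetical, and B6 and D2 are reused from the F1 example.

```python
def outbred_stock_symbol(lab_code: str, stock: str) -> str:
    """Non-inbred stock: holder Lab Code, a colon, then the stock designation."""
    return f"{lab_code}:{stock}"


def ail_symbol(lab_code: str, strain_a: str, strain_b: str, generation: int) -> str:
    """Advanced intercross line: Lab Code, colon, 'A,B', hyphen, G + generation."""
    return f"{lab_code}:{strain_a},{strain_b}-G{generation}"


print(outbred_stock_symbol("Xyz", "Stk"))  # Xyz:Stk
print(ail_symbol("Xyz", "B6", "D2", 10))   # Xyz:B6,D2-G10
print(ail_symbol("Xyz", "B6", "D2", 11))   # the G number increases each generation
```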


Chromosomal Anomalies 


Autosomal chromosomes are numbered and identi- 
fied according to size (Lyon et al., 1996). The X and 
Y chromosomes are indicated by capital letters. The 
word chromosome begins with a capital letter when it 
refers to a specific chromosome and may be abbre- 
viated to Chr after the first use. Symbols for chromo- 
some anomalies begin with a 2-3 letter abbreviation 
that identifies the type of anomaly. Chromosome 
anomaly symbols are not italicized. 


Cen   Centromere
Del   Deletion
Df    Deficiency
Dp    Duplication
Hc    Pericentric heterochromatin
Hsr   Homogeneous staining region
In    Inversion
Is    Insertion
Ms    Monosomy
Ns    Nullisomy
Rb    Robertsonian translocation
Sp    Supernumerary chromosome
T     Translocation
Tel   Telomere
Ts    Trisomy
Tet   Tetrasomy
Tp    Transposition


Successive anomalies in a series from one laboratory 
are distinguished by a serial number followed by the 
Lab Code. The chromosome(s) involved in the anom- 
aly are identified by inserting the numbers in parenth- 
eses between the initial letter and the series symbol. 
The two chromosomes involved in translocations and 
insertions are separated by a semicolon, whereas in 
Robertsonian translocations they are separated by a 
period. In the case of insertions, the number of the 
chromosome donating the inserted portion is given 
first. When the G-band locations of chromosomal 
breakpoints are known, these may be indicated by 
including the band numbers in the parentheses 
[T(1A;2H1)#Dn]. When one chromosome anomaly 
is contained within another or inseparable from it, 
the symbols should be combined [e.g., T(In1;5)44H
is a translocation between Chrs 1 and 5 in which the 
Chr 1 segment is inverted]. Mouse autosomes and the 
X do not have short arms; the symbols p and q may be 
used to denote the short and long arms, respectively, 
of the Y chromosome. Additional details on 
chromosome nomenclature may be found in MGD 
or Lyon et al., 1996. 
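The symbol-construction rules above are mechanical enough to sketch in code. The Python below is a hypothetical helper, not an MGI tool: it only joins a type abbreviation from the list above, the chromosome numbers (optionally with G-band locations) in parentheses, a serial number and a Lab Code, using a period between Robertsonian partners and a semicolon otherwise. It does not build combined symbols such as T(In1;5)44H, and the Lab Code "Xyz" in the usage lines is made up.

```python
# Hypothetical helper encoding only the rules described in the text;
# it is not an official Mouse Genome Informatics (MGI) tool.
ANOMALY_TYPES = {
    "Cen": "Centromere", "Del": "Deletion", "Df": "Deficiency",
    "Dp": "Duplication", "Hc": "Pericentric heterochromatin",
    "Hsr": "Homogeneous staining region", "In": "Inversion",
    "Is": "Insertion", "Ms": "Monosomy", "Ns": "Nullisomy",
    "Rb": "Robertsonian translocation", "Sp": "Supernumerary chromosome",
    "T": "Translocation", "Tel": "Telomere", "Ts": "Trisomy",
    "Tet": "Tetrasomy", "Tp": "Transposition",
}


def anomaly_symbol(kind, chromosomes, serial, lab_code):
    """Build a symbol such as T(1;5)1Xyz from its parts.

    chromosomes: chromosome numbers, or number + G-band strings like "1A".
    For insertions (Is) the caller must list the donor chromosome first.
    """
    if kind not in ANOMALY_TYPES:
        raise ValueError(f"unknown anomaly type: {kind}")
    # Robertsonian partners are separated by a period; the two chromosomes
    # of translocations and insertions are separated by a semicolon.
    sep = "." if kind == "Rb" else ";"
    inside = sep.join(str(c) for c in chromosomes)
    return f"{kind}({inside}){serial}{lab_code}"


# Usage with a made-up Lab Code "Xyz":
print(anomaly_symbol("T", [1, 5], 1, "Xyz"))         # T(1;5)1Xyz
print(anomaly_symbol("Rb", [16, 17], 3, "Xyz"))      # Rb(16.17)3Xyz
print(anomaly_symbol("T", ["1A", "2H1"], 2, "Xyz"))  # T(1A;2H1)2Xyz
```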
Background


The first reports which assigned human genes to one 
of the complement of human chromosomes, other 
than the sex chromosomes, appeared in the late 
1960s, and since then the field of human genetics has 
undergone very rapid growth and development. As 
soon as human genes became identifiable as distinct 
entities whose characteristics could be described, it 
was necessary to give them names. At first, the choice 
of names was fairly obvious. The existence of genes 
was deduced from studies of the inheritance of char- 
acteristics or, most commonly, diseases, and thus the 
genes could be named for the disease or other char- 
acteristic they affected. Some genes were discovered 
from the inheritance pattern of a protein, usually an 
enzyme. If this protein could be described in terms of 
the biochemical reactions catalyzed, or structural 
components formed in the body, then this function 
again provided an obvious choice of name for the 
gene. The latter descriptions can be considered more 
valid than names based on diseases, as it is more logical 
to describe a gene in terms of its normal function in the 
majority of the human population than its effects 
when it is nonfunctional or incompletely functional. 
Thus in the early days, it was relatively easy for the 
researchers in the field to name the genes they dis- 
covered in such a way that others would understand 
the meaning. 


Symbols and Names 


Once genes had been assigned to several of the chro- 
mosomes, their order along the length of the chromo- 
some began to be established and human gene 
mapping had truly begun. The publication of gene 
orders generated the idea of a gene symbol, in addition 
to its name. This symbol was a shortened form of the 
name, memorable and recognizable, but short enough 
to be included in the diagrammatic representations of 


chromosomes called maps. These symbols usually 
consisted of only two or three letters in a combination 
that reflected the name, and perhaps with a number 
added if more than one gene was discovered which 
had a similar function. Example: ADH1, ADH2 (the 
genes encoding different forms of an enzyme, alcohol 
dehydrogenase). 

Clearly, a gene symbol alone has no intrinsic meaning; it is only meaningful in relation to the longer and
more descriptive name. Whilst a name may be varied 
considerably and still maintain the same meaning 
(amylase, salivary and salivary amylase clearly have 
the same meaning), the more limited letter and num- 
ber combinations of a short symbol must be invariable 
to avoid ambiguity. The importance of a unique iden- 
tifying symbol was recognized in the early years of 
gene mapping, and a Nomenclature Committee was 
formed to oversee the allocation of appropriate sym- 
bols for use in maps and to devise guidelines to ensure 
the greatest possible consistency. The guidelines were 
subject to many influences which included the estab- 
lished practice, in order to avoid confusion by too 
many symbol changes; the need to reduce ambiguity, 
with uniqueness as the most important criterion; other
simpler recommendations such as avoiding Roman 
numerals; and the more far-sighted aims of increasing 
accessibility and ‘searchability’ by recommendation 
of hierarchical systems of symbol construction. The 
guidelines were also influenced by the restrictions of 
early electronic storage and communication, such as 
the elimination of Greek letters which could not easily 
be represented in electronic databases, and the restric- 
tions on use of punctuation to facilitate searching. 
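As an illustration of why hierarchical symbols and a restricted character set help searching, here is a small Python sketch. The character rule it checks (uppercase Latin letters and Arabic digits only, starting with a letter) is an assumption made for this example and is stricter than the guidelines described above, which restrict punctuation rather than ban it. ADH1 and ADH2 come from the text; ALDH2 and ADA are simply other symbols used to show what a family search excludes.

```python
import re


def is_plain_symbol(symbol):
    """Assumed rule for this sketch: uppercase Latin letters and Arabic
    digits only, starting with a letter (no Greek letters, no punctuation)."""
    return re.fullmatch(r"[A-Z][A-Z0-9]*", symbol) is not None


def family_members(symbols, root):
    """Return symbols built hierarchically as <root><number>, e.g. ADH1, ADH2."""
    pattern = re.compile(re.escape(root) + r"\d+")
    return [s for s in symbols if pattern.fullmatch(s)]


print(family_members(["ADH1", "ADH2", "ALDH2", "ADA"], "ADH"))  # ['ADH1', 'ADH2']
print(is_plain_symbol("ADH1"), is_plain_symbol("ADHα"))        # True False
```

Because every member of the family shares a fixed root followed only by a number, a single pattern retrieves the whole set, which is the "searchability" benefit the guidelines aimed for.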


Current Issues 


Over the years the field of human genetics has changed 
rapidly. New techniques of gene discovery have 
meant that different types of information are now 
available when a gene is identified. The pace of change, 
and the funding and resources made available, largely 
by the Human Genome Project, have enabled many 
more researchers to become involved. Human gene 
discovery is no longer restricted to the small and 
specialized community of ‘gene mappers.’ This has 
significant consequences for the process of naming 
genes. Genes may now be first identified as a portion 
of the DNA sequence, with certain sequence charac- 
teristics, but with no details of their function in the 
organism. Several research groups may report the iso- 
lation of the same ‘novel’ gene almost simultaneously, 
and each may be approaching it from a different view- 
point. A developmental biologist, a clinician, and a 
biochemist may have very different views about the 
relative importance of different characteristics of the 
gene. Then there are the geneticists working on model
organisms such as the mouse, fruit fly (Drosophila), or 
yeast. Often they may discover a gene in their model 
species, and name it for the effects it has in that species 
(tailless, white (eye), budding inhibited ...). Surpris- 
ingly, for many such genes there is a very close relative 
in humans, about which no other information may be 
available at the time of discovery. Thus the same name 
is passed on to the human gene, leaving it in many 
cases with a very bizarre descriptor. 

The Nomenclature Committee established by the 
human gene mapping community is still in existence 
and continues to attempt to solve the problems of 
naming confusion in an impartial way. The cooper- 
ation of the community involved is however impera- 
tive. It needs to be recognized that unofficial or trivial 
names will exist, and that errors or omissions will 
occur resulting in the necessity of changing official 
names. Provision needs to be made for tracking such 
changes, and for translating the unofficial designa- 
tions. 
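A minimal sketch of the kind of tracking and translation this calls for: one table maps withdrawn symbols to their replacements (which may themselves later change), another maps unofficial names to official symbols. The data structures are assumptions, not a description of any real nomenclature database, and every symbol in the usage example (GENEA, GENEB, GENEC, gene-a-like) is a hypothetical placeholder.

```python
def resolve(symbol, previous, aliases, current):
    """Translate a symbol into the current official symbol.

    previous: dict, withdrawn official symbol -> symbol that replaced it
    aliases:  dict, unofficial name -> official symbol (current or withdrawn)
    current:  set of currently approved symbols
    Returns None for unknown symbols or circular histories.
    """
    sym = aliases.get(symbol, symbol)
    seen = set()
    while sym not in current:
        if sym in seen or sym not in previous:
            return None
        seen.add(sym)
        sym = previous[sym]  # follow the chain of renamings
    return sym


# Hypothetical history: GENEA was renamed GENEB, which was later renamed GENEC.
previous = {"GENEA": "GENEB", "GENEB": "GENEC"}
aliases = {"gene-a-like": "GENEA"}
current = {"GENEC"}
print(resolve("gene-a-like", previous, aliases, current))  # GENEC
```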

The nomenclature of human genes cannot be static 
when the field is clearly a dynamic one. Nomenclature 
guidelines, and at times the symbols themselves, will 
change as the data accumulate. Now that many genes 
are identified directly from the DNA sequence data, it 
is no longer feasible to insist on descriptions of func- 
tion before an official symbol can be assigned. The 
currently available techniques and analyses lend them- 
selves better to designations based on sequence rela- 
tionships, and the hierarchical systems previously 
used for genes whose products fulfilled similar roles 
in the living organism are now applied to genes whose 
sequence is related in defined ways. Often these two 
definitions overlap, i.e. similar sequence often results 
in similar function, but it is by no means always the 
case. 


Future Directions 


The science of genetics is increasingly reliant on com- 
puters to store and analyze the vast quantity of data 
now accumulating. As computer applications con- 
tinue to improve, they will enable those who need to 
know to connect up the relevant information in genet- 
ics databases with greater ease. At present however, 
this is not a simple task. A huge amount of effort is 
devoted to searching for information on human genes 
and curating the various databases which store it, in 
order to make it more easily available. Even so, errors 
occur and confusion arises, and this is often due to 
ambiguities in nomenclature. If we are not to lose 
the accumulated information of earlier human genet- 
ics research we must continue to keep track of the 
names and symbols used; we need to maintain an 
equivalent of the Rosetta Stone for human genes.
There is always a tendency for specialized commun- 
ities to develop their own particular jargon, unintelli- 
gible to the outsider, and this frequently happens 
within areas of human genetics. However, the Human
Genome Project has important implications far beyond
the interests of these specialized groups, and those
involved therefore have a responsibility to avoid obscur-
ing their knowledge from the wider community.
Nonautonomous controlling elements are defective
transposons that are able to transpose only when 
assisted by an autonomous controlling element of 
the same type. 


See also: Transposable Elements 
Nothing in biology makes sense except in the light of evolution
Theodosius Dobzhansky 


In 1966 I became interested in the amino acid 
sequences of cytochrome c molecules (Jukes, 1966). 


† Deceased


I noted that these sequences differed in the cyto- 
chromes c of various species to an extent that seemed 
unnecessary from the standpoint of their function. 
I stated that 


the changes produced in proteins by mutations will in some 
cases destroy their essential functions, but in other cases the 
change allows the protein molecule to continue to serve its 
purpose. 

Jukes, 1966 


I sought the collaboration of a geneticist (Jack King) to 
help me cope with this idea. 

Early intimations of neutrality may be found in the 
publication of Reichert and Brown (1909). They 
compiled the crystallographic structure of vertebrate 
hemoglobins on a taxonomic basis. They stated the 
principle that “substances that show differences in 
crystallographic structure are different chemical sub- 
stances.” In short, if two crystals have identical crys- 
talline structure, the molecules of which they are 
composed are identical. A report of their studies is 
shown in Table 1. 

Their data showed that an increase in the diver- 
gence of crystallographic properties was found to be 
parallel to the taxonomic separation of various ani- 
mals. Of much interest is the fact that a sample of 
blood labeled as that of a baboon was found upon 
examination of the hemoglobin crystals to be that of 
a cat, and a subsequent follow up showed that mis- 
labeling of the sample vial had occurred (Reichert and 
Brown, 1909). Reichert and Brown’s monograph 
remains one of the earliest landmarks in the history
of molecular evolution. They observed that hemin 
crystals obtained from different species were always 
identical, so the differences observed in hemoglobin 
between species must have been due to the globin 
portion of the molecule. It is now known that the 
differences are due to amino acid substitutions 
throughout the polypeptide chains of the globins. 
These substitutions are the result of single base 
changes in the DNA strands of the hemoglobin genes. 

The concept that each protein from each species of 
animal was a single chemical substance at the molecu- 
lar level was implicit for the hemoglobins in the report 
by Reichert and Brown (1909). It was again stated in 
1952 by Sanger as a result of studies of the amino acid 
sequence in insulin: 


It has frequently been suggested that proteins may not be 
pure entities but may consist of mixtures of closely related 
substances with no absolute unique structure. The chemical 
results so far obtained suggest that this is not the case and 
that a protein is really a single chemical substance, each 
molecule of one protein being identical with every other 
molecule of the same protein. Thus it was possible to assign
a unique structure to the phenylalanyl chains of insulin.
Each position in the chain was occupied by only one
amino acid and there was no evidence that any of them
could be occupied by a different residue. Whether this is
true for other proteins is not certain but it seems probable
that it is. The N-terminal residues of several pure proteins
have been determined . . . and this position is always found to
be occupied by a single unique amino acid. These results
would imply an absolute specificity for the mechanisms
responsible for protein synthesis and this should be taken
into account when considering such mechanisms.

Sanger (1952)


Table 1 Crystallographic comparison of reduced
hemoglobins of species in Felidae contrasted with other
species of Carnivoraᵃ

Specific name          Common name     Axial ratio a:b:c
Felidae
  Felis leo            Lion            0.9742:1:0.3707
  Felis tigris         Tiger           0.9742:1:0.3839
  Felis bengalensis    Leopard         0.9657:1:0.3667
  Felis pardalis       Ocelot          0.9489:1:0.3931
  Felis domestica      Domestic cat    0.9656:1:0.3939
  Lynx canadensis      Lynx            0.9605:1:0.3944
  Lynx rufus           Wildcat         0.9869:1:0.3914
Canidae
  Canis familiaris     Dog             0.6745:1:0.2863
  Vulpes fulvus        Red fox         0.6494:1:0.2894
Ursidae
  Ursus americanus     Black bear      1.2239:1:1429
Otariidae
  Phoca vitulina       Harbor seal     1.2131:1:1970

ᵃ From Reichert and Brown (1909).


The term ‘non-Darwinian evolution’ was introduced 
by King and Jukes (1969) to assert that “most evolu- 
tionary changes in proteins may be due to neutral 
mutations and genetic drift.” The term ‘Darwinian 
evolution’ refers to Darwin’s original publication 
(King and Jukes, 1969), which depicts evolution as 
being descent with modification, produced by natural 
selection for desirable characteristics and advantageous genes. In molecular terms, this would occur through, or be accompanied by, adaptive changes in DNA. King
and Jukes (1969) stated 


natural selection is the editor, rather than the composer, of 
the genetic message. One thing the editor does not do is to 
remove changes which it is unable to perceive. 


Deleterious mutations have long been familiar; for
example, X-rays produce such mutations. Beneficial
mutations are quite rare, but are of great importance.
For example, a few mutational changes improved the
function of hemoglobins. Mammals have tetrameric
hemoglobins that are more effective at transporting
oxygen from the lungs to the tissues. We can see the
reduced hemoglobin in our own blue veins, as it is on
its way to the lungs for reoxygenation.

Further consideration of these ideas led to the writ- 
ing and publication of an article entitled “Non- 
Darwinian Evolution” (King and Jukes, 1969). In 
retrospect, it might have been better to entitle the 
article “Non-adaptive Evolution,” because “Non- 
Darwinian” probably raised the hackles of admirers 
of Charles Darwin. (It is amusing to remember that 
Darwin himself raised a storm of indignation among 
his contemporaries.) Previously, Kimura (1968) had 
published a short note in Nature in which he pointed 
out that the rate of random fixation of neutral mutations in evolution, per species per generation, is equal to the rate of occurrence of neutral mutations per gamete per generation, and is independent of population size. Kimura devoted much of the rest of his
career to investigating and defending the neutral the- 
ory, and published a book on it in 1983 (Kimura, 
1983). 
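Kimura's result follows in one line. In a diploid population of N individuals there are 2N gene copies; if μ is the neutral mutation rate per gamete per generation, about 2Nμ new neutral mutations arise each generation, and each has probability 1/(2N) of eventually being fixed by drift. The substitution rate k is therefore:

```latex
k \;=\; \underbrace{2N\mu}_{\text{new neutral mutations per generation}}
\;\times\; \underbrace{\frac{1}{2N}}_{\text{fixation probability of each}}
\;=\; \mu
```

Population size cancels, which is why the rate of neutral substitution does not depend on N.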

The theory postulates that 


nucleotide substitutions inherently take place in DNA as a 
result of point mutations followed by random genetic drift. 
In the absence of selection constraints, the substitution rate 
reaches the maximum value set by the mutation rate, e.g.,
about 5 × 10⁻⁹ substitutions per site per year


or at a lower rate when constraints are imposed by 
natural selection (King and Jukes, 1969). 

Although the neutral theory is now widely ac- 
cepted for changes in pseudogenes and other forms of 
noncoding DNA, the theory has been — and remains — 
very controversial when applied to protein-coding 
sequences. One class of nearly neutral mutations
should be the changes from one synonymous codon
to another, such as ACU to ACC (both threonine).
Another class of mutations often regarded as nearly
neutral consists of changes between chemically similar
amino acids, such as GAC to GAG (aspartic acid to
glutamic acid). Deleterious mutations should disappear
under the influence of natural selection. Beneficial
mutations can occur, though rarely, such as those that
differentiate α- and β-hemoglobins and thus enabled
α- and β-hemoglobin chains to form loose bonds,
producing a tetramer. This innovation improved the
delivery of oxygen to the tissues by oxyhemoglobin,
after which the reduced hemoglobin returns to the
lungs through the venous system for recharging with
oxygen. The 'primitive' species, the lamprey, does not
possess a tetrameric hemoglobin but instead a monomer.
Since the lamprey is parasitic, there is no evolutionary
pressure to 'improve' its hemoglobin. (Note that the
function of hemoglobin as a transporter of oxygen from
the lungs (or gills) to the tissues remains the same in
all species.)

Darwin was very optimistic about improvement 
produced by descent with modification. In On the 
Origin of Species, Darwin made the following state- 
ment: 


And as natural selection works solely by and for the good of 
each being, all corporeal and mental endowments will tend 
to progress towards perfection. 


This is incorrect. Deleterious mutations occur fre- 
quently during evolution. There is no “progress 
towards perfection,” rather there is an equilibrium 
between advances and retreats. 

In our 1969 article (King and Jukes, 1969), 
Jack King and I took issue with the following state- 
ment: 


The consensus is that completely neutral genes or alleles 
must be very rare if they exist at all. For an evolutionary 
biologist it therefore seems improbable that proteins, 
supposedly fully determined by genes, should have nonfunctional parts, that dormant genes should exist over periods of generations, or that molecules should change in a
regular but nonadaptive way...[natural selection] is the 
composer of the genetic message and DNA, RNA, enzymes 
and other molecules in the system are successively its 
messengers. 


Our viewpoint was that evolutionary change arises 
from within DNA, and that natural selection is the 
editor of the genetic message: “One thing the editor 
does not do is to remove changes which it is unable to 
perceive.” 

We also took issue with a statement that each amino 
acid (in a protein) 


must have a unique survival value in the phenotype of the 
organism — the phenotype being manifested in the structure 
of the proteins. 


We disagreed, saying that 


to hold that selectively neutral isoalleles cannot occur is 
equivalent to maintaining that there is one and only one 
optimal form for every gene at any point in evolutionary 
time. We think that life is not so inflexible (loc. cit.) 


We said “... drift is slow but effective in the fixation 
of neutral mutations,” and that, as pointed out by 
Kimura (1968) 


the rate of random fixation of neutral mutations in evolu- 
tion, per species per generation, is equal to the rate of occur- 
rence of neutral mutations per gamete per generation 


and that 


of the 2N copies of a gene in a population of N individuals at 
one point in evolutionary time, only one is destined to be the 
ancestor, through replication, of all the copies of the gene 
that will be in existence in the species in the distant evolu- 
tionary future. The process by which one line becomes fixed 
has been called ‘genetic drift,’ ‘random walk’ or ‘branching 
process.’ If all copies of the gene are selectively equivalent, 
all have equal chances of becoming the common ancestor. 
Thus if a newly occurring mutation is selectively neutral, its 
probability of becoming fixed through random drift is 
1/2N.... Thus the rate of non-Darwinian evolutionary 
change is a function only of the rate of occurrence of neutral 
mutations and is independent of population size.... Even- 
tually the ‘random walk’ of the gene frequency goes to the 
ground states of loss or fixation. 
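The 1/2N fixation probability in this passage can be checked numerically. The Python sketch below is a standard Wright-Fisher simulation, a modeling assumption rather than a method used by King and Jukes: each generation, every one of the 2N gene copies is drawn independently from the previous generation, so a neutral mutant copy leaves descendants purely by chance. With N = 10 individuals the estimate should come out close to 1/20 = 0.05.

```python
import random


def neutral_fixation_fraction(n_individuals, trials, seed=1):
    """Estimate the probability that a single new neutral mutant copy is
    eventually fixed, using Wright-Fisher sampling of 2N gene copies."""
    rng = random.Random(seed)
    copies = 2 * n_individuals
    fixed = 0
    for _ in range(trials):
        count = 1  # one new mutant copy among 2N
        while 0 < count < copies:  # run until loss (0) or fixation (2N)
            p = count / copies
            # No selection: each copy in the next generation is a mutant
            # with probability equal to the current mutant frequency.
            count = sum(1 for _ in range(copies) if rng.random() < p)
        if count == copies:
            fixed += 1
    return fixed / trials


estimate = neutral_fixation_fraction(10, 10000)
print(estimate, "vs 1/2N =", 1 / 20)
```

Raising N lowers the fixation probability of each mutant, but in a larger population proportionally more mutants arise, which is the cancellation behind the population-size independence stated above.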


Once the neutral theory had been stated, examples of 
its effect were proposed. For example, in the genetic 
code, some base pair changes are without effect on 
protein structure: ACC and ACG are both codons 
for threonine, and to change from ACC to ACG 
would therefore be neutral. Of the 549 possible single 
base changes in the 61 amino acid specifying codons, 
134 are substitutions to synonymous codons. These 
should be neutral with respect to natural selection 
except in so far that in some organisms there is natural 
selection favoring the use of some synonymous codons 
over others. 
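The counts of 549 and 134 can be reproduced from the standard genetic code. The Python sketch below encodes the code as the 64-letter string used for NCBI translation table 1 (codons ordered U, C, A, G at each position, '*' for stop) and tallies every single-base change starting from one of the 61 sense codons. Changes to a stop codon count toward the total but are never synonymous.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code (NCBI translation table 1): codons in UCAG order,
# first position varying slowest; '*' marks the three stop codons.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(codon): aa
        for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def single_base_changes(code):
    """Count all single-base changes from sense codons, and the synonymous ones."""
    total = synonymous = 0
    for codon, aa in code.items():
        if aa == "*":
            continue  # start only from amino-acid-specifying codons
        for pos in range(3):
            for base in BASES:
                if base == codon[pos]:
                    continue
                mutant = codon[:pos] + base + codon[pos + 1:]
                total += 1
                if code[mutant] == aa:  # same amino acid: synonymous change
                    synonymous += 1
    return total, synonymous


print(single_base_changes(CODE))  # (549, 134)
```

The total is simply 61 codons × 3 positions × 3 alternative bases; the synonymous changes cluster at third positions and in the six-codon families (Leu, Ser, Arg).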

Mutation pressure should therefore give rise to 
many neutral mutations. In 1961, before the genetic 
code had been discovered, Sueoka noted amino acid 
differences between AT-rich and GC-rich bacterial 
species (Sueoka, 1961). Cox and Yanofsky (1967) stud- 
ied a strain of Escherichia coli containing the Treffers 
mutator allele, which produces a trend toward a DNA 
of a higher GC content than that in the original stock. 
Thousands of such mutations accumulated in labora- 
tory cultures without markedly impairing the fitness 
of the mutated strains. 

In mammalian hemoglobins, most changes in resi- 
dues occurring on the outside of the molecule appear 
to be selectively neutral (or at least they have the 
smallest effect on fitness). In contrast, harmful 
changes are produced when they occur in the interior 
of the molecule. The selective effect of a change is therefore dependent on its location (King and Jukes, 1969).

From these and other considerations, King and 
Jukes (1969) concluded that 


the genome becomes virtually saturated with such changes
that are not eliminated by natural selection. We conclude
that most proteins contain regions where substitutions of
amino acids can be made without producing appreciable 
changes in protein function. The principal evidence for this 
is the astounding variability in primary structure of homo- 
logous proteins from various species and the rapid rate at 
which molecular changes accumulate in evolution. 


The neutral theory, though controversial at times and 
in respect to some types of mutational changes, has 
been immensely important in evolutionary thinking. 
It served to crystallize ideas and, unfortunately, to 
polarize views. In the real world, we still do not 
know what fraction of synonymous nucleotide sub- 
stitutions are neutral, let alone what fraction of con- 
servative amino acid replacements are neutral. But 
judged as a font of new ideas, the neutral theory has 
been a driving force in theoretical evolutionary biol- 
ogy for the last half of the 20th century (Crow and 
Kimura, 1970; Hartl and Clark, 1997). The reason for 
the primacy of the neutral theory is straightforward. It 
is the perfect null hypothesis against whose expect- 
ations observed data can be compared. There is an 
infinity of ways for any mutation to be non-neutral
(selection ≠ 0) but there is only one way for a mutation
to be neutral (selection = 0).
Nondisjunction refers to an abnormal distribution of
chromosomes to cell poles during mitotic or meiotic
cell division. In mitotically dividing cells, nondisjunction
is a failure of sister chromatids to segregate to opposite
cell poles; the resulting daughter cells are aneuploid
(trisomic or monosomic) for the nondisjoined chromosome.
In meiosis, nondisjunction of homologous chromosomes
at anaphase I results in aneuploid gametes, disomic or
nullisomic for a given chromosome.


Nondisjunction can lead to Trisomy,
Monosomy and Uniparental Disomy


After fusion of an aneuploid and euploid gamete, a 
trisomic or monosomic zygote for a nondisjoined 
chromosome is created. The abnormal gene dosage 
caused by a chromosome nondisjunction can be high- 
ly deleterious to a developing mammalian embryo. 
With the exception of monosomy for the X chromo- 
some, all other primary monosomies are preimplant- 
ation lethal in mice and humans. 

By a subsequent loss of one of the supernumerary 
trisomic chromosomes (through nondisjunction), the
trisomy can be converted into uniparental disomy (UPD;
see Figure 1). UPD exists in the form of a heterodisomy,
when sequences of both homologs of the transmitting 
parent are detected, or as an isodisomy, when two 
identical segments from the same parent are observed. 
Both UPD forms can exert a phenotypic effect if the 
UPD region encompasses imprinted genes. Moreover, 
isodisomies can disclose phenotypes of recessive 
mutations when they become homozygous. As a con- 
sequence of meiotic crossing over, a chromosome 
involved in the UPD can be heterodisomic in one 
and isodisomic in the other part. 
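The heterodisomy/isodisomy distinction can be read marker by marker along the chromosome. The Python sketch below assumes the UPD region is already known, so that both of the child's alleles at each marker came from the transmitting parent; the allele labels in the example are hypothetical. Only markers at which that parent is heterozygous are informative.

```python
def classify_upd_markers(parent_genotypes, child_genotypes):
    """Classify each marker of a UPD chromosome as hetero- or isodisomic.

    Both arguments are lists of (allele, allele) tuples, one per marker in
    chromosomal order, for the transmitting parent and for the child.
    Assumes both of the child's copies derive from that parent.
    """
    calls = []
    for parent, child in zip(parent_genotypes, child_genotypes):
        if parent[0] == parent[1]:
            calls.append("uninformative")  # homozygous parent: cannot tell
        elif sorted(child) == sorted(parent):
            calls.append("hetero")  # both parental homologs present
        elif child[0] == child[1] and child[0] in parent:
            calls.append("iso")  # one parental segment present twice
        else:
            calls.append("inconsistent")  # allele not from this parent
    return calls


parent = [("A", "B"), ("C", "C"), ("E", "F"), ("G", "H")]
child = [("A", "B"), ("C", "C"), ("F", "F"), ("G", "G")]
print(classify_upd_markers(parent, child))
# ['hetero', 'uninformative', 'iso', 'iso']
```

The example shows the pattern described in the last sentence: heterodisomic on one side of a crossover and isodisomic beyond it.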


Nondisjunction in Translocation 
Heterozygotes 


Organisms heterozygous for a reciprocal chromosome
translocation are prone to a higher frequency of abnor-
mal meiotic disjunction, including nondisjunction.
Only alternate disjunction, which places either both
translocated chromosomes or both intact homologs
in the gamete, leads to a balanced, euploid genome.
Adjacent I and adjacent II disjunctions combine one
translocated and one intact chromosome from the
pachytene translocation cross and result in partial
nullisomy of one translocated chromosome associated
with partial disomy of the other chromosome in-
volved in the rearrangement (Figure 2). Nondisjunction
can also occur in the 3:1 form, when three chromosomes
(or only one) of the translocation cross (Figure 2) enter
the secondary gametocyte. The resulting N + 1 aneuploid
gamete contains an extra chromosome composed of
parts of the two chromosomes involved in the trans-
location. After fertilization with a normal gamete, this
extra chromosome gives rise to tertiary trisomy of the
embryo. The N − 1 aneuploid gamete, if functional,
results in preimplantation lethality when fused with a
euploid gamete.

oogenesis, two maternal (M) and one paternal (P) homologs create a trisomy in the early embryo. Due to a meiotic
crossing-over between grand-maternal (GM) and grand-paternal (GP) homologs, the uniparental disomy is isodisomic
beyond the recombination breakpoint.

The unbalanced gametes occur in translocation het- 
erozygotes with a frequency of approximately 50% 
and result in inviable embryos. The phenomenon is 
referred to as ‘semisterility’ in mice, since the trans- 
location heterozygotes display about half of the nor- 
mal number of pups in their litters. Human reciprocal 
translocation carriers can have a family history of 
frequent spontaneous abortions. 
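Whether a disjunction product is balanced can be checked by bookkeeping of chromosome segments. In the Python sketch below the labels are assumptions: chromosome 1 is split into a centric segment A and a distal segment B, chromosome 2 into C and D, and the reciprocal exchange gives the translocated chromosomes 1t (A + D) and 2t (C + B). A gamete is balanced only if it carries every segment exactly once. The six 2:2 combinations are listed with equal weight, which does not reflect their actual frequencies in meiosis.

```python
from collections import Counter
from itertools import combinations

# Hypothetical segment labels for a reciprocal translocation between
# chromosomes 1 (A centric, B distal) and 2 (C centric, D distal).
SEGMENTS = {"1": "AB", "2": "CD", "1t": "AD", "2t": "CB"}


def dosage(gamete):
    """Copy number of each segment carried by a gamete."""
    return Counter(seg for chrom in gamete for seg in SEGMENTS[chrom])


def is_balanced(gamete):
    """Balanced (euploid) means every segment is present exactly once."""
    d = dosage(gamete)
    return all(d[s] == 1 for s in "ABCD")


def two_two_products():
    """All ways of sending two of the four chromosomes of the cross to one pole."""
    return [(g, is_balanced(g))
            for g in combinations(["1", "1t", "2", "2t"], 2)]


for gamete, ok in two_two_products():
    print(" + ".join(gamete), "balanced" if ok else "unbalanced")
```

Only 1 + 2 and 1t + 2t (alternate disjunction) come out balanced; every adjacent product duplicates one distal segment and lacks another, and a 3:1 product such as 1 + 2 + 1t carries extra copies of A and D, the situation that leads to tertiary trisomy.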


Clinical Consequences of Chromosome 
Nondisjunction 


Errors in chromosome disjunction have a major effect
on human reproduction. It has been estimated that
15–20% of all clinically recognized pregnancies end in
spontaneous abortion. Of these, 50% are caused by
chromosome nondisjunction and the resulting trisomies.
Trisomy of chromosome 16, which is incompatible with
postnatal survival, is apparently the most frequent
nondisjunction in humans, since it occurs in 1.5% of all
recognized pregnancies. The most frequent trisomy
observed in newborns is trisomy for chromosome
21, known as Down's syndrome. Using DNA poly-
morphic markers, trisomy 21 was shown to be of
maternal origin in 90% of cases, predominantly caused
by errors in meiosis I. The mechanism of nondisjunction
is unknown, but its frequency increases with maternal age.
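Combining the first two estimates in this section gives the share of all clinically recognized pregnancies lost through nondisjunction. This is a simple product of the quoted figures, assuming the 50% applies across the whole 15–20% range:

```latex
0.15 \times 0.50 = 0.075, \qquad 0.20 \times 0.50 = 0.10
```

That is, roughly 7.5–10% of recognized pregnancies, of which trisomy 16 alone (1.5%) is a substantial part.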


Molecular Biology of Nondisjunction 


During normal mitotic cell division, the sister chro-
matids are distributed to the daughter cells by attach-
ment of their kinetochores to microtubules from
opposite cell poles. Most of the information on the mole-
cular players in chromosome disjunction comes from
genetic and biochemical analysis of budding yeast,
Saccharomyces cerevisiae. The cohesins, including the
Scc1p protein, act as a glue holding sister chromatids
together. The separation of sister chromatids is regulated
by ubiquitin-mediated proteolysis, via three protein
complexes: E1 (ubiquitin-activating enzyme), E2
(ubiquitin-conjugating enzyme), and E3 (ubiquitin
ligase). In S. cerevisiae, the E3 involved is the
anaphase-promoting complex or cyclosome (APC/C).
It specifically targets the inhibitor Pds1p for degradation,
allowing sister chromatid separation. The role of Pds1p
is to inhibit sister chromatid separation by preventing
Esp1p from stimulating Scc1p cleavage. The spindle
checkpoint mechanism ensures, through a signal
transduction cascade, that mitosis does not proceed to
anaphase if one or more chromatids are not properly
attached to spindle microtubules. The kinetochores thus
can be viewed as 'sensors' that recognize the unattached
chromosomes and initiate a signal
causing arrest of the cell cycle. The mechanism by
which nondisjunction overrides the spindle checkpoint
mechanism is not yet clear.

[Figure 2: pachytene translocation cross (chromosomes 1, 2, 1t, 2t) and a table giving, for alternant, adjacent I, adjacent II and 3:1 (N + 1) disjunction, the combination of chromosomes in the gametes and whether the haploid genome is balanced (alternant only) or unbalanced]

Figure 2 Chromosome disjunction in meiosis of a reciprocal translocation heterozygote. Only alternant (Alt.)
disjunction results in balanced gametes. Recombination between a centromere and the translocation break (not
shown in the picture) results in uneven chromatids, one of which can end in an unbalanced, adjacent I product and the
other can yield a balanced gamete with alternate disjunction. Nondisjunction (3:1) can lead to tertiary trisomies.
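The chain of inhibition described above for budding yeast (spindle checkpoint, APC/C, Pds1p, Esp1p, Scc1p) can be summarized as a toy boolean model. This Python sketch is only a logical caricature under stated assumptions: each protein is simply on or off, and the `checkpoint_functional` flag is a hypothetical switch showing what a failed checkpoint would permit. It says nothing about how nondisjunction actually overrides the checkpoint, which the text notes is unclear.

```python
def anaphase_states(all_kinetochores_attached, checkpoint_functional=True):
    """Toy on/off model of the yeast sister-chromatid separation pathway."""
    # Unattached kinetochores send a 'wait' signal that keeps APC/C
    # inactive, unless the (hypothetically) defective checkpoint ignores it.
    apc_active = all_kinetochores_attached or not checkpoint_functional
    # Active APC/C (the E3) targets the inhibitor Pds1p for degradation.
    pds1_present = not apc_active
    # While Pds1p is present it keeps Esp1p inactive.
    esp1_active = not pds1_present
    # Active Esp1p leads to Scc1p cleavage, releasing the sister chromatids.
    scc1_cleaved = esp1_active
    return {"APC/C active": apc_active, "Pds1p present": pds1_present,
            "Esp1p active": esp1_active, "Scc1p cleaved": scc1_cleaved}


print(anaphase_states(True)["Scc1p cleaved"])   # True: normal anaphase
print(anaphase_states(False)["Scc1p cleaved"])  # False: checkpoint arrest
print(anaphase_states(False, checkpoint_functional=False)["Scc1p cleaved"])  # True
```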


See also: Disjunction; Translocation 
Non-Hodgkin’s lymphoma (NHL) is the collective
term that describes all solid tumors of lymphocytes 
other than, for historical reasons, the specific subtype 
known as Hodgkin’s lymphoma. Lymphocytes derive 
from stem cells in the bone marrow from where they 
emigrate to form organized collections of lymphoid 
tissue that comprise the thymus, lymph nodes, spleen, 
Waldeyer’s ring, and nodular aggregates in the intes- 
tine. Divided into T cells and B cells according to their 
immune function, lymphocytes are continually circu- 
lating and may accumulate to form organized lymph- 
oid tissue in any site of chronic inflammation. 
Approximately 70% of cases of NHL arise from 
lymph nodes while the remaining 30%, comprising 


the extranodal lymphomas, arise from lymphoid tis- 
sue in other organs and from sites normally lacking 
organized lymphoid tissue such as the stomach, skin, 
brain, and testis. 


Incidence and Etiology 


Non-Hodgkin’s lymphoma is principally a disease of 
the elderly but a significant number of cases occur in 
younger adults and children. There are approximately 
8000 new cases per year in England and Wales and the 
incidence is rising faster than that of any other cancer. 
Except in very few instances, the etiology is unknown. 
Viruses including Epstein-Barr virus (EBV), human
T-cell lymphotropic virus-1 (HTLV-1), and human
herpesvirus-8 (HHV-8) are associated with some of
the rarer varieties of NHL. Epstein-Barr virus is the
most important etiological agent in many of the lymph- 
omas occurring in patients with congenital or acquired 
immunodeficiency, in which there is an increased inci- 
dence of NHL, but its role in lymphomas of immuno- 
competent individuals, although long suspected, has 
never been proven conclusively. 


Classification 


Non-Hodgkin’s lymphoma is not a homogeneous dis-
ease but comprises a wide variety of different tumors.
The classification of this group of tumors has been
difficult and contentious. As newer techniques for
studying NHL have emerged, the classification has
improved and become more reproducible. The current
Revised European and American Lymphoma (REAL)
classification and the related World Health Organiza-
tion classification group NHL into B-cell and T-cell/
natural killer (NK)-cell types, which account for 85%
and 15% of cases, respectively. Each type is further
subdivided into tumors of precursor and mature
lymphocytes. Individual entities, some of which are
better characterized than others, are defined according
to their histology, phenotype and genotype, normal
cell counterpart, and clinical features.


Histology and Immunophenotype 


The histological appearance of a lymphoma is, in
effect, the collective expression of its immunophenotype,
genotype, normal cell counterpart and, to some
extent, clinical aggressiveness; as such, it remains the
mainstay of lymphoma diagnosis. Once an entity has
been defined on the basis of its collective properties,
histology on its own is often sufficient for a definitive
diagnosis. Immunophenotypic markers that define
cell lineage and functional properties help to define
individual entities. An increasing number of markers
that recognize proteins synthesized as the result of
distinctive molecular genetic abnormalities are now
becoming available, and in some instances they may
serve on their own to define a specific type of NHL.
An example is anaplastic large cell lymphoma, which
is characterized by t(2;5)(p23;q35). This translocation
juxtaposes the nucleophosmin (NPM) gene on
chromosome 5 with the anaplastic lymphoma kinase
(ALK) gene on chromosome 2, resulting in expression
of a novel fusion protein, NPM-ALK, that can be
detected using monoclonal antibodies.


Molecular Genetics 


Molecular genetics provides useful and increasingly
practical tools both for the diagnosis of NHL and for
understanding its biology. Non-Hodgkin’s lymphomas
comprise monoclonal populations derived from
a single B or T cell with uniquely rearranged
immunoglobulin or T-cell receptor genes. These
rearrangements, which can be detected by Southern
blotting or the polymerase chain reaction, can be
exploited in differentiating reactive (polyclonal) from
neoplastic (monoclonal) accumulations of lymphocytes
and in assigning a cell lineage to NHL. Analysis of the
presence and/or frequency of immunoglobulin gene
mutations can further specify subtypes of B-cell
NHL. With increasing recognition that cancer is a
genetic disease, the genotype of lymphomas is assuming
greater significance in their classification and diagnosis.
Characteristic chromosomal translocations, which
often juxtapose apoptosis or cell cycle genes with
immunoglobulin genes, appear to play a major role
in the pathophysiology of B-cell NHL. Two notable
examples are t(14;18)(q32;q21), which results in
upregulation of the anti-apoptotic bcl-2 gene in
follicular lymphoma, and t(11;14)(q13;q32), which
results in overexpression of the important cell cycle
regulator gene cyclin D1 in mantle cell lymphoma.
Mutations of cell cycle and DNA repair genes,
including p53, have also been described in NHL and
are important in disease progression.


Normal Cell Counterpart 


Many NHL are clearly related to a normal cell
counterpart. Identifying that counterpart is a useful
aid to classification and to understanding the tumor’s
clinical behavior, which may reflect the physiological
behavior of the normal cell.


Clinical Features 


Other clinical features, including site of origin and 
aggressiveness, are an integral and practical part of 
the definition of lymphomas as distinct diseases. The 
site of origin of NHL is an important consideration.
The distribution of lymphoma types differs markedly
between sites, and in some organs and tissues, such
as the skin, the gastrointestinal tract, and to a lesser
extent the spleen, lymphomas occur that are more or
less specific to that site. Clinical aggressiveness varies
between the different NHL categories and is partly a
function of histological grade, which reflects the size
of the lymphoma cells and their nuclear characteristics.
Some types of NHL may transform from a low-grade,
clinically indolent tumor to one that is high-grade and
clinically aggressive, whereas others are clinically
aggressive de novo.


Prognosis 


Clinical aggressiveness is not the same as prognosis,
with which it is often confused. A high-grade and
clinically aggressive NHL may show an excellent
response to therapy and have a good prognosis. A
variety of prognostic factors within each case of NHL
influence the clinical outcome. One of these is
histological grade, but clinical features are also
important. The more important of these have been
combined to form the International Prognostic Index
(IPI), which is a powerful predictor of clinical
outcome in individual patients. The prognosis
of NHL is highly variable between the different types
and, within any given type, varies with the IPI. As a
generalization, indolent low-grade NHL tend to follow
a prolonged clinical course but tend to be incurable,
while approximately 50% of the more aggressive
high-grade tumors may be cured, the remaining
patients dying of their disease within a relatively short period.
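The text describes the IPI only as a collection of clinical features. As an illustration, the Python sketch below assumes the five adverse factors of the original 1993 index (age over 60 years, Ann Arbor stage III or IV, raised serum lactate dehydrogenase, ECOG performance status of 2 or more, and more than one extranodal site) and its four conventional risk groups. The `Patient` class and the function names are hypothetical, and the sketch is not a clinical tool.

```python
from dataclasses import dataclass


@dataclass
class Patient:
    """Hypothetical record holding the clinical features scored by the IPI."""
    age: int                 # years
    ann_arbor_stage: int     # 1-4
    ldh_elevated: bool       # serum LDH above the upper limit of normal
    ecog_status: int         # ECOG performance status, 0-4
    extranodal_sites: int    # number of involved extranodal sites


def ipi_score(p: Patient) -> int:
    """Count the adverse factors (one point each, 0-5 in total)."""
    factors = [
        p.age > 60,
        p.ann_arbor_stage >= 3,
        p.ldh_elevated,
        p.ecog_status >= 2,
        p.extranodal_sites > 1,
    ]
    return sum(factors)  # True counts as 1, False as 0


def ipi_risk_group(score: int) -> str:
    """Map a score to the four conventional IPI risk groups."""
    if score <= 1:
        return "low"
    if score == 2:
        return "low-intermediate"
    if score == 3:
        return "high-intermediate"
    return "high"


patient = Patient(age=67, ann_arbor_stage=3, ldh_elevated=True,
                  ecog_status=1, extranodal_sites=1)
score = ipi_score(patient)
print(score, ipi_risk_group(score))  # 3 high-intermediate
```

Because each factor contributes one point, the index reflects how many adverse features are present rather than which ones.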


Treatment 


Lymphocytes are circulating cells whose function it is 
to patrol throughout the body searching out harmful 
antigens. Therefore, it is to be expected that their 
neoplastic counterparts would be equally widely dis- 
seminated. With few exceptions this is indeed the case. 
Thus, local treatment, either surgery or radiotherapy, 
is appropriate for only a minority of localized (low 
clinical stage) cases, while systemic chemotherapy is 
the treatment of choice for most cases that are likely to 
have already disseminated, albeit subclinically, at diag- 
nosis. The administration of cytotoxic agents either 
singly or, more commonly, in various combinations 
forms the basis of NHL treatment. Ideally, the opti- 
mum combination of drugs for each type of NHL is 
established on the basis of stringent clinical trials. 
More recently the role of bone marrow transplant- 
ation has been explored and immunotherapeutic 
maneuvers, including administration of cytotoxic 
monoclonal antibodies and DNA immunization, have
been used with some success.


See also: Cancer Susceptibility; Epstein-Barr 
Virus (EBV); Immunity 
Non-Mendelian Inheritance


When Mendel’s laws were rediscovered at the beginning
of the twentieth century (by three separate investigators,
all studying inheritance in plants), there was
much skepticism in the scientific community as to how
all-inclusive these laws would be in explaining
heritability in all sorts of plants and animals. In
particular, there was a general disbelief that Mendelian
principles could have any bearing on the inheritance of
common variation in phenotype observed among
human beings. It is easy to understand the basis for
this skepticism, for it is hard (if not impossible) to find
even a single commonly inherited variation in humans
that is transmitted according to classical Mendelian
ratios.

Just look around at your friends and acquaintances 
and you will see a whole host of characteristics that 
distinguish people from each other — including height, 
skin color, facial shape, eye color, and hair density, 
color, and shape (straight, wavy, or curly). If you 
look ‘deeper’ into people with the help of tools from
the medical trade, you will find other characteristic
differences in blood pressure, cholesterol levels, vari- 
ous metabolic processes, and susceptibility or resist- 
ance to a variety of infectious diseases. Finally, if you 
looked in a broader way and compared whole families 
(rather than individuals) to each other, you would find 
striking differences in familial propensity toward 
heart disease, alcoholism, various forms of cancer, 
hypertension, allergies, and mental illnesses such as 
schizophrenia and manic depression. 

It has long been obvious that inheritance plays an 
important role in the expression of all these various 
characteristics. But, at the time of the rediscovery of 
Mendel’s laws, it was also clear that the inherited 
components of these traits are not transmitted accord- 
ing to the simple ratios predicted by Mendel. These 
cases of complex genetic transmission stand in con- 
trast to cases of simple transmission soon observed for 
a variety of human disease phenotypes such as albin- 
ism, sickle-cell anemia, cystic fibrosis, and Tay-Sachs 
disease. As a consequence, early geneticists were 
forced to divide observed patterns of inheritance into 
two classes: Mendelian and so-called ‘non-Mendelian’,
based on whether a trait could be explained by the 
segregation of simple dominant and recessive alleles 
from a single locus. 

By the 1930s, it had become clear that essentially 
all instances of ‘non-Mendelian inheritance’ could 
indeed be explained in the context of Mendel’s laws 
of inheritance (with the addition of linkage). This new 
understanding was based on an appreciation for the 
fact that genes are always transmitted according to 
Mendel’s laws, but that the connection between geno- 
type and phenotype can be more complicated than 
that first imagined by Mendel. Some scientists persist 
in calling complex patterns of inheritance non- 
Mendelian. This is outdated and inappropriate. In 
most cases, so-called non-Mendelian inheritance can 
be attributed to complexities of gene function, rather 
than gene transmission. Incomplete penetrance,
polygenic inheritance, and variable expressivity, for
example, all appeared to early geneticists as forms of
non-Mendelian inheritance.
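Incomplete penetrance shows how strictly Mendelian transmission can still produce a non-Mendelian phenotypic ratio. The Python sketch below assumes a dominant allele A in an Aa × Aa cross: genotypes still segregate 1 AA : 2 Aa : 1 aa, but only a fraction of A carriers (the penetrance, an illustrative parameter) express the trait. The function name `phenotype_ratio` is hypothetical.

```python
from fractions import Fraction


def phenotype_ratio(penetrance: Fraction) -> Fraction:
    """Affected : unaffected ratio among offspring of an Aa x Aa cross
    for a dominant allele A expressed with the given penetrance."""
    # Transmission follows Mendel: 1 AA : 2 Aa : 1 aa, so 3/4 carry A.
    carriers = Fraction(3, 4)
    # Gene function, not transmission, is incomplete: only some carriers are affected.
    affected = carriers * penetrance
    unaffected = 1 - affected
    return affected / unaffected


for p in (Fraction(1), Fraction(4, 5), Fraction(1, 2)):
    print(f"penetrance {p!s}: affected/unaffected = {phenotype_ratio(p)!s}")
```

With 80% penetrance the expected ratio falls from 3:1 to 3:2, even though the alleles themselves are transmitted exactly as Mendel predicted.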


See also: Complex Traits; Quantitative 
Inheritance 
Nonreciprocal Recombination


When two chromosomes in meiosis interact to produce
recombinants for markers that are more than a
kilobase (or so) apart, the recombinants usually arise in 
complementary pairs. In a cross AB x ab, the recom- 
binants aB and Ab arise in a single act of exchange. 
That reaction, which involves breakage and rejoining 
of the chromatids, is conservative (two chromatids in, 
two chromatids out) and reciprocal (complementary 
recombinants arise in the same, individual act). 

With closer markers, the complementary recombin- 
ant types often arise in separate acts, with each of the 
following outcomes being comparably probable: in a 
cross AB x ab, the products of recombination are 
typically (AB + aB) or (AB + Ab) or (Ab + ab) or 
(aB + ab). The reaction is conservative but
nonreciprocal, involving the loss of a marker and replacement by
its alternative (gene conversion). 

In prokaryotes, homology-dependent recombin- 
ation is frequently nonreciprocal and sometimes 
nonconservative (two DNA molecules in, one DNA 
molecule out). 


See also: Gene Conversion; Recombination, 
Models of 


Nonrepetitive DNA 
Nonrepetitive DNA is DNA that demonstrates the
reassociation kinetics expected of unique sequences. 


See also: DNA 


Nonsense Codon 
A nonsense codon is any one of the three triplets UAA
(ochre), UAG (amber), and UGA (opal) that do not code
for an amino acid but act as signals for the termination
of protein synthesis.
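Translation of an mRNA stops at the first in-frame nonsense codon. The short Python sketch below scans a sequence codon by codon from an assumed starting position and reports the first such triplet; the function name `first_stop` is hypothetical, and DNA input is converted to RNA for convenience.

```python
# Map each stop (nonsense) codon to its conventional name.
STOP_CODONS = {"UAA": "ochre", "UAG": "amber", "UGA": "opal"}


def first_stop(mrna: str, frame_start: int = 0):
    """Return (index, codon) for the first in-frame stop codon, or None.

    DNA input (T instead of U) is accepted and converted to RNA.
    """
    seq = mrna.upper().replace("T", "U")
    # Step through the sequence one codon (three bases) at a time.
    for i in range(frame_start, len(seq) - 2, 3):
        codon = seq[i:i + 3]
        if codon in STOP_CODONS:
            return i, codon
    return None


print(first_stop("AUGGCCUAGGCA"))  # (6, 'UAG')
```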


See also: Amber Codon; Ochre Mutation 


Nonsense Mutation 
A nonsense mutation is any change in DNA that
causes a nonsense (termination) codon to replace a 
codon representing an amino acid. 
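For the simplest case, a single-base substitution, the definition can be checked directly on the codon. The hypothetical helper below works at the level of the mRNA codon rather than the DNA and reports whether a substitution converts a sense codon into one of the three stop codons.

```python
STOP_CODONS = {"UAA", "UAG", "UGA"}


def is_nonsense_substitution(codon: str, position: int, new_base: str) -> bool:
    """True if replacing the base at `position` (0-2) of a sense codon
    with `new_base` turns it into a stop codon."""
    codon = codon.upper()
    mutant = codon[:position] + new_base.upper() + codon[position + 1:]
    return codon not in STOP_CODONS and mutant in STOP_CODONS


print(is_nonsense_substitution("CAG", 0, "U"))  # True: Gln CAG -> UAG (amber)
print(is_nonsense_substitution("UGG", 2, "A"))  # True: Trp UGG -> UGA
print(is_nonsense_substitution("GCC", 0, "U"))  # False: Ala GCC -> UCC (Ser)
```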


See also: Mutation; Nonsense Codon 


Nonsense Suppressor 


Copyright © 2001 Academic Press 
doi: 10.1006/rwgn.2001.1932 


A nonsense suppressor is a gene coding for a mutant 
tRNA with the ability to respond to one or more of 
the nonsense codons. 


See also: Nonsense Codon; Transfer RNA (tRNA) 


Nontranscribed Spacer 


Copyright © 2001 Academic Press 
doi: 10.1006/rwgn.2001.1933 


A nontranscribed spacer is the region between trans- 
cription units in a tandem gene cluster. 


See also: Transcription; Transcribed Spacer 


Northern Blotting 


J Eberwine and Y Sugimoto 


Copyright © 2001 Academic Press 
doi: 10.1006/rwgn.2001.0907


Northern blotting is a widely used procedure for
analyzing the molecular size and abundance of mRNA.
This procedure requires the isolation of RNA from
tissue samples or cultured cells. There are a number
of RNA isolation procedures, including those that use
chaotropic reagents (to inhibit endogenous RNase)
and differential nucleic acid precipitation (to separate
RNA from DNA), that yield total RNA for
characterization by Northern blot analysis. For Northern
blot analysis, the RNA is denatured and loaded on a
denaturing agarose gel, and the RNA species are
separated by electrophoresis. After electrophoresis the
RNA is transferred from the gel to a nylon membrane
by either diffusion blotting or electroblotting. If
diffusion blotting is used to transfer the RNA from the
gel to the membrane, the transfer buffer is usually a
high-molarity salt solution so that the charged nucleic
acids will move with the salt through the gel and onto
the membrane (Figure 1).

After transfer, the membrane is either UV-crosslinked
or baked in a vacuum oven at 80°C to irreversibly
attach the RNA to the filter. The next step involves
prehybridizing the filter in a blocking solution, which
provides reagents that bind to all of the reactive sites
on the membrane that are not already associated with
RNA. After prehybridization the filter is exposed to a
solution containing a suitable probe and hybridization
is begun. The types of probe can vary; either DNA or
antisense RNA can be used. The probe is usually
radioactively labeled so that a hybridization signal can be
visualized on film or using a phosphorimager system.
Alternatively, the probe can be made with a label that
permits antibody detection (e.g., digoxigenin). The
anti-digoxigenin antibody in turn is usually conjugated
to an enzyme that will convert a substrate to product
at the site of antibody binding. In this way hybridization
can be visualized by nonradioactive procedures
including chemiluminescence (Figure 2). An example 
of a Northern blot is presented in Figure 3. In this 
example, cDNA probes specific for prostaglandin
synthase-2 (COX-2, lanes 1 and 2) and for the
prostaglandin receptor EP4 (lanes 3 and 4) were used to
screen RNA from a macrophage-like cell line, RAW 264.7,
to assess the abundance and size of the cognate mRNAs.
Total RNA (20 µg) isolated from nonstimulated
cells (lanes 1 and 3) or lipopolysaccharide (LPS)-stimulated
cells (lanes 2 and 4) was electrophoretically separated
on a 1.2% agarose gel. Hybridization bands for COX-2
mRNA are seen at a size of 4.1 kb and bands for EP4
mRNA at 3.7 kb. EP4 mRNA, but not COX-2 mRNA,
is detected in nonstimulated cells. However, COX-2
mRNA expression is highly induced in the LPS-treated
cells, while EP4 mRNA expression remains constant
during stimulation.

Figure 3 A Northern blot.

As illustrated in Figure 3, total RNA can be used
in Northern blotting. It should be remembered, however,
that only about 1-3% of total RNA is mRNA, which is
usually the class of RNAs being examined. Northern
blotting has a sensitivity of detection of approximately
1-5 ng of a particular species of mRNA. The amount
of mRNA that can be detected on Northern blots is
dictated by the specific activity of the probe and the
amount of RNA loaded on the denaturing gel. If an
mRNA cannot be detected in a total RNA sample, the
poly(A)+ mRNA can be isolated from the total RNA
using oligo-dT as a ‘hook’ to anneal to the poly(A)+
mRNA and remove it from the total RNA population.
The poly(A)+ mRNA can then be concentrated and run
on the denaturing gel in place of total RNA. The
enrichment this offers is illustrated by the following
example: if 30 µg of total RNA is loaded on a denaturing
gel, approximately 1 µg of this is poly(A)+. If the
poly(A)+ RNA is isolated from the total RNA sample,
concentrated, and 30 µg is loaded on the gel, a thirtyfold
increase in sample will be available for hybridization
with the probe, thereby facilitating the visualization
of the previously undetectable mRNA species.
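The thirtyfold figure follows from simple mass bookkeeping. The Python sketch below reproduces it for the 30 µg example in the text. It assumes a hypothetical transcript that makes up 0.1% of the mRNA population and takes the upper end of the quoted 1-5 ng sensitivity as an assumed detection limit; the function name `target_mass_ng` is also hypothetical.

```python
from fractions import Fraction

# Assumed detection limit: upper end of the ~1-5 ng sensitivity quoted in the text.
DETECTION_LIMIT_NG = 5


def target_mass_ng(load_ug, polya_fraction, target_share):
    """Mass in ng of one mRNA species in a gel lane.

    load_ug        -- RNA loaded on the gel, in micrograms
    polya_fraction -- fraction of the loaded RNA that is poly(A)+ mRNA
    target_share   -- fraction of the poly(A)+ mRNA that is the species of interest
    """
    return load_ug * 1000 * polya_fraction * target_share


# Hypothetical transcript making up 0.1% of the mRNA population.
share = Fraction(1, 1000)
total = target_mass_ng(30, Fraction(1, 30), share)  # 30 µg total RNA, ~1 µg of it poly(A)+
enriched = target_mass_ng(30, Fraction(1), share)   # 30 µg of purified poly(A)+ RNA

print(f"total RNA lane: {float(total):.1f} ng, detectable: {total >= DETECTION_LIMIT_NG}")
print(f"poly(A)+ lane: {float(enriched):.1f} ng, detectable: {enriched >= DETECTION_LIMIT_NG}")
print(f"enrichment: {enriched / total!s}-fold")
```

Exact fractions are used so that the ratio comes out as exactly 30 rather than a rounded floating-point value.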

The term ‘reverse Northern,’ used recently in the 
microarray literature, refers to a procedure in which 
DNAs are attached to a nylon membrane followed 
by hybridization with labeled RNA probes. This 
procedure can provide quantitative data regarding 
mRNA abundance but will not provide any infor- 
mation concerning mRNA size. 


See also: Messenger RNA (mRNA) 
Nuclear Envelope, Transport


The nuclear envelope (NE) represents the boundary
of the interphase nucleus. It regulates the composition
and structure of the nucleus by providing a selective
barrier to control the exchange of material between 
the nucleoplasm and cytoplasm, and by being in- 
volved in the maintenance of nuclear architecture 
and chromatin organization. 


Structure of the Nuclear Envelope 


The NE consists of two continuous, distinct, parallel
membranes, the inner and outer nuclear membranes,
enclosing a perinuclear space (Figure 1). The outer
nuclear membrane and perinuclear space are continuous
with the endoplasmic reticulum, and share its
functions. The inner nuclear membrane is compositionally
distinct from the outer nuclear membrane and
in many cells is lined with a fibrous nuclear lamina on
its nucleoplasmic face. The nuclear lamina is a
lattice-like sheet of variable width, consisting mainly of
polymers of filamentous lamin proteins (which are related
to intermediate filament proteins), and is thought to
contribute to the structural integrity of the NE.

Figure 1 The nuclear pore complex.


The Nuclear Pore Complex 


The sole mediators of exchange across the NE are the 
nuclear pore complexes (NPCs), large proteinaceous 
assemblies embedded within nuclear pores formed by 
the fusion of the inner and outer nuclear membranes. 
Although small molecules (such as nucleotides, water, 
and ions) can freely diffuse across the NPCs, macro- 
molecules such as proteins and ribonucleoprotein par- 
ticles are actively transported in a highly regulated and 
selective manner. The NPC thus acts as a gate, limiting
the permitted size of transiting molecules: while
molecules larger than 9 nm in diameter cannot passively
diffuse across the NPC, molecules with a
diameter of up to ~30 nm can be actively and efficiently
transported by the NPC. Each NPC is capable
of bidirectional transport across the NE, and it is 
estimated that in actively growing cells many hun- 
dreds of proteins and ribonucleoprotein complexes 
cross each NPC every minute. 
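The size gate can be summarized as a simple threshold rule. The Python sketch below assumes the approximate limits of 9 nm and ~30 nm quoted in the text and deliberately ignores everything else that matters in practice (transport signals, carriers, and the remodeling of large particles), so it is a mnemonic rather than a model. The function name `npc_route` is hypothetical.

```python
# Approximate size limits quoted in the text, in nanometres.
PASSIVE_LIMIT_NM = 9.0   # larger molecules cannot passively diffuse
ACTIVE_LIMIT_NM = 30.0   # approximate upper bound for efficient active transport


def npc_route(diameter_nm: float) -> str:
    """Classify a particle by diameter alone (a deliberate simplification)."""
    if diameter_nm <= PASSIVE_LIMIT_NM:
        return "can diffuse passively"
    if diameter_nm <= ACTIVE_LIMIT_NM:
        return "requires active transport"
    return "exceeds the active transport limit"


for d in (2.0, 15.0, 40.0):
    print(d, npc_route(d))
```

Note that "can diffuse passively" does not exclude active transport: many macromolecules below 9 nm are still carried by transport factors.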

The general morphology, composition, and trans- 
port processes (as currently understood) of NPCs 
appear to be highly conserved between evolutionarily
divergent phyla. The NPC is an ~100 MDa supramolecular
cylindrical assembly whose constituent proteins
are termed nucleoporins (Figure 1). Eight
spokes surround a hollow tube-like central transport- 
er. All macromolecules transit the NPC through the 
aqueous central channel of the central transporter. 
Each spoke is composed of several struts and is 
attached to its neighbors by four coaxial rings: the 
inner spoke ring, an outer spoke ring within the 
lumen of the nuclear envelope, a cytoplasmic ring, 
and a nucleoplasmic ring. A considerable portion of 
every spoke traverses the pore membrane and resides 
in the lumen of the NE. These structures comprise the 
cylindrical core, which appears nearly mirror sym- 
metric in the plane of the NE. Peripheral filaments 
project from this core into the nucleoplasm and
cytoplasm. These are asymmetric in the plane of the NE;
while eight cytoplasmic filaments spread from the 
cytoplasmic ring, the nuclear filaments attached to 
the nucleoplasmic ring conjoin distally to form the 
nuclear basket. The composition of the Saccharomyces 
NPC has recently been elucidated, and was found to 
contain ~30 different nucleoporins, a surprisingly 
small number for such a large structure. However, it 
seems that the presence of these proteins in high copy 
numbers (8, 16, or 32 copies per NPC) accounts for 
both the large size and high degree of symmetry 
observed for the NPC. 


Nucleocytoplasmic Transport: 
Karyopherins and the Ran GTPase Cycle 


There is great diversity in the macromolecules that 
move between the nucleus and cytoplasm. These 
include ribonucleoprotein particles (RNPs) such as 
those that contain mRNA, ribosomal subunits, small 
nucleolar RNPs, as well as many structurally different 
soluble proteins, and even viral particles that contain 
preintegration DNA complexes. Because these 
macromolecules exceed the diffusion limit of the 
NPC, they are actively transported in both directions 
by nuclear transport factors that escort them through 
the NPC. The transport factors recognize nuclear 
localization signals (NLSs) present on cargoes to be 
imported, while substrates to be exported from the 
nucleus harbor nuclear export signals (NESs). 
Although the signals can be very different, ranging 
from a variety of short amino acid sequences to 
specific nucleotides in tRNA, most of the transporters 
are structurally related and thus form a family of 
transporters termed the karyopherins (or importins 
and exportins). In yeast this family has at least 14 
members identified by their structural similarities 
and characterized in terms of both their cargoes and 
their direction of transport. In metazoans, the family 
is much larger, and some transporters appear to be 
cell-type specific, suggesting the potential for an
elaborate control program coordinating nuclear transport
with changes in cellular demands. Unlike the prototype
karyopherin (Kapβ1), which uses an adapter protein
(Kapα) to bind its cargo, other family members
bind directly to the specific NLS or NES on their
cargoes. Often, the signals also overlap with other
functional domains within the cargo. For example, in
some instances NLSs overlap with RNA- or DNA-binding
domains. Thus, in a cargo/carrier complex, the
RNA/DNA-binding domains remain masked until
the cargo is released at its destination, where it carries
out its specific function. Despite the variability in
NLSs and NESs, there is remarkable redundancy in the
karyopherin family. In yeast, most of the karyopherins
can be deleted without catastrophic consequences. This
may indicate that any given karyopherin can recognize
different types of signals, or that cargoes carry
otherwise cryptic signals that are recognized by
particularly promiscuous carriers.

All karyopherins also interact with the GTPase
Ran (Figure 2). Ran exists in either a GTP-bound
or a GDP-bound state. The GTP-bound state is
maintained in the nucleus by the nuclear-restricted
guanine nucleotide exchange factor RCC1, while the
Ran GTPase-activating protein (Ran-GAP) is primarily
cytoplasmic, ensuring that the cytoplasmic pool of Ran
is GDP-bound. This distribution contributes to
the directionality of transport by triggering the assembly
and disassembly of transport complexes in the
correct compartments. Thus, an import complex
between a karyopherin and its cargo is stable in the
cytoplasm, where Ran is GDP-bound, but in the
nucleoplasm Ran-GTP binds to the karyopherin,
displacing its cargo. Conversely, an export complex is
stabilized in the nucleus by Ran-GTP; as this complex
reaches the cytoplasm, the GTP is hydrolyzed and the
complex disassembles. The resulting free Ran-GDP is
apparently returned to the nucleoplasm in a complex with
the FG-nucleoporin-binding protein p10/Ntf2p. As
Ran-GTP is the only energy source required for
karyopherin-mediated translocation of proteins, and GTP
hydrolysis is not linked to the import process itself, the
energy for transport likely comes from the maintenance
of the potential energy gradient across the NPC.
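The directionality rules above reduce to a lookup on the Ran state of each compartment. The toy Python model below is a deliberate simplification: it assumes each compartment holds only one form of Ran and ignores the NPC itself, GTP hydrolysis kinetics, and Ran recycling by p10/Ntf2p. All names are hypothetical.

```python
from enum import Enum


class Compartment(Enum):
    NUCLEUS = "nucleus"
    CYTOPLASM = "cytoplasm"


def ran_state(compartment: Compartment) -> str:
    """Nucleotide state of Ran: RCC1 keeps nuclear Ran GTP-bound,
    cytoplasmic Ran-GAP keeps cytoplasmic Ran GDP-bound."""
    return "GTP" if compartment is Compartment.NUCLEUS else "GDP"


def import_complex_stable(compartment: Compartment) -> bool:
    # Ran-GTP binds the importing karyopherin and displaces its cargo.
    return ran_state(compartment) == "GDP"


def export_complex_stable(compartment: Compartment) -> bool:
    # Export complexes need Ran-GTP; hydrolysis in the cytoplasm disassembles them.
    return ran_state(compartment) == "GTP"


for c in Compartment:
    print(f"{c.value}: Ran-{ran_state(c)}, "
          f"import complex stable={import_complex_stable(c)}, "
          f"export complex stable={export_complex_stable(c)}")
```

Because import complexes are stable only where export complexes are not, cargo is loaded in one compartment and unloaded in the other.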


Mechanism of Nucleocytoplasmic 
Transport 


Although the precise mechanism is still unknown,
clues as to how the NPC mediates directional
nucleocytoplasmic transport are provided by the following
facts. First, there appears to be no motor protein
required to transport individual karyopherin-cargo
complexes across the NPC. Second, there is an
abundance of binding sites for karyopherins and
other transport factors at the NPC in the form of a
particular family of NPC components, termed
FG nucleoporins (due to the presence of large numbers
of degenerate Phe-Gly repeats in their primary
sequences). These are distributed along filamentous
structures corresponding to the docking sites observed
by electron microscopy (EM). Most FG nucleoporins are
symmetrically disposed on both sides of the NPC;
however, the few asymmetrically disposed FG
nucleoporins are found at the extremities of the NPC.
Finally, EM studies suggest that a karyopherin-import
cargo complex docks at multiple sites along the
cytoplasmic filaments and through the NPC, but the
terminal event is a high-affinity binding step at the
nucleoplasmic FG nucleoporins prior to release from the
NPC. Thus it is proposed that nuclear import is facilitated
by a limited number of karyopherin docking and
release steps, as the cargo-carrier complex moves from
the cytoplasmic filaments of the NPC, through the
central transporter, to the high-affinity docking sites at
the NPC’s nucleoplasmic face, where it is released to
the nuclear interior by Ran-GTP. Export is presumed
to employ an analogous mechanism.

Figure 2 Directional transport is controlled by the interaction of karyopherins with Ran-GTP, nups, and substrates (see text for details).


Nucleocytoplasmic Transport: 
Additional Soluble Factors and RNP 
Export 


Although the mechanics of transport are beginning to 
emerge for the nucleocytoplasmic transport of mono- 
meric proteins, the situation becomes somewhat more 
complicated in the case of macromolecular complexes. 
For example, in the case of the well-studied model 
system of human immunodeficiency virus type 1, the 
viral genome is imported into the nucleus as a large 
DNA/protein complex. Several of the proteins within 
this preintegration complex contain NLSs that are 
recognized by karyopherins. Perhaps because of its
large size (up to 300S), the use of multiple NLSs
improves its import efficiency. This is likely analogous
to the nuclear export of RNPs such as ribosomal
subunits and mRNA. In the case of mRNA, it is
transcribed, spliced, and exported complexed with many
different RNA-binding proteins, which contribute to
its progression to a mature, translatable cytoplasmic
mRNA molecule. The direct role of any of these RNP
proteins in mRNA export per se remains to be
established; however, HIV again provides a valuable
example of how this may occur. The HIV mRNA is
exported in variably spliced forms, and it bypasses the
constitutive cellular splicing machinery by directly
accessing the cellular export machinery. The Rev
protein, encoded by the viral genome, binds specifically
to the unspliced HIV mRNA and, through its own
NES, also interacts with the karyopherin family member
Crm1p (exportin-1), thus promoting export of the
viral mRNA.

Cellular mRNAs, however, do not appear to use
Crm1p. Instead, another factor, TAP (Mex67p in
yeast), plays an analogous role, bridging the RNP
and the NPC. Interestingly, although TAP binds FG
nucleoporins like karyopherins, it does not bear any
obvious structural similarity to the karyopherin
family. In addition, although TAP does not interact
directly with Ran, it may employ a different mechanism
to link export to the Ran cycle: TAP interacts with
a protein, p15, which shows similarity to p10/Ntf2p.
Furthermore, as large mRNPs have been observed to
unwind and thread through the central transporter
during transit, it seems unlikely that the Ran gradient
provides sufficient energy to drive this process. Thus,
ATP-dependent RNA helicases tethered to the NPC
may serve such a function.


Nuclear Envelope and Chromatin 
Organization 


Electron microscopy of many cells has shown that the 
nuclear periphery immediately adjacent to the NE is a 
region of specialized nucleoplasmic organization, and 
in particular a site enriched in heterochromatin. The 
inner nuclear membrane, lamina, and nucleoplasmic 
face of the NPC have all been implicated as anchor 
sites for the organization of chromatin at the nuclear 
periphery. For instance, the Tpr family of filamentous 
nucleoskeletal proteins are attached to the NPC 
nuclear face and extend into the nuclear interior, and 
have been implicated in anchoring telomeres to the 
NE. Nevertheless, much about the relationship 
between chromatin organization and the NE remains 
to be discovered. 


Nuclear Division and the Nuclear 
Envelope 


In many metazoans, the NE disassembles during
mitosis and meiosis (‘open’ mitosis and meiosis) to
give the spindle, which assembles around the
cytoplasmic centrosomes (the spindle organizers), access
to the condensing chromosomes. At the end of prophase,
the NE fragments into small vesicles, the lamina de- 
polymerizes, and the NPCs are dismantled into soluble 
or NE vesicle-associated monomers or oligomers of 
nucleoporins. At the end of telophase, the two nascent 
daughter cells reform their NEs. This involves the 
coordinate association, flattening, and fusion of the 
NE-derived vesicles, the reassembly of the NPCs, 
and the repolymerization of the lamina. However, 
many fungi and protists follow a ‘closed’ mitosis, in 
which the NE remains intact and the spindle forms 
within the nucleus. In such cases, the spindle organizer 
can often be found embedded in a pore in the NE, 
although in some organisms (e.g., dinoflagellates) the 
spindle organizer remains outside the NE and it is the 
kinetochores that span the NE. Hence, the NE can
also play an active role in cell division as part of the
spindle assembly. Intriguingly, in yeast, the NE-bound
spindle organizer (the spindle pole body) and NPCs
have been shown to share protein components, possibly
indicating some similarities in their methods of formation.


Future Prospects 


Now that many of the components required for 
nucleocytoplasmic transport have been identified 
in a number of model organisms, a detailed knowledge
of the mechanism of transport is likely to be gained 
in the next few years. Furthermore, a greater under- 
standing of the continuum of the NPC, the NE, and 
the nuclear interior should lead to insights into the 
mechanisms of chromatin organization and global 
gene regulation. 




See also: Cytoplasm; Meiosis; Mitosis; Nucleus 
Nuclear Export


See: Nuclear Envelope, Transport 


Nuclear Matrix 
The nuclear matrix is the protein latticework within
the nucleus in which DNA replication and transcrip- 
tion complexes are anchored. 


See also: Nucleus 


Nuclear Pore Complex 


See: Nuclear Envelope, Transport 


Nuclear Pores 
Nuclear pores are openings in the nuclear envelope, approximately 10 nm in diameter, through which molecules synthesized in the cytoplasm (e.g., nuclear proteins) enter the nucleus and through which mRNA leaves it. The pores are formed by a large protein assembly, the nuclear pore complex.


See also: Cytoplasm; Messenger RNA (mRNA); 
Nucleus 

Nuclear transfer techniques have been and are being used for a variety of purposes. In the broadest terms, nuclear transfer encompasses the transfer of the entire genetic material from one cell to another in order to study nucleocytoplasmic interactions. The gene expression pattern of the donor nucleus will presumably change in the recipient cytoplasm under the influence of the cytoplasmic factors present. This reprogramming, and the analysis of the mechanisms involved, is the basic scientific paradigm of nuclear transfer. The transfer of the nuclei of somatic cells into enucleated egg cytoplasm for the purpose of creating a novel organism (cloning) represents one specific but very important aspect of nuclear transfer.


Somatic Cell Hybridization 


Combining two different cells (cells from two different tissues of the same species, or cells from two different species) is called somatic cell hybridization and is also a form of nuclear transfer. In this case, however, in addition to the interaction between the nucleus and the foreign cytoplasm, the two different nuclei exert some influence on each other. The fusion of two parental cells into a hybrid cell can be achieved by various means; the most commonly used are hemagglutinating virus of Japan (HVJ), also known as Sendai virus, electrofusion, and, in the case of lymphocyte fusion, polyethylene glycol (PEG). Following fusion, it is usually necessary to eliminate all parental cells selectively, and various selection methods that kill the parental cells while allowing the hybrids to survive have been developed. Somatic cell hybrids
have since been extensively used, mostly as tools for 
chromosomal gene mapping and for the production of 
monoclonal antibodies. In order to study nucleo- 
cytoplasmic interactions without the confounding 
presence of another nucleus, methods have been 
developed to transfer a nucleus into a foreign cyto- 
plasm. Following disruption of the cytoskeleton and 
centrifugation, somatic cells can usually be separated 
into a nucleus surrounded by a very small amount of 
cytoplasm and plasma membrane, so-called karyo- 
plast, and the remaining major part of cytoplasm also 
surrounded by plasma membrane, so-called cytoplast. 
Neither of these structures can survive, but following 
the fusion of karyoplast from one cell type and cyto- 
plast from another, the resulting cybrid can survive. 
Cybrids can be used to study cytoplasmic factors 
which reprogram gene expression in the donor 
nucleus. 
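The entry does not name a particular selection method. One classic scheme used for this purpose is HAT selection (hypoxanthine, aminopterin, thymidine), named here only as an illustration: aminopterin blocks de novo purine and thymidylate synthesis, so a cell survives only if it can salvage hypoxanthine (via HPRT) and thymidine (via TK). The toy model below, in which the `Cell` type and its enzyme sets are hypothetical simplifications, shows why two parents that each lack one enzyme die while their hybrid survives by complementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cell:
    """Toy cell described only by the salvage enzymes it can make (hypothetical model)."""
    name: str
    enzymes: frozenset


def fuse(a: Cell, b: Cell) -> Cell:
    # A hybrid carries both parental genomes, so it can express any enzyme
    # that either parent could make (complementation).
    return Cell(f"{a.name} x {b.name}", a.enzymes | b.enzymes)


def survives_hat(cell: Cell) -> bool:
    # With de novo synthesis blocked by aminopterin, survival requires
    # salvage of hypoxanthine (HPRT) and of thymidine (TK).
    return {"HPRT", "TK"} <= cell.enzymes


parent_a = Cell("HPRT-deficient", frozenset({"TK"}))
parent_b = Cell("TK-deficient", frozenset({"HPRT"}))
hybrid = fuse(parent_a, parent_b)

for cell in (parent_a, parent_b, hybrid):
    print(cell.name, "survives" if survives_hat(cell) else "dies")
```

In hybridoma production the lymphocyte partner generally does not proliferate in culture, so in practice the selection mainly has to remove the HPRT-deficient myeloma parent.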


Nuclear Transfer into Egg Cytoplasm 


Transferring the nuclei of somatic cells into the cytoplasm of enucleated eggs and observing the development of the resulting embryo has long been considered the crucial experiment for assessing the totipotency of the somatic nucleus. This type of experiment was initiated 50 years ago in amphibians and has since been used in different species (Di Berardino, 1997). In amphibians (frogs and toads were
most commonly used) the genetic material of the egg 
is either removed mechanically or inactivated by 
means of ultraviolet light. The nucleus from the 
somatic cell is then injected, using a fine pipette, into 
the egg cytoplasm and the ensuing development 
observed. These experiments demonstrated that the 
transfer of nuclei from early embryos results in com- 
plete development to adulthood. The developmental 
capacity of nuclei from older embryos and tadpoles 
is significantly reduced, and the transfer of nuclei from 
adult tissues has never resulted in complete develop- 
ment. 

Nuclear transfer in amphibians was subsequently 
extended to other species and finally to mammals. 
Several methods have been used to perform nuclear transfer in mammals. The genetic material of the recipient egg (the pronuclei of a zygote or the metaphase chromosomes of an oocyte) should be removed without penetrating the plasma membrane of the egg: the enucleation pipette is positioned over the corresponding area, and the genetic material is gently removed together with a small amount of cytoplasm and plasma membrane (a karyoplast). The donor nucleus can be introduced into the recipient egg cytoplast using methods similar to those described for somatic cell fusion. Electrofusion and fusion mediated by inactivated Sendai virus have been used successfully, as has direct injection of a naked nucleus into the egg cytoplasm. Following gradual technical improvements over the last 20 years, the transfer of adult nuclei into enucleated eggs has resulted in normal development to adulthood in several mammalian species (sheep, mouse, cow, goat, and pig). The success rate is admittedly very low: less than 1% of nuclear transfer embryos develop to adulthood.
It is at present unclear whether technical or biologi- 
cal problems, or both, contribute to the low success 
rate. Nuclear transfer in mammals will likely play a 
significant role in agriculture as a method to produce 
genetically modified animals to serve as bioreactors 
(McLaren, 2000). The application of this method in human medicine, so-called therapeutic cloning, is currently the subject of considerable controversy, and its future is therefore uncertain.
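To make the practical consequence of a success rate below 1% concrete, the sketch below asks how many reconstructed embryos must be transferred to obtain at least one animal with a chosen probability. It assumes, as a simplification, that embryos succeed independently with a common probability p; p = 0.01 is the upper bound quoted above, not a measured rate for any species.

```python
import math


def embryos_needed(p: float, confidence: float) -> int:
    """Smallest n such that P(at least one success among n embryos) >= confidence.

    Assumes independent embryos, each succeeding with probability p, so
    P(no success) = (1 - p) ** n and we need (1 - p) ** n <= 1 - confidence.
    """
    if not (0 < p < 1 and 0 < confidence < 1):
        raise ValueError("p and confidence must lie strictly between 0 and 1")
    return math.ceil(math.log(1 - confidence) / math.log(1 - p))


print(embryos_needed(0.01, 0.95))   # 299
```

Because the actual rate is below 1%, 299 is a lower bound on the number of embryos needed under these assumptions.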

A nuclease is an enzyme that degrades nucleic acids by hydrolyzing the phosphodiester bonds that join the sugar residues. Nucleases are critical components of biological processes involving nucleic acids. Some nucleases are DNA specific (DNases), some are RNA specific (RNases), and some degrade both DNA and RNA. Nucleases can also have a strong preference for either double-stranded or single-stranded nucleic acids. Nucleases are further characterized according to whether they degrade from an end of a nucleic acid molecule (exonucleases) or from within it (endonucleases). Exonucleases are, in addition, specific for either the 3’ end or the 5’ end of a molecule. Exonucleases remove a single nucleotide per hydrolysis event and typically release mononucleotides. Endonucleases cleave nucleic acids internally and leave either a 3’ hydroxyl and a 5’ phosphate or a 5’ hydroxyl and a 3’ phosphate at the site of cleavage.
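A toy string model can make the exonuclease/endonuclease distinction concrete. It represents one DNA strand as a Python string written 5'→3' and ignores the chemistry (which end keeps the phosphate) and the complementary strand. The endonuclease example uses the well-known EcoRI site GAATTC, cut between G and A (G^AATTC); the 3' exonuclease is a hypothetical enzyme that removes one nucleotide per hydrolysis event.

```python
def exo_3prime(strand: str, events: int) -> tuple[str, list[str]]:
    """Hypothetical 3'->5' exonuclease: each hydrolysis event removes the
    terminal nucleotide and releases it as a mononucleotide."""
    released = []
    for _ in range(min(events, len(strand))):
        released.append(strand[-1])
        strand = strand[:-1]
    return strand, released


def endo_cut(strand: str, site: str = "GAATTC", offset: int = 1) -> list[str]:
    """Endonuclease: cut inside the molecule, `offset` bases into every
    occurrence of `site` (EcoRI cuts G^AATTC). Only one strand is modeled."""
    fragments, start = [], 0
    pos = strand.find(site)
    while pos != -1:
        fragments.append(strand[start:pos + offset])
        start = pos + offset
        pos = strand.find(site, pos + 1)
    fragments.append(strand[start:])
    return fragments


dna = "ATGGAATTCC"
print(exo_3prime(dna, 3))   # ('ATGGAAT', ['C', 'C', 'T'])
print(endo_cut(dna))        # ['ATGG', 'AATTCC']
```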

DNA polymerases often contain 3’-specific exonuclease domains that proofread misincorporated bases and thus help maintain the fidelity of DNA replication. DNA polymerases may also contain a 5’-specific exonuclease domain for nick translation. Genetic recombination requires DNases for the initiation of crossover events and for the resolution of the joined DNA molecules. A relatively recently discovered class of nucleases, the homing endonucleases, initiates intron and intein mobility by cleaving double-stranded DNA. Restriction-modification systems require endonucleases for the degradation of foreign DNA. DNA repair processes involve many nucleases with varying properties. RNA maturation is complex and requires a number of RNases, both exo- and endonucleases.

Nucleases are important biochemical tools for the molecular biologist. Among the most important are the restriction endonucleases, which allow precise cleavage of double-stranded DNA and are a mainstay of in vitro recombinant DNA technology. The homing endonucleases are a recent addition to the molecular biologist’s toolbox. These are highly sequence-specific, double-stranded endonucleases with applications similar to those of the type II restriction endonucleases, but they cleave DNA much less frequently, yielding extremely large DNA fragments of hundreds of thousands to millions of base pairs. Some exonucleases are useful in vitro for converting double-stranded DNA to single-stranded DNA; conversely, others can be used to remove single-stranded termini from double-stranded DNA fragments. Exonucleases can be highly processive, or poorly processive (random) in their attack on nucleic acids. A highly processive exonuclease binds to a nucleic acid end and removes thousands of mononucleotides without dissociating from the end. A random exonuclease, on the other hand, binds to an end, removes only one or two mononucleotides, dissociates, and then binds to another nucleic acid molecule. When molecular biologists convert RNA to DNA in vitro, they use RNase H, because it degrades the RNA of a DNA/RNA hybrid molecule, leaving single-stranded DNA that can then be converted to double-stranded DNA with DNA polymerase.
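The difference between a highly processive and a random (often called distributive) exonuclease can be illustrated with a deterministic toy model: a pool of molecules, each represented only by its length, and a fixed number of hydrolysis events. The "random" enzyme here removes exactly one nucleotide per binding and rebinds in round-robin order, a simplification of truly random rebinding; the function names are illustrative.

```python
def processive(lengths: list[int], events: int) -> list[int]:
    """Enzyme stays bound to one end until that molecule is fully degraded."""
    lengths = list(lengths)
    i = 0
    for _ in range(events):
        while i < len(lengths) and lengths[i] == 0:
            i += 1              # molecule exhausted: move on to the next one
        if i == len(lengths):
            break               # everything has been degraded
        lengths[i] -= 1
    return lengths


def random_attack(lengths: list[int], events: int) -> list[int]:
    """Enzyme removes one nucleotide, dissociates, and rebinds elsewhere.
    Rebinding is modeled as round-robin over the molecules."""
    lengths = list(lengths)
    i, done = 0, 0
    while done < events and any(lengths):
        if lengths[i] > 0:
            lengths[i] -= 1
            done += 1
        i = (i + 1) % len(lengths)
    return lengths


print(processive([10, 10, 10], 10))     # [0, 10, 10]
print(random_attack([10, 10, 10], 10))  # [6, 7, 7]
```

With the same total amount of hydrolysis, the processive enzyme destroys one molecule completely and leaves the others intact, whereas the random enzyme shortens every molecule slightly.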

Nucleases are a critical component of the genetic 
apparatus of the cell and play an invaluable role in the 
precise manipulation of nucleic acids in vitro. 

A nucleic acid, or polynucleotide, is a polymer of
nucleotides. Since a nucleotide consists of a nitro- 
genous base bonded to a sugar, which is in turn 
bonded to a phosphate, a variety of polynucleotides 
based on different sugars are theoretically possible, 
but only two types of polynucleotide are actually 
known: ribonucleic acid (RNA), a polymer of nu- 
cleotides containing ribose, and deoxyribonucleic 
acid (DNA), a polymer of nucleotides containing 2'- 
deoxyribose. 

All nucleic acids have the same fundamental struc- 
ture: The nucleotide monomers are joined to one 
another through phosphodiester linkages between 
the sugars, thus forming a backbone of alternating 
sugars and phosphates with the bases emerging to the 
side (see illustration). The phosphodiester linkages 
always connect the 5’-carbon of one sugar with the 
3'-carbon of the next, thus giving the polymer a polar- 
ity and distinct ends, designated 5’ and 3’. The polarity 
is critical biologically. Polynucleotide synthesis, for 
instance, always proceeds in the 5’ to 3’ direction, and 
this is also the direction in which a nucleic acid encod- 
ing protein is read (translated) during protein synthesis. 

The bases of nucleic acids are either pyrimidines, with a hexagonal ring of four carbon and two nitrogen atoms, or purines, in which the pyrimidine ring is fused to a pentagonal ring that adds two nitrogens and one carbon. Although many pyrimidines
and purines are known, DNA contains only four: the 
pyrimidines cytosine and thymine and the purines 
adenine and guanine. RNA is similar, with thymine 
replaced by uracil. In a few instances, these bases are 
secondarily modified, as in certain viruses and in the 
transfer RNA of the cellular apparatus. 

Nucleic acids form the genomes of all organisms 
and viruses. The genomes of organisms are always 
DNA; those of viruses may be either DNA or RNA. 
The DNA genomes of organisms are always double-stranded helices with the well-known Watson-Crick
structure in which opposite bases are hydrogen- 
bonded into pairs of one purine and one pyrimidine. 
Viral genomes of all four types are known: DNA or 
RNA, single- or double-stranded. The RNA mol- 
ecules that form the cellular apparatus are single- 
stranded but often have internal double-stranded 
regions. Four types of RNA constitute the apparatus for expressing genomic information as protein. Messenger RNA (mRNA) molecules are synthesized on genomic templates (generally DNA, of course)
and carry the specific messages for the amino acid 
sequences of proteins. Ribosomal RNA (rRNA) 
molecules are structural components of the ribo- 
somes, the factories where proteins are synthesized. 
Transfer RNA (tRNA) molecules carry activated 
amino acids to the ribosomes. Several kinds of small 
nuclear RNAs (snRNAs) are incorporated into 
spliceosomes that convert pre-messenger RNA mol- 
ecules into mRNAs by removing intron sequences. 
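The base-pairing and polarity rules above can be sketched in a few lines of Python. Sequences are strings written 5'→3'; because the two strands of a duplex are antiparallel, the partner of a strand is its reverse complement, and an mRNA is complementary to its DNA template strand with U in place of T. The helper names and the short sequence are illustrative.

```python
DNA_PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}
PURINES, PYRIMIDINES = set("AG"), set("CT")


def reverse_complement(dna: str) -> str:
    """Partner strand of a DNA sequence, both written 5'->3'.
    The strands are antiparallel, so the complement is read backwards."""
    return "".join(DNA_PAIR[base] for base in reversed(dna))


def transcribe(template: str) -> str:
    """mRNA (5'->3') synthesized on a DNA template strand given 5'->3':
    complementary to the template, with U in place of T."""
    return reverse_complement(template).replace("T", "U")


coding = "ATGGCC"                    # hypothetical short sequence
template = reverse_complement(coding)
print(template)                      # GGCCAT
print(transcribe(template))          # AUGGCC (the coding strand with U for T)

# Each Watson-Crick pair joins one purine and one pyrimidine.
assert all((a in PURINES and b in PYRIMIDINES) or (a in PYRIMIDINES and b in PURINES)
           for a, b in DNA_PAIR.items())
```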


See also: DNA; Messenger RNA (mRNA); 
Ribosomal RNA (rRNA); Soluble RNA; Transfer 
RNA (tRNA) 

See: Nuclear Envelope, Transport

The nucleolar organizer is a loop of DNA that contains multiple copies of rRNA genes.


See also: Ribosomal RNA (rRNA) 

The nucleolus is a distinct structure in the nucleus of the cell, composed of filamentous and granular material. It is the site of synthesis and processing of ribosomal RNA and of the assembly of this RNA with ribosomal proteins into ribosomal subunits. Ribosomes are the molecular machines that translate mRNA into protein in the cytoplasm. In the electron microscope, the nucleolus consists of a darker fibrillar (pars fibrosa) and granular (pars granulosa) matrix interspersed with lighter areas. The pars fibrosa consists of the heavily transcribed rRNA genes and their rRNA transcripts. The pars granulosa contains the maturing ribosomal precursor particles. The lighter areas are apparently filled with nontranscribed DNA.

When a gene encodes a protein, one copy of the gene is usually sufficient to produce thousands of copies of the encoded protein, because gene expression involves two steps of amplification: the first at transcription and the second at translation. The second, translational amplification is not available for the RNA components of ribosomes, which are themselves the final products. It is for this reason that the genes encoding ribosomal RNA are present in multiple copies in the haploid genome. These multiple copies are found in clusters on a number of different chromosomes. Each cluster of rRNA genes is referred to as a nucleolar organizer region (NOR).
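The amplification argument can be put into numbers. The sketch below uses deliberately round, hypothetical figures (transcripts per gene copy, proteins per mRNA, and the number of molecules a cell must make); they are placeholders, not measurements. The only point is that an rRNA gene lacks the second, translational multiplication.

```python
import math

# Hypothetical round numbers, chosen only to illustrate the argument.
TRANSCRIPTS_PER_COPY = 1_000    # RNA molecules made from one gene copy
PROTEINS_PER_MRNA = 1_000       # protein molecules translated from one mRNA
MOLECULES_NEEDED = 1_000_000    # final-product molecules the cell must make


def copies_needed(output_per_copy: int, needed: int = MOLECULES_NEEDED) -> int:
    """Gene copies required if each copy yields `output_per_copy` products."""
    return math.ceil(needed / output_per_copy)


# Protein-coding gene: transcription and translation both multiply output.
protein_copies = copies_needed(TRANSCRIPTS_PER_COPY * PROTEINS_PER_MRNA)
# rRNA gene: the transcript is the final product, so only transcription multiplies.
rrna_copies = copies_needed(TRANSCRIPTS_PER_COPY)

print(protein_copies, rrna_copies)   # 1 1000
```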

In contrast to yeast, the nucleolus in higher eukaryotes is present only during the G1, S, and G2 phases of the cell cycle; it disassembles at mitosis. The assembly and disassembly of the nucleolus in relation to the cell cycle are most likely controlled by the cell cycle regulators. There are strong indications that the genes that control exit from mitosis are also involved in activating the assembly of the nucleolus after mitosis. In yeast, the chromatin-modifying protein Sir2, in combination with Net1, appears to be involved in structuring the nucleolus. Recent studies indicate that at the onset of mitosis, transcription of the rRNA genes is silenced through the phosphorylation of the rRNA transcription factor SL1.

After mitosis, the clusters of rRNA genes restart the synthesis of RNA and ribosomal subunits, and in the process fuse together to form the nucleolus. The rRNA genes are transcribed into a large precursor molecule (pre-rRNA). This precursor is processed to form three distinct species of rRNA: the 18S, 5.8S, and 28S rRNAs. The 18S rRNA is packaged with approximately 30-35 proteins into the small ribosomal subunit. The 5.8S and 28S rRNAs are packaged into the large ribosomal subunit together with approximately 50 proteins and an additional rRNA, the 5S rRNA, which is synthesized from a separate group of genes outside the nucleolus. Processing of the pre-rRNA involves modification at specific sites, followed by removal of parts of the molecule (the external and internal transcribed spacer sequences, ETS and ITS). Both processes are mediated by small nucleolar RNAs (snoRNAs) and specific proteins in small nucleolar ribonucleoprotein (snoRNP) complexes. There are two major groups of snoRNAs, the box C/D and the box H/ACA snoRNAs. The box C/D snoRNAs guide methylation of specific ribose residues in the pre-rRNA, whereas the box H/ACA snoRNAs function in site-specific pseudouridylation. The necessary ribosomal proteins are imported from the cytoplasm. Assembly of the small ribosomal subunit takes about 30 min, whereas the large subunit is completed in about 1 h. The finished ribosomal subunits are then exported through the nuclear pores into the cytosol.
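The processing and assembly steps can be summarized as a small data sketch. It assumes the standard vertebrate order of segments in the precursor (5' ETS, 18S, ITS1, 5.8S, ITS2, 28S, 3' ETS), which this entry does not spell out, and it ignores the snoRNA-guided modifications and the proteins.

```python
# Segments of the pre-rRNA, 5' -> 3' (vertebrate names, as in the text).
PRE_RRNA = ["5' ETS", "18S", "ITS1", "5.8S", "ITS2", "28S", "3' ETS"]


def remove_spacers(precursor: list[str]) -> list[str]:
    """Discard external (ETS) and internal (ITS) transcribed spacers."""
    return [seg for seg in precursor if "ETS" not in seg and "ITS" not in seg]


def assemble(mature: list[str]) -> dict[str, list[str]]:
    """Sort mature rRNAs into subunits; 5S rRNA is made from separate
    genes outside the nucleolus and joins the large subunit."""
    return {
        "small subunit": [r for r in mature if r == "18S"],
        "large subunit": [r for r in mature if r in ("5.8S", "28S")] + ["5S"],
    }


mature = remove_spacers(PRE_RRNA)
print(mature)            # ['18S', '5.8S', '28S']
print(assemble(mature))  # {'small subunit': ['18S'], 'large subunit': ['5.8S', '28S', '5S']}
```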

The size of the nucleolus varies widely: it may be virtually absent or may occupy a quarter of the nuclear volume. No doubt this reflects the protein-synthetic activity of a given cell and is controlled by regulatory mechanisms.

The nucleosome is the repeating unit of organization of chromatin fibers in chromosomes, comprising approx. 200 bp of DNA and two molecules each of the histones H2A, H2B, H3, and H4. Much of the DNA (around 140 bp) is wound around a core made up of the histones; the remainder (linker DNA) connects adjacent nucleosomes, forming a structure resembling a string of beads.
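The repeat and core lengths quoted above support some quick arithmetic: the linker is roughly 200 - 140 = 60 bp, and a stretch of DNA carries about one nucleosome (eight histone molecules) per 200 bp. The 1 Mb length used below is hypothetical, and real repeat lengths vary between cell types.

```python
REPEAT_BP = 200   # approximate DNA per nucleosome repeat (from the text)
CORE_BP = 140     # approximate DNA wound around the histone core (from the text)
HISTONES_PER_NUCLEOSOME = 2 * 4   # two each of H2A, H2B, H3, and H4


def nucleosome_count(dna_bp: int, repeat_bp: int = REPEAT_BP) -> int:
    """Approximate number of complete nucleosome repeats along a DNA stretch."""
    return dna_bp // repeat_bp


linker_bp = REPEAT_BP - CORE_BP
n = nucleosome_count(1_000_000)   # hypothetical 1 Mb stretch of DNA
print(linker_bp, n, n * HISTONES_PER_NUCLEOSOME)   # 60 5000 40000
```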


See also: Chromatin; Histones 


Nucleotide Sequence 
See: DNA Sequencing 

A ‘nucleotide’ is a molecule consisting of a nitrogenous base, a sugar (ribose or deoxyribose), and a phosphate, usually considered as the subunit of a nucleic acid (but see below). A ‘nucleoside’ consists of the base and sugar alone; it is converted into a nucleotide by phosphorylation, the addition of a phosphate (or phosphoryl) group.


Ribose and deoxyribose are pentose sugars, containing five carbon atoms, which are numbered 1’ to 5’, the prime marks distinguishing these positions from the numbered positions in the nitrogenous bases. They assume a furanose (five-membered) ring form including carbons 1’ to 4’ and the oxygen bonded to the 4’ carbon; the 5’ carbon is a -CH₂- group to the side of the ring, and it is here that the phosphate is attached. The base is always bonded to the 1’ position. In ribose, positions 2’ and 3’ carry hydroxyl groups, but in 2’-deoxyribose, the 2’ carbon carries only two hydrogen atoms.

Nucleotides and nucleosides are named for their 
bases, so the nucleosides of adenine, cytosine, guanine, 
thymine, and uracil are, respectively, adenosine, cyti- 
dine, guanosine, thymidine, and uridine. The nucleo- 
tides are then designated adenosine 5’-phosphate, and 
so on; alternatively, they have been named adenylic 
acid, cytidylic acid, guanylic acid, thymidylic acid, and 
uridylic acid. The deoxy- forms of nucleosides 
should then be designated 2'-deoxyadenosine, and 
so on, and the nucleotides 2'-deoxyadenosine 5'- 
phosphate, and so on. 

From the viewpoint of genetics, nucleotides are primarily important as the subunits (monomers) of nucleic acids, but they have much broader roles in metabolism. The cytoplasm of a cell is rich in nucleoside diphosphates and triphosphates, that is, molecules with chains of two or three phosphates on the 5’ position. These compounds are the principal energy carriers in cells. (Other nucleotides, with bases such as nicotinamide and flavin, are also essential in energy metabolism.) Adenosine triphosphate (ATP) is employed as an energy source to drive many endergonic metabolic reactions (reactions that entail an increase in free energy and therefore cannot proceed spontaneously on their own); other nucleoside triphosphates have similar but lesser roles in specific biosynthetic processes. In most reactions, the overall process is made exergonic (with a thermodynamically favorable decrease in free energy) by transferring the terminal phosphate (or phosphoryl) group from the nucleoside triphosphate to some other molecule, leaving a nucleoside diphosphate. In some cases, the nucleoside triphosphate transfers its two terminal phosphates, leaving a nucleoside monophosphate.
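The coupling described above works because the free-energy changes of coupled reactions add. A minimal numerical sketch, using glutamine synthesis as an example chosen here (not taken from this entry) and the standard free-energy values commonly quoted in biochemistry textbooks; actual values in the cell depend on concentrations.

```python
# Standard free-energy changes (kJ/mol) as commonly quoted in textbooks;
# actual values in the cell depend on concentrations.
DG_GLUTAMINE_FORMATION = 14.2   # glutamate + NH4+ -> glutamine + H2O (endergonic)
DG_ATP_HYDROLYSIS = -30.5       # ATP + H2O -> ADP + Pi (exergonic)


def is_exergonic(delta_g: float) -> bool:
    """A negative free-energy change is thermodynamically favorable."""
    return delta_g < 0


# Free-energy changes of coupled reactions add.
coupled = DG_GLUTAMINE_FORMATION + DG_ATP_HYDROLYSIS
print(f"coupled: {coupled:+.1f} kJ/mol")                            # coupled: -16.3 kJ/mol
print(is_exergonic(DG_GLUTAMINE_FORMATION), is_exergonic(coupled))  # False True
```

In the enzyme itself the coupling proceeds as the text describes: the terminal phosphoryl group of ATP is first transferred to glutamate, leaving ADP.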

It is important to understand this metabolic role of 
nucleotides to make sense of polynucleotide (nucleic 
acid) synthesis. In polynucleotides, the nucleotides are 
connected by phosphodiester linkages between the 3’ 
carbon of one and the 5’ carbon of the next. A poly- 
nucleotide thus has a 5’ end (with a free 5’-phosphate 
or triphosphate) and a 3’ end, with a free 3’-hydroxyl 
group. Both DNA replication (see DNA Replication) 
and RNA transcription are catalyzed by polymerases 
that add nucleotides to the 3’ end of a nascent (growing) chain. The incoming nucleotides are initially nucleoside triphosphates and thus carry enough energy to drive the endergonic formation of phosphodiester linkages. Each linkage is made by connecting the terminal 3’-hydroxyl group of the chain to the 5’-phosphate of the incoming nucleotide, releasing pyrophosphate (P₂O₇⁴⁻). Polynucleotide synthesis thus depends critically on the 3’-hydroxyl group at the end of the nascent chain. This fact forms the basis for the Sanger method of DNA sequencing (see DNA Sequencing), in which DNA replication is carried out in vitro in a mixture containing dideoxynucleotides or, strictly speaking, dideoxynucleoside triphosphates: molecules that lack oxygen atoms at both the 2’ and 3’ positions. The incorporation of one of these molecules into a nascent chain stops further synthesis, since the chain then has no 3’-OH group. The use of this device in DNA sequencing is explained in the corresponding article.
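The details of sequencing are left to the DNA Sequencing article; the following is only an idealized sketch of why chain termination reveals a sequence. It assumes that each of four reactions (one per dideoxynucleotide) yields every possible fragment ending at that base, that fragments separate perfectly by length, and it ignores the primer. The function names are illustrative.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def synthesized_strand(template: str) -> str:
    """New strand (5'->3') made on a template given 5'->3'. Synthesis
    extends the 3'-OH, so the new strand is the reverse complement."""
    return "".join(COMPLEMENT[b] for b in reversed(template))


def termination_ladder(template: str) -> dict[str, list[int]]:
    """Fragment lengths in each ddNTP reaction: a chain stops wherever
    the dideoxy form of that base is incorporated (no 3'-OH remains)."""
    new = synthesized_strand(template)
    return {base: [i + 1 for i, b in enumerate(new) if b == base]
            for base in "ACGT"}


def read_ladder(ladder: dict[str, list[int]]) -> str:
    """Sort all fragments by length; their terminal bases spell the
    synthesized strand 5'->3'."""
    by_length = sorted((n, base) for base, lengths in ladder.items()
                       for n in lengths)
    return "".join(base for _, base in by_length)


ladder = termination_ladder("TGTAATC")
print(ladder)               # {'A': [2, 5, 7], 'C': [6], 'G': [1], 'T': [3, 4]}
print(read_ladder(ladder))  # GATTACA
```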


See also: DNA; DNA Replication; DNA 
Sequencing; Transcription 


Nucleus 


M A Ferguson-Smith 

The nucleus is the structure within the cell that contains the chromosomes and is bounded by a double-layered nuclear membrane. It is a large, often spherical structure whose shape depends on the type of tissue from which the cell is derived. Thus, in hepatic cells and lymphocytes it is spherical, in squamous cells it is disk-shaped, and in smooth muscle cells it is torpedo-shaped. At interphase, the nucleus contains one or more prominent nucleoli, within which the ribosomes are assembled. Transcription, RNA processing, and splicing occur within the nucleus, as does DNA synthesis.


See also: Cell Cycle; Nucleolus 


Nude Mouse 


L Silver 


Copyright © 2001 Academic Press 
doi: 10.1006/rwgn.2001.0917


A spontaneous mouse mutation arose that causes homozygous mice to be born and to live their lives without hair. For reasons unrelated to their hairlessness, these mice also lack a thymus and hence a functioning cellular immune system, a trait that has been exploited by immunologists to understand how the immune system functions.


See also: Pleiotropy 


Null Hypothesis 


T P Speed 

A null hypothesis is a statistical statement about a population, where this term is used in the statistical sense of a collection of units, each of which is associated with one or more quantitative or qualitative characteristics.

A concrete example might be the population of the 
United States on a given day, with income as the 
characteristic of the units. A more abstract example 
is the collection of all possible offspring of a given 
mating pair of organisms, with genotype at a specified 
locus as the characteristic. Here the population is 
hypothetical. A third example might be the set of all 
base pairs in the genome of an organism, with the 
actual base at each position being the characteristic 
of interest. A null hypothesis might assert that the 
population average or population proportion asso- 
ciated with a characteristic has a given value. More 
generally, a null hypothesis is an arbitrary statistical 
statement about the distribution of one or more char- 
acteristics over a real or hypothetical population. 

Examples of null hypotheses abound in genetics, 
perhaps the most famous being those implicit in 
Mendel’s first series of experiments, asserting that the 
proportion of offspring in the first generation from 
the hybrids exhibiting the recessive phenotype is 
exactly 25%. Here the population is the collection of 
all possible peas of that generation bred under the 
specified conditions, the characteristic is qualitative, 
namely the recessive or dominant phenotype, and the 
null hypothesis embodies the well-known Mendelian 
conclusion about the proportion of recessives in that 
generation. Another familiar null hypothesis in genet- 
ics arises in the context of experimental crosses or with 
pedigree data. With the population being the hypothe- 
tical one of all meiotic products of a specified class of 
mating pairs, and the characteristics being two-locus 
phase-known genotypes, the familiar null hypothesis 
of no linkage is equivalent to equal proportions of 
gametes of recombinant and parental types. Yet one 
more example might be the statement that the base composition in a given genome is 25% A, 25% C, 25% G, and 25% T.
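As an illustration of testing the first of these null hypotheses, the sketch below applies Pearson's chi-square goodness-of-fit test (our choice of test, not the entry's) to the seed-shape counts Mendel reported in the second generation: 5474 round and 1850 wrinkled. For one degree of freedom the upper-tail probability has the closed form erfc(√(x/2)), so only the standard library is needed.

```python
import math


def chi_square(observed: list[int], null_props: list[float]) -> float:
    """Pearson goodness-of-fit statistic for counts under a null hypothesis."""
    n = sum(observed)
    return sum((o - n * p) ** 2 / (n * p) for o, p in zip(observed, null_props))


def p_value_1df(stat: float) -> float:
    """P(X >= stat) for a chi-square variable with one degree of freedom."""
    return math.erfc(math.sqrt(stat / 2))


# Null hypothesis: 75% dominant (round) : 25% recessive (wrinkled).
stat = chi_square([5474, 1850], [0.75, 0.25])
print(round(stat, 3), round(p_value_1df(stat), 2))   # 0.263 0.61
```

A p-value of about 0.61 means the data give no evidence against the 25% null hypothesis; as discussed below, it does not establish that the hypothesis is true.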

As we have defined it, a null hypothesis is no different from any other statistical hypothesis, and strictly speaking that is true. What is missing from the description so far is any indication of the role null hypotheses play in applied statistical work. In general, a null hypothesis is a statistical hypothesis of the kind just described that is introduced in a context where it will be tested using data on the units of a random sample from the population. When this happens, the major question of interest is whether any apparent deviations from the precise expectations defined by the null hypothesis are more or less likely to have occurred by chance, suitably interpreted. Null hypotheses therefore usually arise in the context of their being tested. It is worth emphasizing that null (or indeed any other) statistical hypotheses can only be asserted to be true on the basis of complete enumerations of populations. When carrying out tests of null hypotheses, we typically find evidence for or against their truth, but we usually have no way to conclude truth on the basis of sample data. It is also worth pointing out that null hypotheses are rarely expected to be precisely true. Rather, they are usually convenient approximations to the truth, which can provide a background against which more subtle issues may be highlighted. Accordingly, null hypotheses frequently concern the precise values, equality, or independence of population parameters, or randomness, in situations where interest really lies in deviations from the precise values, inequality, or dependence of population parameters, or nonrandomness.
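The point that null hypotheses are convenient approximations rather than exact truths has a practical consequence: with enough data, even a negligible departure from the null becomes "significant". The sketch below uses hypothetical counts with the same slightly unequal base composition (25.1% A and T, 24.9% C and G) at two sample sizes, tested against the equal-frequency null of the base-composition example; 7.815 is the standard 5% critical value of chi-square with 3 degrees of freedom.

```python
def chi_square_uniform(observed: list[int]) -> float:
    """Pearson statistic against the null of equal frequencies (25% per base)."""
    expected = sum(observed) / len(observed)
    return sum((o - expected) ** 2 / expected for o in observed)


CRITICAL_3DF_5PCT = 7.815   # 5% critical value, chi-square with 3 degrees of freedom

# Identical hypothetical composition (A, C, G, T = 25.1%, 24.9%, 24.9%, 25.1%).
small = [251, 249, 249, 251]                              # 1,000 bases
large = [25_100_000, 24_900_000, 24_900_000, 25_100_000]  # 100 million bases

for counts in (small, large):
    stat = chi_square_uniform(counts)
    print(f"n = {sum(counts):>11,}  chi2 = {stat:10.3f}  "
          f"reject at 5%: {stat > CRITICAL_3DF_5PCT}")
```

For a fixed departure the statistic grows in proportion to the sample size (about 0.016 for 1,000 bases, 1600 for 100 million), so the second comparison rejects a null hypothesis that is, for most purposes, an excellent approximation.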


See also: Population Genetics 


Nullisomy 


M A Ferguson-Smith 

‘Nullisomy’ is used to describe the loss of both members of a homologous pair of chromosomes in somatic cells.


See also: Chromosome Aberrations; Karyotype 


Nutritional Mutations 


R A LaRossa 

These are mutations that either expand or reduce a cell’s metabolic capacity. Included among nutritional mutants are auxotrophs, variants that grow only when the minimal medium needed for wild-type growth is supplemented with further nutrients, which may represent the product of a biosynthetic pathway. Nutritional mutations have been exploited in innumerable ways, ranging from the formulation of the ‘one gene, one enzyme’ hypothesis to the demonstration of colinearity of gene and polypeptide. Incorporated into strains used in mapping experiments, they revealed that bacterial chromosomes are circular. Moreover, they have served as starting points for the selection of suppressor and regulatory mutations. The elucidation of biosynthetic pathways was advanced by using nutritional mutants as sources of enzymes, substrates, accumulated intermediates, and by-products. In bacteria, the linkage of biosynthetic genes into tight clusters on the bacterial chromosome contributed much to the concept of the operon and of polycistronic messenger RNA. Thus nutritional mutations have profoundly influenced our understanding of genetics, regulation, biochemistry, and physiology.
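The way nutritional mutants helped order the steps of a biosynthetic pathway can be expressed as two rules: a mutant blocked at one step grows when supplied with any compound made after the block, and it accumulates the compound made just before it. The sketch below encodes these rules for a linear pathway loosely modeled on the ornithine, citrulline, arginine sequence; "precursor" is a placeholder, and real cases can be complicated by uptake, branching, and leaky mutations.

```python
from typing import Optional

# Linear pathway listed in order; step i converts PATHWAY[i] into PATHWAY[i + 1].
PATHWAY = ["precursor", "ornithine", "citrulline", "arginine"]


def grows(blocked_step: Optional[int], supplement: Optional[str]) -> bool:
    """Growth on minimal medium plus an optional supplement."""
    if blocked_step is None:        # wild type (prototroph)
        return True
    if supplement not in PATHWAY:   # no supplement, or an unrelated one
        return False
    # Only compounds made after the blocked step bypass the block.
    return PATHWAY.index(supplement) > blocked_step


def accumulates(blocked_step: int) -> str:
    """The compound made just before the block builds up."""
    return PATHWAY[blocked_step]


for step in range(len(PATHWAY) - 1):
    rescued_by = [c for c in PATHWAY if grows(step, c)]
    print(f"blocked at step {step}: grows on {rescued_by}, "
          f"accumulates {accumulates(step)}")
```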


Mutations Affecting the Utilization of Organic Compounds for the Supply of Major Elements (Carbon, Nitrogen, Sulfur, Phosphorus) and Energy


A wide spectrum of mutations can preclude or allow the utilization of specific carbon/energy sources. For example, Escherichia coli araB mutants, which cannot use arabinose as a sole carbon/energy source, illustrate a loss of a catabolic function, ribulokinase. Similar mutations have been isolated that allow E. coli to grow with glucose as a carbon/energy source but limit the cell’s ability to use other carbon sources such as oleate (fad), acetate (ace), galactose (gal), or lactose (lac). Other mutants, defective in glycolysis, have been isolated that cannot use glucose as a carbon/energy source but thrive when supplied with trioses. In contrast, certain E. coli bgl mutations allow the cell to use salicin or arbutin as sole carbon sources, while Salmonella typhimurium hut mutations allow histidine to satisfy the cellular demand for carbon/energy; these mutations represent a gain of function. Thus the range of carbon source mutant alterations is quite broad. Similarly, mutations can allow or preclude the use of organic compounds as sole nitrogen, phosphorus, or sulfur sources. For example, E. coli lacI repressor, ilvA feedback-insensitive threonine deaminase, gabC regulator, and hisP transport mutations have expanded cellular metabolic capacity, allowing the respective use of N-acetyllactonate, L-threonine, aminobutyrate, and L-arginine as sole nitrogen sources. A variety of conditional E. coli and S. typhimurium mutations preclude the use of sulfate as a sulfur source while allowing the nutritional requirement to be satisfied by organic compounds such as cysteine, glutathione, and djenkolate. E. coli K-12 is unable to cleave certain phosphonates, although it contains a cryptic operon that can be mutationally activated to express phosphonate-degrading activity, allowing phosphonate to serve as a sole phosphorus source.


Mutations Affecting Central Fueling Pathways


Such mutations have been most extensively studied in 
E. coli; approximately 80 structural genes for these 
activities have been identified. Together with approximately 300 genes that allow E. coli to catabolize diverse carbon sources to a small set of common glycolytic or TCA cycle intermediates, a rather comprehensive view of carbon utilization has emerged from the genetic, physiological, and biochemical studies of this organism.


Mutations Affecting Biosynthesis of Amino Acids, Lipids, Nucleotides and Cofactors


Nutritional mutations of this class have been exten- 
sively studied. Perhaps the most complete set of such 
mutations is available in E. coli although significant 
collections are available in Salmonella, pseudomonads, Bacillus, Neurospora, and yeast.

For E. coli, over 120 amino acid biosynthetic genes 
have been identified; nearly that many E. coli genes 
have been shown to be required for cofactor biosyn- 
thesis. Nucleotide biosynthesis requires at least 60 
genes while at least 25 genes are needed for fatty acid 
synthesis in this bacterium. 

Amino acid auxotrophy is not limited to biosyn- 
thetic gene mutations. Mutations in the structural 
genes for aminoacyl-tRNA synthetases that result in 
enzymes with lowered affinity for the cognate amino 
acid have been recovered as auxotrophs. Thus nutri- 
tional mutations can extend from synthesis of building 
blocks at least partially into the assembly of macro- 
molecules. 


Occurrence of Nutritional Mutations 


Bacteria and fungi have been the traditional organisms 
of choice for those studying nutritional mutations. 
Nonetheless, auxotrophic mutations have been isol- 
ated in a broad spectrum of organisms from the fruit 
fly to a variety of plants including Arabidopsis thali- 
ana. Nutritional mutations have also been observed in 
humans; inborn errors of metabolism resulting from 
the loss of catabolic enzyme activity have a long history
in genetics dating from the pioneering writings of
Garrod. These clinical genetic analyses have indicated 
that the accumulation of catabolic intermediates can 
be catastrophic, leading to symptoms as severe as 
mental retardation and death. Included among severe 
inborn errors are classical phenylketonuria (PKU) and 
propionic acidemia (PAA). PKU, associated with a 
phenylalanine hydroxylase deficiency, results in loss 
of mental capacity if dietary phenylalanine exceeds the 
minimum necessary for growth. PAA, a near-lethal inborn error, is caused by a lack of propionyl-CoA carboxylase, an enzyme needed in the catabolism of isoleucine to the TCA cycle intermediate succinyl-CoA.


Selection of Auxotrophic Mutations 


In bacteria, a number of agents allow for the enrich- 
ment of nongrowing cells by killing the members of a 
population capable of dividing. Compounds such as 
penicillin, ampicillin, nalidixic acid or cycloserine 
have been used for the enrichment of auxotrophs 
from a background of metabolically competent cells. 
Performing such enrichments in a minimal medium 
allows the isolation of a variety of auxotrophs; such an 
enrichment in a defined medium in which all but one 
pathway endproduct is added (e.g. the medium con- 
tains all bases, vitamins and 19 of the 20 common 
amino acids) allows the efficient recovery of mutants 
requiring that one endproduct for growth. 
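The arithmetic of such an enrichment can be sketched as follows, assuming that a lethal agent such as penicillin kills a fixed fraction of the growing (prototrophic) cells while every non-growing auxotroph survives. The starting frequency of 1 in 10 000 and the 99.9% kill fraction are hypothetical values; the drop-out helper simply builds the "19 of the 20 common amino acids" medium described above.

```python
def enrich(auxotroph_fraction: float, kill_fraction: float) -> float:
    """Fraction of auxotrophs after one round of counterselection.

    Assumes growing cells are killed with probability kill_fraction and
    that all non-growing auxotrophs survive the treatment.
    """
    prototrophs_left = (1.0 - auxotroph_fraction) * (1.0 - kill_fraction)
    return auxotroph_fraction / (auxotroph_fraction + prototrophs_left)


AMINO_ACIDS = {
    "Ala", "Arg", "Asn", "Asp", "Cys", "Gln", "Glu", "Gly", "His", "Ile",
    "Leu", "Lys", "Met", "Phe", "Pro", "Ser", "Thr", "Trp", "Tyr", "Val",
}


def dropout_medium(missing: str) -> set[str]:
    """Amino acids supplied in a defined medium that lacks a single end product."""
    if missing not in AMINO_ACIDS:
        raise ValueError(f"unknown amino acid: {missing}")
    return AMINO_ACIDS - {missing}


frequency = 1e-4  # hypothetical starting frequency of histidine auxotrophs
medium = dropout_medium("His")  # 19 amino acids, no histidine
for round_number in (1, 2):
    frequency = enrich(frequency, kill_fraction=0.999)
    print(f"after round {round_number}: auxotroph fraction {frequency:.3f}")
```

Under these assumptions one round raises the auxotroph fraction from 0.0001 to roughly 0.09, and a second round to about 0.99, which is why repeated enrichment cycles are effective.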


Intersection with Recombinant DNA 
Technology and Biotechnology 


Auxotrophic mutations of E. coli and yeast have been 
used as tester strains with which to isolate comple- 
menting plasmids or phages from homologous or het- 
erologous genomic or cDNA libraries. Subsequently, 
such genes have been used as heterologous hybridiza- 
tion probes allowing nonexpressed genes to also be 
isolated. Obtaining families of homologous genes 
from a variety of species allows the manipulation of 
genes whose products display a variety of allosteric, 
regulatory properties. Such genes and products 
have impacted the metabolic engineering of both 
microbes and transgenic plants. Certain commercially 
important crop protection chemicals and antibiotics 
cause phenotypic equivalents (‘phenocopies’) of auxo- 
trophic mutations; this, together with the non- 
pathogenic nature of specific auxotrophic mutants, 
underscores the importance of these mutations and 
genes in medicine, industrial microbiology, and agri- 
culture. 


See also: Auxotroph; Biochemical Genetics; 
Escherichia coli; Metabolic Disorders, Mutants 


Ochoa, Severo 


M Salas 
Severo Ochoa (1905-93) is one of the great biochemists
of the twentieth century. He had the foresight to move 
from physiology to biochemistry, and then to molecu- 
lar biology, always being at the frontiers of these fields. 
In 1959 he was awarded the Nobel Prize for Physio- 
logy or Medicine for his discovery of polynucleotide 
phosphorylase, an enzyme used to synthesize ribo- 
nucleic acid (RNA) for the first time in the test tube. 

Severo Ochoa was born on 24 September 1905 in 
Luarca, a village of the Asturias province on the north 
coast of Spain. As a high school student he was attract- 
ed by biology and, as a way to learn this discipline, he 
decided to study medicine. While he was still a medical 
student he started to work with Juan Negrin, head of 
the Department of Physiology at Madrid University, 
and spent the summer of 1927 in the laboratory of 
Noel Paton in Glasgow. During this time he developed 
a simple micro-method for the determination of cre- 
atine concentrations in muscle, which he went on to 
publish in the Journal of Biological Chemistry. After 
finishing his medicine degree, Ochoa spent 2 years 
in the laboratory of Otto Meyerhof in Berlin. Back in
Madrid in 1931 he married Carmen Garcia Cobian. In 
1932 Ochoa moved to the laboratory of Sir Henry 
Dale at the National Institute of Medical Research in 
London, where he worked on his first enzyme, glyoxy- 
lase, together with H. W. Dudley. Back in Madrid, he 
defended his PhD thesis on the role of the adrenal glands in muscle contraction.

When civil war broke out in Spain in 1936 he returned to Meyerhof’s laboratory in Heidelberg, where he studied the action of nicotinamide adenine dinucleotide (NAD), known at that time as cozymase. When Meyerhof was forced to leave Germany because of the Nazi regime, Ochoa spent 6 months at the Marine Biological Laboratory in Plymouth, England. He went on to join the laboratory of Rudolf Peters at Oxford University, where, while investigating the role of vitamin B1 and cocarboxylase in the mechanism of pyruvate oxidation, he discovered the coupling of phosphorylation to the oxidation of pyruvic acid. In 1940,
World War II forced Ochoa to move again. This time he 
went to the laboratory of Carl and Gerty Cory at the 
Washington University School of Medicine in St. Louis, USA. There, Ochoa was introduced to techniques for the isolation and characterization of enzymes. In 1942 he accepted a
position as research associate in the Department of 
Medicine at New York University School of Medicine. 
After 2 years in this department, he moved to the 
Department of Biochemistry as Assistant Professor, 
and 2 years later he accepted the chair at the Department 
of Pharmacology. In 1954, he was appointed chairman 
of the Department of Biochemistry. 

His first studies at New York University dealt with 
oxidative phosphorylation, where he found a P/O 
ratio of 3 for the phosphorylation produced by pyru- 
vic acid oxidation. To understand further the process 
of oxidative phosphorylation he decided to study sev- 
eral of the key enzymes of the tricarboxylic (citric) 
acid cycle (the Krebs cycle), which resulted in the 
isolation and characterization of several of the key 
enzymes of the cycle. Particularly relevant was the
identification and crystallization of the enzyme that 
makes citric acid from acetyl CoA and oxalacetate. 
This key enzyme in the citric acid cycle, the condens- 
ing enzyme, also bears Ochoa’s name. Another 
enzyme of the citric acid cycle, the malic enzyme, led 
Ochoa to obtain for the first time a light-dependent 
reduction of pyridine nucleotides from chloroplast 
preparations. Furthermore, the study of the condens- 
ing enzyme and acetyl CoA led Ochoa to become 
interested in fatty acid metabolism with the discovery 
of several enzymes in the pathway that converts fatty 
acids to acetyl CoA. 

Ochoa’s continuing interest in oxidative phos- 
phorylation led in 1955 to the discovery with Marianne 
Grunberg-Manago, a French postdoctoral fellow, of 
an enzyme, polynucleotide phosphorylase, in the 
bacterium Azotobacter vinelandii, which is able to 
synthesize ribonucleic acid (RNA) from ribonucleoside diphosphates. Although this enzyme did not require a DNA template to direct the assembly of specific RNA messages, which ruled out a role in the template-directed biosynthesis of cellular RNA, Ochoa received the Nobel Prize in Physiology or Medicine in 1959 for synthesizing RNA for the first time in vitro. He shared the prize with his former postdoctoral student, Arthur Kornberg, who was honored for the discovery of DNA polymerase, an enzyme able to synthesize DNA in vitro in a template-directed way.

Polynucleotide phosphorylase was later found to be 
a crucial factor in deciphering the genetic code. After 
Marshall Nirenberg discovered that polyuridylic 
acid (poly U) was able to encode a homopolypep- 
tide, polyphenylalanine, Ochoa used polynucleotide 
phosphorylase to synthesize different homo- and 
heteropolynucleotides that, in a race with the group 
of Nirenberg, led to the identification of the nucleot- 
ide triplets that encode the 20 amino acids in the 
synthesis of proteins. Ochoa’s work also showed that 
the genetic code is degenerate, that is, most of the 
amino acids are encoded by more than one triplet. 
By using polynucleotides starting or ending with spe- 
cific triplets, Ochoa’s laboratory determined the 
direction of reading of the genetic code. They also 
determined in vitro that UAA is a termination codon.

Ochoa was also interested, although not personally involved, in the replication of viral RNA genomes such as those of phages MS2 and Qβ. This
work was mainly carried out by Charles Weissmann in 
Ochoa’s laboratory. 

Another important accomplishment in Ochoa’s 
laboratory was the discovery in 1966 in Escherichia 
coli of the two initiation factors, named IF1 and IF2,
needed to start protein biosynthesis with formyl- 
methionyl tRNA by recognizing the initiation codon 
AUG. Later on, a third initiation factor, IF3, was also 
discovered by Ochoa’s group. 

At the beginning of the 1970s Ochoa switched to 
the study of initiation of protein synthesis in eukar- 
yotes with the discovery of new proteins involved in 
this step of protein synthesis and its control. This 
work was carried out until 1974 at New York Uni- 
versity and then at the Roche Institute of Molecular 
Biology in Nutley, New Jersey until 1985, when 
Ochoa went back to Spain to become Honorary 
Director of the Center of Molecular Biology ‘Severo 
Ochoa’ until his death on 1 November 1993. 

In conclusion, Ochoa’s career is synonymous with the history of biochemistry and molecular biology in the second half of the twentieth
century. His contributions to many of these areas of 
biology were in most cases seminal, and he managed to 
work on the most important biological problems of 
the time. 

Ochoa enjoyed working and he was able to transmit his enthusiasm to his disciples. I am very fortunate to have been one of them. As he said in the autobiographical work entitled “The pursuit of a hobby”, which he wrote for the Annual Review of Biochemistry in 1980: “Biochemistry is my hobby.”


See also: Kornberg, Arthur; Nirenberg, Marshall
Ochre Codon
The nucleotide triplet UAA, or ochre codon, is one of
the three ‘nonsense’ codons responsible for termin- 
ation of protein synthesis, and is the most frequent 
termination codon in Escherichia coli. 


See also: Amber Codon; Opal Codon 


Ochre Mutation 
An ochre mutation is a change in the DNA sequence that converts an amino acid (sense) codon into an ochre codon (UAA).
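Because UAA is one base away from several sense codons, a short sketch can list which codons a single base substitution would convert into an ochre codon. It assumes the standard genetic code; on the DNA coding strand the same changes read with T in place of U (for example, CAA to TAA).

```python
BASES = "UCAG"
# Standard genetic code in UCAG order; "*" marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}
OCHRE = "UAA"


def ochre_precursors() -> dict[str, str]:
    """Sense codons (with their amino acid) that one base substitution turns into UAA."""
    precursors = {}
    for pos in range(3):
        for base in BASES:
            if base == OCHRE[pos]:
                continue  # not a substitution
            codon = OCHRE[:pos] + base + OCHRE[pos + 1:]
            if CODON_TABLE[codon] != "*":  # keep sense codons only
                precursors[codon] = CODON_TABLE[codon]
    return precursors


print(ochre_precursors())
```

Seven sense codons qualify (Gln, Lys, Glu, Leu, Ser and two Tyr codons); the remaining single-step neighbors of UAA, namely UAG and UGA, are themselves stop codons.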


See also: Nonsense Codon; Ochre Codon; Start, 
Stop Codons 


Ochre Suppressor 
Ochre suppressors are genes coding for mutant
tRNAs whose anticodons have been altered such 
that they respond to the ochre codon (UAA). 
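The anticodon an ochre suppressor tRNA needs follows from antiparallel base pairing: read 5' to 3', it is the reverse complement of the codon. The sketch below assumes simplified wobble rules at the first (5') anticodon position, following Crick's wobble hypothesis and ignoring modified bases such as inosine. Under those rules the same UUA anticodon also reads the amber codon UAG, consistent with ochre suppressors generally suppressing amber codons as well.

```python
PAIR = {"A": "U", "U": "A", "G": "C", "C": "G"}
# Simplified wobble pairing for the 5' (first) anticodon base; modified bases ignored.
WOBBLE = {"U": {"A", "G"}, "G": {"C", "U"}, "C": {"G"}, "A": {"U"}}


def anticodon_for(codon: str) -> str:
    """Return the 5'->3' anticodon that pairs with a 5'->3' codon (reverse complement)."""
    return "".join(PAIR[base] for base in reversed(codon))


def reads(anticodon: str, codon: str) -> bool:
    """True if a 5'->3' anticodon can pair with a 5'->3' codon."""
    # Codon positions 1 and 2 pair strictly with anticodon positions 3 and 2.
    strict = PAIR[codon[0]] == anticodon[2] and PAIR[codon[1]] == anticodon[1]
    # Codon position 3 pairs with anticodon position 1, where wobble is allowed.
    return strict and codon[2] in WOBBLE[anticodon[0]]


ochre_anticodon = anticodon_for("UAA")
print("ochre suppressor anticodon:", ochre_anticodon)
for stop in ("UAA", "UAG", "UGA"):
    print(stop, reads(ochre_anticodon, stop))
```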


See also: Ochre Codon; Transfer RNA (tRNA) 


Ohno’s Law 


M F Lyon 
In 1967 Susumu Ohno put forward the suggestion that
a gene which is X-linked in one mammalian species is 
X-linked in all. This has become known as Ohno’s 
Law. The feature of mammals which led Ohno to 
propound this Law was X chromosome inactivation. 
In this phenomenon one X chromosome in every cell 
of female mammals becomes genetically inactive. The 
result of this is that cells of both males and females 
have effectively a single dose of X-linked gene prod- 
ucts but a double dose of autosomal genes. Thus if, 
during evolution, a translocation occurred which 
moved genes from the X chromosome to an autosome 
or vice versa, gene dosages would become unbalanced 
and the translocation would be eliminated by natural 
selection. 
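The dosage argument can be made explicit with a small calculation. This sketch assumes that X inactivation leaves exactly one active X per diploid cell, ignores genes that escape inactivation and the pseudoautosomal region, and treats a translocated gene as present on both homologs of its new chromosome; it illustrates the reasoning only and is not a population-genetic model.

```python
X_CHROMOSOMES = {"female": 2, "male": 1}


def active_copies(location: str, sex: str) -> int:
    """Active gene copies per diploid cell, assuming all but one X is inactivated."""
    if location == "autosome":
        return 2  # both autosomal homologs are active
    if location == "X":
        n_x = X_CHROMOSOMES[sex]
        inactivated = n_x - 1  # X inactivation silences every X but one
        return n_x - inactivated
    raise ValueError(f"unknown location: {location}")


def dosage_ratio(sex: str, before: str, after: str) -> float:
    """Relative change in active dosage when a gene moves between chromosome types."""
    return active_copies(after, sex) / active_copies(before, sex)


for sex in ("female", "male"):
    print(sex,
          "X -> autosome:", dosage_ratio(sex, "X", "autosome"),
          "autosome -> X:", dosage_ratio(sex, "autosome", "X"))
```

In both sexes the model gives a doubling of active dosage for a gene moved off the X and a halving for a gene moved onto it, which is the imbalance that selection is argued to eliminate.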

Ohno’s Law is widely obeyed in eutherian mam- 
mals. Very many X-linked genes are known in human 
and mouse, and only one is known to break the Law. 
This is a gene X-linked in human and also in a wild 
mouse species closely related to the laboratory mouse, 
but on an autosome in the laboratory mouse strain 
C57BL. In addition, X-linked genes are known in 
many other species of mammals, including cats and 
dogs, and farm animals such as cow, sheep, and horse. 
All obey Ohno’s Law. 

There are exceptions to the Law among genes 
which are on the human X chromosome but also 
have an ortholog on the Y chromosome, so-called 
pseudoautosomal genes. These genes would not be 
expected to obey the Law because there would be a 
diploid dosage of their products, whether on the X 
and Y chromosomes or on autosomes. Several human 
pseudoautosomal genes have mouse counterparts 
that are autosomal. The constancy of X chromo- 
somal genes throughout mammals contrasts with the 
distribution of autosomal genes. Comparative genetic 
maps have been made covering many mammals from 
various orders, the most detailed maps being from 
man and mouse. Most human autosomes have coun- 
terparts on several different mouse autosomes and vice 
versa. 

Ohno’s Law applies not only to eutherian mam- 
mals, but also to marsupials. However, in marsupials 
only the genes with orthologs on the long arm 
of the human X chromosome are X-linked and
those from the short arm lie in groups on two or 
more autosomes. Similarly, genes from the human 
X chromosome long arm, but not from the short
arm, are X-linked in monotremes. Jenny Graves has 
suggested that genes from the present long arm of 
the human X chromosome constituted the original 
X chromosome in the evolution of mammals and 
have been conserved throughout. In eutherian mam- 
mals, she suggests, there have been two or more 
additions of material from autosomes to the X and 
Y chromosomes. Then, during evolution, ortho- 
logs on the Y chromosome of most of these genes 
have been lost, leaving just a small pseudoautosomal 
region. 

Although all mammalian X chromosomes carry 
the same genes, they are not arranged in the same 
order. The genes on the human and mouse X chromo- 
somes can be divided into several blocks. Within a 
block the genes are in the same order in the two 
species, but the blocks have been rearranged with 
respect to each other. Thus, during evolution, there 
must have been various inversions or other transpos- 
itions of genes. 
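One common way to quantify such block rearrangements in comparative genomics is to count breakpoints: positions where two adjacent blocks in one species are not adjacent, in the same relative orientation, in the other. The sketch below is offered only as an illustration of that general idea and is not the method used to define the human-mouse blocks; the block numbers are hypothetical, and a negative number marks an inverted block. Because one inversion can create at most two breakpoints, half the breakpoint count is a lower bound on the number of inversions needed.

```python
def breakpoints(order: list[int]) -> int:
    """Count breakpoints in a signed block order relative to the reference 1..n.

    Adjacent blocks a, b count as conserved when b == a + 1; this also holds
    inside an inverted run such as -3, -2.
    """
    n = len(order)
    if sorted(abs(block) for block in order) != list(range(1, n + 1)):
        raise ValueError("order must be a signed permutation of 1..n")
    framed = [0] + list(order) + [n + 1]  # fixed ends so terminal changes count too
    return sum(1 for a, b in zip(framed, framed[1:]) if b - a != 1)


reference = [1, 2, 3, 4, 5]        # hypothetical block order in one species
one_inversion = [1, -3, -2, 4, 5]  # blocks 2 and 3 inverted as a unit

print(breakpoints(reference))      # conserved order
print(breakpoints(one_inversion))  # a single inversion
```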

Scientifically, Ohno’s Law has been very valuable 
since it has enabled the prediction of which genes will 
be on the X chromosome of previously little-studied 
species. In addition, it is helpful in attempts to find 
animal models of human genetic diseases. If a genetic 
disease is X-linked in man, then its animal model must 
also be X-linked and must lie in the appropriate con- 
served block of genes. 


See also: Pseudoautosomal Linkage, Region; 
Sex Linkage; X-Chromosome Inactivation 


Okazaki Fragment 
Okazaki fragments are short fragments of newly
synthesized DNA strands produced during dis- 
continuous DNA replication. They are later joined 
covalently by ligases to form an intact strand. Okazaki 
fragments were first observed by Okazaki using 
pulse-labeling with radioactive thymidine. In eukar- 
yotes, they are typically a few hundred nucleotides 
long, whereas in prokaryotes they are generally longer 
(1000-2000 nucleotides). 
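The effect of fragment length on the number of Okazaki fragments can be shown with a short interval model. This sketch assumes a hypothetical 10 000-nucleotide stretch of lagging strand, uses 1000 nucleotides for the prokaryotic case and a hypothetical 200 nucleotides for "a few hundred", and ignores primer removal and gap filling, so ligation is modeled simply as sealing the nicks between abutting fragments.

```python
def okazaki_fragments(strand_length: int, fragment_length: int) -> list[tuple[int, int]]:
    """Split a lagging strand into consecutive (start, end) fragments, end exclusive."""
    if strand_length <= 0 or fragment_length <= 0:
        raise ValueError("lengths must be positive")
    return [
        (start, min(start + fragment_length, strand_length))
        for start in range(0, strand_length, fragment_length)
    ]


def ligate(fragments: list[tuple[int, int]]) -> tuple[int, int]:
    """Join abutting fragments into one continuous strand, as ligase seals each nick."""
    ordered = sorted(fragments)
    for (_, left_end), (right_start, _) in zip(ordered, ordered[1:]):
        if left_end != right_start:
            raise ValueError(f"gap or overlap between {left_end} and {right_start}")
    return ordered[0][0], ordered[-1][1]


prokaryote_like = okazaki_fragments(10_000, 1_000)
eukaryote_like = okazaki_fragments(10_000, 200)
print(len(prokaryote_like), "fragments of ~1000 nt")
print(len(eukaryote_like), "fragments of ~200 nt")
print("after ligation:", ligate(eukaryote_like))
```

The same stretch of lagging strand thus needs five times as many fragments, and five times as many ligation events, when fragments are 200 rather than 1000 nucleotides long.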


See also: DNA Replication; Lagging Strand; 
Replication; Replication Errors 




Olfaction 
I Mori 


Copyright © 2001 Academic Press 
doi: 10.1006/rwgn.2001.0924


Olfaction is a major sense in animals. The detection of 
volatile chemical compounds is an important attribute 
for any animal to survive and reproduce in the natural 
environment. Different animals utilize different types 
of olfactory organs. For example, humans detect 
odors through the olfactory epithelium of the nose, 
whereas most insects detect odors through their 
antennae. Although olfactory systems are divergent 
throughout evolution, olfactory receptor neurons 
possess common properties and structure. It is intriguing that even fish have an olfactory organ that is distinct in function and structure from their gustatory organ, but is similar to the olfactory organ of
mammals. Similarly, molecular analysis of olfaction 
reveals that on sensing olfactory stimuli, essentially 
the same signaling events occur in vertebrates as in 
invertebrate species. In this review, the mammalian 
olfactory system is described as an example of 
vertebrate olfactory systems. This review also briefly 
considers the olfactory system of the nematode 
Caenorhabditis elegans, which is one of the best- 
characterized sensory systems in invertebrates at 
molecular and cellular levels. 


Olfaction in Mammals 


The mammalian olfactory system is one of the most highly developed sensory systems. Even humans have the ability to detect and discriminate at least 10 000 different
odorants. In mammals, odors are sensed in the olfac- 
tory epithelium of the nasal cavity, where olfactory 
neurons are distributed in such a way that the sen- 
sory cilia of each olfactory neuron face the nasal cavity. 
Odor detection first occurs in the sensory cilia of olfactory neurons, and the resulting olfactory signals are transmitted to the olfactory cortex and to other areas of the brain through synaptic connections between olfactory neurons and downstream neurons, such as mitral or tufted cells, in the main olfactory bulb.

In most cases, mammals have a second olfactory 
organ called the vomeronasal organ (VNO), which is 
situated on the lower side of the nasal cavity. The 
VNO detects pheromones and is particularly import- 
ant for some animals such as mice, in which phero- 
mones play a key role in controlling their behaviors. 
Olfactory sensation in the VNO is transmitted to the 
accessory olfactory bulb, which occupies a distinct 
area of the main olfactory bulb. Since the brain areas that receive signals from the accessory olfactory bulb differ from those that receive signals from the main olfactory bulb, odorant sensation and pheromone sensation produce different behavioral and emotional outcomes.


Olfactory Receptors 


It is understood that all olfactory receptors are G-protein-coupled seven-transmembrane-domain receptors, usually encoded by the largest gene family in any animal. In mammals, there are about 1000 genes that encode olfactory receptors in the olfactory epithelium. In the VNO, there are two olfactory receptor families: the V1R family, which consists of 35 members, and the V2R family, which consists of 150 members. These two receptor families are likely to detect pheromones. The members of olfactory receptor families are very diverse in their amino acid sequences, which is consistent with the fact that animals detect a large number of odorants. Each odorant may interact with and activate a single receptor or a small subset of olfactory receptor proteins.
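A quick calculation shows why about 1000 receptors, each activated by a small subset of odorants, could in principle support discrimination of far more than 10 000 odorants. This is a toy combinatorial sketch, not a model from the text: it assumes receptors respond in an all-or-none way and that each odorant activates exactly k receptors, so the number of distinguishable activation patterns is the binomial coefficient C(1000, k).

```python
from math import comb


def distinct_patterns(n_receptors: int, receptors_per_odorant: int) -> int:
    """Number of distinct receptor subsets of a given size (all-or-none toy model)."""
    return comb(n_receptors, receptors_per_odorant)


for k in (1, 2, 3):
    print(f"{k} receptor(s) per odorant: {distinct_patterns(1000, k):,} patterns")
```

If each odorant activated only one receptor, 1000 receptors would give at most 1000 patterns; with subsets of two or three receptors the count rises to hundreds of thousands or more than a hundred million, comfortably above the number of odorants humans are said to discriminate.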


Organization of Olfactory Receptors 


There are a number of interesting questions regarding 
organization of olfactory receptors. First, molecular 
and cellular studies indicate that each olfactory neuron 
seems to express only a single type of olfactory recep- 
tor. How a single gene is chosen from among 1000 
olfactory receptor-coding genes in a particular olfac- 
tory neuron remains a mystery. Second, recent analysis 
has revealed spatially distinct expression of genes en- 
coding olfactory receptors in the olfactory epithelium. 
Although these receptors are diverse in their make- 
up, they are categorized on the basis of the zone in 
which they are expressed. There are four zones and 
each olfactory receptor is expressed in one of these 
zones. The function of the zonal organization is un- 
known. In the VNO, there appear to be two zones: 
one that expresses members of the V1R family, and the 
other that expresses members of the V2R family. 

On the surface of the main olfactory bulb, there are 
about 2000 units of structures called glomeruli, where 
axons of olfactory neurons synapse onto downstream 
neurons, such as mitral cells. Interestingly, each olfactory neuron projects its axon toward a specific glomerulus. Furthermore, olfactory neurons that express the same type of olfactory receptor send their axons to the same glomerulus. How is this precise olfactory projection established? In one model, the olfactory receptor itself is thought to be a determinant for projection to a particular glomerulus, since messenger RNAs (mRNAs) for olfactory receptors have unexpectedly been detected in the axon that projects to the glomerulus. It remains to be elucidated, however, how a receptor expressed in an olfactory neuron contributes to the targeting of its axon to a specific glomerulus. The axons of
VNO neurons also synapse in the glomeruli of the 
accessory olfactory bulb. In the main olfactory bulb, 
a single mitral cell that receives a sensory signal from 
an olfactory neuron is connected to a single glomeru- 
lus, whereas a single mitral cell is connected to mul- 
tiple glomeruli in the accessory olfactory bulb. It is 
generally believed that the VNO sensory system 
reflects the primitive form of olfactory systems in 
vertebrates. 
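The convergence rule (one receptor per neuron, and all neurons expressing the same receptor projecting to the same glomerulus) can be sketched as a grouping operation. The neuron and receptor names below are hypothetical, and the sketch assumes one glomerulus per receptor type for simplicity; with about 2000 glomeruli for about 1000 receptor genes, the real mapping is not strictly one-to-one.

```python
from collections import defaultdict


def project_axons(neuron_receptor: dict[str, str]) -> dict[str, list[str]]:
    """Group neurons by the single receptor each expresses; each group targets one glomerulus."""
    glomeruli: defaultdict[str, list[str]] = defaultdict(list)
    for neuron, receptor in neuron_receptor.items():
        glomeruli[f"glomerulus_{receptor}"].append(neuron)
    return dict(glomeruli)


def activated_glomeruli(odorant_receptors: set[str], neuron_receptor: dict[str, str]) -> set[str]:
    """Glomeruli receiving input when an odorant activates the given receptors."""
    expressed = set(neuron_receptor.values())
    return {f"glomerulus_{r}" for r in expressed if r in odorant_receptors}


# Hypothetical neurons, each expressing one hypothetical receptor type.
neurons = {"n1": "OR_A", "n2": "OR_B", "n3": "OR_A", "n4": "OR_C", "n5": "OR_B"}

print(project_axons(neurons))
print(activated_glomeruli({"OR_A", "OR_C"}, neurons))
```

Because all neurons with the same receptor converge, an odorant's receptor subset maps directly onto a spatial pattern of active glomeruli in the bulb.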


Olfactory Signal Transduction 


The molecular mechanism of olfactory signal transduction in the main olfactory epithelium is now well established. On sensing a ligand (an odor), the G-protein-coupled seven-transmembrane-domain receptor is activated, which in turn activates the G-protein Gαolf. The activated G-protein stimulates adenylyl cyclase to increase the intracellular concentration of cAMP. The binding of cAMP then opens a cyclic nucleotide-gated cation channel, which leads to depolarization of olfactory neurons. In the VNO, the mechanism of sensory signaling is still unknown, although several signaling molecules that differ from those used in the main olfactory epithelium are implicated.

When the same odor is sensed for some time, the 
response to that odor becomes diminished. This 
phenomenon is called olfactory adaptation. Electrophysiological studies have demonstrated that continuous stimulation of olfactory neurons decreases the opening frequency of ion channels. Olfactory adaptation requires extracellular calcium, and can be diminished when the calcium chelator EGTA is present inside the olfactory neuron.
Thus, calcium influx induced by olfactory sensation 
causes an increase in intracellular calcium concen- 
trations, which is thought to inhibit the olfactory re- 
sponse by modifying the olfactory signaling pathway. 
What then is the target molecule for calcium modifi- 
cation? Recent studies indicate that calcium directly 
modifies cAMP-gated cation channels, thereby 
decreasing the channels’ sensitivity to cAMP. Also, 
there is evidence to suggest that the sensitivity of 
olfactory receptors is modulated by phosphorylation 
by kinases. 
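The last step of the cascade (odorant, receptor, Gαolf, adenylyl cyclase, cAMP, cyclic nucleotide-gated channel) and the calcium feedback described above can be illustrated with a toy Hill-equation model. This is a hypothetical sketch, not a published model: the half-activation cAMP concentration, Hill coefficient and calcium constant are invented values, and calcium is assumed to act simply by raising the channel's effective half-activation constant, which lowers its sensitivity to cAMP.

```python
def open_probability(camp_uM: float, ca_uM: float,
                     k_half_uM: float = 2.0, hill: float = 2.0,
                     k_ca_uM: float = 0.5) -> float:
    """Toy open probability of a cAMP-gated channel with calcium feedback.

    Effective K1/2 = k_half_uM * (1 + [Ca2+] / k_ca_uM); all parameters are hypothetical.
    """
    k_eff = k_half_uM * (1.0 + ca_uM / k_ca_uM)
    return camp_uM ** hill / (camp_uM ** hill + k_eff ** hill)


camp = 4.0  # uM cAMP, held constant to isolate the calcium effect
before = open_probability(camp, ca_uM=0.05)  # low resting calcium
adapted = open_probability(camp, ca_uM=1.0)  # calcium raised by sustained influx

print(f"open probability before adaptation: {before:.2f}")
print(f"open probability after calcium rise: {adapted:.2f}")
```

With the same cAMP level, the assumed calcium rise reduces the open probability from about 0.77 to about 0.31, which is the qualitative signature of adaptation by reduced channel sensitivity.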


Genetic Approaches to Studying 
Olfaction: C. elegans as Model System 


Caenorhabditis elegans is a 1 mm long, free-living nematode that lives in soil. It is quite likely that C. elegans depends heavily on olfactory cues to find and stay near its food source, bacteria, in its natural habitat. C. elegans has been found to sense, discriminate, and adapt to a variety of odors using only six olfactory receptor neurons of three types, which are situated in the head sensory organs called amphid sensilla. The C. elegans nervous system consists of only 302 neurons, whose complete wiring diagram has been determined by ultrastructural analysis. The short life cycle, the ease of culturing in the laboratory, and the ease with which genetic crosses can be made by mating make this small worm a powerful genetic model organism.

As in vertebrates, olfactory receptors in C. elegans 
are found to be G-protein-coupled seven-transmem- 
brane domain proteins that are encoded by about 1000 
genes. Of these, nearly 400 genes are thought to 
encode chemosensory receptors, which consist of 
olfactory and gustatory receptors. Thus, the involve- 
ment of a large number of predicted olfactory recep- 
tors in the C. elegans olfactory system is similar to that 
observed in vertebrate olfactory systems, but there are 
differences in other respects. Since there are only six 
olfactory neurons in C. elegans, each one is likely to 
express multiple olfactory receptors, which is consist- 
ent with the results from expression analysis for some 
of the olfactory receptors. In this respect, the C. elegans olfactory system contrasts with the mammalian olfactory system, in which each olfactory neuron seems to express a single type of olfactory receptor.

The C. elegans ODR-10 protein, a predicted 
G-protein-coupled transmembrane domain protein, 
was the first olfactory receptor whose function was identified by genetic analysis. The ODR-10 receptor
is expressed only in a single type of olfactory neuron, 
AWA, and interacts with the odorant diacetyl. Of 
the three types of olfactory neurons that mediate 
olfactory responses in C. elegans, the AWA and AWC 
neurons detect attractive cues, and the AWB neurons 
detect repulsive cues. An interesting experiment 
was carried out in which the ODR-10 receptor was 
ectopically expressed only in the AWB neurons 
that usually induce aversion responses. When the 
odorant diacetyl was applied to the transgenic animals 
expressing ODR-10 only in the AWB olfactory 
neurons, an aversive response was induced. This result 
suggests that the olfactory neuron and not the olfac- 
tory receptor determines olfactory responses in 
C. elegans. 
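The logic of the ectopic-expression experiment can be written as a small lookup: the behavior depends on which neuron type is activated, not on which receptor activates it. In this toy sketch the neuron-to-behavior mapping and the ODR-10/diacetyl pairing come from the text; the data structures and function name are hypothetical, and the sketch ignores everything else about the circuit.

```python
# Behavior driven by each olfactory neuron type (from the text).
NEURON_BEHAVIOR = {"AWA": "attraction", "AWC": "attraction", "AWB": "aversion"}

# Receptor-ligand pairing from the text: ODR-10 responds to diacetyl.
RECEPTOR_LIGANDS = {"ODR-10": {"diacetyl"}}


def responses(odorant: str, expression: dict[str, set[str]]) -> set[str]:
    """Behaviors triggered by an odorant, given which receptors each neuron type expresses."""
    triggered = set()
    for neuron, receptors in expression.items():
        if any(odorant in RECEPTOR_LIGANDS.get(r, set()) for r in receptors):
            triggered.add(NEURON_BEHAVIOR[neuron])  # the neuron, not the receptor, sets the response
    return triggered


wild_type = {"AWA": {"ODR-10"}, "AWB": set(), "AWC": set()}
ectopic_awb = {"AWA": set(), "AWB": {"ODR-10"}, "AWC": set()}  # ODR-10 only in AWB

print("wild type:", responses("diacetyl", wild_type))
print("ODR-10 in AWB only:", responses("diacetyl", ectopic_awb))
```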

Olfactory signaling in the AWB and AWC neurons 
is found to be similar to that of mammalian olfactory 
neurons, since cyclic nucleotide-gated cation channels 
appear to function in the last step of olfaction in these 
neurons. In the AWA neurons, the OSM-9 protein, a capsaicin receptor-like cation channel, is essential instead of the cyclic nucleotide-gated channel. In addition to these molecules, other components required for olfactory signaling have been identified, and more continue to be characterized through genetic analysis. Because the olfactory system is essentially conserved between vertebrates and invertebrates, future genetic analysis of the C. elegans olfactory system should reveal further important similarities and differences in olfaction across species.




See also: Neurogenetics in Caenorhabditis elegans 
Definition


During normal growth and differentiation, cell proliferation is regulated by growth factors that interact with specific receptors on the plasma membrane and, via subsequent reactions, eventually lead to alterations in gene expression. The proteins involved in these biochemical steps are the products of proto-oncogenes, which are normal cellular genes. If these proto-oncogenes are inappropriately activated, they become oncogenes and are involved in tumor development. Most oncogene protein products function in the signaling pathways that regulate cell proliferation in response to growth factor stimulation. These products include growth factors, growth factor receptors, signal transducers, and transcription factors, and their functions may also involve direct control of the cell cycle and the inhibition of apoptosis. Oncogenes therefore contribute to the abnormal regulation of cell proliferation seen in tumor cells and may contribute to the abnormal differentiation and failure of programmed cell death (apoptosis) characteristic of some cancers.

Events producing oncogene activation, with examples

Oncogene amplification: Amplification of the N-myc gene is frequently present in late-stage tumors and is associated with the progression of neuroblastomas to increased levels of malignancy.

Activation of oncogenes by transposition to an active chromatin domain: The overproduction of an oncogenic product may also occur by loss of transcriptional control through chromosomal translocation, as typified by the t(8;14) translocation seen in 75% of patients with Burkitt’s lymphoma. The translocation causes the myc oncogene on chromosome 8 to become positioned next to an immunoglobulin gene, e.g., the heavy-chain gene on chromosome 14. The constitutive expression of the transposed myc gene after the translocation thereby leads to an inappropriately high level of gene product.

Activation by point mutation: In members of the ras family, activating single-base substitutions cause amino acid changes at positions 12, 13, and 61 in a wide range of human tumors, with an overall incidence of 10-15%, but as high as 95% in pancreatic carcinomas. These substitutions alter the structure of the normal protein, resulting in abnormal activity of the guanine nucleotide-binding proteins that they encode.

Activation by production of chimeric gene products: Oncogenes can also be activated by chromosomal translocation resulting in the production of a fusion protein. The best-known tumor-specific chromosomal rearrangement, which produces the small acrocentric Philadelphia chromosome, is seen in 90% of patients with CML. This chromosome is produced by a balanced reciprocal t(9;22) translocation. The translocation joins most of the abl gene on chromosome 9 to a gene called bcr (breakpoint cluster region) on chromosome 22, thereby creating a novel fusion gene. This results in both aberrant activity and aberrant subcellular location of the Abl protein tyrosine kinase, thereby leading to cell transformation.


Discovery of Oncogenes 


It is now accepted that cancer is a genetic disease, 
caused by mutations in a number of specific genes. 
This was not clear until 1960 when cytogenetic 
analysis showed that the Philadelphia chromosome 
was consistently found in the cells of patients with 
chronic myelogenous leukemia (CML). This sug- 
gested that genetic aberrations were likely to be 
associated with the production of a cancer cell. 
Further evidence of this association came from the 
identification of a link between cancer and viruses 
when Peyton Rous discovered that a virus (the Rous sarcoma virus) caused sarcomas in chickens, an effect now known to be due to a single gene, the v-src oncogene.
The Rous sarcoma virus is a member of the retrovirus family of viruses, which possess an RNA genome encoding three genes essential for viral replication. However, the genome of the transforming Rous sarcoma virus contained a fourth gene that was not viral in origin but had been accidentally picked up by the virus from its host during a process termed transduction. Other transforming retroviruses have acquired oncogenes from their hosts in the same way.


Oncogenes and Human Cancer 


It has been shown that DNA extracted from chem- 
ically transformed cells can transform recipient 
mouse-derived NIH-3T3 cells. Subsequent genome 
analysis revealed the presence of oncogenic sequences 
homologous to those found in the transforming retro- 
viruses. Oncogenic sequences were subsequently 
identified in DNA extracted from both human 
tumor cell lines and biopsies. 

Nearly 200 proto-oncogenes have now been iden- 
tified and an activated oncogenic form of at least one 
of these genes has been shown to be associated with 
most human tumor groups. 


Functions of the products of oncogenes, with examples

Growth factors: Bombesin, a peptide produced in small cell lung cancer, causes the hydrolysis of the membrane phospholipid phosphatidylinositol 4,5-bisphosphate (PIP2), leading to an increase in intracellular calcium, which acts as a signal for cells to enter the cell cycle.

Growth factor receptors: ErbB2 is a receptor protein tyrosine kinase that is activated by gene amplification; it is overexpressed in approximately 30% of ovarian cancers and in 15 to 20% of invasive breast cancers. Overexpression is also associated with poor prognosis.

Signal transducers: Ras proteins play a key role in mitogenic signaling by coupling growth factor receptors to activation of the Raf1 protein serine/threonine kinase. This initiates a protein kinase cascade that ultimately leads to phosphorylation of nuclear transcription factors and therefore to altered gene expression.

Transcription factors: Myc gene products induce cell proliferation and inhibit terminal differentiation in response to mitogenic stimuli. N-myc is frequently amplified in neuroblastomas, retinoblastomas, gliomas, and astrocytomas.

Cell-cycle regulators: Cyclin D1 can be activated to an oncogene (called PRAD1) by gene amplification, leading to constitutive expression that drives the cell cycle forward beyond G1 in the absence of normal growth factor stimulation.

Apoptosis inhibitors: Bcl-2 overexpression in certain lymphoid neoplasms results from the chromosomal translocation t(14;18)(q32;q21), which involves the immunoglobulin heavy chain locus at chromosome 14q32 and the bcl-2 gene at chromosome 18q21. Because the normal function of bcl-2 is to suppress apoptosis, its increased expression reduces levels of apoptosis, thereby maintaining cell survival and contributing to both tumor formation and progression.
Oncogene Activation


Activation of proto-oncogenes to oncogenes results in 
a gain of function and may be quantitative (an increase 
in the production of an unaltered product) or qualita- 
tive (the production of a modified product). As a 
result of these alterations, activated oncogenes induce 
abnormal cell proliferation and therefore tumor devel- 
opment. Quantitative forms of oncogene activation 
occur either by amplification or by transposition to 
an active chromatin domain, whereas qualitative 
forms of activation occur either by point mutation or 
by the production of a novel product from a chimeric 
gene (Table 1). These changes are generally dominant 
mutations and are clonally maintained. 


Oncogene Function 


Oncogenes include genes that encode proteins with a 
wide variety of functions as seen in the examples in 
Table 2. 


See also: Apoptosis; Cell Cycle; FMS Oncogene; 
Myb Oncogene; Neu Oncogene; Philadelphia 
Chromosome; Pim Oncogenes; Rel Oncogene; 
RET Proto-Oncogene 
The registered term OncoMouse™ refers to a genetically engineered transgenic mouse whose cells contain an activated oncogene DNA sequence.


OncoMouse™ Biology 


The advent of transgenic mouse technology enabled 
the investigation of how deregulated expression of 
viral and cellular oncogenes could contribute to the 
multistep process of cancer in the context of the living 
mammal. In the early 1980s, at Harvard Medical School, Timothy Stewart and Philip Leder used transgenic mouse technology to
examine the consequences of the deregulation of the 
myc proto-oncogene. The myc transgene was con- 
structed so its expression would be under the control 
of the hormonally inducible mouse mammary tumor 
virus regulatory elements. In the transgenic mice produced, deregulated expression of the myc proto-oncogene was associated with spontaneous mammary adenocarcinomas, leading to the term OncoMouse™. Although myc
expression was deregulated in all mammary epithelial 
cells, tumors arose from only a very small number of 
cells. Thus, although the myc gene could contribute 
to tumorigenesis, it alone was not sufficient for the 
tumorigenesis seen in these transgenic mice. This observation supported the view that mutations at different genetic loci act together in a multistep pathway to cancer. Moreover, the myc transgenic mice provided
a powerful animal model to facilitate the identification 
of these collaborating cancer genes. 

The paradigm of examining the consequences of 
deregulated oncogenes in transgenic mice was quickly 
extended to a large number of proto-oncogenes (e.g., 
ras, Wnt, neu). The availability of cancer-prone trans- 
genic strains of mice continues to provide powerful 
animal models to study genetic mechanisms, environ- 
mental contributions, and physiological responses 
to cancer in a living mammal. Moreover, the availability of increasingly well-characterized cancer-prone transgenic strains has additional utility in three other
important areas: (1) derivation of tumor cell lines 
for cell culture; (2) assay of potential carcinogens; 
and (3) testing the effectiveness of novel anticancer 
therapeutics. Lastly, the experimental results obtained 
with the OncoMouse™ model were influential in 
catalyzing the rapid incorporation of transgenic tech- 
nology in the engineering and characterization of 
many other mouse models for a wide range of 
human diseases. 


OncoMouse™ Policy Ramifications 


In addition to its fundamental biological importance, 
the OncoMouse™ continues to be a ‘lightning rod’ for 
the changes and debate surrounding the funding of 
academic biological research and the intellectual pro- 
perty positions and commercialization of genetically 
engineered animals. In 1984, Harvard University filed 
for a patent on “Transgenic Non-Human Mammals.” 
In April 1988, in the midst of a policy and political controversy centered on the patenting of animals, the US Patent and Trademark Office awarded Harvard a patent, historically the first ever issued on an animal. Because the research carried out at Harvard Medical School was funded in part by E.I. du Pont de Nemours and Company, the technology was licensed exclusively to DuPont under the registered name OncoMouse™, and, soon after, the company began marketing and selling the OncoMouse™. In both scientific and societal contexts, the OncoMouse™ patent became emblematic of the controversy surrounding the patenting, marketing, and accessibility of genetically modified life forms. In January 2000, after more than a decade of controversy regarding access to and use of proprietary OncoMouse™ transgenic technology, the US National Institutes of Health and DuPont reached an agreement by which DuPont would retain its commercial rights but academic and government researchers would have unencumbered use of the OncoMouse™.


See also: Carcinogens; Oncogenes; Transgenic 
Animals 
Oogenesis is the process of forming the female
gamete, i.e., the ovum or egg. In Caenorhabditis ele- 
gans, gametes derive from a tissue called the germ line, 
which is specified early in embryonic development. 
Two major events occur during oogenesis: the oocyte 
precursor germ cell undergoes meiotic division and it 
accumulates substantial cytoplasm. In meiosis, two 
sequential rounds of cell division produce a haploid 
egg, with only one copy of each chromosome, from 
the diploid oocyte precursor cell. Simultaneously, a 
large volume of cytoplasm is accumulated; it contains 
yolk and numerous other components that are essen- 
tial for early embryonic development. Meiotic pro- 
gression seems to be an integral part of oogenesis, 
since a number of proteins are required both for meio- 
tic progression and for the development of functional 
oocytes. For example, GLD-1, an RNA-binding 
protein, is required for maintenance of oocyte pre- 
cursors in pachytene stage (see below); in its absence, 
female germ cells will enter meiosis and progress 
to pachytene stage, but then exit meiosis and return 
to mitotic proliferation. In contrast, male germ 
cells do not require GLD-1 for meiosis and gameto- 
genesis. 

The C. elegans gonad is a U-shaped tube and has a 
distal-to-proximal polarity with respect to germline 
development. Germ cells at the distal end of the tube 
are proliferative (mitotic) and germ cells located more 
proximally are meiotic; sperm and mature oocytes 
are present at the proximal end. Certain somatic 
gonad cells, the distal tip cells, maintain a mitotic 
germ cell population in the distal gonad by signaling 
the germ line to proliferate. Most of the C. elegans 
germ line is technically a syncytium with nuclei 
arranged toward the outside and a common cytoplasmic core that is critical for oogenesis. However,
each nucleus is associated with local cytoplasm and 
partially enclosed by a plasma membrane; therefore, 
for ease of description, it is often referred to as a 
germ ‘cell.’ 

The C. elegans hermaphrodite produces sperm dur- 
ing mid-late larval development and abruptly begins 
to produce oocytes at approximately the time of the 
larval-to-adult molt; oogenesis continues throughout 
adulthood. Consequently, the hermaphrodite germline is considered to be male during larval development and to become female just prior to the adult molt through the regulation of a set of sex determination
genes. Oocyte precursors located just proximal to 
the distal proliferative region enter meiosis and pro- 
ceed fairly rapidly through early meiotic stages (lepto- 
tene and zygotene stages of prophase I of meiosis). 
They progress very slowly through pachytene stage 
of prophase I during which time oocyte cytoplasmic 
contents are synthesized. By synthesizing compon- 
ents of the oocyte cytoplasm, the oocyte precursor 
cells also act as support cells; they are analogous 
to germline ‘nurse’ cells found in some other animal
species. 

The late-stage C. elegans oocyte has a cytoplasmic 
volume far larger than the average cell in the body. It 
contains materials essential for embryonic develop- 
ment in general and early embryogenesis in particular, 
including factors that facilitate metabolism and the 
rapid DNA replication and cell cleavages characteris- 
tic of early development. It also includes specialized 
proteins and messenger RNAs required for setting up 
the embryonic body plan and distinguishing the fate 
of various early embryonic cells. Evidence suggests 
that many of these components are synthesized by 
pachytene germ cells and are moved into the common 
cytoplasmic core, which is eventually included in the 
growing oocytes. In contrast, yolk proteins are 
synthesized in the intestine (see below). 

Most developing oocytes located at and proximal 
to the bend in the gonad exit pachytene stage and 
proceed further through meiosis to diakinesis stage 
of prophase I. They also begin to change morphologic- 
ally, becoming progressively larger and eventually 
taking on the block-like morphology of mature 
oocytes. Most of this growth occurs while cells are in 
diakinesis. Other oocyte precursors at the bend do not 
develop further, but instead undergo programmed cell 
death, perhaps to provide room for the remaining 
oocytes to grow. In the loop region and proximal 
gonad, cells of the somatic gonad, the ‘sheath’ cells, 
form an epithelium that encloses the germ line and 
regulates oogenesis in at least two ways. First, 
together with cells in the distal spermatheca (the sperm storage vesicle), sheath cells appear to signal meiotic
progression beyond pachytene stage. Signaling may 
be accomplished via gap junctions that are observed 
between sheath cells and maturing oocytes. Progres- 
sion past the pachytene stage also depends on signal- 
ing via a mitogen-activated protein kinase (MAPK) in 
the germ line, which is perhaps triggered by the prox- 
imal sheath cell/distal spermathecal signal. Second, the 
proximal sheath cells act as an oviduct. Contractions 
of the sheath cell epithelium, together with dilation of 
the spermatheca, allow oocytes to be ovulated and 
subsequently fertilized. Evidence suggests that the 
oocyte may actively regulate ovulation by modulating 
sheath cell contractions and by signaling spermathecal 
dilation. Sheath cells may also play a role in yolk 
uptake. Yolk proteins are synthesized in the intes- 
tine and transported to the proximal gonad as yolk 
particles. They are taken up by oocytes in the prox- 
imal gonad through specialized pores in the sheath 
cells. 

As the proximal-most oocyte completes differen- 
tiation, it pinches off from the syncytium and is ovu- 
lated into the spermatheca where it is fertilized. 
At ovulation, the oocyte is triggered to complete 
meiosis by interaction with sperm cells. In the ab- 
sence of sperm cells (e.g., in an old hermaphrodite 
that is purged of sperm), oocytes do not pro- 
gress beyond diakinesis. As the oocyte fuses with a 
sperm cell, it resumes meiotic progression and under- 
goes the two rounds of meiotic cell division. To pre- 
serve the large egg volume, these divisions are 
extremely asymmetric: the first division (MI) pro- 
duces a large diploid oocyte and a tiny diploid 
polar body; the second division (MII) produces a 
large haploid egg and a tiny haploid polar body. 
The haploid egg and sperm nuclei (technically called 
pronuclei) can now fuse, allowing fertilization to 
be completed. A protective eggshell is subsequently 
deposited on the egg. 

Systematic screens for oogenesis-defective mutants 
have not been carried out in C. elegans, but oogenesis 
is clearly a complex process that depends on a wide 
variety of gene products. Mutations in gld-1 and in components of the MAPK signaling pathway disrupt meiotic progression. Mutations in many genes decrease the rate of ovulation, thereby disrupting
the normal process of oocyte maturation. Numerous 
other gene products have been identified that are 
important for production of functional oocytes, yet 
do not seem to regulate meiotic progression. These 
genes have oogenesis-defective (Oog) mutant pheno- 
types. In general, their primary function during germ- 
line development is not clear, but mutants produce 
small oocytes incapable of supporting embryonic 
development. 
The production of haploid germ cells in the female follows a very different course from that in the male. Unlike the male, a female is born with all of the oocytes that she will ever have (~50,000 in the mouse and ~1 million in women). The mature haploid cell is
called an egg or oocyte, and the process by which it is 
produced is called oogenesis. Oogenesis begins inside 
the newly formed ovaries of the developing fetus. 
Long before birth, primordial germ cells differentiate 
into oogonia (plural of oogonium) and enter meiosis, 
but stop at the diplotene stage of the first meiotic 
prophase. These primary oocytes remain arrested 
in suspended animation — for weeks in mice and 
many years in human females — until after the time of 
puberty. 

From this time on, the female will progress through 
an estrus cycle with a ~4-day period in mice and a ~28- 
day period in women. During each cycle, primary 
oocytes (one in women and 8-10 in mice) are stimu- 
lated to continue the process of differentiation. 
Differentiation leads to the completion of the first 
meiotic division and the extrusion of the first polar 
body. The second meiotic division begins but stops at metaphase. The mature secondary oocyte is now
released from the ovary, in a process called ovulation, 
and passes into an oviduct (known as a fallopian tube 
in human females). For a brief period of time known as 
estrus, each mature oocyte, or egg, remains alive and 
receptive to fertilization. Most wild mammals die 
while they still have the ability to reproduce. Human 
females, however, usually live long enough to pass 
through a stage called menopause when they stop 
cycling through estrus and are no longer able to 
reproduce. 


See also: Spermatogenesis, Mouse 
Opal codon is the old term for the nucleotide triplet
UGA, one of the three ‘nonsense’ codons responsible 
for termination of protein synthesis. 


See also: Amber Codon; Ochre Codon 


Open Reading Frame 


J Parker 
An open reading frame (ORF) is a sequence of DNA
which, if transcribed, could be translated to yield a 
protein of known length and composition. A func- 
tional ORF is one that actually encodes a protein in 
the cell. The rapid increase in the amount of DNA 
sequence available from different genomes has made 
the search for functional ORFs of considerable 
importance, at least in prokaryotes. 

The vast majority of protein-encoding genes in 
prokaryotes do not contain introns, so that an ORF 
would typically be congruent with the complete cod- 
ing portion of the gene. In the cell, ribosomes establish 
a reading frame by initiating translation at a start 
codon, usually an AUG. The ribosome then proceeds 
until it reaches an in-frame stop codon, UAA, UAG, 
or UGA. In prokaryotes, therefore, an ORF begins 
with a sequence which would encode a start codon 
and ends when it reaches a sequence encoding a 
stop codon in the same frame. However, in order to
identify putative functional ORFs from DNA se- 
quence data, the analysis is often considerably more 
complex than simply searching for in-frame start and 
stop codons. 
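The naive prokaryotic search described above (an in-frame start codon followed by the first in-frame stop codon) can be sketched in a few lines of Python. This is a minimal illustration under simplifying assumptions: only the given (forward) strand is scanned, ATG is treated as the only start codon, and the function name `find_orfs` is hypothetical rather than taken from any genome-annotation package.

```python
from typing import List, Tuple

# Stop codons written in DNA form (UAA, UAG, UGA in the mRNA).
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna: str) -> List[Tuple[int, int]]:
    """Naive ORF scan of the forward strand in all three reading frames.

    Returns (start, end) pairs, 0-based, with `end` exclusive and the stop
    codon included. Assumes an uppercase A/C/G/T string and ATG-only starts.
    """
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first ATG after the previous stop opens an ORF
            elif start is not None and codon in STOP_CODONS:
                orfs.append((start, i + 3))  # first in-frame stop closes it
                start = None
    return sorted(orfs)


print(find_orfs("CCATGAAATTTTAAGGATGCCCTGA"))  # [(2, 14), (16, 25)]
```

An ATG with no downstream in-frame stop (for example, near the end of a sequenced fragment) is simply not reported; a practical search would also scan the reverse complement of the sequence.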

An ORF is likely to be functional if its sequence is 
similar to sequences of ORFs obtained from genomes 
of other organisms, or if some part of the sequence 
has a motif common to known protein functional 
domains. The length of the ORF is also an important 
parameter, since even in random sequences of DNA 
there will be many short ORFs. Most functional pro- 
karyotic ORFs are longer than 100 codons. Sometimes 
codon bias can also give a clue as to whether the 
ORF is functional. Most organisms show preferences 
among synonymous codons. For some prokaryotic genes, translation does not begin at the first possible start codon, nor is the start codon always an AUG. Prokaryotic ribosomes
typically locate initiation sites using a so-called Shine- 
Dalgarno sequence, which is found on the message 
immediately upstream of the start codon. Therefore, 
searching DNA sequences from prokaryotic sources 
for a potential Shine-Dalgarno sequence can help 
establish whether an ORF is functional, and which 
potential start codon is actually used. Using this type 
of analysis can eliminate many ORFs which are 
almost certainly not functional. Even so, the analysis 
may leave a very large number of ORFs in which 
functionality can only be assumed. Even for an 
extremely well-studied prokaryote like Escherichia 
coli, 38% of the 4288 ORFs identified on the se- 
quenced chromosome are considered hypothetical, 
unclassified, or unknown. 
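The additional filters discussed in this paragraph, a minimum length and a Shine-Dalgarno sequence upstream of the start codon, can be layered on top of a raw ORF scan. The sketch below is illustrative only: the 100-codon cut-off comes from the text, but the Shine-Dalgarno-like motifs (variants of AGGAGG), the 4 to 15 nucleotide search window, the alternative start codons GTG and TTG, and all function names are assumptions chosen for the example, not parameters of any published gene finder.

```python
from typing import Optional, Sequence

# Assumed Shine-Dalgarno-like cores (variants of the consensus AGGAGG).
SD_MOTIFS = ("AGGAGG", "GGAGG", "AGGAG", "GAGG", "AGGA")
# Assumed start codons: ATG plus the less common GTG and TTG.
START_CODONS = ("ATG", "GTG", "TTG")


def has_shine_dalgarno(dna: str, start: int, lo: int = 4, hi: int = 15,
                       motifs: Sequence[str] = SD_MOTIFS) -> bool:
    """True if an SD-like motif lies within the window lo..hi nt upstream of `start`."""
    region = dna[max(0, start - hi):max(0, start - lo)]
    return any(m in region for m in motifs)


def best_start(dna: str, first: int, stop: int) -> Optional[int]:
    """Walk in-frame codons from `first` toward the stop codon at `stop` and
    return the first candidate start codon preceded by an SD-like motif."""
    for i in range(first, stop, 3):
        if dna[i:i + 3] in START_CODONS and has_shine_dalgarno(dna, i):
            return i
    return None


def is_plausible_orf(dna: str, start: int, end: int, min_codons: int = 100) -> bool:
    """Length filter (sense codons, stop excluded) combined with the SD check."""
    n_sense = (end - start) // 3 - 1
    return n_sense >= min_codons and has_shine_dalgarno(dna, start)


upstream = "TTTAGGAGGTTTAAT"  # hypothetical leader carrying an AGGAGG core
gene = upstream + "ATG" + "GCT" * 100 + "TAA"
print(is_plausible_orf(gene, len(upstream), len(gene)))  # True
```

Here `best_start` mirrors the point that the first possible start codon is not necessarily the one used: it skips an upstream ATG that lacks a Shine-Dalgarno-like motif and picks a later in-frame candidate that has one.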

The situation is much different in eukaryotes, 
which have untranslated introns in protein-encoding 
genes. Therefore, in eukaryotes ‘open reading frame’ 
is often used simply to mean a run of potential sense codons between two in-frame stop codons, i.e.,
sequences which may or may not contain obvious 
start codons. There are many fewer clues when examin- 
ing eukaryotic DNA sequences as to whether an ORF 
is functional, although similarity to known functional 
ORFs and codon bias are still important. The fact 
that an ORF in a eukaryote is quite short does not 
necessarily mean it is nonfunctional, since some exons 
are very short. Therefore, although the exons of 
protein-encoding genes must be ORFs, searching for 
ORFs in genome sequences of higher eukaryotes is 
much more challenging than it is in the genomes of 
prokaryotes. 
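Because the eukaryotic usage described above requires no start codon, an ORF in this sense is simply a run of sense codons bounded by in-frame stop codons. A minimal Python sketch for one strand and one reading frame follows; the function name is hypothetical, and a complete search would repeat it over all three frames of both strands.

```python
from typing import List, Tuple

STOP_CODONS = {"TAA", "TAG", "TGA"}


def stop_to_stop_orfs(dna: str, frame: int) -> List[Tuple[int, int, int]]:
    """Return (start, end, n_sense_codons) for each run of sense codons lying
    between two in-frame stop codons. No start codon is required."""
    runs = []
    prev_stop_end = None  # index just after the most recent in-frame stop
    for i in range(frame, len(dna) - 2, 3):
        if dna[i:i + 3] in STOP_CODONS:
            if prev_stop_end is not None and i > prev_stop_end:
                runs.append((prev_stop_end, i, (i - prev_stop_end) // 3))
            prev_stop_end = i + 3
    return runs


print(stop_to_stop_orfs("TAAGCTATGCCCTGAAAATAG", 0))  # [(3, 12, 3), (15, 18, 1)]
```

The second run (a single AAA codon) contains no start codon, yet still counts as an ORF under this definition; as noted above, such short runs cannot be dismissed in eukaryotes because some exons are very short.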


See also: Coding Sequences; Codon Usage Bias; 
Introns and Exons; Sense Codon; Translation 
Operators


J C Hu 
‘Operators’ are DNA sites where transcription factors
bind and alter the frequency of initiation of transcrip- 
tion. Operators were initially identified as genetic loci 
that gave a constitutive phenotype when mutated. 
Operator mutations affect the regulation of genes 
that are in cis, i.e., physically linked to the operator on the same contiguous DNA molecule.
Operators control genes that are in an operon, i.e., 
cotranscribed into the same mRNA from a site in the 
DNA called the promoter. 

The existence of operators was postulated by Jacob 
and Monod as part of the operon model of gene con- 
trol in the lactose utilization system of Escherichia coli 
(the lac operon). Pardee, Jacob, and Monod had found 
that a regulatory molecule, the repressor, controlled 
the inducible synthesis of lac operon proteins. Jacob 
and Monod predicted that the repressor should act by 
recognizing a specific ‘receiver’ physically associated 
with the regulated genes, which they named the oper- 
ator. This model predicted that operator mutations 
would lead to a constitutive phenotype and that they 
would be dominant, because the presence of a second 
copy of the operator on another chromosome would 
not affect the ability of the repressor to bind the 
mutant operator. In addition, the effects of operator 
mutations would be ‘cis-acting’; i.e., they would only 
affect the genes that were on the same chromosome. 
By contrast, mutations that inactivated the repressor 
would either be complemented by a wild-type copy of 
the repressor or would be dominant in either the cis or 
trans configuration. The model was confirmed by the 
isolation of lac constitutive mutants in an E. coli strain 
that was merodiploid for the lac operon. By genetic 
crosses to place the operator mutations in cis and in 
trans to mutations affecting lac operon proteins, Jacob 
and Monod (1961) showed that the mutations, called Oᶜ (operator-constitutive), were indeed cis-dominant.
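The genetic logic of the merodiploid test can be captured in a small Boolean model. The sketch below is a deliberate simplification, not a quantitative model of lac regulation: each copy of the operon is described by a hypothetical tuple (repressor allele, operator allele, whether lacZ on that copy is functional), using '+' for wild type, '-' for an inactive repressor, 's' for a super-repressor that no longer responds to inducer, and 'c' for an operator-constitutive mutation. The repressor is pooled across copies because it is diffusible (trans-acting), whereas each operator controls only the lacZ gene on its own copy (cis-acting).

```python
from typing import List, Tuple

# Hypothetical encoding: (repressor allele, operator allele, lacZ on this copy is functional)
Copy = Tuple[str, str, bool]


def beta_gal_made(copies: List[Copy], inducer: bool) -> bool:
    """Return True if any copy expresses functional beta-galactosidase."""
    # Repressor protein diffuses, so every allele in the cell acts on every operator.
    repressor_alleles = {rep for rep, _, _ in copies}

    def operator_blocked(operator: str) -> bool:
        if operator == "c":
            return False  # constitutive operator cannot bind repressor
        if "s" in repressor_alleles:
            return True  # super-repressor binds even when inducer is present
        if "+" in repressor_alleles:
            return not inducer  # wild-type repressor is released by inducer
        return False  # no functional repressor in the cell

    # Operator acts only in cis: it gates the lacZ gene on its own copy.
    return any(lacz and not operator_blocked(op) for _, op, lacz in copies)


# Oc in cis with lacZ+: constitutive even with a wild-type repressor present.
print(beta_gal_made([("+", "c", True), ("+", "+", False)], inducer=False))  # True
# Oc in trans to lacZ+: the lacZ+ copy remains normally inducible.
print(beta_gal_made([("+", "c", False), ("+", "+", True)], inducer=False))  # False
```

Replacing the alleles shows the other predictions: an inactive repressor ('-') is complemented by a wild-type copy (no expression without inducer), while an 's' allele on either copy blocks expression even with inducer, i.e., it is dominant in both cis and trans.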

For the paradigm systems studied by Jacob and 
Monod, the lac operon and the control of lysogeny 
in phage λ, the repressors are oligomeric proteins, and
the operators are DNA sequences that engage the 
repressors. Mutations in the operators usually act by 
reducing the binding affinity of the repressor. While it 
was originally thought that operators were exclusively 
short DNA sequences that overlapped promoters, it is 
now clear that many operators involve sequences that 
can be either far upstream or downstream from the promoter. Many operators, including those of lac and λ, turn out to function as multipartite elements. Binding to
two or more operators is often required to achieve 
normal transcriptional regulation; transcription factors 
often bind to multiple operators cooperatively. When 
the individual operator sites are separated by signifi- 
cant distances, cooperative binding often involves the 
bending of the intervening DNA into a loop. 

The molecular mechanisms by which repressors 
and operators control the initiation of transcription 
are now known in great detail for many bacterial 
regulatory systems. Using purified proteins and 
DNA, it is possible to determine how repressors affect 
the rates of different steps in the process of transcrip- 
tion initiation, and to examine complexes trapped 
when an operator is bound by its cognate repressor. 
Both the lac and λ repressors appear to act by preventing the initial binding of RNA polymerase to the
promoter. Other repressors act at later steps in the 
initiation process. 

Purified proteins and DNA have also allowed the 
elucidation of the structures of repressor–operator
complexes. These structures, which give a molecular 
form at atomic resolution to the systems defined by 
genetics and biochemistry, resolve many questions 
about how the repressor is able to recognize the spe- 
cific DNA sequence of the operator. In particular, the 
structures address two classes of models: sequence 
versus structural reading of the DNA sequence. In 
the sequence readout model, the repressor recognizes 
features of the base sequence of the operator by mak- 
ing direct contact with the base pairs, in either the 
major or minor groove of B-form DNA. In structural 
readout models, the operator DNA has a propensity 
to form a non-B structure that is recognized by the 
repressor. The structures of these repressor–operator complexes revealed that the repressors bound operators that were close in structure to B-form DNA, contacting both the sugar–phosphate backbone and functional groups in the major groove.
However, other DNA-protein complexes involve dif- 
ferent mixes of sequence and structural readout. For 
example, the center of the phage 434 operator is 
important for repressor recognition, but does not 
make direct contact with the protein. Instead, it pro- 
motes a bend in the DNA that allows the flanking 
sequences to make favorable contacts with the repressor. In different complexes, the DNA can be found in a variety of bent, twisted, kinked, and unwound structures. Nature does not use a universal protein–DNA recognition code.

The operon model was originally formulated on the basis of genetic evidence and is formally independent of the molecular nature of the repressor or the operator, or of the mechanism of regulation. However, many genetic elements that would satisfy the original operational definition for an operator are no longer considered to be instances of operators. For example, the
attenuators of many bacterial operons are cis-acting 
elements that are required for the negative regulation 
of gene expression. Some attenuator mutations lead to 
a cis-dominant constitutive phenotype. Nevertheless, 
the differences between the mechanisms of regulation 
at operators and attenuators have led molecular geneticists to classify them as different kinds of cis-acting
genetic elements. Similarly, regulatory sites that affect 
translation are also cis-acting and are sometimes 
referred to as operators. Notable examples occur in 
the autoregulation of translation by the phage MS2 
coat protein, and the phage T4 gene 32 and gene 43 
products. In the latter case, a short stem-loop RNA 
structure binds to the gene 43 product, which is the 
phage DNA polymerase. 
An operon is a genetic regulatory system found in
bacteria and their viruses in which genes coding for 
functionally related proteins are clustered together 
and transcribed from one promoter into a single 
RNA. This is a functional unit and allows protein 
synthesis to be controlled in a coordinated and regu- 
lated fashion in response to the cell’s needs. Proteins 
can thus be produced only when they are required. 

A typical operon comprises several types of genetic elements:


1. Structural genes (S1-Sn) which code for the pri- 
mary structures of enzyme proteins involved in a 
metabolic pathway, such as the biosynthesis of an 
amino acid. 

2. The promoter (P), a short DNA sequence to which RNA polymerase binds and at which transcription starts. The promoter is controlled by various regulatory elements that respond to environmental stimuli.
3. The operator (O), a short segment of DNA adjacent to the promoter, is a control element that binds a regulator protein that can either repress or activate transcription.


Usually the regulatory gene is located in a different 
region of the chromosome. If the specific repressor 
binds to the operator, transcription of the structural 
genes is blocked. In some operons a small molecule 
may act as an inducer, binding to the repressor, inacti- 
vating it and thereby derepressing the operon. In 
others, a repressor may be unable to bind to the 
operator unless it is bound to a small molecule, the 
corepressor. Some operons are under attenuator control, in which transcription is initiated but can be arrested before the structural genes are transcribed. The resulting short mRNA sequence (the leader sequence) includes the attenuator, which, by folding back on itself to form a stem-loop, blocks the progress of RNA polymerase along the DNA template.
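The difference between inducible and repressible operons described above comes down to how the small molecule changes the repressor's ability to bind the operator. The following minimal Boolean sketch uses hypothetical function and parameter names and ignores attenuation and activator proteins entirely; it is meant only to make the two cases explicit.

```python
def structural_genes_transcribed(kind: str, small_molecule_present: bool,
                                 repressor_functional: bool = True,
                                 operator_constitutive: bool = False) -> bool:
    """Return True if RNA polymerase can transcribe the operon's structural genes.

    kind: "inducible" (the small molecule is an inducer that inactivates the
    repressor) or "repressible" (the small molecule is a corepressor that the
    repressor needs in order to bind the operator).
    """
    if not repressor_functional or operator_constitutive:
        return True  # nothing can occupy the operator, so the operon is constitutive
    if kind == "inducible":
        repressor_bound = not small_molecule_present
    elif kind == "repressible":
        repressor_bound = small_molecule_present
    else:
        raise ValueError(f"unknown operon kind: {kind!r}")
    return not repressor_bound  # a bound repressor blocks transcription


for kind in ("inducible", "repressible"):
    for present in (False, True):
        print(kind, present, structural_genes_transcribed(kind, present))
```

Because all the structural genes share one promoter and are transcribed into one mRNA, a single True or False here stands for the coordinated on/off state of the whole pathway; the lac operon is the classic inducible case and the tryptophan operon a repressible one.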

The operon theory was first proposed in the early 1960s by Jacob and Monod, who described the regulatory mechanism of the lac operon in Escherichia coli.


See also: Histidine Operon; Jacob, Francois; lac 
Operon; Tryptophan Operon 
Definition


A characteristic feature of eukaryotic (nucleus- 
containing) cells is the variety of ‘organelles’ they 
contain. One or more lipid membranes form the 
outer boundary of these distinct subcellular struc- 
tures, defining discrete compartments within which 
the biochemical reactions typical of each type of organelle occur. By this definition, macromolecular
complexes that lack a bounding membrane (e.g., ribo- 
somes, nucleoli) are not considered to be organelles, 
even though they may have a readily recognizable 
structure and a specialized function within the cell. 
Organelles may be thought of as analogs of bodily 
organs (e.g., heart, liver, kidney), each of which has 
a characteristic size and shape and serves a distinct 
physiological role that is essential to the life of the 
organism. Just as organ systems communicate with 
one another, subcellular organelles interact through 
transport of small molecules and macromolecules
between them and across their membranes. 


Types of Organelles 


Nucleus 

The most prominent of the eukaryotic cellular organ- 
elles, the nucleus is the repository of most of the cell’s 
genetic information, which is packaged into chroma- 
tin. A primary function of the nucleus is gene expres- 
sion (transcription), whereby genes are copied into 
messenger RNA (mRNA), ribosomal RNA (rRNA) 
and transfer RNA (tRNA) molecules. Ribosome for- 
mation, which includes rRNA synthesis, takes place 
at a localized subnuclear structure, the nucleolus, 
whereas mRNA maturation (which involves removal 
of introns) occurs throughout the nucleoplasm on 
ribonucleoprotein particles called spliceosomes. 
DNA synthesis (replication) and DNA repair are 
also key functions of the nucleus. The nucleus com- 
municates with the rest of the cell by way of nuclear 
pores that penetrate the nuclear membrane. In some 
organisms, the nucleus transiently disappears during 
cell division when the nuclear membrane disassem- 
bles, later reforming around the duplicated chromo- 
some sets after their segregation into separate 
daughter cells. 


Endomembrane System 

Comprising the endoplasmic reticulum, the Golgi 
body, various types of intracellular transport vesicles 
and the plasma membrane, the endomembrane system 
is intimately involved in both inter- and intracellular 
trafficking of proteins and other material. 


Endoplasmic reticulum 

The endoplasmic reticulum (ER) is a lipid membrane 
network that extends throughout the cell and is 
continuous with the outer membrane of the nucleus. 
On the basis of their appearance in the electron 
microscope, two types of ER have been distinguished: 
rough ER (RER), so-called because of the numerous 
cytoplasmic ribosomes bound to its surface, and 
smooth ER (SER), which lacks ribosomes. The RER 
is the site of synthesis of integral ER proteins as 
well as proteins destined for other organelles or 
for export out of the cell. Proteins synthesized on 
membrane-bound ribosomes are translocated during 
their synthesis through the ER membrane into the 
interior (lumen), where they receive a core-targeting 
signal (N-linked oligosaccharide). The ER is the 
cellular site of lipid biosynthesis, and also has an 
important function in transport and storage of Ca²⁺
ions. 


Golgi body 

The Golgi body (also called Golgi apparatus, or Golgi 
complex) consists of a series of disk-like membranes 
(cisternae) organized into stacks, or dictyosomes. 
Newly synthesized glycoproteins are directed from 
the ER lumen to the Golgi body for further addition 
of sugar residues to the oligosaccharide core. These 
carbohydrate tags serve as signals for sorting and 
transport of the mature glycoproteins to their ap- 
propriate compartments within the cell, or out of 
the cell. The Golgi body has a distinct polarity, with 
proteins entering its cis (or entry) face via transport 
vesicles called transitional elements that bud from 
the ER, and exiting through its trans face via secretory 
vesicles. 


Lysosomes 

Lysosomes comprise a morphologically heteroge- 
neous collection of organelles that are characterized 
by their content of many different kinds of acid 
hydrolases, enzymes that carry out the controlled intra- 
cellular degradation of macromolecules delivered to 
the lysosome. Soluble material is brought into the 
cell through a process termed endocytosis, which 
involves invagination and pinching off of the plasma 
membrane to form endocytotic vesicles (endosomes). 
Endosomes subsequently fuse with trans-Golgi 
vesicles containing lysosomal hydrolases to form 
endolysosomes, which mature into lysosomes. Endo- 
lysosomes may fuse with other vesicles that enclose 
large particles brought in from outside the cell (e.g., a 
bacterium in a phagosome) or other organelles (e.g.,
a mitochondrion in an autophagosome), thereby
initiating the digestion of such inclusions.


Organelles of Energy Production and 
Oxygen Metabolism 

The mitochondrion and chloroplast, two organelles 
involved in energy metabolism, are of special interest 
because they are the only organelles known invariably
to contain genetic information and a translation
system, relics of their evolutionary past. Both organelles
trace their evolutionary ancestry to eubacterial
endosymbionts, with mitochondria originating from
within the α-Proteobacteria (so-called
purple bacteria) and chloroplasts from within the 
Cyanobacteria (formerly known as blue-green algae). 


Mitochondrion 

Typically, mitochondria are depicted as sausage- 
shaped organelles of rather uniform size. In fact, with- 
in a living cell, mitochondria are remarkably fluid, 
constantly changing shape, fusing, and separating. 
Distinct outer and inner mitochondrial membranes 
define two soluble compartments, the intermembrane
space and the matrix (enclosed by the inner membrane).
The two specialized membranes are biochemically
unique, the inner membrane containing the
respiratory chain complexes that carry out the pri- 
mary function of this organelle: oxidative phosphor- 
ylation coupled to the synthesis of ATP. The inner 
mitochondrial membrane is usually highly infolded 
into cristae, which greatly increases the surface (and 
therefore functional) area of the membrane and gives 
the mitochondrion its distinctive appearance in elec- 
tron micrographs. 


Chloroplast

Whereas almost all eukaryotes have mitochondria,
chloroplasts are found only in plants and algae. The 
primary function of the chloroplast is photosynthesis, 
in which energy in the form of visible light is ‘har- 
vested’ by photopigments such as chlorophyll and 
used to power the production of ATP and the fixation 
of CO₂ into carbohydrate. Like mitochondria, chloroplasts
have an outer and inner membrane, but they have
in addition a third distinct membrane system compris- 
ing the thylakoids, localized within the stroma, the 
compartment enclosed by the inner membrane. Indi- 
vidual thylakoids are usually stacked into aggregates 
called grana. The thylakoid membrane contains all of 
the energy-generating machinery of the chloroplast. 
Nonphotosynthetic chloroplasts (often termed 
plastids) are either precursors or intermediates of 
chloroplast differentiation or are specialized for 
other functions. Such plastids include proplastids 
(the developmental progenitors of other types of 
plastids), various types of storage plastid (e.g., amylo- 
plasts, which accumulate starch), and chromoplasts 
(which contain the pigments that give flowers and 
fruits their characteristic colors). A remnant chloro- 
plast termed the apicoplast is even found in the malaria 
parasite, Plasmodium, and other members of the para- 
sitic phylum Apicomplexa. This nonphotosynthetic 
phylum may therefore have evolved from a photosyn- 
thetic ancestor. 


Peroxisome 

Peroxisomes, found in all eukaryotic cells, are 
bounded by a single membrane and contain neither 
DNA nor elements of a translation system. These 
organelles harbor high concentrations of oxidative 
enzymes such as catalase and urate oxidase and are a 
major site of oxygen utilization. Peroxisomes generate 
hydrogen peroxide (H₂O₂), which they then use
(via catalase) to oxidize a variety of substrates. By virtue 
of this function, peroxisomes play an important role 
in the detoxification of substances that are poten- 
tially harmful to the cell. Peroxisomes are biochem- 
ically diverse, even within a single cell. In plants, 
glyoxysomes (a type of peroxisome) play an essential
role in converting fatty acids stored in germinating 
seeds into sugars required for seedling growth. 


Other Organelles 

Many other specialized organelles have a restricted 
distribution within the eukaryotic lineage. In plants, 
one or more vacuoles, surrounded by a single mem- 
brane called the tonoplast, may occupy up to 90% of 
the cell volume. By convention, vacuoles are consid- 
ered to be separate from the cytoplasm. The vacuole 
is a functionally versatile organelle, playing roles in 
the storage of both nutrients and waste products, in the 
breakdown of cellular constituents, and in control of 
cell rigidity (turgor). 

Many anaerobic eukaryotes lack mitochondria 
but instead have hydrogenosomes, energy-generating 
organelles that produce hydrogen. Hydrogenosomes 
have some of the properties of mitochondria, includ- 
ing a double membrane, and recent evidence suggests 
that they may be derived in evolution from mitochon- 
dria. However, hydrogenosomes lack a genome as well 
as a mitochondrial-type respiratory chain and asso- 
ciated cytochromes. 

Finally, in kinetoplastid protozoa (organisms that 
include the causative agents of African sleeping sick- 
ness and leishmaniasis), glycolytic enzymes are se- 
questered within membrane-bound organelles termed 
glycosomes. In other eukaryotes, the enzymes of gly- 
colysis are found free in the cytoplasm. The glycosome 
is probably a specialized type of peroxisome. 


See also: Chloroplasts, Genetics of; Mitochondria; 
Nucleolus 
Hybridization is a powerful nucleic acid procedure
that permits the identification of similar or identical 
nucleotide sequences and their isolation or purifica- 
tion. It depends on the complementary base pairing 
of DNA and the fact that the two strands are held 
together by hydrogen bonds that can be easily disso- 
ciated by reduced ion concentration and heat. Strands 
that have been dissociated can be reassociated under 
the correct conditions of salt and temperature depend- 
ing on the concentration of matching sequences. The 
pairs that form are hybrids since there is little chance 
that the original pairs would find each other. The 
complementary pairs that form may be perfect or
near-perfect matches depending on the sequence poly- 
morphism of the DNA sample. In practice hybrid- 
ization is used to compare different sets of nucleic 
acids for a wide variety of purposes. The procedure 
has remarkable specificity and selectivity, probably
exceeding that of any other known test of relationship.
This is because, under controlled conditions, a specific
nucleotide sequence will pair reliably, and only, with its
complementary sequence out of many billions of
possibilities.


Requirements 


To achieve maximum specificity, the sequence
must be long enough that it does not match, by
statistical accident, sequences in the population it is
being tested against. The concentration must be high 
enough so that complementary strands approach each 
other and duplexes form at sufficient rate. The rate of 
duplex formation rises with nucleic acid concentration 
and very strongly rises with ionic strength. Obviously,
the solution must have sufficient ionic strength that
the long, charged nucleic acid polymers do not repel
each other; in practice this means more than 0.1 molar
monovalent cations. The resulting duplex must, of
course, be stable under the conditions described in
the next section.
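The length requirement can be made concrete with a back-of-envelope estimate. The sketch below assumes a random target sequence with independent bases at equal (25%) frequencies, which real genomes only approximate, and counts both strands; the function names and the 0.01 cutoff are illustrative choices, not part of any standard procedure.

```python
def expected_chance_matches(length: int, target_bp: int) -> float:
    """Expected number of exact chance matches to one probe sequence of
    `length` nucleotides in `target_bp` of random DNA (both strands counted).

    Assumes independent bases at equal (25%) frequencies.
    """
    positions = max(target_bp - length + 1, 0)  # start sites on one strand
    return 2 * positions / 4 ** length          # x2 for the complementary strand


def min_unique_length(target_bp: int, max_expected: float = 0.01) -> int:
    """Shortest probe length whose expected number of chance matches
    falls below `max_expected` (a hypothetical specificity cutoff)."""
    if max_expected <= 0:
        raise ValueError("max_expected must be positive")
    length = 1
    while expected_chance_matches(length, target_bp) >= max_expected:
        length += 1
    return length


if __name__ == "__main__":
    # A target of about 3 x 10^9 bp, roughly the size of a mammalian genome.
    print(min_unique_length(3_000_000_000))  # prints 20
```

Under these assumptions a probe of about 20 nucleotides is expected to match by chance fewer than once per hundred genomes of that size; with a skewed base composition some probes may need to be longer.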


Thermal Stability 


A valuable feature is that strands with imperfectly
matching sequences can form duplexes under the
appropriate conditions. The thermal stability of the
mismatched duplexes depends on the quality of match
and the length of the matching regions as follows,
where Tmr is the reduction in melting temperature
(in °C) below that of perfect long duplexes under typical
salt concentrations, L is the length of the duplex, and PC is
the percent match:


$$T_{mr} = \frac{550}{L} + \frac{100 - PC}{1.1} \qquad (1)$$


Tmr (Tm reduction) is used here because the melting 
temperature depends strongly on the ionic strength of 
the environment rising about 10 °C for every factor of 
ten increase in monovalent ion concentration. Often 
formamide has been used to reduce the melting tem- 
perature and allow room temperature procedures. 
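Equation (1) and the salt rule of thumb above can be written as two small functions. This is a sketch only: it hard-codes the coefficients given in the text (550/L, division by 1.1, and about 10 °C per tenfold change in monovalent cation concentration), ignores formamide and base-composition effects, and the function names are hypothetical.

```python
import math


def tm_reduction(length: float, percent_match: float) -> float:
    """Equation (1): drop in melting temperature (deg C) of a duplex of
    `length` base pairs and `percent_match` identity, relative to a long,
    perfectly matched duplex in the same solvent."""
    if length <= 0:
        raise ValueError("length must be positive")
    if not 0 <= percent_match <= 100:
        raise ValueError("percent_match must lie between 0 and 100")
    return 550.0 / length + (100.0 - percent_match) / 1.1


def salt_shift(old_molar: float, new_molar: float, per_decade: float = 10.0) -> float:
    """Approximate change in Tm (deg C) when the monovalent cation
    concentration changes from `old_molar` to `new_molar`, using the
    rule of thumb of about 10 deg C per factor of ten."""
    return per_decade * math.log10(new_molar / old_molar)


if __name__ == "__main__":
    # A 1100-bp duplex with 89% identity: 550/1100 + 11/1.1 = 0.5 + 10.0.
    print(round(tm_reduction(1100, 89), 2))   # 10.5
    # Raising monovalent cations from 0.1 M to 1.0 M raises Tm by about 10 deg C.
    print(round(salt_shift(0.1, 1.0), 2))     # 10.0
```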


Criterion of Accuracy of Match 


As equation (1) shows, the accuracy of match controls the
thermal stability of the duplex that is formed. Thus
the ionic strength and temperature determine whether
a duplex of a given accuracy of match can form. The
rate of formation falls as the incubation temperature
approaches the melting temperature of the duplex.
In most solvents the melting temperature depends on
the base composition, and this must be taken into
account. However, tetramethylammonium chloride
reduces this effect, and at 2.4 mol l⁻¹ the melting
temperature is essentially independent of base
composition. Tetraethylammonium chloride is also
useful in this way for establishing a precise criterion
of accuracy of sequence match.
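Rearranging equation (1) shows how the incubation temperature sets the criterion. The sketch below assumes, as a simplification, that a duplex survives whenever its predicted melting temperature lies above the incubation temperature, and it ignores the slower rate of formation close to the melting temperature; `min_percent_match` is a hypothetical helper name.

```python
from typing import Optional


def min_percent_match(length: float, degrees_below_tm: float) -> Optional[float]:
    """Lowest percent match predicted by equation (1) to give stable duplexes
    when incubating `degrees_below_tm` deg C under the Tm of a long, perfect
    duplex. Returns None if even a perfect duplex of this length would melt."""
    if length <= 0:
        raise ValueError("length must be positive")
    # Require Tmr = 550/L + (100 - PC)/1.1 <= degrees_below_tm and solve for PC.
    pc = 100.0 - 1.1 * (degrees_below_tm - 550.0 / length)
    if pc > 100.0:
        return None
    return max(pc, 0.0)


if __name__ == "__main__":
    # 1100-bp fragments incubated 25 deg C below Tm: ~73% identity or better.
    print(round(min_percent_match(1100, 25), 2))   # 73.05
    # Incubating only 5 deg C below Tm demands near-perfect matches.
    print(round(min_percent_match(1100, 5), 2))
```

Lowering the incubation temperature relaxes the criterion, admitting more distantly related sequences; raising it tightens the criterion.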


Hybrid Detection and Binding to Solid 
Support 


Many successful measurements have been made after
duplex formation in solution. Duplexes can be detected
by: reduction in ultraviolet absorption; binding to
hydroxyapatite; resistance to single-strand nuclease
digestion; or, for RNA/DNA duplexes, an increase in
RNase resistance. At present the great majority of
measurements involve the binding of single-stranded
target DNA to a solid support such as nylon membranes
or pretreated glass slides. Probe nucleic acids
are labeled with fluorescent or radioactive compounds.
Treatment with appropriate solutions can prevent
additional, nonspecific binding of probe to the substrate.
After appropriate washing to remove background or
nonspecific binding, the location of the specific target
DNA can be determined.


Large-Scale Systems 


These procedures can be scaled up so that many thou- 
sands of target clones can be tested, including possibly 
all members of a cloned genomic library. There are 
large-scale detectors available and the data can be 
automatically entered into computer storage and 
analysis. 


RNA/DNA Hybrids and Assay of Gene 
Expression 


The fact that RNA forms duplexes with complementary
DNA sequences permits a wide variety of
qualitative and quantitative determinations of gene
expression. This is possible not only with individual
genes but also with large assortments of genes, so that
the localization of gene expression, or its changes
during development, can be assayed.


In Situ Hybridization 


Cells or tissues can be prepared and bound to a
substrate so that the DNA is denatured and accessible for
hybridization. This permits many tests of cellular
location, including determination of the location on
chromosomes of specific genes. Such bound cells or
tissues can also be prepared so that RNA remains bound
in place and the location of specific gene expression
can be determined, for example in an embryo. It is
even possible to carry out nucleic acid amplification
by polymerase chain reaction (PCR) in situ to
increase greatly the quantity of specific target
sequences.


Virus and Bacterial Identification 


Sometimes it is crucial to determine the organism
responsible for an infection. Solid substrates carrying
DNA samples from many organisms are available. If
nucleic acid from the infecting organism can be labeled,
identification by hybridization can be simple and rapid.


Genome Evolution and Phylogenetic 
Relationships 


The melting temperature reduction of genomic single-copy
DNA hybrids formed between the DNA of
different species has been used to determine relationships
and the evolutionary rate of change of DNA
sequences. This technique has been largely replaced
by DNA sequence comparison. However, now that it is
clear that individual genes and regions of the DNA
change at different rates, the fact that hybridization
averages over the whole genome may bring it back into
favor for phylogenetic and evolutionary studies.
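For long genomic fragments, equation (1) from the Thermal Stability section can be inverted to turn a measured Tm reduction into an approximate percent mismatch. The sketch assumes that equation's coefficients and a single fragment length for all hybrids; the species labels and Tm reductions in the usage lines are invented placeholders, not measurements.

```python
def percent_mismatch(delta_tm: float, length: float) -> float:
    """Invert equation (1): percent mismatch (100 - PC) implied by a Tm
    reduction `delta_tm` (deg C) for hybrids of `length` base pairs."""
    if length <= 0:
        raise ValueError("length must be positive")
    # The length term 550/L is removed before converting degrees to mismatch.
    return max(0.0, 1.1 * (delta_tm - 550.0 / length))


def rank_pairs(delta_tms: dict, length: float) -> list:
    """Order species pairs from most to least similar by implied mismatch."""
    estimates = {pair: percent_mismatch(dt, length) for pair, dt in delta_tms.items()}
    return sorted(estimates.items(), key=lambda item: item[1])


if __name__ == "__main__":
    # Hypothetical Tm reductions (deg C) for 1100-bp single-copy hybrids.
    hypothetical = {("A", "B"): 2.5, ("A", "C"): 10.5, ("B", "C"): 10.0}
    for pair, mismatch in rank_pairs(hypothetical, 1100):
        print(pair, round(mismatch, 1))
```

Because each value averages over all single-copy sequences in the hybrid, the estimate reflects genome-wide divergence rather than that of any one gene.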


See also: DNA Hybridization 
The genetic material of all organisms is nucleic acid,
and in most cases this material is deoxyribonucleic 
acid, or DNA. Chromosomes consist of single long 
molecules of DNA, nearly always double-stranded 
DNA, complexed with protein molecules. DNA 
replication is the process whereby an original ‘parental’
DNA molecule is duplicated to yield two ‘daughter’
DNA molecules, each identical in nucleotide
sequence (and thus in genetic information) to the ori- 
ginal parental DNA molecule. This process must be 
precisely regulated, both in time and in space. Each 
DNA molecule in each cell must be duplicated once and
only once during the cell cycle. Each daughter DNA 
molecule is then distributed to each daughter cell 
upon cell division via a process known as chromosome 
segregation. Chromosome partitioning is also precisely 
regulated in time and space, and is closely coupled to 
the DNA replication process. 

DNA replication begins at origin (‘ori’) sequences.
It is followed by the elongation process, during which 
daughter chromosomes are synthesized. This is fol- 
lowed by the termination of replication and separation 
of the daughter chromosomes. This article considers 
the question: what is an origin of DNA replication? 
We consider the structural similarities and differences 
between DNA replication origins from different 
organisms, with a view to describing features common 
to all replication origins. 


Prokaryotic DNA Replication Origins 


Prokaryotic chromosomes are nearly always single 
DNA molecules, and chromosomal replication begins 
at a single unique origin. This origin generally con- 
tains binding sites for initiation proteins and sequence 
features that often include AT-rich regions where 
strand melting occurs. Bacteria frequently contain 
plasmids, circular or linear pieces of DNA containing 
their own replication origins. Plasmids are dispensable
but often encode useful genes. The main difference in 
origin function between plasmids and chromosomes is 
that chromosomal origins initiate only once per cell 
cycle while plasmid origins may initiate infrequently 
(low copy plasmids) or frequently (high copy plas- 
mids) during the cell cycle. 


Plasmids 
Plasmids replicate either by a theta-type mechanism 
or a rolling-circle mechanism. In theta-type replica- 
tion, one or two replication forks capable of synthe- 
sizing the leading and lagging strands simultaneously 
are assembled at the origin. The origins in these plas- 
mids often have several binding sites (iterons) for a 
replicon-specific initiation protein (Rep protein), one 
or more sites for binding DnaA, the bacterial initiator 
protein, and AT-rich sequences. Rep binding to the 
iterons causes structural changes such as DNA bend- 
ing, strand melting, and unwinding, especially in 
adjacent AT-rich regions. A nucleoprotein complex 
consisting of plasmid-encoded and host proteins is 
then formed in the melted region. If only one fork is 
assembled at the origin, replication is unidirectional; if 
two forks are assembled at the origin, replication is 
bidirectional. 

The origin of the plasmid, ColE1, contained in many 
cloning vectors, is an exception because replication 
initiation depends totally on Escherichia coli proteins,
none of which is a Rep protein. Instead, replication
depends upon the synthesis of a 700-nucleotide-long
RNA, RNA II, which is cleaved by RNase
H, resulting in the formation of primer RNA. The 
ori sequence of ColE1 is the site where RNase H 
cleaves RNA II. After the 3’ end of primer RNA is 
extended by DNA polymerase I, a single replication 
fork containing DNA polymerase III holoenzyme is 
formed. This enzyme replicates the plasmid unidirec- 
tionally. 

In the rolling-circle mechanism, the origin contains 
a Rep protein binding site and a Rep nick site. After 
the plasmid-encoded Rep protein nicks at the origin, 
leading strand replication is primed by the free 3’ OH 
end at the nick. Following synthesis of the leading 
strand, catalyzed by DNA polymerase III, the Rep 
protein cleaves at the nick site located at the junction 
between the old and new leading strands. The new 
leading strand is released and used as a template for 
lagging strand synthesis after host proteins are assem- 
bled at the single-strand origin. 


Bacteria 

Bacterial replication origins, first isolated from 
E. coli, were discovered as autonomously replicating 
sequences (ARS) or DNA restriction fragments capable
of converting a DNA fragment bearing an antibiotic
resistance gene into a replicon, which is defined as
a DNA molecule capable of self-duplication. The 
minimal replication origin, termed oriC, was defined 
by deletion analysis. Extensive mutagenesis studies 
assisted in delineating the relative importance and 
function of specific base pairs within the minimal 
origin. In a comparative approach, ori sequences func- 
tional in E. coli were isolated from a variety of other 
gram-negative bacteria, including Salmonella typhi- 
murium, Enterobacter aerogenes, Klebsiella pneumo- 
niae, Erwinia carotovora, and the marine bacterium, 
Vibrio harveyi. Since these origins were functional in 
E. coli and used the E. coli initiation machinery, these 
origins could be considered to be ‘multiply mutated’ 
ancestral origins. Thus, their sequence comparisons 
have yielded some of the fundamental properties of a 
bacterial replication origin. 

These comparisons showed regions of high iden- 
tity, separated by regions (linker regions) whose 
length was conserved but which were highly variable 
in sequence. Three primary kinds of highly conserved 
sequences emerged (see Figure 1): (1) 9-bp direct and
inverted repeats (called R sites or DnaA binding sites), 
(2) 13-bp AT-rich direct repeats immediately adjacent 
to the DnaA binding sites, and (3) eight GATC sites 
positionally conserved within oriC among all enteric 
origins and V. harveyi. During E. coli initiation, the
positively required initiator protein DnaA first binds
the R sites. The resulting DNA-protein complex 
forms a structure within which DnaA protein causes 
unwinding of the 13-bp AT-rich direct repeats. Primosome
formation and primer synthesis for subsequent
DNA synthesis then occur within the unwound
region.
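Locating candidate R sites in a sequence is a simple pattern search. The sketch below assumes the commonly cited E. coli DnaA box consensus 5'-TTATNCACA-3' (N = any base), which the text above does not give, and reports exact consensus matches only; real origins also contain sites that deviate from the consensus, so this can undercount. The function name is hypothetical.

```python
import re

# Assumed DnaA box consensus (N = any base) and its reverse complement.
DNAA_BOX = "TTAT[ACGT]CACA"
DNAA_BOX_RC = "TGTG[ACGT]ATAA"


def find_dnaa_boxes(seq: str) -> list:
    """Return (start, strand, site) for every exact consensus DnaA box.

    Positions are 0-based on the given strand; '-' marks a site read on the
    complementary strand, i.e. a box in inverted orientation.
    """
    seq = seq.upper()
    hits = []
    for pattern, strand in ((DNAA_BOX, "+"), (DNAA_BOX_RC, "-")):
        # The zero-width lookahead lets overlapping matches be reported.
        for match in re.finditer(f"(?=({pattern}))", seq):
            hits.append((match.start(), strand, match.group(1)))
    return sorted(hits)


if __name__ == "__main__":
    toy = "GGTTATCCACAGGTGTGAATAAGG"   # made-up sequence with one box each way
    for hit in find_dnaa_boxes(toy):
        print(hit)
```

Reporting strand as well as position matters because the text distinguishes direct from inverted R sites.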

The GATC sites function in the regulation of ini- 
tiation. The Dam methylase of E. coli catalyzes 
methylation of the adenine residues in GATC sites. 
Immediately after initiation, the GATC sites within 
oriC are ‘hemimethylated’; the newly-synthesized 
‘daughter’ DNA strands contain unmethylated ade- 
nine, whereas those on the original ‘parental’ DNA 
strands are methylated. Such hemimethylated oriC 
DNA becomes sequestered within site(s) on the cell 
membrane and is nonfunctional for subsequent initia- 
tion. The hemimethylated origins must be converted 
to fully methylated origins before another round 
of initiation can occur. Thus, GATC methylation 
accounts, at least in part, for the observed time delay 
(eclipse period) between initiation events at a given 
origin. 
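The logic of GATC-based sequestration can be captured in a toy state model: an origin fires only when fully methylated, becomes hemimethylated (and sequestered) as soon as it fires, and regains competence only after remethylation. The sketch assumes a fixed remethylation delay in arbitrary time steps and an initiation attempt at every step; it illustrates how such a delay produces an eclipse period and is not a kinetic model of E. coli. All names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ToyOrigin:
    """Toy oriC whose GATC sites are either fully methylated or hemimethylated."""
    remethylation_delay: int          # steps before Dam remethylates (assumed)
    hemimethylated: bool = False
    steps_left: int = 0

    def try_initiate(self) -> bool:
        """Fire only if fully methylated; firing leaves the origin hemimethylated."""
        if self.hemimethylated:
            return False              # sequestered: not available for initiation
        self.hemimethylated = True
        self.steps_left = self.remethylation_delay
        return True

    def tick(self) -> None:
        """Advance one time step; remethylation restores competence."""
        if self.hemimethylated:
            self.steps_left -= 1
            if self.steps_left <= 0:
                self.hemimethylated = False


def initiation_times(remethylation_delay: int, steps: int) -> list:
    """Steps at which the toy origin fires when initiation is attempted every step."""
    origin = ToyOrigin(remethylation_delay)
    fired = []
    for t in range(steps):
        if origin.try_initiate():
            fired.append(t)
        origin.tick()
    return fired


if __name__ == "__main__":
    print(initiation_times(3, 10))   # [0, 3, 6, 9]
```

Lengthening the remethylation delay lengthens the interval between firings, which is the sense in which GATC methylation can account for the eclipse period.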

Most other bacteria, including the pseudomonads, 
have no Dam-GATC methylation system and hence 
lack the timing control mechanism for replication 
initiation found in enteric bacteria. Do these bacteria 
then have different replication origins from those of 
enteric bacteria? ARSs from Pseudomonas putida 
(one such) and from P. aeruginosa (two such) were 
isolated in P. putida, and shown to be functional in 
both pseudomonad species but not in E. coli. The ori 
sequences of the pseudomonads contain no more 
GATC sites than expected at random. Further, no 
other 4-bp sequence is found in abundance in these 
origins; a temporal control mechanism for the eclipse 
period comparable to the GATC-hemimethylation 
mechanism of enteric bacteria is not known for the 
pseudomonads. All three pseudomonad ARSs have 
five copies of the enteric 9-bp DnaA-protein bind- 
ing site, and these sites are positionally conserved 
between the three origins. Also, three 13-bp AT- 
rich direct repeats are found in each of the three 
origins immediately adjacent to the DnaA binding 
site region. 
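Whether an origin contains more GATC sites than expected at random can be checked by comparing an observed count with the expectation for a random sequence. The sketch assumes independent bases with a given G+C fraction; because GATC is its own reverse complement, counting on one strand finds every site. The function names are hypothetical.

```python
def count_gatc(seq: str) -> int:
    """Number of GATC sites; GATC is palindromic, so one strand suffices."""
    seq = seq.upper()
    return sum(1 for i in range(len(seq) - 3) if seq[i:i + 4] == "GATC")


def expected_gatc(length: int, gc_fraction: float = 0.5) -> float:
    """Expected GATC count in a random sequence of `length` bp."""
    p_gc = gc_fraction / 2            # probability of G (or of C)
    p_at = (1 - gc_fraction) / 2      # probability of A (or of T)
    return max(length - 3, 0) * p_gc * p_at * p_at * p_gc


if __name__ == "__main__":
    # Figure 1 gives a minimal oriC of 258 bp; at 50% G+C about one GATC
    # is expected, versus eight positionally conserved sites in enteric oriC.
    print(round(expected_gatc(258), 2))   # 1.0
    print(count_gatc("AAGATCTTGATC"))     # 2
```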

Cloning of an ARS from the Bacillus subtilis 
chromosome was unsuccessful until a low-copy-number
vector was used. The cloned fragment contains two 
DnaA binding regions that flank the dnaA gene, one in 
the dnaA promoter region and the other, containing 
the initiation site, lies between the dnaA and dnaN
genes. Both DnaA binding regions are required for 
replication. 

The linear chromosome of the Streptomycetes contains
an oriC region located between the dnaA and
dnaN genes. This origin has 19 DnaA boxes whose
location, orientation, and spacing are conserved among
three different species of Streptomyces. Although the
AT-rich 13-mer sequences found in enteric origins are
not present, several short AT-rich sequences are scattered
throughout the Streptomyces origin sequence.
The replication origin of Spiroplasma citri is also
located between the dnaA and dnaN genes and
contains several DnaA boxes.

Figure 1 Structural organization of the Escherichia coli oriC region. (A) Genes in the vicinity of oriC. Open boxes: genes drawn to scale; arrow shows transcription
direction. Filled arrowheads: transcription promoters. kbp: kilobase pairs, measured from the gidA side of the minimal 258-bp oriC region. (B) oriC DNA sequence
from nucleotides 1 to 340, showing structural elements. R1-R4, M: 9-bp R sites, with arrowhead showing directionality. L, M, R 13-mer: 13-bp AT-rich direct
repeats. Solid black rectangles: GATC sites. Protein names (IciA, HobH, DnaA, IHF, FIS, ROB, HNS) and open rectangles: proteins and their binding sites in and near
oriC. Arrowheads: start points for DNA synthesis. (Reproduced with permission from Messer and Weigel, 1996.)


Prokaryotic Origin Features 

Comparison of the properties of these eubacterial
replication origins argues that such origins have three
major features in common: (1) a DnaA binding site
region, bracketed by two DnaA binding sites of opposite
orientation and containing five or more DnaA
binding sites; (2) three AT-rich direct repeats of about
15 bp, scattered throughout the origin; and (3) the
origin is often found downstream or upstream of the
dnaA gene, which encodes the eubacterial initiator
protein. Eubacterial replication origins are distinguished
from each other by at least the following
features: (1) presence or absence of GATC sites,
together with a Dam methylation system; (2) the positions
of the DnaA binding sites within the DnaA binding
site region; and (3) the presence, precise sequence, and
degree of conservation of the three AT-rich direct
repeats. These differences appear to account for which
origins are functional in which bacterial species, and
may have taxonomic significance. These common features
of eubacterial replication origins have cognates
among the eukaryotic replication origins, as noted
below.
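The AT-rich repeats shared by these origins can be flagged with a sliding-window scan of A+T content. The 13-bp window and 75% threshold below are illustrative assumptions rather than values from the text, and a high-AT window is only a candidate unwinding region, not proof of one. The function name is hypothetical.

```python
def at_rich_windows(seq: str, window: int = 13, min_at: float = 0.75) -> list:
    """Return (start, A+T fraction) for every window at or above `min_at`.

    Uses a running count so each step updates in constant time.
    """
    if window <= 0:
        raise ValueError("window must be positive")
    seq = seq.upper()
    if len(seq) < window:
        return []
    at = sum(base in "AT" for base in seq[:window])
    hits = []
    for start in range(len(seq) - window + 1):
        if start > 0:
            # Slide right: add the entering base, drop the leaving base.
            at += (seq[start + window - 1] in "AT") - (seq[start - 1] in "AT")
        fraction = at / window
        if fraction >= min_at:
            hits.append((start, fraction))
    return hits


if __name__ == "__main__":
    toy = "GCGCGC" + "AAAAATTTTTAAA" + "GCGCGC"   # made-up 25-bp sequence
    for start, fraction in at_rich_windows(toy):
        print(start, round(fraction, 2))
```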

Another notable feature of bacterial ori sequences 
is their cellular location. At initiation, the origin is 
localized at midcell; after initiation, the origin
moves to the poles. However, DNA polymerase III 
remains at midcell during the replication cycle sug- 
gesting a ‘factory model’ where DNA moves through 
a fixed replisome during replication. The movement of 
oriCs from midcell to the poles may involve a mitotic 
apparatus associated with oriC. 


Eukaryotic DNA Replication Origins 


DNA replication in higher eukaryotic organisms is 
characterized by bidirectional replication proceeding 
from multiple initiation origins on single long DNA 
molecules, typically one molecule per chromosome. 
DNA sequences from many eukaryotes have been 
cloned as ARS sequences, which render a marked 
DNA molecule capable of self-duplication in a eukar- 
yotic host such as Saccharomyces cerevisiae. These 
ARSs show some conservation in sequence, but not 
nearly to the extent found in prokaryotic origins. 


In fact, many of these ARSs appear NOT to function 
as origins of DNA replication in the source chromo- 
somal DNA. A major problem with such analyses is 
that replication origins in eukaryotes exhibit complex 
regulation, both in time (see Figure 2) and in space. 
DNA replication occurs nearly exclusively during the 
S phase of the eukaryotic cell cycle, but different 
replication origins can initiate replication at different 
times during the S phase. Different types of cells, and 
different types of tissues, appear capable of using dif- 
ferent replication origins, resulting in duplication of 
some genes prior to others. In all cells, the DNA must 
be replicated precisely once, and only once, between 
cell division events. Some possible control factors for 
this regulation are shown in Figure 2, illustrating 
similarities between the simple eukaryote, S. cerevi- 
siae, and the more complex metazoan, Xenopus laevis. 
What then are the factors, regulatory features, 
and sequence characteristics that render a eukaryotic 
DNA sequence capable of functioning as a replication 
origin? 

One of the simplest eukaryotic chromosomes is the 
genome of the animal virus SV40. This genome is 
sufficiently small that replication proceeds from a 
single well-characterized origin. An SV40-encoded 
replication protein (T antigen) forms a protein-DNA 
complex at the origin recognition element (ORE) site 
in the origin, causing strand opening in an
adjacent AT-rich DNA unwinding element (DUE).
Additional proteins having helicase and polymerase 
activities bind to the unwound DUE, resulting in 
replication of the SV40 DNA. The nucleotide at 
which initiation occurs, the origin of bidirectional 
replication (OBR), is at one end of the DUE. An 
auxiliary protein, the transcription factor Sp1, may
also facilitate SV40 replication. However, the replication
of SV40, as well as that of other viral genomes, lacks the
sophisticated regulatory mechanisms found in eukar- 
yotic chromosomal DNA replication; in particular, 
the SV40 genome is typically replicated many times 
per host cell cycle. 

Origins from the budding yeast S. cerevisiae are 
among the best characterized. Yeast origins contain a 
conserved sequence element A and three lesser con- 
served sequence elements B1, B2, and B3. Elements A 
and B1 (Figure 3) bind the six-protein origin recognition
complex (ORC), followed in early G1 of the
yeast cell cycle by loading of the minichromosome
maintenance (MCM) protein complex, forming the
prereplication complex. MCM loading is mediated
by the protein Cdc6 (Figure 2). In late G1, Cdc6 is
replaced by Cdc45, yielding the preinitiation complex
(Figure 2). Cdc45 is phosphorylated, resulting in
unwinding of sequence element B2, the DUE (Figure 3),
and, in S phase, initiation of bidirectional replication
at the OBR, found between the B1 and B2 sequence
elements. When present, sequence element B3 binds
the transcription factor Abf-1, which is thought to
facilitate initiation, perhaps by enhancing unwinding
of element B2. Similar events occur in other eukaryotes
such as the frog X. laevis (Figure 2). Thus,
temporal control of ARS initiation events is determined
at the molecular level via sequential activities
of several proteins, as well as posttranslational protein
modification events.

Figure 2 (See Plate 25) Prereplication complexes and events in the yeast Saccharomyces cerevisiae and eggs from the
frog Xenopus laevis as a function of cell cycle position. ORC: origin recognition complex. Mcm: minichromosome
maintenance. Cdc: cell division cycle. (Reproduced with permission from DePamphilis, 1999.)
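The ordered assembly described above behaves like a small state machine in which each cell-cycle phase permits only one step, so an origin can fire at most once per cycle. The sketch below is a deliberate simplification: it tracks a single origin, treats the phase names as hypothetical labels, assumes the origin returns to the ORC-bound state only after mitosis, and does not model the posttranslational modifications involved.

```python
from enum import Enum, auto


class OriginState(Enum):
    ORC_BOUND = auto()   # ORC on elements A and B1
    PRE_RC = auto()      # MCM loaded with the help of Cdc6 (early G1)
    PRE_IC = auto()      # Cdc6 replaced by Cdc45 (late G1)
    FIRED = auto()       # bidirectional replication started at the OBR (S phase)


# For each phase: (state required, state reached). Other phases change nothing.
STEPS = {
    "early_G1": (OriginState.ORC_BOUND, OriginState.PRE_RC),
    "late_G1": (OriginState.PRE_RC, OriginState.PRE_IC),
    "S": (OriginState.PRE_IC, OriginState.FIRED),
}


def advance(state: OriginState, phase: str) -> OriginState:
    """Apply one phase; a step happens only from its required state."""
    if phase == "M":
        return OriginState.ORC_BOUND   # assumed reset after division
    required, reached = STEPS.get(phase, (None, None))
    return reached if state is required else state


def count_firings(phases: list) -> int:
    """How many times one origin fires over a sequence of phases."""
    state, firings = OriginState.ORC_BOUND, 0
    for phase in phases:
        new_state = advance(state, phase)
        if new_state is OriginState.FIRED and state is not OriginState.FIRED:
            firings += 1
        state = new_state
    return firings


if __name__ == "__main__":
    one_cycle = ["early_G1", "late_G1", "S", "S", "G2", "M"]
    print(count_firings(one_cycle))        # 1: a second S step cannot refire
    print(count_firings(one_cycle * 2))    # 2: once per cycle
    print(count_firings(["S"]))            # 0: no licensing, no firing
```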

These eukaryotic ARS origins are rather
small (100-200 bp), typical of single origins of replica- 
tion. In contrast, replication origins found in the 
fission yeast, Schizosaccharomyces pombe, and in 
many higher eukaryotes including mammals, are 
500-1000 bp in size (Figure 3). These origin ‘regions,’
called initiation zones, are thought to contain
multiple ARSs, only one of which functions in a given
tissue, cell type, or chromosome region. The ARS that 
is used is further thought to be determined by the 
‘context’ of the sequence surrounding the initiation 
zone, which could account for much of the ‘spatial’ 
or cell type and tissue specific regulation shown by 
replication origins. 

What are the properties of this sequence ‘context’ 
that could account for these regulatory properties? 
Clearly one property, as mentioned above, is the 
presence of auxiliary sites often bound by transcrip- 
tion factors, resulting in stimulation or inhibition 
of DNA replication initiation. Other implicated pro- 
perties include nuclear structure, chromatin structure, 
DNA methylation state, and DNA sequence. An 
intact nucleus appears to be important, probably either to maintain a sufficiently high concentration of replication factors or to maintain a structure or matrix at which initiation can take place. Likewise, chromatin structure, particularly the density of nucleosomes and the presence of higher-order structure, can affect the formation and function of the initiation complexes (Figure 3). The role of cytosine methylation in DNA remains controversial, both for transcription and for DNA replication initiation. Unlike in some prokaryotes, adenines at GATC sites are not methylated. However, ARSs that are highly C-methylated in CpG islands are highly active as origins, and such origins are rapidly remethylated following an initiation event. Thus, regulation of DNA replication could be tied to cellular methylation activity. DNA sequence is implicated in each of these properties, as well as in the efficiency with which each initiation protein binds.

Figure 3 Replication origins that function in the nuclei of eukaryotic cells. Dark rectangles: sequence elements always required for origin function. Light rectangles: binding sites for auxiliary factors. SV40: simian virus 40. ARS: … DePamphilis, 1999.)
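Methylation state cannot be read from sequence alone, but the CpG-island regions mentioned above can be flagged computationally. The sketch below is an illustration, not a method from this article: the function names are hypothetical, and the thresholds (at least 200 bp, GC content above 50%, observed/expected CpG ratio above 0.6) are the widely used criteria of Gardiner-Garden and Frommer, assumed here.

```python
def cpg_stats(seq: str) -> tuple[float, float]:
    """Return (GC fraction, observed/expected CpG ratio) for a DNA string."""
    seq = seq.upper()
    n = len(seq)
    if n == 0:
        return 0.0, 0.0
    c, g = seq.count("C"), seq.count("G")
    # "CG" cannot overlap itself, so str.count finds every CpG dinucleotide.
    cpg = seq.count("CG")
    gc_fraction = (c + g) / n
    # Observed CpG count divided by the count expected from base composition.
    obs_exp = cpg * n / (c * g) if c and g else 0.0
    return gc_fraction, obs_exp


def looks_like_cpg_island(seq: str, min_len: int = 200,
                          min_gc: float = 0.5, min_obs_exp: float = 0.6) -> bool:
    """Apply assumed CpG-island thresholds (after Gardiner-Garden and Frommer)."""
    gc, obs_exp = cpg_stats(seq)
    return len(seq) >= min_len and gc > min_gc and obs_exp > min_obs_exp


print(cpg_stats("CGCGCGCG"))              # (1.0, 2.0)
print(looks_like_cpg_island("CG" * 100))  # True
print(looks_like_cpg_island("AT" * 100))  # False
```

A real scan would slide such a test along a chromosome in windows; whether a flagged island is actually methylated, and whether it serves as an origin, has to be determined experimentally.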


Common Origin Features 


DNA replication origins are characterized primarily by three types of structures: (1) sites for binding of proteins, mainly initiation and auxiliary proteins, (2) a characteristically AT-rich region that is unwound, and (3) sites and structural properties involved in regulating initiation events. These three types of structures appear to give a DNA sequence the features it needs to be, or to become, a functional origin of DNA replication in all forms of life. Specificity of initiation, to a given organism or class of organisms, or to a given time or cell type during the life cycle of a given organism, is provided by the details of these three
types of structures. Examples of initiation proteins 
include RepA for plasmid R100, DnaA for eubacteria, 
T antigen for virus SV40, and the ORC proteins for 
yeast and higher eukaryotes, whereas examples of the 
AT-rich unwinding region include the 13-mer direct repeats in the eubacterial origins (Figure 1), the DUE region of the SV40 origin, the B2 element in yeast origins, and the DUE regions in origins from higher eukaryotes (Figure 3). Regulatory sites and properties include the binding of the inhibitor RNA I to RNA II in the initiation of plasmid ColE1, GATC sites
in the enteric bacterial replication origin, the Sp-1 
binding site in the SV40 origin, and similar transcrip- 
tion factor binding sites in other eukaryotic origins 
(see Figure 3), chromatin structure features and sites 
for nuclear matrix interaction in eukaryotic origins, 
and state of CpG methylation in eukaryotic origins. 

Three main events occur during initiation: unwind- 
ing of the origin DNA, priming of the leading DNA 
strand, and assembly of the replisome. Only the first 
is determined by a DNA sequence. The location of 
unwinding (AT-rich sequences) and helicase loading is 
fixed, and the DNA sequence at that location is con- 
sidered the origin. Size and sequence requirements of 
the minimal origin can be defined by mutation and 
deletion analysis. Because primase action occurs fol- 
lowing helicase unwinding as it moves through DNA, 
the location of initiation sites or RNA:DNA junctions 
may be found at heterogeneous sites within and out- 
side of the minimal origin. Also, helicase unwinding is 
most likely required for replisome assembly, which 
may also occur outside of the minimal origin. It is 
unlikely that the locations of priming and replisome 
assembly are determined by a particular sequence of 
DNA but rather are determined by events occurring at 
the origin. 
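Because the location of unwinding is the one sequence-determined event, a first-pass search for candidate unwinding regions often amounts to scanning for AT-rich stretches. The sketch below assumes that simple heuristic; `most_at_rich_window` is a hypothetical helper and the toy sequence is invented. It cannot replace the mutation and deletion analysis described above, which is what actually defines a minimal origin.

```python
def most_at_rich_window(seq: str, width: int) -> tuple[int, float]:
    """Return (start, AT fraction) of the most AT-rich window of the given width.

    Ties are resolved in favour of the leftmost window.
    """
    seq = seq.upper()
    if not 0 < width <= len(seq):
        raise ValueError("width must be between 1 and len(seq)")
    is_at = [1 if base in "AT" else 0 for base in seq]
    count = sum(is_at[:width])
    best_start, best_count = 0, count
    # Slide the window one base at a time, updating the A+T count in O(1).
    for start in range(1, len(seq) - width + 1):
        count += is_at[start + width - 1] - is_at[start - 1]
        if count > best_count:
            best_start, best_count = start, count
    return best_start, best_count / width


toy_origin = "GCGCGC" + "AATTTAAT" + "GCGCGC"   # hypothetical sequence
print(most_at_rich_window(toy_origin, 8))      # (6, 1.0)
```

For a eubacterial origin one might choose a width of 13 to match the 13-mer repeats, but the window width is a free parameter of this heuristic, not something fixed by the biology.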
The origin (ori) is the region of DNA at which replication is initiated.


See also: Ori Sequences; Replication 
A working definition of living systems is indispensable
for the construction and assessment of theories of
the origin of life. Living systems are not necessarily 
units of evolution and, conversely, populations of 
units other than living systems can undergo evolution 
by natural selection. But, since we are interested in the 
origin of our biosphere, the first living systems of
interest must have been units of evolution as well. A 
model for minimal life consists of three autocatalytic 
subsystems: the metabolic, the genetic, and the bound- 
ary systems. The system as a whole is capable of 
spatial reproduction. Various theories account for the 
origins of these subsystems and possible combinations 
of them. Beyond the structural considerations, the 
dynamics of these systems are also crucial. The genetic
code is likely to have originated in systems that were 
already alive. 


Criteria for Living Systems 


Why Bother about Definition? 

In order to tackle the origin of life one must have at 
least a working definition of life. There is no general 
agreement about such a definition. Some believe that 
self-replication, enzymatic aid of chemical processes, 
or cellularity alone, or a combination of these are 
necessary and sufficient to define a living system. 
Clearly, from the logical point of view definitions are 
arbitrary: One cannot falsify a definition in the same 
way as one falsifies a hypothesis. So why bother? 


‘In view of this arbitrariness on where to put the marker, is
any definition equally good? Surely not, as one definition 
may be more meaningful than another, depending on what 
you want to do with it. In fact, the following criteria appear 
to be important: a definition of life should permit one to 
discriminate between the living and the nonliving in an 
operationally simple way and it should not be too restrictive 
(i.e., the discrimination criterion should be applicable over a 
large area and should be capable of including life as it is as 
well as hypothetical previous forms). All forms of life we 
know about should be covered by such a definition. Once 
decided upon, the definition should also help to design 
experiments on the production of minimal life in the labora- 
tory, consistent with the definition. It should help space 
explorers in the attribution of the term ‘life’ to novel bio- 
logical forms. Finally of course it should be logically self- 
consistent’ (Luisi, 1998). 


Units of Evolution 

Viruses do evolve, even if they are inert crystals by 
themselves. In fact they have become one of the most 
accessible test systems for evolutionary hypotheses.
Some computer programs can also evolve in competi- 
tion with others. What is then the relationship of units 
of evolution to units of life? In order to give a tentative 
answer, one must first define both concepts with sufficient clarity. Units of evolution must: (1) multiply; (2) have heredity; and (3) show variability, that is, heredity must not be perfectly accurate. Furthermore, some of the inherited traits must affect the chance of reproduction and/or survival of the units. If all of these criteria are
met, then in a population of such entities evolution by 
natural selection can take place. Note that this defin- 
ition does not refer to just living systems. Any system 
satisfying these criteria can evolve in a Darwinian 
manner. 
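The criteria above can be made concrete with a small deterministic model. This sketch assumes an infinite population and a standard replicator-mutator update (not a model from this article): `fitness` stands for multiplication with heritable differences, `1 - mu` for heredity, and `mu` for variability. The function name and parameter values are hypothetical.

```python
def next_generation(freqs, fitness, mu):
    """One generation of replication with mutation in an infinite population.

    freqs   -- frequency of each replicator type (sums to 1)
    fitness -- relative number of offspring per individual of each type
    mu      -- probability that an offspring is a different type from its parent
    """
    k = len(freqs)
    if k == 1:
        return [1.0]
    # Multiplication: each type produces offspring in proportion to its fitness.
    produced = [f * w for f, w in zip(freqs, fitness)]
    total = sum(produced)
    new = [0.0] * k
    for parent, amount in enumerate(produced):
        for child in range(k):
            if child == parent:
                new[child] += amount * (1 - mu)        # heredity: faithful copies
            else:
                new[child] += amount * mu / (k - 1)    # variability: mutants
    # Renormalize so that the result is again a set of frequencies.
    return [x / total for x in new]


freqs = [0.99, 0.01]          # the fitter type 1 starts rare
for _ in range(100):
    freqs = next_generation(freqs, fitness=[1.0, 1.5], mu=0.01)
print(round(freqs[1], 3))     # settles near 0.97 (mutation-selection balance)
```

Setting `mu=0` with the fitter type absent leaves the population unchanged, which illustrates why variability is listed among the criteria: selection can act only on variants that exist.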


Units of Life 

Units of life as such are rarely defined, although cells 
and organisms are widely known and analyzed. For an 
individual’s living state, reproduction is neither a
necessary nor a sufficient condition. Many cells and 
organisms are commonly regarded as alive even if they 
cannot reproduce. So-called potential (the word 
‘potentiating’ would be better) life criteria must be met 
only if the ‘population’ of units is to be maintained 
and evolves. A sensible relation then between units of 
evolution and units of life is that of two partially 
overlapping sets. This simple relation resolves many 
apparent contradictions. Potential life criteria must be 
satisfied by living systems if the autonomous evolu- 
tion of a whole biosphere is what one (as an exobio- 
logist, for example) is looking for. 

There is a hierarchy both of units of evolution and 
units of life: in many cases they coincide. A reprodu- 
cing organism and its dividing cells are both alive and 
units of evolution. The latter qualification may be 
surprising, but suffice it to say that tumors arise as 
within-organism selection of moderately genetically 
unstable cells. Obviously, such ‘selfish’ tendencies of 
lower level units must typically be suppressed, other- 
wise higher level units would go extinct, or would 
never have arisen in the first place (see ‘Dynamics of 
Genome Stability’ below). 


A Model of Minimal Life 

A model (a precise description) of a ‘minimal’ living system that satisfies the potential life criteria, as conceived by Gánti, is presented here. The chemoton is a chemical supersystem, composed of three autocatalytic (cf. ‘Metabolic Theories’ below) subsystems: a metabolic
network; a replicating template; and a boundary mem- 
brane (Figure 1). Stoichiometric coupling among the 
subsystems ensures regulated reproduction of the sys- 
tem as a whole. Spatial reproduction happens essen- 
tially because the enclosed volume grows faster than 
the mass of internal material. It is important to empha- 
size that the membrane is also autocatalytic: building 
block T, produced by the metabolic network, is spon- 
taneously inserted by virtue of the fact that there is a 
pre-existing membrane surface. This system qualifies 
as a unit of evolution (with unlimited hereditary 
potential) owing to the presence of the template mol- 
ecule pV,,. If one imagines that these templates are 
the abstract versions of ribozymes (RNA molecules 
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Figure | The abstract network of the chemoton, 
model of a biological minimal system (after Ganti). the 
chemoton model describes the coupling of three auto- 
catalytic (the metabolic, the genetic, and the boundary) 
subsystems. X and Y are source and waste materials, 
respectively. A;_s are the intermediates of the metabolic 
subsystem. V’arethe building blocks for the synthesis of the 
template macromolecule pV,,, consisting of n pieces of V. R 
is the byproduct of template polycondensation. T’ and T* 
are intermediates in the pathway leading to the synthesis of 
the membranogenic molecule T. The membrane forms a 
vesicle built of m T molecules. T molecules self-insert into 
the membrane spontaneously, owing to the fact that ex- 
position of their hydrophobic parts to the water inside the 
system is energetically unfavorable. Note thatall three sub- 
systems, the metabolic, the genetic, and the boundary 
subsystems, are autocatalytic. The functioning of the sys- 
tem is as follows. Imagine a chemoton as a spherical vesicle. 

The spherical shape is due to the fact that molecules A;, V, 
T, and pV, cannot pass through the membrane: only water, 
X, and Y can pass through. There is, therefore, an osmotic 
pressure maintained by the inner materials leading to the 
spherical shape. As the system metabolizes molecules X, all 
internal constituents grow in number. Ultimately, owing to 
the couplings among the subsystems, everything will be 
doubled. Keeping an undistorted spherical shape through- 
out the growth of the chemoton is, however, impossible, 
because a sphere with a double surface has avolume that is 
considerably more than doubled (surface area scales up 
quadratically, volume scales up cubically with the radius of 
a sphere). Thus, osmotic pressure will not maintain a 
spherical shape, and the membrane will buckle to form 
a dumbbell-like object, until the system divides into two 
spheres of identical size to the initial one. More elaborate 
models take into account surface curvature and surface 
energy of the membrane as well. 
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acting as enzymes, cf. ‘Digital Information Storage 
and the RNA World’ below), then they catalyze 
steps of the metabolic cycle and membrane growth 
using their inherited information. 
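The geometric argument behind chemoton division can be written out with the standard formulas for the surface area A and volume V of a sphere of radius r:

```latex
A = 4\pi r^{2}, \qquad V = \tfrac{4}{3}\pi r^{3}

A' = 2A \;\Longrightarrow\; r' = \sqrt{2}\,r \;\Longrightarrow\;
V' = \tfrac{4}{3}\pi\bigl(\sqrt{2}\,r\bigr)^{3} = 2^{3/2}\,V \approx 2.83\,V
```

A doubled membrane could therefore enclose about 2.83 times the original volume, but only twice the internal material is available to inflate it osmotically, so the single sphere cannot be maintained. Two spheres of the original radius r, by contrast, have a total surface of 2A and a total volume of 2V, exactly matching the doubled membrane and doubled contents.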

The chemoton model is useful because it combines 
two approaches to the problem of life: the genetic 
approach and the system theoretical approach. It is 
also helpful in classifying the theories of the origin of 
life. It has been suggested that one should focus on all 
three subsystems of the chemoton, or some combin- 
ation of them, when dealing with the origin of life. 
Therefore, the metabolic, membrane, and template 
theories of the origin of life will be considered here. 
First, however, an alternative way of classifying the
theories of the origin of life is presented below.


Alternative Approaches to the Origin of Life


Figure 2 classifies the theories for the origin of life 
mainly according to the origin of the basic chemicals 
(inorganic or organic) necessary to build a primitive 
living system. Panspermia (life originated elsewhere in 
the Universe and then naturally or artificially ‘fertil- 
ized’ the Earth) is not logically excluded, but does 
not solve by itself the problem of how life arose in 
the first place. The clay hypothesis of Cairns-Smith 
rests on the assumption that clay minerals can be units 
of evolution: however, as yet there is no experimental 
evidence to support this claim. The idea that primeval 
living cells were heterotrophic (used organic com- 
pounds in the milieu for metabolizing matter and 
energy) is strongly linked to the ‘soup’ approach, 
pioneered by the famous Miller experiments in 1953. 
A reducing primordial atmosphere and an organic 
soup (akin to Darwin’s ‘warm little pond’) are central 
to this approach. There are two main problems with 
the soup idea. First, there now seems to be little 
evidence for an early reducing atmosphere. Second, 
the reactions demonstrated to yield some or other 
bioorganic compound are in many cases chemically 
incompatible. There is increasing doubt that the soup 
approach satisfactorily explains the origin of life. 
Furthermore, it is not in fact a theory for the origin 
of life, but a theory for the origin of some of its 
chemical constituents. We shall see below that some 
promising alternatives have been suggested, although 
it is possible that the soup could also have contributed 
to the accumulation of organic material. 


Metabolic Theories 


Autocatalysis 
The metabolic subsystem of living cells ensures that replication of the genetic material is nontrivial: it can proceed even if the activated building blocks (nucleotides) are not present in the external milieu. But metabolism has a ‘life of its own’: one reason for this is that it is an autocatalytic system. The chemical basis of replication is autocatalysis (A catalyzes the formation of new A from the raw materials), and autocatalysis always results, in some sense, in replication. Heredity relies on the replication of information, which requires that different kinds of autocatalysts can exist. Are metabolic replicators of information possible?

Figure 2 Theories for the origins of the chemical constituents of life (after Davis and McKay). Panspermia means an extraterrestrial source of life: directed panspermia means that some alien civilization deliberately ‘fertilizes’ planets with life. Clay organisms refer to a highly hypothetical possibility of primordial clay replicators, whose evolution ultimately could have led to organic replicators. The so-called ‘soup’ theory invokes heterotrophy for the earliest form of metabolism; thus the first cells would have metabolized the organic compounds produced abiogenically outside them. Other scenarios favor primordial autotrophy, where cells would have synthesized organic material from simple carbon sources (such as formaldehyde or carbon dioxide) at the expense of energy gained from light (photosynthesis) or from the chemical conversion of inorganic molecules (chemosynthesis).
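The defining feature of autocatalysis, a rate proportional to the amount of catalyst already present, shows up in even the simplest kinetic model. The sketch below assumes a single mass-action step A + X → 2A integrated with Euler's method; the function name, rate constant, and step size are illustrative assumptions rather than values from the text.

```python
def simulate_autocatalysis(a0, x0, k, dt, steps):
    """Euler integration of the autocatalytic step A + X -> 2A.

    a0, x0 -- initial concentrations of autocatalyst A and raw material X
    k      -- mass-action rate constant (assumed value)
    Returns the final (a, x).
    """
    a, x = a0, x0
    for _ in range(steps):
        rate = k * a * x      # rate is proportional to A itself: autocatalysis
        a += rate * dt        # one A in, two A out: net gain of one A
        x -= rate * dt        # each event consumes one X
    return a, x


print(simulate_autocatalysis(0.01, 1.0, k=1.0, dt=0.01, steps=2000))
print(simulate_autocatalysis(0.0, 1.0, k=1.0, dt=0.01, steps=2000))  # (0.0, 1.0)
```

Growth is slow while A is scarce, accelerates as A accumulates, and stops when X runs out. With no A to begin with, nothing happens at all, which is why an autocatalytic seed is needed.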

The simplest self-replicator of relevance in this 
context is glycolaldehyde, the autocatalytic seed 
of the formose ‘reaction’ (Figure 3) discovered by 
Butlerov in 1861. Some consider this system to be 
important for the origin of life, whereas others are 
more skeptical. Either way, it is unknown whether 
such systems just exist or can undergo some evolution 
by natural selection, for which hereditary variation 
would be mandatory. 
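The replication of glycolaldehyde at the core of the formose reaction can be checked as simple bookkeeping. The sketch tracks molecules only by carbon number and assumes the commonly drawn simplified core (two aldol additions of formaldehyde, then retro-aldol cleavage of the tetrose), ignoring isomerizations and the many side reactions; all names are hypothetical. It illustrates replication only; as noted above, it says nothing about whether heredity is possible.

```python
from collections import Counter

# Simplified autocatalytic core of the formose reaction, with molecules tracked
# only by carbon number: 1 = formaldehyde, 2 = glycolaldehyde,
# 3 = triose, 4 = tetrose.
CORE_STEPS = [
    ({2: 1, 1: 1}, {3: 1}),  # aldol addition: C2 + C1 -> C3
    ({3: 1, 1: 1}, {4: 1}),  # aldol addition: C3 + C1 -> C4
    ({4: 1}, {2: 2}),        # retro-aldol cleavage: C4 -> 2 C2
]


def carbon_atoms(pool):
    """Total number of carbon atoms in a pool {carbon number: molecule count}."""
    return sum(n * count for n, count in pool.items())


def run_cycle(pool):
    """Send every glycolaldehyde in the pool once around the core cycle."""
    pool = Counter(pool)
    turns = pool[2]
    for reactants, products in CORE_STEPS:
        for species, count in reactants.items():
            if pool[species] < count * turns:
                raise ValueError("not enough material to complete the cycle")
            pool[species] -= count * turns
        for species, count in products.items():
            pool[species] += count * turns
    return pool


pool = Counter({2: 1, 1: 100})    # one glycolaldehyde seed, excess formaldehyde
for _ in range(5):
    pool = run_cycle(pool)
print(pool[2], pool[1])           # 32 38: glycolaldehyde doubles each turn
print(carbon_atoms(pool))         # 102: carbon is conserved
```

Each turn of the cycle has the net stoichiometry C2 + 2 C1 → 2 C2, so the seed doubles for every two formaldehyde molecules consumed until the formaldehyde runs out.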


Chemoautotrophic pyrite theory 

Whether one can have heredity in such systems or not 
is an open question, both theoretically and empiric- 
ally. To be sure, there are other autocatalytic cycles of 
small organic molecules (such as the Calvin cycle and 
the reductive citric acid cycle, fixing carbon dioxide 
in plants and some bacteria, respectively) that could 
have played an early role even in chemical evolution. 
Wächtershäuser suggested that archaic versions of the
reductive citric acid cycle could have existed and propa- 
gated on pyrite surfaces. The central reaction of this 
chemoautotrophic theory is the fixation of carbon di- 
oxide, coupled to the formation of pyrite: 


$$4\,\mathrm{CO_2} + 7\,\mathrm{FeS} + 7\,\mathrm{H_2S} \rightarrow (\mathrm{CH_2COOH})_2 + 7\,\mathrm{FeS_2} + 4\,\mathrm{H_2O}$$


which is energetically favorable. 
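That the pyrite-coupled reaction balances can be verified by counting atoms on each side. The minimal formula parser below is a hypothetical helper that handles only element symbols, counts, and one level of parentheses, which is enough for the species in this equation.

```python
import re
from collections import Counter

ELEMENT = re.compile(r"[A-Z][a-z]?")


def parse_formula(formula):
    """Count atoms in a formula such as 'H2O', 'FeS2' or '(CH2COOH)2'."""

    def read_number(i):
        j = i
        while j < len(formula) and formula[j].isdigit():
            j += 1
        return (int(formula[i:j]) if j > i else 1), j

    def parse_group(i):
        counts = Counter()
        while i < len(formula) and formula[i] != ")":
            if formula[i] == "(":
                inner, i = parse_group(i + 1)
                mult, i = read_number(i + 1)   # skip ')' then read multiplier
                for element, n in inner.items():
                    counts[element] += n * mult
            else:
                match = ELEMENT.match(formula, i)
                if match is None:
                    raise ValueError(f"cannot parse {formula!r} at position {i}")
                mult, i = read_number(match.end())
                counts[match.group()] += mult
        return counts, i

    counts, _ = parse_group(0)
    return counts


def atom_totals(side):
    """Sum atoms over a list of (stoichiometric coefficient, formula) pairs."""
    totals = Counter()
    for coefficient, formula in side:
        for element, n in parse_formula(formula).items():
            totals[element] += coefficient * n
    return totals


reactants = [(4, "CO2"), (7, "FeS"), (7, "H2S")]
products = [(1, "(CH2COOH)2"), (7, "FeS2"), (4, "H2O")]
print(atom_totals(reactants) == atom_totals(products))   # True
```

Both sides contain 4 C, 8 O, 14 H, 7 Fe, and 14 S; (CH2COOH)2 is succinic acid, one of the intermediates of the reductive citric acid cycle.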


Figure 3 The formose reaction (after Maynard Smith
and Szathmáry). We show the core reactions of this 
autocatalytic formation of sugar molecules. Circles 
represent chemical groups containing one carbon atom. 
(A) The initial reaction is very slow, producing 
glycolaldehyde from formaldehyde. (B) The autocatalytic 
core rests on the replication of glycolaldehyde, fueled 
by formaldehyde consumption. Several other sugars, 
including ribose, can be produced using up the 
molecules produced by the autocatalytic core. The 
prebiotic relevance of this system is not unequivocal, but 
is likely to be high. 


This suggestion is open to experimental tests. It is 
encouraging that more and more reactions, broadly 
supporting a pyrite-based scenario, have been shown
to occur at high temperatures under the required
experimental conditions. 


Importance of mineral surfaces 

Appropriately charged (positive) mineral surfaces 
(such as pyrite, or clay covered by positive ions) are 
likely to have played a crucial role in the origin of bio- 
chemical reactions for several reasons: (1) adsorbing 
surfaces act as catalysts, since the local concentration 
is elevated; (2) polymerization can be thermodynamic- 
ally favorable, since water can leave the surface and 
thus increase the entropy of the milieu, compensating 
for the entropy decrease on the surface; and (3) the
dynamics of natural selection on the surface more
readily lead to increased complexity (see ‘Dynamics
of Genome Stability’). 


Heredity and evolution in metabolic systems 

We do not know of any replicative alternatives to 
glycolaldehyde in the formose reaction. Most changes 
in the chemical identity of the cycle intermediates will 
be just transient fluctuations or will simply drain the 
system. Even if heredity is possible for such cycles, 
hereditary variation will be very rare, closer to what 
biologists call ‘macromutations.’ Heredity, if possible 
at all, will be of the limited kind. This means that 
the number of types is smaller than the number of 
individuals in a given system. Under such circum- 
stances, evolution by natural selection soon comes to 
a halt. 

Another aspect is the lack of modularity in such 
replicators. DNA is copied by the sequential addition 
of modules, complementary to modules sitting in the 
parental strand. This is not the case for the cycles 
considered here. The terms ‘processive’ or ‘holistic’ 
refer to a replication process where it cannot be said 
(as for DNA) that replication is half-completed: one 
needs the whole series of chemical transformations 
until, almost by a miracle, two individuals of the initial 
type appear (see Figure 3). 

The final important aspect is that inheritance is 
based here on the dynamic nature of all the relevant 
chemical reactions of the network, so the hereditary 
states must be stable dynamic states. Such inheritance 
systems have been called ‘steady-state’ or ‘attractor-based’ systems. The question is how such simple replicators could have evolved into something as complex as RNA. We do not have the answer to this.


Autocatalytic peptide networks 

Another type of ‘metabolic’ approach is the one con- 
sidered by Eigen, Dyson, and Kauffman based on 
reflexively autocatalytic protein networks. Their 
approach is based on the interconversion of oligo- 
and polypeptides, catalyzed by the peptides them- 
selves. Given an adequate source of amino acids for 
‘food,’ such a system would grow autocatalytically. 
Note that this system is modular, but still attractor- 
based: Peptides in general are not assumed to undergo 
template replication in the same way that nucleic acids 
do. There are at least two problems with this idea. 
First, the number of different reactions that any pep- 
tide in the network is supposed to catalyze is unreal- 
istically high. Second, the authors considered only the 
catalysis of potentially beneficial reactions. Unfortu- 
nately, the majority of potential reactions will lead out 
of the system. In order to obtain catalytic closure, one 
must have a relatively large network. The larger the
network, the higher the number of expected side reac- 
tions, which would then require a small system size. 
Clearly, one cannot have it both ways. No satisfactory 
solution to this problem has been given. Note that this 
applies to all metabolic theories. 


Autocatalytic oligonucleotide networks 

Another example of an attractor-based but modular 
replicator would be a population of RNA molecules 
that could undergo replication only through the mutual 
heterocatalytic aid given to the ligation of its mem- 
bers. Such networks may have been an intermediate 
stage in the transition from holistic replicators to the 
storage-based systems of an ‘RNA world’ (see below). 
The reality of such networks is open to question. 


Membrane Heredity and the Lipid World 
Morowitz has raised the possibility of hereditary 
membrane replication, with a relevance to early evo- 
lution. The membrane of the chemoton is also an 
autocatalytically growing system. Recently, Segré 
and colleagues presented a much more sophisticated 
version of this idea in their ‘lipid world’ scenario. To a 
first approximation their system is a lipid version of 
the reflexively autocatalytic protein networks (see 
above). There are two crucial differences, however: 
(1) the Segré system is holistic; and (2) it is spatially 
confined, by virtue of the lipid constituents forming 
a vesicle. This confinement has a very important 
consequence for dynamics: The lipids in a vesicle can 
only belong to a small subset of all the possible lipids, 
due to the physical limitation on vesicle size. Simu- 
lations show that this sampling fosters hereditary 
behavior. 

Again, we cannot tell whether complex lipid repli- 
cators are feasible or not. In any case, the problem of 
side reactions has not been solved for these systems 
either. 


Digital Information Storage and the RNA World


Unlimited hereditary potential 

Surely short replicators of the modular kind must have 
preceded the longer ones. In a pioneering study, von
Kiedrowski in 1986 managed to synthesize an artificial 
hexadeoxynucleotide analog that replicated without 
enzymatic aid. Many similar replicators have been 
designed and successfully tried out since. These results 
are important because they show that true molecular 
self-replication is possible; however they are not 
directly relevant to the origin of life because the mol- 
ecules involved are not plausible prebiotic com- 
pounds. Although replication of these molecules is 


modular, heredity is still limited, because small size 
restricts the number of possible types (sequences). We 
reach unlimited heredity as soon as we reach the 
dimensions of viruses. Unlimited heredity means 
that the number of possible sequences is much larger 
than the number of individuals in the given system. 
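The contrast between limited and unlimited heredity comes down to comparing the size of sequence space with the population size. The sketch assumes a four-letter alphabet and treats "exceeds" as a crude stand-in for "much larger than"; the function names and the population size of 10^9 are illustrative assumptions.

```python
def sequence_space(length: int, alphabet: int = 4) -> int:
    """Number of distinct sequences of a given length over an alphabet."""
    return alphabet ** length


def min_length_for_unlimited(population: int, alphabet: int = 4) -> int:
    """Shortest length whose sequence space exceeds the population size.

    'Exceeds' is a crude stand-in for 'much larger than' in the definition.
    """
    length = 0
    while alphabet ** length <= population:
        length += 1
    return length


print(sequence_space(6))                   # 4096 possible hexanucleotides
print(min_length_for_unlimited(10**9))     # 15
print(sequence_space(100) > 10**60)        # True
```

A hexamer population of any realistic size can therefore contain every possible type, whereas at the length of a small virus genome almost every possible sequence is absent, leaving room for evolution to continue indefinitely.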

The digital nature of information storage in nucleic 
acids allows for microevolution, and if the length of 
the replicators is sufficient, unlimited heredity allows 
evolution to go on indefinitely. 


Ribozymes and the RNA world 

The term RNA world, coined by Gilbert in 1986, can 
be traced back to suggestions by Woese, Orgel, and 
Crick in the late 1960s. They realized that since RNA 
is a macromolecule consisting of building blocks with 
different chemical functional groups, and since it has a 
globular three-dimensional structure, coded for by its 
sequence, RNA could have served as a primordial 
catalyst, as well as being genetic material. This sugges- 
tion received limited attention until the discovery 
of catalytic RNAs (ribozymes) in the early 1980s. 
Almost all extant natural ribozymes catalyze reactions 
of other pieces of RNA; hence, it was an open ques- 
tion for a while as to whether ribozymes could be 
general catalysts. In vitro genetics involving the arti- 
ficial selection of ribozymes with predetermined func- 
tions, has given strong support to the notion that 
ribozymes could have been controlling primordial 
metabolism. However, there are two major concerns. 
First, nobody knows where RNA came from. It is too 
complex a molecule for primordial chemistry. One of 
the problems is enantiomeric cross-inhibition, which 
relates to the broader issue of biomolecular homo- 
chirality. Many organic molecules are such that they 
exist in pairs that are mirror images of each other. 
Typically, living systems only use one or other of 
them, and this is what is meant by homochirality. 
RNA is also a chiral molecule; its building blocks are 
right-handed. A mixture of both right- and left- 
handed building blocks would inhibit replication of 
a homochiral template. Therefore, a nonchiral pre- 
decessor to RNA has been suggested, but a convincing 
candidate for this role is still lacking. 

The second problem with RNA is that it is not a 
self-replicator. A protein enzyme replicates all known 
RNAs of even moderate length. A replicase ribozyme 
might solve the problem, but nobody has been able to 
make one so far. 


Toward Composite Systems: Template-Containing, Reproducing Micelles

Ultimately the aim is to approach systems like the 
chemoton experimentally (see Figure 1), but using 
genetic material as a catalyst of metabolism. It is possible that such a chemoton-like system will be realized experimentally within the next two decades. Luisi has begun work in this direction by coupling autocatalytic vesicle formation with internal RNA replication. The system is still fairly limited, because RNA replication requires an added protein enzyme, which the system itself cannot replicate. Thus replication eventually stops, because the enzyme is progressively diluted by vesicle fission. There are plans to
construct a vesicle that would solve these problems. 
Implementation of metabolism experimentally seems 
to be the hardest problem to solve. A ribozyme- 
catalyzed reductive citric acid cycle would be a useful 
goal to achieve. 


Some Further Considerations 
Dynamics of Genome Stability 


Error threshold of replication and Eigen’s paradox 
The vast majority of research in this field concentrates 
on structural investigations. Yet we have known since the
pioneering works of Eigen (1971) that temporal 
dynamics cannot be ignored either. He spelt out 
what today is referred to as Eigen’s paradox or the 
catch-22 of the origin of life. Replication always pro- 
ceeds with finite accuracy: Inaccurate insertions of 
nucleotides into nucleic acids are called mutations. 
Contemporary error rate of nucleic acid replication 
is in the range of between 10 * and 10° */nucleotide/ 
replication. Primordial replication must have been much
less accurate, with error rates possibly exceeding
10^-2/nucleotide/replication. The
problem is that mutational load limits the genome 
size maintainable by selection. Therefore, primordial 
nucleic acid-like molecules could not possibly have 
been longer than ca. 100 nucleotides, the size of pre- 
sent-day tRNA. This size is just sufficient for one 
small gene, which implies that genes in a primordial 
genome must have been unlinked. But if unlinked, 
they were ready to compete among each other (differ- 
ences in sequence and three-dimensional structure 
translate into differential replication rates), hence the 
demise of the segmented genome. 
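The link between error rate and maximum genome length can be made quantitative with Eigen's error-threshold condition. Assume a master sequence of length L copied with per-nucleotide fidelity q = 1 - ε and with a selective superiority σ over its mutants (σ is not specified in the text; σ = e is assumed below to give round numbers). The master sequence is then maintained only if σq^L > 1, that is, L < ln σ / (-ln q) ≈ ln σ / ε. The function names are hypothetical.

```python
import math


def max_length_exact(error_rate: float, superiority: float) -> float:
    """Length L at which sigma * q**L = 1, with q = 1 - error_rate per nucleotide."""
    # -log1p(-e) equals -ln(1 - e), computed accurately for small e.
    return math.log(superiority) / -math.log1p(-error_rate)


def max_length_approx(error_rate: float, superiority: float) -> float:
    """Familiar small-error approximation: L_max ~ ln(sigma) / error_rate."""
    return math.log(superiority) / error_rate


for eps in (1e-2, 1e-4):
    print(eps, max_length_exact(eps, math.e), max_length_approx(eps, math.e))
```

With ε = 10^-2 this gives a limit of about 100 nucleotides, in line with the tRNA-sized estimate above; every tenfold improvement in accuracy raises the limit roughly tenfold.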


Importance of population structure 

Several resolutions of Eigen’s paradox have been sug- 
gested, all essentially resting on some kind of struc- 
tured population. The simplest, and possibly oldest, 
implementation is natural selection on a surface. 
Because genes adsorbed on the surface interact only with
their neighbors, coexistence of different
genes becomes possible. The intuitive explanation for 
this is that the gene with the slowest rate of replication 
is likely to be complemented by the other genes in a 
local neighborhood, whereas the fastest replicating 
gene is likely to be surrounded by copies of itself,
which renders metabolic complementation provided 
by the other genes impossible. 

Another possible resolution has been given by the 
stochastic corrector model. Put simply, this is a popu- 
lation genetics implementation of the chemoton with 
catalytic RNA inside. Although different genes in the 
same chemoton still compete, there is selection at the 
level of the chemotons as well. Templates are allocated 
randomly into offspring vesicles. This (and some other 
sources of stochasticity) generates variation among 
the chemotons, on which natural selection can act. 
Selection at the higher level successfully counters 
that at the lower level: A recurrent theme in successful 
evolutionary transitions. 
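The random allocation of templates at fission, which generates the between-vesicle variation on which selection acts, can be sketched in a few lines. This covers only that one step of the stochastic corrector model, not its within-vesicle competition or vesicle-level selection; the function name, seed, and copy numbers are illustrative assumptions.

```python
import random


def random_fission(counts, rng):
    """Split a parent's templates between two offspring vesicles at random.

    counts -- copies of each template type in the parent just before fission
    Each copy independently ends up in either offspring with probability 1/2.
    """
    first = [sum(1 for _ in range(n) if rng.random() < 0.5) for n in counts]
    second = [n - k for n, k in zip(counts, first)]
    return first, second


rng = random.Random(2024)           # fixed seed so the run is repeatable
parent = [4, 4]                     # hypothetical: 4 copies each of genes A and B
missing = 0
trials = 10_000
for _ in range(trials):
    for child in random_fission(parent, rng):
        if 0 in child:              # offspring lacks at least one gene type
            missing += 1
print(missing / (2 * trials))       # close to 31/256, about 0.12
```

Even with four copies of each gene in the parent, roughly one offspring in eight lacks a gene type altogether; such defective offspring are what selection at the level of the chemotons removes.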


Origin of the Genetic Code 

We do not know how and when the genetic code 
entered the game. It is likely that it arrived late in the 
day, in already evolved (possibly living) systems. This 
is almost certain if the RNA world really existed. But 
this implies that the origin of the code is no longer a 
problem for the origin of life, since the latter preceded 
the former. Nevertheless, considering that all extant 
life forms rely on the code, we must briefly touch on 
this issue. 

Preadaptationist scenarios for the origin of the code 
are becoming increasingly popular. A preadaptation is 
a trait of an evolutionary unit that has evolved to serve 
function a, which turns out to be useful (at a rudimen- 
tary level) for function b as well. One example is the 
case of feathers. Initially, they were used not for flight 
but to keep animals warm. By virtue of their structure they
also aided rudimentary forms of flight. Ultimately,
feathers for flight per se were selected. 

One version of the preadaptationist scenarios pos- 
tulates that amino acids entered the RNA world as 
coenzymes of ribozymes. Limitations on space do 
not allow for an explanation as to why this could 
have led to a coded assignment between amino acids 
and oligonucleotides, but the case has been made. 
Once such an assignment is present, it can be used 
for something else; in this case for peptide synthesis 
(translation). Thus, it could well be that by separating 
two difficult issues, coding and protein synthesis, 
from each other, one can finally crack this ‘notoriously 
difficult’ puzzle. 
An orphan receptor is a gene product that appears to
belong to a ligand-regulated receptor family on the 
basis of sequence identity, but lacks identified cognate 
ligands. 


Orthology 
W Fitch 
The circumstance in which two homologous sequences diverge following speciation, so that the common ancestor of the two sequences lies in their cenancestor.


See also: Cenancestor; Paralogy; Xenology 
Rice generally refers to the seeds of the widely cultivated
cereal crop Oryza sativa but fundamentally, it
refers to the plant itself that produces the seeds. There 
are two major ecotypes of Oryza sativa, namely 
indica, adapted to the tropics, and japonica, adapted 
to the temperate regions and tropical uplands. The 
basic differences between these ecotypes can be 
clearly recognized according to the distinct shape of 
the seeds. Indica is characteristically long and slim 
whereas j japonica appears short and round with the 
tropical japonica similar to temperate japonica but 


larger in size. Another species, Oryza glaberrima, is 
being cultivated in Africa but at a rather smaller pro- 
duction scale. Composed of about 20 species, the genus 
Oryza is known to have been originally domesticated 
in the southeastern part of Asia about 5000 years ago. 
Since then, several types of rice cultivars have been 
introduced to adapt to prevailing local culture tech- 
niques and environmental conditions. The successful 
breeding of cultivars under this scenario has, in part, 
contributed to the emergence of this cereal plant as the 
leading staple for about a half of the world population. 

Cultivated rice has 12 pairs of chromosomes and a
total genome size of 430 Mb. Linkage analysis using
morphological traits such as dwarfism or disease resist- 
ance as markers has generated a classical genetic map 
of 12 linkage groups with about 200 traits. The corres- 
pondence of each chromosome with each linkage 
group was established by a combination of genetic 
and cytogenetic analysis. 

Molecular genetic analysis of the rice genome was 
launched in the early 1990s using the principle of 
restriction fragment length polymorphism (RFLP). 
Currently, there are several representative molecular
genetic maps available with a total of about 5000
RFLP markers. A very fine map with 3267 DNA
markers has been generated by the Japanese Rice
Genome Research Program. Simple sequence repeats
(SSRs) are another type of reproducible marker that
can be generated by PCR; so far, about 400 SSR
markers have been established in the rice genome.
Both RFLP and SSR markers are used to map the
chromosomal locations of polymorphic traits or
sequences and so further dissect the rice genome.
For example, several disease resistance genes and genes
involved in transducing hormonal signals have been
identified using these markers. In addition,
cross-hybridization of these rice markers with
molecular markers developed in other cereal crops such
as wheat and maize has revealed various levels of
conservation in the order of DNA markers among the
cereal genomes. Because it has the smallest genome
among the cereals, rice has been adopted as a model
for this conservation of gene order, called synteny,
so that genomic information common to the cereals can
be extracted from the genomic structure already
uncovered in rice.

Reconstruction of the rice genome by assembling
cloned DNA fragments has been accomplished by using
artificial chromosomes from yeast (YAC) or bacteria
(BAC or PAC) as vectors. The resulting assemblage of
DNA fragments is called a physical map and is used for
gene identification following genetic analysis and for
preparation of target fragments for genome sequencing.
Although a complete physical map is not yet available,
a map constructed with YACs covers 65% of the genome,
and several maps with BACs/PACs will soon cover the
entire rice genome, as required for genome sequencing.
The accurate assignment of fragments to their original
positions can be ensured using the genetic markers
described above. Sequencing of the rice genome using
a physical map based on BACs/PACs is expected to be
completed by the end of 2004.

The characterization of rice gene functions is
performed using two strategies: forward genetics and
reverse genetics. In the forward-genetic approach,
mutant phenotypes, whether induced artificially or
arising from spontaneous mutation, must be accurately
tagged by DNA markers in a population that segregates
according to Mendel's laws of inheritance. Once tagging
is successful, the markers are used to pick out the
corresponding DNA fragments on the physical map.
Sequencing these fragments then identifies the
candidate gene, and transformation can finally confirm
its function. So far, several disease resistance genes,
such as Xa1, Xa21 (resistance to bacterial blight), and
Pib (rice blast), as well as agronomically important
genes such as d1 (dwarfism), Spl7 (spotted leaf), and
sh (shattering), have been successfully identified and
characterized. This strategy can also be used to
identify individual genes involved in a phenotype
controlled collectively by multiple genes. For this
purpose, backcrossed nearly isogenic lines for each of
the genes are generated by marker-assisted selection,
and the strategy described above is then applied. Using
this method, photoperiod sensitivity genes involved in
flowering time, such as Hd1, Hd3a, and Hd6, have been
identified.

Gene disruption in rice by inserted elements of known
sequence, such as transposons or T-DNA, can be used for
functional analysis of genes by reverse genetics. Among
these, the endogenous rice retrotransposon Tos17 is the
most advantageous because of its stability, its
frequency of transposition, and the efficiency with
which gene-knockout plants can be produced by cell
culture. There are two ways to identify the disrupted
genes. One is to amplify the sequence flanking the
inserted Tos17 and then compare the amplified product
with many partial sequences of randomly cloned rice
cDNAs (ESTs). This method can generate a database of
Tos17-disrupted expressed genes, which can be used to
evaluate the relationship between a phenotype and the
disrupted gene. Using this approach, disruption of a
cellulose synthase catalytic subunit gene and of the
chlorophyll a oxygenase gene has been found to be
associated with the brittle culm phenotype and
pale-yellow coloring of the leaves, respectively.
Another strategy is to amplify DNA from gene-disrupted
rice plants with primers designed from the sequence of
a target gene and then screen for plants with longer
insert sizes. This approach has led to the
identification of a homeobox gene that controls
internode elongation.


Further Reading 

General information. http://www.iceweb.org/ 

Japanese Rice Genome Research Program. http://rgp.dna.affrc.go.jp/Publicdata.html

Rice DNA materials. http://bank.dna.affrc.go.jp/

Rice germplasm. http://www.cgiar.org/irri

Rice germplasm. http://gene.affrc.go.jp/plant/db/

Rice germplasm. http://www.grs.nig.ac.jp/NIG_rice/rice.html

Rice germplasm. http://www.ars-grin.gov/npgs 

Rice genomics/genetics. http://rgp.dna.affrc.go.jp 

Rice genomics/genetics. http://genome.cornell.edu/rice/ 


See also: Grasses, Synteny, Evolution, and 
Molecular Systematics; Retrotransposons 

Osteogenesis imperfecta (OI) is the collective term for
a group of connective tissue dysplasia syndromes
characterized by liability to fractures throughout life.
Osteoporosis is a primary defect in some, but not all,
affected persons; in many, it results mainly from
immobilization. Associated features present in some
affected individuals but not others include blueness
of the sclerae, presenile hearing loss, dentinogenesis
imperfecta (DI), hypermobility of joints,
hyperextensibility of ligaments, short stature, skeletal
deformity, and cardiovascular complications. Skeletal
deformities such as scoliosis and basilar impression are
regarded as secondary deformities rather than primary
malformations.


Classification 


In the 200 years since Ekman (1788) described a family
with fragile bones (and normal sclerae), there has been
a proliferation of terminology employed to classify
different types of OI (Sillence et al., 1979). In the
International Nomenclature of Constitutional Disorders
of the Skeleton (1997), at least ten OI phenotypes have
been distinguished on the basis of clinical findings,
inheritance patterns, and biochemistry, although a
tight genotype-phenotype correlation has not been
possible (Rimoin et al., 1998) (Table 1).


Table 1 International nomenclature of osteogenesis imperfecta syndromes, 1997

No.  Disease                                          MIM(b)   Inheritance(a)  Chromosome     Gene
1    OI type I, normal teeth                          166200   AD              17q; 7q22.1    COL1A1
2    OI type I, opalescent dentine                    166240   AD              7q22.1         COL1A2
3    OI type II                                       166210   AD              17q; 7q22.1    COL1A1; COL1A2
4    OI type II                                       259400   AR              17q; 7q22.1    COL1A1; COL1A2
5    OI type III (dominant)                           -        AD              17q; 7q22.1    COL1A1; COL1A2
6    OI type III (recessive)                          259420   AR              17q; 7q22.1    COL1A1; COL1A2
7    OI type IV, normal teeth                         166220   AD              7q22.1; 17q    COL1A2; COL1A1
8    OI type IV, opalescent dentine                   -        AD              7q22.1; 17q    COL1A2; COL1A1
9    OI with congenital joint contractures (Bruck)    259450   AR              17p12          TLH1
10   OI with metaphyseal fragility (Cole-Carpenter)   112240   SP              -              -

(a) AD, autosomal dominant; AR, autosomal recessive; SP, inheritance pattern unknown/spontaneous.
(b) Mendelian Inheritance in Man.


Clinical Description of OI Syndromes


In dominantly inherited OI type I, the sclerae are
distinctly blue. In the majority of affected
individuals, multiple fractures occur throughout
childhood and in later life, but there is little
skeletal deformity (Sillence et al., 1979). However,
in people with OI type I who also have dentinogenesis
imperfecta (DI), known as OI type I with DI, there is
an increased frequency of fractures, severe short
stature, and skeletal deformity (Paterson et al.,
1983). Deafness affects over 50% of subjects by the
fifth decade. Otosclerotic-like conductive deafness,
blue-grey sclerae, arcus cornea, and easy bruising
occur in both forms of OI type I.

Dominantly inherited OI type IV, like OI type I, is
characterized by variable fracture frequency but, in
contrast, the sclerae are normal (Paterson et al., 1987).
Deafness rarely occurs, and easy bruising is not a
feature. OI type IV is usually a mild disorder in
families. Occasionally, affected family members are
so severely affected, with short stature and/or
skeletal deformity, as to appear phenotypically
indistinguishable from OI type III. Families with
normal teeth are designated OI type IV with normal
teeth and those with dentinogenesis imperfecta, OI
type IV with DI.


OI type II and OI type III commonly present in the
newborn period. Perinatally lethal forms of OI (type
II) have crumpled (concertina-like) long bones and
multiple fractures resulting in deformity of the legs
and forearms (Sillence et al., 1984). The ribs may show
continuous beading due to fractures or be relatively
spared. Rib morphology is used to differentiate two
prognostic groups (Table 2) (Thompson et al., 1987)
and a third, rare autosomal recessive syndrome
(Sillence et al., 1984). Pulmonary hypoplasia resulting
from decreased fetal chest wall movement in utero
accounts for the virtually 100% mortality in the
perinatal period of babies with continuously beaded
ribs. Infants in group B, who have few rib fractures,
may survive childhood and occasionally reach adult life
as very short and severely disabled adults. The
majority of cases arise as a result of heterozygous
mutations in type I collagen genes. The empiric
recurrence risk reflects germ cell mosaicism in some
parents (Sillence et al., 1984).

OI type III was originally defined as an autosomal
recessive type of OI with normal sclerae and severe
progressive deformity of long bones and spine in
survivors. It is the commonest cause of severe
progressively deforming OI in African/Middle Eastern
populations. Families with recessive inheritance have
also been ascertained from Italy, American Indian
populations, South America, and the Indian
subcontinent. Linkage and biochemical studies have
demonstrated that mutation of type I collagen genes is
not responsible in these families (Wallis et al., 1993).

Table 2 Radiographic subclassification of osteogenesis imperfecta type II

Subgroup  Radiographic features                                     Empiric recurrence risk
A         Crumpled long bones (accordion-like femora);              <1%
          continuously beaded ribs due to numerous fractures
B         Crumpled long bones (accordion-like femora);              6-7%
          normal ribs or few fractures
C         Long, thin, fractured long bones; thin, wavy,             25%
          beaded ribs

Sources: Sillence et al., 1984; Thompson et al., 1987.

Phenotypically indistinguishable patients with
autosomal dominant OI type III are born into families
with OI type IV, and in European populations OI type
III is mainly due to sporadically occurring,
heterozygous new mutations in type I collagen genes
(Byers, 1995).

Bruck syndrome is also an autosomal recessive 
syndrome, which combines bone fragility with 
congenital joint contractures progressing in many 
instances to multiple pterygia (Viljoen et al., 1989; 
McPherson and Clems, 1997). Patients with this 
disorder appear to have a defect in bone-specific 
hydroxylysine crosslinking (Bank et al., 1999). 

The Cole-Carpenter type of OI is characterized by
severe progressive osteopenia with craniosynostosis 
and metaphyseal fragility (Cole and Carpenter, 
1987). This disorder is usually recognized in the first 
year of life because of its craniofacial dysmorphism 
consisting of prominent eyes, brachycephaly, large 
anterior fontanelle, and in some patients hydro- 
cephalus. There is progressive long-bone deformity 
and it is otherwise similar to OI type III. Inheritance 
and pathogenesis are unknown. 


Biochemistry and Molecular Pathology of OI


In over 90% of cases of OI worldwide, mutations 
affecting type I procollagens are believed to be respon- 
sible (Viljoen et al., 1989). Mutations reported in OI 
type II and III have included multi-exon deletions, 
gene rearrangements, and, most commonly, point 
mutations in the triple helical domains of the alpha-1 
(I) and alpha-2 (I) chains. Mutant type I procollagen 
chains are included in triple helices, slowing the fold- 
ing of triple helices and resulting in increased intracel- 
lular degradation (Byers et al., 1991). 

Patients with OI type IV similarly have point
mutations resulting in glycine substitutions,
predominantly in COL1A2 but located more
amino-terminally in the procollagen chain, or
mutations resulting in substitutions at the second or
third position of the obligatory procollagen Gly-X-Y
triplet (Byers et al., 1991; Wenstrup et al., 1990).

Subjects with OI type I have approximately 50% 
reduction in net procollagen synthesis. Analysis shows 
a reduction in type I (alpha-1) procollagen mRNA 
consistent with a null allele for COL1A1 (Willing 
et al., 1993). The majority of families have nucleotide
insertions or deletions, which shift the reading frame
and generate new stop codons, leading to premature
termination of translation (Willing et al., 1993). The
mutant collagens are not translated in vivo. Some
subjects with OI type I with DI have partly excluded 
short mutant procollagen chains resulting from multi- 
exon deletions in COL1A2 (Mundlos et al., 1996). 

Introduction


Outbreeding refers to sexual reproduction through 
fusion of gametes produced by distinct individuals. 
Organisms exhibit a wide array of mechanisms that 
promote outbreeding. Both the maintenance of these
systems over exceedingly long periods of time and their
independent derivation in different lineages suggest
that outbreeding may confer considerable advantages.


The Sexual Syndrome 


Sexual reproduction entails the coordination of a myr- 
iad of physiological and genetic processes, primary 
among which are meiosis, recombination, and gamete 
fusion. Sexual reproduction has the potential to 
generate new combinations of genes, unlike asexual 
reproduction, under which parents and offspring have 
identical genomes, barring mutational changes. Out- 
breeding, the fusion of gametes produced by distinct 
individuals, increases the magnitude of genetic vari- 
ation among offspring. Whether generating more 
diverse offspring, which may possess lower as well as 
higher fitness, confers a net advantage over replicating 
the parental genotype determines in large part the 
evolutionary value of outbreeding. 


Mechanisms of Outbreeding 


Bakers’ Yeast 

Saccharomyces cerevisiae, an ascomycete fungus,
proliferates mainly through asexual reproduction
of diploid cells. Starvation induces meiosis and the
generation of haploid ascospores. Improvement in
nutrient conditions induces the spores to germinate,
producing haploid cells specialized for mating. Mating
occurs only between a-cells, which express the MATa
allele at the MAT locus, and α-cells, which express the
MATα allele. Fusion generates the diploid a/α cell,
which undergoes asexual reproduction until
environmental conditions deteriorate once again. MATa
and MATα encode homeodomain proteins that regulate
the transcription of α-specific, a-specific, and
haploid-specific genes. MATα, expressed in α-cells,
encodes two proteins: α1, which activates transcription
of α-specific genes, and α2, which inhibits
transcription of a-specific genes. In a-cells,
α-specific genes are silent in the absence of their
activator, α1, and a-specific genes are transcribed in
the absence of their inhibitor, α2; MATa encodes the a1
protein, which is inactive by itself. In diploid a/α
cells, α2 inhibits transcription of a-specific genes
and forms a complex with a1, which inhibits expression
of haploid-specific genes (including those required for
mating) and of α1 (thereby preventing the activation of
α-specific genes).
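
The regulatory logic just described can be summarized as a small truth table. The sketch below assumes a deliberately simplified model in which each regulator is either present or absent and acts exactly as stated in the text (real regulation is quantitative and involves additional factors); it derives which classes of genes are on in a, α, and a/α cells. Allele names are spelled out as "a" and "alpha" for convenience.

```python
from __future__ import annotations


def expressed_gene_sets(mat_alleles: set[str]) -> dict[str, bool]:
    """Simplified on/off model of MAT regulation in S. cerevisiae.

    mat_alleles holds the MAT alleles present: {"a"}, {"alpha"}, or {"a", "alpha"}.
    Returns whether each class of target genes is transcribed.
    """
    a1 = "a" in mat_alleles          # MATa encodes a1 (inactive on its own)
    alpha1 = "alpha" in mat_alleles  # MATalpha encodes the activator alpha1 ...
    alpha2 = "alpha" in mat_alleles  # ... and the repressor alpha2
    a1_alpha2 = a1 and alpha2        # the a1-alpha2 complex forms only in a/alpha cells
    if a1_alpha2:
        alpha1 = False               # the complex represses alpha1 itself
    return {
        "a-specific": not alpha2,           # on by default, repressed by alpha2
        "alpha-specific": alpha1,           # require the activator alpha1
        "haploid-specific": not a1_alpha2,  # repressed by a1-alpha2 (includes mating genes)
    }


for cell_type, alleles in [("a", {"a"}), ("alpha", {"alpha"}), ("a/alpha", {"a", "alpha"})]:
    print(cell_type, expressed_gene_sets(alleles))
```

Under this model, a-cells express a-specific and haploid-specific genes, α-cells express α-specific and haploid-specific genes, and a/α diploids express neither cell-type-specific set nor the haploid-specific mating genes, which is why diploids do not mate.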


Other Fungi 

Mating requires differences at two unlinked blocks of
loci in basidiomycete fungi, including Ustilago maydis,
which produces the delicacy huitlacoche (corn smut),
and species of the mushroom genera Coprinus and
Schizophyllum. One block of loci comprises one
or more subsets of genes, with each subset encoding
one or more pheromones and a pheromone receptor.
The other block comprises one or more subsets of
divergently transcribed gene pairs encoding
homeodomain proteins with motifs called HD1 and HD2.
HD1 genes and the yeast gene that encodes the α2
protein are homologous (derived from a common
ancestral gene), as are HD2 genes and the yeast a1 gene.

Mating occurs between haploid cells, with specific 
interactions both between HD1 and HD2 homeodo- 
main proteins and between a pheromone and a pher- 
omone receptor required for compatibility. Only 
proteins encoded by members of the same subset but 
in different haplotypes can interact. In species with 
more than one subset of paired homeodomain genes, 
the different subsets appear to function in a redundant 
manner, with specific interaction between different 
haplotypes within any one pair sufficient for activation. 
Multiple subsets of pheromone/receptor gene pairs 
function in a similar way. 

In U. maydis, two functionally distinct forms (hap- 
lotypes) of the pheromone and pheromone receptor 
region exist; over 25 functionally distinct homeodo- 
main region haplotypes are known. Schizophyllum 
and Coprinus species maintain hundreds of homeo- 
domain haplotypes and about 80 pheromone/receptor 
haplotypes. Free recombination between the two 
blocks of genes in these species permits the formation 
of all possible combinations of haplotypes, giving rise 
to thousands of mating groups or sexes, with mating
possible only between members of groups that differ at
both blocks.
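
To see why free recombination multiplies the number of mating groups, and why many alleles make almost any encounter compatible, here is a minimal sketch. It assumes each haploid can be summarized by one homeodomain haplotype and one pheromone/receptor haplotype (hypothetical labels such as "HD3" and "PR1"), that compatibility requires a difference at both blocks, and that haplotypes at each block are equally frequent. The figure of 25 homeodomain haplotypes for U. maydis is taken as a round number from the "over 25" above.

```python
from itertools import product


def compatible(x: tuple[str, str], y: tuple[str, str]) -> bool:
    """Two haploids can mate only if they differ at BOTH blocks.

    Each haploid is (homeodomain haplotype, pheromone/receptor haplotype).
    """
    (hd_x, pr_x), (hd_y, pr_y) = x, y
    return hd_x != hd_y and pr_x != pr_y


def mating_groups(n_hd: int, n_pr: int) -> list[tuple[str, str]]:
    """All haplotype combinations produced by free recombination between the blocks."""
    hd = [f"HD{i}" for i in range(n_hd)]
    pr = [f"PR{j}" for j in range(n_pr)]
    return list(product(hd, pr))


def compatible_fraction(n_hd: int, n_pr: int) -> float:
    """Fraction of random partners compatible with a given haploid,
    assuming all haplotypes at each block are equally frequent."""
    return (1 - 1 / n_hd) * (1 - 1 / n_pr)


# U. maydis-like numbers: 25 homeodomain and 2 pheromone/receptor haplotypes.
groups = mating_groups(25, 2)
brute = sum(compatible(groups[0], g) for g in groups) / len(groups)
print(len(groups), brute, compatible_fraction(25, 2))
```

The brute-force count over all 50 groups and the closed form agree (about 0.48). With, say, 300 homeodomain and 80 pheromone/receptor haplotypes (illustrative values in the range given for Schizophyllum and Coprinus), there would be 24,000 combinations, and more than 98% of random partners would be compatible.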


Flowering Plants

Although most flowering plants are hermaphroditic, 
with individual plants producing both male and female 
gametes, perhaps half of all species express some form 
of genetically determined incompatibility that pre- 
vents self-fertilization. In the best known systems, 
the genetic factors controlling self-incompatibility 
(SI) segregate as a single locus, the S-locus. 

In heteromorphic systems of SI, different mating 
groups exhibit different floral morphologies, which 
present physical barriers to fertilization by pollen 
produced by members of the same group. In the 
cowslip, for example, individual plants produce only 
flowers in which the stigmatic surface on which 
pollen grains germinate protrudes above the pollen- 
producing anthers (pin form) or only flowers in 
which the relative locations are reversed (thrum 
form). These morphological differences promote pol- 
lination between rather than within flower types. 

In homomorphic systems of SI, in which different
mating groups have similar floral morphologies, the
S-locus encodes mating specificities expressed by
pollen, which cause its rejection by plants that express
the same specificities. Under gametophytic SI (GSI),
pollen grains or tubes express the specificities encoded
in their own haploid genomes, while under sporophytic
SI (SSI), pollen specificities are determined by the
genotype of the plant that produced the pollen.
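
The practical difference between the two patterns shows up in crosses between plants that share one S allele. In the sketch below, S alleles are plain labels (hypothetical names S1, S2, ...), and the sporophytic case assumes that all S alleles are codominant in both pollen and stigma; in real SSI systems, such as that of the cabbage family, dominance relationships among S alleles can change the outcome.

```python
def gsi_accepts(pollen_allele: str, pistil: tuple[str, str]) -> bool:
    """Gametophytic SI: the pollen's own haploid S allele sets its specificity.

    Pollen is rejected if that allele matches either S allele of the pistil.
    """
    return pollen_allele not in pistil


def ssi_accepts(pollen_parent: tuple[str, str], pistil: tuple[str, str]) -> bool:
    """Sporophytic SI, assuming codominance of all S alleles.

    Every pollen grain carries the specificities of BOTH S alleles of the plant
    that produced it, so it is rejected if either one matches the pistil.
    """
    return not set(pollen_parent) & set(pistil)


father, mother = ("S1", "S2"), ("S1", "S3")
gsi = {allele: gsi_accepts(allele, mother) for allele in father}
print("GSI:", gsi)                          # S1 pollen rejected, S2 pollen accepted
print("SSI:", ssi_accepts(father, mother))  # all pollen rejected (shared S1)
```

In this S1S2 × S1S3 cross, GSI lets half of the pollen (the S2 grains) through, whereas codominant SSI blocks all of it, because every grain from the S1S2 plant displays the S1 specificity.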

All forms of homomorphic SI were previously 
thought to have descended from a single origin in 
GSI, which perhaps coincided with, or even facili- 
tated, the rise of the flowering plants themselves, fol- 
lowed by the evolution of SSI and self-compatibility. 
However, the characterization in the mid-1980s of the 
molecular basis of SI in two major groups revealed 
that SSI and GSI derive from entirely different evolu- 
tionary origins. The form of SSI expressed in the 
cabbage family entails the recognition of proteins 
borne on the pollen grain coat by a receptor protein 
kinase which spans membranes of the epidermal cells 
of the stigma. While compatible pollen grains induce 
the stigma to produce hydrating factors necessary for 
germination, hydration is withheld from incompatible 
pollen grains. Under the form of GSI expressed in the 
tomato family, both compatible and incompatible 
pollen grains germinate at the stigmatic surface, with 
rejection mediated in the style by extracellular ribo- 
nucleases that inhibit the growth of incompatible but 
not compatible pollen tubes. 

Comparisons of the genetic and physiological 
mechanisms of SI among various plant families indi- 
cate multiple, independent origins. In poppies, which 
lack a style, the GSI rejection reaction occurs on the 
stigma, with an increase in calcium ion concentration 
in incompatible pollen tubes inducing arrest of pollen 
tube growth. GSI in lilies, which have hollow styles, 
appears to involve distinct physiological processes and 
presumably distinct genetic mechanisms; the multi- 
locus GSI mechanisms of the grasses appear to derive 
from different origins as well. In contrast, GSI in the 
apple family appears to be mediated by the same ribo- 
nuclease system as in the tomato family, even though 
apples are more closely related to cabbages than to 
tomatoes. Species within the sweet potato and the 
sunflower families appear to express a form of SSI 
distinct from that of the cabbage family. This diversity 
of origins of SI among families of flowering plants 
suggests that the existence of just two major segre- 
gation patterns among homomorphic SI systems repre- 
sents a remarkable evolutionary convergence. 


Evolutionary Pressures Maintaining 
Outbreeding 


A parent contributes both genomic complements to 
offspring derived by self-fertilization, but only one 
to offspring derived by outbreeding. Under random
fusion of gametes, the gamete contributed by the other 
parent is randomly sampled from the population. 
However, the expression of mating incompatibilities 
ensures that the gamete from the other parent is less 
similar than random in regions that cosegregate with 
the mating type locus. Mutations within the mating 
type locus region that suppress mating incompatibil- 
ities would appear to enjoy a greater than twofold 
advantage over functional mating type alleles. 

In spite of this enormous disadvantage, mating in- 
compatibilities have persisted over long periods in a 
variety of organisms. Suppression of mating incompat- 
ibilities may induce severe inbreeding depression. En- 
forced heterozygosity of mating type alleles can shelter 
recessive deleterious mutations from expression and 
purging, permitting their accumulation in regions 
closely linked to mating type loci. Upon the suppres- 
sion of mating incompatibilities, such mutations would 
be expressed in offspring carrying mating type alleles 
in homozygous form. The progressive accumulation 
of such deleterious factors may constitute a strong and 
intensifying force serving to maintain outbreeding. 


Conclusions 


Mechanisms that promote outbreeding through genet- 
ically determined recognition of gametes or mates 
affect various aspects of reproduction beyond the 
avoidance of self-fertilization. For example, by influ- 
encing compatibility among close relatives or among 
individuals with similar genotypes, such mechan- 
isms may provide a means of discriminating among 
potential mates, as well as imposing severe restrictions 
on reproduction in small populations. In any case, the 
virtual absence of evolutionarily persistent inbred 
lineages suggests that outbreeding confers significant 
selective advantages. 


See also: Heterosis; Inbreeding Depression; 
Panmixis 

An outcross is a cross between genetically unrelated
organisms. It is contrasted with an intercross (between
two organisms that are identically heterozygous), an
incross (between two organisms that are identically
homozygous), and a backcross (between a homozygous
organism and a second that carries the same allele as
the first but a second allele in addition).


See also: Backcross; Cross; Incross; Intercross 

Overdominance describes a situation in which the
heterozygote at a particular locus has greater size, 
fitness, resistance, or other desirable attribute than 
either homozygote. It is one explanation of heterosis 
or hybrid vigor. 

The weakening effect of inbreeding and the size and
vigor of hybrids have been known since the time of the
Greeks and Romans. The greatest practical use of this
knowledge has been the breeding of hybrid corn. As
early as 1908, G. H. Shull, working at Cold Spring
Harbor, New York, developed inbred lines that were
weak, as expected. But he also discovered that hybrids
between the lines were large and vigorous and that
their yield often exceeded that of the randomly mated
strains from which the inbreds were derived. He
suggested soon afterward that this finding could be
used in practical agriculture, and this has indeed
happened. The practical difficulty of having to obtain
seed from weak, low-yielding inbred plants, which
increased the cost of seed, was circumvented a few
years later by a graduate student, D. F. Jones, who
proposed four-way crosses. In this system, one
produces seed for field planting by crossing two unrelated
hybrids. Thus the seed is produced on high-yielding 
hybrids, and can be produced in abundance. In recent 
years, the yield of inbreds has been greatly increased 
by selection so that currently commercial corn seed is 
produced from single crosses. There is a great saving in 
the number of crosses required for adequate testing 
and a generation is saved; furthermore, the single cros- 
ses outperform double crosses and are more uniform. 

Originally, Shull thought the high yield of hybrids 
was due to stimulation of unlike germ plasms, but this 
soon came to be interpreted to mean that heterozygotes 
were superior to homozygotes. This was later called 
overdominance and this explanation of heterosis has 
come to be known as the overdominance hypothesis. 
This is sometimes erroneously called heterosis, but the 
word should be restricted, as Shull urged, to serve as a 
simple descriptive synonym for hybrid vigor. 

The alternative is the dominance hypothesis. This
depends on the observed fact that most harmful
mutations in a population are recessive. Inbreeding
increases homozygosity, thereby increasing the
proportion of deleterious recessive homozygotes and
leading to decreased size and vigor. When two inbred
lines are crossed, the recessives from each of them are
concealed by dominants from the other. The vigor of
the original noninbred population is restored, and
if some deleterious recessives were eliminated during
inbreeding, the hybrid performance exceeds that of
the original population. The two hypotheses make
very similar predictions. In either view, inbreeding
leads to deterioration and crossing two inbred
lines leads to immediate recovery. The hypotheses
are not mutually exclusive, but preference for one
or the other has vacillated over the years.
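
A toy calculation shows why the two hypotheses are hard to tell apart by inbreeding and crossing alone. The sketch assumes two fully inbred lines, unlinked loci contributing additively, uppercase letters for favorable alleles and lowercase for deleterious recessives, and arbitrary per-locus scores (1 for the favored genotype, 0.5 for homozygotes under overdominance); none of these values come from data.

```python
def vigor(genotype: list[str], model: str) -> float:
    """Toy vigor score summed over loci; each locus is a two-letter genotype such as "Aa".

    Uppercase = favorable dominant allele; lowercase = deleterious recessive.
    Per-locus scores are arbitrary illustrative values.
    """
    score = 0.0
    for locus in genotype:
        n_upper = sum(allele.isupper() for allele in locus)
        if model == "dominance":
            # Complete dominance: one favorable allele masks the recessive.
            score += 1.0 if n_upper >= 1 else 0.0
        elif model == "overdominance":
            # Heterozygote superior to both homozygotes.
            score += 1.0 if n_upper == 1 else 0.5
        else:
            raise ValueError(f"unknown model: {model}")
    return score


def f1(line_1: list[str], line_2: list[str]) -> list[str]:
    """F1 of two fully homozygous lines: one allele from each parent at every locus."""
    return [g1[0] + g2[0] for g1, g2 in zip(line_1, line_2)]


line_1 = ["AA", "bb", "CC", "dd"]
line_2 = ["aa", "BB", "cc", "DD"]
hybrid = f1(line_1, line_2)  # ["Aa", "bB", "Cc", "dD"]

for model in ("dominance", "overdominance"):
    print(model, vigor(line_1, model), vigor(line_2, model), vigor(hybrid, model))
```

Under these arbitrary values each inbred line scores 2.0 and the F1 scores 4.0 under both models, so the hybrid's superiority alone cannot discriminate between them.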

The overdominance hypothesis was immediately 
criticized for the absence of convincing examples of 
gene loci at which the heterozygote was superior to 
either homozygote. The dominance hypothesis relied 
on the common observation of a correlation between 
recessiveness and deleterious effect. But it too was
criticized on two grounds: first, selected inbred lines
should be as vigorous as hybrids, and they are not;
second, there should be a skewed distribution in F3
populations according to the expansion of
$(3/4 + 1/4)^n$, where n is the number of loci. This
was not observed either. These doubts were dispelled
when it was pointed out that with a large number of
factors, especially with the inevitable linkage, it
would be improbable in the extreme to get all the
favorable alleles in one strain, so it is not surprising
that high-performing inbreds were not found. Likewise,
with a large number of factors the skewness disappears.
So from the 1920s until the mid-1940s the dominance
hypothesis prevailed.
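
The claim that skewness fades as the number of factors grows can be checked directly from the expansion. A minimal sketch, assuming n unlinked loci that each show the dominant phenotype with probability 3/4 (the terms of $(3/4 + 1/4)^n$), computes the skewness of the number of loci showing the dominant phenotype from the full distribution; for this binomial distribution the closed form is $-2/\sqrt{3n}$.

```python
from math import comb, sqrt


def dominant_count_distribution(n: int, p: float = 0.75) -> list[float]:
    """Terms of the expansion (p + (1 - p))^n: P(k of n loci show the dominant phenotype)."""
    return [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]


def skewness(dist: list[float]) -> float:
    """Standardized third central moment of a distribution over k = 0..n."""
    mean = sum(k * pk for k, pk in enumerate(dist))
    var = sum((k - mean) ** 2 * pk for k, pk in enumerate(dist))
    third = sum((k - mean) ** 3 * pk for k, pk in enumerate(dist))
    return third / var**1.5


for n in (1, 5, 25, 100):
    s = skewness(dominant_count_distribution(n))
    print(n, round(s, 3), round(-2 / sqrt(3 * n), 3))
```

The skewness falls in magnitude from about -1.15 for a single locus to about -0.12 for 100 loci, so with many factors the distribution becomes nearly symmetric, as the argument above requires.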

In 1945 Fred Hull resurrected the overdominance 
hypothesis. His main reason was the failure of rigor- 
ous selection to improve randomly mated popula- 
tions, but there were other arguments. Particularly 
convincing were the results of breeding systems 
devised to measure dominance. For yield in maize 
the estimated values were clearly in the overdomin- 
ance range. Overdominance became a widely held 
view from about 1950 to 1960 and this was the pre- 
vailing Zeitgeist at a conference on heterosis held at 
Iowa State College in 1950 (Gowen, 1952). 

By the middle and late 1950s, other breeding 
designs began to point toward dominance. In particu- 
lar, the later generations of the experiments designed 
to show overdominance failed to repeat the findings of 
the early generations. The early apparent overdomin- 
ance was the result of linkages between favorable 
dominants and deleterious recessives, called pseudo- 
overdominance. By 1960 the overdominance hypothe- 
sis had been largely abandoned. 

At present there is little or no evidence for
overdominance. Inbred lines selected for performance
yield as much as hybrids did a few years ago. Analysis 
of variance shows that most of the variance for yield is 
additive and the dominance component offers no evi- 
dence for overdominance. This is not to say that there 
are no overdominant loci. Examples have been discov- 
ered in several species, but they are rare. It is possible 
that the very best hybrids get a small additional boost 
from a small number of overdominant loci, but at the 
present time this is conjectural. 


Further Reading 
Crow JF (2000) The rise and fall of overdominance. Plant Breeding Reviews 17: 225-257.
Gowen JW (1952) Heterosis. Ames: Iowa State College Press.


See also: Heterosis 
Overwinding of DNA is caused by positive supercoiling. This applies further tension in the direction
of winding of the two strands about each other in the 
duplex. 


See also: DNA Supercoiling; Negative 
Supercoiling 
Ovulation is a process that occurs in all female mammals. A female animal is born with a very large number of immature oocytes or eggs (from hundreds of thousands in mice to millions in humans) within her two ovaries. When the animal or person reaches sexual maturity, a point in mammalian development referred to as puberty, she begins a developmental process called the estrus cycle. The estrus cycle can be as short as 4 days (in mice) or as long as 28 days (in humans). At the beginning of the estrus cycle, one or a few oocytes are induced to mature into eggs capable of being
fertilized. When the maturation process is complete, 
the egg or eggs are released from the ovary in a process 
called ovulation. The released eggs enter the oviduct 
(also called a fallopian tube in humans) where they can 
be fertilized by sperm. If fertilization does not occur, 
they pass through the uterus and vagina, out of the 
female reproductive tract, and a new estrus cycle 
begins.


See also: Oogenesis, Mouse 
Transposable elements are specialized segments of
DNA with the ability to jump from place to place 
within the genome. Most species contain many 
families of transposable elements, and each family 
usually exists in many locations scattered about the 
chromosomes. P elements are one such family. They 
are common in many species of Drosophila, and have 
been studied primarily in Drosophila melanogaster. 

P elements are especially notable for two reasons. 
First, they impose a remarkable population structure 
on D. melanogaster, dividing the species into two 
groups called P or M depending on whether they 
have or lack P elements in their genomes. P strains 
typically have tens of copies of the elements dis- 
persed at random locations throughout the genome, 
whereas M strains have none. Crosses between M 
females and P males produce offspring whose germ 
cells have a high level of P mobilization resulting in a 
syndrome called ‘hybrid dysgenesis,’ which includes 
temperature-dependent sterility, elevated mutation
rates, and premeiotic recombination. 

The second reason for interest in P elements is their 
usefulness as tools for studying the genetics of Dros- 
ophila. This usefulness is a direct consequence of the 
P/M dichotomy; researchers can introduce modified 
P elements into M strain genomes where there are no 
other copies to produce the gene products involved in 
transposition and regulation of P mobility. Thus, all 
aspects of the element’s behavior are under the control 
of the researcher. This property has resulted in the 
development of a wide variety of technological appli- 
cations for Drosophila research. 


P Element Structure and Function 


Complete P elements are 2907 bp in length with a 
structure as shown in Figure 1. They encode one
gene, a transposase required in trans for mobility. 
The termini are perfect 31-bp inverted repeats which 
are needed in cis for transposition. Internal to the 
repeats are transposase binding sites which are also 
required in cis. 

Since the transposase gene is required only in trans, 
a P element in which this gene is missing or defective 
can still transpose provided there is another source of 
transposase elsewhere in the genome. Such elements 
are said to be nonautonomous. Indeed, nonautono- 
mous elements are common within P strains, and 
usually make up the majority of copies. Naturally 
occurring nonautonomous P elements are smaller 
than complete ones, and contain deletions remov- 
ing all or part of the transposase gene, but leaving 
the terminal repeats and transposase-binding regions 
intact. Other nonautonomous P elements can be con- 
structed artificially by replacing the transposase gene 
with any other sequence. These constructs can be 
much larger than complete P elements. 


Figure 1 Structure of a complete P element (not to scale): 31-bp inverted repeat, transposase-binding region, transposase gene, transposase-binding region, 31-bp inverted repeat. Total element length is 2907 bp.
The transposase gene is interrupted by three introns. Splicing of one of the three introns is blocked by a gene product which is made only in Drosophila somatic cells. The failure of complete splicing in somatic cells means that P elements are normally mobile only in the germline. However, if a P element is modified to remove the intron whose splicing is restricted to the germline, then transposition can occur in somatic cells as well. Such a modified P element is known as a Δ2-3 element.

Within the germline, P elements are self-regulated. 
There are probably multiple mechanisms involved, 
and the details are not well understood, but certain 
P elements have been shown to produce a trans-acting 
repressor product. Some of these elements have dele- 
tions in the 3’ end of the transposase gene, whereas 
others are complete. The repression properties of the 
latter type depend on their genomic position, especially on proximity to a telomere.

The result of these regulation mechanisms is that 
P elements are not mobile within P strains. It is only in 
the hybrids between P and M strains that high levels of 
mobility occur. Furthermore, this hybrid dysgenesis is 
nonreciprocal. That is, crosses between M females and 
P males yield dysgenic hybrids, but crosses between 
P females and M males do not. It is thought that the 
eggs produced by P females contain regulatory factors 
that prevent mobilization, but the M-produced eggs 
have no such factors. However, the inheritance of this
regulation is a complex matter, violating the usual 
rules of either maternal effect or maternal inheritance. 

The effect of hybrid dysgenesis can be observed 
directly in the hybrids themselves as temperature- 
dependent sterility. The sterile flies have normal 
somatic parts, but they have few or no germ cells 
(Figure 2). In the presence of a Δ2-3 element, in
which transposase is produced somatically as well as 
in the germline, the dysgenic flies usually die by the 
pupal stage if grown at elevated temperatures. 

When dysgenic flies are grown at lower tempera- 
tures, the effects of P mobility are not seen until the 
next generation. When fertile dysgenic hybrids repro- 
duce, their offspring carry mutations, chromosome re- 
arrangements, recombinant chromosomes, and other 
abnormalities in greatly increased frequencies com- 
pared to their cousins from the reciprocal cross. 
Most of these events occurred in the premeiotic germ 
cells of their dysgenic hybrid parents. 

Transposition of P elements occurs by a ‘cut and 
paste’ process. That is, a copy of the P element is ex- 
cised from one position in the genome and reinserted 
somewhere else. The excision occurs via a staggered 
cut leaving a 17-bp 3’ overhang. One end of the 
staggered cut occurs flush at the junction of 
the element with the host DNA, and the other end is 17 bp within the element. Reinsertion can occur anywhere in the genome, but some sites are more likely than others. The insertion process results in a direct duplication of 8 bp of host DNA, so that one copy flanks each side of the insertion site.

Figure 2 Hybrid dysgenesis. The cross on the left produces dysgenic hybrids. If raised at elevated temperatures, these hybrids have a high frequency of sterility owing to failure of the germ cells to develop. Ovaries of the sterile and fertile females are shown. In both cases all tissues derived from somatic cells are normal, but the germline-derived cells are missing in the sterile females. Male gonads are not shown, but are analogous.

Figure 3 P element transposition. (A) The P element on one of two sister chromatids transposes to a site on a different chromosome. (B) The original site now contains a double-strand break. The 17-bp 3’ overhang from the staggered cut is indicated. In some cases, this gap can expand into the flanking sequences owing to exonuclease activity. (C) The sister chromatid is used as a template to restore a P element copy into the gap. The net result is an increase of one P element copy.

The excision of a P element leaves a double-stranded DNA break which must be repaired for survival of the cell. This repair can occur by any of several mechanisms, but the most common process involves copying information from the sister chromatid into the gap, as shown in Figure 3. The presence of 17 bp of P element sequence at the ends of the break might facilitate the choice of the sister chromatid as a template rather than the homolog. The upshot is that a new P element copy is synthesized to replace the one that has just transposed to another location. Therefore, the copy number of P elements in the genome increases.
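The 8-bp direct duplication of host DNA at an insertion site, described above, can be illustrated with a minimal Python sketch. It treats DNA as a single string and assumes that the duplicated bases are the 8 host bases starting at a hypothetical insertion coordinate `site`; the function name `insert_with_tsd` and the placeholder element text `[P]` are illustrative only, and the sketch does not model the staggered-cut chemistry itself.

```python
def insert_with_tsd(host: str, site: int, element: str, tsd_len: int = 8) -> str:
    """Return host DNA with `element` inserted at `site`, flanked on both sides by
    a direct duplication of the tsd_len host bases starting at `site`
    (hypothetical string model of a target-site duplication)."""
    if not 0 <= site <= len(host) - tsd_len:
        raise ValueError("insertion site must leave room for the duplicated target bases")
    target = host[site:site + tsd_len]  # host bases that end up on both sides of the element
    return host[:site] + target + element + target + host[site + tsd_len:]


host = "AAAACCCCGGGGTTTT"
print(insert_with_tsd(host, 4, "[P]"))  # AAAACCCCGGGG[P]CCCCGGGGTTTT
```

The target bases CCCCGGGG appear once in the original host and twice after insertion, once on each side of the element, which is what "direct duplication" means here.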


P Element Population Biology 


P elements are ubiquitous in the genome of some 
Drosophila species, such as those of the willistoni 
and saltans groups. Many P elements in these species 
have multiple base substitutions and rearrangements, 
suggesting that they are ‘dead’ elements, i.e., evolu- 
tionary remnants. Note that ‘dead’ transposable ele- 
ments are different from nonautonomous elements in 
that the latter have only one or a small number of 
changes, and retain intact cis-acting parts. Other Dros- 
ophila species are devoid of any P element sequences. 
Examples include D. simulans and D. mauritiana, 
both sibling species of D. melanogaster. 

The existence of both P and M strains within 
D. melanogaster makes that species unique among 
those that have been studied. All known M strains 
are derived from laboratory strains established during 
the first few decades of the twentieth century, whereas 
all natural populations studied are P. Furthermore, 
‘dead’ P elements have not been found in D. melano- 
gaster. These observations suggest that P elements are 
relative newcomers in the D. melanogaster genome. 
Conclusive evidence for this interpretation came 
from a comparison of the DNA sequences of com- 
plete P elements in D. melanogaster versus that of 
complete P elements in D. willistoni. The two 
sequences differ by only one base pair. Such close 
conservation is unheard of between species that have 
diverged for about 60 million years, and implies that P 
elements have somehow moved ‘horizontally,’ i.e., by
nonhereditary means, from D. willistoni to D. melano- 
gaster very recently. 

The willistoni and saltans species groups are endemic 
to Latin America and some parts of Florida, whereas 
D. melanogaster evolved in Africa and only colonized 
the New World via human activity in relatively recent 
times, probably the early 1800s. Therefore, the oppor- 
tunity for horizontal transfer between the two species 
has only existed for less than 200 years. It is not known 
exactly when or how the horizontal transfer happened. 
A reasonable guess is that a biological vector, such as a 
virus or parasitic mite, could have facilitated moving 
bits of DNA between species, but no such event has 
ever been observed directly. 

The rapid spread of P elements once they entered 
the species is easier to explain, and can even be repro- 
duced on a smaller scale in the laboratory. As men- 
tioned above, transposition usually results in an 
increase in the copy number (Figure 3), implying 
that P elements can spread through a population 
with no help from natural selection. Indeed, P elem- 
ents are actually harmful to their hosts in various 
ways, especially during the spreading phase when 
hybrid dysgenesis occurs. In addition, some insertions 
cause harmful mutations by landing within genes. After 
the spread is complete, and a population becomes a 
P strain, transposition activity is greatly decreased, 
and the harmful aspects of P elements are minimal. 
They become just another family of transposable 
elements with relatively rare transposition. 
Technical Applications


P elements are versatile tools for geneticists interested 
in studying or manipulating the Drosophila genome. 
In particular, P transformation is the primary method 
for placing genetic material into the germline. P elem- 
ents also provide several ways for generating muta- 
tions and other genomic changes. 


Separating Transposase from cis-Acting Components
The key to the usefulness of P elements is the ability to 
work with M strains where the only P elements are 
those placed there by design. Researchers can then 
separate the gene encoding transposase from the ter- 
minal repeats and transposase-binding sites required 
in cis for transposition. Thus, transposition events 
occur only at the desired time, and not in previous or 
subsequent generations. Complete P elements, which 
contain both the transposase and the cis components, 
are not used, nor are P strains. 

In a typical application, transposase is supplied by a P element fragment lacking one or more of the cis components needed for mobility. For example, a common transposase source is a Δ2-3 element which had
inserted on chromosome 3 and subsequently lost its cis 
components on the 5’ end. This element is a powerful 
source of transposase both somatically and in the 
germline, but is almost completely immobile. 
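The division of labor described in this section, with cis-acting ends on the element that moves and transposase supplied in trans from elsewhere, can be expressed as a small Boolean model. This is a hypothetical Python sketch rather than an established tool: the class `PElement`, its fields, and the functions `transposase_supplied` and `can_transpose` are invented names, and the model assumes an M-strain background with no repressor present, so P-cytotype regulation is ignored.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PElement:
    """Hypothetical toy description of one P element copy in an M-strain genome."""
    name: str
    intact_ends: bool          # cis: 31-bp inverted repeats and transposase-binding sites present
    encodes_transposase: bool  # trans: functional transposase gene present
    delta_2_3: bool = False    # germline-restricted intron removed, so transposase is also made in soma


def transposase_supplied(genome: list[PElement], somatic: bool) -> bool:
    """True if any copy in the genome makes transposase in the given tissue."""
    return any(e.encodes_transposase and (e.delta_2_3 or not somatic) for e in genome)


def can_transpose(element: PElement, genome: list[PElement], somatic: bool = False) -> bool:
    """An element can move only if its own ends are intact (cis) and some copy,
    possibly a different one, supplies transposase (trans)."""
    return element.intact_ends and transposase_supplied(genome, somatic)


construct = PElement("reporter construct", intact_ends=True, encodes_transposase=False)
jumpstarter = PElement("Δ2-3 source", intact_ends=False, encodes_transposase=True, delta_2_3=True)
genome = [construct, jumpstarter]

print(can_transpose(construct, genome))       # True: ends intact, transposase in trans
print(can_transpose(jumpstarter, genome))     # False: makes transposase but lacks cis ends
print(can_transpose(construct, [construct]))  # False: transposase source has segregated away
```

The last call mirrors the practice of recovering a new insertion in a fly from which the transposase source has segregated away, so that the insertion stays put in later generations.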

Nonautonomous P elements used in applications 
can be any of a wide variety of constructs in which the 
transposase gene has been replaced by other useful 
sequences such as reporter genes, specialized en- 
hancers, components of the FLP recombination 
system or the target sequences of endonucleases. 
These modified P elements can be as much as 40 kb 
in length and still transpose. However, smaller P 
elements usually jump at higher rates. 


Using P Elements for Germline Transformation
The ability to place in vitro modified sequences into 
the Drosophila germline is essential for much of Dros- 
ophila research. Transformation is accomplished by 
placing the sequence of interest into a nonautonomous 
P element and injecting it into preblastoderm embryos. 
Transposase can be supplied by coinjecting either 
purified protein or the transposase gene. Alternatively, 
one can inject the nonautonomous P elements into 
embryos carrying an endogenous transposase gene, 
such as the immobile Δ2-3 element described above.
Integration of the injected element occurs in the pre- 
meiotic germ cells. 

The injected embryos then develop into adults which 
are outcrossed to produce transformant progeny. 
Typically, 10-20% of the injection recipients will 
have at least one offspring with the construct. Fre- 
quently, the transformants appear in clusters which 
result from premeiotic integration into germ cells of 
the injected fly. 

Integration of the injected elements occurs by a 
process similar to ordinary P element transposition. 
The insertion site can be anywhere in the genome, but 
some sites have higher likelihood than others. Once a 
transformed line has been established, the injected element is inherited stably unless transposase is supplied.


Using P Elements for Mutagenesis 

When a P element inserts into a gene, it usually causes
a mutation. Therefore, one way to generate mutations 
in a given gene is to mobilize one or more nonauto- 
nomous P elements and screen the next generation 
for the expected phenotypic change. Ideally, the new 
mutation is recovered in a fly in which the transposase 
source has segregated away, thus ensuring the stability 
of the new mutation. This procedure is known as 
‘transposon tagging.’ Mutations generated this way 
are particularly useful because they allow researchers 
to identify the sequence and genomic location of a 
gene known previously only by its phenotype. 

Frequently, genes are first identified by their DNA 
sequence rather than their phenotype. Finding muta- 
tions in such genes is called ‘reverse genetics.’ The 
availability of the complete Drosophila genomic se- 
quence has made identifying genes by their sequence 
a much more common occurrence, thus greatly elevat- 
ing the importance of reverse genetics. P elements can 
be used for reverse genetics in several ways. One ap- 
proach is to generate a large number of flies, each with 
one or more new P element insertions at unknown 
sites, and then use the polymerase chain reaction 
(PCR) to identify those that lie in or near the gene of 
interest. 

A second way to do reverse genetics in Drosophila 
is to make use of a P element close to the gene of 
interest. The Drosophila Genome Project has gener- 
ated a large collection of stocks, each carrying a single 
P element in a different genomic position. For most 
genes, it is possible to obtain a stock with a P insertion 
less than 100 kb away, and often within 10 kb. 

There are three ways to make use of a P insertion to 
yield mutations in a targeted gene nearby. First, gene 
replacement can generate very specific changes by a 
process similar to that shown in Figure 3 except that 
the template for double-strand break repair is a modi- 
fied version of the targeted gene. This modified ver- 
sion is present as a transgene or as an injected plasmid. 
The P element excises to create a gap which frequently 
expands to include considerable flanking DNA. Fill- 
ing this gap with information from the modified ver- 
sion results in gene replacement. 


Second, some P elements have been observed to 
engage in ‘local jumping’ in which transposition to 
nearby genomic sites occurs with elevated frequency. 
Screens can then be used to detect transposition from 
the original site to the nearby gene. 

Finally, mobilization of a P element can cause dele- 
tions of flanking sequences, thus knocking out the 
targeted gene. In particular, a type of aberrant transpos- 
ition called ‘hybrid element insertion’ (HEI) results in 
crossing-over at the site of a mobile P element and 
simultaneous generation of a duplication or deletion 
flanking the element. HEI occurs when the left end of 
a P element joins with the right end of its twin on the 
sister chromatid to form a hybrid element. Insertion of 
this hybrid element into a site on the homolog results 
in a pair of recombinant chromosomes, one of which 
has a duplication and the other a deletion. Such events 
are readily detected as recombinants in the male germline, where meiotic crossing-over does not occur.


Other Techniques Involving P Elements 
P elements have been useful research tools in other 
ways as well. For example, specially modified P elem- 
ents can be used to determine the expression pattern of 
nearby genes. These ‘enhancer traps’ carry a reporter 
gene (i.e., a gene whose expression is easily monitored) 
whose promoter responds to genomic regulatory 
elements (‘enhancers’) close to its insertion site. By 
examining the expression pattern of a collection of 
such insertions, researchers can identify genes with a 
particular time and place of expression during devel- 
opment. 

The study of DNA repair has also been aided by 
P elements. As mentioned above, P element excision 
results in a double-strand DNA break which is then 
repaired by one of several mechanisms. The products 
of these repair events are easily recovered to provide 
information on the repair process. 


Conclusions 


When T. H. Morgan and his students selected Droso- 
phila melanogaster as their research organism, P elem- 
ents had probably already begun to spread through 
the species, but had not yet reached the populations 
where the specimens were collected. The timing of 
these events could not have been more fortuitous! As 
a result, Drosophila researchers now enjoy the avail- 
ability of both P and M strains, and were able to use 
the difference between these strains as a means of 
unraveling the surprising process of horizontal trans- 
fer and invasion of a transposable element into a for- 
eign genome. They were also able to use P elements to 
develop a dazzling array of tools for studying and 
manipulating the Drosophila genome. 
T.S. Painter (1889-1969) was a pioneer in animal cytogenetics. He was the first to attempt serious studies of mammalian chromosomes; he clearly established the XX-XY sex-determination mechanism in mammals, including humans; and he was the first to demonstrate the relationship between a genetic defect and a chromosome deletion. He was also the first to demonstrate the relationship between X-ray-induced genetic changes and chromosome rearrangements. He developed the standard technique for the study of salivary gland chromosomes in Diptera and was the first to recognize their true nature. He established the first chromosome maps of the salivary gland chromosomes of Drosophila melanogaster and was able to demonstrate relationships between physical and genetic distances.

Painter was born in Salem, Virginia in 1889, son of 
Franklin V.N. Painter and Laura T. Schickel Painter. 
He was educated at Roanoke College and Yale Uni- 
versity where he obtained his PhD in 1913. His PhD 
thesis explored the process of spermatogenesis in 
spiders (Painter, 1914). Following his PhD he spent a 
year of postdoctoral study in Europe, partly in the 
laboratory of Theodor Boveri in Würzburg and partly 
at the famed Marine Zoological Station at Naples. 
Although Painter himself wrote little about his stay in Boveri’s laboratory, it seems likely that he came away with a deep interest in the chromosome theory of heredity. His experience in Naples led to an
interest in the forces involved in the cleavage of the 
fertilized egg into a multiplicity of cells following 
repeated mitotic divisions. On his return to the United 
States he was appointed as instructor in zoology at 
Yale and also taught marine invertebrate zoology at the 
Woods Hole Laboratory. There he met his future wife 
Mary Anna Thomas, a student in his course, whom he 
married in 1917. There he also met John T. Patterson, 
the head of the Zoology Department at the University 
of Texas at Austin, who offered Painter an academic 
position in the institution where he would spend the 
rest of his career. 

Painter’s interest in chromosome cytology and 
cytogenetics turned initially to the problem of the 
number of mammalian chromosomes and the nature 
of sex determination. Because of their small size and large number, little was known about mammalian chromosomes at that time. Furthermore, the techniques available involved fixation in complex fixatives,
embedding in paraffin blocks, sectioning and staining, 
followed by laborious microscopic examination. 

Recognizing the importance of rapid fixation following dissection, Painter invented a multibladed knife to chop up the spermatogenic tubules of the testis immediately after removal to ensure rapid penetration of the fixative. He first demonstrated that sex in the opossum (Didelphys virginiana) was determined by an XX-XY mechanism and was the first to show segregation of the X and Y chromosomes at meiosis I in the male (Painter, 1922). It was fortuitous that Painter chose the opossum, which has relatively few, large chromosomes, for his first venture into mammalian cytogenetics, since this perhaps encouraged him to study the smaller and more complex chromosomes of eutherian mammals. Again it seems
that a fortuitous circumstance led him immediately to 
the study of human chromosomes. Through a former 
premedical student who was practicing medicine in a 
state institution in Austin where, “for therapeutic reasons,” Painter wrote, “they occasionally castrated male individuals,” he was able to obtain and preserve “within thirty seconds or less after the blood supply was cut off, a human testis” (Painter, 1971). His initial
paper (Painter, 1921) reported that in spermatogonial 
mitoses “the counts range from 45 to 48 apparent 
chromosomes, although in the clearest equatorial 
plates so far studied only 46 chromosomes have been 
found.” He suggested that the human diploid number 
was either 46 or 48. In a more detailed paper (Painter, 
1923) he concluded that the correct diploid number of 
humans was 48, and confirmed the presence of an XX-XY sex-determining mechanism. Painter’s work finally settled the controversy, raging since the work of von Winiwarter, by showing that human males carry a small Y chromosome. The question of the correct number
of chromosomes in humans was less certain, with 
numbers ranging from 16 to 47 in the literature. 
Painter’s observation of 48 chromosomes and the quality of his preparations seemed to settle the matter, and the figure of 48 remained in the textbooks until 1956. Using modern techniques, the correct diploid number of humans
was then demonstrated to be 46 (Ford and Hamerton, 
1956; Tjio and Levan, 1956). In the light of modern 
knowledge and the fact that Painter’s counts were 
based on testes from three mentally defective males, 
Chu (1960) has suggested that these individuals might 
have been aneuploid. The quality of Painter’s preparations was superb and the presence of a Y chromosome was clearly established; as stated by Ford and Hamerton (1956):


The crux lies no longer in the microscopy but in the preparative technique. The weary hours of toil which the pioneers must have spent at the microscope is reflected in Von Winiwarter’s cri de coeur “J’ai perdu un temps énorme à répéter des numérations fatigantes et j’avoue aussi très fastidieuses.” [“I have lost an enormous amount of time repeating counts that were tiring and, I confess, also very tedious.”] The wonder is that there is so little to alter.


As Glass (1990) has pointed out, Painter’s error in no way diminishes his major discovery of the XX-XY sex-determining mechanism in mammals. Painter went on to establish the chromosome number of several marsupial and eutherian mammals and found an XY pair in all species. He noted that in general placental mammals have a higher chromosome number than marsupials, that with few exceptions all have an XX-XY sex-determining mechanism, and that the Y chromosome is smaller than the X. In these studies (Painter, 1924a, b) he set out four clear principles required to establish the karyotype of any species:


1. the morphology of the diploid chromosome complex and chromosome number of the male;

2. the haploid number as revealed in the secondary spermatocytes;

3. the morphology and behavior of the sex chromosomes during meiosis; and

4. the chromosome number and morphology in the female.


Whilst Painter’s studies of mammalian chromosomes 
placed him clearly in the forefront of mammalian 
cytogeneticists, his next project started him on the 
road to classical genetics and the beginnings of gene 
and chromosome mapping. Painter’s observations on 
the chromosomes of the Japanese waltzing mouse 
(Painter, 1927) showed that they carried the normal 
complement of 40 chromosomes but with two hetero- 
morphic pairs, the XY pair and a pair of autosomes in 
which one homolog is much smaller than the other, 
thus confirming a postulated chromosome deletion 
leading to the expression of the phenotype. This 
appears to have been the first cytological identifica- 
tion of a chromosome deletion producing a specific 
genetic effect (Glass, 1990). Painter’s collaboration 
with H.J. Muller led to parallel investigations of 
genetic and cytological effects of X-ray induced 
translocations and deletions in Drosophila (Muller 
and Painter, 1929). Perhaps Painter’s most notable
discovery in cytogenetics was the identification of 
the salivary gland chromosomes of all Diptera as 
being closely paired homologous chromosomes, and 
the introduction of a new cytological method for sal- 
ivary gland preparations, the acetocarmine squash 
method (Painter, 1933). Together with Wilson Stone 
he went on to develop the first cytological map of 
the X chromosome of Drosophila melanogaster using 
stocks containing deletions of short portions of the 
gene sequence (Painter, 1934a, b). He was able to demonstrate for the first time the reciprocal nature of translocations, to show that the precise point of breakage in deletions could be determined at the level of individual crossbands, and, in heterozygous inversions, to observe the formation of the inversion loop. These studies demonstrated that as
predicted by the genetic data “the attraction between 
homologous chromosomes is point by point, locus by 
locus, band by band and not a synapsis caused in some 
vague way by chromosomes as entire units” (Glass, 
1990). Following these fundamental discoveries,
Painter’s research moved to a study of the nature and 
function of the heterochromatin. 

In 1944, Painter became President of the University 
of Texas, a position in which he remained until 1952. 
His tenure was marked by the application of Heman 
Marion Sweatt, an African-American, for admission to 
the University of Texas School of Law. This application 
was rejected, the University of Texas being a segregated institution at that time. Sweatt, with NAACP support, sued, naming Painter as respondent. The United States Supreme Court, in a landmark decision, ordered the integration of the University of Texas School of Law and the University’s graduate school (Sweatt v. Painter Archives, University of Texas).

From 1952 until his death in 1969, Painter remained active in research, with a growing interest in developmental processes: how the hereditary material passed down from one generation to the next mediates its conversion into a multiplicity of end products in different tissues. These are questions that, with the advent of molecular biology and functional genomics, we are beginning to come to grips with today. In my view, Painter’s achievements mark him as one of the great biologists of the twentieth century. As he wrote in his final paper, published posthumously in 1971:


From my experience I think that you should first select and define some broad biological problems, select a suitable material upon which to work and use any available techniques for the solution of your problem. The most important thing is for you to have a biological and not a test-tube approach (Painter, 1971).


His own work exemplified his ability to identify the 
problem, find the right material, and develop the 
necessary techniques. 
A palindrome is a DNA sequence which when read
from one direction is the same sequence as when 
read from the opposite direction on the complementary 
strand. For example, many restriction endonucleases 
recognize palindromic sequences. The restriction 
endonuclease EcoRI recognizes and cleaves the palindromic sequence GAATTC.

Top strand:    5’-GAATTC-3’
Bottom strand: 3’-CTTAAG-5’


The complementary or bottom strand of this sequence 
would also read GAATTC when read right to left. A 
two-base sequence can be a palindrome: for example, 
the sequence AT is a palindrome, whereas AA is not a 
palindrome. A palindrome can also span many kilobases; in every case the sequence has an axis of symmetry located at its midpoint.
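The test implied by this definition, whether a sequence equals the complementary strand read in the opposite direction, can be sketched in Python. The sketch assumes uppercase sequences over A, C, G, and T only, and the function names are illustrative rather than taken from any library. It also shows why an odd-length sequence cannot be a palindrome in this sense: its central base would have to be its own complement.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq: str) -> str:
    """Complementary strand, read in the opposite direction (right to left)."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def is_palindrome(seq: str) -> bool:
    """True if the sequence equals its own reverse complement."""
    return seq == reverse_complement(seq)


def find_palindromic_sites(seq: str, length: int) -> list[int]:
    """0-based start positions of every window of `length` bases that is a palindrome."""
    return [i for i in range(len(seq) - length + 1) if is_palindrome(seq[i:i + length])]


print(is_palindrome("GAATTC"))                   # True (EcoRI site)
print(is_palindrome("AT"), is_palindrome("AA"))  # True False
print(find_palindromic_sites("TTGAATTCAA", 6))   # [2]
```

Scanning fixed-length windows in this way is one simple approach to locating candidate recognition sites for an enzyme such as EcoRI.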


See also: Restriction Endonuclease 
Panmixis refers to a pattern of mating in which individuals in a population choose their mates at random.
In a panmictic population alleles at any locus are 
paired in individuals at random, and allele frequency 
differences among subgroups within the population 
are negligible. 

Nonrandom mating can take many different forms. 
Mating among relatives (inbreeding), mating among 
individuals with similar genotypes or phenotypes 
(assortative mating), and mating among individuals 
with different genotypes or phenotypes (disassort- 
ative mating) are three of the most common ways in 
which populations may depart from panmixis. In each 
of these cases alleles are nonrandomly associated 
within individuals. As a result, there may also be 
differences in allele frequency among different sub- 
groups within the population. The more severe the 
departure from random mating, the more extreme 
the departure from panmixis and from random pairing 
of alleles within individuals. 

Most populations will exhibit a combination of 
random and nonrandom mating. In humans, for ex- 
ample, marriage partners rarely choose one another 
based on blood type, but there is a strong tendency for 
marriage partners to have similar religious and ethnic 
backgrounds. Thus, mating occurs essentially at ran- 
dom with respect to blood type, although it occurs 
nonrandomly with respect to religious, racial, and 
ethnic background. Within a subgroup of humans 
having similar religious and ethnic backgrounds, 
alleles determining blood type are randomly asso- 
ciated within individuals, but there can be large differ- 
ences in allele frequency among individuals belonging 
to different racial or ethnic groups because of the non- 
random mating with respect to these characteristics. 

The locus determining Rh compatibility in humans
provides a particularly clear example of this possibility.
The two most common alleles in Caucasians (R1
and R2) occur in a combined frequency of about 57%,
while their frequency in those of African ancestry is
less than 7%. Similarly, the most common allele in
those of African ancestry (R0) occurs in a frequency
of almost 74%, while the frequency of this allele in
Caucasians is less than 2%. Within both groups the
alleles present are randomly associated within
individuals, i.e., they are found in Hardy-Weinberg
proportions. Nonetheless, the alleles are not associated
randomly within the human population as a whole.


When large differences in allele frequency are 
found among different subgroups within a population 
two conclusions follow: 


1. Mating is not random across subgroups. If it 
were, allele frequencies would not differ among 
them. 

2. Alleles are not associated randomly within indi- 
viduals. Alleles common within a particular sub- 
group are found together more frequently than 
expected if alleles were randomly associated across 
the entire population. 
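Both conclusions can be illustrated numerically. The sketch below is a simplification under stated assumptions: a single locus with two alleles (R1 and R2 lumped together as one hypothetical "allele", with frequency 0.57 in one subgroup and 0.07 in the other) and two subgroups of equal size. Each subgroup is in Hardy-Weinberg proportions, yet the pooled population has fewer heterozygotes than panmixis would predict.

```python
def hw_genotypes(p: float) -> tuple[float, float, float]:
    """Hardy-Weinberg genotype frequencies (AA, Aa, aa) for allele frequency p."""
    q = 1.0 - p
    return p * p, 2.0 * p * q, q * q


def pooled_heterozygosity(freqs: list[float], weights: list[float]) -> tuple[float, float]:
    """Return (observed, expected) heterozygote frequencies after pooling subgroups.

    Each subgroup is assumed to be internally panmictic (in Hardy-Weinberg
    proportions); `weights` are the subgroups' shares of the pooled
    population and must sum to 1.
    """
    observed = sum(w * hw_genotypes(p)[1] for p, w in zip(freqs, weights))
    p_pooled = sum(w * p for p, w in zip(freqs, weights))
    expected = hw_genotypes(p_pooled)[1]
    return observed, expected


# Hypothetical biallelic simplification of the Rh figures quoted above.
observed, expected = pooled_heterozygosity([0.57, 0.07], [0.5, 0.5])
print(f"observed {observed:.4f} vs. {expected:.4f} expected under panmixis")
```

With these assumed numbers the script reports an observed heterozygote frequency of 0.3102 against 0.4352 expected under panmixis. The deficit, 0.125, equals twice the variance in allele frequency between the subgroups, which is the Wahlund effect.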


Thus, humans do not form a panmictic population,
although many subgroups of humans form panmictic
subpopulations. Panmixis is always defined relative to
a particular population: humans are not panmictic with
respect to the species as a whole, but they are with
respect to many of its subgroups. It is also important
to remember that the human species departs far less
from panmixis than populations of many other plants
and animals: the average genetic difference among
human subpopulations is much smaller than the
average difference among individuals within
subpopulations.

The most extreme departure from panmixis is 
found in some ferns and their relatives. In these plants 
a free-living haploid generation produces both sperm 
and egg, and in some species sperm and egg produced 
by the same haploid individual fuse to form a diploid 
zygote that is completely homozygous. Many flowering
plants and some snails are only a little less extreme
in their departure from panmixis. In these self-fertilizing
diploid hermaphrodites, sperm and egg produced by the
same individual fuse to form zygotes that are, on average,
heterozygous at only half as many loci as their parent.
Even animals and plants with separate sexes may 
inbreed to some extent, as when cousins mate, broth- 
ers mate with sisters, or aunts mate with nephews. 
Whether the departure from panmixis is a result of 
inbreeding, as just described, or assortative mating, the 
effects are similar: alleles with a similar effect are more 
likely to be found together within individuals than 
expected. 
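The halving of heterozygosity under self-fertilization follows from Mendelian segregation: a selfed heterozygote Aa yields 1/4 AA, 1/2 Aa, and 1/4 aa, while homozygotes breed true. The small simulation below is a sketch under simplifying assumptions (a hypothetical plant heterozygous at 10,000 independently segregating loci, no selection); the helper names are illustrative.

```python
import random

Genotype = list[tuple[int, int]]


def self_fertilize(genotype: Genotype, rng: random.Random) -> Genotype:
    """One generation of self-fertilization in a diploid hermaphrodite.

    At every locus the egg and the sperm each receive one of the parent's two
    alleles, chosen independently with probability 1/2 (Mendelian
    segregation; loci are treated as unlinked).
    """
    return [(rng.choice(locus), rng.choice(locus)) for locus in genotype]


def fraction_heterozygous(genotype: Genotype) -> float:
    """Fraction of loci carrying two different alleles."""
    return sum(a != b for a, b in genotype) / len(genotype)


rng = random.Random(1)
plant: Genotype = [(0, 1)] * 10_000  # hypothetical: heterozygous at every locus
for generation in range(1, 4):
    plant = self_fertilize(plant, rng)
    # Expected values are 0.5, 0.25 and 0.125; the printed ones will be close.
    print(generation, round(fraction_heterozygous(plant), 3))
```

Each round of selfing should leave roughly half of the remaining heterozygous loci heterozygous. In the fern case described above, where a single haploid individual supplies both gametes, the zygote is homozygous at every locus after a single generation.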

Disassortative mating is less widely recognized, but 
may be almost as common as inbreeding and assort- 
ative mating. In many flowering plants, for example, 
individuals that share alleles at the self-incompatibility 
locus are prevented from mating. Only individuals 
that carry different alleles at this locus are able to 
mate. Similarly, there is evidence in mice that mating 
occurs preferentially among individuals with different 
genotypes at major histocompatibility complex loci. 
With disassortative mating, alleles with a similar effect 
are less likely to be found together within individuals
than expected. 


See also: Demes; Hardy-Weinberg Law; 
Inbreeding; Wahlund Effect 


Paralogy 


W Fitch 
The circumstance in which two homologous genes
diverge following gene duplication so that the com- 
mon ancestor of the two sequences predates their 
cenancestor. 


See also: Cenancestor; Orthology; Xenology 


Paramecia 
Since Tracy Sonneborn demonstrated the existence of
a genetic system in Paramecium in the 1930s, members
of this genus have been valuable research organisms
for many types of Mendelian and non-Mendelian
genetic studies. A few dozen species in this genus
have been found throughout the world, mostly in
freshwater habitats. Cells are very large, ranging in
size from about 100 to 300 µm in length and about
20 to 50 µm in width. In spite of their size, the cell
cycle time is typically short: as little as 5 h for small
species. Paramecia share a number of common cilio- 
phoran traits, including the cortex, the elaborate array 
of cytoskeletal and membranous structures organized 
around the basal bodies of the several thousand cilia 
on the cell surface, and nuclear dimorphism, the pos- 
session of two distinct types of nuclei in each cell. The 
small micronuclei are diploid, do not appear to be 
transcriptionally active, and contribute little to the 
phenotype of the cell. During mitosis or meiosis, 
the micronuclei show condensation of chromosomes 
and formation of a spindle apparatus without dis- 
solution of the nuclear envelope. Depending on the 
species, a cell might possess 1, 2, or 4 or more
micronuclei. The large macronuclei are polycopy (i.e.,
containing about 1000 copies of each gene), are
transcriptionally active, and determine most of the
phenotype of the cell. During formation of the
macronucleus, chromosomes are broken into smaller
fragments that are telomerized and are replicatively stable.
Macronuclear division (also known as ‘amitosis’)
involves no elaborate spindle formation, no chromosome
condensation, and no centromeres: the macronucleus
elongates and divides more or less in half. In
spite of this ‘approximation method’ of chromatin 
distribution, no losses of alleles or loci have been 
observed. The macronucleus is active during vegeta- 
tive proliferation of the cell, but is discarded during 
sexual events, while the micronuclei provide the geno- 
mic material for sexual events, including the formation 
of new macronuclei. This arrangement is similar to the 
distinction between germ-line and soma-line cells in 
animals. 


Mendelian Genetics 


Paramecium tetraurelia has been the object of exten- 
sive genetic investigations, so we shall discuss its 
genetic behavior as an example of the entire genus. 
P. tetraurelia has two micronuclei and one macronu- 
cleus. 


Nuclear Reorganization 

Sexual events in paramecia are characterized by an 
elaborate ‘dance of the nuclei’ known as nuclear
reorganization (NR). The process includes meiosis 
and fertilization coordinated with other events, lead- 
ing to the formation of new macronuclei and micro- 
nuclei, and resorption of the existing macronucleus. 
The onset of NR in P. tetraurelia is identified by 
fragmentation of the existing macronucleus into 16 
or more pieces, apparently marking it for eventual 
degeneration. Both micronuclei then enter a typical 
two-division meiosis, producing eight haploid pro- 
ducts. Seven of the eight haploid nuclei degenerate; 
the one surviving nucleus is selected at random and is 
protected from destruction by its migration into a 
small protective region on the cell’s ventral right side, 
next to the oral apparatus, known as the paroral cone. 
This remaining nucleus then divides once mitotically, 
producing two identical haploid nuclei. These will 
participate in fertilization (see below). 

After fertilization, the diploid zygote nucleus 
divides twice mitotically without cytokinesis. The 
axes of the spindles of the second division are parallel 
to the long axis of the cell, placing two nuclei into the 
anterior part of the cell and two nuclei into the poster- 
ior tip of the cell. The anterior nuclei will become new 
micronuclei and the posterior nuclei differentiate in- 
to new macronuclei. Developing macronuclei have a 
characteristic morphology and are called anlagen. 
At the first cell division after fertilization, the two
new micronuclei divide by mitosis, while the two new
macronuclei segregate, one into each daughter
cell. Subsequent cell divisions show macronuclear
division. The fragments of the old macronucleus are 
distributed among daughter cells with each division 
and are gradually resorbed. 


Conjugation 

Sexual reactivity is initiated by several hours of 
starvation. (Additionally, many species regulate mat- 
ing reactivity by diurnal cycles.) The cells cease vege- 
tative activities and begin synthesis of the mating 
reactivity substances which are located on the surface. 
P. tetraurelia has two mating types (some other species 
express four or more). When two reactive cells of 
complementary mating type touch, they will adhere 
to one another by their mating proteins and will initi- 
ate conjugation. Cells align their ventral faces to bring 
their paroral cones together; cell fusion occurs at this 
site, forming the conjugation bridge. Nuclear reorgan- 
ization then proceeds (see above). When the haploid 
nucleus in each cell divides mitotically, the spindle is 
aligned so that one product remains in each cell and 
the other is transferred through the bridge to the con- 
jugation partner. Thus, there is a reciprocal transfer of 
haploid (gamete) nuclei between the two cells. Each 
partner retains a haploid nucleus and contributes a 
haploid nucleus to, and receives a haploid nucleus 
from, its mate. The cells of each conjugant pair usually 
separate soon after nuclear transfer and fertilization. 
Each cell then completes NR individually. Conjugation
lasts 6–7 h from initial contact to cell separation;
nuclear exchange occurs at about 5 h after initial
contact. The genetic consequence of conjugation is
that the two cells of a single conjugant pair become
isogenic at all nuclear loci. Cells with different
cytoplasmic histories or cytoplasmic genetic makeups
can therefore be compared on an identical nuclear
background.
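A toy model of the nuclear events shows why the two partners of a conjugant pair end up isogenic. The sketch below assumes a hypothetical representation of a micronuclear genotype as a list of allele pairs, one per unlinked locus. It ignores the degenerating meiotic products and models only the surviving haploid nucleus and its two mitotic copies.

```python
import random

# Hypothetical representation: a micronuclear genotype is a list of allele
# pairs, one pair per locus; loci are treated as unlinked.
Genotype = list[tuple[str, str]]


def surviving_haploid_nucleus(genotype: Genotype, rng: random.Random) -> list[str]:
    """The one meiotic product (of eight) that survives: one allele per locus."""
    return [rng.choice(pair) for pair in genotype]


def conjugate(cell1: Genotype, cell2: Genotype,
              rng: random.Random) -> tuple[Genotype, Genotype]:
    """Return the zygote genotypes of two conjugants after reciprocal exchange.

    Each surviving haploid nucleus divides mitotically into two identical
    nuclei; one stays in its own cell and the other crosses the bridge.
    """
    stays1 = surviving_haploid_nucleus(cell1, rng)
    stays2 = surviving_haploid_nucleus(cell2, rng)
    migrates1, migrates2 = list(stays1), list(stays2)  # identical mitotic copies
    # Each cell fuses its own stationary nucleus with the one it received.
    zygote1 = [tuple(sorted(pair)) for pair in zip(stays1, migrates2)]
    zygote2 = [tuple(sorted(pair)) for pair in zip(stays2, migrates1)]
    return zygote1, zygote2


rng = random.Random(7)
parent1 = [("A", "a"), ("B", "B")]
parent2 = [("a", "a"), ("b", "b")]
exconjugant1, exconjugant2 = conjugate(parent1, parent2, rng)
print(exconjugant1 == exconjugant2, exconjugant1)
```

Whichever alleles happen to survive meiosis, each exconjugant fuses its stationary nucleus with a copy of its partner's, so the comparison always prints True.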


Autogamy 

After several hours, if no conjugant partner is present, 
a mating-reactive cell will lose its reactivity and com- 
mit to the process of autogamy. (Not all species show 
autogamy.) The cell then proceeds through all stages 
of NR. When the remaining haploid nucleus divides 
mitotically, the two nuclei produced then fuse with 
each other to produce a zygote nucleus, after which 
the cell completes NR as usual. The genetic conse- 
quence of autogamy is that the cell becomes homo- 
zygous for all nuclear loci. This aspect is especially 
valuable when searching for nuclear mutations. One 
can mutagenize a population of cells, then induce auto- 
gamy and isolate individual autogamous cells into 
drop cultures. Homozygosity allows full expression
of recessive alleles and allows the investigator to find
recessive mutations readily.
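Autogamy can be modelled in the same way. In this sketch, which uses the same hypothetical allele-pair representation and a recessive allele "m" standing for an induced mutation, the surviving haploid nucleus is copied mitotically and the two identical copies fuse, so every locus becomes homozygous.

```python
import random

# Hypothetical representation: one allele pair per locus, loci unlinked.
Genotype = list[tuple[str, str]]


def autogamy(cell: Genotype, rng: random.Random) -> Genotype:
    """Self-fertilization inside a single cell.

    One haploid meiotic product survives (one allele per locus, chosen at
    random); it divides mitotically and the two identical products fuse,
    so every locus of the zygote nucleus is homozygous.
    """
    haploid = [rng.choice(pair) for pair in cell]
    return [(allele, allele) for allele in haploid]


# Hypothetical mutant hunt: cells heterozygous for a recessive mutation "m".
rng = random.Random(3)
clones = [autogamy([("M", "m")], rng)[0] for _ in range(1000)]
print("M/M:", clones.count(("M", "M")), "m/m:", clones.count(("m", "m")))
```

Roughly half of the autogamous clones come out m/m and can express the recessive phenotype, and none is heterozygous. Because the two haploid nuclei that fuse in each partner of a cytogamous pair are likewise sister products of one mitosis, the same function also describes the outcome for each cell after cytogamy.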


Cytogamy 

If, during conjugation, the paroral cones are mis- 
aligned or the conjugation bridge is disrupted in some 
way, the conjugant partners will continue through NR 
but will not accomplish a reciprocal transfer of hap- 
loid nuclei. The two haploid nuclei in each partner will 
instead fuse, so that fertilization occurs independently 
in each partner of a cytogamous pair. The cells then 
separate and complete NR. A cytogamous pair pro- 
duces two cells which are each homozygous at all 
nuclear loci, but are not necessarily genetically
identical to one another. The genetic consequence of
cytogamy is the same as that of autogamy, so cytogamy
can be used for mutant hunts in species that do not
undergo autogamy.


Macronuclear Regeneration 

In any event involving NR, if something occurs to 
disrupt the development of the anlagen (developing 
macronuclei), these will be resorbed and lost. In 
response, the fragments of the old macronucleus 
again become transcriptionally and replicationally 
active. Each fragment expands in size over a series 
of cell cycles until it recovers its full size and
function as a macronucleus. Micronuclear development
from the descendants of the zygote nucleus is unaffected. One
potential consequence of this process is the produc- 
tion, in a single cell, of macronuclei and micronuclei 
with different genotypes. Certain physiological treat- 
ments (e.g., heat shock) and genetic types enhance 
the probability of induction of macronuclear regen- 
eration. 


Non-Mendelian Genetics 


Paramecia have been excellent model systems for the 
demonstration and investigation of a large number of 
non-Mendelian phenomena. A few examples will be 
discussed here; interested readers should consult one 
of the major reviews listed at the end of the article for a
more comprehensive discussion. 


Mitochondrial Inheritance 

Cell lines bearing drug-resistant mitochondria can be 
isolated. When these are crossed with cells bearing 
drug-sensitive mitochondria, no transfer or mixing of 
mitochondria from one conjugant to the other takes 
place unless cell fusion is extended beyond the time of 
nuclear transfer (delayed separation). Some genetic 
types or chemical treatments induce this with high 
frequency. The longer the separation of conjugating
cells is delayed, the greater the possibility for mixing 
of fluid cytoplasms, and therefore transfer of mito- 
chondria from one cell to another. Intact mitochon- 
dria can also be isolated in vitro by standard cell 
fractionation protocols and mechanically microinjected
into a host cell. Successful transfer of mitochondria
can be detected by the appearance of drug
resistance in a cell previously sensitive to the drug. 
Although populations of two or more distinct mito- 
chondrial genetic types can be created in one cell, 
there is no evidence of genetic recombination among 
these mitochondria. 


Symbiont Organisms 

‘Killer’ paramecia release a particulate substance into 
the medium that is toxic to other ‘sensitive’ paramecia. 
The killer effect is the consequence of the presence 
of a cytoplasmic obligate endosymbiotic bacterium 
known as ‘kappa’ (Caedibacter taeniospiralis).
Sensitive cells lack kappa. The kappa bacterium itself 
possesses a defective lysogenic bacteriophage. A shift 
of the phage to a lytic state leads to the production of 
nonfunctional nucleocapsid proteins which crystallize 
inside the kappa into a cylinder. Killer paramecia 
release these particles into the medium. If they are 
eaten by sensitive cells, the cylinders cause the disrup- 
tion of food vacuoles, releasing digestive enzymes into 
the sensitive cell cytoplasm. Expression of the nuclear 
allele K is essential for maintenance of kappa in para- 
mecia; the kk genotype leads to loss of kappa because 
the bacteria cannot proliferate. Like mitochondria, the 
inheritance of kappa follows the fluid cytoplasm, so 
transfer of kappa from one conjugant cell to another is 
seen only in cases of delayed separation. Microinjec- 
tion of isolated kappa cells can also inoculate unin- 
fected paramecia. 


Molecular Transformation 

Initially, the standard techniques known to work on 
other organisms did not achieve transformation in 
paramecia. Work in the late 1980s demonstrated that 
direct injection into the macronucleus of cloned DNA 
in high copy number could achieve a high rate of 
transformation. DNA from apparently any source is 
telomerized and replicated in the macronucleus. The 
cell also transcribes and translates injected genes. Such 
transformations are stable with vegetative prolifer- 
ation until the macronucleus is discarded at the next 
NR. 


Cortical Inheritance

The elaborate array of cytoskeletal and membranous 
components around the rows of basal bodies of the 
cilia (the cortex) must be replicated every cell cycle to
ensure that each daughter cell has a complete set of 
structures. Cortical morphogenesis requires that 
existing structures serve as precise templates for the 
formation of new structures. Thus, changes in the 
existing cortex, such as reorientation of cilia, can be 
propagated to progeny cells by virtue of the templat- 
ing process. This epigenetic phenomenon, known as 
cytotaxis or directed assembly, produces an inherit- 
ance pattern of the cortex which can be independent of 
that of nuclear genes. Inheritance of a cortical
difference strictly follows the structural lineage of the
cell bearing the difference, and does not correspond to
nuclear, or even to other cytoplasmic, patterns of
inheritance. No transfer of cortical differences at
conjugation is normally possible.

Although some might regard paramecia as atyp- 
ical organisms, their basic genetics fit well within 
the classical Mendelian patterns as based on meiosis 
and fertilization. The non-Mendelian phenomena 
associated with the group have been extremely 
useful in demonstrating the diversity of heritable 
processes known to exist in eukaryotes. Con- 
tinued investigations using these organisms are well 
justified. 
Paramorphosis


See: Neoteny 


Parapatric 
Where allopatric populations or species have contiguous
borders with each other, with or without interbreeding,
they are described as parapatric.


See also: Allopatric; Speciation 


Paraphyly 
E Mayr 
Paraphyletic taxa result from the operation of cladistic
principles. A traditional higher taxon becomes para- 
phyletic when one of its lineages is removed from it 
because it has produced a derived group and forms a 
clade together with it. For instance, the Reptilia 
become paraphyletic when the Archosauria are 
removed because they gave rise to the birds, and also 
when the Therapsida and Pelycosauria are removed, 
because this synapsid branch of the Reptilia gave rise 
to the mammals. 

The cladistic principle of holophyly requires that 
the cladist dismantle any traditional taxon in which 
one of its components has given rise to a derived taxon 
(ex-group). Because most traditional higher taxa are 
ex-groups of an ancestral taxon, the cladistic method 
obliges its followers to consider most traditional taxa 
to be paraphyletic. This includes all fossil taxa, except 
the terminal ones. For instance, all major ancestral 
taxa of the mammals, Synapsida, Therapsida, Therio- 
dontia, and Cynodontia are paraphyletic. 




See also: Background Selection; Holophyly 
The term parasegment refers to the first
morphological subdivisions of the embryo of Drosophila.
About 2 h after gastrulation, grooves appear on the
ventral side of the embryo that divide the ectoderm
into 14 metameric units. These metameric units are
transient and do not correspond to the definitive
segments, which reveal the metameric nature of the first
instar larva and which appear much later in
embryogenesis. At the morphological level, the positions
of the segmental openings of the tracheal tree, the
tracheal pits, provide a fixed landmark for observing the
transient nature of the parasegmental subdivisions. The tracheal
pits are centered in the anterior third of each paraseg- 
ment, whereas they are located exactly in the position 
of the segment boundary. After the appearance of 
proper segments, the parasegment boundary comes 
to lie at the anterior third of each segment. 

The boundary between parasegments, known as 
the parasegment boundary, has been given some 
importance as a source of signaling molecules that 
will pattern the segment. In particular a stripe of cells 
at the anterior end of the parasegment expresses the 
signaling molecule Hedgehog, whereas a narrow 
stripe of cells at the posterior end expresses a member 
of the Wnt family of signaling molecules, Wingless. 
This means that at the parasegment boundary cells 
expressing Hedgehog abut against cells expressing 
Wingless. This creates a stable source of signaling 
molecules that will be used to pattern the epidermis 
and the nervous system. After the formation of proper 
segments the Hedgehog/Wingless interface remains as 
a remnant of the parasegment boundary on the ventral 
side of each epidermal segment. 

Parasegments are domains of expression of homeo- 
tic genes with the parasegment boundary defining the 
limit of expression of these genes and each homeotic 
gene having a well defined and characteristic onset of 
expression at a particular parasegment. The paraseg- 
ment boundary also represents a boundary of lineage 
restriction such that during cell proliferation cells 
from different parasegments do not mix with each 
other. It is thought that the parasegment boundary is 
a template for the compartment boundary, a site of 
residence of morphogen molecules that serve to pat- 
tern the adult fly. 

At the level of architectural designs of body plans, 
it is likely that parasegment boundaries are related to 
the boundaries of the rhombomeres and, perhaps, also
of the somites of vertebrate embryos. 


See also: Cell Lineage; Drosophila melanogaster 


Parental 


L Silver 
Parental is the term that refers to the genotype or
phenotype of the parents used in a cross. Offspring 
are said to carry a parental genotype or express a 
parental phenotype if either is identical to that present 
in, or expressed by, one of the strains used to generate 
the offspring. 


See also: Cross 


Parsimony 


G J Olsen 
Parsimony (or maximum parsimony) is a criterion for
evaluating alternative hypotheses, and is particularly 
common in phylogenetic analyses. In a parsimony- 
based analysis, the preferred history is the one that 
could give rise to the data with a minimum number 
of events (e.g., inventions and losses of features). 
Although parsimony is sometimes presented as apply- 
ing Occam’s razor, the principle that one should pick 
the simplest explanation of data, parsimony can easily 
be motivated by considering the history of major 
inventions (innovations) in the history of life. When 
focusing on innovations, it is intuitive to assume that 
they rarely, if ever, recur (otherwise they would be 
seen as incremental changes, not innovations), and 
hence the history will be parsimonious. Felsenstein 
(1982) has summarized much of the history and 
many of the variations of parsimony-based phylo- 
genetic inference. 


Motivation of Parsimony 


Consider the notochord of chordates (the rod-like
structure around which the vertebral column develops).
It is quite certain that the common ancestor of
vertebrates and echinoderms did not have a notochord;
it had not yet been invented (Figure 1). The notochord
is unique in evolution: all species that develop one are
members of a single group that descended from a
common ancestor with a notochord. Conversely, there
are no insects, trees, or bacteria with a notochord. If a
new species is observed to have a notochord, it will be
placed among the chordates on the basis of this feature
(see Clade). It would take extraordinary evidence to
justify any other placement, since it would require
proposing that the notochord was invented twice. At
a basic level, parsimony is motivated by minimizing
reinvention. In the case of major events in evolution,
this is clearly desirable.

Figure 1 A tree (cladogram) of relationships among
some animals, showing the invention of the notochord
(bar) and its inheritance by chordates (heavy line). The
common ancestor of echinoderms and vertebrates did
not have a notochord; it was invented along the lineage
(stem) from this ancestor to the vertebrates. This
feature has been conserved by members of the chordate
group, and it is not found among any other group of
organisms. Thus, a newly discovered organism with a
notochord would be placed in the part of the tree with
heavy lines; otherwise we would need to postulate a
second invention of the notochord.

The features or inventions analyzed are called char- 
acters. To be compared, all instances of a character 
must have a common genetic basis (see Homology), 
as opposed to being separate inventions of similar 
features. For example, bird wings, bat wings, and
insect wings are not shared features; rather, they
represent three separate inventions of structures that
function as wings. The homology of compared characters
is important to the justifications of all phylogenetic 
inference methods, not just parsimony. 

There is a trivial solution to avoiding reinvention: 
assume that features are invented once, but lost an 
arbitrarily large number of times. Thus one must also 
assume that losses of a feature are to be avoided. 
However, it is usually not possible to find a tree for
which every feature is invented once and never lost, so
the parsimonious solution is the tree that minimizes
the number of reinventions and losses. From a
mathematical perspective, minimizing inventions is
equivalent to minimizing reinventions: every feature
that is present must be invented at least once, so the
two totals differ only by a constant. Most parsimony
analyses count and minimize all events, not
distinguishing invention from reinvention, or invention
from loss, though other treatments are possible
(below).

A consequence of not distinguishing the direction
of changes is that the parsimony score of a tree depends
on its topology, but not on the position of the most
recent common ancestor (the root) in the tree. That is,
parsimony analyses yield unrooted trees. The earliest
point in time is usually identified by bringing add- 
itional data or judgement into the analysis. The most 
common method of rooting a tree is called outgroup 
rooting, in which additional taxa are introduced into 
the analysis with the a priori assumption that they are 
not members of the group of interest. Thus, Figure | 
suggests that the echinoderms might be used to orient 
(root) a tree of chordates, but this requires a separate 
judgement that there are biologically sound reasons to 
assume that the echinoderms are not modified chor- 
dates. 
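Counting the minimum number of changes for one character on a given tree, without regard to direction, can be done with Fitch's algorithm. This is a standard method for unordered characters with unit costs, used here as an illustration rather than as the method of any particular program. The sketch scores the notochord and a hypothetical DNA site on one unrooted tree rooted in three different places; the taxon names and the DNA states are illustrative.

```python
def fitch(tree, states):
    """Fitch's small-parsimony pass over a rooted binary tree.

    `tree` is a leaf name (str) or a (left, right) pair of subtrees; `states`
    maps leaf names to character states. Returns (candidate state set,
    minimum number of changes in the subtree).
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left, right = tree
    left_set, left_cost = fitch(left, states)
    right_set, right_cost = fitch(right, states)
    shared = left_set & right_set
    if shared:
        return shared, left_cost + right_cost
    # No state in common: one change is required on one of the two branches.
    return left_set | right_set, left_cost + right_cost + 1


def parsimony_length(tree, states) -> int:
    return fitch(tree, states)[1]


# One unrooted tree, rooted in three different places.
rootings = {
    "echinoderm/chordate split": (("starfish", "urchin"), ("lancelet", ("lamprey", "human"))),
    "lancelet branch": ("lancelet", (("starfish", "urchin"), ("lamprey", "human"))),
    "vertebrate stem": (("lamprey", "human"), ("lancelet", ("starfish", "urchin"))),
}
notochord = {"starfish": 0, "urchin": 0, "lancelet": 1, "lamprey": 1, "human": 1}
# Hypothetical DNA site, for comparison.
site = {"starfish": "A", "urchin": "G", "lancelet": "A", "lamprey": "G", "human": "G"}

for name, tree in rootings.items():
    print(name, parsimony_length(tree, notochord), parsimony_length(tree, site))
```

All three rootings give 1 step for the notochord and 2 steps for the DNA site. This is why parsimony by itself cannot say where the root lies.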


Generalizations of Parsimony 


Gain and loss of a feature are special cases; more 
general types of change are possible. A feature might 
become progressively larger, or a position in a DNA
molecule might change among four nucleotides, or a 
position in a protein might change among the 20 
amino acids. In the first case, it is natural to assume 
that all instances of change in size must pass through 
intermediate steps in the progression, and to count the 
number of steps traversed whether they are observed 
or not. In the case of DNA, there is no natural order- 
ing of the changes, so it is most common to allow any 
nucleotide to directly change to any other nucleotide 
in a single event. The most common treatment of 
proteins is to allow changes directly between any 
two amino acids (Eck and Dayhoff, 1966), but it is 
also possible to count the number of nucleotide 
changes that must occur in the underlying DNA 
sequence (Fitch, 1971). These alternatives, and many 
others, can be incorporated into a common frame- 
work by appropriately defining the cost of transform- 
ing one state of a character into each other possible 
state. For example, it is possible to make the invention 
of a complex feature much more costly than losing it. 
This would more closely correspond to the discussion 
of the notochord (above), or the eye of vertebrates, 
which was invented once and lost several times (as in 
blind cave fish). Efficient algorithms have been
devised to calculate the minimum cost of all of the
changes required in a character for any given tree.
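A general cost matrix of the kind described above can be handled with Sankoff's dynamic-programming algorithm, used here as one well-known example of such an efficient algorithm. The sketch assumes a hypothetical five-taxon tree with presence/absence data. It compares unit costs with a hypothetical matrix in which gaining the feature costs three times as much as losing it.

```python
import math


def sankoff(tree, states, cost, alphabet):
    """Sankoff's weighted small-parsimony algorithm on a rooted tree.

    `tree` is a leaf name (str) or a tuple of subtrees; `cost[a][b]` is the
    cost of a change from state a (ancestor) to state b (descendant).
    Returns a dict mapping each possible root state to the minimum total cost.
    """
    if isinstance(tree, str):
        return {s: (0.0 if s == states[tree] else math.inf) for s in alphabet}
    children = [sankoff(child, states, cost, alphabet) for child in tree]
    return {
        s: sum(min(cost[s][t] + child[t] for t in alphabet) for child in children)
        for s in alphabet
    }


ABSENT, PRESENT = 0, 1
# Hypothetical tree and presence/absence data for five taxa t1..t5.
tree = ("t1", ("t2", (("t3", "t4"), "t5")))
states = {"t1": ABSENT, "t2": PRESENT, "t3": PRESENT, "t4": ABSENT, "t5": PRESENT}

unit_cost = {ABSENT: {ABSENT: 0, PRESENT: 1}, PRESENT: {ABSENT: 1, PRESENT: 0}}
# Hypothetical weighting: inventing the feature costs three times as much as losing it.
costly_gain = {ABSENT: {ABSENT: 0, PRESENT: 3}, PRESENT: {ABSENT: 1, PRESENT: 0}}

for label, cost in (("unit", unit_cost), ("costly gain", costly_gain)):
    by_root = sankoff(tree, states, cost, (ABSENT, PRESENT))
    print(label, by_root, "minimum:", min(by_root.values()))
```

Under unit costs both ancestral states cost 2 steps (one gain plus one loss, or two losses). With the costlier gain, the ancestor possessing the feature is preferred (2 versus 4). With asymmetric costs like these, the score can also depend on where the tree is rooted.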

The evaluation of a tree is a synthesis of the data for 
all characters in the analysis, which is achieved by 
adding the costs of the changes of all characters. 
Thus, for a given tree and set of data, parsimony 
provides an overall tree score, and the goal is to find 
the tree (or trees) with the lowest score. There is no 
analytic way to do this, but the available computer 
programs provide options for rigorously searching or 
heuristically searching for the most parsimonious tree. 
Even when it is not possible to prove that the best tree 
has been discovered, it is always possible to test any 
concrete alternative to see if it is as good as or better 
than the current best tree. 
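Exhaustive evaluation quickly becomes impractical because the number of candidate trees grows faster than exponentially with the number of taxa. The standard count of unrooted, fully bifurcating trees for n labelled taxa is (2n - 5)!!, since each new taxon can be attached to any of the 2k - 5 branches of the tree built so far. The function below simply evaluates that product.

```python
import math


def unrooted_binary_trees(n: int) -> int:
    """Number of distinct unrooted, fully bifurcating trees for n labelled taxa.

    Adding the k-th taxon to a tree of k - 1 taxa can be done on any of its
    2k - 5 branches, giving (2n - 5)!! = 3 * 5 * ... * (2n - 5) trees.
    """
    if n < 3:
        raise ValueError("at least three taxa are needed")
    return math.prod(2 * k - 5 for k in range(4, n + 1))


for n in (4, 5, 10, 20):
    print(f"{n} taxa: {unrooted_binary_trees(n):,} unrooted trees")
```

Four taxa give 3 trees, five give 15, and ten already give 2,027,025. This is why heuristic searches are used for larger data sets.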

Parsimony can be derived from a maximum like- 
lihood perspective (e.g., Felsenstein, 1973; Goldman, 
1990). This requires defining tree branch lengths as 
the expected number of events in the branch (or the 
expected number of events per character). Parsimony 
does not provide an unambiguous definition of branch 
length, though bounds can be placed upon the total 
number of events in any branch of a tree. The max- 
imum parsimony tree is also the maximum likelihood 
tree when changes are rare, and the probability of 
change is equal in all branches in the tree (i.e., all
branch lengths are equal). The likelihood framework 
has provided some very important insights into parsi- 
mony. 

Felsenstein (1981) used maximum likelihood to 
derive an optimal relative weighting of the various 
characters for a maximum parsimony analysis. If one 
event is expected to occur much less often than 
another (e.g., invention of a complex structure vs. a 
relatively minor modification in another), then it 
makes sense to emphasize the rare event by making 
changes in it more costly. The appropriate weights are 
proportional to minus the logarithm of the expected 
number of changes of each character. This refers to the 
absolute frequency of changes in characters (not just 
their relative frequency), so the inferred weight could 
be negative for a frequently changing character. 
Although this is primarily an artifact of approxi- 
mations used in the derivation, it also reflects deeper 
problems when change is common (below). 
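The weighting rule can be tabulated directly. This sketch takes the proportionality constant to be 1, which is an assumption; only relative weights matter for comparing trees. It uses natural logarithms, which changes nothing but that constant.

```python
import math


def parsimony_weight(expected_changes: float) -> float:
    """Character weight proportional to -log(expected number of changes).

    The proportionality constant is taken as 1 (an assumption).
    """
    if expected_changes <= 0:
        raise ValueError("expected number of changes must be positive")
    return -math.log(expected_changes)


for m in (0.01, 0.1, 0.5, 2.0):
    print(f"expected changes {m}: weight {parsimony_weight(m):.3f}")
```

A character expected to change 0.01 times receives a weight of about 4.61, and one expected to change 0.5 times receives about 0.69. A character expected to change twice receives a negative weight (about -0.69), which is the artifact noted above.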


Potential for Systematic Error 


Long Branch Attraction 

Felsenstein (1978) demonstrated that parsimony is 
subject to systematic error when evaluating a tree 
with long peripheral branches separated by a short 
internal branch; in such cases, parsimony can be posi- 
tively misleading. Unlike random errors, systematic 
errors are not reduced by adding more data, which in 
the case of parsimony means adding more characters.
The situation analyzed is shown as a rooted, four- 
taxon tree (Figure 2, left panel). If the branches to A 
and D are sufficiently long, and the internal branch is 
short enough, then the most parsimonious tree will 
group B with C, not with its closest relative, A. This 
situation is frequently shown as an unrooted tree 
(Figure 2, center panel), though this representation 
makes it less obvious that the lineages do not have 
equal rates, no matter where the tree is rooted. 
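The four-taxon case can be checked numerically. The Python sketch below computes the expected frequency of each parsimony-informative site pattern on the unrooted tree ((A,B),(C,D)), assuming a symmetric two-state model in which each branch is described only by the probability that the character state differs between its two ends (an assumption chosen for simplicity; the branch values used are illustrative). With four taxa and two states, an informative character costs one step on the tree it supports and two steps on either alternative, so parsimony chooses the split supported by the most characters.

```python
from itertools import product


def p_end(start: int, end: int, p_change: float) -> float:
    """Probability that a two-state character goes from start to end on a branch."""
    return p_change if start != end else 1.0 - p_change


def split_frequencies(pA, pB, pC, pD, p_int):
    """Expected frequencies of the three informative patterns on ((A,B),(C,D)).
    Internal node u joins A and B, node v joins C and D; u has a uniform state."""
    freqs = {"AB|CD": 0.0, "AC|BD": 0.0, "AD|BC": 0.0}
    for a, b, c, d in product((0, 1), repeat=4):
        prob = 0.0
        for u, v in product((0, 1), repeat=2):
            prob += (0.5 * p_end(u, a, pA) * p_end(u, b, pB) * p_end(u, v, p_int)
                     * p_end(v, c, pC) * p_end(v, d, pD))
        if a == b and c == d and a != c:
            freqs["AB|CD"] += prob
        elif a == c and b == d and a != b:
            freqs["AC|BD"] += prob
        elif a == d and b == c and a != b:
            freqs["AD|BC"] += prob
    return freqs


def parsimony_choice(freqs):
    """The split supported by the largest expected number of informative characters."""
    return max(freqs, key=freqs.get)


# Long branches to A and D, short branches elsewhere (true split AB|CD).
lba = split_frequencies(pA=0.4, pB=0.05, pC=0.05, pD=0.4, p_int=0.05)
# All branches equal.
equal = split_frequencies(0.1, 0.1, 0.1, 0.1, 0.1)
print(parsimony_choice(lba), parsimony_choice(equal))  # prints: AD|BC AB|CD
```

Because these are expected frequencies, adding more characters only sharpens the preference for the wrong split AD|BC (B grouped with C), which is why the error is systematic rather than random.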

With four taxa, the problem arises only when one
taxon has diverged significantly faster (and hence
accumulated significantly more change) than the two
short-branch lineages. This observation was erroneously
generalized to the conclusion that parsimony is free of
systematic error when the rates of change are equal
(a situation similar to that found in phenetics), and
the problem was sometimes called ‘the unequal rate effect.’

Later, it was shown that with five taxa (or more), 
even equal rates of change are not sufficient to ensure 
consistency of parsimony (Figure 2, right panel) 
(Penny et al., 1991). The source of the problem is the 
length of the branches. Branch length is defined by the 
frequency of changes in characters. With two long 
branches there is a significant probability that the 
same change will occur independently in both of the 
long branches. The resulting states of the characters
will then be the same because of parallel or convergent
changes, not because the state was inherited from a
common ancestor (synapomorphy or symplesiomorphy).
Similarity that is not due to common ancestry is called
homoplasy. When the amount of homoplasy is greater
than the number of events that occurred in the com- 
mon branch (stem) defining the group of interest, then 
parsimony can be misleading. The problem is now 
most commonly called ‘long branch attraction,’ since 
it tends to join long branches in a tree. 


Minimizing Long Branch Attraction 
The potential for systematic error does not imply that
parsimony-based trees are necessarily wrong; indeed,
they have proven to be very useful. The potential for
long branch attraction to yield incorrect trees can 
sometimes be minimized by careful consideration of 
the data included in (or excluded from) an analysis. 
When a branch in a tree is interrupted by a new 
branch point, it becomes two shorter branches. Thus, 
it is sometimes possible to subdivide the longest 
branches in a tree by adding additional taxa. This can 
be accomplished by increasing the density of sampling 
of taxa in the group of interest. This requires that the 
necessary taxa be available and be identified, which 
can be problematic. For example, coelacanths (lobe-finned
fish) are represented by only a single extant genus,
Latimeria, so in an analysis of vertebrate relationships
it is not possible to improve the sampling of taxa in
this important lineage.

Figure 2 Potential systematic error in parsimony. The top of each panel shows a phylogenetic tree (with branch
lengths drawn to scale) in which long tree branches are separated from one another by a short internal branch. If the
long branches are sufficiently long, parsimony will systematically join the long branches, yielding a historically
incorrect tree. The left and center panels are similar, but with the rooted version it is easier to see that the tree
cannot be drawn with equal amounts of divergence in all lineages. In the right panel, the amount of change is the same
in all lineages, yet there are four equally parsimonious incorrect trees. The one shown moves the long branch to A
onto the line coming from the distant outgroup E. By symmetry, it would be equally parsimonious to move B, C, or D
to the line from the outgroup.

Frequently, the longest branch in an analysis is that 
to the outgroup. There are at least four factors that are 
important in the selection of outgroup taxa. First, they 
must be clearly outside of the group of interest, other- 
wise the inferred direction of evolution will be incor- 
rect in parts of the ingroup. Second, the outgroup taxa 
should be as close to the taxa of interest as possible; the 
more distant the outgroup, the greater the potential 
for long branch attraction (echinoderms are a better 
outgroup than plants for an analysis of vertebrates). 
Third, it is preferable to use two or more diverse
representatives of major outgroup lineages, which
subdivides the long branch to the outgroup (so including
both sea urchins and starfish in an echinoderm outgroup
is preferable to either one alone). Finally, if
there are several candidate lineages for the outgroup 
that are comparably good, it is preferable to use all of 
them, rather than just one. Again this has the effect of 
subdividing the long branch to the outgroup. 

The choice of features (characters) included in an 
analysis also influences long branch attraction. As 
noted above, it is commonly the case that some characters
change more frequently than others. It is as though
branches are longer for frequently changing characters
than for rarely changing characters, so long branch
attraction is a greater problem for frequently changing
characters. Thus, removing these characters from an
analysis, or at least dramatically lowering the emphasis
placed on them (their weight), helps to minimize the
systematic error due to long branch attraction. The
disadvantage of completely eliminating these characters
is that they sometimes provide the only information
available to resolve the details of relationships among
closely related taxa in the analysis.
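The effect of down-weighting frequently changing characters can be shown on a tiny hypothetical data set. The Python sketch below assumes rooted binary trees written as nested tuples and uses Fitch's algorithm to count the minimum number of changes of each character; the matrix and the weights are invented for illustration and are not taken from any real analysis.

```python
from typing import Dict, Sequence, Set, Tuple, Union

# A rooted binary tree: either a taxon name or a pair of subtrees.
Tree = Union[str, tuple]


def fitch(tree: Tree, states: Dict[str, str]) -> Tuple[Set[str], int]:
    """Fitch's algorithm for one character: return the candidate state set
    at this node and the minimum number of changes within the subtree."""
    if isinstance(tree, str):
        return {states[tree]}, 0
    left_set, left_cost = fitch(tree[0], states)
    right_set, right_cost = fitch(tree[1], states)
    common = left_set & right_set
    if common:
        return common, left_cost + right_cost
    # Disjoint state sets: one extra change is required below this node.
    return left_set | right_set, left_cost + right_cost + 1


def weighted_length(tree: Tree, matrix: Dict[str, str], weights: Sequence[float]) -> float:
    """Tree length = sum over characters of weight x minimum number of changes."""
    total = 0.0
    for i, w in enumerate(weights):
        column = {taxon: row[i] for taxon, row in matrix.items()}
        total += w * fitch(tree, column)[1]
    return total


# Hypothetical data: character 0 changes rarely and supports grouping A with B;
# characters 1 and 2 change often and share homoplastic states in A and D.
matrix = {"A": "011", "B": "000", "C": "100", "D": "111"}
true_tree = (("A", "B"), ("C", "D"))
lba_tree = (("A", "D"), ("B", "C"))

for label, w in [("equal weights", [1, 1, 1]), ("down-weighted", [1, 0.25, 0.25])]:
    print(label, weighted_length(true_tree, matrix, w), weighted_length(lba_tree, matrix, w))
```

With equal weights the tree joining the long branches, ((A,D),(B,C)), is shorter (4 steps against 5); after the two fast characters are down-weighted, the true tree becomes the shorter one (2.0 against 2.5). The weights here were set by hand; a likelihood-derived rule such as Felsenstein's is one way to choose them.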
The term parthenogenesis is defined as the production
of an embryo, with or without eventual development
into an adult, from a female gamete in the absence of
any contribution from a male gamete. This phenomenon
differs from (1) gynogenesis, in which the oocyte is
stimulated by a spermatozoon to complete the second
meiotic division and to undergo further development,
although the spermatozoon does not contribute
genetically to the developing embryo; and (2)
androgenesis, in which the egg is also activated by a
spermatozoon, but the male genome alone takes part in
subsequent development.

In some species, such as aphids, parthenogenetic
generations alternate with others in which fertilization
takes place. This is known as ‘cyclical parthenogenesis.’
In other species, such as bees, an oocyte may either be
fertilized or develop parthenogenetically. In species
belonging to neither group, the activation of
mitogen-activated protein kinase (MAPK) by Mos
appears to be one of the mechanisms that prevent
unfertilized eggs from proceeding into parthenogenetic
development.

Parthenogenetic development may proceed by 
various routes: 


Parthenogenesis, Mammalian 1419 


1. Extrusion of the second polar body and develop- 
ment of eggs with a single haploid pronucleus. 
Embryos derived from these eggs contain genetic- 
ally identical cells. 

2. Immediate division of the eggs into two equal-sized
blastomeres, each containing one haploid nucleus
derived from one of the two products of the second
meiotic division. Embryos derived from these eggs
contain cells that are genetically dissimilar, to a
degree that depends on chiasma frequency.

3. Nonextrusion of the second polar body and the
consequent presence of two haploid pronuclei, each
containing one of the two products of the second
meiotic division. Embryos derived from these eggs are
genetically identical to those of the preceding group,
although their developmental potential is different.

4. Nonextrusion of the second polar body resulting in 
a single diploid pronucleus containing both oocyte 
and second polar body chromosomes. Embryos 
derived from these eggs contain genetically identi- 
cal diploid cells. 

5. Nonextrusion of the first polar body, the resulting 
eggs being diploid or tetraploid, depending on 
whether the second polar body is or is not extruded. 

6. Pre- or postmeiotic endoreduplication resulting in 
diploid eggs. 

7. Complete suppression of meiosis, replaced by the
occurrence of two mitoses and giving rise to diploid
eggs.


Parthenogenesis may give rise to both males and
females. Offspring produced by parthenogenesis in
the absence of meiosis will all be female, except when
nondisjunction gives rise to XO males. Some other
insects carry cytoplasmically (maternally) inherited
symbionts, called parthenogenesis bacteria, that
prevent segregation of chromosomes in unfertilized
eggs. In this case, parthenogenesis is an adaptive
mechanism for the bacteria, increasing their frequency
by biasing the sex ratio towards the transmitting
(female) sex.

In mammals, spontaneously occurring cleavage in 
ovarian or tubal eggs has been described in many 
species, including humans. Within the ovary, par- 
thenogenesis is followed eventually by teratoma or 
teratocarcinoma formation. 

Only one-fourth of induced mouse parthenogenones
that implant develop as far as the somite stages, by
which time they show abnormal differentiation and
proliferation in both the embryonic and extraembryonic
lineages. Adult parthenogenones have never been
recorded in mammals, but parthenogenetic-normal
chimeras are viable in mice and humans.
The failure of X-inactivation in the trophectoderm,
the lack of expression of IGF, defects in stem cell
proliferation and cell fate, and the deficiency of
secondary trophoblast giant cells all contribute to the
defective development of parthenogenones.

In mammals, both the maternal and the paternal genomes
are required for normal embryonic development
because of genomic imprinting. Maternally inherited
metaphase chromosomes of parthenogenetic and of
normally fertilized mouse preimplantation embryos 
have the same pattern of methylation, which is very 
different from that of the paternally inherited set of 
chromosomes. The physical absence of interaction 
with a male genome as well as the absence of male 
transactivating factors may contribute to the defect- 
ive gene expression and development of partheno- 
genones. 


See also: Androgenone; Gynogenone 
Patau syndrome, trisomy 13, was first described by
the German-born American geneticist Klaus Patau in 
1960. It is a rare condition with an incidence of ap- 
proximately 1 in 12 000 livebirths. The risk of trisomy 
13 increases with advanced maternal age, but even at a 
maternal age of 40 years the absolute risk for a live- 
birth with trisomy 13 remains very low at 1 in 2000. 
The majority of trisomy 13 conceptions result in 
spontaneous abortion. 

The clinical features of trisomy 13 include growth 
retardation, holoprosencephaly, cleft lip and/or palate, 
cardiac malformations (80%), polydactyly or limb 
deficiency, omphalocoele, kidney malformations, a 
scalp defect, and severe mental retardation. The me- 
dian survival of affected infants is 2.5 days, and > 80% 
die within the first month. Only 5% survive to 6 
months. Children surviving longer than this are likely 
to be mosaic (i.e., have a percentage of cells with a 
normal karyotype in addition to the trisomic line). 

Patau syndrome may occur as a result of meiotic
nondisjunction resulting in a gamete with two
chromosome 13s rather than one. When this gamete fuses
with a normal gamete, the zygote has an additional
chromosome 13, giving a karyotype of 47,XY,+13 or
47,XX,+13. Sometimes the additional chromosome
results from the unbalanced product of a Robertsonian
translocation. Approximately 1 in 1000 people
carry a Robertsonian translocation, of which 75%
are rob(13;14). Carriers of a rob(13;14) translocation
have a small (1% or less) risk in each pregnancy of a
liveborn offspring with trisomy 13.
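Multiplying the two estimates quoted above gives a rough population frequency for rob(13;14) carriers. This is simple arithmetic on the figures in the text, not an independent estimate:

```latex
P(\text{rob}(13;14)\ \text{carrier}) \approx \frac{1}{1000} \times 0.75
  = 7.5 \times 10^{-4} \approx \frac{1}{1333}
```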

After a pregnancy affected by trisomy 13, the recur- 
rence risk for future pregnancies is low at <0.5%, pro- 
vided that neither parent is a carrier of a translocation. 


See also: Robertsonian Translocation; Trisomy 
Paternal inheritance refers to the transmission of any
attribute from a father to his offspring. Most patern- 
ally inherited traits can be explained by the inherit- 
ance of nuclear genes that are contributed by the male 
parent and expressed in his progeny. However, cyto- 
plasmic components, such as organelles in plants and 
centrosomes in animals, can also be paternally trans- 
mitted and can influence phenotype. Inheritance pat- 
terns that are exclusively paternal or paternally biased 
are interesting because they reveal asymmetries in the 
parental contributions to the embryo. Consequently, 
studies of paternal inheritance, as well as those of 
maternal inheritance, link genetics to reproductive 
and developmental biology. Paternal inheritance is 
also a medically important topic since several human 
disorders are paternally inherited, or exhibit paternal 
effects that affect the severity of disease symptoms. 


Paternally Inherited Nuclear Factors 


In sexually reproducing organisms, one half of the 
chromosome complement is paternally inherited and 
the other half is maternally inherited. With the ex- 
ception of the sex chromosome constitution, these 
two chromosome sets are usually functionally equiva- 
lent in the embryo. However, in some organisms, 
chromosomes are differentially imprinted during 
spermatogenesis and oogenesis, resulting in sperm- 
specific, or oocyte-specific modifications at the DNA 
or chromosomal protein level. These parent-of-origin 
modifications can lead to differences in the behavior 
of homologous chromosomes or differences in 
the expression of paternal and maternal alleles in the 
embryo. Particularly striking examples of a paternal
effect on chromosome behavior occur during normal
development in the fungus gnat Sciara and in scale
insects (coccids). In these species, both paternal and
maternal chromosome complements are contributed to the
embryo at fertilization. However, in half of the em- 
bryos developing from fertilized eggs, the paternal 
X chromosome of Sciara and all the paternally inherited 
chromosomes in coccids are either heterochromatized 
and rendered inactive, or are eliminated during cleavage 
divisions. In these species, the imprinting process 
that distinguishes paternal and maternal chromosomes 
and results in loss of paternally inherited chromo- 
somes also serves as a mechanism for sex deter- 
mination. 

Parent-of-origin imprinting effects are observed 
in mammals and flowering plants as the differential 
expression of paternal and maternal alleles in the 
embryo. There are several well-documented cases in 
mice, humans, and Arabidopsis of alleles that are active 
in the embryo when inherited from one parent, but 
inactive, or delayed in expression, when inherited
from the other parent. Oppositely imprinted loci tend 
to exist in clusters in the mouse genome and in corres- 
ponding regions in the human genome. These clusters 
were first identified in the mouse as regions that must 
be biparentally inherited to support normal develop- 
ment. Uniparental inheritance, due to inheritance of 
a chromosomal deletion from one parent or due to 
inheritance of both copies from a single parent (uni- 
parental disomy), results in abnormal phenotypes. The 
abnormalities can differ depending on whether the 
paternal or maternal copy is lacking. The importance 
of understanding paternal inheritance and imprinted 
loci in man is illustrated by the disease phenotypes 
associated with region 15q11-13 on human chromosome
15. Individuals who lack a paternal copy of
15q11-13 have a characteristic set of clinical features
that include obesity, mental retardation, and small 
stature, a condition known as Prader-Willi syndrome. 
Individuals who lack a maternal copy of 15q11-13 ex- 
hibit a distinct set of symptoms, known as Angelman 
syndrome. These examples of parent-of-origin im- 
printing effects reveal the nonequivalence of paternal 
and maternal nuclear contributions to the embryo. 

Male-specific chromosomes provide the most 
straightforward cases of strict paternal inheritance. 
In the XY/XX sex chromosome system of mammals, 
sex is a paternally inherited factor because of the 
action of a single Y-linked gene. In humans, this gene 
is SRY (sex-determining region Y), which encodes
a transcription factor required for testis determin- 
ation. The Y chromosome of many animals also con- 
tains a handful of genes that are essential for normal 
spermatogenesis. Hence these male fertility factors are 
paternally inherited from father to son, as are the 
disorders that result from deletions of these genes. In 
fact, Y chromosome deletions are surprisingly com- 
mon among human males, accounting for up to 2% 
of the clinically diagnosed cases of male infertility.
Candidate spermatogenesis genes have been identified 
within the deleted intervals. These genes include DAZ 
and RBMY, which encode testis RNA-binding pro- 
teins, DFFRY, which is involved in the ubiquitin modi- 
fication pathway, and DBY, a putative RNA helicase. 
Counterparts of these genes exist in the mouse con- 
firming that the Y chromosome plays a spermatogenic 
role in mice, and probably in other mammals also. 

Paternal effects on disease phenotypes are not 
restricted to Y-linked loci. There are paternal biases 
in the transmission of some neurodegenerative poly- 
glutamine repeat disorders. These diseases display 
anticipation, a process in which an autosomal domin- 
ant disorder becomes increasingly severe and displays 
earlier age of onset from one generation to the next. 
Huntington’s disease, spinocerebellar ataxia type
1 (SCA1), SCA3, SCA7, spinal and bulbar muscular
atrophy (SBMA), and dentatorubral-pallidoluysian
atrophy (DRPLA) all display anticipation and all
have greater polyglutamine tract expansion when 
passed through the male germline. The polyglutamine 
anticipation phenomenon may reflect the generally 
increased rate of mutation in the male germline rela- 
tive to the female germline. Males of a number of 
different species, including humans and Drosophila, 
show as much as a tenfold increase in mutation rate 
per generation when compared to females. The in- 
crease in mutation rate is thought to be the result of 
accumulated errors during DNA replication and 
possibly increased sensitivity to environmental
mutagens. The high mutation rate in males has also led to
the concept of male-driven molecular evolution. 
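A simple worked consequence of the tenfold figure, assuming an autosomal locus (each child receives one copy from each parent) and writing μ_m and μ_f for the per-generation mutation rates in males and females (symbols introduced here for illustration): the expected fraction of new mutations that arise in the father is

```latex
f_{\text{paternal}} = \frac{\mu_m}{\mu_m + \mu_f} = \frac{\alpha}{1+\alpha},
\qquad \alpha = \frac{\mu_m}{\mu_f} = 10
\;\Rightarrow\; f_{\text{paternal}} = \frac{10}{11} \approx 0.91
```

Under these assumptions roughly nine out of ten new mutations would be paternal in origin, which is the intuition behind male-driven molecular evolution.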


Paternally Inherited Cytoplasmic Components


Most organelles are thought to be maternally inherited
because of the small cytoplasmic volume of sperm.
Cytological studies have shown that the mitochondria
of mammalian sperm can enter the egg upon fertilization
but are destroyed via a ubiquitin-mediated process in the
egg cytoplasm. This process is thought to effectively
prevent the inheritance of paternal mitochondria. As
more sensitive techniques to detect trace amounts of 
mitochondrial DNA (mtDNA) are being used, there 
is increasing evidence for the leakage of paternal
mtDNA into offspring in animals, including mice and
humans. This changing view of the inheritance pattern
of organelles is most evident in plants. Prevailing views
have held that both chloroplasts and mitochondria are
exclusively maternally inherited. In fact, organelle
inheritance is far more complex in plants.
In the kiwi fruit, Actinidia deliciosa, chloroplasts 
(cp) are inherited paternally while mitochondria are 
maternally inherited. In bananas, there is a strong bias
toward transmission of paternal mtDNA and mater- 
nal cpDNA. In Nicotiana, Pisum, and Brassica, 
cpDNA is thought to be largely maternal, with per- 
haps low levels of paternal transmission. 

In contrast to mitochondrial inheritance, it has long
been known that the centrosome, which is needed for
sperm aster formation and the first mitosis, requires a
paternal contribution. The sperm’s role in centrosome
inheritance was first recognized by Boveri in 1883. It has
since been confirmed in mammals, birds, reptiles, 
amphibians, fish, invertebrates, and algae. The current 
view of the centrosome is that it is a complex, self- 
replicating organelle that requires the assembly of 
dozens, if not hundreds, of proteins. Whereas the central
organizing seed for the centrosome is provided by the
sperm centriole in the vast majority of animal species,
a fully functional organelle requires maternally pro- 
vided centrosomal proteins. This makes the centro- 
some in animals an organelle of biparental origin. An 
important exception to this rule is provided by the 
centrosome of rodents, which is derived only from 
maternal components. This feature makes rodents a 
problematic model for studies of human fertilization. 
Whether the centrosomes in plants also depend on 
paternally inherited components remains unknown. 


Paternal Contributions as Revealed by Paternal Effect Mutations

Mutations that show paternal effects on development 
can be used to study paternal inheritance and identify 
sperm-supplied components that are essential for 
embryogenesis. Paternal effect mutations are defined 
as mutations that when present in males affect the 
development of their offspring. As with maternal effect
mutations, the genotype of the parent is the most
critical genotype, since the paternal genes are
expressed during gametogenesis and affect the function
of sperm components in the embryo. In contrast
to the large number of known maternal effect muta- 
tions, only a few paternal effect genes have been iden- 
tified so far. The best characterized of these exist in the 
fruit fly Drosophila melanogaster and the nematode 
Caenorhabditis elegans because of the relative ease of 
performing genetic screens for paternal effect muta- 
tions and developmental genetic analyses. Many of the 
paternal effect gene products identified so far are 
sperm-specific products that are required for the 
earliest stages of embryogenesis. The findings sub- 
stantiate the idea that the sperm provides unique con- 
tributions to the embryo in these organisms. Further 
molecular analysis of paternal effect genes should 
be informative for evaluating the extent to which 
embryos of different species rely on paternally provided
products. In addition, the nature of the defects
induced by paternal effect mutations of Drosophila
suggests that flies provide a useful model system for
understanding paternal effects in humans.


Summary 


A broad view of paternal inheritance takes into 
account traditional modes of nuclear inheritance 
from father to offspring as well as nontraditional 
inheritance patterns of paternally imprinted loci and 
sperm cytoplasmic factors. Much remains to be 
learned about the specific molecular composition of 
paternal contributions to embryonic development and 
gene expression. In spite of the recent successes in 
animal cloning, it is clear that the sperm makes unique 
contributions to the embryo and these contributions 
may ensure that biparental modes of inheritance per- 
sist during the normal development of plants and 
animals. Genetic strategies to study paternal effects 
using model organisms such as the mouse, Drosophila, 
C. elegans, and Arabidopsis are relatively new ap- 
proaches that should provide new tools for advancing 
this field of research and broadening its applications to 
studies of human development and disease. 


See also: Imprinting, Genomic; Sex 
Determination, Human; X-Chromosome 
Inactivation 


Pathogenicity Islands 
D K R Karaolis 


Copyright © 2001 Academic Press 
doi: 10.1006/rwgn.2001.1646 


Bacterial pathogens are often clonal in that disease 
outbreaks or epidemics can be traced to distinct bac- 
terial cell lines (clones). In the last decade it has 
become clear that the virulence properties of patho- 
genic bacteria are often encoded by distinct virulence 
gene clusters. These virulence gene clusters have been 
termed ‘pathogenicity islands’ (or PAIs) and are typ- 
ically absent from nonpathogenic strains of the species. 
It is the presence of these PAIs that distinguishes a 
pathogenic strain from a commensal (nonpathogenic) 
strain of the species. This finding, together with many 
unusual genetic and phenotypic properties of PAIs, 
suggests that many PAIs were acquired from sources 
outside the current species via the process of horizontal 
gene transfer. Horizontal gene transfer is known to
contribute greatly to bacterial evolution and to the
emergence of new pathogens. This review discusses
the concept of PAIs and provides examples of their
role in the virulence of some important bacterial
pathogens.


The Pathogenicity Island Concept 


We have known for some time that there are differ- 
ences in the pathogenic potential between isolates and 
variants of the same species. These differences are
often due to the presence of genes encoding toxins,
adhesins, invasins, and factors that allow evasion of
the host defense system. Following the discovery of virulence
plasmids and toxin-converting phages in the 1950s and 
1960s it was accepted that virulence genes can be 
extrachromosomal and transferable between strains. 
In the 1980s it was found that the chromosomes of 
bacterial pathogens may carry clusters of virulence 
genes and that these clusters of genes were absent 
from nonpathogenic strains of the species. These viru- 
lence gene clusters were called pathogenicity islands 
(PAIs) (Hacker et al., 1990). 


Definition of a Pathogenicity Island 


PAIs are clusters of genes that possess most if not all of 
the following characteristics: 


• PAIs carry genes encoding one or more virulence
factors such as toxins, adhesins, invasins, iron
uptake systems, and type III and IV protein secretion
systems.

• PAIs are present in pathogenic strains but absent
from the genome of nonpathogenic strains of the
same species. PAIs may be present on the chromosome
or as part of a virulence plasmid.

• PAIs are large, with the DNA often spanning 10-200 kb
or more.

• PAIs often have a DNA composition that differs
considerably from the rest of the host genome,
particularly in percentage G + C content and codon
usage. This suggests horizontal gene transfer, that is,
that PAIs have been acquired by the strain from an
outside source.

• PAIs are often flanked by direct repeat (DR) DNA
sequences.

• PAIs are often associated with (inserted adjacent to)
tRNA genes. Interestingly, the 3’ regions of tRNA
loci are often the attachment sites for various
bacteriophages. The association of PAIs with tRNA
genes and the presence of phage-like integrase
genes on several PAIs suggest that many PAIs or
parts of PAIs may be derived from phage.

• PAIs usually contain cryptic or functional genes
encoding mobility factors such as integrases,
transposases, and insertion elements. The presence in
some PAIs of plasmid origins of transfer suggests
that some PAIs or parts of PAIs may be derived
from plasmids.

• PAIs often represent unstable regions of DNA.
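The G + C criterion lends itself to a simple computational screen. The Python sketch below scans a DNA string with a sliding window and flags windows whose G + C fraction deviates strongly from the mean over all windows. The window size, step, and z-score cutoff are arbitrary illustrative choices, not established parameters, and a flagged window is only a candidate: a real screen would also consider codon usage, adjacent tRNA genes, flanking direct repeats, and mobility genes.

```python
from statistics import mean, pstdev


def gc_fraction(seq: str) -> float:
    """Fraction of G or C among the unambiguous bases (A, C, G, T) of seq."""
    bases = [b for b in seq.upper() if b in "ACGT"]
    if not bases:
        return 0.0
    return sum(b in "GC" for b in bases) / len(bases)


def flag_gc_outliers(genome: str, window: int = 5000, step: int = 1000, z_cut: float = 2.0):
    """Return (start, gc) for sliding windows whose G+C fraction lies more
    than z_cut standard deviations from the mean over all windows."""
    starts = list(range(0, max(len(genome) - window, 0) + 1, step))
    gcs = [gc_fraction(genome[s:s + window]) for s in starts]
    mu, sd = mean(gcs), pstdev(gcs)
    if sd == 0:
        return []  # uniform composition: nothing stands out
    return [(s, gc) for s, gc in zip(starts, gcs) if abs(gc - mu) / sd > z_cut]


# Synthetic genome: 40 kb at 50% G+C, a 10-kb insert containing only A and T,
# then another 40 kb at 50% G+C.
genome = "ACGT" * 10_000 + "AATT" * 2_500 + "ACGT" * 10_000
for start, gc in flag_gc_outliers(genome):
    print(f"window at {start}: G+C = {gc:.2f}")
```

On this synthetic input only windows overlapping the inserted 10-kb segment (positions 40 000 to 50 000) are flagged.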


Diversity of Pathogenicity Islands 


PAIs are found in the genomes of various important 
human, animal, and plant pathogens. Interestingly, 
PAIs have not yet been found in bacterial species 
with natural competence that take up DNA via trans- 
formation, such as Streptococcus pneumoniae, Haemo- 
philus influenzae, and Neisseria meningitidis. 

PAIs of Salmonella (Mills et al., 1995) and diarrhea- 
genic E. coli (McDaniel et al., 1995) often appear to 
be stably maintained, whereas the PAIs of other 
pathogens such as Helicobacter pylori (Censini et al., 
1996) and Yersinia spp. (Fetherston et al., 1992) show a 
high frequency of deletion. Recently, evidence has
been provided showing that PAIs in Vibrio cholerae 
(Karaolis et al., 1998, 1999) and Staphylococcus aureus 
(Lindsay et al., 1998) can be mobilized by bacterio- 
phage. 


Examples of the Roles of PAIs 


Salmonella typhimurium contains several PAIs. The 
gene products of Salmonella pathogenicity island 1
(SPI-1) are necessary for the invasion of Salmonella
into epithelial cells (Mills et al., 1995). In contrast,
SPI-2-specific gene products are essential for
S. typhimurium survival within macrophages.

Epidemic cholera is a life-threatening diarrheal dis- 
ease caused by specific toxigenic strains of V. cholerae. 
All epidemic V. cholerae strains contain genes for 
cholera toxin and a PAI called the V. cholerae patho- 
genicity island (VPI) (Karaolis et al., 1998). The VPI 
contains genes encoding the type IV pilus toxin- 
coregulated pilus (TCP), which is an essential intes- 
tinal colonization factor (Taylor et al., 1987) and also 
acts as the receptor for the cholera toxin phage
(CTXΦ), which carries the cholera toxin (CT) genes
(Waldor and Mekalanos, 1996). This indicates that
there has been coevolution of the VPI and the CT
genes, which are located at a different locus on the
chromosome. Recent evidence suggests that the VPI is
itself encoded by a phage (Karaolis et al., 1998).


Pathogenicity Islands and Microbial Evolution of Pathogens


Point mutations, genomic rearrangements, and horizontal
gene transfer are essential components of microbial
evolution. It is, however, the acquisition and excision
of large genomic fragments that rapidly result in the
emergence of new pathogenic variants. Phages,
plasmids, and PAIs are associated with rapid
evolutionary change. It has been proposed that
the acquisition of PAIs by some pathogens might be 
due to defects in DNA-repair genes, which result in 
higher rates of mutation and recombination in that 
strain compared with a nonpathogenic strain. Follow- 
ing the transfer of phages, plasmids, or PAIs into new 
host cells, two genetic processes are important. First, 
there must be stabilization of the new genetic elements.
The high rate of mutation in the mobility genes
associated with PAIs, often producing stop codons,
might be a mechanism for stabilizing the PAI,
conserving the advantageous pathogenic phenotype
and limiting its loss from the strain. Second, there
must be optimal expression of the newly acquired 
DNA. For this to occur the PAI-encoded virulence 
genes need to be incorporated into the regulatory
network of the new host organism. Interestingly, the VPI
of V. cholerae carries not only virulence genes but also
genes that regulate expression both of VPI genes and
of the genes encoding CT, which are found at an
independent locus on the V. cholerae chromosome.


Future Work 


With the new age of microbial genomics, in which
complete genome sequences for nearly all bacterial
pathogens will become available, additional PAIs will
undoubtedly be discovered in bacterial pathogens.
Comparison of PAIs in various strains of the same 
species will most probably reveal important differ- 
ences in genetic structure and sequence. These 
differences might be involved in niche adaptation of 
that particular clone or they may be associated with 
differences in virulence factor expression that might 
be found between the strains. Elucidation of the mo- 
lecular mechanisms involved in the generation, acqui- 
sition, and evolution of PAIs will provide information 
about the evolutionary potential of pathogenic bac- 
teria. Studies on PAIs and the genes they encode may 
utilize a combination of molecular methods, cell biol- 
ogy, immunology, and appropriate animal model sys- 
tems in order to determine the roles of these genes 
in the disease process. The knowledge obtained from 
these studies will not only help us understand the 
disease process but may also be used for a variety 
of applications such as the development of suitable 
vaccines or therapeutic agents. 
Pattern formation is the process whereby organisms create spatially ordered and reproducible structures. Most cases of pattern formation involve multicellular organisms. Patterns can also be created in unicellular organisms such as ciliated protozoans, or by associations of prokaryotes such as swarming bacteria and the linear filaments of blue-green algae. Patterns can also be created in extracellular material, for example, in the striking shapes and patterns seen in mollusc shells.

The generation of pattern and form is one of the 
major problems in developmental biology: organisms 
must not only generate different cell types, they must 
also ensure that the different cells are correctly 
arranged in time and space. Patterns as visibly complex as those in a butterfly’s wing, an orchid’s flower, or a peacock’s tail present obvious challenges to understanding. They are perhaps less complex, however, than the elaborate three-dimensional architecture of the vertebrate brain.

As in other areas of developmental biology, genetic 
analysis has been very successful in dissecting some 
of the phenomena involved, though there is still a 
long way to go, and it would be fair to say that no 
single case of pattern formation is fully understood. 
Most cases of pattern formation seem to involve a 
complex interplay of regulated gene expression and 
modulation of cellular behavior, making it especially 
difficult to produce adequate explanations and models. 

Attention has focused on a number of experimental systems in which the patterns involved are simple and can be easily studied. These include:

- the embryonic axes of organisms such as Drosophila, Caenorhabditis elegans, and Xenopus;
- the insect eye, limbs, and segments;
- the nematode vulva;
- the chick limb;
- vertebrate segments; and
- retino-tectal projections from the eye to the brain in vertebrates.

In plants, suitable systems have been provided by flower formation and by shoot and root patterning. Some limited work has been carried out on more complicated patterns, such as those in butterfly wings. It suggests that no new principles are involved, just reiterations of the same mechanisms used to create patterns in simpler situations.

Many different effects have the potential to create 
spatial order in biological systems, and any one, or any 
combination, of these may be more or less important 
in a particular case of pattern formation. A partial list 
of these mechanisms and processes, together with 
some examples, is as follows: 


1. Self-assembly of large molecular structures. This is obviously important in generating form at the subcellular level (first illustrated in bacteriophage morphogenesis). It is also important in the generation of eukaryotic cell structures such as the lattice of myofilaments in muscle cells or the junctions between cells in an epithelial sheet. Structures of this type may even be used as templates for the reproduction of patterns, as seems to be the case in the inheritance of cortical patterns in protozoa.

2. Organized cell growth and cell architecture. For example, in plants the preferential elongation of cells along certain axes is a major determinant of form. Much of animal development involves successive changes in the shape and arrangement of epithelial sheets. These changes are largely caused by shape changes in the individual cells making up the sheets.

3. Control of cell proliferation. Cell multiplication is a prerequisite for any kind of tissue differentiation, but differential proliferation can also generate form. The amount of cell division in different parts of a developing embryo is usually carefully regulated. For example, in the Drosophila embryo, an initial phase of 13 cycles of general nuclear proliferation is followed by a switch to regulated division, in which only certain regions of the embryo continue to undergo mitosis. In plants, the placement and activity of meristems, which are the main sites of cell division, are major determinants of shape and form.

4. External cues. Initially symmetric arrangements of
cells may use external signals, such as gravity or 
light, in order to create a spatial pattern. The 
anterior-posterior axis of the chick blastoderm is 
specified using gravity as such a cue. 

5. Stochastic assignment of cell fate. In the absence of
symmetry-breaking signals, asymmetry may arise
in a set of cells simply by random fluctuations. 
Subsequent events and cell interactions can then 
reinforce small initial differences. Equipotential 
cells that become different as a result of a stochastic 
choice are seen repeatedly in the cell lineage of the 
nematode C. elegans. 

6. Asymmetric cell divisions. Patterning in the early
embryos of many invertebrates, such as nematodes 
and molluscs, involves asymmetric cell divisions, 
resulting in daughter cells of unequal size. Such 
divisions necessarily create polarity and permit 
unequal distribution of cellular materials. 

7. Control of cell division axes. The orientation of the mitotic spindle in a dividing cell determines the spatial arrangement of its daughters: anterior/posterior, dorsal/ventral, or left/right. This control is important on many occasions during development. It has been studied mechanistically in greatest detail in the early embryo of C. elegans.

8. Demarcation of fields of cells, as in the formation of compartments. This is seen most conspicuously in insect development and in the formation of rhombomeres in the vertebrate brain.

9. Establishment of boundaries between these fields. Formation of boundaries is an essential step in two respects. It delimits a morphogenetic field within a compartment. It also exploits the confrontation of two cell fields to create a defined line of origin for the emission of diffusible signals. Compartment and segment boundaries in Drosophila have been studied extensively in these respects.

10. Short-range signaling between cells, involving
direct contacts or very short-range signals, acting 
over one or two cells. Inductive signaling between 
cells is a major strategy for creating form. It has 
been studied intensively in systems such as the 
induction of the nematode vulva, or the sequential 
inductions occurring during the differentiation of 
the Drosophila ommatidium. Both of these depend 
on molecules related to vertebrate growth factors, 
and their corresponding receptors. 

11. Long-range signaling between cells or nuclei, involving diffusible morphogens. The establishment of pattern over longer distances, 10 cell diameters or more, has been suspected to depend on diffusible molecules organized in gradients. In principle, responding cells can use the different levels of such morphogens to determine where they are located relative to the source of the gradient. This positional information can then determine the pathway of differentiation pursued. A morphogen gradient of this type was first convincingly demonstrated in the case of bicoid protein, which dictates part of the anterior patterning in the Drosophila embryo. There is circumstantial evidence for morphogen gradients in many events during vertebrate development, most notably in limb formation.

12. Lateral inhibition. Interactions between neighboring cells that prevent both from adopting the same fate are very important in creating spaced patterns. An example is the well-studied arrangement of bristles on the surface of insects. A family of conserved receptors (the NOTCH/LIN-12 family) and their ligands, originally defined in Drosophila and C. elegans, appear to be particularly important for this purpose. They are used many times in different contexts throughout development in a variety of animal species.

13. Sorting of cells by means of differential cell affinities. In principle, pattern can be generated simply by randomly assigning different fates within a population of cells and then allowing the cells to sort out by differential adhesion. A mechanism more or less like this seems to be used during the development of slime molds. There, cells within undifferentiated mounds become committed to one of two fates, prespore or prestalk. The two cell types then sort out to create a bipartite structure, which ultimately differentiates into the mature stalk and spore-containing fruiting body. Differential adhesion is probably also important on many occasions during animal morphogenesis, where it prevents the mixing of cell types and straightens the borders between cell fields.

14. Programmed cell death (apoptosis). A tissue can be sculpted by the selective death of some cells and the decay or removal of their corpses. Extensive cell death is responsible for the separation of digits in tetrapod limb development, and for some events in plant morphogenesis, such as the formation of holes in rubber plant leaves.
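The idea that cells read their position from morphogen levels (item 11 above) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not a model of any particular system. It assumes a steady-state exponential gradient, as produced by diffusion from a localized source with uniform degradation. It also assumes a hypothetical two-threshold readout in the style of the classic ‘French flag’ model. The functions `concentration`, `fate`, and `pattern`, all parameter values, and the fate labels are invented for illustration.

```python
import math

def concentration(x: float, c0: float = 1.0, decay_length: float = 0.2) -> float:
    """Morphogen level at relative distance x (0 = source, 1 = far end).

    Assumes an exponential steady-state profile; c0 and decay_length are
    hypothetical illustration values, not measured constants."""
    return c0 * math.exp(-x / decay_length)

def fate(level: float, high: float = 0.5, low: float = 0.1) -> str:
    """Hypothetical two-threshold readout giving fates 'A', 'B', or 'C'."""
    if level >= high:
        return "A"
    if level >= low:
        return "B"
    return "C"

def pattern(n_cells: int = 10, c0: float = 1.0) -> list[str]:
    """Assign a fate to each cell in a row according to the level it sees."""
    return [fate(concentration(i / (n_cells - 1), c0=c0)) for i in range(n_cells)]

print(pattern())        # ['A', 'A', 'B', 'B', 'B', 'C', 'C', 'C', 'C', 'C']
print(pattern(c0=2.0))  # ['A', 'A', 'A', 'B', 'B', 'B', 'C', 'C', 'C', 'C']
```

Doubling the source level shifts each fate boundary away from the source. In a threshold model of this kind, therefore, the final pattern depends on the amplitude of the gradient as well as on its shape.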




See also: Apoptosis; Cell Division Genetics; 
Cell Lineage; Compartmentalization 
The mechanisms that control embryonic development are highly conserved among different organisms such as Drosophila, mice, and nematodes. Classical and molecular genetic analysis has led to the identification of so-called developmental control genes, which are shared between different species, including mammals. Among these, the murine paired-box-containing genes (Pax) were identified on the basis of their sequence homology to Drosophila segmentation genes. This article discusses the role of Pax genes in embryogenesis, organogenesis, and cell proliferation and differentiation.


Protein Structure 


In mouse and human, nine members of the Pax gene family have been identified so far. They share a common motif, the paired domain. This is a DNA-binding domain of 128 amino acids located at the amino-terminal end. The paired domain has been highly conserved during evolution, and paired-box-containing genes are found in other species, including Drosophila, rat, chicken, and zebrafish.

Distinct classes of Pax genes are defined by the presence or absence of two features in addition to the paired domain: a paired-type homeobox and an octapeptide coding region. Accordingly, Pax genes of the same class or subgroup share a similar protein structure, a common genomic organization, and related expression patterns during development. Pax proteins are transcription factors: they bind DNA in a sequence-specific manner to regulate transcription.


Expression during Development 


Pax Genes in the Central Nervous System

All Pax genes except Pax1 and Pax9 are expressed in various restricted domains throughout the developing neural tube. These expression territories display a complementary pattern along both the rostro-caudal and the dorso-ventral axes. Unlike Hox genes, Pax genes are detected not only in the spinal cord and hindbrain but also in more rostral domains of the brain:

- Pax6 in the telencephalon;
- Pax3 and Pax7 in the mesencephalon;
- Pax2, Pax5, and Pax8 at the midbrain-hindbrain junction.

In the spinal cord, Pax3, Pax6, and Pax7 are expressed in mitotically active cells prior to neural differentiation. In contrast, Pax2, Pax5, and Pax8 first appear in two longitudinal columns of the intermediate gray matter, on both sides of the sulcus limitans.


Pax Genes in the Paraxial Mesoderm 

The paraxial mesoderm arises from the primitive streak. The initially uncompartmentalized epithelial somite undergoes several morphological changes. It then differentiates into a ventral mesenchymal part, the sclerotome, and a dorsal epithelial compartment, the dermomyotome. Pax1 and Pax9 are confined to the sclerotome, while Pax3 and Pax7 are found in the dermomyotome. This reflects a complementary pattern of expression along the dorso-ventral axis of the differentiating somite. Distinct Pax genes share overlapping expression territories in the mesoderm and other tissues. This overlap reflects their subdivision into subgroups and may argue for functional redundancy between members of the corresponding subgroup.


Pax in Organogenesis 

Pax genes exhibit dynamic expression patterns during ontogenesis in a large variety of tissues derived from all germ layers. Pax genes are detected at early steps of organogenesis and seem to define very specific regions. At early stages of eye formation, Pax2 and Pax6 share overlapping domains of expression in the optic vesicle, which gives rise to the eye. Pax proteins are detected in the following tissues:

- Pax1 and Pax3: the developing thymus;
- Pax9: the parathyroid;
- Pax2 and Pax8: the kidney;
- Pax2: the eye and the inner ear;
- Pax8: the thyroid gland;
- Pax4: the pancreas;
- Pax6: the eye and the pancreas;
- Pax5: B cells and testis.


Mutations, Phenotypes, and Function 


Restriction of Differentiation Boundaries? 

Direct evidence for the functional significance of Pax genes has come from correlating mouse developmental mutants and human diseases with mutations in certain Pax genes:

- Pax1 is mutated in undulated (un) mice, leading to defects of the intervertebral discs, structures derived from the medial sclerotome compartment.
- Pax3 mutations are found in Splotch (sp) mice and in human Waardenburg syndrome, in which malformations in neural crest derivatives and skeletal muscle are observed.
- Pax6 is mutated in small eye (sey) mice and rats, in human aniridia, and (as its ortholog eyeless) in Drosophila, all of which display eye defects. This correlates closely with the early expression of Pax6 during eye formation.

Most of these mutations act in a dominant manner even though they are loss-of-function mutations. This indicates that the dosage of Pax proteins is critical. The resulting developmental defects also show that Pax genes are important players in embryogenesis. Gain-of-function mutations or deregulated expression of Pax genes, however, lead to oncogenesis.

In the brain, researchers have compared the expression domains of Pax, forkhead, Wnt, engrailed, and other homeobox genes with sites of neuronal differentiation. This comparative analysis suggests that some Pax proteins are morphoregulators of brain development.

In the Pax6-deficient mutant sey, the boundary between the ganglionic eminence and the cortex is not defined. This is probably due to the loss of expression of R-cadherin, a cell adhesion molecule normally detected in the cortex. In the spinal cord of Pax3/Pax7 double-mutant embryos, ventral interneurons extend into the dorsal part, suggesting that Pax3 and Pax7 are required to restrict ventral neuronal identity. In Pax2 mutant mice a severe eye coloboma occurs: the developing outer pigmented layer and neural retina extend into the Pax6-expressing domain, and the glial cells surrounding the optic nerve fail to differentiate. All these phenotypes suggest that Pax genes are involved in restricting boundaries of differentiation.


Cell Differentiation and Oncogenesis

Pax4- and Pax6-deficient mice are devoid of insulin-producing β-cells and glucagon-secreting α-cells, respectively. In the thyroid gland, Pax8 is necessary for the formation of the follicular cells that produce thyroxine, and Pax8 mutant mice suffer from hypothyroidism.

Deregulated expression of Pax genes can transform fibroblasts in vitro, and the transformed cells form tumors when implanted in nude mice. Consistent with this, the expression of Pax2 and Pax8 is abnormally upregulated in Wilms’ tumor, a pediatric renal tumor. Similarly, deregulated expression of Pax5 has been reported in human malignant astrocytomas, large-cell lymphomas, and medulloblastomas. In addition, chromosomal translocations found in alveolar rhabdomyosarcoma lead to the expression of an in-frame fusion protein. This protein joins Pax3 or Pax7 to FKHR, another transcription factor of the winged helix (forkhead) family.

All these phenotypes point to a function of Pax genes in very early steps of cell differentiation. Analysis of Pax5-deficient mice supports the hypothesis that Pax genes may have a dual function in this process: activating a particular differentiation potential while inhibiting inappropriate lineages. It is also conceivable that Pax genes act on cell proliferation and/or survival.




See also: Developmental Genetics; Embryonic 
Development of the Nematode Caenorhabditis 
elegans 
pBR322 is one of the standard plasmid cloning vectors.


See also: Cloning Vectors 


PCR 


See: Polymerase Chain Reaction (PCR) 


Pedigree Analysis 


J M Connor 


Copyright © 2001 Academic Press 
doi: 10.1006/rwgn.2001.0964


Pedigree analysis is the interpretation of information displayed as a family tree. The family tree, or pedigree, is constructed using a standardized set of symbols and includes information about the disease status of each individual. If only a single individual in the family is affected, the pedigree cannot in itself prove a particular mode of inheritance. Nor can it distinguish inherited from noninherited conditions. When more than one individual is affected, the pattern may provide important clues to, or even proof of, the mode of inheritance. There are four main patterns of inheritance that may be seen in a pedigree.

A ‘vertical’ pedigree is the term used when a trait or disease is passed down through several generations, directly from an affected individual to affected descendants in successive generations. Such vertical transmission is typical of autosomal dominant inheritance. It can also be seen in X-linked dominant inheritance, mitochondrial inheritance, inherited chromosomal imbalances, and nongenetic situations (such as infective agents).

In autosomal dominant inheritance, both sexes can be affected and, in turn, transmit the trait to both males and females. In X-linked dominant inheritance, both sexes can be affected. Affected females can transmit the trait to both sons and daughters, but affected males transmit it to all of their daughters and none of their sons. In mitochondrial inheritance, both sexes can be affected, but males do not transmit the trait. Females transmit it to all of their offspring (although not all may be clinically affected).
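The transmission rules just described for an affected parent can be tabulated in code. This is a minimal sketch resting on several simplifying assumptions. The affected parent is heterozygous in the autosomal dominant and X-linked dominant cases. The other parent is unaffected. Penetrance is ignored, so the numbers describe inheriting the trait, not clinical expression. The function name `transmission_probability` and the mode labels are hypothetical.

```python
# Probability that a child inherits the trait from ONE affected parent, under
# the modes of vertical transmission described in the text.
# Assumptions (hypothetical simplification): the affected parent is heterozygous
# for autosomal and X-linked dominant traits, the other parent is unaffected,
# and penetrance is ignored (inheriting is not the same as being affected).

def transmission_probability(mode: str, parent_sex: str, child_sex: str) -> float:
    if parent_sex not in ("male", "female") or child_sex not in ("male", "female"):
        raise ValueError("sex must be 'male' or 'female'")
    if mode == "autosomal_dominant":
        # Either sex transmits to either sex; a heterozygote passes the allele half the time.
        return 0.5
    if mode == "x_linked_dominant":
        if parent_sex == "male":
            # A father passes his X to every daughter and his Y to every son.
            return 1.0 if child_sex == "female" else 0.0
        # A heterozygous mother passes the affected X to half of her children.
        return 0.5
    if mode == "mitochondrial":
        # Mitochondria are inherited through the mother only.
        return 1.0 if parent_sex == "female" else 0.0
    raise ValueError(f"unknown mode: {mode}")
```

For example, `transmission_probability("x_linked_dominant", "male", "male")` returns 0.0. This matches the observation that affected males transmit an X-linked dominant trait to none of their sons.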

A ‘horizontal’ pedigree is the term used when a trait or disease affects only family members in the same generation. This pattern is typical of autosomal recessive inheritance. It can also be seen in X-linked recessive disorders, autosomal dominant disorders with incomplete penetrance, chromosomal translocations, and nongenetic situations.

In autosomal recessive inheritance, both sexes can be affected in a sibship (brothers and sisters), and disease severity is similar in males and females. Parental consanguinity (parents who are blood relatives) would be a further clue to an autosomal recessive condition. In an X-linked recessive condition, only brothers are affected. In the absence of other affected male relatives (see below), this would mimic an autosomal recessive pedigree. Affected brothers and sisters with unaffected parents might also be seen if one parent has an autosomal dominant condition but is clinically unaffected because of nonpenetrance or gonadal mosaicism.

A ‘knight’s move’ pedigree is the term used when a
trait or disease only affects males in a family and where
affected males are related via outwardly normal 
females. Thus, for example, an affected boy may have 
an affected maternal uncle or affected maternal male 
cousins. The intervening females are usually clinically 
normal but are carrying the faulty gene. This pedigree 
pattern is typical of X-linked recessive inheritance. 
Males have only a single X chromosome and thus are 
affected by mutations in genes on the X, whereas the 
intervening females have a normal copy of the gene on 
their other X chromosome and are not usually affected. 
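The ‘knight’s move’ pattern follows from how a single X-linked allele segregates. The sketch below enumerates the four equally likely combinations of gametes from a carrier mother and an unaffected father. The allele labels `X` and `Xm` and the function names are hypothetical. New mutations and the occasional affected carrier female are ignored.

```python
from collections import Counter
from itertools import product

# Hypothetical labels: "X" = normal X chromosome, "Xm" = X carrying the mutation.
MOTHER_GAMETES = ["X", "Xm"]  # carrier mother, clinically normal
FATHER_GAMETES = ["X", "Y"]   # unaffected father

def classify(maternal: str, paternal: str) -> str:
    """Describe a child from the maternal and paternal gametes."""
    if paternal == "Y":
        # Sons have a single X, inherited from the mother.
        return "affected son" if maternal == "Xm" else "unaffected son"
    # Daughters receive a normal X from the father, which masks the mutation.
    return "carrier daughter" if maternal == "Xm" else "noncarrier daughter"

def offspring_outcomes() -> dict:
    """Probability of each outcome, treating all gamete pairings as equally likely."""
    counts = Counter(classify(m, f) for m, f in product(MOTHER_GAMETES, FATHER_GAMETES))
    total = sum(counts.values())
    return {outcome: n / total for outcome, n in counts.items()}

print(offspring_outcomes())
```

Each of the four outcomes has probability 1/4. Half of the sons are therefore expected to be affected and half of the daughters to be clinically normal carriers. These carrier daughters are how the disorder passes through females to males in later generations.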


A ‘nonspecific’ pedigree is the term used when a trait or disease affects more than one individual but the pattern does not conform to any of the above three patterns. This might be caused by multifactorial inheritance, chance clustering of common disorders, environmental factors, autosomal dominant inheritance with low penetrance, or a chromosomal translocation.

In multifactorial inheritance, the risk of recurrence in close relatives is increased above the general population risk. A family history of other affected individuals is therefore more likely. The pattern of involvement is, however, not typical or diagnostic.

Similarly, with common disorders there may be a family history by chance alone. One in three people develop cancer at some stage in their lives, and thus it is not uncommon to see a family history of cancer on a purely chance basis. Single-gene forms of cancer need to be excluded if the same type of cancer is involved, especially if there is a young age of onset, involvement of multiple sites, or more than two affected relatives.
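The point about chance clustering can be made quantitative with a simple binomial model. As a deliberate simplification, the sketch below assumes that each relative independently has the one-in-three lifetime risk quoted above. It ignores age, shared environment, and inherited predisposition, so it gives only a rough baseline. The function name `prob_at_least` is hypothetical.

```python
from fractions import Fraction
from math import comb

def prob_at_least(k: int, n: int, p: Fraction) -> Fraction:
    """Probability that at least k of n relatives are affected, assuming each is
    affected independently with probability p (binomial model)."""
    return sum(
        (comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1)),
        Fraction(0),
    )

LIFETIME_RISK = Fraction(1, 3)  # "one in three people", as stated in the text

risk = prob_at_least(2, 6, LIFETIME_RISK)
print(risk)  # 473/729, i.e. roughly 0.65
```

Under these assumptions, a family with six relatives would show two or more cancers by chance alone about two times in three.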


See also: Autosomal Inheritance; Consanguinity; 
Genetic Counseling; Genetic Diseases; 
Mitochondrial Inheritance; Mosaicism in 
Humans; Multifactorial Inheritance; Oncogenes; 
Penetrance; Sex Linkage; Vertical Transmission 
The condition where sexual reproduction occurs in
the immature (e.g., larval) organism; compare with
neoteny.


See also: Neoteny 
Penetrance is the conditional probability of observing a corresponding phenotype given a specific genotype. Typically, it refers to the proportion of individuals with a mutant genotype that display the associated phenotype. Penetrance may vary from 0 to 1. When less than 100% of a population with the identical mutant genotype displays the associated phenotype, that mutation is said to be ‘incompletely penetrant.’

Penetrance is related in meaning to ‘expressivity,’ and the two terms are often used together when describing mutations. For example, certain weak alleles of the W locus in mice result in white coat color spots. These mutant alleles are said to show reduced penetrance and variable expressivity. The two terms differ as follows. Penetrance refers to whether individuals carrying the genotype show the phenotype at all. Expressivity refers to the degree to which the phenotype is expressed in those that do. In this example, only some of the mice that carry the W/+ genotype show any spots at all. This is an example of reduced penetrance. Of the animals that show the spotted phenotype, however, some show much spotting while others show very little. This is an example of variable expressivity.
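The two quantities in the W/+ example can be computed separately from the same observations. The sketch below uses invented, purely hypothetical spotting scores for ten mice of one genotype. Penetrance is estimated as the fraction of carriers showing any phenotype. Expressivity is summarized as the spread of severity among carriers that do show it. The function names and the threshold parameter are hypothetical.

```python
from statistics import mean

# Hypothetical data: fraction of coat covered by white spots in ten mice of
# the same W/+ genotype (0.0 = no spots). Invented for illustration only.
spotting = [0.0, 0.0, 0.05, 0.30, 0.0, 0.02, 0.60, 0.0, 0.10, 0.0]

def penetrance(scores, threshold=0.0):
    """Estimated P(phenotype | genotype): share of carriers showing any phenotype."""
    return sum(score > threshold for score in scores) / len(scores)

def expressivity(scores, threshold=0.0):
    """Spread of severity among carriers that do show the phenotype."""
    affected = [score for score in scores if score > threshold]
    if not affected:
        return None  # nothing expressed, so expressivity is undefined
    return {"min": min(affected), "max": max(affected), "mean": mean(affected)}

print(penetrance(spotting))  # 0.5
print(expressivity(spotting))
```

Half of these hypothetical carriers show spots (reduced penetrance), and among those the spotted fraction ranges from 0.02 to 0.60 (variable expressivity).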

Penetrance is sometimes used in a narrower sense to describe the probability of being affected by a disease, given the presence of a certain disease-predisposing allele. In principle, the penetrance of a disease-susceptibility allele is the fraction of individuals carrying the allele who are affected. In practice, this is often very difficult to estimate. It is hard to collect a population of susceptible individuals and determine in an unbiased way what fraction are affected. For complex diseases, the task is further confounded by factors such as age, genetic background, and phenocopies (cases that resemble the affected state but are nongenetic in origin).

The phenomena of reduced penetrance and variable expressivity have a similar root cause. The phenotypic effects of a specific gene are highly contingent on the environmental conditions that exist during the development of an organism and during maturity. They also depend on other modifier genes in the same developmental or physiological pathway. Hence, variation in the environment and in modifier loci among individuals in a population may alter the phenotypic effects of a specific gene or mutation. The result is reduced penetrance and variable expressivity.


See also: Expressivity; W (White Spotting) Locus 
A peptide bond is the amide bond formed when the carboxyl group of one amino acid becomes linked to the amino group of another to form a peptide. (A water molecule is lost during formation of the peptide bond, and the basic amino acid unit in a protein chain is therefore referred to as an amino acid residue.) The oxygen atom of the carbonyl group involved in the bond is in the trans position with respect to the hydrogen on the bonded nitrogen atom. The peptide group (-CO-NH-) has partial double-bond character, which results from resonance and keeps these four atoms planar.

Peptide bonds link all the amino acid residues 
together in a polypeptide chain and form the very 
regular backbone of the chain. This regular linkage 
means that every polypeptide has a free amino group 
on the amino acid residue at one end of the chain (the 
N-terminus) and a free carboxyl group on the amino 
acid residue at the other end (the C-terminus). 
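Two consequences of the chemistry above can be checked arithmetically. A linear chain of n residues contains n - 1 peptide bonds. Each bond releases one water molecule, so the chain's mass is the sum of its free amino acids minus one water per bond. The sketch below is a minimal illustration assuming rounded average masses for glycine (G), alanine (A), serine (S), and water. The helper names are hypothetical, and modifications such as disulfide bonds are ignored.

```python
# Approximate average masses (Da) of three free amino acids and of water.
# Values are rounded standard figures; extend the table for other residues.
FREE_AMINO_ACID_MASS = {"G": 75.07, "A": 89.09, "S": 105.09}
WATER_MASS = 18.02

def peptide_bond_count(sequence: str) -> int:
    """A linear chain of n residues contains n - 1 peptide bonds."""
    return max(len(sequence) - 1, 0)

def peptide_mass(sequence: str) -> float:
    """Average mass of a linear peptide written N-terminus to C-terminus.

    Each peptide bond releases one water molecule, so the chain weighs the sum
    of its free amino acids minus one water per bond (modifications ignored)."""
    if not sequence:
        raise ValueError("sequence must contain at least one residue")
    total = sum(FREE_AMINO_ACID_MASS[aa] for aa in sequence)
    return total - peptide_bond_count(sequence) * WATER_MASS

print(round(peptide_mass("GA"), 2))   # 146.14
print(round(peptide_mass("GAS"), 2))  # 233.21
```

Equivalently, each internal residue contributes its free mass minus one water. On this basis a glycine residue weighs about 57.05 Da, and the free amino and carboxyl groups at the two termini account for the one water's worth of mass left over.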


See also: Amino Acids; Polypeptides 
The perinuclear space is the gap between the inner and
outer membranes of the nuclear envelope (approx. 
10-40 nm wide). 


See also: Nucleus 
Permissive cells are cells that can support the growth of a virus. A number of conditions must be met if a virus is to replicate successfully in a host cell. The cell must first display the correct proteins on its outer surface for the virus to adsorb. If the cell has an altered receptor protein for a specific virus, the virus will be unable to attach to the cell. The cell may still be permissive for other strains of viruses that use other means to enter the cell. Likewise, the release of newly replicated virus particles from the host cell can be affected by host cell mutations.


See also: Virus 
Petite strains of yeast lack mitochondrial function. These mutants grow slowly and rely on fermentation rather than respiration. The mitochondria present have reduced cristae and are functionally defective.


See also: Mitochondria 


P-Glycoprotein 
M A Barrand 


Copyright © 2001 Academic Press 
doi: 10.1006/rwgn.2001.1607 


P-glycoprotein (Pgp; now ABCB1) is one of the most extensively studied members of the superfamily of ATP-binding cassette (ABC) transporters. These transporters are found in prokaryotes and in eukaryotes, including plants and animals. Using the energy released by ATP hydrolysis, ABC transporters move substances across membranes. These substances include ions, sugars, amino acids, phospholipids, peptides, toxins, and antibiotics. ABC transporters can thus affect the distribution of molecules at the subcellular, cellular, and tissue levels. Pgp is remarkable in that it can transport a wide range of substrates with differing structures, including many lipophilic anticancer drugs.

Pgp was first identified by Juliano and Ling in 1976 in cultured mammalian tumor cells that had been exposed to a cytotoxic drug. Over time, these cells had developed resistance not just to the original selecting drug but to a range of different drugs, i.e., ‘multidrug resistance.’ Membranes from the original sensitive cells were compared with those from the resistant cells. This comparison revealed a 170 kDa protein present only in the resistant cell membranes. The gene was then cloned and sequenced. Transfection experiments showed that the presence of this protein at the cell surface could bring about efflux of a number of different drugs. It thus prevented these toxic agents from reaching their intracellular target sites and so conferred resistance on the transfected cells.

Since then, orthologs of Pgp have been identified in many different species. Pgps are present not only in tumor cell lines but also in many normal tissues. There, their ability to expel toxic material affects the pharmacokinetics of many therapeutic drugs, including anticancer agents and drugs used against AIDS. It also plays an important part in protecting healthy cells and eliminating toxins from the body.

The presence of Pgp on tumor cells in some cancers can influence the extent of drug access. It may be one of the factors contributing to the clinical resistance seen to anticancer chemotherapy. Because blockade of Pgp activity may be of therapeutic benefit, much research has been directed towards three goals:

- defining how Pgp expression is regulated;
- understanding its mechanisms of action;
- identifying suitable inhibitors or other appropriate strategies for overcoming Pgp-mediated resistance.


Relationship to Other ABC Transporters 


Pgps show structural similarity to other mammalian ABC transporters, including the cystic fibrosis transmembrane conductance regulator (CFTR) and the sulfonylurea receptor (SUR1). Members of the Pgp family can also be found in lower organisms, e.g., the malaria parasite Plasmodium.

The basic domain organization is as shown in Figure 1A. The molecule has two halves joined by a linker region. Each half contains six transmembrane spanning segments (TMs) and an ATPase site or nucleotide-binding domain (NBD). The transmembrane topology was initially deduced from hydropathy plots. It was later verified using other genetic approaches, including cysteine-scanning mutagenesis and epitope insertion. The exact number of TMs remains controversial. The linker region contains several putative phosphorylation sites, though it is still unclear whether these are involved in modulating activity.

Two genes, MDR1 and MDR3, encode the human Pgps; they lie adjacent to each other on chromosome 7. Three isoforms of Pgp (mdr1a, mdr1b, and mdr2) are present in rodents. The MDR1/mdr1 gene products are the ones mainly implicated in drug transport. The MDR3/mdr2 gene products are more restricted, functioning predominantly in the movement of phospholipids. Although some genetic polymorphism has been identified, the primary sequence of the MDR1 gene is highly conserved. Splice variants have not been reported for MDR1 Pgp. They have, however, been noted in a number of other ABC transporters, e.g., MDR3 Pgp (now ABCB4) and members of the more distantly related multidrug-resistance-associated protein family (MRP, now ABCC1-6).


Substrate Profiles and Binding 


Figure 1 (A) Predicted secondary structure of human MDR1 P-glycoprotein showing the nucleotide-binding domains (NBDs), the transmembrane spanning regions (TMs), and the linker region joining the two halves of the molecule. Three glycosylation sites are present on the first extracellular loop. (B) Possible actions of P-glycoprotein. Lipophilic substances diffusing across the cell membrane into the cytoplasm may be extracted from the lipid phase by Pgp and expelled to the exterior. This is fuelled by ATP hydrolysis, involves conformational changes, and requires both halves of the molecule.


The MDR1 Pgp can transport many different natural product compounds, including lipophilic weak bases such as doxorubicin and vincristine, and neutral polycyclic molecules such as taxol and colchicine. Some cyclic (valinomycin) and synthetic linear (the HIV protease inhibitor saquinavir) peptides, as well as several lipophilic cationic fluorescent dyes (Hoechst 33342, rhodamine 123), can also be transported. This ability to
interact with many structurally dissimilar compounds 
raises questions about the selective nature of the inter- 
actions. Yet within each class of substrates, some 
structure-activity relationships can be seen which 
provide clues about the features necessary for interaction. Genetic approaches involving site-directed mutagenesis have identified transmembrane segments TM5, TM6, TM11, and TM12 as the probable location where drugs bind within Pgp. Both N-terminal and C-terminal halves seem to contribute to substrate binding. There is some controversy about the number and nature of the binding sites, which may be different, overlapping regions of a single flexible binding site large enough to accommodate more than one compound. This site may be located close to the cytoplasmic face of the molecule and be accessible from the inner leaflet of the plasma membrane.


Normal Tissue Distribution and 
Physiological/Pharmacological 
Relevance 


The human MDR1 Pgp is constitutively expressed in 
many normal cell types, e.g., proximal tubule epithe- 
lium, mucosal cells of small and large intestine, adrenal 
cortical cells, pancreatic duct cells, and endothelial 
cells lining capillaries of the brain and testes. The 
transporter is located at the luminal face of cells in these tissues and is thought to play a protective role, preventing unwanted substances from crossing the intestine or the blood-brain or blood-testis barrier, and facilitating excretion of xenobiotic substances and endogenous metabolites into the urine and bile. The significance of Pgp in adrenal function
and in some hematopoietic cell types (CD34+ and 
natural killer (NK) cells), though less clear, may relate 
to transport of certain endogenous substances. In 
rodents, the two Pgp isoforms equivalent to human MDR1 show tissue-specific expression, with mouse intestine and brain capillaries containing only the mdr1a isoform and adrenal and placenta predominantly the mdr1b isoform. The human MDR3 Pgp and its rodent equivalent encoded by the mdr2 gene are present mainly on the canalicular membranes of hepatocytes and appear to be involved in excretion of phosphatidylcholine into the bile. Studies in mice in which the mdr1a and/or mdr1b genes have been inactivated by insertional mutagenesis indicate that neither viability, fecundity, nor life span is affected by loss of either one or both of these genes. However, complete loss of Pgp from the gut and brain capillaries has profound effects on drug distribution, with greater oral availability and increased brain penetration of several drugs known to be Pgp substrates.


Pgp in Cancers and Regulation of MDR 
Gene Expression 


Tumors arising from cells that normally express Pgp may show ‘intrinsic’ resistance to anticancer drugs. Following chemotherapy, Pgp can also appear in tumors derived from cells that do not normally express it; these are said to show ‘acquired’ resistance. Gene amplification may be responsible for acquired resistance in cultured cells, but in vivo this is rare. Increased MDR1 gene expression can also occur without gene amplification, through both transcriptional and posttranscriptional mechanisms. Increased mRNA stabilization, for instance, may contribute to elevated MDR1 mRNA levels, and increased activity of the MDR1 promoter has been noted in response to several stressors, including anticancer agents, DNA-damaging agents, heat shock, serum starvation, and UV irradiation. Genetic
approaches involving expression of portions of the 
MDR1 promoter linked to a reporter gene have been 
useful in revealing regions important for regulating 
transcription. Several transcription factor consensus sites (GC-rich elements that bind Sp1, a Y-box that binds the YB-1 protein, and an AP-1 site) are present in the MDR1 promoter, which also responds to signaling pathways involved with normal physiological stimuli. What factors can account for
expression of Pgp in cells previously negative for 
Pgp is less clear. However, chromosomal abnormal- 
ities near the MDR1 locus including duplications and 
rearrangements that lead to formation of hybrid 
MDR1 mRNAs or that juxtapose the MDR1 gene to 
a transcriptionally active gene have been described. 
Point mutations in the MDR1 promoter have also 
been found in some cancers. How universal these 
mechanisms are in activating Pgp expression in 
tumor cells remains to be determined. More recently, 
the search for factors that convert Pgp-negative cells to Pgp-positive
cells has focused on the role of DNA methylation in 
repressing MDR1 promoter activity. 

To assess the possible relevance of Pgp to clinical
resistance, studies to detect Pgp at the mRNA and/or
protein level have been undertaken in
various cancers. Evidence of transport activity has 
been obtained by observing changes in tracer accumu- 
lation following exposure to Pgp inhibitors either in 
cells taken from patients or in tissues monitored 
in vivo using fluorescent (e.g., rhodamine 123) or 
radioactive (Tc-99m-sestamibi) tracers. Such studies 
have revealed Pgp to be expressed at biologically sig- 
nificant levels in about 50% of human cancers. Thus 
strategies to “shut down” Pgp during chemotherapy 
are being devised. However, it may be that other 
transporters of the MRP and mitoxantrone-resistance 
(MXR) families, discovered more recently, also con- 
tribute significantly to drug efflux from tumor cells. 
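The functional assays mentioned above compare tracer accumulation (e.g., rhodamine 123) in cells with and without a Pgp inhibitor. The following is a minimal sketch of that comparison, assuming the readout is a single mean signal per condition; the function name, the example numbers, and the 2-fold cut-off are hypothetical illustrations, not a published criterion.

```python
def accumulation_ratio(with_inhibitor: float, without_inhibitor: float) -> float:
    """Fold increase in tracer accumulation when Pgp is blocked.

    A ratio well above 1 suggests that active efflux was limiting
    tracer retention before the inhibitor was added.
    """
    if without_inhibitor <= 0:
        raise ValueError("baseline accumulation must be positive")
    return with_inhibitor / without_inhibitor


# Hypothetical mean fluorescence values (arbitrary units).
baseline = 120.0
plus_inhibitor = 300.0

ratio = accumulation_ratio(plus_inhibitor, baseline)
# The 2-fold threshold is an illustrative assumption only.
efflux_detected = ratio >= 2.0
print(f"accumulation ratio = {ratio:.2f}, efflux detected: {efflux_detected}")
```

A ratio, rather than a raw difference, keeps the comparison independent of the absolute units of the detector.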


Mechanisms of Action 


Genetic approaches involving site-directed mutagenesis have proved valuable in identifying regions of Pgp important for its function. The majority of studies have been undertaken on the human MDR1 Pgp. There is still some dispute as to whether Pgp brings about drug transport directly or indirectly. Some suggest that the presence of Pgp in the cell membrane alters membrane potential and intracellular pH, and that these changes to biophysical parameters then affect intracellular drug distribution. Others favor a pump model in which the energy of ATP hydrolysis by Pgp is used to translocate drugs from the inner leaflet of the cell membrane and/or the cytoplasm (see Figure 1B). The two halves of the molecule cooperate in transport, and simultaneous expression of both halves is required to produce a functional transporter. Pgp is unusual in displaying a high level of constitutive ATPase activity, which can be further modulated by drugs. It also has a relatively low affinity for ATP binding and hydrolysis compared with other ATP-dependent transporters. An alternating site catalysis model has been proposed for ATP hydrolysis by Pgp, with complete cooperativity between the two NBDs, each alternately hydrolyzing ATP. The mechanism by which this hydrolysis energizes drug translocation remains poorly understood, but it involves conformational changes that are detectable following drug binding, ATP binding, and/or hydrolysis.

The carbohydrate moieties on Pgp are not essential for activity, and glycosylation-deficient Pgp mutants can still confer drug resistance, but such moieties do assist in processing Pgp to the cell surface. Point mutations in Pgp can cause it to be trapped in the endoplasmic reticulum as a core-glycosylated intermediate associated with molecular chaperones. Proteolytic enzymes within the endoplasmic reticulum are important in quality control of Pgp folding: correctly folded proteins have their protease-sensitive sites masked and thus avoid digestion.


Pgp Inhibitors and Strategies for 
Modulating Resistance 


Compounds known as ‘chemosensitizers’ or ‘resistance modulators’ are able to inhibit drug efflux by Pgp and thus allow higher concentrations of drugs to access their intracellular target sites. Such compounds include calcium channel blockers (verapamil), sodium channel blockers (quinidine), steroids and steroid-like compounds (tamoxifen), and cyclic peptides (cyclosporin A). Most reversing agents block transport by acting competitively or noncompetitively, binding either to drug interaction sites or to other modulatory sites and causing allosteric changes. Some modulators, e.g., verapamil, are themselves substrates and inhibit without interrupting the catalytic cycle. Others, such as cyclosporin A, interfere with both substrate recognition and ATP hydrolysis and may not be substrates. An alternative view is that substrates and chemosensitizers are handled similarly by Pgp, but that substrates enter the cell membrane slowly whereas chemosensitizers enter it rapidly. Effective chemosensitizers should thus exhibit high-affinity binding to Pgp and also equilibrate rapidly across lipid bilayers. Clinical use of many of these modulators can be limited, because the concentrations required to inhibit Pgp transport effectively are sufficient to bring about other pharmacological actions.
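The difference between competitive and noncompetitive reversal can be illustrated with the textbook Michaelis–Menten rate laws. This is a minimal sketch that assumes Pgp-mediated efflux behaves as simple saturable kinetics with one drug site and one modulator site; the parameter names (vmax, km, ki) and all numbers are hypothetical.

```python
def efflux_rate(s: float, vmax: float, km: float, i: float = 0.0,
                ki: float = 1.0, mode: str = "competitive") -> float:
    """Efflux rate at drug concentration s with modulator concentration i.

    competitive:    the modulator raises the apparent Km; Vmax is unchanged
    noncompetitive: the modulator lowers the apparent Vmax; Km is unchanged
    """
    factor = 1.0 + i / ki
    if mode == "competitive":
        return vmax * s / (km * factor + s)
    if mode == "noncompetitive":
        return vmax * s / ((km + s) * factor)
    raise ValueError(f"unknown inhibition mode: {mode!r}")


# Hypothetical parameters: modulator at i = ki halves the apparent affinity or capacity.
for s in (2.0, 1000.0):
    comp = efflux_rate(s, vmax=10.0, km=2.0, i=1.0, ki=1.0, mode="competitive")
    noncomp = efflux_rate(s, vmax=10.0, km=2.0, i=1.0, ki=1.0, mode="noncompetitive")
    print(f"s={s}: competitive {comp:.2f}, noncompetitive {noncomp:.2f}")
```

At very high drug concentration the competitive case approaches vmax again, whereas the noncompetitive case stays near vmax / (1 + i/ki), which is one way the two modes can be told apart experimentally.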


Future Prospects 


There is still much to be unraveled concerning the 
interactions of Pgp with its substrates, its ability to 
transport so many compounds with apparently very 
different structures, the nature of possible endogenous
substrates, and its role in physiological and
pathological processes. Effective inhibition of Pgp re- 
mains a pharmacological goal to improve oral uptake 
of drugs, to maintain therapeutic levels, and to allow 
access to required target sites. Alternative approaches, 
preventing Pgp expression, are also being explored. In 
addition, strategies are being developed to improve 
the therapeutic index of anticancer drugs by increasing 
Pgp levels in normal healthy bone marrow and other 
drug-sensitive tissues, thus protecting them from toxicity. Such gene therapy, involving delivery of cDNA encoding Pgp, has been tested in mice, suitable vectors have been identified, and clinical trials to test this approach are now under way.
A phage (bacteriophage) is a bacterial virus.


See also: Bacteriophage Recombination; 
Bacteriophages 


Phage λ Integration and
Excision


A Landy 
Bacteriophage λ, like other lysogenic phage, has two possible life styles after infecting its host Escherichia coli. On the one hand, it can undergo a typical lytic cycle, which involves phage multiplication and the release of progeny into the extracellular medium. Alternatively (and depending upon the physiological state of the host), it can adopt a quiescent life style: the largely inactive viral genome resides benignly within the host cell and its descendants until some environmental or physiological signal provokes an awakening to the lytic pathway. To maintain this ‘lysogenic’ state, λ has evolved an elaborate pathway of site-specific recombination that inserts the viral chromosome into the chromosome of the host, using specific sites on the phage (attP) and bacterial (attB) chromosomes. The resulting integrated viral ‘prophage,’ which is now assured of propagation and distribution to all descendants, is flanked by the junctions created between viral and host DNA sequences. These junction sequences, called attL and attR (on the left and right, respectively), can recombine with each other to regenerate attP and attB, thereby excising the viral chromosome in preparation for a cycle of lytic growth (see
Figure 1).


Figure 1 Integrative and excisive recombination pathways. The protein binding sites for arm-type Int, core-type Int, IHF, Xis, and Fis are each drawn with a distinct symbol, which is filled when that site is occupied by its cognate protein to make a competent recombination partner for integrative or excisive recombination. Proteins required for each reaction (Int, IHF, and Xis) are in bold; proteins that inhibit (Xis and IHF) or enhance (Fis) the indicated reactions are in italics.


Integrase


The protein responsible for catalyzing λ integrative and excisive recombination is the virally encoded 356 amino acid integrase (Int) protein. This founding member of the Int Family of site-specific recombinases makes a transient nick in one strand of a DNA duplex, using a conserved tetrad of residues to activate the scissile phosphate and a tyrosine nucleophile to generate a 3’ phosphotyrosine linkage and a 5’ hydroxyl. When acting on a single duplex in this way
the nick is rapidly resealed (ligated) by reversal of the 
reaction and Int is thus a type I topoisomerase capable 
of relaxing supercoiled DNA. This cleavage/ligation 
chemistry is harnessed for recombination by the 
arrangement of Int binding sites on the att site 
DNAs as a pair of inverted repeats (separated by a 7 
bp ‘overlap’ sequence). Alignment (synapsis) of two 
recombining att sites generates a tetramer of Int pro- 
tomers, each capable of cleaving and ligating one of 
the four DNA strands that are to be exchanged. 


The energy-conserving tyrosine-mediated chemis- 
try of DNA cleavage and ligation is characteristic of 
all Int Family members and is executed by the
C-terminal portion of Int, from residues 65-356 (referred
to as C65). This region has been further subdivided 
into a catalytic domain (C170 domain, residues 170- 
356) that contains all of the residues involved in the 
cleavage/ligation chemistry and a central domain (CB 
domain, residues 65-169) that recognizes and binds 
specifically to the core-type Int binding sites. Critical 
residues in the λ catalytic domain have been identified by
analysis of the λ crystal structure and comparisons
with the genetics, biochemistry, sequences, and crystal 
structures of other Int Family members. A tetrad of 
four basic residues (R212, H308, H333, and R311) 
activates the scissile phosphate and together with a 
fifth basic residue (K235) comprises a highly con- 
served basic pocket in the active site. 
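The domain boundaries and active-site residues given above can be collected in a small lookup table. A minimal sketch: the helper name `int_domain` is hypothetical, and the N64 domain is taken to span residues 1-64, which follows from the C65 boundary but is an assumption about exact numbering.

```python
# Domain boundaries of lambda Int as described in the text (inclusive ranges).
INT_DOMAINS = {
    "N64 (arm-type DNA binding)": range(1, 65),
    "CB (core-type DNA binding)": range(65, 170),
    "C170 (catalytic)": range(170, 357),
}

# Active-site residues named in the text: the tetrad, K235, and the Y342 nucleophile.
ACTIVE_SITE = {"R212": 212, "K235": 235, "H308": 308,
               "R311": 311, "H333": 333, "Y342": 342}


def int_domain(residue: int) -> str:
    """Return the Int domain containing a residue number (hypothetical helper)."""
    for name, span in INT_DOMAINS.items():
        if residue in span:
            return name
    raise ValueError(f"residue {residue} is outside the 356-residue protein")


for label, pos in ACTIVE_SITE.items():
    print(label, "->", int_domain(pos))
```

Every residue of the catalytic pentad and the tyrosine nucleophile falls in the C170 domain, consistent with the statement that this domain contains all of the residues involved in the cleavage/ligation chemistry.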

A striking feature of the λ crystal structure is the location of the attacking tyrosine nucleophile (Y342) on a flexible loop about 20 Å from the heart of the catalytic pentad. This loop can be readily modeled in a configuration that orients Y342 for an in-line attack on the scissile phosphate coordinated by the catalytic tetrad of the same protomer. This would be consistent with those biochemical data indicating a ‘cis cleavage’ mechanism for λ Int. Alternatively, it can be modeled so that Y342 attacks the scissile phosphate coordinated by the tetrad of an adjacent Int protomer in the recombination complex, consistent with other biochemical data indicating a ‘trans cleavage’ mechanism for λ Int and for other Int Family members. The crystal structures of other Int Family members do not place the attacking tyrosine on such a flexible loop, so it remains to be seen whether this is an idiosyncrasy of the Int crystal or a significant feature of the protein and/or reaction.

The N-terminus of Int encodes an additional 
DNA-binding domain (N64) that recognizes and 
binds with high affinity to a second family of DNA 
sequences (arm-type) that are distinct and distant from 
the core-type sequences where strand exchange takes 
place. This feature of λ Int, as a heterobivalent DNA-binding
protein, is typical of a subgroup of Int Family
members, many of which are also virally encoded. 


Accessory Proteins 


The λ recombination pathway also depends on several accessory proteins, all of which are site-specific DNA-bending proteins. IHF (integration host factor) is a 21 kDa heterodimer encoded by E. coli that is required for both integrative and excisive recombination. Although it was discovered because of its requirement for λ recombination, it plays an important role as a transcription factor for many E. coli genes and is itself regulated such that its intracellular concentration increases as cells enter stationary phase. The co-crystal structure confirmed that IHF induces a ‘U-turn’ in DNA at its binding site.

The accessory protein Xis (excisionase) is a virally encoded 72 amino acid protein that is required for excisive recombination and is inhibitory for the integrative reaction. It is the primary switch for determining the direction of λ recombination, but its effects can be modulated by the relative concentrations of the three other proteins. An Xis dimer or an Xis-Fis heterodimer induces a ‘U-turn’ bend, similar to that induced by IHF, at its respective binding sites.

The other host-encoded accessory factor, Fis (fac- 
tor for inversion stimulation), was first discovered as a 
result of its role as an accessory protein in the Hin and 
Gin site-specific recombination pathways, and was 
then independently discovered as a protein that
stimulated λ excisive recombination when Xis is limiting.
This 95 amino acid homodimeric DNA-bending pro- 
tein, whose crystal structure has been solved, is also a 
regulatory protein in E. coli, with especially promin- 
ent roles in the regulation of ribosomal RNA tran- 
scription and DNA replication initiation. A dramatic 
increase in the synthesis of Fis protein coincides with 
the emergence of cells from stationary phase and their 
entry into logarithmic growth. 


att Site Structure 


The bacterial att site (attB) is exemplary of the most 
basic Int Family recombination target site; it consists 
of two 9 bp ‘core-type’ Int binding sites as inverted
repeats (B and B’, respectively) separated by a 7 bp 
‘overlap region.’ The entire attB is often designated as 
BOB’. The overlap region is flanked and defined by 
the staggered nicks made by Int during strand 
exchange. In recombinant DNA products this region 
receives one strand from each of the ‘parental’ mol- 
ecules, i.e., it is heteroduplex, and it is therefore critical 
that both parental molecules have the same overlap 
sequence. Although a wide variety of sequences are 
tolerated in the overlap region, the second and sixth 
positions (one base pair in from each end) are fixed 
because they are part of the sequence recognized by 
Int and, for unknown reasons, some other positions 
do not accept some base pairs. 
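Why both parents must carry the same overlap sequence can be seen by building the heteroduplex explicitly: after strand exchange, the top strand across the overlap comes from one parent and the bottom strand from the other. The sketch below assumes each parent's overlap is given as its top-strand sequence and checks Watson–Crick pairing position by position; the function names and the 7 bp sequences are hypothetical placeholders, not the real λ overlap.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def bottom_strand(top: str) -> str:
    """Bottom strand written under the top strand, position by position (3'->5')."""
    return "".join(COMPLEMENT[base] for base in top)


def heteroduplex_mismatches(top_from: str, bottom_from: str) -> list[int]:
    """1-based overlap positions where the top strand of one parent fails to
    pair with the bottom strand of the other parent."""
    if len(top_from) != len(bottom_from):
        raise ValueError("overlap regions must have the same length")
    bottom = bottom_strand(bottom_from)
    return [i + 1 for i, (t, b) in enumerate(zip(top_from, bottom))
            if COMPLEMENT[t] != b]


# Hypothetical 7 bp overlaps.
print(heteroduplex_mismatches("ACGTACG", "ACGTACG"))  # identical overlaps pair fully
print(heteroduplex_mismatches("ACGTACG", "ACGAACG"))  # a single change gives a mismatch
```

Any difference between the two parental overlaps shows up as an unpaired position in the recombinant, which is why matching overlap sequences are required.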

In contrast to the 25 bp attB, the phage att site (attP) has 240 bp and is much more complex. It is built around a core region that is virtually identical to the attB site except for a few differences in the core-type Int binding sites (C and C’, respectively). The COC’ core region is flanked by arms (P on the left and P’ on the right) containing binding sites for the N-terminal domain of Int (P1 and P2 in the P arm and P’1, P’2, and P’3 in the P’ arm), IHF (H1 and H2 in the P arm and H’ in the P’ arm), Xis (X1 and X2 in the P arm), and Fis (F, overlapping X2 in the P arm). Since strand exchange takes place within the overlap region, integrative recombination between the POP’ and BOB’ att sites generates the prophage att sites BOP’ and POB’ (attL and attR, respectively), whose complexity is intermediate between that of attP and that of attB.
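The bookkeeping of att site names can be mimicked by treating each site as three labelled segments (left, overlap, right) and exchanging partners at the overlap. This is a symbolic sketch only: the class and the function name `recombine` are hypothetical, and the model ignores sequence, proteins, and the order of strand exchanges.

```python
from typing import NamedTuple


class AttSite(NamedTuple):
    left: str     # segment left of the overlap, e.g. "P" or "B"
    overlap: str  # the shared overlap region, "O"
    right: str    # segment right of the overlap, e.g. "P'" or "B'"

    def __str__(self) -> str:
        return f"{self.left}{self.overlap}{self.right}"


def recombine(x: AttSite, y: AttSite) -> tuple[AttSite, AttSite]:
    """Exchange partners at the overlap: returns (y.left+O+x.right, x.left+O+y.right)."""
    if x.overlap != y.overlap:
        raise ValueError("partners must share the same overlap")
    return (AttSite(y.left, x.overlap, x.right),
            AttSite(x.left, x.overlap, y.right))


attP = AttSite("P", "O", "P'")
attB = AttSite("B", "O", "B'")
attL, attR = recombine(attP, attB)   # integration
print(attL, attR)
print(*recombine(attL, attR))        # excision regenerates attP and attB
```

Running the same symbolic exchange on attL and attR regenerates POP’ and BOB’, mirroring excision, although, as noted under Strand Exchange, the two reactions are mechanistically distinct.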


Specialized Transducing Phage and 
Secondary att Sites 


The attB site is located between the gal and bio genes on the E. coli chromosome. Very rare aberrant excision events sometimes result in one of these genes being excised along with the λ prophage and thereby incorporated as part of a new phage genome called a ‘specialized transducing phage.’ The properties of specialized transducing phage depend on whether, or how much, phage DNA was lost to compensate for the newly acquired bacterial DNA.

In a cell with a deletion of attB, the λ chromosome will integrate with reduced efficiency at other, ‘secondary,’ att sites on the E. coli chromosome. A hierarchy of secondary att sites, each with its own characteristic efficiency, reflects the extent to which these sequences fortuitously mimic the features of attB. Powerful genetic selections make it possible to identify extremely poor mimics whose recombination efficiencies are reduced by as much as nine orders of magnitude relative to attB. Aberrant excisions from these sites have enabled the isolation of a variety of different specialized transducing phage, analogous to the λgal and λbio phage.


Strand Exchange 


The mechanisms of strand exchange in λ Int-mediated recombination are the same as the basic mechanisms of all Int Family members (refer to Figure 1 of the article Integrase Family of Site-Specific Recombinases). It has the added feature, shared with a subset of family members, that the sequential strand exchanges are highly ordered. After synapsis, the ‘top’ strands of the recombining partners are cleaved by their respectively bound Ints at the left boundary of the overlap region; the first three bases of the free 5’ hydroxyl-terminated strands of the overlap region are swapped and then ligated by Int to form a four-way DNA junction (Holliday junction). After some rearrangements that include moving the crossed strands one base pair to the right, a similar sequence of events is executed on the right side of the overlap, where the bottom strands are cleaved, swapped, and ligated to resolve the Holliday junction to recombinant helices. The strict ordering of strand exchanges is undoubtedly related to interactions involving the P and P’ arms, but the mechanism of this ordering is unknown. It is noteworthy that the order of strand exchanges is the same in both integrative and excisive recombination, indicating that one reaction is not simply the reverse of the other; they are two distinct reactions.


Role of the P and P’ Arms 


One might wonder how the arm-type Int binding sites 
fit into the recombination reaction or even why these 
distal sites do not interfere or compete with the action 
at the core-type sites. The answer lies in the accessory 
proteins which introduce sharp (U-turn) bends in the 
DNA and have binding sites that are interposed 
between the two classes of Int binding sites. Binding 
and bending by the accessory proteins ‘delivers’ Ints 
bound at the high-affinity arm-type sites to the low- 
affinity core-type sites. Thus, the higher order synap- 
tic complex between two att sites is composed of 
275 bp of DNA with three to five accessory bending 
proteins and four bivalent Int protomers bridging 
pairs of arm- and core-type binding sites. 

Two different subsets of arm-type binding sites are 
used for integrative and excisive recombination. Integrative recombination requires that attP be on a supercoiled DNA, that the P1, P’2, and P’3 arm-type Int binding sites and all three IHF binding sites
be occupied, and that the Xis and Fis binding sites be 
vacant. Excisive recombination, which does not 
require supercoiled att sites, requires occupation of 
the P2, P’1, and P’2 arm-type Int binding sites, the 
H2 and H’ IHF sites, and both Xis sites or one Xis
and one Fis site. It is interesting that occupancy of the 
H1 site by IHF inhibits excisive recombination. 
Therefore, excisive recombination can be inhibited 
by high concentrations of IHF and stimulated by 
high concentrations of Fis, two host proteins (as 
noted above) whose intracellular concentration varies 
with cellular physiology. 
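The occupancy requirements listed above can be encoded as simple set comparisons. A minimal sketch, assuming the site names used in the text and treating each requirement as all-or-none; the function names are hypothetical, "one Xis and one Fis site" is read as X1 plus F (since F overlaps X2), and the IHF requirements for excision are deliberately left out of the excision check.

```python
INTEGRATIVE_INT_ARM = {"P1", "P'2", "P'3"}
INTEGRATIVE_IHF = {"H1", "H2", "H'"}
EXCISIVE_INT_ARM = {"P2", "P'1", "P'2"}
XIS_FIS_SITES = {"X1", "X2", "F"}


def integration_competent(occupied: set[str], supercoiled: bool) -> bool:
    """Supercoiled attP, the three arm-type Int sites and all three IHF
    sites bound, and the Xis and Fis sites vacant."""
    needed = INTEGRATIVE_INT_ARM | INTEGRATIVE_IHF
    return supercoiled and needed <= occupied and not (occupied & XIS_FIS_SITES)


def excision_competent(occupied: set[str]) -> bool:
    """Three arm-type Int sites bound plus either both Xis sites or
    Xis at X1 with Fis at F. IHF requirements are omitted in this sketch."""
    if not EXCISIVE_INT_ARM <= occupied:
        return False
    both_xis = {"X1", "X2"} <= occupied
    xis_plus_fis = "X1" in occupied and "F" in occupied
    return both_xis or xis_plus_fis


integ = {"P1", "P'2", "P'3", "H1", "H2", "H'"}
print(integration_competent(integ, supercoiled=True))
print(integration_competent(integ | {"X1"}, supercoiled=True))
print(excision_competent({"P2", "P'1", "P'2", "X1", "F"}))
```

Expressing the rules as sets makes the directionality switch explicit: binding Xis at X1 simultaneously disqualifies an integrative complex and contributes to an excisive one.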

A tentative map suggests the following Int bridges 
in excisive recombination. One Int forms an intra- 
molecular bridge in attL between the P’1 arm-type 
site and the C’ core-type site and seems to be espe- 
cially critical in forming a recombinogenic complex. 
Two other Int bridges are intermolecular (P’2-C 
and P2-B) and are therefore probably important in 
synapsis. The fact that the B’ core-type site was not 
seen to form any Int bridges is consistent with the 
apparent requirement for only three arm-type Int 
sites, or it may reflect the status of B’ at only one particular phase of the reaction. The structure and mechanisms of the transient higher order recombinogenic complexes are difficult to study, but future work should elucidate how they determine the directionality and affect the efficiency of λ site-specific recombination.
Just as bacteriophage have made ideal tools for studying many basic biological phenomena, they have contributed enormously to our understanding of genetic recombination. Shortly after phage came into general use by Delbrück and his followers, mutants were discovered that produce different plaque morphologies. Mutants of r type (for rapid lysis) produce plaques that are somewhat larger than those made by wild-type phage, with a sharper edge; mi (minute) mutants produce very small plaques; and tu (turbid) mutants produce turbid plaques in which many bacteria are not lysed. Another type of mutant derives from the phenomenon of resistance: if phage T4 is plated on Escherichia coli B, some mutant bacteria, designated B/4, are found that are resistant to T4. However, if a large number of T4 are plated on B/4, a few plaques can be found, made by h (host range) mutants that are able to multiply even on B/4. All these mutants were used for early genetic work, since with practice one can learn to recognize plaques combining two or more plaque-morphology characteristics.

The phenomenon of recombination in phage was discovered independently by Hershey (1946) and by Delbrück and Bailey (1946). Briefly, susceptible bacteria are infected simultaneously by two mutants of the same phage, for instance by an r mutant and a mi mutant of T2. The experiment is typically done with a multiplicity of about seven of each parent, so virtually every infected cell receives the genomes of both. Recombination occurs in the intracellular pool of infecting DNA, and among the progeny one finds not only the parental types but also recombinants: the wild type and the r mi double mutant. The phenomenon is regular enough that Hershey and Rotman (1949) were able to publish a genetic map for phage T2, based on the classic principle that the frequency of recombination between two mutations should be (to a first approximation) proportional to the distance between their sites. This map had three linkage groups; linkage group II included several r mutants that mapped very close to one another, identifying a region that was later explored in detail by Seymour Benzer in his studies of genetic fine structure.
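In a two-factor cross such as r × mi, the recombination frequency is estimated from the progeny as the fraction of plaques that are recombinant (wild type plus double mutant). A minimal sketch with hypothetical plaque counts; the dictionary keys and function name are illustrative assumptions.

```python
def recombination_frequency(counts: dict[str, int]) -> float:
    """Fraction of progeny that are recombinant in an r x mi cross.

    Keys: "r" and "mi" (parental types), "wild" and "r mi" (recombinant types).
    """
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no plaques scored")
    recombinants = counts["wild"] + counts["r mi"]
    return recombinants / total


# Hypothetical plaque counts, for illustration only.
progeny = {"r": 460, "mi": 440, "wild": 52, "r mi": 48}
rf = recombination_frequency(progeny)
print(f"recombination frequency = {rf:.3f}")  # 100 / 1000 = 0.100
```

Under the mapping principle described above, a smaller frequency would place the two mutations closer together on the map.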

Recombination in phage is a populational phenom- 
enon, akin to mating large numbers of fruit flies with 
one another, rather than to a simple cross between two 
individuals. In each infected cell, large numbers of 
DNA molecules are interacting with one another, and 
the ratio of the two parental types to one another will 
vary from cell to cell. To model the phenomenon, 
Visconti and Delbrück (1953) proposed that the
phage genomes undergo rounds of mating as they 
multiply. Thus, an imaginary bell rings and each 
infecting genome finds a partner genome and recom- 
bination may occur between them. The genomes sepa- 
rate and replicate. Another bell rings and each genome 
again pairs with a partner, and recombination may 
occur between them. The model is generally satisfac- 
tory for explaining recombination. As explained by 
Stahl (1979), Steinberg and Stahl sought to derive a 
mathematical formulation of the model. We imagine 
that the genomes (chromosomes) engage in mating in a
‘mating room.’ The ancestry of any chromosome is 
determined by marking one parental genome with a 
bit of red stain at two sites (A and B) and the other 
parental genome with a bit of green at these sites. Then 
to quote Stahl: 


Any emerging chromosome that has inherited the informa- 
tion at A from the red parent is said to be red at A, etc. Each 
chromosome gets painted (or repainted) just as it enters the 
mating room. We define the descent of a chromosome 
according to its color at A. If it emerges from the mating 
room red at A, then we say it is descended from the chromo- 
some that was daubed red as it entered the room. By keeping 
an imaginary record of colors, we can define a line of descent 
for any chromosome; i.e., we can trace its ancestry back to a 
unique infecting phage particle. Now we define R according 
to the color of a chromosome at B. If a chromosome from a 
mating is a different color at B than it is at A, we say it was
color-converted at B. 


R is thus the frequency of recombination between the sites A and B. Now let f be the fraction of chromosomes in the cell derived from a given parent, so 1 − f is the fraction derived from the other parent. Given a cross between parents A⁺B and AB⁺, select a chromosome of one recombinant type, say AB; the probability that it derived from the A parent is f. If there are m matings per lineage, the average number of color conversions per lineage is mR. Matings will be Poisson-distributed among lineages; the probability that a lineage has experienced no color conversion will then be $e^{-mR}$, and the probability that it has experienced at least one will be $1 - e^{-mR}$. The probability that the last color conversion in the lineage occurred with a chromosome carrying the B marker will be 1 − f. Exactly the same probability holds for producing the other recombinant, A⁺B⁺, and so the frequency of recombination in the mating pool will be


$$2f(1-f)\left(1-e^{-mR}\right)$$
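The formula can be checked against a toy version of the mating room. In the sketch below (an illustrative assumption, not Steinberg and Stahl's own treatment), the pool is randomly paired in each of m discrete rounds and a pair exchanges its B markers with probability R. Discrete rounds give $(1-R)^m$ rather than $e^{-mR}$, so agreement with the closed form is only approximate for small R; the function names are hypothetical.

```python
import math
import random


def vd_recombinant_fraction(f: float, m: float, r: float) -> float:
    """Closed form from the text: 2 f (1 - f) (1 - exp(-m R))."""
    return 2 * f * (1 - f) * (1 - math.exp(-m * r))


def simulate_mating_rounds(n: int, f: float, m: int, r: float, seed: int = 0) -> float:
    """Toy mating-room simulation.

    Each chromosome is a pair of colors (at A, at B). In each of m rounds the
    pool is shuffled and paired; each pair swaps its B colors with probability r.
    Returns the fraction of chromosomes whose colors at A and B differ.
    """
    rng = random.Random(seed)
    n_red = round(n * f)
    pool = [("red", "red")] * n_red + [("green", "green")] * (n - n_red)
    for _ in range(m):
        rng.shuffle(pool)
        for i in range(0, n - 1, 2):  # an odd chromosome out stays unpaired
            if rng.random() < r:
                (a1, b1), (a2, b2) = pool[i], pool[i + 1]
                pool[i], pool[i + 1] = (a1, b2), (a2, b1)
    return sum(a != b for a, b in pool) / n


print(round(vd_recombinant_fraction(0.5, 5, 0.1), 4))  # 0.1967
print(simulate_mating_rounds(20000, 0.5, 5, 0.1))
```

As m grows the closed form approaches 2f(1 − f), so the recombinant fraction can never exceed 0.5, and it is largest when the two parents contribute equally (f = 0.5).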


This basic model can then be expanded and refined, 
as was done by Stahl et al. (1964), to take account of 
other phenomena. For instance, typical crosses with 
plants and animals often show the phenomenon of 
interference, in which recombination in one region 
reduces the probability of recombination in a neigh- 
boring region, as shown by crosses involving three 
linked markers. Phage crosses commonly show nega- 
tive interference, or an increased probability of 
recombination in two neighboring regions; this is 
explained generally by the mating theory, because if 
a pair of genomes has experienced one recombin- 
ational event simply by the act of mating with each 
other, it is likely that they will have experienced other 
recombinational events. In addition, phage crosses 
often show high negative interference, which means 
an unusually high number of recombinations within 
very short distances. It is the aim of current work in 
the mechanism of recombination to explain the mol- 
ecular events responsible for such phenomena. 
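Interference in a three-point cross is usually quantified with the textbook coefficient of coincidence c (observed double recombinants divided by the number expected if the two intervals were independent) and the interference I = 1 - c. These standard definitions are not given in the entry itself; the sketch below simply applies them. Positive interference gives c < 1, while the negative interference typical of phage crosses gives c > 1.

```python
def coefficient_of_coincidence(r1: float, r2: float,
                               double_recombinants: int, total: int) -> float:
    """Observed / expected double recombinants in a three-point cross.

    r1, r2: recombination frequencies of the two adjacent intervals.
    """
    expected = r1 * r2 * total  # expected doubles if the intervals were independent
    if expected == 0:
        raise ValueError("no double recombinants are expected")
    return double_recombinants / expected


def interference(c: float) -> float:
    """I = 1 - c: I > 0 is positive interference, I < 0 negative interference."""
    return 1.0 - c


# Hypothetical cross: 1000 progeny, r1 = 0.1, r2 = 0.2, so 20 doubles expected.
for observed in (10, 40):
    c = coefficient_of_coincidence(0.1, 0.2, observed, 1000)
    print(observed, round(c, 3), round(interference(c), 3))
```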
Mu is a temperate phage of Gram-negative bacteria,
capable either of growing lytically on a sensitive host 
or of passively integrating its genome into a host’s 
DNA and persisting as a prophage. It has several 
remarkable properties that have made it a popular 
subject for research. The primary reason for interest 
in Mu is that it is a transposing phage, integrating its 
DNA into the host chromosome and amplifying its 
DNA during lytic growth using a transposition path- 
way. In addition, Mu can alter its host range using a 
novel gene splicing reaction. It has some unusual forms 
of gene regulation, and it can be used as a powerful 
genetic tool. 

A brief outline of the Mu life cycle is as follows. On 
infection of a sensitive host, the Mu genome is injected 
into the cell and integrated into the host chromosome, 
virtually at random, by a conservative or nonreplica- 
tive transposition. A few per cent of the infected cells 
persist as lysogens with the Mu genome integrated as a 
prophage. The majority proceed into the lytic cycle, 
during which the Mu genome is amplified approxi- 
mately 100-fold by replicative transposition. Copies 
of the Mu genome accumulate in the host chromo- 
some until they are cut out and packaged into phage 
heads, and free phage particles are released following 
lysis of the host cells. The fascination with Mu lies in 
the details. 


The Mature Phage 


A mature viral particle is about 1000 Å long by 180 Å in
diameter, composed of an icosahedral head, a retractable
tail, and six tail fibers. The tail fibers determine
host range and come in two flavors: one allows adsorption
to bacterial strains such as E. coli K12 and some
species of Salmonella and Serratia, and the other to 
Citrobacter, Shigella, Enterobacter, and Erwinia. The 
phage head contains a double-stranded linear viral 
DNA of approximately 39 kb, which is longer than 
the actual genome of 36 717 bp; the difference is
made up of random host sequences covalently attached 
at each end of the genome. The host sequences, which 
are shed when the genome integrates into the host 
chromosome, derive from the packaging mechanism 
— the copies of the Mu genome that accumulate in the 
host chromosome during the lytic cycle are packaged 
by a headful cutting mechanism, starting about 50 bp 
upstream of the left end and proceeding to a variable 
point about 1-3 kb beyond the right end. 
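The figures quoted in this paragraph can be checked to add up. A minimal arithmetic sketch, using only the numbers given above (50 bp of host DNA at the left end, the 36 717 bp genome, and 1-3 kb of host DNA past the right end):

```python
MU_GENOME_BP = 36_717                 # genome length given in the text
LEFT_HOST_BP = 50                     # host DNA ahead of the left end (approximate)
RIGHT_HOST_RANGE_BP = (1_000, 3_000)  # variable host DNA past the right end


def packaged_length(right_host_bp: int) -> int:
    """One headful: left host DNA + Mu genome + right host DNA."""
    return LEFT_HOST_BP + MU_GENOME_BP + right_host_bp


shortest, longest = (packaged_length(bp) for bp in RIGHT_HOST_RANGE_BP)
print(f"packaged DNA: {shortest:,} to {longest:,} bp")
```

The resulting range of about 37.8 to 39.8 kb brackets the approximately 39 kb of DNA found in the phage head.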


The Genetic Map and Expression of the 
Mu Genome 


The genome is organized into several transcriptional 
units, with a single gene for the repressor (c) at the left 
end, transcribed to the left, and the rest of the genome 
transcribed to the right in a regulatory cascade of 
early, middle, and late transcripts (Figure 1). The 
large early transcript initiated at the early promoter
Pe encodes: a second regulatory protein, Ner; the pro- 
teins required for replication, A (transposase) and B; 
about a dozen poorly understood and dispensable 
proteins in what is called the SE region; and, at the 3’ 
terminus, a protein required for initiation of the mid- 
dle transcript, Mor (or GemB). Evidence also exists 
for another promoter within the early region which 
may independently transcribe the two 3’ terminal 
genes. 

The middle transcript initiated at Pm is positively 
regulated by Mor and by Mu DNA replication; unfor- 
tunately, little is known about the mechanism of this 
important coupling between replication and
transcription. The transcript encodes the C protein, which
is the positive regulator of late transcription. 

Four late transcripts initiated at Plys, PI, PP, and
Pmom are regulated by C, and encode the proteins
required for constructing the phage particle, for lysis, 
and for an unusual form of DNA modification. 
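The early-middle-late cascade can be summarized as a chain of dependencies. The toy sketch below is an assumption-laden simplification, not a model from the literature: each transcriptional class is switched on once its stated requirements are met, "ongoing Mu replication" is approximated by the presence of the A and B proteins, and the hypothetical `replication_blocked` flag lets replication be switched off. Ner feedback and the extra controls on mom are ignored.

```python
def mu_cascade(repressor_bound: bool, replication_blocked: bool = False) -> list[str]:
    """Return the transcriptional classes switched on, in order (toy rules).

    early  (Pe): on unless c repressor occupies the operators; makes Ner, A, B, Mor
    middle (Pm): needs Mor plus Mu replication (proxied here by A and B); makes C
    late       : needs C; makes particle, lysis, and modification functions
    """
    rules = [
        ("early", lambda p: not repressor_bound, {"Ner", "A", "B", "Mor"}),
        ("middle", lambda p: not replication_blocked and {"Mor", "A", "B"} <= p, {"C"}),
        ("late", lambda p: "C" in p, {"morphogenesis", "lysis", "Mom"}),
    ]
    products: set[str] = set()
    switched_on: list[str] = []
    changed = True
    while changed:  # keep applying rules until nothing new is switched on
        changed = False
        for name, condition, made in rules:
            if name not in switched_on and condition(products):
                switched_on.append(name)
                products |= made
                changed = True
    return switched_on


print(mu_cascade(repressor_bound=False))  # lytic growth
print(mu_cascade(repressor_bound=True))   # repressed prophage
```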


The Decision: Lysis or Lysogeny 


Transcription from Pe is required for both the lytic and
lysogenic pathways, as integration of the genome via
conservative transposition, requiring the A and B
proteins, is a first step for both pathways. To enter the
lysogenic pathway, continued early transcription
must be inhibited by binding of the c repressor to a
set of three operator sites (O1-O3) located between the
divergently transcribed c gene under Pc and the early
region under Pe (Figure 1). Repressor confers immunity
to superinfection and also autogenously regulates
its own synthesis. Opposing the action of the c
repressor, the Ner protein, product of the first gene in the
early operon, inhibits c repressor synthesis from Pc and
reduces transcription from Pe by binding to a site
overlapping O2 and O3. Hence, the choice between the lytic
and lysogenic pathways is determined, at least in part,
by the interplay of the two repressors. The choice is also
influenced by host physiology through the host IHF
protein, which binds to a site between O1 and O3.

Figure 1 Map of the Mu genome. Approximate locations of the genetic components mentioned in the text are
indicated. The regulatory region from around 850-1150 bp is expanded, showing the binding sites for c repressor
(open boxes), Ner and IHF (closed boxes), and the locations of the promoters Pc and Pe.


Transposition: Conservative and 
Replicative 


Studies on Mu, both in vivo and in vitro, have been
instrumental to our present understanding of the
mechanism of transposition. The A (transposase) and
B proteins, along with the host HU protein, carry out
transposition within nucleoprotein structures called
transpososomes. To initiate transposition, the ends of
the Mu genome, whether from an infecting viral DNA
or from a prophage, are synapsed by a complex of
transposase monomers bound to three sites at each
end of the genome and to a transpositional enhancer
that overlaps the operator region (Figure 2).
Rearrangement of the complex results in the formation of a
stable transposase tetramer bound to the ends of the genome
(transpososome 0). Synapsis of the genome ends of
infecting DNA is aided by a coinjected N protein
bound to the ends of the viral DNA. Efficient synapsis
of prophage ends during replicative transposition
requires a site in the center of the genome called the
SGS (strong gyrase site). The SGS apparently promotes
synapsis by organizing the structure of the prophage
into a plectonemically interwound supercoiled loop,
with the SGS at the apex of the loop and the prophage
ends to be synapsed at the base.

After synapsis, single-strand nicks are introduced 
at the 3’ ends of the genome and these occur in trans, 
i.e., transposase bound at the left end cleaves at the
right end, and vice versa (transpososome 1). In a complex
consisting of the transposase tetramer bound to
the Mu genome ends and to the B protein which has 
recruited a target DNA, the 3’ ends are joined to the 
target DNA at the ends of a 5 bp staggered cut by a 
coupled cleavage and ligation reaction (transposo- 
some 2). This step is referred to as strand transfer. 

The transposition intermediate thus formed can be 
processed in two ways. The conservative transposition 
observed with infecting DNA requires cleavage at 
the 5’ ends of the Mu genome to remove the host 
sequences that were present in the infecting DNA. 
The replicative transposition that amplifies Mu DNA 
during the lytic cycle requires that host replication 
machinery enter the complex and replicate the Mu 
genome. During replicative transposition the host 
DNA polymerase initiates replication, mostly from 
the left end of the Mu DNA, using the remaining free 
3’ hydroxyl of the host DNA as the primer for leading 
strand synthesis. Entry of the polymerase requires 
prior removal of the tightly bound transposase, which 
is accomplished by several host proteins, including the 
chaperone ClpX and the primasome assembly pro- 
teins. Multiple rounds of replicative transposition 
result in the accumulation of about 100 copies of the 
genome in the host chromosome during the lytic cycle. 
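The ordered series of complexes described above behaves like a small state machine: each step is possible only from the preceding complex, and the strand-transfer complex is the branch point between conservative and replicative transposition. The sketch below encodes that ordering; the stage names follow the text, but the event names are hypothetical labels chosen for illustration.

```python
from enum import Enum


class Complex(Enum):
    """Stages of the Mu transpososome as described in the text."""
    LER = "ends (L, R) and enhancer (E) synapsed by A protein"
    TYPE0 = "stable A tetramer on the Mu ends (transpososome 0)"
    TYPE1 = "3' ends nicked (transpososome 1)"
    TYPE2 = "3' ends joined to target DNA (transpososome 2)"
    CONSERVATIVE = "simple insertion: 5' ends cleaved, host DNA removed"
    REPLICATIVE = "Mu copy made by host replication machinery"


# Allowed (stage, event) -> next stage; event names are illustrative only.
TRANSITIONS = {
    (Complex.LER, "rearrange"): Complex.TYPE0,
    (Complex.TYPE0, "nick_3prime_ends"): Complex.TYPE1,
    (Complex.TYPE1, "strand_transfer"): Complex.TYPE2,
    # Infecting DNA: cleavage at the Mu 5' ends sheds the attached host sequences.
    (Complex.TYPE2, "cleave_5prime_ends"): Complex.CONSERVATIVE,
    # Lytic growth: host proteins (e.g. ClpX) remove transposase, then replication.
    (Complex.TYPE2, "replicate"): Complex.REPLICATIVE,
}


def step(stage: Complex, event: str) -> Complex:
    try:
        return TRANSITIONS[(stage, event)]
    except KeyError:
        raise ValueError(f"{event!r} cannot occur at stage {stage.name}") from None


def run(events: list[str], start: Complex = Complex.LER) -> Complex:
    stage = start
    for event in events:
        stage = step(stage, event)
    return stage


print(run(["rearrange", "nick_3prime_ends", "strand_transfer", "replicate"]))
```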


Figure 2 Mu DNA transposition in vitro. The protein-DNA complexes formed during the in vitro reaction are
illustrated. The left (L) and right (R) ends and the enhancer (E) are synapsed by the A protein to form the LER
complex. Rearrangement of the A protein monomers leads to the Type 0 complex. Nicking at the Mu ends produces
the Type 1 complex. Ligation of the ends to target DNA (strand transfer) produces the Type 2 complex. The Type 2
complex can then be processed for replicative transposition as described in the text.


Mom Modification


The replicated DNA undergoes an unusual DNA
modification, catalyzed by the product of the mom
gene, involving conversion of the adenine residue
in the sequence C/G A C/G N Py to α-N-(9-β-D-
2'-deoxyribofuranosylpurin-6-yl)glycinamide. The
modification protects the progeny Mu DNA from 
many host restriction systems. What is most remark- 
able is the complexity of the regulation of the synth- 
esis of the Mom protein, apparently intended to delay 
its appearance until late in the lytic cycle. Transcrip- 
tion of mom requires the C protein, the positive regu- 
lator of all late transcription, but it is also dependent 
on host Dam methylation. Transcription of mom is 
inhibited by binding of the host OxyR protein to 
unmethylated or hemimethylated Dam sites upstream 
of mom; how OxyR repression is relieved during Mu
replication is not known. In addition, synthesis
of Mom is translationally regulated by a protein
encoded by the com gene, which overlaps the mom
gene. Com binding to its cognate site in the mom-com
mRNA destabilizes a strong stem-loop in the mRNA
and exposes sequences required for initiation of
translation of mom.
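The target sequence given above, C/G A C/G N Py, is short enough to locate with a regular expression. The sketch below is an illustration under stated assumptions: the motif is read 5' to 3' on each strand, both strands are scanned, N is any of A, C, G, T, and Py is C or T. It reports the adenine that would be modified; it does not model the many additional controls on Mom activity described above.

```python
import re

# Motif from the text, 5'->3': C/G  A  C/G  N  Py. The lookahead allows overlapping hits.
MOM_SITE = re.compile(r"(?=[CG]A[CG][ACGT][CT])")
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def mom_target_adenines(seq: str) -> list[tuple[str, int]]:
    """Return (strand, position) for every adenine lying in a Mom target motif.

    Positions are 0-based coordinates on the given (top) strand; for a
    bottom-strand hit the top strand carries the complementary T there.
    """
    top = seq.upper()
    bottom = top.translate(_COMPLEMENT)[::-1]  # reverse complement, read 5'->3'
    n = len(top)
    hits = [("+", m.start() + 1) for m in MOM_SITE.finditer(top)]
    hits += [("-", n - 1 - (m.start() + 1)) for m in MOM_SITE.finditer(bottom)]
    return sorted(hits, key=lambda hit: hit[1])


print(mom_target_adenines("CACGT"))  # top-strand site
print(mom_target_adenines("GTCTG"))  # site on the bottom strand
```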


G Inversion and Host Range Specificity 


G inversion of a 3 kb G region, using site-specific
recombination between 34 bp inverted repeat sequences
called gix that flank the G region, is responsible
for host range variability. In one orientation of
the G region, two proteins S and U can be synthesized, 
while the other orientation gives rise to S’ and U’. The 
S and S’ genes are not encoded completely within the
G region; rather, the 5’ portion of the gene encoding a
particular tail fiber lies immediately upstream of the 
border of the G region, while the 3’ portion of the gene 
lies within the contiguous portion of the G region. 
Inversion of the G region results in a new gene with 
the same 5’ portion, but with a different 3’ portion. 
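How one inversion swaps the 3' end of a tail-fiber gene can be shown at the sequence level. The sketch below is a toy: the construct, its sequences, and the helper `expressed_gene` are hypothetical, and the gix sites, Gin, and FIS are not modeled. A fixed 5' portion sits just outside an invertible region that carries one 3' portion in each orientation, so whichever 3' portion faces the fixed 5' portion is the one that is read.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(_COMPLEMENT)[::-1]


def invert(seq: str, start: int, end: int) -> str:
    """Replace seq[start:end] by its reverse complement, as recombination
    between flanking inverted repeats would (the repeats are not modeled)."""
    return seq[:start] + reverse_complement(seq[start:end]) + seq[end:]


# Hypothetical toy construct (sequences are arbitrary placeholders).
FIVE_PRIME = "ATGGCC"   # shared 5' portion, just outside the invertible region
S_TAIL = "TTGACA"       # 3' portion read in one orientation (S)
S_PRIME_TAIL = "CAGCTA" # 3' portion read after inversion (S')
G_REGION = S_TAIL + "AAAAAA" + reverse_complement(S_PRIME_TAIL)
CONSTRUCT = FIVE_PRIME + G_REGION
G_START, G_END = len(FIVE_PRIME), len(CONSTRUCT)


def expressed_gene(seq: str) -> str:
    """Read the fixed 5' portion plus the six bases that follow it."""
    return seq[: len(FIVE_PRIME) + 6]


flipped = invert(CONSTRUCT, G_START, G_END)
print(expressed_gene(CONSTRUCT), expressed_gene(flipped))
```

A second inversion restores the original arrangement, which is why the switch can occur repeatedly at low frequency.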

G inversion is catalyzed by the product of the gin
gene, located immediately downstream of the G re- 
gion. Synthesis of Gin is constitutive, allowing G inver- 
sion to occur at a low frequency in the lysogenic state. 
The Gin protein is a member of the invertase family of 
site-specific recombinases, and uses the host FIS pro- 
tein, which binds to a site within G, as an enhancer. 
The interaction between Gin and FIS both stimulates 
the rate of the reaction and imposes topological
specificity, such that only gix sites in inverted orientation
on the same molecule (in cis) can be used.


Mu as a Genetic Tool 


Two aspects of Mu biology that have been exploited 
for use in genetic studies are the creation of mutations 
due to insertion in host genes (hence the name Mu 
for mutator phage) and the formation of various 
chromosomal rearrangements such as inversions and
deletions during replication. Mini-Mu constructs, 
deleted for most of the lytic functions but retaining 
at least the terminal sequences necessary for transpos- 
ition, have been particularly useful. For example, 
reporter genes, such as lac, can be placed within and 
near the right end of a mini-Mu; integration of such a 
construct can result in fusions in which the reporter 
gene has been placed under the control of the promo- 
ter of the gene into which Mu has inserted, allowing 
facile studies of gene regulation. Mini-Mus can also be 
transferred on promiscuous, conjugative plasmids 
into organisms to which the phage does not adsorb, 
thus further extending its usefulness as a genetic tool. 
Any molecule displayed in sufficient quantity on the
surface of a strain of bacteria can be used by some 
phage as a receptor. Molecules used as receptors 
include various specific lipopolysaccharides, the 
porins and other outer-membrane proteins, special 
proteins of pili and flagella and, for Gram-positive 
bacteria, peptidoglycan complexes involving teichoic 
acid or C-carbohydrate. Receptors even include extra- 
cellular slime and capsule molecules, at least for the 
first step of the infection process. Some phages, 
including coliphage T4, bind reversibly to one kind 
of receptor and thus position themselves to bind irre- 
versibly to another. Closely related phages often target 
different receptors, and the same phage may use dif- 
ferent receptors on different bacteria. For example, 
bacteriophage T4 targets a lipopolysaccharide on 
Escherichia coli B that is quite specific to B strains, 
while on K strains it primarily targets outer membrane 
protein OmpC (which is not present on B) as well as,
with lower affinity, a different lipopolysaccharide. T2 
recognizes either OmpF or FadL, while T6 recognizes 
the protein Tsx (NupA). The related phage Ox2
normally uses OmpA as its receptor, but mutants
have been isolated that recognize both OmpA and
OmpC, or OmpP (an outer-membrane protease), or
E. coli B-type lipopolysaccharides with one or with two
terminal glucose residues. 

Phages with tail fibers, like the T-even phages
and P1, require multiple closely spaced molecules of 
the targeted receptor, since generally several tail fibers 
must bind simultaneously to initiate infection. In contrast,
coliphage N4 targets a complex protein,
NfrA-D, that is present in at most five copies per 
bacterial cell. In the case of at least some receptors 
on pili, the evidence now is that the pilus retracts to 
bring the phage to its surface after the phage binds 
through structures such as lateral spikes to the pilus. 
Some tailed phages interact with flagella and then 
move down along them to receptors on the cell sur- 
face. Some specific divalent cations are generally 
required as cofactors for phage-receptor interaction. 

The binding region near the tip of the tail or of the 
tail fiber is the most variable region of any given phage 
group. Altered receptor-binding sites seem to be 
generated in many ways, either by mutation or by 
inserting near the distal end of the tail-fiber gene a
small piece of DNA scavenged from other
coinfecting phages, prophages, or the host chromo- 
some. In the highly lytic phage T4, this is the only site 
that shows signs of recent acquisition of host DNA. 
The various T-even phages show regions of high, low, 
and no homology in their distal-tail-fiber genes; high 
homology regions on both sides of the region encod- 
ing the actual binding sites facilitate recombination 
and thus recognition-element shuffling. The sugges- 
tion has been made repeatedly that these receptor 
recognition regions are the prokaryotic analogs of 
the immunoglobulins. 


See also: Bacteriophages 
Overview


Bacteria have impressive abilities to sense and adapt to 
changing environmental conditions. One way that 
bacteria are able to adapt to a changing environment
is by altering the expression of various gene products
by individual cells within a bacterial population with 
the result that some bacteria expressing the ‘right’ 
combination of factors for a given environment will 
always be present. Heterogeneity in the expression of 
gene products within a bacterial population can occur 
via a phenomenon known as phase variation, a term 
that was originally applied to the variation in expres- 
sion of two different flagellar antigenic types or 
‘phases,’ H1 and H2, in a population of Salmonella 
(Stocker, 1949). Phase variation is defined as a herit- 
able change in the level of expression of a specific gene 
product. The expression of a given product may 
switch between off and on states or, in other cases, 
expression of a product may vary between high and 
low amounts. Phase variation has been reported to 
occur with various cell surface molecules such as pili, 
flagella, outer membrane proteins, and capsules, as 
well as intracellular proteins such as DNA restric- 
tion/modification systems. 


Mechanisms of Phase Variation 


Bacteria undergo phase variation by two basic 
mechanisms: (1) alterations in the DNA sequence 
and (2) methylation of the DNA. The former mechan- 
isms are classified as ‘genetic’ and include site-specific 
recombination, general recombination, and slipped- 
strand mispairing, whereas the latter are ‘epigenetic’ 
since they do not involve rearrangement of the DNA 
sequence. 


Alterations in DNA Sequence (Genetic) 


Site-specific recombination 

Figure 1 Site-specific recombination.

One way for bacteria to heritably alter the expression
of genes is via site-specific recombination between
inverted repeats (Figure 1). This process inverts the
DNA segment containing regulatory sequences that
lie between the repeats. In Salmonella flagellar phase
variation, the H2 flagellin promoter resides on a
966 bp invertible region. In one orientation, H2 is 
transcribed. The H2 flagellin is expressed along with 
a repressor of H1 transcription. In the inverted orien- 
tation the H2 promoter can no longer control the 
expression of H2 flagellin and H1 repressor. Thus 
H1 flagella are expressed via their own promoter 
(Zieg et al., 1977). Inversion between the repeats is 
facilitated by the Hin recombinase acting in concert 
with the factor for inversion stimulation (Fis) and the 
histone-like protein HU. A similar mechanism con- 
trols the expression of tail fibres with different binding 
specificities in bacteriophage Mu. 

Another example of phase variation by site-specific 
recombination is type 1 pili (Fim) expression in 
Escherichia coli. In this case a 314 bp DNA fragment
containing the fim promoter undergoes inversion 
mediated by the FimE and FimB recombinases in a 
process requiring leucine-responsive regulatory pro- 
tein (Lrp) and integration host factor (IHF). This 
inversion controls the on/off switch of type 1 pilus 
expression (Blomfield et al., 1997). 


General recombination 

Phase variation also occurs via a RecA-mediated 
recombination between homologous DNA segments. 
For example, type IV pili expression in Neisseria 
gonorrhoeae is subject to a phase variation mechanism 
controlled by RecA-mediated recombination between 
one of several silent, unexpressed pilin genes (pilS)
lacking promoters and the expressed pilus gene copy, 
pilE, which contains a promoter. Since the pilS genes 
each code for different amino acid sequences, this 
mechanism generates antigenically distinct pili when 
the silent gene is expressed from the pilE promoter.
Phase variation can also occur as a result of a misalign- 
ment during recombination. This generates deletions 
in the expressed gene copy or multiple tandem copies 
of pilE, which, though expressed, cannot be assembled
into the pilus-adhesin complex (Seifert, 1996).


Slipped-strand mispairing 

A third mechanism for phase variation involves mis- 
pairing during DNA replication between DNA 
regions containing repetitive DNA elements such as 
short sequence repeats (SSRs). Misalignment of SSR 
regions during DNA replication results in the inser- 
tion or deletion of base pairs, which can alter either the 
transcription or translation of specific genes. An exam- 
ple of this form of control that is exerted at the tran- 
scriptional level is the Opc outer membrane protein in 
Neisseria meningitidis, which undergoes phase vari- 
ation due to slipped-strand mispairing within a poly(C) 
region of DNA near the opc promoter. The number 
of cytosine bases in this region controls the level of 
expression of Opc protein: if the number of cytosines
is less than 10 or higher than 15, Opc is not expressed 
(Sarkari et al., 1994). Similarly, LKP pili expressed by 
Haemophilus influenzae undergo phase variation as a
result of slipped-strand mispairing at tandem repeats
of the dinucleotide TA within the promoter region.

An example of mispairing phase variation exerted 
at the translational level is the phase variation of opa- 
city proteins of N. gonorrhoeae and N. meningitidis. 
In these cases the translational reading frame of the 
Opa protein is controlled by slipped-strand mispair- 
ing between CTCTT pentamer coding repeats (CRs) 
within the signal peptide region of each opa gene. The 
result is that with 6, 9, or 12 CRs the ATG initiation
codon is in frame with the remaining opa gene, 
whereas at 4 or 8 CRs the Opa protein is not expressed 
since the ATG start codon is out of frame with the 
remaining opa codons. 
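The pattern of in-frame and out-of-frame repeat numbers follows from simple arithmetic. Each CTCTT repeat adds 5 bp, and 5 is not a multiple of 3, so gaining or losing a single repeat shifts the reading frame; only changes by multiples of three repeats (15 bp) restore it. The sketch below takes 6 repeats, reported above as in frame, as the reference; which repeat numbers actually occur in natural isolates is not addressed here.

```python
REPEAT_LEN = 5          # CTCTT pentamer
IN_FRAME_REFERENCE = 6  # a repeat number reported in the text as in frame


def opa_in_frame(n_repeats: int) -> bool:
    """True if n_repeats keeps the ATG in frame with the rest of the opa gene."""
    shift = (n_repeats - IN_FRAME_REFERENCE) * REPEAT_LEN  # bp gained or lost
    return shift % 3 == 0


for n in (4, 6, 8, 9, 12):
    print(n, "in frame" if opa_in_frame(n) else "out of frame")
```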


Methylation of DNA (Epigenetic) 

Another phase variation mechanism is orchestrated by 
the methylation of bases in DNA. DNA methylation 
can alter the binding of regulatory proteins to DNA, 
changing the expression of gene products including pili 
and outer membrane proteins (Henderson et al., 1999). 
A number of pili expressed by E. coli and Salmonella are
regulated by methylation-dependent phase variation 
including the pyelonephritis-associated pili (Pap) 
expressed by uropathogenic E. coli. The pap operon 
is regulated by the DNA adenine methylase (Dam), 
which is necessary for the formation of specific DNA 
methylation patterns at two target GATC sequences 
located in the pap regulatory region. In phase ON 
cells expressing Pap pili, the GATC site proximal to 
the pilin promoter (GATC-prox) is methylated, whereas
the distal GATC site (GATC-dist) is not methylated
(Figure 2). In phase OFF cells the pattern is reversed. 
Pap DNA methylation patterns directly regulate pap 
transcription by affecting where the leucine respon- 
sive regulatory protein (Lrp) binds, since the affinity 
of Lrp is greatly reduced at methylated GATC sites. 
When bound at the nonmethylated GATC-prox site,
Lrp represses pap transcription. When bound to the
nonmethylated GATC-dist site, Lrp activates pap
transcription. Additional regulatory proteins including
PapI, H-NS, and CAP participate in Pap pilus phase 
switching and transcription. 
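The core logic of the pap switch can be written as a small decision rule. The sketch below makes two simplifying assumptions: Lrp is treated as binding only nonmethylated GATC sites (the text says its affinity for methylated sites is greatly reduced, not abolished), and PapI, H-NS, and CAP are left out. Only the two methylation patterns described in the text are assigned a phase.

```python
def pap_phase(proximal_methylated: bool, distal_methylated: bool) -> str:
    """Toy pap switch: Lrp binds the nonmethylated GATC site(s).

    Lrp at the promoter-proximal site represses; Lrp at the distal site activates.
    """
    lrp_at_proximal = not proximal_methylated
    lrp_at_distal = not distal_methylated
    if lrp_at_distal and not lrp_at_proximal:
        return "ON"
    if lrp_at_proximal and not lrp_at_distal:
        return "OFF"
    return "not determined by this sketch"


print(pap_phase(proximal_methylated=True, distal_methylated=False))
print(pap_phase(proximal_methylated=False, distal_methylated=True))
```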


Figure 2 DNA methylation.


Biological Significance of Phase Variation


Microorganisms are subject to changing environmental
milieus in which adaptive responses are critical
for survival. The kinds of environmental challenges
microorganisms face range from occupying new
physical, biochemical, and/or biological niches to
avoiding host immune responses. Phase variation may
help bacteria to adapt to changing environments. For 
example, phase variation of Lpf pili in Salmonella may 
provide a mechanism by which they evade cross- 
immunity between different serotypes, allowing their 
coexistence within a host (Norris and Baumler, 1999). 
Recent work suggests that phase variation of the outer 
membrane protein Ag43 may be important in the 
formation of biofilms by E. coli (Danese et al., 2000). 

Phase variation attributable to slipped-strand 
mispairing and general homologous recombination 
appears to be, for the most part, a random process. 
These types of phase variation have the advantage of 
creating a diversity of phenotypes within a resident 
population of cells. Thus a few cells of the population 
are preadapted to a potential environmental change 
and cell lines descended from those cells will survive 
should that change occur. 
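Why a few preadapted cells are always present can be seen in a standard two-state switching model. In the sketch below, which is an illustration rather than a model from the cited work, an OFF cell switches ON with probability `on_rate` per generation and an ON cell switches OFF with probability `off_rate`; growth differences and selection are ignored, and the rates used are hypothetical. Even a population that starts entirely OFF acquires ON cells in the first generation and approaches the equilibrium fraction on_rate / (on_rate + off_rate).

```python
def fraction_on(p0: float, on_rate: float, off_rate: float, generations: int) -> float:
    """Expected fraction of ON cells after a number of generations (no selection)."""
    p = p0
    for _ in range(generations):
        # ON cells that stay ON, plus OFF cells that switch ON
        p = p * (1.0 - off_rate) + (1.0 - p) * on_rate
    return p


def equilibrium_fraction_on(on_rate: float, off_rate: float) -> float:
    return on_rate / (on_rate + off_rate)


# Hypothetical switching rates per cell per generation.
on_rate, off_rate = 1e-4, 1e-2
for g in (1, 10, 100, 1000):
    print(g, fraction_on(0.0, on_rate, off_rate, g))
print("equilibrium:", equilibrium_fraction_on(on_rate, off_rate))
```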

Site-specific recombination and epigenetic types of 
phase variation may be random to a degree but they 
are also subject to environmental regulation (Krabbe 
et al., 2000). In the latter case, phase variation is itself 
an adaptive response to a changed environment. In 
this way the environment can influence the expression 
or nonexpression of a set of genes in an inheritable 
fashion. 
Beans usually refers to food legumes of the genus
Phaseolus, family Leguminosae, subfamily Papilio- 
noideae, tribe Phaseoleae, subtribe Phaseolinae. The 
genus Phaseolus contains some 50 wild-growing spe- 
cies distributed only in the Americas (Asian Phaseolus 
have been reclassified as Vigna). These species repre- 
sent a wide range of life histories (annual to perennial), 
growth habits (bush to climbing), reproductive sys- 
tems, and adaptations (from cool to warm and dry 
to wet). The genus also contains five domesticated 
species: in decreasing order of importance, common 
bean (Phaseolus vulgaris L.), lima bean (P. lunatus L.), 


runner bean (Phaseolus coccineus L.), tepary bean 
(P. acutifolius A. Gray), and year bean (P. polyanthus 
Greenman), with distinct adaptations and reproductive
systems: mesic and temperate, predominantly self-pollinated;
warm and humid, predominantly self-pollinated; hot and
dry, cleistogamous; cool and humid, outcrossing; and cool
and humid, outcrossing, respectively. Lima bean is
phylogenetically more distant
from the other domesticated species, which are sibling 
species and constitute a syngameon. The principal 
species economically and scientifically is common 
bean. It originated in Latin America where its wild 
progenitor (P. vulgaris var. mexicanus and var. abor- 
igineus) has a wide distribution ranging from northern 
Mexico to northwestern Argentina. Large germplasm 
collections of domesticated and wild forms are located 
at CIAT, Cali, Colombia and USDA, Pullman, 
Washington, USA. The reference collection of Pha- 
seolinae is located at the National Botanical Garden, 
Meise, Belgium. 

Common bean is the most important legume 
worldwide for direct human consumption. The crop is 
consumed principally for its dry (mature) beans, shell 
beans (seeds at physiological maturity), and green pods. 
When consumed as seed, beans constitute an import- 
ant source of dietary protein (22% of seed weight) 
that complements cereals for over half a billion people 
mainly in Latin America. Annual production of dry 
beans is around 15 million tonnes and average yield is 
700 kg ha⁻¹, although yields in certain countries reach
2000-3000 kg ha⁻¹. The largest producers of dry beans
are Brazil, Mexico, China, and the USA. Annual pro- 
duction of green beans is around 4.5 million tonnes, 
with the largest production around the Mediterranean 
and in the USA. 

Common bean was used to derive important prin- 
ciples in genetics. Mendel used beans to confirm his 
results derived in peas. Johannsen used beans to illus- 
trate the quantitative nature of the inheritance of cer- 
tain traits such as seed weight. Sax established the basic 
methodology to identify quantitative trait loci (for 
seed weight) via co-segregation with Mendelian mark- 
ers (seed color and color pattern). The cultivars of 
common bean stem from at least two different domes- 
tications, in the southern Andes and Mesoamerica. In 
turn, their respective wild progenitors in these two 
regions have a common ancestor in Ecuador and 
northern Peru. This knowledge of the evolution of 
common bean, combined with recent advances in the 
study of the phylogeny of the genus, constitutes one of
the main current attractions of beans as genetic organ- 
isms. All species of the genus are diploid and most 
have 22 chromosomes (2n = 2x = 22). A few species
show an aneuploid reduction to 20 chromosomes. The 
genome of common bean is one of the smallest in the 
legume family at 625 Mbp per haploid genome. Normal
mitotic or meiotic chromosomes are very small
(1-3 µm), metacentric or submetacentric. A karyotype
has been developed for P. vulgaris and P. coccineus 
based on polytene chromosomes of the embryo sus- 
pensor cells. There are three or four rRNA loci 
(nucleolar organizing regions). In situ hybridization
with radioactive or fluorescent probes has been
performed on mitotic or polytene chromosomes for
rRNA, telomeric, and single-copy sequences. Highly 
repeated sequences comprise some 20% of the genome. 
They are distributed primarily in highly heterochro- 
matic regions and in chromosome ends. Satellite DNA 
is located mostly around centromeres. An as yet 
incomplete set of five trisomic stocks has been iden- 
tified. A consensus molecular linkage map, correlating 
some 12 maps, has been established based on RFLP, 
RAPD, isozyme, AFLP, ISSR, microsatellite, and 
phenotypic markers. The average total map length is 
1200 cM, consistent with the average number of chiasmata
per bivalent (1.9). A single estimate of the average
relationship of physical vs. genetic distance gave
400 000 bp per cM, close to the genome-wide average 
of 500 000 bp per cM. The genome of common bean is 
colinear with that of Vigna sp. (also belonging to the 
subtribe of the Phaseolinae within the tribe Phaseo- 
leae), but shows many rearrangements when com- 
pared to that of soybean (subtribe of the Glycininae 
within the tribe Phaseoleae). A retrotransposon family 
of the copia type has been described. Bacterial artifi- 
cial chromosome libraries have been established for 
common bean. Major genes or quantitative trait loci 
for the domestication syndrome (reduced seed dis- 
persal and seed dormancy, compact growth habit, 
photoperiod insensitivity, seed size, color, and color 
pattern) have been located on the linkage map, as 
have clusters of resistance genes and resistance gene 
analogs (to viral, fungal, and bacterial diseases), and 
genes for Rhizobium nodulation, canning quality, and 
drought tolerance. In addition, several unmapped 
genes, especially for disease resistance and seed color 
and color pattern, have been tagged with molecular 
markers. Transformation systems have been estab- 
lished. These include an Agrobacterium-mediated 
system in P. acutifolius and a biolistics method in
P. vulgaris. 
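The genome and map figures quoted above can be cross-checked with back-of-the-envelope arithmetic. The sketch uses the genome size, map length, chromosome number, and chiasma frequency given in the text, plus one assumption not stated there: the textbook rule that each chiasma corresponds to about 50 cM of map length.

```python
GENOME_BP = 625_000_000       # haploid genome size quoted in the text
MAP_LENGTH_CM = 1_200         # average consensus map length quoted in the text
HAPLOID_N = 11                # 2n = 22
CHIASMATA_PER_BIVALENT = 1.9  # quoted in the text
CM_PER_CHIASMA = 50           # assumption: textbook rule, one chiasma ~ 50 cM

bp_per_cm = GENOME_BP / MAP_LENGTH_CM
map_from_chiasmata = HAPLOID_N * CHIASMATA_PER_BIVALENT * CM_PER_CHIASMA

print(f"{bp_per_cm:,.0f} bp per cM")
print(f"{map_from_chiasmata:,.0f} cM expected from chiasma counts")
```

Dividing the genome by the map length gives roughly 520 000 bp per cM, in line with the genome-wide average of 500 000 bp per cM, and the chiasma count predicts a map of about 1045 cM, the same order as the 1200 cM observed.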


Further Reading 
http://agronomy.ucdavis.edu/gepts/geptslab.htm 
http://www.ba.cnr.it/Beanref/ 
http://beangenes.cws.ndsu.nodak.edu:80/ 


See also: Glycine max (Soybean); Rhizobium; 
Transfer of Genetic Information from 
Agrobacterium tumefaciens to Plants 
Phenetics is the study of phenetic similarity, which is
that based on observed resemblances between entities 
without considering their history. This is in contrast to 
cladistics, in which similarity reflects the entities’
evolution. Thus, extreme evolutionary convergence 
can produce close phenetic similarity, though the 
cladistic relationship may be remote. All data can be 
analyzed to give either phenetic or cladistic relation- 
ships depending on the algorithms employed. Phe- 
netics does not refer to phenotype; genotypic or 
genomic data can be analyzed either by phenetic or 
cladistic methods. 
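A widely used phenetic algorithm is UPGMA (unweighted pair-group method with averages), which repeatedly joins the two most similar clusters and measures the distance from a new cluster to every other cluster as the size-weighted average of its members' distances. The sketch below is a minimal implementation working on distances (dissimilarities) rather than similarities; a similarity matrix would first have to be converted, for example as 1 - s, which is an assumption of this example. Internal nodes carry a height equal to half the joining distance, so all tips end at the same level.

```python
def upgma(names: list[str], matrix: list[list[float]]):
    """Cluster entities by UPGMA on a symmetric distance matrix.

    Returns a nested tuple: leaves are names, internal nodes are
    (left, right, height), where height is half the joining distance.
    """
    # cluster id -> (subtree, number of leaves)
    clusters = {i: (name, 1) for i, name in enumerate(names)}
    # distances keyed by (smaller id, larger id)
    dist = {(i, j): matrix[i][j]
            for i in range(len(names)) for j in range(i + 1, len(names))}
    next_id = len(names)
    while len(clusters) > 1:
        # closest pair first; ties broken by id so the result is deterministic
        i, j = min(dist, key=lambda k: (dist[k], k))
        (tree_i, n_i), (tree_j, n_j) = clusters.pop(i), clusters.pop(j)
        height = dist[(i, j)] / 2
        # distance from the new cluster to each remaining one: size-weighted average
        for k in clusters:
            d_ik = dist[(min(i, k), max(i, k))]
            d_jk = dist[(min(j, k), max(j, k))]
            dist[(k, next_id)] = (n_i * d_ik + n_j * d_jk) / (n_i + n_j)
        dist = {key: d for key, d in dist.items() if i not in key and j not in key}
        clusters[next_id] = ((tree_i, tree_j, height), n_i + n_j)
        next_id += 1
    return next(iter(clusters.values()))[0]


names = ["A", "B", "C", "D"]
matrix = [[0, 2, 6, 10],
          [2, 0, 6, 10],
          [6, 6, 0, 10],
          [10, 10, 10, 0]]
print(upgma(names, matrix))  # ('D', ('C', ('A', 'B', 1.0), 3.0), 5.0)
```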


See also: Cladistics; Taxonomy, Evolutionary 
A phenocopy is a condition where the phenotype of
an individual is altered because of an environmental 
factor, and thus the individual appears to have an 
altered genotype, though in fact it does not. For example,
a person who is genetically diabetic but whose deficiency
is corrected by taking insulin appears to have a
normal genotype even though he is still genetically
diabetic. For bacteria, the phenotype of Escherichia
coli F* cells (which carry F fertility factors and thus 
are good donors and poor recipients) is converted 
by long incubation in stationary phase into cells 
which are good recipients and poor donors, i.e., they 
are F` phenocopies. 

In addition to a reversible phenocopy condition as 
in the examples above, another type of phenocopy 
occurs when a developing embryo is subjected to 
certain unusual stresses which permanently aoe the 
individual’s development. For example, a genetically 
normal human embryo subjected to the drug thalido- 
mide results in arrested development of part or all of 
the four limbs. Thus, the resulting phenotype does not 
correspond to the individual’s normal genotype. Simi- 
larly, in Drosophila, numerous environment factors 
during the course of fly development can result in 


altered appearance or behavior of the adult fly, 
in some cases mimicking the phenotype of a genetic- 
ally mutant fly, even though in fact the genotype is 
normal. 


See also: Genotype; Phenotype 
Phenogram


A phenogram is one form of tree-like diagram (dendrogram) that expresses the phenetic relationships
between the entities studied. The relationships are 
based on observed resemblances without considering 
their history, in contrast to cladograms which express 
phylogeny. A phenogram is usually constructed by 
cluster analysis of the similarities between the entities. 
The most similar entities are grouped together first, 
and less similar ones are added successively. The en- 
tities are represented as the tips of the tree, which all lie 
at the same level. The scale from the base to tips 
represents similarity values. The best-known cluster 
methods are the unweighted pair-group method with 
averages (UPGMA) and the single linkage method 
(SL). 
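The cluster procedure just described can be sketched as a short agglomerative loop. This is a minimal illustration, assuming pairwise distances (for example, 1 minus similarity) are already available; the distance values and the function name `phenogram` are hypothetical. UPGMA takes the mean distance over all pairs of members of two clusters, while single linkage takes the distance of the closest pair.

```python
from itertools import combinations


def phenogram(labels: list[str], dist: dict[tuple[str, str], float],
              method: str = "upgma") -> list[tuple[tuple, tuple, float]]:
    """Agglomerative clustering of entities into a phenogram.

    dist holds the distance (e.g. 1 - similarity) for each pair of labels.
    "upgma": cluster distance = mean over all pairs of member entities.
    "single": cluster distance = distance of the closest pair of members.
    Returns the merges in order as (cluster_a, cluster_b, level).
    """
    if method not in ("upgma", "single"):
        raise ValueError(f"unknown method: {method}")

    def d(a: str, b: str) -> float:
        return dist[(a, b)] if (a, b) in dist else dist[(b, a)]

    def linkage(x: tuple, y: tuple) -> float:
        pairs = [d(a, b) for a in x for b in y]
        return sum(pairs) / len(pairs) if method == "upgma" else min(pairs)

    clusters = [(name,) for name in labels]  # one cluster per entity at the tips
    merges = []
    while len(clusters) > 1:
        # join the most similar (least distant) pair of clusters first
        x, y = min(combinations(clusters, 2), key=lambda p: linkage(*p))
        merges.append((x, y, linkage(x, y)))
        clusters.remove(x)
        clusters.remove(y)
        clusters.append(x + y)
    return merges


# Hypothetical distances between four entities
DIST = {("A", "B"): 0.2, ("A", "C"): 0.6, ("A", "D"): 0.7,
        ("B", "C"): 0.5, ("B", "D"): 0.8, ("C", "D"): 0.3}

for method in ("upgma", "single"):
    for a, b, level in phenogram(["A", "B", "C", "D"], DIST, method):
        print(method, a, "+", b, "at distance", round(level, 3))
```

Each merge level gives the height of an internal node; converting it back with 1 minus distance yields the similarity scale that runs from the base of the phenogram to its tips. With these data both methods join A with B and then C with D, but they place the final node at different levels (about 0.65 for UPGMA, 0.5 for single linkage).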


See also: Phenetics; Trees 
Phenotype


The term phenotype is used to describe a discrete or
measurable trait, attribute, or characteristic that is 
expressed in only a subset of the individuals within a 
population. Some phenotypes are controlled entirely 
by the genetic constitution of the individual, meaning 
his or her genotype at one or more loci. Other pheno- 
types are controlled by a combination of genetic and 
nongenetic factors. Still other phenotypes (like the 
particular language that one speaks) are entirely non- 
genetic. 


See also: Genotype 


Phenotypic Lag 
The phenotypic lag is the period of time between the
introduction or loss of genetic material and the
expression of the corresponding phenotype. For
example, if an antibiotic-sensitive bacterium is exposed
to the antibiotic immediately after an antibiotic
resistance gene has been introduced into it by
transformation, the bacterium will not survive.
However, if the newly transformed bacterium is first
allowed to grow for a period in the absence of the
antibiotic, it has time to express the enzyme (protein)
that degrades the antibiotic, and as a consequence it
survives when the antibiotic is then applied.


See also: Phenotype 
Phenotypic Mixing


Phenotypic mixing is a concept that grew out of
experiments in the 1940s and 1950s in which two
closely related bacteriophages (T2 and T4) were used
to coinfect the same culture of Escherichia coli host
cells and the progeny phages were analyzed. The
genome of either phage could be packaged into the
capsid of either phage, so that the progeny in each
burst were a mixed population of pure and hybrid
genome/capsid combinations. The hybrid progeny
particles thus showed phenotypic mixing: the capsid
phenotype (T2 or T4, distinguishable by the ability of
the particle to infect certain other E. coli host strains)
did not correspond to the genome (T4 or T2) packaged
within.

This phenomenon was subsequently used to test for
in vivo complementation between different tail mutants
of phages, to determine whether the mutations lie in the
same or different genes. Phenotypic mixing was also
used to study the kinetics of expression of tail or capsid
genes, by delaying infection with a second phage until
progeny of the first infection had partially developed.
Phenotypic mixing was also used in the study of
tobacco mosaic virus, in which hybrid genome/capsid
particles were constructed in vitro and used to prove
that the RNA genome carries the genetic information
of the virus.


See also: Bacteriophages; Complementation Test 
Phenylalanine


Phenylalanine is one of the 20 amino acids commonly
found in proteins. Its abbreviation is Phe and its single 
letter designation is F. As one of the essential amino 
acids in humans, it is not synthesized by the body and 
so must be provided in an individual’s diet. 

The chemical structure of phenylalanine is given
below.


Figure 1 Phenylalanine.
See also: Amino Acids 
Phenylketonuria 


R C Eisensmith and S L C Woo 


Copyright © 2001 Academic Press 
doi: 10.1006/rwgn.2001.0990


Phenylketonuria (PKU), an autosomal recessive gen- 
etic disease, is the most severe form of a broad spec- 
trum of disorders, which stem from an inability to 
hydroxylate the essential amino acid phenylalanine 
to form the normally nonessential amino acid tyro- 
sine. This metabolic defect results in significantly ele- 
vated levels of phenylalanine in the blood. When 
phenylalanine levels become sufficiently high, alterna- 
tive pathways for the metabolism of phenylalanine can 
become activated. Phenylalanine may either be decar- 
boxylated to form phenylethylamine or be transamin- 
ated to form a variety of phenylketone compounds. 
The excretion of phenylketones via the urine was 
the initial diagnostic feature of this disorder and gave 
rise to its name. These abnormal phenylalanine derivatives and/or elevated levels of phenylalanine interfere with several critical processes in the developing
brain, including myelination and the synthesis of pro- 
teins and neurotransmitters, which greatly impairs 
cognitive function. Other than mental retardation, 
the most overt symptom associated with persistent 
hyperphenylalaninemia is the hypopigmentation that 
is secondary to a deficiency of tyrosine, the precursor 
of melanin. Additional symptoms reported in PKU 
patients include scleroderma, behavioral disturbances, 
and convulsive seizures. Persistent hyperphenylalanin- 
emia in females during the course of pregnancy can 
lead to the occurrence of a number of birth defects in 
the developing offspring, including mental retardation, 
microcephaly, impaired somatic growth, congenital 
heart abnormalities, and facial dysmorphisms. This 
so-called ‘maternal PKU’ syndrome is also associated 
with a higher incidence of stillbirths than normal 
pregnancies. 

The mental retardation and other symptoms of 
hyperphenylalaninemia can be largely reduced or 
completely prevented by restriction of phenylalanine 
in the diet if treatment is implemented early in the 
neonatal period and maintained throughout the course 
of the patient’s life. This observation stimulated the 
development of a simple procedure that could be used 
to rapidly test all newborns for this disorder so that 
treatment could be implemented. From these mass 
screenings, the incidence of PKU has been estimated 
at approximately one case in every 10000 births 
among Caucasians. This incidence, which corres- 
ponds to a carrier frequency of about one in fifty, 
places PKU among the most common of inborn errors 
of amino acid metabolism in man. 
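The step from incidence to carrier frequency follows from the Hardy–Weinberg proportions, assuming random mating and equilibrium, with all PKU-causing PAH alleles pooled into a single mutant allele frequency q. Affected homozygotes then occur at frequency q² = 1/10 000, so q = 0.01, p = 0.99, and carriers occur at 2pq ≈ 0.0198, roughly one in fifty. The function name `carrier_frequency` is illustrative.

```python
import math


def carrier_frequency(incidence: float) -> float:
    """Carrier (heterozygote) frequency 2pq for an autosomal recessive disorder.

    Assumes Hardy-Weinberg equilibrium, so the incidence of affected
    homozygotes equals q**2, where q is the pooled mutant allele frequency.
    """
    q = math.sqrt(incidence)  # mutant allele frequency
    p = 1.0 - q               # normal allele frequency
    return 2.0 * p * q


freq = carrier_frequency(1 / 10000)
print(f"carrier frequency = {freq:.4f} (about 1 in {1 / freq:.1f})")
```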

In the vast majority of cases, hyperphenylalaninemia
results from a lack of the liver-specific enzyme
phenylalanine hydroxylase (PAH). This mixed-function
oxygenase incorporates one atom of molecular oxygen
into phenylalanine to form tyrosine, while the other
oxygen atom is reduced to water by electrons supplied
by a tetrahydrobiopterin cofactor (BH4). Because one
molecule of the cofactor is oxidized for each molecule
of phenylalanine that is hydroxylated, BH4 must be
rapidly regenerated for the hydroxylation reaction to
proceed catalytically. Regeneration of BH4 occurs via
a two-step reaction catalyzed by the enzymes
pterin-4a-carbinolamine dehydratase (originally called
phenylalanine hydroxylase-stimulating protein) and
dihydropteridine reductase (DHPR). While the recycling
reaction influences the amount of the cofactor that is
immediately available to support the phenylalanine
hydroxylase reaction, the overall level of BH4 within
the cell is ultimately limited by its biosynthesis from
guanosine triphosphate (GTP).


This biosynthetic pathway involves at least three
additional enzymes: GTP cyclohydrolase I (GTP-CH),
6-pyruvoyl tetrahydropterin synthase (6-PTS), and
sepiapterin reductase. Because BH4 is an absolute
requirement for phenylalanine hydroxylation, any
deficiency in the synthesis or recycling of this cofactor
can impair the hydroxylation of phenylalanine, leading
to hyperphenylalaninemia. These so-called ‘BH4-deficient’
forms of hyperphenylalaninemia are quite rare,
accounting for only about 1–2% of all cases of PKU,
but their existence can complicate the diagnosis of
this disorder.

In the case of PAH-deficient hyperphenylalaninemia,
the disorder is caused by mutations in the
phenylalanine hydroxylase gene. This gene spans
approximately 90 kb of the q22–q24.1 region of
chromosome 12 and contains 13 exons separated by
introns ranging in size from less than 1 kb to more
than 20 kb. The full-length mRNA transcribed from
this gene is 2.4 kb long and contains an open reading
frame of approximately 1350 bp, which encodes a
52-kDa protein of 452 amino acids. Four PAH
monomers associate to form the mature,
homotetrameric human protein.

The determination of the full-length sequence of 
the PAH mRNA permitted the construction of specif- 
ic probes for examination of the PAH gene by South- 
ern hybridization. Studies performed on the normal 
and mutant PAH genes present in families in which 
one individual is afflicted with PKU have yielded two 
important observations. First, complete deletion of 
the gene is not responsible for PKU. Second, the 
human PAH gene contains a number of restriction 
fragment-length polymorphisms (RFLPs) as well as 
several forms of repeat polymorphisms (VNTRs and 
STRs). These polymorphisms are tightly linked to the 
PAH gene, and thus can be used to assign normal and 
mutant PAH chromosomes to specific haplotypes. 
Moreover, there is a high degree of heterogeneity for 
these polymorphisms in most human populations. 
Consequently, haplotype analyses performed within 
PKU families can assist in the diagnosis of the disease, 
either pre- or postnatally. However, to discriminate 
between normal and mutant PAH alleles, an affected 
individual, or proband, must already be present. 
Because nearly all cases of PKU occur in families 
with no prior history of the disease, the utility of haplo- 
type testing can be limited. This constraint may theor- 
etically be overcome through direct detection of the 
PKU-causing PAH mutations in a given individual. 

Studies in PKU kindreds collected throughout the 
world have identified more than 400 mutations at 
the PAH locus. Missense mutations account for the
majority of all mutations, followed by small deletions,
splicing mutations, nonsense mutations, and
insertions. Most individuals with hyperphenylalaninemia
are compound heterozygotes, bearing different
mutations on each of their two copies of the PAH 
gene. Biochemical analyses of the proteins encoded 
by these mutant PAH genes in most cases confirm 
the deleterious effect of the mutation on protein func- 
tion. Moreover, such studies permit a relative ranking 
of mutations on the basis of the residual enzymatic 
activity associated with each mutant protein. There is 
a strong correlation between the biochemical pheno- 
type of patients, as defined by the degree of hyper- 
phenylalaninemia experienced before initiation of 
treatment and/or the ability of patients to tolerate 
increasing amounts of oral phenylalanine while main- 
taining a given level of phenylalanine in the blood, and 
their PAH genotype. Consistent with the recessive
nature of the disorder, in which a single normal copy of
the gene provides enough PAH activity to prevent
disease, in an individual who is compound
heterozygous for a mild and a severe PAH mutation,
the single mildly impaired allele can supply enough
PAH activity to prevent severe hyperphenylalaninemia.
Correlations of this type are useful in the diagnosis
and treatment of patients with PAH-deficient
hyperphenylalaninemia.
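The observation that the milder allele largely sets the phenotype of a compound heterozygote can be expressed as a simple rule. The sketch below is a toy model only: the residual activities, the cutoff values, the severity labels, and the function name `predicted_severity` are all hypothetical and are not clinical thresholds from the entry.

```python
def predicted_severity(activity_1: float, activity_2: float,
                       normal: float = 0.5, mild: float = 0.10,
                       moderate: float = 0.01) -> str:
    """Toy genotype-to-phenotype rule for a PAH genotype.

    activity_1/activity_2: residual activity of each allele as a fraction of
    wild type (hypothetical values). The rule keys on the more active allele,
    mirroring the observation that one mildly impaired allele can prevent
    severe disease in a compound heterozygote. All cutoffs are illustrative
    assumptions, not clinical thresholds.
    """
    best = max(activity_1, activity_2)
    if best >= normal:
        return "unaffected"  # e.g. a carrier with one normal allele
    if best >= mild:
        return "mild hyperphenylalaninemia"
    if best >= moderate:
        return "moderate hyperphenylalaninemia"
    return "severe hyperphenylalaninemia"


# Hypothetical genotypes: severe + mild allele, two severe alleles, a carrier
for genotype in [(0.0, 0.25), (0.0, 0.0), (1.0, 0.0)]:
    print(genotype, "->", predicted_severity(*genotype))
```

Taking the maximum rather than the mean is the design choice that encodes the recessive behavior described above; a real ranking would rest on measured residual activities of each mutant protein.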

With such a large number of mutations present in 
the PAH gene, it is not surprising that the spectrum of 
PAH mutations varies considerably between popula- 
tions. For example, among Slavic populations, R408W 
is the predominant mutation responsible for PKU, 
and can be present on more than 70% of all mutant 
PAH chromosomes. In contrast, among Mediterra- 
nean populations, this mutation is quite rare. The 
most common mutation in these populations is a spli- 
cing mutation that is relatively rare outside of the 
Mediterranean area. Population studies have indicated 
that the distribution of PAH mutations in various 
human populations is most likely the result of multi- 
ple, independent founding events followed by genetic 
drift. However, founder effect and genetic drift alone 
seem unlikely to account for the present distribution 
of PKU, which is present at relatively high frequency 
not only in Caucasians, but also in several other cul- 
turally and geographically distinct human popula- 
tions. Selective advantage among heterozygotes, as 
has been observed for several other recessive dis- 
orders, remains an attractive hypothesis to account for 
the high relative frequency of this disorder, especially 
in light of the strong apparent disadvantage in repro- 
duction that is associated with homozygosity. How- 
ever, at present, there is little direct evidence to 
support this hypothesis. 

Since its discovery, PKU has served as a paradigm 
for diagnosis and management of patients with 
metabolic disorders. It was among the first to be
detected by newborn screening and to be treated by 
dietary restriction therapy. It was also among the first 
in which detailed genotype/phenotype correlations 
were derived and used to improve the management 
of patients. Moving into the future, PKU can serve as a 
prototype for the correction of metabolic disorders 
secondary to hepatic enzyme deficiencies by gene 
therapy. As gene transfer technology matures, many 
of these diseases, including PKU, may be treated 
through delivery of the normal gene into parenchymal 
cells of the liver. 


See also: Gene Therapy, Human; Genetic
Counseling; Genetic Diseases 
Philadelphia Chromosome


Chromosomal abnormalities are frequently acquired
by the bone marrow cells of patients with leukemia. A
large number of nonrandom chromosomal changes are
now known to be associated with different types of
leukemia, so cytogenetic analysis of the bone marrow
can help establish an accurate diagnosis. The
chromosomal abnormality also helps to identify
patients with a good prognosis, or those at high risk of
treatment failure who may be considered for
alternative therapy such as bone marrow
transplantation (see Leukemia).


Incidence and Outcome 


The most famous example of an acquired chromosomal 
change in malignancy is the Philadelphia chromosome 
(Ph). It was the first chromosomal abnormality to be 
found in leukemia in 1960 and is now known to be pre- 
sent in 95% of chronic myeloid leukemia (CML) 
cases. It also occurs in acute leukemia. In acute lymphoblastic leukemia (ALL), the Ph is found in 2–3%
of childhood cases, but in adults it is the most common 
cytogenetic change, the incidence of which increases 
with age. In acute myeloid leukemia (AML), it is also 
rare, accounting for approximately 1% of cases. In 
acute leukemia the presence of the Ph is associated 
with a poor outcome. 


Genetics 


Figure 1 Partial karyotype, showing the t(9;22)(q34;q11) giving rise to the Ph chromosome.


The Ph provides an elegant example of how cytogenetic findings provided the starting point for
understanding the genetic mechanisms involved in
leukemogenesis. The Ph arises as a result of a reciprocal translocation between chromosomes 9 and 22,
t(9;22)(q34;q11). Variant and complex translocations
occur in which other chromosomes may also be 
involved. The genetic mechanisms involved in 
t(9;22)(q34;q11) are well understood. The ABL 
proto-oncogene is located on chromosome 9 in the 
chromosome band 9q34 and, as a result of the trans- 
location with chromosome 22, is moved into the BCR 
gene in 22q11. The translocation joins 3’ sequences of 
ABL to the 5’ sequences of the BCR gene. The formation
of the Ph from t(9;22)(q34;q11) or variant transloca- 
tions thus results in a BCR/ABL hybrid gene on the 
derived chromosome 22 (Ph). In CML, the break- 
points within BCR occur in a 5.8-kb region, either 
between exons 13 and 14 (b2a2), or exons 14 and 15 
(b3a2), which has been termed the ‘major breakpoint 
cluster region’ (M-BCR). This BCR/ABL gene tran- 
scribes an aberrant 8.5-kb mRNA, which in turn 
translates into a chimeric p210 protein. In the majority
of Ph-positive ALL cases, the breakpoint occurs in the
first intron of the BCR gene, the minor breakpoint
cluster region (m-BCR), and between exons 1 and 2 of
the ABL gene (e1a2). This results in the
generation of a 7-kb mRNA and gives rise to a p190
protein product. Both BCR/ABL fusion proteins 
(p190 and p210) possess enhanced tyrosine kinase 
activity and provide examples of activation of an 
oncogene by the creation of a novel fusion product, 
leading to the generation of leukemia. 

The Ph can usually be identified in CML and 
ALL by conventional cytogenetic analysis (Figure 1). 
However, 5% of CML cases and a small number of ALL
cases are Ph negative: no Ph is visible, yet the cells
are positive for the BCR/ABL fusion. The Ph
translocation in both Ph-positive and Ph-negative 
cases can be detected by fluorescence in situ hybrid- 
ization (FISH) using probes for BCR and ABL 
(Figure 2). Rearrangements within M-BCR may be 
detected by Southern analysis, and the BCR/ABL 
fusion transcript detected by the reverse transcriptase 
polymerase chain reaction (RT-PCR). 
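The junction types named above (b2a2, b3a2, e1a2) can be read off an RT-PCR result with a small lookup table. This is a minimal sketch that encodes only the breakpoint regions, transcript sizes, and protein products stated in this entry; the table and the function name `describe_junction` are illustrative, and real assays recognize further junction types not listed here.

```python
# Junction types and their consequences, as described in this entry only.
JUNCTIONS = {
    "b2a2": ("M-BCR", "8.5-kb", "p210"),
    "b3a2": ("M-BCR", "8.5-kb", "p210"),
    "e1a2": ("m-BCR", "7-kb", "p190"),
}


def describe_junction(junction: str) -> str:
    """Summarize the BCR breakpoint region, transcript, and protein for a junction."""
    try:
        region, mrna, protein = JUNCTIONS[junction.lower()]
    except KeyError:
        raise ValueError(f"junction {junction!r} is not in this table") from None
    return f"{junction}: breakpoint in {region}, {mrna} mRNA, {protein} BCR/ABL protein"


for j in ("b3a2", "e1a2"):
    print(describe_junction(j))
```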


Figure 2 Diagrammatic representation of the translocation t(9;22)(q34;q11) by dual-color FISH on metaphase chromosomes. Two locus-specific probes are employed, one for BCR and one for ABL, labeled with two different colored fluorochromes, which enable the BCR/ABL fusion gene to be accurately visualized in both metaphase and interphase cells. The normal chromosome 9 (9) shows paired signals indicating the presence of the ABL gene on the long arm (solid circles). The normal chromosome 22 (22) shows signals for the BCR gene (hatched circles). As a result of the translocation, ABL is moved onto the derived chromosome 22 (der(22)), where it fuses with BCR, indicating the presence of the BCR/ABL fusion gene; no ABL signal is observed on the derived chromosome 9. (A) The rearrangement in metaphase and (B) in interphase.


See also: Leukemia


φX174


Since its isolation, the ‘race X’ virus placed in test tube 174 has produced more enigmas per nucleotide than any
other organism. Race X bacteriophage were much
smaller than the other bacteriophage characterized in 
the 1920s. Later, electron micrographs revealed a ‘tail- 
less’ particle, another oddity. In the 1950s, Robert 
Sinsheimer demonstrated that the genome was
single-stranded DNA, which facilitated Fred Sanger’s
pioneering work in DNA sequencing. φX174 has been
used as a model system for the study of prokaryotic
DNA replication, gene expression, and morphogenesis.
With the elucidation of the virion and procapsid
atomic structures, φX174 became one of the few
organisms in which the genetics of morphogenesis
could be studied within a structural context.


The Genetic Map and Its Evolution 


Early mapping experiments were difficult to interpret. 
Mutations in one cistron were often surrounded by 
mutations in another. Because this was inconsistent
with then-current theories, maps depicted genes in
nonoverlapping linear arrangements. Years later, the
nucleotide sequence proved the existence of
overlapping genes. Some scientists believed that an
advanced alien race had genetically engineered φX174,
and searched the genome for hidden messages.
According to a New York Times article (Walter
Sullivan, 1979), the phage would “persist until the
evolution of intelligent life and investigators interested
in the genetics of phage.” Granted, if Evolution were a
graduate student, s/he would have been expelled from
even the most patient PhD program for lack of
progress, but given a billion years, Evolution produces
elegant work! Protein
function and/or structure govern the arrangement of 
the overlapping reading frames. The overlapping 
genes encode catalytic and scaffolding proteins, not 
the structural proteins of the virion. The A*, B, and K
genes all reside within reading frames found within
the larger A gene. Two of these proteins, A* and K, are
nonessential for replication. The third, the internal
scaffolding protein B, is extremely flexible and highly
tolerant of amino acid substitutions. These properties
place little, if any, constraint on the ability of the A
protein to coevolve with the host cell proteins with
which it must interact. Gene C overlaps partially with
the nonessential K gene. Finally, gene E, which
encodes a lysis protein, resides within the gene
encoding the external scaffolding protein (D).
However, the E protein is not needed for the
production of infectious progeny, and the entire
protein is not required to mediate host lysis.
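Why overlapping reading frames constrain evolution can be seen by translating one stretch of DNA in two frames and then changing a single nucleotide. The sketch below uses the standard genetic code on a short hypothetical sequence (not φX174 sequence); the names `translate` and `point_mutation` are illustrative. The substitution chosen is silent in one frame (CGT to CGC, both arginine) but replaces valine with alanine in the overlapping frame.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
               for i, a in enumerate(BASES)
               for j, b in enumerate(BASES)
               for k, c in enumerate(BASES)}


def translate(seq: str, frame: int = 0) -> str:
    """Translate seq from offset `frame` (0, 1 or 2); '*' marks a stop codon."""
    return "".join(CODON_TABLE[seq[i:i + 3]]
                   for i in range(frame, len(seq) - 2, 3))


def point_mutation(seq: str, pos: int, base: str) -> str:
    """Return seq with the nucleotide at 0-based position `pos` replaced."""
    return seq[:pos] + base + seq[pos + 1:]


# Hypothetical sequence read in two overlapping frames (not phiX174 sequence)
wild = "ATGCGTAAAGCTCTG"
mutant = point_mutation(wild, 5, "C")  # T -> C at position 5

print(translate(wild, 0), translate(wild, 1))      # MRKAL CVKL
print(translate(mutant, 0), translate(mutant, 1))  # MRKAL CAKL
```

In an overlapping region, a substitution that is neutral for one protein is often not neutral for the other, which is why, as noted above, the overlapping genes tend to encode proteins that are nonessential or unusually tolerant of substitutions.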


The Genetics of Genome Replication 
and Packaging 


(+) Single-stranded DNA replication strategies are 
complex, usually occurring in three separate stages. 
Stage I DNA replication involves the conversion of
the single-stranded genome into a covalently closed, 
double-stranded, circular molecule, called replicative 
form (RF) I DNA. A stem-loop structure in the FG 
intercistronic region (the noncoding DNA sequence 
between genes F and G) serves as the origin of replica- 
tion for this process. Transfected (+) DNA is infec- 
tious; therefore, host cell proteins alone are both 
necessary and sufficient for stage I DNA synthesis, 
which was fully reconstituted in vitro by Arthur 
Kornberg and colleagues. 

Stage II and III DNA synthesis require two viral
proteins, A and C, and an additional host cell protein,
the rep protein, a DNA helicase. During stage II DNA
replication, the double-stranded molecule is amplified.
The A protein binds, nicks, and covalently attaches to
RF I DNA at the stage II origin of replication, and
replication proceeds through a rolling-circle
mechanism. Because of the nicking action of the A
protein, RF II DNA is relaxed, not supercoiled like
RF I molecules. In cells infected with gene A mutants,
or in rep− cells infected with wild-type φX174, only
supercoiled RF I molecules are produced. The
30-nucleotide origin of replication has been cloned
and shown to be both necessary and sufficient for
stage II and III DNA synthesis.

Early studies conducted to investigate single- 
stranded genome, or stage III, biosynthesis indicated 
that any mutation that blocked virion assembly pre- 
vented genome biosynthesis. Thus, genome biosyn- 
thesis is completely dependent on the presence of an 
assembled viral procapsid. Furthermore, synthesis and 
packaging are concurrent processes. The stage II to 
stage III conversion is mediated by the viral C protein. 
Mutations in gene C result in overproduction of RF
II DNA, suggesting an inhibitory role. However, protein C is also a component of the stage III DNA
synthesis/packaging preinitiation complex, along 
with proteins A, rep, and RF II DNA, which physi- 
cally associates with the viral procapsid. The docking 
site for the preinitiation complex on the viral procapsid 
was first elucidated by second-site genetic analyses 
conducted with rep mutants that specifically blocked 
stage III DNA synthesis. The mutant proteins sup- 
ported stage II DNA synthesis and formed functional 
preinitiation complexes. However, these complexes 
were unable to associate with wild-type procapsids. 
Association can be restored by the introduction of 
several mutations in the viral coat protein. These 
mutations affect amino acids that reside in a depres- 
sion that skirts the twofold axis of symmetry in the 
atomic structure. Packaging and genome biosynthesis 
commence at the same origin of replication used in 
stage II DNA synthesis. By cloning the origin into 
plasmids of varying length, the genome packaging 
capacity of the φX174 capsid has been determined to
be 70 to 105% of unit genome length.
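Expressed in nucleotides, the packaging window is easy to compute. This sketch assumes the 5386-nucleotide length of the φX174 genome and treats the 70% and 105% figures as exact bounds, which is a simplification; the names `packaging_window` and `packageable` are illustrative.

```python
import math

PHIX174_GENOME_NT = 5386  # length of the phiX174 single-stranded DNA genome


def packaging_window(genome_nt: int = PHIX174_GENOME_NT,
                     low: float = 0.70, high: float = 1.05) -> tuple[int, int]:
    """Whole-nucleotide range spanning low..high unit genome lengths."""
    return math.ceil(low * genome_nt), math.floor(high * genome_nt)


def packageable(length_nt: int, genome_nt: int = PHIX174_GENOME_NT) -> bool:
    """True if a circular ssDNA of this length falls inside the window."""
    lo, hi = packaging_window(genome_nt)
    return lo <= length_nt <= hi


print(packaging_window())                            # (3771, 5655)
print([packageable(n) for n in (3000, 5000, 6000)])  # [False, True, False]
```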


The Genetics of Gene Expression 


Because the φX174 genome has positive polarity (the
same sense as mRNA), a complementary strand must be
made by stage I DNA synthesis before transcription
can occur. φX174 does not use trans-acting temporal
mechanisms for gene expression, which is regulated
entirely by cis-acting genetic elements: promoters,
transcription terminators, and ribosome binding
sites. There are three major promoters, PA, PB, and
PD, located before genes A, B, and D, respectively.
With the exception of gene A and A* transcripts,
most transcription commences at either PB or PD and
terminates at one of four major terminators located
after genes J, F, G, and H. The terminators are not
100% efficient, leading to a wide variety of transcripts,
including one greater than unit length. By cloning
different terminators behind various genes, Masaki
Hayashi and colleagues demonstrated that in vivo
transcript half-life is also a function of these
sequences. In contrast to transcripts beginning at PB
and PD, transcripts synthesized from PA are extremely
unstable. These transcripts degrade so rapidly from
their 3’ ends that a terminator has never been mapped.
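The effect of leaky terminators on transcript abundance can be sketched by multiplying read-through fractions along the genome. This is a toy model under stated assumptions: it follows only transcripts starting at PD, uses the gene order D, (E within D), J, F, G, H with terminators after J, F, G, and H as described above, treats terminators as acting independently, ignores differences in transcript half-life, and uses termination efficiencies that are purely hypothetical. The function name `transcript_coverage` is illustrative.

```python
def transcript_coverage(layout: list[str], stop_prob: dict[str, float]) -> dict[str, float]:
    """Fraction of transcripts from one promoter that extend across each gene.

    layout: genes and terminators in genome order, downstream of the promoter.
    stop_prob: probability that RNA polymerase terminates at each terminator;
    elements not in stop_prob are treated as genes. Terminators are assumed
    to act independently, so read-through fractions multiply.
    """
    running = 1.0
    coverage = {}
    for element in layout:
        if element in stop_prob:
            running *= 1.0 - stop_prob[element]  # fraction reading through
        else:
            coverage[element] = running
    return coverage


# Gene order downstream of P_D (gene E lies inside D and is omitted)
LAYOUT_FROM_PD = ["D", "J", "T_J", "F", "T_F", "G", "T_G", "H", "T_H"]
# Hypothetical termination efficiencies, chosen only for illustration
STOP_PROB = {"T_J": 0.5, "T_F": 0.5, "T_G": 0.5, "T_H": 0.9}

print(transcript_coverage(LAYOUT_FROM_PD, STOP_PROB))
```

Genes nearest the promoter are covered by every transcript, while downstream genes are covered by progressively fewer; the small fraction that reads through the last terminator would continue around the circular genome, giving transcripts longer than unit length.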

In general, there are more transcripts encoding
proteins that are required in greater abundance. As
noted above, the instability of PA transcripts ensures
that the catalytic A protein is not overproduced. Gene
D transcripts are the most abundant, and 240 copies of
D protein are required to build one virion, four times
more than any other protein. However, the relative
abundance and half-lives of transcripts are not
sufficient to ensure optimal in vivo protein
concentrations; another level of regulation is
translational. The E protein, for example, is
responsible for cell lysis. Early and abundant
translation of gene E would lead to premature cell
lysis, before infectious progeny are produced. Gene E
resides in an overlapping reading frame within gene D,
so there are many E transcripts in the cell, but these
are rarely translated because of a very weak ribosome
binding site. In addition, the E protein is not a
lysozyme, as found in other phages, but inhibits the
host cell MraY protein, an enzyme that catalyzes the
formation of the first lipid-linked intermediate in cell
wall biosynthesis. The consequences of E protein
expression are therefore delayed; ultimately, cells
become sensitive to osmotic pressure.


The Genetics of Particle Morphogenesis 


Because most of the φX174 morphogenetic
intermediates can be isolated from infected cells, it
was possible to use conditional lethal mutations to
elucidate the assembly pathway. The first
morphogenetic intermediates produced in infected
cells are 9S and 6S particles, which are pentamers of
the coat (F) and major spike (G) proteins,
respectively. These particles can form in the absence
of the internal or external scaffolding proteins,
proteins B and D, respectively. After 9S particle
formation, five copies of the internal scaffolding
protein bind to the underside of the particle. Because
second-site suppressors of reduced-function B
proteins map to the upper surface of the 9S particle,
binding most likely triggers conformational changes
on the particle’s upper surface that allow it to interact
with spike and external scaffolding proteins. In
addition, the B protein prevents the premature
association of 9S particles into aggregates, a function
reminiscent of molecular chaperones. Two lines of
evidence suggest that the Microviridae internal
scaffolding proteins have highly adaptable structures:
(1) despite only 30% homology, the φX174 and α3 B
proteins efficiently cross-complement, and (2) while
portions of the B protein are readily distinguished
within the crystal structure, much of the N-terminal
density is unordered, suggesting that interactions with
the overlying coat protein can be both variable and
flexible.

Genetic and structural data suggest that the C-termini
of the B proteins play a critical role in coat protein
recognition. In cross-complementation experiments,
the φX174 B protein is unable to direct the
morphogenesis of the related virus G4. However,
mutations in the G4 coat protein confer the ability to
utilize the φX174 B protein. The mutated amino acids
make contact with the B protein’s C-terminus, which is
well ordered in the atomic structure. To further
investigate the importance of C-terminal interactions,
a chimeric φX174/G4 B gene was generated and
assayed for its ability to complement G4 am(B)
mutants. Complementation was efficient, suggesting
that the inability of the φX174 protein to direct G4
morphogenesis resides in its C-terminus. This may be
a general property of internal scaffolding proteins:
similar results have been obtained in studies with
herpesvirus and P22 scaffolding proteins.

Viral procapsid formation requires 240 copies of the
external scaffolding protein (D), which forms a lattice on
the outer surface of the procapsid. The four D proteins
within each of the 60 icosahedral asymmetric units
(4 × 60 = 240) are arranged in two similar asymmetric
dimers of dimers. Each D protein makes different contacts
with the underlying capsid, neighboring D, and spike
proteins. For one protein to carry out these diverse
interactions, it must assume several unique and rather
varied structures, which challenges the widely held
assumption that folded proteins assume only one
conformation. Unlike internal scaffolding proteins,
foreign external scaffolding proteins are potent
cross-species inhibitors of viral morphogenesis. The
Microviridae external scaffolding proteins have diverged
by only 26%. However, this divergence is localized to two
regions: α-helix 1 and loop 6/α-helix 7 in the atomic
structure. The remainders of the proteins, constituting
α-helices 2-6, are highly conserved and mediate the vast
majority of intra- and interdimer contacts.
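A divergence figure such as 26% is a percentage of mismatched positions between aligned sequences. As a hedged illustration only, the sketch below computes it for two pre-aligned protein sequences, excluding columns that contain a gap in either sequence (one convention among several). The function name and the short example fragments are hypothetical; they are not the actual Microviridae D protein sequences.

```python
def percent_divergence(seq_a: str, seq_b: str, gap: str = "-") -> float:
    """Percent of aligned, ungapped positions at which two sequences differ.

    Assumes seq_a and seq_b are already aligned to equal length; columns
    with a gap in either sequence are left out of the comparison.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to equal length")
    compared = mismatches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == gap or b == gap:
            continue  # skip gap columns
        compared += 1
        if a != b:
            mismatches += 1
    if compared == 0:
        raise ValueError("no ungapped positions to compare")
    return 100.0 * mismatches / compared


# Hypothetical aligned fragments: 2 mismatches over 8 ungapped columns.
print(percent_divergence("MKAL-VTEQ", "MKGLAVSEQ"))  # 25.0
```

Applying the same count over short windows along an alignment would show whether the differences cluster in particular regions, as the text describes for α-helix 1 and loop 6/α-helix 7.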

Chimeric proteins were generated to separate the two
divergent domains and determine their individual
inhibitory effects. Foreign first α-helices appear to
block the formation of the procapsid, suggesting that this
structure confers coat protein specificity or the ability
to form both species of asymmetric dimers. The presence of
foreign loop 6/α-helix 7 sequences allows procapsid
morphogenesis, but these procapsids cannot be packaged. A
mutation that confers resistance to the expression of
foreign loop 6/α-helix 7 sequences has been isolated. The
chiDR mutation (chimeric D resistance) alters protein A,
a component of the genome biosynthesis/packaging
machinery, which binds the procapsid along the twofold
axis of symmetry. The location of the chiDR mutation and
the isolation of procapsids from cells expressing the
chimeric protein suggest that D protein amino acids, along
with coat protein amino acids, constitute the docking site
for the genome biosynthetic/packaging machinery. The
results of these analyses also demonstrate the
feasibility of using closely related proteins as
antiviral agents.

After formation of the viral procapsid, genome packaging
most likely occurs through one of the threefold-related
pores. During packaging, the B proteins are extruded. They
are most likely displaced by the DNA-binding J proteins,
which enter the procapsid along with the single-stranded
genome; the B and J proteins interact with a common cleft
in the viral coat protein. After one round of replication,
the viral A protein ligates the 3’ and 5’ ends of the
genome, creating a closed single-stranded circular
molecule. This packaged particle, called the provirion,
still contains the external scaffolding protein and may
represent the end of the intracellular assembly pathway.
Upon cell lysis, the resulting ion influx probably
triggers the dissociation of the external scaffolding
lattice, yielding the virion.

The J protein may also play a role in organizing the 
DNA within the icosahedral symmetry, which may 
influence the final dimensions of the virion. The 
Microviridae J proteins are small, very basic proteins.
The N-termini of these proteins are rich in lysine and
arginine residues. These positively charged side chains
bind the genome’s phosphate backbone in a nonspecific
manner; the J protein will bind any nucleic acid
regardless of sugar moiety or strandedness. In addition,
there is a basic amino acid cluster in the viral coat
protein that forms a DNA-binding pocket. The C-terminus of
the J protein binds to a cleft adjacent to the DNA-binding
pocket. The combined interactions of the DNA-binding
pocket and J protein
tether the genome to the inner surface of the capsid. 
The tether prevents the single-stranded genome from 
forming secondary structure. Biophysical character- 
ization of virions packaged with mutant or foreign J 
proteins indicates that particles have altered dimen- 
sions, which leads to a decrease in infectivity. These 
infectivity defects can be suppressed by amino acid 
substitutions in the viral coat protein that mediate 
interactions across twofold and threefold axes of 
symmetry. 


Evolution 


The Microviridae have often served as model systems for
the investigation of fundamental biological and
biophysical questions. Recently, J. Bull and H. Wichman
have used φX174 to elucidate evolutionary mechanisms in
chemostat experiments. Three point mutations that confer a
selective advantage at extremely high temperatures have
been identified (Bull et al., 2000). These mutations
appear to act at the level of procapsid morphogenesis,
suggesting that morphogenesis, as opposed to particle
stability, may be the driving evolutionary force under
these conditions. Morphogenesis may be the driving
evolutionary force in host adaptation as well.
Microviridae isolated from obligate intracellular
parasitic bacteria appear to assemble without scaffolding
proteins. The primary functions of the coliphage
scaffolding proteins are the mediation of twofold
interactions, the placement of spike protein pentamers on
the coat protein, and the organization of the coat protein
at the threefold axes of symmetry. These functions may not
be required in these distant φX174 relatives. These phages
appear to be spikeless, a large insertion loop in the coat
protein most likely organizes the threefold axes, and the
internal scaffolding protein appears to have evolved into
a structural protein.


Future Prospects 


The proper assembly of proteins and nucleic acids into
biologically active virions involves numerous and 
diverse macromolecular interactions. The combination 
of genetic, biochemical, and structural approaches is 
making the Microviridae an extremely powerful sys- 
tem in which to study the fundamentals of morpho- 
genesis and evolution at the atomic level. 

The structural and morphogenetic studies were 
supported by a grant from the National Science Foun- 
dation. 
Due to their sessile nature, plants develop with great
plasticity and adapt to a variety of external conditions.
Being photoautotrophic, plants use light both as a
source of energy and as a major stimulus for many 
developmental decisions. Plants use light in the pro- 
cess of photosynthesis to generate chemical energy 
and to fix carbon. In photomorphogenesis, light, 
sometimes at very low doses, triggers developmental 
decisions. It is important to distinguish between these 
two actions of light. This distinction can be observed 
in albino seedlings, which are not able to perform 
photosynthesis, but still have photomorphogenetic 
responses. Light affects all developmental switches of 
a plant’s life cycle, from seed germination to the
transition from vegetative to reproductive growth. Plants
respond to a wide spectrum of light ranging from UVB to
far-red light. They sense light intensity, direction,
spectral quality, and the duration of the light cycle.
The spectral composition of light contains
important information about the presence of other 
plants competing for light and triggers important 
developmental decisions such as shade avoidance. 
The length of the light cycle is a determining factor 
for the timing of flowering in numerous plant species. 
Physiological, photobiological, and more recently 
molecular genetic studies have demonstrated that 
plants possess distinct photoreceptors. So far three 
families of photoreceptors have been identified: 
(1) the phototropins that sense light direction (photo- 
tropism); (2) the cryptochromes, a class of blue light 
receptors; and (3) the dichromic red/far-red-absorbing 
phytochromes. The molecular mechanisms of the signaling
events occurring after photoperception are starting to
emerge. However, photoreceptors for several light
responses remain to be assigned (Kendrick
and Kronenberg, 1994). 


Arabidopsis thaliana as a Model System to 
Study Photomorphogenesis 


Photomorphogenesis has been studied for over 100 
years in a wide variety of plant species. Traditionally 
plant light responses were studied using physiological 
and photobiological techniques. However, since many 
light effects are induced by the coaction of several 
photoreceptors and since some photoreceptors regu- 
late multiple aspects of photomorphogenesis, a genetic 
approach has become an extremely valuable comple- 
ment (Kendrick and Kronenberg, 1994). As a conse- 
quence, research has concentrated on a few species 
that are particularly well suited for molecular genetic 
studies. Photomorphogenic mutants have been described in
species such as pea, tobacco, cucumber, Arabidopsis
thaliana, and tomato. Due to its small stature, short life
cycle, and completely sequenced genome, A. thaliana has
become the most successful
model system in the field. These assets are extremely 
valuable for performing genetic studies, so this article 
will mainly concentrate on studies performed in 
A. thaliana. 

Since plants are affected by light throughout their 
life cycle, mutant screens to identify genetic loci 
implicated in light responses have been performed 
with plants of all developmental stages. However, for 
practical reasons numerous screens were performed 
with young seedlings. It is possible to plate several 
hundred A. thaliana seedlings on a single petri dish 
allowing screens of large populations. Screening such 
numbers of adult plants is obviously much more space 
and time consuming. The other reason for choosing 
this stage of development is the very obvious effect of 
light on young seedling development (Figure 1). 

Figure 1 The effect of light on seedling development in
Arabidopsis thaliana. The embryonic shoot below the
cotyledons is known as the hypocotyl and the embryonic
leaves as cotyledons. (A) Dark-grown etiolated wild-type
seedling; note the elongated hypocotyl (h), the apical
hook (ah), and the folded, unexpanded cotyledons (c).
(B) Light-grown de-etiolated wild-type seedling; note the
short hypocotyl and the open, expanded, green cotyledons
(c). (C) Light-grown light-insensitive hy mutant; note the
length of the hypocotyl; the cotyledons are paler and less
expanded than in the wild type. (D) Dark-grown det mutant;
note the de-etiolated appearance of this dark-grown
seedling, with a short hypocotyl and expanded cotyledons.
These mutants are also known as fusca (fus) due to their
purple color, or cop for constitutively photomorphogenic.

Species with seeds containing large food reserves can
grow in the absence of photosynthesis for several
days. Dark-grown seedlings are known as etiolated 
(from the French word étiolé: pale and weak). This 
etiolated stage is characterized by a long hypocotyl 
(shoot), a closed apical hook, and unopened cotyledons 
(embryonic leaves). These features allow the seedling 
to grow through a thin layer of soil and emerge into 
the light (see Figure 1). As the seedling perceives 
sufficient light, it will de-etiolate, a process that will 
initiate its photoautotrophic life. Light has numerous
effects on seedling development: inhibition of hypo- 
cotyl elongation, promotion of cotyledon expansion, 
primary leaf development, development of the chloroplasts,
and regulation of gene expression. Figure 1
illustrates the very obvious difference between a dark- 
and a light-grown seedling; it also shows one example 
for each of two classes of photomorphogenic mutants. 
Mutants that are insensitive (or less sensitive) to light 
(hy mutants) are characterized by their long hypocotyl
when grown in the light. Those mutants have allowed 
the identification of several photoreceptors and sig- 
naling components acting downstream of these light 
sensors. A second large class of recessive mutants
displays several aspects of light-grown development in the
absence of light (det mutants, also known as cop and 
fus), and they identify a class of negative regulators of 
photomorphogenesis (Figure 1). Hypocotyl length 
screens are easy to perform, but secondary screens 
for other typical light responses are needed to prop- 
erly characterize genetic loci identified by this simple 
method (Kendrick and Kronenberg, 1994). 


The Phytochromes: A Class of 
Dichromic Red/Far-Red Photoreceptors 


The nomenclature for all photoreceptors discussed 
here is based on the nomenclature adopted for the 
phytochromes. All those photoreceptors are com- 
posed of a protein and a chromophore, which allows 
them to absorb light of a specific wavelength. Taking
phytochrome A as an example: PHYA (in italics), gene;
PHYA, apoprotein; phyA, holoprotein; phyA (in italics),
mutant allele.

The history of phytochrome discovery has been 
extensively covered in a captivating book (Sage, 
1992). Phytochrome responses were first described 
as light responses in higher plants that are induced 
by red light and reversed by a subsequent pulse of 
far-red light. Far-red light alone either has no effect 
or it has an inhibitory effect compared to a dark-treated
control. Since such red/far-red-reversible light responses
were observed for diverse physiological processes such as
seed germination and flowering
time with very similar action spectra, it became clear 
that one class of photoreceptor was responsible for all 
these responses. The action spectrum of a given light
response shows which wavelengths are most effective in
triggering that response. Originally it was
believed that red/far-red light was sensed by a single 
photoreceptor; in fact, plants contain small gene 
families coding for phytochromes (five in A. thaliana, 
PHYA-PHYE). 

Phytochrome was purified from etiolated seedlings 
where it is relatively abundant and can be followed 
spectroscopically due to the low level of other pigments
absorbing red and far-red light. This allowed biochemical
and spectroscopic analyses demonstrating that the
absorption spectrum of the holoprotein closely matches
the action spectra of the
physiological processes controlled by phytochromes. 
Phytochromes are found as soluble homodimers of 
120 kDa subunits. Each monomer covalently binds a 
linear tetrapyrrole chromophore (phytochromobilin) 
responsible for phytochrome’s characteristic spectral 
properties. Phytochrome exists in two spectrally
interconvertible forms, Pr and Pfr, the red- and
far-red-light-absorbing forms, respectively. Phytochrome
is synthesized as Pr
in the dark, and upon absorption of red light it is 
converted to Pfr, the active form of the photoreceptor 
for many phytochrome responses. Pfr can be con- 
verted back to Pr either after absorption of far-red 
light, or in a non-photochemical reaction known as 
dark reversion (after prolonged incubation in the 
dark) (Kendrick and Kronenberg, 1994). 
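The Pr/Pfr photoconversion described above can be sketched as a two-state kinetic model. This is a minimal sketch under stated assumptions: first-order photoconversion in both directions plus first-order dark reversion of Pfr, with synthesis, degradation, and photoconversion intermediates ignored. The rate constants (k1, k2, kd) and their values are hypothetical, chosen only so that red light drives most of the pool to Pfr and far-red light returns most of it to Pr. Under these assumptions the Pfr fraction x obeys dx/dt = k1(1 - x) - (k2 + kd)x, whose closed-form solution the code evaluates.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Light:
    """Hypothetical first-order photoconversion rates (per minute)."""
    k_pr_to_pfr: float  # k1: Pr -> Pfr
    k_pfr_to_pr: float  # k2: Pfr -> Pr


DARK = Light(0.0, 0.0)
RED = Light(1.0, 0.15)      # illustrative values only
FAR_RED = Light(0.02, 1.0)  # illustrative values only
K_DARK_REVERSION = 0.001    # kd: non-photochemical Pfr -> Pr (hypothetical)


def pfr_fraction_after(x0: float, light: Light, minutes: float,
                       k_dark: float = K_DARK_REVERSION) -> float:
    """Pfr fraction of the total pool after `minutes` under `light`.

    Exact solution of dx/dt = k1*(1 - x) - (k2 + kd)*x, with the total
    phytochrome pool held constant.
    """
    k1, k2 = light.k_pr_to_pfr, light.k_pfr_to_pr
    k_sum = k1 + k2 + k_dark
    if k_sum == 0.0:
        return x0  # nothing converts the pigment
    x_eq = k1 / k_sum  # photostationary Pfr fraction for this light
    return x_eq + (x0 - x_eq) * math.exp(-k_sum * minutes)


# Phytochrome is synthesized as Pr in darkness, so start at x = 0.
after_red = pfr_fraction_after(0.0, RED, 10)
after_far_red = pfr_fraction_after(after_red, FAR_RED, 10)
after_dark = pfr_fraction_after(after_red, DARK, 60)
print(f"red: {after_red:.3f}, red then far-red: {after_far_red:.3f}, "
      f"red then 60 min dark: {after_dark:.3f}")
```

In this toy model a red pulse followed by a far-red pulse leaves little Pfr, mirroring red/far-red reversibility, while the slow decline after red light in darkness stands in for dark reversion.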
Phytochromes are composed of two protein do- 
mains: an N-terminal chromophore-binding domain 
separated by a small hinge region from the C-terminal 
output domain. This second domain shows an inter- 
esting homology with bacterial histidine kinases. 
However, plant phytochromes do not possess
histidine kinase activity, but a light-regulated Ser/ 
Thr protein kinase activity, with Pfr being a more 
active kinase than Pr. This might be one of the ways 
the light signal sensed by phytochrome is further 
transduced in the plant. Two PAS domains are also 
present in this C-terminal portion of the protein; both 
genetic and biochemical studies have highlighted their 
importance in phytochrome signaling. PAS is an acronym
derived from the three founding members of this protein
domain family: PER, ARNT, and SIM. Such protein
modules have been found in a wide variety of organ- 
isms and play important signaling roles in response to 
small ligands, changes in light conditions, oxygen 
levels, and redox potential. In some proteins, such as 
phototropin (see below), this domain is used to bind
cofactors, and in phytochromes this domain is import- 
ant for protein-protein interactions. The light envir- 
onment also controls the subcellular localization of 
phytochromes. These photoreceptors are cytoplasmic 
in the dark, and appropriate light treatments trigger
their translocation into the nucleus; this is another 
major level of phytochrome regulation by light (Neff 
et al., 2000). Protein stability is the third property that 
is regulated by light, but this is only true for certain 
phytochromes. Phytochrome A is approximately 100 
times more abundant in the dark than in the light, 
which contrasts with the other members of the family,
whose protein levels are constitutive. The existence of
light-stable and light-labile phytochrome pools has
important physiological implications, in particular
for the shade-avoidance syndrome. 

Genetic studies in numerous plant species have 
made it possible to assign specific light responses to
the individual members of the phytochrome family. The
existence of phytochrome chromophore mutants was
also informative for these studies. Phytochromes play 
important roles in seed germination, light-regulated 
gene expression, de-etiolation, vegetative development 
(shade avoidance), and the transition from vegetative 
to reproductive growth. Generally speaking, phytochromes
play redundant roles during photomorphogenesis. However,
for specific light responses some members of the family
also have unique functions, which allowed their
identification as mutants in the first place. For example,
in A. thaliana both phyA and phyB mutants were identified
as hy mutants under specific light conditions (see
Figure 1) (Quail et al., 1995).


Cryptochromes 


Cryptochromes are UVA/blue light receptors that 
were first identified in plants (Cashmore et al., 1999). 
Their name hints at their elusive nature (blue light
responses in plants were described more than a century
ago by Darwin) and at the prevalence of blue light
responses in cryptogams (nonflowering plants).
The power of genetics in A. thaliana allowed the 
identification of cry1 (cryptochrome 1), which was 
found by looking for seedlings which did not fully 
de-etiolate in blue light. The hy4/cry1 mutant was
isolated in the first mutant screen for light-insensitive 
mutants in A. thaliana (Kendrick and Kronenberg, 
1994). The original mutant alleles were known as 
hy4, later alleles are termed cry1. The gene was cloned 
allowing the analysis of the elusive blue light receptor. 

Cry1 consists of two protein domains: an N- 
terminal portion with high homology to bacterial 
photolyases and a C-terminal extension with weak 
similarity to tropomyosin. Photolyases are flavopro- 
teins that perform light-dependent DNA repair. They 
repair pyrimidine dimers in a blue/UVA-dependent manner.
Despite the homology to bacterial photolyases, cry1 has
no photolyase activity. The similarity
between cry1 and the photolyases extends to the 
cofactors since they both possess two noncovalently 
attached chromophores. In vitro reconstituted recom- 
binant cry1 binds to flavin-adenine dinucleotide 
(FAD) and a pterin or a deazaflavin. In photolyases 
UVA light is first absorbed by a pterin-like molecule, 
and the absorption spectrum of this primary chromophore
determines the action spectrum of photoreactivation. The
excitation energy is then transferred to the FAD
chromophore. This secondary cofactor donates an electron
that participates in the cleavage of the pyrimidine dimer
in the UV-damaged DNA. The
cofactor composition of cry1 correlates with the lack 
of response that hy4/cry1 mutants show in blue and, to a
lesser extent, in UVA and green light. It therefore
appears that cryptochromes have kept the photoperception
mechanism of photolyases, but the light signal
is then transmitted to the plant by a yet-to-be-discov- 
ered mechanism (Cashmore et al., 1999). 

In A. thaliana another cryptochrome photoreceptor, cry2,
has been identified. Both cry1 and cry2 are localized in
the nucleus. Their photolyase domains are very similar,
but their C-termini are unrelated to each other. As with
the phytochromes, one member of the family is light
labile (cry2) whereas the other
(cry1) is light stable. Mutant analysis has demon- 
strated that they both play important roles in de- 
etiolation, entrainment of the circadian clock, and the 
transition from vegetative to reproductive develop- 
ment. Cry1 is the primary blue light receptor for de- 
etiolation under high intensities of blue light. Under 
blue light, hy4/cry1 mutants have defects in inhibition
of hypocotyl elongation, in cotyledon expansion, and 
in gene expression. Cry2 is particularly important for
detecting low intensities of blue light and for sensing
day-length extensions. Day-length extension leads to
flowering in the wild type, and this response is impaired
in cry2 mutants. Cryptochromes are widespread across the
biological kingdoms. Genetic
analysis has shown that they play well-established 
roles in regulation of the circadian clock in A. thaliana 
as well as in flies (Drosophila) and mice (Cashmore 
et al., 1999). 


Phototropin: A Photoreceptor for 
Phototropism 


Application of unilateral light triggers the curvature 
of growing plant organs towards or away from the light
source; this phenomenon is known as phototropism. A
typical example of a phototropic response is the growth
of a seedling’s hypocotyl towards the light source
(positive phototropism) and the growth of its root away
from that source (negative phototropism). Charles Darwin
investigated this phenomenon in 1881, and a few years
later Julius von Sachs
measured a crude action spectrum demonstrating that 
blue light was very effective (Briggs and Huala, 1999). 

Phototropism was first studied with photobio- 
logical, physiological, and biochemical approaches. 
Interestingly, illumination with unilateral blue light 
correlated with the presence of a phosphorylated 
membrane protein of about 120 kDa. The action spec- 
tra for phototropism and the phosphorylation of p120 
were very similar and the phosphorylation occurred 
very rapidly after the onset of blue light. These results 
suggested that phosphorylation of p120 was a very
early event in phototropism (Briggs and Huala, 
1999). This protein was identified using genetics by 
looking for A. thaliana mutants that were unable to 
bend towards a unilateral source of blue light. This 
screen yielded four complementation groups, nph1- 
nph4 (nonphototropic hypocotyl), three of which are 
specifically impaired in positive phototropism of the 
hypocotyl. Nph1 turned out to be identical to the 120 kDa
protein identified biochemically in earlier studies. NPH1
codes for a protein with an N-terminus containing two
LOV/PAS domains and a C-terminal Ser/Thr protein kinase
domain. Each LOV/PAS domain binds a flavin mononucleotide
(FMN) cofactor. LOV stands for light, oxygen, and voltage;
LOV domains represent a subset of PAS domains. The
absorption spectrum of this holoprotein is very similar to
the action spectrum of phototropism, demonstrating that
this protein is indeed the photoreceptor for
phototropism; nph1 has thus been renamed phototropin
(Briggs and Huala, 1999). Interestingly, autophosphorylation
of nph1 is stimulated by blue light. This photoreceptor is
therefore a light-modulated protein kinase, suggesting that
protein
phosphorylation is important for signaling. One of 
the substrates of this activity is phototropin itself. 
The exact nature of the following signaling events is 
still under investigation. Nph1 homologs exist in 
numerous plants, and the correlation between
phosphorylation of a 120 kDa protein and phototropism
has been made in many plant species. Sensing unilat- 
eral blue light with a phototropin photoreceptor is 
therefore a conserved mechanism among plants 
(Briggs and Huala, 1999). 

In A. thaliana genetic analysis has demonstrated 
that phototropin is the only photoreceptor primarily 
responsible for the detection of unilateral blue light. 
Interestingly, this light sensor is specifically dedicated 
to positive phototropism in the hypocotyl (Briggs and 
Huala, 1999). This situation contrasts with other 
photomorphogenetic responses where there are nu- 
merous interactions between, and a degree of redun- 
dancy among, multiple photoreceptors. This has been 
well documented for de-etiolation in seedlings carrying
double or triple mutant combinations of phytochrome and/or
cryptochrome mutations. Moreover,
unlike phototropin, cryptochromes and phyto- 
chromes affect multiple aspects of photomorpho- 
genesis (Cashmore et al., 1999; Neff et al., 2000). 


Signaling: What Happens after 
Photoperception? 


Interestingly, two of the characterized photoreceptors 
are light-regulated protein kinases. Phototropin has a 
classic protein kinase domain belonging to the Ser/Thr
protein kinase superfamily. Phytochromes also dis- 
play light-regulated Ser/Thr protein kinase activity 
but they do not belong to this protein superfamily. 
Their C-terminal signaling domain is related to bac- 
terial histidine kinases. Phytochromes were originally 
identified in higher plants, but they have now also been 
encountered in prokaryotes. These bacteriophyto- 
chromes are light-regulated histidine kinases. In plants 
the protein kinase activity of phytochromes is still 
poorly characterized and very little is known about 
the physiological relevance of protein phosphorylation 
during photomorphogenesis. 

The study of signaling events initiated by these 
photoreceptors is still in its infancy. A large number 
of genetic loci belonging to two major classes have 
been identified: the hy mutants and the det/cop/fus 
mutants (see Figure 1). The first class has phenotypes
similar to photoreceptor mutants and therefore 
defines positively-acting signaling components. The 
ones acting very early after one specific photoreceptor 
have phenotypes only under the light conditions perceived
by this particular photoreceptor. For example
in A. thaliana phyB mutants are specifically affected 
in red-light sensing. A mutant such as red1 that is 
affected specifically in phyB signaling also shows a 
phenotype in red but not blue or far-red light. Since 
many photoreceptors eventually affect the same type 
of cellular responses, loci acting further downstream 
such as hy5 are defective in light sensing over the whole
visible spectrum. Hy5 therefore acts downstream of
multiple photoreceptors. The other large
class of mutants that presumably act downstream of 
multiple photoreceptors are the det/cop/fus mutants. 
These recessive mutants, which de-etiolate in the 
absence of a light cue, define a large class of negative 
regulators of photomorphogenesis (Figure 1). A 
number of loci implicated in light signaling down- 
stream of the photoreceptors have been cloned in A. 
thaliana. It is however still too early to have a clear 
idea about the signaling cascades following photoper- 
ception. 
If a culture of bacteria is irradiated with ultraviolet
(UV) light, the bacteria die exponentially after an
initial broad shoulder on the survival curve (Figure 1).
This survival curve is typical of organisms that contain a
certain number (n) of ‘targets,’ all of which must be hit
to kill the organism; the width of the shoulder increases
as n increases. However, if the cells are exposed to
visible light shortly after UV irradiation, a fraction of
them recover viability, so the survival curve has a
smaller slope (upper curve in Figure 1). This phenomenon
is called ‘photoreactivation’ (Kelner, 1949).
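The shoulder-plus-exponential shape described above is commonly fitted with the multi-target, single-hit model, S(D) = 1 - (1 - exp(-D/D0))^n, in which each of n targets is hit with probability 1 - exp(-D/D0). At high doses ln S approaches ln n - D/D0, so the tail has slope -1/D0 and extrapolates back to n, and the shoulder width (quasi-threshold dose D0 ln n) grows with n. The sketch below assumes that model and, as a deliberate simplification, represents photoreactivation as a dose-modifying factor that enlarges D0 (giving a smaller slope). The values of n, D0, and the factor are hypothetical, not Kelner's data.

```python
import math


def multitarget_survival(dose: float, n_targets: int, d0: float) -> float:
    """Surviving fraction S = 1 - (1 - exp(-dose/d0))**n (multi-target, single-hit).

    Computed as -expm1(n * log1p(-exp(-dose/d0))) so tiny survivals stay accurate.
    """
    if dose <= 0:
        return 1.0  # no dose, no killing
    log_all_hit = n_targets * math.log1p(-math.exp(-dose / d0))  # log(p_hit**n)
    return -math.expm1(log_all_hit)  # 1 - p_hit**n


def tail_slope(n_targets: int, d0: float, d1: float = 10.0, d2: float = 20.0) -> float:
    """Slope of ln S between two high doses; approaches -1/d0."""
    s1 = multitarget_survival(d1, n_targets, d0)
    s2 = multitarget_survival(d2, n_targets, d0)
    return (math.log(s2) - math.log(s1)) / (d2 - d1)


N_TARGETS = 4    # hypothetical number of targets
D0_UV = 1.0      # hypothetical D0, arbitrary UV dose units
DMF_LIGHT = 2.0  # hypothetical dose-modifying factor for photoreactivation

for dose in (0, 2, 5, 10):
    uv_only = multitarget_survival(dose, N_TARGETS, D0_UV)
    reactivated = multitarget_survival(dose, N_TARGETS, D0_UV * DMF_LIGHT)
    print(f"dose {dose:>2}: UV only {uv_only:.2e}, photoreactivated {reactivated:.2e}")
```

Plotting ln S against dose for both series gives two shouldered curves whose tails differ in slope by the chosen factor, the qualitative pattern of Figure 1.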

Figure 1 Survival curves for bacteria (Escherichia coli)
irradiated with UV light (lower curve) and for the same
bacteria photoreactivated with visible light (upper
curve).

Figure 2 When a pair of neighboring pyrimidines in DNA
absorbs a quantum of UV light, they may form a dimer,
which distorts the double helix.

The lethal damage done by the UV light consists of two
types of covalent linkages between adjacent pyrimidine
residues in the bacterial genome. Most often a photon of
UV light links two adjacent thymine residues into
covalently bonded thymine dimers through the formation of
a cyclobutyl ring between their respective C-5 and C-6
atoms (Figure 2; see
Thymine). Alternatively, two adjacent residues may 
be linked into a dimer between C-6 of one and C-4 of 
the other (commonly between two cytosines or a 
cytosine and a thymine). These dimers distort a strand 
of the DNA helix and interfere with DNA replication; 
they can also be mutagenic. Repair of the damage in 
the presence of visible light depends upon two types of 
enzymes called photolyases, each of which operates on one
type of dimer. The enzyme binds to its characteristic
dimer and, when activated by certain wavelengths
of light, restores the original bases by splitting the 
cyclobutyl ring or the C6-4 linkage. Photolyases 
have also been found in bacteria and in some simple 
eukaryotes. Photolyases of the C6-4 type have been 
found in Drosophila and some plants. 
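Because both kinds of photoproduct described above form only between adjacent pyrimidines on the same strand, the candidate sites on a strand can be listed with a simple scan. This is a toy sketch under stated simplifications: it reports every TT, TC, CT, and CC step read 5' to 3' on one strand and says nothing about how often a given site actually dimerizes or which photoproduct forms there. The function name and example sequence are hypothetical.

```python
PYRIMIDINES = frozenset("CT")


def dipyrimidine_sites(strand: str) -> list[tuple[int, str]]:
    """Return (0-based index, dinucleotide) for each adjacent pyrimidine pair.

    The strand is read 5'->3'; lowercase input is accepted.
    """
    s = strand.upper()
    return [
        (i, s[i:i + 2])
        for i in range(len(s) - 1)
        if s[i] in PYRIMIDINES and s[i + 1] in PYRIMIDINES
    ]


print(dipyrimidine_sites("GATTACCTG"))  # [(2, 'TT'), (5, 'CC'), (6, 'CT')]
```

TT steps are the candidate sites for the cyclobutyl thymine dimers, while steps involving cytosine are where the text places the C6-4 linkages; scanning the complementary strand the same way would list the sites on the other side of the helix.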
Photorepair is the repair of DNA damage induced by
exposure to sunlight or ultraviolet light. The damage
comprises a number of DNA lesions, including different
types of dimers and photoproducts at adjacent 
pyrimidines on the same strand. Different enzymes 
carry out the repair using different pathways. Some 
enzymes, such as photolyase, directly reverse certain 
pyrimidine-pyrimidine dimers. Nucleotide excision 
repair eliminates a stretch of single-stranded DNA 
containing the damaged base, and repair synthesis
restores the correct sequence. An additional pathway 
involves using recombination to correct the portion of 
a gene containing a damaged base. 


See also: Excision Repair; Repair Mechanisms 
Photosynthesis, the use of light energy to drive carbon
fixation and the synthesis of organic compounds, is a
central process in the biosphere. In eukaryotes,
photosynthesis takes place in a specialized organelle, the
chloroplast, which has its own genetic system. The
chloroplast genome contains only a small fraction of the
genes required for photosynthesis; the others are encoded
in the nuclear genome. As a consequence, the
biogenesis of the photosynthetic machinery requires 
the coordinate expression of the two genomes. The 
genetics of photosynthesis addresses, first of all, the 
structural genes for the apoproteins of macromolecu- 
lar complexes and enzymes that are required in the 
process. The genetic analysis also deals with genes for 
the synthesis of pigments and other cofactors, for the 
import of polypeptides into the chloroplast and targeting
within the organelle, for the assembly and repair of
the complexes, for the adaptation of photosynthesis to 
environmental conditions, and for many other facets. 
The genetics of photosynthesis also reveals loci 
required for the maintenance of the plastid, for the 
expression of the plastid genome, and for the regu- 
lation of plastid development. 


Photosynthesis 


Photosynthesis can be described as the process that 
allows some living organisms to convert light energy 
into chemical energy, which is used to synthesize 
organic compounds. Photosynthesis directly supports 
plants, algae, and some prokaryotes, and also indirectly
sustains most life in the biosphere (setting
aside the minor contribution from chemoautotrophic 
bacteria) by providing organic matter, food, and 
oxygen. 

Oxygenic photosynthesis, which will be the main 
focus of this discussion, is found in plants, algae, and 
cyanobacteria: CO2 is reduced to carbohydrate, water is
oxidized, and oxygen is evolved. Overall, the whole
process can be summarized as:

$$\mathrm{CO_2} + \mathrm{H_2O} \xrightarrow{\text{light}} (\mathrm{CH_2O}) + \mathrm{O_2}$$

Figure 1 Simplified scheme of photosynthetic electron
transport in the thylakoid membrane. Light energy captured
by the antenna complex LHCII is transferred to the
reaction center of photosystem II (PSII), where it induces
charge separation. PSII thus catalyzes the light-driven
reduction of plastoquinone and oxidation of water, with
the release of oxygen and protons (H+) into the lumen.
Reduced plastoquinone (PQH2), soluble in the membrane,
transfers electrons to the cytochrome b6f complex. The b6f
complex oxidizes PQH2 and releases protons into the lumen;
one electron is transferred to plastocyanin while the
other is used in the Q-cycle and returns to the
plastoquinone pool. Plastocyanin (PC), a soluble protein
of the lumen, transfers the electrons to photosystem I.
Light energy, captured by the antenna complex of
photosystem I (LHCI), drives charge separation in
photosystem I (PSI) so that plastocyanin is oxidized and
ferredoxin is reduced. Ferredoxin (Fd) is a soluble
protein of the stroma, which is oxidized by the enzyme
ferredoxin-NADP reductase (FNR) to reduce NADP+ to NADPH.
The proton gradient across the thylakoid membrane,
generated by photosynthetic electron transport, is used by
the ATP synthase complex to synthesize ATP from ADP and
Pi. In cyclic electron flow, electrons are returned to the
b6f complex, favoring the formation of the proton gradient
instead of the production of NADPH. The thick black arrows
show the path of electrons from water to NADPH. The dotted
gray arrow denotes cyclic electron flow.


Photosynthesis comprises two phases: in the first
set of reactions, light energy is absorbed and
converted into chemical energy, which is then used in
the second phase to reduce CO2 to carbohydrate.
The chemical energy provided by photosynthesis is 
also used in other processes such as nitrogen or sulfur 
assimilation. The first phase takes place in the photo- 
synthetic membrane and involves a series of large 
complexes, each containing multiple polypeptide sub- 
units and a variety of pigments and cofactors. They 
form an extraordinarily sophisticated machinery 
which is both efficient and robust, capable of rapid 
dynamic adaptation to large changes in light intensity. 
In the case of oxygenic photosynthesis (Figure 1), 
photons are captured by the light-harvesting com- 
plexes (LHCII and LHCI) and their energy is used 
to drive redox reactions in the photosystems (PSII and 
PSI) and hence the flow of electrons along the electron 
transfer chain, from water to NADP. The cytochrome
b6f complex lies in series between PSII and PSI. Water
oxidation by PSII releases H+ ions, and electron
transfer through the cytochrome b6f complex is coupled to
H+ transfer across the photosynthetic membrane. The
resulting H+ gradient is used by the ATP synthase
complex to drive the phosphorylation of ADP. Cyclic
electron flow, involving PSI and the cytochrome b6f
complex but not PSII, contributes to the H+ gradient
and thus to ATP synthesis, but not to the net
production of NADPH. In the second phase of
photosynthesis, NADPH and ATP participate in a series of
enzymatic reactions (known as the Calvin cycle) for
the reduction of CO2 to carbohydrate. In this phase
light plays a regulatory role by activating enzymes of
the Calvin cycle through a pathway that involves
ferredoxin and thioredoxin. The enzyme Rubisco
(ribulose-1,5-bisphosphate carboxylase/oxygenase) plays a
central role in CO2 fixation: it catalyzes the
carboxylation of a five-carbon sugar (ribulose-1,5-bisphosphate)
to yield two molecules of a three-carbon compound
(3-phosphoglycerate). However, Rubisco also catalyzes a
side reaction, oxygenation of the pentose, which leads to
photorespiration and ultimately loss of CO2.


The Chloroplast and its Genome 


In eukaryotes (algae and plants), photosynthesis takes 
place in a specialized organelle, the chloroplast, which 
is surrounded by two membranes (the outer and inner 
envelopes) and contains a soluble phase (the stroma) 
and a complex system of internal membranes (the 
thylakoid membranes). The latter form a network of
flattened membrane vesicles, enclosing a separate
internal compartment, the lumen. Some of the thylakoid
membranes are exposed to the stroma, while others
form tightly appressed stacks, or grana. The chloroplast
is a specialized developmental form of the
plastid, which can also differentiate to other forms, 
such as the starch-containing amyloplasts of roots, or 
the pigment-containing chromoplasts of fruits. Plas- 
tids are thought to have arisen during evolution from 
an endosymbiotic association of a photosynthetic bac- 
terium with an early eukaryotic cell. Plastids maintain 
an autonomous genetic system with a distinct genome, 
and are correspondingly equipped with the machinery 
for DNA replication and gene expression. This machin- 
ery has retained a prokaryotic character: the 70S ribo- 
somes and the plastid-encoded RNA polymerase 
resemble their bacterial counterparts. Another plastid 
RNA polymerase, which is nucleus-encoded, is simi- 
lar to the RNA polymerases of some bacteriophages. 

The plastid genome of land plants or green algae is a 
circular DNA molecule, 120 to 200 kb in size. It har- 
bors genes for the rRNAs and tRNAs (rrn and trn 
genes), and approximately 90 genes for proteins. Most 
of the latter genes encode components of the photo- 
synthetic machinery (PSI: psa; PSII: psb; b6f complex: 
pet; ATP synthase: atp). They also code for some of 
the ribosomal proteins (rpl and rps), subunits of RNA
polymerase (rpo), a few polypeptides with diverse
roles, and open reading frames of unknown function
(ycf). While the plastid genomes of plants have been 
highly conserved during evolution, the size and organ- 
ization of the plastid genomes from algae are much 
more variable, and their genetic content is more 
diverse. Plastid genes are typically transcribed as poly- 
cistronic units, and many genes are interrupted by 
introns belonging to group I or group II. Some plastid 
transcripts are subject to posttranscriptional editing at
specific sites, where C is changed to U. This implies 
that the DNA gene sequence is not always in itself 
sufficient to describe the final gene product. 

Although the plastids have retained their own 
genetic system, most plastid polypeptides are encoded 
in the nuclear genome, translated in the cytosol 
as precursors with N-terminal transit peptides, and 
imported into the organelles. The photosynthetic 
complexes are of mixed genomic origin: some subunits 
are expressed from chloroplast genes, others are 
derived from nuclear genes, as illustrated in Table | 
for photosystem I. 


Model Organisms 


In recent years, a few organisms have been most inten- 
sively used to study the genetics of photosynthesis: the 
eukaryotes Chlamydomonas reinhardtii, Arabidopsis 
thaliana, and Zea mays, and several strains of Syne- 
chococcus, which are prokaryotic cyanobacteria. C. 
reinhardtii is a unicellular green alga with a short and 
simple life cycle, well suited for the genetic analysis of 
photosynthesis, which was initiated by Paul Levine 
and coworkers in the 1960s. In the presence of a 
carbon source such as acetate, photosynthesis is facul- 
tative, so that mutants can readily be recovered and 
propagated in large amounts. C. reinhardtii has a path- 
way for chlorophyll synthesis in the dark, and its 
photosynthetic machinery is assembled in the absence 
of light. The three genetic systems of C. reinhardtii (in 
the nucleus, the chloroplast, and the mitochondria) are 
amenable to genetic transformation. 

Zea mays (corn or maize) is a monocot that has also 
been used as a model organism, in particular because 
of its elaborate classical genetics and its well-studied 


Table 1 The dual genetic origin of the photosystem I subunits

Gene   Genomic location   Polypeptide product         Function
psaA   plastid            PsaA (intrinsic)            reaction center (binds P700, A0, A1, FX)*
psaB   plastid            PsaB (intrinsic)            reaction center (binds P700, A0, A1, FX)*
psaC   plastid            PsaC (extrinsic, stromal)   electron transfer (binds FA, FB)*
PsaD   nucleus            PsaD (extrinsic, stromal)   ferredoxin docking
PsaE   nucleus            PsaE (extrinsic, stromal)   cyclic electron transfer
PsaF   nucleus            PsaF (intrinsic)            plastocyanin docking
PsaG   nucleus            PsaG (intrinsic)
psaI   plastid            PsaI (intrinsic)
psaJ   plastid            PsaJ (intrinsic)
PsaK   nucleus            PsaK (intrinsic)
PsaL   nucleus            PsaL (intrinsic)            PSI trimerization
PsaM   nucleus            PsaM (intrinsic)
PsaN   nucleus            PsaN (extrinsic, lumenal)

* P700: reaction center chlorophyll dimer; A0: monomeric chlorophyll; A1: phylloquinone; FX, FA, FB: 4Fe-4S centers.
transposable elements. For reverse genetics, pools of
transposon-tagged mutants can be screened using 
PCR approaches to identify lines carrying an insertion 
in the gene of interest. Maize sets large seeds, with 
reserves that can sustain homozygous mutant seed- 
lings until they have developed a few leaves. The 
leaves grow from the base, so that the temporal se- 
quence of plastid development maps to a spatial 
sequence along the length of the leaf blade. 

The small dicot Arabidopsis has also been an attract- 
ive model for genetic analysis, with its small size and 
short life cycle. Its genetics are enhanced by advanced 
genomics that include a complete nucleotide sequence, 
the first to be derived for a higher plant. Large col- 
lections of T-DNA-tagged and transposon-tagged 
mutant lines are available, and as in maize, PCR- 
based screening of mutant pools allows the identifica- 
tion of insertions in genes of interest. Homozygous 
photosynthesis mutant seedlings can be grown on 
sucrose-containing media, but it is difficult to obtain 
large amounts of mutant material. 

Cyanobacteria such as Synechococcus spp. also per- 
form oxygenic photosynthesis. Because transform- 
ation proceeds by homologous recombination, they 
are particularly well suited for reverse genetics. The 
complete nucleotide sequence of the Synechocystis
sp. PCC 6803 genome was the first to be derived for
a photosynthetic organism. Synechococcus spp. have 
been used intensively, in particular for the study of 
structure-function relationships in the photosystems. 
They have a photosynthetic machinery that is closely 
related to that of plants but differs in some aspects, and 
they use a different type of light-harvesting antenna, 
the phycobilisome. 


Chloroplast Genetics 


Unlike the inheritance of nuclear loci, the transmis- 
sion of chloroplast genetic markers deviates from the 
rules of Mendelian genetics: it is usually uniparental 
(but biparental in some species). In many plants, the 
plastid genome is transmitted only from the maternal 
parent, by the exclusion or elimination of plastids 
from pollen, but interestingly in conifers plastid trans- 
mission is paternal. In the unicellular green alga 
C. reinhardtii, the chloroplast genome is inherited 
uniparentally from the mating type (+) parent. The 
molecular mechanism of uniparental inheritance in 
C. reinhardtii is only partly understood. It involves 
the selective degradation of the mt(-) genome after
the two parental plastids, contributed by the two 
gametes, have fused in the early zygote. 

An unusual twist to plastid genetics comes from
the fact that each plastid contains many copies of its
genome. Furthermore, each plant cell contains many
plastids: thus some cells in the leaf can contain
thousands of copies of the chloroplast DNA. Likewise in
C. reinhardtii, although there is only a single chloroplast
per cell, each one harbors approximately 80
copies of the genome. One consequence is that in 
certain situations, different plastids within a plant 
cell may have a different genetic constitution, or single 
plastids may contain a genetically heterogeneous set of
genomes (a condition described as heteroplasmy).
Segregation of plastids and of genomes during cell
division, as well as gene conversion, can contribute to
the sorting out of plastid genotypes. In some mutant
plants, plastid segregation during development can
give rise to variegated sectors.

In most plants and in some algae, a segment of the 
circular plastid genome is duplicated as an inverted 
repeat, flanked by two single-copy regions. Recombin- 
ation between the two inverted repeats can generate 
two physical isomers of the genome by a flip-flop 
mechanism. Gene conversion between the inverted 
repeats is also very active so that new mutations in 
one copy are rapidly transferred to the other, and the 
two copies remain identical. 

The analysis of chloroplast recombination was pi- 
oneered by Ruth Sager in C. reinhardtii. Although the 
plastid genome is usually inherited from the mt(+) 
parent (see above), in a small fraction of the zygotes 
inheritance is biparental and recombination of paren- 
tal markers is observed in the progeny, allowing the 
establishment of a genetic map. Recombination fre- 
quencies are high (on the order of 1% per kb) and 
there are hot spots for recombination, so that genetic 
distances can only be derived over fairly short physical 
distances. 

Chloroplast transformation was first demonstrated 
in C. reinhardtii using biolistic bombardment with 
DNA-coated microprojectiles by John Boynton and 
collaborators in 1988. Biolistic transformation of the 
chloroplast can also routinely be achieved in tobacco. 
Integration proceeds by homologous recombination 
into one (or a few) copies of the plastid genome so that 
the transformed cell is initially heteroplasmic. Sub- 
sequent subculturing and selection usually leads to 
the segregation of homoplasmic lines. However, if a 
mutation is introduced that disrupts an essential gene, 
a heteroplasmic situation persists because of two 
opposing selective pressures: selection for wild-type 
copies of the vital gene on the one hand, and for the 
transformation marker linked to the mutated gene 
on the other hand. The demonstration of chloroplast 
transformation opened the way to targeted gene 
disruption and site-directed mutagenesis. This has 
allowed the investigation of structure-function
relationships in the photosynthetic apparatus, and the
analysis of cis-acting elements and trans-acting factors 


involved in plastid gene expression. The importance of 
interactions between the nucleus and the plastid 
were highlighted by the early studies of W. Stubbe 
with Oenothera. He identified three types of haploid 
nuclear genomes and five types of plastid genomes 
by studying the compatibility of the different nuclear 
backgrounds with the plastid genotypes. Depend- 
ing on the combination of nuclear and plastid 
genomes, plants with normal green chloroplasts 
or variegated plants with deficient plastids were ob- 
tained in crosses. 


Photosynthesis Mutants and their Phenotypes


Because the photosystems and the light-harvesting 
antennae contain many pigments (chlorophylls and 
carotenoids), nonphotosynthetic mutants often have 
altered pigmentation, ranging from slightly pale to 
yellow or white (chlorina, viridis, yellow, albino, 
white, etc.). Lack of photosynthesis is lethal in homo- 
zygous seedlings, but they can be rescued if they are 
grown on sucrose-containing media. In some cases, 
plastid mutations give rise to variegated plants with 
sectors of wild-type and mutant tissue, the former 
sustaining the latter. In C. reinhardtii, photosynthesis 
mutants cannot grow on minimal medium and can be 
recognized as acetate-requiring by replica-plating (ac 
mutants). Mutants that harvest light but cannot use 
the energy for photochemistry exhibit high levels of 
chlorophyll fluorescence (cf mutants). Defects in the 
photosynthetic electron transfer chain can be revealed 
by changes in the kinetics of fluorescence induction 
after a transition from dark to light. Photosynthesis 
mutants are also often sensitive to high intensities of 
light. There are also mutants which were selected for 
increased tolerance to herbicides that act on photo- 
synthesis, or to inhibitors of plastid translation. 

Mutations in the photosynthetic machinery are 
typically pleiotropic. In the absence of one subunit, 
the other subunits of a complex are synthesized 
but are not assembled properly and they are rapidly 
degraded so that all components of the complex are 
affected. Although this phenomenon complicates the 
identification of the primary lesion in a mutant, it 
reveals a very active proteolytic surveillance. This 
proteolytic system may function in a posttranslational 
mechanism to regulate the stoichiometric accumu- 
lation of the subunits of each complex, albeit at a late 
step. 


The Genes of Photosynthesis 


The genes that play a role in photosynthesis can 
be identified in many different ways. They can be 
revealed genetically by mutations that affect photosynthetic
activity, they can be recognized through
biochemical isolation of factors involved in photo- 
synthesis, or be identified by similarity to their coun- 
terparts of known function in other organisms. Recent 
advances in molecular genetics are helping bridge the 
gaps between these different experimental approaches. 
For mutants obtained by classical forward genetics, 
positional cloning or ‘tagged’ insertion alleles allow 
the isolation of the genes that are affected, thus reveal- 
ing sequence information on the factors they encode. 
Conversely, for genes identified by biochemical 
approaches, insertion mutants can be obtained by 
screening large collections using PCR methods. The 
genes involved in photosynthesis can be classified 
arbitrarily into three broad groups (Figure 2): (a) the 
structural genes of the photosynthetic complexes and 
enzymes; (b) the genes encoding ancillary factors 
involved in the biogenesis of the photosynthetic 
machinery; and (c) the genes required for the expres- 
sion of the chloroplast genome. 

The first group (a) comprises the structural genes 
for the photosynthetic apparatus, such as subunits of 
the photosynthetic complexes (Table 1), or of the 
enzymes of the Calvin cycle. These genes are found 
in the chloroplast and the nuclear genome. A typical 
example is Rubisco, which in plants and green algae is 
composed of eight identical large subunits, encoded 
by the chloroplast gene rbcL, and eight small subunits,
encoded by a family of nuclear genes, RbcS. 

The second group (b) includes genes for enzymes 
involved in the biosynthesis of pigments and other 
prosthetic groups (chlorophyll, heme, carotenoids, 
iron-sulfur centers, lipids, etc.) and for factors 
involved in the assembly of the photosynthetic 
machinery (e.g., heme lyases, chaperones) and in its 
repair or degradation (e.g., proteases). One can also 
include in this group the genes involved in protein 
import into the plastid or targeting within the chloro- 
plast, for example to the thylakoid membranes or to 
the lumen. In higher plants, most genes of this second 
group are in the nuclear genome, but in some algae 
a few are found in the plastid genome. A third 
prominent group (c) is composed of genes required 
for the expression of chloroplast genes involved in 
photosynthesis. Genes for ribosomal proteins or sub- 
units of the RNA polymerases belong to this class, 
distributed on both the nuclear and the plastid 
genomes. These genes are expected to be generally 
required for the expression of all the genes in the 
plastid genome, or of large sets of genes. Mutations 
in these genes occur in plants but are probably lethal 
in C. reinhardtii. In contrast to these genes for the 
general gene expression machinery, there are other 
loci that are involved in the expression of specific 
Figure 2 Nuclear participation in the biogenesis of the photosynthetic apparatus. In this schematic representation
of a nucleus and a plastid in a plant cell, nuclear genes encode polypeptides that are translated in the cytosol and 
imported into the plastid, where they assemble with subunits encoded in the chloroplast genome. The proteins that 
are imported (black arrows) can be arbitrarily classified in three groups as discussed in the text: (a) structural 
components of the photosynthetic complexes and enzymes; (b) proteins required in a broad sense for the assembly 
of the photosynthetic apparatus; (c) factors required for the maintenance and expression of the plastid genome. 


(Adapted from Goldschmidt-Clermont, 1998.) 


chloroplast genes. The latter act in posttranscriptional 
steps of gene expression (RNA processing and stability, 
splicing, translation) and are specific for only small sub- 
sets of chloroplast genes. The loci are mostly nuclear 
and are surprisingly numerous. An extreme example is 
the maturation of the chloroplast psaA mRNA in 
C. reinhardtii, which is composed of three exons that 
are transcribed as separate precursors: assembly of the 
psaA mRNA by two steps of splicing in trans requires 
the contribution of at least 14 nuclear loci. 


Regulation 


The expression of some chloroplast photosynthesis 
genes is regulated by light, mainly at the levels of 
mRNA maturation, stability, and translation. Chloro- 
plasts are derived from nonphotosynthetic precursors, 
the proplastids, following pathways that are regulated 
by developmental and environmental cues, promin- 
ently by light (see Photomorphogenesis in Plants, 
Genetics of). Chloroplast development and the 
assembly of the photosynthetic apparatus are tightly
controlled by the nucleocytoplasmic compartment,
since most chloroplast proteins are imported. Con- 
versely, the transcriptional activity of some nuclear 
genes encoding chloroplast proteins is influenced by 
signals emanating from the plastid, a response which is 
altered in the gun (genomes uncoupled) mutants of
Arabidopsis. 

Photosynthesis responds dynamically to changes in 
the environment, and in particular to changes in light 
quality and intensity. Part of the LHC is reversibly 
redistributed between PSII and PSI depending on the 
redox state of the plastoquinone pool, a process that is 
affected in the state transition (stt) mutants of
C. reinhardtii. In adaptation to excess light energy or low
temperature, the epoxidation/de-epoxidation cycle of 
xanthophylls in the antenna modulates a component 
of nonphotochemical quenching, which is defective in 
the npq mutants. 

Mitochondrial and chloroplast functions are tightly 
integrated, for example in the metabolic pathway of 
photorespiration. There is also genetic evidence that 
mitochondria play an essential role during plastid
development. In some NCS (nonchromosomal stripe)
mutants of maize, lesions in the mitochondrial DNA
suggest that the primary defect may lie in the
mitochondrial genome and that the effect on plastid
development is secondary.


Concluding Remarks and Perspective 


From an early stage, genetic studies of photosynth- 
esis have focused on the structural genes for the 
photosystems, for other photosynthetic enzymes, or 
for pigment biosynthesis pathways. The genetics of 
photosynthesis has also revealed a surprisingly large 
number of loci that are involved in posttranscriptional 
steps of chloroplast gene expression. Recently, atten- 
tion has also been devoted to genes involved in the 
biogenesis of the plastids, e.g., protein transport and 
targeting, or assembly of polypeptides and cofactors 
to form the large macromolecular complexes. Plastid 
development and photosynthesis are regulated in 
response to the environment, and genetics has
revealed many components of light perception and 
photomorphogenesis. Genetic approaches are begin- 
ning to address the dynamic adaptations of photo- 
synthesis to changes in the quality and quantity of 
light. With the emergence of many new molecular 
tools that enhance the long-standing heritage of
classical genetics, our understanding of photosynthesis
will grow at an increasing pace in the years to
come.
The branching pattern of ancestor-descendant
relationships among species or their parts (e.g., genes) is
a phylogeny. Researchers attempt to estimate these 
historical relationships by examining character
evolution using a tree, a mathematical structure used to
model the actual evolutionary history of species or 
their parts. These inferred trees (historical branching 
relationships) can be represented as cladograms, 
where branch lengths are arbitrary and only the 
branching order is significant, or as phylograms, 
where the branch lengths are proportional to the 
amount of evolutionary change along the branch. 

Phylogenies were historically used to classify 
organisms into natural evolutionary groups based on 
these ancestor-descendant relationships. Indeed, great
effort is currently being spent on estimating the ‘tree 
of life’ to quantify the biodiversity of our planet. 
However, the use of phylogenies has also spread as the
utility of the evolutionary framework for numerous
other disciplines has become increasingly obvious. For
example, phylogenies are used extensively in conser- 
vation biology, developmental biology, genomic biol- 
ogy, the study of infectious disease, virology, human 
genetics, and ecology. The entire field of comparative 
biology is now couched in terms of phylogenetic 
associations. Thus the accurate estimation of phylo- 
genetic relationships has become a centrally important 
topic of research. 

Phylogenetic estimation is accomplished by opti- 
mizing character change relative to some criterion 
over a tree. The tree for which the character data 
show the best optimization is the preferred tree. 
There are two principal optimization criteria used by
researchers: maximum parsimony and maximum
likelihood. The parsimony criterion seeks the tree that
minimizes the number of changes required to explain
shared derived characters, while likelihood seeks the
tree that maximizes the probability of the observed
data, over all characters, under some model of
evolution. Each criterion has its own strengths and
weaknesses. For example, maximum parsimony can
incorporate insertion-deletion (indel) events and
accommodate asymmetric changes (e.g., a change from
character A to character B need not be treated the
same as a change from character B to character A),
whereas current implementations of maximum
likelihood cannot accommodate these biological realities.
Likewise, maximum likelihood can account for
heterogeneity in evolutionary rates and multiple
changes at the same character position, whereas
maximum parsimony cannot. Thus there is often heated
discussion about appropriate methods to use
to estimate phylogenetic relationships.
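To make the parsimony criterion concrete, the following minimal Python sketch scores fixed trees with Fitch's small-parsimony algorithm, which counts the fewest state changes a single character requires on a given rooted, bifurcating tree; summing over characters gives the tree length that parsimony minimizes. The nested-tuple tree encoding, the function names, and the four-taxon alignment are illustrative assumptions, and states are treated as unordered and equally weighted.

```python
from typing import Dict, Set, Tuple, Union

# Illustrative tree encoding: a leaf is a taxon name; an internal node is a
# (left, right) pair, so every tree here is rooted and strictly bifurcating.
Tree = Union[str, Tuple["Tree", "Tree"]]


def fitch_score(tree: Tree, states: Dict[str, str]) -> int:
    """Minimum number of state changes for one character on a fixed tree."""

    def visit(node: Tree) -> Tuple[Set[str], int]:
        if isinstance(node, str):
            # A leaf's candidate set is just its observed state.
            return {states[node]}, 0
        left_set, left_cost = visit(node[0])
        right_set, right_cost = visit(node[1])
        shared = left_set & right_set
        if shared:
            # The children can agree on a state: no extra change is needed.
            return shared, left_cost + right_cost
        # The children disagree: keep both options and count one change.
        return left_set | right_set, left_cost + right_cost + 1

    return visit(tree)[1]


def parsimony_length(tree: Tree, alignment: Dict[str, str]) -> int:
    """Tree length: sum of Fitch scores over all aligned character positions."""
    n_sites = len(next(iter(alignment.values())))
    return sum(
        fitch_score(tree, {taxon: seq[i] for taxon, seq in alignment.items()})
        for i in range(n_sites)
    )


# Toy four-taxon alignment (made up for illustration).
alignment = {"A": "ACGT", "B": "ACGA", "C": "GCTA", "D": "GCTT"}
tree_ab_cd = (("A", "B"), ("C", "D"))
tree_ac_bd = (("A", "C"), ("B", "D"))
print(parsimony_length(tree_ab_cd, alignment))  # 4
print(parsimony_length(tree_ac_bd, alignment))  # 6
```

Here the tree grouping A with B needs four changes and the alternative needs six, so the parsimony criterion would prefer the first. Weighted or asymmetric changes, as mentioned above, would need a generalization such as Sankoff's algorithm.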

One of the reasons there is such debate about phy- 
logenetic methods is that their performance varies 
depending upon the type of data used, the number of 
species involved, and the depth of the evolutionary 
relationships to be inferred. Exact searches, those 
that explore every possible tree topology for a given 
optimality criterion, are only possible for a very small 
number of taxa (on the order of 20-30). This limitation
is due to the rapid growth in the number of possible
trees with even a modest increase in the number of taxa.
The total number of (unrooted, strictly bifurcating)
trees for T taxa is


$$B(T) = \prod_{i=3}^{T} (2i - 5)$$

So, for example, with only 50 taxa, there are about $3 \times 10^{74}$
possible trees. For the tree of life, there are estimated
to be well over 10 million species, and for 10 million
taxa there are about $5 \times 10^{68\,667\,340}$ possible trees! Therefore,
the phylogeny problem is a particularly tough one that 
is attracting the attention of computer scientists and 
mathematicians as well as biologists. An alternative
to optimizing a criterion over all trees is to use an
algorithmic approach such as neighbor-joining.
Neighbor-joining provides a heuristic point estimate for the
minimum evolution tree which attempts to minimize 
the overall genetic distance among taxa relative to a 
specified model of evolution. Such methods are often 
used when sample sizes become very large. 
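The two computational points in the paragraph above can be sketched briefly in Python: counting unrooted bifurcating trees with the product formula for B(T), and building one tree with neighbor-joining instead of searching all of them. The neighbor-joining function follows the standard Saitou-Nei formulation (pick the pair minimizing the Q criterion, join it, recompute distances to the new node); the dictionary bookkeeping, the Newick output, and the five-taxon distance matrix are illustrative assumptions, not any particular program's implementation.

```python
from math import prod
from typing import Dict, List


def num_unrooted_trees(t: int) -> int:
    """B(T): number of unrooted, strictly bifurcating trees for t >= 3 taxa."""
    return prod(2 * i - 5 for i in range(3, t + 1))


def neighbor_joining(names: List[str], d: List[List[float]]) -> str:
    """Join taxa with neighbor-joining; return an unrooted tree in Newick format."""
    if len(names) < 3:
        raise ValueError("neighbor-joining needs at least three taxa")
    nodes = list(names)
    # Distances keyed by cluster label so merged clusters can be added easily.
    dist: Dict[str, Dict[str, float]] = {
        a: {b: d[i][j] for j, b in enumerate(names)} for i, a in enumerate(names)
    }
    while len(nodes) > 3:
        n = len(nodes)
        r = {a: sum(dist[a][b] for b in nodes) for a in nodes}
        # Choose the pair minimizing Q(a, b) = (n - 2) d(a, b) - r(a) - r(b);
        # ties go to the first pair encountered.
        best_q, best_a, best_b = None, "", ""
        for i, a in enumerate(nodes):
            for b in nodes[i + 1:]:
                q = (n - 2) * dist[a][b] - r[a] - r[b]
                if best_q is None or q < best_q:
                    best_q, best_a, best_b = q, a, b
        a, b = best_a, best_b
        # Branch lengths from the new internal node u to a and to b.
        la = dist[a][b] / 2 + (r[a] - r[b]) / (2 * (n - 2))
        lb = dist[a][b] - la
        u = f"({a}:{la:g},{b}:{lb:g})"
        dist[u] = {u: 0.0}
        for c in nodes:
            if c not in (a, b):
                dist[u][c] = dist[c][u] = (dist[a][c] + dist[b][c] - dist[a][b]) / 2
        nodes = [c for c in nodes if c not in (a, b)] + [u]
    # Connect the last three clusters to one central node.
    x, y, z = nodes
    lx = (dist[x][y] + dist[x][z] - dist[y][z]) / 2
    return f"({x}:{lx:g},{y}:{dist[x][y] - lx:g},{z}:{dist[x][z] - lx:g});"


print(num_unrooted_trees(10))            # 2027025
print(len(str(num_unrooted_trees(50))))  # 75 digits, i.e. about 3 x 10^74

taxa = ["a", "b", "c", "d", "e"]
matrix = [
    [0, 5, 9, 9, 8],
    [5, 0, 10, 10, 9],
    [9, 10, 0, 8, 7],
    [9, 10, 8, 0, 3],
    [8, 9, 7, 3, 0],
]
print(neighbor_joining(taxa, matrix))  # (d:2,e:1,(c:4,(a:2,b:3):3):2);
```

Because this toy matrix fits a tree exactly, neighbor-joining recovers branch lengths that reproduce every pairwise distance. Each joining step in this simple version costs on the order of n^2 operations, so the whole procedure is roughly cubic in the number of taxa, in contrast to the super-exponential growth of B(T).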

Phylogenetics has become an active field in and of
itself. It is an extremely exciting field where talents in
mathematics, computer science, and biology can be
brought together to work on the problem of inferring
historical relationships. A survey of the recent
literature in many fields will attest to the
ever-increasing applicability of phylogenetic analyses.
In the slightly paraphrased words of the great
population geneticist Theodosius Dobzhansky, “nothing
in biology makes sense except in the light of
phylogeny.”
The science of phylogeography is concerned with
the principles and processes governing the geographic 
distributions of genealogical lineages, especially those 
at the intraspecific level. The word itself was coined 
in 1987, but the discipline’s intellectual development 
began early in the century with studies of the popu- 
lation dynamics of surname turnover in human 
societies. The statistical and mathematical sides of 
phylogeography have developed in recent years into 
what is now termed coalescent theory, which addres- 
ses how, as functions of population demography, 
lineages trace back in time through extended pedigrees 
to common ancestors. 
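As a rough illustration of the coalescent idea, the Python sketch below computes how many generations a sample of n lineages is expected to take to trace back to a single common ancestor. It assumes the standard Kingman coalescent for a haploid, randomly mating population of constant effective size (for maternally inherited mtDNA, roughly the effective number of females); the function names and parameter values are hypothetical, and real demographic histories (growth, subdivision) change these times.

```python
import random
from fractions import Fraction


def expected_tmrca(n: int, n_eff: int) -> Fraction:
    """Expected generations until n sampled lineages share one ancestor.

    Assumes a haploid population of constant effective size n_eff under the
    standard Kingman coalescent: while k lineages remain, the next
    coalescence takes n_eff / C(k, 2) generations on average.
    """
    total = Fraction(0)
    for k in range(2, n + 1):
        total += Fraction(n_eff, k * (k - 1) // 2)
    return total  # closed form: 2 * n_eff * (1 - 1/n)


def simulate_tmrca(n: int, n_eff: int, rng: random.Random) -> float:
    """Draw one time to the most recent common ancestor under the same model."""
    t = 0.0
    for k in range(n, 1, -1):
        pair_rate = k * (k - 1) / 2 / n_eff  # coalescence rate per generation
        t += rng.expovariate(pair_rate)
    return t


rng = random.Random(1)
print(expected_tmrca(10, 1000))  # 1800
draws = [simulate_tmrca(10, 1000, rng) for _ in range(20000)]
print(sum(draws) / len(draws))   # close to 1800
```

Most of the expected time is spent waiting for the last two lineages to merge (n_eff generations on average), which is why even small samples reach deep into a population's history.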

On the empirical side, phylogeographic analyses 
have been motivated primarily by molecular genetic 
appraisals of animal mitochondrial DNA (mtDNA). 
Because this molecule evolves rapidly and is mater- 
nally inherited without recombination, it provides 
a chronicle of matrilineal relationships within and 
among related species. Thus, phylogenetic analyses 
of mtDNA variants can be used to estimate the exten- 
ded matrilineal component of an organismal pedigree 
in much the same way that family surnames in 
many human societies traditionally record patrilineal 
histories. Unlike surnames, however, the mitochon- 
drial genetic archives extend much farther back in time 
and in principle can be recovered from nearly any 
multicellular animal species. For technical reasons, 
comparable studies in plants often focus on another 
cytoplasmic genome, chloroplast DNA (cpDNA),
rather than mtDNA. A future challenge for the field
of phylogeography centers on developing comparable 
genealogical methods for autosomal genes, a task 
made more difficult by a slow evolutionary pace for 
many nuclear DNA sequences and by the likelihood
of historical intragenic recombination.


Comparative studies in molecular phylogeography 
have revealed the following: (1) Most species are 
composed of geographic populations whose mem- 
bers occupy recognizable phylogenetic branches in 
a matrilineal tree; (2) exceptional cases (in which 
phylogeographic population structure is minimal or 
nonexistent) usually involve highly vagile organisms, 
and/or species that have occupied historically continu- 
ous ranges; (3) historical population separations can 
range from temporally shallow to deep; and (4) pro- 
nounced phylogenetic gaps often observed between 
regional populations usually appear to have resulted 
from long-term biogeographic barriers to gene flow. 
Molecular phylogeographic patterns also are highly 
relevant to conservation biology and to an under- 
standing of speciation processes. 

In broad terms, the most important contributions 
of phylogeography to evolutionary analysis have 
been to: (1) emphasize the historical, nonequilibrium 
aspects of microevolutionary change, (2) clarify the 
tight connections between population demography 
and genealogy, and (3) build empirical and conceptual 
bridges between the nominally separate fields of 
population genetics and phylogenetic biology. 




See also: Phylogeny; Population Genetics 
In contrast to conventional genetic mapping, physical
mapping is the process of demonstrating that two 
fragments of DNA both contain sequence in common. 
The two fragments may match each other precisely for 
thousands or millions of nucleotides or they may have 
in common a run of sequence of only a dozen or so 
nucleotides. Usually the identity between the DNA 
fragments is revealed by hybridizing a labeled 
DNA fragment to a complex mixture of unlabeled 
DNA fragments which have been separated by gel 
electrophoresis and transferred by blotting to a mem- 
brane. All of the DNA bands on the membrane which 
have become associated with the labeled fragment 
(probe) have sequence in common with the probe. 
Physical mapping can also refer to a method of
identifying the DNA fragment that carries a particular
function by changing the size of that fragment.
Introducing a sizeable deletion or insertion into a gene
carried on a plasmid alters the size of the restriction
fragment that carries the gene. Therefore, if the function
of the gene of interest on a plasmid has been disrupted
by transposon insertion, analyzing the sizes of the
plasmid's restriction fragments will reveal a fragment
whose size has increased. This larger fragment
corresponds (maps) to the gene that has been altered.
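The size-shift logic can be sketched with hypothetical coordinates: a circular plasmid is cut at known restriction sites, an element of assumed length is inserted at one position, and comparing fragment sizes before and after picks out the single fragment that grew. The plasmid size, site positions, insertion point and insert length are all invented for illustration, and the sketch assumes the inserted element carries no sites for the enzyme used, which is not true of every transposon.

```python
from collections import Counter


def circular_fragments(plasmid_len, cut_sites):
    """Fragment sizes (bp) obtained by cutting a circular plasmid at the given sites."""
    cuts = sorted(cut_sites)
    sizes = [b - a for a, b in zip(cuts, cuts[1:])]
    # The last fragment wraps around the circle from the final cut back to the first.
    sizes.append(plasmid_len - cuts[-1] + cuts[0])
    return sizes


def insert_element(plasmid_len, cut_sites, position, insert_len):
    """Insert insert_len bp of DNA at `position`; sites at or after it shift right.

    Assumes the inserted element carries no sites for the restriction enzyme used.
    """
    shifted = [s + insert_len if s >= position else s for s in cut_sites]
    return plasmid_len + insert_len, shifted


def enlarged_fragments(before, after):
    """Fragment sizes present after the insertion that were not present before."""
    return sorted((Counter(after) - Counter(before)).elements())


# Hypothetical 6,000 bp plasmid cut at three sites; the gene of interest is
# assumed to lie in the fragment between positions 2,500 and 4,000.
sites = [500, 2500, 4000]
before = circular_fragments(6000, sites)
new_len, new_sites = insert_element(6000, sites, position=3000, insert_len=1300)
after = circular_fragments(new_len, new_sites)
print("before:", sorted(before))
print("after: ", sorted(after))
print("enlarged fragment(s):", enlarged_fragments(before, after))
```

Comparing the two sets of sizes as multisets, rather than as sorted lists, leaves unchanged fragments out of the answer even when two fragments happen to have the same size.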


See also: DNA Hybridization; Gene Mapping 
The piebald trait (also referred to as piebaldism) is
characterized by patches of skin and hair that are
white, lacking pigmentation (color). The patches are
white because they are missing the specialized
melanin-producing pigment cells called melanocytes
(Figure 1). The precursors of melanocytes are few in
number and are formed early in gestation at the top of
what will become the spinal cord. Once formed, they
must divide and migrate throughout the developing
skin to result in complete pigmentation of the body. If
a mutated gene prevents the melanocytes from
migrating to a particular area of the body or from
dividing, or causes them to die, a white patch will form
in the region missing the melanocytes. Piebaldism is
common in the animal kingdom, being seen in species
as different as mice and horses.

Figure 1 (See Plate 27) Sections through a normally
pigmented and a white patch region of a piebald
(Ednrb mutant) mouse. Note the presence of pigmented
melanocytes in the hair follicle of the normally
pigmented region (A). In a section through a white
patch (B) the hair follicle is normal; however, the
melanocytes are missing and thus the hair is white.

In contrast to the white patches seen in piebaldism,
a related condition called albinism involves reduced
pigment in the skin and hair that is not patchy but
more uniform across the entire body. In albinism the
melanocytes are distributed normally; however, they
lack the ability to produce the melanin pigment.


Human Piebald Trait 


The human piebald trait is inherited as an autosomal
dominant disorder. The white areas are typically
located on the front, middle portion of the forehead
(where the hair forms a white forelock), eyebrows,
chin, abdomen, feet, and hands. Piebaldism can
sometimes be associated with deafness, which is
thought to be caused by a lack of melanocytes in a
part of the inner ear called the stria vascularis. The
gene altered in the piebald trait has been identified as
a receptor tyrosine kinase gene, KIT. A mutation in
this gene is thought to cause melanocytes to die more
frequently, so that there are not enough melanocytes
to fully populate the surface of the body; hence white
patches result.


Related Disorders: Waardenburg Syndromes and Hirschsprung Disease


White patches similar to those seen in the piebald trait
can also be associated with Waardenburg syndrome,
which is divided into several types (1–4) depending on
the presence of additional traits: pigment anomalies of
the iris, deafness, limb anomalies, and a widened bridge
of the nose (dystopia canthorum). Waardenburg syn-
drome is autosomal dominant, and mutations have
been found in two different genes. One of them encodes
a paired-box class transcription factor, PAX3. The
second gene, MITF, also encodes a transcription factor,
but of the basic helix-loop-helix class. In addition to
the white patches and deafness due to a loss of
melanocytes, Waardenburg syndrome type 4 involves a
loss of cells related to melanocytes (enteric ganglion
cells) in the digestive tract. This condition is also called
Waardenburg–Hirschsprung disease, and mutations
have been found in a different receptor gene
(endothelin receptor B), in the gene for the ligand of
that receptor (endothelin 3), or in a transcription
factor gene of the HMG-box type (SOX10).

Figure 2 (See Plate 26) Four mouse strains that have mutations in genes that cause patches of white hair in
the coat: (A) piebald (Ednrb^s/Ednrb^s); (B) piebald lethal (Ednrb^sl/Ednrb^sl); (C) lethal spotting (Edn3^ls/Edn3^ls);
(D) dominant megacolon (Sox10^Dom/+). Note that the white patch in panel A resembles the white forelock often
seen in piebaldism and Waardenburg syndrome.


Mice and the Piebald Trait 


There are several strains of mice that show white
patches of fur caused by a lack of melanocytes in those
areas (Figure 2). One such strain (dominant spotting)
carries a mutation in Kit, the same gene that is altered
in human piebaldism. A second mouse strain is called
piebald; however, it does not carry a mutation in the
gene altered in the human piebald trait. Instead, it has
a mutation in a gene for Waardenburg–Hirschsprung
disease, endothelin receptor B. In fact, mice exist that
have mutations in Mitf, Pax3, endothelin 3, and Sox10.
These mice were useful in determining which genes are
defective in the human conditions described above,
and they are also useful for determining why specific
mutations result in white patches.


See also: Albinism; Coat Color Mutations, 
Animals; Hirschsprung’s Disease 
Much of what we know about mammalian pigmenta-
tion is based on mouse models. Indeed, the mouse is
a rich source of models for genetic disorders of human
pigmentation, including several types of albinism. As
in other mammals, the major pigment in the mouse is
melanin, a polymer formed primarily from tyrosine and
produced in specialized organelles called melanosomes
within the pigment cell (or melanocyte). In mammals,
melanocytes (with or without melanin) are required
for normal hearing, and melanin is critical for normal
vision. In some mammals, melanin aids in camouflage
and solar protection, and can influence mate selection
and vitamin D metabolism. Nearly 100 different mouse
loci (known by their mutations) affect pigmentation at
various critical steps. A current list is available on the
internet (see Reference below).

Some mouse pigmentation mutations affect early
development, influencing melanoblast viability, pro-
liferation, and migration from neural crest precursors.
The genes at these loci include growth factors (e.g.,
Mgf) and their receptors (e.g., Kit), which may also
affect other cell types (e.g., hematopoietic stem cells
and primordial germ cells). The pigmentation
phenotypes of this class of mutations range from white
spots to a complete absence of neural-crest-derived
melanocytes.

Other mouse mutations affect the morphology of
the melanocyte. An example of this type of mutation
is Myo5a^d. The Myo5a gene encodes an unconven-
tional myosin needed to move melanosomes along
the dendrites of the melanocyte to efficiently export
pigment to hair follicles. The fur color of
Myo5a^d/Myo5a^d mice is dilute relative to wild-type
mice.

Another group of mouse mutations affects the
morphology and integrity of the melanosome. The
phenotypes of these mutations may not be limited
to melanosome irregularities but may also involve
related organelles, such as lysosomes and even
platelet dense bodies, as seen in the pale ear (ep)
mutation and its human counterpart, Hermansky–Pudlak
syndrome. The ep gene product and several other
genes in this category may play a role in membrane
trafficking, a process critical to normal organelle
biogenesis.

Still other mouse mutations affect the quantity
of melanin. The most critical enzyme in the pro-
duction of melanin is tyrosinase. Albino mice are
homozygous for mutations in tyrosinase (Tyr^c/Tyr^c)
and lack melanin. Humans with tyrosinase mutations
have tyrosinase-related oculocutaneous albinism,
or OCA1, a common form of albinism. Temperature-
sensitive mutations in this gene result in the
Himalayan mouse (Tyr^c-h/Tyr^c-h) and the Siamese cat.
The mouse pink-eyed dilution gene (p) may aid
in creating conditions favorable for tyrosinase enzy-
matic activity. Mutations in the human homolog,
P, underlie the other common form of albinism
(tyrosinase-positive oculocutaneous albinism, or
OCA2).
Lastly, some mouse mutations affect the type of
melanin produced, switching between brown/black
eumelanin and yellow/red phaeomelanin. A genetic
switch during the hair cycle, mediated by the Agouti
gene (A) product and melanocortin 1 receptor (Mc1r),
leads to the banded pattern of hair color seen in
wild-derived mice and in many other mammals. The
Agouti gene has a complex promoter that can respond
to positional (e.g., dorsal vs. ventral) and temporal
(e.g., hair cycle) signals, producing many of the coat
color patterns seen in mammals. Certain MC1R
variants in humans are associated with red hair.

A complex picture of the genetics underlying 
mammalian pigmentation is emerging from the study 
of mouse pigmentation mutations. These mutations 
offer model systems to study human genetic disease 
as well as the development and regulation of normal 
pigmentation. Moreover, genetic variation in many of 
these same genes mediates pigmentation variation 
within, and between, mammalian species. 


Reference 
http://www.informatics.jax.org/


See also: Coat Color Mutations, Animals; Genetic 
Diseases; Mus musculus 
The Pim genes belong to a distinct family of serine/
threonine kinase genes. To date, this family contains
three members: Pim1, Pim2, and Pim3. Both the
messenger RNA (mRNA) transcripts of the Pim genes
and the encoded proteins have a short half-life. All
Pim genes have similar properties: they are induced
by a range of cytokines and growth factors; their
overexpression results in a strong predisposition
toward leukemia; and they cooperate very efficiently
with the Myc genes in the oncogenic transformation
of lymphoid cells.


See also: Leukemia 
The garden pea (Pisum sativum L.) has long served as a
model organism for genetic investigations. Gregor
Mendel actually worked on several plant species dur-
ing his studies, but only the pea provided easily scored
traits consistently displaying simple inheritance ratios.
Several other traits also recommend the pea as a model
organism. The plant is relatively small and completes
its life cycle in 3–4 months. The flower is large and,
although primarily self-pollinating, can be conveni-
ently emasculated and fertilized with pollen from
another plant. Finally, the plant is an important crop
and has many close relatives that are also important
crops, such as lentil (Lens culinaris), faba bean (Vicia
faba), chickpea (Cicer arietinum), and grasspea
(Lathyrus sativus L.). Thus, advances in our under-
standing of pea genetics can often be directly applied
in agriculture.
The most outstanding characteristic of the pea as a
genetic model is the large number of simply inherited
morphological and physiological polymorphisms
it displays. Mendel worked with seven of these
(Table 1), but succeeding pea breeders and geneticists
have identified over 500 additional natural or induced
mutations. This variation has proven to be exceedingly
useful to researchers in many fields. In breeding, the
recessive a gene is used throughout the world to pro-
duce better-tasting peas. Recessive alleles at either R or
Rb give wrinkled seeds that are sweeter than the wild
type if harvested at the correct time, and the double
recessive combination results in a super-sweet pea.
The combination of recessive alleles at three loci (P,
V, and N) produces the snap pea. Finally, the recessive
allele af (for afila) produces a plant in which all
leaflets are converted to tendrils. Surprisingly, this
phenotype generates as many pods as its normal coun-
terpart, but because the tendrils interlace much more
frequently between plants, the phenotype is much
more resistant to lodging. The erect nature of afila
plants makes harvesting easier and improves overall
yield.

Table 1 The seven genes studied by Mendel, their positions and biochemical basis

Trait (dominant/recessive)    Locus symbol   Map position(a)       Biochemical basis
Cotyledon (yellow/green)      I              I (92)                Retention of chlorophyll
Flower (violet/white)         A              II (38)               Lacks anthocyanins
Pod (green/yellow)            Gp             V (91)                Lacks chlorophyll
Pod wall (stiff/collapsing)   P or V         VI (50) or III (136)  Lacks sclerenchyma
Pods (dispersed/clustered)    Fa             IV (uncertain)        Unknown
Seeds (round/wrinkled)        R              V (40)                Starch branching enzyme
Stem length (tall/dwarf)      Le             III (131)             Gibberellin 3-hydroxylase

(a) Linkage group is designated by Roman numerals. The number in parentheses gives the distance of the gene in
centimorgans from the top of the linkage group on the consensus map.
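Because the super-sweet and snap peas need homozygous recessive genotypes at two or three loci at once, they are expected to be rare among segregating progeny. The sketch below enumerates gametes in Punnett-square fashion to compute those fractions. It assumes that the loci are unlinked and that the cross is a self-pollinated plant heterozygous at every locus; neither assumption comes from the text above, and the helper names are illustrative only.

```python
from fractions import Fraction
from itertools import product


def gametes(genotype):
    """Equally frequent gametes from a genotype written as one allele pair per locus.

    Assumes the loci are unlinked, so every one-allele-per-locus combination
    is produced at the same frequency.
    """
    return list(product(*genotype))


def offspring_fraction(parent1, parent2, predicate):
    """Fraction of zygotes from parent1 x parent2 whose genotype satisfies predicate."""
    gs1, gs2 = gametes(parent1), gametes(parent2)
    hits = 0
    for g1, g2 in product(gs1, gs2):
        # One unordered allele pair per locus, e.g. ("R", "r").
        zygote = tuple(tuple(sorted(pair)) for pair in zip(g1, g2))
        if predicate(zygote):
            hits += 1
    return Fraction(hits, len(gs1) * len(gs2))


def homozygous(zygote, locus, allele):
    """True if the zygote carries two copies of `allele` at `locus`."""
    return zygote[locus] == (allele, allele)


# Plant heterozygous at both wrinkled-seed loci, self-pollinated.
dihybrid = (("R", "r"), ("Rb", "rb"))
super_sweet = offspring_fraction(
    dihybrid, dihybrid,
    lambda z: homozygous(z, 0, "r") and homozygous(z, 1, "rb"))
wrinkled = offspring_fraction(
    dihybrid, dihybrid,
    lambda z: homozygous(z, 0, "r") or homozygous(z, 1, "rb"))

# Snap pea: recessive at all three of P, V and N.
trihybrid = (("P", "p"), ("V", "v"), ("N", "n"))
snap = offspring_fraction(
    trihybrid, trihybrid,
    lambda z: all(homozygous(z, i, a) for i, a in enumerate(("p", "v", "n"))))

print("super-sweet:", super_sweet)
print("wrinkled (either locus):", wrinkled)
print("snap pea:", snap)
```

Under these assumptions the super-sweet double recessive appears in 1/16 of the progeny and the snap-pea triple recessive in 1/64, while wrinkled seeds of either kind make up 7/16.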

Developmental biologists have been interested in 
the afila gene, as well as five other genes (coch, tl, sil, st, 
and uni) that alter the morphology of leaflets and 
stipules. Researchers investigating these mutations 
hope to gain an understanding of the developmental 
processes producing the compound leaf and tendrils, 
just as investigations of flowering mutants in Arabi- 
dopsis have resolved questions regarding the process 
of flower formation and tissue differentiation. 
Another set of more than 30 mutations (usually
designated sym, for symbiosis) influences processes
involved in nodule formation and metabolism. Again,
these mutations are being further explored by physi-
ologists and molecular biologists to understand how
rhizobia interact with the host plant and the process of
nitrogen fixation in legumes.

Molecular marker studies have been performed in 
the pea since 1970, and the first single-copy RFLP 
mapping in plants was performed in 1985. Genetic 
diversity studies have revealed copious levels of mo- 
lecular polymorphism in the domesticated pea (P. sati- 
vum subsp. sativum), and considerably more exists in 
the three other subspecies, P. sativum elatius, P. sati- 
vum abyssinicum, and P. sativum pumilo (P. sativum 
humile). All subspecies are interfertile, and perhaps as 
a result of considerable gene flow between subspecies 
in the Middle East, the delineation of the subspecies 
elatius, pumilo, and sativum is occasionally prob- 
lematic. 

The abundance of molecular markers has made the 
identification of tags for marker-assisted selection 
applications and the development of linkage groups 
relatively easy. A number of important genes for 
breeding have markers, and several linkage maps con- 
sisting almost entirely of molecular markers have been 
developed. A surprising outcome of this work is the 
failure of these more recent maps to confirm the clas- 
sical linkage map for the pea assembled in the 1960s. 
The classical map was primarily based on data from 
di- and trihybrid crosses, and apparently a number of
pseudolinkages were unknowingly incorporated into
this map. A highly saturated (average distance between
adjacent markers approximately 1 cM) ‘consensus
map’ is now available for the pea which suggests that 
four of the seven linkage groups portrayed in the 
classical map contained fragments from different 
chromosomes. Although early cytogenetic studies 
had indicated the presence of several translocations 
within the domesticated pea germplasm, there also 
appear to be a considerable number of ‘cryptic’ trans- 
locations that were unknown to those first performing 
linkage studies in this species. 

The relatively large genome of the pea (haploid
DNA content of about 5 pg, or 4.7 × 10^9 nucleotide
pairs) prevents it from being a particularly good model
organism for gene isolation or genome walking experi-
ments. Early studies demonstrated that approximately
35% of the pea genome consisted of single-copy
sequences. The rest consisted of repetitive DNA, with
about 5% being highly repetitive. Such high percentages
of repetitive DNA have discouraged investigators
from using the pea to search for specific coding se- 
quences. In addition, transformation of the pea using 
Agrobacterium developed slowly. Only when cotyle- 
donary node transformation techniques became avail- 
able in the past few years did transformation of the pea 
become routine. Thus, the pea has not been commonly 
used for some of the most recent exploits in plant bio- 
technology. However, the wealth of genetic inform- 
ation available for the pea coupled with the interesting 
developmental mutants already known suggests that it 
will maintain its popularity as an experimental organ- 
ism as the explanation of physiological and devel- 
opmental processes becomes the focus of genomics. 


See also: Leguminosae; Mendel’s Laws 
Development is a highly regulated process involving
growth, specialization of cells, and changes in form. 
In contrast to most animals, plants exhibit extensive 
development after embryo formation is complete. 
Postembryonic development is remarkably plastic 
and generates a wide variety of distinctive pheno- 
types in response to environmental factors such as 
nutrition, the quantity and quality of light, water, and 
temperature. Natural and induced genetic variation
has proved to be a powerful tool for studying pro- 
cesses underlying the development of seed plants. This 
review discusses the genetic basis for these processes. 


Basic Processes in Plant Development 


The bipolar body of plants is generated by apical 
meristems, which are self-renewing populations of 
embryonic cells. The shoot and root apical meristems 
and the basic tissue types (ground tissue, dermal tissue, 
and vascular tissue) are formed during embryogenesis 
in the developing seed. Embryonic as well as post- 
embryonic development starting with the germination 
of seeds involves the basic processes of differentiation, 
growth, morphogenesis, and determination. 


Differentiation and the Concept of Differential Gene Activity

The fertilized egg gives rise to a simple, relatively 
homogeneous mass of cells. As development pro- 
ceeds, plant cells become structurally and functionally 
specialized. This process, called differentiation, often 
involves modifications in the structure and chemical 
composition of cell walls; the accumulation by cells of 
specific proteins, lipids, and polysaccharides; and pro- 
duction of secondary metabolites such as pigments, 
defense-related substances, and fragrances. 

In the 1930s T.H. Morgan proposed that differentia- 
tion as well as the other basic developmental processes 
results primarily from changes in gene expression 
rather than from changes in the genetic constitution 
of cells. Detailed molecular studies with many organ- 
isms including seed plants strongly support this 
hypothesis. 


Growth and Morphogenesis 

With few exceptions there is no movement of plant
cells during development. The size, shape, and posi-
tion of cells, tissues, and organs are established by
highly regulated cell division and cell enlargement.
The relative contributions of cell division and en-
largement are quite variable. For example, growth in
meristematic regions is predominantly by cell div-
ision, whereas the formation of the phloem involves
few cell divisions and impressive increases in cell
volume, by several thousand-fold. The formation of
large, specialized cells is usually associated with
increases in ploidy resulting from endoreduplication
of chromosomes. Growth shows complex genetic regu-
lation. Although size is often a multigenic trait, some
single genes regulate the enlargement of specific cells
such as hair cells (trichomes) by affecting
endoreduplication. Other single-gene mutations are
highly pleiotropic, such as those causing deficiencies
in the hormone gibberellin, which result in dwarf
plants.

Certain specialized cells, for example, trichomes, 
gland cells, and the guard cells and subsidiary cells of 
stomata, are derived from a recognizable precursor, 
the idioblast. Idioblasts frequently exhibit unequal or 
polar division, i.e., they give rise to daughter cells 
conspicuously different in cytoplasmic content or 
size. Studies of mutants that perturb the plane of cell 
division and physiological experiments have shown 
that unequal division is often essential for the proper 
differentiation of the daughter cells. 

Plant tissues and organs exhibit an intrinsic polar-
ity. Independent of their orientation relative to grav-
ity, roots tend to regenerate at the morphological
basal end of isolated stem segments, while shoots
tend to regenerate at the apical end. The growth
hormone auxin has a key role in determining polarity.
Auxin is transported preferentially from the apical
toward the basal end of organs. This polar transport
depends on an auxin exit carrier localized at the basal
end of cells. Loss-of-function mutants of the
Arabidopsis PIN-FORMED 1 (PIN1) gene, which encodes
the carrier or one of its components, are unable to
transport auxin in a polar fashion and do not form
organs derived from the shoot meristem. These find-
ings, together with experiments in which auxin is
applied locally to shoots, suggest that polar transport
of auxin is important for the initiation and positioning
of organs.
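The link between carrier localization and polar transport can be illustrated with a deliberately simplified compartment model of a single file of cells. All parameters (number of cells, efflux fraction per step, number of steps) are hypothetical, and the model ignores auxin synthesis, degradation, uptake carriers, and diffusion through cell walls. With exit carriers only on the basal face, auxin that starts out uniformly distributed ends up concentrated in the basal-most cell; with carriers on both faces, the uniform distribution is left unchanged and no gradient forms.

```python
def auxin_profile(n_cells=10, steps=200, efflux=0.2, polar=True):
    """Toy compartment model of auxin movement along a file of cells.

    Index 0 is the apical cell and index n_cells - 1 the basal cell. Each step,
    a fraction `efflux` of each cell's auxin leaves through its exit carriers.
    polar=True: carriers only on the basal face, so auxin moves one cell basally.
    polar=False: carriers on both faces, so exported auxin is split between the
    apical and basal neighbours that exist.
    """
    auxin = [1.0] * n_cells  # start from a uniform distribution
    for _ in range(steps):
        new = auxin[:]
        for i, amount in enumerate(auxin):
            exported = efflux * amount
            if polar:
                targets = [i + 1] if i + 1 < n_cells else []
                share = exported
            else:
                targets = [j for j in (i - 1, i + 1) if 0 <= j < n_cells]
                share = exported / 2
            for j in targets:
                new[i] -= share
                new[j] += share
        auxin = new
    return auxin


polar = auxin_profile(polar=True)
apolar = auxin_profile(polar=False)
print("basal carriers only:", [round(a, 2) for a in polar])
print("carriers on both faces:", [round(a, 2) for a in apolar])
```

Because each cell can only pass auxin to its basal neighbor in the polar case, the basal cell, which has no neighbor to export to, acts as a sink; total auxin is conserved in both runs.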

A wide variety of single-locus mutations affect the
shape, number, and position of organs. Some muta-
tions act on specific organs. For example, alleles at a
single locus in a cucurbit were shown to determine
disk-shaped, spherical, and elongate gourds. The
semidominant Lanceolate (La) mutation in tomato
converts the compound leaf into a simple leaf, while
the dominant Petroselinum (Pts) mutation increases
the subdivision of the leaf. Other mutations act on
several different organs, such as those at the S locus
of tobacco, which alter the length-to-width ratio of
leaves and of different whorls of the flower.


Determination 
As plants develop, meristems and organ primordia 
become progressively committed to form specific 
and definite structures. This process, called determin- 
ation, results from stable changes in phenotype which 
persist in the absence of the agent that originally 
induced the change. As a consequence, parts of an
organism can ‘remember’ their past, and this permits
new structures to be formed progressively.

Genes concerned with organ identity and specifica-
tion are thought to have a key role in determination.
Of particular interest are homeotic mutations that
result in the duplication or substitution of floral
organs. The determination of inflorescences involves
at least two stable changes. First, a state of competence
for flowering is established in response to a variety of
internal and environmental factors. Several genes
important in the signaling pathways for perceiving
photoperiod and cold have been identified. Second,
competent shoot meristems are transformed into
inflorescence meristems. This transition and the sub-
sequent specification of floral organs are regulated in a
hierarchical fashion by genes encoding transcription
factors that activate downstream homeotic genes.
Studies of mosaics consisting of mixtures of mutant
and wild-type cells have shown that some of these
genes act cell autonomously, whereas others are in-
volved in cell-to-cell signaling within and between
tissue layers.


Epigenetic Regulation 


Stabilization of Developmental States 

In many cases, complete, fertile plants can be regener- 
ated from cultured tissues derived by cloning from 
individual differentiated cells. Although there are 
obvious exceptions such as cells that are enucleate or 
dead at maturity, these findings indicate that differ- 
entiated plant cells are totipotent, i.e., they have the 
potential to form the complete organism. Thus, in 
general, the development of plants is a reversible pro- 
cess that does not result from either the loss of genes or 
other permanent changes in the genome. This implies 
that determination results from epigenetic regulation, 
i.e., stable, but potentially reversible changes in gene 
expression. Analyses of visible patterns generated by 
spontaneous and induced somatic mutations have 
shown that there is no strict cell lineage in plants and 
that developmental compartments are not generally 
clonal in origin. This suggests that interactions 
between cells rather than cell lineage have a major 
role in plant development. Cell lineage studies have 
also shown that there is no fixed germline in plants. 
Thus, traits arising from stable somatic mutations 
resulting in genetic mosaics can be transmitted meiot- 
ically and segregate to give plants that are uniform in 
genotype. 

Forms of epigenetic regulation that are transmitted 
mitotically to daughter cells, known as epigenetic 
changes, have also been reported in plants. Well- 
documented examples include the formation of 
secretory-cell idioblasts in Ricinus, variation in the 
expression state of the Suppressor-mutator/Enhancer 
transposable element in maize, and clonal variation in 
the responses of tobacco cells to the growth hormone 
cytokinin. Epigenetic changes may also be the basis 
for the persistence of embryonic competence, juvenile 
and adult states of development, and some tissue-
specific forms of differentiation in proliferating 
cultures. 

Mechanisms proposed for epigenetic regulation 
include self-perpetuating patterns of cell division as 
described for the formation of roots of the water fern 
Azolla; biochemical switch models based on positive 
autoregulation of regulatory circuits; and the methyl-
ation of cytosines in DNA. Studies of the Knotted-1
mutation in maize have shown that the KNOTTED-1 
gene product, which is a transcription factor, can move 
from cell to cell via plasmodesmata. This suggests that 
epigenetic regulation at the tissue and organ level 
might be mediated by mobile transcription factors. 
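The biochemical switch idea can be illustrated with a toy model: a regulator that stimulates its own synthesis (positive autoregulation) can have two stable states. A transient inducer pushes the system from the low state to the high one, and it stays there after the inducer is withdrawn, a simple picture of a cell state that ‘remembers’ a past signal. The rate equation, parameter values (basal rate, Hill coefficient, decay rate) and pulse timing are hypothetical choices made for illustration, not measurements from any plant system.

```python
def production(x, inducer, basal=0.05, vmax=2.0, k=1.0, n=2):
    """Synthesis rate of a regulator that activates its own gene (Hill-type feedback)."""
    return basal + inducer + vmax * x**n / (k**n + x**n)


def simulate(pulse_start, pulse_end, inducer_level=1.0, t_end=60.0, dt=0.01, decay=1.0):
    """Euler integration of dx/dt = production(x, inducer) - decay * x, starting at x = 0.

    The external inducer is present only while pulse_start <= t < pulse_end.
    """
    x = 0.0
    for step in range(int(round(t_end / dt))):
        t = step * dt
        inducer = inducer_level if pulse_start <= t < pulse_end else 0.0
        x += dt * (production(x, inducer) - decay * x)
    return x


never_induced = simulate(pulse_start=0.0, pulse_end=0.0)  # empty pulse window
transiently_induced = simulate(pulse_start=10.0, pulse_end=20.0)
print(f"never induced:       x = {never_induced:.3f}")
print(f"transiently induced: x = {transiently_induced:.3f}")
```

Both runs end with no inducer present, yet the transiently induced one settles at a much higher level: the difference is carried only by the feedback loop, which is the essence of a self-perpetuating (epigenetic) state in this kind of model.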


Epimutation 

Certain epigenetic changes, called epigenetic muta- 
tions or epimutations, can be transmitted meiotically 
over several sexual generations. Examples include 
paramutation and presetting of transposable element 
activity in plants and some parent-of-origin effects 
such as genomic imprinting reported for mammals 
and plants. Epimutations can have striking develop-
mental effects. Plants carrying recessive epimutations
in the Arabidopsis SUPERMAN gene have flowers with
increased numbers of stamens and carpels. A naturally
occurring epimutation in the Lcyc gene of Linaria
vulgaris changes the fundamental symmetry of the
flower from bilateral to radial. In a number of cases,
including those cited above, measurements of the
methylation level of specific cytosines in the affected
genes and studies of crosses between the epimutants
and DNA-methylation-deficient mutants have shown
that epimutation can result from inhibition of tran-
scription by potentially reversible DNA methylation.


Homology-Dependent Gene Silencing 

A related epigenetic phenomenon is homology- 
dependent gene silencing (HDGS), which frequently 
occurs in transgenic plants. In this case, the interaction 
in trans of multiple copies of genes similar in sequence 
results in the inactivation of their expression. Two 
forms of HDGS have been described in plants. Tran- 
scriptional gene silencing results from a marked 
decrease in transcription due to hypermethylation of 
the genes involved and shows a high level of genetic 
transmission. In contrast, posttranscriptional gene 
silencing (PTGS), while sometimes associated with 
hypermethylation, results from sequence-specific 
degradation of RNA rather than the inhibition of 
transcription. PTGS is generally reset during meiosis,
shows pronounced developmental regulation, and
results in the production of sequence-specific silenc-
ing signals that can move from cell to cell or even over
long distances in the plant. PTGS in transgenic plants
shows strong mechanistic and genetic links to quelling
in Neurospora crassa and to RNA interference (RNAi)
described in nematodes, insects, and mammals, sug-
gesting that they share a common, highly conserved
molecular mechanism. Considerable evidence supports
the hypothesis that PTGS helps defend plants against
virus infection. It is not known whether PTGS also
plays a role in developmental regulation.


Concluding Remarks 


Studies of developmental mutants and genetic mosaics 
suggest that plant development involves hierarchical 
regulation of genes linked by complex signaling net- 
works. Genes important for organ specification, 
growth, and morphogenesis have been cloned and 
sequenced. Key steps in development are often regu- 
lated at the level of transcription. Transcriptional 
specificity depends on the interaction of elements 
within the gene with transcription factors and other 
proteins. Thus, cloned genes provide the starting point 
for establishing the sequence of causal events in regu- 
latory pathways. 

Plant developmental biology is entering an exciting 
phase of rapid progress. The genomes of Arabidopsis 
thaliana and rice have been sequenced. Genetic homo- 
logies in species as distant as yeast, Arabidopsis and man 
suggest that comparative studies will provide insight 
into highly conserved developmental mechanisms. 
Finally, measurements of RNA expression patterns 
using DNA-chip technologies and high-resolution 
separation and identification of proteins will offer the 
opportunity to get a global picture of gene expression 
associated with a particular developmental event. 
Plant embryogenesis begins with fertilization. In the
plant life cycle, this marks the beginning of the diploid
stage, as the haploid egg cell and sperm come together
to form the zygote. Because plant cells do not move in
relation to each other, development of the plant 
embryo is entirely dependent on regulated cell div- 
ision and expansion. This is in contrast to animal 
embryos which go through a major reorganization 
marked by massive cell movement. Another striking 
difference between plant and animal embryogenesis is 
that the mature animal embryo contains most of the 
organs and features of the adult, whereas there is little 
in a mature plant embryo that is predictive of the adult
plant structure. This is because very few organs are 
elaborated during plant embryogenesis. Instead, plant 
embryos form two stem cell populations. At the top 
of the embryo, the shoot apical meristem contains a 
population of cells that will form the postembryonic 
leaves and stem. At the bottom of the embryo, the root 
apical meristem will form the postembryonic root. A 
third difference between plant and animal embryo- 
genesis is that plant embryos go through a stage of 
dormancy and desiccation. Most seeds contain desic- 
cated plant embryos surrounded by stored nutrients 
inside a resistant seed coat. To determine the genetic 
basis for development of the embryo, genetic screens 
have been performed, primarily in the plant model 
species, Arabidopsis thaliana. 


Arabidopsis, a Plant Model System 


Arabidopsis, whose common name is thale cress, is a 
small plant of the mustard family. It has a fairly short 
life cycle (6 weeks from seed to seed), produces 
abundant progeny (up to 10000 seeds per plant), is 
diploid, and has a small genome of 130 Mb which has 
been completely sequenced. It normally self-fertilizes, 
but genetic crosses are straightforward to perform. 

Embryogenesis in Arabidopsis begins with an 
asymmetric division of the zygote to give a smaller 
apical cell and a larger basal cell. Further division of 
the apical cell will generate nearly all of the mature 
embryo. The basal cell divides to form the suspensor, 
which functions in nutrient uptake. The uppermost 
derivative of the basal cell will contribute to part of the 
root meristem and root cap. 


Anatomical analysis of Arabidopsis embryogenesis 
has defined a series of stages. During the globular 
stage, the three primordial tissues (protoderm,
ground, and procambium) are formed through
periclinal (longitudinal) divisions. Anticlinal (transverse)
divisions at this stage also separate an upper region 
which will form the shoot apical meristem and the 
cotyledons (embryonic leaf-like structures) from the 
lower region which forms the hypocotyl (the embry- 
onic stem), the embryonic root and the root apical 
meristem. During the triangle and heart stages, cell 
division and polarized cell expansion at two points at 
the top of the embryo begin the formation of the 
cotyledons. The later stages of embryogenesis see 
growth of the cotyledons as well as formation of the 
vascular tissue from the procambium in the center of 
the embryo and division of the ground tissue to form 
endodermis and cortex. 


Genetic Screens for Embryonic Mutants 


Screens for mutations that affect embryonic development
usually begin with mutagenesis of seeds. Mutagenic
agents can be chemicals, such as ethyl methanesulfonate
(EMS), or ionizing radiation. It is also possible to
produce insertional mutations. This can be done
through introduction of foreign DNA, using the ability
of the soil bacterium Agrobacterium tumefaciens to
transfer part of its genetic material into plant cells, or
by mobilization of transposable elements. After muta- 
genesis, seeds are germinated and the plants are 
allowed to self-fertilize. The progeny can then be ana- 
lyzed for mutations that affect embryogenesis. 
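Because mutations that disrupt embryogenesis are typically recessive, they are carried by heterozygous plants and show up only among the progeny of a self-fertilized plant. The sketch below, assuming a single recessive mutation and ignoring the sectoring of mutagenized first-generation plants, enumerates a self-cross to recover the expected 1:2:1 genotype ratio, so that roughly one quarter of the embryos are homozygous mutant. The allele labels `"+"` and `"m"` and the function names are hypothetical.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def self_cross(parent: str) -> dict[str, Fraction]:
    """Expected genotype frequencies from self-fertilizing a diploid parent.

    `parent` is a two-character genotype such as "+m", where "+" is the
    wild-type allele and "m" a recessive mutant allele (illustrative notation).
    """
    gametes = list(parent)  # each allele is transmitted with probability 1/2
    # Sort each allele pair so that "+m" and "m+" count as the same genotype.
    counts = Counter("".join(sorted(pair)) for pair in product(gametes, gametes))
    total = sum(counts.values())
    return {genotype: Fraction(n, total) for genotype, n in counts.items()}


def fraction_mutant(parent: str, mutant_allele: str = "m") -> Fraction:
    """Fraction of progeny homozygous for the recessive mutant allele."""
    return self_cross(parent).get(mutant_allele * 2, Fraction(0))


print(self_cross("+m"))
print(fraction_mutant("+m"))  # prints 1/4
```

In a screen based on aborted embryos, this is why a heterozygous plant is expected to carry about one defective seed in four within its seed pods, while its wild-type-looking siblings include carriers.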

Early genetic screens for defective embryogenesis 
in Arabidopsis were performed by collecting the seed 
pods from individual plants and determining if there 
were aborted embryos. This effort led to the identifi- 
cation of a large number of embryo lethal (emb) muta- 
tions. A drawback of screening for aborted embryos 
was revealed when one of the emb mutants was shown 
to have a defect in biotin biosynthesis. This indicated 
that mutations in genes that play a role in general 
cellular functions could result in embryonic lethal 
phenotypes. 

In an attempt to enrich for mutations that affect 
regulatory genes controlling development, Jürgens
and coworkers (1991) designed a genetic screen
modeled on screens for early-acting genes in Drosophila.
Instead of looking for aborted embryos, they allowed 
the seeds to germinate and then looked for seedlings 
with altered body plans. From a screen of 44000 
individual mutagenized plants they obtained a large 
number of alleles which they divided into three 
classes, those affecting the apical/basal axis, the radial 
axis, and overall size. 
Mutations in Apical–Basal Axis Formation


Along the apical–basal (top-to-bottom) axis of the
embryo, different morphological features can be clas- 
sified as pattern elements. At the top of the embryo the 
cotyledons and shoot apical meristem can be consid- 
ered to be the apical elements. The central element is 
the hypocotyl, and basal elements are the embryonic 
root and root apical meristem. Jürgens’ screen identified
mutants that appeared to have deleted one or
more of these pattern elements: gurke deleted the 
apical elements, fackel deleted the central element, 
monopteros deleted the basal elements. There was 
also a mutant, gnom, which was described as being a 
deletion of both apical and basal elements. 

Further analysis of these mutants has led to a re- 
examination of this classification scheme. The GNOM 
gene was cloned and found to encode a protein with 
similarity to a yeast guanine nucleotide exchange
factor for G proteins involved in vesicle formation. 
The set of cells that express this gene suggested a more 
general function than regulation of embryonic pat- 
terning. Recent work indicates that the cause of the 
embryonic pattern defects may be that GNOM is 
required for the correct localization of proteins that 
transport the plant hormone auxin. 

Similarly the MONOPTEROS gene was shown 
to encode a transcription factor (a protein that regu- 
lates gene expression) that probably controls auxin- 
responsive genes. Moreover, the primary defect in 
monopteros mutants appears to be an inability to 
correctly form vascular tissue, which is thought to be 
dependent on proper auxin transport. Further evi- 
dence for a role for auxin in embryonic patterning 
comes from two other mutants, auxin-resistant 6 
(axr6) and bodenlos (bdl), whose heterozygotes show 
resistance to the effects of auxin. Homozygous bdl and 
axr6 both exhibit defects in embryonic root formation. 

A screen of seedlings for defective shoot meristem 
formation performed by Barton and Poethig (1993)
identified the shoot meristemless (stm) mutant. stm
embryos do not form a shoot meristem even though 
all of the rest of the embryo appears normal. The STM 
gene encodes a transcription factor of the homeo- 
domain class. STM RNA is first observed in the cells 
between the cotyledon primordia at the triangle stage 
of embryogenesis. This expression pattern provided 
the first evidence that these cells are the precursors to 
the shoot apical meristem. 

The zwille/pinhead mutant is defective in mainten- 
ance of the shoot meristem. This results in premature 
differentiation and loss of the renewal capacity of the 
shoot meristem. The protein encoded by the ZWILLE 
gene has homology to the Drosophila PIWI protein.
It is interesting to note that both are involved in
stem cell maintenance and appear to act non-cell- 
autonomously, that is, on cells other than the ones in 
which they are expressed. The wuschel mutant is
also defective in maintenance of the shoot apical 
meristem. The affected gene encodes a member of 
the homeodomain transcription factor family. It is 
expressed as early as the globular stage embryo in 
a domain below the cells that will form the shoot 
meristem, indicating that it also acts non-cell- 
autonomously. 

A somewhat analogous situation arises in the root 
in the hobbit (hb) mutant. In hb embryos the region 
near the root tip called the quiescent center does not 
appear to be able to perform its role of maintaining the 
surrounding root meristem cells in an undifferentiated 
state. This leads to premature differentiation of the 
cells that would normally be the progenitors for the 
primary root. The molecular nature of the protein 
encoded by the HOBBIT gene has not yet been re- 
ported. 


Mutations in Radial Axis Formation 


Along the radial axis (from outside to inside) of the 
Arabidopsis embryo, pattern elements have been 
defined as the epidermis on the outside, the two 
ground tissue layers, cortex and endodermis, and the 
pericycle and vascular tissue in the center. The Jürgens
screen identified two mutants, keule and knolle, which 
were classified as having defects in the formation of 
their radial axis. Both appeared to affect primarily the 
epidermis. Subsequent analysis of knolle revealed that 
the affected gene encodes a protein related to syntaxins,
which function in the docking and fusion of vesicles with
target membranes. Moreover, the primary defect appears to
be in the completion of cell division, as knolle embryos
have incomplete cell boundaries.

Screens for abnormal root development identified 
mutations that alter the radial pattern in embryos as 
well as in adult roots and shoots. In short-root (shr) 
mutants no endodermis is made, while in scarecrow 
(scr) mutants there is only one ground tissue layer in 
the embryonic root instead of the normal endodermis 
and cortex. In the root, this mutant layer has charac- 
teristics of both cortex and endodermis indicating that 
the wild-type SCR gene product is required primarily 
for the division that forms these two tissues. Both 
SHR and SCR encode putative transcription factors 
of the plant-specific GRAS family. SCR RNA is ex- 
pressed in the embryonic ground tissue prior to each 
periclinal division and then after the division it is ex- 
pressed only in the internal daughter cell. SHR RNA 
is expressed in the developing vasculature of the
embryo, indicating that its effect on patterning of the
ground tissue is through a non-cell-autonomous mode of
action. Genetic and molecular data indicate that SHR
is required for the transcriptional activation of SCR. 


Other Mutations Affecting Embryogenesis


In addition to pattern formation, other aspects of 
embryogenesis are being addressed through genetic 
analyses. The leafy cotyledon (lec) mutation results in 
embryos that have seedling characteristics including, 
as its name indicates, cotyledons which have morpho- 
logical features of postembryonic leaves. The LEC 
gene encodes a transcription factor of the CCAAT- 
box-binding family. A gain-of-function phenotype 
was obtained by expressing the gene outside of the 
cells in which it is normally expressed. This resulted in 
the formation of embryonic structures on leaves, con- 
sistent with the hypothesis that LEC’s normal role is 
to maintain the embryonic state. 

In fass mutant embryos, all the pattern elements 
are in the right places, but there are extra ground and 
procambium cell layers. The FASS gene appears to be 
necessary for the correct placement of the planes 
of cell division. This indicates that patterning and 
morphogenesis are independent processes in plant 
embryogenesis. 

Alterations in cell shape and number also led to the 
identification of the medea mutant. medea has been 
shown to be a maternal-effect mutation: embryos
formed from female gametophytes harboring a
medea allele are mutant regardless of the paternal
allele. The protein encoded by MEDEA is a member
of the Polycomb group, whose genes encode chromatin
remodeling factors.
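Because medea acts through the female gametophyte, its inheritance departs from ordinary Mendelian expectations. The sketch below assumes a single locus, equal transmission of either maternal allele, and the rule stated above that the embryo phenotype depends only on the allele carried by the female gametophyte. It enumerates crosses to show that a heterozygous medea mother yields half mutant embryos whatever the pollen parent carries, while a medea allele transmitted only through pollen has no effect. The allele labels `"med"` and `"+"` and the function names are hypothetical.

```python
from fractions import Fraction
from itertools import product


def embryo_is_mutant(maternal_allele: str, paternal_allele: str) -> bool:
    """Maternal-effect rule: the embryo is mutant whenever the female
    gametophyte carries a medea allele ("med"), whatever the paternal
    allele. The paternal argument is deliberately ignored."""
    return maternal_allele == "med"


def mutant_fraction(mother: tuple[str, str], father: tuple[str, str]) -> Fraction:
    """Fraction of mutant embryos from a cross, assuming each parent transmits
    either of its two alleles with probability 1/2."""
    combos = list(product(mother, father))
    mutant = sum(embryo_is_mutant(m, p) for m, p in combos)
    return Fraction(mutant, len(combos))


het = ("med", "+")
wild = ("+", "+")
print(mutant_fraction(het, wild))  # heterozygous mother x wild-type father
print(mutant_fraction(het, het))   # a paternal medea allele does not change it
print(mutant_fraction(wild, het))  # medea transmitted only through pollen
```

The contrast between the first and last crosses (reciprocal crosses) is the diagnostic signature of a maternal effect.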


Future Prospects 


Screens for embryonic mutations have been carried 
out in other plants, but because molecular analyses 
are more challenging in these species, very few of the 
affected genes are known. As techniques and genomic 
resources improve it will be very interesting to 
compare the genes affected by embryonic mutations 
that have similar phenotypes in different species. 
Because the edible portion of many crops is the plant 
embryo and its surrounding tissue, understanding the 
genetic pathways controlling embryogenesis could 
have important agronomic applications. 
The Rhizosphere


In higher plants, the root zone is the site where inten- 
sive interactions take place between the plant, soil, and 
soil microorganisms. Plants take up most of their
nutrient and water requirements through their roots,
and they also release from the roots a large number of
low-molecular-weight, water-soluble exudates such as
amino acids, hormones, organic acids, sugars, and
vitamins. Plant species and age, soil nutrients,
temperature, and plant–microbe interactions are among
the many factors influencing the nature and relative
importance of the different root exudates. At the beginning
of the twentieth century, the term ‘rhizosphere’ was
proposed to designate the soil near roots that is under the
influence of the root, as indicated by enhanced microbial
activity. In addition to root exudates, the rhizosphere
is in contact with other plant-derived products
of high or low molecular weight, which contribute to the
richness of this ecological niche in nutrients. The
root–soil zone is very complex and has given rise to many
definitions for the different ecological niches involved.
The plant root surface, or rhizoplane, can be studied
as an individual niche but is frequently included
with the rhizosphere. Endophytes are microorganisms
colonizing the interior of plant tissues, including
roots.


Rhizosphere Effects on Soil Microorganisms


Many factors influence the importance and activity 
of soil microorganisms. However, the availability of
nutrients (mainly carbon) and water are two important
limiting factors. As indicated in the previous
section, the rhizosphere (R) is a niche containing
substantially more nutrients than the nonrhizosphere (S)
soil. The R/S ratio of the numbers of microorganisms
is used to illustrate how the rhizosphere affects
different groups of microorganisms. In general, total
microbial counts are 10- to 50-fold higher in the
rhizosphere. However, R/S values ranging from 100 to
more than 1000 can be observed for some specific
groups of soil microorganisms (such as ammonifiers
or denitrifiers). Bacteria, including actinomycetes,
are the most abundant group of microorganisms
present in the rhizosphere. Fungi and
protozoa are also present in the rhizosphere. In add- 
ition to soil, rhizosphere microorganisms may arise 
from seedborne populations, which survive storage 
and germination. 
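The R/S ratio is simply the number of a given group of microorganisms in rhizosphere soil divided by its number in nonrhizosphere soil, with both counts expressed on the same basis (here assumed to be cells per gram of soil). The sketch below uses made-up counts chosen only to fall within the ranges quoted above; the function name and the figures are hypothetical, not measured data.

```python
def rs_ratio(rhizosphere_count: float, bulk_soil_count: float) -> float:
    """Return the R/S ratio: microbial numbers in rhizosphere soil (R)
    divided by numbers in nonrhizosphere soil (S), both per gram of soil."""
    if bulk_soil_count <= 0:
        raise ValueError("nonrhizosphere count must be positive")
    return rhizosphere_count / bulk_soil_count


# Hypothetical counts (cells per gram of soil), for illustration only.
total_microbes = rs_ratio(1.2e9, 4.0e7)  # 30-fold, within the 10 to 50 range
denitrifiers = rs_ratio(5.0e7, 1.0e5)    # 500-fold, a specialized group

print(f"total R/S = {total_microbes:.0f}")
print(f"denitrifier R/S = {denitrifiers:.0f}")
```

Because the ratio is a quotient of two counts, it highlights groups that respond strongly to root exudates even when their absolute numbers remain smaller than those of the total microflora.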


Effect of Rhizosphere Microorganisms on Plants


Fungal or bacterial plant pathogens can be present in 
the rhizosphere and cause plant disease. These patho- 
gens obtain their nutrition by living on or in the plant 
host. The many other microorganisms present in
the rhizosphere can have a neutral, deleterious,
or beneficial effect on plant growth.
Deleterious rhizosphere microorganisms are not parasitic,
but they limit plant growth by altering water or
mineral ion uptake or the activity of plant growth
substances. Apart from the well-known symbioses
formed between plants and soil microorganisms such as
the mycorrhizal fungi, the Frankia actinomycetes, and
Rhizobium bacteria, some rhizosphere
microorganisms can promote plant growth by different
mechanisms, summarized below for rhizobacteria.
Trichoderma species are the most common potentially
beneficial nonsymbiotic saprophytic fungi found in
the rhizosphere. This article discusses only
nonsymbiotic plant growth-promoting rhizosphere
bacteria.


Plant Growth-Promoting Rhizobacteria 


Rhizobacteria are bacteria that aggressively colonize
plant roots; that is, they can multiply and
occupy the ecological niches found on plant roots at
most stages of plant growth. Some rhizobacteria are
endophytes. Plant growth-promoting rhizobacteria
(PGPR) are a very small portion of rhizobacteria
(2–5%) that promote plant growth. The term PGPR was
coined in 1978 by Kloepper and Schroth and used
to designate rhizobacteria showing significant
plant growth promotion, as shown by the substantial
increases in fresh matter yield obtained with
inoculated radishes. PGPR rarely occur naturally in
high numbers, although they may in some suppressive soils.
In general, PGPR are applied at high population densities
by inoculating seeds or vegetatively propagated plant
parts at planting time.
Species of PGPR

Historically, most PGPR have been found among the
fluorescent pseudomonads. However, nonfluorescent
pseudomonads, Burkholderia spp., other Gram-negative
bacteria such as Serratia and Achromobacter, and the
Gram-positive Arthrobacter include strains that are PGPR.
Recent reports also indicate that some strains of Rhizobium,
the nitrogen-fixing symbiotic partner of
legumes, can behave like other PGPR with nonleguminous
plants such as corn, lettuce, and radishes. A natural
endophytic association between PGPR Rhizobium
and rice was recently observed in the Nile delta region,
where rice has been grown since antiquity in rotation with
the Egyptian berseem clover. Azospirillum spp. are
nitrogen-fixing rhizobacteria that form associative
symbioses with plants. Other nitrogen-fixing bacteria,
such as Azotobacter spp., can also colonize the rhizosphere.
However, as with Rhizobium on nonlegumes, the beneficial
effect of these diazotrophs on plant growth does
not result from their ability to fix atmospheric nitrogen,
but rather from their other PGPR attributes. Among
Gram-positive bacteria reported to include PGPR,
spore-forming Bacillus species are the most important. A
practical advantage of sporulation is the ease of developing
formulations of PGPR that retain viability
when dried; hence, most commercial PGPR products
are based on strains of Bacillus and related genera.


Plant Growth Promotion 

The beneficial effects of PGPR result from improvement
of plant growth and health and can be evidenced
by increases in seedling emergence, vigor, root system
development, and yield. PGPR use one or more
mechanisms of action to promote plant
growth. These mechanisms either have a direct effect
on plant growth, such as the improvement of plant
nutrition, or an indirect effect, such as the
enhancement of plant health by eliminating pathogens,
inducing plant defense responses, or eliminating
contaminants from the rhizosphere. In strains that use
more than one mechanism of action, these may be active
simultaneously or sequentially, although this has not yet
been clearly elucidated. Early studies with PGPR
were performed with root crops such as potato and
sugar beet. PGPR are now studied with many
crop plants, horticultural crops, and cultivated trees.


Mechanisms of Growth Promotion by PGPR 


Biological control 

Many strains of PGPR also exhibit biological control
of major plant pathogens. Antibiosis, competition, and
production of siderophores, cyanide, and lytic enzymes
are mechanisms by which PGPR exert biological
control.


The involvement of bacterially produced antifungal
antibiotics in the biological control action of PGPR
was illustrated in many host–pathogen systems by
the isolation of antibiotic-negative bacterial mutants,
which are less suppressive. Phenazine-deficient (phz−)
Tn5 mutants of Pseudomonas fluorescens 2-79, a strain isolated
from wheat grown in take-all-suppressive soil in the
Pacific Northwest of the United States, failed to inhibit
Gaeumannomyces graminis var. tritici, the causal agent
of take-all, in vitro. The mutants were also
significantly less suppressive than the parental strain.
Strain CHA0 of P. fluorescens produces several
antifungal antibiotics, such as 2,4-diacetylphloroglucinol,
which is involved in the suppression of black root
rot of tobacco and take-all of wheat, and pyoluteorin,
which suppresses Pythium-induced disease of cress.
Recently, strong correlations were observed between 
the ability of fluorescent pseudomonads to suppress 
pea seed infection by Pythium ultimum and the
production of hydrogen cyanide (HCN) or accumulation of
the C17:0 cyclopropane fatty acid (17CFA). There is
no evidence that 17CFA has antifungal activity, but,
like HCN, this fatty acid is synthesized under the
control of the stationary-phase sigma factor RpoS
during the stationary phase of the growth cycle. In strains
accumulating more 17CFA, other RpoS-regulated
genes may also be similarly expressed, including those
for the synthesis of secondary metabolites with antifungal
properties. 17CFA may therefore be a useful marker
for selecting fluorescent pseudomonads for use as
biological control agents. An antibiotic-producing strain of
Bacillus subtilis has been shown to produce potent antifungal
volatiles active against a range of fungal species.

Under iron-limited conditions, some PGPR produce
siderophores, which have a very high affinity for
ferric iron. Pseudobactins are an example of siderophores
produced by fluorescent Pseudomonas strains.
By binding Fe(III), siderophores sequester this
essential element, inhibiting the growth of deleterious
and soilborne pathogenic bacteria or fungi that are unable to use
this iron complex. Effective PGPR produce
siderophore-specific membrane receptor proteins, which
allow them to take up the chelated iron. In addition to its own
siderophore, a strain of PGPR can use siderophores produced
by other rhizobacteria. Therefore, siderophore-negative
mutants can still colonize the rhizosphere in a
fashion similar to their wild types. Under laboratory
conditions, biological control activity mediated by
siderophores is nullified by the addition of soluble iron.
A comparable effect can occur in the field at acid soil
pH (below 6), because iron becomes more available.
Siderophore-producing PGPR are more competitive
than other rhizosphere organisms. Apart from the
iron effect, bacteria in general multiply and catabolize
nutrients more rapidly than other organisms. This
high competitive ability allows them to colonize
the rhizosphere rapidly, preventing the establishment of, or
significantly reducing the numbers of, other deleterious
or pathogenic microorganisms.

The biological control activity of some PGPR 
against fungi is associated with bacterial production 
of lytic enzymes such as chitinases, β-1,3-glucanases, or
proteases. This mechanism, used successfully in the
biological control of Pythium and Fusarium spp., is
found in Bacillus, Serratia, and Streptomyces species
and is referred to as parasitism.

Plants sensitized by biotic or abiotic agents
respond more rapidly to infection and gain increased
protection against virulent pathogens. PGPR can
indirectly promote plant health by induced systemic
resistance (ISR). In the absence of pathogens, the
presence of PGPR induces plant defense mechanisms
against organisms causing foliar diseases. ISR involves
plant structural changes, such as the formation of new
barriers, and increased activity of lytic enzymes or
production of fungitoxic compounds called phytoalexins.
PGPR-mediated ISR has been implicated in reducing
the incidence of diseases caused by bacteria,
fungi, insects, nematodes, and viruses. Bacterial
lipopolysaccharides (LPS) isolated from a Pseudomonas sp.
suppressed Fusarium wilt of carnation and increased
phytoalexin accumulation in the plant. The LPS of
Rhizobium etli induced systemic resistance in potato
roots against the cyst nematode Globodera pallida.
Salicylic acid (2-hydroxybenzoic acid) is another 
substance that induces local or systemic defense 
responses in plants. However, ISR-negative (ISR−) mutants
of Serratia marcescens still produce salicylic acid, so
salicylic acid production alone cannot account for ISR by
this strain. In the Rhizobium–legume symbiosis, upon
activation by plant flavonoids, the nod genes are induced
and the bacteria produce lipochito-oligosaccharides (LCOs),
called Nod factors, that are involved in nodule formation.
Nod factors can act as elicitors of phytoalexin biosynthesis,
and they may play a role in the inhibition of salicylic
acid-mediated defense in legumes. Alfalfa plants
inoculated with their symbiotic partner Sinorhizobium
meliloti accumulate salicylic acid, which inhibits
nodulation when applied exogenously.
Nod factors suppress salicylic acid accumulation.


Direct plant growth promotion 

Phytohormone production is one of the mechanisms 
by which PGPR can promote plant growth. Inoculation
of maize with indole-3-acetic acid (IAA)-producing
Pseudomonas and Acinetobacter strains
exerted beneficial effects on root elongation and
lateral root production. Inoculation of young maize
plants with these PGPR had an effect comparable to
the application of IAA: higher root dry matter
production and higher concentrations of Ca, K, Mg, P, Fe,
and Zn in the roots. IAA- and cytokinin-producing
strains of Rhizobium leguminosarum significantly
promoted early seedling root growth of the
nonlegumes canola and lettuce. Auxotrophic mutants
requiring tryptophan or adenosine (precursors of
IAA and cytokinins, respectively) did not promote growth
to the extent of the parent strains. Inoculation of maize with
Azospirillum brasilense and of rice with A. lipoferum
enhanced the uptake of phosphate, nitrate, and
ammonium. Ion transport in oilseed
rape is also probably stimulated by inoculation with
an Achromobacter sp. PGPR isolate.


Other plant growth promotion mechanisms 
Coinoculation of PGPR (Pseudomonas and Serratia
spp.) with rhizobia or bradyrhizobia has been shown
to increase nodulation and nitrogen fixation of pea,
lentil, bean, and soybean. Mycorrhizae are symbiotic
associations between specific fungi and the fine roots
of higher plants. Around the fungal partner, a unique
rhizosphere microbial community called the
mycorrhizosphere is formed. Rhizobia and pseudomonads
adhere to the spores and hyphae of the arbuscular
mycorrhizal (AM) fungus Gigaspora margarita. This
suggests that AM fungi are probable vehicles for the
colonization of roots by PGPR. A Paenibacillus sp.
strain, a Gram-positive bacterium isolated from the
mycorrhizosphere of sorghum plants inoculated with Glomus
mosseae, is antagonistic towards the soilborne
oomycete pathogen Phytophthora parasitica
and stimulates mycorrhization. The acetylated Nod
factor produced by rhizobia stimulated colonization of
nodulating and nonnodulating soybeans by G. mosseae,
whereas the sulfated Nod factor was ineffective. The
stimulatory effect of the acetylated Nod factor was
related to its ability to stimulate flavonoid secretion
by soybean. In fact, the three flavonoids apigenin,
coumestrol, and daidzein significantly stimulated
mycorrhizal colonization of soybean when added in
the absence of bacteria. A putative phosphate (P) transporter
operon was found in the genome of a Burkholderia
sp. strain living inside the AM fungus G. margarita.

Some PGPR can solubilize rock phosphate and the
various poorly soluble inorganic forms of phosphorus (P)
in soil by producing organic acids and releasing protons
(H⁺ ions). Phosphatase enzymes produced by PGPR
also mineralize organic P. By increasing
the concentration of soluble phosphate in soil,
phosphate-solubilizing PGPR can improve plant 
phosphate nutrition and growth. With phosphate- 
solubilizing PGPR, the increase in phosphate 
availability is probably not the sole plant growth pro- 
moting mechanism involved. 

Colonization of barley roots with a 2,4-dichlorophenoxyacetic
acid (2,4-D)-degrading strain of
Burkholderia cepacia protected the plants in soil
contaminated with inhibitory levels of the herbicide,
by degrading it in the rhizosphere. Inoculation of tomato,
canola, and mustard seeds with siderophore-producing PGPR
strains of Kluyvera ascorbata protected the
plants against the inhibitory effects of high
concentrations of nickel, lead, and zinc.


Future Prospects 


PGPR offer promise for use as components in inte- 
grated pest management schemes within sustainable 
agriculture. Applications of PGPR can increase yields
while significantly decreasing the amounts of pesticides
and chemical fertilizers used. PGPR also have excellent
potential for use in rhizosphere bioremediation
systems to degrade xenobiotic compounds or to help
remediate metal and other contaminants. Interesting approaches
remain to be explored, such as the development
of PGPR strains that are deleterious to weeds but
beneficial to crops, and coinoculation of PGPR with
rhizobia that exert their beneficial effects on legumes
and on the companion or following cereal crop in a
rotation system. Because of the complexity of the
soil–plant ecosystem, PGPR responses in field trials
can vary greatly from one region to another. However,
this variability should narrow as we better
understand the genetics of root colonization
and the molecular basis of plant–microbe signaling.
Promoter-trapping technology (in vivo expression
technology, IVET) is a very promising approach
that allows the identification of genes showing
elevated levels of expression in the rhizosphere. This
technology was first developed to isolate plant-induced
genes from Xanthomonas campestris, and
subsequently IVET and similar technologies were
used to study animal pathogenesis. Recently, IVET
allowed the identification of genes expressed by
Pseudomonas putida during colonization of the plant-pathogenic
oomycete Phytophthora parasitica. It is currently
being used to identify rhizobial genes expressed in
the rhizosphere of nonlegumes.
The plant hormones are a group of small, unrelated
molecules that have profound effects on diverse 
aspects of plant development and physiology. For 
many years there were only five known plant
hormones: the auxins, the cytokinins, the gibberellins,
abscisic acid, and ethylene. More recently, jasmonic
acid, salicylic acid, and brassinolide have been added 
to the list. All of these compounds are the products 
of secondary metabolism and each one is active at 
extremely low concentrations. 

In animals, hormones typically act at a distance from
their site of synthesis. This is also true for some plant 
hormones. For example auxin moves through plant 
tissues via a specialized polar transport mechanism. 
This process involves movement of the hormone 
through files of cells by successive cellular influx and 
efflux events. However, other plant hormones may be 
active at both the sites of synthesis and at distant sites. 

In most cases, hormone biosynthesis and mechan- 
ism of action are still poorly understood. However, 
recent genetic studies using the model plant Arabidop- 
sis thaliana have resulted in a number of exciting 
advances in this field. Mutants that are affected in 
hormone biosynthesis and response have been used 
to define many of the genes involved in these pro- 
cesses. This avenue of research is likely to rapidly 
increase our understanding of plant hormone biology. 

The most abundant naturally occurring auxin is 
indole-3-acetic acid (IAA). This hormone is required 
for growth of plant cells in culture and is involved in 
diverse aspects of plant growth and development 
ranging from embryogenesis to floral development. 
Auxin exerts these effects by regulating both cell 
elongation and cell division. The mechanism(s) of 
auxin action is still poorly understood and is therefore 
a very active area of investigation. Genetic studies 
have revealed that auxin action requires the activity 
of a protein degradation pathway called the
ubiquitin–proteasome pathway. It seems likely that auxin
response requires the degradation of one or more
protein repressors of the response. For many years
synthetic auxins such as 2,4-dichlorophenoxyacetic 
acid (2,4-D) have been used as herbicides. 

The cytokinins are also very important regulators 
of plant development. These molecules are purine 
derivatives and were originally defined based on 
their ability to stimulate plant cell division in culture. 
Later studies indicated that they also promote chloro- 
plast development and delay leaf senescence. At pres- 
ent, almost nothing is known of the molecular mode 
of cytokinin action. 

Ethylene is one of the few gaseous regulators 
known in nature. Active at exceedingly low concen- 
trations, ethylene regulates cell growth in a number of 
contexts. This hormone is particularly important for 
fruit ripening in many plants. For example, ethylene 
is required for ripening of so-called climacteric fruit 
such as tomato and banana. The ability to control 
the ripening process through regulation of ethylene 
synthesis and response has been one of the major goals 
of the food biotechnology industry. Largely through
genetic studies, ethylene receptors and signaling
pathways have become relatively well characterized.

Abscisic acid (ABA) is synthesized from mevalonic 
acid and has extremely important roles in water stress 
and establishment of dormancy during seed develop- 
ment. The hormone acts very rapidly to induce 
stomatal closure during conditions of water deficit. 
ABA action is just beginning to be understood. 
Genetic studies suggest a role for protein phosphoryl- 
ation in early stages of ABA signaling together with 
specific changes in gene expression. 

The gibberellins (GAs) are also synthesized from 
mevalonic acid. The GAs were originally identified as 
a product of the fungus Gibberella, a pathogen of rice. 
Rice plants that become infected with this fungus 
grow tall and spindly because of fungal production 
of GA. As evidenced by this behavior, as well as the 
phenotype of mutants deficient in GA biosynthesis, 
these hormones are important regulators of stem 
elongation. In addition, they are essential for germin- 
ation of many seeds and often have an important role 
in flowering and fruit development. Genetic studies in 
maize and Arabidopsis have led to a detailed
understanding of GA biosynthesis. Less is known about
their mode of action but, as for the other hormones,
genetic studies in Arabidopsis promise to shed new
light on this problem.

Jasmonic acid (JA) is synthesized from linolenic
acid. This hormone is a relative newcomer to the 
pantheon and less is known of its role in plant growth 
and development. Together with salicylic acid (see 
below), JA appears to be particularly important for
defense responses. JA is synthesized after wounding 
or some other insult and induces the synthesis of 
defense-related proteins including proteinase inhibi- 
tors that inhibit insect feeding. In Arabidopsis, JA 
action depends on regulated protein degradation in a 
manner similar to auxin. 

Salicylic acid (SA) is a derivative of phenylalanine 
and has an important role in plant defense responses. 
Infection results in an increase in SA levels, which in 
turn results in expression of a number of pathogenesis- 
related (PR) proteins. SA is also associated with a 
nonspecific persistent defense syndrome called sys- 
temic acquired resistance. Plants that are locally 
exposed to a pathogen develop a systemic and long- 
lasting resistance to a variety of pathogens. Because of 
the importance of plant disease processes to agricul- 
ture, both JA and SA have been the focus of intense 
investigation by the biotechnology industry. 

Brassinosteroids (BRs) are closely related to animal 
steroid hormones. This class of plant hormone is 
required for cell elongation and may have a special 
role in light regulation of plant growth. Genetic stud- 
ies have revealed an important difference between the 
action of plant and animal steroid hormones. In ani- 
mals, steroid hormones interact with cytoplasmic 
receptors. Hormone binding results in translocation 
of the receptor into the nucleus of the cell where it 
stimulates specific gene transcription. In contrast, the 
BR receptor appears to be a protein kinase located on
the cell surface. Arabidopsis mutants that lack this 
protein are unable to respond to BR. 

There are two important differences in the devel- 
opmental strategies of animals and plants. One is that 
in animals, development ceases at maturity whereas 
plants develop throughout their life. The second dif- 
ference relates to the importance of the environment 
in determining form. Animal morphology is by and 
large genetically determined. The environment may 
affect the overall size of the organism, but the number
and type of organs are invariant within a species.
In contrast, the environment constantly affects
plant form in dramatic ways. The size, number,
shape, and type of organs that develop change as
conditions change. One of the major collective
functions of the plant hormones is to facilitate this
developmental plasticity.


See also: Arabidopsis thaliana: Molecular 
Systematics and Evolution; Arabidopsis thaliana: 
The Premier Model Plant; Developmental 
Genetics; Root Development, Genetics of 
When a virulent phage multiplies on a bacterial lawn,
its growth becomes visible as a plaque.
A plaque is similar in some respects to a bacterial 
colony. Like a colony, it represents many multiplica- 
tions of a single bacteriophage, each generation clear- 
ing another ring of the original lawn. Because of 
this, plaques can be used in the same way as bacterial 
colonies to determine the titer or concentration of a 
sample. 
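Because each plaque is taken to arise from a single phage, determining a titer from plaque counts is simple arithmetic: divide the number of plaques by the volume plated and by the dilution. The Python sketch below shows that calculation in plaque-forming units (PFU) per mL. The function name, the averaging over replicate plates, and the example counts are hypothetical choices for illustration, not a specific published protocol.

```python
def titer_pfu_per_ml(plaque_counts: list[int], dilution: float, volume_ml: float) -> float:
    """Estimate phage titer in plaque-forming units (PFU) per mL of the original sample.

    plaque_counts: plaques counted on replicate plates of the same dilution
    dilution: fraction of the original sample present in the plated dilution (e.g. 1e-6)
    volume_ml: volume of the diluted sample plated on each lawn, in mL
    """
    if not plaque_counts:
        raise ValueError("need at least one plate count")
    if dilution <= 0 or volume_ml <= 0:
        raise ValueError("dilution and volume must be positive")
    mean_plaques = sum(plaque_counts) / len(plaque_counts)
    # Each plaque is assumed to arise from one infectious particle, so dividing
    # by (dilution * volume plated) scales the count back to 1 mL of the original sample.
    return mean_plaques / (dilution * volume_ml)


if __name__ == "__main__":
    # Hypothetical counts: three lawns, each plated with 0.1 mL of a 10^-6 dilution.
    titer = titer_pfu_per_ml([148, 152, 150], dilution=1e-6, volume_ml=0.1)
    print(f"{titer:.2e} PFU/mL")  # prints "1.50e+09 PFU/mL"
```

Averaging replicate plates reduces counting noise. The function assumes that every plate in the list received the same dilution and volume.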

In many kinds of phage, the size of the plaque is 
limited by the fact that the phage no longer produces a 
burst once the bacteria in the lawn reach stationary 
phase. Different strains of bacteriophage make differ- 
ent plaques, and so plaque morphology is one criter- 
ion that can be used to describe and characterize a 
bacteriophage. 


See also: Bacteriophages; Virulent Phage 
Plasmacytomagenesis is a process that develops over
the course of many cell divisions in the life span of a B 
lymphocyte clone. Genetic change plays an important 
role in this process. We are just beginning to appreci- 
ate the complexity of plasmacytomagenesis and to 
identify the critical genetic changes. An important
concept is that the plasma cell is the terminal stage of
differentiation and maturation of the B lymphocyte;
very likely, plasma cell tumor development begins
with an inherited susceptibility, followed by
mutational change beginning at the B lymphocyte
stage of development. There are four modes of
plasma cell tumor development, referred to here as
plasmacytomagenesis (PCTGEN): spontaneous,
induced, oncogene-driven accelerated, and spontaneous
in genetically altered mice. Most of the current
data have come from the induction of plasmacytomas 
(PCTs). Different model systems are beginning to pro- 
vide a general picture of how B lymphocytes become 
neoplastically transformed into plasma cell tumors. 


Many references to this system are found in reviews 
(see Further Reading).


Spontaneous PCTGEN 


PCTs are rarely encountered in intact normal inbred
mice; when they do occur, it is in old mice 1-2 years
of age. In C3H/He mice, PCTs appear to develop in
association with inflammatory tissues that underlie
mucosal ulcers. Plasma cell tumors also arise in old
C57BL/Ka mice, usually in the bone marrow.


Induced PCTGEN 


Induction does not involve the use of known chem- 
ical, physical, or biological (viral) tumor-inducing 
agents, but rather relies on methods for producing 
chronic peritoneal irritation and inflammation in 
genetically susceptible inbred strains of mice. 

PCTs are induced in BALB/cAn mice by the 
intraperitoneal (i.p.) introduction of materials which 
the phagocytic system is incapable of digesting or 
removing. These include paraffin oils (among them
chemically defined alkanes such as pristane,
2,6,10,14-tetramethylpentadecane), phytane, silicone
gels, and various solid plastic objects, such as those
made from polycarbonates. Pristane, the best-studied material, is
given in 0.2-0.5 ml doses three times on days 0, 60, and 
120. In the peritoneal cavity these materials evoke 
a chronic inflammatory process that develops in 
response to peritoneal irritation. Paraffin oils and 
small particulates derived from silicone gels are 
engaged by the phagocytic system, which results in 
an accumulation of a chronic inflammatory tissue that 
surrounds nonremovable material and builds up on 
peritoneal surfaces (the oil granuloma). This tissue is 
composed chiefly of macrophages, neutrophils, fibro- 
blasts, and blood vessels and is covered by mesothe- 
lium. Histologically, the PCTs develop in this tissue. 
There are qualitative differences in the plasmacytoma- 
genic properties of oil granuloma tissues. In contrast 
to paraffin oil granulomas, the extensive oil granuloma
tissue that forms in response to silicone liquids
(polydimethylsiloxanes) is not plasmacytomagenic. In
mice implanted with plastic objects, e.g., plastic disks
measuring 20 x 2 mm, the implants evoke a patchy
fibroplastic response on peritoneal surfaces. The PCTs induced by solid objects
appear to develop in peritoneal connective tissues, 
but in contrast to the oil granuloma there is much 
less reactive tissue and the relationship of pathologic 
inflammatory tissue to PCT origin is less clear. 

Effective PCT induction, i.e., a tumor incidence of
50-60% within 300 days, is highly dependent on the
inbred strain. BALB/cAn mice and various sublines
derived from this branch of the BALB/c family are
the most susceptible strains. The closely related
BALB/cJ subline is relatively resistant. Very few
genetic differences have been found that distinguish
BALB/cAn and BALB/cJ. One possibility is a gene
located on chromosome (chr) 15 that regulates a
mouse urinary protein (MUP) polymorphism. The
autoimmune NZB/WEHI and NZB/B1 strains are
susceptible, but the incidence of PCTs is considerably
lower than in BALB/cAn. Common resistant inbred
strains are C3H/HeN, C57BL/6, C57BL/Ka, A/He,
AKR, DBA/2N, SWR/J, NZW/WEHI, AL/N, C58, and
B10.D2/SnJ. F1 hybrids of BALB/cAn and other
strains are relatively resistant, with incidences ranging
from 1 to 16%: the F1 hybrids of BALB/c and DBA/2
(CDF1) are highly resistant, while (BALB/c x C57BL/Ka)F1
(CBF1) mice have been reported to develop a 16%
incidence of PCTs. First-generation backcrosses of
various F1 hybrids to B/c range from 10% (CDF1 x
B/cAn) to 38% ((AKR x B/c)F1 x B/cAn).


Genetics of Susceptibility and Resistance 
Mice carry several genes that affect susceptibility (S) 
and resistance (R) to pristane induction. Definition 
of these genes is made in relation to the R strain 
employed in the cross. To date the most extensively 
studied cross has involved BALB/cAn (B/c) and 
DBA/2N (D2). The F1 hybrids of these strains have
an incidence of 1% or less, while the first generation 
backcross to B/c is 11%. These incidences suggest R 
genes are dominant and multiple. The first R gene of
D2 origin was linked to the Fv-1 locus on distal
chromosome 4. A BALB/c.DBA/2-Fv-1 n/n (chromo- 
some 4) congenic strain was developed and found 
to be partially resistant. Mapping of R genes on 
chromosome 4 using other B/c.D2 chromosome 4 
congenics indicated the presence of two R genes,
designated Pctr1 and Pctr2, 30 cM apart on this
chromosome. Subsequently, PCT-S genes of B/c origin were
defined by comparing the genotypes of (B/c x D2)F1
x B/c backcross mice that developed PCTs with those 
that did not. Two S genes on chromosome 4 were 
positioned at Pctr1 and Pctr2 and are thought to be 
alleles of the DBA/2 R genes that map to these regions. 
The identity of these genes is not established with
certainty; however, a B/c-D2 allelic difference that
affects the expression of the Cdkn2a (p16) gene is a
strong candidate. The wild-type allele in D2 normally
restrains cell cycle progression before S phase, while
the partially defective allelic variant in B/c does not.
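The inference that the R genes are dominant and multiple can be checked with simple Mendelian arithmetic. The Python sketch below is our own simplified illustration, not an analysis from the studies described here. It assumes unlinked loci, a fully dominant D2 resistance allele at each locus, and that only backcross mice homozygous B/c at every locus develop PCTs, at the parental rate (taken here, hypothetically, as 55%, the midpoint of the 50-60% figure for BALB/cAn). Under full dominance the F1 hybrids are predicted to be resistant, in line with the observed incidence of 1% or less.

```python
def backcross_incidence(n_loci: int, parental_incidence: float) -> float:
    """Toy prediction of PCT incidence in an (F1 x B/c) first backcross.

    Illustrative assumptions (not from the source):
      * resistance is controlled by n_loci unlinked loci;
      * at each locus the D2 resistance (R) allele is fully dominant;
      * only mice homozygous B/c at every locus are susceptible, and they
        develop PCTs at the same rate as the parental B/c strain.
    """
    if n_loci < 0 or not 0.0 <= parental_incidence <= 1.0:
        raise ValueError("n_loci must be >= 0 and incidence must lie in [0, 1]")
    # In an F1 (R/S) x B/c (S/S) backcross, each locus independently yields
    # an S/S offspring with probability 1/2.
    fully_susceptible_fraction = 0.5 ** n_loci
    return parental_incidence * fully_susceptible_fraction


if __name__ == "__main__":
    parental = 0.55  # hypothetical midpoint of the 50-60% BALB/cAn incidence
    observed = 0.11  # first backcross to B/c in the B/c x D2 cross
    for n in range(1, 5):
        predicted = backcross_incidence(n, parental)
        print(f"loci={n}: predicted {predicted:.1%}, observed {observed:.0%}")
```

With one locus the model predicts roughly 28% in the backcross, with two about 14%, and with three about 7%. The observed 11% falls between the two- and three-locus predictions, which is consistent with more than one R gene. Because Pctr1 and Pctr2 lie only 30 cM apart on chromosome 4, they are not truly unlinked, so this is only a rough guide.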


Consistent Chromosome Translocations 

Over 95% of the induced PCTs carry one of two
chromosome translocations: t(12;15), in 80-85% of the
tumors, or t(6;15), in 10-15%. The genetic loci involved
in the chromosomal breaksites are: switch sequences and
some non-switch regions near JH on chromosome 12;
the 5’ region of the c-myc proto-oncogene, beginning
300 bp 5’ of exon 1 and extending into intron 1, on
chromosome 15; the Igκ light chain locus on
chromosome 6; and the Pvt-1 (PCT variant
translocation) locus, some 220 kb 3’ of c-myc exon 3
on chromosome 15.
The t(12;15) illegitimate recombination of c-myc and
the IgH locus alters or disrupts the normal regulatory
sequences in the 5’ end of c-myc. In t(12;15), the
c-myc locus is joined head-to-head with the IgH locus.
When the break occurs at the 3’ end of exon 1 or the
5’ end of intron 1, the normal promoters of c-myc are no
longer available, and transcription is apparently
regulated by enhancer sequences associated with the
IgH complex: the intronic enhancer Eμ and the Eα
enhancer sequences 3’ of Cα. There is accumulating
evidence that translocations beginning in Sμ can be
subsequently switched to Sα, a process that is probably
driven by physiological switch recombination. The
c-myc locus then comes into close proximity with the
strong Eα enhancers, located 3’ of Cα. In the other
5% of the PCTs, a variety of different gene
rearrangements involving c-myc, IgH, and Pvt-1 have
been described.

The biological effects of the t(6;15) translocation
are still being worked out. Nevertheless, t(6;15) is
known to deregulate c-myc transcription indirectly, by
an as yet undefined mechanism. In this rearrangement
the c-myc locus is joined to chromosome 6 in the Jκ
region with no apparent targeting to signal sequences.
Although the Pvt-1 locus has no protein product,
chimeric transcripts are generated from this
translocation; these contain short segments of Pvt-1
aligned in frame to Cκ. It has been proposed that the
product of these transcripts may play a role in
aberrant myc transcription. The proximity of Pvt-1 to
c-myc may also play a role in normal regulation of
c-myc expression through interaction between shared
enhancers or promoters.

The t(12;15) translocation has been detected in 
normal mouse lymphoid tissues by polymerase chain 
reaction (PCR). However, detection depends on the
size of a given clone, and in normal mice such a clone
rarely expands to the point of detection by long PCR.
In contrast, t(12;15) translocations can be readily
detected by nested PCR in pristane-induced oil
granuloma tissues 7-21 days after pristane treatment,
by long PCR in mice immunized with cholera toxin,
and in IL-6 transgenic mice, suggesting that
translocation-bearing clones are selectively expanded
in these sites.

While t(12;15) and t(6;15) appear to be required for 
PCTGEN and may be the initiating and rate-limiting 
genetic event, it is surprising that, thus far,
susceptibility/resistance genes have not been implicated
in the pathogenesis of these chromosomal abnormalities.

Environmental Factors

Environmental factors (diet, exposure to potentially
infectious microbial organisms, the normal microbial
flora, common mouse viruses (e.g., MHV, Sendai),
housing conditions, bedding, and ambient noise) can
potentially affect PCTGEN. Of these, raising mice
under specific pathogen-free (SPF) rather than
conventional conditions has been shown to decrease
the incidence of pristane-induced PCTs drastically,
from 50-60% to 5%. SPF mice not only have a
restricted microbial gut flora, but also are not exposed 
to common mouse viruses. Furthermore, their food is 
sterilized (autoclaved). Thus, these mice have greatly 
reduced antigenic challenges throughout life. Hence, 
it is suspected that antigenic stimulation is required in 
the plasmacytomagenic process. 


Microenvironmental Factors 

The inflammatory microenvironment of the oil 
granuloma plays a critical role in the plasmacytoma- 
genic process. Several lines of evidence reveal this. 
First, when mice are given the nonsteroidal anti- 
inflammatory agent indomethacin in the drinking 
water or diet throughout the induction period (0-200 
or more days), the incidence of PCTs is drastically 
reduced or totally eliminated. Considerable evidence 
indicates that this inhibition is mediated through
a prostaglandin-producing cell (macrophage or pos- 
sibly a fibroblast in the case of plastic disk PCTGEN). 
Pristane stimulates macrophages to produce prosta- 
glandins (PGE), which stimulate other macrophages 
to secrete IL-6 via a cAMP-dependent pathway. IL-6 
is an essential factor for the survival of PCT cells in 
vitro and in vivo and for the development of PCTs, 
as IL-6-deficient mice are refractory to pristane
PCTGEN. The oil granulomas of mice treated with 
indomethacin in general resemble those of intact mice. 
Allelic differences in genes that control the
response to peritoneal irritants have not yet been
demonstrated.


Oncogene-Driven Accelerated PCTGEN 


The mean latent period (LP) of PCT development is 
usually between 205 and 220 days. PCTs can be 
induced in pristane-treated mice with much shorter 
latent periods (accelerated PCTGEN) by artificially 
introducing oncogenes through infection with retroviral
vectors. Infection of pristane-treated BALB/c mice
with the transforming Abelson retrovirus produced 
PCTs with LPs ranging between 30 and 90 days. 
These PCTs also had t(12;15) and t(6;15) chromosomal 
translocations. PCTGEN has also been accelerated 
in pristane-treated mice by injection with retroviral 
constructs each carrying two oncogenes: the RIM
viral construct, which has an Eμ-myc oncogene and
a mutant v-Ha-ras gene, and the J3V virus, which
carries avian v-myc and avian v-raf. Most of these
PCTs lack t(12;15) and t(6;15) translocations, as myc
function was deregulated by the Eμ-myc or v-myc
oncogenes. All three of these methods require
pristane conditioning.

In contrast, an ABL/MYC virus which carries v-abl 
and c-myc under the control of the tk promoter can 
rapidly induce PCTs in nonpristane-treated mice in 
3-4 weeks. Proto-oncogenes of abl, ras, and raf code 
for enzymes that are components of signal transduc- 
tion pathways involved in cell proliferation. As yet, 
mutations in c-abl, c-raf, and c-Ha-ras have not been
reported in standard pristane PCTGEN. 


Spontaneous PCTGEN in Genetically 
Altered Mice 


Plasma cell tumors develop spontaneously with high
incidence in intact, nonpristane-treated mice carrying
an Eμ-v-abl transgene. The mouse strains in which
this transgene has been effective include strains
that are resistant to pristane PCTGEN. The PCTs
developing in these transgenic mice carry t(12;15)
translocations. Transgenic mice carrying a human
IL-6 transgene under the control of the H-2Ld
promoter also develop PCTs.

Plasma cell tumors have been observed in 2-7% of
(SJL x C57BL)F1 mice carrying Eμ-bcl-2 transgenes.
Furthermore, PCTs can be induced with high
incidence in BALB/c.Eμ-bcl-2-22 mice with pristane.


Later Genetic Changes During Plasma 
Cell Tumor Development 


As cells progress toward more profound neoplasia, 
many genetic changes reflecting genomic instability 
appear. Polyploidy has long been a characteristic of 
PCTs. Several inconsistent cytogenetic abnormal- 
ities have been described in PCTs: trisomy 11 and 
promiscuous, nonreciprocal translocations involving 
chromosome 5 have been found in 52% of pristane- 
induced PCTs. Genetic changes of this type are 
thought to lead to the promotion of tumor develop- 
ment. A consistent phenotype of BALB/c PCTs 
induced by pristane is the loss of expression of the
TGF-β type II receptor (TβRII). In contrast, putatively nonmalignant
plasma cells isolated from IL-6 transgenic mice 
strongly display this receptor on the cell surface. The 
genetic basis of this defect has not been determined. 
A long-standing phenotype of the neoplastic plasma 
cell is the extensive appearance of intracisternal 
A particles. The genes for these viral particles are 
found on many chromosomes, but the mechanism of
their consistent activation in PCTs is poorly
understood.
Plasmids are stable, nonessential components of
microbial genomes that exist outside the chromosome 
as autonomous replication units. In a sense, plasmids 
are the most streamlined of the obligate parasites, 
consisting only of a double-stranded DNA molecule. 
These DNAs are usually, but not always, circular 
(linear forms have been found in species of Borrelia 
and Streptomyces, which also have linear chromo- 
somes). Plasmids can be small (2-3 kb), or they can 
be quite large (>500 kb). The large ones sometimes 
carry almost as many genes as a small bacterial 
chromosome. Indeed, large plasmids are generally 
considered to be chromosomes when they contain 
genes, such as ribosomal RNA genes, that are essential 
for normal growth of the host. 

The hosts for plasmids are usually single-celled 
organisms such as bacteria, yeast, and archaea. These 
host cells supply the replication, transcription, and 
translation machinery required for maintenance of 
the plasmid. The plasmid contains its own origin of 
replication, genes responsible for attracting host repli- 
cation proteins to the plasmid origin, and in some 
cases genes that assure stable maintenance. The infor- 
mation for plasmid replication is generally contained 
in a small (<3 kb) portion of the plasmid, regardless of
plasmid size. Some plasmids also contain genes 
involved in movement of genetic material from one 
cell to another and genes that confer a selective advan- 
tage to the host under specialized conditions. Exam- 
ples of plasmid genes that are important for the host 
include those that encode virulence factors, the ability 
to metabolize compounds such as toluene, and resist- 
ance to antibiotics, heavy metals, irradiation, and bac- 
teriocins (substances that kill cells lacking the plasmid 
producing the bacteriocin). 

Plasmids were first detected in the 1960s as extra- 
chromosomal elements that confer antibiotic resist- 
ance. They have been extensively studied, partly 
because of the perceived medical importance of 
plasmids as carriers of antibiotic resistance genes and 
partly because plasmids serve as good models for 
examining DNA biology. By the early 1970s their 
features were understood well enough to make plas- 
mids key factors in recombinant DNA technology as 
vectors for gene cloning. They still occupy that import- 
ant niche in genetic engineering. It has also become 
apparent that horizontal transfer of genes by plasmids 
is an important factor in bacterial evolution, a concept 
that is supported by the recent deciphering of many 
bacterial genomic sequences. Below we discuss how 
plasmids replicate, maintain a set copy number, and 
move from one cell to another. We also briefly touch 
on how these DNA molecules are used in genetic 
engineering. 


Plasmid Biology 


Replication 

Most plasmids behave as single replication units 
(replicons). In addition to an origin of replication, 
each plasmid usually carries a gene encoding a replica- 
tion initiation protein called a Rep protein. These 
proteins interact with host proteins to load the host 
replication elongation system onto the plasmid DNA. 
Once protein loading occurs, DNA synthesis pro- 
ceeds much as described for bacterial chromosomes. 
Although initiation of replication can be complex 
(some large plasmids encode helicases and up to 
three Rep proteins), only two general schemes are 
needed to describe most plasmid replication. One is 
called theta replication and the other rolling circle 
replication. Within each replication mode are many 
variations; a few of the better studied examples are 
sketched below. 

In the theta replication mode (Figure 1), one DNA
strand is replicated by continuous (leading) strand 
synthesis, while replication of the second strand 
occurs by discontinuous (lagging) strand synthesis, 
much like chromosomal replication. Initiation usually
occurs in three steps. First, the plasmid-encoded
initiator protein (Rep) binds to one or more specific
sites in the plasmid origin. In many Escherichia coli
plasmids, such as pSC101, R1, F, P1, and R6K, the host
replication initiator, the DnaA protein, also binds to
sites in the plasmid origin. In the second step, local
DNA strand separation occurs in the origin region,
and a host-encoded DNA helicase promotes further
melting of the region. The third step involves
assembly of the replication elongation proteins and
formation of one or two replication forks.

Figure 1 Theta replication. (A) The DnaA protein binds to the supercoiled plasmid at the DnaA boxes and the Rep protein binds at the iterons to form a DNA-protein complex that causes local denaturation in the AT-rich region of the plasmid origin of replication. (B) Host replication proteins bind to the denatured region and begin to expand it to create a replication fork. (C) Additional host replication proteins begin DNA synthesis, and the replication fork (f) moves in the direction indicated by the arrow. (D) The replication bubble continues to expand as the fork moves in the direction of the arrow. (E) Two daughter molecules are formed that are interlinked. (F) The daughter molecules are unlinked (decatenated) by a topoisomerase, probably topoisomerase IV. (G) Negative supercoils are introduced by DNA gyrase so another round of replication can initiate.

Replication origins of theta replication plasmids 
often contain several sequence elements. Among 
these are reiterated sequences (iterons) where the plas- 
mid initiator protein binds (iterons are also involved in 
copy number control, as discussed below). Also pres- 
ent are one or two DnaA boxes, the binding sites for 
the DnaA protein. Some replication origins contain an 
AT-rich region that may have additional iterons and/ 
or binding sites for IHF, a host-encoded DNA bend- 
ing protein. DNA bending proteins are often essential 
for achieving optimal DNA conformation for replica- 
tion initiation. For example, plasmids such as pRYM 
cannot transform cells that are deficient in the bending 
protein called HU. This involvement of both
host- and plasmid-encoded proteins probably restricts
the host range of plasmids. One way to relax the
restriction is to express more than one plasmid initiator
protein. Such a strategy is used by the broad host 
range plasmid RK2. 

Not all examples of theta replication involve bind- 
ing of an initiator protein. For example, the ColE1 
plasmid of E. coli relies on a specific RNA primer to 
initiate DNA replication. Synthesis of the primer 
begins 555 base pairs upstream from the origin, 
which in this case is defined as the transition point 
between primer RNA and DNA. Initiation of DNA 
synthesis requires generation of a specific secondary 
structure at the 3’ end of the RNA primer and site- 
specific RNA processing by ribonuclease H. The 
3’-hydroxyl end of the RNA then serves as a primer for
DNA synthesis. 

Theta replication can be unidirectional or bidirec- 
tional, depending on whether one or two replication 
complexes assemble during initiation. Directionality 
is imparted by the nucleotide sequence of the replica- 
tion origin. In bidirectional replication, two identical 
replication forks are assembled in a concerted fashion. 
In the unidirectional mode of replication, only one 
fork is assembled at the origin for continuous (leading) 
strand synthesis. Discontinuous strand synthesis
initiates at primosome assembly sequences (pas) that
become exposed as their DNA is melted by helicase
action.

Figure 2 Rolling circle replication. (A) Rep protein binds to supercoiled plasmid and nicks the DNA at a site called ori. (B) Initiation of replication (dashed line) occurs at ori. (C) Leading strand synthesis continues around the plasmid. (D) Replication continues through the origin. (E) The Rep protein facilitates ligation of the displaced strand and the newly synthesized strand. (F) The displaced strand, Rep protein, and new duplex DNA are released. (G) Negative strand synthesis, beginning at the negative strand origin, replicates the displaced strand. (H) DNA gyrase introduces DNA supercoils so a new round of replication can initiate.

DNA replication is presumably terminated by sim- 
ple juxtaposition and ligation of the 3’ and 5’ ends once 
the circular template is replicated. However, termina- 
tion of theta replication may also involve a replication 
termination signal analogous to that found in the bac- 
terial chromosome: replication forks moving in either 
direction are blocked at the terminus by a specific 
terminus-binding protein having antihelicase activity. 
Completion of the replication process also requires 
resolution of the linked daughter molecules by a 
topoisomerase, probably DNA topoisomerase IV. 

In rolling circle replication (Figure 2), the two 
strands of the plasmid are synthesized asynchron- 
ously, each using a different origin sequence and 
initiation mechanism. A plasmid-encoded initiator 
protein (Rep) binds to the double-strand origin and 
introduces a nick at a specific site. The nick generates a 
3’-hydroxyl end that then acts as a primer for DNA
synthesis. A host-encoded helicase is presumably 
required to convert the initiation complex to a form 
that can be used by DNA polymerase III for DNA 
synthesis. During synthesis one strand is displaced 
(see Figure 2). After completion of one round of 
replication, the initiator protein attacks the newly 
synthesized DNA, and in the process it seals the
ends of the displaced strand and the newly replicated
circle. The result is release of (1) a single-stranded 
circle, (2) inactive Rep protein, and (3) a double- 
stranded circle in which one strand has been newly 
synthesized. Once the duplex becomes supercoiled, it 
is ready for a second round of replication. Conversion 
of the single-stranded, displaced DNA molecule into a 
duplex initiates at the single-strand origin using host- 
encoded functions. Replication initiation usually 
involves synthesis of an RNA primer by RNA poly- 
merase or by a primase. Rolling circle plasmids are 
usually small (<10 kb), presumably because deletions 
and duplications occur readily with this type of 
replication. 

Replication of linear plasmids entails specific 
mechanisms to fill the gaps that are left at the 5’
ends of progeny strands after removal of primer RNA 
used in discontinuous strand synthesis. In Strepto- 
myces, replication initiates at an internal origin and 
proceeds bidirectionally to the ends of the DNA 
molecule. Completion of linear DNA replication at 
the termini (telomeres) is achieved by a protein that 
covalently binds to the 5’ telomeric ends of the plas- 
mid and provides a hydroxyl group that serves as a 
primer for DNA synthesis. In Borrelia, replication of 
linear plasmids proceeds by a rolling circle mechanism 
that involves formation of concatemeric replicative
intermediates. In these plasmids telomeres contain
palindromic hairpins. Nicking at the telomere and 
formation of a hairpin that snaps back onto itself 
provide a 3’ end that primes DNA synthesis. 


Copy number control 

The task of every plasmid is to produce enough copies 
of itself so at least one copy is passed to each daughter 
cell at the time of cell division. However, the plasmid 
must not overly tax host resources, either in terms of 
replication proteins or nucleoside triphosphates. Con- 
sequently, plasmid copy number is regulated. Copy 
number may be as low as one to two or as high as 
hundreds per cell. For a given plasmid in a given 
bacterial host, copy number is fixed; it can be changed 
only by mutation. 

Some plasmids encode negative regulators, often 
RNA molecules, to limit the rate of production of a 
critical initiation factor. For example, in pT181 of 
Staphylococcus aureus, short antisense RNAs are 
synthesized from a gene overlapping the 5’ leader 
portion of the rep (initiator) gene. The antisense 
RNA can form a duplex RNA with the rep transcript; 
when that happens, the rep transcript is prematurely 
terminated. rep expression is reduced, as is initiation 
of plasmid replication. Plasmids such as R1 ensure
tight regulation by using both small RNAs and small 
repressor proteins to negatively control rep expres- 
sion. 
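The logic of inhibitor-based copy control can be sketched numerically. The sketch assumes, hypothetically, that every copy makes inhibitor, so inhibitor concentration rises with copy number n, and that the per-copy replication rate falls as k / (1 + alpha·n). Copy number then returns to the set point n* = (k - 1)/alpha whether a cell starts with too few or too many copies. The parameter values are illustrative only, not measured for pT181 or R1.

```python
def next_generation(n: float, k: float = 5.0, alpha: float = 0.2) -> float:
    """Plasmid copies per newborn cell after one replication-and-division cycle.

    Hypothetical inhibitor-dilution model: inhibitor scales with n, so the
    per-copy replication rate k / (1 + alpha * n) drops as copies accumulate.
    """
    new_copies = k * n / (1.0 + alpha * n)
    return (n + new_copies) / 2.0  # copies shared equally between daughters


def set_point(k: float = 5.0, alpha: float = 0.2) -> float:
    """Copy number at which replication exactly offsets halving at division."""
    return (k - 1.0) / alpha


for start in (2.0, 60.0):
    n = start
    for _ in range(40):
        n = next_generation(n)
    print(f"start {start:>4}: after 40 generations n = {n:.2f}")
```

Both starting values settle at the set point of 20 copies: too few copies means little inhibitor and fast replication, too many means strong inhibition.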

For the ColE1 plasmid of E. coli, the short anti-
sense RNA is complementary to the 5’ end of the
RNA primer. Formation of an RNA-RNA duplex 
alters the secondary structure of a downstream por- 
tion of the primer, making it less suitable for RNA- 
DNA hybrid formation and subsequent processing by 
ribonuclease H. In still other plasmids, copy control is 
exerted by the presence of multiple binding sites for 
the initiator protein (iterons). Protein-protein inter-
actions between initiator (Rep) molecules bound to
iterons are thought to be involved in the formation of 
nucleoprotein structures at the origin that facilitate 
initiation of replication. At high plasmid copy num- 
ber, and therefore high level expression of the Rep 
protein, more iterons are bound to Rep. In some 
cases this might lead to plasmid-plasmid pairing
through Rep protein interactions. That could inhibit 
replication. In some plasmids iterons are located both 
inside and outside of the origin. Then high levels of 
Rep protein might lead to binding between Rep pro- 
teins attached to the two patches of iterons and for- 
mation of a DNA loop that blocks replication. In 
another scenario, the active, origin-binding form of 
the initiator protein is monomeric. Dimers tend to 
bind to inverted repeat sequences involved in auto-
repression of initiator synthesis. High levels of Rep
protein, which would favor dimer formation, inhibit
rep gene expression. 
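The monomer-dimer scenario can be made quantitative with a simple equilibrium, 2M ⇌ D, with dissociation constant Kd = [M]²/[D]. Because total Rep is T = [M] + 2[D], solving the resulting quadratic gives [M] = Kd(√(1 + 8T/Kd) − 1)/4, and the fraction of Rep held in dimers rises with T. The sketch assumes ideal behavior and arbitrary concentration units; it illustrates the scenario above rather than modeling any particular Rep protein.

```python
import math


def monomer_concentration(total_rep: float, kd: float) -> float:
    """Free monomer [M] for the equilibrium 2M <-> D with Kd = [M]**2 / [D].

    Mass balance: total = M + 2*M**2/kd, a quadratic in M whose positive root
    is M = kd * (sqrt(1 + 8*total/kd) - 1) / 4.
    """
    if total_rep < 0 or kd <= 0:
        raise ValueError("total_rep must be >= 0 and kd > 0")
    return kd * (math.sqrt(1.0 + 8.0 * total_rep / kd) - 1.0) / 4.0


def dimer_fraction(total_rep: float, kd: float = 1.0) -> float:
    """Fraction of all Rep molecules that are in dimers, 2[D] / total."""
    if total_rep == 0:
        return 0.0
    return (total_rep - monomer_concentration(total_rep, kd)) / total_rep


for total in (0.375, 1.0, 10.0):  # arbitrary units, Kd = 1
    print(f"total Rep {total:>6}: {dimer_fraction(total):.3f} in dimers")
```

The three totals give dimer fractions of 1/3, 1/2 and 4/5, so raising Rep levels shifts the protein toward the dimeric, autorepressing form.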


Stable Maintenance 

An important component of plasmid biology is stable 
maintenance, i.e., the ability of a plasmid to be inherit- 
ed without the formation of plasmid-free progeny. 
For plasmids with high copy numbers, stable main- 
tenance is achieved by random distribution of plasmid 
copies to the daughter cells. As long as daughter cells 
receive at least one copy of the plasmid at cell division, 
the copy control mechanism reestablishes the copy 
number typical of that particular plasmid/host com- 
bination. Such a mechanism would not work well for 
plasmids that normally have only one to two copies 
per cell, since many daughter cells would not receive 
a copy. These plasmids use one or more specific 
mechanisms to ensure hereditary stability, as dis- 
cussed below. 
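The weakness of random distribution for low-copy plasmids follows from simple probability. If a cell holds N copies at division and each copy goes independently to either daughter with probability 1/2 (an idealization that ignores multimers and active partitioning), a given daughter receives none with probability (1/2)^N, so at least one plasmid-free daughter arises with probability 2^(1-N). The sketch computes that value and checks it by simulation.

```python
import random


def p_plasmid_free_daughter(copies_at_division: int) -> float:
    """Chance that at least one of the two daughters receives no copy,
    assuming each copy is assigned to a daughter independently at random."""
    if copies_at_division < 1:
        raise ValueError("need at least one copy at division")
    return 2.0 ** (1 - copies_at_division)


def simulate(copies_at_division: int, trials: int = 20_000, seed: int = 0) -> float:
    """Monte Carlo estimate of the same probability."""
    rng = random.Random(seed)
    losses = 0
    for _ in range(trials):
        to_first = sum(rng.random() < 0.5 for _ in range(copies_at_division))
        if to_first in (0, copies_at_division):  # one daughter got every copy
            losses += 1
    return losses / trials


for n in (2, 4, 20):
    print(n, p_plasmid_free_daughter(n), simulate(n))
```

With two copies at division, half of all divisions yield a plasmid-free cell; with 20 copies the figure falls to about 2 in a million, which is why high-copy plasmids can rely on random distribution.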

Active partitioning usually involves a centromere- 
like site and two plasmid-encoded, trans-acting pro- 
teins called ParA and ParB. With the P1 plasmid, 
the ParB protein binds to a specific plasmid DNA 
site called parS, which is located immediately down- 
stream from the parB gene. The ParA protein, which is 
encoded by a gene located immediately upstream from 
parB, is an ATPase that is stimulated by ParB. In one 
model, plasmid pairs are held together through ParB- 
ParB interactions. The partition complex then trans- 
locates plasmid pairs to a site at the division plane of 
the cell, presumably utilizing the hydrolysis of ATP 
and ParA activity. Proper positioning of the plasmid 
pairs leads to distribution of each member of a pair to a
different daughter cell. 

Some low-copy-number plasmids also have a sys- 
tem for killing cells that fail to acquire a copy of the 
plasmid during cell division. Two plasmid-encoded 
proteins are usually involved. One is a long-lived 
toxic product, and the other is a short-lived protein 
that confers immunity to the toxin. For example, the F 
(fertility) plasmid of E. coli encodes a protein (CcdB) 
that traps DNA gyrase on DNA such that host DNA 
replication is blocked. A second plasmid protein 
(CcdA) inactivates the toxic one, thus protecting a 
plasmid-containing cell from the toxic protein. The 
immunity protein (CcdA) decays rapidly, so it must 
be continually produced by the plasmid for the host to 
remain protected. A daughter cell that fails to acquire 
a copy of the plasmid still inherits both proteins
from the parental cytoplasm, but the immunity protein
soon disappears. At that point the toxin kills the cell.
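The timing of this killing can be estimated from the stabilities of the two proteins. In a plasmid-free cell neither protein is made any longer; growth dilutes both equally, so dilution cancels out of their ratio, but the unstable antitoxin also decays faster. If the cell starts with an antitoxin:toxin ratio r0 and the half-lives satisfy hA < hT, the ratio falls to 1 after t = log2(r0) / (1/hA − 1/hT). The sketch assumes that killing begins once toxin exceeds antitoxin and uses hypothetical numbers, not measured values for CcdA and CcdB.

```python
import math


def time_to_toxin_excess(ratio0: float, half_life_antitoxin: float,
                         half_life_toxin: float) -> float:
    """Time until antitoxin/toxin falls to 1 in a cell that has lost the plasmid.

    Both proteins decay exponentially with no new synthesis; dilution by
    growth affects both equally and so drops out of the ratio.
    """
    if ratio0 <= 1.0:
        return 0.0  # toxin already in excess
    decay_gap = 1.0 / half_life_antitoxin - 1.0 / half_life_toxin
    if decay_gap <= 0:
        return math.inf  # antitoxin at least as stable as toxin: never unmasked
    return math.log2(ratio0) / decay_gap


# Hypothetical values: 4-fold antitoxin excess, 10 min vs 60 min half-lives.
print(time_to_toxin_excess(4.0, 10.0, 60.0))  # about 24 (minutes)
```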

Plasmids that replicate by the theta mode also have 
the problem of multimer formation (multimers are 
plasmid circles that contain more than a single plasmid 
copy, usually with the plasmids oriented head to tail). 


Accumulation of multimeric forms effectively reduces 
copy number, since multimers are distributed to 
daughter cells as single units while the copy control 
mechanism senses each plasmid copy in a multimer. To
minimize multimer problems, some plasmids, such as 
F, P1, and RK2, contain a specific resolution site and 
one or more plasmid-encoded resolvases. In the case 
of ColE1, host-encoded enzymes resolve plasmid 
multimers. 
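The cost of multimer formation can be put in numbers under the idealized random-distribution assumption: copies locked into multimers segregate as single units, so the number of independently distributed circles drops while the copy control mechanism still counts every copy. The sketch assumes, hypothetically, that all copies sit in multimers of one size.

```python
def p_loss_with_multimers(copies: int, multimer_size: int = 1) -> float:
    """Chance that a division yields a plasmid-free daughter when `copies`
    plasmid copies are packed into multimers of `multimer_size`, each
    multimer moving to one daughter or the other at random."""
    if multimer_size < 1 or copies < multimer_size or copies % multimer_size:
        raise ValueError("copies must be a positive multiple of multimer_size")
    units = copies // multimer_size  # independently segregating circles
    return 2.0 ** (1 - units)


print(p_loss_with_multimers(16))                   # 16 monomers
print(p_loss_with_multimers(16, multimer_size=4))  # same copies as 4 tetramers
```

Sixteen monomers give a loss probability of about 3 in 100,000 per division, whereas the same sixteen copies packed as four tetramers give 1 in 8.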


Incompatibility 

Incompatibility among plasmids is usually manifested 
as the inability of a plasmid to be established in a cell 
that already contains another plasmid or as destabil- 
ization of a resident plasmid by a second, incoming 
plasmid. Experimentally, it has been possible to clas- 
sify plasmids according to incompatibility groups. 
Incompatible plasmids, i.e., members of the same 
incompatibility group, share one or more elements of 
the plasmid replication or partitioning systems. 
Incompatibility is usually symmetric: in the absence 
of external selective pressure, two incompatible plas- 
mids are lost from cell progeny at the same frequency. 
This symmetry is explained in the following way. In 
any given cell, copies of one plasmid or the other are 
selected at random for replication or partition. Occa- 
sional increases in the number of copies of one plasmid 
at the expense of the other cannot be corrected 
because the copy number control mechanism cannot 
distinguish between the two plasmids. Thus each host 
colony recovered will contain only one plasmid type. 
Since each plasmid predominates over the other with 
the same probability, the number of progeny cells, and 
therefore the number of colonies, carrying one plas- 
mid or the other will be equal. 
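The argument for symmetric loss can be checked with a small simulation. It assumes, hypothetically, a shared copy number N that cannot distinguish the two plasmids, replication in which each new copy is templated by a copy picked at random from the whole pool, and division in which a daughter receives a random half of the copies. Following single lineages shows each one ending with a single plasmid type, and each type winning in roughly half of them.

```python
import random


def follow_lineage(n_copies: int, rng: random.Random,
                   max_generations: int = 10_000) -> str:
    """Follow one cell line that starts with equal numbers of two incompatible
    plasmids, A and B, and return the type that remains once the other is lost."""
    a = n_copies // 2  # copies of A; the remaining copies are B
    for _ in range(max_generations):
        if a == 0:
            return "B"
        if a == n_copies:
            return "A"
        # Replication: the shared copy control only counts total copies, so it
        # fills the pool up to 2N by copying randomly chosen templates.
        a_in_pool, total = a, n_copies
        while total < 2 * n_copies:
            if rng.random() < a_in_pool / total:
                a_in_pool += 1
            total += 1
        # Division: the followed daughter gets a random N of the 2N copies;
        # indices below a_in_pool stand for A copies.
        daughter = rng.sample(range(2 * n_copies), n_copies)
        a = sum(1 for i in daughter if i < a_in_pool)
    raise RuntimeError("both plasmids still present after max_generations")


def tally(n_copies: int = 8, lineages: int = 400, seed: int = 1) -> dict:
    """Count how many lineages end up carrying only A or only B."""
    rng = random.Random(seed)
    counts = {"A": 0, "B": 0}
    for _ in range(lineages):
        counts[follow_lineage(n_copies, rng)] += 1
    return counts


print(tally())  # roughly equal numbers of A-only and B-only lineages
```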

Cases have also been found in which incompatibil- 
ity is unidirectional. For example, cloned DNA 
fragments encoding essential plasmid replication or 
partitioning functions tend to exclude plasmids 
requiring those functions. Unidirectional incompati- 
bility is also created by mutations that cause replica- 
tion defects (the mutant plasmid cannot compete with 
a coresident, incompatible plasmid) or that alter inter- 
actions between a copy control regulator and its target 
(the mutant plasmid is less sensitive to the inhibitor 
encoded by a coresident incompatible plasmid). 


Horizontal Transfer 

Some plasmids carry a set of genes that allows a plas- 
mid-containing cell to ‘mate’ with a plasmid-free 
cell and pass a copy of the plasmid to the plasmid- 
free cell. This process, called conjugation, has been 
most thoroughly studied with the F plasmid of 
E. coli. This plasmid contains a contiguous set of 36 
open reading frames (tra genes) that encode all of the 
proteins needed for conjugation. Many of the tra
genes are involved in the construction of long, thin
structures (pili) that extend one to two micrometers
from the bacterial surface. The tips of the pili are 
thought to interact with the surface of cells lacking 
the F plasmid. Those cells are then drawn to the F- 
containing cells by depolymerization of the pili. 
Mating pairs are held together tightly by cell surface 
interactions called conjugative junctions, through 
which plasmid DNA is thought to pass. 

DNA transfer is an active process that involves 
rolling circle replication. It begins when the TraI pro-
tein nicks one strand of the F plasmid at a site called
oriT. As a result of the nicking reaction TraI becomes
covalently bound to the 5’ end of the nicked strand.
TraI has a helicase activity that unwinds the plasmid,
and in the process the nicked strand is transferred to
the recipient cell. The bound TraI protein may also parti-
cipate in circularizing the transferred DNA strand
once transfer is complete. The complementary strands 
in both donor and recipient cells are synthesized 
by host replication proteins. The F plasmid is par- 
ticularly active at promoting conjugation because an 
insertion element has disrupted a regulatory gene 
that would otherwise repress tra gene expression. 
Another unusual feature of F is the presence of a 
transposon and three insertion elements that promote 
integration into the host chromosome. Once in the 
chromosome, F can cause transfer of chromosomal 
genes, or even a copy of the entire chromosome, 
from one cell to another. This is followed by recom- 
bination of the incoming DNA with the resident 
chromosome. Since transfer occurs at high frequency, 
a cell containing an integrated copy of F is called 
an Hfr (high frequency recombination). Mating 
between an Hfr and a strain lacking F can be inter- 
rupted at various times, and the transfer of particular 
genes can be measured to determine the time of trans- 
fer for each. In this way chromosomal gene order has 
been determined. 
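The interrupted-mating logic reduces to sorting genes by the time at which each first appears in recipients. A minimal sketch follows; the gene names and entry times are made-up placeholders, not real E. coli data.

```python
def map_from_entry_times(entry_minutes: dict[str, float]) -> list[tuple[str, float]]:
    """Order genes by time of entry and report the gap, in minutes of
    transfer, between each gene and the one before it."""
    ordered = sorted(entry_minutes.items(), key=lambda item: item[1])
    result = []
    previous = None
    for gene, minute in ordered:
        gap = 0.0 if previous is None else minute - previous
        result.append((gene, gap))
        previous = minute
    return result


# Hypothetical entry times measured with one Hfr strain
times = {"geneC": 24.0, "geneA": 6.5, "geneB": 15.0}
for gene, gap in map_from_entry_times(times):
    print(gene, gap)
```

The order is read directly from the sort; the gaps give map distances in minutes, the unit in which the E. coli chromosome map was originally expressed.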

In addition to self transfer and chromosome trans- 
fer from one cell to another, conjugative plasmids also 
facilitate the independent transfer of mobilizable plas- 
mids when present in donor cells. Moreover, some 
plasmids and transposons can integrate into a conjuga- 
tive plasmid and be transferred to a different cell 
when conjugation occurs. Thus plasmids provide sev- 
eral ways for small segments of DNA to move among 
bacterial species. Nucleotide sequence analysis of bac- 
terial genomes suggests that gene transfer has occurred 
many times over the course of evolution. For example, 
regions containing metabolically related genes, viru- 
lence genes, and other clusters are sometimes bounded 
by short repeated sequences, as if they had been 
acquired in blocks through transposition from 
plasmids. It is likely that conjugative plasmids are
major factors in bacterial evolution. 


Plasmids and Genetic Engineering 


Genetic engineering has been built around the process 
of gene cloning, which is essentially a method for 
obtaining large quantities of a particular portion of 
DNA. Gene cloning involves cutting DNA molecules 
at specific places, physically separating the fragments, 
and then multiplying each fragment many times by 
replication in growing microorganisms. Plasmids 
serve as carriers of the DNA fragments. Small plas- 
mids are easily cut at a single, specific site where a 
foreign DNA fragment can be inserted. Once the 
circle is reformed by ligation of the ends, the chimera 
containing plasmid and foreign DNA is introduced 
into bacterial cells, usually E. coli. Antibiotic resist- 
ance genes on the plasmid make it easy to obtain 
plasmid-bearing cells (transformants) by spreading 
the bacterial cells on antibiotic-containing agar 
where they form colonies. The plasmid replication 
apparatus is present in the chimera, so many copies 
of the cloned gene are produced inside the bacteria as 
colonies grow. Biochemical methods are then used to 
identify colonies containing the fragment of interest, 
and those colonies are grown as pure cultures to 
obtain large numbers of cells. Plasmids isolated from 
those cells carry the fragment of interest. After the 
plasmid DNA is isolated, the fragment of interest 
can be excised by cutting with restriction endo- 
nucleases and then purified for a variety of studies. 
Thus plasmids act as selectable, amplifiable carriers for 
DNA fragments. 
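The cut-and-ligate step can be sketched at the level of the top DNA strand. The sketch assumes a vector with exactly one EcoRI site (G^AATTC, which leaves AATT overhangs) and a fragment with an EcoRI site at each end; EcoRI is an illustrative choice, since the text names no enzyme. It treats the circular vector as a linear string, ignores the bottom strand, and inserts the fragment in one orientation only, whereas in practice it can ligate in either.

```python
ECORI_SITE = "GAATTC"
ECORI_CUT = 1  # EcoRI cuts the top strand between G and AATTC


def find_sites(seq: str, site: str = ECORI_SITE) -> list[int]:
    """Start positions of every occurrence of `site` in `seq`."""
    positions, start = [], seq.find(site)
    while start != -1:
        positions.append(start)
        start = seq.find(site, start + 1)
    return positions


def clone_fragment(vector: str, fragment: str) -> str:
    """Top-strand sequence of the chimera made by cutting the vector at its
    single site and ligating in the fragment cut at its two flanking sites."""
    vector_sites = find_sites(vector)
    fragment_sites = find_sites(fragment)
    if len(vector_sites) != 1:
        raise ValueError("vector must contain exactly one site")
    if len(fragment_sites) != 2:
        raise ValueError("fragment must contain exactly two sites")
    cut = vector_sites[0] + ECORI_CUT
    insert = fragment[fragment_sites[0] + ECORI_CUT:fragment_sites[1] + ECORI_CUT]
    return vector[:cut] + insert + vector[cut:]


chimera = clone_fragment("CCCGAATTCGGG", "TTGAATTCATGCCCGAATTCAA")
print(chimera, find_sites(chimera))  # CCCGAATTCATGCCCGAATTCGGG [3, 15]
```

Both sites are regenerated at the junctions, so the insert can later be excised with the same enzyme.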

Cloned fragments can originate from naturally 
occurring DNA or from artificial constructs. For 
example, genes can be synthesized chemically and 
then inserted into plasmids for amplification and 
expression by microorganisms. The polymerase 
chain reaction makes it possible to amplify any seg- 
ment of a genome. Then the segment is easily placed in 
a plasmid for further amplification and study. 

It is often desirable to obtain large quantities of the 
protein product of a particular cloned gene. Special- 
ized plasmids have been constructed that contain an 
inducible promoter followed by multiple cloning sites 
to allow a gene to be inserted downstream from the 
promoter. Then the cloned gene can be expressed at 
high levels. Such plasmids are called expression vec- 
tors. Some expression vectors will cause a short stretch 
of particular amino acids to be added to one end of the 
protein during expression. Then the protein can be 
easily purified by adsorption to and elution from a 
column containing a reagent that specifically binds the 
extra amino acids. 
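At the DNA level, adding such a tag amounts to splicing extra codons into the reading frame. The sketch assumes an N-terminal tag of six histidines, one widely used example (the text names no specific tag), placed directly after the start codon, and checks that the reading frame is preserved. Real expression vectors usually supply the tag and any linker themselves.

```python
HIS_CODON = "CAT"  # one of the two histidine codons (CAT, CAC)


def add_n_terminal_tag(cds: str, tag_length: int = 6) -> str:
    """Insert `tag_length` histidine codons right after the ATG start codon."""
    cds = cds.upper()
    if not cds.startswith("ATG"):
        raise ValueError("coding sequence should begin with ATG")
    if len(cds) % 3:
        raise ValueError("coding sequence length must be a multiple of 3")
    return cds[:3] + HIS_CODON * tag_length + cds[3:]


tagged = add_n_terminal_tag("ATGAAAGGCTAA")  # hypothetical Met-Lys-Gly-stop gene
print(tagged)
print([tagged[i:i + 3] for i in range(0, len(tagged), 3)])  # codons stay in frame
```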


Concluding Remarks 


Many aspects of plasmid biology are understood in 
considerable detail, and that knowledge has allowed 
biologists to use plasmids to manipulate bacterial 
genetics in the laboratory. In natural populations, 
plasmids have served as markers for following patho- 
genic bacterial strains, and we now understand some 
of the population-based aspects of plasmid-borne 
virulence factors and antibiotic resistance. Since 
some plasmids can move from one bacterial species 
to another, it is easy to imagine how plasmids contrib- 
ute to the emergence of new human pathogens. 
Indeed, the E. coli strain O157:H7, which has been 
responsible for many cases of serious food poisoning, 
appears to have arisen recently through acquisition of
a large virulence plasmid and of phage-borne Shiga
toxin genes. There is
little doubt that plasmids will continue to attract the 
attention of geneticists. 


Further Reading 

Firth N, Ippen-Ihler K and Skurray R (1996) Structure and
function of the F factor and mechanism of conjugation. In:
Neidhardt FC, Ingraham J, Low K et al. (eds) Escherichia coli
and Salmonella: Cellular and Molecular Biology, pp. 2377-2401.
Washington, DC: American Society for Microbiology
Press.

Helinski D, Toukdarian A and Novick R (1996) Replication
control and other stable maintenance mechanisms of plas-
mids. In: Neidhardt FC, Ingraham J, Low K et al. (eds) Escher-
ichia coli and Salmonella: Cellular and Molecular Biology, pp.
2295-2324. Washington, DC: American Society for
Microbiology Press.


See also: Bacterial Genetics; Escherichia coli; Yeast 
Plasmids 


Pleiotropy 
K B Low 


Copyright © 2001 Academic Press 
doi: 10.1006/rwgn.2001.1001 


Pleiotropy refers to the condition where a single 
mutation causes more than one observable phenotypic 
effect or change in characteristic. For example, in the 
human genetic disease phenylketonuria (PKU), a single,
recessively inherited mutation inactivates the enzyme
phenylalanine hydroxylase, which converts phenylalanine
to tyrosine. In homozygous mutant individuals this results
in excessive amounts of phenylalanine, a deficiency of
tyrosine, and an excess of phenylpyruvic acid, an
alternative degradation product of phenylalanine. These
effects can result in mental
retardation and also abnormally light hair and skin 
color. Such a combination of phenotypic effects of a 
genetic defect is called a syndrome. 

In molecular terms, pleiotropy also refers to the 
regulation of more than one gene product by a single 
genetic element. In their studies of operon structure, 
Monod and colleagues (late 1950s) used Escherichia 
coli to show that a single mutation could affect the
expression of more than one gene together, such as
lacZ and lacY. One type of such a pleiotropic mutation
could be in the gene (lacI) that encodes the repressor,
which regulates transcription of the entire lacZ lacY
(and lacA) operon from one site at the beginning of
the operon. Another type of pleiotropic mutation
could be in the lac operator region, where the repres-
sor binds.
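The two kinds of pleiotropic lac mutation can be captured in a few lines of Boolean logic. This is a deliberately simplified sketch: it models only repressor-operator control and ignores catabolite activation, partial alleles and expression levels.

```python
LAC_OPERON = ("lacZ", "lacY", "lacA")


def expressed_lac_genes(lacI_functional: bool = True,
                        operator_functional: bool = True,
                        inducer_present: bool = False) -> tuple[str, ...]:
    """Return the operon genes transcribed under the given genotype and
    condition; the operon is read from one promoter, so all three genes
    are on or off together."""
    repressor_active = lacI_functional and not inducer_present
    repressed = repressor_active and operator_functional
    return () if repressed else LAC_OPERON


print(expressed_lac_genes())                           # wild type, no inducer
print(expressed_lac_genes(lacI_functional=False))      # lacI mutant
print(expressed_lac_genes(operator_functional=False))  # operator mutant
```

A single mutation in either lacI or the operator switches lacZ, lacY and lacA on together, even without inducer.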

Another class of pleiotropy in bacteria involves an 
altered gene product that controls more than one
operon. For example, the cya gene encodes adenylate
cyclase, whose product, cyclic AMP, is needed for the expression
of a number of operons involved in the breakdown
of certain carbon sources such as lactose, rhamnose, 
arabinose, and maltose. Thus, a mutation in cya can 
block the cell’s ability to utilize any of these sugars. 
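One way to see why a cya mutation is more broadly pleiotropic than a mutation in a single operon is to treat regulation as a dependency graph and ask what lies downstream of each gene. The graph below is a hypothetical simplification: it assumes adenylate cyclase acts by making cyclic AMP and collapses cyclic AMP and its receptor protein into a single node.

```python
from collections import deque

# Hypothetical, simplified map: each node lists what requires it.
REQUIRED_FOR = {
    "cya": ["cyclic AMP"],
    "cyclic AMP": ["lac operon", "rha operon", "ara operon", "mal operon"],
    "lac operon": ["lactose utilization"],
    "rha operon": ["rhamnose utilization"],
    "ara operon": ["arabinose utilization"],
    "mal operon": ["maltose utilization"],
}


def affected_traits(mutated: str) -> set[str]:
    """Breadth-first search for every observable trait downstream of `mutated`."""
    seen, queue = set(), deque([mutated])
    while queue:
        node = queue.popleft()
        for downstream in REQUIRED_FOR.get(node, []):
            if downstream not in seen:
                seen.add(downstream)
                queue.append(downstream)
    return {node for node in seen if node.endswith("utilization")}


print(sorted(affected_traits("cya")))         # four sugar-utilization traits
print(sorted(affected_traits("lac operon")))  # lactose utilization only
```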

The pleiotropic effects of a mutation do not have to 
be all detrimental or all beneficial, but can be a mixture 
of the two. Thus, over the course of evolution various 
pleiotropic mutations can alter a group of character- 
istics such that the individual can survive better under 
some environmental condition than before, but less 
well under a different, formerly favored, condition. 


See also: Character; Operon; Phenotype 


Plesiomorphy 

A plesiomorphy is one of a pair or series of homologs
that is hypothesized to have evolved before the others 
during evolutionary descent. Plesiomorphy is a rela- 
tive term. All plesiomorphies are apomorphies at 
more inclusive levels in the phylogeny. 


See also: Homology; Primitive Character; 
Symplesiomorphy 


Ploidy

Ploidy refers to the number of copies of the set of
chromosomes in a cell. For example, a haploid has 
one copy, a diploid two copies. 


See also: Aneuploid; Chromosome Number; 
Polyploidy 


Pneumonia Bacteria


Medical and Genetic Importance


The pneumonia bacteria, Streptococcus pneumoniae 
(formerly Diplococcus pneumoniae), are small, ovoid 
cells that form pairs or chains. This bacterium,
also known as the ‘pneumococcus’, is a major cause of
otitis media and lobar pneumonia. Although pneumococci
are normal denizens of the human nasopharynx, they
can produce painful ear infections in children and
life-threatening lung infections in adults. Before the
advent of antibiotics, the lung infections were frequently
fatal, and in elderly individuals they can still be.
The medical importance of these bacteria prompted 
intense investigative scrutiny, which led to discoveries 
of great genetic consequence. 


Discovery and Analysis of Genetic 
Transformation 


Pathogenic pneumococci are surrounded by a protect- 
ive polysaccharide capsule; strains that have lost the 
ability to make the capsule are noninfective. Frederick 
Griffith in 1928 reported that heat-killed encapsulated 
bacteria could transfer the ability to make a capsule to 
live nonencapsulated cells and render them infect- 
ive. Later, cell-free extracts were shown to effect this 
transformation. At the time, it was not realized that 
bacteria contain genes and that the change represented 
a genetic transformation. However, in 1944 Oswald 
Avery and colleagues showed that DNA was the trans- 
forming agent, a discovery that at once affirmed the 
existence of genes in bacteria and demonstrated that 
DNA was the genetic material. Further elucidation of
the mechanisms of transformation, DNA repair, and 
recombination in pneumonia bacteria depended on 
the quantitation of transformation initiated by Rollin 
Hotchkiss using drug-resistant mutants. Quantitative 
analysis showed that competence, or the ability of 
bacteria to be transformed, depended on conditions 
of culture growth. 


Competence and DNA Uptake 


Regulation of Competence 

The ability of pneumococci to take up DNA depends 
on an elaborate surface mechanism involving over a 
dozen proteins, most of which are made only during 
a brief period of dense culture growth. A quorum- 
sensing system depends on the excretion of a 17- 
amino-acid polypeptide that accumulates in the 
culture medium until it reaches a sufficient concentra- 
tion to act back on a cell membrane receptor. The 
receptor is a histidine kinase that can phosphorylate 
a response regulator, which is also part of the signaling 
system. The regulator, in turn, activates production of 
a component of RNA polymerase that enables tran-
scription of genes preceded by a unique promoter
sequence called a ‘combox.’ These genes encode com-
ponents of the DNA uptake mechanism. Such regula-
tion ensures that competence for DNA uptake occurs
only at high cell densities, when sufficient pneumococci
are available to serve as donors and recipients of
DNA.
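The density threshold can be illustrated with a toy model in which every cell secretes the signal peptide at a constant rate while the culture grows exponentially, and competence switches on once the accumulated peptide reaches a threshold. All parameter values are hypothetical, and the model leaves out peptide degradation and the positive feedback by which induced cells make more peptide.

```python
def competence_onset(n0: float = 1e6, doubling_min: float = 30.0,
                     secretion: float = 1e-9, threshold: float = 1.0,
                     dt: float = 0.1, t_max: float = 1_000.0):
    """Return (minutes, cell density) when accumulated peptide first reaches
    `threshold`, or None if it never does within `t_max`.

    Units are arbitrary: density in cells/ml, `secretion` in peptide units
    per cell per minute, integrated with simple Euler steps of `dt` minutes.
    """
    growth_per_step = 2.0 ** (dt / doubling_min)
    t, density, peptide = 0.0, n0, 0.0
    while t < t_max:
        if peptide >= threshold:
            return t, density
        peptide += secretion * density * dt
        density *= growth_per_step
        t += dt
    return None


onset = competence_onset()
print(f"competence after ~{onset[0]:.0f} min at ~{onset[1]:.1e} cells/ml")
```

Starting from a denser culture reaches the threshold sooner, which is the point of the quorum-sensing design: uptake machinery is made only when many potential donors are present.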


DNA Binding and Entry 

The first step in DNA uptake is binding of the donor 
molecule at one or more points on the cell surface, at 
which sites a break occurs in one strand of the duplex. 
Subsequently, action of a membrane nuclease releases 
oligonucleotide fragments from one DNA strand as 
the other strand enters the cell. That strand enters with 
its 3’ end first, and it is immediately bound by single- 
strand-binding proteins. At least ten different proteins 
situated in or near the cell membrane are involved in 
the entry process, and three more are required for 
processing and transporting these proteins. Duplex 
DNA from any species can be taken up by pneumo- 
cocci; specificity is required only for recombination of 
the internalized strand with the recipient chromo- 
some. 


Episomal Elements 


Phages 

About a dozen bacterial viruses have been isolated 
from S. pneumoniae. These include both lytic and 
temperate types. Some of the lytic phages contain 


abnormal DNA bases, which render them refractory 
to restriction endonucleases. The temperate phages 
can be induced by agents such as mitomycin-C, 
which act through the SOS repair system, but other 
aspects of SOS repair in enterobacteria, such as UV- 
induced mutagenesis, are absent in S. pneumoniae. The 
lytic enzymes associated with some of these phages 
are activated by choline-containing components of the 
cell wall, similarly to the autolytic enzyme responsible 
for the facile lysis of pneumonia bacteria. 


Plasmids 

Very few plasmids have been found in pneumonia 
bacteria. The strain originally isolated and used by 
Avery contained a 3-kb plasmid of unknown function, 
but this plasmid has been lost in most of the descend- 
ants of that strain. However, plasmids introduced 
from other streptococcal bacteria, such as Streptococ- 
cus agalactiae (pMV158) and Enterococcus faecalis 
(pAMβ1), replicate readily in S. pneumoniae.


Conjugative Transposons 

Although plasmids can confer drug resistance to 
pneumococci, antibiotic-resistant strains isolated 
from diseased patients have all contained chromo- 
somal resistance genes. These genes often occur in 
elements called conjugative transposons that can 
mobilize their transfer from one pneumococcal cell 
to another by conjugation. These elements may be as 
large as 60 kb and contain resistance genes to several 
different antibiotics as well as genes encoding the 
apparatus for conjugational transfer. 


Insertion Sequences 

A half dozen different insertion sequence (IS) ele- 
ments related to insertion sequence families found in 
other bacteria are found in S. pneumoniae. These elem- 
ents are approximately 1 kb in length and occur up to 
ten times in the genome. They can transpose them- 
selves within the cell, but unlike conjugative trans- 
posons, they cannot transfer themselves to another 
cell. Pneumonia bacteria contain two classes of smaller 
DNA repeats: approximately 25 ‘BOX’ and 100 ‘RUP’
elements, which are composed of 104 and 107 base 
pairs, respectively. 


DNA Damage, Repair, and 
Recombination 


Genetic Chemistry 

Pneumococcal transformation allows a quantitative 
assessment of DNA damage by physical and chem- 
ical agents. Heating at temperatures above the DNA 
‘melting point’ causes a precipitous decline in trans-
forming activity due to strand separation. Annealing
at a lower temperature restores the duplex. Discovery
of such ‘renaturation’ stimulated a vast body of experi- 
mental work based on nucleic acid hybridization. 
Heating at lower temperatures causes a gradual loss 
of biological activity due to depurination and conse- 
quent strand breakage. 
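The ‘melting point’ referred to here depends on base composition. A widely used empirical rule for long DNA in standard saline citrate (0.15 M NaCl, 0.015 M sodium citrate), due to Marmur and Doty, is Tm ≈ 69.3 + 0.41 × (% G+C) °C. The sketch applies it to a sequence's G+C content; it is only an approximation for long duplexes under those salt conditions, and the example sequence is hypothetical.

```python
def gc_percent(seq: str) -> float:
    """Percentage of G and C among the A, C, G, T bases of `seq`."""
    seq = seq.upper()
    bases = sum(seq.count(b) for b in "ACGT")
    if bases == 0:
        raise ValueError("sequence contains no A, C, G or T")
    return 100.0 * (seq.count("G") + seq.count("C")) / bases


def marmur_doty_tm(gc: float) -> float:
    """Approximate melting temperature (deg C) of long duplex DNA in standard
    saline citrate, from its G+C percentage (empirical Marmur-Doty relation)."""
    if not 0.0 <= gc <= 100.0:
        raise ValueError("G+C content must be between 0 and 100 percent")
    return 69.3 + 0.41 * gc


example = "ATGCAATTGCATATTGCATA"  # hypothetical sequence, 30% G+C
print(f"{gc_percent(example):.0f}% G+C -> Tm ~ {marmur_doty_tm(gc_percent(example)):.1f} deg C")
```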


Modes of Recombination 


1. Chromosomal transformation in pneumococcus
results from a donor DNA strand segment re-
placing its homolog in the recipient chromosome.
Plasmid transformation, in which part of the plas-
mid genome is replaced, is similar. Both processes
depend linearly on DNA concentration.

2. Plasmid transfer into a plasmid-free cell requires
two entry events to reconstitute the replicon, because
donor plasmid DNA is broken and reduced to single
strands during uptake. Plasmid establishment therefore
depends quadratically on DNA concentration.

3. Chromosomal facilitation: a plasmid containing a
chromosomal DNA segment, however, can interact
with the chromosome, so that its establishment can
occur with a single entry event.

4. Circular integration into the chromosome of non-
replicative circular DNA with chromosomal
homology occurs by a single crossover. This mode
is useful for introducing mutations into genes or
adding genes ectopically to the chromosome.
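The difference between single-hit and two-hit processes is easiest to see on a log-log plot: a yield proportional to C^k gives a straight line of slope k. The sketch fits that slope by least squares, one simple way (an assumption of this example, not a procedure from the text) to tell linear chromosomal transformation (k ≈ 1) from quadratic plasmid establishment (k ≈ 2). The data are synthetic.

```python
import math


def log_log_slope(concentrations: list[float], yields: list[float]) -> float:
    """Least-squares slope of log(yield) against log(concentration)."""
    if len(concentrations) != len(yields) or len(yields) < 2:
        raise ValueError("need at least two paired measurements")
    xs = [math.log(c) for c in concentrations]
    ys = [math.log(y) for y in yields]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


dna = [0.01, 0.02, 0.05, 0.1]          # synthetic DNA concentrations
chromosomal = [500 * c for c in dna]    # one entry event: linear
plasmid = [2000 * c ** 2 for c in dna]  # two entry events: quadratic
print(round(log_log_slope(dna, chromosomal), 2), round(log_log_slope(dna, plasmid), 2))
```

In practical terms, doubling the DNA concentration doubles the first yield but quadruples the second.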


Gene Mapping

Extraction of DNA from pneumococcal cells gener- 
ally breaks the 2.2-megabase chromosome randomly 
into fragments averaging about 30 kb in length.
Randomly cleaved strand segments averaging 3 kb in
length are integrated into the chromosome during
transformation. These processes separate genetic mark-
ers so that their cotransformation frequency decreases
as the distance between them in the chromosome increases.
On this basis genes and their mutations have been 
mapped with considerable precision, particularly at 
the loci conferring maltosaccharide utilization and 
aminopterin resistance. At both loci, recombination 
frequencies between linked markers correspond to 
0.03% per nucleotide. The entire chromosome of 
S. pneumoniae has been physically mapped and the 
genomic nucleotide sequence determined. 
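The figure of 0.03% per nucleotide lets a recombination frequency be converted into a physical distance and back. The sketch assumes the relation stays linear, which can hold only for closely linked markers well inside a single integrated segment; the example values are hypothetical.

```python
RECOMBINATION_PER_NT = 0.0003  # 0.03% per nucleotide, from the text


def recombination_frequency(distance_nt: float) -> float:
    """Expected recombination frequency between two closely linked markers."""
    return RECOMBINATION_PER_NT * distance_nt


def distance_from_frequency(frequency: float) -> float:
    """Estimated separation, in nucleotides, from an observed frequency."""
    return frequency / RECOMBINATION_PER_NT


print(recombination_frequency(100))    # markers 100 nt apart: about 0.03 (3%)
print(distance_from_frequency(0.015))  # 1.5% recombination: about 50 nt
```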


Mismatch Repair 

Genetic analysis revealed strong marker effects on 
recombination frequencies. These effects were attrib- 
uted to a system of DNA base mismatch correction, 
called Hex, that acts on the heteroduplex product of 
transformation and affects certain mismatches more 
than others. The Hex system also acts after inaccurate 
DNA replication to prevent mutations. Homologous 
systems are found in other bacteria and in eukaryotes; 
defects in the human repair genes predispose cells to 
cancer. 


Biochemical Systems 


Gene Expression 

Many of the signals governing transcription and trans- 
lation are similar in S. pneumoniae and other bacteria, 
such as Escherichia coli. However, in S. pneumoniae a 
significant number of gene transcripts lack ribosome- 
binding sites complementary to the 3’ end of 16S 
rRNA. Also the -10 promoter sites are generally
stronger in S. pneumoniae, being of the extended
type, TNTGNTATAAT.
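A simple pattern search illustrates how such extended -10 elements can be located in a sequence. The sketch assumes the consensus reads TNTGNTATAAT (N = any base) and reports only exact matches on the given strand; real promoters often deviate from consensus, so this is a starting point rather than a promoter predictor, and the test sequence is hypothetical.

```python
import re

# Extended -10 consensus TNTGNTATAAT, N = any base; the lookahead lets
# overlapping matches be reported.
EXTENDED_MINUS_10 = re.compile(r"(?=(T[ACGT]TG[ACGT]TATAAT))")


def find_extended_minus_10(seq: str) -> list[int]:
    """0-based start positions of exact consensus matches on this strand."""
    return [match.start() for match in EXTENDED_MINUS_10.finditer(seq.upper())]


print(find_extended_minus_10("GGGTTTGCTATAATGG"))  # [3]
```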


Cloning in S. pneumoniae 

Attempts to clone pneumococcal genes in E. coli were 
often frustrated by the strong promoters associated 
with them, which rendered the genes toxic for this 
host. Therefore, a cloning system was developed 
with S. pneumoniae as host and a derivative of 
pMV158 as vector. 


Folate Biosynthesis 

Folic acid is an important vitamin and an essential 
component of all living cells. The molecular genetics 
of its biosynthesis was first determined in S. pneumo- 
niae, using the pneumococcal cloning system. An 
operon containing four genes encodes five enzymes 
responsible for converting guanosine triphosphate, 
p-aminobenzoate, and glutamate to folate. Mutations 
in a gene encoding one of the enzymes, dihydroptero- 
ate synthase, confer resistance to sulfonamide drugs. 
Many other pneumococcal genes have been investi- 
gated, with particular emphasis on those affecting 
drug-resistance and virulence. 


Virulence Genes 


Surface Proteins 

Two distinct surface proteins, PspA and PsaA, extend 
out from the cell wall. Mutations in genes encoding 
them render the bacteria less able to persist and kill a 
mouse host. PsaA is an adhesin, which enables the 
bacteria to bind to tissue cell receptors. 


Pneumolysin 

Pneumolysin is a major virulence factor that exerts its 
toxic and lytic effects by producing pores in target 
cells. It is a cytoplasmic protein that is released only 
by bacterial lysis. Although the pneumococcal auto- 
lytic enzyme has no direct pathogenic activity, mut- 
ants altered in the gene encoding it are less virulent 
because they fail to release pneumolysin. Partial lysis
of pneumococcal cell populations may be important in 
pathogenesis and as a source of donor DNA in trans- 
formation. 


Glycosylases 

Pneumococci excrete various enzymes active on poly- 
saccharides and glycoproteins. Among them are a 
neuraminidase and a hyaluronidase that could assist 
in bacterial invasion of host tissues. 


Population Genetics 


Several systems of biological importance to S. pneu- 
moniae exist in two or more states in populations of 
the bacteria. This population diversity must have sur- 
vival value for the species. In the following examples, 
similar mechanisms of allelic substitution by a multi- 
gene cassette are responsible for changes of state. 


Restriction Enzymes 

Cells of S. pneumoniae contain either the DpnI or 
DpnII restriction system. The DpnI endonuclease
recognizes and cleaves the methylated DNA sequence
5’-GmATC; cells that produce it contain unmodified
DNA. The DpnII system is complementary to the
DpnI system in that it recognizes the unmethylated
sequence 5’-GATC. Unlike other restriction systems, it
encodes two methyltransferases, DpnM and DpnA,
which methylate double- and single-stranded DNA,
respectively. The DpnII endonuclease cleaves un-
methylated, double-stranded DNA. Because trans-
forming DNA enters the cell as a single strand, which
neither endonuclease attacks, these systems can block
phage infection without interfering with genetic
transformation between cells with different systems.
The dual systems may
prevent viral epidemics from wiping out an entire 
population. 
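The complementary logic of the two systems can be written out explicitly. The sketch only restates the rules given above, under simplifying assumptions: DNA made in a DpnII cell is taken to be fully methylated at GATC, DNA made in a DpnI cell unmethylated, and hemimethylated DNA is ignored.

```python
def is_restricted(host: str, methylated: bool, double_stranded: bool) -> bool:
    """Would GATC-containing DNA be cleaved on entering a cell of type `host`?"""
    if host not in ("DpnI", "DpnII"):
        raise ValueError("host must be 'DpnI' or 'DpnII'")
    if not double_stranded:
        return False  # neither endonuclease cleaves single-stranded DNA
    # DpnI cuts methylated GmATC; DpnII cuts unmethylated GATC.
    return methylated if host == "DpnI" else not methylated


def methylated_when_grown_in(host: str) -> bool:
    """DpnII cells methylate their DNA (DpnM, DpnA); DpnI cells do not."""
    return host == "DpnII"


for source in ("DpnI", "DpnII"):
    for recipient in ("DpnI", "DpnII"):
        meth = methylated_when_grown_in(source)
        phage = is_restricted(recipient, meth, double_stranded=True)
        transforming = is_restricted(recipient, meth, double_stranded=False)
        print(f"{source} -> {recipient}: phage cut={phage}, "
              f"transforming DNA cut={transforming}")
```

Phage DNA is cut only when it moves between cells with different systems, while single-stranded transforming DNA is never cut.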


Capsule Synthesis 

The polysaccharide capsule that surrounds the 
pneumococcal cell is essential for its virulence. More 
than 80 different capsule types exist. Genetic investi- 
gation of several capsular types revealed that the genes 
for their biosynthesis were present at the same genetic 
locus. Immunity to pneumococcal infection is dir- 
ected mainly to the capsule, so the multiplicity of 
capsule types is clearly beneficial to the pathogen. 
Effective vaccines, therefore, must be multivalent. 


Competence Control 

Pneumococci have two distinct, but closely related, 
systems for regulating competence. They differ by 
several amino acids in the competence-stimulating 
peptide and in the cognate receptor portion of the 
transmembrane histidine kinase. The result is to 
remove part of the pneumococcal population from 
quorum sensing, but why this is beneficial is unclear. 


Mosaic Genes 

The use of drugs to treat pneumonia has given rise to 
resistant strains. The genetic basis for resistance to 
numerous drugs and antibiotics has been determined, 
and it usually depends on a single mutation in a critical
gene. However, penicillin resistance is conferred in 
steps by changes in several genes. Some of these 
genes encode proteins that normally bind penicillin 
but that have been rendered resistant by recombin- 
ation with genes from related streptococci. Such mosaic 
genes are formed by horizontal transformation. The 
fact that drug resistance mutations and capsular types 
are readily transferred into and among populations of 
pneumonia bacteria poses a real threat to the treat- 
ment of this disease and may require continuous 
development of new drugs. 


Further Reading 
Lacks SA (1998) DNA repair and mutagenesis in Streptococcus
pneumoniae. In: Nickoloff JA and Hoekstra MF (eds) DNA 
Damage and Repair, vol. 1, DNA Repair in Prokaryotes and 
Lower Eukaryotes, p. 263. Totowa, NJ: Humana Press. 

Lacks SA (1999) DNA uptake by transformable bacteria. In: 
Broome-Smith JK, Baumberg S, Stirling CJ and Ward FB 
(eds) Transport of Molecules across Microbial Membranes, 
p. 138. Cambridge: Cambridge University Press. 

Tomasz A (ed.) (2000) Streptococcus pneumoniae: Molecular 
Biology and Mechanisms of Disease. Larchmont, NY: Mary 
Ann Liebert Inc. 

Genome sequence information for Streptococcus pneumoniae 
can be obtained from The Institute for Genomic Research 
website at http://www.tigr.org. 


See also: Bacterial Genetics; DNA; DNA Repair; 
Genetic Recombination 
Point mutations are changes in sequence involving
single base pairs. 


See also: Mutation 


Polarity 
Polarity is the effect of a mutation in one gene on the
expression (transcription or translation) of subsequent 
genes in the same transcription unit. 


See also: Gene Expression; Mutation 


Polaron 
The polaron is the unit of polarity of gene conversion. Most well-studied genes show a gradient in the frequency of non-Mendelian segregation (conversion and postmeiotic segregation) in meiotic tetrads. The gradient can also be seen in an inequality of recombinant meiotic products carrying the parental combinations of flanking markers and, in two-point cross tetrad data, in a preponderance of intragenic recombinants resulting from conversion of one marker over the other. The gradient has been suggested to represent a decreasing probability of hybrid DNA occurring at greater distances from the initiation site. Alternatively, it may reflect an increasing probability of heteroduplex DNA being corrected by restoration of the parental genotype of the chromatid rather than by conversion to the genotype of the other parent.
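The two explanations can be compared with a toy calculation. This sketch assumes that non-Mendelian segregation requires the marker to lie in hybrid DNA and the mismatch not to be restored to the parental genotype; the exponential tract-length distribution and every parameter value are hypothetical, chosen only to show that either mechanism alone produces a declining gradient.

```python
import math

# Toy comparison of two explanations for the conversion gradient.
# All parameter values are hypothetical and for illustration only.


def nms_frequency(p_hybrid: float, p_restore: float) -> float:
    """Non-Mendelian segregation frequency: the marker must lie in hybrid DNA
    and the mismatch must not be restored to the parental genotype."""
    return p_hybrid * (1.0 - p_restore)


def hybrid_dna_model(distance_bp: float, mean_tract_bp: float = 1000.0,
                     p_restore: float = 0.5) -> float:
    """Hybrid DNA becomes rarer with distance from the initiation site
    (exponentially distributed tract lengths assumed); restoration is constant."""
    return nms_frequency(math.exp(-distance_bp / mean_tract_bp), p_restore)


def restoration_model(distance_bp: float, p_hybrid: float = 0.2,
                      restore_per_bp: float = 0.0004) -> float:
    """Hybrid DNA is equally likely everywhere; the chance of restoration
    rises linearly with distance (capped at 1)."""
    return nms_frequency(p_hybrid, min(1.0, restore_per_bp * distance_bp))


if __name__ == "__main__":
    for d in (0, 500, 1000, 2000):
        print(d, round(hybrid_dna_model(d), 3), round(restoration_model(d), 3))
```

Because both models yield a monotonic decline, the shape of the gradient alone cannot distinguish them.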

The gradient of polarity is often high at the pro- 
motor end of the gene, but there are cases where the 
high conversion end is the end away from the pro- 
motor. A few genes have been described which show 
gradients that are high at both ends and low in the 
middle. In Saccharomyces cerevisiae, the gradient 
declines with increasing distance from the site outside 
the gene at which initiation of the recombination 
events occurs. 


Further Reading 

Hastings PJ and Whitehouse HLK (1964) A polaron model of 
genetic recombination by the formation of hybrid deoxy- 
ribonucleic acid. Nature 201: 1052-1054. 

Nicolas A and Petes TD (1994) Polarity of meiotic gene conver- 
sion in fungi: contrasting views. Experientia 50: 242-252. 


See also: Conversion Gradient; Gene Conversion 
It has been known for some time that eukaryotic messenger RNAs (mRNAs) have 3' poly(A) tails when they are exported from the nucleus to the cytoplasm. The poly(A) tracts are not encoded within genes but rather represent posttranscriptional additions that are made in the nucleus, catalyzed by the enzyme poly(A) polymerase. The mRNA sites for addition of poly(A) tails are determined by conserved nucleotide sequences at or near the sites and by other factors that contribute specificity and processivity to the reactions. The poly(A) tracts of both nuclear RNA and mRNA are associated with PABP, the poly(A)-binding protein, so that a common feature of mRNA in most eukaryotic organisms is a 3' end consisting of a stretch of poly(A) bound to a large mass of protein. Tail lengths are not uniform across phylogenetic lines; they range from 60-80 residues in yeast to 200-250 in mammals. In the cytoplasm, however, poly(A) tails become shorter as the mRNA ages, and in some instances they may be removed completely.
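Because the tail is added after transcription rather than encoded in the gene, its length can be estimated from a sequenced RNA by measuring the 3'-terminal run of A residues. A minimal sketch follows; the `min_len` cutoff is a hypothetical parameter intended to avoid counting short A runs that happen to be encoded at the end of a gene.

```python
def poly_a_tail_length(rna: str, min_len: int = 10) -> int:
    """Length of the 3'-terminal run of A residues, or 0 if it is shorter
    than min_len (a hypothetical cutoff for genomically encoded A runs)."""
    rna = rna.upper()
    run = len(rna) - len(rna.rstrip("A"))
    return run if run >= min_len else 0


def remove_poly_a(rna: str, min_len: int = 10) -> str:
    """Return the RNA with its poly(A) tail, if any, removed."""
    n = poly_a_tail_length(rna, min_len)
    return rna[:len(rna) - n]


if __name__ == "__main__":
    body = "AUGGCCUGCGC"  # hypothetical mRNA body
    print(poly_a_tail_length(body + "A" * 70))   # yeast-like tail: 70
    print(poly_a_tail_length(body + "A" * 220))  # mammal-like tail: 220
```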

What is the role of poly(A)? Consistent with the 
routine and often substantial changes that mark 
poly(A) tract length during the cytoplasmic lifetime 
of an mRNA, it has been shown that the status 
of poly(A) can be a determinant of both mRNA 
translational efficiency and the time of onset of 
mRNA decay. In several (but not all) situations, 
poly(A) tails confer stability upon mRNA. Removal 
of the poly(A) tail precedes the degradation of certain 
mRNAs; stability of the mRNA is likely to be con- 
nected with poly(A), although it is not clear whether 
the relationship is universal. The ability of the poly(A) 
to protect mRNA against degradation requires bind- 
ing of the PABP. Removal of poly(A) inhibits the 
association of ribosomal subunits and initiation of 
translation in vitro, and depletion of PABP has the 
same effect in yeast in vivo, but it is not clear whether 
these effects are due to a direct influence of poly(A)- 
PABP on the initiation reaction or have some indirect 
influence. In contrast, in early embryonic development, 
there are many examples where polyadenylation of a 
particular mRNA is correlated with its translation. In 
some cases, there seems to be a correlation between 
storage of mRNAs in a nonpolyadenylated form, and 
their activation for translation when poly(A) is added. 
In other cases, the translation of poly(A)+ mRNAs
is reduced when they are deadenylated. It is still not
understood how the polyadenylation or deadenyl- 
ation is related to the control of the translational 
utilization of the mRNA. 

More recently, RNAs with poly(A) tails have been 
found in bacteria. The poly(A) tails have been found 
to reduce the stability of regulatory plasmid RNAs 
and mRNAs, but the mechanisms are not clear. Final- 
ly, it has been shown in bacteria that stable RNAs such 
as tRNA, tmRNA, and 4.5S, 6S, and ribosomal RNAs 
can be found in polyadenylated forms. These findings 
indicate that polyadenylation is not unique to mRNA 
and that it serves a more general function in RNA 
metabolism. 


See also: Messenger RNA (mRNA) 
Polyadenylation is the addition of a sequence of poly-
adenylic acid to the 3’ end of a eukaryotic RNA 
following its transcription. 


See also: Transcription 
A polycistronic messenger RNA (mRNA) is a single RNA molecule that encodes more than one protein by virtue of containing information from two or more sequential, functional open reading frames (ORFs). Each protein is therefore translated independently, as a separate polypeptide. (This is in contrast to polyproteins, which are synthesized as a single polypeptide and then posttranslationally processed into a number of different functional proteins.) Polycistronic mRNA is found almost exclusively in prokaryotes, since the mechanism of translational initiation by prokaryotic ribosomes allows the ribosome to readily initiate at start codons located internally on mRNA.
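Locating the sequential ORFs on a message is the first step in reading a polycistronic mRNA. The sketch below scans all three reading frames for AUG-initiated ORFs ending at a standard stop codon; it assumes the standard genetic code's stop codons and ignores alternative start codons, which real prokaryotic messages sometimes use.

```python
STOP_CODONS = {"UAA", "UAG", "UGA"}


def find_orfs(mrna: str, min_codons: int = 10) -> list[tuple[int, int]]:
    """Return (start, end) coordinates of AUG-initiated ORFs in all three frames.

    Coordinates are 0-based and half-open; `end` includes the stop codon.
    An AUG with no in-frame stop before the end of the message is ignored,
    and AUGs nested inside an ORF in the same frame are not reported separately.
    """
    mrna = mrna.upper().replace("T", "U")
    orfs = []
    for frame in range(3):
        i = frame
        while i + 3 <= len(mrna):
            if mrna[i:i + 3] == "AUG":
                j = i + 3
                while j + 3 <= len(mrna) and mrna[j:j + 3] not in STOP_CODONS:
                    j += 3
                if j + 3 > len(mrna):
                    break  # no in-frame stop codon: stop scanning this frame
                if (j - i) // 3 >= min_codons:
                    orfs.append((i, j + 3))
                i = j + 3  # resume scanning after the stop codon
            else:
                i += 3
    return sorted(orfs)


if __name__ == "__main__":
    # Hypothetical two-cistron message: ORF 1, a 2-nt spacer, then ORF 2.
    toy = "AUG" + "GCU" * 3 + "UAA" + "GG" + "AUG" + "AAA" * 2 + "UGA"
    print(find_orfs(toy, min_codons=1))  # [(0, 15), (17, 29)]
```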

Most operons yielding polycistronic mRNA contain genes whose synthesis, or whose efficient function, would seem to require coordinate regulation, for example, those encoding enzymes in a biosynthetic pathway. There might be a natural advantage in keeping such genes together in organisms where polycistronic mRNA can be translated. When comparing sequenced genomes of widely divergent organisms, the most highly conserved polycistronic mRNAs seem to be those from operons encoding certain ribosomal proteins.

Typically, each gene on a polycistronic mRNA 
contains its own Shine-Dalgarno sequence, a sequence 
that is involved in initiation by prokaryotic ribo- 
somes, and therefore each gene can be translated 
independently. However, there are instances where 
translational initiation at one site is dependent on 
translation of some other region of the mRNA. In 
some cases, changes in the secondary structure of the 
polycistronic mRNA ‘induce’ initiation sites for other 
proteins, e.g., in the small RNA bacteriophage MS2 
the translation of the replicase gene is dependent on 
translation of the coat protein gene and concomitant 
disruption of secondary structure. In addition, trans- 
lational reinitiation has been observed in cases where 
the stop codon of the upstream gene is close to the 
start codon of the next gene. Here the 30S ribosomal 
subunit apparently does not dissociate from the message before reinitiating at a nearby site. Such reinitiation can make the efficiency of the downstream initiation site much greater than if the site were on monocistronic mRNA. For this reason, some cloning
vectors designed to yield very high levels of expression 
of a gene have incorporated into them a small, up- 
stream, functional ORF. Translational coupling is a 
type of regulation where the translation of a distal 
gene is very highly dependent on translation of the 
gene immediately upstream in the polycistronic mes- 
sages and either of the two mechanisms mentioned 
above could be involved. 
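Two sequence features discussed above, a Shine-Dalgarno sequence upstream of each start codon and a short distance between an upstream stop codon and the next start codon, can be checked with a few lines of code. This is a sketch under assumptions: `SD_CORE` uses the core GGAGG of the E. coli-type consensus, the 20-nt search window and the `max_gap` threshold are hypothetical, and real initiation sites vary considerably.

```python
SD_CORE = "GGAGG"  # assumption: core of the E. coli-type Shine-Dalgarno consensus


def has_shine_dalgarno(mrna: str, start: int, upstream: int = 20) -> bool:
    """True if SD_CORE lies within `upstream` nt 5' of the start codon at `start`."""
    mrna = mrna.upper().replace("T", "U")
    region = mrna[max(0, start - upstream):start]
    return SD_CORE in region


def intercistronic_gap(upstream_orf: tuple[int, int],
                       downstream_orf: tuple[int, int]) -> int:
    """Nucleotides between the end of the upstream stop codon and the next
    start codon (0-based, half-open ORF coordinates). Negative means overlap."""
    return downstream_orf[0] - upstream_orf[1]


def may_reinitiate(gap: int, max_gap: int = 10) -> bool:
    """Flag a stop/start arrangement close enough for possible reinitiation.
    max_gap is a hypothetical threshold, not a measured value."""
    return gap <= max_gap


if __name__ == "__main__":
    print(has_shine_dalgarno("CCAGGAGGUAAAUAAUG", 14))  # True
    gap = intercistronic_gap((0, 15), (17, 29))
    print(gap, may_reinitiate(gap))                     # 2 True
```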

Although polycistronic mRNA is typical of prokaryotes, most messages produced in these organisms come from transcriptional units that yield monocistronic mRNA. In Escherichia coli, less than 30% of the mRNA is polycistronic, and polycistronic mRNA seems to be even less common than this in the Archaea.


See also: Cistron; Open Reading Frame; Operon; 
Translation 
See: Complex Traits
Polymerases are enzymes that synthesize DNA or
RNA polymers by successive attachment of nucleo- 
side 5’-triphosphates to the 3’-OH end of a growing 
chain. Polymerases play an essential role in a variety of 
cellular processes and are used by all organisms, ran- 
ging from viruses to humans, to maintain and propagate 
cellular life. DNA polymerases are necessary for 
replication of DNA during cell division, as well as 
for repair of DNA damaged by chemicals or UV 
light. RNA polymerases transcribe DNA to RNA, 
thereby initiating conversion of the genetic informa- 
tion into protein and nucleic acid machinery respon- 
sible for cellular functions (Kornberg and Baker, 
1991). 

Enzymatic synthesis of these biological polymers is 
a template-dependent process. In other words, DNA 
and RNA polymerases use single-stranded DNA as a 
template for the synthesis of a new and complemen- 
tary strand of DNA or RNA, respectively. The tem- 
plate strand is generated by transient unwinding of the 
DNA double helix or by degradation of one strand of 
the duplex. Once the DNA template is available, the 
polymerase can link complementary nucleotides one 
after another to create the new strand. DNA poly- 
merases do not initiate de novo synthesis of a polymer. 
DNA synthesis occurs only by extension of an RNA 
primer annealed to the template, or in some cases by 
extension of an existing 3’-OH end of a DNA strand. 
Once the polymerase recognizes and binds the primer- 
template, DNA synthesis can begin. A deoxyribo- 
nucleoside 5’-triphosphate (dNTP) complementary to 
the next nucleotide on the template binds the catalytic 
site, and the polymerase aids nucleophilic attack on 
this incoming nucleotide by the 3’-OH group of the 
primer. As a result, the primer gets covalently linked 
to the α-phosphate of the nucleotide and pyrophosphate is released. Pyrophosphate release and its subsequent hydrolysis drive polymerization and render
the reaction essentially irreversible. Thus at the end of 
one catalytic cycle, primer length is increased by one 
nucleotide and a new base pair is formed between the 
primer (new polymer) and template DNA. The cata- 
lytic cycle for RNA synthesis is similar, except that 
RNA polymerases use ribonucleoside 5’-triphosphates 
(rNTPs), and they can initiate RNA synthesis de novo 
on template DNA after the duplex is unwound at 
specific initiation sites. 
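The template-dependent, primer-requiring synthesis described above can be mimicked at the sequence level. This sketch models only base pairing and direction, not chemistry: both strands are written 5'→3', the primer is assumed to anneal at the first perfectly complementary site, and each loop iteration stands for one catalytic cycle adding one nucleotide to the 3' end. Reading U as T lets the same function accept an RNA template (as used by reverse transcriptases, discussed below) or an RNA primer; RNA polymerases, which need no primer, are not modeled.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence written 5'->3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def extend_primer(template: str, primer: str) -> str:
    """Extend `primer` by copying `template` (both written 5'->3').

    The primer anneals antiparallel to the template; nucleotides are then
    added one at a time to its 3' end, each complementary to the next
    template base, until the 5' end of the template is reached.
    """
    t = template.upper().replace("U", "T")  # read an RNA template like DNA
    p = primer.upper()
    site = t.find(reverse_complement(p.replace("U", "T")))  # first site only
    if site == -1:
        raise ValueError("primer does not anneal to the template")
    new_strand = p
    # Walk the template toward its 5' end: one dNTP per catalytic cycle.
    for template_base in reversed(t[:site]):
        new_strand += COMPLEMENT[template_base]
    return new_strand


if __name__ == "__main__":
    print(extend_primer("ATGCCGTTAGC", "GCTA"))  # GCTAACGGCAT
```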
DNA Polymerases


Polymerase Structure 
DNA polymerases occur as single polypeptides or as 
multiprotein complexes that can vary greatly in com- 
position, structure, and function. Even a simple bac- 
terium like Escherichia coli has three different DNA 
polymerases, each with distinct functions in DNA 
metabolism. Recent structural studies have shown, 
however, that the catalytic domains of DNA poly- 
merases from several different organisms share strik- 
ing structural similarity and have analogous functions. 
The first polymerase structure to be elucidated 
was that of E. coli DNA polymerase I, an enzyme 
that is involved in the repair of damaged DNA. Poly- 
merase I has the distinctive shape of a ‘right hand’ 
complete with ‘fingers,’ ‘palm,’ and ‘thumb’ subdo- 
mains (Figure 1). The crystal structures of several 
polymerases solved since then show a similar arrange- 
ment of the subdomains, including the recently deter- 
mined bacteriophage T7 DNA polymerase structure 
(Brautigam and Steitz, 1998; Doublie et al., 1998). As 
shown in Figure 1, the catalytic site of the polymerase, where the incoming nucleotide is incorporated
into the growing chain, is located in a cavity on the 
palm domain. The fingers lie against the primer ter- 
minus, and the thumb domain contacts the DNA 
behind the primer terminus as it exits the catalytic site. 
Key, conserved acidic amino acids in the palm domain 
chelate two Mg2+ ions that are essential for the
nucleophilic attack of the primer on the incoming 
nucleotide. Other amino acids in the palm domain 
interact with the primer-template and may aid correct 
positioning of the DNA for the reaction. The fingers 
domain likely undergoes conformational changes
upon binding the incoming nucleotide, which may be
important for the catalytic mechanism. The thumb 
domain may play an important role in maintaining 
the polymerase’s hold on the primer-template 
duplex, thus helping the polymerase synthesize DNA 
processively. 


Polymerase Function 

Replication of genomic DNA is the primary function 
of DNA polymerases. Once the DNA is duplicated 
accurately, the cell can undergo division with each 
daughter cell receiving the complete genetic code of 
the organism. Polymerases responsible for DNA 
replication are complex multiprotein machines that 
can synthesize DNA with high speed, processivity, 
and fidelity. For example, in E. coli, the DNA poly- 
merase III holoenzyme synthesizes DNA at approxi- 
mately 750 nucleotides per second, and can extend 
a DNA strand for several thousand nucleotides without dissociating from the template. Several proteins accessory to the DNA polymerase make up the holoenzyme particle and provide activities that are essential for rapid and accurate DNA replication.

Figure 1 T7 DNA polymerase has the characteristic DNA polymerase structure with the 'fingers,' 'thumb,' and 'palm' domains, and the catalytic site in a cleft that binds primer-template DNA.

The holoenzyme particle contains two copies of the
polymerase that coordinate leading and lagging strand 
DNA synthesis. Each polymerase is associated with a 
ring-shaped protein clamp that encircles DNA and 
tethers the polymerase to the duplex, allowing the 
polymerase to replicate several thousand nucleotides 
processively. The holoenzyme also contains a clamp 
loader protein complex that assembles the circular 
clamps around DNA for use by the DNA polymerase. 
Similar to the E. coli polymerase III holoenzyme, 
replicative polymerases from other organisms, includ- 
ing humans, also use accessory proteins such as circu- 
lar clamps to ensure processive and fast DNA 
replication. A 3'→5' exonuclease activity is also associated with polymerase III and enables the holoenzyme to proofread newly synthesized DNA and
correct errors in replication as they occur. Such proof- 
reading activity is usually associated with DNA poly- 
merases, either in the form of a separate protein or as 
part of the polymerase protein itself, as seen in the T7 
DNA polymerase (Figure 1). 
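The speed quoted above can be put in perspective with simple arithmetic. The sketch assumes an E. coli chromosome of roughly 4.6 million base pairs replicated bidirectionally by two forks from a single origin, each moving continuously at 750 nucleotides per second; real forks pause, so this is a lower bound.

```python
def replication_time_s(genome_bp: float, rate_nt_per_s: float, forks: int = 2) -> float:
    """Minimum time to copy a genome if `forks` replication forks each
    move at `rate_nt_per_s` without pausing."""
    return genome_bp / (forks * rate_nt_per_s)


if __name__ == "__main__":
    ECOLI_GENOME_BP = 4_600_000  # approximate size of the E. coli chromosome
    t = replication_time_s(ECOLI_GENOME_BP, 750)
    print(f"{t:.0f} s (about {t / 60:.0f} min)")
```

At these assumed values the chromosome takes roughly 51 minutes to copy, which shows why both high speed and high processivity are needed.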

Polymerases responsible for DNA repair function 
by replacing damaged DNA with a newly synthesized 
strand to correct the defect. The E. coli DNA polymerase I plays an important role in DNA excision repair by filling in single-stranded gaps left in DNA,
following removal of damaged DNA by the excision 
machinery. The essential role of polymerases in DNA 
repair is illustrated by the fact that cells containing an 
inactive form of DNA polymerase I are highly sensi- 
tive to the damaging effects of UV light and X-rays as 
well as mutagenic chemicals. 

Reverse transcriptases are also DNA polymerases, but with one critical difference: unlike DNA replication and repair polymerases, reverse transcriptases
use an RNA template to synthesize DNA. Thus, these 
enzymes are used by retroviruses to copy the single- 
stranded viral genomic RNA into double-stranded 
DNA that is necessary to invade host organisms. 
The human immunodeficiency virus type 1 (HIV-1) 
reverse transcriptase has been exceptionally well scru- 
tinized in recent years. The polymerase domain of 
reverse transcriptase is very similar to that of DNA 
polymerases described above, indicating a similar cata- 
lytic mechanism for DNA polymer formation. In 
addition the enzyme has a ribonuclease domain that 
degrades the RNA template, allowing synthesis of a 
second DNA strand to form duplex DNA. Detailed 
crystallographic structures and mechanistic informa- 
tion on the HIV-1 reverse transcriptase have allowed 
design of specific and potent inhibitors of the enzyme, 
such as AZT and Nevirapine, that are used as drugs in 
the fight against HIV infection. 


RNA Polymerases 


RNA polymerases synthesize RNA polymers com- 
plementary to a DNA template, and thus transcribe 
information from genes into RNA. A DNA-dependent 
RNA polymerase binds specific initiation sites on the 
DNA known as promoters, and unwinds the duplex 
just enough to start de novo synthesis on the template. 
After linking the first two nucleotides together, the 
polymerase elongates the RNA polymer in the 5'→3'
direction as it moves on the template. Transcription 
ends at a terminator site on the DNA which signals 
the polymerase to stop RNA synthesis. The catalytic 
site on RNA polymerases and the mechanism of 
RNA polymer formation are likely similar to those 
observed for DNA polymerases, except for the 
obvious difference that RNA polymerases use rNTPs 
instead of dNTPs. Beyond the basic similarities, how- 
ever, RNA synthesis in the cell is a highly complex and 
distinctly different process from synthesis of DNA. 

Gene expression plays a prominent role in the cor- 
rect development and functioning of an organism, 
therefore transcription of genetic information is a 
highly regulated cellular process. Regulation of gene 
expression can occur at initiation of transcription or 
during elongation of the RNA polymer. Accordingly, 
RNA polymerases in both prokaryotes and eukaryo- 
tes are associated with several accessory protein fac- 
tors that interact with promoters and other proteins to 
ensure that genes are transcribed from the right sites 
and under the right conditions. For example, the 
eukaryotic RNA polymerase II uses at least six transcription factors (TFIIA, TFIIB, TFIID, TFIIE, TFIIF, and TFIIH) as well as other enhancers or
repressors when synthesizing RNA transcripts. In 
fact, the more complex the organism, the more elabor- 
ate the transcription machinery appears to be. Since 
cells in higher eukaryotes are well differentiated, only 
a small proportion of the total genetic information is 
used by any one cell type at one time, which can only 
happen because transcription in complex organisms is 
so finely controlled. 

Ongoing studies of the structure and mechanism of 
enzymes continue to provide detailed information on 
how cells maintain life. Specifically, the information 
on DNA and RNA polymerases, which are crucial to 
all life forms on earth, is essential for understanding 
how life evolved as well as for understanding how 
organisms grow and replicate to propagate life. 
The polymerase chain reaction (PCR) is a method that
can amplify a single gene from a genome so it can be 
analyzed. It can start with extremely small samples 
containing only one or a few molecules of the target 
DNA. These samples can be from forensic evidence, 
biopsy specimens of a few cells, or a few microbes. As 
a tool of genetic research in the laboratory, it is invalu- 
able for directed variation and construction of genes. 

As did not escape the attention of Crick and Watson 
in 1953, the base-pairing complementarity of the two 
strands of DNA suggests how DNA is replicated 
by life forms to make two copies from one. PCR can 
accomplish this replication in a test tube with only a 
few biochemical reagents, albeit only for short targets 
(currently those under 45 kb). Except for an enzyme 
(DNA polymerase) to do the copying and the primers to get the copying started, little else is needed to comprehend and use the PCR method to replicate a gene
of DNA in a sterile test tube without any cells. Kary 
Mullis won a share of the Nobel prize in 1993* for 
inventing PCR in 1983. After many and continuing 
improvements, PCR has become a major method of 
analysis and construction of DNA for research and 
practical purposes. 

*These other items were also the subject of Nobel prizes: DNA polymerase (Kornberg, in 1959) and nucleic acid oligomer synthesis chemistry (Khorana, in 1968).

Figure 1 The PCR cycle, starting from the DNA template and primers. Step A. Melt: 95°C for two seconds. Step B. Anneal: 60°C for 30 seconds; the primers, which are in large excess, form base-paired double helices with their complementary sequences. Step C. Extend: 68°C for 10 minutes; extension by thermostable DNA polymerase actually begins during the annealing step and continues during the extension step, incorporating dNTP subunits to make double-stranded DNA. Repeating the cycle of steps A, B, and C 20 times, just by shifting the temperature, yields 2^20 (about 1 million) copies of each input molecule of target DNA, assuming 100% efficiency; actual PCR reactions are not 100% efficient. A primer may carry an extra, optional, designed DNA sequence on its 5' portion. Extension past the other primer happens only in the first cycle for the exponentially amplifying DNA; linear amplification (by recopying the original strand) is negligible compared with the exponential amplification.

The basic scheme is shown in an example PCR in Figure 1. A span of DNA, which we will call the target, is the only part that is to be replicated, and the term amplification is more commonly used for this in vitro process. The target sequence is always between two 'primers,' which are short pieces of single-stranded DNA, usually 20-30 nucleotides in length. The sequence of the primers is chosen to
match the sequences at the borders of the target, and 
the primers can be supplied quickly and inexpensively 
by many companies. We will refer to the two strands 
of the DNA double helix as the top and the bottom 
strand (other terms for the two strands are Watson and 
Crick, sense and antisense, plus and minus). Arbitrarily, we will consider a map in which the left primer is a
piece of top strand sequence, and the right primer is a 
piece of bottom strand sequence. These sequences 
must be known, although the sequence in between 


them need not be known, unless nesting is intended 
(see below). 
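The primer convention just described can be written out directly: the left primer is copied from the top strand, and the right primer is bottom-strand sequence, i.e., the reverse complement of the right end of the top strand. This sketch assumes both primers are perfect matches and that each primer site occurs only once in the template; real primer design also weighs melting temperature and self-complementarity, which are not modeled here.

```python
from typing import Optional, Tuple

COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence written 5'->3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def design_primers(target_top: str, length: int = 20) -> Tuple[str, str]:
    """Return (left, right) primers, both written 5'->3'.

    Left primer: the first `length` nt of the top strand.
    Right primer: bottom-strand sequence, the reverse complement of the
    last `length` nt of the top strand.
    """
    if len(target_top) < 2 * length:
        raise ValueError("target is shorter than the two primers combined")
    target_top = target_top.upper()
    return target_top[:length], reverse_complement(target_top[-length:])


def predicted_product(template_top: str, left: str, right: str) -> Optional[str]:
    """Top strand of the amplified product, or None if a primer site is missing."""
    template_top = template_top.upper()
    start = template_top.find(left)
    end = template_top.find(reverse_complement(right))
    if start == -1 or end == -1 or end < start:
        return None
    return template_top[start:end + len(right)]


if __name__ == "__main__":
    left, right = design_primers("ACGTTTTTGGCA", length=4)  # toy 4-nt primers
    print(left, right)                                        # ACGT TGCC
    print(predicted_product("CCACGTTTTTGGCAAA", left, right)) # ACGTTTTTGGCA
```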

The C in PCR could just as well mean ‘cycle,’ 
because a few dozen cycles is what PCR consists of. 
Each cycle starts with separating the two strands of 
DNA, which is called melting or denaturation. This is 
accomplished by subjecting the reaction (a drop of 
solution in a closed vial) to a few seconds at 93- 
96°C, and its purpose is to provide single-stranded 
template for DNA polymerase to copy. 

Then, in the annealing step, the primers are put 
onto their matching locations: top (left) primer binds 


to its matching sequence on the bottom strand, point- 
ing its 3’ end to the right. Similarly, the right primer 
points leftward, in the opposite direction toward the 
target and the left primer. Because the primers are present in huge excess, they find their correct locations just by bumping into them, and the base pairs line up to lock them in place if the temperature is about 60°C.

Each cycle finishes with a few minutes at 68°C, 
during which the DNA polymerase extends the 
primer sequences, synthesizing a new strand of DNA. 
DNA polymerase can only add to the 3’ end of each 
primer. Monomer units of A, G, C and T, added as 
dATP, dGTP, dCTP, and dTTP, are included at the 
start for the DNA polymerase to incorporate into 
the extending DNA chains. 

That is one cycle, and most PCRs are 20 to 40 cycles. A machine, known as a PCR machine or a thermal cycler, methodically raises and lowers the temperature under computer control. After 30 cycles, each starting target gene of DNA is theoretically amplified by a factor of 2^30, but theoretical efficiency is never actually achieved in practice: 70% efficiency results in amplification by 1.70^30, and 90% efficiency in 1.90^30, etc.
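A thermal cycler program is essentially a list of temperature holds repeated for a set number of cycles. The sketch below represents one; the hold times are hypothetical, loosely following the text (a few seconds near 95°C, annealing at about 60°C, a few minutes at 68°C), and the time spent ramping between temperatures, which real instruments add, is ignored.

```python
from dataclasses import dataclass


@dataclass
class Step:
    name: str
    temp_c: float
    seconds: float


def run_time_minutes(steps: list[Step], cycles: int) -> float:
    """Total hold time of a cycling program, ignoring ramp time between steps."""
    return cycles * sum(step.seconds for step in steps) / 60


# Hypothetical program: hold times are illustrative, not a validated protocol.
PROFILE = [
    Step("melt", 95, 5),
    Step("anneal", 60, 30),
    Step("extend", 68, 120),
]

if __name__ == "__main__":
    for step in PROFILE:
        print(f"{step.name:>7}: {step.temp_c} C for {step.seconds} s")
    print(f"30 cycles: {run_time_minutes(PROFILE, 30):.1f} min of holds")
```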

Amplification by 2^30, even though it may take 40 cycles because of imperfect efficiency, can result in 1 µg of a single gene, starting from 1 ng of DNA that consisted of 10 million genes. One microgram of product DNA is enough for several analytical purposes such as detection, sizing, cloning, and sequencing.
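The efficiency arithmetic above follows from each cycle multiplying the target by (1 + E), where E is the fraction of molecules copied per cycle. The sketch below computes the fold amplification after n cycles and the number of cycles needed to reach a given factor; it assumes E stays constant, whereas in practice efficiency falls as reagents are consumed.

```python
import math


def amplification_factor(efficiency: float, cycles: int) -> float:
    """Fold amplification after `cycles` cycles when each cycle copies a
    fraction `efficiency` (0..1) of the molecules: (1 + E) ** n."""
    return (1.0 + efficiency) ** cycles


def cycles_needed(target_factor: float, efficiency: float) -> int:
    """Smallest number of cycles that reaches `target_factor` at constant efficiency."""
    exact = math.log(target_factor) / math.log(1.0 + efficiency)
    return math.ceil(exact - 1e-9)  # tolerance guards against floating-point round-up


if __name__ == "__main__":
    for e in (1.0, 0.9, 0.7):
        print(f"E = {e:.0%}: 30 cycles -> {amplification_factor(e, 30):.2e}-fold; "
              f"2^30-fold needs {cycles_needed(2 ** 30, e)} cycles")
```

At 70% efficiency, reaching 2^30-fold amplification takes 40 cycles, consistent with the figure quoted in the text.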


Improvements and Applications 


The PCR process is being continually improved by its 
users, who as a group have made it a rich collection of 
molecular techniques. No discussion of PCR is com- 
plete without an attempt to describe some of these 
applications and improvements. 


Taq DNA Polymerase 


The first improvements were made by Mullis’s col- 
leagues at the biotechnology company where he was 
working, Cetus (Saiki et al., 1988). The most import- 
ant was to use a thermostable DNA polymerase from 
a bacterium that grows in hot water. Thermus aqua- 
ticus DNA polymerase (Taq) can withstand heating to 
96°C for extended periods, and is enzymatically 
active between 60 and 70°C. Without heat resistance, 
the DNA polymerase enzyme would have to be added 
fresh for the extension step of each cycle. Such heat 
resistance is very unusual for enzymes, most of which 
would cook like an egg at such temperatures. Despite 
intensive searching in hot environments throughout 
the world, and the testing of hundreds of other thermostable DNA polymerases, the first one tried, Taq, is
still the most widely used. Useful variants of Taq are 
available which lack 5’-exonuclease, have a couple 
degrees more thermostability, can incorporate nucleo- 
tide analogs more readily, or all three. 


Gene Construction and Making Mutants 


Directed mutagenesis is used by scientists to create 
test mutations or test genes for research or commercial applications. Larger changes, such as hooking
genes or their control regions together, are made to 
achieve expression in various biological or in vitro 
systems. PCR allows many precise changes to be 
specified conveniently, merely by making changes 
in the primers. This is because primers will prime 
even though they are not a perfect match. They can 
have several changes or even a few missing nucleot- 
ides, as long as their 3’ end matches for the last 10 or 
so. Also, once the 3’ 20 or 30 bases match the target, 
the 5’ portion, consisting of 10 to 50 bases of any 
sequence desired by the scientist, can add new 
sequences to the ends of the target DNA, since it 
will function as template for DNA synthesis coming 
the other way. 
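The two primer tricks described above, point substitutions within the primer and a designed 5' tail, can be sketched as a small helper. The template sequence, the GAATTC tail (an EcoRI site), and the function names are hypothetical choices for illustration; the 3'-end check encodes only the rule of thumb from the text that roughly the last 10 nucleotides should match.

```python
from typing import Dict, Optional


def mutagenic_primer(template_top: str, start: int, length: int,
                     substitutions: Optional[Dict[int, str]] = None,
                     tail: str = "") -> str:
    """Forward primer copying template_top[start:start+length] (5'->3'),
    with point substitutions {offset_in_primer_body: new_base} and an
    optional 5' tail of arbitrary designed sequence."""
    body = list(template_top[start:start + length].upper())
    for offset, base in (substitutions or {}).items():
        body[offset] = base.upper()
    return tail.upper() + "".join(body)


def three_prime_end_matches(primer: str, template_top: str, end: int, n: int = 10) -> bool:
    """True if the last n nt of the primer match the template just upstream of `end`."""
    return primer.upper()[-n:] == template_top.upper()[end - n:end]


if __name__ == "__main__":
    template = "ATGGCTAGCAAAGGAGAAGAACTTTTCACT"  # hypothetical target sequence
    # Change template base 3 (G) to T and add a 5' EcoRI site as a tail.
    p = mutagenic_primer(template, 0, 20, {3: "T"}, tail="GAATTC")
    print(p)                                          # GAATTCATGTCTAGCAAAGGAGAAGA
    print(three_prime_end_matches(p, template, 20))  # True: 3' end still primes
```

Placing the substitution near the 5' end of the annealing region, as here, leaves the 3' end intact so the primer still primes efficiently.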


Linker PCR 


Another important application of PCR, and an ex- 
ample of how complex PCR applications can be, is 
variously known as linker PCR or ligase-mediated 
PCR (Pfeifer et al., 1989). This is used to amplify 
DNA molecules from their very ends, even though 
the sequence at their very ends is unknown. Recall that 
only known DNA sequence, or at least sequence with 
known primer sequences at its ends, can be amplified 
by PCR. In linker PCR, we put a short piece of DNA 
of designed, and therefore known, sequence onto the 
ends using the enzyme DNA ligase. The PCR primer 
at the linkered end is then one strand of the linker 
DNA. Since every fragment of DNA in the solution 
can be ligated to a linker, another specific primer is 
needed to select out the desired target DNA sequence 
for more efficient amplification. 

If the linker is fairly long, such as 50 rather than 20, 
it will suppress the amplification of molecules which 
have this linker at both ends, by a process known as 
‘panhandle suppression’ (Lukyanov et al., 1997). The 
longer linkers match each other and anneal together to 
form a topological panhandle at the ends of the target. 
The panhandle forms faster and more stably than 
primer binding at the same ends, so priming and 
PCR are suppressed for molecules with the linker at 
both ends. The advantage to this effect is that now, 
only one specific primer to a target gene can target
the amplification, with the successful targets having 
the specific primer on one end, and a long linker on the 
other end. 


Nesting 


PCR primers occasionally prime, singly or together, at 
a few unwanted places on the template DNA or con- 
taminating DNA, and many unwanted genes are 
amplified. When this happens, nested primers can be 
used in a second-stage PCR that uses as template a few 
percent of the product from the first stage. Nested 
primers are specified to prime at the sequence just 
inside the first pair of primers on each side of the 
target. Sometimes the nesting is on only one side, 
and sometimes triple nesting is employed. In any 
case the idea is that the unwanted genes will not have 
the nested sequence, so only the desired target is 
amplified. 

Nesting is most commonly necessary for extreme 
amplification factors (such as starting with only one 
copy of template, such as a single sperm), or for linker 
PCR (for which the nesting is only on one side.) 
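The selection logic of nesting can be sketched as a filter over first-stage products: only products that carry both inner primer sites, in the right order, are amplified in the second stage. This is an idealized model assuming exact-match priming; the product sequences and inner primers below are hypothetical, and one-sided nesting would simply check a single inner site.

```python
from typing import List

COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence written 5'->3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def has_nested_sites(product_top: str, inner_left: str, inner_right: str) -> bool:
    """True if the product's top strand contains the inner left primer and,
    downstream of it, the site bound by the inner right primer."""
    start = product_top.find(inner_left)
    end = product_top.find(reverse_complement(inner_right))
    return start != -1 and end != -1 and end >= start + len(inner_left)


def second_stage(products: List[str], inner_left: str, inner_right: str) -> List[str]:
    """First-stage products that the nested primer pair would amplify."""
    return [p for p in products
            if has_nested_sites(p.upper(), inner_left.upper(), inner_right.upper())]


if __name__ == "__main__":
    wanted = "AAGCTGAAATTCCTT"    # carries both inner sites
    unwanted = "AAGCTGAAAAAAAAA"  # mispriming product lacking the right inner site
    print(second_stage([wanted, unwanted], "GCTG", "GGAA"))  # ['AAGCTGAAATTCCTT']
```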


Long and Accurate PCR 


The size of the target span of DNA for PCR was 
initially limited to about 3000 base pairs. Another 
limitation was fidelity: mutations were created about every 1000 bp by the Taq DNA polymerase.
Both of these problems were improved upon by a 
factor of about 10 by the inclusion of a low level of 
another DNA polymerase in addition to the Taq 
(Barnes, 1994). This other DNA polymerase could 
be from any of several microorganisms known as Archaebacteria, most of them found near undersea volcanoes and volcanic vents. Although these DNA polymerases, known
variously as Pfu, Pwo, Vent, and Deep Vent, could 
withstand even higher temperatures than Taq (up 
to 110 °C), that is not why they are valuable. Their 
valuable other feature is that they have proofreading 
activity. Proofreading is catalyzed by an enzyme 
domain which is attached to (is part of) most single- 
chain DNA polymerases; it removes mismatched base 
pairs, the wrong bases that are occasionally inserted 
by Taq. Taq is unusual among DNA polymerases in 
that it somehow lost this proofreading activity during 
evolution. When present, proofreading activity chews 
DNA 3’ ends (this is known as a 3’-exonuclease), 
including those of primers bound and unbound, but 
it chews off mismatched base pairs faster than 
matched base pairs. 

The scenario for long and accurate PCR is thus as 
follows: Taq (or a variant known as Klentaq) carries 


out the bulk of the DNA synthesis for PCR. When it 
makes a mistake, inserting, for instance, an A opposite 
an A on the template, instead of inserting the correct T, 
the resulting 3’ end sticks out at a slight angle. The fit 
of the DNA and monomer substrates for the next 
addition is now a little off, and the enzyme tends to 
come off before it slowly locks in the mistake by 
further extension. Eventually, some of these molecules 
will be extended and lock in the mutation, causing low 
fidelity. Over long DNA targets, the slowness of this 
step causes most molecules with a mismatch to drop 
out of the PCR amplification, because the Taq does 
not get a chance to get to the end of the target span 
before the few minutes of extension time are over. If 
the proofreading enzyme is present, however, it
removes the mismatched base, allowing Taq to jump
back on and synthesize a few more kilobases rapidly.
Thus longer targets can be amplified, and they have 
higher fidelity. 
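A rough way to see why fidelity and target length interact: if a polymerase misincorporates with probability e per base per duplication, a product of length L copied through d doublings carries on average about e·L·d errors, and the fraction of error-free molecules is roughly (1 − e)^(L·d). The sketch below uses that simplified independence model; the two error rates are hypothetical placeholders chosen to differ tenfold, not measured values for Taq or any polymerase mixture.

```python
def error_free_fraction(error_rate: float, length_bp: int, doublings: int) -> float:
    """Fraction of PCR products expected to carry no misincorporation.

    Simplified model (an assumption): every base copied in every doubling
    is wrong with the same independent probability error_rate.
    """
    return (1.0 - error_rate) ** (length_bp * doublings)


def mean_errors_per_molecule(error_rate: float, length_bp: int, doublings: int) -> float:
    """Average number of misincorporations per final product molecule."""
    return error_rate * length_bp * doublings


if __name__ == "__main__":
    # Hypothetical per-base error rates, about tenfold apart.
    for label, rate in [("no proofreading", 1e-4), ("with proofreading", 1e-5)]:
        for length in (1_000, 10_000):
            frac = error_free_fraction(rate, length, doublings=20)
            print(f"{label:18s} {length:6d} bp  error-free fraction: {frac:.3g}")
```

Because length and error rate enter only as their product, a tenfold better error rate buys roughly a tenfold longer target at the same fraction of correct molecules.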

Other advantages of PCR with a mixture of
polymerases are (1) all PCR reactions, not just the
long ones, become more efficient, and (2) PCR pro- 
ducts can be more effectively used as primers them- 
selves (they are then known as ‘megaprimers’), during 
complex gene and plasmid construction procedures. 
Without the mixture of polymerases, the problem 
with using PCR products as primers is that Taq (and 
any DNA polymerase so far tested which lacks a 3’- 
exonuclease) puts an extra A onto the 3’ ends of PCR 
products, and unless this A base-pairs with a T, prim- 
ing will be inefficient because of an immediate mis- 
match. 

Pure proofreading enzyme is tricky to use for PCR, 
since its 3’-exonuclease activity tends to chew up the 
PCR primers, leaving only the 15 nucleotides or so at 
the 5’ end. When it does work, usually for products in 
the original 3 kb range in size or shorter, and with 
primers having 5’ portions homologous to the target, 
the product has high fidelity. 


DNA Typing 


The clearest differences between individuals of the
same species that can be analyzed by PCR are what
are called ‘length polymorphisms.’ The most common
of these occur at sequences consisting of short
tandem repeats of three or four nucleotides (STRs),
where new length variants arise every few generations
at some loci because of the relative difficulty of
keeping these tandem repeats in register during
natural replication. Because PCR products from the
same map location on a chromosome differ in size
between individuals, these differences are valuable
in DNA fingerprinting of individuals, and their linkage
to genetic markers assists in the mapping of genes.
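Scoring an STR allele from a PCR product is simple arithmetic once the primers are fixed: the product length is the non-repeat (flanking) part of the amplicon plus the repeat-unit length times the number of repeats. The sketch below assumes a hypothetical locus with a tetranucleotide repeat and 100 bp of total flanking sequence; the function names and numbers are illustrative, and real typing kits calibrate sizes against their own allele ladders.

```python
def repeat_count(product_bp: int, flank_bp: int, unit_bp: int) -> float:
    """Number of repeat units implied by a PCR product size.

    flank_bp: total non-repeat sequence in the amplicon (both flanks, primers included).
    unit_bp: length of one repeat unit (3 or 4 for typical STRs).
    A non-integer result suggests a partial repeat or a sizing error.
    """
    return (product_bp - flank_bp) / unit_bp


def genotype(sizes: tuple[int, int], flank_bp: int, unit_bp: int) -> tuple[float, float]:
    """Two alleles (in repeat units) for a diploid individual, smaller first."""
    a, b = sorted(repeat_count(s, flank_bp, unit_bp) for s in sizes)
    return a, b


# Hypothetical locus: 100 bp of flanking sequence, 4 bp repeat unit.
print(genotype((148, 156), flank_bp=100, unit_bp=4))  # (12.0, 14.0)
```

Two individuals who share a locus but differ by even one repeat give products 4 bp apart, which is why fragment sizing alone can distinguish them.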

Problems


Contaminating DNA is the biggest problem for
analytical PCR. PCR is so effective that even human
DNA from the skin surface or from dead cells
(handprints) can serve as template. DNA from previous
PCR reactions in the same laboratory (carryover DNA)
can contaminate pipetting devices and even the air of
the laboratory, with the result that control PCR
reactions with no added template can give rise to the
PCR target product anyway. Scientists have devised
several ways to address this carryover problem. The
most effective require that PCR products always be
made, from the first amplification onward, with special
nucleotides or primers, so that PCR product DNA
carries built-in vulnerabilities. Enzymes or chemical
treatments that attack these vulnerabilities then
allow destruction of contaminating carryover PCR
product DNA without harming the target template
DNA.

Quantification of the PCR target via classical PCR
end-point analysis is a complex and time-consuming
process. Firstly, the quality and amplifiability of the
isolated DNA or RNA should be ensured. Furthermore,
in reverse transcriptase (RT) PCR studies the number
of mRNA molecules should be normalized to the
number of transcripts of a housekeeping gene.
Secondly, minor variations in RT efficiency, primer
annealing, and primer extension may lead to major
variations at the end of the PCR, i.e., after 30 to 35
PCR cycles. These disadvantages of ‘PCR end-point
quantification’ might (partly) be overcome by the
introduction of extra steps, such as:


• serial dilution of well-defined amounts of the target
DNA or RNA, which are analyzed in parallel to the
test sample, often in combination with blotting and
subsequent hybridization with a sequence-specific
probe (semi-quantitative results, based on
comparison of PCR signals);

• limiting dilution of the test sample in replicate
experiments until negative PCR results are obtained;

• competitive PCR, using several concentrations of
an internal standard (competitor) in separate PCRs,
followed by comparison of the PCR target signal
with that of the competitor.
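Why minor variations in efficiency become major ones by the end of the PCR can be seen from the idealized exponential model: with an efficiency E per cycle (E = 1 meaning perfect doubling), N₀ starting copies give about N₀(1 + E)ⁿ copies after n cycles. The sketch compares two hypothetical efficiencies over 30 cycles and deliberately ignores the plateau phase, which in a real reaction distorts end-point signals further.

```python
def copies_after(n0: float, efficiency: float, cycles: int) -> float:
    """Idealized exponential model N = N0 * (1 + E)**n, with no plateau phase."""
    return n0 * (1.0 + efficiency) ** cycles


n0 = 100  # hypothetical number of starting template copies
perfect = copies_after(n0, 1.00, 30)         # perfect doubling every cycle
slightly_worse = copies_after(n0, 0.90, 30)  # 90% efficiency every cycle
print(f"ratio after 30 cycles: {perfect / slightly_worse:.1f}")
```

A 10% drop in per-cycle efficiency, too small to notice in any single cycle, becomes roughly a 4.7-fold difference in product after 30 cycles, even though both tubes started with the same amount of template.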


These complex and time-consuming PCR analyses 
for quantification of the involved target can now be 
replaced by ‘real-time’ quantitative PCR (RQ-PCR). 


RQ-PCR: Principle of the Technique 


RQ-PCR permits accurate quantification of PCR
products during the exponential phase of the PCR
amplification, in full contrast to the classical
semi-quantitative PCR techniques with ‘end-point’
quantification. Because fluorescent signals are
detected in real time during each PCR cycle,
quantitative data can be obtained in a short period of
time and no post-PCR processing is needed, which
drastically reduces the risk of contamination with
PCR product. At present, RQ-PCR analysis can be
performed by three fluorescence-based techniques,
which differ in the way the PCR products are
detected.

Figure 1 (A) TaqMan probe technology and (B) hybridization probe/FRET technology.


TaqMan Probe-Based RQ-PCR Analysis 

The TaqMan probe-based RQ-PCR analysis exploits
the 5’→3’ nuclease activity of the Taq polymerase
to detect and quantify specific PCR products as the
reaction proceeds. The internal target-specific
TaqMan probe is conjugated with a reporter
fluorochrome (e.g., FAM, VIC, or JOE) and a quencher
fluorochrome (e.g., TAMRA). As long as these two
fluorochromes are close to each other, the energy of
the excited reporter fluorochrome is absorbed by the
quencher fluorochrome. However, upon amplification
of the target sequence, the TaqMan probe is degraded
by the Taq polymerase, separating the reporter and
quencher fluorochromes. As a result, the fluorescence
signal of the reporter fluorochrome becomes
detectable and increases further during consecutive
PCR cycles because free reporter fluorochromes
progressively accumulate (Figure 1A).


Hybridization Probe-Based RQ-PCR 
Analysis 

The hybridization probe-based RQ-PCR analysis
uses two sequence-specific probes, one labeled with
a donor fluorochrome at the 3’ end and the other
labeled with an acceptor fluorochrome at the 5’ end.
The two probes are positioned so that they can
hybridize to juxtaposed target sequences on the
amplified DNA fragment, thereby bringing the two
fluorochromes into close proximity. Upon absorption
of light of a specific wavelength, the donor
fluorochrome (e.g., fluorescein) emits light of a
slightly longer wavelength. When the two fluorochromes
are close together (i.e., within 1 to 5 nucleotides),
the excitation energy of the donor fluorochrome is
transferred to the acceptor fluorochrome (a process
referred to as fluorescence resonance energy transfer,
FRET), which then emits light of a still longer
wavelength. This light can be detected during the
annealing phase and the first part of the elongation
phase of the PCR (Figure 1B), i.e., as long as the two
probes are juxtaposed and FRET takes place. As in
the TaqMan technique, the fluorescent signal
increases exponentially during consecutive cycles,
in line with the amount of PCR product formed.
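The steep distance dependence behind the 1 to 5 nucleotide requirement comes from the Förster relation, in which transfer efficiency falls with the sixth power of the donor-acceptor distance r: E = 1 / (1 + (r/R₀)⁶), where R₀ is the distance giving 50% transfer. The sketch assumes a hypothetical R₀ of 5 nm and a crude geometry (0.34 nm per nucleotide of gap plus a fixed 1 nm for dye linkers, ignoring the helix); real dye pairs and probe designs differ.

```python
def fret_efficiency(distance_nm: float, r0_nm: float) -> float:
    """Förster transfer efficiency E = 1 / (1 + (r / R0)**6)."""
    return 1.0 / (1.0 + (distance_nm / r0_nm) ** 6)


def probe_gap_distance(gap_nt: int, rise_nm: float = 0.34, linker_nm: float = 1.0) -> float:
    """Crude donor-acceptor distance for two adjacent probes (hypothetical geometry)."""
    return gap_nt * rise_nm + linker_nm


R0 = 5.0  # hypothetical Förster radius in nm
for gap in (1, 5, 15, 30):
    r = probe_gap_distance(gap)
    print(f"gap {gap:2d} nt  r = {r:4.1f} nm  E = {fret_efficiency(r, R0):.2f}")
```

Under these assumptions transfer is nearly complete for gaps of a few nucleotides and falls away within a few tens of nucleotides, which is why the signal depends on both probes being hybridized side by side.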


SYBR Green I Dye-Based RQ-PCR 
Analysis 


The third possibility for RQ-PCR analysis is
detection of PCR products via the DNA-intercalating
dye SYBR Green I. This dye binds to the minor groove
of double-stranded DNA, which greatly enhances its
fluorescence. During consecutive PCR cycles the
amount of double-stranded PCR product increases,
and therefore more SYBR Green I dye can bind to DNA
and emit fluorescence. Maximal SYBR Green I dye
binding occurs at the end of each elongation phase.
Although this approach is the most cost-effective and
potentially sensitive, the detection of PCR product is
not sequence-specific, in contrast to the TaqMan
probe and hybridization probe approaches, which use
one and two sequence-specific oligonucleotides,
respectively. Therefore, it should be determined for
each PCR target whether SYBR Green I dye-based
RQ-PCR analysis shows satisfactory specificity.


Table 1 Examples of the application of RQ-PCR in medicine

Gene expression levels
  mRNA transcript levels in purified or well-defined cell populations:
    • activated versus non-activated cells
    • immature versus mature cells
  Genetic diseases, e.g., immunodeficient patients:
    • decreased or absent transcript levels
    • remaining levels of wild-type transcripts versus aberrant transcripts
Rare events
  Tumor cells:
    • detection of minimal residual disease during and after therapy,
      e.g., detection of chromosome aberrations
  Fetal cells in peripheral blood of the mother
Viruses
  Viral load in HIV-positive patients
  CMV detection in patients post-bone marrow transplantation
Telomerase length

For each of the three methods, the cycle at which
the fluorescence signal exceeds a certain background
fluorescence level, referred to as the threshold cycle
(CT), is inversely related to the amount of target
DNA present in the sample: CT decreases linearly with
the logarithm of the starting amount, so that each
twofold increase in template lowers CT by about one
cycle. The methods have a very large dynamic range
of over five orders of magnitude, thereby eliminating
the need for performing serial dilutions of samples.
At present two RQ-PCR instruments are available:
the ABI Prism 7700 (Applied Biosystems, Foster City,
CA, USA) and the LightCycler (Roche, Mannheim,
Germany).
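Reading a threshold cycle off an amplification curve amounts to finding where the background-corrected fluorescence first reaches the chosen threshold. A minimal sketch, assuming one reading per cycle and linear interpolation between the two readings that bracket the threshold; the function name and the readings are hypothetical, and instrument software applies its own baseline and threshold algorithms.

```python
from typing import Optional, Sequence


def threshold_cycle(fluorescence: Sequence[float], threshold: float) -> Optional[float]:
    """Fractional cycle at which background-corrected fluorescence first reaches threshold.

    fluorescence[i] is the reading taken after cycle i + 1.
    Returns None if the threshold is never reached (no detectable amplification).
    """
    if fluorescence and fluorescence[0] >= threshold:
        return 1.0
    for i in range(1, len(fluorescence)):
        lo, hi = fluorescence[i - 1], fluorescence[i]  # readings at cycles i and i + 1
        if lo < threshold <= hi:
            return i + (threshold - lo) / (hi - lo)  # linear interpolation
    return None


# Invented readings for one well (arbitrary units, baseline already subtracted).
readings = [0.0, 0.0, 0.0, 0.1, 0.2, 0.4, 0.8, 1.6, 3.2]
print(threshold_cycle(readings, threshold=0.5))  # 6.25
```

A sample with more starting template reaches the threshold earlier, so a lower CT corresponds to more target.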

Figure 2 (A) Schematic presentation of the ASO probe and ASO primer TaqMan approaches for RQ-PCR analysis of
Ig and TCR gene rearrangements. Representative example of a dilution experiment of a diagnostic sample from an
ALL patient, analyzed by RQ-PCR of an Ig gene rearrangement using the ASO probe approach (B) and the ASO
primer approach (C). Note the difference in the increase in fluorescence (ΔRn) between the ASO probe
approach (specific detection of leukemia-specific PCR products among PCR products derived from normal cells) and
the ASO primer approach (specific amplification of leukemia-specific PCR products).

Applications of RQ-PCR Analysis:
Detection of Minimal Residual Disease 


RQ-PCR can be used for all applications in which
quantitative data on DNA or RNA levels are required
(Table 1). In this section we focus on the
applicability of RQ-PCR analysis for detection of low
frequencies of malignant cells in leukemia, as
quantitative PCR data have significantly improved the
assessment of treatment efficacy.

Recent studies have indicated that detection of
‘minimal residual disease’ (MRD) in patients with
acute lymphoblastic leukemia (ALL), acute
promyelocytic leukemia (APL), and chronic myelogenous
leukemia (CML) can give clinically relevant insight
into the effectiveness of treatment. Furthermore, it
was shown that quantitative and sensitive MRD
information can be used for risk group classification
in ALL. Such quantitative and sensitive MRD data can
now be obtained by RQ-PCR analysis of
leukemia-specific chromosome aberrations as PCR
targets (APL, CML, subset of ALL) as well as junctional
regions of immunoglobulin (Ig) and T-cell receptor
(TCR) gene rearrangements (ALL).

If junctional regions of Ig and TCR gene
rearrangements are used as PCR targets for MRD
detection, the TaqMan probe-based approach can be
used in two ways (Figure 2A).


1. ASO probe approach: The TaqMan probe is
positioned at the junctional region (allele-specific
oligonucleotide (ASO) probe) and used in combination
with germline primers, implying that the TaqMan
probe has to detect leukemia-specific PCR products
against the background of PCR products derived
from polyclonal Ig or TCR gene rearrangements of
normal cells (Figure 2). This approach requires the
design of a new TaqMan probe for each
rearrangement.

2. ASO primer approach: The TaqMan probe and one
of the primers are positioned at germline sequences,
whereas the other primer is located at the junctional
region (ASO primer; Figure 2). This approach
aims at the specific amplification of the
leukemia-specific junctional region. Germline TaqMan
probes can in principle be used for all Ig and TCR
gene rearrangements that use the gene segments
recognized by the TaqMan probe.


Using these approaches, a dilution series of the
diagnostic sample can be made (Figure 2). Based on a
twofold amplification during each PCR cycle, a 10-fold
dilution should theoretically result in a CT increase of
3.3 (i.e., log₂ 10), but in practice the slope of the
dilution curve will generally be between 3.2 and 3.9. A
sensitivity of 10⁻⁴ (i.e., 1 leukemic cell among 10⁴
normal cells) can be reached in the majority of cases.
The amount of residual leukemic cells in follow-up
samples obtained during or after treatment can be
calculated by using the standard curve of the
diagnostic sample. If fusion gene transcripts from
chromosome aberrations are used as PCR target for
the detection of MRD, copy numbers can also be
calculated by referring to a dilution curve of known
amounts of plasmids containing the fusion gene
transcript. In both approaches, a control gene (e.g.,
albumin for DNA or Abelson for RNA) should be used
to correct for the total amount of DNA/RNA and its
amplifiability.
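The dilution-series logic can be made concrete. Fitting CT against log₁₀ of the relative amount in the diagnostic dilution series gives a slope (about −3.32 for perfect doubling), an amplification efficiency of 10^(−1/slope) − 1, and a line from which the level in a follow-up sample can be read. The CT values below are invented for illustration, the helper names are hypothetical, and correction with the control gene is assumed to have been done already.

```python
def fit_line(xs: list[float], ys: list[float]) -> tuple[float, float]:
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


def efficiency_from_slope(slope: float) -> float:
    """E = 10**(-1/slope) - 1; a slope of -log2(10), about -3.32, gives E = 1."""
    return 10 ** (-1.0 / slope) - 1.0


def relative_amount(ct: float, slope: float, intercept: float) -> float:
    """Amount relative to the undiluted diagnostic sample, read off the curve."""
    return 10 ** ((ct - intercept) / slope)


# Hypothetical tenfold dilution series of a diagnostic sample: 10^0 down to 10^-4.
log_dilution = [0.0, -1.0, -2.0, -3.0, -4.0]
ct_values = [20.0, 23.4, 26.7, 30.1, 33.4]  # invented CT values
slope, intercept = fit_line(log_dilution, ct_values)
print(f"slope {slope:.2f}, efficiency {efficiency_from_slope(slope):.2f}")

follow_up_ct = 31.5  # invented follow-up sample CT
print(f"MRD level ~ {relative_amount(follow_up_ct, slope, intercept):.1e}")
```

These invented values give a slope near −3.35 (inside the 3.2 to 3.9 range in absolute value) and an efficiency near 0.99; the follow-up sample falls between the 10⁻³ and 10⁻⁴ dilutions.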

Genetic variation is the raw material for evolutionary
change, and ever since Darwin proposed his theory of
evolution by natural selection, the amount and pattern
of genetic variation within populations and species
have been the subject of scientific investigation. In
sexually reproducing populations, the majority of
phenotypic variation is continuous, i.e., organisms
differ from each other in the degree to which they
express characters such as shape and size. Discrete
visible phenotypes that show distinct alternative forms
among individuals, such as the shell banding patterns
in snails, flower color variation in plants, or blue and
brown eye color in humans, are relatively few in
number in any species.

‘Polymorphism’ is a special aspect of ‘genetic vari- 
ation’ and both terms are often used interchangeably 
in the literature. Polymorphism literally means the 
presence in the same population of two or more alter- 
native forms of a distinct phenotype such as flower 
color. Polymorphism can occur in any genetic trait, 
phenotypic or physiological, in any coding or non- 
coding segment of DNA (nucleus, mitochondria, or 
chloroplast). Polymorphism is a special aspect of 
genetic variation because it connotes segregation of 
relatively common variants within populations and 
also implies the presence of some evolutionary 
mechanism(s) for their maintenance. A general defini- 
tion of genetic polymorphism is that the locus (or the 
genetic entity under consideration) should contain 
two or more alleles, with the most common allele 
having a frequency of 99% or less. A more stringent
definition of genetic polymorphism requires the most
common allele to have a frequency of 95% or less.
Under the latter criterion, a gene with two alleles
(say A and a) with frequencies of 0.95 and 0.05 in a 
random mating population would produce three 
genotypes in proportions: AA (90.25%), Aa (9.5%), 
and aa (0.25%). In inbreeding organisms, genetic 
polymorphisms occur with elevated frequencies of 
homozygous genotypes and reduced frequencies 
of heterozygous genotypes. 
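The genotype proportions quoted above are the Hardy-Weinberg expectations p², 2pq, and q² for a randomly mating population. A short sketch that reproduces them and applies the two frequency criteria (99% and 95%) for calling a locus polymorphic; the function names are illustrative only.

```python
def hardy_weinberg(p: float) -> dict[str, float]:
    """Expected genotype frequencies for a two-allele locus under random mating."""
    q = 1.0 - p
    return {"AA": p * p, "Aa": 2 * p * q, "aa": q * q}


def is_polymorphic(allele_freqs: list[float], max_common: float = 0.99) -> bool:
    """True if the locus has 2+ alleles and the commonest is at or below max_common."""
    present = [f for f in allele_freqs if f > 0]
    return len(present) >= 2 and max(present) <= max_common


freqs = hardy_weinberg(0.95)
print({g: round(f * 100, 2) for g, f in freqs.items()})  # percentages
print(is_polymorphic([0.95, 0.05], max_common=0.95))  # True: polymorphic under the 95% criterion
print(is_polymorphic([0.97, 0.03], max_common=0.95))  # False: fails 95%, though it passes 99%
```

The same logic extends to inbreeding populations only after adjusting the expectations, since inbreeding raises homozygote and lowers heterozygote frequencies.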


Inversion Polymorphism 


One of the best-studied forms of genetic
polymorphism in natural populations occurs in the form
of chromosome inversions. Insect species, and
Drosophila populations in particular, harbor large
amounts of inversion polymorphism. These inversions
arise from double breaks in a given piece of
chromosome, followed by inversion of the broken piece
and end-to-end rejoining of the chromosome. Although
gene order in the inverted piece is reversed, drastic
as this may seem, inversion polymorphisms are usually
not associated with marked morphological effects.
They do, however, have fitness effects on the
organisms. Some species show several types of
inversion polymorphism segregating in the same
population. Inversion polymorphisms show geographic,
latitudinal, and seasonal variation in genotypic
frequencies, which suggests that they are affected by
natural selection.


Blood Group Polymorphism


One of the most commonly known genetic poly- 
morphisms is that of the red blood cell antigens in 
humans. We all know about the ABO blood group 
system which has three alleles and gives rise to six 
genotypes but only four phenotypes (the four blood 
groups): A, B, AB, and O. The A and B blood group 
individuals can be either homozygous or
heterozygous. Blood groups are defined on the basis
of chemical cues on the surface of the blood cells
(antigens), which are involved in cell recognition.
Only certain donor-recipient combinations are
compatible in blood transfusion. More than 20 blood
group genes are known, but only a few of these are
highly polymorphic (e.g., MNS, Rh, Kidd, Duffy, and
Lutheran). One of the most complex
and polymorphic gene systems in humans is the 
‘major histocompatibility complex’ (MHC), the 
HLA system, on chromosome 6, where scores of 
genes are involved. These genes are so highly poly- 
morphic that no two individuals (except identical 
twins) are alike in their HLA genotypes. Some of 
the blood group genes (such as rhesus, Lutheran, 
and Kell) show allele frequency variation between 
human populations and are of anthropological interest 
in human studies. 
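The three-allele, six-genotype, four-phenotype arithmetic of the ABO system follows from A and B being codominant with each other and both dominant over O. A minimal enumeration, with hypothetical function and variable names:

```python
from itertools import combinations_with_replacement

ALLELES = ("A", "B", "O")


def abo_phenotype(genotype: tuple[str, str]) -> str:
    """Blood group for an ABO genotype: A and B are codominant, O is recessive."""
    antigens = sorted(set(genotype) - {"O"})
    return "".join(antigens) if antigens else "O"


# All unordered pairs of the three alleles: AA, AB, AO, BB, BO, OO.
genotypes = list(combinations_with_replacement(ALLELES, 2))
phenotypes = {g: abo_phenotype(g) for g in genotypes}
for g, ph in phenotypes.items():
    print("/".join(g), "->", ph)
```

Genotypes A/A and A/O both give group A (and B/B and B/O group B), which is why six genotypes collapse into four phenotypes.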


Protein Polymorphisms 


While visible phenotype and inversion polymorph- 
isms provided rich sources of genetic markers for 
population and evolutionary studies, the uncovering 
of a substantial amount of genetic variation had to 
await the arrival of molecular techniques that could 
detect genetic variation directly at the level of gene or 
gene product. Gel electrophoresis, with its power to 
resolve migration differences between protein mol- 
ecules in an electrical field, turned out to be a powerful 
tool for quantifying genetic variation in natural
populations. Amino acids, the building blocks of
proteins, have side chains that may be positively
charged, negatively charged, or neutral, and any
mutational change resulting in the replacement of one
amino acid by another may therefore change the
effective net charge of the protein. Protein
electrophoresis allowed detection of allelic differences
among the protein products (commonly known as
‘allozymes’) of hundreds of different protein-coding
genes. It can readily differentiate homozygotes from
heterozygotes, because in a heterozygous individual
both alleles are expressed and their alternative gene
products can be seen separately. The technique can be
used with tiny amounts of crude proteins from tissue
preparations, and it allowed unprecedented comparison
of allelic profiles across related species and genera.
The study of genetic polymorphism in populations was
freed from the constraints of the reproductive biology
of the organisms, and comparisons could be made
between widely separated species. Using this
technique it was shown for the first time, in 1966,
that about a third of all the genes in chromosomes are
polymorphic, and that the average individual is
heterozygous (i.e., carries two different alleles at a
gene locus) for about 10% of its protein-coding genes.
During the 1970s and 1980s, a flood of genetic
variation studies followed, and protein polymorphisms
were shown to occur in all sorts of organisms ranging
from microbes to humans.
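The two summary figures quoted above, the proportion of loci that are polymorphic and the average heterozygosity per individual, are simple averages over loci. A sketch using invented allozyme genotypes for four individuals at three hypothetical loci; observed heterozygosity here is the share of individual-locus pairs that are heterozygous, and the 99% criterion is used for polymorphism.

```python
def allele_freqs(genotypes: list[tuple[str, str]]) -> dict[str, float]:
    """Allele frequencies from a list of diploid genotypes."""
    counts: dict[str, int] = {}
    for g in genotypes:
        for allele in g:
            counts[allele] = counts.get(allele, 0) + 1
    total = sum(counts.values())
    return {a: c / total for a, c in counts.items()}


def summarize(loci: dict[str, list[tuple[str, str]]], max_common: float = 0.99) -> tuple[float, float]:
    """Return (proportion of polymorphic loci, mean observed heterozygosity)."""
    poly = het_calls = total_calls = 0
    for genotypes in loci.values():
        if max(allele_freqs(genotypes).values()) <= max_common:
            poly += 1
        het_calls += sum(1 for a, b in genotypes if a != b)
        total_calls += len(genotypes)
    return poly / len(loci), het_calls / total_calls


# Invented electrophoretic genotypes (F = fast, S = slow allele) at three hypothetical loci.
loci = {
    "locus_1": [("F", "S"), ("F", "F"), ("S", "S"), ("F", "S")],
    "locus_2": [("A", "A")] * 4,
    "locus_3": [("A", "A")] * 4,
}
prop_poly, mean_het = summarize(loci)
print(f"polymorphic loci: {prop_poly:.2f}, mean heterozygosity: {mean_het:.2f}")
```

Real surveys of this kind scored many more loci and individuals, but the arithmetic behind figures like "a third of loci polymorphic" and "10% heterozygous" is the same.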


DNA Polymorphisms 


Because the genetic code is redundant, some mutations
in coding genes (roughly a quarter of all possible
single-nucleotide changes) do not change the protein
at all, and of the amino acid replacements that do
occur, only about a third alter the charge of the
protein. The remaining mutations are therefore silent
or undetectable by protein electrophoresis. Protein
electrophoresis is also not useful for detecting
mutational changes in the noncoding portion of the
DNA such as introns, regulatory sequences, and
satellite DNA. To observe these we need molecular
techniques that can detect nucleotide variation
directly in the DNA. Several such techniques are now
available. Restriction fragment length polymorphism
(RFLP) analysis detects length variation in a given
segment of DNA. These length variations are revealed
by restriction enzymes (harnessed from bacteria),
which recognize a specific sequence of nucleotides
(their DNA signature or restriction site) and cut the
DNA at a precise place within this sequence.
Mutational changes in the ‘signature’ sequence result
in the loss of the enzyme’s ability to cut it. The
presence or absence (+/−) of a restriction site in any
piece of DNA will therefore result in two types of
length variation, or two alleles. Several restriction
enzymes used in a sequential manner will generate a
variety of polymorphic nucleotide genotypes (or
haplotypes) that can be scored in any piece of DNA,
nuclear, mitochondrial, or chloroplast. Restriction
enzymes vary in the number of nucleotides they
recognize in their restriction site: some recognize
four nucleotides (four-cutters), some six (six-cutters),
some eight, and so on. The higher the number of
nucleotides in a restriction site signature, the fewer
such sites a DNA molecule is likely to have. So,
depending on the levels of diversity present, one can
choose appropriate restriction enzymes for
polymorphism studies.
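Both points, that a mutation in a site changes the fragment pattern and that longer recognition sites occur less often, can be illustrated with a toy digest. The sketch cuts a made-up 40 bp linear sequence at every GAATTC (the EcoRI site, which is cut between G and AATTC), models only the top strand, and compares an allele with a mutated second site. The 4ⁿ spacing assumes random sequence with equal base frequencies; the sequences and function names are hypothetical.

```python
def digest(seq: str, site: str, cut_offset: int) -> list[int]:
    """Fragment lengths after cutting a linear DNA at every occurrence of `site`.

    cut_offset is where, within the site, the top strand is cut
    (1 for EcoRI, G^AATTC). Only the top strand is modeled.
    """
    seq = seq.upper()
    cuts = []
    pos = seq.find(site)
    while pos != -1:
        cuts.append(pos + cut_offset)
        pos = seq.find(site, pos + 1)
    bounds = [0] + cuts + [len(seq)]
    return [end - start for start, end in zip(bounds, bounds[1:])]


def expected_spacing(site_len: int) -> int:
    """Mean distance between sites in random DNA with equal base frequencies."""
    return 4 ** site_len


ECORI = "GAATTC"
# Hypothetical alleles of a 40 bp segment; the "-" allele has lost its second site.
allele_plus = "ATGCATGCAT" + ECORI + "CCGGCCGGCC" + ECORI + "TTAATTAA"
allele_minus = "ATGCATGCAT" + ECORI + "CCGGCCGGCC" + "GAATTA" + "TTAATTAA"

print(digest(allele_plus, ECORI, 1))   # [11, 16, 13]
print(digest(allele_minus, ECORI, 1))  # [11, 29]
print(expected_spacing(4), expected_spacing(6))  # 256 4096
```

On a gel the "+" allele shows 16 and 13 bp bands where the "−" allele shows a single 29 bp band, which is the two-allele RFLP pattern described above; a four-cutter would be expected to cut about every 256 bp and a six-cutter about every 4 kb.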

A second and very common type of DNA
polymorphism is scored by applying the RFLP method
to mini- or microsatellite DNA. Mini- and
microsatellite DNAs consist of short, tandemly
repeated DNA sequences and differ in the size of their
basic repeat units. Variation is produced by mutational
expansion or contraction of the number of repeat
units at a given site in the chromosome.

The ultimate measure of genetic polymorphism is, 
of course, DNA sequencing. Polymerase chain reac- 
tion (PCR) amplification of specific loci followed 
by DNA sequencing has become a powerful tool in 
the hands of even the biochemically disadvantaged. 
Detection of DNA sequence polymorphism is now 
routine and data are accumulating at a rapid rate. 
Unlike protein electrophoresis, DNA sequencing
allows quantification of all types of genetic variation
in DNA, coding as well as noncoding. DNA sequencing
has not only revealed more genetic variation than
could be detected at the level of the protein; it has
also shown that the patterns of sequence variation
between coding and noncoding regions, and between
coding and control regions, are complex and rich and
can be used to infer the role of the various
evolutionary forces shaping this variation.


Uses of Polymorphisms 


Genetic polymorphisms, whether studied in the form 
of allozymes, RFLP, mini- and microsatellite vari- 
ation, or DNA sequences, have become useful tools in 
a variety of research fields such as population genetics, 
evolutionary genetics, systematics and molecular phy- 
logeny, human genetics, agricultural genetics, and 
forensics. Genetic polymorphisms, through multiple 
alleles at individual loci, provide a mechanism to tag 
a gene or a piece of DNA, which is a powerful tool 
for a variety of investigations. Some of these inves- 
tigations are: identification of genotypes in paternity 
and forensic studies; movement of individuals in field 
studies; progress of selection experiments in cage 
populations; mapping of quantitative trait loci affecting
economically important traits in plants and animals; mapping of
disease genes in humans; and evolutionary compari- 
sons of DNA sequences and chromosome organiza- 
tions between related species. The uses of genetic 


polymorphisms are almost endless. Within a mere 50- 
year period, our picture of genetic variation in natural 
populations has moved from near monomorphism to 
ubiquitous polymorphism in all organisms whose 
populations have not gone through severe bottlenecks 
in their recent evolutionary history. 


See also: Balanced Polymorphism; Microsatellite; 
Minisatellite; Restriction Fragment Length 
Polymorphism (RFLP) 

Traditionally, a genetic locus on a chromosome was
called “polymorphic” in a particular population of a
species if the frequency of the most frequent allele
was lower than a particular value, say 99%. However,
this definition is no longer popular, because the
definitions of locus, allele, species, and population
are not clear and because the choice of threshold
value is rather arbitrary. The modern definition is more
objective: a particular nucleotide site is called
“polymorphic” when more than one nucleotide is
observed in a given sample of sequences.
Reconstruction of the tree (genealogy) of a gene is
possible only when that gene is polymorphic.
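Under this site-based definition, finding the polymorphic sites in a sample is a direct count. A minimal sketch, assuming the sequences are already aligned, of equal length, and free of gaps or ambiguity codes (real data would need a policy for those); the function name and sample are hypothetical.

```python
def polymorphic_sites(alignment: list[str]) -> list[int]:
    """0-based positions at which more than one nucleotide is observed."""
    length = len(alignment[0])
    if any(len(s) != length for s in alignment):
        raise ValueError("sequences must be aligned to the same length")
    return [i for i in range(length) if len({s[i] for s in alignment}) > 1]


sample = ["ACGTACGT", "ACGTACGA", "ACCTACGT"]
print(polymorphic_sites(sample))  # [2, 7]
```

Only these variable sites carry information about the genealogy; a sample with no polymorphic sites gives no basis for reconstructing a tree.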


See also: Bacterial Transcription Factors; 
Evolution of Gene Families 

A polypeptide is a fairly long chain of amino acid
residues linked together by peptide bonds. The word
polypeptide is sometimes used synonymously with the
word protein; however, a protein can refer to a
molecule composed of more than one polypeptide
chain.

The term polypeptide indicates that the chain can
be cleaved into smaller units, termed peptides, by
treatments that hydrolyze peptide bonds between
specific amino acid residues. Peptide is a term
reserved for short chains, typically containing 20
amino acid residues or fewer. Polypeptide is also the
term used when referring to a chemically polymerized
chain of amino acid residues, a chain whose sequence
or exact length is not specified, or sometimes a chain
that is not correctly folded. The word protein is often
used to refer to a chain that is synthesized on the
ribosomes in a cell using the templated instructions
found in a gene, but the term polypeptide is perfectly
correct in statements such as “each ribosome makes a
complete polypeptide.”


See also: Proteins and Protein Structure 

Polyploidy


Polyploid cells or organisms are those that have more 
than two complete sets of chromosomes (one from 
each parent or ancestor) in somatic and germline 
cells. Polyploidy of individual cells or cell types, aris- 
ing from chromosome replication without cell div- 
ision, is involved in the normal (e.g., secretory cells) 
or abnormal (e.g., many cancers) development of 
organisms. Polyploid individuals are found frequently 
as a result of incorrect meiosis or fertilization events, 
and may be generated experimentally. Many species 
are polyploid, with multiple chromosome sets having 
come together during their evolution; development of 
such species is normal, and in some cases the pheno- 
type may not be obviously different from that of the 
diploid species. While some polyploids are sterile, 
others may have meiosis that is indistinguishable 
from a normal diploid, and ancestral polyploidy, 
widespread in species evolution, may be difficult to 
detect. 


Nomenclature and Examples 

In presenting chromosome numbers or karyotype
constitutions, the letter x refers to the basic
chromosome number in a polyploid ‘series’, while 2n,
the diploid chromosome number, refers to the number
of chromosomes in a cell of the sporophyte (the
individual normally producing the germ cells). Higher
levels of ploidy (e.g., 3x to 12x) are described as
appropriate: triploid, tetraploid, pentaploid, hexaploid,
octaploid, dodecaploid. Thus the crop bread wheat,
Triticum aestivum, is a hexaploid species with six sets
of 7 chromosomes each, and is designated as
2n = 6x = 42. A pentaploid with an additional
chromosome (made perhaps by backcrossing a
tetraploid × hexaploid wheat hybrid to the hexaploid
parent) would have a chromosome constitution of
2n = 5x + 1 = 36. Autotetraploids contain four sets of
chromosomes from the same species, involving a
doubling of chromosome numbers from the diploid
level; higher autopolyploids are frequent.
Allotetraploids also contain four sets of chromosomes,
with two sets derived from each of two distinct
parental species (for example in an intergeneric
hybrid in which the chromosome number is doubled);
where the parental species are known, the
allotetraploid species or individual is described as
amphidiploid.
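The notation reduces to simple arithmetic: the somatic number 2n is the ploidy level times the basic number x, plus or minus any extra or missing chromosomes. A small sketch reproducing the wheat examples above (x = 7); the function and dictionary names are illustrative.

```python
PLOIDY_NAMES = {2: "diploid", 3: "triploid", 4: "tetraploid", 5: "pentaploid",
                6: "hexaploid", 8: "octaploid", 12: "dodecaploid"}


def somatic_number(x: int, ploidy: int, extra: int = 0) -> int:
    """2n = (ploidy)x + extra, e.g. 2n = 5x + 1 for a pentaploid with one extra chromosome."""
    return ploidy * x + extra


print(PLOIDY_NAMES[6], somatic_number(7, 6))     # hexaploid 42 (bread wheat, 2n = 6x = 42)
print(PLOIDY_NAMES[5], somatic_number(7, 5, 1))  # pentaploid 36 (2n = 5x + 1 = 36)
```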


Detection of Polyploidy 

Examination and counting of chromosome number by light microscopy is used to detect straightforward cases of polyploidy in cells or species: a multiple of the normal haploid number of chromosomes is seen. Sometimes there is little morphological distinction between polyploid and diploid individuals, and several ploidy levels may be classified within the same species, so chromosome counting (or measurement of the DNA content of nuclei) is essential to determine ploidy. In other situations, detection of polyploidy may be difficult: genetic mechanisms restore diploid-like (pseudodiploid) behavior, with strictly homologous chromosome pairing, so even amphidiploids may be fully fertile. Such species may be referred to as diploids with a paleopolyploid origin. Without such mechanisms, chromosomes will form an assortment of pairing configurations, including trivalents and quadrivalents, which will not segregate regularly to give balanced gametes. Methods to test for polyploidy include producing haploids that reach meiosis and show bivalent formation between the ancestral chromosome sets, or crossing the polyploid to a suspected diploid ancestor; bivalents will then form between the chromosomes of the ancestor and those of one genome in the polyploid (although chromosomes of an autopolyploid could pair among themselves, leaving the suspected ancestral chromosomes as univalents). Molecular cytogenetic methods, including in situ hybridization using species-specific repetitive DNA sequences or genomic DNA, are proving valuable for analyzing the constitution of polyploids.
Duplications in the genome may reflect ancestral polyploidy, but chromosomal aberration, aneuploidy, and sequence duplication can also produce them. For example, it is unclear whether the duplication of 75% or more of the genome of the sequenced species Arabidopsis thaliana is the result of polyploidy or of other mechanisms of duplication. Dense molecular marker maps show the duplication of large chromosome segments in species that were not previously considered to be polyploids: the diploid Brassica mustards are of hexaploid origin, making the cultivated Brassica napus (oil seed rape, canola) a dodecaploid with 12 chromosome sets (Lagercrantz and Lydiate, 1996).


Polyploidy in Evolution 

Polyploidy, involving the presence of multiple copies 
of identical or similar chromosome sets in one species, 
is an important feature of species evolution in the 
plant, animal, and fungal kingdoms. Polyploidy is 
widely considered to be an enabling force in evolution. 
Because chromosome sets are duplicated in polyploids, heterozygosity may be fixed, and the effects of random mutations or of factors modulating gene expression may be buffered (as they cannot be in a diploid), so new genes and gene functions may evolve in one chromosome set while the original function is retained in the other.

Polyploidy is seen in many angiosperm plant species, and the related diploid species can often be readily identified. More than 50% of all plants are obvious polyploids, while detailed studies are showing that many other species are crypto- or paleopolyploids. Polyploidy is rare in the other major plant group, the gymnosperms. In animals and fungi, detailed comparison of the gene content of chromosomes, combined with comparative analysis of chromosomes and genes in distantly related species, has allowed paleopolyploidy to be proposed. It is possible that the transition from invertebrates to vertebrates involved two rounds of polyploidy (Spring, 1997). In the yeasts, Wolfe and Shields (1997) present evidence that Saccharomyces cerevisiae is a degenerate tetraploid resulting from a whole genome duplication that occurred after the divergence of Saccharomyces from Kluyveromyces.


Origin of Polyploidy 

Polyploid cells and organisms can be made by treating cells with mitotic inhibitors (such as colchicine), which allow chromosome replication to occur without cell division. Polyploidy may also arise spontaneously in cells, either because of abnormal divisions or as part of differentiation. Fertilization involving unreduced (2n) gametes is a frequent source of triploid and tetraploid organisms: a meiotic division fails or a polar body is not expelled, giving a 2n gamete. Fertilization by two male gametes may also give triploids. In humans, triploid (2n = 3x = 69) and tetraploid (2n = 4x = 92) conceptuses, arising by both mechanisms, are found in 20% and 6%, respectively, of spontaneous abortions.


Polyploidy in Crop Plants 

The world’s four most important crops provide examples of the range of ploidy levels found in plants. Bread wheat is a hexaploid (2n = 6x = 42), derived as little as 30 000 years ago from a diploid species (2n = 2x = 14), Aegilops squarrosa, and a tetraploid, durum wheat (2n = 4x = 28), Triticum turgidum, itself derived from two diploid species, T. monococcum and a species similar to Ae. speltoides. The second most important crop, rice, is considered diploid, while molecular mapping data, the fertility of monosomic chromosome lines, cytogenetic comparisons with wild species, and some chromosome pairing data show that maize, the third most important crop, is a paleotetraploid. Banana, the fourth most important crop, is cultivated as a triploid hybrid to give sterile fruit with parthenocarpic development.
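
The origin of bread wheat can be followed as simple bookkeeping of whole genomes. In this hypothetical Python sketch, each genome is labeled with the conventional letters A, B and D (the labels and function names are illustrative, not part of this entry), normal meiosis is assumed to place one copy of each genome in a gamete, and the hybrid's chromosome number is simply doubled, without modeling how the doubling happens (for example through unreduced gametes).

```python
from collections import Counter

BASE_NUMBER = 7  # x = 7 in wheat and its diploid relatives


def gamete(genomes):
    """Reduced gamete: one copy of each genome (AABB -> AB).

    Assumes every genome is present in an even number of copies, so that
    pairing at meiosis is regular."""
    if any(n % 2 for n in genomes.values()):
        raise ValueError("odd genome copy number: meiosis would be irregular")
    return Counter({g: n // 2 for g, n in genomes.items()})


def amphidiploid(parent1, parent2):
    """Hybridize two species, then double the hybrid's chromosome number."""
    hybrid = gamete(parent1) + gamete(parent2)
    return Counter({g: 2 * n for g, n in hybrid.items()})


def describe(genomes):
    sets = sum(genomes.values())
    letters = "".join(g * n for g, n in sorted(genomes.items()))
    return f"{letters}: 2n={sets}x={sets * BASE_NUMBER}"


diploid_a = Counter(A=2)   # T. monococcum-like diploid
diploid_b = Counter(B=2)   # Ae. speltoides-like diploid
diploid_d = Counter(D=2)   # Ae. squarrosa

durum = amphidiploid(diploid_a, diploid_b)
bread = amphidiploid(durum, diploid_d)
print(describe(durum))  # AABB: 2n=4x=28
print(describe(bread))  # AABBDD: 2n=6x=42
```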

A few ‘new’ crops have been generated as man- 
made hybrids: the wheat x rye amphidiploid Triticale 
is widely grown in dry and colder areas of Canada and 
Poland. In horticulture, polyploids, whether species, 
natural, or artificial hybrids, are widely selected by 
breeders, perhaps because they tend to be larger than 
the equivalent diploid. 


Polysome (Polyribosome)

A polysome (polyribosome) is a functional unit of
protein synthesis consisting of an mRNA molecule 
associated with a series of ribosomes engaged in trans- 
lation. 


See also: Messenger RNA (mRNA); Translation 


Polytene Chromosomes 
Polytene chromosomes are interphase chromosomes that have been amplified up to about 1000-fold and can therefore be studied in great detail at the relatively low magnification of the light microscope. They arise as the result of a process called endoreplication: several rounds of DNA replication take place, but the chromosomes do not separate and the cells do not divide. However, certain regions of the chromosome are underreplicated (mainly the centromeric region containing heterochromatin). The amplified euchromatic portion of each chromosome stays aligned, and each chromosome can be distinguished by its characteristic banding pattern of alternating regions of condensed and less-condensed DNA (bands and interbands). Polytene chromosomes can be found in many plants (Phaseolus among others) and in Dipteran insects (for example the mosquito Anopheles, the midge Chironomus, and the fruit fly Drosophila). In Drosophila, polytene chromosomes are formed in certain adult and larval tissues, the best-studied and largest being those of the larval salivary gland cells.

Comprehensive monographs on polytene chromosomes have been published by Beermann in 1962 and by Sorsa in 1988. The first maps of the Drosophila melanogaster polytene chromosomes of the salivary gland cells were drawn by Bridges in 1935. Most widely used are the photographic maps prepared by Lefevre (1976). Saura and Sorsa constructed electron micrograph maps during the period 1979-1997, which can be accessed on the internet at http://www.helsinki.fi/~saura/EM/index.html. In addition, maps have been constructed using atomic force microscopy, and there are even three-dimensional studies (Urata et al., 1995) as well as three-dimensional computer visualizations in pictures and movies.


Endoreplication 


A typical cell cycle, with the successive phases G1, S, G2, and M, is controlled in higher eukaryotes by cyclins A, B, D, and E and their kinase partners Cdk1, 2, 4, and 6. Targets of the cyclin-Cdk complexes modulate transcription of cell-cycle genes. The activity of the cyclin-Cdk complexes is regulated at multiple levels. In the absence of Cdc25, Cdk1 in the cyclin A-Cdk1 and cyclin B-Cdk1 complexes is inhibited by phosphorylation, resulting in a G2 arrest. The G2 to M phase transition is initiated by the pulsed expression of Cdc25, a phosphatase that activates Cdk1 in the cyclin A-Cdk1 and cyclin B-Cdk1 complexes. During mitosis, Cdc25 as well as the mitotic cyclins A and B are degraded. G1 arrest after mitosis occurs in the absence of cyclin E-Cdk2 activity, owing to transcriptional downregulation of the cyclin E gene and increased levels of an inhibitor of cyclin E-Cdk2. Cyclin D, in complexes with Cdk4 or Cdk6, regulates progression through G1. Entry into S phase is regulated by activation of the cyclin E-Cdk2 complex, and progression through S requires cyclin A-Cdk2. Cells are switched from normal mitotic cycles to endoreplication by turning off the cyclin A and B genes necessary for entry into M, followed by periodic expression of cyclin E, which forces the cells into DNA replication over and over again.
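
The switch from mitotic cycles to endocycles can be caricatured as a loop in which a cyclin E pulse always triggers S phase, but M phase happens only when the mitotic cyclins are present. This Python toy model is a deliberately crude sketch (the class and function names are hypothetical): it tracks only DNA content in C-values and cell number, and ignores underreplicated regions. Ten endocycles give roughly the 1000-fold amplification mentioned for polytene chromosomes.

```python
from dataclasses import dataclass


@dataclass
class Lineage:
    dna_c: int = 2   # DNA content per cell in C-values; a diploid G1 cell is 2C
    cells: int = 1   # number of cells descended from the starting cell


def run_cycles(cycles, mitotic_cyclins_on):
    """Toy model: every cycle a cyclin E pulse drives one S phase; M phase
    follows only if the mitotic cyclins (A and B) are expressed."""
    state = Lineage()
    for _ in range(cycles):
        state.dna_c *= 2            # S phase: every chromosome is replicated
        if mitotic_cyclins_on:
            state.dna_c //= 2       # M phase: copies are shared between two daughters
            state.cells *= 2
        # Otherwise M is skipped and the cell re-enters G1 with all copies;
        # in polytene cells these copies stay aligned side by side.
    return state


print(run_cycles(10, mitotic_cyclins_on=True))   # Lineage(dna_c=2, cells=1024)
print(run_cycles(10, mitotic_cyclins_on=False))  # Lineage(dna_c=2048, cells=1)
```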


Organization of Polytene Chromosomes 
of Drosophila melanogaster 


The diploid cells of D. melanogaster have two sets of four chromosomes: the sex chromosomes (X and Y) and three pairs of autosomes. The X chromosome is acrocentric. Autosomes 2 and 3 are metacentric, and each arm is approximately the size of the X chromosome. Autosome 4 is small and acrocentric. In a typical polytene chromosome preparation, the euchromatic parts of chromosomes X, 2, 3, and 4 are visible as six arms (five major arms and one small one), all connected by their centromeres in the chromocenter. The two homologs of each chromosome are paired and cannot be seen individually (somatic synapsis). The centromeres themselves and the Y chromosome are hardly visible because they are not amplified or are under-replicated.

Each of the chromosome arms can be recognized by its specific band and interband pattern. The total number of bands according to the original drawings of Bridges amounts to 5059. It has long been speculated that each (darker stained) band represented the locus of one gene, indicating that the total number of genes of Drosophila was of the order of 5000. However, the first analysis of the recently finished sequence of the complete Drosophila genome indicates the presence of approximately 13 000 genes. The bands are still designated by Bridges' original notation. Each major chromosome arm is divided into 20 numbered sections: 1-20 for the X chromosome, 21-40 for 2L, 41-60 for 2R, 61-80 for 3L, and 81-100 for 3R. The small chromosome 4 consists of only two sections, 101 and 102. Each section is subdivided into six subsections, lettered A-F, and within each subsection the individual bands are numbered from left to right.
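
Bridges' scheme translates directly into a lookup table. The Python sketch below is hypothetical (the function `parse_band` and the label format it accepts are assumptions): it splits a designation written as section number, subsection letter and optional band number, such as 87A7, and reports the chromosome arm from the section ranges listed above.

```python
import re

# Section ranges per arm, following Bridges' numbering.
ARMS = [(1, 20, "X"), (21, 40, "2L"), (41, 60, "2R"),
        (61, 80, "3L"), (81, 100, "3R"), (101, 102, "4")]

# Assumed label format: section number, subsection letter A-F, optional band number.
BAND = re.compile(r"^(\d{1,3})([A-F])(\d+)?$")


def parse_band(label):
    """Split a label such as '87A7' into section, subsection and band,
    and report the chromosome arm that the section belongs to."""
    match = BAND.match(label.strip().upper())
    if not match:
        raise ValueError(f"not a Bridges-style band label: {label!r}")
    section = int(match.group(1))
    for first, last, arm in ARMS:
        if first <= section <= last:
            break
    else:
        raise ValueError(f"section {section} is outside 1-102")
    band = int(match.group(3)) if match.group(3) else None
    return {"arm": arm, "section": section,
            "subsection": match.group(2), "band": band}


print(parse_band("87A7"))  # {'arm': '3R', 'section': 87, 'subsection': 'A', 'band': 7}
print(parse_band("3C"))    # {'arm': 'X', 'section': 3, 'subsection': 'C', 'band': None}
```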

The centric heterochromatin of the X chromosome and the autosomes (the darkly staining region around the centromere of mitotic chromosomes, which remains condensed even after mitosis as a result of its specific structural organization) is apparently underreplicated (or part of its DNA sequences is eliminated) and is barely visible in polytene chromosome preparations. These parts of the chromosomes should be at the base of each polytene arm in the chromocenter. However, the chromocenter shows only some dense material (heterochromatin) and some diffuse-looking material (euchromatin), without apparent detail. Also, the proximal sections 20, 40, 41, 80, 81, and 101 at the base of each polytene chromosome arm, starting from the chromocenter, show a banding pattern that is “fuzzy, variable and confused.” Not only is the centric heterochromatin under-replicated, but there are also specific regions in the euchromatic portion of the polytene chromosome that appear as constrictions, often leading to breaks during preparation; these are therefore referred to as ‘weak spots.’

In view of the character of these weak spots, they are sometimes referred to as sites of intercalary heterochromatin. The genetic content of some of the weak spots is known; one is in the region where a histone gene cluster is located, and another is near the location of the homeobox gene cluster BX-C. These weak spots are often also involved in a phenomenon called ectopic pairing, which connects apparently nonhomologous loci. Interestingly, polytene chromosomes from different tissues of the same animal do not have weak spots at all the same loci; there may be a tissue-specific pattern of weak spots. It is also notable that factors suppressing position effect variegation (through an influence on chromatin structure) also ‘improve’ the banding pattern in the regions close to the chromocenter. Recently, a mutation has been described that clearly increases the endoreplication of weak spots as well as of the pericentric heterochromatin of chromosome 3.

The tips of the polytene chromosomes of various 
Drosophila strains can be highly variable due to the 
different length of the telomeres. 


Gene Localization 


Polytene chromosomes have been instrumental in the development of Drosophila genetics as we know it today. They have allowed the detailed cytogenetic analysis of the breakpoints of spontaneous and induced chromosome rearrangements and the development of gene localization techniques specific to this organism.

In radiation genetics, Drosophila has played a leading role over a period of several decades. Literally thousands of heritable X-ray-induced chromosome aberrations have been isolated and analyzed using polytene chromosomes, which allow the exact breakpoints to be determined. A score of deletions, each encompassing a defined region of a number of bands of the polytene chromosome, have been maintained over the years in specific stocks, freely available from a number of stock centers. These deletions (as well as a number of defined chromosome duplications) were subsequently used to pinpoint the localization of hundreds of mutants and genes. A further development in gene localization was the introduction by M.L. Pardue, some 25 to 30 years ago, of the in situ hybridization of specific DNA probes to polytene chromosomes. Using this technique, cloned genes could be mapped to specific bands/interbands and physically linked to loci on the chromosome. Later, the same technique allowed the localization of several hundred randomly inserted P-elements for cloning purposes, as well as the localization of many unknown genes defined by specific P-insertion mutants. It is still routinely used to localize inserted constructs in transgenic flies.


Gene Isolation and Cloning 


Polytene chromosomes also played a role in the earli- 
est attempts to isolate and determine the DNA 
sequence of particular genes. The chromosomes are 
so large that microcloning techniques were developed 
to isolate and molecularly characterize regions that 
could be obtained by ‘cutting out’ cytologically de- 
fined bands. 

The banding pattern of the polytene chromosome is not static but dynamic. In addition to the band/interband pattern, the euchromatic part of polytene chromosomes often shows sites that appear more expanded and less dense (puffs). In some cases, for example in Chironomus, these puffs become extremely large and are called Balbiani rings after their discoverer. Puffs are reversible modifications of polytene chromosomes; they originate from single bands or from single bands and part of the adjoining interband. Puffs and Balbiani rings are sites of active RNA synthesis; polytene chromosomes, and more specifically puffs and Balbiani rings, are therefore excellent tools for studying the induction and process of transcription. It has been shown that puff induction is preceded by the accumulation of non-histone proteins and the appearance of ribonucleoprotein particles. The involvement of regulatory proteins in the induction and repression of transcription of particular genes can be visualized on polytene chromosomes with antibodies.

The discovery (Balbiani) and early studies of polytene chromosomes (Heitz, Painter, Koltzoff, and Bridges) have been described in part of an essay on Emil Heitz by Zacharias (Zacharias, 1995). Beermann, as well as Pavan and Breuer, showed in 1952 that the occurrence of puffs at particular sites was tissue-specific and changed during development. These developmental changes have been studied in great detail in Drosophila melanogaster, where it could be shown that the induction of certain puffs is developmentally regulated by the molting hormone ecdysterone and that the products of these early puffs are responsible for the later induction of activity in other target genes.


Scope of Population Genetics


Population genetics seeks to understand how and why 
the frequencies of alleles and genotypes change over 
time within and between populations. It is the branch 
of biology that provides the deepest and clearest 
understanding of how evolutionary change occurs. 
Population genetics is particularly relevant today in 
the expanding quest to understand the basis for 
genetic variation in susceptibility to complex diseases. 
Many of the factors that affect allelic frequency and associations among alleles of linked genes were first characterized in Drosophila and other model organisms, but the same principles apply to virtually all organisms.

Shortly after the rediscovery of Mendel’s laws in 
1900, a raging controversy developed over the rele- 
vance of the kind of variation and transmission that 
Mendel characterized to the smooth, continuous vari- 
ation that biologists had noted and measured in vir- 
tually all organisms. Could the continuous variation in 
stature, for example, be explained by underlying genes 
of the sort Mendel described? One of the arguments against Mendel’s genes was that recessive alleles would soon be lost from a population by virtue of their recessiveness. Godfrey Hardy and Wilhelm Weinberg independently demonstrated the folly of this argument, and showed instead that randomly mating populations would be expected to retain their allelic variation by simple Mendelian principles unless some other force acted on the variation. But this did not fully resolve the question of why parents and offspring have correlated phenotypes for continuously varying traits.
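
Hardy and Weinberg's point can be checked numerically. In this Python sketch (the function name is hypothetical), one generation of random mating turns an allele frequency p into genotype frequencies p^2, 2pq and q^2, and the frequency of A among the gametes those offspring produce is again p. Even a rare recessive allele keeps its frequency indefinitely when no other force acts; the starting frequency is an arbitrary illustrative value.

```python
def next_generation(p):
    """Random mating with no mutation, drift, migration or selection.

    Returns the offspring genotype frequencies (AA, Aa, aa) and the frequency
    of allele A among the gametes those offspring produce."""
    q = 1.0 - p
    f_AA, f_Aa, f_aa = p * p, 2 * p * q, q * q
    p_next = f_AA + f_Aa / 2   # each heterozygote carries one A
    return (f_AA, f_Aa, f_aa), p_next


p = 0.99   # hypothetical: the recessive allele a starts at 1%
for generation in range(100):
    genotypes, p = next_generation(p)

print(f"q after 100 generations: {1 - p:.4f}")    # 0.0100
print(f"aa homozygotes: {genotypes[2]:.4f}")      # 0.0001
```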

It was the theoretical population geneticist Ronald Fisher who developed the mathematics to show exactly how the joint action of many genes could produce the precise quantitative degrees of familial resemblance that are observed. This was one of many
instances in the history of population genetics in 
which a formal mathematical model of the problem 
paved the way to understanding what empirical data 
needed to be gathered to test the new conceptualiza- 
tion. Fisher went on to develop, along with Sewall 
Wright and J. B. S. Haldane, much of the theory for 
allelic frequency change under simple models of nat- 
ural selection. Wright and Fisher developed the the- 
oretical machinery needed to understand the complex 
process of recurrent sampling that we now call ran- 
dom genetic drift. By 1940 much of the theory for 
the ‘modern synthesis’ of Darwinian evolution and 
Mendelian transmission genetics had been developed. 
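
Fisher's argument can be illustrated, though not reproduced, by simulation. The Python sketch below makes strong simplifying assumptions that are ours, not the entry's: a purely additive trait controlled by many unlinked biallelic loci, random mating, and no environmental variation. Under these assumptions the expected parent-offspring correlation is 1/2, and the simulated value should come out close to it. The helper names and parameter values are hypothetical.

```python
import random


def pearson(xs, ys):
    """Pearson product-moment correlation of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def parent_offspring_correlation(n_families=2000, n_loci=50, p=0.5, seed=1):
    rng = random.Random(seed)

    def random_genotype():
        # One (allele, allele) pair per locus; allele value 1 has frequency p.
        return [(int(rng.random() < p), int(rng.random() < p)) for _ in range(n_loci)]

    def gamete(genotype):
        # Mendelian segregation at unlinked loci: one allele per locus at random.
        return [rng.choice(pair) for pair in genotype]

    def phenotype(genotype):
        # Purely additive: the trait value is the count of value-1 alleles.
        return sum(a + b for a, b in genotype)

    parent_values, child_values = [], []
    for _ in range(n_families):
        mother, father = random_genotype(), random_genotype()   # random mating
        child = list(zip(gamete(mother), gamete(father)))
        parent_values.append(phenotype(mother))
        child_values.append(phenotype(child))
    return pearson(parent_values, child_values)


r = parent_offspring_correlation()
print(f"parent-offspring correlation: {r:.2f}")
```

With `n_loci=1` the expected correlation is still 1/2, but the trait then takes only three values; it is the large number of loci that makes the distribution look continuous.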

Before considering the development of the empir- 
ical aspects of population genetics, the basic mechan- 
isms that underlie the modern synthesis are briefly 
reviewed below. 


Forces that Cause Allelic Frequencies to 
Change 


Population geneticists envision evolution as change in 
allelic frequencies. If the fundamental process of evo- 
lution can be described so simply, then to understand 
how evolution works, it is necessary to uncover the 
minute details of what factors may result in allelic fre- 
quency changes. The primary forces that can change 
allelic frequencies are mutation, random genetic drift, 
migration, and natural selection. These serve as the 
focus of much of the effort in population genetics. 


Mutation 

Mutation is of course the ultimate source of all genetic 
variation in a population. Mutation includes all heri- 
table changes in genes, including single nucleotide 
changes, clusters of nucleotide changes, insertions, 
deletions, and gene rearrangements of many sorts. 
For the purposes of early models in population genetics, mutations were assumed to occur at random, that is to say, independently of any other mutations or other factors. We now know of many ways in which the process of mutation may deviate from this simple model, including the fact that many single mutation events change multiple nucleotides, and that individuals within populations differ in their rates of mutation owing to genetic variation in DNA repair mechanisms. We have also learned that the mutation rate varies widely from one nucleotide to another along genes, and that some regions of genomes are potent hot spots for mutations. Such complications are being incorporated into current models of population genetics. A key point to emphasize is that although mutation is the source of all variation, it is usually so rare that, once a mutation has been introduced into a population, its role in changing allelic frequencies is dwarfed by the effects of random genetic drift and/or natural selection.
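
The weakness of mutation as a force changing allelic frequencies is easy to quantify. Assuming one-way mutation from A to a at a rate u per generation, with no back mutation, drift or selection, the frequency of A declines as p_t = p_0(1 - u)^t. The rate u = 10^-5 below is an illustrative assumption, not a value from the entry.

```python
def mutation_only(p0, u, generations):
    """Frequency of A after recurrent A -> a mutation at rate u per generation,
    with no back mutation, drift or selection: p_t = p0 * (1 - u) ** t."""
    p = p0
    for _ in range(generations):
        p *= 1 - u
    return p


p_after = mutation_only(0.5, 1e-5, 1000)
print(f"{p_after:.4f}")  # 0.4950
```

After 1000 generations the frequency has moved by only about 0.005; by comparison, the standard deviation of a single generation of binomial sampling in a population of 100 diploids with p = 0.5 is sqrt(0.25/200), about 0.035.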


Random Genetic Drift 

In a finite population there is a sampling of gametes 
that occurs at the beginning of each new generation. If 
there are two alleles in the population, then there is a 
binomial sampling that occurs to form the next gen- 
eration. One fundamental result of this binomial sam- 
pling is that the population size makes a big difference 
to the magnitude of random genetic drift. If the frequencies of alleles A and a are p and q = 1 - p, respectively, then the binomial sampling variance of the allelic frequency in the next generation is pq/2N, when there are N diploid individuals in the population. (If we were considering a haploid species, we would replace the 2N by just N in the denominator.) Doubling the population size halves the sampling variance, so the typical size of the jumps in allelic frequency from generation to generation decreases as population size increases.
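
The binomial sampling step can be simulated directly. This Python sketch (the function names are hypothetical) draws 2N gene copies for one generation of Wright-Fisher sampling and compares the empirical variance of the new allelic frequency with pq/2N; it builds each binomial draw from individual standard-library random numbers rather than a dedicated binomial generator.

```python
import random


def sampled_frequency(p, n_diploids, rng):
    """One generation of Wright-Fisher sampling: 2N gene copies, each A with probability p."""
    copies = 2 * n_diploids
    count_a = sum(rng.random() < p for _ in range(copies))
    return count_a / copies


def empirical_variance(p, n_diploids, replicates=5000, seed=0):
    """Sample variance of the next-generation frequency over independent replicates."""
    rng = random.Random(seed)
    draws = [sampled_frequency(p, n_diploids, rng) for _ in range(replicates)]
    mean = sum(draws) / replicates
    return sum((x - mean) ** 2 for x in draws) / (replicates - 1)


for n in (25, 50, 100):
    expected = 0.5 * 0.5 / (2 * n)   # pq/2N with p = q = 0.5
    print(f"N={n:3d}  simulated={empirical_variance(0.5, n):.5f}  pq/2N={expected:.5f}")
```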

When this kind of process is repeated over many generations, if no other forces are acting, then allelic frequencies take a random walk in time, and ultimately go to fixation or loss (i.e., p = 1 or p = 0). Two key results about random genetic drift are: first, that it results in a loss of variation over time; and second, that small populations lose variation by drift more quickly than do large populations. Building from these simple principles, population geneticists have developed an elaborate mathematical theory for the behavior of purely neutral alleles in populations.
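
Repeating the sampling step until one allele is lost shows both key results. In this Python sketch (hypothetical helper names and parameter values), a neutral allele is fixed in a fraction of replicate populations close to its starting frequency, a standard result of neutral theory, and small populations reach fixation or loss sooner than large ones. The starting frequencies are chosen so that p0 x 2N is a whole number of gene copies.

```python
import random


def run_to_absorption(p0, n_diploids, rng):
    """Repeat Wright-Fisher sampling until allele A is lost (p = 0) or fixed (p = 1).

    Returns (True if A was fixed, number of generations taken).
    p0 * 2N is assumed to be a whole number of gene copies."""
    copies = 2 * n_diploids
    count = round(p0 * copies)
    generations = 0
    while 0 < count < copies:
        p = count / copies
        count = sum(rng.random() < p for _ in range(copies))
        generations += 1
    return count == copies, generations


def summarize(p0, n_diploids, replicates=300, seed=0):
    """Fraction of replicates in which A is fixed, and mean time to fixation or loss."""
    rng = random.Random(seed)
    runs = [run_to_absorption(p0, n_diploids, rng) for _ in range(replicates)]
    fixation = sum(fixed for fixed, _ in runs) / replicates
    mean_time = sum(t for _, t in runs) / replicates
    return fixation, mean_time


for n in (10, 30):
    fixation, mean_time = summarize(0.25, n)
    print(f"N={n:2d}  P(fixation)={fixation:.2f}  mean generations={mean_time:.1f}")
```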

Pure drift has the rather dreary result that all vari- 
ation is lost, but when neutral variation is continually 
pumped into the population by mutation, some very 
nice results are obtained in which there is a steady- 
state balance between the influx of variation by muta- 
tion and the loss of that variation by drift. In this 
neutral mutation model, expressions have been derived for the steady-state heterozygosity, the time to fixation of alleles, the rate of allelic turnover, and the frequency spectrum of alleles. The infinite sites model provides mathematical formulations for the relation between sample size and the number of segregating sites in the mutation-drift balance situation.
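
Two standard results of neutral theory give a flavor of these expressions. The sketch assumes the infinite alleles model for heterozygosity, H = θ/(1 + θ) with θ = 4Nu, and Watterson's infinite-sites result E[S] = θ(1 + 1/2 + ... + 1/(n - 1)) for the expected number of segregating sites in a sample of n sequences. These are common textbook formulations chosen for illustration, and the numerical inputs are hypothetical.

```python
def expected_heterozygosity(n_diploids, u):
    """Infinite alleles model at mutation-drift balance: H = theta / (1 + theta), theta = 4Nu."""
    theta = 4 * n_diploids * u
    return theta / (1 + theta)


def harmonic(sample_size):
    """a_n = 1 + 1/2 + ... + 1/(n - 1) for a sample of n sequences."""
    if sample_size < 2:
        raise ValueError("need at least two sequences")
    return sum(1 / i for i in range(1, sample_size))


def expected_segregating_sites(theta, sample_size):
    """Infinite sites model (Watterson): E[S] = theta * a_n, with theta = 4N*mu
    and mu the neutral mutation rate per sequence per generation."""
    return theta * harmonic(sample_size)


def watterson_theta(segregating_sites, sample_size):
    """Invert the relation to estimate theta from an observed number of segregating sites."""
    return segregating_sites / harmonic(sample_size)


print(round(expected_heterozygosity(10_000, 1e-5), 3))   # theta = 0.4
print(round(expected_segregating_sites(5.0, 11), 2))
```

Because a_n grows only logarithmically, adding sequences to a sample uncovers new segregating sites ever more slowly.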


Migration and Population Structure 

So far we have assumed that the population is a large 
interbreeding unit, but real populations may be sub- 
divided. Sewall Wright’s background in agricultural 
genetics made him particularly attuned to the role of 
population subdivision in evolution, and he was 
responsible for most of the early development of the 
theory. Beginning with Wright’s work, population 
geneticists have sought to quantify the degree of 
genetic differentiation among populations, and to 
characterize the role of population subdivision in 
genetic change in populations. Statistical measures 
for the degree of population subdivision, like Fst, 
have received considerable attention and are now 
well understood. 
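
For a single biallelic locus, one common way to compute Fst compares the mean expected heterozygosity within subpopulations (H_S) with the expected heterozygosity of the pooled population (H_T): Fst = (H_T - H_S)/H_T. The Python sketch assumes equally weighted subpopulations and known allelic frequencies; estimators used on real sample data add sampling corrections that are not shown here, and the frequencies below are hypothetical.

```python
def fst(subpop_freqs):
    """Fst = (H_T - H_S) / H_T for one biallelic locus.

    Assumes equally weighted subpopulations and expected heterozygosity 2pq."""
    k = len(subpop_freqs)
    h_s = sum(2 * p * (1 - p) for p in subpop_freqs) / k   # mean within-subpopulation H
    p_bar = sum(subpop_freqs) / k
    h_t = 2 * p_bar * (1 - p_bar)                            # H of the pooled population
    if h_t == 0:
        return 0.0   # locus monomorphic overall: no differentiation to measure
    return (h_t - h_s) / h_t


print(fst([0.5, 0.5]))   # identical subpopulations: no differentiation
print(fst([0.2, 0.8]))   # moderate differentiation
print(fst([0.0, 1.0]))   # alternative alleles fixed: complete differentiation
```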

Wright proposed a theory in which subdivision 
actually helps populations to achieve higher fitness 
than would a large panmictic population. In this ‘shift- 
ing balance’ theory, small subpopulations undergo 
random drift, and generate particular combinations 
of alleles that might not occur in a larger population. 
This is particularly so if there is a low fitness inter- 
mediate that must be generated en route to the favor- 
able allelic combinations, as only a small population 
would have random drift predominate over natural 
selection. Once these new favorable combinations 
of alleles are generated, natural selection takes over 
and these genotypes spread across subpopulations 
by migration. There is mixed opinion about the value 
of the shifting balance theory, largely because it is 
difficult to test. But regardless of the importance 
of the shifting balance theory, local adaptation 
in subdivided populations is clearly a significant 
phenomenon relevant to the overall evolution of a 
species. 

The idea that recent mixing of populations results in genetic admixture also has a long history, and progress is now being made in quantifying the admixture history of populations by extensive sampling of microsatellite loci. Understanding the nature of genetic differences among human populations is of particular importance today, as we need to know whether alleles that confer increased disease susceptibility in one population are likely to be shared across populations, or whether independent studies of such associations need to be done in individual populations. If the latter is true, then we need a systematic sampling method that defines the populations in a genetically meaningful way.


Natural Selection

The cornerstone of Darwinian evolution is that those individuals with the highest fitness will pass on their traits to the greatest number of offspring. Population geneticists interpret this very literally: different genotypes have unique fitness values that define their relative reproductive success. In the case of one locus with two alleles, classical population genetics theory assigned relative fitnesses W11, W12, and W22 to genotypes AA, Aa, and aa, respectively. With this formulation, when the fitnesses are ranked W11 > W12 > W22, the A allele goes to fixation, and the reverse ranking makes the a allele go to fixation. In the case where the heterozygotes have the highest fitness (which we can write W11 < W12 > W22), there is a stable equilibrium, which means that regardless of the starting allelic frequency (provided both alleles are present), the population will tend toward this equilibrium. Finally, it is possible for the heterozygotes to have the lowest fitness (W11 > W12 < W22), in which case the population goes either to fixation of A or to fixation of a, depending on the starting allelic frequency.
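
The three fitness rankings can be explored with the standard one-locus recursion for random mating and viability selection, p' = p(pW11 + qW12)/W_bar, where W_bar = p^2 W11 + 2pq W12 + q^2 W22 is the mean fitness. The Python sketch uses hypothetical fitness values; for heterozygote advantage the interior equilibrium is p* = (W12 - W22)/(2W12 - W11 - W22).

```python
def select(p, w11, w12, w22):
    """One generation of random mating and viability selection at one locus.

    p is the frequency of A; w11, w12, w22 are the relative fitnesses of AA, Aa, aa."""
    q = 1.0 - p
    w_bar = p * p * w11 + 2 * p * q * w12 + q * q * w22   # mean fitness
    return p * (p * w11 + q * w12) / w_bar


def iterate(p, w11, w12, w22, generations=2000):
    for _ in range(generations):
        p = select(p, w11, w12, w22)
    return p


def interior_equilibrium(w11, w12, w22):
    """p* = (W12 - W22) / (2*W12 - W11 - W22); stable only if W12 exceeds both homozygotes."""
    return (w12 - w22) / (2 * w12 - w11 - w22)


# Hypothetical fitness values for the three cases described in the text.
print(iterate(0.1, 1.0, 0.95, 0.9))                       # W11 > W12 > W22: approaches 1
print(iterate(0.05, 0.9, 1.0, 0.8), interior_equilibrium(0.9, 1.0, 0.8))  # approaches 2/3
print(iterate(0.4, 1.0, 0.8, 0.9), iterate(0.3, 1.0, 0.8, 0.9))           # 1 or 0
```

With underdominance the same formula gives an unstable threshold (1/3 for these values) that separates the two outcomes, which is why the starting frequency decides whether A or a is fixed.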

The modern view of natural selection is much more 
sophisticated than this. First, fitnesses are acknow- 
ledged to be a property of a particular genotype in 
a particular environment, so that relative fitnesses 
are likely to change when the environment changes. 
Second, fitness is not a property of single genes, but 
depends on the interplay of many genes which 
may interact in their effects. Finally, fitness is not a 
univariate property, but instead has many dimensions 
or components. Genotypes may differ in chance of 
survival, in mating success, in relative fecundity, in 
sperm competitive ability, and so on. One cannot 
simply add up these effects and produce a single 
net fitness because these different components have 
quite distinct effects on the dynamics of allelic fre- 
quencies. 


Explaining Genetic Variation in 
Populations 


If the ultimate goal of population genetics is to explain in a quantitative way the forces that underlie maintenance of genetic variation, then it is important to understand the nature of the data that are being explained. Many of the controversies and shifting ideas about the relative roles of neutral variation vs. natural selection, for example, arose because we had only partial information about variation. Today we routinely gather massive amounts of data on DNA sequence variation, but in the quite recent past we had only the poorest of information available. Earlier in the twentieth century, the primary data on genetic variation came from visible phenotypic mutants of the sort that Mendel observed and the early Drosophila geneticists worked with. Such mutants are rare in natural populations, so it is not surprising that the view of the ‘classical’ school, championed largely by H. J. Muller, was that there was a wild-type genotype, and that most variation in populations could be characterized as deleterious deviations from this ideal. The subsequent discovery of much higher levels of variation led to the rejection of this view in favor of the view that there is no universally ‘best’ genotype, but that any of several extant genotypes might, in some environments, have the highest fitness.

Until the late 1950s, the only two forms of genetic 
variation that were scored were those that had pro- 
nounced phenotypic effects, like coat color mutations 
or blood groups, and those that affected chromosome 
morphology and were visible under the microscope. 
Theodosius Dobzhansky made particularly good 
use of cytogenetic polymorphisms in characterizing 
the distribution and frequencies of inversion poly- 
morphisms in Drosophila pseudoobscura. Early on 
Dobzhansky observed stable maintenance of inversion 
frequencies that differed from one population to 
another, seasonal cycling in frequencies in some loca- 
tions, clinal change in frequencies across temperature 
and altitude gradients, and population cages of flies 
maintained stable inversion polymorphisms. All of 
these observations convinced him that natural selec- 
tion was acting directly, and that the different inver- 
sion genotypes had different fitness values. 

In 1966, a landmark paper by Richard Lewontin 
and John Hubby established the utility of protein 
electrophoresis for characterizing polymorphisms at 
multiple loci encoding soluble enzymes. There soon 
followed an avalanche of studies quantifying protein 
polymorphisms in organisms ranging from bacteria 
to elephant seals. In 1968 Motoo Kimura published a 
paper in which he laid out the arguments for the 
neutral theory of evolution, and a full-blown contro- 
versy was launched centering on the question of 
whether most protein polymorphism is maintained 
by natural selection or whether it is selectively neutral. 
It seemed as though the neutralists had the upper hand 
because so many facets of the data were well fitted 
by the theory, but DNA sequence data would soon 
change this picture. In 1983 Martin Kreitman’s PhD thesis examined the DNA sequence differences among 11 alleles of the gene for alcohol dehydrogenase in D. melanogaster. A staggering 52 positions of the gene were found to vary, and patterns of this variation were far from random. The most striking finding was that, of the 14 positions that varied in coding regions of the gene, only one resulted in an amino acid polymorphism, whereas 13 were silent. Since about 3/4 of random changes in coding positions would produce an altered amino acid, if we saw 13 silent changes then proportionately we should see three times as many replacement changes. That only one replacement was seen instead of 39 means that the other 38 were eliminated from the population, testifying to the exquisite sensitivity of natural selection in identifying altered protein forms.
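
The proportional argument can be written out explicitly. Assuming, as the text does, that about 3/4 of random coding changes would alter an amino acid, and assuming that silent changes are neutral, the expected number of replacement polymorphisms is the silent count multiplied by (3/4)/(1/4). Exact fractions keep the arithmetic free of rounding; the function name is hypothetical.

```python
from fractions import Fraction


def expected_replacements(silent_observed, replacement_fraction):
    """If a fraction f of random coding changes alter an amino acid, neutral
    expectation is replacements : silent = f : (1 - f)."""
    return silent_observed * replacement_fraction / (1 - replacement_fraction)


expected = expected_replacements(13, Fraction(3, 4))
observed = 1
print(expected, expected - observed)  # 39 38
```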

Since Kreitman’s 1983 paper there have been many 
studies of polymorphism at the DNA level, and our 
picture of the nature of evolution at the molecular 
level is being considerably refined. Genes have been 
identified with surprisingly little variation, presum- 
ably owing to a recent selective sweep event. Genes 
with stunningly high levels of replacement poly- 
morphism have also been found, particularly those 
encoding proteins important in the immune system’s 
ability to identify diverse pathogens. Such genes also 
often show shared polymorphism, where the same 
molecular polymorphism is present in closely related 
species, indicating that the allelic diversity has been 
present in the two species all the way back to the 
time that they shared a common ancestor. Now there 
is an enormous push to characterize human variation 
at the DNA level, and there is so much medical 
interest (and funding) behind this quest that it is 
likely that data on human polymorphism will dwarf 
that of all other species in the near future. Genome 
projects have already reported in excess of 2 million 
nucleotide positions in the human genome that 
show differences among fairly small samples of indi- 
viduals. 


Role of Mathematical Theory in Modern 
Population Genetics 


We have seen several instances in which mathematical 
theory has played a key role in the development of 
population genetics, and this is certainly true today. 
The intimate interplay between empirical observation 
and mathematical modeling of the underlying pro- 
cesses makes modern population genetics almost 
unique among the life sciences. Historically, the field 
was very much theory driven, meaning that the the- 
oretical work preceded what could be known empir- 
ically, and the theory helped to shape what sorts of data 
needed to be collected for the field to progress. Popu- 
lation genetics gave rise to many deep questions in 
mathematics (e.g., branching processes were invented 
as a result of a population genetics question), and it 
played an even bigger role in the development of 
modern statistical theory. The correlation coefficient 
we use today was devised by Karl Pearson to describe 
quantitatively the relation among relatives in a genet- 
ics problem. Analysis of variance was an invention of 
R. A. Fisher, also in the context of genetics problems.
Many other statistical procedures have their roots in
problems faced by population geneticists.

More recently, with the discovery of altogether 
unexpected phenomena, such as codon usage bias, 
transposable elements, etc., the empirical data have 
surged ahead of the theory in generating new ques- 
tions. This in turn has resulted in a different perspec- 
tive for the theoretical work. As pointed out by 
Warren Ewens, earlier theory had been largely pro- 
spective, meaning that it tried to project forward in 
time the frequency dynamics of genetic variation. But 
the richness of DNA sequence data meant that such 
forward projections were of limited use. Instead, cur- 
rent theory more often takes a retrospective look, and 
projects backward in time to ascertain how forces of 
mutation, drift, and selection could give rise to the 
observed sample. 

Foremost among the retrospective approaches in 
theoretical population genetics is the use of the co- 
alescent. This refers to a mathematical construct of a 
gene tree, starting with the current sample and imagin- 
ing the ancestral relationships among those sample 
members as one extends backward in time. At some
point in the past, two of the sampled gene copies trace
back to a single copy in one ancestral individual (that
is, they share common ancestry). At that point there is
one fewer distinct lineage in the sample. This process
repeats recursively backward until there is but one
copy of the gene, the
common ancestor of all the genes in the sample. John 
Kingman developed the initial mathematics of this 
process (distributions of times back to common ances- 
try), and many refinements and applications have been 
added by Simon Tavaré, Richard Hudson, Joseph 
Felsenstein, Montgomery Slatkin, John Gillespie, and 
others. Along with this retrospective view, Bayesian 
methods have emerged that make extensive use of
Markov chain Monte Carlo (MCMC) sampling to obtain
posterior distributions of the parameters of population
genetic models.
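As a minimal sketch of Kingman's coalescent, the Python below simulates the time back to the most recent common ancestor of a sample of n gene copies. It assumes a neutral population of constant size and measures time in units of 2N generations; on that scale each of the k(k-1)/2 pairs of lineages coalesces at rate 1, so the waiting time while k lineages remain is exponential with rate k(k-1)/2. The function names are illustrative, not taken from any particular package.

```python
import random

def simulate_tmrca(n, rng):
    """One draw of the time to the most recent common ancestor (units of 2N generations)."""
    total = 0.0
    k = n
    while k > 1:
        rate = k * (k - 1) / 2.0      # number of pairs that could coalesce next
        total += rng.expovariate(rate)
        k -= 1                        # one fewer distinct lineage after each merger
    return total

def expected_tmrca(n):
    """Analytical expectation: the sum over k of 2/(k(k-1)), which telescopes to 2(1 - 1/n)."""
    return 2.0 * (1.0 - 1.0 / n)

rng = random.Random(42)
reps = 20000
mean_sim = sum(simulate_tmrca(10, rng) for _ in range(reps)) / reps
print(f"simulated {mean_sim:.3f}  expected {expected_tmrca(10):.3f}")
```

The final merger, from two lineages to one, has an expected duration of 1 unit, so it accounts for roughly half of the total depth of the tree however large the sample is.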


Open Problems in Population Genetics 


Population genetics remains an active and lively field 
in part because it addresses such sweeping questions 
about the process of evolution and the relationship 
of humans to other organisms. A few of the topics of 
active inquiry are described briefly below. 


Relation between Nucleotide Diversity and 
Recombination Rate 

First noted in Drosophila but later seen in many 
organisms is the positive correlation between local 
rate of recombination in a genome and the level of 
nucleotide diversity in that region of the genome. Some 
explanations could be discarded readily, including, for
example, the notion that recombination might be
mutagenic. This could be discarded because the level 
of interspecific divergence is quite uniform across 
widely varying rates of recombination, so regions 
of high recombination could not have elevated muta- 
tion rates. Two models remain viable and different 
aspects of the data support each. The first model is 
that when adaptive mutations occur, they increase in 
frequency until they are ultimately fixed. This process 
of selection of favorable new mutations results in a 
‘sweeping’ of linked alleles as well. In regions of low 
recombination, the size of this swept region is greater, 
and the genetic variation in such a region is reduced 
after a sweep. Regions with a high recombination 
rate allow the favorable allele to recombine away 
from flanking alleles, so less variability is lost by the 
sweep. Such a model predicts a distribution of allele
frequencies skewed toward rare variants, since variation
in a recently swept region is recovering through new
mutations, much as it would in an expanding population;
in general, Drosophila data do not exhibit this pattern.

The second model is called background selection 
and it is a little more subtle. The idea is that deleterious 
mutations occur throughout the genome, and selection 
removes the deleterious alleles. As selection removes 
alleles, it also removes flanking variation, again with 
larger flanks removed in regions of low recombin- 
ation. Those alleles that are removed do not contribute 
to descendant lineages, so in effect the population 
size has been reduced. Regions of low recombination 
then face a greater reduction in effective population 
size by this background selection. Regions with lower 
effective population size would be expected to show 
lower levels of variability, thereby generating the 
observed association between recombination rate 
and genetic variability. We need more data on a 
genome-wide scale to assess the relative roles of 
these two mechanisms in producing the positive 
correlation between recombination rate and nucleo- 
tide diversity. 
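Nucleotide diversity, the quantity correlated with recombination rate above, is commonly estimated as π, the average proportion of sites that differ between two sequences drawn from a sample. The sketch below is a minimal version, assuming aligned, equal-length sequences with no gaps or missing data; the function name is hypothetical. In a genome-wide study one would compute π in windows along the chromosome and then correlate the window values with local recombination rates.

```python
from itertools import combinations

def nucleotide_diversity(sequences):
    """Average proportion of differing sites over all pairs of aligned sequences."""
    length = len(sequences[0])
    if any(len(s) != length for s in sequences):
        raise ValueError("sequences must be aligned to the same length")
    pairs = list(combinations(sequences, 2))
    # Count mismatching positions for every pair of sequences.
    diffs = sum(sum(a != b for a, b in zip(s1, s2)) for s1, s2 in pairs)
    return diffs / (len(pairs) * length)

sample = ["AAAA", "AAAT", "AATT"]
print(nucleotide_diversity(sample))  # pairwise differences 1, 2, 1 over 4 sites: 1/3
```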


Genetic Conflict and Evolution at Different Levels

There are many situations in which natural selection 
appears to be working in conflicting directions for 
different aspects of an organism’s biology. Such pro- 
blems fall under the general name of genetic conflicts, 
and they give rise to some interesting puzzles for the 
evolutionary biologist. New mutations, for example, 
may be advantageous in males but disadvantageous 
in females. In particular, novel alleles of genes that 
encode seminal proteins may confer greater sperm 
competitive success, but come at a cost to the female’s 
survival or fertility. Genes involved in the immune 
system may serve the positive function of protection 
from pathogenic infections, but in the absence of
infection they may give rise to debilitating autoim- 
mune disease. 


Muller’s Ratchet 

H. J. Muller pointed out that an asexual organism 
faced the problem that in a finite population, random 
genetic drift would eventually result in the loss of the 
fittest genotype. If the population were small, it might 
lose the fittest genotype at a rate faster than its reap- 
pearance by mutation. If this were so, then there 
would be a ratcheting back in population fitness that 
would eventually lead to extinction. Why does this
not occur for mitochondrial genomes, or for Y
chromosomes? There are several mechanisms available
to organisms to retard the rate at which Muller’s
ratchet proceeds. The most effective way to stop the
ratchet is to allow for recombination, since it allows
the fittest genotypes to be regenerated. But for truly
asexual genomes, such as mtDNA, the Y chromosome, or
totally asexual organisms, more subtle approaches need to be
taken. For mtDNA, the process of drift is very much 
more complicated, because sampling occurs in produ- 
cing an egg with its multiple mitochondria, each of 
which has multiple mtDNA molecules. Theoretically, 
this multistage sampling itself results in a much slower 
ratchet. Chloroplast DNA, which is also uniparen- 
tally inherited, has regions of inverted repeats that 
are functionally diploid in a way that can also retard 
the ratchet. In a broader context, population genetics 
addresses the role of sexual reproduction and recom- 
bination in adaptive evolution. 
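A minimal simulation sketch of the ratchet follows. It assumes a haploid asexual population of constant size N, multiplicative fitness (1 - s)^k for an individual carrying k deleterious mutations, Poisson-distributed new mutations with mean U per genome per generation, and no back mutation; all of these, and the function names, are illustrative assumptions. Each increase in the smallest mutation load present in the population is one "click" of the ratchet. The helper also reports Haigh's (1978) deterministic approximation N e^(-U/s) for the size of the mutation-free class, a standard result not given in the text: when that number is small, drift loses the best class quickly and the ratchet turns fast.

```python
import math
import random

def poisson(lam, rng):
    """Poisson draw by Knuth's multiplication method (adequate for small lam)."""
    limit = math.exp(-lam)
    k = 0
    prod = rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k

def simulate_ratchet(pop_size, u, s, generations, seed=0):
    """Minimum number of deleterious mutations carried by any individual, per generation."""
    rng = random.Random(seed)
    loads = [0] * pop_size                            # start mutation-free
    history = []
    for _ in range(generations):
        weights = [(1.0 - s) ** k for k in loads]     # multiplicative fitness
        parents = rng.choices(loads, weights=weights, k=pop_size)
        # Asexual inheritance: offspring keep every parental mutation and add new ones.
        loads = [k + poisson(u, rng) for k in parents]
        history.append(min(loads))
    return history

def haigh_least_loaded_class(pop_size, u, s):
    """Deterministic expected size of the mutation-free class, N * exp(-U/s)."""
    return pop_size * math.exp(-u / s)

history = simulate_ratchet(pop_size=50, u=0.5, s=0.02, generations=200)
print("ratchet clicks:", history[-1])
print("expected best-class size:", haigh_least_loaded_class(50, 0.5, 0.02))
```

Because there is neither recombination nor back mutation in this model, the minimum load can never decrease, which is exactly the one-way character of the ratchet.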


Genetic Basis of Complex Traits 
Just as R. A. Fisher wrestled with the problem of 
continuous variation and its underlying genetic basis, 
we are still today trying to understand the role of 
genes in traits that do not segregate like Mendelian 
genes but which aggregate in families to an extent 
that clearly demonstrates a genetic contribution. 
Today the problem is often painfully practical. We 
need to find the genes for diabetes, cardiovascular 
disease, and cancer if we are to make further progress 
in treating these major sources of mortality and mor- 
bidity. If one imagines that the world is like that 
supposed by R. A. Fisher, with thousands of genes 
affecting each trait and each gene having a minute or 
infinitesimal effect, then the problem is clearly hope- 
less, and minute allelic effects could not be individu- 
ally characterized. The statistics of the situation are 
such that only genes with fairly substantial effects 
(more than about 5% of the trait variance) could be 
detected. 

The idea of a Quantitative Trait Locus (QTL) 
was developed to formalize the mapping of genetic
variation for complex traits. By scoring many genetic
markers throughout the genome, it is possible, using
maximum likelihood or regression methods, to test
the hypothesis that a QTL is located at each position 
in the genome, including positions between markers. 
QTL maps provide statistical support for the rough 
number and location of genes that affect the trait. 
Considerable progress has been made in developing 
these methods and in testing them with such traits as 
bristle numbers in Drosophila or flowering time in 
Arabidopsis. There is a massive impetus to apply 
related methods to traits associated with chronic dis- 
ease risk in humans. Methods in human genetics tend 
to focus on excess sharing of allelic identity among 
affected relatives, or on cotransmission of traits and 
alleles, but the principle of using flanking anonymous 
markers to map QTLs is the same. Clearly, underly- 
ing processes of population subdivision and random 
genetic drift play a key role in determining the efficacy 
of these approaches. 
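The regression version of the marker test described above can be sketched in a few lines. This is a minimal single-marker scan under stated assumptions: genotypes coded 0/1/2 as counts of one allele, a normally distributed trait, and a LOD score computed from the ratio of residual sums of squares, (n/2) log10(RSS_null / RSS_QTL). It tests only at the markers themselves; interval mapping would also test positions between them. The data are simulated and hypothetical, with the QTL placed at marker index 2.

```python
import math
import random

def marker_lod(genotypes, trait):
    """LOD score comparing 'additive QTL at this marker' against 'no QTL'."""
    n = len(trait)
    mean_g = sum(genotypes) / n
    mean_y = sum(trait) / n
    sxx = sum((g - mean_g) ** 2 for g in genotypes)
    if sxx == 0:
        return 0.0                                   # monomorphic marker: no information
    sxy = sum((g - mean_g) * (y - mean_y) for g, y in zip(genotypes, trait))
    rss_null = sum((y - mean_y) ** 2 for y in trait)
    rss_alt = rss_null - sxy ** 2 / sxx              # residual SS after regressing on genotype
    return (n / 2.0) * math.log10(rss_null / rss_alt)

def lod_scan(markers, trait):
    """LOD score at each marker in turn."""
    return [marker_lod(g, trait) for g in markers]

# Hypothetical F2-like data: 5 unlinked markers with 1:2:1 genotype ratios.
rng = random.Random(1)
n = 200
markers = [rng.choices([0, 1, 2], weights=[1, 2, 1], k=n) for _ in range(5)]
trait = [1.0 * markers[2][i] + rng.gauss(0.0, 1.0) for i in range(n)]
scores = lod_scan(markers, trait)
best = max(range(len(scores)), key=scores.__getitem__)
print([round(s, 2) for s in scores], "peak at marker", best)
```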


Efficacy of Linkage Disequilibrium Mapping 
Along with the general problem of finding genes 
for complex traits is the quite specific issue of using 
single nucleotide polymorphisms (SNPs) in the human 
genome to locate genes associated with disease by 
virtue of linkage disequilibrium (LD). Linkage dis- 
equilibrium refers to a lack of statistical independence 
of alleles at two or more loci. The idea is that if we
type enough SNPs, some typed SNP is likely to lie
close to each causal variant, so that an association
between that SNP and the disease can be detected
through linkage disequilibrium. For this to work, we need a
much more detailed picture of linkage disequilibrium 
in human populations, and such surveys are now 
under way. What is the distribution of LD across the 
human genome? How much does it vary among popu- 
lations? How many markers will be needed to map 
traits in this way? Can we develop a single set of such 
markers for use in all human populations? These are 
problems of major medical importance that require 
input by population geneticists. 
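How well a typed SNP stands in for an untyped causal variant is often summarized by r², the squared correlation between alleles at the two sites: the closer r² is to 1, the more faithfully an association at the causal site is mirrored at the SNP. The sketch below assumes phased two-locus haplotype counts are available; the function name is hypothetical. D is the standard coefficient of linkage disequilibrium, x_AB x_ab - x_Ab x_aB, and r² divides its square by the product of the allele-frequency variances at the two loci.

```python
def ld_from_counts(n_AB, n_Ab, n_aB, n_ab):
    """Return (D, r^2) for two biallelic loci from phased haplotype counts."""
    total = n_AB + n_Ab + n_aB + n_ab
    x_AB, x_Ab, x_aB, x_ab = (c / total for c in (n_AB, n_Ab, n_aB, n_ab))
    p_A = x_AB + x_Ab                  # frequency of A at the first locus
    p_B = x_AB + x_aB                  # frequency of B at the second locus
    d = x_AB * x_ab - x_Ab * x_aB
    denom = p_A * (1 - p_A) * p_B * (1 - p_B)
    r2 = d * d / denom if denom > 0 else 0.0   # undefined if either locus is monomorphic
    return d, r2

print(ld_from_counts(40, 10, 10, 40))  # approximately (0.15, 0.36)
```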


Human Origins 

Although collection of data on human genetic vari- 
ation has been progressing over the past 50 years or 
more, we are just at the beginning of the really large- 
scale efforts to understand human variation at a global 
and genome-wide scale. Data from classical blood 
proteins and from mitochondrial DNA studies have 
been suggesting that sub-Saharan African populations 
harbor more variability than other human populations,
and that in genealogical trees, these populations
seem to fall at the root (i.e., closest to the ancestor). 


The mitochondrial data, and more recent DNA 
sequence data, seem to be reasonably consistent in 
showing that modern humans migrated out of Africa 
around 80 000 to 150 000 years ago, and then spread all 
over the globe. This was at first surprising, because 
Homo erectus remains from 1.5 million years ago and 
older had been found in many parts of Asia and 
Europe. The implication is that there was some sort 
of replacement of Homo erectus by modern humans, 
but the details are still obscure. In Europe there was
clearly overlap with another early human, namely
Neanderthal man. What is the chance that there
was gene flow between modern humans and the
Neanderthals? These issues demand analysis of
the population genetics of the situation, and the recent
recovery of mitochondrial DNA fragments from
Neanderthals has certainly stimulated greater work on
these problems. By gathering and analyzing growing
volumes of data on DNA variation, we hope to get 
better inferences of the timing of movements and popu- 
lation expansions in the earliest history of our species. 
Population Substructure


Origins of Population Substructure


Simplicity in scientific theories is usually seen as a
virtue, and population genetics is no exception. Most
discussions of the genetics of populations start with
the simplest description of a population as a very large,
single collection of randomly mating individuals.
From this simple description, genetic properties of
populations may be deduced. For instance, genes
with multiple alleles are expected to obey the laws of 
Hardy-Weinberg and linkage equilibrium if they are 
not subject to natural selection and a sufficient num- 
ber of generations of random mating has occurred. 
However, many real populations do not fit this simple 
model. Often we find that populations have barriers that
prevent the exchange of genes between them. These 
may often be physical barriers like mountains, oceans, 
or simply great distances. In these circumstances 
members of a species are found in many different 
subpopulations that are genetically different and iso- 
lated from each other. The collection of genetically 
differentiated subpopulations is referred to as popula- 
tion substructure. 

Suppose a large population some time in the past 
sent out immigrants which created three new popula- 
tions that were isolated from each other and from the 
parental population (Figure 1A). Even if we assume 
these three new populations were initially genetically
identical, we expect that over long periods of time,
perhaps dozens or even thousands of generations, 
these populations will become genetically different 
from each other. These genetic differences may arise 
due to completely random processes like genetic drift 
or they may arise due to natural selection which acts 
differently in the three localities. More likely, genetic
differentiation is due to both processes. The
particular history of a population may in fact be
quite complicated, giving rise to a hierarchy of events
that affects the genetic characteristics of the popula- 
tion today. Thus, a single population may subdivide 
and give rise to two new isolated subpopulations that 
differentiate over time before these then subdivide and 
give rise to four subpopulations that persist today 
(Figure 1B). The present-day ecology may help to
identify this hierarchy. Thus, subpopulations 1-4
(Figure 1B) may be fish in four small streams; however,
subpopulations 1 and 2 are in streams that join a
common river, as are populations 3 and 4. Addition- 
ally these two rivers may ultimately join a single lake. 
There are clearly many other complicated hierarchies 
and subdivisions that can give rise to substructure in 
natural populations. 

The present-day populations may be completely 
isolated from each other or they may exchange 
migrants (Figure 1B). A group of populations that
communicate with each other through the exchange of
migrants is called a metapopulation. Migration of
individuals between populations may have effects on 
both the genetic variation and long-term persistence 
of a population. 
Figure 1 The origin of population structure. (A) Initially, samples from a large source population create three new
subpopulations, which are genetically identical or at least quite similar. Over time these populations become
genetically differentiated due to random genetic drift, natural selection, or both. (B) There can be a hierarchy of
sampling events. In this figure the source originally gives rise to two subpopulations. These become differentiated
over time and then subdivide into a total of four populations that continue to differentiate. The present-day
populations may be completely isolated or may exchange some migrants as a metapopulation.


Genetic Consequences of Population 
Substructure 


It is often difficult to identify the boundaries of sub- 
populations or even know if they exist. Consequently, 
population geneticists are often confronted with sam- 
ples of individuals that may come from one subpopu- 
lation or may be from many subpopulations. It turns 
out that even if all the subpopulations obey simple 
population genetic rules like Hardy-Weinberg and 
linkage equilibrium, a pooled sample from many sub- 
populations will not. The nature of these effects 
depends on whether we are looking at one locus or
multiple loci. 


Single Locus 

Suppose we are interested in genetic variation at a
single locus with two alleles, called A and a. If there
is the population substructure shown in Figure 1A,
then the frequency of A in populations 1, 2, and 3 will
be $p_1$, $p_2$, and $p_3$, respectively. The average of these
three allele frequencies is $p$. If each subpopulation is in
Hardy-Weinberg equilibrium, then the frequency of
AA homozygotes in the three populations is $p_1^2$, $p_2^2$,
and $p_3^2$, respectively. Let the average of these three
values be $P$. The naive population geneticist may then
take samples from all three populations, thinking they
are a single population, and compare the observed
frequency of homozygotes ($P$) with the Hardy-Weinberg
prediction $p^2$. This comparison would always show the
observed frequency to be at least as great as the
predicted one, $P \ge p^2$, with strict inequality whenever
the subpopulations differ in allele frequency. This is
called the Wahlund effect and is named after Sten Gösta
William Wahlund, the Swedish geneticist who first
described it in 1928.

We can in fact make a more quantitative statement
about the difference between the observed frequency
of homozygotes in the pooled sample and the
Hardy-Weinberg expectation. Just as we used the allele
frequencies in the individual subpopulations to
estimate the mean allele frequency, we can also use these
values to estimate the variance in allele frequencies,
which in this example is equal to:

$$\sigma^2 = \frac{(p_1 - p)^2 + (p_2 - p)^2 + (p_3 - p)^2}{3}$$

With this variance, $\sigma^2$, the magnitude of the
Wahlund effect is given by $P = \sigma^2 + p^2$. This last
relationship will hold no matter how many
subpopulations we have included in our pooled sample
(with equal weight given to each). It also shows that the
excess of homozygotes in our pooled sample, $P - p^2$,
is exactly equal to the variance in allele frequencies.
When there is no variation, $\sigma^2 = 0$, we will
observe the Hardy-Weinberg expectation.
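The identity $P = \sigma^2 + p^2$ can be checked numerically. The minimal sketch below assumes equally weighted subpopulations, each in Hardy-Weinberg equilibrium; the function name and the example frequencies are hypothetical. Note that the variance divides by the number of subpopulations, not by that number minus one, which is what makes the identity exact.

```python
def wahlund(subpop_freqs):
    """Pooled AA frequency, Hardy-Weinberg prediction, and variance of allele frequency."""
    k = len(subpop_freqs)
    p_bar = sum(subpop_freqs) / k
    big_p = sum(p * p for p in subpop_freqs) / k                 # observed AA in the pooled sample
    variance = sum((p - p_bar) ** 2 for p in subpop_freqs) / k   # divide by k, not k - 1
    return big_p, p_bar ** 2, variance

P, hw, var = wahlund([0.2, 0.5, 0.8])
print(P, hw, var)   # roughly 0.31, 0.25, 0.06: the homozygote excess equals the variance
```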


Two or More Loci 

Consider a second locus with two alleles, B and b. The
frequencies of the B allele in our three subpopulations
(Figure 1A) are $r_1$, $r_2$, and $r_3$. It is usual to characterize
the genetics of populations at multiple loci by
examining gamete frequencies. For the two-locus
genetic example considered here there are four possible
gamete types, AB, Ab, aB, and ab. If we let their
frequencies in population 1, say, be $x_{11}$, $x_{21}$, $x_{31}$, and $x_{41}$,
respectively, then this population is said to be in
linkage equilibrium if

$$D = x_{11}x_{41} - x_{21}x_{31} = 0.$$

$D$ is called
the coefficient of linkage disequilibrium. Even if all 
subpopulations are in linkage equilibrium, a pooled 
sample will generally not be. The magnitude of linkage 
disequilibrium in a pooled sample will be equal to the 
covariance in the frequencies of the A and B alleles 
over all subpopulations. Thus, if subpopulations with 
high frequencies of the A allele tend to either have very 
high frequencies of B or very low frequencies of B, the 
pooled subpopulations will show substantial linkage 
disequilibrium. 
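The covariance result can be verified with a short sketch. It assumes equal-sized subpopulations, each in linkage equilibrium, so that the AB gamete frequency within subpopulation i is the product of its A and B allele frequencies; the function name and frequencies are hypothetical. It uses the standard equivalent form of the coefficient, D = x_AB - p_A p_B, which equals x11 x41 - x21 x31.

```python
def pooled_ld(p_freqs, r_freqs):
    """D in a pooled sample of equal-sized subpopulations, each in linkage equilibrium."""
    k = len(p_freqs)
    # Within subpopulation i the AB gamete frequency is p_i * r_i (linkage equilibrium).
    x_AB_pooled = sum(p * r for p, r in zip(p_freqs, r_freqs)) / k
    p_bar = sum(p_freqs) / k
    r_bar = sum(r_freqs) / k
    d = x_AB_pooled - p_bar * r_bar                    # D = x_AB - p_A * p_B
    cov = sum((p - p_bar) * (r - r_bar) for p, r in zip(p_freqs, r_freqs)) / k
    return d, cov

print(pooled_ld([0.9, 0.1], [0.9, 0.1]))   # A and B both common in the same subpopulation
print(pooled_ld([0.9, 0.1], [0.1, 0.9]))   # A common where B is rare: negative D
```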

If the subpopulations come back into contact and 
mate at random it will take many generations for 
linkage disequilibrium to vanish. The magnitude of 
linkage disequilibrium will be reduced by a factor
of $1 - r$ each generation, where $r$ (distinct from the
allele frequencies $r_1$, $r_2$, $r_3$) is the recombination
fraction between the two loci. At best this means that 
linkage disequilibrium will be cut in half each gener- 
ation if the two genes are unlinked. If there are 
more than two loci then in addition to the two-locus 
measures of linkage disequilibrium there are higher 
order measures of associations between trios of loci, 
quadruples, etc. These higher order measures of asso- 
ciation will also eventually vanish with continued 
random mating although they may initially increase in 
magnitude, unlike the two-locus disequilibrium values. 
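Because two-locus disequilibrium shrinks by the factor $1 - r$ each generation under random mating, it is easy to ask how long it takes to decay to a given fraction of its starting value. The sketch below is illustrative (hypothetical function names) and counts generations with a loop rather than a logarithm, to avoid rounding problems when the answer is an exact integer.

```python
def ld_after(d0, r, generations):
    """Two-locus D after random mating: multiplied by (1 - r) each generation."""
    return d0 * (1.0 - r) ** generations

def generations_to_fraction(r, fraction):
    """Smallest number of generations for D to fall to `fraction` of its initial value."""
    if not (0.0 < r <= 0.5):
        raise ValueError("recombination fraction must be in (0, 0.5]")
    t, remaining = 0, 1.0
    while remaining > fraction:
        remaining *= 1.0 - r
        t += 1
    return t

print(generations_to_fraction(0.5, 0.01))   # 7: unlinked loci, D falls below 1%
print(generations_to_fraction(0.01, 0.5))   # 69: tightly linked loci, D halves
```

Even in the best case of unlinked loci, several generations of random mating are needed, and for tightly linked loci the decay takes many tens of generations.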

If recontact between the subpopulations does not 
result in random mating, but only an exchange of 
limited migrants between their immediate neighbors, 
linkage disequilibrium between a pair of loci will 
vanish, but at a slow rate. This rate will depend on 
the number of subpopulations and the rate of migration.
As an example, suppose the three populations in
Figure 1A receive 5% of their breeding population
from their adjacent neighbors. Even if the A and B
loci are unlinked, the linkage disequilibrium of the
pooled population will decrease by only about 5% per
generation. 


Wright’s F Statistic 


Although we have summarized the Wahlund effect as 
the observation of an excess of homozygotes in a 
population of pooled subpopulations, it can also be 
stated as a deficiency of heterozygotes in the pooled 
population. Sewall Wright developed a statistic that 
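makes use of this result. Using the parameters defined
in the section ‘Single Locus’ above, Wright’s fixation
index is defined as:

$$F = \frac{2p(1-p) - H}{2p(1-p)}$$

where $2p(1-p)$ is the Hardy-Weinberg expected
frequency of heterozygotes in the pooled sample and $H$
is the observed frequency of heterozygotes, which for
two alleles equals $2(p - P)$. Substituting for $H$ gives
the equivalent form $F = \sigma^2 / [p(1-p)]$, where $\sigma^2$ is the
variance in allele frequencies among subpopulations.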

This parameter ranges in value from 0 to 1. When there 
are no differences in allele frequencies between the 
constituent subpopulations, F=0. Alternatively, 
when the subpopulations are fixed for alternative 
alleles, so that there are no heterozygotes in the sub- 
populations, F achieves its maximum value, 1. For 
genes that are not subject to natural selection several 
precise predictions about the expected magnitude of F 
may be made. In these cases genetic drift is the major 
evolutionary force causing the differentiation of 
populations. For instance, populations with a struc- 
ture like Figure 1A and no migration between popu- 
lations or mutation at the studied loci will exhibit a 
steady increase in the magnitude of F until it even- 
tually reaches 1. F increases at a rate that depends on 
the size of the subpopulations. 
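Both properties of F described above can be sketched numerically. The first function computes F from the heterozygote deficiency, assuming equally weighted subpopulations each in Hardy-Weinberg equilibrium. The second uses the standard expectation for isolated ideal Wright-Fisher populations of N diploids with no mutation or migration, $F_t = 1 - (1 - 1/2N)^t$, which is not given in the text but illustrates its statement that F rises toward 1 at a rate set by subpopulation size. Function names are hypothetical.

```python
def fixation_index(subpop_freqs):
    """Wright's F from the frequency of allele A in equally weighted subpopulations."""
    k = len(subpop_freqs)
    p = sum(subpop_freqs) / k
    expected_het = 2.0 * p * (1.0 - p)               # Hardy-Weinberg heterozygosity of the pool
    if expected_het == 0.0:
        raise ValueError("F is undefined when A is fixed or absent everywhere")
    # Heterozygotes actually present, averaged over subpopulations.
    observed_het = sum(2.0 * q * (1.0 - q) for q in subpop_freqs) / k
    return (expected_het - observed_het) / expected_het

def expected_f_under_drift(n_diploid, generations):
    """Expected F in isolated ideal populations of N diploids after t generations of drift."""
    return 1.0 - (1.0 - 1.0 / (2.0 * n_diploid)) ** generations

print(round(fixation_index([0.2, 0.5, 0.8]), 3))   # 0.24
for n in (10, 100):
    print(n, round(expected_f_under_drift(n, 50), 3))   # smaller populations drift faster
```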

Evolutionary forces such as mutation and migration 
may prevent F from reaching 1. This is because the 
individual subpopulations will not become fixed for 
any allele since the alternative allele will be continually 
reintroduced. In the case of migration, even relatively
low levels of migration will hold the final value of F
to moderate values. If sufficient time goes by, the
forces of drift and migration should equilibrate,
producing an equilibrium, or constant, value of F equal to:

$$F = \frac{1}{4Nm + 1}$$

where N is the effective size of the population and m is
the migration rate. For example, if a population
receives just two migrants per generation (i.e.,
$Nm = 2$), F will equilibrate at $1/9 \approx 0.11$.
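The equilibrium formula is easy to evaluate, and inverting it, Nm = (1/F - 1)/4, is one commonly used indirect way to turn an observed F into a gene-flow estimate. The sketch below is illustrative only; the inversion assumes the populations have reached the drift-migration equilibrium described above, and the function names are hypothetical. The loop evaluates F for a few Nm values of the size reported in Table 1.

```python
def equilibrium_f(nm):
    """Equilibrium F from the drift-migration balance F = 1 / (4Nm + 1)."""
    return 1.0 / (4.0 * nm + 1.0)

def nm_from_f(f):
    """Invert the same formula: Nm = (1/F - 1) / 4 (assumes equilibrium)."""
    if not (0.0 < f <= 1.0):
        raise ValueError("F must lie in (0, 1]")
    return (1.0 / f - 1.0) / 4.0

for nm in (0.22, 1.0, 2.0, 42.0):
    print(nm, round(equilibrium_f(nm), 3))
```

Very little gene flow suffices to keep F well below 1: one migrant per generation already holds F at 0.2.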


Migration between Subpopulations 


Migration can clearly have a substantial impact on the 
extent of population substructure. Typically it is very 
difficult to estimate migration rates for most species. 
Even if it is possible to document the movement of 
individuals from one location to another, these
movements will have no genetic effect if those individuals
do not mate and have offspring. However, it is quite
easy to gather extensive genetic information on most
natural populations with a number of different
molecular-based techniques. In 1981 Montgomery Slatkin
devised a simple procedure for estimating rates of 
gene flow from genetic data. 

Slatkin’s technique requires an estimate of the fre- 
quency of private alleles. These are alleles that occur 
in only one of the many subpopulations examined. 
If gene flow between populations is very low we 
expect private alleles to have greater frequencies than 
when gene flow is high. Gene flow may be expressed 
as the product of effective population size and migra- 
tion rate, Nm. As described above in the section 
‘Wright’s F Statistic’, Wright’s fixation index — and 
thus the relative level of population substructure — 
will depend on the value of Nm. In Table 1 we see
very high values of Nm for marine mussels, which 
indicates very little population substructure. This 
seems reasonable since these organisms distribute 
their immature larval forms into the ocean and the 
larvae may be carried great distances by ocean cur- 
rents before they settle and become adults. On the 
other hand, the study of Plethodon cinereus included 
samples from the Southern United States in Louisiana 
and as far north as Quebec in Canada. The ability of a 
small terrestrial salamander to traverse these distances 
is clearly limited. Accordingly the estimates of gene 
flow are quite low. 
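The first step of the private-allele approach, averaging the frequencies of alleles found in only one subpopulation, can be sketched as follows. Slatkin's method then converts this average into an Nm estimate through a relationship calibrated by simulation; those constants and any sample-size correction are not reproduced here. The input layout (subpopulation name mapped to allele frequencies, with allele labels assumed unique across loci) and the function name are hypothetical.

```python
def mean_private_allele_frequency(samples):
    """Average frequency of alleles observed in exactly one subpopulation.

    `samples` maps subpopulation name -> {allele label: frequency}.
    """
    occurrences = {}
    for freqs in samples.values():
        for allele, f in freqs.items():
            if f > 0:                              # only alleles actually present count
                occurrences.setdefault(allele, []).append(f)
    private = [fs[0] for fs in occurrences.values() if len(fs) == 1]
    if not private:
        raise ValueError("no private alleles in the sample")
    return sum(private) / len(private)

samples = {
    "pop1": {"Adh-1": 0.7, "Adh-2": 0.3},
    "pop2": {"Adh-1": 0.9, "Adh-3": 0.1},
}
print(mean_private_allele_frequency(samples))   # Adh-2 and Adh-3 are private
```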


Population Structure and Gene Trees 


The ability to collect detailed genetic data directly 
from DNA sequences in natural populations has 
opened up new ways of studying population substructure.
Consider a particular DNA sequence in a plant or animal
mitochondrion. Because mitochondrial DNA is inherited
essentially without recombination, this allows us to ignore
the complications of recombination in the arguments
that follow. Each copy of this particular sequence or
haplotype must have originated from a single copy
sometime in the past. We can in fact use the techniques
traditionally used for phylogenetic inference to
construct gene trees that show the likely history of
particular haplotypes in the past.


Table 1 Estimates of gene flow (Nm) per generation in
several different animal species

Species                                   Nm
Marine mussel (Mytilus edulis)            42.0
Fruit fly (Drosophila willistoni)         9.9
Mouse (Peromyscus californicus)           2.2
Fruit fly (Drosophila pseudoobscura)      1.0
Pocket gopher (Thomomys bottae)           0.86
Mouse (Peromyscus polionotus)             0.31
Salamander (Plethodon cinereus)           0.22


Figure 2 A hypothetical gene tree. Originally four
different haplotypes exist in a single population, a, b, c,
and d. Over time each of these can either leave 0
descendants (in which case the line ends), 1 descendant
(symbolized by a single line), or 2 descendants (indicated
by a split with two new lines). At some time the single
population is split by a barrier (shown as a gray bar) into
two isolated subpopulations.

In Figure 2 we have shown a hypothetical gene 
tree. A single population starts out initially with four 
individuals, each with a different haplotype. Over 
time two of these haplotypes go extinct, a and d, 
while the other two, b and c, persist. Additionally a 
barrier is set up that subdivides the population into 
two subpopulations. Samples of individuals from each 
of these subpopulations will confirm their genetic 
separation and their true status, since one subpopula- 
tion will consist entirely of the b haplotype and the 
other the c haplotype. In practice one must have some 
means of sampling putative subpopulations and then 
the gene tree is compared with the sampling units to 
see if there is congruence. 
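One simple way to make the congruence check concrete is sketched below, under stated assumptions: each sampled haplotype has already been assigned to a clade of the gene tree, and congruence is taken to mean that no clade turns up in more than one sampled region, as in the eastern and western Lepomis samples. This criterion, the function names, and the region labels are hypothetical illustrations, not a standard published test.

```python
def clades_per_region(samples, clade_of):
    """Map each sampled region to the set of gene-tree clades observed there."""
    regions = {}
    for region, haplotype in samples:
        regions.setdefault(region, set()).add(clade_of[haplotype])
    return regions

def is_congruent(samples, clade_of):
    """True if no gene-tree clade is shared between two sampled regions."""
    seen = {}
    for region, clades in clades_per_region(samples, clade_of).items():
        for clade in clades:
            if clade in seen and seen[clade] != region:
                return False
            seen[clade] = region
    return True

# Figure 2-style example: after the barrier one subpopulation carries only b, the other only c.
clade_of = {"b": "b", "c": "c"}
samples = [("left", "b"), ("left", "b"), ("right", "c"), ("right", "c")]
print(is_congruent(samples, clade_of))                       # True
print(is_congruent(samples + [("right", "b")], clade_of))    # False
```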

As an example, the gene tree for the freshwater 
spotted sunfish, Lepomis punctatus, is shown in 
Figure 3. There is a major split in the tree that corres- 
ponds perfectly to the samples that were taken from 
western and eastern localities. 
Figure 3 The gene tree for mitochondrial haplotypes of Lepomis punctatus. The haplotypes are identified by
different numbers, whereas the geographical samples are represented by different symbols. None of the locales
where clones 1–8 (eastern samples) were found contain clones 9–17 (western samples). (Reproduced with
permission from Bermingham and Avise, 1986.)
When a gene or part of a gene is moved from one
location to another, it can fall under the regulation of 
the enhancers and promoters of other genes or become 
incorporated into a part of the chromosome that is 
packaged differently. When this occurs the gene may 
not be expressed in the correct tissue or at the right 
time, or the gene may make an incorrect amount of 
product, or it may be silenced in some cells and not 
in others, or it may be silenced completely. These 
kinds of phenomena fall under the broad category of 
genomic position effects. Aside from their intrinsic 
interest as biological phenomena, understanding the
mechanisms underlying position effects will have pro- 
found implications for the treatment of genetic dis- 
orders, some of which arise due to position effects. In 
addition, understanding genomic position effects is 
crucial for the success of gene therapy in humans and 
genetic engineering in agriculture. At this time we are 
unable to remove an abnormal gene and replace it with 
a working copy in the exact same site. Since the work- 
ing copy of the gene will be inserted at a new site in 
the genome, and therefore potentially subject to new 
regulatory elements or local chromatin structure 
effects, we must be able to ensure that the inserted 
gene is regulated in the proper way in order for any 
therapy to be effective. 


DNA, Genes, and Chromosomes: 
Normal Structure 


The blueprint for every organism, from the smallest 
virus to the largest animal, is encoded in its DNA. An 
organism’s DNA contains the instructions for when, 
where and how much of the components necessary for 
life are made. The instructions are broken up into bits 
of information called genes. Each gene usually makes a 
protein product via an RNA intermediate; that is, the
gene is transcribed into an RNA message, which is
then translated into a protein. One part of a gene, the 
structural element, encodes the protein which is 
required by that organism or cell type. For example, 
in the follicle cells of your scalp, the genes that make 
the proteins of your hair are active. Of course, this
requires that some part of the gene specify where in
the body and when its protein is to be produced; you
would not want your eye cells to produce
hair proteins. The regulatory element, or more pro- 
perly elements, is the part of the ‘gene’ that determines 
when, where, and how much product is transcribed. 
One component of the regulatory elements, called the 
promoter, is usually very close to the structural or 
protein coding part of the gene. Other regulatory elem- 
ents, called enhancer sequences, control when and 
how much of the structural element is transcribed. 
Enhancer sequences can be located some distance 
away from a gene, and can regulate more than one 
gene at a time. Thus, a gene has two parts, regulatory 
and structural elements, both of which must be 
arranged correctly for proper function, and some of
the regulatory elements can be located several thou- 
sand base pairs away from the structural gene or genes 
they regulate. However, DNA sequences such as regu- 
latory and coding elements are not the complete story. 
In all higher organisms, the way in which the DNA is
packaged also plays a very important role in gene
regulation.


Organisms other than viruses, bacteria, and archaea
have their DNA (the genome) set aside in a special
compartment in each cell called the nucleus, and these
organisms, be they single-celled or multicellular
plants, animals, or fungi, are called eukaryotes. The genome
of eukaryotes is divided into pieces that are organized 
into chromosomes. Each of the chromosomes com- 
prises a single molecule of double-stranded DNA that 
is packaged with special sets of proteins, called histone 
and nonhistone chromosomal proteins, into a struc- 
ture generically referred to as chromatin. The basic 
subunit of chromatin is the nucleosome and its struc- 
ture is now known at the atomic level. A nucleosome 
consists of about 200 bp of DNA associated with a set 
of highly conserved proteins called histones. About 
147 bp of DNA are wrapped about 1.7 times around
the nucleosome core, which is composed of eight his-
tone proteins, two each of histones H2A, H2B, H3, and H4. The
remaining DNA, between adjacent nucleosome cores, 
is packaged with another histone protein called his-
tone H1. However, chromatin is not simply DNA and
nucleosomes. The nucleosomes and DNA interact 
with many other chromatin-associated proteins to 
form higher-order chromatin structures whose forms 
and functions are still poorly understood. The rela-
tionship between chromatin structure and the essen-
tial processes of gene expression, DNA replication,
recombination, and DNA repair remains an enigma.
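Using the approximate figures given in the paragraph above, the length of the linker DNA between adjacent nucleosome cores follows by simple subtraction. The 200 bp value is only an average; the nucleosome repeat length varies between species and cell types, so the linker length varies with it.

```latex
\[
L_{\text{linker}} \approx L_{\text{repeat}} - L_{\text{core}}
  \approx 200\ \text{bp} - 147\ \text{bp} = 53\ \text{bp}
\]
```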

Genes are located in a linear fashion along the 
length of the chromosome and differ in size from a 
few hundred to many thousands of base pairs, depend-
ing in part on the size of the product they encode. In a par-
ticular species, the same genes are located in the same
position (called a locus) on the same chromosome for 
every member of the species. However, genes are not 
distributed uniformly along the length of a species’ 
chromosomes. In most eukaryotes, the DNA at the 
tips of the chromosomes, called the telomeres, and 
around a structure called the centromere contains few,
if any, genes. The DNA in the telomeres and around 
the centromeres adopts a special compact conform- 
ation called heterochromatin. The DNA in the remain- 
der of the chromosome, called euchromatin, is less 
tightly packaged and contains most of the genes. 
Many decades ago cytogeneticists discovered that the 
densely packed heterochromatin stains differently 
from the less densely packed euchromatin, and the 
two types of chromatin can be easily distinguished 
by their differing morphologies. In some organisms, 
and perhaps all, heterochromatin occupies a special 
place in the nucleus. For example, in the fruit fly, it is 
found at the periphery of the nucleus. 

Chromatin structure plays a dramatic role in nor- 
mal development, and thus by corollary, in disease 
states. The stable and heritable inactivation of
particular sets of genes is an essential part of normal
development. During embryogenesis, mechanisms of 
pattern formation generate characteristic combin- 
ations of regulatory factors and states that identify a 
given tissue or cell type. These states must be main- 
tained over many cell generations to permit the for- 
mation of the correct structures during differentiation. 
There is a rapidly accumulating body of evidence that 
specific chromatin proteins and higher-order chroma- 
tin states, called domains, are involved in establishing 
and maintaining these normal and crucial develop- 
mental decisions. Abnormal chromatin structure can 
therefore deregulate these processes. Thus, abnormal
chromatin structure may lead to specific disease states,
and in some cases to cellular transformation and even-
tually tumorigenesis. Altered chromatin states can
be local (gene specific) or more global (influencing a 
number of genes within a segment of the genome). 
Genomic position effects come about by a variety of 
mechanisms, but generally can be subdivided into those
that put a gene under the influence of an incorrect 
regulatory region (enhancers or silencers), or those 
that result from alterations in chromatin structure. 


History: Stable and Variegated Position
Effects 


The first report of variegated position effects was by 
H. J. Muller, who treated fruit flies (Drosophila melano-
gaster) with X-rays. Normally the fly’s eye is dark
red but some of the progeny of the X-ray-treated flies 
had eyes that were a mosaic of dark red cells and white 
cells. Muller called this phenomenon ‘eversporting 
events’ since the mosaic pattern seemed to differ 
from one individual to another, but now we call it 
position effect variegation (PEV). We now know that 
the X-ray treatment caused the X chromosome in 
some of the flies to break in two places; the middle
piece was inverted and the broken ends were re-
joined. One of the breaks occurred in the heterochro-
matin surrounding the centromere and the other was
near the white (w+) gene (Figure 1). The w+ gene
makes a product that is necessary for the deposition of
the eye pigments. When this gene functions normally
it results in the normal dark red eye of fruit flies, but if
it is inactive then the fly’s eye is white. As a result of
the inversion, the w+ gene was
moved from its normal position near the end of the
X chromosome to a position very close to (about
25 000 bp from) the new heterochromatic breakpoint.
In this position, the gene is expressed in some cells and
silenced in others, and thus gives rise to the red and
white mosaic pattern.

[Figure 1: 1. … telomere is repressed in some cells, but remains transcriptionally competent (active) in other cells. 2. Transgene insertion: the transgenic construct is repressed in most cells; a stable position effect. 3. Position-effect variegation: a … producing a striking mosaic pattern. Key: telomere (structure at the ends of chromosomes); heterochromatin; centromere (area where spindle fiber attaches); euchromatin.]

Since its initial report in Drosophila, PEV has been 
found in a variety of organisms from single-celled 
fungi to mammals and also in a variety of plants. It 
has not been described in bacteria or viruses. Hence, it 
seems to be a phenomenon associated with eukaryotic 
organisms. 

Historically there are fewer examples of stable 
position effects. When stable position effects occur, 
the effect on the gene product is the same in every 
tissue in which it is expressed, hence the term stable. 
Many of the early examples of stable position effects 
(sometimes called cis-trans position effects) appear to
result from mutations in two different sites within a 
single gene. Indeed, much of the historical discussion 
on the phenotypes that were used to define the stable 
types of position effect may be attributable to a lack of 
understanding of eukaryotic gene structure and will 
not be dealt with here. However, there are a few stable 
type position effects such as the duplication of the Bar 
eye gene and the Brown-dominant phenomenon in 
Drosophila that are curious structural rearrangements 
that result in altered phenotypes. The altered pheno- 
types associated with Bar eyes and Brown-dominant 
are usually fairly uniform among genetically similar 
individuals within a population and thus are good 
examples of stable position effects. However, they 
involve very small or local chromosomal or genomic 
aberrations. These appear to be somewhat special cir- 
cumstances and, while fascinating, they are probably 
best described as individual events, and thus they will 
not be discussed here. 

With the advent of modern molecular biology and 
the ability to make transgenic cell lines and transgenic 
animals, a whole new phenomenon of stable position 
effects or gene silencing was discovered. It is now 
routine to create transgenic cell lines or organisms by 
making a construct containing a gene of interest and 
inserting this construct back into the chromosome. 
Usually, the transgenic construct inserts into an ec- 
topic position in the euchromatic region of a chromo- 
some. That is, the construct does not replace the gene 
already present in the genome; rather it inserts into a 
novel site (Figure 1). The expression of these trans- 
gene constructs often depends on the region into 
which they insert; they can be silenced or expressed 
at significantly reduced levels, either immediately or 
after several cell divisions. The level of expression of 
the transgene is fairly uniform in all cells of a particu- 
lar tissue, hence they are stable position effects. 


The ectopic insertion of transgenes can also pro-
duce variegated position effects; that is, the level of ex-
pression of the transgene varies from cell to cell within
a tissue. When these insertions are examined, it is 
usually found that the transgene has inserted into a 
heterochromatic region of the genome. Therefore, this 
variegated expression of transgenes appears to be a 
special class associated with disruptions of hetero- 
chromatin and this phenomenon may be closely 
related to PEV which also results from chromosomal 
rearrangements that disrupt heterochromatic regions 
of the genome. 

As stated above, this variability in transgene ex- 
pression poses very real problems when trying to 
develop effective gene therapies. The level of gene 
expression is often critical; if too little product is 
made, the abnormality may not be cured; conversely, 
making too much product may be deleterious. 


Stable Position Effects 


Euchromatic 

As noted above, these kinds of position effects usually 
occur when a transgene has inserted into an ectopic 
site. In many cases researchers have carefully con- 
structed the transgene so that it should be under the 
control of its own regulatory elements; however, in 
the transgenic cells the levels of product are often far 
less than expected. Most researchers believe that this 
repression results from insertion of the transgene con- 
struct into a region of a chromosome that somehow 
precludes its normal regulation and expression. In its 
ectopic (anomalous) position in the genome, the trans- 
gene comes under the control of local regulatory sig- 
nals that override those included in the transgene 
construct and this constrains or inhibits expression. 
This repression can occur either directly or indirectly. 
The transgene may come under the influence of DNA
sequences termed enhancer or silencer elements,
which can act over long distances, sometimes tens of
thousands of base pairs, to alter (in the case of silencers, to decrease) the amount of
product made from a gene. Alternatively, the trans- 
gene may insert into a condensed chromatin domain 
within the genome, and this compact type of chroma- 
tin organization may preclude the regulatory proteins, 
called transcription factors, from accessing the trans- 
gene. In fact, the action of silencing elements and 
chromatin domain structure may be related; the type 
of proteins associated with an enhancer or silencer 
may help establish the chromatin structure of a 
domain of the genome. While it has not been studied 
extensively, some transgene constructs appear to 
form concatemers where they insert into the genome,
that is, strings of tandemly reiterated copies, and these
reiterated sequences themselves may stimulate the
formation of a condensed or heterochromatic type of
packaging. This may be the reason why some trans- 
gene constructs appear to be silenced over time, that 
is over several cell division cycles. A solution to the 
problem of transgene silencing may lie with a recently 
discovered kind of DNA element called a boundary 
or insulator element. As their name implies, these 
elements appear to form specialized
boundaries and perhaps chromatin domains, and 
may shield the transgene from the effects of local 
regulatory signals and allow correct expression of the 
gene. None of the boundary or insulator elements 
discovered to date work for all genes in all places, 
but the search for such ‘universal’ insulators goes on. 
It is hoped that if there is not a universal boundary or 
insulator element, then perhaps combinations of 
boundary elements may provide shielding for most, 
if not all, transgenes. 


Insertions into the Yeast Mating Locus 

The baker’s yeast, Saccharomyces cerevisiae, is capable 
of switching from one mating type to another. It can 
switch from a to α mating type and vice versa. The
genetic information for this handy ability is encoded
by specialized DNA elements, the HMR and HML
loci, each of which contains a copy of the gene instruct-
ing the cell to be one or the other mating type. How-
ever, at these sites the gene conferring the mating type
is silent. It is only when a copy of the gene from HMR
or HML is inserted into a third specialized DNA
element, the MAT locus, that it is expressed. Only one mating type can
be expressed at a time for the yeast to be fertile, so the 
copies of the genes in the HM loci must be kept 
silenced. When other genes are inserted into the HM 
loci, they are also silenced. This is another example of 
a stable position effect because, in the absence of any 
mutations, a gene inserted into the HM loci is always 
silenced. 

This phenomenon has been the subject of extensive 
biochemical and genetic analysis and therefore much 
is known about the DNA elements and proteins 
responsible for the silencing that occurs when genes 
are inserted into HM loci (Lustig, 1998). At the DNA 
level, repression requires the presence of binding sites 
for a number of the proteins that are required for the 
initiation and maintenance of silencing. These include 
binding sites for ABF1, ORC (the origin recognition
complex), and RAP1 proteins. In addition, silencing is
dependent on a number of other proteins including 
SIR2, SIR3, and SIR4 (the silent information regulator 
proteins) and the amino terminal tails of histones H3 
and H4, which are two of the four components of 
nucleosomes. Biochemical analysis of the structure 
of the DNA at the HM loci suggests it is packaged in
a closed conformation similar to heterochromatin and
therefore this kind of silencing may be analogous to
position effect variegation in higher eukaryotes (see 
below) except that an inserted gene is always silenced 
completely rather than displaying a mosaic expression 
pattern. 


Insertions into the Yeast rDNA Genes 

The genes that code for the ribosomal RNA (rDNA) 
of S. cerevisiae are located in an array of 100-200 
copies on chromosome 12. Only about half of the 
copies of the rDNA are active at any one time. Struc- 
tural analysis of the rDNA reveals that packaging 
differences exist between the active and inactive 
genes. Those that are inactive are inaccessible to a 
number of biochemical probes suggesting that the 
inactive genes are packaged in a closed conformation, 
perhaps similar to heterochromatin. It has been found 
that when genes are inserted into the rDNA they can 
also be silenced and therefore this is an example of a 
stable position effect. Silencing in the rDNA array is 
also dependent on SIR2, but independent of the other 
proteins required for silencing at the HM loci. Some 
of the proteins required for rDNA silencing have been identi-
fied, but much less is known about this type of silen-
cing. However, this is an active and exciting area of
research, since the rDNA of many organisms
exists in multiple copies which are arranged as an array
of repeated units. The large ribosomal RNA precursor encoded
by each rDNA unit is transcribed by a special RNA polymerase
(RNA polymerase I). It will be interesting to see if the rDNA
genes from a variety of different organisms are packaged similarly.


Variegated Position Effect 


Position-Effect Variegation: Historically the 
Longest Studied of the Variegated Position 
Effects 

Position-effect variegation (PEV) usually occurs 
when a chromosomal rearrangement (an inversion, 
translocation, or transposition, including now trans-
gene inserts) places a gene that is normally found
in a euchromatic environment next to a broken piece 
of heterochromatin (Grigliatti, 1991; Henikoff, 1994). 
It is also true that when genes that are normally pre- 
sent in heterochromatic regions of the chromosome 
are placed into a euchromatic region, they too 
variegate (Weiler and Wakimoto, 1998). But there are 
few genes in heterochromatin relative to euchromatin 
and thus the number of examples of the latter is
limited. Hence, this discussion will focus on euchro- 
matic genes whose expression is silenced as a conse- 
quence of their juxtaposition to a broken segment of 
heterochromatin. As noted above, PEV has been 
found in a wide variety of eukaryotic organisms. 
While this phenomenon has been studied in plants as 
well as animals, the vast majority of the data on PEV
comes from studies in Drosophila, and hence the
discussion that follows draws principally on those
studies.

As the ‘variegation’ part of the name implies, PEV 
has a diagnostic phenotype. In tissues in which the 
variegating gene is normally active in every cell, the 
PEV strain shows a striking mosaic in which
the gene is expressed in some cells and silenced in
others. Several different types of studies have demon- 
strated that the variegating gene itself is not mutated 
or lost from the cells in which it is silenced. In add- 
ition, the likelihood that a gene is silenced as a con- 
sequence of its new position, adjacent to a broken 
segment of heterochromatin, is correlated with its 
distance from the new junction. When one examines 
two or three genes whose expression can be monitored 
easily, the proportion of cells in which a particular 
gene is silenced decreases with its distance from the 
breakpoint. Genes that are closer to the illicit junction 
are silenced more frequently than those farther away.
Hence, in the first 40 years of genetic studies on PEV, 
investigators claimed that the gene silencing appeared 
to spread out from the breakpoint along the euchro- 
matin, and indeed many discussions talked about the 
‘spreading effect’ and people attempted to measure 
the distance over which it occurred. The distance over 
which this apparent ‘spreading’ can occur seems to 
vary from as little as 25 000 bp in some cases up to
about 80 polytene bands or about 2 million DNA base 
pairs in the most extreme cases. Since heterochromatin 
is packaged in a highly condensed state and contains 
few active genes, many people have speculated that the 
phenotype associated with PEV results from a hetero- 
chromatic type of packaging spreading out from the 
heterochromatin at the breakpoint. In some cells het- 
erochromatin spreads far enough to package the var- 
iegating gene in a closed, silenced state while in others, 
heterochromatin does not spread far enough and the 
gene is active. Indeed, there are several studies that 
show a strong correlation between the frequency with 
which a variegating gene is inactivated in one tissue 
and the frequency with which that region of the gen- 
ome is packaged as heterochromatin. For example, 
microscopic examination of Drosophila giant polytene 
chromosomes in variegating strains shows that the 
distance over which heterochromatin ‘spreads’ does 
vary from one cell to another. Recent studies have 
used DNA endonucleases, which cleave DNA se- 
quences rapidly when the DNA is in an ‘open’ con- 
formation, to ask whether variegating genes are 
protected from digestion by these nucleases. These 
studies have shown a strong correlation between the 
proportion of cells in which a variegating gene is 
silenced and a reduction in the sensitivity of the
variegating gene to digestion by DNA nucleases.
These results are consistent with the hypothesis that 
gene silencing caused by PEV is associated with an 
alteration in chromatin packaging. 
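The ‘spreading’ interpretation described above can be made concrete with a toy simulation. This is a minimal sketch under hypothetical assumptions, not a model taken from the studies discussed here: in each simulated cell, heterochromatin is assumed to spread from the breakpoint over a random distance drawn from an exponential distribution (the 50 000 bp mean and the 10 000 simulated cells are arbitrary illustrative values), and a gene is counted as silenced in that cell if the spread reaches it. The function name `fraction_silenced` and its parameters are likewise hypothetical.

```python
import random


def fraction_silenced(distance_bp, n_cells=10_000, mean_spread_bp=50_000, seed=1):
    """Toy 'spreading' model of PEV; all parameter values are hypothetical.

    For each simulated cell, heterochromatin is assumed to spread from the
    breakpoint over a random distance drawn from an exponential distribution
    with mean `mean_spread_bp`. A gene `distance_bp` away is silenced in that
    cell if the spread reaches it. Returns the fraction of silenced cells.
    """
    rng = random.Random(seed)  # fixed seed: same simulated cells for every distance
    silenced = 0
    for _ in range(n_cells):
        spread = rng.expovariate(1.0 / mean_spread_bp)
        if spread >= distance_bp:
            silenced += 1
    return silenced / n_cells


if __name__ == "__main__":
    for d in (25_000, 100_000, 500_000):
        print(f"{d:>8} bp from breakpoint: {fraction_silenced(d):.2f} of cells silenced")
```

Because the same seeded set of simulated cells is reused for every distance, the silenced fraction can only fall as the distance from the breakpoint grows, and since each cell draws a single spread distance, any simulated cell that silences a distant gene also silences a nearer one. The exponential distribution is an arbitrary choice for illustration; real variegating strains need not follow it.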

More recently, an alternative theory has been put 
forward to explain PEV. As noted above, in many 
organisms, the heterochromatic portions of the 
chromosomes are located in very specific locations, 
often around the nuclear periphery. It is thought 
that because a variegating gene is now close to a 
heterochromatic breakpoint, in some cells the gene 
will be ‘dragged’ to a heterochromatic compartment 
in the nucleus where it will be silenced. The silencing 
may occur because, once in the heterochromatic com- 
partment, the gene is packaged as heterochromatin or 
perhaps because transcription factors, necessary for 
proper gene activity, are excluded from the compart- 
ment. Indeed, in experiments that determined the 
location of a variegating gene, the proportion of cells 
in which the gene was associated with the nuclear 
periphery was in good agreement with the proportion
of cells in which the gene was silenced. It should be 
noted that these two models of the mechanism by 
which PEV occurs are not mutually exclusive. Both 
may be correct. Indeed, the two mechanisms might 
work in concert with one another: the ‘spread-
ing’ of heterochromatin might initiate a mislocaliza-
tion of the region into a transcriptionally silent
compartment of the nucleus or, conversely, dragging
a normally euchromatic region of the genome into a 
special compartment of the nucleus may promote its 
packaging as heterochromatin. 

A chromosome is approximately
50% DNA and 50% protein. The bulk of these pro- 
teins are histones. The remaining proteins are believed 
to have structural and regulatory roles in the chromo- 
some but remain largely unidentified. In the last two 
decades a number of laboratories have tried to identify 
genes that encode chromatin proteins by creating 
dominant mutations that either suppress or enhance 
PEV. The idea is that many of these mutations would 
identify proteins that either package DNA into speci- 
fic types of chromatin structures, or attach chromatin 
to the nuclear matrix or nuclear envelope (chromatin 
structural proteins), or proteins that modify chroma- 
tin structure (chromatin regulatory proteins). While 
mutations in well over 30 different genes have been 
identified as modifiers of PEV (called Su(var) or 
E(var) for suppressor or enhancer of PEV, respec- 
tively), only about a dozen have been cloned, se- 
quenced, and characterized. Many of these seem 
to encode chromatin structural proteins, while others 
encode chromatin modifying proteins, for example, 
histone deacetylase proteins, which modify the his-
tone tails and in so doing alter the chromatin structure
from a more ‘open’ to a more ‘closed’ configuration.
However, not all of these Su(var) proteins have been 
assigned a role in controlling either chromatin struc- 
ture or attachment of chromatin to the nuclear matrix 
or envelope. 

Finally, we now know that the proteins that pack- 
age DNA into chromatin are made in stoichiometric 
amounts (the proteins are made in quite precise 
amounts relative to one another). One set of studies 
placed two or more different variegating rearrange- 
ments together in the same genome to ask whether 
they would compete with one another for the com- 
ponents (proteins) required for silencing. Curiously, 
some variegating rearrangements did appear to com- 
pete with one another. That is, while both variegation 
reporter genes were often silenced in single variegating 
strains, when two variegating rearrangements were 
combined only one of the variegation reporter genes 
was strongly silenced, while the other appeared to 
be ‘relieved’ from silencing. In other combinations, 
neither variegating reporter gene appeared to be 
influenced by the presence of the other, that is, both 
were silenced to the same extent as they were as 
single variegating strains. These data suggest that if 
silencing occurs as a consequence of chromatin pack-
aging, then these proteins are made in limited and rea-
sonably precise amounts and that some variegating
breakpoints or regions share at least a subset of these 
components, while other variegating breakpoints or 
regions share few if any of these components. These 
results are consistent with studies of the various kinds 
of position effects in yeast which have shown that at 
least one component (SIR2) is common to all, but that 
each kind of position effect has its own unique set of 
proteins. Of course, if silencing occurs as a con- 
sequence of ‘compartmentalization’ rather than chro- 
matin packaging, one might interpret the competition 
experiments described above as two variegating rear-
rangements attempting to occupy the same compart-
ment and one outcompeting the other, while the
noncompetitors occupy different compartments both 
of which repress the expression of the variegation- 
reporter gene. 
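The titration reasoning in this paragraph can be illustrated with a deliberately simple toy model of the chromatin-packaging interpretation (it says nothing about the compartment interpretation). Everything in it is hypothetical: the function name `silencing_levels`, the assumption that a fixed pool of silencing components is divided among rearrangements in proportion to an ‘affinity’ weight, and the numbers used. It is a sketch for intuition only, not a description of any published analysis.

```python
from typing import Dict


def silencing_levels(pool: float, demand: float,
                     affinities: Dict[str, float]) -> Dict[str, float]:
    """Toy titration model of competition between variegating rearrangements.

    All rearrangements passed in one call are assumed to draw on the SAME
    limited pool of silencing components. Each receives a share of the pool
    proportional to its (hypothetical) affinity; its silencing level is that
    share divided by the amount needed for full silencing, capped at 1.0.
    """
    total_affinity = sum(affinities.values())
    return {
        name: min(1.0, (pool * a / total_affinity) / demand)
        for name, a in affinities.items()
    }


if __name__ == "__main__":
    # Each rearrangement on its own receives the whole pool: fully silenced.
    print(silencing_levels(1.0, 1.0, {"A": 0.9}))  # {'A': 1.0}
    print(silencing_levels(1.0, 1.0, {"B": 0.1}))  # {'B': 1.0}

    # A and B share components: A stays strongly silenced, B is 'relieved'.
    print(silencing_levels(1.0, 1.0, {"A": 0.9, "B": 0.1}))

    # C and D use different components (separate pools): no competition.
    print(silencing_levels(1.0, 1.0, {"C": 0.5}), silencing_levels(1.0, 1.0, {"D": 0.5}))
```

In this sketch a rearrangement that is fully silenced on its own is partially ‘relieved’ when paired with a higher-affinity competitor for the same pool, whereas rearrangements drawing on separate pools do not affect one another, which mirrors the two kinds of outcome described above.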

In summary, PEV has been studied for nearly 70 
years. We know that it occurs in a wide variety of 
eukaryotic organisms. There has been widespread 
speculation that the gene silencing that occurs in 
PEV very closely resembles the determinative deci- 
sions that occur during normal stages of development. 
In addition, many researchers have commented that 
PEV closely resembles some of the rearrangements 
and abnormal gene expression that occur in certain
types of cancers. Hence, the mechanism of PEV 
may be important in our understanding of normal 
developmental processes in multicellular organisms 
including humans and in understanding abnormal cell
growth and altered differentiated states that occur in 
cancer and other types of genetic disorders. Yet we 
still do not have a clear idea of the mechanism by 
which the PEV-associated gene silencing occurs, and 
we have identified only a small subset of the proteins 
involved in higher order chromatin structure. 


Telomeric Position Effect in Saccharomyces 
cerevisiae 

The tips of the chromosomes of all eukaryotes are 
composed of specialized structures called telomeres. 
The kinds of DNA sequences and proteins that act 
together to create telomeres differ from species to 
species, but telomeres form a heterochromatin-like 
structure in all species. Telomeres are often located in 
the nuclear periphery and analysis of telomeric DNA 
suggests it is packaged in a structure that is inacces- 
sible to biochemical probes. Thus in both their general 
structure and their location within the nucleus, 
telomeres resemble centromeric heterochromatin, 
the densely packaged region surrounding the centro- 
mere of the chromosome. Telomeric DNA shares 
another characteristic with centromeric heterochro- 
matin; genes inserted into or very near to telomeric 
DNA display a variegated expression pattern (Fig-
ure 1). This phenomenon is called TPE (for telomeric
position effect) and it has been studied in both 
fruit flies and yeast but the genetic and biochemical 
analysis has progressed most rapidly in S. cerevisiae 
(Grunstein, 1998). 

The telomeres of S. cerevisiae chromosomes are 
composed of about 300 bp of repeats of a simple
sequence. This sequence is not organized into nucleo- 
somes but instead binds a protein called RAP1 that 
creates a structure called the telosome. Adjacent to the 
telosome are repetitive DNA sequences that are organ- 
ized into nucleosomes. RAP1 binds to the terminal 
sequences and then recruits SIR2, SIR3, and SIR4,
which interact with the N-terminal tails of histones 
H3 and H4 in the sequences adjacent to the telosome 
to form a large complex. If a gene is inserted within 
about 3000 bp of the telomere it displays telomeric 
position effect variegation. In some cells of a yeast 
colony the gene is expressed, while in others it is 
silenced. The interactions between RAP1, the SIR 
proteins, and the histones have been studied in detail 
and much is known about the order in which the 
proteins associate with the complex during its forma- 
tion, the regions of each of these proteins that are 
required for these interactions to occur, and other 
modifying proteins that are required for the assembly 
of this complex. However, precisely how silencing 
occurs at the molecular level is unknown and remains 
an area of active research. 


Summary and Future Prospects


In summary, gene silencing is an essential regulatory 
event in the normal development of all eukaryotic 
organisms. In multicellular eukaryotes, gene silencing 
is needed to maintain the patterns of gene expression 
which occur during development and distinguish one 
tissue or cell type from another. There is now a grow- 
ing body of evidence from yeast, flies, and mammals 
that these normal developmental decisions require 
alterations in chromatin structure, that is, changes in 
gene packaging. Furthermore, these normal develop- 
mental decisions share many of the hallmarks of 
PEV in flies and other gene silencing phenomena
described in this review. Hence, genomic position 
effects are excellent models for understanding normal 
developmental processes. By corollary, the foundation 
of many of the genetic disease states that we wish to 
understand, such as cancer, is probably embedded, at
least in part, in alterations in chromatin structure and 
the failure to maintain silencing of particular genes or 
sets of genes. Finally, the gene silencing associated 
with transgene insertions is one of the three major 
impediments to the success of gene therapy, and thus 
understanding position effects is crucial for efficacious 
gene therapy. While we have known about position 
effects for several decades, it is only in the last decade 
or so that we have realized just how important they 
are as models for studying normal developmental 
processes and the molecular basis of some disease 
states, and how important they are to the success of 
gene therapy. Indeed, with this new status, many 
laboratories have recently turned their attention to 
unraveling the molecular basis of a variety of geno- 
mic position effects, both for their own sake and as 
model systems for understanding chromatin structure. 
Therefore, we should see a tremendous growth in our 
understanding of the structure of higher-order chro- 
matin and how alterations in the structure of chromatin 
domains maintain and regulate gene expression. It is 
clear that some proteins such as SIR2 are involved in 
many of these silencing phenomena. Other proteins 
seem to have a more limited or specific role in the 
types of genes they silence. Recent data suggest that 
a small subset of the Su(var) proteins, identified 
because they influence PEV, have a role in packaging 
DNA at telomeres as well as the reiterated DNA 
around the centromeres. Very preliminary data sug- 
gest some of these Su(var) proteins may interact with 
chromatin proteins known to influence chromatin 
packaging at the homeotic control loci, which are the 
genes involved in maintaining segmentation in higher 
organisms. We can look forward to a rapid growth in 
our understanding of higher-order chromatin struc- 
ture and, more importantly, how chromatin structure 
or gene packaging influences gene expression, the tim- 
ing of DNA replication, and DNA repair and recom- 
bination processes. 
Positive Interference


See: Interference, Genetic 


Positive Regulator Proteins 
Positive regulator proteins are required for the acti-
vation of a transcription unit. 


See also: Transcription 


Positive Supercoiling 


Copyright © 2001 Academic Press 
doi: 10.1006/rwgn.2001.1968 


Positive supercoiling is the coiling of the double helix 
in space in the same direction as the winding of the 
two strands themselves. 
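The definition can be stated compactly with the standard topological relation for a closed (or end-constrained) DNA duplex, which this entry does not itself spell out: the linking number Lk is the sum of twist Tw and writhe Wr, and positive supercoiling corresponds to a linking number above the relaxed value Lk₀ (an overwound molecule).

$$
\mathrm{Lk} = \mathrm{Tw} + \mathrm{Wr}, \qquad
\Delta\mathrm{Lk} = \mathrm{Lk} - \mathrm{Lk}_0 > 0, \qquad
\sigma = \frac{\Delta\mathrm{Lk}}{\mathrm{Lk}_0} > 0
$$

For example, a relaxed 5,250 bp circle at 10.5 bp per turn has Lk₀ = 500; overwinding it to Lk = 510 gives ΔLk = +10 and a specific linking difference σ = +0.02.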


See also: DNA Supercoiling; Negative 
Supercoiling 
Postmeiotic Segregation


Normal Meiotic Segregation


When a heterozygous diploid (Aa) undergoes reduc- 
tion to haploidy, each haploid cell contains one or the 
other allele — meiosis segregates alleles. For loci closely 
linked to a centromere, the segregation occurs in the
first meiotic division. For markers farther from the 
centromere, an increasing fraction of the meioses seg- 
regates the two alleles into separate nuclei only in the 
second meiotic division. In either case, the resulting 
four haploid products are 2A and 2a (2:2 segregation). 
A full accounting of the genetic content of a meiotic 
tetrad recognizes that each chromosome is a DNA 
duplex, so that each allele is represented twice in a 
single haploid cell. For that reason, tetrads manifesting 
2:2 segregation are sometimes called 4:4 tetrads. 
Figure 1 The origin of postmeiotic segregation according to the double-strand-break repair model for recombination.


In meiosis, recombination occurs when each chromosome contains two chromatids. Only the interacting homologous 
chromatids are shown. Arrowheads indicate 3’ ends of polynucleotide strands. Broken lines indicate newly synthesized 
DNA. A chromatid (A) undergoes a meiotically induced double-strand break (B). The 5’ ends created are enzymatically 
resected (C), and the resulting 3’-ended single strands invade a chromatid of the homolog (D). The junctions between 
the duplexes may undergo branch migration (E) outward. DNA synthesis, using the intact homolog as template, is primed 
by the invading 3’ ends, and the resulting joint molecule is held together by a pair of Holliday junctions (F). The joint 
molecule can be resolved by cutting of the junctions (G-H) or by unwinding (I). Segments of the recombinant products 
responsible, in the absence of mismatch repair, for 5:3 or aberrant 4:4 segregation are noted. 
Definition of Postmeiotic Segregation


Occasionally, a tetrad is found in which one of the 
haploid cells gives rise to a mixed population of cells, 
half of whose members are A and half a. Such a cell has 
a segment of heteroduplex DNA, with A information 
on one DNA strand and a information on the other. 
A tetrad containing one such cell is typically of con- 
stitution AA AA Aa aa or AA Aa aa aa (a 5:3 tetrad). 
(Sometimes the two types of tetrads are distinguished 
by calling the first one 5:3 and the second one 3:5.) 
Less frequently, tetrads are found in which two of 
the haploid cells are heteroduplex. Such tetrads are 
usually composed of the four haploid cell types AA
Aa aA aa and are called aberrant 4:4 tetrads. Both 5:3 
and aberrant 4:4 tetrads contain individual haploid 
products of meiosis in which both A and a informa- 
tion are present — meiosis has failed to fully segregate 
the two alleles. Only in the postmeiotic mitosis do the 
two alleles get fully segregated, and such tetrads are 
referred to as PMS (postmeiotic segregation) tetrads. 
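The strand bookkeeping behind these tetrad names can be made explicit with a small sketch. It assumes each haploid product is written as the pair of alleles on its two DNA strands (for example "Aa" for a heteroduplex spore); the function name and label strings are illustrative only, and the sketch ignores strand polarity.

```python
from collections import Counter


def classify_tetrad(spores):
    """Classify a tetrad from the allele carried on each strand of each spore.

    Each spore is a two-letter string giving the allele on its two DNA
    strands, e.g. "AA", "aa", or "Aa"/"aA" for a heteroduplex spore.
    Returns the strand ratio (A:a over all eight strands) and a label.
    """
    if len(spores) != 4 or any(len(s) != 2 for s in spores):
        raise ValueError("expected four spores with two strands each")
    strands = Counter("".join(spores))
    count_A, count_a = strands["A"], strands["a"]
    if count_A + count_a != 8:
        raise ValueError("strands must be written as 'A' or 'a'")

    ratio = f"{count_A}:{count_a}"
    heteroduplex_spores = sum(1 for s in spores if s[0] != s[1])

    if heteroduplex_spores == 0 and count_A == 4:
        label = "normal 2:2 (4:4)"
    elif heteroduplex_spores == 0 and count_A in (2, 6):
        label = "gene conversion (6:2 / 2:6)"
    elif heteroduplex_spores == 1 and count_A in (3, 5):
        label = "postmeiotic segregation (5:3 / 3:5)"
    elif heteroduplex_spores == 2 and count_A == 4:
        label = "postmeiotic segregation (aberrant 4:4)"
    else:
        label = "other"
    return ratio, label


print(classify_tetrad(["AA", "AA", "Aa", "aa"]))
print(classify_tetrad(["AA", "Aa", "aA", "aa"]))
```

Counting strands rather than spores is what makes a 2:2 tetrad read as 4:4 and a single heteroduplex spore read as 5:3.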


Origin of Postmeiotic Segregation 


The frequency of postmeiotic segregation depends on 
the organism, the nucleotide sequences that differenti- 
ate the two alleles, and the proximity of those mutant 
differences (markers) to a hot spot for recombination. 
The reigning model for the origin of PMS tetrads 
relates PMS closely to gene conversion (the formation 
of 3:1 tetrads) and to crossing over. Figure 1, based
primarily on data from Saccharomyces cerevisiae, dia- 
grams the double-strand break repair process that 
results in meiotic recombination. 

In Figure 1, whenever a duplex is produced that is
black on one strand and white on the other at a genet- 
ically marked site, PMS will result unless mismatch 
repair intervenes. The mismatch repair system has 
the primary role of reducing mutation rates during 
DNA replication by excising newly synthesized poly- 
nucleotide strands at replication forks when such 
a strand contains a mistakenly selected nucleotide or 
has deleted, or added, a few nucleotides. At DNA 
replication forks, the repair system distinguishes the 
correct (old) strand from the mistaken (new) strand by 
the presence of nearby ends in the new strands. The 
repair system then removes the mistaken strand from a
point near the mistake to the nearby strand end. When
the system acts on the recombination intermediate
in Figure 1, a 5:3 (PMS) tetrad can become a 6:2
(conversion) tetrad. Since a 5:3 tetrad is halfway both 
in the pathway from 4:4 to 6:2 and in the allele ratio 
from 4:4 to 6:2, it is sometimes called a half conversion 
tetrad. 
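The effect of mismatch repair on a 5:3 tetrad can be sketched as follows. This is a simplification under stated assumptions: repair is applied directly to the heteroduplex spore (rather than to the recombination intermediate), and the caller chooses which allele's strand is kept. Keeping the A strand gives the 6:2 conversion described above; keeping the other strand, a case the paragraph does not discuss, simply restores 4:4. Function names are hypothetical.

```python
def allele_ratio(spores):
    """Return the A:a ratio over all eight strands of a tetrad."""
    strands = "".join(spores)
    return f"{strands.count('A')}:{strands.count('a')}"


def repair_heteroduplex(spores, retained):
    """Mismatch-repair every heteroduplex spore so that both strands carry
    the `retained` allele (the strand carrying the other allele is excised
    and resynthesized)."""
    if retained not in ("A", "a"):
        raise ValueError("retained allele must be 'A' or 'a'")
    return [retained * 2 if spore[0] != spore[1] else spore for spore in spores]


half_conversion = ["AA", "AA", "Aa", "aa"]              # a 5:3 (PMS) tetrad
print(allele_ratio(half_conversion))                               # 5:3
print(allele_ratio(repair_heteroduplex(half_conversion, "A")))     # 6:2
print(allele_ratio(repair_heteroduplex(half_conversion, "a")))     # 4:4
```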


Markers that give the highest frequency of PMS 
when segregating from a mutant/wild-type hetero- 
zygote are those that are close to a meiotic double- 
strand break site (increasing their likelihood of being 
included in heteroduplex) and those less likely to provoke
mismatch repair when in heteroduplex.

Aberrant 4:4 tetrads imply regions of the recom- 
bination intermediate that have heteroduplex DNA 
on both participants, possibly as a result of a Holliday 
junction sliding outwards (Figure 1). 


Further Reading 
Stahl F (1996) Meiotic recombination in yeast: coronation of the 
double-strand-break repair model. Cell 87: 965–968.


See also: First and Second Division Segregation; 
Gene Conversion; Mismatch Repair (Long/Short 
Patch); Nonreciprocal Exchange; Recombination, 
Models of; Tetrad Analysis 


Posttranscriptional Modification


Posttranscriptional modifications are changes that
occur to a newly transcribed primary RNA transcript 
(hnRNA) after transcription has occurred and prior to 
its translation into a protein product. Major types of 
modification fall into three categories: 


1. A string of adenines (the ‘poly-A tail’) is added to 
the 3’ end of the transcript by the enzyme poly(A) 
polymerase. This process increases stability and 
may also be implicated in transfer of the RNA to 
the cytoplasm. 

2. A 7-methylguanosine ‘cap’ is added to the 5’ end of
the transcript, which prevents nucleases from 
destroying the transcript and is possibly involved 
in ribosome recognition and transfer of the tran- 
script to the cytoplasm. 

3. Intervening sequences (introns) in the primary 
transcript are excised. 
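The three modifications can be mimicked on a string to show what each does to the transcript. This is a toy sketch, not a model of the enzymes: the cap is shown as the text "m7Gppp", intron coordinates are supplied by hand, the example sequence is made up, and the tail length is an arbitrary parameter (natural poly(A) tail lengths vary between organisms). The name `process_transcript` is hypothetical.

```python
def process_transcript(pre_mrna, introns, tail_length):
    """Toy model of the three modifications listed above.

    pre_mrna    -- primary transcript, written 5'->3' as an RNA string
    introns     -- (start, end) pairs, 0-based and end-exclusive
    tail_length -- number of adenylate residues to add (illustrative only)
    """
    # 3. Splice: keep the segments between introns, in order.
    exons, position = [], 0
    for start, end in sorted(introns):
        if not (position <= start < end <= len(pre_mrna)):
            raise ValueError("introns must be in range and non-overlapping")
        exons.append(pre_mrna[position:start])
        position = end
    exons.append(pre_mrna[position:])
    spliced = "".join(exons)

    # 2. Cap the 5' end (shown as m7Gppp) and 1. add the 3' poly(A) tail.
    return "m7Gppp" + spliced + "A" * tail_length


pre = "AUG" + "GUAAGUUUUAG" + "CCCUAA"   # exon 1 + intron + exon 2 (made-up)
print(process_transcript(pre, [(3, 14)], tail_length=5))
```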


See also: Cap; Introns and Exons; Poly(A) Tail 
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Posttranslational 


Modification 
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Copyright © 2001 Academic Press 
doi: 10.1006/rwgn.2001.1022 


Protein synthesis occurs during a process called 
‘translation.’ Posttranslational modification of pro- 
teins refers to the chemical changes proteins may 
undergo after translation. Such modifications come in 
a wide variety of types, and are mostly catalyzed by 
enzymes that recognize specific target sequences in 
specific proteins. The most common modifications 
are the specific cleavage of precursor proteins; forma- 
tion of disulfide bonds; or covalent addition or re- 
moval of low-molecular-weight groups, thus leading 
to modifications such as acetylation, amidation, biot- 
inylation, cysteinylation, deamidation, farnesylation, 
formylation, geranylgeranylation, glutathionylation, 
glycation (nonenzymatic conjugation with carbo- 
hydrates), glycosylation (enzymatic conjugation with 
carbohydrates), hydroxylation, methylation, mono- 
ADP-ribosylation, myristoylation, oxidation, palmi- 
toylation, phosphorylation, poly(ADP-ribosyl)ation, 
stearoylation, or sulfation. Posttranslational modifica- 
tions play a fundamental role in regulating the folding 
of proteins, their targeting to specific subcellular 
compartments, their interaction with ligands or other 
proteins, and their functional state, such as catalytic 
activity in the case of enzymes or the signaling 
function of proteins involved in signal transduction 
pathways. Some posttranslational modifications
(e.g., phosphorylation) are readily reversible by the
action of specific deconjugating enzymes. The interplay
between modifying and demodifying enzymes
allows for rapid and economical control of protein
function. Achieving similar control through protein degradation
and de novo synthesis would take much longer
and cost much more metabolic energy. A very powerful way
to study posttranslational modifications is by ‘pro- 
teomics,’ an extremely rapid and sensitive method- 
ology for the systematic identification of proteins from 
cells or tissues. This involves separation of proteins
and their isoforms by size and/or charge heterogeneity
by two-dimensional gel electrophoresis, and recovery
of individual spots from the gel for analysis by mass
spectrometry. The technique not only yields sequence
information to identify the protein, but also reveals 
very precisely the site and nature of posttranslational 
modifications. 
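One way mass spectrometry reveals the nature of a modification is through the mass it adds to a peptide. The sketch below matches an observed mass shift against a few standard monoisotopic deltas for modifications named above; the peptide masses are hypothetical, and real analyses also consider combinations of modifications and use fragmentation (tandem MS) to locate the modified residue. The function name is illustrative.

```python
# Monoisotopic mass shifts (in daltons) for a few of the modifications named above.
MASS_SHIFTS = {
    "phosphorylation": 79.96633,   # + HPO3
    "acetylation": 42.01057,       # + C2H2O
    "methylation": 14.01565,       # + CH2
    "oxidation": 15.99491,         # + O
}


def match_modifications(observed_mass, unmodified_mass, tolerance=0.01):
    """Return modifications whose mass shift explains observed - unmodified mass."""
    delta = observed_mass - unmodified_mass
    return [name for name, shift in MASS_SHIFTS.items()
            if abs(delta - shift) <= tolerance]


# Hypothetical peptide: 1000.000 Da unmodified, 1079.966 Da as measured.
print(match_modifications(1079.966, 1000.000))   # ['phosphorylation']
```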


See also: Proteins and Protein Structure; 
Translation; Translational Control 
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Predator—Prey and 
Parasite—Host Interactions 


AE Weis 
The interaction between predator and prey, or between
parasite and host, involves an astonishingly diverse
array of traits. A partial list of the traits that have
been implicated as evolving under enemy- or victim-
imposed selection includes the following: morphology
of limb and jaw, pigmentation, sensory reception,
innate and learned behavior, cardiovascular function, 
digestive ability, metabolism, reproductive timing, cell 
surface proteins, immune reactions, and endonuclease 
enzymes. As might be expected, the genetic controls 
over variation in the traits on this hodgepodge list are 
equally diverse and can range from allelic substitu- 
tions at single loci to polygenic variation with strong 
environmental influence. 


Sequential Structure of Enemy-Victim Interactions


To be successful an enemy (predator or parasite) must 
complete a sequence of steps. A victim (prey or host) 
can defend itself by thwarting the enemy at one or 
more of these steps. Recognizing this sequential 
deployment of attack and defense traits is key to 
understanding the evolution of enemy and victim strat- 
egies. To take a familiar example, for a lion to eat a 
zebra it must first detect, pursue, and capture it. Zebras 
have defenses against each of these steps. The striped 
color pattern of the zebra pelt makes it difficult for 
lions to pick out a single individual from the herd. This 
can cause a brief but crucial delay in the lion’s pursuit, 
which gives the zebra a head start. Zebras usually 
escape pursuit because their limb structure, muscle
metabolism, and efficient cardiovascular system allow
them to outrun most lions. Should a lion catch up,
zebras occasionally fend it off by vigorous kicking.

Enemy-victim interactions among microbes can 
also show an attack-defense sequence. T4 phage 
attacks Escherichia coli by first attaching to a receptor 
site on the bacterium’s surface. The phage then injects
its genetic material into the cell. Soon a phage endonuclease is expressed
by the host’s transcription/translation machinery. This
phage enzyme destroys the host’s genetic material.
Bacteria evolve resistance to phage attack when a favorable
mutation occurs at the receptor site locus. Should
the phage successfully attach, the bacterium can still thwart
attack with its own restriction endonucleases,
which digest the enemy’s DNA before it can take over the cell.

The sequential elements of an attack-defense sequence
are to some degree redundant. If all zebras ran
fast enough to escape the swiftest lion, a zebra with 
stripes would be no better off than one without. Thus 
the action of selection on one trait in the sequence can 
alter the intensity of selection on the others. 
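The redundancy argument can be made concrete by treating the attack as a chain of hurdles. The sketch assumes the steps are independent, so the enemy's overall success is the product of its per-step success probabilities; all numbers are hypothetical. It shows that a defense at one step (stripes reducing detection) is worth nothing once another step (escape by running) already stops every attack.

```python
from math import prod


def enemy_success(step_probabilities):
    """Probability that the enemy completes every step of the attack,
    assuming the steps are independent hurdles."""
    return prod(step_probabilities)


def value_of_defense(steps, index, improved_probability):
    """How much one improved defense lowers the enemy's overall success."""
    improved = list(steps)
    improved[index] = improved_probability
    return enemy_success(steps) - enemy_success(improved)


# Hypothetical lion-zebra numbers: P(detect and single out), P(catch), P(subdue).
print(value_of_defense([0.9, 0.5, 0.8], 0, 0.6))  # stripes help: about 0.12
print(value_of_defense([0.9, 0.0, 0.8], 0, 0.6))  # 0.0 if no lion can catch up
```

Because the steps multiply, selection on one trait scales with how often the other steps let the enemy through, which is the interaction the paragraph describes.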


Monogenic Control of Some Simple Defense Traits


Many prey species thwart their predators early in the 
attack sequence by avoiding detection. A case in point 
is the camouflage coloration of the peppered
moth, Biston betularia, one of the most celebrated
examples of evolution through natural selection. In
pre-industrial Britain, this pale moth would rest dur- 
ing the daytime on the light-colored crustose and 
foliose lichens that commonly covered tree trunks. 
After the mid-1800s, dark forms of this species began 
to appear and became the predominant form through 
much of Britain and parts of western Europe by the 
early 1900s. This spread of the dark form coincided 
with industrialization of the European economy. Pol- 
lutants from coal-fired factories led to the demise of 
the lichens and the deposit of soot on the tree surfaces. 
Experiments by H.B.D. Kettlewell showed that when 
pale and dark moths were placed together on a light 
background, the pale moths were more likely to
escape detection by birds. Conversely, dark moths 
more often escaped detection when a dark back- 
ground was used. This implies that bird predation is 
a selective factor contributing to the spread of the dark 
form of the peppered moth. 

There are three color phenotypes for B. betularia. 
Crosses among the forms found in Britain suggest they 
differ allelically at a single locus. The pale form, typica,
is homozygous for the t allele. A darker form, insularia,
carries one of three insularia alleles, all of which are dominant to
t. The darkest form, carbonaria, carries the C
allele, which is dominant to all others. This pattern 
of dominance can change when alleles occur in differ- 
ent genetic backgrounds; crosses made between homo- 
zygous British carbonaria and Canadian typica forms 
do not always yield the carbonaria phenotype. 
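The dominance series described above (C over the insularia alleles, which are all dominant over t) can be sketched as a ranking. The symbols "I1", "I2" and "I3" are hypothetical placeholders, not the published allele names, the three insularia alleles are lumped into one phenotype class, and the sketch deliberately ignores the genetic-background effects just mentioned.

```python
# Rank 0 is most dominant: carbonaria (C) > insularia alleles > typica (t).
# "I1", "I2", "I3" are hypothetical placeholder symbols for the insularia alleles.
DOMINANCE_RANK = {"C": 0, "I1": 1, "I2": 1, "I3": 1, "t": 2}
PHENOTYPE = {0: "carbonaria", 1: "insularia", 2: "typica"}


def moth_phenotype(allele_1, allele_2):
    """Phenotype of a diploid genotype under a simple dominance series."""
    rank = min(DOMINANCE_RANK[allele_1], DOMINANCE_RANK[allele_2])
    return PHENOTYPE[rank]


print(moth_phenotype("t", "t"))    # typica
print(moth_phenotype("I2", "t"))   # insularia
print(moth_phenotype("C", "I1"))   # carbonaria
```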

Color patterns on moth and butterfly wings are 
widely believed to have defensive functions. A par- 
ticular spot and streak of pigment in a wing pattern 
can be under the strong influence of a specific locus. 
This is amply illustrated in the three related species 
Heliconius erato, H. melpomene, and H. cydno. 
Along with several rarer Heliconius species, these 
three are aposematic; that is, they are distasteful and 
warn potential predators of their unpalatability by 
their brightly colored wings. After tasting a Heliconius, 
birds avoid further encounters. These three species are 
found throughout the American tropics, where they 
are involved in a series of Müllerian mimicry rings. In
any one area, two or three of these species may co- 
occur, and in that one area they have nearly identical 
color patterns. When these same species co-occur in 
other geographical areas they also converge, but on a 
different color pattern. There is a selective advantage 
for co-occurring aposematic species to mimic one 
another: Resident predators need to learn only one 
color pattern and so the species spread the risk of 
being the predator’s ‘learning experience.’ 
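The risk-spreading argument is Müller's classic numerical one, sketched here under its simplest assumptions: a predator must kill a fixed number of prey to learn a pattern, and it samples co-mimics in proportion to their abundance. The learning cost and population sizes are hypothetical.

```python
def shared_learning_losses(attacks_to_learn, abundances):
    """Expected deaths per species while local predators learn one warning pattern.

    Assumes predators must kill a fixed number of prey to learn the pattern
    and sample co-mimics in proportion to their abundance.
    """
    total = sum(abundances.values())
    if total <= 0:
        raise ValueError("need at least one individual sharing the pattern")
    return {species: attacks_to_learn * n / total
            for species, n in abundances.items()}


# Hypothetical numbers: 10 "learning" kills, 1000 butterflies of each species.
print(shared_learning_losses(10, {"H. erato": 1000}))
# {'H. erato': 10.0}
print(shared_learning_losses(10, {"H. erato": 1000, "H. melpomene": 1000}))
# {'H. erato': 5.0, 'H. melpomene': 5.0}
```

Sharing one pattern halves each species' learning losses in this example; the saving grows as more co-mimics join the ring.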

The genetics of Heliconius color pattern were elucidated
by P. M. Sheppard and colleagues, and interpreted
in a developmental framework by H. F. Nijhout.
The basic ground pattern of the wing is yellow and 
white. Imposed on top of this background are black 
margins, bands, and spots, with additional red, brown, 
and orange spots and streaks. Thirty-five loci are 
known to affect color configuration. Five of these 
affect pigment synthesis, which includes melanin for 
black, and variations on xanthommatin for the reds, 
browns and oranges. Other loci alter the size or 
position of color elements. For instance, a locus called 
‘short’ controls the thickness of the black margin 
around a yellow band on the forewing, which in turn 
affects the band’s length. At another locus, called 
‘forewing shutter,’ alternative alleles move a black 
band on the forewing closer to or farther from the wing
base. The color pattern loci show a range of domin- 
ance relationships and many of these loci have epi- 
static effects on the expression of others. The N and B 
loci are a case in point. The NY N” bb genotype has a 
yellow band and several yellow spots on the forewing 
while the genotype N? N” BB has a large red spot in 
that position. The F2 progeny of a cross between these
types show six distinct phenotypes that differ in the 
size, location and color of the forewing markings. 
Some of these phenotypes are not obvious intermedi- 
ates between the parental forms, which implies com- 
plex interaction among the two loci. 
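Why six phenotypes can appear is easier to see from the genotype classes themselves. The sketch enumerates F2 genotype frequencies for two loci, assuming the loci are unlinked and each F1 parent is heterozygous at both; it does not attempt the genotype-to-phenotype mapping, which the text does not give. "N1" and "N2" are hypothetical labels for two N-locus alleles.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def single_locus_f2(alleles):
    """Genotype frequencies from intercrossing two heterozygotes (1:2:1)."""
    counts = Counter("/".join(sorted(pair)) for pair in product(alleles, repeat=2))
    total = sum(counts.values())
    return {genotype: Fraction(n, total) for genotype, n in counts.items()}


def dihybrid_f2(alleles_1, alleles_2):
    """Joint F2 genotype frequencies for two unlinked loci."""
    locus_1 = single_locus_f2(alleles_1)
    locus_2 = single_locus_f2(alleles_2)
    return {(g1, g2): f1 * f2
            for (g1, f1), (g2, f2) in product(locus_1.items(), locus_2.items())}


# "N1" and "N2" are hypothetical labels for two N-locus alleles.
f2 = dihybrid_f2(("N1", "N2"), ("B", "b"))
print(len(f2))                      # 9 genotype classes
print(f2[("N1/N2", "B/b")])         # 1/4
```

Nine genotype classes yielding six distinct phenotypes means some genotypes share a phenotype, while others, through dominance and epistasis between N and B, do not look like simple intermediates.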


Polygenic Control of Some More Complex Defense Traits


When prey are detected, they can survive if they 
escape before capture. Thus running, swimming, or 
flying speed can be crucial to prey survival. Locomo- 
tory speed is a complex trait that involves anatomy, 
metabolism, and behavior. Accordingly, speed differ- 
ences among individuals of a prey population appear 
to be influenced by many loci. T. H. Garland found 
heritable genetic variation in locomotor performance 
in the garter snake, Thamnophis sirtalis, a prey species 
for birds of prey. Fifty-eight per cent of the phenotypic
variance in sprint speed, measured on a treadmill, could
be explained by genetic variation. For treadmill endur- 
ance, genetic variation explained 70% of pheno- 
typic variance. Speed and endurance are likely to 
be influenced by many of the same physiological 
factors, and, not surprisingly, the genetic correlation 
between speed and endurance was 0.58 in this snake 
population. 
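The two statistics quoted for the garter snake are ratios of variance components. The sketch below shows the arithmetic, assuming phenotypic variance is simply genetic plus environmental variance (no genotype-environment covariance or interaction). The variance components are hypothetical values chosen to reproduce the reported figures, not the study's data.

```python
from math import sqrt


def genetic_fraction_of_variance(var_genetic, var_environmental):
    """Share of phenotypic variance attributable to genetic variance.

    Assumes V_P = V_G + V_E (no genotype-environment covariance or interaction).
    """
    return var_genetic / (var_genetic + var_environmental)


def genetic_correlation(cov_genetic, var_genetic_x, var_genetic_y):
    """r_G = Cov_G(x, y) / sqrt(V_G(x) * V_G(y))."""
    return cov_genetic / sqrt(var_genetic_x * var_genetic_y)


# Hypothetical variance components chosen to reproduce the reported figures.
print(round(genetic_fraction_of_variance(0.58, 0.42), 2))  # 0.58 (sprint speed)
print(round(genetic_fraction_of_variance(0.70, 0.30), 2))  # 0.7 (endurance)
print(round(genetic_correlation(3.48, 4.0, 9.0), 2))       # 0.58
```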

Plants cannot run away, but they may curtail her- 
bivore attack by distasteful or toxic metabolites. The 
wild parsnip, Pastinaca sativa, produces an array of 
toxic compounds called furanocoumarins. Studies by 
M. R. Berenbaum and her colleagues showed both the 
concentration and the proportions of these com- 
pounds exhibit quantitative genetic variation. Natur- 
ally, genetic control of toxin production is affected by 
loci for the enzymes along their biosynthetic pathway. 
An increase in one furanocoumarin will generally lead 
to a decrease in another, as the limited pool of inter- 
mediate metabolites gets diverted at different pathway 
points. Concentration is also influenced by seed struc- 
ture, including seed size and the length of the oil tubes 
where the toxin is stored. Therefore, seed toxicity is 
influenced by a variety of genetic loci. Some insect 
species can eat wild parsnip, furanocoumarins not- 
withstanding. The black swallowtail butterfly, Papilio 
polyxenes, detoxifies these compounds through cyto- 
chrome P-450-mediated metabolism. The swallowtail’s 
ability to grow and develop on diets with furanocoumarins
shows quantitative genetic variation.


Gene-for-Gene Interactions between Plants and Pathogens


A successful plant pathogen, such as a virus, bacter- 
ium, or fungus, must enter a plant cell to complete its 
growth and reproduction. Plants prevent or slow 
infection through resistance. Some resistance mechan- 
isms act against a broad array of pathogen species 
while others act against a single pathogen genotype. 
The general defenses, called horizontal resistance, 
limit the spread of pathogens through plant tissues. 
These defenses can include secondary metabolites 
such as phytoalexins and phenolics. Other general
defenses include structural barriers such as cork 
layers or tyloses (obstructions in the xylem that pre- 
vent pathogen growth through the vascular tissue). 
Horizontal defenses are generally thought to be 
under polygenic control. 

Vertical resistance, in contrast, prevents the estab- 
lishment of infection. This form of resistance is spe- 
cific against particular pathogen races and can be 
controlled by one or a few loci. One well-studied 
form of vertical resistance involves plant detection of
specific pathogens. Once detected, the plant mounts
a hypersensitive response to the nascent infection.
Hypersensitivity is akin to programmed cell death,
whereby the plant sacrifices the parasitized cell, kill- 
ing the pathogen in the process. Allelic substitutions 
at a resistance locus can break down the detec- 
tion ability and thereby render a plant susceptible to 
infection. 

The specificity of resistance to particular pathogen 
strains can be understood through the gene-for-gene 
concept, proposed by H. H. Flor. He postulated that 
the ability of a pathogen to infect a plant variety is 
governed by a single locus with the alternative alleles 
vir (for virulence, i.e., able to infect) and avir (for 
avirulence, i.e., unable to infect). The avir allele
codes for a product called an elicitor that the plant
can detect. The vir allele produces no recognizable 
product. The elicitor puts the hypersensitive response 
into motion. To be resistant, the plant must have a 
mechanism that will recognize the elicitor. Recogni- 
tion is controlled by a single plant locus with alleles R 
for resistance and r for susceptibility. The resistance 
allele is dominant to susceptibility. There are four 
possible combinations of pathogen and plant geno- 
types, but only one of them leads to resistance 
(Table 1). The hypersensitive response is triggered 
only when the pathogen produces the elicitor (patho- 
gen carries avir allele) and the plant has the ability to 
recognize the elicitor (plant carries R allele). 
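Flor's rule, as laid out above and in Table 1, reduces to a single condition: resistance requires both an elicitor (avir) and a dominant R allele to recognize it. A minimal sketch, with a hypothetical function name and genotype strings:

```python
def gene_for_gene_outcome(pathogen_allele, plant_genotype):
    """Outcome of an encounter under Flor's gene-for-gene rule.

    pathogen_allele -- "avir" (produces an elicitor) or "vir" (does not)
    plant_genotype  -- two-letter string of R/r alleles, e.g. "RR", "Rr", "rr"
    """
    if pathogen_allele not in ("avir", "vir"):
        raise ValueError("pathogen allele must be 'avir' or 'vir'")
    if len(plant_genotype) != 2 or any(a not in "Rr" for a in plant_genotype):
        raise ValueError("plant genotype must be two of 'R'/'r'")

    produces_elicitor = pathogen_allele == "avir"
    recognizes_elicitor = "R" in plant_genotype   # R is dominant to r
    if produces_elicitor and recognizes_elicitor:
        return "resistant"      # hypersensitive response is triggered
    return "susceptible"


for pathogen in ("avir", "vir"):
    for plant in ("RR", "Rr", "rr"):
        print(pathogen, plant, gene_for_gene_outcome(pathogen, plant))
```

Only one of the four pathogen/plant-phenotype combinations returns "resistant", matching the table.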

It will always be in the fitness interests of the 
pathogen to escape detection by its plant host. Thus 
for avir alleles to persist in the face of selection for 
their elimination, they must have other important 
functions in the pathogen. Although progress is 
being made in identifying and sequencing these 
genes, their overall role in pathogen biology is not 
yet clear. 


Summary 


Predators and parasites rely on their victims to get the 
energy and nutrients needed for survival and repro- 
duction. For prey and hosts, survival depends on 
avoiding or escaping enemies. The outcome of an 
encounter between enemy and victim is potentially 
influenced by many traits that can vary widely in
their genetic complexity. Some of these traits also serve
functions unrelated to attack or defense. Thus the
genetics of a particular enemy-victim interaction can
overlap with virtually any other area of genetics.


See also: Frequency-Dependent Selection; Phage 
(Bacteriophage) 
Table 1 The gene-for-gene concept for resistance

                      Plant genotype
Pathogen genotype     RR, Rr                             rr
avir                  Pathogen: produces elicitor        Pathogen: produces elicitor
                      Plant: can recognize elicitor      Plant: cannot recognize elicitor
                      Outcome: elicitor detected,        Outcome: elicitor not detected,
                      plant is Resistant                 plant is Susceptible
vir                   Pathogen: no elicitor produced     Pathogen: no elicitor produced
                      Plant: can recognize elicitor      Plant: cannot recognize elicitor
                      Outcome: nothing to detect,        Outcome: nothing to detect,
                      plant is Susceptible               plant is Susceptible


Pre-mRNA Splicing 
J D Beggs 


Copyright © 2001 Academic Press 
This article provides an overview of the process of
splicing nuclear precursor messenger RNAs (pre- 
mRNAs), the primary transcripts of nuclear protein- 
encoding genes. This includes a general description of 
pre-mRNA introns, the reaction mechanism, and the 
factors involved in the splicing process, followed by 
an overview of the spliceosome cycle. 


Nuclear Pre-mRNA Introns 


In eukaryotes, many nuclear protein-encoding genes 
are interrupted by noncoding sequences (introns) that 
are removed from the nascent transcripts (precursor 
messenger RNAs or pre-mRNAs), and the exons 
(coding sequences) are joined to produce the mature 
mRNA in a process called pre-mRNA splicing. 
Introns occur in only 4% of the genes of the lower 
eukaryote Saccharomyces cerevisiae (budding yeast) 
and, in most cases, only one intron is present. In con- 
trast, the majority of protein-encoding genes in higher 
eukaryotes are interrupted by one or more introns. 
The presence of multiple introns can allow the joining 
of different combinations of exons in alternative 
splicing pathways, to produce distinct mRNAs from 
identical pre-mRNAs and thereby increase the infor- 
mational capacity of the genome. Introns are highly 
divergent in sequence; however, short regions of con- 
sensus sequence have been identified that define 
introns: the 5’ splice site (the 5’ end of an intron), 
the branchpoint (where a branched phosphodiester 
linkage forms during splicing; see below), and a 
pyrimidine-rich tract followed by the 3’ splice site (3’ 
end of an intron) (Figure 1). 
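The consensus elements can be used to locate an intron in a sequence. The sketch below uses exact matches to the budding-yeast consensus shown in Figure 1(A) and, as a simplifying assumption, takes the first YAG downstream of the branchpoint as the 3' splice site; real introns often deviate from the consensus, so this is illustrative only. The function name and example sequence are hypothetical.

```python
import re

YEAST_5SS = "GUAUGU"            # 5' splice site consensus, Figure 1(A)
YEAST_BRANCH = "UACUAAC"        # branchpoint consensus; the branch A is the last A
THREE_SS = re.compile("[CU]AG")  # YAG at the 3' splice site (Y = C or U)


def find_yeast_intron(pre_mrna):
    """Locate one intron by exact consensus matches.

    Returns (intron_start, branch_a_index, intron_end) using 0-based,
    end-exclusive indices, or None if any element is missing.
    """
    start = pre_mrna.find(YEAST_5SS)
    if start < 0:
        return None
    branch = pre_mrna.find(YEAST_BRANCH, start + len(YEAST_5SS))
    if branch < 0:
        return None
    branch_a = branch + 5                      # UACUA[A]C
    # Heuristic: take the first YAG downstream of the branchpoint.
    three_ss = THREE_SS.search(pre_mrna, branch + len(YEAST_BRANCH))
    if three_ss is None:
        return None
    return start, branch_a, three_ss.end()


pre = "AUGCC" + "GUAUGU" + "AAAAA" + "UACUAAC" + "AUUUCAG" + "GGCUAA"
print(find_yeast_intron(pre))   # (5, 21, 30)
```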


The Splicing Reaction 


The nuclear pre-mRNA splicing reaction involves two 
sequential trans-esterification reactions (Figure 2). In 
the first trans-esterification step, the phosphodiester 
bond at the 5’ splice site is cleaved as a result of 
nucleophilic attack by the 2’ hydroxyl group of the 
branchpoint adenosine. This yields two intermediates: 
the 5’ exon with a free 3’ hydroxyl, and intron-3’ exon 
in a branched or lariat structure, in which the 5’ end of 
the intron is covalently linked to the branchpoint 
adenosine via a 2'-5' phosphodiester bond. In the 
second trans-esterification step, cleavage at the 3’ 
splice site occurs as a result of nucleophilic attack by 
the free 3’ hydroxyl of the 5’ exon. As the intron is
released, the exons are simultaneously joined via a
3'-5' phosphodiester bond. This results in the two
products of the splicing reaction: a mature mRNA
and the excised intron.


        5' splice site          Branchpoint        3' splice site
(A)  [exon] GUAUGU -------- UACUAAC -------- (Y)n YAG [exon]
(B)  [ag] GURAGU ---------- CURAY ---------- (Y)n YAG [g]

Figure 1 Consensus sequences found in introns of
(A) budding yeast and (B) mammals. Exons are
represented by boxes (square brackets here), with consensus bases in
lowercase. Intron consensus sequences are shown in
uppercase. R, purine; Y, pyrimidine; (Y)n, pyrimidine-rich tract;
A, branchpoint adenosine.


Figure 2 The two steps of the splicing reaction. (A)
pre-mRNA; (B) intermediates; (C) spliced mRNA. Boxes
represent exons; the thick line represents the intron.
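The two transesterifications described above can be followed on a toy sequence. The sketch only tracks which pieces are joined after each step: coordinates are supplied by hand, the sequence is made up, and the `Lariat` record is a hypothetical representation in which offset 0 (the intron's 5' end) is linked to the branch A by the 2'-5' bond.

```python
from dataclasses import dataclass


@dataclass
class Lariat:
    sequence: str       # intron, 5'->3'
    branch_offset: int  # position of the branchpoint A within the intron
    # Offset 0 (the intron's 5'-terminal G) is linked to the branch A by a 2'-5' bond.


def first_step(pre_mrna, intron_start, branch_index, intron_end):
    """Step 1: the 2'-OH of the branch A attacks the 5' splice site."""
    if pre_mrna[branch_index] != "A":
        raise ValueError("branchpoint must be an adenosine")
    exon_1 = pre_mrna[:intron_start]            # now ends in a free 3'-OH
    lariat = Lariat(pre_mrna[intron_start:intron_end], branch_index - intron_start)
    exon_2 = pre_mrna[intron_end:]              # still attached to the lariat's 3' end
    return exon_1, (lariat, exon_2)


def second_step(exon_1, lariat_exon_2):
    """Step 2: the 3'-OH of exon 1 attacks the 3' splice site."""
    lariat, exon_2 = lariat_exon_2
    return exon_1 + exon_2, lariat              # exons joined by a 3'-5' bond


pre = "AUGCC" + "GUAUGUAAAAAUACUAACAUUUCAG" + "GGCUAA"
exon_1, intermediate = first_step(pre, 5, 21, 30)
mrna, excised = second_step(exon_1, intermediate)
print(mrna)                                     # AUGCCGGCUAA
print(excised.sequence[excised.branch_offset])  # A
```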


The Spliceosome 


Splicing of pre-mRNAs in the nucleus is dependent 
upon the formation of a large, dynamic ribonucleo- 
protein complex, the spliceosome, which is formed by 
the assembly of multiple trans-acting RNA and pro- 
tein factors onto the pre-mRNA transcript. Assembly 
of the spliceosome occurs in an ordered, stepwise 
manner, involving the interconversion of several 
distinct complexes. Extensive RNA-RNA and RNA- 
protein interactions are involved in splice site recog- 
nition and the alignment of the splice sites into a 
conformation suitable for the catalysis of intron 
removal. The spliceosome is composed of five small 
nuclear ribonucleoprotein particles (snRNPs) and a 
number of non-snRNP proteins. The snRNPs are 
trans-acting factors with both RNA and protein com- 
ponents, and are named according to the small nu- 
clear RNA (snRNA) each contains (i.e., the U1, U2, 
U4, U5, and U6 snRNAs). The snRNPs play a central 
role in the recognition and alignment of pre-mRNA 
splice sites and the snRNA components have been 
proposed to form the catalytic centre of the spliceo- 
some. 

In addition to the U2-specific spliceosomes (that 
contain U2 snRNPs), higher eukaryotes also contain 
U12-specific (or AT-AC) spliceosomes. The U12 
spliceosome contains an alternative set of snRNPs, 
U11, U12, U4atac, and U6atac, but apparently
shares the U5 snRNP with U2 spliceosomes. Whereas
U2-specific spliceosomes splice canonical introns
(defined by the terminal dinucleotides GT-AG), U12
spliceosomes process a distinct class of introns that
often have noncanonical (AT-AC) terminal dinucle- 
otide sequences (although other conserved sequences 
are more critical in defining these introns). 


Spliceosomal snRNAs 

With the exception of U6 snRNA, the lengths and 
primary sequences of the snRNAs vary greatly from 
species to species; however, their secondary structures 
(as predicted by chemical and enzymatic probing 
and phylogenetic studies) are conserved throughout 
eukaryotes and are important for the binding of 
proteins. The U1, U2, U4, and U5 snRNAs contain 
a conserved structural motif, the Sm-site. This is 
a single-stranded region with consensus sequence 
RAUₙGR (where R is a purine base) that is
normally flanked by two hairpin loops. The U4 and 
U6 snRNAs contain extensive sequence complemen- 
tarity, and are mostly complexed with each other, 
through Watson–Crick base-pairing, in a U4/U6
di-snRNP.
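A consensus such as the Sm-site can be searched for with a regular expression. The sketch assumes the consensus reads RAUₙGR with n = 4 to 6 (the range commonly quoted for the Sm site), where R is A or G; the test sequence is made up and the function name is hypothetical. Matching the sequence alone ignores the flanking hairpins that normally accompany a functional site.

```python
import re

# Assumed reading of the Sm-site consensus: RAU(n)GR with n = 4-6,
# where R is a purine (A or G).
SM_SITE = re.compile(r"[AG]AU{4,6}G[AG]")


def find_sm_sites(snrna):
    """Return (start, matched sequence) for each Sm-site-like stretch."""
    return [(m.start(), m.group()) for m in SM_SITE.finditer(snrna)]


print(find_sm_sites("GCAAUUUUUGGCUU"))   # [(2, 'AAUUUUUGG')]
```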


Identification of Proteins Involved in Splicing

The splicing machinery is relatively abundant in
higher eukaryotes (e.g., there are approximately
10⁵–10⁶ snRNP particles per HeLa cell nucleus), and
early biochemical purification studies identified many 
mammalian splicing factors. Yeast splicing factors are 
less abundant and so, until the recent development of 
highly sensitive mass spectrometric analyses, genetic 
approaches were used to identify splicing factors in 
S. cerevisiae. Splicing is an essential cellular process 
and, as yeast splicing factors are encoded by single- 
copy genes, mutations in these genes are often lethal. 
A common genetic strategy is to isolate mutations 
with conditional phenotypes, e.g., conferring heat- 
or cold-sensitivity on haploid cells. In this way many 
of the PRP (Precursor RNA Processing) genes were 
identified as encoding proteins involved in splicing. 
Other splicing genes have been identified by their 
ability to suppress conditional prp mutations at the 
restrictive temperature. Such extragenic suppressors 
can be either trans-acting mutations that alleviate the 
original defect, or wild-type genes that confer sup- 
pression when overexpressed. Another very success- 
ful approach has been to search for mutations that are 
lethal in combination with (i.e., enhance the defect of) 
a mutation in another splicing factor. The availability 
of the complete sequence of the genome of S. cerevi- 
siae led to the identification of further yeast splicing 
proteins that are homologs of splicing factors from 
other eukaryotes, and it is evident that the splicing 
apparatus is highly conserved from yeast to man. To 
date, more than 70 yeast proteins have been identified 
as splicing factors. 
There are two types of snRNA-associated proteins.
Seven core or Sm proteins (B, D1, D2, D3, E, F, and G,
as defined in metazoa) are common components of the 
U1, U2, U4, and U5 snRNPs, while each snRNP also 
contains snRNA-specific proteins. The Sm proteins 
are small (less than 20 kDa) and are characterized by 
a conserved amino acid motif, the Sm motif, which is 
composed of two conserved blocks of 32 and 14 amino 
acids, separated by a nonconserved spacer region of 
variable length. The Sm motif apparently determines 
the folded structure of the Sm proteins and they asso- 
ciate with each other to form a ring-shaped complex 
that binds to the Sm-site in the U1, U2, U4, and U5 
snRNAs. The U6 snRNA does not associate directly 
with the Sm proteins, although a distinct set of seven 
structurally related Sm-like (Lsm) proteins forms a 
ring-shaped complex that binds to the uridine-rich 3’ 
end of U6 snRNA. The many snRNA-specific snRNP 
proteins play multiple roles, including facilitating 
interactions between the snRNPs and the pre-mRNA 
to promote spliceosome assembly. 

In addition to those factors that are tightly associated with snRNA particles, there are several classes of non-snRNP proteins. These include proteins involved in snRNP biogenesis, spliceosome assembly, molecular rearrangements in the spliceosome, spliceosome disassembly, intron debranching, and the recycling of spliceosomal components for further rounds of splicing. Some of these factors associate only transiently with the spliceosome. To date, eight members of the DEAD- or DExH-box superfamily of ATP-dependent RNA helicases have been shown to be splicing factors in yeast. These proteins are believed to play important roles in unwinding RNA duplexes within snRNPs and spliceosomes, thereby controlling the molecular rearrangements that take place during the spliceosome cycle.

In higher eukaryotes, SR proteins (so named because they contain a domain rich in arginine-serine dipeptide repeats) play key roles in splice site selection and in regulating alternative splicing, partly by acting as components of protein bridges that link splice sites across introns. In addition, SR proteins bind to exonic splicing enhancer sequences and form protein bridges that activate the splicing of neighboring weak introns (those that have noncanonical splice sites or unusually short pyrimidine tracts between the branchpoint and the 3’ end of the intron).


Spliceosome Assembly Pathway 

In vitro studies, performed mainly with HeLa nuclear extracts or whole cell extracts of S. cerevisiae, have identified several distinct complexes that are involved in different stages of the splicing process. These can be resolved by gel electrophoresis, gel filtration, density gradient centrifugation, or affinity chromatography. In yeast, the splicing complexes are interconverted in the order CC > B > A2-1 > A1 > A2-2 > A2-3 > I, in which CC, B, A2-1, and A1 contain pre-mRNA, A2-2 and A2-3 contain intermediates and products, and I contains the excised intron. For HeLa splicing complexes the terminology is different: up to six complexes can be distinguished that form in the order E > A > B > C > D > I, in which complexes E, A, and B contain pre-mRNA, C contains the intermediates, and D and I contain the spliced exons and the excised intron, respectively. The spliceosome cycle is highly conserved between yeast and man, and will be described here with reference to the yeast system.
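The interconversion orders above can be written down as ordered lists, which makes the yeast and HeLa nomenclatures easy to compare side by side. This is only a bookkeeping sketch: the complex names and their RNA contents come from the text, while the list-of-pairs representation and the helper functions are illustrative choices.

```python
# Order of splicing complexes as given in the text; each entry pairs a
# complex name with the RNA species it contains.
YEAST_PATHWAY = [
    ("CC", "pre-mRNA"),
    ("B", "pre-mRNA"),
    ("A2-1", "pre-mRNA"),
    ("A1", "pre-mRNA"),
    ("A2-2", "intermediates and products"),
    ("A2-3", "intermediates and products"),
    ("I", "excised intron"),
]

HELA_PATHWAY = [
    ("E", "pre-mRNA"),
    ("A", "pre-mRNA"),
    ("B", "pre-mRNA"),
    ("C", "intermediates"),
    ("D", "spliced exons"),
    ("I", "excised intron"),
]

def next_complex(pathway, current):
    """Return the complex that follows `current`, or None at the end of the cycle."""
    names = [name for name, _ in pathway]
    i = names.index(current)  # raises ValueError for an unknown complex name
    return names[i + 1] if i + 1 < len(names) else None

def complexes_containing(pathway, rna):
    """List the complexes whose recorded contents match the given RNA description."""
    return [name for name, contents in pathway if contents == rna]

print(next_complex(YEAST_PATHWAY, "A1"))
print(complexes_containing(HELA_PATHWAY, "pre-mRNA"))
```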
Commitment complex (CC) is the earliest splicing-specific complex formed on the pre-mRNA, and represents the stage in spliceosome assembly at which a pre-mRNA is no longer competed out of the splicing pathway by excess competitor pre-mRNA. It results from the ATP-independent association of the U1 snRNP with the 5’ splice site through base-pairing between a conserved sequence in U1 snRNA and a complementary sequence at the 5’ end of the intron. The primary function of the U1 snRNP seems to be in defining the 5’ splice site. CC is converted to complex B (or prespliceosome) by the ATP-dependent association of the U2 snRNP with the branchpoint sequence of the pre-mRNA through Watson-Crick base-pairing. A U4/U6.U5 tri-snRNP complex and numerous non-snRNP factors are then added to this prespliceosome in an ATP-dependent process that produces complex A2-1, a precatalytic form of the spliceosome. Concurrently with and/or immediately after tri-snRNP addition, several intra- and intermolecular rearrangements take place (Figure 3). One is the association of U6 snRNA with the 5’ splice site, which apparently displaces the U1 snRNP. Another is disruption of the base-pairing between the U4 and U6 snRNAs. Neither the U4 snRNA nor the U1 snRNA seems to be required for catalysis. As the U4:U6 base-pairing is destabilized, interactions between the U2 and U6 snRNAs contribute to the formation of the catalytic center. Simultaneously, an invariant loop sequence in U5 snRNA becomes closely aligned with exon sequences just upstream of the 5’ splice site. At this stage, the 5’ splice site is bound on the exon side by the U5 snRNP and on the intron side by the U6 snRNP, and the 2’ hydroxyl of the branchpoint nucleotide is in close proximity to the phosphodiester bond at the 5’ splice site. Thus the active spliceosome, or complex A1, is formed, ready for the first trans-esterification reaction. At present no information is available on what actually triggers the catalytic reaction, or how the target phosphodiester bond is identified.

Figure 3 RNA interactions in the spliceosome. A cartoon indicating the dynamic nature of RNA interactions as a prespliceosome (B complex) is converted to a spliceosome (complexes A2-1/A1) by the association of U4/U6.U5 tri-snRNP, displacement of U1 snRNP, unwinding of the U4:U6 heterodimer, and interaction of the U2 and U6 snRNAs to form the catalytic center for the first trans-esterification reaction. Exons, represented by boxes; intron, represented by thin line with branchpoint adenosine (A); snRNAs, represented by thick lines; Watson-Crick base pairs, represented by cross-bars. The dashed arrows indicate dynamic RNA associations and dissociations occurring at this stage. The dotted arrow represents the nucleophilic attack of the phosphate at the 5’ splice site by the branchpoint adenosine.

Before the second trans-esterification reaction can proceed, a reorganization takes place to remodel the catalytic site. This involves a conformational change in which the 3’ splice site is brought into juxtaposition with the 3’ end of the first exon. The U5 snRNP now interacts with sequences in the 3’ exon, immediately adjacent to the 3’ splice site, as well as with the 5’ exon. Thus the U5 snRNP appears to play an important role in aligning the ends of the exons correctly for their joining in the second step of splicing. Completion of the second trans-esterification produces complex A2-3, which contains the products of splicing, i.e., a spliced mRNA and an excised intron. Following spliceosome disassembly, the snRNPs are recycled for subsequent rounds of splicing.
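The outcome of the two trans-esterification steps can be mimicked as string operations on a pre-mRNA. This is a deliberately simplified sketch: the sequences and splice-site positions are hypothetical, and the model only tracks which pieces of RNA are joined or released, not the chemistry or the spliceosome itself.

```python
from dataclasses import dataclass

@dataclass
class Lariat:
    """An intron whose branchpoint A has been joined to the intron's 5' end."""
    intron: str        # intron sequence, 5' to 3'
    branch_index: int  # position of the branchpoint A within the intron
    tail: str = ""     # RNA still attached 3' of the intron (the 3' exon after step 1)

def first_step(pre_mrna: str, ss5: int, ss3: int, branch: int):
    """Step 1: the branchpoint 2'-OH attacks the 5' splice site.
    ss5 = index of the first intron nucleotide, ss3 = index of the first
    3'-exon nucleotide, branch = index of the branchpoint A in pre_mrna."""
    if pre_mrna[branch] != "A" or not ss5 < branch < ss3:
        raise ValueError("branchpoint must be an A inside the intron")
    exon1 = pre_mrna[:ss5]  # free 5' exon, now ending in a 3'-OH
    intermediate = Lariat(pre_mrna[ss5:ss3], branch - ss5, tail=pre_mrna[ss3:])
    return exon1, intermediate

def second_step(exon1: str, intermediate: Lariat):
    """Step 2: the 3'-OH of the 5' exon attacks the 3' splice site,
    joining the exons and releasing the lariat intron."""
    mrna = exon1 + intermediate.tail
    excised = Lariat(intermediate.intron, intermediate.branch_index)
    return mrna, excised

# Hypothetical pre-mRNA: 5' exon + intron + 3' exon (sequences are made up)
pre = "AAACC" + "GUAAGUACUAACAG" + "GGGUU"
exon1, lariat_intermediate = first_step(pre, ss5=5, ss3=19, branch=15)
mrna, intron = second_step(exon1, lariat_intermediate)
print(mrna, intron.branch_index)
```

The intermediate returned by `first_step` corresponds to the lariat-3’ exon species present after the first reaction; `second_step` then yields the two products found in complex A2-3.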




See also: Introns and Exons 
Prenatal diagnosis encompasses all techniques for the diagnosis of abnormality of the embryo and fetus up to the time of delivery. In utero diagnosis of treatable conditions allows better management following delivery and may be life-saving. For pregnancies at risk of severe, untreatable fetal abnormality, prenatal diagnosis allows the mother and her partner the choice of continuing the pregnancy or seeking a termination. Preimplantation diagnosis coupled with in vitro fertilization (see In vitro Fertilization) can help to ensure that couples at risk of transmitting a serious genetic disorder have a normal pregnancy, by replacing in the womb only those embryos that have been shown to be free of disease. This may be the only option acceptable to couples who have ethical objections to the selective termination of pregnancy.

For couples at increased risk of serious genetic disease in offspring, prenatal diagnosis provides the reassurance without which many would decline to undertake a pregnancy. In practice, 93% of prenatal tests provide this reassurance, and selective termination is indicated in only about 7% of cases. It should be emphasized that termination of pregnancy for fetal indications that are not associated with a risk of serious fetal abnormality is not permitted by law. Thus, current practice does not permit the procedure to be used solely for choosing the sex of offspring by terminating pregnancies of the undesired sex. Mothers undertaking prenatal diagnosis should appreciate the limitations of the procedure, in particular the hazards of an invasive test (e.g., a 1% miscarriage rate following amniocentesis), and the fact that no single test or combination of tests can exclude all fetal abnormalities.


Prenatal Diagnostic Procedures 


Amniocentesis (see Amniocentesis) in the second trimester of pregnancy is the most widely used procedure for prenatal diagnosis. Amniotic fluid contains viable fetal cells that can be cultured and used for fetal chromosome analysis, for biochemical analysis, or for DNA analysis. Uncultured amniotic fluid cells can be used for aneuploidy detection using labeled centromeric probes and fluorescence in situ hybridization (see In situ Hybridization). The supernatant fluid after centrifugation may assist in the diagnosis of open spina bifida and anencephaly (in which levels of amniotic fluid alpha-fetoprotein are increased), and of biochemical defects such as congenital adrenal hyperplasia and mucopolysaccharidoses. Care is taken to avoid contamination of the amniotic fluid with maternal cells, as this may lead to false-negative results, particularly in DNA analysis. The most frequent indication for fetal chromosome analysis is the risk of Down syndrome (see Down Syndrome), and indications for DNA analysis include muscular dystrophy, cystic fibrosis, thalassemia, and Huntington disease. Fetal sexing, by sex chromatin analysis (see Sex Chromatin) on uncultured amniotic fluid cells, is an important preliminary step in mothers who are carriers of X-linked disorders and is used prior to DNA or biochemical tests to identify affected males.

The desire of mothers to avoid late termination of pregnancy following prenatal diagnosis has led to the development of chorion villus sampling (CVS). This involves taking a small biopsy of the placenta, and the procedure is undertaken from 10 weeks of gestation onwards, 6 weeks earlier than the usual time for amniocentesis. The biopsy is usually taken with a suitable needle under ultrasound guidance, using a transabdominal approach. Each biopsy yields between 5 and 30 mg of placental tissue, which can be used for fetal sexing, fetal chromosome analysis, biochemical studies, and DNA analysis. There are sufficient numbers of dividing cells in the cytotrophoblast to permit direct chromosome analysis within 24 h, but, in view of the 1.25% frequency of chromosomal mosaicism in this tissue, direct analysis should always be confirmed by analysis of cultured cells, which are largely derived from the mesenchymal core of the villi. DNA analysis and biochemical tests can usually be completed on CVS material without the need for culture, so results are often available by 11-12 weeks of gestation. If termination is indicated following any of these tests, it can therefore be performed in the first trimester, which is less traumatic for the mother than a second-trimester termination at 18-20 weeks. The disadvantage of CVS is its higher risk of miscarriage, which in most centers is estimated at about 2%. Chromosomal mosaicism in CVS cultures occurs at a rate of 0.7%. The mosaicism is usually confined to placental tissue, but amniocentesis is recommended in these cases to exclude disease in the fetus.

In rare instances, a fetal skin biopsy is taken for the diagnosis of serious skin disorders such as epidermolysis bullosa, and a liver biopsy may very occasionally be required for the diagnosis of certain metabolic disorders. It is also possible to take samples of fetal blood by cordocentesis, a technique that involves passing a needle transabdominally into a vein in the umbilical cord at the point where the cord is inserted into the placenta. The same route is used for performing fetal blood transfusion in utero.

The above procedures are all invasive and entail a degree of risk to the fetus. A number of other techniques are noninvasive, and chief among these is ultrasonography. The resolution of modern ultrasound equipment is remarkable, and an extensive range of fetal congenital malformations is now recognizable by experienced ultrasonographers, often from an early gestational age of 10-12 weeks for major malformations and 16-18 weeks for the remainder. However, microcephaly and hydrocephaly may not become apparent until the third trimester. Even greater resolution is possible with ultrafast magnetic resonance imaging, but this form of fetal imaging is not yet widely available. Such advances in the detection of congenital malformations have prompted the development of in utero surgical techniques for the correction of diaphragmatic hernia, the repair of premature rupture of membranes, the treatment of twin-twin transfusion syndrome, and the repair of open spina bifida. Fetal mortality following surgical intervention is currently high in these cases, but progress continues to be made and fetal surgery holds much promise for the future.

It has been known for over 30 years that fetal cells are present in small numbers in the maternal circulation from early in pregnancy. These cells include fetal leukocytes, nucleated red cells, and trophoblast cells. A major research effort has been mounted to develop techniques for isolating fetal cells from maternal blood and for enriching them. The aim is a noninvasive procedure capable of permitting the prenatal diagnosis of genetic disorders, including fetal aneuploidies. While individual cases have been reported in which the correct fetal diagnosis has been made from fetal cells in maternal blood, the method is not yet sufficiently robust or reliable for routine application. The main problems have been in obtaining sufficient enrichment of fetal cells, in distinguishing unequivocally between fetal and maternal cells, and in establishing pure cultures of fetal cells that can be used for fetal chromosome analysis. For certain single-gene defects, including thalassemia, DNA analysis of single fetal cells microdissected from maternal blood preparations has proved successful in a few cases, but the method requires special skills and has not been widely adopted.

It seems likely that most fetal cells have a short life span in the maternal circulation, and it has been shown that DNA from decaying fetal cells is present in detectable amounts in maternal serum and maternal urine. Using PCR-based DNA analysis, male fetal sex can be identified by detecting Y-chromosome sequences in DNA from maternal serum. This raises the possibility that single-gene defects such as Huntington disease may be diagnosed in cases where the mutation is of paternal origin, since a paternally derived mutant sequence, like a Y-chromosome sequence, is absent from the mother’s own DNA.


Prenatal Screening for Chromosome Aberrations


Until 1985, the main indication for prenatal diagnosis by amniocentesis was increased maternal age, as the risk of Down syndrome and the other serious autosomal trisomies increases with age. For example, it is estimated that the risk of a Down syndrome birth increases from 1 in 1500 for mothers aged 20 years to 1 in 28 for mothers aged 45 years. At a maternal age of 35 years, the risk of a Down syndrome birth is approximately 1 in 380. The frequency of Down syndrome in pregnancies tested by amniocentesis at 16 weeks at a maternal age of 35 years is 1 in 260, the difference being due to the natural loss by miscarriage of Down syndrome fetuses between 16 weeks and full term.
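The two age-35 figures imply roughly what fraction of Down syndrome fetuses present at 16 weeks are lost before term. As a rough worked example, assuming that losses of unaffected pregnancies over the same period are negligible (a simplifying assumption, not a figure from the text), the ratio of the two risks gives the surviving fraction:

```python
def fraction_lost(risk_at_16_weeks: float, risk_at_birth: float) -> float:
    """Fraction of affected fetuses present at 16 weeks that are lost before term,
    assuming (as a simplification) that unaffected pregnancies are not lost."""
    return 1 - risk_at_birth / risk_at_16_weeks

# Figures from the text for a maternal age of 35 years
loss = fraction_lost(1 / 260, 1 / 380)
print(f"{loss:.0%}")
```

With these inputs the estimate is just under one-third of affected fetuses.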

As only 30% of Down syndrome births occur in women aged 35 years and over, most mothers of affected babies used to have no indication of whether or not they were at risk. This changed in the late 1980s, when it was found that affected pregnancies, irrespective of maternal age, were associated in the second trimester with abnormal levels of certain biochemical analytes in maternal blood. Thus, serum alpha-fetoprotein levels were reduced on average to 0.7 multiples of the normal median (MoM), human chorionic gonadotrophin (hCG) was elevated to over 2 MoM, and unconjugated estriol was reduced to 0.6 MoM. The results for any one pregnancy can be combined to produce an estimate of risk. More recently, hCG has been replaced by free beta-hCG, and inhibin has been added. The four analytes tested during the second trimester (together with the maternal age-related risk) are associated with a detection rate for Down syndrome of 76-80% at a false-positive rate of 5%. A similar scheme for Down syndrome screening in the first trimester, based on free beta-hCG, pregnancy-associated plasma protein-A (PAPP-A), and the ultrasound measurement of nuchal translucency, has been established to give a detection rate of 85% for a 5% false-positive rate. While a combination of first- and second-trimester screening is capable of a detection rate of 94%, this is not widely applied in view of expense and practicability. The most usual practice is to offer second-trimester serum screening to all pregnant mothers, which appears to be acceptable to about 77% of women.
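Detection and false-positive rates alone do not say how likely a screen-positive pregnancy is to be affected; that also depends on how common the condition is. The sketch below uses the second-trimester figures from the text (a detection rate of 78%, the midpoint of 76-80%, at a 5% false-positive rate) together with an assumed prevalence of 1 in 700, which is an illustrative value rather than one given in this article.

```python
def screen_performance(detection_rate: float, false_positive_rate: float, prevalence: float):
    """Return (positive predictive value, screen-positive rate) for a screening test.
    detection_rate = sensitivity; false_positive_rate = 1 - specificity."""
    true_pos = detection_rate * prevalence
    false_pos = false_positive_rate * (1 - prevalence)
    return true_pos / (true_pos + false_pos), true_pos + false_pos

# Assumed prevalence of 1 in 700 (illustrative only, not from the text)
ppv, positive_rate = screen_performance(0.78, 0.05, 1 / 700)
print(f"about 1 in {1 / ppv:.0f} screen-positive pregnancies is affected")
print(f"screen-positive rate: {positive_rate:.1%}")
```

Under these assumptions only a small minority of screen-positive pregnancies are affected, which is why a positive screen leads to a diagnostic test rather than a diagnosis.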

It should be emphasized that maternal serum screening provides an estimate of risk and is not diagnostic; indeed, 5% of those tested have a false-positive result. Those with a positive result (i.e., a risk of more than 1 in 250) are therefore offered the diagnostic test of amniocentesis (or CVS in the case of first-trimester screening).
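Combining a maternal-age risk with biochemical results is commonly done by converting the prior risk to odds and multiplying by a likelihood ratio derived from the analyte MoM values. The sketch below assumes that approach; the likelihood ratio is a hypothetical input, since deriving it requires reference distributions of MoM values in affected and unaffected pregnancies that are not given here. The age-35 prior and the 1-in-250 cut-off are taken from the text.

```python
from fractions import Fraction

CUTOFF = Fraction(1, 250)  # a risk above this counts as screen-positive (from the text)

def combined_risk(prior_risk: Fraction, likelihood_ratio: Fraction) -> Fraction:
    """Posterior risk = prior odds x likelihood ratio, converted back to a probability."""
    prior_odds = prior_risk / (1 - prior_risk)
    posterior_odds = prior_odds * likelihood_ratio
    return posterior_odds / (1 + posterior_odds)

def screen_positive(risk: Fraction) -> bool:
    return risk > CUTOFF

# Age-35 prior of 1 in 380 with a hypothetical likelihood ratio of 2
risk = combined_risk(Fraction(1, 380), Fraction(2))
print(f"1 in {float(1 / risk):.1f}", screen_positive(risk))
```

Exact fractions are used only to keep the arithmetic transparent; floating point would serve equally well in practice.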

Because routine antenatal care includes at least two ultrasound examinations of the fetus, this in itself provides a form of routine screening for fetal abnormality. However, the reliability of the examination depends largely on the skill of the ultrasonographer, and tertiary referral to centers of excellence is widely used in cases of uncertainty.




See also: Down Syndrome; Genetic Counseling; In situ Hybridization; In vitro Fertilization; Sex Chromatin
The primary transcript is the RNA transcript as it exists immediately after transcription in the nucleus, before RNA splicing or polyadenylation produces the mature mRNA.


See also: Messenger RNA (mRNA); Transcription 
Definition


Primase is the enzyme that synthesizes RNA primers. Primers are oligonucleotides that are base-paired to a DNA template and from which DNA polymerases extend new DNA chains. Special proteins are responsible for loading primase at the origin of replication so that leading strand DNA synthesis can commence. In a subsequent step, other replication proteins cause primase to initiate DNA synthesis on the opposite, lagging strand. After both the leading and lagging strand primers have been elongated by DNA polymerases, the RNA primers are enzymatically removed, and (in E. coli) the resulting gaps in the DNA sequence are filled in by DNA polymerase I and sealed by DNA ligase.


Discovery 


When the biochemical properties of DNA polymerase I were being explored during the 1960s and 1970s, it was found that this enzyme could not initiate polymer synthesis on single-stranded DNA templates; it could only extend the 3’ ends of polymers that were bound to those templates. Among the gene products that were essential for DNA synthesis but for which no function had been established, one was found that catalyzed the synthesis of short RNA polymers. That enzyme was named primase because it had to act first, or prime, the DNA template so that it could be copied by the DNA polymerase. The synthetic product of this enzyme was called an RNA primer (pRNA), and it became the fourth class of RNA after mRNA, rRNA, and tRNA.

Even though four distinct DNA polymerase families with multiple physiological functions are now known, no DNA polymerase is able to initiate chain synthesis. In contrast, the genome of every living organism encodes a single primase. So far, three primase sequence families have been identified: the bacterial and bacteriophage, the archaeal and eukaryal nuclear, and the herpes virus-like families.


Bacterial and Bacteriophage Primases 


The primase gene is one of the approximately 250 genes common to all bacteria. The length of the encoded protein (580 to 600 residues; ~65 kDa) is highly conserved. Several phage chromosomes also encode primases with sequence similarity to bacterial primase. The best understood primases on a biochemical and genetic level are those of Escherichia coli and its phages P4, T4, and T7.


Biochemical Properties 

The bacterial primase consists of three domains. The first 110 residues fold into a zinc-binding domain that is thought to bind to single-stranded DNA in a sequence-specific manner. This domain has the largest number of identical and highly conserved residues, suggesting that it plays an important role. The central 320 residues fold into a domain that is capable of synthesizing RNA. The remaining C-terminal residues (about 150, given a total length of 580 to 600 residues) do not include any identical or highly conserved residues but are responsible for binding to other proteins such as DnaB helicase.

To understand the biochemistry of primase, it is helpful to consider first that primase is the special RNA polymerase that acts during DNA replication. Second, primase participates in two different processes, replication initiation and elongation. Primer RNA is synthesized once to initiate leading strand DNA synthesis, and is made repeatedly on the lagging strand template to initiate the synthesis of Okazaki fragments. It is during the elongation phase of DNA replication that primase plays a key role in establishing the frequency of Okazaki fragment initiation. Leading and lagging strand DNA synthesis must be coordinated.

In isolation, E. coli primase has the lowest catalytic efficiency of any known polymerase. So far, DnaB helicase is the only protein that can stimulate primase activity to near biologically relevant levels. Because DnaB helicase acts only at the replication fork, primer synthesis on the lagging strand DNA template is also limited to the replication fork. The coordination of leading and lagging strand synthesis is attributed to communication between the dimeric DNA polymerase III and DnaB helicase. When the lagging strand-specific half of the DNA polymerase completes the synthesis of an Okazaki fragment, it communicates this to the helicase via the leading strand-specific half of the polymerase. At that moment, the helicase stimulates primer synthesis at the replication fork, the lagging strand half of the DNA polymerase loads onto the new primer, and a new Okazaki fragment is initiated.

Primase has very high specificity for its initiation sequence. This specificity may result from a need by all RNA polymerases to stabilize the first phosphodiester bond formed. For instance, the RNA polymerase that carries out transcription prefers to initiate by making the diribonucleotide pppApU. Primer RNA synthesis initiates with ATP opposite the thymine in the trinucleotide d(CTG) (Figure 1). The enzyme catalyzes phosphodiester bond formation with GTP to create pppApG. Either this step or the one that precedes it is the rate-limiting step in primer synthesis. The G in the d(CTG) trinucleotide sequence is required but has no template function. After the initiating diribonucleotide has been synthesized, the next 10 phosphodiester bonds are formed quickly, producing RNA primers that are 12 nucleotides or longer. When either DnaB helicase or DNA polymerase III holoenzyme is present, the primers tend to be limited to lengths of 12 nucleotides and are formed faster. The primases from T4, T7, and P4 have initiation specificities of d(CCG), d(GTC), and d(CTN), respectively, and they all initiate from the central nucleotide in these sequences.

Figure 1 Initiation specificity of Escherichia coli primase. This enzyme recognizes the trinucleotide sequence d(CTG) within the DNA template and uses two of those nucleotides to direct the synthesis of the first phosphodiester bond. Magnesium is the only required cofactor for the reaction.
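The initiation rule for E. coli primase can be turned into a small search over a template sequence. This sketch assumes the template is written 5’ to 3’, treats the motif's central nucleotide as the first templating base (ATP opposite the T of d(CTG), then GTP opposite the C), and uses a fixed primer length of 12 nucleotides; it ignores everything else that governs where primase actually acts, such as the requirement for DnaB helicase at the fork. The phage motifs can be supplied through the `motif` parameter, with N matching any base.

```python
# Complement of each template DNA base in the RNA primer
RNA_COMPLEMENT = {"A": "U", "T": "A", "G": "C", "C": "G"}

def matches(site: str, motif: str) -> bool:
    """Compare a template trinucleotide with a recognition motif; 'N' matches any base."""
    return all(m == "N" or m == s for s, m in zip(site, motif))

def predicted_primers(template: str, motif: str = "CTG", length: int = 12):
    """Predict primers on a single-stranded DNA template written 5'->3'.

    Synthesis starts opposite the central nucleotide of the motif and, because
    the primer is antiparallel to the template, proceeds toward the template's
    5' end. Primers are truncated if the template runs out.
    Returns a list of (start_index, primer written 5'->3') pairs."""
    primers = []
    for i in range(len(template) - 2):
        if matches(template[i:i + 3], motif):
            start = i + 1  # central nucleotide, e.g. the T of d(CTG)
            templating = template[start::-1][:length]  # start, start-1, ... toward the 5' end
            primer = "".join(RNA_COMPLEMENT[base] for base in templating)
            primers.append((start, primer))
    return primers

print(predicted_primers("GGACTGAATCCTGA"))
```

Every primer predicted for the default motif begins with AG, matching the pppApG initiating diribonucleotide described above.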


Essential Physiological Properties 

Primase is essential for DNA replication initiation, elongation, and possibly termination. There are temperature-sensitive E. coli mutants that alter amino acid residues either within the central RNA synthesis domain or near the C-terminus. When shifted to nonpermissive temperatures, some RNA synthesis domain mutants display a fast arrest of DNA replication and others a slow arrest. The fast-arrest phenotype corresponds to an elongation defect: replication stops quickly because a new RNA primer must be synthesized on the lagging strand DNA template about every 2 s. The slow-arrest phenotype correlates with a defect in the initiation of chromosomal DNA replication, so that rounds of replication already under way can presumably be completed.
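The 2-s priming interval translates directly into an Okazaki fragment length once a fork speed is assumed. The worked arithmetic below uses a fork rate of roughly 1000 nucleotides per second and an E. coli genome of about 4.6 Mb; both are assumed round numbers rather than values given in this article.

```python
def okazaki_fragment_length(fork_rate_nt_per_s: float, priming_interval_s: float) -> float:
    """Average lagging-strand fragment length if one primer is laid down per interval."""
    return fork_rate_nt_per_s * priming_interval_s

# The ~1000 nt/s fork rate and 4.6 Mb genome size are assumed values, not from the text
length = okazaki_fragment_length(1000, 2.0)
primers_per_fork = (4.6e6 / 2) / length  # each of the two forks copies about half the genome
print(length, round(primers_per_fork))
```

On these assumptions each fork needs on the order of a thousand primers per round of replication, which illustrates why a primase elongation defect halts DNA synthesis so quickly.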

When the C-terminal primase mutants are shifted to nonpermissive temperatures, the SOS response is induced and the phenotype is a partition defect: even though DNA synthesis continues, the duplicated chromosomes are not partitioned between the daughter cells. As a result, the cells become filamentous with centrally located DNA. It has proven difficult to sort out whether the partition phenotype results from a defect in replication initiation or termination, possibly because the two may be linked. The C-terminus of primase is required for its interaction with the replicative DnaB helicase, and mutations in this region are known to uncouple primase activity from the helicase. This causes a significant drop in the rate of primer synthesis, which no longer occurs solely at the replication fork. How this might result in a partition defect is, however, unclear.


Macromolecular Synthesis Operon 

Nearly all of the DNA replication proteins are expressed at very low levels (fewer than 50 copies per cell). It is therefore surprising that the E. coli primase gene lies within an operon with two other genes that are expressed at levels 100 to 1000 times higher. When discovered, this operon provided a counterpoint to the original operon model, which was developed to explain the coordinate expression of the three galactoside-utilization genes. Because this operon’s three genes encode proteins that are involved in the three central processes of molecular biology, it was named the macromolecular synthesis operon (Figure 2). The first gene in the operon, rpsU, expressed at levels of 50 000 copies per cell, encodes the very highly conserved S21 ribosomal protein. This protein participates in the binding of the Shine-Dalgarno sequence of mRNA to the 16S rRNA, the initial step of translation. The middle gene, dnaG, expressed at fewer than 50 copies per cell, has the least sequence conservation of the three genes and encodes primase. The last gene, rpoD, encodes the sigma 70 subunit of RNA polymerase. Thus, the macromolecular synthesis operon includes the genes for the initiation phases of replication, translation, and transcription.

The expression of dnaG is controlled at several levels (Figure 2). The major promoter, a nut antiterminator sequence, and the SOS LexA-binding sequence are located upstream of the rpsU gene. Most of the transcripts initiated in this region terminate upon completion of the rpsU transcript, when they encounter the first rho-independent terminator. The second rho-independent terminator is located after the rpoD gene. Most of the transcripts that read through the first terminator are processed by RNase E at a site between the dnaG and rpoD sequences, so that the level of dnaG message is kept low. There is also a heat shock promoter within the 3’ end of dnaG that can be used to stimulate rpoD expression. Finally, primase is translated slowly because its mRNA has a poor Shine-Dalgarno sequence and contains an especially high number of rare codons. Within E. coli and other species, only a subset of the 61 possible sense codons tends to be found within the sequence of any given mRNA molecule. This codon bias is correlated with the relative abundance of the cognate tRNAs. Many low-abundance proteins are replete with rare codons, and all high-abundance proteins are encoded by genes containing abundant codons.

Figure 2 The macromolecular synthesis operon of gamma subdivision proteobacteria. The three genes of this operon encode proteins involved in three biological processes: translation, replication, and transcription. By having genes from all three processes within an operon, cells can coordinate their expression. However, additional control elements allow for a large difference in the level of expression of each of the genes.
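A simple way to quantify the codon bias described for dnaG is to count how many in-frame sense codons of a gene fall into a set of rare codons. In this sketch the set of "rare" codons is an illustrative assumption (codons such as AGA and AGG are often cited as rare in E. coli, but the relevant set depends on the organism and its tRNA abundances), and the test sequence is made up.

```python
# Illustrative rare codons (DNA alphabet); this set is an assumption for the
# example, not an authoritative list for any organism.
RARE_CODONS = {"AGA", "AGG", "ATA", "CTA", "CGA", "CGG"}
STOP_CODONS = {"TAA", "TAG", "TGA"}

def rare_codon_fraction(cds: str, rare=RARE_CODONS) -> float:
    """Fraction of in-frame sense codons in a coding sequence that are 'rare'."""
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    codons = [cds[i:i + 3] for i in range(0, usable, 3)]
    sense = [c for c in codons if c not in STOP_CODONS]
    if not sense:
        return 0.0
    return sum(c in rare for c in sense) / len(sense)

# Made-up coding sequence: ATG AGA AGG CTG AAA TAA
print(rare_codon_fraction("ATGAGAAGGCTGAAATAA"))
```

A gene like dnaG would be expected to score high on such a measure, whereas highly expressed genes such as rpsU would score low.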

The macromolecular synthesis operons from various bacteria differ in which gene they place upstream of dnaG, but all of them have dnaG upstream of rpoD. The bacterial cell presumably gains something important by physically linking the expression of primase and the sigma subunit of RNA polymerase, even though the arrangement allows for their discoordinate expression levels. The Gram-positive bacteria replace rpsU with glyS, which encodes glycyl-tRNA synthetase, a protein that, like the rpsU product, is involved in translation. The spirochaetes place a thymidine kinase gene upstream, and thermophilic bacteria do not have any gene in that location.


Yeast and Animal Primases 


Biochemical Properties 

The biochemically best-studied eukaryotic nuclear primases are from baker’s yeast, fruit flies, mice, calf thymus, and humans. A hallmark of eukaryotic primases is that their activity resides in a complex of four subunits (Figure 3). The heterotetramer consists of a 1:1:1:1 complex of a small primase subunit (p49, meaning a 49-kDa protein), a large primase subunit (p58), a regulatory phosphoprotein sometimes called subunit B (p70), and DNA polymerase alpha (p180). Among the different species, the small primase subunit has between 410 and 450 amino acid residues and the large primase subunit between 500 and 530 residues. In all species, the small primase subunit has the catalytic site and usually does not require the other subunits for activity. Even though the small subunit has no sequence similarity to the bacterial primase, its biochemical properties are very similar, which appears to be an example of convergent evolution of function. Just as in prokaryotes, the rate-limiting step is at or before the formation of the first phosphodiester bond. Subsequent bonds are then formed rapidly, and synthesis terminates when the primers are about 10 nucleotides long. In the eukaryotic complex, the primer is transferred from the primase to the DNA polymerase alpha active site without dissociation of the complex from the template.

Figure 3 Arrangement of subunits within the eukaryotic polymerase alpha/primase complex. The subunits are linearly arranged as a heterotetramer. The DNA polymerase, p180, forms a heterodimer with subunit B, p70, which is phosphorylated and dephosphorylated during the cell cycle. The small primase subunit, p49, forms another heterodimer with the large primase subunit, p58, which is responsible for the nuclear localization of the complex. The tetramer forms when the DNA polymerase subunit binds to the large primase subunit.

It is possible to separate the heterotetramer into 
two subcomplexes: one has DNA polymerase alpha 
and subunit B and the other has the two primase sub- 
units. The role of DNA polymerase alpha is to elong- 
ate the short RNA polymer synthesized by primase 
for about 100 nucleotides, at which point yet another 
polymerase takes over. The phosphorylation and 
dephosphorylation of subunit B regulates the activity 
of the DNA polymerase. When subunit B of the het- 
erotetramer is phosphorylated by a cell cycle kinase as 
the cell enters S phase, the resulting phosphoprotein is 
competent and sufficient to initiate DNA synthesis. 
Subunit B also enhances the rate of formation of the heterotetramer when the four free subunits are mixed together. The large primase subunit stabilizes the activity of the small subunit, helps it to remain soluble, and is required for its import into the nucleus.

Using very gentle isolation procedures, several 
other proteins have been shown to associate with the 
heterotetrameric complex. The functions of these pro- 
teins are quite diverse and suggest considerable regu- 
lation. Such a high degree of regulation is to be 
expected for the only enzyme capable of initiating 
DNA synthesis. Included among these proteins are 
the primer-removing ribonuclease FEN-1, the DNA 
strand break-sensitive ADP-ribosylation enzyme 
PARP, the fidelity protein CTF4, the tyrosine kinase 
substrate calpactin I heavy chain, and the glycolytic 
enzyme 3-phosphoglycerate kinase. 

Many well-studied DNA replication proteins are 
able to affect the activity of the heterotetrameric 
complex. Replication protein A (RP-A) is a single- 
stranded DNA binding protein required for lagging 
strand DNA synthesis. This protein inhibits primase 
under most conditions. Several helicases are able to 
stimulate primase activity even in the presence of RP- 
A. These include the eukaryotic helicase B, the SV-40 
large T-antigen, which is both a replication origin 
binding protein and a helicase, and the papillomavirus 
helicase. The SV-40 large T-antigen has been shown to 
exert its effects by binding to subunit B of the com- 
plex, whereas the papillomavirus helicase binds to the 
small primase subunit. In either case, the interaction provides a mechanism by which viral proteins can exploit host proteins for viral DNA replication.


Primase Genes 

The four proteins of the heterotetramer are each encoded by a single essential gene. In baker's yeast, the genes are named pri1 (small primase subunit), pri2 (large primase subunit), pol12 (subunit B), and pol1 (DNA polymerase alpha). Mutations within the two primase genes and the DNA polymerase gene lead to phenotypes consistent with those of other DNA replication mutants. For instance, temperature-sensitive mutants of pri1 that map near the essential catalytic residues result in mutator and hyper-recombination phenotypes.


Genetic Control of Primase Expression 

The low expression of the four proteins of the heterotetrameric complex is attributable to a variety of factors: weak promoters, weak translation initiation sequences, and no codon usage bias. (Highly expressed eukaryotic proteins show a strong bias for the most abundant codons.) Whereas the yeast small and large primase genes lack introns, the fruit fly large primase subunit gene has two small introns within the coding sequence and one large intron in the 5'-noncoding sequence. When introns are found in noncoding sequences, they often have a regulatory function.

Because many more proteins must participate in 
eukaryotic DNA replication compared to the situ- 
ation in bacteria, it is important that their synthesis is 
coordinated. Except for a few hints about potential 
control sequences, it is not known how this is 
achieved. One hint is the presence of myb sites within the upstream sequences of the fruit fly small primase and DNA polymerase subunit genes. Indeed, DNA polymerase gene expression has been shown to be controlled by the myb transcription factor. It has also been established that quite a few replication proteins are expressed in parallel during the cell cycle, rising just before the onset of S phase.


Other Primases 


Gene sequences similar to those of eukaryotic small primase subunits have been discovered in archaeal genomes. The archaeal proteins are smaller (320-330
residues) than their eukaryotic counterparts (410-450 
residues). They have been isolated and shown to have 
primer synthesis activity in the absence of a large 
subunit. It is not clear whether archaeal chromosomes 
encode large primase subunits. The primases of mito- 
chondria and of plant nuclei have been characterized 
biochemically but not genetically. There is a report of 
a telomere-specific primase but no corresponding pro- 
tein or gene sequence. Finally, there are no reports 
concerning any chloroplast primase. 


See also: Codon Usage Bias; DNA Polymerases;
DNA Replication; Ori Sequences; Primer RNA; 
Temperature-Sensitive Mutant 
Primer


A primer is a short nucleic acid sequence that hybridizes to one strand of DNA and provides a free 3'-OH end at which a DNA polymerase starts synthesis of a DNA chain.


See also: DNA Polymerases; Replication 
Primer RNA


Definition


Primer RNA is RNA that initiates DNA synthesis. 
Primers are required for DNA synthesis because no 
known DNA polymerase is able to initiate polynucleotide synthesis. DNA polymerases are specialized for elongating polynucleotide chains from their available 3'-hydroxyl termini. In contrast, RNA polymerases can both initiate and elongate polynucleotides.
Primases are special RNA polymerases that synthe- 
size short-lived oligonucleotides used only during 
DNA replication. 

Even though ‘transcriptional’ RNA polymerases 
primarily synthesize messenger RNA, transcripts are 
sometimes used to initiate DNA synthesis. For 
instance, the single-stranded DNA phage M13 gen- 
ome utilizes RNA polymerase instead of primase to 
initiate its DNA synthesis. In addition, the dominant 
hypothesis concerning mitochondrial DNA replica- 
tion initiation is that the mitochondrial RNA poly- 
merase synthesizes a polymer that is not displaced 
from the template. Then, the special RNase MRP cleaves the ribopolymer at specific sites, enabling the exposed 3'-hydroxyl termini to serve as primers for DNA synthesis. Finally, transfer RNAs make up a special class of primer RNA because certain species of tRNA are used by retroviral reverse transcriptases to initiate replication of retroviral genomes. It is also possible to initiate DNA synthesis without primer RNA. The initiator proteins of adenovirus and bacteriophage φ29 covalently attach to both 5' ends of the linear duplex DNA and provide a serine β-hydroxyl group from which a DNA polymerase elongates. Another example is that many plasmids encode sequence-specific nucleases that cleave one strand of the duplex to create a 3'-hydroxyl for the host DNA polymerase. An example from animal viruses is parvovirus, in which the 3' end of the parental strand forms a DNA hairpin and becomes the primer for synthesis of its complement.


Discontinuous DNA Synthesis and the 
Primer RNA Hypothesis 


After it was established in the mid-1960s that all 
DNA and RNA polymerases catalyze polynucleotide 
synthesis with 5’ to 3’ polarity, the Okazaki laboratory 
performed pulse-chase experiments that helped to 
resolve the paradox of the antiparallel nature of duplex 
DNA. They discovered that two types of new DNA 
were being synthesized. One was short (from 500 to 
2000 nucleotides) while the other was much longer. 
The short replicative intermediates, now referred to 
as ‘Okazaki fragments,’ represent discontinuously 
synthesized DNA. 

In bacteria and eukaryotes, discontinuous replica- 
tion involves the following steps: (1) specific proteins 
bind to the replicative origin; (2) a replicative helicase 
is recruited to that complex; (3) the parental strands 
are unwound; (4) primase is recruited and synthesizes 
primer RNA on each of the two separated strands; (5) 
a replicative DNA polymerase elongates from each of 
these RNA primers to create two ‘leading’ strands that 
migrate away from each other (bidirectional replica- 
tion) leaving the complementary single-stranded tem- 
plate exposed; (6) helicase advances ahead of the 
leading strand DNA polymerase to assist in duplex 
unwinding; (7) single-stranded binding protein (SSB) 
binds to the exposed single-strand template; (8) new 
primer RNA molecules are synthesized complemen- 
tary to the lagging strand template once every 500- 
2000 nucleotides; and (9) another DNA polymerase 
molecule elongates from those primer RNAs. Primer 
RNA is then removed from the new strand to prevent 
it from being incorporated into the chromosome. 
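Steps (8) and (9), in which the lagging strand is primed over and over, can be illustrated with a toy simulation. This is a minimal sketch, not a kinetic model. It assumes that the spacing between successive lagging-strand primers is drawn uniformly from the 500-2000 nucleotide range given above. The function name and random seed are hypothetical.

```python
import random
from typing import Optional


def okazaki_fragment_lengths(template_length: int, min_len: int = 500,
                             max_len: int = 2000,
                             seed: Optional[int] = None) -> list:
    """Split a lagging-strand template into Okazaki fragments.

    Each fragment begins at a new primer RNA. Its length is drawn uniformly
    from [min_len, max_len] (an assumption of this sketch), and the final
    fragment is truncated where the template ends.
    """
    rng = random.Random(seed)
    lengths = []
    remaining = template_length
    while remaining > 0:
        length = min(rng.randint(min_len, max_len), remaining)
        lengths.append(length)
        remaining -= length
    return lengths


if __name__ == "__main__":
    fragments = okazaki_fragment_lengths(100_000, seed=1)
    # One primer RNA is needed per fragment.
    print(f"{len(fragments)} primers for a 100 kb lagging strand")
    print(f"mean fragment length: {sum(fragments) / len(fragments):.0f} nt")
```

With fragments of 500-2000 nucleotides, a 100-kb stretch of lagging strand needs on the order of 50 to 200 primers, each of which must later be removed.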

The postreplicative excision processes are not directly coupled to discontinuous synthesis (Figure 1). Two enzymes are able to remove most, if not all, of the primer RNA. The first is a 5'-exonuclease and the second is RNase H (H stands for RNA/DNA hybrid duplex). This ribonuclease hydrolyzes ribopolymers that are covalently attached to deoxyribopolymers when the latter is base-paired with another deoxyribonucleotide polymer. After primer RNA has been fully removed from the new strand, the resulting gap in the duplex is filled in by a repair DNA polymerase. The last 'nick' in the backbone is closed by a DNA ligase. In bacteria, the primer-removing 5'-exonuclease is one of the three domains of DNA polymerase I. In eukarya, it is a free enzyme called Five' EndoNuclease or Flap EndoNuclease (FEN-1). The gap-filling polymerase in bacteria is DNA polymerase I, but in eukaryotes the responsible enzyme may be DNA polymerase beta. Bacteria with temperature-sensitive mutations in either their 5'-exonuclease or their RNase H contain 10 to 30 times more Okazaki fragments than wild-type bacteria when grown at restrictive temperatures. This provides genetic evidence for the physiological functions of these two enzymes, as well as a tool for studying Okazaki fragments.

Figure 1 Postreplicative Okazaki fragment processing. The two enzymes that remove the RNA primer in bacteria are RNase H and the 5'-exonuclease activity of DNA polymerase I. In eukaryotes, it is removed by RNase HI and FEN-1. RNase H is not effective at removing the last ribonucleotide of the primer, but the 5'-exonuclease and FEN-1 are. All of these enzymes leave an attached 5'-phosphate as shown. The resulting gap is then filled by DNA polymerase I in bacteria and DNA polymerase β in eukaryotes by elongation from the exposed 3'-hydroxyl chain shown on the right. The last phosphodiester bond is formed by DNA ligase and NAD+ in bacteria and by DNA ligase I and ATP in eukaryotes.
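The order of events in primer removal and fragment joining can be summarized as a small state model. The sketch below simplifies the process under several assumptions. Each residue is either RNA ('r') or DNA ('d'), written 5' to 3'. RNase H removes all primer ribonucleotides except the one joined to DNA. The 5'-exonuclease (or FEN-1) removes that last one, and the gap is then filled and sealed. The enzyme table follows the Figure 1 legend, and DNA polymerase beta is listed for eukaryotes only tentatively, as in the text. The function and dictionary names are hypothetical.

```python
# Enzymes per the Figure 1 legend; polymerase beta is only the likely eukaryotic gap-filler.
ENZYMES = {
    "bacteria": {
        "rnase_h": "RNase H",
        "exo": "5'-exonuclease of DNA polymerase I",
        "fill": "DNA polymerase I",
        "seal": "DNA ligase (NAD+)",
    },
    "eukaryotes": {
        "rnase_h": "RNase HI",
        "exo": "FEN-1",
        "fill": "DNA polymerase beta",
        "seal": "DNA ligase I (ATP)",
    },
}


def process_okazaki(upstream: str, downstream: str, domain: str = "bacteria"):
    """Join two adjacent lagging-strand fragments.

    upstream and downstream are 5'->3' strings of 'r' (ribo) and 'd' (deoxy)
    residues; downstream starts with its RNA primer, and upstream ends in the
    3'-OH from which the gap is filled. Returns (joined_strand, steps).
    """
    enz = ENZYMES[domain]
    n_rna = len(downstream) - len(downstream.lstrip("r"))
    steps = []
    if n_rna > 1:
        # RNase H cannot remove the ribonucleotide attached to DNA.
        steps.append((enz["rnase_h"], f"removes {n_rna - 1} ribonucleotides"))
    if n_rna >= 1:
        steps.append((enz["exo"], "removes the last ribonucleotide"))
        steps.append((enz["fill"], f"fills the {n_rna}-nt gap from the upstream 3'-OH"))
    steps.append((enz["seal"], "closes the remaining nick"))
    joined = upstream + "d" * n_rna + downstream[n_rna:]
    return joined, steps


if __name__ == "__main__":
    strand, steps = process_okazaki("d" * 20, "r" * 11 + "d" * 30, "eukaryotes")
    for enzyme, action in steps:
        print(f"{enzyme}: {action}")
    assert "r" not in strand
```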


The Sequence of Primer RNA 


Large amounts of Okazaki fragments can be isolated from double mutants carrying temperature-sensitive lesions in RNase H and the 5'-exonuclease domain of polymerase I. When these cells are arrested shortly after the initiation of replication, the primer RNA attached to the Okazaki fragments is found to be 11 ± 1 nucleotides long. The primer is initiated with ATP five times more often than with other nucleoside triphosphates. A phosphodiester bond is formed between this initiating nucleotide and either ATP or GTP to make the initial diribonucleotide. The primers most often start opposite the template trinucleotide sequence 5'-d(CTG)-3'. The guanine in the trinucleotide does not serve as a template for synthesis but is required for directing primase to the cytosine and thymine so that it will synthesize pppApG. This high level of initiation specificity is a characteristic of many primases and transcriptional RNA polymerases (Table 1). In all cases, though, once the initiating diribonucleotide has been synthesized, the rest of the primer sequence is determined by the template sequence.
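The template rule described above for E. coli primase can be expressed as a short scan. This is a minimal sketch under simplifying assumptions: the template is given 5' to 3' as DNA, every 5'-CTG-3' is used, and the primer has a fixed length (11 nucleotides by default, matching the length reported above). Real site selection is far less deterministic, and the function name is hypothetical.

```python
# RNA base complementary to each DNA template base.
RNA_COMPLEMENT = {"A": "U", "T": "A", "G": "C", "C": "G"}


def primer_sites(template: str, primer_length: int = 11) -> list:
    """Return (index_of_CTG, primer_5to3) for each 5'-CTG-3' in a DNA template.

    The template is written 5'->3'. Synthesis starts opposite the T (giving
    pppA), continues opposite the C (G), and then copies the template toward
    its 5' end. The G of CTG is required for recognition but is not copied.
    """
    template = template.upper()
    sites = []
    start = template.find("CTG")
    while start != -1:
        t_index = start + 1                    # the T of CTG
        first = t_index - primer_length + 1    # most 5' template base copied
        if first >= 0:
            # The primer grows 5'->3' while the template is read 3'->5'.
            copied = template[first:t_index + 1][::-1]
            primer = "".join(RNA_COMPLEMENT[base] for base in copied)
            sites.append((start, primer))
        start = template.find("CTG", start + 1)
    return sites


if __name__ == "__main__":
    print(primer_sites("GGACTG", primer_length=4))  # [(3, 'AGUC')]
```

Every primer the scan returns begins with AG, reflecting the pppApG start. Sites too close to the 5' end of the template to accommodate a full-length primer are skipped.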

The biochemical features of primer RNA synthesis 
have provided a number of insights into the control of 
DNA replication. For bacterial and eukaryotic pri- 
mases, the rate-determining step is either the rate of 
formation of the first phosphodiester bond or some 
step preceding it. Rate-limiting steps are usually subject to control. In bacteria, DnaB helicase greatly stimulates primase activity and, because it unwinds duplex DNA, directs primer synthesis to the DNA replication fork.

In eukaryotes, replication protein A is a single-stranded DNA binding protein that is able to stimulate eukaryotic primase. After catalyzing the formation of the first bond, primases synthesize the next 10 or so bonds rapidly but then slow down. During this brief elongation phase, bacterial and eukaryotic primases readily incorporate deoxyribonucleotides into the primer to create mixed ribo- and deoxyribo-oligomers. In the absence of other replication enzymes, bacterial primer RNA is 12 or more nucleotides long and eukaryotic primer RNA is 8 or more nucleotides long. When either DNA polymerases or replicative helicases and their substrates are added to the primase assay mixture, the primers are limited to a length of 7 to 12 nucleotides. These primer lengths are the same as those observed at the ends of Okazaki fragments isolated from living organisms.


Table 1 Initiation specificity of selected primases and RNA polymerases as established in biochemical assays

Enzyme                        Initiating dinucleotide
Phage T7 gene 4 protein       pppApC
E. coli primase               pppApG
E. coli RNA polymerase        pppApU




See also: DNA Polymerases; DNA Replication 
Primitive Character


Primitive character is an alternative term for plesiomorphic character. Hennig (1966) preferred the more technical term plesiomorphy, and phylogeneticists usually avoid the term primitive because of its anthropomorphic connotations.


Reference 
Hennig W (1966) Phylogenetic Systematics. Urbana, IL: University 
of Illinois Press. 


See also: Plesiomorphy; Symplesiomorphy 
Primosome


A primosome is a complex of proteins involved in the synthesis of the RNA primer sequences for DNA replication. It consists mainly of primase and DNA helicase, which move as a unit with the replication fork.


See also: Primase 


Prions 


See: Spongiform Encephalopathies 
(Transmissible), Genetic Aspects of 


Prisoner’s Dilemma 


See: Hamilton’s Theory, Altruism 
Probe


A probe is the generic term for a sequence of DNA or RNA corresponding to a gene or sequence of interest that has been labeled, either radioactively or with another detectable molecule (e.g., biotin, digoxigenin, or fluorescein). The probe hybridizes to the complementary nucleic acid sequence and therefore labels, identifies, or distinguishes the cloned DNA, genomic DNA, viral plaques, bacterial colonies, or bands on a gel that contain the gene of interest.
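Hybridization depends on base complementarity: a single-stranded probe pairs with whichever strand carries its reverse complement. The sketch below locates potential binding sites in a double-stranded DNA target. It assumes exact Watson-Crick pairing, with an optional mismatch allowance as a crude stand-in for hybridization stringency. Real hybridization also depends on temperature, salt concentration, and probe length. The function names are hypothetical.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return "".join(COMPLEMENT[b] for b in reversed(seq.upper()))


def hybridization_sites(probe: str, target: str, max_mismatches: int = 0) -> list:
    """Positions where a probe could base-pair with a duplex target.

    target is the top strand, 5'->3'. A hit (i, "+") means the probe pairs
    with the top strand (which contains the probe's reverse complement at i);
    (i, "-") means it pairs with the bottom strand (the top strand contains
    the probe sequence itself at i).
    """
    probe, target = probe.upper(), target.upper()
    rc = reverse_complement(probe)
    k = len(probe)
    hits = []
    for i in range(len(target) - k + 1):
        window = target[i:i + k]
        for strand, pattern in (("+", rc), ("-", probe)):
            mismatches = sum(a != b for a, b in zip(window, pattern))
            if mismatches <= max_mismatches:
                hits.append((i, strand))
    return hits


if __name__ == "__main__":
    print(hybridization_sites("GCTA", "TTTAGCTT"))
```

Raising `max_mismatches` admits imperfect matches, which loosely mirrors how lowering stringency lets a probe detect related but non-identical sequences.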


See also: DNA Hybridization; Northern Blotting; 
Southern Blotting 


Procentriole 


See: Centrioles 


Processed Pseudogene 
A processed pseudogene is an inactive gene copy that
lacks introns, in contrast to the interrupted structure 
of the active gene. Such genes may originate by reverse 
transcription of mRNA and insertion of a duplex copy 
into the genome. 


See also: Pseudogene 
Progeny Testing


Progeny testing is a method commonly used in animal
selection. It relies on phenotypic assessment of an 
individual’s offspring to make decisions regarding 
selection. 

For traits that have high heritability, simpler selection protocols (e.g., individual selection) may be used. However, once the environmental component of phenotypic variation becomes large, evaluating an individual's breeding value from its own phenotype becomes inaccurate. Progeny testing circumvents this problem by analyzing a number of offspring from a tested animal. In a large group of progeny, the environmental components of phenotypic variation of individual offspring tend to cancel each other out. Therefore, the mean value of a selected trait among an individual's offspring serves as a good measure of that animal's breeding value, and the parents of progeny with high values for the desired traits are selected for future breeding. If the population of tested offspring is large, the accuracy of selection can be very high.
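The gain in accuracy from testing many offspring can be quantified. This requires standard quantitative-genetic assumptions that this entry does not state: the offspring are half-sibs from unrelated dams, there is no shared environment, and accuracy is the correlation between true and estimated breeding value. Under these assumptions, a progeny test with n offspring for a trait of heritability h² has accuracy r = sqrt(n·h² / (4 + (n - 1)·h²)). For comparison, selection on the animal's own phenotype has accuracy sqrt(h²). A minimal Python sketch of the two formulas:

```python
import math


def progeny_test_accuracy(n_offspring: int, h2: float) -> float:
    """Accuracy of a sire's breeding value estimated from n half-sib offspring.

    Assumes unrelated dams and no common environmental effects among offspring.
    """
    if n_offspring < 1 or not 0.0 < h2 <= 1.0:
        raise ValueError("need n_offspring >= 1 and 0 < h2 <= 1")
    return math.sqrt(n_offspring * h2 / (4.0 + (n_offspring - 1) * h2))


def own_phenotype_accuracy(h2: float) -> float:
    """Accuracy of individual (mass) selection on the animal's own phenotype."""
    return math.sqrt(h2)


if __name__ == "__main__":
    h2 = 0.25  # hypothetical medium-heritability trait
    print(f"own phenotype: {own_phenotype_accuracy(h2):.2f}")
    for n in (1, 10, 50, 200):
        print(f"{n:>3} offspring: {progeny_test_accuracy(n, h2):.2f}")
```

For h² = 0.25, progeny-test accuracy rises from 0.25 with a single offspring to about 0.63 with 10 offspring. This overtakes the 0.50 achieved by selecting on the animal's own phenotype, and accuracy approaches 1 as n grows.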

For reasons of economic profitability, progeny test- 
ing protocols are usually applied to selection of males. 
Firstly, males can be mated with a large number of 
females to produce the large number of offspring needed 
for analysis. Secondly, in many species generation inter- 
vals for males are shorter than those for females. 

Progeny testing is commonly applied to traits of 
medium heritability such as weight in poultry and 
fleece traits in sheep. An added benefit of this 
approach is that sex-restricted traits such as lactation 
parameters in dairy cattle and other maternal traits 
(e.g., litter size in swine and egg production in poult- 
ry) can be selected for in males by analyzing the 
daughters of sires. Also, for obvious reasons, progeny 
testing is used for traits involving post-mortem evalu- 
ation (e.g., carcass traits). 

The principal drawback of progeny testing is a 
substantial increase in time and associated cost needed 
for animal evaluation. To be evaluated for most traits 
of economic importance the progeny has to reach 
maturity, thus adding at least one generation to the 
time required for a round of selection (up to 8 years in 
some species). To obtain high accuracy of selection, 
large populations of offspring have to be produced 
and maintained, which makes this approach feasible only for large-scale breeders. Progeny testing is also used in plant selection.


See also: Multifactorial Inheritance; Quantitative 
Trait; Selective Breeding 
Prokaryotes


Prokaryotes are organisms whose cells do not contain a distinct nucleus bounded by a nuclear envelope, in contrast to eukaryotic organisms, which do have distinct nuclei. Prokaryotes include both the bacteria and the archaea, which appear to be only distantly related to bacteria. Typical prokaryotes are minute rods or cocci (spheres) on the order of 0.5-5.0 μm in diameter or length. The category also includes organisms as small as mycoplasmas, with diameters of only 0.1-0.3 μm and polymorphic shapes due to their lack of cell walls, and as large as Epulopiscium, which grows to 300 μm in length. It includes all of the blue-green bacteria or cyanobacteria (formerly called blue-green algae), which take on a variety of coccoid and filamentous forms. Eukaryotic organisms include the plant and animal kingdoms (however delimited), the fungi, and the wide variety of organisms that have generally been classified as protista and may now be classified as protozoa or chromista (chromophyta) in various taxonomic schemes.


See also: Archaea, Genetics of; Bacteria 
Proline


Proline is one of the 20 amino acids commonly found in proteins. Its three-letter abbreviation is Pro and its single-letter designation is P. As one of the nonessential amino acids in humans, it is synthesized by the body and so need not be provided in an individual's diet.

The chemical structure of proline is given below.


Figure 1 Proline.


See also: Amino Acids 
Promoter


A promoter is the region of a cellular or viral genome that directs transcription of a gene. A gene's promoter consists of the DNA elements required for the proper initiation and regulation of transcription of the gene. Promoters contain two types of DNA elements: (1) core promoter elements that bind the general transcription machinery, including the RNA polymerase core enzyme and other general transcription factors; and (2) regulatory elements that bind sequence-specific transcription factors, including transcriptional activators and repressors. Each promoter has a unique sequence; however, consensus sequences of core promoters have been deduced for many RNA polymerase holoenzymes. The following discussion is limited to promoters for DNA-dependent RNA polymerases in bacteria and eukaryotes, using Escherichia coli and mammalian mRNA promoters as examples.


Prokaryotic Promoters 


Prokaryotic promoters consist of core promoter elem- 
ents that are often, but not always, flanked by regu- 
latory protein binding sites. Promoters in E. coli have 
been well characterized and many encompass fewer 
than 100 base pairs (bp). 

Core promoters in E. coli recruit RNA polymerase holoenzymes and direct transcription initiation from the correct start site (+1). Most genes in E. coli are transcribed by the major σ70-containing holoenzyme, and it is the σ subunit of the holoenzyme that binds core promoters with sequence specificity. Comparison of the sequences of σ70-holoenzyme promoters led to the identification of two conserved 6 bp regions, the -35 and -10 elements (Figure 1A). The consensus sequences for the -35 and -10 elements were deduced by aligning hundreds of promoters and confirmed by mutagenesis. These elements are separated from one another and from the transcription start site by variable-length spacer regions. The consensus length of the spacer region between the -35 and -10 elements is 17 bp and that between the -10 element and the transcription start site is 7 bp, although other lengths are allowed. The start site is most often an A or a G on the nontemplate strand; hence the transcribed RNA usually begins with ATP or GTP. The strength of a core promoter correlates directly with its similarity to the consensus.

Figure 1 (A) E. coli σ70 holoenzyme core promoter elements. (B) Mammalian RNA polymerase II core promoter elements.
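Because promoter strength correlates with similarity to the consensus, candidate promoters are often scored by counting consensus matches. The sketch below assumes the widely cited σ70 consensus hexamers TTGACA (-35) and TATAAT (-10), which this entry does not spell out. It also assumes an allowed spacer range of 15 to 19 bp around the 17-bp optimum. The match count is a crude proxy and not a predictor of actual promoter strength, and the names are hypothetical.

```python
MINUS_35 = "TTGACA"  # assumed sigma-70 -35 consensus (nontemplate strand)
MINUS_10 = "TATAAT"  # assumed sigma-70 -10 consensus (nontemplate strand)


def matches(a: str, b: str) -> int:
    """Number of identical positions between two equal-length strings."""
    return sum(x == y for x, y in zip(a, b))


def best_sigma70_hit(seq: str, spacers=range(15, 20)):
    """Best -35/-10 pair in seq (nontemplate strand, 5'->3').

    Returns (score, start_of_minus35, spacer_length) with score out of 12,
    or None if seq is too short to contain both hexamers.
    """
    seq = seq.upper()
    best = None
    for i in range(len(seq) - 5):
        for spacer in sorted(spacers):
            j = i + 6 + spacer  # start of the -10 hexamer
            if j + 6 > len(seq):
                break  # longer spacers will not fit either
            score = matches(seq[i:i + 6], MINUS_35) + matches(seq[j:j + 6], MINUS_10)
            if best is None or score > best[0]:
                best = (score, i, spacer)
    return best


if __name__ == "__main__":
    consensus = "TTGACA" + "G" * 17 + "TATAAT"
    print(best_sigma70_hit(consensus))  # (12, 0, 17)
```

A position-weight matrix built from aligned promoters would be a more realistic scoring scheme. The plain match count is used here only because it mirrors the idea of "similarity to consensus" directly.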

In addition to core promoter sequences, E. coli 
promoters usually contain regulatory elements. These 
elements are found upstream, downstream, and in 
some cases overlapping with the core promoter se- 
quences. The regulatory elements typically bind tran- 
scription factors that themselves either affect the 
binding of RNA polymerase to core promoters or 
affect later steps in the transcription reaction. An 
interesting exception to this generalization is the UP element of E. coli rRNA promoters, which increases rRNA transcription by interacting directly with the α subunit of E. coli RNA polymerase.


Eukaryotic Promoters 


In general, eukaryotic promoters are far more compli- 
cated than their prokaryotic counterparts, as is the 
eukaryotic transcription machinery. Eukaryotes have 
three nuclear DNA-dependent RNA polymerases, 
each of which has its own core promoter elements. 
Mammalian mRNA promoters, which are transcribed 
by RNA polymerase II, consist of core promoter elem- 
ents and regulatory protein binding sites that often 
span tens of kilobase pairs. 

The three main elements identified in mammalian 
mRNA core promoters are the TATA box, the initi- 
ator (Inr), and the downstream promoter element 
(DPE) (Figure 1B). It is important to note that some
mammalian promoters do not appear to contain any of 
these three core promoter elements and in these cases 
it has been difficult to determine how transcription 
initiation occurs from a specific start site. The TATA 
box, which was the first eukaryotic core promoter 
element identified, has the consensus sequence TATA- 
TAAG (nontemplate strand) and is centered approxi- 
mately 29 bp upstream of the transcription start site. 
The Inr spans the transcription start site (conserved A) 
and has a consensus sequence that contains multiple 
pyrimidines (Y). The DPE, which was discovered first 
in Drosophila promoters and later in mammalian pro- 
moters, is centered approximately 31 bp downstream 
of the transcription start site. Typically, if promoters 
contain more than one of these three core elements 
they contain either: (1) a TATA box and an Inr; or (2) 
an Inr and a DPE. 

Eukaryotic core promoter elements serve to recruit 
the RNA polymerase II transcription machinery. 
TFIID, one of the general transcription factors for 
RNA polymerase II, is a multiprotein complex con- 
taining the TATA-binding protein and associated 


factors that can bind with sequence specificity to all 
three of the core promoter elements. Other compon- 
ents of the RNA polymerase II general transcription 
machinery also contact core promoter DNA, and data 
is emerging that these factors may bind specific DNA 
sequences other than the TATA box, Inr, and DPE. It 
is likely that our understanding of RNA polymerase II 
core promoters will dramatically change as more core 
promoters are studied in detail. 

The extended sizes of many eukaryotic promoters 
result from regulatory elements that can be found tens 
of kilobase pairs away from core promoters. Regu- 
latory elements include proximal elements that are 
found close to core promoters and typically bind activ- 
ators, as well as enhancers and silencers that (depend- 
ing on the promoter) can be found close to or at great 
distances upstream or downstream of the core pro- 
moter. In general, enhancers bind proteins that activate 
transcription and silencers bind proteins that repress 
transcription. 

The accessibility of transcription factors to eukar- 
yotic promoters is influenced by nucleosomes and 
higher order chromatin structure. It is likely that the 
chromatin structure of a promoter plays an active role 
in its function. Therefore, it may be more accurate to 
think of eukaryotic promoters as nucleoprotein chro- 
matin structures, since it is the chromatin and not 
simply the DNA sequence that will be recognized 
and accessed by the transcription machinery in eukar- 
yotes. 
See also: Bacterial Transcription Factors; Sigma
Factors; Transcription 
Proofreading


Proofreading is the set of mechanisms for correcting errors in protein or nucleic acid synthesis that involve scrutiny of individual units after they have been added to the chain.


See also: Editing and Proofreading in Translation 
Proofreading Function


See: Editing and Proofreading in Translation 


Prophage 
A prophage is the genome of a lysogenic bacteriophage,
integrated into the bacterial host chromosome. The 
prophage is replicated as part of the host chromosome. 


See also: Bacteriophages 


Prophage, Prophage 
Induction 
See: Lysogeny, Induction of Prophage 
Protein Interaction Domains


Protein interaction domains can be defined as discrete amino acid stretches that are necessary to mediate noncovalent protein-protein interactions. Their discovery revealed a previously unexpected level of structural and functional modularity in proteins. Importantly, the molecular basis of many human diseases can be attributed to defective protein interaction domains. Therefore, the design of therapeutic strategies based on the manipulation of protein interaction domains remains an intense subject of investigation.

Protein-protein interactions are crucial for the assembly of molecular machines and the organization of regulatory pathways. To a certain extent, a cell and its proteins can be compared to an engine and its parts: proteins interact with one another to mediate many of their functions, in much the same way as parts connect to one another to allow an engine to operate. Molecular machines, such as those involved in DNA replication and transcription, are made up of protein complexes containing dozens of subunits that interact with each other. Protein-protein interactions are also the basis for many regulatory mechanisms. For example, signal transduction pathways are composed of a series of proteins that physically interact with each other to transmit a signal from the external environment to the nucleus.

Protein-protein interactions can be predictive of 
function. Indeed the finding that a protein of un- 
known function can interact with one or more pre- 
viously characterized proteins can lead to reasonable 
hypotheses for its function. It is this principle that 
prompted the launching of proteome-wide ‘protein 
interaction mapping’ projects. In such projects, high- 
throughput approaches are used to identify large num- 
bers of potential protein-protein interactions. It is 
assumed that this information will help in elucidating 
the functional relationships among the tens of thou- 
sands of uncharacterized proteins predicted from 
complete genome sequences. 
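The reasoning behind interaction mapping, that a protein's partners hint at its own function, is often implemented as a simple majority vote over the annotated neighbors in an interaction network (sometimes called "guilt by association"). The sketch below shows one illustrative technique of this kind, not the method of any particular mapping project. The protein names and annotations are hypothetical.

```python
from collections import Counter


def interaction_partners(protein: str, interactions) -> set:
    """All proteins that interact with `protein` (interactions are undirected pairs)."""
    partners = set()
    for a, b in interactions:
        if a == protein:
            partners.add(b)
        elif b == protein:
            partners.add(a)
    return partners


def predict_function(protein: str, interactions, annotations: dict) -> list:
    """Rank candidate functions by how many characterized partners carry them."""
    votes = Counter()
    for partner in interaction_partners(protein, interactions):
        votes.update(annotations.get(partner, ()))
    return votes.most_common()


if __name__ == "__main__":
    # Hypothetical interaction map: "Unk1" is the uncharacterized protein.
    interactions = [("Unk1", "PolA"), ("Unk1", "PriS"), ("TfX", "Unk1"), ("PolA", "PriS")]
    annotations = {
        "PolA": {"DNA replication"},
        "PriS": {"DNA replication"},
        "TfX": {"transcription"},
    }
    print(predict_function("Unk1", interactions, annotations))
```

The output is a ranked list of hypotheses, not an assignment. High-throughput interaction data contain false positives, so such predictions still require experimental confirmation.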

Our current understanding of the molecular basis of protein-protein interactions is derived from a combination of experimental approaches that include biochemistry, genetics, and molecular and structural biology. These studies have revealed three fundamental characteristics. First, only relatively small regions of proteins, referred to as 'protein interaction domains,' are necessary to mediate such interactions. Second, the specificity of protein-protein interactions is achieved through noncovalent interactions between particular amino acids. Finally, the interactions mediated by many protein interaction domains depend on posttranslational modifications. These characteristics explain how proteins can interact with more than one partner at a time, using distinct protein interaction domains. They also explain how proteins distinguish between structurally related partners, either constitutively or according to the physiological conditions of the cell.

A large number of human diseases have been found to originate from aberrant protein interaction domains. For example, germline or somatic mutations that cause small deletions or single amino acid changes in protein interaction domains give rise to certain cancers. Similarly, oncogenic viruses express proteins that disrupt or prevent host protein-protein interactions. In prion-based diseases, an aberrant form of a protein interacts with its normal counterpart and abrogates its normal function. These facts have prompted the pharmaceutical industry to design therapeutic programs based on the manipulation of protein-protein interactions. However, the molecular surfaces of most protein interaction domains are very large relative to the size of most compounds that can be used as therapeutic agents. Hence, the manipulation of protein interaction domains remains a great challenge.


See also: Genetic Diseases; Proteins and Protein 
Structure; Spongiform Encephalopathies 
(Transmissible), Genetic Aspects of 
All cells are bounded by membranes consisting of lipid bilayers. Eukaryotic cells are further compartmentalized into membrane-bound organelles such as mitochondria and the endoplasmic reticulum. Therefore, both the movement of molecules into or out of cells and the concentration of molecules at discrete locations within cells depend on systems that are able to transport molecules through membranes. Moving molecules through lipid membranes poses several challenges. Transport systems, which can consist of a few to several proteins, must be highly selective: a given transport system must be able to single out its 'substrates' (i.e., the molecules it is designed to transport) from a complex mixture of molecules. Transport systems not only have to identify their substrates but also have to move those substrates through the membrane without compromising the membrane's barrier function. This is not trivial when one considers, for example, the size of transported proteins relative to the small ions that are usually maintained at different concentrations on either side of most membranes. Additionally, transport systems often have to move molecules against their concentration gradient and therefore require energy. Coupling energy consumption to membrane transport was surely a seminal event in the evolution of life, since the basic mechanism of this process is well conserved in all organisms. Although there are transport systems that are specific for either small or large molecules, the discussion below is restricted to systems that transfer proteins either partially or fully through membranes.


Signal Hypothesis: Directing Proteins to Membranes


Our current view of how protein transport systems work has been shaped by quite diverse experimental systems using both prokaryotes and eukaryotes. Since the majority of proteins expressed in a cell remain in the cytoplasm, one very important discovery, which provided much of the foundation of modern protein secretion studies, concerned how proteins destined to be exported from the cell (or targeted to membrane-bound organelles within the cell) are recognized by the cell's secretion system. What were the special features of secreted proteins that allowed them to be distinguished from proteins that remained in the cytoplasm?


Following its transcription and (in eukaryotes) processing, mRNA associates with ribosomes, where translation occurs. It has been known since the 1950s that in eukaryotes secreted proteins are synthesized on membrane-bound ribosomes and transported through the membrane during their synthesis (discussed in greater detail below). In contrast, ribosomes synthesizing proteins that are not transported through a membrane, and hence remain in the cytoplasm, are not attached to membranes. How, then, do certain ribosomes become associated with membranes while others do not? Were there specialized ribosomes for secreted and cytoplasmic proteins, or was there information in the mRNA or the newly synthesized protein that directed the ribosome to the membrane?

The answer turned out to be that the information directing the ribosome to the membrane resides in the protein being translated. This was shown in a series of experiments performed in the mid-1970s by Günter Blobel and Bernhard Dobberstein, who worked with cell lines that secreted large amounts of immunoglobulins. They found that in vitro (i.e., cell-free) translation of immunoglobulin-encoding mRNA by ribosomes in the absence of microsomes (small membrane-bound vesicles) yielded a protein with an extra 20 amino acids at its N-terminus. These residues were absent from immunoglobulin that was either translated in vitro by microsome-associated ribosomes or secreted from living cells. Additionally, immunoglobulin synthesized by microsome-associated ribosomes became physically enclosed within the microsomes, showing that translation was accompanied by translocation of the immunoglobulin through the microsome membrane.

To account for these findings, the signal hypothesis was proposed. It postulated that the N-terminal region (containing the 'signal sequence') of a newly synthesized protein directs the ribosome to transport complexes that mediate the transfer of the protein through the membrane. (It was fairly well established that proteins are transferred through membranes by membrane-associated protein complexes, which are now referred to as translocases.) The signal sequence is removed during the transfer process and therefore does not appear in the mature protein that is eventually released from the cell. It was subsequently found that N-terminal signal sequences mediate the transfer of the vast majority of proteins secreted by both prokaryotic and eukaryotic organisms, as well as of integral membrane proteins that are partially transferred through a membrane. In fact, signal sequence-like 'addresses' have since been found that direct proteins to particular locations within cells, such as mitochondria, lysosomes, and the nucleus. Therefore, a protein contains not only the amino acid residues necessary for its enzymatic and/or structural functions but can also carry its own address, which ensures that it reaches the location where it performs those functions.


Secretion in Prokaryotes 


Bacteria secrete proteins for a number of different reasons. Some bacteria release enzymes that degrade macromolecules too large to be imported directly into the cell, while others export toxins that are active against either eukaryotic cells or other bacteria. Several bacteria construct large extracellular flagella, which are used for motility and/or attachment to eukaryotic cells. These are just some of the activities bacteria perform that depend on transporting proteins through membranes. A typical Gram-negative bacterial cell such as Escherichia coli consists of an outer cell membrane, a plasma membrane (inner membrane), ribosomes, and a nucleoid region containing the genetic material (Figure 1A). Unlike eukaryotic cells, which are discussed later, bacterial cells do not contain internal membrane-bound organelles. Since all bacterial proteins are synthesized on ribosomes located in the cytoplasm, proteins that are exported from the cell or targeted to the periplasm (the space between the inner and outer membranes) or to the outer membrane must be translocated through the inner membrane. Following their translocation through the inner membrane, proteins must be sorted and, in some cases, assembled into large multisubunit structures such as flagella. Therefore, protein secretion in bacteria can be divided into three stages: targeting to the inner membrane, translocation through the inner membrane, and extracytoplasmic sorting. Here we will restrict the discussion to protein secretion in E. coli and closely related Gram-negative bacteria, since much of the pioneering experimental work used these species as model organisms.


Bacterial General Secretory Pathway 

In the early 1980s several genetic approaches were developed to identify E. coli factors involved in protein export. One particularly successful approach involved the construction of a reporter gene that encoded the N-terminal region of the maltose-binding protein (MBP) fused to LacZ (β-galactosidase). MBP is normally exported to the periplasm, and it was already known that MBP's N-terminal region (the bacterial equivalent of the eukaryotic signal sequence discussed above) is necessary for its export from the cytoplasm. In contrast, LacZ is normally a cytoplasmic protein but can be targeted to the inner membrane when it is linked to the MBP N-terminal region (i.e., MBP's export signal). E. coli containing the MBP-LacZ hybrid protein possess less β-galactosidase activity than E. coli containing the 'normal,' cytoplasmically located LacZ protein, presumably because LacZ is less active when it is targeted to the export machinery. β-Galactosidase enzymatic activity (an indirect measure of how much LacZ protein is in the cytoplasm) can be assessed by growing the bacteria on media containing a LacZ substrate, such as X-gal, that is converted into a blue-colored product. Thus the relative activity of the export system can be monitored by easily detectable blue-white screening: mutations that impair export leave more of the hybrid protein in the cytoplasm and therefore raise β-galactosidase activity. Using this and related genetic screens, a number of sec (secretion) genes were identified.

Figure 1 Schematic representation of protein secretion in prokaryotic and eukaryotic cells. (A) A Gram-negative bacterium possesses a cytosol (containing the nucleoid region, Nc), which is enclosed by an inner membrane (IM), a periplasm (PM), and an outer membrane (OM). Following their synthesis in the cytosol, proteins destined to be exported are targeted (1) to the secretion machinery located at the inner membrane (represented by a black rectangle) by virtue of a signal sequence located at their N-terminus (thick black line). During passage through the inner membrane (2) the signal sequence is cleaved off, and the translocated protein can either remain associated with the inner membrane (not shown), remain in the periplasm (3a), be transported to and become associated with the outer membrane (3b), or be released from the surface of the cell (3c). (B) In eukaryotic cells, synthesis and membrane translocation occur simultaneously (1) on ribosomes that are associated with the endoplasmic reticulum (ER). Within the ER, proteins are posttranslationally modified and fold into their three-dimensional structure. Proteins exit the ER within membrane-bound vesicles (2) and are transported to the Golgi apparatus, where they are further modified (3). Proteins exit the Golgi still within vesicles, which can fuse either with other intracellular compartments such as lysosomes (4a) or with the plasma membrane, which can result in the protein either becoming associated with the membrane (4b) or being released from the cell (4c).

One of the first genes to be identified was secB, which encodes a cytoplasmic protein that was later shown to function as a 'chaperone' for proteins destined to be exported. As it emerges from the ribosome, or shortly afterward, a newly synthesized protein, if left to itself, will fold into a three-dimensional structure through intramolecular and/or intermolecular interactions. Since most export systems can only transport unfolded (or extended) proteins through membranes, it is vital that a protein be prevented from assuming a folded conformation before it is exported. The term 'chaperone' was coined to refer to auxiliary proteins that prevent such premature interactions within an unfolded protein (later we will see that chaperone-like proteins can also promote the proper folding of a newly synthesized protein). Thus, in E. coli mutants lacking secB, newly synthesized MBP-LacZ quickly assumes a folded structure that cannot be translocated through the membrane. The SecB protein has been shown to associate with large regions (150-200 residues) of proteins as they are synthesized on the ribosome.

In addition to being kept in an unfolded conformation, export precursors must be brought (or targeted) to the membrane-located secretion apparatus. Besides its chaperone function, SecB may also serve as a targeting factor, since it has been shown to interact with one of the membrane-associated proteins of the export machinery (SecA, discussed below). E. coli also possesses a second membrane-targeting system which, interestingly, is very similar to the membrane-targeting system found in eukaryotes (discussed later). This system uses a ribonucleoprotein particle composed of the Ffh protein and a 4.5S RNA molecule (encoded by the ffs gene) that has been found to associate with signal sequences. The Ffh/4.5S RNA complex targets its associated precursor to the membrane by virtue of its interaction with the membrane-associated FtsY protein. When any one of the genes encoding these components (ffh, ffs, or ftsY) is deleted, certain precursor proteins accumulate in the cytoplasm, indicating that although these proteins are being synthesized, they are not being delivered to the export apparatus. Similarly, E. coli strains lacking secB accumulate different precursors, indicating that the Ffh/4.5S RNA/FtsY and SecB systems target different subsets of precursor proteins to the membrane.

Regardless of how they are delivered to the membrane, all proteins exported by the general secretory pathway depend on the SecA protein for translocation through the membrane. The secA gene was independently identified by two research groups using different genetic screening approaches; one group named the gene secA, while the other designated it prlD (protein localization). SecA appears to be involved in several different aspects of protein export, from targeting precursors to the membrane to acting as the 'engine' that drives protein translocation. Accordingly, SecA is found both in the cytoplasm and associated with the plasma membrane, as a peripheral inner-membrane protein and as an integral membrane protein. As a cytoplasmic protein, SecA has targeting activities, sometimes in conjunction with SecB. The precise nature of the SecA targeting activity and how it works together with SecB are unclear, but one possibility is that SecA serves as a bridge between export precursors and the inner face of the plasma membrane. SecA associates with the membrane either nonspecifically, by interacting with the phospholipids of the inner leaflet of the plasma membrane, or by interacting with the SecE/SecG/SecY (SecEGY) complex, which spans the plasma membrane and is where the actual protein translocation process occurs.

Results from a number of biochemical studies have given us at least a preliminary idea of how the SecA/SecEGY translocon orchestrates the passage of a protein through a membrane. As discussed above, the first step involves the association of SecA with the precursor; this can occur either in the cytoplasm or at the membrane and may involve SecB or the Ffh/4.5S RNA complex. At the inner leaflet of the plasma membrane, the SecA/precursor complex binds to the cytoplasmic face of the SecEGY complex, and soon thereafter the precursor's N-terminal signal sequence is inserted into the membrane. Following ATP binding, SecA undergoes a dramatic conformational change in which it inserts into the membrane together with its associated precursor protein. On insertion of the SecA/precursor complex into the membrane, the signal sequence is cleaved from the precursor. This insertion is thought to be mediated by the membrane-spanning SecY protein (in contrast, the roles played by SecE and SecG in the translocation process are not known). Upon hydrolysis of its bound ATP, SecA releases at least a portion of the precursor protein and deinserts from the membrane, whereupon it rebinds the precursor at a more 'downstream' location (i.e., closer to the C-terminus) at the inner leaflet of the plasma membrane, where the whole process can be repeated. The cycle of ATP-driven SecA membrane insertion and deinsertion is thought to underlie the experimentally observable stepwise translocation of a protein through a membrane. Hence SecA is thought to push (and/or pull) a precursor through a membrane by virtue of its ability to undergo remarkable conformational changes. In addition to the energy provided by the ATP hydrolysis that drives the SecA insertion/deinsertion cycle, a protonmotive force (PMF), generated by a higher concentration of protons on the outside of the membrane than on the inside, also stimulates further translocation of the SecA-bound precursor. It is not currently known exactly how the PMF is coupled to SecA-mediated protein translocation.


Assembly of a Complex Organelle in the Periplasm and Outer Membrane

On emerging from the SecA/SecEGY translocon, a secreted protein can either remain in the periplasm or be targeted to the outer membrane. The periplasmic space is thought to consist of a gel-like medium that in several respects is very different from the cytoplasm. For example, since the outer membrane is relatively porous, the periplasm is essentially in direct contact with the extracellular medium and thus must be able to withstand greater fluctuations in pH and electrolyte concentrations than the cytoplasm. It is widely believed that outer membrane proteins (OMPs) must pass through the periplasm and be actively targeted to the outer membrane, whereas periplasmic localization is thought to occur by default. The evidence supporting this view comes from studies showing that a normally cytoplasmic protein can be localized to the periplasm simply by adding a signal sequence, and from experiments in which deleting various portions of OMPs caused them to be retained in the periplasm. A vital function of the periplasm is to provide the proper environment for protein folding. At least two different enzymatic activities of periplasmic resident proteins play a role in the proper folding of proteins as they emerge from the SecA/SecEGY translocon.
Many proteins possess disulfide bonds, which play key roles in their proper folding. The DsbA protein (a disulfide-bond catalyst) resides in the periplasm, where it introduces disulfide bonds into other periplasmic resident proteins or into proteins on their way to the outer membrane. Another important activity in the periplasm is performed by the peptidyl-prolyl cis/trans isomerases (PPIases), enzymes that catalyze the interconversion of the cis and trans forms of the peptide bond between X-prolyl residues (X being any amino acid). In biochemical experiments PPIases promote the proper folding of certain proteins, and in E. coli mutants lacking surA, which encodes a periplasmic PPIase, unfolded or misfolded OMPs accumulate in the periplasm. Thus the periplasm can be thought of as a protein assembly area where linear polypeptides (the form in which they are exported from the cytoplasm) are assisted in assuming their three-dimensional structure.

An experimental model that has proved useful in delineating the events that follow inner membrane transport is the assembly of pili on the surface of Gram-negative bacteria. Pili (singular: pilus) are thin, hair-like structures that project from the bacterial cell surface and play an essential role in the attachment of the bacterium to a eukaryotic cell. The presence of pili on many types of bacteria is highly correlated with pathogenicity, underscoring the fact that bacterial attachment is one of the key initial events during a bacterial infection. E. coli can express several types of pili, although any one strain usually expresses only one type at any given time. P pili (whose assembly is detailed below) consist of a thin tip fibrillum connected to a relatively longer and thicker rod, and have been shown to bind preferentially to glycolipids found on the surface of certain cells in the kidney. Bundle-forming pili (sometimes referred to as type 4 pili) form at discrete locations on the bacterial cell surface and are involved in motility and cell aggregation as well as adhesion. Curli are another type of pilus that, as their name implies, form tangled masses on the bacterial surface and have been shown to be important for attachment to eukaryotic cells.

The biogenesis of pili involves several different proteins that must be targeted to the correct locations and assembled in a highly coordinated fashion; many of them must also be kept in an unfolded state, a function performed by chaperones. The process starts with the translocation of the pilus subunits (PapA, E, F, G, H, and K) through the inner membrane by the Sec general secretory machinery. As the pilus subunits emerge from the SecA/SecEGY translocon, they individually bind to the periplasmic chaperone PapD. PapD binds to the same domains of the individual PapA-K subunits that are later used when these subunits oligomerize into a pilus. PapD therefore prevents the premature association of Pap subunits by shielding their intermolecular interaction domains. The PapD-subunit complexes traverse the periplasm and are targeted to an outer membrane protein, PapC, at which pilus assembly takes place. By some unknown mechanism, the interaction with PapC causes the PapD chaperone to dissociate from its associated subunit, thereby unshielding the subunit's interaction domain. The unshielded domain can now interact with the corresponding interaction domain of another subunit that has already been incorporated into the growing pilus. This subunit-subunit interaction is thought to occur at the periplasmic face of PapC. PapC forms rings in the outer membrane with a central channel that is thought to serve as the conduit for the growing pilus rod to the exterior of the cell. After its formation in the periplasmic space and translocation through the PapC channel in the outer membrane, the pilus rod is converted from a linear conformation (which can pass through the PapC channel) to its mature helical conformation; the mature pilus consists of PapA, which forms the 'stalk,' with the PapG, E, F, and K proteins at the tip. Although the broad outline of pilus biogenesis is known, it is less clear how the various pilus subcomponents are regulated and coordinated during the assembly process.
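The logic of chaperone shielding, in which subunits can associate only after PapC has released PapD from them, can be expressed as a small state model. In this Python sketch the names `Subunit`, `PapCUsher`, and `can_interact` are hypothetical; the model records only whether a subunit's interaction domain is capped by PapD and ignores folding, the linear-to-helical transition, and the real order in which subunits are incorporated.

```python
from dataclasses import dataclass, field


@dataclass
class Subunit:
    name: str
    # PapD binds each subunit as it emerges from the Sec translocon,
    # capping the domain later used for subunit-subunit contacts.
    capped_by_papd: bool = True


@dataclass
class PapCUsher:
    pilus: list = field(default_factory=list)  # subunit names, in order incorporated

    def incorporate(self, s: Subunit) -> None:
        # Contact with PapC releases PapD (by an unknown mechanism),
        # exposing the interaction domain so the subunit can join the pilus.
        s.capped_by_papd = False
        self.pilus.append(s.name)


def can_interact(a: Subunit, b: Subunit) -> bool:
    """Two subunits can associate only if neither is still capped by PapD."""
    return not a.capped_by_papd and not b.capped_by_papd
```

In the periplasm, `can_interact(Subunit("PapG"), Subunit("PapA"))` is `False`; once both subunits have passed through `PapCUsher.incorporate`, it becomes `True`. The order of incorporation used in such a call is illustrative only.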


Type III Secretion Systems

Several species of Gram-negative bacteria that live for at least part of their life cycle in close association with eukaryotic cells have a specialized secretion system, termed type III, that injects proteins from the bacterial cell directly into the eukaryotic cell cytoplasm. Bacteria shown so far to possess type III secretion systems include Chlamydia, Bordetella, and Pseudomonas, as well as several pathogenic species of the Enterobacteriaceae (Escherichia, Salmonella, Shigella, and Yersinia). Additionally, several species of plant-interacting bacteria, including Erwinia, Pseudomonas, Xanthomonas, and Rhizobium, have type III secretion systems that play important roles in the host-microbe interaction. The proteins injected by type III secretion systems (usually referred to as 'effector' proteins) often have eukaryotic-like enzymatic activities and/or resemble eukaryotic signaling proteins. This suggests that type III effector proteins serve to redirect host cellular responses according to the 'wishes' of the bacterium (in some cases, discussed below, this has indeed been shown to be the case). Although the vast majority of bacterial species described so far as possessing type III secretion systems are either animal or plant pathogens (probably owing to sampling bias), it is likely that other species that live in a commensal or symbiotic relationship with eukaryotic cells will be shown to have similar 'protein injection systems.'

Type III systems are composed of over 20 proteins, making them the most complex bacterial secretion systems known. They are thought to have evolved from a closely related secretion system that exports flagellar subunits (flagella are large whip-like extracellular organelles used for motility). The type III protein injection apparatus has been visualized using the electron microscope and is composed of a needle-like complex that projects away from the bacterial cell surface and is attached to a cylindrical base spanning both the inner and outer bacterial membranes. It is thought, but not yet proven, that effector proteins are injected into the eukaryotic cytosol by passing through the hollow 120-nm-long needle complex. Thus, in contrast to the Sec-mediated secretion pathways described above, in which proteins are thought to be exported across the inner and outer membranes in separate steps, type III secreted proteins are translocated through both bacterial membranes (and probably through the eukaryotic membrane as well) in a single step, without a periplasmic intermediate. One feature shared by proteins exported by the type III and Sec secretion systems is that their export signals are located at the N-terminus. Surprisingly, the export signal of type III secreted proteins appears to consist of two parts: sequences at the extreme N-terminus (the first 10-20 residues) are required for translocation through the bacterial membranes but are not sufficient for crossing the eukaryotic membrane, whereas additional sequences downstream of the N-terminus (residues 20-50) are necessary for the exported protein to be injected into eukaryotic cells. This suggests that different parts of the protein are recognized by the secretion machinery during its transport from the bacterial cell into the eukaryotic cell.
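The two-part export signal can be summarized as a decision rule over which N-terminal residues remain intact in a mutant effector. The Python sketch below is a toy model: the function name is hypothetical, the region boundaries (residues 1-20 and 21-50) approximate the ranges given above, and it assumes, which the text does not claim, that these two regions are the only determinants of secretion and injection.

```python
# Approximate boundaries taken from the text (1-based, inclusive); the exact
# extent of each region varies between effectors.
SECRETION_SIGNAL = range(1, 21)   # extreme N-terminus: crossing the bacterial membranes
INJECTION_SIGNAL = range(21, 51)  # downstream region: entry into the host cell


def predicted_fate(deleted: range) -> str:
    """Toy prediction for an effector lacking the residues in `deleted`."""

    def intact(region: range) -> bool:
        # A region counts as intact only if none of its residues was deleted.
        return not any(residue in deleted for residue in region)

    if not intact(SECRETION_SIGNAL):
        return "not secreted"
    if not intact(INJECTION_SIGNAL):
        return "secreted but not injected"
    return "secreted and injected"
```

For example, deleting residues 25-45 (`predicted_fate(range(25, 46))`) gives "secreted but not injected", while an empty deletion (`range(0)`) gives "secreted and injected".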

The genes encoding the proteins that make up the type III secretion apparatus are found in large clusters, sometimes referred to as pathogenicity islands, i.e., segments of chromosomal DNA that are absent in closely related but nonpathogenic species. Although there are some differences between species, both the genes and their order are for the most part highly conserved in these various type III-encoding pathogenicity islands. Additionally, the G + C content of the type III-encoding genes often differs substantially from that of the surrounding genome, and these genes are either located on extrachromosomal plasmids or flanked by insertion sequences, phage genes, or transposable elements. Together these observations suggest that type III secretion systems were disseminated en bloc throughout the Gram-negative bacteria by horizontal (or 'lateral') gene transfer. In contrast to the evolutionarily conserved type III secretion apparatus, the genes encoding the type III effector proteins are for the most part unique to each species and are often genetically unlinked to the gene clusters encoding the secretion apparatus. This probably reflects the fact that each bacterial species faces a unique set of challenges in its dealings with eukaryotic cells. These could involve the type of organism the bacterium interacts with (e.g., animal or plant), the nature of the interaction (e.g., symbiotic or pathogenic), the cell types it is likely to encounter (e.g., macrophages or epithelial cells), and how the bacterium needs to influence a particular eukaryotic cell response (e.g., inhibit or hyperactivate it).
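The G + C comparison used as evidence for horizontal transfer is a routine calculation: the fraction of G and C bases in a candidate gene cluster is compared with that of the rest of the genome. The Python sketch below assumes plain nucleotide strings and uses a hypothetical 5-percentage-point cutoff (`threshold`) purely for illustration; an atypical G + C content is only one of several lines of evidence for horizontal transfer.

```python
def gc_content(seq: str) -> float:
    """Fraction of G + C among unambiguous bases (A, C, G, T); other symbols are ignored."""
    seq = seq.upper()
    counted = sum(seq.count(base) for base in "ACGT")
    if counted == 0:
        raise ValueError("sequence contains no A, C, G, or T")
    return (seq.count("G") + seq.count("C")) / counted


def gc_deviation(cluster: str, rest_of_genome: str) -> float:
    """Cluster G + C fraction minus that of the rest of the genome.

    The genome-wide value should be computed without the cluster itself,
    so that the cluster does not dilute the comparison.
    """
    return gc_content(cluster) - gc_content(rest_of_genome)


def atypical_gc(cluster: str, rest_of_genome: str, threshold: float = 0.05) -> bool:
    # `threshold` is a hypothetical cutoff chosen for illustration only.
    return abs(gc_deviation(cluster, rest_of_genome)) >= threshold
```

For instance, `gc_deviation("ATATATGC", "GCGCGCAT")` is -0.5, so `atypical_gc` flags that toy cluster, whereas two sequences with equal G + C fractions are not flagged.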

Considerable progress has been made in determining the activities and molecular mechanisms of a number of type III effector proteins. In most cases these proteins have been shown to play a clear role in determining the outcome of the host-microbe interaction. Examples from Salmonella and Yersinia nicely illustrate how bacterial animal pathogens can use type III systems to modulate host cell responses in completely different ways. When a bacterium attaches to their surface, most animal cells initiate a phagocytic response that involves uptake of the bacterium into a membrane-bound vesicle. These bacteria-containing vesicles then fuse with other vesicles that contain degradative enzymes and other compounds with bactericidal activity. These phagocytic and vesicle-trafficking activities depend on the cellular cytoskeleton, which consists of polymeric protein filaments. Following its attachment to the surface of a eukaryotic cell, Salmonella is able both to hyperactivate the initial phagocytic response, thereby becoming enclosed within membrane-bound vesicles (within which it is able to proliferate), and later to inhibit the fusion of these Salmonella-containing vesicles with the vesicles containing bactericidal compounds.

How does Salmonella modulate the host cell response to its advantage? It turns out that Salmonella uses the type III secretion system to inject at least four proteins into eukaryotic cells that either directly or indirectly modulate the host cytoskeleton. One of these injected proteins, SopE, activates host regulatory proteins of the Rho family (small GTPases related to the oncogenic Ras protein), which in turn activate a number of host proteins that play important roles in the reorganization of the actin cytoskeleton. Similarly, another injected protein, SopB, possesses inositol phosphate phosphatase activity, which in some unknown way


1558 Protein Secretion Systems 


affects several host signaling pathways that also regu- 
late the organization of the cytoskeleton. A third 
injected protein, SipA, has been shown to directly 
bind actin and affect its polymerization dynamics. 
And finally, a fourth injected protein, SptP, appears 
to have the remarkable property of assisting the host 
cell in regaining its original shape following bacterial 
entry. It performs this healing process by deactivating
the same Rho-family regulatory proteins that SopE had
earlier activated, thereby returning the activities of
these host proteins to their basal levels. Thus Salmonella
utilizes the type III secretion system to inject
proteins into host cells and subverts a normal cellular 
defense response for its own ends. 

In contrast to Salmonella, the pathogenic species of
Yersinia (which include Y. pestis, the causative agent
of bubonic plague) utilize the type III secretion system
to inject proteins that inhibit the host cell phagocytic
response. Following its penetration into deep
tissue, Yersinia forms microcolonies on host cell sur- 
faces. This property is dependent on the injection by 
Yersinia’s type III secretion system of the YopE and 
YopH proteins into the host cell cytoplasm. YopE’s 
activity is directed at the same family of host regula- 
tory proteins that are targeted by SopE and SptP. 
YopE’s activity is similar to that of SptP in that it 
deactivates the Rho-family proteins that are normally
activated during phagocytosis (or, during an encounter
with Salmonella, are superactivated due to SopE’s 
activity). But unlike SptP, the activity of YopE results 
in the almost complete depolymerization of actin, the 
consequence being that cellular processes that are 
dependent on the actin-based cytoskeleton, includ- 
ing phagocytosis, grind to a halt. Yersinia’s second 
injected anti-phagocytic effector protein, YopH, is a 
tyrosine phosphatase that disrupts focal adhesions, 
sites where actin fibers contact the cell membrane. 
YopH accomplishes this by dephosphorylating (and 
thereby inactivating) a host kinase whose activity is 
necessary for the formation of focal adhesion
complexes. Through the activity of these two proteins,
Yersinia is able to ‘paralyze’ eukaryotic cells by
attacking their cytoskeletal system, thereby evading
the cellular uptake and killing systems.
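
The opposing effects of SopE, SptP, and YopE on the same host regulatory proteins can be pictured as a single molecular switch that the bacteria flip on or off. The Python sketch below is a toy model, not a kinetic simulation. It assumes, in line with how these effectors are commonly described, that SopE acts as a guanine nucleotide exchange factor (GEF) that loads GTP, and that SptP and YopE act as GTPase-activating proteins (GAPs) that promote GTP hydrolysis. The class and function names are hypothetical.

```python
from enum import Enum


class Bound(Enum):
    """Nucleotide occupying the GTPase's binding site."""
    GDP = "GDP"  # inactive conformation
    GTP = "GTP"  # active conformation


class RhoGTPase:
    """Toy Rho-family GTPase modelled as a two-state switch."""

    def __init__(self) -> None:
        self.bound = Bound.GDP  # assumption: resting state is GDP-bound

    @property
    def active(self) -> bool:
        return self.bound is Bound.GTP


def gef_like(gtpase: RhoGTPase) -> None:
    """SopE-like action: exchange GDP for GTP, switching the GTPase on."""
    gtpase.bound = Bound.GTP


def gap_like(gtpase: RhoGTPase) -> None:
    """SptP/YopE-like action: stimulate GTP hydrolysis, switching it off."""
    gtpase.bound = Bound.GDP


# Hypothetical mapping of injected effectors to their switch action.
EFFECTORS = {"SopE": gef_like, "SptP": gap_like, "YopE": gap_like}


def run_infection(injected: list[str], host_signal: bool = False) -> bool:
    """Return True if the GTPase ends up active.

    host_signal models the host's own activation of Rho proteins during
    phagocytosis; injected effectors are then applied in order.
    """
    rho = RhoGTPase()
    if host_signal:
        gef_like(rho)  # the host's own activation step (treated as GEF-like)
    for name in injected:
        EFFECTORS[name](rho)
    return rho.active


print(run_infection(["SopE"]))                    # True: SopE switches Rho on
print(run_infection(["SopE", "SptP"]))            # False: SptP restores basal state
print(run_infection(["YopE"], host_signal=True))  # False: YopE overrides the host
```

The model deliberately reduces each effector to one switch action. SopB and SipA, which act through other routes, and the tyrosine phosphatase YopH are left out.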

The recent discovery of a functioning type III 
secretion system in a plant symbiont indicates that 
bacteria can utilize protein injection systems in non- 
pathogenic interactions with eukaryotic cells. The 
nitrogen-fixing Rhizobium are soil-borne bacteria 
that are able to establish themselves within the root 
hairs of legumes in a symbiotic relationship in which 
the bacterium, in return for being given nutrients and a 
protected environment (in root structures called 
nodules), provides the plant with reduced forms of 
nitrogen. The nodulation process depends on a
number of different secreted bacterial factors that are
probably necessary for Rhizobium to identify itself
to the plant and to distinguish itself from other bacterial
species. No activities or functions have been assigned
to the few Rhizobium type III effector proteins so far 
identified, although in Rhizobium mutants lacking a 
functional type III secretion system various pheno- 
types are observed depending on the host plant species 
or cultivar. Since the type III secretion machinery is 
assembled relatively late during nodulation after the 
bacterium has gained entry into plant root cells, it is 
thought that the type III effector proteins help dampen
plant defense reactions, much as type III effector
proteins do in some interactions between bacteria and
animal cells (see the discussion on Yersinia above).
The discovery of Rhizobium’s type III secretion sys- 
tem has greatly increased the possible range of activ- 
ities secreted bacterial proteins may possess. 


Secretion in Eukaryotic Cells 


Shortly following the development of cytochemical 
staining techniques in the latter half of the nineteenth 
century it became clear that eukaryotic cells possess a 
complex internal structure characterized by distinct 
membrane-bound organelles. These organelles include
the nucleus, mitochondria, chloroplasts (in plant
cells), peroxisomes, lysosomes, and, most important
for protein secretion, the endoplasmic reticulum
and Golgi apparatus (Figure 1B). Having several distinct
membrane-bound compartments means that the
sorting and targeting of proteins to their proper desti- 
nations is a considerable task. As discussed above, 
protein-expressing ribosomes are found either free in 
the cytosol or bound to a membrane-bound organelle 
called the endoplasmic reticulum (ER). Following 
their synthesis on free ribosomes, proteins can either 
be retained in the cytosol or targeted to the nucleus, 
mitochondria, chloroplasts, or peroxisomes. Asso- 
ciation of the ribosome with the ER membrane is 
dependent on whether the protein the ribosome is 
expressing contains a signal sequence at its beginning 
(see the discussion on signal sequences above). Pro- 
teins synthesized by ER-associated ribosomes are 
translocated through the ER membrane while their 
translation is in progress. From there proteins may 
either be retained within the ER or transported to the 
Golgi apparatus and eventually transported to either 
lysosomes, the plasma membrane, or secretory 
vesicles. A discussion of both the discovery and our
current knowledge of the ER → Golgi → secretory vesicles →
cell exterior pathway follows below.

Although early biologists could describe the cell’s
internal structure in great detail using light microscopy,
they had very little idea of the physiological
functions of the various subcellular organelles. It
was not until the latter half of the twentieth century
that technical advances made it possible to assign
functions to distinct organelles. The
discovery of the role played by the ER and Golgi ap- 
paratus in protein secretion was especially momentous 
since the function of these organelles was completely 
unknown for several decades. The ER is composed
of a network of membrane-bound tubules and sacs
(called cisternae) that projects out from the nuclear 
membrane and extends throughout the cytoplasm. 
The membranes of the ER account for about half of 
the cell’s total membrane and the space within the ER’s 
membrane occupies approximately 10% of the total 
cell volume. The early microscopists could discern 
two distinct types of ER: one with a smooth surface 
(smooth ER) and one that possesses a membrane 
studded with minute particles (rough ER). It is
now known that these two types of ER perform dif- 
ferent functions within the cell: The smooth ER is 
involved in lipid metabolism while the rough ER, 
whose rough appearance is due to its membrane 
being decorated with ribosomes, is the location 
where the protein secretory pathway begins as well 
as being the location where secretory proteins assume 
their proper three-dimensional structure. The Golgi 
apparatus (named after the Italian histologist Camillo 
Golgi) is composed of a series of flattened membrane- 
bound cisternae and associated vesicles. The cisternae 
display a polarity, both in structure and function, in 
relation to the ER and the plasma membrane. The 
cisternae closest to the ER are designated as cis and, 
as will be discussed later, serve as the entry point for 
secretory and other proteins into the Golgi arriving 
from the ER. These proteins are then transported 
through the Golgi and exit its trans face, which is 
usually orientated toward the plasma membrane. 

How was the role played by the rough ER and the
Golgi apparatus in protein expression and secretion 
shown? In the 1960s specialized cells of the pancreas, 
which secrete large amounts of digestive enzymes into 
the small intestine, were used together with the tech- 
nique of electron microscopic autoradiography in 
order to determine the pathway secretory proteins 
take on their journey to the exterior of the cell. 
These landmark experiments involved pulsing thin 
slices of living pancreatic cells with a radioactive 
amino acid (in this case ³H-labeled leucine) followed
by various ‘chase’ periods in which the ³H-leucine
was removed and replaced with nonradioactive leucine.
Following the chase period, the cells were treated
with chemical fixatives to preserve their internal 
structure and were then overlaid with a photographic 
emulsion. Radioactive emissions from the cellular
sample expose the overlaid emulsion, which can then
be visualized by electron microscopy, making it
possible to align the areas of radioactivity with
subcellular structures. Such ‘pulse-chase’ experiments can
follow the fate of the ³H-leucine after its
incorporation into newly synthesized protein. It was
found that after a 3-min pulse, radioactivity was
confined to the rough ER. If this pulse was followed
by a 7-min chase, most of the radioactivity
was localized to the Golgi apparatus. With longer
chase periods (1–2 h) the radioactivity appeared
within secretory vesicles that later fused with the 
plasma membrane and whose contents were released 
to the cell exterior. These results, along with findings 
using other experimental techniques, gave very strong 
support to the idea that secretory proteins were 
synthesized on ER-associated ribosomes and travelled 
through the ER and the Golgi apparatus prior to their 
loading into secretory vesicles and eventual release 
from the cell. Later techniques (discussed below) 
were used to determine exactly what was occurring 
to these proteins as they made the journey from the 
ER to the exterior of the cell. 

Following the identification of the organelles 
involved in the protein secretory pathway, great effort 
was made in trying to understand the mechanisms 
underlying the secretory process and identifying the 
factors that comprise the secretory machinery. One 
general approach involved reconstituting secretion-competent
complexes from cellular components, usually obtained
by fractionating whole cells. By ‘adding back’ various
components to a cell-free secretion system, the roles played
by individual proteins can begin to be understood. 
Another approach that involves the genetic analysis 
of the yeast Saccharomyces cerevisiae has been extre- 
mely useful in identifying important secretory factors. 
(Yeast are eukaryotic cells with the same basic internal 
structure as animal and plant cells.) These studies took 
advantage of the fact that secretory yeast mutants that
fail to export proteins can still be induced to continue
protein synthesis. Under these conditions secretory
mutants (which can be generated by exposing the
yeast cells to either a chemical mutagen or UV light)
enlarge and become denser and can be separated from
wild-type cells by density gradient centrifugation. In
one study using this approach to generate yeast secretory
mutants, 23 complementation groups were identified,
revealing for the first time that protein secretion
required the activities of several gene products. The
later identification of the yeast genes giving rise to 
these secretory mutants provided huge advances in 
understanding the mechanisms involved in secretion 
in yeast cells as well as in plant and animal cells, since 
clearly homologous genes (in an evolutionary sense) 
were found in ‘higher’ eukaryotes. 
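
Assigning mutants to complementation groups is, computationally, a matter of finding connected components: two recessive mutants that fail to complement each other in a diploid are placed in the same gene. The Python sketch below does this with a union-find structure. The isolate names and crosses are hypothetical, and treating non-complementation as transitive is a simplifying assumption that ignores complications such as intragenic complementation.

```python
def complementation_groups(mutants, non_complementing):
    """Group mutants into complementation groups with union-find.

    mutants: iterable of mutant names.
    non_complementing: pairs (a, b) whose diploid still shows the mutant
    (secretion-defective) phenotype, so a and b are assumed to carry
    mutations in the same gene.
    """
    parent = {m: m for m in mutants}

    def find(m):
        # Follow parent links to the root, halving the path as we go.
        while parent[m] != m:
            parent[m] = parent[parent[m]]
            m = parent[m]
        return m

    for a, b in non_complementing:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a  # merge the two groups

    groups = {}
    for m in parent:
        groups.setdefault(find(m), []).append(m)
    return sorted(sorted(g) for g in groups.values())


# Hypothetical crosses among five secretion-defective isolates.
isolates = ["sec-a", "sec-b", "sec-c", "sec-d", "sec-e"]
failed_to_complement = [("sec-a", "sec-b"), ("sec-b", "sec-c"), ("sec-d", "sec-e")]
print(complementation_groups(isolates, failed_to_complement))
# [['sec-a', 'sec-b', 'sec-c'], ['sec-d', 'sec-e']]
```

Each resulting group corresponds to one candidate gene, so a screen that yields 23 groups points to at least 23 genes.
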

Endoplasmic Reticulum

In eukaryotic cells, translation of an mRNA encoding
a protein destined for the ER temporarily ceases after
the first 20–30 codons have been translated. The
N-terminal segments of ER-targeted
proteins contain a stretch of hydrophobic residues 
(designated the signal sequence) that, as it emerges 
from the ribosome, binds to a signal recognition par- 
ticle (SRP). SRPs are composed of six polypeptides 
and a small RNA molecule and their recognition and 
binding to signal sequences results in the first branch 
point in the secretory pathway: the partitioning of pro- 
teins expressed by cytoplasmic ribosomes from those 
proteins which will be expressed by ER-associated 
ribosomes. SRP binding to the signal sequence inhibits 
further translation and targets the ribosome/mRNA/ 
signal sequence-containing polypeptide complex to 
the rough ER through its binding to a specific receptor 
located on the ER membrane. On binding to its receptor,
the SRP is released from the ribosome/mRNA/
peptide complex and the signal sequence-containing 
polypeptide is inserted into the ER at the Sec61 pro- 
tein translocation complex. The Sec61 complex (so 
named from the original yeast mutants) consists of 
three membrane-spanning proteins that link the ribo- 
some to the ER membrane as well as serving as a 
protein-conducting channel through the ER mem- 
brane. The release of the SRP restarts the translation 
process, but now that the ribosome is engaged by the 
Sec61 complex the growing polypeptide chain is trans- 
ferred directly from the ribosome into the Sec61 mem- 
brane channel and eventually emerges within the 
lumen of the ER. 

It is likely that protein translocation through the 
Sec61 complex is mechanistically similar to the bacter- 
ial SecA/SecEGY-mediated export discussed above. 
There is a high degree of similarity at the sequence 
level between the proteins of the Sec61 complex and 
the bacterial Sec proteins suggesting that the prokar- 
yotic and eukaryotic protein translocation systems 
are evolutionarily related. An important difference 
though is that in eukaryotes translocation through 
the ER membrane occurs cotranslationally while in 
prokaryotes translocation through the SecA/SecEGY 
complex occurs posttranslationally (although eukaryotes
also possess posttranslational translocation systems,
such as those that transfer proteins across the
mitochondrial membranes). Both systems can only
translocate unfolded, extended polypeptide chains; in 
prokaryotes chaperones (discussed above) maintain 
the SecA/SecEGY-destined protein in an unfolded 
state while in eukaryotes protein folding prior to 
translocation is impossible since translation and 
membrane translocation occur simultaneously. At
the functional level, as in the SecA/SecEGY system,
the signal sequence is usually cleaved off shortly after
its insertion into the Sec61 complex and therefore
does not appear in the translocated protein
found within the ER.

Once the translocation process is initiated a protein 
can either be fully or partially transferred through the 
membrane. Proteins that are fully translocated into 
the ER lumen are those that are either destined to 
travel the entire secretory pathway on their journey 
to the exterior of the cell or that function within the 
lumen of the ER, Golgi, or the lysosome (discussed 
later). On the other hand, proteins destined to become 
incorporated within the plasma membrane or the 
membranes of the ER, Golgi, or lysosome are only 
partially transferred through the ER membrane and 
travel along the secretory pathway to their final destin- 
ation as membrane proteins instead of as soluble pro- 
teins. What determines whether a protein is fully or 
partially translocated? It turns out that proteins that 
are partially translocated possess a ‘stop-transfer’ 
sequence which, following its synthesis by the
ribosome, effectively blocks further translocation of the
polypeptide through the ER membrane by causing the 
ribosome to dissociate from the Sec61 complex (trans- 
lation of the remaining portion of the protein con- 
tinues in the cytosol). Stop-transfer sequences consist 
of 20-25 hydrophobic residues that form an alpha 
helix within the membrane. It is thought that they 
inhibit further translocation by becoming ‘stuck’ 
within the membrane due to their hydrophobicity. 
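
Because stop-transfer sequences are runs of about 20 hydrophobic residues, candidate segments can be located, at least crudely, by averaging residue hydrophobicity along the chain. The Python sketch below borrows a standard heuristic that is not described in the text: a sliding-window average of the Kyte-Doolittle hydropathy scale. The 19-residue window and 1.6 threshold are values often used with that scale for membrane-spanning segments and should be treated as adjustable assumptions. The function name and the toy sequence are hypothetical.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydrophobic_segments(seq, window=19, threshold=1.6):
    """Return (start, end) spans (0-based, end-exclusive) of candidate
    membrane-spanning stretches.

    Every window whose mean hydropathy exceeds `threshold` is flagged,
    and overlapping flagged windows are merged into one span.
    """
    seq = seq.upper()
    spans = []
    for i in range(len(seq) - window + 1):
        mean = sum(KYTE_DOOLITTLE[aa] for aa in seq[i:i + window]) / window
        if mean > threshold:
            start, end = i, i + window
            if spans and start <= spans[-1][1]:
                spans[-1] = (spans[-1][0], end)  # extend the current span
            else:
                spans.append((start, end))
    return spans


# Hypothetical toy sequence: two 20-residue leucine runs separated by lysines.
toy = "L" * 20 + "K" * 40 + "L" * 20
print(hydrophobic_segments(toy))  # [(0, 25), (55, 80)]
```

Window averaging blurs the boundaries, so each reported span extends a few residues past its leucine run. The scan flags candidates; it does not define exact membrane boundaries.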

A protein containing one stop-transfer sequence 
will be orientated with its N-terminus inserted into 
the lumen of the ER and its C-terminus in the cytosol.
If, for example, this is a protein destined to be a 
receptor at the plasma membrane, it will be positioned 
with its N-terminus on the exterior of the cell and its 
C-terminus as its intracellular domain. Because both 
soluble (i.e., lumenal) and membrane-associated pro- 
teins are transported by membrane-bound vesicles 
(discussed in detail below), the lumens of the ER and
Golgi, as well as the interior of secretory vesicles, are
topologically equivalent to the exterior of the cell.
Membrane proteins can also be in the ‘reverse’ orient- 
ation with their C-terminus within the ER lumen and 
their N-terminus located in the cytosol. This orient- 
ation is achieved by proteins possessing an internal 
signal sequence (as opposed to one located at the 
N-terminus), which although recognized by an SRP 
is nevertheless not cleaved following its insertion into 
the ER membrane. Depending on how the internal 
signal sequence is positioned it can alternatively direct 
the membrane protein to have its N-terminus within 
the ER lumen and its C-terminus in the cytosol in an 
identical orientation to that of proteins possessing 
a signal sequence at the N-terminus and an internal
stop-transfer sequence. Internal signal sequences act
as transmembrane alpha-helices that anchor a protein 
in the membrane. And finally, several proteins such as 
a number of cell surface receptors span the membrane 
several times. It is thought that these proteins align 
themselves in relation to the membrane by possessing 
a series of alternating internal signal and stop-transfer 
sequences. Such proteins contain multiple ‘loops’ on 
both sides of the membrane; these loop domains often 
play important roles in the function of these proteins. 
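
Because each membrane-spanning segment carries the chain from one side of the ER membrane to the other, the side on which every soluble region ends up follows from two facts: where the N-terminus is placed and how many membrane-spanning segments follow it. The Python sketch below encodes that alternation. It assumes every such segment fully crosses the membrane, and the function name is hypothetical.

```python
OPPOSITE = {"lumen": "cytosol", "cytosol": "lumen"}


def loop_sides(n_terminus_side, n_spans):
    """List the side of each soluble region, from N- to C-terminus.

    n_terminus_side: 'lumen' or 'cytosol'.
    n_spans: number of membrane-spanning segments (internal signal or
    stop-transfer sequences) in the mature protein. A protein with
    n_spans segments has n_spans + 1 soluble regions.
    """
    if n_terminus_side not in OPPOSITE:
        raise ValueError("side must be 'lumen' or 'cytosol'")
    sides = [n_terminus_side]
    for _ in range(n_spans):
        sides.append(OPPOSITE[sides[-1]])  # each span crosses the membrane
    return sides


# Cleaved N-terminal signal plus one stop-transfer: N-terminus lumenal.
print(loop_sides("lumen", 1))      # ['lumen', 'cytosol']
# Internal signal sequence in the 'reverse' orientation.
print(loop_sides("cytosol", 1))    # ['cytosol', 'lumen']
# A multispanning protein with seven membrane-spanning segments.
print(loop_sides("lumen", 7)[-1])  # cytosol
```

Because the ER lumen is topologically equivalent to the cell exterior, a region labelled ‘lumen’ here ends up facing outside the cell once the protein reaches the plasma membrane.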

What happens to a protein once it is fully or par- 
tially (if it is a membrane-bound protein) inside the 
lumen of the ER? One very important event is that 
newly translocated proteins, which arrive in the ER 
as extended polypeptide chains, assume their three- 
dimensional structure and in some cases, depending 
on the protein, are assembled into multisubunit com- 
plexes. This is far from a trivial task since a linear 
polypeptide can fold into a seemingly infinite number 
of different three-dimensional structures. (Being able 
to predict the three-dimensional structure of a protein 
from its primary sequence remains one of the most
formidable challenges in molecular biology.) There are a
number of proteins which reside within the ER that 
assist in the proper folding of newly synthesized pro- 
teins and, in fact, proteins that fail to fold properly are 
retained within the ER by a ‘quality control’ system 
which prevents misfolded proteins from continuing 
on to the Golgi apparatus. 

How does a protein become properly folded and 
how is it distinguished from its misfolded variants? As 
an unfolded protein enters the ER it immediately 
associates with several lumenal ER proteins that both 
assist the folding process and serve as retention signals 
for unfolded or misfolded proteins. The best charac- 
terized folding facilitator is a member of the heat 
shock family of chaperones called BiP (binding 
protein). BiP’s interaction with unfolded proteins is 
thought to reflect its affinity for surface-exposed
hydrophobic residues. Properly folded globular pro- 
teins usually have most of their hydrophobic residues 
buried within their interior where solvent water mol- 
ecules are excluded. Probably one of the functions of 
BiP, like that of chaperones in general, is to prevent 
surface-located hydrophobic residues of one protein
from interacting with similar surface-located hydrophobic
residues of another protein which, if allowed to hap- 
pen, would lead to the formation of insoluble protein 
aggregates. 

Complete and proper folding of some proteins also 
depends on disulfide bond formation catalyzed by the 
enzyme protein disulfide isomerase as well as enzymes 
that transfer and process oligosaccharides. An event 
that actually occurs while a protein is still being trans- 
lated and translocated into the ER lumen is that 
oligosaccharides consisting of 14 sugar residues are
covalently attached to specific asparagine residues 
within the protein. These oligosaccharide units are 
further modified both within the ER (where four 
sugar residues are trimmed off) and the Golgi appar- 
atus (discussed below). By some incompletely under- 
stood mechanism protein glycosylation events within 
the ER drive the folding process of many proteins and 
provide some type of signal for the quality control 
system when proper folding is completed. It is fairly 
well established that once proper folding and post- 
translational modifications are completed a protein 
dissociates from the ER folding and modifying factors 
which signals to the quality control system that a 
protein is ready to be exported from the ER. In con- 
trast, proteins that fail to fold properly remain asso- 
ciated with BiP and other chaperones and eventually 
are ‘retrotranslocated’ to the cytosol through the Sec61 
channel. Thus the Sec61 translocon can perform pro- 
tein translocation in either direction through the ER 
membrane. After being removed to the cytosol these 
misfolded proteins are deglycosylated (in the case of 
glycoproteins) and degraded. 


Vesicle Loading and Transport to the Golgi Apparatus

Although experiments utilizing electron microscopic 
autoradiography (discussed above) showed clearly 
that the Golgi apparatus was the next destination of 
secreted proteins after the ER, it was far from clear at 
the time how proteins were transported between these 
two organelles. It turned out that solving that problem 
was dependent on advances in understanding the na- 
ture and properties of biological membranes. In the 
early 1970s it became evident that biological mem- 
branes consisted of lipid bilayers which possessed a 
‘sidedness’ that is preserved when two membrane- 
bound vesicles fuse (or when a vesicle buds off from 
an organelle). A variety of experimental approaches, 
combining yeast genetic and cell-free biochemical
analyses, then firmly established that inter-organelle
transport involved the formation of a transport vesicle 
by a budding event at the ‘donor’ organelle (in this 
case the ER) followed by fusion of the transport ves- 
icle at the ‘acceptor’ organelle (the Golgi). When a 
transport vesicle buds off from the ER membrane the 
lumen (or contents) of the vesicle will be topologically
equivalent to the lumen of the ER. And similarly, 
when that same transport vesicle fuses with the mem- 
brane of the Golgi the vesicle’s lumenal contents will 
be released into the lumen of the Golgi. 

Vesicle-based transport systems such as the one
that links the ER and Golgi can be divided up into 
five steps: cargo loading, budding from the donor 
membrane, physical movement toward the Golgi, 
recognition of the acceptor (Golgi) membrane, and
fusion with and unloading into the Golgi. Cargo load- 
ing involves the concentration of secretory proteins 
(and ER lumenal proteins that, as we will see later, will 
be returned to the ER) at specialized locations on the 
ER membrane, so-called ‘budding sites.’ Although it 
is unclear how secretory proteins are concentrated at 
budding sites, at least one requirement is that the 
proteins be free of the ER lumenal chaperones and 
thus properly folded. Budding sites are characterized 
by a high concentration of the COPII protein on the 
exterior (cytosolic) surface of the ER membrane 
(COP stands for coat protein). COPII is required for 
the physical deformation of the ER membrane that 
accompanies the budding process and is immediately 
shed from the surface of the transport vesicle once 
budding has been achieved. 

Following budding, transport vesicles move toward
the Golgi along microtubules, which are protein filaments
of the intracellular cytoskeletal system. During
this movement toward the Golgi certain cargo con- 
tents are selectively removed from the transport ves- 
icle and returned to the ER. Such ‘retrograde’ transport 
involves both the returning of ER resident proteins 
such as BiP as well as misfolded proteins and protein 
aggregates which inadvertently escape the ER quality 
control retention system (discussed above). How are 
such proteins identified and removed? Following 
budding from the ER the COPII coat protein is 
replaced with a coat protein designated as COPI 
which appears to play a dual role in retrograde trans- 
port. Similar to COPII’s activity at the ER membrane, 
COPI physically deforms the surface of the transport 
vesicle’s membrane resulting in the formation of 
retrograde vesicles that proceed back to the ER. 
COPI-mediated bud formation also requires the small 
guanine nucleotide-binding protein ARF (ADP- 
ribosylation factor) which forms a complex with 
COPI on the outer vesicle surface. Like other guanine 
nucleotide-binding proteins (e.g., Ras) ARF can exist 
in either an ‘active’ GTP-bound or an ‘inactive’ GDP- 
bound conformation (the hydrolysis of ARF-bound 
GTP to GDP converts ARF from the active to the 
inactive conformation). It is believed that GTP-bound 
ARF together with COPI promotes the budding pro- 
cess and that once this is accomplished ARF is con- 
verted to its inactive GDP-bound conformation, 
which causes the COPI/ARF complex to dissociate 
from the vesicle membrane and allows the vesicle to 
subsequently fuse with the acceptor membrane. COPI 
also plays a role in the selective loading of retrograde 
vesicles. ER lumenal proteins such as BiP terminate in 
a characteristic tetrapeptide sequence (Lys-Asp-Glu- 
Leu, or KDEL in the one-letter amino acid code) that
binds to a retrieval receptor known as the KDEL
receptor, which in turn is thought to interact directly
with COPI. COPI acts similarly on the membrane of 
the Golgi by removing ER lumenal proteins from the 
Golgi apparatus and returning them to the ER via the 
retrograde transport system. 
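
The retrieval rule for ER-resident proteins comes down to a check on the last four residues. The Python sketch below applies it to the contents of a hypothetical transport vesicle. The protein names and sequences are invented toy examples, and only the KDEL tetrapeptide named in the text is recognized. Some organisms use related variants (yeast, for example, uses HDEL), which could be added to the tuple.

```python
RETRIEVAL_SIGNALS = ("KDEL",)  # C-terminal tetrapeptides, one-letter code


def has_retrieval_signal(seq):
    """True if the protein ends in an ER retrieval signal."""
    return seq.upper().endswith(RETRIEVAL_SIGNALS)


def sort_vesicle_cargo(cargo):
    """Split cargo {name: sequence} into (returned_to_er, sent_on_to_golgi)."""
    returned, forwarded = [], []
    for name, seq in cargo.items():
        (returned if has_retrieval_signal(seq) else forwarded).append(name)
    return returned, forwarded


# Hypothetical toy cargo; the sequences are illustrative, not real proteins.
vesicle = {
    "resident_chaperone": "MAGSTVLKDEL",
    "secreted_enzyme": "MAGSTVLKDELG",  # KDEL present but not C-terminal
    "hormone_precursor": "MAGSTVLRRQ",
}
print(sort_vesicle_cargo(vesicle))
# (['resident_chaperone'], ['secreted_enzyme', 'hormone_precursor'])
```

The second cargo protein shows why position matters: the signal is read only at the C-terminus, so an internal KDEL does not trigger retrieval.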

Once a transport vesicle has budded from the ER it 
has to first recognize and then fuse with the acceptor 
membrane of the Golgi. Recognition involves binding 
between specific pairs of membrane proteins on the 
vesicle and acceptor (or target) membranes. The bind- 
ing of these proteins (which are called v-SNAREs and 
t-SNAREs, respectively) is thought to recruit other 
proteins to this initial point of contact between the 
two membranes which promote membrane fusion and 
the subsequent release of the vesicle’s cargo into the 
Golgi. SNARE-type proteins have been implicated in 
ensuring the accuracy of intracellular vesicle traffick- 
ing. The basic idea is that vesicles budding from one 
cellular compartment (like the ER) have a particular 
type of SNARE protein on their surface that will only 
interact with its ‘cognate’ SNARE-type protein on the 
membrane the vesicle is bound for (in this case the 
Golgi). Experimental support for this model has come 
from studies using liposomes (artificial membrane- 
bound vesicles) which can be made to display a 
defined SNARE on their surface. In experiments 
where liposomes associated with different SNARE 
proteins are mixed, it has been found that only those 
liposomes displaying cognate SNARE proteins will 
physically interact and initiate membrane fusion. 
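
The SNARE pairing model amounts to a lookup: fusion proceeds only if some v-SNARE on the vesicle has a cognate t-SNARE on the target membrane. The Python sketch below replays the logic of the liposome experiment with hypothetical SNARE labels that do not correspond to real proteins. It deliberately ignores the additional proteins that are recruited to drive fusion.

```python
from itertools import combinations

# Hypothetical cognate pairs (v-SNARE, t-SNARE); labels are illustrative only.
COGNATE = {("v-ER", "t-Golgi"), ("v-Golgi", "t-PM")}


def can_fuse(v_snares, t_snares, cognate=COGNATE):
    """True if any v-SNARE/t-SNARE combination is a cognate pair."""
    return any((v, t) in cognate for v in v_snares for t in t_snares)


def fusing_pairs(liposomes, cognate=COGNATE):
    """Mimic a mixing experiment: return pairs of liposome names that fuse.

    liposomes maps a name to the set of SNAREs displayed on its surface;
    a pair fuses if either member's SNAREs pair cognately with the other's.
    """
    fused = []
    for (a, snares_a), (b, snares_b) in combinations(liposomes.items(), 2):
        if can_fuse(snares_a, snares_b, cognate) or can_fuse(snares_b, snares_a, cognate):
            fused.append((a, b))
    return fused


liposomes = {"L1": {"v-ER"}, "L2": {"t-Golgi"}, "L3": {"t-PM"}}
print(fusing_pairs(liposomes))  # [('L1', 'L2')]
```

Only the liposome pair displaying a matching v-SNARE and t-SNARE is reported, which is the pattern the mixing experiments revealed.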


Golgi Apparatus 

The Golgi apparatus (or complex) is located at the 
crossroads of the secretory pathway. The Golgi 
receives proteins from the ER and further modifies 
them before their distribution to their eventual destin- 
ations. The Golgi also serves as the synthesis site of 
glycolipids and other complex lipids as well as serving 
as the site of polysaccharide synthesis in plant cells. 
In addition to its protein modification, sorting, 
and synthesis activities, the Golgi also acts as a filter- 
ing system, separating those proteins destined for 
the plasma membrane from those to be returned to 
the ER. The Golgi apparatus is thus involved in the 
synthesis, processing, and sorting of a broad range of 
cellular constituents. 

The Golgi consists of flattened membrane-enclosed 
cisternae (also referred to collectively as the Golgi 
‘stack’) which display a polarity both in function and 
structure. Transport vesicles arriving from the ER 
empty their contents at the entry face (or cis face), 
which in most cells is orientated toward the ER and 
nucleus. Proteins delivered to the entry face then proceed
through the Golgi ‘stack’ of cisternae and emanate
from the exit face (or trans face), from where they
continue to their eventual destination (which could
be the plasma membrane, lysosomes, or secretory 
vesicles). Proteins can be modified by a number of 
Golgi-resident enzymes which are asymmetrically 
distributed across the stack: cisternae at the entry 
face contain enzymes that perform the initial glycosylation
reactions, medial cisternae contain the enzymes
necessary for the intermediate reactions, and the 
cisternae at the exit face contain the enzymes that 
perform the terminal reactions. These modification 
enzymes possess a Golgi retention signal analogous 
to the KDEL sequence tag that serves as the ER reten- 
tion signal (discussed above). 

It is not known how the Golgi complex or its 
resident proteins organize themselves into a polarized 
organelle. It is also unknown how proteins are trans- 
ported across the Golgi stack: Is their transport 
mediated by vesicles that bud and fuse among the 
static cisternae, or do the cisternae themselves, along 
with their contents, progress through the stack? For a 
number of reasons both of these alternatives have been 
difficult to establish or rule out; therefore, a vigorous 
scientific controversy has surrounded this issue for a 
number of years. It is now starting to appear that both 
transport mechanisms occur simultaneously; namely 
that some proteins are transported through the Golgi 
stack via vesicles while other proteins remain within
the cisternal compartments, which themselves mature
from entry-face to exit-face cisternae. Cisternal
progression and maturation probably account for the
relatively slow transport of protein aggregates
through the Golgi (these aggregates are usually too
large to fit within vesicles). In contrast, most cargo
proteins move through the Golgi much faster than
aggregates and are probably transported from one
cisterna to the next via small vesicles.
Thus it appears that both sides of this debate about 
how protein transport through the Golgi stack occurs 
are correct. 

Much of the protein modification that occurs 
within the Golgi is a continuation of the glycosylation 
process that was initiated in the ER. As described 
above, many proteins exit the ER possessing one to 
several oligosaccharide complexes consisting of 10 
sugar residues which are linked to the protein via 
specific asparagine residues (N-linked glycosylation). 
Within the Golgi these oligosaccharide complexes 
undergo further modifications that determine the
protein’s eventual destination and/or its enzymatic
activities. For proteins destined to be loaded into secretory
vesicles or for membrane proteins, the following series 
of enzyme-mediated modifications occurs: first, three
mannose residues are removed from the 10-residue
complex; then an N-acetylglucosamine residue is added,
followed by the removal of two more mannoses and the
addition of a fucose and two more N-acetylglucosamines;
and finally, in a step occurring in the exit face
cisternae, three galactose and three sialic acid residues
are added. Although the above represents the ‘complete’
N-linked glycosylation pathway, glycoproteins
emerging from the Golgi complex can vary in how 
much their oligosaccharide attachments are modified. 
This could be due to both the structure of the protein 
as well as the relative levels of the modifying enzymes 
within the Golgi which have, in many cases, been 
shown to vary among different cell types. 
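The ordered series of modifications above can be tracked as simple changes in residue counts. The sketch below is a bookkeeping model only, not a structural one: it assumes the 10-residue complex leaving the ER is Man8GlcNAc2 (eight mannoses and two N-acetylglucosamines), a composition the text does not state, and it ignores linkage positions entirely. The names `START`, `COMPLEX_PATHWAY` and `process` are illustrative.

```python
from collections import Counter

# Assumed starting composition (not stated in the text): 8 Man + 2 GlcNAc = 10 residues.
START = Counter({"GlcNAc": 2, "Man": 8})

# Each step is (description, residue changes), in the order given in the text.
COMPLEX_PATHWAY = [
    ("remove three mannoses", {"Man": -3}),
    ("add one N-acetylglucosamine", {"GlcNAc": +1}),
    ("remove two more mannoses", {"Man": -2}),
    ("add a fucose and two N-acetylglucosamines", {"Fuc": +1, "GlcNAc": +2}),
    ("add three galactoses and three sialic acids (exit face)", {"Gal": +3, "Sia": +3}),
]


def process(glycan, steps):
    """Apply the steps in order and return the resulting composition."""
    current = Counter(glycan)
    for description, changes in steps:
        for sugar, delta in changes.items():
            current[sugar] += delta
            if current[sugar] < 0:
                raise ValueError(f"step '{description}' removes absent {sugar}")
    return +current  # unary plus drops sugars whose count is zero


final = process(START, COMPLEX_PATHWAY)
print(dict(final), sum(final.values()))
```

Stopping after fewer steps, for example `process(START, COMPLEX_PATHWAY[:3])`, mimics the partially processed glycans the text mentions, whose extent of modification varies with the protein and cell type.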

Proteins destined for the lysosome undergo much
less modification of their attached oligosaccharides
than membrane and secreted proteins. The modification
program for lysosome-bound proteins depends first
on identifying these proteins on their arrival at the
entry face of the Golgi stack. This recognition is
carried out by the first modifying enzyme of the pathway,
which adds an N-acetylglucosamine phosphate group to
a specific mannose unit in the 10-residue oligosaccharide
complex. The enzyme performing this reaction recognizes a
structural domain found only on lysosome-bound
proteins. Unlike other protein ‘addresses’ (e.g., membrane
signal sequences and ER retention signals), the
signal directing proteins to the lysosome is not contained
within a short linear sequence of amino acids
but is instead formed by noncontiguous segments of
the protein that become juxtaposed following
proper folding. The second and
last modification step for lysosome-bound proteins
is the removal of the just-added N-acetylglucosamine
(leaving its phosphate behind), so that the
oligosaccharide complex carries a mannose-6-phosphate
residue. This phosphorylated mannose is
in turn recognized by the mannose-6-phosphate
receptor in the membrane of the exit face
cisternae, which directs these proteins to the lysosome.
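The two-step tagging just described behaves like a small state machine acting on one mannose. The sketch below assumes a single relevant mannose per protein and reduces the folding-dependent recognition patch to a boolean flag; `GolgiCargo`, `add_glcnac_phosphate`, `remove_glcnac` and `exit_face_route` are hypothetical labels, not enzyme or gene names from the text.

```python
from dataclasses import dataclass


@dataclass
class GolgiCargo:
    name: str
    # Stands in for the noncontiguous patch that forms only after proper folding.
    has_lysosomal_patch: bool
    # Simplified state of the tagged mannose on the oligosaccharide.
    mannose_state: str = "Man"


def add_glcnac_phosphate(protein: GolgiCargo) -> None:
    """Step 1 (entry face): tag the mannose only if the patch is recognized."""
    if protein.has_lysosomal_patch and protein.mannose_state == "Man":
        protein.mannose_state = "GlcNAc-P-Man"


def remove_glcnac(protein: GolgiCargo) -> None:
    """Step 2: strip the N-acetylglucosamine, leaving the phosphate on the mannose."""
    if protein.mannose_state == "GlcNAc-P-Man":
        protein.mannose_state = "Man-6-P"


def exit_face_route(protein: GolgiCargo) -> str:
    """The mannose-6-phosphate receptor captures only Man-6-P-bearing cargo."""
    if protein.mannose_state == "Man-6-P":
        return "lysosome"
    return "not captured by M6P receptor"


for cargo in (GolgiCargo("hydrolase", True), GolgiCargo("secreted protein", False)):
    add_glcnac_phosphate(cargo)
    remove_glcnac(cargo)
    print(cargo.name, "->", exit_face_route(cargo))
```

Because step 2 acts only on the product of step 1, a protein lacking the recognition patch never acquires mannose-6-phosphate, which mirrors the text's point that recognition at the entry face gates the whole program.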

It is believed that the default pathway for proteins
entering the Golgi stack is either transport to
the plasma membrane (for membrane-associated proteins)
or loading into secretory vesicles whose
contents are released at the cell surface. Proteins
may carry any one of a number of address tags
that divert them from being secreted from the cell
in this unregulated fashion (sometimes
referred to as ‘bulk flow’). These address tags can
consist of short protein sequences, such as those
found in ER or Golgi resident proteins, or posttranslational
modifications, such as the mannose-6-phosphates
that direct proteins to the lysosome. Proteins can
also be diverted from the bulk flow pathway
by being packaged into specialized secretory vesicles
as they emerge from the exit face cisternae. Depending
on the cell type, these specialized vesicles can be
released from the cell in response to various
environmental signals. For example, pancreatic cells
secrete digestive enzymes via specialized vesicles in
response to the presence of food in the small intestine.
These digestive enzymes begin to aggregate and/or
crystallize as they are transported through the Golgi
stack. On reaching the exit face cisternae they are
sequestered in large vesicles that eventually mature
into the densely packed secretory granules characteristic
of exocrine and endocrine cells. The formation of
granules is a very efficient way to concentrate
and store a secretory product prior to its regulated
release.
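The sorting logic above amounts to a default route with several possible overrides. The sketch below encodes it as a decision function; the order in which the tags are checked is an assumption made for illustration (the text does not rank them), and `Route` and `sort_at_golgi` are hypothetical names.

```python
from enum import Enum


class Route(Enum):
    RETAINED = "retained in ER or Golgi"
    LYSOSOME = "lysosome"
    REGULATED_GRANULE = "regulated secretory granule"
    PLASMA_MEMBRANE = "plasma membrane"
    BULK_FLOW_SECRETION = "constitutive secretion (bulk flow)"


def sort_at_golgi(*, resident_signal: bool = False,
                  mannose_6_phosphate: bool = False,
                  packaged_in_granule: bool = False,
                  membrane_associated: bool = False) -> Route:
    """Return a destination; the tags are checked in an assumed priority order."""
    if resident_signal:        # short sequence tag of ER/Golgi resident proteins
        return Route.RETAINED
    if mannose_6_phosphate:    # posttranslational lysosomal tag
        return Route.LYSOSOME
    if packaged_in_granule:    # specialized vesicles released on a signal
        return Route.REGULATED_GRANULE
    # No tag at all: the default (bulk flow) pathway described in the text.
    if membrane_associated:
        return Route.PLASMA_MEMBRANE
    return Route.BULK_FLOW_SECRETION


print(sort_at_golgi().value)
print(sort_at_golgi(mannose_6_phosphate=True).value)
```

Keyword-only flags keep each call self-describing; an untagged soluble protein falls through every check and is secreted by bulk flow.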

In some cases secretion must be regulated spatially 
as well as temporally, for example, where polarized 
neuronal cells must release neurotransmitters into 
very specific extracellular locations (within synaptic 
regions) in response to electrical stimuli. Other types 
of cells traffic specific proteins to either the apical 
or basolateral plasma membrane. For example, the 
cystic fibrosis transmembrane conductance regulator 
(CFTR), which is discussed in detail below, is 
expressed on the apical plasma membrane in epithelial 
cells that line the airways. What determines CFTR 
trafficking to the apical plasma membrane following 
its processing in the Golgi? As with other protein
‘address’ signals, a short sequence at the C-terminus of
CFTR is necessary for its localization to the apical
plasma membrane. This region of CFTR has been
shown to bind other proteins that are themselves
localized to the apical plasma membrane. It is
unclear what determines the cellular localization of
these proteins that are responsible for the correct 
placement of the CFTR protein. 


Diseases Caused by the Misfolding of 
Secretory Proteins 


The proper functioning of a given protein is absolutely 
dependent on its three-dimensional structure. A wide 
range of debilitating human diseases are associated 
with protein misfolding events that can occur either 
within the cell or after a protein has been secreted. 
Some diseases, such as Alzheimer’s and Parkinson’s, 
are associated with the inappropriate aggregation of 
normal proteins (in some cases probably occurring after
their secretion from the cell), while in other diseases or
conditions, such as cystic fibrosis and albinism (see below),
folding defects have been linked with specific muta- 
tions within a protein. 

Cystic fibrosis is one of the most common genetic
diseases and is characterized by severe chronic pulmonary
and pancreatic disorders. Nearly all cases of
cystic fibrosis are linked to mutations in the gene encoding
the cystic fibrosis transmembrane conductance
regulator (CFTR), a protein that normally forms
chloride channels in the plasma membrane of epithelial
cells. The majority of individuals with cystic
fibrosis are homozygous for an allele of CFTR that
is missing a phenylalanine residue at position 508
(CFTR-ΔF508). Surprisingly, CFTR-ΔF508 can function
just as well as wild-type CFTR as a
membrane chloride channel protein. However, in cells
the CFTR-ΔF508 protein is transported to the plasma
membrane much less efficiently than wild-type CFTR.
CFTR-ΔF508 is retained in the ER by
the quality control system (discussed above) and is
eventually exported to the cytoplasm and degraded.

It is believed that the CFTR-ΔF508 protein fails to
be transported to the Golgi apparatus because the
ΔF508 mutation in some way
slows down (but does not prevent) the CFTR folding
process. This delay means that unfolded or partially
folded CFTR-ΔF508 remains associated with the ER
chaperones for a relatively longer period of time,
which increases the probability that it will be exported
to the cytoplasm and degraded. This example illustrates
how the ER quality control system removes not
only misfolded proteins from the secretory pathway
but also proteins that fail to fold in a timely manner.
Although, unfortunately for cystic fibrosis patients,
this system prevents an otherwise functional protein
(CFTR-ΔF508) from reaching its cellular destination,
in normal individuals this system prevents the potentially
harmful accumulation of misfolded or partially
folded proteins in the ER.
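The argument that slower folding leads to more degradation can be made quantitative with a simple kinetic-competition model. The sketch assumes that a chaperone-bound, unfolded protein either folds (rate `k_fold`) or is removed for degradation (rate `k_degrade`) as two competing first-order processes; this model and all rate values are illustrative assumptions, not measurements from the text.

```python
def fraction_folded(k_fold: float, k_degrade: float) -> float:
    """Probability that folding wins the race against removal for degradation."""
    return k_fold / (k_fold + k_degrade)


def mean_chaperone_dwell(k_fold: float, k_degrade: float) -> float:
    """Mean time spent unfolded (chaperone-bound) before either outcome occurs."""
    return 1.0 / (k_fold + k_degrade)


# Hypothetical rates (per minute). Only k_fold differs between the two cases,
# matching the idea that the mutation slows, but does not prevent, folding.
K_DEGRADE = 0.1
for label, k_fold in (("wild type", 0.9), ("slow-folding mutant", 0.05)):
    print(f"{label}: exported fraction {fraction_folded(k_fold, K_DEGRADE):.2f}, "
          f"mean dwell {mean_chaperone_dwell(k_fold, K_DEGRADE):.1f} min")
```

Lowering only the folding rate both lengthens the time spent with the chaperones and shrinks the exported fraction, which is the qualitative behavior the text attributes to CFTR-ΔF508.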

Proteins entering the secretory pathway can also be
inappropriately modified, which in turn can affect how
the protein folds or functions. Inappropriate
modification within the ER can arise, for example,
when a protein contains a mutation that prevents it
from being recognized by the ER glycosylation
enzymes. Albinism is a genetic disease that can be
caused by mutations in any one of the genes involved
in pigmentation. One such protein, tyrosinase, an
enzyme that catalyzes a key reaction in melanin
synthesis, enters the secretory pathway and, following
its export from the Golgi, is transported to melanosomes,
the site of melanin synthesis. Tyrosinase is
modified in the ER by the addition of seven (six in
the mouse) N-linked glycans, which are further modified
in the Golgi. In contrast, mutant versions of tyrosinase
associated with the albino phenotype are
retained in the ER and, like CFTR-ΔF508, are eventually
exported to and degraded in the cytoplasm.

Why are these mutant tyrosinases retained in the
ER? A clue can be found by examining where some of
the mutations reside. Several of these mutations are
located near tyrosinase’s glycosylation sites and have
been shown experimentally to prevent the site from
being recognized by the ER glycosylation enzymes.
Glycosylation of these sites is probably important for
tyrosinase to become properly folded, since these
underglycosylated variants of tyrosinase, like CFTR-ΔF508,
have been found to associate with the ER chaperones
for a relatively longer period of time than
the fully glycosylated wild-type tyrosinase. This
example illustrates the close relationship between
posttranslational modification and protein folding,
both of which occur within the ER, and how these processes
influence downstream events such as export to the
Golgi apparatus.
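How a single substitution can make a site invisible to the glycosylation machinery is easiest to see at the sequence level. The text does not spell out the recognition rule; the sketch below uses the well-established N-linked acceptor motif Asn-X-Ser/Thr (where X is any residue except Pro), and the sequences are hypothetical toy examples, not tyrosinase. A sequon is necessary but not sufficient for glycosylation, so this only flags candidate sites.

```python
import re

# Asn, then any residue except Pro, then Ser or Thr. The lookahead lets
# overlapping sequons (e.g. "NNST") all be reported.
SEQUON = re.compile(r"(?=N[^P][ST])")


def sequon_positions(protein: str) -> list[int]:
    """1-based positions of Asn residues that begin an N-X-S/T sequon."""
    return [m.start() + 1 for m in SEQUON.finditer(protein.upper())]


def lost_sequons(wild_type: str, mutant: str) -> list[int]:
    """Sequon positions present in the wild type but absent from the mutant."""
    return sorted(set(sequon_positions(wild_type)) - set(sequon_positions(mutant)))


wt = "MKNVSLLNGTQPNPS"   # hypothetical sequence; Asn13 is followed by Pro, so no sequon
mut = "MKNVSLLNGAQPNPS"  # Thr10 -> Ala destroys the sequon starting at Asn8
print(sequon_positions(wt), lost_sequons(wt, mut))  # -> [3, 8] [8]
```

The mutation never touches the asparagine itself, yet the site is lost, which illustrates how mutations located near a glycosylation site can block its recognition.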


See also: Mitochondria; Organelles; Protein 
Synthesis; Ribosomes 
Protein splicing is a novel posttranslational processing
event that involves the precise removal of an internal
polypeptide segment, termed an intein, from a precursor
protein with the concomitant ligation of the
flanking polypeptide sequences, termed exteins. Reminiscent
of self-splicing RNA introns, many inteins have been
shown to catalyze their own splicing without any
requirement for external energy or protein cofactors
(Figure 1). The mechanism of protein self-splicing has
been elucidated through the identification of key catalytic
amino acid residues and reaction intermediates. Mutation of
these catalytic residues has allowed inteins to be
modified for use in protein manipulation
and gene expression.
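At the level of sequence bookkeeping, cis-splicing removes an internal segment and joins its flanks. The sketch below treats the precursor as a string and ignores the chemistry entirely; `cis_splice` is a hypothetical helper, and the toy precursor (N-extein "MKAE", intein "CLAEGTHN", C-extein "TGKL") is invented for illustration.

```python
def cis_splice(precursor: str, intein_start: int, intein_end: int) -> tuple[str, str]:
    """Excise precursor[intein_start:intein_end] (0-based, end-exclusive).

    Returns (ligated exteins, free intein).
    """
    if not 0 < intein_start < intein_end < len(precursor):
        raise ValueError("the intein must be internal, flanked by both exteins")
    n_extein = precursor[:intein_start]
    intein = precursor[intein_start:intein_end]
    c_extein = precursor[intein_end:]
    # The exteins are joined directly, as if by a normal peptide bond.
    return n_extein + c_extein, intein


mature, free_intein = cis_splice("MKAECLAEGTHNTGKL", 4, 12)
print(mature, free_intein)  # -> MKAETGKL CLAEGTHN
```

The mature protein carries no trace of the intein, which is why the host protein (for example the vacuolar ATPase subunit described below) is shorter than its precursor.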


The Discovery of Inteins 


Protein splicing elements, or inteins, are encoded by
an open reading frame embedded within a gene encoding
a host protein; they are thus a protein within a
protein. In 1990, laboratories led by Tom Stevens and
Yasuhiro Anraku reported for the first time the existence
of a protein splicing element, which they found
in the VMA1 gene of the budding yeast Saccharomyces
cerevisiae. The VMA1 gene encodes a 119 kDa protein
precursor from which the 69 kDa catalytic subunit
of the vacuolar ATPase is produced by the removal
of a 50 kDa internal protein sequence. Since this first
report, more than 100 inteins have been identified in all
three domains of life, including eubacteria, archaea and
unicellular eukaryotic organisms
(http://www.neb.com/inteins/intein_intro.html). The excised
S. cerevisiae VMA intein was found to exhibit a homing
endonuclease function that is required for insertion of the
intein coding sequence into an inteinless allele.

Figure 1 Mechanisms of protein splicing. (A) Cis-splicing;
(B) trans-splicing.

The prevailing view of the flow of genetic informa- 
tion was first challenged in 1977 by the discovery of 
genes that were interrupted by noncoding regions 
termed introns. Introns are removed by a process
called RNA splicing, in which the coding regions,
termed exons, are joined together to form the mature
messenger RNA. The discovery of protein splicing
elements adds another twist to the central dogma sur- 
rounding the organization of genetic information. 


Intein Organization 


Inteins themselves can be divided into three major 
regions: an amino (or N) terminal splicing domain
(I_N), a carboxy (or C) terminal splicing domain (I_C),
and an optional endonuclease region. The N- and 
C-terminal splicing domains can be subdivided into 
conserved amino acid motifs shared by all known 
inteins. Within these conserved motifs is a cysteine 
(or serine or threonine) residue following the scissile 
peptide bonds at the N-terminal and C-terminal splice 
junctions, as well as a highly conserved asparagine 
at the C-terminus of the intein. These amino acid 
residues appear to directly participate in the cleavage 
of the two flanking peptide bonds and linkage of the 
external protein sequences. 
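The conserved junction residues just described can be checked mechanically. The sketch below encodes only the rules stated in the text: a Cys, Ser or Thr immediately after each scissile bond (that is, the first intein residue and the first C-extein residue) and an Asn at the intein C-terminus. The helper name and the toy sequences are hypothetical.

```python
# Nucleophilic residues the text names at both splice junctions.
NUCLEOPHILES = {"C", "S", "T"}


def junction_residues_ok(precursor: str, intein_start: int, intein_end: int) -> bool:
    """Check the conserved splice-junction residues (0-based, end-exclusive intein)."""
    intein = precursor[intein_start:intein_end]
    c_extein_first = precursor[intein_end:intein_end + 1]  # "" if there is none
    return (
        intein[:1] in NUCLEOPHILES             # after the N-terminal scissile bond
        and intein[-1:] == "N"                 # conserved Asn at the intein C-terminus
        and c_extein_first in NUCLEOPHILES     # after the C-terminal scissile bond
    )


print(junction_residues_ok("MKAECLAEGTHNTGKL", 4, 12))  # -> True
print(junction_residues_ok("MKAECLAEGTHATGKL", 4, 12))  # Asn replaced -> False
```

Replacing any one of these residues makes the check fail, the same principle behind the engineered inteins described later, in which catalytic residues are replaced to block splicing.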

The majority of the known inteins appear to be
bifunctional, since they contain additional motifs
characteristic of a homing endonuclease that confers genetic
mobility upon the intein-encoding DNA. In addition
to the inteins containing an endonuclease domain,
more than a dozen inteins have been identified
that lack this region; these are termed mini-inteins (ranging
in size from 134 to 198 amino acid residues). The
smallest of the mini-inteins is the 134-amino acid
intein found in the ribonucleoside diphosphate reductase
gene of Methanobacterium thermoautotrophicum,
which may be close to the minimum size necessary to
promote the protein splicing process.

The most intriguing mini-intein described to date 
is a naturally occurring trans-splicing intein from 
the catalytic subunit of a DNA polymerase III 
(DnaE) from the cyanobacterium Synechocystis
sp. PCC6803. The Ssp DnaE protein is encoded by 
two genes separated by 745 kb of genomic DNA and 
on opposite DNA strands. The mature DnaE protein 
is formed by trans-splicing between two primary 
translation products, one comprising the DnaE 
N-terminal sequence followed by a 123-amino acid 
intein N-terminal splicing domain and another com- 
prising a 36-amino acid intein C-terminal splicing 
domain fused to the DnaE C-terminal sequence. 
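The Ssp DnaE arrangement can be sketched the same way, with the two halves arriving on separate polypeptides. Only the intein-half lengths (123 and 36 residues) come from the text; `trans_splice` and the placeholder sequences are hypothetical, and the chemistry is again ignored.

```python
INT_N_LENGTH = 123  # intein N-terminal splicing domain of Ssp DnaE (from the text)
INT_C_LENGTH = 36   # intein C-terminal splicing domain of Ssp DnaE (from the text)


def trans_splice(fragment_n: str, fragment_c: str,
                 int_n_length: int = INT_N_LENGTH,
                 int_c_length: int = INT_C_LENGTH) -> tuple[str, str]:
    """Join the N-extein of fragment_n to the C-extein of fragment_c.

    fragment_n = N-extein + intein N-half; fragment_c = intein C-half + C-extein.
    Returns (mature protein, reassembled intein).
    """
    if len(fragment_n) <= int_n_length or len(fragment_c) <= int_c_length:
        raise ValueError("each fragment must contain an extein and an intein half")
    cut = len(fragment_n) - int_n_length
    n_extein, int_n = fragment_n[:cut], fragment_n[cut:]
    int_c, c_extein = fragment_c[:int_c_length], fragment_c[int_c_length:]
    return n_extein + c_extein, int_n + int_c


# Placeholder sequences: a 4-residue N-extein and a 4-residue C-extein.
frag_n = "MSTK" + "C" + "A" * 122
frag_c = "G" * 35 + "N" + "CFEQ"
mature, intein = trans_splice(frag_n, frag_c)
print(mature, len(intein))  # -> MSTKCFEQ 159
```

The reassembled intein is 123 + 36 = 159 residues, while the mature protein contains only the two extein sequences, as in the cis case.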


Chemical Mechanism of Protein Splicing 


Protein splicing appears to be one of the most extra- 
ordinary posttranslational autoprocessing events in- 
volving peptide bond rearrangement. The first in 
vitro splicing experiment was performed using a pur- 
ified protein precursor containing an intein cloned 
from a hyperthermophilic archaeon, Pyrococcus
sp. GB-D, providing convincing evidence that protein 
splicing is self-catalyzed. Subsequently, extensive bio- 
chemical and mutational studies have led to the eluci- 
dation of the chemical steps that underlie protein 
splicing. The reaction begins with an acyl rearrange- 
ment at the N-terminus of the intein whereby the 
hydroxyl or sulfhydryl group of Ser or Cys attacks 
the carbonyl carbon of the residue preceding the intein 
forming an ester or thioester intermediate. The conserved
Cys/Ser/Thr residue following the C-terminal
scissile peptide bond performs a nucleophilic attack on
the ester/thioester intermediate, resulting in the formation
of a branched intermediate. The next step
couples the cyclization of the highly conserved aspar- 
agine adjacent to the C-terminal splice junction with 
peptide bond cleavage. The reaction releases an intein
possessing a C-terminal aminosuccinimide residue
and the ligated exteins joined by an ester/thioester
linkage. In the final step, a spontaneous S-N or O-N
acyl rearrangement converts the ester/thioester bond
to a stable peptide bond and completes the splicing
reaction. Crystal structures of inteins show that the
folded intein brings the two splice junctions into close
proximity, which facilitates the splicing reaction.


Use of Inteins for Protein Manipulation 


Understanding the chemical events involved in the 
cleavage and formation of peptide bonds in the com- 
plex splicing pathway has led to the rational design of 
various strategies for protein manipulation (Figure 2). 
In order to make use of the self-cleaving activity of 
inteins as tools for protein purification, replacement of 
the catalytic amino acid residues at either splice junc- 
tion is necessary to block protein splicing. Controlled 
cleavage at single splice junctions led to the develop- 
ment of one-column protein purification systems. 
The gene encoding a protein of interest is fused
to the coding region of an engineered intein. The 
chimeric protein is expressed and purified by an 
affinity column. The protein of interest is released 
when the intein is induced to cleave the peptide 
bond at the fusion junction by a thiol reagent or by a 
pH and temperature shift. This method dramatically 
simplifies the protein purification process. Unlike 
other protein fusion systems, the use of an intein as a 
fusion tag does not rely on an exogenous protease to
remove the fusion tag. This prevents the loss of the 
target protein due to nonspecific proteolysis and 
eliminates the need for further steps to remove or 
inactivate the protease. 

Inteins have also been engineered to be versatile 
tools in protein manipulation including ligation, 
labeling, and cyclization of proteins and peptides. 
The intein-mediated protein ligation reaction allows 
incorporation of noncoded amino acids into a large 
protein sequence, production of cytotoxic proteins, 
and facilitation of the analysis of protein structure by 
techniques such as NMR. The protein trans-splicing 
technique relies on the high affinity and catalytic 
activity displayed by the two halves of an intein 
to ligate two protein sequences. The in vitro trans- 
splicing of artificially split intein fragments requires a 
denaturation/renaturation step, while the naturally 
occurring Ssp DnaE intein is capable of trans-splicing 
under native conditions. 

Furthermore, novel methods have been developed 
to utilize the catalytic activity inherent in inteins to 
cyclize large proteins as well as small peptides. Protein
backbone cyclization confers conformational constraints
on proteins and peptides, which may contribute to
higher biological potency as well as higher stability
of many cyclic proteins or peptides. Thus,
inteins represent important tools for the production
of new protein drugs.

Figure 2 Protein manipulation by engineered inteins. (A) Protein purification and ligation; (B) protein cyclization.
Proteins are synthesized stepwise by the polymerization
of amino acids in a unidirectional manner,
beginning at the N-terminus and ending at the
C-terminus. The amino acids are linked by the
formation of peptide bonds, and the resulting polypeptide
chain contains one of 20 different amino
acids at each position. For protein synthesis, a messenger
RNA (mRNA) molecule copied from DNA
provides the instructions for the synthesis of a specific
protein. The information encoded in the sequence
of bases in the mRNA is translated by transfer
RNA (tRNA) molecules, which bind to the mRNA
at one end and carry specific amino acids at the
other end. The growing polypeptide
chain is synthesized on ribosomes, which contain RNA
and associated proteins. Additional specific protein
factors aid in the initiation, elongation and termination
of protein synthesis. Genetic information is
encoded as a series of three bases, or triplets, in the
mRNA. The 64 triplets and the amino acids they
specify are called the genetic code. In most organisms
three (and sometimes two) of the triplets signal chain
termination.
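The triplet reading described above can be made concrete with the standard genetic code. This is a minimal sketch, assuming the standard code (not the variant codes alluded to by "sometimes two") and an mRNA string of A, C, G and U read from a known start position; the 64-letter table string is a common compact encoding in which the first codon base varies slowest, in U, C, A, G order, and `*` marks a stop codon.

```python
from itertools import product

BASES = "UCAG"
# One letter per codon, ordered UUU, UUC, UUA, UUG, UCU, ... GGG; "*" = stop.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(mrna: str, start: int = 0) -> str:
    """Read codons from `start` until a stop codon or the end of the message."""
    protein = []
    for i in range(start, len(mrna) - 2, 3):
        aa = CODE[mrna[i:i + 3]]
        if aa == "*":          # chain termination
            break
        protein.append(aa)
    return "".join(protein)


print(len(CODE), list(CODE.values()).count("*"))  # -> 64 3
print(translate("AUGUUUGGCUAAGGG"))               # -> MFG
```

The table confirms the counts given in the text: 64 triplets, three of which signal termination, with the remaining 61 specifying the 20 common amino acids.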


See also: Amino Acids; Genetic Code; Messenger 
RNA (mRNA); Proteins and Protein Structure; 
Transfer RNA (tRNA) 


Proteins and Protein Structure


Protein molecules regulate and participate in all the
essential tasks of the living organism. They provide 
and transmit the biological signals that regulate gene 
expression, cell growth and division, cell differentiation,
and programmed cell death. In animals, proteins
perform muscle contraction, provide the matrix for
bone and skin, and give elasticity to blood vessels.
Almost every chemical reaction of a biological system
is catalyzed by a specific enzyme, and most enzymes
are proteins. A mammalian cell expresses between
10 000 and 20 000 different proteins simultaneously
and at widely disparate concentrations.


Proteins are Linear Polymers of Amino 
Acids 


Proteins are specified by the nucleotide sequences of 
genes. The DNA sequence of a gene is transcribed into 
an RNA molecule which is translated by the ribo- 
somes into a specific linear sequence of amino acid 
residues connected by peptide bonds. The amino acid 
sequence of each protein type is unique. After protein 
synthesis is completed on the ribosome, the amino 
acid polymer or polypeptide folds into a particular 
three-dimensional structure by the action of noncova- 
lent interactions between amino acids distributed 
along the polypeptide chain. The folded structure is 
called the native conformation of the polypeptide 
chain or protein. It is the native conformation of the 
protein that is functionally active and enables it to 
perform a biological role. 
Figure 1 General structure of a common amino acid.
Each amino acid has a central carbon atom (α-C) to
which four different groups are covalently attached. The
ionized form of the amino acid is shown, which is the
predominant form at physiological pH. In the amino acid
glycine, the R group is a hydrogen atom and thus glycine
has only three different types of groups attached to
the α-carbon. The amino acid proline incorporates the
amine substituent in its side chain R group (see Glycine
and Proline entries).


There are only 20 different amino acids found in the 
primary products of translation. These amino acids are 
designated the common amino acids. However, some 
proteins may contain amino acids for which no codon 
exists. These amino acids are derived from common 
amino acids, usually by an enzyme-catalyzed reac- 
tion, after the common amino acid has been incorpor- 
ated into a polypeptide chain of a protein. 

The 20 common amino acids have the general structure
shown in Figure 1. Each amino acid contains a
central alpha (α) carbon to which are attached a carboxylic
acid group, an amino group, a hydrogen atom,
and a side chain (R) that differs for each of the amino
acids. To form polypeptides, different amino acids
are joined between their carboxylic acid groups and
amino groups to form peptide bonds. The joining of
two amino acids to form a peptide bond is shown in
Figure 2. The dipeptide product has a free amino end
(N-terminal end) and a free carboxylic acid end
(C-terminal end), each of which can be joined to
additional amino acids. Reiteration of successive joining
steps generates polypeptide chains. Genes code for
polypeptides of widely varying length. Small polypeptides
may have fewer than 50 amino acids, while large
polypeptide chains contain 4000-5000 amino acids or more.
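The joining reaction in Figure 2 is a condensation: each peptide bond formed releases one molecule of water (a standard fact, not stated in the text). A chain of n amino acids therefore contains n - 1 peptide bonds, and its mass is the sum of the free amino acid masses minus n - 1 waters. The sketch below uses approximate average masses (g/mol) for just three amino acids to stay short; the table and function name are illustrative.

```python
# Approximate average masses (g/mol) of free amino acids (subset only).
FREE_AMINO_ACID_MASS = {"G": 75.07, "A": 89.09, "S": 105.09}
WATER = 18.02  # released once per peptide bond formed


def peptide_mass(sequence: str) -> float:
    """Average mass of a linear peptide: sum of amino acids minus one water per bond."""
    if not sequence:
        raise ValueError("empty sequence")
    bonds = len(sequence) - 1
    return sum(FREE_AMINO_ACID_MASS[aa] for aa in sequence) - bonds * WATER


print(f"{peptide_mass('GA'):.2f}")   # Gly-Ala dipeptide, one peptide bond
print(f"{peptide_mass('GAS'):.2f}")  # tripeptide, two peptide bonds
```

For the dipeptide this gives 75.07 + 89.09 - 18.02 = 146.14 g/mol; each further residue adds its own mass minus one more water.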


Levels of Protein Structure 


The amino acid sequence constitutes the protein’s primary
level of structure. On folding to the native conformation,
the protein takes on secondary, tertiary,
and quaternary levels of structure. Secondary structure
refers to regular conformations of segments of
the polypeptide chain. In describing secondary structure,
neither the locations of the amino acid side chains
nor the locations of distant regions of the polypeptide
chain with respect to each other are considered.

Figure 2 Joining of two amino acids to form a peptide
bond. An amino acid with side chain group R1 is joined to
an amino acid with side chain group R2 to form a dipeptide
(R1-R2) connected by a single peptide bond. Succeeding
amino acids are joined to the C-terminal end to form a
polypeptide.

Tertiary structure refers to the location in three- 
dimensional space of each atom of a polypeptide 
chain including the relationship of the side chain 
groups to the polypeptide backbone and of distant 
parts of the polypeptide to each other. Quaternary 
structure refers to the arrangement of polypeptide 
chain units in a multipolypeptide chain protein. Many 
proteins are composed of a single polypeptide chain 
and have no quaternary structure. Other proteins are 
composed of two to several thousand individual poly- 
peptide chain units associated with each other by non- 
covalent interactions, leading in many cases to massive 
and complex quaternary structures. 


Features of Secondary Structures 


Each amino acid residue within a polypeptide chain
contributes three covalent bonds to the polypeptide
chain: (1) the peptide bond; (2) the bond between
the α-C and N (designated the phi bond, φ); and
(3) the bond between the α-C and carbonyl carbon
(designated the psi bond, ψ) (Figure 3). If all the φ
bonds in a segment of polypeptide chain have an equal
angle of rotation and all the ψ bonds have an equal
angle of rotation, that segment of polypeptide chain
has a regular polypeptide conformation or secondary
structure. Stable secondary structures include the
α-helix and the β-strand conformations, which are
commonly found in folded proteins.
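The definition of a regular segment (equal φ angles and equal ψ angles throughout) translates directly into a check on a list of (φ, ψ) pairs. The sketch below allows a tolerance for real data and then names the conformation by the nearest reference point; the reference values of roughly (-57°, -47°) for the α-helix and (-120°, +120°) for the β-strand are approximate values assumed for illustration, and the tolerances are arbitrary choices.

```python
# Approximate reference (phi, psi) values in degrees (assumed for illustration).
REFERENCE = {"alpha-helix": (-57.0, -47.0), "beta-strand": (-120.0, 120.0)}


def angle_diff(a: float, b: float) -> float:
    """Smallest absolute difference between two dihedral angles (degrees)."""
    return abs((a - b + 180.0) % 360.0 - 180.0)


def is_regular(angles, tol=10.0):
    """True if every phi and every psi matches the first residue's within tol."""
    phi0, psi0 = angles[0]
    return all(angle_diff(phi, phi0) <= tol and angle_diff(psi, psi0) <= tol
               for phi, psi in angles)


def classify(angles, tol=10.0, max_dist=40.0):
    """Name a regular segment's conformation, or None if irregular or unrecognized."""
    if not angles or not is_regular(angles, tol):
        return None
    phi0, psi0 = angles[0]
    best, best_dist = None, max_dist
    for name, (ref_phi, ref_psi) in REFERENCE.items():
        dist = max(angle_diff(phi0, ref_phi), angle_diff(psi0, ref_psi))
        if dist <= best_dist:
            best, best_dist = name, dist
    return best


print(classify([(-57, -47)] * 5))                       # -> alpha-helix
print(classify([(-120, 125), (-118, 122), (-121, 118)]))  # -> beta-strand
print(classify([(-57, -47), (-120, 120)]))              # -> None (not regular)
```

Comparing angles modulo 360° matters because dihedral angles wrap around: +179° and -179° differ by only 2°.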

Figure 4 shows an α-helix, characterized
by 3.6 amino acid residues per turn of the helix. Each
peptide bond within the helix forms two hydrogen
bonds, one to the peptide bond of the amino acid
four residues above it and the second to the peptide
bond four amino acid residues below it in the helix.
Hydrogen bonds are noncovalent (weak) bonds
formed by the sharing of hydrogen atoms between
two electronegative atoms. In the α-helix, each hydrogen
bond involves the sharing of an NH hydrogen between
the electronegative carbonyl oxygen of one peptide bond
and the electronegative nitrogen of a peptide bond four
residues away in the helix. The large number of intrahelix
hydrogen bonds between peptide groups is a
significant stabilizing factor in α-helical structures.

Figure 3 Phi (φ) and psi (ψ) bonds contributed by
each amino acid within a polypeptide chain. Each amino
acid residue within a polypeptide chain contributes three
covalent bonds to the chain: its φ bond between its α-C
and amine nitrogen, its ψ bond between its α-C and
carbonyl carbon, and its peptide bond. Secondary
structures are generated when all φ bonds have equal
angles of rotation and all ψ bonds have equal angles of
rotation within the amino acids of that region of the
polypeptide chain. The structure of a tetrapeptide (R1-R4) is
shown, and the φ and ψ bonds for amino acids R2 and R3
are indicated. Peptide bonds joining adjacent amino acids
are depicted by a jagged line.

Figure 4 α-Helix. Atoms of the polypeptide chain comprising
the α-helix are gray balls with interconnecting
bonds depicted as white tubing. The helical secondary
structure is overdrawn by spiral lines to visualize the helical
structure. Dotted lines (purple) show hydrogen bonds
between carbonyl oxygen atoms and nitrogen atoms of
peptide bonds from amino acids four residues apart in the
helix. Hydrogen atoms are not shown. Side chain groups
(dark black) are on the outside of the α-helix generated
by the atoms of the polypeptide chain. There are 3.6
amino acid residues per turn of the helix. The α-helix is from
the deoxy human hemoglobin α-chain, residues 21 through
33, of amino acid sequence
-Glycine-Glutamate-Tyrosine-Glycine-Alanine-Glutamate-Alanine-Leucine-Glutamate-Arginine-Methionine-Phenylalanine-
from the NH2-terminal direction (on top) toward the
COOH-terminal direction (on bottom). (Based on Structure
1A3N in the Protein Data Bank (PDB) submitted by J. Tame
and B. Vallone and generated with the SwissPdb Viewer:
Guex N and Peitsch MC (1997) Electrophoresis 18:
2714-2723.)
The pitch (distance between successive turns of an
α-helix) is 5.4 Å. The side-chain groups are on the
outside of the helix, directed perpendicular to the
helix axis. As there are 3.6 amino acids per helix turn,
the α-helix brings every third to fourth amino acid side
chain close together. If such side chains are of similar
polarity (nonpolar or polar), an edge may be created
along the helix with hydrophobic or hydrophilic
properties. Such edges are important for the interaction
of α-helices with other segments of the polypeptide
chain, the aqueous solvent, or other polypeptides
in the formation of quaternary structures.
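The claim that every third to fourth side chain lies close together follows from the helix geometry given above. With 3.6 residues per turn, each residue advances 360/3.6 = 100° around the axis and 5.4/3.6 = 1.5 Å along it. The sketch below computes the angular separation of residue i from residues further along the chain; the function names are illustrative.

```python
RESIDUES_PER_TURN = 3.6   # from the text
PITCH_ANGSTROM = 5.4      # from the text

DEGREES_PER_RESIDUE = 360.0 / RESIDUES_PER_TURN         # 100 degrees
RISE_PER_RESIDUE = PITCH_ANGSTROM / RESIDUES_PER_TURN   # 1.5 angstrom


def angular_separation(i: int, j: int) -> float:
    """Angle (0-180 degrees) between residues i and j viewed down the helix axis."""
    d = ((j - i) * DEGREES_PER_RESIDUE) % 360.0
    return min(d, 360.0 - d)


def axial_distance(i: int, j: int) -> float:
    """Distance along the helix axis between residues i and j (angstrom)."""
    return abs(j - i) * RISE_PER_RESIDUE


for k in (1, 2, 3, 4, 7):
    print(f"i to i+{k}: {angular_separation(0, k):5.1f} deg apart, "
          f"{axial_distance(0, k):4.1f} A along the axis")
```

Residues i+1 and i+2 sit 100° and 160° away, whereas i+3 and i+4 are only 60° and 40° away, and i+7 is 20° away; runs of similar side chains spaced this way line up on one face and form the hydrophobic or hydrophilic edges described above.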

A second common secondary structure for sections
of polypeptide chain is the β-strand conformation.
The β-strand conformation is an extended helix with
two residues per turn and a pitch of 6.8 Å. β-Strands
are only stable when the peptide bonds of one segment
are hydrogen bonded to another segment of similar
conformation, on one or both sides of the first strand
(Figure 5). The structure generated by multiple
hydrogen-bonded β-strand segments is known as
β-structure. Atoms within polypeptide chains participating
in β-structures tend to lie in a plane, referred to as
a pleated sheet, with the side chains
of the amino acids alternately pointing above and
below the plane of the sheet. In most proteins the
β-sheet deviates from the ideal,
appearing deformed or twisted. Adjacent strands in
a β-structure may be aligned in parallel directions
(in the N-terminal toward C-terminal sense) or in
antiparallel directions. These arrangements give a different
geometry to the interstrand hydrogen-bonding
interactions.

Figure 5 β-structure. Three antiparallel β-strands
forming a β-structure. Atoms of the polypeptide chains
are white and side chain groups are dark. Hydrogen
bonds between peptide group atoms are shown by
dotted lines. The left and right polypeptide chains are
directed from N-terminal (bottom) to C-terminal (top),
while the middle strand is antiparallel, from N-terminal
(top) to C-terminal (bottom). Structure from Cu/Zn
superoxide dismutase chain A, amino acids Ser14 to
Gln21 (left), Thr28 to Ala35 (middle), Ala95 to Asp101
(right). (Based on PDB Structure 1B4T: Hart PJ et al.
(1999) Biochemistry 38: 2167.)
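The same geometric bookkeeping shows why a β-strand is so much more extended than an α-helix. With two residues per turn and a 6.8 Å pitch, each residue advances 3.4 Å along the strand and rotates 180°, so successive side chains alternate between the two faces of the sheet. In the sketch, the labels "above" and "below" are arbitrary, and the helix comparison uses the α-helix values given earlier (3.6 residues per turn, 5.4 Å pitch).

```python
RESIDUES_PER_TURN = 2      # from the text
PITCH_ANGSTROM = 6.8       # from the text
RISE_PER_RESIDUE = PITCH_ANGSTROM / RESIDUES_PER_TURN   # 3.4 angstrom
HELIX_RISE = 5.4 / 3.6                                  # alpha-helix, 1.5 angstrom


def side_chain_face(i: int) -> str:
    """With 180 degrees per residue, side chains alternate between the two faces."""
    return "above" if i % 2 == 0 else "below"


def strand_span(n_residues: int) -> float:
    """Approximate axial distance (angstrom) covered by n residues in a strand."""
    return n_residues * RISE_PER_RESIDUE


print([side_chain_face(i) for i in range(6)])
print(f"8 residues: {strand_span(8):.1f} A as a strand, {8 * HELIX_RISE:.1f} A as a helix")
```

Eight residues span about 27.2 Å as a β-strand but only about 12 Å as an α-helix, and their side chains point alternately to opposite sides of the sheet, as described above.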


Combinations of Secondary Structure 
Form Motifs 


Arrangements of sections of secondary structure
recurrently found in different proteins are called
structural motifs. Motifs may be rather simple, such
as the helix-turn-helix motif of certain DNA-binding
proteins, the β-strand-turn-β-strand motif found
in proteins with antiparallel β-structure, and the
β-strand-loop-α-helix-loop-β-strand motif, found in
proteins whose chains alternate between α-helical and
β-strand secondary structures. More complex motifs
include a pattern of loop interconnections between
four antiparallel β-strands (Greek key motif) and an
arrangement of two β-strands connected to an α-helix
that binds a zinc ion (zinc finger motif).


Globular Proteins Form Structural 
Domains at the Level of Tertiary 
Structure 


Assemblies of motifs within polypeptide chains 
may form domains. Domains are compact semi- 
independent folded regions that contain an inner 
core of hydrophobic amino acids and an outer surface 
that contains most of the polar and charged amino 
acids. The polar groups on the outside are stabilized 
by favorable dipole interactions with water molecules 
of the solvent. A polypeptide chain can form a single 
domain or multiple domains. Figure 6 shows the cata- 
lytic domains of an enzyme that catalyzes the hydro- 
lysis of peptide bonds in other proteins. The enzyme is 
a single polypeptide chain arranged in two domains 
that are connected by a segment of polypeptide chain 
that forms a narrow cleft between the two domains. 
The active site binds the substrate and catalyzes 
peptide bond hydrolysis. It lies within a small region 
of folded structure in the cleft at the domain interface
and is composed of amino acids from both domains. 

Domains have compact structures with polypeptide
chain segments, often in α-helical or β-strand
conformation, passing back and forth to make up the
spheroid structure of the domain. The segments of 
secondary structure are interconnected by turn-and- 
loop regions of polypeptide chain on the outside of 
the domains. There appears to exist little unoccupied 
space on the inside of the spheroid domains. However, 
even with such dense, compact structures, the atoms 
within the folded protein are rotating and vibrating 
under the constraints of their binding forces. Atoms in 
folded structures fluctuate in position 0.5-0.8 A ona 
picosecond time scale, allowing small empty pockets 
within the folded structure to move and small mol- 
ecules from the outside to enter into the interior of a 
folded structure. Over times greater than picoseconds, 
the small atomic fluctuations facilitate larger motions 
in the protein such as the movement of regions of a 
domain or the movement of one domain in a protein 
with respect to another. 
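Positional fluctuations like those described above are commonly summarized as a per-atom root-mean-square fluctuation (RMSF) about each atom's average position. The minimal sketch below computes RMSF from a list of coordinate snapshots. It assumes the frames have already been superimposed on a common reference, and the two-atom trajectory is hypothetical toy data, not measurements.

```python
import math

def rms_fluctuation(snapshots):
    """Per-atom root-mean-square fluctuation (RMSF) about the mean position.

    snapshots: list of frames; each frame is a list of (x, y, z) tuples in Å,
    one per atom, assumed already superimposed on a common reference frame.
    """
    n_frames = len(snapshots)
    n_atoms = len(snapshots[0])
    rmsf = []
    for i in range(n_atoms):
        # Mean position of atom i over all frames.
        mean = [sum(frame[i][k] for frame in snapshots) / n_frames for k in range(3)]
        # Mean squared displacement of atom i from its average position.
        msd = sum(
            sum((frame[i][k] - mean[k]) ** 2 for k in range(3))
            for frame in snapshots
        ) / n_frames
        rmsf.append(math.sqrt(msd))
    return rmsf

# Hypothetical two-atom, four-frame toy trajectory (Å); atom 2 does not move.
frames = [
    [(0.0, 0.0, 0.0), (5.0, 0.0, 0.0)],
    [(0.6, 0.0, 0.0), (5.0, 0.0, 0.0)],
    [(0.0, 0.6, 0.0), (5.0, 0.0, 0.0)],
    [(-0.6, -0.6, 0.0), (5.0, 0.0, 0.0)],
]
print([round(r, 3) for r in rms_fluctuation(frames)])  # → [0.6, 0.0]
```

The first atom's 0.6 Å RMSF falls in the 0.5-0.8 Å range quoted above. Real analyses would read coordinates from a simulation or ensemble rather than a hand-written list.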

Surprisingly, similar domain folds can be found in proteins that are neither evolutionarily related nor functionally similar; such commonly occurring folds are called superfolds (Figure 7).

Figure 6 Structure of protease catalytic domains. Cartoon depiction of the secondary structural elements in the catalytic domains, with arrows showing each of the β-strands that align to form an extensive β-structure in each of the two domains. The domain fold is referred to as a β-barrel. The two β-barrel domains are interconnected by a cleft region containing the catalytic site. (Based on PDB Structure 1BDA of human single-chain tissue plasminogen activator. Structures in Figures 6 and 7 were generated with the Cn3D Viewer: Wang Y et al. (2000) Nucleic Acids Research 28: 243.)

Figure 7 Examples of superfolds. Cartoons of fold structures showing the type of secondary structure and arrangement of secondary structure elements in three superfold domain structures. α-Helical regions of chain are cylinders, β-strands are broad arrows pointed in the C-terminal direction, and nonregular conformations of the polypeptide chain are ropes. Side-chain groups of the amino acids are not shown. (A) All-α structure: globin domain fold shown for the β chain of hemoglobin. (Based on PDB Structure 1CBM, from Borgstahl GE et al. (1994) Journal of Molecular Biology 236: 817.) (B) α,β structure: TIM barrel fold characterized by parallel β-strands in the interior of the domain alternating with α-helical segments on the outside of the domain. Structure is domain 1 of triose phosphate isomerase. (Based on PDB Structure 8TIM, submitted by P. J. Artymiuk, W. R. Taylor, and D. C. Phillips.) (C) All-β structure: immunoglobulin fold from an antibody heavy chain. β-Strands form an antiparallel β-sheet in front and a second β-sheet in the rear of the fold structure. (Based on PDB Structure 1FAY, from Villeneuve S et al. (2000) Proceedings of the National Academy of Sciences, USA 97: 8433.)

Protein domain structures are classified by class, architecture, fold, homologous superfamily,
and family. The class of a protein is determined by the type of secondary structural elements present in the structure. There are four classes of proteins: (1) mainly α-helix; (2) mainly β-strand; (3) approximately equal amounts of α-helix and β-strand; and (4) proteins with little regular secondary structure. Architecture is determined by the arrangement of secondary structure elements but ignores how the sections of secondary structure are connected. In the determination of fold family, both the arrangement and the connectivity of the secondary structural elements are considered. A homologous superfamily is determined from an analysis of the percent amino acid sequence identity between proteins. Proteins with significant sequence identity (homology) are considered to be evolutionarily related, the family members having evolved through gene duplication and mutation from an initial primordial gene. Proteins that are members of the same family have a similar function and a higher sequence identity than members of a superfamily, where the sequence identity between families may be lower and the evolutionary relationships may be more distant.
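Percent sequence identity, used above to group proteins into superfamilies and families, is computed from a pairwise alignment. The minimal sketch below assumes the two sequences are already aligned (gaps written as '-') and counts identity only over aligned positions where neither sequence has a gap. This is one of several conventions (others divide by the shorter sequence length or the full alignment length), and the example sequences are made up.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity of two pre-aligned sequences of equal length.

    Only columns where neither sequence has a gap are counted.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    # Keep only gap-free aligned columns.
    pairs = [(a, b) for a, b in zip(aligned_a, aligned_b) if a != gap and b != gap]
    if not pairs:
        return 0.0
    identical = sum(1 for a, b in pairs if a == b)
    return 100.0 * identical / len(pairs)

# Hypothetical aligned fragments: 6 identical residues out of 8 gap-free columns.
print(percent_identity("MKT-AYIAK", "MKSLAYLAK"))  # → 75.0
```

Because the choice of denominator changes the number, identity thresholds are only meaningful when the convention is stated.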


Quaternary Structure 


Many proteins consist of two or more polypeptide 
subunits associated noncovalently to form a quater- 
nary structure. The polypeptide chains may be identical 
or different. Quaternary structures may contain as few 
as two polypeptide subunits or hundreds of polypeptide chain subunits. For example, hemoglobin contains four polypeptide subunits (two α-globin polypeptide chains and two β-globin polypeptide chains). The 20S core of the proteasome, which facilitates the breakdown of intracellular proteins that are targeted for degradation, contains 28 polypeptide chains of 14 different types. The protein coat of tobacco mosaic virus contains over 2000 identical polypeptide subunits.


Fibrous Proteins 


Proteins may also be classified either as globular or 
fibrous. Globular proteins, described above, have a 
spheroid-like shape with a hydrophobic interior and 
a polar exterior. In contrast, fibrous proteins typically 
have a nonspheroid structure. In addition, fibrous 
proteins often have unusual amino acid compositions, 
a repetitive amino acid sequence pattern, and low 
solubility in water. Collagen is the most prominent 
example of a fibrous protein. The high proline content of collagen generates a structure called the polyproline type II helix. This helix has three amino acid residues per turn and a pitch of 9.4 Å. Three polypeptide chains are wound around each other to generate an elongated superhelical structure. The superhelical molecular structure is rod-like, 3000 Å long, and 15 Å in diameter.
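The helix parameters quoted above allow a rough estimate of chain length. Dividing the 9.4 Å pitch by three residues per turn gives about 3.1 Å of rise per residue, so a 3000 Å rod corresponds to roughly 960 residues per chain. The sketch below does this arithmetic. It ignores the superhelical coiling, which tilts each chain relative to the rod axis and lowers the axial rise per residue, so the result is best read as a lower bound.

```python
def residues_per_chain(rod_length_A, pitch_A, residues_per_turn):
    """Residues needed to span a rod, ignoring the superhelical tilt of each chain."""
    rise_per_residue_A = pitch_A / residues_per_turn  # Å advanced per residue
    return rod_length_A / rise_per_residue_A

# Values quoted in the text for collagen.
n = residues_per_chain(3000.0, 9.4, 3)
print(f"rise ~ {9.4 / 3:.2f} Å per residue; ~{n:.0f} residues per chain")
```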


Covalent Modification of Proteins 


Many proteins contain carbohydrate molecules coval- 
ently attached to amino acid residues. This is espe- 
cially common in eukaryotic proteins secreted into the 
extracellular environment and in eukaryotic plasma 
membrane proteins. The attached carbohydrate may 
be simple, such as a single glucose molecule, or com- 
plex, with more than 10 monosaccharide units. Pro- 
teins often require a non-amino acid moiety, which 
binds to the protein either covalently or noncoval- 
ently and becomes a functional part of the protein. 
These non-amino acid moieties are called cofactors or 
prosthetic groups. They include metal ions, hemes, 
and derivatives of many of the vitamins, called co- 
enzymes, that function as a part of the catalytic sites 
of enzymes. Proteins may also be modified by phos- 
phorylation, acylation, reduction, oxidation, or ester- 
ification of particular side chain groups or of the 
N-terminal or C-terminal ends. Covalent modifica- 
tions almost invariably change the functional activity 
or role of the protein. 


Structural Genomics 


Structural genomics is a field whose objective is to 
determine the structures of all the protein fold families 
encoded by the genes of living organisms. If successful, this will allow the structures of all proteins or gene products to be modeled by homology to proteins whose domain fold structures have been solved.
Knowledge of a protein’s structure and its homology 
to other proteins gives insights into the function of 
the protein and its roles within biological systems. 
This knowledge may allow us to modulate a protein’s 
activity with inhibitor molecules or activator mol- 
ecules and by genetic engineering. All such possibilities 
are based on an understanding of the protein’s phys- 
ical, chemical, and geometric properties, deduced from 
its molecular structure. 


Web Sites of Interest 


http://www.biochem.ucl.ac.uk/bsm/cath. A classification of protein structure.

http://www3.ncbi.nlm.nih.gov/Entrez/. National Institutes of Health site that accesses protein sequence databases and the Protein Data Bank, the repository of protein three-dimensional structures determined by X-ray crystallography and NMR spectroscopy. The Entrez site contains links to another NIH site from which you may download the protein structure viewer Cn3D to view, manipulate, and study protein structures from the Protein Data Bank. The site also contains tutorials for the Cn3D viewer.

http://www.usm.maine.edu/~rhodes/SPVTut/index.html Tutorial for the Swiss PDB viewer by Gale Rhodes, University of Southern Maine. This is an interesting structure viewer with different capabilities from the Cn3D viewer cited above.

http://www.expasy.ch/tools/ This is the ExPASy (Expert Protein Analysis System) proteomics site of the Swiss Institute of Bioinformatics (SIB). This site contains tools and multiple links to other sites and databases for the study and analysis of protein structure.

http://www.umass.edu/microbio/chime/explorer/index.htm Site for a third type of protein structure viewer, called Protein Explorer. This site also contains information on protein structure and links to other sites of interest.


Proteolysis, or protein degradation, is a set of processes that result in the hydrolysis of one or more of the peptide bonds in a protein, either through catalysis by proteolytic enzymes called proteases or nonenzymatically, for example at very low or very high pH. In living organisms, proteolysis is a part of protein turnover, in which the molecules of specific proteins are first made through ribosome-mediated translation and are eventually destroyed, in ways and at rates that are specific for the protein in question and depend on the state of the organism. The in vivo half-lives of proteins vary from a few seconds to many months. The biological functions of proteolysis fall into several distinct classes, described below.

Complete degradation of a protein, by proteases, to 
its constituent amino acids allows these amino acids to 
be reutilized, for example, in making other proteins. 
Thus, dietary proteins are hydrolyzed to amino acids 
or short peptides in the gastrointestinal tract by pro- 
teases that include trypsin and pepsin. The amino 
acids are then delivered to cells of a multicellular 
organism. A starving cell can also destroy some of its 
own nonessential proteins and use the resulting free 
amino acids to make, for example, essential proteins, 
thereby prolonging the cell’s viability in the absence of 
outside nutrients. 

Intracellular proteins can be selectively and processively hydrolyzed through proteolysis by the ubiquitin system (also called the ubiquitin-proteasome system; see below) in the cytosol and the nucleus.
Intracellular proteins can also be destroyed through 
a process called autophagy, in which a membrane- 
enclosed intracellular compartment is delivered to 
the interior of an organelle called the lysosome and is 
degraded by lysosomal proteases and other hydrolytic 
enzymes of this organelle. In a related but distinct 
process of microautophagy, small (apparently ran- 
dom) portions of the cell’s cytosol can also be deliv- 
ered to the lysosome; this process is accelerated under 
the stress of starvation. Many cells are able to bind to 
and ingest extracellular proteins through a process 
called endocytosis. Some of the proteins thus absor- 
bed are also delivered to the lysosome and destroyed. 
This route to obtaining and utilizing dietary proteins 
is typical of single-cell eukaryotic organisms, but is 
also characteristic of cells in a multicellular organism, 
except that in this case the nutritional function of 
endocytosis tends to be minor in comparison to its 
other functions. 

The detection and elimination of damaged (for 
example, misfolded, aggregated) or otherwise abnor- 
mal proteins is one major role of the cellular proteo- 
lytic systems. The damaged proteins are potentially 
toxic to the cell, in part because they might interact 
with physiologically inappropriate ligands or, if ag- 
gregated, become mechanical impediments to the 
normal cellular processes. Most of the damaged intra- 
cellular proteins are recognized and destroyed by the 
ubiquitin system, which is present in the cytosol and 
the nucleus, but can also target proteins that fail to fold 
properly after their translocation from the cytosol 
into the endoplasmic reticulum (ER). These proteins 
are detected by quality-control systems of the ER and 
can be retrotransported back to the cytosol for their 
destruction by the ubiquitin system. A minority of 
abnormal proteins is either not detected by the
surveillance mechanisms or cannot be selectively
eliminated, for example because they form large intra- 
cellular aggregates, as happens in several neurodegen- 
erative diseases. Gradual accumulation of abnormal 
proteins that cannot be selectively removed by pro- 
teolysis is likely to be among the causes of aging of 
multicellular organisms. 

Proteolysis can involve either most of the protein’s 
peptide bonds or only some of them, resulting, in the 
latter case, in two or more fragments of the initial 
protein. This limited, site-specific proteolysis under- 
lies a great variety of biological processes, only some 
of which are mentioned below. For example, 
lymphocytes of the immune system recognize short 
(~10-residue) fragments of proteins, called peptides, 
which are presented on the cell surface as a part of a 
complex with specific transmembrane major histocompatibility complex (MHC) proteins. Some of these peptides are produced by the ubiquitin system in the cell’s cytosol and are thereafter transferred to the lumen of the ER, where they associate with newly
formed MHC proteins and are transported to the 
cell surface. Other MHC-associated peptides are 
derived from proteins that have been endocytosed by 
the cell. In this case, the peptides are usually produced 
by lysosomal proteases, and reach the cell surface by 
routes distinct from those of peptides derived from 
intracellular proteins. 

Yet another function of limited proteolysis is to 
modify newly formed proteins in preparation for 
their function inside or outside the cell. For example, 
many hormones are produced as larger precursor pro- 
teins, some of which may contain the moieties of 
several distinct hormones as parts of a single poly- 
protein. Individual hormones are produced from this 
precursor, in the course of its journey through the 
secretory pathway (ER, Golgi, storage vesicles), by 
specific proteases that reside in these intracellular 
compartments. An appropriately timed and precisely 
placed cleavage by a protease can also be utilized as a 
signal-transduction device. For example, the meta- 
bolism of cholesterol, a major constituent of bio- 
logical membranes, is regulated in part through 
conditional cleavage of the cytosolic domain of a 
specific protein in the cell’s membrane. The released 
cytosolic domain is translocated to the nucleus, where 
it regulates the expression of genes involved in lipid metabolism.

Site-specific proteolysis also plays a major role in 
the initiation and execution of a fundamental cellular 
process called apoptosis or programmed cell death.
Cells of a multicellular organism are programmed
to kill themselves under certain conditions, which 
include a variety of metabolic stresses. For some 
cells, apoptosis is their normal fate in the course of,
for example, embryonic development. Cells can also
die an apoptotic (as distinguished from necrotic) death 
if they suffer certain mutations or find themselves in 
an environment devoid of an essential growth factor. 
In all of these cases, cells activate a cascade of proteases 
called caspases, which cleave a number of specific 
intracellular proteins, resulting in irreversible changes 
that lead to apoptotic death. 

Site-specific proteolysis is also a frequent feature of 
extracellular regulatory systems. For example, the 
clotting of blood, an adaptive response to injury of a 
blood vessel, is mediated by a complex cascade of 
proteins in the blood that include conditionally active 
proteases and their protein inhibitors. Sequential 
activation of these proteases plays a major role in 
both the initiation and completion of clot formation. 
These examples are but a small fraction of biological 
processes that involve limited proteolysis of specific 
proteins. 

One major function of intracellular proteolysis 
is the selective destruction of proteins whose con- 
centrations must vary as a function of the cell’s state. 
Metabolic instability is a property of many regulatory 
proteins. Thus, these proteins have evolved not only 
to carry out their primary functions (being, for example, a protein kinase or a DNA-binding transcriptional activator) but also to be rapidly degraded in
vivo. A short half-life of a regulator provides a way to 
generate its spatial gradients and allows for rapid 
adjustments of its concentration through changes 
in the rate of its synthesis. For example, stopping 
the synthesis of a transcriptional activator may not 
suffice to extinguish transcription of the activator- 
regulated genes rapidly enough, because the pre- 
viously made molecules of activator would still be 
present. One solution is to make the activator short- 
lived, so that cessation of its synthesis would result in 
rapid disappearance of the activator. The metabolic 
price of this arrangement is the necessity of making 
more activator than would have been necessary if the 
activator were long-lived. A protein can also be con- 
ditionally unstable, i.e., long-lived or short-lived 
depending on the state of a cell. One example is the cyclins, a family of proteins whose destruction at specific stages of the cell cycle drives and regulates this cycle. In addition, many proteins are long-lived as components of larger complexes such as ribosomes or oligomeric proteins but are metabolically unstable as free subunits. The short in vivo half-lives of free
subunits decrease the necessity of stringent control 
over the relative rates of their synthesis, because a 
subunit produced in excess would not accumulate to 
a significant level. 
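The trade-off described above can be made concrete with a simple model, assuming first-order degradation and constant synthesis (dilution by cell growth is ignored). After synthesis stops, the fraction of protein remaining after time t is 0.5^(t/t½); holding a steady-state level P requires a synthesis rate k_deg·P, with k_deg = ln 2/t½. The half-lives and copy number in this sketch are hypothetical.

```python
import math

def remaining_fraction(t, half_life):
    """Fraction of protein left a time t after synthesis stops (first-order decay)."""
    return 0.5 ** (t / half_life)

def synthesis_rate_for_steady_state(level, half_life):
    """Synthesis rate needed to hold a steady-state level: k_syn = k_deg * level,
    where k_deg = ln 2 / half_life."""
    return math.log(2) / half_life * level

# Hypothetical activator: a 10-minute versus a 10-hour half-life (times in minutes).
for t_half in (10.0, 600.0):
    left = remaining_fraction(30.0, t_half)
    cost = synthesis_rate_for_steady_state(1000.0, t_half)  # molecules per minute
    print(f"t1/2={t_half:>5.0f} min: {left:.1%} left 30 min after shut-off; "
          f"needs {cost:.1f} molecules/min to hold 1000 copies")
```

With these numbers, the short-lived activator falls to 12.5% of its level 30 minutes after its synthesis stops, while the long-lived one retains about 97%; the price is 60 times more synthesis (600/10) to sustain the same 1000 copies.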

Most of the short-lived intracellular proteins are 
destroyed by the ubiquitin system, which conjugates
a 76-residue protein called ubiquitin to proteins that
are subsequently degraded to short peptides by the 
26S proteasome, an ATP-dependent multisubunit 
protease that recognizes ubiquitylated proteins. Features of proteins that confer metabolic instability are
called degradation signals, or degrons. A properly 
folded, long-lived protein usually contains a cryptic 
(buried) degradation signal or signals, for example, 
stretches rich in hydrophobic residues in the protein’s 
interior. These buried signals may become exposed if 
the protein is conformationally perturbed, for ex- 
ample during heat stress, thereby accounting for the 
selective recognition and degradation of damaged pro- 
teins by the ubiquitin system. At least some degrons of 
regulatory proteins that evolved to be short-lived are 
likely to be similar to the normally cryptic degrons of 
long-lived proteins. 

It is a common assumption that processive proteo- 
lysis provides a particularly effective, irreversible way 
to regulate the regulators. However, the proteolysis- 
based regulation, while effective, is also metabolically 
costly, given the necessity of increased protein syn- 
thesis. Hence the alternative view (which is also con- 
sistent with the available evidence) that the major and 
varied functions of regulatory proteolysis in modern 
organisms stem in part from the fact that proteolysis 
coevolved with protein synthesis and became entrenched as an early, but not necessarily the most cost-effective, adaptation. For example, it is possible that the control of protein activity through site-specific phosphorylation-dephosphorylation, although metabolically
less costly than processive proteolysis, did not dis- 
place the latter completely not because of the pre- 
sumed higher fidelity of proteolysis-based systems, 
but because these systems appeared in the essential 
circuits early in the history of protein-containing 
organisms, and therefore could not be replaced later 
on through incremental steps that underlie molecular 
evolution. 
Until recently, protein researchers designed their experiments to focus on a single protein at a time or at most a single protein complex or functional pathway. Improved technologies are making possible a previously unimaginable scale of research: global studies that aim to achieve a comprehensive view of all
the proteins expressed in a single cell. To accom- 
modate this new scope, the term ‘proteome’ was 
coined in 1994 by Marc Wilkins (then a postdoctoral 
fellow at Macquarie University, Sydney). In analogy 
to the term genome, the proteome represents the total 
protein repertoire that can be expressed from a given
genome. The word has rapidly evolved to encompass 
diverse meanings: not just the proteome of an organ- 
ism, but also the proteome of a cell, tissue, or organ, 
referring to the set of proteins actually expressed in a 
particular cell, tissue, or organ at a particular time and 
under particular conditions. For example, in this 
context a human blood cell has a different proteome 
than a human muscle cell. This review discusses the 
reasons, current applications (both experimental and 
predictive), limitations, future challenges, and future 
applications for proteome research, a field termed 
‘proteomics.’ 


Significance of the Proteome 


What can we learn from the proteome? Since most 
cellular enzymatic functions, regulatory switches, 
signal transducers, and structural components are 
composed of proteins, characterizing the proteins 
expressed by a cell can give important clues to the 
function, organization, and responsiveness inherent 
in a cell. In addition, by defining the variation between
different cells, and between cells exposed to different 
stimuli, we can gain an understanding of: 


• cellular adaptation to environmental signals;

• mechanisms of cellular differentiation and organismal development;

• cellular aspects of disease processes;

• cellular responses to aging;

• differences between individuals within a species, i.e., the molecular basis of our individuality in physiology, disease susceptibility, and response to therapeutics and environmental exposures.


There is currently a great deal of excitement about the 
potential to measure gene expression levels for every 
gene of an organism. Extensive or complete genome
sequences have made it possible to profile the levels of 
mRNA transcripts of all genes simultaneously by 
DNA microarray hybridization. Therefore, is it even 
necessary to study protein expression now that gene 
expression is so easily measured at the mRNA level? 
Most scientists believe the answer is yes, because the 
two approaches really are quantitatively and quali- 
tatively different. First, most DNA microarrays typic- 
ally do not differentiate between variant transcripts 
(produced by alternative splicing, use of alternative 
transcription start sites or polyadenylation sites, or 
RNA editing). Second, protein abundance may not be 
accurately predicted by mRNA level, since the rates of translation and protein degradation are generally unknown for each mRNA and its protein. Third, posttranslational modifications
and proteolytic cleavages are critical for the function 
of a protein, but cannot be detected or predicted by 
mRNA level. Finally, proteins usually work in com- 
plexes and protein localization is regulated by the cell, 
yet neither of these properties is addressed by examin- 
ing mRNA levels. 
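A minimal kinetic sketch shows why equal mRNA levels need not give equal protein levels. It assumes the simple model dP/dt = k_tl·m - k_deg·P, whose steady state is P = k_tl·m/k_deg, and uses hypothetical translation rates and protein half-lives.

```python
import math

def steady_state_protein(mrna, k_translation, half_life):
    """Steady-state protein copies for the model dP/dt = k_tl*m - k_deg*P,
    i.e. P_ss = k_tl * m / k_deg with k_deg = ln 2 / half_life."""
    k_deg = math.log(2) / half_life
    return k_translation * mrna / k_deg

# Two hypothetical genes with identical mRNA levels (50 copies per cell) and
# identical translation rates (per mRNA per minute), differing only in protein half-life.
gene_a = steady_state_protein(50, k_translation=2.0, half_life=30.0)    # minutes
gene_b = steady_state_protein(50, k_translation=2.0, half_life=1200.0)  # minutes
print(round(gene_b / gene_a))  # → 40
```

A microarray would score these two genes as equally expressed, yet their protein levels differ 40-fold from the degradation rate alone.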

Both the significance and the complexity of study- 
ing the proteome are evident in its sheer magnitude. 
The proteome is many-fold larger than the genome, 
given the wide degree of posttranslational modifica- 
tions and processing that nearly all proteins undergo. 
Many examples exist where a single gene (composed 
of many exons) can generate hundreds and possibly 
thousands of different protein molecules by alternative 
splicing and posttranslational modifications. Thus, 
analysis of the entire proteome presents a more daunt- 
ing challenge than the genome sequencing projects. 
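A back-of-envelope count illustrates how one gene can yield hundreds or thousands of protein species. The sketch assumes, hypothetically, that every splice isoform carries the same set of independent modification sites, each either modified or unmodified. The upper bound is then isoforms × 2^sites; observed proteoform numbers are usually much smaller because modifications are not independent.

```python
def max_proteoforms(n_isoforms, n_binary_sites):
    """Upper bound on distinct protein species from one gene: each splice isoform
    times every on/off combination of independent modification sites."""
    return n_isoforms * 2 ** n_binary_sites

# Hypothetical gene: 4 splice isoforms, 8 independent phosphorylation sites.
print(max_proteoforms(4, 8))  # → 1024
```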


Analysis of the Proteome by Physical 
Techniques 


No technology is yet available that can identify the 
entire proteome of any cell. However, current technol- 
ogy can allow a large sampling (several hundred to 
several thousand) of proteins to be viewed at once. 
Since its introduction in 1975 by O’Farrell, two- 
dimensional polyacrylamide gel electrophoresis (2D 
PAGE or 2D gel) has been the workhorse of the pro- 
teomics laboratory. 2D gels separate proteins from cell 
extracts in the first dimension according to their charge (isoelectric point),
and in the second dimension according to their mole- 
cular weight. Staining of the gel can reveal as many as 
several thousand spots, each corresponding to a single 
protein species or coincidentally comigrating proteins. 
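Because the first dimension of a 2D gel (isoelectric focusing) positions each protein at its isoelectric point (pI), the pI can be estimated from sequence. The sketch below applies the Henderson-Hasselbalch equation with one approximate textbook set of pKa values and finds the pH of zero net charge by bisection. The pKa set is an assumption: published tools use different values and give somewhat different answers. Posttranslational modifications, which shift real spots, are ignored.

```python
# Approximate pKa values (one common textbook set; an assumption, other tools differ).
PKA_POSITIVE = {"Nterm": 9.69, "K": 10.5, "R": 12.4, "H": 6.0}
PKA_NEGATIVE = {"Cterm": 2.34, "D": 3.86, "E": 4.25, "C": 8.33, "Y": 10.07}

def net_charge(seq, ph):
    """Net charge of a peptide at a given pH via the Henderson-Hasselbalch equation."""
    counts = {aa: seq.count(aa) for aa in "KRHDECY"}
    counts["Nterm"] = counts["Cterm"] = 1
    # Fraction protonated for basic groups, fraction deprotonated for acidic groups.
    positive = sum(counts[g] / (1 + 10 ** (ph - pka)) for g, pka in PKA_POSITIVE.items())
    negative = sum(counts[g] / (1 + 10 ** (pka - ph)) for g, pka in PKA_NEGATIVE.items())
    return positive - negative

def isoelectric_point(seq, lo=0.0, hi=14.0, tol=1e-4):
    """pI by bisection: net charge falls monotonically with pH, so find its zero."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if net_charge(seq, mid) > 0:
            lo = mid  # still positive, so the pI lies at higher pH
        else:
            hi = mid
    return (lo + hi) / 2

# Hypothetical acidic and basic peptides.
for peptide in ("DDEEGK", "GGKKRRH"):
    print(peptide, round(isoelectric_point(peptide), 2))
```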

Identification of the proteins comprising these 
spots was for many years difficult and arduous. The 
available techniques, such as determining amino acid 
composition, masses of peptide fragments, or partial 
N-terminal sequencing, were both time-consuming
and characterized by low throughput. Computer pro- 
grams were developed to search the sequence data- 
bases for candidate proteins that matched one of 
these criteria. In parallel with the exponential increase 
in the content of DNA sequence databases through 
the 1990s, protein identification improved consider- 
ably. 

The problems of low resolution and low through- 
put have been greatly diminished in recent years by 
the rapid improvement of various mass spectrometry
(MS) technologies. Mass spectrometers measure the 
mass of chemical fragments with exquisite resolution, 
sometimes down to less than a single dalton. Two 
types of MS approaches have revolutionized protein 
identification. In the matrix-assisted laser desorption/ 
ionization (MALDI) approach, a ‘peptide mass’ finger- 
print for a protein is obtained by determining the 
masses of peptides generated by a protease. Computer 
algorithms are able to compare the fingerprint against 
the predicted peptide masses of all the proteins in a 
sequence database to identify the protein. A second 
option, tandem electrospray ionization (ESI) mass spectrometry, employs an initial mass spectrometry step whereby an unseparated mixture of peptide fragments is fed into a mass spectrometer by ESI. Individual separated peptides are selected, fragmented
further, and fed into a second tandem mass spectro- 
meter for determination of partial amino acid 
sequence of the peptides. These partial sequences per- 
mit more specific identification than is possible by 
MALDI MS alone. Despite these advances, there are 
still limitations inherent in 2D gel technology. 
Although resolution is excellent, there are serious 
problems with consistency and reproducibility with 
a technology that is difficult to automate. Many important classes of proteins, glycoproteins and membrane proteins in particular, may be insoluble or may
not enter the gel. Separation difficulties exist because a 
single spot can represent multiple comigrating pro- 
teins, and a single protein can migrate as multiple 
spots. But the most daunting difficulty is the 10 mil- 
lion-fold range in concentration among proteins 
within a cell, from tens or hundreds of copies for low- 
abundance proteins to many millions of copies for 
high-abundance proteins. Low-abundance proteins 
are neither visible on 2D gels nor detectable by current 
MS technology, and increasing sample quantity is not 
an option since 2D-gel resolution is severely distorted 
by protein overloading. 
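The MALDI peptide-mass-fingerprint search described above can be sketched in a few functions. Each database sequence is digested in silico with a simplified trypsin rule (cleave after K or R, but not before P). The singly protonated monoisotopic peptide masses are then computed, and entries are ranked by how many observed peaks they explain within a mass tolerance. This is a toy version under stated assumptions: no missed cleavages, no modifications, and count-based scoring. Dedicated search programs use probabilistic scoring. The two-entry database is hypothetical.

```python
# Monoisotopic residue masses (Da) of the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056
PROTON = 1.00728

def tryptic_digest(seq):
    """Cleave after K or R unless the next residue is P (no missed cleavages)."""
    peptides, start = [], 0
    for i, aa in enumerate(seq):
        next_aa = seq[i + 1] if i + 1 < len(seq) else ""
        if aa in "KR" and next_aa != "P":
            peptides.append(seq[start:i + 1])
            start = i + 1
    if start < len(seq):
        peptides.append(seq[start:])
    return peptides

def mh_mass(peptide):
    """Singly protonated [M+H]+ monoisotopic mass, the ion typically seen in MALDI."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER + PROTON

def fingerprint_score(observed, sequence, tol=0.05):
    """Number of observed peaks matching a predicted tryptic peptide within tol (Da)."""
    predicted = [mh_mass(p) for p in tryptic_digest(sequence)]
    return sum(any(abs(m - p) <= tol for p in predicted) for m in observed)

def identify(observed, database, tol=0.05):
    """Database entry whose predicted fingerprint explains the most observed peaks."""
    return max(database, key=lambda name: fingerprint_score(observed, database[name], tol))

db = {"toy1": "AGKPLRVEGRFK", "toy2": "MSDEKLLRYAK"}      # hypothetical sequences
peaks = [mh_mass(p) for p in tryptic_digest(db["toy1"])]  # simulated spectrum
print(tryptic_digest(db["toy1"]), identify(peaks, db))    # → ['AGKPLR', 'VEGR', 'FK'] toy1
```

Note that the K in AGKPLR is not cleaved because it precedes proline, which is why fingerprints depend on the protease rule assumed.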

Future improvements in 2D-gel technology, such 
as robotic automation of gel preparation and proces- 
sing, more powerful database searching and analysis, 
and advances in mass spectrometry technology are likely to circumvent some of these limitations. Additional strategies are to specifically enrich or limit the
diversity in the protein sample before loading onto the 
gel. Apart from 2D gels, two alternative approaches 
for proteome analysis are under development. ‘Multidimensional chromatography’ separates proteins on a
column (or a number of columns in series) which can 
be automatically linked to a mass spectrometer to 
process samples with high throughput. A second alter- 
native is to use protein chips similar to DNA chips 
used for mRNA expression analysis. This strategy 
entails using some sort of ‘bait,’ be it other proteins, 
antibodies, peptides, or small molecules, which is 
immobilized on a two-dimensional array. A crude 
protein sample is applied, and those proteins in the 
sample that bind to the bait are detected by one of 
several possible methods. 


Proteome-Wide Investigations Based Upon Genome Sequences: ‘Functional Proteomics’


In a number of model organisms, the complete pro- 
teome is predicted from the complete genome se- 
quence, allowing the design of large-scale experiments 
that systematically assay every putative protein. Per- 
haps the most powerful approach currently available is 
the yeast two-hybrid system, which allows detection 
within a cell of binary protein-protein interactions 
that occur in the proteome. Modifications of the two-hybrid approach allow detection of protein-RNA and protein-DNA interactions. Another technique, phage display, permits screening of all proteins in a proteome for binding to ligands, either chemical compounds or peptides. Large-scale subcellular localization screens
can be done using panels of altered strains that contain 
and express a fusion of every open reading frame 
(ORF) to the green fluorescent protein (GFP). A dif- 
ferent use of fusion proteins, employing an affinity tag 
(such as glutathione-S-transferase or polyhistidine) 
allows purification of every protein individually for 
high-throughput screening of potential biochemical 
activities. Large-scale genetic screens, in which every 
putative protein is individually inactivated and the 
resulting phenotype(s) determined, are underway for 
several model organisms, particularly the yeast Sac- 
charomyces cerevisiae (by knockout mutagenesis), the 
nematode Caenorhabditis elegans (by RNA inactiva- 
tion methodology), and the mouse Mus musculus (by 
homologous recombination). These methodologies 
and others permit rapid large-scale functional charac- 
terization of every putative protein in the proteome. 


Knowledge of the Proteome and its Collection in Databases


What properties of the proteins in a cell can proteomics hope to measure? The primary goal is protein identification, addressing which proteins are expressed in which cells or which tissues. The abundance of
each protein is critical, especially relative to other cells 
or tissues, as well as changes in abundance in response 
to signals or stresses and during differentiation and 
development. Variant protein isoforms need to be 
identified, and hopefully tied to the mechanism of 
their generation (transcription, mRNA processing, 
translation, or posttranslational modification). 

The eventual goal of proteome analysis extends 
beyond mere identification. The ultimate aim is the 
complete functional characterization of every protein 
in the proteome. To this end, a variety of proteomic 
techniques (both experimental and predictive) are 
being developed to assess, on a large scale, properties 
such as: protein-protein interactions, complex assem- 
bly, subcellular localization, regulation by modifica- 
tion or interaction or other means, enzymatic activity, 
and even three-dimensional structure. 

A variety of proteome databases to store and organ- 
ize this knowledge are being created and developed. 
Although some functional characterization is gathered 
in traditional sequence databases, such as GenBank
(http://www.ncbi.nlm.nih.gov/Genbank), these data- 
bases are primarily repositories of annotated sequence 
data. Knowledge from large-scale investigations is 
being captured in databases for specific functional pro- 
teomics studies. For example, databases such as the 
Human and Mouse 2D PAGE Databases (http://biobase.dk/cgi-bin/celis),
SWISS-2DPAGE (http://www.expasy.ch/ch2d/), and
Siena-2DPAGE (http://
www.bio-mol.unisi.it/d/2d.html) link sequence infor- 
mation to functional information and physical char- 
acteristics of the protein as determined by 2D gels. 
In addition, knowledge from predictive proteomics 
studies, where function or structure is inferred or 
calculated from sequence, is being captured in other 
types of databases. Examples are SWISS-PROT
(http://www.expasy.ch/sprot) for function, and SWISS-MODEL
(http://expasy.hcuge.ch/swissmod/SWISS-MODEL.html)
or ModBase (http://pipe.rockefeller.edu/modbase) for
structure.

Such high-throughput approaches will enhance but 
never replace the wealth of knowledge already 
obtained and being produced by investigations on 
single proteins. Such focused studies will always be 
critical to explore, in a depth and rigor not possible 
with large-scale studies, the complex function of a 
protein. However, accessing the collective informa- 
tion about the proteome gathered from a multitude of 
single protein investigations is complicated. The data 
are published in peer-reviewed journals, but the size of the
research literature is immense. Information specific to
a single protein of a single species can be scattered 
across various journals and articles, and is hard to 
gather for any but the most diligent investigators.
Therefore, other proteome databases, such as the Yeast 
Proteome Database (YPD™) and related databases 
for other model organisms (http://www.proteome. 
com//databases), combine comprehensive curation of 
protein properties and functions from published 
experiments with predicted properties and functions 
for proteins predicted from the genome. Such data- 
bases are complemented by the databases supported by 
the model organism communities, such as the Saccharomyces
Genome Database (SGD) (http://genome-www.stanford.edu/
Saccharomyces), WormBase (http://www.wormbase.org),
FlyBase (http://flybase.bio.
indiana.edu), and the Mouse Genome Database 
(MGD) (http://www.informatics.jax.org), all of which 
integrate a great deal of published and unpublished 
information relevant to genetic and biochemical 
research in each organism. 


Future Directions 


A field that is currently just beginning but will greatly 
expand in the future is comparative proteomics, the use 
of the functional characterization of proteins in one 
characterized proteome to predict the function of 
uncharacterized but related proteins in another pro- 
teome. Complete functional characterization of a 
proteome will happen first with model organisms, 
particularly the Gram-negative bacterium Escherichia 
coli and the yeast Saccharomyces cerevisiae, which 
already have more than half of their proteomes func- 
tionally characterized to some degree. The nematode 
C. elegans and the fly Drosophila melanogaster will 
probably follow. All four organisms have a long his- 
tory of small-scale proteome analysis, with decades of 
accumulated knowledge derived from biochemical and 
genetic experiments. Investigations into the biology of 
all four organisms are supported by proteome data- 
bases that provide convenient access to these founts of 
accumulated knowledge. The biochemical pathways, 
complexes, networks, and even complicated processes 
such as learning, aging, and the mechanisms of disease 
are surprisingly conserved between model organisms 
and humans. Thus, the ultimate goal, to characterize 
the entire human proteome, is not as elusive as it might 
appear from the enormous size and complexity of this 
proteome. Investment in developing proteomic
techniques and a deep functional knowledge base for
model organisms will eventually help attain the goal
of understanding the human proteome.


Proto-Oncogene
A proto-oncogene is the normal counterpart of an
oncogene; it is usually a gene involved in the signaling
or regulation of cell growth. Typically, cellular
oncogenes are prefixed with ‘c-’ and their abnormal
viral equivalents with ‘v-’, e.g., c-myc and v-myc.


See also: Oncogenes 


Provirus 
A provirus is a viral genome that has been integrated
into the host cell DNA. In retroviruses, the RNA genome
must first be copied into DNA by reverse transcriptase
before it can integrate. The genes of the provirus may
be transcribed and expressed, or the provirus may
remain latent. Integration of oncogenic viruses, such
as members of the Papovaviridae and the retroviruses,
may lead to cell transformation.


See also: Retroviruses; Virus 


Proximal 


L Silver 
A relative term meaning closer to the centromere
along a chromosome (the opposite of distal). 


Pseudoalleles 
See: Alleles 


Pseudoautosomal Linkage, 
Region 
L Silver 
In mammals, the X and Y chromosomes occupy a
unique genetic niche in that they are not present in 
equal quantities in all members of the species. For 
example, males have just one copy of the X chromo- 
some, while females have two; males carry a Y chromo- 
some, while females do not. The process of 
development is known to be very fine-tuned with 
precise requirements for particular levels of gene 
activity, and thus the presence of different numbers 
of X chromosomes poses a problem for normal
development. The question is how the difference in
X-linked gene number between males and females is
compensated for. The
problem is solved in mammals through a process of 
X chromosome inactivation that occurs in all female 
cells: one of the two X chromosomes is inactivated so 
that only one copy of each X chromosome gene is 
expressed in females, just like in males. The process 
of X chromosome inactivation is random, which 
means that two females heterozygous at the same X 
chromosome gene may inactivate this locus differ- 
ently and express different phenotypes. There is a 
portion of the X chromosome that does not need to 
be inactivated because it is shared with the Y
chromosome. Thus, both males (XY) and females (XX) have
two copies of all genes in this region, in the same
manner as an autosomal region. For this reason, the
region is referred to as a pseudoautosomal region, and
the genes within it are said to have pseudoautosomal linkage.


See also: X-Chromosome Inactivation 


Pseudogene 


E Thomas 
A pseudogene is a nonfunctional genomic region that
originated by duplication of, and is still homologous 
to, an ancestral gene. Any gene that mutates suffi- 
ciently to lose its ability to express a functional product 
becomes a pseudogene. This may be due to ‘sequence 
drift’ that has led to reading-frame shifts and transcrip- 
tion termination and/or mutations affecting mRNA 
processing or critical transcription or translation con- 
trol regions, such as initiation motifs. 

When a gene mutates and becomes nonfunctional,
the selective pressure that normally prevents it from
mutating disappears; it then accumulates further
mutations relatively quickly, deviating ever further
from its original form.

Pseudogenes often occur if a genome has several 
similar genes that perform identical, or very similar, 
functions. In such a case, the organism can often 
survive loss of function by one of the genes. When 
mutations cause one of the similar genes to lose 
some of its function, the evolutionary pressure on it 
is relieved as long as another gene can take over its 
function. On the other hand, if a gene performs a 
unique function, the organism may not be able to
survive its loss; inactivating mutations in such a
gene are removed by selection, so it does not become
a pseudogene.

It is possible that pseudogenes have an import- 
ant evolutionary role. They may provide a form of 
‘scratch space’ where the genome can mutate to form 
potential new genes. The new gene inherits the struc- 
ture of, and is homologous to, the original gene, and 
can potentially recombine with it to allow the organ- 
ism to try out some of the new variations that have 
been accumulating. Because pseudogenes originate from
functional genes, they initially possess gene-like
structure and characteristics. Free of selective
pressure, however, they gradually degrade and lose
this gene-like appearance. Some statistical properties
(e.g., a nucleotide distribution resembling that of a
coding region) remain detectable for a long time,
whereas other characteristics that are highly conserved
in functional genes are lost quickly. For example,
pseudogenes can often be detected by the existence 
of nearly perfect open reading frames with a few 
frameshifts or stop codons in them. 
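The stop-codon signature described above can be illustrated with a minimal Python sketch. It assumes the candidate is a DNA string already aligned to the reading frame of its parent gene, and it only counts premature in-frame stop codons; detecting frameshifts would require aligning the candidate to its functional homolog, which this sketch does not attempt. The function names and the `max_defects` threshold are hypothetical choices for illustration.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # stop codons of the standard genetic code


def in_frame_stops(seq: str, frame: int = 0) -> list[int]:
    """Return 0-based start positions of stop codons in the given reading frame."""
    seq = seq.upper()
    positions = []
    # Walk the sequence codon by codon, starting at the chosen frame offset.
    for i in range(frame, len(seq) - 2, 3):
        if seq[i:i + 3] in STOP_CODONS:
            positions.append(i)
    return positions


def looks_like_disabled_orf(seq: str, max_defects: int = 3) -> bool:
    """Hypothetical heuristic: an ORF interrupted by only a few premature stops."""
    stops = in_frame_stops(seq)
    # A stop codon at the very end is expected in a functional ORF, so ignore it.
    internal = [p for p in stops if p + 3 < len(seq)]
    return 0 < len(internal) <= max_defects


functional = "ATG" + "GCT" * 10 + "TAA"
pseudo = "ATG" + "GCT" * 4 + "TGA" + "GCT" * 5 + "TAA"  # one premature stop
print(in_frame_stops(functional), looks_like_disabled_orf(functional))
print(in_frame_stops(pseudo), looks_like_disabled_orf(pseudo))
```

The second sequence is flagged because of the premature TGA at position 15; the first contains only its terminal stop codon and is not.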


See also: Evolution of Gene Families 


Pseudoxanthoma 
Elasticum (PXE) 


F M Pope 
Pseudoxanthoma elasticum (PXE) is a highly unusual
connective tissue disorder because light microscopy of
affected skin is specific: the flexurally distributed
elastic deposits produce a dense mid-dermal
proliferation of calcified, elastic-rich material
(Figure 1). Electron microscopy is also specific,
showing calcification of the central amorphous
component of the elastic fibers, whilst ‘cauliflower’
collagen fibrils similar to those of Ehlers-Danlos
syndrome type I (EDS I) are also common. Heterozygotes
often show proliferation of microfibrils (Figure 2).

Clinically, such elastic fragmentation and calcification
especially affect the skin and mucous membranes,
particularly of the lower lip, but occasionally also
the palate and the upper or lower intestine. Bruch’s
membrane, an elastic layer lying between the retina
and the choroid, is also faulty, and the arterial media
and cardiac endothelium may also be afflicted.

PXE clinically presents with a flexurally distributed
lemon-yellow or ivory-colored skin rash, ranging
from macules to confluent peau d’orange infiltrates.
This can present at any time from early childhood to
adult life. In children the rash is often mistaken for
permanent ‘dirty marks,’ especially around the neck.
Retinal fragility causes fractures of Bruch’s membrane,
which present as angioid streaks radiating from
and surrounding the optic disk, culminating eventually
in neovascularization from the underlying choroid,
frequently causing macular hemorrhage, followed by
central visual loss and effective functional blindness,
although peripheral vision remains (Neldner, 1988).

Premature arterial degeneration and stiffening
occur, as manifested by poorly palpable, inelastic
peripheral pulses and hypertension. Whilst premature
claudication is common, cerebrovascular and coronary
occlusion are less so. Bleeding from the GI tract (for
a variety of unrelated reasons, such as vascular
malformations, microaneurysms, or peptic ulceration) is a
notorious complication, with a frequency of 10%, and
is more common in women, in whom it may recur in
pregnancy. In general, pregnancy is uncomplicated,
except for a predisposition to perineal tears.

The genetics is complex and puzzling (Pope, 1974),
with most cases (over 80%) presenting sporadically and
with a low recurrence risk. The other 20% produce complex
segregation patterns, the commonest of which is
multiple affected siblings in one generation (autosomal
recessive). This pattern is at least twice as common
as multiple affected generations (autosomal dominant)
or variable transmission with multigenerational
‘full-house’ PXE. Joint hypermobility is very
common in obligate heterozygotes.

The gene locus was mapped to 16p13.2 in 1998, and in
late 2000 the gene, MRP6, was identified (Bergen et al.,
2000; Le Saux et al., 2000; Ringpfeil et al., 2000). It is
a 31-exon gene of the ATP-binding cassette (ABC)
transporter family, encoding a protein with
17 transmembrane domains of unknown function, and it is
expressed predominantly in liver and kidney (Ringpfeil
et al., 2000) rather than in tissues such as skin, eyes,
and arteries, where the pathology of PXE resides. So far,
all available evidence suggests that double
heterozygosity is common, whilst single heterozygotes may
be at increased risk of arterial disease. Heterozygotes
frequently show generalized joint laxity, otherwise
indistinguishable from EDS III/hypermobile syndrome.
As noted earlier, electron microscopy often
shows microfibrillar haloes surrounding elastic fibers.
With the gene identified, genetic counseling, prenatal
diagnosis, and preimplantation diagnosis are now
realistic possibilities for all PXE sufferers.


Figure 1 (See Plate 28) Light microscopy of typical
PXE skin. Fragmented elastic fibers are running
transversely through the mid dermis. They are a lighter
color than the remainder of the dermis (metachromasia).


Figure 2 Transmission electron micrograph ×40 000
of PXE heterozygote skin. The dark elastic fibers have
fluffy haloes of microfibrils. The transversely sectioned
collagen fibers are slightly irregular.

The population frequency is at least 1 per 100 000 in
the UK and may be three to six times commoner, with
heterozygote frequencies then varying from 1:50
to 1:200, depending upon the actual epidemiology.
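The quoted heterozygote range can be cross-checked with the Hardy-Weinberg relationship between disease prevalence (q²) and carrier frequency (2pq). This minimal Python sketch assumes, for simplicity only, that PXE behaves as a single-locus, fully penetrant autosomal recessive trait in a population at Hardy-Weinberg equilibrium; as noted earlier in this entry, the real inheritance pattern is more complicated, so the figures are indicative only. The function name is hypothetical.

```python
import math


def carrier_frequency(prevalence: float) -> float:
    """Heterozygote frequency 2pq implied by a recessive disease prevalence q**2.

    Assumes Hardy-Weinberg equilibrium and a single, fully penetrant recessive locus.
    """
    q = math.sqrt(prevalence)  # frequency of the disease allele
    p = 1.0 - q                # frequency of the normal allele
    return 2.0 * p * q


# Prevalence range quoted above: 1 per 100 000, possibly three to six times commoner.
for fold in (1, 3, 6):
    h = carrier_frequency(fold / 100_000)
    print(f"prevalence {fold}/100000 -> carriers about 1 in {1 / h:.0f}")
```

Under these assumptions, prevalences of 1 to 6 per 100 000 give carrier frequencies from about 1 in 159 down to about 1 in 65, inside the 1:50 to 1:200 range quoted above.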
Psoriasis
Psoriasis, mentioned by Galen, was first properly
described in the nineteenth century by Robert Willan. 
It is a common dermatosis, affecting 2-3% of north- 
ern Europeans, but very much less common in other 
races. Histologically it produces epidermal hyperplasia
and parakeratosis (retention of nuclei in the stratum
corneum), together with dermal vasodilation. These
localized changes reflect focal hyperproliferation and
rapid turnover of the epidermis. The prevalence rises with
age; the disease is usually chronic and persistent, with
separate peaks of onset in early adult and later life,
rather resembling types I and II diabetes.

Clinically, the characteristic skin lesion is a sharply
demarcated hyperkeratotic and erythematous change,
varying from small 1-2 mm papules to larger plaques
(Figure 1), which often enlarge and become confluent,
especially over the knees, elbows, trunk, and scalp.
Variants range from a generalized papular form (guttate
psoriasis), often induced as an immune response to
streptococcal infection, through coin-sized to plate-sized
plaques (nummular and plaque psoriasis) and localized
pustular variants of the hands and/or feet (localized
pustular psoriasis), to the highly disabling and
potentially lethal generalized variants, such as the
erythrodermic and generalized pustular forms, which
require treatment with potent systemic agents, such as
the antimetabolites methotrexate and hydroxyurea, or
cyclosporin. Systemic (nondermatological) features
include nail dystrophies caused by abnormalities of the
nail plates, mucosal lesions of the mouth and urinary
tract, and psoriatic arthritis, which overlaps with
ankylosing spondylitis and Reiter syndrome; all of these
are occasionally complicated by aortitis with aortic
valvular regurgitation. Most commonly, however, the
arthritis takes the form of a distal interphalangeal
arthropathy, or a monoarthritis of the knees, hips, or
elbows, or, more rarely, a symmetrical, relatively benign,
nonvasculitic polyarthritis. There is also an association
with hyperuricemia, gout, and diabetes. Aggravating
factors include infections, stress, and drugs such as
lithium, beta blockers, and nonsteroidal anti-inflammatory
drugs, as well as ethanol.

Given such clinical variation, the genetics of psoriasis
is complicated and usually polygenic. Population
surveys clearly show familial aggregation, consistent
with both single-gene and polygenic models, with an
increased risk that varies with the number and closeness
of affected relatives. Twin studies show concordance
in monozygotic twins, with high heritabilities of
between 80% and 90%.


Figure 1 (See Plate 29) Typical hyperkeratotic plaque
from the extensor surface of the forearm.

HLA associations are frequent, including the HLA class I
antigens Cw6, A1, B13, B17, and B37; particular
haplotypes increase the relative risk more than
20-fold. Early-onset type I psoriasis is
skewed towards Cw6/DR7, whilst these markers are 
lacking in the late-onset form. Affected sib-pair an- 
alyses also show remarkable frequencies of allele shar- 
ing. Not surprisingly, therefore, the HLA class I locus 
on chromosome 6 shows linkage disequilibrium in 
many psoriasis families, implying T cell-mediated 
mechanisms. Furthermore, given the clinical variabil- 
ity of psoriatic phenotypes, it should not be surprising 
that genome-wide scans have implicated loci other 
than chromosome 6, including the long arm of chromo- 
some 17 and another locus on chromosome 4. OMIM 
currently lists six distinct loci; PSORS1 on 6p21.3, 
PSORS2 on 17q, PSORS3 on 4q, PSORS4 on cen- 
q21, PSORS5 on 3q21, and PSORS6 on 6p. It also 
mentions other putative loci on 16q and 20p. 

Clearly the genetics of psoriasis is highly complex 
and genes of high, medium, and small effect may well 
play differing parts as they do in maturity-onset and 
other forms of diabetes. Furthermore, just like diabetes, 
variable penetrance poses real problems, whilst pheno- 
copies, misdiagnosis and unconvincing, inconsistent 
pedigree patterns conspire against consistent genetic 
models between different families. With time, the
conflicting models of autosomal dominant, autosomal
recessive and male (paternal-influenced) effects may 
all ultimately be explicable by clusters of genes acting 
at many points in the complex cascade of substances 
regulating epidermal proliferation and differentiation. 
See also: Penetrance


Puff 
A puff is a swelling of a band of a polytene
chromosome associated with the active synthesis of RNA at a
particular locus in the band. In these areas, the chro- 
matin becomes less condensed and the fibers unwind, 
although they remain continuous with the fibers in the 
chromosome axis. Unwinding at multiple bands 
causes Balbiani rings in Diptera. 


See also: Balbiani Rings; Chromatin; Polytene 
Chromosomes 


Pulse-Chase
Pulse-chase is an experimental technique used to
determine cellular pathways, such as precursor-product
relationships. A sample (organism, cell, or organelle)
is exposed for a brief period of time to a radioactively
labeled molecule (the pulse). The labeled molecule is then
replaced with an excess of its unlabeled counterpart (the
chase, or cold chase). The sample material is then examined
at various intervals to determine the fate of the
radioactive component.
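The pattern a pulse-chase experiment looks for in a precursor-product relationship can be sketched with a simple first-order kinetic model. The scheme (labeled precursor converted to product, product slowly turned over), the function name, and the rate constants below are all hypothetical assumptions for illustration: in such a model, label falls steadily in the precursor while it first rises and then falls in the product.

```python
import math


def pulse_chase(t_min: float, k_convert: float, k_turnover: float) -> tuple:
    """Fractions of the pulse label in precursor and product after t_min of chase.

    Hypothetical first-order scheme: precursor -> product -> turned over.
    All label starts in the precursor; the cold chase means no new label enters.
    Requires k_convert != k_turnover.
    """
    precursor = math.exp(-k_convert * t_min)
    product = k_convert / (k_convert - k_turnover) * (
        math.exp(-k_turnover * t_min) - math.exp(-k_convert * t_min)
    )
    return precursor, product


# Illustrative rate constants (per minute), not measured values.
for t in (0, 5, 10, 20, 40, 80):
    precursor, product = pulse_chase(t, k_convert=0.1, k_turnover=0.02)
    print(f"{t:3d} min  precursor={precursor:.2f}  product={product:.2f}")
```

In this model the product label peaks when its formation and turnover balance, at ln(0.1/0.02)/(0.1 - 0.02), roughly 20 minutes into the chase with the illustrative rates.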


See also: Cell Cycle 


Pulsed Field Gel 
Electrophoresis (PFGE) 
L Stubbs 
Pulsed field gel electrophoresis (PFGE) describes a set
of electrophoresis protocols that permit the separation
of large DNA fragments and approximate measurement
of fragment length. Since the concept of PFGE
was first described, many variations of the basic
methodology have been developed. But all PFGE
protocols involve separation of large DNA fragments
through agarose gels under the influence of an electric 
field that is periodically changed in direction. PFGE 
has had a major impact on human gene mapping and 
positional cloning efforts, but the method has many 
different applications, ranging from preparative steps 
essential to large-insert genomic cloning to isolation 
of large plasmids in microbes and separation of intact 
chromosomes of yeast. 


Historical Perspective 


PFGE was first described in the early 1980s, as
biomedical researchers began to work in earnest
toward positional cloning of human disease genes. In 
those early days, the density of useful human markers 
was low; genetic mapping provided the only means of 
ordering markers, measuring their relative positions, 
and linking them to each other as well as to loci 
associated with inherited disease. Genetic mapping 
was slow and difficult and genetic distances (measured 
in centiMorgans, or cM) were known to be poor 
indicators of physical spacing. Given the tedious
nature of chromosome walking and other available 
gene cloning methods, knowledge of physical distance 
between flanking markers was a crucial factor in the 
planning and prioritization of positional cloning
projects. Methods to resolve and physically measure
genomic intervals as large as the average cM (estimated
in humans at approximately 1 million base pairs, or
1 Mb) were desperately needed.

Agarose gel electrophoresis had long provided a 
reliable means of measuring lengths of DNA seg- 
ments. However, standard electrophoretic methods 
cannot be used to resolve fragments of megabase 
lengths. In standard electrophoresis protocols, an 
electric field is applied in a constant direction across 
the length of the gel, driving the negatively charged 
DNA molecules to migrate steadily toward the posi- 
tive electrode (Figure 1A). Molecules pass through 
the agarose matrix at a rate that is roughly dependent 
on fragment size; smaller molecules migrate faster and 
are therefore separated from larger DNA fragments in 
a mixture. However, agarose gels run under a constant
field resolve only relatively small DNA fragments and
cannot separate molecules greater than approximately
30-40 kilobases (kb) in length.


Figure 1 Arrangement of positive (+) and negative (−) electrodes, direction of applied electric fields relative to the
gel, and the application of fields with constant versus alternating directions distinguish standard gel electrophoresis
from pulsed field gel electrophoresis. In standard electrophoresis protocols (A), continuous strip electrodes are
placed above (−) and below (+) the gel. The directions of applied electric fields are shown by dashed-line arrows;
solid-line arrows illustrate the migration path of DNA through the agarose gel. A constant electric field is applied
through the length of the gel during electrophoresis, driving the negatively charged DNA molecules to migrate in a
straight path toward the positive electrode. In contrast, pulsed field gel electrophoresis protocols require the
direction of the electric field to be changed periodically, and the DNA does not migrate in a straight path. In the
originally described version of PFGE, two sets of point electrodes were positioned to direct the electric field through
the gel at opposing 45° angles. Current was applied to one electrode set for a fixed period of time, then switched to
the other for an identical period. The alternating electric field drives the DNA fragments to migrate through the gel in
a zigzag pattern (B).
Measuring Large DNA Fragments: PFGE
in Practice 


In 1982, a novel means of separating large DNA
molecules was first described by David Schwartz, Charles
Cantor, and colleagues. Many variations on this 
method, termed pulsed field gel electrophoresis 
(PFGE) have since been described. PFGE uses stand- 
ard agarose gels but differs from standard electro- 
phoresis in that the direction of the electric field is 
changed periodically throughout the electrophoretic 
run. The original PFGE experiments used focused 
‘point’ electrodes positioned at each side of the gel 
and set to pass current through the gel at intersecting, 
perpendicular 45° angles. The direction of the electric 
field was changed by activating first one set of electro- 
des, then the next, in alternating pulses of equal dura- 
tion. Another version of PFGE, called field inversion 
gel electrophoresis (FIGE), begins with the electric field
oriented down the length of the gel, as in traditional 
electrophoresis. However, in FIGE protocols, the 
field is inverted after a fixed period of time, driving 
the negatively charged fragments backward through 
the agarose matrix. In all types of PFGE, the alternat- 
ing current pulses are directed (or in the case of FIGE, 
timed) so that the DNA fragments move in a net 
‘forward’ direction, that is, directly away from the 
point at which the sample was applied to the gel 
(Figure 1B).
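The geometry of the two switching schemes can be sketched in a few lines of Python. This is a simplified model, not a description of any particular apparatus: it assumes every pulse moves a fragment the same distance (ignoring the size-dependent reorientation delay), and the 3:1 forward-to-reverse timing in the FIGE example is a hypothetical choice, since the text does not specify pulse ratios. Both function names are illustrative.

```python
import math


def crossed_field_path(n_pulses: int, step: float, angle_deg: float = 45.0) -> list:
    """Positions after each pulse when the field alternates between two directions.

    x runs across the gel, y runs down it. Each pulse is assumed to move the fragment
    a fixed `step` along a direction tilted +angle_deg or -angle_deg from the gel axis.
    """
    theta = math.radians(angle_deg)
    x, y = 0.0, 0.0
    path = [(x, y)]
    for i in range(n_pulses):
        sign = 1 if i % 2 == 0 else -1      # alternate between the two electrode sets
        x += sign * step * math.sin(theta)   # sideways components cancel in pairs
        y += step * math.cos(theta)          # forward components always accumulate
        path.append((x, y))
    return path


def fige_net_forward(n_cycles: int, forward_s: float, reverse_s: float, speed: float) -> float:
    """Net forward travel in FIGE, assuming equal speed in both directions.

    Net movement arises only because the forward pulse outlasts the reverse pulse.
    """
    return n_cycles * speed * (forward_s - reverse_s)


for x, y in crossed_field_path(4, step=1.0):
    print(f"x={x:+.2f}  y={y:.2f}")
print(fige_net_forward(10, forward_s=3.0, reverse_s=1.0, speed=0.5))
```

In the crossed-field case the sideways position swings back to zero after every pair of pulses while the forward position keeps growing, tracing the zigzag of the figure; in the FIGE case the net forward travel comes entirely from the difference between forward and reverse pulse durations.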


Explaining the Resolving Power of PFGE 
The precise basis of the increased resolving power of
PFGE is not understood. But the most widely
accepted theory focuses on the relative ability of
fragments of different lengths to respond to the changing
direction of the electric field. Because of the alternating
fields, DNA fragments subjected to a PFGE
separation regime do not migrate directly through
the agarose matrix in a straight path, as they would
in a constant field. The fragments must turn and
reorient themselves each time the direction of current is
changed. For short fragments, the time and energy 
required to reorient in response to the changing field 
is minor, and such fragments move through the 
gel much as they would in a constant field. However, 
large fragments reorient and begin to move toward the 
positive electrodes through the gel matrix at a rate that 
is dependent on their length. The difference in time 
that is required to reorient a fragment 200 kb long, 
versus one that is 400 kb in length, for example, is 
significant under appropriate conditions. This differ- 
ence impedes the forward movement of the larger 
fragment relative to its smaller counterpart, and per- 
mits separation of the two fragments as they move 
down the length of the agarose gel. 


Parameters Affecting PFGE Resolution 
PFGE can be used to resolve DNA fragments within a 
wide range of sizes, depending on conditions of elec- 
trophoresis. Fragments ranging from 10 to 100 kb can 
be well resolved under certain conditions, while other 
conditions can be chosen to resolve 2-6 Mb lengths. 
Optimal separation requires careful attention to many 
different parameters, including agarose type and
concentration, buffer strength, current (high or low levels,
and constant current vs. current that increases over the
course of the run), and the total length of time
electrophoresis is allowed to proceed. However, the
most important parameter is the frequency with 
which the direction of the electric field is changed. 
Fast ‘pulses’ provide the best separation of smaller 
molecules, while long pulses permit better separation 
of fragments in the Mb range. This presumably 
reflects the fact that, in rapidly switching fields, none 
of the larger molecules can reorient themselves effi- 
ciently and therefore all are equally retarded in for- 
ward movement. By contrast, rapidly switching fields 
permit differences between smaller molecules to be 
discriminated more accurately. 
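The reorientation argument, and why the switching interval matters, can be made concrete with a toy model in Python. It assumes, purely for illustration, that a fragment needs a reorientation time proportional to its length before it moves, and that it then moves at the same speed whatever its size; the constants and the function name are hypothetical, and real DNA behavior in agarose is more complex.

```python
def net_migration(size_kb: float, pulse_s: float, n_pulses: int,
                  k_s_per_kb: float = 0.05, speed: float = 1.0) -> float:
    """Forward travel of one fragment under a toy reorientation model.

    Each pulse lasts pulse_s seconds. The fragment first spends k_s_per_kb * size_kb
    seconds reorienting (no net movement), then moves at `speed` for the rest of the
    pulse. Both constants are hypothetical; the 45 degree geometry is ignored because
    it scales every fragment's travel by the same factor.
    """
    reorient_s = k_s_per_kb * size_kb
    moving_s = max(0.0, pulse_s - reorient_s)
    return n_pulses * speed * moving_s


sizes_kb = [10, 25, 50, 100, 200, 400, 800, 1600]
for pulse_s in (5.0, 60.0):
    travel = {size: round(net_migration(size, pulse_s, n_pulses=100)) for size in sizes_kb}
    print(f"{pulse_s:>4.0f} s pulses:", travel)
```

With 5-second pulses, fragments of about 100 kb and larger never finish reorienting and all stay together at the origin, while the 10-50 kb fragments are spread apart. With 60-second pulses, fragments up to 800 kb are resolved from one another, but the smallest ones differ in travel by only a few percent, mirroring the trade-off described above.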


Applications Then and Now: PFGE in the 
Age of Genome Sequencing 

PFGE-based methods have played a major role in 
human gene mapping and genome sequencing, prim- 
arily through two specific applications. After early 
successes in separating chromosomes of yeast, PFGE 
was soon put to use in constructing restriction maps of 
megabase-long regions surrounding markers linked to 
human disease genes. To create a long-range restric- 
tion map, large DNA fragments generated by rare- 
cutting restriction enzymes (with recognition sites 
spaced 100-1000 kb or more apart in mammalian
DNA) are separated by PFGE, and transferred to a 
Southern blot. Probes corresponding to the linked 
markers or genes are hybridized to the blot; patterns 
of shared and distinct restriction fragments provide 
verification of linkage and a measure of physical dis- 
tance between the probes. By permitting comparison 
of restriction maps of DNA samples taken from 
patients and normal individuals, PFGE has been 
used extensively to detect and locate rearrangements, 
such as deletions and translocations, associated with 
many different types of inherited disease. Preparative 
applications of PFGE have also played a significant 
role in creation of high-quality libraries of large-insert 
genomic clones, such as BACs and YACs. PFGE-based
size separation followed by gel purification
is still an essential step in successful BAC or
YAC library creation, providing the best means of
ridding large-insert DNA preparations of small con- 
taminant fragments. 
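The logic of inferring physical distance from shared restriction fragments can be sketched as follows. The Python below is a minimal illustration under stated assumptions: each probe's result is recorded as the single fragment size (kb) it detects in each digest, the enzyme names and sizes are hypothetical example data, and two different fragments that happen to have the same size would be wrongly scored as shared, which is one reason real mapping compares several digests.

```python
from typing import Optional


def distance_upper_bound(bands_a: dict, bands_b: dict) -> Optional[int]:
    """Upper bound (kb) on the spacing of two probes, from shared PFGE bands.

    bands_a, bands_b map a digest name to the fragment size (kb) each probe detects.
    If both probes detect the same fragment, they lie on it, so their spacing
    cannot exceed its length. Returns None if no digest gives a shared band.
    """
    shared = [size for digest, size in bands_a.items() if bands_b.get(digest) == size]
    return min(shared) if shared else None


# Hypothetical Southern blot results for three probes.
probe_1 = {"NotI": 850, "MluI": 420, "NotI+MluI": 300}
probe_2 = {"NotI": 850, "MluI": 420, "NotI+MluI": 120}
probe_3 = {"NotI": 600, "MluI": 250, "NotI+MluI": 250}

print(distance_upper_bound(probe_1, probe_2))  # shared 850 and 420 kb bands
print(distance_upper_bound(probe_1, probe_3))  # no shared band
```

Here probes 1 and 2 must lie within 420 kb of each other; the double digest, in which they detect different bands, additionally places a cut site between them.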


As marker density increases and new means of 
measuring physical distances (like radiation hybrid 
maps, and for human, complete genomic sequence) 
become available, PFGE is less frequently used to 
link and measure genetic markers and genes in mam- 
malian genomes. However, PFGE in its many varia- 
tions is still a powerful and very useful technique. 
With its power to resolve whole chromosomes in 
yeast, large plasmids from microbes, minichromo- 
somes in eukaryotes, and small chromosomes pro- 
duced by genetic rearrangements in mammalian cells, 
PFGE should continue to provide an important tool 
for genetics studies in upcoming years. 
Punnett Square
Following Mendel’s law of segregation, for any single
gene trait in a diploid organism, half of the gametes 
produced by the organism will have one of the alleles 
that the organism possesses and half of the gametes will 
have the other. If the organism is homozygous for the 
trait, of course, all gametes will contain the same allele. 
When two organisms are crossed, a Punnett square can 
be used to predict the proportions of genotypes and 
phenotypes that will result in the F1 offspring.

The Punnett square itself is a table in which all of 
the possible genetic outcomes for a given mating are 
listed. In its simplest form, the Punnett square consists 
of a square divided into four quadrants. Across the top 
of the table, all possible genotypes for the haploid
female gametes are listed. Down the left side, all of
the possible genotypes for the haploid male gametes
are listed. In the squares of the table are the diploid
genotypes that would result from each possible
combination of male and female gametes that might come
together in fertilization.


Figure 1 Punnett square (female gametes across the top,
male gametes down the left side).

In a cross between two individuals that are hetero- 
zygous for a trait, the Punnett square would appear 
as represented in Figure 1.

For this cross, the Punnett square reveals that
three offspring genotypes are possible: AA,
Aa, and aa. Further, they are expected to occur in the 
ratio 1:2:1. With information about the dominance 
relationship between the alleles, it is also possible to 
predict the ratio of phenotypes among the offspring of 
this cross. For instance, if A is dominant to a, the 
expected ratio of phenotypes would be 3 showing 
the dominant trait to every 1 showing the recessive 
trait. 
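The monohybrid cross above can be checked mechanically. A minimal Python sketch, assuming single-letter allele symbols with the uppercase letter completely dominant (the function names are illustrative, not from any library):

```python
from collections import Counter
from itertools import product

def punnett(female: str, male: str) -> Counter:
    """Count the diploid genotypes produced by every pairing of the
    gametes of two single-gene parents, e.g. 'Aa' x 'Aa'."""
    counts = Counter()
    for egg, sperm in product(female, male):
        # Sort so 'aA' and 'Aa' are recorded as the same genotype;
        # uppercase letters sort before lowercase, giving 'Aa'.
        counts["".join(sorted(egg + sperm))] += 1
    return counts

def phenotypes(genotypes: Counter, dominant: str = "A") -> Counter:
    """Collapse genotypes into phenotype classes, assuming complete dominance."""
    result = Counter()
    for genotype, n in genotypes.items():
        label = "dominant" if dominant in genotype else "recessive"
        result[label] += n
    return result

cross = punnett("Aa", "Aa")
print(dict(cross))              # {'AA': 1, 'Aa': 2, 'aa': 1}
print(dict(phenotypes(cross)))  # {'dominant': 3, 'recessive': 1}
```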

A Punnett square can be expanded to accommodate 
crosses involving two genes or even more (although 
they quickly become unwieldy, since a three-gene 
cross requires a 64-cell Punnett square). Punnett- 
square analysis of such multigene crosses illustrates 
Mendel’s law of independent assortment; when the 
individual genes are considered one at a time, it is 
easy to note that the expected genotype and pheno- 
type ratios among the offspring are not influenced by 
the other genes. 
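The same bookkeeping extends to several genes. This sketch assumes the genes are unlinked (so every allele combination is an equally likely gamete) and that each uppercase allele is completely dominant; it reproduces the 16- and 64-cell squares and shows that gene A, viewed on its own, still segregates 3:1 in a dihybrid cross. Names are illustrative.

```python
from collections import Counter
from itertools import product

def gametes(genotype: list[str]) -> list[str]:
    """All haploid gametes of a multi-gene genotype such as ["Aa", "Bb"],
    assuming the genes assort independently (i.e., are unlinked)."""
    return ["".join(alleles) for alleles in product(*genotype)]

def offspring_phenotypes(mother: list[str], father: list[str]) -> Counter:
    """Tally offspring phenotypes over every cell of the Punnett square,
    assuming the uppercase allele of each gene is completely dominant."""
    counts = Counter()
    for egg, sperm in product(gametes(mother), gametes(father)):
        # For each gene, show the dominant allele if either gamete carries it.
        counts["".join(e if e.isupper() else s for e, s in zip(egg, sperm))] += 1
    return counts

dihybrid = ["Aa", "Bb"]
trihybrid = ["Aa", "Bb", "Cc"]
print(len(gametes(dihybrid)) ** 2, len(gametes(trihybrid)) ** 2)  # 16 64

phenos = offspring_phenotypes(dihybrid, dihybrid)
print(dict(phenos))  # {'AB': 9, 'Ab': 3, 'aB': 3, 'ab': 1}

# Independent assortment: looking at gene A alone still gives 3:1.
a_dominant = sum(n for p, n in phenos.items() if p[0] == "A")
a_recessive = sum(n for p, n in phenos.items() if p[0] == "a")
print(a_dominant, a_recessive)  # 12 4
```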


See also: F1 Generation; Mendel’s Laws
Purine is the generic chemical name for a class of bicyclic nitrogen-containing aromatic bases. The term also refers to a specific compound (composition C5H4N4), not found in nature, that can be regarded as the parental structure for a range of naturally occurring chemical species. The most abundant naturally occurring purines are adenine (6-aminopurine) and guanine (2-amino-6-oxypurine), found in DNA and RNA as the nucleotidyl building blocks of these polymers. Other prominent purines include uric acid (2,6,8-trihydroxypurine), the major end product of purine metabolism in primates, and caffeine (1,3,7-trimethyl-2,6-dioxopurine), a stimulant found in tea and coffee.


See also: Pyrimidine 
Pyrimidine
‘Pyrimidine’ is the generic name for a class of aromatic, nitrogen-containing bases that have a six-membered heterocyclic ring system. The name also refers to a specific compound (composition C4H4N2), not found in nature, that can be regarded as the parental structure of a wide range of naturally occurring chemical species. The most abundant naturally occurring pyrimidines are uracil (2,4-dihydroxypyrimidine), cytosine (2-hydroxy-4-aminopyrimidine), and thymine (2,4-dihydroxy-5-methylpyrimidine). The first two are found predominantly in RNA, while the latter two are found predominantly in DNA. Small amounts of thymine are found in transfer RNA. The two pyrimidines found in DNA are usually base-paired with a purine residue on the complementary strand, so the purine to pyrimidine ratio in DNA is unity. In RNA, which is single-stranded, this ratio varies widely.
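To make the unity ratio concrete, here is a small sketch that counts purines and pyrimidines in a single strand and in a strand together with its complement. The sequence is an arbitrary example, and the helper names are hypothetical.

```python
PURINES = set("AG")
PYRIMIDINES = set("CTU")
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

def purine_pyrimidine_ratio(seq: str) -> float:
    """Ratio of purine (A, G) to pyrimidine (C, T, U) bases in a sequence."""
    seq = seq.upper()
    purines = sum(base in PURINES for base in seq)
    pyrimidines = sum(base in PYRIMIDINES for base in seq)
    return purines / pyrimidines

def with_complementary_strand(strand: str) -> str:
    """Both strands of a DNA duplex: the strand plus its reverse complement."""
    strand = strand.upper()
    return strand + "".join(COMPLEMENT[base] for base in reversed(strand))

strand = "AAGCTTGA"  # arbitrary illustrative sequence
print(purine_pyrimidine_ratio(strand))                             # 5/3 for one strand
print(purine_pyrimidine_ratio(with_complementary_strand(strand)))  # 1.0 for the duplex
```

Every purine on one strand is matched by a pyrimidine on the other, so any single-strand imbalance cancels in the duplex.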


See also: Cytosine; Purine; Thymine; Uracil 


Pyrimidine Dimers 
Adducts between two adjacent pyrimidine bases in a DNA strand make up more than 95% of the DNA lesions caused by UV light of wavelengths below 340 nm. Two types of these pyrimidine dimers are formed: cis-syn cyclobutane adducts (about two-thirds) and (6-4) photoproducts (about one-third). Both are mutagenic and block the progression of RNA polymerase. They can be repaired by the ubiquitous nucleotide excision repair process or by photolyases, which are found in almost all organisms except placental mammals. In humans, UV-induced DNA damage is the primary cause of all nonmelanoma skin cancers.
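Because dimers form between neighboring pyrimidines on the same strand, candidate sites can be listed by scanning a sequence for adjacent C/T pairs. This is a hypothetical helper for illustration; it marks where dimers could form, not where or how often they actually form.

```python
PYRIMIDINES = {"C", "T"}

def dipyrimidine_sites(strand: str) -> list[tuple[int, str]]:
    """Positions (0-based) of every pair of adjacent pyrimidines on one DNA
    strand, i.e. the sites where a pyrimidine dimer could form. Overlapping
    pairs (e.g. the two pairs inside 'TTT') are all reported."""
    s = strand.upper()
    return [
        (i, s[i:i + 2])
        for i in range(len(s) - 1)
        if s[i] in PYRIMIDINES and s[i + 1] in PYRIMIDINES
    ]

print(dipyrimidine_sites("GATTCAGCCA"))  # [(2, 'TT'), (3, 'TC'), (7, 'CC')]
```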


See also: Excision Repair; Photorepair 


Q-Banding 
Genetic maps of chromosomes can be obtained using a
variety of techniques. Crude chromosomal maps can 
be produced by staining chromosomes with dyes, 
such as quinacrine and Giemsa, which intercalate 
into helical DNA producing Q and G bands, respect- 
ively. This results in distinctive chromosomal banding 
patterns. However, since each band contains around 
5-10% of the chromosomal DNA, such chromosomal 
maps are extremely rough. 


See also: Chromosome Banding; G-Banding; 
Giemsa Banding, Mouse Chromosomes 
QTL mapping is a genome-wide inference of the relationship
between genotype at various genomic loca-
tions and phenotype for a set of quantitative traits in 
terms of the number, genomic positions, effects and 
interaction of quantitative trait loci (QTL). The pri- 
mary purpose of QTL mapping is to localize chromo- 
somal regions that significantly affect the variation of 
quantitative traits in a population. This localization is 
important for the ultimate identification of respon- 
sible genes and also for our understanding of genetic 
mechanisms of the variation. 

Mapping QTL can also help us to understand how 
many QTL significantly contribute to the trait varia- 
tion in a population. How much variation is due to the 
additive effects of QTL and how much due to domin- 
ant and epistatic effects of QTL? What is the nature of 
genetic correlation between different traits in a genomic
region, pleiotropy or close linkage? Do QTL interact
with environments? These questions are related to the
genetic architecture of quantitative traits in the popu- 
lation, and are intimately related to many applications 
in quantitative genetics, such as marker-assisted pre- 
diction or selection and marker-assisted gene intro- 
gression. 

Data for mapping QTL consist of genotypes at a number
of polymorphic genetic markers and quantitative
trait values for a number of individuals. Marker data 
are categorical and can be classified in different cat- 
egories and recorded in digital form, such as 1 or 0 for 
the presence or absence of a particular molecular band 
at a particular marker, or the two marker genotypes 
(homozygote and heterozygote) for a backcross 
population from two inbred lines. Based on segrega- 
tion analysis, these markers can be ordered in linkage 
groups or linearly on chromosomes to represent a 
genetic linkage map. Quantitative trait data are usually 
continuous, such as body weight, but can also be dis- 
crete, such as litter size. While marker data contain 
information about segregation of a genome in a popu- 
lation, quantitative trait data contain information 
about the variation of traits in the population. The 
two data sets are connected by QTL. A part of the 
trait variation is caused by the segregation of QTL 
which are linked to some of the markers in the gen- 
ome. So the statistical task of mapping QTL is to relate 
quantitative trait variation to genetic marker variation 
in terms of a quantitative genetic model that includes 
many genetic architecture parameters such as number, 
positions, effects and interactions of genes that affect 
the quantitative traits of interest. 

Traditional experimental designs for locating QTL start with two parental lines differing both in trait values and in the marker variants they carry. Suppose two pure-breeding lines, p1 and p2, have marker genotypes MN/MN and mn/mn for two markers. Crossing these lines produces f1 offspring that are doubly heterozygous, denoted MN/mn, where the slash separates the contributions from the two parents. Each f1 individual can produce four possible gametes, or marker allele combinations for transmission to the next generation. The proportions of these four gametes, MN, Mn, mN, and mn, are $(1 - r_{MN})/2$, $r_{MN}/2$, $r_{MN}/2$, and $(1 - r_{MN})/2$, respectively, where $r_{MN}$ is the recombination frequency between the two markers.
This segregation of gametes can be observed, for example, in the backcross populations b1 and b2 (b1 = f1 × p1 and b2 = f1 × p2) and also in f2 populations (f2 = f1 × f1). If a number of genetic markers and quantitative traits are observed in these and other populations, mapping can be performed to locate QTL.
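The gamete proportions above, and the usual way of estimating $r_{MN}$ from a backcross to the mn/mn parent, can be sketched as follows. The counts are made-up illustrative numbers, and the function names are hypothetical.

```python
def f1_gamete_frequencies(r: float) -> dict[str, float]:
    """Expected gamete proportions from a doubly heterozygous MN/mn f1
    individual, where r is the recombination frequency between M and N."""
    if not 0.0 <= r <= 0.5:
        raise ValueError("recombination frequency must lie in [0, 0.5]")
    return {"MN": (1 - r) / 2, "Mn": r / 2, "mN": r / 2, "mn": (1 - r) / 2}

def estimate_r_from_backcross(counts: dict[str, int]) -> float:
    """In a backcross to the mn/mn parent (b2 = f1 x p2), every offspring shows
    which f1 gamete it received, so r is estimated as recombinants / total."""
    recombinants = counts.get("Mn", 0) + counts.get("mN", 0)
    return recombinants / sum(counts.values())

print(f1_gamete_frequencies(0.2))
# Hypothetical counts of f1 gametes scored among 100 b2 offspring:
print(estimate_r_from_backcross({"MN": 42, "Mn": 8, "mN": 10, "mn": 40}))  # 0.18
```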


One Marker Analysis 


The simplest method of associating markers with quantitative trait variation is to test for trait value differences between different marker groups of individuals for a particular marker. For example, let $\hat\mu_{M/M}$ and $\hat\mu_{M/m}$ be the observed trait means of the groups of individuals with marker genotypes M/M and M/m for a marker in a backcross population. We can then test the significance of the difference between $\hat\mu_{M/M}$ and $\hat\mu_{M/m}$ using the usual t test with the statistic

$$t = \frac{\hat\mu_{M/m} - \hat\mu_{M/M}}{\sqrt{s^2\left(\dfrac{1}{n_{M/M}} + \dfrac{1}{n_{M/m}}\right)}}$$

where $s^2$ is the pooled sampling variance, and $n_{M/M}$ and $n_{M/m}$ are the corresponding sample sizes in each marker class. The hypotheses to be tested are $H_0\colon \mu_{M/M} = \mu_{M/m}$ and $H_1\colon \mu_{M/M} \neq \mu_{M/m}$.
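A minimal sketch of the pooled-variance t statistic, using only Python's standard library; the trait values are invented for illustration. The statistic would then be compared with a t distribution on $n_{M/M} + n_{M/m} - 2$ degrees of freedom; `scipy.stats.ttest_ind(trait_Mm, trait_MM)`, whose default `equal_var=True` uses the same pooled variance, should give the same value.

```python
import math
from statistics import mean, variance

def marker_t_statistic(trait_MM: list[float], trait_Mm: list[float]) -> float:
    """t = (mean_Mm - mean_MM) / sqrt(s^2 (1/n_MM + 1/n_Mm)), with s^2 the
    pooled sampling variance of the two marker classes."""
    n1, n2 = len(trait_MM), len(trait_Mm)
    # statistics.variance uses the n - 1 denominator (sample variance).
    s2 = ((n1 - 1) * variance(trait_MM) + (n2 - 1) * variance(trait_Mm)) / (n1 + n2 - 2)
    return (mean(trait_Mm) - mean(trait_MM)) / math.sqrt(s2 * (1 / n1 + 1 / n2))

# Hypothetical trait values for the two marker classes of a backcross.
t = marker_t_statistic([10, 12, 11, 13], [14, 15, 13, 16])
print(round(t, 3))  # 3.286
```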

To understand the relevance of this test to QTL mapping, we need to know exactly what is tested in genetic terms. Suppose that there are m QTL contributing to the genetic variation in a backcross population from two inbred lines. Ignoring epistasis, the expected difference between $\hat\mu_{M/M}$ and $\hat\mu_{M/m}$ is

$$E(\hat\mu_{M/M} - \hat\mu_{M/m}) = \sum_{i=1}^{m} (1 - 2r_i)\,a_i$$

where E denotes expectation, $a_i$ is the effect of the ith QTL expressed as a difference in effects between the recurrent parent homozygote and the heterozygote, and $r_i$ is the recombination frequency between the marker and the ith QTL. Essentially this means that we test a composite parameter that combines gene effects and recombination frequencies for (potentially) a number of genes. Of course, many QTL may not be linked to the marker and thus have a recombination frequency of 0.5. The above hypotheses are then equivalent to $H_0\colon$ all $r_i = 0.5$ and $H_1\colon$ at least one $r_i < 0.5$, because the $a_i$'s are usually nonzero by experimental design. If $\hat\mu_{M/M}$ and $\hat\mu_{M/m}$ are found to be significantly different, we conclude that the marker is linked to one or possibly more QTL. This analysis, however, cannot determine whether a significant marker effect is due to one or multiple QTL, or whether the effect is due to distantly linked QTL with large effects or closely linked QTL with small effects. With a dense linkage map, the second problem can be alleviated.
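The composite nature of the tested parameter is easy to see numerically. This sketch evaluates $\sum_i (1 - 2r_i)a_i$ for a few hypothetical QTL: a distant QTL with a large effect and a close QTL with a small effect produce the same expected marker contrast, and an unlinked QTL contributes nothing.

```python
import math

def expected_marker_difference(qtl: list[tuple[float, float]]) -> float:
    """E(mu_MM - mu_Mm) = sum_i (1 - 2 r_i) a_i, ignoring epistasis.
    Each element of qtl is a hypothetical (a_i, r_i) pair for one QTL."""
    return sum((1 - 2 * r) * a for a, r in qtl)

far_large = expected_marker_difference([(2.0, 0.25)])   # distant, large effect
near_small = expected_marker_difference([(1.25, 0.10)])  # close, small effect
unlinked = expected_marker_difference([(3.0, 0.5)])      # r = 0.5: no contribution
print(math.isclose(far_large, near_small), unlinked)     # True 0.0
```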


Interval Mapping 


Because single marker analysis cannot separate r and a 
in test and estimation even when there is only one 
QTL on a chromosome, Lander and Botstein (1989) 
proposed a maximum likelihood method that uses a 
pair of adjacent markers to test the effect of a genomic 
position within a chromosomal interval bracketed by 
two adjacent markers. This is an attempt to disentan- 
gle r and a in analysis. This method is called interval 
mapping. Specifically, for a backcross population they 
proposed the following linear model to test for a QTL 
located on an interval between two adjacent markers 


$$y_j = \mu + b^* x^*_j + e_j, \qquad j = 1, 2, \ldots, n$$


where $y_j$ is the quantitative trait value of the jth individual, $\mu$ is the mean of the model, $x^*_j$ is an indicator variable taking the value 1 or 0 for the two possible QTL genotypes, with probabilities that depend on the marker genotypes and the genomic position being tested, $b^*$ is the effect of the putative QTL, $e_j$ is a residual variable (usually assumed to be normally distributed with mean zero and variance $\sigma^2$), and n is the sample size. Since $x^*_j$ is usually unobserved for a particular genomic position and can take different values, statistically this is a mixture model. The likelihood function of the model is


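$$L(\mu, b^*, \sigma^2) = \prod_{j=1}^{n}\left[p_{1j}\,\phi(y_j \mid \mu + b^*, \sigma^2) + p_{0j}\,\phi(y_j \mid \mu, \sigma^2)\right]$$

where $\phi(y_j \mid \mu, \sigma^2)$ is a normal density function of $y_j$ with mean $\mu$ and variance $\sigma^2$, and $p_{kj}$ is the probability of $x^*_j = k$ given the marker data and the testing position of the putative QTL.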
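The probabilities $p_{1j}$ and $p_{0j}$ come from the flanking marker genotypes and the test position. A minimal sketch for a backcross, assuming no crossover interference and coding $x^*_j = 1$ when the f1 transmitted the parent-1 allele at the putative QTL (this coding and the function names are illustrative assumptions):

```python
def _transmission(r: float, a: int, b: int) -> float:
    """P(allele origin at the next locus = b | origin at this locus = a)."""
    return 1 - r if a == b else r

def qtl_probability(left: int, right: int, r_left: float, r_right: float) -> float:
    """P(x* = 1 | flanking markers) for the allele an f1 transmits in a backcross.

    left, right: 1 if the f1 transmitted the parent-1 allele at that flanking
    marker, 0 otherwise. r_left, r_right: recombination frequencies between
    the putative QTL and the left/right marker. Assumes no crossover
    interference, so allele origin along the chromosome is a Markov chain.
    """
    weight = {q: _transmission(r_left, left, q) * _transmission(r_right, q, right)
              for q in (0, 1)}
    return weight[1] / (weight[0] + weight[1])

p1 = qtl_probability(1, 1, 0.1, 0.1)  # concordant flanking markers
print(round(p1, 4), round(1 - p1, 4))  # p_1j and p_0j
print(qtl_probability(1, 0, 0.1, 0.1))  # discordant markers: 0.5
```

With the test position midway between the markers, concordant flanking markers make the QTL genotype nearly certain, while a crossover between them leaves it at 1/2.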

The test statistic can be constructed using a likelihood ratio (LR)

$$\mathrm{LR} = -2\ln\left[\frac{L(\tilde\mu, 0, \tilde\sigma^2)}{L(\hat\mu, \hat b^*, \hat\sigma^2)}\right]$$

to compare the null hypothesis $H_0\colon b^* = 0$ with the alternative hypothesis $H_1\colon b^* \neq 0$, assuming that the putative QTL is located at the point of consideration, where $\hat\mu$, $\hat b^*$, and $\hat\sigma^2$ are the maximum likelihood estimates of $\mu$, $b^*$, and $\sigma^2$ under $H_1$, and $\tilde\mu$ and $\tilde\sigma^2$ are the estimates of $\mu$ and $\sigma^2$ under $H_0$ with $b^*$ constrained to zero.


In human linkage analysis, however, the likelihood ratio test statistic has traditionally been expressed as a LOD (for log odds) score

$$\mathrm{LOD} = -\log_{10}\left[\frac{L(\tilde\mu, 0, \tilde\sigma^2)}{L(\hat\mu, \hat b^*, \hat\sigma^2)}\right]$$

Extending this tradition, many QTL mapping analyses also use the LOD score as a test statistic. There is a one-to-one correspondence between LR and LOD, and LR can be translated into LOD as $\mathrm{LOD} = \frac{1}{2}(\log_{10} e)\,\mathrm{LR} \approx 0.217\,\mathrm{LR}$.
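Given log-likelihoods evaluated at the maximum likelihood estimates under each hypothesis, LR and LOD follow directly. This standard-library sketch only evaluates the interval-mapping mixture log-likelihood at supplied parameter values; finding the $H_1$ estimates needs a numerical optimizer (such as EM), which is not shown here. The data are hypothetical.

```python
import math

def normal_pdf(y: float, mean: float, var: float) -> float:
    return math.exp(-((y - mean) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)

def interval_log_likelihood(y, p1, mu, b, var):
    """ln L(mu, b*, sigma^2) for the interval-mapping mixture, where p1[j] is
    P(x*_j = 1 | markers) and 1 - p1[j] is P(x*_j = 0 | markers)."""
    return sum(
        math.log(p * normal_pdf(yj, mu + b, var) + (1 - p) * normal_pdf(yj, mu, var))
        for yj, p in zip(y, p1)
    )

def likelihood_ratio(loglik_h0: float, loglik_h1: float) -> float:
    """LR = -2 ln(L0 / L1), from log-likelihoods maximized under H0 and H1."""
    return -2.0 * (loglik_h0 - loglik_h1)

def lod_from_lr(lr: float) -> float:
    """LOD = -log10(L0 / L1) = LR / (2 ln 10), i.e. about 0.217 * LR."""
    return lr / (2.0 * math.log(10.0))

print(round(lod_from_lr(1.0), 4))  # 0.2171

# Under H0 (b* = 0) the estimates are closed form: sample mean, MLE variance.
y = [9.8, 10.4, 11.9, 12.3]
p1 = [0.1, 0.2, 0.9, 0.95]
mu0 = sum(y) / len(y)
var0 = sum((v - mu0) ** 2 for v in y) / len(y)
loglik_h0 = interval_log_likelihood(y, p1, mu0, 0.0, var0)
```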

This test can be performed at any genomic position 
covered by markers and thus the method involves a 
systematic strategy of searching for QTL. If the like- 
lihood ratio test statistic at a genomic region exceeds a 
predefined critical threshold, a QTL is estimated at the 
position of the maximum test statistic. The estimates 
of locations and effects of QTL are asymptotically 
unbiased statistically with this maximum likelihood 
approach if there is only one QTL on a chromosome. 

It is important to determine an appropriate critical 
threshold for a test statistic above which a QTL can be 
claimed with a certain confidence. The determination 
of the critical threshold is based on the distribution of 
a test statistic under the null hypothesis. This distribution
for LR at a given position is generally asymptotically chi-square,
with degrees of freedom equal to the number of parameters
being tested. However,
because the test is usually performed in the whole 
genome, there is a multiple testing problem, and the 
distribution of the maximum LR or LOD score over 
the whole genome under the null hypothesis becomes 
very complicated. Theoretical and numerical analyses 
have indicated that the threshold at 5% significance 
level over a whole genome is generally between 2 and 
3.5 on LOD score, depending on the size of genome, 
density of markers, sample size and genetic model. 
Alternatively, the relevant threshold for a given data 
set can be estimated numerically from the data by 
using a permutation test. 
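A sketch of the permutation approach, assuming a backcross with markers coded 0/1 and using a single-marker regression LOD as a simple stand-in for the interval-mapping statistic (the same shuffling logic applies to any genome-scan statistic). The data are simulated with no real QTL, and all names are hypothetical.

```python
import math
import random

def marker_lod(genotypes: list[int], y: list[float]) -> float:
    """LOD for a single 0/1 marker: (n/2) log10(RSS0 / RSS1), comparing a
    common mean (RSS0) with separate means for the two marker classes (RSS1)."""
    n = len(y)
    ybar = sum(y) / n
    rss0 = sum((v - ybar) ** 2 for v in y)
    rss1 = 0.0
    for g in (0, 1):
        group = [v for v, x in zip(y, genotypes) if x == g]
        if group:
            m = sum(group) / len(group)
            rss1 += sum((v - m) ** 2 for v in group)
    return (n / 2) * math.log10(rss0 / rss1)  # assumes rss1 > 0

def genome_max_lod(markers: list[list[int]], y: list[float]) -> float:
    return max(marker_lod(g, y) for g in markers)

def permutation_threshold(markers, y, n_perm=1000, alpha=0.05, seed=0):
    """Genome-wide (1 - alpha) threshold: shuffle phenotypes across individuals,
    which destroys any genotype-phenotype association while keeping the marker
    data intact, and record the maximum LOD over all markers each time."""
    rng = random.Random(seed)
    shuffled = list(y)
    maxima = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        maxima.append(genome_max_lod(markers, shuffled))
    maxima.sort()
    return maxima[math.ceil((1 - alpha) * n_perm) - 1]

# Hypothetical data: 10 unlinked markers, 100 individuals, no real QTL.
rng = random.Random(42)
markers = [[rng.randint(0, 1) for _ in range(100)] for _ in range(10)]
y = [rng.gauss(0.0, 1.0) for _ in range(100)]
print(round(permutation_threshold(markers, y, n_perm=500), 2))
```

Taking the maximum over all positions in each permutation is what makes the threshold genome-wide rather than pointwise.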

The model of interval mapping is relatively simple in genetic terms. Because of this simplicity, it has a critical problem: if there are two or more QTL on a chromosome, the test statistic at a genomic position will be affected by all of those linked QTL. Therefore, the estimated positions and effects of QTL identified by this method can be biased. Moreover, some genomic regions that do not contain QTL can still show a significant LOD peak if there are multiple QTL in nearby regions. This is the so-called ‘ghost’ gene phenomenon. This defect is similar to the defect in single marker analysis discussed above.




Composite Interval Mapping 


Ideally, when we test a marker interval for a QTL, we 
would like our test statistic to be independent of
the effects of possible QTL located in other regions of 
the chromosome. If such a test can be constructed, we 
can break down the effects of linked QTL by statis- 
tical means to avoid the confounding effects of multi- 
ple linked QTL in the search for each individual QTL. 
In other words, we can test independently each inter- 
val for the presence of a QTL. Such a test can be 
constructed by using a combination of interval map- 
ping and multiple regression. 

In a multiple regression analysis of a trait on multiple
markers or other explanatory variables, each
regression coefficient is a partial regression coefficient 
conditional on other variables fitted in the model. 
Largely because of the linear structure of genes on 
chromosomes, a partial regression coefficient of a 
trait on a marker or a testing position of interest 
possesses a very important property that the coeffi- 
cient is expected to depend only on those QTL within 
an interval that is bracketed by two fitted flanking 
markers. The flanking markers are fitted in the 
model as cofactors to block the effects of other possibly
linked QTL on the test. This treatment makes the
partial regression coefficient independent of QTL 
effects on other linked or unlinked intervals, and is 
the basis of composite interval mapping. The linear 
independence, however, depends on the assumption of 
no crossing-over interference and no epistasis. Inter- 
ference and epistasis introduce nonlinearity into the 
model. 
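The blocking property of flanking markers can be checked by simulation. This NumPy sketch (simulated backcross, no interference, hypothetical QTL placed at markers 1 and 5) compares the simple regression coefficient of the trait on marker 3 with its partial regression coefficient when flanking markers 2 and 4 are fitted as cofactors. The first picks up the linked QTL outside the interval; the second is expected to be near zero.

```python
import numpy as np

def simulate_backcross(n: int, n_markers: int, r: float, rng) -> np.ndarray:
    """0/1 marker genotypes along one chromosome, adjacent markers separated
    by recombination frequency r, assuming no crossover interference."""
    x = np.empty((n, n_markers))
    x[:, 0] = rng.integers(0, 2, size=n)
    for k in range(1, n_markers):
        recombined = rng.random(n) < r
        x[:, k] = np.where(recombined, 1 - x[:, k - 1], x[:, k - 1])
    return x

def regression_coefficients(y: np.ndarray, *columns: np.ndarray) -> np.ndarray:
    """Least-squares coefficients of y on the given columns (intercept dropped)."""
    design = np.column_stack([np.ones_like(y), *columns])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef[1:]

rng = np.random.default_rng(1)
x = simulate_backcross(5000, 5, 0.1, rng)
# Hypothetical QTL with effect 1 at markers 1 and 5 (columns 0 and 4).
y = x[:, 0] + x[:, 4] + rng.normal(0.0, 1.0, size=5000)

simple = regression_coefficients(y, x[:, 2])[0]
partial = regression_coefficients(y, x[:, 2], x[:, 1], x[:, 3])[0]
print(round(simple, 2), round(partial, 2))  # about 1.28 vs. about 0
```

The simple coefficient is expected to be $2(1 - 2 \times 0.18) = 1.28$, since markers 1 and 5 are each two intervals (recombination frequency 0.18) from marker 3.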

Specifically, to test for a QTL on an interval 
between two adjacent markers, we can extend the 
interval mapping model to 


$$y_j = \mu + b^* x^*_j + \sum_k b_k x_{jk} + e_j$$

where $x_{jk}$ is an indicator variable referring to the genotype of marker k that is selected to control the genetic background, $b_k$ is the partial regression coefficient associated with marker k, and $b^*$ is now also a partial regression coefficient, associated with the putative QTL. In this case, the likelihood function becomes

$$L(b^*, \mathbf{b}, \sigma^2) = \prod_{j=1}^{n}\left[p_{1j}\,\phi(y_j \mid \mathbf{x}_j\mathbf{b} + b^*, \sigma^2) + p_{0j}\,\phi(y_j \mid \mathbf{x}_j\mathbf{b}, \sigma^2)\right]$$

where $\mathbf{x}_j\mathbf{b} = \mu + \sum_k b_k x_{jk}$.


A likelihood ratio test statistic can also be constructed to compare the hypotheses $H_0\colon b^* = 0$ and $H_1\colon b^* \neq 0$. However, since $b^*$ is a partial regression coefficient, the null hypothesis is a composite hypothesis, conditional on the other partial regression coefficients in the model. Thus the method is called composite interval mapping. Many statistical issues of composite interval mapping were discussed in Zeng (1994).

Like interval mapping, this test can be performed at any position in a genome covered by markers, so it also gives a systematic strategy for searching for QTL in a genome. The main advantage of composite interval mapping over interval mapping is its ability to separate the effects and locations of multiple linked QTL. Figure 1 shows an example: it summarizes analyses mapping body weight loci on mouse chromosome X in a backcross population (Dragani et al., 1995). The test statistic (LOD score) of the interval mapping and composite interval mapping analyses is plotted against the linkage map location on the chromosome, referenced by 14 microsatellite markers. The LOD score at each map position indicates the strength of evidence for a QTL at that position. If the LOD score in a genomic region exceeds a predetermined threshold, one or more QTL are indicated in that region.

For interval mapping, the threshold is 3.3 for this experimental design. By this criterion, the LOD score over most of chromosome X is above the threshold and shows significant peaks in several marker intervals. However, not all significant peaks can be interpreted as QTL, because of linkage effects, the ‘ghost’ gene phenomenon, and statistical sampling effects. Although the analysis strongly supports the existence of segregating QTL on chromosome X, the interval mapping analysis does not make clear how many QTL are on the chromosome or where they are located.

The LOD score of the composite interval mapping analysis shows two distinct major peaks. This suggests that there are at least two body weight QTL on chromosome X in the mouse genome: one maps near marker Rp18-rs11 and the other near DXMit60. Together the two QTL explain 25% of the phenotypic variance in the mapping population. In this case, composite interval mapping achieved much better resolution in mapping QTL.


Multiple Interval Mapping 


Composite interval mapping still has some limitations. One is that the analysis can be affected by an uneven distribution of markers in the genome, meaning that the test statistic in a marker-rich region may not be comparable to that in a marker-poor region. It is also difficult to estimate epistasis of multiple QTL and the contribution of multiple QTL to the phenotypic variance. These limitations can be removed if multiple QTL are searched for and mapped simultaneously. This is the idea of multiple interval mapping, which fits multiple putative QTL, including epistasis, in one model to search for, test, and estimate the positions, effects, and interactions of multiple QTL simultaneously.

Figure 1 Genetic mapping of body weight loci on chromosome X in a mouse backcross population. LOD score curves of a composite interval mapping analysis (solid curve) and an interval mapping analysis (dashed curve) are shown on a map containing 14 molecular markers. By the interval mapping analysis, most of the chromosome appears to show significant effects on body weight. The composite interval mapping analysis strongly indicates that there are two body weight loci on chromosome X segregating in the population.

Multiple interval mapping (Kao et al., 1999; Zeng 
et al., 1999) consists of four components: (1) an 
evaluation procedure designed to analyze the like- 
lihood of the data given a genetic model (number, 
genomic positions and epistatic pattern of QTL); (2) 
a search strategy optimized to select the best genetic 
model (among those sampled) in the parameter space; 
(3) an estimation procedure for all the genetic para- 
meters of quantitative traits, including the number, 
positions, effects, and epistasis of QTL, and genetic 
variances and covariances explained by QTL effects, 
given the selected genetic models; and (4) a prediction 
procedure to estimate or predict the genotypic values 
of individuals and their offspring based on the selected 
genetic model and estimated genetic parameter values 
for marker assisted selection or prediction. 

For m putative QTL in a backcross population, the multiple interval mapping model is defined by

$$y_i = \mu + \sum_{r=1}^{m} \alpha_r x^*_{ir} + \sum_{r < s \,\in\, (1,\ldots,m)} \beta_{rs}\,(x^*_{ir} x^*_{is}) + e_i$$

where $y_i$ is the phenotypic value of individual i and $x^*_{ir}$ is a coded variable denoting the genotype of putative QTL r (defined as 1/2 or -1/2 for the two genotypes). The variable $x^*_{ir}$ is unobserved, but its conditional probability given the observed marker phenotypes can be analyzed. Parameters of the model include the mean ($\mu$), the marginal effects ($\alpha_r$'s) and epistatic effects ($\beta_{rs}$'s) of the putative QTL, and the variance ($\sigma^2$) of the residual effect ($e_i$, assumed to be normally distributed with mean zero). To avoid overparameterization, only a subset of significant pairwise QTL epistatic effects, indicated by $r < s \in (1, \ldots, m)$, is selected for inclusion in the model.

Since the genotypes of an individual at many genomic locations are not observed (but marker phenotypes are), the model contains missing data, and the likelihood function of the data given the model is a mixture of normal distributions

$$L(E, \mu, \sigma^2) = \prod_{i=1}^{n}\left[\sum_{j=1}^{2^m} p_{ij}\,\phi(y_i \mid \mu + D_j E, \sigma^2)\right]$$

The term in square brackets is the weighted sum of a series of normal density functions, one for each of the $2^m$ possible multiple-QTL genotypes. $p_{ij}$ is the probability of each multilocus genotype conditioned on the marker data. The QTL parameters ($\alpha$'s and $\beta$'s) are contained in the column vector E, while the row vector $D_j$ specifies the configuration of $x^*$'s associated with each $\alpha$ and $\beta$ for the jth QTL genotype.
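The row vectors $D_j$ can be enumerated directly. A minimal sketch, assuming the ±1/2 coding described above and an illustrative, user-chosen list of epistatic pairs; the parameter values in the example are hypothetical.

```python
from itertools import product

def genotype_design(m: int, epistatic_pairs: list[tuple[int, int]]) -> list[list[float]]:
    """Rows D_j for all 2**m multi-QTL genotypes of a backcross.

    Each putative QTL r is coded x*_r = +1/2 or -1/2. Columns hold the
    marginal terms x*_1..x*_m followed by x*_r * x*_s for each selected
    epistatic pair (r, s) with r < s (0-based indices here)."""
    rows = []
    for codes in product((0.5, -0.5), repeat=m):
        row = list(codes) + [codes[r] * codes[s] for r, s in epistatic_pairs]
        rows.append(row)
    return rows

def genotype_mean(mu: float, d_row: list[float], effects: list[float]) -> float:
    """mu + D_j E: expected phenotype of genotype j given E = (alphas, betas)."""
    return mu + sum(d * e for d, e in zip(d_row, effects))

D = genotype_design(3, [(0, 1)])
print(len(D), len(D[0]))  # 8 4
print(D[0])               # [0.5, 0.5, 0.5, 0.25]
# Hypothetical parameter values, for illustration only:
print(genotype_mean(10.0, D[0], [1.0, -0.4, 0.6, 0.8]))
```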

The analysis of the likelihood can be performed through a numerical EM (expectation/maximization) algorithm. The EM algorithm is an iterative procedure involving an E-step (expectation) and an M-step (maximization) in each iteration. In the [t + 1]th iteration, the E-step is

$$\pi_{ij}^{[t+1]} = \frac{p_{ij}\,\phi(y_i \mid \mu^{[t]} + D_j E^{[t]}, \sigma^{2[t]})}{\sum_{j'=1}^{2^m} p_{ij'}\,\phi(y_i \mid \mu^{[t]} + D_{j'} E^{[t]}, \sigma^{2[t]})}$$

and the M-step is

$$E_r^{[t+1]} = \frac{\sum_{i=1}^{n}\sum_{j=1}^{2^m} \pi_{ij}^{[t+1]} D_{jr}\left(y_i - \mu^{[t]} - \sum_{s \neq r} D_{js} E_s\right)}{\sum_{i=1}^{n}\sum_{j=1}^{2^m} \pi_{ij}^{[t+1]} D_{jr}^2}$$

where $E_r$ is the rth element of E and $D_{jr}$ is the rth element of $D_j$. Given initial values for the parameters E, the algorithm iterates between the E- and M-steps until the estimates converge.
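An EM fit of this mixture can be sketched in NumPy. This is an illustration under stated assumptions rather than the published procedure: its M-step solves for $\mu$ and all of E jointly by weighted least squares, with weights equal to the posterior genotype probabilities, and then updates $\sigma^2$, instead of updating the elements $E_r$ one at a time. The example uses one simulated QTL with made-up, calibrated genotype probabilities.

```python
import numpy as np

def normal_density(y, mean, var):
    return np.exp(-((y - mean) ** 2) / (2.0 * var)) / np.sqrt(2.0 * np.pi * var)

def em_multiple_qtl(y, p, D, n_iter=500, tol=1e-10):
    """EM for the mixture likelihood prod_i sum_j p_ij phi(y_i | mu + D_j E, s2).

    y: (n,) phenotypes; p: (n, G) genotype probabilities given markers;
    D: (G, K) design rows D_j. Returns (mu, E, sigma2)."""
    y, p, D = (np.asarray(a, dtype=float) for a in (y, p, D))
    n, G = p.shape
    X = np.column_stack([np.ones(G), D])  # genotype j's row is [1, D_j]
    mu, E, var = y.mean(), np.zeros(D.shape[1]), y.var()
    for _ in range(n_iter):
        # E-step: posterior probability of genotype j for individual i.
        dens = p * normal_density(y[:, None], mu + D @ E, var)
        post = dens / dens.sum(axis=1, keepdims=True)
        # M-step: weighted least squares of y_i on [1, D_j], weights post_ij.
        w = post.sum(axis=0)
        beta = np.linalg.solve((X.T * w) @ X, X.T @ (post.T @ y))
        new_mu, new_E = beta[0], beta[1:]
        new_var = float((post * (y[:, None] - (new_mu + D @ new_E)) ** 2).sum() / n)
        change = abs(new_mu - mu) + np.abs(new_E - E).sum() + abs(new_var - var)
        mu, E, var = new_mu, new_E, new_var
        if change < tol:
            break
    return mu, E, var

# Simulated single QTL (m = 1): D = [[+1/2], [-1/2]], true mu = 10, E = [2], s2 = 1.
rng = np.random.default_rng(7)
n = 2000
prior0 = rng.choice([0.85, 0.15], size=n)        # hypothetical P(genotype 0 | markers)
genotype = (rng.random(n) >= prior0).astype(int)  # genotype 0 with probability prior0
D = np.array([[0.5], [-0.5]])
y = 10.0 + 2.0 * D[genotype, 0] + rng.normal(0.0, 1.0, n)
p = np.column_stack([prior0, 1.0 - prior0])
mu, E, var = em_multiple_qtl(y, p, D)
print(round(mu, 2), np.round(E, 2), round(var, 2))
```

Because the genotype probabilities are informative, the estimates should land close to the simulated values; with several QTL, D simply gains columns and rows.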

The test for each QTL effect, say $E_r$, is performed by a likelihood ratio test conditioned on the other QTL effects:

$$\mathrm{LOD} = \log_{10}\frac{L(E_1 \neq 0, \ldots, E_{m+t} \neq 0)}{L(E_1 \neq 0, \ldots, E_{r-1} \neq 0, E_r = 0, E_{r+1} \neq 0, \ldots, E_{m+t} \neq 0)}$$


For given positions of m putative QTL and m + t 
QTL effects, the likelihood analysis can proceed as 
outlined above. The main task is then to search for and
select the best genetic model (number, genomic positions,
and epistatic pattern of QTL) that fits the data
well. Searching for multiple QTL in a parameter space of multiple
(unknown) dimensions is a very difficult task. Several issues
have to be considered and balanced in designing an 
efficient algorithm for the search process. On the one 
hand, we need to consider the reliability and robust- 
ness of an algorithm, and on the other hand we need 
also to consider its efficiency and applicability. Several 
methods have been used for this process, such as step- 
wise model selection, genetic algorithms and Markov 
chain Monte Carlo. 

The stepwise model selection consists of a number 
of components. There is a search step that searches the 
genome for the position of new QTL given the current 
genetic model (a forward step); an epistasis step that 
searches for significant interaction effects of the newly 
identified QTL with other QTL in the model (a part 
of the forward step); an evaluation step that evaluates 
each QTL effect fitted in the model for significance 
under the new model and drops any nonsignificant
effect (a backward step); an optimization step that 
optimizes the estimates of genomic position of each 
QTL fitted in the model under the new model; a 
stopping rule that determines the termination of the 
search process; and an estimation step that reports 
estimates of various genetic architecture parameters,
composite genetic parameters, and individual geno- 
typic values. Estimation of individual genotypic 
values and prediction of offspring genotypic values 
of two individuals can provide a basis for marker- 
assisted selection. 

As an example, Figure 2 shows the mapping result 
by multiple interval mapping for a morphological 
shape difference between two Drosophila species 
(Zeng et al., 2000). Two Drosophila species, D. simulans
and D. mauritiana, were crossed to make F1
hybrids. Because F1 males are sterile, F1 females
were backcrossed to each of the parental
species to create two backcrosses. There are about 500 
individuals in each backcross. The trait is the mor- 
phology of the posterior lobe of the male genital arch 
analyzed as the first principal component in an ellip- 
tical Fourier analysis. After extensive search analysis 
using multiple interval mapping, the model is stabil- 
ized at 19 QTL with six significant epistatic terms in 
the backcross to D. mauritiana. Figure 2 depicts the 
likelihood profile (LOD score) for each QTL that 
spans between its neighbors. The peak of each like- 
lihood profile provides an estimate of the position of a 
QTL on the genetic linkage map. 

Figure 2 LOD profiles of the 19 putative QTL mapped by multiple interval mapping for chromosomes X, 2, and 3 of Drosophila. Marker positions are shown by triangles.

Mapping QTL is not restricted to backcross and F2 populations of inbred lines. Mapping methods can be extended and applied to other crosses of populations or species, or to segregating populations. For species such as humans, QTL mapping has to be carried out in existing segregating populations. No
matter what population is analyzed, the general idea 
of QTL mapping analysis is based on the inference of 
genotypes and an appropriate model that relates a trait 
to the genotypes or combinations of genotypes at a 
number of genomic positions. However, for mapping 
QTL with segregating populations, statistical analyses 
become much more complicated due to a number of 
limiting factors in data, such as small family size, 
unknown linkage phases between markers and QTL, 
and complicated family structures. Many statistical 
methods for mapping QTL from segregating popula- 
tions have been developed. These include, for ex- 
ample, the sib-pair methods, the identity-by-descent 
mapping, and some Bayesian methods that incorpor- 
ate Markov chain Monte Carlo algorithms. More 
studies are needed to generalize these and other meth- 
ods to make them applicable to the wide variety of 
populations or experimental designs, data structures, 
and genetic models. 

See Falconer and Mackay (1996) and Lynch and 
Walsh (1997) for more general discussion on the genetic 
basis of QTL and on genetic and statistical analyses 
for mapping QTL. 
Inheritance; QTL (Quantitative Trait Locus);
Most of the phenotypic characteristics that distinguish
different individuals within a natural population are not
of the all-or-none variety associated with laboratory- 
bred mouse mutations like albino, tailless, Kinky tail, 
and hundreds of others. On the contrary, easily visible 
human traits such as skin color, wavy hair, and height, 
as well as hidden traits such as blood pressure, musical 
talent, longevity, and many others each vary over a 
continuous range of phenotypes. These are ‘quantita- 
tive traits’ which are so called because their expression 
in any single individual can only be described numer- 
ically based on the results of an appropriate form of 
measurement. Quantitative traits are also called con- 
tinuous traits, and they stand in contrast to qualitative, 
or discontinuous, traits that are expressed in the form of
distinct phenotypes chosen from a discrete set. 

Continuous variation in the expression of a trait 
can be due to both genetic and nongenetic factors. 
Nongenetic factors can be either environmental (in 
the broadest definition of the term) or a matter of 
chance. In experimental animals like mice, it is rela- 
tively straightforward to separate genetic from non- 
genetic contributions through the analysis and 
comparison of animals within and between inbred 
strains. Variation in expression among individual 
members of an inbred strain must be caused by non- 
genetic factors. Furthermore, if one is convinced that 
all individuals are maintained under identical environ- 
mental conditions, then existing variation is likely to 
be the result of chance alone. 

Geneticists are, obviously, most interested in the 
genetic contribution to a quantitative trait. A genetic 
contribution cannot be demonstrated by looking at 
individuals from a single inbred strain alone. Rather, 
a comparison of expression levels must be made on 
sets of animals from two different inbred strains. If a 
significant strain-specific difference is demonstrated, 
and all other variables have been controlled for, it 
becomes possible to attribute the observed difference 
in quantitative expression to allelic differences at mul- 
tiple quantitative trait loci or QTLs. 

With all of the new approaches to mapping that 
have been developed over the last decade, it has 
become possible, for the first time, to follow the seg- 
regation of the whole genome from each parent to 
each individual offspring in a cross. This, in turn, has 
allowed investigators to consider the exciting possibility
of mapping and identifying QTLs that control
complex traits. 

The appearance of a quantitative trait usually 
signifies the involvement of multiple genetic loci, 
although this need not be the case. In particular, a 
single polymorphic locus with multiple, differentially 
expressed alleles can give rise to continuous variation 
within a natural population. There may also be some 
instances where the expression of a quantitative trait is 
controlled by a mutant allele at a single locus with a 
high degree of variable expressivity. 


Dissecting Complex Traits with Use of 
Human Pedigrees 


Until very recently, the vast majority of complex traits 
were beyond the reach of human geneticists. But with 
the development of high density maps of polymorphic 
molecular markers and computer-assisted multipoint 
linkage analysis, the last frontier of transmission 
genetics appears to be firmly within reach. 

The first step in the analysis of complex traits is not 
any different from the first step used in the mapping of 
simple traits like cystic fibrosis: all of the individuals 
within a large set of disease-transmitting families are 
typed at marker loci that together provide linkage 
coverage over the entire genome. However, once this 
large marker genotype data set is obtained, the method 
of analysis must be adjusted to the specific form of 
inheritance that appears to be associated with the dis- 
ease under analysis. For example, with incompletely 
penetrant traits, the computer must be instructed that 
the absence of a mutant phenotype need not imply the 
absence of a disease genotype. With polygenic traits, 
the computer will be programmed to anticipate evi- 
dence of disease linkage to multiple genomic regions 
identified by unlinked marker loci. And finally, a 
limited degree of heterogeneity can sometimes be 
resolved by programming a computer to perform an 
‘either—or’ search for linked loci in different families. 
As is obvious from this discussion, the dissection of 
complex traits in humans will continue to be very 
much dependent on the development of more sophis- 
ticated computer algorithms that take all inheritance 
possibilities into account and, at the same time, 
provide an accurate estimation of the likelihood 
of any particular linkage relationship that may be un- 
covered. 
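The likelihood estimation described above can be illustrated with the classical LOD score in its simplest setting: phase-known, fully penetrant meioses, each scored as recombinant or nonrecombinant between a marker and a disease locus. This is a minimal sketch with hypothetical function names; programs for complex traits must additionally model incomplete penetrance, unknown phase, and heterogeneity, none of which is attempted here.

```python
import math


def lod_score(recombinants, nonrecombinants, theta):
    """log10 of L(theta) / L(0.5) for phase-known meioses.

    Each meiosis is assumed to be scored unambiguously as recombinant
    or nonrecombinant between the marker and the trait locus.
    """
    if not 0.0 <= theta <= 0.5:
        raise ValueError("theta must lie in [0, 0.5]")
    if theta == 0.0 and recombinants > 0:
        return float("-inf")  # any recombinant rules out complete linkage
    n = recombinants + nonrecombinants
    log_l = 0.0
    if recombinants:
        log_l += recombinants * math.log10(theta)
    if nonrecombinants:
        log_l += nonrecombinants * math.log10(1.0 - theta)
    # Subtract the log-likelihood under free recombination (no linkage).
    return log_l - n * math.log10(0.5)


def max_lod(recombinants, nonrecombinants, steps=50):
    """Grid search over theta in [0, 0.5]; returns (best LOD, best theta)."""
    grid = [0.5 * k / steps for k in range(steps + 1)]
    return max((lod_score(recombinants, nonrecombinants, t), t) for t in grid)


# Hypothetical family data: 2 recombinants among 20 informative meioses.
best_lod, best_theta = max_lod(2, 18)
print(f"max LOD = {best_lod:.2f} at theta = {best_theta:.2f}")
```

A maximum LOD of 3 or more (odds of 1000:1 in favor of linkage) is the conventional threshold for a simple Mendelian trait; genome-wide scans for complex traits are usually held to stricter criteria.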


Dissection of Complex Traits with Use of 
Model Organisms 


The use of model mammalian organisms can provide a
powerful alternative for the analysis of traits that may
be too complex for dissection with the use of human
pedigrees. The model mammal of choice in nearly all 
cases has been the mouse, although the rat has been 
useful in some special cases. 

QTL analysis in mice or rats is begun by first 
identifying two inbred (homogeneous) strains with 
reproducibly extreme differences in the expression of 
the trait of interest. For example, let us say that you 
were interested in studying the quantitative trait 
defined by the amount of liquid intake that is attribu- 
table to a 10% solution of ethanol when an animal is 
given a choice between this and water alone. By test- 
ing various inbred strains of mice, you find consider- 
able interstrain variability in alcohol preference 
ranging from a low of 10% intake in strain DBA 
(considered to be alcohol-avoiding) to a high of 
80% intake in strain B6 (considered to be alcohol- 
preferring). By testing a large number of the animals 
in each of these strains, you would find a low level of 
intrastrain variation (±10% intake), indicating a strong
genetic component to this quantitative trait. 

In practice, a quantitative trait is most amenable to 
genetic analysis in mice and other experimental organ- 
isms with a pair of inbred strains that show nonover- 
lapping distributions in measured levels of expression 
among at least 20 members of each group. Although a 
significant strain-specific difference can be demon- 
strated under much less stringent criteria, it becomes 
more and more difficult to ferret out the QTLs 
involved as the possibility of phenotypic overlap in- 
creases. 
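The practical criterion in the preceding paragraph can be written as a simple screen. This sketch uses a hypothetical function name and invented preference ratios (ethanol intake divided by total fluid intake) loosely echoing the DBA and B6 example; it checks only group size and range overlap, not statistical significance.

```python
def suitable_strain_pair(strain_a, strain_b, min_per_group=20):
    """True if each strain has at least `min_per_group` measured animals and
    the two distributions do not overlap (every value in one strain lies
    below every value in the other)."""
    if len(strain_a) < min_per_group or len(strain_b) < min_per_group:
        return False
    return max(strain_a) < min(strain_b) or max(strain_b) < min(strain_a)


# Hypothetical ethanol-preference ratios for 20 animals per strain.
low_strain = [0.05 + 0.005 * k for k in range(20)]   # roughly 0.05 to 0.145
high_strain = [0.75 + 0.005 * k for k in range(20)]  # roughly 0.75 to 0.845
print(suitable_strain_pair(low_strain, high_strain))
print(suitable_strain_pair(low_strain[:10], high_strain))  # too few animals
```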

You would begin your genetic analysis by crossing 
together mice from these two strains to obtain an 
identical set of F1 hybrids, and these F1 animals
would be either crossed to each other (intercross) or
crossed back to one of their parental strains (backcross)
to obtain a set of several hundred second-generation
offspring. With recombination in the F1
parent(s), the second generation offspring will have a 
broad range of different sets of genotypes at different 
loci: some will have an alcohol preference similar to 
one parent, some will be similar to the other parent, 
and many will lie in between. 
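Under simplifying assumptions that the text does not make (unlinked loci, each fixed for a "high" allele in one inbred strain and a "low" allele in the other, with no environmental variance), the spread of intercross genotypes can be counted directly. In this sketch, with a hypothetical function name, the number of high alleles carried by an F2 animal follows a binomial distribution, so the fraction recovering either parental extreme exactly falls as (1/4)^n.

```python
from math import comb


def f2_high_allele_distribution(n_loci):
    """P(an F2 animal carries k 'high' alleles), for k = 0 .. 2 * n_loci.

    Assumes unlinked loci, so each of the 2n alleles an F2 animal receives
    (one per locus from each F1 parent) is 'high' with probability 1/2,
    independently of the others.
    """
    total = 2 * n_loci
    return [comb(total, k) / 2 ** total for k in range(total + 1)]


for n in (1, 2, 5):
    dist = f2_high_allele_distribution(n)
    # Only k = 0 and k = 2n reproduce a parental genotype at every locus.
    print(n, "loci: each parental class =", dist[0],
          " intermediate =", 1 - 2 * dist[0])
```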

If a significant number of second-generation animals
are found to express phenotypes intermediate to
those found in the parental strains and F1 hybrid, it is
most likely that multiple genetic differences between
the progenitor strains are responsible (a statistical
approach can be used to ascertain whether a significant
difference in expression exists between any two sets of
animals). The term polygenic is used to describe traits
that are controlled by multiple genes, each of which
has a significant impact on expression. The
term multifactorial is also used to describe such traits,
but is more broadly defined to include those traits
controlled by a combination of at least one genetic
factor with one or more environmental factors. By 
scanning the genome of each offspring with markers 
that cover the whole mouse genome, it will be possible 
to identify all of the major loci that play a role in 
determining the expression of this trait. 
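A genome scan of this kind can be sketched for a backcross, in which each offspring is either heterozygous or homozygous at every marker. The data below are simulated and entirely hypothetical (marker count, QTL positions, effect sizes, and noise are invented, and markers are treated as unlinked with each QTL sitting exactly at a marker). At each marker the sketch computes the regression-based LOD score, (n/2) log10(RSS0/RSS1), which compares a model with a genotype-class effect at that marker against a single overall mean. Real scans use linked marker maps and interval mapping.

```python
import math
import random


def simulate_backcross(n_mice, n_markers, qtl_effects, noise_sd, seed=1):
    """Hypothetical backcross data: genotype 1 = heterozygous, 0 = homozygous.

    Markers are drawn independently (unlinked); qtl_effects maps a marker
    index to the phenotypic effect of being heterozygous there.
    """
    rng = random.Random(seed)
    genotypes, phenotypes = [], []
    for _ in range(n_mice):
        g = [rng.randint(0, 1) for _ in range(n_markers)]
        y = sum(effect * g[m] for m, effect in qtl_effects.items())
        genotypes.append(g)
        phenotypes.append(y + rng.gauss(0.0, noise_sd))
    return genotypes, phenotypes


def marker_lod(genotypes, phenotypes, m):
    """LOD = (n/2) * log10(RSS0 / RSS1) for a genotype-class effect at marker m."""
    n = len(phenotypes)
    grand_mean = sum(phenotypes) / n
    rss0 = sum((y - grand_mean) ** 2 for y in phenotypes)  # no-QTL model
    rss1 = 0.0                                             # one mean per class
    for code in (0, 1):
        ys = [y for g, y in zip(genotypes, phenotypes) if g[m] == code]
        if ys:
            mu = sum(ys) / len(ys)
            rss1 += sum((y - mu) ** 2 for y in ys)
    return (n / 2) * math.log10(rss0 / rss1)


genos, phenos = simulate_backcross(500, 10, {3: 3.0, 7: 1.5}, noise_sd=1.0)
lods = [marker_lod(genos, phenos, m) for m in range(10)]
for m, lod in enumerate(lods):
    print(f"marker {m}: LOD {lod:.1f}")
```

Markers 3 and 7 should stand out with large LOD scores; the exact values depend on the random seed.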

The analysis of complex traits in mice or rats 
according to the protocol just described has numerous 
advantages over studies in humans. First, genetic het- 
erogeneity is completely eliminated since the analysis 
is conducted on just two contrasting genotypes repre- 
sented by the two parental inbred strains. Second, 
environmental variation can be eliminated as a vari- 
able since all animals can be maintained under identical 
conditions. Third, data analysis is greatly simplified 
with the transmission of only two alternative alleles at 
every locus, both marker and disease. In fact, an 
experimental cross of this type can be viewed as a 
large family of several hundred children born to a 
single set of parents. 

This approach has been used with success in dis- 
secting a number of complex traits and its use is likely 
to mushroom quickly with time. Already, ten distinct 
loci with a role in insulin-dependent diabetes and four 
with a role in epilepsy have been mapped. Once a 
locus that plays a role in animal disease is mapped, it 
becomes possible for human geneticists to focus in on 
the homologous region of the human genome to look 
directly for marker linkage to the human disease coun- 
terpart. This approach was used with great success to 
move from a locus found to play a role in hypertension
in the rat to a gene that plays a role in human hyper- 
tension. 

There is an important limitation to animal models
that must be kept in mind, which is the flip side of the
advantage provided by the elimination of heterogeneity
through the choice of just two contrasting inbred
strains. The analysis only allows the mapping of those 
loci that have different alleles in the two chosen 
strains. If a trait is naturally heterogeneous within 
mice, the choice of a different set of inbred strains 
might yield a different set of predisposing loci. For 
example, in the B6 x DBA cross, the alcohol prefer- 
ence loci 1, 2, and 3 may be detected, but if the cross 
had been performed between strains B6 and C3H, the 
loci 1, 4, and 5 might be detected as responsible for 
alcohol preference, and with other strain combin- 
ations, there might be other sets of loci. Even if all of 
these loci have human homologs that play a role in the 
human condition of alcoholism (and this is by no 
means guaranteed), the loci discovered in one experi- 
mental cross may, by chance, not be equivalent to those 
responsible for the most prevalent form of the human 
disease. Even with this caveat, the ease with which 
complex traits can be analyzed in an experimental 
cross is so great that it is sure to be pursued in vastly
increasing numbers of studies with time. 


See also: Complex Traits; Multifactorial 
Inheritance; Neoteny 
Because a quantitative trait may be influenced by
many genes and environmental variability also plays 
a role in its expression, such genes cannot be studied 
individually by using the methods of classical 
Mendelian genetics. Until recently, all that was obser- 
vable was variability among individuals and resemb- 
lances between relatives, which are, respectively, 
manifested numerically by variances and covariances. 
But methods have now become available for studying 
effects of individual genes known as quantitative trait 
loci (QTLs). The goal is to locate at least those genes 
with relatively large effects (major genes). 


Classical Quantitative Genetics 


R.A. Fisher (1890-1962) showed that observed values 
of covariances between relatives are consistent with 
what is to be expected if there is Mendelian inheritance 
of quantitative traits (Fisher, 1918). The approach 
taken was to assume that the measurement of a quan- 
titative character on an individual is made up of a 
general mean, the sum of effects of alleles in the indi- 
vidual’s genotype, dominance deviations, an epistatic 
effect, and an environmental effect. The effect of an 
allele is fitted by least squares and, in a random mating 
population, is the mean of deviations from the general 
mean among individuals having at least one copy of 
the allele. The dominance deviation of the genotype
$B_iB_j$ at a locus is the difference between the mean of
individuals with this genotype, averaged over
environments and other loci, and the sum of the general
mean and the fitted effects of $B_i$ and $B_j$. It is, in
statistical terminology, a two-way interaction effect,
insofar as it measures the extent to which the mean and
main effects of $B_i$ and $B_j$ do not suffice to explain the
observed value of the mean of $B_iB_j$. The epistatic effect
is associated with interactions of alleles at different
loci. Under the assumptions that there is random
mating and independent assortment between loci,
the average squared deviation of means of individual
genotypes about the population mean, which is the
genotypic variance, can be written as


$$\sigma_G^2 = \sigma_A^2 + \sigma_D^2 + \sigma_I^2$$


In this expression $\sigma_A^2$, the additive genetic variance, is
the variance of the sum of fitted allele effects and $\sigma_D^2$ is
the variance of dominance deviations. The remaining
term, $\sigma_I^2$, is the epistatic variance, which incorporates
information about non-allelic gene interactions. In
1954 Cockerham and Kempthorne independently
showed that

$$\sigma_I^2 = \sum_{r+s \ge 2} \sigma_{A^rD^s}^2$$

where $\sigma_{A^rD^s}^2$ is the variance associated with
interactions of single alleles at r loci and genotypes at s
other loci.
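For a single biallelic locus, the least-squares construction described above can be carried out explicitly. This sketch assumes Hardy-Weinberg frequencies and uses the conventional genotypic values a, d, and -a for $B_1B_1$, $B_1B_2$, and $B_2B_2$ (a parameterization not used in the text, following Falconer and Mackay, 1996). It regresses genotypic value on the number of $B_1$ alleles, takes the variance of the fitted values as $\sigma_A^2$, and the residual variance (the dominance deviations) as $\sigma_D^2$; with one locus there is no epistasis. The function name is hypothetical.

```python
def single_locus_variances(p, a, d):
    """Return (sigma_G^2, sigma_A^2, sigma_D^2) at one biallelic locus.

    p: frequency of allele B1; genotypic values a, d, -a for B1B1, B1B2, B2B2.
    Genotype frequencies are assumed to be in Hardy-Weinberg proportions.
    """
    q = 1.0 - p
    # (number of B1 alleles, genotype frequency, genotypic value)
    genotypes = [(2, p * p, a), (1, 2 * p * q, d), (0, q * q, -a)]
    mean_g = sum(f * g for _, f, g in genotypes)
    mean_x = sum(f * x for x, f, _ in genotypes)  # equals 2p
    cov_xg = sum(f * (x - mean_x) * (g - mean_g) for x, f, g in genotypes)
    var_x = sum(f * (x - mean_x) ** 2 for x, f, _ in genotypes)  # equals 2pq
    alpha = cov_xg / var_x          # least-squares average effect of substitution
    var_g = sum(f * (g - mean_g) ** 2 for _, f, g in genotypes)
    var_a = alpha ** 2 * var_x      # variance of the fitted (additive) values
    var_d = var_g - var_a           # variance of the residuals (dominance)
    return var_g, var_a, var_d


print(single_locus_variances(0.3, 1.0, 0.5))
```

The results agree with the textbook closed forms $\sigma_A^2 = 2pq\alpha^2$ with $\alpha = a + d(q - p)$, and $\sigma_D^2 = (2pqd)^2$.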

If genotypes can be assumed to be randomly dis- 
tributed among environments and there is no inter- 
action between genotypes and environments, the 
covariance between pairs of individuals with a par- 
ticular pattern of relationship is entirely genetic and 
turns out to be a linear combination of the various 
components that add to $\sigma_G^2$.
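The linear combination can be made concrete. Under the assumptions just stated, plus no inbreeding and no linkage, the standard Cockerham-Kempthorne result weights each component $\sigma_{A^rD^s}^2$ by $(2\Theta)^r \Delta^s$, where $\Theta$ is the coefficient of coancestry of the pair and $\Delta$ is the probability that they share both alleles identical by descent at a locus. The function name and variance components in this sketch are hypothetical.

```python
def covariance_between_relatives(two_theta, delta, components):
    """Genetic covariance of two non-inbred relatives.

    two_theta: twice the coefficient of coancestry (0.5 for parent-offspring)
    delta: probability the pair shares both alleles identical by descent
    components: {(r, s): variance}, with (1, 0) = sigma_A^2, (0, 1) = sigma_D^2,
                (2, 0) = sigma_AA^2, (1, 1) = sigma_AD^2, and so on.
    """
    return sum(two_theta ** r * delta ** s * var
               for (r, s), var in components.items())


# Hypothetical variance components: additive, dominance, additive x additive.
comps = {(1, 0): 0.6, (0, 1): 0.2, (2, 0): 0.1}
relatives = {
    "parent-offspring": (0.5, 0.0),
    "half sibs": (0.25, 0.0),
    "full sibs": (0.5, 0.25),
    "identical twins": (1.0, 1.0),
}
for name, (tt, dl) in relatives.items():
    print(name, covariance_between_relatives(tt, dl, comps))
```

Comparing covariances for several kinds of relatives is what allows the components to be estimated, as in the full-sib and half-sib designs discussed next.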


A Use of the Theory in Plant and Animal 
Breeding 


A consequence of Fisher’s theory is that a simple 
expression is obtainable for the predicted response R 
to selection based upon measurements taken on par- 
ents, provided epistasis and gametic phase disequilib- 
ria are assumed to have negligible effects. This can be 
written as 
where Cov(O,M) is the covariance between offspring
and the mean of their parents, $\sigma^2$ is the total variance
among phenotypes, $h^2$ is heritability in the narrow
sense, and $i$ is the intensity of selection. Under the
assumptions made in this section, $\sigma_A^2$ and $\sigma_D^2$ can be
estimated from analysis of variance tables that result 
from experiments in which full sibs and half sibs are 
observed. 
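The prediction is commonly written (for example by Falconer and Mackay, 1996) as $R = i h^2 \sigma$, with $\sigma$ the phenotypic standard deviation. Under the assumptions of this section the covariance of offspring with the mean of their parents is $\sigma_A^2/2$, so $h^2$ can be estimated as $2\,\mathrm{Cov}(O,M)/\sigma^2$. This sketch additionally assumes truncation selection of individual parents on a normally distributed phenotype to obtain $i$; the function names and numbers are hypothetical.

```python
from statistics import NormalDist


def truncation_intensity(proportion_selected):
    """Selection intensity i = z / p when the top fraction p of a normal
    distribution is kept (z = standard normal density at the cut-off)."""
    std_normal = NormalDist()
    cutoff = std_normal.inv_cdf(1.0 - proportion_selected)
    return std_normal.pdf(cutoff) / proportion_selected


def predicted_response(cov_offspring_midparent, phenotypic_variance, intensity):
    """R = i * h^2 * sigma, with h^2 estimated as 2 Cov(O,M) / sigma^2."""
    h2 = 2.0 * cov_offspring_midparent / phenotypic_variance
    return intensity * h2 * phenotypic_variance ** 0.5


# Hypothetical trait: sigma^2 = 16 (sigma = 4), Cov(O,M) = 2.4 so h^2 = 0.3;
# the top 20% of parents are selected.
i = truncation_intensity(0.20)
print(f"i = {i:.3f}, predicted response = {predicted_response(2.4, 16.0, i):.2f}")
```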


Information from Long-term Selection 
Experiments 


Information on the number of loci with effects on a 
quantitative character is obtainable by observing 
results of long-term selection experiments in which 
there is selection for both high and low measurements. 
The simplest assumptions are that there is a large 
population, two alleles at each locus with initial fre- 
quencies of 0.5, and that all loci have an equal effect on 
the quantitative character. The number n of loci is then 
estimated by the formula

$$n = \frac{R^2}{8\sigma_A^2}$$

where $R$ is the range between the individuals with
the highest and lowest measurements and $\sigma_A^2$ is the
initial additive genetic variance. Even though these
assumptions are unrealistic, it is sensible to conclude 
that many genes are involved, for example, in deter- 
mining oil content of maize seeds. Data from selecting 
upward and downward on this trait were presented by 
Dudley (1977). His figures gave no indication that 
progress from selection had come to an end. 
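Under the same simplifying assumptions (two alleles at frequency 0.5 and equal effects at every locus), the estimate $n = R^2/(8\sigma_A^2)$ follows from a short calculation that this sketch reproduces. It assumes a parameterization in which the two homozygotes at each locus have genotypic values +a and -a: n loci then give a total range $R = 2na$ and an additive variance $\sigma_A^2 = n \cdot 2pq\,a^2 = na^2/2$, so $R^2/(8\sigma_A^2) = n$. The function names are hypothetical.

```python
def estimated_loci(total_range, initial_additive_variance):
    """n = R^2 / (8 * sigma_A^2) for equal-effect loci with p = 0.5."""
    return total_range ** 2 / (8.0 * initial_additive_variance)


def check_formula(n_loci, a):
    """Rebuild R and sigma_A^2 from n loci whose homozygotes are +a and -a."""
    total_range = 2 * n_loci * a               # all '+' homozygotes minus all '-'
    var_a = n_loci * 2 * 0.5 * 0.5 * a ** 2    # 2pq a^2 per locus at p = q = 0.5
    return estimated_loci(total_range, var_a)


print(check_formula(12, 0.75))  # should recover the 12 loci assumed
```

With real data the assumptions fail in known directions (unequal effects and intermediate allele frequencies both tend to make this an underestimate), which is why the text treats it only as evidence that many genes are involved.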

If a population is not very large and is not followed 
for an extremely large number of generations, the 
response from selection has been observed to cease. 
Examples of this are given by Falconer and Mackay 
(1996). This result is due to the exhaustion of genetic 
variability because of selection and the loss of alleles 
from accidents of sampling. Robertson (1960) showed 
that the long-term limit of response is proportional to 
the expected response to selection in one generation 
and the size of the population, provided selection is 
not intense. This theory is also consistent with obser- 
vations of occasional halts in response (or plateaus) 
when selection is observed over a very long time. 
These are, however, only temporary, and a theory to 
explain the reason for renewed progress from selec- 
tion was developed by W.G. Hill and his colleagues in 
the 1980s (Hill and Keightley, 1988). This was based 
on the assumption that new variability is ultimately 
generated by mutations that can occur at many poss- 
ible loci. 


Recently Developed Methods for 
Studying QTLs 


With the development of molecular genetics it is now 
possible to map QTLs by estimating how closely they 
are linked to observable markers, more and more of 
which are being identified. By conducting appropriate 
experiments, recombination fractions can be estimated
so that QTLs can be ever more closely localized
to specific small regions of chromosomes. The most
probable positions of QTLs are those at which the
observed likelihood of the data is maximized in com- 
parison with what it would be under the hypothesis 
that no QTLs are segregating. This sort of research is 
of interest not only to plant and animal breeders, but 
also to human geneticists who want to discover the 
nature of inheritance of some human diseases such as 
hypertension. 
Overview


A quantitative trait is one for which there is no 
obvious way to classify individuals in a population 
according to whether they belong to one of a small 
and precisely limited set of possible distinct cat- 
egories. Some traits of this sort, such as the amount of 
milk produced by a cow during a lactation, have an 
apparent continuum of possible values. Others, such 
as litter size in mice, can only be whole numbers, but 
the number of possible values is very large. In cases 
like this, Falconer and Mackay (1996) point out that the 
trait can be analyzed as if the observable trait is a 
measure of an underlying trait that is continuous. 
The observed distributions of quantitative traits
can arise because the traits are influenced by many
genes, which result in many possible genotypes, and
also by environments. Thus the differences between the
means of genotypes are unobservable because of the
variability among the environments in which indi- 
viduals with any particular genotype live. 


Effectiveness of Selection 


Evidence that at least part of the variability in quanti- 
tative traits is genetic has existed for a long time. 
Selection among cultivated plants and farm animals 
has been going on for thousands of years and, when- 
ever it is possible to compare these populations with 
wild progenitors, substantial differences are observable,
even when the trait is quantitative. These facts
were discussed extensively by Charles Darwin (1809- 
83) in On the Origin of Species. More recently, sub- 
stantial increases in yield have occurred in dairy cattle 
and maize and there is very good evidence that these 
increases are due largely to genetic changes. 


Resemblance between Relatives 


Other evidence for the partially genetic determination 
of the expression of a quantitative trait is that pairs of 
relatives tend to resemble each other more than do 
pairs of unrelated individuals. A quantitative measure 
of the resemblance of relatives is the covariance 
between individuals that have a particular pattern of 
relationship, such as parent—offspring. In a pioneering 
paper published in 1918, R.A. Fisher (1890-1962) 
showed that observed values of covariances between 
relatives are consistent with what is to be expected if 
there is Mendelian inheritance and many loci influ- 
ence a quantitative trait (Fisher, 1918). Prior to that 
time there had been acrimonious controversy between 
the biometricians, who believed that evolution 
resulted from selection upon small heritable vari- 
ations, as proposed by Darwin, and the Mendelians, 
who believed that it came about by occasional large 
jumps. Fisher demonstrated that Mendelian inherit- 
ance and slow selection can be explained by one 
theory. 


Direct Evidence of Polygenic Inheritance 


Much information has accumulated from mapping
experiments on many quantitative traits in several 
species. It indicates that many quantitative trait 
loci (QTLs) are influencing the traits, but not all of 
them have effects of the same order of magnitude. 
Those with relatively large effects are called major 
genes. 
A group of over 30 genes, the RAD loci, confer resistance
to killing by ultraviolet irradiation (UV) and/or
ionizing radiation. They are classified into three
groups, the RAD3 group, the RAD52 group, and the 
RAD6 group, based on the type of cellular response to 
DNA damage that they specify. 


See also: Saccharomyces cerevisiae (Brewer’s 
Yeast) 
Radiation genetics in mice began over 60 years ago
through a study initiated by W. L. Russell designed to 
estimate the genetic hazards of radiation to humans 
(Russell, 1952). Russell’s specific locus test, based 
on work done in Drosophila melanogaster, was a 
screen for recessive mutations in the progeny of ir- 
radiated mice. The initial screen crossed wild-type 
mutagenized mice against tester stocks homozygous 
for visible recessive traits at seven different loci: agouti 
(a), brown (Tyrp1), albino (Tyr), dilute (Myo5a), short 
ear (Bmp5), pink-eyed dilution (p), and piebald 
(Ednrb). The mutations recovered from these first 


screens were successful not only in estimating muta- 
tion rates, but they have supplied a tremendous 
amount of functional information for the genomic 
regions surrounding the specific loci screened. The 
deletion complexes generated in these first screens 
have been the basis for much research that is still 
ongoing today (reviewed by Rinchik and Russell,
1990) and have led to the discovery of new phenotypes 
(both viable and lethal), haploinsufficient regions, and 
positional cloning of new genes through deletion 
breakpoint mapping. It is worth noting that almost 
all subsequent mutagenesis screens in mouse have 
been based on results from Russell’s specific locus test. 
As molecular techniques and technology have 
evolved, many groups have continued to look at the 
nature of radiation-induced mutations using the spe- 
cific locus test both in whole-animal screens as well as 
in established cell lines. It is clear that radiation can 
induce many different types of events within a DNA 
molecule (point mutations, deletions, complex
rearrangements); however, the mechanisms are not
fully understood. One recent study examined mutations
at the hypoxanthine guanine phosphoribosyltransferase
(HPRT) locus (a selectable marker) in tissue
culture, induced by exposure to varying amounts of
X-irradiation (Schwartz et al., 2000). Using exon PCR-based
deletion analysis, Schwartz et al. found that at
low doses of radiation (0-2 Gy), primarily point muta- 
tions are recovered, while at higher doses (2-6 Gy) 
primarily deletions are found. They also report an 
increase in average deletion size with increasing 
amounts of radiation. Another recent report examined 
the numbers and types of mutations induced during 
irradiation of successive stages in spermatogenesis and 
found that 80% and 50% of recovered mutations are 
large deletions during postspermatogonial and
spermatogonial stages, respectively (Russell et al., 1998).
Russell’s whole-animal radiation screens as well as 
subsequent efforts (Lyon and Morris, 1966) relied 
entirely on visible markers that could be easily ob- 
served in the progeny (mostly coat color) in order to 
identify mice carrying new mutations (primarily
deletions). This scheme required tens of thousands of
animals to be screened and maintained. In the past
decade, however, advances in both molecular genetics
and embryonic stem (ES) cell technology have enabled 
the development of radiation mutagenesis screens that
are more efficient and less costly. The ability to create
mice from genetically manipulated ES cells maintained
in tissue culture allows for a mutagenesis screen
to be performed in vitro, drastically reducing the
expense and labor of such a project. Additionally, the 
vast amount of sequence data now known has created 
molecular markers (microsatellite arrays, SNPs) 
throughout the genome, which can be used to assess 
deletions around any known locus, without the need 
for a visible marker. This can be done in cell culture, 
before the labor of creating mice. Several groups have 
successfully created radiation-induced deletion 
complexes in ES cells around specific negatively 
selectable loci (You et al., 1997a, b; Kushi et al.,
1998; Thomas et al., 1998). It is clear from these initial 
results that the mutation induction efficiency, the 
germline competency of recovered mutations, as well 
as the size of lesions created vary greatly from one 
genomic region to the next. 

As an example, the deletion complexes on chromo- 
some 5 (Schimenti et al.) were created by tar- 
geted insertion of the herpes simplex virus thymidine 
kinase gene TK by homologous recombination at 
three deletion focal points (DFPs). Targeted cell lines 
(Dpp6, Hdh, and Gabrb1, one per experiment) were
subjected to irradiation and selection for loss of TK 
function by growth in the presence of FIAU. DNA 
from the surviving clones was then PCR analyzed for 
the presence of the TK gene. Overall induced deletion 
frequencies at the three loci were 1/1400, 1/33 000, and
1/150 000 irradiated cells, and recovered deletions
varied in length up to 20 cM. It is worth noting the
variability in the percentage of cells that both survived
FIAU selection and also bore a deletion. For example,
in the Dpp6 experiment, virtually all of the surviving
cells had deletions, as opposed to the Gabrb1 locus, where
only 2-5% of clones had lost the TK gene. The
low efficiency at the Gabrb1 locus was also reflected
in the germline competence of these cells. Only
one of six deletion-bearing lines produced a germline
chimera, and it was shown to harbor only a small
deletion. Schimenti et al. (2000) conclude that there must
exist a haploinsufficient locus (or loci) in the vicinity 
of Gabrb1, but that they cannot determine where 
this region resides. The other two loci (Dpp6 and 
Hdh) were highly amenable to the production of 
nested, overlapping deletion complexes. Similar ex- 
periments on chromosome 9 identified a putative hap- 
loinsufficient region between two loci (Ncam and 
Myo5a/Bmp4) due to a lack of overlapping deletions
(Thomas et al., 1998). Although these suspected
haploinsufficient regions could be pursued, the initial
goal of creating large deletion sets is not possible 
within these regions. 
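The deletion frequencies reported above translate directly into screening effort. Assuming, as a simplification, that each irradiated cell independently carries a recoverable deletion with frequency f, the number of cells N needed to obtain at least one deletion with probability 0.95 is the smallest N satisfying $1 - (1 - f)^N \ge 0.95$. The helper name is hypothetical.

```python
import math


def cells_for_one_deletion(freq, confidence=0.95):
    """Smallest N such that 1 - (1 - freq)**N >= confidence."""
    # log1p(-freq) computes log(1 - freq) accurately for small freq.
    return math.ceil(math.log(1.0 - confidence) / math.log1p(-freq))


for f in (1 / 1400, 1 / 33_000, 1 / 150_000):
    print(f"deletion frequency 1/{round(1 / f)}: "
          f"about {cells_for_one_deletion(f):,} cells")
```

Roughly three times 1/f cells are needed in each case, which makes clear why the least efficient focal point is so much harder to saturate with overlapping deletions.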


A project funded by the Merck Genome Research 
Institute termed “Delbank” will allow for the creation
of interdigitated deletion complexes that span the 
entire genome. Delbank, which is run by Schimenti, 
is based upon the success of creating interdigitated 
deletion complexes in ES cells (as mentioned above). 
The strategy employs random insertion (as opposed to 
targeted insertion) of TK into the ES cell genome by a 
retroviral vector. The goal of Delbank is to create a 
panel of germline competent ES cell lines with one TK 
insertion approximately every 10cM. The TK gene in 
each cell line can then be used as the DFP for the 
creation of a deletion complex. The locations of the 
insertions will be mapped by Delbank, enabling 
the creation of interdigitated deletion complexes at 
any genomic location of interest. Delbank’s cell lines 
are available free to the community, and obviate
the need for the individual researcher to perform
the gene targeting. Technical protocols and cell line
details are online at Delbank (http://lena.jax.org/~jcs/Delbank.html).

As all of the genes within the genome will likely be
identified through annotation of sequence produced
by efforts organized by the genome project, the obvious
next step is the functional dissection of each individual
gene. Without question, having an allelic series of
mutations in a gene of interest is a powerful tool in
functional analysis. Whole genome
saturation mutagenesis has been invaluable in creating
allelic series in other organisms (both radiation- and
chemically induced). Due to the size and complexity of
the mouse genome, as well as the expense of maintaining
the numbers of animals involved, such approaches
have not been practical. ES cells clearly present
a viable alternative to whole-animal mutagenesis
(Chen et al., 2000a). A strong argument has been made for
combined mutagenesis (Schimenti and Bucan, 1998):
the creation of overlapping deletion complexes in a
given genomic region, followed by N-ethyl-N-nitrosourea
(ENU) regional saturation mutagenesis (Chen
et al., 2000b). This approach makes phenotype-based
screens more efficient than classic schemes; however,
the resources involved to perform such a project
would exceed the capability of most researchers, and
would require large collaborations to screen, recover,
and analyze all mutations within each genomic
segment.

With the sequencing of the mouse genome, the 
usefulness of deletion sets must be reevaluated. 
A bank of deficiencies spanning the entire mouse 
genome would provide null alleles for any gene of 
interest as well as the obligate reagents for combined 
mutagenesis (mentioned above). The discovery of
genes and their exact locations, however, as well as protein
analysis, will largely be done in silico, by DNA sequence
comparison and protein prediction algorithms.
Additionally, the ability to create precise targeted deletions
(Zheng et al., 1999) avoids the potential problem
of unwanted genes being deleted in a given lesion.
Advances in mutation detection and chemical muta- 
genesis are allowing for efficient genotype-based 
screens in ES cells (Chen et al., 2000a), and will 
provide a valuable method for functional analysis of 
known genes. 
At the foundation of models of population genetics,
first derived by Fisher, Wright, and Haldane, lies the 
concept of ‘random mating.’ If each individual in the 
population were assigned a number, and mating pairs 
were established by drawing pairs of numbers at ran- 
dom, the population would exhibit random mating. 
Real organisms make many decisions in selecting 
mates, and in this sense we may expect that random 
mating would be rare. But usually random mating is 
considered relative to genotypes at one or two genetic 
loci, so even if there are many phenotypic attributes 
that are important in mating success, so long as those 
attributes are independent of the particular genes 
under study, there may be random mating with respect 
to those genes. 

Population geneticists need to determine whether 
samples of genotypes drawn from a population are 
consistent with random mating, and this entails mod- 
eling the process and devising statistical tests. We can 
conceptualize organisms pairing up in mating by im- 
agining the organisms as gas molecules bouncing 
around in a container. If the motions of the molecules 
are at random, then the probability that two molecules 
will collide is proportional to the product of the con- 
centrations of the molecules. This is known as the 
principle of ‘mass action,’ and the same idea applies 
to random mating of macroscopic organisms. Suppose,
for example, that the frequency of genotype
A is $p_A$, and the frequency of genotype B is $p_B$. In this
case, when there is random mating among these genotypes,
the frequency of matings between A and B is
$p_Ap_B$.

But organisms are not gas molecules, and for our
purposes in calculating mating frequencies an important
distinction is the occurrence of two sexes. Suppose
$p_{Af}$ is the frequency of genotype A in females,
and $p_{Bm}$ is the frequency of genotype B in males. The
frequency of matings of genotype A females with genotype B males
under random mating is $p_{Af}p_{Bm}$. The ‘reciprocal’ mating,
that is, between genotype B females and genotype A
males, has frequency $p_{Bf}p_{Am}$. Note that if the
frequencies are the same in males and females, so that
$p_{Af}=p_{Am}=p_A$ and $p_{Bf}=p_{Bm}=p_B$, then the frequency
of matings between genotypes A and B is $2p_Ap_B$.
The factor of 2 comes from needing to consider both
reciprocal matings.
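The mass-action calculation with two sexes can be tabulated directly. This sketch, with hypothetical function names and genotype frequencies, builds the full female-by-male table of random-mating frequencies and then combines each pair of reciprocal matings, so that with equal frequencies in the two sexes the A x B entry comes to $2p_Ap_B$.

```python
from itertools import product


def mating_frequencies(female_freqs, male_freqs):
    """Random-mating ('mass action') frequency of each (female, male) pair:
    the product of the two sex-specific genotype frequencies."""
    return {
        (gf, gm): pf * pm
        for (gf, pf), (gm, pm) in product(female_freqs.items(), male_freqs.items())
    }


def unordered_mating_frequency(table, g1, g2):
    """Combine reciprocal matings (g1 female x g2 male plus g2 female x g1 male)."""
    if g1 == g2:
        return table[(g1, g1)]
    return table[(g1, g2)] + table[(g2, g1)]


# Hypothetical genotype frequencies, taken to be equal in the two sexes.
freqs = {"A": 0.3, "B": 0.7}
table = mating_frequencies(freqs, freqs)
print(unordered_mating_frequency(table, "A", "B"))  # 2 * 0.3 * 0.7
```

Passing different dictionaries for females and males gives the unequal-frequency case, where the A x B total is $p_{Af}p_{Bm} + p_{Bf}p_{Am}$.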


What Random Mating is Not 


Another way to understand random mating is to con- 
sider mating patterns that deviate from random mat- 
ing. One possibility is that like attract like, so that 
there are too many A x A matings, and too many 
B x B matings, leaving a deficit of A x B matings. 
This sort of mating pattern is called ‘positive assort- 
ative mating.’ The converse is to have an excess of A x B 
and B x A matings, with a deficit of like x like 
matings. This pattern is called ‘negative assortative 
mating’. 


More than one Genetic Locus 


Despite this simple operational definition, there are 
many subtleties to the concept of random mating. 
First, one has to ask “random with respect to what?” 
In the above example, the answer is that the matings 
occur at random with respect to genotypes A and B, 
but observing that the A x B mating
has frequency $2p_Ap_B$ does not allow one to generalize
that mating is at random with respect to all genetic
attributes. For example, consider a third gene C. It is
possible that the only matings are AC x BC, Ac x Bc,
aC x bC, and ac x bc. The loci A, B, and C may be
in ‘linkage equilibrium,’ and mating may be at 
random with respect to the A and B loci, but still 
there may be complete positive assortative mating at 
locus C. 


Phenotypic Random Mating 


The random mating we have discussed so far could be 
called genotypic random mating because the mating 
frequencies are considered for pairs of genotypes in 
the population. Random mating may also occur with 
respect to phenotypes. If allele A is dominant to allele
a, then the two phenotypes are A_ and aa. If these two
phenotypes have frequencies D and R, then the mating
A_ x A_ will have random mating frequency $D^2$, A_ x
aa and aa x A_ will have combined frequency 2DR,
and aa x aa will have frequency $R^2$. Note that matings
involving the dominant phenotype may be further 
partitioned if one wants to follow the genotypes, and 
that random mating of phenotypes may also be (but 
does not necessarily guarantee) random mating of 
underlying genotypes. 


Testing whether Population is Randomly 
Mating 


If one can collect observations of counts of the pairs of 
genotypes that mate, it is possible to directly test random
mating by a standard chi-square ($\chi^2$) test. The
expected counts of each mating type are determined
from the frequencies of the genotypes as described
above, and the formula for $\chi^2$ is used to calculate
a statistic based on the observed and expected counts. 
It may be that one cannot obtain mating pairs, but 
instead the population sample is merely a sample of 
genotypes. In this case an indirect test of random 
mating is to test the goodness-of-fit to Hardy- 
Weinberg proportions. This test is indirect because 
one may have random mating and a poor fit to 
Hardy-Weinberg proportions due to several other 
factors, including, for example, natural selection. On 
the other hand, it is possible, if unlikely, to have
nonrandom mating and yet a population that fits
Hardy-Weinberg proportions, because other factors
cancel out the skewed genotype proportions caused by
the nonrandom mating. Clearly it is best to obtain
actual counts of mating pairs.
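Both tests described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a prescribed procedure: the function names and input format are hypothetical, genotype frequencies for the direct test are estimated from the 2N individuals in the N observed pairs, and degrees of freedom are taken as the number of classes minus one minus the number of independently estimated frequencies. The statistic is compared with a chi-square table (for 1 degree of freedom the 5% critical value is 3.841), or converted to a p-value with a routine such as `scipy.stats.chi2.sf`. As with any $\chi^2$ test, the approximation is poor when expected counts are small.

```python
from itertools import combinations_with_replacement


def chi_square(observed, expected):
    """Pearson chi-square statistic: sum of (O - E)^2 / E over all classes."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def direct_random_mating_test(pair_counts):
    """Direct test: observed mating-pair counts versus random-mating expectations.

    pair_counts maps a genotype pair (g1, g2) to the number of pairs observed;
    reciprocal pairs are pooled. Returns (statistic, degrees_of_freedom).
    """
    pooled, geno_counts = {}, {}
    for (g1, g2), count in pair_counts.items():
        key = tuple(sorted((g1, g2)))
        pooled[key] = pooled.get(key, 0) + count
        for g in key:                       # each pair contributes two individuals
            geno_counts[g] = geno_counts.get(g, 0) + count
    n_pairs = sum(pooled.values())
    freqs = {g: c / (2 * n_pairs) for g, c in geno_counts.items()}

    observed, expected = [], []
    for g1, g2 in combinations_with_replacement(sorted(freqs), 2):
        f = freqs[g1] * freqs[g2] * (1 if g1 == g2 else 2)
        observed.append(pooled.get((g1, g2), 0))
        expected.append(f * n_pairs)
    # classes - 1, minus the (k - 1) genotype frequencies estimated from the data
    df = len(observed) - 1 - (len(freqs) - 1)
    return chi_square(observed, expected), df


def hardy_weinberg_test(n_AA, n_Aa, n_aa):
    """Indirect test: goodness of fit of genotype counts to Hardy-Weinberg proportions."""
    n = n_AA + n_Aa + n_aa
    p = (2 * n_AA + n_Aa) / (2 * n)         # allele frequency estimated from the sample
    q = 1 - p
    expected = [p * p * n, 2 * p * q * n, q * q * n]
    # 3 classes - 1 - 1 estimated allele frequency = 1 degree of freedom
    return chi_square([n_AA, n_Aa, n_aa], expected), 1


if __name__ == "__main__":
    # Hypothetical data: complete positive assortative mating between AA and Aa.
    stat, df = direct_random_mating_test({("AA", "AA"): 50, ("Aa", "Aa"): 50})
    print(stat, df)                         # 100.0 on 1 df: random mating rejected

    # Hypothetical genotype sample with an excess of homozygotes.
    stat, df = hardy_weinberg_test(50, 20, 30)
    print(round(stat, 2), df)               # about 34.03 on 1 df
```

In the first example every individual is AA or Aa in equal numbers, so random mating predicts 25 AA x AA, 50 AA x Aa, and 25 Aa x Aa pairs, whereas no AA x Aa pairs were observed.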


See also: Hardy-Weinberg Law; Panmixis 


Ras Gene Family 


R Hesketh 


Copyright © 2001 Academic Press 
doi: 10.1006/rwgn.2001.1610 


The Ras superfamily comprises over 60 genes encod- 
ing GTP-binding proteins (G proteins). There are 
three Ras genes that comprise the most highly con- 
served group of known oncogenes: Hras (the onco- 
gene of Harvey murine sarcoma virus), Kras (oncogene 
of Kirsten murine sarcoma virus), and Nras (detected 
in tumors but not in retroviruses). The human homo- 
logs are HRAS1, KRAS2, and NRAS: HRAS2 and 
KRAS1 are inactive pseudogenes. 

RAS-like proteins can be grouped into three main 
families: RAS proteins (HRAS1, KRAS2, and NRAS 
and their close relatives RRAS, RAL, RAP, RHEB, 
RIN, and RIT), RHO/RAC proteins, and RAB pro- 
teins. RHO/RAC proteins (12 mammalian members 
including RHOA, RAC1, and CDC42) are involved 
in the organization of the actin cytoskeleton, cell cycle 
regulation and membrane trafficking. RAB proteins 
regulate intracellular vesicular transport. Further sub- 
groups are typified by RAN (nuclear GTPases), ARF 
(ADP-ribosylation factor), and GEM/RAD. 


Homologs of RAS act as crucial signal-transducing
elements in all eukaryotic organisms that have been
examined, including Caenorhabditis elegans and
Drosophila.


Protein Function 


RAS is a Molecular Switch 

RAS proteins are ubiquitously expressed membrane- 
bound GTPases (Boguski and McCormick, 1993). 
Normal p21ras (RAS) hydrolyzes GTP at rates comparable
with those reached by purified G proteins and exists in
an equilibrium between an active (GTP.RAS) and an
inactive (GDP.RAS) state (Figure 1). NIH 3T3 fibroblasts
contain ~500 and 1.3 fmol of GDP.RAS and GTP.RAS,
respectively. In cells overexpressing activated Hras the
respective figures are ~5000 and ~2000 fmol. The rates
of GDP release and GTP hydrolysis are regulated by the
actions of three classes of regulatory proteins:


1. GTPase activating proteins (GAPs) that increase
the rate of hydrolysis of GTP. Most cells express
two GAPs with similar activities: type I p120GAP and
NF1-GAP (neurofibromin, the product of the
neurofibromatosis type 1 gene). Type I GAP may be
alternatively spliced to give type II GAP, detected in
placental trophoblasts. In general, GAP appears to
function as an upstream regulator of normal RAS,
maintaining it in an inactive, GDP-bound state.
However, GAP may also be involved in coupling RAS
to downstream effector proteins.

2. Guanine nucleotide exchange factors (GEFs), also 
called guanine nucleotide release proteins (GNRPs) 
or guanine nucleotide dissociation stimulators 
(GDSs) that catalyze the release of bound GDP 
(Sprang and Coleman, 1998). At least six GEFs for 
RAS proteins have been identified: SOS, RAS- 
GRF, C3G, CalDAG-GEFI, RAS-GRP/Cal- 
DAG-GEFII, and Epac/cAMP-GEFI. 

3. Guanine nucleotide dissociation inhibitors (GDIs) 
that inhibit the replacement of GDP by GTP and 
may also inhibit the action of GAPs. A number of 
GDIs specific for RHO or RAB family members 
have been identified which additionally regulate the 
translocation of their target GTPases between the 
membrane and the cytosol. 
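The switch behaviour can be illustrated with a deliberately simplified two-state model. This is an assumption made here for illustration, not a model from the cited work: RAS converts from the GDP-bound to the GTP-bound form with a first-order rate constant k_exchange (nucleotide exchange, accelerated by GEFs, assuming each exchange loads GTP because GTP is in excess in cells) and returns with k_hydrolysis (GTP hydrolysis, accelerated by GAPs). At steady state the active fraction is k_exchange / (k_exchange + k_hydrolysis). All rate values below are arbitrary and hypothetical; the sketch only shows qualitatively why raising GAP activity lowers GTP.RAS and why mutations that impair hydrolysis leave most RAS GTP-bound.

```python
def steady_state_active_fraction(k_exchange, k_hydrolysis):
    """Fraction of RAS that is GTP-bound when d[GTP.RAS]/dt = 0 in the two-state model."""
    return k_exchange / (k_exchange + k_hydrolysis)


def simulate_active_fraction(k_exchange, k_hydrolysis, x0=0.0, dt=0.001, t_end=20.0):
    """Forward-Euler integration of dx/dt = k_exchange * (1 - x) - k_hydrolysis * x,
    where x is the GTP-bound fraction (hypothetical first-order rates, arbitrary units)."""
    x = x0
    for _ in range(int(round(t_end / dt))):
        x += dt * (k_exchange * (1.0 - x) - k_hydrolysis * x)
    return x


if __name__ == "__main__":
    scenarios = {
        "basal": (1.0, 3.0),        # hypothetical exchange and hydrolysis rates
        "GAP": (1.0, 30.0),         # GAP assumed to speed hydrolysis 10-fold
        "oncogenic": (1.0, 0.03),   # mutant assumed to hydrolyze GTP 100-fold slower
    }
    for label, (k_ex, k_hyd) in scenarios.items():
        print(label,
              round(steady_state_active_fraction(k_ex, k_hyd), 3),
              round(simulate_active_fraction(k_ex, k_hyd), 3))
```

The simulated values converge on the closed-form steady state: roughly 0.25 (basal), 0.03 (with GAP), and 0.97 (hydrolysis-impaired).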


Yeast RAS 

The yeast RAS1 and RAS2 genes encode proteins with 
GTPase activity that have strong homology with 
human RAS proteins. The yeast proteins activate
adenylate cyclase in a manner analogous to the action
of Gs in mammalian plasma membranes. However, there
is no evidence that RAS proteins regulate adenylate
cyclase in vertebrate cells, although they can
substitute for RAS1 and RAS2 in yeast.

Figure 1 The RAS molecular switch. RAS undergoes
conversion between active (GTP-bound) and inactive
(GDP-bound) forms. GTP hydrolysis of normal RAS is
promoted by GTPase activating proteins (p120GAP and
NF1); GDP release is stimulated by GEFs/GNRPs (see
text).


Cellular Roles of RAS and RAS-Protein 
Interactions 

Normal RAS proteins are involved in the control of 
cell growth and differentiation. However, the effects 
of activated RAS proteins are cell-specific, and RAS 
may cause growth transformation or growth inhib- 
ition, anti-apoptotic or apoptotic responses, differen- 
tiation or blockade of differentiation, depending on 
cell type. This diverse spectrum of responses suggests 
that RAS may require multiple effectors. Consistent 
with this deduction, RAS proteins interact directly 
with a variety of cellular proteins in addition to GAPs, 
including RAF1, BRAF, RAL.GDS (RAL guanine 
nucleotide dissociation stimulator), RAL.GDS2, 
RLF (RalGDS-like factor), AF6, RIN1, PLC210, 
multiple isoforms of phosphatidylinositol-3 kinase, 
NORE, the Aiolos transcription factor, and protein
kinase Cζ (Yamamoto et al., 1999). A number of these
proteins constitute the initial components of specific, 
intracellular signaling pathways that can emanate 
from activated RAS. RAS functions as homo-dimers 
or homo-trimers in a manner similar to EF-Tu, 
SV40 large T antigen and E. coli CRP, which may 
facilitate interaction with multiple effectors. Each 
RAS-regulated pathway ultimately leads to proteins 
controlling gene transcription (Campbell et al., 1998; 
Shields et al., 2000). 


RAS Signaling Pathways 


Activation of normal RAS 

The action of a variety of growth factors (e.g., epider- 
mal growth factor, platelet-derived growth factor 
(PDGF), or serum) increases the concentration of 
GTP.RAS in normal cells. Phosphorylated tyrosine 
residues on activated receptor tyrosine kinases 
(RTKs) associate with the SH2 domain of growth 
factor receptor-bound protein 2 (GRB2). GRB2 can 
also bind to RTKs via SHC proteins. GRB2 binds to
the GEF SOS1 via its SH3 domains. SOS1 is thus
recruited to the plasma membrane where it activates
RAS (Figure 2). Other RAS GEFs (RAS.GRP and
RAS.GRF) may be activated via G-protein-coupled
receptors (by the actions of diacylglycerol and
calcium) or by cytokine receptors which may activate
VAV (Figure 3). The conformational change induced
by GTP binding activates RAS, enabling it to interact
with cellular target ('effector') proteins.

Figure 2 Activation of RAS. Ligand stimulation of a
receptor tyrosine kinase (RTK) induces receptor
dimerization and autophosphorylation. Phosphorylated
tyrosine residues on activated RTKs associate with the
SH2 domains of GRB2 (or SHC) proteins. GRB2 can
also associate with SHC via its SH2 domain. The two
SH3 domains of GRB2 recruit SOS, which activates RAS
and downstream pathways including RAF-MAPK. GRB2
also couples the major insulin receptor substrate, IRS-1,
to the RAS signaling pathway.

Figure 3 Signaling pathways emanating from RAS. Major signal transduction pathways emanating from activated
RAS (GTP.RAS) include the MAPK/ERK pathway involving RAF1, MAPKK (also called MEK (MAP kinase or ERK
kinase)) and MAPKs. A RAF1-independent pathway leads to the activation of JNKs and SAPKs. RTKs may also activate,
directly or indirectly: (1) phosphatidylinositol 3-kinase and an anti-apoptotic pathway via AKT, (2) phospholipase Cγ
(PLCγ), leading to hydrolysis of phosphatidylinositol 4,5-bisphosphate (PtdIns(4,5)P2) and elevation of the free
intracellular concentration of calcium ([Ca2+]i), and (3) members of the Janus family of protein kinases (JAKs), which
may also activate RAS. Other RAS GEFs (RAS.GRP and RAS.GRF) may be activated via G-protein-coupled receptors
(by the actions of diacylglycerol and calcium) or by cytokine receptors which may activate VAV. Targets of AKT
include GSK3, BAD, caspase 9, forkhead proteins including AFX, and IκB. RAL.GDS regulates RHO family
proteins, and RHO may also be inhibited by the GAP-associated protein p190. RHO activation can stimulate
transcription of the serum response factor (SRF) gene. SUR8 is a leucine-rich repeat protein that binds to both RAS
and RAF and enhances MAP kinase activation.


MAPK pathways

GTP.RAS activates signaling pathways leading to
the stimulation of members of the mitogen activated
protein kinase (MAPK) family. The components of
these pathways are dual-specificity MAPK kinases
(MAPKKs) that are themselves activated by
MAPKK kinases (MAPKKKs), constituting a cascade
of serine/threonine protein kinases (Whitmarsh and
Davis, 1998). Three mammalian MAPK families have
been characterized: extracellular signal-regulated
kinase (ERK or MAPK), which is activated by growth
factors, peptide hormones, and neurotransmitters;
JUN N-terminal kinases (JNKs or stress-activated
protein kinases, SAPKs); and p38 MAPK. The last
two are activated by cellular stress stimuli as well
as by growth factors (Figure 3).

Activation of MAPKs leads to the phosphorylation 
of a variety of proteins including the 90 kDa ribosomal 
S6 kinase (RSK) family, of which CREB kinase/RSK2 
is a member, and the transcription factors ETS1, ETS2,
ELK1, NET, SAP1, SAP2, ATF2, and JUN (Cobb, 
1999). 


Phosphatidylinositol-3 kinase (PtdIns-3 kinase) 
In addition to MAPK pathways, GTP.RAS may activate
PtdIns-3 kinase, generating phosphatidylinositol
3,4-bisphosphate, which then activates the AKT family
of serine/threonine protein kinases. AKT mediates a
variety of biological responses including inhibition of 
apoptosis and stimulation of cell growth. AKT in- 
hibits glycogen synthase kinase-3 (GSK3: see Adeno- 
matous Polyposis Coli), 6-phospho-fructo-2-kinase, 
the BCL2 family protein BAD, and possibly p70 ribo- 
somal S6 kinase and phosphorylates RAC1 to inhibit 
RAC1-GTP binding. A key target of AKT in the 
suppression of apoptosis is BAD, the phosphorylation 
of which represents a mechanism for growth factor 
inactivation of a component of the cell death system. 
AKT also phosphorylates and thereby negatively 
regulates the transcriptional activity of the forkhead 
factors AFX, FKHRL1, and FKHR. 


Signaling through RHO by RAS-dependent and 
RAS-independent pathways 

In at least some cell types full transformation by 
oncogenic RAS requires the activation of members 
of the RHO/RAC family of small GTPases which 
includes RHOA, RHOB, RHOG, RAC, CDC42, 
and TC10 (Kaibuchi et al., 1999). The activities of 
CDC42, RAC, and RHO are interdependent and 
activated RHO proteins cooperate with RAF1 to 
transform cells. The precise mechanisms by which 
RAS regulates the RHO family are unresolved but 
may include: 


1. Activation of a family of GEFs for the RAL small
GTPases (RAL.GDS, RGL, and RGL/RLF). Activated RAL
in turn activates RAL binding protein 1 (RALBP1), a
negative regulator of CDC42 and RAC.

2. Coordination with activation of RAS through SOS
and RAS.GRF, which also function as RHO GEFs.
In fibroblasts the complex of Eps8, E3b1, and SOS1
has RAC-specific GEF activity in vitro and mediates
signaling between RAS, PtdIns 3-kinase, and RAC.

3. RAS activation of RAC, which can suppress RHO.
The reciprocal balance between the activities of
RAC and RHO is a major determinant of cellular
morphology and motility in NIH 3T3 fibroblasts.


VAV and VAV2, members of the DBL family of GEFs, 
also act on RHO proteins. Oncogenic (transforming) 
VAV activates the JNK/SAPK pathway via RAC1, 
independently of RAS activation. However, VAV 
cooperates with RAS to transform fibroblasts and 
dominant negative mutants of Vav inhibit Ras- and 
Raf-induced transformation, suggesting that Vav 
and Ras signaling pathways overlap. 


Functional Diversity within the RAS Family 
On the basis of their high degree of sequence identity 
HRAS1, KRAS2, and NRAS are usually considered to 
be functionally identical although the evolutionary 
conservation of three RAS genes suggests diverse 
functions. Evidence is limited, but transgenic studies
have shown that, although neither Nras nor Hras is
essential for mouse development and both Nras−/−
and Hras−/− mice grow normally and are fertile,
Kras−/− mice die in utero. The posttranslational
modifications specific to KRAS2 may result in differential
routing to the plasma membrane (see section ‘Post- 
translational Modification of RAS Proteins’ below) 
and KRAS2 may be less confined to distinct domains 
of the plasma membrane than HRAS1. The inhibition 
of C-terminal modification by farnesyl transferase 
inhibitors (see section ‘Farnesyl Transferase Inhib- 
itors’ below) is effective against HRAS1 but results 
in an alternative modification to KRAS2 and NRAS 
(Oliff, 1999). In addition, some GEFs show specificity 
for individual RAS proteins. Evidence is emerging 
that other members of the RAS family are coord- 
inately regulated in cellular signaling pathways 
(McCormick, 2000). 


Transcriptional Control via RAS 


Oncogenic RAS has been reported to activate tran- 
scription of many genes in a diversity of cell types, 
including ornithine decarboxylase, FOS, JUN, JUNB, 
MDR1, Mob-1, MYC, SRF, transin, heparin-binding 
epidermal growth factor, p9Ka/42A, WAF1, Cyclin 
D1, TGFα and TGFβ. In endothelial cells Hras
stimulates the expression of vascular endothelial growth
factor (VEGF) and of the matrix metalloproteinases 
MMP-2 and MMP-9 whilst reducing TIMP (Tissue 
inhibitor of metalloproteinase) activity, suggesting 
that RAS may contribute to the growth of solid tumors 
by indirectly promoting angiogenesis. RAS represses 
transcription of the MYOD1, MYOH, Myf5, MRF4,
myogenin, PDGF receptor, and fibronectin genes. 
[Figure 4 diagram of the RAS protein. Labels: conservation among mammalian RAS proteins (>95% and 70-80% in different regions); Switch I region (residues 32-40); Switch II region (residues 60-72); hypervariable region; residue positions 165, 186, and 189; oncogenic mutation sites Gly12, Gly13, Ala59, and Gln61.]


Figure 4 Structure of RAS protein. Cross-hatched
regions indicate the switch regions. Switch I (or the
'Effector domain') is the region in which substitutions
reduce the biological effects of RAS proteins in both
mammalian and yeast cells but do not affect GTP binding
or hydrolysis. Switch I is essential for stimulation of
GTPase activity by GAP. Mutations in this region reduce
the direct interaction that occurs between RAS and
RAF1. The Switch II region, together with Switch I,
forms the two domains that undergo large conformational
changes upon exchange of bound GDP for GTP.
Cys186 in the CAAX box is essential for transforming
activity. Naturally occurring activating point mutations at
codons 12, 13, 59, or 61 inhibit GTP hydrolysis; such
oncogenic mutations therefore lock the GTP.RAS
complex in an active form.


Oncogenic RAS also downregulates expression of the 
transcriptional repressor PAR4. 


Oncogenic Mutations 


Any one of many single amino acid mutations in RAS 
can give rise to highly oncogenic proteins. Naturally 
occurring activating point mutations in RAS (Gly12, 
Gly13, Ala59, and Gln61) inhibit GTP hydrolysis 
either by diminishing GTPase activity or (for Ala59) 
modulating the rate of nucleotide exchange (Figure 4). 
A mutation inserting glycine between codons 10 
and 11 has been detected in KRAS2 from a human 
leukemia. In addition to the mutation at codon 12 in
the HRAS1 gene, a mutation in the fourth intron
causes a 10-fold increase in expression of HRAS1.


Cancer 


Activating mutations in RAS oncogenes have been 
detected in a wide variety of human tumors. The over- 
all incidence of transforming RAS genes in human 
cancers is only between 10% and 15% but the vari- 
ation extends from being rarely detectable in breast 
and stomach tumors through a 10% incidence in urin- 
ary tract tumors to a frequency as high as 95% in 
pancreatic carcinomas. Mutations in individual RAS
genes are commonly associated with specific tumors,
for example, HRAS1 with cutaneous squamous cell
carcinoma and squamous head and neck tumors,
KRAS2 with cancers of the lung, colon, or pancreas,
and NRAS with acute myelogenous leukemia. KRAS2
mutations occur in lung adenocarcinoma and in
squamous cell lung carcinomas, in which HRAS1
mutations are rare. However, there is no such
specificity in thyroid tumors: in thyroid adenomas and
carcinomas, mutations in all three genes (HRAS1,
KRAS2, and NRAS) may occur within one tumor.
Simultaneous mutations in KRAS2 and NRAS have also
been detected in multiple myeloma. Even when these
tumors are histologically identical, however, RAS
expression is inconsistent. Although mutations in RAS
occur only rarely in breast cancer, point mutations in
HRAS1 or KRAS2 have been detected in primary
carcinomas and in some mammary tumor-derived cell
lines.

The highly polymorphic HRAS1 minisatellite 
locus (unstable repetitive DNA sequences) just
downstream from the HRAS1 gene consists of four
common progenitor alleles and several dozen rare alleles,
which apparently derive from mutations of the pro- 
genitors. Mutant alleles of the HRAS1 minisatellite 
locus represent a major risk factor for common types 
of cancer (breast, colorectal, and bladder). Mutations 
in KRAS2 have been detected in ~ 40% of pancreatic 
cancers and these mutations appear to correlate 
strongly with the presence of microsatellite instability. 


Protein Structure 


Posttranslational Modification of RAS 
Proteins 

The C-terminus of all RAS proteins contains two 
signal sequences for posttranslational modifications 
(polyisoprenylations) that promote association with 
the plasma membrane: 


1. The CAAX box (C = cysteine, A = aliphatic, X = 
non aliphatic amino acid) signals a three-step modi- 
fication: Cys 186 is alkylated by addition of C15 
farnesyl isoprenoid lipid, the AAX amino acids are 
removed by proteolysis, and methylesterification
occurs at the α-carboxyl of the new C-terminal
Cys. The modified product is more hydrophobic 
than unmodified pro-RAS and associates weakly 
with cell membranes. 

2. (In HRAS1, NRAS and KRAS2A.) Palmitoylation 
of Cys residues in the hypervariable region 
increases the extent and avidity of membrane
binding. This fatty acylation is reversible, the
period of attachment being short compared to the
lifetime of the protein itself. Both modifications 
are necessary for plasma membrane localization. 
KRAS2B lacks Cys in the hypervariable region 
and does not undergo the final palmitoylation step 
but has a polybasic region essential for plasma 
membrane targeting. 


The CAAX box targets RAS proteins to the endoplas- 
mic reticulum and Golgi apparatus: KRAS2B may 
bypass the latter. Trafficking to the plasma membrane
requires palmitoylation or a polybasic motif (Magee
and Marshall, 1999).
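The CAAX rule stated above (C = cysteine, A = aliphatic, X = nonaliphatic) can be written as a check on the last four residues of a one-letter protein sequence. In this sketch the set of residues counted as aliphatic (Ala, Val, Leu, Ile) is an assumption, since textbook groupings differ, and the function name is hypothetical; real prenyltransferase specificity is broader than this simple rule. For a 189-residue protein the motif cysteine falls at position 186, matching the Cys186 numbering used above.

```python
# Assumption: one common grouping of aliphatic residues (Ala, Val, Leu, Ile).
ALIPHATIC = set("AVLI")


def caax_cysteine_position(sequence):
    """Return the 1-based position of the CAAX cysteine, or None if the motif is absent.

    Applies the rule C = cysteine, A = aliphatic, X = nonaliphatic to the four
    C-terminal residues of a one-letter amino acid sequence.
    """
    seq = sequence.strip().upper()
    if len(seq) < 4:
        return None
    c, a1, a2, x = seq[-4:]
    if c == "C" and a1 in ALIPHATIC and a2 in ALIPHATIC and x not in ALIPHATIC:
        return len(seq) - 3
    return None


if __name__ == "__main__":
    # Hypothetical 189-residue sequence: placeholder residues plus a CVLS tail.
    demo = "G" * 185 + "CVLS"
    print(caax_cysteine_position(demo))   # 186
```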


Farnesyl Transferase Inhibitors 

The essential requirement of C-terminal farnesylation 
of RAS for its activity has prompted the development 
of a variety of farnesyl transferase inhibitors (FTIs) 
as potential chemotherapeutic agents (Nammi and 
Lodagala, 2000). FTIs inhibit transformation by
HRAS1, but not NRAS or KRAS2, and have been 
shown to induce apoptosis of transformed cells via a 
decrease in mitochondrial membrane potential, the 
release of cytochrome c, and activation of caspase-3. 


X-Ray Structural Studies of RAS 


Several different wild-type and oncogenic RAS com- 
plexes have been crystallized to provide the first 
atomic descriptions of proto-oncogenes and onco- 
genes. The crystal structures are consistent with a 
transition state stabilization mechanism for GTP 
hydrolysis by RAS in which a complex is formed 
between the γ-phosphate of GTP and the Gln61 side
chain. The structure of human HRAS1 bound to the
GTPase activating domain of p120GAP confirms that
GAP stabilizes the switch II region of RAS to permit 
Gln61 involvement in catalysis. The crystal structures 
of RAS-GEF complexes have also been elucidated 
(Sprang and Coleman, 1998). 
Background


Animals belonging to the order Rodentia comprise 
almost half of all the present mammalian species. 
Thus, there is good reason to study rodents from 
every biological aspect, including genetics. For genetic 
studies the mouse has often been favored, but, in fact, 
the rat was the first rodent to be domesticated for 
research purposes in the 1850s. It has been estimated 
that there are about four rats for every human being on 
earth. Rats have accompanied humans throughout the
world, readily adjusting to the same types of food and
environment as humans. This in itself has led to
considerable antagonism between humans and rats, and
it has not helped that rats are known to carry and
spread various diseases (notably the plague epidemics
of medieval times). One must remember, however, that
rats can be extremely helpful to humans as laboratory
animals and models of human disease, partly because
their preferences are similar to ours.

Many species and subspecies exist in the genus 
Rattus, but the two main species with worldwide dis- 
tributions are the black rat (Rattus rattus) and the 
brown or Norway rat (R. norvegicus). The black rat 
is thought to have originated in the Indian peninsula 
and to have spread in the twelfth century to Europe
and Africa along trade routes and, subsequently, to the 
Americas by ships in the sixteenth century. The brown 
rat invaded northern Europe from Russia during the 
eighteenth century. For reasons of climate and behav- 
ior, the brown rat has now displaced the black rat in 
many areas, particularly those with colder average 
temperatures. 


Rat Cytogenetics 


The cytogenetics of rats have been well studied, par- 
ticularly by Yosida (1980). The Norway rat exhibits a 
very stable karyotype with 42 chromosomes. Most 
black rats have 38 chromosomes, but the karyotype 
is very similar to that of R. norvegicus, if one takes into 
account that two centric (Robertsonian) fusion events 
between acrocentric chromosomes have taken place in 
R. rattus. Specimens of R. rattus in southeast Asia
exhibit karyotype variation, and subspecies exist
with 2n = 40 and 2n = 42, in which one or none of
the centric fusions have taken place. Hybrids between
the different chromosomal races of R. rattus can readily
be obtained from matings in the laboratory, and some
naturally occurring hybrids have also been found. The
hybrids have the expected karyotypes, representing the 
sum of the haploid chromosomes from the parents. The 
chromosomal hybrids have been shown to be fertile in 
laboratory matings, although they may have reduced 
litter sizes. In contrast, interspecific hybrids between 
R. rattus and R. norvegicus cannot be generated even 
after artificial insemination. 
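The chromosome numbers quoted above follow from simple counting. Assuming that each centric fusion joins two acrocentrics into one chromosome and is present in both homologs, a race carrying f fusions has 2n = 42 - 2f, and a hybrid receives one haploid set from each parent. The 38 x 42 cross below is a hypothetical example of that sum:

```latex
\begin{aligned}
2n &= 42 - 2f, \qquad f \in \{0, 1, 2\} \;\Rightarrow\; 2n \in \{42,\ 40,\ 38\},\\
2n_{\text{hybrid}} &= n_{1} + n_{2}, \qquad \text{e.g. } (2n = 38) \times (2n = 42):\; 19 + 21 = 40.
\end{aligned}
```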


The Rat as a Model Organism in 
Biomedical Research 


The rat has been widely recruited as a laboratory 
animal and a model for human disease. All laboratory 
rats belong to the R. norvegicus species. Rat fanciers 
made the first contributions to rat genetics by the 
isolation of coat and eye color mutants. Different rat 
breeds were established and in the early part of the 
twentieth century a number of inbred rat strains were 
established. Many of the most common diseases have 
complex multifactorial backgrounds and are very dif- 
ficult to study in human subjects because of hetero- 
geneity in genetic and environmental factors. In this 
situation it is logical to turn to animal models, in 
which these factors can be reasonably controlled. 
Rats have been particularly favored as model organ- 
isms by physiologists, pharmacologists, and behavior 
researchers. This has resulted in the development of 
numerous strains exhibiting genetic predisposition to a 
variety of traits. In the recent list compiled by Festing
(available in the RATMAP database at
http://ratmap.gen.gu.se/), about 230 different laboratory strains are
mentioned. Most of them have been developed for 
the study of specific disease characteristics, including 
hypertension, diabetes, cancer, nutrition, cavity for- 
mation, eye disorders, alcohol preference, drug abuse, 
obesity, kidney failure, craniofacial disorders, sen- 
sitivity to toxins, and immunological responses. 
Particularly valuable are numerous inbred strains 
modeling human complex traits, since analysis of 
rodent models provides the most promising approach 
to the characterization of genetic components behind 
human complex disease (Lander and Schork, 1994). 

Although the rat models have often been thor- 
oughly studied from a physiological point of view, 
the genetic analysis has been lacking. Recently, there 
have been rapid developments of the scientific tools 
for molecular genetics analysis in rats. These tools 
include highly polymorphic genetic markers, large 
insert genomic DNA libraries, unambiguous genetic
nomenclature, and the RATMAP rat-specific genome 
database. Thus, it is now possible to analyze the gen- 
etic segregation of complex traits in progeny from 
crosses between rat strains that are susceptible or 
resistant to the particular diseases. Subsequently, asso- 
ciation analysis can be used to pinpoint candidate 
regions for the genes involved. 

From a genetic point of view it is typical that in 
complex disease (1) multiple genes work together to 
produce a phenotype that is often quantitative (e.g., 
blood pressure, glucose level, tumor latency time), and 
(2) the effects of different subsets of genes may result 
in identical phenotypes. Thus, the analysis often
involves mapping 'quantitative trait loci' (QTLs).
Hypertension, which is associated with cardiovascular
disease, may be mentioned as an example. There are
several different rat hypertension models that have
been independently derived from different
normotensive strains. As expected, the preliminary
findings show that different genes are responsible for
the hypertensive phenotype in the different models,
confirming the complex nature of the trait. For
instance, QTLs related to defects in glucose and fatty
acid metabolism were identified on chromosome 4
in a rat model of hypertension and insulin resistance.
Combining various molecular genetics methods it was 
possible to identify a defective gene (the gene was 
Cd36, also known as Fat, fatty acid translocase; 
Aitman et al., 1999), which is a very strong candidate 
to underlie insulin resistance, defective fatty acid 
metabolism, and hypertriglyceridemia in this model. 
The homologous human gene may be important in the 
pathogenesis of human insulin resistance syndromes. 
One can conclude that work with the rodent models 
clearly opens up new opportunities to identify and 
characterize genes involved in complex diseases. 


Rats: Genetics and Cytogenetics 1609 


Genetic Analysis of Rat Cancer Models


A genetic approach has also been applied in the study
of rat models of cancer. There are inbred rat strains in
which the animals are predisposed to specific types of
cancer such as breast cancer, uterine cancer, or nerve
cell cancer. Regions containing susceptibility genes
can be identified through genetic analysis of crosses
between susceptible and resistant rats. In addition to
determining the susceptibility genes by association
and linkage studies, genetic screening methods can
be used to detect genetic changes in the tumors
themselves. It is difficult to identify the genetic changes
that are important in human tumors because there are
often so many changes present, many of which may be
completely irrelevant. One would expect the spectrum of
genetic changes seen in the tumors of a model system to
be less diverse, since both genetic and environmental
variation can be kept at a minimum level in the models.
Significant genetic changes associated with cancer are
expected to be mutations in suppressor genes (that
might be lost or inactivated by loss-of-function
mutations) and/or oncogenes (that can be amplified or
'activated' by gain-of-function mutations). These
types of changes may be detected by two molecular
screening methods: loss of heterozygosity (LOH) in
tumor DNA and comparative genomic hybridization
(CGH).

Figure 1 Diagrams showing the average curves from CGH scans of rat chromosome 4 (represented by a banded
idiogram in each diagram). The tested DNA is from normal rat liver tissue (A) or from rat sarcoma tumors (B-E). In
the CGH method the ratio of test DNA to normal DNA is measured along each chromosome. In each of the
diagrams the line marked with an arrowhead represents the 1:1 ratio. If the curve resulting from the scan is off to
the right, there are extra DNA copies in the tumor DNA, whereas the curve will be off to the left if the
relative copy number is lower in the tumor DNA than in normal tissue DNA. The lines parallel to the 1:1 ratio lines
represent an average value of one extra copy (the lines on the right) or one copy less (the lines on the left). If
normal DNA is used as the test DNA, the curve stays very close to the 1:1 ratio line (A).
Scans of four rat tumor DNAs are also shown. In the tumor LB20 (B) there is on average one extra copy of
chromosome 4 in each cell ('trisomy 4'), whereas in each of the tumors LB32, LB131, and LB133 (C-E) there is
amplification of a proximal segment of rat chromosome 4. The c-met oncogene has been mapped to this
chromosomal region, and, as mentioned in the text, numerous extra copies of c-met are often found in this particular
tumor model. The resulting c-met overexpression is probably very significant for tumor development in the
animals, and an indication that similar mechanisms may be active in corresponding human tumors.

The LOH method aims at detecting genetic losses
in tumor DNA. Such losses are diagnostic of the
presence of a tumor suppressor gene. The status of
genetic markers that are heterozygous in the normal cells
is tested in the tumors by genome-wide screening.
Markers adjacent to a deleted suppressor gene are
often co-deleted, and, thus, the tumor DNA becomes
hemizygous (or even nullizygous if both copies are
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lost) for markers located near the position of the sup- 
pressor gene. CGH can also be used to screen the 
tumors. In this method tumor DNA (labeled with 
green fluorescence) is allowed to compete with normal 
DNA (labeled with red fluorescence). The DNA mix- 
ture is hybridized to normal rat metaphase prepara- 
tions and each chromosome is scanned for the ratio 
of green to red fluorescence. The scans will pinpoint 
the regions of major deviations from a 1:1 ratio (see 
Figure |). A ratio significantly greater than 1 is indi- 
cative of gene amplification in the tumor DNA, 
whereas a ratio significantly below 1 is suggestive 
of gene loss/deletion. Using the CGH method it 
was possible to identify DNA amplification in 
the proximal part of rat chromosome 4 in a subset 
of rat sarcoma tumors. Further experimentation 
led to the conclusion that these tumors displayed 
amplification and overexpression of the c-met 
oncogene, a gene that was shown to be frequently 
overexpressed in certain human sarcomas as well (Helou et al., 1999).
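The logic of reading a CGH scan can be sketched in a few lines of Python. This is a minimal illustration, not the analysis used by Helou et al.: it assumes a diploid reference, so one extra copy gives an expected green:red ratio of 3:2 = 1.5 and one lost copy gives 1:2 = 0.5, and the calling thresholds of 1.25 and 0.75 (halfway between those values and 1) are hypothetical choices. Real analyses involve normalization and averaging steps not shown here.

```python
from typing import List, Tuple


def expected_ratio(tumor_copies: int, reference_copies: int = 2) -> float:
    """Expected tumor:normal (green:red) ratio for a region, assuming a diploid reference."""
    return tumor_copies / reference_copies


def call_regions(green: List[float], red: List[float],
                 gain: float = 1.25, loss: float = 0.75) -> List[Tuple[int, int, str]]:
    """Return (first_index, last_index, 'gain' or 'loss') for runs of positions deviating from 1:1.

    The gain/loss thresholds are hypothetical defaults chosen for this sketch.
    """
    ratios = [g / r for g, r in zip(green, red)]
    calls = ["gain" if x > gain else "loss" if x < loss else "normal" for x in ratios]
    regions = []
    start = 0
    for i in range(1, len(calls) + 1):
        # Close the current run at the end of the scan or when the call changes.
        if i == len(calls) or calls[i] != calls[start]:
            if calls[start] != "normal":
                regions.append((start, i - 1, calls[start]))
            start = i
    return regions


print(expected_ratio(3), expected_ratio(1))  # 1.5 0.5
green = [100, 100, 150, 160, 300, 100, 50, 45]  # invented fluorescence values
red = [100] * 8
print(call_regions(green, red))  # [(2, 4, 'gain'), (6, 7, 'loss')]
```

A run of high ratios, such as positions 2-4 above, corresponds to the kind of amplified segment seen on rat chromosome 4; a run of low ratios would suggest loss or deletion.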


Comparative Mapping Provides the 
Connection to Humans 


Thus, analysis of rat models will lead to the identi- 
fication of genetic factors involved in the develop- 
ment of diseases in these rat strains. Comparative 
mapping is the key tool for transferring the results
from the model organism to humans. When the gene 
maps of different mammalian species are compared, 
the findings are unanimous in showing that there is 
conservation of large chromosome regions between 
species. Results obtained with new methodology 
involving heterologous chromosome painting (so- 
called zooFISH) largely support the conclusions that 
have been made earlier based on comparative cytogen- 
etics and comparative mapping. Taken together, these 
studies confirm that it is possible to predict the loca- 
tion and nature of human genes based on information 
about the corresponding chromosome region in a 
model organism. Thus, it seems clear that the analysis 
of disease in model organisms such as the rat is going 
to have a tremendous impact on diagnosis and treat- 
ment of human disease. 
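The prediction step can be pictured as a lookup in a table of conserved segments. The Python sketch below is a hypothetical illustration, not a published method: the block coordinates and chromosome names are invented, and it assumes that gene order within a conserved block is preserved (possibly in inverted orientation), so that a position can be interpolated linearly. Real comparative maps are built from many anchor markers and need not scale linearly.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class ConservedBlock:
    """A chromosome segment whose gene order is conserved between a model organism and humans."""
    model_chrom: str
    model_start: int
    model_end: int
    human_chrom: str
    human_start: int
    human_end: int
    inverted: bool = False  # True if the segment lies in reverse orientation in humans


def predict_human_position(blocks: List[ConservedBlock], chrom: str,
                           pos: int) -> Optional[Tuple[str, int]]:
    """Map a model-organism position into human coordinates by linear interpolation within a block."""
    for b in blocks:
        if b.model_chrom == chrom and b.model_start <= pos <= b.model_end:
            frac = (pos - b.model_start) / (b.model_end - b.model_start)
            if b.inverted:
                frac = 1.0 - frac
            return b.human_chrom, round(b.human_start + frac * (b.human_end - b.human_start))
    return None  # position falls outside every known conserved block


# Invented coordinates, for illustration only.
blocks = [
    ConservedBlock("model_1", 0, 1000, "human_A", 5000, 7000),
    ConservedBlock("model_1", 1000, 2000, "human_B", 100, 1100, inverted=True),
]
print(predict_human_position(blocks, "model_1", 500))   # ('human_A', 6000)
print(predict_human_position(blocks, "model_1", 1250))  # ('human_B', 850)
```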
Nucleoli are distinctly staining regions within eukaryotic cell nuclei and are the site of ribosomal RNA
(rRNA) synthesis. Typically, a cell has two nucleoli, 
but the oocytes of some organisms have dozens, hun- 
dreds, or even thousands of them. The molecular basis 
for this cytological observation was uncovered in the 
1960s with the discovery that the genes encoding the 
major rRNAs are amplified up to 1000-fold in these 
specialized cells. 

rDNA is the abbreviation for the genes encoding 
the major rRNAs. In eukaryotes these genes are typic- 
ally organized as tandem copies of a basic repeating 
unit at one or a few chromosomal loci (Figure 1). The 
repeating unit contains coding sequences for three of 
the four RNAs of the large and small ribosomal sub- 
units, i.e., the 28S, 5.8S and 18S rRNAs in vertebrates. 
(The genes for 5S RNA, another large subunit con- 
stituent, are usually in a different chromosomal loca- 
tion(s).) The three rRNAs are transcribed together 
into a precursor that is subsequently processed by 
nucleases and modifying enzymes into the separate 
components. The precursor is synthesized by RNA 
polymerase I. In the genome the transcribed sequences 
alternate with nontranscribed sequences called spacers, 
and there are as many as several hundred repeating 
units in a single cluster. 

In amphibian oocytes, the extra rDNA exists as 
extrachromosomal circles, each of which contains 
multiple copies of the repeating unit. Details of the 
mechanism by which these amplified copies are 
generated are not completely worked out, but some 
features have been characterized. The first step, in 
which the initial extrachromosomal copies are pro- 
duced, is the most obscure. The chromosomal com- 
plement of rDNA is not obviously depleted, so the 
process is essentially replicative. The bulk of the 
amplification may occur by extrachromosomal rolling 
circle replication, since intermediates of this type have 
been observed in the electron microscope. Producing circular multimers from linear rolling-circle tails requires a recombination process, probably one that is
homology-dependent, but the actual mechanism has 
not been determined. 

Why would amphibian oocytes go to the trouble of 
making so many copies of their rRNA genes? The 
answer lies in the strategy of embryogenesis employed 
by these organisms. In most amphibians, eggs are laid 
and fertilized outside the body of the mother, essen- 
tially in open water. To achieve an independent status 
as rapidly as possible, the embryo traverses the early 
stages of development extraordinarily rapidly: the first 
12 cell divisions occur at approximately 30-minute 
intervals, until the mid-blastula stage is reached. Bio- 
synthesis of cellular constituents could not keep pace 
with such rapid cell divisions, so much of the material 
required for a 10 000-cell embryo is stored in the
oocyte and partitioned into daughter cells at each 
cleavage division. This is true of cellular enzymes, 
mitochondrial components, and ribosomes; in fact 
the content of these constituents is typically equiva- 
lent to about 100 000 normal somatic cells. 
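A short calculation shows why biosynthesis cannot keep pace with cleavage. This Python sketch is an idealization, assuming that every cell divides synchronously and that each cycle lasts exactly 30 minutes; in the real embryo the cycles are not all identical.

```python
def cells_after_cleavage(divisions: int) -> int:
    """Cell number after a given number of idealized, synchronous doublings from one zygote."""
    return 2 ** divisions


def elapsed_hours(divisions: int, interval_minutes: float = 30.0) -> float:
    """Time taken if every division cycle lasts the same number of minutes (an assumption)."""
    return divisions * interval_minutes / 60.0


print(cells_after_cleavage(12), elapsed_hours(12))  # 4096 6.0
```

Under these assumptions the embryo reaches about 4000 cells in roughly 6 hours, far too quickly to synthesize each cell's ribosomes from scratch.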

During the synthesis of proteins, there is a natural 
amplification, since each messenger RNA can be 
translated repeatedly. For the stable RNAs, such as 
rRNA, there is no such intermediate step, and to 
achieve the necessary accumulation in the allotted 
time, it is necessary to amplify the templates for 
rRNA transcription. The extrachromosomal rDNA 
circles are actively transcribed, as has been demon- 
strated in dramatic electron micrographs (Figure 2). 
These images correspond exactly to the structure of the repeating units illustrated in Figure 1: each ‘Christmas tree’ represents transcripts of increasing length being synthesized on the rDNA axis, and the gaps between the trees are the nontranscribed spacers.

Figure 1 Organization of the rRNA genes in eukaryotes. The top line indicates that there are multiple tandem copies of a basic repeating unit that consists of a transcribed region and a nontranscribed spacer. One transcription unit is enlarged below to show the locations of coding sequences for the 18S, 5.8S, and 28S rRNAs (shaded). Unshaded regions within the transcribed precursor are discarded during processing of the mature rRNAs. The promoter is at the left end of the transcription unit, and the 18S sequences are near the 5’ end of the precursor RNA.

In the frog Xenopus laevis, amplification of rDNA occurs in small, stage I oocytes at about the time that the animal is going through metamorphosis. Synthesis of rRNA starts later, becomes maximal in the middle stages of oogenesis (stages III-V), and decreases again in the largest oocytes. Eventually, approximately 5 µg of rRNA and 10¹² ribosomes accumulate in each oocyte, and these are ultimately distributed to cells of the embryo during the cleavage stages that immediately follow fertilization.

Figure 2 Actively transcribing amplified rRNA genes visualized in the electron microscope. (A) Field showing many transcription units from a nucleolus of an oocyte of the newt Notophthalmus viridescens. (B) Single transcription unit showing the DNA axis (solid arrow) and the lateral rRNA transcripts. A terminal knob (open arrow) may be a site of RNA processing. Bar: (A, B) 0.5 µm. (Reproduced with permission from O'Reilly et al. (1994) Chromosoma 103: 122-128.)

Because of the linkage of the genes for the 28S, 18S, 
and 5.8S rRNAs in the amplified rDNA and in the
RNA precursor, their amounts are effectively balanced 
at all stages of synthesis. Each ribosome must also 
contain one molecule of 5S rRNA. In X. laevis, the 
genes for this small RNA are more abundant in the 
chromosomes than those for the other rRNAs (20 000 
copies vs. 500), but they are not amplified in oocytes. 
The larger endogenous copy number is not sufficient 
to make up for the 1000-fold amplification of rDNA, 
so 5S rRNA synthesis begins earlier in oogenesis and 
continues over a longer period of time to ensure ad- 
equate quantities for all of the ribosomes produced. 
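The copy numbers given above make the imbalance concrete. This rough Python sketch assumes that gene copy number is the only variable, that is, that each gene is transcribed at the same rate; it simply compares the number of templates available for each kind of rRNA.

```python
rdna_chromosomal = 500   # chromosomal rDNA repeats in X. laevis (from the text)
amplification = 1000     # approximate fold amplification of rDNA in oocytes (from the text)
five_s_genes = 20_000    # chromosomal 5S rRNA genes, not amplified (from the text)

rdna_templates = rdna_chromosomal * amplification
ratio = rdna_templates / five_s_genes  # how many rDNA templates there are per 5S gene
print(rdna_templates, ratio)  # 500000 25.0
```

Because each ribosome needs one 5S rRNA for every large precursor, the 5S genes, being outnumbered about 25-fold by amplified rDNA templates, must be transcribed for longer to keep pace.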

The other well-characterized case of rDNA amplification is in the single-cell protozoon Tetrahymena
thermophila. This case is more complex because ampli- 
fication occurs in the context of extensive genome 
rearrangements during the formation of the vegetative 
macronucleus. During macronuclear development in 
Tetrahymena, some chromosomal DNA is eliminated, 
and the remaining portions are amplified 45-fold. 
rDNA is exceptional, since it is amplified 18 000-fold. 
Macronuclear rDNA exists as 9000 linear, dimeric, 
palindromic 21-kb molecules, i.e., each contains two
fundamental repeating units in inverted orientation. 
Although many details of the mechanism of amplifi- 
cation are not known, all of the amplified rDNA 
derives from a single original chromosomal gene. 
This is excised, dimerized, and replicated bidirection- 
ally from an origin near the center of the palindrome. 
Presumably, the overreplication of rDNA compared 
to the remainder of macronuclear genes reflects some 
special properties of this replication origin. 
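The Tetrahymena figures are internally consistent, as a quick Python check shows. The sketch assumes, as stated above, that all macronuclear rDNA derives from a single original gene copy.

```python
molecules = 9000          # macronuclear rDNA molecules (from the text)
units_per_molecule = 2    # each palindrome holds two repeating units in inverted orientation
bulk_amplification = 45   # fold amplification of the rest of the macronuclear genome

rdna_units = molecules * units_per_molecule                  # fold amplification of rDNA
relative_overreplication = rdna_units / bulk_amplification   # rDNA excess over bulk DNA
print(rdna_units, relative_overreplication)  # 18000 400.0
```

Thus 9000 dimeric molecules account for the 18 000-fold amplification, roughly 400 times more replication than the bulk of macronuclear DNA receives.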

Two additional features of Tetrahymena rDNA are 
worth noting, although they are not related to ampli- 
fication per se. Like other stable, linear chromosomes, 
both ends of each rDNA molecule are capped by 
telomeres (see Telomeres). In fact, the telomeres of 
T. thermophila rDNA were the first to be character- 
ized at the sequence level. Second, the first ribozyme 
to be identified was the self-splicing intron within the 
precursor of the large rRNA of Tetrahymena. The 
discovery that RNA could act as an enzyme was revo- 
lutionary, and it completely changed our perspective 
both on biological catalysis and on the molecular 
origins of life. 

The extensive information available on the amplified rDNAs of both Xenopus and Tetrahymena is a reflection, not only of their inherent interest, but also
of the fact that, due to their overrepresentation, these 
DNAs could be isolated in relatively pure form prior 
to the advent of recombinant DNA, PCR, and other 
current techniques of DNA isolation and analysis. 




See also: Polymerase Chain Reaction (PCR); 
Recombinant DNA; Ribosomal RNA (rRNA) 


Reading Frame 


Copyright © 2001 Academic Press 
doi: 10.1006/rwgn.2001.1982 


A reading frame is one of the three possible ways in 
which a nucleotide sequence can be read. The genetic 
code is read as a series of nonoverlapping triplets, and 
thus there are three alternative ways of translating 
a sequence of nucleotides into protein, each with a 
different starting point. 
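The three reading frames can be demonstrated directly. This Python sketch translates one strand in each of its three frames using the standard genetic code, written compactly as a 64-character string in TCAG order; it is an illustration rather than a full sequence-analysis tool, and the function names are hypothetical.

```python
BASES = "TCAG"
# Standard genetic code; amino acids listed in TCAG order for codon positions 1, 2, and 3.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate_frame(seq: str, offset: int) -> str:
    """Translate nonoverlapping triplets starting at offset 0, 1, or 2; '*' marks a stop codon.

    Raises KeyError for characters other than A, C, G, and T.
    """
    seq = seq.upper()
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(offset, len(seq) - 2, 3))


def three_frames(seq: str) -> list:
    """Return the translations of all three reading frames of one strand."""
    return [translate_frame(seq, offset) for offset in range(3)]


print(three_frames("ATGAAATAG"))  # ['MK*', '*N', 'EI']
```

The same nine nucleotides give three unrelated peptides depending on the starting point. Applying the function to the reverse complement would give the three frames of the other strand.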


See also: Closed Reading Frame; Frameshift 
Mutation; Open Reading Frame 
Types of Genetic Rearrangements


Genomes are not static but are subject to genetic 
rearrangements that cause gross changes in genetic 
content and the location of genes on chromosomes. 
Genetic rearrangements are of four distinct types: 
deletions, duplications, inversions, and translocations. 
Deletions result from the loss of a contiguous seg- 
ment of the genome. Duplications are caused by the 
addition of a segment already present in the genome. 
In many cases, the additional material is inserted next 
to its original location, resulting in a tandem duplica- 
tion. Inversions are caused by the reversal of the 
orientation of a genetic segment within a chromosome. Translocations result from the joining of segments of two distinct chromosomes. Extensive genetic rearrangements involving large segments of chromosomes can
often be ascertained by cytological examination. 
However, many genetic rearrangements are more 
local but may none the less have serious genetic 
consequences. 
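The four types of rearrangement can be illustrated with a toy model in Python that treats a chromosome as an ordered list of labeled segments, each with an orientation. This is only a sketch: the segment labels and list representation are assumptions made for illustration, real rearrangements act on DNA sequence, and the translocation shown is the reciprocal form.

```python
from typing import List, Tuple

# A segment is (label, orientation); orientation is +1 (forward) or -1 (reversed).
Segment = Tuple[str, int]
Chromosome = List[Segment]


def make(labels: str) -> Chromosome:
    """Build a chromosome whose segments are all in forward orientation."""
    return [(label, 1) for label in labels]


def delete(chrom: Chromosome, start: int, end: int) -> Chromosome:
    """Remove the contiguous segments chrom[start:end]."""
    return chrom[:start] + chrom[end:]


def tandem_duplicate(chrom: Chromosome, start: int, end: int) -> Chromosome:
    """Insert a second copy of chrom[start:end] immediately after the original."""
    return chrom[:end] + chrom[start:end] + chrom[end:]


def invert(chrom: Chromosome, start: int, end: int) -> Chromosome:
    """Reverse the order and flip the orientation of chrom[start:end]."""
    flipped = [(label, -strand) for label, strand in reversed(chrom[start:end])]
    return chrom[:start] + flipped + chrom[end:]


def reciprocal_translocation(c1: Chromosome, c2: Chromosome,
                             b1: int, b2: int) -> Tuple[Chromosome, Chromosome]:
    """Exchange the segments distal to breakpoints b1 and b2 between two chromosomes."""
    return c1[:b1] + c2[b2:], c2[:b2] + c1[b1:]


def show(chrom: Chromosome) -> str:
    """Render segments, marking inverted ones with a prime."""
    return " ".join(label if strand > 0 else label + "'" for label, strand in chrom)


c = make("ABCDE")
print(show(delete(c, 1, 3)))            # A D E
print(show(tandem_duplicate(c, 1, 3)))  # A B C B C D E
print(show(invert(c, 1, 4)))            # A D' C' B' E
t1, t2 = reciprocal_translocation(make("ABC"), make("XYZ"), 1, 2)
print(show(t1), "|", show(t2))          # A Z | X Y B C
```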


Consequences of Genetic 
Rearrangements 


A large subset of mutations that cause loss of genetic 
function are rearrangements. For example, in the lacI
(lactose repressor) gene of Escherichia coli, approxi- 
mately 80% of mutations that inactivate the gene 
(including a strong mutational hot spot) are deletions; 
in humans, 85% of patients with steroid sulfatase 
deficiency carry a deletion of the STS gene. Deletions 
almost always have serious genetic consequences due 
to the loss of genetic information. Duplications, trans- 
locations, and inversions can also cause deleterious 
effects if a breakpoint of the rearrangement lies within 
a gene. Inversions and translocations sometimes place 
genes in an unfavorable chromosomal locale that 
represses gene expression (so-called ‘position effects’). 
Translocations can produce acentric or dicentric 
chromosomes which cannot segregate properly during 
mitosis. Crossing-over between an inversion or translocation chromosome and its normal homolog can lead to problems during
meiosis, producing offspring with incomplete geno- 
mic information. 

Genetic rearrangements play an important role in 
genomic evolution. Tandem duplications can have 
favorable effects by increasing gene dosage and pro- 
viding an opportunity for evolution of altered genetic 
function within the duplicated genes. For instance, the 
genes for rRNA are found in tandemly duplicated 
arrays in many organisms and may have arisen by 
ancestral gene duplication, with the increased gene copy number providing the capacity for higher expression. The globin loci of mammals are found in tandem
arrays and some of these duplicated genes have 
evolved different properties of expression and bio- 
chemical function. In addition, gross chromosomal 
rearrangements such as large inversions and transloca- 
tions may accompany speciation. 


Mechanisms of Genetic Rearrangements 


Sequence analysis of the breakpoints of genetic re- 
arrangements provides important clues about the 
mechanisms that drive these processes. In addition, 
in model genetic organisms, such as the bacterium 
E. coli and the yeast Saccharomyces cerevisiae, genetic 
rearrangements have been studied systematically and 
genes that affect the frequencies of rearrangements have 
been discovered. From this work, it is clear that several
mechanisms contribute to genomic rearrangements. 

Many, but not all, genetic rearrangements occur 
between repeated genomic sequences. Homologous 
break-and-join recombination can occur between 
repetitive sequences such as duplicated genes or transposable elements (or their remnants) dispersed
throughout chromosomes. For example, an inversion 
of about 20% of the chromosome in a strain of E. coli 
occurred by recombination between dispersed rRNA 
genes. Duplications and deletions in several human 
disease loci are caused by rearrangements between 
repetitive Alu sequence elements, perhaps by unequal 
crossing-over between these Alu repeats. In addition, many short-range deletions and duplications (spanning up to several thousand bases) have short repeated sequences,
several nucleotides in length, at their end points. 
These homologies are believed to be too short for 
homologous recombination. Instead, these rearrange- 
ments are thought to occur by slipped misalignment 
of DNA strands during replication and may be 
facilitated by DNA sequences or structures that stall 
replication. Because of the nature of this mechanism, it 
is restricted to short-range rearrangements and med- 
iates only deletions and duplications. Transposable 
genetic elements can also mediate genetic rearrange- 
ments such as deletions and inversions during the 
transposition process. In addition, topoisomerases 
have been proposed to catalyze deletions by their 
DNA cleavage and rejoining activity, as the preferred 
sequences for particular topoisomerases have been 
found at certain deletion end points. The relative 
contribution of each mechanism to genetic rearrange- 
ments involving a particular locus may depend on the 
chromosomal sequence context and other environ- 
mental factors. 
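The outcome of slipped misalignment between two short direct repeats can be sketched in Python. This models only the end products, not the replication mechanism: a deletion keeps one copy of the repeat and loses the intervening sequence, while a duplication adds an extra copy of the intervening sequence plus one repeat. The sequence is invented, and the 5000-base maximum spacing is a hypothetical default echoing the "several thousand bases" range mentioned above.

```python
from typing import List, Tuple


def find_direct_repeats(seq: str, k: int, max_distance: int = 5000) -> List[Tuple[int, int]]:
    """Return (i, j) start positions of identical, nonoverlapping k-mers with i < j."""
    seen = {}
    pairs = []
    for j in range(len(seq) - k + 1):
        kmer = seq[j:j + k]
        for i in seen.get(kmer, []):
            if k <= j - i <= max_distance:
                pairs.append((i, j))
        seen.setdefault(kmer, []).append(j)
    return pairs


def deletion_product(seq: str, i: int, j: int, k: int) -> str:
    """Lose one repeat copy together with the sequence between the two copies."""
    return seq[:i + k] + seq[j + k:]


def duplication_product(seq: str, i: int, j: int, k: int) -> str:
    """Gain an extra copy of the intervening sequence plus one repeat."""
    return seq[:j + k] + seq[i + k:j + k] + seq[j + k:]


seq = "GGATCTTTTATCGG"  # invented: ATC repeats flank TTTT
pairs = find_direct_repeats(seq, 3)
print(pairs)  # [(2, 9)]
i, j = pairs[0]
print(deletion_product(seq, i, j, 3))     # GGATCGG
print(duplication_product(seq, i, j, 3))  # GGATCTTTTATCTTTTATCGG
```

As the text notes, this mechanism can produce only deletions and duplications, and each product retains a copy of the short repeat at its junction.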

Chromosomal breaks, such as those induced by 
ionizing radiation or replication of damaged DNA 
templates, increase the occurrence of genetic rearrange- 
ments. A broken chromosome may undergo recombi- 
nation between repetitive elements on the broken 
segments, which heals the break but produces a dele- 
tion. Alternatively, recombination between a repeti- 
tive element on the broken chromosome and a repeat 
on another chromosome can produce translocations. 
Broken chromosomes can apparently also be religated 
with no sequence homology at their joints, in a process called nonhomologous end-joining or ‘illegitimate’ joining.

Genetic rearrangements are important for genomic 
evolution and form the basis for many genetic muta- 
tions including the genetic changes that accompany 
carcinogenesis. The goal of much current research is 
to define the molecular steps in these mechanisms of 
rearrangements. In addition, cellular factors that either 
promote or discourage such rearrangements are being
elucidated. The relative contribution of these mechan- 
isms to human genetic disease will continue to be 
investigated. 


See also: Genetic Diseases; Inversion 
The rec genes of Escherichia coli are genes required
for homologous genetic recombination. Many of them 
were identified in genetic screens for E. coli mutants 
that fail to incorporate into their chromosomes, via 
recombination, linear DNAs transferred into them 
from high frequency recombination (Hfr) donor bac- 
teria. They were named recA, recB, recC, etc. The rec 
alphabet currently ends with recT and includes 13 
genes (having skipped some letters). The rec genes 
encode proteins or enzymes that participate in the 
reactions of DNA recombination, DNA repair, and 
regulation of the SOS response to DNA damage. 
Many of the E. coli rec genes have orthologs in other 
eubacteria, Archaea, and eukaryotes, some of whose 
products demonstrably perform similar DNA repair 
and recombination functions. 

In the mid-1960s, A. John Clark and his colleagues 
at Berkeley performed the first genetic screens to 
identify mutants of E. coli that fail to carry out homologous recombination (e.g., Clark and Margulies, 1965; reviewed by Clark and Sandler, 1994, and Clark, 1996). They were interested in understanding the
molecular mechanisms by which homologous recom- 
bination works and reasoned that finding the proteins 
that catalyze the reactions of recombination, by first 
finding the genes encoding them, would be a pro- 
ductive approach. They used bacterial conjugation, 
discovered by Joshua Lederberg, as an assay for re- 
combination (see F Factor, Hfr). Such experiments use 
male or donor Hfr bacteria: bacteria that have an F sex 
plasmid incorporated into their chromosome. Hfr 
bacteria can transfer linear single-stranded copies of 
segments of their chromosome into female or recipi- 
ent cells (cells that have no F sex plasmid), in which the 
DNA is replicated to become a linear duplex. Recom- 
binants form only if the linear DNA transferred by 
the donor becomes incorporated into the recipient 
chromosome via homologous recombination. Usually 
in such experiments, the donor and recipient each 
possess a selectable marker so that recombinant types that inherit both selectable markers can be selected
directly on an appropriate solid medium in petri 
dishes. Clark and colleagues mutagenized recipient 
bacteria and screened for mutants that failed to pro- 
duce recombinant types in Hfr-mediated conjugation. 
They further screened out any mutants that were 
simply incapable of uptake of transferred DNA, by 
testing that their mutants could receive transferred 
lambda prophage DNA, which kills the recipient cell 
without needing to undergo recombination. Using 
this approach, they identified genes encoding import- 
ant recombination proteins. The first found, recA, 
encodes a universal recombination and DNA repair 
protein (see RecA Protein and Homology) with 
orthologs in all domains of life. Many subsequent rec 
genes have been discovered (Table 1) (see Recombin- 
ation Pathways). 

Some of the rec genes display their recombination-defective (Rec⁻) phenotype in cells carrying no other recombination-related mutations. Others display their Rec⁻ phenotype only if other rec genes (or related DNA recombination genes) are also mutated. The latter are interpreted as encoding (at least partially) redundant functions, in that some other gene product appears to substitute. Table 1 presents a list of rec and related
genes and brief summaries of their functions. Because 
many of the proteins that carry out recombination and 
DNA repair are highly conserved evolutionarily, notes 
about the extent of conservation are also provided. 

Additional proteins not identified in screens such as 
that described above participate in recombination. 
Because they were not identified in such screens, 
they are not named Rec. The screens for rec mutants 
could not identify essential genes and also missed 
many recombination genes with redundant or par- 
tially redundant functions. Other important re- 
combination proteins not named Rec include the 
RuvA, RuvB, and RuvC proteins (see RuvAB 
Enzyme, RuvC Enzyme); the major replicative poly- 
merase of E. coli, Pol III (reviewed by Marians, 2000); 
primasome assembly protein PriA and probably other 
proteins it associates with (reviewed by Marians, 
2000); exonucleases ExoI and ExoVII (Razavy et al.,
1996); SSB (single-strand DNA binding protein), 
DNA topoisomerases, DNA ligase, and probably 
many others not yet identified. 

rec genes have been described in several organisms 
other than E. coli. For the yeast Saccharomyces cere- 
visiae many genes relevant to recombination were 
identified on the basis of the radiation sensitivity of 
their mutants and have been designated RAD. Homo- 
logs of Rad proteins identified in higher eukaryotes 
also carry the Rad designation. For example, an 
important human homolog of S. cerevisiae Rad51 
and E. coli RecA is hRad51. 


Table 1 E. coli rec genes and proteins(a)

recA
  Biochemical functions: 1. Strand exchange; 2. Co-protease
  Null mutant phenotype(b): Rec⁻
  Homologs known in: Eubacteria, Archaea, Eukaryotes
  Role in recombination in E. coli: 1. Coats single-stranded DNA and catalyzes its invasion of duplex DNA; homologous pairing and strand exchange forming bimolecular, heteroduplex strand-exchange intermediates. 2. Sensor molecule for activation of the SOS DNA damage repair response, in which RecA acts as a co-protease, regulating the expression of many other genes via cleavage of the LexA repressor.
  Encyclopedia of Genetics articles: RecA Protein and Homology; SOS Repair

recB
  Complex: RecBCD enzyme (essential subunit)
  Biochemical functions: RecBCD is a double-strand exonuclease and helicase
  Null mutant phenotype(b): Rec⁻
  Homologs known in: Eubacteria
  Role in recombination in E. coli: 1. Creates single-strand DNA used by RecA for strand invasion and exchange. 2. Degrades linear duplex DNA.
  Encyclopedia of Genetics articles: RecBCD Enzyme and Pathway; Reckless DNA Degradation; Recombination Pathways; Chi Sequences

recC
  Complex: RecBCD enzyme (essential subunit)
  Biochemical functions: RecBCD is a double-strand exonuclease and helicase
  Null mutant phenotype(b): Rec⁻
  Homologs known in: Eubacteria
  Role in recombination in E. coli: 1. Creates single-strand DNA used by RecA for strand invasion and exchange. 2. Degrades linear duplex DNA.
  Encyclopedia of Genetics articles: RecBCD Enzyme and Pathway; Reckless DNA Degradation; Recombination Pathways; Chi Sequences

recD
  Complex: RecBCD enzyme
  Biochemical functions: Subunit required for nuclease activity of RecBCD; not required for helicase or recombination
  Null mutant phenotype(b): Chi inactive; hyper-rec in the absence of Chi
  Homologs known in: Eubacteria
  Role in recombination in E. coli: Required for modulation of the RecBCD enzyme at Chi sites, at which the RecBCD nuclease is diminished and RecA loading activity is stimulated. Required for nuclease activity of RecBCD and prevention of rolling circle replication in vivo.
  Encyclopedia of Genetics articles: RecBCD Enzyme and Pathway; Reckless DNA Degradation; Recombination Pathways; Chi Sequences
recE
  Complex: RecET
  Biochemical functions: 5’ to 3’ double-strand-dependent single-strand exonuclease
  Null mutant phenotype(b): None (but Rec⁻ in cells carrying recBC and sbcA mutations)
  Homologs known in: Eubacteria, phage
  Role in recombination in E. coli: The recE and recT genes are homologs of the phage lambda red pathway recombination genes. recE and recT are part of a cryptic lambdoid prophage in some strains of E. coli K12, apparently evolutionary remnants of a prophage. Normally inactive, these genes can be activated by sbcA mutations and then allow recombination in recBC mutant E. coli. RecE exposes 3’-ended ssDNA for strand invasion or annealing by RecT. This is called the ‘RecE’ or ‘RecET recombination pathway’.
  Encyclopedia of Genetics article: Recombination Pathways

recF
  Complex: RecF, O, and R may act as a complex
  Biochemical functions: Binds ssDNA, aids RecA-filament formation
  Null mutant phenotype(b): Slight HypoRec (but Rec⁻ in cells carrying recBC sbcB and sbcC or sbcD mutations)
  Homologs known in: Eubacteria
  Role in recombination in E. coli: May help RecA displace SSB from single-strand DNA. Required for recombination in the ‘RecF pathway’.
  Encyclopedia of Genetics articles: Recombination Pathways; RecA Protein and Homology

recG
  Biochemical functions: Strand-exchange-junction-specific helicase
  Null mutant phenotype(b): Slight HypoRec; Rec⁻ when cells are RuvA⁻, RuvB⁻, or RuvC⁻
  Homologs known in: Eubacteria
  Role in recombination in E. coli: May promote branch migration that extends heteroduplex joints of one polarity but disrupts heteroduplex joints of the opposite polarity. May be partially redundant with the Ruv system. Required for recombination in the ‘RecF pathway’.
  Encyclopedia of Genetics articles: RecBCD Enzyme and Pathway; RuvAB Enzyme; RuvC Enzyme; Recombination Pathways

recJ
  Biochemical functions: 5’ single-strand-dependent exonuclease
  Null mutant phenotype(b): HypoRec when ExoI and ExoVII are absent (and Rec⁻ in cells carrying recBC sbcB and sbcC or sbcD mutations)
  Role in recombination in E. coli: May expose 3’-ended ssDNA for strand invasion by RecA. Required for recombination in the ‘RecF pathway’.
  Encyclopedia of Genetics article: Recombination Pathways
recN
  Biochemical functions: Unknown
  Null mutant phenotype(b): None (but Rec⁻ in cells carrying recBC sbcB and sbcC or sbcD mutations)
  Role in recombination in E. coli: Unknown. Required for recombination in the ‘RecF pathway’.
  Encyclopedia of Genetics article: Recombination Pathways

recO
  Complex: RecF, O, and R may act as a complex
  Biochemical functions: Binds ssDNA, aids RecA-filament formation
  Null mutant phenotype(b): None (but Rec⁻ in cells carrying recBC sbcB and sbcC or sbcD mutations)
  Homologs known in: Eubacteria
  Role in recombination in E. coli: May help RecA displace SSB from single-strand DNA; hypothesized to allow 5’ DNA end invasions. Required for recombination in the ‘RecF pathway’.
  Encyclopedia of Genetics articles: Recombination Pathways; RecA Protein and Homology

recQ
  Biochemical functions: Helicase
  Null mutant phenotype(b): None (but Rec⁻ in cells carrying recBC sbcB and sbcC or sbcD mutations)
  Homologs known in: Eubacteria, Eukaryotes
  Role in recombination in E. coli: Hypothesized to unwind duplex DNA ends and, with RecJ, create 3’ ssDNA ends. Required for recombination in the ‘RecF pathway’.
  Encyclopedia of Genetics article: Recombination Pathways

recR
  Complex: RecF, O, and R may act as a complex
  Biochemical functions: Binds ssDNA, aids RecA-filament formation
  Null mutant phenotype(b): None (but Rec⁻ in cells carrying recBC sbcB and sbcC or sbcD mutations)
  Homologs known in: Eubacteria
  Role in recombination in E. coli: May help RecA displace SSB from single-strand DNA; hypothesized to allow 5’ DNA end invasions. Required for recombination in the ‘RecF pathway’.
  Encyclopedia of Genetics article: Recombination Pathways

recT
  Complex: RecET
  Homologs known in: Eubacteria
  Role in recombination in E. coli: See recE above.
  Encyclopedia of Genetics article: Recombination Pathways

(a) Table modified from Rosenberg and Motamedi (1999).

(b) Phenotype applies to cells carrying no other recombination mutations. Some of the recombination or rec genes that appear not to affect recombination phenotype much (e.g., recF, recO, recR, recQ, recN) may either be redundant with other functions that substitute when that gene is defective, or may play more important roles in recombination of DNA substrates other than the double-strand linear DNA that is processed during conjugational recombination. For example, their gene products may be more important for recombination of the circular bacterial chromosome and/or for DNA repair at single-strand rather than double-strand breaks. See Recombination Pathways, and Clark and Sandler (1994).




Further Reading 

Aravind L, Walker DR and Koonin EV (1999) Conserved 
domains in DNA repair proteins and the evolution of repair 
systems. Nucleic Acids Research 27: 1223-1242. 

Lloyd RG and Low KB (1996) Homologous recombination. In: 
Neidhardt FC, Curtiss R III, Ingraham JL et al. (eds.) Escherichia coli and Salmonella: Cellular and Molecular Biology, 2nd
edn, vol. 2, pp. 2236-2255. Washington, DC: American 
Society for Microbiology Press. 

Rosenberg SM and Motamedi MR (1999) Homologous recom- 
bination during bacterial conjugation. In: Embryonic Encyclo- 
pedia of Life Sciences, www.els.net. London: Nature Publishing 
Group. 
Genetic recombination functions primarily to maintain the integrity of genomic DNA while also contributing to the generation of genetic diversity. The RecA
protein is a central component in the processes of 
homologous genetic recombination and recombin- 
ational DNA repair in Escherichia coli. Functional 
homologs of RecA have been identified in every organ- 
ism examined. RecA is a DNA-dependent ATPase (an 
enzyme that hydrolyzes adenosine 5’-triphosphate) 
that catalyzes a strand exchange reaction between homologous DNA molecules. The active form of the
protein is a nucleoprotein filament formed when 
RecA monomers polymerize onto single-stranded 
DNA. In addition to its direct role in recombinational 
processes, the RecA protein regulates other repair 
pathways by mediating the induction of the E. coli 
SOS response to excessive DNA damage. 


RecA Monomer Structure 


The E. coli RecA protein consists of 352 amino acids 
and has a calculated molecular weight of 38742 Da. 
The three-dimensional structure of the protein was 
determined in the absence of DNA and in the presence 
and absence of cofactor (Story and Steitz, 1992; Story 
et al., 1992). The protein possesses a central core 
domain and two smaller domains at the amino (N) and carboxyl (C) termini (Figure 1). Sequence alignments carried out on the RecA proteins of many bacterial species and several eukaryotic RecA homologs have shown that the core domain is quite well conserved in this class of proteins. Protein domains with structural similarity to the RecA core domain, as well as limited sequence identity, are also the structural building blocks of oligomeric DNA helicases and of the F1-ATPase of bacterial and mitochondrial membranes.

Figure 1 Ribbon diagram of the RecA monomer structure complexed with ADP (Protein Data Bank #2REB). The monomer represented is a unit of the inactive filament observed in the crystal (see Figure 2). β-strands are numbered 0-10 and α-helices are lettered A-J. The ADP nucleotide cofactor is displayed in ball and stick. Residues between β-strand 1 and α-helix C form the Walker A box or the P-loop (colored dark). The Walker B box is located at β-strand 4 (also colored dark). Regions not ordered in the crystal structure are shown as dashed lines. Disordered loop 1 (L1) is located between β-strand 4 and α-helix F. Disordered loop 2 (L2) is located between β-strand 5 and α-helix G. The C-terminal domain is circled.

The core domain of the E. coli RecA protein consists of a mixed eight-stranded, twisted β-sheet flanked by four α-helices (Figure 1) in the crystal structure. This domain contains the nucleotide-binding site and two disordered regions presumed to be DNA-binding sites. The nucleotide-binding loop between β-strand 1 and α-helix C (amino acids 66-73) matches the P-loop (also referred to as the Walker A box) amino acid consensus sequence G/AXXXXGKT/S (where X is any residue) found in many nucleoside-triphosphate-binding proteins, such as the proto-oncogene Ras p21 protein. This loop usually interacts with the α- and β-phosphates of the nucleotide. Another ATP-binding motif, the Walker B box, is found in the RecA core domain at β-strand 4 (amino acids 140-144). This motif is characterized by four hydrophobic amino acid residues followed by an aspartate that interacts with the γ-phosphate of ATP in Ras p21 and other nucleotide-binding proteins. Because the RecA structure was solved with ADP (adenosine 5’-diphosphate), it is not yet known whether Asp144 actually interacts with the γ-phosphate when RecA is bound to ATP. Interestingly, a non-prolyl cis-peptide bond was found between Asp144 and Ser145 in the RecA structure. This uncommon configuration is also conserved at the end of the Walker B motif in F1-ATPase and the Rep helicase.
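The two consensus motifs described above translate directly into regular expressions. The sketch below scans a protein sequence for the Walker A pattern G/AXXXXGKT/S and for a Walker B pattern of four hydrophobic residues followed by an aspartate. Which residues count as "hydrophobic" is an assumption of this example, and the test fragment is made up for illustration rather than taken from a real RecA sequence.

```python
import re

# Walker A (P-loop) consensus from the text: G/A, any four residues, then G, K, T/S.
# The zero-width lookahead lets overlapping hits be reported.
WALKER_A = re.compile(r"(?=([GA].{4}GK[TS]))")

# Walker B: four hydrophobic residues followed by an aspartate. The hydrophobic
# set below is an assumption of this sketch.
HYDROPHOBIC = "ACFILMVW"
WALKER_B = re.compile(rf"(?=([{HYDROPHOBIC}]{{4}}D))")


def find_motif(pattern: re.Pattern, sequence: str) -> list[tuple[int, str]]:
    """Return (1-based start position, matched residues) for each motif hit."""
    return [(m.start() + 1, m.group(1)) for m in pattern.finditer(sequence.upper())]


# Made-up fragment for illustration only.
fragment = "MKAGPESSGKTTLAIVLLDSE"
print(find_motif(WALKER_A, fragment))  # [(4, 'GPESSGKT')]
print(find_motif(WALKER_B, fragment))  # [(15, 'IVLLD')]
```

Consensus scans like this one report candidates only; whether a hit forms a functional nucleotide-binding site has to be judged from structure or biochemistry, as the discussion of Asp144 above illustrates.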

RecA protein interacts with multiple DNA strands 
in the course of its reactions, but information about 
DNA binding remains limited. Two putative DNA- 
binding loops, L1 (amino acids 157-164) and L2 
(amino acids 195-209), are disordered in the crystal 
structure, presumably because DNA is not present in 
the crystal. Biochemical studies have provided experi- 
mental support for the proposal that these loops func- 
tion in DNA binding. The same studies also indicate 
that the DNA-binding regions extend beyond these 
two protein loops. 


RecA Filament Structure 


RecA monomers polymerize onto DNA in the pres- 
ence of ATP to form a nucleoprotein filament that is 
competent to promote DNA strand exchange and 
induce the SOS response. Monomers in the crystal 
structure pack so as to form a right-handed helical 
filament with six monomers per turn (Figure 2A). 
The amino-terminal domain of each monomer, consisting of α-helix A and β-strand 0, packs against β-strand 3 and α-helix D in the core domain of an adjacent monomer. The C-terminal domain lies on the exterior of the polymer structure, distal from the filament axis, and is exposed to solvent. The polypeptide ends bordering the disordered putative DNA-binding loops (L1 and L2), as well as the P-loop, lie close to the filament axis (Figure 2B). The information provided by the filament structure has allowed a wide range of structure-function analyses, but it is still limited. The crystallized filament does not contain ATP or DNA (one form was crystallized with ADP), is not as extended as the filaments characterized in vitro, and probably does not represent an active filament form.

Figure 2 The RecA filament structure. (A) Space-filling model of the side view of a RecA filament made up of 24 monomers (the same monomer shown in Figure 1) and based on the published monomer coordinates. The C-terminal domains (residues 270-328) are shown in a darker shade. (B) Same model as in view A, rotated 90° to show the view down the filament axis. The P-loop (darker shade) lies close to the axis.

The structural parameters for functioning RecA 
nucleoprotein filaments in vitro have been derived 
largely from electron microscopy studies (Figure 3). 
Extended active nucleoprotein filaments, in which 
RecA protein is bound to ssDNA (single-stranded 
DNA) in the presence of ATP or a non-hydrolyzable 
ATP analog, have a helical diameter of 100 Å, a pitch of 95 Å, an axial rise per nucleotide of 5.1 Å, six monomers per turn, and three ssDNA nucleotides per RecA monomer. In contrast, the RecA filament formed on ssDNA in the absence of cofactor or with ADP is the collapsed, inactive form, with a pitch of 64 Å and an axial rise per nucleotide of 2.1 Å.
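The filament parameters above are enough to estimate how much RecA a given ssDNA binds and how long the resulting filament is. The sketch below assumes a hypothetical, fully coated 3000-nucleotide ssDNA and applies the three-nucleotides-per-monomer stoichiometry to both forms, which is a simplifying assumption for the collapsed filament. All names are illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FilamentForm:
    name: str
    pitch_angstrom: float        # helical pitch
    rise_per_nt_angstrom: float  # axial rise per bound nucleotide


EXTENDED = FilamentForm("extended (ATP or analog)", 95.0, 5.1)
COLLAPSED = FilamentForm("collapsed (no cofactor or ADP)", 64.0, 2.1)
NT_PER_MONOMER = 3  # assumed for both forms in this sketch


def filament_geometry(n_nucleotides: int, form: FilamentForm) -> dict[str, float]:
    """Estimate monomers bound, contour length and helical turns for a fully coated ssDNA."""
    length_angstrom = n_nucleotides * form.rise_per_nt_angstrom
    return {
        "monomers": n_nucleotides / NT_PER_MONOMER,
        "length_um": length_angstrom / 10_000,  # 1 micrometre = 10,000 angstroms
        "turns": length_angstrom / form.pitch_angstrom,
    }


for form in (EXTENDED, COLLAPSED):
    geometry = filament_geometry(3000, form)
    print(form.name, {k: round(v, 2) for k, v in geometry.items()})
```

For the hypothetical 3000-nt substrate this gives about 1000 monomers and a contour length of roughly 1.5 µm for the extended filament, versus about 0.6 µm for the collapsed form, which makes the large conformational difference between the two states concrete.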



Figure 3 Electron microscopy picture of a RecA 
nucleoprotein filament on ssDNA in the presence of an 
ATP analog. The visible striations are indicative of the 
filament’s helical nature. (Courtesy of Dr Ross Inman, 
University of Wisconsin.) 
RecA Protein-Mediated DNA Strand
Exchange Reaction 


The in vitro DNA strand exchange reaction in Figure 4
is thought to mimic some of the central steps of homo- 
logous recombination of DNA catalyzed by RecA in 
vivo. The reaction can be broken down into four 
conceptually distinct phases. 


Phase 1: Nucleoprotein Filament Formation
In the presence of ATP, RecA monomers bind ssDNA 
stoichiometrically (one RecA monomer per three 
nucleotides). After a nucleation step, the filaments are 
extended unidirectionally (5' to 3’) and cooperatively. 
In a four-strand exchange reaction, the filaments are 
formed on a gapped duplex molecule. Both the nucle- 
ation of filament formation and the subsequent DNA 
pairing processes occur within the single-stranded 
gap. The RecA protein does not readily bind directly 
to duplex DNA at pH >7 due to a slow nucleation 
process. However, the filament can extend into the 
duplex region following nucleation on ssDNA. 


Phase 2: Homology Alignment 

The nucleoprotein filament recruits a linear dsDNA (double-stranded DNA) molecule homologous to the ssDNA already bound, and homology between the DNA molecules is aligned in the filament. The alignment appears to depend primarily on Watson-Crick interactions between the originally bound ssDNA and its complement within the incoming duplex. The mechanism by which RecA facilitates homologous pairing remains a focus of investigation.




Figure 4 Strand exchange reactions catalyzed by the RecA protein in vitro. The substrates, the intermediate, and the products of both reactions are distinguishable by agarose gel electrophoresis. See text for detailed descriptions of each phase of the reaction.


Phase 3: Strand Switch (Hybrid DNA 
Formation) 

Homologous alignment in phase 2 leads to rapid 
strand switching to form hundreds of base pairs of 
hybrid DNA. The like-strand in the duplex DNA is 
displaced to form the branched DNA intermediates 
depicted in Figure 4. The hybrid DNA formed in this 
step can encompass thousands of base pairs, although 
pairing is limited to about a thousand base pairs under 
most conditions. Studies utilizing nonhydrolyzable 
analogs of ATP or the RecA mutant K72R that is 
deficient in ATPase function have determined that 
only the binding of ATP, and not its hydrolysis, is 
required for phases 1, 2, and 3 of the strand exchange 
reaction. 


Phase 4: Extension and Product Formation 
The final phase is characterized by a directed exten- 
sion of the hybrid DNA segment formed in phase 3. 
This extension, which requires ATP hydrolysis, is 
unidirectional (5’ to 3’ with respect to the initiating 
single strand) and can proceed through structural bar- 
riers such as regions of heterology. RecA facilitates the 
migration of the branched intermediates until prod- 
ucts are formed. In the four-strand exchange reaction, 
the branch migration can move into the duplex region 
of the bound gapped DNA during this phase. 
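The division of labor between ATP binding and ATP hydrolysis across the four phases can be summarized as a small lookup model. The sketch below encodes the statement that phases 1-3 need only ATP binding while phase 4 needs hydrolysis, and treats the phases as strictly sequential. Grouping non-hydrolyzable analogs and the K72R mutant into one condition, and assuming no phase proceeds without cofactor, are simplifications of this example; the names are hypothetical.

```python
from enum import Enum


class Condition(Enum):
    ATP = "ATP"                            # ATP bound and hydrolyzed
    NONHYDROLYZABLE = "ATP analog / K72R"  # ATP binding without hydrolysis
    NO_COFACTOR = "no cofactor"            # assumed here to support no phase


# (phase number, short name, what the phase needs from ATP), per the text above.
PHASES = [
    (1, "nucleoprotein filament formation", "binding"),
    (2, "homology alignment", "binding"),
    (3, "strand switch (hybrid DNA formation)", "binding"),
    (4, "extension and product formation", "hydrolysis"),
]


def phases_reached(condition: Condition) -> list[int]:
    """Return the phases that can proceed, stopping at the first blocked phase."""
    can_bind = condition in (Condition.ATP, Condition.NONHYDROLYZABLE)
    can_hydrolyze = condition is Condition.ATP
    reached = []
    for number, _name, requirement in PHASES:
        allowed = can_hydrolyze if requirement == "hydrolysis" else can_bind
        if not allowed:
            break  # later phases depend on earlier ones
        reached.append(number)
    return reached


for condition in Condition:
    print(condition.value, phases_reached(condition))
```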


ATP and the RecA-Mediated Strand 
Exchange Reaction 


RecA protein hydrolyzes ATP to ADP and inorganic phosphate at a modest rate. In phase 1, RecA filaments bound to ssDNA hydrolyze ATP with a rate (kcat) approaching 30 min⁻¹. In this reaction, kcat is defined as the number of ATP molecules hydrolyzed per RecA monomer per unit time. ATP is hydrolyzed uniformly throughout the nucleoprotein filament. ATP hydrolysis is required for filament disassembly, which occurs at the end opposite to that at which filament extension occurs. However, filament disassembly does not play a major role in DNA strand exchange (other than to recycle the RecA protein after the reaction). When DNA strand exchange is initiated by addition of a homologous dsDNA, the rate of ATP hydrolysis drops abruptly to 20-22 min⁻¹. This new rate characterizes the hydrolysis reaction during phase 4 of strand exchange. Although its molecular role is not yet clear, ATP hydrolysis is necessary for unidirectional reactions, for bypassing structural barriers such as sequence nonhomologies or DNA lesions in the invading duplex molecule, and for facilitating the four-strand exchange reaction. Two competing hypotheses hold that ATP hydrolysis is needed either (a) to redistribute RecA monomers to fill in gaps in otherwise discontinuous filaments, or (b) to provide a motor function that permits the bypass of DNA structural barriers.
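Because kcat is defined per RecA monomer, total ATP turnover scales with filament size. The sketch below works through that arithmetic for a hypothetical filament of 1000 monomers over 10 minutes, using 30 min⁻¹ before strand exchange and the midpoint (21 min⁻¹) of the quoted 20-22 min⁻¹ range during phase 4. It assumes every monomer turns over at the same rate, consistent with the uniform hydrolysis described above.

```python
def atp_hydrolyzed(monomers: int, kcat_per_min: float, minutes: float) -> float:
    """ATP molecules hydrolyzed, assuming every monomer turns over at kcat."""
    return monomers * kcat_per_min * minutes


MONOMERS = 1000     # hypothetical filament, e.g. ~3000 nt of ssDNA at 3 nt per monomer
PHASE1_KCAT = 30.0  # min^-1, filament on ssDNA alone
PHASE4_KCAT = 21.0  # min^-1, midpoint of the 20-22 min^-1 range during strand exchange

before = atp_hydrolyzed(MONOMERS, PHASE1_KCAT, 10)
during = atp_hydrolyzed(MONOMERS, PHASE4_KCAT, 10)
print(before, during, f"drop of {1 - PHASE4_KCAT / PHASE1_KCAT:.0%}")
```

On these assumptions the filament consumes about 300,000 ATP in 10 minutes on ssDNA alone and about 210,000 during strand exchange, a drop of roughly 30%.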


Eukaryotic RecA Homologs 


A functional RecA homolog has been identified in 
every organism examined. The eukaryotic homolog 
Rad51 shares sequence, structural, and functional 
homology with RecA. For example, Rad51 from the 
yeast Saccharomyces cerevisiae is an ATPase that 
catalyzes the same DNA strand exchange reaction as 
RecA in vitro. It is a 400 amino acid protein with 
a calculated molecular weight of 42961 Da that is 
C-terminally truncated and N-terminally extended 
relative to RecA. The core of the Rad51 protein is 
very similar to the core of the RecA protein, with 
61% similarity and 35% identity. Although the 
three-dimensional structure of Rad51 has not been 
solved, electron microscopy analysis shows it forms 
a nucleoprotein filament very similar to that formed 
by RecA. However, there are clear biochemical differ- 
ences between the two proteins. The Rad51 protein 
hydrolyzes ATP at a rate approximately two orders of 
magnitude slower than RecA, and it does not exhibit 
the ssDNA binding preference of RecA, i.e., it nucle- 
ates onto dsDNA and ssDNA equally well. In add- 
ition, Rad51 apparently lacks the capacity of RecA to 
promote DNA strand exchange through heterologous 
DNA insertions and to promote four-strand exchange 
reactions. Rad51 may have a role in meiosis that would 
necessitate duplex DNA binding. While the biochem- 
ical characterization of the Rad51 protein is still in its 
infancy relative to that of the RecA protein, it is clear 
that its function is highly regulated and affected by 
interactions with a large number of additional proteins 
that may supply the ATPase-dependent functions that 
RecA exhibits but Rad51 lacks. 
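Percent identity and percent similarity, such as the 35% and 61% quoted for the Rad51 and RecA cores, are computed over a pairwise sequence alignment. The sketch below shows one common convention: identical or "similar" residue pairs are counted over gap-free alignment columns. The similarity groups are a hypothetical simplification (published figures usually rely on a substitution matrix), the denominator convention is an assumption, and the aligned strings are made up.

```python
# Hypothetical groups of "similar" residues; published similarity figures are
# usually computed with a substitution matrix (e.g. BLOSUM62) instead.
SIMILAR_GROUPS = ["AGST", "DENQ", "HKR", "ILMV", "FWY", "C", "P"]
GROUP_OF = {aa: i for i, group in enumerate(SIMILAR_GROUPS) for aa in group}


def _similar(x: str, y: str) -> bool:
    gx, gy = GROUP_OF.get(x), GROUP_OF.get(y)
    return x == y or (gx is not None and gx == gy)


def identity_and_similarity(aligned_a: str, aligned_b: str) -> tuple[float, float]:
    """Percent identity and similarity over gap-free columns of a pairwise alignment."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [(x, y) for x, y in zip(aligned_a.upper(), aligned_b.upper())
               if x != "-" and y != "-"]
    if not columns:
        raise ValueError("alignment has no gap-free columns")
    identical = sum(x == y for x, y in columns)
    similar = sum(_similar(x, y) for x, y in columns)
    return 100.0 * identical / len(columns), 100.0 * similar / len(columns)


print(identity_and_similarity("GKTLD-E", "GRTIEKW"))
```

Similarity is always at least as high as identity under this scheme, because identical pairs are counted as similar, which matches the pattern of the Rad51/RecA figures.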


Function of Homologous Genetic 
Recombination: DNA Repair 


Recombination is readily observed during conjugation 
or transduction in E. coli, thereby contributing to 
genetic diversity. However, the primary role of homo- 
logous recombination in bacteria appears to be DNA 
repair, and in particular the repair of stalled or broken 
replication forks. When the bacterial DNA replication 
machinery encounters a damaged template strand, the 
replication fork is inactivated. The DNA lesion is 
generally left in a single-strand gap, where it is inac- 
cessible to standard repair pathways such as nucleotide 
excision repair. Alternatively, a double-strand break 
may result when a replication fork encounters a 
template strand break. In either case, nonmutagenic replicative bypass of a DNA lesion requires both recombination and an origin-independent replication restart process (Figure 5). It is estimated that most, if not all, oriC-initiated replication forks encounter DNA damage during normal growth conditions, resulting in fork inactivation. Both of the pathways illustrated in Figure 5 require the action of RecA protein, as well as auxiliary proteins such as the RecBCD nuclease/helicase. Although bacteria lacking functional RecA protein are viable, up to half the cells are dead and about 10% lack DNA. Evidently, secondary pathways for replication fork repair exist that do not require RecA. Once a lesion has been incorporated into heteroduplex DNA, base or nucleotide excision repair pathways may eliminate it. Excessive DNA damage is readily caused by stresses such as UV irradiation, and this heavy damage can elicit the SOS response in E. coli (see below), with additional (mutagenic) lesion bypass mechanisms.

Figure 5 Potential pathways for nonmutagenic replicative bypass of DNA damage that require DNA recombination. Once a replication fork becomes inactivated by DNA damage, the RecA protein can pair the damaged strand back with the complementary parental DNA strand. This recombination by RecA also depends on accessory proteins such as the RecFOR or RecBCD complexes. The RecFOR proteins are required in gap repair to help RecA load onto single-stranded gaps that would otherwise be bound by the single-stranded DNA-binding protein (SSB). In double-strand break repair, the RecBCD protein promotes RecA loading by processing double-strand breaks into ssDNA. The resulting Holliday junction can be resolved by other proteins such as RecG and/or the RuvABC complex, and replication can resume. The lesion in the template strand (gap repair) can be repaired by base- or nucleotide-excision enzymes upon the regeneration of an intact complementary strand. Arrowheads denote the location of the cleavage by the resolvase; an alternative resolution is not shown.


RecA and the E. coli SOS Response to 
DNA Damage 


The E. coli SOS response is a system of processes 
induced in response to massive DNA damage. These 
processes culminate in the mutagenic polymerization 
of DNA past a lesion. In essence, genetic integrity is 
sacrificed for cell survival. A network of more than 20 
proteins (SOS gene products) is involved in these 
mutagenic bypass events. The expression of the corres- 
ponding genes is regulated by the LexA repressor 
protein. Additionally, LexA represses expression of the recA gene. This repression is weak because LexA has a relatively low affinity for the recA operator sequence. Consequently, RecA is constitutively produced at a level sufficient for the cell’s recombination needs under normal growth conditions, but is induced to much higher levels under SOS conditions.

Excessive DNA damage leads to complete blockage of replication and the accumulation of single-strand gaps. The RecA protein binds these ssDNA regions in the presence of ATP. LexA interacts with the RecA nucleoprotein filament and undergoes an autocatalytic proteolysis. This RecA-stimulated cleavage inactivates LexA as a repressor. As the level of active LexA decreases, expression of the SOS genes, as well as of the recA gene, increases. Proteins induced as part of the SOS system facilitate a number of DNA repair pathways, including a mutagenic replicative bypass of DNA lesions mediated by DNA polymerases IV and V. Most of the mutagenic lesion bypass is mediated by DNA polymerase V, which is composed of the protein products of the umuC and umuD genes. The polymerase also requires several subunits from DNA polymerase III, as well as the RecA protein itself, to function properly. The role of RecA in the activity of DNA polymerase V represents a third activity of RecA, distinct from its functions in recombination or SOS induction, and is under active investigation.
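Why a weak LexA operator lets recA be expressed constitutively, yet still be induced when LexA is cleaved, can be illustrated with a toy equilibrium model. The sketch below treats each operator as a single site occupied by repressor with probability L/(L + Kd) and takes expression as the unoccupied fraction. The dissociation constants and LexA levels are hypothetical numbers in arbitrary units, and the model ignores LexA dimerization, cooperativity, and promoter strength.

```python
def relative_expression(lexa: float, kd: float) -> float:
    """Fraction of maximal transcription, taken as the fraction of time the operator
    is free of repressor under simple one-site binding: 1 - L / (L + Kd)."""
    return kd / (lexa + kd)


# Hypothetical parameters in arbitrary concentration units.
KD_TIGHT_OPERATOR = 1.0   # an SOS gene whose operator binds LexA tightly
KD_RECA_OPERATOR = 10.0   # a weaker operator, standing in for recA
LEXA_NORMAL, LEXA_AFTER_CLEAVAGE = 10.0, 1.0

for label, kd in (("tight operator", KD_TIGHT_OPERATOR),
                  ("recA-like operator", KD_RECA_OPERATOR)):
    basal = relative_expression(LEXA_NORMAL, kd)
    induced = relative_expression(LEXA_AFTER_CLEAVAGE, kd)
    print(f"{label}: basal {basal:.2f}, induced {induced:.2f}")
```

With these illustrative values the tightly bound gene is nearly off before damage, while the recA-like gene is already half-expressed; both rise when RecA-stimulated cleavage lowers the LexA level.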
RecBCD of Escherichia coli is a multisubunit enzyme
with DNA helicase and nuclease activities. It is also 
called exonuclease V or exoV. It is required for hom- 
ologous recombination of linear DNAs, such as those 
that occur during bacterial conjugation and phage- 
mediated transduction, and also for DNA double- 
strand-break repair, which, in E. coli, is accomplished 
almost exclusively via homologous recombination. 
RecBCD activity is controlled by 8-bp Chi sequences 
in the E. coli genome. In vivo, Chi modulates the 
enzyme, diminishing its destructive exonuclease acti- 
vity while leaving its recombination-promoting heli- 
case activity intact. The Chi/RecBCD system almost 
certainly functions in promoting recombinational
repair of DNA ends that form at broken replication 
forks. This allows reestablishment of replication forks 
by recombinational joining of a broken end to a sister 
molecule. 


The recB and recC Genes 


The recB and recC genes, encoding the RecB and 
RecC subunits of RecBCD enzyme, were identified 
by A. John Clark at Berkeley, and Peter Emmerson, 
then at Yale, by isolation of null mutants of E. coli 
unable to perform conjugational and transductional 
recombination (Clark and Margulies, 1965; Emmerson, 
1968). recB or recC null mutants display roughly 
100-fold decreases in transductional and conjuga- 
tional recombination, are sensitive to UV irradiation 
and other DNA damaging agents, and have decreased 
viability compared with wild-type cells (reviewed by 
Kowalczykowski et al., 1994). These phenotypes
result from failure to carry out double-strand-break 
(DSB) and double-strand-end (DSE) repair. DSB and 
DSE repair are used to recombine the linear DNA 
substrates in conjugation and transduction, and also 
for repair of DNA damage and of broken replication 
forks. Loss of RecB or RecC obliterates all functions 
of the RecBCD enzyme. 


RecBCD Recombination Pathway 


Because Hfr-mediated conjugational recombination 
(see Hfr) requires the presence of functional RecB 
and RecC (which associate with RecD), the re- 
combination of linear DNA in wild-type E. coli recipi- 
ent bacteria is called recombination by the RecBCD 
recombination pathway. The work of Clark and col- 
leagues defined this pathway as requiring RecA, RecB, 
and RecC. RecA is also used for recombination that can occur in the absence of RecBC. Later, the presence
of either RuvA, RuvB, and RuvC or RecG was shown 
to be required as well (Lloyd, 1991; see entries RecA 
Protein and Homology, RuvAB Enzyme, and RuvC 
Enzyme). Clark’s work defined other recombination 
pathways that could operate in special multiply mutant 
cells in the absence of RecB and RecC (see Recombin- 
ation Pathways). The idea of recombination pathways 
is that there is a defined series of DNA intermediates 
in recombination acted upon by specific enzymes. The 
idea of multiple pathways of conjugational recombin- 
ation implies that there is more than one set of 
enzymes capable of processing the double-strand lin- 
ear DNA resulting from transfer in Hfr-mediated con- 
jugation, and probably more than one set of DNA 
intermediates, leading to production of recombinant 
DNA in the recipient chromosome. 
The recD Gene


The exonuclease V enzyme is an ATP-dependent, double-strand DNA exonuclease. It had been thought to consist of two proteins encoded by the recB and recC genes. The RecD subunit was identified by Gerald Smith and his colleagues through the isolation of a class of recBC mutants (called ‡) that lack the nuclease activity of exoV but are recombination-proficient (also identified independently by Donald Biek and Stanley Cohen). Some of the ‡ mutations were located in a novel gene, recD (Amundsen et al., 1986). recD null mutants are recombination-proficient, are resistant to UV and gamma irradiation, and have normal levels of viability (reviewed by Myers and Stahl, 1994). recD mutants lack ATP-dependent nuclease activity and display plasmid instability, an in vivo manifestation of the absence of RecBCD exonuclease activity (Biek and Cohen, 1986). (Plasmids replicate via rolling-circle replication in the absence of RecBCD exonuclease, and then segregate unstably.) RecD is found in association with RecBC (discussed below). recD mutants show no recombinational stimulation in response to Chi (RecBCD recognition) sites in DNA. This led David Thaler and colleagues to infer that Chi promotes recombination by altering the RecBCD enzyme such that it behaves like RecBC enzyme lacking RecD. They suggested that Chi promotes an important switch in the enzyme from a RecBCD nuclease to a RecBC-like recombinase lacking RecD function. This could occur by dissociation or alteration of the RecD subunit at Chi sites (reviewed by Myers and Stahl, 1994, and discussed below). Following the discovery of the RecD subunit, exoV is now commonly referred to as RecBCD.


Genetic Organization 


The recB, recC, and recD genes are located at 63.5 
minutes on the E. coli chromosome, between the 
argA and thyA genes (Figure 1). recB and recD con- 
stitute an operon controlled by a promoter upstream 
of recB. recC is separated from recB and recD by the 
ptr gene (encoding protease III) and is transcribed 
independently (Figure 1). Transcription and transla- 
tion are regulated to achieve a low expression of the 
recB, recC, and recD genes such that there are about 10 
copies of RecBCD enzyme per cell. 


[Gene map: argA, recD, recB, ptr, recC, thyA]


Figure 1 The Escherichia coli recBCD loci. Arrows represent the 5’ to 3’ direction of transcription. The recB and recD genes are co-transcribed. recC is in a separate transcriptional unit and is separated from recB and recD by the ptr (protease III) gene.


Functions of the RecB, RecC, and RecD 
Subunits of Exonuclease V 


RecB and RecD both contain nucleotide-binding domains (reviewed by Kowalczykowski et al., 1994). Both the RecB and RecD subunits have ATPase activities, which are necessary for the double-strand DNA (dsDNA) exonuclease activity of RecBCD (Chen et al., 1998). RecB alone is able to hydrolyze ATP in a DNA-dependent manner and has DNA helicase activity. Purified RecB and RecC subunits assemble to form RecBC, a processive, ATP-dependent helicase having little or no nuclease activity (e.g., see Korangy and Julin, 1993). This corresponds with the in vivo observations of RecBC-dependent nuclease activity in the presence of RecD, and RecBC-dependent helicase activity in the absence of RecD (Rinken et al., 1992). Genetic data imply an interaction of the RecC and RecD subunits, in that mutants with a RecD⁻ phenotype (RecBCD‡) map to either recD or recC (Amundsen et al., 1986).

In the RecBCD holoenzyme, all subunits make 
contact with DNA (Ganesan and Smith, 1993). The 
N-terminal domain of RecB contains a helix-loop-helix DNA-binding motif, and removal of this domain
results in a nearly complete loss of RecBCD en- 
zymatic activity (Yu et al., 1998a). 

The domain of RecBCD that recognizes Chi 
sequences in DNA resides, at least in part, in RecC. 
This can be inferred from the existence of recC mu- 
tants with altered Chi-recognition abilities (Schultz 
et al., 1983; Arnold et al., 2000). The RecD subunit 
appears to be regulatory (discussed below). 


Chi Sites 


Gerald Smith and colleagues demonstrated that Chi (crossover hotspot instigator) sites are the recognition sequence of RecBCD enzyme in DNA (reviewed by Smith, 1991; Kowalczykowski et al., 1994). Chi is an 8-nucleotide sequence, 5’-GCTGGTGG-3’, that acts as a recombination hotspot. In one model, Chi tames the RecBCD nuclease, shifting the enzyme’s mode of action to a recombination-promoting one. This view is consistent with in vivo evidence that RecBCD exonuclease activity is stopped by Chi sites in DNA, in conjunction with RecA and SSB proteins (Dabert et al., 1992; Kuzminov et al., 1994; Köppen et al., 1995; Myers et al., 1995). In another model, RecBCD unwinds DNA before encountering a Chi site. Upon Chi interaction, the enzyme nicks one strand (the strand with a 3’ end at the RecBCD entry point) and continues to unwind the DNA after making this nick, generating a 3’-ended strand capable of invading another DNA duplex. In this model, the attenuation of DNA degradation results from recombination of the linear DNA with a homolog to produce circular DNA, which is resistant to RecBCD enzyme. In a third model, Chi causes a shift in the strand polarity of RecBCD exonuclease (discussed below), ultimately creating 3’ single-strand tails exclusively, downstream from Chi.

Figure 2 General scheme for RecBCD action on linear DNA. (A) RecBCD loads onto blunt or nearly blunt DNA ends. (B) RecBCD is believed to travel along the DNA as exonuclease V (depending upon the reaction conditions), destroying the DNA until it encounters a properly oriented Chi site (also denoted by the crossover-like Greek symbol χ). (C) RecBCD recognizes only Chi sites encountered from the 3’ side of the 5’-GCTGGTGG-3’ sequence. A productive Chi-RecBCD interaction leads to attenuation of the RecBCD nuclease activity. The enzyme becomes the equivalent of a RecD⁻ (nuclease-defective) RecBC enzyme. The enzyme may retain the helicase but lose the endonuclease component of its exoV activity after encountering Chi, and so produce single strands downstream of Chi (Rosenberg and Hastings, 1991). It is not yet clear whether both or just one single-strand end is produced at Chi in vivo. The picture with two strands produced (shown here) is supported by in vivo evidence (Hagemann and Rosenberg, 1991; Razavy et al., 1996). One-strand models are also in current use (e.g., see reviews by Smith, 1991; Kowalczykowski et al., 1994; Kowalczykowski, 2000).
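The orientation rule, that RecBCD recognizes Chi only when it approaches the 5’-GCTGGTGG-3’ sequence from its 3’ side, determines which Chi sites in a linear duplex are active for an enzyme entering at a given end. The sketch below applies that rule to a top-strand sequence written 5’ to 3’, left to right. The helper names and the test duplex are hypothetical, and the sketch ignores the incomplete recognition efficiency discussed in the overview below.

```python
CHI = "GCTGGTGG"  # 5'-GCTGGTGG-3'


def reverse_complement(seq: str) -> str:
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def _occurrences(seq: str, motif: str) -> list[int]:
    return [i for i in range(len(seq) - len(motif) + 1) if seq.startswith(motif, i)]


def active_chi_sites(top_strand: str, entry_end: str) -> list[int]:
    """0-based top-strand positions of Chi sites that RecBCD entering at the given end
    ('left' or 'right') would approach from Chi's 3' side, in the order it meets them."""
    seq = top_strand.upper()
    if entry_end == "right":
        # Moving right-to-left, the enzyme meets a top-strand 5'-GCTGGTGG-3' from its 3' side.
        return sorted(_occurrences(seq, CHI), reverse=True)
    if entry_end == "left":
        # Moving left-to-right, only Chi on the bottom strand is met from its 3' side;
        # on the top strand it reads as the reverse complement, CCACCAGC.
        return _occurrences(seq, reverse_complement(CHI))
    raise ValueError("entry_end must be 'left' or 'right'")


duplex = "AAGCTGGTGGAACCACCAGCTT"  # hypothetical linear duplex, top strand 5'->3'
print(active_chi_sites(duplex, "right"))  # [2]
print(active_chi_sites(duplex, "left"))   # [12]
```

The same duplex therefore presents a different active Chi to each end, which is why Chi orientation relative to a double-strand end matters for where recombination is stimulated.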

Chi sites were discovered as large-plaque mutants of phage λ lacking the red and gam recombination genes (reviewed by Myers and Stahl, 1994). λ Gam protein binds to and inactivates the RecBCD enzyme. In its absence, RecBCD exonuclease destroys the linear DNAs produced in rolling-circle replication. Thus, in gam mutants, replication is only via the bidirectional theta (θ) mode, producing circular monomer λ chromosomes. Monomers cannot be packaged into viable phage (only multimers can). However, recombination between monomeric DNAs can produce dimers, from which packaging of viable phage can occur. In λ red gam mutants, the only recombination available is via the RecBCD pathway, so the production of packageable, dimeric λ chromosomes relies on RecBCD-mediated recombination events. Chi sites enhance those events. Therefore, λ containing Chi produces larger phage bursts, and thus larger plaques, than λ without Chi. Wild-type λ has no Chi sites, but single base substitutions can create active Chi sequences at at least four positions in the λ chromosome.
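Finding where a single base substitution would create Chi, as in the λ mutants described above, amounts to scanning every 8-base window on both strands for sequences one mismatch away from Chi. The sketch below does this for a top-strand sequence and reports the top-strand base each substitution requires. The function name is hypothetical, and the example uses a short made-up sequence rather than the λ genome.

```python
CHI = "GCTGGTGG"  # 5'-GCTGGTGG-3'


def reverse_complement(seq: str) -> str:
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def chi_creating_substitutions(top_strand: str) -> list[tuple[int, str]]:
    """List (0-based position, new top-strand base) pairs where one base substitution
    would create a Chi site in either orientation."""
    seq = top_strand.upper()
    k = len(CHI)
    found = set()
    # Chi on the top strand, or on the bottom strand (read as CCACCAGC on the top strand).
    for target in (CHI, reverse_complement(CHI)):
        for start in range(len(seq) - k + 1):
            window = seq[start:start + k]
            mismatches = [i for i in range(k) if window[i] != target[i]]
            if len(mismatches) == 1:
                i = mismatches[0]
                found.add((start + i, target[i]))
    return sorted(found)


print(chi_creating_substitutions("TTGCTGGTCGTT"))  # [(8, 'G')]
```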


RecBCD Enzyme and Chi: Overview 


A model of the interaction of RecBCD enzyme with 
Chi, based on current information, follows. RecBCD 
loads onto blunt, or nearly blunt, DSEs and degrades 
the DNA until it reaches a properly oriented Chi site 
(Figure 2). RecBCD recognizes only Chi sites that it 
encounters from the 3’ end of the 5’-GCTGGTGG-3’ sequence (Figure 2A,B) and does so with about 25-40% efficiency. That is, about one in three RecBCD
transits past Chi results in recognition of the sequence. 
Successful recognition of the Chi sequence leads to 
attenuation of the RecBCD nuclease activity, and 
the enzyme continues along the DNA as a helicase 
(Figure 2C). The mechanism by which the Chi signal 
is transduced and changes RecBCD enzyme has not 
been defined in vivo. RecBCD then loads RecA pro- 
tein onto the single-strand (ss) DNA generated by 
RecBCD, following modification by Chi, helping to 
form a RecA-ssDNA filament (Figure 3A,B). This 
RecA-ssDNA filament can invade duplex (ds) DNA 
in a search for a DNA region complementary to the 
ssDNA filament (Figure 3B). Base-pairing can then 
occur leading to the formation of a heteroduplex 
DNA joint (see Heteroduplexes). These initial steps 
of DNA end processing and strand invasion achieved 
by RecBCD, Chi, and RecA lead to crossed-strand 
intermediates (Figure 3B). The crossed-strand inter- 
mediates may then be processed or resolved by the 
RuvABC and/or RecG Holliday junction processing 
proteins, to produce finished recombinants, by endo- 
nucleolytic cleavages (Figure 3C,D; and see Holliday 
Junction, RuvAB Enzyme, and RuvC Enzyme). The 
strand-invasion intermediates promote DNA replica- 
tion, perhaps independently of Holliday junction pro- 
cessing proteins (Figure 3E-3G; see Motamedi et al., 
1999, and discussed below). 
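Because each transit past a properly oriented Chi succeeds only about one time in three, an enzyme may read through several Chi sites before it is modified. The sketch below treats each encounter as an independent trial with the quoted 25-40% efficiency and computes the chance of recognition within a given number of sites and the mean number of sites read through first. Independence between encounters and the absence of enzyme dissociation are assumptions of this example, not statements from the text.

```python
def prob_recognized_within(n_sites: int, efficiency: float) -> float:
    """Probability that RecBCD recognizes at least one of n properly oriented Chi sites,
    treating each encounter as an independent trial with the given efficiency."""
    return 1.0 - (1.0 - efficiency) ** n_sites


def expected_sites_passed(efficiency: float) -> float:
    """Mean number of Chi sites read through before the first recognition (geometric)."""
    return (1.0 - efficiency) / efficiency


for eff in (0.25, 1 / 3, 0.40):
    print(f"efficiency {eff:.2f}: P(recognized within 3 sites) = "
          f"{prob_recognized_within(3, eff):.2f}, "
          f"mean sites passed = {expected_sites_passed(eff):.1f}")
```

At one-in-three efficiency, an enzyme passes on average two properly oriented Chi sites before recognizing one, and about 70% of enzymes are modified within three sites.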

We realized that degradation of DNA from a dis- 
tant DSE to a Chi site has the effect of moving the 
DSE to Chi such that Chi sites would be sites of 
double-strand-break repair (DSBR) recombination, 
suggesting DSBR models for Chi activity (see 
Figure 3; Rosenberg and Hastings, 1991). Although there are several different reactions of RecBCD with DNA under different buffer conditions in vitro (reviewed below, and by Kowalczykowski, 2000), this view is supported for RecBCD in vivo by evidence of RecBCD-dependent exonuclease activity and its attenuation by Chi sites and by RecA and SSB proteins (Dabert et al., 1992; Kuzminov et al., 1994; Köppen et al., 1995; Myers et al., 1995).

Figure 3 Fate of DNA after processing by RecBCD. This diagram depicts incorporation of linear DNA into the bacterial chromosome, by processing of each end by RecBCD and RecA. Such reactions occur during conjugational and transductional recombination, two routes of genetic transmission in bacteria. (A, B) RecA-coated single strands are used to initiate strand invasion, producing heteroduplex DNA recombination intermediates. (C, D) Endonucleolytic resolution of Holliday junctions can produce break-join recombinants, recombinants in which little newly synthesized DNA is present. (E-G) Alternatively, priming of replication forks from the recombinational strand exchange sites may lead to replication of the invaded chromosome. (Dashed lines represent newly synthesized DNA.) In this diagram, a purely break-copy outcome is shown, in which the unreplicated donor DNA primes replication of the recipient chromosome, producing a new chromosome with the donor DNA spliced into it. Some experimental evidence (Motamedi et al., 1999) supports this idea (Smith, 1991).


Enzymology of RecBCD 


RecBCD Loading and Translocation

RecBCD binds to DNA as a monomer, at blunt or nearly blunt ends (a 4-nucleotide overhang is preferred) (Taylor and Smith, 1995a). Overhangs longer than about 25 nucleotides prevent RecBCD loading. RecB associates with the 3’ strand upon entry, whereas RecC and RecD contact the 5’ strand (Figure 4; Ganesan and Smith, 1993). Translocation on DNA and DNA unwinding appear to be two distinct processes for RecBCD (Bianco and Kowalczykowski, 2000). The RecBCD helicase activity generates unique ssDNA loop intermediates during DNA unwinding, which have been postulated to play a role in the initial synapsis steps of DNA recombination (Taylor and Smith, 1980).


DNA Exonuclease and Helicase Activities 

The RecBCD helicase can unwind DNA processively for more than 30 kb at rates of up to 1000 bp per second (reviewed by Kowalczykowski et al., 1994). The dsDNA exonuclease activity of the holoenzyme appears to be a combination of helicase and ssDNA endonuclease activities. The RecB protein has two domains, which can be separated by proteolysis. The 100-kDa N-terminal domain contains DNA helicase motifs and is necessary for RecBCD to bind and unwind DNA in an ATP-dependent manner (Figure 4). The 30-kDa C-terminal portion of RecB contains at least part of the nuclease active site, without which dsDNA exonuclease activity is absent from the holoenzyme (Yu et al., 1998b).

Figure 4 Schematic diagram of the RecBCD nuclease. Biochemical evidence and homology to known nucleases indicate that at least part of the nuclease active site of RecBCD lies in the RecB protein and cuts ssDNA. To cut both the 5’- and 3’-ended DNA strands, the 5’ strand must loop around to be oriented properly in the active site. RecD is believed to interact with RecC and may play a role in orienting the 5’ strand. Parallel lines represent DNA, with arrowed ends indicating 3’ ends and non-arrowed ends 5’ ends.

The nuclease active-site motif, located in the C-terminal region of RecB (Figure 4), is similar to the active-site motif found in several restriction endonucleases, including EcoRI, EcoRV, PvuII, BglI, and FokI, as well as the bacteriophage λ 5’→3’ exonuclease and the E. coli MutH protein (Aravind et al., 1999). RecB alone possesses weak exonuclease activity. Adding RecC to RecB enhances helicase but not nuclease activity (Korangy and Julin, 1993). RecC appears to protect RecB from proteolytic degradation, suggesting that at least part of RecC blocks the protease-sensitive hinge between the 30- and 100-kDa domains of RecB (Figure 4). Binding of RecD to RecBC gives rise to the potent exonuclease activity of the holoenzyme.

The nature of the RecBCD dsDNA exonuclease activity before and after Chi recognition has been studied extensively. When interpreting in vitro studies of RecBCD, it is important to note that the conditions under which RecBCD is allowed to unwind or cleave DNA can profoundly affect the apparent activity of the enzyme. The greatest nuclease activity, for example, is obtained when Mg²⁺ exceeds ATP (Dixon and Kowalczykowski, 1995). When ATP exceeds Mg²⁺, RecBCD acts as a helicase and then makes a single-strand endonucleolytic cut a few nucleotides to the 3’ side of the Chi site (on the Chi-containing strand). Following this cut, the enzyme proceeds as a helicase (Taylor et al., 1985; reviewed by Kowalczykowski et al., 1994). When Mg²⁺ exceeds ATP, the 3’-ended strand (relative to entry) is degraded up to the Chi site, where a final double-strand cut is made (Taylor and Smith, 1995b). Beyond Chi, the enzyme progresses either as a helicase (Taylor and Smith, 1995b) or as a nuclease degrading the opposite strand (Kowalczykowski, 2000). The levels of free Mg²⁺ and ATP in E. coli are not known, although they appear to be about equal (Taylor and Smith, 1995b). Therefore, RecBCD activities in vitro may not mimic those in vivo.

In vitro, when the Mg²⁺ concentration exceeds the ATP concentration, RecBCD degrades both strands of DNA before Chi, with greater nucleolytic activity on the strand that has its 3’ end at the entry point. A plausible model for how RecBCD, with a single nucleolytic active site, degrades both DNA strands is shown in Figure 4. In this model, duplex DNA is unwound and passes through a single-strand nuclease active site located in the C-terminal domain of RecB. The 3’-ended strand passes directly through the nuclease site and is therefore processed efficiently. The 5’-ended strand must enter the active site in the same orientation as the 3’-ended strand (3’ end first) (Figure 4). To accomplish this, the 5’-ended strand must loop around the nuclease active site, a process that RecD may facilitate (Wang et al., 2000). Thus, the double-strand DNA nuclease activity is proposed to be a manifestation of a single 3’→5’ ssDNA nuclease site. The need to bend the 5’-ended strand to bind it properly in the active site, whereas the 3’-ended strand can be cut directly, might explain why, before RecBCD encounters Chi, the 3’ strand is degraded in preference to the 5’ strand. Polarities discussed here are with reference to the RecBCD entry point.


Modification of RecBCD at Chi Sites 

Upon encountering a properly oriented Chi site (approached from the 3’ side of 5’-GCTGGTGG-3’), the RecBCD enzyme is altered such that its nuclease activity is attenuated, both in vitro with either ATP or Mg²⁺ in excess (Taylor and Smith, 1995b; Dixon and Kowalczykowski, 1993, 1995) and in vivo (Dabert et al., 1992; Kuzminov et al., 1994; Köppen et al., 1995; Myers et al., 1995), while its DNA helicase activity remains and generates ssDNA substrates for recombination. The nature of the change to RecBCD, and the DNA recombination substrates that result in vivo from RecBCD interaction with Chi, have yet to be determined. However, because the enzyme then behaves like a RecBC(D−) enzyme, the interaction between RecBC and RecD may be disrupted at Chi, for example by repositioning or removal of RecD from the holoenzyme, or by other means.
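Because Chi acts in only one orientation, it can help to see how the two orientations are told apart in a sequence. The following is a minimal Python sketch, not a published tool: the function names `reverse_complement` and `find_chi_sites` are hypothetical, the input is assumed to be the top strand written 5’→3’ using only A, C, G and T, and any copy of the octamer on either strand is treated as a candidate Chi site. Following the rule stated above (Chi is recognized when approached from its 3’ side), a top-strand copy is in the active orientation for RecBCD entering at the right-hand end of the duplex, and a bottom-strand copy for RecBCD entering at the left-hand end.

```python
CHI = "GCTGGTGG"  # Chi octamer, written 5'->3'


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A/C/G/T only)."""
    pairs = {"A": "T", "T": "A", "G": "C", "C": "G"}
    return "".join(pairs[base] for base in reversed(seq.upper()))


def find_chi_sites(top: str) -> list[tuple[int, str]]:
    """Find Chi octamers on both strands of a duplex (hypothetical helper).

    `top` is the top strand written 5'->3'. Returns (position, strand) pairs,
    where position is the 0-based top-strand index of the octamer's leftmost
    base and strand is "top" or "bottom".
    A "top" site is met in the active orientation by RecBCD entering at the
    right-hand end; a "bottom" site by RecBCD entering at the left-hand end.
    """
    top = top.upper()
    rc_chi = reverse_complement(CHI)  # how a bottom-strand Chi reads on the top strand
    hits = []
    for i in range(len(top) - len(CHI) + 1):
        window = top[i:i + len(CHI)]
        if window == CHI:
            hits.append((i, "top"))
        elif window == rc_chi:
            hits.append((i, "bottom"))
    return hits
```

For example, `find_chi_sites("AAGCTGGTGGTTCCACCAGCAA")` reports one top-strand and one bottom-strand site, which would be recognized by enzymes entering from opposite ends of the molecule.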

Following Chi recognition, RecBC (with or without RecD) progresses as a DNA helicase, unwinding DNA until it dissociates from, or reaches the end of, the linear DNA molecule. In vivo recombination experiments with phage λ DNA show that the Chi hot spot stimulates recombination to the 5’ side of the Chi sequence (i.e., in the region downstream from RecBCD modification; reviewed by Myers and Stahl, 1994). This provided early evidence that RecBCD processes linear DNA downstream from Chi so that it becomes a better recombination substrate than the DNA that RecBCD acted upon before reaching Chi. A recent study of purified RecBCD showed that, in addition to converting RecBCD to a RecBC(D−) phenocopy, interaction with Chi also causes the RecBCD subunits to disassemble downstream of the Chi site. This suggests that Chi not only promotes recombination 5’ of Chi sites but may also permanently inactivate the RecBCD enzyme, so that one RecBCD molecule could catalyze only a single recombinational exchange (Smith, 1991; Taylor and Smith, 1999).

Some biochemical data suggest that, instead of RecBCD converting from a dsDNA exonuclease to a helicase, an exonuclease polarity switch occurs at Chi (e.g., Anderson and Kowalczykowski, 1998): cleavage of the 3’-ended strand is attenuated and cleavage of the 5’-ended strand (relative to RecBCD entry) is increased. As a result, only a 3’-ended ssDNA substrate is left available for recombination following Chi modification of RecBCD. Existing data do not reveal exactly which DNA substrates remain following modification of RecBCD at Chi in vivo.


Loading of RecA Protein 

Once single-stranded DNA is formed, the strand-exchange protein RecA must coat it for recombination to proceed (Figure 3; see RecA Protein and Homology). In in vitro assays, the C-terminal domain of RecB within the RecBCD holoenzyme facilitates the loading of RecA protein preferentially onto the 3’-ended strand of DNA (relative to RecBCD entry; Churchill et al., 1999). RecA loading promoted by the RecBCD enzyme requires Chi, whereas loading promoted by the RecBC enzyme does not. Thus, RecD appears to block RecB-mediated loading of RecA until RecBCD encounters a Chi site (Amundsen et al., 2000). These discoveries highlight the multiple roles of Chi and RecBCD in DNA repair and recombination: RecBCD both creates ssDNA from dsDNA and loads RecA protein onto that ssDNA to facilitate homologous pairing.


Role of RecBCD in DNA Repair and Recombination


RecBCD is required for the repair of chromosomal DSBs caused by DNA-damaging agents such as UV or gamma radiation (reviewed by Kowalczykowski et al., 1994; Myers and Stahl, 1994). The ssDNA-RecA filament generated by RecBCD can invade double-strand DNA, starting the recombination process (Figure 3B). The RecA-DNA filament also promotes the SOS response to DNA damage: it acts as a co-protease, facilitating the cleavage of the LexA transcriptional repressor, which leads to induction of DNA repair, recombination, and mutation genes (reviewed by Walker, 1996). Therefore, the RecA protein and the RecBCD enzyme, specifically the helicase activity of RecBCD, play significant roles in inducing the SOS response when DSEs are exposed by damage or by replicational pausing (reviewed by Cox et al., 2000).

RecBCD is important for the recombination of linear DNAs, such as occurs during phage-mediated transduction, during conjugation, and in recombination of λ red gam mutant phage (Figure 3). Transduction entails the injection of linear dsDNA into a cell by a bacteriophage; RecBCD-mediated recombination enables the injected DNA to recombine with the host chromosomal DNA. Linear ssDNA that enters a cell during conjugation is converted to dsDNA, onto which RecBCD can load (see Hfr; F Factor). For productive packaging of λ red gam mutant phage, the phage genome has to be dimerized via host (RecBCD) functions. The recombination process is started by a double-strand cut at the λ cos site, which initiates packaging from one DNA end while RecBCD processes the other (reviewed by Myers and Stahl, 1994).


DNA Replication Restart 


RecBCD-mediated recombination not only repairs DSBs but also helps to reinitiate replication (reviewed by Motamedi et al., 1999; Cox et al., 2000). A single-strand lesion in a DNA template can lead to collapse of the replication fork (Figure 5A,B; Skalka, 1974; Kuzminov, 1995) and, unless repaired, can be lethal. Chi sites are present 1009 times on the E. coli chromosome. Not only is this four to eight times greater than the number expected from a random distribution of nucleotides, but two-thirds of the Chi sites are oriented in the chromosome such that they are encountered in the correct orientation for repair of broken replication forks originating at oriC (Figure 5). When a strand being replicated includes a single-strand nick (Figure 5A), the replication fork can collapse, leaving a DSE (Figure 5B). RecBCD can load onto the resulting DSE and translocate toward the origin (Figure 5C). Because so many Chi sites are in the correct orientation, RecBCD is not likely to travel far before loading RecA onto the ssDNA generated at Chi (Figure 5D). The RecA-DNA filament can then invade a sister molecule and may prime DNA synthesis, restarting replication in the correct (origin-to-terminus) direction (Figure 5E).

Figure 5 RecBCD-mediated repair of a collapsed replication fork. (A) An asymmetric distribution of the 1009 Chi sites exists on the E. coli chromosome, with about two-thirds ‘pointing’ toward the origin (arrows). (B) A bidirectional replication bubble can be disrupted by a ssDNA break, forming a DSE. (C) RecBCD can load onto the DSE formed and travel along the DNA until (D) it encounters a Chi site. RecA loads onto the ssDNA generated downstream from Chi. (E) This RecA-ssDNA filament invades a homologous duplex (in this diagram, the single-strand nick in the invaded molecule has been ligated to form a continuous duplex) and (E, F) primes DNA replication, which is thus restarted (colored ball represents a replisome).


Prevention of Sigma Replication of the E. coli Chromosome


Rolling-circle or sigma (σ) replication of plasmid DNA lacking Chi is prevented by the nuclease activity of RecBCD, which degrades the linear DNA formed (reviewed by Myers and Stahl, 1994). RecBCD may function in a similar manner on the E. coli chromosome to maintain bidirectional theta (θ) replication. During chromosomal replication, collapse or breakage of a replication fork could lead to a σ-replicating chromosome (as in Figure 5B). Such chromosomes may not segregate properly following replication, leading to cell death (Kuzminov and Stahl, 1997). However, RecBCD, together with the number and orientation of Chi sites in the chromosome, probably prevents σ replication by generating ssDNA for RecA to act upon in recombination-dependent restart of normal θ replication (Figure 5C-F).


Foreign DNA Degradation 


The nuclease activities of RecBCD also protect E. coli from foreign DNA, such as that of phages, which could kill the cell. Many bacteriophages encode inhibitors of RecBCD so that their linear DNA can persist and a productive infection can occur; without these inhibitors, the infecting DNA is destroyed by RecBCD. Phage T4, for example, produces a protein, the product of gene 2, which binds to the ends of dsDNA, thereby preventing RecBCD from loading onto and degrading T4 DNA (Oliver and Goldberg, 1977).
A recessive allele or mutation is one that is expressed phenotypically only when it is present in the homozygous form. In the heterozygote it is obscured by the dominant allele.


See also: Recessive Inheritance 
Recessive Inheritance
A recessive allele of a gene is one whose phenotype is manifest in an organism only when two copies of that allele are inherited, one from each parent. In recessive inheritance, an offspring may inherit the recessive allele from one parent; however, the presence of a dominant allele from the other parent obscures the recessive phenotype. Thus, heterozygous offspring possessing one recessive allele of a gene are phenotypically indistinguishable from homozygous offspring possessing two dominant alleles.


See also: Dominance 


Recessive Lethal 
A recessive lethal allele is one that is lethal when present in the homozygous form.


See also: Lethal Mutation; Recessive Inheritance 
Reciprocal crosses are crosses between different genetic stocks, strains, or species in which the sexes of the parents are reversed. In the case of strains A and B, (A × B) and (B × A) are reciprocal crosses. Phenotypic differences between (A × B) and (B × A) F1 hybrids are due to so-called ‘parent-of-origin effects.’ Reciprocal F1 hybrids may differ in factors such as maternally supplied growth factors, nutrients, and maternal care. They also differ in the inheritance of maternally supplied extrachromosomal DNA such as mitochondrial DNA, the inheritance of uniparental epigenetic marks such as those seen in genomic imprinting, and the inheritance of sex chromosomes. When the phenotypes of progeny from reciprocal crosses differ, strain differences in these elements are likely to be responsible. Hence, reciprocal crosses may be useful in mapping loci that underlie parent-of-origin effects.
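The bookkeeping behind reciprocal crosses can be made explicit in a few lines. This is an illustrative Python sketch under stated assumptions: an XY (mammalian) sex-determination system, the common convention that a cross is written (female × male), and a hypothetical function `f1_origins` that lists which parental strain supplies each parent-of-origin-sensitive component. Comparing the two reciprocal F1s shows which components can differ between them; the autosomes, one copy from each parent in both crosses, do not.

```python
def f1_origins(mother: str, father: str) -> dict:
    """Which parental strain supplies each component of an F1 hybrid.

    Hypothetical helper. Assumes XY sex determination and that the cross
    is written as (female x male).
    """
    return {
        "autosomes": frozenset({mother, father}),      # one copy from each parent
        "mitochondrial DNA": mother,                    # maternally transmitted
        "maternal environment and care": mother,
        "strain of maternal copy of imprinted genes": mother,
        "X chromosome of male F1": mother,
        "Y chromosome of male F1": father,
    }


a_by_b = f1_origins("A", "B")   # (A x B): strain A is the mother
b_by_a = f1_origins("B", "A")   # (B x A): strain B is the mother

# Components whose parental origin differs between the reciprocal F1s
differs = sorted(k for k in a_by_b if a_by_b[k] != b_by_a[k])
print(differs)
```

Using a `frozenset` for the autosomes records that both reciprocal F1s receive one autosome set from each strain, so only the uniparentally supplied components appear in `differs`.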


See also: Cross; Gene Mapping 
Reciprocal Recombination in Meiosis


For genetic markers that are more than a few kilobases apart, meiotic recombination generally produces complementary recombinant types simultaneously. Since meiotic recombination occurs pairwise between chromatids after chromosome replication, three kinds of tetrads can be produced by the diploid AB/ab (see Table 1; see Tetrad Analysis). This meiotic recombination (crossing-over) is conservative (two chromatids in, two chromatids out) and reciprocal (each recombinant is accompanied by its complement).
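Table 1 can be turned into a small classifier. The Python sketch below is illustrative only: `classify_tetrad` and `recombinant_fraction` are hypothetical names, spore genotypes are written as two-letter strings with uppercase letters for alleles from the AB parent, and every tetrad is assumed to show 2:2 segregation at each locus (no gene conversion). The recombinant fraction follows directly from the table: a PD tetrad contains no recombinant spores, a T tetrad two of four, and an NPD tetrad four of four.

```python
PARENTAL = {"AB", "ab"}  # genotypes of the two parents in AB x ab


def classify_tetrad(spores: list[str]) -> str:
    """Return "PD", "T" or "NPD" for a four-spore tetrad from AB x ab."""
    if len(spores) != 4:
        raise ValueError("a tetrad has four spores")
    # Reciprocal recombination preserves 2:2 segregation at each locus.
    if sum(s[0] == "A" for s in spores) != 2 or sum(s[1] == "B" for s in spores) != 2:
        raise ValueError("expected 2:2 segregation at both loci")
    n_parental = sum(s in PARENTAL for s in spores)
    if n_parental == 4:
        return "PD"
    if n_parental == 0:
        return "NPD"
    return "T"  # with 2:2 segregation, the only remaining case is 2 parental spores


def recombinant_fraction(pd: int, t: int, npd: int) -> float:
    """Fraction of recombinant spores among all spores of PD, T and NPD tetrads."""
    total = pd + t + npd
    return (0.5 * t + npd) / total  # T: 2 of 4 spores recombinant; NPD: 4 of 4
```

For example, `classify_tetrad(["AB", "Ab", "aB", "ab"])` returns `"T"`, matching the tetratype row of Table 1.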

The reigning view of the reaction pathway that results in meiotic reciprocal recombination has been derived primarily from studies in Saccharomyces cerevisiae and by analogy with the recombination system of Escherichia coli. According to this view, a meiosis-specific, dimeric endonuclease introduces a variable number of double-strand cuts, typically several per chromatid. The cuts occur where the DNA is sensitive to the endonuclease because of relatively loose chromatin structure. These regions frequently correspond to transcription promoters and consequently tend to be intergenic.

Limited digestion of the 5’-ended strands on each side of the cut creates single-stranded 3’ overhangs several hundred nucleotides long. These overhangs bind protein(s) resembling the RecA protein of E. coli, which confer upon the single-stranded ends the ability to recognize complementary sequences in homologous duplex DNA and to use that duplex as a jig and template to repair the double-strand break. The reaction connects the repaired duplex with the homolog by the helical twists of two DNA duplexes lying between a pair of Holliday junctions (Figure 1). Each of the duplexes contains a segment in which the two strands are derived from different parent duplexes (hybrid DNA). These joint molecules are resolved in time to permit the separation of homologs at the first meiotic anaphase.

Resolution of the joint molecules can occur by enzyme (resolvase)-catalyzed cutting of two strands of like polarity in each junction. Reciprocal recombination of flanking DNA results when the crossing pair of strands is cut in one junction and the noncrossing pair in the other. Typically, a joint molecule fails to give reciprocal recombination of flanking DNA with a probability exceeding one half. Noncrossover resolution of joint molecules may result from resolvase-catalyzed cutting of the same two strands at each junction, from the action of a topoisomerase, or from the cutting of one junction followed by sliding of the other to the site of the first.
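The counting argument behind the one-half figure can be sketched as follows, assuming (as a simplification, not a claim from the text) that a resolvase cuts each junction independently and with equal probability on its crossing or its noncrossing pair of strands. Under that assumption only two of the four equally likely cutting combinations yield a crossover; any resolution by the noncrossover-only routes listed above (topoisomerase action, or cutting one junction and sliding the other) pushes the overall failure probability above one half. The function names are hypothetical.

```python
from itertools import product

CUT_CHOICES = ("crossing", "noncrossing")  # which strand pair a resolvase cuts at a junction


def resolution_outcome(left_cut: str, right_cut: str) -> str:
    """Flanking-marker outcome when both Holliday junctions are cut by a resolvase.

    Flanking DNA is exchanged only if the two junctions are cut on different
    pairs of strands.
    """
    return "crossover" if left_cut != right_cut else "noncrossover"


# Enumerate the four cutting combinations, taken as equally likely (assumption).
outcomes = [resolution_outcome(left, right) for left, right in product(CUT_CHOICES, repeat=2)]
p_crossover_resolvase = outcomes.count("crossover") / len(outcomes)  # 2 of 4


def crossover_probability(fraction_other_routes: float) -> float:
    """Overall crossover probability if a fraction of joint molecules is resolved by
    noncrossover-only routes (topoisomerase, or cut-and-slide) and the rest by
    unbiased resolvase cutting."""
    if not 0.0 <= fraction_other_routes <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    return (1.0 - fraction_other_routes) * p_crossover_resolvase
```

With no alternative routes the crossover probability is exactly one half; any positive `fraction_other_routes` lowers it, matching the statement that a joint molecule usually fails to give reciprocal recombination.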

Reciprocal recombination by double-strand-break repair requires that the two ends created by the double-strand break invade the same homolog. The machinery assuring this coordination is likely to be specifically meiotic. It may be related to the machinery that permits repair of double-strand breaks to favor interaction between homologs over interaction between sister chromatids.


Relationship to Gene Conversion 


Meiotic gene conversion, a nonreciprocal route to recombination, can be understood within the double-strand-break repair scheme described above. DNA lost by the preinvasion digestion of the 5’-ended strands of the cut chromosome is replaced by copying the homolog. Any genetic marker located in that segment of the chromosome will no longer be represented normally in the tetrad, since five single strands will correspond to the genotype of one parent and three strands to the other parent (5:3 tetrad, half conversion). The inequality can be enlarged to a 6:2 ratio (full conversion) by the loss of nucleotide sequences from the invading 3’-ended strand that is in hybrid DNA. Such loss often results from the action of a mismatch-repair system that recognizes the local noncomplementarity resulting from the marker difference between the two homologs and excises a segment of the 3’-ended strand, which is then replaced using the homolog as template. A 6:2 segregation could also result if the 3’-ended, as well as the 5’-ended, strand at the initiating double-strand break is sometimes degraded.
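The strand bookkeeping behind 4:4, 5:3 and 6:2 tetrads can be illustrated with a short Python sketch. It is a hypothetical helper, not a standard tool: it takes the allele carried by each of the eight single strands of a tetrad (four chromatids, two strands each), using the illustrative labels "M" and "m" for the two parental alleles, and reports the ratio of the more common allele to the less common one.

```python
from collections import Counter


def segregation_pattern(strands: list[str]) -> str:
    """Classify one marker's segregation from the alleles on the 8 strands of a tetrad.

    `strands` lists the allele on each single strand (4 chromatids x 2 strands),
    e.g. "M" for one parent's allele and "m" for the other's.
    """
    if len(strands) != 8:
        raise ValueError("a meiotic tetrad contains eight single strands")
    counts = Counter(strands)
    if len(counts) > 2:
        raise ValueError("expected one marker with two alleles")
    major = max(counts.values())  # strands carrying the more common allele
    names = {4: "4:4 (normal)", 5: "5:3 (half conversion)", 6: "6:2 (full conversion)"}
    return names.get(major, f"{major}:{8 - major}")
```

Replacing one "m" strand with "M" in a normal tetrad turns a 4:4 result into 5:3, which corresponds to the half-conversion case described above.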


Table 1 Types of meiotic tetrads produced by reciprocal recombination in the two-factor cross AB × ab

Tetrad type                  Spore genotypes
Parental ditype (PD)         AB, AB, ab, ab
Tetratype (T)                AB, Ab, aB, ab
Nonparental ditype (NPD)     aB, aB, Ab, Ab
Figure 1 The double-strand-break repair model for meiotic reciprocal recombination. Only the interacting homologous chromatids are shown. Arrowheads signify 3’-OH ends of polynucleotide strands. Newly synthesized DNA is shown as discontinuous. The black DNA duplex (A) is cut on both strands (B). The 5’-ended strands are resected, exposing 3’-ended single-strand DNA (C). With the aid of RecA-like proteins, those strands invade an intact homolog and pair with their complements (D). The DNA removed by resection is restored using the 3’ ends as primers and the white homolog as template (E). The interacting duplexes are tied together in a joint molecule by two Holliday junctions (E). Those junctions are subject to enzymatic cutting by a ‘resolvase,’ which cuts two strands of the same chemical polarity. If the pairs of strands that are cut at the two junctions are different (F), DNA flanking the pair of junctions undergoes reciprocal recombination. In the figure, the left junction was cut vertically, and the right one was cut horizontally. If both junctions are cut on the same pair of strands (G, in which both junctions were cut horizontally), reciprocal recombination fails, as it does if the joint molecule is resolved by a topoisomerase (H). Genetic markers between the junctions may undergo gene conversion, which can result in nonreciprocal recombination. Conversion is symbolized by segments of DNA where the two interacting DNA duplexes emerge with an excess of white strands over black strands.


Site-Specific Recombination is Often Reciprocal


When a homolog is available for repair of a double-strand break, the recombination reactions described above can occur anywhere along a chromosome that is subject to such breaks (generalized or homology-dependent recombination). Reciprocal recombination can also occur at specialized sequences, catalyzed by enzymes adapted to those sequences (site-specific recombination). The integration of the chromosome of temperate phages (such as λ) into the chromosome of their bacterial host depends on such site-specific, reciprocal recombination. The same system can catalyze reciprocal, site-specific exchange between two viral chromosomes. Similar systems can catalyze inversions, resulting in altered gene expression.

Figure 2 In prokaryotes, double-strand ends of DNA duplexes usually initiate recombination that is nonreciprocal, which can be completed with (left) or without (right) extensive DNA replication. The initial steps (A-C) are as in Figure 1. Invasion of the two ends is not coordinated, so that only one end is apt to invade a given intact homolog (D). Enzyme-catalyzed cutting of the single Holliday junction creates an intermediate that has the topology of a DNA replication fork. This intermediate may acquire the enzymes needed for replication (F) or may be further cut to produce a recombinant without the intervention of DNA replication (G).


Nonreciprocal Recombination in Prokaryotes


Generalized recombination in E. coli and its phages is initiated by double-strand breaks but is frequently nonreciprocal. The presumed intermediates (Figure 2) can be resolved by cutting the single Holliday junction or by copying the homolog. Such replication, primed by recombination intermediates, is the major mechanism of DNA replication late in the infectious cycle of bacteriophage T4. In S. cerevisiae, recombination initiated by the HOT1 hotspot in mitotically dividing cells appears to involve such a ‘break-and-replicate’ pathway (Voelkel-Meiman and Roeder, 1990). Such nonreciprocal recombination probably reflects the primary role of the recombination apparatus, which is thought to be the repair of accidentally broken DNA replication forks.


Further Reading 
Stahl F (1996) Meiotic recombination in yeast: coronation of the double-strand-break repair model. Cell 87: 965-968.


Reference 

Voelkel-Meiman K and Roeder GS (1990) Gene conversion tracts stimulated by HOT1-promoted transcription are long and continuous. Genetics 126: 851-867.


See also: Crossing-Over; Gene Conversion; 
Genetic Recombination; Nonreciprocal 
Exchange; Tetrad Analysis 
Reciprocal translocation refers to the production of new genotypes with the reverse arrangements of alleles according to maternal and paternal origin.


See also: Crossing-Over 


Reciprocality 
F W Stahl 
The property of producing complementary genetic recombinants in a single act, as in meiotic crossing-over. A recombination act that manifests reciprocality for markers more than a few kilobases apart may be nonreciprocal for markers lying closer together in that interval.


See also: Crossing-Over; Gene Conversion; 
Nonreciprocal Exchange; Reciprocal 
Recombination 


Reckless DNA Degradation 


S M Rosenberg and P J Hastings 
‘Reckless DNA degradation’ is an old name for a phenomenon observed in Escherichia coli mutants lacking the RecA homologous recombination protein. In these cells, abnormally extensive degradation of bacterial chromosomal DNA occurs both spontaneously and following exposure to ionizing or ultraviolet (UV) radiation. The degradation is carried out by the RecBCD exonuclease, also called exoV, which attacks DNA double-strand ends (DSEs). Modern understanding of the functions of RecA and RecBCD enzyme provides a framework for understanding reckless degradation. Although the E. coli chromosome is circular, it frequently becomes linear as a result of DNA replication accidents and of damage. RecA recombines DNA ends with homologous sequences, thereby restoring circularity to the bacterial chromosome, and circular DNA is not attacked by RecBCD exonuclease.

RecA protein is the prototypical strand-exchange protein of E. coli, with orthologs in all organisms examined (see RecA Protein and Homology). RecA coats single-strand DNA and catalyzes homologous pairing with duplex DNA, forming heteroduplex joints, the key intermediate in homologous recombination (see Heteroduplexes). Shortly after recA null mutants were isolated in the laboratory of A. John Clark at Berkeley (Clark and Margulies, 1965), Clark, Paul Howard-Flanders at Yale, and their colleagues discovered that recombination-deficient recA mutants undergo abnormally high levels of both spontaneous and UV- and X-irradiation-induced degradation of their DNA (Clark et al., 1966; Howard-Flanders and Theriot, 1966). This degradation, called reckless DNA degradation, was shown by Emmerson (1968) to be caused by the products of the recB and recC genes, which encode subunits of RecBCD enzyme, or exoV (see RecBCD Enzyme, Pathway).

We can now understand the basis of reckless degradation as follows. RecBCD is a double-strand DNA exonuclease and also the major enzyme involved in double-strand-break (DSB) and DSE repair in E. coli (see RecBCD Enzyme, Pathway). It processes DSEs, degrading them until it recognizes a DNA sequence called ‘Chi,’ present throughout E. coli DNA. At Chi, degradation often stops, and the enzyme then generates single strands (helicase activity with or without some single-strand nuclease activity). RecA coats the single-stranded DNA (ssDNA) and catalyzes recombination with a sister molecule, thus protecting the DSEs from further nuclease activity.

The radiation-induced reckless degradation can be 
understood as the degradation of DNA that becomes 
linear after suffering radiation-induced double-strand 
breakage. In the absence of RecA and recombination, 
DNA ends are not joined to sister molecules to restore 
circularity. Consequently, they are not protected from 
RecBCD exonuclease. 

Spontaneous reckless degradation implies that the chromosome can become linear without exogenous damaging agents. Recent work from many laboratories indicates that DNA replication frequently goes awry, leaving DSEs exposed to RecBCD nuclease attack. See RecBCD Enzyme, Pathway, for one way that such ends are generated by replication breakdown, and Cox et al. (2000) for a review of others. Reckless degradation in the absence of damaging agents is likely to be caused by such replication breakdown, generating ends that are destroyed by RecBCD in the absence of protection by RecA.


Further Reading 
Clark AJ (1996) recA mutants of E. coli K12: a personal turning 
point. BioEssays 18: 767-772. 
Recombinants are progeny (cells or molecules) with a genotype different from that of either parent.


See also: Recombination, Models of 


Recombinant Congenic 
Recombinant congenic strains (RCS) of mice are a genetic tool designed to increase the resolving power of mapping the genes that control quantitative traits. Genetic control of important biological characteristics, including susceptibility to common diseases but also virtually all other biological aspects of the organism, such as body size and metabolic parameters, is exercised by multiple genes that affect the quantitative expression of such traits. Identifying these genes would mark a considerable advance in understanding the molecular biology of such traits and would offer possibilities for their manipulation.

The capacity to map a gene affecting a quantitative trait in a cross is proportional to its contribution to phenotypic variance, and inversely proportional to the variance caused by other segregating genes and by non-genetic factors. This ratio may be increased by reducing the variance caused by other genes. In the RCS this effect is obtained by ‘diluting’ the genome of one inbred strain (the ‘donor’ strain) on the genetic background of a second inbred strain (the ‘background’ strain). This is achieved by two generations of backcrossing of the donor strain to the background strain, followed by inbreeding to produce a number of RCS. Each of the RC strains produced in this way contains a different, random subset of approximately 12.5% of the genes from the donor strain. These subsets partly overlap, and in a series of 20 RC strains almost 95% of donor strain genes are represented at least once. In a cross between an RC strain and the background strain, only a small fraction of all genes is segregating.
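The 12.5% and ‘almost 95%’ figures follow from simple expectations. Below is a minimal Python sketch, assuming that each backcross halves the expected donor contribution, that the subsequent inbreeding leaves this expectation unchanged, and (as a simplification) that each RC strain carries any given donor locus independently of the other strains; the function names are hypothetical.

```python
def donor_fraction(n_backcrosses: int) -> float:
    """Expected share of the donor genome after n backcrosses of the F1
    (itself 50% donor) to the background strain; each backcross halves it."""
    return 0.5 ** (n_backcrosses + 1)


def fraction_covered(n_strains: int, per_strain: float) -> float:
    """Expected share of donor loci carried by at least one of n_strains,
    assuming each strain carries a given locus independently with
    probability per_strain (a simplification)."""
    return 1.0 - (1.0 - per_strain) ** n_strains


per_strain = donor_fraction(2)             # two backcross generations, as in the text
coverage = fraction_covered(20, per_strain)
print(f"{per_strain:.1%} per strain, {coverage:.1%} covered by 20 strains")
```

Two backcrosses give 0.5³ = 0.125 of the donor genome per strain, and under the independence assumption 20 strains are expected to cover about 93% of the donor genome, close to the figure quoted above.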

Three series of mouse RC strains were generated. Their respective background and donor strains are BALB/cHeA and STS/A (the CcS/Dem series), C3H/SnA and C57BL/10SnA (the HcB/Dem series), and O20/A and B10.O20/Dem (the OcB/Dem series). The genetic composition of these strains (the parts of the genome they received from their donor and background strains) is described in detail in the Mouse Genome Database (http://www.informatics.jax.org).

Mapping of quantitative trait loci (QTLs) using the RCS involves two stages. First, the phenotype of each RC strain for the studied trait is established. Subsequently, the RC strains that differ most from the background strain are used to map the genes responsible for these phenotypic differences in F2 hybrids or in backcrosses between the selected RC strains and the background strain. Genotyping these crosses is easier than genotyping a cross between two inbred strains, as the RC strains differ from the background strain only at a few relatively short chromosomal segments. Usually 13-18 markers are sufficient to achieve the density of one marker per 10-15 cM that is required for efficient mapping. The RC strains have been used to map a number of traits: susceptibility to colon tumors, lung tumors, and radiation-induced lymphomas; resistance to Leishmania major infection; genes that control various aspects of T lymphocyte activation; radiation- and glucocorticoid-induced apoptosis; lipid metabolism; diabetes; and others. More than 60 novel QTLs have been mapped using this approach.

Due to the high genetic resolution of RCS mapping, several general features of quantitative genetics, not readily detectable in crosses of standard inbred strains, were demonstrated:

1. The large number of QTLs that can be detected. By screening less than half of the genome, more than 20 lung tumor susceptibility loci were detected.

2. Different QTLs interact with each other very frequently; this is not readily detectable in whole-genome crosses because of insufficient resolving power.

3. Uncovering hidden phenotypes (e.g., a specific tumor type) that are detected neither in the parental inbred strains nor in their whole-genome cross, because they depend on the presence of a specific allelic combination. Because individual RC strains carry different combinations of genes, such phenotypes may become prominent in some of them. For example, after irradiation the RC strain CcS-2 develops a high number of myelocytic leukemias, whereas these tumors are very rare in the parental strains BALB/c and STS and in their crosses.

4. RCS help to dissect the pathways of disease development, as shown in the case of L. major infection, where a number of loci with unexpectedly complex and diverse immunological effects were shown to determine susceptibility to leishmaniasis.


Some of the QTLs defined by the RC strains have 
been mapped to very short chromosomal segments 
(down to 0.2-1 cM). Using the sequence information
from the human and mouse genome projects it will 
be possible to focus rapidly on the candidate genes 
present in such regions and to clone the responsible 
genes. Application of this approach in combination 
with linkage disequilibrium and association studies 
in humans may lead to rapid detection of human 
genes important for susceptibility to common dis- 
eases. 


See also: Inbred Strain; QTL (Quantitative Trait 
Locus) 
Recombinant DNA is the term applied to chimeric
DNA molecules that are constructed in vitro, then 
propagated in a host cell or organism. The basic 
recombinant DNA consists of a vector and an insert 
(Figure 1). The vector is a replicon (see Replicon)
capable of replicating in the cells of choice. It is 
endowed with a functional replication origin, usually 
carries a selectable marker, and typically has been 
engineered to accommodate inserts conveniently. Vec- 
tors are based on naturally occurring replicons, such as 
bacterial plasmids, viruses, or cellular chromosomes. 
Inserts can be of any sort — long or short segments of 
DNA, from natural or synthetic sources. The result- 
ing recombinant DNAs are often referred to as clones, 
which is shorthand for chimeric DNAs that are isol- 
ated in cellular or viral clones; and the process of 
producing these recombinants is frequently called 
DNA cloning or gene cloning. 

Figure 1 The structure of a recombinant DNA. The
vector illustrated here is a bacterial plasmid that has an
origin of replication (ori), a selectable marker in the
form of a gene conferring resistance to the antibiotic
ampicillin (AmpR), and a multiple cloning site (MCS). The
insert can be any type of DNA sequence (as elaborated
in the text). The recombinant DNA is created by joining
the vector and insert.

The ability to construct and propagate recombinant
DNAs was developed in the early 1970s. The
first chimeric DNAs were produced by Peter Lobban
and Dale Kaiser at Stanford University by endowing
two different DNA molecules with complementary
homopolymer tails and joining them by simple
Watson–Crick base pairing, but these were not
introduced into living cells and replicated. That advance
required the isolation of a bacterial plasmid that could 
serve as a vector. Combining the identification of 
such plasmids and the use of restriction enzymes (see 
Restriction Endonuclease), the laboratories of Stanley 
Cohen (at Stanford University) and Herbert Boyer (at 
the University of California, San Francisco) produced 
replicating recombinant DNAs by joining segments 
isolated from two different bacterial plasmids and 
propagating the chimera in a laboratory strain of 
Escherichia coli. Soon after, DNAs from a wide variety
of sources, including eukaryotic genomes, were cloned 
using the same basic methodology. 

At the time that recombinant DNA technology 
was being developed, concerns were raised about 
potential hazards that novel combinations of genes 
might entail. In 1974 a moratorium was placed on 
many types of cloning experiments, and in 1975 a 
conference was held at the Asilomar Conference 
Center in Pacific Grove, California, to formulate the 
first guidelines for ensuring safe use of the new cap- 
abilities. The United States National Institutes of 
Health established a Recombinant DNA Advisory 
Committee to oversee the revision and dissemination 
of formal guidelines and to review proposals for 
recombinant DNA experiments that seemed most 
likely to produce potential threats to public safety. 
Although proceeding with caution in the early stages 
of recombinant DNA production was a socially 
responsible approach, the technology has proved to 
be quite safe. No novel infections have been attributed 
to recombinant DNAs, and federal oversight of clon- 
ing experiments has relaxed considerably. 

Recombinant DNAs are now generated quite 
routinely: as a method of gene isolation, to produce 
large quantities of specific gene products, for studies 
of the functions of normal and mutant versions of 
specific genes, and for a variety of other purposes. 
The goals of each individual study dictate the nature 
of the vector, insert, and detailed structure of the 
chimeric DNA, but some general characteristics can 
be noted. 

A vector must be capable of replicating in the cells 
of choice; it must have characteristics that allow iden- 
tification of cells carrying the recombinant DNA; and 
it should be designed to make joining to potential 
inserts as easy as possible. For propagation in bacteria, 
bacterial plasmid DNAs are frequently used as vectors. 
The plasmid vectors in use today have features taken 
from naturally occurring plasmids, but they have been 
engineered considerably for ease of use in the labora- 
tory. At a minimum, such vectors have an origin of 
DNA replication and sequences that determine how 
many copies of the plasmid will be maintained in the
host cell. They carry a selectable marker, typically
a gene specifying antibiotic resistance, that allows
selection of plasmid-carrying cells. Modern vectors
have a cluster of restriction enzyme sites (often called
a multiple cloning site, or MCS) in a nonessential region
of the plasmid that facilitates joining to inserts of
different designs. Many vectors carry, in addition, a feature
that allows one to distinguish cells with recombinant 
DNAs that contain an insert from ones that contain 
vectors without inserts. Vectors that are designed for 
the expression of novel proteins in the host cell fre- 
quently carry, adjacent to the MCS, sequences that 
ensure and regulate gene expression in bacteria, includ- 
ing a promoter for RNA polymerase and a translational 
initiation site. Sometimes coding sequences for specific 
proteins or peptides are included, so the protein that 
is expressed emerges as a fusion between sequences 
encoded by the insert and sequences that facilitate 
purification of the resulting polypeptide. 
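An MCS is useful only if each of its restriction sites cuts the vector once. The minimal Python sketch below scans a sequence for candidate sites and reports which enzymes cut exactly once. The polylinker in it is hypothetical: it is simply six common recognition sequences placed end to end, not the MCS of any particular commercial vector. A real check would scan the whole plasmid, not just the polylinker.

```python
# Recognition sequences for six widely used type II restriction enzymes.
SITES = {
    "EcoRI": "GAATTC",
    "KpnI": "GGTACC",
    "BamHI": "GGATCC",
    "XbaI": "TCTAGA",
    "PstI": "CTGCAG",
    "HindIII": "AAGCTT",
}

# Hypothetical polylinker: the six sites above placed end to end.
POLYLINKER = "GAATTCGGTACCGGATCCTCTAGACTGCAGAAGCTT"


def find_sites(seq: str, sites: dict[str, str]) -> dict[str, list[int]]:
    """Return the 0-based start of every occurrence of each site in seq,
    including overlapping occurrences."""
    hits: dict[str, list[int]] = {}
    for name, site in sites.items():
        positions = []
        start = seq.find(site)
        while start != -1:
            positions.append(start)
            start = seq.find(site, start + 1)
        hits[name] = positions
    return hits


def unique_sites(seq: str, sites: dict[str, str]) -> list[str]:
    """Enzymes that cut seq exactly once, i.e. usable for opening it."""
    return [name for name, pos in find_sites(seq, sites).items() if len(pos) == 1]


if __name__ == "__main__":
    for name, positions in find_sites(POLYLINKER, SITES).items():
        print(f"{name:8s} {positions}")
    print("unique:", unique_sites(POLYLINKER, SITES))
```

All six sites are palindromic, so scanning one strand is enough. A non-palindromic site would also require a scan of the reverse complement.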

An alternative to bacterial plasmid vectors for 
propagation in bacteria is the use of modified bacterial 
viruses as vectors. Bacteriophage λ has been used
extensively in this context because of the detailed
knowledge of its life cycle and essential functions. Some
particularly useful vectors combine features of λ and
plasmid sequences and allow exploitation of advant- 
ages of viral and plasmid characteristics at different 
stages of a cloning project. Other bacterial viruses that 
have been used as vectors for specific purposes include 
the M13 family, which package a single-stranded cir- 
cular version of the chimeric DNA into virus particles, 
a characteristic that may have advantages for some 
experiments. 

When it is desired to propagate recombinant DNAs 
in eukaryotic cells, vectors that replicate in those cells 
are used. Yeast cells are capable of propagating nuclear 
plasmids based either on a natural plasmid replicon 
(the 2 µm circle) or on yeast chromosomal replicons.
As in the case of bacterial vectors, a replication origin, 
a selectable marker (typically a gene that complements 
a chromosomal mutation that confers a nutritional 
requirement), and a multiple cloning site are included 
in a basic yeast vector. Many yeast vectors also carry 
features of bacterial vectors, so they can be propagated 
in cells of both types, and these are called shuttle 
vectors. 

Some yeast vectors are designed to behave in the 
host cells essentially like natural chromosomes. They 
must, therefore, have the minimum features of normal 
yeast chromosomes. In addition to an origin of DNA 
replication, these vectors carry a centromere to ensure 
proper chromosome segregation at mitosis and telo- 
meres to stabilize chromosomal ends. Mammalian 
cells do not have natural nuclear plasmids, and the 
minimal properties of mammalian chromosomes
have not been identified at the molecular level. Thus,
vectors for mammalian cells are usually based on viral 
genomes. 

DNA inserts for recombinant DNAs can be pro- 
duced in a variety of different fashions. They can be 
fragments of genomic DNA from an organism of 
interest that are generated by cleavage with restriction 
enzymes or by random shearing. They can be DNA 
copies of mRNAs (complementary DNAs, or 
cDNAs) from selected cells or tissues. They can be 
polymerase chain reaction (PCR) products (see Poly- 
merase Chain Reaction (PCR)); they can be fragments 
derived from previously isolated recombinant DNAs; 
and they can be synthetic oligonucleotide duplexes. 

Joining of inserts to the vector is commonly 
achieved by cutting both with restriction enzymes 
that create compatible (i.e., complementary) single- 
stranded ends and joining with DNA ligase. If two en- 
zymes with different nucleotide sequence specificities 
are used to generate distinct tails on the two ends of 
both vector and insert, the insert will be joined in only 
one orientation with respect to the vector, and prob- 
lems with simple religation of the vector without an 
insert are minimized. Once joining is accomplished, 
the resulting chimeric molecules can be introduced
into the host cells by a variety of transformation
procedures (see Transformation). Selectable markers on the vector allow
recovery only of cells carrying the recombinant 
DNAs. Isolation of colonies of cells derived from 
single transformants, or of viral plaques represent- 
ing progeny of a single initiating virus particle, ac- 
complishes the cloning step in recombinant DNA 
production. 
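The orientation argument above can be made concrete with a toy Python model of sticky-end compatibility. It assumes that each end is fully described by the 4-nt 5′ overhang its enzyme leaves. It ignores ligation efficiency, phosphatase treatment, and internal sites. The `Fragment` type and the helper names are hypothetical.

```python
from dataclasses import dataclass

# Recognition site and top-strand cut offset for enzymes with palindromic
# sites that leave 4-nt 5' overhangs (G^AATTC, G^GATCC, A^GATCT).
ENZYMES = {
    "EcoRI": ("GAATTC", 1),
    "BamHI": ("GGATCC", 1),
    "BglII": ("AGATCT", 1),
}


def overhang(enzyme: str) -> str:
    """5' overhang left by a staggered cut in a palindromic site."""
    site, cut = ENZYMES[enzyme]
    return site[cut:len(site) - cut]


def can_join(end_a: str, end_b: str) -> bool:
    """For the palindromic overhangs used here, two ends anneal if their
    overhangs are identical (EcoRI with EcoRI, BamHI with BglII)."""
    return overhang(end_a) == overhang(end_b)


@dataclass(frozen=True)
class Fragment:
    name: str
    left: str   # enzyme that produced the left end
    right: str  # enzyme that produced the right end


def self_ligates(vector: Fragment) -> bool:
    """Can the linearized vector re-close without an insert?"""
    return can_join(vector.right, vector.left)


def orientations(vector: Fragment, insert: Fragment) -> list[str]:
    """Orientations in which the insert can close the vector into a circle.
    Flipping the insert swaps which of its ends meets which vector end."""
    found = []
    if can_join(vector.right, insert.left) and can_join(insert.right, vector.left):
        found.append("forward")
    if can_join(vector.right, insert.right) and can_join(insert.left, vector.left):
        found.append("reverse")
    return found


if __name__ == "__main__":
    one_enzyme = (Fragment("vector", "EcoRI", "EcoRI"), Fragment("insert", "EcoRI", "EcoRI"))
    two_enzymes = (Fragment("vector", "EcoRI", "BamHI"), Fragment("insert", "BamHI", "EcoRI"))
    for vec, ins in (one_enzyme, two_enzymes):
        print(vec.left, vec.right, "self-ligates:", self_ligates(vec),
              "orientations:", orientations(vec, ins))
```

With a single enzyme, the vector can re-close on itself and the insert can enter in either orientation. With two incompatible ends, neither happens. The BamHI/BglII pair shows that different enzymes can still produce compatible ends.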




See also: Polymerase Chain Reaction (PCR); 
Replicon; Restriction Endonuclease 
The United States National Institutes of Health has
established a set of guidelines that stipulate practices for
making and using recombinant DNA molecules and
organisms that contain recombinant DNA. Any indi- 
vidual who receives support from the NIH for recom- 
binant DNA research must be associated with or 
sponsored by an institution that assumes responsibil- 
ities assigned in the NIH Guidelines for Research 
Involving Recombinant DNA Molecules. The guidelines
categorize different experiments into different
risk groups based on, among other criteria, the source
of the DNA molecule and the host into which it is
introduced. The guidelines indicate the level of con- 
tainment at which each experiment can be conducted. 
The levels of containment consider both the physical 
containment (the laboratory) and the biological con- 
tainment (the host). 

The guidelines can be accessed at
http://www4.od.nih.gov/oba/rdna.htm


See also: Asilomar Conference; Recombinant 
DNA 
See: DNA Cloning


Recombinant Inbred Strains
L Silver 


Copyright © 2001 Academic Press 
doi: 10.1006/rwgn.2001.1082 


Recombinant inbred (RI) strains are formed from an 
initial cross between two different inbred strains
followed by an F1 intercross and 20 generations of strict
brother-sister mating. This breeding protocol allows 
the production of a family of new inbred strains with 
special properties relative to each other. Different RI 
strains derived from the same pair of original inbred 
parents are considered members of a set. Each RI set 
is named by joining an abbreviation of each parental 
strain together with an X. For example, RI strains 
derived from a C57BL/6J (B6) female and a DBA/2J
male are members of the BXD set, and RI strains 
derived from AKR/J and C57L/J are members of the 
AKXL set. Each RI strain in a particular set is distin- 
guished by appending a hyphen to the series name 
followed by a letter or number. Thus, BXD-15 is 
a particular RI strain that has been formed from an 
initial cross between a B6 female and a DBA male. 
At any point in time, it is always possible to add a new
strain to a particular set through an outcross between 
the same two progenitor strains followed by 20 gen- 
erations of inbreeding. The RI strains represent an 
important tool in the arsenal available for linkage 
studies in mice. 
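Why about 20 generations? A rough answer comes from Wright's classic recursion for the inbreeding coefficient under full-sib mating, F_t = (1 + 2F_{t-1} + F_{t-2})/4. The Python sketch below assumes a non-inbred starting pair (F = 0 in the two preceding generations) and no selection. Conventions differ on which generation counts as the first, so treat the exact figure as approximate.

```python
def sib_mating_inbreeding(generations: int) -> float:
    """Inbreeding coefficient after `generations` of brother-sister mating.

    Uses Wright's recursion F_t = (1 + 2*F_{t-1} + F_{t-2}) / 4, starting
    from a non-inbred pair (F_0 = F_{-1} = 0). Assumes no selection.
    """
    f_two_back, f_one_back = 0.0, 0.0
    for _ in range(generations):
        f_two_back, f_one_back = f_one_back, (1 + 2 * f_one_back + f_two_back) / 4
    return f_one_back


if __name__ == "__main__":
    for t in (1, 2, 3, 10, 20):
        print(t, round(sib_mating_inbreeding(t), 4))
```

Under these assumptions, about 98.6% of loci are expected to be homozygous by descent after 20 generations. This is consistent with the conventional 20-generation threshold for calling a strain inbred.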


See also: Inbred Strain 


Recombination Hot Spots, Mouse
The mouse major histocompatibility complex (MHC)
remains the best-studied model in mammals for the 
correlation of meiotic recombination frequency (in 
centimorgans, cM) with physical distance (in kilo- 
bases, kb). Over the approximately 2000 kb region 
that makes up the MHC on mouse chromosome 17, 
the location of meiotic crossing-over departs dramat- 
ically from what would be expected of random recom- 
bination. A report of the molecular cloning of an 
important region of the MHC by Steinmetz and
colleagues in 1982 revealed that in standard inbred
strains, recombination is clustered in a small segment
associated with the Eβ gene. Further work on the Eβ
hot spot narrowed the breakpoints of more than 50
recombinants to a 1.0-4.0 kb segment
within the second intron of the Eβ gene.

A total of eight recombination hot spots have now 
been defined in the mouse MHC. Some of these have 
been well characterized by molecular cloning and 
nucleotide sequence analysis (the hot spots associated 
with the Lmp2, Eβ, Eα, and G7c genes). In each case,
recombination breakpoints seem to be limited to a 
chromosomal segment of only 1-4 kb. At least within 
the I-region of the mouse MHC (about 450 kb in
length), hot spots are separated by relatively large 
segments of 50-100 kb where little or no recombin- 
ation has been detected. Some intriguing differences in 
the specificity of recombination have been associated 
with the Lmp2 and Eα recombinational hot spots. For
example, recombination associated with the Lmp2 hot 
spot can be regulated by cis-acting elements that con- 
trol strain-specificity (which parental chromosomes 
are competent for recombination) and sex-dependency 
(the sex of the parental heterozygote). 

Little is presently known of the physical or
biochemical features of a recombinational hot spot
that make it an attractive, and perhaps exclusive, site for
recombination. Although DNA sequence signals have 
been suggested, none have been consistently asso- 
ciated with the well-characterized hot spots, and no 
experimental evidence exists that would tie recombin- 
ation to specific or general kinds of DNA sequences. 
Importantly, it remains unclear whether site-restricted 
recombination is characteristic of recombination 
throughout the genome or is a special case associated 
with the genes and unique evolution of the MHC. 


Further Reading 

Shiroishi T, Sagai T and Moriwaki K (1993) Hotspots of meiotic
recombination in the mouse major histocompatibility
complex. Genetica 88: 187–196.

Snoek M, Teuscher C and van Vugt H (1998) Molecular analysis of
the major MHC recombinational hot spot located within the
G7c gene of the murine class III region that is involved in
disease susceptibility. Journal of Immunology 160: 266–272.

Steinmetz M, Minard K, Horvath S et al. (1982) A molecular map
of the immune response region from the major
histocompatibility complex of the mouse. Nature 300: 35–42.


See also: Histocompatibility; Major 
Histocompatibility Complex (MHC) 
Programmed changes in genomic structure are essential
to the immune response. Early in the development
of the immune system, in the precursors of B and T 
cells, V(D)J recombination occurs to create the vari- 
able (V) regions of the antigen receptors. Later, 
following activation by antigen, class switch recombin- 
ation occurs in B cells to alter the constant (C) region 
of the immunoglobulin molecule. Figure 1 outlines
the steps that occur during V(D)J recombination and
class switch recombination at the murine immuno- 
globulin heavy chain locus. 


Figure 1 V(D)J recombination and class switch
recombination. The top line diagrams the murine heavy
chain locus in its germline configuration. V(D)J
recombination joins V, D, and J gene segments (middle
line, left) to allow production of IgM antibodies (middle
line, right). Class switch recombination joins a new
constant region to the expressed variable region (bottom
line, left) to allow production of antibodies of classes
other than IgM. The figure shows switch recombination
from μ to γ1, and an IgG1 antibody is diagrammed (bottom
line, right). V, variable; D, diversity; J, joining; S, switch
region; C, constant region.


V(D)J Recombination in Lymphocyte Development


V(D)J recombination joins gene segments to create
a region of DNA that encodes the variable region of
the antigen receptor. Similar events occur to produce
the antigen receptors in both T cells and B cells, but
for simplicity this description will concentrate on
events at the immunoglobulin loci in B cells. In B 
cells, the heavy chain D and J regions recombine
first, and heavy chain V regions then recombine with
the joined D-J regions. The immunoglobulin light chain loci
contain only V and J regions, and no D regions, and 
light chain recombination occurs in a single step in 
which V and J regions are joined. Following successful 
recombination at both the heavy and light chain loci, 
a B cell expresses the encoded immunoglobulin mo- 
lecule on its cell surface as its receptor for antigen. 
Because amino acid residues within the V(D)J regions 
make direct contact with antigen, the specificity of 
the antigen receptor is determined by the sequences 
of the heavy and light chain V(D)J regions. 


The V(D)J Recombination Pathway 

V(D)J recombination depends on a pair of proteins, 
RAG1 and RAG2, which are produced only in
lymphocytes. RAG1 and RAG2 recognize specific
recombination signal sequences (RSSs) that flank the
V, D, and J gene segments. The RSSs contain conserved
heptameric and nonameric binding sites for RAG1
and RAG2, separated by a DNA spacer of either 12 or
23 bp. The ordered recombination of V, D, and J
segments is determined by the ‘12/23 rule’:
recombination always involves one RSS with a 12 bp
spacer and one RSS with a 23 bp spacer. In the
cleavage reaction, a nick is generated on the coding
strand of the DNA adjacent to the heptameric site in
the RSS. This creates a free 3′-hydroxyl end, which
attacks the phosphodiester bond on the opposite strand to
produce a hairpin structure at one side of the cleavage 
site and a blunt end at the other. (This transesterifica- 
tion reaction is similar to that used in transposition 
and retroviral integration, although in those processes 
the free end attacks a target DNA, not the opposite 
strand of the duplex.) The hairpin is then opened, and 
untemplated nucleotides may be added by terminal 
transferase, creating novel sequence at the cleavage 
junction. Finally, DNA ends are ligated in a reaction 
that depends upon ubiquitous factors involved in non- 
homologous end-joining: Ku70, Ku80, DNA-PKcs, 
DNA ligase IV, and XRCC4. 
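The 12/23 rule can be expressed as a simple compatibility check. The Python sketch below encodes the textbook RSS spacer arrangement for the mouse heavy chain and κ light chain loci. The dictionaries and the function are illustrative, not part of any existing tool. The check captures only the 12/23 constraint: it shows why a V_H segment cannot join a J_H segment directly, but it does not model the further regulation that makes D-J joining precede V-DJ joining.

```python
# Spacer length (bp) of the RSS on each side of a murine gene segment.
# Heavy chain: V_H has a 23-bp-spacer RSS 3' of it, each D has 12-bp-spacer
# RSSs on both sides, and J_H has a 23-bp-spacer RSS 5' of it.
# Kappa light chain: the V_kappa 3' RSS has a 12-bp spacer, J_kappa 5' has 23 bp.
HEAVY = {"V": {"3'": 23}, "D": {"5'": 12, "3'": 12}, "J": {"5'": 23}}
KAPPA = {"V": {"3'": 12}, "J": {"5'": 23}}


def obeys_12_23(locus: dict, upstream: str, downstream: str) -> bool:
    """True if the RSS 3' of `upstream` and the RSS 5' of `downstream`
    form a 12/23 pair. A missing RSS (None) never satisfies the rule."""
    a = locus[upstream].get("3'")
    b = locus[downstream].get("5'")
    return {a, b} == {12, 23}


if __name__ == "__main__":
    for up, down in (("D", "J"), ("V", "D"), ("V", "J")):
        print(f"heavy {up}->{down}:", obeys_12_23(HEAVY, up, down))
    print("kappa V->J:", obeys_12_23(KAPPA, "V", "J"))
```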


Allelic Exclusion 

V(D)J recombination can occur at either chromo- 
somal allele. If recombination at one allele is not 
successful — for example, if recombination produces 
a V region which includes a premature termination 
codon — the second allele will recombine. However, 
successful recombination at one allele prevents recom- 
bination at the other allele. This is called ‘allelic ex- 
clusion.’ Allelic exclusion ensures that each B cell 
expresses only one type of antigen receptor, so that 
when antigen binding stimulates B cell proliferation, 
clonal expansion increases the number of cells
expressing a receptor of the correct specificity.
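The allelic exclusion logic can be sketched as a control flow: rearrange one allele, and try the second only if the first is unproductive. The Python sketch below is a simplification with hypothetical names. `rearrange` stands in for V(D)J recombination, "productive" is reduced to a crude reading-frame and stop-codon check, and the fixed order in which alleles are tried is a modeling convenience, not a claim about which allele a cell uses first.

```python
from typing import Callable, Optional

STOP_CODONS = {"TAA", "TAG", "TGA"}


def is_productive(coding: str) -> bool:
    """Crude test of a rearranged coding sequence: it must stay in frame
    (length divisible by 3) and contain no stop codon in that frame."""
    if len(coding) % 3 != 0:
        return False
    codons = (coding[i:i + 3] for i in range(0, len(coding), 3))
    return not any(codon in STOP_CODONS for codon in codons)


def rearrange_with_exclusion(rearrange: Callable[[int], str]) -> tuple[Optional[int], list[int]]:
    """Rearrange allele 0, then allele 1 only if allele 0 was unproductive.

    Returns the expressed allele (or None) and the alleles attempted.
    """
    attempted = []
    for allele in (0, 1):
        attempted.append(allele)
        if is_productive(rearrange(allele)):
            return allele, attempted  # success blocks the other allele
    return None, attempted


if __name__ == "__main__":
    # Hypothetical toy sequences: allele 0 carries an in-frame TAA stop.
    toy = {0: "ATGTAAGCC", 1: "ATGGCC"}
    print(rearrange_with_exclusion(toy.get))  # (1, [0, 1])
```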


Recognition of Diverse Antigens

An organism needs a diverse repertoire of antibodies,
which can provide protection against many different
pathogens. This diverse repertoire is achieved in two
distinct ways: combinatorial diversity and sequence 
diversification. 


1. Combinatorial diversity. In some organisms, 
including mice and humans, V(D)J recombination 
uses a large pool of V, D, and J gene segments. In 
principle, the number of different variable genes 
that can be produced from combinatorial joining 
of gene segments from three distinct families is 
the product of the number of segments in each 
family. In fact, not all gene segments in the mam- 
malian germline are functional, and others are 
rarely used. 

2. Sequence diversification by targeted hypermu- 
tation. In some organisms, including sheep and 
chickens, V(D)J recombination uses relatively few 
gene segments, and the repertoire becomes diverse 
by a targeted process of hypermutation. The 
chicken λ light chain locus is an extreme example.
Here, a single functional V region recombines with 
a single J region. Hypermutation of the recom- 
bined VJ region then produces a diverse reper- 
toire. In the chicken, hypermutation depends 
on templated mutation (gene conversion), which 
transfers sequence information from a family of 
pseudo-V regions to the recombined gene. In the 
sheep, hypermutation also diversifies a limited 
repertoire, but hypermutation is untemplated, not 
templated. 
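The product rule in item 1 is easy to check numerically. The segment counts in the Python sketch below are purely illustrative; they are not the actual mouse or human repertoire. The sketch counts only segment combinations. It also multiplies heavy-chain by light-chain combinations, on the assumption that any heavy chain can pair with any light chain. It ignores the extra junctional diversity contributed by terminal transferase.

```python
from math import prod


def combinatorial_diversity(segment_counts: dict[str, int]) -> int:
    """Number of distinct segment combinations: the product of the number
    of usable segments in each family."""
    return prod(segment_counts.values())


# Illustrative numbers only, not a real germline repertoire.
heavy = {"V": 100, "D": 10, "J": 4}
light = {"V": 40, "J": 4}

if __name__ == "__main__":
    heavy_combos = combinatorial_diversity(heavy)  # 100 * 10 * 4
    light_combos = combinatorial_diversity(light)  # 40 * 4
    print(heavy_combos, light_combos, heavy_combos * light_combos)
```

Because the count is a product, removing nonfunctional or rarely used segments from any one family reduces the total in proportion.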


Immunoglobulin Heavy Chain Class Switch Recombination


Immunoglobulin heavy chain class switch recombin- 
ation is a regulated process of DNA deletion. Prior to 
switch recombination, the heavy chain contains a
variable (VDJ) region fused to the Cμ constant region, and
a B cell produces IgM antibodies (Figure 1). Switch 
recombination joins the expressed variable region to a 
downstream C region of a new ‘class’ (or ‘isotype’). 
Antibodies of each class remove antigen in distinct 
ways: IgM antibodies activate complement; IgG anti- 
bodies, the major serum antibodies, interact with 
receptors on phagocytic cells; and IgA antibodies, 
found in secretions including saliva, tears, milk, and 
intestinal mucus, coat invading pathogens to remove 
them from the body. Because switch recombination 
changes the C region but not the V region, the result of 
switch recombination is to alter how an immuno- 
globulin molecule removes antigen without altering 
its specificity for antigen. Switch recombination is 
essential to the immune response, and impaired pro- 
duction of specific classes of serum antibodies can 
result in immunodeficiency. 
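Because switching works by deletion, it can only move downstream along the locus. The Python sketch below models that constraint using the order of the mouse heavy chain constant region genes. The function names are hypothetical. The sketch does not treat Cδ as a switch target, because Cδ is missing from the list of genes preceded by S regions. The model ignores regulation by cytokines and T cell signals.

```python
# Mouse heavy-chain constant-region genes, in order 5'->3' downstream of the
# rearranged V(D)J exon.
MOUSE_C_GENES = ["mu", "delta", "gamma3", "gamma1", "gamma2b", "gamma2a", "epsilon", "alpha"]

# C-delta has no switch region, so it is not treated as a target here.
NO_SWITCH_REGION = {"delta"}


def switch(locus: list[str], target: str) -> list[str]:
    """Return the C genes left after switching to `target`.

    Recombination between the S region in front of the expressed gene and
    the S region in front of `target` deletes everything in between, so the
    first gene in the returned list is the one now expressed.
    """
    if target in NO_SWITCH_REGION:
        raise ValueError(f"{target} has no switch region")
    if target not in locus:
        raise ValueError(f"{target} is absent (already deleted)")
    return locus[locus.index(target):]


if __name__ == "__main__":
    after_g1 = switch(MOUSE_C_GENES, "gamma1")
    print(after_g1[0], after_g1)        # now producing IgG1
    print(switch(after_g1, "epsilon"))  # a later switch, still downstream
    try:
        switch(after_g1, "gamma3")      # upstream gene was deleted
    except ValueError as err:
        print(err)
```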


Switch Recombination is Regulated, Region-Specific Recombination

Switch recombination is a region-specific recom- 
bination process which involves repetitive regions 
of DNA, called switch or S regions. S regions are 
G-rich sequences, 2-8 kb in length, which are located 
in the intron upstream of those C regions that participate
in switch recombination: Cμ, Cγ, Cα, and Cε.
Switch recombination is neither site-specific nor homologous,
but instead produces junctions that are heterogeneous
in sequence and that may be located
anywhere within an S region. Because switch regions 
are within introns, the imprecision of the DNA 
recombination event leaves no mark on the heavy 
chain polypeptide. Switch recombination occurs in 
activated B cells. S region transcription is a prerequisite
for recombination, and is regulated by signals from
T cells and by cytokines and lymphokines, which 
bind to the B cell surface and stimulate a signaling 
cascade that culminates in activation of transcrip- 
tion at those S regions targeted for recombination. 
Factors involved in general recombination/repair 
have been implicated in switch recombination, but 
relatively little is known about the recombination 
mechanism. 


See also: Immunoglobulin Gene Superfamily; 
Recombination, Models of 
Models of recombination are the hypothetical schemes
by which recombination is proposed to occur. They 
are subject to continual modification to accommodate 
new findings and concepts. Models attempt to inte- 
grate many of the diverse phenomena of recombin- 
ation in a single mechanism with varying outcomes. 

The signal that a recombinational interaction 
between DNA molecules (a recombination event) 
has occurred is often the transfer from one molecule 
to another of a short length of information on the scale 
of a single gene. This nonreciprocal (unidirectional) 
transfer is called conversion or gene conversion. Con- 
version can affect either one or both DNA strands of a 
chromatid. Conversion on one strand only is seen as 
the phenomenon of postmeiotic segregation, where a 
single haploid meiotic product gives rise to cells of 
two different genotypes. 


A recombination event sometimes results in a 
crossover. This is a reciprocal exchange of whole 
lengths of two molecules, as though they had been 
broken in the same position and rejoined with the 
other broken end. Other recombination events leave 
the interacting molecules largely in their parental form 
and the event is detected only by the presence of 
conversion. Meiotic recombination involves the for- 
mation of crossovers and non-crossovers approxi- 
mately equally. In mitosis, over 90% of the events 
are found to be non-crossovers. 

The process of recombination necessarily consists 
of several phases: initiation, preparation of DNA sub- 
strates, interaction of recombining molecules, pro- 
cesses acting on intermediates, and resolution — the 
separation of the interacting molecules. 


The Chiasmatype Hypothesis 


Crossovers are visible during meiosis as nodes (chias- 
mata) at which pairs of chromatids interact. An early 
model, called the chiasmatype hypothesis, was based 
on the visible properties of meiotic chromosomes. The 
model proposed that the coiling of chromosomes, by 
which chromatin becomes condensed, leads to a build- 
up of torsional stress. This stress leads to breakage of 
chromatids in reciprocal positions, and rejoining of 
the broken ends gives the crossovers that are visible as 
chiasmata. This idea also explained why many organ- 
isms show regular spacing between chiasmata, namely 
that a certain length of coiling was needed to build 
up the torsional stress. This phenomenon of spacing 
of crossovers is known as crossover position inter- 
ference. There is no widely accepted mechanism 
for it, and few models seek to incorporate an explan- 
ation. 


Copy-Choice Hypothesis 


When chiasmata, the sites of crossovers, become vis- 
ible during meiosis, the chromosomes are already 
visibly double, showing that they have already 
become duplicated. This is consistent with the idea 
that the machinery replicating the chromosome 
might have a choice as to which of the two hom- 
ologous chromosomes to follow. This notion was 
incorporated into a model of recombination called 
the copy-choice hypothesis. This model proposed 
that, when the template being followed by the DNA 
polymerase is switched, recombination occurs. If re- 
ciprocal switches do not occur at precisely the same 
position on the chromosome, conversion would be 
seen. This mechanism implies a conservative style of 
DNA replication, so that when the concept of semi- 
conservative replication of DNA became accepted, 
it was easy to discount copy-choice as a possible
mechanism. The mechanism also predicts that mul- 
tiple crossovers will be confined to the same two 
chromatids. This conflicts with what is observed, 
that all four chromatids may be involved in crossovers 
along the chromosome. 


Hybrid DNA 


The elucidation of the structure of DNA and the 
concept of complementary base pairing coincided 
with detailed descriptions of intragenic recombin- 
ation, including conversion and postmeiotic segrega- 
tion. This resulted in a new generation of models, 
starting in 1963, in which it was proposed that DNA 
molecules were rejoined by complementary base pair- 
ing between DNA single strands to give a hybrid 
molecule. Hybrid DNA containing mismatched base 
pairs, stemming from an allelic difference between the 
parents, is called heteroduplex DNA. The existence
and significance of heteroduplex DNA were realized at
about the same time based on observations of bacterio- 
phage genotypes that were best explained as two 
genotypes occurring within the same DNA molecule. 
These models not only gave us the concepts of hybrid 
and heteroduplex DNA, they also postulated that 
heterozygosity within heteroduplex DNA is subject 
to correction. This is the present-day concept of mis- 
match repair. The Holliday junction, the structure 
formed by reciprocal exchange of single strands 
between two DNA molecules, is a central concept of 
modern molecular biology (Holliday, 1964). It is the 
substrate on which resolvases (enzymes that resolve 
recombination intermediates into separate molecules) 
act. Resolution of the Holliday junction is thought to 
play a role in whether or not the outcome of a recom- 
bination event is a crossover. 
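The link between resolution and crossing over can be written as a small truth table. In the Python sketch below, which is an idealization with hypothetical names, cutting a single junction through its exchanged strands restores the parental flanking arms, while cutting the other pair of strands exchanges them. The double-strand break repair model, described later in this entry, involves two junctions. For two junctions, the outer flanks end up exchanged only when the two junctions are cut differently. If every cut were equally likely, half of the outcomes would be crossovers. This is a property of the toy model, not a measured ratio.

```python
from itertools import product

# Two ways to cut a Holliday junction.
EXCHANGED = "exchanged strands"          # the strands that crossed over
NON_EXCHANGED = "non-exchanged strands"  # the other pair


def single_junction(cut: str) -> str:
    """Cutting the exchanged strands leaves the flanking arms parental
    (non-crossover); cutting the other pair swaps them (crossover)."""
    return "non-crossover" if cut == EXCHANGED else "crossover"


def double_junction(left_cut: str, right_cut: str) -> str:
    """Each junction cut through its non-exchanged strands swaps the arms
    on one side; the outer flanks end up exchanged only if exactly one
    junction does so."""
    swaps = sum(cut == NON_EXCHANGED for cut in (left_cut, right_cut))
    return "crossover" if swaps == 1 else "non-crossover"


def crossover_fraction() -> float:
    """Fraction of crossovers if all four cut combinations are equally likely."""
    outcomes = [double_junction(a, b) for a, b in product((EXCHANGED, NON_EXCHANGED), repeat=2)]
    return outcomes.count("crossover") / len(outcomes)


if __name__ == "__main__":
    for a, b in product((EXCHANGED, NON_EXCHANGED), repeat=2):
        print(f"{a} / {b}: {double_junction(a, b)}")
    print("crossover fraction:", crossover_fraction())
```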


Asymmetrical Heteroduplex 


During the 1970s, accumulating data on Saccharo- 
myces cerevisiae meiotic recombination was making 
it clear that recombination events were not always 
reciprocal in that heteroduplex is not symmetrically 
distributed on the two chromatids participating in a 
recombination event. The Meselson—Radding model 
introduced new features of strand-exchange inter- 
mediates that allow an asymmetrical invasion of a 
duplex by a single strand to form asymmetrical 
heteroduplex that is extended as symmetrical hetero- 
duplex (Meselson and Radding, 1975). The configur- 
ations of heteroduplex DNA predicted by this model 
provide a very satisfying description of the pattern of 
recombination seen in meiosis in the fungus Ascobolus. 
However, the Meselson—Radding model, like earlier 
heteroduplex models, relies on nicks in single DNA
strands to initiate recombination events. The initiation 
of recombination was the next concept to be changed. 


Double-Strand Break Repair 


At about the same time as the publication of the 
Meselson—Radding model, evidence was accumulating 
that circumstances that damaged both strands of a 
DNA molecule induced recombination. This led to 
the concept that recombination is a repair process, 
involved predominantly in the repair of double-strand 
breaks. This was first developed into a recombin- 
ation model in 1975 to explain the repair of breaks in 
DNA caused by radiation. The demonstration that 
cut plasmids were repaired in S. cerevisiae by a 
mechanism that filled gaps by conversion-like transfer 
of information from a homologous molecule re- 
inforced this notion. This work inspired the elaboration 
of a more generalized model of recombination based 
on double-strand breaks and called the double-strand 
break repair model (Szostak et al., 1983). This model 
made the bold prediction that meiotic recombination, 
like repair recombination, is induced by double- 
strand breaks. This has been found to be true at 
many places in S. cerevisiae chromosomes. Thus, 
repair recombination converged with meiotic recom- 
bination. 

Double-strand break repair as a general hypothesis 
for recombination solved a long-standing problem. 
Geneticists have inferred from many lines of evidence 
that the molecule on which recombination is initiated 
is the recipient of information in the recombination 
process. Earlier models that used a nick to initiate 
recombination predicted that the nicked molecule 
would become the donor. In a double-strand break
repair model, the broken molecule is expected to
be the recipient of genetic information, because the 
break or gap is filled by copying the sequence of a 
homologous molecule. The mechanisms of hetero- 
duplex formation, extension, resolution, and cor- 
rection that are proposed in the double-strand break 
repair model are similar to those used in previous 
models. 
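The donor/recipient argument above can be made concrete with a toy string model. This is a minimal sketch under loud assumptions: homologs are represented as aligned strings of equal length, the break is represented as a gap given by its coordinates, and strand polarity, heteroduplex formation, and mismatch correction are all ignored. The function name `repair_gap` and the example sequences are hypothetical, chosen only for illustration.

```python
def repair_gap(recipient: str, donor: str, start: int, end: int) -> str:
    """Return the recipient with the gap recipient[start:end] filled from the donor.

    Toy model: homologs are aligned strings of equal length; strand polarity,
    heteroduplex formation, and mismatch correction are ignored.
    """
    if len(recipient) != len(donor):
        raise ValueError("homologs must be aligned to the same length")
    if not 0 <= start <= end <= len(recipient):
        raise ValueError("gap coordinates out of range")
    # Information flows one way, from the intact donor into the broken molecule;
    # the donor string itself is never modified.
    return recipient[:start] + donor[start:end] + recipient[end:]


broken = "TTGAAGTT"  # hypothetical recipient; positions 3-4 lost at the break
intact = "ATGCCGTA"  # hypothetical homologous donor
print(repair_gap(broken, intact, 3, 5))  # TTGCCGTT
```

In the output, the broken molecule now carries the donor's bases inside the former gap (a conversion-like transfer), while its own flanking markers at positions 0 and 7 are unchanged, which matches the point that the molecule on which the event is initiated ends up as the recipient.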

The emphasis then shifted to the view that
repair is the primary function of recombination. The
use of the same mechanism to ensure regular segre-
gation of chromosomes in meiosis and to generate
variation in future generations looks more like a
set of derived functions.

The double-strand break repair model continues 
to be modified to incorporate new information on 
the biochemistry and genetics of recombination, and 
some version of it can be expected to persist. 


Replication Restart 


Double-strand break repair models invoke DNA 
synthesis primed by an invading strand to bridge the 
break. The same mechanism can fill gaps in the 
damaged DNA molecule. Since about 1990, there has 
been increasing evidence and speculation that recom- 
bination intermediates not only prime synthesis of one 
strand of DNA, but also allow the formation of a 
replication complex, and hence replicate both strands 
of whole pieces of chromosomes. The idea is that 
faults in the progression of a replication fork along 
the chromosome occur fairly often. Instead of recom- 
mencing replication at a replication origin, recombin- 
ation between sister molecules at the stalled fork will 
reform the replication fork and allow it to continue 
along the chromosome. If the recombination event 
that restarts replication involves a homologous 
molecule rather than a sister molecule, recombi-
nation between the homologs will occur. Recombina-
tion of this sort is known as break/copy recombination.
Replication restart may be needed more often than
repair of double-strand breaks, and it has been suggested
that it is the most important, and perhaps the original,
function of recombination. The next generation of 
models may integrate recombination with the re- 
plication apparatus and with the events that occur at 
replication forks. 


Further Reading 

Whitehouse HLK (1965) Towards an Understanding of the 
Mechanism of Heredity. London: Edward Arnold. 

Hastings PJ (1988) Recombination in the eukaryotic nucleus. 
Bioessays 9: 61-64. 

Petes TD, Malone RE and Symington LS (1991) Recombination 
in yeast. In: JR Broach, JR Pringle and EW Jones (eds), The 
Molecular and Cellular Biology of the Yeast Saccharomyces,
vol. 3, pp. 407-521. New York: Cold Spring Harbor Labora- 
tory Press. 
Definition


Recombination nodules are dense bodies detectable by 
electron microscopy that are associated with the paired 
chromosome cores at meiotic prophase. Late nodules 
(LN) are found at the late meiotic prophase stage 
(Figures 1, 2, and 4) and show a distinct correlation
with numbers and positions of reciprocal recombinant 
events. Early nodules (EN) are also electron-dense 
bodies associated with chromosome cores, or paired 
cores (Figures 3 and 5) but they do not correlate with 
numbers or positions of reciprocal recombination 
events (crossovers). ENs and LNs are common
phenomena and are associated with the chromosome
cores and synaptonemal complexes of pollen mother
cells of a variety of plants such as lilies, tomatoes, and
onions. They have also been reported in a variety of
fungi, insects, birds, and mammals. To what extent
they are a necessary adjunct of the synaptonemal com-
plexes will become clear as they are reported in vari-
ous other species. They are, however, difficult to
recognize in some species.


Figure 1 The organization of a set of synapsed
homologs at the pachytene stage of meiosis. The
chromatin loops (ch) are attached to the chromosome
cores (co) which are connected to the nuclear envelope
(ne). The late nodule (LN) lies between the lateral
elements (le) of the synaptonemal complex (SC). The
centromere (ce) is associated with the SC.


Figure 2 Late nodules are visible in plastic-embedded
rat spermatocytes: a thin section through the nucleus
in the figure shows the lateral elements (le) of a
synaptonemal complex in cross-section with an asso-
ciated late nodule (LN).


Early Nodule Functions 


In lily, mouse, and humans, the electron-dense bodies 
at the early stages of meiotic prophase contain signifi- 
cant accumulations of RAD51 and DMC1 pro- 
teins (Figures 3 and 5) which have been shown by in 
vivo and in vitro assays to function in DNA break 
repair and DNA homology search, but there are con- 
flicting reports on their presumptive colocalization. 
The homology search properties of the RAD51/ 
DMC1 strand-exchange proteins would appear to be 
ideally suited for a mechanism that brings together 
homologous chromosomes. There is, however, no evi- 
dence in support of that hypothesis. In fact, indica- 
tions are that alternative mechanisms are involved. 
While some of the proteins that are expected to be at 
the initiation sites of recombination, such as SPO11 
and MRE11, have not been reported in association 
with ENs, they may be present in insufficient quan- 
tities to be detectable with immunocytology. 


Figure 3 Antibodies against the RAD51 and DMC1 pro-
teins specifically recognize early nodules (EN) which are 
associated with the synaptonemal complex (SC). The 
number of ENs ranges from 250 at the earliest prophase 
stages to zero at mid-pachytene. In this mouse spermato- 
cyte, there are about 50 ENs, and the X and Y sex chromo- 
somes (lower right) have the brightest fluorescent foci. 
Figure 4 In this shadow-cast image, the late nodule (LN) lies on the central element (ce) between the cores or
lateral elements (le) of a pair of synapsed homologous chromosomes (rat in this figure). The two cores form the axial 
synaptonemal complex, which in this figure has two twists (T). The chromatin surrounds the synaptonemal complex 
but is not visible in this type of preparation. The image is produced by drying the synaptonemal complexes on a thin 
plastic film and shadow-casting at a low angle with atomic gold or platinum. The preparation is then viewed with an 
electron microscope. The synaptonemal complex is about 200 nm wide. 


Figure 5 Antibodies against the RAD51 and DMC1 proteins of the early nodules (EN) can detect early nodules with
high resolution by electron microscopy when the secondary antibody is conjugated with 5 nm gold grains. The centromere
(ce) is marked by 15 nm gold grains through the use of anti-centromere antibodies. SC, synaptonemal complex.
mutated in Bloom’s syndrome, BLM, which is a
DNA helicase, has been observed in association with 
a known site of genetic exchange. At later stages, pre- 
events led to the conclusion that LNs are involved in
recombination. The correlation was later demon-
strated in two species of grasshopper, one with cross-
overs/chiasmata localized at the ends of the large
chromosomes and a related species with nonlocalized
chiasmata (Figure 6). In the first species, Chloealtis
conspersa, chiasmata and, correspondingly, the LNs are
at the ends of the synaptonemal complexes, but in the
second species, Locusta migratoria, they are not neces-
sarily at the ends. The same characteristics were
reported for one onion species with centromere-
proximal localized chiasmata and another species
with non-localized chiasmata. These correlations are
indirect, but strong, evidence that the nodules are
involved in late recombination functions.


Figure 6 Chromosomal location of late recombin-
ation nodules.


Late Nodule Functions


The indications are that the late nodules contain pro-
tein complexes that are involved in the maturation of
the recombinant molecules such as the resolution


LN Detection 


Traditionally, LNs are detected by electron micro- 
scopy of tissue sections or in whole-mount (surface) 
spreads of meiotic prophase nuclei (Figures 1, 2, and 
4). To observe the complete set of LNs in a given 
nucleus with sections (Figure 2), an elaborate set of 
complete serial sections has to be generated and the 
nucleus reconstructed. This has been done for a num- 
ber of fungi, protists, plants, and animals. Whole- 
mount spreads are more efficient because the nucleus 
can be observed in its entirety, and all LNs are dis- 
played simultaneously when appropriately stained 
with phosphotungstic acid and osmium tetroxide or 
else shadow-cast. In general, the observed number of 
LNs per nucleus falls short of that predicted by the 
known frequency of recombination. This is usually 
attributed to the transient existence of the LNs. 
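The "predicted" number of LNs is usually derived from the genetic map. A minimal sketch, assuming that each crossover is marked by exactly one LN and that the total map length is an additive estimate: map distance in morgans is the mean number of crossovers per chromatid, and each crossover involves two of the four chromatids, so the expected number of crossovers per meiotic nucleus is the total map length in centimorgans divided by 50. The function names and the numbers in the example (a 1400 cM genome, 21 LNs counted) are hypothetical.

```python
def expected_crossovers_per_nucleus(map_length_cm: float) -> float:
    """Expected crossovers per meiotic prophase nucleus from total map length.

    1 morgan (100 cM) = 1 crossover per chromatid on average; each crossover
    involves 2 of the 4 chromatids, so crossovers per nucleus = 2 * morgans
    = map_length_cm / 50.
    """
    if map_length_cm < 0:
        raise ValueError("map length must be non-negative")
    return map_length_cm / 50.0


def fraction_observed(observed_lns: float, map_length_cm: float) -> float:
    """Ratio of LNs counted to crossovers predicted (below 1 means a shortfall)."""
    return observed_lns / expected_crossovers_per_nucleus(map_length_cm)


# Hypothetical numbers for illustration only.
print(expected_crossovers_per_nucleus(1400))  # 28.0
print(fraction_observed(21, 1400))            # 0.75
```

A ratio below 1, as in this made-up example, is the kind of shortfall described above; a transient LN that is present for only part of the stage being sampled would produce exactly this pattern.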


Recombination pathways are schemes of proposed
molecular mechanisms of recombination with defined 
sets of proteins that are imagined to act sequentially 
on specific DNA intermediates. The idea is that DNA 
intermediates are passed from enzyme to enzyme in a
series of reactions leading to recombined DNA. Re- 
combination pathways were defined first by A. John 
Clark (Berkeley) for conjugational recombination in 
the bacterium Escherichia coli. His group isolated
mutants incapable of recombining DNA in Hfr-mediated
conjugation and defined the sets of genes required.
Recombination in wild-type E. coli is said to proceed 
by the RecBCD pathway. This is the normal recombin- 
ation route for linear DNA. Two other pathways, 
called RecE and RecF, were defined in recBC null 
mutants. The RecE pathway operates only in some 
strains of E. coli K12 that carry a silent, lambda-like 
prophage and only if those cells acquire a mutation 
that activates transcription of the prophage recom- 
bination genes. The RecF pathway operates when 
two normally functional nucleases are inactivated or 
altered by mutation and requires many rec genes not 
seen to be required in otherwise rec+ cells. RecF path-
way proteins may be relevant to recombination of
circular DNA substrates, or to only one of two (par- 
tially redundant) branches of the RecBCD pathway 
for recombination of linear DNA (in wild-type cells). 


RecBCD Recombination Pathway 


Recombination in Hfr crosses, in which linear duplex 
DNA recombines with a recipient chromosome, 
requires RecA and RecBC, as well as the partially 
redundant functions of RuvABC or RecG (reviewed 
by Lloyd and Low, 1996). These proteins are said to 
function in the RecBCD recombination pathway, so 
named because RecBCD enzyme does not function in 
any other known pathway, whereas the rest of those 
proteins do. The RecBCD pathway is the major route 
for processing linear DNA in E. coli, functioning in 
Hfr-mediated conjugation, phage-mediated transduc- 
tion, and double-strand-break and double-strand-end 
repair. 


RecE Recombination Pathway


recB and recC null mutants are recombination- 
deficient in Hfr crosses. When Clark and colleagues 
searched for mutations that would restore recombin- 
ation proficiency to recBC mutant cells, they first 
found sbcA mutations that suppressed the recBC 
recombination defect (restoring conjugational recom- 
bination of linear DNA). These mutations activate 
the promoter of the rac prophage, a cryptic (non- 
expressed) fragment of a lambda-like prophage in the 
genome of some E. coli K12 strains. When activated, 
that promoter allows transcription of the recE and 
recT genes. RecE and RecT are required for recombin- 
ation in recBC sbcA cells, thus giving rise to the ‘RecE 
recombination pathway.’ RecE and RecT proteins are 
orthologs of phage λ exonuclease and β protein,
respectively: a 5’ to 3’ exonuclease that degrades one
strand of duplex DNA (such as RecE) and a strand-annealing protein (such
as RecT). They are not present in E. coli strains other 
than K12 (and not in all K12 derivatives) and so appear
to be a special case that is not generally relevant to 
recombination in E. coli (reviewed by Clark and Low, 
1988; Lloyd and Low, 1996). 


RecF Recombination Pathway 


Acquisition of both sbcB and sbcC or sbcD mutations 
also restores linear DNA recombination-proficiency 
to recBC cells (Lloyd and Buckman, 1985). The sbcC 
mutation inactivates the SbcCD hairpin endo/exo- 
nuclease, whereas the sbcB mutation appears to be a 
non-null allele of the gene encoding ExoI, a 3’ to 5’
single-strand-dependent exonuclease (Razavy et al., 
1996). Satisfying mechanisms have not been demon- 
strated for how these two mutations restore recom- 
bination to recBC cells, but the general hypothesis of 
Clark and colleagues still holds (reviewed by Clark 
and Sandler, 1994). They envisioned that these muta- 
tions result in altered processing of linear DNA dur- 
ing conjugational recombination such that different 
DNA substrates would be available for recombination 
in recBC sbcB sbcC mutants than in wild-type cells. The different
DNA substrates would be acted upon by different sets 
of enzymes. In addition to recA, the rec genes recF, -G, 
-J, -N, -O, -Q, and -R, as well as the ruvA, ruvB, and 
ruvC genes, are all required for conjugational re- 
combination in recBC sbcB sbcC cells. This process is 
called RecF pathway recombination. recA is the only 
one of these genes that is required for conjugational 
recombination in otherwise wild-type cells (reviewed 
by Clark and Low, 1988; Lloyd and Low, 1996). 
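Because a pathway is defined genetically, the rules in this paragraph can be written down as a small genotype check. The sketch below encodes only what this entry states: the RecF pathway operates in recBC cells that also carry sbcB and sbcC (or sbcD) mutations, and it needs recA, recF, -G, -J, -N, -O, -Q, -R, and ruvA, -B, -C. The function names are hypothetical, "recBC" is assumed to mean a null mutation in either recB or recC, and real phenotypes are quantitative rather than the yes/no answer modeled here. The RecBCD and RecE pathways are not modeled.

```python
# Genes listed in this entry as required for conjugational recombination
# in recBC sbcB sbcC cells (the RecF pathway).
RECF_REQUIRED = frozenset({
    "recA", "recF", "recG", "recJ", "recN", "recO", "recQ", "recR",
    "ruvA", "ruvB", "ruvC",
})


def has_recf_background(mutations: set) -> bool:
    """recB or recC loss, plus sbcB, plus sbcC or sbcD (assumed all-or-none)."""
    rec_bc = "recB" in mutations or "recC" in mutations
    sbc_cd = "sbcC" in mutations or "sbcD" in mutations
    return rec_bc and "sbcB" in mutations and sbc_cd


def recf_pathway_proficient(mutations: set) -> bool:
    """True if this toy rule set predicts RecF-pathway conjugational recombination."""
    return has_recf_background(mutations) and RECF_REQUIRED.isdisjoint(mutations)


print(recf_pathway_proficient({"recB", "sbcB", "sbcC"}))          # True
print(recf_pathway_proficient({"recB", "sbcB", "sbcC", "recF"}))  # False
print(recf_pathway_proficient({"recB", "sbcB"}))                  # False
```

Representing the required genes as a frozen set makes the "no required gene may be mutated" rule a single disjointness test.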


Function of the RecF Pathway Genes in 
E. coli 


Presumably, the many Rec proteins that appear to be 
specific to the RecF pathway have not evolved to 
protect E. coli in the eventuality that cells accumulate 
three mutations (recBC, sbcB, and sbcC). It seems 
more likely that these are recombination and DNA 
repair proteins that normally act on substrates that are 
not processed efficiently by RecBCD such as circular 
DNA (Kolodner et al., 1985). Alternatively, these 
RecF pathway proteins could be RecBCD pathway 
(linear DNA recombination) proteins that are needed 
for only some of the DNA substrates that RecBCD 
usually handles. For example, RecBCD-mediated 
recombination may be divided into two separate 
branches: one branch requiring DNA replication, and 
the other branch not (Motamedi et al., 1999). It may be 
that the RecF pathway proteins function in either 
the replicative or nonreplicative branch. If so, the 
loss of function of RecF pathway proteins would not 
diminish recombination much, because, in otherwise 


wild-type cells, an independent, alternative branch 
would still remain functional. 


Further Reading 

Rosenberg SM and Motamedi MR (1999) Homologous recom-
bination during bacterial conjugation. In: Embryonic Encyclope-
Recombination suppression is a reduction or ab-
sence of the exchange of genetic material between
homologous chromosomes during meiosis. It occurs
in regions of the genome where homologs differ by 
the presence or absence of one or more inversions 
that suppress normal pairing and crossing-over. Loci 
within the extent of the inversion become completely 
linked and cannot be resolved by traditional mapping 
methods. 


See also: Inversion 


Recombinational Repair 
Some DNA repair processes depend on the pairing
of homologous DNA segments and recombinational 
mechanisms otherwise used for genetic exchange. 
These processes intervene when there is no local 
template for repair, as there is for damage in a single 
strand. Instead, sister chromosomes or homologs 
provide the template. Double-strand breaks in 
DNA are repaired by recombinational mechanisms 
in species ranging from bacteria to humans. These 
mechanisms may include but do not require re- 
combination of chromosomal markers flanking the 
damage. 


See also: Repair Mechanisms 


Reed-Sternberg Cells 
Reed-Sternberg (R-S) cells are large binucleate cells,
15–45 µm in diameter with abundant slightly basophi-
lic cytoplasm, and prominent nucleoli. Their presence
in the appropriate morphological background of 
lymphocytes, eosinophils, and plasma cells is diag- 
nostic of Hodgkin’s disease. Originally described by 
Sternberg (1898) and independently by Reed (1902), 
these cells are of uncertain origin, but recent evidence 
showing clonal Ig rearrangement in individual R-S 
cells strongly points to a B-cell origin. Clonal Epstein- 
Barr virus (EBV) genomes and EBV latency asso- 
ciated proteins found in R-S cells also suggest an 
etiological role for EBV in the pathogenesis of this
tumor. 


Further Reading 
Diehl V (ed.) (1996) Hodgkin’s Disease. London: Baillière Tindall.


See also: Epstein-Barr Virus (EBV); Hodgkin’s 
Disease 
The reeler mouse was identified as a spontaneous
recessive mutation in 1951. It is characterized grossly
by ataxia, especially affecting locomotion of the hind
limbs, and displays variable viability and fertility on 
different inbred or outbred genetic backgrounds. 
Histologically, the brains of these mice are disorgan- 
ized with respect to neuronal placement, and this dis- 
ruption is especially evident in the highly structured 
cerebellum, cerebral cortex, and hippocampus. Analy- 
sis of the reeler mutation has contributed substantially 
to the understanding of how neurons find their correct 
positions during brain ontogenesis, and has implica- 
tions for important human neurological disorders, 
including schizophrenia. 

Disruption of the reelin gene (Reln) was demon- 
strated to be responsible for the reeler phenotype in 
1995. (Accordingly, the gene name and symbol were
changed to represent the normal gene product, of
which reeler is a mutant allele.) Reln is a very large
gene, encoding a 12 kb mRNA from 65 exons span-
ning about 450 kb of genomic DNA. The 10.3 kb open
reading frame encodes a protein of 3461 amino acid
residues with a molecular mass of 388 kDa.
Reelin is a secreted glycoprotein characterized by a 
signal peptide, F-spondin domain, and eight novel 
repeated domains, each of which contains an EGF- 
like motif. The protein is highly conserved, show- 
ing 94% amino acid identity between mouse and 
human. The Reln gene maps to mouse chromosome 
5 and to a region of conserved synteny on human 
chromosome 7. 

Neurons in the developing brains of homozygous 
reeler mice display a migratory defect resulting in 
abnormal lamination of the cerebral cortex and cere- 
bellum. The normal cortex is formed by an ‘inside-out’ 
pattern of migration. Pioneer neurons such as the 
Cajal-Retzius cells migrate outward from a central 
ventricular zone along glial fibers, stopping at an 
appropriate level. Subsequent waves of cells migrate
past layers of already established cells to form succes- 
sive layers of the cortex. Neurons in the mutant 
mice are formed and begin to migrate, but do not 
penetrate the cortical plate formed by the first wave 
of migrating cells. Ultimately they do form appropri- 
ate synaptic connections with other neurons. In other 
words, they carry out most processes of neuronal 
development and function, but appear to lack signals 
necessary to determine their proper positions within 
the cortex. 

Substantial advances in understanding how reelin 
might affect this process have come through the analysis 
of additional mouse mutations that are near-perfect 
phenocopies of reeler. Two spontaneous mutations, 
scrambler and yotari, are both alleles of the Dab1
gene. Dab1 encodes a cytoplasmic protein involved 
in signal transduction which is regulated by tyrosine 
phosphorylation. It is expressed in the migrating neur- 
ons that fail to find their appropriate positions in 
reeler mice. Dab1 expression levels are significantly 
elevated in the absence of reelin, but levels of phos- 
phorylated protein are decreased. Reelin has no known 
Dab1-independent functions. 

In addition to these spontaneously occurring Dab1 
mutations, the reeler phenotype is also closely re- 
capitulated in a genetically engineered mouse in 
which two low-density lipoprotein (LDL) family receptors,
the ApoE receptor 2 (ApoER2) and the VLDL receptor (VLDLR), are both deleted
(ApoER2 -/-, VLDLR -/-). Together with the Dab1 
-/- phenocopies of reeler, these observations suggest a 
pathway for signaling to determine neuronal place- 
ment involving these separate elements. Several lines 
of experimental evidence suggest that ApoER2 and 
VLDLR act as receptors for the secreted reelin glyco- 
protein, although this mechanism is not yet proven 
in vivo. An additional co-receptor may be required 
for reelin signaling, as well. Transmembrane proteins 
of the cadherin-related neuronal (CNR) receptor 
family are implicated both by an appropriate pattern
and timing of expression and by in vitro binding
studies.

Like all LDL family receptors, ApoER2 and 
VLDLR transmembrane proteins both contain a cyto- 
plasmic FXNPXY signal, and the Dab1 protein has a 
binding domain that interacts with this signal. Further, 
the CNR receptors transduce signals via tyrosine 
kinases of the src family. If a CNR serves as the co- 
receptor for reelin with ApoER2 or VLDLR, Dab1 
binding to the FXNPXY signal in the activated recep- 
tor would thus bring it into proximity of a tyrosine 
kinase associated with the CNR. The reduced level of
phosphorylated Dab1 in reeler mice could then be
explained, since Dab1 would not be expected to asso- 
ciate with the receptor unless it first binds reelin. 


Even if the reelin-LDLR/CNR-Dab1 model proves 
to accurately reflect the cellular signaling pathway for 
neuronal positioning during development, the Reln 
gene may prove to have further roles operating via 
different pathways in adults. Analysis of the brains 
of schizophrenics demonstrated a reduction of reelin
mRNA and protein to about 50% of normal levels, with no
effect on the level of Dab1 expression. Substantial 
characterization remains to be done to fully under- 
stand this gene and its multiple roles in the complex 
organization and function of the brain in normal and 
pathological situations. 


Further Reading 

Cooper JA and Howell BW (1999) Lipoprotein receptors:
signaling functions in the brain? Cell 97: 671-674. 

Rice DS and Curran T (1999) Mutant mice with scrambled
brains: understanding the signaling pathways that control 
cell positioning in the CNS. Genes and Development 13: 
2758-2773. 


See also: Neuronal Guidance; Neuronal 
Specification; Schizophrenia 
DNA is inherently unstable and susceptible to modi-
fication by a broad range of environmental agents, 
including ultraviolet (UV) light and ionizing radi- 
ation. In addition, it is widely appreciated that the 
stability of cellular DNA can be compromised by
purely endogenous processes, which include spontan- 
eous enzymatic attack, spontaneous base loss, and 
reaction of oxidizing and alkylating chemical agents 
formed during normal cellular metabolic processes. 
Given the importance of maintaining the structural 
integrity of the DNA molecule, it is not surprising to 
find that organisms ranging from bacteria to Homo 
sapiens have a number of DNA repair pathways that 
are capable of recognizing and repairing many differ- 
ent types of DNA lesions. 

In general, organisms have at their disposal consti- 
tutively produced DNA repair proteins that primarily 
serve to prevent and resolve low levels of DNA 
damage in an essentially ‘error-free’ fashion. In some 
cases, DNA damage can be repaired by catalyzing the 
direct chemical reversal of the damage (e.g., by consti- 
tutively expressed photoreactivating and alkyltrans- 
ferase proteins). Other more complex constitutively 


expressed repair processes include the base excision 
repair (BER), nucleotide excision repair (NER), mis- 
match excision repair (MMR), and recombinational 
repair pathways. The excision pathways act by excis- 
ing damaged segments of DNA and they correct 
DNA alterations ranging from DNA base mismatches 
to a variety of chemical and radiation-induced DNA 
changes. The recombination repair pathways act to 
shuffle damaged segments of DNA in such a way 
that they become amenable to the excision and other 
repair pathways. 

Interestingly, most organisms possess so-called 
damage-inducible DNA repair mechanisms whose 
activities are greatly upregulated in direct response to 
widespread and often severe DNA damage. However, 
it is important to point out that the line between 
constitutive and damage-inducible DNA repair pro-
cesses is not always clearly defined. For example,
certain of the constitutive repair proteins
mentioned above (e.g., some alkyltransferases and
nucleotide excision repair proteins) are also products
of damage-inducible regulons (a regulon is a set of
unlinked genes that are coordinately regulated by a
common mechanism). Damage-inducible repair
processes have been found in prokaryotic and eukar-
yotic organisms, but have been most extensively studied
and characterized in model bacteria like Escherichia
coli. The most well-known damage-inducible mech- 
anisms include the SOS response, the Ada-dependent 
adaptive response to alkylation damage, and the 
SoxRS and OxyR-mediated responses to oxidative 
stress. 

First described by Miroslav Radman in 1975, the 
SOS response is a rather complex cellular process that 
is induced by a variety of DNA-damaging treat- 
ments and by the inhibition of DNA replication. It 
is controlled by the products of the recA and lexA 
genes that together function to regulate and coordin- 
ate the expression of more than 20 unlinked genes 
whose products are involved in DNA repair, mutagen- 
esis, and many other cellular processes (see SOS 
Repair). However, unlike the Ada-, SoxRS- and 
OxyR-dependent repair pathways that repair DNA 
damage in essentially error-free ways, the repair 
mechanisms coded for by certain of the SOS genes
(e.g., recA, umuDC, and dinB) do not repair DNA
damage, per se, but instead help the organism to toler-
ate damage to its genome. E. coli has two different
types of these so-called translesion DNA polymerases,
DNA polymerase V (umuDC) and DNA poly-
merase IV (dinB). A number of these translesion
DNA polymerases have been found in eukaryotes,
including mammals (e.g., the Rev1 deoxycytidyl transferase
and the DNA polymerases ¢, n and 2). 
Leona Samson and John Cairns (1977) initially
showed that after E. coli cells are exposed for certain
periods of time to low, nonlethal doses of a simple
alkylating agent (e.g., N-methyl-N′-nitro-N-nitrosoguanidine
[MNNG]), they can become resistant to the mutagenic
and potentially lethal effects of much higher doses of
a wide range of chemical alkylating agents; hence,
the bacterial cells are able to adapt themselves to
alkylation-induced stress. It is now well known that
the Ada DNA alkyltransferase is responsible for regu-
lating this adaptive response; on transferring the alkyl
group from DNA alkylphosphotriesters to an active
alkyl-acceptor cysteine residue located in its N-
terminal region (Cys-69), Ada becomes a strong
transcriptional activator for a group of genes including
ada itself, alkA (encoding a DNA glycosylase that removes a
wide range of alkylated bases), alkB, and aidB (genes of
unknown function) whose products further protect E.
coli from DNA alkylation.

The protein products of the SoxRS and OxyR- 
regulated genes, on the other hand, protect E. coli 
cells from various forms of oxidative stress. These 
are two quite independent regulatory processes. 
Thus, for example, SoxRS-dependent genes are upre- 
gulated in the presence of high intracellular levels 
of superoxide (O₂⁻); interestingly, expression of the
SoxRS response is also increased in the presence
of nitric oxide (NO), thereby providing protec-
tion against NO-induced cytotoxicity. By contrast,
OxyR-regulated genes are induced in response to ele-
vated intracellular concentrations of hydrogen perox-
ide (H₂O₂) or by an H₂O₂-generated signal.

It is also worthwhile noting that the operations of 
certain of these constitutive or damage-inducible 
DNA repair pathways are greatly influenced by the 
overall metabolic state of the cells themselves or else 
by other global regulatory mechanisms. Thus, for 
example, the levels of certain of the mismatch excision
repair proteins in E. coli, at least, vary greatly between
exponentially growing and stationary-phase cell cul-
tures. In addition, it is known that the expression of
oxyR and certain of the SOS genes (e.g., recA and lexA)
is more likely to be maximal in the presence of high
intracellular concentrations of the global regulatory
molecule known as cyclic AMP. Given the recent
explosion in the development and use of DNA micro-
array technologies to study global gene expression in
prokaryotes, eukaryotes, and mammals, it is conceiv-
able that many more such regulatory pathways will
be revealed, thereby allowing us to comprehend fully
how organisms cope with single or even
multiple stresses.


See also: DNA Repair; SOS Repair 
Why do Organisms Need Regulatory
Genes? 


A living cell’s DNA encodes all of the information 
that is needed to carry out thousands of chemical 
reactions that make the cell what it is, for example, a 
unicellular organism living under the ocean or as part 
of a complex multicellular organism, such as a human. 
However, if the entire spectrum of protein and RNA 
molecules that a cell is capable of making were to be 
made all the time and the myriad of chemical reactions 
the cell is capable of carrying out occurred constantly 
and simultaneously there would be complete chaos, 
resulting in no adaptation, variation, differentiation, 
or development. Nature has developed sophisticated 
regulatory networks that coordinate such behavior, 
thereby avoiding chaos, in two ways: 


1. Metabolic coordination, which allows or blocks 
chemical reactions as cellular demand requires. 
Such objectives are achieved rapidly by modulating 
the behavior of enzymes and proteins through their 
noncovalent or covalent modification by intra- or 
extracellular signals. 

2. Gene regulation, whose purpose is long-term control.
This set of processes ensures that gene products,
such as enzymes, structural proteins, and RNA
molecules, are synthesized when they are needed
and in proper amounts. The synthesis of gene pro-
ducts is controlled largely by the products of
regulatory genes.


Definition of a Regulatory Gene 


In macromolecular synthesis, the primary step of gene
regulation is at the level of transcription initiation,
achieved most often by proteins encoded by genes
called ‘regulatory genes.’ The products (regulators)
act by turning transcription on (up) or off (down).
The former process is called activation or enhance-
ment, the latter is designated repression or silencing.
Accordingly, the regulatory proteins are activators
and repressors. The actions of activators (positive
control) and repressors (negative control) are gene
specific and occur in response to cellular needs. Regu-
latory proteins usually influence transcription initia-
tion by binding to specific target sites in DNA,
although examples are known of regulators that
work without a DNA site. Regulators usually act by
contacting RNA polymerase.


History of Negative and Positive Control 
of Gene Expression 


The concept of regulatory genes whose products control the expression of other genes was formulated by F. Jacob and J. Monod, primarily from the genetic analysis of the synthesis of the enzymes involved in the utilization of the sugar lactose in Escherichia coli, and of the expression of the coliphage λ proteins needed for its lytic growth from a prophage state (Jacob and Monod, 1961). They proposed a negative control mechanism to explain, for example, the synthesis of the lactose enzymes, which is induced when the sugar is present. In negative control, the product of a repressor gene, lacI, binds to a specific site, called an operator, and represses transcription from an adjacent site, called a promoter, the site of transcription initiation of the gene cluster. This gene cluster, encoding the lactose enzymes, is called the lac operon. When present, the sugar lactose causes inhibition of the operator-binding activity of the LacI repressor, thus allowing transcription of the lac operon. E. Engelsberg and his colleagues demonstrated positive control from the genetic analysis of the induced synthesis of the enzymes that metabolize the sugar L-arabinose, encoded in the ara operon in E. coli (Engelsberg et al., 1966). In this case the product of the regulatory gene araC serves as an activator. Transcription from the promoter of the ara operon is turned on only when the AraC protein binds to a DNA site (activation site) close to the promoter. This happens only when AraC is liganded to the sugar L-arabinose. Thus, without the sugar there is no induction of the L-arabinose enzymes.


An Effector can Make or Break a 
Regulator 


There are now hundreds of examples where bacterial 
gene expression is known to be regulated negatively or 
positively. It is clear from these examples that modu- 
lation of regulators by effector molecules, like lactose 
or arabinose, has been used by nature in four possible 
ways, as summarized in Table I. Thus, a small-molecule effector can help repression or activation.
Alternatively, it can negate repression or activation. 
Effector molecules usually work by allosterically 
modifying the regulators, although examples of cova- 
lent modification (for example, phosphorylation) of 
regulators are also known. 
Table I Strategies of regulation

Regulatory strategy | Regulator | Effector | DNA site | Gene expression: No | Gene expression: Yes
Negative | Repressor | Inducer | Operator | Repressor | Repressor + inducer
Negative | Aporepressor | Corepressor | Operator | Aporepressor + corepressor | Aporepressor
Positive | Activator | Inhibitor | Activation site | Activator + inhibitor | Activator
Positive | Apoactivator | Coactivator (inducer) | Activation site | Apoactivator | Apoactivator + coactivator
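The four strategies in Table I reduce to two yes/no properties of the regulator: whether it binds its DNA site without the effector, and whether it is an activator or a repressor. The Python sketch below is a toy model that is not part of the original entry. It assumes that an effector simply switches its regulator's DNA binding on or off, and it ignores partial occupancy and allosteric detail.

```python
# Table I encoded as data: (regulator, effector) ->
#   (does the regulator bind its DNA site WITHOUT the effector?, is it an activator?)
STRATEGIES = {
    ("repressor", "inducer"): (True, False),
    ("aporepressor", "corepressor"): (False, False),
    ("activator", "inhibitor"): (True, True),
    ("apoactivator", "coactivator"): (False, True),
}


def gene_expressed(regulator: str, effector: str, effector_present: bool) -> bool:
    """Return True if the target gene is transcribed under this strategy."""
    binds_alone, is_activator = STRATEGIES[(regulator, effector)]
    # Simplifying assumption: the effector flips the regulator's DNA-binding state.
    bound = binds_alone != effector_present
    # A DNA-bound activator turns expression on; a DNA-bound repressor turns it off.
    return bound if is_activator else not bound


if __name__ == "__main__":
    for regulator, effector in STRATEGIES:
        for present in (False, True):
            label = f"{regulator} + {effector}" if present else regulator
            print(f"{label:28s} expressed={gene_expressed(regulator, effector, present)}")
```

Running the loop reproduces the No/Yes columns of Table I row by row.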
Superimposed Controls

Negative and positive controls have been found to be superimposed on the same promoter in a variety of ways (Lengeler et al., 1999).


Independent Positive and Negative Control

The inactivation of the LacI repressor is not sufficient to permit transcription of the lac operon. An activator, the cAMP receptor protein (CRP), in concert with cAMP, is required for full activation of the lac promoter (see Kolb et al., 1993). cAMP.CRP acts by binding to a target site (AS) near the lac promoter (Figure 1). Thus, full expression of the lac operon depends both on the presence of an inducer and on the availability of a sufficiently high concentration of cAMP. The cAMP concentration depends on the nature of the carbon source. Cell growth in glucose lowers the cAMP concentration, resulting in a very low level of lac operon expression even when the LacI repressor is inactive.

Figure 1 The regulation of the lac operon of E. coli. (A) Operon repressed; (B) operon induced. Z, Y, and A are the three structural genes; P, promoter, which determines RNA polymerase binding and the transcription start site; O, operator, which is the repressor-binding site; AS, activator-binding site; LacI, Lac repressor; RNP, RNA polymerase; cAMP.CRP, cAMP receptor protein complexed with cAMP.


Mixed Control

Although promoters can be modulated independently by repressors and activators, there are also examples of superimposed controls in which one regulator acts by controlling the activity of a second regulator. Several operons in E. coli are repressed by a specific repressor, called CytR, but require the regulator cAMP.CRP for activation (Valentin-Hansen et al., 1996). For example, at the P2 promoter of the deo operon, cAMP.CRP activates transcription by binding to site AS1 (Figure 2). Repression of the operon requires binding of CytR to an operator site, O. CytR acts by physically contacting DNA-bound cAMP.CRP and preventing transcription activation. Inactivation of CytR by cytidine allows cAMP.CRP to activate transcription. Thus, negative control by a repressor is accomplished by inhibiting an activator.
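The two kinds of superimposed control described for the lac and deo P2 promoters can be written as small Boolean rules. This is a hedged sketch, not a quantitative model: the "full"/"low"/"repressed" labels and the treatment of high cAMP as a simple on/off condition are simplifying assumptions, and the cooperative CytR binding shown in Figure 2 is collapsed into a single yes/no step.

```python
def lac_expression(inducer: bool, high_camp: bool) -> str:
    """Independent negative (LacI) and positive (cAMP.CRP) control of the lac promoter."""
    laci_bound = not inducer   # the inducer inactivates the LacI repressor
    crp_bound = high_camp      # assumption: cAMP.CRP occupies its site only when cAMP is high
    if laci_bound:
        return "repressed"
    # LacI is inactive; the level now depends on the activator.
    return "full" if crp_bound else "low"


def deo_p2_expression(cytidine: bool, high_camp: bool) -> bool:
    """Mixed control at deo P2: CytR represses by blocking activation by cAMP.CRP."""
    crp_bound = high_camp
    cytr_active = not cytidine  # cytidine inactivates CytR
    # Active CytR contacts DNA-bound cAMP.CRP and prevents it from activating.
    return crp_bound and not cytr_active
```

For example, `lac_expression(inducer=True, high_camp=False)` models cells grown on glucose with inducer present and returns `"low"`. By contrast, `deo_p2_expression` shows the mixed case, where the repressor acts only through the activator.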


Figure 2 Regulation of the deo promoter of E. coli. The sites are arranged AS2, O, AS1, P. The deo promoter contains two binding sites for cAMP.CRP, AS1 and AS2, of which AS1 is needed for transcription activation. CytR, cytidine repressor. Binding of cAMP.CRP to both AS1 and AS2 is essential for cooperative CytR binding to the operator and subsequent repression.
Figure 3 Regulation of the malK promoter of E. coli. The multipartite activation sites (shown by circles) for the two activators, MalT complexed with maltotriose (MalT.Mtr) and cAMP.CRP, are named T3, T4, T5 and C2, C3, C4, respectively. The MalT.Mtr complex repositions (shown by solid circles) by an unknown mechanism after cAMP.CRP binding. Repositioning is essential for operon induction.


Two Interdependent Activators 

Superimposed controls may also involve two activators. In these cases, both activators must be functional for the promoter to be transcribed. For example, transcription of the malK operon requires two activators, MalT and cAMP.CRP (Richet et al., 1991) (Figure 3). Activation by MalT requires maltotriose. Each activator has several binding sites. Activation involves a higher-order DNA-multiprotein complex in which cAMP.CRP binding to DNA repositions MalT to slightly shifted new positions, which leads to activation of the promoter.


Autoregulatory Genes 


The activity of some DNA-binding regulatory proteins is a function of their concentration. Regulatory genes are often regulated in this way (autoregulation). A regulator can be maintained at a critical level by, for example, negative control through binding to an operator specific for its own gene. When the concentration of the regulator falls below a critical level, its binding to the operator decreases, thereby derepressing its own synthesis, and vice versa. Autoregulation of transcription factors by positive control alone does not occur, because it would result in exponential synthesis of the activator protein. Systems of positive autoregulation are, therefore, superimposed on negative control. At low concentrations, the regulator activates transcription by binding to a high-affinity DNA site (activation site); at high concentrations, the regulator binds to a low-affinity second site (operator), bringing about repression. Superimposed positive and negative autoregulation is best exemplified by the autoregulation of phage λ regulator (cI) synthesis in the prophage state (Figure 4; see below).
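The argument about positive versus mixed autoregulation can be checked numerically. The following minimal sketch integrates dx/dt = synthesis(x) − decay·x with forward Euler steps. All parameter values (vmax, k_act, k_rep, decay, time step) are arbitrary, dimensionless assumptions chosen for illustration, and the occupancy terms are generic binding curves rather than a model of λ cI.

```python
def simulate(production, x0: float, decay: float = 1.0,
             dt: float = 0.01, steps: int = 5000) -> float:
    """Forward-Euler integration of dx/dt = production(x) - decay * x."""
    x = x0
    for _ in range(steps):
        x += dt * (production(x) - decay * x)
    return x


def mixed_autoregulation(x: float, vmax: float = 10.0,
                         k_act: float = 1.0, k_rep: float = 20.0) -> float:
    """Activation via a high-affinity site (small k_act) and repression via a
    low-affinity operator (large k_rep), each as a simple occupancy term."""
    activation_site_occupied = x / (x + k_act)
    operator_free = k_rep / (x + k_rep)
    return vmax * activation_site_occupied * operator_free


def positive_only(x: float, k: float = 2.0) -> float:
    """Unchecked positive autoregulation: synthesis proportional to the regulator."""
    return k * x


if __name__ == "__main__":
    low_start = simulate(mixed_autoregulation, x0=0.5)
    high_start = simulate(mixed_autoregulation, x0=50.0)
    print(f"mixed control settles at {low_start:.3f} (low start), {high_start:.3f} (high start)")
    runaway = simulate(positive_only, x0=1.0, steps=1000)
    print(f"positive-only control after t = 10: {runaway:.0f}")
```

With these parameters, mixed control settles at the same level from either starting point, namely the root of (x + 1)(x + 20) = 200, about 6.54. With positive control only, the net rate is (k − decay)·x, so the regulator grows exponentially.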


Figure 4 Autoregulation of bacteriophage λ cI regulator synthesis from the promoter PRM. At low cI concentrations, the regulator activates its own synthesis, whereas at high concentrations, repression ensues.


Specific versus Global Regulators


Although many regulatory proteins are specific for the target genes they control, some control a set of operons, which are scattered throughout the bacterial
chromosome. Such a set of operons under the control 
of a common regulatory protein is called a ‘regulon’ 
(see Lengeler et al., 1999). The member operons of 
a regulon may also be subject to specific regulation 
by superimposed control as mentioned above. The 
best-studied example of a global regulator is the 
cAMP.CRP complex in E. coli. 


Some Regulators may both Repress and 
Activate Transcription 


Although the products of regulatory genes are usually either repressors or activators, some regulators can perform both roles, for example, acting as a repressor of one promoter and an activator of another. The cI protein of phage λ (Ptashne, 1992) regulates its own synthesis both as an activator and as a repressor.

Another example of a bifunctional regulator is the GalR protein, which controls the gal operon of the bacterium E. coli (Choy et al., 1997). The biochemical mechanisms by which such regulators act are discussed below.


Regulators may Perform other Cellular 
Functions 


The bio operon encodes a set of enzymes of biotin 
biosynthesis in E. coli. The regulatory protein of this 
system is BirA, a protein that also acts as an enzyme for the synthesis of its cognate corepressor, biotinyl-5'-adenylate. The BirA-biotinyl-5'-adenylate complex either binds to an operator site in the bio operon and represses transcription, or transfers the biotinyl moiety to the apoenzyme of acetyl-CoA carboxylase.


DNA Binding Properties 


DNA-protein interactions play key roles in the function of regulatory proteins. These are sequence-specific DNA-binding proteins, recognizing a sequence of 8-20 bp against a background of millions of base pairs (Travers, 1993). The target sites in DNA are located at strategically important places near promoters. Sequence specificity serves three purposes in transcriptional regulation: (1) it guides the protein to its area of performance; (2) it increases the local concentration of the protein, permitting further protein-protein and/or DNA-protein interactions; and (3) it brings about any required structural changes in the interacting DNA and/or protein. A protein-induced distortion of DNA may be required to modulate transcription initiation from a promoter, or a DNA-induced conformational change in the regulatory protein may enable it to interact with RNA polymerase. There seems to be no simple code governing sequence-specific amino acid-base interactions. Instead, different structural motifs in proteins recognize specific DNA sequences. The two most commonly found structures among prokaryotic regulatory proteins are the helix-turn-helix motif and the β-fold.
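A short calculation shows why recognition sites in the 8-20 bp range matter. The sketch below counts how often one specific site would occur by chance. It assumes random DNA with independent positions and equal base frequencies, and it uses a genome length of about 4.6 million bp, roughly that of the E. coli chromosome. Both assumptions are illustrative and do not come from the entry.

```python
def expected_chance_matches(site_len: int, genome_len: int) -> float:
    """Expected number of exact matches to one specific site in random DNA.

    Assumes independent positions with equal base frequencies (p = 1/4 per base)
    and counts both strands, so a palindromic site is counted twice.
    """
    positions = genome_len - site_len + 1
    return 2 * positions * 0.25 ** site_len


if __name__ == "__main__":
    GENOME_LEN = 4_600_000  # assumption: roughly the size of the E. coli chromosome
    for n in (8, 12, 16, 20):
        print(f"{n:2d} bp site: ~{expected_chance_matches(n, GENOME_LEN):.3g} chance matches")
```

Under these assumptions, an 8 bp site occurs by chance on the order of a hundred times, whereas a 16 bp site is expected far less than once. This is consistent with the 16-20 bp dyad-symmetric sites described next being effectively unique.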


Helix-Turn-Helix Motif 

This is the most common DNA-binding motif in prokaryotes. The proteins bind as dimers: unique 16-20-bp stretches of DNA with dyad symmetry bind the protein dimers via two symmetrically spaced helix-turn-helix motifs. Each motif comprises two α-helices connected by a β-turn, with an interhelical angle of about 90°.
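Dyad symmetry means that a site reads the same on both strands: the sequence equals its own reverse complement. The small sketch below scores this property. The test sequences are examples only: GAATTC is a familiar palindrome (the EcoRI restriction site), and TGTGAGCGCTCACA is a hypothetical 14 bp inverted repeat, not a documented operator.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence written 5'->3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def dyad_symmetry(seq: str) -> float:
    """Fraction of positions where seq agrees with its own reverse complement.

    1.0 means a perfect inverted repeat. For odd-length sites the central base
    can never match its own complement, so the maximum is (n - 1) / n.
    """
    seq = seq.upper()
    rc = reverse_complement(seq)
    return sum(a == b for a, b in zip(seq, rc)) / len(seq)


if __name__ == "__main__":
    for site in ("GAATTC", "TGTGAGCGCTCACA", "GAAATTC", "AAAAAA"):
        print(site, round(dyad_symmetry(site), 3))
```

Real operators are often only approximately symmetric, so a score below 1.0 does not rule out binding by a dimer.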


β-Fold Motif

Another DNA-binding motif found among prokaryotic regulatory proteins is the β-fold. A two-stranded antiparallel β-sheet in each subunit of a protein dimer recognizes the major groove of DNA in a sequence-specific way, within one symmetric half of the corresponding DNA site. Each site is about 8 bp long. The side chains of each β-fold interact with the base edges within the major groove.


Allosteric Modification 


As summarized in Table I, allosteric effectors modulate the DNA-binding activity of several regulatory proteins. These proteins usually have two domains connected by a flexible polypeptide hinge in each subunit of the dimer. One domain binds to DNA through the helix-turn-helix or β-fold mode; the other domain binds to the effector. Ligand binding to one domain changes the polypeptide conformation, and this change is transmitted to the other domain through the hinge. The effector-induced change either sets up the DNA-binding motif for correct major groove contact or prevents such recognition.
Interaction of Regulators with RNA Polymerase


The most common way in which a regulator influences the behavior of RNA polymerase is by direct contact between the two proteins in the DNA-bound form (Ishihama, 1993). In eukaryotic systems, however, activators usually communicate with RNA polymerase from distant sites on DNA (enhancers), indirectly through adapter proteins that interact simultaneously with activators and RNA polymerase (see below). The interaction, direct or indirect, forms the critical complex whose temporal nature and stability during the initiation reactions determine which step the regulator influences, as well as the outcome: activation or repression (see below). Two well-studied examples of regulator-RNA polymerase interaction have shown that regulators act by contacting different subunits of the multisubunit enzyme RNA polymerase.


CRP-α Interaction

The cAMP.CRP complex, binding to a DNA site located upstream of the transcription start site, activates transcription from the lac promoter by making a direct contact with RNA polymerase (Busby and Ebright, 1999). RNA polymerase binds poorly to the lac promoter, and the cAMP.CRP-RNA polymerase contact helps it to bind. The contact is made through a patch on the CRP subunit consisting of a 9-amino-acid stretch of the C-terminal domain of CRP. The RNA polymerase counterpart is a segment in the α subunit that is close to another segment of α that binds to DNA.


λcI-σ Interaction

Activation of transcription initiation from the PRM promoter by the cI protein of phage λ, described above, requires cI binding to a site known as OR2 (Ptashne, 1992). cI acts by making a direct contact with RNA polymerase bound to PRM and stimulates the isomerization of the closed to the open complex of RNA polymerase (see below). Two negatively charged amino acid residues (an acidic patch) near the DNA-binding helix-turn-helix motif of cI play a key role in activation, from which it has been proposed that they contact RNA polymerase. Mutant RNA polymerases that compensate for the activation defect of such cI mutants are altered in the σ subunit, suggesting that cI interacts with σ (Li et al., 1994).


Mechanistic Details of the Action of 
Regulators 


The minimal biochemical steps of RNA polymerase binding to a promoter leading to transcription initiation are shown in Figure 5.

$$R + P \overset{\text{binding}}{\rightleftharpoons} [R\cdot P]_c \overset{\text{isomerization}}{\rightleftharpoons} [R\cdot P]_o \overset{\text{initiation}}{\rightleftharpoons} [R\cdot P]_i \overset{\text{clearance}}{\longrightarrow} [R\cdot P]_e$$

Figure 5 Steps of transcription initiation. RNA polymerase, R, binds to the promoter, P, in a competitor-sensitive closed complex, $[R\cdot P]_c$, which then isomerizes to a competitor-insensitive open complex, $[R\cdot P]_o$, in which the DNA is partially unwound. RNA polymerase then initiates transcription as an initiation complex, $[R\cdot P]_i$, usually making short RNA oligomers before clearing the promoter as an elongating complex, $[R\cdot P]_e$.

In principle, any of
the steps can be regulated. For example, a rate-limiting 
step can be enhanced by an activator or inhibited by a 
repressor. These properties are best explained by comparing the action of regulators to that of enzymes, an analogy that suggests a mechanism of repressor and activator action based on thermodynamic considerations (Roy et al., 1998).
In this model, when a regulator binds to a DNA site, 
it makes a direct contact with RNA polymerase and 
modulates the activity of the latter. The protein- 
protein contact(s) enhances or inhibits RNA poly- 
merase activity by changing the energetics, i.e., by 
differentially stabilizing one or more of the inter- 
mediates (including transition states) of the transcrip- 
tion initiation reactions. Depending on the DNA 
sequence of the promoter and the architecture of the 
regulator-RNA polymerase-DNA complex, changes 
in RNA polymerase conformation during the steps 
may facilitate differential contacts favoring activation 
or repression. 
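The idea that a regulator acts by differentially stabilizing intermediates and transition states can be made concrete with standard transition-state theory. This is a generic sketch and not the specific model of Roy et al. It assumes that the pre-exponential factor is unchanged and that the contact energies below are hypothetical. A rate constant then changes by exp(−ΔΔG‡/RT), where ΔΔG‡ is the change in the step's activation free energy.

```python
import math

R_KCAL = 1.987e-3  # gas constant in kcal/(mol*K)


def rate_fold_change(ddg_transition: float, ddg_start: float,
                     temp_k: float = 310.0) -> float:
    """Fold change in the rate constant of one initiation step.

    ddg_transition: extra stabilization (kcal/mol) of the step's transition state
    ddg_start: extra stabilization (kcal/mol) of the intermediate the step starts from
    Positive values mean the regulator-polymerase contact makes that state more stable.
    """
    barrier_change = ddg_start - ddg_transition  # change in activation free energy
    return math.exp(-barrier_change / (R_KCAL * temp_k))


if __name__ == "__main__":
    # Hypothetical contact energies for the closed-to-open isomerization step
    print(rate_fold_change(1.4, 0.0))  # transition state favored: activation
    print(rate_fold_change(0.0, 1.4))  # closed complex trapped: repression
```

At 37 °C, favoring the isomerization transition state by 1.4 kcal/mol speeds that step about tenfold. Stabilizing the closed complex by the same amount slows it about tenfold. The same kind of contact can therefore activate or repress, depending on which state it stabilizes.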


Complex Regulatory Structures: DNA 
Loops 


Enhanceosome 

Regulator binding sites on DNA, although usually 
located very close to the RNA polymerase binding 
site in prokaryotic systems, may sometimes be far 
from the promoter (North et al., 1993). NtrC is an 
activator protein that enhances transcription from 
the promoter of the glnA gene. The binding site of 
NtrC is located kilobase pairs upstream of a promoter (Figure 6A) that is recognized by RNA polymerase containing an alternative σ factor, σ54. NtrC functions by enhancing the isomerization of the closed to the open complex of σ54-RNA polymerase at the glnA promoter. NtrC binds to the enhancer as an apoactivator; the apoactivator is phosphorylated by a specific protein kinase, called NtrB, to become an activator. The enhancer-bound phosphorylated NtrC physically contacts the promoter-bound σ54-RNA polymerase by looping out the intervening DNA sequence. Such a loop-containing DNA-multiprotein complex is called an enhanceosome. In eukaryotic enhanceosomes, an enhancer-bound activator frequently needs one or more adapter proteins to make indirect contact with the RNA polymerase assembled at the promoter.

Figure 6 DNA looping. (A) Enhanceosome: transcription activation at the E. coli glnA promoter by the NtrC activator. NtrC bound to the activation site located kilobase pairs away contacts RNA polymerase bound to the promoter. In the process, the intervening DNA is looped out. In similar cases in other organisms, a DNA-bending protein (Be) binding to the intervening DNA segment may be needed to facilitate DNA looping. (B) Repressosome: transcription repression of the P2 promoter of the gal operon of E. coli. GalR dimers bind to two operators, OE and OI. The two DNA-bound GalR dimers interact to form a DNA loop containing the promoter DNA segment. DNA looping is facilitated by binding of the protein HU in the middle of the DNA segment. Looping prevents transcription from P2.


Repressosome 

In negative control, a repressor may need multipar- 
tite operator sites. For example, repression at the P2 
promoter of the E. coli gal operon requires GalR 
binding to two operators, OE and OI (Figure 6B).


The operators are separated by 113 bp and span P2. 
When GalR binds to both operators, the two GalR 
dimers associate to form a DNA loop of the interven- 
ing DNA segment (Geanacopoulos et al., 1999). Loop 
formation by GalR requires an additional cofactor. 
DNA looping results from three-way cooperative binding of two GalR dimers to their corresponding operators and of the cofactor HU, a bacterial histone-like protein, to a specific region of the DNA between OE and OI. HU does not bind without
GalR binding to both operators, and there is no inter- 
action between the two operator-bound GalR without 
HU. The multiprotein complex containing a DNA loop, which brings about repression, is termed a repressosome. Repressosomes prevent RNA polymerase from forming an open complex at P2. Thus, repressosome formation renders the gal P2 promoter incompetent for transcription initiation.


Epilog 


If one wades through the massive literature on the structure and function of gene regulators and the mechanisms by which they achieve their goals, only a few of which are mentioned here, one cannot escape the observation that natural selection has evolved a remarkable class of proteins to regulate transcription initiation. These proteins respond to changes in environmental signals by changing their behavior, thereby contributing to critical biological phenomena such as cellular adaptation, differentiation, and development. Yet their seemingly distinct actions (gene activation and repression at different steps of transcription initiation) may be explained by the basic rules of enzymology.
Noncoding RNAs (ncRNAs) comprise a class of
genes which function directly as RNAs rather than 
coding for protein products. The best-known ncRNAs 
are parts of the basic machinery of the cell: ribosomal 
RNAs (rRNAs) and transfer RNAs (tRNAs) that act 
in the translation of mRNAs, small nuclear RNAs 
(snRNAs) of the spliceosome, and small nucleolar 
RNAs (snoRNAs) that direct the chemical modifica- 
tion of RNAs. However, ncRNAs play roles in an 
astonishingly broad range of biological processes, 
such as the control of DNA replication in bacteria, 
X-chromosome dosage compensation in mammals 
(Xist), translocation of proteins across the endoplas- 
mic reticulum (signal recognition particle RNA), and 
the targeting of aberrant protein products from trun- 
cated mRNAs for degradation in bacteria (10S RNA 
or tmRNA). A subset of the ncRNAs are regulatory
RNAs (‘riboregulators’), which regulate the expres- 
sion of specific genes or sets of genes. 

Regulatory RNAs typically have extended anti- 
sense complementarity to their targets and, therefore, 
regulation occurs at a posttranscriptional level by 
RNA-RNA interactions. Notable exceptions are a 
few ncRNAs involved in imprinting, where we cur- 
rently have too little information to be certain of their 
roles in altering gene expression. Here we focus on 
two examples of posttranscriptional regulation by 
ncRNAs in bacteria, where the mechanisms of regula- 
tion have been studied extensively, as well as one 
example from eukaryotes that describes novel regula- 
tory RNAs in the nematode Caenorhabditis elegans. 

In the simplest example of RNA-RNA regulation, the antisense regulatory RNA is transcribed from the same locus as the sense RNA, but in the opposite direction and overlapping the region of sense transcription. The expression of the transposase gene for mobilization of the bacterial Tn10 transposon is controlled by the 70-nucleotide (nt) IS10 antisense
RNA, RNA-OUT, which is complementary to the 5’ 
end of the RNA-IN transposase mRNA across a 35-nt 
region. Pairing of RNA-OUT to RNA-IN prevents 
translation of RNA-IN by physically blocking the 
ribosome binding site but also destabilizes the mRNA 
by exposing the unpaired 3’ end to ribonucleases. 
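Antisense regulation of the RNA-OUT/RNA-IN kind depends on a stretch of antiparallel base-pairing between two RNAs. The following sketch finds the longest perfectly paired stretch using standard longest-common-substring dynamic programming against the reversed target. It counts only Watson-Crick pairs and ignores G·U wobble, bulges, and loops, which matter for imperfect duplexes such as those formed by lin-4 and let-7. The test sequences are hypothetical, not the real IS10 RNAs.

```python
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}  # Watson-Crick pairs only


def longest_antisense_stretch(srna: str, target: str) -> tuple[int, int, int]:
    """Longest perfectly paired stretch when srna (5'->3') pairs antiparallel
    with target (5'->3').

    Returns (length, start index in srna, start index in target).
    """
    rev = target[::-1]  # read the target 3'->5' so paired bases line up by index
    best = (0, 0, 0)
    prev = [0] * (len(rev) + 1)
    for i in range(1, len(srna) + 1):
        cur = [0] * (len(rev) + 1)
        for j in range(1, len(rev) + 1):
            if (srna[i - 1], rev[j - 1]) in PAIRS:
                cur[j] = prev[j - 1] + 1
                if cur[j] > best[0]:
                    n = cur[j]
                    # srna[i-n:i] pairs with rev[j-n:j], i.e. target[len-j : len-j+n]
                    best = (n, i - n, len(target) - j)
        prev = cur
    return best


if __name__ == "__main__":
    target = "GGAUCCAUGGCUAAGCUU"  # hypothetical mRNA 5' region
    srna = "AAUUAGCCAUGGCC"        # hypothetical antisense RNA
    print(longest_antisense_stretch(srna, target))
```

For these hypothetical RNAs, the function reports a 10-nt duplex starting at position 2 of the antisense RNA and position 4 of the target.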

RNAs with more complex regulatory roles have 
been found in prokaryotes. For example, the regula- 
tory RNA can be produced from a distinct genetic 
locus and interact with one or more targets. The 87- 
nt DsrA RNA coordinates the expression of many 
E. coli stress-response genes by exerting opposing 
effects on the expression of two of their regulators; it 
antagonizes the repressor hns but enhances the positive regulator rpoS. DsrA RNA negatively regulates the translation of the hns mRNA by base-pairing to the 5’ end of the mRNA just downstream of the AUG start codon. The rpoS mRNA itself forms a stable secondary structure in which 5’ UTR cis-sequences fold back to form a stem loop at the AUG, inhibiting RpoS translation. DsrA RNA positively regulates rpoS by interfering with this intrastrand base-pairing: it binds to the 5’ UTR of the rpoS mRNA, freeing the ribosome-binding sequences for the translational machinery.

While the majority of regulatory RNAs have been 
identified in prokaryotes and viruses, an increasing 
number are found in eukaryotes. C. elegans has two 
regulatory RNAs that act in the same genetic path- 
way to regulate developmental progression. lin-4 and 
let-7 encode the smallest regulatory RNAs known to 
date, 22 nt and 21 nt, and both are unlinked to their 
target genes. The expression of these small RNAs is 


temporally regulated to trigger the downregulation of 
target genes and allow progression to the next develop- 
mental stage. Little is known about their mechanism of 
action, but they act posttranscriptionally as negative 
regulators via physical interactions with complemen- 
tary sequences in the 3’ UTR of their mRNA targets. 
For example, the lin-41 mRNA has two sites for let-7 
binding that have slightly different sequences and are 
predicted to form different duplexed structures. The 
RNA-RNA duplexes formed are imperfect, with 
loops and bulges that could be binding sites for proteins involved in the downregulation of mRNA translation. The let-7 RNA is conserved and temporally regulated in metazoans as different as Drosophila
and humans, suggesting that such small regulatory 
RNAs may have universal roles in control of develop- 
mental timing. 

The emergence of complete genome sequences 
should accelerate the discovery of regulatory RNAs. 
Because many of the known ncRNAs are conserved 
across animal or microbial phylogeny, a comparison 
of genome sequences for other conserved segments 
20-100 bases long, followed by experimental detection 
of the corresponding RNA, could reveal many more 
examples of the regulatory RNA world. 
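The first step of the proposed genome comparison, finding conserved segments 20-100 bases long, can be sketched with exact shared k-mers. This is a deliberately naive illustration. Real comparative searches tolerate mismatches, examine both strands, and then need experimental confirmation that an RNA is made. The function name, the k-mer length, and the test sequences are all assumptions chosen for the example.

```python
def conserved_segments(a: str, b: str, k: int = 20,
                       min_len: int = 20, max_len: int = 100) -> list[tuple[int, int]]:
    """Spans of genome `a` covered by runs of overlapping k-mers that also occur in `b`.

    Returns half-open (start, end) coordinates in `a` whose length lies in
    [min_len, max_len]. Only the forward strand of `b` is searched, and a merged
    run is not checked to be one contiguous match in `b`.
    """
    b_kmers = {b[i:i + k] for i in range(len(b) - k + 1)}
    runs: list[tuple[int, int]] = []
    start = end = -1
    for i in range(len(a) - k + 1):
        if a[i:i + k] in b_kmers:
            if start >= 0 and i <= end:   # overlaps the current run: extend it
                end = i + k
            else:                         # begin a new run
                if start >= 0:
                    runs.append((start, end))
                start, end = i, i + k
    if start >= 0:
        runs.append((start, end))
    return [(s, e) for s, e in runs if min_len <= e - s <= max_len]


if __name__ == "__main__":
    shared = "GCTTGCGTCTGGCTTCGTGCTCTGGTCGTG"  # hypothetical 30-base conserved segment
    genome_a = "A" * 40 + shared + "A" * 40
    genome_b = "C" * 40 + shared + "C" * 40
    print(conserved_segments(genome_a, genome_b))
```

On this synthetic pair, the only reported span is the 30-base shared segment at positions 40-70 of the first sequence.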


See also: Nutritional Mutations; Regulatory 
Genes 
The proto-oncogene Rel was identified as the cellular homolog of v-Rel, the oncogene of the reticuloendotheliosis virus (REV), which causes lymphomas in chickens. Rel is the p65 subunit of the p50-p65 heterodimer of nuclear factor-κB (NF-κB), a family of transcription factors. The p50-p65 heterodimer is located in the cytoplasm in association with an inhibitory protein, IκB. In response to a plethora of external signals such as growth factors, lymphokines, cytokines, and stress, a family of kinases (IKK) is activated, which in turn phosphorylates IκB at specific sites. The phosphorylated form of IκB is ubiquitinated and degraded by proteasomes, leading to translocation of p50-p65 heterodimers to the nucleus, where they bind to a consensus decameric site (5'-GGGRNYYCC-3') present in a large number of genes involved in the immune system, apoptosis, growth, differentiation, and development. The C-terminus of p65 (RelA) provides the transactivation domain of p50-p65 heterodimers. In addition to associating with p50 and p52, p65 can also form homodimers. Three-dimensional structures of p50-p65 heterodimers, p65 homodimers, and the p50-p65-IκB complex reveal that IκB masks the nuclear translocation signal of the p50-p65 heterodimer. Mice lacking p65 (RelA) die prenatally at about day 14.5 owing to extensive liver apoptosis.

Mice lacking both p65 (RelA) and the tumor necrosis factor-α receptor (TNFαR) are born, suggesting that p65 (RelA) is essential to prevent TNFα-induced liver apoptosis. c-Rel (p65/RelA) has a homolog in Drosophila called Dorsal, which is essential for dorsal-ventral morphogenesis of the embryo.


See also: Cancer Susceptibility; Proto-Oncogene 


Release (Termination) 
Factors 


Copyright © 2001 Academic Press 
doi: 10.1006/rwgn.2001.1992 


Release (termination) factors respond to termination 
codons to bring about release of the completed poly- 
peptide chain and the ribosome from mRNA. 


See also: Protein Synthesis 
Pathology, Classification and Clinical
Behavior of RCC 


Renal cell cancer (RCC) constitutes a group of tumors 
that is highly heterogeneous with respect to morph- 
ology and clinical behavior. RCC is the most common 
malignant tumor arising in the kidney and accounts 
for 2% of all new cancers diagnosed world-wide. RCC 
affects males twice as often as females and shows a 
peak in the sixth decade. There is no clear geographical
or ethnic link. An increased incidence of RCC has been 
associated with end-stage renal disease or with acquired 
cystic kidney disease and also environmental factors. 
RCC tumors are often quite large at detection and have frequently already metastasized. At present there is no effective therapy for metastatic RCC, and patients with unresectable disease have a poor prognosis.

Clinicopathologically, RCC consists of a number 
of histologically defined entities which may be either 
hereditary or nonhereditary. In the past it was believed that certain clear cell epithelial renal tumors were derived from ectopic adrenocortical elements, as proposed by Virchow and advocated by Grawitz, which led to the term ‘hypernephroma’ or Grawitz tumor. Today,
there is evidence from animal experiments that the 
usual (nonembryonic) RCC in all its variants derives, 
in principle, from the mature uriniferous tubule. 

Currently several morphological classifications are 
used: WHO/AFIP; modified Mainz classification; and 
the Heidelberg classification. As stated in the latter 
two, eight different subtypes of RCC can be distin- 
guished, relating to the basic cell types of the nephron 
from which they are derived and in keeping with the 
genetic facts as presently understood: (1) metanephric 
adenoma and metanephric adenofibroma; (2) papillary 
adenoma; (3) renal oncocytoma; (4) common or con- 
ventional (clear cell) renal carcinoma; (5) papillary 
(formerly chromophilic or tubulopapillary) renal 
carcinoma; (6) chromophobe renal carcinoma; (7) col- 
lecting duct carcinoma; and (8) renal cell carcinoma 
(unclassified). The first three subtypes are benign 
parenchymal neoplasms and subtypes (4) to (7) are 
malignant. These subtypes show phenotypic/histogenetic relationships to different parts or cell types, respectively, of the nephron and collecting duct system.
Cytogenetic and molecular genetic studies allow the 
classification of tumors with respect to their genotypic 
differences. 


Genetic Classification of RCC 


The most frequently occurring RCC is common RCC, characterized by loss of (part of) the short arm of chromosome 3 due to a deletion or unbalanced translocation, which is restricted to this subtype. Regions frequently lost are 3p12-14, 3p21, and 3p25. The VHL (von Hippel-Lindau syndrome) gene, assigned
to 3p25, seems to play a role in the development of 
sporadic RCC, probably in combination with other 
gene(s) like the fragile histidine triad (FHIT) gene, 
assigned to 3p14, or a candidate gene, nonpapillary 
renal cell carcinoma 1 (NRC-1), mapped within 
3p12. In the dominantly inherited von Hippel-Lindau 
(VHL) cancer syndrome, the VHL gene is mutated in 
the germline and in renal cell tumors of affected family members. Loss of at least two of the regions mentioned above is necessary for kidney cells to develop into common-type renal cell carcinoma, and loss of 3p21 is obligatory. Therefore, if a tumor shows only one deletion at 3p, either at 3p14 or at 3p25, it should be designated a common-type renal cell adenoma. Other aberrations frequently found in common RCC are (partial)
trisomy of chromosome 5, especially the 5q22-qter 
segment, as well as trisomy 12 and 20, and loss of 
chromosomes 8, 9, 13q, 14q, and structural abnormal- 
ities of the long arm of chromosomes 6 and 10. 
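The 3p rule stated above can be expressed as a small decision function. This is a minimal sketch for illustration only, not a diagnostic tool. The function name and band labels are hypothetical: the 3p12-14 region is represented by the label "3p14", following the adenoma criterion, and every other aberration is ignored.

```python
# Hypothetical illustration of the 3p-loss criterion described in the text.
# Band labels are assumptions: "3p14" stands for the 3p12-14 region.
REGIONS = frozenset({"3p14", "3p21", "3p25"})


def classify_common_type(lost_regions: set[str]) -> str:
    """Apply the 3p-deletion rule for common (clear cell) renal tumors."""
    lost = REGIONS & set(lost_regions)
    # Carcinoma: at least two of the three regions lost, 3p21 obligatory.
    if len(lost) >= 2 and "3p21" in lost:
        return "common type renal cell carcinoma"
    # Adenoma: a single 3p deletion, at either 3p14 or 3p25.
    if lost in ({"3p14"}, {"3p25"}):
        return "common type renal cell adenoma"
    # Any other combination is not covered by the rule as stated.
    return "not classified by this rule"


print(classify_common_type({"3p21", "3p25"}))  # common type renal cell carcinoma
print(classify_common_type({"3p25"}))          # common type renal cell adenoma
```

Consider a tumor that has lost both 3p14 and 3p25 but retains 3p21. It falls outside both categories, which is exactly the case the obligatory-3p21 condition excludes.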

Most papillary renal adenomas and carcinomas are 
characterized by a unique combination of autosomal 
trisomies with trisomy 17. Papillary adenomas
specifically show a -Y, +7, +17 chromosomal pattern as
well as trisomy 3 or gain of the long arm of chromo- 
some 3, probably reflecting malignant transformation. 
Trisomy of chromosomes 12, 16, and 20 as well as loss 
of the extra copy of chromosome 17 or loss of 17p are 
associated with progression from the adenoma into 
the carcinoma stage, i.e., papillary renal cell carci- 
nomas. The p53 gene most likely does not play an 
important role, since no mutations of p53 have been 
observed in this subtype. Microsatellite analysis
revealed allelic duplications, among others at 20q11.2
and 20q13.2, suggesting new tumor genes in papillary
renal carcinoma. The MET proto-oncogene, assigned
to 7q31, encodes the receptor for hepatocyte growth
factor/scatter factor, which is implicated in cell
proliferation and invasiveness. Both germline and
somatic MET mutations have been found in papillary
renal tumors. Cytogenetically, no differences are
observed between hereditary tumors (usually presenting
as multiple or bilateral tumors and, especially in
familial cases, characterized by an early age of onset)
and sporadic papillary tumors. In addition, the high
incidence of loss of the Y chromosome, combined with
the strong predominance in males, suggests that loss of
specific sequences harbored on the Y chromosome is
probably important for developing this subtype.

A small subset of papillary RCC is characterized by 
X;autosome translocations. The t(X;1)(p11.2;q21), 
resulting in a fusion of the transcription factor TFE3 
on the X chromosome with a novel gene, designated 
PRCC, on chromosome 1, appears to be a specific 
primary anomaly characterizing a distinct subgroup 
of papillary RCC with common RCC-like features 
such as clear cytoplasm. These tumors occur prefer- 
entially in young (male) adults and children, although 
female cases have been described recently. 

Metanephric adenoma or adenofibroma shows gain 
of chromosomes 7 and 17 with Y chromosome loss 
suggesting a relationship with papillary renal cell 
adenomas and carcinomas. 


In renal oncocytoma several genetic subsets can be 
distinguished: one with mixed populations of normal 
and abnormal karyotypes with no cytogenetic simi- 
larity found as yet; a group defined by (variant) trans- 
locations involving 11q13; and one with specifically 
defined numerical anomalies, in particular loss of 
chromosomes 1 and Y/X. The finding of mitochon- 
drial DNA changes and the loss of Y/X in both renal 
oncocytoma and chromophobe carcinoma might indi- 
cate progression from renal oncocytoma to chromo- 
phobe renal cell carcinomas through additional 
chromosome losses, also explaining the occasionally 
malignant behavior of renal oncocytomas. Chromo- 
phobe renal carcinomas show multiple losses of entire 
chromosomes, i.e., loss of chromosomes 1, 2, 6, 10, 13, 
17, 21, and the Y or X chromosome, leading to a low 
chromosome number. No consistent chromosomal
abnormalities have yet been found in collecting duct
carcinomas. Involvement of the short arm of chromosome
8, loss of the long arm of chromosome 13, and loss of
part of the long arm of chromosome 1 (1q32) are
probably related to their poor prognosis.
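The phrase "a low chromosome number" follows from simple counting. The sketch below assumes a normal 46,XY starting karyotype and treats each listed loss or gain as one whole chromosome copy. The helper name is hypothetical.

```python
from collections.abc import Sequence


def chromosome_count(losses: Sequence[str], gains: Sequence[str] = (),
                     start: int = 46) -> int:
    """Count chromosomes after whole-copy losses and gains (illustrative)."""
    return start - len(losses) + len(gains)


# Chromophobe carcinoma: losses of 1, 2, 6, 10, 13, 17, 21 and Y (from the text).
chromophobe = chromosome_count(["1", "2", "6", "10", "13", "17", "21", "Y"])
# Papillary adenoma pattern -Y, +7, +17 (described earlier in the text).
papillary = chromosome_count(["Y"], gains=["7", "17"])
print(chromophobe, papillary)  # 38 47
```

The contrast is the point of the example. Eight whole-chromosome losses leave a hypodiploid cell with 38 chromosomes, while the papillary pattern adds one net chromosome.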

Sarcomatoid transformation in RCC represents 
the highest form of dedifferentiation and can in prin- 
ciple be derived from all the basic cell types. Cyto- 
genetic data on sarcomatoid RCC are scarce: some 
show structural abnormalities of chromosomes 1, 5, 
16, and 19 and losses of 3p, 4(q), 6q, 8p, 9, 13, 14, 
and 17p, and gain of 5, 12, and 20, as well as p53 
mutations. 

There is increasing evidence to suggest the presence 
of clonal, mostly numerical, chromosomal changes in 
apparently normal kidney tissue from patients with a 
normal constitutional karyotype, for example, trisomy 
7, 5, 8, 10, 18, and loss of the Y chromosome. These 
changes are not an in vitro artifact and are independent
of the length of time of cell culture. The presence of 
clonal and nonclonal aberrations in apparently normal 
kidney tissue merely indicates a chromosome instabil- 
ity pattern or mosaicism, and this condition should 
not be considered as strictly neoplastic. 


Summary 


In conclusion, different subtypes of RCC might 
originate from cells of different parts of the renal 
tubule. Taken together, cytogenetic and molecular
genetic studies of recent years have demonstrated 
that certain specific chromosomal abnormalities cor- 
relate with different histological subtypes of renal cell 
cancer and could have diagnostic and prognostic con- 
sequences. 


See also: Aneuploid; Cancer Susceptibility; 
Mosaicism in Humans; Translocation 
Renaturation is the reassociation of denatured
complementary single strands of DNA. 


See also: DNA Denaturation 
Correction of Errors and Damage in DNA


Maintaining the integrity of its genetic blueprint is of 
central importance for a living cell and the organism of 
which it is a part. To preserve the function of the 
genetic material within the cell and to ensure its accur- 
ate transmission to future generations, numerous 
mechanisms have evolved to repair errors and damage 
in DNA. As an example of the cellular resources 
devoted to this end, consider that of the 1709 proteins 
encoded by Haemophilus influenzae, the first bacter- 
ial genome to be sequenced, at least 45 function in 
DNA repair mechanisms. 

The complementary, double-stranded structure of 
DNA, a crucial feature that allows it to be readily 
replicated, also facilitates its repair. The objects of 
repair range from mismatched bases resulting from 
errors in DNA replication to base damage and even 
gross distortion of the DNA structure by physical and 
chemical agents. In a few instances the damage is 
simply reversed. Most repair mechanisms, however, 
first remove the damaged region together with a seg- 
ment of the DNA strand in which it occurred and then 
resynthesize that segment correctly using the comple- 
mentary strand as a template. Depending on the 
component initially removed or recognized, these 
mechanisms have been categorized as base excision, 
nucleotide (or oligonucleotide) excision, and mis- 
match repair. When the damage cannot be so simply 
repaired, a mechanism of recombinational repair, 
requiring interaction with another copy of the 
genome, may intervene. 


Types and Sources of Errors and Damage 


Two types of defect can arise in DNA. One type 
results from mistakes in replication to give the 
wrong base at a particular position in the DNA; the
other type results from molecular damage to DNA 
by physical or chemical agents. Both types of defect 
can block proper function of the DNA and lead to 
mutations after DNA replication. 


DNA Mismatches 
Base mismatches are noncomplementary pairings of 
bases in opposite strands. Unlike the complementary 
A:T and G:C base pairs, which fit precisely into the 
double helix and form two or three hydrogen bonds, 
the eight mismatched pairs A|C, G|T, A|G, T|C, A|A,
T|T, G|G, and C|C cannot do so. Mismatches may
also correspond to deletions or additions of one or
more bases in a strand to give single-strand loops of
various sizes. Figure 1 illustrates a normal G:C base
pair (Figure 1A) and a G|T mismatch (Figure 1B).
Base mismatches result from incorporation of the 
wrong nucleotide by the enzyme replicating the 
DNA. All enzymatic reactions are subject to error, 
and despite the high specificity of DNA polymerases 
and editing processes during synthesis some mis- 
matches persist. Furthermore, slippage of the poly- 
merase on the parental strand template gives rise to 
additions or deletions of stretches of nucleotides in 
the nascent strand. Genetic recombination usually 
involves hybrid DNA formation, where complemen- 
tary strands come from different individuals; genetic 
differences between them will appear as mismatches 
in the heteroduplex DNA. Mismatches can arise 
also from base damage; for example, deamination of 
a naturally occurring 5-methylcytosine opposite 
guanine gives a G|T mismatch.
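The number eight follows from counting. Four bases give ten unordered pairings, and only A:T and G:C are complementary. The minimal sketch below enumerates the remaining pairs using the G|T notation of Figure 1. The function name is illustrative.

```python
from itertools import combinations_with_replacement

BASES = "ACGT"
# Watson-Crick pairs, stored as unordered sets of bases.
COMPLEMENTARY = {frozenset("AT"), frozenset("CG")}


def mismatches() -> list[str]:
    """All unordered base pairings that are not Watson-Crick pairs."""
    return [f"{x}|{y}"
            for x, y in combinations_with_replacement(BASES, 2)
            if frozenset((x, y)) not in COMPLEMENTARY]


print(mismatches())
# ['A|A', 'A|C', 'A|G', 'C|C', 'C|T', 'G|G', 'G|T', 'T|T']
```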


Base Damage 

One of the most common forms of base damage is the 
spontaneous hydrolysis of the amino group on C4 of 
cytosine to give uracil, which forms base pairs like 
thymine. Adenine and guanine are also subject to 
deamination. Alkylated bases resulting, for example,
from abnormal methylation at N3 or N7 in adenine
and N3 or O6 in guanine may alter base-pairing
properties (Figure 1C). Oxidation of bases leads to
products such as thymine glycol and 8-oxoguanine,
among others (Figure 1D). Hydrolytic removal of a
base is an extreme form of base damage. Spontaneous
loss of bases by hydrolysis at physiological
temperature and pH is rare (roughly one base in a
million per day), but it is significant (tens of
thousands of bases in a human cell per day).
Purine bases are more readily lost than pyrimidines. 
Spontaneous deamination of cytosine occurs with a 
frequency similar to depurination. Base damage can 
be caused by metabolic products normally found 
in cells; for example, methyl groups may be
transferred adventitiously from S-adenosylmethionine,
the normal methyl donor. Oxidative metabolites such
as hydrogen peroxide can oxidize DNA bases.
Extraneous chemicals in the cellular environment also
cause damage. These include nitrosourea and paraquat,
which give rise to alkylating and oxidative agents,
respectively. Ionizing radiation delivers large amounts
of energy, and in the aqueous cellular milieu it
produces strong oxidizing agents that act on DNA
bases. Typical oxidation products are shown in
Figure 1D.


Bulky Adducts 

Ultraviolet light produces pyrimidine dimers in DNA 
with an action spectrum corresponding to the absorp- 
tion of DNA, which peaks at a wavelength of 260 nm. 
These photoproducts are mainly of two sorts, with
adjacent pyrimidines linked either by cyclobutane
rings between corresponding C5 and C6 atoms or by
a single C4-C6 linkage (Figure 1E). In both cases they
severely distort the DNA structure. Chemical
carcinogens such as N-acetoxy-2-acetylaminofluorene and
4-nitroquinoline-1-oxide form bulky base adducts. 
Some therapeutic drugs, such as cisplatin and psor- 
alen, react with more than one base to give intrastrand 
and interstrand crosslinks. 


Strand Breaks 

Phosphate groups in the DNA backbone can be alkyl- 
ated to form triesters, which render the original ester 
bonds more labile and lead to strand breaks. Single- 
strand breaks are commonly formed by physical 
agents, such as UV and X-rays, and they occur spon- 
taneously, especially at depurinated sites. Ionizing 
radiation produces many double-strand breaks. Such 
lesions are very deleterious, and a single unrepaired 
double-strand break can be lethal to a cell. 
Figure 1 Normal and damaged DNA bases. (A) Normal G:C nucleotide pair in DNA. Hydrogen bonds (dashes) form between complementary bases. Note opposite sense of deoxyribose-phosphate chains in complementary strands; half-arrows show 5’ to 3’ direction. In (B) to (E) only base residues in DNA are shown. (B) Mismatched G|T base pair. (C) Abnormally methylated derivatives of guanine and adenine. (D) Oxidatively damaged guanine and thymine residues. (E) UV-induced pyrimidine photoproducts.


Biological Effects of DNA Damage and Errors


A double-strand break ruptures the continuity of the 
genetic material. Truncating a gene may block tran- 
scription not only of that gene but also of downstream 
genes. DNA replication distal to the break is pre- 
vented, and the broken chromosome cannot be trans- 
ferred properly to daughter cells. Single-strand breaks 
and base damage do not impinge so strongly on gene 
expression and replication, but they can impede these 
processes. Bulky adducts, in particular, block tran- 
scription and DNA synthesis. Even when transcrip- 
tion and replication are not blocked, base damage, 
and particularly base mismatches, can lead to altered 
sequences in messenger RNA and daughter DNA 
strands. The altered RNA may give a nonfunctional 
product, and the altered DNA would produce a 
mutation, generally deleterious, in the progeny of 
that cell or organism. To avoid such mutations and 
loss of function, various repair mechanisms have 
evolved. 


Direct Reversal of Damage 


The simplest form of repair is reversal of the damage, 
and the simplest case to consider is ligation of a single- 
strand break. The two ends of the broken strand are 
held together by the complementary DNA strand, and 
if the only damage is rupture of the bond between the 
5’-phosphate of one nucleotide and the 3’-OH of its 
neighbor, DNA ligase, the enzyme that normally links 
such strand segments, restores the bond. All repair 
mechanisms require physical or chemical energy to 
effect the molecular conversion, and in this case the 
energy comes from ATP or another cofactor used by the
ligase. Mechanisms for direct reversal are limited in 
number, probably because they must be highly specif- 
ic and therefore are not generally useful. Two such 
mechanisms are described below. 


Photoreactivation 

In photoreactivation, cyclobutane pyrimidine dimers 
(Figure 1E) formed by UV light are reversed by light
of a longer wavelength. Since UV damage in nature 
requires exposure to sunlight, cells that are so exposed 
can make use of longer wavelength light as a source of 
energy for directly restoring dimers to monomeric 
form. The precise wavelength needed depends on 
the chromophore, or light-absorbing component of 
the photoreactivating enzyme, DNA photolyase. In 
Escherichia coli, the enzyme contains two
chromophores, a flavin (FAD) and a folate derivative,
but only the latter is used as a light antenna, and
photoreactivation peaks at a wavelength of 384 nm.
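The two wavelengths in this section can be compared by photon energy, E = hc/λ. The sketch uses the standard value hc ≈ 1239.84 eV·nm. It only shows that photoreactivating light delivers less energy per photon than the 260 nm UV light absorbed by DNA. It says nothing about the enzyme's efficiency.

```python
# h*c expressed in eV*nm (Planck constant times the speed of light).
HC_EV_NM = 1239.84198


def photon_energy_ev(wavelength_nm: float) -> float:
    """Energy of a single photon of the given wavelength, in electronvolts."""
    return HC_EV_NM / wavelength_nm


for nm in (260, 384):
    print(f"{nm} nm: {photon_energy_ev(nm):.2f} eV per photon")
# 260 nm: 4.77 eV per photon
# 384 nm: 3.23 eV per photon
```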


Repair Mechanisms 1663 


Alkyl Group Removal 

The alkylated base O6-methylguanine (Figure 1C) is
mutagenic because it can pair with either thymine or
cytosine during DNA synthesis. Two rounds of DNA
synthesis would lead to a G:C to A:T mutation.
Similarly, O4-methylthymine (Figure 1C) pairs with
either guanine or adenine to give a T:A to C:G
mutation. To offset the effect of such alkylation, cells
ranging from bacteria to mammals harbor a specific, but
costly, mechanism for transferring methyl groups
from the DNA base to an activated cysteine residue
in a protein called O6-methylguanine DNA
methyltransferase. This protein, however, is not a typical
enzyme, since it is consumed in the reaction. A shorter 
form of the protein (encoded by the gene ogt in E. coli) 
is found constitutively in many species of prokaryotes 
and eukaryotes. An extended form of this protein, 
which is present in bacteria, can remove alkyl groups 
from DNA phosphate, as well. The latter protein, of 
which only a few molecules are normally in the cell, is 
induced by small amounts of alkylation damage. 
Hence, the gene encoding it is called ada, for adaptive 
response. Such regulation is advantageous because it 
limits the unnecessary production of a protein that, 
when required, is used stoichiometrically in a profli- 
gate manner. 
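The two-round argument can be traced with a toy model of semiconservative replication at a single base pair. The model makes a hypothetical worst-case assumption: the methylated base always mispairs, O6-methylguanine with thymine and O4-methylthymine with guanine. As the text notes, either partner can actually be inserted.

```python
# Partner inserted opposite each template base. The two methylated entries
# are worst-case assumptions (these bases can also pair normally).
PARTNER = {"A": "T", "T": "A", "G": "C", "C": "G",
           "O6meG": "T",   # O6-methylguanine pairs with T, as A would
           "O4meT": "G"}   # O4-methylthymine pairs with G, as C would


def replicate(pair: tuple[str, str]) -> list[tuple[str, str]]:
    """One round of semiconservative replication of a (top, bottom) pair."""
    top, bottom = pair
    return [(top, PARTNER[top]), (PARTNER[bottom], bottom)]


def descendants(pair: tuple[str, str], rounds: int) -> list[tuple[str, str]]:
    """All base pairs present after the given number of replication rounds."""
    population = [pair]
    for _ in range(rounds):
        population = [child for p in population for child in replicate(p)]
    return population


print(descendants(("O6meG", "C"), 2))
# [('O6meG', 'T'), ('A', 'T'), ('G', 'C'), ('G', 'C')]
print(descendants(("O4meT", "A"), 2))
# [('O4meT', 'G'), ('C', 'G'), ('T', 'A'), ('T', 'A')]
```

After the first round, only the mispaired partner is present. After the second round, that partner has templated a fully normal A:T (or C:G) pair, which no longer carries any damage for repair to recognize.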


Indirect Reversal of Errors and Damage 


Most DNA repair mechanisms are more broadly 
applicable than the direct reversal of specific types of 
damage because they make use of DNA strand com- 
plementarity. The paradigm for these mechanisms 
consists of the following steps: (1) recognition of the 
error or damage; (2) excision of a strand segment con- 
taining the defect; and (3) resynthesis of the corrected 
segment by a DNA polymerase using the complemen- 
tary strand as template. Three categories of these 
mechanisms have been delineated. In ‘base excision’, 
a damaged base is first recognized and removed. In 
‘oligonucleotide excision’, a damaged strand is doubly 
incised, and the strand segment is removed. In ‘mis- 
match repair’, a strand segment containing the incor- 
rect component of the mismatch is removed. In the 
case of a mismatch, neither strand is damaged, so 
incision of the incorrect strand is indirectly deter- 
mined, as explained below. 


Base Excision 


General mechanism 

Damaged or abnormal bases are usually recognized 
by individual enzymes in a highly base- and damage-specific manner. (An exception is the alkA product
of E. coli, which removes both hypoxanthine and
3-methyladenine.) These enzymes are glycosylases
that detach the defective base from the deoxyribose 
sugar in the DNA backbone by hydrolysis to leave an 
abasic (AP) site (Figure 2). In some cases, an AP-lyase 
activity of the enzyme desaturates the abasic sugar and 
cleaves its 3’-phosphate link. However, in all cases an 
AP-endonuclease then breaks the phosphate bond 
attached to the 5’-C of the abasic sugar to leave an 
upstream 3’-OH end suitable for extension by a repair
polymerase. AP-endonucleases in E. coli are encoded 
by the xth and nfo genes. Both enzymes are also able 
to remove damaged sugars and other products exo- 
nucleolytically in the 3’ to 5’ direction. Homologs of 
one or the other of these AP-endonuclease genes are 
present in virtually all prokaryotes and eukaryotes. In 
bacteria, the 5’ to 3’ exonuclease activity of the repair 
enzyme DNA polymerase I may remove additional 
nucleotides, and the polymerase activity of the 
enzyme inserts the normal base as it resynthesizes 
the missing segment. In eukaryotes, a separate protein 
with 5’ to 3’ exonuclease activity interacts with DNA 
polymerase ε to carry out the repair replication.
Finally, the repaired strand is sealed by the action of 
DNA ligase. 
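The steps of base excision, illustrated for uracil in Figure 2, can be mimicked on strings. This is a toy sketch built on several assumptions. The two strands are aligned position by position, so template[j] pairs with strand[j]. The damaged strand reads 5’ to 3’ from left to right. The patch length of three nucleotides is arbitrary, and ligation is implicit. The function name is hypothetical.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def base_excision_repair(strand: str, template: str, damaged: str = "U",
                         patch: int = 3) -> str:
    """Toy BER: replace each damaged base using the aligned template strand."""
    bases = list(strand)
    for i, base in enumerate(bases):
        if base != damaged:
            continue
        # 1. DNA glycosylase removes the base, leaving an abasic (AP) site.
        bases[i] = None
        # 2-3. AP-endonuclease nicks at the AP site; a 5' to 3' exonuclease
        #      removes the abasic residue plus a few downstream nucleotides.
        end = min(i + patch, len(bases))
        for j in range(i + 1, end):
            bases[j] = None
        # 4. DNA polymerase resynthesizes the gap from the template strand.
        for j in range(i, end):
            bases[j] = COMPLEMENT[template[j]]
        # 5. DNA ligase would seal the nick; here the list is simply whole.
    return "".join(bases)


# C at position 2 deaminated to U; the template still carries G opposite it.
print(base_excision_repair("GAUTCA", "CTGAGT"))  # GACTCA
```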


Deaminated bases 

Conversion of cytosine to uracil is mutagenic. In addi- 
tion to deamination of cytosine, uracil occasionally 
enters DNA by erroneous incorporation of dUTP 
instead of dTTP during DNA synthesis. In both 
cases, uracil is removed by a DNA uracil glycosylase. 
Closely related proteins with this activity are found in 
nearly all species. Deamination of adenine converts 
it to hypoxanthine, which is excised by a different 
glycosylase. 


Mismatched Bases 

Escherichia coli contains a DNA glycosylase (MutY) 
that removes adenine residues from A|G (and to a
lesser extent from A|C) mismatches. This is of
particular interest since A|G is one of the few mismatches
that, depending on sequence context, is not always 
recognized by the generalized mismatch repair sys- 
tem described below. A DNA thymine glycosylase 
that acts on mismatches is found in human cells. It 
removes thymine from G|T mismatches. Thus, it can
reverse the effect of spontaneous deamination of 5- 
methylcytosine, which produces thymine (Figure 1). 
In the DNA of some eukaryotic phyla, many cytosine 
residues are normally methylated at the 5-position. 
Although the G|T mismatch is readily recognized by
the generalized mismatch repair system, as indicated 
below, that process can correct only newly synthe- 
sized DNA. This glycosylase, however, can repair 
the mismatch after deamination in preexisting DNA. 


Oxidation products 

Several DNA glycosylases remove products resulting
from oxidation of DNA bases. Because these products
also result from X-rays and other ionizing radiation,
the enzymes confer resistance to such radiation. One
enzyme removes thymine glycol residues. Oxidation
products of guanine include 8-oxoguanine and a
formamidopyrimidine (Fapy) derivative (Figure 1D).
A single enzyme, DNA Fapy glycosylase, can excise
both of these products.


Alkyl groups and other adducts 

Common alkylated products in DNA are N3-methyl
derivatives of adenine and guanine. Two glycosylases
in E. coli, encoded by the tag and alkA genes, can
remove these alkylated bases. Homologs of the latter
gene are found in eukaryotes, as well. A much rarer
type of glycosylase is encoded by the bacterium
Micrococcus luteus and by bacteriophage T4. These
enzymes sever the upstream base of cyclobutane
pyrimidine dimers from the DNA backbone. After action
of an AP-endonuclease, the dimer-containing strand
is excised to remove the remaining dimer adduct
attached to the adjacent sugar.

Figure 2 Base excision repair of uracil in bacterial DNA. (1) Removal of uracil, produced by deamination of cytosine, by DNA uracil glycosylase (g). (2) Strand cleavage at the abasic site by AP-endonuclease (a). (3) Removal of the deoxyribose-phosphate residue and a few additional nucleotides by the 5’-exonuclease (e) of DNA polymerase I. (4) Resynthesis (dashed line) of the strand segment from dNTPs by DNA polymerase I (p). (5) Closure of the repaired segment by DNA ligase (l).


Ribonucleotide excision 

Ribonucleotides are normally incorporated into DNA 
as primers for lagging strand synthesis, after which 
they are removed by exonucleolytic action. Occa- 
sional incorporation of ribonucleotides in place of 
deoxyribonucleotides in DNA extension has been 
observed during in vitro synthesis, but it is not
known how frequently this occurs in vivo or how
deleterious such incorporation is. However, the fact
that virtually all species harbor an enzyme, ribonucle- 
ase HII, that can recognize a single ribonucleotide 
residue in a DNA strand and then make an endonu- 
cleolytic single-strand break at that position, suggests 
that removal of adventitious ribonucleotides is bene- 
ficial to a cell. After this incision, action of a 5’ to 3’ 
exonuclease, DNA polymerase and DNA ligase 
would repair the DNA. This is a case of nucleotide 
excision, which falls between the category of base 
excision and that of oligonucleotide excision, outlined 
below. 


Oligonucleotide Excision 


General mechanism 

Inasmuch as exposure to UV radiation in sunlight is a 
nearly universal hazard, it is not surprising that living 
cells have developed a universal method for dealing 
with UV damage. Although some species evolved
additional mechanisms for photoreactivation or for
glycosylase-mediated removal of cyclobutane pyrimidine dimers, a
more general mechanism removes both these and 
(6-4) pyrimidine photoproducts from DNA, and it 
can remove other bulky adducts as well. Although 
many components of the mechanism have evolved 
independently in prokaryotes and eukaryotes, the 
repair systems in both groups are strikingly similar 
in function. First, the adduct is recognized, probably 
by its gross distortion of DNA structure. Two incisions 
are then made in the strand containing the adduct, 
one at a short distance on each side. Helicase action 
unwinds the DNA, and the strand segment containing 
the adduct is released as an oligonucleotide, following 
which the missing part of the strand is resynthesized by 
a DNA repair polymerase (Figure 3). 


Recognition of adducts 

Pyrimidine dimers distort DNA by partially unwind- 
ing and kinking it. Similar distortions may result from 
bulky base adducts introduced by N-acetoxy- 
2-acetylaminofluorene, 4-nitroquinoline-1-oxide, 
cisplatin, and psoralen. In bacteria, three highly con- 
served gene products, UvrA, UvrB, and UvrC, are 
required for oligonucleotide excision repair (Fig- 
ure 3). Dimers of UvrA bind to DNA and monitor 
it for damage; in the vicinity of damage, the UvrA 
protein also binds UvrB and installs it at the damaged 
site, at which time UvrA falls off the complex, leaving 
UvrB at the site ready for subsequent binding of 
UvrC. Migration of UvrA and UvrB along the DNA 
requires ATP hydrolysis, as does additional unwind- 
ing and kinking of the DNA by the complex. In 
eukaryotes, the proteins responsible for damage 
recognition and incision are unrelated to those in
bacteria and show no sequence homology. In the yeast
Saccharomyces cerevisiae, a complex of two proteins,
Rad4 and Rad23, senses and binds to the damage along
with Rad14; the homologous proteins in human cells
are XPC, hHR23B, and XPA.

Figure 3 Oligonucleotide excision repair of a thymine dimer in bacterial DNA. (1) Recognition of the thymine dimer by UvrA (a). (2) Binding of UvrB (b) and UvrC (c) to the complex of dimer and UvrA. (3) Departure of UvrA and incision of the dimer-containing strand by UvrB and UvrC 5 and 8 nucleotides, respectively, away from the dimer. (4) Removal of the dimer-containing oligonucleotide by the combined action of the UvrD helicase (h) and DNA polymerase (p); steps (3) and (4) require ATP. (5) Resynthesis (dashed line) of the strand segment by DNA polymerase (p) and closure of the repaired segment by DNA ligase (l).


Relationship to transcription 

Inasmuch as the movement of RNA polymerase in 
transcription is blocked by bulky adducts, it is vital 
for the cell to remove such impediments in DNA 
currently being transcribed. Blockage of the transcrip- 
tion apparatus itself can signal the presence of damage. 
Transcriptional helicase activities are used in the repair 
process, and, in eukaryotes, components of the tran- 
scription apparatus are even recruited to repair DNA 
not being expressed. The template strand in actively 
expressed DNA is more quickly repaired in both 
prokaryotes and eukaryotes. 


Components in prokaryotes 

In keeping with its role as a monitor for scanning 
DNA, UvrA, normally made in small amounts, is 
induced by DNA damage along with other proteins 
regulated by the SOS system, which regulates repair 
processes in E. coli. UvrB, a component of the moni- 
tor, is partially induced, but UvrC, which is needed 
only at the damage site and not to scan the DNA, 
needs and receives no such amplification. After 
UvrC binds to UvrB at the site of damage, UvrB 
cleaves the damaged strand 4 or 5 nucleotides down- 
stream from the damage, and UvrC cleaves the strand 
8 nucleotides upstream. Subsequent action of a DNA 
helicase, UvrD, releases an oligonucleotide, 12 or 13 
nucleotides in length, which contains the damage. 
DNA polymerase I then resynthesizes the missing 
segment. 
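Adding the stated distances as whole nucleotides around a two-nucleotide dimer gives 8 + 2 + 4 = 14, not 12 or 13. One interpretation reconciles the numbers: each distance counts phosphodiester bonds outward from the lesion. On that reading, 7 + 2 + 3 = 12 and 7 + 2 + 4 = 13. This interpretation is an assumption here, not something the text states. A sketch of the arithmetic:

```python
def excised_length(bonds_5p: int, bonds_3p: int, lesion_nt: int = 2) -> int:
    """Length of the released oligonucleotide.

    Assumes each incision is made at the n-th phosphodiester bond counted
    outward from the lesion, so n - 1 undamaged nucleotides lie between the
    lesion and the cut on that side.
    """
    return (bonds_5p - 1) + lesion_nt + (bonds_3p - 1)


print([excised_length(8, n) for n in (4, 5)])  # [12, 13]
```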


Components in eukaryotes 

When not coupled to transcription, recognition of 
pyrimidine dimers or bulky adducts in the yeast 
S. cerevisiae requires Rad4 and Rad23. After these pro- 
teins bind to the site of damage, they recruit Rad14, 
which facilitates assembly of the repair complex. In 
transcription-coupled repair, Rad14 binds directly to 
the damaged site at the stalled RNA polymerase. In 
either case, the TFIIH multiprotein complex that 
functions in transcription initiation is then recruited, 
and its 3’ to 5’ (Rad25) and 5’ to 3’ (Rad3) helicase 
activities open up the DNA in the vicinity of the 
damage. Additional factors, such as Rfa1, Rfa2, and
Rfa3, stabilize the open complex and help position
endonucleases Rad2 and Rad1-Rad10, which incise
the damaged strand on the 3’ and 5’ sides of the
lesion, respectively. A damage-containing
oligonucleotide ~30 residues in length is released. The
gap in the excised strand is filled in by either DNA
polymerase δ or ε and sealed by DNA ligase I.


Defects in the human genes encoding the mechan- 
ism for oligonucleotide excision repair result in xero- 
derma pigmentosum, a recessive genetic disease, 
which is characterized by an extreme sensitivity to 
light leading to disfiguring lesions and skin cancer. 
Disease-causing mutations fall into multiple comple- 
mentation groups, which encode the various com- 
ponents of the repair mechanism. Thus, the human 
correlates of the S. cerevisiae proteins Rad1, Rad2, 
Rad3, Rad4, Rad14, and Rad25 are XPF, XPG, XPD, 
XPC, XPA, and XPB, respectively. 
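The "respectively" list above pairs each yeast protein with its human counterpart. The lookup table below combines that pairing with the yeast roles described earlier. The table and function names are illustrative only.

```python
# Yeast Rad protein -> (human XP protein, role described in the text)
RAD_TO_XP = {
    "Rad4": ("XPC", "damage recognition (with Rad23)"),
    "Rad14": ("XPA", "damage binding; assembly of the repair complex"),
    "Rad25": ("XPB", "TFIIH 3' to 5' helicase"),
    "Rad3": ("XPD", "TFIIH 5' to 3' helicase"),
    "Rad2": ("XPG", "incision on the 3' side of the damage"),
    "Rad1": ("XPF", "incision on the 5' side of the damage (with Rad10)"),
}


def human_counterpart(yeast_protein: str) -> str:
    """Return the XP protein corresponding to a yeast Rad protein."""
    return RAD_TO_XP[yeast_protein][0]


for rad, (xp, role) in RAD_TO_XP.items():
    print(f"{rad:6} -> {xp}: {role}")
```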


Mismatch Repair 

Evidence for DNA mismatch repair was first observed 
in genetic recombination. In the transformation of the 
bacterium Streptococcus pneumoniae, where a single- 
strand segment of donor DNA replaces the homolo- 
gous segment of host DNA, genetic markers giving 
different base mismatches are integrated with different 
efficiencies. A repair system that differentially elim- 
inates donor contributions to the mismatches accounts 
for the differences in integration. Mutations in the 
hexA and hexB genes, which govern such repair, also 
have a mutator effect, thereby indicating that the sys- 
tem acts also to correct newly replicated DNA. Muta- 
tions in genes of E. coli originally identified as mutators, 
mutS and mutL, also block mismatch repair. The hex 
and mut genes are homologous, and similar sets of 
homologous genes are found in nearly all cells. They 
encode a generalized mismatch repair system that 
recognizes and corrects a variety of base mismatches. 


Generalized mismatch repair 

In bacteria, the repair process begins with recognition 
of a mismatched base pair or a short deletion/insertion 
mismatch (1 to 4 nucleotides in length) by the MutS 
(or HexA) homodimeric protein, composed of two 
identical subunits (Figure 4). The affinity of binding 
varies for different mismatches, with G|T and A|C
mismatches recognized (and corrected) most frequently
and C|C least frequently. A dimer of MutL (or
HexB) attaches to the bound MutS. Powered by the
hydrolysis of ATP, the complex then moves DNA past
it on both sides of the mismatch to form a double loop
until it reaches a strand break. At this point, the strand
with the break is digested back toward the mismatch
by the combined action of a DNA helicase, a 5’ to 3’
exonuclease, and a 3’ to 5’ exonuclease, which removes
the mismatch-containing segment of that strand. The segment
is then correctly synthesized on the complementary 
strand template by a DNA polymerase. Because the 
average length of a repair tract is ~ 1000 nucleotides, 
this is called long-patch repair.

Figure 4 DNA base mismatch repair by the generalized mismatch repair system. (1) Binding to the mismatch by a dimer of MutS/HexA (a), followed by binding of MutL/HexB (b) to the complex. Strand breaks are present in the nascent DNA upstream from the replication fork. (2) As the protein complex remains bound to the mismatched base pair, DNA on both sides is looped past it until a strand break is reached. A DNA helicase (h), such as UvrD, and a 3’-exonuclease (3’e) degrade the segment containing the incorrect base from one end, and a 5’-exonuclease (5’e) degrades the segment from the other end. (3) The incorrect base is removed, together with a long tract of DNA, by combined action of the helicase and exonucleases. (4) Correct synthesis (dashed line) of the strand segment by DNA polymerase (p). (5) Closure of the repaired segment by DNA ligase (l).

When the repair mechanism detects a mismatch, it does
not discriminate between strands on the basis of the
mismatch itself. What determines which strand is
‘corrected’ is the presence of strand breaks on the
target strand.
After transformation in S. pneumoniae, it is always 
the donor strand segment that is removed. In newly 
replicated DNA, breaks at the ends of ‘Okazaki’ frag- 
ments are due to discontinuous synthesis of the lag- 
ging strand. Breaks may also occur on the leading 
strand as a consequence of adventitious incorporation 
of uracil-containing nucleotides or ribonucleotides. In 
some bacterial species, including E. coli and its rela- 
tives, strand targeting is enhanced by its dependence 
on DNA methylation at GATC sites. This
physiological methylation at adenine N6 is delayed in the
newly replicated DNA, and its absence is sensed by an 
additional mismatch repair protein, MutH, to produce 
a strand break at the unmethylated site. Such methyl- 
ation enhancement does not occur in most bacteria or 
in eukaryotes. However, in addition to strand breaks, 
it is possible that specific proteins of the replication 
complex help target the replicating strand for repair. 
Targeting of the newly replicated strand ensures that 
mismatch repair prevents mutations. 
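The methyl-directed strand choice described above can be modeled in a few lines of Python. This is a hypothetical sketch, not a model of MutH biochemistry: `GatcSite`, `find_gatc`, and `strand_to_nick` are illustrative names, each GATC site simply carries a methylation flag per strand, and the function reports which strand would be nicked at a hemimethylated site.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class GatcSite:
    """A GATC site with the N6-adenine methylation state of each strand."""
    position: int            # 0-based start of GATC in the top strand
    top_methylated: bool
    bottom_methylated: bool


def find_gatc(top: str) -> List[int]:
    """Start positions of GATC in the top strand.

    GATC is its own reverse complement, so each site is shared by both strands.
    """
    return [i for i in range(len(top) - 3) if top[i:i + 4] == "GATC"]


def strand_to_nick(site: GatcSite) -> Optional[str]:
    """Which strand a MutH-like activity would nick at this site.

    Hemimethylated (just replicated): nick the unmethylated, newly made strand.
    Fully methylated or fully unmethylated: no methylation-based choice (None).
    """
    if site.top_methylated and not site.bottom_methylated:
        return "bottom"
    if site.bottom_methylated and not site.top_methylated:
        return "top"
    return None


# Hypothetical replication product: parental top strand methylated,
# nascent bottom strand not yet methylated.
top = "AAGATCTTTTGATCAA"
sites = [GatcSite(p, top_methylated=True, bottom_methylated=False)
         for p in find_gatc(top)]
print([(s.position, strand_to_nick(s)) for s in sites])
# [(2, 'bottom'), (10, 'bottom')]
```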

In eukaryotic cells, from yeasts to humans, generalized mismatch repair resembles the bacterial mechanism, but with a few elaborations. Instead of homodimers, the main repair complex contains a heterodimer of two different MutS homologs, proteins MSH2 and MSH6, together with a heterodimer of MutL homologs, MLH1 and PMS2. A second mechanism, in which a third MutS homolog, MSH3, substitutes for MSH6, is restricted to repair of longer insertion/deletion mismatches. Additional homologs of MutS and MutL are found in eukaryotes, some of which function in meiotic recombination but not in mismatch repair.


Mutation avoidance and cancer 

The purpose of generalized mismatch repair appears 
to be avoidance of deleterious mutations. In S. pneu- 
moniae, defects in the system increase spontaneous 
mutation rates ~100-fold, and in E. coli, with its 
methylation enhancement, by ~1000-fold. As well 
as increasing the frequency of solitary base change 
and short insertion/deletion mutations, defects in 
generalized mismatch repair allow the expansion or 
contraction of short repeat elements within genes. In 
humans, cancer is thought to result from accumulated 
mutations and other changes in genomic integrity and 
expression. A genetic predisposition to certain types of 
cancer was found in families with defects in the genes 
encoding MSH2, MLH1, and PMS2. Individuals het- 
erozygous for the recessive defect were much more 
likely to contract nonpolyposis colorectal cancer, 
and to a lesser extent, endometrial and ovarian cancer; 
and the cancers themselves were homozygous for the 
mismatch repair defect, as were many spontaneous 
cases of such cancer. Cancer-provoking mutations 
apparently arise more frequently in the absence of 
generalized mismatch repair. 


Specialized mismatch repair 

Some species have specialized systems that recognize a 
particular mismatch within a specific DNA sequence. 
These systems always remove the same component of 
the mismatch and replace it by synthesis of a short 
DNA segment (short-patch repair). Many strains of E. coli normally methylate the C5 position of the internal cytosine in the sequence CC(A/T)GG. Spontaneous deamination of the methylated cytosine gives a G/T mismatch. A mechanism exists for correcting this mismatch by incising the T-containing strand, removing the altered base along with a few more nucleotides, and resynthesizing the correct sequence. Required components include the incising enzyme (encoded by the gene vsr, which is transcribed together with the gene dcm encoding the methylating enzyme), the repair polymerase (DNA polymerase I), and ligase. Streptococcus pneumoniae contains a similar system that converts A to C in the mismatched sequence

5’-ATTAAT-3’
3’-TAAGTA-5’
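The very-short-patch correction of deaminated Dcm sites can be sketched as a string scan. The Python below is illustrative only and rests on assumptions not in the text: both strands are given as aligned strings (top 5’→3’, bottom 3’→5’ beneath it), only the top strand is scanned, and `vsr_correct` is a hypothetical name. It does not model the S. pneumoniae system.

```python
def vsr_correct(top: str, bottom: str) -> str:
    """Return a corrected copy of the top strand.

    top:    top strand written 5'->3'
    bottom: bottom strand written 3'->5', aligned base-for-base under top
    A T opposite G at the internal-C position of a CC(A/T)GG site, i.e. the
    top strand reading C T (A or T) G G, is treated as a deaminated
    5-methylcytosine; the T-containing strand is corrected back to C.
    Only the top strand is scanned in this sketch.
    """
    if len(top) != len(bottom):
        raise ValueError("strands must be aligned and of equal length")
    fixed = list(top)
    for i in range(1, len(top) - 3):
        site = top[i - 1:i + 4]  # five bases with the suspect T at index 1
        if (site[0] == "C" and site[1] == "T" and site[2] in "AT"
                and site[3:] == "GG" and bottom[i] == "G"):
            fixed[i] = "C"
    return "".join(fixed)


# Dcm site CCAGG whose internal C has deaminated to T (T opposite G):
print(vsr_correct("AACTAGGTT", "TTGGTCCAA"))  # AACCAGGTT
# Same top sequence, but the T is properly paired with A: left unchanged.
print(vsr_correct("AACTAGGTT", "TTGATCCAA"))  # AACTAGGTT
```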


Recombinational Repair 


Certain DNA repair processes depend on the pairing 
of homologous DNA segments and recombinational 
mechanisms otherwise used for genetic exchange. 
These processes intervene when there is no local tem- 
plate for repair, as there is for the repair mechanisms 
described above. Double-strand breaks in DNA, for 
example, are repaired by recombinational mechanisms 
that are similar in species ranging from bacteria to 
humans. 


Double-Strand Breaks 

Repair mechanisms for double-strand breaks are char- 
acterized by the following steps: (1) degradation of 
one strand adjacent to the break, which removes 
damaged nucleotides and exposes a single-stranded 
segment; (2) search for and recognition of homology 
to that segment in another chromosome; (3) form- 
ation of a recombination intermediate structure; 
(4) replicative extension from the original break; and
(5) return of the restored strand to its original locus 
(Figure 5). Such mechanisms may include but do 
not require recombination of distal markers on the 
chromosomes. In a prokaryote, the second chromo- 
some may correspond to previously replicated DNA, 
and in a eukaryote, to a sister chromatid or to the other 
member of a chromosome pair. 

In E. coli, the ATP-dependent RecBCD nuclease 
either degrades the 3’-ended strands at the DNA break 
or it unwinds the DNA until it reaches a specific 
sequence called Chi, where it cleaves one strand to 
leave a 3’-OH-ended single-stranded segment. In the presence of RecA protein and ATP, this segment can find homologous double-stranded DNA elsewhere in the cell. The RecA-catalyzed interaction binds the 3’-ended segment to one strand at the new location, and the displaced strand binds to the complementary segment on the other side of the break. A DNA polymerase extends the 3’ ends, thereby repairing
the damage. The RuvABC protein complex can 
resolve the recombination intermediate to separate 
the chromosomes. Similar processes occur in yeast 
and human cells, and homologs to RecA, such as 
Rad51, are essential for the repair. 

In addition to precise repair of double-strand 
breaks, eukaryotic cells have a mechanism that 
directly rejoins broken ends but usually gives rise to 
deletions of the DNA sequence adjacent to the break 
or, less frequently, inversions or translocations. Such 
imprecise joining may account for half or more of all 
double-strand break repair. The immune system uses 
this error-prone mechanism to enhance the variability 
of antibodies. 


Figure 5 Recombinational repair of a double-strand break in DNA. Symbols: I and II, DNA duplexes from homologous chromosomes or sister chromatids marked, respectively, A-D or a-d, and depicted by thick and thin
lines; R, recombination intermediate. (1) A double-strand break, formed, for example, by ionizing radiation, with 
damage (x) at strand ends. (2) A 5’-exonuclease (5'e) trims away damaged nucleotides and produces recombinogenic 
single-stranded tails. (3) Interaction with an intact homolog forms a recombination intermediate with removal of 
3’-end damage by another exonuclease (3'e) and extension (dashed lines) of the 3’-ended tails by DNA polymerase 
(p) using the intact homolog as template. (4) Resolution of the 4-stranded recombination intermediate by action of a 
resolving enzyme (r). (5) Product duplexes showing restored strand continuity and transfer of some information. 
Other products could be recombinant for flanking markers, A and D. 


Strand Gaps 

One mechanism for dealing with bulky adducts in E. coli is postreplication (daughter-strand gap) repair: replication is blocked by the lesion and then resumes downstream, so that the daughter strand is interrupted by a gap opposite the lesion. This gapped strand is then repaired by a RecA-mediated recombination process. Inasmuch as the damage is sidestepped but not removed, this mechanism may be called damage tolerance rather than repair.


Concluding Remarks 


Interaction of Repair Mechanisms 

Repair mechanisms may interact in constructive or 
destructive ways. In mismatch repair after transform- 
ation of S. pneumoniae, ligation of strand breaks at the 
ends of the donor segment rescues any uncorrected 
donor marker. However, with UV-irradiated donor 
DNA, new breaks made at pyrimidine dimers by the 
oligonucleotide excision system serve as signals for 
additional mismatch repair. 

After treatment of cells with methylating agents, 
removal by the mismatch repair system of a tract 
containing O6-methylguanine combined with incision
of the complementary strand containing 3-methyl- 
adenine (after its removal by a glycosylase) produces 
a lethal double-strand break. As a consequence, in 
both bacterial and human cells, defects in mismatch 
correction confer resistance to alkylation. 


Redundancy of Repair Mechanisms 

The importance of DNA repair is highlighted by the 
frequent redundancy of repair mechanisms. Thus, in 
E. coli and yeast, UV-induced pyrimidine dimers are 
repaired by either photoreversal or oligonucleotide 
excision. In humans, G/T mismatches are corrected
either by a mismatch-specific thymine glycosylase or 
by the generalized mismatch repair system. Redun- 
dancy ensures repair despite loss or malfunction of a 
single mechanism. 


Regulation of Repair 

To conserve cellular resources, repair systems are 
often induced only after detection of DNA damage. 
In bacteria, alkylation of DNA bases induces expres- 
sion of a repair methyltransferase, and a group of 
genes comprising the SOS system is induced by 
DNA damage. The latter genes include components 
of the oligonucleotide excision and recombinational 
repair mechanisms. In humans, DNA damage is 
sensed by a DNA-dependent protein kinase, which 
signals the expression of repair proteins. 
Cell-Cycle Arrest and Repair

Eukaryotic cells have 10 to 10 000 times as much DNA as bacteria, yet they are not vastly more sensitive to DNA damage, despite the general similarity of their repair mechanisms. The eukaryotic secret lies in control of the cell cycle, which blocks DNA replication until the damage is repaired. Cell-cycle arrest after
detection of DNA damage allows the defects to be 
corrected before they cause mutations or irreversibly 
interfere with replication. Bacteria generally lack this 
control. However, one species, Deinococcus radio- 
durans, appears to block replication until DNA is 
repaired. Deinococcus radiodurans is ~1000-fold 
more resistant to radiation than other bacteria. 


Significance of Repair 

The evolution of so many mechanisms of DNA repair attests to its significance for the survival of living creatures. Xeroderma pigmentosum and nonpolyposis
colorectal cancer, mentioned above, are just two of 
numerous human diseases resulting from genetic 
defects in repair mechanisms. To the extent that 
aging results from DNA damage, these repair 
mechanisms may be the guardians of youth and vigor. 
Organisms are continuously challenged by oxidative stress. Molecular oxygen, ionizing radiation, enzymes, and certain chemicals generate superoxide, hydrogen peroxide, and hydroxyl radicals: reactive oxygen species which, especially in conjunction with Fe²⁺, damage macromolecular components of living cells.
DNA is an important target for oxidative damage. The 
phosphodiester backbone and bases are both subject 
to attack, producing single- and double-strand breaks, 
intra- and inter-strand crosslinks, cyclic deoxynucleo- 
sides, abasic (AP) sites, and a variety of modified bases. 
These lesions interfere with critical cellular processes; 
for example, DNA polymerases may be blocked at 
the site of a lesion or instead may insert an incorrect 
deoxynucleoside triphosphate opposite the damaged 
base, leading to mutation(s). Ultimately, DNA damage 
manifests itself in cytotoxicity, mutagenesis, and/or 
cell death, and in higher animals, in cancer and aging. 
To avoid the deleterious effects of oxidative DNA 
damage, systems for its repair have evolved. 

Of the several general pathways for DNA repair — 
direct reversal, base excision repair (BER), nucleotide 
excision repair (NER), mismatch repair, and recombin- 
ation repair — four contribute to repair of oxidative 
DNA damage. Double-strand breaks are restored by 
recombination repair, interstrand crosslinks by a com- 
bination of NER and recombination repair. Intra- 
strand crosslinks and cyclic deoxynucleosides are 
repaired by NER, a pathway that plays a minor role in repair of abasic sites and some oxidized bases. Mismatch repair also contributes to the repair of oxidized
bases. 

The most common forms of oxidative damage — 
single-strand breaks, AP sites, and base damage — are 
handled by BER. These lesions are repaired by the 
combined actions of a DNA glycosylase, AP endonu- 
clease, DNA polymerase, and DNA ligase, sometimes 
with the additional involvement of deoxyribopho- 
sphodiesterase and flap endonuclease. Repair of AP 
sites and single-strand breaks follows the same path- 
way, beginning downstream from the DNA glyco- 
sylase step. 

Following base excision, the pathways for repair of 
damaged bases converge so that AP endonucleases and 
other repair enzymes that lie downstream act on all 
forms of oxidative damage. DNA glycosylases are 
more specific; nevertheless, despite the variety of oxi- 
datively damaged bases, only a few glycosylases have 
been identified. These have relatively broad substrate 
specificity, some acting preferentially on oxidized 
pyrimidines, others on oxidized purines. 

Oxidatively damaged pyrimidines, exemplified by 
thymine glycol, are repaired by endonuclease III 
(EndoIII) and related enzymes. More than 15 types of oxidized bases can be excised by EndoIII. With few exceptions, all lack aromaticity. Another glycosylase that acts on oxidized pyrimidines is endonuclease VIII; its primary substrate also appears to be thymine glycol. Glycosylases specific for 5-hydroxymethylcytosine and 5-hydroxymethyluracil have been identified in mammalian cells, but appear to be absent in lower eukaryotes and bacteria. A minor product of thymine oxidation, 5-formyluracil, is excised by 3-methyladenine-DNA glycosylases, an activity that also handles repair of alkylated DNA.

Figure 1 Repair of 8-oxoguanine.

8-Oxoguanine (8-oxoG), a major product of oxidation damage, presents a special problem for DNA repair. Replicative DNA polymerases incorporate both C and A opposite 8-oxoG; thus, the repair system for this lesion must convert both 8-oxoG:C and 8-oxoG:A mispairs into a G:C pair. If 8-oxoG were excised from 8-oxoG:A and the gap subsequently filled by DNA polymerase, a G→T transversion would result. Therefore, 8-oxoG:A must be repaired by BER in two cycles. First, the undamaged base, A, is excised, followed by insertion of C. The resulting 8-oxoG:C mispair is then repaired by excision of 8-oxoG and insertion of G. A repair system composed of MutT, Fpg/MutM/Ogg1, and MutY counters the deleterious effects of 8-oxoG in both prokaryotes and eukaryotes (Figure 1). MutT is an 8-oxo-dGTPase that cleanses the cellular nucleotide pool, preventing incorporation of 8-oxoG into DNA. Fpg (MutM) and its eukaryotic counterpart, Ogg1, are DNA glycosylases that excise 8-oxoG from 8-oxoG:C but not from 8-oxoG:A. MutY is an adenine DNA glycosylase specific for 8-oxoG:A mispairs. During gap filling, DNA polymerases preferentially insert C opposite 8-oxoG. If A is inserted, the cycle of MutY repair is repeated. If C is inserted, the 8-oxoG:A mispair is converted into an 8-oxoG:C mispair, which becomes a substrate for Fpg/Ogg1. Other oxidized purines, such as ring-open formamidopyrimidines derived from A or G, are mostly repaired by Fpg or Ogg1 in a single cycle of BER.
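The two-cycle handling of 8-oxoG:A can be traced with a toy state machine. In this Python sketch the glycosylase specificities follow the text (MutY removes A from 8-oxoG:A; Fpg/Ogg1 removes 8-oxoG from 8-oxoG:C but not from 8-oxoG:A), but the base inserted opposite 8-oxoG at each gap-filling step is supplied by the caller rather than sampled, and the function names are hypothetical. It is a logic sketch, not a kinetic model.

```python
from typing import Iterable, List, Tuple

Pair = Tuple[str, str]  # (base on the damaged strand, base opposite it)
OXOG = "8-oxoG"


def repair_8oxog(pair: Pair, insertions: Iterable[str]) -> List[Pair]:
    """Trace BER of an 8-oxoG-containing pair until it becomes G:C.

    `insertions` lists, in order, the base the gap-filling polymerase puts
    opposite 8-oxoG after each MutY excision (C is preferred, A possible).
    Returns every pair visited, ending with ("G", "C").
    """
    path = [pair]
    choices = iter(insertions)
    while pair != ("G", "C"):
        damaged, partner = pair
        if damaged == OXOG and partner == "A":
            # MutY removes the undamaged A; gap filled opposite 8-oxoG
            base = next(choices, None)
            if base is None:
                raise ValueError("ran out of insertion choices")
            pair = (OXOG, base)
        elif damaged == OXOG and partner == "C":
            # Fpg/MutM (Ogg1) removes 8-oxoG; G is inserted opposite C
            pair = ("G", "C")
        else:
            raise ValueError(f"pair not covered by this sketch: {pair}")
        path.append(pair)
    return path


def excise_oxog_opposite_a() -> Pair:
    """The route the text rules out: removing 8-oxoG from 8-oxoG:A and
    filling the gap by pairing with A yields T:A, so the original G:C
    position ends up as T:A (a G->T transversion)."""
    return ("T", "A")


print(repair_8oxog((OXOG, "A"), ["A", "C"]))
# [('8-oxoG', 'A'), ('8-oxoG', 'A'), ('8-oxoG', 'C'), ('G', 'C')]
```

Supplying the insertions explicitly keeps the trace deterministic; a caller wanting a stochastic picture could generate the list from any chosen probability of inserting A.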

Another major class of oxidative DNA damage 
involves oxidation of deoxyribose in the DNA back- 
bone. All positions on the deoxyribose ring are 
susceptible to attack by free radicals. Chemical 
rearrangement inevitably leads to a single-strand 
break; base loss and ring fragmentation often occur. 
The 3’ or 5’ ends of DNA are modified at such a break 
(3'-phosphoglycolate and 3’-phosphate are examples); 
DNA polymerase cannot repair the damage directly. 
Instead, these lesions are processed initially by AP 
endonucleases, followed by the downstream events 
of BER. Recombination repair may also be involved. 
Repetitive sequence was originally defined as DNA
sequence that appears many times in the genome of an 
organism such that the individual instances are not 
easily distinguished by nucleic acid hybridization. 
With the introduction of genomic scale DNA sequen- 
cing, the definition can be extended to include related 
sequences of greater divergence. Gene families are 
usually not classified as repetitive sequences, unless 
their members are numerous and hard to distin- 
guish, or unless they commonly engage in genetic 
processes typical of repetitive sequences, such as 
unequal crossing-over or gene conversion. 


History 


The existence of large quantities of repetitive 
sequences in higher eukaryotic genomes was first 
inferred by kinetics of reannealing experiments. 
After denaturation, each component of the genome 
reanneals at a rate determined by its initial concentra- 
tion and hence the number of copies of the sequence per 
genome. The reannealing curve of human 
DNA, for example, reveals that about 10% of human 
sequences have a copy number greater than 100000, 
and 10-15% have a copy number between 100 and 
100 000. The terms highly repetitive and middle repeti- 
tive DNA are loosely applied to designate these ranges 
of abundance. 
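The reasoning from reannealing rate to copy number can be sketched with idealized second-order (C0t) kinetics: a component making up a fraction F of the genome and present in R copies is assumed to stay single-stranded as F / (1 + R k C0t), where k is the rate constant for single-copy DNA. Under that assumption each component reaches half-reannealing at C0t = 1/(Rk), so a sequence with 1000 times more copies reanneals 1000 times sooner. The rate constant and the three components below are illustrative values, not measurements.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Component:
    name: str
    fraction: float     # share of total genomic DNA
    copy_number: float  # copies of the sequence per genome


def single_stranded_fraction(components: List[Component], cot: float,
                             k_single_copy: float) -> float:
    """Fraction of all DNA still single-stranded at a given C0t.

    Ideal second-order kinetics per component; each component's rate
    constant is assumed to be copy_number * k_single_copy.
    """
    return sum(c.fraction / (1.0 + c.copy_number * k_single_copy * cot)
               for c in components)


def half_reannealing_cot(component: Component, k_single_copy: float) -> float:
    """C0t at which half of one component has reannealed."""
    return 1.0 / (component.copy_number * k_single_copy)


# Illustrative genome loosely following the ranges quoted in the text.
genome = [
    Component("highly repetitive", 0.10, 1e6),
    Component("middle repetitive", 0.15, 1e3),
    Component("single copy", 0.75, 1.0),
]
K = 1e-3  # hypothetical single-copy rate constant (arbitrary units)
for c in genome:
    print(c.name, half_reannealing_cot(c, K))
```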

The advent of restriction enzymes yielded many 
examples of repetitive restriction fragments that were 
easily distinguishable from the heterogeneously sized 
fragments produced from the rest of the genome. 
Repetitive sequences became identified with the repeti- 
tive restriction fragments that they produced and 
were often named after the restriction enzyme. Sub- 
sequently, cloning and sequencing experiments col- 
lated these fragments into larger units that were 
given more informative names; however, the human 
Alu repetitive family retains its name from this era. 
The general distribution of each family was clarified as
either clustered in tandem repeats, or dispersed 
throughout the genome. The term ‘interspersed’ is 
often used to emphasize that dispersed repeats are 
interspersed with genes. 

Tandemly repeated simple sequences were also isol- 
ated as satellite bands in a buoyant density gradient. 
They band at unusual densities because they have 
relatively pure and unusual base compositions. These 
are often designated as satellite DNAs. 

Genome sequencing has expanded the definition 
of repetitive DNA to include sequences that are rela- 
ted, but too divergent for detection by biochemical 
methods. This has expanded the scope of known repeti- 
tive sequences, particularly the dispersed families, and 
has allowed identification of new families. For exam- 
ple, members of the human SINE family named MIR 
are too divergent to permit identification other than 
by computerized sequence comparisons. Estimates of 
how much of the genome is derived from repetitive 
DNA have correspondingly swelled, and are ulti- 
mately dependent on the author’s choice of stringency 
for inclusion as repetitive sequence. If carried to the 
extreme, it is entirely plausible that the major portion 
of unique intergenic DNA in eukaryotic genomes is 
actually ancient repetitive DNA that is no longer 
recognizable as such. 


Types, Families, and Subfamilies of Repetitive DNA


Repetitive sequences include very different types of 
DNA with respect to mode of origin, function, struc- 
ture, and genomic distribution. These include gene 
families, large blocks of tandemly repeated DNA, 
dispersed repeats resulting from the action of trans- 
posons and retroviruses, and short simple sequences 
generated independently at multiple dispersed sites. 
With the exception of short simple repetitive 
sequences, each of these types is composed of families 
with each family descended from a common ancestral 
sequence. The family members are therefore not 
identical in sequence, but related by divergent evolu- 
tion. There may also be a hierarchy of subfamilies 
defined. Each family or subfamily is represented by a 
prototypical sequence, which is often a consensus of 
several sequenced members. 
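A consensus of several members can be computed by taking the most common symbol in each column of an alignment. This Python sketch assumes the members are already aligned to equal length with '-' for gaps; the tie-breaking rule and the toy sequences are illustrative choices, and real consensus building for repeat families involves alignment and more careful treatment of ambiguous columns.

```python
from collections import Counter
from typing import List


def consensus(aligned: List[str], gap: str = "-") -> str:
    """Majority-rule consensus of equal-length aligned sequences.

    Ties are broken by character order ('-' sorts before letters, so a tie
    involving a gap drops the column); columns whose most common symbol is
    a gap are omitted from the consensus.
    """
    if not aligned:
        raise ValueError("need at least one sequence")
    length = len(aligned[0])
    if any(len(s) != length for s in aligned):
        raise ValueError("sequences must be aligned to equal length")
    out = []
    for column in zip(*aligned):
        counts = Counter(column)
        # highest count first, then lowest character
        best = min(counts, key=lambda b: (-counts[b], b))
        if best != gap:
            out.append(best)
    return "".join(out)


members = ["GGCCGGGC", "GGCTGGGC", "GACCGGGC"]  # invented toy members
print(consensus(members))  # GGCCGGGC
```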


Gene Families 

Gene families may be tandemly arrayed or clustered. Individual members may be functionally equivalent or have specialized functions. A number of gene families have copy numbers on the order of 100, including ribosomal RNA genes, histone genes, immunoglobulin gene segments, and olfactory receptor genes.


Tandemly Repeated Sequences 

Repeat units of 5 bp up to several hundred bp are 
tandemly repeated to form blocks of 10 kb up to 1
Mb. Tandemly repeated sequences are often organized 
as smaller repeats within larger repeats. Their major 
concentration is in centromeric regions, although they 
are also found at telomeres, on the largely hetero- 
chromatic Y chromosome, and elsewhere. Tandemly 
repeated sequences generate variants by unequal 
crossing-over, and regions containing them can ex- 
hibit considerable length variation between species, or 
even within a population. Whereas some families are 
chromosome-specific, others are distributed on multiple chromosomes, revealing that additional genetic exchanges occur between chromosomes. Many of the tandemly
repeated DNAs are also identified as satellite DNA. 
The human genome is about 5% satellite DNA. 


Transposons and Transposon-derived Sequences

Transposons encode the enzymology to copy them- 
selves using either a DNA or RNA intermediate. The 
latter are called retrotransposons. Virtually every spe- 
cies of organism contains some kind of transposon. 
Transposition-competent copies are found in small 
numbers per genome. There has been horizontal 
transfer between species, and superfamilies are 
defined that cut across phylogenetic boundaries. 
Most transposons have an interspersed distribution, 
including sites within satellite DNAs. Examples of 
DNA transposons include the Tc1/mariner superfamily, P element, bacterial transposons, and IS elements. Examples of retrotransposon superfamilies
include the Ty1/copia superfamily, the gypsy/Ty3 
superfamily, and non-LTR retrotransposons (LINE- 
like transposons). 

Transposons are usually accompanied by at least a 
small number of defective copies. In many cases, there 
are large numbers of defective and very divergent 
copies, and sometimes there are families of ancient 
defective copies where no corresponding active copy 
can be found. Large defective transposon-associated 
families are found most abundantly in eukaryotes with 
noncompact genomes. The most prominent family in 
the human genome is the LINE-1 or L1 family. LINE 
was originally meant to indicate any interspersed 
repeat of greater than 500 bp in length; but the term 
has since been identified with non-LTR retrotrans- 
posons and their derivatives regardless of length. 
LINE-like transposons typically generate truncated 
copies of all different lengths. The human LINE-1 
family contains at least 100000 members generated 
from an active core family of perhaps 100 members. 
The ages of the defective members range all the way 
back to the mammalian radiation. The study of human LINE-1 has been greatly advanced by the recent
isolation of active copies and the demonstration of 
their mobility in cell culture. 


Retroviruses and Retrovirus-related Sequences

Retroviral proviruses occur in small numbers in many 
mammalian and avian genomes. Under certain cir- 
cumstances these proviruses activate and initiate a 
viral infection. Some members of the gypsy retrotran- 
sposon family, including gypsy itself, include an envel- 
ope gene. These can be classified as retroviruses, 
extending the range to invertebrates. 

Defective ancient retroviruses are also dispersed in 
the mammalian genome. These retroviral-related se- 
quences can reach high copy numbers. They are pre- 
sumably amplified by a helper retrovirus. Retroviral- 
related sequences are usually accompanied by an even 
larger family of solo long terminal repeats (LTRs). 
Although humans do not have recognized active endo- 
genous retroviruses, they do have several anciently 
amplified families collectively called HERV. There 
are also families of interspersed repetitive sequence 
that are presumed to be retroviral-related due to 
the LTRs, but which have lost all internal retroviral 
similarity. 


Retroposons, SINEs, and Processed Pseudogenes

The term retroposon is intended to include RNA 
templated insertion elements that do not encode 
their own reverse transcriptase. These include SINEs 
and processed pseudogenes. These can be found 
in high copy numbers in noncompact eukaryotic 
genomes. SINEs are typically 150-300 bp discrete 
length inserts derived indirectly from pol III tran- 
scribed RNA templates. The human Alu sequence 
and rodent B1 SINE are indirectly derived from the 
7SL signal recognition particle RNA gene. Many 
other SINEs are known that are derived from various 
tRNA genes. There are about 1 million Alu inserts in 
the human genome. 


MITEs 

MITEs are Miniature Inverted Repeat Transposable Elements. They are typically 150-300 bp in
length and contain an imperfect inverted repeat struc- 
ture. MITEs can reach high copy numbers and are 
often found in gene transcription units. They are 
mostly found in plants and fungi. 
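An imperfect inverted repeat means that one end of the element approximately matches the reverse complement of the other end. A minimal Python check, assuming a fixed terminal length and a mismatch tolerance chosen purely for illustration (real MITE searches align the termini and allow insertions and deletions); the function names are hypothetical:

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase ACGT sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def tir_mismatches(element: str, tir_length: int) -> int:
    """Mismatches between the 5' terminus and the reverse complement of
    the 3' terminus (a terminal inverted repeat compares these two)."""
    if 2 * tir_length > len(element):
        raise ValueError("terminal repeats would overlap")
    left = element[:tir_length]
    right_rc = reverse_complement(element[-tir_length:])
    return sum(a != b for a, b in zip(left, right_rc))


def has_imperfect_tir(element: str, tir_length: int = 10,
                      max_mismatches: int = 2) -> bool:
    """True if the termini form an inverted repeat within the tolerance."""
    return tir_mismatches(element, tir_length) <= max_mismatches


# Invented toy element: a 10 bp terminus, filler, then its reverse complement.
core = "GGCCCTTAAA"
element = core + "T" * 10 + reverse_complement(core)
print(tir_mismatches(element, 10))  # 0
```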


Short simple repeats 

Mammalian genomes typically contain runs of greater 
than 20 bp of tandemly repeated 1, 2, or 3 bp motifs. 
Some of these are dispersed throughout the genome at 
up to 50 000 loci. Unlike the other dispersed families, these are each thought to be generated in situ by an
errant replication process, rather than being trans- 
posed from a parental copy. 
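The working definition above (runs longer than 20 bp of a tandemly repeated 1, 2, or 3 bp motif) translates directly into a scanner. The 20 bp threshold is from the text; the rest of this Python sketch is an illustrative assumption, notably that only perfect repeats are found and each run is reported once, at its shortest period. The function name is hypothetical.

```python
from typing import List, Sequence, Tuple


def simple_repeats(seq: str, min_length: int = 21,
                   periods: Sequence[int] = (1, 2, 3)) -> List[Tuple[int, int, str]]:
    """Find perfect tandem runs of short motifs at least min_length long.

    Returns (start, end, motif) tuples, end exclusive. A run of period p is
    a stretch in which every base equals the base p positions earlier.
    Periods are tried shortest first, and a run lying entirely inside one
    already reported (e.g. a poly-T tract seen again at period 2) is skipped.
    """
    found: List[Tuple[int, int, str]] = []
    for p in periods:
        i = 0
        while i + p <= len(seq):
            j = i + p
            while j < len(seq) and seq[j] == seq[j - p]:
                j += 1
            covered = any(s <= i and j <= e for s, e, _ in found)
            if j - i >= min_length and not covered:
                found.append((i, j, seq[i:i + p]))
            # any other period-p run starting in (i, j - p] would also end
            # at j and lie inside [i, j), so the scan can jump ahead
            i = max(i + 1, j - p + 1)
    return sorted(found)


demo = "G" + "CA" * 12 + "G" + "T" * 22 + "G"
print(simple_repeats(demo))  # [(1, 25, 'CA'), (26, 48, 'T')]
```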


Long-range repeats 

Multiple copies of sequences of up to 100000 bp 
appear in the mammalian genome in low copy num- 
bers. They sometimes occur on multiple chromo- 
somes. 


General Patterns of Abundance 


Lower organisms (prokaryotes and unicellular eukar- 
yotes) tend to have compact genomes and few repeti- 
tive sequences. This presumably reflects a need for 
rapid replication. Transposons and ribosomal genes 
are the major repetitive sequences. 

In higher eukaryotes, the amount of repetitive 
DNA is highly variable, and is not well correlated 
with evolutionary advancement. Two general patterns 
are recognized: The ‘Drosophila pattern’ with a virtual 
lack of SINEs and other classes of diverged inter- 
spersed repeats, and the ‘Xenopus pattern’ (including 
humans) with lots of SINEs, LINEs, and retroposons. 
These patterns may reflect different activities of 
mechanisms that remove repetitive DNA. Similarly, 
Drosophila maintains its transposons at low allele 
frequency, whereas humans and other mammals 
allow insertions of all kinds to drift to fixation. This presumably reflects, respectively, high or low intensity of selection against the physical presence of insertion elements in the genome.


General Evolutionary Considerations 


Most repetitive families are derived from a common 
ancestral sequence by some process of duplication and 
divergence of the various members. For most families 
this process is ongoing, resulting in a phenomenon
called ‘concerted evolution.’ The hallmark of con- 
certed evolution is that in a group of species sharing 
a repetitive sequence family, the sequences are more 
homogeneous within each species than they are when 
compared between species. This observation implies 
that the common ancestral genome already had the 
repetitive sequence family, and that there was homo- 
genization of the diverging sequences as they des- 
cended into the various present-day species. The 
major homogenizing processes by type of repeat are 
usually gene conversion for gene families, unequal 
crossover for tandemly arrayed repeats, and transpos- 
ition for interspersed repetitive sequences. However, 
all three of these processes can play a role in the 
evolution of any type of repetitive sequence. 
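The hallmark of concerted evolution is a comparison of two averages: divergence among family members within a species versus divergence between species. A Python sketch, assuming the members are already aligned and using the simple proportion of differing sites (p-distance) as the measure; the function names are hypothetical and the toy sequences are invented for illustration, not data.

```python
from itertools import combinations, product
from statistics import mean
from typing import Dict, List, Tuple


def p_distance(a: str, b: str) -> float:
    """Proportion of aligned sites at which two sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def within_vs_between(family: Dict[str, List[str]]) -> Tuple[float, float]:
    """Mean pairwise p-distance among copies within each species, and
    between copies taken from different species.

    Needs at least two species and at least one species with two copies.
    Concerted evolution predicts the first value is smaller.
    """
    within = [p_distance(a, b)
              for copies in family.values()
              for a, b in combinations(copies, 2)]
    between = [p_distance(a, b)
               for sp1, sp2 in combinations(family, 2)
               for a, b in product(family[sp1], family[sp2])]
    return mean(within), mean(between)


# Invented toy alignment: two species, two copies each.
toy = {
    "species_1": ["ACGTACGTAC", "ACGTACGTAT"],
    "species_2": ["TCGAACGGAC", "TCGAACGGAA"],
}
w, b = within_vs_between(toy)
print(w < b)  # True
```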
The evolution of repetitive sequences often pro-
ceeds in sudden jumps whereby a new variation in 
the repetitive sequence appears to have been widely 
distributed in a short time. These events are called 
‘amplifications’ and the newly distributed variants 
are called ‘subfamilies.’ Subfamilies defined in this 
way do not necessarily have a sibling relationship, 
but rather are often descended one from the other. 

For example, most human Alu inserts appear to 
have been generated in one of several major amplifica- 
tions during the descent of primates. The present-day 
output is low by comparison. This organization gives 
rise to a theory that an amplification is driven by a 
single source locus that transiently acquires a high 
level of activity in generating new inserts. 


Functionality of Repetitive Sequences 
Repetitive DNA with Function 


Tandemly repeated gene families 

Ribosomal 18S and 25S genes are encoded in a unit 
that is repeated in tandem 150-250 times. The system 
is best characterized in Drosophila melanogaster. The 
genes are essentially of equivalent function, and are 
repeated to support massive expression. Genetic vari- 
ants are mainly due to unequal crossing-over altering 
the number of genes. These variants, called bobbed, 
range from mild to lethal phenotypes. 
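How unequal crossing-over changes gene number can be sketched by treating each tandem array as a list of repeat units: a misaligned crossover joins the left part of one array to the right part of the other, so one product gains exactly the units the other loses. This Python model is illustrative only; the function name, the array size (chosen within the 150-250 range quoted above), and the breakpoints are arbitrary assumptions. A product with too few units corresponds to the kind of reduced-copy variant described as bobbed.

```python
from typing import List, Sequence, Tuple


def unequal_crossover(array_a: Sequence[str], array_b: Sequence[str],
                      break_a: int, break_b: int) -> Tuple[List[str], List[str]]:
    """Reciprocal exchange between two tandem arrays.

    The crossover falls after unit `break_a` of array_a and after unit
    `break_b` of array_b. When break_a != break_b the arrays were
    misaligned, and one product gains exactly the units the other loses.
    """
    if not (0 <= break_a <= len(array_a) and 0 <= break_b <= len(array_b)):
        raise ValueError("breakpoint outside array")
    product_1 = list(array_a[:break_a]) + list(array_b[break_b:])
    product_2 = list(array_b[:break_b]) + list(array_a[break_a:])
    return product_1, product_2


a = ["rDNA"] * 200
b = ["rDNA"] * 200
p1, p2 = unequal_crossover(a, b, 120, 80)  # arrays misaligned by 40 units
print(len(p1), len(p2))  # 240 160
```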


Clustered gene families 

Clustered gene families usually have developed 
specialized functions for the individual members. 
Unequal crossing-over and gene conversion create 
genetic variants with altered gene numbers and altered 
function. Gene conversion can become a dominant 
process for certain gene families that are under diver- 
sifying selection. The classic case is the major histo- 
compatibility locus, wherein diversity is maintained 
by gene conversion among members of the gene 
family creating a series of new genes that are mosaics 
of older ones. 


Developmentally regulated rearrangements in gene families

Unusual processes affecting repetitive gene families have been recruited for developmental regulation in several cases, including immunoglobulin gene expression and antigenic variation in trypanosomes.


Alphoid centromeric DNA 
The most abundant repetitive DNA is usually located within centromeres at the site of spindle attachment. This region is called alpha heterochromatin and the DNA is called alphoid DNA. The human version is a 171 bp tandem repeat called alpha satellite. Alphoid
DNA contains binding sites for kinetochore proteins 
that attach the spindle to the chromosome. The repeti- 
tive nature of the DNA underlies the repetitive nature 
of the spindle attachment. However, Saccharomyces 
cerevisiae makes do with a single attachment site per 
chromosome. 


Core telomeric DNA 

The very ends of linear chromosomes consist of a 
short repeat similar to the motif TTGGGG first char- 
acterized in Tetrahymena. This sequence supports 
extension by telomerase, unusual DNA structure, 
and the binding of proteins to maintain the specialized 
functions of the telomere. In a curious twist, a longer 
repeat derived from LINE-like retrotransposons 
(Het-A and TART) appears to have replaced the function of the classic telomeric repeat in Drosophila melanogaster. The human core telomere is 5000-12 000 bp of (TTAGGG)n.


Other Proposed Functions or Genetic Effects of Repetitive Sequences

The term ‘effects’ means consequences that are not 
necessarily sufficient to select for the presence or 
elimination of the repeat family. 


Heterochromatin 

There are additional large heterochromatic segments 
of DNA surrounding the centromere (beta hetero- 
chromatin), adjacent to the core telomere, encompass- 
ing the Y chromosome, and less so at other sites. These 
regions contain a more heterogeneous collection 
of simple sequences, concentrations of interspersed 
repeats, and some intermixed unique DNA and 
genes. Some of the repeated sequences may function 
in maintaining the condensed structure, attaching to 
the nuclear envelope, or modulating position effect 
variegation. 


Chromosome translocation 

Tandemly repeated sequences, particularly around the 
centromere, are a major site of chromosome transloca- 
tion. Recombination is suppressed in centromeric 
regions, possibly as an adaptive response. 


Insertional mutagenesis 

Creation and maintenance of dispersed repeat families 
implies generation of new inserts, with a concomitant 
incidence of gene disruption. The incidence is usually 
very low. For example, human LINE-1 damages genes 
at 1/500 the rate of base substitution. 


Influence on gene expression 
Interspersed repeats sometimes alter expression of genes adjacent to their insertion site. Expression may be increased or decreased. The effect is best demonstrated for retroviral inserts in mammals and transposons in Drosophila. The effects may be either beneficial or detrimental.


Selfish propagation 

Transposons encode a mechanism to increase their own copy number. This results in self-perpetuating conservation of the ability to transpose and in selection for more aggressive transposition. Presumably this forces adaptations on the part of the host genome to limit their numbers.


Ectopic exchange 

Ectopic exchange is the recombination between two 
loci that are nonhomologous in chromosomal position. 
Dispersed repeats provide a target for ectopic exchange 
leading to large deletions or translocations. A few 
examples are known involving human Alu. Genomes 
with large amounts of interspersed DNA presumably 
have adapted their recombinational mechanism to 
avoid a high incidence of such recombinations. 


Triplet repeat expansion 

In humans, some trinucleotide repeats show a tendency for large saltatory expansions leading to genetic disease. Examples are fragile X syndrome, myotonic dystrophy, and Huntington’s disease. There is also a tendency, seen in the evolutionary history of many proteins in many phylogenetic groups, for repeated codons to expand into short runs of a single amino acid.


Uses of Repetitive DNA 


Recombination Mapping 

Dispersed simple sequence tracts are often highly 
polymorphic in their lengths. When assayed by 
PCR, these simple sequence length polymorphisms 
(SSLPs) make spectacularly informative genetic markers for recombination mapping. They also make excellent physical markers for correlation with clone mapping and sequencing. As such, SSLPs have become the linchpins of the human genome project. The most commonly used simple sequence is (CA)n.
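Two alleles at a (CA)n locus differ only in the number of CA units, so the PCR products that span the tract differ in length by two base pairs per unit. The Python sketch below makes this concrete; the helper name `ca_repeat_lengths`, the minimum of five units, and the flanking sequences are all hypothetical choices for illustration, not part of any standard assay.

```python
import re


def ca_repeat_lengths(seq, min_units=5):
    """Return the number of CA units in each (CA)n tract found in seq.

    A tract is a run of at least min_units consecutive "CA" dinucleotides.
    """
    pattern = re.compile(rf"(?:CA){{{min_units},}}")  # e.g. "(?:CA){5,}"
    return [len(m.group(0)) // 2 for m in pattern.finditer(seq.upper())]


if __name__ == "__main__":
    # Two hypothetical alleles sharing the same (made-up) flanking sequence.
    allele_a = "GGT" + "CA" * 12 + "TTG"
    allele_b = "GGT" + "CA" * 15 + "TTG"
    units_a = ca_repeat_lengths(allele_a)[0]
    units_b = ca_repeat_lengths(allele_b)[0]
    print(units_a, units_b, "PCR length difference:", 2 * (units_b - units_a), "bp")
```

A real genotyping workflow measures product sizes on a gel or capillary instrument rather than reading sequence, but the length difference it detects is the one computed here.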


Transposons 

Transposons have received heavy usage in many 
organisms for disrupting genes, tagging genes, and 
as vector systems. 


Oncogene Isolation 

A classic example of the use of interspersed repeats to 
tag genes was the isolation of human oncogenes after 
introduction into rodent cells to screen for oncogenic 
activity. The oncogenes were retrieved based on their 
proximity to human Alu sequences. 
Fingerprinting

The collection of dispersed repetitive DNAs on a 
particular large fragment of DNA serves to identify 
that fragment when displayed by a hybridization or 
PCR-based technique. These techniques are used for 
purposes of identifying overlapping clones, or to 
analyze chromosome fragmentation patterns. 
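One simple way to picture how repeat-containing fragments identify overlapping clones is to count how many displayed fragments two clones share. The sketch below is an illustrative assumption, not a published scoring method: `shared_band_fraction` and its 2% size tolerance are hypothetical, and practical comparisons also weigh the chance that bands coincide by accident.

```python
def shared_band_fraction(bands_a, bands_b, tolerance=0.02):
    """Fraction of fragments in the smaller fingerprint that are matched in the other.

    Fragment sizes (e.g., in kb) match when they differ by at most `tolerance`
    as a proportion of the larger size. Each band may be matched only once.
    """
    if not bands_a or not bands_b:
        return 0.0
    small, large = sorted((list(bands_a), list(bands_b)), key=len)
    unused = list(large)
    matched = 0
    for size in small:
        for i, other in enumerate(unused):
            if abs(size - other) <= tolerance * max(size, other):
                matched += 1
                del unused[i]  # consume the band, then stop scanning
                break
    return matched / len(small)


if __name__ == "__main__":
    clone_1 = [1.2, 2.5, 4.0, 6.8]   # hypothetical fragment sizes
    clone_2 = [2.5, 4.05, 6.8, 9.1]
    print(shared_band_fraction(clone_1, clone_2))
```

A high shared fraction suggests, but does not prove, that two clones overlap.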


Phylogenetic Analysis 

The amplification of a family or subfamily of SINEs
or LINEs is a directional event that cannot revert.
Therefore, all of the descendant species of an ancestral 
species that harbored such an amplification are perman- 
ently marked. This property has been used to sort 
out some difficult phylogenetic relationships. 
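Because such a marker does not revert, the set of species carrying it should form a single clade, and any two marker-defined groups must be either nested or disjoint. The Python sketch below checks that condition. It assumes each insertion arose only once and is scored as simple presence or absence; the function names and taxa are hypothetical.

```python
from itertools import combinations


def compatible(group_a, group_b):
    """Two insertion-defined groups fit on one tree only if nested or disjoint."""
    return group_a <= group_b or group_b <= group_a or group_a.isdisjoint(group_b)


def conflicting_pairs(insertions):
    """Return pairs of insertions whose carrier sets cannot both mark clades.

    insertions maps an insertion name to the frozenset of species carrying it.
    """
    return [
        (name_a, name_b)
        for (name_a, set_a), (name_b, set_b) in combinations(insertions.items(), 2)
        if not compatible(set_a, set_b)
    ]


if __name__ == "__main__":
    # Hypothetical presence/absence data for species A to D.
    markers = {
        "ins1": frozenset({"A", "B"}),
        "ins2": frozenset({"A", "B", "C"}),
        "ins3": frozenset({"D"}),
        "ins4": frozenset({"B", "C"}),
    }
    print(conflicting_pairs(markers))
```

A conflicting pair signals that at least one of the assumptions (a single origin for each insertion, correct scoring) has failed for those markers.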


Population Studies 

The most recently generated SINE and LINE inserts 
are still unfixed within the population. They have 
useful properties for population studies in that the 
ancestral state is known and the insertion does not 
revert. 
The process of templated duplication, such as occurs
in RNA and DNA synthesis. 
Very little in this world is perfect, and when genetic material is replicated mistakes are made. Some, but not all, of these mistakes result in a change in the genetic information, which is termed a mutation. It is customary to express spontaneous mutation rates per gene, per base, or per base pair replicated. In bacteria, for example, mutation rates are generally found to be between 10⁻¹⁰ and 10⁻⁹ per base pair replicated. There is nothing universal about these figures, however. Among organisms with DNA genomes, mutation rates per nucleotide incorporated vary by more than 10⁴-fold (see Table 1). Surprisingly, when expressed per genome, the rates for cellular and subcellular organisms all come out at around 0.003, and for multicellular organisms slightly more. Genome sizes also vary more than 10⁴-fold, and the larger the genome, the lower the mutation rate per base pair: mutation rates per nucleotide are thus inversely related to genome size. This relationship, which shows that mutation rates are themselves highly evolved, was discovered by John W. Drake and is known as Drake’s rule; from it evolutionary geneticists have made a number of important extrapolations. Organisms with RNA genomes do not comply with Drake’s rule and have an extremely high rate of replication errors; the average mutation rate per genome for lytic RNA viruses, for example, is around 1.
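Drake’s rule can be checked directly against Table 1: multiplying genome size by the mutation rate per base should recover the per-genome column. The Python sketch below does this for the microbial rows only, since for the higher eukaryotes the table uses an ‘effective’ genome size. The dictionary simply transcribes the table; `per_genome_rate` is a hypothetical helper name.

```python
# (genome size in bases or base pairs, mutation rate per base, tabulated rate per genome)
# transcribed from the microbial rows of Table 1.
DRAKE_TABLE = {
    "Phage M13": (6.4e3, 7.2e-7, 0.0046),
    "Phage lambda": (4.9e4, 7.7e-8, 0.0038),
    "Phages T2 and T4": (1.7e5, 2.4e-8, 0.0040),
    "E. coli": (4.6e6, 5.4e-10, 0.0025),
    "Saccharomyces cerevisiae": (1.2e7, 2.2e-10, 0.0027),
    "Neurospora crassa": (4.2e7, 7.2e-11, 0.0030),
}


def per_genome_rate(genome_size, rate_per_base):
    """Expected mutations per genome replication: genome size times per-base rate."""
    return genome_size * rate_per_base


if __name__ == "__main__":
    for organism, (size, per_base, tabulated) in DRAKE_TABLE.items():
        computed = per_genome_rate(size, per_base)
        print(f"{organism:26s} computed {computed:.4f}  tabulated {tabulated:.4f}")
    per_base_rates = [rate for _, rate, _ in DRAKE_TABLE.values()]
    print("spread in per-base rates: %.0f-fold" % (max(per_base_rates) / min(per_base_rates)))
```

The computed per-genome values agree with the tabulated ones to within rounding of the inputs, even though the per-base rates span about four orders of magnitude.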

In the early post-Watson-Crick days it was tempting to suppose that the fidelity of DNA replication was largely dependent upon the hydrogen bonding of the two ‘normal’ base pairs, guanine with cytosine and adenine with thymine. Time has shown that this pairing is no more than a rough preference, which is exploited by enzymes concerned with DNA processing. The existence of transient tautomeric and ionized forms of the bases further complicates the picture, and in addition bases may undergo chemical change to forms with altered pairing properties. Examples are the spontaneous deamination of cytosine to uracil (which pairs with adenine instead of guanine) and the oxidation of guanine to 8-oxo-7,8-dihydro-2′-deoxyguanosine (which can pair with either cytosine or adenine).

Table 1 Drake’s rule concerning mutation (error) rates per generation and genome size

Organism                   Genome size              Mutation rate   Mutation rate
                           (bases or base pairs)    per base        per genome
Phage M13                  6.4 × 10³                7.2 × 10⁻⁷      0.0046
Phage lambda               4.9 × 10⁴                7.7 × 10⁻⁸      0.0038
Phages T2 and T4           1.7 × 10⁵                2.4 × 10⁻⁸      0.0040
E. coli                    4.6 × 10⁶                5.4 × 10⁻¹⁰     0.0025
Saccharomyces cerevisiae   1.2 × 10⁷                2.2 × 10⁻¹⁰     0.0027
Neurospora crassa          4.2 × 10⁷                7.2 × 10⁻¹¹     0.0030
Caenorhabditis elegans     1.8 × 10⁷*               2.3 × 10⁻¹⁰     0.004**
Drosophila                 1.6 × 10⁸*               3.4 × 10⁻¹¹     0.005**
Mouse                      8 × 10⁷*                 1.8 × 10⁻¹⁰     0.014**
Human                      8 × 10⁷*                 5 × 10⁻¹¹       0.004**

*For higher eukaryotes this is the ‘effective’ genome size, i.e., that portion in which most mutations are deleterious.
**Mutation rate per effective genome.

DNA polymerases themselves are responsible for the basic accuracy of replication. A replicative polymerase, such as the α subunit of DNA polymerase III holoenzyme (Pol III) of Escherichia coli, can reduce the error rate for base substitutions to around 10⁻⁴ to 10⁻⁵ per base pair, although this depends upon the base being replicated and the sequence context. Error rates for insertion or deletion of bases (frameshifts) are higher. Slightly lower rates (around 10⁻⁶) have been reported for Pol III holoenzyme acting in vitro and in vivo. Overall, it may be inferred that base selection by the α subunit of Pol III contributes a factor of roughly 10⁴-10⁵ to the fidelity of genome replication. A glance at Table 1, however, shows that a polymerase error rate of 10⁻⁴ or 10⁻⁵ is by no means enough to account for the fidelity of replication in E. coli (5.4 × 10⁻¹⁰ per base pair), let alone that in higher organisms. To achieve rates such as this, two important devices have evolved to correct polymerase errors. The first of these is proofreading, which is carried out in extremely close proximity to the polymerase. The proofreading function is sometimes carried out by another domain of the polymerase polypeptide itself, as with bacterial DNA polymerase I (Pol I), and sometimes by a separate protein, as in bacterial Pol III, where the polymerase function is carried out by the α subunit and the proofreading function by the ε subunit. Essentially, proofreading is the action of a 3′ to 5′ exonuclease that is more likely to excise a newly polymerized base that is mismatched than one that is correctly matched. In vitro work has shown that at least 92% of misinserted nucleotides are removed by the ε subunit of Pol III, except where the next template base correctly matches the inserted mismatched base. Overall, the presence of the ε subunit in bacterial Pol III typically reduces misincorporation frequencies to between 10⁻⁶ and 10⁻⁷. The efficiency of proofreading is determined in part by the polymerase that has undertaken the synthesis. Polymerase error rate is a function not only of the probability of inserting a mismatched base but also of the ability of the polymerase to use the mismatched base as a primer for further synthesis. A polymerase that is reluctant to continue synthesis on a mismatched primer terminus will allow much more time for proofreading to act than one that continues synthesis and so hides the mismatched base from the proofreading exonuclease. This property of a polymerase may be quite independent of its intrinsic misincorporation rate.
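The competition between extension and excision at a mismatched primer terminus can be pictured with a simplified kinetic-partitioning model, which is not taken from this article: the terminus is either removed by the exonuclease (rate constant k_exo) or buried by further synthesis (rate constant k_ext), whichever happens first. The rate constants below are arbitrary illustrative numbers, and `fraction_excised` is a hypothetical helper.

```python
def fraction_excised(k_exo, k_ext):
    """Probability that a mismatched terminus is excised before it is extended.

    Two competing first-order processes: exonuclease removal (k_exo) versus
    extension by the polymerase (k_ext). Units cancel, so any consistent units work.
    """
    if k_exo < 0 or k_ext < 0 or k_exo + k_ext == 0:
        raise ValueError("rate constants must be non-negative and not both zero")
    return k_exo / (k_exo + k_ext)


if __name__ == "__main__":
    k_exo = 12.0  # same proofreading exonuclease in both cases (illustrative)
    for label, k_ext in (("eager polymerase", 1.0), ("reluctant polymerase", 0.1)):
        print(f"{label:21s} fraction excised = {fraction_excised(k_exo, k_ext):.3f}")
```

Lowering k_ext while holding k_exo fixed raises the fraction excised, which is the text’s point that a polymerase reluctant to extend a mismatch gives proofreading more time to act, independently of how often it misinserts.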

Quantitatively more important than proofreading 
are mismatch correction processes which occur some 
way behind the replication fork. In these processes, 
enzymes remove either the incorrect base or a section 
of the newly synthesized strand that contains the 
mismatch. In bacteria the most important general 
mechanism is one which removes a long patch (around
10³ nucleotides) of newly synthesized DNA and
allows a second attempt at polymerization. General- 
ized mismatch correction operates not only on mis- 
matched bases but also on small frameshifts which 
cause one strand to loop out. The proteins involved 
in this pathway are conserved from bacteria to humans 
and the mechanism is an important defense against 
cancer. To be effective, generalized mismatch correc- 
tion must not only recognize the presence of a mis- 
match but also distinguish which strand is parental 
and which is newly synthesized. In E. coli this is 
achieved by means of a methylation tag. Soon, but not 
immediately, after polymerization a methyl group is 
attached to adenine residues at specific sequences in 
the DNA. Until this is done the newly synthesized 
strand can be recognized by the absence of the methyl 
groups and there is thus a window of time in which 
mismatch correction can take place. Other bacteria 
and higher organisms have other ways of distin- 
guishing newly synthesized from parental strands 
of DNA, most of which remain cloaked in mystery. 
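The strand-discrimination step in E. coli can be reduced to a simple decision. (In E. coli the tag is added by the Dam methylase to the adenine of GATC sequences.) The Python sketch below is only that decision, under the assumption that a hemimethylated site is read as “the unmethylated strand is new”; the enum and function names are hypothetical and the enzymology is not modeled.

```python
from enum import Enum
from typing import Optional


class Methylation(Enum):
    METHYLATED = "methylated"
    UNMETHYLATED = "unmethylated"


def strand_to_correct(top, bottom) -> Optional[str]:
    """Pick the strand to excise at a mismatch, using methylation as the tag.

    Returns "top" or "bottom" for a hemimethylated site (the unmethylated strand
    is taken to be the newly synthesized one), or None if the strands cannot be
    told apart.
    """
    if top is Methylation.METHYLATED and bottom is Methylation.UNMETHYLATED:
        return "bottom"
    if top is Methylation.UNMETHYLATED and bottom is Methylation.METHYLATED:
        return "top"
    # Both methylated: the window for strand discrimination has closed.
    # Both unmethylated: there is no parental mark to go by.
    return None


if __name__ == "__main__":
    print(strand_to_correct(Methylation.METHYLATED, Methylation.UNMETHYLATED))
    print(strand_to_correct(Methylation.METHYLATED, Methylation.METHYLATED))
```

The `None` case for fully methylated DNA corresponds to the closing of the time window described above.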
The combined operation of proofreading and of the mismatch correction processes that follow replication enables the overall error frequency to be reduced to 10⁻⁹ or less per base pair replicated.
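If the stages act independently, their contributions multiply: the final error rate is the misinsertion rate from base selection times the fraction of errors each later stage fails to correct. The Python sketch below assumes that independence. The 0.08 proofreading figure corresponds to 92% removal; the base-selection value is merely illustrative and the mismatch-correction value is hypothetical, because the text gives no single number for it.

```python
from math import prod


def overall_error_rate(misinsertion_rate, *surviving_fractions):
    """Errors per base pair remaining after every correction stage.

    misinsertion_rate: errors per base pair left by base selection alone.
    surviving_fractions: for each later stage, the fraction of incoming errors it misses.
    """
    for fraction in surviving_fractions:
        if not 0.0 <= fraction <= 1.0:
            raise ValueError("each surviving fraction must lie between 0 and 1")
    return misinsertion_rate * prod(surviving_fractions)


if __name__ == "__main__":
    base_selection = 1e-5      # illustrative
    after_proofreading = 0.08  # 92% of misinsertions removed
    after_mismatch = 1e-3      # hypothetical
    print(overall_error_rate(base_selection, after_proofreading))
    print(overall_error_rate(base_selection, after_proofreading, after_mismatch))
```

Because each stage removes a fraction of the errors that reach it, a stage that is individually modest still multiplies the fidelity of the stages before it.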

While most replicative DNA polymerases have error rates below 10⁻⁴, some specialized polymerases exist with much higher error rates. DNA polymerases IV and V in E. coli, for example, have error rates of the order of 10⁻⁴ to 10⁻³ and have specialized roles in the generation of genetic variability and in synthesizing past damage in the template strand. Other error-prone polymerases are suspected of being responsible for the somatic hypermutation that occurs in mammalian immunoglobulin genes.


See also: DNA Repair; DNA Replication; Genome 
Size 
The replication eye is a region within a longer,
unreplicated region, in which DNA has undergone 
replication. 


See also: Replication 


Replication Fork 
The replication fork is the point at which DNA
strands are separated in preparation for replication. 
Replication forks thus move along the DNA as 
replication proceeds. 


See also: Okazaki Fragment; Origin (ori); 
Replication 
A unit of replication consisting of an origin of replication, a terminator on one or both sides, and the segment of adjacent DNA under the control of the origin and terminator(s).


See also: Replication 


Replisome 
A replisome is a complex of proteins involved in the
replication of DNA which moves along as the new 
complementary strand is synthesized. Main compon- 
ents include DNA polymerase III and a primosome. It 
has been suggested that an RNA replisome may be an 
evolutionary ancestor of the ribosome. 


See also: DNA Polymerases; Primosome; 
Replication 
A reporter gene is one that encodes an easily assayed
product (e.g., chloramphenicol transacetylase) that is 
coupled to a promoter of interest and transfected into 
cells. Expression of the gene (under different condi- 
tions, or in the presence of other factors) can be used 
to assay promoter function. 


See also: Promoters 


Repression 
Repression is the ability of bacteria to prevent synthesis of certain enzymes when their products are present. It is caused by inhibition of transcription or translation through the binding of a repressor protein to a specific site on DNA or mRNA.


See also: Repressor 


Repressor 
I Schildkraut
A repressor is a protein that binds to a short, specific DNA sequence and controls the expression of a gene or operon. A repressor is a negatively acting regulatory protein: it binds to the operator region of a promoter and thereby reduces the ability of RNA polymerase to transcribe the gene or operon. Because a repressor binds only to its own operator sequence, it does not control other genes or operons. A repressor can also bind a small molecule, which is called an effector. There are two types of effectors. One type is called an inducer. When an inducer is bound to its repressor, the repressor loses its ability to bind to its operator sequence; in the absence of the inducer, the repressor binds to its operator. The other type of effector is called a corepressor. When a small-molecule corepressor is bound to its repressor, the repressor gains the ability to bind to its operator; in the absence of the corepressor, the repressor does not bind to the operator. When repressors are not bound to their cognate operator, the gene or operon can be transcribed by RNA polymerase.
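The two kinds of effector invert the same switch, which a small truth table captures. The Python sketch below is a deliberately Boolean abstraction: it treats the operator as simply occupied or free and ignores partial occupancy and any other regulators acting on the promoter. The enum and function names are hypothetical.

```python
from enum import Enum


class Effector(Enum):
    """The two kinds of small-molecule effector described in the text."""
    INDUCER = "inducer"          # e.g., the lac system
    COREPRESSOR = "corepressor"  # e.g., the trp system


def repressor_on_operator(effector, effector_bound):
    """Return True if the repressor occupies its operator."""
    if effector is Effector.INDUCER:
        # An inducer-bound repressor loses its affinity for the operator.
        return not effector_bound
    # A corepressor-bound repressor gains affinity for the operator.
    return effector_bound


def operon_transcribed(effector, effector_bound):
    """RNA polymerase can transcribe whenever the operator is free."""
    return not repressor_on_operator(effector, effector_bound)


if __name__ == "__main__":
    for effector in Effector:
        for bound in (False, True):
            state = "on" if operon_transcribed(effector, bound) else "off"
            print(f"{effector.value:12s} bound={bound!s:5s} -> operon {state}")
```

Inducer systems are therefore “off unless the effector is present,” while corepressor systems are “on unless the effector is present.”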

The lac repressor of E. coli is a well-studied ex- 
ample of a repressor whose effector is an inducer. 
The lac repressor controls the expression of the lactose 
operon, which is responsible for the metabolism of 
lactose. The lactose operon is composed of three 
genes and all three are transcribed into a single poly- 
cistronic messenger RNA. In the absence of lactose in 
the medium a bacterium has no need to produce the 
proteins necessary for the metabolism of lactose. The 
lac repressor ensures that the cell will not waste 
resources by transcribing the lac operon. The lac repressor has two binding sites. One is specific for the operator sequence on DNA and the other is specific for the inducer. When lactose is added to the medium, it is transported into the cell, where some of it is converted to allolactose, the natural inducer, which binds to the lac repressor. After the lac repressor binds the inducer, it undergoes a slight alteration in its structure and no longer has an affinity for the lac operator. The genes necessary for the utilization of lactose are then transcribed and translated, and lactose can be utilized as a carbon source. Once the lactose is depleted, the inducer no longer binds to the repressor; the repressor’s structure returns to the uninduced state, and it binds to the lac operator, blocking transcription of the operon.


The trp repressor is an example of a repressor that 
requires a corepressor in order to bind to its operator. 
The amino acid tryptophan is the product of the 
enzymes encoded by the trp operon. The trp repressor 
binds to its operator only when there is a sufficient 
level of tryptophan in the cell. Tryptophan binds 
to the trp repressor, which in turn binds to the trp 
operator, blocking transcription of those genes that 
encode the enzymes necessary for the manufacture 
of tryptophan. 


See also: lac Operon; Operon 
Introduction and Definitions


Reproductive isolation is the reduction or cessation of 
reproduction between members of different species 
compared to that seen between individuals of the 
same species. The importance of reproductive isol- 
ation is that it is the keystone of the popular ‘biological 
species concept’ (henceforth BSC) introduced by 
Ernst Mayr and Theodosius Dobzhansky in the 
1930s. The BSC is an attempt to capture in words the 
discrete groups of organisms known as ‘species,’ and 
is the working definition of species most often used 
by modern evolutionary biologists. 

Dobzhansky was the first to connect the discrete- 
ness of species in nature with the fact that members of 
different species are reproductively isolated. As he 
noted: 


...no discrete groups of organisms differing in more than a
single gene can maintain their identity unless they are pre- 
vented from interbreeding with other groups... Hence the 
existence of discrete groups of any size constitutes evidence 
that some mechanisms prevent their interbreeding, and thus 
isolate them. 


This idea led directly to the BSC. As defined by Mayr, a species consists of a group of interbreeding populations that is reproductively isolated from other populations living in the same area. Thus, two individuals whose gametes can unite and produce a fertile hybrid when they co-occur in the wild are considered conspecific; otherwise they belong to different species. This definition, of course, applies only to sexually reproducing organisms. It is not yet clear whether asexually reproducing species, such as bacteria, form taxa just as discrete as those which reproduce sexually.


Reproductive Isolating Mechanisms 


The various factors that prevent gene flow between 
species are known as ‘reproductive isolating mechanisms,’
and are categorized in Table 1. They include
factors preventing members of different species from 
mating or forming zygotes (‘prezygotic isolation’), 
and those acting after fertilization (‘postzygotic isol- 
ation’) that lead to sterile or inviable hybrids. The 
recognition that species are characterized by these 
mechanisms (a recognition that eluded Darwin, who 
did not have a clear notion of species) is important, 
for it leads immediately to a research program for 
studying speciation: the origin of species becomes 
equivalent to the origin of reproductive isolating 
mechanisms. Evolutionists have thus grown intensely 
interested in understanding the various mechanisms 
that prevent gene flow between naturally occurring
species, and geneticists in finding, counting, and char- 
acterizing the genes that cause reproductive isolation. 
It is important to realize that reproductive isolating 
mechanisms do not usually evolve as a result of selec- 
tion to prevent gene flow between species. Rather, 
such mechanisms are most often accidental by- 
products of divergent evolution — due to either natural 
selection or genetic drift — between physically isolated 
populations. For example, geographically isolated 
populations of a species may undergo divergent sex- 
ual selection, so that each develops different male 
characteristics and female preferences for those charac- 
teristics. If these populations later come to inhabit the 
same area, females from one population may no longer 
recognize males of the other population as appropriate 
mates, and the two populations would have attained 
species status based on sexual isolation. Similarly, eco- 
logical isolation can occur when isolated populations 
adapt to different niches, and then retain the niche 
differences when their ranges overlap. Postzygotic isolating mechanisms, such as hybrid sterility and inviability, may arise simply because of the divergence of the developmental and reproductive systems of species.


Genetic Analysis of Reproductive 
Isolation: Principles 


Reproductive isolation is unique in evolution because 
it is not a trait possessed by members of a single 
species, but a composite character that is the joint 
property of a pair of species. A single species can be 
reproductively isolated only with respect to another. 
Moreover, by its very nature, reproductive isolation is a trait that almost always involves epistatic interaction between alleles, but between alleles occurring in different species. Hybrid inviability, for example, results from genes that produce normal viability in members of their own species but are lethal when interacting with alien genes in hybrids. Similarly, sexual isolation arises when females that have evolved to prefer the traits of conspecific males encounter the different traits of males of other species. This composite and epistatic nature of reproductive isolation guarantees not only that speciation will show emergent genetic and phenotypic properties not seen in studies of a single species (e.g., Haldane’s rule; see below), but also that mathematical theories of speciation will be different from, and perhaps more complicated than, models of evolution in single lineages. While genetic analysis of reproductive isolation has been carried out since the mid-1930s, mathematical theories of speciation are only now beginning to appear.

Table 1 Reproductive isolating mechanisms

Prezygotic isolation (factors preventing members of different species from forming fertilized eggs)
1. Ecological isolation: members of different species live in the same general area, but confine mating or reproduction to different habitats, so that hybrids are not formed
2. Temporal isolation: members of different species mate at different times of day or year, preventing gene flow
3. Sexual isolation: members of different species do not mate because of a lack of cross-attraction; this can be due to differences in behavior, pheromones, mating calls, or (in plants) different pollinators
4. Mechanical isolation: members of different species cannot copulate effectively because of physical incompatibility of the genitalia or sperm transfer organs
5. Gametic isolation: sperm or pollen of one species cannot properly fertilize eggs of the other species; this may be due to poor viability of gametes in the sexual ducts of another species, or to chemical or physical incompatibility between gametes of different species

Postzygotic isolation (factors preventing members of different species from producing viable and fertile offspring once a hybrid egg is formed)
1. Hybrid inviability: hybrid zygotes are formed, but hybrids are either inviable or have reduced viability in the F1 or later generations
2. Hybrid sterility: adult hybrid individuals are formed and are viable, but are sterile or semisterile in the F1 or later generations

There are several reasons for studying the genetic 
basis of speciation. First, just as with a trait that 
evolves within a lineage, one wants to know whether 
a reproductive isolating mechanism has a ‘simple’ 
genetic basis (i.e., involves only a few genes of large 
effect) or is based on the accumulation of many genes. 
The number of genes involved may, in turn, allow 
inferences about the evolutionary process producing 
reproductive isolation. For example, if the difference 
in plumage color between males of two sexually 
dimorphic bird species is due to many genes of small 
effect, one may posit that these differences arose by
sexual selection during which the male trait evolved
step-by-step in concert with the female preference for
that trait.

Similarly, the pattern of genetic differences causing 
reproductive isolation may give clues to the under- 
lying evolutionary processes. It has been found, for 
example, that there are often many more genes causing 
hybrid male than female sterility between closely 
related species of Drosophila. This has led to the idea 
that hybrid sterility may result from sexual selection 
acting in isolated populations. Such selection, based 
on female choice, may cause more evolutionary 
change in males than in females, leading to the prefer- 
ential sterility of male hybrids as an accidental out- 
come. Finally, genetic analysis can help localize small 
sections of chromosomes containing genes causing 
reproductive isolation, a necessary prelude to cloning 
and sequencing these genes. Such molecular work is 
essential for understanding the developmental basis of 
reproductive isolation, including the question of how 
a gene that works normally within a species causes 
deleterious effects in hybrids. At this writing we 
understand the developmental basis of only one case 
of reproductive isolation: the formation of lethal 
melanomas in hybrids between the swordtail and
platyfish. This hybrid lethality is based on an oncogene
in one species that is normally suppressed by
another gene in the same species; the absence of sup- 
pression in hybrids causes the appearance of tumors. 


Ideally, a study of the genetics of speciation should 
involve only reproductive isolating mechanisms that 
evolved up to the point at which gene exchange 
between populations was first reduced to zero, for it 
is at that point that speciation is complete. Because of 
divergent evolution, however, reproductive isolation 
continues to accumulate even after species cannot 
exchange genes, but such isolation is incidental to 
speciation. A proper study of speciation thus requires 
identifying the isolating mechanisms leading up to 
complete isolation (there may, of course, be more 
than one). This is not easy, as it requires finding
either incipient species that have not yet
evolved complete reproductive isolation, or species 
in which gene flow is prevented by only a single 
form of reproductive isolation. This has been possible 
in some cases, as with polyploidy in plants (see 
below), but in no group of animals or plants have 
there been systematic attempts to determine which 
forms of reproductive isolation are the first to evolve. 
Instead, there are only tentative conclusions based on 
general impressions. It has been suggested, for ex- 
ample, that sexual isolation is the most important 
factor causing speciation in birds, as closely related 
species do not hybridize in the wild but will produce 
fertile hybrids when forcibly crossed in the labora- 
tory. Such suppositions are intriguing, but neglect 
possible ecological isolation, and must be buttressed 
by systematic analysis of populations at different 
stages of evolutionary divergence. 


Genetic Analysis of Reproductive 
Isolation: Data 


Summary of Existing Work 

Genetic dissection of reproductive isolation has taken 
several forms. In some cases, important components 
of isolating mechanisms can be shown to Mendelize in 
hybrids, and thus are probably based on alleles of a 
single gene. For example, a simple difference in the 
structure of female pheromones between two sexually 
isolated strains of the European corn borer, Ostrinia
nubilalis, exhibits simple Mendelian ratios in back-
crosses and F3s.

In many cases, stocks containing morphological 
mutants or molecular markers can be used in species 
crosses to identify segments of chromosomes contrib- 
uting to reproductive isolation. In such cases, species 
are crossed, and the degree of reproductive isolation 
in backcrosses or F2s can be correlated with the
genetic constitution of the progeny to determine 
which markers are linked to genes causing reproductive isolation. Figure 1 shows one such case: a genetic dissection of sterility in male hybrids between Drosophila mauritiana and D. sechellia. Here we see that all three of the major chromosome arms carry genes causing sterility, but that the X chromosome has by far the greatest effect. (These ‘X-effects’ are seen in
many hybridizations and may reflect the fact that 
genes causing sterility behave recessively in hybrids, 
thus being expressed more strongly when on hemizy- 
gous than on heterozygous chromosomes. This reces- 
sivity may, in turn, explain a common pattern of 
speciation first noticed by J.B.S. Haldane and now 
called ‘Haldane’s rule’: if species’ crosses produce 
only one gender of offspring that is sterile or inviable 
— while the other is fertile or viable — the afflicted sex 
is nearly always the heterogametic one. This general- 
ization is true regardless of which sex is heter- 
ogametic.) With the advent of molecular technology, 
DNA markers can now be localized at high density 
along chromosomes, and any such markers that differ 
between species can be used to map reproductive 
isolation. 

Finally, the number of genes involved in a repro- 
ductive isolating mechanism — though not their loca- 
tion — can also be inferred from biometrical analysis, in 
which the means and variances of a character in pure 
species, and in F1, F2, or backcross hybrids can, under
certain assumptions, yield estimates of the number of 
genes involved in a character difference. If the trait is 
involved in reproductive isolation, such a method can 
be used in genetic studies of speciation. 
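The article does not name a specific biometrical method. One classical example, swapped in here for illustration, is the Castle-Wright estimator: the effective number of loci is n = (mean P1 − mean P2)² / (8 × segregation variance), where the segregation variance is the F2 variance minus an environmental variance estimated from the parents and the F1 (one common weighting is (Var P1 + Var P2 + 2 Var F1) / 4). It assumes additive loci of equal effect, fixed in opposite directions in the two species and unlinked, so it tends to underestimate the true number. The Python sketch below encodes only that formula; the function name and the numbers in the usage line are hypothetical.

```python
def castle_wright(mean_p1, mean_p2, var_p1, var_p2, var_f1, var_f2):
    """Castle-Wright estimate of the effective number of loci for a trait difference.

    Segregation variance = F2 variance minus environmental variance, where the
    environmental variance is estimated as (var_p1 + var_p2 + 2 * var_f1) / 4.
    """
    environmental = (var_p1 + var_p2 + 2 * var_f1) / 4
    segregation = var_f2 - environmental
    if segregation <= 0:
        raise ValueError("F2 variance must exceed the environmental variance")
    return (mean_p1 - mean_p2) ** 2 / (8 * segregation)


if __name__ == "__main__":
    # Hypothetical trait measurements for two species and their hybrids.
    print(castle_wright(mean_p1=10.0, mean_p2=0.0,
                        var_p1=1.0, var_p2=1.0, var_f1=1.0, var_f2=3.5))
```

Dominance, epistasis, linkage, or unequal gene effects all push the estimate downward, which is why such figures are best read as minimum numbers of genes.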

There are few genetic analyses of speciation that 
have used these methods to get fairly accurate num- 
bers of genes causing reproductive isolation. All exist- 
ing studies are summarized in Table 2. The table 
includes all investigations of characters that (1) are 
likely to be involved in reproductive isolation between 
species, (2) used one of the three methods of genetic 
analysis described above, and (3) surveyed enough of 
the genome so that one can obtain a fairly accurate 
estimate of the minimum number of genes involved in 
reproductive isolation. 

Although the table shows that the genetic basis of 
reproductive isolation runs the gamut from simple to 
polygenic bases, the data are less than satisfactory for 
several reasons. First, a high proportion of the work is 
on Drosophila because of the genetic tools available in 
the genus and its traditional use as a model organism 
in evolutionary genetics. Most of the remaining data 
come from studies of the monkeyflower Mimulus. 
Obviously, we need similar data from a more diverse 
group of organisms. In addition, reproductive isolating mechanisms are not evenly represented. Mechanisms that are easy to study, such as hybrid sterility and inviability, are well represented; those less tractable, such as behavioral isolation, have been studied less often; and those that are very difficult to study, such as ecological and temporal isolation, are not represented at all. A broader and more accurate survey of the genetics of speciation will soon be possible with the advent of molecular techniques for gene mapping, but even then it must await conclusions about which isolating mechanisms cause speciation in different groups.

Figure 1 Typical genetic analysis of hybrid sterility in males, a form of postzygotic reproductive isolation. In crosses between the two closely related species Drosophila mauritiana and D. sechellia, female hybrids are fertile but male hybrids are sterile. Genetic analysis of this sterility proceeded by crossing the two species, one of which (D. mauritiana) had its chromosomes marked with morphological mutations. The F1 female hybrids were backcrossed to the marked D. mauritiana stock, producing a genetically diverse array of backcross progeny whose genetic constitution could be identified by the presence or absence of the markers. With one marker per major chromosome, there are eight genotypic classes of progeny; a sample of males from each class was analyzed for fertility, measured as the proportion of males of each genotype having at least some motile sperm. This graph shows that, compared to the backcross class having all chromosomal markers from D. mauritiana, each foreign chromosome from D. sechellia reduces male fertility. The foreign X chromosome, however, has by far the largest effect on sterility, a common observation in similar crosses among different pairs of species.

The sections below summarize what is known 
about the two most frequently studied classes of 
reproductive isolating mechanisms: sexual and post- 
zygotic isolation. We note again that ecological and 
temporal isolation — the preference of different species 
for living in different niches within an environment or 
for mating at different times — are likely to be import- 
ant causes of speciation in many groups (particularly 
plants), but have been completely neglected in genetic 
studies. 


Ethological Isolation 

It is likely that ethological isolation is an important 
primary reproductive isolating mechanism for several 
reasons. First, it is often present when other forms of 
reproductive isolation seem absent. As mentioned 
above, many animals (e.g., birds and frogs) seem to 
lack postzygotic or obvious ecological isolation and yet 
differ in plumage characters or mating calls that may 
attract conspecifics but repel members of other spe- 
cies. In addition, the degree of speciation within 
several groups of birds appears to be positively cor- 
related with both the degree of sexual dimorphism 
and the amount of polygamy of species within each 
group. Increased sexual dimorphism and polygamy 
provide greater opportunities for sexual selection, 
and their correlation with speciosity suggests that sexual selection might play a major role in adaptive radiations in birds.

Table 2 Summary of existing genetic analyses of reproductive isolation between closely related species, giving the trait studied and the number of genes involved in the species difference

Species pair(a)                         Trait                                  Number of genes(b)
Drosophila heteroneura/D. silvestris    Head shape                             9
Drosophila melanogaster/D. simulans     Hybrid inviability                     >9
                                        Female pheromones                      >5
Drosophila mauritiana/D. simulans       Hybrid male sterility                  >15
                                        Hybrid female sterility                >4
                                        Hybrid inviability                     >5
                                        Male sexual isolation                  >2
                                        Female sexual isolation                >3
                                        Genital morphology                     >9
                                        Shortened copulation                   >3
Drosophila mauritiana/D. sechellia      Female pheromones                      >6
                                        Hybrid male sterility                  >3
Drosophila simulans/D. sechellia        Hybrid male sterility                  >6
                                        Hybrid inviability                     >2
                                        Female sexual isolation                >2
Drosophila mojavensis/D. arizonae       Hybrid male sterility                  >3
                                        Male sexual isolation                  >2
                                        Female sexual isolation                >2
Drosophila pseudoobscura/D. persimilis  Hybrid male sterility                  >9
                                        Hybrid female sterility                >3
                                        Sexual isolation                       >3
Drosophila pseudoobscura USA/Bogota     Hybrid male sterility                  >5
Drosophila buzzatii/D. koepferae        Hybrid male inviability                >4
                                        Hybrid male sterility                  >7
Drosophila subobscura/D. madeirensis    Hybrid male sterility                  >6
Drosophila virilis/D. littoralis        Hybrid female viability                >5
Drosophila virilis/D. lummei            Male courtship song                    >4
                                        Hybrid male sterility                  >6
                                        Hybrid female sterility                >2
                                        Hybrid inviability                     >4
Drosophila montana/D. texana            Hybrid female inviability              >2
Drosophila virilis/D. texana            Hybrid male sterility                  >3
Drosophila auraria/D. biauraria         Male courtship song                    >2
Ostrinia nubilalis, Z and E races       Female pheromones                      1
                                        Male perception of pheromones          2
Laupala paranigra/L. kohalensis         Song pulse rate                        >8
Spodoptera latifascia/S. descoinsi      Pheromone blend                        1
Xiphophorus helleri/X. maculatus        Hybrid inviability                     2
Mimulus lewisii/M. cardinalis           8 floral traits                        1-3 per trait
Mimulus guttatus/M. micranthus          Bud growth rate                        8
                                        Duration of bud development            10
Mimulus, four taxa                      Flowering time and five floral traits  5-13 per trait
Mimulus guttatus populations            Hybrid inviability                     2 (system 1)
                                                                               >2 (system 2)
Mimulus guttatus/M. cupriphilus         Flower size                            3-7
Helianthus annuus/H. petiolaris         Pollen viability                       >14

(a) Literature citations to these studies, as well as additional information, are given in Coyne and Orr (1998).
(b) Indicates minimum number of genes.

Finally, according to a popular hypothesis called 
‘reinforcement,’ sexual isolation may play an import- 
ant role in completing speciation. According to this 
theory, geographically isolated populations begin to 
diverge evolutionarily, incidentally acquiring genetic 
changes that can cause partial postzygotic isolation in 
their hybrids. When such populations later occupy the 
same area, their hybridization produces maladaptive 
offspring. This places a selective premium on indi- 
viduals who avoid mating with members of the other 
population, as those who mate with their own type 
leave a greater number of viable offspring. In this way 
the initial existence of incomplete postzygotic isol- 
ation can select for increased prezygotic isolation, 
which can become strong enough to reduce gene 
flow to zero, completing speciation. We now have 
comparative and observational evidence from nature 
that reinforcement can increase sexual isolation; and 
mathematical theories show that the process may 
occur under a broad range of conditions. 
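The selective premium described above can be made concrete with a toy calculation. In this sketch (all names and numbers hypothetical), a female encounters a mate from the other population with probability q, a choosiness parameter c (0 = indiscriminate, 1 = never mates with the other type) reduces heterotypic matings to q(1 - c), and hybrid offspring have relative fitness w below 1. Expected viable offspring are then 1 - q(1 - c)(1 - w), so any increase in c raises fitness whenever w < 1. It assumes choosiness itself costs nothing and that choosy females always find a mate.

```python
def expected_viable_offspring(q, hybrid_fitness, choosiness):
    """Relative number of viable offspring for a female in a zone of contact.

    q: probability that a randomly encountered mate is from the other population.
    hybrid_fitness: viability of hybrid offspring relative to pure offspring (0 to 1).
    choosiness: 0 = mates at random, 1 = never mates with the other population.
    Assumes choosiness is cost-free and choosy females always find a mate.
    """
    heterotypic = q * (1.0 - choosiness)  # fraction of matings that produce hybrids
    return (1.0 - heterotypic) + heterotypic * hybrid_fitness


for c in (0.0, 0.5, 1.0):
    print(c, expected_viable_offspring(q=0.5, hybrid_fitness=0.4, choosiness=c))
```

If hybrids were fully fit (hybrid_fitness = 1), choosiness would bring no advantage, which is why reinforcement presupposes some postzygotic isolation.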

The paucity of studies given in Table 2 allows us to draw only a few tentative conclusions about ethological isolation. First, it can be based on only a few
genes (as in the case of corn borers, where the two 
races appear to differ at only three loci, affecting 
respectively the female pheromone, its perception by 
the male, and the male response after perception) or on 
many genes (as in D. mauritiana/D. sechellia hybrids, 
where differences in the female pheromone alone are 
based on at least five loci). In addition, in all three cases 
in which sexual isolation was studied in both males 
and females of a species pair, the loci causing differ- 
ences in the male trait differed from those loci causing 
differences in how females perceive the traits. This is 
not surprising, as the developmental bases of male 
traits almost certainly differ from those of female 
perception. Finally, sexual isolation between species 
is often asymmetric: that is, it is often easy to cross 
males of species A to females of species B, but much 
more difficult to make the reciprocal cross. This pat- 
tern is common in Drosophila and amphibians, but has 
not yet received a satisfactory explanation. It may be 
related to the type of sexual selection that produces 
sexual isolation as a byproduct. 


Postzygotic Isolation 

Historically, postzygotic isolation has been attributed 
to three major factors: changes in the numbers of 
entire chromosome sets (polyploidy), changes in the 
structure of individual chromosomes, and changes in 
the sequence of genes. Changes in chromosome number or structure are thought to cause only hybrid sterility, while genic changes can produce both forms of reproductive isolation. Recent molecular work has suggested two other possible causes of postzygotic isolation, both thought to act primarily through hybrid inviability: interspecific differences in the numbers or kinds of transposable elements, and cytoplasmic incompatibility caused by infectious microorganisms such as Wolbachia.

There is no doubt that polyploidy is a major cause 
of speciation, especially in plants. It has been estim- 
ated, for example, that 65% of angiosperms are of 
polyploid origin. The formation of polyploid plants 
is well understood (see Polyploidy), new polyploid 
species have been created in the laboratory and have 
been observed arising in nature, and the ancestry of 
existing polyploids has been reconstructed using 
chromosomal, genetic, and molecular analysis. As 
the formation of a new polyploid species involves 
only the formation of a semisterile F1 species hybrid,
followed by chromosome doubling, the process 
requires only a few generations, and is thus the fastest 
known form of speciation. Moreover, polyploid 
speciation requires no changes in genes, but only in 
the number of sets of chromosomes. 
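Why chromosome doubling restores fertility, and at the same time isolates the new polyploid from its parents, can be shown by counting meiotic pairing partners. This is a deliberately simplified sketch: it assumes chromosomes pair only with an identical homolog (no pairing between the partly similar chromosomes of the two parental species), and the chromosome numbers and labels are hypothetical. Unpaired chromosomes (univalents) segregate irregularly and yield unbalanced gametes.

```python
from collections import Counter


def haploid_set(genome, n):
    """Hypothetical labels for one haploid chromosome set, e.g. A1..An."""
    return [f"{genome}{i}" for i in range(1, n + 1)]


def meiotic_pairing(chromosomes):
    """Return (bivalents, univalents), assuming only identical homologs pair."""
    counts = Counter(chromosomes)
    bivalents = sum(c // 2 for c in counts.values())
    univalents = sum(c % 2 for c in counts.values())
    return bivalents, univalents


a = haploid_set("A", 4)  # gamete of species A (hypothetical n = 4)
b = haploid_set("B", 3)  # gamete of species B (hypothetical n = 3)

f1_hybrid = a + b                # no homologs, so all univalents: semisterile
allotetraploid = f1_hybrid * 2   # chromosome doubling gives every chromosome a partner
backcross = (a + b) + a          # tetraploid gamete x species A gamete (a triploid)

for name, karyotype in [("F1", f1_hybrid), ("doubled", allotetraploid), ("backcross", backcross)]:
    print(name, len(karyotype), meiotic_pairing(karyotype))
```

The backcross result (bivalents plus leftover univalents) illustrates why a new allopolyploid is reproductively isolated from its diploid parents.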

Changing the structure of individual chromo- 
somes, on the other hand, is a more controversial 
mode of speciation. This idea is based on the observa- 
tion that some types of chromosomal rearrangements, 
such as translocations or pericentric inversions, can 
render their heterozygous carriers semisterile because 
of meiotic problems caused by either improper segregation or recombination within rearranged regions. If such chromosomal differences were to become fixed in different populations, the hybrids between them might be sterile, and the populations would then represent different species.

There are several problems with this scenario. First, a chromosome rearrangement causing semisterility of heterozygotes would have difficulty rising to high frequency in a population, and could do so only through strong genetic drift in very small populations. Second, the observation of species fixed for different rearrangements does not necessarily prove that those rearrangements were instrumental in speciation, for such rearrangements might have been fixed after speciation had already been completed. Third, meiotic
problems can result from both genetic and chromo- 
somal differences between species, so the observation 
of differences in chromosome structure and of aber- 
rant products of meiosis does not prove that the for- 
mer cause the latter. Finally, many good species show 
no detectable differences in chromosome structure. In 
Drosophila, for example, many closely related species 
are homosequential, i.e., have identical chromosome 
banding patterns. Although there has been a great deal 
of speculation about chromosomal speciation, there is 
not a single unambiguous case of primary reproduc- 
tive isolation having arisen in this way, and the process 
therefore remains controversial. 
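The first difficulty above follows from a standard one-locus model of heterozygote disadvantage (underdominance). In this sketch, rearrangement homozygotes and standard homozygotes are assumed equally fit and heterozygotes lose a fraction s of their fitness; s and the starting frequencies are hypothetical. In an effectively infinite, randomly mating population the rearrangement is eliminated whenever its frequency starts below one half, so it can reach fixation only if drift first carries it past that threshold, which is plausible only in very small populations.

```python
def next_frequency(p, s):
    """One generation of selection against heterozygotes.

    Genotype fitnesses: RR = 1, RS = 1 - s, SS = 1, where p is the
    frequency of the rearranged chromosome R (random mating assumed).
    """
    q = 1.0 - p
    mean_fitness = p * p + 2 * p * q * (1 - s) + q * q
    return (p * p + p * q * (1 - s)) / mean_fitness


def iterate(p, s, generations):
    for _ in range(generations):
        p = next_frequency(p, s)
    return p


for start in (0.1, 0.4, 0.5, 0.6):
    print(start, round(iterate(start, s=0.5, generations=100), 4))
```

The frequency of one half is an unstable equilibrium: starting points just below it decline to loss and points just above it rise to fixation.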

There is no controversy, on the other hand, about 
the importance of genetic differences in postzygotic 
isolation. Genetic analysis (Table 2) has repeatedly 
shown that reproductive isolation maps to restricted 
sites on the chromosomes, and in one case (see above) 
the relevant genes have been cloned and sequenced. 

By 1940, it was realized by both H.J. Muller and 
Theodosius Dobzhansky that genetic postzygotic 
isolation must involve changes in at least two loci, 
for a single allele causing hybrid inviability or sterility 
would produce lethal or sterile heterozygotes and 
could not be fixed. Figure 2 shows their simple two- 
locus model of how postzygotic isolation might 
evolve between isolated populations. It is clear from 
this model that hybrid unfitness can evolve as a simple 
byproduct of either adaptive evolution or drift occur- 
ring in physically separated populations, that post- 
zygotic isolation can evolve without any population 
itself having experienced maladaptive evolution, and 
that postzygotic isolation involves strong epistasis 
between alleles fixed in the different populations. 

[Diagram: ancestral species aabb splits into Population 1 (AAbb) and Population 2 (aaBB); their A_B_ hybrids are sterile or inviable.]

Figure 2 The Dobzhansky-Muller two-locus model for the evolution of postzygotic reproductive isolation. An ancestral species has genetic constitution aabb. This species is then geographically fragmented into two populations. One population undergoes evolution substituting the A for the a allele. In the other population the genetic constitution at the first gene remains unchanged, but at the other gene the B allele replaces the b allele. These evolutionary changes can occur by either natural selection or genetic drift and need not be maladaptive. However, the hybrids may be maladapted if there are deleterious interactions between the A and B alleles, so that the A_B_ genotype is completely sterile or inviable. The two populations would then have become separate species, producing deleterious hybrids even though neither population experienced maladaptive evolution.

It is not necessary, of course, for postzygotic isolation to involve only two genes, as the interaction between several or many alleles might be necessary to cause hybrid sterility or inviability. Both two-locus and
many-locus interactions have been found in genetic 
studies, confirming the generality of the Dobzhansky- 
Muller model. With the advent of molecular mapping 
and sequencing, the nature of these interactions, and 
the manner in which they afflict hybrids, will occupy 
experimental evolutionary geneticists for many years. 
Moreover, although there have been few mathematical 
models of speciation to date, theoreticians are beginning to incorporate the Dobzhansky-Muller model
into various models of speciation, including those 
designed to examine the likelihood of reinforcement, 
the time course for the evolution of postzygotic isol- 
ation, and the processes likely to explain Haldane’s rule. 
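The logic of the two-locus model in Figure 2 can be checked by brute force. The sketch below encodes a two-locus genotype as a four-letter string (for example "AaBb"), treats any genotype carrying at least one A and at least one B allele as incompatible (the A_B_ class), and confirms that neither population passes through an incompatible genotype while their F1 hybrids are all incompatible. Independent assortment, the string encoding, and the function names are assumptions of the sketch.

```python
from fractions import Fraction
from itertools import product


def incompatible(genotype):
    """A_B_ genotypes (at least one A and one B allele) are sterile or inviable."""
    return "A" in genotype[:2] and "B" in genotype[2:]


def gametes(genotype):
    """All gametes, with multiplicity, assuming independent assortment."""
    return [x + y for x, y in product(genotype[:2], genotype[2:])]


def cross(mother, father):
    """All offspring genotypes (with multiplicity) from a two-locus cross."""
    offspring = []
    for egg, sperm in product(gametes(mother), gametes(father)):
        locus1 = "".join(sorted(egg[0] + sperm[0]))  # sorts to "AA", "Aa", or "aa"
        locus2 = "".join(sorted(egg[1] + sperm[1]))
        offspring.append(locus1 + locus2)
    return offspring


def incompatible_fraction(genotypes):
    return Fraction(sum(map(incompatible, genotypes)), len(genotypes))


# Each population substitutes one allele step by step; no step is incompatible.
population_1 = ["aabb", "Aabb", "AAbb"]
population_2 = ["aabb", "aaBb", "aaBB"]
assert not any(map(incompatible, population_1 + population_2))

f1 = cross("AAbb", "aaBB")
print(set(f1), incompatible_fraction(f1))            # {'AaBb'} 1
print(incompatible_fraction(cross("AaBb", "AaBb")))  # 9/16
```

Replacing the two-locus rule in `incompatible` with a condition on several loci would give a many-locus version of the same model.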

Transposons and infectious microorganisms such 
as Wolbachia have been proposed as other causes of 
speciation, as in both cases crosses between indi- 
viduals possessing either the elements or the bacteria 
and those individuals lacking them can produce invi- 
able progeny. This fact, however, makes these elements 
unlikely causes of speciation: since both factors are 
infectious, they will sweep through hybridizing popu- 
lations, making them genetically uniform and hence 
not causing reproductive isolation. Obtaining invi- 
ability of hybrids in both reciprocal crosses between 
a pair of taxa requires that those taxa be infected with 
either different forms of microorganisms or different 
families of transposable elements. We know of no case 
of speciation based on transposable elements. Incipi- 
ent speciation due to differential Wolbachia infection 
has, on the other hand, been seen in both Drosophila 
simulans and three species of the parasitic wasp
Nasonia. Like polyploidy, this form of speciation can 
be rapid and does not require changes in genes of the 
speciating organism. While such infectious speciation 
may occur occasionally, it is not likely to be common. 
As seen in Table 2, postzygotic isolation usually maps 
to distinct genes, not to the cytoplasm (as expected if it 
is caused by microorganisms) or to diffuse sites 
throughout the genome (as might be expected with 
transposons). Moreover, neither of these phenomena 
can explain Haldane’s rule — both phenomena produce 
lethal hybrids of both sexes — and neither can be 
responsible for the evolution of hybrid sterility or 
any type of prezygotic isolation. 
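The requirement for different infections can be illustrated with the usual modification-rescue description of Wolbachia cytoplasmic incompatibility, which the text does not spell out and is assumed here: a cross fails when the father carries a strain that the mother lacks. The strain labels are hypothetical, and incompatibility is treated as complete.

```python
def cross_viable(mother_strains, father_strains):
    """Offspring survive unless the father carries a strain the mother lacks."""
    return set(father_strains) <= set(mother_strains)


def reciprocal_crosses(taxon_x, taxon_y):
    """Viability of (X female x Y male, Y female x X male)."""
    return cross_viable(taxon_x, taxon_y), cross_viable(taxon_y, taxon_x)


uninfected = set()
print(reciprocal_crosses({"wStrain1"}, uninfected))    # (True, False): one-way
print(reciprocal_crosses({"wStrain1"}, {"wStrain2"}))  # (False, False): both ways
```

Because infected females are compatible with both kinds of males, a single infection spreads through hybridizing populations, whereas two different strains block both reciprocal crosses.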


Future Work 


With the advent of new empirical and theoretical tools 
for the genetic analysis of reproductive isolation, spe- 
ciation is leaving its previous condition of a nebulous 
field obsessed with intractable questions, and entering 
an era in which tractable questions can be posed and 
answered (e.g., what is the genetic cause of Haldane’s
rule?). Moreover, the connection between theory and 
experiment has grown increasingly close, and testable 
mathematical theories about speciation are already 
being constructed. 

Many provocative and yet tractable questions 
remain. Most of what we know about the genetics of 
speciation comes from Drosophila, and it is not clear 
that patterns found in that genus will characterize 
other groups. Does reinforcement, for example, 
occur in plants? Do plant species having hetero- 
morphic sex chromosomes obey Haldane’s rule? 
What are the primary mechanisms of reproductive 
isolation in various plant and animal taxa? 

In addition, we know far less about the genetics and 
evolution of ecological, temporal, and sexual isolation 
than we do about postzygotic isolation. How often, 
for example, do temporal differences in reproduction 
isolate species? Is niche separation an important cause 
of speciation? Is sexual selection an important cause 
of sexual isolation in animals? Why does sexual isol- 
ation so often exhibit asymmetries? We can expect that 
the next decade will bring answers to many of these 
questions. 
Resistance plasmids (R plasmids) are extrachromosomal genetic elements that confer on the bacteria
that harbor them resistance to antibiotics and various
other inhibitors of growth. R plasmids were discov- 
ered in Japan in the early 1960s as a result of studies 
related to the emergence of multiple drug resistance in 
Shigella, a pathogenic bacterium. Since this discovery, 
these plasmids have been found throughout the world 
and their rise is correlated with the use of antibiotics 
to treat infectious disease. 

The genes carried by these plasmids that are respon- 
sible for the antibiotic resistance phenotype are often 
located near a transposable genetic element or within 
one. As such, they may have moved to the plasmid 
from another genetic element through recombination, 
and a single plasmid may contain different elements 
and confer multiple resistances. For example, the plas- 
mid R100 contains transposable elements carrying 
genes that confer resistance to tetracycline, chloram- 
phenicol, streptomycin, sulfonamides, and mercuric 
ions. Similarly, the plasmid R1 contains multiple 
transposable elements and confers resistance to ampi- 
cillin, chloramphenicol, kanamycin, sulfonamides, 
and streptomycin. The fact that these genes are often 
contained on transposable elements means that 
new combinations of resistance can arise relatively 
rapidly. 

Like all plasmids, R plasmids also carry genes that 
control their replication and other characteristics. The 
fact that the genes encoding resistance are often 
mobile means that the plasmids carrying them need 
not otherwise be closely related. Indeed, R plasmids 
may be as distantly related to each other as they are 
to any other type of plasmid. 

Many R plasmids are conjugative. Conjugative 
plasmids carry genes that allow them to be transferred 
into other cells by cell-to-cell contact. Some conju- 
gative plasmids, including some R plasmids, can trans- 
fer to a wide variety of different kinds of bacteria. 
For instance, the plasmid R100, mentioned above, is 
a conjugative plasmid that can be transferred to a var- 
iety of different bacteria, including members of the 
genera Escherichia, Klebsiella, Proteus, Salmonella, 
and Shigella. 

The emergence of bacteria with resistance to 
multiple antibiotics is of considerable medical impor- 
tance. The fact that these resistances can be carried by 
conjugative plasmids and may be transferred rapidly 
from one type of bacterium to another constitutes a 
serious threat to the continued use of antibiotics to 
treat infections caused by many different pathogenic 
bacteria. 


See also: Antibiotic Resistance; Conjugation, 
Bacterial; Plasmids 
Bacteria emerge with antibiotic resistance by a variety
of mechanisms. A mutation in the gene for the target 
for the antibiotic may occur spontaneously or in 
response to environmental mutagens, which provides 
the bacteria with a gene product that is no longer 
susceptible to the antibiotic. Examples of this kind of 
resistance include: the mutated RNA polymerase 
enzyme providing resistance to rifampicin; mutated 
ribosomal proteins causing resistance to streptomycin; 
mutations in the topoisomerase II (gyrase) enzyme 
leading to resistance to quinolones; and mutation in 
the membrane-associated penicillin-binding proteins 
which mediate resistance to the activity of penicillin. 

A second mechanism is the overexpression of genes 
that destroy the antibiotic or substitute for its target. 
An example of this is the AmpC β-lactamases among
the Enterobacteriaceae, in which overproduction of
this normal cell product provides resistance to a
broad spectrum of β-lactam (penicillin-like) antibiotics, or the overexpression of a precursor peptide of the
cell wall, the target of vancomycin, which leads to 
resistance to vancomycin in Staphylococcus aureus. In 
some cells, resistance is provided by increased expression of cellular efflux pumps, which normally function for other purposes but can also pump out multiple antibiotics.

The most common way for a bacterium to become 
resistant is to acquire resistance genes from other bac- 
teria. There are many mechanisms available by which 
resistance genes are exchanged. Cells may pick up 
naked DNA in the environment and incorporate it 
into their chromosome by a process called ‘transform- 
ation.’ Resistance genes from one bacterium may 
enter another via bacteriophages (bacterial viruses) in 
what is called ‘transduction.’ The transfer of circular 
extrachromosomal pieces of DNA (plasmids) from 
one organism to another by cell-to-cell contact is 
called ‘conjugation.’ Transposition is the movement 
of small pieces of DNA (transposons) from one 
DNA vehicle to another, such as from phages to the 
chromosome or from a plasmid to the chromosome. 
This process fosters the stabilization of the resistance 
gene into new DNA molecules. Some transposons 
themselves can move among bacteria; these are 
called ‘conjugative transposons.’ A unique kind of 
transposon, the ‘integron,’ has more recently been 
described. It involves a group of resistance genes that
reside together within a usually inactive transposon. 
The important feature of the integron is the gene int 
(integrase), which allows cassettes of resistance genes 
to be incorporated into a specific site near a promoter, 
a DNA sequence that allows the gene to be expressed. 
Many different types of bacteria have become multi- 
drug resistant using the integron mechanism. 
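The cassette-capture behavior of an integron can be pictured as an ordered array. In this sketch the integrase places each new cassette at the specific site next to the promoter, pushing earlier cassettes further downstream. That insertion position, the assumption that expression falls off with distance from the promoter, and the decay factor are illustrative assumptions rather than statements from the text, and the cassette names are hypothetical.

```python
class IntegronSketch:
    """Toy model of a cassette array; index 0 is the promoter-proximal position."""

    def __init__(self):
        self.cassettes = []

    def integrate(self, cassette):
        # Integrase-mediated insertion at the specific site next to the promoter.
        self.cassettes.insert(0, cassette)

    def relative_expression(self, decay=0.5):
        # Assumed: each step away from the promoter multiplies expression by `decay`.
        return {name: decay ** position for position, name in enumerate(self.cassettes)}


integron = IntegronSketch()
for cassette in ("resistance_X", "resistance_Y", "resistance_Z"):  # hypothetical cassettes
    integron.integrate(cassette)

print(integron.cassettes)              # ['resistance_Z', 'resistance_Y', 'resistance_X']
print(integron.relative_expression())  # {'resistance_Z': 1.0, 'resistance_Y': 0.5, 'resistance_X': 0.25}
```

The point of the model is that one integron can accumulate several resistance genes in sequence, which is how a single element can confer multidrug resistance.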

The origin of resistance genes is not known, but it 
has been proposed that they evolve from protective 
traits of the antibiotic-producing organisms them- 
selves. Resistance genes mediate resistance in the 
following ways: by inactivating the antibiotic, for 
example, as penicillinases do to penicillins; by sub- 
stituting new, insensitive target enzymes, as in the 
mechanisms for resistance to trimethoprim and
sulfonamides; by altering targets, as for erythromycin 
and tetracyclines; or by altered transport, chiefly 
exemplified by efflux pumps, which keep single or 
multiple antibiotics out of the cell. 

Some resistance genes require other changes in the 
host bacterium in order to produce their resistance. 
This feature has been clearly shown for methicillin 
resistance mediated by the mec gene, which alone provides very little resistance. It must be accompanied by
mutations in other chromosomal genes in order to 
express fully fledged resistance. The same may be 
said for changes in the membrane to accommodate 
new efflux proteins. 

It is highly likely that other kinds of genetic 
mechanisms for resistance will emerge as scientists 
continue to study antibiotic resistance and to decipher 
the genome sequences of an increasing number of bacteria.


See also: Antibiotic Resistance; Antibiotic- 
Resistance Mutants; Bacterial Transformation; 
Conjugation, Bacterial 
The term resolvase was coined independently at about
the same time to describe two different enzymatic 
activities. 


Site-Specific Recombinase 


The term resolvase is used to describe a related family 
of site-specific recombinases that function to excise (as 
a circle) a segment of DNA contained between two 
recombination sites (called res). The original resolvases were encoded by transposons of the Tn3 family.
These transposons move by a replicative pathway and, 
in the first step of transposition, form what is called a 
cointegrate (see Resolvase-Mediated Deletion). This is 
an insertion into a target DNA of the entire replicon 
that carries the transposon with one complete copy of 
the transposon at each end of the inserted DNA. The 
transposon-encoded resolvase protein excises the
initial donor replicon (including one transposon copy) 
from the cointegrate, leaving behind one copy of the 
transposon in the target site. Resolvases closely related 
to the original Tn3 and γδ resolvases are encoded by
some other (otherwise unrelated) transposons and 
serve the same purpose, to resolve cointegrate trans- 
position intermediates. 

Additional resolvases (also related to the Tn3 resolv- 
ase) are not transposon-associated, but are encoded by 
bacterial plasmids. The purpose of these is to maintain 
the plasmids in a monomeric state. Following replica- 
tion, two monomeric copies of a plasmid may recom- 
bine to form a single dimer. If the plasmid remains as a 
dimer at cell division, it will be retained by one daugh- 
ter and lost by the other. The plasmid resolvases keep 
this from happening by reducing dimers back to the 
two monomeric precursors. This substantially reduces 
plasmid loss during vegetative growth. 
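The cost of dimerization can be estimated with a simple random-partition model, which is an assumption here (many plasmids also carry active partitioning systems). Each independently segregating unit, monomer or dimer, goes to either daughter with probability 1/2, so with k units the chance that some daughter receives none is 2 × (1/2)^k. Collapsing N monomers into N/2 dimers halves k, and a single dimer (k = 1) always leaves one daughter plasmid-free, as described above. The copy number used is hypothetical.

```python
def plasmid_free_daughter_probability(units):
    """Chance that at least one of the two daughters receives no plasmid.

    Assumes each segregating unit (monomer or dimer) goes independently to
    either daughter with probability 1/2. For units >= 1 the two "empty"
    outcomes are mutually exclusive, so their probabilities add.
    """
    if units == 0:
        return 1.0
    return 2 * 0.5 ** units


copies = 8  # hypothetical copy number at division
print(plasmid_free_daughter_probability(copies))       # all monomers: 0.0078125
print(plasmid_free_daughter_probability(copies // 2))  # all dimers:   0.125
print(plasmid_free_daughter_probability(1))            # one dimer:    1.0
```

Under this model, converting dimers back to monomers lowers the per-division loss rate by orders of magnitude, which is the effect attributed to plasmid resolvases.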


Holliday Junctions 


The term resolvase is also used for a class of enzymes 
that process (resolve) Holliday junctions. Holliday 
junctions are intermediates of homologous recombin- 
ation in which two homologous DNA duplexes are 
joined by the interchange of a pair of (nearly) identical 
single strands. Holliday junction resolvases cleave a 
pair of single strands at the same site, disconnecting the 
two duplexes and leaving a ligatable nick in each one. 


See also: Chromosome Dimer Resolution by Site- 
Specific Recombination; Holliday Junction; 
Resolvase-Mediated Deletion; Site-Specific 
Recombination 
A number of bacterial transposons, particularly those
related to Tn3 and γδ but also others including Tn552,
encode two distinct recombinases that participate in 
their transposition to other DNA molecules. In the 
initial step mediated by the element’s transposase, a 
cointegrate is formed between the transposon- 
containing donor DNA and the target molecule (see 
Figure 1). In this transpositional intermediate, the
donor and target DNAs are joined together by copies 
of the duplicated transposon, one copy occurring 
at each donor—target junction. The second step in 
the pathway, cointegrate resolution, is a site-specific 
recombination performed by the transposon’s resolv- 
ase protein, acting at a site, called res (Figure 2A), 
located within the transposon. This recombination 
excises an intact circular donor molecule from the tar- 
get DNA, leaving behind one copy of the transposon. 
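The bookkeeping of cointegrate resolution can be sketched by treating a circular DNA molecule as a list of named segments. Recombination between two directly repeated res sites on one circle splits it into two circles, each keeping one res site and, here, one complete transposon copy. The segment names and the transposon layout (tnpA, res, tnpR between the inverted repeats) are simplifying assumptions, and strand orientation is not modeled, so the function is only meaningful for directly repeated sites.

```python
def same_circle(a, b):
    """True if list b is a rotation of list a (the same circular molecule)."""
    return len(a) == len(b) and any(a[k:] + a[:k] == b for k in range(len(a)))


def resolve(circle, site="res"):
    """Recombine two directly repeated sites on one circle into two circles."""
    positions = [i for i, segment in enumerate(circle) if segment == site]
    if len(positions) != 2:
        raise ValueError("expected exactly two recombination sites")
    i, j = positions
    return circle[i:j], circle[j:] + circle[:i]


transposon = ["IR_L", "tnpA", "res", "tnpR", "IR_R"]  # simplified layout
donor_backbone = ["donor_ori", "donor_marker"]        # hypothetical names
target = ["target_1", "target_2"]

# Cointegrate: donor replicon inserted into the target, flanked by two transposon copies.
cointegrate = target + transposon + donor_backbone + transposon

product_1, product_2 = resolve(cointegrate)
print(same_circle(product_1, donor_backbone + transposon))  # True: donor restored
print(same_circle(product_2, target + transposon))          # True: target keeps one copy
```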

The chief features of the resolvase-mediated site- 
specific recombination reaction are: (1) it is performed 
efficiently by the small resolvase protein, requiring 
no additional protein cofactors (however, multiple 
dimers of resolvase are involved), (2) the res site is 
complex, containing several resolvase binding sites 
in addition to the crossover site, and (3) resolvase is 
deletion-specific; it is unable to recombine res sites in 
an inverted orientation or carried on separate DNA 
molecules. 
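Feature (3) can be summarized as a small decision rule. For comparison, the sketch also lists what a recombination event would produce for each site arrangement in general (excision for direct repeats on one molecule, inversion for inverted repeats, fusion for sites on separate molecules); those general outcomes are standard site-specific recombination geometry rather than statements from this entry, and the function name is hypothetical.

```python
def recombination_outcome(same_molecule, orientation):
    """Return (product a recombination would give, whether resolvase performs it).

    orientation: "direct" (head-to-tail) or "inverted"; ignored for separate molecules.
    """
    if not same_molecule:
        return "fusion of the two molecules", False
    if orientation == "direct":
        return "excision (deletion) as a separate circle", True
    if orientation == "inverted":
        return "inversion of the intervening segment", False
    raise ValueError("orientation must be 'direct' or 'inverted'")


for args in [(True, "direct"), (True, "inverted"), (False, "direct")]:
    print(args, recombination_outcome(*args))
```

Only the first arrangement, the one found in a cointegrate, is acted on by resolvase.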


Resolvase and its res Site 


Resolvase is the product of the transposon’s tnpR 
gene. It is a small polypeptide: only 183 residues in
the case of the transposon γδ. The Tn3 and γδ resolvases are prototypes of a large family of similar site-specific recombinases, including not only cointegrate
resolvases but also some DNA invertases and phage 
integrases (see Site-Specific Recombination). The 
structure of the γδ protein has been solved in both
DNA-bound and unbound states. 

Resolvase consists of two domains. The N-terminal 
domain of about 120 residues contains the catalytic 
active site and is responsible both for dimerization of 
resolvase and for higher-order interactions between 
dimers that are important for assembly of the 
resolvase-DNA complex within which recombination 
occurs. The C-terminal 65 residues are responsible 
for DNA binding. In the absence of DNA this 
segment of resolvase appears to be largely unstruc- 
tured. However, when bound to its recognition site, 
the last 36 residues form a compact three-helix bundle 
(with the classical helix-turn-helix DNA binding 
motif), while the remaining polypeptide segment 
between the two globular domains (residues 121-146), 
although highly extended, makes a variety of add- 
itional contacts to the DNA. In the complex with its 
crossover site, a dimer of γδ resolvase wraps completely
around the DNA helix, with the catalytic and DNA-binding domains contacting opposite faces of the helix (Figure 3).

Figure 1 The overall pathway of transposition for Tn3, γδ, and related elements, showing the transposase-mediated formation of a cointegrate (top) and its resolvase-mediated resolution. The transposase acts at the ends of the transposon (black triangles), while resolvase acts at the internal res sites (stippled rectangles). Note that cointegrate formation also requires the host cell’s DNA replication activities.

(A) res, 114 bp
(B) site I
CGTCCGAAATATTATAAATTATCGCACA
GCAGGCTTTATAATATTTAATAGCGTGT

Figure 2 (A) The res site of γδ (a typical res site). The three binding sites for resolvase dimers (I, II, and III) consist of inverted pairs of 12 bp resolvase recognition sequences (represented by the arrowheads), flanking short spacers of the indicated lengths. The arrows labeled Pa and Pr indicate the positions at which transcription of the tnpA and tnpR genes is initiated. (B) An expanded view of site I, showing the 12 bp recognition sequences (horizontal arrows) and the site of DNA cleavage (vertical arrowheads).

res, the minimal DNA segment for efficient resolv- 
ase action, contains three binding sites for resolvase 
dimers distributed over about 120 base pairs (Figure 
2A). Site I contains the crossover point (the actual site 
of the DNA cleavage that initiates recombination; 
Figure 2B), while sites II and III are accessory sites 
primarily involved in bringing together the two res 
sites and assembling the resolvase-DNA complex 
within which recombination occurs. Typically, each 
binding site consists of two recognition sequences 
(about 12 bp) in inverted orientation, separated by a 
short spacer. The length of these internal spacers varies 
from site to site (see Figure 2A). This is an unusual 
feature for sites that bind a protein dimer and prob- 
ably plays an important role in determining the local 
action of each resolvase dimer. The site I spacer must 
be of a length that correctly juxtaposes the two active 
sites for cleavage of the crossover point, while the 
longer spacer of site II may enable the DNA to bend 
sharply around the protein. The distribution of the 
three sites in res is also irregular but critical. The 
centers of sites I and II are always separated by an 
integral number (four to seven) of helical turns, and 
perturbing either the I-II or the II-III separation 
inhibits resolution. 
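The requirement that sites I and II be separated by an integral number of helical turns is easy to check with a little arithmetic. The sketch below is purely illustrative: it assumes a helical repeat of about 10.5 bp per turn for B-DNA, uses an arbitrary tolerance for "close to a whole turn", and the separations tested are hypothetical numbers, not measured γδ or Tn3 values.

```python
# Assumed helical repeat of B-DNA in solution (bp per turn); illustrative only.
BP_PER_TURN = 10.5


def helical_turns(center_to_center_bp, bp_per_turn=BP_PER_TURN):
    """Number of helical turns separating two binding-site centers."""
    return center_to_center_bp / bp_per_turn


def on_same_face(center_to_center_bp, tolerance=0.15, bp_per_turn=BP_PER_TURN):
    """True if the separation is within `tolerance` turns of a whole number,
    i.e. both site centers would lie on the same face of the helix."""
    turns = helical_turns(center_to_center_bp, bp_per_turn)
    return abs(turns - round(turns)) <= tolerance


def plausible_site_i_ii_spacing(center_to_center_bp):
    """Whole-turn spacing within the four-to-seven-turn range quoted in the text."""
    turns = helical_turns(center_to_center_bp)
    return on_same_face(center_to_center_bp) and 4 <= round(turns) <= 7


if __name__ == "__main__":
    for distance in (63, 68, 84):  # hypothetical I-II separations in bp
        print(distance, round(helical_turns(distance), 2),
              plausible_site_i_ii_spacing(distance))
```

With these assumptions, 63 bp corresponds to exactly six turns and passes, 68 bp leaves the sites on different faces of the helix, and 84 bp is a whole number of turns (eight) but falls outside the quoted range.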

The res sites of Tn3 and γδ are involved not only in
recombination but also in regulation of gene expression.
The genes for both the transposase and the resolvase
proteins (tnpA and tnpR, respectively) are divergently
transcribed from promoters (Pa and Pr; see Figure 2A)
contained within res and are repressed by binding of
resolvase. As a consequence, when the transposon is
carried into a new cellular environment by horizontal
transmission (via conjugation, transformation, or phage
infection), the absence of resolvase will result in
elevated expression of transposase and resolvase, and
increase the chance of transposition from the vector of
transmission into the genome of the new host.
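The logic of this autoregulatory loop can be illustrated with a toy model. This is a minimal sketch, not a description of the real kinetics: it lumps the divergent res promoters into a single activity, assumes simple hyperbolic repression by resolvase, and uses arbitrary, hypothetical parameter values (`beta`, `k`, `gamma`).

```python
def simulate_resolvase(beta=10.0, k=1.0, gamma=1.0, dt=0.01, steps=2000):
    """Euler integration of dR/dt = beta / (1 + R/k) - gamma * R.

    R is a dimensionless resolvase level; beta / (1 + R/k) stands in for the
    activity of the res promoters, which resolvase represses. All parameter
    values are hypothetical. Returns the promoter activity at each step.
    """
    resolvase = 0.0  # a transposon that has just entered a new host has no resolvase
    activity = []
    for _ in range(steps):
        rate = beta / (1.0 + resolvase / k)  # repressed promoter activity
        activity.append(rate)
        resolvase += (rate - gamma * resolvase) * dt  # synthesis minus loss
    return activity


if __name__ == "__main__":
    trace = simulate_resolvase()
    print(f"initial activity {trace[0]:.2f}, steady state {trace[-1]:.2f}")
```

Starting from zero resolvase, promoter activity begins at its maximum and falls as resolvase accumulates, settling at a lower steady state; this is the qualitative behavior the paragraph describes for a transposon arriving in a resolvase-free host.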
Figure 3 Structure of γδ resolvase bound to its crossover site. The two resolvase monomers that constitute the dimer are shown in black and white surface representation. The catalytic domains are above the DNA; the DNA-binding domains are below.


The Resolution Process 


Resolvase-mediated recombination can be broken 
down into several separable steps: (1) resolvase bind- 
ing, (2) res site synapsis, (3) crossover site cleavage, (4) 
exchange of DNA strands, and (5) religation to form 
the recombinant DNA products. 

Although resolvase binds to any cognate res site, 
effective pairing (or synapsis) depends on the two res 
sites being on the same superhelical molecule in a 
head-to-tail orientation. The structure of the synaptic 
complex is highly organized. The available evidence 
suggests that the two res sites, held together by inter- 
actions between the DNA-bound resolvase dimers, are 
aligned in an antiparallel manner and wrapped around 


one another, trapping three (negative) supercoils 
(Figure 4A). Binding sites II and III play a critical role
in the assembly of the synaptic complex, and two
appropriately oriented copies of the DNA segment
containing sites II and III (but lacking site I) can be
efficiently paired by resolvase to give a complex with the
same DNA topology as a complete synaptic complex.
Formation of the synaptic complex triggers the
recombinational activity of the resolvase dimers bound
to the two copies of site I, presumably by altering the
conformation of resolvase or the resolvase-site I
complex. This activation results in double-strand
cleavage of the two crossover points (Figure 4B). The
DNA strands are broken, not by endonucleolytic action,
but by a direct phosphoryl transfer to a nucleophilic side
chain of the recombinase (a conserved serine residue
found in all members of the resolvase/DNA invertase
family). As a result, the 5' ends of the broken strands
become covalently joined to resolvase via a
phosphoserine linkage, and the energy of the broken DNA
phosphodiester bond is conserved. Religation and
release of the covalently joined resolvase are readily
achieved by reversal of the DNA cleavage process,
using the free 3' hydroxyl groups as nucleophiles to
attack the phosphoserine linkages.

Figure 4 (A) Cartoon of the resolvase synaptic complex. The single substrate circle contains two res sites (each with three subsites labeled I, II, and III), and is divided into two domains (thick and thin lines) that will become the two circular products after strand exchange between the site I segments. Resolvase dimers are represented as circles, with stippled and crosshatched dimers bound to the different res sites. Note the interwrapping of the res sites, trapping three negative supercoils. (B) Cartoon showing the consequences of strand exchange. Each 'ladder' represents duplex DNA with the black circles indicating the 5' ends, and the rungs representing base pairs. On the left, resolvase (R) has cleaved each strand, becoming covalently linked to the 5' ends at each break. One half of each site (the right half in this cartoon) is then rotated half a turn in a clockwise direction, positioning it for rejoining with the fixed recombinational partner. The rotation introduces a half twist in each DNA duplex and a new crossing between recombinant sites.

The penultimate step of the resolution process — the 
exchange of DNA strands that precedes their religa- 
tion in the recombinant configuration — remains the 
most mysterious. Careful analyses of the change 
in superhelicity that accompanies recombination (as 
one proceeds from a circular substrate to the two 
product circles) indicate that the exchange is equiva- 
lent to a 180° clockwise rotation of the left halves of 
site I relative to the rest of the synaptic complex (see 
Figure 4B). This observation led to the ‘subunit 
exchange’ model, which proposes that the monomeric 
resolvase subunits linked to the left halves of site I 
loosen their association with their dimeric partners 
and exchange places with one another. Since the initial 


superhelicity of the substrate DNA is essentially 
retained in the recombinant products, the proposed 
exchange must be an orderly, constrained process that 
does not involve complete subunit dissociation; this is 
difficult to envision. An alternative model proposes 
that the exchange of DNA strands occurs within an 
essentially fixed protein framework (much as strand 
exchange catalyzed by Cre and other site-specific 
recombinases of the lambda integrase family is 
thought to occur). However, in the case of resolvase, 
the precise changes in DNA linking number plus 
constraints raised by the known structure of 
the resolvase-site I complex make this model equally 
difficult to envision. 


Further Reading 

Grindley NDF (1994) Resolvase-mediated site-specific re- 
combination. In: Eckstein F and Lilley DMJ (eds) Nucleic 
Acids and Molecular Biology, vol. 8, pp. 236-267. Berlin: 
Springer-Verlag. 

Stark WM and Boocock MR (1995) Topological selectivity in 
site-specific recombination. In: Sherratt DJ (ed.) Mobile 
Genetic Elements, pp. 101-129. Oxford: Oxford University
Press. 
Yang W and Steitz TA (1995) Crystal structure of the site-specific recombinase γδ resolvase complexed with a 34 bp cleavage site. Cell 82: 193-207.


See also: Hin/Gin-Mediated Site-Specific DNA 
Inversion; Site-Specific Recombination; 
Transposable Elements; Transposase 
Luria and Human first described restriction-modification
in 1952. They observed that bacteriophages that
efficiently infected one strain of bacteria could not
efficiently infect a second, related strain, whereas the
progeny of the few bacteriophages that initially
managed to grow on the second strain were found to
be capable of efficiently reinfecting it.

Restriction describes the initial poor infectivity of a
bacteriophage; modification describes the alteration
the bacteriophage undergoes to overcome restriction.
A bacterial cell that harbors a restriction-modification
system (RM system) will usually degrade incoming
unmodified bacteriophage DNA, thereby resisting
infection by restriction. Occasionally the bacteriophage
DNA will avoid degradation in the cell; the DNA will
then undergo modification and the bacteriophage will
go on to a productive infection, with the progeny DNA
also undergoing modification. When a bacteriophage
whose DNA has been modified infects a host that
harbors the same RM system by which it was modified,
the bacteriophage will successfully avoid restriction.

The basis for restriction is nuclease digestion of the 
incoming DNA by a sequence-specific restriction 
endonuclease alternatively referred to as a restriction 
enzyme. The basis for modification is sequence- 
specific methylation of the incoming DNA by the 
cognate DNA methylase, also referred to as a DNA 
methyltransferase. DNA methylases add a methyl 
group to either the N6 position of adenine, or the C5
position or N4 position of cytosine. The RM systems all 
have a specific recognition sequence. Generally these 
recognition sequences consist of from 4 to 8 specific 
bases. Hundreds of different unique recognition 
sequences and thousands of different restriction endo- 
nucleases have been described. RM systems have been 
classified into different types based on whether they 
require ATP, the number of proteins involved, nature 


of the sequence recognized, and various other factors. 
The two best-described types are type I and type II. The
type I RM systems are composed of three proteins: a 
restriction endonuclease, hsdR, a DNA methylase, 
hsdM, and a specificity protein hsdS. An example of a 
type I system is the EcoK system of Escherichia coli. It 
recognizes the sequence AAC(N6)GTGC, where N
can be any base. The EcoK methylase methylates the 
adenines which occur once on each DNA strand in the 
recognition sequence. The cleavage reaction requires 
ATP and although the sequence recognized is specific 
and the site of methylation is specific, the site of 
cleavage occurs nonspecifically throughout the DNA 
molecule. In contrast, the necessary protein compon- 
ents of the type II systems are the restriction endonu- 
clease and the DNA methylase. There is no third 
specificity protein and there is no requirement for 
ATP. Furthermore, the type II restriction endo- 
nucleases cleave the DNA at specific sequences and 
it is this property that makes them invaluable tools for
molecular biologists. Examples of type II restriction 
endonucleases are EcoRI, HindIII and BamHI, 
which recognize and cleave the sequences GAATTC, 
AAGCTT, and GGATCC, respectively. 
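Because the recognition sequences above are fixed strings (or, for EcoK, a string with six unspecified positions), locating candidate sites in a sequence is a simple pattern search. The following is a minimal sketch, not a production tool; `scan` and the other names are hypothetical helpers, and for EcoK it reports only the recognition site, not where the type I enzyme would actually cut.

```python
import re

# Recognition sequences quoted in the text. For EcoK, N (any base) is written
# as the character class [ACGT]; a hit marks the recognition site only.
SITES = {
    "EcoRI": "GAATTC",
    "HindIII": "AAGCTT",
    "BamHI": "GGATCC",
    "EcoKI": "AAC[ACGT]{6}GTGC",
}

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_matches(seq, pattern):
    """(start, length) of every match, overlaps allowed via a lookahead."""
    return [(m.start(), len(m.group(1)))
            for m in re.finditer(f"(?=({pattern}))", seq)]


def scan(seq):
    """Map each enzyme to the 0-based top-strand start positions of its sites,
    whichever strand the site lies on."""
    seq = seq.upper()
    rc = revcomp(seq)
    n = len(seq)
    hits = {}
    for name, pattern in SITES.items():
        starts = {start for start, _ in find_matches(seq, pattern)}
        # A match at (s, L) in the reverse complement covers top-strand
        # positions n - s - L .. n - s - 1.
        starts |= {n - s - length for s, length in find_matches(rc, pattern)}
        hits[name] = sorted(starts)
    return hits


if __name__ == "__main__":
    print(scan("TTGAATTCAAGCTTGGATCCAACAAAAAAGTGCTT"))
```

Searching the reverse complement matters only for asymmetric sites such as EcoK's; the type II sites listed are palindromic, so both strands give the same positions and the set removes the duplicates.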

The majority of bacteria contain at least one RM 
system and many bacteria contain more than 10 
different RM systems. RM systems are not species-
specific: one strain of Bacillus subtilis may have a
particular RM system while another strain of B. subtilis
may have different RM systems or none. RM systems are
mobile, moving from one bacterium to another, and are
often located on plasmids. The genes encoding the
methylase and endonuclease of an RM system are
always found to be genetically linked. The organization
of those genes varies: the gene encoding the methylase
may come before or after the endonuclease gene, and
the relative direction in which the two genes are
transcribed also varies. There are restriction systems in
bacteria that cleave DNA only when specific sequences
are methylated, in contrast to the conventional RM
systems that cleave only unmodified recognition
sequences. The laboratory strain E. coli K12 contains two
different systems that degrade DNA when the incoming
DNA is methylated at particular specific sequences. In
addition to restricting bacteriophage infection, RM
systems also act on foreign plasmid and chromosomal
DNA. Therefore, molecular biologists must take
restriction systems into consideration when transferring
DNA into and out of different bacteria.


Further Reading 

Heitman J (1993) On the origins, structures and functions of restriction-modification enzymes. Genetic Engineering (NY) 15: 57-108.

Wilson GG and Murray NE (1991) Restriction and modification systems. Annual Review of Genetics 25: 585-627.


See also: Restriction Endonuclease 
The term ‘restriction enzyme’ was first coined more
than 30 years ago, following the classic observation
that phage grown on one bacterial strain would grow
poorly on another strain of the same species; the phage
were ‘restricted’ in their host range. The restriction
activity was dependent on an endonuclease which 
cleaves double-stranded DNA upon binding a spe- 
cific sequence of nucleotides (the recognition site). 
However, transfer of a methyl group from the donor 
S-adenosylmethionine to a particular base on one or 
both strands of the recognition site prevents cleavage. 
This modification is catalyzed by a methyltransferase 
activity. The host DNA can therefore be protected 
from self-digestion, even during semiconservative 
replication when one strand is temporarily unmodified.
Conversely, any invasive DNA is unlikely to carry 
the correct pattern of methylated bases, and will be a 
target for cleavage. The presence in parallel of endo- 
nuclease and methyltransferase activities produces a 
bacterial equivalent of an immune system, capable 
of distinguishing self from foreign DNA. Probably 
as a consequence of this custodial role, restriction 
endonucleases are widespread in nature, appearing in
every bacterial genus examined.
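The self/non-self logic described in this paragraph can be captured as a small state model. This is an illustrative sketch that assumes, as the text states, that a methyl group on either strand of a site is enough to block cleavage; the class and function names are hypothetical, and methyltransferase action is simplified to "both strands become methylated".

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class SiteState:
    """Methylation state of one recognition site (True = methylated)."""
    top: bool
    bottom: bool


def is_cleaved(site):
    # Following the text: methylation of either strand prevents cleavage.
    return not (site.top or site.bottom)


def replicate(site):
    """Semiconservative replication: each daughter duplex keeps one parental
    strand and receives a new, unmethylated strand."""
    return [SiteState(site.top, False), SiteState(False, site.bottom)]


def methylate(site):
    """Simplified methyltransferase action: both strands end up methylated."""
    return replace(site, top=True, bottom=True)


if __name__ == "__main__":
    host = SiteState(True, True)
    daughters = replicate(host)
    print([is_cleaved(d) for d in daughters])   # hemimethylated host DNA
    print(is_cleaved(SiteState(False, False)))  # unmodified invading DNA
```

The hemimethylated daughters of host DNA remain protected until the methyltransferase restores full methylation, while unmodified foreign DNA is a target for cleavage.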

The exquisite specificity of restriction endonu- 
cleases has also prompted their extensive use in the 
laboratory as in vitro tools for cutting DNA. Without 
the discovery and characterization of the first type II 
restriction enzymes in the early 1970s, followed by the 
active pursuit of new cleavage specificities, few of the 
advances in molecular biology over the last three 
decades would have occurred. Furthermore, due to 
their relatively simple protein subunit requirements, 
restriction endonucleases are also excellent systems 
for analyzing the molecular mechanisms involved in 
protein-DNA interactions. 


Types of Restriction Endonuclease 


Restriction endonucleases are classified into three 
types, called I, II, or III, based on their genetics and 
enzymology. 
ATP-Dependent Restriction Endonucleases
Type I and type III restriction endonucleases are oli- 
gomeric complexes which result from the modular 
assembly of separate gene products. Type I enzymes 
are assembled from three gene products: HsdS, 
which specifically binds the recognition site; HsdR, 
which cleaves the DNA; and HsdM, which methylates 
the DNA. The complex of all three proteins acts as a 
functional endonuclease and methyltransferase. Type 
III enzymes are assembled from two gene products:
Mod, which binds and methylates the DNA; and Res,
which cleaves the DNA. Together they also form an
endonuclease and methyltransferase. Both type I and
type III enzymes recognize specific asymmetric DNA
sequences (see Table 1), but subsequently cleave
DNA at nonspecific loci separate from the recognition
sites. For type I enzymes this can be anywhere from
50 to several thousand bp away from the site,
whereas type III enzymes cleave 25-27 bp from the
site. In both cases, DNA restriction relies on ATP and
Mg2+ ion cofactors and is stimulated by
S-adenosylmethionine.

The long-range communication between site- 
specific DNA recognition and nonspecific cleavage is 
provided by translocation along DNA (DNA track- 
ing), driven by the hydrolysis of ATP. During track- 
ing, an enzyme remains bound to its recognition site 
whilst simultaneously translocating adjacent non- 
specific DNA past itself, thus extruding an expanding 
loop of DNA. Subsequent DNA cleavage is triggered 
in different ways. Reactions of the type I enzymes 
on linear DNA require a minimum of two sites, with 
cleavage occurring wherever a pair of translocating 
enzymes collide. However, on circular DNA a single 
site is adequate, suggesting that changes in DNA top- 
ology produced by tracking could eventually arrest 
motion and trigger DNA cleavage. Conversely, the 
type III enzymes have an absolute requirement for 
two sites in a ‘head-to-head’ orientation. (Using two 
EcoPI sites as an example, the sequence AGACC must 
precede GGTCT in a 5’ to 3’ direction on one DNA 
strand.) Cleavage occurs proximal to one restriction 
site when the tracking enzymes collide. In contrast 
to the type I enzymes, the absolute requirement for 
a particular orientation of sites suggests that specific 
protein-protein contacts between the stalled species 
are required to activate DNA hydrolysis. 
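The head-to-head requirement for type III enzymes reduces to a check on the order and strand of the sites. The sketch below assumes a linear molecule and uses the EcoPI sequence from Table 1; it tests site orientation only and does not model tracking, collision, or the position of cleavage. The function names are hypothetical.

```python
import re

SITE = "AGACC"     # EcoPI recognition sequence (Table 1), top strand 5'->3'
SITE_RC = "GGTCT"  # the same site seen on the top strand when it lies on the bottom strand


def oriented_sites(seq):
    """Sorted (position, strand) pairs: '+' for AGACC, '-' for GGTCT."""
    seq = seq.upper()
    sites = [(m.start(), "+") for m in re.finditer(f"(?={SITE})", seq)]
    sites += [(m.start(), "-") for m in re.finditer(f"(?={SITE_RC})", seq)]
    return sorted(sites)


def has_head_to_head_pair(seq):
    """True if some AGACC lies 5' of some GGTCT on the top strand."""
    seen_plus = False
    for _, strand in oriented_sites(seq):
        if strand == "+":
            seen_plus = True
        elif seen_plus:  # a '-' site downstream of a '+' site
            return True
    return False


if __name__ == "__main__":
    spacer = "T" * 20
    print(has_head_to_head_pair("AGACC" + spacer + "GGTCT"))  # head-to-head
    print(has_head_to_head_pair("GGTCT" + spacer + "AGACC"))  # tail-to-tail
    print(has_head_to_head_pair("AGACC" + spacer + "AGACC"))  # repeated
```

Only the first arrangement satisfies the orientation rule stated above; repeated or tail-to-tail sites would not be expected to support cleavage by a type III enzyme.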

Until recently, type I and type III endonucleases 
were believed to have a limited distribution. However, 
complete sequences from a broad range of bacterial 
and archaebacterial genomes suggest that these en- 
zymes are far more prolific than previously assumed; 
for instance, the gastric pathogen Helicobacter pylori 
has three putative type I enzymes while the archaeon 
Methanococcus jannaschii has at least five. 
Table 1 Recognition sites for restriction enzymes

Type and organism              Enzyme(a)  Sequence(b)                   Notes
Type I
Escherichia coli K12           EcoKI      5'...AACnnnnnnGTGC...3'       Oligomeric complex;
                                          3'...TTGnnnnnnCACG...5'       DNA cleavage at distant random loci
Type III
E. coli phage P1               EcoPI      5'...AGACC(n)25-27↓...3'      Oligomeric complex;
                                          3'...TCTGG(n)25-27↑...5'      DNA cleavage 25-27 bp from site
Type II
Arthrobacter luteus            AluI       5'...AG↓CT...3'
                                          3'...TC↑GA...5'
E. coli RY13                   EcoRI      5'...G↓AATTC...3'             Dimer
                                          3'...CTTAA↑G...5'
E. coli J62 (pLG74)            EcoRV      5'...GAT↓ATC...3'             Dimer
                                          3'...CTA↑TAG...5'
Providencia stuartii           PstI       5'...CTGCA↓G...3'             Dimer
                                          3'...G↑ACGTC...5'
Neisseria sicca                NsiI       5'...ATGCA↓T...3'
                                          3'...T↑ACGTA...5'
Pseudomonas alcaligenes        PacI       5'...TTAAT↓TAA...3'
                                          3'...AAT↑TAATT...5'
Citrobacter freundii           Cfr10I     5'...R↓CCGGY...3'             Tetramer; binds and cleaves two sites
                                          3'...YGGCC↑R...5'
Desulfovibrio desulfuricans    DdeI       5'...C↓TnAG...3'
                                          3'...GAnT↑C...5'
Streptomyces fimbriatus        SfiI       5'...GGCCnnnn↓nGGCC...3'      Tetramer; binds and cleaves two sites
                                          3'...CCGGn↑nnnnCCGG...5'
Flavobacterium okeanokoites    FokI       5'...GGATG(n)9↓...3'          Monomer
                                          3'...CCTAC(n)13↑...5'

(a) Enzymes are named after the genus and species from which they derive.
(b) R and Y represent purines (A or G) and pyrimidines (T or C), respectively. n represents any base (A, G, T, or C). The arrows indicate the scissile phosphodiester bonds.


Type II Restriction Endonucleases

In type II systems, the DNA restriction activity is 
encoded by a single gene; DNA methylation is cata- 
lyzed by a separate enzyme. The endonucleases are 
generally dimers, but examples of active monomers 
and tetramers also exist. Unlike the type I and III
enzymes, the only cofactor requirement is for Mg2+
ions and, in most cases, DNA cleavage occurs at a
precise point within the recognition site. More than
3000 type II endonucleases have been identified,
encompassing over 230 different DNA recognition
sites. This has revealed a very diverse group of enzymes:
significant homology at the primary sequence level,
even with their partner methyltransferases, is generally
undetectable. However, where two enzymes from
different species cleave the same restriction site
(isoschizomers), homologous regions can be detected.


Some examples of type II recognition sequences are
shown in Table 1. Generally, the sites are 4-8 base
pairs long, with both strands having the same 5'-3'
DNA sequence. These ‘uninterrupted’ sites are
illustrated by AluI, EcoRI, EcoRV, PstI, NsiI, and PacI
(Table 1). However, some enzymes recognize sites
that are degenerate, in that more than one base can
occupy a particular position within the sequence. For
example, Cfr10I requires a purine at the first base in
the recognition sequence, which can be either A or G
(Table 1). In other cases, this degeneracy is extended
such that any base can occupy a given position and the
recognition site is ‘interrupted’ by nonspecific DNA.
Examples of this class of site are given by DdeI and
SfiI (Table 1).
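Degenerate and interrupted sites are conveniently written with the one-letter codes used in Table 1 (R, Y, N). The sketch below converts such a site into a regular expression and checks the "same 5'-3' sequence on both strands" property by comparing a site with its reverse complement. Only the R, Y, and N codes used in this article are handled; the helper names are hypothetical.

```python
import re

# One-letter codes used in Table 1 (R = purine, Y = pyrimidine, N = any base).
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "R": "[AG]", "Y": "[CT]", "N": "[ACGT]"}
# Complement of each code: a purine pairs with a pyrimidine and vice versa.
_COMPLEMENT = str.maketrans("ACGTRYN", "TGCAYRN")


def site_to_regex(site):
    """Turn a degenerate recognition site such as 'RCCGGY' into a regex."""
    return "".join(IUPAC[base] for base in site.upper())


def is_palindromic(site):
    """True if both strands read the same 5'->3' (site equals its reverse complement)."""
    site = site.upper()
    return site.translate(_COMPLEMENT)[::-1] == site


def find_sites(site, seq):
    """0-based start positions of all (possibly overlapping) matches in seq."""
    pattern = site_to_regex(site)
    return [m.start() for m in re.finditer(f"(?={pattern})", seq.upper())]


if __name__ == "__main__":
    for name, site in [("Cfr10I", "RCCGGY"), ("DdeI", "CTNAG"),
                       ("SfiI", "GGCCNNNNNGGCC"), ("FokI", "GGATG")]:
        print(name, site_to_regex(site), is_palindromic(site))
    print(find_sites("RCCGGY", "TTACCGGTTTGCCGGCTT"))
```

Degenerate and interrupted sites such as those of Cfr10I, DdeI, and SfiI remain palindromic in this sense, whereas the type IIs site of FokI does not.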

In all the examples given above, the cleavage loci 
are within the recognition site. Some endonucleases 


cleave the DNA in the 5' half of the sequence whereas
others cleave in the 3' half; the former produce
overhanging 5' single-stranded DNA ends whereas the
latter produce overhanging 3' single-stranded DNA
ends (e.g., EcoRI for the 5' extensions and PstI for
the 3' extensions). Others cut both strands at the
center of the site, to produce a blunt-ended fragment
(e.g., EcoRV). Cleavage can even occur within the
nonspecific nucleotides of an interrupted sequence
(e.g., SfiI). In contrast, a subset of the type II
endonucleases, called type IIs, cleave nonspecific DNA
at a fixed position outside of their recognition sites. An
example of one of these enzymes is given by FokI
(Table 1).
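Whether a cut leaves a 5' extension, a 3' extension, or a blunt end follows directly from where the two strands are cut. In the sketch below (a hypothetical helper, not a standard tool), cut positions are counted in top-strand coordinates from the first base of the recognition site. For palindromic sites the bottom-strand cut is assumed to mirror the top-strand cut; for FokI the positions (9 and 13 nucleotides beyond the 5 bp site) are taken from Table 1.

```python
def end_type(site_length, top_cut, bottom_cut=None):
    """Classify the ends produced by cutting both strands at a recognition site.

    A cut at position k lies between the k-th and (k+1)-th base counted from
    the start of the site on the top strand (k may exceed the site length for
    enzymes such as FokI that cut outside their site). If bottom_cut is not
    given, the site is assumed palindromic and the bottom cut mirrors the top.
    Returns (kind, overhang_length).
    """
    if bottom_cut is None:
        bottom_cut = site_length - top_cut
    if top_cut == bottom_cut:
        return ("blunt", 0)
    # Top strand cut upstream of the bottom-strand cut leaves single-stranded
    # 5' ends; the reverse leaves single-stranded 3' ends.
    kind = "5'" if top_cut < bottom_cut else "3'"
    return (kind, abs(bottom_cut - top_cut))


if __name__ == "__main__":
    print("EcoRI", end_type(6, 1))       # G|AATTC
    print("PstI", end_type(6, 5))        # CTGCA|G
    print("EcoRV", end_type(6, 3))       # GAT|ATC
    print("SfiI", end_type(13, 8))       # GGCCnnnn|nGGCC
    print("FokI", end_type(5, 14, 18))   # GGATG(n)9 / (n)13
```

The same rule covers cuts inside the site (EcoRI, PstI, EcoRV), within the nonspecific nucleotides of an interrupted site (SfiI), and outside the site altogether (FokI).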

As more enzymes are characterized in detail, it is 
becoming clear that there are significant differences 
in subunit assembly, substrate requirements, and 
modes of DNA cleavage. For instance, one subset of
enzymes, called type IIe (e.g., NaeI), bind two
recognition sites but cleave only one; one site acts as
an allosteric activator for cleavage of the second.
Another subset of enzymes, typified by SfiI, not only
bind two sites simultaneously but subsequently cleave
both loci during one binding event.
The oligomeric BcgI endonuclease also cleaves four 
DNA strands but in a bilateral fashion on either 
side of its recognition site, thus excising a 32-34 bp 
fragment. 

In the laboratory, many endonucleases are con- 
sidered ‘difficult,’ due to slow or incomplete DNA 
cleavage. However, this may be due in part to the 
alternative strategies that these enzymes use to cut 
DNA, such that the number and arrangement of re- 
cognition sites on each DNA molecule can affect the 
rate of cleavage. 


Applications of Type II Restriction Endonucleases


Tools of Molecular Biology 

The type II restriction endonucleases cleave their spe- 
cific recognition sites at least a million times faster 
than any other sequence, even those which differ by 
just one bp. This discrimination, combined with the 
reproducibility of the cleavage reactions, has led to 
their prolific use in molecular biology. (However,
certain reaction conditions such as organic solvents or
Mn2+ ions can lead to a loss in discrimination, and
sites which differ by 1 bp are cleaved more readily;
this is referred to as ‘star activity’.) Such precise DNA
cleavage is a useful diagnostic for analyzing large
DNA molecules. DNA digested with one or more
restriction enzymes will produce a distinctive series of
linear DNA fragments (restriction fragments). After
separation by gel electrophoresis through agarose or
polyacrylamide, fragment sizes can be compared and a
map constructed showing the relative positions of 
each site. Only one permutation of sites can produce 
the observed fragments. This technique is a first step in 
genome characterization and can also be used to map 
defective alleles based on the loss or creation of parti- 
cular sites (restriction fragment length polymorphism; 
RFLP). 
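The reasoning behind restriction mapping can be illustrated by a brute-force search over fragment orders. This minimal sketch assumes a linear molecule digested separately with two enzymes (called A and B here) and then with both together, and keeps every ordering whose predicted double-digest fragments match the observed ones. It is exponential in the number of fragments, so it is only a toy; the fragment sizes in the example are hypothetical.

```python
from itertools import permutations


def cut_positions(ordered_fragments):
    """Internal cut positions implied by fragment lengths listed left to right."""
    positions, total = [], 0
    for length in ordered_fragments[:-1]:
        total += length
        positions.append(total)
    return positions


def fragments_from_cuts(cuts, total_length):
    """Fragment lengths from cutting a linear molecule at the given positions."""
    edges = [0] + sorted(set(cuts)) + [total_length]
    return [right - left for left, right in zip(edges, edges[1:])]


def double_digest_maps(a_fragments, b_fragments, ab_fragments):
    """Every (order of A fragments, order of B fragments) consistent with the
    fragments observed in the A+B double digest."""
    total = sum(a_fragments)
    if sum(b_fragments) != total or sum(ab_fragments) != total:
        raise ValueError("all digests must add up to the same molecule length")
    target = sorted(ab_fragments)
    maps = []
    for a_order in sorted(set(permutations(a_fragments))):
        for b_order in sorted(set(permutations(b_fragments))):
            cuts = cut_positions(a_order) + cut_positions(b_order)
            if sorted(fragments_from_cuts(cuts, total)) == target:
                maps.append((a_order, b_order))
    return maps


if __name__ == "__main__":
    # Hypothetical fragment sizes (kb) for a 10 kb linear molecule.
    print(double_digest_maps([3, 4, 3], [5, 5], [3, 2, 2, 3]))
```

For these hypothetical digests the search finds a single arrangement: A cuts at 3 and 7 kb and B cuts at 5 kb.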

All type II restriction endonucleases cleave phosphodiester
bonds to leave 5'-phosphate and 3'-hydroxyl
groups. These termini are recognized by DNA ligase, 
allowing the ends to be covalently joined. When the 
ends have complementary 5’ or 3’ extensions (‘sticky 
ends’), the fragments will pair spontaneously. For 
instance, digestion with PstI leaves the single-stranded 
sequence TGCA at the 3’ ends of DNA. The comple- 
mentary ends need not be produced by the same 
enzyme. Although NsiI has a different recognition
site from PstI (Table 1), it generates the same
single-stranded 3' end. Therefore, DNA cleaved by PstI and
NsiI can be readily annealed, and joined by DNA 
ligase. These properties are fundamental to recom- 
binant DNA technology. 
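The PstI/NsiI example can be checked mechanically. This sketch assumes palindromic sites cut at mirror-image positions on the two strands, derives each single-stranded extension (read 5'-3'), and treats two ends as compatible when they have the same polarity and their extensions are complementary. Ligation efficiency and other practical factors are ignored, and the helper names are hypothetical.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.translate(_COMPLEMENT)[::-1]


def sticky_end(site, top_cut):
    """(polarity, extension) left on each end after cutting a palindromic site.

    The top strand is cut after `top_cut` bases of the site; the bottom strand
    is assumed to be cut at the mirror-image position.
    """
    bottom_cut = len(site) - top_cut
    lo, hi = sorted((top_cut, bottom_cut))
    if lo == hi:
        return ("blunt", "")
    polarity = "5'" if top_cut < bottom_cut else "3'"
    return (polarity, site[lo:hi])


def compatible(end_a, end_b):
    """Ends can anneal if they share polarity and their extensions base-pair."""
    (pol_a, ext_a), (pol_b, ext_b) = end_a, end_b
    return pol_a == pol_b and ext_a == revcomp(ext_b)


if __name__ == "__main__":
    pst = sticky_end("CTGCAG", 5)  # PstI: CTGCA|G
    nsi = sticky_end("ATGCAT", 5)  # NsiI: ATGCA|T
    eco = sticky_end("GAATTC", 1)  # EcoRI: G|AATTC
    print(pst, nsi, compatible(pst, nsi), compatible(pst, eco))
```

Both PstI and NsiI ends carry the 3' extension TGCA, so they are predicted to anneal, whereas a PstI end and an EcoRI end (5' extension AATT) are not.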


Mechanisms of DNA Recognition and Cleavage

An extensive biochemical analysis of the type II 
restriction endonucleases, alongside structural evalua- 
tion by X-ray crystallography, has revealed a detailed 
picture of DNA recognition and cleavage. To cut both 
strands of a recognition site during one binding event, 
each scissile phosphodiester bond must be proximal to 
a protein active site. In general, this is achieved by a
symmetrical arrangement of active sites, one on each
subunit of a dimer (Table 1). Where four strands are
cut at the same time, as with SfiI, four active sites are
required, one on each subunit of a tetramer (Table 1).
One exception to this organization is FokI (Table 1), 
which is a monomer comprising two domains con- 
nected by a flexible amino acid linker; one domain 
binds the recognition site whilst the other cleaves 
DNA. 

A comparison of the current structures of type II
restriction endonucleases reveals two distinct groups
of enzymes, an EcoRI-like group and an EcoRV-like
group, segregated according to which structure
they most resemble. Any homology at a structural
level seems surprising due to the lack of significant 
homology at the amino acid level. Nevertheless, the 
structural similarities appear to relate to similarities 
in the biochemical properties of the enzymes. In both 
groups, the structural elements that interact with 
the DNA rely on a network of hydrogen bonds to 
the bases and backbone phosphates. For the EcoRI-like 
group, these contacts result in the enzymes binding 
more strongly to their recognition sites than any other
sequences (some binding energy may be lost due to 
DNA distortion). In contrast, EcoRV-like enzymes 
bind every DNA sequence equally well, including 
their own recognition sites; no discrimination be- 
tween sites can occur at the level of DNA binding. 
Instead, discrimination arises at the catalytic step; only
the DNA in the specific complex is sufficiently
distorted to allow Mg2+ ions to bind and so facilitate
cleavage (the Mg2+ ions associate weakly when the
enzyme is bound at any other site). The structure of
the DNA in the protein-DNA complex is different in 
each case. With EcoRV the DNA is severely distorted, 
whereas with BamHI it is essentially B-DNA like. 
Despite the differences between and within the 
groups, the active site regions of all the enzymes are 
similar. It is not yet clear if this relates to a shared 
mechanism for the actual DNA cleavage step. 


Further Reading 

Roberts RJ and Macelis D (1998) REBASE: restriction enzymes and methylases. Nucleic Acids Research 26: 338-350. (http://www.neb.com/rebase/)

Roberts RJ and Macelis D (2001) REBASE: restriction enzymes and methylases. Nucleic Acids Research 29: 268-269. (http://rebase.neb.com)


See also: DNA Cloning; DNA Mapping; DNA 
Modification; Nuclease; Recombinant DNA 
Technology; Restriction and Modification 
A restriction fragment length polymorphism (RFLP)
is a DNA variation that affects the distance between 
restriction sites (most often by a nucleotide change 
that creates or eliminates a site) within or flanking a 
DNA fragment recognized by a cloned probe. RFLPs 
are detected as bands of different sizes on Southern blot 
hybridization. The term RFLP is commonly used even 
in situations where the DNA variation may not 
represent a true polymorphism in the population- 
based definition of this term. 
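The idea can be illustrated with a small sketch, assuming hypothetical site positions in a 1500 bp region and a hypothetical probe. A single nucleotide change that eliminates one site changes the size of the fragment the probe detects on a Southern blot.

```python
def fragments(cuts, length):
    """(start, end) of each fragment when a linear region is cut at `cuts`."""
    edges = [0] + sorted(cuts) + [length]
    return list(zip(edges, edges[1:]))


def probe_bands(cuts, length, probe_start, probe_end):
    """Sizes of the fragments overlapping the probe, i.e. the bands it detects."""
    return sorted(end - start for start, end in fragments(cuts, length)
                  if start < probe_end and end > probe_start)


if __name__ == "__main__":
    region, probe = 1500, (650, 700)  # hypothetical region length and probe (bp)
    allele_1 = [100, 600, 1000, 1400]  # all sites present
    allele_2 = [100, 600, 1400]        # site at 1000 lost by a point mutation
    band_1 = probe_bands(allele_1, region, *probe)
    band_2 = probe_bands(allele_2, region, *probe)
    print(band_1, band_2, "heterozygote:", sorted(set(band_1 + band_2)))
```

Here the probe detects a 400 bp fragment from one allele and an 800 bp fragment from the other, so a heterozygote would show both bands.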


See also: Restriction Endonuclease 
A restriction map is a map of DNA illustrating the
position of sites recognized and cleaved by various 
restriction endonucleases. 


See also: Restriction Endonuclease 
The RET (Rearranged during Transfection) proto-oncogene
encodes a cell surface receptor tyrosine
kinase which is required for development of the kid- 
ney and some nerves. RET is frequently rearranged in 
papillary thyroid carcinoma, resulting in fusion of the 
RET tyrosine kinase domain to sequences of one of 
several other proteins. These chimeric proteins are 
able to dimerize and stimulate cell proliferation and 
tumor formation in the absence of the signals that 
normally control RET activation. Inherited point 
mutations in RET, which also activate the protein 
inappropriately, cause the cancer syndrome multiple 
endocrine neoplasia type 2 (MEN 2). Conversely, 
mutations that inactivate RET are found in patients 
with the birth defect Hirschsprung disease, which is 
characterized by absence of the nerves and ganglia of 
the lower intestine. 


See also: Hirschsprung’s Disease; Multiple 
Endocrine Neoplasia 
Mathematically, a tree is a graph of a minimally
connected set of points (nodes) connected by edges
(branches). Minimally connected means that, for any
pair of points in the graph, there is one and only one
path that gets you from one node to the other. If
there were more than one way of getting from one
point to another, the graph would no longer be a tree
but a network (having
paths that are loops). Genealogies often have loops 
within them as relatives mate with each other (called 
inbreeding and seen especially, for example, in royal 
families or small religious sects). The presence of these 
loops is called reticulation. Species hybridization (e.g., 
crossing a lion and a tiger to get ligers or tigons) is 
another way in which reticulation may arise. A gene 
sequence formed by the recombination of two other 
genes has a reticulated ancestry. Most methods for 
phylogeny reconstruction are incapable of recogniz- 
ing reticulation. 
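The distinction between a tree and a network can be tested directly. The sketch below uses a union-find structure: an edge joining two nodes that are already connected closes a loop (reticulation). The function name and the node labels in the example are arbitrary.

```python
def classify_graph(nodes, edges):
    """Return 'tree', 'network' (connected but with a loop), or 'disconnected'."""
    parent = {node: node for node in nodes}

    def find(node):
        # Follow parent links to the representative of node's component.
        while parent[node] != node:
            parent[node] = parent[parent[node]]  # path halving
            node = parent[node]
        return node

    has_loop = False
    for u, v in edges:
        root_u, root_v = find(u), find(v)
        if root_u == root_v:
            has_loop = True  # u and v were already connected: a second path
        else:
            parent[root_u] = root_v
    if len({find(node) for node in nodes}) > 1:
        return "disconnected"
    return "network" if has_loop else "tree"


if __name__ == "__main__":
    nodes = ["A", "B", "C", "D"]
    branches = [("A", "B"), ("B", "C"), ("B", "D")]
    print(classify_graph(nodes, branches))                 # one path between any pair
    print(classify_graph(nodes, branches + [("C", "D")]))  # C-D closes a loop
```

Adding a single extra edge between two already-connected nodes is enough to turn the tree into a network.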


See also: Trees 


Retinitis Pigmentosa 


S S Bhattacharya 


Copyright © 2001 Academic Press 
doi: 10.1006/rwgn.2001.1108


Retinitis pigmentosa (RP; MIM 268000; Online
Mendelian Inheritance in Man: http://www.ncbi.nlm.nih.gov/omim)
is a clinically and genetically heterogeneous disease
primarily affecting the rod photoreceptor cells of the
retina. The disease is progressive
and has an overall prevalence of approximately 1:3000. 
RP is initially characterized by night blindness and 
reduction in peripheral visual field, and later involves 
loss of central vision. It can be inherited as an auto- 
somal recessive, autosomal dominant, digenic, or 
X-linked trait. Genetic studies have implicated 29 
chromosomal loci, and so far mutations in 15 genes 
have been associated with RP. A comprehensive list of 
RP loci and references is available on the RetNet website
(http://www.sph.uth.tmc.edu/RetNet). 
Retinoblastoma is the commonest intraocular tumor
of childhood, arising from primitive precursor cells in 
the developing retina, and affecting one child in 20 000 
before the age of 5 years. It is of seminal importance 
in human cancer genetics as a paradigm of the role of 
the antioncogene (tumor suppressor gene) in human 
cancer. Studies of the function of the gene product are 
shedding light on many processes fundamental to cell 
growth and replication. 
Clinical Aspects


Two different patterns of retinoblastoma have long
been recognized: a hereditary form and a nonhereditary
form. Hereditary retinoblastoma is usually bilateral,
with the tumor arising from several foci within
the eye. Hereditary cases tend to present earlier than
nonhereditary cases (mean age 8 months vs. 25 months), and the
patients are also at increased risk of second primary 
tumors, particularly osteosarcoma (bone tumor). 
There may be a family history of retinoblastoma. In 
nonhereditary retinoblastoma the tumor is unilateral 
and unifocal, with no family history. 

Untreated, retinoblastoma expands and invades 
local tissues and the brain, resulting in death in the 
large majority of cases. Treatment of advanced cases 
requires removal of the affected eye, but external 
radiotherapy, focal radiotherapy (in which a radio- 
active source is implanted next to the tumor), laser 
photocoagulation, cryotherapy, and chemotherapy 
have proved to be effective treatments in suitable 
cases, and have resulted in survival rates of over 90%. 


Genetics 


Hereditary retinoblastoma is passed on as a dominant
trait with 90% penetrance; however, about 75% of
cases arise from new mutations. In 1971 Knudson proposed
a ‘two-hit’ mutational model to explain the differences 
between the hereditary and nonhereditary forms. It 
was postulated that in hereditary retinoblastoma, a 
mutation or deletion in the germline had already inacti- 
vated one copy of the retinoblastoma gene. A second 
‘hit’ (inactivation event) which inactivated the remain- 
ing (wild-type) allele in the retinoblast cell then led 
to development of a tumor. In nonhereditary retino- 
blastoma, two such somatic hits are required in a 
retinoblast cell before the tumor can develop. Since
one hit is a fairly likely event, but two hits in the same
cell line are much less likely, this was thought
to explain why inherited retinoblastoma is usually
bilateral and multifocal and why it has an earlier age
of onset than nonhereditary cases. Thus the retinoblastoma
gene was predicted to be a ‘tumor suppressor’ gene
with a role in preventing tumor formation.
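The arithmetic behind the two-hit argument can be sketched in a few lines of Python. This is a minimal illustration, not a published model: the number of retinoblasts per eye (`N_CELLS`) and the per-allele probability of a somatic hit (`P_HIT`) are hypothetical round numbers chosen only to show the scaling, and the two eyes are assumed to be independent. A germline carrier needs one further hit in a cell (probability p), whereas a noncarrier needs both alleles hit in the same cell (probability p²).

```python
import math

# Hypothetical parameters, chosen only to illustrate the scaling argument.
N_CELLS = 2_000_000   # retinoblasts per eye (assumed)
P_HIT = 1e-6          # probability that a given allele is inactivated in a cell (assumed)


def p_cell_transformed(p_hit: float, germline_carrier: bool) -> float:
    """Probability that one retinoblast loses both functional RB1 alleles."""
    # A carrier already has one inactivated allele, so one somatic hit suffices;
    # a noncarrier needs independent hits on both alleles of the same cell.
    return p_hit if germline_carrier else p_hit * p_hit


def expected_foci(n_cells: int, p_hit: float, germline_carrier: bool) -> float:
    """Expected number of independent tumor foci in one eye."""
    return n_cells * p_cell_transformed(p_hit, germline_carrier)


def p_eye_affected(n_cells: int, p_hit: float, germline_carrier: bool) -> float:
    """Probability that at least one focus arises in one eye."""
    q = p_cell_transformed(p_hit, germline_carrier)
    # 1 - (1 - q)**n, computed stably for very small q
    return -math.expm1(n_cells * math.log1p(-q))


for carrier in (True, False):
    per_eye = p_eye_affected(N_CELLS, P_HIT, carrier)
    print(
        f"carrier={carrier}: foci/eye={expected_foci(N_CELLS, P_HIT, carrier):.3g}, "
        f"P(eye affected)={per_eye:.3g}, P(bilateral)={per_eye ** 2:.3g}"
    )
```

With these placeholder values a carrier averages about two foci per eye and is usually affected in both eyes, whereas a noncarrier has only a few-in-a-million chance per eye, so multifocal or bilateral nonhereditary disease is vanishingly rare.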


Molecular Aspects 


The association of retinoblastoma with chromosomal 
deletions and translocations involving chromosome 
13 led to the localization of the susceptibility gene to 
chromosome band 13q14. Subsequent work resulted in
the isolation of a gene, named RB1, which was
consistently found to be mutated or deleted in ret- 
inoblastoma tumor cells. In hereditary cases of retino- 
blastoma, a mutation was identifiable in the germline 
in most patients, whereas in nonhereditary cases both
alleles of the RB1 gene were found to be intact in the 
germline, but to have been inactivated in the tumor cell 
line in two separate events, either by point mutation or 
deletion, thus confirming the two-hit hypothesis. 
Since the work on retinoblastoma, other tumor sup- 
pressor genes have been identified which also comply 
with the two-hit model, so this has proved an import- 
ant paradigm in understanding many types of cancer. 


Gene Product 


The RB gene encodes a 105 kDa protein named p105- 
RB which acts to regulate cell proliferation by binding 
certain transcription factors (such as members of the 
E2F family) in the nucleus of the cell at specific points 
in the cell cycle. In addition, it has been shown that
some viral proteins, such as the E7 protein from the
human papillomavirus type 16 (HPV-16), bind p105-RB
and thereby promote cell division and viral
replication. The p105-RB protein is expressed in all tissues,
but has been shown to be absent from retinoblastoma 
cells, and also from a high proportion of other types of 
tumor, such as lung, breast, and bladder tumors. 


Families with Retinoblastoma 


Before the RB gene had been isolated, it was possible 
to infer mutation carrier status using closely linked 
genetic markers in some families. Now that the precise
defect has been identified in many families, it is
possible to look for the mutation directly.
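Carrier status feeds directly into recurrence-risk estimates. As an illustration only, the sketch below applies simple Mendelian and Bayesian arithmetic to the 90% penetrance quoted above; it assumes the unaffected child is past the age of risk and ignores mosaicism, low-penetrance alleles, and screening results. `child_risk` and `carrier_posterior` are hypothetical helper names.

```python
PENETRANCE = 0.9  # from the text: dominant inheritance with ~90% penetrance


def child_risk(penetrance: float = PENETRANCE) -> float:
    """Risk that a child of a germline carrier develops retinoblastoma."""
    # 1/2 chance of inheriting the mutant allele, times the chance it is penetrant
    return 0.5 * penetrance


def carrier_posterior(prior: float = 0.5, penetrance: float = PENETRANCE) -> float:
    """Probability that an unaffected at-risk child is nevertheless a carrier (Bayes)."""
    unaffected_if_carrier = 1.0 - penetrance
    unaffected_if_noncarrier = 1.0
    joint_carrier = prior * unaffected_if_carrier
    joint_noncarrier = (1.0 - prior) * unaffected_if_noncarrier
    return joint_carrier / (joint_carrier + joint_noncarrier)


print(f"risk to child of a carrier: {child_risk():.2f}")
print(f"residual carrier risk if unaffected: {carrier_posterior():.3f}")
```

Under these assumptions each child of a carrier has about a 45% risk, and an unaffected child of a carrier still has roughly a 1-in-11 chance of carrying the mutation, which is why direct mutation testing is so useful in these families.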




See also: Oncogenes 
Retroviruses are RNA-containing viruses that can
convert their RNA genome into double-stranded DNA
molecules through a virus-associated reverse
transcriptase which becomes activated upon cell infection.
The resultant DNA ‘provirus’ can integrate itself into 
a relatively random site in the host genome. The 
genetic information present in the retroviral genome 
is retained within the integrated provirus, and under 
certain conditions, the provirus can be activated to 
produce new RNA genomes along with the associated 
proteins — including reverse transcriptase — that can 
come together to form new virus particles that are
ultimately released from the cell surface by budding.
However, in many cases, stably integrated retroviral
elements appear not to be active.

Once it has become integrated into a chromosome, 
the provirus is replicated with every round
of host DNA replication, irrespective of whether the
provirus itself is active or silent. Furthermore, proviruses
that integrate into the germline — through the sperm 
or egg genome — will segregate along with their host 
chromosome into the progeny of the host animal and 
into subsequent generations of animals as well. In 
certain hybrid mouse strains, new proviral integra- 
tions into the germ line can be observed to occur at 
abnormally high frequencies. 

All strains of mice as well as all other mammals 
have endogenous proviral elements. These elements 
can be classified and subclassified according to the 
type of retrovirus from which they derived. Loss and 
acquisition of new proviral sequences is an ongoing 
process and, as a consequence, the genomic distri- 
bution of these elements is highly polymorphic. 

It is of evolutionary interest to ask the question: 
From where do retroviruses come? Retroviruses 
cannot propagate in the absence of cells, but cells can 
propagate in the absence of retroviruses. Thus, it 
seems extremely likely that retroviruses are derived 
from sequences that were originally present in the cell 
genome. The first retrovirus must have been able to 
free itself from the confines of the cell nucleus through 
an association with a small number of proteins that 
allowed it to coat, and thus protect, itself from the 
harsh extracellular environment. Of course, the pro- 
tein most critical to the propagation of the retrovirus 
is the enzyme that allows it to reproduce - RNA- 
dependent DNA polymerase, commonly referred to 
as reverse transcriptase. But where did this enzyme 
come from? Reverse transcriptase catalyzes the
production of single-stranded complementary DNA
molecules from an RNA template. This enzymatic
activity does not appear to be required for any normal
cellular process known in mammals. How could such
an activity, without any apparent benefit to the host
organism, arise de novo in a normal cell? One
possible answer is that reverse transcriptase did not
evolve for the benefit of the organism itself but,
rather, for the benefit of selfish DNA elements within
the genome that use the enzyme to propagate
themselves within the confines of the genome.


See also: Retroviruses; Reverse Transcription 
Retroregulation is the ability of a sequence downstream
to regulate translation of an mRNA.


See also: Regulatory RNA 
Definition


Retrotransposons (or retrotransposable elements) are 
3.5-10 kb mobile DNA units that encode the enzyme 
reverse transcriptase. New copies of a retrotransposon 
are generated by reverse transcription of an RNA 
transcript, a process called retrotransposition. This 
requirement of an RNA intermediate distinguishes
all retrotransposons from the DNA transposons, whose
mobility depends upon the direct replication or
excision/reinsertion of DNA.


Classes of Retrotransposons 


The first retrotransposons to be discovered were 
similar in structure and protein-encoding capacity to 
vertebrate retroviruses. While these elements lack
an extracellular step in their life cycle, all
aspects of their intracellular retrotransposition are
similar to those of retroviruses. The second class of
retrotransposons to be discovered were unlike retro- 
viruses in structure and encoded a different set of 
proteins. This second class of elements did not contain 
the long terminal repeats (LTRs) associated with retro- 
viruses and the first class of retrotransposons, ending 
instead with a simple poly(A) tail. This second class of 
elements has been termed nonviral retrotransposons, 
poly(A) retrotransposons, retroposons, and LINEs 
(long interspersed nucleotide elements). Given the
importance of LTRs in all aspects of retroviral retro- 
transposition and the ease of identifying LTRs, most 
authors refer to these two groups of retrotransposons 
as the LTR and non-LTR classes. 


Structure and Retrotransposition Mechanism


LTR Retrotransposons 

The flanking LTRs are usually several hundred base 
pairs in length but range from less than 100 to over 
1000 bp. As shown in Figure 1A and B, the central
protein-encoding region usually contains two open
reading frames (ORFs). The first is similar to the
retroviral gag ORF, encoding analogs of retroviral
capsid and sometimes nucleocapsid proteins. The
second is similar to the retroviral pol ORF and
encodes protease (PR), reverse transcriptase (RT),
RNase H (RN), and integrase (IN) proteins. The IN
proteins belong to the same family of proteins as the
transposases found in bacterial and eukaryotic
transposons. The downstream pol ORF is translated at a
lower level than the gag ORF by means of ribosomal
frameshifting or bypassing of termination codons.
Most LTR retrotransposons can be placed into two
phylogenetically distinct groups: the Pseudoviridae
and Metaviridae (these terms denoting the uncertainty
as to whether they should be considered viruses).
The major difference between these two groups is in
the location of the IN domain either before or after the
RT-RN domains. The latter arrangement is found in
retroviruses. Exceptions to the structures shown in
Figure 1A and B include elements that have fused
the gag and pol ORFs into a single ORF, and elements
containing a third ORF downstream of pol analogous
to the env gene of retroviruses.

Figure 1 Genomic organization of the major families
of retrotransposons: (A) Ty1, Saccharomyces cerevisiae
(Pseudoviridae); (B) Ty3, Saccharomyces cerevisiae
(Metaviridae); (C) R2, Drosophila melanogaster; (D) L1,
Homo sapiens. Shaded boxes represent the DNA genome
of the element, with long terminal repeats (LTRs)
represented by the arrowheads. Boxes below the DNA
genome denote open reading frames (ORFs), which are
offset to indicate that they are separated by a frameshift.
Conserved coding domains within each ORF are as
follows: PR, protease; RT, reverse transcriptase; RN,
RNase H; IN, integrase; EN, restriction-like
endonuclease; APE, apurinic-like endonuclease.

Transcription of the element begins within the left 
(upstream) LTR and terminates at the right (down- 
stream) LTR. Reverse transcription is usually primed 
by the 3’ end of a specific host tRNA annealing to the 
RNA template immediately downstream of the 5’ 
LTR sequences. After the reverse transcriptase has
extended to the 5’ end of the RNA template, the
complementary DNA (cDNA) is transferred to the 3’
end of the template by means of the identical LTR
sequences, where it is further extended to generate a
full-length (−) strand DNA. The RNA template is
destroyed by RNase H activity. Priming of second 
(+) strand DNA synthesis is by means of a small 
region of RNA resistant to this degradation at a site 
near the downstream LTR sequences. Again a tem- 
plate jump across the ends of the template enables the 
reverse transcriptase (now exhibiting DNA-directed 
DNA polymerase activity) to generate a full-length 
double-stranded DNA. Finally, the integrase binds to
the ends of the linear DNA intermediate and directs
integration into a chromosomal target site in a manner 
similar to that used by transposons. 


Non-LTR Retrotransposons 

Non-LTR retrotransposons have neither direct nor
inverted terminal repeats. The protein-coding domains
of non-LTR retrotransposons are considerably more
variable and not as well characterized as those of the
LTR retrotransposons and retroviruses. All non-LTR
elements can be divided into two groups based on the 
nature and location of their encoded endonuclease 
domain. The endonuclease domain of one group 
(Figure 1C) is located downstream of the RT domain
and has an active site similar to that of certain restriction
enzymes (EN). Most elements with this EN domain
encode a single ORF with putative DNA-binding
domains upstream of the RT domain. The second
group of non-LTR elements (Figure 1D) contains
an endonuclease domain with homology to apurinic-apyrimidinic
endonucleases located upstream of the
RT domain (APE). Most elements with the APE 
domain encode two ORFs, with the first ORF analo- 
gous to a gag. The APE group has many exceptions to 
these general features: some elements do not encode a 
first ORF, some encode an RNase H domain down- 
stream of the RT domain, while others have lost all 
domains downstream of the RT. 
Figure 2 Target-primed reverse transcription (TPRT)
model for the integration of non-LTR retrotransposons. 
Most elements end in an A-rich or poly(A) tail. In the 
initial step an element-encoded endonuclease cleaves 
the first strand of the target site and uses the released 3’ 
hydroxyl of the terminal nucleotide to prime reverse 
transcription. Cleavage of the second DNA strand 
probably occurs after reverse transcription. The mech- 
anism by which the 5’ end of the cDNA is attached to 
the upstream target sequences is unclear, but appears 
functionally equivalent to the reverse transcriptase 
simply jumping from the RNA template onto the DNA 
target. The means by which the RNA is removed and 
the second DNA strand synthesized is also not known, 
but is presumed to be heavily dependent on the cellular 
repair machinery. Thick lines, DNA target sequences; 
wavy lines, element RNA sequences; thin lines, element 
DNA sequences. 


Non-LTR retrotransposons either have an internal 
promoter that initiates transcription upstream at the 5’ 
end of the element or are cotranscribed along with the 
target site from an external host promoter. As shown in
Figure 2, integration is initiated by the endonuclease
cleaving (or simply nicking) the target site. The reverse 
transcriptase then utilizes the released 3’ end of the 
DNA to prime reverse transcription of the RNA tem- 
plate starting at its 3’ end. This polymerization of the 
first (−) DNA strand directly onto the insertion site
is called target-primed reverse transcription (TPRT).
The means by which the second (+) DNA strand is
synthesized and the 5’ end of the element is attached to
the upstream target site are not known. However, these
steps do not depend on the 5’ end of the element, as
integration proceeds normally even when the reverse
transcriptase fails to reach the 5’ end of the RNA
template, resulting in 5’-truncated copies.
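The sequence logic of TPRT, and why an early stop yields a 5’-truncated copy, can be illustrated with a string-level toy model. This is a hedged sketch, not a mechanistic simulation: it ignores second-strand synthesis, target-site duplications, and strand orientation at the target, and `first_strand_cdna`, `inserted_copy`, and the example sequences are hypothetical.

```python
# Base pairing used when DNA is copied from an RNA template.
RNA_TO_DNA_COMPLEMENT = {"A": "T", "U": "A", "G": "C", "C": "G"}


def first_strand_cdna(rna: str, copied: int) -> str:
    """(-) strand cDNA, written 5'->3', copying the 3'-most `copied` nucleotides of the RNA.

    In TPRT, reverse transcription starts at the 3' end of the RNA (usually its
    poly(A) tail), so any early stop leaves the 5' end of the element uncopied.
    """
    if not 0 < copied <= len(rna):
        raise ValueError("copied must be between 1 and len(rna)")
    template = rna[len(rna) - copied:]
    # Reading the template 3'->5' yields the complementary DNA 5'->3'.
    return "".join(RNA_TO_DNA_COMPLEMENT[base] for base in reversed(template))


def inserted_copy(target: str, nick: int, rna: str, copied: int) -> str:
    """Sense-strand view of the target after insertion at `nick` (no target-site duplication modeled)."""
    element = rna[len(rna) - copied:].replace("U", "T")
    return target[:nick] + element + target[nick:]


element_rna = "GGAUCCAUGAAAAAA"  # hypothetical element transcript ending in poly(A)
print(first_strand_cdna(element_rna, len(element_rna)))
print(inserted_copy("CCCCTTTT", 4, element_rna, len(element_rna)))  # full-length copy
print(inserted_copy("CCCCTTTT", 4, element_rna, 9))                  # 5'-truncated copy
```

Because copying always begins at the 3’ end, every insertion retains the poly(A) end of the element, and only the 5’ end varies with how far reverse transcription proceeded.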


Distribution and Evolution 


Retrotransposons are found in all eukaryotes but not 
in prokaryotes. There is a direct correlation between
the size of a eukaryotic genome and the abundance,
but not necessarily the type, of retrotransposons. For
example, 3% of the small yeast genome is composed
of retrotransposons, which are all of the LTR class.
The much larger human genome is over 30%
retrotransposons, predominantly of the non-LTR class.
Finally, 75% of the maize genome consists of
retrotransposons, predominantly of the LTR class.

Retrotransposons usually establish long-term asso- 
ciations with the host genome. This differs from the 
DNA transposons, which are believed to be active for only a
short time in any genome and are dependent on hori- 
zontal transfers between species for their long-term 
survival. The predominant vertical (through the germ- 
line) inheritance of retrotransposons is most pro- 
nounced in the non-LTR elements. L1 elements have 
been slowly accumulating throughout the 100-million- 
year history of mammalian genomes. R2 elements 
have been stable components of arthropod genomes 
for over 500 million years. These stable relationships 
of retrotransposons with the host genome are believed 
to have given rise to specialized insertion strategies. 
All retrotransposons in yeast insert either into hetero- 
chromatin or immediately upstream of tRNA genes, 
where they do not interfere with the expression of 
host genes. Similarly, a variety of retrotransposons in 
arthropods insert at specific locations in the rRNA 
genes or telomeric sequences of their host. 

The long-term relationship between retrotranspo- 
sons and the host genome raises the question of what 
controls their copy number, and whether they have 
positive as well as negative effects on the genome. 
Mobile elements have been suggested to supply
sequence variation which could enable hosts to evolve 
rapidly. On the other hand, the excessive numbers of 
these elements in many species suggest a wanton dis- 
regard for the well-being of the host genome. A num- 
ber of eukaryotes have evolved elaborate mechanisms 
in attempts to eliminate or silence these elements. 
Much remains to be understood of this ‘molecular 
arms race.’ 




See also: Integrase Family of Site-Specific 
Recombinases; Integrons; Retroviruses; Site- 
Specific Recombination; Transposable Elements 
Retroviruses are a diverse family of animal viruses that
contain RNA as their primary genetic material but 
produce a double-stranded DNA copy of their gen- 
ome in order to express their genes. The dual genetic 
system employed by retroviruses allows transmission 
of their genetic information from cell to cell as pack- 
aged RNA while simultaneously leaving an integrated 
DNA copy residing in the chromosomes of each 
infected cell. 

The study of retroviruses has had an enormous 
impact on molecular biology, biotechnology, and 
molecular medicine (Table 1). Retroviruses were 
first identified as pathogens giving rise to a wide 
range of diseases, but received special attention 
because of their role in inducing cancers. The first 
cancer-inducing, or oncogenic, retrovirus, the avian
sarcoma virus, was isolated by Peyton Rous in 1911. 
By the late 1970s the study of retrovirally induced 
cancers led to the discovery of oncogenes. Oncogenes 
are cellular genes that are normally involved in the 
regulation of cell growth but are able to induce tumors 
because they are abnormally expressed or activated by 
mutations. Retroviruses also cause a variety of serious
human diseases, including acquired immune deficiency
syndrome (AIDS), adult T-cell leukemia, and tropical
spastic paraparesis.

Table 1 Key events in the history of retroviral research

Year      Event
1904      The first retrovirus, equine infectious anemia virus (EIAV), was described
1911      Rous sarcoma virus was isolated and shown to cause tumors in chickens
1936      Mouse mammary tumor virus was shown to be the genetic factor resulting in an
          increased incidence of mammary tumors in certain strains of mice
1951      Murine leukemia virus was isolated from a strain of mice selected for high
          frequency of leukemias
1960      Temin proposed the proviral hypothesis
1970      Temin and Baltimore discovered reverse transcriptase
1976      The viral oncogene src was shown to be derived from a normal cellular gene
1978-80   Long terminal repeats were discovered, and the detailed scheme for reverse
          transcription was described
1977-87   Retroviral genomes were sequenced and the mechanism of action of numerous
          viral oncogenes was established
1981      Tumor induction by proviral insertion was discovered
1983-84   HIV-1 was identified as the cause of AIDS
1985      Viral trans-activator proteins were discovered
1989      Retroviruses were first used for genetic manipulation in humans
1993-98   Crystal structures of reverse transcriptase, protease, and integrase were solved
1997-99   Protease inhibitors were marketed as treatment for AIDS, the first examples of
          drugs derived from structure-based design

The molecular feature shared by all retroviruses 
is the ability to transfer genetic information from 
RNA to DNA (Figure 1). The identification of the 
RNA-dependent viral DNA polymerase, or reverse 
transcriptase, by Temin and Baltimore (1970) trans- 
formed prevailing concepts of the transmission of 
genetic information. Before reverse transcriptase was 
discovered, it was tacitly assumed that the flow of 
genetic information from DNA to RNA to proteins 
was unidirectional and irreversible. After the discov- 
ery of reverse transcriptase, it became apparent that 
the growth cycle of retroviruses, involving reverse 
transcription, chromosome integration, and transcrip- 
tion of the viral DNA back into RNA, is a paradigm 
for a ubiquitous mechanism of genetic exchange. 
Examples of genetic elements that use reverse
transcription include the Ty elements of yeast and the copia
elements of Drosophila, other viruses such as hepatitis
B virus, and a wide variety of cellular mobile genetic
elements known as retrotransposons. Remarkably, over
17% of the human genome is composed of sequences 
resulting from the reverse transcription of the LINE-1 
retrotransposon elements and another 7.2% of the 
human genome is derived from defective retroviruses 
(mammalian endogenous retroviruses). 

In the present recombinant DNA era, reverse
transcriptase has become an essential tool for genetic
manipulation. This unique enzyme is used to generate
DNA copies of cellular mRNAs, which can then be
cloned and sequenced. Genetically engineered
retroviruses are also widely used for the stable expression
of cloned genes in mammalian cells and as delivery
vehicles for human gene therapy.


Molecular Description of the Retroviral Life Cycle


It is convenient to think of viral replication as com- 
prising four distinct stages: infection, reverse tran- 
scription and integration, viral gene expression, and 
virus assembly and maturation. 


Infection 

Retroviral entry into cells is initiated through the 
binding of the viral envelope (env) to specific receptor 
molecules on the cell’s outer membrane. The envelope 
of retroviruses is composed of two proteins, an 
external highly glycosylated envelope protein (SU) 
and a membrane spanning protein (TM). These pro- 
teins form oligomeric spikes on the surface of virion 
particles. 

The SU protein binds to specific receptor mol- 
ecules displayed on the surface membrane of target 
cells. Binding to the receptor alters the conformation 
of the SU domain and this in turn triggers a conform- 
ational change in the TM protein, which releases 
a hydrophobic fusion peptide sequence. Interaction 
between the fusion peptide and the lipid bilayer of the
cell then mediates fusion of the cell and viral mem- 
branes and permits release of the viral core proteins 
and genetic material into the cytoplasm of the infected 
cell. 

The range of cells that retroviruses are able to infect 
is determined primarily by the distribution of the re- 
ceptor molecules in different tissues. Cells that lack an 
appropriate receptor are nonpermissive for viral entry. 
Similarly, virus particles that have lost the surface 
glycoprotein are noninfectious. 


Reverse Transcription 

Immediately after entry into the cytoplasm of a
susceptible cell, the viral RNA is reverse transcribed
into a double-stranded DNA copy. The reaction is
catalyzed by the viral reverse transcriptase (RT). The
enzyme carries out several reactions, including reverse
transcription of the RNA into complementary DNA,
conversion of the newly synthesized DNA strand into
duplex DNA, and removal of the RNA template using an
intrinsic RNase H activity.

The proviral DNA generated during reverse
transcription is not a simple copy of the virion RNA
(Figure 1). During the course of the reaction, terminal
duplications known as the long terminal repeats
(LTRs) are created by a complex set of template
switches. The LTR carries a short repeat sequence
(R) found at each end of the viral RNA and two
unique regions: U5, which is found at the 5’ end of
the viral RNA, and U3, which is found at the 3’ end
of the viral RNA. The final product is a blunt-ended
linear duplex DNA with duplicate LTR regions.

Figure 1 Flow of genetic information during retroviral growth.
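The net effect of the template switches on region order can be shown with a short Python sketch that treats each region as a labeled string. It deliberately ignores the primer-binding site, the polypurine tract, and the individual strand-transfer steps; the `ViralRNA` class and the placeholder sequences are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ViralRNA:
    """Regions of a retroviral genome, 5'->3' (sequences here are placeholders)."""
    r: str     # short repeat present at both ends of the RNA
    u5: str    # unique region at the 5' end
    body: str  # everything between U5 and U3 (gag, pol, env, ...)
    u3: str    # unique region at the 3' end

    def sequence(self) -> str:
        # The RNA reads R-U5 ... U3-R
        return self.r + self.u5 + self.body + self.u3 + self.r


def provirus(rna: ViralRNA) -> str:
    """Linear DNA after reverse transcription: U3-R-U5 ... U3-R-U5."""
    # Template switching copies U3 onto the left end and U5 onto the right end,
    # so both ends carry an identical U3-R-U5 LTR.
    ltr = rna.u3 + rna.r + rna.u5
    return ltr + rna.body + ltr


genome = ViralRNA(r="RRR", u5="555", body="gag-pol-env", u3="333")
print(genome.sequence())
print(provirus(genome))
```

The DNA is longer than the RNA by exactly one U3 plus one U5, which is the duplication that creates the two LTRs.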

Because RT lacks an error correction mechanism, 
the enzyme shows poor fidelity compared to cellular 
DNA polymerases and introduces 5-10 mismatches 
per genome per round of replication. The high error 
rate of reverse transcription helps to explain both the
wide sequence variation found between individual
retroviral isolates and the rapid selection of
drug-resistant mutants during treatment of patients with
antiretroviral drugs.
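The quoted figure of 5-10 mismatches per genome per round can be turned into a per-nucleotide rate and an estimate of how many progeny genomes escape without any new error. The sketch assumes a hypothetical genome length of 9000 nucleotides and models errors as independent events, so their number per genome follows a Poisson distribution; both are simplifying assumptions rather than measured properties of any particular virus.

```python
import math

# Hypothetical genome length used only for the per-nucleotide conversion.
GENOME_LENGTH_NT = 9_000


def per_nucleotide_error_rate(errors_per_genome: float,
                              genome_length_nt: int = GENOME_LENGTH_NT) -> float:
    """Convert errors per genome per round into errors per nucleotide per round."""
    return errors_per_genome / genome_length_nt


def fraction_without_errors(errors_per_genome: float) -> float:
    """Poisson estimate of progeny genomes carrying no new mismatch after one round."""
    return math.exp(-errors_per_genome)


for errors in (5, 10):  # range quoted in the text
    print(
        f"{errors} errors/genome: "
        f"{per_nucleotide_error_rate(errors):.1e} per nt, "
        f"{fraction_without_errors(errors):.3%} of copies error-free"
    )
```

Taken at face value, the quoted range implies that fewer than 1% of newly reverse-transcribed genomes are free of new errors, consistent with the rapid diversification described above.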


Integration 

Integration is an essential step in the life cycle of 
retroviruses. Since retroviral DNA molecules are 
unable to replicate autonomously, integration permits 
stable association of the viral and host genomes in 
dividing cells. In addition, integration is required for 
transcription of the viral DNA by cellular RNA poly- 
merase into new copies of the viral genome and the 
mRNAs that encode the viral proteins. 

Since reverse transcription takes place in the cyto- 
plasm, a specialized mechanism is used to bring the 
proviral DNA to the cell nucleus. Newly synthesized 
proviral DNA is found in a nucleoprotein complex 
containing the viral matrix (MA) protein, integrase, 
and in certain viruses, additional auxiliary proteins.
The MA protein carries a nuclear localization signal 
that is required for DNA uptake into the nucleus. 
The structure of the integrated provirus is
precisely defined by the sequences of the viral LTR. The
integrated provirus is colinear with the linear DNA
product of viral DNA synthesis except for two base
pairs that are removed from each end during
integration. The integration of retroviral DNA into the
host cell genome is catalyzed by the viral integrase
protein (IN). IN directs the removal of the two
terminal base pairs as well as a strand exchange reaction
which duplicates a short region of flanking cellular
DNA.
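The two sequence-level rules in this paragraph (loss of two base pairs from each end and duplication of flanking host DNA) can be expressed as a small string operation. This is a hedged, top-strand-only sketch: `integrate_provirus` and the example sequences are hypothetical, and the default 5 bp duplication is an assumption, since the duplication length differs between retroviruses. The hypothetical viral DNA was chosen so that the trimmed provirus begins with TG and ends with CA, as retroviral proviruses generally do.

```python
def integrate_provirus(linear_dna: str, target: str, site: int, tsd_length: int = 5) -> str:
    """Top-strand view of integration following the two rules described above.

    * two base pairs are lost from each end of the linear viral DNA;
    * a short stretch of host DNA (the target-site duplication) appears on both
      sides of the provirus. The default length of 5 bp is an assumption.
    """
    if len(linear_dna) <= 4:
        raise ValueError("linear DNA too short to trim two base pairs from each end")
    if not 0 <= site <= len(target) - tsd_length:
        raise ValueError("site leaves no room for the duplicated host sequence")
    provirus = linear_dna[2:-2]
    duplicated = target[site:site + tsd_length]
    return target[:site] + duplicated + provirus + duplicated + target[site + tsd_length:]


host = "AAAACCCCCGGGG"
viral = "AATGGGGGCATT"  # hypothetical linear viral DNA
print(integrate_provirus(viral, host, site=4))
```

The result contains the host sequence CCCCC on both sides of the trimmed provirus, which is the signature left by integrase-mediated insertion.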


Control of Gene Expression
Once the provirus is integrated into the host cell’s 
DNA, it mimics a cellular gene. The 5’ LTR contains
the viral transcription start site and a series of
cis-acting control elements that regulate transcription 
initiation by cellular RNA polymerase II. The 3’ LTR 
contains a signal that controls formation of the 3’ end 
and polyadenylation of the viral RNA transcripts. 
The range of transcription factors that recognize 
retroviral LTRs is highly diverse and helps to deter- 
mine the permissiveness of host cells for viral repli- 
cation. Frequently, viral transcription is induced by 
cell-type specific transcription factors which provide 
signals about the growth state of the host cell. For 
example, initiation of HIV transcription is
regulated by NF-κB, a transcription factor whose
activity is tightly regulated and is induced in the
activated T cells that form the primary site of HIV
infection.


Posttranscriptional regulation 

Retroviral genomes are tightly compressed (Figure 2). 
Expression of the full complement of protein products 
requires differential mRNA splicing, special transla- 
tion mechanisms, and protein processing events. 

All retroviruses carry three essential genes, the gag, 
pol, and env genes, which encode the viral structural 
proteins, enzymes, and viral envelope. Simple retro- 
viruses such as Moloney murine leukemia virus pro- 
duce only two mRNAs: the full-length viral RNA 
which acts as the mRNA for gag and gag—pol and the 
subgenomic mRNA that encodes the env gene. HIV, 
which is one of the most complex retroviruses, has 
nine genes, and produces over 30 different mRNA 
transcripts by splicing. 

Typically, the gag and pol genes are found in differ- 
ent reading frames and translation of both sets of 
proteins requires frameshifting during protein syn- 
thesis. However, in some viruses, such as Moloney 
murine leukemia virus, the gag and pol genes are sepa- 
rated by a ‘leaky’ termination codon. Both genes 
encode polyproteins that are cleaved into active 
components by the viral protease. The viral envelope 
protein is always expressed on a subgenomic RNA 
derived by splicing. Similarly, the accessory proteins
found in complex retroviruses are expressed on
subgenomic mRNAs.

Figure 2 Structures of (A) a simple retrovirus, Moloney murine leukemia virus, (B) a complex retrovirus, human
immunodeficiency virus, and (C) an acutely oncogenic retrovirus, Abelson leukemia virus.

Regulation of the relative amounts of spliced and 
unspliced RNAs is critical for optimal virus growth. 
Since unspliced transcripts from cellular genes are
subject to rapid degradation in the nucleus, retroviruses
have developed a variety of signal sequences that
promote export of unspliced mRNAs from the
nucleus. In HIV, the export signal is supplied by Rev,
a viral regulatory protein that binds specifically to an
RNA sequence present in the envelope gene. In other
cases a specific RNA sequence located in the envelope 
gene is recognized by nuclear export proteins. To 
achieve the correct balance between spliced and 
unspliced RNAs special mechanisms to control the 
efficiency of splicing are also employed by retroviruses. 


Trans-acting viral regulators of transcription 

In complex retroviruses such as HIV and the T-cell 
leukemia viruses, viral trans-activator proteins are 
required for efficient transcription. The viral trans- 
activator proteins are early gene products that amplify 
viral gene expression by establishing a positive feed- 
back mechanism. However, the specific mechanisms 
employed by viral trans-activators vary widely. For 
example, the HTLV-1 viral regulatory protein Tax 
interacts with DNA binding proteins bound to the 
viral LTR to enhance transcription initiation. During 
HIV transcription the viral regulatory protein Tat 
stimulates transcriptional elongation. Tat is recruited 
to the transcription complex by binding to a specific
RNA leader sequence and then activates a protein
kinase which selectively phosphorylates the
carboxyl-terminal domain of the large subunit of RNA
polymerase II.

The retroviral trans-activators help ensure com- 
plete viral shutdown in latently infected cells. A host 
cell that is growing slowly will typically express only 
low levels of the transcription factors required for the 
initiation of viral mRNA synthesis. In these cells, 
trans-activator protein expression is also highly 
restricted and the provirus genome will remain quies- 
cent. If the host cell is subsequently activated, an early 
burst of trans-activator protein production ensures 
that a full cycle of viral replication is initiated. 
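The on/off behavior described above follows from positive feedback, which a deliberately crude difference-equation model can illustrate. Everything in this sketch is hypothetical: `simulate_tat`, the 0-1 `host_activity` scale, and the `basal`, `gain`, and `decay` values are placeholders chosen to show the qualitative switch, not measured parameters for HIV or HTLV.

```python
def simulate_tat(host_activity: float, steps: int, basal: float = 0.1,
                 gain: float = 1.0, decay: float = 0.5) -> float:
    """Toy positive-feedback model of a viral trans-activator level.

    host_activity: hypothetical 0-1 measure of the host transcription factors
    basal:         LTR activity without trans-activator (assumed)
    gain:          extra LTR activity per unit of trans-activator (assumed)
    decay:         fraction of trans-activator lost per step (assumed)
    """
    level = 0.0
    for _ in range(steps):
        # New synthesis requires host factors and is amplified by the activator itself.
        synthesis = host_activity * (basal + gain * level)
        level = level * (1.0 - decay) + synthesis
    return level


print(f"resting cell:   {simulate_tat(host_activity=0.1, steps=50):.3f}")
print(f"activated cell: {simulate_tat(host_activity=1.0, steps=20):.0f}")
```

With these placeholder values the resting cell settles at a low, stable level because loss outpaces feedback, whereas in the activated cell the feedback exceeds the loss and the activator level rises geometrically, a crude analog of the early burst. The toy model has no saturation, so its activated level grows without limit; a real system would plateau.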


Virus Assembly and Maturation 
The later stages of the life cycle involve synthesis of 
the virion proteins and their assembly into virus par- 
ticles containing two copies of the viral RNA. Assem- 
bly of an infectious virion particle is required for the 
efficient transfer of the retroviral RNA genome from 
cell to cell. 

Particle formation is controlled by the self-assembly
of the Gag protein. The Gag polyprotein comprises the
matrix (MA) protein, which binds to membranes and
directs virions to the cell surface; the capsid (CA)
protein, which forms an inner shell; and the
nucleocapsid (NC) protein, which binds directly to the
virion RNA. In addition to the Gag proteins, retroviral
particles contain the viral enzymes protease (PR),
reverse transcriptase (RT) and integrase (IN). These
enzymes are synthesized initially as Gag-Pol
(MA-CA-NC-PR-RT-IN) fusion proteins and then
incorporated into immature particles.

After the retroviral particle buds from the cell
surface, the internal pH of the viral particle drops to
below pH 5.0. This permits assembly of activated 
protease dimers which are then able to cleave the 
viral polyproteins into their individual components. 

The virion RNA contains a specific packaging
signal (ψ) located near its 5’ end, which is required for
efficient incorporation of RNA into viral particles. 
Subgenomic viral RNAs lack the packaging signal 
because it is removed by splicing. 
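Why only full-length genomic RNA is packaged can be shown with a deliberately simplified genome map. This sketch assumes, for illustration, that ψ lies between the 5’ splice donor (SD) and the splice acceptor (SA) used to make the env mRNA; the element names, their order, and the helper functions are a hypothetical simplification, not the map of any particular virus.

```python
# Hypothetical, simplified 5'->3' map of a simple retroviral genomic RNA.
GENOMIC_RNA = ["R-U5", "SD", "psi", "gag", "pol", "SA", "env", "U3-R"]


def splice(rna, donor="SD", acceptor="SA"):
    """Join the donor to the acceptor, removing every element in between."""
    start, end = rna.index(donor), rna.index(acceptor)
    return rna[:start + 1] + rna[end:]


def is_packaged(rna):
    """Only RNAs that still carry the packaging signal are encapsidated."""
    return "psi" in rna


env_mrna = splice(GENOMIC_RNA)
print(env_mrna)                  # ['R-U5', 'SD', 'SA', 'env', 'U3-R']
print(is_packaged(GENOMIC_RNA))  # True: the full-length genome keeps psi
print(is_packaged(env_mrna))     # False: psi was removed with the intron
```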


Retroviruses and Cancer 


Retroviruses cause tumors by a wide variety of 
mechanisms, but a common theme is the activation 
of oncogenes. Induction of tumors by nonacutely 
transforming retroviruses is due to retroviral insertion 
adjacent to cellular proto-oncogenes. Retroviral inte- 
gration is mutagenic since the proviral genome is 
inserted at random into regions of actively
transcribed chromatin. When a retrovirus integrates near
a cellular proto-oncogene, tumors frequently result
because the viral LTR acts as a dominant control 
element that stimulates aberrant expression of the 
oncogene. 

Although integration is irreversible, the provirus 
can undergo partial deletion by homologous recom- 
bination between the long terminal repeats. Pro- 
viruses can also recombine with cellular genes that 
are adjacent to their sites of integration, giving rise
to defective viruses that are capable of transducing 
fragments of cellular genes. Many highly oncogenic 
viruses are defective viruses that carry oncogenes 
initially acquired by nonhomologous recombination 
events. A ‘helper’ virus supplying the proteins needed 
for viral growth is required to permit the replication 
of the defective oncogenic viruses. 

In certain retroviruses the envelope gene can act as 
an oncogene by interacting with receptor proteins that 
are normally used as growth factor receptors. For 
example, the envelope gene of the Friend erythroleu- 
kemia virus produces massive erythroid proliferation 
by binding to, and activating, the erythropoietin (Epo) 
receptor. 
Human Immunodeficiency Virus (HIV) 


The discovery in 1983 that AIDS is caused by a retro- 
virus, the human immunodeficiency virus (HIV), led 
to an explosion of research into the virus life cycle and 
the structure of viral proteins. Contemporary retro- 
virology is largely devoted to HIV because of the 
pressing need to develop a safe and effective protective 
vaccine and improved antiviral therapies. 

Over the last 10 years, the three-dimensional struc- 
tures of an impressive number of molecules involved 
in HIV replication were solved, including the viral 
coat proteins, the viral enzymes (RT, RNaseH, PR, 
and IN), and the viral core proteins (MA, CA, NC).
These structures have provided critical starting
points for drug discovery efforts. Inhibitors of the
viral protease represent the first examples of drugs 
to be derived primarily from structure-based drug 
design. 

Work on developing vaccines has progressed 
slowly. One difficulty is that antibodies directed 
against the viral envelope proteins block virus infec- 
tions poorly. More complex strategies involving a 
combination of antigens that stimulate both humoral 
and cellular immunity will probably be needed to 
confer protective immunity. 


Retroviruses and Genetic Manipulation 


The ability of retroviruses to infect a broad range 
of cell types and produce stable integrated copies 
of their genetic material has been exploited in the 
development of retroviral vectors for artificial gene 
transfer. In its simplest form, a retroviral vector con- 
tains only the viral LTR and the RNA packaging 
signal. RNA transcripts of the retroviral vector can 
be packaged into virions using cell lines carrying 
defective viruses that lack the appropriate packaging 
signals. Foreign genes inserted into retroviral vectors 
can be regulated by transcription from the LTR, spli- 
cing, or the introduction of an internal promoter 
element. 

Retroviral vectors have become widely used tools 
for gene transfer in tissue culture and animals and are 
currently the most widely used method for gene trans- 
fer for therapeutic purposes in humans. 




See also: Gene Expression; Gene Therapy, Human; 
LINE; Oncogenes; Retrotransposons; Reverse 
Transcription 
Background 


Recent developments in biology, in particular the 
advent of large-scale genomics, mean that we can, in 
principle, envision an understanding of the functions 
of all the proteins encoded in a genome. The scale of 
this task can be judged by the example of Caenorhabditis
elegans. At the time of publication of the genome
sequence in 1998, of the 19,000 or so predicted genes,
only about 10% had been analyzed in depth, and a
possible function (as deduced by in silico comparison
to other known proteins) could be ascribed to only
about a third. Our level of understanding in other
model organisms at that time was similar.
While traditional ‘forward’ genetics (analysis pro- 
ceeding from mutant phenotype to an analysis of 
the genotype) would certainly eventually allow us to 
deduce the function of most essential genes, the avail- 
ability of the genome sequences of model organisms 
invites an alternative approach, i.e., reverse genetics, 
the determination of biological function from the 
starting point of the gene structure. 

While complete genome sequences are not an 
essential prerequisite for such approaches in general, 
they do make it possible to contemplate projects on a 
scale that would not otherwise be possible. Reverse 
genetics procedures depend upon introducing modi- 
fications into the genome, or the expression thereof, in 
order to study the resulting phenotype. The subtlety
of these modifications, and the accuracy with which
they can be targeted, depend upon the tractability,
and our understanding, of the relevant molecular
biology of the particular organism. Thus, for example,
precise gene replacement, either to modify or com- 
pletely disrupt (knockout) a gene, can be achieved in 
yeast (Saccharomyces cerevisiae) and mouse by virtue 
of homologous recombination. By contrast, other 
methods need to be employed in C. elegans and 
Drosophila, where methods dependent on homolo- 
gous recombination have not yet been established. 
The methods currently used in reverse genetics are 
outlined below for a number of model organisms. 


Yeast 


Reverse genetics by way of gene targeting or
transplacement is relatively straightforward in
Saccharomyces cerevisiae, Schizosaccharomyces pombe,
and other yeasts. A more effective recombination
machinery, in conjunction with a relatively small
genome (that of S. cerevisiae is 15 Mb), makes
homologous recombination a much more frequent
event in these organisms than in higher organisms.

Gene disruption takes advantage of the Cre-loxP
recombination system, derived from bacteriophage 
P1. Oligonucleotide primers with 5’ tails homologous 
to the reading frame that is to be disrupted are used to 
produce by PCR (from a plasmid) a disruption cas- 
sette incorporating a kanamycin resistance gene and 
flanking loxP sites. This cassette will integrate at the 
targeted locus with a typical efficiency of about 70%. 
Induction of the Cre recombinase (introduced by 
plasmid transformation) brings about excision of the 
resistance marker. 
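The primer-design step of the PCR-based disruption strategy can be sketched as follows. This is a minimal illustration under stated assumptions: the genomic flanks, the ORF, and the marker cassette are randomly generated placeholders; the 40-nt homology tails and 20-nt priming regions are hypothetical parameter choices; and the helper names are invented for this example.

```python
import random

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of a DNA sequence (A/C/G/T only)."""
    return seq.translate(COMPLEMENT)[::-1]


def disruption_primers(upstream, downstream, cassette, tail=40, prime=20):
    """Primers that amplify `cassette` with tails homologous to the target.

    upstream   -- genomic top strand immediately 5' of the region to delete
    downstream -- genomic top strand immediately 3' of the region to delete
    """
    forward = upstream[-tail:] + cassette[:prime]
    # The reverse primer is written 5'->3' and anneals to the top strand.
    reverse = revcomp(downstream[:tail]) + revcomp(cassette[-prime:])
    return forward, reverse


def predicted_product(forward, reverse, cassette, prime=20):
    """Linear PCR product: upstream tail + cassette + downstream tail."""
    return forward[:-prime] + cassette + revcomp(reverse)[prime:]


def random_dna(n, rng):
    return "".join(rng.choice("ACGT") for _ in range(n))


rng = random.Random(0)
upstream, orf, downstream = (random_dna(60, rng), random_dna(300, rng),
                             random_dna(60, rng))
cassette = random_dna(200, rng)  # stand-in for a loxP-flanked resistance gene

fwd, rev = disruption_primers(upstream, downstream, cassette)
product = predicted_product(fwd, rev, cassette)

# Recombination via the homology tails replaces the ORF with the cassette.
disrupted = upstream + cassette + downstream
print(len(fwd), len(rev), product in disrupted)
```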

Efficient gene disruption is also possible in other 
unicellular eukaryotes. The ability to do this in
Dictyostelium, for example, by REMI (restriction
enzyme-mediated integration) insertional mutagenesis,
has been in part responsible for an upsurge of
interest in the genomics of this organism. 


Caenorhabditis elegans 


Deletion Mutagenesis 

To date, direct gene replacement or knockout by
means of mechanisms dependent on homologous
recombination has not proved efficient in the
nematode C. elegans. Since precise, targeted excision of a
section of the 100 Mb genome cannot be achieved, the 
desired deletion has to be identified by screening a 
library of mutated animals (target-selected gene
inactivation). Gene replacement has to be achieved by
the two-step process of gene knockout followed by 
transgenesis with the replacement gene in the form of 
an extrachromosomal array, the latter being a rela- 
tively straightforward procedure. 

A widely used method for generating deletions is 
based on the imprecise excision of transposons from 
the genome. A library of animals carrying random Tc1
insertions is screened by PCR using gene-specific and 
transposon-specific primers to identify the presence of 
the transposon in the target gene. Generally, this in- 
sertion will not result in gene inactivation because 
it resides within an AT-rich intronic or extragenic 
sequence. Secondary screens have to be carried out 
subsequently in order to identify individuals in 
which imprecise excision of the transposon has 
occurred, generating an inactivating deletion of the 
target gene. These excision events are relatively rare, 
making it difficult to scale up the process, and further- 
more transposon insertions do not occur randomly 
throughout the genome. 

Consequently, this approach has generally come to
be replaced by the ‘one step’ procedure of identifying
chemically induced deletions in the target gene. A
number of mutagens (ethyl methanesulphonate,
ultraviolet-activated trimethylpsoralen, diepoxyoctane,
ethylnitrosourea) produce deletions of various
size distributions at a frequency practical for screen- 
ing. DNA derived from libraries of more than 10° 
mutated animals is assayed by nested PCR, generally 
using primers flanking about 3 kb of an exon-rich 
region of the target gene. Gel analysis of the products 
reveals smaller amplicons (favored by the PCR condi- 
tions) resulting from a deletion. Various combinatorial 
pooling schemes are used to screen microtitre plate- 
based libraries of such complexity. In the case of 
C. elegans, it is possible to create either a permanent 
frozen resource, or a living resource of limited dura- 
tion. In either case, the identification of a candidate 
deletion in a DNA pool requires subsequent rounds 
of PCR analysis in order to identify the desired in- 
dividual. This then has to be outcrossed to remove 
extraneous background mutations in order to produce a
strain suitable for phenotypic and functional analysis. 
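Two parts of this screening logic can be made concrete: why a deletion between the primers shows up as a smaller amplicon, and how a hit in a pooled library is traced back to a single well. The sketch assumes a simple two-dimensional scheme (each row and each column of a plate pooled separately), which is only one possible combinatorial design; the coordinates, pool names, and function names are hypothetical.

```python
def amplicon_length(left, right, deletion=None):
    """Size (bp) of the product from primers at genome positions left..right.

    deletion -- optional (start, end) half-open interval removed from the
    genome; it is assumed to lie entirely between the two primers.
    """
    size = right - left
    if deletion is not None:
        start, end = deletion
        if not (left < start < end < right):
            raise ValueError("deletion must lie between the primers")
        size -= end - start
    return size


def candidate_wells(positive_rows, positive_cols):
    """Wells consistent with a deletion band in both a row and a column pool."""
    return sorted((r, c) for r in positive_rows for c in positive_cols)


wild_type = amplicon_length(0, 3000)                      # ~3 kb target region
mutant = amplicon_length(0, 3000, deletion=(1100, 2300))
print(wild_type, mutant)                                  # 3000 1800

# A smaller band appears only in row pool "C" and column pool 7.
print(candidate_wells({"C"}, {7}))                        # [('C', 7)]
# Two positive rows and two positive columns leave four wells to retest.
print(candidate_wells({"C", "F"}, {2, 7}))
```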


RNA Interference (RNAi) 

RNA interference (RNAi) is a powerful epigenetic 
means of disrupting gene activity in C. elegans, 
where it was discovered, and has also been found to 
be effective in a wide range of other organisms includ- 
ing Drosophila, zebrafish, protozoa, mice, and plants. 
The technique evolved in C. elegans following the 
observation that sense and antisense RNAs were indi- 
vidually effective at inhibiting maternal and zygotic 
par-1 gene activity in the offspring of a gonad-injected 
parent. It is now known that double-stranded RNA 
(dsRNA) is many times more effective than single- 
strand RNA (ssRNA). (It is possible that single-strand 
effects are, in fact, due to undetected double-strand 
material in the preparation.) The mechanism of the
effect is not fully understood, but there is evidence
for cross-phyla conservation with regard to the path- 
ways that are involved, and it has been suggested that 
the cellular function may be part of a transposon or 
viral surveillance mechanism. 

It is postulated that when dsRNA is introduced 
and subsequently detected by a cell, a surveillance 
mechanism is activated that results in the post- 
transcriptional degradation of endogenous RNA. 
Although transitory in effect, and no substitute for 
genomic knockout with regard to genetic manipula- 
tion, RNAi will usually phenocopy null or reduction- 
of-function mutations, and hence give a rapid initial 
insight into the function of many genes. Some classes 
of genes, for example those involved in the develop- 
ment of the nervous system, may be refractory. The 
effects of RNAi, as outlined above, are not stably 
inherited in C. elegans. This has been circumvented 
by the introduction of transgenes that, when driven
in vivo by an inducible heat-shock promoter, express a
hairpin dsRNA constructed from inverted copies of
the target gene.
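The inverted-repeat design of such a hairpin transgene can be sketched as follows. The target fragment and spacer sequences are hypothetical placeholders, and the check only confirms that the two arms of the transcript are complementary; it does not model RNA folding energetics or the heat-shock promoter.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def hairpin_insert(fragment, spacer):
    """DNA insert: target fragment, spacer, then the fragment inverted."""
    return fragment + spacer + revcomp(fragment)


def transcript(dna):
    """RNA transcribed from the top strand (T -> U)."""
    return dna.replace("T", "U")


def arms_pair(rna, arm_length):
    """True if the 5' and 3' arms of the RNA are perfectly complementary."""
    left = rna[:arm_length]
    right = rna[-arm_length:]
    pair = {"A": "U", "U": "A", "G": "C", "C": "G"}
    # Base i of the 5' arm pairs with base i counted back from the 3' end.
    return all(pair[a] == b for a, b in zip(left, reversed(right)))


target_fragment = "ATGGCTTCAGGACTTAAGCCTGAAGTTCGTCAG"  # hypothetical exon piece
spacer = "GTAAGTTTCATTTAG"                            # hypothetical spacer
rna = transcript(hairpin_insert(target_fragment, spacer))
print(len(rna), arms_pair(rna, len(target_fragment)))  # 81 True
```

A direct (non-inverted) repeat of the same fragment would fail the `arms_pair` check, which is why the construct uses inverted copies.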

The relative ease of RNAi techniques (e.g., feeding 
C. elegans with bacteria that express the RNA and 
soaking the worms in RNA are both effective) has 
led to large-scale, genome-wide analyses of all genes 
in order to assess their involvement in particular 
cellular and developmental processes. 


Drosophila 


Reverse genetics approaches in Drosophila have for a 
long time been based on transposon-mediated
methodologies, analogous to the Tc1-based procedures
in C. elegans. The relevant family of transposons in
Drosophila is the P element. In general, these methods
suffer from the same drawbacks as in C. elegans, that 
is, the inability to target specific sequences (necessitat- 
ing target selection from large pools of animals) and 
the nonrandom distribution of insertions, which 
lowers efficiency. Large-scale screens can be carried 
out (using anchored PCR) on pools of flies in which P 
elements have been mobilized. Because the target is 
small and the P elements are distributed nonrandomly, 
the necessarily complex pools make this process non- 
trivial. Some improvement in efficiency can be gained 
by a hybridization strategy that screens a larger region 
for candidate insertions. Under some circumstances, 
genes can be inactivated by P element excision 
through flanking sequence deletion (as in the Tc1- 
based method for C. elegans) or by replacement with 
a modified element. 

Recently, there has been some encouraging success 
with targeted gene knockout by homologous
recombination. This depends upon transgenesis to
provide, in vivo, a yeast-derived site-specific
recombinase and a site-specific endonuclease to
promote the formation of a linear target-specific
molecule. The general applicability of this methodology in Drosophila has yet
to be determined. Similar strategies may be applicable 
to other organisms. 

RNAi is effective in Drosophila, though apparently
not as reliable or effective as in C. elegans. 


Mouse 


Gene targeting in the mouse depends upon hom- 
ologous recombination in cultured embryonic stem 
(ES) cells. A neomycin resistance gene (lacking a 
promoter) is inserted into a 5’ exon of the cloned, in 
vitro mutated target gene. This construct is electro- 
porated into cultured ES cells and correctly targeted 
recombination events (which occur at a frequency
of about 10%) are identified by neomycin resistance
in the cell line. These cells are injected into early
embryos (blastocysts), which develop into chimeric
individuals when implanted into foster mothers. The
chimeras are outcrossed with wild-type mice, and
heterozygous offspring are intercrossed to produce
homozygous mutant animals. Greater specificity of
knockout with regard to particular cell types or
developmental stage can be achieved by using the
Cre-loxP recombination system under the control of a
specific promoter.

In one strain, loxP sites are inserted such that they 
flank the gene or exon of interest. A second strain is 
constructed carrying the Cre recombinase gene under 
the control of a cell-type-specific promoter. In the 
offspring of these crossed mice, the gene of interest 
will be disrupted in those cells in which the pro- 
moter driving expression of the recombinase is active. 
Methods of reverse genetics have not yet been estab- 
lished for other mammals. This has been hampered in 
rats, for example, by the lack of a good ES cell line. 
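The logic of the conditional knockout can be sketched as string manipulation. The model assumes two loxP sites in the same orientation, which Cre recombines to excise the intervening DNA and leave a single loxP site. The 34-bp loxP sequence is the standard one from bacteriophage P1; the flanking and exon sequences, the tissue names, and the function names are hypothetical.

```python
LOXP = "ATAACTTCGTATAGCATACATTATACGAAGTTAT"  # 34-bp loxP site


def cre_excise(dna, site=LOXP):
    """Excise DNA between two directly repeated sites, leaving one copy."""
    first = dna.find(site)
    if first < 0:
        return dna
    second = dna.find(site, first + len(site))
    if second < 0:
        return dna  # a single site: nothing to excise
    return dna[:first] + site + dna[second + len(site):]


def allele_in_cell(floxed_allele, cre_expressed):
    """Cre acts only in cells where its promoter is active."""
    return cre_excise(floxed_allele) if cre_expressed else floxed_allele


EXON = "ATGCCGTTAGGCAAGTTC"  # hypothetical critical exon
floxed = "GGCTTACC" + LOXP + EXON + LOXP + "CCATGGTA"

# Hypothetical cell-type-specific promoter driving Cre: active only in liver.
cre_active = {"liver": True, "neuron": False, "muscle": False}
for cell_type, active in cre_active.items():
    allele = allele_in_cell(floxed, active)
    print(cell_type, "exon deleted" if EXON not in allele else "exon intact")
```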

Knockouts in vertebrate diploid cell lines, in tissue 
culture, can be achieved by disrupting both homologs 
successively using different selectable markers. This is 
particularly successful in the chicken cell line DT40 
(B-lymphocyte-derived) because of very high levels of 
targeted integration. Of course, this approach can be 
used only to address cell-autonomous gene functions. 


Arabidopsis 


It has not proved possible to use homologous recom- 
bination for targeted gene disruption in plants (other 
than the moss Physcomitrella patens). Consequently, 
in Arabidopsis thaliana, which like C. elegans has a
genome of about 100 Mb, the selection of targets from
pools of insertion mutants has been used.
The methodology is very similar to that used for
C. elegans, but uses T-DNA insertion as well as transposition.

Recently, heteroduplex analysis by denaturing 
high-performance liquid chromatography (DHPLC) 
has been used to detect base pair changes induced by 
EMS mutagenesis. The region of interest is amplified 
from pooled, mutagenized DNA samples derived 
from M2 plants. The products are heated and cooled 
to promote the formation of heteroduplexes between 
wild-type and mutant fragments. DHPLC analysis 
detects base pair mismatches by alterations in the 
melting and elution profile. This approach should be 
applicable to other organisms. 
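Pooling works with heteroduplex detection because, when a pool is heated and re-annealed, a strand from any allele that differs from the rest can pair with a wild-type partner and form a mismatched duplex. This sketch only identifies which duplexes would carry mismatches and where; it does not model DHPLC melting or elution behavior. The sequences are hypothetical, with the mutant carrying a G-to-A transition of the kind EMS typically induces.

```python
from itertools import combinations


def mismatches(a, b):
    """Positions at which two equal-length alleles differ."""
    if len(a) != len(b):
        raise ValueError("alleles must be the same length")
    return [i for i, (x, y) in enumerate(zip(a, b)) if x != y]


def heteroduplexes(pool):
    """Mismatched duplexes that can form when strands of different alleles anneal.

    pool -- amplicon sequences (top strands) from the pooled plants.
    Returns (allele_a, allele_b, mismatch_positions) for each distinct pair.
    """
    alleles = sorted(set(pool))
    return [(a, b, mismatches(a, b)) for a, b in combinations(alleles, 2)]


WILD_TYPE = "ACGTTGCAGGTCAAGC"
MUTANT = "ACGTTACAGGTCAAGC"  # hypothetical G->A transition at position 5

pool_without = [WILD_TYPE] * 8
pool_with = [WILD_TYPE] * 7 + [MUTANT]

print(heteroduplexes(pool_without))  # []: no mismatch, unaltered profile
print(heteroduplexes(pool_with))     # one heteroduplex, mismatch at position 5
```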




See also: Arabidopsis thaliana: The Premier Model 
Plant; Caenorhabditis elegans; Cre/lox — 
Transgenics; Drosophila melanogaster; Embryonic 
Stem Cells; Targeted Mutagenesis, Mouse; 
Transposons as Tools 
Reverse mutation, also called reversion, denotes any
mutational process or mutation that restores the wild- 
type phenotype to cells already carrying a phenotype- 
altering forward mutation. Forward mutations are 
those that confer a phenotype different from that 
conferred by the wild-type gene. An example of a 
forward mutation might be a mutation that inactivates 
the lacZ gene of the bacterium Escherichia coli (the 
organism in which mutation mechanisms are best un- 
derstood), making the cells unable to grow on medium 
with lactose as the sole carbon source. In this example, 
reversions would include any mutations that would 
allow growth of the cells carrying the reverse mutation 
on lactose medium. 


True Reversion and Pseudo-Reversion 


True reversions are reverse mutations that restore 
the wild-type DNA sequence. Pseudo-reversions are 
changes other than a true reversion that confer the 
phenotype of a reversion. These can be mutations at 
the same or a different place in the gene carrying the 
forward mutation, or even in a different gene. For 
example, forward mutation of a tyrosine-encoding 
TAT codon to the nonsense (translation-stop signal- 
ing) TAA codon in a gene could be reverted by true 
reversion (to TAT), by the pseudo-reversion to TAC, 
which also encodes tyrosine, or by mutation of a 
tRNA gene anticodon such that it inserts an amino 
acid at the TAA stop codon. Such extragenic pseudo-reversion
mutations are also called extragenic suppressor
mutations. By contrast, intragenic suppressor
mutations are pseudo-reversions that encode a
compensating change within the same gene, such as a
single-base deletion mutation that restores function to 
a gene that was inactivated by a single-base insertion 
mutation at a nearby site. 
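The reversion classes in the TAT to TAA example can be enumerated with the standard genetic code. The sketch considers only single-base changes within the mutant codon itself, so it lists the true reversion and same-codon pseudo-reversions; other intragenic and all extragenic suppressors are outside its scope. Whether a codon for a different amino acid actually restores function depends on the protein, which the code cannot know. The function names are illustrative.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def single_base_changes(codon):
    """All codons that differ from `codon` at exactly one position."""
    for pos in range(3):
        for base in "ACGT":
            if base != codon[pos]:
                yield codon[:pos] + base + codon[pos + 1:]


def classify_reversions(wild_type, mutant):
    """Group single-base changes of a nonsense codon by their consequence."""
    classes = {"true reversion": [], "same amino acid": [],
               "different amino acid": [], "still nonsense": []}
    for codon in single_base_changes(mutant):
        if codon == wild_type:
            classes["true reversion"].append(codon)
        elif CODE[codon] == "*":
            classes["still nonsense"].append(codon)
        elif CODE[codon] == CODE[wild_type]:
            classes["same amino acid"].append(codon)
        else:
            classes["different amino acid"].append(f"{codon}({CODE[codon]})")
    return classes


for label, codons in classify_reversions("TAT", "TAA").items():
    print(f"{label}: {codons}")
```

For TAA the output lists TAT as the true reversion, TAC as the same-amino-acid pseudo-reversion, and AAA, CAA, GAA, TCA and TTA as candidate pseudo-reversions whose success depends on the substituted amino acid.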


Selection 


Reverse mutations have been studied preferentially by 
geneticists because, unlike most forward mutations, 
many reversions can be selected genetically, making 
low-frequency mutation events possible to quantify 
and study. Genetic selection means the use of a condi- 
tion (e.g., growth medium, temperature, or other) 
under which cells or organisms possessing the selected 
genotype can grow, and those not possessing the 
selected genotype cannot. In microbial genetics, one 
way that this can be done is by spreading cell mixtures 
containing a few rare mutants, and billions of non- 
mutants, onto solid medium on which the rare 
mutants can form colonies while the rest of the cells 
cannot. (This differs from genetic screens in which all 
would form colonies, but the mutant colonies look 
different.) Thus selection allows quantification of 
rare genotypes and rare mutation events that create 
them. Selection for forward mutation was done
(famously) by Luria and Delbrück (1943), who plated
E. coli cells in the presence of bacteriophage T1 to
select forward mutants resistant to killing by the
phage, most of which probably carried alterations in
the gene encoding the cell protein used by the phage as
a receptor. Their experiments established bacteria
as legitimate genetic organisms that, like other
organisms, have genes and undergo mutation. Joshua Lederberg
(Lederberg and Tatum, 1946) invented the technique 
of selection for prototrophic strains (able to grow 
without special supplementation of the medium), in 
the presence of auxotrophic mutants (that are unable 
to grow without a specific supplement), by plating for 
colonies on medium lacking the supplement (an amino 
acid in his experiments). This is an example of a selec- 
tion that could be used to obtain revertants (cells 
carrying a reverse mutation). 


Forward Mutations Are More Frequent 
and Less Biased 


In the study of mechanisms of mutation, forward
mutations that inactivate a gene are sometimes more
advantageous subjects of study (if more difficult to
select) because there are often many more DNA
changes capable of inactivating a gene than there are
changes capable of reverting a particular mutation. In
the lac example of E. coli, any of many different
sequence alterations in the lacZ (β-galactosidase-encoding)
gene or the lacY (lactose permease-encoding) gene can
produce a forward mutation, leading to a
phenotypically Lac⁻ cell, whereas reversion of a lac⁻
mutation can be accomplished by only one true
reversion, or by a few pseudo-reversions. That is, the
target size for forward mutation is larger. Because of this, forward
mutations are often more frequent than reverse mu- 
tations, and are better reporters of the variety of 
sequence changes, or mutation spectrum, caused by 
any given circumstance, agent, or process under study. 
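The target-size argument can be put in numbers for one class of inactivating change, nonsense mutations. The sketch counts single-base substitutions that create an in-frame stop codon in a hypothetical open reading frame and compares that with the number of single-base changes that turn one stop codon (TAA) back into a sense codon. Real forward targets are larger still, because missense changes, frameshifts and insertions can also inactivate a gene; the ORF and the function names are invented for illustration.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def single_base_changes(codon):
    """All codons that differ from `codon` at exactly one position."""
    for pos in range(3):
        for base in "ACGT":
            if base != codon[pos]:
                yield codon[:pos] + base + codon[pos + 1:]


def nonsense_target(orf):
    """Single-base substitutions in sense codons that create a stop codon."""
    total = 0
    for start in range(0, len(orf) - len(orf) % 3, 3):
        codon = orf[start:start + 3]
        if CODE[codon] != "*":
            total += sum(CODE[c] == "*" for c in single_base_changes(codon))
    return total


def reversion_target(stop_codon):
    """Single-base substitutions that convert a stop codon to a sense codon."""
    return sum(CODE[c] != "*" for c in single_base_changes(stop_codon))


# Hypothetical 62-codon ORF: ATG, twenty repeats of TAT-TGG-CAA, then TAA.
orf = "ATG" + "TATTGGCAA" * 20 + "TAA"
print(nonsense_target(orf), reversion_target("TAA"))  # 100 7
```

Even in this short ORF, the nonsense target alone is more than ten times the reversion target of a single TAA codon.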


Specific Useful Reversion Assays 


Two widely used reversion assays that report muta- 
tions of specific sequence changes have been devel- 
oped (separately) by Bruce Ames and Jeffrey Miller 
and colleagues. The Ames test (Ames, 1971; McCann
et al., 1975) employs different histidine-requiring (his)
mutants of the bacterium Salmonella typhimurium 
that can revert to histidine prototrophy by different 
kinds of base substitution mutations or frameshift 
mutations. This is the most widely used mutagenicity 
test for screening chemicals being considered as pos- 
sible additives to food, and other commercial prod- 
ucts. It is important because of the correspondence 
between mutagenicity and carcinogenicity. Addition 
of mammalian microsomal fractions to this plate assay 
is used to mimic the normal metabolic processing that 
converts many different nonmutagenic chemicals into 
mutagens in mammalian cells (McCann et al., 1975). 
Miller and colleagues have developed sets of Lac⁻
E. coli strains that revert to Lac⁺ only by specific
base substitution or frameshift mutations in the lacZ 
gene (Cupples and Miller, 1989; Cupples et al., 1990). 
These have provided a highly useful tool for molecular 
analysis of mutational processes without the add- 
itional work of DNA sequencing. 


Adaptive (Reverse) Mutation 


Reversions have been the assay of choice for many 
recent investigations into the possibility that some 
environmental conditions may provoke mutations 
that allow survival of the condition, a phenomenon 
called adaptive mutation. These have been studied in 
bacteria and in yeast, always with specific reversion 
assays that afford the ability to select rare mutation 
events. The adaptive mutations studied in various 
systems so far are known to occur by multiple differ- 
ent mechanisms, most (but not all) of which do not 
appear to target specifically those genes capable of 
conferring the selected phenotype. These systems are
interesting both from an evolutionary perspective
(whether selective environments can play active roles
in evolution) and potentially as models for
carcinogenesis in cells whose growth is normally held
in check by environmental signals.


Further Reading 

Brock TD (1990) The Emergence of Bacterial Genetics. Cold 
Reverse transcriptase is an RNA-directed DNA
polymerase first discovered in retroviruses, whose
action can result in the production of double-stranded
DNA molecules from single-stranded genomic RNA
templates. Reverse transcriptase also appears to be
involved in the movement of certain mobile genetic
elements such as the Ty retrotransposons of yeast, in
the replication of other viruses such as hepatitis B,
and possibly in the generation of mammalian
pseudogenes.


See also: Pseudogene; Retroviruses; Reverse 
Transcription 
Reverse transcription is the synthesis of DNA using
an RNA template, a process accomplished by the
enzyme reverse transcriptase. It is found in
retroviruses, whose genomes consist of single-stranded
RNA molecules. During infection, viral RNA is converted
into a single-stranded DNA, which in turn is used to
produce a double-stranded DNA molecule. This duplex
DNA may integrate into the host cell genome and
become an inheritable element (the provirus). When the
provirus is transcribed into RNA, the cycle is completed.
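The two synthesis steps can be sketched as base-pairing rules. This is a deliberately minimal model: it ignores the tRNA primer, RNase H degradation of the template, the strand transfers, and the LTR duplication that occur during actual retroviral reverse transcription. The RNA sequence and the function names are hypothetical.

```python
RNA_TO_DNA = {"A": "T", "U": "A", "G": "C", "C": "G"}
DNA_TO_DNA = {"A": "T", "T": "A", "G": "C", "C": "G"}


def first_strand(rna):
    """cDNA complementary and antiparallel to the RNA template (5'->3')."""
    return "".join(RNA_TO_DNA[base] for base in reversed(rna))


def second_strand(cdna):
    """DNA complementary to the first strand: the RNA sequence with T for U."""
    return "".join(DNA_TO_DNA[base] for base in reversed(cdna))


viral_rna = "AUGGCUUAA"  # hypothetical fragment of an RNA genome
minus = first_strand(viral_rna)
plus = second_strand(minus)
print(minus)  # TTAAGCCAT
print(plus)   # ATGGCTTAA
```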


See also: Retroviruses; Reverse Transcriptase 
Reverse translation is a technique for isolating genes
or mRNAs by virtue of their ability to hybridize with 
a short oligonucleotide sequence prepared by predict- 
ing the nucleic acid sequence from a known protein 
sequence. 
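The prediction step can be sketched with the standard genetic code. Because most amino acids have several codons, the oligonucleotide is usually made as a mixture; the sketch writes each position as an IUPAC ambiguity code, counts how many distinct sequences encode a peptide, and picks the least degenerate stretch of protein. The protein sequence and function names are hypothetical, and for leucine, serine and arginine (six codons each) a single ambiguity-coded triplet necessarily covers extra codons, so the written oligo is only approximate for those residues.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS_FOR = {}
for i, a in enumerate(BASES):
    for j, b in enumerate(BASES):
        for k, c in enumerate(BASES):
            CODONS_FOR.setdefault(AMINO_ACIDS[16 * i + 4 * j + k], []).append(a + b + c)

# IUPAC nucleotide ambiguity codes, keyed by the sorted set of bases.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "AG": "R", "CT": "Y",
         "CG": "S", "AT": "W", "GT": "K", "AC": "M", "CGT": "B",
         "AGT": "D", "ACT": "H", "ACG": "V", "ACGT": "N"}


def degenerate_codon(aa):
    """IUPAC triplet covering every codon for one amino acid."""
    codons = CODONS_FOR[aa]
    return "".join(IUPAC["".join(sorted({c[p] for c in codons}))]
                   for p in range(3))


def back_translate(peptide):
    return "".join(degenerate_codon(aa) for aa in peptide)


def degeneracy(peptide):
    """Number of distinct DNA sequences that encode the peptide exactly."""
    n = 1
    for aa in peptide:
        n *= len(CODONS_FOR[aa])
    return n


def best_probe_region(protein, length):
    """Stretch of `length` residues with the least degenerate back-translation."""
    start = min(range(len(protein) - length + 1),
                key=lambda s: degeneracy(protein[s:s + length]))
    return protein[start:start + length]


protein = "LLSMWKFR"  # hypothetical partial protein sequence
region = best_probe_region(protein, 4)
print(region, back_translate(region), degeneracy(region))  # MWKF ATGTGGAARTTY 4
```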


See also: Hybridization 
Reversion of a mutation is a change in DNA that
restores function, either by reversing the original 
mutation (true reversion) or by compensating for it 
with a second mutation (second site reversion in the 
same gene). 


See also: Mutation, Back 
Reversion tests are used to assess the genetic stability
of an organism. A second method, forward mutational 
analysis, is discussed elsewhere (see Mutational Anal- 
ysis). Both methods are designed to detect heritable 
changes in the DNA sequence, which are called muta- 
tions. Since genomes are large and the creation of a 
mutation is rare, finding a mutation is like finding a 
needle in a haystack. One way to reduce this problem 
is to limit the search for mutations to a single site 
within a gene, which when mutated produces an iden- 
tifiable characteristic or phenotype. For example, a 
bacterial strain with a mutation in one of the genes 
required for tryptophan biosynthesis cannot grow 
unless tryptophan is supplied in the growth medium. 
When cells from a culture of this strain are plated on 
medium lacking tryptophan, however, a few colonies 
are observed (Figure 1). These colonies are called
revertants because a new mutation has ‘reverted’ the
initial mutation and restored wild-type function so
that these cells can now grow in the absence of
tryptophan. This method provides a sensitive way to
detect rare mutational events since the few revertants
can be selected from the large population of mutant
cells that require tryptophan for growth.

[Figure 1: 0.1 ml of a culture of trp⁻ bacterial cells
(1 × 10⁹ cells per ml, complete medium) is spread on
tryptophan-minus medium, where revertant colonies grow.]

Figure 1 Identification of tryptophan (Trp⁺)
revertants. A bacterial culture was grown from a single
cell that carried a mutation in one of the genes required
for the biosynthesis of tryptophan. Tryptophan was
supplied in the complete medium. The cell concentration
reached after several hours of growth was 1 × 10⁹
cells per ml. If in the course of the multiple rounds of
cell growth and division a new mutation occurred that
‘reverted’ the initial mutation, cells carrying the new
mutation would be able to grow in the absence of
tryptophan. The number of Trp⁺ revertants can be
measured by plating cells on medium lacking tryptophan.
In the example illustrated, 0.1 ml of cells containing
1 × 10⁸ cells was spread on solid medium containing no
tryptophan. Five ‘revertant colonies’ were detected.

Two types of information about genome stability
can be learned from reversion tests. First, mutation
rates can be determined. For the culture illustrated
in Figure 1, there are 5 revertants per 1 × 10⁸ cells,
a revertant frequency of 5 × 10⁻⁸. However, production
of spontaneous mutations is random (see Luria–Delbrück
Experiment), which means that mutation rates cannot be
determined from a single culture since the number of
revertants ‘fluctuates’ from culture to culture, as
demonstrated by Luria and Delbrück. Thus, mutation rates
are determined by measuring the number of revertants
in several cultures and applying the mathematical
methods described elsewhere in this encyclopedia
(see Luria–Delbrück Experiment).
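The distinction between a revertant frequency and a mutation rate can be shown with a short calculation. The first function reproduces the Figure 1 arithmetic; the second applies the p0 (zero-class) method, one of the Luria–Delbrück estimators, which uses the fraction of parallel cultures that contain no revertants. The culture counts are hypothetical, the function names are illustrative, and the p0 method is shown only because it is the simplest; it is informative only when some, but not all, cultures lack revertants.

```python
import math


def revertant_frequency(colonies, volume_ml, cells_per_ml):
    """Revertants per cell plated."""
    return colonies / (volume_ml * cells_per_ml)


def p0_mutation_rate(revertants_per_culture, cells_per_culture):
    """p0 method: m = -ln(p0); rate = m / N (mutations per cell division)."""
    cultures = len(revertants_per_culture)
    p0 = sum(1 for n in revertants_per_culture if n == 0) / cultures
    if not 0 < p0 < 1:
        raise ValueError("need some cultures with and some without revertants")
    m = -math.log(p0)  # mean number of mutational events per culture
    return m / cells_per_culture


# Figure 1: 5 colonies from 0.1 ml of a culture at 1e9 cells per ml.
print(revertant_frequency(5, 0.1, 1e9))  # 5e-08

# Hypothetical fluctuation test: ten parallel cultures of 1e8 cells each.
counts = [0, 3, 0, 0, 12, 0, 1, 0, 0, 45]
print(f"{p0_mutation_rate(counts, 1e8):.2e}")  # 5.11e-09
```

A single culture, such as the ‘jackpot’ with 45 revertants or one of those with none, would give a very different picture, which is why the rate is estimated from the distribution over many cultures.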

Second, reversion tests provide a way to detect specific
types of mutations, since base substitution mutations
revert by base substitutions and frameshift mutations
revert by frameshifts. If the initial mutation in the
tryptophan gene is a T→G base substitution that
changes the tyrosine codon ‘TAT’ to the nonsense
codon ‘TAG’, a G→T base substitution, which reverses
the original mutation, is required to recreate the
wild-type sequence (Figure 2). Gene function, however,
may also be restored by several other base
substitutions that convert the nonsense codon, which
does not code for any amino acid, to a codon for an
amino acid. For example, a T→C base substitution
produces the ‘CAG’ codon for glutamine (Figure 2). If
glutamine can function in the protein in place of
tyrosine, then cells having this mutation will be able
to grow on medium lacking tryptophan. These
revertants are called pseudorevertants since they do
not have the wild-type DNA sequence even though
they are phenotypically wild-type.


Wild-type   ...TAT...   (tyrosine)
               ↓ T→G
Mutant      ...TAG...   (nonsense)
               ↓ T→C
Pseudo      ...CAG...   (glutamine)

Figure 2 Reversion by base substitution mutations. A
single codon within a gene required for tryptophan
biosynthesis is illustrated. The inability to synthesize
tryptophan is due to a T→G base substitution mutation
that changes the ‘TAT’ codon for tyrosine to the
nonsense codon ‘TAG’, which does not code for any
amino acid. Function is restored by a G→T base
substitution mutation that recreates the wild-type DNA
sequence. Function may also be restored by other base
substitution mutations, such as the T→C base
substitution illustrated, to produce pseudorevertants.
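The codon-level reasoning above can be checked mechanically. The sketch below is a hypothetical helper built on the standard genetic code, with codons written as the DNA coding strand and one-letter amino acid codes ('*' for stop). It lists every single base substitution of the nonsense codon ‘TAG’ from Figure 2 and the amino acid each would encode.

```python
from itertools import product

# Standard genetic code as DNA coding-strand codons; '*' marks a stop (nonsense) codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def single_substitutions(codon):
    """Return (position, old_base, new_base, new_codon, amino_acid) for every
    single base substitution of `codon`; positions are numbered 1 to 3."""
    results = []
    for i, old in enumerate(codon):
        for new in BASES:
            if new != old:
                mutant = codon[:i] + new + codon[i + 1:]
                results.append((i + 1, old, new, mutant, CODON_TABLE[mutant]))
    return results


WILD_TYPE = "TAT"  # tyrosine codon from Figure 2
for pos, old, new, mutant, aa in single_substitutions("TAG"):
    if mutant == WILD_TYPE:
        outcome = "true revertant"
    elif aa == "*":
        outcome = "still nonsense"
    else:
        outcome = "pseudorevertant if this amino acid is tolerated"
    print(f"{old}->{new} at position {pos}: {mutant} ({aa}) {outcome}")
```

Eight of the nine possible substitutions give a sense codon. One of them, ‘TAC’, encodes tyrosine itself, yet by the definition above it still counts as a pseudorevertant, because its DNA sequence differs from the wild type.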

A frameshift mutation is a loss or gain of one or more
nucleotides within a gene. In the example illustrated in
Figure 3, the mutant has gained an extra ‘T’, which
shifts the ‘reading frame’ of the message encoded in
the DNA by one position (+1). When the reading
frame is shifted, incorrect amino acids are encoded
following the frameshift mutation, and nonsense
codons such as ‘TAG’ are usually produced.
Restoration of the wild-type DNA sequence requires
loss of the extra ‘T’, a ‘-1’ frameshift mutation.
Pseudorevertants may be produced if another nearby
nucleotide is lost. Loss of the ‘A’ of the mutant ‘TTA’
codon in Figure 3 restores the reading frame, and
function will also be restored if phenylalanine can
substitute for tyrosine.
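The sequences in Figure 3 can be reproduced with a small translation sketch. It uses the standard genetic code (one-letter amino acid codes, '*' for stop). The helper names and the exact index chosen for the extra ‘T’ are illustrative: inserting a T anywhere in the run of T's between the first two codons gives the same mutant sequence.

```python
from itertools import product

BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(seq: str) -> str:
    """Translate complete codons from the first base, ignoring a trailing partial codon."""
    usable = len(seq) - len(seq) % 3
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, usable, 3))


def insert_base(seq: str, index: int, base: str) -> str:
    """+1 frameshift: insert `base` before 0-based position `index`."""
    return seq[:index] + base + seq[index:]


def delete_base(seq: str, index: int) -> str:
    """-1 frameshift: remove the base at 0-based position `index`."""
    return seq[:index] + seq[index + 1:]


wild_type = "GTTTATAGGCAAATG"            # GTT TAT AGG CAA ATG (Figure 3)
mutant = insert_base(wild_type, 3, "T")  # extra T after the first codon
pseudo = delete_base(mutant, 5)          # lose the A of the mutant 'TTA' codon

for label, seq in [("wild-type", wild_type), ("mutant", mutant), ("pseudo", pseudo)]:
    print(f"{label:10s}{seq:18s}{translate(seq)}")
```

The three translations are VYRQM, VL*AN and VFRQM. They match the Val-Tyr-Arg-Gln-Met, Val-Leu-stop and Val-Phe-Arg-Gln-Met readings in the figure.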

One use of reversion tests is in the identification of 
mutagens, which are agents that produce mutations. 
The ‘Ames test,’ developed by Bruce Ames and his 
colleagues, employs a series of Salmonella typhimur- 
ium strains that require the amino acid histidine for 
growth to screen chemicals for mutagenic activity. If 
exposure of the bacteria to a particular chemical 
increases the histidine reversion rate, then the chem- 
ical tested is a mutagen. The number of histidine rever- 
tants produced provides a measure of the potency of 
the mutagen and the type of revertant, e.g., base sub- 
stitution or frameshift, provides information on how 
the chemical causes mutations. 
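The comparison described above can be sketched as a ratio of revertant counts. The helpers below are hypothetical. They compare the mean number of histidine revertants on plates exposed to a chemical with the mean spontaneous count on control plates, and they flag the chemical when the ratio reaches a threshold. The twofold default is an assumption, used here as a commonly cited rule of thumb rather than a criterion stated in this entry, and the plate counts are invented.

```python
from statistics import mean
from typing import Sequence


def fold_increase(treated: Sequence[float], control: Sequence[float]) -> float:
    """Mean revertants per treated plate divided by mean spontaneous revertants per control plate."""
    background = mean(control)
    if background <= 0:
        raise ValueError("control plates must have a nonzero mean revertant count")
    return mean(treated) / background


def looks_mutagenic(treated: Sequence[float], control: Sequence[float],
                    threshold: float = 2.0) -> bool:
    """Flag a chemical when reversion rises at least `threshold`-fold over background.

    The twofold default is an illustrative assumption; a fuller analysis would
    usually also look at several doses and at plate-to-plate variation.
    """
    return fold_increase(treated, control) >= threshold


# Hypothetical his+ revertant counts per plate.
control = [20, 22, 18]  # solvent only: spontaneous revertants
dose = [58, 66, 56]     # one concentration of the test chemical
print(fold_increase(dose, control), looks_mutagenic(dose, control))
```

Running the same comparison on tester strains that detect base substitutions and on strains that detect frameshifts would show which type of revertant a chemical induces.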

               Val  Tyr  Arg  Gln  Met
Wild-type   ...GTT  TAT  AGG  CAA  ATG...
                  ↓ +1 (extra T)
               Val  Leu  Stop Ala  Asn
Mutant      ...GTT  TTA  TAG  GCA  AAT...   incorrect reading frame
                      ↓ -1 (loss of A)
               Val  Phe  Arg  Gln  Met
Pseudo      ...GTT  TTT  AGG  CAA  ATG...   correct reading frame restored

Figure 3 Reversion by frameshift mutations. A few
codons within a gene required for tryptophan
biosynthesis are illustrated. The mutant has gained an
extra ‘T’, a +1 frameshift mutation, which shifts the
reading frame as illustrated. Function is restored by
loss of the extra ‘T’, a -1 frameshift mutation. Function
may also be restored by loss of another nearby
nucleotide, such as the ‘A’ of the mutant ‘TTA’ codon,
to produce a pseudorevertant.

Reversion tests have also been used to identify
DNA repair genes. One novel approach was to
construct bacterial strains with two or more
mutations. Reversion of one mutation is a rare event,
and reversion of two mutations within the genome of
a single cell is very rare. Recovering revertants in
which two or more mutations have reverted therefore
indicates a bacterial cell with an unusually high
mutation rate. Analysis of these ‘mutator’ strains led
to the discovery of genes that function in the
mismatch repair pathway in bacteria. These bacterial
studies led to the identification of similar genes in
human cells.


See also: Ames Test; DNA Repair; Luria–Delbrück
Experiment; Mismatch Repair (Long/Short
Patch); Mutagenic Specificity; Mutational
Analysis; Mutation; Mutation Rate; Mutators;
Reverse Mutation


Revertants 


Copyright © 2001 Academic Press 
doi: 10.1006/rwgn.2001.2007


Revertants are derived by reversion of a mutant cell or 
organism. 


See also: Reversion Tests 


Rh Blood Group Genes 
The Rh blood group system (frequently but
misleadingly termed ‘Rhesus’) was discovered in
1939–1940 as the result of observations made by four
people, namely, P. Levine, R.E. Stetson, K. Landsteiner
and A.S. Wiener, working in two teams in New York.
Levine and Stetson described an antibody found in a
postpartum woman which reacted with an antigen
present on the red cells of her stillborn fetus and on
those of the father. At about the same time,
Landsteiner and Wiener produced antibodies in rabbits
against rhesus monkey red cells; these appeared to
have the same specificity as the antibody of Levine and
Stetson and were also found to agglutinate the red
cells of 85% of a sample of white people in New York,
who were thus termed Rh-positive. Ironically, it was
later found that the original rabbit antibodies were not
reacting with the Rh antigens at all but with another
antigen, now termed LW (for Landsteiner–Wiener),
which is genetically independent but phenotypically
related to Rh in that it is more strongly expressed on
Rh-positive red cells.
Structural Models Based on Serological Evidence


It was clear from the beginning that the Rh locus was
complex, consisting of a number of antigens; two
principal genetic models were devised, that of A.S.
Wiener and that of R.A. Fisher and R.R. Race.
Wiener’s theory suggested multiple alleles at a single
locus. In contrast, Fisher and Race suggested three
closely linked loci; their model was based on the
agglutination patterns of four antibodies, two of
which were antithetical. These two antibodies were
considered to recognize antigens determined by the
alleles C and c; the other two antibodies recognized
the antigens D and E, and it was also postulated that
each had an allele, d and e. Antibodies to the e antigen
were later found, but not to d. The alleles at the three
loci are thus D or d, C or c, and E or e, and these can
be arranged in eight different ways on a chromosome,
namely, DCE, DCe, Dce, DcE, dCE, dCe, dcE and dce;
all eight combinations have been identified. The
presence of the D gene on either chromosome
designates the person as Rh-positive; Rh-negative
people lack D on both chromosomes (for example, dce
on both). The terms Rh-positive and D-positive are
synonymous. The frequencies of the various
combinations vary considerably in different
populations, and this has been utilized in the
determination of ethnic origins. Thus, the DCe gene is
common in the UK but rare in Nigeria; the dce gene is
also common in the UK but rare in the Chinese; and
the dCE gene is rare in all populations. This notation
and model have made the complexity easily understood
and have been very fruitful in the analysis of the Rh
system. In 1986, Patricia Tippett, reviewing all the
serological evidence which had accrued in the previous
40 years, postulated the presence of only two genes,
encoding D and CcEe, and genetic analysis has now
shown this to be the case.
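The combinatorics of the Fisher–Race model fit in a few lines. The sketch below uses illustrative names. It enumerates the 2 × 2 × 2 = 8 haplotypes and classifies a pair of haplotypes as Rh-positive when D is present on either chromosome. It follows the serological notation only. As described in the next section, there is in fact no d gene.

```python
from itertools import combinations_with_replacement, product

# Fisher-Race model: three closely linked loci, each with two alleles.
LOCI = [("D", "d"), ("C", "c"), ("E", "e")]

# One allele per locus gives a haplotype such as "DCe".
HAPLOTYPES = ["".join(alleles) for alleles in product(*LOCI)]

# Unordered pairs of haplotypes: the possible genotypes under this model.
GENOTYPES = list(combinations_with_replacement(HAPLOTYPES, 2))


def rh_status(haplotype_1: str, haplotype_2: str) -> str:
    """Rh-positive (D-positive) if an uppercase D is present on either chromosome."""
    return "Rh-positive" if "D" in haplotype_1 + haplotype_2 else "Rh-negative"


print(HAPLOTYPES)
print(len(GENOTYPES), "genotypes")
print(rh_status("DCe", "dce"), rh_status("dce", "dce"), rh_status("dCe", "dcE"))
```

Under this model the eight haplotypes combine into 36 distinct genotypes. Any genotype lacking D on both chromosomes, such as dCe/dcE, is Rh-negative.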


Chemical Structure and Identification of the Genes


Elucidation of the chemical nature of the Rh antigens
was hampered by the lability of the D molecule, but
this was overcome by the finding that it was stabilized
when combined with anti-D. Cloning of the Rh genes
was achieved in 1990 by groups in Paris (directed by
J.-P. Cartron) and in Bristol (directed by D. Anstee),
using probes based on N-terminal amino acid
sequences obtained from immunoprecipitated D
antigen. Both groups identified the RHCcEe gene,
which codes for a polypeptide of 417 amino acids. The
D antigen was later found to differ from CcEe by a
total of 36 residues. There are only two genes, RHD
and RHCcEe, situated on chromosome 1
(1p34.3–1p36.1), each with 10 exons; Rh-negative
people simply lack the RHD gene. There is no gene for
the postulated d antigen, which explains the failure to
find anti-d antibodies. Hydropathy analysis showed
that there are 12 alternating domains of hydrophilic
and hydrophobic residues, indicating that the molecule
crosses the membrane 12 times, with both the N- and
C-terminal ends in the cytoplasm. The extracellular
surface of the molecule thus consists of six loops
ranging in length between 7 and 22 residues; the area
of this surface may be the same as that of the six
similar-sized loops of the binding site of antibodies.
This relatively small area suggests that all the DCcEe
epitopes are at the same position on the Rh molecule;
specificity thus depends on the amino acid sequences
of the loops on the extracellular surface. On these six
extracellular loops, only eight residues of the D
polypeptide consistently differ from those of the
CcEe polypeptide, and these differences involve only
three of the loops
(Table 1). The C and c antigens differ only at position
103 on loop 2; similarly, E and e differ by only one
residue, at position 226 on loop 4. Because the Cc and
Ee determinants are present at the same time on the
same polypeptide, antibodies can recognize the
compound antigens CE, Ce, cE and ce; such antibodies
must make contact with the amino acids at both
positions 103 and 226.

There are many variants, although each is rare,
resulting from mutations which bring about abnormal
expression of one or more of the DCcEe antigens.
Some mutations involve gene conversion, in which
exons from the RHCcEe gene are found in the RHD
gene, giving rise to partial D antigens. Partial D
antigens still react with certain anti-D antibodies but
are nevertheless sufficiently different that normal D
antigen can evoke an immune response when
transfused into individuals with partial D antigens.
Some rare people, described as Rhnull, have no
expression of any Rh antigens on their cells, resulting
from either the presence of ‘silent’ Rh genes (although
no mutations have yet been identified) or defective
regulator genes.

Rh-like antigens are found in nonhuman primates;
the c antigen first appeared in the anthropoid apes,
such as gibbons, which emerged as a separate lineage
about 20 million years ago. The D antigen appears in
gorillas and chimpanzees, indicating that the D gene
arose by gene duplication from the c gene about 10
million years ago. Only humans express C, E and e
antigens. There is evidence that the Rh polypeptides
are involved in membrane cation transport.


Table 1 The amino acid sequences of the six external
loops (the residues which make contact with the
antibodies) of the C, c, E, e and D polypeptides, as
translated from the cDNA nucleotide sequences. The
sequence labeled CcEe is that common to all the C, c,
E and e antigens. The amino acids which define C and c
are found at position 103 and those that define E and e
are at position 226. The consistent differences between
D and CcEe are to be found at eight positions on the
3rd, 4th and 6th loops. The dashes indicate that the
amino acids are the same. (Adapted from Daniels, 1995.)

Loop  Polypeptide  Sequence               Residues  Exon(s)
1     CcEe         HYDASLEDQKGLVASYQVGQD  33–53     1 and 2
      D            ---------------------
2     CcEe         SQFP GVVITLFSIRLAT     99–118    2 and 3
      C            ----S-------------
      c            ----P-------------
      D            ----S-------------
3     CcEe         LRMVISNIFNTDYHMNLRHFY  152–173   3 and 4
      D            N---------------MMI--
4     CcEe         NS LLRSPIQRKN          224–235   5
      E            --P----------
      e            --A----------
      D            --A----E-----
5     CcEe         SCHLIPS                284–290   6
      D            -------
6     CcEe         HTVWNGNGMIGPQVLLSIGE   350–370   8
      D            [illegible]
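Because the C/c and E/e specificities each reduce to a single residue, the compound antigen of a polypeptide (CE, Ce, cE or ce) can be read from positions 103 and 226 alone. The lookup below is a minimal sketch. It assumes the residue assignments read from Table 1: Ser or Pro at 103 for C or c, and Pro or Ala at 226 for E or e. The function and dictionary names are hypothetical.

```python
# Positions that distinguish the CcEe antigens, with residues read from Table 1
# (one-letter amino acid codes).
C_C_SITE, E_E_SITE = 103, 226
C_C_RESIDUE = {"S": "C", "P": "c"}  # Ser103 -> C, Pro103 -> c
E_E_RESIDUE = {"P": "E", "A": "e"}  # Pro226 -> E, Ala226 -> e


def ccee_phenotype(residues: dict) -> str:
    """Return the compound antigen (e.g. 'Ce') carried by a CcEe polypeptide.

    `residues` maps residue position -> one-letter amino acid; only the two
    specificity-determining positions are consulted.
    """
    try:
        return C_C_RESIDUE[residues[C_C_SITE]] + E_E_RESIDUE[residues[E_E_SITE]]
    except KeyError as missing:
        raise ValueError(f"no Cc/Ee assignment for {missing}") from None


print(ccee_phenotype({103: "S", 226: "A"}))
print(ccee_phenotype({103: "P", 226: "P"}))
```

An antibody against the compound antigen Ce would, on this reading, need to recognize the Ser103/Ala226 combination, so it must contact both loops 2 and 4.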


Further Reading 
Daniels G (1995) Human Blood Groups. Oxford, UK: Blackwell
Scientific Publications.
Mollison PL, Engelfriet CP and Contreras M (1997) Blood
Transfusion in Clinical Medicine, 10th edn. Oxford, UK:
Blackwell Scientific Publications.


See also: Blood Group Systems 
Rhabdomyosarcoma is a malignant neoplasm of the
soft tissues, in which the neoplastic cells show varying 
degrees of skeletal muscle differentiation. The tumor 
most commonly occurs in childhood, where two types 
of rhabdomyosarcoma are recognized, the alveolar 
and embryonal forms. The former is characterized 
by translocations between chromosomes 2q35 and 
13q14 or, less frequently, between chromosomes 
1p36 and 13q14. These translocations create fusions 
between PAX3 and FKHR genes and PAX7 and 
FKHR genes, respectively. The fusion genes are 
powerful transcriptional activators that are likely to 
contribute to neoplastic development by induction of 
secondary transforming genes. Relatively little is cur- 
rently known about genetic abnormalities character- 
istic of embryonal rhabdomyosarcoma. 


See also: Genetic Diseases; Sarcomas; 
Translocation 
Rhizobium is a genus of gram-negative, motile
bacteria whose members are most notable for their ability
to establish a symbiotic relationship with leguminous 
plants, such as peas, soybeans, and alfalfa. This rela- 
tionship leads to the establishment of specialized 
structures called nodules. In these structures the bac- 
teria are able to convert atmospheric nitrogen into 
ammonia, a process called nitrogen fixation. The
ammonia is used by the plant as a nitrogen source. 
Other genera, such as Azorhizobium and Bradyrhizobium,
can also nodulate leguminous plants and,
together with Rhizobium, they are referred to as 
rhizobia. Members of the genus Rhizobium specific- 
ally form root nodules, but some other rhizobia can 
also form nodules on plant stems. 

The roots of leguminous plants secrete a variety of 
organic compounds, such as amino acids, which can be 
utilized by soil microorganisms, such as free-living 
Rhizobium. Thus these organisms can grow to high 
density in the area surrounding these roots. Nodul- 
ation takes place because of specific and complex inter- 
actions between the Rhizobium and the plant. The 
initial attachment seems to involve a protein called 
rhicadhesin, which is found on the surface of all spe- 
cies of Rhizobium, and other determinants on the 
plant. Nodulation typically involves: the attachment 
of the bacterium to the root hairs; invasion of the root 
hair (following hydrolysis of plant cell wall) by for- 
mation of an infection thread; formation of altered 
bacterial cells (called bacteroids) within plant cells; 
the expression of both bacterial and plant genes lead- 
ing to nitrogen fixation; and the formation of the 
nodule itself. Not all plant cells in the root nodules 
are infected. Those that are not are specialized for 
assimilation of the fixed nitrogen produced. 

The process of nodulation is complex and controlled 
primarily by the nod genes found in the rhizobia. 
Although most species of legumes can be nodulated, 
there is in some cases great specificity between the 
particular host species and the infecting Rhizobium 
species or biovar. It is the nod genes that control the 
specificity of the Rhizobium–plant interaction. While
some of the nod genes are involved in the nodulation 
of a specific host, many of these genes are found in 
most rhizobia and are essentially interchangeable. The 
nod genes are typically not expressed in cultured 
rhizobia, but are induced by chemicals called fla- 
vonoids secreted by the plant. Induction involves the 
activator protein, NodD, synthesized by the rhizo- 
bium. The nod genes encode a variety of regulatory 
and structural proteins as well as enzymes. They are 
also involved in generating lipooligosaccharide signals, 
called Nod factors, which elicit responses by the plant 
cells involved in nodulation, such as root hair curling. 

In most members of the genus Rhizobium, the nod 
genes are located on large plasmids, called Sym plas- 
mids. These plasmids also carry the genes responsible 
for nitrogen fixation, the nif and fix genes. For
instance, Rhizobium leguminosarum biovar viciae,
which nodulates peas (genus Pisum) and vetches
(genus Vicia), has a 220 kb Sym plasmid that carries
the nod and nif genes in a localized area (island) of
approximately 35 kb. This Sym plasmid confers the
ability to nodulate peas on other Rhizobium species or
biovars when it is transmitted to them.

The formation of the nodule also involves the
expression of a set of organ-specific plant genes, the
nodulin genes. Many of these genes produce proteins
involved in metabolism within the nodule. One of the
most interesting of these proteins
is leghemoglobin. Nitrogen fixation itself is carried 
out by an enzyme encoded by the rhizobia called 
nitrogenase, a large two-component protein contain- 
ing iron and molybdenum. However, this enzyme is 
very sensitive to molecular oxygen and it is the role of 
the plant protein leghemoglobin to protect the nitro- 
genase complex from oxygen. 
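For orientation, the overall reaction usually written for nitrogenase is shown below. It is standard biochemical stoichiometry added as background, not taken from this entry. It shows that each molecule of N2 reduced consumes 16 ATP and that H2 is released along with the two molecules of ammonia.

$$
\mathrm{N_2} + 8\,\mathrm{H^+} + 8\,e^- + 16\,\mathrm{MgATP} \longrightarrow 2\,\mathrm{NH_3} + \mathrm{H_2} + 16\,\mathrm{MgADP} + 16\,\mathrm{P_i}
$$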

Nodulated legumes are at a selective advantage in 
soils that have nitrogen deficiencies. The cultivated 
nodulated legumes such as alfalfa, beans, clover, peas, 
and soybeans are of great economic importance, so 
the symbiotic relationship between these plants and 
rhizobia is also an important one for humans. 


See also: nif Genes 
Transcription Termination


Rho factor is a protein that acts in bacterial cells to 
mediate termination of transcription at distinct sites. 
Escherichia coli, and probably most bacteria, have two 
sets of transcriptional terminators: intrinsic and Rho- 
dependent. At intrinsic terminators RNA polymerase 
spontaneously releases its RNA transcript in response 
to changes in the interactions of the enzyme with the 
DNA and RNA that occur at certain specific
sequences. A separate mechanism exists that allows
termination to occur at places in the genome where 
an intrinsic terminator cannot be used. At such Rho- 
dependent terminators, Rho factor mediates the dis- 
sociation of the RNA from the very stable ternary 
transcription complex. 


Structure 


Rho protein, encoded by the rho gene, is a polypeptide 
of 419 amino acid residues and Mr = 47 100. It forms
oligomeric structures and becomes a stable hexamer 
when it is complexed with RNA. Since its effects on 
RNA are essential to the termination process, the 
hexameric structure on the RNA is considered to be
the functionally active form. Electron micrographic
images of Rho indicate that the monomers are prob- 
ably globular and that the hexamers are organized in 
the form of a ring with a maximum outer diameter of 
12.5 nm. 

Rho is both an RNA-binding protein and an 
enzyme, with the capacity to hydrolyze ATP and 
other nucleoside triphosphates. Treatment of Rho 
with trypsin releases a polypeptide of 128 amino acid
residues from the N-terminus. This monomeric species
can bind RNA by itself. Its structure has been
determined by X-ray crystallography and NMR. It
consists of two subdomains, with the first 48 residues
organized in an α-helical bundle and the remaining
residues in a type of β-barrel structure called an
oligonucleotide (and oligosaccharide) binding fold. The
latter motif has been found in other RNA- and 
DNA-binding proteins and contains the major con- 
tact sites for RNA. 

The remainder of the Rho polypeptide contains 
polypeptide segments that are present in other 
ATPases and is thus called the ATP-binding domain. 
Its structure has not yet been determined, but its
sequence is very similar (~22% identity and 45%
similarity) to the C-terminal two-thirds of the α and β
subunits of the F1-ATPase. Therefore this segment
of Rho probably has a tertiary structure that is similar
to the aligned portions of the α and β F1-ATPase subunits.

The F1-ATPase α and β subunits form a six-subunit
α3β3 ring structure that is similar in appearance
to the Rho hexamers observed in electron
micrographs. Hence, the quaternary structural
organization of Rho is also believed to be very similar
to that of the F1-ATPase hexamer. In this organization
the six amino-terminal domains form a crown at one
pole of the hexamer, a predicted α-helical loop motif near
the C-terminus would be at the opposite pole, and the 
core of the ATP-binding domain would form the 
major globular mass around the equatorial part of 
the hexameric ring. 


Proposed Mechanism 


Rho factor mediates termination of transcription by 
first binding to a site on the nascent transcript, then 
using its ATP hydrolysis activity as a source of energy 
to dissociate the transcript from its ternary complex 
with RNA polymerase and DNA. The binding of Rho 
occurs preferentially at sites on the RNA where there 
is very little double-stranded secondary structure and 
a high content of cytidylate (C) residues. A segment
with at least 40 nucleotides of single-stranded RNA in
the context of a larger RNA (at least 80 nucleotides
total) is needed for interactions that can lead to
termination. Since the free protein exists as a mixture
of monomers and oligomers, the hexameric structure
can form by assembly of monomers and oligomers on
the RNA. Figure 1 shows a diagram of a proposed
termination mechanism, in which this mode of
assembly of Rho on an RNA is indicated by a possible
intermediate state with only two sets of dimers
attached to the RNA.

Although the primary contacts of the Rho subunit 
with RNA are presumed to occur via the oligonucleo- 
tide binding folds in the RNA-binding domains of the 
individual subunits, an additional and very critical 
contact is believed to occur when the six subunits 
form around the 3’ portion of the transcript, yielding 
a structure in which the 3’ part of the RNA passes 
through a hole in the center of the hexamer (Figure 1). 
This structure would eventually bring the C-terminal 
pole of the Rho hexamer into contact with RNA 
polymerase near the exit channel for nascent RNA. 
The contacts between RNA and Rho activate the ATP
hydrolysis function in the subunits. Although the
actual mechanism that couples ATP hydrolysis with
the movement of Rho along the RNA is not known,
there is evidence that hydrolysis of ATP molecules is a
concerted, coordinated process and that Rho
maintains contact with the initial binding site on RNA
where the assembly of the hexamer occurred.
Coordinated conformational changes in the subunits,
occurring upon the release of ADP and the binding of
a new ATP substrate to the empty site, could pull the
RNA through the center of the Rho hexamer, thereby
dissociating its 3’ end from contacts with RNA
polymerase and the DNA template. The Rho-RNA
complex would then dissociate from the RNA
polymerase-DNA complex, which would itself
dissociate in turn. This last step might be facilitated
by sigma factor, which is known to decrease the
affinity of core RNA polymerase for non-promoter
sites on double-stranded DNA. The Rho-RNA complex
finally dissociates to release free Rho. This step could
be facilitated by the 3’ to 5’ exonucleases of the cell
(polynucleotide phosphorylase, RNase II, and the
degradosome) that would degrade the part of the
RNA to which Rho preferentially binds (i.e., the 3’
untranslated ‘tail’ of an operonic transcript).


[Figure 1 diagram: labels ‘Rho subunits’ and ‘nADP + nPi’]

Figure 1 Transcription termination with Rho factor.
The diagram shows representations of the ternary
complex between RNA polymerase, a DNA template,
and RNA product at intermediate stages in the
termination process. RNA polymerase is shown as a
multilobed globular structure, the DNA as two dark
lines that are separated within the RNA polymerase,
and RNA as a single partially coiled line that emerges
near the bottom of the polymerase. Rho subunits bind
to the RNA, forming a structure in which a 5’ portion
of the RNA makes contacts with the six amino-terminal
RNA-binding domains of the subunits and the 3’
portion passes through a hole in the center of the Rho
hexamer. Hydrolysis of ATP to ADP and Pi is coupled
to events on the RNA that pull the 3’ portion through
the hole, eventually allowing the RNA-Rho complex to
dissociate from the complex of RNA polymerase with
the DNA template. (The representation of RNA
polymerase is based on that shown in Fig. 4 of
Mooney, Artsimovitch and Landick (1998) Journal of
Bacteriology 180: 3265–3275 and is used courtesy of
Dr Robert Landick.)

Rho has been shown to dissociate short DNA 
segments base-paired to the 3’ end of the RNA by 
employing a helicase activity that is similar to the 
action of the DNA helicases on partially unwound 
double-stranded DNA. Thus, the mechanisms used 
by Rho and DNA helicases are likely to be very 
similar and indeed, a mechanism similar to that pro- 
posed for Rho here has been proposed for the action of 
the hexameric DNA helicases that are involved in 
DNA replication. 


Rho-Dependent Terminators 


A Rho-dependent terminator is a bipartite site that
extends over approximately 200 bp of DNA. The
upstream portion, called rut (Rho utilization site),
encodes the segment of the RNA transcript to which
Rho binds initially. This sequence is about 80 bp in
length, and although it is the main specificity
determinant, the known rut sequences differ so much
from one another that they have no consensus.
However, they are broadly characterized as being
C-rich and G-poor on the nontemplate strand of the
DNA (G residues contribute strongly to double-stranded
secondary structures). The points where termination
actually occurs, the transcription stop points (tsp), are
even less well defined. They can be distributed over a
wide segment of DNA (80 to 120 bp) and correlate
very well with places where RNA polymerase pauses
during transcription. This observation suggests that
some feature of the pause sites allows Rho to
preferentially dissociate transcripts at such points.
Transcript stop points are thus defined by the features
that cause pausing. Since a natural tsp sequence
segment can be functionally replaced by any of several
nonterminator sequences, this part of the terminator is
not considered to be a major specificity determinant.
Instead, specificity is provided by the rut portion.
Finally, the two parts of the terminator are closely
linked: the tsp region starts as soon as sufficient rut
sequence has been transcribed for Rho to make its
stable initial attachment.
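The rut features just listed (C-rich, G-poor, tens of nucleotides long) suggest a crude way to flag candidate regions in a transcript. The scanner below is only a toy heuristic assumed for illustration, not an established rut predictor. It slides a window along an RNA sequence and reports windows whose C fraction is at least one threshold while the G fraction stays at or below another. The window length, step and thresholds are hypothetical parameters, and secondary structure is ignored entirely.

```python
from typing import List, Tuple


def c_rich_windows(rna: str, window: int = 80, min_c: float = 0.25,
                   max_g: float = 0.15, step: int = 10) -> List[Tuple[int, float, float]]:
    """Return (start, C fraction, G fraction) for windows that are C-rich and G-poor.

    Toy heuristic: the thresholds are illustrative and secondary structure,
    which the text notes is important, is not considered.
    """
    rna = rna.upper().replace("T", "U")  # accept DNA-style input as well
    hits = []
    for start in range(0, len(rna) - window + 1, step):
        segment = rna[start:start + window]
        c_frac = segment.count("C") / window
        g_frac = segment.count("G") / window
        if c_frac >= min_c and g_frac <= max_g:
            hits.append((start, c_frac, g_frac))
    return hits


# Synthetic transcript: a G-rich 80-nt block followed by a C-rich, G-free block.
transcript = "GAGGAUGG" * 10 + "CAUCCAUU" * 10
for start, c_frac, g_frac in c_rich_windows(transcript):
    print(start, round(c_frac, 3), round(g_frac, 3))
```

Because real rut sites have no consensus, any such scan can only point to regions worth testing experimentally.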

Because many sequences in DNA can and do serve 
as rut sequences, what prevents Rho from prematurely 
terminating transcription? The answer is the presence 
of a ribosome that normally engages the mRNA as 
soon as it emerges from the exit site of RNA polymer- 
ase. The rate of translation of mRNA in bacteria is 
similar to the rate of transcription — about 48 nucleo- 
tides per second. The presence of a ribosome on the 
emerging nascent RNA effectively blocks access by 
Rho to the RNA. Hence, Rho-dependent terminators
are found at the ends of genes or operons, where the
transcript is no longer translated, and at some regula- 
tory sites in 5’ untranslated regions of genes. Rho- 
dependent termination also occurs within genes when 
the activity of the ribosome is disrupted or blocked. 
In this guise these terminators prevent the continued 
synthesis of an unneeded mRNA. 

Rho does not interrupt the transcription of 
untranslated RNAs such as tRNAs and rRNAs for 
two reasons. Both of these kinds of RNA are highly 
structured and thus do not have many, if any, sites for 
attachment of Rho. But in addition, the RNA poly- 
merases that transcribe rDNA segments become modi- 
fied in response to regulatory signals at the start of the 
rRNA genes making them immune from Rho action 
even when the nascent rRNA contains a sequence that 
could act as a functional rut site in an mRNA context. 
This antitermination mechanism is known to involve 
the proteins NusB and S10 and may involve other 
regulators of elongation and termination, such as 
NusA and NusG. 


A Role for NusG 


Although Rho alone can cause termination of tran- 
scription in a purified system under certain artificial 
conditions, another protein, NusG, is needed for Rho 
to function in its normal cellular context. NusG is a 
19 kDa protein that can bind to both RNA polymerase
and Rho. It is also known to accelerate the rate of
transcriptional elongation and the rate with which 
Rho releases transcripts from arrested complexes. It 
becomes essential in vitro under conditions where 
the action of Rho is kinetically limited; Rho does not 
function very well with some transcripts if the time 
interval between interaction with the RNA and pas- 
sage of RNA polymerase through the tsp region is 
very short. Because NusG can bind to both Rho and 
RNA polymerase, it could serve as a bridge to facili- 
tate the location of rut sites by Rho on nascent tran- 
scripts. An alternative (or possibly additional) role for 
NusG could be to alter the response of RNA
polymerase to signals that cause it to pause during
elongation and to release a transcript at a pause site in
response to Rho factor.


Ribosomal RNA (rRNA) forms the core of ribosomes.
The two ribosomal subunits each have a large RNA
molecule that provides the binding sites for ribosomal
proteins. These help the rRNA assume its proper
functional three-dimensional structure. Most of the
functions of ribosomes are closely associated with
the rRNA.


rRNA Molecules 


The small ribosomal subunit usually contains one 
rRNA molecule. In eubacteria (Escherichia coli will 
be used as the reference organism in this article) it is 
called the 16S RNA from its sedimentation velocity. 
The size of the corresponding RNA molecule varies in 
other organisms (see Ribosomes). In the large ribo- 
somal subunit there is one small RNA molecule called 
the 5S RNA, which is usually composed of 120 
nucleotides. The large RNA molecule in this subunit 
is called the 23S RNA in eubacteria. Its size also varies 
and sometimes it is present as several pieces. The
rRNAs have been useful for studies of evolution.
Because the ribosomes of all organisms are presumed
to have evolved only once, rRNA has been used for
phylogenetic comparisons. In this way C. Woese
identified a third domain of life: the Archaea.
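A common first step in such comparisons is to count differences between aligned rRNA sequences. The sketch below computes the proportion of differing sites, p, skipping gapped positions. It then converts p to the Jukes–Cantor distance d = −(3/4) ln(1 − 4p/3), which corrects for multiple substitutions at the same site. The choice of Jukes–Cantor is an assumption for illustration, since the text does not say which distance measure was used, and the sequences are invented.

```python
import math


def p_distance(seq_a: str, seq_b: str) -> float:
    """Fraction of aligned positions that differ, skipping gap ('-') columns."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = differing = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == "-" or b == "-":
            continue
        compared += 1
        differing += a != b  # True counts as 1
    if compared == 0:
        raise ValueError("no comparable positions")
    return differing / compared


def jukes_cantor(p: float) -> float:
    """Jukes-Cantor corrected distance in substitutions per site."""
    if p >= 0.75:
        raise ValueError("p >= 0.75: distance is undefined under Jukes-Cantor")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


a = "ACGUACGUAC"
b = "ACGUUCGUAC"  # hypothetical aligned fragments differing at one site
p = p_distance(a, b)
print(p, round(jukes_cantor(p), 4))
```

Distances computed this way for many pairs of species are the raw material for building a tree. The corrected distance grows faster than p as sequences diverge.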


Structure of rRNA 


The structure of the rRNAs has been studied by a variety of methods since the first sequences became available. From sequence information it is possible to analyze the potential for base-pairing. Since the structure of rRNA is expected to be conserved, the availability of sequences from numerous species makes it possible to identify conserved secondary structures as well as to study species variation. In this way the secondary structures of the rRNAs have been established, and they have been confirmed with chemical methods. These studies have led to the conclusion that the 5S RNA has a Y-shaped secondary structure in which the 5' and 3' termini form one of the short helices of the molecule (Figure 1D). The large rRNA molecule of the large subunit forms six domains (Figure 1C), whereas the rRNA of the small subunit is organized into four domains (Figure 1A).
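At its simplest, the base-pairing analysis mentioned above asks whether two segments of a sequence can pair antiparallel. The Python sketch below checks this for Watson-Crick pairs plus the G·U wobble pair, which is common in RNA helices; comparative analysis then asks whether such a potential helix is preserved across species. The function names are hypothetical, and the check ignores loops, bulges, and non-canonical pairs.

```python
WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}
WOBBLE = {("G", "U"), ("U", "G")}


def can_pair(x: str, y: str, allow_wobble: bool = True) -> bool:
    """True if bases x and y can form a Watson-Crick (or, optionally, G·U) pair."""
    return (x, y) in WATSON_CRICK or (allow_wobble and (x, y) in WOBBLE)


def forms_helix(strand: str, partner: str, allow_wobble: bool = True) -> bool:
    """True if two equal-length segments, both written 5'->3', can pair over their full length.

    The strands are antiparallel, so the 5' base of `strand` faces the 3' base of `partner`.
    """
    if len(strand) != len(partner):
        return False
    return all(can_pair(a, b, allow_wobble) for a, b in zip(strand, reversed(partner)))


print(forms_helix("GGCAU", "AUGCC"))                  # True: all Watson-Crick pairs
print(forms_helix("GGG", "UCC"))                      # True: closes with a G·U pair
print(forms_helix("GGG", "UCC", allow_wobble=False))  # False without the wobble pair
```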

The details of the organization of the rRNA have also been explored by chemical methods. RNA can be modified or cleaved with a number of chemicals, and the positions of the modifications can be established. In this way both base-pairing and sites of protection by ribosomal proteins or other components of the system can be analyzed. Bifunctional cross-linking can also be used in studies of RNA, and the residues involved can be identified; this yields direct information about the proximity of components. Affinity labeling of the rRNA from other components of the protein synthesis system has also been highly informative.

The rRNA needs ribosomal proteins to be folded 
into the fully functional particle; a particular order of 
assembly has been established for the proteins. The 
binding sites for the ribosomal proteins give further 
insight into the organization of rRNAs. This is 
especially true in cases where the ribosomal protein 
binding sites or cross-linking data encompass several 
separate regions of rRNA. 

High-resolution information about the structure of rRNA has been obtained using NMR spectroscopy as well as X-ray crystallography. These studies show that RNA structures are highly complex and deviate from standard A-form RNA in numerous ways. On the one hand, hydrogen bonding between bases, as well as between bases and ribose hydroxyls, provides extensive possibilities for variation beyond the classical Watson-Crick interaction. On the other hand, bulges and loops in the secondary structures can sometimes be accommodated within the helical structures.

Figure 1 (See Plate 30) The secondary and tertiary structure of the ribosomal RNAs. (A) The secondary structure of 16S RNA. The four domains are colored differently. The numbering is according to the sequence from E. coli. (B) The tertiary structure of 16S RNA as observed in Thermus thermophilus small subunits. The domains can be identified by the coloring, which is the same as in (A). (A) and (B) are reproduced with permission from Wimberly et al. (2000). (C) The secondary structure of 23S RNA. The coloring illustrates the six domains. (D) The secondary structure of the 5S RNA. (E) The tertiary structure of the RNA in the large ribosomal subunit as seen from the interface side. (F) The same as (E) but seen from the external side. The coloring of domains is the same as in (C) and (D). (C-F) are reproduced with permission from Ban et al. (2000).

The crystallographic analysis of ribosomal subunits has clarified the entire organization of the ribosomal RNAs, as shown in Figure 1B, E, and F. The four domains of the 16S RNA are found as separate blocks in different parts of the subunit, whereas the six domains of the 23S RNA traverse the subunit in a complex mesh.


Functional Sites 


The ribosomal functions are to a large extent closely 
related to the rRNA. In fact a central function of the 
ribosome, peptidyl transfer, is associated with the 
ribosomal RNA. The binding of tRNAs and factors 
to the ribosome involves extensive parts of the 
rRNAs. Here we focus on some main aspects of 
these interactions. 


Binding of mRNA 

The mRNA appears to bind centrally on the small 
subunit of the ribosome between its so-called plat- 
form and head. In eubacteria a nucleotide sequence 
rich in As and Gs is usually found upstream of the initi- 
ator codon of the mRNAs. These sequences are com- 
plementary to a varying extent to a region of the 3’ end 
of the 16S ribosomal RNA. The binding of this region 
of the mRNA to the 3’ end of the 16S rRNA is called 
the Shine and Dalgarno interaction. 

In eukaryotic systems the binding site on the mRNA for the ribosome is recognized quite differently. Eukaryotic mRNAs usually carry a 7-methylguanosine linked to the 5'-terminal nucleotide by a 5'-5' triphosphate bridge. This so-called cap is recognized by cap-binding proteins, which are important for the binding of the mRNA to the small subunit.


Decoding Site 

The decoding of the mRNA by the tRNAs is carried out on the small subunit. The anticodons of the tRNAs in the A- and P-sites are base-paired with the corresponding codons of the mRNA. The 16S RNA is closely associated with the decoding site, and the crystallographic analyses of ribosomes and subunits provide extensive details of this site. Three regions of the 16S RNA (921-927, 1390-1409, and 1491-1505) are conserved during evolution and come close together in the central part of the decoding site; for example, the anticodon of the P-site-bound tRNA can be cross-linked to C1400 of the 16S RNA. The ribosomal RNA also has a controlling function in the decoding of the message in the A-site. This is done primarily by nucleotides A1492 and A1493, as well as G530, which select for Watson-Crick base-pairing between codon and anticodon.
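The selection for Watson-Crick geometry can be caricatured as a rule in code. The Python sketch below is a simplification, not a model of the actual structural mechanism. It requires strict Watson-Crick pairs at the first two codon positions and also tolerates a G·U wobble pair at the third, in line with the crystallographic picture that A1492, A1493, and G530 inspect mainly the first two base pairs. Modified nucleotides such as inosine are ignored, and the function name `decodes` is hypothetical.

```python
WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}
# (codon base, anticodon base) pairs tolerated at the third codon position only
THIRD_POSITION_WOBBLE = {("U", "G"), ("G", "U")}


def decodes(codon: str, anticodon: str) -> bool:
    """Simplified test of whether a tRNA anticodon can read a codon.

    Both sequences are written 5'->3'. Because pairing is antiparallel,
    codon position 1 pairs with anticodon position 3, and so on.
    """
    if len(codon) != 3 or len(anticodon) != 3:
        raise ValueError("codon and anticodon must each be three nucleotides")
    pairs = list(zip(codon, reversed(anticodon)))
    # Codon positions 1 and 2: only Watson-Crick pairs are accepted.
    if not all(pair in WATSON_CRICK for pair in pairs[:2]):
        return False
    # Codon position 3 (wobble position): Watson-Crick or G·U.
    return pairs[2] in WATSON_CRICK or pairs[2] in THIRD_POSITION_WOBBLE


# Anticodon GAA (5'->3') tested against three codons
for codon in ("UUC", "UUU", "CUC"):
    print(codon, decodes(codon, "GAA"))
```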


Peptidyl Transfer Site 

The central function of the ribosome is to transfer a nascent peptide from one tRNA to the amino acid bound to another tRNA. The two tRNAs are specified by successive codons of the mRNA located in the decoding site. 23S RNA essentially devoid of proteins has been shown to have a low but significant peptidyl transfer activity. This function is primarily associated with the central loop of domain V of the 23S RNA of the large subunit. The crystallographic analysis of the large subunit has clarified the important interactions in the peptidyl transfer site. Thus C74 and C75 of the P-site tRNA base-pair with G2252 and G2251 of the 23S RNA, respectively, so that the CCA end of the P-site tRNA probably binds to the loop G2251-G2253. C75 of the A-site tRNA is base-paired to G2553 of the 23S RNA, and residues U2552 and G2553 interact with the CCA end of the A-site tRNA. A catalytic mechanism has been proposed that involves a number of conserved groups, including A2451 of the 23S RNA, which would play the role of a general base during catalysis.


Factor-binding Site 

The binding sites for elongation factors Tu and G (EF-Tu and EF-G), which partially overlap, are also made up of contributions from rRNA. The main contacts are with the large ribosomal subunit. EF-G, for example, has been cross-linked to a nucleotide in the 1067 region of the 23S RNA and shows footprints in this region. This region of the ribosome binds not only the ribosomal proteins that interact with the elongation factors (L7/L12, L10, and L11) but also antibiotics that inhibit their functions, such as thiostrepton. In addition, the so-called α-sarcin/ricin loop of the 23S RNA (residues 2653-2667) interacts with both elongation factors, as shown by protection of the rRNA from chemical reagents.


Binding of Antibiotics 

Ribosomes are the targets of numerous antibiotics, 
which bind at vital functional sites. These binding 
sites seem to involve the rRNAs. This further empha- 
sizes the importance of the rRNAs. 


Further Reading 

Zimmermann RA and Dahlberg AE (eds) (1996) Ribosomal RNA: Structure, Evolution, Processing, and Function in Protein Synthesis. Boca Raton, New York, London, Tokyo: CRC Press.
The term ribosome binding site may refer either to the binding site for the ribosome on the messenger RNA (mRNA) or to the binding site for the mRNA on the ribosome. Protein synthesis normally starts at the initiation codon, AUG. However, this codon also encodes methionine residues that can be situated at any position in the polypeptide. Eubacteria and eukaryotes have evolved different methods to identify the ribosome binding site on the mRNA so that protein synthesis is initiated at the correct AUG.

In eubacteria a nucleotide sequence rich in As and Gs is usually found 3-10 nucleotides upstream of the initiator codon of the mRNA. These sequences are complementary, to a variable extent, to a region at the 3' end of the 16S ribosomal RNA. The base-pairing of this region of the mRNA with the 3' end of the 16S rRNA is called the Shine and Dalgarno interaction. This interaction increases the local concentration of the initiation codon near the decoding site of the ribosome. The initiator tRNA (fMet-tRNA), in complex with initiation factor 2, recognizes the initiation codon AUG and binds to the P-site of the small subunit of the ribosome.
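The Shine and Dalgarno interaction can be illustrated by searching an mRNA leader for the longest stretch that could base-pair with the 3' end of the 16S RNA. In this Python sketch the 3'-terminal sequence of E. coli 16S RNA is taken to be 5'-ACCUCCUUA-3', an assumption stated in the code. Only Watson-Crick pairs are considered and pairing energies are ignored. The mRNA fragment and the function names are hypothetical.

```python
# Assumed 3'-terminal nucleotides of E. coli 16S rRNA, written 5'->3'.
RRNA_3_END = "ACCUCCUUA"
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna: str) -> str:
    """Sequence that would pair perfectly (antiparallel, Watson-Crick) with `rna`."""
    return "".join(COMPLEMENT[base] for base in reversed(rna))


def best_sd_match(leader: str, rrna_end: str = RRNA_3_END):
    """Longest stretch of the mRNA leader (5'->3') complementary to the rRNA 3' end.

    Returns (length, start index in leader, matched sequence); (0, -1, "") if none.
    """
    target = reverse_complement(rrna_end)  # ideal Shine-Dalgarno-like sequence
    best = (0, -1, "")
    for start in range(len(leader)):
        # Only pieces longer than the current best are worth testing.
        for end in range(start + best[0] + 1, len(leader) + 1):
            piece = leader[start:end]
            if piece not in target:
                break  # any longer piece from this start contains it, so also fails
            best = (len(piece), start, piece)
    return best


mrna = "CAUAAGGAGGAAUCAUAUGGCU"      # hypothetical mRNA fragment
start_codon = mrna.find("AUG")        # initiation codon in this toy example
length, sd_start, sd = best_sd_match(mrna[:start_codon])
spacing = start_codon - (sd_start + length)
print(f"Shine-Dalgarno-like stretch {sd} ({length} nt), {spacing} nt upstream of AUG")
```

Real ribosome binding sites also depend on the spacing, on G·U pairs, and on mRNA secondary structure, none of which this sketch scores.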

The mRNA binding site on the ribosome is centrally located on the small subunit, between its so-called platform and head. The anticodon of P-site-bound tRNA can be cross-linked to C1400 of the 16S RNA, and a tRNA bound to the A-site is similarly close to nucleotides in the 1490 region of the 16S RNA. These contacts identify important parts of the decoding site. Several additional interactions between the mRNA and the 16S RNA or ribosomal proteins have been observed. Detailed crystallographic structures of the ribosome and its subunits are becoming available, making it possible to put these interactions into a coherent picture.

In eukaryotic systems the binding site on the mRNA for the ribosome is recognized quite differently. The number of initiation factors is significantly greater than in eubacteria. Some of these initiation factors interact with the small (40S) subunit, while others interact with the mRNA. The initiator tRNA binds to the small subunit in complex with eukaryotic initiation factor 2 (eIF-2, which is composed of three polypeptides). Eukaryotic mRNAs are usually capped at the 5' end; that is, they carry a 7-methylguanosine linked to the terminal nucleotide by a 5'-5' triphosphate bridge. This so-called cap is recognized by specific proteins, the cap-binding proteins, which are important constituents for initiation. The cap is situated at a varying distance from the initiation codon, usually the first AUG. The small subunit then scans the mRNA for this AUG codon, which is recognized by the bound initiator tRNA. Subsequently, the large subunit associates with this complex to initiate protein synthesis.
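The scanning step described above can be sketched as a linear search from the 5' cap. In this simplified Python model, which ignores the sequence context around the AUG and all factor-dependent steps, the small subunit moves 5'->3' one nucleotide at a time and stops at the first AUG; the reading frame is then read in triplets up to a stop codon. The function names and the example mRNA are hypothetical.

```python
STOP_CODONS = {"UAA", "UAG", "UGA"}


def scan_for_start(mrna: str, start_codon: str = "AUG") -> int:
    """Index of the first start codon met when scanning 5'->3' from the capped end, or -1."""
    for pos in range(len(mrna) - 2):
        if mrna[pos:pos + 3] == start_codon:
            return pos
    return -1


def reading_frame(mrna: str) -> list:
    """Codons read from the first AUG up to and including the first in-frame stop codon."""
    start = scan_for_start(mrna)
    if start < 0:
        return []
    codons = []
    for pos in range(start, len(mrna) - 2, 3):
        codon = mrna[pos:pos + 3]
        codons.append(codon)
        if codon in STOP_CODONS:
            break
    return codons


toy_mrna = "GCUACUAUGGCUUAAGAUG"      # hypothetical mRNA, written 5'->3' after the cap
print(scan_for_start(toy_mrna))       # 6
print(reading_frame(toy_mrna))        # ['AUG', 'GCU', 'UAA']
```

The downstream AUG at the end of the toy sequence is never reached, which mirrors the first-AUG rule stated in the text.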


See also: Initiation Factors; Ribosomal RNA 
(rRNA) 
Ribosomes are macromolecular assemblies that are the central sites of protein synthesis, or translation, in all cells. The key chemical step of protein synthesis on ribosomes is peptidyl transfer, in which the growing or nascent peptide is transferred from one tRNA molecule to the amino acid bound to another tRNA. Amino acids are incorporated into the growing polypeptide on the ribosome according to the sequence of codons of an mRNA. The ribosome thus has binding sites for one mRNA and at least two tRNAs. Ribosomes are composed of two subunits, the large and the small subunit, each of which consists of one or a few ribosomal RNA (rRNA) molecules and a variable number of ribosomal proteins. Several protein factors catalyze different steps of protein synthesis. The fidelity of translation of the genetic code is of critical importance for the production of functional proteins and for the viability of the cell.

Escherichia coli will be used here as the reference 
organism. Protein synthesis on ribosomes follows a 
closely related pattern in other organisms. Important 
differences that are clearly established will be specified. 


History 


The first observations of RNA-containing particles that must have been ribosomes date back to the 1940s. Subsequent studies in the 1950s and 1960s led to purified preparations and to the realization that proteins are synthesized on ribosomes. It was also observed that eukaryotic ribosomes are larger than eubacterial ribosomes, sedimenting at 80S and 70S, respectively. The ribosomes of mammalian mitochondria, however, are noticeably smaller and have significantly shorter rRNAs; they are sometimes referred to as mini-ribosomes. It was also realized that the mRNA-dependent process of protein synthesis goes through distinct phases called initiation, elongation, and termination, and that these steps are catalyzed by different soluble protein factors.


Composition of Ribosomes 


It is evident from the primary structures of rRNA and ribosomal proteins that all ribosomes have a common evolutionary origin, even though the number and size of the components vary considerably. This has been exploited in evolutionary studies in which the nucleotide sequences of ribosomal RNAs are used to establish the relationships between species. Indeed, it was from the analysis of ribosomal RNA sequences that Carl Woese established that archaebacteria form a separate group of organisms, distinct from eubacteria and eukaryotes. The growing number of sequenced genomes also provides data for comparative analysis of ribosomal proteins. In most ribosomes the mass of the rRNA is significantly larger than that of the ribosomal proteins. This implies that protein-RNA interactions must be extensive and that protein-protein interactions may be more limited. The ribosomes from mammalian mitochondria, which seem to have evolved most rapidly, have less rRNA and a large complement of proteins (Table 1). It is possible that some of these proteins replace and mimic parts of the deleted rRNA.

Table 1 Ribosomal RNA from different classes of organisms

Source                        Small subunit   Large subunit
Eubacteria                    16S             23S, 5S
Chloroplasts                  16S             23S, 5S, 4.5S
Mitochondria (plant)          18S             26S, 5S
Mitochondria (mammals)        12S             16S
Mitochondria (trypanosoma)    9S              12S
Archaea                       16S             23S, 5S
Eukaryotes                    18S             5.8S, 25-28S, 5S


rRNA 

The small ribosomal subunit usually contains one 
rRNA molecule. In eubacteria it is called the 16S 
RNA from its sedimentation velocity. In other organ- 
isms the size of the corresponding RNA molecule 
varies (see Table 1). In the large ribosomal subunit 
there is one small RNA molecule called the 5S RNA, 
which is usually composed of 120 nucleotides. The 
large RNA molecule in eubacteria is called the 23S 
RNA. Its size varies and sometimes it is present as 
several pieces (Table 1). 


Ribosomal Proteins 


Identification and number of ribosomal proteins 

The ribosome has a large number of usually small proteins bound to the ribosomal RNA. The exact definition of the ribosomal proteins has proved to be a problem. When ribosomes are purified, different washing procedures leave a variable number of proteins attached, so few of the proteins are found in stoichiometric amounts. Furthermore, proteins not belonging to the protein synthesis apparatus may stick to the ribosome artificially during purification. In addition, some of the ribosomal proteins were not initially identified as such, because their small size and high positive net charge made them run out of the classical two-dimensional gel. The small ribosomal subunit from the reference organism Escherichia coli contains proteins S1-S21 and the large subunit contains proteins L1-L36 (Table 2). Some proteins have been deleted from the list for the large subunit: L7 (a modified form of L12 found only in a limited number of species); L8 (a complex of L7/L12 and L10); and L26 (identical to S20). The realization that many of the ribosomal proteins in bacteria are encoded in operons that mainly contain ribosomal protein genes has also helped to verify the ribosomal nature of several proteins. In total there are around 54 ribosomal proteins in eubacteria, archaebacteria, and chloroplasts, while the number rises to over 80 in eukaryotes and mitochondria.

Table 2 Relationships of eubacterial proteins (adapted from Amos Bairoch, Geneva. http://www.expasy.ch/cgi-bin/lists?ribosomp.txt)

Large subunit   Taxonomic range   Small subunit   Taxonomic range
L1              APECc             S1              P E Cc M
L2              APECM             S2              APEC M
L3              APEC              S3              APEC M
L4              P C               S4              APEC M
L5              APECM             S5              APEC
L6              APECM             S6              P C
L9              P Cc              S7              APECM
L10             APE               S8              APECM
L11             APE Cc            S9              APEC
L12             APE Cc            S10             APECM
L13             APE Cc            S11             APEC M
L14             APECM             S12             APEC M
L15             APEc              S13             APECcM
L16             P CM              S14             APEC M
L17             P m               S15             APEC M
L18             APEC              S16             P C m
L19             PEC               S17             APECc m
L20             P C               S18             P C
L21             P Cc              S19             APECM
L22             APE Cc            S20             P C
L23             APEC              S21             P
L24             APE Cc
L25             P
L27             PEC
L28             P Ccm
L29             APEC
L30             APE
L31             PC
L32             P
L33             P C
L34             PC
L35             P Cc
L36             P C

Legend for taxonomic range: A, archaea; P, eubacteria; E, eukaryotes; C, chloroplast encoded; c, chloroplast, nuclear encoded; M, mitochondrion encoded; m, mitochondrion, nuclear encoded. Notes: L7 = L12; L8 = (L12)4:L10; L26 = S20.


Evolutionary relationship of ribosomal proteins

The database of ribosomal protein sequences is growing rapidly, primarily owing to the number of genomes that have been sequenced. From sequence comparisons it is evident that more than 50% of the eubacterial ribosomal proteins have homologous proteins in chloroplasts, in archaebacteria, and in eukaryotes (Table 2). Complete investigations of ribosomal proteins from mitochondria have not yet been carried out. The list of conserved proteins is expected to grow as structures become available, since the divergence of proteins often goes well beyond what can be safely identified by sequence comparisons. The recognition of sequence motifs or of structural and functional correspondence is likely to extend the fraction of proteins conserved between the different kingdoms. In addition, structural studies of ribosomal proteins indicate that several different ribosomal proteins may have a common origin. Structural studies also reveal relationships between ribosomal proteins and proteins having other functions; additional sequence and structural data may determine whether these reflect a common evolutionary origin.


Copy number of ribosomal proteins 

With one exception, ribosomal proteins are present in only one copy per ribosome. The exception is the acidic protein L7/L12 in E. coli, which is found in four copies per ribosome. In archaebacteria and eukaryotes this protein corresponds to an acidic protein of the same size but with a quite different amino acid sequence, which is also found in four copies per ribosome. In eukaryotes there are several forms of this protein, produced from different genes, but the total number of these proteins in the ribosome remains four.


Structure of the Ribosome 


Determining the structure of the ribosome is a signifi- 
cant challenge because of its large size and lack of 
symmetry. The primary interest is focused on the 
binding sites for mRNA and tRNA molecules as 
well as for the soluble factor proteins. The functional 
sites will be dealt with below. A large number of 
approaches have been tried to obtain a detailed struc- 
ture of the whole ribosome as well as of the individual 
components, some of which will be briefly described 
below. 


Structure of Ribosomes and Ribosomal Subunits


Assembly 

One of the first ways in which the structure of the ribosomal subunits was explored was the assembly of purified components into functional particles. It was found that certain proteins had to bind to the rRNA first in order for other proteins to be able to associate. This indicated that the rRNA was not able to fold properly without the presence of specific proteins. A particular order of assembly was obtained for groups of proteins. It was later established by other methods, such as cross-linking and neutron scattering, that the interdependence of assembly is primarily related to the proximity between the proteins in the particles.
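An order of assembly of this kind is a dependency graph, and compatible binding orders are its topological orderings. The Python sketch below groups proteins into successive binding stages using `graphlib.TopologicalSorter` from the standard library (Python 3.9 or later). The protein names P1-P5 and their dependencies are hypothetical placeholders, not the experimentally determined assembly map.

```python
from graphlib import TopologicalSorter

# Hypothetical dependencies: each protein maps to the proteins that must already
# be bound before it can associate. Placeholder names, not real ribosomal proteins.
REQUIRES = {
    "P1": set(),          # binds the rRNA directly
    "P2": set(),          # binds the rRNA directly
    "P3": {"P1"},
    "P4": {"P1", "P2"},
    "P5": {"P3", "P4"},
}


def assembly_stages(requires: dict) -> list:
    """Group proteins into stages; every protein in a stage has all prerequisites bound."""
    sorter = TopologicalSorter(requires)
    sorter.prepare()  # raises graphlib.CycleError if the dependencies are circular
    stages = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        stages.append(ready)
        sorter.done(*ready)
    return stages


print(assembly_stages(REQUIRES))   # [['P1', 'P2'], ['P3', 'P4'], ['P5']]
```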


Cross-linking

Bifunctional cross-linking and affinity labeling have long been used to explore proximity between different components or residues of the ribosome, or between bound tRNA, mRNA, or translation factors and the ribosome. The method remains a very important tool with which to explore structural and functional proximity on the ribosome. Great care must be exercised to avoid accidental and misleading covalent reactions between the reagent and components of the protein synthesis system.


Accessibility to enzymes and chemical modifications 
There are numerous chemical and enzymatic ap- 
proaches to studying the exposed surfaces of rRNA 
and proteins in the ribosome, as well as the domains 
binding different types of ligands. These methods have 
primarily been used for studies of the rRNAs. Thus 
the secondary and tertiary structures of the rRNAs 
and the binding sites for ribosomal proteins, as well as 
those for tRNAs and factors, have been investigated. 
Protection against chemical modification of certain 
nucleotides is usually called ‘footprinting.’ 
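A footprint is typically read by comparing, nucleotide by nucleotide, the reactivity toward a chemical probe with and without the bound ligand. The Python sketch below flags positions whose reactivity drops below a chosen fraction of the unbound value. The threshold of 0.5, the function name, and the reactivity values are hypothetical choices for illustration.

```python
def protected_positions(free: dict, bound: dict, threshold: float = 0.5) -> list:
    """Positions whose reactivity with ligand bound falls below `threshold` x the free value.

    `free` and `bound` map nucleotide position -> measured reactivity (same positions).
    Positions that are unreactive even without ligand are skipped, since no
    protection can be detected there.
    """
    protected = []
    for pos in sorted(free):
        r_free = free[pos]
        if r_free <= 0:
            continue
        if bound[pos] / r_free < threshold:
            protected.append(pos)
    return protected


# Hypothetical reactivities (arbitrary units) for four positions.
free = {101: 1.0, 102: 0.8, 103: 0.9, 104: 0.0}
bound = {101: 0.1, 102: 0.75, 103: 0.3, 104: 0.0}
print(protected_positions(free, bound))   # [101, 103]
```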


Electron microscopy 

Electron microscopy has been, and remains, an important tool for gaining structural insight into particles as large as ribosomes. One objective is to obtain the detailed shape of subunits and whole ribosomes; another is to locate ribosomal proteins or parts of the rRNA. Many different approaches have been tested. One has been immunoelectron microscopy, in which an antibody against a certain component of the system is used as a ‘pointer’ to this component in the complete subunit or full ribosome. One obvious danger with this method is the temptation toward subjective interpretation. The most recent application of electron microscopy in the study of ribosomes is electron cryomicroscopy, in which images of large numbers of randomly oriented ribosomes give different views of the particle that can be combined into a three-dimensional picture of the ribosome. Resolution better than 10 Å is possible. Structures of a rapidly growing number of complexes of ribosomes with tRNAs and factor proteins are becoming available.

Figure 1 (See Plate 31) (Opposite and left) The crystal structure of Thermus thermophilus ribosomes displaying the three tRNA binding sites (Yusupov M et al. (2001) Crystal structure of the ribosome at 5.5 Å resolution. Science 292: 883-896). The RNA of the large subunit is shown in white, whereas the RNA of the small subunit is shown in blue. The proteins are shown in red and dark blue, respectively. The three tRNA molecules are shown in yellow (A-site), orange (P-site), and red (E-site). (A) to (E) show five views of the whole ribosome, and (F) and (G) show the 50S and 30S subunits, respectively.
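Combining many views works partly because independent noise averages out. The Python/NumPy sketch below shows only that part of the procedure: it averages a stack of already aligned, simulated noisy projections of a toy particle and compares the error before and after averaging. Orientation determination and three-dimensional reconstruction are omitted, and the image size, noise level, and number of images are arbitrary assumptions.

```python
import numpy as np


def average_aligned_views(images: np.ndarray) -> np.ndarray:
    """Mean over a stack of already aligned 2D images (shape: n x height x width)."""
    return images.mean(axis=0)


def rms_error(estimate: np.ndarray, truth: np.ndarray) -> float:
    return float(np.sqrt(np.mean((estimate - truth) ** 2)))


rng = np.random.default_rng(seed=0)
truth = np.zeros((32, 32))
truth[12:20, 8:24] = 1.0                 # toy "particle" projection (hypothetical)
noise_sigma = 2.0                        # assumed: single low-dose images are very noisy
stack = truth + rng.normal(0.0, noise_sigma, size=(400, 32, 32))

single_error = rms_error(stack[0], truth)
averaged_error = rms_error(average_aligned_views(stack), truth)
# For independent noise the error falls roughly as 1/sqrt(N): about 20-fold for N = 400.
print(f"one image: {single_error:.2f}, average of {len(stack)}: {averaged_error:.2f}")
```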


Neutron scattering 

Neutron scattering is an elegant method that has provided extremely valuable insight into the localization of ribosomal proteins. It exploits the difference in the scattering of neutrons by hydrogen and deuterium: the neutron scattering of pairs of protonated or deuterated proteins has been studied in a background of the complementary hydrogen isotope. Distances between proteins, as well as information about the shape of proteins, have been obtained in this way. From the pairwise distances of ribosomal proteins, a three-dimensional map was constructed and related to the shape seen by electron microscopy. The results have turned out to be reliable, even though they provide only low-resolution information.
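Turning a set of pairwise distances into a three-dimensional map is a classic geometric problem. As a stand-in, the Python/NumPy sketch below uses classical multidimensional scaling, which recovers coordinates up to rotation, reflection, and translation from a complete, exact distance matrix. This is not necessarily the procedure used in the original neutron-scattering studies, which had to cope with incomplete and noisy distances. The coordinates used to exercise it are random, hypothetical points.

```python
import numpy as np


def coordinates_from_distances(dist, dims: int = 3) -> np.ndarray:
    """Classical multidimensional scaling.

    Recovers point coordinates (up to rotation, reflection and translation)
    from a symmetric matrix of pairwise Euclidean distances.
    """
    d = np.asarray(dist, dtype=float)
    n = d.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    gram = -0.5 * centering @ (d ** 2) @ centering   # inner products of centred points
    eigvals, eigvecs = np.linalg.eigh(gram)          # eigenvalues in ascending order
    top = np.argsort(eigvals)[::-1][:dims]           # keep the largest `dims` eigenvalues
    return eigvecs[:, top] * np.sqrt(np.clip(eigvals[top], 0.0, None))


def pairwise_distances(points: np.ndarray) -> np.ndarray:
    diff = points[:, None, :] - points[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


# Hypothetical positions of six proteins (arbitrary units), used only to test the method.
rng = np.random.default_rng(seed=1)
true_positions = rng.uniform(0.0, 100.0, size=(6, 3))
measured = pairwise_distances(true_positions)

recovered = coordinates_from_distances(measured)
# The recovered map reproduces every pairwise distance, although its orientation is arbitrary.
print(np.allclose(pairwise_distances(recovered), measured))
```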


Nuclear magnetic resonance 

One method that can provide structural information at atomic resolution is nuclear magnetic resonance. Its main limitation, which is severe when it comes to the study of ribosomes, is the size of the object: molecules with a molecular mass of more than about 40 000 Da are difficult to study. However, ribosomal proteins and fragments of rRNA are very suitable for this method. In particular, studies of well-chosen fragments of rRNA have given useful insights into ribosomal structure and function.
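The size limit translates directly into a maximum fragment length. Assuming an average mass of roughly 330 Da per nucleotide residue (an approximation; the exact value depends on base composition), this short Python calculation shows that an RNA of about 120 nucleotides, the length of a 5S RNA, already sits near the quoted limit of about 40 000 Da. A complete 16S RNA of roughly 1500 nucleotides is far beyond it. The constants and function names are illustrative assumptions.

```python
AVERAGE_NT_MASS_DA = 330.0    # assumed average mass of one RNA nucleotide residue
NMR_SIZE_LIMIT_DA = 40_000    # approximate practical limit quoted in the text


def rna_mass_da(n_nucleotides: int) -> float:
    """Rough molecular mass of an RNA of the given length."""
    return n_nucleotides * AVERAGE_NT_MASS_DA


def within_nmr_range(n_nucleotides: int) -> bool:
    return rna_mass_da(n_nucleotides) <= NMR_SIZE_LIMIT_DA


max_length = int(NMR_SIZE_LIMIT_DA // AVERAGE_NT_MASS_DA)   # longest fragment under the limit
for name, length in [("hairpin fragment", 30), ("5S RNA", 120), ("16S RNA", 1542)]:
    print(f"{name}: ~{rna_mass_da(length):,.0f} Da, within range: {within_nmr_range(length)}")
print(f"maximum length at this limit: about {max_length} nucleotides")
```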


X-ray diffraction 

Crystallography on ribosomal components, as well as on ribosomal subunits and whole ribosomes, has been performed for several decades. The limitation here is the need to obtain crystals; the size of the object is no limitation. The structures of a number of ribosomal proteins and translation factors have been determined. Progress in the crystallography of ribosomal subunits has led to structures at atomic resolution, which have been fitted into a map of the whole ribosome at lower resolution (Figure 1A, B). The interplay of X-ray crystallography with electron cryomicroscopy will be an essential source of structural information, and progress after the initial breakthrough should provide a solid structural basis for ribosomal studies and the long-awaited approach to understanding the mechanisms of translation.


Functional Sites 


Binding of mRNA 

The messenger RNA (mRNA) binds to the small ribosomal subunit. In eubacteria a short sequence in the 5' region of the mRNA, upstream of the initiation codon, base-pairs with a complementary sequence at the 3' end of the 16S RNA; this is called the Shine and Dalgarno interaction. It presents the initiation codon in the decoding site. The decoding site involves nucleotides at the top of the penultimate helix of the 16S RNA, particularly adenines 1492 and 1493, which interact with the codons of the mRNA and the anticodons of tRNA molecules. The decoding site is located centrally on the small subunit, between its platform and head. When the two subunits interact, the decoding site faces into the intersubunit space.


tRNA binding sites 

The tRNA molecules bridge the space between the two subunits, since the decoding site is on the small subunit and the peptidyl transfer site is on the large subunit. This matches the elongated structure of the tRNAs, with the anticodon at one end and the amino acid at the opposite end. Two sites for tRNA molecules on the ribosome have been discussed in the classic literature: the A-site (the site for the acceptor or aminoacyl tRNA) and the P-site (the site for the donor or peptidyl tRNA). A third site is also generally accepted: the E-site, where deacylated tRNA resides before it dissociates from the ribosome. These sites have been delineated on the rRNA by footprinting or chemical protection studies. It has also become evident that during the transition from one site to the next, the tRNA molecules transiently bind in hybrid states such as A/P and P/E. Crystallographic work on whole ribosomes has delineated the tRNA binding sites in great detail (Figure 1C, D).


A-site The location of the A-site tRNA is related to the binding site for EF-Tu as long as the aminoacyl tRNA remains bound to EF-Tu in complex with GTP. Its anticodon is located in the decoding region of the A-site, whereas the aminoacyl end remains intimately attached to EF-Tu, far from the peptidyl transfer site. Incorrect matches of anticodon to codon lead to dissociation of the EF-Tu-tRNA complex from the ribosome. For cognate codons, however, EF-Tu is induced to hydrolyze its GTP to GDP, which changes its conformation drastically. EF-Tu then loses its affinity for the tRNA as well as for the ribosome, and dissociates. The aminoacyl tRNA then moves into the A-site, which is defined at one end by the interaction of the anticodon of the tRNA with the corresponding codon of the mRNA, and at the other end by the location of the aminoacyl moiety in the peptidyl transfer center, in close proximity to the nascent peptide. This movement of the tRNA coincides with the proofreading stage of elongation, during which noncognate tRNAs fall off the ribosome.


P-site Like the A-site tRNA, the P-site tRNA stretches across the space between the subunits and is attached both to the peptidyl transfer site and to the codon in the decoding site. The P-site lies further inside the subunit interface than the factor-binding site. A number of specific contacts between the tRNA in the P-site and the ribosome have been identified by different methods. The P-site tRNA is related to the A-site tRNA primarily by a 26° rotation.


E-site The E-site is the site at which deacylated tRNA molecules bind before they dissociate from the ribosome. It is not clear whether occupation of the site has any functional role. The E-site lies further toward the L1 side of the subunit interface than the P- and A-sites. The E-site tRNA is probably related to the P-site tRNA by a 40° rotation.


Peptidyl transfer site 

The large subunit contains the site for peptidyl transfer. Experiments to identify the components that are essential for this partial reaction have been performed for decades, and a number of proteins have been tentatively identified. At the same time, reactions used to assay peptidyl transfer, such as the puromycin reaction, have been catalyzed by rRNA essentially devoid of ribosomal proteins. The crystallographic analysis of large subunits shows that no protein is in the vicinity of the peptidyl transfer site. Thus the ribosome is a ribozyme.

A number of approaches have been used to identify the region of the 23S RNA involved in peptidyl transfer. The central loop of domain V has been found to be of great importance, as illustrated by the following observations: many of the nucleotides in this area are completely conserved; mutations conferring resistance to antibiotics that inhibit peptidyl transfer cluster in this region; tRNAs bound in this region block the reaction of chemicals with its nucleotides; cross-links from the acceptor ends of tRNAs, or from the amino acid or peptide attached to the tRNA, map to the loop; and mutations in this region severely affect peptidyl transfer. The site on the large subunit where peptidyl transfer occurs is on the interface side, below the central protuberance.

The crystallographic analysis of the large subunit has clarified the important interactions in the peptidyl transfer site. Thus C74 and C75 of the P-site tRNA base-pair with G2252 and G2251 of the 23S RNA, respectively, and C75 of the A-site tRNA is base-paired to G2553 of the 23S RNA. A catalytic mechanism has been proposed that involves a number of conserved groups, including A2451 of the 23S RNA, which would play the role of a general base during catalysis.


Exit channel 

It has long been observed that ribosomes protect a number of amino acids of the nascent polypeptide from digestion by proteolytic enzymes. This protection could be explained by a channel in the large subunit through which the polypeptide exits. Such a channel through the large subunit has been observed by electron microscopy and crystallography.


Binding site for elongation factors 

Elongation factor Tu (EF-Tu, or EF-1α in eukaryotes) binds to the ribosome as a ternary complex with an aminoacyl tRNA and GTP. Numerous studies have identified the factor binding site to be just inside the so-called L7/L12 stalk (Figure 1F). Elongation factor G (EF-G, EF-2 in archaea and eukaryotes), in complex with GTP, translocates the peptidyl tRNA from the A-site to the P-site once peptidyl transfer has occurred. Electron cryomicroscopy has provided pictures of the factors bound to the ribosome. One end of each factor binds to the classical factor binding site, with direct contacts to the L7/L12 region. At the opposite end, in the case of the EF-Tu-tRNA complex, the tRNA anticodon binds to the decoding site on the small subunit. The overall shape of EF-G is very similar to that of the ternary complex of EF-Tu with tRNA, and one domain (IV) of EF-G corresponds structurally to the anticodon stem and loop of the tRNA. In one phase of translation the binding site of EF-G on the ribosome overlaps with that of EF-Tu, and domain IV of EF-G interacts with the decoding region of the small subunit. Regions of the rRNA, namely the so-called thiostrepton region (around position 1070 of the 23S rRNA) and the highly conserved α-sarcin/ricin loop (around position 2660 of the 23S rRNA), are close to the binding site of the factors.


Inhibitors of Protein Synthesis, 
Antibiotics 


Numerous inhibitors of protein synthesis are known 
to bind to the ribosome. These are frequently antibiot- 
ics isolated from different microorganisms. Some of 
the best known are: streptomycin, puromycin, ery- 
thromycin, and chloramphenicol. They inhibit differ- 
ent steps of translation. Several antibiotic binding sites 
have been identified crystallographically. They are 
generally located on the rRNA. Resistance to these antibiotics is associated with modifications of the rRNA as well as with mutations in ribosomal proteins. The analysis of the mode of action of these inhibitors and of the resistance toward them provides an excellent means to study the interplay between ribosomal components during protein synthesis. A number of antibiotic inhibitors of protein synthesis are clinically useful, and the search for new ones to overcome the growing problem of antibiotic resistance is a significant aim of pharmaceutical companies.
See also: Chain Initiation, Elongation and
Termination; Elongation Factors; Ribosomal 
RNA (rRNA); Ribosome Binding Site; Transfer 
RNA (tRNA); Messenger RNA (mRNA) 


Ribozymes 
T M Picknett and S Brenner 
Ribozymes are RNA molecules with catalytic properties: enzymes made of nucleic acid, not protein. Two decades ago, it was generally accepted that all cellular catalysis was carried out by proteins. During studies conducted on RNA, it was discovered that some RNA molecules had catalytic properties; in particular, they could act as ‘molecular scissors’ and cleave RNA strands (Guerrier-Takada et al., 1983; Kruger et al., 1982). In 1989, Thomas Cech and Sidney Altman shared the Nobel Prize in Chemistry for their discovery that RNA could act as an enzyme.

The phenomenon is of particular note because of 
the implications for self-replicating systems in the 
earliest stages of the evolution of life on earth. It 
resolved the difficulty of explaining how catalysts 
and informational molecules could have separately 
evolved by showing that both properties can occur in 
one molecule. 


Further Reading 

Guerrier-Takada C, Gardiner K, Marsh T, Pace N and Altman S
(1983) Cell 35: 849-857.

Kruger K, Grabowski PJ, Zaug AJ, Sands J, Gottschling DE and 
Cech TR (1982) Cell 31: 147-157. 


See also: Pre-mRNA Splicing 


Rifamycins 


Copyright © 2001 Academic Press 
doi: 10.1006/rwgn.2001.2008


Rifamycins are antibiotics produced by Streptomyces mediterranei that specifically inhibit prokaryotic DNA-dependent RNA synthesis. They act by inhibiting initiation but not elongation of transcripts.


See also: Transcription 


Right/Left Handed DNA 


H C M Nelson 
DNA contains all the genetic information of the cell and needs to be recognized by a vast variety of proteins and macromolecular complexes that are involved in everything from recombination to replication and from transcription to assembly into chromatin. For these processes to proceed appropriately, the proteins and macromolecular complexes must recognize specific sequences and/or structures of DNA. This recognition occurs at two levels of complexity: the different hydrogen-bonding patterns that each base pair presents in the major and minor grooves, and the different overall shapes adopted by particular sequence arrangements of DNA.

In terms of the overall shape, double-stranded 
DNA can be divided into three types: A, B, and Z. A- 
and B-like DNA have a right-handed double-helical 
twist, while Z-DNA has a left-handed double- 
helical twist. The two components that determine the 
DNA structure, as well as the handedness of the helix, 
are sequence (i.e., the particular arrangement of the 
four bases) and environment (i.e., the level of hydra- 
tion and the type and amount of ions). Historically, 
the right-handed DNA helices were the first to be 
discovered. Rosalind Franklin studied the repeating 
patterns of DNA fibers by looking at how X-rays 
were diffracted by the fibers. Most of her fibers were 
what we now call A-like DNA, although Watson and 
Crick deduced the first DNA model based on her 
X-ray diffraction data on what we would now call 
B-like DNA. In the 1970s, the presence of left-handed 
DNA double helices was suggested using spectro- 
scopic techniques such as circular dichroism and 
nuclear magnetic resonance spectroscopy, and this 
was confirmed later by X-ray crystallography. To 
obtain a left-handed double-helical configuration, 
the DNA needs to have a specific sequence, typically 
alternating G and C bases, with particular environ- 
mental conditions, such as high salt or low humidity. 
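The sequence requirement just described, alternating G and C bases, can be illustrated with a short Python sketch that scans a DNA sequence for the longest stretch in which G and C strictly alternate (for example GCGCGC). This is only a crude screen under stated assumptions: it ignores the environmental conditions (salt, hydration) that are also needed for the left-handed form, and the function name `longest_alternating_gc` is purely illustrative.

```python
def longest_alternating_gc(seq: str) -> str:
    """Return the longest run in which G and C strictly alternate (e.g. GCGCGC).

    Assumption: only strict G/C alternation is scored; environment is ignored.
    """
    seq = seq.upper()
    best_start, best_len = 0, 0
    start = 0  # start of the current alternating run
    for i, base in enumerate(seq):
        if base not in "GC":
            start = i + 1      # an A, T (or other symbol) breaks the run
        elif i > start and seq[i - 1] == base:
            start = i          # GG or CC breaks strict alternation; restart here
        length = i - start + 1
        if length > best_len:
            best_start, best_len = start, length
    return seq[best_start:best_start + best_len]


print(longest_alternating_gc("ATGCGCGCAT"))  # GCGCGC
```

A stretch found this way only marks a candidate; whether it actually adopts a left-handed conformation still depends on the conditions described above.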

Apart from the differences in handedness of the helix, right- and left-handed DNA can be distinguished by the shapes of their major (M) and minor (m) grooves. Figure 1 illustrates A-like DNA, B-like DNA, and Z-DNA. In right-handed DNA, the sugar-phosphate backbone runs smoothly along the edges of the grooves. In B-like DNA, the major and minor grooves are approximately the same depth, while in A-like DNA, the major groove is deep and the minor groove is shallow. In left-handed DNA, the sugar-phosphate backbone has a jagged, zigzag appearance, hence the term Z-DNA. The minor groove is quite deep, while the major groove is extremely shallow.

Figure 1 A-like DNA, B-like DNA, and Z-DNA. The phosphates appear black, the sugars appear gray, and the bases appear white in the figure.

Looking at short pieces of DNA in isolation, it is easy to test whether they are right- or left-handed. However, it is more difficult to tell whether DNA is right- or left-handed in the cell. The vast majority of chromosomal DNA is thought to be right-handed. There is some evidence that a small portion of chromosomal DNA is either normally left-handed or can at least be induced to adopt a left-handed helical conformation upon binding of proteins that recognize left-handed helical DNA. The significance of left-handed DNA is not understood.


See also: DNA; Handedness, Left/Right 


R-Loop 
M R Lieber and F Chedin 


Copyright © 2001 Academic Press 
An R-loop is a nucleic acid structure consisting of two antiparallel DNA strands plus one RNA strand. In this structure, the RNA is base-paired to one of the DNA strands, while the other DNA strand is unpaired. The name for this structure derives from the term ‘D-loop’ (displacement loop), which refers to the same structure, but in which all three strands consist of DNA.

Figure 1 R-loops formed with 18S and 28S ribosomal RNAs hybridized to Dictyostelium discoideum ribosomal DNA. (Reprinted with permission from: Stumph W, Wu J-R and Bonner J (1978) Biochemistry 17(2): 5791-5798. Copyright © 1978 American Chemical Society.)

Figure 2 Model R-loop: The straight lines represent the two DNA strands, with the displaced strand depicted on the top. The wavy line represents the RNA strand base-paired to the bottom DNA strand.

While a transient R-loop is formed at the tip of 
an elongating transcription fork, stable R-loops have 
been documented to form only at a few sequences. 
Examples include the origin of replication of several 
bacterial plasmids, where the RNA strand, after a 
processing step, serves as a primer for the initiation 
of DNA replication. R-loops were also proposed to 
occur at the origin and the terminus of replication of 
the Escherichia coli genome, as well as at the ribosomal 
DNA genes. In eukaryotes, formation of stable R- 
loops has been well documented at the origin of repli- 
cation of the mitochondrial genome, where they also 
serve as a primer. Stable R-loops have also been 
detected upon in vitro transcription of mammalian 
class-switch DNA sequences and proposed to target 
the process of immunoglobulin class-switching to the 
appropriate regions. 

Although the thermal stability of an R-loop is thought to be maximal when a G-rich RNA is base-paired to a C-rich DNA strand, the rules governing R-loop formation are not yet clear. Refinement of these rules will require additional experimental evidence of R-loop formation.


See also: D-Loop 


RNA 


J Read and S Brenner 


Copyright © 2001 Academic Press 
doi: 10.1006/rwgn.2001.2144 


RNA (ribonucleic acid) is made up of nucleotides containing four chemical bases, adenine (A), cytosine (C), guanine (G), and uracil (U), connected by a ribose-phosphate backbone. It is transcribed from DNA following the Watson-Crick pairing rules, but with U in place of thymine (T). There are four main types of RNA: messenger (m)RNA, transfer (t)RNA, ribosomal (r)RNA, and small nuclear (sn)RNA. Together, these serve to carry the genetic information stored by DNA in the cell nucleus to other parts of the cell, where it is converted into protein.
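The pairing rule described above can be sketched in Python. The sketch assumes that sequences are written 5′ to 3′ and contain only A, C, G, and T; the function names are hypothetical. It shows that the RNA is complementary to the template strand and, apart from U in place of T, identical in sequence to the coding (non-template) strand.

```python
# Watson-Crick partner written into RNA for each base read on the DNA template.
DNA_TO_RNA = {"A": "U", "T": "A", "G": "C", "C": "G"}


def transcribe_template(template_5to3: str) -> str:
    """Return the RNA (5'->3') made from a DNA template strand given 5'->3'.

    The template is read 3'->5', so it is walked in reverse and each base is
    replaced by its partner, with U placed opposite A.
    """
    return "".join(DNA_TO_RNA[base] for base in reversed(template_5to3.upper()))


def rna_from_coding_strand(coding_5to3: str) -> str:
    """The RNA has the coding-strand sequence, with T replaced by U."""
    return coding_5to3.upper().replace("T", "U")


print(transcribe_template("GAATGCCAT"))     # AUGGCAUUC
print(rna_from_coding_strand("ATGGCATTC"))  # AUGGCAUUC
```

The two calls use the two strands of the same duplex, so they give the same RNA.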


See also: Messenger RNA (mRNA); Ribosomal 
RNA (rRNA); RNA Polymerase; RNA World; 
snRNAs; Transfer RNA (tRNA) 


RNAases 


TM Picknett and S Brenner 
The RNAases (RNases, ribonucleases) are a group of enzymes that cleave RNA. Some act as endonucleases, others as exonucleases. Target recognition is generally by tertiary structure rather than by sequence. RNAase E takes part in the formation of 5S ribosomal RNA from pre-rRNA. RNAase F is stimulated by interferons and cleaves viral and host RNAs, inhibiting protein synthesis. RNAase H specifically cleaves RNA that is base-paired to a complementary DNA strand. RNAase P is an endonuclease that acts on precursor transcripts to produce tRNAs. RNAase T is an exonuclease that removes the terminal AMP from the 3′ CCA end of a nonaminoacylated tRNA. RNAase T1 cleaves RNA with a specificity for guanosine residues. RNAase III cleaves double-stranded regions of RNA molecules.
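The guanosine specificity of RNAase T1 can be illustrated with a Python sketch of an idealized complete digest, in which the chain is cut on the 3′ side of every G, so that every fragment except the 3′-terminal one ends in G. This is a simplification under stated assumptions: it treats every G as equally accessible and ignores structure, digestion conditions, and modified nucleotides. The name `rnase_t1_digest` is hypothetical.

```python
from typing import List


def rnase_t1_digest(rna: str) -> List[str]:
    """Split an RNA sequence after every G, mimicking a complete T1 digest."""
    fragments: List[str] = []
    current: List[str] = []
    for base in rna.upper():
        current.append(base)
        if base == "G":              # cut on the 3' side of each G
            fragments.append("".join(current))
            current = []
    if current:                      # 3'-terminal fragment not ending in G
        fragments.append("".join(current))
    return fragments


print(rnase_t1_digest("AUGGCUAGCC"))  # ['AUG', 'G', 'CUAG', 'CC']
```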


See also: Endonucleases; Exonucleases 


RNA-Binding Domains in 
Proteins 


D Pomeranz-Krummel and K Nagai 
Following transcription, RNA is subjected to various
processing and modification events. In eukaryotic 
cells, it is critical that RNA is transported to a specific 
cellular compartment. For example, in developing 
embryos some mRNAs localize to a specific region 
of the embryo and thus determine its body plan. In 
addition, the translation of some mRNAs is regulated 
in a temporal as well as a spatial manner during devel- 
opment. These critical cellular events require specific 
interactions between RNA and protein. Several recur- 
ring RNA-binding sequence motifs have been identi- 
fied in some of these important proteins. These protein 
sequence motifs probably appeared early in evolution 
and have become widespread because of their versatile 
RNA-binding properties. We have now begun to 
understand how these RNA-binding protein domains 
recognize their specific target RNA(s). 


RNP domain (RNA Recognition Motif) 


The RNA recognition motif (RRM), also known as 
the ribonucleoprotein (RNP) motif or RNP-type 
RNA-binding domain (RBD), is the most common 
RNA-binding motif that has been identified; it is pres- 
ent in a significant number of proteins involved in 
almost all aspects of RNA processing and transport. 
Proteins containing an RRM include: hnRNP proteins (A1, A2/B1, C1/C2), spliceosomal proteins (U1A, U1 70k, U2B″), nucleolin, and poly(A)-binding proteins. The RRM is present as a single copy or in multiple copies and often occurs with other sequence motifs, such as the RS (Arg-Ser) and RGG (Arg-Gly-Gly) repeats. The spliceosomal U1 70k protein and hnRNP protein C1 each contain only a single copy of the RRM, while the 65 kDa subunit of the U2 auxiliary factor (U2AF) contains two copies, ELAV protein contains three copies, and the poly(A) (polyadenylate)-binding protein contains four tandem copies. However, not all RRMs are required for RNA binding, and some may have other functions, such as mediating protein-protein interactions.

The RRM, consisting of approximately 80 amino acid residues, contains two highly conserved short sequence motifs known as RNP1 (RNP octamer) and RNP2 (RNP hexamer) (Burd and Dreyfuss, 1994). The RNP1 and RNP2 consensus sequences are (R/K)-G-(F/Y)-(G/A)-(F/Y)-V-X-(F/Y) and (L/I)-(F/Y)-(V/I)-X-(N/G)-L, respectively. The crystal structure of the spliceosomal U1A protein revealed that the RRM folds into a compact globular domain (RNP domain) consisting of a four-stranded antiparallel β-sheet flanked on one side by two α-helices, with a β-α-β-β-α-β topology (Nagai et al., 1990). The RNP1 and RNP2 motifs occupy the two middle β-strands. The third and fifth residues of RNP1 and the second residue of RNP2 are either Phe or Tyr in the majority of the RRM motifs identified.
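Because RNP1 and RNP2 are written as consensus patterns, they translate directly into regular expressions. The Python sketch below searches a protein sequence for exact matches to the two consensus sequences given above. It is a toy under stated assumptions: real RRMs often deviate from the strict consensus, so a search like this will miss many genuine motifs and is no substitute for profile-based domain detection. The function name and the test sequence are hypothetical.

```python
import re
from typing import Dict, List, Tuple

# Consensus sequences from the text, with X (any residue) written as '.'.
RNP1 = re.compile(r"[RK]G[FY][GA][FY]V.[FY]")  # (R/K)-G-(F/Y)-(G/A)-(F/Y)-V-X-(F/Y)
RNP2 = re.compile(r"[LI][FY][VI].[NG]L")       # (L/I)-(F/Y)-(V/I)-X-(N/G)-L


def find_rnp_motifs(protein: str) -> Dict[str, List[Tuple[int, str]]]:
    """Return (0-based start, matched residues) for each strict consensus match."""
    protein = protein.upper()
    return {
        name: [(m.start(), m.group()) for m in pattern.finditer(protein)]
        for name, pattern in (("RNP1", RNP1), ("RNP2", RNP2))
    }


# Hypothetical toy sequence, not a real protein.
print(find_rnp_motifs("MSLYVGNLPEKRGFAFVTFE"))
# {'RNP1': [(11, 'RGFAFVTF')], 'RNP2': [(2, 'LYVGNL')]}
```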

The crystal structure of the U1A protein in com- 
plex with a fragment of U1 small nuclear RNA 
(snRNA) revealed how an RNP domain binds to 
RNA (Oubridge et al., 1994). The RNA site bound by U1A protein is a 10-nucleotide loop having the sequence AUUGCACUCC, closed by a CG base pair. The first seven loop nucleotides lie across the surface of the β-sheet and fit into a groove formed between the C-terminal region and the peptide loop connecting the β2- and β3-strands. Upon formation of the U1A-RNA complex, both the RNA loop and these two regions of the protein become well ordered by forming an intricate network of hydrogen bonds involving the RNA bases and protein side-chain and main-chain atoms. The bases of these seven loop nucleotides stack onto either: (1) an adjacent RNA base; (2) aromatic protein side chains of RNP1 and RNP2; or (3) both an RNA base and an aromatic protein side chain. The stacking interactions between RNA bases and aromatic side chains of RNP1 and RNP2 residues, together with the intricate network of hydrogen bonds, play crucial roles in sequence-specific recognition of RNA by U1A protein.

Interestingly, the spliceosomal U2B″ protein, which also contains an RNP domain, is structurally similar to the U1A protein, but unlike U1A it binds its cognate RNA hairpin only when in a complex with another protein, the spliceosomal U2A′ protein. The crystal structure of the ternary complex shows that U2A′ binds to U2B″ on the surface opposite the β-sheet and interacts with an RNA stem of U2 snRNA (Price et al., 1998). This structure revealed how the RNA-binding specificity of the RNP domain can be modulated by a second protein.
Some proteins require multiple copies of the RRM for tight and specific RNA binding. The sex-lethal (Sxl) protein is produced in female Drosophila and binds to a polypyrimidine tract in the intron between exons 1 and 2 of the transformer (Tra) gene. This binding represses the use of the downstream splice acceptor site, resulting in sex-dependent alternative splicing. A fragment of the Sxl protein containing two tandem RNP domains has been crystallized with an RNA of sequence GUUGUUUUUUUU, which is found in the pyrimidine tract of Tra pre-mRNA (Handa et al., 1999). The β-sheets of the two RNP domains face each other to form a V-shaped cleft. The RNA is sandwiched between the two RNP domains in an extended form. The linker peptide between the two RNP domains forms a short 3₁₀ helix upon RNA binding. The bases of the UGU (U3-G4-U5) sequence lie on the β-sheet surface of the second RNP domain and interact with the conserved residues of RNP1 and RNP2 in a manner similar to that observed in the crystal structure of the U1A protein in complex with RNA.
The base of U3 is packed against a Val (the third residue of the RNP1 motif), whereas G4 and U5 stack on the aromatic rings of a Tyr (the second residue of RNP2) and a Phe (the fifth residue of RNP1), respectively. Nucleotides U6-U11 interact mainly with the first RNP domain. These interactions, involving the bases as well as the phosphate backbone and 2′-OH groups of the RNA, are distinct from those observed in U1A and in the second RNP domain of Sxl.

The poly(A)-binding protein binds to the poly(A) 
tail of mRNA and promotes the formation of a closed 
loop by interacting with the initiation factor eIF4G 
bound to the 5’ end of mRNA. The formation of such 
a closed loop is thought to increase the efficiency of 
translation initiation. Poly(A)-binding protein con- 
tains four tandem copies of the RNP domain. A frag- 
ment of poly(A)-binding protein containing the first 
and second RNP domains has been crystallized 
in complex with an 11-nucleotide polyadenylate 
sequence. The crystal structure shows that the two 
RNP domains, connected by a short α-helix, interact side by side and form a continuous RNA-binding trough (Deo et al., 1999). The polyadenylate RNA adopts an extended conformation in the trough. The first RNP domain interacts with RNA bases A5-A8 and the second RNP domain with bases A2-A5. In
both RNP domains, the aromatic side chains of the 
second residue of RNP2 and the fifth residue of RNP1 
stack onto an adenine ring, as is the case in the struc- 
ture of the U1A-RNA complex. In these structures 
adenines are specifically recognized by hydrogen 
bonds between the bases and the protein. 


Stacking interactions between the RNA bases and 
the aromatic amino acid residues of RNP1 and RNP2 
motifs are a common feature in these complexes. 
However, there is a striking variation in the path of 
RNA and the way bases are recognized by surround- 
ing amino acid residues in the structures. Further, the 
linker peptide between the RNP domains appears to 
be important for the recognition of some RNA bases 
in proteins containing multiple RNP domains. 


Double-Stranded RNA-Binding Domain 
(dsRBD) 


A short sequence motif consisting of approximately 
70 amino acid residues has been identified in a large 
number of functionally diverse proteins including 
Escherichia coli RNase III, Drosophila Staufen protein, 
double-stranded RNA (dsRNA)-dependent aden- 
osine deaminase and the dsRNA-dependent protein 
kinase (PKR). This motif is called the double-stranded 
RNA-binding domain (dsRBD). It binds dsRNA but 
has no apparent affinity for dsDNA or single-stranded 
DNA or RNA. The dsRBD forms a three-stranded antiparallel β-sheet with the N- and C-terminal α-helices packed on one face of the β-sheet. The crystal structure of a dsRBD of Xenopus RNA-binding protein A in complex with dsRNA shows that the dsRBD interacts with two successive minor grooves across the intervening major groove on one face of a dsRNA helix (Ryter and Schultz, 1998). The N-terminal α-helix interacts with 2′-OH and base functional groups in the RNA minor groove, and the peptide loop between β-strands 1 and 2 interacts with 2′-OH and base functional groups in the adjacent RNA minor groove. The involvement of RNA 2′-OH groups in binding accounts for the specificity of the dsRBD for dsRNA but not dsDNA. The C-terminal α-helix binds across the RNA major groove, forming contacts primarily with nonbridging phosphate oxygens. The majority of the RNA-contacting residues are evolutionarily conserved. The dsRBD appears to measure the helical pitch of the A-form RNA helix. The dsRBD of Drosophila Staufen protein also binds to dsRNA in a similar manner (Ramos et al., 2000). In Drosophila embryos, Staufen protein is found in ribonucleoprotein particles containing bicoid mRNA. The 3′ untranslated region of bicoid mRNA is required for this interaction, but it is not yet clear how Staufen protein associates with its target RNA.


K Homology Sequence (KH) Domain 


Three copies of a short sequence of about 70 amino acid residues have been identified in heterogeneous nuclear RNP (hnRNP) K protein. This sequence motif, called the K homology (KH) domain, has been found in more than 100 proteins in eukaryotes, eubacteria, and archaebacteria. The KH domain contains three α-helices and a three-stranded antiparallel β-sheet arranged in the order β-α-α-β-β-α. The invariant Gly-X-X-Gly motif forms a short 3₁₀ helix between the first and second α-helices. Two copies of the KH domain have been identified in the fragile X mental retardation protein (FMR1). Loss of function or mutations in FMR1 are common causes of inherited mental retardation. The proteins Nova-1 and Nova-2 each contain three KH domains and bind to the repeated sequence UCAU(C/U) present within the intron immediately upstream of exon 3A of the glycine receptor α2 pre-mRNA. Autoimmune antibodies against the Nova proteins occur in individuals with paraneoplastic opsoclonus-myoclonus ataxia (POMA), resulting in neuronal degeneration. The crystal structure of the third KH domain of Nova-2 protein has revealed how a KH domain interacts with its target RNA (Lewis et al., 2000). The extended RNA (AUCAC) lies on a hydrophobic platform formed by the first and second α-helices and the edge of the second β-strand, where it is gripped by the invariant Gly-X-X-Gly motif and the variable loop between strands 2 and 3. A leucine residue on the hydrophobic platform makes van der Waals contacts with an A in the AUCAC sequence. This residue corresponds to an isoleucine residue in FMR1, and its substitution with Asn causes severe mental retardation. This suggests that the Ile→Asn mutation causes mental retardation by affecting the RNA-binding properties of FMR1.
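Locating the repeated UCAU(C/U) element in a pre-mRNA is a simple pattern search, sketched below in Python. The sketch only reports candidate elements and says nothing about whether a Nova protein actually binds them; the function name and the test sequences are hypothetical. A lookahead is used so that overlapping copies, which can share nucleotides in a repeat, are all reported.

```python
import re
from typing import List, Tuple

# UCAU(C/U) as a regular expression, wrapped in a lookahead so that
# overlapping occurrences are each reported.
NOVA_ELEMENT = re.compile(r"(?=(UCAU[CU]))")


def find_nova_elements(rna: str) -> List[Tuple[int, str]]:
    """Return (0-based start, matched text) for every UCAU(C/U) element."""
    rna = rna.upper().replace("T", "U")  # also accept DNA-style input
    return [(m.start(), m.group(1)) for m in NOVA_ELEMENT.finditer(rna)]


print(find_nova_elements("UCAUCAUC"))  # [(0, 'UCAUC'), (3, 'UCAUC')]
```

Without the lookahead, `finditer` would resume searching after the end of the first match and miss the second, overlapping copy in this example.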


Conclusion 


The RNP, dsRBD, and KH domains are the most 
common RNA-binding modules found in many 
RNA-binding proteins that have diverse functions. 
The three-dimensional structures of these domains 
have been determined by both NMR and X-ray 
crystallography. The crystallographic structures of 
proteins having these domains in complex with RNA 
have revealed how these domains recognize their specific target RNAs. The structures of these protein-RNA complexes have also provided important insight
into mutations that result in human diseases. 
The term ‘RNA editing’ encompasses a variety of
distinct processes that change the base content of 
RNA from that encoded in the genome. It does not 
include the ubiquitous processes of premessenger 
RNA splicing, 5'- or 3’-end formation, or the hyper- 
modification of nucleotides in transfer RNA. It 
can involve insertion, deletion, or modification of 
nucleotides. 


Insertion and/or Deletion Editing 


Trypanosome RNA Editing 

Trypanosomes provide the most florid example of 
RNA editing. These parasitic, flagellate protozoa 
belong to the order Kinetoplastida. Kinetoplastid 
protozoa parasitize mammals, birds, reptiles, and fish. 
RNA editing has been found in the complex mito- 
chondria (‘kinetoplasts’) of all kinetoplastid species 
studied so far. RNA editing has not yet been found 
in Euglena, a divergent, free-living relative. Kinetoplastid protozoa are of medical interest and importance because of the range of diseases they produce,
such as kala-azar (visceral leishmaniasis), Chagas 
disease (American trypanosomiasis), and sleeping 
sickness (African trypanosomiasis), and of special biological interest because they are the most primitive extant eukaryotes that contain mitochondria.

Mitochondrial preedited RNA transcripts are altered by deletion of certain genomically encoded uridines (U) and insertion of other, noncoded uridines (Figure 1). The edited regions may contain up to 40 Us. As a result, abnormalities in local reading frames are corrected. More spectacularly, extensive editing can create as much as half of the messenger RNA (mRNA) sequence for genes that had previously been hard to recognize. By this process, aberrant transcripts are edited back to sequences found in the mitochondrial genes of other organisms, and the normal complement of mitochondrial mRNAs and proteins is produced.

Figure 1 Model for U-insertion RNA editing. The thick line represents the messenger RNA (mRNA) molecule. The thin line represents the cognate guide RNA (gRNA). Vertical lines indicate base pairs. The 3′ oligo(U) tail of the gRNA is represented as an overhang; it interacts with the preedited region of the mRNA. U-deletion editing involves a substrate mRNA with one or more Us at the 3′ end of the 5′ cleavage fragment and an absence of guiding nucleotides in the guide RNA to base-pair with these Us, which are then trimmed by the exonuclease activity. (From Alfonzo et al. (1997) Nucleic Acids Research.)

The mitochondrial genome of kinetoplastids is made up of catenated maxi-circles and mini-circles of DNA (large and small circular pieces of DNA interlocked like the links of a chain). Maxi-circle DNA encodes the aberrant preedited mitochondrial mRNAs, which are edited back to sequences found in the mitochondrial genes of other organisms. Mini-circles encode guide RNAs, which form a template for the reconstruction of normal mRNA by the RNA editing process.


Mechanism of RNA Editing 

Guide RNAs contain a 4- to 18-nt anchor sequence that is complementary to the sequence immediately downstream of the editing site on unedited transcripts. Guide RNAs hybridize with the preedited RNA but are mismatched at the editing site. An endonuclease cleaves the backbone of the unedited premessenger RNA just 5′ of the mismatch between the guide RNA and the transcript. Us are then added by the enzyme terminal ribonucleotide transferase, or deleted by an exonuclease, as directed by the guide RNA template. The free ends of the corrected RNA are joined by an RNA ligase. The proteins of the editing complex are encoded in the nucleus and imported into the kinetoplast. These proteins have been purified and are being characterized.
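The cleave, add-or-trim, ligate cycle described above can be modeled as string operations in a short Python sketch. It is a simplification under stated assumptions: the number of Us to insert or delete at each site is supplied directly rather than derived from base pairing with a guide RNA, and the `EditSite` class, the function names, and the example sequence are all hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class EditSite:
    """One guide-RNA-directed event (hypothetical representation).

    position: 0-based index at which the endonuclease cuts, i.e. between
              nucleotides position-1 and position of the pre-mRNA.
    delta:    number of Us to insert (> 0) or delete (< 0) at the cut.
    """
    position: int
    delta: int


def edit_once(pre_mrna: str, site: EditSite) -> str:
    """Cleave, add or trim Us at the 3' end of the 5' fragment, then ligate."""
    five_prime = pre_mrna[:site.position]    # endonuclease cleavage
    three_prime = pre_mrna[site.position:]
    if site.delta > 0:
        five_prime += "U" * site.delta       # terminal transferase adds Us
    elif site.delta < 0:
        n = -site.delta
        if not five_prime.endswith("U" * n):
            raise ValueError("U-deletion needs encoded Us at the 3' end of the 5' fragment")
        five_prime = five_prime[:-n]         # exonuclease trims Us
    return five_prime + three_prime          # ligase rejoins the two fragments


def edit_all(pre_mrna: str, sites: Iterable[EditSite]) -> str:
    """Apply sites from the 3'-most to the 5'-most.

    Descending order keeps the original coordinates of the remaining, more
    5' sites valid, provided that the sites do not overlap.
    """
    for site in sorted(sites, key=lambda s: s.position, reverse=True):
        pre_mrna = edit_once(pre_mrna, site)
    return pre_mrna


print(edit_all("AUGGAUUUCG", [EditSite(3, 2), EditSite(7, -2)]))  # AUGUUGAUCG
```

Here two Us are inserted after the initial AUG and two encoded Us are removed downstream, so the length of the molecule is unchanged while its sequence, and hence its reading frame content, is altered.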


Origins of RNA Editing 

Although the pressures that led to the evolution of kinetoplastid RNA editing are obscure, this RNA editing is evidently strongly regulated during the parasite life cycle, which requires switching between anaerobic glycolysis and aerobic respiration using the tricarboxylic acid cycle. This regulation is at the level of the nuclear transcripts whose protein products must be imported into the kinetoplast; RNA production from the maxi-circle and mini-circle genes is constitutive.


Substitution or Modification RNA 
Editing 

Two well-characterized types of RNA editing affect genes expressed in the nucleus. These are cytidine (C)-to-U and adenosine (A)-to-inosine (I; read as guanosine, G) RNA editing. Various less completely understood forms of nuclear RNA editing are described in animals. RNA editing also affects organelle genome transcripts in plants.


C-to-U RNA Editing 


ApoB mRNA editing 

The prototype of C-to-U RNA editing is that of 
apolipoprotein (apo)B mRNA. ApoB is the principal 
cholesterol and triglyceride transport protein in the 
blood. C-to-U RNA editing generates a specific translational stop codon (UAA) from glutamine codon 2153
(CAA) in apoB mRNA, thereby causing the produc- 
tion of a truncated apoB polypeptide by cells of the 
small intestine. Intestinal apoB is required for dietary 
fat absorption. 
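The effect of this single C-to-U change on translation can be shown with a small Python sketch: converting the C of a CAA (Gln) codon to U produces UAA, and the reading frame now ends at that codon. The fragment used here is a hypothetical 12-nucleotide sequence, not the real apoB sequence, so codon numbering refers to the fragment only; the function names are also hypothetical, and the sketch models only the net sequence change, not the editing enzyme.

```python
from typing import Optional

STOP_CODONS = {"UAA", "UAG", "UGA"}


def c_to_u_edit(mrna: str, position: int) -> str:
    """Return mrna with the C at a 0-based position changed to U."""
    if mrna[position] != "C":
        raise ValueError(f"expected C at position {position}, found {mrna[position]}")
    return mrna[:position] + "U" + mrna[position + 1:]


def first_stop_codon(mrna: str, frame_start: int = 0) -> Optional[int]:
    """Return the 0-based index of the first in-frame stop codon, or None."""
    for i in range(frame_start, len(mrna) - 2, 3):
        if mrna[i:i + 3] in STOP_CODONS:
            return (i - frame_start) // 3
    return None


fragment = "GCUCAAAUAGAU"          # hypothetical: codons GCU CAA AUA GAU
edited = c_to_u_edit(fragment, 3)  # CAA (Gln) becomes UAA (stop)
print(first_stop_codon(fragment), first_stop_codon(edited))  # None 1
```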

The apoB mRNA editing enzyme has catalytic and 
RNA-binding subunits (Figure 2). The RNA-binding 
subunit confers specific RNA substrate recognition 
on the catalytic subunit. The site of apoB mRNA 
editing is identified by the RNA editing enzyme 
through a highly conserved sequence around the edit- 
ing site which has sequence, spatial, and secondary 
structural elements all required for specific editing- 
site recognition. 

The catalytic subunit of the editing enzyme is 
an RNA-binding cytidine deaminase designated 
‘APOBEC-1’ for apoB mRNA editing component-1. 
APOBEC-1 is closely related in structure to Escher- 
ichia coli cytidine deaminase (ECCDA). The catalytic 
activity of cytidine deaminases derives from a cluster 
of residues which bind zinc, activate a zinc-bound 
water molecule, and mediate proton transfer. This 
leads to the elimination of ammonia from C to form U. ECCDA and APOBEC-1 both form head-to-tail homodimers. Each monomer is composed of two core domains. The core domains are linked by
an extended peptide. In the dimer, two composite 
active sites are constructed with contributions from 
N- and C-terminal core domains of each monomer. 
The N-terminal core domain contains the catalytic 
residues. In contrast to ECCDA, gaps in the APO- 
BEC-1 sequence form a crevice which accommodates 
the RNA substrate. 

One of the composite active sites of the APOBEC-1 
homodimer binds to the edited C in apoB mRNA. 
The other binds downstream on the RNA to a series of 
Us. The bifunctional binding of the APOBEC-1 active 
site to both substrate (C) and product (U) positions the 
enzyme homodimer for RNA editing. 

The catalytic subunit of the apoB mRNA editing enzyme is complemented by an RNA-binding subunit. This protein is designated ‘ACF’ for APOBEC-1 complementing factor. ACF protein contains three single-stranded RNA-binding motifs toward its N-terminus and a double-stranded RNA recognition motif toward its C-terminus. ACF binds to the junction between the double-stranded stem and the single-stranded portion of the RNA loop formed by the conserved sequence that contains the edited C. ACF facilitates loading of apoB messenger RNA into the APOBEC-1 active site.

Figure 2 (See Plate 32) Apolipoprotein B (ApoB) mRNA editing. The composite active site of the apoB mRNA editing component-1 (APOBEC-1) homodimer is made up of contributions from the N-terminal true active site (AS) and the C-terminal pseudoactive site (PAS) core domains. (A) The active site of APOBEC-1 interacts with the C to be edited. Downstream, the other active site of the APOBEC-1 homodimer binds a series of U residues. ‘wH’ denotes the N-terminal α-helical domain of APOBEC-1. (B) The stem-loop structure formed by apoB mRNA at the editing site is shown. Upper-case letters denote sequence requirements for editing. A to G are scanning mutants that facilitate (+) or abolish (-) editing. The edited C is found in the open loop. APOBEC-1 complementing factor (ACF) binds to the junction between the stem and the loop, as is characteristic of other RNA recognition motif (RRM) RNA-binding proteins.


APOBEC-like editing enzymes 

A number of other APOBEC-1-like genes have been identified. A muscle- and heart-specific form, a B lymphocyte-specific form, and an anthropoid-specific cluster of genes have been discovered. The function of the encoded products of these genes, apart from the B lymphocyte form, is not yet known.

Figure 3 (See Plate 33) Generation of antibody repertoires. B lymphocytes developing in fetal liver or adult bone marrow use RAG1 and RAG2 proteins to rearrange their immunoglobulin V (variable), D (diversity), and J (joining) gene segments, producing a functionally integrated VDJ segment that is linked to the μ constant region (Cμ). This yields a primary antibody repertoire composed of IgM antibodies. Subsequent encounter with antigen causes those B cells expressing cognate IgM antibodies to proliferate, forming germinal centers in secondary lymphoid organs. Here, their rearranged immunoglobulin genes undergo class (isotype) switching and hypermutation, allowing the production of high-affinity IgG antibodies (the secondary repertoire). Class switching occurs by region-specific recombination between the switch (S) regions located upstream of Cμ and Cγ. Hypermutation introduces multiple single-nucleotide substitutions into a region of ~2 kb encompassing the rearranged VDJ. Deficiency in activation-induced deaminase (AID) abolishes the switching and hypermutation of the secondary repertoire. (From Neuberger and Scott (2000).)


RNA editing and antibody production 

The B lymphocyte-specific form of APOBEC is designated 'activation-induced deaminase' (AID), because it is induced by antigen encounter in lymph node germinal centers. AID deficiency completely abolishes generation of the secondary antibody repertoire in mammalian B cells. This repertoire depends on two processes, class switching of antibody production from IgM to IgG and somatic hypermutation, which together allow the production of higher-affinity IgG antibodies (Figure 3). Class switching and somatic hypermutation are mechanistically quite distinct.

The fact that both class switching and somatic hypermutation are abolished by AID deficiency could reflect a requirement for AID-directed mRNA editing, for example of the mRNA for a nuclease necessary for both processes. A less conventional and perhaps more interesting idea is that AID-directed editing of immunoglobulin transcripts occurs while these transcripts are still attached to their genomic templates, thereby providing an important signal in the switching and hypermutation process. Thus, RNA editing-induced mismatches could trigger the double-stranded breaks necessary for switch recombination and the error-prone repair of hypermutation. In such a scenario, the question arises of how AID can be specific for its immunoglobulin RNA substrate. The answer might lie in proteins, like ACF, that associate with AID. The substantial (35%) homology between AID and APOBEC-1 suggests that they share a common evolutionary ancestor. Indeed, the two genes are linked on human chromosome 12. The dependence of somatic hypermutation on AID in mammals suggests that an AID homolog is likely to be present in those lower vertebrates that use hypermutation to generate their antigen receptor repertoire. In that case, AID is likely to predate APOBEC-1, which is not found in lower vertebrates, and to be the precursor from which APOBEC-1 evolved.


APOBEC-1 and neoplasia
A variety of studies implicate APOBEC-1 in tumorigenesis. Transgenic mice and rabbits that have a high level of expression of APOBEC-1 in the liver develop malignant liver tumors. The mechanism of tumorigenesis is unclear. Proposed mechanisms include promiscuous 'hyperediting' of many different transcripts, akin to the mutator status that causes colon carcinomas; message stabilization through binding to A-U-rich degradation signals in the 3' untranslated region, as is the case for the mRNA of the oncogene c-myc; and removal of the tumor suppressor function of proteins, as has been demonstrated to occur by editing of the neurofibromatosis type 1 (NF1) mRNA. In this disease, NF1 mRNA editing is correlated with transformation from benign neurofibromas to malignant neurofibrosarcomas.


Other forms of C to U RNA editing 

Two other forms of C to U RNA editing have been described. A major cytoplasmic transfer RNA (tRNA) for aspartic acid in rats undergoes U to C and C to U conversion at the two nucleotides adjacent to the anticodon loop. This generates the major species of this tRNA. In the mitochondria of marsupials, C to U RNA editing converts the tRNA for glycine into a tRNA for aspartate.
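Because a tRNA's identity is read through its anticodon, a single C to U change can switch the codon it decodes. The Python sketch below assumes the marsupial edit is the middle anticodon C (glycine anticodon GCC becoming aspartate anticodon GUC, as reported for marsupial mitochondria). It uses strict Watson-Crick pairing and ignores wobble and other tRNA base modifications; `c_to_u` and `codon_read_by` are hypothetical helper names chosen for illustration.

```python
# Standard genetic code; bases are ordered U, C, A, G at each codon position.
BASES = "UCAG"
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODE = {a + b + c: AMINO[16 * i + 4 * j + k]
        for i, a in enumerate(BASES)
        for j, b in enumerate(BASES)
        for k, c in enumerate(BASES)}

COMPLEMENT = str.maketrans("ACGU", "UGCA")


def c_to_u(seq: str, pos: int) -> str:
    """Return `seq` with the C at 0-based position `pos` deaminated to U."""
    if seq[pos] != "C":
        raise ValueError(f"position {pos} is {seq[pos]!r}, not C")
    return seq[:pos] + "U" + seq[pos + 1:]


def codon_read_by(anticodon: str) -> str:
    """Codon (5'->3') that pairs with an anticodon written 5'->3'.

    Strict Watson-Crick pairing only; wobble and modified bases are ignored.
    """
    return anticodon.translate(COMPLEMENT)[::-1]


unedited = "GCC"               # assumed glycine anticodon, 5'->3'
edited = c_to_u(unedited, 1)   # middle C -> U gives "GUC"
for anticodon in (unedited, edited):
    codon = codon_read_by(anticodon)
    print(anticodon, codon, CODE[codon])   # GCC GGC G, then GUC GAC D
```

The same 64-entry code table is reused in the other sketches in this section, so each example stays self-contained.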


U to C RNA Editing 

The mRNA for the transcription factor WT1, which confers susceptibility to Wilms tumor (a malignant renal tumor of childhood), undergoes U to C editing that changes a leucine codon to a proline codon. This alters the expression of genes that are targets of the Wilms tumor susceptibility protein. A role in embryological development and tumor production has been proposed.


A to I RNA Editing 


Adenosine deaminases acting on RNA 

A to I RNA editing enzymes are identified in mam- 
mals, molluscs, flies, and worms. They are designated 
ADARs for adenosine deaminase acting on RNA. 
Related enzymes are also found in yeast and bacteria. 
The animal enzymes are characterized by their ability 
to edit A to I in double-stranded RNA. They contain two or three classical double-stranded RNA-binding domains and a catalytic domain. A Z-DNA-binding domain is also found in ADAR1. The catalytic domain contains the same active site motif as the cytidine deaminases. It is not related to the active site of the adenosine deaminase that acts on monomeric substrates.


Mechanism of A to I RNA editing

ADARs edit A to I nonselectively in extended perfect 
double-stranded RNA duplexes. This activity is 
highly conserved and ancient in origin. The biological 
role for this promiscuous editing is not known. It has variously been suggested to be involved in gene regulation, the viral life cycle, or defence against viruses. The A to I editing enzymes also have a site-specific role in converting A to I in premessenger RNA. The targets for
A to I editing are mainly found in the nervous system 
of vertebrates and invertebrates. They include tran- 
scripts for ligand or voltage gated ion channels and G 
protein-coupled receptors. A to I editing exists in a 
variety of other tissues, such as the heart, where I has 
been detected in mRNA. 

The prototype of site-selective A to I premessenger RNA editing is in the glutamate receptor subunit genes. One site in the GluR-B mRNA, the glutamine (Q)/arginine (R) site, undergoes editing from a genomically encoded Q codon (CAG) to an R codon (CGG). ADAR2 is the editing enzyme for this Q/R site. The functional consequence of this editing is a marked change in the calcium permeability of the GluR channels. Site-specific A to I editing depends on the formation of double-stranded RNA between the editing site and a complementary sequence in the downstream intron (Figure 4). The most plausible mechanism of action for this enzyme is suggested by the structural relationship of the ADARs to HhaI DNA methyltransferase: ADAR would flip the A to be modified out of the duplex and into the active site of the enzyme. Other examples of A to I editing in mammals include other GluR editing sites, a variety of sites in serotonin receptor messenger RNA (Figure 4), and sites in the ADAR2 premessenger RNA itself. The Q/R site is the most highly edited of all the A to I editing sites: more than 99% of transcripts are edited. Other A to I editing sites are edited at considerably lower frequencies.
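The Q/R change works because the ribosome reads inosine as if it were guanosine, so deaminating the middle A of CAG yields a codon (CIG) that is decoded as CGG. The minimal Python sketch below makes that explicit; writing inosine as the letter 'I' and the helper names `a_to_i` and `decode` are illustrative choices, not an established convention or library API.

```python
# Standard genetic code; bases are ordered U, C, A, G at each codon position.
BASES = "UCAG"
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODE = {a + b + c: AMINO[16 * i + 4 * j + k]
        for i, a in enumerate(BASES)
        for j, b in enumerate(BASES)
        for k, c in enumerate(BASES)}


def a_to_i(codon: str, pos: int) -> str:
    """Deaminate the A at 0-based position `pos` to inosine, written 'I'."""
    if codon[pos] != "A":
        raise ValueError(f"position {pos} is {codon[pos]!r}, not A")
    return codon[:pos] + "I" + codon[pos + 1:]


def decode(codon: str) -> str:
    """Translate one codon, reading inosine as guanosine."""
    return CODE[codon.replace("I", "G")]


genomic = "CAG"                 # glutamine codon at the Q/R site
edited = a_to_i(genomic, 1)     # "CIG"
print(genomic, decode(genomic))  # CAG Q
print(edited, decode(edited))    # CIG R
```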


Function of A to I RNA editing

Although the mechanism of A to I editing has been extensively investigated, its biological role has not been established. Most plausibly, it represents another tier of posttranscriptional regulation for generating protein diversity: by changing single amino acids, editing tailors mixtures of proteins and so supplies diversity to the nervous system in carrying out its hierarchical functions. Support for this is provided by deletion of ADAR in Drosophila, which produces morphologically normal animals that exhibit temperature-sensitive paralysis, locomotor incoordination, and tremors, which increase with age. This phenotype is associated with neurodegeneration, but not with shortening of lifespan. The fine tuning of ion channel function in the brain and heart, of receptors involved in mood and appetite control, and of ion channels in the axons of squid tentacles can be seen as an important process that facilitates plasticity of control and function.
Figure 4 (See Plate 34) Transcripts encoding the
serotonin (5-hydroxytryptamine, 5-HT) receptor sub- 
type, 5-HT2c undergo RNA editing by the double- 
stranded RNA editing deaminase ADAR2. Genomically 
encoded adenosine residues in the pre-mRNA are 
converted to inosines, resulting in the following amino 
acid changes at three sites in the receptor: isoleucine (I) 
to valine (V); asparagine (N) to serine (S); and isoleucine 
to valine. The editing sites are identified in the double- 
stranded RNA that is formed between intron and exon 
sequences. (From Scott (1997).) 


ADAR2 edits the 3' splice acceptor site of its first intron. This editing appears to be under feedback control: it leads to production of an alternatively spliced transcript with an inefficient translation initiation site, which reduces production of ADAR2 protein and so prevents hyperediting that might be deleterious to cellular function or, as in the case of APOBEC-1, lead to neoplastic transformation.

ADAR1 is necessary for embryonic erythropoiesis. Mice lacking one copy of the ADAR1 gene die before embryonic day 14 from defects of the hemopoietic system, indicating that ADAR1 expression is critical for embryonic hemopoiesis in the liver.


Origins of A to I editing
The adenosine deaminases that generate inosine wobble bases in transfer RNAs in Saccharomyces cerevisiae (tRNA-specific adenosine deaminases, Tad), like the double-stranded RNA editing deaminases, contain the cytidine deaminase active site motif. These enzymes appear to be the evolutionary antecedents of the ADARs and have clear homologs in bacteria.


Other forms of RNA editing 


Human immunodeficiency virus (HIV) mRNA 
undergoes A to G and C to U changes. G to A modi- 
fication in the untranslated region of exon 1 is present 
only in the spliced HIV mRNA. The creation of stop 
codons in HIV mRNAs may control the translation of 
viral proteins, such as viral protein R, that are involved 
in the regulation of HIV expression by chronically 
infected cells. 

Other forms of RNA editing occur in mitochondria. 
The slime mold Physarum polycephalum modifies 
mitochondrial RNA by the insertion of nonencoded 
Cs and, at a lower frequency, G and U residues at 
many precise sites. In addition, C is substituted by 
U. Thus, Physarum sp. displays mixed insertional 
and substitutional editing. The mechanisms of these editing processes are unknown.

In Acanthamoeba castellanii and the chytrid fungus Spizellomyces punctatus, tRNA undergoes single-nucleotide conversions (U to A, U to G, A to G), which correct mismatched base pairs to those found in normal tRNA. This type of editing is believed to involve exchange of nucleotides rather than modification of bases.
The diversity of RNA editing (see RNA Editing in Animals) has now been shown to extend through viruses to primitive eukaryotes, fungi, and plants, and it is being found with increasing frequency within the vertebrates. The functions of RNA editing can be crudely categorized into two groups: those that restore functionality and those that create protein diversity.

RNA editing in the mitochondria of flowering plants (angiosperms) was first documented over a decade ago (Covello and Gray, 1989; Gualberto et al., 1989; Hiesel et al., 1989). A few years later, the same type of editing was reported in angiosperm chloroplasts (Hoch et al., 1991). More recent work has shown editing to be widespread within the land plants, occurring in all major groups including the Bryophyta and gymnosperms. This posttranscriptional mRNA editing consists almost exclusively of C-to-U substitutions, although infrequent reverse (U-to-C) changes have also been reported. Editing events take place predominantly at the first or second positions of codons, thereby almost always changing the amino acid from that specified by the unedited (genome-encoded) codon. In some instances, editing may also create initiation and termination codons. Although this C-to-U editing has been reported to occur in plant mitochondrial rRNA, tRNA, and intron sequences, as well as in the 5'- and 3'-untranslated regions of mitochondrial mRNAs, it predominantly affects the translated regions of protein-coding transcripts. Most of these nucleotide exchanges in coding regions lead to altered codons in the mRNA that specify amino acids more conserved in evolution than those encoded by the genomic DNA. On the basis of such sequence comparisons, it can be argued that RNA editing in plants is required for function, effectively acting as an RNA repair mechanism to correct gene-encoded mutations that would otherwise be lethal.
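The codon-level effects described above (amino acid changes at first or second positions, and creation of initiation and termination codons) can be illustrated with a short Python sketch. The 12-nucleotide "pre-mRNA" and its edit sites are hypothetical and chosen only so that one edit turns ACG into an AUG start, two change second positions (Pro to Leu, Ser to Leu), and one turns CGA into a UGA stop; `edit_c_to_u` and `translate` are illustrative helper names.

```python
# Standard genetic code; bases are ordered U, C, A, G at each codon position.
BASES = "UCAG"
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODE = {a + b + c: AMINO[16 * i + 4 * j + k]
        for i, a in enumerate(BASES)
        for j, b in enumerate(BASES)
        for k, c in enumerate(BASES)}


def edit_c_to_u(mrna: str, sites: list[int]) -> str:
    """Convert the C at each 0-based position in `sites` to U."""
    bases = list(mrna)
    for s in sites:
        if bases[s] != "C":
            raise ValueError(f"position {s} is {bases[s]!r}, not C")
        bases[s] = "U"
    return "".join(bases)


def translate(mrna: str) -> str:
    """Translate complete codons from the first base; '*' marks a stop codon."""
    usable = len(mrna) - len(mrna) % 3
    return "".join(CODE[mrna[i:i + 3]] for i in range(0, usable, 3))


pre = "ACGCCAUCACGA"   # hypothetical: ACG CCA UCA CGA
sites = [1, 4, 7, 9]   # second positions of codons 1-3, first position of codon 4
mature = edit_c_to_u(pre, sites)
print(translate(pre))             # TPSR
print(mature, translate(mature))  # AUGCUAUUAUGA MLL*
```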
Trypanosome RNA editing is the posttranscriptional insertion and deletion of U residues (Us) in primary mitochondrial transcripts to generate mature protein-coding mRNAs. This editing occurs in Trypanosoma, Leishmania, Crithidia, and related lower eukaryotes and is surely the most bizarre form of RNA processing known. There can be hundreds of individual editing events in a single pre-mRNA, generally spaced a few nucleotides apart, most involving ~1-5 U residues. Yet an uncorrected editing error at only one of these sites would generate a frameshifted, nonfunctional mRNA. U insertion is ~10-fold more frequent than U deletion. Thus, in extensively edited mRNAs, over half the coding residues can be Us that were added by editing. Some trypanosome mitochondrial mRNAs are edited extensively, some are edited only in certain regions, and still others are not edited at all. In many edited mRNAs, the U of the initiating AUG is one of the residues introduced by editing.


RNA Editing and gRNAs 


When discovered in 1986 by Rob Benne (Benne, 1986), RNA editing immediately explained why many trypanosome mitochondrial gene sequences had appeared nonfunctional, and this editing has generated much interest, largely due to its unprecedented nature. (Subsequently, other kinds of RNA processing have also been termed 'editing', but these are mechanistically and evolutionarily unrelated.) The initial glaring question in trypanosome RNA editing, what directs its specificity, was answered when Larry Simpson's laboratory (Blum et al., 1990) identified guide RNAs (gRNAs). These small, separate mitochondrial transcripts are complementary, using Watson-Crick as well as G-U interactions, to segments of edited sequence. Hence, they have mismatches with the pre-mRNA at the sites of editing, and these direct the location and extent of the modifications. Unpaired Us in the pre-mRNA are removed, and unpaired purines in the gRNA direct addition of an equal number of Us at the opposing mRNA position (Figure 1). The mRNA complementarity of the 3'-most gRNA begins several nucleotides beyond the editing domain, so the gRNA can 'anchor' to the pre-edited mRNA. It then directs editing sequentially 3' to 5' along the mRNA. Subsequently, additional overlapping gRNAs similarly direct editing further upstream in the pre-mRNA.

Figure 1 Mechanisms of U deletion (left) and U insertion (right) during trypanosome RNA editing. Both involve pairing of a guide RNA (gRNA) to the pre-mRNA at a 3' anchor site, followed by endonuclease cleavage of the pre-mRNA at the mismatch adjacent to the anchor duplex. U residues are then either removed (by 3'-U-exonuclease) or added (by TUTase, terminal U transferase), and the mRNA is rejoined by RNA ligase. Despite their similarities, the two kinds of editing utilize distinct catalytic activities at each step.
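The guiding rule (gRNA pairs with the edited sequence using Watson-Crick and G-U pairs, and pairing with the pre-mRNA breaks at editing sites) can be sketched in Python as a check of how far an antiparallel duplex extends from the anchor before hitting a mismatch. The sequences are hypothetical and short; the real anchor duplex lying 3' of this segment is not modeled, and `zip_length` is an illustrative name.

```python
# mRNA-gRNA base pairs allowed by the guiding rule: Watson-Crick plus G-U.
ALLOWED = {("A", "U"), ("U", "A"), ("G", "C"),
           ("C", "G"), ("G", "U"), ("U", "G")}


def zip_length(mrna: str, grna: str) -> int:
    """Count base pairs formed from the anchor end before the first mismatch.

    Both strands are written 5'->3'. The duplex is antiparallel, so the 3' end
    of the mRNA pairs with the 5' end of the gRNA.
    """
    paired = 0
    for m, g in zip(reversed(mrna), grna):
        if (m, g) not in ALLOWED:
            break
        paired += 1
    return paired


grna = "GAUCAU"        # hypothetical gRNA segment (its 5' G pairs with U by wobble)
edited = "AUGAUU"      # segment this gRNA fully specifies
pre_edited = "AGAUU"   # same segment still lacking the U after the first A
print(zip_length(edited, grna))      # 6: pairs along its full length
print(zip_length(pre_edited, grna))  # 4: stops opposite an unpaired gRNA A
```

The mismatch in the pre-edited case falls opposite an unpaired purine (A) in the gRNA, which is the signal for inserting one U at that position.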


The Basic Editing Mechanism 


Ken Stuart's laboratory (Seiwert and Stuart, 1994; Kable et al., 1996) then made a major advance by demonstrating that a cycle of U deletion or U insertion could be reproduced in vitro using mitochondrial extract and synthetic RNAs, allowing the editing mechanism to be experimentally investigated. Correcting a widely held belief that this RNA processing involved coupled transesterification reactions, each editing cycle was instead found to utilize three protein-catalyzed reactions of the gRNA-mRNA pair (see Figure 1; reviewed in Alfonzo et al., 1997; Sollner-Webb, 1996). First, an endonuclease cleaves the pre-mRNA at the mismatch adjacent to the anchor duplex. Second, either terminal U transferase (TUTase) adds, or 3'-U-exonuclease (3'-U-exo) removes, U residues at the end of the upstream mRNA fragment. Third, RNA ligase rejoins the mRNA. This allows the base pairing to 'zip up' to the next mismatch, where another editing cycle can begin.
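The three-step cycle (cleave, add or remove Us, ligate), repeated 3' to 5', can be sketched as string operations in Python. This is a toy model under stated assumptions: the pre-mRNA and the numbers of Us at each site are hypothetical, the gRNA is represented only by those numbers, and `EditSite`, `editing_cycle`, and `edit` are illustrative names.

```python
from dataclasses import dataclass


@dataclass
class EditSite:
    """One hypothetical editing site.

    position:     index where the endonuclease cuts; the 3' fragment starts here.
    us_to_add:    Us that TUTase adds (set by unpaired purines in the gRNA).
    us_to_remove: unpaired Us that 3'-U-exo trims from the upstream fragment.
    """
    position: int
    us_to_add: int = 0
    us_to_remove: int = 0


def editing_cycle(mrna: str, site: EditSite) -> str:
    if site.us_to_add and site.us_to_remove:
        raise ValueError("a site is either a U-insertion or a U-deletion site")
    # Step 1: endonuclease cleavage at the mismatch next to the anchor duplex.
    upstream, downstream = mrna[:site.position], mrna[site.position:]
    # Step 2: TUTase adds, or 3'-U-exo removes, Us at the upstream fragment's 3' end.
    if site.us_to_add:
        upstream += "U" * site.us_to_add
    elif site.us_to_remove:
        n = site.us_to_remove
        if upstream[-n:] != "U" * n:
            raise ValueError("3'-U-exo removes only U residues")
        upstream = upstream[:-n]
    # Step 3: RNA ligase rejoins the two fragments.
    return upstream + downstream


def edit(pre_mrna: str, sites: list[EditSite]) -> str:
    """One cycle per site, working 3' to 5' so upstream coordinates stay valid."""
    mrna = pre_mrna
    for site in sorted(sites, key=lambda s: s.position, reverse=True):
        mrna = editing_cycle(mrna, site)
    return mrna


pre = "AGAAGAUUUGC"   # hypothetical pre-mRNA segment
good = [EditSite(9, us_to_remove=2), EditSite(3, us_to_add=2), EditSite(1, us_to_add=1)]
mature = edit(pre, good)
print(mature)   # AUGAUUAGAUGC (AUG AUU AGA UGC)

# One U too few at a single site shifts the reading frame of everything 3' of it.
bad = [EditSite(9, us_to_remove=2), EditSite(3, us_to_add=1), EditSite(1, us_to_add=1)]
misedited = edit(pre, bad)
print(misedited, (len(misedited) - len(mature)) % 3)   # AUGAUAGAUGC 2
```

In this toy example the U of the AUG start codon is itself added by editing, as the text notes happens in many edited trypanosome mRNAs.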


Unexpected Differences Between the 
Two Kinds of Editing 


A seven-polypeptide complex that catalyzes both U deletion and U insertion and contains all the predicted component enzymatic activities was then purified in the Sollner-Webb laboratory (Rusché et al., 1997; Sollner-Webb, 1996). The relative simplicity of this complex encourages one to think that the editing mechanism can be fully explained, and excellent studies of the corresponding cloned genes are showing their importance. Surprisingly, although the basic outlines of U deletion and U insertion appear very similar, these two kinds of editing utilize distinct catalytic activities (Cruz-Reyes et al., 1998). The endonucleases active in U deletion and U insertion have markedly different properties; 3'-U-exo is not a reverse action of the TUTase activity; and the editing complex contains two different RNA ligases, evidently one for each kind of editing.

Several other unanticipated aspects of trypanosome 
RNA editing have been found recently by the authors. 
First, the gRNA features that direct U deletion and U 
insertion are strikingly different, even though the 
same gRNA molecules guide both kinds of editing 
and the various gRNAs share many common features. 
Second, only minimal gRNA features direct U deletion (single-stranded character beyond the anchor duplex plus some sequence to tether the upstream mRNA), and artificial gRNAs with these simple features direct ~30-fold enhanced levels of this editing, indicating that the natural A6 (3') gRNA is remarkably restricted for U deletion. Third, completeness of U deletion is assured by the second editing step (the 3'-U-exo is specific for removing all contiguous upstream Us, and the ligase of U deletion is relatively nonspecific), while the completeness of U insertion arises at the third editing step (the TUTase evidently adds Us nonselectively, and the ligase of U insertion evidently uses a gRNA bridge to position fully edited sites). The
major differences between U deletion and U insertion 
raise the possibility that rather than sharing a common 
evolutionary genesis, U deletion may have originated 
separately from U insertion and their enzymes only 
later joined in a common protein complex. 


Future Directions 


Remaining important issues about RNA editing that will probably be addressed in the next few years include the specific roles of the various polypeptides in the editing process, how the events are coupled between successive editing sites and between successive gRNAs, how RNA editing is controlled, and whether it serves a currently advantageous function.
RNA interference, usually abbreviated as RNAi, is a
gene-silencing effect first discovered in the course of 
transgenic experiments on the nematode Caenorhab- 
ditis elegans, and subsequently found to be widely 
distributed in eukaryotes. During experiments direc- 
ted at specifically inhibiting target genes in C. elegans 
by the injection of antisense RNA, it was observed 
that control injections of sense RNA were just as 
effective as antisense RNA, causing the reduction or 
elimination of expression from the gene under inves- 
tigation. Subsequently it was discovered that the effect 
could be most potently elicited by injecting double- 
stranded RNA (dsRNA) corresponding to the target 
gene, and that the earlier results could be explained 
by contamination of the single-stranded RNA (either 
sense or antisense) by traces of dsRNA. It appears that 
the presence of dsRNA corresponding to part or all of 
the mature mRNA from any given gene leads to sub- 
stantial or complete inhibition of expression from that 
gene, by a posttranscriptional mechanism. This 
mechanism is not fully understood, but it probably involves the selective degradation of the mRNA in question by a targeted nuclease.

RNAi appears to be related to several other gene- 
silencing effects observed in other organisms, such as 
cosuppression (in plants and some animals) and quel- 
ling (in certain fungi). These effects were observed 
when additional copies of a gene were introduced 
into the genome, which sometimes has the paradoxical 
effect of silencing both the endogenous gene and the 
added transgenes. It is possible that the main natural 
function of these processes is to act as a defence against 
viruses or transposons. Many viruses have dsRNA 
genomes, or generate viral dsRNA during their life- 
cycles, so a mechanism to detect such molecules and 
repress the corresponding genes would be select- 
ively advantageous. The phenomena are not entirely 
identical: for example, RNAi and germline cosuppres- 
sion both occur in C. elegans, and they have been 
shown to have partly overlapping but not identical 
genetic requirements. Also, cosuppression in plants 
can be induced as a stably heritable inhibition, but 
RNAi does not exhibit stable heritability in C. elegans. 

Operationally, RNAi provides an extremely con- 
venient tool for specifically inhibiting gene function 
in C. elegans, and also in various other eukaryotes 
(Drosophila, trypanosomes, some vertebrates), though 
the utility of the technique may be somewhat less 
in these other systems. dsRNA can be applied to 
C. elegans by a variety of methods. Probably the most 
potent is direct injection of dsRNA into the syncytial 
gonad of an adult hermaphrodite, which results in 
target gene silencing in most or all of the eggs and 
progeny produced from the injected gonad. Surpris- 
ingly, the silencing can spread from tissue to tissue, so 
that injection of dsRNA into the intestine of a worm 
will also elicit RNAi. Soaking worms in solutions of 
dsRNA has the same effect. Worms can even be fed on 
E. coli that are expressing dsRNA, and this too elicits 
RNAi. Finally, worms can be made transgenic for 
constructs that express a self-complementary ‘hairpin’ 
RNA under the control of an inducible heat-shock 
promoter. When these worms are heat-shocked, the 
transgene is transcribed, producing an RNA with an 
extended dsRNA hairpin, and this activates RNAi. 
The advantage of this last technique is that it can be used on genes that are needed for both embryonic and postembryonic events. Direct RNAi treatment of such genes normally leads to embryonic lethality, yielding no information about their postembryonic functions, but the heat-shock hairpin method can be used to examine these functions by delaying the induction until after embryogenesis is complete.
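The self-complementary 'hairpin' transcript mentioned above is, in outline, a sense fragment of the target gene, a short loop, and the antisense copy of the same fragment, so that the two arms fold back into an extended dsRNA stem. The Python sketch below builds such a sequence and checks that its arms are perfectly complementary. The fragment and loop sequences are hypothetical and far shorter than real constructs, and the helper names are illustrative.

```python
COMPLEMENT = str.maketrans("ACGU", "UGCA")


def reverse_complement(rna: str) -> str:
    """Antisense copy of an RNA sequence, written 5'->3'."""
    return rna.translate(COMPLEMENT)[::-1]


def hairpin_transcript(target_fragment: str, loop: str) -> str:
    """Sense fragment, then a loop, then the antisense copy of the fragment."""
    return target_fragment + loop + reverse_complement(target_fragment)


def forms_perfect_stem(transcript: str, stem_length: int) -> bool:
    """True if the first and last `stem_length` bases are fully complementary."""
    return transcript[:stem_length] == reverse_complement(transcript[-stem_length:])


fragment = "AUGGCUAGCUUC"   # hypothetical target-gene fragment
loop = "GAAA"               # hypothetical loop sequence
rna = hairpin_transcript(fragment, loop)
print(rna)
print(forms_perfect_stem(rna, len(fragment)))   # True
```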

The technique has some disadvantages, such as the 
fact that some genes appear to be refractory to RNAi, 
or produce RNAi phenocopies that are much less 
extreme in phenotype than null mutants. The silencing 
machinery also appears to be easily saturated, so that 
simultaneous treatment with three or more different 
dsRNAs at once leads to less efficient silencing than 
treatment with just one or two dsRNA species. These 
disadvantages are more than outweighed by the speed 
and convenience of the method. 


See also: Antisense RNA; Gene Silencing 


RNA Phages 


L Mindich 


Copyright © 2001 Academic Press 
doi: 10.1006/rwgn.2001.1134 


Only two families of RNA bacteriophages have been 
described. The first to be discovered were the Levivir- 
idae. These are viruses with genomes of a single RNA 
molecule about 4000 bases long. The genomic RNA can 
serve as message in protein synthesis as well as acting as 
an intermediate in replication. It is called plus strand 
RNA. The RNA is packaged in a capsid containing 180 
molecules of a single protein called coat protein and one
copy of maturation protein. The second family is the 
Cystoviridae, whose members have a more complex 
structure. They have genomes composed of three dif- 
ferent molecules of double-stranded RNA (dsRNA). 
The three dsRNA molecules are packaged in an inner 
core particle composed of 120 molecules of a structural 
protein, but also containing polymerase molecules, and 
copies of a motor-like protein called an NTPase. The 
inner core is surrounded by a series of complex layers 
including lipids. So far, all the hosts of RNA bacterio- 
phages have been found to be Gram-negative bacteria. 
This may be due to the difficulty in infecting Gram- 
positives as these phages use different penetration 
strategies from the tailed DNA bacteriophages. 

The Leviviridae include, among others, phages fr, f2, MS2, Qβ, GA, JP34, PRR1, and PP7. Most of the known members of the family replicate in male strains of Escherichia coli, but some, like PRR1, can replicate in virtually any Gram-negative bacterium that carries a P-type plasmid with concurrent pilus production. In all cases, the virus attaches to a pilus on the host cell. The pilus retracts and the virus is then in close proximity to the cell surface. The RNA is released from the virion in a poorly understood process and enters the cell along with the maturation protein. In the cell, the RNA acts as a messenger, in particular coding for an RNA polymerase molecule. This protein combines with two host proteins to form the active polymerase, which then starts making copies (negative strands) of the infecting RNA. The negative strands then serve as templates for the synthesis of more viral plus strands. The plus and minus strand RNAs do not anneal with each other because they each have extensive folding patterns resulting in very little unpaired sequence. The Leviviridae have very high mutation rates because RNA polymerases lack the proofreading (error-correcting) mechanisms found in most cases of DNA synthesis. RNA viruses generally have mutation rates thousands of times greater than DNA viruses or the DNA genomes of cellular organisms.

At the same time, the same message codes for the 
synthesis of capsid protein and lysin and/or matur- 
ation protein. Lysin synthesis involves a frameshift in 
the capsid protein gene. The capsid protein can bind to 
the plus strand RNA for two separate functions. It 
binds to a stem-loop structure near the beginning of 
the polymerase gene to turn off its synthesis and to 
capture the RNA for packaging into a capsid. 

Bacteriophage Qβ was the first RNA virus to be produced from a cDNA copy in a plasmid. The genome was copied as cDNA and inserted into plasmid pCR1. When E. coli was transformed with this plasmid, phage was produced. It seems that the RNA transcript of the plasmid was trimmed in the host cell so as to result in a
replicate just as the genome replicates in an infected 
cell. Using this approach, it is possible to construct 
many interesting mutant forms of these viruses. This 
process is called reverse genetics. 

Although some of the plus strand viruses of eukar- 
yotes manifest homologous recombination, the Levi- 
viridae do not. However, it has been possible to effect 
homologous recombination between the viral gen- 
omes and plasmid transcripts in infected cells. It has 
even been found that RNA molecules can recombine 
in vitro, probably through the mediation of the viral 
polymerase, but also by other mechanisms. 

The Cystoviridae comprise phages φ6, φ7, φ8, φ9, φ10, φ11, φ12, φ13, φ14, and φ15. They all replicate in pseudomonads, but one, φ8, will form plaques on Salmonella typhimurium. Some of the cystoviruses attach to pili on the host cells. When the pili retract, the phages are brought into contact with the outer membrane of the cell. The lipid-containing envelope of the virus then fuses with the outer membrane of the cell, which results in the nucleocapsid of the virus being placed in the periplasm (the space between the outer and inner membranes, which contains the cell wall). Some of the Cystoviridae attach directly to the outer membrane and then fuse. The cystoviruses carry a wall-destroying enzyme (muramidase) on the surface of the inner core, and this makes a hole allowing the particle to contact the inner membrane of the host. The particle then penetrates into the cell, possibly with some of the host membrane temporarily around it. Ultimately the membrane is lost and the inner core begins transcription (plus strand synthesis). These viruses have a special problem in that the genomic RNA cannot be translated or transcribed by host enzymes. The virus particle therefore carries its own polymerase, and consequently the whole inner core must enter the cell. In addition, the host cells have enzymes that will destroy double-stranded RNA, so the genome must always remain inside the particle to be protected.

Once transcription starts, the RNA serves as message for protein synthesis, especially for the synthesis of the inner core proteins, which assemble to form a dodecahedral particle. This particle is able to recognize, bind, and package the plus strands, which are then copied inside the particle to produce dsRNA. Once dsRNA is made, the particles start to produce more plus strand RNAs, which serve as replicative intermediates and as mRNA. Just as in the case of the Leviviridae, the mutation rate is very high for the Cystoviridae. Mistakes that occur during replication are not corrected. The mutation rates for both the Leviviridae and the Cystoviridae are high enough that no two transcripts are likely to be completely identical. This condition has been called a quasispecies and applies to just about all RNA viruses.
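Why a high per-nucleotide error rate makes identical copies rare follows from simple arithmetic: if errors occur independently at rate μ per nucleotide, a genome of L nucleotides acquires μL new errors per copy on average, and only a fraction (1 − μ)^L of copies is error-free. The Python sketch below evaluates this for a genome of about 4000 bases (the levivirus size given earlier). The two rates are illustrative assumptions, not measured values for these phages; the second is simply a thousand times lower.

```python
def expected_mutations(mu: float, length: int) -> float:
    """Mean number of new substitutions per copy (errors assumed independent)."""
    return mu * length


def error_free_fraction(mu: float, length: int) -> float:
    """Probability that one copy of `length` nucleotides carries no new error."""
    return (1.0 - mu) ** length


GENOME = 4000               # approximate levivirus genome length from the text
for mu in (1e-3, 1e-6):     # illustrative per-nucleotide error rates (assumed)
    print(f"mu={mu:g}: {expected_mutations(mu, GENOME):.3f} errors/copy, "
          f"{error_free_fraction(mu, GENOME):.3f} of copies error-free")
```

With the higher assumed rate, fewer than 2% of copies are error-free, so two randomly chosen transcripts will almost never be identical; with the lower rate, nearly all copies are faithful.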

Mistakes are corrected by selection (deleterious
mutations result in poor growth) and by back
mutation. In addition, the Cystoviridae can reassort
their genomic segments when infecting cells that are
also infected by other strains of the same or closely
related viruses. The plus strands from more than one
virus can mix in a pool and enter new procapsids;
which parent supplies each segment is random, but
packaging follows the normal order of S, then M, then
L. Each different plus strand RNA has a unique
packaging sequence near the 5' end of the molecule,
and this determines when and whether it will be
packaged. Some of the Cystoviridae are very stringent
in their packaging and will only package RNA with
homologous sequences, while others are less demanding
and can package RNA from distant relatives. In this
way viruses can effect drastic changes in their
characteristics, such as how they attach to host cells.
This is particularly easy in the case of the
Cystoviridae because all the genes for the host
attachment proteins are on segment M and can be
transferred together.
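
Reassortment can be illustrated by enumerating the possible progeny genotypes. The sketch below assumes two hypothetical co-infecting strains, labelled A and B, each carrying segments S, M, and L, and assumes that either parent's copy of a segment can be packaged. It lists which combinations are possible, not how often each arises, and it ignores differences in packaging stringency.

```python
from itertools import product

SEGMENTS = ("S", "M", "L")  # packaged in this order

def reassortant_genotypes(parents):
    """Every genotype in which each segment comes from one of the parents."""
    return [dict(zip(SEGMENTS, combo))
            for combo in product(parents, repeat=len(SEGMENTS))]

# Two hypothetical co-infecting strains, labelled "A" and "B".
genotypes = reassortant_genotypes(["A", "B"])
print(len(genotypes))  # 2 parents ** 3 segments = 8 genotypes

# Reassortants that keep strain A's S and L segments but carry strain B's
# M segment, i.e. strain B's host attachment genes in a strain A background.
swapped = [g for g in genotypes
           if g["S"] == "A" and g["M"] == "B" and g["L"] == "A"]
print(swapped)  # [{'S': 'A', 'M': 'B', 'L': 'A'}]
```

Because all the attachment genes travel together on segment M, a single reassortment event of this kind is enough to swap the host attachment phenotype.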


The Cystoviridae can be manipulated by reverse
genetics. If a host cell is transformed simultaneously
with three plasmids, each one containing a cDNA
copy of a different genomic segment, the transcripts
can direct the assembly of infectious virus. It has also
been possible to prepare plasmids containing all the
genes of φ6 or φ13 in one transcript and produce a
virus with a single genomic segment containing all the
genes of the virus.

It is also possible to grow mutant virus on strains
carrying plasmids with engineered cDNA copies.
Under these conditions the virus can acquire the
plasmid transcript as a replacement for its homologous
genomic segment. In this manner it is possible to
introduce mutant genes, or new reporter genes such
as gfp, lacα, or kan. It is also possible to produce
virus with only two genomic segments, one being
the normal L segment, the other being a chimera of
segments S and M.

A third way of manipulating the genomes of the
viruses is by in vitro packaging. If empty inner core
particles are prepared, they will package plus strand
RNA in vitro. If the RNA is produced by in vitro
transcription of a cDNA plasmid, then the composition
of the segments can be altered. The in vitro packaging
process proceeds to the point of minus strand
synthesis. These inner core particles can then be used
to infect spheroplasts of the host bacterium to produce
live phage. In some of the viruses, the inner core must
first be covered with another protein, but in others,
the inner core itself is infectious.

The Cystoviridae exhibit heterologous recombination,
that is, recombination between different genomic
segments even though there is no great sequence
identity involved. The crossover regions usually show
an average of only about three identical bases on the
donor and recipient segments. The recombination is of
a type called template switching, in which synthesis
starts on one template and then jumps to another. The
3' ends of the plus strands have secondary structure
that protects them from nucleases in the cell.
Sometimes the 3' ends are truncated; these molecules
can be packaged but do not serve as templates for
minus strand synthesis. Template switching and
heterologous recombination can rescue these
molecules. Although heterologous recombination has
been seen primarily in laboratory studies, its
consequences have also been seen in isolates from
nature.
RNA polymerase is the name given to a class of
enzymes which in vivo synthesize RNA molecules
using double-stranded DNA as a template. Such
enzymes are more properly known as DNA-dependent
RNA polymerases. The copying of the information
contained in a DNA sequence into an RNA sequence
is termed ‘transcription,’ a central step in biological
information flow. RNA polymerase is the key enzyme
involved in transcription. (Some RNA viruses encode
enzymes which synthesize RNA from an RNA
template. Typically such an enzyme is called an ‘RNA
replicase,’ but occasionally the term ‘RNA-dependent
RNA polymerase’ is used. These enzymes are distinct
from the RNA polymerases discussed here.)

All RNA polymerases synthesize an RNA chain 
from the 5’ end to the 3’ end; the template strand of 
the DNA is consequently read in the antiparallel 3’ to 
5’ direction, since templating requires base-pairing. 
The substrates are ATP, GTP, CTP, and UTP (and 
magnesium ion is required). The RNA molecules are 
synthesized from specific starting sites on the DNA 
(called promoters), and RNA polymerase can initiate 
new chains without the requirement for a primer, 
unlike the case with DNA polymerase. However, 
DNA polymerase and RNA polymerase have essen- 
tially identical mechanisms of phosphodiester bond 
formation during chain elongation. 
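
The antiparallel reading described above can be sketched in a few lines of Python. This is a minimal illustration that considers base pairing only, using a hypothetical six-base template; promoters, initiation, and termination are ignored, and the function names are invented for the example.

```python
# Base pairing used when copying a DNA template into RNA (U replaces T).
DNA_TO_RNA_PAIR = {"A": "U", "T": "A", "G": "C", "C": "G"}

def transcribe(template_3_to_5: str) -> str:
    """Return the RNA, written 5'->3', made from a template written 3'->5'.

    The polymerase reads the template 3'->5' and adds each new nucleotide
    to the growing 3' end of the RNA, so the RNA grows 5'->3',
    antiparallel to the template.
    """
    rna = []
    for base in template_3_to_5.upper():   # walk the template 3'->5'
        rna.append(DNA_TO_RNA_PAIR[base])  # extend the RNA at its 3' end
    return "".join(rna)

def transcribe_from_5_to_3(template_5_to_3: str) -> str:
    """Same result when the template is written in the usual 5'->3' order."""
    return transcribe(template_5_to_3[::-1])

template = "TACGGT"  # hypothetical template strand, written 3'->5'
print(transcribe(template))  # AUGCCA
```

The RNA has the same sequence as the coding (nontemplate) strand, read 5'→3', with U in place of T.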

Cellular RNA polymerases are multisubunit
enzymes. Bacteria and Archaea each have a single RNA
polymerase, while the eukaryotic nucleus contains
three such enzymes: RNA polymerase I (RNAP I),
RNA polymerase II (RNAP II), and RNA polymerase
III (RNAP III). While there are profound differences
between these multisubunit RNA polymerases,
there are also significant similarities. Indeed, it is clear
that these enzymes are all related and form a family.
All members of this family have three different
subunits in common, which are evolutionarily conserved
to a greater or lesser extent. Bacterial RNA polymerase
and the closely related chloroplast RNA polymerase
contain only these conserved subunits. The Archaea, like
the Bacteria, have only a single RNA polymerase, but it
is more complex than the bacterial enzyme and is
more closely related to eukaryotic RNA polymerase
II. However, even in the complex eukaryotic RNA
polymerases, conserved sequences make up over 50%
of the enzyme mass, and therefore the simpler
bacterial enzymes have provided an important model for
RNA polymerase structure and function. This has
been confirmed by structural analysis of the purified
enzymes.

However, not all RNA polymerases are
multisubunit enzymes. The enzymes found in mitochondria
(but encoded in the nucleus) and those encoded by
some bacteriophages, such as T7, for transcription of
their own genes are single-subunit enzymes. These
single-subunit enzymes are not closely related to the
complex cellular RNA polymerases but are more
closely related to certain DNA polymerases.


Bacterial RNA Polymerases 


Bacteria have a single cellular RNA polymerase
(RNAP), whose ‘holoenzyme’ form has five subunits:
two copies of the relatively small α-subunit
(each about 36 kDa), one copy each of the large β- and
β′-subunits (151 kDa and 155 kDa, respectively), and
one copy of the σ-subunit, also called the ‘sigma
factor.’ The ‘core’ enzyme, of about 400 kDa, contains
all the subunits except σ and can carry out the
elongation reaction of polymerization using a DNA
template and the four substrates ATP, CTP, GTP, and
UTP. The evolutionarily conserved subunits are those
that make up the core. However, site-specific
initiation requires the σ subunit, which allows RNAP to
recognize the promoter. Most bacteria encode several
alternative σ factors (Escherichia coli encodes seven,
Bacillus subtilis encodes 17), which may vary widely in
size and which allow the RNAP to recognize several
different types (sequences) of promoters. If there are
several different σ factors in a cell, there must be
several different holoenzymes and, therefore, one
could say there are several different RNAPs in a
given bacterium. However, this would be misleading,
because the σ factor (of whatever kind) is only bound
to the enzyme during initiation. Also, in a given
bacterium, the majority of genes typically require only a
single species of sigma factor and, therefore, one form
of the holoenzyme predominates. In E. coli the
primary σ factor, and the first discovered, has a mass of
70 kDa and is often referred to as σ70.

Initiation of transcription by RNAP at the promoter
is a complex process involving many different
steps. First, of course, the core enzyme must bind the
appropriate σ factor. The holoenzyme then binds to
promoter DNA upstream of the transcriptional start
site. RNAP then interacts with the DNA, leading to
melting of about 14 bp of the promoter DNA, including
the transcriptional start site. There is also a
conformational change of the RNAP during this process.
RNAP can then begin RNA synthesis, but chain
elongation often aborts, yielding short chains of fewer
than 10 nucleotides. However, RNAP remains at the
promoter and can undergo further rounds of abortive
synthesis or true elongation. If the chain reaches
about 10 nucleotides in length, σ factor is released
and the core RNAP begins moving along the DNA
template, synthesizing the RNA chain. The antibiotic
rifampicin specifically inhibits initiation by bacterial
RNAP, at the first or second phosphodiester bond.
The antibiotic binds to the β-subunit, and resistant
mutants have mutations in the gene encoding this
subunit. After initiation the σ-subunit is released
from RNAP and the elongation phase begins.

Elongation by bacterial RNAP is inhibited by the
antibiotic streptolydigin, which also binds to the
β-subunit. During initiation the RNAP may span
70-90 bp of DNA (some of which is wrapped around
the enzyme), but this is reduced to about 35 bp during
elongation. The newly synthesized RNA forms base
pairs with the DNA template for approximately 8 or 9
nucleotides. The newly synthesized chain exits the
RNAP through a channel. The rate of elongation of
an RNA chain in vivo may be about 50 nucleotides per
second, but this rate is the mean of rapid elongation
over some sequences and pauses at others. The
elongating complex is quite stable (RNA molecules of
over 10,000 nucleotides may be synthesized), but the
RNAP also terminates at specific DNA sequences,
termed ‘transcription terminators.’ Some such
sequences can be recognized by the RNAP itself,
but others require specific accessory proteins, called
‘termination factors.’
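
As a rough worked example of the figures quoted above, the time to make a long transcript can be estimated by dividing its length by the mean elongation rate. The sketch assumes a constant mean rate of 50 nucleotides per second and ignores initiation, variation in pausing, and termination; the function name is hypothetical.

```python
def transcription_time_s(length_nt: int, rate_nt_per_s: float = 50.0) -> float:
    """Approximate time in seconds for one RNAP to copy a transcript.

    Assumes a constant mean elongation rate; in vivo, rapid stretches
    alternate with pauses, so this is only an average estimate.
    """
    if rate_nt_per_s <= 0:
        raise ValueError("rate must be positive")
    return length_nt / rate_nt_per_s

# A 10,000-nucleotide transcript at the ~50 nt/s mean rate.
t = transcription_time_s(10_000)
print(f"{t:.0f} s (about {t / 60:.1f} min)")  # 200 s (about 3.3 min)
```

At this rate a 10,000-nucleotide transcript takes a little over three minutes, which illustrates why the elongating complex must be stable.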


Eukaryotic RNA Polymerases 


RNAP I, RNAP II, and RNAP III of the eukaryotic
nucleus are quite different from each other
structurally, and each transcribes a different set of genes
(other polymerases are located in the mitochondria
and chloroplasts). However, all three have two large
subunits that are related to each other and also to the
two largest subunits of the bacterial RNAP. In
addition, several of the smaller subunits are found in
common among all three of these enzymes, or only
between RNAP I and RNAP III.

As with the bacterial RNAPs, there are special
accessory factors necessary for transcription initiation.
However, unlike the case in Bacteria, the eukaryotic
initiation factors (and those of the Archaea) recognize
the promoter elements independently, not as part of
a polymerase holoenzyme. Many different initiation
factors are involved, particularly in genes transcribed
by RNAP II, and some of the initiation factors are
themselves very complex proteins. Purified eukaryotic
RNA polymerases, then, cannot selectively initiate
transcription at promoters. The term ‘holoenzyme’ is
sometimes used to refer to a eukaryotic RNAP, but in
this case it refers to something more like the bacterial
‘core’ enzyme, which would not be able to initiate from
promoters. However, unlike the bacterial core
enzyme, the eukaryotic holoenzyme may contain a
large number of other proteins involved in
transcription or the processing of RNA.

RNAP I is found in the nucleolus and transcribes
only genes encoding the large ribosomal RNAs, which
account for the majority of cellular RNA synthesized.
In yeast the enzyme has 13 subunits (and a mass of
almost 600 kDa). Five of the smaller subunits are also
found in yeast RNAP II and RNAP III, and two others
in yeast RNAP III.

RNAP II transcribes genes which encode proteins,
the majority of genes in a cell. It also transcribes genes
encoding most of the small nuclear RNAs (snRNAs).
Most organisms seem to have a 12-subunit RNAP II
(with a mass of about 550 kDa). However, several
other proteins are required for complete activity, and
the RNAP holoenzyme may have a mass of 4000 kDa.
RNAP II is inhibited by the fungal toxin α-amanitin,
and thus eukaryotic mRNA synthesis is sensitive to
this inhibitor.

RNAP III primarily transcribes genes encoding 
transfer RNA and 5S RNA but also transcribes some 
genes encoding other small RNAs. RNAP III has 14 or 
more distinct subunits with a mass of almost 700 kDa. 
Although the promoters for RNAP I and RNAP II lie
for the most part upstream of the transcription start 
site (as is the case for prokaryotic promoters), some 
promoters for RNAP III lie downstream of the start 
site. 

The overall elongation complexes formed by these 
enzymes seem similar to those of the bacterial RNAPs. 
Although the mechanisms by which these enzymes 
locate promoters are quite different from that used 
by bacteria, the overall mechanism of transcriptional 
initiation, including abortive cycles, is very similar. 
Less is known about termination in eukaryotes, 
however. 


See also: Promoters; Sigma Factors; 
Transcription 
Biological Role of mRNA Degradation


Messenger RNA stability is an important control 
point in modulating gene expression for several rea- 
sons. First, the steady-state level of a given mRNA is 
determined by a balance between its rates of synthesis 
and degradation. Second, the stability of individual 
mRNAs can be altered in response to numerous envir- 
onmental stimuli including carbon source, viral in- 
fection, and developmental transitions, allowing for 
rapid alterations in gene expression. Third, a special- 
ized system of mRNA degradation functions to 
eliminate potentially deleterious errors in mRNA 
synthesis (see below). Finally, efficient mRNA degrad- 
ation is required for cell growth in both prokaryotes 
and eukaryotes, emphasizing the importance of this 
process. 
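
The balance between synthesis and degradation can be made explicit with a simple kinetic model. The sketch below assumes constant synthesis and first-order decay, dM/dt = k_s - k_d·M with k_d = ln 2 / t½, for which the steady-state level is M* = k_s / k_d. The rates and function names are hypothetical and chosen only for illustration.

```python
import math

def steady_state_level(synthesis_rate: float, half_life_min: float) -> float:
    """Steady-state mRNA copies per cell under the assumed model.

    Model: dM/dt = k_s - k_d * M, with k_d = ln 2 / half-life.
    Setting dM/dt = 0 gives M* = k_s / k_d.
    """
    k_d = math.log(2) / half_life_min  # first-order decay constant, per minute
    return synthesis_rate / k_d

def simulate(synthesis_rate: float, half_life_min: float,
             minutes: float, dt: float = 0.01) -> float:
    """Forward-Euler integration of the same model, starting from zero copies."""
    k_d = math.log(2) / half_life_min
    m = 0.0
    for _ in range(int(minutes / dt)):
        m += (synthesis_rate - k_d * m) * dt
    return m

# Hypothetical values: 2 transcripts made per minute, 10-minute half-life.
print(round(steady_state_level(2.0, 10.0), 1))   # 28.9
print(round(simulate(2.0, 10.0, minutes=200), 1))  # approaches the same level
```

Because M* is proportional to the half-life, doubling an mRNA's half-life doubles its steady-state level at an unchanged synthesis rate, which is how changes in stability alone can shift gene expression.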


A Diversity of mRNA Turnover Pathways 
in Eukaryotic Cells 


There exist four known pathways of mRNA decay
in eukaryotic cells (Figure 1). One major pathway is
the deadenylation-dependent decapping pathway,
wherein mRNAs are deadenylated, decapped, and
subsequently degraded in a 5'→3' direction. A second
general pathway occurs by 3'→5' exonucleolytic
degradation of the body of the transcript following
removal of the 3' poly(A) tail. Both of these
pathways are thought to be general pathways of mRNA
decay that can act on most, if not all, eukaryotic
mRNAs.

In addition to the two general pathways described 
above, there are two mRNA decay pathways that are 
more specialized. The first involves rapid mRNA 
decapping prior to deadenylation, and is generally 
part of a process termed mRNA surveillance, which 
degrades aberrant mRNAs (see below). The second 
specialized pathway is one initiated by cleavage with- 
in the body of the mRNA catalyzed by sequence- 
specific endoribonucleases. 

Given these numerous degradation pathways, a
significant point is that an individual mRNA can be
subject to more than one pathway simultaneously.
This has several consequences. First, the observed
decay rate will be the sum of the decay rates
through each pathway, so the observed half-life is
shorter than the half-life any single pathway would
produce on its own. Second, the susceptibility of a
single mRNA to degradation via multiple pathways
raises the possibility that the pathways through which
an individual mRNA is degraded may change under
different conditions.
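
When an mRNA is subject to several pathways at once, the first-order rate constants of the pathways add. The sketch below assumes independent first-order pathways and uses hypothetical half-lives. It computes the observed half-life and the share of molecules expected to decay through each pathway.

```python
import math

def combined_half_life(half_lives):
    """Observed half-life when independent first-order pathways act together.

    Each pathway i has decay constant k_i = ln 2 / t_i; the constants add,
    so the observed half-life is ln 2 / sum(k_i).
    """
    total_k = sum(math.log(2) / t for t in half_lives)
    return math.log(2) / total_k

def pathway_fractions(half_lives):
    """Fraction of molecules expected to decay by each pathway (k_i / sum k)."""
    ks = [math.log(2) / t for t in half_lives]
    total = sum(ks)
    return [k / total for k in ks]

# Hypothetical: one pathway alone would give a 30-minute half-life,
# another alone would give a 60-minute half-life.
print(round(combined_half_life([30.0, 60.0]), 1))              # 20.0
print([round(f, 2) for f in pathway_fractions([30.0, 60.0])])  # [0.67, 0.33]
```

Here the observed half-life is shorter than either pathway would give alone, and about two-thirds of the molecules are degraded by the faster pathway; changing either rate constant would shift that split.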


The Deadenylation-Dependent 
Decapping Pathway 


Based primarily on work in the yeast Saccharomyces
cerevisiae, the deadenylation-dependent decapping
pathway (Figure 1) appears to be the major pathway
of mRNA decay. Degradation through this pathway is
initiated by poly(A) shortening of the full-length,
polyadenylated mRNA to an oligo(A) species that
is no longer capable of binding to Pab1p, the major
poly(A)-binding protein. Loss of Pab1p from the
poly(A) tail is thought to induce a transition in the
mRNP that makes the mRNA a substrate for a
decapping reaction, which cleaves the 5' cap structure,
releasing m7GDP. This decapping reaction is followed
by 5'→3' degradation of the mRNA. Two enzymes
involved in the important steps of decapping and
exonucleolytic degradation have been identified in yeast.
The DCP1 gene encodes the decapping enzyme that
hydrolyzes the 5' cap structure in vivo, and the product
of the XRN1 gene catalyzes the 5'→3' exonucleolytic
degradation of the decapped mRNA. The nuclease(s)
responsible for poly(A) shortening have not yet been
identified.

One of the key features of the deadenylation- 
dependent decapping pathway is that it degrades 
both stable and unstable mRNA. Because of this, 
differences in mRNA decay rates between stable 
and unstable mRNAs can be specified by sequences 
which modulate either the deadenylation rate or the 
decapping rate, or both. 

Since poly(A) tails and cap structures are common
features of eukaryotic mRNA, a reasonable proposal
is that deadenylation-dependent decapping followed
by 5'→3' degradation is a major mRNA decay
pathway in many eukaryotes. This hypothesis is
supported, but not yet proven, by several observations.
First, deadenylation precedes the decay of many
mammalian mRNAs, including the c-fos, c-myc, and
GM-CSF mRNAs. Second, intermediates in mRNA
decay in metazoans have been identified that are either
trimmed at the 5' end or lack the 5' cap structure.
Additionally, uncapped mRNAs are less stable than
capped mRNAs in mammalian cells. Finally, the
enzymes that perform the key steps in mRNA
degradation in yeast, including Dcp1p and Xrn1p, have
homologs in more complex eukaryotes. Based on
all of these facts, it is highly likely that
deadenylation-dependent decapping is a major mechanism
of mRNA degradation throughout the eukaryotic
kingdom.
Figure 1 The four currently known mRNA degradation pathways in eukaryotic cells. Shown in the middle are the
two general mRNA degradation pathways initiated by deadenylation. The oligoadenylated mRNA can then be either
decapped and degraded in a 5'→3' direction or degraded exonucleolytically in a 3'→5' direction. Also shown are the
two more specialized mRNA degradation pathways, deadenylation-independent decapping and sequence-specific
endonucleolytic cleavage.


3'→5' Exonucleolytic Degradation of
Eukaryotic mRNAs


A second general pathway of mRNA degradation
involves 3'→5' exonucleolytic degradation of the
transcript. This pathway has been most thoroughly
documented in yeast when the competing and more rapid
5'→3' degradation pathway is blocked either in cis or
in trans. The 3'→5' degradation pathway may have a
more prominent role in systems other than yeast. For
example, decay intermediates consistent with 3'→5'
exonucleolysis have been observed for the
degradation of the oat phytochrome A mRNA in vivo.
Degradation of the body of the transcript by 3'→5'
exonucleolysis is likely to occur following
deadenylation, though this has not yet been proven.

In yeast, the exonuclease(s) responsible for the
3'→5' degradation appear to be the exosome complex.
The exosome is a large and evolutionarily conserved
complex of several 3'→5' exoribonucleases that
performs a variety of RNA processing and exonucleolytic
degradative events, including mRNA degradation and
ribosomal RNA processing (Figure 2). In addition to
the exosome complex, several accessory factors have
been shown to play a role in 3'→5' mRNA
degradation. These include the SKI2 protein, which has
homology to the DExH box family of RNA helicases
and may function in delivering the mRNA substrate to
the exosome, and the SKI3 and SKI8 proteins, whose
function in 3'→5' mRNA degradation is unknown.

The deadenylation-dependent decapping and 3'→5'
degradation pathways appear to be the only general 
pathways of decay in eukaryotic cells. This is based on 
the observations that if both pathways are inactivated 
in yeast the cells are unable to grow and mRNAs 
exhibit extreme stability. This observation also indi- 
cates that efficient mRNA turnover is required for cell 
growth and division. 


Deadenylation-Independent Decapping 
and mRNA Surveillance 


Certain mRNAs can be degraded via an alternative 
decapping pathway that is initiated prior to dead- 
enylation. In this pathway mRNAs are decapped with 
long poly(A) tails and subsequently degraded in a 
5'→3' direction. Substrates for this pathway include
mRNAs containing premature translation termin- 
ation codons, mRNAs containing unspliced introns, 
and mRNAs with extended 3’ UTRs. This pathway is 
part of a conserved system termed ‘mRNA surveil- 
lance,’ wherein aberrant mRNAs are recognized as 
being incorrect and are rapidly degraded. 
Figure 2 The roles of the exosome complex in mRNA degradation and 5.8S rRNA maturation. In each process,
exonucleases of the exosome complex hydrolyze RNA in a 3'→5' direction.


The deadenylation-independent decapping path- 
way is evolutionarily conserved in eukaryotes such 
as yeast, the nematode Caenorhabditis elegans, and 
mammals. Given the aberrant mRNA substrates that 
are degraded through this pathway, deadenylation- 
independent decapping likely is important for main- 
taining the fidelity of gene expression. For example, 
translation of mRNAs containing premature trans- 
lation termination codons would create truncated 
proteins that could have dominant negative effects. 
Interestingly, and consistent with this view, smg 
mutants in C. elegans, which are defective for this 
degradative pathway, convert recessive nonsense 
mutations in the myosin gene unc-54 into dominant 
negative alleles. 

A major unresolved question is how an mRNA
is recognized as ‘aberrant’ and how that
information is transmitted to lead to decapping of the
mRNA. Several factors required for
deadenylation-independent decapping have been identified in yeast
and other systems, though their exact roles remain
unknown. Despite these unresolved issues, the
deadenylation-independent decapping pathway in yeast
uses the same enzymes required for
deadenylation-dependent decapping, namely Dcp1p and Xrn1p.
Thus this pathway essentially triggers rapid decapping
by bypassing the control systems that specify the rate
of decapping of a normal transcript.

Though currently known substrates for the
deadenylation-independent decapping pathway are
all ‘aberrant’ mRNAs, a reasonable expectation is
that deadenylation-independent decapping will also
occur in response to other cues that trigger mRNA
degradation. For example, in Chlamydomonas the
rapid degradation of alpha-tubulin mRNA induced
by deflagellation is deadenylation-independent.
However, the alpha-tubulin mRNA is degraded via a
deadenylation-dependent mechanism in the absence
of this induction. Thus, in this system,
deadenylation-independent mRNA degradation can be a mechanism
to rapidly induce the degradation of a specific normal
mRNA. More examples of regulated decay through
the deadenylation-independent decapping pathway
likely will emerge as work continues in this area.


Degradation of mRNAs Initiated by 
Endonucleolytic Cleavage 


Several eukaryotic mRNAs decay through a pathway 
initiated by endonucleolytic cleavage. Internal endo- 
nucleolytic cleavages create mRNA fragments that are 
substrates for further decay. Degradation of the prod- 
ucts of endonucleolytic cleavages presumably is per- 
formed by both the 5'→3' and 3'→5' exonucleases,
though no experiments addressing this issue have been 
reported. Some mRNAs whose decay is initiated by 
endonucleolytic cleavage include the mammalian 9E3, 
IGFII, transferrin receptor, and albumin mRNAs as 
well as the Xenopus Xlhbox2B mRNA. In each of 
these examples, cleavage by an endonuclease requires 
specific mRNA sequences not found in all mRNAs. 
Therefore, decay initiated by endonucleolytic clea- 
vage is likely to be limited to an individual mRNA 
or subset of mRNAs containing specific endonuclease 
recognition sequences. 

As a consequence of the requirement for specific
recognition sequences, endonucleolytic cleavage
allows for transcript-specific control of mRNA
stability. This control appears to occur via two different
mechanisms. In many cases, the control appears to
occur via competing protective factors. For example,
binding of the iron response element-binding protein
to the 3' UTR of the transferrin receptor mRNA
inhibits endonucleolytic cleavage of this mRNA
by protecting the endonuclease recognition
sequence from the endonuclease. Thus, in this case,
the activity of the endonuclease does not appear to be
regulated. Rather, accessibility of the endonuclease to
its substrate is controlled. This example contrasts with
other cases in which the activity of the endonuclease
itself is regulated. An example of this type of
regulation is the mammalian endonuclease RNaseL.
RNaseL is activated only by oligomers of adenylate
residues joined by 2',5'-phosphodiester bonds, which are
produced in response to double-stranded RNA. An
important area for future research will be to identify
and understand the regulation of the endonucleases
that cleave mRNAs and their competing protective
factors.
Overview


The RNA world hypothesis states that early life was 
based on RNA. That is, the first enzymes were not 
made of protein but were instead made of RNA or 
a very similar polymer. Such enzymes composed of 
RNA are called ribozymes. According to the hypothe- 
sis, ribozymes first promoted the reactions required 
for life with the help of metals, amino acids and other 
small-molecule cofactors. Then, as RNA-based meta- 
bolism became more complex, it developed the ability 
to synthesize coded polypeptides, which served as
more sophisticated cofactors. DNA eventually re- 
placed RNA as the genetic polymer, and protein 
replaced RNA as the prominent biocatalyst. The con- 
version to protein catalysis is not considered com- 
plete; RNA retains a central role in protein synthesis. 
Remnants of ancestral ribozymes are also thought to 
persist as nucleotides within many cofactors, such as 
NAD+, NADPH, FAD, coenzyme A, coenzyme B12,
ATP and S-adenosylmethionine. 


The Case for the RNA World 


The conceptual appeal of the RNA world is that mini- 
mal, self-replicating forms of life are easier to imagine 
if their enzymes are composed of RNA. This is 
because ribozymes are much easier to replicate than 
are protein enzymes. Replication of a single protein 
enzyme requires dozens of macromolecules, including 
messenger RNA, transfer RNAs, aminoacyl-tRNA 
synthetases and the ribosome. Replication of a ribo- 
zyme is much simpler because the ribozyme mol- 
ecules themselves embody the genetic information 
needed for their replication — each ribozyme molecule 
serves as both gene and enzyme. Thus, replication 
would require only a single macromolecule, an 
RNA-dependent RNA polymerase, which could 
synthesize the complement of the ribozyme and then 
use this complement strand of RNA as a template 
to synthesize a copy of the ribozyme. If this RNA 
polymerase were itself a ribozyme, then one can con- 
ceive of a simple ensemble of molecules capable of 
self-replication and eventually giving rise to the 
protein—nucleic acid world of contemporary biology. 
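
The two-step copying scheme described above (first the complement, then a copy of the original) can be illustrated with a short sketch. It assumes strict Watson-Crick pairing, error-free copying, and a hypothetical ribozyme sequence, and it models only sequence information, not catalysis.

```python
# Watson-Crick pairing for RNA (A-U, G-C).
RNA_PAIR = {"A": "U", "U": "A", "G": "C", "C": "G"}

def complementary_strand(rna_5_to_3: str) -> str:
    """Strand a template-directed polymerase would make, written 5'->3'.

    The new strand is complementary and antiparallel to its template,
    hence the reversal before pairing each base.
    """
    return "".join(RNA_PAIR[b] for b in reversed(rna_5_to_3))

ribozyme = "GGAUCCGAAAGC"               # hypothetical ribozyme, 5'->3'
minus = complementary_strand(ribozyme)  # first round: the complement
copy = complementary_strand(minus)      # second round: copy of the original
print(minus)
print(copy == ribozyme)  # True
```

Because each round produces the antiparallel complement, two rounds return the original sequence, so a single RNA-dependent polymerase activity would suffice to propagate the ribozyme's information.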

Carl Woese, Francis Crick, and Leslie Orgel pro- 
posed the concept of the RNA world in the late 1960s. 
The popularity of the hypothesis surged in the early 
1980s when Thomas Cech, Sidney Altman, and 
Norman Pace discovered that RNA can catalyze reac- 
tions and that a few contemporary enzymes are indeed 
composed of RNA rather than of protein. At this time 
Walter Gilbert coined the term ‘RNA world.’ More 
recently, the isolation of new ribozymes from large 
libraries of random RNA sequences has begun to 
confirm that the catalytic abilities of RNA are com- 
patible with the RNA world hypothesis. These new 
ribozymes have demonstrated that RNA can synthe- 
size short fragments of RNA in a template-directed 
fashion and promote the formation of peptide, ester, 
and glycosidic linkages. The hypothesis is also receiv- 
ing renewed attention as structural studies of the 
ribosome show that the active site for peptide bond 
formation is composed of ribosomal RNA. This cata- 
lytic role for ribosomal RNA, combined with the 
previously established functions of messenger RNA 
and transfer RNA, reinforces the idea that RNA was
instrumental in the synthesis of the first coded pro- 
teins and had a prominent catalytic role before the 
advent of protein synthesis. 


Gaps to Be Filled 


The RNA world hypothesis has made its way into 
most of the recent biology textbooks. Nevertheless, 
the hypothesis is far from proven, and major difficul- 
ties remain, particularly the implausibility of prebiotic 
RNA synthesis. Few question the assertion that ribo- 
zymes played a much more important role in early 
evolution than they do in modern biology, but many 
are doubtful that life began with RNA per se. They 
have elaborated the RNA world hypothesis with the 
proposal that life began with an RNA-like polymer 
that possessed the catalytic and templating features of 
RNA but somehow lacked RNA’s undesirable traits. 
The era of this RNA-like polymer is referred to as the 
‘pre-RNA world,’ which presumably gave rise to the 
RNA world in a manner analogous to that in which 
the RNA world gave rise to the protein—nucleic acid 
world of today. The identification of plausible pre- 
RNA world polymers is a key pursuit of current 
research in the origins of life. Another important 
goal is the generation of ribozymes with activities 
that more fully represent those presupposed by the 
RNA world hypothesis. Such ribozymes would sup- 
port the hypothesis and provide components for con- 
structing minimal forms of RNA-based cellular life. 
In 1916, W. Robertson reported for the first time on a
chromosomal structural change in which two
acrocentric chromosomes are united head to head and
form a metacentric chromosome. In honor of W.
Robertson, this special case of a chromosomal
rearrangement is termed a Robertsonian translocation,
Robertsonian fusion, or centric fusion.

The establishment of a Robertsonian translocation
requires breakage within the centromeric region of two
acrocentric chromosomes and a mutual exchange of
chromatin blocks. By this process a large biarmed
Robertsonian chromosome and a small translocation
product are formed; the small product mostly consists
of heterochromatin and is regularly eliminated.
Although a Robertsonian translocation reduces the
number of chromosomes by one, this type of
translocation is not accompanied by a gain or loss of
genetically relevant material. The presence of a
Robertsonian translocation can easily be assessed by
the analysis of meiotic chromosomes: heterozygous
individuals form a trivalent, whereas homozygous
individuals mostly form a large ring bivalent.

Homozygous and heterozygous Robertsonian 
translocation carriers are phenotypically normal. 
However, Robertsonian translocation heterozygosity 
may have a considerable influence on fertility. Firstly, 
impairment of fertility can be caused by disturbances 
of meiotic chromosomal segregation with the produc- 
tion of aneuploid gametes leading finally to aneuploid, 
nonviable zygotes. Secondly, Robertsonian hetero- 
zygosity may cause a complete breakdown of gameto- 
genesis. The risk of producing aneuploid gametes 
seems to be higher in heterozygous females than in 
males. On the other hand, a breakdown of gametogen- 
esis is more pronounced in the male sex. 
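Why meiotic segregation in a heterozygote yields aneuploid gametes can be seen by enumerating every 2:1 segregation of the trivalent. The sketch below assumes hypothetical arm labels: the fused metacentric carries arms A and B, and its partners are the acrocentrics A and B. It ignores 3:0 segregation and does not model the real, sex-dependent frequencies mentioned above.

```python
from collections import Counter

# Hypothetical trivalent in a Robertsonian heterozygote: chromosome -> arms carried.
TRIVALENT = {"Rb(A.B)": ("A", "B"), "A": ("A",), "B": ("B",)}

def two_to_one_gametes() -> list[tuple[str, ...]]:
    """Enumerate gametes from every 2:1 segregation of the trivalent."""
    names = list(TRIVALENT)
    gametes = []
    for single in names:  # the chromosome that moves alone to one pole
        pair = tuple(n for n in names if n != single)
        gametes.extend([(single,), pair])
    return gametes

def is_balanced(gamete: tuple[str, ...]) -> bool:
    """A gamete is balanced if it carries exactly one copy of each arm."""
    dosage = Counter(arm for chrom in gamete for arm in TRIVALENT[chrom])
    return dosage == Counter({"A": 1, "B": 1})

gametes = two_to_one_gametes()
balanced = [g for g in gametes if is_balanced(g)]
print(len(gametes), len(balanced))  # 6 2
```

Only the alternate segregation (the metacentric alone versus the two acrocentrics together) gives balanced gametes; the other four outcomes lack or duplicate one arm. If all six were equally likely, an assumption made purely for illustration, one third of the gametes would be balanced.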

The Robertsonian translocation is a common spontaneous chromosomal rearrangement. It has been found in many taxa of the animal kingdom, including humans and other primates, insectivores, rodents, and insects. Among mammals, Robertsonian translocations are especially common in insectivores and rodents, particularly in the house mouse (Mus musculus). There is evidence that Robertsonian translocations play an important role in karyotype evolution and possibly in speciation. Furthermore, Robertsonian translocations are very useful as marker chromosomes in cytogenetics.


See also: Karyotype; Translocation 
In some circular genomes, replication proceeds from the origin of replication in a unidirectional rather than bidirectional manner. The DNA at the origin is first nicked by a specific endonuclease. The 3’ end of the nicked strand is then extended by a DNA polymerase, and the 5’ end is displaced from its complementary strand. This leads to a rolling circle mode of replication (see Figure 1), in which one strand of the genome is used as a rolling template to generate a long string of concatenated, linear genomes. These concatenated genomes can then be cleaved and recircularized to create new circular genomes like the original. This mechanism can generate linear or circular genomes, either single-stranded or double-stranded, and is used by many viruses.
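The rolling circle mechanism described above can be sketched as string manipulation. This is a toy model under stated assumptions: the input string stands for the strand that is nicked and displaced, written 5’ to 3’, the origin is a hypothetical index, and every trip of the polymerase around the circular template adds one genome-length copy to the displaced strand.

```python
def rolling_circle(genome: str, origin: int, rounds: int) -> tuple[str, list[str]]:
    """Toy model of rolling circle replication of a circular sequence.

    genome: the strand that is nicked and displaced, 5'->3' (hypothetical input)
    origin: index of the nick site in the circular sequence
    rounds: number of trips the polymerase makes around the circular template
    Returns the concatemer and the unit-length genomes cleaved from it.
    """
    if not genome or rounds < 1:
        raise ValueError("need a non-empty genome and at least one round")
    n = len(genome)
    origin %= n
    # The displaced strand reads from the nick around the circle.
    unit = genome[origin:] + genome[:origin]
    # Each round extends the 3' end by one more copy: a head-to-tail concatemer.
    concatemer = unit * rounds
    # Cleaving at each copy of the origin releases unit-length genomes.
    units = [concatemer[i:i + n] for i in range(0, len(concatemer), n)]
    return concatemer, units

def is_same_circle(a: str, b: str) -> bool:
    """Two strings describe the same circular molecule if one is a rotation of the other."""
    return len(a) == len(b) and b in a + a

concatemer, units = rolling_circle("ATGCGT", origin=2, rounds=3)
print(concatemer)  # GCGTATGCGTATGCGTAT
print(all(is_same_circle(u, "ATGCGT") for u in units))  # True
```

Recircularizing each unit (checked here by the rotation test) recovers a molecule identical to the original circle, which is the point made in the text.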


See also: Concatemer (Genomes); Theta (θ) Replication
Root Development


Roots develop from a set of dividing cells called an apical meristem (Figure 1). The meristem is protected by the cells it produces, particularly by the root cap, a specialized set of cells that produces copious amounts of slime (sugars, often fucose) that allows the root to penetrate its substrate. The root cap also contains cells that can detect gravity, enabling the root to grow downward. Gravity-detecting cells have also been found in the elongation zone behind the meristem in Equisetum roots. Thus the meristem produces cells in two basic directions: downward to produce the root cap, and upward to produce the rest of the root. The meristem has some internal complexity: it contains a quiescent center (QC), where cell division is relatively infrequent (cell regeneration). Cells emanating from the QC divide more frequently, producing the large numbers of cells required for root growth and development (generation of cell lineages). The meristem is organized into tiers from which the various clonal lineages emanate (for details see Dolan et al., 1993).
Root cap cells produced downward from the meristem mature quickly with little or no elongation. Because the root cap sustains a great deal of damage as it passes through the soil, there is a high turnover of cells through it; root cap cells that have detached from the cap are found in abundance in that area, contributing to the general lubrication of the root tip. The epidermis of the root cap envelops the meristematic area and overlaps (overlays) the epidermal cells produced above the meristem.

Many of the cells produced above the meristem extend rapidly in an elongation zone. These cells specialize very early after leaving the meristem (Figure 2), and in longitudinal sections of roots the cell lineages are easy to see: small cells yet to elongate lie in line with the cells above them (hence lineages). The fate of cells (how they develop) in roots appears to overlap with their lineage, but the concept of cell fate is quite different. For example, the lineage of the epidermis (the outermost layer) of a root is plain to see in longitudinal section, but the fate of some of these cells is to produce root hairs, which develop only after most of the cell's elongation is complete. Other cells may arise from divisions in the elongation zone oriented at 90 degrees to the direction of elongation, allowing the production of, for example, vascular parenchyma and extra cortical parenchyma. Thus, although all cells are ultimately derived from the apical meristem, further divisions and specialization occur during elongation and in the mature primary root (e.g., for the initiation of lateral roots).

Three major tissue types are produced above the meristem: dermal, cortical (or ground tissue), and vascular. The epidermis is a single layer of cells on the external surface (thus forming a thin-walled cylinder of cells around the root) that to some extent protects the root, but whose main function is to absorb nutrient-containing water for growth. The epidermis can increase its surface area for this purpose by producing root hairs, which are tip-growing protuberances that emanate from specialized cells called trichoblasts. Root hairs do not divide, and they are the only tip-growing cells physically attached to the plant. They make ideal model cells for the study of plant cell and molecular biology, and have recently been the subject of a monograph (Ridge and Emons, 2000). A second layer, one cell thick and called the endodermis, surrounds the vascular stele and overlies another single-cell layer called the pericycle. Lateral roots emanate from the pericycle, induced by gradients in hormone levels (the apical meristem is an auxin sink, and apical dominance prevents the development of another apical meristem in close proximity, but distance reduces the effectiveness of dominance). Legume nodules induced by the action of rhizobia also emanate from the pericycle opposite a xylem pole, as lateral roots do, and their vascular connection to the main vascular stele is derived through division and specialization of cells from the pericycle. A belt of suberized material surrounds all endodermal cells, ensuring that substances entering the vascular stele first pass through the cells (symplastic as opposed to apoplastic passage). This helps to prevent disease transmission, and the cells through which everything must pass are very effective filters. Between the epidermis and the endodermis is a cushion of cortical cells (simple parenchyma) that helps to protect the vascular stele. These parenchyma cells can also store products of photosynthesis as starch in amyloplasts. Cortical cells in legume roots can also be induced to divide by the activity of rhizobia.

Figure 1 (See Plate 35) Schematic representation of the Arabidopsis root. (Reproduced with permission from van den Berg et al., 1998.)

Figure 2 Fate map of the Arabidopsis root. Initial cells for all the different cell types surround the quiescent center. end, endodermis; cort, cortex; epi, epidermis; lrc, lateral root cap; col, columella. (Reproduced with permission from van den Berg et al., 1998.)

The most noticeable lineage of cells emanating above the meristem is the vascular stele. Some of these cells elongate rapidly to form the tubes through which nutrient-loaded water is sent upward (the xylem), the tubes that carry photosynthates up and down (the sieve tubes of the phloem), and their associated cells (xylem parenchyma, phloem companion cells). Because roots are supported by the substrate, they lack the supporting or strengthening cell types, such as sclerenchyma and collenchyma, found in above-ground organs. These cells manage to elongate rapidly in one direction because cell wall deposition is constrained by arrays of microtubules lying just beneath the plasma membrane. These arrays are thought to be helical or spiral, allowing stretching in one direction. Indeed, this is reflected in the secondary thickening of xylem cells, where deposits of lignin can be seen in rings or spirals inside the cell. Such deposition allows great stretching of the cells while still maintaining some strength. As elongation slows, secondary thickening becomes more continuous. The ends of xylem cells eventually form sieves or break down to form very long tubes; this accompanies the death of the xylem cells.


Cells that detect gravity in the root cap and in the 
elongation zone (known only in Equisetum) are able to 
do so by sedimentation of both plastids and the nucleus. 
The exact detection mechanism remains unresolved, 
but the endoplasmic reticulum is clearly involved. A 
change in the direction of pressure of the sedimented 
organelles results in an asymmetric flow of auxin in 
the root tip, inducing an asymmetric elongation of 
cells in the elongation zone, allowing directionality. 
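How asymmetric elongation turns into a change of direction can be made concrete with simple geometry. Treating the elongation zone as a bending beam (a simplifying assumption, not a model from the text), two flanks a width d apart that grow along concentric arcs subtending the same angle satisfy angle = (growth of faster flank - growth of slower flank) / d, in radians. The function name and the numbers in the example are hypothetical.

```python
import math

def bending_angle_deg(growth_outer_um: float, growth_inner_um: float,
                      width_um: float) -> float:
    """Bending angle produced by unequal elongation of two flanks of an organ.

    growth_outer_um: elongation of the faster-growing (convex) flank, in micrometers
    growth_inner_um: elongation of the slower-growing (concave) flank, in micrometers
    width_um:        distance between the two flanks, in micrometers
    """
    if width_um <= 0:
        raise ValueError("width must be positive")
    # Arcs of radius R + d/2 and R - d/2 over the same angle differ in length by angle * d.
    theta_rad = (growth_outer_um - growth_inner_um) / width_um
    return math.degrees(theta_rad)

# Hypothetical numbers: a 150-um-wide zone whose flanks grow 300 um and 200 um.
print(round(bending_angle_deg(300, 200, 150), 1))  # 38.2
```

Even a modest difference in elongation between the two sides of a narrow root produces a substantial bend, which is why a small asymmetry in auxin flow can redirect growth.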


Root Development Genetics 


Almost all genetic work has been carried out on primary roots, and a considerable part of it on the plant Arabidopsis, which has a small genome and is easy to transform with Agrobacterium. In addition, Arabidopsis roots are characterized by an almost invariant sequence of cell expansions and divisions, enabling screening for mutants impaired in particular aspects of cell expansion. The simplicity of their organization allows large-scale mutant screening, and experiments have uncovered the existence of positional cues important for pattern formation. Several classes of mutants have so far contributed to our understanding of root development, and the identification of these genes and the isolation of other mutants are ongoing. In addition, clonal cell lineages can be identified by transformation with GUS (β-glucuronidase), and there is now much promise for the use of GFP (green fluorescent protein)-protein fusions, which also allow identification of proteins and the study of living plant organelles and processes in wild-type and mutant strains.

During the 1990s about 48 distinct root mutants were isolated and characterized (Scheres and Wolkenfelt, 1998). About half came from studies of pattern formation, and most of the rest relate to meristem activity and cell expansion. Only eight mutants related to responses to the environment have been well characterized; these include the WAV genes, which affect tip rotation, gravity-response genes, and the SKU genes, which exaggerate right-slanting growth.


Genes that Regulate Pattern Formation 

Mutations in the HOBBIT (HBT) gene interfere with cell division and cell type specification in the Arabidopsis root meristem. Mutant seedlings lack columella root cap cells, lateral root cap cells, and quiescent center (QC) cells, and no cell division occurs in the postembryonic root meristem. The root defect originates in the early embryo: in hbt mutants, cell divisions are defective in the region of the root meristem where the hypophyseal cell is laid down during embryogenesis. Cell fates in the root meristem appear to be initiated correctly in the embryo, but columella root cap, lateral root cap, and QC cells cannot be maintained.

Among pattern-formation genes, those that regulate radial pattern formation are of particular interest. The scarecrow (scr) and short-root (shr) mutations result in embryonic, primary, and lateral roots that are missing one cell layer. Phenotypic characterization indicates a role for the SCARECROW and SHORT-ROOT genes in regulating a key asymmetric division required for generating the cortex and endodermal cell layers.

Root hair mutants produce either an excessive number of hairs or few or none.


Genes that Regulate Meristem Activity 
Although mutants in pattern-formation genes such as MONOPTEROS and HOBBIT lack a root meristem, these genes are required early, and they presumably affect cell division in the root meristem indirectly. Studying meristems directly requires more specific mutants, but so far few have been discovered. Some work has been carried out on genes homologous to cell cycle regulator families, the cyclin-dependent kinases and the cyclins (both well characterized in yeast). For Arabidopsis the CDC2a
gene marks cells that are competent for cell division 
and overexpression of the gene accelerates cell divi- 
sion. However, overexpression of the CYC1AT gene 
leads to an increase in root growth, which is at odds 
with its presumed normal role of controlling cell 
division. 


Genes that Regulate Cell Expansion 

The cell expansion mutants that we have characterized 
in depth fall into two classes. The sabre mutant repre- 
sents the class of Environmentally Responsive Expan- 
sion (ERE) mutants. The sabre mutation results in 
root cells that have shifted their principal direction 
of expansion from longitudinal to radial. The root 
expansion phenotype is dramatically rescued by 
reducing effective levels of the plant growth regulator, 
ethylene. SABRE gene function indicates that it may 
regulate cell expansion by acting in a pathway that 
counterbalances the ability of ethylene to promote 
radial expansion. In the Conditional Root Expansion 
(CORE) class of mutants, expansion appears to be 
responsive to internal signals. The abnormal cell 
expansion phenotype of the CORE mutants is condi- 
tional upon high growth rates and is not responsive to 
any regulatory substance. One of the CORE mutants, 
cobra, in which the direction of expansion is dramatic- 
ally altered under high growth-rate conditions, has 
abnormal expansion primarily affecting root epi- 
dermal cells. Phenotypic analysis suggests that the 
COBRA gene plays a role in regulating a shift from 
radial to longitudinal expansion. 




References 

Dolan L, Janmaat K, Willemsen V et al. (1993) Cellular organisation of the Arabidopsis thaliana root. Development 119: 71-84.

Ridge RW and Emons AMC (2000) Root Hairs: Cell and Molecular 
Biology. Tokyo: Springer-Verlag. 

Scheres B and Wolkenfelt H (1998) The Arabidopsis root as a 
model to study plant development. Plant Physiology and Bio- 
chemistry 36: 21-32. 

van den Berg C, Weisbeck P and Scheres B (1998) Cell fate and cell differentiation in the Arabidopsis root. Planta 205: 483-491.


See also: Arabidopsis thaliana: The Premier Model 
Plant; Plant Development, Genetics of; Transfer 
of Genetic Information from Agrobacterium 
tumefaciens to Plants 


RuvAB Enzyme 


S M Rosenberg and P J Hastings 


Copyright © 2001 Academic Press 
doi: 10.1006/rwgn/2001.1145 


RuvA and RuvB of Escherichia coli are DNA repair 
and homologous genetic recombination proteins that 
function in processing crossed-strand DNA junctions. 
Such DNA junctions hold homologous molecules 
together during recombination and must be processed 
or ‘resolved’ to form finished, duplex recombined 
DNA. Working together, the RuvAB complex binds crossed-strand junctions and catalyzes ‘branch migration’, the movement of the junction along DNA. Branch migration can lengthen the heteroduplex DNA joint connecting the recombining molecules and appears to be a prerequisite for endonucleolytic cleavage of the junction by RuvC endonuclease (see RuvC Enzyme). RuvC endonuclease cleaves two strands of the same polarity at a junction and thereby produces duplex recombined DNA products, whose strands are then made continuous by ligation. Thus RuvAB and RuvC are part of an endonucleolytic junction-resolution system, providing
one way that branched recombination intermediates 
can be processed or resolved in E. coli. 


Crossed-Strand DNA Junctions are Part of Intermediates in Recombination/Repair


Both three-strand and four-strand DNA junctions 
may be intermediates in homologous recombination 
and DNA break-repair via recombination (Figure 1). 
Such structures are formed when homologous mol- 
ecules exchange DNA strands, base-pairing a strand 


from one duplex with a complementary region in 
another. The regions of interduplex base-pairing are 
called heteroduplex DNA joints (see Heteroduplexes). 
Because heteroduplex DNA requires base-pairing between complementary strands, it ensures that homologous recombination occurs between matching sequences rather than at random.

The four-stranded junctions are called Holliday junctions (see Holliday Junction), named after Robin Holliday, who used them in an early model for a molecular mechanism of homologous DNA recombination in fungal meiosis (Holliday, 1964; see also Whitehouse, 1963; Figure 1D).


Formation of Crossed-Strand Junctions by Strand Exchange


Formation of Single-Strand DNA 

In a typical model for the molecular mechanism of recombination (Figure 1), a broken DNA molecule may be processed by helicases and exonucleases, exposing a single-strand tail (Figure 1A; see RecBCD Enzyme, Pathway, regarding DNA end-processing in recombination/repair). In E. coli (and all organisms examined so far), the single strand can be coated by a strand-exchange protein, the prototype of which is E. coli RecA (see RecA Protein and Homology).


Strand Exchange and RecA-Like Proteins 
RecA (and its orthologs in eubacteria, eukaryotes, 
and archaebacteria) forms a helical protein filament 
by polymerization along single-strand DNA. The 
protein-DNA filament scans a duplex molecule for 
DNA sequence regions complementary to the single 
strand and can catalyze exchange of the single strand for one strand of the duplex, forming heteroduplex DNA and creating a D-loop (Figure 1B).


Formation of Three- and Four-Strand Junctions

The D-loop contains a three-stranded crossed-strand junction. The heteroduplex DNA in the intermediate can be lengthened by branch migration and, through further strand exchange to the left of the junction in Figure 1B and C, the intermediate can acquire a four-strand or Holliday junction involving reciprocal strand exchange (Figure 1D). In vivo, this job might be carried out by RuvAB or RecG (see Recombination Pathways, Rec Genes).


RuvAB Catalyzes Branch Migration 


Genetic studies indicate that the RuvA, RuvB, and RuvC proteins function in DNA repair and suggest that all three might act late in recombination (after formation of heteroduplex DNA; Benson et al., 1991; Lloyd, 1991). Biochemical studies have provided a detailed picture of their activities on crossed-strand junctions such as Holliday junctions. Using artificially constructed four-strand junctions, Steven West and colleagues have found that RuvA protein binds specifically to four-way junctions (Figure 2; West, 1997) and does so as a tetramer. The RuvA tetramer holds the junction in a square-planar configuration (Figure 2). RuvB can load onto RuvA-bound junctions, binding both RuvA and the DNA. RuvB is an ATPase and a helicase and is thought to provide the motor that powers branch migration of the junction (Figure 2B). The RuvAB complex is thought to twist the junction DNA, breaking the hydrogen bonds that hold base-paired strands in front of the junction while base pairs re-form behind it, causing the translocation, or branch migration, of the junction (Figure 2B). This can allow extension of a heteroduplex joint (Figure 1B). If migration occurs in the direction opposite to that shown in Figure 1 (rightward, not shown), this activity could potentially also undo a recombination intermediate by separating the recombining molecules.

The precise mechanism of RuvB-powered branch migration by RuvAB is not fully understood; it is not yet clear whether the helicase activity of RuvB is required for branch migration (George et al., 1999).

Figure 1 (See Plate 37) Formation and branch migration of crossed-strand junctions in recombination/repair intermediates. As an example of a homologous recombination reaction, a model for double-strand break-repair or end repair is shown. (A) A single-strand DNA end exposed by helicase and/or nuclease action at a double-strand end is coated by RecA strand exchange protein. (B) Strand invasion allows base-pairing of complementary sequences, forming a heteroduplex DNA joint in a D-loop; a three-strand junction is formed. (C-D) The heteroduplex DNA joint can be extended by RuvAB-mediated branch migration, stabilizing the recombination intermediate. (D) Further branch migration can allow formation of a four-strand Holliday junction. See RecBCD Enzyme, Pathway for more discussion of double-strand break- and end-repair in E. coli. See RuvC Enzyme for RuvC-mediated cleavage of RuvAB-processed Holliday junctions.


Biological Functions of RuvAB-Mediated Branch Migration


Branch migration of crossed-strand junctions can 
lengthen the region of heteroduplex DNA, holding 
recombining molecules together (Figure 1B-D). 
This may be required to stabilize recombination inter- 
mediates so that reversal of strand exchange is less 
likely and the reaction proceeds forward. 
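The effect of branch migration on the heteroduplex joint can be sketched as simple bookkeeping. This toy model is an assumption, not the RuvAB mechanism: two aligned homologous duplexes are represented by hypothetical top-strand sequences, the joint spans from the point where strand exchange began to the current junction position, and a migration step moves the junction by a number of base pairs. Wherever the two homologs differ in sequence, the heteroduplex contains a mismatched base pair.

```python
def branch_migrate(top1: str, top2: str, start: int, junction: int, steps: int):
    """Move a crossed-strand junction and report the heteroduplex joint behind it.

    top1, top2: top-strand sequences of two aligned homologous duplexes (hypothetical)
    start:      position where strand exchange began (one end of the joint)
    junction:   current junction position; the joint spans [start, junction)
    steps:      bp to migrate; positive lengthens the joint, negative shortens it
    Returns (new junction position, joint length, mismatched positions in the joint).
    """
    if len(top1) != len(top2):
        raise ValueError("homologs must be aligned to the same length")
    # The junction cannot move past the invasion point or off the end of the DNA.
    new_junction = min(max(junction + steps, start), len(top1))
    # In the joint, a strand of one duplex pairs with the complementary strand of
    # the other, so a sequence difference between homologs becomes a mismatch.
    mismatches = [i for i in range(start, new_junction) if top1[i] != top2[i]]
    return new_junction, new_junction - start, mismatches

print(branch_migrate("ACGTACGTAC", "ACGAACGTTC", start=0, junction=2, steps=5))
# (7, 7, [3])
```

Migrating further (a larger positive step) extends the joint and can bring more sequence differences into heteroduplex, while a negative step models the reverse movement that would shorten the joint.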

Branch migration may also be required to convert three-strand junctions into four-strand junctions. This conversion may also stabilize the recombination intermediate and allow endonucleolytic cleavage by RuvC, which recognizes only four-strand junctions (Figure 1C, D).

Branch migration of four-strand junctions by RuvAB is probably also required for endonucleolytic resolution of Holliday junctions by RuvC protein in vivo (see RuvC Enzyme for possible mechanisms). First, genetic studies have generally revealed similar phenotypes for cells lacking RuvA, RuvB, RuvC, or any combination of these enzymes (reviewed by Lloyd and Low, 1996; Sharples et al., 1999; and West, 1997), suggesting that they work as a team. Second, recent biochemical work indicates that RuvC works best in concert with RuvAB, suggesting that they may constitute a ‘resolvasome’, a large protein machine used for endonucleolytic resolution (Eggleston and West, 2000; see also West, 1997). Third, endonucleolytic resolution is thought to be the only way to produce recombined molecules that have swapped DNA arms, or crossed over, without simultaneously replicating (‘break-join’ recombinants; see Break-Copy/Break-Join). Replication of the invaded DNA primed by the invading ends is an alternative way to resolve recombination intermediates (Motamedi et al., 1999; see Break-Copy/Break-Join). (See also Double-Strand Break Repair Model for a third mechanism of resolution specific to noncrossover recombination.) Recent work indicates that RuvAB and RuvC are required for formation of unreplicated or ‘break-join’ recombinants in vivo (Motamedi et al., 1999), providing evidence for junction resolution activity of the Ruv system in vivo.

RuvAB-mediated branch migration may also function independently of RuvC-catalyzed DNA cleavage. Although the replicative (or break-copy; see Break-Copy/Break-Join) break-repair pathway can function without RuvAB in vivo (Motamedi et al., 1999), it is unknown whether RuvAB (or RuvC) facilitates this route when present in cells. Branch migration might also facilitate DNA replication by acting on stalled replication forks to help in their reactivation (Cox et al., 2000).

Figure 2 (See Plate 36) Branch migration of Holliday junctions by RuvAB. (Modified from West, 1997, from diagrams kindly provided by S. C. West.) (A) RuvA binds synthetic Holliday junctions as a tetramer and holds the four strands in a square-planar configuration. (B) RuvB helicase functions as a hexamer. A hexameric ring of RuvB positions itself on each side of a RuvA-bound junction, with opposite duplex arms of the junction threaded through the hole in each ring. Branch migration is thought to be achieved by ATP-dependent turning of the RuvB rings, pulling two arms through the rings and thereby forcing the rings inward toward the junction, while the opposite arms are pulled into the junction, swapping strands at the junction point.

Finally, branch migration might undo recombination intermediates (e.g., by rightward migration, not shown in Figure 1B). However, the in vivo and in vitro evidence for such an antirecombination activity concerns a different E. coli branch migration helicase, RecG (Harris et al., 1996; Whitby et al., 1993); RuvA and RuvB have so far been implicated only as promoters (not destroyers) of recombination.
The DNA repair and recombination protein RuvC of
Escherichia coli is an endonuclease that cleaves 
crossed-strand junctions in DNA intermediates 
in recombination. It is specific for four-strand or 
Holliday junctions (named for Robin Holliday, who 
proposed them in an early recombination model 
(Holliday, 1964); see Holliday’s Model; see also 
Whitehouse (1963)) and so is also called a Holliday 
junction resolvase. In collaboration with the RuvAB 
branch migration proteins (see RuvAB Enzyme), 
RuvC resolves Holliday junctions by cleaving two 
strands of the same polarity in the four-strand junc- 
tion. This separates the two recombining DNA regions 
into discrete duplex molecules which are then made 
continuous by ligation. E. coli RuvC is one of several 
identified junction-specific endonucleases. The 
majority are bacteriophage enzymes and can cleave 
three-strand or Y-junctions in addition to Holliday 
junctions. They have been found in eukaryotes in the 
mitochondria but not in the nucleus. 


Crossed-Strand DNA Junctions in Repair and Recombination Intermediates


The formation of crossed-strand junctions during DNA repair via homologous recombination is described in the article RuvAB Enzyme. Once a junction has formed, the RuvAB complex can catalyze ATP-dependent branch migration of the junction. RuvAB is probably required for RuvC activity on Holliday junctions. One way that Holliday junctions can be resolved is by the endonuclease activity of RuvC (and similar proteins in other organisms).


Endonucleolytic Resolution of Holliday Junctions by RuvC


RuvC protein forms a dimeric complex with Holliday junctions (Figure 1A), assisted by RuvAB (not shown). The DNA strands at the junction are held in an open configuration in the complex. RuvC can cleave opposite strands (which have the same polarity) at the single-stranded region of the junction, resolving the four-armed (cruciform) structure into two duplex molecules (Figure 1B). Depending on which two strands are cleaved, the two resulting duplexes may be crossed over, as in Figure 1B. Cleavage of the side strands, indicated by the blue arrows in Figure 1A, produces crossover molecules (Figure 1B, in which each product has one green and one black end). Had the top and bottom strands in Figure 1A been cleaved (not shown), noncrossover duplexes would have resulted. RuvC produces either outcome.
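The dependence of the outcome on which strand pair is cut can be expressed as a small lookup. This sketch assumes hypothetical flanking markers: one parental duplex carries A and B on its left and right arms, the other carries a and b. It only tracks which flanking markers end up joined, not the strand chemistry.

```python
# Hypothetical flanking markers: parent 1 carries A ... B, parent 2 carries a ... b
# (uppercase = parent 1, lowercase = parent 2).
PARENTS = [("A", "B"), ("a", "b")]

def resolve(cut_pair: str) -> list[tuple[str, str]]:
    """Return the two duplex products as (left marker, right marker) pairs.

    cut_pair: "side" for the left/right strand pair (the arrows in Figure 1A),
              "top_bottom" for the other strand pair of the same polarity.
    """
    (left1, right1), (left2, right2) = PARENTS
    if cut_pair == "side":
        # Crossover: each product joins one parent's left arm to the other's right arm.
        return [(left1, right2), (left2, right1)]
    if cut_pair == "top_bottom":
        # Noncrossover: the parental flanking-marker combinations are retained.
        return [(left1, right1), (left2, right2)]
    raise ValueError("cut_pair must be 'side' or 'top_bottom'")

def is_crossover(products: list[tuple[str, str]]) -> bool:
    """True if every product's two flanking markers come from different parents."""
    return all(left.isupper() != right.isupper() for left, right in products)

for pair in ("side", "top_bottom"):
    print(pair, resolve(pair), is_crossover(resolve(pair)))
```

The two resolutions differ only in which flanking markers end up on the same molecule, which is why the same enzyme can generate both crossover and noncrossover products.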


Sequence-Specificity of RuvC-Mediated Cleavages in vitro


In vitro, cleavage shows sequence specificity, occurring preferentially at 5’-(A/T)TT(G/C) sequences. Such a short sequence should occur frequently in DNA: in a random sequence with equal frequencies of the four nucleotides, it would be expected about once every 64 base pairs. If RuvC has the same sequence specificity for cleavage in vivo as in vitro, this might explain genetic and biochemical evidence suggesting that RuvC requires RuvAB to function. RuvAB may be required to move Holliday junctions, via branch migration (see RuvAB Enzyme), to a position in the DNA at which the preferred RuvC cleavage sequence lies at the junction, so that RuvC can then make its endonucleolytic cleavages. That is, RuvAB and RuvC may act sequentially and without direct contact. Alternatively, RuvAB may need to associate with RuvC directly, the three protein multimers forming a larger protein machine, a resolvasome, in order for RuvAB or RuvC to carry out branch migration and cleavage of Holliday junctions.
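The "once every 64 base pairs" estimate follows from multiplying the probabilities of each position: 2/4 × 1/4 × 1/4 × 2/4 = 1/64. The sketch below reproduces that arithmetic and scans a sequence for the motif. The regular expression and the example sequence are illustrative assumptions; the scan reads one strand only (on the other strand the same sites appear as 5’-(G/C)AA(A/T)).

```python
import re
from fractions import Fraction

# 5'-(A/T)TT(G/C) as a regular expression; the zero-width lookahead lets
# overlapping sites be reported.
RUVC_SITE = re.compile(r"(?=([AT]TT[GC]))")

def find_sites(seq: str) -> list[int]:
    """0-based start positions of the motif on the given strand only."""
    return [m.start() for m in RUVC_SITE.finditer(seq.upper())]

def expected_spacing() -> Fraction:
    """Mean spacing between sites in random DNA with equal base frequencies."""
    p = Fraction(2, 4) * Fraction(1, 4) * Fraction(1, 4) * Fraction(2, 4)
    return 1 / p

print(expected_spacing())              # 64
print(find_sites("GATTGCTTTCATTC"))    # [1, 6, 10]
```

Because such sites are so common, a junction would rarely need to migrate far before a preferred cleavage sequence lies at the crossover point, which is consistent with the sequential model described above.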


Other Holliday Junction Resolution Enzymes


Several bacteriophages encode proteins capable of endonucleolytic resolution of Holliday junctions. Phage lambda (λ), and a segment of DNA in the E. coli genome that appears to be a remnant of sequences derived from a λ-like prophage, encode Holliday junction resolvases called Rap (the λ enzyme; Sharples et al., 1998) and Rus, the Rap ortholog in E. coli (Mahdi et al., 1996). Like RuvC, these enzymes can catalyze endonucleolytic cleavage of four-strand junctions. Bacteriophages T4 and T7 also encode Holliday junction-cleaving enzymes, called Endo VII (the product of T4 gene 49; Kemper et al., 1984) and Endo I (the T7 enzyme). These enzymes and the λ enzyme differ from RuvC in that they can cleave three-strand junctions and Y-junctions in addition to four-strand junctions. Bacteriophages probably possess such enzymes to cut the branched DNA networks that result from replication and recombination of the phage DNA, so that the DNAs can be packaged into phage capsids. The T4 enzyme associates with the packaging enzymes, presumably to perform this function. Holliday junction resolvases have been found in the mitochondria of eukaryotes, but nuclear Holliday junction resolvases, if they exist, remain to be discovered and characterized.

Figure 1 (See Plate 38) Holliday junction resolution by RuvC endonuclease. (A) RuvC protein, the three-dimensional structure of which is not yet known, binds to Holliday junctions as a dimer in concert with RuvA and RuvB proteins (not shown here; see RuvAB Enzyme). RuvC is an endonuclease that cleaves either set of strands of the same polarity at the Holliday junction. In this figure, the cleavage, indicated by arrows, is of the strands at right and left. Cleavage of the other two strands (top and bottom) would also be possible (not shown). (B) Cleavage leaves single-strand nicks that can be sealed by ligase (not shown). Cleavage of the strands indicated would produce new duplex molecules with arms 1 and 2 covalently linked and arms 3 and 4 covalently linked. Figure modified from West (1997), from diagrams kindly provided by S.C. West.
S phase is the part of the eukaryotic cell cycle during
which DNA synthesis takes place. 


See also: Cell Cycle 
An S1 nuclease is an enzyme that specifically digests
single-stranded sequences of DNA. 


See also: Nuclease 
The study of yeast genetics originated in the 1930s with the work of Winge (1935). He pioneered tetrad dissection and discovered the alternation between haploid and diploid phases. He also discovered the homothallism (HO) gene (this gene converts haploid spores to diploid cells within two divisions by causing a switch in mating type in two of the four resultant cells) and characterized many genes for the fermentation of sucrose, raffinose, and maltose. Winge used strains that had been isolated by Emil Christian Hansen and Albert Kloecker. Carl Lindegren was also one of the pioneers of yeast genetics and had fortuitously been provided with a strain isolated by Emil Mrak in 1938. This strain was diploid and heterothallic, and Lindegren characterized the bipolar mating system and developed the first genetic map of this organism. He provided strains of both mating types to several laboratories, and this helped to get yeast genetics started.

Figure 1 shows the life cycles of homothallic and
heterothallic yeasts. There are two principal genes
involved, the mating-type locus and the homothallism
gene. Both heterothallic and homothallic yeasts have
functional mating-type loci. Heterothallic yeasts have a
nonfunctional homothallism gene (ho), so they divide
vegetatively either as haploids or as diploids. The
haploids are of two mating types, a and α. Haploid
cells will mate with cells of the opposite mating type to
form diploids. The diploids have both mating-type
alleles and will not mate, but they can sporulate.
Homothallic strains have functional mating-type and
homothallism genes, and are diploid and usually
homozygous for the homothallism gene.

Nearly all laboratory strains are heterothallic, yet 
such strains are in the minority among natural yeasts. 
In a study of 239 natural yeasts that had been isolated 
from natural (noninoculated) wine fermentations, it 
was found that 185 were homozygous for HO, 26 
were homozygous for ho, and 28 were heterozygous 
(HO/ho) for these two alleles (Mortimer, 2000). It is 
likely that the laboratory strains had their origins as 
wine or beer yeasts. Mrak’s strain EM93 was found 
on rotting figs, and it seems likely that this yeast 
was carried to the figs by insects in a manner similar 
to that in which they are transported to grapes 
(Mortimer and Polsinelli, 1999). 
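As a rough check on the survey numbers above, the short Python sketch below computes the share of heterothallic (ho/ho) isolates and compares the observed number of HO/ho heterozygotes with a Hardy-Weinberg expectation. The Hardy-Weinberg comparison is an illustration added here, not part of the cited study, and it assumes random union of gametes, which homothallic yeasts need not satisfy.

```python
# HO genotype counts among 239 isolates from natural (noninoculated) wine
# fermentations, as quoted above (Mortimer, 2000).
counts = {"HO/HO": 185, "ho/ho": 26, "HO/ho": 28}
total = sum(counts.values())

# Only ho/ho isolates lack a functional HO gene, i.e. are heterothallic.
heterothallic_fraction = counts["ho/ho"] / total

# Frequency p of the functional HO allele (two alleles per diploid).
p = (2 * counts["HO/HO"] + counts["HO/ho"]) / (2 * total)
q = 1 - p

# Heterozygotes expected if genotypes were in Hardy-Weinberg proportions
# (an illustrative assumption, not a claim about natural yeast populations).
expected_het = 2 * p * q * total

print(f"isolates: {total}")
print(f"heterothallic (ho/ho): {heterothallic_fraction:.1%}")
print(f"HO allele frequency p = {p:.3f}")
print(f"HO/ho observed {counts['HO/ho']}, expected {expected_het:.1f}")
```

With these counts about 10.9% of isolates are ho/ho, p is about 0.833, and about 66.6 heterozygotes would be expected against the 28 observed. A deficit of that size would be consistent with frequent self-diploidization of HO spores, though the sketch does not test that explanation.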

The heterothallic yeasts have stable haploid and 
diploid vegetative phases. By treating the haploid 
cells with a mutagen such as UV light or ethyl methane 
sulfonate and plating the survivors on a medium such
as yeast extract-peptone-dextrose (YPD) agar, mutations can
be recovered. The mutations can be detected by
replica-plating on a selective medium such as minimal
medium. Some colonies will not grow on the replica because
they require nutrients not present in the minimal 
medium. Other selective regimes can detect other 
classes of mutants. Both dominant and recessive 
mutants will be detected by this approach. 
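The replica-plating logic can be sketched as a set comparison: a colony that grew on the rich master plate but fails on the selective replica is a candidate mutant. The colony positions, the nutrient requirements, and the reduction of each medium to a set of supplied nutrients are hypothetical simplifications for illustration, not a model of real media.

```python
from typing import Dict, FrozenSet, List, Tuple

Position = Tuple[int, int]

# Hypothetical survivors of mutagenesis on the master plate:
# position -> nutrients each colony now requires (empty = prototroph).
colonies: Dict[Position, FrozenSet[str]] = {
    (0, 0): frozenset(),
    (0, 1): frozenset({"tryptophan"}),
    (1, 0): frozenset(),
    (1, 1): frozenset({"histidine", "leucine"}),
}

# Each medium is simplified to the set of nutrients it supplies.
RICH = frozenset({"tryptophan", "histidine", "leucine", "adenine"})  # stands in for YPD
MINIMAL: FrozenSet[str] = frozenset()


def grows(requirements: FrozenSet[str], medium: FrozenSet[str]) -> bool:
    """A colony grows only if the medium supplies every nutrient it needs."""
    return requirements <= medium


def replica_screen(colonies: Dict[Position, FrozenSet[str]],
                   master: FrozenSet[str],
                   selective: FrozenSet[str]) -> List[Position]:
    """Positions that grow on the master plate but not on the replica."""
    return sorted(pos for pos, req in colonies.items()
                  if grows(req, master) and not grows(req, selective))


print(replica_screen(colonies, RICH, MINIMAL))  # [(0, 1), (1, 1)]
print(replica_screen(colonies, RICH, frozenset({"tryptophan"})))  # [(1, 1)]
```

Using a supplemented medium as the selective plate, as in the last call, narrows the list to colonies that need something else, which is one way the requirement of each candidate could be pinned down.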

…spores. The top two spores shown pair, but only about one in seven such pairings produces a zygote. The third spore
shown is left to undergo homothallic switching. In two divisions, two of the four resulting cells switch mating type and
mate with the other two to form ‘twin zygotes.’ Switching of mating type occurs only after a cell has divided at least
once. Heterothallic yeast: two diploid cells are considered and they each produce four spores by meiosis. These
spores divide by vegetative division, and then cells from two spore clones of opposite mating type
mate to form a zygote. At the bottom, one cell from the upper ascus mates with one cell from the lower ascus.
Nearly 100% of cell-cell pairings between heterothallic strains of opposite mating type form zygotes. HO,
homothallism gene; ho, nonfunctional homothallism gene.

The next step is to cross those ‘mutants’ to haploid
cells of the opposite mating type which do not
carry the mutation. If the diploid is nonmutant, the
mutation is recessive; if the diploid has the mutant 
phenotype, then the mutation is dominant. Nearly 
all mutations are recessive. If the diploid segregates 
during meiosis into a ratio of two nonmutant spores to 
two mutant spores, then the difference between the 
mutant and wild-type is located in a single gene. This 
is a direct prediction of Mendel’s laws of inheritance. 
Such crosses also yield segregants that carry the muta- 
tion but are of opposite mating type. If, for example, 
we cross a number of tryptophan-requiring mutants 
and obtain strains of both mating types for these 
mutants, then different mutants can be crossed with
each other. If the diploid is tryptophan-requiring and
both mutations are recessive, then the two mutations
are noncomplementing and are in the same gene. If the
diploid does not require tryptophan, then the muta-
tions complement and this diploid must be analyzed
genetically. If it segregates both tryptophan-requiring
and tryptophan-nonrequiring spores, then the muta-
tions are in different genes. However, some of the
tryptophan-nonrequiring diploids may segregate
only tryptophan-requiring spores. This indicates that
the mutations are in the same gene, that is, they are
alleles that show interallelic complementation. Such
complementation is due to interactions at the level of
the protein products of these alleles. If the cross seg-
regates both tryptophan-requiring and tryptophan-
nonrequiring spores, then the ratios of these two
classes in individual tetrads determine whether
these genes are linked to each other or to their respect-
ive centromeres, as interpreted by the laws
of tetrad analysis. For the five tryptophan genes, 
two are linked to their respective centromeres 
and the other three are unlinked and are on different 
chromosomes. 
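The decision rules of the complementation test, together with standard tetrad-analysis calculations, can be sketched in Python. The PD/NPD/T classification, Perkins' formula, and the centromere-marker estimate are conventional tools that this entry alludes to (the laws of tetrad analysis) but does not spell out; the genotypes and tetrad counts are hypothetical, and the code assumes 2:2 segregation at both loci with no gene conversion.

```python
from collections import Counter
from typing import List, Tuple

Spore = Tuple[str, str]  # genotype at two loci, e.g. ("trp1", "HIS3")


def interpret_cross(diploid_mutant: bool, prototrophic_spores: bool) -> str:
    """Interpret a cross between two recessive mutants with the same phenotype."""
    if diploid_mutant:
        return "noncomplementing: same gene"
    if prototrophic_spores:
        return "complementing: different genes"
    return "same gene: interallelic complementation"


def classify_tetrad(tetrad: List[Spore], parent1: Spore, parent2: Spore) -> str:
    """Classify a tetrad as PD, NPD, or T for two loci that each segregate 2:2."""
    parental = sum(1 for spore in tetrad if spore in (parent1, parent2))
    if parental == 4:
        return "PD"   # parental ditype: all four spores parental
    if parental == 0:
        return "NPD"  # nonparental ditype: all four spores recombinant
    return "T"        # tetratype: two parental, two recombinant


def perkins_distance(pd: int, npd: int, t: int) -> float:
    """Gene-gene map distance (cM) by Perkins' formula."""
    return 100 * (t + 6 * npd) / (2 * (pd + npd + t))


def centromere_distance(t: int, total: int) -> float:
    """Gene-centromere distance (cM), scoring tetratypes against a reference
    marker tightly linked to its own centromere (half the second-division
    segregation frequency, times 100)."""
    return 50 * t / total


# Hypothetical cross: trp1 HIS3 x TRP1 his3.
p1, p2 = ("trp1", "HIS3"), ("TRP1", "his3")
tetrads = [
    [p1, p1, p2, p2],
    [("trp1", "his3"), ("trp1", "his3"), ("TRP1", "HIS3"), ("TRP1", "HIS3")],
    [p1, ("trp1", "his3"), p2, ("TRP1", "HIS3")],
]
print(Counter(classify_tetrad(tet, p1, p2) for tet in tetrads))
print(perkins_distance(pd=70, npd=2, t=28))  # 20.0 (hypothetical counts)
```

A large excess of PD over NPD tetrads is the usual signature of linkage between two genes; unlinked genes give roughly equal numbers of PD and NPD tetrads.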

By crossing many different classes of mutants in 
various combinations, a genetic map was eventually 
developed. This map describes the locations of over 
2000 genes on 16 chromosomes. The total genetic map 
length is around 4500 cM. The total number of genes 
determined from the nucleotide sequence of the DNA 
is about 6200. 

Some of the various classes of mutants that have 
been analyzed and mapped are: 


1. Temperature-sensitive mutants that control the cell 
cycle, protein synthesis, DNA synthesis, and RNA 
synthesis. These mutants grow at 23 °C but not at 
36 °C. The cell-cycle mutations are especially im- 
portant, because many of these genes have human 
homologs and are related to cancer. 

2. Mutants in genes that control DNA repair. Several 
of these genes have human homologs and are 
related to cancer. 

3. Suppressor genes and nonsense suppressors. These 
are mutations in genes that reverse the phenotype of 
mutations in other genes. A nonsense suppressor 
reverses the phenotype of any mutation that is 
caused by a nonsense mutation. 

4. Genes controlling fermentation of various sugars. 
Saccharomyces cerevisiae ferments many sugars. 
The genes for fermentation of sucrose, maltose, 
and raffinose are dominant and polymeric, that is, 
cells with any one of several functional genes can 
ferment the sugar. 

5. Mutants in genes that control the synthesis of the 
cell wall. Yeast has a wall composed mostly of 
mannan and glucan, and these polysaccharides are 
in turn synthesized by sets of genes. 

6. Mutants in genes that control mitochondrial func- 
tion. The mitochondria are autonomously replicat- 
ing, yet they depend on nuclear genes for the 
synthesis of many of their components. 

7. Mutants that affect mating. In addition to the 
mating-type locus and the homothallism genes are 
several other genes that affect the expression of the 
mating-type genes. 

8. Mutants that are unable to sporulate (meiotic 
mutants). Meiosis is a complex process and many 
genes control this sequence of events. 
9. Mutants that affect the uptake of various nutri-
ents. The uptake of nutrients into the cell is com-
plex and involves many genes. There are general 
amino acid permeases and specific permeases. 

10. There are many genes controlling the resistance to 
various toxic materials such as toxic metals, amino 
acid analogs, and fungicides. 


The nucleotide sequence of this organism was com- 
pleted in 1996 and this was the first eukaryotic genome 
to be sequenced. Methods developed in this project 
have since been applied to the sequencing of other 
organisms, including humans. 


Further Reading 

Broach J, Pringle J and Jones E (eds) (1997) The Molecular and 
Cellular Biology of the Yeast Saccharomyces. Plainview, NY: 
Cold Spring Harbor Laboratory Press. 
The physical size of the yeast chromosomes pre-
cludes their individual visualization with the light
microscope. The development of pulsed-field gel elec-
trophoresis solved this problem. Individual chromo-
some bands are resolved by this method and these
bands can be sized. The method allowed comparisons
of different species of yeast, chromosome polymorph-
isms, translocations, and mapping of genes to chromo-
somes. The complete nucleotide sequence of the
Saccharomyces cerevisiae genome revealed the precise
sizes of the chromosomes (Table 1) and many add-
itional features of them (14 Mb DNA in total). In
general this information corroborates earlier genetic
data indicating the number of chromosomes (16), the
order of certain genes on them, and the locations of
centromeres.

Table 1 The sizes of the 16 chromosomes of
Saccharomyces cerevisiae

Chromosome    DNA (bp)
I               230 203
II              813 140
III             316 613
IV            1 531 929
V               576 869
VI              270 148
VII           1 090 9365
VIII            562 639
IX              439 885
X               745 440
XI              666 445
XII           1 078 172*
XIII            924 430
XIV             784 328
XV            1 091 283
XVI             948 061

* Plus rDNA. In addition, the mitochondrial DNA is 85 779 bp.


Structure of Chromosomes 


Centromeres 

The 16 centromeres of S. cerevisiae are approximately 
300 bp in length. These sequences bind a protein that is 
part of the kinetochore assembly. It is to this structure 
that the spindle fibers attach. There appears to be one 
spindle fiber per centromere. The centromeres are 
essential for proper chromosome assortment in both 
mitosis and meiosis. Genes that are near centromeres 
tend to segregate in the first meiotic division, and this 
is how centromeres were identified and mapped. 


Telomeres 

Telomeres are protein-DNA structures at the termini 
of linear chromosomes. The telomere appears to act as 
a repressor of genes in the subtelomeric regions. The 
genes located in this region include all 25 of the poly- 
meric fermentation genes, the homothallism gene 
HO, and several of the glycolytic genes. The telo- 
meres are special structures whose integrity is main- 
tained by the enzyme telomerase. Shortening of 
telomeres is proposed as one of the mechanisms that 
control the life span of cells. 


Replication Origins

DNA replication origins are located in intergenic
regions and are flanked by open reading frames.
Chromosome III has seven such origins. They are
referred to as autonomously replicating sequences
(ARSs).


Nucleosomes 

The basic unit of the chromosome fiber is the nucleo- 
some. It is composed of two each of four histone 
molecules, H2A, H2B, H3, and H4. Histone H1 
seems to be involved in wrapping DNA around this 
octamer. About 200 bp of DNA are associated with
each nucleosome, which is approximately 10 nm in
diameter.
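Combining the roughly 200 bp per nucleosome given above with the 14 Mb total genome size quoted in the chromosome-size discussion gives an order-of-magnitude count of nucleosomes per haploid genome. The sketch assumes uniform packaging and ignores nucleosome-free regions, so the result is only a rough estimate.

```python
GENOME_BP = 14_000_000        # "14 Mb DNA in total"
BP_PER_NUCLEOSOME = 200       # "about 200 bp of DNA ... with each nucleosome"
HISTONES_PER_OCTAMER = 2 * 4  # two each of H2A, H2B, H3, and H4

# Assumes every stretch of ~200 bp is packaged into one nucleosome.
nucleosomes = GENOME_BP // BP_PER_NUCLEOSOME
core_histones = nucleosomes * HISTONES_PER_OCTAMER

print(nucleosomes)    # 70000
print(core_histones)  # 560000
```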


Chromatin 

Chromatin is an assembly of chromosome fibers that 
are considerably supercoiled. Functional portions of 
the chromatin are called euchromatin and nonfunc- 
tional parts are called heterochromatin. 


Spindle Fibers 

The spindle fibers form the connections between the 
centromeres and the centrioles. In yeast there is one 
fiber per centromere. The fibers pull the sister chro- 
matids apart during anaphase and, because the 
chromatids are intertwined, this process involves the 
action of a topoisomerase. 


Centrioles 

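Centrioles are the other attachment points of the spin-
dle fibers. They form the poles of the cell division
process. Yeast lacks centrioles; in S. cerevisiae this
role is played by the spindle pole bodies, which are
embedded in the nuclear envelope.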


Artificial Chromosomes 

The realization that the essential features of a chromo- 
some were a centromere, two telomeres, a replication 
origin, and sufficient DNA to form a chromosome, 
led to the development of yeast artificial chromo- 
somes (YACs). A vector with most of these essential 
components and a cloning site was developed, and 
then DNA from any source could be cloned into the 
cloning site. All that remained was to linearize this 
molecule and transform it into yeast. YACs are used in 
many ways. For example, they have been used to clone 
the DNA of humans, Drosophila spp., and other 
organisms as part of the effort to sequence the 
genomes of these organisms. 
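The requirements listed above can be expressed as a simple checklist over a construct described as a left-to-right list of elements. The element labels ("TEL", "CEN", "ARS", "insert") and the function name are hypothetical; the one-centromere check reflects the text's "a centromere", and the requirement for sufficient DNA is not modeled because no size threshold is given here.

```python
from typing import List


def check_yac(elements: List[str]) -> List[str]:
    """Return the problems with a construct listed from left end to right end
    (an empty list means every checked requirement is met).

    Hypothetical labels: "TEL" telomere, "CEN" centromere,
    "ARS" replication origin, "insert" cloned DNA.
    """
    problems = []
    if len(elements) < 2 or elements[0] != "TEL" or elements[-1] != "TEL":
        problems.append("needs a telomere at each end (linearize the vector)")
    if elements.count("CEN") != 1:
        problems.append("needs exactly one centromere")
    if "ARS" not in elements:
        problems.append("needs a replication origin (ARS)")
    if "insert" not in elements:
        problems.append("needs cloned DNA in the cloning site")
    return problems


# Linearized vector carrying an insert: passes every check.
print(check_yac(["TEL", "ARS", "CEN", "insert", "TEL"]))  # []
# Same parts before linearization: the telomeres are internal, not terminal.
print(check_yac(["ARS", "CEN", "TEL", "TEL", "insert"]))
```

The second call reports only the missing terminal telomeres, mirroring the step in which the vector must be linearized before transformation into yeast.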


Segregation of Chromosomes


Mitotic Division

Mitosis occurs in haploid, diploid, and higher-ploidy
yeast cells and produces two cells. There are four
steps in mitosis: prophase, metaphase, anaphase, and
telophase. Fluorescence in situ hybridization has been
used to study chromosome movement in S. cerevisiae
mitosis; the features observed were like those seen in other
eukaryotes. Mitotic crossing over can occur both within and
between genes in diploid and higher-ploidy yeast
cells.


Meiotic Division 

Meiosis occurs in diploid and higher-ploidy cells. The 
process involves pairing of homologous chromo- 
somes, recombination between these chromosomes, 
and two special divisions, meiosis I and meiosis II, 
which sort the recombined chromatids into four 
spores. Recombination seems to initiate at specific 
sites along the chromosomes, and double-strand 
breaks occur at these sites to initiate recombination. 


Recombination Nodules 

Recombination nodules are proteinaceous structures, 
approximately 100 nm in diameter, which are asso-
ciated with the synaptonemal complexes in early
meiotic prophase I. RAD51 and DMC1 are yeast
genes involved in this process. Their protein products
have homologs in mice and lily and are related to the RecA
protein of bacteria. S. cerevisiae has about 75 recom-
bination nodules in meiotic prophase.


Synaptonemal Complex 

Synaptonemal complexes (SCs) are structures formed 
in the first meiotic prophase when homologous 
chromosomes pair. Recombination nodules are spaced 
along these structures where crossing over occurs. 
Several gene products are needed for proper meiosis,
including Zip1. This protein occurs as a dimer with its
N-termini in the central region of the SC and its
C-termini in the lateral regions.


Meiotic Recombination 

Shorter yeast chromosomes have a higher frequency 
of crossing-over per kilobase compared with longer 
chromosomes. In an examination of 10 genetically
studied organisms, it was found that the frequency
of crossing-over per unit of physical size (megabase of
DNA) was inversely related to the average physical
size of the chromosomes (megabase per chromosome;
see Figure 1). S. cerevisiae has the highest frequency
of crossing-over per unit of physical size of all these 
organisms. The frequencies of crossing-over per 
megabase vary over a 10°-fold range. Within this 
range there is a small number of exchanges per 
chromosome, which must be under evolutionary con- 
trol in all these organisms. 
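For S. cerevisiae, the figures quoted elsewhere in this document (a map length of about 4500 cM, about 14 Mb of DNA, 16 chromosomes) give a rough average recombination rate and the log of the average chromosome size. Because the 14 Mb includes the rDNA, this is an approximation and not necessarily the value used to draw Figure 1.

```python
import math

MAP_LENGTH_CM = 4500   # total genetic map length quoted for S. cerevisiae
GENOME_MB = 14.0       # "14 Mb DNA in total" (includes rDNA)
N_CHROMOSOMES = 16

cm_per_mb = MAP_LENGTH_CM / GENOME_MB          # map units per megabase
kb_per_cm = 1000 * GENOME_MB / MAP_LENGTH_CM   # DNA per map unit
mb_per_chr = GENOME_MB / N_CHROMOSOMES         # average chromosome size

print(f"{cm_per_mb:.0f} cM/Mb, {kb_per_cm:.1f} kb/cM")  # 321 cM/Mb, 3.1 kb/cM
print(f"log10(Mb/chr) = {math.log10(mb_per_chr):.3f}")  # log10(Mb/chr) = -0.058
```

On these averages roughly 3 kb of yeast DNA corresponds to 1 cM, which is in line with S. cerevisiae sitting at the high end of the crossing-over range described above.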
Figure 1 Frequency of crossing-over in relation to
chromosome size (axis label: log (Mb/chr)). (co, crossovers; chr, chromosome.)
Points 1-10 represent various organisms.


Organization of Genes on Chromosomes


Functionally related genes in yeast generally are not 
linked to each other, in contrast to the usual situation 
in prokaryotes. However, there are some notable 
exceptions. Three complementation groups of muta- 
tions blocking histidine biosynthesis are linked, and 
these ‘genes’ HIS4A, -B, and -C control three en- 
zymatic steps in biosynthesis of this nutrient. How- 
ever, a single protein is encoded by these genes and it 
has all three enzymatic activities. Similarly, five com- 
plementation groups of mutations blocking aromatic 
amino acid biosynthesis are clustered; the gene ARO1 
encodes a pentafunctional polypeptide. On the other 
hand, the three separately transcribed genes in gal- 
actose fermentation are tightly linked and control 
three steps in this process. As mentioned earlier, the 25 
polymeric fermentation genes controlling the fermen-
tation of sucrose, maltose, melibiose, and α-methyl
glucoside are located in the subtelomeric regions of 
several chromosomes, and many combinations of 
these genes are linked. 


Further Reading 

Broach J, Pringle J and Jones E (eds) (1997) The Molecular and 
Cellular Biology of the Yeast Saccharomyces. Plainview, NY: 
Cold Spring Harbor Laboratory Press. 


See also: Chromosome; Chromosome Structure; 
Saccharomyces cerevisiae (Brewer’s Yeast) 
Salmonella typhimurium (Salmonella
enterica) — A Genetic System in a 
Pathogenic Bacterium 


Salmonella typhimurium strain LT2 has been the sub- 
ject of detailed genetic analysis since the discovery of 
transduction (virus-mediated genetic exchange) in this 
organism. By applying both transductional and con- 
jugational crosses, a genetic map was constructed 
and many aspects of physiology have been genetically 
investigated. The parallel development of genetics in a 
sister species (Escherichia coli, strain K12) has pro- 
vided a situation for comparison of their shared gene 
systems and for analysis of the process of bacterial 
speciation. The Salmonella genetic system takes on 
added importance in that many Salmonella isolates 
are important pathogens and genetics can be used 
to investigate mechanisms of virulence. Salmonella 
typhimurium causes a typhoid-like fever in mice; it is a
less serious pathogen for humans but is the causative 
agent of common food-borne enteric infections. 


Taxonomy of Salmonella 


The bacterium Salmonella typhimurium strain LT2 
belongs to the general group of enteric bacteria, 
which includes other organisms that have been in- 
tensively studied: E. coli, Klebsiella, and Citrobacter. 
Salmonella was recognized long ago as a disease 
organism (Eberth, 1880) and many isolates, distin- 
guished largely on serological criteria, were given 
species names within the genus Salmonella (e.g., 
S. typhimurium, S. typhi, S. dublin). Subsequent bio- 
chemical and genetic characterization revealed that 
these serovars are very closely related and so the 
many Salmonella species were recently combined 
into a new, more broadly inclusive group Salmonella 
enterica (Le Minor and Popoff, 1987). According to 
this nomenclature, the strain used in genetic analysis is 
designated Salmonella enterica serovar typhimurium 
strain LT2. For a discussion of evolutionary genetics
of Salmonella and its divergence from E. coli, see
Selander et al. (1996) and Lawrence and Roth (1999).


Phage P22 and Transductional Crosses 


The development of Salmonella as a genetic system
was made possible by discovery of the generalized
transducing phage P22 (Zinder, 1992), which permits
genetic crosses between Salmonella strains. Phage 
P22, a close relative of the E. coli phage lambda, is a 
temperate phage that replicates its genome and then 
packages it into protein capsids by a headful-measuring 
mechanism. This mechanism makes the virus prone to 
occasional encapsulation of fragments (44 kb) of the 
host (Salmonella) chromosome. Virus particles with 
such a fragment (transducing particles) can inject bac-
terial DNA into a new host cell, allowing recombination
between the injected (transduced) fragment from the 
donor host and the chromosome of the new recipient 
bacterium. All regions of the Salmonella chromosome 
are transducible. Most genetic analysis in Salmonella is 
performed using P22-mediated transductional crosses. 
Salmonella is an attractive genetic system largely 
because its phage (P22) is easy to propagate, stable 
during storage, and an efficient transducer. Mutational 
modification of phage P22 has improved its usefulness 
by increasing its transducing frequency and preventing 
stable lysogeny so that phage-free recombinants can be 
obtained for subsequent crosses. The biology of phage 
P22 has been reviewed by Susskind and Botstein (1978) 
and the general process of transductional genetic 
analysis has been surveyed by Masters (1996). 
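A common way to relate P22 cotransduction frequencies to physical distance is Wu's model, which assumes that the ends of transduced fragments fall at random along the chromosome. For two markers d kb apart carried on fragments of length L, the expected cotransduction frequency is C = (1 - d/L)^3. The sketch uses the ~44 kb fragment size given above as L; this model is not described in this entry, and because fragment ends are not truly random, real data can deviate from it.

```python
def cotransduction_frequency(d_kb: float, fragment_kb: float = 44.0) -> float:
    """Expected cotransduction frequency of two markers d_kb apart,
    under Wu's model C = (1 - d/L)**3 with L the fragment length."""
    if d_kb < 0 or fragment_kb <= 0:
        raise ValueError("d must be non-negative and L positive")
    if d_kb >= fragment_kb:
        return 0.0  # both markers cannot fit on one transduced fragment
    return (1 - d_kb / fragment_kb) ** 3


def distance_from_frequency(c: float, fragment_kb: float = 44.0) -> float:
    """Invert Wu's model to estimate marker separation: d = L * (1 - C**(1/3))."""
    if not 0 < c <= 1:
        raise ValueError("cotransduction frequency must be in (0, 1]")
    return fragment_kb * (1 - c ** (1 / 3))


print(cotransduction_frequency(22.0))            # 0.125
print(round(distance_from_frequency(0.125), 3))  # 22.0
```

Markers half a fragment length apart are cotransduced only about one time in eight under this model, which is why transductional crosses suit fine-structure mapping while conjugation is used for long-range mapping.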


Development of Salmonella as a Genetic 
System 


After the discovery of P22 and transduction, Milislav 
Demerec and his coworkers initiated development of a 
general genetic system in Salmonella. They isolated 
a variety of mutants, studied the genes involved in 
several metabolic pathways, and initiated chromo- 
some mapping. Using P22-mediated crosses for fine- 
structure mapping and conjugational crosses for 
long-range mapping, a detailed genetic map was devel-
oped (Sanderson et al., 1995). The genomic
DNA sequences of S. typhimurium and of S. typhi
are nearly complete, and several others are in progress.
Among the important findings during early work 
in Salmonella is the observation that genes with 
related functions are frequently clustered in the bac- 
terial chromosome (Demerec and Hartman, 1959). 
This later led to discovery of the operon (clusters of 
genes that are transcribed into mRNA from a single 
promoter start site). Recent work in Salmonella has 
suggested a mechanism for evolution of bacterial oper- 
ons based on selection for enhanced horizontal transfer 
of multiple genes that contribute to a single selectable 
phenotype (Lawrence and Roth, unpublished data). 
Major gene-enzyme systems developed in 
Salmonella are the histidine operon analyzed by 
Lawrence and Roth (1996) and the leucine operon. 
These systems contributed to the understanding of
operons and their expression and control, and to the
biochemistry of the individual synthetic pathways. 
One of the genetic methods developed initially in 
Salmonella was the use of transposable genetic elem- 
ents as a means of making mutations. The transpos- 
able element is a DNA sequence that includes a drug 
resistance determinant and can insert itself into genes 
of the bacterial chromosome. Mutants made by inser- 
tion of such transposons have two phenotypes, a 
dominant drug-resistance phenotype (conferred by 
the inserted material) and a recessive null phenotype 
caused by inactivation of the target gene. These muta- 
tions make it possible to transduce mutations select- 
ively into new genetic backgrounds by transductional 
crosses (Kleckner et al., 1977). In addition, insertion 
mutations add substantial sequences to various sites in 
the chromosome; selection for recombination between 
these sequences makes it possible to create a variety of 
genomic rearrangements (Roth et al., 1996a). Drug- 
resistance insertions make it possible to selectively 
clone many regions of the chromosome. Since the 
advent of PCR amplification of DNA, insertions have 
proved useful because they place known sequences at 
a variety of points in the chromosome that can help in 
amplification of particular chromosome regions. 


Transposon Tn10

One of the first transposable elements used in genetic 
analysis was Tn10, a tetracycline-resistance trans- 
poson discovered in Salmonella (Kleckner et al., 1978). 
A notable aspect of work in Salmonella has been the 
characterization of this transposon, its structure, and 
the mechanisms by which it transposes and causes 
chromosome rearrangements (Kleckner et al., 1991, 
1996). The tight regulation of Tn10’s transpositional
activity made this element an ideal tool for genetic 
analysis of bacteria. 


Analysis of Virulence 

The application of genetic methods to the study of 
pathogenicity revealed that the Salmonella chromo- 
some includes several separated blocks of genes 
(roughly 2% of the total chromosome) encoding func- 
tions that are of particular importance for invasion of 
hosts and evasion of host defense systems. These 
blocks of genes are known as ‘pathogenicity islands,’
some or all of which are present in various Salmonella
strains but are absent from E. coli (Finlay and Falkow, 
1989, 1997; Groisman and Ochman, 1997). 


Evolutionary Divergence of Salmonella 
enterica and Escherichia coli 


Despite the difficulties in the species concept for 
bacteria, taxonomic criteria make it reasonable to 
regard S. enterica and E. coli as independent species
that have diverged from a common ancestor living 
over 100 million years ago. Each of these species has 
been subjected to intense genetic analysis, including 
the determination of the complete genome sequence. 
The divergence of these species may be the first 
example of an act of speciation for which all genetic 
changes can be visualized. While the two species show 
a very low frequency of genetic exchange in nature 
(Maynard Smith et al., 1993), that low level has been 
important to their evolution. Early work on DNA 
hybridization, confirmed by sequence data, suggested 
that the genomes of S. enterica and E. coli share 
about 75% of their sequences. The genes that are 
present in one species but not the other must 
encode the species-specific attributes, including 
those used in taxonomic identification. Some species- 
specific genes show sequence characteristics suggest- 
ing that they entered the chromosome of S. enterica or 
E. coli after the divergence of the two organisms. 
These sequences were acquired after transfer from 
distantly related (presumably bacterial) species (i.e., 
by horizontal transfer). Other species-specific genes 
appear to be ancestral genes that were maintained by 
one lineage and lost from the other (Lawrence and 
Roth, 1996; Lawrence and Roth, 1999). Salmonella- 
specific genes include those for synthesis and 
use of vitamin B12, about 2% of the Salmonella gen-
ome (Roth et al., 1996b), and the horizontally
acquired pathogenicity islands mentioned above. 
Thus the divergence of S. enterica and E. coli appears 
to have occurred by assembly of distinct sets of genes 
through differential gene loss and acquisition in the 
two lineages (genomic flux) rather than by sequence 
divergence of ancestral genes or internal creation of 
new genes. 
Frederick Sanger (1918– ) pioneered two areas of bio-
chemistry research. First, he developed methods for
the determination of the amino acid sequences of
proteins, and after completing the structure of
insulin, he turned his interest to developing methods
for sequencing nucleic acids, first RNA and later
DNA. Progressing from the sequence of a small DNA
virus, ΦX174, of about 5.4 kb, to bovine mitochondrial
DNA, 17 kb, to the genome of the lambda bacterio-
phage (50 kb), Sanger pioneered what has now become
the genome revolution. He received one Nobel Prize
in Chemistry for his insulin work in 1958 and a second
one for his work on nucleic acids in 1980.


See also: DNA Sequencing 


Sarcomas 


C S Cooper 


Copyright © 2001 Academic Press 
doi: 10.1006/rwgn.2001.1617 


The term ‘sarcoma’ is used to describe a heteroge-
neous group of cancers that exhibit the differentiated
features of various supporting and skeletal tissues of
the body, such as smooth and striated muscle, fat,
fibrous tissue, vessels, bone, and cartilage. This
group constitutes a major histogenic class distinct
from neoplasms of epithelial origin (carcinomas),
of blood and lymphoreticular origin (leukemias or lym-
phomas), and of the central nervous system. They are
usually named according to the tissue they most
resemble, though in some cases, such as synovial
sarcoma, there is no clear normal tissue homolog
(Table 1).

Bone and soft tissue sarcomas account for ~2% of
human malignancies and 3-4% of cancer deaths, with
around 8000 cases diagnosed in the USA each year.
They include the bone tumors osteosarcoma and
chondrosarcoma (cartilage), as well as the soft tissue 
sarcomas liposarcoma (fat), leiomyosarcomas (smooth 
muscle), angiosarcomas (vessels), rhabdomyosarcomas 
(striated muscle), fibrosarcomas, Kaposi’s sarcoma, and 
malignant fibrous histiocytoma (MFH). Ewing’s sar- 
coma, a highly malignant primary bone tumor, and 
malignant peripheral nerve sheath tumors are also 
included. An additional tumor group is fibromatosis, 
a locally infiltrative but nonmetastasizing fibroblas- 
tic proliferation, which is conventionally divided 
into slow-growing superficial fibromatosis and fast- 
growing deep fibromatosis (desmoids). 

The major sarcoma classes and subclasses are listed
in Table 1. Sarcomas are relatively more common
in children, with rhabdomyosarcoma accounting for
6-7% of children’s cancers. Benign soft tissue tumors such as
lipomas and leiomyomas are up to 100 times
more common than their malignant counterparts, but
there is no evidence supporting the view that these
particular benign lesions are the precursors of malig-
nant sarcomas. Considerable differences are observed
in the age distribution of occurrence of individual
sarcoma classes and subclasses. The embryonal sub-
class of rhabdomyosarcoma occurs most frequently in
children under 15. Ewing’s sarcoma, alveolar rhabdo-
myosarcoma, and synovial sarcoma occur most fre-
quently in adolescents and young adults, while other
categories such as liposarcoma, osteosarcomas, and
malignant fibrous histiocytomas (MFHs) occur pre-
dominantly in adults.


Table 1 Major classes and subclasses of sarcomas

Angiosarcoma
Chondrosarcoma
    Myxoid
    Mesenchymal
    Dedifferentiated
Ewing’s sarcoma
Fibromatoses
    Superficial
    Deep (desmoid)
Fibrosarcomas
    Adult
    Congenital or infantile
Giant cell tumors (osteoclastomas)
Kaposi’s sarcoma
Leiomyosarcomas
Liposarcomas
    Well differentiated
    Myxoid
    Round cell
    Pleomorphic
    Dedifferentiated
Malignant fibrous histiocytoma
    Storiform-pleomorphic
    Myxoid
    Giant cell
Malignant peripheral nerve sheath tumors
Malignant gastrointestinal stromal tumors (GIST)
Osteosarcoma
Rhabdomyosarcomas
    Alveolar
    Embryonal
    Botryoid
    Pleomorphic
Synovial sarcomas
    Biphasic
    Monophasic
    Poorly differentiated


Etiology 


Several etiological factors have been suggested as pos- 
sible causes of sarcoma development. Infection with 
HIV and with herpesvirus type 8 is associated with
the development of Kaposi’s sarcoma. Trauma and
past injury as well as exposure to environmental
agents such as dioxin have also been proposed as risk
factors. In particular, the etiology of deep fibromatosis
(or desmoids) is thought to involve trauma since they 
have been reported to arise in surgical scars and bullet 
wounds. There is good evidence that exposure to 
radiation is a causal factor with around 0.1% of cancer 
patients who are treated with radiation and survive 
5 years developing bone or soft tissue sarcomas. 
Immunodeficiency or immunosuppression by drugs 
such as cyclosporin have been established as risk fac- 
tors for the development of a variety of sarcoma types. 
Sarcomas may also arise in other organisms but the 
histopathological classification is not as sophisticated 
as that in humans with most lesions being referred to 
simply as ‘sarcomas.’ Sarcomas may be induced in 
many species including rodents, cats, and chickens 
by type C retroviruses and often arise following treat- 
ment of rodents with chemical carcinogens. 

Several genetic diseases are associated with the
development of soft tissue and bone sarcomas (Table 2).
Deep fibromatosis can develop in patients with familial
adenomatous polyposis (FAP), in a condition known as
Gardner syndrome that often also includes benign soft
tissue tumors such as lipomas and leiomyomas. Pure FAP
and Gardner syndrome both result from germline mutation
of the APC gene, which encodes a protein that functions
in the regulation of cell growth. Mutations in APC
usually result in a truncated protein that can no longer
correctly regulate β-catenin. Detailed analysis of APC
mutations in FAP and Gardner patients failed to reveal
any correlation between the type of APC mutation and
desmoid formation. Indeed, individuals carrying a G>T
transversion in codon 309 of the APC gene have been found
either to show the range of symptoms associated with
Gardner syndrome or to have FAP with no evidence of
extracolonic manifestations. This observation suggests
that other genetic factors may control the conditions
specifically associated with the development of desmoids.


Table 2 Inherited predisposition to soft tissue sarcomas

Disorder                                 Sarcoma type                                Gene/location
Gardner’s syndrome                       Fibromatosis                                APC
Li-Fraumeni syndrome                     Rhabdomyosarcomas plus other types          p53, CHK2
Beckwith-Wiedemann syndrome              Rhabdomyosarcoma                            11p15
Von Recklinghausen’s neurofibromatosis   Malignant peripheral nerve sheath tumours   NF1
Paget’s disease                          Osteosarcoma                                18q12
Familial retinoblastoma                  Soft tissue sarcomas and osteosarcomas      RB1

The Li-Fraumeni syndrome is an autosomal dominant
familial cancer syndrome defined by a proband with
sarcoma together with first-degree relatives who have
developed cancer by the age of 45. The cancers associated
with this syndrome include early-onset soft tissue
sarcomas, osteosarcomas, breast cancers, brain cancers,
and leukemias. Germline mutations in the p53 or CHK2
genes have been implicated as the cause in many of these
families. Hereditary retinoblastoma, which usually
presents in children under 5 years of age, is caused by
germline mutations in the RB1 tumor suppressor gene. Of
patients with the hereditary form of this disease,
10-20% develop secondary tumors, usually osteosarcomas
and soft tissue tumors, later in life.

Von Recklinghausen’s neurofibromatosis (type 1
neurofibromatosis, NF1) is an autosomal dominant
disorder affecting around 1 in 3500 newborns. Its
characteristic features are café-au-lait spots, skin
neurofibromas, Lisch nodules, and orthopedic
abnormalities. Five percent of affected individuals
develop schwannomas of the peripheral nerves, and 2%
develop malignant peripheral nerve sheath tumors. The
disorder is caused by germline mutations in the NF1 gene,
which encodes a protein called neurofibromin that is
involved in controlling signaling through the
RAS pathway. 
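Combining the two NF1 figures above gives a rough birth incidence for NF1-associated malignant peripheral nerve sheath tumors. This is only a back-of-envelope estimate, and it assumes that the 2% figure applies uniformly to all individuals born with NF1, which the text does not state explicitly:

$$\frac{1}{3500} \times 0.02 = \frac{0.02}{3500} = \frac{1}{175\,000}$$

Under that assumption, on the order of one newborn in 175,000 would be expected to have NF1 and later develop a malignant peripheral nerve sheath tumor.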

The Beckwith-Wiedemann syndrome (BWS) is
characterized by abdominal wall defects, macroglossia,
and gigantism and is associated with an increased
risk of developing Wilms’ tumor, hepatoblastoma, and
rhabdomyosarcoma. The predisposing gene has been
localized to 11p15, and analysis of the syndrome
strongly suggests that the BWS gene is imprinted.
Insulin-like growth factor 2 (IGF2) and the
cyclin-dependent kinase inhibitor CDKN1C (p57/KIP2)
have been proposed as candidate genes for this disorder.

Osteosarcomas commonly develop against a background
of preexisting bone pathology such as Paget’s disease
of bone, multiple enchondromas, multiple osteochondromas,
chronic osteomyelitis, fibrous dysplasia, and bone
fractures. Paget’s disease accounts for around 10-15% of
osteosarcomas in individuals over 30, and a gene
responsible for the familial form of Paget’s disease has
been mapped to chromosome band 18q12.

The detection of inherited mutations that predispose
to cancer is extremely useful in the management of
sarcomas because it identifies individuals at high risk.
For example, Li-Fraumeni families can now be screened
for mutations in the p53 and CHK2 genes to identify
carriers at high risk of developing sarcoma.


Pathology and Prognosis 


Despite the definition of major sarcoma categories in
the AFIP and WHO classification schemes, considerable
controversy still exists over the definition of some
sarcoma types. For example, although a decade ago
MFH was one of the most common categories, it has been
proposed that many, perhaps most, of these sarcomas
should be reclassified as pleomorphic liposarcomas,
leiomyosarcomas, or rhabdomyosarcomas. Because of this
and other problems with the differential diagnosis of
sarcoma, there is a continued need to search for markers
that will improve diagnostic reliability. Histopathological
categories and subcategories commonly define diseases
with distinct clinical behavior and prognosis.
Liposarcomas, for example, can be divided into myxoid,
round-cell (poorly differentiated), well-differentiated,
dedifferentiated, and pleomorphic forms. Their clinical
behavior closely reflects the state of differentiation:
round-cell tumors have a poorer prognosis than myxoid
tumors, while the well-differentiated subclass has the
best outcome. Synovial sarcoma may be divided into
monophasic and biphasic variants; calcification, an
indicator of a more favorable prognosis, is observed in
around 20% of cases. Poorly differentiated synovial
sarcomas, which are associated with more aggressive and
metastatic behavior, are also occasionally observed.
Several subclasses of rhabdomyosarcoma can be
distinguished, including embryonal, botryoid embryonal,
alveolar, and pleomorphic.

Osteosarcomas can vary widely in appearance and
site of occurrence but have in common an anaplastic
mesenchymal parenchyma punctuated by the formation
of osteoid matrix. These tumors are generally
aggressive, but their behavior can vary depending on
the predominant line of differentiation, such as
osteoblastic, telangiectatic, or chondroblastic. Pure
osteoblastoma, sometimes called giant osteoid osteoma,
is in fact usually benign. Similarly, pure
chondroblastomas commonly behave in a benign manner.
A variety of subtypes of chondrosarcoma can be
distinguished, including myxoid, mesenchymal, and
dedifferentiated. Although chondrosarcomas are in
general amenable to surgical removal and cure, the
dedifferentiated form is particularly aggressive.

A variety of factors in addition to histopathological
appearance can influence clinical outcome. Grading of
sarcomas is based on mitotic activity, extent of necrosis,
degree of cellularity, anaplasia, infiltrative growth,
matrix formation, calcification, and presence of
hemorrhage, although the first two of these factors
are usually the most important. A three-grade system
(I-III) has proven most useful for predicting survival
and likely treatment response, with the highest grade
faring worst. Histopathological subtype may also be used
to establish grade: Ewing’s sarcoma, osteosarcomas,
rhabdomyosarcoma, and round-cell and pleomorphic
liposarcomas are all considered high grade, whereas
well-differentiated and myxoid liposarcomas are
considered low grade.

The American Joint Committee on Cancer (AJCC) system
stages sarcomas based on the size and infiltration of the
primary tumor, involvement of lymph nodes, presence of
metastases, and the sarcoma type and grade. Small tumors
(<5 cm) without lymph node involvement or metastases
are usually stage I, while large tumors with lymph node
involvement and metastases represent the highest stage,
IV. The difference between these stages is dramatic:
most patients with stage I tumors, but fewer than 10% of
those with stage IV tumors, survive 5 years.


Clinical Evaluation and Treatment 


Once the histopathological diagnosis has been
determined, procedures including X-rays, CT scans, and
magnetic resonance imaging are used to determine the
sarcoma’s precise shape and degree of infiltration.
Surgical removal is the mainstay of treatment for most
sarcoma classes. One of the key problems with surgery
is the tendency of the tumor to recur locally. A wide
margin around the tumor is usually removed to minimize
recurrence, sometimes including whole muscle groups or
even requiring limb amputation. The precise site of the
sarcoma can determine how easily it can be surgically
removed and hence overall survival. Sarcomas in the
extremities are usually easier to treat, particularly
those located distally, as they are usually detected
when relatively small and are readily accessible.
Sarcomas at sites where removal is technically more
difficult, such as the head and neck or adjacent to vital
structures, may by comparison have a poor prognosis.
Surgery is often accompanied by radiotherapy, usually
postoperatively and particularly for tumors of higher
grade and stage. These treatments can achieve good
local control, but up to 50% of patients with
intermediate- or high-grade sarcomas may subsequently
develop metastatic disease. Adjuvant chemotherapy may
be used in these patients in an attempt to improve
overall survival. Drugs such as doxorubicin, ifosfamide,
vincristine, cisplatin, dactinomycin, dacarbazine,
cyclophosphamide, and methotrexate are used in these
treatments, and for some sarcomas, such as osteosarcomas
of the extremities, they may be given preoperatively.
Although chemotherapy is not curative for most adult
sarcomas, certain classes such as childhood
rhabdomyosarcoma, osteogenic sarcomas, and Ewing’s
sarcoma respond much better. HIV-infected individuals
with Kaposi’s sarcoma may be treated with combination
antiretroviral therapy consisting of indinavir, which
targets the viral protease, and one or more viral
reverse-transcriptase inhibitors.


Cytogenetic Aberrations 


Cytogenetic studies have identified specific chromosome
translocations in several classes of sarcomas (Table 3).
These abnormalities can be found as the only cytogenetic
alteration, indicating that their formation may be a key
event in tumor development, and they usually occur in a
high proportion of tumors, offering the prospect of their
use in tumor diagnosis. A consistent feature of these
translocations is that they result in the fusion of genes
located at two different cytogenetic locations. The new
chimeric genes created by the translocation are expressed
as chimeric transcripts, which in turn encode novel fusion
proteins.


Table 3 Major translocations in human sarcomas

Tumor                                   Translocation              Genes
Synovial sarcomas                       t(X;18)(p11.2;q11.2)       SYT, SSX1, SSX2
Myxoid and round-cell liposarcomas      t(12;16)(q13;p11)          FUS/TLS, CHOP
Ewing’s sarcoma                         t(11;22)(q24;q12)          EWS, FLI1
Childhood fibrosarcoma                  der(15)t(12;15)(p13;q25)   ETV6/TEL, NTRK3
Rhabdomyosarcoma                        t(2;13)(q35;q14)           PAX3, FKHR
                                        t(1;13)(p36;q14)           PAX7, FKHR
Extraskeletal myxoid chondrosarcoma     t(9;22)(q22;q12)           EWS, CHN/TEC
Clear-cell sarcomas                     t(12;22)(q13;q12)          EWS, ATF1
Desmoplastic round-cell tumors          t(11;22)(p13;q12)          EWS, WT1
Dermatofibrosarcoma                     t(17;22)(q22;q13)          PDGFB, COL1A1


The t(X;18)(p11.2;q11.2) translocation, detected in
practically all monophasic and biphasic variants of
synovial sarcoma, was found to result in the fusion of
the SYT gene on chromosome 18 to either of two closely
related genes, SSX1 and SSX2, located at Xp11.2. The
predicted protein encoded by the SYT-SSX transcripts most
commonly consists of the 379 N-terminal amino acids of SYT
(lacking the final eight C-terminal amino acids) fused to
the 78 C-terminal amino acids of either SSX1 or SSX2.
Reverse transcriptase PCR detection of this diagnostic
translocation is particularly useful in distinguishing
the monophasic spindle cell subtype of synovial sarcoma
from other spindle cell tumors within its differential
diagnosis, such as fibrosarcoma, leiomyosarcoma,
malignant peripheral nerve sheath tumors, MFHs, and
hemangiopericytomas. The SYT-SSX1 and SYT-SSX2 fusions
have prognostic importance, since the metastasis-free
survival time is significantly longer in cases involving
fusion to SSX2. 
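The residue counts given for the SYT-SSX product can be checked with simple arithmetic. This assumes, as the text states, that the fusion retains 379 N-terminal SYT residues (all but the final eight) and the 78 C-terminal residues of SSX1 or SSX2; it covers only the most common fusion form:

$$\ell_{\mathrm{SYT}} = 379 + 8 = 387, \qquad \ell_{\mathrm{SYT\text{-}SSX}} = 379 + 78 = 457$$

So full-length SYT would be 387 residues, and the most common predicted fusion protein about 457 residues; both numbers are derived only from the figures in the preceding paragraph.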

The consistent translocation t(12;16)(q13;p11)
identified in liposarcomas fuses the N-terminal
transcription activation domain of FUS/TLS to the
entire open reading frame of CHOP, a protein normally
involved in the cellular response to endoplasmic
reticulum stress. Interestingly, this fusion is found
both in myxoid liposarcoma and in a diagnostically and
prognostically distinct category designated round-cell
liposarcoma. Myxoid liposarcomas are rarely metastatic
and are associated with favorable survival, while
round-cell tumors are usually highly
metastatic and high-grade. 
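Table 3 and the surrounding text write each translocation in ISCN notation: t(A;B) names the two chromosomes involved, and the second pair of parentheses gives the breakpoint on each, as an arm (p, short; q, long) followed by a band number. As a reading aid, the Python sketch below splits such strings into their two breakpoints. The class and function names are hypothetical, and the pattern deliberately covers only simple two-chromosome translocations (optionally prefixed by der(...)), not the full ISCN grammar.

```python
import re
from typing import NamedTuple

# Hypothetical helper (not from the source): accepts only simple two-chromosome
# translocations such as "t(X;18)(p11.2;q11.2)" or "der(15)t(12;15)(p13;q25)".
# Three-way and other complex ISCN rearrangements are not handled.
_TRANSLOCATION = re.compile(
    r"(?:der\((?P<der>[0-9XY]+)\))?"                               # optional derivative chromosome
    r"t\((?P<chr1>[0-9XY]+);(?P<chr2>[0-9XY]+)\)"                  # the two chromosomes involved
    r"\((?P<bp1>[pq]\d+(?:\.\d+)?);(?P<bp2>[pq]\d+(?:\.\d+)?)\)"   # one breakpoint per chromosome
)


class Breakpoint(NamedTuple):
    chromosome: str  # e.g. "X" or "18"
    arm: str         # "p" (short arm) or "q" (long arm)
    band: str        # region and band, e.g. "11.2"

    def label(self) -> str:
        """Conventional band label, e.g. 'Xp11.2'."""
        return f"{self.chromosome}{self.arm}{self.band}"


def parse_translocation(notation: str) -> tuple[Breakpoint, Breakpoint]:
    """Split a simple translocation string into its two breakpoints."""
    m = _TRANSLOCATION.fullmatch(notation.replace(" ", ""))
    if m is None:
        raise ValueError(f"unsupported notation: {notation!r}")
    return (
        Breakpoint(m["chr1"], m["bp1"][0], m["bp1"][1:]),
        Breakpoint(m["chr2"], m["bp2"][0], m["bp2"][1:]),
    )


if __name__ == "__main__":
    for t in ("t(X;18)(p11.2;q11.2)", "t(12;16)(q13;p11)", "der(15)t(12;15)(p13;q25)"):
        first, second = parse_translocation(t)
        print(t, "->", first.label(), "and", second.label())
```

Run as a script, it prints one line per example, for instance t(X;18)(p11.2;q11.2) -> Xp11.2 and 18q11.2, which matches the location of SSX1 and SSX2 at Xp11.2 given in the text. Unsupported strings raise ValueError rather than being guessed at.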

The translocations t(2;13)(q35;q14) and t(1;13) 
(p36;q14) are found in the majority of alveolar rhab- 
domyosarcomas and in a low proportion of tumors 
diagnosed as embryonal rhabdomyosarcomas. These 
rearrangements result in the fusion of N-terminal 
paired box and homeodomains of either PAX3 
or PAX7 to the C-terminal domain of the fork head 
protein FKHR. The aberrant PAX3-FKHR and 
PAX7-FKHR transcription factors appear to disrupt 
the normal process of muscle differentiation leading to 
tumor development. 

A t(11;22)(q24;q12) translocation is found in
approximately 80% of Ewing’s sarcomas and also in
neuroepithelioma and Askin’s tumors, indicating a
common histogenesis of these tumors. At the molecular
level, this translocation fuses the N-terminal EWS
transcriptional activation sequence to the FLI1
DNA-binding domain. Variants of this fusion have been
observed in Ewing’s sarcomas in which the EWS gene
becomes fused to other Ets family members, including
the ERG, ETV1, E1AF, and FEV genes. Interestingly, one
of the splicing variants of EWS-FLI1 (called type 1)
appears to define a clinically favorable subset of
Ewing’s sarcoma. The EWS gene also becomes fused to
the ATF1 transcription factor gene in clear-cell sarcoma
(also known as malignant melanoma of soft parts) and to
the WT1 gene in desmoplastic small round-cell tumors, a
primitive sarcoma with desmoplastic and multilineage
differentiation.

The recurrent translocation t(9;22)(q22;q12) found
in extraskeletal myxoid chondrosarcoma also involves
fusion of the EWS gene, in this case to the CHN/TEC
gene, which encodes a novel orphan nuclear receptor
containing a zinc finger DNA-binding domain. A
variety of other rare fusions have been found,
including the fusion of the platelet-derived growth
factor B (PDGFB) gene and the collagen COL1A1 gene in
dermatofibrosarcoma, and the fusion of the ETV6 (TEL)
gene, an Ets family member, to the NTRK3
neurotrophin-3 receptor gene in childhood fibrosarcoma.


Oncogene and Suppressor Genes 


A variety of somatically acquired alterations in
oncogenes and suppressor genes have been observed in
sarcomas. Alterations of several genes in the cellular
control pathway that regulates RB1 status have been
observed, including mutation and loss of the RB1 gene
itself, loss of the CDKN2A/p16, p14, and p15 genes, and
amplification and overexpression of the CDK4 and
cyclin D1 genes. Alterations have also been found in the
p53 suppressor gene pathway. In addition to p53
mutations, which are found in almost all classes of
sarcomas, amplification and overexpression of the MDM2
proto-oncogene, which encodes a protein that can bind to
and inhibit the growth-regulating function of p53, is
found in up to a third of both bone and soft tissue
sarcomas. An interesting feature of these alterations is
that mutations of the RB1 and p53 genes are often found
together in individual sarcomas, indicating that
cooperation may occur between these abnormalities.
Activation of members of the RAS gene family by point
mutations at codons 12, 13, and 61 has been found in some
sarcomas, such as rhabdomyosarcomas and leiomyosarcomas,
while amplification and overexpression of a selection of
genes, including the SAS gene and MYC gene family
members, have also been implicated in sarcoma
development. In addition to p53 and RB1, other genes
implicated in inherited predisposition to sarcoma have
also been found altered in sporadic lesions. For example,
mutations in the APC gene have been observed in sporadic
desmoid tumors. Interestingly, a high frequency of
alteration of the APC-interacting protein β-catenin has
also been observed in this sarcoma. Molecular cytogenetic
studies have identified many other consistent regions of
chromosomal gain in sarcomas, including amplification of
1q21-22 sequences in soft tissue tumors and of 6q and 17p
sequences in osteosarcoma.


Future Perspectives 


Many clinical problems remain to be addressed. For 
some sarcoma categories precise diagnosis remains 
difficult or controversial. There is an urgent need for 
new clinical markers that will predict how tumors 
may behave or whether they will be drug resistant. 
Survival for many sarcoma classes remains poor, and
more effective drugs are urgently required. Finally,
much still remains to be learnt about the molecular
mechanisms of development of these tumors. Current
technological advances including (1) the completion 
of the Human Genome Project that will lead to the 
identification of all human genes, (2) the development 
of microarray technology for analyzing many thou- 
sands of genes simultaneously, (3) the initiation of the 
Cancer Genome Project that will screen major human 
tumors including sarcomas for alterations at all genes 
in the human genome, and (4) the development of new 
drugs that target key cellular control pathways are 
likely to have a major impact on sarcoma management 
over the next decade. 
‘Chromosomal satellite’ is the term given to that part
of the end of a chromosome which is separated from 
the rest of the chromosome by a secondary constric- 
tion. (The primary constriction refers to the region of 
the chromosome occupied by the centromere.) It 
seems that chromosomal satellites were first described 
by Russian cytologists early in the twentieth century 
who used the term ‘sputnik.’ The secondary constriction
marks the site of the nucleolar organizer, a region
containing multiple copies of the ribosomal genes (see 
Nucleolus). This region remains attached to the 
nucleolus during interphase, and nucleolar remnants 
remaining on the chromosome may lead the chro- 
matin fiber containing the ribosomal genes to persist 
as a thin thread at metaphase. 

In the human karyotype there are five pairs of 
acrocentric chromosomes (13, 14, 15, 21, and 22) 
which carry nucleolar organizers on their short arms. 
Not all are active in any one cell, as revealed by silver 
staining of the ribosomal nucleoprotein at metaphase. 
Their activity determines whether or not a satellite 
is formed. The satellite itself is largely composed of 
families of repetitive DNA (heterochromatin) usually 
without the presence of transcribed genes. The satel- 
lite is variable in size, and particularly large or dupli- 
cated satellites may often be observed as familial traits, 
inherited through many generations of a pedigree. The 
satellite region may also be deleted, and these variants 
as well as the large heteromorphisms are without 
phenotypic effect on the carrier individual. 
During interphase the nucleolar organizers of several
satellited chromosomes may together form a com-
mon nucleolus. In the following metaphase, with the 
dissolution (disassembly) of the nucleolus, these 
chromosomes may be observed still to be attached to 
one another by their short arms, a phenomenon de- 
scribed as ‘satellite association,’ and an example of the 
nonrandom arrangement of chromosomes during the 
cell cycle. It is believed that this close association may 
favor recombination between nonhomologous acro- 
centric chromosomes leading to the formation of 
Robertsonian translocations. Robertsonian transloca- 
tions, particularly those between chromosomes 13 and 
14, and 13 and 21, are the commonest chromosomal 
rearrangements found in humans (see Robertsonian 
Translocation). 


See also: Human Chromosomes; Nucleolus; 
Robertsonian Translocation 


SCE (Sister Chromatid Exchange)
J Hodgkin 
Sister chromatid exchange (SCE) is a form of exchange,
equivalent to recombination, between the two replicat- 
ing or replicated chromatids of a chromosome, which 
can occur at mitosis as well as at meiosis. When chro- 
matids are differentially labeled, as in the harlequin 
chromosome technique, SCE events can be seen as dis- 
continuities in the uniform labeling of each chromatid. 


See also: Chromatid; Harlequin Chromosomes 
Origin of the Term


‘Schizophrenia’ is the term that was introduced by
E. Bleuler to refer to a ‘group of diseases’ that had
been identified as ‘dementia praecox’ by E. Kraepelin
at the end of the nineteenth century. It denotes that
group of ‘psychoses’ (illnesses characterized by the
presence of delusions, hallucinations, and thought
disorder) that have a poor outcome. Outcome was
the criterion on which Kraepelin depended for his
definition. Bleuler argued that there was a fundamental
disturbance of thinking: a ‘loosening of associations.’
Neither criterion is well defined.

Arguably the entity is as well established as the 
lynchpin of psychiatric thought at the millennium as 
it was at the end of the nineteenth century. Other 
conditions, a fortiori the affective psychoses (dis- 
orders with psychotic symptoms in which mood 
disturbance predominates), are defined by reference to 
schizophrenia — these are diagnoses to be considered 
when schizophrenia has been excluded. The classical 
Kraepelinian concept that ‘schizophrenia’ (with psy- 
chotic symptoms that cannot be explained in terms of 
a primary mood change) and ‘manic-depressive ill- 
ness’ (in which the mood change is primary) constitute 
separate disease entities is central to almost all discus- 
sions of psychiatric diagnosis and practice. The struc- 
ture of textbooks and examinations is built around it. 


Categories or Dimensions? 


But there are serious doubts about whether either 
Bleuler’s or Kraepelin’s concept defines a real disease 
entity. The fundamental flaw was apparent to 
Kraepelin. It is simply that the concept has no bound- 
aries. In 1920 Kraepelin wrote of: 


the difficulties which prevent us from distinguishing reliably 
between manic-depressive insanity and dementia praecox. 
No experienced psychiatrist will deny there is an alarmingly 
large number of cases in which it seems impossible, in spite 
of the most careful observation, to make a firm diagnosis 
...it is becoming increasingly clear that we cannot distin- 
guish satisfactorily between these two illnesses and this 
brings home the suspicion that our formulation of the 
problem may be incorrect. 


The failure of the concept was demonstrated by 
Endicott and coworkers (1982) when they applied 
seven different sets of operational criteria to a consecut- 
ive series of 46 patients meeting any of the criteria for a 
diagnosis of schizophrenia admitted to the Psychiatric 
Institute in New York. By the most liberal criterion 44 
patients were diagnosed as suffering from schizophre- 
nia; by the most restrictive it was only six. Yet all of 
these criteria can be traced back through Bleuler to 
Kraepelin. Something is fundamentally wrong with 
the concept. What it is, is clear from Endicott et al.’s 
seminal contribution and from the direction of recent 
research on psychosis. Endicott et al. showed that the 
differences between different sets of criteria are to a 
large extent related to whether or not patients with an 
affective component to their illness are included. By
the more liberal criteria, some patients who would be
diagnosed as manic-depressive by the Research
Diagnostic Criteria (RDC) are given a diagnosis of
schizophrenia.
Much of the recent literature, whether nosological, 
pathophysiological, or genetic, is consistent with the 
concept that psychotic disorders represent continua 
rather than discrete disease entities. 
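The spread reported by Endicott and coworkers can be restated as proportions of their 46-patient series. This is simple arithmetic on the counts given above and adds no data beyond them:

$$\frac{44}{46} \approx 0.957 \;(95.7\%), \qquad \frac{6}{46} \approx 0.130 \;(13.0\%), \qquad \frac{44}{6} \approx 7.3$$

In other words, the most liberal criterion labeled roughly seven times as many of the same patients schizophrenic as the most restrictive one.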


Brain Changes and Syndromes 


An important conclusion from imaging and post- 
mortem studies is that the psychoses are associated 
with structural changes in the brain. For example 
there is a degree of enlargement of the cerebral 
ventricles and there may be a modest reduction in 
cortical mass. Such changes are not specific to schizo- 
phrenic illnesses but occur also, probably to a lesser 
degree, in the affective psychoses. Thus the presence 
of structural change must be integrated within the 
concept of a continuum. 

But the nature of the continuum is elusive. Much 
recent work has focused on dimensions of psycho- 
pathology. One concept was that there are separate 
dimensions of positive (features such as delusions and 
hallucinations that are pathological by their presence) 
and negative symptoms (features such as poverty of 
speech or flattening of affect that represent loss of 
function) within a disease entity of ‘schizophrenia’ 
and that these syndromes had separate underlying 
pathologies — a neurochemical and a structural com- 
ponent respectively. But it is clear that such syn- 
dromes are not limited to illnesses that can be labeled 
schizophrenic in the original Kraepelinian sense. The 
existence of such variation raises a question: if we
really had discrete disease entities, why would we also
have dimensions? What could these be other than a 
single dimension of severity? Alongside this issue 
must be placed the body of evidence that symptoms 
apparently characteristic of psychosis are common in 
the general population, i.e., in people who are never 
formally considered either by themselves or others to 
suffer from an illness. Where is the line to be drawn? 

E. Kretschmer formulated a challenge to the origin- 
al Kraepelinian concept: 


We can never do justice to the endogenous psychoses so long 
as we regard them as isolated unities of disease, having taken 
them out of their natural heredity environment, and forced 
them into the limits of a clinical system. Viewed in a large 
biological framework, however, the endogenous psychoses 
are nothing other than marked accentuations of normal 
types of temperament. 


This formulation raises two questions: what is ‘the
natural heredity environment’ of psychosis, i.e., its
genetic basis, and what is its ‘large biological framework’?
These questions impinge upon the origins of humans,
and the nature of human diversity. In their solution 
the epidemiological characteristics of psychosis are 
crucial. 


Nuclear Symptoms, Epidemiology, and 
Genetic Implications 


K. Schneider identified a set of ‘nuclear’ or ‘first rank’ 
symptoms that appear to define a boundary of severity 
that makes it likely that an individual who experiences 
them for the first time will be referred to a treatment 
facility and thus will be enumerated for comparisons 
across populations. The symptoms include specific 
types of hallucination (e.g., hearing one’s thoughts 
spoken aloud, and hearing voices discussing one in 
the third person) and primary experiences concerning 
one’s thoughts (e.g., that thoughts are removed from 
or inserted into one’s head). With the use of these 
symptoms Jablensky and coworkers (1992) in the 
World Health Organization (WHO) Ten Country 
study concluded that 


schizophrenic illnesses are ubiquitous, appear with similar 
incidence in different cultures and have features that are 
more remarkable by their similarity across cultures than by 
their difference. 


Thus the predisposition to psychosis is intrinsic to 
human populations. The biological disadvantage that 
is associated with such genetic predisposition must be 
balanced by an advantage. It has been argued that it is 
related to the speciation characteristic, the capacity for 
language. Schizophrenia, according to this view, is ‘the 
price that Homo sapiens pays for language.’ Nuclear 
symptoms themselves can be conceived as anomalies 
of the transition from thought to speech. They may 
represent ‘language at the end of its tether.’ A response 
to Kretschmer’s challenge therefore is that the ‘natural 
heredity environment’ is the genetic change (the ‘spe- 
ciation event’) associated with the transition between 
a precursor hominid and Homo sapiens. The genetics 
of psychosis according to this view is inseparable from 
the genetics of speciation. The relevant variation is 
Homo sapiens-specific. It may be related to what
appears to be the defining feature of the human
brain: that it is lateralized. A critical component of
language, probably the phonological sequence, is
confined to one (the ‘dominant’) hemisphere. This
lateralization is associated with subtle anatomical 
asymmetries and loss of these asymmetries appears 
to be an accompaniment of the ventricular enlarge- 
ment and cortical reduction that has been identified 
in schizophrenic and to a lesser degree in affective 
psychoses. 
A simple view is that the psychoses represent
components of the variation that is associated with the
development of the human cerebral cortex. This
evolved by a change that allowed the two hemispheres
to develop with a degree of independence, and this
genetic change permitted the evolution of language.
The nature of the variation that persists within the
population even though it is associated with a
biological disadvantage (individuals with psychosis are
much less likely than the rest of the population to
procreate) is a question of theoretical importance.
One possibility, consistent with findings in studies of
monozygotic twins discordant for psychosis, is that it
is ‘epigenetic’ (associated with gene expression) rather
than variation in the DNA sequence. The ‘larger
biological framework’ of the genetics of psychosis is 
the evolution of the capacity for language and the 
associated revolution in brain function (hemispheric 
differentiation) that allowed the transition to take 
place. 


Conclusions 


The focus on ‘schizophrenia’ as a disease entity, as 
widely promoted by the psychiatric textbooks of the 
past century, does no justice to the nature of psychosis. 
These phenomena are intrinsic to human populations 
and tell us about the genetic origin of the characteristic 
that defines the species, the capacity for language. The 
nuclear symptoms of schizophrenia (e.g., thoughts 
inserted into or removed from one’s mind, hearing 
one’s thoughts spoken aloud) can be regarded as a 
window on the transition from thought to speech,
i.e., a reflection of the neural basis of language. By
the same logic the whole range of psychotic mani- 
festations, including the affective psychoses, tells us 
about the variation that relates to the core character- 
istic of the human brain — hemispheric differentiation. 
Thus the phenomena of psychosis help us to unravel 
what is distinctive about the function of the brain in 
Homo sapiens and why it is so diverse. They are also 
relevant to the genetic nature of the transition from a 
precursor hominid species. 
Schizosaccharomyces pombe is a primitive ascomycetous fungus, also known as fission yeast. It has been used extensively in general and molecular genetics, and its genome has been fully sequenced (August 2000). It is considered a very useful model organism for experimental research on fundamental properties of eukaryotic cells, such as cell cycle control mechanisms, polarized cell growth, signal transduction, and sexual differentiation.


History 


The narrow group of fission yeasts harbors only three recognized species, of which S. pombe is by far the best known experimentally. The other two are S. japonicus and S. octosporus. These yeasts can be isolated from fermenting saps or juices in tropical and subtropical areas. They are not directly related to the more commonly occurring budding yeasts, which independently developed a divergent pattern of unicellular growth from another lineage of filamentous ascomycetes. In many respects, the molecular clock of protein evolution seems to have kept a slower pace in the fission yeasts than in the more rapidly evolving budding yeasts. Hence, S. pombe tends to resemble the common fungal ancestor more faithfully, as well as the common ancestor of fungi and animals. This makes it a particularly useful model organism for eukaryotic cell and molecular biology studies in general. The fission mode of cell division resembles hyphal septation, which commonly occurs in many filamentous fungi, whereas the budding mode may have arisen from more specialized patterns, such as the emergence of microconidia in Neurospora crassa.

The first genetic studies of S. pombe were done 
by Urs Leupold in the late 1940s, and essentially all 
experimental strains in current usage are derived from 
Leupold’s cultures, which had been isolated in 1921 by 
A. Osterwalder from “an exceedingly over-sulfurized 
grape juice,” originating from southern France. The 
unique potential of fission yeast for studies of cell 
division and growth was first recognized by Murdoch 
Mitchison in the 1950s, and the cell cycle studies of his 
group were merged with Leupold’s genetic approach 
by Paul Nurse in the 1970s. 


General Genetics 


Leupold began his genetic analyses by characterizing 
cross-fertile strains of two mating types (P, plus; or M, 
minus). These were related to the originally homothallic parent culture (h90, capable of mating-type switching and self-fertile) by various kinds of chromosomal rearrangements in the mating-type region (see
Mating-Type Genes and their Switching in Yeasts). 
This set the stage for doing genetic crosses by mixing 
compatible haploid strains and harvesting asci or free 
spores at the end of growth, as meiosis and sporulation 
directly occur in the zygotes formed upon nutritional 
limitation. S. pombe usually grows as haploid cells 
from spores, or rather rarely as diploid cells from 
uncommitted zygotes that failed to sporulate upon 
transfer to growth medium. 

Crosses are routinely analyzed by tetrad dissection 
or random spore analysis. Useful screening proced- 
ures include the staining of spores on agar plates 
with iodine vapor (reacting with starch-like polyglu- 
cans), selective staining of diploid colonies by certain 
dyes, such as phloxine B (actively excluded by live
cells, but readily adsorbed to the dead cells that 
occur relatively frequently in diploid cultures), or 
mutant enrichment after inositol-less death or deoxy- 
glucose-induced cell lysis of growing cells. Diploid 
growth from uncommitted zygotes is routinely 
selected for by allelic complementation for a growth 
requirement. Sterile strains can be crossed by the alter- 
native method of protoplast fusion, after the rigid cell 
walls have been removed by enzymic digestion. 

Leupold’s group initiated the systematic coverage 
of the genetic map by biochemical markers, and later 
specialized in suppressor tRNA genetics. Other topics 
pursued in S. pombe before the onset of the molecular
era include analyses of biosynthetic pathways, such as 
purine metabolism, intragenic recombination and alle- 
lic complementation, mutagenesis and DNA damage 
repair, conditional cell division cycle mutants, mating- 
type switching, and mitochondrial genetics. 


Reverse Genetics 


The modern era of S. pombe genetics began in the early 
1980s when Paul Nurse and others established gene 
cloning procedures in fission yeast, soon after the 
basic protocols had been developed for budding 
yeast. Since then the number of vector plasmids and 
other tools has increased considerably. Targeted gene 
disruption and replacement by linearized DNA 
constructs work reasonably well, as guided by flank- 
ing regions of homology with the target sequence, 
although the efficiency of insertion at the correct 
target may vary from place to place in the genome. 
Random insertion of selectable markers in the absence 
of sequence homology has been utilized as a novel 
means for mutagenesis and facile cloning. Systematic 
screening of green fluorescent protein (GFP)-fusion 
libraries has proved very useful for the classification of 
functional proteins as to their subcellular in vivo lo- 
calization. The potential of S. pombe as host cells 
for heterologous expression of selected proteins is 
actively being investigated. 


Genomics 


The haploid genome of S. pombe amounts to about 14 
Mb, distributed among three chromosomes of 5.7, 4.6, 
and 3.5 Mb. These have become convenient size mark- 
ers in pulsed-field gel electrophoresis of chromosomal 
DNA of other organisms. The sequencing of the 
entire genome, coordinated by the Sanger Centre, 
was completed in 2001. 

There are about 6000 protein-encoding genes. 
About half of these have introns; half of those, again, 
have at least two, and so forth, in a uniformly descend- 
ing series of genes with multiple introns. Most introns 
are relatively short (40-75 nt), and their spatial distribution is significantly skewed towards the 5’ end of a
gene. There is little evidence of differential splicing in 
general, but some instances of meiosis-specific splicing 
have been reported. 
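The "uniformly descending series" of intron counts can be made concrete with a little arithmetic. The sketch below is an idealization, not measured data: it assumes that the share of genes carrying at least k introns halves with each additional intron (the text's "about half ... half of those, again"), starting from the approximate total of 6000 genes quoted above. The function names are hypothetical.

```python
# Idealized model (an assumption, not measured data): among ~6000 genes,
# the share carrying at least k introns halves with each additional intron.
TOTAL_GENES = 6000  # approximate protein-coding gene count quoted in the text


def genes_with_at_least(k: int, total: int = TOTAL_GENES, ratio: float = 0.5) -> float:
    """Expected number of genes carrying k or more introns (geometric model)."""
    return total * ratio ** k


def genes_with_exactly(k: int, total: int = TOTAL_GENES, ratio: float = 0.5) -> float:
    """Genes with exactly k introns: those with >= k minus those with >= k + 1."""
    return genes_with_at_least(k, total, ratio) - genes_with_at_least(k + 1, total, ratio)


if __name__ == "__main__":
    for k in range(4):
        print(f"exactly {k} intron(s): ~{genes_with_exactly(k):.0f} genes")
```

Under this model, about 3000 genes are intronless, and about 1500, 750 and 375 genes have exactly one, two and three introns. Real counts will deviate from a strict halving; the `ratio` parameter makes that easy to vary.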

Ribosomal RNA genes are located at both ends 
of chromosome III. Other conspicuous features of 
the chromosomal landscape include centromeres, 
telomeres, ars elements (autonomously replicating 
sequences), the specifically organized mating-type 
region (mat), and retrotransposon-related repetitive 
sequences. 
Relatively long segments of 60-100 kb are needed
to make up functional centromeres. These stretches 
are characterized by a repetitive, heterochromatin- 
like organization — highly inaccessible to both tran- 
scription and recombination, during vegetative 
growth as well as during meiosis. A sizable chunk of 
centromeric-repeat DNA appears to have been trans- 
posed to the mat region, where it holds a central 
position in the silenced domain carrying two backup 
cassettes for mating-type switching. 

Chromosomal origins of replication (ars) tend to 
map at 1-4 kb gene-free regions of low complexity. 
Essential consensus sequences could not be defined, 
but the number and spacing of multiple binding sites 
for so-called AT-hook proteins appear to be import- 
ant. The local configuration of replication origins and 
pause sites around the functional mat1 cassette has 
been implicated in the molecular mechanism of 
mating-type switching. 


Nuclear Division Cycle 


The cell cycle is governed by the regular replication 
and segregation of the chromosomal genome. In fis- 
sion yeast, the analysis of cell cycle control factors by 
conditional cdc mutants was pioneered by Paul Nurse. 
The key role was ascribed to the prototype cyclin-dependent protein kinase Cdc2p, which is homologous
and functionally equivalent to MPF (maturation pro- 
moting factor) activity of developing amphibian 
oocytes. Its activity is needed independently at two 
crucial control points, the G1/S transition to start
replication and the G2/M transition to initiate mitosis. 
A host of interacting factors, such as cyclins, inhibi- 
tors and ancillary protein kinases, have been identified 
in addition. Furthermore, the metaphase—anaphase 
transition has been recognized as a specific phase of 
polyubiquitinylation and proteasome-driven proteo- 
lysis to dissolve sister chromatid cohesion. 

Modern emphasis has shifted to various ‘check- 
point’ control systems, which ensure that crucial tran- 
sitions in the cell cycle can take place only if essential 
functions of the preceding stage have safely been com- 
pleted and premutational damage has been repaired. 
The mitotic cyclins are removed by the proteasome, 
and cytokinesis commences. Aberrant behavior at this 
stage was first recognized in Mitsuhiro Yanagida’s 
group as conditional cut mutants, when still undivided 
nuclei were bisected by untimely septation. 


Cell Shape and Growth 


Morphogenetic processes in fission yeast appear to be 
relatively simple. Individual cells are cylindrical in 
shape, and they grow in essentially one dimension
at their tips. They arise by cytokinesis, shortly after
mitosis, when a centripetally constricting septum 
bisects the mother cell, and the old cell wall is split 
apart. The ‘new’ end of any given cell originates from 
the most recent septum — the ‘old’ end being related to 
an earlier septation event. During mitosis and cyto- 
kinesis, cell wall elongation ceases temporarily. Cell 
growth is then resumed at the old end only for a while, 
before it becomes bipolar by switching over to the 
new end as well. 

All these transitions are tightly coordinated with 
the nuclear division cycle, and a host of interacting 
components has been identified by many groups, 
focusing on actin localization, organization of micro- 
tubules, participation of various motor proteins, 
signal transduction cascades, and other critical factors. 
Thereby, the superficially simple processes of linear 
growth, septation and morphogenesis are being cast 
into very sophisticated mechanistic networks at the 
molecular scale. 


Sexual Differentiation 


In fission yeast, mating occurs only at the end of 
vegetative growth, upon nutritional starvation. This 
is mediated by the transcriptional repression of sexu- 
ally important genes on rich medium, where two 
vegetative protein kinases are active at a high level. 
Upon nitrogen depletion, in particular, this repression 
is relieved and the mating-type genes can take over. 
These code for two complementary pairs of transcrip- 
tion factors: an early acting pair, specifying the sexual 
identity of M cells and P cells, respectively, and a later 
acting pair controlling meiosis. 

Mating-competent cells communicate by secreting 
their own peptide pheromone and responding to the 
presence of the complementary one, as mediated by 
specific receptor proteins, spanning the cytoplasmic 
membrane seven times. The pheromone response 
leads to loose agglutination en masse and tighter pair- 
wise association, before the interactive partners fuse 
by local dissolution of the separating cell walls. Rapid 
nuclear movements are likewise induced by the phero- 
mone response, resulting in karyogamy shortly after 
cytoplasmic fusion. As the two late-acting mating- 
type genes themselves are induced by the pheromone 
response, the diploid nucleus of the zygote is primed 
to undergo meiosis immediately. Finally, upon sporu- 
lation, the zygotes are converted into four-spored 
asci. Azygotic asci can also be produced by rarely 
occurring diploid cells, if these are heterozygous at 
the mat locus. 

Within the differentiating cells, pheromone per- 
ception is coupled to both trimeric and Ras-related 
G proteins, and the signal is further conveyed via a
classical MAP kinase cascade, before it results in
pheromone-induced transcription in the nucleus. 
There is important cross-talk to the stress response 
cascade as well. Various mechanisms contribute to 
specific desensitization of the pheromone response in 
the end. Many groups have worked on and added 
further details to these aspects. 


Meiosis and Recombination 


Meiosis in fission yeast is remarkable for several
unusual features. The number of crossover events, 
10-20 per chromosome, is higher than in any other 
organism analyzed genetically. Yet, this efficient 
meiotic recombination occurs in the absence of synap- 
tonemal complexes, and crossover interference is 
not observed either. Most conspicuously, the meiotic 
prophase nuclei are rapidly pulled back and forth in a 
series of so-called ‘horsetail’ movements, led by the 
spindle pole bodies, to which the partially contracted 
chromosomes are attached by their clustered telo- 
meres. This mechanical agitation, as well as the organ- 
ization of chromosomal cores (‘linear elements’) 
connecting sister chromatids, is essential for efficient 
recombination. Systematic mutant screening has iden- 
tified numerous genes involved in the mechanism of 
meiotic crossing-over. 
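The absence of crossover interference has a convenient consequence for genetic mapping. If crossovers are assumed to occur independently of one another along a chromosome (a simplifying model, not a claim about any specific S. pombe dataset), their number follows a Poisson distribution, and map distance d (in morgans) converts to recombination fraction r by Haldane's mapping function, r = (1 - e^(-2d)) / 2. The sketch below implements both. The mean of 15 crossovers is only an illustrative midpoint of the 10 to 20 range cited above, and the distances are arbitrary.

```python
import math


def haldane_r(d_morgans: float) -> float:
    """Recombination fraction for loci d morgans apart, assuming no interference."""
    return 0.5 * (1.0 - math.exp(-2.0 * d_morgans))


def haldane_d(r: float) -> float:
    """Inverse of Haldane's function: map distance (morgans) from r in [0, 0.5)."""
    if not 0.0 <= r < 0.5:
        raise ValueError("r must lie in [0, 0.5)")
    return -0.5 * math.log(1.0 - 2.0 * r)


def prob_crossovers(k: int, mean: float) -> float:
    """Poisson probability of exactly k crossovers when crossovers are independent."""
    return math.exp(-mean) * mean ** k / math.factorial(k)


if __name__ == "__main__":
    mean_co = 15  # illustrative midpoint of 10-20 crossovers per chromosome
    print(f"P(no crossover on a chromosome) = {prob_crossovers(0, mean_co):.2e}")
    for d in (0.01, 0.1, 0.5, 2.0):
        print(f"d = {d} M -> r = {haldane_r(d):.3f}")
```

With no interference, r rises smoothly toward 0.5 as distance grows, so widely separated markers on the same chromosome behave almost as if unlinked.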


Mitochondrial Genetics 


The mitochondrial genome of fission yeast (17-22 kb) 
is one of the smallest of its kind in lower eukaryotes. 
S. pombe is a so-called petite-negative yeast, which 
cannot form ATP in the absence of mitochondrial 
DNA, even under nonrespiratory conditions. Muta- 
tions affecting this dependency have been obtained, in 
addition to more commonly occurring respiratory- 
deficient mutants. 


Future Prospects 


Even though many scientists have firmly established 
S. pombe as a powerful model system for many basic 
aspects of eukaryotic cell biology, much remains to be learned.
The systematic evaluation of the information provided 
by the complete genome sequence has barely begun. In 
a most significant trend, more and more workers con- 
cerned with higher eukaryotes are becoming interested 
in the fission yeast to look at their favorite ortholog 
together with potentially interacting proteins so as to 
learn more about the underlying molecular mechanisms.


Relevant Web Sites 

History and Methods: http://www.bio.uva.nl/pombe/ 
Sequencing program: http://www.sanger.ac.uk/Projects/S_pombe/
Vector plasmids: http://pingu.salk.edu/users/forsburg/plasmids.html
Screening is a method of analyzing bacterial isolates for a particular property or phenotype. It is useful for isolating mutants that have no selective advantage over the wild-type organism, or whose mutation leads to a conditionally lethal phenotype.

Screening for a particular phenotype can be quite 
straightforward; for example, a distinguishing charac- 
teristic can be analyzed based on the color or size of a 
colony. Differentiating a blue colony from a sea of 
white colonies is one such example. More involved 
methodologies include DNA probe analysis, bio- 
chemical assay (for enzymatic activity), and antigen/ 
antibody reaction (for the presence of a protein). 
Screening for a particular phenotype can be quite 
laborious. For example, analysis of certain enzymatic 
activities requires each bacterial isolate to be grown in 
a separate test tube. Then the protein must be extracted 
and each extract tested biochemically. This can limit 
the analysis to a small number of isolates. 

Screening bacteria for a deleterious characteristic 
such as antibiotic sensitivity requires growing each 
of the isolates in the absence of the antibiotic. Each 
individual isolate must then be tested by picking 
colonies onto a plate that contains the antibiotic and 
on a plate that does not contain the antibiotic. Those 
isolates that can only grow in the absence of the antibiotic and fail to grow in the presence of the antibiotic
are the antibiotic-sensitive isolates. 

An alternative and simpler method of screening for
isolates that do not have the ability to grow under 
specialized conditions is the method of replica plating. 
Here, instead of individually picking hundreds of 
colonies for analysis on different media, a simple 
device that takes an imprint of all the colonies from 
a master agar plate is applied to a fresh agar plate, 
thereby creating a replica of the master plate. The replica plate can then be incubated under unfavorable conditions
such as in the presence of an antibiotic. By this 
method, a relatively large number of colonies can be 
screened without much effort. 
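Scoring a replica-plating screen amounts to comparing two plates. Colonies present on the master plate but absent from the replica on selective medium are the candidates. The sketch below assumes that each colony is identified by a hypothetical (row, column) grid position, which the imprint preserves from master to replica. The plate contents are invented for illustration.

```python
from typing import Set, Tuple

Position = Tuple[int, int]  # hypothetical (row, column) grid position of a colony


def sensitive_candidates(master: Set[Position], replica_with_drug: Set[Position]) -> Set[Position]:
    """Colonies that grew on the master plate (no antibiotic) but not on the
    replica plate (with antibiotic): the presumptive antibiotic-sensitive isolates."""
    return master - replica_with_drug


if __name__ == "__main__":
    master = {(0, 0), (0, 1), (1, 0), (1, 1), (2, 2)}
    replica = {(0, 0), (1, 0), (1, 1)}  # positions showing growth on the antibiotic plate
    print(sorted(sensitive_candidates(master, replica)))
```

Because the sensitive isolates do not survive on the replica, they would be recovered from the corresponding positions on the master plate for further testing.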

Screening should not be confused with selection. 
Selection distinguishes among bacteria by their ability 
to grow under specific conditions. Selections can be 
set up to isolate a single bacterium with a specific 
property from a mixture of hundreds of millions of 
bacteria lacking the property. 


See also: Antibiotic-Resistance Mutants; Bacterial Genetics; Resistance to Antibiotics, Genetics of; Selection Techniques


scRNA 
scRNA is the abbreviation for small cytoplasmic RNAs, which are present in the cytoplasm and (sometimes) the nucleus.


See also: Cytoplasm; Nucleus 


scRNP 
scRNP is the abbreviation for small cytoplasmic ribonucleoproteins, i.e., scRNAs associated with proteins.


See also: Cytoplasm 


SDP (Strain Distribution Pattern)


See: Strain Distribution Pattern (SDP) 


Second Division Segregation


See: First and Second Division Segregation 


Secretion 


See: Protein Secretion Systems 
The seeds of flowering plants have long been viewed
as a convenient subject for basic research and an 
important factor in agriculture and human nutrition. 
The concept that seeds could also be subjected to 
genetic analysis dates back to the earliest days of mod- 
ern genetics. Wrinkled seeds were one of the initial 
characters studied by Mendel and defective kernel 
mutants were first described during the formative 
years of maize genetics. Large-scale analysis of such 
mutants, however, did not occur until many years 
later, when genetics and molecular biology were combined. Arabidopsis eventually became
the model system of choice for the identification of 
genes with essential functions during seed development, while maize has remained the preferred system
for genetic and biochemical studies of endosperm 
maturation. With the completion of the Arabidopsis 
genome sequence, and multinational programs in functional
genomics already in progress, it appears likely that 
every gene required for seed development in this 
model plant will eventually be identified. The chal- 
lenge will then be to understand the many regulatory 
pathways and interesting variations in seed develop- 
ment that are evident throughout the angiosperms. 


Seed Development 


The angiosperm seed forms through a unique process 
known as double fertilization, in which one male 
gamete contributed by the pollen fuses with the egg 
cell located within the ovule to form the diploid 
zygote, while the other male gamete fuses with two 
haploid maternal nuclei in close proximity to the 
zygote to form a triploid endosperm nucleus. 
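The ploidy and gene-dose bookkeeping of double fertilization can be written out at a single locus. This minimal sketch assumes a Polygonum-type embryo sac, as in Arabidopsis and maize. In such a sac the egg and the two maternal nuclei derive from the same megaspore and share one haploid genotype, and the two sperm cells are likewise identical. The embryo therefore gets one maternal and one paternal dose, and the endosperm two maternal doses and one paternal dose. The cross and allele names are hypothetical.

```python
from typing import Dict, Tuple


def double_fertilization(maternal: str, paternal: str) -> Dict[str, Tuple[str, ...]]:
    """Single-locus genotypes of the two products of double fertilization.

    Assumes a Polygonum-type embryo sac: the egg and the two maternal nuclei
    carry the same allele, and both sperm cells carry the same allele.
    """
    embryo = (maternal, paternal)               # egg + sperm -> diploid (2n)
    endosperm = (maternal, maternal, paternal)  # two maternal nuclei + sperm -> triploid (3n)
    return {"embryo": embryo, "endosperm": endosperm}


if __name__ == "__main__":
    # Hypothetical cross: an A/a mother pollinated by an a/a father.
    # Each embryo sac carries either A or a, depending on the surviving megaspore.
    for egg_allele in ("A", "a"):
        seed = double_fertilization(egg_allele, "a")
        print(f"egg {egg_allele}: embryo {'/'.join(seed['embryo'])} (2n), "
              f"endosperm {'/'.join(seed['endosperm'])} (3n)")
```

The 2:1 maternal-to-paternal dosage in the endosperm is one reason reciprocal crosses can give different kernel phenotypes.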


Following double fertilization, the ovule enlarges and 
develops into a seed, and the ovary that generated the 
ovule becomes known as the fruit. The angiosperm 
zygote develops into an embryo composed of two 
parts, the embryo proper and the suspensor. The 
embryo proper ultimately differentiates into the 
mature embryo, whereas the suspensor degenerates 
during later stages of development and is not usually 
present at maturity. The suspensor appears to perform 
both a structural role in attaching the embryo proper 
to the surrounding seed coat and an active role in 
supporting and promoting growth of the embryo. 
The embryo proper progresses through a series of 
characteristic morphological stages early in develop- 
ment, establishes the root and shoot apical meristem 
regions that will produce the vegetative plant follow- 
ing germination, and prepares for seed desiccation at 
maturity. The endosperm tissue often begins as a free 
nuclear syncytium that later undergoes cellulariz- 
ation. In dicotyledonous plants such as Arabidopsis, 
the endosperm is absorbed by the developing embryo 
and is not present in the mature seed. Nutrients 
required for germination and initial seedling develop- 
ment are stored in embryonic leaves known as cotyle- 
dons. In monocots such as maize, a significant amount 
of starchy endosperm tissue remains at seed maturity 
to support growth of the young seedling. 


Screening for Seed Mutations 


Mature seeds are the preferred target for chemical 
mutagenesis in Arabidopsis because they are small, 
easily suspended in small volumes of mutagen, and 
contain only a few target cells in the shoot apical 
meristem. M1 plants derived from treated seeds are
chimeric, with sectors of heterozygous cells adjacent 
to wild-type cells derived from other parts of the 
treated meristem. Flowers that form within a mutant 
sector produce siliques (fruits) with 25% homozygous 
mutant seeds following self-pollination. Embryo- 
defective mutations can therefore be identified by 
screening siliques of M1 plants for the presence of
25% defective seeds. Mutant seeds often differ from 
normal seeds in size, color, and embryo morphology. 
The transparent seed coat facilitates the identifica- 
tion of embryos altered in morphology and pigmen- 
tation. Seeds treated with a clearing agent can be 
viewed under a compound microscope equipped with 
Nomarski (DIC) optics to reveal striking cellular 
details within the developing embryo. Although 
large collections of embryo-defective mutants have 
been generated by this method of screening immature 
siliques for abnormal seeds, many additional mutants 
defective in seed development have been identified 
by screening germinated seedlings on agar plates for 


Seed Development, Genetics of 1781 


defects in morphology indicative of a disruption of 
embryo development. 
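The 25% figure follows from Mendelian segregation within a mutant sector. A flower of genotype +/m that self-pollinates yields +/+, +/m and m/m seeds in a 1:2:1 ratio, so one seed in four is homozygous mutant. The sketch below enumerates the allele pairings exactly. It also computes the chance that a silique shows no defective seed at all, which indicates why even a single silique is informative. The allele symbol m and the silique size of 40 seeds are hypothetical choices for illustration.

```python
from fractions import Fraction
from itertools import product
from typing import Dict


def selfing_ratios(genotype: str = "+m") -> Dict[str, Fraction]:
    """Exact genotype frequencies among seeds from a self-pollinated diploid.

    `genotype` lists the parent's two alleles, e.g. "+m" for a heterozygote
    carrying a hypothetical recessive embryo-defective allele m.
    """
    counts: Dict[str, int] = {}
    for egg, sperm in product(genotype, repeat=2):  # each allele pairing is equally likely
        key = "".join(sorted(egg + sperm))            # "+m" and "m+" are the same genotype
        counts[key] = counts.get(key, 0) + 1
    total = len(genotype) ** 2
    return {g: Fraction(n, total) for g, n in counts.items()}


def prob_no_defective_seed(n_seeds: int, p_defective: float = 0.25) -> float:
    """Chance that a silique with n_seeds contains no homozygous mutant seed."""
    return (1 - p_defective) ** n_seeds


if __name__ == "__main__":
    ratios = selfing_ratios()
    print(ratios)  # '++' : 1/4, '+m' : 1/2, 'mm' : 1/4
    n = 40  # hypothetical number of seeds in one silique
    print("expected defective seeds:", n * ratios["mm"])
    print(f"P(no defective seed) = {prob_no_defective_seed(n):.1e}")
```

Siliques formed outside the mutant sector come from wild-type cells and show no defective seeds. This is why the 25% class appears only on some branches of a chimeric M1 plant.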

Pollen is the preferred target for chemical muta- 
genesis in maize because controlled crosses are per- 
formed with minimal effort and because mature 
kernels are much larger than Arabidopsis seeds and contain a more complex shoot meristem. Segregation of
mutant kernels can be detected by examining ears at 
different stages of development. Translocations in- 
volving accessory (B) chromosomes of maize, which 
undergo nondisjunction during pollen development, 
have in some cases been used to construct discordant 
kernels in which the embryo and endosperm differ in 
genotype. 

Many of the seed mutations analyzed in recent 
years have been generated through insertional muta- 
genesis. The preferred agent for Arabidopsis research 
is a piece of plasmid DNA (T-DNA) from the soil 
bacterium Agrobacterium tumefaciens. More than 
200,000 transgenic Arabidopsis plants carrying independent T-DNA insertions are available for detailed
forward and reverse genetic screens and many add- 
itional lines await further characterization. A variety 
of endogenous transposable elements have been used 
in place of T-DNA for large-scale gene tagging in 
maize. Alterations in patterns of kernel pigmentation 
caused by transposon activity were originally an 
important factor in the discovery of transposable 
elements by Barbara McClintock. Since that time, 
movement of transposable elements during develop- 
ment and their experimental manipulation for the 
purpose of gene isolation have been characterized in 
detail. Several large-scale projects have recently been 
initiated to identify transposon insertions in most of 
the genes required for endosperm development in 
maize. 


Diversity of Mutant Phenotypes 


Thousands of mutants defective in seed development 
have already been found in maize and Arabidopsis. 
Although the long process of analyzing mutant 
phenotypes and identifying the disrupted genes will 
take years, many interesting and informative pheno- 
types have already been studied in detail. Extensive 
analysis of maize endosperm mutants altered in 
storage product accumulation has contributed not 
only to our understanding of endosperm function 
but also to the development of plants with improved 
nutritional qualities. Informative Arabidopsis pheno- 
types include the twin (tw) mutants in which the sus- 
pensor forms a secondary embryo, viviparous leafy 
cotyledon (lec) mutants characterized by premature 
germination and partial transformation of cotyledons 
into leaf-like structures, fertilization-independent (fis 
and fie) mutants in which seed development begins
in the absence of fertilization, titan (ttn) mutants with 
giant endosperm nuclei and enlarged embryo cells, 
shoot meristemless (stm) mutants, auxotrophic mu- 
tants defective in biotin synthesis, and a variety of 
mutants disrupted in cell division patterns during 
early stages of development. In contrast to Drosophila, 
where a small number of genes regulate many of the 
critical events in embryo development, pattern forma- 
tion during plant embryogenesis appears to be con- 
trolled by many genes with a variety of cellular 
functions. Current topics of interest that arose from 
mutant analysis include the role of the plant hormone 
auxin in regulating embryo pattern formation, the 
relationship between intracellular transport mechan- 
isms and embryo morphogenesis, and the importance 
of gene silencing in early endosperm development. 


Future Prospects 


Arabidopsis contains an estimated 500 genes that can 
mutate to give an embryo-defective phenotype. Identifying knockout mutations in each of these essential
genes is a high priority of future efforts in functional 
genomics. Many additional genes that are expressed 
during embryo development do not give rise to a mutant
phenotype when disrupted. Most of these genes are 
likely to be duplicated in the genome or encode pro- 
ducts that function in redundant cellular or biochem- 
ical processes. Understanding the precise roles of these 
gene products in embryo and endosperm development 
represents a significant challenge for the future. 
More than 70% of the proteins consumed by humans
are derived from storage proteins of legume and cereal 
seeds. Seed protein content of legumes varies from 20% 
to 40%, while in cereals it accounts for 7-15% of the 
dry weight of the seed. Seed storage proteins accumu- 
late in the cotyledon and embryo of dicotyledonous 
plants and in the endosperm of monocotyledonous 
plants. These proteins are deposited in specialized 
membrane-bound organelles called protein bodies. 
The predominant legume storage proteins are salt- 
soluble globulins and are grouped under two classes, 
7S and 11S, while the major storage proteins of cereals 
are the alcohol-soluble prolamins. Exceptions are oats 
and rice, in which the major storage proteins are 
globulin-like. Because of the abundance of these 
proteins, they are mainly responsible for the nutri- 
tional quality of the human diet. The deficiency of 
certain essential amino acids in the seed proteins 
may, however, limit their nutritional quality for 
monogastric animals. In general, the cereal storage 
proteins are deficient in lysine, threonine, and tryptophan, while the legume storage proteins contain low levels of
cysteine and methionine. Most of the seed storage 
proteins are encoded by multigene families. The 
cereal prolamins appear to have evolved from a single 
ancestral gene and similarly the 7S and 11S legume 
proteins appear to have evolved from a common 
ancestral protein. The synthesis of seed storage 
protein is primarily controlled at the level of tran- 
scription, but may also be subject to posttranscrip- 
tional controls. The transcription of seed protein 
genes is highly regulated in a spatial and temporal 
manner. 


Properties of Seeds 


Starch is just one of the major food value ingredients 
in seeds. In addition to carbohydrates, seeds contain 
proteins and lipids. All three are storage constituents 
of the seed, and they are the source not only of the 
energy and growth compounds for germination of the 
seed, but also for the consumer of the seed and its products.
The proteins confer food value, handling properties, 
and gustatory qualities. Legume seeds, for example, 
are widely recognized to be significant sources of 
protein in the diet. Proteins in wheat flours give 
them their specific texture, consistency, and baking 
quality. Indeed, differences among breads and pastas
are a reflection of differences among their proteins, in
types and in the balance of different classes of pro- 
teins. Even the many ‘starch’ products prepared from 
seeds carry significant amounts of proteins. For the 
most part, they are valuable food constituents, but 
many parents of young children become aware by 
experience that some cereal grain proteins may cause 
digestive distress in infants while others do not. 


History 


Humans have utilized seed crops for millennia as a
major source of carbohydrate and protein. For example, 
domesticated wheat and barley have been found in 
7000-year-old Egyptian dwellings. Isolation of the 
proteins (gluten) of wheat flour from the starch was 
described in the mid-1700s, soon followed by charac- 
terizations of differential solubilities and fractions of 
the gluten. The protein (prolamin) fraction of maize, 
zein, was isolated by differential extraction in the early 
1800s, and studies on other cereal proteins followed 
over time. The modern study of seed proteins was 
stimulated by the classical and innovative work of 
Thomas Osborne. His work on the classification of 
seed proteins on the basis of their differential solu- 
bility in aqueous and nonaqueous solutions facilitated 
research on the characterization of seed proteins. The 
ability to classify the proteins and to measure differ- 
ences in specific qualities has permitted extensive 
inheritance studies. In fact, a number of genes that 
encode seed proteins, or that regulate them, have 
been cloned. 


What Are Seed Storage Proteins? 


Seed storage proteins are proteins that accumulate 
significantly in the developing seed, whose main func- 
tion is to act as a storage reserve for nitrogen, carbon, 
and sulfur. These proteins are rapidly mobilized dur- 
ing seed germination and serve as the major source of 
reduced nitrogen for the growing seedlings. In gen- 
eral, seed storage proteins do not carry out any en- 
zymatic functions. Even though storage proteins from 
diverse plants are structurally different, they all share 
some common characteristics. One of the main char- 
acteristics of storage proteins is that they accumulate 
in high levels in specific tissues at a specific stage of 
development. Seed storage proteins are generally not 
found in nonseed organs. These proteins accumulate 
within membrane-bound organelles called protein 
bodies (Figure 1). The sequestration of storage proteins
within protein bodies ensures that these proteins
are separated from the metabolic compartments of the 
cell. The expression of storage proteins is also regulated by nutrition. For example, the synthesis of sulfur-rich proteins may be restricted when the plants are grown in soils having low sulfur.

Figure 1 (A) Transmission electron micrograph of rice endosperm showing spherical (S) and irregular-shaped (IR) protein bodies. The spherical protein bodies store prolamins and the irregular-shaped protein bodies accumulate glutelins. Note the direct connection between the rough endoplasmic reticulum (ER) and spherical protein bodies. (B) Protein bodies (PB) in the cotyledons of developing soybean seed. Note the occurrence of protein deposits (arrows) within the vacuoles (V). Large amyloplasts (A) are also seen.


Classification of Seed Proteins 


Traditionally seed proteins are classified into four 
groups based on their solubility properties. Proteins 
soluble in water are known as albumins, in salt 
solutions as globulins, in alcoholic solutions as 
prolamins, and in dilute acid or alkali as glutelins. 
Even though this classification is not absolute, since 
one group of proteins can be soluble in more than one 
solution, it provides a convenient method to group 
proteins into different classes. Prolamins are found exclusively in the grass family and so far have not been detected in other plant families. The members of the legume family store globulins as the predominant storage reserve. The globulins are broadly classified into 7-8S and 11-12S types based on their sedimentation coefficients. In several instances, seed storage proteins are given names that are derived from the Latin generic name of the plant. For example, the storage proteins are called zeins in maize (Zea mays), hordeins in barley (Hordeum vulgare), and glycinin in soybean (Glycine max). Currently, the
seed proteins are classified on the basis of function 
and molecular/biochemical relationships. On the 
basis of their function, seed proteins are classified as 
storage, structural and metabolic, and protective pro- 
teins. The major function of the storage proteins is to 
serve as the source of nitrogen, carbon, and sulfur. 
The structural and metabolic proteins are essential 
for normal growth and development. The protective 
proteins play a role in providing resistance against 
pathogens, pests, and desiccation. 
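The solubility classes described above can be read as a sequential extraction in which a protein is assigned to the first solvent that dissolves it. The sketch below assumes the conventional order water, salt, aqueous alcohol, then dilute acid or alkali; the names `Solvent`, `EXTRACTION_ORDER`, and `osborne_class` are illustrative and not taken from any published protocol.

```python
from enum import Enum
from typing import Iterable, Optional


class Solvent(Enum):
    """Extraction solvents used in the solubility classification."""
    WATER = "water"
    DILUTE_SALT = "dilute salt solution"
    AQUEOUS_ALCOHOL = "aqueous alcohol"
    DILUTE_ACID_OR_ALKALI = "dilute acid or alkali"


# Assumed sequential extraction order, each solvent paired with the class it defines.
EXTRACTION_ORDER = [
    (Solvent.WATER, "albumin"),
    (Solvent.DILUTE_SALT, "globulin"),
    (Solvent.AQUEOUS_ALCOHOL, "prolamin"),
    (Solvent.DILUTE_ACID_OR_ALKALI, "glutelin"),
]


def osborne_class(soluble_in: Iterable[Solvent]) -> Optional[str]:
    """Return the class of the first extraction step that dissolves the protein.

    A protein soluble in several solvents is removed by the earliest step,
    which is one reason the scheme is not absolute.
    """
    solvents = set(soluble_in)
    for solvent, class_name in EXTRACTION_ORDER:
        if solvent in solvents:
            return class_name
    return None  # insoluble in every solvent tried


print(osborne_class({Solvent.AQUEOUS_ALCOHOL}))
print(osborne_class({Solvent.DILUTE_SALT, Solvent.AQUEOUS_ALCOHOL}))
```

A protein that dissolves in both salt and alcohol is recovered in the salt step here, which illustrates why one protein group can appear in more than one fraction depending on how the extraction is run.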


Cereal Storage Proteins 


Prolamins are the major storage proteins of most cereals and consequently were the earliest proteins to be studied. Rice and oats are exceptions in that their major storage proteins are glutelins and globulins, respectively. Prolamins can be subdivided into two categories: those that are soluble in aqueous alcohol and those that are soluble only in the presence of reducing agents such as mercaptoethanol. The prolamins of the second group contain interchain disulfide bonds that render them insoluble in aqueous alcohol alone. The prolamins of cereals can also be conveniently grouped according to four cereal groups: the Triticeae (wheat, barley, rye, and their relatives), oats, rice, and the Panicoideae (maize, sorghum, and most millets).


Prolamins of the Triticeae 

The prolamins of the Triticeae contain high levels of 
proline and glutamine. They are classified into three 
families: sulfur-rich (S-rich), sulfur-poor (S-poor), and 
high-molecular-weight (HMW) prolamins. The S-rich 
prolamins are the predominant storage proteins, representing 80-90% of the total prolamin fraction. The α-gliadins, γ-gliadins, and the low-molecular-weight (LMW) glutenin subunits of wheat, the B and γ-hordeins of barley, and the γ-secalins of rye all belong to this group of prolamins. The amino acid sequence
of the S-rich prolamins consists of N-terminal re- 
petitive and C-terminal nonrepetitive domains. The 
repeats are based on short peptide motifs that are 
rich in proline and glutamine. The nonrepetitive
domain contains most of the cysteine residues that
lead to interchain and intrachain disulfide bonds in 
the polymeric and monomeric S-rich prolamins. 
A comparison of the nonrepetitive domains of S-rich 
prolamins reveals three highly conserved regions of 
20-30 residues. These regions, designated A, B, and C, contain the conserved cysteine residues, indicating a common origin from a single ancestral protein. The S-poor prolamins include the ω-gliadins of wheat, ω-secalins of rye, and C hordeins of barley. The
S-poor prolamins consist of a repetitive domain near 
the N-terminus and a nonrepetitive domain at the 
C-terminus. The HMW prolamins, which contribute 
to the bread-making quality of wheat, consist of 
three domains: nonrepetitive domains at the N- and
C-termini flank a central repetitive
domain. The repetitive domain contributes to the un- 
usual amino acid composition of these proteins. These 
proteins contain high levels of glycine and glutamine. 
Regions related to A, B, and C of S-rich prolamins are 
also present in HMW prolamins. These similarities 
indicate an evolutionary relationship among the 
S-rich, S-poor, and HMW prolamins. 


Prolamins of Rice and Oats 

The prolamins of oats are called avenins and account 
for about 10% of the total seed proteins. These proteins
also resemble the prolamins of the Triticeae in the
presence of repeated sequences that occur as two 
separate blocks. The repeats have close similarity to 
those present in the S-poor and S-rich prolamins. On 
the other hand, the rice prolamins, which account for 
5-10% of the total seed proteins, do not reveal the 
repeated sequences that are typical of prolamins of 
the Triticeae. Rice prolamins fall into four classes. 
Classes I to III encode the abundant prolamin poly- 
peptides, while class IV represents the minor prolamin 
component. The class IV prolamin, which is rich in 
sulfur-containing amino acids (30% methionine and 
cysteine), has very little sequence homology to the 
other three classes. 
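Statements such as "30% methionine and cysteine" or "high levels of proline and glutamine" are simple composition fractions. A minimal sketch for computing them from a one-letter amino acid sequence follows; the helper `residue_fraction` and the 8-residue toy sequence are hypothetical and do not represent any real prolamin.

```python
from collections import Counter


def residue_fraction(sequence: str, residues: str) -> float:
    """Fraction of positions in a one-letter protein sequence occupied by `residues`."""
    seq = "".join(sequence.split()).upper()  # drop whitespace, normalise case
    if not seq:
        raise ValueError("empty sequence")
    counts = Counter(seq)
    # set() avoids double counting if a residue letter is listed twice
    return sum(counts[r] for r in set(residues.upper())) / len(seq)


toy = "MQCPQQLL"  # hypothetical fragment, not a real prolamin sequence
print(residue_fraction(toy, "MC"))  # sulfur-containing residues (Met + Cys)
print(residue_fraction(toy, "PQ"))  # proline + glutamine
```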


Prolamins of Subfamily Panicoideae 

The prolamins of maize (zeins) are divided into four 
major groups called α, β, γ, and δ zeins. Zeins can be fractionated by sodium dodecyl sulfate-polyacrylamide gel electrophoresis (SDS-PAGE) into polypeptides of 27 000, 22 000, 19 000, 16 000, 15 000, 14 000, and 10 000 Da. The α-zeins, which are made up of polypeptides of 22 000 and 19 000 Da, are the predominant storage proteins of maize, accounting for about 70% of the total protein. The second most abundant zein fraction is represented by the γ-zeins, made up of 27 000 and 16 000 Da proteins. These proteins are rich in cysteine residues and are soluble in aqueous and alcoholic solutions containing a reducing agent. They may be phylogenetically related to the α-, β-, and γ-gliadins of wheat and the B hordein of barley. Proteins with molecular masses of 14 000 and 15 000 Da represent the β-zeins, while the δ-zeins are made up of 10 000 and 18 000 Da proteins. Both classes of zeins are rich in methionine. The δ-zeins are structurally unrelated to other zeins, showing some similarity to the methionine-rich Brazil nut 2S storage proteins. In contrast to other classes of zeins, the 18 000 Da δ-zein contains one lysine and two tryptophan residues. Other tropical cereals such
as sorghum, Coix, and millets also accumulate 
prolamins accounting for over 50% of the total seed 
proteins. The prolamins of sorghum (kafirins), Coix 
(coixins), pearl millet (pennisetins), and foxtail millet 
(setarins) have solubility and protein structure similar 
to zeins. 
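The zein band sizes listed above amount to a small lookup table. As a rough sketch, assuming apparent SDS-PAGE masses are read off a calibrated gel, a band can be assigned to the zein class of the nearest listed mass; the 500 Da tolerance and the names `ZEIN_CLASS_BY_MASS` and `zein_class` are hypothetical choices, not a published procedure.

```python
from typing import Optional

# Apparent SDS-PAGE masses (Da) assigned to each zein class in the text above.
ZEIN_CLASS_BY_MASS = {
    22_000: "alpha", 19_000: "alpha",
    27_000: "gamma", 16_000: "gamma",
    15_000: "beta", 14_000: "beta",
    18_000: "delta", 10_000: "delta",
}


def zein_class(apparent_mass_da: float, tolerance_da: float = 500.0) -> Optional[str]:
    """Assign a band to the zein class of the nearest listed mass, if close enough.

    `tolerance_da` is a hypothetical cut-off; real gels need calibration.
    """
    nearest = min(ZEIN_CLASS_BY_MASS, key=lambda mass: abs(mass - apparent_mass_da))
    if abs(nearest - apparent_mass_da) <= tolerance_da:
        return ZEIN_CLASS_BY_MASS[nearest]
    return None  # no listed zein band close enough


for band in (21_800, 27_000, 14_300, 12_000):
    print(band, zein_class(band))
```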


Legume Storage Proteins 


The globulins are the predominant storage proteins 
in legumes. Unlike the prolamins, the globulins are 
widely distributed amongst higher plants. They are 
present not only in dicots but also occur in monocots, 
gymnosperms, and ferns. These proteins are soluble in 
dilute salt solutions and have sedimentation coeffi- 
cients of 7-8S and 11-12S. The 7-8S globulins are 
generally referred to as vicilin-type globulins, while 
the 11-12S globulins are called legumin-type globu- 
lins. The globulins have been studied in detail from 
several important legumes including peas, soybean, 
lupin, peanut, French bean, and broad bean. The 
amino acid composition of the globulins reveals a
deficiency in sulfur-containing amino acids, with
methionine being the most limiting amino acid.

The 11-12S storage proteins are isolated as hexamers. Each of the six subunits is made up of an acidic polypeptide of about 40 000 Da and a basic polypeptide of about 20 000 Da, held together by a single disulfide bond. The position of this disulfide bridge is highly conserved amongst 11-12S globulins. Each subunit is synthesized as a precursor protein that undergoes proteolytic cleavage to yield the acidic and basic polypeptides. The 7-8S globulins are found as trimers with an apparent molecular mass of 150 000 to 190 000 Da. The 7-8S globulins can be divided into two groups. The first group contains members in which the precursor polypeptides undergo little or no posttranslational modification; the polypeptides belonging to this group are in the range of 40 000 to 76 000 Da. The second group, on the other hand, undergoes extensive posttranslational modification, resulting in subunits in the molecular mass range of 12 000 to 34 000 Da. Based on the similarity between the C-terminal regions of the 11-12S and 7-8S globulins, it has been suggested
that these two groups of proteins are related to each 
other and presumably evolved from a common ances- 
tral protein. 
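A rough consistency check on the masses given above, assuming the nominal polypeptide masses and ignoring glycosylation and differences between subunits:

$$M_{\text{11-12S}} \approx 6 \times (40\,000 + 20\,000)\ \text{Da} = 360\,000\ \text{Da}$$

$$\frac{150\,000\ \text{Da}}{3} = 50\,000\ \text{Da}, \qquad \frac{190\,000\ \text{Da}}{3} \approx 63\,000\ \text{Da}$$

On these numbers an 11-12S hexamer is roughly 1.9 to 2.4 times the mass of a 7-8S trimer, whose subunits average about 50 000 to 63 000 Da.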


Genomic Organization of Seed Storage 
Protein Genes 


Multigene families encode cereal prolamins. The pro- 
lamin genes of the Triticeae can be grouped under 
three multigene families encoding the S-poor prolam- 
ins, the S-rich prolamins, and the HMW prolamins. 
The sequences of these genes vary widely, and the genes contain no introns. Classical and molecular genetic studies have shown that most of the prolamin genes of the Triticeae are located at complex loci on the homoeologous group 1 chromosomes. The α-zeins are encoded by a multigene family consisting of about 70 to 100 members. Some of these genes contain in-frame stop codons, indicating that they are pseudogenes. Several of the α-zein genes have been mapped to the long and short arms of chromosome 4, the short arm of chromosome 7, the long arm of chromosome 10, and near the centromere of chromosome 1. The gene encoding the 27 000 Da γ-zein has been mapped to the long arm of chromosome 7, while that encoding the 10 000 Da δ-zein has been mapped to the short arm of the same chromosome. Similarly, the β-zein gene has been mapped to the short arm, and the 18 000 Da δ-zein gene to the long arm, of chromosome 6. The rice prolamins are also encoded by multigene families consisting of about 80-100 copies per haploid genome. Some of the prolamin genes of rice occur in tandem repeats.

The globulins of legumes are encoded by a small 
family of genes. For example, the soybean glycinins 
are encoded by five genes (Gy1 to Gy5) that fall into two groups. These genes are scattered throughout the genome. Most 11S globulin genes contain three introns, while the 7S globulin genes have five introns. The insertion
positions of the introns are highly conserved among 
the different globulin genes supporting the notion that 
the 11S and 7S globulin genes share a common pre- 
cursor. The globulin proteins of maize embryo and 
endosperm are encoded by two genes, glb1 and glb2. 
The two proteins are quite different. The
glb1 gene has variants found in various strains that 
produce proteins differing in electrophoretic mobility. 
In the long-term selection strains Illinois High Pro- 
tein and Illinois Low Protein, and in some inbred 
lines, the glb1 locus produces no protein product. 
Variants of the glb2 gene have been found only in the form of presence or absence of the protein. Derived strains lacking both proteins produce normal kernels, indicating that these globulins are not involved in essential functions.




Regulation of Seed Storage Genes 


Many different processes regulate seed storage protein 
gene expression. Differences in the transcription rates, 
messenger RNA (mRNA) stabilities, translation efficiencies,
and protein degradation rates can all play a
role in determining the relative levels of storage pro- 
teins. The spatial and temporal regulation of seed 
storage protein genes is mainly at the transcriptional 
level. In general, seed protein genes are not transcribed 
in nonseed tissues. Based on detailed DNA sequence 
analysis of seed storage protein genes from several 
cereals and legumes, it is clear that several conserved 
sequence motifs (regulatory elements) are present in 
the 5’ promoter regions of these genes. These regula- 
tory elements include the CATGCATG, CACA, 
and ANCCCA sequences. The CATGCATG motif, 
which is also called an RY element, is involved in 
high-level expression in seeds and represses expression 
in nonseed organs. In addition, an octanucleotide box, 
GCCAC(C/T)TC, is present in most genes encoding the 7S and 11S globulins. In the case of cereals, a prolamin-specific motif (TGTAAAG) has been identified around position -300 in the 5’ regions of prolamin genes. It is believed that this -300 box may have a role in the
quantitative level of expression of some prolamin 
genes. The seed-specific expression is controlled by 
both positive and negative DNA elements. The posi- 
tive element will stimulate the transcription, while the 
negative element will inhibit transcription in nonseed 
organs. Binding of nuclear proteins (trans-acting fac- 
tors) to conserved elements (cis-acting elements) also 
contributes to transcriptional regulation. In general, 
the cis-acting elements are AT-rich. Sequences that are 
CA-rich, CATGCATG-like sequences, and G-boxes 
(CACGTG) are present in genes expressed in seeds 
and have been shown to bind nuclear factors. In addi- 
tion to the transcriptional regulation, seed storage 
protein gene expression may be controlled by post- 
transcriptional events. Relative RNA stabilities, dif- 
ferential translation, and differential protein stabilities 
can all influence the accumulation of seed storage 
proteins. 
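The conserved promoter motifs named above can be located with simple pattern matching. The sketch below assumes a plain 5'-to-3' DNA string and scans only the strand given (the RY element and the G-box are palindromic, the other motifs are not); it omits the short CACA motif, which on its own would match very frequently. The dictionary and function names and the toy promoter are hypothetical, and real promoter analysis would rely on experimental binding data rather than exact matches alone.

```python
import re

# Motifs named in the text, written as regular expressions
# (N -> any base, (C/T) -> [CT]).
SEED_PROMOTER_MOTIFS = {
    "RY element": "CATGCATG",
    "ANCCCA motif": "A[ACGT]CCCA",
    "globulin octanucleotide box": "GCCAC[CT]TC",
    "-300 prolamin box": "TGTAAAG",
    "G-box": "CACGTG",
}


def find_motifs(promoter: str) -> list:
    """Return (name, 0-based start, matched text) for every motif hit on the given strand."""
    seq = promoter.upper()
    hits = []
    for name, pattern in SEED_PROMOTER_MOTIFS.items():
        # The zero-width lookahead lets overlapping occurrences be reported.
        for match in re.finditer(f"(?=({pattern}))", seq):
            hits.append((name, match.start(), match.group(1)))
    return sorted(hits, key=lambda hit: hit[1])


toy_promoter = "TTCATGCATGAACACGTGTTGCCACCTCAATGTAAAGG"  # hypothetical sequence
for name, start, text in find_motifs(toy_promoter):
    print(start, name, text)
```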


Seed Storage Protein Mutants 


Several high-lysine mutants, both spontaneous and 
induced, are found in maize, barley, and sorghum. 
These mutants are easy to identify because they ex- 
hibit three common characteristics: reduced accumu- 
lation of prolamins, presence of starchy endosperm, 
and decreased yield. In addition, these high-lysine 
mutants may be affected in their embryo size. A 
high-lysine barley line (Hiproly) contains 30% more lysine than normal isolines. Another mutation (lys3a), located on the same chromosome but not linked to the lys gene, also has elevated levels of lysine. Even though the high-lysine gene (lys) has been mapped to chromosome 7, the molecular basis of the mutation is not understood. Several mutations in maize (opaque2, opaque7, opaque15, opaque6, floury2, floury3, defective endosperm*B30, and Mucronate) cause reduction in the accumulation of maize zeins. The o2 mutation affects a regulatory gene that controls the expression of the 22 000 Da α-zein genes. The o2 and fl2 mutants contain elevated levels of lysine and tryptophan when compared to the wild type. Through plant breeding programs at the International Maize and Wheat Improvement Center (CIMMYT), the soft, starchy phenotype of the o2 mutant was converted to hard, translucent endosperm by incorporating genetic modifiers. These high-lysine
cultivars are known as Quality Protein Maize 
(QPM) and are currently being developed for human 
and livestock consumption. 


Targets and Modification 


In comparison with legumes, cereals have low protein 
content. In addition, the amino acid composition of 
cereal prolamins is not balanced. They are deficient in 
lysine, an essential amino acid for monogastric ani- 
mals. As a result, a considerable amount of research is 
focused on improving the quality and quantity of seed 
storage proteins both by traditional plant breeding 
and modern genetic engineering technology. In add- 
ition to low protein content, some of the protein 
components are not easily digestible by monogastric 
animals. For example, protein digestibility values for 
cooked sorghum, rice, maize, and wheat are 46%, 
63%, 73%, and 81%, respectively. In the case of 
sorghum, it has been determined that the interior location of α-kafirin within the protein bodies results in poor accessibility to digestive enzymes. Recently, sorghum lines with high protein digestibility were discovered and were found to contain morphologically
altered protein bodies. Detailed understanding of the 
molecular basis of altered protein body formation 
could be used to improve the protein digestibility of 
other important seed proteins. 

In the case of legumes, most studies have been 
directed at increasing the methionine content. Three 
types of approaches have been attempted. In the first 
approach, methionine-rich sequences have been intro- 
duced into globulin genes and expressed in transgenic 
plants. In the second approach, overexpression of 
endogenous high-sulfur-containing seed proteins has 
been attempted. In the third approach, heterologous 
genes encoding high-sulfur proteins have been ex- 
pressed in transgenic plants. 


Genetic engineering technology has enabled scien- 
tists to alter the quality and quantity of seed storage 
proteins. For example, a methionine-rich 2S albumin 
of Brazil nut has been successfully expressed in 
tobacco seeds and forage grasses, thereby greatly 
increasing the methionine content. However, it has 
been found to have allergenic properties. Similarly, 
by expressing two key enzymes in the lysine biosyn- 
thetic pathway, the overall content of lysine in maize 
and soybeans has been elevated. Using antisense technology, the levels of a 16 000 Da rice allergenic protein have been reduced to one-fifth of those in wild-type rice. Furthermore, the lysine content of rice seeds was elevated by expressing the β-phaseolin gene in transgenic rice. All the above examples demonstrate the
invaluable contribution of biotechnology in the 
improvement of seed storage protein quality and 
quantity. 


Further Reading 

Habben JE and Larkins BA (1995) Improving protein quality in 
seeds. In: Kigel J and Galili G (eds) Seed Development and
Germination, pp. 791-810. New York: Marcel Dekker. 

Okita TW and Rogers JC (1996) Compartmentation of proteins 
in the endomembrane system of plant cells. Annual Review of 
Plant Physiology and Plant Molecular Biology 47: 327-350. 

Shewry PR and Tatham AS (1999) The characteristics, struc- 
tures and evolutionary relationships of prolamins. In: Shewry 
P and Casey R (eds) Seed Proteins, pp. 11-33. Norwell, MA:
Kluwer Academic Publishers. 

Shewry PR, Napier JA and Tatham AS (1995) Seed storage 
proteins: structures and biosynthesis. The Plant Cell 7: 
945-956. 

Vitale A and Bollini R (1995) Legume storage proteins. In: Kigel J
and Galili G (eds) Seed Development and Germination, pp. 
73-102. New York: Marcel Dekker. 


See also: Grasses, Synteny, Evolution, and 
Molecular Systematics; Leguminosae; 
Seed Development, Genetics of 
When chromosomes are broken, whether spontaneously
or as a result of irradiation, the broken ends
have a strong tendency to rejoin. Simultaneous breaks 
in two chromosomes may result in a reciprocal ex- 
change of segments. This process is known as segmental 
interchange, or reciprocal translocation. 
Visualization


A diploid with two chromosomes with interchanged 
segments and two structurally normal homologs is 
called an interchange heterozygote. Since, overall, it 
has a balanced chromosome complement, it will be 
phenotypically normal unless, as sometimes happens, 
a dominant mutation has been induced at one of the 
original breakpoints. But it encounters complications 
when it undergoes meiosis. The close homologous 
pairing of chromosome segments at the pachytene 
stage can occur only through an association of four 
chromosomes, two normal and two interchanged, 
with exchange of partners at the loci of the original 
breaks. Such pachytene configurations can be seen 
clearly with the light microscope in organisms such as 
Zea mays (maize, corn) in which clear pachytene pre- 
parations can be made, and with higher definition with 
the electron microscope in sexual organisms generally 
after staining the axial elements of the synaptonemal 
complex with silver ions. Interchange points can be 
defined with the highest precision in Drosophila spe- 
cies, not in meiotic cells but in the giant nuclei of the 
salivary gland cells, where the polytene chromosomes 
display close homologous pairing (Figure 1).

Figure 1 A segmental interchange between centromere-distal segments of chromosomes 2 and 3 in Drosophila, seen in the polytene chromosomes of a salivary gland cell nucleus. The centromere-proximal ends of the chromosome arms are not shown. (Drawn from the photograph of Roberts (1970) Genetics 65: 429.)


Genetic Consequences


The effect of the association of four chromosomes at pachytene depends critically on the positioning of the crossovers (chiasmata) that occur within it. Potentially, meiosis can yield wholly viable products if the chiasmata are confined to regions distal to (i.e., further from the centromere than) the exchange points. Then,
in the case where all four chromosomes are joined by 
chiasmata to form a ring or chain of four, alternate 
orientation of centromeres will result in two structur- 
ally normal divided chromosomes (dyads) passing to 
one pole of the spindle and two interchanged dyads to 
the other, and all meiotic products will have a balanced 
chromosome complement (Figure 2Bi). If, on the other
hand, adjacent centromeres in the ring or chain pass to 
the same pole, each of the nuclei resulting from the 
first division will be duplicated in one chromosome 
segment and deficient in the other, and all products 
will be inviable. If a chiasma (crossover) forms 
between an interchange point and the centromere, at 
least two out of the four products of meiosis will have 
a duplication and a deficiency whatever the orienta- 
tion of the centromeres (Figure 2Biii,iv). Thus, inter- 
change heterozygotes are expected to suffer some 
degree of infertility, but can be normally fertile given 
an appropriate chiasma distribution and metaphase I 
centromere orientation. 
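The outcome of the three possible centromere orientations in a ring or chain of four can be checked by bookkeeping of chromosome segments. A minimal sketch, assuming no chiasma forms between a centromere and an interchange point, and labelling the proximal and distal segments of chromosomes 1 and 3 as P1, D1, P3, and D3 (hypothetical labels): a meiotic product is balanced only if it carries each segment exactly once.

```python
from itertools import combinations

# Each chromosome as (centromere-proximal segment, distal segment).
# P1/D1 belong to chromosome 1, P3/D3 to chromosome 3; 2 and 4 carry the interchange.
QUADRIVALENT = {
    1: ("P1", "D1"),  # structurally normal
    2: ("P1", "D3"),  # interchanged: homologous to 1 near its centromere
    3: ("P3", "D3"),  # structurally normal
    4: ("P3", "D1"),  # interchanged: homologous to 3 near its centromere
}
FULL_COMPLEMENT = sorted(["P1", "D1", "P3", "D3"])


def is_balanced(chromosomes) -> bool:
    """True if the chromosomes together carry every segment exactly once."""
    segments = sorted(seg for c in chromosomes for seg in QUADRIVALENT[c])
    return segments == FULL_COMPLEMENT


def orientation_outcomes() -> dict:
    """Map each way of sending two chromosomes to each pole to whether both poles are balanced."""
    all_chromosomes = sorted(QUADRIVALENT)
    outcomes = {}
    for pole_a in combinations(all_chromosomes, 2):
        pole_b = tuple(c for c in all_chromosomes if c not in pole_a)
        if pole_a < pole_b:  # count each split of the four chromosomes once
            outcomes[(pole_a, pole_b)] = is_balanced(pole_a) and is_balanced(pole_b)
    return outcomes


for (pole_a, pole_b), viable in orientation_outcomes().items():
    print(pole_a, pole_b, "balanced" if viable else "duplication/deficiency")
```

Only alternate orientation (1 and 3 to one pole, 2 and 4 to the other) yields balanced products; both adjacent orientations leave each pole duplicated for one segment and deficient for another, as the text describes.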

Because viability of the meiotic products of an 
interchange heterozygote depends on their having a 
balanced chromosome complement, consisting of 
either the two structurally normal or the two inter- 
changed chromosomes, allelic differences (markers) 
close to the interchange points will appear closely 
linked, even though they are normally located on 


nonhomologous chromosomes. They will be able to 
recombine to give viable meiotic products only when 
there is a crossover between one of them and the 
exchange point (see Figure 2B). Their linkage relation- 
ships will be represented by a four-armed linkage map 
mirroring the pachytene pairing patterns, with loci 
close to the four-way junction giving little or no 
effective recombination. In the interchange homozy- 
gote, of course, there will be the normal number of 
independent linear linkage groups with segments of 
two of them interchanged. 


Interchange Complexes in Plants 


In some plants, most famously in one section of the 
genus Oenothera (evening primrose), the avoidance of 
infertility in interchange heterozygotes, by restriction 
of chiasmata to distal chromosome segments and alter- 
nate orientation of centromeres at first metaphase, has 
become established as a regular system. In these 
Oenothera species there are 14 chromosomes in the 
diploid set, each with two arms of approximately 
equal length. The plants are all complex interchange 
heterozygotes, forming at meiosis a ring of 14 involving the entire genome. At the first anaphase of meiosis, alternate chromosomes of the ring pass to the same pole, so that the ring separates into two sets of seven. The ring is the result of an interchange in each chromosome arm. If one numbers the exchanged sections of arms 1 to 14, one set will have combinations 1-2 3-4 5-6 7-8 9-10 11-12 13-14, and the other 2-3 4-5 6-7 8-9 10-11 12-13 14-1, the hyphen in each case standing for the centromere and the region around it.

Figure 2 (A) Diagram showing the pairing and chiasma formation at the first prophase of meiosis in an interchange heterozygote. The two structurally normal chromosomes inherited from one parent are shown as thick and thin lines, respectively, with centromeres labeled 1 and 3. The two chromosomes contributed by the other parent, with centromeres labeled 2 (homologous to 1) and 4 (homologous to 3), have a segmental interchange. The normal and interchanged chromosome pairs are distinguished by markers at three loci, with the normal chromosomes carrying A, B, and C, and the interchanged pair the alleles a, b, c. The A and B loci are located on the interchanged segments close to the interchange points. The chiasmata (crossovers) are, for convenience in drawing, shown occurring between the two ‘inner’ chromatids in each case, but this makes no difference to the argument. (B) Quadrivalent associations at metaphase I of meiosis. (i) Chromosomes joined in a ring-of-four following formation of chiasmata in the positions shown by full lines in (A). Viable meiotic products are possible only if centromeres are oriented alternately as shown, with 1 and 3 directed to one spindle pole and 2 and 4 to the other. Separation of the centromeres of adjacent members of the ring to the same pole (1,2/3,4 or 1,4/2,3) will result in all products having segmental duplications and deficiencies, which will make them inviable. The viable products will always be of the parental constitutions A B and a b, except in the rare event of chiasma formation (crossing-over) in the short space between either locus and the interchange point. Thus, the two loci will appear closely linked even though they are normally on different chromosomes. C, far enough from the interchange for frequent chiasma formation in the intervening interval, will give two-out-of-four recombination with the other two markers whenever, as shown in this diagram, such a chiasma occurs. (ii) Constitutions of the four meiotic products resulting from the chiasma distribution and centromere orientation shown in (i). (iii) The corresponding association of four when an additional chiasma (crossover) is formed between a centromere and the interchange point (shown dotted in Figure 2A). Viable meiotic products can be formed whether the centromeres separate 1 with 3 and 2 with 4, or (the alternative shown) 1 with 4 and 2 with 3. But in either case two of the four products will have duplications and deficiencies and be inviable. (iv) The two alternative sets of meiotic products resulting from the chiasma distribution shown in (iii). Those duplicated for the A/a-containing segment and deficient for B/b, or vice versa, and therefore inviable, are marked X. The viable products are again all A B or a b.

The regular formation of the metaphase I ring of 14 
chromosomes, with alternate orientation of centro- 
meres, depends on chiasma localization, such that 
chiasmata are regularly formed in each chromosome 
arm but hardly at all in the regions between the 
centromeres and the exchange points. The two 
chromosome sets can exchange genes located in their 
pairing arms, but in the regions between the centromeres
and the exchange points they remain distinct
and to some extent functionally differentiated, most 
notably with respect to germ cell production and 
viability. The meiotic products that form embryo 
sacs all carry one set and all viable pollen grains 
carry the other. Thus, a state of ‘permanent hybridity’ 
is maintained. 
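The numbering scheme for the Oenothera ring can be generated and checked mechanically. The sketch below, using hypothetical function names, builds the two sets of seven chromosomes from 14 numbered arm segments (one set 1-2, 3-4, and so on; the other 2-3, 4-5, and so on, closing with 14-1), and checks that pairing of shared arm segments links all 14 chromosomes into a single ring rather than several smaller associations.

```python
def oenothera_complexes(n_arms: int = 14):
    """Return the two chromosome sets of a ring-forming complex heterozygote.

    Each chromosome is a pair of numbered arm segments joined at the centromere.
    """
    if n_arms % 2:
        raise ValueError("a ring needs an even number of arm segments")
    arms = list(range(1, n_arms + 1))
    # Set one: 1-2, 3-4, ...; set two: 2-3, 4-5, ..., with the last arm joined to arm 1.
    set_one = [(arms[i], arms[i + 1]) for i in range(0, n_arms, 2)]
    set_two = [(arms[i], arms[(i + 1) % n_arms]) for i in range(1, n_arms, 2)]
    return set_one, set_two


def forms_single_ring(set_one, set_two) -> bool:
    """True if pairing of shared arm segments links all chromosomes into one ring."""
    chromosomes = set_one + set_two
    # Two chromosomes pair at meiosis wherever they share an arm segment.
    partners = {c: [d for d in chromosomes if d != c and set(c) & set(d)] for c in chromosomes}
    if any(len(p) != 2 for p in partners.values()):
        return False
    # Walk around the association and count how many chromosomes it contains.
    start = chromosomes[0]
    previous, current, visited = None, start, 0
    while True:
        following = next(d for d in partners[current] if d != previous)
        previous, current = current, following
        visited += 1
        if current == start:
            return visited == len(chromosomes)


set_a, set_b = oenothera_complexes()
print(set_a)
print(set_b)
print(forms_single_ring(set_a, set_b))
```

Because each set carries every arm segment exactly once, both the embryo-sac set and the pollen set are balanced, which is what allows the ring to be transmitted intact.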

The system depends on the localization of chi- 
asmata and the alternate orientation of centromeres, 
and it is not, in fact, completely stable. In some stocks 
a few per cent of plants can show striking differences
from the standard species type because they have
become homozygous for chromosome segments that 
are normally kept heterozygous. This apparent high 
mutability of Oenothera spp., quite untypical of 
organisms in general, was a main foundation for 
Hugo de Vries’s mutation theory of evolution, which 
was very influential in the first two decades of the 
twentieth century. 


Further Reading 
Darlington CD (1937) Recent Advances in Cytology, 2nd edn. 
London: Churchill. 


See also: Crossing-Over; Crossover Suppressor; 
De Vries, Hugo; Polytene Chromosomes; 
Synaptonemal Complex; Translocation 
Segmentation genes are those required for controlling
segmentation in insect embryos. 


See also: Drosophila melanogaster 
Segregation is a fundamental concept of genetics. It
refers to the separation, during gamete formation, of two
alleles present at a locus in a diploid individual. This
ensures the representation of both alleles in the progeny.

The notion of segregation was introduced by 
Gregor Mendel in his seminal work ‘Experiments in 
plant hybridization.’ He demonstrated that “various 
kinds of egg and pollen cells were formed in hybrids 
on the average in equal numbers,” thus establishing 
that heterozygotes produce two types of gametes with 
regard to each locus. These two types are produced in equal numbers, and each carries one of the two alleles of the parent. Therefore, Mendel’s First Law is also known as “The Law of Segregation.”

In the early twentieth century it was recognized 
that segregation of genes during gametogenesis is 
closely paralleled by segregation of chromosomes 
during meiosis, the cell division responsible for gamete
production. This observation led to the formulation
of the chromosomal theory of heredity.

While in the majority of cases segregation results in 
equal representation of alleles in offspring, several 
exceptions exist in which the ratio deviates significantly from 1:1. This phenomenon is known as segregation distortion, transmission ratio distortion, or meiotic drive. Among the more spectacular examples are the inheritance of the t complex in the mouse, the SD locus in Drosophila, and Spore killer in Neurospora, in which
ratios of alleles transmitted to the offspring by a het- 
erozygous parent can be biased as much as 95:5 or 
even 99:1. The underlying causes of segregation dis- 
tortion may differ from case to case. 
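Whether an observed ratio deviates significantly from 1:1 is usually judged with a statistical test. As an illustration not taken from the text, a chi-square goodness-of-fit test with one degree of freedom can be applied to offspring counts; the function name and the counts below (scaled to 100 offspring) are hypothetical.

```python
def chi_square_one_to_one(count_a: int, count_b: int) -> float:
    """Goodness-of-fit statistic for observed counts against an expected 1:1 ratio."""
    total = count_a + count_b
    if total == 0:
        raise ValueError("no offspring scored")
    expected = total / 2
    return ((count_a - expected) ** 2 + (count_b - expected) ** 2) / expected


# Critical value of chi-square with 1 degree of freedom at the 5% level.
CRITICAL_VALUE_1DF_5PCT = 3.841


for observed in [(52, 48), (95, 5), (99, 1)]:
    stat = chi_square_one_to_one(*observed)
    verdict = "distorted" if stat > CRITICAL_VALUE_1DF_5PCT else "consistent with 1:1"
    print(observed, round(stat, 2), verdict)
```

A 95:5 ratio among 100 offspring gives a statistic of 81, far beyond the 5% critical value, whereas 52:48 is well within chance variation.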

The concept of segregation is also used in a more
specific sense in fungal genetics with regard to the
arrangement of spores in a linear ascus. Thus, if A
and a are alleles at a particular locus, an arrangement
of spores in the order AAaa (in a tetrad) or AAAAaaaa
(in an octad) is referred to as a first-division
segregation (FDS) pattern. Arrangements such as AaAa
and AAaaAAaa (in a tetrad and an octad, respectively)
are called second-division segregation (SDS). The linear
nature of the ascus with respect to the division planes
of meiosis allows the inference that in FDS the two
alleles were separated in the first meiotic division,
whereas in SDS the alleles segregated in the second
meiotic division. This information, in turn, can be
used to conclude that no recombination (or an even
number of events, if the distance is large) has occurred
between the locus and the centromere in the first case.
In contrast, there was a recombination event (or an
odd number of events) in the second case. In this way,
analysis of the order of spores in linear asci can be
utilized for centromere mapping.
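Centromere mapping from ordered asci can be sketched as follows. The sketch assumes the standard textbook estimate, not stated in the text above, that the gene-centromere distance in centimorgans is half the percentage of SDS asci, because a single crossover in an SDS ascus involves only two of its four chromatids. This estimate is reliable only for short distances, where multiple crossovers are rare. Function names are illustrative.

```python
def classify_ascus(spores: str) -> str:
    """Classify an ordered linear ascus (tetrad or octad) as 'FDS' or 'SDS'.

    If each half of the ascus is homogeneous and the halves differ
    (e.g. AAaa, AAAAaaaa), the alleles separated at meiosis I (FDS);
    any other arrangement (e.g. AaAa, AaaA, AAaaAAaa) indicates SDS.
    """
    half = len(spores) // 2
    first, second = spores[:half], spores[half:]
    if len(set(first)) == 1 and len(set(second)) == 1 and first[0] != second[0]:
        return "FDS"
    return "SDS"


def centromere_distance_cM(asci: list[str]) -> float:
    """Gene-centromere distance = 1/2 x (fraction of SDS asci) x 100."""
    n_sds = sum(1 for ascus in asci if classify_ascus(ascus) == "SDS")
    return 50.0 * n_sds / len(asci)


# Hypothetical sample: 90 FDS octads and 10 SDS octads
sample = ["AAAAaaaa"] * 90 + ["AAaaAAaa"] * 10
print(centromere_distance_cM(sample))  # 5.0
```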


See also: Mendel’s Laws; Mendelian Genetics 


Segregation Distortion, Mouse
According to Mendel’s First Law, sexually reproducing
organisms segregate the two copies they carry
of each of their genes equally to their gametes and
offspring. Thus, if an organism is heterozygous with
an A1 and an A2 allele at its A locus, half of its
offspring (on average) will receive the A1 allele, and half
will receive the A2 allele. Some unusual genetic
entities violate Mendel’s First Law. In the mouse, a
chromosomal region called the t complex can be
present in a mutant form known as a t haplotype. Males
that are heterozygous with a wild-type form of the
t complex and a t haplotype form can transmit the t
haplotype to over 90% of their offspring in a clear
violation of Mendel’s Law. Because segregation is not
equal, as it is for most genes, this process is referred to
as segregation distortion.


See also: Meiotic Drive, Mouse 


Selection 


See: Frequency-Dependent Selection; Frequency- 
Dependent Selection as Expressed in Rare Male 
Mating Advantages; Fundamental Theorem of 
Natural Selection; Natural Selection 


Selection Coefficient 
Natural selection is differential reproduction and at
the single gene level it is modeled mathematically by 
assigning relative probabilities of reproduction to the 
various genotypes at a genetic locus. For example, a 
gene which may mutate to an allele which leads to 
prereproductive death or to sterility in homozygotes 
may be assigned a zero probability of reproducing and 
leaving progeny. In this simple case, we may also 
assume that the lethal/sterile allele is completely reces- 
sive; then the probabilities of reproduction for the 
heterozygote and alternate homozygote will be one 
or unity. If we count genotypes at fertilization we will 
have AA, Aa, and aa individuals before selection acts. 
The most common model assigns reproductive prob- 
abilities of 1 to the AA and Aa genotypes and zero to 
the aa genotype. These are multiplied by the frequen- 
cies of the three genotypes to predict frequencies in 
the reproductive gene pool. Clearly, only the AA and 
Aa genotypes will be represented among the indi- 
viduals reproducing and the fertilization-stage geno- 
types for the next generation will be determined by 
the relative frequencies of these two genotypes. The 
most common formulation of this model uses notation
in which the genotype(s) most likely to reproduce
are assigned values of 1.0 and the less successful
genotype(s) are assigned values of (1.0 − s). In this
formulation the variable s is known as the selection
coefficient and takes values ranging from zero to 1.0.

In some cases selection coefficients may vary with
population density or genotype frequency.
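The model described above can be sketched as one generation of selection against a completely recessive allele. The sketch assumes random mating, so that genotypes at fertilization are in Hardy-Weinberg proportions (an assumption not stated in the text); the relative fitnesses 1, 1, and 1 − s are multiplied by the genotype frequencies and the result is renormalized by the mean fitness.

```python
def next_gen_q(q: float, s: float) -> float:
    """Frequency of allele a after one generation of selection against aa.

    Relative fitnesses: AA = 1, Aa = 1, aa = 1 - s (a fully recessive allele).
    Genotypes at fertilization are assumed to be in Hardy-Weinberg proportions.
    """
    p = 1.0 - q
    w_AA, w_Aa, w_aa = 1.0, 1.0, 1.0 - s
    # Genotype frequencies at fertilization, weighted by fitness
    f_AA, f_Aa, f_aa = p * p * w_AA, 2 * p * q * w_Aa, q * q * w_aa
    w_bar = f_AA + f_Aa + f_aa  # mean fitness, equal to 1 - s * q**2
    # Each aa survivor carries two a alleles and each Aa survivor one
    return (f_aa + 0.5 * f_Aa) / w_bar


q = 0.5
for generation in range(1, 4):
    q = next_gen_q(q, s=1.0)  # s = 1: aa is lethal or sterile
    print(generation, round(q, 4))
```

With s = 1 the update reduces to q/(1 + q), so a starting frequency of 0.5 falls to 1/3 and then to 1/4, illustrating how slowly selection removes a rare recessive allele.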


See also: Frequency-Dependent Selection 


Selection Differential 
If there is selection aimed at changing the expression
of a quantitative trait in a population, individuals that 
have measurements in a desired range of values are 
chosen to be parents. So if, for example, high yield is 
what a plant breeder wants, he or she saves for repro- 
duction plants that have yields in the highest 100p 
percent of the population. The average amount by 
which chosen individuals exceed the population mean 
is called the selection differential. 

The selection differential does not by itself express
the strength of selection. A more revealing way to do
this is to divide the selection differential S by the
phenotypic standard deviation σ. The ratio, i, that is
thus obtained is called the intensity of selection. If,
in particular, it is assumed that measurements on a
quantitative character are normally distributed and a
proportion p of individuals having the highest
measurements are selected,

$i = S/\sigma = z/p$

where z is the ordinate of the standard normal
distribution at the point of truncation.
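As a sketch, the selection differential and the intensity can also be computed directly from a sample of measurements under truncation selection. The function name, and the use of the sample's own mean and standard deviation (divisor n) as stand-ins for the population parameters, are illustrative assumptions.

```python
import statistics


def selection_differential(values, fraction):
    """Truncation selection on a sample: keep the top `fraction` of values.

    Returns (S, i): the selection differential (mean of the selected minus
    the population mean, in trait units) and the intensity i = S / sigma.
    """
    n_keep = max(1, round(len(values) * fraction))
    selected = sorted(values, reverse=True)[:n_keep]
    mean_all = statistics.fmean(values)
    S = statistics.fmean(selected) - mean_all
    i = S / statistics.pstdev(values)
    return S, i


# Hypothetical yields for ten plants; the best 20% are saved for reproduction
yields = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
print(selection_differential(yields, 0.2))
```

Here S = 9.5 − 5.5 = 4.0 yield units, while i expresses the same selection in standard deviations and so can be compared across traits.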

The concept of a selection differential can be gen- 
eralized. First, one may consider a weighted selection 
differential. A weight, proportional to the contribu- 
tion to offspring measured in the next generation, is 
assigned to a parent (or a pair of parents). Second, if 
several characters are correlated with the one under 
selection, the selection differential for any trait may 
be partitioned into a component that estimates direct 
selection and a sum of components from indirect 
selection on all correlated traits. A third generalization 
applies where selection is based on an index which is a 
linear combination of measurements on at least two 
traits. A weight associated with a trait in this expres- 
sion represents the relative importance assigned to the 
trait by the breeder. In this case the selection
differential is the average amount by which the chosen
individuals have index values that exceed the
population mean of the index.
Further Reading

Falconer DS and Mackay TFC (1996) Introduction to Quantitative 
Genetics, 4th edn. Harlow, UK: Longman. 

Lynch M and Walsh B (1998) Genetics and Analysis of Quantitative 
Traits. Sunderland, MA: Sinauer Associates. 


See also: QTL (Quantitative Trait Locus); 
Quantitative Inheritance; Quantitative Trait; 
Selective Breeding 
In programs for the genetic improvement of animals
and plants, the aim is usually to improve performance 
in a number of traits. In pigs, for example, these traits 
might include traits of the growing animal such as 
growth rate, leanness, efficiency of conversion of 
food into growth, and viability; and traits of the par- 
ents such as litter size, fertility, and longevity. Further- 
more, data are collected on individual candidates for 
selection and on their relatives, perhaps on different 
traits and at different times: for example, carcass
leanness cannot be recorded on an animal kept for
breeding, and traits such as litter size can be recorded
only on females. The selection index is designed to put
together information in an optimal way to enable 
selection of the individuals likely to have progeny 
with the highest overall economic performance. The 
bases for constructing a multitrait selection index were 
put forward by Fairfield Smith in 1936 and by Hazel 
in 1943. In 1947 Lush showed how to incorporate 
information on relatives. 

The selection index is essentially a multiple
regression predictor of progeny performance or breeding
value for an objective, typically economic merit,
from a set of observations, $y_i$, on an individual or its
relatives. The index, $I = \sum_i b_i y_i$, is a weighted sum of
these observations with index weights $b_i$, or in
matrix notation, $I = \mathbf{b}'\mathbf{y}$. The variances and covariances
of the $y_i$ are summarized by $\mathbf{P}$, with elements
$p_{ij} = \mathrm{cov}(y_i, y_j)$. $\mathbf{P}$ is traditionally called the phenotypic
covariance matrix, but the $y_i$ can be family means, or
predicted breeding values, or indeed almost any
measurement. Economic merit, H, is usually decomposed
into a set of breeding values, $z_j$, for individual traits,
so $H = \sum_j z_j a_j = \mathbf{z}'\mathbf{a}$, where the economic weights, $a_j$,
specify the financial benefit of a unit change in
the trait (e.g., the price of an average chicken egg),
holding the rest constant. The covariances between
the observations and these genetic objectives are
summarized by the genetic (co)variance matrix $\mathbf{G}$, with
$g_{ij} = \mathrm{cov}(y_i, z_j)$.

The weights of the optimal index are given by
$\mathbf{b} = \mathbf{P}^{-1}\mathbf{G}\mathbf{a}$; if the objective is just to improve a
single trait, $\mathbf{G}$ can be represented by a column vector
for that trait, with $a = 1$. These optimal index weights
satisfy a number of criteria: they maximize the
accuracy of the index (the correlation, $r_{IH}$, between I and
H), the probability that two individuals are correctly
ranked, and the selection response; and they minimize
the variance of the predicted values of H about the
actual values. The index is, of course, optimal only if
$\mathbf{P}$, $\mathbf{G}$, and $\mathbf{a}$ are known without error. In practice
this can never be the case, so accuracy is usually less
than predicted, but indices are robust to poor estimates
of most of the parameters. The index can be expanded in
various ways: for example, to include data on molecular
markers linked to loci influencing the quantitative
traits of interest, or to maximize changes in overall
performance while holding other traits constant.
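The optimal weights $\mathbf{b} = \mathbf{P}^{-1}\mathbf{G}\mathbf{a}$ can be sketched in plain Python by solving the linear system $\mathbf{P}\mathbf{b} = \mathbf{G}\mathbf{a}$ rather than inverting $\mathbf{P}$ explicitly. The function names and the (co)variances in the usage example are illustrative assumptions, not values from any breeding program.

```python
def solve(A, rhs):
    """Solve A x = rhs by Gaussian elimination with partial pivoting."""
    n = len(A)
    # Augmented matrix, copied so that the inputs are not modified
    M = [list(map(float, row)) + [float(v)] for row, v in zip(A, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    # Back substitution
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x


def index_weights(P, G, a):
    """b = P^-1 G a, with P (obs x obs), G (obs x traits), a (traits)."""
    Ga = [sum(g * w for g, w in zip(row, a)) for row in G]
    return solve(P, Ga)


# Hypothetical two-trait example with own records on both traits
P = [[4.0, 1.0], [1.0, 9.0]]  # phenotypic (co)variances of the observations
G = [[1.0, 0.0], [0.0, 2.0]]  # cov(observation i, breeding value j)
a = [1.0, 1.0]                # economic weights
print(index_weights(P, G, a))
```

With a single own record ($\mathbf{P} = [V_P]$, $\mathbf{G} = [V_A]$, $a = 1$) the weight reduces to $V_A/V_P = h^2$, the familiar result for mass selection.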

Selection indices are widely used in animal
breeding, where information on individual animals is
expensive to acquire and progeny group sizes are often
small, and in tree breeding, but are less used in other
plant breeding programs. An example of the use of an
index was in pig breeding in Britain in a national
program operated in the 1960s and 1970s. Four
littermates were taken from each candidate litter to a
central testing station and reared to bacon weight, at
which point two were slaughtered. Indices were computed
for selecting the boars, each combining 14 items of
information: rate of weight gain, efficiency of
conversion of food into gain, and backfat depth
(recorded ultrasonically) individually on both the boar
and his brother, and the average daily gain, food
conversion, and six carcass traits, including dissected
lean content of a joint, on their two slaughtered sibs.

In order to compare the performance of candidates 
for selection using their own or relatives’ perform- 
ance, identifiable sources of environmentally caused 
differences between them have to be eliminated. These 
include, for example, number of animals in the litter in 
which they were born, and season, year, or herd of 
birth. In the classical selection index, these envir- 
onmental effects are assumed to be negligible, for exam- 
ple if candidates are reared together, or accurately 
estimated. In many breeding programs using field
records, for example on milk production of dairy
cows, environmental effects such as herd cannot be
estimated with sufficient precision. The method of best
linear unbiased prediction (BLUP), due to Henderson,
deals with this problem. His ideas evolved from 1949,
but first came into use two decades later for selection
of dairy bulls used in artificial insemination programs
and having daughters unevenly distributed over many
herds. Now it is common to use an animal model in
which a breeding value is computed for each
individual, taking account of records on that animal and
all its relatives, properly weighted according to the
degree of relationship, in a single analysis. In principle,
BLUP combines the methods of least squares to
estimate identifiable environmental (fixed) effects and
selection indices to predict the random effects or
breeding values. BLUP is highly computer intensive,
but is now standard practice in animal breeding
programs, and has largely replaced the traditional
selection index.
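The way BLUP combines least-squares estimation of fixed effects with prediction of random effects can be sketched through Henderson's mixed model equations. This is a minimal sketch of a simplified sire model, not the animal model described above: it assumes herd is the only fixed effect, sires are unrelated (relationship matrix equal to the identity), and the variance ratio λ = σ²_e/σ²_s is known. Data and function names are hypothetical.

```python
def solve(A, rhs):
    """Solve A x = rhs by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(map(float, row)) + [float(v)] for row, v in zip(A, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x


def blup_sire_model(y, herd, sire, lam):
    """Mixed model equations for y = X b + Z u + e.

    y: daughter records; herd[k], sire[k]: 0-based codes for record k.
    Herd effects are fixed; sire effects are random and assumed unrelated,
    so lam * I is added to Z'Z. Returns (herd_effects, sire_predictions).
    """
    nh, ns = max(herd) + 1, max(sire) + 1
    n = nh + ns
    lhs = [[0.0] * n for _ in range(n)]
    rhs = [0.0] * n
    for yk, h, s in zip(y, herd, sire):
        cols = (h, nh + s)  # one fixed and one random incidence per record
        for r in cols:
            rhs[r] += yk      # builds X'y and Z'y
            for c in cols:
                lhs[r][c] += 1.0  # builds X'X, X'Z, Z'X, Z'Z
    for s in range(ns):
        lhs[nh + s][nh + s] += lam  # shrinks sire solutions toward zero
    sol = solve(lhs, rhs)
    return sol[:nh], sol[nh:]


# Hypothetical data: one herd, two sires with two daughters each
print(blup_sire_model([10, 12, 6, 8], [0, 0, 0, 0], [0, 0, 1, 1], lam=3.0))
```

In this balanced example each sire's daughters average 2 units above or below the herd mean of 9; with two daughters per sire and λ = 3 the predictions are shrunk by 2/(2 + 3), giving ±0.8.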


Further Reading 

Cameron ND (1997) Selection Indices and Prediction of Genetic 
Merit in Animal Breeding. Wallingford, UK: CAB International. 

Falconer DS and Mackay TFC (1996) Introduction to Quantitative 
Genetics, 4th edn. Harlow, UK: Longman. 


See also: Artificial Selection; Genetic Correlation; 
Selective Breeding 
Selection intensity is a measure of the strength of
directional selection applied in a selection experiment
or breeding program to change a quantitative trait. If
selection is practiced on individual performance, the
selection applied can be described by the selection
differential (S), which is the difference between the
mean performance of the selected individuals and
that of the population as a whole. The selection
differential is measured in units of the trait, for example
grams of body weight or number of offspring
born in a litter of mice or pigs. The selection intensity,
usually denoted i, equals the selection differential
measured in phenotypic standard deviations
($\sigma_P = \sqrt{V_P}$), i.e., $i = S/\sigma_P$. Therefore i is a
dimensionless quantity, and its magnitude does not
depend on the variability of the trait.

The selection intensity is a useful measure because
its value can be predicted in an artificial selection
program from knowledge of the selection criteria,
the proportion of individuals selected, and the
distribution of the trait. For example, with truncation
selection, in which the highest performing individuals for
the trait are selected, i is a simple function of the
proportion selected (p) for any given distribution. For the
normal distribution, tables of i are available (e.g., Falconer
and Mackay, 1996), or it can be computed as $i = z/p$,
where z is the ordinate of the standardized normal at
the truncation point. For example:


p    0.5    0.2    0.1    0.05   0.01   0.001
i    0.798  1.400  1.755  2.063  2.665  3.367
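The relation $i = z/p$ can be evaluated with Python's standard-library `statistics.NormalDist` (Python 3.8 or later), which provides `inv_cdf` and `pdf`. This is a minimal sketch for an infinitely large, normally distributed population, not the method used to compile published tables; the function name is illustrative.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def truncation_intensity(p: float) -> float:
    """Selection intensity i = z/p when the top fraction p of a normally
    distributed, infinitely large population is selected."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    x = STD_NORMAL.inv_cdf(1.0 - p)  # truncation point in SD units
    z = STD_NORMAL.pdf(x)            # ordinate of the standard normal there
    return z / p


for p in (0.5, 0.2, 0.1, 0.05, 0.01, 0.001):
    print(f"p = {p:<6} i = {truncation_intensity(p):.3f}")
```

Printed to three decimals, these values should agree with the tabulated ones.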


Selection intensity depends somewhat on the size of 
the population. For a given proportion selected the 
intensity becomes slightly less as the population size 
becomes smaller, and values can be computed using 
order statistics. For example, if N are selected from M 
recorded, and N = M/10, e.g., 2 out of 20: 


M    10     20     50     100    200    ∞
i    1.539  1.638  1.705  1.730  1.742  1.755


The intensity is further reduced in a predictable way in 
a small population because the performance of family 
members is correlated. 
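The order-statistic calculation for a finite population can be sketched numerically. The expected value of each of the top N of M independent standard normal deviates is obtained by integrating x against the density of the corresponding order statistic, and the intensity is the mean of these expectations. The integration range (±8 standard deviations), the step size, and the function names are arbitrary choices for illustration. The sketch ignores the further reduction caused by correlated family members.

```python
import math
from statistics import NormalDist


def expected_order_stat(r: int, n: int, step: float = 0.002) -> float:
    """Expected value of the r-th smallest of n independent standard normal
    deviates, by trapezoidal integration of x times the density of X_(r)."""
    nd = NormalDist()
    coef = n * math.comb(n - 1, r - 1)  # n! / ((r-1)! (n-r)!)
    k = round(16 / step)                # number of steps from -8 to +8
    total = 0.0
    for j in range(k + 1):
        x = -8.0 + j * step
        F = nd.cdf(x)
        dens = coef * F ** (r - 1) * (1 - F) ** (n - r) * nd.pdf(x)
        weight = 0.5 if j in (0, k) else 1.0  # trapezoid end points
        total += weight * x * dens
    return total * step


def finite_intensity(n_selected: int, m_recorded: int) -> float:
    """Intensity when the best n_selected of m_recorded are kept."""
    top = range(m_recorded - n_selected + 1, m_recorded + 1)
    return sum(expected_order_stat(r, m_recorded) for r in top) / n_selected


for m in (10, 20, 50, 100, 200):
    print(m, round(finite_intensity(m // 10, m), 3))
```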

If selection is practiced on individual performance,
the predicted response to one generation of selection is
given by $R = h^2 S = i h^2 \sigma_P$, where i is the mean for
males and females if the intensities differ between the
sexes. Thus, for example, if the highest-scoring 5% of
males and 20% of females of a large population are
selected for a trait (say growth rate of pigs) with
$\sigma_P = 50$ g per day and $h^2 = 0.3$, the predicted response
would be 26 g per day (= 0.5 × (2.063 + 1.400) × 0.3
× 50). The selection intensity can be used to measure
the amount of selection when selection is not simply
on individual performance (mass selection) but on
relatives’ performance or indeed on any quantitative
selection index. It also features in formulas for the
change in gene frequency and hence selective value
(s) of a gene affecting a quantitative trait under
selection: with mass selection, $s = i a/\sigma_P$, where a is the
effect of the gene.
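The worked example above can be reproduced with a short sketch of the breeder's equation and of the gene's selective value. Function names are illustrative, and the intensities 2.063 and 1.400 are taken from the table for p = 0.05 and p = 0.2.

```python
def predicted_response(i_males: float, i_females: float,
                       h2: float, sigma_p: float) -> float:
    """Breeder's equation R = i * h^2 * sigma_P, with i averaged over sexes."""
    i = 0.5 * (i_males + i_females)
    return i * h2 * sigma_p


def gene_selective_value(i: float, a: float, sigma_p: float) -> float:
    """Selective value of a gene with effect a under mass selection."""
    return i * a / sigma_p


# Top 5% of males (i = 2.063) and top 20% of females (i = 1.400)
R = predicted_response(2.063, 1.400, h2=0.3, sigma_p=50.0)
print(round(R, 1))  # 26.0 (g per day)
```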

Selection intensity depends on reproductive rate, 
but the breeder can manipulate this, for example, by 
retaining selected animals for more litters or by using 
techniques such as artificial insemination. There can 
be a tradeoff, however: an increase in intensity may be 
at the expense of increased generation interval and at 
the cost of increasing inbreeding and reducing vari- 
ation if few individuals are selected. 


Natural Selection 


Selection intensity can be computed even where
truncation selection is not practiced. In artificial
selection this may arise where numbers of offspring are
deliberately adjusted according to performance. Under
natural selection, individuals contribute to the next
generation according to their fitness. Assume that the
number of offspring of individual j is $X_j$ and the mean
number is $\mu$. Hence the relative fitness of the
individual is $X_j/\mu$, and the selection differential in
fitness is $\sum_j X_j (X_j - \mu) / (\mu \sum_j X_j) = V_X/\mu^2$. This is
called the index of opportunity for selection. As the
standard deviation of relative fitness is $\sqrt{V_X/\mu^2}$, it
follows that $i = \sqrt{V_X/\mu^2}$. This shows that, of course,
selection can occur only if there is variability in
fitness. Whether selection on fitness, or indeed any
other trait, is effective then depends on its having
additive genetic variance. Selection intensity is not
used to define the magnitude of stabilizing selection,
i.e., where selection acts mainly to reduce variance
rather than to change the mean.
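A minimal sketch of the calculation for a list of offspring counts follows. It computes the selection differential in relative fitness directly from the offspring-weighted formula and checks that it equals $V_X/\mu^2$. Using the population variance (divisor n) for $V_X$ is an assumption of this sketch, and the counts are hypothetical.

```python
import math
import statistics


def opportunity_for_selection(counts):
    """Return (S, I, i) for offspring counts X_j of each individual j.

    S: selection differential in relative fitness, weighting each parent
       by its number of offspring; I = V_X / mu^2; i = sqrt(I).
    """
    mu = statistics.fmean(counts)
    v_x = statistics.pvariance(counts, mu)
    s_diff = sum(x * (x - mu) for x in counts) / (mu * sum(counts))
    index = v_x / mu ** 2
    return s_diff, index, math.sqrt(index)


# Hypothetical offspring numbers for three individuals
S, I, i = opportunity_for_selection([0, 2, 4])
print(round(S, 4), round(I, 4), round(i, 4))
```

If every individual left the same number of offspring, $V_X$ would be zero and so would S and i, which is the sense in which selection requires variability in fitness.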


Further Reading 
Cameron ND (1997) Selection Indices and Prediction of Genetic 
Merit in Animal Breeding. Wallingford, UK: CAB International. 


Reference 
Falconer DS and Mackay TFC (1996) Introduction to Quantitative 
Genetics, 4th edn. Harlow, UK: Longman. 


See also: Artificial Selection; Heritability 
In long-term selection experiments for quantitative
traits it has often been found that after many gener- 
ations of selection, a plateau or selection limit has been 
reached at which there appears to be little or no 
response despite continued selection. The limit can 
be explained by fixation of all useful genetic variation 
or by counteracting effects, for example, natural selec- 
tion opposing the artificial selection. 

It is usually hard to be sure a limit actually has
been reached because of sampling variation due to small
numbers of individuals and environmental differences
among generations. Even so, there are some clear
cases (e.g., F. W. Robertson, 1955) where limits have
been reached. There are others in which, despite
selection for over 50 generations, limits appear not to
have been reached. An example is the Illinois corn
experiment for increased oil content, in which response
has continued for a century. In the low line, however,
little response has occurred in recent generations, but
the mean is so near zero that there is little
opportunity for further change (see graph in the article
on Artificial Selection).


A selection limit will occur if a population has run 
out of useful variation. This is inevitable if there is 
essentially no phenotypic variation (as for low oil 
content in the Illinois corn oil experiment). More 
generally the limit can occur if additive genetic vari- 
ation is exhausted, i.e., the selected line becomes 
homozygous for all genes which were segregating in 
the base and which increase the trait in the desired 
direction. Residual nonadditive genetic variation can 
remain at such a limit if the favorable genes are domin- 
ant and reach high frequency or if there is overdo- 
minance. 

The magnitude of the response to the limit in
relation to the genetic variation in the base population
depends on the numbers of genes affecting the trait
and on the distribution of their effects. If very few loci
which influenced the trait were segregating in the base
population, such that individuals with the extreme
genotype were present in it, albeit at low frequency,
the limit would not be outside the initial range
of the population. Usually, however, it is far outside
the initial range, i.e., the total response is many
phenotypic standard deviations. An estimate of the
number (n) of genes affecting a trait, Wright’s effective
number, can be obtained by comparing the range
(R, the divergence between high and low lines) achieved
to the additive genetic variance ($V_A$) in the base
population or in an $F_2$ cross of high and low lines, as
$n = R^2/(8V_A)$.
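Wright's estimator can be checked against the idealized model under which it is usually derived: equal additive effects, high and low lines fixed for alternative alleles at every locus, and an $F_2$ with allele frequencies of one-half. Under these assumptions, which are not spelled out in the text above, the formula recovers the true number of loci. The model function and its parameters are hypothetical.

```python
def wright_effective_number(R: float, V_A: float) -> float:
    """Wright's estimate n = R^2 / (8 V_A) of the number of loci."""
    return R ** 2 / (8.0 * V_A)


def equal_effect_model(n_loci: int, a: float) -> tuple[float, float]:
    """Hypothetical model: n_loci additive loci with genotypic values -a, 0, +a.

    Returns (R, V_A): the divergence between lines fixed for all '+' and
    all '-' alleles, and the additive variance in their F2.
    """
    R = 2 * n_loci * a          # each locus contributes 2a to the divergence
    V_A = n_loci * a ** 2 / 2   # F2 variance per locus (frequency 1/2) is a^2/2
    return R, V_A


R, V_A = equal_effect_model(n_loci=10, a=0.5)
print(wright_effective_number(R, V_A))  # 10.0
```

When effects are unequal, or favorable alleles are linked in repulsion, the same formula returns fewer loci than actually segregate, which is why the result is called an effective number.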

As a population under selection is necessarily of 
finite size, desirable genes may be lost by chance, 
particularly those with a small effect on the trait and 
particularly if selection is weak and the population 
size is small. The limit to artificial selection then 
depends on the probability of fixation of the favorable 
genes. In a theory of limits to artificial selection, A. 
Robertson (1966) showed that the fixation probability 
is proportional to the product of effective population
size ($N_e$), selection intensity (i), the effect of the gene
on the trait relative to the phenotypic standard
deviation, and its degree of dominance. Prediction of the
actual limit to selection is not, however, possible
without (usually) unknown information on the
distribution of gene effects and frequencies in the base
population, but nevertheless there are some practical
consequences of the theory. In particular, there is a
trade-off between short- and long-term response, for
the initial response is proportional to the selection
intensity, whereas the limit is proportional to $N_e i$,
and is maximized if one-half of the population
is selected. Similarly, use of relatives’ information in a
selection index reduces the limit because relatives are
coselected, and so $N_e$ is reduced more than the
accuracy of selection is increased.
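Why the limit peaks when half the population is selected can be sketched under two simplifying assumptions: truncation selection on a normally distributed trait, and $N_e$ proportional to the number of parents kept from a fixed number of candidates. The limit then scales as p × i(p), which equals the normal ordinate z at the truncation point and is therefore largest at p = 1/2. This is an illustration of the argument, not Robertson's full theory.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()


def relative_limit(p: float) -> float:
    """p * i(p) under truncation selection of the top fraction p; this is
    proportional to N_e * i if N_e is proportional to the number selected."""
    x = STD_NORMAL.inv_cdf(1.0 - p)
    i = STD_NORMAL.pdf(x) / p  # intensity, i = z/p
    return p * i               # equals the ordinate z at the truncation point


grid = [k / 100 for k in range(1, 100)]
best = max(grid, key=relative_limit)
print(best)  # 0.5
```

From the tabulated intensities, p × i is about 0.399 at p = 0.5 but only about 0.18 at p = 0.1, even though the intensity, and hence the initial response, is more than twice as large at p = 0.1.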

Fixation is not the only cause of selection limits.
In theory, limits can occur if there are overdominant
loci or if most of the variance is due to recessive genes,
when inbreeding would lead to reduction in
performance. More importantly, perhaps, because selected
populations become extreme for the trait under
selection, but also show correlated responses in other
traits, it is to be expected that natural selection opposes
artificial selection such that a limit occurs at the
balance between these opposing forces. Evidence comes
from experiments in which the population mean at the
limit falls when either selection in the opposite
direction (reversed selection) has been practiced or the
population has been maintained without selection
(relaxed selection). Natural selection may be a
consequence solely of the shift in mean of the correlated
traits subject to stabilizing selection, or of increases in
frequency of specific genes that affect the trait under
selection but also have pleiotropic effects on fitness.
(Extreme examples are genes that have a large effect
on the trait as a heterozygote, but are lethal as a
homozygote.)

As new variation in quantitative traits arises by 
mutation, limits cannot happen as a consequence of 
running out of variation unless there are so few pos- 
sible loci and useful alleles at them that all were present 
at the outset or appeared during the selection process. 
It is therefore likely that fixation cannot account 
exclusively for limits, and other factors such as natural 
selection have to be invoked. All ‘limits’ may there- 
fore be transient, and renewed responses expected 
and explained by mutations or, perhaps, by recom- 
bination among haplotypes with balanced repulsion 
for useful genes. Nevertheless, in selection experi- 
ments for competitive fitness in bacteria for which 
responses must derive from mutation, Lenski and 
colleagues (Lenski and Travisano, 1994) found plat- 
eaus in response after thousands of generations of 
selection. 


Further Reading 

Falconer DS and Mackay TFC (1996) Introduction to Quantitative 
Genetics, 4th edn. Harlow, UK: Longman. 

Hill WG and Caballero A (1992) Artificial selection
experiments. Annual Review of Ecology and Systematics
23: 287–310.
Selection, either natural or artificial, involves unequal
reproduction among various genetic types. Consider
the old example of selection for longer necks in giraffes.
In times of scarce browse, the giraffes able to reach the
tops of trees would eat more and have more progeny. To
the extent that their longer necks and legs were
genetically determined, we would expect to see taller progeny
who would, in turn, have taller progeny. There is a point
of diminishing returns in this type of selection as the 
giraffe population becomes taller and taller in response 
to the hunt for nutrition higher and higher in the trees. 
Eventually the competition for browse is just as 
intense at the tops of the trees as it was at other levels. 
The effectiveness of selection in changing giraffe height 
genotypes over generations is selection pressure. 

Selection pressure depends primarily on the se- 
lection differential (see Selection Differential, Selection 
Intensity) and the amount of genetic variation in the 
selected population (see Heritability). Consider gene- 
tic resistance to pathogens (see Sickle Cell Anemia) 
in which any differential reproduction among geno- 
types, the selection differential, takes place only in the 
presence of the pathogen, and the magnitude of the
reproductive advantage depends on the prevalence of the
pathogen. In this example the selection pressure will 
depend on the prevalence of the pathogen; there are no 
genotypic differences in the absence of the pathogen 
and as the disease becomes more common the repro- 
ductive differences among the genotypes become 
more important in altering the reproductive success 
of different genotypes. 


See also: Branch Migration; Heritability; 
Selection Differential; Sickle Cell Anemia 
Selection techniques are used by geneticists to isolate
mutations. The techniques involve the process of 
isolating cells with a mutant phenotype by choos- 
ing conditions that favor the survival of the mutant 
phenotype and disfavor the survival of the parental
type. Selective techniques are a powerful tool and
are routinely used by microbial geneticists. Selective
techniques can be robust when applied to
microorganisms because of the large numbers of organisms
that can be manipulated. Although selective
techniques exist for complex organisms, it is much easier
to handle a million bacterial cells than a million
mice.

Typically a large population of bacterial cells is
grown under selective conditions so that the relatively
small number of variants that have arisen due to
mutation are the only cells that are capable of forming
a colony on selective agar plates. Among the most
powerful of selective techniques is the selection of
antibiotic-resistant cells. In a population of hundreds
of millions of antibiotic-sensitive cells, a single
antibiotic-resistant cell can be isolated with little
effort. The selective medium in this case contains,
in addition to all the nutrients required for growth,
an antibiotic that would prevent the original
population of cells from growing.
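Why plating hundreds of millions of cells makes isolating a rare resistant mutant easy can be illustrated with simple arithmetic. The sketch assumes that the number of resistant cells among those plated is Poisson-distributed with mean N × f and that every resistant cell forms a colony. The mutant frequency f = 10⁻⁸ and the 5 × 10⁸ cells are hypothetical numbers chosen for illustration, not measured values.

```python
import math


def expected_resistant(cells_plated: float, mutant_frequency: float) -> float:
    """Expected number of resistant colonies on the selective plates."""
    return cells_plated * mutant_frequency


def prob_at_least_one(cells_plated: float, mutant_frequency: float) -> float:
    """P(at least one resistant cell among those plated), treating the count
    of resistant cells as Poisson-distributed with mean N * f."""
    return 1.0 - math.exp(-expected_resistant(cells_plated, mutant_frequency))


# Hypothetical: 5 x 10^8 cells plated, resistant-mutant frequency 10^-8
print(expected_resistant(5e8, 1e-8))           # about 5 colonies expected
print(round(prob_at_least_one(5e8, 1e-8), 3))  # 0.993
```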

Selection techniques vary and can employ the cap- 
acity to: (1) grow in the presence of an antibiotic or 
other inhibitor; (2) utilize a new carbon source; (3) 
resist bacteriophage infection; (4) convert from auxo- 
trophy to prototrophy; (5) grow at a higher/lower 
temperature; and so on. 

A strong selection usually results in the selected
phenotype, but not necessarily the desired genotype.
Usually it is necessary to analyze the new phenotype
to verify the genotype. Suppose, for example, that an
investigator wants to set up a selection that would
increase the catalytic activity (efficiency) of the enzyme
responsible for degrading the antibiotic ampicillin and
therefore increase the level of resistance. A strain of
Escherichia coli that is resistant to 20 µg ml⁻¹ of
ampicillin but sensitive to 100 µg ml⁻¹ of ampicillin is
spread on agar plates that contain 100 µg ml⁻¹ of
ampicillin. By increasing the concentration of
ampicillin on the agar plates, it is anticipated that
mutations improving the catalytic efficiency of the
ampicillin-degrading enzyme will be selected.
However, after analyzing the colonies that grew on
the higher level of ampicillin, the investigator finds
that two independent classes of mutations can result
in resistance to the higher level of ampicillin. One is
the anticipated increase in the catalytic activity of the
enzyme; the other is increased expression of the
enzyme (an increased number of enzyme molecules).


See also: Antibiotic-Resistance Mutants; 
Screening 
The basis of genetic improvement programs in any
organism is selective breeding, where individuals are 
chosen that are expected to have offspring with desir- 
able properties. This is directed evolution: fitness is 
defined by the breeder rather than by the individual’s 
ability to survive and reproduce in nature. Selective 
breeding long predates the discovery of the
mechanisms of inheritance. Indeed, Darwin was much
influenced by the success of selective breeding in animals
and plants in developing the theory of natural
selection: “There can be no doubt that methodical selection
[by man] has effected and will effect wonderful
results” (Darwin, 1868). The vast range of phenotypes
of dogs (all derived by selection from the wolf) in color,
size, conformation, and behavior is perhaps the
clearest example of the power of selective breeding over
long periods of time. Improvements in selection of
plants and animals for food, illustrated by the greatly
increased grain production of modern varieties of
plants and of meat or milk by the modern breeds or
strains of animals, have enabled an enormous increase
in the human population. Selective breeding is used
to develop more efficient strains of microorganisms, 
for example yeast for brewing, and of laboratory 
stocks with defined properties, such as extreme 
obesity, for analysis to elucidate the genetic basis of 
the trait. 


Principles of Selective Breeding 
Programs 


For selective breeding to be effective, there must be 
genetic variation present in the population, a way of 
identifying individuals for selection that are likely to 
transmit the desired properties to the descendants, 
and sufficient spare reproductive capacity so that the 
population can be bred from only the chosen
individuals. For most traits there is considerable variation
at the observed or phenotypic level, thus providing
plenty of selective opportunity. Indeed, selection has
utilized both extreme mutant forms that have arisen or
been identified, such as coat color in livestock or
dwarfing genes in wheat, and quantitative genetic variation
contributed by many unidentified loci. Most traits
have a sufficiently high heritability, i.e., proportion
of the variation that is genetic (formally, additive
genetic), for artificial selection to be effective. In other
words, individuals that are extreme for the trait(s) of
interest are likely to have offspring with somewhat
similar properties, albeit less extreme than the selected
parents. In practice, selection may not be based just on
the individual’s own performance, but additionally or
exclusively on that of its relatives using a selection
index or best linear unbiased prediction. Spare
reproductive capacity to enable selection is the norm in most
plants and in most animals, at least among males.
Techniques such as artificial insemination can be used to
increase the selection intensity.

A breeding program needs a clear set of objectives 
that is followed over several generations. Typically, this 
involves the simultaneous improvement of many traits. 
For example, in wheat these include traits of the prod- 
uct, such as yield and bread-making quality, and agro- 
nomic traits such as straw strength, disease resistance, 
and drought resistance. In dairy cattle, the equivalent 
objectives include milk yield and protein composition, 
fertility, longevity, and mastitis resistance. 

A further fundamental component of a selective 
breeding program is the mating system employed by 
the breeder, which depends on the reproductive system of the organism, on whether heterosis is important for the traits of commercial interest, and on whether homogeneity of product is required.

In most livestock breeding programs, selection is 
practiced within segregating populations of as large a 
size as can be managed or is available nationally. In 
dairy cattle, commercial animals are usually purebred, 
largely because one breed in particular, the Holstein- 
Friesian (the black and white), is regarded as superior 
for milk production. In poultry or pigs that are bred 
for meat production, selection is typically practiced 
within three or more segregating populations. Crosses among these lines produce the dam of the commercial animal, and a further three-way cross produces the commercial offspring. This is mainly to utilize heterosis in reproductive performance of the dam, and to
utilize complementary properties of the sire and dam 
lines. The sire line(s) do not contribute to reproduc- 
tion except via fertility and can be selected primarily 
for traits of growth, and the dam lines are selected for 
both growth and reproductive performance. 
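The genetic make-up of the commercial animal in such a scheme follows from the rule that each offspring receives half its genes from each parent. The sketch below uses hypothetical line names: A and B are dam lines and C is the sire line. It computes the expected contribution of each pure line to the crossbred dam and to the three-way-cross commercial offspring.

```python
from fractions import Fraction


def pure(line):
    """A purebred animal of one line."""
    return {line: Fraction(1)}


def cross(sire, dam):
    """Expected line composition of offspring: the average of the two parents."""
    lines = set(sire) | set(dam)
    return {line: (sire.get(line, 0) + dam.get(line, 0)) / 2 for line in lines}


a, b, c = pure("A"), pure("B"), pure("C")
crossbred_dam = cross(a, b)            # A x B parent female
commercial = cross(c, crossbred_dam)   # C sire mated to the (A x B) dam
print({line: str(share) for line, share in sorted(commercial.items())})
```

The crossbred dam is heterozygous at loci where lines A and B differ, which is the source of heterosis in her reproductive performance. Half of the commercial animal's genes come from the sire line, which is why that line can be selected primarily for growth.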

In plants, the wide range of reproductive systems leads to a wide range of designs for improvement programs. For naturally outcrossing species of plant such
as maize (corn), for which there is considerable het- 
erosis in the commercial traits of yield and a need to 
produce a uniform product, the seed grown commer- 
cially is a two-way or higher cross of inbred lines. The 
selection is practiced both within inbred lines during 
their formation from crosses of existing commercial or 
other populations, and among inbred lines on the basis 
of their own and crossbred performance. It is the
selection that leads to the success of a new variety of 
maize, and not the inbreeding per se. In naturally selfing
species such as wheat, inbred lines are used commer- 
cially and are developed by selection within and 
among selfing inbred lines developed from crosses. 
In species that are reproduced clonally, such as the 
potato, selection has to be practiced during a repro- 
ductive cycle, but subsequent uniformity of product 
does not require homozygosity of the variety. Thus, 
while clonal reproduction offers the opportunity to 
use a specific genotype widely, it does not offer a route 
to subsequent improvement for which segregation and 
recombination are needed. 
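Repeated selfing produces inbred lines quickly because each generation of selfing halves the expected heterozygosity. Consider an F1 that is heterozygous at a locus. After t generations of selfing, the expected proportion of heterozygotes is (1/2)^t. The sketch tracks genotype frequencies under Mendelian segregation alone, assuming no selection.

```python
def self_once(freqs):
    """One generation of selfing: AA and aa breed true; Aa gives 1/4 AA, 1/2 Aa, 1/4 aa."""
    p_aa_upper, p_het, p_aa_lower = freqs
    return (p_aa_upper + p_het / 4, p_het / 2, p_aa_lower + p_het / 4)


def heterozygosity_after_selfing(generations):
    """Proportion heterozygous after the given number of selfings of an F1."""
    freqs = (0.0, 1.0, 0.0)  # F1: all heterozygous at this locus
    for _ in range(generations):
        freqs = self_once(freqs)
    return freqs[1]


for t in range(7):
    print(f"F{t + 1}: heterozygosity {heterozygosity_after_selfing(t):.4f}")
```

After six generations of selfing (the F7), only about 1.6% of such loci are expected to remain heterozygous.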

There are a considerable number of theories and 
experiments on the design of selective breeding pro- 
grams in animals and plants, which have been devel- 
oped over many decades. In livestock, the most 
influential proponent of the use of genetic principles 
was J.L. Lush, a disciple of Sewall Wright; and most 
modern methodology comes from C.R. Henderson 
whose work, like that of Lush, was mainly motivated 
by the problems of dairy cattle improvement. In 
plants, the use of inbreeding and selection was pro- 
pounded by Mangelsdorf, using the ideas of East and 
Jones. In general, because plants can be grown in large numbers and individual plants of most crops are not valuable, less formal statistical and quantitative genetic methodology has been applied in plant improvement programs than in animal and (more recently) tree improvement programs.

Success in a breeding program does not depend 
solely on its scientific basis. It requires, as indeed 
does any business, quality management that actually 
executes the geneticists’ breeding plan, financial 
strength, foresight, and luck. 


Examples of Genetic Changes from 
Selective Breeding in Practice 


The efficacy of selective breeding is obvious from the changes in yield and in the cost of food relative to income. While improvements in yield can be attributed both to genetic selection and to management, the two contributions can be separated. This typically requires the comparison of stock of different generations at the same time in
the same environment, for example by planting seeds 
stored for several years alongside seed from modern 
varieties. Even so, the most spectacular differences are 
not the products of man’s recent efforts, but can be 
seen among breeds of dog that differ by almost 100- 
fold in weight. These breeds have been developed over 
many centuries, and have presumably utilized mu- 
tations that have occurred over the long period since 
domestication. Similarly, much of the improvement 
during and subsequent to domestication in food plants
and animals was not based on knowledge of genetics. 
Some examples of the changes brought about by more 
recent selective breeding can be obtained from well- 
conducted experiments. 


Animals 
In the past, the same breeds or strains of chickens were 
used for both commercial egg and meat production, 
but now they are specialized. Most selection in broiler 
chickens for meat consumption has been placed on 
growth rate, as faster-growing birds incur lower feed and housing costs to reach market weight. Selection has,
however, also been placed on many other traits, 
including meat yield, conformation, leg function, and 
disease resistance of the broiler and on reproductive 
performance of the broiler parent. Havenstein et al. 
(1994) estimated genetic progress in broiler chickens 
by comparing the performance of a population main- 
tained without selection since 1957 and a 1991 com- 
mercial strain, in each case fed on a diet typical for 
1991 (Table 1). Note the threefold increase in growth 
rate, thereby enabling slaughter at a younger age (e.g., 
at 6 versus 8 weeks, but consequently with less flavor), 
with improved feed conversion efficiency and meat 
yield. There are downsides, however: the birds are 
fatter, so feed needs to be restricted to broiler hens, 
and there are increases in mortality and leg abnormality. Birds were also compared on a diet formulated to 1957 rather than 1991 standards (the 1991 diet has about 10% higher energy and protein content). At 6 weeks of
age, for example, body weights were 0.51 and 1.77 kg for 
the 1957 and 1991 strains, respectively, fed on the 1957 
diet. Hence most of the change was genetic in nature. 
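Comparing strains on a common diet amounts to a 2 × 2 factorial split of the total change. The sketch uses the 6-week body weights quoted in this section: 0.51 and 1.77 kg on the 1957 diet, and 0.63 and 2.13 kg on the 1991 diet (from Table 1). It averages each effect over the two levels of the other factor. That averaging convention is our assumption. Other conventions give somewhat different shares because of the genotype × diet interaction.

```python
# Six-week body weight (kg), keyed by (strain, diet)
weights = {
    ("1957", "1957"): 0.51,
    ("1991", "1957"): 1.77,
    ("1957", "1991"): 0.63,
    ("1991", "1991"): 2.13,
}


def decompose(w):
    """Split the total change (1957 strain on 1957 diet -> 1991 strain on 1991 diet)
    into a strain (genetic) effect and a diet effect, each averaged over the other factor."""
    genetic = ((w["1991", "1957"] - w["1957", "1957"])
               + (w["1991", "1991"] - w["1957", "1991"])) / 2
    diet = ((w["1957", "1991"] - w["1957", "1957"])
            + (w["1991", "1991"] - w["1991", "1957"])) / 2
    total = w["1991", "1991"] - w["1957", "1957"]
    return genetic, diet, total


genetic, diet, total = decompose(weights)
print(f"genetic {genetic:.2f} kg, diet {diet:.2f} kg, total {total:.2f} kg "
      f"({100 * genetic / total:.0f}% genetic)")
```

About 85% of the difference in 6-week weight is attributable to the strain, which is consistent with the conclusion that most of the change was genetic.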
In dairy cattle, as in many other species of livestock, 
there has been substantial breed substitution (notably 
toward more specialized dairy types) between coun- 
tries, particularly in the black and whites. The North 
American Holstein population was derived from 
European animals exported during the late nineteenth 
century. Subsequently, however, because of greater 
concentration on milk production characteristics, the 
American population became superior in production 
to those remaining in Europe. During the last quarter of the twentieth century there was almost complete replacement of European animals by North American Holsteins. Rates of genetic change have
greatly accelerated in recent decades as modern selec- 
tion and breeding methods have been introduced, and 
are now in excess of 1% of the mean per year for 
production traits. 
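The acceleration can be understood from the widely used expression for response per year, ΔG = i·r·σA/L. Here i is selection intensity, r is the accuracy of the predicted breeding values (for example from BLUP or progeny testing), σA is the additive genetic standard deviation, and L is the generation interval in years. This single-path simplification is not given in the text. The parameter values below are hypothetical and serve only to show how higher intensity and accuracy and a shorter generation interval raise the annual rate of gain.

```python
def annual_gain(i, accuracy, sd_additive, generation_interval):
    """Expected genetic gain per year: i * r * sigma_A / L."""
    return i * accuracy * sd_additive / generation_interval


MEAN_YIELD = 8000.0  # hypothetical herd mean, kg milk per lactation
SD_A = 600.0         # hypothetical additive genetic sd, kg

older = annual_gain(i=1.0, accuracy=0.5, sd_additive=SD_A, generation_interval=6.0)
modern = annual_gain(i=1.5, accuracy=0.8, sd_additive=SD_A, generation_interval=5.0)
for label, gain in (("older scheme", older), ("modern scheme", modern)):
    print(f"{label}: {gain:.0f} kg/yr = {100 * gain / MEAN_YIELD:.2f}% of the mean")
```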

There are some problems and failures, however. 
Thoroughbred racehorses do not seem to be running 
much faster now than 50 years ago, judging by 
winning times recorded in the classic races. Genetic 
change in some species has left associated fitness prob- 
lems in their wake; for example, problems with leg 
weakness in broiler stocks have to be overcome or 
kept in abeyance by devoting selection effort to such 
traits in the breeding program. In the developing 
world, and even in less developed areas of other coun- 
tries, there has been little input or uptake of genetic 
change, in line with the lack of change in management 
practices. 


Plants 

Plants need to be adapted to their environment, which 
for field crops can be modified but not controlled, in 
contrast to animal breeding where housing can be 
uniform worldwide. Genotype × environment interactions therefore are an important feature of plant
breeding: for example, different varieties of maize are 
used at different latitudes in North America to make 
best use of the length of the local growing season. 
Improvements in yield have been well publicized, for 
example in the ‘Green Revolution’ where changes in 
management practices have been accompanied by and 
have benefited from new varieties. Indeed, for genetic 
progress to be effective, beneficial changes in manage- 
ment are usually required. In order to isolate the con- 
sequences of selective breeding, designed experiments 
are more suitable. Table 2 illustrates the extent and 
basis of the changes in winter-sown wheat (Austin 
et al., 1980). Varieties that were of major commercial 
importance in their time were grown in 1978 in 
England in low and high fertility soils. Note in par- 
ticular the substantial increases in yield, much of it 
achieved by shortening of the straw. The latter is associ- 
ated with an increase in the proportion of the plant 
mass present in the grain, and enables much heavier use to be made of nitrogen fertilizer without lodging (i.e., failing to stand). Indeed, the 1977 variety was the first to incorporate a dwarfing gene with major effect.


Table 1 Growth in the same trial of broiler chickens bred in 1957 and 1991

Strain   Body weight (kg)      Feed conversion (feed/gain)   Per cent (at 6 weeks)
         6 weeks   8 weeks     6 weeks   8 weeks             Meat   Fat    Deaths   TD^a
1957     0.63      0.99        2.51      2.65                11.6    8.4    2.2      1.2
1991     2.13      3.11        2.04      2.34                15.6   14.1    9.7     47.5

^a Tibial dyschondroplasia.


Table 2 Production of wheat in the same trial from varieties bred in different years

Year variety   Yield (t/ha)           Height (cm)            Harvest index^a (%)
introduced     Poor soil  Good soil   Poor soil  Good soil   Poor soil  Good soil
1908           3.30       5.22        112        142         34         36
1953           3.74       5.86         87        110         42         42
1972           4.04       6.54         86        106         46         46
1977           4.63       7.30         64         80         50         48

^a Grain/(grain + straw), %.
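Harvest index is grain/(grain + straw), so the above-ground biomass implied by each entry of Table 2 is grain yield divided by harvest index. The short calculation below uses only the Table 2 figures. It assumes that yield and harvest index refer to the same dry-matter basis, which the table does not state. Under that assumption, total biomass changed little, and most of the yield gain came from a shift in partitioning toward grain.

```python
# Table 2: year introduced -> (yield poor soil, yield good soil, HI% poor, HI% good)
TABLE_2 = {
    1908: (3.30, 5.22, 34, 36),
    1953: (3.74, 5.86, 42, 42),
    1972: (4.04, 6.54, 46, 46),
    1977: (4.63, 7.30, 50, 48),
}


def implied_biomass(yield_t_ha, harvest_index_pct):
    """Above-ground biomass (t/ha) = grain yield / harvest index."""
    return yield_t_ha / (harvest_index_pct / 100)


for year, (y_poor, y_good, hi_poor, hi_good) in TABLE_2.items():
    print(f"{year}: poor soil {implied_biomass(y_poor, hi_poor):.1f} t/ha, "
          f"good soil {implied_biomass(y_good, hi_good):.1f} t/ha")
```

Between the 1908 and 1977 varieties, grain yield rose by about 40% on both soils, whereas the implied biomass changed by less than 5%.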


Developments in Selective Breeding Programs


There has been extensive research on optimizing breeding schemes. This includes methods for predicting breeding value using selection indices and best linear unbiased prediction (BLUP) (see Selection Index), for better estimating genetic parameters such as heritabilities and correlations for use in these predictors, and for balancing selection intensity against effective population size. As decisions among potential breeding animals or plants cannot usually be made until they
are mature enough to have records, for example, on milk 
yield or amount of wood produced, there has been 
much research into indirect predictors of performance. 
Yield in first lactation is an excellent indicator of yield in
later lactations, but indirect measures such as hormone 
levels are usually not found to be sufficiently accurate to 
be useful. 
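The trade-off between selection intensity and effective population size can be illustrated with Wright's approximation for a population with Nm breeding males and Nf breeding females: Ne = 4·Nm·Nf/(Nm + Nf). The corresponding rate of inbreeding per generation is ΔF = 1/(2Ne). These standard formulas are not given in the text. They ignore the extra variance in family size that selection creates, so they overestimate Ne in practice. The numbers of parents are hypothetical.

```python
def effective_size(n_males, n_females):
    """Wright's approximation for a population with separate sexes."""
    return 4 * n_males * n_females / (n_males + n_females)


def inbreeding_rate(ne):
    """Expected increase in inbreeding per generation, 1 / (2 Ne)."""
    return 1 / (2 * ne)


for n_males in (100, 20, 5):  # fewer sires = more intense selection among males
    ne = effective_size(n_males, 1000)
    print(f"{n_males:>3} sires, 1000 dams: Ne = {ne:.1f}, dF = {inbreeding_rate(ne):.4f}")
```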

Markers of individual loci associated with perform- 
ance provide a quite different route. For example, a 
molecular marker has been used to identify hetero- 
zygotes and so eliminate a recessive gene causing 
stress susceptibility in pigs. With the advent of large 
numbers of molecular markers and dense linkage 
maps, the opportunities to use Mendelian variants 
increase, and studies have been conducted in the
major commercial species of plants and animals to 
identify quantitative trait loci (QTL). This infor- 
mation is intended for use in two ways. One is 
marker-assisted introgression, where QTL from one 
population are backcrossed into another (e.g., the 
dwarfing gene in wheat), using the marker informa- 
tion to bring in the QTL but to exclude as much 
background as possible. The other is marker-assisted 
selection, where marker data are used alongside quan- 
titative trait data to increase the accuracy of selection, 
and to make more accurate early selection; for example, 
in picking recombinant lines from an F2 cross, or in


selecting among young bulls prior to progeny testing 
for milk. Such marker-assisted selection is most likely 
to be effective where there is much linkage disequilibrium, such as in an F2 cross of inbred lines of plants,
rather than in random-mating livestock populations. 
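Markers help in introgression because unaided backcrossing removes donor background only gradually. Averaged over the genome, the expected proportion of the recurrent parent's genome after t backcrosses is 1 − (1/2)^(t+1), counting the F1 as t = 0. Donor segments linked to the target gene are lost more slowly than this. The sketch computes only this unlinked-genome expectation. It is a Mendelian average, not a model of marker-based selection.

```python
from fractions import Fraction


def recurrent_parent_fraction(backcrosses: int) -> Fraction:
    """Expected recurrent-parent genome proportion after t backcrosses (F1 has t = 0)."""
    return 1 - Fraction(1, 2) ** (backcrosses + 1)


for t in range(6):
    label = f"BC{t}" if t else "F1"
    print(f"{label}: {float(recurrent_parent_fraction(t)):.4f}")
```

Marker-assisted introgression speeds this recovery. Backcross progeny are chosen that carry the markers flanking the target gene but recurrent-parent markers elsewhere.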

Transgenic manipulation is being used commer- 
cially in plants, for example, to provide herbicide- 
resistant soybeans, but is not yet commercially 
available in animals. Indeed, the consumer response 
to genetically modified varieties is often highly emo- 
tional, even though the actual genetic changes made in 
effecting the improvement are known, whereas in 
classical breeding they are not. Selective breeding is 
increasingly based on a wider range of science and 
technology and is exposed to greater public interest 
as new routes to improvement become available. 
Effect of a Mutant and Neutrality


When a mutant is neither advantageous nor deleteri- 
ous, and its behavior is determined not by selection 
but by random genetic drift, the condition is said to be 
selective neutrality. In the strict sense, it means that a 
mutant has no effect whatsoever. An example is a 
nucleotide change of a pseudogene; since the pseudo- 
gene has no function, any changes to that gene have no 
effect. However, it is possible, even in this example, 
that a nucleotide change has some very small effects on 
DNA replication and recombination, and hence it is 
not completely neutral. Such a small effect cannot be 
recognized by natural selection, and is considered to 
be practically negligible. Let us therefore distinguish selective neutrality in the narrow sense from selective neutrality in the broad sense.
The former applies to those cases where the effect of 
a mutant is practically nil in the biological world, 
whereas the latter includes the cases in which a mutant 
has some small, but not negligible, effects. For the 
latter, both random genetic drift and selection become 
important, and the nearly neutral theory can be 
applied. 


Selective Neutrality in the Narrow Sense 


The neutral theory was first put forward by 
M. Kimura, and later by J.L. King and T.H. Jukes. 
Under selective neutrality in the narrow sense, which 
was mainly developed by Kimura and his associates, 
the behavior of mutant alleles in the population is 
solely controlled by random genetic drift, and the 
theory becomes simple. Let us consider the process 
of accumulation of new mutants within the species in 
the course of evolution. Suppose that mutant genes are 
substituted one after another in the finite population 
of N individuals. Let v be the neutral mutation rate per 
gene per generation. Since each individual has two 
homologous genes, there are 2N genes in the popula- 
tion, and 2Nv new mutants appear in the population 
in each generation. The rate of mutant substitutions 
per generation is equal to this number multiplied 
by the fixation probability. For a neutral mutant, the 
fixation probability is equal to the initial frequency, 
1/(2N). Therefore, the rate of substitution, k, becomes:

$$k = 2Nv \times \frac{1}{2N} = v \qquad (1)$$


This formula is very simple: the rate of mutant substitution equals the neutral mutation rate. Note that the remaining fraction, 1 − 1/(2N), of mutants is lost from the population.
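Equation (1) rests on the neutral fixation probability being equal to the initial frequency, 1/(2N). The Wright-Fisher simulation below checks this under its own assumptions: constant size N, random union of gametes, and no selection, so that Ne = N. The population size, number of replicates, mutation rate, and seed are arbitrary choices. Multiplying the estimated fixation probability by the 2Nv new mutations per generation recovers k ≈ v.

```python
import random


def neutral_fixation_probability(n_individuals, replicates, seed=1):
    """Fraction of single new neutral mutants that eventually fix (Wright-Fisher)."""
    rng = random.Random(seed)
    genes = 2 * n_individuals
    fixed = 0
    for _ in range(replicates):
        count = 1  # one new mutant copy among 2N genes
        while 0 < count < genes:
            p = count / genes
            # Binomial sampling of the next generation's 2N genes
            count = sum(rng.random() < p for _ in range(genes))
        fixed += count == genes
    return fixed / replicates


N, v = 10, 1e-6
u = neutral_fixation_probability(N, 10_000)
k = 2 * N * v * u  # new mutants per generation times fixation probability
print(f"fixation probability ~ {u:.4f} (theory {1 / (2 * N):.4f})")
print(f"substitution rate k ~ {k:.2e} per generation (v = {v:.0e})")
```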

In considering the population dynamics of mutant 
substitutions, one needs to know how long the mutant 
takes until it fixes in the population. The average time until fixation, $\bar{t}_1$, of a neutral mutant is known to be four times the effective population size, $N_e$ (in generations):

$$\bar{t}_1 = 4N_e \qquad (2)$$
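Equation (2) is the small-frequency limit of Kimura and Ohta's diffusion result for the mean conditional fixation time of a neutral allele starting at frequency p: t̄1(p) = −4Ne(1 − p)·ln(1 − p)/p. The sketch evaluates this expression to show how closely a single new mutant, with p = 1/(2N), approaches 4Ne. For simplicity it assumes N = Ne.

```python
import math


def mean_fixation_time(p, ne):
    """Mean time to fixation (generations) of a neutral allele at initial frequency p."""
    if not 0 < p < 1:
        raise ValueError("p must lie strictly between 0 and 1")
    return -4 * ne * (1 - p) * math.log1p(-p) / p


for ne in (100, 10_000):
    t = mean_fixation_time(1 / (2 * ne), ne)
    print(f"Ne = {ne}: mean fixation time = {t:.1f} generations (4Ne = {4 * ne})")
```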


The effective population size is usually smaller than 
the actual population size, N. During the process of 
the substitution, polymorphism of mutant and origin- 
al alleles appears. Population geneticists often measure 
polymorphism by heterozygosity, which is the prob- 
ability that two randomly chosen alleles differ. Under 
selective neutrality in the narrow sense, if the population is in equilibrium between mutation and random drift, the heterozygosity, H, is expected to be:

$$H = \frac{4N_e v}{1 + 4N_e v} \qquad (3)$$

This formula is again remarkably simple and quite useful. In addition to the heterozygosity, various quantities have been obtained enabling the neutral theory to be tested.
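Equation (3) is the infinite-alleles result, which assumes that every new mutation creates an allele not previously present in the population. The sketch evaluates the formula and inverts it to estimate 4Nev from an observed heterozygosity. The example values of Ne and v are hypothetical.

```python
def expected_heterozygosity(ne, v):
    """H = 4 Ne v / (1 + 4 Ne v) at mutation-drift equilibrium."""
    theta = 4 * ne * v
    return theta / (1 + theta)


def theta_from_heterozygosity(h):
    """Invert H = theta / (1 + theta) to recover theta = 4 Ne v."""
    if not 0 <= h < 1:
        raise ValueError("heterozygosity must lie in [0, 1)")
    return h / (1 - h)


h = expected_heterozygosity(ne=100_000, v=1e-6)
print(f"H = {h:.4f}; back-estimated 4Nev = {theta_from_heterozygosity(h):.3f}")
```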


Selective Neutrality in the Broad Sense 


Selective neutrality in the broad sense brings many complications to the discussion of the subject. It is well
known that the rate of molecular evolution is strongly 
dependent on selective constraints of proteins or 
nucleic acids, and we know that there are numerous 
types of mutations, from those with negligible effect 
to those with large effect. The borderline mutations 
between the selected and the neutral classes may be 
important and are called nearly neutral. How do the 
predictions differ from those under simple neutrality 
if these mutations are important? Theoretical studies 
on this problem were mainly carried out by T. Ohta 
and colleagues. One of the most critical quantities on 
such mutations is the fixation probability, u. For the 
simplest case of a semidominant gene with selection 
coefficient, s, the fixation probability in a finite popu- 
lation of the effective size, Ne, is the function of the 
product, Nes. It is a continuous monotone function 
of N.s. Therefore, in discussing the rate of gene sub- 
stitution, one has to consider all mutants around 
N.s = 0. The effectiveness of selection is determined 
by this product, whereas actual species have various 


population sizes from very small to very large. Hence, 
the effectiveness of selection differs among species. 
In addition, physiological conditions may influence 
weak selection, e.g., a functional constraint on a pro- 
tein may differ between homoiotherms and poi- 
kilotherms. 
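For the semidominant case, Kimura's diffusion formula gives the fixation probability of a single new mutant (initial frequency p = 1/(2N)) as u = (1 − e^(−4Ne·s·p))/(1 − e^(−4Ne·s)). The sketch assumes N = Ne. It reports 2N·u, the fixation probability relative to a neutral mutant, to show that this quantity depends mainly on Ne·s and equals 1 when s = 0.

```python
import math


def fixation_probability(s, n):
    """Kimura's u for a new semidominant mutant (p = 1/(2N)), assuming Ne = N."""
    if s == 0:
        return 1 / (2 * n)
    x = -4 * n * s
    if x > 700:  # strongly deleterious: the probability underflows to zero
        return 0.0
    # With p = 1/(2N), 1 - exp(-4 N s p) = -expm1(-2 s) and 1 - exp(x) = -expm1(x).
    return math.expm1(-2 * s) / math.expm1(x)


N = 1000
for ns in (-5, -1, -0.1, 0, 0.1, 1, 5):
    s = ns / N
    print(f"Ne*s = {ns:>5}: 2N*u = {2 * N * fixation_probability(s, N):.4g}")
```

Mutants with |Ne·s| well below 1 behave almost neutrally. For Ne·s below about −1, the relative fixation probability falls steeply.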

Considering the importance of negative selection 
for keeping gene function, it is likely that many nearly 
neutral mutations are very slightly deleterious, i.e., s is 
negative. Remember that a slightly deleterious mutation may have a non-negligible probability of fixation, depending on the absolute value of the product, $|N_e s|$. If this value
increases, the fixation probability becomes smaller, 
i.e., it decreases as the population size becomes larger, 
or as the selection intensity becomes stronger. This 
prediction is related to the molecular clock in an 
important way. As explained, the fixation probability 
is higher in a small population than in a large population. On the other hand, the mutation rate per year would depend on the number of cell generations per year, and therefore mildly on the generation time. In general, large
organisms have a long generation time and small 
population size and vice versa, and there is a negative 
correlation between population size and generation 
time. Then the population-size effect on the fixation 
probability is expected to be partially canceled by the 
generation-time effect on mutation rate. This cancel- 
lation is likely to be responsible for the molecular 
clock of protein evolution. 


Distribution of Mutants’ Effects 


Precise formulation of selective neutrality in the broad 
sense is difficult. Distribution of mutants’ effects 
around strict neutrality is needed for the exact analy- 
sis, but it is not known. There are two theoretical 
approaches to near neutrality, i.e., the shift model 
and the fixed model. The former is based on the 
assumption that the distribution of mutants’ effects 
shifts back whenever a mutant fixes in the population. 
In other words, mutant substitutions are independent 
of one another. For example, in one study for the shift 
model, the gamma distribution is assumed for the 
effects of slightly deleterious mutations, and this dis- 
tribution remains the same when a mutant fixes, since 
the population mean shifts back to the original state. 
On the other hand, in the fixed model, the distribution 
is fixed irrespective of mutant substitutions, and the 
population mean moves according to the effect of the 
fixed mutant. This model is also called the ‘house of 
cards’ model. Contrary to the shift model, the effect of 
each substitution stays and affects subsequent substi- 
tutions by changing the mean fitness in the fixed 
model. Therefore substitutions are interrelated in 
their effects on fitness. If the evolution of a protein is the subject of interest, amino acid substitutions are interrelated, and the fixed model is more realistic than the shift model.

Figure 1 Distribution of selection coefficient of new mutations, plotted relative to the population mean. (From Ohta, 1992.)

An example of the fixed model where the normal 
distribution is used for the mutants’ effect on fitness 
is considered here in some detail. Figure 1 shows
the distribution of the selection coefficient of new 
mutants around the population mean. If selection is 
strong enough, the mean moves toward the right with- 
out fluctuation. For nearly neutral mutations, it moves 
erratically but tends to increase. When the population 
mean becomes positive, the average selection coeffi- 
cient of new mutations becomes negative, i.e., new 
mutations are slightly deleterious on the average. 
Then the mutant substitution slows down. The effec- 
tiveness of selection is again determined by the prod- 
uct of population size and selection intensity, which 
is measured by the standard deviation, $\sigma_s$, of the normal distribution. According to H. Tachida, the nearly neutral mutations lie in the range $3 \ge 4N_e\sigma_s \ge 0.2$, where both random drift and selection affect the
population fitness. Although the shift model and the 
fixed one are different, some patterns of evolution and 
polymorphisms are similar under both models, i.e., 
the negative correlation between the evolutionary 
rate and the population size, and the slow increase of 
polymorphism with larger population size. 
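The dynamics of the fixed model can be sketched with an origin-fixation simulation. This is an approximation we assume here: the population is treated as monomorphic between substitutions. Each new mutant's fitness value is drawn from a fixed normal distribution with mean 0 and standard deviation σs. Its selection coefficient is that value minus the current population mean, and it fixes with Kimura's probability. To keep the run short, every acceptance probability is divided by the same constant. This rescales time but leaves the sequence of substitutions unchanged, provided the scaled probability stays below 1, which it does for these settings. N, σs, the constant, and the seed are arbitrary choices, with 4N·σs = 1 placed inside the nearly neutral range.

```python
import math
import random


def fixation_probability(s, n):
    """Kimura's u for a new semidominant mutant (p = 1/(2N)), assuming Ne = N."""
    if s == 0:
        return 1 / (2 * n)
    x = -4 * n * s
    if x > 700:
        return 0.0
    return math.expm1(-2 * s) / math.expm1(x)


def simulate_fixed_model(n, sigma, steps, scale=10.0, seed=1):
    """Return the time-averaged population mean and the number of substitutions."""
    rng = random.Random(seed)
    mean = 0.0           # current population mean on the fitness scale
    running_total = 0.0
    substitutions = 0
    for _ in range(steps):
        value = rng.gauss(0.0, sigma)  # fixed distribution of mutant effects
        s = value - mean               # selection coefficient relative to the population
        if rng.random() < 2 * n * fixation_probability(s, n) / scale:
            mean = value               # the mutant fixes and becomes the new mean
            substitutions += 1
        running_total += mean          # average over all steps = time average
    return running_total / steps, substitutions


N, SIGMA = 1000, 0.00025  # 4 * N * SIGMA = 1
avg_mean, subs = simulate_fixed_model(N, SIGMA, steps=200_000)
print(f"time-averaged population mean: {avg_mean:.2e} ({subs} substitutions)")
print(f"average selection coefficient of new mutations: {-avg_mean:.2e}")
```

Starting from zero, the population mean drifts upward and settles above zero. New mutations are therefore slightly deleterious on average, which is what slows substitution. Under these same assumptions, detailed balance gives an equilibrium mean of about (4N − 2)·σs², roughly σs for these settings.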


Some Related Observations 


What kind of experimental evidence is there for select- 
ive neutrality? As compared with the rate of pheno- 
typic evolution, the rate of molecular evolution is 
remarkably constant, i.e., a molecular clock is often 
observed. This characteristic of molecular evolution is 
strong evidence for the neutral theory, as it is explained by equation (1): if the neutral mutation rate is roughly constant, so is the rate of substitution. Another characteristic is the fact that
the more constrained the protein is, the lower is its rate 
of evolution. This is also thought to support the neu- 
tral theory, since the proportion of neutral mutations 
is thought to decrease as the functional constraint 
becomes stronger. 

Through comparative studies of DNA sequences, 
many interesting patterns of evolution and poly- 
morphism have emerged. The difference in average 
patterns between synonymous and nonsynonymous 
substitutions of mammalian genes is in accord with 
selective neutrality in the broad sense, i.e., the 
generation-time effect is larger for the synonymous 
substitutions than for the nonsynonymous substitu- 
tions. 

The next question to ask concerns the variance of 
the evolutionary rate. One approach is to estimate the 
index of dispersion, that is the ratio of the variance to 
the mean number of substitutions. The index becomes 
1 under the simple Poisson process of mutant substi- 
tutions. By examining sequences of mammalian genes, 
J.H. Gillespie has shown that the dispersion index, R, 
is larger than unity, and is often between 1.5 and 10 for 
nonsynonymous substitutions. His analysis also sug- 
gests that synonymous substitutions are less erratic. Is 
this large index of nonsynonymous substitutions in 
accord with selective neutrality in the broad sense? 
Analysis of the fixed model suggests that the disper- 
sion index is only slightly larger than unity. In other 
words, the interaction effect of mutant substitutions 
of the fixed model is not big enough to explain the 
observed value of R. If one incorporates changes in 
population size, R can be shown to become larger, and 
to be similar to the observed value. 
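In its simplest form, the index of dispersion is the sample variance of substitution counts across lineages divided by their mean. The sketch below uses hypothetical counts. Published estimators are more elaborate, for example in how they account for the phylogeny relating the lineages, and this simplified version ignores such corrections.

```python
from statistics import mean, variance


def dispersion_index(counts):
    """R = sample variance / mean of substitution counts; R = 1 for a Poisson process."""
    m = mean(counts)
    if m == 0:
        raise ValueError("mean count must be positive")
    return variance(counts) / m


poisson_like = [10, 12, 8, 14, 6]  # hypothetical counts on five lineages
episodic = [2, 18, 5, 15, 10]      # hypothetical counts on five lineages
print(f"R = {dispersion_index(poisson_like):.2f} and {dispersion_index(episodic):.2f}")
```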

The real difficulty lies in distinguishing neutrality in the broad sense from the selection theory. The
choice is not one or the other, and there may be cases 
in which both selection and drift are almost equally 
important. It would be unwise to decide which of the 
theories is ‘correct’ in such cases. 

Another aspect of selective neutrality is concerned 
with polymorphisms. Under strict neutrality, various 
quantitative predictions can be made and the neutral 
theory is testable. Many attempts have been made to 
test the theory. D. Hartl, M. Kreitman, and associates 
carried out many studies on synonymous and nonsynonymous polymorphisms. Their test compared the relative numbers of synonymous and nonsynonymous differences within a species (polymorphisms) with those between closely related species (fixed differences). Under strict neutrality, the relative numbers should be the same whether they are measured within or between species. In some cases, an excess
of nonsynonymous differences was found for within- 
species comparisons, whereas in the other cases, a 


deficiency of the same differences was observed. 
Again, it is difficult to tell whether the selection theory or the nearly neutral theory better fits such cases. The results reflect the short-term effect of large
variance of evolutionary rate mentioned above. 
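The comparison described above is usually summarized as a 2 × 2 table, commonly known as the McDonald-Kreitman test. The table sets nonsynonymous (Pn) and synonymous (Ps) polymorphisms within a species against nonsynonymous (Dn) and synonymous (Ds) fixed differences between species. One widely used summary is the neutrality index, NI = (Pn/Ps)/(Dn/Ds), which is 1 under strict neutrality. NI > 1 indicates an excess of nonsynonymous polymorphism, and NI < 1 an excess of nonsynonymous fixed differences. The counts below are hypothetical, and a real analysis would also test significance, for example with Fisher's exact test.

```python
from fractions import Fraction


def neutrality_index(pn, ps, dn, ds):
    """NI = (Pn/Ps) / (Dn/Ds); 1 is expected under strict neutrality."""
    if 0 in (ps, dn, ds):
        raise ValueError("Ps, Dn and Ds must be non-zero")
    return Fraction(pn, ps) / Fraction(dn, ds)


# Hypothetical (Pn, Ps, Dn, Ds) counts for two genes
for gene, counts in {"gene_a": (20, 40, 5, 25), "gene_b": (2, 40, 10, 20)}.items():
    ni = neutrality_index(*counts)
    if ni > 1:
        verdict = "excess nonsynonymous polymorphism"
    elif ni < 1:
        verdict = "excess nonsynonymous divergence"
    else:
        verdict = "neutral expectation"
    print(f"{gene}: NI = {float(ni):.2f} ({verdict})")
```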

Let us turn our attention to synonymous substitu- 
tions, which were interpreted to be neutral in the 
1970s and 1980s. However, very weak selection has 
been shown to be operating in relation to the codon 
usage bias that reflects tRNA abundance. A large amount of data on codon usage bias is available, and it has become clear that the bias is particularly conspicuous for highly expressed genes. Such facts are explained by assuming the presence of an optimal codon usage. The model of very weak selection nicely
explains the observed bias of codon usage. In finite 
populations, most synonymous sites are fixed with 
biased frequencies. Mutant substitutions occasion- 
ally take place by chance. They are either slightly 
deleterious or slightly advantageous, and a mutation-selection-drift equilibrium is expected for codon
usage. Thus, most synonymous substitutions are neu- 
tral in the broad sense. Average selection intensity is 
smaller for synonymous than for nonsynonymous 
substitutions. 
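The mutation-selection-drift equilibrium at a single synonymous site can be sketched with a two-state model. This is an illustrative assumption, not a model given in the text. The site carries either a preferred or an unpreferred codon. Mutation changes unpreferred to preferred at rate mu_up and back at rate mu_down, and a change to the preferred codon has selective advantage s. Treating substitutions one at a time with Kimura's fixation probability gives the long-run fraction of sites that carry the preferred codon. The mutation rates and N are hypothetical.

```python
import math


def fixation_probability(s, n):
    """Kimura's u for a new semidominant mutant (p = 1/(2N)), assuming Ne = N."""
    if s == 0:
        return 1 / (2 * n)
    x = -4 * n * s
    if x > 700:
        return 0.0
    return math.expm1(-2 * s) / math.expm1(x)


def preferred_fraction(n, s, mu_up, mu_down):
    """Equilibrium fraction of sites fixed for the preferred codon."""
    rate_up = 2 * n * mu_up * fixation_probability(s, n)        # unpreferred -> preferred
    rate_down = 2 * n * mu_down * fixation_probability(-s, n)   # preferred -> unpreferred
    return rate_up / (rate_up + rate_down)


N = 1000
for scaled in (0.0, 0.5, 1.0, 2.0, 4.0):  # values of 4Ns
    s = scaled / (4 * N)
    print(f"4Ns = {scaled}: preferred fraction = {preferred_fraction(N, s, 1e-8, 2e-8):.3f}")
```

Even when 4Ns is only of order 1, too weak to be detected in any single substitution, the preferred codon comes to predominate across many sites. This is how very weak selection can produce a conspicuous bias.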


Effective Population Size 


So far, our discussion has been based on the concept of effective population size. In small populations, the range of
effectively neutral mutations increases, and vice versa. 
In actual species, the population size rarely remains 
constant. For example, speciation is often accompanied by bottlenecks, and the effective population size may then depend on the number of founding individuals. The range of selective neutrality increases, and more mutations become effectively neutral than after the species has expanded. Several examples of rapid molecular evolution in conjunction with bottlenecks have been reported, e.g.,
ribosomal RNA and protein evolution of Hawaiian 
Drosophila. The effective population size is dependent 
on linkage to other selected loci. Under strong linkage 
with many selected loci, the effective size becomes 
much smaller than the value under free recombin- 
ation. A structured population brings another compli- 
cation, since the weak selective force may differ 
between local colonies, and the effective size may be 
the local one. Effects of linkage and population struc- 
ture on selective neutrality are currently being inves- 
tigated by population geneticists. 
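When population size changes from generation to generation, the effective size over the period is approximately the harmonic mean of the census sizes. A single bottleneck generation therefore has a disproportionate effect. This standard approximation is not stated in the text, and the size histories below are hypothetical.

```python
from statistics import harmonic_mean

stable = [1000] * 10
bottleneck = [1000] * 9 + [10]  # one founding generation of only 10 individuals
for label, sizes in (("stable", stable), ("with bottleneck", bottleneck)):
    ne = harmonic_mean(sizes)
    print(f"{label}: arithmetic mean {sum(sizes) / len(sizes):.0f}, Ne ~ {ne:.1f}")
```

Reducing Ne from 1000 to about 92 widens, by a similar factor, the range of selection coefficients that behave as effectively neutral (roughly |Ne·s| < 1).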
Most populations of plants and animals are polymorphic at the nucleotide level. In the fruit fly
Drosophila, for example, where we have the most 
comprehensive knowledge of variation, nucleotide 
diversity, the probability that a randomly chosen loca- 
tion along the DNA is different between two ran- 
domly chosen chromosomes, can exceed 1%. For a 
typical gene 2000 bp in length, nucleotide polymorph- 
isms will be found at several dozen positions along the 
gene in a population sample of only a dozen randomly 
chosen chromosomes, and no two gene sequences in 
the sample will be identical. Many of these poly- 
morphisms will be in locations along the gene that 
do not have known functionality, such as in introns, 
and to a first approximation we may assume that 
this extensive polymorphism is representative of the 
standing crop of selectively neutral mutations found 
genome-wide (see Neutral Mutation). Now consider a 
new mutation that arises in one particular allele of this 
gene, as defined by the particular combination of these 
neutral variants along the gene, and further imagine 
that this is a selectively favorable mutation that is 
destined to be driven to fixation by positive natural 
selection. It follows that as this favored allele goes to 
fixation in the population all of the other alleles in the 
population will necessarily be driven to extinction, 
along with the neutral variation that distinguishes
these alleles one from another. In other words, neutral 
variation that is tightly linked (in both a physical and 
genetic sense) to the site under positive selection will 
be swept out of the population as the selectively
favored allele sweeps to fixation. Such is the double 
meaning of a selective ‘sweep.’ 
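The claim that a dozen sampled chromosomes should reveal several dozen polymorphic positions can be checked against Watterson's standard neutral expectation for the number of segregating sites: E[S] = θ·L·a_n, with a_n = 1 + 1/2 + ... + 1/(n − 1). The sketch assumes a per-site θ equal to the 1% nucleotide diversity quoted above. That equality holds only at neutral equilibrium.

```python
def harmonic_number(m):
    """1 + 1/2 + ... + 1/m."""
    return sum(1 / i for i in range(1, m + 1))


def expected_segregating_sites(theta_per_site, length_bp, n_samples):
    """Watterson: E[S] = theta * L * sum_{i=1}^{n-1} 1/i."""
    return theta_per_site * length_bp * harmonic_number(n_samples - 1)


print(f"{expected_segregating_sites(0.01, 2000, 12):.1f} segregating sites expected")
```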

The process of a selective sweep can be illustrated 
by considering the “genealogy” of alleles at a locus 
before and after a selective sweep, as depicted in 
Figure 1. In this example we have sampled eight chromosomes from a population and have traced the ancestry of each allele back in time to its most recent common ancestor with another allele. All the alleles ‘coalesce’ to a single common
ancestor of the sample at the bottom of the tree. This 
tree of relationships is called a gene genealogy. 
Imposed on this genealogy are the mutations that 
have occurred along the branches, and these muta- 
tions, of course, represent the differences that can be 
used to distinguish the eight alleles one from another. 
One of these mutations, demarcated by an X, is an 
adaptive mutation that has occurred in the recent past. 
At the instant this allele reaches fixation in the popu- 
lation, every individual will possess this variant, and 
the genealogy of a random sample of eight chromo- 
somes will look like the one depicted on the right by 
the solid lines. The dashed lines, representing the old 
alleles, will have been driven to extinction, and the
mutations that distinguish them will also have been 
lost. 
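Consider the standard neutral coalescent for a diploid population of constant effective size N. A sample of n alleles has k ancestral lineages for an expected 4N/(k(k − 1)) generations. The expected depth of the genealogy is therefore 4N(1 − 1/n), and its expected total branch length is 4N(1 + 1/2 + ... + 1/(n − 1)). A star-like genealogy produced by a sweep τ generations ago has depth τ and total length nτ. The sketch compares the two for the eight-allele example. N and τ are hypothetical.

```python
def neutral_genealogy(n, ne):
    """Expected depth (TMRCA) and total branch length, in generations."""
    lineages = range(n, 1, -1)
    waits = [4 * ne / (k * (k - 1)) for k in lineages]  # time spent with k lineages
    depth = sum(waits)
    total = sum(k * w for k, w in zip(lineages, waits))
    return depth, total


def star_genealogy(n, tau):
    """All n lineages trace straight back to the sweep origin tau generations ago."""
    return tau, n * tau


n, ne, tau = 8, 10_000, 1_000
for label, (depth, total) in (("neutral", neutral_genealogy(n, ne)),
                              ("star (post-sweep)", star_genealogy(n, tau))):
    print(f"{label}: depth {depth:.0f}, total branch length {total:.0f} generations")
```

New mutations accumulate in proportion to total branch length. The short star-like tree therefore carries far fewer differences, and each mutation on it lies on a single branch, so it appears as a singleton.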

Notice that immediately following a selective 
sweep, as depicted in this example, not enough time 
has elapsed for new mutations to have arisen: all 
alleles are identical in sequence along the entirety of 
the gene. In addition, the genealogy looks decidedly 
‘star-like,’ with each allele emanating as a spoke 
from the original allele that incurred the favorable 
mutation. These two features, the loss of linked neutral polymorphism and a star-like genealogy, are telltale signatures of a recent selective sweep, and each can be used as a criterion for inferring the existence of a selective sweep in real polymorphism survey data.
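One common way to quantify such signatures, although the text does not name it, is Tajima's D. It compares the average number of pairwise differences, π, with the number of segregating sites, S, scaled by a_n. A star-like genealogy produces mostly singleton mutations, which depress π relative to S/a_n and make D negative. The sketch implements Tajima's statistic and applies it to a hypothetical sample of eight alleles in which every one of ten polymorphic sites is a singleton.

```python
import math


def tajimas_d(pi, s, n):
    """Tajima's D from mean pairwise differences pi, segregating sites s, sample size n."""
    if s == 0:
        raise ValueError("D is undefined without segregating sites")
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n * n + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    return (pi - s / a1) / math.sqrt(e1 * s + e2 * s * (s - 1))


def mean_pairwise_differences(derived_counts, n):
    """pi from the derived-allele count at each segregating site."""
    pairs = n * (n - 1) / 2
    return sum(i * (n - i) for i in derived_counts) / pairs


n = 8
singletons = [1] * 10  # star-like genealogy: every mutation on a single branch
pi = mean_pairwise_differences(singletons, n)
print(f"pi = {pi:.2f}, S = {len(singletons)}, D = {tajimas_d(pi, len(singletons), n):.2f}")
```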
In reality the situation is not quite this simple, because the fixation of advantageous alleles is not instantaneous. During the time in which the favored allele increases in frequency, it can both acquire new mutations and regain some of the old polymorphisms by recombining with other old alleles. More generally, a selective sweep will produce a trough of reduced neutral polymorphism that is restricted to a small interval of tightly linked sites.
The length of the trough will be determined by the 
relationship between the strength of positive selection 
on the adaptive mutation (how fast the allele increases 
in frequency) and the rate of recombination between 
the selected site and a linked site (how fast the associ- 
ations are broken up). Applying realistic values of the 
recombination rate across a gene and the strength of 
positive selection of a favored mutation, a model of 
selective sweep indicates that the loss of neutral linked 
polymorphism might extend only several kilobases to 
either side of the site under selection. In addition, a 
trough of reduced polymorphism will only be dis- 
cernible in a survey of nucleotide polymorphism if 
the selective sweep has occurred in the relatively 
recent past. For these reasons, it may not be surprising 
that few convincing examples of selective sweeps have 
been documented from surveying nucleotide poly- 
morphism in genes. 
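How the width of the trough depends on selection and recombination can be sketched numerically. The Python function below uses one commonly used approximation from deterministic hitchhiking models: immediately after a completed sweep, a neutral site at recombination fraction r from the selected site retains roughly 1 - (2N)^(-2r/s) of its former heterozygosity, where s is the selection coefficient and N the diploid population size. The approximation assumes strong selection, a favored allele that starts as a single copy, and no new mutation during the sweep. The function names and the parameter values in the usage lines (10^-8 recombination per base pair, N = 10^6, s = 0.01 or 0.001) are hypothetical illustrations, not estimates for any real gene.

```python
def heterozygosity_ratio(r: float, s: float, n: int) -> float:
    """Approximate fraction of neutral heterozygosity left after a sweep.

    r: recombination fraction between the neutral and the selected site
    s: selection coefficient of the favored allele (assumed strong)
    n: diploid population size N
    Uses the approximation H_after / H_before ~ 1 - (2N) ** (-2r/s).
    """
    if r < 0 or s <= 0 or n < 1:
        raise ValueError("need r >= 0, s > 0 and n >= 1")
    return 1.0 - (2 * n) ** (-2.0 * r / s)


def trough_profile(distances_bp, r_per_bp, s, n):
    """Return (distance, remaining heterozygosity fraction) pairs.

    Treats the recombination fraction as r_per_bp * distance, which is only
    reasonable while that product stays small.
    """
    return [(d, heterozygosity_ratio(d * r_per_bp, s, n)) for d in distances_bp]


if __name__ == "__main__":
    distances = [0, 1_000, 5_000, 10_000, 50_000, 100_000]
    # Hypothetical illustration values, not estimates for a real locus.
    for s in (0.01, 0.001):
        print(f"s = {s}")
        for d, ratio in trough_profile(distances, r_per_bp=1e-8, s=s, n=10**6):
            print(f"  {d:>7} bp: {ratio:.2f} of original heterozygosity")
```

Under these assumptions stronger selection (larger s) widens the trough, because the favored allele reaches fixation before recombination has had much opportunity to rescue linked variation.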

Perhaps the most convincing example of a selective 
sweep can be seen in the superoxide dismutase locus
(SOD) in Drosophila melanogaster. This locus segregates
for two different protein variants, and these
variants have measurably different enzymatic proper- 
ties. An extensive survey of nucleotide variation found 
that all of the sequenced representatives of the more 
common of the two alleles are completely identical 
to one another in sequence, suggesting that they all 
recently derive from a common ancestor. The other 
less frequent allele, in contrast, carries extensive 
nucleotide polymorphism among individual copies. 
The locus was also sequenced in a closely related 
species, and it carries the same amino acid as the less 
frequent variant in D. melanogaster. A reasonable 
interpretation of the data is that the more common 
protein variant is the younger of the two alleles, and 
that it has been rapidly driven up in frequency in the 
recent past to become the more common type. This 
may very well be an example of a selective sweep 
caught in the act. 

Selective sweep is an example of genetic hitchhiking 
(see Hitchhiking Effect) between a site under selection 
and a linked site not under selection. This linkage can 
come about in unexpected ways. The animal
mitochondrial genome, for example, a maternally
inherited circular genome containing 13 protein-coding
genes, is expected to be particularly susceptible to
hitchhiking events because it is a nonrecombining
genome. In one species of fruit fly, D. simulans, a
maternally inherited microorganism called Wolbachia
has a mechanism by which it gives a strong selective
advantage to females carrying the infection when they
are introduced into an uninfected population. This
strong selective advantage, together with the maternal
inheritance of both Wolbachia and the mitochondrial
genome, has been shown to drive the mitochondrial
variant of the infected female to near-fixation as it
hitchhikes along with the rising frequency of
Wolbachia infection.

If selective sweeps of advantageous variants are 
common occurrences in genes, then neutral variation 
levels might be expected to be depressed throughout 
the genome, but more so in regions of chromosomes 
characterized by low rates of recombination than in 
regions having high recombination rates. In D. melano- 
gaster, recombination rates vary by one or two orders 
of magnitude across regions of chromosomes, and true 
to this prediction, levels of noncoding polymorphism 
are strongly positively correlated with the recombina- 
tion rate. Importantly, the rate of divergence between 
this species and its sibling species D. simulans is not 
correlated with recombination rate, indicating that 
differences in polymorphism levels are not the result 
of any difference in the mutation rate. Thus, it is poss- 
ible that selective sweeps modulate levels of nucleotide 
variation genome-wide. Unfortunately for this 
hypothesis, an alternative model of genetic hitchhik- 
ing, called background selection (see Background 
Selection), has also been proposed to explain this 
correlation between the recombination rate and the 
level of polymorphism, and at present both
hypotheses remain viable.


Self-Fertilization


Self-fertilization occurs when an individual is capable
of generating both male and female gametes, and using 
the former (sperm) to fertilize the latter (eggs), there- 
by producing self-progeny. The majority of animal 
species reproduce by cross-fertilization, either with 
separate male and female sexes, or with a hermaphro- 
dite sex that is specialized to avoid self-fertilization. 
Reproduction by cross-fertilization increases genetic 
diversity by sampling the genomes of two different 
individuals, and so creating new combinations in their 
offspring. Self-fertilization can also create diversity 
in progeny genotypes, because both male and female 
gametes are usually produced with meiotic recombin- 
ation, but a purely selfing population will steadily lose 
heterozygosity at any given locus, and is therefore vul- 
nerable to inbreeding depression. Self-fertility never- 
theless occurs widely in the animal kingdom, in a 
variety of invertebrate groups. 
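The steady loss of heterozygosity under selfing can be made concrete. Assume a single locus with two alleles, no selection, and complete self-fertilization. Homozygotes then produce only homozygous progeny, while a heterozygote produces progeny in the Mendelian ratio 1/4 AA : 1/2 Aa : 1/4 aa, so the heterozygote frequency halves every generation: H_t = H_0 (1/2)^t. The Python sketch below iterates this recursion; the function names are illustrative.

```python
def self_one_generation(genotypes):
    """One generation of complete selfing.

    genotypes: (f_AA, f_Aa, f_aa) frequencies summing to 1.
    Homozygotes breed true; each heterozygote yields 1/4 AA, 1/2 Aa, 1/4 aa.
    """
    f_hom_a, f_het, f_hom_b = genotypes
    return (f_hom_a + f_het / 4, f_het / 2, f_hom_b + f_het / 4)


def heterozygosity_under_selfing(start, generations):
    """Return the heterozygote frequency in each generation, starting with `start`."""
    genotypes = start
    history = [genotypes[1]]
    for _ in range(generations):
        genotypes = self_one_generation(genotypes)
        history.append(genotypes[1])
    return history


# Start from Hardy-Weinberg proportions with allele frequency 0.5.
print(heterozygosity_under_selfing((0.25, 0.5, 0.25), 5))
# -> [0.5, 0.25, 0.125, 0.0625, 0.03125, 0.015625]
```

Allele frequencies are unchanged by selfing; only their packaging into genotypes changes. Recessive alleles are therefore increasingly exposed in homozygotes, which is why inbreeding depression can follow.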

There are several obvious advantages to self-fertility, 
of which the most important is the avoidance of the 
twofold ‘cost of sex’: all individuals in a self-fertile 
population are capable of producing progeny, in con- 
trast to a population of males and females, in which 
only the females produce progeny. Self-fertility also 
has the advantages that a single organism can colonize 
a new habitat, and that individuals do not need to 
invest time and resources in finding mating partners. 
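The twofold cost can be illustrated with simple arithmetic. The Python model below is hypothetical and deliberately minimal: every mother produces the same number of offspring, the population with separate sexes has a 1:1 sex ratio so that only half its members are mothers, and there is no density dependence or inbreeding depression. Under these assumptions the self-fertile population grows twice as fast per generation.

```python
def population_sizes(n0, generations, offspring_per_mother, fraction_mothers):
    """Population size in each generation when only `fraction_mothers` of the
    individuals produce offspring (a simple, density-independent model)."""
    sizes = [n0]
    for _ in range(generations):
        sizes.append(sizes[-1] * fraction_mothers * offspring_per_mother)
    return sizes


# Hypothetical numbers: two offspring per mother, starting from 100 individuals.
selfers = population_sizes(100, 5, offspring_per_mother=2, fraction_mothers=1.0)
males_and_females = population_sizes(100, 5, offspring_per_mother=2, fraction_mothers=0.5)
for t, (a, b) in enumerate(zip(selfers, males_and_females)):
    print(t, a, b, a / b)
```

The ratio of the two population sizes doubles every generation, which is the sense in which separate sexes carry a twofold cost.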

The same advantages of rapid population growth 
and efficient colonization apply also to organisms that 
reproduce amictically (parthenogenetically), but such 
organisms do not even undergo meiotic recombin- 
ation, and therefore have no ability to create new 
genetic combinations from one generation to the 
next. In general, it is believed that completely amictic 
populations are more likely to go extinct than those 
with some degree of genetic exchange. Reproduction 
by self-fertilization represents a compromise, espe- 
cially for organisms with the capacity for both self- 
fertilization and cross-fertilization. Examples of this 
strategy are provided by species with hermaphrodite 
sexes that are capable of both selfing and crossing, or 
species such as the laboratory nematode Caenorhab- 
ditis elegans, which has populations consisting mostly 
of self-fertile hermaphrodites with rare males that can 
cross-fertilize the hermaphrodites. 


See also: Caenorhabditis elegans; Fertilization; 
Hermaphrodite; Parthenogenesis, Mammalian 




Selfish DNA 


H Y Wong 


Copyright © 2001 Academic Press 
doi: 10.1006/rwgn.2001.1170


Theoretical Concepts 


The assertion that organisms are simply DNA’s way of 
producing more DNA has been made so often that it is 
hard to remember who made it first. 


So begins one of two classic papers (Doolittle and
Sapienza, 1980; Orgel and Crick, 1980), which
developed the concept of selfish DNA and sparked off
a debate which is still not altogether resolved. Whilst 
Dawkins (1976, p. 47) also mentioned the idea of selfish 
DNA, his focus was primarily on ‘selfish genes’: a 
phrase he used to describe all genes, in order to empha- 
size that genes are selected solely on the basis of their 
own propensity to increase in number. 

One way in which this increase occurs is when the 
DNA sequence provides a function that increases the 
general reproductive output (fitness) of the organism 
in which it is found, but there are two other ways in 
which a sequence can increase in number over time. 
The first is by subverting the process of inheritance 
such that, at a single locus, an individual heterozygous 
for the sequence has a greater than 50% chance of 
passing it to an offspring. This can be accomplished 
by mechanisms such as meiotic drive, which promotes
one chromosome to the detriment of the other. The
second way is for sequences to replicate across the 
genome, so that many copies may be found in differ- 
ent locations in the same genome. It should be empha- 
sized that both of these methods only require sequences 
to be inherited vertically, from parent to offspring. 
Those replicating sequences that are regularly trans- 
mitted horizontally are usually regarded as viruses 
or virus-like organisms. Sometimes, but not always, 
the spread of these self-promoting elements reduces 
the fitness of the bearer. This is only likely to occur in 
sexually outcrossing species, where the element can be 
selected relatively independently of the rest of the gen- 
ome. The result of this collision of interests is genomic 
conflict, the demonstration of which is probably the 
clearest evidence of the ‘selfish’ nature of a sequence. 
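The effect of subverting Mendelian transmission can be sketched with a standard one-locus recursion. Assume random mating, Hardy-Weinberg genotype proportions among zygotes, no effect of the element on viability or fertility, and that heterozygotes of both sexes transmit the driving allele with probability k instead of 1/2. The allele's frequency then changes from p to p^2 + 2p(1 - p)k, so it spreads whenever k > 1/2. The parameter k and the function names are illustrative assumptions.

```python
def next_frequency(p, k):
    """Frequency of a driving allele after one generation.

    p: current allele frequency
    k: probability that a heterozygote transmits the driving allele
       (k = 0.5 is ordinary Mendelian segregation)
    Homozygotes (frequency p**2) always transmit it; heterozygotes
    (frequency 2p(1-p)) transmit it with probability k.
    """
    return p * p + 2 * p * (1 - p) * k


def drive_trajectory(p0, k, generations):
    """Allele frequencies over successive generations."""
    freqs = [p0]
    for _ in range(generations):
        freqs.append(next_frequency(freqs[-1], k))
    return freqs


mendelian = drive_trajectory(0.01, 0.5, 50)  # stays at about 0.01
driving = drive_trajectory(0.01, 0.7, 50)    # spreads toward fixation
print(round(mendelian[-1], 6), round(driving[-1], 6))
```

Because nothing in this model benefits the organism, any spread of the allele comes entirely from its transmission advantage, which is the situation in which genomic conflict can arise.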


Definitions 

Since the term ‘selfish gene’ can be used to describe
any gene, genetic elements that promote themselves
without necessarily benefiting the organism in which
they are found can be referred to as ‘ultra-selfish genes,’
‘selfish genetic elements,’ or ‘outlaw genes.’ This avoids
the terminological confusion seen in the use of
‘selfish gene’ and ‘selfish DNA’ in the respective titles
of the two papers mentioned above. Nevertheless,
it is becoming common to find the name ‘selfish
gene’ used specifically to refer to these sequences.
‘Selfish DNA’ is a term often used rather vaguely to 
refer to a self-promoting element which works at a 
molecular level. Meiotic drive genes are not usually 
thought of as selfish DNA, and it seems sensible to 
restrict the term to self-promoting elements which 
multiply within a genome (although some spread in 
both ways). Some authors prefer the less anthro- 
pomorphic terminology ‘parasitic DNA,’ and perhaps 
more accurate still is the term ‘symbiotic DNA.’ This 
emphasizes that the DNA is not distinguished by its 
physical effects. Indeed, one of the main points 
emphasized by Doolittle and Sapienza (1980) was that
the presence or absence of these sequences often has no
specific effect on the phenotype. However, gross
phenotypic effects can sometimes be seen, such as 
sterility and other defects in Drosophila hybrids. 
This hybrid dysgenesis is due to the introduction of 
selfish DNA, in the form of P elements, into genomes 
which have no mechanism to prevent its excessive 
duplication. In addition, it is now known that organ- 
isms sometimes use selfish elements for particular 
purposes (see below, ‘Coevolution of selfish DNA 
and host’). It thus seems unwise to define selfish 
DNA by the absence of a specific phenotype. Never- 
theless, one successful strategy for a replicating se- 
quence is to reduce phenotypic effects so as to cause 
as little disturbance as possible to the organism.

‘Junk DNA’ is used to refer to DNA which is
sequence-independent: the sequence order does not 
lead to any recognizable function. Selfish DNA or nor- 
mal genes may lose functionality and become ‘junk.’ 
Mutation may then render their origin uncertain. 


Genetic Mechanisms 


This section presents only a brief summary of some 
types of selfish DNA. For greater detail, the reader is 
referred to more specific encyclopedia entries. Gener- 
ally, selfish DNA sequences provide some of the 
machinery needed for self-replication, but also rely 
on cellular DNA replication mechanisms. 


Transposable Elements 
Transposons are perhaps the most common form of 
selfish DNA, and eukaryotic genomes contain many 
nonfunctional remnants of transposons as well as 
functional elements. They fall into class I and class II 
elements depending on their method of transposition. 
Class I elements (retroelements) use a ‘copy-and-paste’
mechanism whereby reverse transcriptase makes
a DNA copy of the element from transcribed RNA,
which is then integrated elsewhere in the genome.
One type, the retrotransposons, bears close similarity
to retroviruses, and evolution from retrotransposons to
retroviruses, and vice versa, is likely. Indeed, the gypsy
retrotransposon in Drosophila can be passed from one 
individual to another via food, and could be said to be 
a virus. A few retroelements do not themselves code 
for reverse transcriptase, presumably requiring it to be 
provided by other elements. This is the case for Alu, an 
element derived from a normal host gene, which 
makes up about 5% of human DNA. It could be said 
that these are hyperparasitic DNA, parasitizing the 
resources of ‘normal’ parasitic class I elements, their 
reduced size giving faster and more accurate replica- 
tion, as well as making mutational inactivation less 
likely. Indeed, most transposons have nonautonomous 
variants which, like Alu, ‘borrow’ some of the genes 
needed for transposition. 

Class II transposons, such as the Drosophila 
P element, use a ‘cut-and-paste’ method, excising 
themselves from the genome and reinserting else- 
where. At first sight, this would seem not to increase 
the number of elements present, but high copy num- 
bers suggest that they can replicate, and it has been 
suggested that this occurs due to transposition from 
replicated to unreplicated regions during normal 
DNA duplication. 

Transposons are thought to have an insertion bias 
for noncoding areas, such as heterochromatin or even 
preexisting transposons, where their phenotypic effect 
is minimal. Even so, a large fraction of mutations in 
most organisms are caused by transposable elements 
inserting and excising. 

Transposition in somatic cells gives no long-term 
advantage to the sequence, and may endanger the host. 
This is presumably why transposons such as
P elements are active only in germ-line cells. The
restriction of transposons to the germ line is seen in
an extreme form in hypotrich ciliates. These unicellular
organisms use a separate nucleus for somatic gene
transcription, which contains only 5% of the germ-line
DNA. A substantial fraction of the excised DNA
consists of transposons.


Type I and Type II Introns

These self-splicing introns are unlike other introns: 
not only do they have a fairly conserved sequence, 
but in addition to being present in eukaryotic nuclei, 
multiple copies are also found in prokaryotes and 
organelles. Their RNA sequence is catalytic, splicing 
itself out of any length of RNA in which it occurs, 
hence concealing its presence when inserted in a coding
region. Type I introns have the best-studied ‘selfish’
behavior. They can add themselves to alleles
without the intron at the corresponding locus on the
homologous chromosome. This is done by coding
for a site-specific (homing) endonuclease which
recognizes and cuts DNA at the locus, forcing repair
to take place using the intron-containing sequence as
a template. Alleles containing the intron cannot be
cut, as the intron interrupts the recognition site. This
biased gene conversion process is known as homing,
and may also be responsible for the transfer of the
intron to other loci.
Perhaps the most intriguing behavior is endonuclease 
war, in which different type I introns, present in
separate bacteriophages, preferentially cut each other’s
splice sites, even when the intron is present. When
both phages infect the same host, a selfish DNA battle 
may ensue, with each intron trying to deplete the 
available splice sites so as to reduce the risk of being 
spliced into by the other sequence. 
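Homing can be illustrated with a toy string model. In the Python sketch below, the recognition site, the intron sequence, the flanking sequence and the insertion point are all hypothetical placeholders; the model captures only the logic described above. The endonuclease is made only when the intron is present, it cuts only an intact recognition site, and the break is repaired by copying the intron from the homologous allele.

```python
SITE = "CTTGAGGCAT"                    # hypothetical endonuclease recognition site
INSERT_AT = 5                          # hypothetical position in SITE where the intron sits
INTRON = "gtaagtcctacag"               # hypothetical intron (lowercase to stand out)
FLANK_5, FLANK_3 = "ATGGCA", "TTCGGA"  # hypothetical flanking exon sequence


def empty_allele():
    """Allele lacking the intron: the recognition site is intact."""
    return FLANK_5 + SITE + FLANK_3


def intron_allele():
    """Allele carrying the intron, which interrupts the recognition site."""
    return FLANK_5 + SITE[:INSERT_AT] + INTRON + SITE[INSERT_AT:] + FLANK_3


def can_be_cut(allele):
    """The endonuclease cuts only an uninterrupted recognition site."""
    return SITE in allele


def home(recipient, homolog):
    """Return the recipient allele after one round of homing.

    The endonuclease is encoded by the intron, so cutting requires the intron
    to be present on the homolog; repairing the cut copies the intron across,
    which is the biased gene conversion step.
    """
    if INTRON not in homolog or not can_be_cut(recipient):
        return recipient
    cut = recipient.index(SITE) + INSERT_AT
    return recipient[:cut] + INTRON + recipient[cut:]


empty, full = empty_allele(), intron_allele()
print(can_be_cut(empty), can_be_cut(full))  # True False
print(home(empty, full) == full)            # True: the heterozygote is converted
```

If conversion succeeds in a fraction e of heterozygous cells, a heterozygote transmits the intron to a fraction (1 + e)/2 of its progeny, a transmission advantage of the same kind as meiotic drive.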

Type II introns code for reverse-transcriptase-like 
proteins, which suggests mechanisms for autonomous 
self-replication. Similarities in splicing mechanisms 
suggest that a type-II-like sequence may well have 
been ancestral to normal eukaryotic introns. 


Supernumerary Chromosomes 
Ten to fifteen percent of plant and animal species 
possess B chromosomes: chromosomes which are 
generally small and seem to be dispensable, since 
they are present in some, but not all, individuals. 
They are often present in multiple copies, and were 
first suggested as parasitic elements as early as 1945. 
Their probability of transmission is increased in 
species that do not use all four meiotic products, by 
preferential movement into those cells which become 
true gametes. They may also move preferentially into 
germline cells in development. These transmission 
methods can lead to a cumulative increase in the num- 
ber of B chromosomes. 

Plasmids may also be considered as extra chromo- 
somes, and the probable evolution of viruses from 
plasmids shows their potentially selfish nature. 


Tandemly Repeated DNA 

This is also known as satellite DNA, and consists of a 
single sequence repeated many times over. It is caused 
by slippage, where misalignment during meiotic 
recombination leaves one chromosome with a higher 
number of copies and the other with a lower number. 
A few satellite sequences may have organismal
functions, but many do not. Those that do not are often
regarded as junk DNA, but they are clearly
sequence-dependent: although they are never translated
into protein, changing a sequence in the array will
reduce slippage, because it makes the repeat units less
similar to one another. They could be said to be
‘selfishly’ using the mechanism of crossover to
replicate: although their replication is extremely
limited, and elimination as well as amplification can
occur, present-day genomes disproportionately contain
the results of amplification. Evidence that tandemly
repeated DNA can be transferred between loci also
suggests the potential for selfish-DNA-like behavior.
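Misaligned (unequal) crossing over between two tandem arrays can be sketched as list surgery. In the hypothetical Python model below, each array is a list of repeat units, and a crossover joins the first i units of one array to the units of the other array from position j onward. When i and j differ, because the arrays paired out of register, one product gains copies and the other loses the same number. The repeat unit and function names are illustrative.

```python
import random


def unequal_crossover(array_a, array_b, i, j):
    """Reciprocal crossover after unit i of array_a and unit j of array_b.

    In-register pairing (i == j, equal-length arrays) leaves copy numbers
    unchanged; out-of-register pairing (i != j) makes one product longer and
    the other shorter by |i - j| units.
    """
    product_1 = array_a[:i] + array_b[j:]
    product_2 = array_b[:j] + array_a[i:]
    return product_1, product_2


def random_misaligned_crossover(array_a, array_b, rng):
    """Pick crossover points independently, as if pairing could slip anywhere."""
    i = rng.randint(0, len(array_a))  # randint includes both end points
    j = rng.randint(0, len(array_b))
    return unequal_crossover(array_a, array_b, i, j)


a = ["GGAAT"] * 10  # hypothetical repeat unit, 10 copies
b = ["GGAAT"] * 10
longer, shorter = unequal_crossover(a, b, 6, 3)
print(len(longer), len(shorter))  # 13 7

rng = random.Random(42)
p1, p2 = random_misaligned_crossover(a, b, rng)
print(len(p1), len(p2))
```

Total copy number is conserved in every event, so amplification on one chromosome is always matched by a loss on its partner; the point made above is that the amplified products are what persist disproportionately.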


Evidence and Controversy 


The concept of selfish DNA is now generally 
accepted: it is indisputable that these sequences multi- 
ply in the genome, and that they can cause problems 
for the organism in which they reside. Debate has in- 
stead focused on the role of selfish DNA in evolution. 


The C Value Paradox 

The C value paradox is that the amount of DNA in 
a haploid genome (the 1C value) does not seem to 
correspond strongly to the complexity of an organism, 
and 1C values can be extremely variable. Some
salamanders have more than 30 times as much DNA
per cell as humans, and within genera such as the
sunflowers, Helianthus, some species have 1C values four
times greater than others. Much DNA in the cell is 
present as repetitive sequences of varying lengths, 
often intermediate repeat sequences which are mostly 
selfish elements. Over 50% of the maize genome 
probably consists of retroelements. 

A strong correlation between the C value and 
nuclear size, cell size, and cell cycle time has led 
some to suggest that selection on these factors main- 
tains a C value which is more or less optimal for the 
organism. According to this hypothesis, the organism 
requires a certain amount of DNA, which could con- 
sist of any sequence. Selfish DNA is particularly good 
at competing for this ‘resource,’ hence its presence in 
the genome. The organism can regulate the C value, 
for example, by deleting stretches of sequence in het- 
erochromatic regions. The organism thus has the final 
say in the C value, and selfish DNA does not explain 
the paradox. 

The opposing argument is that selfish DNA can 
increase the C value to well above that which is best 
for the organism: conflict between selfish elements 
and the rest of the genome results in different C values 
depending on which is winning. Under this view, self- 
ish DNA can explain much of the paradox. 

One factor suggesting that organisms have ultimate 
control over their genome size is the presence of geno- 
mes which contain mostly coding DNA, but have no 
major reason to prevent the build-up of selfish genetic 
elements. Although bacterial genomes have little sur- 
plus DNA, this can be explained by strong selection 
for rapid replication. Organelle genomes are similarly 
economical, but this may be due to competition among 
themselves for representation in the cell or in the 
gametes. Indeed, petite mutants of yeast contain
defective mitochondria which are successful due to 
their increased replication rate, but do not provide a 
respiratory function (they have been suggested as self- 
ish DNA in their own right). Since the methods 
by which selection acts on increased genome size are 
still a topic of debate, it seems likely that the issue will 
remain controversial. 


Intron Evolution 

Because prokaryotes lack spliceosomal introns, it is
tempting to assume that introns are a late addition to
the eukaryotic genome. By contrast, the introns-early
hypothesis states that the ancestor of eukaryotes already
possessed introns, and that introns were lost in
prokaryotes. It has close links with the ‘exon theory of
genes’ which states that exon shuffling of originally 
small exons, each of which provides a functional 
domain of a protein, is the origin of the eukaryotic 
genome we see today. This leads to a selective ad- 
vantage for possessing introns, which now provide 
an important organismal function. The introns-late 
hypothesis considers that introns are primarily the 
result of more recent, selfish DNA movement. 

Generally, introns at different loci have quite dif- 
ferent sequences, suggesting that, although they might 
have been selfish DNA, they have been inactive for a 
reasonable length of time. More convincingly, a few 
intron positions are shared between plants and ani- 
mals. This could be evidence for early introns 
(although the plant/animal split is later than the pro- 
karyote/eukaryote one), or could conceivably be due 
to insertional bias. 

It seems extremely probable that many genes arose 
by exon shuffling: an example of gene formation of 
this sort has been found recently in Drosophila. In 
addition, there is slight evidence that introns do cor- 
respond to boundaries between protein domains. 
Whether this is due to exons evolving to provide this 
function and relatively recently co-opting introns 
based on selfish DNA to an organismal function, or 
whether the system started off in this form, is essen- 
tially still unresolved. 


Coevolution of Selfish DNA and Host 

The possibility that introns may have been selfish 
DNA, which now provides a function for the rest of 
the genome, is one of many examples of coevolution 
which have been suggested by recent research. 
Another is the telomere structure in Drosophila and 
some ciliates, which consists of repeated retroelem- 
ents: the ends of the chromosome are extended by 
retrotransposition. In many plants, the regulatory 
regions of several genes are encoded by the mobile 
elements Tourist and Stowaway, and the use of
plasmids to transfer antibiotic resistance between
bacteria is well known. Finally, it is often suggested that a
function of transposons is to provide new mutations, 
especially when accidentally transposing parts of 
the host sequence to new loci. This last suggestion 
is clearly an important factor in evolution, but is 
unlikely to provide short-term advantage. It is best 
interpreted as the host adapting to what is increasingly 
recognized as a very fluid genome. 


Further Reading 
Zeyl C and Bell G (1996) Symbiotic DNA in eukaryotic genomes.
Trends in Ecology and Evolution 11: 10-15.


Self-Splicing


See: Introns and Exons 


Semiconservative Replication


Semiconservative replication is the universal system
of DNA replication whereby strands of a parental 
duplex DNA molecule separate, each then acting as a 
template for the synthesis of a new complementary 
strand. 
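The defining consequence of semiconservative replication is that each daughter duplex keeps one parental strand. The Python sketch below tracks strand labels through successive rounds, in the style of the classic density-transfer experiment of Meselson and Stahl (1958): DNA labelled ‘heavy’ (H) replicates in a medium that supplies only ‘light’ (L) precursors. The labels and function names are illustrative.

```python
from collections import Counter


def replicate(duplexes):
    """One round of semiconservative replication.

    Each parental duplex (strand_1, strand_2) separates, and each old strand
    is paired with a newly made strand built from light ('L') precursors.
    """
    daughters = []
    for strand_1, strand_2 in duplexes:
        daughters.append((strand_1, "L"))
        daughters.append((strand_2, "L"))
    return daughters


def density_classes(duplexes):
    """Classify duplexes by how many heavy strands they contain."""
    names = {2: "heavy", 1: "hybrid", 0: "light"}
    return Counter(names[duplex.count("H")] for duplex in duplexes)


molecules = [("H", "H")]  # one fully heavy parental duplex
for generation in range(1, 4):
    molecules = replicate(molecules)
    print(generation, dict(density_classes(molecules)))
# 1 {'hybrid': 2}
# 2 {'hybrid': 2, 'light': 2}
# 3 {'hybrid': 2, 'light': 6}
```

A conservative scheme would instead give one fully heavy and one fully light duplex after the first round, with no hybrids; the single hybrid class after one round is what distinguishes the semiconservative mode.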


See also: Replication 


Semidiscontinuous Replication


Semidiscontinuous replication is the mode of DNA
replication whereby one new strand is synthesized
continuously while the other is synthesized
discontinuously.


See also: Replication 


Semidominance 


See: Incomplete Dominance 


Sense Codon 


J H Miller 

A codon that specifies an amino acid, as distinct from
a nonsense codon that does not specify an amino acid 
but instead signals chain termination. 
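Assuming the standard genetic code, in which UAA, UAG and UGA are the nonsense (stop) codons, the 64 possible triplets divide into 61 sense codons and 3 nonsense codons. The short Python sketch below enumerates them. Some mitochondrial and other variant codes read these triplets differently, so the stop set here is an assumption tied to the standard code.

```python
from itertools import product

# Nonsense (chain-termination) codons of the standard genetic code, as RNA.
STOP_CODONS = frozenset({"UAA", "UAG", "UGA"})

ALL_CODONS = ["".join(bases) for bases in product("UCAG", repeat=3)]


def is_sense(codon):
    """True if `codon` specifies an amino acid under the standard code."""
    codon = codon.upper().replace("T", "U")  # accept DNA-style input too
    if codon not in ALL_CODONS:
        raise ValueError(f"not a codon: {codon!r}")
    return codon not in STOP_CODONS


sense_codons = [c for c in ALL_CODONS if is_sense(c)]
print(len(ALL_CODONS), len(sense_codons))  # 64 61
```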


See also: Genetic Code 


Sequence Alignments 


See: Alignment Problem 


Serine 
E J Murgola 

Serine is one of the 20 amino acids commonly found in
proteins. Its abbreviation is Ser and its single letter 
designation is S. As one of the nonessential amino 
acids in humans, it is synthesized by the body and so 
need not be provided in an individual’s diet. It is a 
precursor of selenocysteine in certain proteins. In 
bacteria, after a specialized tRNA is aminoacylated 
with Ser, the amino acid is converted in two steps to 
selenocysteine. 

The chemical structure of serine is given in Figure 1.


        COOH
        |
  H2N - C - H
        |
    H - C - OH
        |
        H


Figure 1 Serine.


See also: Amino Acids; Proteins and Protein 
Structure 


Sex Chromatin


M A Ferguson-Smith 

In female somatic cells one of the two X chromosomes
is genetically inactive and becomes condensed, forming 
a small mass of dense chromatin which can be seen with 
the light microscope within the cell nucleus closely 
applied to the nuclear membrane. This small structure, 
variously termed the X chromatin, sex chromatin, or 
Barr body, can be identified in most female somatic
tissues, in a proportion of cells that depends on the
activity of the tissue and the stage of the cell cycle.

The sex chromatin was discovered in 1949 by 
Murray Barr, a Canadian neurophysiologist, while 
studying the effects of electrical stimulation of the 
hypoglossal nerve in a series of cats; he noticed that 
only half the cats showed this nuclear structure. For- 
tunately, he had recorded the sex of each experimental 
subject and quickly recognized that it was only 
present in female subjects. A similar structure, the 
nucleolar satellite, can be seen in the early drawings 
of nerve cells made by Ramón y Cajal in the previous
century, although its significance in terms of sexual 
dimorphism was not recognized at the time. 

Barr and his colleagues followed up this observa- 
tion and soon determined that the sex chromatin body 
could be recognized in female somatic cells of many 
mammals including humans, but was absent in male 
cells. Its association with only one of the two X chromo- 
somes was not recognized until much later (see X- 
Chromosome Inactivation), but it was put to practical 
use earlier on as an aid to the investigation of problems 
of intersex (see Intersex), for example in the diagnosis 
of female pseudohermaphroditism due to congenital
adrenal hyperplasia (see Congenital Adrenal Hyper- 
plasia (Adrenogenital Syndrome)). 

The most surprising result of nuclear sexing was the 
discovery that two forms of hypogonadism in humans 
were associated with paradoxical sex chromatin find- 
ings. In 1954, Polani and Lennox found that patients 
with Turner syndrome (see Turner Syndrome) had 
‘male’ nuclear sex, and were thus presumptively sex- 
reversed males. This seemed to be confirmed by the 
incidence of color blindness in cases of the syndrome, 
which was the same as the incidence in normal males. 
Then in 1956, Barr and colleagues found that a number 
of patients with Klinefelter syndrome (see Klinefelter 
Syndrome) had ‘female’ nuclear sex, suggesting that 
they were sex-reversed females. In 1959 Turner 
patients were shown to have a single X chromosome 
and no Y, while Klinefelter patients were found to 
have two Xs and a Y, incidentally demonstrating the
dominant sex-determining role of the Y chromosome. 

Progress towards the elucidation of the nature of 
the sex chromatin body came with the discovery of 
XXXY Klinefelter patients with two Barr bodies and 
Turner patients with two X chromosomes in which 
large and small Barr bodies were associated with X 
chromosome duplications and deletions respectively. 
These findings pointed to a derivation from a single X 
chromosome. In 1961 Mary Lyon presented evidence 
for random inactivation in female mammals based on 
the patchy distribution of coat color in mice
heterozygous for X-linked coat color genes. It was
immediately obvious that X inactivation was associated with
the formation of the Barr body. 
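These findings fit the rule that all X chromosomes but one are inactivated in a somatic cell, each inactive X forming one Barr body. The Python sketch below applies that rule to simple karyotype strings. It assumes the format ‘47,XXY’ (chromosome count, comma, sex chromosomes) and ignores structurally abnormal X chromosomes, mosaicism and the other details that full cytogenetic notation can record; the function names are illustrative.

```python
def sex_chromosomes(karyotype):
    """Return the sex-chromosome part of a simple karyotype such as '47,XXY'."""
    count, _, sex = karyotype.partition(",")
    if not count.isdigit() or not sex or set(sex) - {"X", "Y"}:
        raise ValueError(f"unsupported karyotype: {karyotype!r}")
    return sex


def expected_barr_bodies(karyotype):
    """Barr bodies expected if every X chromosome but one is inactivated."""
    return max(0, sex_chromosomes(karyotype).count("X") - 1)


for k in ["46,XY", "46,XX", "45,X", "47,XXY", "48,XXXY"]:
    print(k, expected_barr_bodies(k))
```

The rule reproduces the paradoxical results described above: Turner (45,X) patients show ‘male’ nuclear sex, Klinefelter (47,XXY) patients show ‘female’ nuclear sex, and XXXY patients show two Barr bodies.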

Sex chromatin is readily studied in smears taken 
from the buccal mucosa with a spatula and spread 
onto microscope slides. Once the buccal mucosal 
cells are fixed and stained by a simple nuclear dye 
(e.g., cresyl violet), the sex chromatin body can be 
observed in approximately 30% of cells from normal 
females, and in no cells from normal males. The pro- 
cedure has been used to screen various populations of 
individuals. As a result Klinefelter syndrome has 
been found to occur in approximately 11% of azoo- 
spermic or oligozoospermic males, and 1% of males 
with learning defects. The same method was intro- 
duced in 1960 as a gender verification test for female 
athletes taking part in the Olympic Games. This 
resulted in the identification of XY females in ap- 
proximately 1 in 420 female athletes, and many 
were unfairly excluded from participation. DNA test- 
ing of buccal smears replaced nuclear sexing in the 
1980s, but the same discrimination continued until 
the Olympic Games in Sydney in 2000, when gender 
verification was finally abandoned. 


See also: Congenital Adrenal Hyperplasia 
(Adrenogenital Syndrome); Intersex; Klinefelter 
Syndrome; Turner Syndrome; X-Chromosome 
Inactivation 


Sex Chromosome Aneuploidy: XYY
P A Jacobs

Normal males have one X and one Y sex chromosome,
but individuals with an abnormal number of sex
chromosomes are not uncommon in the human
population. Approximately 1 in 1000 males has an
additional Y chromosome. Such men have a normal 
appearance, although on average they are considerably 
taller than XY males. Furthermore, as a group, they 
have a propensity for aberrant behavior associated 
with a personality disorder. As a result, XYY men 
are found with a 30-fold increased frequency among 
men in maximum security hospitals. The basis for the 
aberrant behavior of a proportion of XYY males is not 
understood. 


See also: Klinefelter Syndrome 


Sex Chromosomes 


J A M Graves

Sex as a mode of reproduction is widespread in the
animal world. The system whereby males make sperm 
and females make eggs requires a means to differenti- 
ate two sexes with distinct anatomy, hormones, and 
behaviors superimposed on a common body plan. 
There are several ways to accomplish this. First, 
sex can be established by different environmental con- 
ditions. For instance, egg incubation temperature 
determines the sex of alligator hatchlings (hotter for 
males). Second, sex in many animals is determined 
by different versions (alleles) of a single gene, as in 
many fish and all amphibians. Last, sex may be deter- 
mined by a sex chromosome system. 


Chromosomal Sex Determination 


In all mammals and birds, some reptiles and fish, males 
and females differ in one pair of chromosomes. Het- 
eromorphic chromosomes also occur in many insects 
such as the fruit fly Drosophila, moths, and butterflies. 
In fact, sex chromosomes were first spotted in grasshoppers,
when it was observed that one chromosome
was present as the usual pair in females but
was single in males. This peculiar sex-related chromosome
was called the “X” to denote its unknown significance;
the name has nothing to do with its shape.
In other insects such as the fruit fly, females again had
two X chromosomes and males only one, but there
was also a small male-specific entity (called a Y). In
moths and butterflies, it is the other way around: males
have two copies (as per normal) of a sex chromosome
(called the Z to avoid confusion), and females have a
single Z and a smaller W chromosome. It is the same
story in vertebrates; mammals (including humans)
have an XX female:XY male system, whereas birds
and snakes have a ZW female:ZZ male system.

Sex is determined by how the heteromorphic sex
chromosomes are distributed to the gametes. For
instance, in humans and fruit flies, the X and Y chromosomes
of an XY male separate into different sperm at
meiosis. All eggs carry a single X. An egg fertilized by
an X-bearing sperm develops into a female, and an egg
fertilized by a Y-bearing sperm develops into a male.
In these XX female:XY male species, we call the male
the heterogametic sex because he can make two kinds
of gametes. In species such as birds and butterflies, the
female is the heterogametic sex. She makes two kinds
of eggs, Z- and W-bearing, which develop into males (ZZ)
and females (ZW), respectively, when fertilized by Z-bearing sperm.
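The segregation rule just described can be enumerated directly. The Python sketch below pairs every egg type with every sperm type, assuming each parent transmits either sex chromosome with equal probability; the function name `zygotes` and the string notation for karyotypes are illustrative choices, not notation from the text. It shows that in the XX:XY system the sperm decides the sex, while in the ZW:ZZ system the egg does.

```python
from itertools import product


def zygotes(mother, father):
    """Pair every egg type with every sperm type.

    mother, father: sex-chromosome pairs as strings, e.g. "XX", "XY", "ZW", "ZZ".
    Assumes a normal cross, so each zygote matches one parent's karyotype;
    the father's karyotype develops as male, the mother's as female.
    """
    results = []
    for egg, sperm in product(mother, father):
        combo = sorted(egg + sperm)  # order-insensitive: "YX" equals "XY"
        if combo == sorted(father):
            sex, karyotype = "male", father
        elif combo == sorted(mother):
            sex, karyotype = "female", mother
        else:
            raise ValueError(f"unexpected zygote {egg + sperm}")
        results.append((egg, sperm, karyotype, sex))
    return results


# XX female : XY male (humans, fruit flies): Y-bearing sperm give males.
for row in zygotes("XX", "XY"):
    print(row)

# ZW female : ZZ male (birds, butterflies): Z-bearing eggs give males.
for row in zygotes("ZW", "ZZ"):
    print(row)
```

In both systems half the zygotes are male; what differs is which parent, the heterogametic one, contributes the deciding gamete.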


Sex-Determining Genes 


What is it about the X and Y chromosomes in fruit
flies and man that determines maleness and femaleness?
Appearances are deceptive: it turns out that the two
species have completely different ways of doing it. In 
fruit flies, the Y chromosome is quite irrelevant to sex 
determination, although it carries genes required for 
making sperm. It is the different dosages of the X 
chromosome in females (two) and males (one) that 
determine the sex of the embryo. We know this 
because flies that are XO (have a single X but no Y) 
are male like XY, and flies that are XXY (have a Y as 
well as two Xs) are female like XX. Several genes were 
identified in fruit flies because mutants develop into 
the wrong sex. One of these genes is the key switch 
that is flipped one way if there is a single X, and the 
other way if there are two. The ratio of important 
genes on the X chromosome to genes on other chromo- 
somes determines how the RNA product of this gene 
is spliced to form alternative products that activate 
male-specific and female-specific sets of genes. 

In mammals (man is typical), the Y chromosome is 
paramount. XO individuals are females with Turner 
syndrome, and XXY are male with Klinefelter syn- 
drome. There was a frenzied search of the human Y in 
the 1980s to identify the gene that triggers testis differ- 
entiation, the first step in a hormone-controlled path- 
way to all the other sex differences. Studies of patients 
having only parts of a Y chromosome directed the 
search to a small region of the Y near one end. Candi- 
date genes were isolated from this region and tested by 
their patterns of expression, and their location and 
expression in closely and distantly related mammals. 
The SRY gene, found on the Y of humans and other
mammals (even kangaroos), was mutated in some XY
females, and it directed male development when
introduced into XX mouse eggs. This gene works by
regulating other genes in a testis-determining pathway,
but it is not yet known exactly which genes, or whether
SRY turns them on or off. The other genes in the pathway
are not on the sex chromosomes in mammals. However,
one of them, DMRT1, turns out to be the sex-determining
gene on the Z chromosome in birds, where it
probably works by dosage differences between the male
and female.
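The contrast between flies and mammals can be written as two small rules. In this Python sketch the fly rule uses the classical X:autosome ratio (a ratio of 1 or more gives a female, 0.5 or less a male, intermediate values an intersex, as in Bridges' triploid flies; these thresholds are not spelled out in the text above), and the mammalian rule treats the presence of SRY, normally carried on the Y, as the only switch. Both rules are deliberate simplifications, and the function names are hypothetical.

```python
def fly_sex(sex_chromosomes, autosome_sets=2):
    """Drosophila: sex follows the X:autosome (X:A) ratio; the Y is ignored.

    sex_chromosomes: e.g. "XY", "XO" (O = no second sex chromosome), "XXY".
    autosome_sets: number of haploid autosome sets (2 in a diploid fly).
    """
    ratio = sex_chromosomes.count("X") / autosome_sets
    if ratio >= 1.0:
        return "female"
    if ratio <= 0.5:
        return "male"
    return "intersex"  # e.g. two Xs with three autosome sets (ratio 2/3)


def mammal_sex(sex_chromosomes, sry_present=None):
    """Mammals: a testis forms if SRY is present, however many Xs there are.

    By default SRY is assumed to travel with the Y; pass sry_present
    explicitly to model an XY individual with a mutated SRY.
    """
    if sry_present is None:
        sry_present = "Y" in sex_chromosomes
    return "male" if sry_present else "female"


for karyotype in ("XX", "XY", "XO", "XXY"):
    print(karyotype, "fly:", fly_sex(karyotype), "| mammal:", mammal_sex(karyotype))
```

XO and XXY are the informative cases: they come out male and female in the fly but female and male in the mammal, which is exactly the observation that showed the two systems work differently.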


Sex Chromosome Evolution 


Compared to the other chromosomes, the mammalian 
Y is a genetic wasteland, being small and almost 
entirely devoid of active genes. Other than SRY, the
human Y contains only about 20 genes, several concerned
with spermatogenesis. The Y is largely genetic
junk: dead (pseudo)genes and highly repeated
sequences that do not specify proteins. However, we
know that the Y was once equivalent to the X. Over
the last 200 million years, it has lost most of its 2000-odd
genes as it became isolated from genetic recombination
with the X. Degradation is still continuing, so
that in time the Y may disappear entirely and new sex 
chromosomes may be initiated, as seems to have hap- 
pened in some unusual rodent species. 


See also: Sex Determination, Human; 
W Chromosome; X Chromosome; 
X-Chromosome Inactivation; Z Chromosome 


Sex Determination, Human


A Sinclair 


Copyright © 2001 Academic Press 
doi: 10.1006/rwgn.2001.1179 


Since antiquity people have proposed various mechanisms
to account for an individual’s sex. Aristotle believed
that vigorous intercourse would result in a boy,
whereas a gentler approach would yield a girl. Since 
that time we have developed a more sophisticated 
understanding of the molecular genetic mechanisms 
that govern human sex determination. In mammals, 
sex determination involves the commitment of the 
embryo to follow either a male or female develop- 
mental pathway. The key step in this process is 
the development of the undifferentiated embryonic 
gonads into either testes or ovaries. In humans and 
other mammals, sex is determined genetically at fertil- 
ization by the sex chromosome constitution. Two 
X chromosomes result in female development while 
the inheritance of an X and a Y chromosome results 
in male development. It was postulated that the Y
chromosome carried a dominant testis-determining
factor (TDF, so called because at that time it was an
unknown ‘factor’), which causes the undifferentiated
embryonic gonad to develop as a testis. The masculinizing
effect of the testis is due to the secretion of the
hormones testosterone and anti-Müllerian hormone
(AMH; also known as Müllerian inhibitory substance,
MIS). AMH causes regression of the embryonic
female Müllerian ducts. In the absence of the Y chromosome
(and absence of TDF), ovaries will develop.
Interestingly, female development will still occur in 
the absence of ovaries or their hormonal products. 
Consequently, the decisive event in sex determination 
is whether or not a testis develops. In mammals, sex 
determination can be equated with testis determin- 
ation. The postulated Y-linked testis-determining 
factor was thought to orchestrate a hierarchy of 
genes in a pathway leading to testis development. Isol- 
ation of the master switch gene TDF would allow the 
stepwise unraveling of the molecular genetic pathway 
of human sex determination. 


SRY: the Master-Switch Testis-Determining Gene


In 1990, Sinclair et al. isolated and characterized the 
SRY gene (Sex-determining Region on the Y chromo- 
some) from the human Y chromosome and showed 
it to be the elusive master-switch testis-determining 
factor (TDF). The SRY gene was isolated using DNA 
from sex-reversed patients who had two X chromo- 
somes but had formed testes and were male. Approxi- 
mately 80% of these XX males had a small portion of 
the Y chromosome including SRY translocated onto 
one of their X chromosomes. SRY was therefore isolated
from the smallest region of the short arm of the
Y chromosome known to be sex-determining.
Another group of sex-reversed patients had XY 
chromosomes but no testes and were female. In 20% 
of these XY females there was a loss-of-function 
mutation in the SRY gene. These mutations in XY 
females confirmed that SRY was required for normal 
testis formation and male sex determination. The 
other 80% of XY females are thought to have muta- 
tions in other genes in the testis pathway. Finally, the 
SRY gene in mouse (Sry) was isolated and used to 
make sex-reversed transgenic mice. These mice carried 
a 14 kb DNA fragment containing only the Sry gene 
and developed as males with (sterile) testes even 
though they had two X chromosomes (Koopman 
et al., 1991). This was final proof that SRY is the 
only Y-linked gene necessary and sufficient to initiate 
testis development. Consequently, the SRY gene is the
long-sought testis-determining factor (TDF).


Current evidence suggests SRY is expressed in the 
pre-Sertoli cells and acts to induce Sertoli cell differentiation.
Pre-Sertoli cells promote the formation of
testicular cords and of Leydig cells. Leydig
cells in turn produce the key male hormone testosterone,
and Sertoli cells produce AMH. SRY is a single-exon
gene that encodes a protein containing a 79-amino-acid
motif, the HMG box, which is capable of sequence-specific
binding and bending of DNA. The SRY protein is
thought to act as an architectural transcription factor 
influencing the expression of other genes by inducing 
conformational change in the surrounding chromatin. 
The HMG domain also contains signal motifs for 
transporting the SRY protein into the nucleus. Outside 
the HMG box SRY shows little conservation between 
mammalian species, suggesting that the HMG box is 
the major component. However, in mice there appear 
to be other regions outside the HMG box necessary 
for Sry function. Sry appears as a brief burst of expres- 
sion just prior to morphological differentiation of 
the embryonic testis. This suggests that SRY acts as a 
switch toward Sertoli cell fate but it is not required for 
the maintenance or function of Sertoli cells. SRY must
therefore activate other genes that are involved
in maintaining functioning Sertoli cells. Despite
extensive studies we still do not know the genuine in 
vivo target(s) of SRY, nor do we know how SRY itself 
is regulated. 


SOX9: An Autosomal Testis-Determining Gene


Several autosomal genes are known to be associated 
with sex-reversal syndromes and presumably play a 
role downstream of SRY in the testis developmental 
pathway. Unlike the male Y-specific SRY gene, these 
autosomal genes will be present in both males and 
females but are likely to show differential sex-specific 
expression. Deletions of loci from chromosomes 9p 
(short arm), 10q (long arm), and 17q (long arm) can 
result in XY females with dysgenic ovaries. In the 
latter case, translocations and deletions of 17q are 
also associated with campomelic dysplasia (CD), a
rare, often fatal congenital skeletal malformation syndrome,
characterized by bowing of the long bones. Three- 
quarters of the XY individuals with CD develop as 
phenotypic females or intersexes. Analysis of the 17q 
translocation breakpoints in CD patients revealed an 
SRY-related HMG-box gene, SOX9. CD patients and 
sex-reversed XY females with CD were both found to 
have loss-of-function mutations in one allele of SOX9. 
This indicates that haploinsufficiency of SOX9 is 
responsible for both the CD skeletal dysplasia 
and the XY sex reversal. This implies that SOX9 is a 
key component in the testis-determining pathway. 


Duplication of the chromosome 17q region, including
SOX9, can also result in sex-reversed XX male
patients, suggesting that a gain of SOX9 function may
be responsible. SOX9 is a classic transcription factor
possessing both a DNA-binding HMG
box and a transactivation domain. SOX9 is expressed
in a variety of human fetal tissues including brain,
testis, and chondrocytes of the hypotrophic zones of
developing long bones and ribs. In the mouse, Sox9 is
upregulated specifically in the pre-Sertoli cells, suggesting
that its role is to induce Sertoli cell differentiation.
Sox9 is expressed in the genital ridge just after the
onset of Sry expression. This led to speculation that
SRY may regulate SOX9 expression. However, because
the human SOX9 regulatory region is spread over a
vast 1 Mb region of DNA, it has been difficult to
examine interactions between SRY and SOX9. Nevertheless,
it is clear that SOX9 is a key downstream component
of the sex-determining pathway, inducing
Sertoli cell differentiation in the developing testis.


DMRT1: A Conserved Testis-Determining Gene


Sex-determining mechanisms appear to be very differ- 
ent between vertebrates and invertebrates. So it was 
surprising to find a family of genes related by a DM 
domain (DNA-binding region) that play a sex-specific 
role across the different phyla. The Drosophila dsx 
(doublesex) gene and the Caenorhabditis elegans 
mab-3 gene share the DM domain, and both genes are
involved in the differentiation of sex-specific features
such as peripheral nervous system structures and yolk
protein production. A search for DM-related genes in
humans revealed the DMRT1 (DM-related transcrip- 
tion factor 1) gene. DMRT1 mapped to the distal short 
arm of human chromosome 9, deletions of which are 
associated with XY female sex-reversal. DMRT1 is 
upregulated specifically in the male genital ridge 
in humans and mice. Birds and reptiles such as the 
chicken and alligator, respectively, lack the SRY gene 
but do show male-specific upregulation of SOX9 and
DMRT1 in the developing gonads. This suggests an 
important role for DMRT1 in vertebrate sex deter- 
mination. However, there is a complication with 
DMRT1 because knockout mice do not develop as 
sex-reversed XY females. This suggests that DMRT1 
is not a testis-determining gene but it may be that its 
function is compensated by other genes residing near 
DMRT1 on the distal short arm of human chromo- 
some 9. The large deletions of chromosome 9 seen 
in sex-reversed XY female patients would presum- 
ably remove a number of genes from this region. The 
jury is still out on the role of this new candidate sex- 
determining gene. 
DAX1: Suppression of Testis Development


Some sex-reversal syndromes are known to involve 
X chromosome rearrangements and result in failure 
of the testis to develop. Duplications of the Xp21 
region have been shown to cause XY female develop- 
ment and this led to the proposal that a dosage- 
sensitive sex-reversing (DSS) gene must exist. Two 
active copies of the postulated DSS gene are believed to 
override the testis-determining signal, resulting in the 
development of ovaries and XY female sex-reversal. 
This same region on the X chromosome (Xp21) is also 
involved in adrenal hypoplasia congenita (AHC), 
which results in the failure of the adrenals to form 
properly. A search of the Xp21 region revealed a 
gene called DAX1 (DSS-AHC critical region of the
X chromosome, gene 1). DAX1 is likely to be subject
to X chromosome inactivation, as Klinefelter syndrome
(XXY) individuals are male, not sex-reversed
females. Consequently, normal males and females
would each be expected to have one functional copy
of DAX1. The DAX1 gene encodes an orphan nuclear
receptor with a ligand-binding domain but lacking the
usual zinc finger DNA-binding motif. Instead, the
N-terminal domain of DAX1 has an unusual repeat motif
rich in alanine and glycine. Mutations in the DAX1
gene result in adrenal hypoplasia congenita (AHC).
However, it is not clear whether DAX1 also
plays a role in dosage-sensitive sex reversal. Deletions
of DAX1 in XY patients do not disrupt testis
differentiation. However, transgenic mice that overexpress
Dax1 can undergo XY female sex
reversal. Dax1 is expressed at the same time as Sry;
however, Dax1 is downregulated in the developing
testis but maintains expression levels in the ovary.
This suggests that either extra DAX1 copies in an
XY individual can suppress testis development or that
DAX1 may have a role in ovary development. However,
XX female mice lacking Dax1 still developed
ovaries and were fertile, suggesting that DAX1 is not
an ovarian-determining gene but acts antagonistically
to SRY as an anti-testis gene. Lack of Dax1 in XY male
mice resulted in sterility, revealing another, if unexpected,
role for DAX1 in spermatogenesis.


Genes Required for Early Indifferent Gonad Formation


Steroidogenic factor 1 (SF1) is an orphan nuclear
receptor and a key regulator of steroidogenic en- 
zymes. SF1 is expressed in all primary steroidogenic 
tissues, including the adrenal cortex, Leydig cells of 
the testis, and ovarian follicles. SF1 encodes a zinc 
finger DNA-binding domain and a ligand-binding 
domain. In the mouse embryo, Sf1 is expressed at the
earliest stage of urogenital development in the undif- 
ferentiated gonad. Sf1 shows continued expression 
throughout testis development but is downregulated 
during ovarian development. In the testis SF1 expres- 
sion is localized to the Sertoli and Leydig cells. 
Together these data suggest a role for SF1 in the for- 
mation of the urogenital ridge and subsequent sexual 
differentiation. SF1 is also expressed in the developing 
hypothalamus and in pituitary gonadotropes suggest- 
ing it has a wider role by operating at different levels of 
the reproductive axis. Null mutant Sf1 knockout mice 
(XX and XY) lack both adrenal glands and gonads. 
These mice develop as phenotypic females but die 
shortly after birth from adrenal failure. Detailed 
analysis of these mice showed that in the absence of 
Sf1 the genital ridge displayed arrested development 
and then atrophied. SF1 clearly acts at several points 
in the sex-determining hierarchy. One of its known 
targets is AMH, where it plays a key regulatory 
function (see ‘Regulation of AMH’). 

The Wilms’ tumor 1 (WT1) gene is associated
with a pediatric cancer of the kidney
(Wilms’ tumor). WT1 encodes a zinc finger protein
and is thought to act as a transcription factor. Heterozygous
mutations in the zinc finger domain of WT1
result in Denys-Drash syndrome, characterized
by renal failure and genital abnormalities,
including XY female sex reversal. Frasier syndrome,
also associated with XY female sex reversal, was
shown to be due to a mutation in a splice donor site
in WT1, causing loss of a specific isoform of WT1. The
mouse gene, Wt1, is expressed in the undifferentiated
gonads of both males and females at the same time as
Sf1. Targeted disruption of Wt1 in mice resulted in the
failure of kidney, ovary, and testis development.
This suggests that Wt1 acts early in urogenital
development (possibly in conjunction with Sf1) to
ensure proper formation of the indifferent embryonic
gonad. The WT1 mutations that result in XY
females in both Denys-Drash and Frasier syndrome
patients indicate that WT1 plays an additional role in
testis development. The human WT1 gene comprises
10 exons and produces a range of different
transcripts using alternative splice sites and translation
start sites. The different isoforms of WT1 are thought
to carry out a variety of different functions. A
specific isoform of WT1 (the same isoform abolished
by the WT1 mutations in Frasier syndrome) is thought
to interact with SF1 to upregulate AMH expression in
the testis (see ‘Regulation of AMH’ below).

The homeobox gene Lim1 appears to play a
role in kidney and gonad development, as targeted
disruption of Lim1 results in mice lacking these
organs. Lim1 is expressed early in urogenital ridge
development in both the mesonephros and metanephros,
but its role in gonad development is not
clear. The mouse gene M33 has also been implicated
in gonad development. M33 shows similarity to the 
Drosophila polycomb group of genes (PcG). Mice 
carrying a disrupted M33 gene show retarded gonad 
development associated with varying degrees of XY 
sex reversal. In Drosophila, PcG proteins regulate the 
coordinate expression of homeotic genes. So it is pos- 
sible that M33 may similarly regulate Hox gene ex- 
pression in the mammalian urogenital ridge. Although 
SF1, WT1, Lim1, and M33 all appear to be important 
in the formation of the indifferent gonad, their precise 
role is unknown. 


Regulation of AMH 


Anti-Müllerian hormone (AMH) is synthesized in the
Sertoli cells and is one of the first proteins produced
by the developing testis. AMH causes regression of
the Müllerian ducts within the male embryo, which
would otherwise develop as oviducts, uterus, and
upper vagina. Within the small regulatory region of
the AMH gene, binding sites were discovered for
SOX9 and SF1 (steroidogenic factor 1). SOX9 and
SF1 proteins bind to adjacent sites in the regulatory
region of AMH and cause significant upregulation of
AMH expression. In mice, mutation of the Sox9 or Sf1
binding sites in the Amh regulatory region causes
abolition or diminution, respectively, of Amh expression.
SOX9 appears to be essential for initiating AMH
expression. SF1 physically interacts with the adjacent 
SOX9 protein to significantly upregulate AMH tran- 
script levels. A specific isoform of WT1 also appears 
to physically interact with SF1 to synergistically 
upregulate AMH expression. As a counter to this, 
DAX1 has been shown by in vitro experiments to
repress the synergistic action of SF1 and WT1 on the
AMH promoter. This suggests that an abnormal double
dose of DAX1, present at high levels in the testis,
could block the normal upregulation
of AMH expression. The normal role of
DAX1 in the ovary may be to prevent the expression
of AMH. One member of the GATA family of transcription
factors, GATA4, is expressed in the developing
gonads. A GATA4 binding site is also present in
the AMH regulatory region. It is thought that the
GATA4 protein may bind to this site and interact
with SOX9 to regulate AMH expression. The roles of
WT1 and GATA4 in AMH regulation have not been
confirmed in vivo. However, we can clearly state that
SOX9 and SF1 together upregulate AMH expression.
While AMH is required for sex-specific differenti- 
ation of the reproductive tract, lack of AMH does 
not affect testis development. 
Genetic control of human sex determination. (Adapted from O’Neill M and Sinclair A (1996) The testis-determining gene, SRY. Advances in Genome Biology 4: 29-51.)


WNT4: Ovarian Development and Testis Suppression


Wnt4, a member of the Wnt gene family of signaling 
molecules, appears to be required for ovarian develop- 
ment and acts to suppress testis formation. The Wnt4 
gene is initially expressed in the genital ridge and 
mesonephros of both sexes but as sex-specific gonadal 
differentiation proceeds it is downregulated in the 
testis but maintained in the ovary. Male (XY) mice 
lacking Wnt4 develop normal testes and Wolffian 
duct derivatives. By contrast, females (XX) lacking 
Wnt4 are masculinized as the Müllerian duct is absent 
and the Wolffian duct is similar to that of a male. 
Furthermore, the ovaries of Wnt4 mutant female mice 
express genes coding for enzymes normally associated 
with testosterone production. Finally, the ovaries of 
these mice display a marked decrease in oocyte 
development. The implication is that steroid cell pre- 
cursors of Leydig and theca cells must be present in 
the indifferent gonad of both males and females. As 
the testis develops more rapidly than the ovary, the 
Leydig cells can begin producing testosterone as soon 
as testis cords form in the embryo. By contrast, in the 
ovary the theca cells are not steroidogenically active 
until birth. These data suggest that in normal XX
females high levels of Wnt4 expression act in the
indifferent genital ridge to repress testosterone production
in the precursors of Leydig cells, allowing
theca cells to eventually develop. Presumably in normal
(XY) males the downregulation of Wnt4 expression
allows testosterone biosynthesis by Leydig
cells to proceed. Consequently, Wnt4 has three distinct
functions in the female pathway: formation of
the Müllerian duct, suppression of Leydig cell development
in the indifferent genital ridge, and postmeiotic
maintenance of oocytes in the ovary.


Future Prospects 


Over the past decade there has been an enormous 
increase in our understanding of the molecular basis 
of mammalian sex determination. The isolation and 
characterization of several key genes has begun to 
elucidate the complexity inherent in the development 
of a gonad. Unfortunately, we have only a few pieces of 
this time-dependent three-dimensional jigsaw puzzle. 
As more of the pieces are found and their interactions 
in time and space become clear, we will gain a unique 
insight into the processes underlying organogenesis.
Introduction and Developmental Biology


In mammals, XY fetuses develop testes due to the
action of the Y-linked testis-determining gene (Sry,
also known as Tdy), and XX fetuses develop ovaries in
its absence. As a result of early experiments, primarily
by Jost, it is known that primary sex determination
can be reduced to the genetic choice between the
development of an ovary or a testis; subsequent sex
differentiation (secondary sex determination) depends
on gonadal sex. Sex differentiation is regulated by
two hormones exported from the developing testis,
Müllerian inhibiting substance (MIS, or AMH for
anti-Müllerian hormone) and testosterone, both of which
affect the further development of the reproductive tract.
In males, MIS causes the Müllerian ducts to regress and
testosterone stimulates the Wolffian ducts to differentiate
into the epididymides, vasa deferentia, and seminal vesicles.
In females, the absence of MIS allows the Müllerian ducts
to differentiate into the oviduct, uterus, and upper part
of the vagina, while the absence of testosterone causes
the Wolffian ducts to degenerate.

This review will be restricted to a genetic descrip- 
tion of the process of sex determination in mice, and 
will mostly describe testis determination because cur- 
rently little is known about the genetics of ovary 
determination. 

The genital ridge, the anlage of both the testis and
the ovary, develops on the ventromedial surface of the
mesonephros (primitive kidney) and is first visible at
approximately 10 days post coitum (dpc) in the mouse.
The mesonephros and the genital ridge are derived 
from the intermediate mesoderm. Mammalian adult 
gonadal cells can be grouped into four types: germ 
cells (sperm in the testis, eggs in the ovary), steroid- 
producing cells (Leydig cells in the testis, theca cells in 
the ovary), supporting cells (Sertoli cells in the testis, 
follicle/granulosa cells in the ovary), and connective 
tissue cells (essentially similar in both organs, except 
peritubular myoid cells are unique to the testis). 
Analysis of XX↔XY aggregation chimeras demonstrated
that Sertoli cells are almost exclusively XY,
whereas the other testis cell types can be XX or XY.
This result indicates that the cell-autonomous action 
of Sry is needed only in Sertoli cells, and suggests the 
following working model for testis determination. Sry 
expression in pre-Sertoli cells causes them to differ- 
entiate and become organized into testis cords. The 
formation of testis cords then triggers the remaining 
cell types to differentiate along the testis development 
pathway. 


Genes 


Sry (Sex-determining Region, Y Chromosome)

In the late 1950s, it was discovered that the presence
or absence of a Y chromosome determined whether a
mammal developed as a male or female, respectively. It
was hypothesized that a gene on the Y chromosome
determined male sex; the human gene was designated
TDF (testis-determining factor) and the mouse
gene was designated Tdy (testis-determining, Y). In
1990, the search for TDF/Tdy identified the human
SRY gene in a small Y chromosome region that caused
sex reversal when present in XX individuals. (By convention,
italicized Sry refers to the mouse gene, italicized SRY to
the gene in all other species, and roman SRY to the
protein in all species.) Definitive proof equating Tdy
and Sry came from mutation analysis in humans, and
gain- and loss-of-function studies in mice. Introduction
of a 14.6 kb transgene containing only Sry causes
XX mice to develop as sterile males. (Sterility is caused
by the presence of two X chromosomes and the
absence of a Y chromosome.) Conversely, XY
mice carrying an 11 kb deletion at the Sry locus
develop as semifertile females.

SRY contains a 79 amino acid DNA binding motif 
that was first identified in high mobility group 
proteins (the HMG domain). SRY binds to the 
(A/T)AACAA(T/A) consensus sequence in linear DNA
and induces a bend of approximately 80°. These features
strongly suggest that SRY is a transcription factor, but it is
unknown whether Sry is a transcription activator or repressor
(or both), and no definitive target genes have been
identified. SRY also binds in a sequence-independent
manner to cruciform DNA, the significance of which
is obscure. Sry can be considered a strange gene for 
many reasons. It is rapidly evolving and the HMG box 
is the only obviously conserved portion of the gene 
(true for SRY in general). It is monoexonic and 
embedded within large inverted repeats. It generates
two very different RNA transcripts, an embryonic 
linear polyadenylated form and an adult germ-cell 
circular nonpolyadenylated form. The function of 
the circular form is unknown; it probably is not trans- 
lated, it is not present in other species, and it may be an 
artifact of the inverted repeat structure of the mouse 
Sry locus. In fetal mice, Sry reportedly is expressed 
exclusively in the somatic cells of the developing XY 
gonad between 10.5 and 12.5 dpc immediately prior to
the first visible sign of sex determination, which is the 
formation of testis cords at 12.5 dpc. However, SRY is 
transcribed in many different tissues in humans and 
marsupials and throughout testis development in 
marsupials. 
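Because SRY binds a short degenerate consensus, a naive scan for candidate sites is easy to sketch. The Python example below searches both DNA strands for the (A/T)AACAA(T/A) consensus quoted above; the function names are hypothetical, and a string match says nothing about bending, cruciform binding, or chromatin context. As noted above, no definitive in vivo targets are known, so hits are candidate sites only.

```python
import re

# (A/T)AACAA(T/A) as a regular expression. Wrapping it in a lookahead
# (?=...) makes finditer report overlapping sites as well.
SRY_SITE = re.compile(r"(?=([AT]AACAA[AT]))")
SITE_LEN = 7

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def find_sry_sites(seq):
    """Return (start, strand, site) for every consensus match on either strand.

    Start positions are 0-based coordinates on the forward strand.
    """
    seq = seq.upper()
    hits = [(m.start(), "+", m.group(1)) for m in SRY_SITE.finditer(seq)]
    for m in SRY_SITE.finditer(reverse_complement(seq)):
        # A match starting at i on the reverse strand covers forward
        # positions len(seq) - i - SITE_LEN through len(seq) - i - 1.
        hits.append((len(seq) - m.start() - SITE_LEN, "-", m.group(1)))
    return sorted(hits)


print(find_sry_sites("GGAAACAATGG"))  # [(2, '+', 'AAACAAT')]
print(find_sry_sites("CCATTGTTTCC"))  # [(2, '-', 'AAACAAT')]
```

The second sequence is the reverse complement of the first, so the same site is found at the same forward-strand position but reported on the opposite strand.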


Sox9 (Sry-box containing gene 9)

After the cloning of Sry, the Sox gene family was 
identified by their high sequence homology to the 
Sry-HMG box. SOX9 is disrupted in patients with
campomelic dysplasia (CD), a severe dwarfism syn- 
drome that often is associated with XY sex reversal. 
Interestingly, SOX9 is probably haploinsufficient 
because all CD patients are heterozygous for the 
mutation and many of the identified mutations are 
predicted to cause loss of function. In contrast to Sry, 
Sox9 has two introns and is highly conserved between 
species both within and outside of the HMG box. In 
addition to the HMG domain, SOX9 has two domains 
that are rich in proline, glutamine, or serine, and are 
necessary for its function as a transcription transacti- 
vator. SOX9 binds to and activates transcription from 
the Col2a1 (procollagen, type II, alpha 1) gene; how-
ever, no definitive target genes have been identified in 
the developing gonad. 

Sox9 expression in the developing gonads is sexually
dimorphic. Prior to 11.5 dpc, it is expressed in
both XX and XY genital ridges at fairly equal levels,
but thereafter expression falls precipitously in XX
gonads and increases in XY gonads. SOX9 protein expression
mirrors this pattern and is largely undetectable in XX
gonads after 11.5 dpc. Curiously, SOX9 is localized
perinuclearly prior to 11.5 dpc in both sexes, but then
becomes localized to the nucleus of Sertoli cells. Subsequently,
Sox9 expression is maintained in fetal and
adult testes. In adult humans and mice, SOX9/Sox9
expression is germ-cell independent. In the urogenital
system, Sox9 also is expressed in the Müllerian
and Wolffian ducts, the cranial portion of the
mesonephros, and the epididymis. Interestingly,
fetal ovaries that transdifferentiate into testes after
transplantation under adult kidney capsules activate
Sox9 expression. Gene ablation studies in mouse are
currently in progress.


Sf1 (Steroidogenic Factor 1, also Known as Adrenal 4-Binding Protein, Ad4BP)

SF1 is encoded by the Ftzf1 locus (fushi tarazu factor 1
homolog), and was initially isolated as an important
regulator of the cytochrome P-450 steroid
hydroxylases. It is important to note that the Ftzf1 locus is
genetically complex and produces four related
transcripts by alternative promoter usage and splicing:
embryonal long terminal repeat binding protein 1
(ELP1), ELP2, ELP3, and Sf1/Ad4BP. Sf1 encodes an
orphan nuclear receptor with two zinc fingers, an A or
FTZ-F1 box that mediates additional DNA binding, a
proline-rich domain, and an AF-2 domain; the last two
may both be transactivation domains. SF1 binds DNA
as a monomer with specificity for the YCAAGGTCA
motif and has been shown to activate the expression
of a number of genes, including Mis.
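As a purely illustrative sketch, the short Python function below scans a DNA sequence for the YCAAGGTCA motif on both strands, reading Y in the standard IUPAC sense (C or T). The names find_sites, motif_to_regex, and reverse_complement are hypothetical helpers, not part of any published tool, and exact pattern matching is an assumption made for illustration; real SF1 binding also depends on flanking sequence and chromatin context, which this ignores.

```python
import re

# IUPAC codes needed for this motif: Y = C or T (pyrimidine), R = A or G (purine).
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "Y": "[CT]", "R": "[AG]"}
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A", "Y": "R", "R": "Y"}


def motif_to_regex(motif):
    """Translate an IUPAC motif into a regular expression."""
    return "".join(IUPAC[base] for base in motif)


def reverse_complement(motif):
    """Reverse complement of an IUPAC motif (Y pairs with R)."""
    return "".join(COMPLEMENT[base] for base in reversed(motif))


def find_sites(seq, motif="YCAAGGTCA"):
    """Return sorted (start, strand) pairs for every match of motif.

    Starts are 0-based positions in seq; '-' marks a match to the reverse
    complement, i.e. a site on the opposite strand. The zero-width
    lookahead lets overlapping matches be reported.
    """
    seq = seq.upper()
    hits = []
    for strand, pattern in (("+", motif), ("-", reverse_complement(motif))):
        regex = re.compile("(?=" + motif_to_regex(pattern) + ")")
        hits.extend((m.start(), strand) for m in regex.finditer(seq))
    return sorted(hits)


if __name__ == "__main__":
    print(find_sites("ggTCAAGGTCAgg"))
```

Searching the reverse complement of the motif, rather than reverse-complementing the whole sequence, keeps all reported positions in the coordinates of the input strand.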

Sf1 is expressed in the urogenital ridges of both
sexes from 9 dpc, and continues in the genital ridges
of both sexes until about 12.5 dpc. After 12.5 dpc
expression is extinguished in XX gonads and is
undetectable by in situ hybridization at 14.5 dpc.
Expression in XY gonads continues at high levels in Sertoli
cells and in interstitial Leydig cells until about
14.5 dpc, when expression becomes restricted to Leydig
cells and is extinguished by 17.5 dpc. In adults, SF1 is
expressed in testicular Leydig cells, and ovarian theca
and granulosa cells. Mice homozygous for a targeted
disruption of Ftzf1 do not develop gonads, adrenal
glands, or the ventromedial hypothalamic nucleus,
and exhibit impaired gonadotrope function. Fetuses
homozygous for the null allele display some mesen- 
chymal cell thickening on the surface of the urogenital 
ridge where the genital ridge usually develops, but this 
region does not develop further and begins to degen- 
erate, probably by apoptosis. 


Dax1 (Dosage-Sensitive Sex-Reversal-AHC
Critical Region on the X Chromosome, also 
Known as Adrenal Hypoplasia Congenita 
Homolog, ahch) 

DAX1 was cloned from a 160 kb region on human
Xp21 that is associated with adrenal hypoplasia
congenita and dosage-sensitive sex reversal. Duplication
of the chromosomal region including DAX1 causes
sex reversal in XY humans. Like Sf1, Dax1 encodes
an orphan nuclear hormone receptor. Unlike Sf1,
Dax1 lacks any zinc finger DNA binding motifs, and
it is rapidly evolving in certain regions of the gene. DNA
binding activity has been attributed to a region with
three and a half repeats of a 67-68 amino acid motif.
DAX1 inhibits the transcriptional activity of a
number of genes, including Sf1. Conversely, in certain cells
SF1 activates Dax1 transcription. However, the
significance of these findings for gonadogenesis is unclear,
because Dax1 expression persists in the genital ridge
of Ftzf1 null mice.

Dax1 is first expressed in somatic cells of the XX
and XY genital ridge at 11.5 dpc. After 12 dpc, Dax1
expression becomes sexually dimorphic: rapidly decreasing
in the XY gonad, but persisting in the XX gonad until
14.5 dpc. However, conflicting data suggest that
expression does not become sexually dimorphic. In
the adult, Dax1 is expressed in testicular Sertoli and
Leydig cells as well as ovarian stromal cells.

The role of Dax1 in sex determination is enigmatic. 
Overexpression seems to inhibit proper testis develop- 
ment. On the other hand, loss of function in knockout 
mice has no effect on gonadogenesis or sex determin- 
ation. Adult males homozygous for the null allele are 
sterile due to loss of stratification of the testicular 
germinal epithelium, and therefore do not complete 
spermatogenesis. Adult females are fertile, but display 
minor abnormalities in oogenesis: some follicles con- 
tain multiple oocytes. It is possible that overexpres- 
sion of DAX1 inhibits the normal functioning of 
SF1 in testis development and thereby causes the 
sex-reversal phenotype. 


Wt1 (Wilms’s Tumor 1)

WT1 mutations in humans are associated with Wilms’s
tumor (kidney) and urogenital malformation,
particularly Denys–Drash and Fraser syndromes. WT1 is a
zinc-finger-containing, DNA binding transcription
factor that probably also plays a role in mRNA
splicing. The protein contains both activation and
repression domains, as well as a self-association domain. Wt1
is expressed in the intermediate mesoderm and
coelomic epithelium of 9 dpc mouse embryos, in the
mesonephros from 9.5 to 12.5 dpc (excluding the Müllerian
and Wolffian ducts), and in the genital ridges and later
the developing gonads from 9.5 dpc until at least
birth. Wt1 expression is not sexually dimorphic in
the developing gonads, is limited to the sex cords, and
is excluded from interstitial and germ cells. Wt1 is
expressed in Sertoli cells in adult males.

Fetal mice homozygous for a Wt1 targeted null
allele display a very reduced thickening of the
coelomic epithelium at 11 dpc, and the gonadal ridge does
not develop much further. Clearly, Wt1 is important
for the development of the gonadal ridge from the
urogenital ridge, but a role in later gonadogenesis
also is possible. In fact, WT1 was recently reported
to functionally oppose DAX1 in testis development
and modulate SF1-mediated transactivation.


Mis (Müllerian Inhibiting Substance, also
Known as Amh for Anti-Müllerian Hormone)

Mis is a member of the transforming growth factor-β
family that is expressed in developing Sertoli cells from 
11.5 dpc until puberty and in ovarian granulosa cells 
from birth. Clearly, Mis is not involved in sex deter- 
mination because mice homozygous for a targeted 
null allele of either Mis, or its receptor (Amh type 2 
receptor, Amhr2), develop normal gonads (except for 
some Leydig cell hyperplasia in adult males) and 
gonadal phenotypic sex and genetic sex are concordant. 
However, chronic overexpression partially represses 
gonadal development and function in both sexes. 


Other Genes 

A number of other genes have also been implicated
in sex determination or gonadogenesis. Their roles
are less well established, however, and so they are
described only briefly below.


Lhx1 (LIM homeobox protein 1, also known as Lim1)
Lhx1 is expressed in the developing urogenital ridges 
(probably the mesonephric portion) from 8.5 dpc, and 
later becomes restricted to the mesonephric tubules 
and mesonephric (Wolffian) duct. It is not expressed in 
the genital ridges or gonads. Most embryos
homozygous for a targeted null allele die at ~10 dpc;
however, a few surviving neonatal pups lacked kidneys
and gonads.


Cbx2 (chromobox homolog 2, also known as M33) 
Cbx2, the mouse homolog of the Polycomb gene in 
Drosophila, is ubiquitously expressed in 12.5 dpc 
fetuses. The development of the genital ridge is
severely retarded in fetuses homozygous for a targeted
probable-null allele. Subsequent gonadogenesis is
similarly affected, particularly testis formation, leading
to XY sex reversal. The XY animals range from
XY females with two ovaries to hermaphrodites with 
one ovary and one undescended testis. XX homozy- 
gous null females are sterile and occasionally lack 
one ovary. 


Gata4 (GATA-binding protein 4) 

GATA4 activates transcription from the Mis promoter 
in vivo, and is expressed in gonadal somatic cells in a 
sexually dimorphic pattern. The GATA4 protein is 
present in both XX and XY genital ridges at 11.5 dpc, 
and continues to be expressed in the testis through 
puberty. However, its expression is extinguished in the 
developing ovary beginning at 13.5 dpc. Unfortunately,
fetuses homozygous for a targeted null Gata4 allele
die by 9.5 dpc, thereby precluding an analysis of sex
determination in these mice.


See also: Sex Chromosomes; Sex Determination, 
Human 
In mammals, many insects and a few flowering plants,
separate male and female sexes are determined by a pair 
of only partly homologous sex chromosomes, called X 
and Y. The female carries two X chromosomes in each 
cell nucleus and the male an X and a Y. The male, with 
the pair of unlike sex chromosomes, is called the het- 
erogametic sex, because it produces two kinds of 
sperm, X-bearing and Y-bearing in equal numbers. On 
the female side, all the eggs carry the X chromosome; 
progeny are female or male depending on whether the 
egg is fertilized by an X- or a Y-bearing sperm cell. 

In birds and some reptiles the situation is reversed in
that it is the female that is heterogametic. The sex
chromosomes are called W and Z; the females are WZ
and the males are ZZ.

X and Z chromosomes typically carry as many
genes in proportion to their size as the ordinary
chromosomes (autosomes), but the Y or W, present
in only one sex, is typically nearly inert genetically
(though the Drosophila Y is necessary for sperm
motility). Indeed, in some groups of insects with XO
sex-determining systems, notably the Orthoptera
(grasshoppers and locusts), there is no Y chromosome
at all, and the sperm cells carry either an X chromosome
or no sex chromosome. The same applies to the
nematode worm Caenorhabditis elegans, except that
here XO and XX individuals are respectively males and
hermaphrodites, not males and females.

The twofold difference between males and females
in the dosage of the gene-rich X or Z chromosome,
when the great majority of genes are required equally
by both sexes, requires some system of dosage
compensation. This has been well studied both in
mammals and in Drosophila (see Dosage Compensation).
In mammals one or the other of the two X chromosomes
is largely inactivated in every female somatic cell,
resulting in a mosaic phenotype when the two X
chromosomes carry distinguishable alleles, as in the
case of the tortoiseshell cat.

The key feature of the inheritance of sex-linked
genes is that the heterogametic sex inherits them only
from the homogametic parent. Consequently the
X-chromosome constitutions of human, mouse, or
Drosophila males are a direct reflection of the segregation
and recombination of the X-linked genes of their
mothers, without any concealment by dominance
from the father. X-linked mutations that are recessive
in the female will always show their phenotypic
effects in the male because there is no second
chromosome to supply a normal dominant allele (Table 1).
Those few genes present on both the X and Y
chromosomes show a modified form of sex-linkage
called pseudoautosomal linkage (see Pseudoautosomal
Linkage, Region).

Table 1 Mode of inheritance of an X-linked recessive
mutant allele (X+, normal allele; Xm, mutant allele)

Cross (female × male)     Female progeny          Male progeny
X+/X+ × Xm/Y              X+/Xm                   X+/Y
Normal × Mutant           Carrier                 Normal
Xm/Xm × X+/Y              X+/Xm                   Xm/Y
Mutant × Normal           Carrier                 Mutant
X+/Xm × X+/Y              50% X+/X+ Normal        50% X+/Y Normal
Carrier × Normal          50% X+/Xm Carrier       50% Xm/Y Mutant
X+/Xm × Xm/Y              50% X+/Xm Carrier       50% X+/Y Normal
Carrier × Mutant          50% Xm/Xm Mutant        50% Xm/Y Mutant

This table applies equally to ZW (bird and reptilian) sex
chromosome systems if X is replaced by Z and Y by W,
and the female and male headings are reversed.
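The crosses in Table 1 follow mechanically from one rule: a daughter receives one maternal X and the paternal X, whereas a son receives one maternal X and the paternal Y. A minimal Python sketch of that bookkeeping is given below, assuming a single X-linked locus with a normal allele '+' and a fully recessive mutant allele 'm'; the function names cross and phenotype are illustrative only.

```python
from collections import defaultdict
from fractions import Fraction


def cross(mother, father):
    """Progeny of a cross at one X-linked locus.

    mother: the alleles on her two X chromosomes, e.g. ("+", "m")
    father: the allele on his single X chromosome, e.g. "m"
    Returns (daughters, sons), each a dict of genotype -> probability.
    """
    daughters = defaultdict(Fraction)
    sons = defaultdict(Fraction)
    for maternal in mother:  # each maternal X is transmitted with probability 1/2
        a, b = sorted((maternal, father))  # write genotypes in a fixed order
        daughters[f"X{a}/X{b}"] += Fraction(1, 2)  # daughters also get the paternal X
        sons[f"X{maternal}/Y"] += Fraction(1, 2)   # sons also get the paternal Y
    return dict(daughters), dict(sons)


def phenotype(genotype):
    """Classify a genotype for a fully recessive mutant allele 'm'."""
    alleles = genotype.replace("X", "").split("/")
    n_mutant = alleles.count("m")
    if "Y" in alleles:  # hemizygous male: his single X decides
        return "Mutant" if n_mutant else "Normal"
    return ("Normal", "Carrier", "Mutant")[n_mutant]


if __name__ == "__main__":
    crosses = [(("+", "+"), "m"), (("m", "m"), "+"), (("+", "m"), "+"), (("+", "m"), "m")]
    for mother, father in crosses:
        daughters, sons = cross(mother, father)
        print(f"X{mother[0]}/X{mother[1]} x X{father}/Y")
        for label, progeny in (("daughters", daughters), ("sons", sons)):
            for genotype, prob in sorted(progeny.items()):
                print(f"  {label}: {prob} {genotype} {phenotype(genotype)}")
```

Running the four crosses reproduces the rows of Table 1; exact fractions are used so that the 50% classes appear as 1/2 rather than as rounded decimals.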

Human sex-linked conditions, such as hemophilia,
are characteristically transmitted to sons by
symptomless mothers. Such conditions are seen in males
but hardly ever in females. If a recessive X-linked
allele has a frequency p among all the X chromosomes
in the population, the phenotype will occur at frequency
p in males (provided that they are fully viable) and p²
in females, with heterozygous (‘carrier’) females at
frequency 2p(1 − p). Deleterious X-linked recessive
mutations are expected to be less common in the gene
pool than autosomal recessives of comparably severe
phenotype because they are always exposed to adverse
selection in the male sex. This prediction is borne out
by medical statistics.
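The three expected frequencies can be computed directly. The Python sketch below assumes random mating, Hardy–Weinberg proportions, the same allele frequency in both sexes, and full viability of affected males; the function name xlinked_frequencies is hypothetical.

```python
def xlinked_frequencies(p):
    """Expected frequencies for a recessive X-linked allele at frequency p.

    Assumes random mating, Hardy-Weinberg proportions, equal allele
    frequency in the two sexes, and full viability of affected males.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie between 0 and 1")
    return {
        "affected males": p,                 # one X: affected if it carries the allele
        "affected females": p * p,           # both Xs must carry the allele
        "carrier females": 2 * p * (1 - p),  # heterozygous females
    }


if __name__ == "__main__":
    for p in (0.1, 0.01, 0.001):
        f = xlinked_frequencies(p)
        ratio = f["affected males"] / f["affected females"]
        print(f"p = {p}: males {f['affected males']:.4g}, "
              f"females {f['affected females']:.4g}, male:female ratio {ratio:.0f}")
```

Because the ratio of affected males to affected females is p/p² = 1/p, the rarer the allele, the more strongly the condition is confined to males: with p = 0.01, about 1 male in 100 but only 1 female in 10,000 is affected.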


See also: Dosage Compensation; 
Pseudoautosomal Linkage, Region; 
Sex Linkage; X-Chromosome Inactivation 
Sex Plasmid
A sex plasmid (S plasmid) is an episome that is able to
initiate the process of conjugation, resulting in the 
transfer of chromosomal material from one bacterium 
to another. 


See also: Conjugation, Bacterial 


Sex Ratios 
S A West 


Copyright © 2001 Academic Press 
doi: 10.1006/rwgn.2001.1184


Research into the sex ratio (proportion of individuals 
that are male) has been one of the most quantitatively 
successful areas of evolutionary biology. Relatively 
simple theory is able to explain why many animal 
species produce approximately equal numbers of 
males and females, why certain species have an excess 
of males or females, and why individuals of some 
species facultatively shift their offspring sex ratio 
in response to environmental conditions (Charnov, 1982).


Fisher’s Theory of Equal Investment in 
the Sexes 


Fisher (1930) provided an explanation for why most 
animal species, including humans, produce approxi- 
mately equal numbers of males and females. If there 
were an excess of males, they would on average obtain 
fewer than one mate, and so the fitness of females would
be greater, favoring parents that produced a relative 
excess of female offspring. In contrast, if there were 
an excess of females, males would on average obtain 
more than one mate, and so the fitness of males would 
be greater, favoring parents that produced a relative 
excess of male offspring. Consequently, the fitness of 
males and females is only equal when equal numbers 
of the two sexes are produced (a sex ratio of 0.5). This 
argument assumes that equal amounts of resources are 
put into the production of sons and daughters. If this is 
not the case then the argument is phrased in terms of 
investment, and the evolutionarily stable strategy 
(ESS) is to invest resources equally in male and female 
offspring. 
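Fisher's argument can be made numerical. Because every offspring has exactly one father and one mother, males as a class and females as a class contribute equally to the next generation, so an individual of the rarer sex can expect more offspring. The Python sketch below assumes equal costs of sons and daughters and non-overlapping generations; offspring_values and parent_fitness are hypothetical names, and fitness is measured in arbitrary units.

```python
def offspring_values(s):
    """Relative reproductive value of one son and one daughter when a
    fraction s of the breeding population is male.

    Each member of the next generation has one father and one mother, so
    males as a class and females as a class contribute equally; that
    contribution is shared among s males and (1 - s) females.
    """
    if not 0.0 < s < 1.0:
        raise ValueError("s must lie strictly between 0 and 1")
    return 1.0 / s, 1.0 / (1.0 - s)


def parent_fitness(r, s):
    """Expected grandchildren per offspring (arbitrary units) for a parent
    producing a fraction r of sons in a population with sex ratio s,
    assuming sons and daughters cost the same to produce."""
    son_value, daughter_value = offspring_values(s)
    return r * son_value + (1.0 - r) * daughter_value


if __name__ == "__main__":
    for s in (0.3, 0.5, 0.7):
        sons_only = parent_fitness(1.0, s)
        daughters_only = parent_fitness(0.0, s)
        print(f"population sex ratio {s}: all sons {sons_only:.2f}, "
              f"all daughters {daughters_only:.2f}")
```

With a population sex ratio of 0.3, a son is worth about 3.33 units and a daughter about 1.43, so parents biased toward sons do better; at 0.7 the reverse holds. Only at 0.5 does every parental sex ratio give the same return, which is why no alternative strategy can invade and 0.5 is the ESS under these assumptions.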


Biased Sex Ratios 


Fisher’s principle clearly shows the frequency- 
dependent nature of selection on the sex ratio, and it 
provides a null model (equal investment in the sexes) 
that is the foundation on which most areas of
sex ratio research have been built. However, it assumes 
that the fitness returns from the production of sons 
and daughters are identical (or linear). Many different 
biological mechanisms contradict this assumption and 
in these cases biased sex ratios are predicted. Two 
scenarios are reviewed here where there is a rich 
experimental literature exploring the predictions of 
many theoretical models: (1) sex-biased interactions 
between relatives; and (2) differential effects of the 
environment on male and female fitness. 


Sex-Biased Interactions between Relatives 
Fisher’s principle assumes that the fitness returns from 
producing sons and daughters do not differ. This will 
not be the case if there are sex differences in the inter- 
actions of offspring with each other, or their parents 
(Hamilton, 1967). 

If production of one sex leads to a greater increase 
in fitness of the parents or their offspring, then an 
excess production of that sex is favored by a process 
called local resource enhancement (LRE). One ex- 
ample of this process is observed in African wild 
dogs, which are wolf-like social carnivores found in 
sub-Saharan Africa. These dogs live in packs, and 
young males help more than young females in the 
rearing of pups. This favors an excess of males, and, 
indeed, 60% of offspring are males. Another example 
is provided by allodapine bees, primitively social bees 
that communally nest in burrows. Sisters nest together, 
and increasing nest size leads to fitness benefits 
through increased survival and reproduction, in part 
due to the efficiency gained from division of labor. 
This favors an excess of females and, in this case, less 
than 20% of offspring are male. 

In contrast, if competition among siblings, or
between offspring and parents, is greater for one sex,
then an excess production of the other sex is favored
by a process called local resource competition (LRC).
One example of this comes from African primates in
the Galagidae family. In these species females disperse
much less than males, and related females compete for
resources, especially during the breeding season. This
favors an excess of males and, in this case, 70% of
offspring are male.

A special case of LRC that has received consider- 
able attention is the competition for mates between 
brothers in structured populations, which is termed 
local mate competition (LMC). If brothers compete 
for mates (including their sisters) before the females 
disperse, then LMC theory predicts a female-biased
sex ratio to reduce this competition. Support for LMC
theory has come from a wide range of animals and 
plants, especially insects (e.g., parasitic wasps, aphids, 
thrips, and beetles), other arthropods (e.g., mites, spi- 
ders), protozoan parasites (e.g., blood parasites such as 
those causing malaria, and intestinal parasites such 
as Toxoplasma), and flowering plants. In these cases 
not only do populations show female-biased sex ratios, 
but in many cases individuals have been shown to 
adjust the ratio of their offspring in response to vari- 
ation in the level of LMC. 
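For the simplest case, a diploid species in which n mothers lay eggs on a patch and all mating takes place among their offspring before the daughters disperse, Hamilton (1967) showed that the unbeatable proportion of sons is (n − 1)/2n. The Python function below simply evaluates this expression; its name is hypothetical, and the formula holds only under the stated assumptions (for haplodiploid species such as parasitic wasps the predicted ratio is slightly more female-biased).

```python
def lmc_unbeatable_sex_ratio(n):
    """Hamilton's (1967) unbeatable proportion of sons under local mate
    competition, (n - 1) / (2n), for a diploid species in which n mothers
    lay eggs on a patch and all mating occurs on the patch before the
    daughters disperse."""
    if n < 1:
        raise ValueError("need at least one mother per patch")
    return (n - 1) / (2 * n)


if __name__ == "__main__":
    for n in (1, 2, 5, 20, 100):
        print(n, round(lmc_unbeatable_sex_ratio(n), 3))
```

With a single mother the predicted proportion of sons is zero (in practice, just enough sons to mate all her daughters); it rises to 0.25 with two mothers and approaches Fisher's 0.5 as the number of mothers, and hence the chance that a male competes with unrelated rather than related males, increases.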

Both LRE and LRC can occur in the same species. 
One of the clearest examples of this comes from stud- 
ies of the Seychelles warbler. This small bird is ex- 
tremely territorial, and in situations where there are 
few new territories available, some young will remain 
at their natal nest and help raise siblings. The majority 
(80%) of helpers are daughters. Importantly, whether 
a helper is advantageous or disadvantageous for her 
parent depends on the quality of the territory occu- 
pied, which depends on the availability of insects for 
food. On high-quality territories, helpers are benefi- 
cial from the point of view of their parent, and increase 
the number of young produced (LRE). On low-quality
territories, the increased competition for food with
helpers means that their presence is disadvantageous
from the point of view of their parent (LRC). As
predicted, eggs laid on high-quality territories, where
daughters will be beneficial as helpers (LRE), are
predominantly (90%) female, and eggs laid on
low-quality territories are predominantly (80%) male;
these sons will disperse and so avoid competition with
their relatives (LRC).


Differential Effects of Environment on Male 
and Female Fitness 

Fisher’s principle assumes that variation in environ- 
mental conditions affects the fitness of sons and 
daughters equally. If this is not the case, then individ- 
uals can be selected to adjust the sex of their offspring 
in response to the environment (Trivers and Willard, 
1973). 

This idea was first applied to explain sex ratio pat- 
terns in mammals caused by variation in maternal 
condition. For example, in red deer, higher-quality
(higher-ranking) females are more likely to produce
sons, and lower quality females are more likely to 
produce daughters. This is thought to occur because 
(1) higher-quality females are able to provide more 
resources for their offspring, and (2) competition for 
mates between males is intense, with only the highest 
quality males being successful, and so sons benefit 
more from increased resources than daughters.

The same concept can explain why, in many species
of parasitic wasp in which only one individual can
develop per host, females lay male eggs on small
hosts and female eggs on large hosts. This is
presumed to be advantageous because the larger body
size attained on a large host benefits daughters
(through its effects on fecundity) more than sons.

The same concept can also apply in response to 
variation in mate quality. In several bird species (e.g., 
zebra finches, collared flycatchers, and blue tits), 
females have been shown to adjust the sex of their off- 
spring in response to the attractiveness of their mate. 
This is advantageous when male attractiveness is
heritable and indicates genetic quality.
Consequently, if a female mates with a relatively 
attractive male there is an advantage to producing 
sons, who will inherit their father’s attractiveness. 
This pattern is observed, and some bird species show 
remarkable ability to shift their offspring sex ratio in 
the predicted way. 


Conclusions 


Evolutionary biologists have developed an excellent 
understanding of the selective factors that shape the 
sex ratio. More generally, studies of the sex ratio have 
provided some of the best support for the adaptation- 
ist approach (West et al., 2000). In particular, they have 
provided an area in which theory is able to make 
predictions that can be tested qualitatively, and 
sometimes quantitatively, with experimental and 
observational data. Perhaps one of the greatest
remaining questions is how species with chromosomal
sex determination, such as mammals and birds,
achieve such striking facultative shifts in offspring sex
ratios.


References 

Charnov EL (1982) The Theory of Sex Allocation. Princeton, NJ: 
Princeton University Press. 

Fisher RA (1930) The Genetical Theory of Natural Selection. 
Oxford: Clarendon Press. 

Hamilton WD (1967) Extraordinary sex ratios. Science 156:
477-488.

Trivers RL and Willard DE (1973) Natural selection of 
parental ability to vary the sex ratio of offspring. Science 
179: 90-92. 

West SA, Herre EA and Sheldon BC (2000) Recent
developments in the study and use of sex allocation.
Science 290: 288-290.


See also: Evolutionarily Stable Strategies; 
Frequency-Dependent Selection; Frequency- 
Dependent Selection as Expressed in Rare Male 
Mating Advantages 




Sex Reversal 


M A Ferguson-Smith 


Copyright © 2001 Academic Press 
doi: 10.1006/rwgn.2001.1185 


The somatic (or phenotypic) sex of most mammals 
including man is normally determined at fertilization 
by the sex chromosome carried by the sperm. 
X-bearing sperm produce female embryos, while Y- 
bearing sperm produce male embryos. The SRY gene 
present on the short arm of the Y chromosome is 
regarded as the primary signal required for inducing 
the undifferentiated gonad in the early embryo to 
develop into a testis. Without SRY the gonad defaults
to ovarian development. However, testis
differentiation depends on the action of a number of other
genes, some X-linked and others autosomal. It follows 
that mutations of a number of genes involved in 
the testis differentiation pathway may lead to a female 
phenotype in the presence of a normal XY sex 
chromosome constitution. Less commonly, genetic 
defects may lead to testis differentiation in the pre- 
sence of a normal XX sex chromosome complement. 
These cases of sex reversal are often referred to as XY 
females and XX males. 


XY Females 


Individuals with mutations of the SRY testis- 
determining gene develop as immature females of 
normal to above average height and normal intelli- 
gence. The internal genitalia are normal in childhood, 
with development of a uterus and fallopian tubes and 
absence of Wolffian derivatives. However, the ovaries 
are abnormal and devoid of oogonia. In the adult the 
ovaries are represented by thin strips of ovarian 
stroma (streak gonads) in the broad ligament, and 
this results in sexual infantilism due primarily to 
lack of estrogenic hormones. Breasts and vulva fail 
to develop, and there is lack of axillary and pubic 
hair. These features can be corrected by estrogen 
therapy, which has to continue for life. Pregnancy is 
possible only by ovum donation. The phenotype of 
these XY females is usually known as pure gonadal 
dysgenesis to distinguish it from Turner syndrome, 
which is also characterized by streak gonads and 
sexual infantilism but which has in addition short 
stature and a number of additional features referred 
to as Turner stigmata. These include webbing of the 
neck, cubitus valgus, short IVth metacarpals, multiple 
pigmented naevi, and hypoplastic finger nails (see 
Turner Syndrome). 


Pure gonadal dysgenesis in XY females also occurs
in individuals with SOX9 mutations (campomelic
dysplasia), with WT1 mutations (Denys–Drash
syndrome), with duplications of the Xp21 region (DSS
gene), and with deletions of the short arm of
chromosome 9 and the long arm of chromosome 10. The
loci involved in these last two conditions have not
yet been identified precisely. (See Sex Determination,
Human for a detailed account of the known genes 
involved in the sex determination pathway.) In all 
these genetic aberrations, additional clinical features 
of varying severity are associated, usually leading to 
substantial handicap. Investigation of XY females for 
the underlying genetic defect has been largely respon- 
sible for our present understanding of the genetic 
pathway in mammalian sex differentiation. As the 
underlying defect has been identified in only a small 
proportion of XY females, other loci important in sex 
differentiation remain to be discovered. 

Apart from pure gonadal dysgenesis, several other 
disorders are associated with sex reversal in XY 
females: 


1. Androgen insensitivity (testicular feminization). 
This condition is the result of mutations of the X- 
linked androgen receptor gene. The developing 
testis secretes normal amounts of testosterone but 
the tissues are unable to respond due to the absence
of functional androgen receptors. Affected individuals have
normal female gender, are within the stature range 
of males, and develop breasts at puberty. However, 
they fail to menstruate, and pubic and axillary hair 
is scant. This is often the first indication of the 
condition. Sometimes patients present with an 
inguinal hernia in childhood, and this leads to the 
discovery of testes in the inguinal canals; however, 
the testes usually remain within the pelvis and are 
only identified by laparotomy. This reveals a short, 
blind-ending vagina, the absence of uterus and fal- 
lopian tubes, and the failure of development of 
Wolffian structures. As there is a risk that testicular 
tumors (dysgerminoma) may develop, the testes 
are removed and the patient maintained for life on 
a small daily dose of estrogen. 

The disorder is inherited as an X-linked recessive 
trait and the carrier state in normal females may 
sometimes be recognized by the patchy distribu- 
tion of sex hair. Half the XY offspring of a carrier 
are at risk of being affected. Incomplete forms of
androgen insensitivity are also recognized; they are
caused by mutations at the androgen receptor locus
that differ from those causing the complete form.
Partial masculinization occurs, leading to sexual
ambiguity at birth and virilization at puberty.


2. 5-α-reductase deficiency (pseudovaginal
perineoscrotal hypospadias). Deficiency of 5-α-reductase
leads to failure of conversion of testosterone to
dihydrotestosterone. XY infants with severe enzyme
deficiency have a small hypospadiac phallus, a blind
vaginal pouch, and absent Müllerian derivatives.
Over 50% are raised as females. At puberty,
the patients virilize, do not develop breasts, and
undergo a gender identity change to male gender.

3. Several other rare disorders of steroid biosynthesis 
may lead to sex reversal in XY females. These 
include: 


• testicular unresponsiveness to gonadotrophin
• congenital lipoid adrenal hyperplasia
• 3-β-hydroxysteroid dehydrogenase deficiency
• 17-hydroxylase deficiency
• 17-β-hydroxysteroid oxidoreductase deficiency.

Each of these disorders can be associated with
female external genitalia, absence of Müllerian
derivatives, and a blind vaginal pouch.


XX Males 


Phenotypic males with small testes and, commonly, 
gynaecomastia associated with an apparently normal 
female karyotype are usually referred to as XX males. 
The endocrinological features are identical to 
Klinefelter syndrome (see Klinefelter Syndrome), with
the exception that there are no associated learning 
difficulties and stature is reduced to within the normal 
female range. In the majority of patients the condition 
is due to abnormal recombination between the X and Y 
in paternal meiosis, so that the region containing the 
SRY gene is transferred to the end of the short arm of 
the X. This abnormal event is often accompanied by 
loss of the Xg locus on Xp, revealed by failure of the 
patient to inherit the paternal Xg(a) allele. Sometimes, 
the transferred region of the short arm of the Y is large 
enough to be identified under the microscope. 

Some 15% of XX males do not have SRY and the 
reason for the sex reversal is at present unknown. 
Hypospadias is more common in these SRY-negative 
patients and, rarely, several XX males may occur in the 
same pedigree. The etiology of this group may be 
similar to that of XX true hermaphroditism and, in 
fact, pedigrees have been reported in which both con- 
ditions occur. 
‘Sexduction’ (also called F-duction) is an old term
no longer in general use. Coined by Jacob and Woll- 
man (1961), sexduction uses an analogy with phage- 
mediated transduction to describe the process of high 
frequency conjugative transfer to a recipient bacter- 
ium of a segment of Escherichia coli DNA incorp- 
orated in an F’ (F-prime) plasmid. 

F’ plasmids are derivatives of the F sex plasmid, a 
conjugative plasmid of E. coli (see F Factor). F’ plasmids 
have acquired a piece of the E. coli chromosomal 
DNA. (See F Factor, Figure 2 for a description of how 
F’ plasmids acquire chromosomal DNA.) Because the 
chromosomal DNA segment is joined covalently into 
the F plasmid, it can be transferred efficiently into re- 
cipient bacteria. The piece of chromosomal DNA 
transferred is of defined length (the length present in 
the F’), transfers quickly, and does not require recom- 
bination into the recipient bacterium’s chromosome for 
its replication, expression, and heritable transmission. 
Also, the bacterial recipients of sexduction become 
male (donors) upon sexduction because they acquire 
the entire F’ plasmid with its transfer and pilus genes, 
as well as its chromosomal DNA segment. These pro- 
perties are unlike the conjugative transfer of chromo- 
somal DNA from Hfr strains of E. coli, in which the F 
plasmid has integrated into the bacterial chromosome 
(see Hfr). Sexduction is similar to transfer of bacterial 
DNA in phage-mediated ‘transduction’ (see Trans- 
duction), especially in ‘generalized transduction’ in
which essentially any region of the bacterial chromo- 
some can be transferred to a recipient cell. This is pos- 
sible because the F can be integrated at many different 
locations in the chromosome, such that many 
different F’ plasmids can be formed upon its excision. 
The key similarity between sexduction and phage- 
mediated transduction is the finite amount of the 
chromosomal DNA transferred. With sexduction, un- 
like transduction, the recipient strain becomes partially 
diploid (merodiploid) for the segment transferred. 
Sexduction was used by bacterial geneticists as an
early form of in vivo gene cloning. A piece of chromo- 
somal DNA could be selected that was attached to the 
F plasmid vector, isolated from it, and transferred to 
other bacteria by conjugation. 
A sex-limited character is a trait that is expressed in
only one sex. Not all phenotypic differences between
the sexes are due to the sex-linked genes that are
present in differing amounts in each sex. Autosomal
genes, which are equally present in both sexes, can also
cause sex-limited characters. An example is the
development of horns in only one sex of certain animals.
Sex-limited disease can also occur. An example is the
development of breast cancer in human females caused
by mutations in the autosomal BRCA1 gene. Males
carrying the mutation have a low risk of developing
breast cancer but can transmit the disease to their
daughters.

Example pedigrees may be seen at
http://www.gla.ac.uk/medicalgenetics/encyclopedia.htm
Sézary’s syndrome is a form of low-grade cutaneous
T-cell non-Hodgkin’s lymphoma with involvement of
the peripheral blood; the purely cutaneous form of the
disease is known as mycosis fungoides. Both forms are
rare. In Sézary’s syndrome the degree of cutaneous
infiltration is often marked, resulting in erythroderma
or an ‘l’homme rouge’ appearance. The T cells are
clonal, have a characteristic nuclear morphology, and
usually express CD4 but neither CD8 nor CD25. The
pathogenesis of this disease is not known. There are no
consistent cytogenetic or molecular changes. Reports
claiming retroviral involvement have not been
confirmed.


The Src homology (SH) domains are protein modules
with defined structures and protein-protein interaction
function that are found in Src family kinases
and also in many other intracellular signal transduction
proteins, including, for example, receptor protein
tyrosine kinases, phospholipase C-gamma, and the
Ras GTPase-activating protein. The SH domains
exemplify protein-protein interaction domains, which,
like their cognate recognition motifs, are modular in
nature. Their widespread occurrence and conserved
molecular functions, even in the context of proteins
with distinct enzymatic or biological properties, have
led to the concept of ‘protein recognition codes’
(reviewed in Sudol, 1998).
Noncatalytic Src homology 2 (SH2) domains
recognize tyrosine phosphorylated residues in other
proteins, including receptor tyrosine kinase
autophosphorylation sites. Phosphorylation of individual
tyrosine residues induces the binding of SH2 domain-
containing proteins, with specificity being determined
by the amino acids flanking the phosphotyrosine,
particularly the C-terminal residues. Recruitment of
tyrosine phosphorylated peptides by SH2 domain-
containing proteins results in heteromeric protein
complexes that are temporally and spatially controlled
within the cell. This links tyrosine kinase signaling to
the formation of protein complexes that are necessary
for signal propagation.


See also: SH Domains; SH3 Domain; Signal 
Transduction 
The SH3 domain is a distinct motif that binds target
proteins, including proteins associated with the actin
cytoskeleton, through sequences containing proline
and hydrophobic amino acids. Many proteins that
contain SH3 domains also have SH2 domains, and
these may act together to modulate specific
protein-protein interactions. Some signaling and transforming
proteins contain SH3 and/or SH2 domains with no 
associated catalytic activity, for example, the Grb2 
adaptor protein that links receptor tyrosine kinases 
to the Ras pathway. Thus, adaptor proteins serve to 
mediate higher order protein complexes at appropri- 
ate times and places within the cell and to orchestrate 
appropriate biological responses. Three-dimensional 
structures of several individual SH domains have been 
determined and have provided clues as to the determin- 
ants of molecular specificity. 


See also: SH Domains; SH2 Domain; Signal 
Transduction 
This is the method of subjecting large DNA molecules
to hydrodynamic forces to reduce the size of the mol- 
ecules. DNA molecules which are hundreds of thou- 
sands of base pairs in length can be broken to tens of 
thousands of base pairs simply by pipetting a solution 
of DNA. DNA of thousands of base pairs can be 
reduced to hundreds of base pairs by forcing a solution 
of DNA through a small-bore needle with a syringe. 
The ‘shifting balance theory’ (SBT) is a theory of
evolution that was proposed by Sewall Wright in the
early 1930s as an alternative to the view of evolutionary
change dominant at the time. That view, based
largely on the work of R. A. Fisher, saw evolution
as resulting mainly from directional selection
acting on large populations. Wright, on the other 
hand, believed that such a process could not account 
for either the nature of adaptations or the pace of 
evolutionary change, and considered his SBT a more 
plausible explanation. In contrast to the ‘Fisherian’ 
view of evolution requiring only selection on favor- 
able alleles, Wright’s SBT assumes that species are 
subdivided into small populations (‘demes’), that 
these demes exchange migrants based on their degree 
of adaptation, that there are special forms of epistasis 
between genes, and that both natural selection and 
random genetic drift interact to cause evolutionary 
change. 


Adaptive Landscape 


The SBT is intimately connected with Wright’s notion 
of the ‘adaptive landscape,’ which must be under- 
stood to fully grasp his theory. In Wright’s view, the 
evolutionary opportunities for a population can be 
viewed as a type of topographical map. In a three- 
dimensional analog, the latitude and longitude of a 
point would correspond respectively to the frequen- 
cies of alleles segregating at two loci within a popula- 
tion, and the height of the landscape above any point 
would represent the mean fitness of a population 
having the specified allele frequencies at those 
two loci. ‘Peaks’ in the landscape would correspond 
to frequencies of alleles conferring high fitness on a 
population, and ‘valleys’ to intervening areas of 
low fitness. Wright’s SBT was designed to explain 
how an entire species could move from one adaptive 
peak to successively higher ones — hence becoming 
more and more adapted — while temporarily under- 
going periods of reduced adaptation as it crossed 
adaptive valleys. 

Figure 1 Simple adaptive landscape involving two alleles at each of two loci (A, a and B, b). The frequencies of the A
and B alleles are plotted on the horizontal axes, the vertical axis is the mean fitness of the population, and the
‘adaptive landscape’ (shaded area) represents the fitness of a population having the given frequencies of the two
alleles. The AABB and aabb genotypes are assumed to be the best adapted, having relative fitnesses of 1.0 and 0.8,
respectively. The fitness of the A_bb genotype is assumed to be 0.4, and of the aaB_ genotype 0.2. These assumptions
produce an adaptive landscape having two peaks (corresponding to fixation for the aabb or AABB genotypes) separated
by an adaptive valley. A population cannot move from the aabb peak to the AABB peak without the help of genetic
drift, which may allow a move across the intervening valley.

Figure 1 shows a simple adaptive landscape based
on two alleles segregating at each of two loci (such
landscapes can be envisioned for more than two genes,
but are difficult to represent since they require more
than three dimensions). This landscape is derived
from assuming that a population may have dominant 
or recessive alleles at two loci (A, a and B, b, re- 
spectively). Populations consisting entirely of aabb or 
AABB genotypes occupy adaptive peaks, while popu- 
lations consisting either of AAbb or aaBB genotypes, 
or which contain more than one genotype, have lower 
fitness. Plotting the fitness of a population against 
its genetic constitution yields an adaptive landscape 
with two peaks: a higher peak (an AABB population) 
and a lower peak (an aabb population). As a biological 
example, the AABB and aabb genotypes might both 
produce cryptic white coat colors in arctic mammals, 
while the aaB_ or A_bb genotypes produce colored
coats that are maladaptive; the higher fitness of the 
AABB than of the aabb peak might result from other 
pleiotropic effects of the coat-color genotype on repro- 
duction. In such a case, which involves both epistasis 
and pleiotropy, adaptive peaks can be separated by 
adaptive valleys, as seen in Figure 1. (While Wright’s
model assumed pleiotropic effects of identical pheno- 
types, this is not necessary to produce a multipeaked 
adaptive landscape, for the peaks can also represent 
different phenotypes separated by adaptive valleys.) 
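The landscape of Figure 1 can be computed directly from its legend. The sketch below assumes random mating (Hardy-Weinberg proportions at each locus), linkage equilibrium between the two loci, and complete dominance, so that every A_B_ individual has the AABB fitness of 1.0; the other fitnesses (A_bb 0.4, aaB_ 0.2, aabb 0.8) are the legend's values. The function names are illustrative, not taken from Wright.

```python
from itertools import product

# Phenotype-class fitnesses from the Figure 1 legend. Assumption: complete
# dominance, so every A_B_ individual has the same fitness as AABB.
FITNESS = {("A_", "B_"): 1.0, ("A_", "bb"): 0.4, ("aa", "B_"): 0.2, ("aa", "bb"): 0.8}


def mean_fitness(p: float, q: float) -> float:
    """Mean fitness of a population with freq(A) = p and freq(B) = q.

    Assumes Hardy-Weinberg proportions at each locus and linkage equilibrium.
    """
    pheno_a = {"A_": 1 - (1 - p) ** 2, "aa": (1 - p) ** 2}
    pheno_b = {"B_": 1 - (1 - q) ** 2, "bb": (1 - q) ** 2}
    return sum(pheno_a[a] * pheno_b[b] * w for (a, b), w in FITNESS.items())


def local_peaks(n: int = 20) -> list[tuple[float, float]]:
    """Grid points (p, q) whose mean fitness exceeds that of all 8 neighbours."""
    grid = [[mean_fitness(i / n, j / n) for j in range(n + 1)] for i in range(n + 1)]
    peaks = []
    for i, j in product(range(n + 1), repeat=2):
        neighbours = [
            grid[i + di][j + dj]
            for di, dj in product((-1, 0, 1), repeat=2)
            if (di, dj) != (0, 0) and 0 <= i + di <= n and 0 <= j + dj <= n
        ]
        if all(grid[i][j] > w for w in neighbours):
            peaks.append((i / n, j / n))
    return peaks


print(mean_fitness(0, 0), mean_fitness(1, 1))  # 0.8 1.0
print(local_peaks())                            # [(0.0, 0.0), (1.0, 1.0)]
```

On this 21 × 21 grid only the aabb corner (0, 0) and the AABB corner (1, 1) are local maxima; a population fixed for A but not B (p = 0.5, q = 0, for instance) sits in the valley with a mean fitness of 0.5.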


If one accepts that such fitness landscapes are com- 
mon in nature, then populations on adaptive peaks are 
‘stranded,’ i.e., unable to reach higher peaks by natural 
selection alone. A population fixed for genotype aabb 
in Figure 1, for example, cannot reach the higher
AABB peak by selection, for this would require cross- 
ing the ‘adaptive valley’ dividing the peaks, a journey 
opposed by natural selection. In Wright’s view, hilly 
adaptive topographies are quite common in nature. To 
become more and more adapted to its environment 
(or to adapt to a changing environment), species thus 
require a mechanism for becoming temporarily mal- 
adapted so that they may traverse adaptive valleys. 
This mechanism was supplied by the SBT. 
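Stranding can be illustrated by iterating selection alone, with no drift, on the Figure 1 landscape. This is a sketch under stated assumptions: complete dominance, Hardy-Weinberg proportions, and linkage equilibrium, with the standard one-locus recursion for a dominant allele, Δp = p(1 - p)^2 (w(A_) - w(aa)) / w̄, applied to each locus. Treating the two loci this way is an approximation, and all names are illustrative.

```python
def mean_fitness_xy(x: float, y: float) -> float:
    """Mean fitness given x = freq(A_ phenotype) and y = freq(B_ phenotype).

    Expanding the Figure 1 fitness table (A_B_ 1.0, A_bb 0.4, aaB_ 0.2,
    aabb 0.8) gives w = 0.8 - 0.4x - 0.6y + 1.2xy.
    """
    return 0.8 - 0.4 * x - 0.6 * y + 1.2 * x * y


def select_one_generation(p: float, q: float) -> tuple[float, float]:
    """One generation of selection only (no drift) on allele frequencies p, q."""
    x, y = 1 - (1 - p) ** 2, 1 - (1 - q) ** 2  # Hardy-Weinberg, dominant A and B
    w = mean_fitness_xy(x, y)
    adv_a = -0.4 + 1.2 * y  # w(A_) - w(aa), averaged over the B locus
    adv_b = -0.6 + 1.2 * x  # w(B_) - w(bb), averaged over the A locus
    p_next = p + p * (1 - p) ** 2 * adv_a / w
    q_next = q + q * (1 - q) ** 2 * adv_b / w
    # Clamp to guard against floating-point drift outside [0, 1].
    return min(max(p_next, 0.0), 1.0), min(max(q_next, 0.0), 1.0)


def run(p: float, q: float, generations: int = 2000) -> tuple[float, float]:
    """Apply selection for the given number of generations."""
    for _ in range(generations):
        p, q = select_one_generation(p, q)
    return p, q


print(run(0.1, 0.1))  # both frequencies fall back toward 0: the aabb peak
print(run(0.7, 0.7))  # both frequencies rise toward 1: the AABB peak
```

A population displaced only slightly from the aabb peak is pulled straight back to it, whereas one already on the far slope climbs to the AABB peak; selection by itself never carries the first population across the valley.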


Shifting Balance Process 


The SBT posits that the movement of a species from one 
adaptive peak to a higher one involves three distinct 
phases of evolution. In phase I, a single population of a 
species is perched on an adaptive peak, but then 
undergoes a loss in fitness due to random gen- 
etic drift, which counteracts selection and drives the 
population into an adaptive valley. Once in the valley,
the population experiences phase II, coming under the
influence of another, higher peak. Natural selection
draws this population uphill until it attains a level of
adaptation higher than its original state (in Figure 1,
for example, a population could move from the aabb
peak to the AABB peak). In phase III, the population
that has attained the higher peak sends out additional 
migrants to other populations of the species (the 
assumption here is that the number of migrants leav- 
ing a population is proportional to its size, which 
is correlated with its degree of adaptation). These 
migrants would then genetically alter other popula- 
tions, forcing them off their peaks until the entire 
species comes to rest on the new and higher peak. If 
this process occurred repeatedly, it could lead a species 
to become more and more adapted as it scaled ever 
higher peaks. Because Wright viewed adaptation in 
nature as the movement between peaks in an adaptive 
landscape, he did not believe that mass selection in 
large populations could lead to greater adaptation, for 
such populations are largely immune to the genetic 
drift that enables them to cross adaptive valleys. 
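How strongly population size suppresses drift can be quantified with Kimura's diffusion approximation for the fixation probability of an allele. This is a later result, swapped in here only as an illustration rather than as part of Wright's SBT. The sketch assumes additive selection with coefficient s in a diploid population of effective size N; the function name and the example numbers are hypothetical.

```python
import math


def fixation_probability(n: int, s: float, p: float) -> float:
    """Kimura's diffusion approximation for the chance that an allele fixes.

    n: effective population size (diploid); s: additive selection coefficient
    (negative = deleterious); p: starting frequency of the allele.
    """
    if s == 0:
        return p  # neutral allele: fixation probability equals its frequency
    a = -4 * n * s
    if a > 700:
        # e**a would overflow; (e**(a*p) - 1) / (e**a - 1) is then ~ e**(a*(p - 1)).
        return math.exp(a * (p - 1))
    return math.expm1(a * p) / math.expm1(a)


# A single new, mildly deleterious mutation (s = -0.01) at frequency 1/(2N).
small = fixation_probability(10, -0.01, 1 / 20)      # about 0.041
large = fixation_probability(1000, -0.01, 1 / 2000)  # about 8.6e-20
print(small, large)
```

Under these assumed numbers the deleterious allele fixes in a deme of 10 almost as often as a neutral one would (1/(2N) = 0.05), but essentially never in a population of 1000, which is the sense in which large populations are insulated from drift.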


Importance and Influence of the Theory 


Between 1935 and 1975, the SBT was quite influential, 
and discussions of the theory and adaptive landscapes 
appeared frequently in major works of the evolution- 
ary synthesis and still permeate modern textbooks of 
evolution. The SBT has been attractive for several 
reasons. First, it introduced the appealing idea of 
the adaptive landscape, in which evolution is seen as 
a form of hill climbing. This simple graphical metaphor 
is able to make complex evolutionary processes visually 
comprehensible. In addition, unlike the Fisherian view 
of evolution, which requires only natural selection and 
large populations, the SBT incorporates a diversity of 
evolutionary elements, including genetic drift, popu- 
lation structure, selection, and epistasis, and thus may 
be considered to be more comprehensive. 

When evaluating the SBT, one must ask four ques- 
tions. First, is, as Wright claimed, the Fisherian theory 
of adaptation insufficient to explain large-scale evo- 
lutionary changes? Second, does the SBT work as a 
theory; that is, does the verbal description given above 
prove valid when one makes a mathematical model of 
the entire process? Third, is there evidence from 
laboratory experiments that evolution can proceed 
according to the SBT and produce greater adaptation 
than does mass selection in large populations? Finally, 
is there evidence from nature that the SBT has been a 
frequent cause of adaptation? 

Wright’s main rationale for proposing the SBT, that 
mass selection would be insufficient to produce the 
observed diversity of life, has not been substantiated, 
as there is no evidence that mass selection is too slow
to explain either adaptation or biological diversity. 
Moreover, there are almost no adaptations known 
whose evolution would involve intermediate steps of 
lowered fitness, making genetic drift essential to over- 
come selection during their evolution. Thus one can- 
not assert the superiority of the SBT over the Fisherian 
theory on this basis. 

Although Wright developed mathematical methods 
to represent different phases of the SBT, including 
theories of how allele frequencies in populations 
respond to the joint pressures of selection, drift, and 
migration, he never produced a mathematical analysis 
including more than one phase of the SBT. Subse- 
quent workers have produced such models, and their 
work has indeed shown that, given Wright’s assump- 
tions, the SBT can operate under restricted conditions. 
Empirical data and some models of evolution do
imply the existence of different adaptive peaks, and
although transitions between such peaks need not
require genetic drift, drift-driven transitions can occur
if populations are fairly small and valleys are sufficiently shallow.
The primary theoretical problems of the SBT occur 
in phase III, in which a population arrives on top of 
a new adaptive peak and, through migration, draws 
the rest of the species to that same peak. One major 
obstacle to this process is that, unlike adaptations 
favored by simple mass selection, adaptations whose 
fixation requires some genetic drift are often pre- 
vented from spreading by physical barriers to gene 
flow (such as poor habitat). Moreover, the evolution 
of complex adaptations by the SBT requires that 
the components of such adaptations arise by peak 
shifts in different demes, and theory shows it is diffi- 
cult for the SBT to assemble these components into 
the whole. 

There have been some attempts to test the SBT in 
the laboratory, which have generally been experiments 
in which artificial selection is practiced on both large 
and subdivided populations. In general, these experi- 
ments have failed to show a greater response in 
the structured populations, as might be predicted by 
the SBT. 

It is difficult to test the SBT in nature because of 
both the complexity of the process and the near- 
impossibility of establishing whether an existing adap- 
tation involved crossing an adaptive valley via drift. 
(Movement of a species from one adaptive peak to 
another need not require genetic drift but can occur 
through mass selection following a changed environ- 
ment or a new mutation.) Reviewing specific adap- 
tations in nature, one finds that although there is some 
evidence for individual phases of the shifting balance 
theory (i.e., as proposed in phase I, drift can occa- 
sionally counteract natural selection), there are almost 
no empirical observations explained better by the SBT
than by simple mass selection. 

In view of these theoretical and empirical problems, 
there is not strong support for the SBT or Wright’s 
assertion that it has been the major engine of evolu- 
tionary change. Nevertheless, the theory has left an 
important legacy, both in the metaphor of the adaptive 
landscape, which still pervades evolutionary biology, 
and in the mathematical constituents of the SBT 
devised by Wright. The most important of these are 
the general equations for the interaction of diverse 
evolutionary forces such as drift, selection, migration, 
and mutation. These equations are still used to solve 
many problems of theoretical population genetics. 
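One classic example of these equations is Wright's island model, which balances drift within demes against migration between them. The recursion below is the standard textbook form, assuming infinitely many demes of N diploid individuals, a fraction m of each deme replaced by migrants every generation, and no selection or mutation; the function names are illustrative.

```python
def island_model_f(n: int, m: float, generations: int = 5000) -> float:
    """Iterate the island-model recursion for F (inbreeding within demes).

    F(t+1) = 1/(2N) + (1 - 1/(2N)) * (1 - m)**2 * F(t), starting from F = 0.
    Assumes infinitely many demes of n diploids, migration rate m, and no
    selection or mutation.
    """
    f = 0.0
    for _ in range(generations):
        f = 1 / (2 * n) + (1 - 1 / (2 * n)) * (1 - m) ** 2 * f
    return f


def island_model_approx(n: int, m: float) -> float:
    """Wright's approximate equilibrium F = 1 / (4Nm + 1), for small m and large N."""
    return 1 / (4 * n * m + 1)


print(round(island_model_f(100, 0.01), 4))  # 0.2016
print(island_model_approx(100, 0.01))       # 0.2
```

With one migrant per generation (N = 100, m = 0.01) the iterated equilibrium, about 0.2016, is close to the familiar approximation 1/(4Nm + 1) = 0.2; raising m lowers F, which is why gene flow limits the differentiation among demes that the SBT relies on.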


Further Reading 

Coyne JA, Barton NH and Turelli M (1997) A critique of Sewall 
Wright’s shifting balance theory of evolution. Evolution 51: 
643-671. 

Wright S (1932) The roles of mutation, inbreeding, crossbreeding,
and selection in evolution, pp. 356-366. Proceedings of the
6th International Congress on Genetics.
A Shine-Dalgarno sequence is a polypurine
sequence found in bacterial mRNA just before an
AUG initiation codon. It is part or all of the sequence
5′-AGGAGG-3′. It is complementary to a highly
conserved sequence at the 3’ end of 16S rRNA, 
and is involved in binding of the ribosome to the 
mRNA. 
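The complementarity described above can be checked, and a candidate Shine-Dalgarno sequence located, in a few lines. The sketch assumes the commonly cited 3′-terminal 13 nucleotides of E. coli 16S rRNA, 5′-GAUCACCUCCUUA-3′; it searches only a 20-nucleotide window upstream of the first AUG and accepts any stretch of at least four nucleotides of AGGAGG. The window, the minimum length, and the function names are illustrative choices, not a published algorithm.

```python
from typing import Optional, Tuple

# Assumption: the commonly cited 3'-terminal 13 nt of E. coli 16S rRNA, 5'->3'.
RRNA_16S_3_END = "GAUCACCUCCUUA"
SD_CONSENSUS = "AGGAGG"

_COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def rna_reverse_complement(seq: str) -> str:
    """Reverse complement of an RNA sequence (A-U, G-C pairing)."""
    return "".join(_COMPLEMENT[base] for base in reversed(seq))


def find_sd(mrna: str, window: int = 20, min_len: int = 4) -> Optional[Tuple[int, str, int]]:
    """Find the longest piece of AGGAGG within `window` nt upstream of the first AUG.

    Returns (start index of the match, matched motif, spacer length to the AUG),
    or None if there is no AUG or no match of at least `min_len` nucleotides.
    Among equally long matches, the one closest to the AUG is preferred.
    """
    start = mrna.find("AUG")
    if start < 0:
        return None
    lo = max(0, start - window)
    upstream = mrna[lo:start]
    for length in range(len(SD_CONSENSUS), min_len - 1, -1):
        hits = []
        for i in range(len(SD_CONSENSUS) - length + 1):
            motif = SD_CONSENSUS[i:i + length]
            pos = upstream.rfind(motif)
            if pos >= 0:
                hits.append((pos, motif))
        if hits:
            pos, motif = max(hits)  # largest position = closest to the AUG
            return lo + pos, motif, start - (lo + pos + length)
    return None


print(rna_reverse_complement(SD_CONSENSUS) in RRNA_16S_3_END)  # True
print(find_sd("GCAAGGAGGUAACAUAUGGCU"))  # (3, 'AGGAGG', 6)
```

The reverse complement of AGGAGG is CCUCCU, which occurs within the assumed 16S rRNA tail; in the made-up test mRNA the full AGGAGG motif lies 6 nucleotides upstream of the AUG.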


See also: Messenger RNA (mRNA) 


Shotgun Cloning 
Shotgun cloning is the cloning of an entire genome in
the form of randomly generated fragments. 


See also: Human Genome Project 


Shuttle Vector 
A shuttle vector is a cloning vector that is able to
replicate in more than one organism, e.g., Escherichia 
coli and Saccharomyces cerevisiae. Generally, it is a 
plasmid constructed to contain the origins of replica- 
tion of both hosts, and is used to carry foreign genes 
from one species to another. 


See also: Cloning Vectors; Vectors 
A form of anemia associated with elongated and
sickle-shaped blood cells was first reported by the 
American physician James Herrick in 1910. In 1949 
Linus Pauling and colleagues found that it results from 
an inherited structural change in hemoglobin. It was 
subsequently realized that it is an extremely common 
disease which occurs principally in sub-Saharan Africa, 
some of the Mediterranean populations, throughout 
the Middle East, in localized areas of the Indian sub- 
continent, and in populations anywhere in the world 
which originated from these regions (Figure 1). 


Inheritance 


Sickle cell anemia is inherited in a Mendelian recessive 
fashion. The structure of human hemoglobin changes 
between fetal and adult life. All the human hemo- 
globins consist of two different pairs of peptide chains 
called globin chains, each of which is attached to the 
oxygen-carrying moiety, heme. Fetal hemoglobin
consists of a pair of α-globin chains and a pair of γ-globin
chains (α2γ2), while adult hemoglobin consists of α
and β chains (α2β2). Hemoglobin S differs from
hemoglobin A by a single amino acid substitution in the
β-globin chain, valine for glutamic acid. The β chains
consist of 146 amino acids, numbered from the
N-terminal end; this change is at position 6. Individuals
with a single sickle cell mutation, that is, carriers for
the abnormal gene, have one normal β chain gene, βA,
and one abnormal gene, βS. Thus they have two types
of hemoglobin, normal (α2β2) and sickle cell (α2βS2).
This is called the sickle cell trait. Those who inherit a
βS gene from both parents can make only sickle cell
hemoglobin and hence have sickle cell disease.

Figure 1 The world distribution of the sickle cell gene. Hemoglobin E, the second commonest structural
hemoglobin variant, is also shown.


Effect of the Sickle Cell Mutation


The amino acid substitution that causes sickle cell
hemoglobin has the remarkable effect of changing
the shape of red blood cells from their normal circular,
biconcave configuration into an elongated, sickled
shape when blood is deoxygenated (Figure 2). This
occurs because concentrated solutions of hemoglobin
S form gels containing fibers of hemoglobin S
molecules, which are stabilized by interactions of the
substituted valine residue. These fibers distort
the red cell into a sickle, or sometimes a
holly leaf, shape. In addition, the continued distortion
of the red cell damages its membrane and causes it to
become dehydrated and more rigid than normal.
Furthermore, these cells tend to become abnormally
adherent to the endothelial lining of blood vessels.
The overall effect of the complex changes which
occur to deoxygenated sickle cells is to reduce their
survival time in the circulation and also to cause
blockage of small blood vessels with subsequent
destruction of tissues. Thus sickle cell disease is
characterized by a reduction in the number of red cells,
or anemia, and a variety of complications due to the
sequestration of sickle cells in different organs and the
tissue damage that results from this process. 
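The recessive inheritance described under Inheritance can be made concrete with a one-locus Punnett square. The sketch below uses hypothetical allele labels, betaA and betaS, for the normal and sickle β-globin genes and assumes simple Mendelian segregation, with each parent transmitting either allele with probability 1/2.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def offspring_genotypes(parent1: tuple[str, str], parent2: tuple[str, str]) -> dict[str, Fraction]:
    """One-locus Punnett square: each parent transmits either allele with probability 1/2."""
    counts = Counter("/".join(sorted(pair)) for pair in product(parent1, parent2))
    total = sum(counts.values())
    return {genotype: Fraction(k, total) for genotype, k in counts.items()}


# Hypothetical labels: betaA = normal beta-globin gene, betaS = sickle gene.
carrier = ("betaA", "betaS")  # sickle cell trait
cross = offspring_genotypes(carrier, carrier)
print(cross["betaS/betaS"])  # 1/4: sickle cell disease
print(cross["betaA/betaS"])  # 1/2: sickle cell trait
```

Two carriers therefore have, in each pregnancy, a 1 in 4 chance of a child with sickle cell disease and a 1 in 2 chance of a child with the symptomless trait; a carrier and a noncarrier cannot have an affected child.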


Clinical Features 


The sickle cell trait is symptomless. It can be diag- 
nosed by hemoglobin electrophoresis. Patients with 
sickle cell anemia adapt surprisingly well to their ane- 
mia. However, they are prone to a variety of compli- 
cations. They are particularly susceptible to infection, 
possibly because sickling of red cells damages the 
spleen and causes it to shrink and become scarred; 
the spleen, for reasons that are not well understood, 
plays a major role in combating infection, particularly 
early in life. Constant minor blockages of small blood 
vessels also lead to other chronic complications,
including damage to the bones, particularly the heads 
of the humerus and femur, progressive impairment of 
kidney function, and chronic leg ulcers. 

Figure 2 Peripheral blood film of a patient with sickle
cell anemia showing sickle cells.

The most important feature of sickle cell anemia,
however, is the occurrence of so-called sickling crises,
a name given to a variety of acute complications of
the disease. These may begin early in infancy with the
‘hand-foot syndrome,’ that is, painful swelling of the
hands and feet due to damage of the growing ends of
the bones. Painful crises, characterized by widespread
bone pain due to local areas of destruction of the 
marrow and damage to the overlying bone, can occur 
at any age. Although they are sometimes precipitated 
by infection or cold, often no cause can be found. In 
babies there is a form of sickling crisis characterized 
by rapid progressive anemia and enlargement of the 
spleen, which probably reflects the sequestration of 
sickle cells. A similar process occurring in the main
blood vessels of the lungs is the basis for the lung 
syndrome, characterized by increasing breathlessness 
and anemia. Some children develop curious thickening 
of the arteries at the base of the brain which may give 
rise to recurrent strokes. Other complications include 
prolonged, painful erections, and episodes of pro- 
found anemia associated with infections with an 
agent called parvovirus. Pregnancy may be associated 
with an increased frequency of painful crises and there 
is also an increased rate of fetal loss. 

The clinical course of sickle cell anemia varies 
widely; some patients go through life with few
complications while others are troubled with frequent
crises and a wide variety of pathological sequelae.
Although a number of genetic factors have been dis- 
covered which are partly responsible for this variabil- 
ity, in many patients it remains unexplained. 


Control and Treatment 


In between crises no specific treatment is required 
except for supplements of the vitamin folic acid, 
which is required for red cell production. As soon as 
the condition is recognized patients are given prophy- 
lactic penicillin, taken in tablet form each day, and 
they should be immunized to combat infections with 
several organisms which commonly cause severe 
infections in this disease. Mild painful crises are man- 
aged at home with simple analgesics, but when the 
pain is more severe hospital admission is required, so 
that more powerful analgesics can be administered, 
together with adequate hydration and monitoring of 
the oxygen levels in the blood. Infections are treated 
with appropriate antibiotics. The various sequestra- 
tion crises are medical emergencies which require 
urgent hospital admission and treatment with transfu- 
sion or even exchange transfusion, that is the replace- 
ment of most of the patient’s blood with that from a 
normal donor. 

Children should be monitored with regular testing 
of their cerebral circulation and if there is evidence 
that they are likely to develop a stroke, or they have 
had an event of this kind, they should be maintained 
on blood transfusion for an indefinite period. Because 
of the risks of iron overload following long-term 
blood transfusion, the decision to embark on a regimen 
of this type is made only if there is a danger of 
stroke or the frequency of painful crises is making 
life intolerable. 

Sickle cell disease can be identified prenatally at 
about 9-12 weeks’ gestation. However, in most coun- 
tries very few pregnancies are terminated for this 
condition. It can be cured by bone marrow transplant- 
ation if suitable donors are available. However, 
because of the uncertainty of the prognosis, and the 
risks of this procedure, its place in management is still 
controversial. Attempts are being made to elevate fetal 
hemoglobin levels in this disorder because children 
who make more fetal hemoglobin than usual are pro- 
tected against some of the effects of the disease. A 
clinical trial of the agent hydroxyurea, a drug that is 
used for certain forms of leukemia, has shown that it 
has this effect and reduces the frequency of painful 
crises in adults. The long-term safety of this agent has 
not yet been assessed, however. A great deal of research 
is being carried out toward discovering other agents that 
will stimulate fetal hemoglobin synthesis, and into ways
of replacing the sickle cell gene with a normal
β-globin gene, or of using other methods of genetic
engineering, to correct the defect in sickle cell hemoglobin.


Prognosis 


It is now usual for the majority of patients with sickle 
cell anemia who live in the richer countries, where there 
is a high quality of medical care, to survive to adult life. 
This is not the case in sub-Saharan Africa, where the 
disease is still a major killer in early childhood. 


Other Sickling Disorders 


The sickle gene may be inherited together with one for 
another hemoglobin variant, hemoglobin C or D for 
example, with the production of milder sickling dis- 
orders. It may also be inherited together with different 
forms of β-thalassemia (see Thalassemias).
History


Sigma factors are subunits of all bacterial RNA poly- 
merases. They are responsible for determining the 
specificity of promoter DNA binding and control
how efficiently RNA synthesis (transcription) is 
initiated. The first sigma factor discovered was the 
sigma70 (σ70) of the highly studied bacterium
Escherichia coli. Its discovery in 1968 was an unexpected
outcome of attempts to understand the subunit
structure of RNA polymerase. It was found that RNA
polymerase activity was associated with two protein
species. A core polymerase (with subunit structure
α2ββ′) can transcribe DNA into RNA inefficiently
and nonspecifically. When the sigma subunit, σ70, is
added, it can bind to core, forming a holoenzyme
(α2ββ′σ) that is capable of specific engagement with
duplex DNA at the beginning of genes (promoters) 
as well as efficient initiation of transcription. It was 
hypothesized that multiple sigma factors would be 
found in E. coli, each capable of directing the core 
polymerase to transcribe a specific set of genes. In 
this way, by regulating the level of each active sigma 
factor, the cell could coordinately regulate groups of 
genes with common functions. During the last 25 
years, multiple sigma factors have indeed been 
found. The seven sigma factors of E. coli are listed in 
Table 1 along with their gene names, molecular
weights, consensus promoter DNA binding sites, 
and classes of genes they regulate. Sigma factors have 
also been discovered that are encoded by bacterio- 
phage. By binding to core polymerase these proteins 
cause preferential transcription of phage genes. In the 
sporulating bacterium Bacillus subtilis, ten sigma fac- 
tors have been discovered and characterized. These 
proteins not only regulate classes of genes during 
vegetative growth, but also orchestrate the develop- 
ment of the spore, in response to nutrient starvation. 
E. coli σ70 was essentially the first positive
transcription activation factor whose basic mode of action was
understood. The concept that groups of genes could 
be coordinately regulated by transcription initiation 
factors spurred successful searches for numerous 
transcription factors in both bacteria and higher 
organisms. 


Table 1 The seven sigma factors of Escherichia coli

Factor(a) | Gene | Amino acid residues | Size (kDa) | Consensus binding site(b) | Genes regulated
σ70 (σD) | rpoD | 613 | 70 | TTGACA-N17-TATAAT | Housekeeping
σ54 (σN) | rpoN (ntrA) | 477 | 54 | CTGGCAC-N?-TTGCA | Nitrogen metabolism
σS | rpoS (katF) | 362 | 38 | TTGACA-N?-TGTGCTATACT | Stationary phase
σ32 (σH) | rpoH (htpR) | 284 | 32 | CTTGAA-N14-CCCCATNT | Heat shock
σF (σ28) | fliA | 239 | 28 | TAAA-N15-GCCGATAA | Flagellar proteins
σE | rpoE | 191 | 24 | GAACTT-N16-TCTGA | Extreme heat shock
σFecI | fecI | 173 | 19 | GGAAAT-N17-TC | Iron transport

(a) Alternative names are given in parentheses.
(b) Nx indicates any x number of nucleotides.


Basic Role in Transcription: The Sigma Cycle


In the bacterial cell, sigma factors exist in several types
of complexes, each of which is important to the
transcription process. When a free sigma binds to core to
form holoenzyme, it weakens the nonspecific binding
of core to DNA and enhances the specific interaction
of RNA polymerase with promoter DNA. This
positions the polymerase at the beginning of the gene
(promoter) to be transcribed. Although most sigmas
cannot bind to DNA by themselves, a conformational
change that occurs when sigma binds to core exposes
two regions of sigma. The exposed segments recognize 
two short regions of promoter DNA that lie about 
38-32 bases (the -35 region) and 13-8 bases (the -10 
region) before the start site of transcription (at 
position +1). This resulting closed promoter complex 
then undergoes a conformational change that facilit- 
ates a ‘melting’ process to form an open promoter 
complex in which the base pairs from positions -10
to +1 are disrupted, allowing the two strands of DNA 
to separate slightly. The first two nucleoside triphos- 
phates then bind to the open complex, base pairing 
with the DNA bases in positions +1 and +2 
of the template strand. The first two nucleotides 
become linked by a phosphodiester bond during 
RNA chain initiation. After initiation, when the RNA 
chain reaches a length of about 8-10 residues, the 
sigma factor dissociates from the core, allowing the 
polymerase to escape from the promoter and traverse 
the DNA, elongating the RNA chain until the com- 
plete gene is transcribed and termination of transcrip- 
tion occurs. The released sigma is free to find another 
core polymerase and start the sigma cycle anew. It is 
likely that the conformation of sigma changes signifi- 
cantly both in its interactions with the core polymer- 
ase and with the promoter DNA as the above cycle 
progresses. 
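Recognition of the -35 and -10 regions can be illustrated with a naive consensus scan against the σ70 site listed in Table 1 (TTGACA-N17-TATAAT). This is only a sketch under simplifying assumptions: it scores mismatches to the two hexamers, allows spacers of 16 to 18 nucleotides (an assumption, since natural spacers vary), and ignores every other feature that affects promoter strength. The function names and the demo sequence are hypothetical.

```python
from typing import Optional, Tuple

# Sigma70 consensus hexamers (TTGACA-N17-TATAAT).
MINUS_35 = "TTGACA"
MINUS_10 = "TATAAT"


def mismatches(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    return sum(1 for x, y in zip(a, b) if x != y)


def best_sigma70_site(dna: str, spacers=(16, 17, 18)) -> Optional[Tuple[int, int, int]]:
    """Return (start of -35 box, spacer length, total mismatches) of the best match.

    Ties keep the first hit found. Returns None if the sequence is too short.
    """
    dna = dna.upper()
    best = None
    for spacer in spacers:
        span = len(MINUS_35) + spacer + len(MINUS_10)
        for i in range(len(dna) - span + 1):
            box35 = dna[i:i + len(MINUS_35)]
            box10 = dna[i + len(MINUS_35) + spacer:i + span]
            score = mismatches(box35, MINUS_35) + mismatches(box10, MINUS_10)
            if best is None or score < best[2]:
                best = (i, spacer, score)
    return best


# A synthetic test sequence (not a real promoter).
demo = "GGGG" + "TTGACA" + "C" * 17 + "TATAAT" + "GGGG"
print(best_sigma70_site(demo))  # (4, 17, 0)
```

Natural promoters rarely match both hexamers perfectly, so in practice such a scan ranks candidate sites rather than identifying promoters outright.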

It should be noted that not all regulation of bacterial transcription is attributable to sigma factors. In addition to the global regulation exerted by sigma factors, very important effects on transcription are exerted by promoter-specific negative transcription factors, such as repressors, that bind to operator targets overlapping the promoter, thereby preventing RNA polymerase from binding. There is also regulation by positive transcription factors that bind to specific targets just upstream of promoters. By interacting with the holoenzyme, such factors enhance its ability to bind to the promoter, the efficiency of open promoter complex formation, or the rate of RNA chain initiation.


Sigma70 (σ70) Structure and Function


The gene for σ70, rpoD, has been mapped, cloned, and sequenced. Comparing its sequence with those of other sigma factors identified regions of strong similarity, implying that the E. coli sigmas (with the exception of σ54) belong to a family of homologous proteins. The most highly conserved segments were designated regions 1, 2, 3, and 4. Predicted helix-loop-helix (HLH) structures, often found in DNA-binding proteins, in regions 3.1 and 4.2 suggest that these regions may bind DNA. Through a combination of genetic, biochemical, and sequence analysis studies, various structural and functional features have been assigned to segments of the 613-amino-acid polypeptide chain. These are summarized in Figure 1. By analyzing promoter mutations and compensating mutations in σ70, regions 4.2 and 2.4 of σ70 were shown to interact with the -35 region and the -10 region of the promoter DNA, respectively, and are thus responsible for the specific DNA-binding properties of σ70. Region 2.3 appears to be involved in interacting nonspecifically with single-stranded DNA in the open promoter complex. Region 1.1 seems to interact with region 4.2 to prevent free σ70 from binding DNA. Deletion and mutational analyses implicate regions 2.1 and 2.2 of σ70 in core binding. The sequence between regions 1.2 and 2.1 is thought to be dispensable because it is found only in σ70 and is absent from the other members of the σ70 family in E. coli. It is also not found in the σ70 equivalent in B. subtilis (σA). A highly acidic region, containing 18-22 acidic amino acids, is found around residue 200. Regions 1.2 through 2.4 (amino acids 114-448)
form a structure that is resistant to digestion by a variety of proteases. This protease-resistant domain was recently crystallized and its three-dimensional structure determined (Malhotra et al., 1996). Regions 1.2 and 2.1 are seen to be α-helices, located close together and forming a coiled-coil structure. The structures of the remaining regions of σ70 remain unsolved.

Figure 1 Structural and functional features of σ70


How Sigma Activity is Regulated 


It is not fully understood how the regulation of transcription by way of multiple sigma factors is accomplished. The simplest model is one in which the amount of each type of holoenzyme is determined by the relative amounts of the different sigmas and their binding strengths for core polymerase. Emerging results for a few of the E. coli and B. subtilis sigmas suggest that diverse and complex mechanisms are employed to regulate sigma abundance and activity under different growth conditions. For example, the abundance of σ32 appears to be regulated by its stability. During growth at 37 °C, σ32 is bound by a heat shock protein, DnaK, resulting in its rapid proteolytic degradation (half-life, 1-2 minutes). At 42 °C, certain proteins become denatured. DnaK, which binds to partially unfolded proteins, is competed away from σ32. Free σ32 rapidly accumulates and binds to core polymerase, thereby stimulating transcription from the promoters of heat shock genes. One of the genes transcribed is dnaK. The consequent elevation in DnaK levels results in the sequestration of σ32 and its rapid degradation.
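If σ32 is assumed to decay by simple first-order kinetics (a simplification; in the cell the rate depends on how much DnaK is free), the quoted half-life fixes the decay constant k = ln 2 / t½ and the fraction remaining after time t, exp(-kt). The helper names below are hypothetical.

```python
import math


def decay_constant(half_life_min):
    """First-order rate constant (per minute): k = ln 2 / t_half."""
    return math.log(2) / half_life_min


def fraction_remaining(t_min, half_life_min):
    """Fraction of an initial sigma-32 pool left after t minutes: exp(-k t)."""
    return math.exp(-decay_constant(half_life_min) * t_min)


for t_half in (1.0, 2.0):   # the 1-2 minute half-life range quoted in the text
    print(f"t_half={t_half} min: {fraction_remaining(10.0, t_half):.4f} left after 10 min")
```

With a 1-minute half-life only 2^-10, about 0.1%, of an initial pool survives 10 minutes; with a 2-minute half-life about 3% survives, so the free σ32 pool turns over within minutes.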

In another example, σF is bound to an ‘anti-sigma’, FlgM. This makes σF unavailable for the transcription of genes essential for the construction of the flagellum. During nutritional deprivation, the bacterium must make flagella to be able to swim toward new sources of food. The cell first turns on the synthesis of the proteins that make up the base of the flagellum. The first protein to be transported out of the cell through the pore of this basal structure is FlgM. This frees σF to bind to core polymerase, activating the transcription of the remaining flagellar genes. When the flagellum is complete, FlgM levels again build up and σF is converted into an inactive σF/FlgM sigma/anti-sigma complex.
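Sequestration by an anti-sigma factor can be treated, as a simplifying assumption, as reversible 1:1 binding, σF + FlgM ⇌ σF·FlgM, with dissociation constant Kd. Solving the mass-action equations gives the free σF available to core polymerase. The function name and the concentrations used below are hypothetical; the text gives no measured values.

```python
import math


def free_sigma(sigma_total, flgm_total, kd):
    """Free sigma-F at equilibrium for 1:1 binding, sigma + FlgM <=> complex.

    The complex concentration C solves C**2 - (S + A + Kd) * C + S * A = 0;
    the physically meaningful (smaller) root is written in the numerically
    stable form C = 2*S*A / (b + sqrt(b**2 - 4*S*A)) with b = S + A + Kd.
    All three arguments must use the same concentration unit.
    """
    if sigma_total <= 0 or flgm_total <= 0:
        return max(sigma_total, 0.0)
    b = sigma_total + flgm_total + kd
    complex_conc = 2 * sigma_total * flgm_total / (
        b + math.sqrt(b * b - 4 * sigma_total * flgm_total)
    )
    return sigma_total - complex_conc


# Hypothetical numbers: 10 units of sigma-F, tight binding (Kd = 0.01).
for flgm in (20.0, 10.0, 5.0, 0.0):   # FlgM falling as it is exported
    print(flgm, round(free_sigma(10.0, flgm, 0.01), 3))
```

With tight binding, free σF stays near zero while FlgM exceeds σF and rises steeply once FlgM falls below it, a switch-like response typical of stoichiometric titration.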


Future Directions 


Extensive work is in progress to determine the precise nature of the interaction between σ70 and core polymerase. A major portion of this interaction appears to involve amino acids 260-309 of the β′ subunit of core. This region of β′ is likely to be involved in the interaction of core with most, if not all, of the other sigma factors. Detailed knowledge of this important interaction may allow the design of small molecules that interfere with the σ70-β′ interaction. Such compounds could have potential therapeutic use as antibiotics.

Recently the crystal structure of the core RNA polymerase from Thermus aquaticus was determined (Zhang et al., 1999), providing new insights into sigma binding. Since σ70 is likely to undergo major conformational changes during its participation in the sigma cycle, we really need a motion picture rather than a snapshot. More extensive site-directed mutagenesis of key regions of σ70 is in progress. Careful study of the effects of these mutations on various σ70 functions will provide a more detailed view of the system and a deeper understanding of how sigma factors in general carry out their functions.

Work is in progress to measure the cytoplasmic concentration of each sigma factor and the level of the cognate holoenzyme under a variety of physiological states. Such information will provide insight into how the level and activity of each sigma varies with growth condition. The advent of high-density DNA arrays for simultaneously measuring the level of transcription of all 4300 E. coli genes will allow us to determine how many of the operons are regulated by each of the seven sigma factors and give us a much more complete picture of how global gene expression is regulated by fluctuations in the amount and activity of sigma factors.




Further Reading 

Burgess RR, Travers AA, Dunn JJ and Bautz EKF (1969) Factor 
stimulating transcription by RNA polymerase. Nature 221: 

Signal Sequence

A signal sequence is a peptide present on proteins (usually at the N-terminus) that is responsible for their cotranslational insertion into membranes of the endoplasmic reticulum. These sequences are usually present on proteins destined to become membrane components or to be secreted. They are highly hydrophobic sequences of approximately 20 amino acids, which are normally removed from the growing peptide chain by signal peptidase, a specific protease of the endoplasmic reticulum.
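The hydrophobic character of a signal sequence can be illustrated with a sliding-window hydropathy profile. The sketch below uses the Kyte-Doolittle hydropathy scale, which is not mentioned in the text; the 9-residue window, the 30-residue N-terminal cut-off, the helper names and the two test sequences are hypothetical choices. It is a crude illustration, not a signal-sequence predictor: dedicated tools also model the charged N-terminal region and the signal peptidase cleavage site.

```python
# Kyte-Doolittle hydropathy values (positive = hydrophobic).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(seq, window=9):
    """Mean hydropathy of each `window`-residue stretch along a protein sequence."""
    values = [KYTE_DOOLITTLE[aa] for aa in seq.upper()]
    return [sum(values[i:i + window]) / window for i in range(len(values) - window + 1)]


def max_nterminal_hydropathy(seq, n_terminal=30, window=9):
    """Highest window score among the first `n_terminal` residues
    (both cut-offs are hypothetical illustration values)."""
    profile = hydropathy_profile(seq[:n_terminal], window)
    return max(profile) if profile else float("nan")


# Made-up sequences for illustration only.
hydrophobic_start = "MK" + "L" * 11 + "AVSAQAEDRKE"
charged_start = "MSDEKRKEDESRKQDEKNRSEDTK"
print(max_nterminal_hydropathy(hydrophobic_start) > max_nterminal_hydropathy(charged_start))  # True
```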


See also: Proteins and Protein Structure 
Signals acting on the outside of the cell need to have their effects transmitted across the cell membrane. This signal transduction is mediated by receptors that interact with the external signals and undergo changes in molecular conformation, which alter the receptor structure on the inside of the cell. These changes can be detected by other proteins and passed on through complex chains of protein alterations, usually involving phosphorylation, ultimately terminating in functional changes in the cell.

One system involves the G proteins, which bind GTP, carry information from the so-called seven-transmembrane receptors, and lead, among other things, to stimulation or inhibition of the synthesis of cyclic AMP.


See also: cAMP and Cell Signaling 
P H A Sneath
Similarity is the degree to which two entities resemble each other in their properties, and is commonly expressed as the proportion of such resemblances in a defined list of properties. This broad definition may be qualified by restricting the comparison in some way. For example, phenetic similarity refers to similarity in observed properties without considering evolution, cladistic similarity refers to similarity deduced from phylogenetic principles, and genomic similarity is that derived from genomic data. Similarity is occasionally called resemblance or relationship. Dissimilarity is the complement of similarity (i.e., identical entities have 100% similarity but zero dissimilarity). This is useful when dissimilarity is expressed as a distance in genetic algorithms.
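The definition above, similarity as the proportion of matching properties in a defined list, corresponds to the simple matching coefficient. The sketch assumes each entity is coded as an equal-length list of character states; the function names are hypothetical, and other coefficients (for example Jaccard, which ignores shared absences) would give different values.

```python
def similarity(a, b):
    """Proportion of properties in which two entities share the same state."""
    if len(a) != len(b) or not a:
        raise ValueError("entities must be scored on the same non-empty property list")
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return matches / len(a)


def dissimilarity(a, b):
    """Complement of similarity: 0 for identical entities."""
    return 1.0 - similarity(a, b)


print(similarity([1, 0, 1, 1], [1, 1, 1, 0]))   # 0.5
print(dissimilarity("ACGT", "ACGT"))             # 0.0
```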


See also: Taxonomy, Evolutionary; Taxonomy, 
Numerical 
The general name coined for selfish genetic elements that disperse themselves through the genome by means of an RNA intermediate is ‘retroposon’. There are two classes of retroposons. The SINE family is made up of very small DNA elements that require other genetic information to facilitate their dispersion throughout the genome. The LINE family is derived from a full-fledged selfish DNA sequence with a self-encoded reverse transcriptase.

The two major families of highly repetitive elements in the mouse, B1 and B2, are both of the SINE type, with relatively short repeat units of ~140 bp and ~190 bp in length, respectively. In humans, the highly repetitive Alu repeat element can also be classified as a SINE type.

The significance of the short repeat length of a SINE element is that it does not provide sufficient capacity for these elements to encode their own reverse transcriptase. Nevertheless, SINE elements are able to disperse themselves through the genome, just like LINE elements, by means of an RNA intermediate that undergoes reverse transcription. Clearly, SINEs are dependent on the availability of reverse transcriptase produced elsewhere, perhaps from LINE transcripts or endogenous retroviruses.

All SINE elements, in the mouse genome and elsewhere, appear to have evolved from small cellular RNA species, most often tRNAs but also (in the case of mice and humans) the 7S cytoplasmic RNA, which is one of the components of the signal recognition particle (SRP) essential for protein translocation across the endoplasmic reticulum. Unlike the LINE families, however, SINE families present in the genomes of different organisms appear, for the most part, to have independent origins. The defining event in the evolution of a functional cellular RNA into an altered-function self-replicating SINE element is the accumulation of nucleotide changes in the 3’ region that lead to self-complementarity, with the propensity to form hairpin loops. The open end of the hairpin loop can be recognized by reverse transcriptase as a primer for strand elongation. Since hairpin loop formation of this type is likely to be very rare among normal cellular RNAs, the SINE transcripts in a cell will be used preferentially as templates for the production of cDNA molecules that are able (somehow) to integrate into the genome at random sites. Like the LINE families, SINE families appear to be evolving by episodic amplification followed by sequence degradation.
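The hairpin criterion can be illustrated with a deliberately simple check: does the 3’-terminal stretch of a transcript have a perfect reverse complement upstream of it, leaving room for a loop? The helper below is hypothetical and for illustration only; it ignores G-U wobble pairs, mismatches and folding free energy, all of which real RNA structure-prediction programs take into account.

```python
PAIR = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna):
    """Reverse complement of an RNA string (Watson-Crick pairs only)."""
    return "".join(PAIR[base] for base in reversed(rna))


def find_3prime_hairpin(rna, stem=6, min_loop=3):
    """Start index of an upstream segment that can pair perfectly with the
    3'-terminal `stem` bases, leaving at least `min_loop` unpaired bases
    between them; -1 if there is none."""
    rna = rna.upper()
    if len(rna) < 2 * stem + min_loop:
        return -1
    partner = reverse_complement(rna[-stem:])
    last_start = len(rna) - 2 * stem - min_loop   # latest start that still leaves a loop
    # rfind returns the partner closest to the 3' end, i.e. the tightest hairpin.
    return rna.rfind(partner, 0, last_start + stem)


print(find_3prime_hairpin("GGGAAAUCCC", stem=3, min_loop=3))   # 0: GGG pairs with CCC
```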


See also: LINE; Repetitive (DNA) Sequence 
Single-copy plasmids are maintained at a level of one plasmid per host chromosome.


See also: Multicopy Plasmids; Plasmids 
Single-Gene Inheritance
Single-gene inheritance occurs when the development of a trait, or phenotype, is largely determined by the presence of a mutation in one or both alleles of a single gene. In pedigrees where the mutation is present in several individuals, the inheritance of the disorder will show a pattern of affected individuals. Characteristic patterns are created depending on whether the mutated gene alleles are dominant or recessive and whether the genes are located on the autosomes or the sex chromosomes.


History 


The effects of genes were first recognized because mutations caused similar phenotypic differences in several members of a family. The first human disorder to be recognized as a single-gene trait was alkaptonuria, which was described by Garrod in 1902. He and Bateson then proposed that affected individuals were homozygous for an underactive recessive gene. The first human gene to be assigned to a chromosome was the gene for color blindness, following Wilson’s demonstration of its X-linked nature in 1911. The first evidence, in any organism, that a mutation in a structural gene could cause an altered amino acid sequence in a protein came in 1956, when Ingram, developing the work of Pauling, demonstrated an abnormal hemoglobin polypeptide sequence in sickle cell disease. Since then, many thousands of single-gene disorders have been identified and characterized. They are catalogued in Online Mendelian Inheritance in Man. At the end of 2000 the database had information on 11372 autosomal, 674 X-linked, 37 Y-linked, and 60 mitochondrial entries. Although our knowledge of the total number of single genes in the genome is nearing completion, our understanding of genomics, or gene interactions, is still elementary.


Autosomal, Sex-Linked, and 
Mitochondrial Inheritance Patterns 


Genes can be characterized by their location in the genome. Genes on the 22 pairs of autosomes are autosomal, genes on the pair of sex chromosomes are X-linked or Y-linked, and genes on the mitochondrial chromosome are mitochondrial. The nature of transmission of each of these chromosomes, together with whether a trait caused by a mutation is dominant or recessive, creates distinctive inheritance patterns of affected individuals. Mendel described some of these patterns in the inheritance of pea characteristics, and single-gene inheritance is therefore sometimes referred to as Mendelian inheritance. Strictly speaking, Mendelian inheritance does not include genes on the mitochondrial chromosome.

In practice, many individual pedigrees with a single-gene disorder may not show a typical Mendelian pattern of inheritance. This can occur for a variety of reasons. A single affected individual may carry a new mutation. A new dominant mutation may be lethal before reproduction. A healthy parent may have two or more affected offspring as a result of a new dominant mutation occurring in the parent’s gonads (gonadal mosaicism). An affected parent may, by chance, transmit only normal alleles to offspring. Some individuals may carry the mutation but not express the expected phenotype.


Recessive and Dominant Traits 


If a mutation must be present in both alleles (m/m) to alter the phenotype, the trait is recessive. If a mutation alters the phenotype when it is present in only one allele, as in the heterozygote (+/m), the trait is dominant.
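The transmission rules for a single autosomal gene can be enumerated directly: each parent passes on one of its two alleles with probability 1/2. The helper below (names are hypothetical; ‘+’ is the normal allele and any other symbol, such as m or M, a mutant allele) lists offspring genotype probabilities for a cross and the chance of an affected child under recessive or dominant inheritance, assuming full penetrance and no new mutation.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def cross(parent1, parent2):
    """Offspring genotype probabilities; each gamete carries one parental allele."""
    counts = Counter(tuple(sorted(pair)) for pair in product(parent1, parent2))
    total = sum(counts.values())
    return {genotype: Fraction(n, total) for genotype, n in counts.items()}


def affected(genotype, mode):
    """Recessive: both alleles mutant. Dominant: at least one mutant allele."""
    n_mutant = sum(1 for allele in genotype if allele != "+")
    if mode == "recessive":
        return n_mutant == 2
    if mode == "dominant":
        return n_mutant >= 1
    raise ValueError("mode must be 'recessive' or 'dominant'")


def risk(parent1, parent2, mode):
    """Probability that a child of this cross is affected."""
    return sum(p for g, p in cross(parent1, parent2).items() if affected(g, mode))


print(cross(("+", "m"), ("+", "m")))
print(risk(("+", "m"), ("+", "m"), "recessive"))   # 1/4 for two carrier parents
print(risk(("+", "M"), ("+", "+"), "dominant"))    # 1/2 for one heterozygous parent
```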


Molecular Basis of Recessivity 

The mutated allele in a heterozygote for a recessive trait (+/m) generally has little or no effect on the phenotype because the mutation causes a loss of function, with no gain of new function and no interference with the function of the healthy allele. The normal protein is usually present in the heterozygote at half the level found in the normal homozygote, but this is sufficient to maintain the normal phenotype. Examples of recessive disorders include enzyme defects such as alkaptonuria and cell membrane receptor or channel defects such as cystic fibrosis. Some recessive traits, such as sickle cell disease, a disorder of hemoglobin, result in two proteins (normal and mutated) being produced and found in the target tissue. These traits may show a change of phenotype in heterozygotes in some environments. In nonmalarial areas the sickle cell heterozygote phenotype is normal, but in malarial areas the heterozygotes are fitter than the normal homozygotes because the presence of sickle hemoglobin in the red blood cells interferes with the malarial parasite’s life cycle. This is called heterozygote advantage. Other recessive traits, such as Duchenne muscular dystrophy, are caused by mutations that result in little or no mutated protein being found in the target tissue.
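Heterozygote advantage keeps both alleles in a population at a stable frequency. A standard single-locus model, which is not given in the text, assigns relative fitnesses 1 - s to normal homozygotes (malaria), 1 to heterozygotes and 1 - t to sickle homozygotes; the sickle allele then settles at q* = s/(s + t). The sketch iterates one generation of selection at a time and compares the result with that formula. The coefficients s = 0.1 and t = 0.8 are hypothetical, not measured values.

```python
def equilibrium_q(s, t):
    """Stable sickle-allele frequency under heterozygote advantage.
    Relative fitnesses: normal homozygote 1 - s, heterozygote 1, sickle homozygote 1 - t."""
    return s / (s + t)


def next_q(q, s, t):
    """Sickle-allele frequency after one generation of selection (random mating)."""
    p = 1.0 - q
    w_bar = p * p * (1 - s) + 2 * p * q + q * q * (1 - t)
    return (p * q + q * q * (1 - t)) / w_bar


s, t = 0.1, 0.8   # hypothetical selection coefficients, not measured values
q = 0.01          # start with a rare sickle allele
for generation in range(1000):
    q = next_q(q, s, t)
print(round(q, 6), round(equilibrium_q(s, t), 6))
```

Both printed values come out at about 0.111 (1/9): the allele is neither lost nor fixed, because each allele is favored when it becomes rare.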


Molecular Basis of Dominance 

The mutated allele in a heterozygote for a dominant trait (+/M) affects the phenotype either by interfering with the function of the normal allele or by a gain of new function. Examples of dominant traits include collagen disorders such as osteogenesis imperfecta, a disorder causing brittle bones. In these, the mutated gene codes for a protein that is a subunit of a larger multimeric structural protein. The presence of the mutated subunit degrades the function of the final protein. This explains the apparent paradox in which a mutation in a subunit gene that produces no protein is recessive and results in a normal phenotype in the heterozygote, whereas an apparently less severe mutation, which produces an abnormal protein, results in a severe dominant trait.
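The paradox can be made quantitative under simple assumptions: mutant and normal chains are produced in equal amounts and assemble at random. A molecule is then fully normal only if every copy of the affected subunit is normal, so the fully normal fraction is (1 - f)^n for n copies of the subunit per molecule and mutant share f. Type I collagen, for example, contains two α1(I) chains and one α2(I) chain. The helper name is hypothetical, and real assembly need not be random.

```python
from fractions import Fraction


def fraction_fully_normal(copies, mutant_share):
    """Share of assembled multimers containing no abnormal subunit.

    Assumes each of `copies` positions is filled independently at random and
    that mutant chains are made and incorporated as readily as normal ones.
    """
    return (1 - mutant_share) ** copies


# A subunit present in two copies per molecule, as alpha1(I) is in type I
# collagen, from a heterozygote making equal amounts of both chains:
print(fraction_fully_normal(2, Fraction(1, 2)))   # 1/4 of molecules fully normal
# A null allele makes no chain, so every molecule assembled is normal
# (although total output is roughly halved):
print(fraction_fully_normal(2, Fraction(0)))      # 1
```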

Other dominant traits, such as Huntington disease, result from a gain of function of the mutated protein. Huntington disease is a progressive neurodegenerative condition that causes dementia and a movement disorder. Affected individuals develop and grow normally until symptoms appear, usually in middle age. Both the normal and the mutated huntingtin protein, the latter containing an expanded glutamine repeat near the N-terminus, are widely expressed, but a gain of function, expressed as abnormal metabolism of the mutant protein, leads to aggregation of N-terminal huntingtin fragments. The presence of these aggregates in neurones predisposes to early cell death.


Complete (True) and Partial Dominance 


Complete dominance 

When a dominant trait is caused by a mutation that 
causes only a gain, and no loss, of function of the 
mutant protein, then individuals homozygous for the 
mutant allele would be expected to have the same 
phenotype as heterozygotes. Huntington disease is 
an example of a complete or true dominant condition. 
The mutant huntingtin proteins in homozygotes have 
not lost their normal function, allowing normal 
growth and development until the gain of function 
causes early neuronal death. 


Partial dominance 

Many dominant traits show partial dominance. This is demonstrated when individuals who are homozygous for the mutant allele have a more severe phenotype than heterozygotes. In addition to the mutant allele’s gain of function, causing dominance of the trait, there is a partial loss of normal function that is recessive in the heterozygote. An example is achondroplasia. The phenotype of heterozygotes involves restricted growth but normal life expectancy. In contrast, mutant homozygotes have a very severe skeletal dysplasia and die in infancy.


Other Factors Affecting the Phenotype 
in Single-Gene Inheritance 


Many single-gene disorders, particularly autosomal dominant traits, show considerable variation in the phenotype, and individuals with healthy phenotypes may transmit the trait, demonstrating that they carry the mutant genotype. Penetrance and expressivity are common factors influencing autosomal dominant phenotypes.


Penetrance 

The penetrance of a trait is the proportion of those who have the trait genotype (such as obligate carriers) who show the trait phenotype. A trait with full penetrance, such as achondroplasia, results in all heterozygotes developing the trait phenotype. Other disorders show reduced penetrance; e.g., breast cancer caused by BRCA1 mutations shows about 85% penetrance in female heterozygotes. Penetrance may also be age-dependent. For example, achondroplasia is 100% penetrant at birth, neurofibromatosis is near 100% penetrant by the end of the second decade, and Huntington disease is near 100% penetrant if heterozygotes live long enough.
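Penetrance is a simple proportion, and it enters risk estimates multiplicatively. The sketch below, with hypothetical function names and counts, estimates penetrance from carrier counts and computes the chance that a child of a heterozygote for an autosomal dominant trait is affected when the other parent is +/+: one half (the chance of inheriting the mutant allele) times the penetrance. It assumes the child belongs to the group in which the penetrance was measured and ignores the age dependence described above.

```python
from fractions import Fraction


def penetrance(affected_carriers, carriers):
    """Proportion of people with the trait genotype who show the trait phenotype."""
    return Fraction(affected_carriers, carriers)


def child_risk_dominant(pen):
    """Risk that a child of a heterozygote (other parent +/+) is affected:
    P(inherits mutant allele) * P(mutant genotype is expressed) = 1/2 * penetrance."""
    return Fraction(1, 2) * pen


print(child_risk_dominant(Fraction(1)))         # fully penetrant trait -> 1/2
print(child_risk_dominant(Fraction(85, 100)))   # 85% penetrance -> 17/40
print(penetrance(17, 20))                       # hypothetical counts -> 17/20
```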


Variable Expressivity 


Stable mutations 

Variable expressivity refers to variations in the degree of severity of a phenotype. Some single-gene disorders, such as adrenoleukodystrophy, a multisystem disorder affecting the adrenal glands and nervous system, show a marked variation in phenotype involving age of onset and extent of involvement of each system. In one family, where each affected member has the same mutation, the phenotype can vary from severely affected children to asymptomatic, nonpenetrant adults. Variable expressivity in disorders with stable mutations may represent the effects of modifying genes and/or environmental factors on the final phenotype.


Unstable mutations 

In some disorders, variable expressivity, even among members of the same family, is related to instability of the causative mutation. An example is fragile X syndrome, which causes a variable dysmorphic mental retardation. It is caused by an unstable amplified CGG trinucleotide repeat mutation, and the size of an individual’s CGG repeat mutation correlates with the severity of the phenotype.
Anticipation

Several autosomal dominant disorders show anticipation, in which the age of onset is earlier and the phenotype more severe in successive generations. Myotonic dystrophy is an example: the first generation may only develop cataracts in late middle age, the second generation may develop muscular weakness and stiffness in early adult life, and the third generation may have severe congenital onset. Anticipation in myotonic dystrophy is caused by instability of the amplified CTG trinucleotide repeat mutation. The number of repeats tends to increase with each generation, particularly when transmitted by a female. Mildly affected adults in the first generation of an affected family may have only 50 repeats, but a congenitally affected infant may have more than 2000.


Mendelian Disorders: Are They Truly 
Single-Gene? 


Some single-gene disorders, such as achondroplasia and Duchenne muscular dystrophy, show little variation in severity of the phenotype, even in unrelated individuals, and can be considered to show true single-gene inheritance. However, disorders with stable mutations showing variable expressivity suggest the possible effects of modifying genes. There are also conditions in which only the susceptibility to the trait is inherited as a single-gene disorder. Examples include autosomal dominant familial cancers such as early-onset breast cancer and early-onset colon cancer. Cancers are caused by a sequence of genetic changes (which may be triggered by environmental factors) occurring in a clone of somatic cells in the affected tissue. Over time, these somatically inherited mutations, some of which may need to become homozygous, lead to uncontrolled cellular proliferation in the clone. In the familial cancers, the first key step is inherited through the germ line, often as an autosomal dominant. This results in the whole sequence being completed more quickly, giving rise to an earlier age of onset in familial cancers than in sporadic cancers. Some individuals who have inherited the susceptibility mutation may not be exposed to the factors that cause the full subsequent sequence of somatic mutations, and so may never develop cancer and be nonpenetrant for the trait. These healthy individuals can pass their susceptibility mutation to their offspring, who may not appreciate their own risk of developing cancer.
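The earlier onset of familial cancers can be illustrated with a deliberately crude waiting-time model that is not taken from the text: assume cancer requires k sequential somatic events, each occurring at the same constant rate r. The time to complete all k events then follows an Erlang distribution with mean k/r, and inheriting the first step removes one event. The step numbers, rate and helper names below are hypothetical, chosen only to show the qualitative effect; real progression steps have unequal, tissue-specific rates.

```python
import math


def mean_onset(steps_needed, rate_per_year):
    """Mean waiting time for `steps_needed` sequential events at a constant rate
    (the mean of an Erlang distribution, k / rate)."""
    return steps_needed / rate_per_year


def prob_by_age(steps_needed, rate_per_year, age):
    """Probability that all steps have occurred by `age` (Erlang CDF):
    1 - sum over n < k of exp(-x) * x**n / n!, with x = rate * age."""
    x = rate_per_year * age
    return 1.0 - sum(math.exp(-x) * x ** n / math.factorial(n) for n in range(steps_needed))


RATE = 0.1                              # hypothetical events per year
sporadic_steps, familial_steps = 5, 4   # first step inherited in the familial case
print(mean_onset(sporadic_steps, RATE), mean_onset(familial_steps, RATE))
print(prob_by_age(sporadic_steps, RATE, 40), prob_by_age(familial_steps, RATE, 40))
```

In this toy model a carrier also has a probability below 1 of completing the sequence within a given lifespan, one simple way of picturing nonpenetrance.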

Finally, mutations in some ‘single-gene disorders’ may not be confined to a single gene at all. In myotonic dystrophy, which is transmitted as an autosomal dominant single-gene disorder, the associated mutation is situated in the 3’-untranslated region of the DMPK gene. This is also the promoter region of the SIX5 gene, which is immediately downstream of DMPK. The multisystem nature and very variable phenotype of this disorder are not yet explained by knowledge of the function of these two genes, raising the possibility that the myotonic dystrophy mutation affects the expression of additional genes, either directly on local genes as a result of disruption of the normal chromatin structure, or indirectly through the effects of altered expression of DMPK and SIX5 on the expression of genes elsewhere in the genome.


Further Reading 
Genetics, 5th edn. Oxford: Blackwell Scientific Publications.

Gelehrter TD, Collins FS and Ginsburg D (1998) Principles of 
Medical Genetics, 2nd edn. Baltimore, MD: Williams & Wilkins. 
University of Glasgow, Department of Medical Genetics, Encyclopaedia of Genetics pages contain a number of illustrations and animated diagrams to accompany this article: http://www.gla.ac.uk/medicalgenetics/encyclopedia.htm

Winchester CL, Ferrier RK, Sermoni A, Clark BJ and Johnson KJ 
(1999) Characterization of the expression of DMPK and SIX5 
in the human eye and implications for pathogenesis 
in myotonic dystrophy. Human Molecular Genetics 8(3): 
481-492.


See also: Dominance; Expressivity; Mutation; 
Penetrance; Recessive Inheritance 
SNPs (pronounced ‘snips’) are single-nucleotide polymorphisms. They are single-base variations in the DNA sequence that occur about once every 1000 bases along the 3 billion bases of the human genome. Researchers believe that knowing the locations of these closely spaced DNA landmarks will help in the discovery of genes involved in major human diseases such as asthma, diabetes, heart disease, schizophrenia, and cancer.
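At its simplest, a single-base variant shows up as a position where two aligned sequences differ. The hypothetical helper below lists such positions for two already-aligned, equal-length sequences and also works out the rough genome-wide count implied by the figures above. It is only a sketch: a polymorphism is defined by its frequency in a population, not by one pairwise comparison, and real SNP discovery works on aligned sequencing reads with quality filtering and handles insertions and deletions.

```python
def find_snps(reference, sample):
    """(position, reference base, sample base) for each single-base difference
    between two already-aligned, equal-length DNA sequences (0-based positions)."""
    if len(reference) != len(sample):
        raise ValueError("sequences must be aligned to the same length")
    return [
        (i, r, s)
        for i, (r, s) in enumerate(zip(reference.upper(), sample.upper()))
        if r != s
    ]


GENOME_BASES = 3_000_000_000   # about 3 billion bases (figure from the text)
BASES_PER_SNP = 1_000          # about one SNP per 1000 bases (figure from the text)
print(GENOME_BASES // BASES_PER_SNP)        # 3000000 SNPs expected
print(find_snps("ACGTTGCA", "ACGTAGCA"))    # [(4, 'T', 'A')]
```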


See also: Genetic Diseases; Human Genome 
Project; Polymorphism 
There are normally two mechanisms available for the repair of a DNA double-strand break: homologous recombination and nonhomologous end joining. In special circumstances, there is a third way. When there is a repeated nucleotide sequence in direct orientation (pointing the same way), exonucleolytic removal of one polynucleotide chain from each of the broken ends can reveal the complementary sequences of the repeated length. Rejoining by complementary base pairing, followed by endonucleolytic removal of the loose ends, filling in of gaps, and ligation, yields a rejoined molecule that has lost one copy of the repeated sequence and any sequence that was between the repeats. This mechanism can operate to repair double-strand breaks when a homologous duplex molecule is not present, as in a haploid cell in the G1 phase of the cell cycle.
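The sequence outcome of single-strand annealing between two direct repeats can be predicted with simple string operations. In the hypothetical helper below, the product keeps the left flank, one copy of the repeat and the right flank, so one repeat copy and everything between the repeats is lost; no reciprocal product is returned, because none is formed. It assumes exact, non-overlapping repeats and a break within or between them.

```python
def ssa_product(dna, repeat):
    """Predicted single-strand annealing product for a break between two direct
    copies of `repeat`: one copy and the sequence between the copies is deleted."""
    first = dna.find(repeat)
    if first == -1:
        raise ValueError("repeat not found")
    second = dna.find(repeat, first + len(repeat))
    if second == -1:
        raise ValueError("need two non-overlapping direct copies of the repeat")
    # Left flank + the surviving (second) copy + right flank.
    return dna[:first] + dna[second:]


print(ssa_product("TTTGATCCAAAAGATCCGGG", "GATCC"))   # TTTGATCCGGG
```

Here the 9 deleted bases are one 5-base repeat copy plus the 4-base spacer.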

The single-strand annealing mechanism was first proposed to explain plasmid recombination in mammalian cells and was subsequently studied in detail in Xenopus and Saccharomyces cerevisiae. Figure 1 shows a scheme by which single-strand annealing is proposed to occur. The broken ends of the DNA molecule are processed, as proposed in other recombination models, by resection of ends of like polarity. When this resection exposes the complementary sequences of the repeated nucleotide sequence, they can anneal with each other by complementary base pairing. The nonhomologous tail requires a flap endonuclease (the products of the RAD1 and RAD10 genes in S. cerevisiae) for its removal unless the tail is shorter than about 30 nucleotides. The editing function of a DNA polymerase is believed to be able to remove very short tails. As Figure 1 shows, some DNA synthesis is expected to be needed to extend the 3’ ends where there has been excessive resection. The final step will be ligation to close the last nicks and restore intact DNA with the deletion described above.

In S. cerevisiae, the process is highly efficient if the 
lengths of homology are 400 bp or more. The distance 
between the repeats can be short, or up to 10 or 20 kb. 
Time to completion of the repair depends on the dis- 
tance between the repeats, apparently because of the 
time required for longer resection. Single-strand 
annealing differs from crossing-over between the 
directly repeated sequences in that single-strand 
annealing is not conservative. The reciprocal product, 
a circle consisting of the deleted length, is not formed. 


Figure 1 Double-strand break repair and deletion formation by single-strand annealing. Lines represent polynucleotide chains, with broken lines showing new synthesis. Lengths of repeated sequence are shown by a thicker line. Half arrows indicate 3’ ends. The open arrow shows a double-strand break. Solid arrows show places where flap endonuclease activity is required. (A) A double-strand break occurs anywhere within or between the repeated sequences. (B) Resection of 5’ ends by exonuclease reveals complementary sequences in the 3’ tails. (C) Complementary regions become annealed. (D) Nonhomologous tails are removed by a flap endonuclease, and gaps are filled by new synthesis. (E) Ligation yields an intact DNA molecule with one repeat, and the sequence between the repeats deleted.


Direct repeated sequences, which are common in the human genome, would be subject to removal by this mechanism. These may persist because they have diverged enough to reduce the likelihood of homologous interaction.


Further Reading 

Fishman-Lobell J, Rudin N and Haber JE (1992) Two alternative pathways of double-strand break repair that are kinetically separable and independently modulated. Molecular and Cellular Biology 12: 1292-1303.

Lin FL, Sperle K and Sternberg N (1984) Model for homologous recombination during transfer of DNA into mouse L-cells: role for DNA ends in the recombination process. Molecular and Cellular Biology 4: 1020-1034.

Pâques F and Haber JE (1999) Multiple pathways of recombination induced by double-strand breaks in Saccharomyces cerevisiae. Microbiology and Molecular Biology Reviews 63: 349-404.


See also: Double-Strand Break Repair Model; 
Recombinational Repair; Repair Mechanisms 
Single-Strand Assimilation
Single-strand assimilation is the ability of RecA protein to cause a DNA strand to displace its homologous strand in a duplex; i.e., the single strand is assimilated into the duplex.


See also: RecA Protein and Homology 


Single-Strand Exchange 
Single-strand exchange is a reaction whereby one of the strands of a DNA duplex leaves its former partner and instead pairs with the complementary strand of another molecule, displacing its homolog in the second duplex.


See also: Homologs 


Single-Stranded 
DNA-Binding Proteins 
(SSBs) 
The single-stranded DNA-binding protein (SSB) of Escherichia coli is a well-studied member of a class of proteins that have essential roles in DNA replication, recombination, and repair. These proteins lack enzymatic activity and contribute to the ‘three Rs’ of DNA metabolism mainly through their shared ability to bind with high affinity to single-stranded DNA (ssDNA) and with low affinity to double-stranded DNA (dsDNA). Additionally, they organize processes by selectively binding to various participating proteins. Other well-studied SSBs discussed here are gene 32 protein (gp32) from the T4 bacteriophage and replication protein A (RPA) from humans and Saccharomyces cerevisiae.
Functional Aspects of Having a High Affinity for ssDNA


Helix Destabilization 

In the metabolism of DNA during replication, recom- 
bination, and repair, regions of ssDNA are generated. 
Random ssDNA sequences have a tendency to form internal Watson–Crick base pairs, and this secondary structure can interfere with the binding of proteins used in DNA metabolism. SSBs possess a helix-destabilizing ability, by virtue of their high specificity for ssDNA, which allows them to ‘melt’ these structures.


Protection of ssDNA 

It is generally thought that due to their great abun- 
dance in the cell, and their high affinity for ssDNA, 
SSBs are the first proteins to bind newly formed 
ssDNA. Not only do they reduce secondary structure 
formation, but they also prevent unintentional access 
to the ssDNA by other proteins. Endonucleases are 
particularly deleterious, as a nick in the single strand 
generates a double-strand break. If left unrepaired, these breaks can block DNA replication or lead to gross chromosomal rearrangements during cell division.


Structure of SSBs: How Do They Bind DNA?


SSBs from all organisms are, for the most part, func- 
tional homologs. Sequence similarity between various 
SSBs is limited; however, they share a DNA-binding 
motif called the OB fold. This fold, consisting of a
five-stranded antiparallel β barrel with a terminal α
helix, binds rather weakly but specifically to ssDNA.
ssDNA binds in a narrow cleft formed by the motif 
and interacts with various residues through contacts 
with its backbone, sugar, and base moieties. The rela- 
tively weak binding of an OB fold to ssDNA can lead 
to strong binding of an SSB to ssDNA when more 
than one motif is present. 

The actual sequence, structure, and binding modes 
differ considerably for SSBs from various species, but 
most utilize multiple copies of the OB fold to bind 
ssDNA. SSB from E. coli, a monomer of which has a 
molecular weight of 19 kDa, forms a stable homotetramer and effectively presents four OB folds to
ssDNA. Almost the entire sequence of SSB is devoted 
to forming the OB fold, highlighting the lack of en- 
zymatic activity for this protein. 

E. coli SSB can bind to ssDNA in several different 
modes, depending on the salt concentration. At all salt 
concentrations, ssDNA is wrapped around SSB, 
causing the apparent DNA length to be shorter. At 
relatively low salt concentrations, only two of the tetramer subunits contact the DNA. In this mode, SSB exhibits high cooperativity in DNA binding. This mode is illustrated in Figure 1, in which the apparent length of ssDNA bound with SSB is greatly reduced compared to linear dsDNA. At higher salt concentrations, all four subunits interact with the DNA, and cooperativity is limited to the formation of octamers, leading to a “beads on a string” appearance. Intermediate binding modes have also been
detected. The level of cooperativity observed in 
DNA binding is an important consideration when 
other proteins are competing with SSBs for binding 
sites. 

The other SSBs differ markedly in their structure 
from E. coli SSB. All identified RPA homologs are 
heterotrimers whose subunits have molecular weights 
of approximately 70, 30, and 14 kDa. OB folds are
found in all of the subunits. Human RPA displays a 
low cooperativity in its DNA-binding function, while 
the results for yeast RPA are less clear. T4 gp32 is a 
stable monomer of 33.5 kDa, and contains an OB fold. 
Unlike SSB, gp32 does not rely on multiple copies of 
the OB fold to bind ssDNA tightly. Instead, gp32 
binding is aided by the high level of positive coopera- 
tivity displayed under all binding conditions. 


Roles in Cellular Processes 


SSBs play major roles in DNA metabolism in the cell. 
In reviewing these roles, only a brief outline of replication, recombination, and repair processes is given
below. For a more complete review of these processes, 
see Replication, Genetic Recombination, Recombin- 
ational Repair, Mismatch Repair (Long/Short Patch), 
and Excision Repair. 


Replication 
Owing to the intrinsic nature of replication, the pro- 
cess involves unwinding dsDNA to generate ssDNA. 
The capability of SSBs to bind ssDNA and interact 
functionally and physically with other proteins places 
them in a crucial position. E. coli SSB aids the forma- 
tion and stabilization of origins of replication, and 
assembly and modulation of the primosome, allowing 
for primer synthesis only near the origin of replication 
used in vivo. SSB aids DNA helicases as they unwind DNA. Using ATP, helicases disrupt extensive Watson–Crick base pairing. Without SSB, separated single strands would rapidly reanneal. SSB has several effects on the DNA polymerase, enhancing both polymerase–template binding and polymerase fidelity, as well as destabilizing secondary structure that would otherwise lower polymerase processivity.

RPA serves largely the same functions as SSB in replication, and protein-protein interactions are important for some of its functions. For example, formation of the priming complex in SV40 replication requires interaction between human RPA and both T-antigen and DNA polymerase α/primase. Similarly, in T4, gp32 interacts with gp43 (DNA polymerase), gp61 (primase), and gp59 (helicase loading factor), ensuring proper assembly of the replication machinery onto ssDNA and successful replication.

Figure 1 Electron micrograph of circular ssDNA bound with Escherichia coli SSB. At the right-hand side are two circular ssDNA molecules bound with SSB protein. At the left-hand side, a linear double-stranded molecule of the same sequence as the circular ssDNA is shown for comparison. SSB causes a large apparent decrease in the contour length of the DNA due to the wrapping of DNA around the SSB homotetramer. (DNA is from φX174. Samples were cross-linked and spread using cytochrome c grids: courtesy of Ross B. Inman, University of Wisconsin–Madison.)


Recombination 

In recombination, as in replication, a necessary pre- 
requisite to the process is ssDNA. In homologous 
recombination in E. coli, the RecA protein coats the 
ssDNA, searches for homologous DNA, and facili- 
tates a strand switch, creating a heteroduplex DNA 
and a displaced single strand. SSB is required for com- 
plete binding of RecA to ssDNA, by virtue of its 
ability to melt out secondary structure as described 
above. However, SSB and RecA compete for binding sites on ssDNA. Although SSB inhibits the nucleation of RecA onto ssDNA, once a RecA monomer is bound, other monomers bind cooperatively, displace SSB, and coat the ssDNA. The cooperativity of SSB binding must be low enough to allow RecA to displace it. If the
ssDNA is precoated with SSB, RecO, in a complex 
with RecR, physically interacts with SSB to allow 
RecA binding. 

In addition to aiding the binding of RecA, SSB 
participates in the formation of certain ssDNA sites 
for recombination. It modulates the activity of the 
RecBCD helicase/nuclease, which generates ssDNA 
regions from double-strand breaks and loads RecA 
onto the appropriate ssDNA. Finally, after the initi- 
ation of DNA strand exchange, SSB aids the reaction 
by binding to the displaced strand, preventing reiniti- 
ation of strand exchange that could lead to extended 
DNA networks. 

RPA and gp32 also serve to facilitate complete 
binding of the cognate strand exchange proteins 
Rad51 and UvsX, respectively. As in E. coli, mediator 
proteins aid in binding of Rad51 and UvsX to ssDNA 
coated with SSBs. This mediation is species specific: one mediator, UvsY, is known in T4, and two, Rad52 and Rad55/57, are known in yeast.


Repair 

SSBs participate in DNA repair processes, including 
mismatch repair, nucleotide excision repair, base exci- 
sion repair, and recombinational repair, as outlined 
above. Well-studied examples are the involvement of 
SSB in mismatch repair and RPA in nucleotide exci- 
sion repair. 

Base pair mismatches that arise occasionally during 
replication are subsequently repaired in a process that 
takes advantage of the fact that newly synthesized 
DNA in some organisms is undermethylated. In 
reconstituted reactions in vitro, methyl-directed 
mismatch repair requires SSB for DNA helicase II- 
mediated unwinding of DNA, stimulation of exonu- 
cleolytic excision of the strand containing the error, 
and synthesis of a complementary strand. 

Human RPA plays several roles in nucleotide exci- 
sion repair. RPA and the XPA protein initially sense 
the DNA damage and bind to it. Other factors subse- 
quently bind, and the damaged DNA is cleaved and 
excised. RPA also stabilizes the ssDNA gap prior to 
DNA synthesis. 


Summary 


Although SSBs lack enzymatic activity, they are essen- 
tial for DNA metabolism in the cell. It is rare for one 
protein to have key roles in processes with very dif- 
ferent mechanisms, as SSBs do; however, these pro- 
cesses are linked by the involvement of ssDNA. Not 
only do SSBs protect ssDNA and remove secondary 
structure, they also guide the assembly of the machin- 
ery required for DNA metabolism. 


See also: Excision Repair; Genetic Recombination; 
Mismatch Repair (Long/Short Patch); 
Recombinational Repair; Replication 
Sister Chromatid
A sister chromatid is one of the two chromatids that make up a replicated chromosome; each homolog in a bivalent consists of two such sister chromatids. Both chromatids are semiconservative copies produced by replication of the original chromosome.


See also: Chromatid; Chromosome; 
Semiconservative Replication 


Site-Directed Mutagenesis 


See: In vitro Mutagenesis 
Site-Specific Recombination
Site-specific recombination describes a variety of specialized recombination processes that involve reciprocal exchange between defined DNA sites. In its strictest definition, site-specific recombination
involves: (1) two DNA partners, (2) a specialized 
recombinase protein that is responsible for recogniz- 
ing the sites and breaking and rejoining the DNA, and 
(3) a mechanism that involves DNA breakage and 
reunion with conservation of the phosphodiester 
bond energy (i.e., lacking a requirement for either 
DNA synthesis or a high-energy nucleotide cofactor). 
A consequence of these features is that site-specific 
recombination is not dependent on the cellular 
machinery for homologous recombination. The pro- 
totypes of site-specific recombination (thus defined) 
are the integration of bacteriophage lambda into the 
Escherichia coli chromosome (see Phage λ Integration
and Excision), the resolution of cointegrates derived 
from transposition of Tn3-related transposons (see 
Resolvase-Mediated Deletion), and the DNA inver- 
sions responsible for flagellar phase variation in Sal- 
monella (see Hin/Gin-Mediated Site-Specific DNA 
Inversion). The strict definition excludes several other 
specialized recombination processes that have, on occa- 
sion, been described as ‘site-specific’; these include 
VDJ joining catalyzed by the RAG1/2 proteins during 
the development of the immune system (see Integra- 
tion, T Cell Receptor Gene Family); most DNA 
transposition events (even when a specific target site 
is used) including integration of retroviral cDNA (see 
Retrotransposons, Retroviruses); and the ‘homing’ of 
mobile introns (or inteins) (see Intron Homing). 


Structural Consequences of Site-Specific 
Recombination 


Recombination sites are naturally polar and recombination respects that polarity, always joining left halves to right halves (or, as shown in Figure 1, arrowheads to tails). Depending on the initial arrangement of the parental recombination sites, site-specific recombination has one of three possible outcomes: integration, excision, or inversion (Figure 1). Integration results from recombination between sites on separate DNA molecules (provided at least one of the parental chromosomes is circular) and occurs with a uniquely defined orientation. For sites located on the same chromosome, the outcome is determined by their relative orientation. Thus, excision results from recombination between sites in a head-to-tail orientation, while inversion results from exchange between inverted (head-to-head) sites.

Figure 1 The three consequences of site-specific recombination. The recognition sites for the site-specific recombinase are represented by the broad black-and-white arrows.
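The three outcomes described above (integration, excision, inversion) follow purely from how the two sites are arranged, which a small toy model can make explicit. In this hypothetical sketch a DNA molecule is a list of (label, orientation) tokens, recombination sites carry the invented label SITE, and sequence, topology, and the recombinase itself are abstracted away; a circle is stored as a list opened at an arbitrary point. Sites in the same orientation (head-to-tail) give excision, sites in opposite orientations (head-to-head) give inversion, and integration of a circle at a single site occurs in one defined orientation.

```python
from typing import List, Tuple

# Hypothetical toy model: a molecule is a list of (label, orientation) tokens,
# orientation +1 or -1. Tokens labeled SITE are recombination sites; any other
# label stands for an ordinary DNA segment.
Token = Tuple[str, int]
SITE = "site"


def flip(segment: List[Token]) -> List[Token]:
    """Reverse a segment and flip every element, as an inversion does."""
    return [(label, -o) for label, o in reversed(segment)]


def recombine_same_molecule(dna: List[Token]) -> Tuple[str, List[List[Token]]]:
    """Recombine the two sites on one linear molecule (raises if not exactly two)."""
    i, j = [n for n, (label, _) in enumerate(dna) if label == SITE]
    left, middle, right = dna[:i], dna[i + 1:j], dna[j + 1:]
    if dna[i][1] == dna[j][1]:
        # Head-to-tail (direct) sites: the intervening DNA leaves as a circle,
        # and each product keeps one site.
        return "excision", [left + [dna[i]] + right, middle + [dna[j]]]
    # Head-to-head (inverted) sites: the intervening DNA is inverted in place.
    return "inversion", [left + [dna[i]] + flip(middle) + [dna[j]] + right]


def integrate(linear: List[Token], circle: List[Token]) -> List[Token]:
    """Integrate a circle carrying one site into a molecule carrying one site."""
    i = next(n for n, (label, _) in enumerate(linear) if label == SITE)
    k = next(n for n, (label, _) in enumerate(circle) if label == SITE)
    if circle[k][1] != linear[i][1]:
        # A circle can be read from either strand; pick the reading that aligns
        # the two sites, so the product always has one defined orientation.
        circle, k = flip(circle), len(circle) - 1 - k
    insert = circle[k + 1:] + circle[:k]  # open the circle at its site
    return linear[:i] + [linear[i]] + insert + [linear[i]] + linear[i + 1:]


if __name__ == "__main__":
    A, B, C, D = ("A", 1), ("B", 1), ("C", 1), ("D", 1)
    fwd, rev = (SITE, 1), (SITE, -1)
    kind, (chrom, circ) = recombine_same_molecule([A, fwd, B, C, fwd, D])
    print(kind, chrom, circ)
    print(integrate(chrom, circ))  # the original molecule is restored
    print(recombine_same_molecule([A, fwd, B, C, rev, D]))
```

In this model, excising and then reintegrating, or inverting twice, returns the starting molecule; whether a real system runs in both directions depends on the accessory factors described later in this entry.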


Biological Consequences of Site-Specific 
Recombination 


The three structural outcomes are used for a wide variety of purposes in biological systems and a number of examples are shown in Table 1. Most commonly, use of
site-specific recombination by an organism or a genet- 
ic element is driven by a primary need to physically 
join or separate DNA segments. However, it is also 
used as a means of activating or switching gene expres- 
sion, and generating genetic diversity through the 
acquisition of advantageous genes or gene segments. 

The first category of uses is dominated by three com- 
mon biological processes. (1) The integration of bac- 
teriophage chromosomes into (and their excision from) 
the chromosome of their host for the reversible forma- 
tion of lysogens. This phage strategy is also used by a 
few classes of transposons, exemplified on the one hand 
by Tn916 and Tn1545, and on the other by Tn4451 
(see Conjugative Transposition and Table 1). (2) The
reduction to a monomeric state of dimers of a variety 
of circular chromosomes including both plasmids and 
bacterial chromosomes, to allow separation and cor- 
rect segregation into daughter cells (see Chromosome 
Dimer Resolution by Site-Specific Recombination). 
(3) Cointegrate resolution: the irreversible excision 
of the transposon-donor vector replicon from the 
cointegrate intermediate formed during transposition 
of elements such as Tn3, to regenerate the transposon 
donor and produce a simple insertion of the Tn in the 
target DNA (see Resolvase-Mediated Deletion). 

In the second category of uses, the primary purpose 
of the recombination is to juxtapose alternative
DNA sequences in ways that affect their expression
or coding potential. Inversions provide a relatively 
common means to achieve these goals in a reversible 
manner. For example, the inversions mediated by 
FimE, FimB, and Hin flip the orientation of a tran- 
scriptional promoter, switching adjacent genes off and 
on (see Hin/Gin-Mediated Site-Specific DNA Inver- 
sion). The inversions mediated by Gin, Cin, Rci, and 
Piv bring alternate coding sequences to the down- 
stream segment of an expressed gene, changing the 
C-terminal portion of the encoded protein in ways 
that affect its activity or antigenicity (see Alternation 
of Gene Expression, Hin/Gin-Mediated Site-Specific 
DNA Inversion). Deletions can also be used to affect 
gene expression; for example the XisA, XisC, and 
XisF functions in Anabaena, and CisA (SpoIVCA) 
in Bacillus subtilis delete large DNA segments that 
split specific genes into two inactive portions, to create 
an active gene fusion (see Gene Rearrangements, Pro- 
karyotic). In these cases, the change is irreversible and 
is part of a developmentally regulated pathway leading 
to a terminally differentiated cell type. Finally, a system
that combines features of both categories is the acqui- 
sition of mobile gene cassettes mediated by the IntI 
activities of integrons (see Integrons, Gene Cassettes). 
Integration of a cassette is required both for cassette 
survival and for its expression, which generally occurs 
from a promoter adjacent to the insertion site. 


The Mechanism of Site-Specific 
Recombination: An Overview 


The process of site-specific recombination can be div- 
ided into a series of conceptually simple steps. The re- 
combinase binds to the two recombination sites. The 
two recombinase-bound sites pair, forming a synaptic 
complex with crossover sites juxtaposed. The recombi- 
nase then catalyzes cleavage and rejoining of the DNA 
within the synaptic complex. Finally, the synaptic com- 
plex breaks down, releasing the recombinant products. 

From this description it follows that the minimal components of a site-specific recombination system are a recombinase and a pair of recombination sites. The simplest sites are short duplex DNA segments, 20 to 30 base pairs in length, that contain an inverted pair of recognition sequences and bind one dimer (or two monomers) of the recombinase. Such sites contain at their center the point of DNA breakage and joining, and are often referred to as the crossover sites. In nature, however, most recombination sites are more complicated, containing not only a crossover site but additional sequences spanning 100 or more base pairs. Such a complex site may operate in combination with a simple crossover site or with another complex partner. The extra DNA contains additional sites of protein recognition and may bind more copies of the recombinase or other protein factors encoded by the host or the genetic element (e.g., phage or transposon) associated with the recombination system. The purpose of these additional DNA-bound proteins may be regulatory, structural, or both. They may initiate or stabilize the pairing of recombination sites, or inhibit inappropriate pairings; they may deliver recombinase catalytic domains to the crossover site; and they may determine the directionality of recombination (for example, promoting deletion but preventing inversion, or vice versa).

Table 1 Site-specific recombination: a sampling of enzymes and functions

Recombinase | Biological function

λ Integrase family
λ Int and many phage integrases | Integration and excision of phage genomes
Int of Tn916/Tn1545 | Integration and excision: ‘transposition’ of circular transposons
IntI | Integration and excision of gene cassettes in integrons
Cre | Excision: dimer reduction in phage P1 plasmids
XerCD | Excision: dimer reduction in the E. coli and many other bacterial chromosomes, and some plasmids
TnpI of Tn4430 | Excision: resolution of cointegrates resulting from transposition of Tn4430
FimB, FimE | Inversion: alternation of gene expression (fimbrial phase variation in E. coli)
Rci of R64 | Inversion of shufflon segment in plasmid R64 producing various forms of pili
Flp | Inversion: for amplification of yeast 2 µm plasmid

Resolvase family
TnpR of Tn3/γδ and related transposons | Excision: resolution of cointegrates resulting from transposition
ParA of RP4 | Excision: dimer reduction in plasmid RP4
Hin | Inversion: alternation of gene expression (flagellar phase variation) in Salmonella
Gin, Cin | Inversion: alternation of gene expression (tail fiber proteins) in phages Mu and P1
Int of φC31/Sre of R4 (a) | Integration and excision of Streptomyces phages φC31 and R4
TnpX of Tn4451 (a) | Integration and excision of Tn4451 in Clostridium
SpoIVCA (CisA) (a) | Excision: for developmentally regulated gene activation in B. subtilis
XisF (a) | Excision: for developmentally regulated gene activation in Anabaena

Other classes
Piv | Inversion: alternation of gene expression (pilin phase variation) in Moraxella
XisA, XisC | Excision: for developmentally regulated gene activation in Anabaena

(a) Unusually large members of the resolvase family.
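The ‘inverted pair of recognition sequences’ in a simple crossover site can be checked mechanically: the right-hand arm should be the reverse complement of the left-hand arm, so each recognition sequence reads the same on its own strand. The sketch below assumes a site made of two arms of equal, known length around a central region; the 24-bp sequence and the helper names are invented for illustration (it is not a real recombination site), and its central region is deliberately asymmetric.

```python
# Hypothetical sketch: test whether a short crossover site is built from an
# inverted pair of recognition sequences flanking a central region where the
# strands are broken and rejoined. Sequences are upper-case A/C/G/T strings.

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the opposite strand, read 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def split_site(site: str, arm: int) -> tuple:
    """Split a site into (left arm, central region, right arm)."""
    if 2 * arm >= len(site):
        raise ValueError("arms must leave a central region")
    return site[:arm], site[arm:len(site) - arm], site[len(site) - arm:]


def has_inverted_arms(site: str, arm: int) -> bool:
    """True if the right arm is the reverse complement of the left arm."""
    left, _, right = split_site(site, arm)
    return right == reverse_complement(left)


# Invented example: 9-bp arm + 6-bp central region + 9-bp inverted arm = 24 bp.
EXAMPLE_SITE = "GATTCGCAT" + "TTAGGC" + "ATGCGAATC"

if __name__ == "__main__":
    print(len(EXAMPLE_SITE), has_inverted_arms(EXAMPLE_SITE, 9))
    print(split_site(EXAMPLE_SITE, 9))
```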


As indicated earlier, breakage and rejoining of 
DNA in site-specific recombination occurs with 
no loss or gain of nucleotides and with strict con- 
servation of phosphodiester bond energy. To achieve 
this, a mechanism analogous to that of a topoisomer- 
ase (see Topoisomerases) is used; DNA strands are 
broken not by hydrolysis but rather by direct phos- 
phoryl transfer to a side chain of the recombinase. 
This side chain, a tyrosine or a serine in all characterized cases, directly attacks the DNA sugar–phosphate
backbone at the crossover site in a transesterification 
reaction, forming a covalent recombinase-DNA inter- 
mediate on one side of the break and a free hydroxyl 
group on the other. Rejoining the DNA strands is 
accomplished by reversing the process; the free hydr- 
oxyls from one recombination partner directly attack 
the phosphodiester linkage between recombinase 
and DNA of the other partner, releasing the re- 
combinase and sealing the breaks to produce recom- 
binant products. Intriguingly, the details of the 
process differ depending on whether the recombinase 
uses a tyrosine or a serine as the attacking nucleophile 
(see below). 


The Specialized Recombinases and their 
Mechanisms of Recombination 


Despite the many and distinct roles that site-specific 
recombination plays in biology and the large number 
of systems that have been identified, comparisons of 
the recombinase amino acid sequences indicate that 
nearly all fall into two families. These are the integrase 
family, named after the prototypical phage lambda 
integrase, and the resolvase family, named after the 
cointegrate-resolving recombinase encoded by the transposons Tn3 and γδ. The two families are unrelated in protein sequence or structure and employ different recombinational mechanisms; each family appears to have arisen and evolved separately. Nevertheless, membership in a family does not tie a recombinase to a particular set of structural and biological consequences. Thus, although the prototypical integrase is responsible for reversible integration and excision, there is at least one integrase-related cointegrate resolvase, and other members of the family catalyze DNA inversion. Similarly, although the prototypical resolvase catalyzes irreversible excision, related enzymes catalyze inversion, or combined (i.e., reversible) integration and excision (see Table 1).


The Integrase Family: Tyrosine 
Recombinases 

Members of the integrase family all possess a tyrosine 
nucleophile in combination with a totally conserved 
set of basic amino acid residues, two arginines and 
a histidine, known as the RHR triad. These residues 
are essential for full recombinational activity. A particular feature of recombination performed by members of the integrase family is that double-strand breaks are not observed; rather, after each crossover site is nicked by the recombinase, it must be joined to its partner before the second strand can be cut. This produces a cross-strand intermediate called a “Holliday junction.”

Biochemical and structural analyses have eluci- 
dated many of the details of the recombination process 
(see Figure 2A). Within the synaptic complex the two 
crossover sites are held in antiparallel (head-to-tail) 
alignment by a tetramer of the recombinase. The initi- 
ating catalytic event is attack by a pair of diametrically 
opposed integrase subunits on one strand of each par- 
ental DNA duplex, three or four nucleotides 5’ to the 
center of the crossover site. The active site tyrosines 
link to the 3’ phosphates of each nicked strand, liber- 
ating a 5’ OH. These free ends melt away from the 
unbroken complementary strands of the parental 
duplex, and reach across to the partner duplex, form- 
ing an open square with each side composed of a short 
single-strand segment. The 5’ OHs attack the inte- 
grase-DNA phosphotyrosine linkages, releasing the 
recombinase and forming the first recombinant joint. 
This religated intermediate, with one pair of recombin- 
ant single strands and one pair of parental strands, is the 
classical Holliday junction: two homologous duplex 
DNAs connected by a pair of reciprocal single-strand 
exchanges. The second set of single-strand exchanges, 
necessary to complete the recombination, occurs in a 
similar fashion. The other pair of opposed integrase 
subunits cleaves the unexchanged parental strands, 
the freed 5’ OH ends again reach across to their partners (forming a heteroduplex with the single-strand segment initially exchanged; see Figure 2A) and initiate the fourth and final pair of phosphoryl transfers.


The Resolvase Family: Serine Recombinases 
Members of the resolvase family all contain a serine nucleophile in a short, conserved stretch of amino acid residues near the N-terminus of the protein. The serine plus several other conserved residues, including three arginines, are essential for recombination activity. A defining feature of recombination by members of this family that distinguishes them from integrase-related recombinases is the formation of double-strand breaks at both crossover sites; all strands are broken before any exchange is initiated.

Figure 2 The processes of DNA strand exchange in site-specific recombination. (A) Mechanism of integrase-related recombinases; (B) mechanism of resolvase-related recombinases. YOH represents each tyrosine nucleophile, SOH represents the serine nucleophile, and P is the phosphate at the recombinase cleavage site. DNA 5’ and 3’ ends are represented by the terminal circles and arrowheads, respectively.

As with the integrase family, strand exchange 
occurs within a synaptic complex, containing the 
paired recombinase-bound crossover sites. However, 
the organization of this complex and, in particular, the 
movements of DNA ends or the recombinase that 
effect strand exchange remain a mystery since no 
structures of the complex have yet been solved. 
Synapsis triggers the catalytic activity of all four 
recombinase subunits bound to the crossover sites 
(Figure 2B). The serine nucleophiles attack both 
strands of the two parental DNAs, at the phosphates 
positioned one nucleotide 3’ to the center of the cross- 
over sites. This creates staggered breaks with a 3’ 
single-strand extension of two nucleotides terminated 
with a 3’ OH, and a recessed 5’ phosphate covalently 
linked to the recombinase via the active site serine 
residue. Without dissociation of the complex, the ends 
somehow reassort from a parental to a recombinant 
configuration, so that attack by the free 3’ OHs on the 
phosphoserine linkages produces recombinant products (and releases the recombinase).
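The contrast between the two families can be made concrete with a toy bookkeeping model. This is a hypothetical sketch, not a description of either enzyme's real geometry: each strand is reduced to a (left half, right half) pair relative to the crossover point, strand polarity, the stagger of the cuts, and the covalent protein–DNA intermediates are ignored, and the strand names top1, bot1, top2, bot2 are invented labels. It shows that the integrase-like route (two sequential pairs of single-strand exchanges through a Holliday junction) and the resolvase-like route (all four strands cut before any exchange) reach the same recombinant products, but differ in how many strands of one duplex are broken at the same time.

```python
from typing import Dict, List, Tuple

# Hypothetical toy model: each strand is (left half, right half) relative to the
# crossover point; upper case marks top strands, lower case bottom strands.
Strand = Tuple[str, str]
State = Dict[str, Strand]

PARENTS: State = {
    "top1": ("A", "B"), "bot1": ("a", "b"),  # parental duplex 1
    "top2": ("C", "D"), "bot2": ("c", "d"),  # parental duplex 2
}

# Each round lists the strand pairs that are cut together, then rejoined crosswise.
INTEGRASE_ROUNDS = [[("top1", "top2")], [("bot1", "bot2")]]
RESOLVASE_ROUNDS = [[("top1", "top2"), ("bot1", "bot2")]]


def exchange(state: State, s1: str, s2: str) -> State:
    """Cut strands s1 and s2 at the crossover point and rejoin them crosswise."""
    new = dict(state)
    (l1, r1), (l2, r2) = state[s1], state[s2]
    new[s1], new[s2] = (l1, r2), (l2, r1)
    return new


def run(state: State, rounds: List[List[Tuple[str, str]]]):
    """Apply the rounds in order; return the final state, the state after each
    round, and the most strands of any one duplex that were broken at once."""
    intermediates, worst = [], 0
    for pairs in rounds:
        cut = {name for pair in pairs for name in pair}
        for duplex in ("1", "2"):
            worst = max(worst, sum(name.endswith(duplex) for name in cut))
        for s1, s2 in pairs:
            state = exchange(state, s1, s2)
        intermediates.append(state)
    return state, intermediates, worst


if __name__ == "__main__":
    final_int, steps_int, worst_int = run(PARENTS, INTEGRASE_ROUNDS)
    final_res, steps_res, worst_res = run(PARENTS, RESOLVASE_ROUNDS)
    print(steps_int[0])            # Holliday junction: tops exchanged, bottoms parental
    print(final_int == final_res)  # same recombinant products either way
    print(worst_int, worst_res)    # nicks only vs a double-strand break per duplex
```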


Other Classes of Recombinase 

A few site-specific recombinases, listed at the end of
Table 1, appear to be unrelated to members of either
of the two large families. Moreover, Piv is unrelated
to XisA and XisC, suggesting that there are at least 
two classes of recombinase that await further charac- 
terization. Since almost nothing is yet known about 
these recombinases, they will not be discussed further 
here. 
c-Ski is a nuclear protein with transforming and myogenesis-promoting activities. It is the cellular homolog of the v-Ski oncoprotein that is responsible for
transformation induced by the SKV avian carcinoma 
virus. v-Ski, which is produced in infected avian cells 
as a fusion with N-terminal gag sequences, can in- 
duce cell proliferation, morphological transformation, 
anchorage-independent growth, as well as myoblast 
differentiation. Ski is a DNA-binding protein that in- 
duces expression of muscle-specific genes, and which, 
under some circumstances, such as downstream of TGF-β or nuclear hormone receptor signaling, can act as a transcriptional corepressor by recruiting histone deacetylases to transcription complexes. v-Ski acts in a dominant-negative manner to inhibit transcriptional repression by pRb. Studies of mice lacking the c-ski gene have established its role in the expansion of neuroepithelial and skeletal muscle precursors during development. In vitro, v-Ski can also induce self-renewal of primary avian hematopoietic progenitors and arrest hematopoietic cell differentiation. c-Ski and
the related protein Sno have been detected in human 
tumor cell lines. 


See also: Transformation 
Smith, Hamilton O.
Hamilton O. Smith (1931- ), co-recipient of the 1978 Nobel Prize in Physiology or Medicine, shared the prize with Werner Arber (1929- ) and Daniel Nathans (1928-1999). The Nobel Foundation honored these scientists “for the discovery of restriction enzymes and their application to problems of molecular genetics.”

Born in New York City in 1931, Hamilton 
Smith did much of his schooling in Urbana, a small 
Midwestern town, where his father was Professor of
Education at the University of Illinois. Although
Hamilton majored in mathematics at the University 
of Illinois, he developed an avid interest in biological 
subjects, and in 1952 entered the Johns Hopkins Med- 
ical School. Four years later, he earned his MD degree, 
and in 1962 was awarded a postdoctoral fellowship 
from the National Institutes of Health to pursue 
genetics research with Myron Levine at the University 
of Michigan in Ann Arbor. 

Smith’s initial work was on bacteriophages, viruses that infect bacteria. Specifically, he studied phage P22, which infects Salmonella typhimurium. Smith focused his studies on lysogeny, a phenomenon in which the phage genome persists inside the host cell and is replicated along with it, without harming the host.

By the mid-1960s, Werner Arber, the Swiss scientist, had discovered restriction enzymes, the ‘chemical knives’ that cut DNA molecules into discrete,
smaller segments by acting upon specific chemical 
sites. In 1966 Smith learned of Arber’s remarkable 
work on restriction enzymes responsible for cutting 
DNA, and modification enzymes that prevented harm 
to the host DNA. Smith realized the relevance of these 
discoveries to his own work, and pursued research on 
the study of specific restriction enzymes. 

In 1970, Hamilton Smith published two classic 
papers describing the first restriction enzyme from a 
common bacterium Haemophilus influenzae. He also 
characterized in detail the mechanism of enzyme 
action. The restriction enzyme from H. influenzae 
degraded foreign DNA to fragments of 1000 bp with- 
out affecting the DNA of the host bacterium. Smith 
showed that all fragments had the same four base pair 
sequence at each end. The enzyme he had discovered 
cleaved DNA at specific sequences of 6 bp. Thus, not 
only was Arber’s discovery confirmed,
but also the biochemical basis of enzyme action was 
elucidated. Later work confirmed that restriction 
enzymes recognize symmetrical base pair sequences 
and cleave DNA wherever these sequences occur. 
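Recognition of a symmetrical site, and cleavage wherever it occurs, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: sequences are upper-case A/C/G/T strings, only the given strand is cut, so the staggered ends real enzymes can leave are not represented, and EcoRI (recognition site GAATTC, cut between G and A) is used as a familiar stand-in rather than the H. influenzae enzyme described above. The test sequence and function names are invented.

```python
# Minimal sketch: symmetrical (palindromic) recognition sites and the fragments
# produced by cutting wherever the site occurs. EcoRI is only an example enzyme.

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the sequence of the opposite strand, read 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def is_symmetrical(site: str) -> bool:
    """A site is symmetrical if it reads the same on both strands."""
    return site == reverse_complement(site)


def find_sites(dna: str, site: str) -> list:
    """Return every start position of `site` in `dna`, including overlapping hits."""
    hits, start = [], dna.find(site)
    while start != -1:
        hits.append(start)
        start = dna.find(site, start + 1)
    return hits


def digest(dna: str, site: str, cut_offset: int) -> list:
    """Cut a linear strand `cut_offset` bases into each occurrence of `site`."""
    cuts = [pos + cut_offset for pos in find_sites(dna, site)]
    bounds = [0] + cuts + [len(dna)]
    return [dna[a:b] for a, b in zip(bounds, bounds[1:])]


if __name__ == "__main__":
    dna = "AAGAATTCTTGAATTCA"  # invented test sequence with two EcoRI sites
    print(is_symmetrical("GAATTC"))
    print(find_sites(dna, "GAATTC"))
    print(digest(dna, "GAATTC", 1))
```

Because the site is its own reverse complement, searching one strand is enough to find every occurrence on both strands.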

Daniel Nathans, a colleague of Hamilton Smith at 
Johns Hopkins University, pioneered practical appli- 
cations for restriction enzymes. In his classic 1971 
paper, Nathans reported that the restriction enzyme 
discovered by Smith cleaved the small DNA molecule 
from a simian virus called SV40 into 11 fragments. 
Using innovative methods of cleaving and mapping, 
Nathans later reported the complete genetic map of SV40 DNA, the first DNA map obtained by a chemical method.

Nathans’ approach was refined by others enabling 
mapping of increasingly complex DNA structures, 
including those in human chromosomes. These developments led to the formulation of the basic tenets of genetic engineering, in which restriction enzymes began to be used to determine the order of genes in chromosomes and to manufacture ‘designer genes.’

The discoveries of Arber, Smith, and Nathans also 
influenced all of modern molecular genetics and much 
of the biological sciences. The knowledge and applica- 
tions so derived are being used to this day in the study 
of evolutionary biology, the Human Genome Project, 
the discovery of the biochemical basis of hundreds of 
human diseases, and gene therapy for many diseases 
including malignancies. 

From childhood, Hamilton Smith’s life was filled with
intense intellectual and scholarly pursuits. In
school he studied French, played electronic games,
and participated in football and basketball. With his
brother, he collected an assortment of chemistry and
electronics paraphernalia in the basement of their
house, where they set up recreational scientific
experiments. Smith played the piano, but claims that
he was “in no way gifted” at it. However, when he was
13, he heard a recording of Beethoven’s Pathétique
Sonata performed by Artur Rubinstein; this awakened
in Smith a lifelong passion for the dramatic beauty
of classical music.


See also: Arber, Werner; Bacteriophages; Genetic 
Recombination; Nathans, Daniel; Restriction 
Endonuclease 
The conceptual basis for the development of congenic
mice was formulated by George Snell at the Jackson
Laboratory during the 1940s, and it led to the first
and only Nobel Prize for work strictly in the field of
mouse genetics. Snell was interested in the problem
of tissue transplantation. Long before 1944, it was 
known that tissues could be readily transplanted 
between individuals of the same inbred strain without 
immunological rejection, but that mice of different 
strains would reject tissue transplants from each 
other. Although these observations clearly indicated
that genetic differences were responsible for tissue
rejection, the number and types of genes involved
remained entirely unknown. In absentia, these genes
were named histocompatibility (or H) loci. The
assumption was that the histocompatibility genes
were responsible, directly or indirectly, for the
production of tissue (or ‘histological’) markers
that could be distinguished as ‘self’ or ‘nonself’ by an
animal’s immune system. If transplanted tissue and a
host recipient carried identical genotypes at all H loci,
there would be no immunological response and the
transplant would ‘take.’ However, if a single foreign
allele at any H locus was present in the tissue, it would
be recognized as foreign and attacked.

Although the number of histocompatibility loci 
was unknown, it was assumed to be large because of 
the rarity with which unrelated individuals — both 
mice and humans — accept each other’s tissues. The 
logic behind this assumption was the empirical finding 
that polymorphic loci are most often diallelic and not 
usually associated with more than three common 
alleles. If H loci showed a similar level of polymorph- 
ism, a large number would be required to ensure that 
there would almost always be at least one allelic dif- 
ference between any two unrelated individuals. The 
experimental problem was to identify and characterize 
each of the histocompatibility loci in isolation from all 
of the others. 
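The argument that many diallelic H loci would be needed can be made quantitative with a simple model. The sketch below assumes Hardy-Weinberg genotype frequencies, independent loci, and the rule stated above that a graft is rejected whenever it carries an allele absent from the recipient; none of these numbers come from Snell’s work, and the function names are hypothetical. Even with maximal diallelic polymorphism (p = 0.5), one locus permits acceptance 62.5% of the time, so about 10 loci are needed before acceptance between random individuals falls to 1% or less, and more are needed when allele frequencies are skewed.

```python
import math

# Hypothetical model: alleles A (frequency p) and a at each H locus,
# Hardy-Weinberg genotype frequencies, loci independent of one another.
_ALLELES = {"AA": {"A"}, "Aa": {"A", "a"}, "aa": {"a"}}


def genotype_freqs(p: float) -> dict[str, float]:
    q = 1.0 - p
    return {"AA": p * p, "Aa": 2 * p * q, "aa": q * q}


def locus_acceptance(p: float) -> float:
    """P(a random donor carries no allele absent from a random recipient)."""
    freqs = genotype_freqs(p)
    return sum(
        freqs[donor] * freqs[recipient]
        for donor in freqs
        for recipient in freqs
        if _ALLELES[donor] <= _ALLELES[recipient]  # no foreign allele in graft
    )


def graft_acceptance(p: float, n_loci: int) -> float:
    """Acceptance probability when every one of n_loci loci must be compatible."""
    return locus_acceptance(p) ** n_loci


def loci_needed(p: float, max_acceptance: float) -> int:
    """Smallest number of loci bringing acceptance to max_acceptance or below."""
    return math.ceil(math.log(max_acceptance) / math.log(locus_acceptance(p)))
```

Under these assumptions, `loci_needed(0.5, 0.01)` gives 10, while `loci_needed(0.9, 0.01)` gives 26.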

Snell’s approach to this problem was to use a novel 
multigeneration breeding protocol based on repeated 
backcrossing to trap a single H locus from one mouse 
strain (the donor) in the genetic background of 
another (the inbred partner). The basic approach 
caused the newly forming congenic strain to become 
increasingly similar to the inbred partner at each 
generation, but only those offspring who remained 
histoincompatible with the inbred partner were 
selected to participate in the next round of backcross- 
ing. It was assumed that a difference at any one H 
locus would be sufficient to allow full histoincompati- 
bility. Thus, at the end of the process, Snell expected to 
find that each independently derived congenic line 
would have trapped the donor strain allele at a single 
random H locus. With random selection, all H loci 
could be isolated in different congenic strains so long 
as a large enough number were generated. 
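The effect of repeated backcrossing with selection can be sketched numerically. In the hedged model below, generation N1 is the F1 (every locus heterozygous, half the genome from the donor), each further backcross to the inbred partner halves the donor share at unselected loci, and selection is represented by always keeping one chosen locus heterozygous, standing in for keeping only histoincompatible offspring. Loci are treated as unlinked, so the donor segment that in reality stays attached to the selected locus is not modelled; generation counts and names are illustrative assumptions, not Snell’s protocol details.

```python
import random
from typing import Optional


def expected_donor_share(generation: int) -> float:
    """Expected donor share of the genome at unselected, unlinked loci in N_n."""
    if generation < 1:
        raise ValueError("generations are numbered from N1 (the F1)")
    return 0.5 ** generation  # N1 = 0.5, then halved by each backcross


def simulate_backcross(n_loci: int, generation: int, selected: int = 0,
                       seed: Optional[int] = None) -> list[bool]:
    """Return, per locus, whether it is still heterozygous (donor/partner) in N_n.

    The `selected` locus is always kept heterozygous, mimicking selection
    for histoincompatibility; every other locus loses its donor allele
    with probability 1/2 at each backcross.
    """
    rng = random.Random(seed)
    het = [True] * n_loci              # F1: one donor allele at every locus
    for _ in range(generation - 1):     # one backcross per later generation
        het = [h and (i == selected or rng.random() < 0.5)
               for i, h in enumerate(het)]
    return het
```

In a run of `simulate_backcross(1000, 12, seed=1)`, the selected locus remains heterozygous while, on average, only about 0.5 of the other 999 loci still carry a donor allele.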

With this outcome in mind, Snell began the produc- 
tion of histoincompatible congenic strains (originally 
called ‘congenic resistant’ strains) with 125 independ- 
ent lines of matings. Of these, 27 were carried through 
to the point at which it was possible to determine 
which H locus had been trapped. Surprisingly, 22 of 
the 27 lines had trapped the same locus, which was 
given the name H-2 (by chance, it was the second one 
identified). Contrary to expectations, the H-2 locus
(now called the H2 complex since it is known to be a
tightly linked complex of genes) acts, for all practical
purposes, as the only strong determinant of histocom-
patibility. Snell and his predecessors were misled by
the false assumption that only a limited number of
alleles are possible at any one locus. Instead, a subset
of genes within the H2 complex (known as the class I
genes) is the most polymorphic in the genome, with
hundreds of alleles at each individual locus. The gen-
eric term ‘major histocompatibility complex’ (MHC)
is now used to designate this complex locus in mice as
well as its homolog in all other mammalian species
including humans, where it was historically called
HLA.


See also: Major Histocompatibility Complex 
(MHC) 


SNPs 


See: Single Nucleotide Polymorphisms (SNPs) 
snRNAs (small nuclear RNAs) are an abundant class
of RNA found in the nucleus of eukaryotes. Several of 
the snRNAs are involved in splicing or other RNA 
processing reactions. They are generally about 100- 
300 nucleotides long; most are found in complexes 
with proteins. 


See also: Nucleus 


snRNPs 
snRNPs (small nuclear ribonucleoproteins) are
snRNAs associated with proteins. 


See also: Nucleus 


Solanum tuberosum 
(Potato) 
The potato (Solanum tuberosum subsp. tuberosum) is
the fourth most important crop for human nutrition in
the world. Potatoes grow under different climatic
conditions. The world potato area in 1998 was
17 949 000 hectares, and the amount produced was
295 632 000 t (FAO, 1998). The potato yield, with a
world average of 16 t ha⁻¹, ranges from 5–8 t ha⁻¹ in
some developing countries to 40 t ha⁻¹ and more in
developed countries. In the last 10 years, potato
production has increased at an average of 4.5% per
year and the area planted has increased by 2.4%
(Zandstra, 1999).
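The FAO figures above are internally consistent, which a quick calculation confirms: dividing production by area gives about 16.5 t ha⁻¹, matching the quoted world average of 16 t ha⁻¹. The sketch also shows what 4.5% annual growth means over a decade, under the assumption (not stated in the source) that the rate compounds yearly.

```python
# Figures as quoted in the text (FAO, 1998; Zandstra, 1999).
AREA_HA = 17_949_000
PRODUCTION_T = 295_632_000

average_yield = PRODUCTION_T / AREA_HA  # tonnes per hectare, about 16.5


def compound_growth(rate_per_year: float, years: int) -> float:
    """Growth factor after `years` at a constant, yearly compounded rate."""
    return (1.0 + rate_per_year) ** years


production_factor = compound_growth(0.045, 10)  # about 1.55
```

On that assumption, production in the final year of the decade would be roughly 55% higher than at its start.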

The potato is food for humans and in some regions 
for animals, and raw material for the food-processing 
(e.g., potato chips, French fries, dried potatoes) and 
starch industries. Developing countries increasingly
recognize the opportunities for potato production.
Whereas in developed countries with temperate
climates the potato is increasingly used as raw
material for the food industry, in developing
countries it is becoming more important for its
original use in human nourishment. The main
advantages of the potato are its high yield potential
in a short growing time, the high edible dry-matter
content of its tubers, and its high dietary value as a
staple food. Potato tubers are rich in starch (10–20%)
and contain protein of high biological value (2%),
ascorbic acid (17 mg per 100 g edible portion), 2.5%
roughage, and 1% mineral substances (K, Mg, P, Mn).

The potato is native to the southwestern United
States and the whole of Central and South America,
with centers of genetic diversity in the Andean regions
of Peru and Bolivia and in Mexico. According to
Hawkes (1990), there are 228 wild potato species
(tuber-bearing species of the huge genus Solanum).
Cytologically, the potato species form a polyploid
series with the basic chromosome number x = 12,
ranging from 2x to 6x: diploids have 2n = 24 (most
wild species are diploid), triploids 2n = 36,
tetraploids 2n = 48 (the group to which S. tuberosum
subsp. tuberosum belongs), pentaploids 2n = 60, and
hexaploids 2n = 72 chromosomes. Each wild species
comprises more or less numerous accessions that vary
in important traits (e.g., disease resistance).

The different wild potato species are part of the
natural plant associations in extraordinarily diverse
ecological regions of their native habitats (e.g., high
altitudes of 3500–4500 m; hot, dry semidesert
conditions; wet mountain rainforests). As a result,
many species are adapted to widely varying
environmental conditions and to many kinds of
abiotic (frost, heat, drought) or biotic (diseases,
pests) stress factors. This makes wild potato species
immensely useful for potato breeding. The tubers
(modified stems, not roots) of wild species are
generally small and grow on long stolons, 30 cm or
more from the mother plant.




History 


Between 7000 and 10 000 years ago, in the central
Andean regions of present-day Peru and Bolivia,
native peoples began to select some species for
human use. Hawkes (1990) describes the evolutionary
relations between the resulting seven cultivated
species and their probable wild relatives. The
cultivated species are S. ajanhuiri (2x), S. chaucha
(3x), S. curtilobum (5x), S. juzepczukii (3x), S. phureja
(2x), S. stenotomum (2x), and S. tuberosum (4x) with
the subspecies andigena and tuberosum. These species
are grown in many different countries in South
America. The most important is the tetraploid
S. tuberosum subsp. andigena, which was introduced
into Europe in the sixteenth century after the Spanish
conquest of America. After adaptation and initial
simple selection of offspring from selfed berries and
crosses, systematic breeding from the middle of the
nineteenth century improved disease resistance
(especially to Phytophthora infestans, late blight),
resulting in the present long-day-adapted European
and North American cultivars of S. tuberosum subsp.
tuberosum. Evidence for this theory was provided by
the “Neotuberosum program,” which was carried out
in the UK, USA, The Netherlands, and Canada
(Bradshaw and Mackay, 1994). A similar process
happened thousands of years ago in South America:
S. tuberosum subsp. andigena migrated from present-
day Peru and Bolivia into present-day Chile, where,
under the same day length as in Europe, the
subspecies tuberosum also originated, much earlier
than in Europe or North America.


Potato Breeding 


The traditional method for clonally propagated
potatoes is combination breeding by means of sexual
hybridization (crossing) of suitable parents at the
tetraploid level. The combining ability of the parents
is very important for a successful cross. This method
has produced a huge number of valuable cultivars,
but it has sometimes also led to a narrow gene pool
and inbreeding because the parents were related. In
the twentieth century, the introduction of wild and
other cultivated species and the use of their gene
pools became increasingly important because of the
need to improve certain traits (e.g., resistance to late
blight, virus diseases, and nematodes) (Ross, 1986).

Accessions of wild and cultivated species have been 
collected in numerous expeditions since the 1930s 
(initiated by Vavilov’s The Theory of the Origin of 
Cultivated Plants after Darwin). This has continued 
to this day, and the specimens are stored, propagated, 
and evaluated in the potato gene banks (germplasm 
collections) of the world (Peru, Argentina, Chile,
USA, Russia, The Netherlands, Germany, and UK). 

The 4x cultivated species S. tuberosum with its two
subspecies is autotetraploid. The tetrasomic inherit-
ance of autotetraploids and the segregation of the
very often polygenic traits make potato breeding
difficult.
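Why tetrasomic inheritance complicates selection can be seen in a small enumeration. Assuming random chromosome segregation without double reduction (a simplification not stated in the text), each gamete carries 2 of the 4 homologs, so a duplex AAaa plant produces AA, Aa, and aa gametes in a 1:4:1 ratio. Selfing it yields recessive aaaa plants in only 1 of 36 offspring, compared with 1 in 4 for a selfed diploid Aa. Allele symbols and function names are illustrative.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations


def gametes(genotype: str) -> dict[str, Fraction]:
    """Gamete frequencies of an autotetraploid under random chromosome segregation.

    Each gamete receives 2 of the 4 homologous chromosomes, all pairs being
    equally likely; double reduction is ignored (simplifying assumption).
    """
    if len(genotype) != 4:
        raise ValueError("expected four alleles, e.g. 'AAaa'")
    counts = Counter("".join(sorted(pair)) for pair in combinations(genotype, 2))
    total = sum(counts.values())  # C(4, 2) = 6 chromosome pairs
    return {g: Fraction(n, total) for g, n in counts.items()}


def offspring(mother: str, father: str) -> dict[str, Fraction]:
    """Zygote genotype frequencies from combining the two parents' gametes."""
    result: Counter = Counter()
    for g1, f1 in gametes(mother).items():
        for g2, f2 in gametes(father).items():
            result["".join(sorted(g1 + g2))] += f1 * f2
    return dict(result)
```

For example, `offspring("AAaa", "AAaa")["aaaa"]` evaluates to `Fraction(1, 36)`, the 35:1 ratio that makes recessive traits hard to recover in tetraploid populations.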

The ability to make crosses between different
species depends on many pre- and postzygotic
barriers. The endosperm balance number (EBN)
governs the balanced development of endosperm and
embryo. Dihaploids (2n = 2x) of 4x breeding lines or
cultivars can improve the ability to cross with
diploid species. Prebreeding at the diploid level
makes the interpretation of segregation and the
selection of polygenic traits easier. Besides classic
sexual hybridization, genomes can be combined
asexually by protoplast fusion. This somatic
hybridization can be applied to species that are
impossible or very difficult to cross sexually (e.g.,
S. bulbocastanum × S. tuberosum subsp. tuberosum).

Gene mapping with molecular markers and marker-
assisted selection are important tools for making
modern potato breeding more efficient. Gene transfer
offers new prospects for the future; first steps in this
direction are being made in improving starch quality
and disease and pest resistance.

Of the more than 50 traits targeted in potato
breeding, the most important goals include high
tuber yield in different maturity groups; resistance
to diseases and pests; resistance to external tuber
damage; and quality for food processing.

The potato is threatened by numerous pathogens,
which makes resistance breeding such an important
project. Fungal diseases include Phytophthora
infestans, Fusarium spp., Synchytrium endobioticum,
Phoma foveata, Rhizoctonia solani, Helminthosporium
solani, Spongospora subterranea, Colletotrichum
coccodes, Verticillium spp., and Sclerotinia
sclerotiorum. Bacterial diseases include Erwinia spp.,
Streptomyces scabies, Clavibacter michiganensis, and
Ralstonia solanacearum; and viruses causing
infections include PVY, PLRV, PVM, PVA, PVX, and
PVC. Because clonal tuber propagation promotes the
infection and transfer of many diseases, extensive
phytosanitary measures are necessary for the
production of healthy seed tubers.

In recent decades, true potato seed (TPS) for field
cultivation has gained importance, especially in
countries with hot climates. In these regions the
production, storage, and transport of seed tubers are
difficult and expensive. Here the use of TPS has
many advantages: reduced seed costs (50–250 g
TPS per ha instead of 2 t per ha of seed tubers),
flexibility of TPS in planting time (seed tubers
undergo physiological aging and have limited
durability), and freedom from tuber-borne or
tuber-transmitted diseases (viruses, fungal, and
bacterial diseases). Important initiatives in the
practical use of TPS in developing countries have
been taken by the International Potato Center (CIP)
in Lima, Peru. Whereas clonal propagation yields
genetically identical plants with uniform tubers, TPS
progenies segregate genetically and give more
uneven produce. The main goal in TPS breeding
programs is to improve progeny uniformity while
maintaining other quality and resistance
characteristics. Approaches include inbreeding and
the use of suitable diploid or tetraploid parental
lines. TPS production uses natural open pollination,
hybrids, synthetic lines, or cytoplasmic male sterility
(CMS), this last giving rise to so-called ‘cybrids.’


References 

Bradshaw JE and Mackay GR (1994) Potato Genetics. Wallingford, 
UK: CAB International. 

FAO (1998) Production Yearbook, vol. 52: pp. 83-84. Rome: FAO. 

Hawkes JG (1990) The Potato: Evolution, Biodiversity and Genetic 
Resources. London: Belhaven Press. 

Ross H (1986) Potato Breeding: Problems and Perspectives. Berlin: 
Verlag Paul Parey. 

Zandstra HG (1999) Retrospect and future prospects of potato 
research and development in the world. Keynote address, 
Global Conference on Potato, 6–11 December 1999, New Delhi.


See also: Genetic Stock Collections and Centers 


Soluble RNA 


B S Guttman 


Copyright © 2001 Academic Press 
doi: 10.1006/rwgn.2001.1206


Soluble RNA is the original term for what is now
called transfer RNA (see Transfer RNA (tRNA)). The
bulk of RNA in a cell (on the order of 80%) is
ribosomal RNA, which consists of very large
molecules built into particles (ribosomes) that can be
collected by high-speed centrifugation. Hoagland
et al. (1957) demonstrated that cells also contain a
large amount of ‘soluble’ RNA of much lower
molecular weight, and they went on to show that
these RNAs bind amino acids in the presence of ATP
and can transfer these amino acids to microsomal
protein (Hoagland et al., 1958). This discovery
coincided with Crick’s prediction, on theoretical
grounds, that such small RNAs should exist. Crick
pointed out that amino acids cannot bind specifically
to nucleic acids and that, if a single amino acid is
encoded by a triplet of nucleotides, there is a
significant spacing discrepancy between the RNA
template and the nascent polypeptide chain. Crick
therefore predicted that the protein-synthesizing
mechanism ought to include ‘adaptor’ molecules,
probably a type of small RNA, that would recognize
a codon at one end and carry an amino acid at the
other end. This, of course, is exactly what transfer
RNA molecules do.
A somatic mutation is a mutation occurring in a
somatic cell. It therefore affects only that cell and its
descendants and is not heritable, since it is not
present in the germ cells.


See also: Mosaicism in Humans 
Somatic pairing is the association of the maternal and
paternal homologs in nonmeiotic nuclei. When a set of
mitotic chromosomes from a typical diploid
eukaryotic nucleus is observed under a light
microscope, it appears as if all of the pairs of
chromosomes were thrown into a bag, shaken, and
dumped out into a pile. The chromosomes distribute
randomly with respect to each other, and the two
homologs, one from each parent, are as likely to be
found next to each other as next to any of the other
chromosomes. This is expected because, for the most
part, homologs find each other only during meiosis I,
when they align to undergo recombination. Mitotic
figures from a dipteran insect, such as the model
genetic organism Drosophila melanogaster, are
organized quite differently (see Figure 1). This was
first recognized by
Stevens (1908). Examining cytological preparations 
of mitotic nuclei from embryonic cells and somatic 
gonadal cells, she noted that the homologs in 
these nuclei were commonly found next to each 
other. Metz (1916) looked at other insects and tissues 
and confirmed that homolog pairing was a widespread 
feature of mitosis in the Diptera. 

The extent to which mitotic pairing of homologs
reflected the arrangement of the chromosomes in the
interphase diploid nucleus was unclear. Thus the term
‘mitotic pairing’ was often used instead of ‘somatic
pairing,’ as it is now commonly called. One type of
interphase nucleus, the polytene nucleus of the
dipteran larva, provided an early and incontrovertible
link between mitotic pairing and interphase homolog
organization. Many larval cells undergo a process of
DNA replication without mitosis, called
endoreduplication, which results in a polyploid
nucleus that contains as many as 1024 copies of a
single chromosome aligned in a rope-like strand. The
presence of the two homologs right next to each
other gives an appearance
of two ropes wrapped around each other. Owing to the
large size and distinctive banding pattern of the
polytene chromosomes, it is readily apparent that the
homologs are closely and precisely apposed. When
polytene chromosomes are prepared by squashing for
viewing in a microscope, the two homologs only
occasionally separate from each other, an event
referred to as asynapsis. Synapsis of polytene
homologs is maintained even when one of the
homologs contains multiple inversions, as in the case
of the balancer chromosomes of D. melanogaster.
Even translocation heterozygotes, in which
chromosomal rearrangement has moved a substantial
fraction of a chromosome arm to another centromere,
retain pairing between the unrearranged and the
translocated chromosomes.

Figure 1 Mitotic figures from Drosophila melanogaster.

The diploid interphase nucleus and its chromo- 
somes usually lack visually discernible substructure. 
As a result, the position of homologous loci is less 
obvious in diploid interphase nuclei than in either 
polytene or mitotic nuclei. Development of in situ 
DNA hybridization allowed visualization of specific 
sequences within the space of the nucleus, which in 
turn allowed statistical analysis of the distribution of 
homologs relative to each other. In most diploid
organisms, when a probe that is unique to a region of
a chromosome is hybridized to a G1 interphase
nucleus, two well-separated spots of hybridization
can be seen per nucleus, corresponding to the two
homologs. However, in the majority of D. melanogaster
nuclei, only one spot of hybridization can be
resolved. In the diploid tissue of the larva this pairing
is seen in 70–100% of the nuclei, with the variability
most likely depending on the rapidity of cell division
in a given tissue. Notably, somatic pairing is not seen
in the very early embryo of D. melanogaster, and
homologous loci do not begin to pair substantially
until embryonic nuclear cycle 14, when the cell cycle
slows down and acquires a G1 phase. This agrees with
the observation that very close pairing is lost during
mitosis, and perhaps as early as the onset of DNA
replication of the paired locus, although, as seen in
mitotic spreads, the homologs are still nonrandomly
close to each other.

While the actual observation of somatic pairing in
the interphase diploid nuclei of flies required the
development of new techniques, genetic observations
had long suggested the possibility of cross-talk
between homologs during interphase and hence
implied their juxtaposition in the nucleus.
Remarkably, the potential for such interactions was 
pointed out in Stevens’s first report of somatic pairing 
in 1908 when she wrote 


One is tempted to suggest that if homologous maternal and
paternal chromosomes in the same cell ever exert any
influence on each other, such that it is manifest in the
heredity of the offspring, there is more opportunity for such
influence in these flies than in cases where pairing of
homologous chromosomes occurs but once in a generation.


This statement foreshadowed the discovery in 
Drosophila of a variety of genetic phenomena that 
are dependent on the synapsis of the homologs. The 
best-described example is the phenomenon of trans- 
vection, which is the disruption of allelic complemen- 
tation by chromosomal rearrangements thought to 
disrupt somatic pairing. Transvection was first des- 
cribed at the Ultrabithorax (Ubx) locus. When certain 
mutant Ubx alleles are combined, the phenotype is 
close to wild-type. However, when chromosomal 
rearrangements are induced in either one of the two 
homologs the phenotype becomes more severely 
mutant, even though the breakpoints do not disrupt 
the Ubx gene. This observation is interpreted to
indicate that the two mutant alleles must be close
enough to each other in the interphase nucleus to
allow cross-talk between the two copies of the gene,
and that rearrangements of the homolog interfere with
close somatic pairing. How somatic pairing may promote allelic
complementation was further clarified when another 
example of transvection was described at yellow. Here, 
allelic complementation was described for two alleles, 
one containing a compromised regulatory region and 
another containing a mutated coding region. Comple- 
mentation is not seen when one mutant allele is pre- 
sent on a transgene elsewhere in the genome and the 
second mutant allele is present at the normal site of 
yellow. Therefore, it seems that the two alleles must be 
present at their normal location so that somatic pairing 
and cross-talk can take place. It is thought that the 
intact regulatory sequences from the first allele are 
able to promote the transcription of the intact coding 
region on the homolog to make a functional and pro- 
perly regulated transcript. 

A second example of a genetic phenomenon in Dros- 
ophila that is dependent on somatic pairing is trans- 
inactivation. Chromosomal rearrangements that bring 
a gene normally found in euchromatin next to tran- 
scriptionally repressive heterochromatin result in the 
inactivation of that gene in some cells but not in others 
in a tissue, an effect referred to as position effect 
variegation (PEV). Most PEV alleles are recessive to 
the wild-type allele, but in some instances, most not- 
ably those involving the brown locus, the variegating 
allele is dominant to the wild-type allele. This dom- 
inance can be suppressed by disruption of somatic 
pairing, implying that the ability of heterochromatin 
to silence a gene on the opposite chromosome is 
dependent on the proximity of the two homologs. 

The mechanism of somatic pairing is not well 
understood, which is not surprising since little is 


known about more general mechanisms organizing 
the interphase nucleus. The simplest explanation is 
that chromosomes move about the nucleus in a con- 
fined random walk pattern until they find a homolo- 
gous sequence, at which time pairing interactions are 
established and maintained. There is some evidence 
that certain regions of the chromosome pair more 
quickly than others, but what features could promote 
such pairing, aside from closer initial position, is 
unclear. Models based on DNA-pairing or on co- 
association of DNA-binding proteins have been pro- 
posed. 

There is no other known group of organisms where 
somatic pairing is so prevalent as in the Diptera. 
However, the nonrandom association of homologs 
in metaphase spreads has been occasionally, if contro- 
versially, reported for various plants and animals. 
More convincing are the reports of premeiotic homo- 
logous pairing in germ cell mitoses of organisms other 
than Diptera. Recent evidence from budding yeast 
suggests that there is some homolog pairing in both 
premeiotic and vegetatively growing diploid cells. 
Intriguingly, one study suggests that the maintenance 
of imprinting in mammals is correlated with transient
association of homologs during certain phases of the 
cell cycle. Most of these more recent studies used 
fluorescent in situ hybridization to examine directly 
the position of the homologous loci in interphase 
nuclei. Undoubtedly, this technique will be applied 
to many other systems and may begin to tell us how 
widespread somatic pairing is in groups outside the 
Diptera. 
SOS Bypass
J H Miller 


Copyright © 2001 Academic Press 
doi: 10.1006/rwgn.2001.1212 


‘SOS bypass’ is the process of replication past non- 
coding lesions as a result of induction of the SOS 
system. Specific DNA polymerases replicate past 
these lesions, often resulting in mutations. 


See also: SOS Repair 
Free-living organisms such as bacteria frequently have
a life style in which periods of rapid growth alternate 
with periods in which growth is inhibited by various 
stressful conditions. Among the most insidious are 
those in which their genetic material is subjected 
to damaging agents such as UV light or chemical 
mutagens. To maintain the integrity of their genome, 
organisms have evolved a variety of mechanisms for 
dealing with such assaults. Many of these can be
broadly thought of as DNA repair mechanisms. In
many bacteria a mechanism has evolved whereby the
genes determining such processes are induced when
they are actually needed. There are more than 30
genes in Escherichia coli whose expression is greatly
enhanced when the cellular DNA is damaged. The
concept of such a coordinated, damage-inducible
response was proposed by Miroslav Radman in 1973
and termed the SOS response.

SOS-inducible genes are repressed by the product
of the lexA gene, which recognizes and binds to an
operator sequence of some 20 nucleotides, known as
an SOS box; the lexA gene itself has two such boxes.
Many types of damage in DNA give rise to single- 
stranded regions either because of the excision of the 
damaged region or because the damage causes an 
interruption in DNA replication with the production 
of a single-stranded region in one of the daughter 
chromosomes. The product of another gene under 
SOS control, recA, is normally present in an uninduced 
bacterium at around 7000 molecules per cell. It binds 
to single-stranded DNA in the presence of ATP to 
form a filamentous nucleoprotein in which the RecA 
protein is said to be in an activated form (RecA*).
LexA protein diffuses to this RecA filament, and
RecA* catalyzes an autoproteolytic reaction in the
LexA protein such that it self-cleaves at an Ala-Gly
bond near the middle of the protein. The truncated
LexA protein is no longer able to bind to SOS boxes 
and transcription of SOS genes begins. There are lexA 
mutants whose product is uncleavable and which do 
not therefore show SOS induction (e.g., lexA3), and 
others in which induction is constitutive because
cleavage is spontaneous (e.g., lexA51). There are also recA
mutants that cannot be activated to a form that will 
cleave the LexA repressor (e.g., recA56), and others 
that can catalyze cleavage of the LexA repressor even 
when not bound to single-stranded DNA (e.g., 
recA730 and recA441 at 42°C). The former are unable 
to show SOS induction, while the latter show consti- 
tutive induction. Strains unable to induce SOS res- 
ponses are hypersensitive to ultraviolet light and many 
other mutagens and do not show significant muta- 
genesis by such agents. 
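The regulatory logic described above (LexA repression, RecA activation, and the mutant classes) can be condensed into a qualitative truth-table sketch. This is an illustrative model only: single-stranded DNA is treated as a direct consequence of damage, the temperature-dependent recA441 allele is not modelled, and giving lexA alleles precedence over recA alleles in double mutants is an assumption of the sketch rather than a statement from the text. The class and function names are hypothetical.

```python
from dataclasses import dataclass

# Allele labels follow the text; their behaviour is reduced to on/off rules.
LEXA_ALLELES = {"+", "lexA3", "lexA51"}
RECA_ALLELES = {"+", "recA56", "recA730"}


@dataclass(frozen=True)
class Strain:
    lexA: str = "+"
    recA: str = "+"

    def __post_init__(self) -> None:
        if self.lexA not in LEXA_ALLELES or self.recA not in RECA_ALLELES:
            raise ValueError(f"unknown allele in {self}")


def reca_active(strain: Strain, ssdna_present: bool) -> bool:
    """Whether RecA is in the activated state that promotes LexA cleavage."""
    if strain.recA == "recA56":
        return False               # cannot be activated
    if strain.recA == "recA730":
        return True                # active even without single-stranded DNA
    return ssdna_present           # wild type: needs ssDNA (with ATP)


def sos_induced(strain: Strain, dna_damaged: bool) -> bool:
    """True if LexA is cleaved and the SOS genes are derepressed."""
    if strain.lexA == "lexA3":
        return False               # uncleavable repressor
    if strain.lexA == "lexA51":
        return True                # cleavage is spontaneous
    # Simplification: damage is taken to generate single-stranded regions.
    return reca_active(strain, ssdna_present=dna_damaged)
```

For instance, `sos_induced(Strain(recA="recA56"), dna_damaged=True)` is `False`, matching the noninducible, UV-hypersensitive class, while `sos_induced(Strain(recA="recA730"), dna_damaged=False)` is `True`, the constitutive class.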

The RecA protein itself has a major role in genetic 
recombination but, although it is inducible, induced 
levels are not essential for normal recombination pro- 
cesses. They do, however, seem to be necessary for the 
recombination processes that are involved in certain 
types of DNA repair, for example, the repair of double- 
strand DNA breaks and the recombinational repair of 
daughter-strand gaps that are formed when certain 
types of damaged nucleotide encounter the replication 
fork. 

Nucleotide excision repair is a mechanism in which 
a damaged region of DNA is cut out and replaced by 
DNA synthesized using the undamaged strand as 
template. Three very important genes determining 
this pathway are under SOS control, namely uvrA, 
uvrB, and uvrD. 

Historically, mutagenesis was an important attrib- 
ute of SOS induction; indeed a crucial experiment that 
led to the SOS hypothesis was reported by Weigle in 
1953 and involved mutagenesis of bacteriophage 
lambda. It was found that if lambda phage were 
exposed to UV light and plated on unirradiated bac- 
teria, few if any mutations appeared among the phage 
progeny. If, however, the bacteria had themselves been 
independently irradiated then there were many 
mutants among the progeny of the irradiated phage. 
Subsequent work showed that the recA and lexA genes
were needed for this ‘Weigle mutagenesis’ and that the
same two genes were required for mutagenesis of the
bacterial chromosome by UV light, ionizing radiation, 
and a wide variety of chemical mutagens. As early as 
1967, Evelyn Witkin had noticed that there were simi- 
larities between the induction of filamentation and the 
induction of prophage following UV irradiation and 
boldly hypothesized that both phenomena reflected 
the release of repressor action (induction) consequent
upon interruption of DNA replication. Radman
broadened this concept and argued that there was a 
whole battery of inducible responses dependent upon 
this type of induction, and that included among these 
was one that was needed for mutagenesis to occur 
both in bacteriophage and the bacteria themselves 
following exposure to ultraviolet light. 

Among the genes found to be induced by UV light
was an operon containing two bacterial genes, umuD
and umuC, both of which were needed for SOS
mutagenesis. The products of these genes act in a
complex consisting of one molecule of UmuC and
two molecules of UmuD’, a posttranslationally
modified (cleaved) form of UmuD. In fact UmuD has the
same ability to self-cleave as LexA and does so under
the influence of activated RecA, so revealing yet another role
for RecA protein. It has recently, and rather surpris-
ingly, emerged that the UmuD’2C complex
constitutes a new DNA polymerase, designated 
DNA polymerase V, which is able to catalyze synth- 
esis past damaged nucleotides in the template strand. 
In doing so it inserts incorrect bases that form the 
induced mutations. Other E. coli polymerases can be 
encouraged to insert bases opposite template damage, 
but only DNA polymerase V appears to be able to use 
such a damage/mismatch terminus as a primer for 
further strand extension. Polymerase V is also prone 
to error when acting on undamaged template. UmuC 
contains the polymerase domain in DNA polymerase 
V and is now known to be representative of a whole 
family of homologs throughout the evolutionary 
scale. Homologs of umuDC, presumably also deter- 
mining DNA polymerases, are found on many plas- 
mids, e.g., mucAB in pKM101 and impAB in TP110. 
Almost simultaneously with the recognition of poly- 
merase V, it was shown that another SOS-inducible 
gene, dinB, codes for a further DNA polymerase (IV). 
It was known that dinB was required for the phenom- 
enon of indirect mutagenesis which is seen when un- 
irradiated phage are allowed to infect bacteria that 
have been exposed to UV light. It now appears that 
polymerase IV, which is, like polymerase V, a low- 
fidelity enzyme, is induced by the irradiation and 
makes numerous replication errors while replicating 
phage DNA, many of them single base deletions. 
Polymerase IV has little effect on the host DNA and 
its role in the cell is still unclear. There are hints that it 
may perform translesion synthesis at certain specific 
types of damage in template DNA that polymerase V 
cannot. 

The third SOS-inducible DNA polymerase is 
DNA polymerase II, the product of the dinA gene. 
Its role seems to be to assist in the reassembly of 
replication forks that have become stalled by encoun- 
tering particularly difficult types of damage. 


As more is revealed about the control of SOS 
responses it becomes apparent that individual 
responses are subject to quite sophisticated levels of 
control. At the crudest level the timing and extent of 
induction of different genes are controlled by the 
affinity of LexA protein for their SOS box(es), which 
is determined by the sequence of the box(es), very few 
of which are identical. The affinity of LexA repressor 
for the SOS box of uvrD, for example, is 16 times 
greater than its affinity for the SOS box of umuDC. 
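The dependence of LexA affinity on SOS-box sequence can be illustrated with a rough sketch. It assumes the commonly cited E. coli LexA-box consensus CTGTN8ACAG and simply counts mismatches at its eight conserved positions. The function names are hypothetical, the test sequence is invented, and a mismatch count is only a crude proxy for binding affinity, not a measurement of it.

```python
from __future__ import annotations

# Assumed consensus: CTGT, eight unspecified bases (N), then ACAG (16 bp).
CONSENSUS = "CTGT" + "N" * 8 + "ACAG"


def mismatches(site: str) -> int:
    """Count mismatches to the consensus, ignoring the N positions."""
    if len(site) != len(CONSENSUS):
        raise ValueError("site must be 16 bp long")
    return sum(
        1
        for expected, observed in zip(CONSENSUS, site.upper())
        if expected != "N" and expected != observed
    )


def best_box(sequence: str) -> tuple[int, int]:
    """Return (start, mismatches) of the best-matching 16-bp window."""
    seq = sequence.upper()
    width = len(CONSENSUS)
    windows = ((i, mismatches(seq[i:i + width]))
               for i in range(len(seq) - width + 1))
    # min() keeps the first window among equally good matches.
    return min(windows, key=lambda item: item[1])


print(best_box("GGCTGTATATATATACAGGG"))  # (2, 0)
```

Ranking natural SOS boxes this way would only loosely track the measured differences in LexA affinity described above, since the spacer and flanking bases also matter.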

As mentioned above, the parallels between SOS 
repair and prophage induction were recognized early 
on. We can now see that certain lysogenic phages such 
as lambda have hijacked the activation of RecA pro- 
tein for their own ends. These phages have evolved a 
repressor that self-cleaves under the action of acti-
vated RecA protein, thus allowing excision of the
phage and vegetative reproduction. As far as the
phage is concerned it is not the DNA repair responses
of the SOS system that are of primary interest but the
utilization of RecA-mediated proteolysis to enable it
to bail out from a potentially sinking ship. However, 
the presence of the SOS system throughout much of 
the bacterial world and the conservation of many 
SOS-inducible genes from bacteria to humans testify 
to the value of SOS repair to the cell. In prokaryotes 
the primary function of SOS repair is to make avail- 
able DNA repair and certain other mechanisms when 
they are needed, with an important secondary func- 
tion to generate genetic variability when a change in 
environment may demand it. Indeed, it is becoming
apparent that SOS induction not only occurs when 
DNA-damaging agents are encountered, but also in 
other circumstances such as in aging colonies and 
when there is nutritional stress, although the mechan- 
isms of induction under these conditions need further 
clarification. 

The SOS system was the first paradigm for a global 
cellular response to DNA damage and it has provided 
the foundation for subsequent studies in mammalian 
and other eukaryotic systems including those involv- 
ing cell cycle control and apoptosis. 


See also: Cell Cycle; DNA Repair; Excision Repair 

Southern Blotting

Southern blotting, initially described by Southern
(1975), is in essence the transfer of DNA restriction 
fragments from an electrophoresis gel to a nitrocellu-
lose or nylon membrane in such a way that the DNA
banding pattern in the gel is reproduced on the mem- 
brane. The DNA fragments become bound to the 
membrane in a form that is suitable for hybridization 
analysis with labeled DNA or RNA probes. Southern 
blotting therefore enables a specific restriction frag- 
ment to be detected against a background of many 
other restriction fragments. 


The Methodology for Southern Blotting 


The basic methodology for Southern blotting is shown 
in Figure 1. An agarose gel containing an array of
DNA fragments is placed on a filter-paper wick 
which connects with a reservoir of buffer. The mem- 
brane is positioned on the gel and a pile of paper 
towels is placed on top of the membrane. Buffer soaks 
through the filter-paper wick by capillary action, 
passes through the gel and membrane, and is absorbed 
by the paper towels. DNA fragments are carried from 
the gel to the membrane. The complete transfer of 
fragments up to 15 kb requires approximately 18 h.
This basic methodology can be embellished by alter- 
native forms of transfer, such as electroblotting, which 
uses electrophoresis rather than capillary action to 
transfer the DNA, and vacuum blotting, in which 
buffer is drawn through the gel and membrane under 
vacuum. Both electroblotting and vacuum blotting 
are more rapid than the conventional methodology, 
reducing the transfer time to as little as 30 min. 

Nylon membranes are more popular than the nitro- 
cellulose versions because they are tougher and so are 
unlikely to break during the blotting procedure. In 
addition nylon membranes can be subjected to mul-
tiple rehybridizations. A further advantage is that
nylon membranes bind DNA molecules of 50 bp or 
longer, whereas nitrocellulose membranes do not effi- 
ciently bind molecules less than 500 bp. The one major 
advantage of nitrocellulose is that these membranes 
give less background hybridization, especially when a 
nonradioactive label is used. 

Prior to blotting, an agarose gel must be pretreated 
to break the DNA molecules in individual bands into 
smaller fragments, smaller fragments transferring 
more efficiently than larger ones. The pretreatment 
involves soaking the gel in 0.25 M HCl for 30 min,
which cleaves some of the β-N-glycosidic bonds that
attach adenine and guanine bases to the sugar com- 
ponents of their nucleotides. This depurination is fol- 
lowed by spontaneous breakage of the polynucleotide 
chain at the baseless sites that are created. After acid 
pretreatment, the gel is placed in alkali to disrupt the 
hydrogen bonds in the double helices, resulting in 
fragmented single strands at locations corresponding 
to the migration positions of the original restriction
fragments. 

For transfer to a nitrocellulose membrane, the 
alkali treatment is followed by neutralization of the 
gel by soaking in a Tris-salt buffer, because DNA does 
not bind to nitrocellulose above pH 9.0. The blot is 
then assembled with a high-salt transfer buffer called 
20X SSC, which comprises 3.0 M NaCl + 0.3 M
sodium citrate. This buffer can also be used with a
nylon membrane, but if the nylon membrane is posi-
tively charged then 0.4 M NaOH is usually used
because this buffer not only transfers the DNA but 
also binds it covalently to the membrane. With a 
nitrocellulose or uncharged nylon membrane, the 
initial attachment of the DNA is reversible and must 
be made more permanent by a post-treatment: either 
baking at 80°C for 2 h, which noncovalently attaches
DNA to a nitrocellulose membrane, or UV ir-
radiation, which covalently binds DNA to a nylon
membrane.
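The 20X SSC composition quoted above also fixes the working strengths used later in the procedure, such as the 2X SSC of the hybridization buffer. A minimal sketch of that arithmetic, assuming simple linear dilution of the 20X stock (C1V1 = C2V2); the function names are illustrative only.

```python
from __future__ import annotations

# 20X SSC as given in the text: 3.0 M NaCl + 0.3 M sodium citrate.
STOCK_X = 20.0
NACL_PER_X = 3.0 / STOCK_X      # 0.15 M NaCl in 1X SSC
CITRATE_PER_X = 0.3 / STOCK_X   # 0.015 M sodium citrate in 1X SSC


def ssc_composition(strength_x: float) -> dict[str, float]:
    """Molar concentrations of NaCl and sodium citrate at a given X strength."""
    return {
        "NaCl": strength_x * NACL_PER_X,
        "sodium citrate": strength_x * CITRATE_PER_X,
    }


def stock_volume_ml(target_x: float, final_volume_ml: float) -> float:
    """Volume of 20X stock to dilute to final_volume_ml of target_x SSC."""
    if not 0 < target_x <= STOCK_X:
        raise ValueError("target strength must be above 0X and at most 20X")
    return target_x * final_volume_ml / STOCK_X


# 500 mL of 2X SSC: 50 mL of 20X stock made up to 500 mL with water.
print(stock_volume_ml(2, 500))  # 50.0
print(ssc_composition(2))
```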


Hybridization Analysis of a Southern Blot


Southern blotting is always a prelude to hybridization 
analysis, for example with a cloned restriction frag- 
ment, a copy DNA (cDNA), or a synthetic oligonu- 
cleotide probe. Before hybridization the membrane is 
placed in a solution containing polymers that attach 
to any vacant DNA binding sites on the membrane 
surface, ‘blocking’ these so that the hybridization 
probe does not bind nonspecifically to them. Various 
polymers have been used, including nonbiological 
ones such as polyvinylpyrrolidone or biological poly- 
mers including Ficoll (a carbohydrate), bovine serum 
albumin (a protein), or dried milk (a complex mix- 
ture). DNA can also be used as a blocking agent, 
providing it is unrelated to the DNA being used as 
the probe. This prehybridization step takes between 
15 min and 3 h at 68 °C, depending on the type of
membrane. 

Hybridization is performed by placing the mem-
brane in a buffer in a rotating tube containing the
hybridization probe, or alternatively in a sealed plastic
bag on a shaker. The buffer has a high salt content (e.g.,
2X SSC) and a detergent such as 1% sodium dodecyl
sulfate is usually included. To increase sensitivity a
second polymer might be added at this stage, such
as 10% dextran sulfate or 8% polyethylene glycol
6000. These polymers do not block DNA binding sites
but instead induce the probe molecules to form net-
works so that greater amounts attach to the target sites
on the membrane.

Figure 1 Southern blotting. (Reproduced from Brown TA (1999) Genomes, with permission of BIOS Scientific …)

Specificity is critical during hybridization analysis. 
The probe DNA must contain a region that is com- 
plementary to at least part of the blotted restriction 
fragment that is being sought. Problems can arise if the 
probe is partially complementary to other blotted 
DNA fragments. Hybridization must therefore be 
carried out under conditions that result in formation 
of a stable hybrid between the probe and its specific 
target, but not between the probe and any nonspecific 
targets. Providing that the probe has been well de- 
signed and is more complementary to its specific tar- 
get than it is to the nonspecific ones, then specificity 
can be ensured by careful selection of the temperature 
at which the hybridization is carried out. This is 
because the highest temperature at which the hybrid 
between the probe and its specific target is stable (this is
called the Tm or melting temperature for the hybrid) 
will be higher than the highest temperature at which a 
nonspecific hybrid is stable, because this nonspecific 
hybrid will be held together by fewer base pairs. If the 
probe is a restriction fragment or cDNA longer than 
100bp then nonspecific hybridization is usually 
avoided if the reaction is carried out at 68 °C in a 
high-salt buffer. With oligonucleotide probes the 
situation is more complicated because 68 °C might 
be too high for the formation of any hybrids, includ-
ing the fully base-paired one. However, the Tm can be
estimated from the sequence of the oligonucleotide, 
using the formula: 


Tm = (2 × number of A and T nucleotides) + (4 × number of G and C nucleotides) °C


This estimation is reasonably accurate for most oligo- 
nucleotides whose values of Tm fall between 40 and 
90°C. The initial hybridization is set at a temperature 
10°C or so below the estimated Tm, which allows 
many hybrids to form, including nonspecific ones. 
Specificity is subsequently achieved by a series of 
post-hybridization washes, these being carried out at 
increasing temperatures so that nonspecific hybrids 
are disrupted, with the last wash designed to leave 
just the specific hybrid. 
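The Tm rule of thumb and the choice of an initial hybridization temperature about 10°C below it can be written as two short functions. This is a minimal sketch of the formula given above, not a nearest-neighbor Tm calculation; the function names and the example oligonucleotide are hypothetical.

```python
def wallace_tm(oligo: str) -> int:
    """Estimated Tm in °C: 2 °C per A or T plus 4 °C per G or C."""
    seq = oligo.upper()
    if not seq or set(seq) - set("ACGT"):
        raise ValueError("expected a non-empty sequence of A, C, G and T")
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc


def initial_hybridization_temp(oligo: str, offset: int = 10) -> int:
    """Initial hybridization temperature, `offset` °C below the estimated Tm."""
    tm = wallace_tm(oligo)
    if not 40 <= tm <= 90:
        # The text describes the estimate as reasonable only in this range.
        raise ValueError(f"estimated Tm of {tm} °C is outside 40-90 °C")
    return tm - offset


probe = "ATGCATGCATGCATGCAT"             # hypothetical 18-mer
print(wallace_tm(probe))                  # 52
print(initial_hybridization_temp(probe))  # 42
# Crudely treating a nonspecific hybrid as only 15 paired bases:
print(wallace_tm(probe[:15]))             # 44
```

The lower estimate for the shorter duplex mirrors the argument in the text: a hybrid held together by fewer base pairs melts at a lower temperature, so post-hybridization washes at increasing temperatures remove it before the fully matched hybrid.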

After hybridization, the position of the probe that 
remains bound to the membrane is determined by 
autoradiography if a radioactive label has been used, 
or by an alternative methodology if the probe was 
nonradioactively labeled. A nylon membrane can be
reprobed up to ten times if it is ‘stripped’ between
hybridizations by washing at a high temperature in a
buffer containing alkali and detergent to remove the
hybridized DNA.


Applications of Southern Blotting 


Southern blotting has many applications in molecular 
biology research. For example, it is used at important 
stages during gene cloning projects. Genomic DNA, 
containing the gene to be cloned, is blotted and hybrid- 
ized to identify one or more restriction fragments 
containing the desired gene, and Southern blotting is 
used later in the project when a tentative clone has 
been isolated, to verify that the clone does indeed 
contain the desired gene and possibly to identify a 
smaller subfragment within which the gene lies. A 
second application of Southern blotting is in restric- 
tion fragment length polymorphism (RFLP) analysis, 
which is important in several contexts including con- 
struction of genome maps. An RFLP arises if a restric- 
tion site that is present in the genomes of some 
members of a population is absent, owing to an altera- 
tion in the nucleotide sequence, in other individuals. 
RFLPs are typed by Southern hybridization, using a 
probe that spans the polymorphic region, the presence 
or absence of the polymorphic restriction site 
being determined from the number and sizes of the 
fragments that are detected. 
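RFLP typing as described above comes down to comparing the fragment sizes that a probe detects in different alleles. The sketch below assumes EcoRI (recognition site GAATTC, cut between G and A), linear DNA, and a probe that detects every fragment overlapping its target interval; the two allele sequences and the probe coordinates are invented for illustration.

```python
from __future__ import annotations

SITE, CUT_OFFSET = "GAATTC", 1  # EcoRI cuts G^AATTC


def fragments(seq: str) -> list[tuple[int, int]]:
    """(start, end) coordinates of restriction fragments of a linear molecule."""
    cuts = []
    i = seq.find(SITE)
    while i != -1:
        cuts.append(i + CUT_OFFSET)
        i = seq.find(SITE, i + 1)
    bounds = [0] + cuts + [len(seq)]
    return list(zip(bounds, bounds[1:]))


def detected_sizes(seq: str, probe_start: int, probe_end: int) -> list[int]:
    """Sizes of fragments overlapping the probe target [probe_start, probe_end)."""
    return [end - start for start, end in fragments(seq)
            if start < probe_end and end > probe_start]


# Hypothetical alleles differing by one base in the polymorphic site.
allele_with_site = "A" * 30 + "GAATTC" + "T" * 30
allele_without_site = "A" * 30 + "GAGTTC" + "T" * 30
print(detected_sizes(allele_with_site, 20, 45))     # [31, 35]
print(detected_sizes(allele_without_site, 20, 45))  # [66]
```

A heterozygote would show all three bands (66, 35, and 31 bp), which is how the presence or absence of the polymorphic site is read from the number and sizes of the detected fragments.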


Reference 

Southern EM (1975) Detection of specific sequences among 
DNA fragments separated by gel electrophoresis. Journal of 
Molecular Biology 98: 503-517. 


See also: DNA Cloning; Restriction Endonuclease 


Specialized Recombination
Specialized recombination is a term used to describe
recombination events that are distinct from general
homologous recombination in that they either are
directed to specific DNA sites (or regions), or involve 
special proteins that are not required for homologous 
recombination. The term encompasses a variety of 
recombinational processes of which the best known 
are transposition and site-specific recombination. 

The following are some examples of specialized 
recombination. For more detailed descriptions see 
the separate entries below. 


DNA Transposition 


DNA transposition is the movement of a defined 
DNA segment (a transposon) from one genomic site 
to another; the ends of a transposon are specific, but 
the integration sites generally are relatively random. 
Movement is catalyzed by a transposon-encoded 
transposase. Some DNA repair/replication is required 
to seal the short gaps at the transposon-target junc- 
tion (and in some cases to duplicate the transposon). 
(See articles Transposable Elements and Insertion 
Sequence.) 


Retrotransposition 


Retrotransposition is the movement of defined DNA 
segments (retrotransposons) by a process that involves 
transcription of the element to form an RNA inter- 
mediate. In elements such as retroviruses and LTR 
(long terminal repeat) retrotransposons, the RNA tran- 
script is used as a template to make a double-stranded 
DNA version of the transposon. Like a conventional 
transposon, this DNA is then processed and inserted 
into a target by the element-encoded integrase. In non- 
LTR retrotransposons, the RNA transcript is copied by 
reverse transcriptase directly into the target site, using 
a nick created at the target by the element-encoded 
endonuclease as the primer for DNA synthesis. (See 
articles Retrotransposons and Retroviruses.) 


Site-Specific Recombination 


Site-specific recombination is an exchange between 
two defined sites resulting in integration, excision, or 
inversion. Recombination is catalyzed by a site-specific 
recombinase. DNA cleavage at the recombination site
results in an intermediate with the recombinase co- 
valently linked to the ends of the DNA; reversal of this 
process reseals the DNA to form the recombinant, 
and releases the recombinase. No replication or repair
is required. (See article Site-Specific Recombination.)
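Which of the outcomes occurs depends on how the two sites are arranged. The following sketch models only the outcome for two sites on the same molecule, not the recombinase chemistry: directly repeated sites give excision (a circle carrying one site is released), while inverted sites give inversion of the intervening DNA. SITE is a hypothetical, non-palindromic 8-bp sequence, and the sketch assumes the forward copy of the site comes first.

```python
from __future__ import annotations

SITE = "GCATACGA"  # hypothetical asymmetric recombination site
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def recombine(seq: str) -> tuple[str, ...]:
    """Products of recombination between two sites on one linear molecule."""
    n = len(SITE)
    first = seq.find(SITE)
    inverted = seq.find(revcomp(SITE))
    if seq.count(SITE) == 2 and inverted == -1:
        # Direct repeats: excision. One site stays on each product.
        second = seq.find(SITE, first + n)
        remaining = seq[:first] + seq[second:]
        excised_circle = seq[first:second]  # circular product, written linearly
        return remaining, excised_circle
    if seq.count(SITE) == 1 and first < inverted:
        # Inverted repeats: the segment between the sites is inverted.
        middle = seq[first + n:inverted]
        return (seq[:first + n] + revcomp(middle) + seq[inverted:],)
    raise ValueError("expected two sites, the forward copy first")
```

Integration is the reverse of excision: recombination between a site on a circular molecule and a site on another molecule joins the two into one.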


V(D)J Joining 


This is a process for generating immunoglobulin and 
T cell receptor (TCR) diversity during mammalian 
B and T cell development. It involves the precise exci- 
sion of the DNA segments that separate V and J, or 
V and D, and D and J coding sequences of the immuno-
globulin and TCR loci, coupled to imprecise joining
of the V, D, and J coding sequences. The process is 
catalyzed by Rag1/Rag2 recombinase acting at recom- 
bination signal sequences (RSS), and involves DNA 
synthesis by terminal transferase and cellular DNA 
double-strand break repair activities (including Ku 
and DNA-PK). (See articles Immunoglobulin Gene 
Superfamily and T Cell Receptor Gene Family.) 
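A toy model can make the contrast between precise excision and imprecise joining concrete. Everything in it is an illustrative assumption: the segment sequences are invented, and the ranges for end trimming (0 to 3 nucleotides per coding end) and for added N nucleotides (0 to 4, standing in for terminal transferase activity) were chosen only for the example.

```python
import random

# Hypothetical coding segments; real loci have many more.
V_SEGMENTS = ["ATGGCTCAG", "ATGAGCTTC", "ATGCGTGAA"]
J_SEGMENTS = ["TGGGGCCAA", "TTTGACTAC"]


def vj_join(v_index: int, j_index: int, rng: random.Random) -> str:
    """Join one V to one J coding segment.

    The DNA lying between the chosen segments is simply absent from the
    product (precise excision), while the coding ends are randomly trimmed
    and random N nucleotides are inserted between them (imprecise joining).
    """
    v, j = V_SEGMENTS[v_index], J_SEGMENTS[j_index]
    trim_v, trim_j = rng.randint(0, 3), rng.randint(0, 3)
    n_region = "".join(rng.choice("ACGT") for _ in range(rng.randint(0, 4)))
    return v[:len(v) - trim_v] + n_region + j[trim_j:]


rng = random.Random(0)  # seeded so the run is repeatable
for _ in range(3):
    print(vj_join(0, 1, rng))  # same segments, usually different junctions
```

With three V and two J segments, segment choice alone gives 3 × 2 = 6 combinations; variable junctions multiply the diversity further.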


Immunoglobulin Heavy Chain Class 
Switch 


This is the process for changing the class of an immuno-
globulin protein (e.g., from IgM to IgG). It is an im-
precise but region-specific form of recombination
within the immunoglobulin heavy chain locus, that 
deletes genomic DNA between the variable (VDJ- 
encoding) genes and various downstream constant 
(CH) genes. Recombination occurs between two
‘switch’ regions, by an unknown mechanism. (See 
articles Class Switching and Immunoglobulin Gene 
Superfamily.) 


Intron (and Intein) Homing 


Homing is the process by which introns from a
particular gene insert into an intronless version of
the same gene. There are two distinct mechanisms, a
DNA-dependent process and an RNA-dependent 
process called retrohoming. 


1. Group I introns encode a site-specific endonuclease 
that makes a double-strand break in the intronless 
(but not the intron-containing) allele. The break is re-
paired by homology-dependent double-strand break
repair, using the uncleaved, intron-containing, allele 
as the genetic donor; this results in gene conversion 
of the intronless to the intron-containing form. 

2. Group II introns encode a protein with three activ- 
ities: RNA maturase, DNA endonuclease, and 
reverse transcriptase (RT); the latter two activities 
are required for insertion (homing) of the spliced
intron RNA. In a complex process, the intron RNA
is attached to one strand of the cleaved DNA inser-
tion site, and a cDNA copy is made by the RT primed
from the 3’ end of the opposing strand. Host repair/ 
replication (but not recombination) functions are 
required. (See article Intron Homing.) 


Mating-Type Switching in Yeast 


Interconversion of yeast haploid cells can occur be-
tween the two alternative mating types, a and α, achiev-
ed by copying a- or α-specific regulatory genes from
silent loci (HMLα and HMRa) to the expression locus,
MAT. The genetic identity of MAT is switched by gene
conversion, initiated by a double-strand break at the
MAT locus catalyzed by the HO site-specific endonu-
clease. The cleaved MAT locus acts as a target for double-
strand break repair using the silent loci HMLα or HMRa
as the donors of the genetic information. (See article
Mating-Type Genes and their Switching in Yeasts.) 


See also: Alternation of Gene Expression; 
Chromosome Dimer Resolution by Site-Specific 
Recombination; Conjugative Transposition; Flp 
Recombinase-Mediated DNA Inversion; Gene 
Cassettes; Gene Rearrangement in Eukaryotic 
Organisms; Hin/Gin-Mediated Site-Specific DNA 
Inversion; Integrase Family of Site-Specific 
Recombinases; Integrons; P Elements; Phage λ
Integration and Excision; Resolvase-Mediated 
Deletion 

Specialized Transduction

Specialized transduction is the virus-mediated transfer
of nonviral genetic material to a recipient cell by a 
process involving the formation of a hybrid genome 
in which viral genes are substituted by genes derived 
from the host chromosome. Because it normally ap- 
plies to a strictly limited set of host genes, the pheno- 
menon has also been called ‘restricted’ or ‘localized 
transduction.’ 


Basis of Specialized Transduction 


Detailed understanding of the basis of specialized
transduction largely derives from studies with coli-
phage lambda (λ), a temperate phage whose genome
in the lysogenic state is integrated at a specific
attachment site, attB, located between the gal and bio
operons on the host chromosome. When λ lysogens
are treated with low doses of UV light or radio-
mimetic chemicals, the prophages are induced, excise
from the host chromosome, proliferate, and
eventually give rise to a phage ‘lysate.’ This lysate is
capable of transducing gal− or bio− recipient bacteria
to the Gal+ or Bio+ phenotype at a low frequency (c. 1
per 10° infecting phage particles). Only genes that are
closely linked to attB can be transduced, and lytically
grown phage lysates are ineffective.

Figure 1 Generation of specialized transducing phages by aberrant excision. The linear genome of bacteriophage λ circularizes on injection into the Escherichia coli host cell. The circular genome is integrated into the host chromosome by site-specific recombination via the phage attachment site, P:P’, and the bacterial attachment site, B:B’, between the gal and bio operons. Normal excision of the prophage is a reversal of integration. Rare, aberrant excision events create transducing phage genomes, in which some phage genes have been replaced by genes from one or the other flanking region of the host chromosome. (Reproduced with permission from Campbell, 1962.)

The rare transducing phages are derived by aber- 
rant excision of the prophage from the chromosome 
of the lysogen. The nonhomologous recombin- 
ation events that result in the substitution of flanking 
chromosomal genes for phage genes are independent 
of λ’s normal excision functions encoded by the phage
int and xis genes (Figure 1). 

The transduced cells are frequently lysogenic for 
a λ-transducing phage, with the phage genes being
flanked by tandem copies of the transduced gene. 
(The exogenous gene carried by the transducing 
phage is termed an ‘exogenote,’ while the endogenous 
gene is the ‘endogenote.’ A cell carrying an exogenote 
is a ‘syngenote’ or, where the exogenote and endogen- 
ote differ in one or more markers, a ‘heterogenote.’) 


Excision of the prophage by homologous recombin- 
ation between the flanking genes readily leads to 
segregation of the recipient phenotype (1 per 10° bac- 
terial divisions). 

In most cases the loss of phage genes associated 
with formation of the transducing derivative leads to 
a defective phage, incapable of growth in the absence 
of an associated ‘helper’ phage. Thus defective gal- 
transducing phages are referred to as λdg phages. In
some cases the genes lost are not essential for phage
growth and a plaque-forming transducing phage can
be isolated (e.g., λpbio).

Induction of a host cell lysogenic for λdg does not
lead to production of a phage lysate. If the cells are
doubly lysogenic, for λdg and λ+, or if the UV-treated
single lysogen is superinfected with λ+ helper phage,
the resulting preparation is a high-frequency trans- 
ducing (HFT) lysate, with up to half of the phages in 
the lysate being transducing phages. 


Extending the Range of Specialized Transduction


The number of host genes that can be picked up and 
stably carried by phages such as bacteriophage λ can
be greatly extended by the use of donor strains lack-
ing the normal chromosomal attachment site. Lambda
integrates its genome and forms stable lysogens about 
200-fold less frequently in such hosts, but does so by 
the normal integration mechanisms at ‘secondary’ 
attachment sites. Although about a dozen such sites 
predominate, where a strong positive selection for 
gene-inactivation is available, rare λ integrants can
usually be selected. Induction of such lysogens into 
lytic growth leads to the generation of transducing 
phages by aberrant excision. 

Using in vitro recombinant DNA techniques, 
phages resembling plaque-forming, specialized-trans- 
ducing phages can readily be generated. When the 
cloned DNA is homologous with the host cell DNA, 
phage integration and transduction can be achieved via 
homologous recombination. Transductants will be 
syngenotic lysogens so long as the vector phage has a 
functional immunity system. If the cloned DNA is 
nonhomologous with the Escherichia coli host DNA, 
integrants can be created by the use of a λ-lysogenic
host strain. Using an endogenous prophage and a 
transducing phage of different immunity specificity 
allows the double lysogen to be selected by its immun- 
ity to the appropriate superinfection. 


Other Specialized Transducing Systems


Although most studies have been carried out with 
λ and its host, E. coli, many other phage/host systems
supporting specialized transduction have been 
described and characterized. In most cases, these 
behave according to the model worked out for λgal-
and λbio-transducing phages. Specialized transducing
derivatives of generalized transducing phages have 
been isolated. In some cases these are due to direct 
transposition of genes from the host chromosome or 
an episome into the phage genome. 

A phenomenon superficially similar to specialized 
transduction occurs with eukaryotic retroviruses. 
Transducing retrovirus particles carrying host genes 
arise by cotranscription of an inserted provirus and an 
adjacent host gene, followed by splicing, packaging,
and recombination into a viral genome.


Uses of Specialized Transducing Phages 


Because specialized transduction normally results in 
a partial diploid, with various lengths of the donor 
chromosome carried on the exogenote, it has been 
valuable in the detailed genetic analysis of certain 
bacterial genes. Complementation tests, dominance 
tests and deletion mapping are readily carried out 
using specialized transducing phages. The ease of car-
rying out electron microscopic heteroduplex mapping
with lambdoid phage genomes allows genetic and
physical measurements to be correlated.

Transduction with specialized transducing phages 
involves integration by homologous recombination, 
a readily reversible event. This integration–excision
cycle allows ready exchange of genetic markers be- 
tween exogenote and endogenote and has proved 
useful in the manipulation of bacterial genotypes. 

Analogs of transducing phages constructed by in 
vitro methods have been used to prepare an ordered 
array of phages covering the entire E. coli chromo- 
some. Such ordered arrays, which facilitate genetic 
mapping and gene isolation, can readily be con- 
structed for other bacterial genomes. 

Speciation

One of the most challenging aspects of the diversity of
life is that it consists of discrete entities, called species. 
There was no need to explain this as long as one 
believed, as did Linnaeus, that “there are as many 
species as were created at the beginning.” But 
throughout the eighteenth and the first half of the 
nineteenth centuries, it became ever more obvious 
that there are processes through which new species 
originate. The replacement of extinct faunas by new 
species was one of these phenomena. For Darwin, 
after his return from the Beagle, the origin of species 
became the foremost research program. 

Now, 140 years after the publication of On the 
Origin of Species in 1859, speciation is still an active 
field of research. This means that there are still 
unsolved aspects of this process and unresolved 
controversies. This remaining uncertainty is mainly 
accounted for by (1) the pluralism of speciation
phenomena, and (2) equivocation as to the meaning of
the word ‘speciation.’

Paleontologists traditionally have referred to the 
change of phylogenetic lineages as speciation. This 
process, however, even though resulting in change of 
the lineage over time, does not produce additional 
species. To avoid confusion, this process is best 
referred to as ‘phyletic evolution.’ 

What is usually meant when an author speaks of 
speciation is the multiplication of species. It is the 
production of new species by existing ones. Darwin 
encountered this process and understood its meaning, 
when he was told that the mockingbirds (Mimus) 
which he had collected on three different islands in 
the Galapagos were three different species. Because 
there is only one species on the mainland of South 
America, opposite the Galapagos Islands, colonists 
of that species apparently had speciated into three 
species in the Galapagos archipelago. This led to a
question that is still partly controversial today: by
what processes does such a multiplication of species
take place? Numerous answers to this question have
been proposed which require critical analysis. 


Types of Speciation


Instantaneous Speciation 

When essentialistic thinking was still dominant, speci- 
ation could be conceived only by the spontaneous 
production of a new individual that represented a 
new kind of organism, a new type. Some of Darwin’s 
contemporaries, such as Lyell and T.H. Huxley, adopted
this solution. Three leading Mendelians (de
Vries, Bateson, and Johannsen) believed, after 1900, 
that a single mutation could establish a new species. 
This mode of speciation was defended up to the mid- 
dle of the century (Goldschmidt, 1940; Willis, 1940; 
Schindewolf, 1950). 

Instantaneous speciation may be defined as the 
production of a single individual (or the offspring of 
a single mating) that is reproductively isolated from 
the species to which the parental stock belongs and 
that is reproductively and ecologically capable of 
establishing a new species population. 

Even though such instantaneous speciation by a
single mutation has been shown not to occur or-
dinarily in sexually reproducing organisms, it is of
course frequent among asexual clones, but the new
agamospecies* which are produced by this process
are not the equivalent of biological species.

*Agamospecies are sets of asexual clones. Speciation of
agamospecies takes place by mutation and by the elimination
of less successful clones by natural selection. This creates gaps
among sets of clones and if such gaps are wide enough, such
sets of clones are considered different species.

Instantaneous speciation, however, occurs not infrequently in plants, and more rarely in animals, as a result of chromosomal restructuring. For instance, the doubling of the chromosome set of a sterile species hybrid in plants may lead to the production of a fully fertile allopolyploid. For reasons not yet fully understood, such a restoration of fertility in species hybrids occurs far less frequently in animals. Instead, animal species hybrids may switch to parthenogenesis and may persist for long periods of time. Chromosomal rearrangements may also lead to the production of new postzygotic isolating mechanisms, if the new population succeeds in getting through the first deleterious heterozygous stage. It seems that in animals all cases of seeming instantaneous speciation are accompanied by a shift to parthenogenesis or self-fertilizing hermaphroditism. In plants, however, speciation by polyploidy is common: at least one-third of all plant species are the result of this process.


Geographic Speciation 

In this process, “a population which is geographically 
isolated from its parental species, acquires during this 
period of isolation genetic differences which promote 
or guarantee reproductive isolation when the external 
barriers break down” (Mayr, 1942). 

That speciation is a populational process was discovered by several naturalists in the first third of the nineteenth century. Darwin understood it in 1837 when studying the three species of mockingbirds he had discovered on three islands in the Galapagos. Even though Darwin himself later adopted sympatric speciation (see below) and downgraded the importance of geographic speciation, the importance of the latter process continued to be emphasized by leading naturalists such as Moritz Wagner, K. Jordan, D.S. Jordan, Stresemann, Rensch, and Mayr. Under the influence of Darwin and Weismann, sympatric speciation was adopted as the principal process of speciation until about 1942. It is now acknowledged that geographic speciation is the principal process of speciation in sexually reproducing animals and probably also in plants.


Allopatric (geographical) speciation occurs in two forms:


1. Dichopatric (splitting) speciation. The range of a 
more or less widespread species is split into two by 
a newly arising geographic barrier such as a water- 
way, a new mountain range, or a vegetational bar- 
rier (like a savanna separating two parts of a 
previously continuous rainforest). 
2. Peripatric (budding) speciation. A founder population is established beyond the current species border and beyond the gene flow of the parental species. Owing to the normal genetic processes occurring in any population (mutation, errors of sampling, selection, etc.), the isolated population continues to diverge genetically until it has reached the level of distinction that permits it to coexist as a separate species if it again establishes contact with the parental species.


Peripatric speciation differs in a number of ways from dichopatric speciation: (1) the gene pool is very small, responding readily to local selection pressures; (2) the population lives in a new physical and biotic environment and is exposed to new and rather strong selection pressures; (3) the gene pool of the founder population was started by a very small sampling of the genes of the parental population and is apt to lose additional genetic variants through inbreeding. The origin of new epistatic connections is thus favored, and this may lead to a rather radical restructuring of the new gene pool.
Mayr (1954) has referred to this possibility as a genetic 
‘revolution’; even though this term is perhaps too 
strong, there is no doubt about the opportunity for 
considerable genetic restructuring in founder popula- 
tions. This is substantiated by students of geographic 
variation who find that the most distant and most 
isolated peripatric founder populations are often dras- 
tically different from the parental species. By far the 
majority of the peripatric founder populations are 
unsuccessful, however, and quickly become extinct. 
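
Point (3), the very small founder sample, can be made concrete with a standard sampling expectation. The following is a minimal Python sketch under simplifying assumptions that are not in the original: the founders are diploid, their 2N gene copies are drawn independently from the parental gene pool, and later drift and inbreeding are ignored. Under these assumptions an allele at frequency p is missed with probability (1 - p)^(2N), so the expected number of parental alleles carried by the founders is the sum of 1 - (1 - p)^(2N) over all alleles. The function name and the 20-allele locus are hypothetical illustrations.

```python
def expected_alleles_retained(allele_freqs, n_founders):
    """Expected number of distinct parental alleles present in a founder group.

    Simplifying assumptions (illustrative only): n_founders diploid
    individuals carry 2 * n_founders gene copies, each drawn independently
    from the parental gene pool; later drift and inbreeding are ignored.
    """
    if n_founders < 1:
        raise ValueError("need at least one founder")
    copies = 2 * n_founders
    # An allele of frequency p is absent from the sample with
    # probability (1 - p) ** copies, so it is present with the complement.
    return sum(1.0 - (1.0 - p) ** copies for p in allele_freqs)


if __name__ == "__main__":
    # Hypothetical parental locus: 20 equally common alleles (p = 0.05 each).
    parental = [0.05] * 20
    for n in (1, 2, 5, 50):
        kept = expected_alleles_retained(parental, n)
        print(f"{n:>2} founders: about {kept:.1f} of {len(parental)} alleles expected")
```

Because the retention probability 1 - (1 - p)^(2N) is smallest for small p, rare alleles are the ones most likely to be missing from a small founder group.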


Evidence for geographic speciation 

The evidence for the universality of allopatric speciation is overwhelming (Mayr, 1963). Particularly impressive are the numerous cases of peripatric populations that are ‘borderline cases,’ that is, populations more or less on the way to becoming separate species.


Sympatric Speciation 

In the 1850s, Darwin switched from allopatric to 
sympatric speciation as a result of misapplying his 
Principle of Divergence (Mayr, 1992). He fought for 
it in his controversy with Moritz Wagner, misled by 
Wagner’s alternative, selection or isolation, as the 
cause of speciation. Sympatric speciation was the spe- 
ciation mechanism most widely adopted until the 
1940s. It was adopted by nearly all entomologists 
who worked with host-specific species, in spite of 
the counterarguments by K. Jordan and E. Poulton. 
In 1947 Mayr showed the obstacles encountered by sympatric speciation. Maynard Smith showed under what combination of circumstances sympatric speciation nevertheless could occur, and Bush (1994) provided considerable evidence for its actual occurrence.

Even though it is still evident that geographic spe- 
ciation is the most frequent process of speciation both 
in animals and plants, enough situations have been 
found in recent years to confirm the occurrence of 
sympatric speciation. Apparently it proceeds in ani- 
mals by two very different methods. 


By host shift 

If an individual of a host-specific insect shifts to 
another plant host and establishes a population, this 
founder population might, by sympatric speciation, 
become a new species, equally host-specific on the 
new host. The difficulty is reciprocal recolonization. 
If an insect can switch from plant species A to plant species B, its descendants are most likely able to colonize plant species A again, the host to which they were originally adapted. If there is sufficient colonization back and forth, an insect species might evolve that is successful on both plant species.
Selection might well favor such polyphagy as an 
expansion of the resource base of an insect species. A 
shift to a new host species might be particularly easy in 
a peripherally isolated population where the original 
plant species is rare but a suitable new one is quite 
abundant. 


By acquisition of a new mate preference 

It was recently discovered that sympatric speciation occurs quite frequently in certain families of fishes, particularly the Cichlidae. Not all cases of the coexistence of two (or more) very closely related species in the same body of water require an explanation by sympatric speciation; sometimes what actually happened was repeated colonization from an outside source. However, when six genetically very similar cichlid species coexist in a crater lake in Cameroon, species that are more similar to each other than to any species outside the lake, sympatric speciation is the only interpretation that makes sense. The mechanism is apparently a switch in mate preference (through sexual selection), but the details of this process have not yet been elucidated.


Acquisition of Isolating Mechanisms 


Isolating mechanisms (see Species) are the devices through which species are protected against hybridization with other species. A rigorous definition is: “Isolating mechanisms are biological properties of individuals that prevent the interbreeding of populations that are actually or potentially sympatric.” This definition excludes geographical barriers or any other kind of spatial isolation as an isolating mechanism. In sexually reproducing species, isolating mechanisms originate in populations during periods of geographic isolation. Darwin insisted that there cannot be natural selection for the development of isolating mechanisms; rather, they are an incidental byproduct of the genetic changes that occur independently in the isolated populations. This is particularly obvious for postzygotic isolating mechanisms (chromosomal changes). As far as prezygotic mechanisms are concerned, particularly behavioral barriers in animals, it was long assumed that they were the incidental byproduct of different stochastic processes in the two isolated populations and of the different selection forces to which they are exposed. Evidence has been found recently, however, that ‘fashions’ in mate selection may play an important role in certain groups of animals. Mate preferences, developed through sexual selection in isolated populations, may (by change of function) become behavioral isolating mechanisms.


Incomplete Speciation and 
Hybridization 


Incipient species often re-establish contact with the
parental population owing to range expansion before
their isolating mechanisms have been perfected. In the
zone of contact, hybridization will now take place 
and a more or less extensive hybrid zone will develop. 
Several different outcomes of such an event have been 
recorded. If the isolating mechanisms were nearly 
perfect and only a few hybrids occur, the two species 
will not fuse and natural selection may even lead to an 
improvement of the isolating mechanisms. 

If there is almost indiscriminate hybridization, a 
more or less permanent hybrid zone will develop 
due to the continuing elimination of the hybrids 
and their descendants, since they are of reduced via- 
bility. At the same time new hybrids between the two 
populations continue to be produced. The isolating 
mechanisms cannot be improved because of this con- 
tinuous recolonization of the hybrid belt from the two 
parental populations. 

There is some evidence that a highly isolated 
small hybrid population may, in time, develop its 
own isolating mechanisms (owing to an absence of 
recolonization by the two parental populations) and 
finally become a separate species. This is the most 
likely explanation for the occurrence of homoploid
hybrid species coexisting with one or the other parental
species, now without hybridizing with them. Such cases
have been described in plants and in animals. 

Some authors have claimed that two incipient species connected by a hybrid belt might become full species by a process called parapatric speciation. They postulate that the selection pressure against the hybrids would in due time lead to a reduced frequency of hybridization and ultimately to its disappearance. A careful study of all these cases, however, has convinced me that this is unlikely to occur. This is indicated by the great age of some of the hybrid belts between incipient species which originated in Pleistocene refugia and re-established contact with each other as much as 8000 to 10000 years ago.

Cases in which an expanding species begins to over- 
lap the range of a closely related species are different. 
The most advanced colonists of the expanding species may not be able to find conspecific mates and may then mate with individuals of the overlapped species. As the expansion continues, sufficient individuals of their own species become available and hybridization ceases.

The isolating mechanisms are not always perfect; they are, as it is said, ‘leaky.’ The result is occasional hybridization, even between good species. The frequency of such occasional hybridization varies among different kinds of organisms (see Species).

In the prokaryotes, unilateral gene exchange 
between agamospecies is apparently very frequent, 
even among such distant groups as the eubacteria 
and the archaebacteria. 


Rates of Speciation 


Perhaps the most astonishing aspect of speciation is 
the enormous difference in the rates of speciation. It 
may be instantaneous as in the case of allopolyploidy 
or the shift to parthenogenesis in animal species 
hybrids. On the other hand, populations that are 
known to have been isolated from each other millions 
of years ago may still lack any isolating mechanisms. 
The American botanist Asa Gray called Darwin’s 
attention to about six or seven eastern North 
American plants, including the skunk cabbage, 
which also occur in eastern Asia but have not changed 
in appearance since their isolation nor acquired cross- 
sterility. Here speciation had not occurred in a period 
of six to eight million years of separation. 

Rapid speciation may also occur in sexually reproducing organisms. A group of cichlid fish species in the southernmost bay of Lake Malawi in east Africa, each endemic to the waters around a rocky island, seems to have evolved in the last 1000 years. The rich fauna of more than four hundred species of cichlid fishes in Lake Victoria has evolved since the lakebed was completely dry, perhaps only 25000 years ago. Even more rapid seems to have been the sympatric speciation of some freshwater fishes, which occurred in less than 1000 years. Mayr infers that in birds an isolation of at least 10000 years, and more likely of more than 100000 years, is necessary for the perfecting of isolating mechanisms.


The word species is used in daily language to refer to
different kinds of things. One speaks of different spe- 
cies of metals or species of minerals. This concept of 
species focuses on degree of difference and is now 
usually referred to as the typological species concept. 
This was, on the whole, the species concept of Lin- 
naeus and ruled unchallenged far into the nineteenth 
century. It fitted well with the essentialistic thinking 
of that period, but eventually it became obvious that it 
did not reveal the true nature of species of organisms. 
This resulted in a search for other, hopefully better, 
species concepts. Before discussing them, it is, how- 
ever, necessary to determine what the word ‘species’ 
means. In current taxonomy the word species is used 
for three different concepts. 


1. Species as concept. For different biologists, the word ‘species’ had different meanings. For some it indicated an entity that was different; for others it was a reproductive community. Species definitions reflecting at least seven such different concepts have been proposed and will be discussed below.

2. Species as taxon. A species taxon is a group of 
natural populations conforming to the definition 
of a species concept. The species concept serves as 
the yardstick by which to delimit a taxon against 
other taxa. A taxon can be described and delimited, 
but not defined. Such a group of populations has 
reality, but its rank (species or subspecies) is often 
difficult to determine. This determination of rank 
is rarely a problem where two closely related popu- 
lations are sympatric or are in the process of invad- 
ing each other’s ranges. If the two populations 
remain reproductively isolated, they are two spe- 
cies; if they freely interbreed, they are only a single 
species. 

For more than 130 years, naturalists have agreed that species are not essentialistic classes. But then what are they? Haeckel said, “the species is an individual.” However, even though there is an internal cohesion in a species, which makes it a unified system corresponding to an individual, the word ‘individual’ in the vernacular always applies to a singular object, which a species is not. It is definitely counterintuitive to refer to the circa 6 billion human individuals collectively as an individual. It has therefore been suggested that a third ontological category be introduced, the biopopulation. A species taxon is a biopopulation. It has the internal cohesion of an individual, but is, so to speak, a multiple individual. However, a species taxon is definitely not a class.

3. Species as category. A category designates rank 
or level in a hierarchic classification. The species 
category is the class whose members are the species 
taxa. Not only are all biological species placed in 
this category, but so also are those asexual entities 
(agamospecies) that are as different from each other 
as are biological species. Such pluralism in the spe- 
cies category is inevitable since biological species 
and agamospecies are two rather different natural 
phenomena. Agamospecies occur in most higher 
taxa of animals (not in mammals or birds), are 
very common in plants, and all ‘species’ of prokary- 
otes are agamospecies. An agamospecies consists of 
a set of clones which, in the aggregate, are 
as different from other agamospecies as are good 
biological species from each other. Owing to the 
steady selection against inferior clones, gaps are 
produced among the clones that correspond to the 
species borders of biological species. 


Species concepts 


For a layperson, a species was simply an assemblage of 
similar entities. But, as the knowledge of nature grew, 
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than 100000 years is necessary for the perfecting of 
isolating mechanisms. 
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The word species is used in daily language to refer to 
different kinds of things. One speaks of different spe- 
cies of metals or species of minerals. This concept of 
species focuses on degree of difference and is now 
usually referred to as the typological species concept. 
This was, on the whole, the species concept of Lin- 
naeus and ruled unchallenged far into the nineteenth 
century. It fitted well with the essentialistic thinking 
of that period, but eventually it became obvious that it 
did not reveal the true nature of species of organisms. 
This resulted in a search for other, hopefully better, 
species concepts. Before discussing them, it is, how- 
ever, necessary to determine what the word ‘species’ 
means. In current taxonomy the word species is used 
for three different concepts. 


1. Species as concept. For different biologists, the 
word ‘species’ had different meanings. For some it 
indicated an entity that was different, for others it 
was a reproductive community. Species definitions 


reflecting at least seven such different concepts have 
been proposed and will be discussed below. 

2. Species as taxon. A species taxon is a group of 
natural populations conforming to the definition 
of a species concept. The species concept serves as 
the yardstick by which to delimit a taxon against 
other taxa. A taxon can be described and delimited, 
but not defined. Such a group of populations has 
reality, but its rank (species or subspecies) is often 
difficult to determine. This determination of rank 
is rarely a problem where two closely related popu- 
lations are sympatric or are in the process of invad- 
ing each other’s ranges. If the two populations 
remain reproductively isolated, they are two spe- 
cies; if they freely interbreed, they are only a single 
species. 

For more than 130 years, naturalists, have agreed 
that species are not essentialistic classes. But then 
what are they? Haeckel said, “the species is an 
individual.” However, even though there is an inter- 
nal cohesion in a species, which makes it a unified 
system corresponding to an individual, the word 
‘individual’ in the vernacular always applies to a 
singular object which a species is not. It is definitely 
counterintuitive to refer to the circa 6 billion 
human individuals as an individual. It has therefore 
been suggested to introduce a third ontological 
category, the biopopulation. A species taxon is a 
biopopulation. It has the internal cohesion of an 
individual, but is so to speak a multiple individual. 
However, a species taxon is definitely not a class. 

3. Species as category. A category designates rank 
or level in a hierarchic classification. The species 
category is the class whose members are the species 
taxa. Not only are all biological species placed in 
this category, but so also are those asexual entities 
(agamospecies) that are as different from each other 
as are biological species. Such pluralism in the spe- 
cies category is inevitable since biological species 
and agamospecies are two rather different natural 
phenomena. Agamospecies occur in most higher 
taxa of animals (not in mammals or birds), are 
very common in plants, and all ‘species’ of prokary- 
otes are agamospecies. An agamospecies consists of 
a set of clones which, in the aggregate, are 
as different from other agamospecies as are good 
biological species from each other. Owing to the 
steady selection against inferior clones, gaps are 
produced among the clones that correspond to the 
species borders of biological species. 


Species concepts 


For a layperson, a species was simply an assemblage of 
similar entities. But, as the knowledge of nature grew, 


this vague concept was no longer adequate and a need 
developed for a more precise definition of the species 
concept. The first attempt was the typological species 
concept, but when its weaknesses were discovered, a 
considerable number of ‘better’ concepts were pro- 
posed by naturalists. Six of these concepts in addition 
to the typological one will be discussed here. 


Typological or Essentialistic Species Concept 
This concept developed during and after the Renais- 
sance and particularly in the eighteenth century. It was 
supported by three sets of observations: 


1. The observations of the naturalists of seemingly well-defined kinds of species of animals and plants at a given locality.

2. The belief of the Christian naturalists that there are “as many species as there were diverse forms created in the beginning” (Linnaeus).

3. The philosophical view first advanced by the Pythagoreans and Plato that the observable variation of nature can be assigned to separate classes characterized by their definition (eidos, essence). Each class consists of constant and essentially identical members.


According to this concept, the observed diversity of the universe reflects the existence of a limited number of underlying ‘universals’ or types. Individuals do not stand in any special relation to each other, being merely manifestations of the same type. Members of a species form a class. Variation is the result of imperfect manifestations of the idea implicit in each species. This concept was the species concept of Linnaeus and his followers. Because this philosophical tradition is also referred to as essentialism, the typological definition is also sometimes called the essentialist species concept. Degree of phenotypic difference is the criterion of species status for the adherent of the typological species concept. For him, a different species is simply that which is different. Morphological evidence is used by all taxonomists, but there is an enormous difference between basing one’s species concept entirely on degree of difference and using morphological evidence as a basis for inference in the application of a biological species concept.

The typological species concept was accepted by 
taxonomists almost unanimously as late as the mid- 
nineteenth century. It included the acceptance of four 
postulates: 


1. Species consist of similar individuals sharing the 
same essence. 

2. Each species is separated from all others by a sharp 
discontinuity. 
3. Each species is completely constant through time.

4. There are strict limits to the possible variation within any one species.


According to this concept, species are defined as 
groups of similar individuals that are different from 
individuals belonging to other species. 


Difficulties of the typological species concept 
In the last 150 years, more and more exceptions to 
these criteria were found by working taxonomists. 
Great phenotypic variation was discovered in many 
species and the different sexes or different age stages 
or other intraspecific variants were often at first
described as different species. When unmasked even- 
tually, they were assigned to their proper species on 
the basis of biological criteria (life history, etc.). This 
was in plain conflict with the typological definition. 
Equally troublesome was the opposite extreme, the 
absence of observable phenotypic differences between 
noninterbreeding coexisting species. On the basis of 
life history criteria, literally hundreds of morphologic- 
ally indistinguishable species were described in vir- 
tually all higher taxa of animals, from mammals 
down to protozoans. What was, for instance, tradi- 
tionally considered Paramecium aurelia was finally 
shown to consist of no fewer than 14 ‘sibling species,’
as such cryptic species are called. They also occur 
among flowering plants. Gilbert White, the vicar of 
Selborne, discovered in 1768 the first sibling species 
by showing that the leaf warbler Phylloscopus trochilus 
of Linnaeus actually consisted of three different spe- 
cies. It became quite clear in time that the typological 
species concept was not particularly suitable for sexu- 
ally reproducing species of organisms. What other 
concept should instead be adopted? 


Biological Species Concept 

The shift from a strictly typological to a more bio- 
logical species concept is already foreshadowed in the 
writings of Linnaeus. When he discovered that, owing to plumage differences, he had described the juvenile goshawk and the female mallard as different species, he reduced his names to synonyms, realizing that on the basis of life history criteria they belonged to the same species. He revealed that for him biological criteria had primacy over degree of phenotypic difference.
He evidently had asked himself what the real meaning 
of the word ‘species’ was and had adopted a new 
concept. 


The meaning of species 

Perhaps one should first ask what is meant by ‘concept.’ The species concept is our view of the role of species in nature. To find this out, we must ask the Darwinian question, “Why are there species? What is their meaning in the scheme of things?”


There is no better way of answering these questions than to 
try to conceive of a world without species. Let us think for 
instance of a world in which there are only individuals, all 
belonging to a single mating community. Every individual to 
varying degrees is different from every other one, and every 
individual is capable of mating with those others that are 
most similar to it. In such a world every individual would be, 
so to speak, the center of a series of concentric circles of 
increasingly more different individuals. Any two mates 
would be on the average rather different from each other, 
and would produce a vast array of genetically different types 
among their offspring. Now let us assume that one of these 
recombinations is particularly well adapted for one of the 
available niches. It is prosperous in this niche, but when the 
time comes for mating, this superior genetic complex will 
inevitably be broken up by recombination. There is no 
mechanism that would prevent such a destruction of genet- 
ically superior combinations and there is, therefore, no pos- 
sibility of the gradual improvement of genetic combinations. 
The significance of the species now becomes evident. The 
reproductive isolation of a species is a protective device 
against the breaking up of its well-integrated coadapted 
gene system. Through organizing organic diversity into 
species, a system has been created that permits genetic diver- 
sification and the accumulation of favorable genes and gene 
combinations without any danger of destruction of the basic 
gene complex. (Mayr, 1963, p. 423) 


The meaning of the species is now clear: it is a protected reproductive community. This is expressed in the biological species definition: “A species is a group of interbreeding natural populations that is reproductively isolated from other such groups.” Since the interaction of populations is a major aspect of the species concept, the concept is strictly applicable only in the nondimensional situation, that is, at a given place at a given time. The role of the species concept is to serve as a yardstick in the testing of the species status of populations. A species, thus, is a population (or group of populations), not a type.

The biological species concept is particularly use- 
ful for the field naturalist, the ecologist, and the student 
of behavior. However, in difficult situations it requires 
an intimate knowledge of natural populations which a 
museum or herbarium taxonomist may not have. 


Isolating mechanisms 

Coexisting biological species are prevented from interbreeding by genetic propensities. These are referred to as isolating mechanisms. They are “biological properties of individuals which prevent the interbreeding of populations that are actually or potentially sympatric” (Mayr, 1963, p. 91). Cross-sterility was long considered the exclusive barrier between sympatric species. However, in the last 100 years numerous cases have been described of sympatric species which virtually never interbreed in nature but prove to be fully fertile in captivity. Their genetic independence is sustained by isolating mechanisms other than sterility.

A large number of different kinds of isolating 
mechanisms have now been discovered. They consist 
of premating mechanisms, which prevent the occur- 
rence of copulation, and postmating mechanisms 
(sterility genes, chromosomal incompatibility, etc.), 
which diminish or prevent the success of crossing 
with nonconspecific individuals. Most important among the premating mechanisms, particularly in animals, are behavioral barriers, but seasonal and habitat isolation are also important. Isolating mechanisms occasionally break down and permit sporadic hybridization, particularly in plants. Yet this does not necessarily lead to an eventual fusion of the two species if the hybrids and their offspring are of sufficiently lowered viability to be eliminated rather quickly from the gene pool.

The ‘leaky’ nature of isolating mechanisms would 
seem to invalidate the biological species concept 
(based on noninterbreeding) or at least its applicability 
to plants. However, studies of local floras of angio- 
sperms have shown that a very high percentage of 
plant species have all the characteristics of biological 
species and are remarkably well isolated reproduc- 
tively from other sympatric species, even where occa- 
sional hybridization occurs. 

The perfecting of leaky isolating mechanisms is 
apparently not easy. Several cases are known in plants 
(Quercus, Populus) where the fossil record shows that 
two species were hybridizing many millions of years 
ago and still occasionally hybridize, even though, on 
the whole, they still coexist as two perfectly distinct 
species. In animals an introgression of genes of one 
species into another occurs also, perhaps even fre- 
quently. Wherever the range of the gray wolf (Canis 
lupus) meets that of the coyote (Canis latrans) mito- 
chondrial genes of the coyote are found in the wolf 
populations. But phenotypically the introgressed 
individuals look like typical wolves. 

The application of the biological species concept to 
populations may encounter various difficulties. Four 
of them deserve special discussion. 


Agamospecies

Asexual organisms do not form biopopulations but produce separate uniparentally reproducing clones. The biological species concept therefore cannot be applied. Instead, an assemblage of similar clones is considered an agamospecies (Cain) and ranked in the Linnaean hierarchy as if it were a biological species.


Some clones (agamospecies) are apparently quite 
isolated from other related clones, because any for- 
merly existing intermediate clones were eliminated by 
natural selection. As clones they show little variability 
and are therefore more easily diagnosed than related 
sexual species. Easy identifiability is helpful in the 
delimitation of species taxa, but this has nothing to 
do with the biological meaning of species concepts. 


The ranking of geographically isolated populations In 
many species, there are isolated populations beyond the 
regular species border, separated by minor geograph- 
ical barriers. Some of these isolates have somewhat 
diverged genetically, owing to selection and stochastic 
processes. Whether or not such populations are still 
conspecific with the main body of the species cannot 
be determined directly but must be inferred. The 
methods on which such inferences are based are 
described in the taxonomic literature. These methods 
primarily make use of degrees of morphological 
difference. It must be emphasized that the rank thus 
determined is not based on difference but on the 
probability of interbreeding as inferred from the 
degree of morphological difference. 

Rejection of the applicability of the biological spe- 
cies concept in the delimitation of species taxa has led 
many botanists to recognize numerous allopatric 
populations as full species, populations which an 
ornithologist or lepidopterist would call subspecies. 
The ‘species’ of such botanists are of highly unequal 
biological significance. 


Incomplete isolation Two incipient species, after a
period of isolation, may spread and come secondarily 
in contact with each other. If speciation was not yet 
completed, they will form a parapatric hybrid zone. 
If there is a complete breakdown of the isolating 
mechanisms, the two incipient species must be ranked 
as subspecies. However, if only an occasional hybrid is 
produced, they are best considered full species. Occa- 
sional hybridization between related sympatric spe- 
cies is much more common in plants than in animals, 
but in spite of its frequency may not lead to a complete 
breakdown of the barrier between the two species. 


Selfing In some sexually reproducing species, self- 
mating may evolve. This differs from strictly asexual 
reproduction in that the egg does not develop without 
fertilization, but is fertilized by a gamete produced by 
the parent of the egg. An equivalent situation is
represented by automixis. Lineages that are almost
exclusively selfing occur in many sexually reproducing
species, but nothing is
gained by developing a special species concept for 
such situations. Such selfing lineages are best included 
with the biological species from which they are 
derived. 
The four mentioned difficulties in the application of
the biological species concept are real. However, all
endeavors to arrive at a meaningful species concept
that is equally applicable to sexual and asexual
organisms have failed. We must accept having
two different kinds of species concepts for the two 
kinds of organisms. The decision to call all peripher- 
ally isolated populations full species is biologically 
unacceptable. It leads to a complete negation of the 
actual meaning of species and amounts to an 
unequivocal re-establishment of the typological spe- 
cies concept. 

There is no question that there are a number of 
evolutionary processes (particularly asexuality) to 
which the biological species concept cannot be 
applied. In such cases one must adopt plural solutions, 
as so often in other branches of biology. Alternative 
mechanisms seem far less acceptable. 


Other Species Concepts 


The biological and the typological species concepts are 
the two most widely adopted species concepts. How- 
ever, in order to correct what some authors considered 
to be deficiencies of the biological and typological 
species concepts, a number of other species concepts 
have been proposed over the years. Some of these 
cannot properly be considered concepts; they are sim- 
ply operational instructions for how to delimit species 
taxa. Many of the controversies which they have 
raised are still raging. 


Nominalistic Species Concept 
Nominalists deny the real existence of species. As 
stated by one of them (Bessey): 


nature produces individuals and nothing more...species 
have no actual existence in nature. They are mental concepts 
and nothing more...species have been invented in order 
that we may refer to great numbers of individuals collec- 
tively. 


Every naturalist knows from practical experience that 
this is simply not true. Species of animals are not 
human constructs, nor are they types in the sense of 
Plato; rather, they are existing entities for which there
is no equivalent in the realm of inanimate nature. 
There is no better refutation of the nominalist claim 
than the fact that primitive natives refer to the same 
natural populations as species as do university gradu- 
ates in the Western world. Even in a local flora, the vast
majority of species are exceptionally well demarcated 
against each other. It is only among asexual organisms 
that species frequently may have to be delimited rather 
arbitrarily as stated by the nominalists. 




Evolutionary Species Concept 

Some paleontologists have advanced a species concept 
based on evolutionary criteria. In Simpson’s definition 
(1961, p. 153), “an evolutionary species is a lineage (an 
ancestral-descendent sequence of populations) evolv- 
ing separately from others and with its own unitary 
evolutionary role and tendencies.” Actually this is the 
definition of a phyletic lineage, not of a species. It 
applies equally to almost any isolated population or 
incipient species; it also fails to explain what a ‘unitary 
role’ is and why phyletic lines do not interbreed with 
each other. What apparently concerned Simpson most 
was the problem of the delimitation of species taxa in 
the time dimension, but here his definition is of little 
help. When we consider a sequence of morphotypes in 
a single phyletic lineage, how are we to know whether 
these morphotypes have different unitary evolution- 
ary roles and tendencies and should thus be consider- 
ed different species or whether all of them have the 
same unitary evolutionary role and should thus be 
treated as chronospecies? The principal weakness of 
the so-called evolutionary species definition is that it 
fails to account for the causation and maintenance of 
discontinuities among contemporary species. Further- 
more, none of the proponents of an evolutionary spe- 
cies definition has provided a nonarbitrary criterion 
by which to divide a continuous phyletic lineage 
into separate species taxa. Hennig (1966) arbitrarily 
terminated every evolutionary species when a daughter
species branched off the parental lineage, ignoring
the fact that the parental species may remain un- 
changed when a new species originates by peripatric 
speciation. The evolutionary species concept does not 
account for the selection forces responsible for the 
existence of species. 


Phylogenetic Species Concept 

Cladists, in recent years, have proposed one further 
species concept. It is their endeavor to delimit the 
branches of the phylogenetic tree and to locate the 
branching points. To facilitate this they recognize 
‘phylogenetic species.’ “A phylogenetic species is an 
irreducible (basal) cluster of organisms, diagnosably 
distinct from other such clusters, and within which 
there is a parental pattern of ancestry and descent.” 
This definition does not use the word ‘population’ but 
indicates by its reference to ‘a parental pattern of 
ancestry and descent’ that it refers to a branch. Each 
branch is initiated by a population designated the stem 
mother species. A species, for the cladists, is an evolu- 
tionary unit characterized by its difference from other 
populations. Such a population has no other biological 
significance than its potential to be the starting point 
of a new phylogenetic lineage. Relying entirely on 
the amount of difference from other populations, the 
phylogenetic species is actually an undisguised return 
to the typological species concept. Almost any isol- 
ated subspecies of a traditional classification can be 
raised to species rank under the phylogenetic definition.
The resulting phylogenetic species are of highly
uneven biological value. According to the biological
species concept, many of them would be subspecies, 
others incipient species or allospecies, and finally still 
others well-isolated full species. The phylogenetic 
species basically has no biological significance and 
since its criterion, degrees of difference, is purely sub- 
jective, it leads to a rather arbitrary determination of 
species status. 


Recognition Species Concept 

Paterson proposed a recognition species concept, 
based on the capacity of members of a species to 
recognize each other as potential mates. However, it 
has been shown that Paterson’s arguments were based 
on misunderstandings, and that the recognition spe- 
cies concept is only a version of the biological species 
concept. 


Cohesion Species Concept 

The biological species concept describes an ideal situ- 
ation of a large interactive cohesive gene pool, 
completely isolated reproductively from other such 
species. Alas, there are numerous situations which do 
not quite fit, particularly the agamospecies. But there 
are also ‘leaky’ isolating mechanisms, resulting in spe- 
cies hybrids and in the introgression of genetic mate- 
rial into another species, there are selfing lineages and 
species, and there is the complete abandonment of 
sexual reproduction and the origin of agamospecies. 
Templeton proposed to bring all these diverse kinds of 
species under one hat by accepting a ‘cohesion species 
concept.’ It defines “a species as the most inclusive 
group of organisms having the potential for genetic 
and/or demographic exchangeability” or “a popula- 
tion of individuals having the potential for phenotypic 
cohesion through intrinsic cohesion mechanisms.” It
is difficult to see an underlying Darwinian concept in 
these definitions. They sound more like instructions 
for the delimitation of species taxa. 

It also seems that many phenomena that should be 
covered by the cohesion concept do not fit. For 
instance, where is the cohesion among clones of an
agamospecies, ultimately derived from one ancestral
species? There is cohesion within each clone, but none
whatsoever between clones. Worse, ‘the potential for 
genetic...exchangeability’ is shared by many very 
good species, coyote and wolf being a typical example. 
Most species of duck cross readily with each other 
and produce fertile hybrids. And, of course, all
prokaryotes freely exchange genes, even across the two
very different subdivisions of eubacteria and archaebacteria.
Although the cohesion species concept is not a suit- 
able replacement of the biological species concept, the 
reading of Templeton’s account is recommended as a 
particularly well-informed discussion of the difficul- 
ties of a sound species definition. 
‘Species selection’ refers to a process where a variety
of factors cause some species to produce more species 
than others, resulting in a change in proportion of 
species deriving from different ancestral species. 

Species selection is analogous to natural selection, 
which operates at the population level. The closest 
natural selection analogy would be to clonal selection, 
since species are splitting (or becoming extinct) at 
different rates. The difference in rates explains the 
differences in relative abundance of species groups, 
each deriving from a single ancestral species. 

Species selection involves selection among species;
its long-term result may be a change in the modal
morphology or other traits of the whole set of species
under consideration.
We presume a source of heritable variability, a re- 
productive mechanism, and differential ‘fitness’ 
among units, in this case species. Differential fitness
would include differential rates of speciation among
species of different morphological types, or differential
extinction. Speciation events generate among-species
morphological variation, while selective mortality, or 
differential speciation rates, would bias survival in one 
morphological direction. 

Species selection would involve cases where species- 
level properties bias speciation or extinction rates. For 
example, if reduced dispersal is fixed in a species it will 
tend to increase the probability of regional isolation, 
genetic differentiation, and speciation. One ancestral 
species bearing the trait of reduced dispersal might 
therefore produce more species than another species
with widespread dispersal. Traits associated with the 
reduced-dispersal species might hitchhike along and 
be proliferated in a large group of descendant species, 
relative to traits hitchhiking with the species that has a 
low speciation rate. 
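The hitchhiking argument can be illustrated with a deterministic sketch. It assumes, purely for illustration, a pure-birth (Yule) process in which every species splits at a constant per-lineage rate and none goes extinct; the rates and the function names `expected_clade_size` and `trait_frequency` are hypothetical. Under that assumption the expected number of descendants of one ancestral species after time t is e^(λt), so the clade with the higher speciation rate makes up a growing share of all species, and any traits linked to it rise in frequency among species even though, in this model, nothing favors those traits within any species.

```python
import math

def expected_clade_size(speciation_rate: float, time: float) -> float:
    """Expected number of species descended from one ancestor under a
    pure-birth (Yule) process with a constant per-lineage rate."""
    return math.exp(speciation_rate * time)

def trait_frequency(rate_low_dispersal: float, rate_high_dispersal: float,
                    time: float) -> float:
    """Fraction of all descendant species that carry the traits of the
    low-dispersal ancestor, starting from one ancestor of each kind."""
    n_low = expected_clade_size(rate_low_dispersal, time)
    n_high = expected_clade_size(rate_high_dispersal, time)
    return n_low / (n_low + n_high)

if __name__ == "__main__":
    # Hypothetical rates, in speciation events per lineage per million years.
    for t in (0, 5, 10, 20):
        freq = trait_frequency(0.30, 0.10, t)
        print(f"t = {t:>2} Myr: {freq:.3f} of species carry the low-dispersal traits")
```

Because only speciation rates differ between the two clades, the shift in trait frequency here is produced entirely at the level of species.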

Species selection is a process that is thought to
generate the array of morphological differences among
species, amplifying or even supplanting the power
of natural selection operating within each species. Thus
natural selection (within species) is necessary but may 
be insufficient to explain the breadth of morphological 
differences within a monophyletic group of species. 

Species-level characters such as dispersal type 
might also influence extinction rates. For example, 
low dispersal might promote smaller geographic spe- 
cies ranges, which would make such species more 
vulnerable to extinction. Species with short dispersal 
might therefore have high speciation rates and high 
extinction rates, which would suggest that small shifts 
in the values of these rates could cause very different 
outcomes in success at the level of species selection. 
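The sensitivity to small rate shifts can be seen in a linear birth-death model, used here as an assumed stand-in for real diversification dynamics. With a constant speciation rate λ and extinction rate μ, the expected number of species grows as e^((λ-μ)t), and a clade founded by one species eventually dies out with probability μ/λ when λ > μ (and with certainty when λ ≤ μ, provided μ > 0). The rate values and function names below are hypothetical.

```python
import math

def expected_species_count(speciation_rate: float, extinction_rate: float,
                           time: float, initial: int = 1) -> float:
    """Expected number of species after `time` in a linear birth-death
    process with constant per-lineage rates."""
    net_rate = speciation_rate - extinction_rate
    return initial * math.exp(net_rate * time)

def ultimate_extinction_probability(speciation_rate: float,
                                    extinction_rate: float) -> float:
    """Probability that a clade founded by a single species eventually
    goes extinct under the same constant-rate model."""
    if extinction_rate == 0:
        return 0.0
    if speciation_rate <= extinction_rate:
        return 1.0
    return extinction_rate / speciation_rate

if __name__ == "__main__":
    # Hypothetical rates per lineage per million years: speciation is fixed
    # and extinction is nudged slightly below or above it.
    speciation = 0.30
    for extinction in (0.28, 0.32):
        n = expected_species_count(speciation, extinction, time=50)
        p_ext = ultimate_extinction_probability(speciation, extinction)
        print(f"mu = {extinction:.2f}: E[N(50)] = {n:.2f}, "
              f"P(extinct eventually) = {p_ext:.3f}")
```

With λ = 0.30, moving μ from 0.28 to 0.32 flips the net rate from +0.02 to -0.02, so over 50 time units the expected clade size goes from e ≈ 2.72 species to 1/e ≈ 0.37.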

Species selection is a controversial process that has
been proposed as a way in which macroevolutionary
processes above the population level could cause
large-scale evolutionary trends.


See also: Natural Selection; Speciation; Species; 
Species Trees 
Any species consists of many individuals, and those
are distributed over a more or less continuous
geographical area. As time goes on, some geological
factor, such as mountain formation, a shift in a river's
course, or the creation of a channel, may create a
geographical barrier within the range of this species.
Such an event produces two or more geographically
isolated populations within the same species, and the
evolutionary histories of those populations (including
allele frequency change through random genetic drift,
the occurrence of new mutations, and the pattern of
natural selection) will be different. These factors all
contribute to differentiation of the genetic constitutions
of those populations, and they become incipient
species. Iteration of this divergence process produces
a tree-like structure, as Darwin (1859) showed using a
figure (Figure 1). This structure is called a species tree.
Figure 1 Darwin’s diagram of species trees.

Each species is considered to be an abstract point in 
a species tree. However, one species includes many 
individuals, and in reality, a species tree is a very rough 
approximation of the genealogy of genes residing in 
those individuals. Because species trees are usually 
estimated from gene tree(s), we should be careful to 
distinguish between these two types of trees.

In fact, a gene tree may be different from the corres- 
ponding species tree. This difference comes from the 
existence of gene genealogy in the ancestral species. A 
simple example is illustrated in Figure 1A. A gene
sampled from species A has its direct ancestor at the
speciation time $T_1$ generations ago, and so does a gene
sampled from species B. Thus the divergence time
between the two genes sampled from the different
species always overestimates that of the species. The
amount of overestimation corresponds to the coalescence
time in the ancestral species, and its expectation
is $2N$ generations for neutrally evolving nuclear genes
of diploid organisms, where $N$ is the population size of
the ancestral species. Therefore, if the two speciation
events ($T_1$ and $T_2$) are close enough, the topological
relationship of the gene tree may become different from
that of the species tree, as shown in Figure 1B. Although
species A and B are more closely related to each other
than either is to C, genes sampled from species B and C
happen to be more closely related to each other than to
that sampled from species A. The probability of obtaining
an erroneous tree topology is given by
$\Pr(\text{error}) = \tfrac{2}{3}\, e^{-T/(2N)}$, where $T = T_2 - T_1$ generations.
Therefore, a species tree estimated from a single gene may
not be correct even if the gene tree was correctly
estimated. In such cases, more than one gene should be
used.
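The probability formula can be checked numerically. The sketch below assumes the standard coalescent approximation, in which the waiting time for two gene lineages in a diploid population of N individuals is exponential with mean 2N generations, and compares the closed form with a Monte Carlo estimate. If the A and B lineages have not coalesced by $T_2$, three lineages enter the older ancestral population and each of the three pairs is equally likely to coalesce first; only one of those pairs reproduces the species-tree topology, which is where the factor 2/3 comes from. Function names and parameter values are illustrative.

```python
import math
import random

def prob_incongruent(t_internal: float, n_ancestral: float) -> float:
    """Closed form: probability that a single gene tree for species A, B, C
    differs in topology from the species tree ((A, B), C).

    t_internal  -- T = T2 - T1, in generations
    n_ancestral -- N, size of the diploid population ancestral to A and B
    """
    return (2.0 / 3.0) * math.exp(-t_internal / (2.0 * n_ancestral))

def simulate_incongruent(t_internal: float, n_ancestral: float,
                         replicates: int = 100_000, seed: int = 1) -> float:
    """Monte Carlo estimate of the same probability under the coalescent."""
    rng = random.Random(seed)
    errors = 0
    for _ in range(replicates):
        # Waiting time, traced back from T1, for the A and B lineages to
        # coalesce: exponential with mean 2N generations.
        wait = rng.expovariate(1.0 / (2.0 * n_ancestral))
        if wait < t_internal:
            continue  # coalescence before T2: gene tree matches species tree
        # Three lineages now share one ancestral population; each pair is
        # equally likely to coalesce first, and only (A, B), coded 0, is correct.
        if rng.randrange(3) != 0:
            errors += 1
    return errors / replicates

if __name__ == "__main__":
    T, N = 1_000, 500  # hypothetical values, chosen so that T / 2N = 1
    print(f"closed form: {prob_incongruent(T, N):.4f}")
    print(f"simulation:  {simulate_incongruent(T, N):.4f}")
```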


When distantly related species are compared, however,
the above problem is less serious, and the topology
of a gene tree most often corresponds to that of the
species tree. This is the foundation of ‘molecular
phylogenetics,’ in which molecular data such as
nucleotide sequences are used to reconstruct the
phylogenetic relationships of various species.
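As a minimal illustration of how nucleotide sequences feed into tree reconstruction, the sketch below computes the p-distance (the proportion of aligned sites that differ) between every pair of sequences and reports the closest pair, which is the first pair a simple clustering method such as UPGMA would join. The sequences are hypothetical, the p-distance is not corrected for multiple substitutions, and this is only one of many approaches; parsimony, maximum-likelihood, and Bayesian methods are also widely used.

```python
from itertools import combinations

def p_distance(seq1: str, seq2: str) -> float:
    """Proportion of aligned sites at which two sequences differ
    (no correction for multiple substitutions)."""
    if len(seq1) != len(seq2) or not seq1:
        raise ValueError("sequences must be aligned, non-empty and of equal length")
    differences = sum(a != b for a, b in zip(seq1, seq2))
    return differences / len(seq1)

def distance_table(seqs: dict[str, str]) -> dict[tuple[str, str], float]:
    """p-distances for every unordered pair of taxa."""
    return {(x, y): p_distance(seqs[x], seqs[y])
            for x, y in combinations(sorted(seqs), 2)}

def closest_pair(seqs: dict[str, str]) -> tuple[str, str]:
    """The pair with the smallest distance, i.e. the first pair a
    UPGMA-style clustering would join."""
    table = distance_table(seqs)
    return min(table, key=table.get)

if __name__ == "__main__":
    # Hypothetical aligned sequences for three species.
    aligned = {
        "A": "ACGTACGTAC",
        "B": "ACGTACGTTC",
        "C": "TCGAACGTTG",
    }
    for pair, d in distance_table(aligned).items():
        print(pair, d)
    print("join first:", closest_pair(aligned))
```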
Genetic variation, the existence of at least two forms,
is the essential ingredient in all genetic experiments. 
Phenotypic variation, in particular, is used as a means 
for uncovering the normal function of a wild-type 
allele at many loci. It was the availability of many 
variant phenotypes within the fancy mouse trade that 
made the house mouse such an ideal organism for 
studies by early geneticists. In a sense though, the 
house mouse won by default because, in the absence 
of domestication and artificial selection, variation in 
traits visible to the eye is extremely rare, and thus, 
other small mammals were genetically intractable. 
Although the fancy mouse variants provided material 
for a host of early genetic studies, the number of 
different variants was still limited, and the rate at 
which new ones arose spontaneously in experimental 
colonies was exceedingly low: it is now known that, 
on average, only one gamete in 100 000 is likely to
carry a detectable mutation at a particular locus. 
During the 1920s, several investigators began inves- 
tigating the effects of X-rays on reproduction and 
development. In two laboratories, at least, new mutant 
alleles were recovered in the offspring of irradiated 
parents, but the investigators failed to make any 
connection between irradiation and the induction of 
these mutations. The connection was finally made by 
Muller who, in 1927, published his classic paper 
explaining the induction of heritable mutations by 
X-rays. Since that time, geneticists who study all of 
the major experimental organisms — from bacteria to 
mice — have used both ionizing irradiation and various 
chemicals as agents of mutagenesis to create novel 
alleles as tools for understanding gene function. 
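The spontaneous rate quoted above shows why induced mutagenesis was needed. The sketch assumes that each gamete independently carries a new detectable mutation at a given locus with probability 10^-5 (the average figure in the text); the function names are illustrative.

```python
import math

# Average rate quoted in the text: about one gamete in 100 000 carries a
# detectable new mutation at a particular locus.
SPONTANEOUS_RATE = 1e-5

def expected_mutants(gametes: int, rate: float = SPONTANEOUS_RATE) -> float:
    """Expected number of mutant gametes at one locus."""
    return gametes * rate

def prob_at_least_one(gametes: int, rate: float = SPONTANEOUS_RATE) -> float:
    """Chance of recovering at least one mutant, assuming gametes are
    independent and share the same rate."""
    return 1.0 - (1.0 - rate) ** gametes

def gametes_needed(confidence: float, rate: float = SPONTANEOUS_RATE) -> int:
    """Smallest number of gametes giving at least `confidence` probability
    of one or more mutants at the locus."""
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - rate))

if __name__ == "__main__":
    for n in (1_000, 10_000, 100_000):
        print(f"{n:>7} gametes: expected {expected_mutants(n):.2f}, "
              f"P(>=1) = {prob_at_least_one(n):.3f}")
    print("gametes for 95% chance:", gametes_needed(0.95))
```

Under these assumptions, about 300 000 gametes must be screened to have a 95% chance of seeing even one spontaneous mutation at a single locus.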

Large-scale mouse mutagenesis experiments were 
first undertaken at two government-based ‘atomic energy’
laboratories: the Oak Ridge National Laboratory in 
Oak Ridge, Tennessee, in the US and the MRC Radio- 
biological Research Unit first at Edinburgh, and then 
at Harwell, in the UK. Both of these experimental 
programs were begun after World War II, as a means 
of quantifying the effects of various forms of radiation 
on mice and, by extrapolation, humans, to better 
understand the consequences of detonating nuclear 
weapons. The US effort was directed by W.L. Russell 
and the British effort was directed by T.C. Carter. 
Scientists at both laboratories quickly realized the 
potential of their newly created resource of mutant 
animals, and both laboratories have since gone on to 
generate mutations by chemical agents as well. The 
studies at Oak Ridge and Harwell were very large; 
10 000-60 000 first-generation animals were typically 
analyzed in an experimental protocol. They have pro- 
vided most of the empirical data currently available on 
the mechanisms and rates at which mutations are 
caused by all well-characterized mutagenic agents in 
the mouse. 

The experiments performed by Russell and Carter, 
and other colleagues who followed in their footsteps, 
were designed to obtain discrete values for the muta- 
genic potential of different radiation protocols. Rather 
than attempt to examine all animals for all effects of 
a particular irradiation protocol (as was common in 
earlier experiments), these mouse geneticists chose 
instead to look only at the small fraction of animals 
that were mutated at a small set of well-defined ‘spe- 
cific’ loci. The rationale for the ‘specific locus test’ was 
that effects on individual loci could be more easily 
quantitated and that the limited results obtained could 
still be extrapolated for an estimate of whole genome 
effects. Russell decided that mutation rates should be 
followed simultaneously at a sufficient number of loci 
to distinguish and avoid problems that might be caused 
by locus-to-locus variations in sensitivity to particular 
mutagens. He decided further that the same set of loci 
should be examined in each experiment performed. 
The seven loci chosen to be followed in the specific 
locus test were defined by recessive mutations with 
visible homozygous phenotypes that were easily dis- 
tinguished in isolation from each other, and had no 
effect on viability or fertility. The seven loci are agouti 
(a is the recessive nonagouti allele), brown (b), albino 
(c), dilute (d), short-ear (se), pink-eyed dilution (p), 
and piebald (s). A special ‘marker strain’ was constructed
that was homozygous for the recessive alleles at all seven loci.

In its simplest form, the specific locus test is carried 
out by mating females from the special marker strain 
to completely wild-type males that have been pre-
viously exposed to a potential mutagen. In the absence 
of any mutations, offspring from this cross will not 
express any of the seven phenotypes visible in the 
marker strain mother. However, if the mutagen has 
induced a mutation at one of the specific loci, the 
associated mutant phenotype will be uncovered. This 
test is very efficient because it only requires a single 
generation of breeding and visual examination is all 
that is required to score each animal. 
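The logic of the cross can be expressed as a small simulation. It is a sketch under simplifying assumptions: each locus in the treated father's gamete mutates independently with the same hypothetical probability, every new mutation is a recessive allele that fails to complement the marker-strain allele, and viability is unaffected. The text notes that real loci differ in sensitivity, which is why several loci were followed at once. The function name `score_offspring` is illustrative.

```python
import random

# The seven loci of the specific locus test, as listed in the text.
SPECIFIC_LOCI = ("a", "b", "c", "d", "se", "p", "s")

def score_offspring(n_offspring: int, mutation_prob: float,
                    seed: int = 0) -> dict[str, int]:
    """Count offspring showing a mutant phenotype at each specific locus.

    The marker-strain mother is homozygous recessive at all seven loci, so
    every offspring receives a recessive allele from her. The treated
    wild-type father transmits a wild-type allele unless it was mutated,
    assumed here to happen with probability `mutation_prob` per locus per
    gamete. The mutant phenotype is uncovered only when the paternal
    allele is a new recessive mutation at that locus.
    """
    rng = random.Random(seed)
    counts = {locus: 0 for locus in SPECIFIC_LOCI}
    for _ in range(n_offspring):
        for locus in SPECIFIC_LOCI:
            paternal_allele_mutant = rng.random() < mutation_prob
            if paternal_allele_mutant:  # maternal allele is always recessive
                counts[locus] += 1
    return counts

if __name__ == "__main__":
    # Hypothetical induced rate of 1 in 1000 per locus, 20 000 offspring.
    result = score_offspring(20_000, 1e-3)
    print(result, "total:", sum(result.values()))
```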

Although recessive mutations at all loci other than 
the specific seven will go undetected in the first gen- 
eration offspring from this cross, it is possible to 
detect a dominant mutation at any locus so long as it 
is viable and produces a gross alteration in hetero- 
zygous phenotype such as a skeletal or coat color 
change. One should realize that the most common
effect of any undirected mutagen will be to ‘knock out’
a gene and, in the vast majority of cases, the resulting
null allele will be recessive to the wild type. There is,
however, a very small class of loci at which null alleles
will act in a dominant or semidominant fashion to the
wild type. These ‘haploinsufficient’ phenotypes are
presumably caused by a developmental sensitivity to gene
product dosage. Among the best characterized of the
dominant-null mutations are the numerous ones
uncovered at the T locus (which result in a short tail) and
the W locus (which result in white spotting on the coat).
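The dosage explanation can be made explicit with a toy threshold model. It assumes, hypothetically, that each functional allele contributes half of the wild-type amount of gene product and that development goes wrong whenever the amount falls below some threshold. A null allele then behaves as a recessive when the threshold lies at or below 50% of wild type and as a dominant (haploinsufficient) allele when it lies above; the model does not capture semidominance or the biology of any real locus such as T or W, and all names are illustrative.

```python
def gene_product_level(functional_alleles: int,
                       per_allele_output: float = 0.5) -> float:
    """Relative gene product in a diploid, assuming each functional allele
    contributes an equal share (wild type = 1.0)."""
    if functional_alleles not in (0, 1, 2):
        raise ValueError("a diploid carries 0, 1 or 2 functional alleles")
    return functional_alleles * per_allele_output

def is_mutant_phenotype(functional_alleles: int, threshold: float) -> bool:
    """Hypothetical threshold model: the phenotype is abnormal when the
    product level falls below the amount development requires."""
    return gene_product_level(functional_alleles) < threshold

def null_allele_dominance(threshold: float) -> str:
    """Classify a null allele by comparing heterozygote and homozygote."""
    if is_mutant_phenotype(1, threshold):
        return "dominant (haploinsufficient)"
    if is_mutant_phenotype(0, threshold):
        return "recessive"
    return "no visible effect"

if __name__ == "__main__":
    for threshold in (0.3, 0.8):
        print(f"threshold {threshold}: null allele is {null_allele_dominance(threshold)}")
```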


See also: Brachyury Locus; Coat Color Mutations, 
Animals; W (White Spotting) Locus 
See: DNA-Binding Proteins; DNA Hybridization;
Restriction Endonuclease 
A spermatid is the male animal cell type that results
from completion of the second meiotic division. In
males, meiosis occurs only within the testis, and
spermatids form from spermatocytes. Meiosis is
the process in which the 4n nucleus of a primary
spermatocyte divides to form the 2n nuclei of secondary
spermatocytes, each of which, in turn, divides to form
the 1n nuclei of spermatids. While a spermatid is initially a
round, immotile cell, it differentiates to become the
specialized, motile spermatozoon in a process called 
spermiogenesis. The essential purpose of spermiogen- 
esis is to produce a spermatozoon that is streamlined 
for swift movement and contains the molecules re- 
quired for this cell to locate and fertilize an egg. Sperm- 
iogenesis does not involve further division of the 
nucleus but it does involve extensive reorganization 
and an unusual division of the cytoplasm. This un- 
usual division of the spermatid cytoplasm results in an 
anucleate cytoplast (essentially a membrane-bounded 
bag of cytoplasm), that was classically named the 
cytoplasmic droplet and is now most frequently called 
the residual body, and a smaller, nucleus-containing 
spermatid. This smaller spermatid retains mitochon- 
dria, centrioles, and certain membrane-bounded organ- 
elles. The residual body contains ribosomes and other 
cellular constituents not required by spermatozoa. 
There is some variation as to when and how the re- 
sidual body forms within the different animal phyla. 
For instance, in the nematodes, the residual body 
forms at the same point when spermatids are created 
during the second meiotic division. Nuclei in nema- 
tode spermatids apparently do not engage in synthesis 
of new RNAs and these cells are unable to synthesize 
any new proteins because ribosomes are discarded 
into the residual body. Consequently, the dramatic 
cell shape changes that characterize the transition of 
a spermatid into a spermatozoon occur solely by re- 
arrangement and modification of preexisting macro- 
molecules. On the other hand, insects and mammals 
exhibit the more common situation where formation 
of spermatids during completion of the second meio- 
tic division is temporally and spatially separated from 
appearance of the residual body. In this case,
spermatids (like the spermatogonia and spermatocytes that
precede them) remain connected to one another by
cytoplasmic passageways and formation of the re- 
sidual body is the last major cellular event to occur 
during transition of the spermatid to the sperm- 
atozoon. As such, insect and mammalian spermatids 
retain their ribosomes and the capacity to synthesize 
new proteins during most of the spermatid stage. In 
mammals, transcription of new mRNAs also con- 
tinues during the spermatid stage. These macromol- 
ecules can be shared between spermatids through the 
cytoplasmic passageways that connect adjacent cells, 
and many are essential for maturation of spermatids 
into spermatozoa. The spermatid undergoes a number 
of dramatic changes as it matures, including formation 
of the flagellum which is required for motility by the 
spermatozoon. Internal membranes are also reorgan- 
ized, and the acrosome, which is a specialized struc- 
ture required for egg penetration during fertilization, 
forms during this stage. Prior to residual body 
formation, the spermatid nucleus becomes transcrip- 
tionally quiescent as the DNA becomes tightly com- 
pacted. This can be associated with a change in the 
proteins that interact with the DNA in the spermatid 
nucleus. In mammals, DNA (which is negatively 
charged) is complexed with histone proteins (that 
carry a positive charge) in all non-germline cells and 
all stages of spermatogenesis up to the late spermatid 
stage. At this point, histones are displaced from the 
DNA and replaced with protamines, which carry a 
stronger positive charge than histones. Spermatids in 
which this protamine replacement has occurred are 
transcriptionally inactive and the nucleus is physically 
more compact than the spermatid nucleus that contains
histones. This more compact configuration of the
protamine-containing nucleus permits the head of the
spermatozoon to be physically smaller and more
streamlined. In the last stage, the spermatid forms a
residual body in which ribosomes and other cellular
constituents that are no longer needed are discarded.
A great deal remains to be determined about how 
spermatids form and function. For instance, it is unknown
how the incomplete cytokinesis that produces the
cytoplasmic passageways occurs. The mechanism that
ensures proper segregation of cytoplasm during sep- 
aration of the spermatid from the residual body is 
another poorly understood process. 


See also: Meiosis 
Spermatocytes are germ cells in the testes of animals
that are engaged in meiosis. The testes contain
populations of cells, named spermatogonia, that proliferate
by mitotic division to yield new cells. As is usually the 
case for mitotic division, these cells are equivalent 
with respect to the DNA content of their nuclei. 
After one or more mitotic divisions, spermatogonia 
divide to form spermatocytes. Spermatocytes differ 
from spermatogonia in that they enter meiosis and 
engage in the genetic recombination that characterizes 
this process. They are the only cell type in the male 
body where meiosis occurs. Genetic recombination 
shuffles the genome so that new combinations of 
genetic material are created and the resulting cells 
have a genetic endowment that differs from the starting
spermatogonial cell. After committing to meiosis,
two cellular divisions occur: the 4n primary
spermatocyte divides to form two 2n secondary
spermatocytes (n refers to one complete copy of the
genetic material). In turn, each secondary spermatocyte
divides to form two 1n spermatids. The genetic
recombination associated with meiosis occurs in the 
primary spermatocyte. In many animals, spermato- 
cytes divide incompletely so that cells remain 
connected by passageways of cytoplasm. These pas- 
sageways allow the free exchange of macromolecules, 
such as proteins and other materials, between con- 
nected cells. The significance of these passageways is 
that spermatocytes are nearly identical with respect to 
the proteins they contain even though the DNA tem- 
plate in their nuclei can differ because of the genetic
recombination that has occurred during meiosis. Although
there have been some advances in our understanding 
in recent years, how a spermatocyte exits mitosis and 
enters meiosis is still poorly understood. It is also 
unknown how incomplete cell division associated 
with cytoplasmic passageways occurs during the for- 
mation of spermatocytes. 
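The bookkeeping in this entry (one 4n primary spermatocyte, two 2n secondary spermatocytes, four 1n spermatids) can be checked with a small sketch. The `GermCell` class and the `meiotic_divisions` function are illustrative names only; the model tracks DNA content per nucleus in units of n and ignores recombination and the cytoplasmic passageways.

```python
from dataclasses import dataclass

@dataclass
class GermCell:
    stage: str
    dna_content: int  # in units of n, one complete copy of the genetic material

def meiotic_divisions(primary: GermCell) -> list[list[GermCell]]:
    """Cells produced by the two meiotic divisions of one primary
    spermatocyte, listed generation by generation. Each division halves
    the DNA content per nucleus and doubles the number of cells."""
    secondaries = [GermCell("secondary spermatocyte", primary.dna_content // 2)
                   for _ in range(2)]
    spermatids = [GermCell("spermatid", cell.dna_content // 2)
                  for cell in secondaries for _ in range(2)]
    return [[primary], secondaries, spermatids]

if __name__ == "__main__":
    for generation in meiotic_divisions(GermCell("primary spermatocyte", 4)):
        stage = generation[0].stage
        contents = [cell.dna_content for cell in generation]
        print(f"{len(generation)} x {stage}: {contents[0]}n each, "
              f"{sum(contents)}n in total")
```

The total DNA content stays at 4n in every generation; only its distribution among nuclei changes.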


See also: Spermatids; Spermatogonia 
Wild-Type Spermatogenesis


As in most animals, spermatogenesis in the nematode 
Caenorhabditis elegans is a differentiation pathway 
where spermatogonia ultimately differentiate into 
spermatozoa. Wild-type C. elegans spermatogenesis
is summarized in Figure 1. The 4n primary spermatocyte
buds off the rachis (a central syncytial cytoplasmic
core) after entering pachytene of meiosis I and under- 
goes the first meiotic division. Unlike mammalian 
spermatogenesis where primary spermatocytes remain 
connected to one another, the C. elegans primary 
spermatocyte is an individualized cell. Subsequent 
development of the primary spermatocyte can occur 
in vitro in the absence of added hormones, growth 
factors, or any other supporting cell type(s) such as 
the Sertoli cell that is required for mammalian sperm- 
atogenesis. The two secondary spermatocytes either 
completely separate or stay linked by a cytoplasmic 
bridge as they undergo the second meiotic division to 
yield a total of four haploid spermatids. Asymmetric 
cytoplasmic partitioning places many cytoplasmic 
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constituents not required for subsequent differen- 
tiation into the residual body during formation of 
haploid spermatids. Resulting nonmotile, apolar 
spermatids are activated to form motile bipolar sperm- 
atozoa in a 5-10-minute differentiation process. 
Nematode spermatozoa lack the flagellum and acro- 
some that characterize spermatozoa of many other 
species, including most vertebrates. Instead, C. eleg- 
ans spermatozoa crawl by directed membrane flow 
of a single pseudopod that drags the cell body across 
the substrate. A C. elegans spermatogonial cell com- 
pletes spermatogenesis over several hours rather than 
days or, in the case of mammals, weeks. 


C. elegans Reproductive Biology 


Like most animals, C. elegans produces both sperm 
and eggs and the union of these two gametes produces 
new individuals. Four larval growth stages occur 
between hatching and formation of the sexually 
mature adult. While there are C. elegans males, this 
species does not produce females in the conventional 
sense. In fact, C. elegans exists principally as a self- 
fertile protandrous (‘male first’) hermaphrodite where 
both sperm and eggs are produced by the same indi- 
vidual. Anatomical comparison of C. elegans to other 
dioecious Caenorhabditis species (which have both a 
true male and female) reveals that the C. elegans herm- 
aphrodite somatic tissues are female. The hermaphro- 
dite germline has been modified so that it functions as 
a testis during the fourth larval stage and then switches 
to produce oocytes as an adult. The sperm are stored 
in a sac named the spermatheca and ovulation places 
eggs into the spermatheca where they are fertilized. 
Spermatogenesis in males also begins during the fourth 
larval stage but continues throughout adult life. When 
a male copulates with a hermaphrodite, his sperm
are used in preference to the hermaphrodite's own
sperm to fertilize eggs. Spermatogenesis is highly
similar in both sexes. 


Exploiting Caenorhabditis elegans Biology 
to Obtain Spermatogenesis Mutants 


The unusual reproductive biology of C. elegans has 
greatly facilitated genetic analysis of spermatogenesis 
in this organism. In dioecious organisms, recovery of 
mutants with specific defects in spermatogenesis is 
difficult. Such mutants are first detected by the in- 
ability of a male to sire progeny, which can have 
many explanations that are unrelated to spermatogenesis.
For instance, males with subtly defective
genitalia through which otherwise normal sperm cannot
exit would be among the mutants recovered
in such a mutant hunt. Compounding this problem,
heterozygous siblings must be maintained for all
examined mutants so that a mutation responsible for
a bona fide spermatogenesis defect can be recovered.

Figure 1 Stages of wild-type spermatogenesis are shown diagrammatically as an ordered pathway of morphogenesis (top line). Spermatocytes are initially in syncytium with a central cytoplasmic core named the rachis, and they bud from this structure after initiating meiosis. Spermatocytes subsequently develop as individualized cells and do not require any accessory cells. Along the pathway are listed some of the >40 spe and fer genes and their approximate point of developmental arrest as determined by light microscopy. For instance, spe-17 spermatids are abnormal and spe-9 mutant spermatozoa cannot fertilize oocytes after gamete contact. The last step shows spe-11 mutant spermatozoa that can fertilize oocytes but cause death of the resulting embryo because they do not provide a component required for embryogenesis. Beneath the pathway are abnormal cells that accumulate when a gene is mutated. n, nucleus. (Modified from and reproduced with permission from L'Hernault SW, Shakes DC and Ward S (1988) Genetics 120: 435-452.)

The situation in C. elegans is much more straightfor- 
ward because candidate spermatogenesis-defective 
mutants can be initially identified in hermaphrodites 
in the absence of mating. Normally, internal self- 
fertilization in young hermaphrodites is so efficient 
that virtually every sperm fertilizes an egg; the
fertilized eggs begin embryogenesis and are subsequently laid on the
agar growth plate. A mutation that affects spermato- 
genesis will abolish self-fertility, and these mutant 
hermaphrodites will lay numerous oocytes that 
appear different from embryos under low-power 
magnification. Self-fertilization in hermaphrodites is 
not absolutely required for C. elegans reproduction 
and these mutant, self-sterile hermaphrodites can be 
placed with wild-type males and allowed to mate. 
If they produce outcross progeny, then inseminated 
wild-type sperm can correct the sterile phenotype 
of the mutant hermaphrodite. This indicates the 
mutant hermaphrodites contained defective sperm 
and, so far, this technique has allowed identification 
of >40 genes that appear to affect spermatogenesis. 
The principal reason it has been possible to identify 
and recover this large collection is that a mutation
affecting spermatogenesis is identified and recovered
from the same individual animal.
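The screening logic described above reduces to a small decision procedure built on two observations: whether an unmated hermaphrodite lays unfertilized oocytes instead of embryos, and whether mating to wild-type males restores progeny. The sketch below is illustrative only; the function name and the category strings it returns are hypothetical labels, not part of any published protocol or software.

```python
def classify_hermaphrodite(lays_unfertilized_oocytes: bool,
                           rescued_by_wild_type_males: bool) -> str:
    """Interpret the two observations used in the self-sterility screen.

    Hypothetical helper for illustration; the returned strings are not
    standard nomenclature.
    """
    if not lays_unfertilized_oocytes:
        # The unmated animal lays embryos, so its own sperm are functional.
        return "self-fertile"
    if rescued_by_wild_type_males:
        # Wild-type sperm restore progeny, implicating the animal's own sperm.
        return "candidate sperm defect (spe/fer)"
    # Mating does not restore progeny: the sterility is not explained
    # by defective sperm alone.
    return "sterility not rescued by mating"


print(classify_hermaphrodite(True, True))  # candidate sperm defect (spe/fer)
```

Only the middle outcome marks an animal as a candidate spermatogenesis mutant; because the mutation is carried by the hermaphrodite that was examined, it can be recovered from that same animal.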

Figure 1 is a cartoon of light microscopic phenotypes
and shows the approximate point at which spermatogenesis- or fertilization-defective
(spe or fer) mutants arrest further development;
about half of the available mutants appear in this
figure. In mutants on the lower part of the figure
(e.g., spe-4), part of the wild-type pathway (upper
row in Figure 1) of spermatogenesis occurs before
abnormal cytology becomes evident. This is a large,
diverse set of mutations, and many of the processes
required for normal spermatogenesis are affected
in one or more mutants. Most mutants depicted in
Figure 1, and the >20 mutants not shown, are incompletely
understood. A combination of genetic, molecular, and
cell biological tools is being used to analyze these
mutants.


Further Reading 

L'Hernault SW (1997) Spermatogenesis. In: Riddle DL,
Blumenthal T, Meyer BJ and Priess JR (eds) C. elegans, 2nd 
edn, pp. 274-294. Plainview, NY: Cold Spring Harbor 
Laboratory Press. 


See also: Caenorhabditis elegans 
Spermatogenesis is the complex developmental process
by which haploid spermatozoa, capable of ultimately
fertilizing an egg, develop from diploid stem cells in
the testis through both meiosis and differentiation.
Spermatogenesis occurs in the seminiferous tubule
compartment of the testis, and this compartment is
influenced by the other compartments of the organ.
The seminiferous tubule is composed of a structurally
complex epithelium surrounded by interstitial cells
(e.g., Leydig cells, muscle-like myoid cells) and fluid.
This epithelium comprises two basic cell types, the
somatic Sertoli cells and the germ cells. The Sertoli
cells display a columnar morphology that extends from
the base of the epithelium to the lumen of the tubule.
They surround and nurture the differentiating germ
cells, and through Sertoli-Sertoli cell junctions they
form the blood-testis barrier, which ensures that only
spermatogonia and early spermatocytes are exposed to
circulating macromolecules in the blood and lymph.

It is known that Sertoli cell products can interact 
with the germ cells to regulate their differentiation, 
and there is ample evidence for the reciprocal release 
of products from the germ cells that can affect Sertoli 
cell function. Spermatogenesis depends on testosterone,
which is secreted by the Leydig cells. Testosterone
production by these cells is regulated
at the level of the anterior pituitary through its secre- 
tion of luteinizing hormone (LH) in response to hypo- 
thalamus-derived gonadotropin-releasing hormone 
(GnRH). Follicle-stimulating hormone (FSH) is also 
required for spermatogenesis and is released from the 
anterior pituitary in response to GnRH released from 
the hypothalamus. Both testosterone and FSH act 
directly on the seminiferous tubule epithelium to regu- 
late spermatogenesis. 

Spermatogenesis can be divided into three major
developmental stages: (1) a period of mitotic cell
proliferation; (2) two meiotic cell divisions; and (3)
spermiogenesis, a series of morphological changes that
give rise to the highly polarized spermatozoon.
Spermatogenesis is initiated at puberty with mitotic
divisions of the primitive, type A spermatogonia.
Subsequent cell divisions of a majority of the daughter
cells then ensue, leading to two populations of
spermatogonia. One population enters a differentiative
pathway whose cells eventually become spermatozoa,
whereas the other undergoes apoptosis, otherwise
known as programmed cell death. The remaining
daughter cells, which do not undergo these multiple
rounds of cell division, adopt a different fate and
remain similar to their progenitors. This population
of ‘resting’ spermatogonia continues to function
as stem cells and is required for subsequent rounds of
spermatogenesis, thus ensuring the continuous
production of spermatozoa. The percentage of type A
spermatogonia that adopt these different fates varies
from species to species. Those type A spermatogonia
that continue to divide eventually differentiate into type
B spermatogonia, which are committed to enter meiosis.
Germ cells that enter meiosis are termed
spermatocytes. Primary spermatocytes undergo the first
meiotic division, which is characterized by
a long prophase; as a consequence, these cells are
often seen in histological sections of the
seminiferous epithelium. During this first
meiotic division, the paired homologous chromosomes
of the primary spermatocytes undergo
crossing-over, giving rise to genetic recombination.
The result of this first meiotic division is the
production of two secondary spermatocytes, each
containing one duplicated copy of each autosome
and either a duplicated X or a duplicated
Y chromosome.

The second meiotic division carried out by the 
secondary spermatocytes is of short duration and 
thus is more difficult to stage in histological sections. 
It is during this second meiotic division that each 
secondary spermatocyte produces two spermatids, 
each with a haploid number of single chromosomes. 
These spermatids then enter spermiogenesis, and it is 
during this time of development that they undergo a 
series of dramatic differentiation events that even- 
tually conclude with the development of a very highly 
polarized spermatozoon, the morphology of which is 
species-specific. Examples of such differentiation 
events include: (1) the formation of an acrosome, a 
secretory granule that overlies the nucleus of the sperm- 
atozoon and participates in the fertilization process; 
and (2) the assembly of the spermatozoan flagellum,
which powers sperm motility. In the mouse, spermatogenesis
takes approximately 30 days to complete,
14 days of which are devoted to spermiogenesis.
Human spermatogenesis, in contrast, takes approxi- 
mately 64 days to complete, 35 days of which are 
devoted to spermiogenesis. 
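The fate of the sex chromosomes through the two meiotic divisions can be traced with a short bookkeeping sketch. It assumes an XY primary spermatocyte, ignores the autosomes and crossing-over, and represents each duplicated chromosome as a pair of sister chromatids; the function name is illustrative.

```python
from collections import Counter


def spermatids_from_primary(sex_chromosomes: tuple[str, str] = ("X", "Y")) -> list[str]:
    # Meiosis I: the X and Y segregate, so each secondary spermatocyte
    # receives one duplicated chromosome (two sister chromatids).
    secondary_spermatocytes = [(chrom, chrom) for chrom in sex_chromosomes]
    # Meiosis II: sister chromatids separate, so each secondary
    # spermatocyte yields two spermatids with one unduplicated copy.
    spermatids = []
    for sister_a, sister_b in secondary_spermatocytes:
        spermatids.extend([sister_a, sister_b])
    return spermatids


spermatids = spermatids_from_primary()
print(spermatids)           # ['X', 'X', 'Y', 'Y']
print(Counter(spermatids))  # Counter({'X': 2, 'Y': 2})
```

Each primary spermatocyte therefore gives rise to four haploid spermatids, two carrying an X and two carrying a Y.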

An interesting feature of spermatogenesis is that 
the developing male germ cells fail to complete cyto- 
kinesis during mitosis and meiosis; therefore, clones of 
differentiating daughter cells originating from a single 
maturing spermatogonium remain connected by 
cytoplasmic bridges, forming, in effect, a syncytium. 
These cytoplasmic bridges persist throughout
spermiogenesis until the individual spermatozoa are
released into the lumen of the seminiferous tubules.
Because the haploid spermatids undergo most of
their differentiation after their nuclei have completed
meiosis, the cytoplasmic bridges ensure
that each developing spermatid shares a common
cytoplasm with its neighbors, thus supplying it
with all of the products of a diploid genome. It must
be emphasized that spermatogenesis takes place in
clusters that are not necessarily synchronous with
one another along the length of the seminiferous
tubules, thus ensuring a constant
supply of mature spermatozoa. However, because
clonal populations of spermatozoa derived from a single
spermatogonium develop as a syncytium, development
within each syncytium is synchronous.


Further Reading 

Handel MA (1998) Meiosis and gametogenesis. In: Pedersen RA
and Schatten GP (eds) Current Topics in Developmental Biology, 
vol. 37, p. 418. San Diego, CA: Academic Press. 

Hecht NB (1998) Molecular mechanisms of male germ cell 
differentiation. BioEssays 20: 555-561. 


See also: Gametogenesis; Spermatids; 
Spermatocytes 
Spermatogonia are proliferating germ cells within the
testes of animals. They are the only self-renewing
adult cell type in males that is capable of contributing
to the next generation. Early in the embryonic
development of all sexually reproducing animals, a small
group of germ cells is formed that is destined to be
the exclusive source of gametes within that organism.
Germ cells are initially indistinguishable in cellular
appearance between males and females, but this
changes after the germ cells complete migration into
the developing gonad. In males, the gonad is named
the testis (plural: testes), and the germ cells form
spermatogonia within this organ. Spermatogonia are the
progenitor cells for all spermatozoa, and the term
spermatogonia applies to all germ cells within the testes that have
not entered meiosis. The testis is the only location in a
male where meiosis occurs; meiosis is the process by which
the nucleus is reduced from the typical diploid state
(similar to that found in most cells outside the gonad)
to the haploid state. For any given organism, the testis
is usually one of the most impressive organs in terms
of the sheer number of differentiated cells produced;
in human males, it is estimated that 10¹²–10¹³
spermatozoa are produced during a lifetime. This is
possible because spermatogonia include a population of
self-renewing cells (named stem cells) that divide to
produce one cell capable of entering the differentiation
pathway that ultimately forms haploid spermatozoa
and another stem cell. Generally, spermatogenesis
is a polarized, assembly-line-like process in which
proliferating spermatogonia are spatially separated from
cells in meiosis and from the spermatids, which lie near the
space or tubule into which they are released. The
spermatogonia that function as stem cells are furthest
from the position where spermatids are released to
become spermatozoa. There are usually several mitotic
divisions (e.g., nine in the laboratory mouse) between
the division of a spermatogonial stem cell and the
formation of a cell that enters meiosis and becomes a
spermatocyte. Future research should reveal
how primordial germ cells form, how they differentiate
into spermatogonia, and how spermatogonial stem cells
retain their stem cell properties.
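The number of mitotic divisions sets an upper bound on how many spermatids can arise from one spermatogonium that leaves the stem cell pool. The sketch below assumes that every division is completed and that no cells are lost; since many spermatogonia in fact undergo apoptosis, real yields are lower. The nine divisions used here are the laboratory-mouse figure quoted above, and the function name is illustrative.

```python
def max_spermatids(mitotic_divisions: int) -> tuple[int, int]:
    """Upper bound on cells derived from one spermatogonium, assuming no cell loss."""
    primary_spermatocytes = 2 ** mitotic_divisions  # each mitosis doubles the clone
    spermatids = 4 * primary_spermatocytes          # two meiotic divisions give four spermatids
    return primary_spermatocytes, spermatids


print(max_spermatids(9))  # (512, 2048)
```

This doubling is one reason a small stem cell population can sustain the very large lifetime output of spermatozoa.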


See also: Meiosis 
Spina bifida, literally meaning a cleft in the spine, is
a localized congenital malformation and a type of 
neural tube defect (NTD). It comprises a range of 
lesions of varying severity along the midline of the 
back involving any number of vertebrae at any level, 
most frequently occurring in the lumbosacral region. 
The minimal expression is spina bifida occulta, which 
may not even be evident externally or produce any 
symptoms, but in meningocoele, the meninges her- 
niate through the cleft in the vertebral arches to 
produce a bulging sac; and in meningomyelocoele, 
the spinal cord as well as the meninges herniates. 
The overlying skin is deficient so the nervous tissue 
is exposed and becomes damaged, leading to motor 
and sensory deficit of the lower part of the body and 
often to incontinence. Hydrocephalus may develop 
too. 

Spina bifida arises during early embryogenesis. In 
weeks 3 and 4 after conception, the future brain 
and spinal cord originate as a flat plate of cells on the
upper surface of the embryo. The lateral edges of this
plate elevate, overarch and meet in the midline, fusing 
from several separate starting points to form the 
neural tube. This subsequently induces the tissue sur- 
rounding it to differentiate, eventually to form the 
vertebrae. If the neural folds fail to meet and fuse, or 
if there is incomplete closure, then a gap will occur in 
the tissue above; this is spina bifida. 

Spina bifida, together with the associated and 
equally common anencephalus where the cephalic 
part of the neural tube fails to close, has a birth preva- 
lence which varies according to geographical area, 
ethnicity, socioeconomic status and also temporally. 
In the United Kingdom during the twentieth century, the
prevalence has shown a series of peaks and troughs, 
the peaks occurring roughly every decade, although 
since 1970 the downward trend has continued and not 
been reversed. Within the United Kingdom there is a 
prevalence gradient which is highest in the northwest, 
decreasing to the southeast. In the 1960s, the highest 
known prevalence of NTD in the world was in Belfast 
(8.7/1000 births); today such high rates occur in the 
Indian Punjab and northeast China. NTD are com- 
mon in Celts and Sikhs, but uncommon in many 
Asiatic and black populations. In high prevalence 
areas there is an association with low socioeconomic 
status, but NTD are relatively rare amongst Third 
World peoples who live in abject poverty. 

Clues as to possible causes have been sought from 
these many curious demographic observations. While 
spina bifida can occasionally be caused by the
anticonvulsant sodium valproate, by maternal diabetes, or
by some specific types of chromosomal imbalance, in
the vast majority of cases the cause is unknown. These 
are regarded as multifactorial, involving both genetic 
and environmental precipitating factors: environmen- 
tal, because of the observed association with socio- 
economic class and geography; genetic, because of the 
ethnic specificity and because close relatives of pro- 
bands have a higher than average risk of being affected 
themselves. 

It is now known that a deficiency of folic acid 
is involved in the genesis of many spina bifidas, and 
that maternal therapy with 0.4 mg of folic acid daily 
prior to, and in the weeks following, conception pre- 
vents around 72% of cases. Folate fortification of a 
staple food item so that all women of childbearing 
age will be protected has been widely debated, and 
was recently implemented in the United States. This is
primary prevention. Secondary prevention through 
prenatal diagnosis and termination of affected preg- 
nancies has been practised in the western world since 
the mid-1970s, and has increasingly influenced the 
birth prevalence figures. Open lesions in the fetus 
allow the escape of α-fetoprotein into the amniotic
fluid, and elevated levels are also found in the maternal
bloodstream in around 75% of cases. This forms the 
basis of a screening test in the early second trimester. 
The lesions may also be directly visualized by ultra- 
sound scanning at 16-18 weeks’ gestation. 

The discovery of prophylactic folate therapy has 
not yet led to an understanding of the mechanisms 
involved. But it has stimulated research into metabolic 
pathways which use folic acid and into searching for 
mutations in spina bifida subjects in the genes for key 
enzymes involved. The search also continues for other 
environmental factors involved in the 25% of cases 
where folic acid is not a factor. A number of different 
mouse models of spina bifida exist, and the embryological
and molecular mechanisms are being investigated.
In addition, experiments involving targeted
gene mutations in mice have produced spina bifida, 
unexpectedly in some cases, so yielding information 
on the genetic control of normal neural tube forma- 
tion and closure, which may eventually lead to the 
identification of the genetic components in the cause 
of spina bifida. 


See also: Dysmorphology 
The spindle is the structure, formed by polymerization of
microtubules, that assembles in a eukaryotic cell at the
time of division. The chromosomes attach to the spindle
and in metaphase align in a central plate perpendicular
to the long axis of the spindle. During anaphase, the
spindle pulls the separating chromatids to opposite
poles.


See also: Mitosis 
Splicing is the process whereby introns are removed
from a newly transcribed primary RNA transcript
(hnRNA) and exons are joined to produce mature
mRNA.


See also: Introns and Exons; Pre-mRNA Splicing 
Splicing junctions are the sequences immediately surrounding
the exon-intron boundaries. Right-splicing
junctions comprise the boundary between the right 
end of an intron and the left end of the adjacent exon, 
whilst left-splicing junctions comprise the boundary 
between the right end of an exon and the left end of an 
intron. 
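Using the left/right convention defined above, the two junctions flanking an intron can be read directly from a pre-mRNA sequence once the intron's position is known. In this sketch the sequence, the 0-based half-open coordinates, the 3-nucleotide flank on each side, and the use of lowercase for intron bases are all hypothetical choices for illustration.

```python
def splicing_junctions(pre_mrna: str, intron_start: int, intron_end: int,
                       flank: int = 3) -> tuple[str, str]:
    """Return (left_junction, right_junction) around one intron.

    intron_start/intron_end are 0-based, half-open coordinates of the intron.
    The left junction spans exon end | intron start; the right junction
    spans intron end | next exon start. Intron bases are shown in lowercase.
    """
    left = (pre_mrna[max(0, intron_start - flank):intron_start]
            + pre_mrna[intron_start:intron_start + flank].lower())
    right = (pre_mrna[max(0, intron_end - flank):intron_end].lower()
             + pre_mrna[intron_end:intron_end + flank])
    return left, right


# Hypothetical pre-mRNA: exon (6 nt) + intron (13 nt) + exon (9 nt).
pre_mrna = "AUGGCC" + "GUAAGUCUAACAG" + "UUUGGGUAA"
print(splicing_junctions(pre_mrna, 6, 19))  # ('GCCgua', 'cagUUU')
```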


See also: Introns and Exons 
In prokaryotes the gene and messenger RNA (mRNA)
are colinear. That is, the sequence of DNA nucleotides
in the gene corresponds directly, nucleotide for nucleotide,
to the sequence of RNA nucleotides in the mRNA encoded
by the gene. By contrast, in the vast majority of eukaryotic
genes the DNA sequences encoding mRNA are interrupted by
noncoding sequences called introns. The sequences
that encode the mature mRNA are called exons.
These ‘split’ eukaryotic genes are transcribed by
RNA polymerase to produce pre-messenger RNAs
(pre-mRNAs) consisting of alternating exons and
introns. Many eukaryotic pre-mRNAs are highly
complex, containing dozens of exons and introns.
The average size of an exon is around 150 nucleotides,
whereas introns can be very large, some over 100 000
nucleotides in length.

The process called RNA splicing is required to 
produce mature mRNAs that encode proteins. RNA 
splicing involves precise cleavage of the pre-mRNA 
at the junctions between exons and introns, followed 
by the covalent joining of adjacent exons. Thus, 
RNA splicing is similar to splicing an audiotape or
film, where the unwanted piece is clipped out with
scissors and the remaining ends are taped together to
produce the desired sound or images. However, because
of the triplet nature of the genetic code, a cutting error
of only one nucleotide will produce an mRNA with an
altered reading frame, and thus a message that cannot
encode the correct protein. Therefore, the splicing
process must be very precise.
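The need for single-nucleotide precision can be made concrete with a toy example. The pre-mRNA below is invented for illustration (a 6-nucleotide exon, a 13-nucleotide intron beginning with GU and ending with AG, and a 9-nucleotide exon), coordinates are 0-based and half-open, and the helper names are hypothetical. Starting the second exon one nucleotide too early changes every downstream codon and removes the UAA stop codon.

```python
def splice(pre_mrna: str, exons: list[tuple[int, int]]) -> str:
    """Join exon segments given as 0-based, half-open (start, end) coordinates."""
    return "".join(pre_mrna[start:end] for start, end in exons)


def codons(mrna: str) -> list[str]:
    """Read complete triplets starting from the first nucleotide."""
    usable = len(mrna) - len(mrna) % 3
    return [mrna[i:i + 3] for i in range(0, usable, 3)]


# Hypothetical pre-mRNA: exon 1 + intron (GU...AG) + exon 2.
pre_mrna = "AUGGCC" + "GUAAGUCUAACAG" + "UUUGGGUAA"

correct = splice(pre_mrna, [(0, 6), (19, 28)])
off_by_one = splice(pre_mrna, [(0, 6), (18, 28)])  # retains the intron's final G

print(codons(correct))     # ['AUG', 'GCC', 'UUU', 'GGG', 'UAA']
print(codons(off_by_one))  # ['AUG', 'GCC', 'GUU', 'UGG', 'GUA']
```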

In the eukaryotic cell, RNA splicing occurs in the 
nucleus and is carried out by a large RNA-protein 
complex called the spliceosome, which is capable of 
recognizing specific RNA sequences located at the 
exon/intron junctions. A common feature of most 
eukaryotic introns is a GU dinucleotide at the 5’ end 
(the 5’ splice site) and an AG dinucleotide at the 3’ 
end (the 3’ splice site). Once bound to these junctions,
the spliceosome cuts the RNA and then joins the
adjacent exons. The importance of a precise cleavage 
at these junctions is dramatically illustrated by the fact 
that a large number of human genetic diseases are 
caused by mutations at 5’ or 3’ splice sites, which result 
in the production of nonfunctional mRNAs during 
splicing. 
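The GU...AG rule is easy to check mechanically, but a quick scan also shows why the spliceosome must recognize more than these two dinucleotides: even a short sequence contains several intervals that begin with GU and end with AG. The sketch reuses an invented pre-mRNA whose intended intron spans positions 6 to 19 (0-based, half-open); the function names and the minimum interval length of 4 are illustrative assumptions, not real splice-site rules.

```python
def has_gu_ag_ends(intron: str) -> bool:
    """True if the sequence starts with GU and ends with AG (T is read as U)."""
    seq = intron.upper().replace("T", "U")
    return len(seq) >= 4 and seq.startswith("GU") and seq.endswith("AG")


def candidate_introns(pre_mrna: str, min_len: int = 4) -> list[tuple[int, int]]:
    """All 0-based, half-open (start, end) intervals that begin with GU and end with AG."""
    seq = pre_mrna.upper().replace("T", "U")
    donors = [i for i in range(len(seq) - 1) if seq[i:i + 2] == "GU"]
    acceptor_ends = [j + 2 for j in range(len(seq) - 1) if seq[j:j + 2] == "AG"]
    return [(d, a) for d in donors for a in acceptor_ends if a - d >= min_len]


pre_mrna = "AUGGCC" + "GUAAGUCUAACAG" + "UUUGGGUAA"
print(has_gu_ag_ends(pre_mrna[6:19]))  # True: the intended intron
print(candidate_introns(pre_mrna))     # [(6, 11), (6, 19), (10, 19)]
```

Only one of the three GU...AG intervals is the intended intron, so the dinucleotides alone cannot define a splice site.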

Another important consequence of the split gene 
organization is the generation of multiple proteins 
from a single pre-mRNA by a process called alterna- 
tive splicing. In most cases all of the exons present in 
the pre-mRNA are spliced together to form the 
mature mRNA. For example, a pre-mRNA contain- 
ing five exons would give rise to a mature mRNA 
containing the five exons in the order 12345. How- 
ever, in some cases an exon may be skipped by the 
splicing machinery, resulting in an mRNA with the 
exons in the order 1245, which would encode a differ- 
ent protein. This process can be regulated, so that the 
transcription of a single gene can lead to the produc- 
tion of different proteins in different cell types. Of 
course, the number of different exon combinations 
increases with the number of exons in the gene. In 
fact, there are examples in which thousands of differ- 
ent proteins can be produced from a single gene by 
alternative splicing. 
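The growth in possible exon combinations can be illustrated by enumerating exon-skipping isoforms. This sketch makes a deliberately simple, hypothetical assumption: the first and last exons are always retained and every internal exon can be skipped independently, so a gene with n exons yields 2^(n-2) exon orders. Real genes are far more constrained; the function name is illustrative.

```python
from itertools import combinations


def exon_skipping_isoforms(n_exons: int) -> list[tuple[int, ...]]:
    """List exon orders obtainable by skipping internal exons.

    Assumes (hypothetically) that exons 1 and n are always kept and that
    each internal exon can be included or skipped independently.
    """
    if n_exons < 2:
        raise ValueError("need at least two exons")
    first, last = 1, n_exons
    internal = range(2, n_exons)
    isoforms = []
    # Start with the full-length isoform, then skip progressively more exons.
    for kept_count in range(len(internal), -1, -1):
        for kept in combinations(internal, kept_count):
            isoforms.append((first, *kept, last))
    return isoforms


isoforms = exon_skipping_isoforms(5)
print(isoforms[0])               # (1, 2, 3, 4, 5)
print((1, 2, 4, 5) in isoforms)  # True
print(len(isoforms))             # 8
```

For the five-exon example in the text, the full-length order 12345 and the exon-3-skipped order 1245 both appear among the eight possibilities.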

Why are eukaryotic genes split? One theory is that 
exons are the primary unit of protein evolution. 
According to this theory new genes emerge during 
evolution by exon shuffling, which occurs through DNA
recombination. Thus, two genes with
multiple exons could recombine to generate a new 
gene containing exons from both genes. When the 
new gene is transcribed and spliced, a novel protein 
would be produced. Another theory is that the split 
gene organization evolved because it provides an effi- 
cient means of producing multiple proteins from a 
single gene by alternative splicing. 


See also: Alternative Splicing; Frameshift 
Mutation; Introns and Exons 
The transmissible spongiform encephalopathies
(TSEs) or ‘prion’ diseases are neurodegenerative dis- 
orders affecting a range of mammalian species. The 
devastating epidemic of bovine spongiform enceph- 
alopathy (BSE) in the United Kingdom and its 
subsequent spread to humans and other species have 
thrown this whole group of diseases into the limelight 
in recent years. BSE, which was possibly first caused 
by contamination of animal feed with sheep scrapie 
offal, has been diagnosed in nearly 180 000 cattle since 
it was first reported in 1986. Novel TSEs, shown to be 
related to BSE, have also been recognized in domestic 
and large cats and in a range of exotic ungulate spe- 
cies in zoological collections. In 1996, a dramatic 
announcement was made that a new form of human 
TSE, variant Creutzfeldt-Jakob disease (vCJD), had
been identified in 10 patients in the United Kingdom. 
Laboratory studies soon provided compelling evi- 
dence that vCJD was caused by the BSE agent. Since 
then, the number of vCJD cases has crept slowly 
upwards, at present (2001) standing at about 100, but 
it is still too early to predict the eventual size of the 
outbreak. Given this uncertainty, it is vital to under- 
stand the factors governing the transmission of TSEs 
between and within species. It is well established that 
genetic information carried by both the host and the 
agent can have a profound effect on the occurrence 
and characteristics of TSE disease. 


TSE Agents and the Prion Protein 


TSEs are caused by unconventional transmissible
agents that are unusually resistant to inactivation
by heat, chemicals, nucleases and proteases. Extensive
studies have so far failed to reveal any infection-specific
nucleic acids in highly infective tissue extracts. Rather,
infectivity tends to copurify with aggregated forms of
the host ‘prion protein,’ PrP. The conversion of the
normal cellular protein, PrPC (an abundant, membrane-bound
sialoglycoprotein), into the abnormal form,
PrPSc, involves a conformational change to a
predominantly β-pleated structure. PrPSc accumulates in
the brain and other tissues of infected individuals,
often forming fibrillar aggregates, and its presence is
regarded as a definitive diagnostic feature of these
diseases. Moreover, studies of transgenic knockout
mice have shown that PrP is required for propagation
of TSE infection. These observations have prompted
the ‘prion’ or ‘protein-only’ hypothesis, originally put
forward by Stanley Prusiner in 1982, that TSE agents
contain no nucleic acid but consist solely of modified
PrP, either PrPSc itself or an intermediate between
PrPC and PrPSc.

According to the prion hypothesis, when an animal 
is challenged with a TSE inoculum, conformationally 
modified PrP from the inoculum interacts with normal
host PrP and converts it to PrPSc. This in turn is
suggested to pass on the modification to new PrP
molecules, resulting in an accumulation of PrPSc,
which eventually interferes with brain function, lead- 
ing to death. While the prion hypothesis has gained 
widespread acceptance, there are still key issues that 
have not been resolved, namely, the exact modification 
of the protein that confers infectiousness, the basis of 
agent strain variation (see below) and the mechanism 
of propagation of this information in a protein- 
only structure. Therefore, the existence of an elusive 
infection-specific informational molecule such as a 
nucleic acid remains a possibility. If such a molecule 
exists, then it may be protected and hidden by aggre- 
gated PrP, as proposed in the ‘virino’ hypothesis of 
Alan Dickinson. 
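The template-conversion step proposed by the prion hypothesis can be illustrated with a deliberately simple toy model in which the conversion rate is proportional to the product of the normal and abnormal PrP pools. This is an assumption made for illustration only: it ignores new PrP synthesis, clearance, aggregation and strain effects, the rate constant and units are arbitrary, and it is not a published model of TSE pathogenesis. It captures two qualitative points from the text: conversion requires a seed of modified PrP, and once seeded, the abnormal form accumulates.

```python
def simulate_conversion(seed: float, total: float = 1.0, k: float = 1.0,
                        dt: float = 0.1, steps: int = 200) -> list[float]:
    """Toy model: d[PrPSc]/dt = k * [PrPC] * [PrPSc], with a fixed total PrP pool.

    Integrated with forward-Euler steps in arbitrary units. With these
    default parameters the abnormal pool rises monotonically toward the total.
    """
    sc = seed
    trajectory = [sc]
    for _ in range(steps):
        c = total - sc         # remaining normal PrP
        sc += k * c * sc * dt  # conversion templated by existing abnormal PrP
        trajectory.append(sc)
    return trajectory


seeded = simulate_conversion(seed=0.01)
unseeded = simulate_conversion(seed=0.0)
print(round(seeded[-1], 3))  # 1.0
print(unseeded[-1])          # 0.0
```

Without an initial seed nothing is converted, whereas a small seed eventually converts almost the entire pool, mirroring the self-propagating accumulation described above.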


Experimental TSEs in Mice 


Host Genetic Effects 

Most of our understanding of the biology of the TSEs 
comes from studies of experimental rodent models. 
When a mouse is infected with a TSE agent, there is 
a long asymptomatic incubation period (at least 4 
months) before the appearance of neurological dis- 
ease. If all experimental variables are controlled, the 
length of the incubation period is remarkably repro- 
ducible and depends precisely on interactions between 
genetic information carried by the host and that car- 
ried by the TSE agent. Many years ago, Dickinson 
established that the major component of the host 
effect is associated with a single gene, called Sinc (scra- 
pie incubation), although other host genes also have 
minor effects. Later it became clear from studies in 
congenic and transgenic mouse lines that the Sinc gene 
(sometimes referred to as Prn-p) is located on chromosome
2 and encodes PrP. Only two alleles of the
Sinc/PrP gene have been found in laboratory mice,
encoding PrP proteins that differ by two amino acids
(at codons 108 and 189). The effect of the Sinc/PrP
genotype on the incubation period can be very large
(up to hundreds of days with certain TSE isolates),
suggesting that PrP is involved in a rate-limiting step
in pathogenesis.


Agent Strain Variation 

Dickinson also demonstrated that, like conventional 
microorganisms, TSE agents exhibit strain variation: 
to date, approximately 20 phenotypically distinct 
laboratory TSE strains have been identified. The 
most obvious ways in which TSE strains differ are in 
the patterns of incubation periods they produce in the 
three possible Sinc/PrP mouse genotypes (the two
homozygotes and the heterozygote) and in the type 
of pathology produced in the brain. TSE strains differ 
in the length of the incubation period in any single 
Sinc genotype. They also differ in which of the two 
Sinc homozygotes has the shorter incubation period 
and in the apparent dominance of the two alleles in the 
heterozygote mouse. It is well established that TSE 
strains can retain their characteristics over many serial 
mouse passages, although variant strains may some- 
times be selected by changing the passage conditions. 
It has also been demonstrated that the disease pheno- 
type of TSE strains is independent of the genotype, or 
even the species, of the animal from which the infect- 
ive inoculum has been produced. All of these observa- 
tions indicate that TSE agents carry some form of 
strain-specific information that is independent of the
host. It has been suggested that protein-only structures
could carry this information in the form of multiple
self-perpetuating conformational modifications,
but it is unclear whether this proposed novel form of
genetic information can account for the experimental
observations, or whether a separate informational
molecule is required.

Although the basis of TSE strain variation is 
unknown, TSE strain identification based on
incubation periods and neuropathology in mice has been
used to unravel the relationships between TSEs occur- 
ring naturally in different species. Thus, it has been 
shown that isolates from BSE, feline spongiform 
encephalopathy, TSEs of exotic ungulates and, most 
recently, vCJD produce the same disease phenotype in 
mice, providing compelling evidence that they are all 
caused by the same TSE strain. In these cross-species
transmissions, PrP-independent host genetic effects
become much more prominent, but the genes involved 
have not yet been identified. 


Sheep Scrapie 


Scrapie has been endemic in sheep in the United King- 
dom for at least 250 years. It has long been recognized 
that there is a strong host genetic influence on the 
incidence of sheep scrapie, but it is only in recent 
years that it has been possible to analyze this at the 
molecular level. As in mice, PrP genotype has a major 
effect on the incubation period and occurrence of the 
disease in sheep. This was first described by Nora 
Hunter, working on a closed flock of Cheviot sheep 
that had been selectively bred for many years accord- 
ing to their susceptibility or resistance to experimental 
challenge with scrapie. Later, it was recognized that 
PrP genotype is also important in natural scrapie. The 
sheep PrP gene is highly polymorphic, showing amino 
acid substitutions in at least 12 sites of the coding 
region. However, only three of these polymorphisms 
have been shown so far to have a significant influence 
on the occurrence of scrapie: an alanine (A)/valine (V)
polymorphism at codon 136, a histidine (H)/arginine
(R) polymorphism at codon 154, and a glutamine (Q)/
arginine (R)/histidine (H) polymorphism at
codon 171. In terms of these three sites, at least five
alleles of the PrP gene are present in sheep (ARQ, 
ARR, ARH, AHQ, and VRQ). 
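Because sheep PrP alleles are named by the residues they carry at the three sites above, the nomenclature can be read mechanically. The following is a minimal sketch, assuming the one-letter allele names are always written in codon order 136, 154, 171 and genotypes as two alleles separated by "/"; the helper names are hypothetical and not part of any established tool.

```python
from __future__ import annotations

# Hypothetical helper: a sheep PrP allele name lists the amino acids
# (one-letter codes) at codons 136, 154 and 171, in that order.
CODONS = (136, 154, 171)

# Residues reported at each site in the text above. Not every combination
# is a known allele; only five (ARQ, ARR, ARH, AHQ, VRQ) are listed there.
ALLOWED = {136: {"A", "V"}, 154: {"R", "H"}, 171: {"Q", "R", "H"}}


def parse_allele(allele: str) -> dict[int, str]:
    """Map an allele name such as 'VRQ' to {codon: residue}."""
    if len(allele) != 3:
        raise ValueError(f"expected three letters, got {allele!r}")
    residues = dict(zip(CODONS, allele.upper()))
    for codon, residue in residues.items():
        if residue not in ALLOWED[codon]:
            raise ValueError(f"{residue} is not a listed residue at codon {codon}")
    return residues


def parse_genotype(genotype: str) -> tuple[dict[int, str], dict[int, str]]:
    """Split a diploid genotype such as 'ARR/VRQ' into its two alleles."""
    first, second = genotype.split("/")
    return parse_allele(first), parse_allele(second)


def is_homozygous(genotype: str) -> bool:
    """True when both alleles carry the same residues at all three codons."""
    first, second = parse_genotype(genotype)
    return first == second
```

For example, parse_allele("VRQ") returns {136: 'V', 154: 'R', 171: 'Q'}, and is_homozygous("ARR/ARR") is True, matching the resistant genotype discussed below.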

Given this degree of polymorphism, it is not sur- 
prising that PrP genetic effects in natural scrapie are 
very complicated. The frequency of the above alleles 
differs markedly between sheep breeds. Also, a high 
scrapie incidence is associated with different geno- 
types in different breeds or flocks, possibly because 
multiple strains of scrapie agent are involved in the 
natural disease. Indeed, it has been shown experi- 
mentally that different TSE isolates target sheep of 
different PrP genotypes. However, there do appear 
to be PrP genotypes that confer a high degree of 
resistance wherever they occur. For example, only 
one case of scrapie has ever been reported in the 
ARR/ARR genotype. At the other end of the spec- 
trum, sheep of the rare VRQ/VRQ genotype in the 
United Kingdom almost always get scrapie and this 
has led to the suggestion that natural scrapie is a purely 
genetic disease. However, this and other susceptible 
genotypes are present in sheep from scrapie-free 
countries such as Australia and New Zealand, arguing 
strongly that scrapie is an acquired infection. Further- 
more, the occurrence of disease in highly susceptible 
genotypes within a high-incidence flock can be 
delayed, or even prevented, by extremely hygienic 
husbandry during the perinatal period. 


BSE 


In contrast to the above, the PrP gene in cattle shows 
very little variation. One major polymorphism,
involving a difference in the number of octapeptide repeats in
the PrP protein, has been described, but this does not 
appear to influence the occurrence of BSE. 
Human TSEs


TSEs of humans present as sporadic, familial, or 
acquired disorders. In each category, genetic variation 
in the PrP gene plays a key role in determining the 
occurrence and characteristics of the disease. As in 
sheep, the human PrP gene, which is located on 
chromosome 20, is very variable. To date, four poly- 
morphisms altering PrP amino acid sequence have 
been recorded, including three substitutions and a 
deletion of one of the octapeptide repeats. In add- 
ition, at least 23 mutations have been reported: 14 
point mutations, 8 insertions of varying numbers of 
octapeptide repeats, and one stop codon mutation, 
which results in a truncation of the protein. 


Sporadic CJD 

Sporadic CJD (sCJD) is a rare condition with a world- 
wide distribution, occurring at an annual frequency of 
about one case per million of the population. A com- 
mon polymorphism, resulting in either methionine or 
valine at residue 129 of the human PrP protein, influ- 
ences the occurrence of sCJD. In the general Caucasian 
population, about 35% of individuals are homozygous 
for methionine at this site, 15% are homozygous for 
valine, and 50% are heterozygous. Although sCJD 
occurs in all three of these genotypes, heterozygotes
are substantially underrepresented among patients. It is not known what
causes sCJD, as extensive epidemiological surveys have 
revealed no environmental risk factors. If sCJD is an 
acquired infection, then the causative agent must be 
almost ubiquitous. Alternatively, it has been suggested 
that sCJD occurs as a result of a rare spontaneous
conversion of PrPC, the normal cellular form of PrP, to its pathological infectious form.
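A short Hardy-Weinberg calculation on the codon 129 figures quoted above suggests that the general population is close to equilibrium proportions, so the shortage of heterozygotes among sCJD patients appears to be a feature of the disease rather than of the population. This is a worked check on the published percentages only, assuming random mating; it is not a statistical test.

```python
# Genotype frequencies at codon 129 quoted above for the general
# Caucasian population (M = methionine, V = valine).
observed = {"MM": 0.35, "VV": 0.15, "MV": 0.50}

# Allele frequency of M: homozygotes carry two M alleles, heterozygotes one.
p_m = observed["MM"] + observed["MV"] / 2
p_v = 1 - p_m

# Genotype frequencies expected under Hardy-Weinberg proportions (p^2, q^2, 2pq).
expected = {"MM": p_m ** 2, "VV": p_v ** 2, "MV": 2 * p_m * p_v}

for genotype in ("MM", "VV", "MV"):
    print(f"{genotype}: observed {observed[genotype]:.2f}, "
          f"expected {expected[genotype]:.2f}")
```

This gives allele frequencies of 0.60 (M) and 0.40 (V), and expected genotype frequencies of 0.36, 0.16, and 0.48, close to the observed 0.35, 0.15, and 0.50.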


Acquired Human TSEs 

The first clearly acquired TSE to be described in 
humans was kuru, which was associated with ritu- 
alistic cannibalism amongst the Fore people in Papua 
New Guinea. Since then, it has been recognized that 
on rare occasions CJD has spread iatrogenically from 
person to person, by corneal or dura mater grafting, 
contamination of neurosurgical instruments, or treat- 
ment with human pituitary-derived hormones. The 
largest group of iatrogenic transmissions, to over 100 
patients, has involved the administration of CJD- 
contaminated human growth hormone. Amongst 
exposed patients, heterozygosity at codon 129 of the 
PrP gene has resulted in longer incubation periods and 
perhaps also lower susceptibility to infection. 

As described above, there is very strong evidence 
that vCJD, which is clinically and pathologically dis- 
tinct from sCJD and occurs in younger patients, is 
caused by the BSE agent. Although other modes of 
transmission have not been excluded, it is most likely 
that BSE infection has been acquired from bovine
products in the diet. All vCJD patients so far have 
been homozygous for methionine at codon 129 of 
the PrP gene, but it remains to be seen whether valine 
homozygotes and heterozygotes are also susceptible 
but with longer incubation periods. 


Familial Human TSEs 

The reported mutations of the human PrP gene, most 
of which are rare, are associated with familial neurolo- 
gical diseases showing a range of clinical and neuro- 
pathological characteristics. The commonest mutation 
results in a substitution of lysine for glutamic acid at 
codon 200, and has been identified in association with 
large clusters of familial CJD in Jewish people of 
Libyan origin and in apparently unrelated commun- 
ities in Slovakia and in Chile. The disease phenotype 
for this and several other mutations is similar to that of 
typical sCJD. Another relatively common mutation 
involves a substitution of asparagine for aspartic acid 
at codon 178. Interestingly, this mutation is associated 
with an sCJD-like phenotype when linked to valine at 
codon 129, but a distinct disease phenotype, fatal famil- 
ial insomnia, when linked to methionine at codon 129. 
A quite different clinical and pathological picture, 
Gerstmann-Straussler-Scheinker disease or GSS, is 
seen in families carrying several of the other mutations, 
the most common being a proline-to-leucine mutation 
at codon 102. 

Many of the familial human TSEs, as well as sCJD, 
have been shown to be transmissible to experimental 
animals. However, as the penetrance for many of the 
mutations is high, it is usually assumed that these are 
purely genetic diseases that occur because the mutant 
protein readily converts spontaneously to a patho- 
logical and transmissible form. Indeed, several mutant 
human PrPs tend to form aggregates when expressed 
in cell lines, although these have not, so far, been 
shown to be infectious. The lesson from sheep scrapie 
is that inheritance of high susceptibility to infection 
can sometimes masquerade as a purely genetic disease 
and the same could be true for at least some of the 
familial human TSEs. 


Concluding Remarks 


It is clear that the PrP gene influences the expression 
of TSE disease in a number of different species, but 
there are many outstanding questions regarding the 
mechanisms involved. Currently the effects of PrP 
variants and mutants are being explored in a range of 
model systems, including transfected cell lines and 
transgenic mice. Other studies are exploring the bio- 
chemical characteristics of the pathological protein 
associated with different PrP variants or with different 
TSE strains. Hopefully, these studies will clarify the
complex interactions between TSE infection and host 
PrP genotype. 
The term ‘spore,’ derived from the Greek for seed, is
generally applied to small, resistant, dormant cells 
formed by a wide variety of organisms, including 
bacteria, fungi, protozoans, algae, and plants. Spores 
are formed as part of sexual or asexual reproductive 
processes, but when germinated they almost always 
give rise to new individuals or groups of cells. In this 
way, most spores can be clearly distinguished from 
gametes. As one might anticipate from the very wide 
range of organisms that produce them, there are many 
very different types of spores. In most organisms, 
other than the higher plants, spores are much more 
resistant to environmental agents than the organisms 
that produce them. Many types of spores can remain 
dormant for long periods of time. 

The endospores produced by certain gram-positive 
bacteria, such as Bacillus subtilis, are typically referred 
to as spores. Endospores are formed within the bac- 
terial cell, and are themselves highly differentiated, 
resistant, nongrowing cells. They are formed via the 
conversion of a vegetative cell by a complicated path- 
way of gene expression triggered by nutrient exhaus- 
tion. The regulation of this pathway serves as a model 
for the study of differentiation. When conditions 
favorable to growth return, the endospore can convert 
rapidly back into a vegetative cell. Endospores 
can remain dormant for
thousands of years under suitable conditions, and 
there is some evidence for germination after millions 
of years of dormancy. 

The bacterial group Actinomycetes, of which
Streptomyces is one genus, also produces spores, but
these spores are not related to the endospores dis- 
cussed above. These prokaryotes form mycelia remin- 
iscent of the eukaryotic fungi (see below) and often 
produce spores on aerial filaments, once again remin- 
iscent of the fungi. Spore production, the morphology 
of the spores, and spore-producing structures vary 
widely across the actinomycetes. In Streptomyces the 
multinucleate aerial filaments, called sporophores, 
form crosswalls which generate single-celled spores, 
referred to as conidia. Once again this process is 
reminiscent of that of the fungi, but these prokaryotic 
organisms are not related to the eukaryotic fungi. 

The fungi are a group that includes molds, yeasts, 
and mushrooms. The molds and mushrooms grow as a 
mycelium, a mat of cross-branching filaments called 
hyphae. Fungi reproduce by producing spores, usually 
unicellular, either sexually or asexually from spe- 
cialized hyphal compartments. The phylogenetic 
groupings of the fungi are named according to the 
mechanism of the production of sexual spores. 

The Ascomycetes are fungi that form sexual spores 
termed ascospores within an enclosed sac (ascus). The 
Ascomycetes include the yeast Saccharomyces cerevisiae.
The four spores within the ascus of Saccharomyces
can be dissected and analyzed to yield
information on genetic segregation (see Tetrad Ana- 
lysis). The asexual spores formed by these fungi, called 
conidia, are often brightly colored and very resistant 
to drying. Under favorable conditions members of the 
Zygomycetes, such as Rhizopus, form haploid spores 
asexually in structures called sporangia. When growth 
conditions are poor, zygosporangia are formed by a 
sexual process. The Basidiomycetes, which include the 
mushrooms, puffballs, and rusts, produce sexual 
spores called basidiospores on the ends of club-shaped 
structures (basidia). The asexual spores are typically 
called conidia. Interestingly, some rusts form asexual 
spores called pycnidiospores or spermatia which act 
very much like gametes, fusing with another cell 
before they are able to grow. 

There is a large group of protozoans that used to be 
considered as a single phylum, Sporozoa, but which 
have been reclassified into four separate phyla: Api- 
complexa, Microspora, Acetospora, and Myxospora. 
All members of these phyla are parasites of animals. 
As the original taxonomic name implied, many of 
these organisms form spores, or sporocysts. These 
protozoan ‘spores’ are not homologous to spores pro- 
duced by other organisms. There are differences in 
nomenclature and life cycles amongst these phyla,
and the term ‘spore’ can refer to reproductive, infect- 
ive, or resistant stages in the often complex life cycles 
of these organisms. 

The fungus-like protists, the phylum Myxomy- 
cota (plasmodial slime molds), the phylum Acrasio- 
mycota (cellular slime molds), and the phylum 
Oomycota (water molds), also produce spores. In 
the life cycle of a plasmodial slime mold, haploid 
spores are produced by meiosis from the diploid spor- 
angium, in response to harsh environmental condi- 
tions. The spores germinate into active haploid forms 
which fuse to form the diploid stage. The spores of the 
cellular slime mold are also haploid, but they are 
derived by differentiation from existing haploid cells, 
not by meiosis. The water molds produce encysted, 
diploid zoospores through an asexual process and 
diploid spores called oospores through a sexual pro- 
cess. The encysted zoospores are very resistant to 
environmental conditions, the oospores less so. 

All plants, and some algae, have a sexual life cycle 
that is characterized by alternation of generations. 
One generation is composed of a haploid multicellular 
organism/stage called a gametophyte and the other 
generation is composed of a diploid multicellular 
organism/stage called a sporophyte. Meiosis in the 
sporophyte produces haploid cells called spores. In 
some plants the spores may be of two kinds, a mega- 
spore which forms a female gametophyte and a 
microspore which forms a male gametophyte. A 
spore generates the multicellular gametophyte by 
mitosis; it does not fuse with another haploid cell. 
In the higher plants the gametophytes do not form 
independent organisms, but are protected by being 
retained in the reproductive tissue of the sporophytes. 


See also: Bacillus subtilis; Fungi; Streptomyces; 
Tetrad Analysis 
Src Family


The Src tyrosine kinases comprise a family of around 
eight related proteins: Src, Fyn, Yes, Lck, Hck, Lyn, 
Fgr, and Blk. Of these, Src, Fyn, and Yes are expressed 
ubiquitously, while the others are mostly found in 
hematopoietic cells. c-Src is the prototypic family
member and is the cellular homolog of v-Src, the
transforming protein of Rous sarcoma virus. While
the oncogenic and nononcogenic forms of Src both 
possess intrinsic tyrosine kinase activity, the viral form 
is deregulated by virtue of a deletion of C-terminal 
sequences that contain a tyrosine residue (Tyr527 in 
avian Src), whose phosphorylation confers negative 
regulation of c-Src’s kinase activity. 


Structural Organization and Regulation 
of Src Proteins 


In addition to the C-terminal regulatory sequences, 
each of the Src family kinases is myristylated (and in 
some cases palmitylated) at its N-terminus and this is 
required for membrane association. C-terminal to the
site of myristylation lie, in order: a unique domain
that is not conserved among family members and is a
putative site of serine phosphorylation; a Src homology 3
(SH3) domain; an SH2 domain; a linker that joins the SH2
domain with the kinase domain; the kinase domain itself;
and the conserved regulatory tail region mentioned above.

Over the years, structure/function analysis and 
crystal structure determinations of Src family mem- 
bers have led to an understanding of the arrangement 
of Src’s structural domains and how this arrangement 
is altered as Src is regulated. Briefly, the SH3 and SH2 
domains of Src interact not only with specific 
sequences in effector proteins, but also with other 
regions in the Src protein itself. In particular, when
Tyr527 is phosphorylated (in the cell, by a tyrosine
kinase termed C-terminal Src kinase, or CSK), it binds to
Src's own SH2 domain in an intramolecular interaction that
has two important consequences: first, the SH2 domain
can no longer interact with heterologous effector
proteins and, second, the intramolecular interaction
represses Src’s catalytic activity. Conversely,
dephosphorylation of Tyr527 results in catalytic
activation by releasing the constraints imposed
by the SH2 domain-Tyr527 interaction. The SH3 
domain of Src also contributes to catalytic repression 
by forming an intramolecular interaction with sequen- 
ces in the linker region and kinase domain. Thus, the 
inactive conformation of Src is maintained by multiple 
interactions between distinct regions of the protein. 
Consequently, Src activation requires the disruption 
of these interactions either via dephosphorylation 
of Tyr527 or by displacement of the SH3 and/or 
SH2 domains as a result of high-affinity binding to 
other proteins. In addition, c-Src activation requires
phosphorylation of Tyr416 (believed to be autophosphorylation),
which generates the substrate-binding site and ensures
correct positioning of the substrate for catalysis.

Thus, full activation of Src kinases requires the 
release of both SH3- and SH2-mediated intramolecular 
interactions, which allows the catalytic site to adopt an
active conformation, as well as phosphorylation of 
Tyr416 in the kinase domain. The role of other sites 
of phosphorylation in the Src protein, such as in the 
unique region, has not been determined. 
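The activation requirements described above behave like a logical AND of three conditions. The following is a toy Boolean sketch, assuming each condition can be treated as a simple on/off state and that SH3 release is an independent input; the class and field names are hypothetical, and real Src regulation is graded and dynamic rather than binary.

```python
from dataclasses import dataclass


@dataclass
class SrcState:
    """Toy on/off description of one Src molecule (avian c-Src numbering)."""
    tyr527_phosphorylated: bool  # C-terminal tail phosphorylated (e.g. by CSK)
    sh2_displaced: bool          # SH2 occupied by a high-affinity partner protein
    sh3_displaced: bool          # SH3 released from the linker/kinase domain
    tyr416_phosphorylated: bool  # phosphorylation in the kinase domain

    def sh2_clamp_released(self) -> bool:
        # The SH2-pTyr527 clamp is lost if Tyr527 is dephosphorylated
        # or if the SH2 domain is engaged by another protein instead.
        return not self.tyr527_phosphorylated or self.sh2_displaced

    def fully_active(self) -> bool:
        # Full activation: both intramolecular clamps released AND Tyr416 phosphorylated.
        return (self.sh2_clamp_released()
                and self.sh3_displaced
                and self.tyr416_phosphorylated)
```

In this sketch a resting molecule, SrcState(True, False, False, False), is inactive, while a molecule lacking the Tyr527 constraint (as in v-Src, whose C-terminal tail is deleted) with SH3 released and Tyr416 phosphorylated, SrcState(False, False, True, True), is fully active.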


Src Kinase Activity in Cellular Responses 
to Environmental Stimuli 


The Src family kinases are key components of cellular 
responses to environmental stimuli. A few examples 
of Src’s role in signaling changes in cell behavior are 
given below. 


Signaling from Receptor Tyrosine Kinases 
Src kinases act downstream of the transmembrane 
receptor protein tyrosine kinases (RPTKs), such as 
those for platelet-derived growth factor (PDGF) and 
epidermal growth factor (EGF), and are required to 
elicit downstream responses after ligand stimulation. 
Src is activated by both PDGF- and EGF-treatment of 
cells and becomes associated with the cytoplasmic 
domains of the activated receptors, via the SH2
domain of Src and specific phosphorylated tyrosine
residues on the receptor that have been mapped (at least in
the case of PDGF-R). In addition, Src association is
believed to induce further tyrosine phosphorylation 
of the activated receptors in some cases. Different 
experimental approaches that interfere with Src’s 
activity have implicated Src in the ability of a variety of 
growth factors to stimulate DNA synthesis and elicit a 
mitogenic response, although recent experiments in 
cells that lack Src, Fyn, and Yes have suggested that 
there is not an absolute requirement for Src family 
kinases under all circumstances. It should be noted 
that the Src/Fyn/Yes triple knockout cells that were 
used for these studies had been immortalized with 
SV40 large T antigen, which might account for the 
apparent discrepancy. 


G-Protein-Coupled Receptors 

There is now also evidence that ligand-induced
activation of G-protein-coupled receptors (GPCRs)
activates Src and induces phosphorylation of known
Src substrates, such as focal adhesion kinase (FAK)
and paxillin. Although the mechanism of Src
activation is not known, GPCR-induced transactivation
of RPTKs has been implicated in this process.


Integrin Adhesion Receptors 

After engagement of integrins as a result of cell adhe- 
sion to extracellular matrix (ECM), components of the 
adhesion sites of integrin clustering (the so-called 
focal adhesions) are tyrosine phosphorylated. An ex- 
ample of this is FAK, which is phosphorylated on 
Tyr397 after integrin stimulation, creating a binding 
site for the Src SH2 domain which results in associ- 
ation of FAK with Src. Further phosphorylation of 
FAK on additional tyrosine phospho-acceptor sites 
leads to recruitment of signaling proteins, including 
the adaptor protein Grb-2 that can link integrins, 
via FAK, to the Ras-MAP kinase pathway. This 
type of intracellular signaling pathway, in which Src 
plays a pivotal role, is likely to contribute to adhesion- 
dependent cellular responses, although there are prob- 
ably other ways of inducing adhesion-dependent 
activation of MAP kinase. Recently, it has also been 
proposed that FAK mediates the integrin requirement 
for growth factor-induced MAP kinase activation, and 
this too might require Src-dependent phosphorylation 
of FAK. 


Oncogenic Transformation by Src 


A great deal of information concerning regulation of 
Src’s subcellular localization and the biological
consequences of Src activity has been gained by studying
cellular transformation induced by oncogenic, 
deregulated forms of Src protein. Of particular value 
have been conditional, temperature-dependent 
mutants of v-Src that have been used to dissect both
the intracellular targeting of Src and the ensuing 
transformation process. 

v-Src is targeted to cellular focal adhesions of
mesenchymal cells by a process that requires the Src
SH3 domain and an intact actin cytoskeletal network
maintained by the concerted action of the Rho family
of small GTPases and myosin activity. Specifically,
inactive v-Src colocalizes with microtubules around
the cell nucleus and makes an SH3- and actomyosin-dependent
switch to peripheral focal adhesions at
stress fiber termini. Neither myristylation nor the
catalytic activity of v-Src is required for translocation
to focal adhesions, although both are required for the
disruption of focal adhesions and the actin cytoskeleton
that accompanies cell transformation. In addition,
Src kinase activity is required for cell motility, 
mediated by its effects on focal adhesion turnover 
and actin remodeling. In keeping with this, cells 
derived from c-Src/Fyn/Yes triple knockout embryos 
exhibit impaired migration. 

In contrast to fibroblasts, c-Src, Fyn, and Yes local- 
ize at cadherin-mediated cell-cell adhesions in epithe- 
lial cells. However, in a somewhat analogous manner 
to Src’s role in focal adhesion turnover in fibroblasts,
Src kinase activity is required for the disassembly of 
cadherin-mediated epithelial cell-cell adhesions 
(often termed epithelial cell scattering) that is neces- 
sary to free cells from the constraints of their epithelial 
connections, for example, during wound repair. 

As well as v-Src’s effects in disturbing the actin/ 
adhesion network, v-Src can also promote cell growth, 
stimulating both mitogenesis of quiescent cells and 
rapid transit through the G1 phase of the cell cycle in
growing cells. These effects of v-Src are initiated at the
cell periphery and are mediated by intracellular signal
transduction pathways, including phosphatidylinositol
(PI) 3-kinase and MAP kinase, that impinge on
cell cycle regulators such as cyclin/cyclin-dependent
kinases (cdks), the cdk inhibitor p27, and the
retinoblastoma protein. In addition, v-Src can also provide a
PI 3-kinase-dependent survival signal by suppressing 
the apoptosis that oncogenically transformed cells are 
primed to undergo when deprived of serum growth 
factors or adhesion to ECM. This ability of activated 
Src to keep cells alive is in keeping with recent reports 
indicating that Src is critically involved in coupling 
lymphokine receptor activation with inhibition of 
apoptosis, and in mediating the VEGF-induced 
endothelial cell survival that is necessary for angiogen- 
esis. However, at least one Src family member, Lck, has 
been shown to mediate apoptotic cell death induced 
by ionizing radiation, indicating that the role of par- 
ticular Src kinases in regulating life or death decisions 
might vary and depend on cell context. 


Src in Human Cancer 


The expression and activity of c-Src is elevated in a 
variety of human cancers. This has best been docu- 
mented in colorectal cancer where increases have 
been reported from normal epithelium through the 
premalignant stages to invasive and metastatic tumors. 
Although elevated expression alone does not establish a
causal role, there is a substantial body of evidence
indicating that the oncogenic properties of activated
Src might contribute to the genesis of human tumors.
For this reason, tyrosine kinase inhibitors that
selectively target the Src family may prove to be of
value in suppressing Src-dependent aspects of the
malignant phenotype in tumor cells, and in further 
dissecting the molecular mechanisms of Src’s biologic- 
al effects. 

More detailed reviews on the structure, regulation 
and biological activities of the Src family are provided 
in Further Reading below. 
See also: Cancer Susceptibility; SH Domains;
SH2 Domain; SH3 Domain 


SSLP (Simple Sequence 
Length Polymorphism) 


See: Microsatellite 


SSR (Simple Sequence 
Repeat) 


See: Microsatellite 


Stable Equilibrium 


See: Equilibrium 


Staggered Cuts 
Staggered cuts in duplex DNA are made when the two
strands are cleaved at different points in close prox- 
imity to each other. 
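To make the idea concrete, the following is a minimal sketch that cuts a short duplex at a recognition site and reports the resulting single-stranded ends. It assumes two well-known restriction enzyme sites that are not described in this entry: EcoRI, which cuts G^AATTC on each strand and leaves 5′ AATT overhangs, and PstI, which cuts CTGCA^G and leaves 3′ TGCA overhangs. The function name and parameters are hypothetical.

```python
# Base-by-base complement; the bottom strand is written 3'->5',
# aligned directly under the top strand.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def staggered_cut(top: str, site: str, top_cut: int, bottom_cut: int):
    """Cut a duplex at the first occurrence of `site` in the top strand.

    top_cut / bottom_cut: how many positions of the site (counted left to
    right in the aligned duplex) stay on the left fragment for each strand.
    Returns (left, right, end_type, single_stranded), where left and right
    are (top, bottom) strand pairs.
    """
    bottom = top.translate(COMPLEMENT)
    start = top.find(site)
    if start < 0:
        raise ValueError(f"{site} not found")
    a, b = start + top_cut, start + bottom_cut
    left = (top[:a], bottom[:b])
    right = (top[a:], bottom[b:])
    if a < b:
        end_type = "5' overhang"
    elif a > b:
        end_type = "3' overhang"
    else:
        end_type = "blunt"
    # Unpaired stretch between the two cut points, given as top-strand sequence.
    single_stranded = top[min(a, b):max(a, b)]
    return left, right, end_type, single_stranded
```

With the EcoRI assumption, staggered_cut("TTGAATTCAA", "GAATTC", 1, 5) leaves the left fragment as ("TTG", "AACTTAA") and the right fragment as ("AATTCAA", "GTT"), with a 5′ overhang of AATT; cutting both strands at the same offset would instead give a blunt end.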


See also: Restriction Endonuclease 
Staphylococcus aureus is a species of gram-positive
bacteria which is typically pathogenic, yellow- 
pigmented, and salt resistant. The organism is often 
involved in endocarditis, food poisoning, infections of 
the skin, pneumonia, septic arthritis, and toxic shock 
syndrome. S. aureus is commonly found in the upper 
respiratory tract of healthy individuals and is a
notorious hospital-acquired (nosocomial) pathogen. 

The chromosome of S. aureus is a circular DNA 
molecule of approximately 2.8 megabase pairs. Gen- 
etic analysis has been done with transduction and 
transformation, as well as by physical methods and 
sequencing. (S. aureus DNA was used as a donor in 
one of the first published reports of intergeneric gene 
cloning.) Several plasmids are known and plasmid- 
borne drug resistance is common. 

The emergence of antibiotic-resistant strains of S. 
aureus is a major problem in most hospitals. Virtually 
all nosocomial strains are resistant to penicillins and 
there is an increasing number of methicillin-resistant 
strains that are resistant to multiple antibiotics. Thus 
far, most strains seem to be susceptible to vancomycin, 
but vancomycin-tolerant strains have been observed. 
This raises the possibility that strains of S. aureus may
soon appear that cause infections which cannot be treated by
antibiotics.

Pathogenesis involves the production and secretion 
of cell surface and extracellular proteins that damage 
the host cells or tissues, or that interfere with the 
immune system. These proteins can include coagulase, 
enterotoxins A-E, fibrinolysin, lipase, nuclease,
several proteases, α-, β-, and δ-toxins, toxic shock
syndrome toxin 1, and over 20 others. The fibrin-clotting
enzyme coagulase, also called staphylocoagulase,
causes the host protein fibrinogen to be converted to fibrin,
which is deposited on the bacterial cell, possibly helping to
protect the bacterium from attack by host cells. The
yellow pigment also seems to be protective against 
killing by phagocytes. 

Many of the genes encoding these virulence factors 
are under the control of a cell-density-dependent, 
global regulatory system which responds to a peptide 
produced by the organism itself. The regulatory sys- 
tem involves the agr locus. This locus contains two 
divergent transcription units, RNAII and RNAIII,
controlled by promoters P2 and P3, respectively. The 
RNAIII transcript is an RNA that regulates the genes 
encoding the cell-surface and extracellular proteins. It 
acts primarily by regulating transcription of those 
genes, but in some cases acts as a translational
regulator. RNAIII is also the message for δ-toxin, but
translation of RNAIII is not involved in regulation. The P2
operon has four genes, agrA, agrB, agrC, and agrD. 
The product of agrD is a small protein which is pro- 
cessed to an octapeptide and then excreted from the 
cell. The processing involves the product of agrB. 
The agrA and agrC genes encode a two-component, 
signal-transducing regulatory system, with the agrA 
product (AgrA) being the response regulator and the 
agrC product (AgrC) being the sensor kinase. It is 
AgrC that binds to the peptide when it reaches a high 
extracellular concentration near the end of expon- 
ential growth. The phosphorylated form of AgrA 
then presumably activates transcription of both P2 
and P3. This leads to very high levels of RNAIII in 
the cell and the initiation of the virulence response. 
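The density-dependent switch described above can be illustrated with a toy simulation. This is a minimal sketch, not a model of real agr kinetics: all parameter values are hypothetical, the autoinducing peptide level is simply taken to be proportional to cell density, and the positive feedback through P2 (which drives agrD and agrB, and hence more peptide) is an assumption drawn from the operon structure described above.

```python
def simulate_agr(densities, aip_per_cell=1.0, threshold=50.0, boost=5.0):
    """Toy quorum-sensing sketch of the agr circuit (all parameters hypothetical).

    The processed AgrD octapeptide (AIP) is taken to be proportional to cell
    density. When AIP reaches `threshold`, AgrC is assumed to phosphorylate
    AgrA, which switches on P2 and P3. P3 output is reported as high RNAIII;
    P2 output raises AIP production per cell (assumed positive feedback).
    """
    rate = aip_per_cell
    history = []
    for density in densities:
        aip = rate * density               # extracellular peptide level
        switched_on = aip >= threshold     # AgrC senses AIP; AgrA is phosphorylated
        if switched_on:
            rate = aip_per_cell * boost    # P2 activity boosts AIP output per cell
        history.append((density, aip, "high" if switched_on else "low"))
    return history
```

For a growing culture, simulate_agr([10, 30, 60, 80]) reports RNAIII as low, low, high, high: the response stays off at low density and switches on once the peptide crosses the threshold, after which the assumed feedback keeps it on.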


See also: Drug Resistance; Gene Regulation; 
Kinases (Protein Kinases) 
Protein-coding genes are transcribed from DNA to
messenger RNA (mRNA). The protein is then
translated from the mRNA by the ribosome. Only part
of the mRNA is translated into protein. A
start codon is a codon that signals the ribosome to start
translation. Consequently, all translated regions begin
with a start codon. A start codon can serve two functions:
as a signal to initiate translation, and as a regular codon for
an amino acid. The standard genetic code has a
single start codon, AUG, which codes for methionine.
Thus, all translated proteins begin with this amino
acid, although some proteins undergo posttranslational
processing and lose their initial methionine
residue. If an AUG appears within the
translated region, it functions as a regular amino acid codon.
Whether a particular AUG is seen as a start codon or 
as an internal codon depends on the relative location 
of a ribosome-binding signal on the mRNA. In 
bacteria, this is generally a so-called Shine-Dalgarno 
sequence, a subset of the sequence GGAGG
that is complementary to a specific sequence near the
3’ end of the 16S ribosomal RNA and lines the mRNA
up appropriately to initiate translation. The first
amino acid has to sit unprotected in what then
becomes the peptidyl site on the ribosome; to block
its free N-terminal amino group, a formylated version
of methionyl-tRNA (fMet-tRNA) is used rather than regular
Met-tRNA to read this initiating codon. Eukaryotes
have other ways of distinguishing the start of the 
actual message. 

The other genetic codes used in some organelles 
and primitive eukaryotes may have multiple start 
codons, which code for different amino acids. Organ- 
isms and organelles that share an otherwise identical 
genetic code may differ in the start codons they use. 

Stop codons are special codons that signal the ribosome to stop translation. Unlike the start codon, the stop codon itself is not translated, and the last amino acid of a protein is the one coded by the codon immediately before the stop codon. All known genetic codes have multiple stop codons, all of which terminate translation. The stop codons in the standard genetic code are UAG, UAA, and UGA. They are referred to as amber, ochre, and opal, respectively.
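
The reading rules described above (begin at AUG, read successive triplets, end at the first in-frame stop codon without translating it) can be illustrated with a minimal Python sketch. It assumes the standard genetic code, written in the compact 64-letter form used by NCBI's translation tables, a hypothetical input mRNA containing only A, C, G, and U, and that the first AUG is the start codon. Real ribosomes choose the start site using signals such as the Shine-Dalgarno sequence, which this sketch ignores; the function name is hypothetical.

```python
from typing import Optional, Tuple

# Standard genetic code (NCBI translation table 1) in its compact form:
# codons are enumerated with U, C, A, G as the first, second, and third base,
# and '*' marks a stop codon.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    first + second + third: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, first in enumerate(BASES)
    for j, second in enumerate(BASES)
    for k, third in enumerate(BASES)
}
STOP_NAMES = {"UAG": "amber", "UAA": "ochre", "UGA": "opal"}


def translate_from_first_aug(mrna: str) -> Tuple[str, Optional[str]]:
    """Translate from the first AUG up to the first in-frame stop codon.

    Returns the protein in one-letter code and the trivial name of the stop
    codon, or None if the frame runs off the end of the sequence.
    """
    start = mrna.find("AUG")
    if start == -1:
        return "", None
    protein = []
    for pos in range(start, len(mrna) - 2, 3):
        codon = mrna[pos:pos + 3]
        residue = CODON_TABLE[codon]
        if residue == "*":
            # Translation ends here; the stop codon itself adds no amino acid.
            return "".join(protein), STOP_NAMES[codon]
        # Any later AUG in this frame is read as an ordinary Met (M) codon.
        protein.append(residue)
    return "".join(protein), None


protein, stop = translate_from_first_aug("GGAUGAAAAUGUGGUAAGC")
print(protein, stop)  # MKMW ochre
```

In the example, the second AUG is read as an internal methionine, and the final UAA contributes no residue to the protein.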

All translated (coding) regions begin with a start 
codon and end with a stop codon. They may contain 
additional start codons (which function as regular 
amino acid-coding codons), but cannot contain any 
additional stop codons (since these will terminate 
translation). In prokaryotes, a single mRNA may con- 
tain several coding regions, each of which is bounded 
by a start codon and a stop codon and usually has its 
own Shine-Dalgarno sequence. 
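
The bounding rule for coding regions can be sketched in Python by scanning a hypothetical polycistronic mRNA in each of the three reading frames and reporting every stretch that begins with AUG and ends with the next in-frame stop codon. This naive scan is an illustration only and the function name is hypothetical: it ignores Shine-Dalgarno sequences, so on real transcripts it would also report AUG-initiated stretches that ribosomes never use.

```python
from typing import List, Tuple

STOP_CODONS = {"UAA", "UAG", "UGA"}


def find_coding_regions(mrna: str) -> List[Tuple[int, int]]:
    """Naively list AUG-to-stop stretches in all three reading frames.

    Each stretch is reported as (start, end) so that mrna[start:end] runs from
    the AUG through the in-frame stop codon. AUGs inside an open stretch are
    treated as internal Met codons rather than as new starts.
    """
    regions = []
    for frame in range(3):
        start = None  # no coding region open yet in this frame
        for pos in range(frame, len(mrna) - 2, 3):
            codon = mrna[pos:pos + 3]
            if start is None and codon == "AUG":
                start = pos
            elif start is not None and codon in STOP_CODONS:
                regions.append((start, pos + 3))
                start = None
    return sorted(regions)


# Hypothetical two-cistron message: AUG AAA UAA, spacer GG, AUG CCC GGG UGA.
print(find_coding_regions("AUGAAAUAAGGAUGCCCGGGUGA"))  # [(0, 9), (11, 23)]
```

Here the two coding regions happen to lie in different reading frames; each is found because the scan restarts in every frame.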

The trinucleotide sequence of a stop codon signals 
termination only if it appears within frame, i.e., starting 
immediately after the previous codon ends. Thus, the 
trinucleotide sequence of a stop codon will often ap- 
pear in a translated region off-frame (i.e., the last two 
nucleotides of one codon and the first nucleotide of 
the following codon, or the last nucleotide of a codon 
and the first two nucleotides of the following codon) 
without affecting translation unless an insertion or 
deletion mutation shifts the reading frame. 
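
The point about off-frame stop trinucleotides can be checked with a small Python sketch on a hypothetical message. It reports where stop trinucleotides occur in frame and out of frame, then shows that a single-nucleotide deletion moves an off-frame UAA into the reading frame. The sequence and function names are illustrative assumptions.

```python
from typing import List, Optional

STOP_CODONS = {"UAA", "UAG", "UGA"}


def first_in_frame_stop(mrna: str, start: int = 0) -> Optional[int]:
    """Index of the first stop codon read in frame from `start`, else None."""
    for pos in range(start, len(mrna) - 2, 3):
        if mrna[pos:pos + 3] in STOP_CODONS:
            return pos
    return None


def off_frame_stops(mrna: str, start: int = 0) -> List[int]:
    """Indices of stop trinucleotides that straddle codon boundaries."""
    return [
        pos
        for pos in range(start, len(mrna) - 2)
        if mrna[pos:pos + 3] in STOP_CODONS and (pos - start) % 3 != 0
    ]


# Hypothetical message read as AUG GUA AGC UUU GGC UAA.
wild_type = "AUGGUAAGCUUUGGCUAA"
print(first_in_frame_stop(wild_type))  # 15 (the real stop codon)
print(off_frame_stops(wild_type))      # [4]: U-A from GUA plus A from AGC

# Deleting the G at index 3 shifts the frame and brings UAA into register.
frameshift = wild_type[:3] + wild_type[4:]
print(first_in_frame_stop(frameshift))  # 3 (premature stop after Met)
```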


See also: Genetic Code; Translation; Translational 
Control 
Mice carrying mutations at the Steel (Sl) locus located on chromosome 10 display multiple defects in hematopoiesis, gametogenesis, pigmentation, gut motility, and hippocampal-dependent learning. A similar phenotype is also displayed by mice with mutations at the dominant white spotting (W) locus. Mosaic analysis involving chimeric embryos and reciprocal bone marrow transplantation experiments in the 1960s and 1970s suggested that the Sl and W loci, respectively, controlled environmental and intrinsic properties of the stem cells that give rise to the multiple cell lineages affected in these mutants. In the late 1980s and early 1990s this was directly demonstrated with the cloning first of the W locus, followed by the identification of the Sl gene product, Steel Factor (SLF), as the ligand that binds to the W gene product, the Kit receptor tyrosine kinase.

SLF, also referred to as mast cell growth factor (MGF), Kit ligand (KL), and stem cell factor (SCF), is a transmembrane growth factor which is proteolytically cleaved to produce a soluble protein. Two isoforms of SLF exist (SLF^248 and SLF^220) due to alternative splicing around exon 6, which contains the primary proteolytic cleavage site. A second cleavage site, which is used when exon 6 is missing, is present in exon 7 (Figure 1). These splice variants are expressed in a tissue-specific manner. Both the transmembrane and soluble forms of SLF are biologically active. SLF stimulation of the Kit receptor results in tyrosine phosphorylation of the receptor and its associated downstream signaling molecules, potentiating the survival, proliferation, and/or differentiation of target cells, such as hematopoietic cells, germ cells, and melanocytes.

There are many independent alleles of the Sl locus, with the severity of the phenotype depending on the molecular alteration of the gene. Mice homozygous for lethal Sl alleles, including the original Sl allele, die in utero or shortly after birth of severe macrocytic anemia. These alleles involve deletions within the gene resulting in complete loss of SLF function. In contrast, mice homozygous for the Steel-Dickie (Sl^d) allele are viable despite displaying all the pleiotropic effects characteristic of disruptions at the Sl locus, including anemia, a reduced number of hematopoietic stem cells, a profound mast cell deficiency, a complete lack of pigmentation (white, black-eyed), and sterility in both sexes. The Sl^d allele involves a 4-kb intragenic deletion in SLF genomic sequence leading to the loss of the transmembrane and cytoplasmic coding regions of SLF (Figure 1). The Sl^d mutation is therefore only capable of producing the soluble form of SLF. Thus, the mutant phenotype of Sl^d mice suggests that soluble SLF alone cannot provide the normal signal to neighboring cells that express the Kit receptor and that the membrane-bound form of SLF is essential for the normal developmental processes controlled by the W and Sl loci.

The differential roles of soluble and membrane-bound SLF have been further defined in vitro. Soluble SLF produced by fibroblast cells cannot sustain the growth of Kit-expressing hematopoietic cells, primordial germ cells, mast cells, or melanocytes when cocultured. In contrast, membrane-bound SLF expressed by fibroblasts supports the proliferation of these cell types in a contact-dependent manner. The adhesive nature of this latter interaction results in a more sustained phosphorylation of the Kit receptor, owing to slower downmodulation of the receptor from the cell surface than occurs with soluble SLF. This differential signaling mediated by the two forms of SLF may determine whether Kit-expressing cells undergo survival, proliferation, and/or differentiation. It has also been proposed that the presentation
of membrane-bound SLF along the migratory pathways of hematopoietic progenitor cells, primordial germ cells, and melanoblasts may play a role in guiding these Kit-expressing cells to their final destinations during embryogenesis.

Figure 1 Structure of the alternatively spliced and Sl mutant SLF protein products. Diagrammatic representation of the SLF protein SLF^248 and the alternatively spliced protein product SLF^220, lacking 28 amino acids encoded within exon 6. The secretion signal peptide is indicated by a shaded box and the transmembrane domain by a solid black box. Cleavage (denoted by the arrows) of SLF^248 and SLF^220 at the proteolytic cleavage sites (represented by asterisks) encoded within exon 6 and exon 7, respectively, results in soluble SLF products. The Sl^d mutant SLF product is generated by a 4-kb intragenic deletion in SLF genomic sequence resulting in the loss of the transmembrane and cytoplasmic regions and five amino acids (aa) N-terminal to the transmembrane domain, which are replaced by three additional amino acids and a stop codon. The Sl^17H mutant SLF product is the result of a splice donor site mutation that affects splicing of the C-terminal exon that encodes the cytoplasmic tail. This results in the substitution of amino acids 239-273 with 27 additional amino acids.

Although soluble SLF is unable to compensate for the loss of membrane-bound SLF, it is apparent from the phenotypes of Sl and Sl^d mutant mice that soluble SLF is also indispensable in vivo. Soluble SLF has been used extensively in the study of hematopoiesis. Alone, soluble SLF acts primarily as a cell survival factor; however, in synergy with a number of cytokines and interleukins, it enhances colony formation by hematopoietic progenitor cells and the proliferation of factor-dependent cell lines.

The Sl^17H allele has also provided important insights into the biochemistry of SLF-Kit signaling. The Sl^17H allele leads to the substitution of 34 amino acids from the cytoplasmic tail with 27 extraneous amino acids (Figure 1). Homozygous Sl^17H/Sl^17H mice are male sterile due to a block in postnatal spermatogenesis and are white. These results suggest that the cytoplasmic tail of SLF plays a role in the transport and stability of SLF, its localization within the cell, and/or signaling between SLF and Kit-expressing cells. Sl^pan and Sl^con are two additional alleles that affect gametogenesis in the adult. Both mutations are the result of DNA rearrangements upstream of the coding sequence of SLF, affecting the levels of SLF mRNA in a tissue-specific manner. In particular, decreased SLF mRNA expression in females causes sterility by affecting ovarian follicle development. These mice display only mild anemia and partial coat pigmentation, demonstrating that these mutations produce only limited impairment of SLF function.

Although the biology of SLF is not yet fully understood, the analysis of mouse mutants disrupted at the Sl locus has provided valuable insight into the function of this important growth factor. Further investigation into the distinct roles of membrane-bound versus soluble SLF and the function of the cytoplasmic tail of SLF is still required. Clinical trials involving the use of SLF as a therapeutic agent are also in progress.
See also: W (White Spotting) Locus


Stem Cells 


See: Embryonic Stem Cells 
Steroids are a diverse group of compounds, mainly though not invariably water-insoluble, that play a major role in physiology as constituents of membranes, as emulsifying agents during the digestive process, and as hormones. From the chemical perspective, the naturally occurring steroids are considered to be derivatives of a hydrocarbon, cyclopentanoperhydrophenanthrene (C17H28). This compound, a product of synthetic organic chemistry, contains three cyclohexane rings and one cyclopentane ring, all fused together to form a puckered structure that is usually referred to as the steroid nucleus. The naturally occurring steroids have various substituents (alkyl, hydroxyl, aldehyde, ketone, carboxylic acid) attached to the four-ring nucleus. Frequently there are one or more double bonds within the steroid nucleus. Biosynthetically, steroids are formed from five-carbon precursors that are chemically related to the hydrocarbon isoprene, in an enzyme-catalyzed sequence of reactions that, in schematic terms, proceeds as follows: C5 → C10 → C15 → C30. The C30 compound, lanosterol, is the precursor to all of the other steroids, including cholesterol (C27), one of the most abundant members of this category of metabolite. Cholesterol, in addition to its role as a modulator of membrane fluidity, is the precursor of the bile acids (required for efficient digestion of lipids), vitamin D (required for calcium and phosphate metabolism), and the steroid hormones (required for mineral metabolism, blood pressure regulation, reproduction, and manifestation of secondary sexual characteristics). The excessive deposition of cholesterol within the cardiovascular system is one characteristic of atherosclerosis.
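
The schematic carbon counts above can be written out as simple bookkeeping. This is a sketch at the level of carbon atoms only, assuming (as is standard, though not spelled out in this entry) that two C15 units condense to give the C30 skeleton that is then cyclized to lanosterol, and that the conversion of lanosterol to cholesterol removes three carbon atoms.

```latex
\begin{aligned}
\mathrm{C_5} + \mathrm{C_5} &\rightarrow \mathrm{C_{10}} \\
\mathrm{C_{10}} + \mathrm{C_5} &\rightarrow \mathrm{C_{15}} \\
\mathrm{C_{15}} + \mathrm{C_{15}} &\rightarrow \mathrm{C_{30}} \quad (\text{cyclized to lanosterol}) \\
\mathrm{C_{30}} - 3\,\mathrm{C} &\rightarrow \mathrm{C_{27}} \quad (\text{cholesterol})
\end{aligned}
```

In other words, the C30 precursor corresponds to six five-carbon units (6 × 5 = 30).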


See also: Familial Hypercholesterolemia 
Sticky ends (cohesive ends) are short stretches of single strands of DNA that protrude from the ends of duplex DNA, typically generated by staggered cuts in double-stranded DNA, e.g., by restriction endonucleases. Complementary sticky ends can anneal or hybridize to one another and can be joined by DNA ligase, often to create a recombinant molecule.
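
A minimal Python sketch of how a staggered cut produces a sticky end, assuming a palindromic recognition site cut at the mirror-image position on both strands, as for many type II restriction enzymes. EcoRI (G^AATTC) and PstI (CTGCA^G) serve as familiar examples; the helper names are hypothetical.

```python
from typing import Tuple

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence written 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def sticky_end(site: str, cut: int) -> Tuple[str, str]:
    """Overhang left when both strands of a palindromic site are cut.

    `cut` is the number of top-strand (5'->3') bases to the left of the cut.
    Because the site is palindromic, the bottom strand is cut at the mirror
    position, len(site) - cut, in top-strand coordinates. Returns the
    single-stranded region (as read on the top strand) and its polarity.
    """
    if site != reverse_complement(site):
        raise ValueError("this sketch only handles palindromic sites")
    mirror = len(site) - cut
    if cut < mirror:
        return site[cut:mirror], "5'"
    if cut > mirror:
        return site[mirror:cut], "3'"
    return "", "blunt"


def ends_anneal(overhang_a: str, overhang_b: str) -> bool:
    """Protruding strands of the same polarity can pair if one is the
    reverse complement of the other."""
    return overhang_a == reverse_complement(overhang_b)


print(sticky_end("GAATTC", 1))      # ('AATT', "5'")  EcoRI, G^AATTC
print(sticky_end("CTGCAG", 5))      # ('TGCA', "3'")  PstI, CTGCA^G
print(ends_anneal("AATT", "AATT"))  # True: two EcoRI ends are complementary
```

Because the EcoRI overhang AATT is its own reverse complement, any two EcoRI-cut ends can anneal, which is why fragments cut by the same enzyme are readily ligated together.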


See also: Ligation; Restriction Endonuclease 
The term strain is often used by microbiologists to
indicate a natural isolate of a particular species. 
Geneticists, including microbial geneticists, also use 
the term to indicate a group within a species that has a 
distinctive genetic trait. For diploid organisms, fre- 
quently such a group corresponds to a true-breeding 
line that is homozygous for genes that contribute to 
the trait. For instance, experimental geneticists refer to 
their stocks of true-breeding genotypes, or inbred 
lines, as strains. 

In microbial genetics, a particular strain may 
have any number of genetic differences compared to 
other members of the species. However, in animal 
genetics the term variety is often used to characterize 
a group within a species that has several distinctive 
traits. 


See also: Inbred Strain; Wild-Type (WT) 


Strain Distribution Pattern 
(SDP) 
Strain distribution pattern is the distribution of two segregating alleles at a single locus across a group of animal samples used for analysis in a linkage study. An SDP is used in the context of backcross data and data obtained from recombinant inbred (RI) strains.
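
A minimal Python sketch of how an SDP can be written down and compared between two loci in a panel of recombinant inbred strains. The strain names, the allele labels "B" and "D" for the two progenitor strains, and the genotypes are all hypothetical. In RI data, a strain carrying different progenitor alleles at two loci marks a recombination between them, so closely linked loci tend to have nearly identical SDPs.

```python
from typing import Dict, List

# Hypothetical RI panel; strain names are illustrative only.
STRAINS = ["RI-1", "RI-2", "RI-3", "RI-4", "RI-5", "RI-6", "RI-7", "RI-8"]


def strain_distribution_pattern(genotypes: Dict[str, str]) -> str:
    """Concatenate each strain's allele ("B" or "D") in a fixed strain order."""
    return "".join(genotypes[s] for s in STRAINS)


def discordant_strains(sdp_a: str, sdp_b: str) -> List[str]:
    """Strains whose progenitor allele differs between two loci."""
    if len(sdp_a) != len(sdp_b):
        raise ValueError("SDPs must cover the same strains")
    return [s for s, a, b in zip(STRAINS, sdp_a, sdp_b) if a != b]


marker = dict(zip(STRAINS, "BDDBBDBD"))
marker_sdp = strain_distribution_pattern(marker)
print(marker_sdp)                                   # BDDBBDBD
print(discordant_strains(marker_sdp, "BDDBBDDD"))   # ['RI-7']
```

Only one of eight strains is discordant in this made-up comparison, the kind of near-match that would suggest the two loci lie close together.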


See also: Backcross; Recombinant Inbred Strains 
Strand displacement is a mode of replication of some
viruses in which a new DNA strand grows by dis- 
placing the previous homologous strand of the duplex. 


See also: Replication 
Streptomyces is a gram-positive genus of filamentous
actinomycetes, a large group of spore-forming bac- 
teria that form branching filaments during growth. 
These filaments form a network called a mycelium. 
(Both the formation of spores and the pattern of 
mycelial growth may remind one of fungi, but the 
actinomycetes are prokaryotes.) The streptomycetes 
are primarily found in soil. 

The streptomycetes have a high G+C base composition (70-74%). The typical prokaryote has a single circular double-stranded DNA molecule as its chromosome, whereas, as a general rule, in Streptomyces the chromosome is linear. In most Streptomyces species, the chromosomes are about 8 megabase pairs, which is rather large compared with other prokaryotic chromosomes. Many species also contain linear plasmids, which range from about 10 to as many as 1000 kilobase pairs, and circular plasmids are also present. The ends of the linear chromosomes and linear plasmids contain an inverted repeat, and each DNA strand has a protein covalently attached to its 5' end. This protein serves as a primer in DNA replication.
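
G+C base composition is the fraction of G and C among all bases in a sequence. A minimal Python sketch, using a hypothetical input; on this scale, the range given above for streptomycetes corresponds to values of about 0.70 to 0.74.

```python
def gc_content(seq: str) -> float:
    """Fraction of G and C among the A, C, G, and T bases in `seq`.

    Other characters (e.g., N for an unknown base) are ignored.
    """
    bases = [b for b in seq.upper() if b in "ACGT"]
    if not bases:
        raise ValueError("sequence contains no A, C, G, or T")
    return sum(b in "GC" for b in bases) / len(bases)


print(gc_content("GGCGCCATGC"))  # 0.8
```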

There are a large number of genetic tools available 
in Streptomyces. Several of the plasmids in Strepto- 
myces are conjugative and many of these can also 
mobilize chromosomal genes. Generalized transduc- 
tion has also been used to map genes. A number of 
cloning vectors, including shuttle vectors, are also 
available. Indeed, most of the techniques of molecular 
genetics, including transposon mutagenesis, can be 
used. 

Many members of the genus Streptomyces produce a very large number of secondary metabolites, often as part of the complex pathway leading to sporulation. These secondary metabolites include many useful antibiotics. For instance, S. aureofaciens produces tetracycline, S. clavuligerus produces cephalosporins and clavulanic acid, S. erythreus produces erythromycin, S. fradiae produces neomycin, S. griseus produces streptomycin, S. noursei produces nystatin, and S. venezuelae produces chloramphenicol. Because of the medical importance of many of the antibiotics produced by these organisms, there has been a considerable amount of research on the genetics of antibiotic production.

The genes responsible for antibiotic biosynthesis 
are located in clusters, which also include genes 
responsible for resistance to and transport of the 
antibiotic. Some of these clusters are found on the 
chromosome, while others are carried by a plasmid. 
Because of the relatedness of some of the antibiotic- 
producing genes in fungi and in Streptomyces it has 
been argued that these are examples of horizontal gene 
transfer which took place between these soil micro- 
organisms. 


See also: Conjugation, Bacterial; Spores; 
Streptomycin; Transduction 
Streptomycin is an antibiotic of the aminoglycoside group obtained from certain strains of the bacterium Streptomyces griseus. The antibiotic binds to the 16S ribosomal RNA (rRNA) of bacterial-type ribosomes and inhibits protein synthesis. Streptomycin increases the frequency of errors in protein synthesis and has been an important tool in in vitro studies of the accuracy of protein synthesis.

The microbiologist Selman A. Waksman was 
awarded the Nobel Prize in Physiology or Medicine 
in 1952 for his discovery of streptomycin, the first 
antibiotic effective against tuberculosis. (It should 
be noted that Waksman also introduced the term 
‘antibiotic.’) Although no longer as widely used in 
the treatment of human infectious disease, strepto- 
mycin is still used in combination with other drugs 
to treat tuberculosis. Reasons for its more restricted 
clinical use include adverse side effects, such as pos- 
sible kidney damage and deafness, and streptomycin 
resistance. 

Streptomycin-resistant mutants of bacteria have long been known and intensively studied. The majority of mutations leading to streptomycin resistance in common bacteria are in rpsL, the gene encoding the ribosomal protein S12 (a protein in the small subunit of the ribosome), and in rrs, the gene encoding 16S ribosomal RNA (rRNA). These mutations are recessive. Since the fast-growing bacteria used as genetic models, such as Escherichia coli and Bacillus subtilis, have several copies of the genes encoding rRNA, most of the early work on streptomycin resistance involved mutations in rpsL (originally named str), which exists as a single copy. Mutations in the gene encoding 16S rRNA conferring streptomycin resistance were not uncovered until genetic techniques were available to manipulate these genes in vitro.
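
The copy-number argument can be made concrete with simple arithmetic. The Python sketch below assumes, for illustration only, that every rRNA gene copy is expressed equally, and uses seven copies for E. coli (its number of rrn operons) and a single copy for M. tuberculosis, as noted later in this entry. Because resistance is recessive, a cell in which most ribosomes still carry wild-type, streptomycin-sensitive 16S rRNA would be expected to remain sensitive.

```python
def mutant_rrna_fraction(mutant_copies: int, total_copies: int) -> float:
    """Expected fraction of ribosomes carrying a mutant 16S rRNA.

    Simplifying assumption: all rRNA gene copies contribute equally.
    """
    if total_copies < 1 or not 0 <= mutant_copies <= total_copies:
        raise ValueError("need total_copies >= 1 and 0 <= mutant_copies <= total_copies")
    return mutant_copies / total_copies


# One mutated rrs copy out of seven (E. coli) versus the sole copy (M. tuberculosis).
for organism, total in [("E. coli", 7), ("M. tuberculosis", 1)]:
    print(organism, round(mutant_rrna_fraction(1, total), 2))
# E. coli 0.14
# M. tuberculosis 1.0
```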

Ribosomal protein S12 interacts with a highly conserved structure formed by the 16S rRNA, where streptomycin binds. Apparently certain amino acid changes in S12 lead to an alteration or destabilization of this structure. This in turn affects the binding of streptomycin to the ribosome. Some of these mutations lead to streptomycin resistance, but some lead to streptomycin dependence. As mentioned above, streptomycin itself can increase errors in protein synthesis. Interestingly, some streptomycin-resistant mutants reduce the normal level of certain errors, i.e., the ribosomes in these mutants are hyperaccurate. They also have slower translation elongation rates. (Certain mutants in ribosomal protein S4, encoded by rpsD, are also streptomycin resistant and have hyperaccurate ribosomes.) Like streptomycin, these mutants have proved valuable in investigations of translational accuracy. Several different mutations in rpsL that can lead to streptomycin resistance are known in enteric bacteria, but these tend to be clustered in two regions of the protein: amino acid residues 41 to 45 and 87 to 93. Mutations in similar locations are known in other bacteria.

Mutations in the 16S rRNA-encoding gene, rrs, which confer resistance to streptomycin have been localized to the region near base 530 and to that near base 915. These regions are part of a putative ‘accuracy center’ of the ribosome. Mutations in the 915 region not only can lead to streptomycin resistance, but also to changes in translational accuracy. This region seems to be involved in proper selection of tRNA at the ribosomal A site.

Although the causative agent of tuberculosis, Mycobacterium tuberculosis, has a reasonably large genome (4.4 million bp), it has only a single copy of each rRNA gene. Therefore, in M. tuberculosis, resistance can arise by a mutation in the sole rrs gene (or the sole rpsL gene). One study found that about 10% of the resistant strains of M. tuberculosis isolated from patients have mutations in rrs, while 50% have mutations in rpsL. However, resistance can also arise by mechanisms other than modification of the target of streptomycin activity, which include altered uptake and modification of the antibiotic.

Although most antibiotics that act by inhibiting protein synthesis are bacteriostatic, streptomycin is bactericidal. It is not completely clear why streptomycin kills bacteria, rather than just stopping growth.


See also: Antibiotic Resistance; Antibiotic- 
Resistance Mutants; Resistance to Antibiotics, 
Genetics of; Ribosomal RNA (rRNA); Ribosomes; 
Streptomyces 
The bacterial stringent response refers to the many adjustments of gene expression and cell physiology attributable to the accumulation of the (p)ppGpp nucleotides, which are derivatives of GTP (or GDP) bearing pyrophosphoryl substituents on the ribose 3' hydroxyl. The intracellular level of (p)ppGpp is regulated by mechanisms that sense the availability of different nutrients such as amino acids, carbon sources, nitrogen sources, lipids, and phosphate. The best-understood nutrient limitation condition, mediated by the relA gene, involves amino acid deprivation.

The stringent response was first noticed as inhibi- 
tion of stable RNA accumulation occasioned by 
amino acid starvation in Escherichia coli. The ability 
of mutants of a single locus to abolish this wild-type 
‘stringent’ RNA control phenotype led to calling the 
mutant behavior a ‘relaxed response’ and the mutant 
gene relA. Similar mutant phenotypes are widespread 
among bacteria distantly related to E. coli. In addition 
to RNA control, many other processes are affected by 
the stringent response as judged by differential nega- 
tive or positive mutant effects on regulatory behavior. 
Negative effects are seen for activities whose functions 
are presumably superfluous during starvation condi- 
tions, such as the synthesis of ribosomes, ribosomal 
RNA, and transfer RNA. Among functions that 
can be induced by (p)ppGpp synthesis are synthesis 
and transport of specific amino acids, accumulation 
of glycogen and polyphosphate, and induction of 
the RpoS sigma factor governing stationary phase- 
specific gene expression. Many of the regulatory 
outcomes of the stringent response can be viewed as 
enhancing survival and adaptation to nutritional 
stress. 

Using ATP as a pyrophosphate donor to GTP (or GDP) acceptor substrates, the RelA protein catalyzes (p)ppGpp synthesis on ribosomes. The reaction requires that ribosomes be stalled during translation of mRNA for lack of a bound, codon-specified, charged (aminoacylated) tRNA. Catalysis is activated by uncharged cognate tRNA binding to the otherwise vacant ribosomal acceptor site. Predictions that (p)ppGpp synthesis is activated by increased ratios of uncharged to charged tRNA whenever rates of tRNA aminoacylation fail to keep up with the demands of protein synthesis have been verified with aminoacyl-tRNA synthetase mutants and in experiments where tRNA levels are artificially varied. A causal role for the (p)ppGpp nucleotides in the stringent response can be demonstrated with engineered gene constructs that allow manipulation of (p)ppGpp abundance at will in cells that are not nutritionally stressed. Cells with an artificially elevated level of (p)ppGpp mimic many of the major regulatory effects seen during a stringent response provoked by amino acid starvation.

Regulation of (p)ppGpp levels in response to depriv- 
ation of nutrients other than amino acids occurs in 
strains deleted for relA. Despite the absence of relA 
function, these starvation protocols elicit responses 
that share features of the classical stringent response 
to amino acid limitation. This second source of 
(p)ppGpp synthesis, in E. coli, is a gene called spoT 
that encodes a single bifunctional protein having 
weak (p)ppGpp synthetic activity as well as a specific 
(p)ppGpp 3’-pyrophosphoryl hydrolase. Although the 
SpoT protein sequence shows broad homology with 
the RelA protein, SpoT is not ribosome associated. The 
regulation of (p)ppGpp accumulation generally 
involves inhibition of degradation rather than stimula- 
tion of synthesis. The best-studied example (carbon 
source starvation) leads to (p)ppGpp accumulation 
through severe inhibition of (p)ppGpp hydrolysis. 
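
That (p)ppGpp can accumulate through inhibition of hydrolysis, with no increase in synthesis, can be illustrated with a deliberately simple first-order model. The model is an assumption of this sketch, not a description from the source: synthesis at a constant rate s and hydrolysis proportional to the (p)ppGpp level P with rate constant k, so dP/dt = s - kP and the steady state is P* = s/k. Parameter values are arbitrary, and the function names are hypothetical.

```python
def steady_state(synthesis_rate: float, hydrolysis_constant: float) -> float:
    """Steady state of dP/dt = s - k*P, which is P* = s / k."""
    if hydrolysis_constant <= 0:
        raise ValueError("hydrolysis_constant must be positive")
    return synthesis_rate / hydrolysis_constant


def simulate(synthesis_rate: float, hydrolysis_constant: float,
             p0: float = 0.0, dt: float = 0.01, steps: int = 20000) -> float:
    """Forward-Euler integration of the same model from level p0."""
    p = p0
    for _ in range(steps):
        p += dt * (synthesis_rate - hydrolysis_constant * p)
    return p


s = 1.0                            # arbitrary units; synthesis held constant
print(steady_state(s, 1.0))        # 1.0  (hydrolysis active)
print(steady_state(s, 0.1))        # 10.0 (hydrolysis inhibited tenfold)
print(round(simulate(s, 0.1), 3))  # converges to about 10.0
```

In this toy model a tenfold inhibition of hydrolysis alone raises the steady-state level tenfold, matching the qualitative point that accumulation can occur without any stimulation of synthesis.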

Deleting both the relA and spoT genes of E. coli abolishes detectable (p)ppGpp. Such (p)ppGpp^0 strains appear nearly normal as long as abundant nutrients are provided. However, (p)ppGpp^0 strains fail to grow on otherwise supportive glucose + salts minimal media unless several amino acids are provided. The corresponding biosynthetic pathways are deduced to be (p)ppGpp-dependent. Survival of (p)ppGpp^0 strains is also impaired by nutrient starvation, revealing a protective effect of (p)ppGpp during the stringent response. Although extragenic suppressors of these (p)ppGpp^0 phenotypes map exclusively to genes specifying subunits of the RNA polymerase, the mechanism by which (p)ppGpp inhibits transcription in vitro remains elusive.

The stringent response appears to be confined to Eubacteria, where specialized roles for (p)ppGpp range from those found in E. coli to those contributing to pathogenesis (Legionella pneumophila), acid resistance (Lactococcus lactis), adaptive catabolism (Pseudomonas putida), antibiotic production (Streptomyces coelicolor), and quorum sensing for fruiting body development (Myxococcus xanthus). In contrast to most Eubacteria, the genomes of some intracellular parasitic bacteria lack genes with Rel/Spo homology; examples are Rickettsia prowazekii, Treponema pallidum, and Chlamydia trachomatis.


Further Reading 

Cashel M, Gentry DM, Hernandez VJ and Vinella D (1996) The stringent response. In: Neidhardt FC et al. (eds) Escherichia coli and Salmonella: Cellular and Molecular Biology, 2nd edn, pp. 1458-1496. Washington, DC: ASM Press.


See also: Gene Expression; GTP (Guanosine 
Triphosphate) 
A structural gene is any gene coding for a product
(e.g., enzyme, structural protein, tRNA), i.e., any 
product other than a regulator. 


See also: Housekeeping Gene 
In many types of cells, in diverse species, RNAs are
localized to specific cytoplasmic domains. Subcellular 
RNA localization contributes to the creation of cel- 
lular asymmetry by creating spatially unique domains 
within cells. In some cases, localization of mRNA is 
coupled to its translational activation, so that only 
localized transcripts are translated. Together, subcel- 
lular RNA localization and localization-dependent 
translation serve to restrict protein products to 
specific cellular domains. In recent years considerable 
advances have been made in understanding the bio- 
logical functions served by subcellular RNA localiza- 
tion, and the mechanisms behind this localization. 


Biological Functions of RNA Localization 


Embryonic Patterning 

Striking examples of RNA localization occur in oocytes. Many maternally encoded localized RNAs play key roles in embryonic development. In Xenopus, several RNAs are localized to the animal or vegetal poles of the oocyte, where their protein products function in axial embryonic patterning. Examples include Vg1 mRNA, which encodes a TGF-β-like growth factor, and VegT mRNA, which encodes a T-box transcription factor. Xlsirts, small noncoding RNAs, are also localized to the vegetal pole of the oocyte, where they are required for correct localization of Vg1 mRNA. In Drosophila, localization of RNAs in the developing oocyte is a key step in setting up both the anterior-posterior and dorsal-ventral axes of the egg and subsequently the embryo. Examples include bicoid (bcd) mRNA, which is localized to the anterior pole where the Bcd protein, a homeobox-family transcription factor, initiates head and thorax development, and nanos (nos) mRNA, which is localized to the posterior pole where Nos protein acts as a translational regulator and is essential for abdomen formation.


Binary Cell Fates 

Localization of mRNA is an efficient means of parti- 
tioning a factor to one of two daughter cells born from 
a single cell division. A striking example occurs in the 
budding yeast Saccharomyces cerevisiae, where localization of ASH1 mRNA to the daughter cell, and its
exclusion from the mother cell, contributes to the 
determination of mating type. In the developing Dros- 
ophila nervous system, asymmetric cell divisions of 
neuroblasts produce GMCs (ganglion mother cells), 
the precursors to neurons and glia. Localization of 
Prospero, a transcription factor required for GMC 
fate, is enhanced by localization of prospero mRNA 
to the GMC daughter. 


Germ Cell Fate 

In organisms with distinct germline and soma, the 
decision to be a germ or somatic cell is usually made 
early in development. Germ cell fate is often accom- 
plished by sequestering a specialized maternal egg 
cytoplasm, the germ plasm. In Drosophila, the germ 
plasm is assembled by the action of Oskar (Osk) pro- 
tein, which is localized at the Drosophila posterior pole 
by localization of osk mRNA. Two noncoding RNAs 
are also localized to the Drosophila germ plasm: mito- 
chondrial large ribosomal RNA (mtlrRNA), which is 
required for the formation of germ cells, and polar 
granule component (PGC) RNA, required for germ 
cell development. 


Somatic Cells

RNAs are localized in a variety of somatic cells. Examples include β-actin mRNA and myelin basic protein (MBP) mRNA. β-actin mRNA is localized to the leading edges of the lamellipodia of chick fibroblasts. This localization is required for the distinct
polarity of this cell type. Localization of MBP mRNA 
in the processes of oligodendrocytes targets MBP, a 
protein essential for myelination of the nervous sys- 
tem, to the myelin compartment of these cells. 
Extracellular signals stimulate RNA localization. 
Focal adhesion complexes (FACs) are formed in res- 
ponse to signals that arise when integrin receptors 
bind to the extracellular matrix (ECM). These signal- 
ing events induce the recruitment of mRNA to FACs. 
In neurons, large RNA transport granules carry RNAs
to dendritic domains, to proximal axons, and to
axonal growth cones in developing neurites.
Sorting of RNA granules into dendrites is responsive 
to extracellular signals, including neurotrophic fac- 
tors, and may contribute to neuronal plasticity. Neural 
activity modulates the expression and localization 
of mRNAs in some neurons. For instance, Arc
(activity-regulated cytoskeleton-associated protein)
mRNA is sorted to dendrites soon after its expression
in response to electrical stimulation. Synaptic
activation of specific regions of the brain results in
accumulation of Arc transcripts in the synaptically
activated dendrites.


Mechanisms of RNA Localization 


RNA localization pathways require the transport of 
RNA through cells and its stable attachment to struc- 
tures at a final cellular destination. These events utilize 
cis-acting RNA elements, proteins that bind these 
elements, and cellular structures and regulatory pro- 
teins associated with these structures. 


Cis-Acting RNA Localization Signals 

Within localized RNAs, cis-acting signals direct the 
transport and docking of RNAs at their proper cellu- 
lar addresses. In general these localization elements 
lie in the 3’ untranslated regions (UTRs) of RNAs, 
although signals for localization have also been iden- 
tified in 5’ UTRs. Some RNA localization signals are 
modular, with separable elements mediating distinct 
steps in RNA localization. For instance, distinct sig- 
nals in the 3’ UTR of MBP mRNA mediate RNA 
transport and RNA anchoring in the cell processes. 
Localization signals usually lie in regions with intri- 
cate secondary RNA structures. In several cases, such 
as Xenopus Vg1 mRNA and Drosophila bcd mRNA,
redundant elements are dispersed over a larger RNA 
segment required for localization. 


The Role of the Cytoskeleton and Other
Cellular Organelles

In many cases, localized RNAs are transported as 
large granules, visible by light microscopy. The move- 
ment of these RNA granules, identified with fluores- 
cently labeled RNAs or green fluorescent protein 
(GFP)-tagged proteins, has been studied in living 
cells by time-lapse video microscopy. Both the actin 
and microtubule cytoskeletons play important roles in 
RNA localization. These cytoskeletal elements pro- 
vide scaffolds for directional transport of RNAs 
through the cytoplasm, and structural components of 
anchors at the site of localization. For instance, in 
Xenopus oocytes, microtubules and actin filaments 
perform distinct temporal and spatial roles in
localizing Vg1 mRNA: microtubules mediate transport of
the RNA through the cell, and actin filaments are 
required for anchoring the RNA at the vegetal cortex. 
The endoplasmic reticulum (ER) also contributes to 
RNA localization in Xenopus oocytes. 

Motor proteins contribute to these events. In
budding yeast, localization of ASH1 mRNA to the
daughter cell is actin-dependent and requires the
SHE1 type V myosin. In oligodendrocytes, MBP mRNA
forms RNP granules that can be visualized in
cultured cells. The transport of these
RNA granules along microtubules to the cell pro- 
cesses requires kinesin. In Drosophila, two micro- 
tubule motor proteins mediate sorting of maternal 
mRNAs along a polarized microtubule cytoskeleton 
to their destinations in the oocyte. Kinesin I is required 
to localize oskar mRNA to the posterior pole of the 
oocyte, whereas cytoplasmic dynein is implicated 
in the localization of bicoid mRNA to the anterior 
oocyte pole. The actin cytoskeleton also plays a role in 
anchoring oskar mRNA at the posterior pole of the 
Drosophila oocyte. 


Proteins that Bind RNA Localization 
Elements 

RNA-binding proteins constitute one important com- 
ponent of the large RNA/protein complexes that 
transport localized RNAs. In some cases the same 
RNA-binding protein is used to localize RNAs in 
different species. ZBP-1 (zipcode-binding protein-1) 
is an RNA-binding protein identified in chicken 
fibroblasts that binds to the RNA localization signal 
of β-actin mRNA. The homologous protein in Xenopus,
Vera (also known as Vg1 RBP, for Vg1 RNA-binding
protein), binds to Vg1 mRNA localization signals
in oocytes. Some RNA-binding proteins mediate 
RNA localization in different types of cells within a 
species. Thus Staufen, a protein that binds double- 
stranded RNA (dsRNA), was identified for its role 
in localizing maternal RNAs in the Drosophila oocyte,
and is also required for localizing prospero mRNA in
dividing embryonic neuroblasts. Proteins that cycle in 
and out of the nucleus may function in very early steps 
in RNA localization pathways. Several heterogeneous 
nuclear RNP (hnRNP) proteins have been implicated 
in cytoplasmic RNA localization in Xenopus oocytes, 
Drosophila embryos, and mammalian oligodendro- 
cytes. 


Localization of RNA by Degradation 

A final mechanism that yields spatial localization of 
RNAs is selective degradation of RNA coupled to 
protection in specialized regions of the cytoplasm. 
This type of localization mechanism is exemplified 
by Hsp83 mRNA in Drosophila embryos. In young 
embryos, maternally loaded Hsp83 mRNA is distri- 
buted uniformly. However, following egg activation 
there is degradation of the mRNA throughout the 
cytoplasm, except at the posterior pole, where it is 
protected. This selective degradation yields an embryo 
with posteriorly localized Hsp83 mRNA. 
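As a toy illustration of how uniform degradation plus local protection can produce a localized pattern, the Python sketch below divides the embryo into ten equal regions along the anterior-posterior axis and lets the mRNA decay by a fixed fraction per time step everywhere except the most posterior region. The region count, decay rate, number of steps, and complete protection at the pole are all arbitrary assumptions chosen for illustration, not measured values for Hsp83.

```python
def selective_degradation(n_regions=10, protected=1, decay_per_step=0.5, steps=10):
    """Toy model: uniform mRNA decays everywhere except the posterior-most regions.

    Region 0 is anterior; the last `protected` regions stand in for the posterior pole.
    Each step, unprotected regions lose the fraction `decay_per_step` (first-order decay).
    """
    levels = [1.0] * n_regions  # uniform maternal mRNA at the start
    first_protected = n_regions - protected
    for _ in range(steps):
        levels = [level if i >= first_protected else level * (1.0 - decay_per_step)
                  for i, level in enumerate(levels)]
    return levels


levels = selective_degradation()
posterior_share = levels[-1] / sum(levels)
print([round(level, 4) for level in levels])
print(f"fraction of remaining mRNA at the posterior pole: {posterior_share:.1%}")
```

No mRNA moves in this model; the posterior enrichment comes entirely from differential stability, which is the essence of localization by degradation.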




See also: Cell Lineage; Messenger RNA (mRNA); 
Xenopus laevis 


Subcloning 


I Schildkraut 


Copyright © 2001 Academic Press 
doi: 10.1006/rwgn.2001.1252 


Subcloning is the process of dividing a large DNA
fragment carried in a vector into smaller, more
manageable DNA fragments, each carried independently
in its own vector. DNA cloning often results in large
DNA fragments that encode more than one gene. These
large DNA fragments are subcloned in order to
determine their DNA sequence, to study the effects of
single genes, or to overexpress specific gene products.
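The first step of subcloning is usually cutting the large insert into pieces. The Python sketch below assumes a restriction enzyme is used for this and takes EcoRI (recognition site GAATTC, cut after the G) purely as an example, since the entry does not name an enzyme. It tracks only the top strand, ignoring the complementary strand and the sticky ends it forms, and the insert sequence is made up.

```python
# Hypothetical example: EcoRI is used only as an illustration; the entry
# does not name an enzyme. EcoRI recognizes GAATTC and cuts G^AATTC.
ECORI_SITE = "GAATTC"
ECORI_CUT_OFFSET = 1  # cut falls 1 base into the site on the top strand


def digest(sequence, site=ECORI_SITE, cut_offset=ECORI_CUT_OFFSET):
    """Return top-strand fragments produced by cutting at every occurrence of `site`."""
    fragments = []
    start = 0
    pos = sequence.find(site)
    while pos != -1:
        cut = pos + cut_offset
        fragments.append(sequence[start:cut])
        start = cut
        pos = sequence.find(site, pos + 1)
    fragments.append(sequence[start:])  # everything after the last cut
    return fragments


# A made-up 24-bp insert containing two EcoRI sites.
insert = "AAAGAATTCCCCGGGGAATTCTTT"
for fragment in digest(insert):
    print(fragment)
```

In practice each fragment would then be ligated into its own vector cut with the same enzyme; that ligation step is not modeled here.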


See also: DNA Cloning; Vectors 


Substitution Mutations 


See: Base Substitution Mutations; Gene 
Substitution; Mutagens 


Superinfection


Superinfection is the productive entry of a phage into a
cell that is already infected with another phage. Infec- 
tion of bacteria by certain phages interferes with the 
ability of certain other phages to reproduce or to con- 
tribute to the genetic composition of the progeny. 

Several different types of superinfection immunity 
are observed. 

First of all, cells carrying prophages are immune to 
superinfection by other phages that use the same 
repressor, since that repressor is already present in 
the cell in high enough quantity to prevent the lytic 
mode of infection for the incoming phage. Thus
lambda prophages, for example, protect their hosts
against lytic infection by other lambdoid phages of
the same immunity group. Even though HK022 does not
belong to the same immunity group as lambda, it
protects against superinfection by lambda in another
way. As discussed in the article on antitermination
(see Antitermination Factors), HK022 makes the Nun
protein, a very special version of an N protein
that, rather than being involved in its own
antitermination process, directly interferes with the
ability of the N protein of lambda and certain related
phages to carry out their own antitermination. This
alternative mechanism makes cells carrying HK022
immune to infection by that group of lambdoid phages.

The lambda rexA and rexB genes render cells carrying
lambda prophages immune to infection by a wide
variety of other phages. In this case, the immunity is
suicidal; the cell’s membrane potential breaks down
some time after infection by the superinfecting phage.
The T4 rIIA and rIIB genes are able to overcome this
effect of the lambda rex genes. It is this phenomenon
that renders rII mutants unable to grow on lambda
lysogens, a factor that was crucial to the elaborate
fine-structure genetics work of Seymour Benzer,
which established that the unit of recombination is
the individual nucleotide (see T Phages).

Cells infected with bacteriophage T4 are immune 
not only to infection by most other phages but also to
other T4 phage arriving more than a few minutes
after the T4. This immunity against late-arriving T4 
is the consequence of a membrane protein encoded by 
the imm gene that helps block entry of the newly 
arriving phage DNA into the cell; it is related also to 
a process called ‘superinfection exclusion.’ The precise 
mechanism is not understood. The DNA of the phage 
attempting superinfection remains in the periplasmic 
space and is largely degraded. In the case of non- 
T-even phages attempting to superinfect a cell infected 
by a T-even phage, transcription is inhibited by the Alc 
protein, which blocks transcription of all cytosine- 
containing DNA, as discussed in the entry on 
T phages, and the incoming phage DNA is subject to 
degradation by endonucleases II and IV, produced by 
T-even phage to degrade the host DNA. 

Phage P22 prophages have yet other mechanisms 
of producing immunity to superinfecting phages. In 
addition to the repressors needed to maintain the 
prophage state, P22 prophages express several genes 
which are very effective in keeping out homologous 
and heterologous superinfecting phages. The product 
of gene sieA interferes with DNA injection by P22
and related phages. The product of gene sieB causes
the lytic cycle of certain other Salmonella phages
(not including P22 itself) to abort early in
infection. The product of gene a1 interferes with
adsorption by P22 and related phages by changing the
structure of the lysogen’s O antigen.

The variety of mechanisms found in the few phages
studied to date for engendering superinfection
exclusion and thus protecting resident phage from
external competition makes it very likely that many
more interesting mechanisms will be found as more
phages are studied from the vast pool present in our
biosphere.


See also: Bacteriophages; Prophage, Prophage 
Induction 


Superrepressor


Superrepressor refers to a mutant form of a repressor
protein that represses gene expression more efficiently 
than the wild-type. Repressor proteins are found in 
phage, bacteria, and eukaryotes and intervene in the 
control of gene expression at the level of transcription
or translation, such that the structure-function
mechanisms underlying the superrepressor phenotype 
can be quite varied. 

Superrepressors were first identified in the context 
of the study of the control of prokaryotic operons 
coding for metabolic enzymes, such as the lac
(lactose), hut (histidine), gal (galactose), put
(proline), scr (sucrose), and nag
(N-acetylglucosamine-6-phosphate) operons. The
repressor proteins of these operons act
by binding to specific DNA operator sequences 
found in or near the operon. RNA polymerase 
cannot bind or initiate transcription when the repres- 
sor is bound to the operator site. When the small 
molecule metabolites (inducers) of these operons 
(e.g., histidine in the case of the hut operon) are present
in the medium, they interact with high affinity with 
the repressor proteins, inducing a conformational 
change such that the repressor’s affinity for the opera- 
tor site decreases significantly, thereby decreasing 
operator occupancy and allowing transcription to 
take place. In this context, superrepressor mutants
typically carry amino acid substitutions that abolish
or at least significantly diminish the affinity of the
repressor proteins for their inducers. They are termed
uninducible.
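The induction logic above can be made quantitative with a deliberately simplified equilibrium sketch in Python. It assumes that inducer-bound repressor cannot bind the operator at all, that inducer and repressor are each in excess over their binding partners, and that all concentrations and dissociation constants are hypothetical round numbers rather than values for any real repressor. An uninducible superrepressor is modeled simply by raising the inducer dissociation constant.

```python
def operator_occupancy(repressor_nM, inducer_uM, kd_inducer_uM, kd_operator_nM):
    """Fraction of time the operator is bound by repressor (simplified equilibrium sketch).

    Assumptions: inducer is in large excess over repressor, inducer-bound repressor
    cannot bind the operator, and repressor is in excess over the single operator.
    """
    # Fraction of repressor molecules NOT carrying inducer (still able to bind DNA).
    fraction_active = 1.0 / (1.0 + inducer_uM / kd_inducer_uM)
    active_nM = repressor_nM * fraction_active
    # Simple hyperbolic binding of active repressor to the operator.
    return active_nM / (active_nM + kd_operator_nM)


# Hypothetical parameters chosen only for illustration.
REPRESSOR_NM = 10.0
KD_OPERATOR_NM = 0.1
WILD_TYPE_KD_INDUCER_UM = 1.0
SUPERREPRESSOR_KD_INDUCER_UM = 1.0e5  # inducer binding weakened 100,000-fold

for label, kd_inducer in [("wild type", WILD_TYPE_KD_INDUCER_UM),
                          ("superrepressor", SUPERREPRESSOR_KD_INDUCER_UM)]:
    for inducer in (0.0, 1000.0):
        occ = operator_occupancy(REPRESSOR_NM, inducer, kd_inducer, KD_OPERATOR_NM)
        print(f"{label:15s} inducer={inducer:7.1f} uM  operator occupancy={occ:.3f}")
```

With these made-up numbers, adding inducer drops wild-type operator occupancy below 10%, while the superrepressor keeps the operator occupied about 99% of the time, which is the uninducible phenotype.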

Another class of bacterial repressors for which 
superrepressors have been identified involves proteins 
that negatively regulate gene transcription in response 
to the binding of a corepressor molecule, typically the 
product or an intermediate in the biosynthetic pathway
in which the enzymes encoded by the operon in question
participate. Thus, the trp repressor (trpR), in
response to L-tryptophan, represses transcription of
genes whose products are enzymes involved in the
synthetic pathways for aromatic amino acids. Another
example
is the arg repressor. Strains bearing strong super- 
repressor mutations in these proteins are typically 
tryptophan (or arginine) auxotrophs, since the 
enzymes responsible for the biosynthesis of these 
amino acids are never produced. The mechanisms by 
which these mutations lead to the superrepressor 
phenotype can be quite diverse and include increased 
operator affinity in the absence of corepressor or only 
when bound by the corepressor. Increases in DNA 
affinity in general can also lead to superrepression 
since operator affinity increases accordingly. Super- 
repressor mutants with altered repressor oligomeriza- 
tion and protein-folding properties have also been 
identified. 

Certain repressors such as birA (the biotin
repressor) and putA also act as enzymes in the
metabolic pathways they control, catalyzing reactions
on the inducer or corepressor molecule. Superrepressor
mutants of these multifunctional proteins may exhibit 
altered catalytic properties, as well. 

Translational superrepressors have been found 
in the case of the R17/MS2 RNA phage coat protein. 
Their superrepressor activity appears to arise from
an increase in the size of the RNA site recognized,
leading to an increase in overall affinity. Other
superrepressor mutants are deficient in coat
assembly, which increases the amount of free coat
protein available for repression.

Eukaryotic superrepressors typically involve
mutations that keep a repressor protein in its
complex with a transcriptional activator.
Transcriptional activation by the yeast
transcriptional activator Gal4 is repressed through
its interaction with a repressor protein, Gal80. In
response to binding of galactose by Gal3 and
phosphorylation of Gal4, the Gal4/Gal80 complex is
destabilized in favor of a complex between Gal3 and
Gal80. Superrepressor forms of Gal80 are galactose
uninducible, and are deficient for interaction with
Gal3.

Another very interesting group of eukaryotic
superrepressors are the superrepressor mutants of
IκB, which forms a complex with NF-κB, a
transcriptional activator of immunoglobulin genes and
certain antiapoptotic genes. NF-κB is retained in the
cytoplasm when complexed with IκB. In response to a
variety of extracellular signals, IκB is phosphorylated
and eventually degraded, resulting in the transport of
NF-κB to the nucleus. Superrepressor forms of IκB
cannot be phosphorylated at these regulatory sites,
so they escape degradation and keep NF-κB in the
cytoplasm. Transfection of malignant cell lines with
IκB superrepressors can lead to dramatic decreases in
their abnormal proliferation rates.


See also: Repressor 


Mutations in genes can be detrimental when they
result in premature stop signals (nonsense errors), 
amino acid substitutions (missense errors), or shifts 
in the translational register (frameshift errors). Com- 
pensatory substitutions in translational components 
rescue these initial mutations by a mechanism known 
as genetic suppression. Many naturally occurring sup- 
pressors have been identified, and the most common 
are variants of transfer RNAs (tRNAs). These
so-called ‘suppressor tRNAs’ typically carry anticodon
substitutions that allow them to recognize the mutant
codons and insert an amino acid into the
growing polypeptide chain. Other suppressors are
variants of ribosomal RNA, ribosomal proteins, and 
termination factors. Interest in therapeutically induced 
suppression is high, given the numerous diseases 
caused by single nucleotide changes. 


Types of Translational Errors 


Mutations in genes can have a wide range of effects. 
The amino acid sequence of an encoded protein may 
not be changed at all, for example, if the resulting 
codon is read by an isoaccepting tRNA. Similarly, no 
phenotypic change is observed at the functional level 
if an amino acid change does not affect the structure or 
function of the encoded protein. In contrast, a single 
nucleotide substitution can have a drastic effect not 
only on the protein being produced, but also on cel- 
lular pathways that depend on the encoded protein. 

Three types of translational errors can result from 
genetic point mutations. A substitution that changes a 
codon’s specificity from one amino acid to another is a 
missense error. A substitution that changes an amino- 
acid-inserting (sense) codon to a stop codon is a non- 
sense error. Finally, insertion or deletion of one or 
more nucleotides can result in a translational frame- 
shift. Frameshifts result in a new amino acid sequence 
and often lead to nonsense errors when the ribosome 
encounters a premature stop codon in the new reading 
frame. 
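The three categories can be told apart mechanically by translating wild-type and mutant sequences side by side. The Python sketch below assumes the standard genetic code, a reading frame that starts at the first base, and made-up toy sequences; the helper names `translate` and `classify` are hypothetical and not from any library.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), codons enumerated in UCAG order.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(mrna):
    """Translate from the first base, stopping at the first stop codon ('*')."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


def classify(wild_type, mutant):
    """Classify a mutant reading frame relative to wild type (toy model)."""
    length_change = len(mutant) - len(wild_type)
    if length_change % 3 != 0:
        return "frameshift"
    if length_change != 0:
        return "in-frame insertion/deletion"
    wt_protein, mut_protein = translate(wild_type), translate(mutant)
    if mut_protein == wt_protein:
        return "silent"
    if len(mut_protein) < len(wt_protein):
        return "nonsense"  # a premature stop codon shortened the protein
    return "missense"


wild_type = "AUGCAGUGGAAAUAA"  # Met-Gln-Trp-Lys-stop (toy sequence)
for mutant in ("AUGCAAUGGAAAUAA",    # CAG -> CAA, still Gln
               "AUGUAGUGGAAAUAA",    # CAG -> UAG, premature stop
               "AUGCAGUGUAAAUAA",    # UGG -> UGU, Trp -> Cys
               "AUGCAGGUGGAAAUAA"):  # extra G inserted after codon 2
    print(mutant, classify(wild_type, mutant))
```

The first mutant is synonymous (CAA also encodes glutamine), so it is reported as silent even though the codon itself changed.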


Suppressor tRNAs 


A transfer RNA that is able to suppress genetic
mutations is the result of advantageous substitutions
in the tRNA gene. Such substitutions are typically
located in the anticodon of the tRNA, so that a
suppressor tRNA recognizes the mutated codon or
unintended frameshift rather than its cognate codon.
Suppressor tRNAs may be aminoacylated according to
the amino acid specificity of their parent sequence,
or they may be misacylated because of the anticodon
change (for example, the Escherichia coli su7 tRNA is
derived from tRNA^Trp but inserts glutamine). In
either case the suppressor tRNA inserts its attached
amino acid into the growing polypeptide chain at the
location of the mutation in the mRNA (Figure 1). As
long as the inserted amino acid is not detrimental to
the protein, the gene mutation is rescued. One example
of this most common type of suppression is the su2
suppressor tRNA in E. coli. This variant of tRNA^Gln
has a G to A substitution in its anticodon so that it
recognizes a UAG stop codon instead of its cognate
CAG codon. The su2 tRNA inserts glutamine in place of
the premature stop.

[Figure 1: Nonsense suppression. (A) Premature termination; (B) nonsense suppression, in which a suppressor tRNA rescues the mutation by inserting an amino acid at the position of the premature termination codon. Suppression generally occurs at low levels compared with wild-type production of the protein.]
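The su2 example can be traced in a small Python sketch. It writes anticodons 5' to 3', derives the codon each one reads by antiparallel Watson-Crick pairing (wobble pairing is ignored), and lets the suppressor tRNA insert glutamine at UAG. The toy mRNA is made up, and the model lets the suppressor win at every UAG, whereas real suppression competes with release factors and is far less efficient.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), codons enumerated in UCAG order.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

WATSON_CRICK = {"A": "U", "U": "A", "G": "C", "C": "G"}


def codon_read_by(anticodon):
    """Codon (5'->3') paired by an anticodon written 5'->3'.

    Pairing is antiparallel, so the codon is the reverse complement of the
    anticodon. Only Watson-Crick pairs are modeled (no wobble).
    """
    return "".join(WATSON_CRICK[base] for base in reversed(anticodon))


def translate(mrna, suppressors=None):
    """Translate from the first base until a stop codon that no suppressor reads.

    `suppressors` maps a stop codon to the amino acid a suppressor tRNA inserts
    there; in this toy model the suppressor always wins over release factors.
    """
    suppressors = suppressors or {}
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        codon = mrna[i:i + 3]
        amino_acid = CODON_TABLE[codon]
        if amino_acid == "*":
            if codon not in suppressors:
                break
            amino_acid = suppressors[codon]
        protein.append(amino_acid)
    return "".join(protein)


su2_anticodon = "CUA"  # wild-type tRNA(Gln) anticodon CUG with G -> A at its 3' end
su2 = {codon_read_by(su2_anticodon): "Q"}  # {'UAG': 'Q'}

nonsense_mutant = "AUGUAGUGGAAAUAA"  # toy gene with CAG -> UAG at codon 2
print(translate(nonsense_mutant))                   # M
print(translate(nonsense_mutant, suppressors=su2))  # MQWK
```

The same dictionary would also insert glutamine at any natural UAG stop codon, which is why selectivity matters for therapeutic suppressor tRNAs.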

While missense and nonsense suppression occurs 
primarily through single nucleotide substitutions in 
the anticodon, tRNAs that suppress +1 frameshifts 
sometimes contain an extra base in the anticodon. 
These suppressors read a four-nucleotide codon, 
thereby restoring the correct translational frame. An 
example is an E. coli suppressor tRNA that has a
four-base anticodon.
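Quadruplet decoding can be sketched the same way. In the Python example below, a hypothetical suppressor tRNA reads the four-nucleotide sequence GGGA and inserts glycine; both the quadruplet and the amino acid are assumptions chosen for illustration, not the specificity of any particular E. coli suppressor. A +1 insertion that creates GGGA throws translation out of frame, and consuming four bases at that site puts the ribosome back in the original frame.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), codons enumerated in UCAG order.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(mrna, quadruplet_trnas=None):
    """Translate from the first base; a frameshift suppressor may read a 4-nt codon.

    `quadruplet_trnas` maps a four-nucleotide codon to the amino acid inserted by a
    hypothetical suppressor tRNA with a four-base anticodon. In this toy model the
    suppressor always wins when its quadruplet is in frame.
    """
    quadruplet_trnas = quadruplet_trnas or {}
    protein = []
    i = 0
    while i + 3 <= len(mrna):
        quadruplet = mrna[i:i + 4]
        if len(quadruplet) == 4 and quadruplet in quadruplet_trnas:
            protein.append(quadruplet_trnas[quadruplet])
            i += 4  # four bases consumed: the frame advances by one extra base
            continue
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
        i += 3
    return "".join(protein)


wild_type = "AUGGGAUGGAAAUAA"  # Met-Gly-Trp-Lys-stop (toy sequence)
plus_one = "AUGGGGAUGGAAAUAA"  # +1 insertion of G creates GGGA at codon 2
suppressor = {"GGGA": "G"}     # hypothetical Gly-inserting quadruplet reader

print(translate(wild_type))             # MGWK
print(translate(plus_one))              # MGMEI (out of frame, runs off the end)
print(translate(plus_one, suppressor))  # MGWK (frame restored)
```

The model also shows a cost: any in-frame GGGA in another mRNA would be read as a single codon, shifting that message out of frame.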

Not all suppressor tRNAs contain anticodon
substitutions, however. The E. coli su9 nonsense
suppressor is a variant of tRNA^Trp that retains its
wild-type CCA anticodon but has a G to A change in its
D-arm. This substitution leads to a tRNA with
increased thermal stability that recognizes both its
cognate UGG codon and, through an unusual C·A pair at
the wobble position (the first anticodon position,
which pairs with the third codon position), the UGA
stop codon.

Several mechanisms combine to limit suppression- 
mediated amino acid insertion, which is typically 
between 5 and 50% of wild-type levels. Suppressor 
tRNAs are typically derived from minor isoacceptors, 
ensuring that translation of most sense codons is not 
reduced. Wild-type mRNAs often contain tandem 
stop codons, so even if one is suppressed the other will 
lead to termination. Finally, suppressor tRNAs must 
compete with termination factors for binding to stop 
codons. Together these features maintain the overall 
accuracy of protein synthesis. 
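Two of the limiting factors listed above, competition with release factors and tandem stop codons, combine in a few lines of arithmetic. The Python sketch assumes that the suppressor tRNA and the release factor compete with simple relative rates (the values below are hypothetical) and that each stop codon in a tandem pair is suppressed independently with the same efficiency.

```python
def suppression_efficiency(k_suppressor, k_release):
    """Chance a stop codon is decoded by the suppressor tRNA rather than a release factor.

    Assumes simple competition in which each factor wins in proportion to its rate.
    """
    return k_suppressor / (k_suppressor + k_release)


def full_length_fraction(efficiency, tandem_stops):
    """Fraction of ribosomes reading through every stop in a run of tandem stop codons,
    assuming each stop is suppressed independently with the same efficiency."""
    return efficiency ** tandem_stops


# Hypothetical relative rates: the release factor acts four times faster.
efficiency = suppression_efficiency(k_suppressor=1.0, k_release=4.0)
for n_stops in (1, 2):
    print(f"{n_stops} stop codon(s): {full_length_fraction(efficiency, n_stops):.0%} read through")
```

With a 20% efficiency, within the 5 to 50% range given above, only about 4% of ribosomes would read through two tandem stops, so most natural termination sites remain effective.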


Other Compensatory Changes Result in 
Suppression 


When the ribosome reaches a stop codon, translational 
release factors facilitate hydrolysis of the fully synthe- 
sized protein from the peptidyl-tRNA. In addition to 
genetic suppression through altered tRNAs, variants 
of release factors have also been identified that pro- 
duce suppression phenotypes. Mutations are primar- 
ily within the C-terminal regions of the release factors, 
and likely prevent sequence-dependent recognition of 
stop codons. 

Certain ribosomal proteins and regions of rRNA 
have long been implicated in translational accuracy 
control. For example, mutations in small subunit 
proteins S4, S5, and S12 lead to ribosomes that are
either hyperaccurate or error-prone. The errors made
by error-prone ribosomes include increased levels of
stop codon readthrough, frameshifting, and missense
errors. Codon-specific suppression variants have also
been identified within ribosomal components. A single
C to A substitution within the small subunit rRNA
(E. coli C1054A) results in UGA readthrough without
affecting other termination events or causing
missense or frameshift suppression. This mutation
decreases the binding affinity of release factor 2 for
the ribosome, resulting in the observed genetic
suppression.

Finally, suppression of genetic mutations can be the 
result of compensatory changes in partner molecules. 
Mitochondria translate a limited number of proteins 
from a small genome that also contains a complete 
set of tRNA genes. A nucleotide substitution in the 
acceptor stem of yeast mitochondrial tRNA^Asp was
shown recently to be suppressed by a variant of
aspartyl-tRNA synthetase (AspRS). The single amino
acid substitution in the nuclear-encoded AspRS
enzyme is in a region known to contact the acceptor
stem of its cognate tRNA^Asp. New contacts between
the variant tRNA^Asp and AspRS result in enhanced
aminoacylation efficiency and genetic suppression of
the tRNA defect.


Therapeutic Potential of Suppression 


Recent advances in gene sequencing have revealed that 
numerous human genetic diseases are the result of 
nonsense or missense mutations. For example, substi- 
tutions in the tumor suppressor p53 are reported to be 
responsible for as many as half of human cancers. 
These p53 substitutions typically result in missense 
mutations within a critical DNA-binding region. 
Likewise, about 5% of individuals with cystic fibro- 
sis carry a premature stop codon in the gene for 
cystic fibrosis transmembrane conductance regulator 
(CFTR). If even small amounts of functional protein 
could be produced in these cases, significant reduction 
in symptoms might be achieved. Many researchers 
are therefore actively working to develop suppres- 
sor tRNAs that could be used therapeutically. Several 
challenges to such gene therapy exist. As with all 
approaches to gene therapy, any suppressor tRNA 
must be transported into the affected cells. The sup- 
pressor tRNA must be transcribed and aminoacyl- 
ated at high levels. Finally, suppression must be 
selective for the target gene’s mRNA so authentic 
termination signals for other proteins are not read 
erroneously. 

The use of aminoglycoside antibiotics has also been 
proposed as a treatment for some genetic diseases. 
Aminoglycosides interact with the decoding center 
of the ribosome and decrease translational accuracy 
by allowing readthrough of stop codons. At high 
levels of antibiotic this decrease in accuracy complete- 
ly eliminates protein synthesis, while at low levels 
limited readthrough may provide enough of the defi- 
cient protein for a near-normal phenotype. Such an 
approach has shown potential in models of both cystic 
fibrosis and Duchenne muscular dystrophy, a disease 
in which 5-15% of patients carry a premature stop 
codon in the gene for dystrophin. 


Like all mutations, suppressor mutations are heritable
alterations in the sequence of the genetic material
of an organism. What is distinctive about a suppressor 
mutation is that it reverses the phenotypic change 
caused by a previously existing mutation, without 
actually reversing the original mutation itself. The con- 
tinued presence of the original mutation distinguishes 
a suppressor mutation from a true ‘reverse mutation.’ 
For this reason organisms containing the original mu- 
tation and the suppressor are sometimes referred to 
as ‘pseudorevertants’ to distinguish them from ‘true’ 
revertants. Although, in many instances, the pheno- 
type found in the new double mutant is not identical 
to the wild-type phenotype, it is sufficiently normal to 
allow the organism to function under selective
conditions. Because the suppressor mutations occur at
sites other than those of the original mutations they
suppress, they are also called ‘second-site mutations.’
(It must be emphasized that the suppressor mutation
in the absence of the original mutation is unlikely to
yield an organism with a wild-type phenotype.)
Typically suppressor mutations have been classi- 
fied as being of two types, depending on where in 
the genome they occur compared with the original 
mutation they are suppressing. If they occur in the 
same gene they are said to be intragenic suppressors. 
If they are in another gene, they are said to be 
intergenic suppressors. (The term ‘extragenic
suppressor’ is also often used to refer to a suppressor
mutation occurring in a gene other than the gene
containing the original mutation.) Intragenic
suppressors tend to restore the function of the gene
containing the original mutation, a situation
sometimes termed ‘direct suppression.’ Intergenic
suppressors can also be direct, but they may instead
allow the organism to bypass the original defect in
some other way. This latter situation is termed
‘indirect suppression.’ While some suppressors can
suppress only a specific mutation in a specific gene, 
others can suppress a number of different mutations 
(in one gene or related genes), and still others can 
suppress entire classes of mutations in many different 
genes. 


Intragenic Suppression 


If the suppressor mutation is within the same gene as
the original mutation, it is said to be an intragenic
suppressor. Intragenic suppression is direct in that the 
function of the originally mutated gene is restored. 

One type of such mutation is the intragenic frameshift
suppressor, whose discovery and characterization
helped elucidate the mechanism by which the genetic
code is read. Mutations of the bacteriophage T4 rII
genes induced by the acridine dye proflavin were found
to be of two types, microinsertions (+) and
microdeletions (−). It was discovered that some
revertants of these mutants actually contained two
mutations, the original rII mutation and an intragenic
suppressor. When the latter was isolated by
recombination, it was found to behave as a typical rII
mutant but had the opposite sign (+ or −) of the
original mutation. This indicated that, during
expression, the genetic code is read in a particular
frame and, while the removal (or addition) of a single
base pair caused a frameshift to a nonfunctional
state, a nearby addition (or removal) could restore
the correct reading frame and thereby the function of
the gene. Also, whereas a single mutation or two
mutations of the same sign (+ or −) within the gene
led to loss of function, three closely linked
mutations of the same sign led to a normal, or nearly
normal, phenotype. This gave evidence that the code is
read in groups of three bases. Note that this type of
suppressor is ‘direct’ in that the function of the
gene is (at least partially) restored.
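The logic of the T4 frameshift experiments can be replayed with the familiar ‘THE FAT CAT’ analogy, in which a message of three-letter words stands in for codons. The Python sketch is illustrative only: positions refer to the unmutated message, an inserted base is shown as X, and the message itself is made up rather than a real rII sequence.

```python
MESSAGE = "THEFATCATSAWTHEBIGDOG"  # made-up message of three-letter "codons"


def apply_mutations(seq, mutations):
    """Apply single-letter insertions ('+') and deletions ('-').

    Positions refer to the ORIGINAL sequence; mutations are applied from right
    to left so that earlier positions are not shifted by later edits.
    """
    for pos, sign in sorted(mutations, reverse=True):
        if sign == "+":
            seq = seq[:pos] + "X" + seq[pos:]  # insert an extra letter
        elif sign == "-":
            seq = seq[:pos] + seq[pos + 1:]    # remove one letter
        else:
            raise ValueError(f"unknown mutation sign: {sign!r}")
    return seq


def read_in_triplets(seq):
    """Read non-overlapping triplets from a fixed start, dropping leftover letters."""
    return [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]


def frame_restored(mutations):
    """True when the net shift (+1 per insertion, -1 per deletion) is a multiple of 3."""
    net_shift = sum(1 if sign == "+" else -1 for _, sign in mutations)
    return net_shift % 3 == 0


for mutations in ([(3, "+")],                       # single insertion
                  [(3, "+"), (10, "-")],            # insertion plus nearby deletion
                  [(3, "+"), (6, "+")],             # two insertions
                  [(3, "+"), (6, "+"), (9, "+")]):  # three insertions
    words = read_in_triplets(apply_mutations(MESSAGE, mutations))
    print(frame_restored(mutations), " ".join(words))
```

A single insertion scrambles every word after it, an insertion plus a nearby deletion restores the words downstream of the pair, two insertions do not, and three insertions do, mirroring the evidence for a triplet code.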

The other main type of intragenic suppressors are 
second site mutations within a gene that lead to an 
amino acid residue change in the protein product 
which compensates for a change brought about by 
the original mutation. Once again the restoration 
of the original phenotype may not be complete, but 
the doubly mutant protein has activity. Such mutants 
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Like all mutations, suppressor mutations are inherit- 
able alterations in the sequence of the genetic material 
of an organism. What is distinctive about a suppressor 
mutation is that it reverses the phenotypic change 
caused by a previously existing mutation, without 
actually reversing the original mutation itself. The con- 
tinued presence of the original mutation distinguishes 
a suppressor mutation from a true ‘reverse mutation.’ 
For this reason organisms containing the original mu- 
tation and the suppressor are sometimes referred to 
as ‘pseudorevertants’ to distinguish them from ‘true’ 
revertants. Although, in many instances, the pheno- 
type found in the new double mutant is not identical 
to the wild-type phenotype, it is sufficiently normal to 
allow the organism to function under selective con- 
ditions. Because the suppressor mutations occur at a 
sites other than that of the original mutations they 
suppress, they are also called ‘second-site mutations.’ 
(It must be emphasized that the suppressor mutation 
in the absence of the original mutation is unlikely to 
yield an organism with a wild- type phenotype.) 
Typically suppressor mutations have been classi- 
fied as being of two types, depending on where in 
the genome they occur compared with the original 
mutation they are suppressing. If they occur in the 
same gene they are said to be intragenic suppressors. 
If they are in another gene, they are said to be 
intergenic suppressors. (The term ‘extragenic suppressor’
is also often used to refer to a suppressor mutation
occurring in a gene other than the gene containing the
original mutation.) Intragenic suppressors tend to
restore the function of the gene containing the original
mutation, a situation sometimes termed ‘direct suppression.’
Although intergenic suppressors can also be
direct, they may instead allow the organism to bypass
the original defect. This latter situation is termed
‘indirect suppression.’ While some suppressors can
suppress only a specific mutation in a specific gene, 
others can suppress a number of different mutations 
(in one gene or related genes), and still others can 
suppress entire classes of mutations in many different 
genes. 


Intragenic Suppression 


If the suppressor mutation is within the same gene as
the original mutation, it is said to be an intragenic
suppressor. Intragenic suppression is direct in that the 
function of the originally mutated gene is restored. 

One type of such mutation is the intragenic frameshift
suppressor, whose discovery and characterization
helped elucidate the mechanism by which the
genetic code is read. Mutations of the bacteriophage
T4 rII genes induced by the acridine proflavin were
found to be of two types, microinsertions (+) and
microdeletions (−). It was discovered that some revertants
of these mutants actually contained two mutations,
the original rII mutation and an intragenic
suppressor. When the latter was isolated by recombination,
it was found to behave as a typical rII mutant
but had the opposite sign (+/−) of the original mutation.
This indicated that, during expression, the
genetic code was read in a particular frame and, 
while the removal (or addition) of a single base pair 
caused a frameshift to a nonfunctional state, the near- 
by addition (or removal) could restore the correct 
reading frame and thereby the function of the gene. 
Also, whereas a single mutation or two mutations of
the same sign (+ or −) within the gene led to loss of
function, it was observed that three closely linked
mutations of the same sign led to a normal, or nearly
normal, phenotype. This gave evidence that the code is
read in groups of three bases. Note that this type
of suppressor is ‘direct’ in that the function of the gene
is (at least partially) restored.
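The reading-frame argument can be checked with a toy model. This is a minimal sketch and not a model of the actual rII sequences. It assumes a hypothetical "gene" made of a repeated CAT codon and arbitrary mutation positions, and it treats each (+) as one added base and each (−) as one removed base. "Restored function" is crudely approximated as the codons downstream of the mutations matching wild type. All function names are illustrative.

```python
# Toy model of the rII reading-frame argument. The "gene" (a repeat of CAT)
# and the mutation positions are hypothetical; only the triplet logic matters.

def codons(seq: str) -> list[str]:
    """Read seq in triplets from a fixed start point, dropping leftover bases."""
    usable = len(seq) - len(seq) % 3
    return [seq[i:i + 3] for i in range(0, usable, 3)]


def insert_base(seq: str, pos: int, base: str = "G") -> str:
    """(+) mutation: add one base before index pos."""
    return seq[:pos] + base + seq[pos:]


def delete_base(seq: str, pos: int) -> str:
    """(-) mutation: remove the base at index pos."""
    return seq[:pos] + seq[pos + 1:]


def tail_in_frame(mutant: str, wild: str, n: int = 4) -> bool:
    """Crude proxy for restored function: do the last n codons match wild type?"""
    return codons(mutant)[-n:] == codons(wild)[-n:]


wild = "CAT" * 8                          # 8 in-frame codons
plus = insert_base(wild, 4)               # one (+)
plus_minus = delete_base(plus, 10)        # (+) followed by a nearby (-)
plus_plus = insert_base(plus, 8)          # two (+)
triple_plus = insert_base(plus_plus, 12)  # three (+)

for label, seq in [("+", plus), ("+/-", plus_minus),
                   ("+/+", plus_plus), ("+/+/+", triple_plus)]:
    print(f"{label:6} {' '.join(codons(seq))}  frame restored: {tail_in_frame(seq, wild)}")
```

Only the +/− pair and the three same-sign insertions bring the downstream codons back into the wild-type frame. The codons between the mutations remain altered, and three insertions add one extra codon. This fits the "normal, or nearly normal" phenotypes of such revertants.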

The other main type of intragenic suppressor is a
second-site mutation within a gene that leads to an
amino acid residue change in the protein product
which compensates for a change brought about by
the original mutation. Once again the restoration 
of the original phenotype may not be complete, but 
the doubly mutant protein has activity. Such mutants 
were discovered by Yanofsky in his studies on the
colinearity of the trpA gene and its product, tryptophan
synthetase. The replacement of the glycine at
residue 211 by a glutamic acid residue leads to an 
inactive tryptophan synthetase, but the activity can 
be restored if the normal tyrosine at residue 175 is 
replaced by a cysteine. (The suppressor mutation in 
the absence of the original mutation leads to an in- 
active protein.) Such intragenic suppressors can give 
considerable insight into the functional/structural 
requirements of the protein for activity. 


Intergenic Suppressors 


If the suppressor mutation is in a gene other than that 
containing the original mutation it is said to be an 
intergenic suppressor (or an extragenic suppressor). 
These have proved both useful and interesting. Once 
again there are different types of intergenic suppres- 
sors. Many are indirect and allow the organism to 
bypass the original mutation. These bypass suppres- 
sors may activate a new pathway; if so they should be 
able to suppress essentially any mutation in the origin- 
al gene (and also mutations in other genes in the 
original pathway). Such suppressor mutations may 
be found in regulatory genes of other pathways. 

However, other intergenic suppressors may restore 
the function of the original pathway. If the original 
mutation led to a partially active gene product, in- 
tergenic suppressors may be found in that gene’s 
regulatory genes. These then will be somewhat 
allele-specific, since many mutant alleles will have no 
activity. Other intergenic suppressor mutations may 
be in genes whose products interact with the product 
of the gene containing the original mutation. Muta- 
tions in the interacting protein may compensate for 
the change in the original protein. Such suppressor
mutations would be expected to be very allele-specific, 
only restoring activity to a very limited number of 
mutations in the original gene. 

Another type of intergenic suppressor suppresses 
mutations of a particular class rather than mutations in
specific genes. These are informational suppressors 
that alter, to a limited extent, how the cell containing 
them reads the genetic code. These informational sup- 
pressors were also discovered during analysis of tryp- 
tophan synthetase mutants and T4 mutants. These 
suppressor mutations act directly and allow the cell 
containing them to make a functional product from 
the original mutated gene. These suppressors, then, are 
intergenic but direct. 

The first suppressors of this type to be understood 
completely were nonsense suppressors, and these 
were investigated using nonsense mutations in the 
T4 rII genes. Cells with nonsense suppressors insert
an amino acid at the site of a nonsense mutation, that
is, they read nonsense (stop) codons as sense. Most of 
these suppressors are themselves mutant tRNAs 
which have been altered to respond to one (or more) 
of the stop codons (and compete with the mechanism 
of chain termination). The majority of these have 
mutations in the part of the tRNA gene that encodes
the anticodon. The mutant tRNA is normally amino- 
acylated but now reads a nonsense codon and not the 
normal sense codon. Such mutations are possible in 
cells that have duplicate genes for the normal tRNA or 
at least alternative tRNAs which can still read the 
normal sense codon. Nonsense suppressors differ in 
the efficiency of suppression. Part of this variation arises
because suppression will be detected only if the amino acid
the suppressor tRNA carries restores activity to the
full-length polypeptide. Part also arises because the
normal translational termination machinery continues
to function. Nonsense suppressors tend not to lead to a
loss of termination at normal stop codons at the end of 
genes, and efficient nonsense suppressor tRNAs tend 
to suppress nonsense codons other than the one(s) the 
organism prefers. For instance, most Escherichia coli 
genes terminate with a UAA codon and most efficient 
suppressor tRNAs read UAG or UGA. Many differ- 
ent suppressor tRNA mutations have been isolated 
and others have also been constructed using in vitro 
genetic manipulations. Such tRNAs can be used to 
assess the activity of proteins with different amino 
acids at a particular residue. 
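Reading a stop codon as sense can be illustrated with a toy translation routine. This sketch rests on several assumptions. The codon table lists only the handful of codons used. The gene and the tyrosine-inserting UAG suppressor are hypothetical. Suppression is treated as all-or-none, although in cells the release factors still compete and suppression is only partial.

```python
# Toy translation with and without a nonsense (UAG) suppressor tRNA.
# Only the codons used below are included; a real table has 64 entries.
from typing import Optional

CODON_TABLE: dict[str, Optional[str]] = {
    "AUG": "Met", "GCU": "Ala", "UAC": "Tyr", "AAA": "Lys", "GGC": "Gly",
    "UAA": None, "UAG": None, "UGA": None,  # None marks a termination codon
}


def translate(mrna: str, suppressor: Optional[dict[str, str]] = None) -> list[str]:
    """Translate codon by codon from the first base.

    `suppressor` maps a stop codon to the amino acid a hypothetical
    suppressor tRNA inserts there (all-or-none in this sketch).
    """
    suppressor = suppressor or {}
    protein: list[str] = []
    for i in range(0, len(mrna) - 2, 3):
        codon = mrna[i:i + 3]
        if codon in suppressor:        # stop codon read as sense
            protein.append(suppressor[codon])
            continue
        amino_acid = CODON_TABLE[codon]
        if amino_acid is None:         # chain termination
            break
        protein.append(amino_acid)
    return protein


wild_type = "AUG" "GCU" "UAC" "AAA" "GGC" "UAA"      # gene ends with UAA
amber_mutant = "AUG" "GCU" "UAG" "AAA" "GGC" "UAA"   # Tyr codon UAC -> UAG
amber_suppressor = {"UAG": "Tyr"}                    # reads UAG only

print(translate(wild_type))                       # ['Met', 'Ala', 'Tyr', 'Lys', 'Gly']
print(translate(amber_mutant))                    # ['Met', 'Ala']
print(translate(amber_mutant, amber_suppressor))  # ['Met', 'Ala', 'Tyr', 'Lys', 'Gly']
```

The suppressor responds only to UAG, so the UAA at the end of the hypothetical gene still terminates the chain. Putting a different amino acid into `amber_suppressor` yields a full-length chain with another residue at that position. This is the idea behind using suppressor tRNAs to test amino acid substitutions.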

Not all nonsense suppressor mutations need to be 
in genes encoding tRNAs. Such mutations have also 
been isolated in the genes encoding both the 16S 
ribosomal RNA and the 23S rRNA of bacteria. Pre- 
sumably the bases that are changed in the mutants 
normally participate in translation termination. 

In addition, mutant tRNAs have also been isolated 
that can suppress missense mutations and frameshift 
mutations. Indeed among the first intergenic, direct 
suppressors to be isolated was a mutant tRNA that 
suppressed a missense mutation in trpA. 

The selection and characterization of suppressor 
mutations remains one of the most interesting and 
rewarding experimental genetic approaches to un- 
covering gene function. Such mutations can yield 
important information on the function or regulation 
of the gene containing the original mutation and also 
uncover new genes and even new pathways. While 
we have here emphasized the ‘suppressor mutation’
occurring in an organism containing an original mutation,
important information can also be gained by
studying the suppressor mutation in isolation. Some, 
particularly the intragenic mutations, are likely to give 
rise to phenotypes very like those of the mutation 
they suppress. However, others may lead to new and
interesting phenotypes, disclosing the functions of the
genes that contain them. 


See also: Acridines; Mutation, Missense; Nonsense 
Mutation; Phenotype; Reverse Mutation; 
The term ‘suppressor tRNA’ usually refers to a genetically
altered or mutant tRNA that, because it translates
a codon other than its normal (cognate) codon or
is aminoacylated with an amino acid other than its 
normal amino acid, reverses, at least to some extent, 
the effect of a mutation in the gene for one or more 
proteins. Such translational suppression has been 
described as a “mistake upon a mistake.” The latter 
mistake refers to a mutation in a protein-encoding 
gene, which results in some altered phenotype of the 
encoded protein. That mutation could be: (1) a mis- 
sense change, converting one of that protein’s codons 
to a codon for an amino acid that renders the protein 
inactive; (2) a nonsense mutation, i.e., a change of one 
of the sense codons to a termination codon (UGA, 
UAA, or UAG), causing premature termination of 
polypeptide synthesis and resulting in synthesis of a 
truncated protein; or (3) a frameshift mutation, caused by
insertion or deletion of one or more nucleotides (other than a multiple of three),
that leads either to an early termination codon in the 
new frame or continued synthesis past the original 
normal termination codon until a new termination 
codon is reached in the new translational reading 
frame. The first mistake in the “mistake upon a mis- 
take” expression refers to a change in specificity at 
some step in translation that reverses the effects of 
the missense, nonsense, or frameshift mutation in the 
protein-encoding reference gene, resulting in a change 
in the primary structure of the mutant protein, either 
back to the wild-type amino acid sequence or to one 
that confers on the protein some degree of normal 
activity. 

Several types of mutant suppressor tRNAs have 
been characterized. The most common type arises 
from anticodon base changes that allow the mutant 
tRNAs to read codons other than their normal ones. 
Less frequent are mutations outside of the anticodon 
that result in decoding changes that lead to suppression,
particularly of frameshift mutations. Such mutations 
have been found to occur in any one of the three tRNA 
arms, the anticodon arm (outside of the anticodon), the
dihydroU arm, and the T-pseudoU-C arm, as well as 
the amino acid acceptor stem. Another kind of one-step 
suppressor tRNA mutation, one that is found infre- 
quently but was predicted in the original hypothesis for 
missense suppression, is a mutation that changes the 
aminoacylation specificity of the tRNA but allows the 
retention of the normal decoding specificity. An ex- 
ample of this type is a lysine tRNA base change in the 
amino acid acceptor stem that allows the tRNA to be 
misacylated some of the time with alanine while still
decoding the lysine codons AAA and AAG. Some 
mutant tRNAs with anticodon changes that alter 
their decoding properties are also misacylated to 
some extent with noncognate amino acids. 

Translational suppression, which is not limited to mutant
suppressor tRNAs, is a most effective way to examine
the structure, function, and interactions
of any translational macromolecule, as long as that 
molecule is involved in the specificity or accuracy of 
translation. Translational suppressor mutations have 
been found and characterized in the genes for transla- 
tional molecules other than tRNAs, namely in the 
genes for elongation factors (bacterial EF-Tu and 
EF-G), termination factors (bacterial release factors 
RF1, RF2, and RF3, and yeast release factor eRF1), 
aminoacyl-tRNA synthetases, and the ribosomal RNAs of
both subunits of bacteria and yeast. Suppression of 
nonsense mutations has also been achieved by high 
expression of rRNA fragments from cloned segments 
of the bacterial 23S rRNA (large ribosomal subunit) 
gene, either in the sense orientation only, in the anti- 
sense orientation only, or in both orientations, de- 
pending on the segment examined. 

Two special situations should be mentioned in 
which apparently mutant tRNAs (suppressor tRNAs) 
are normally present in the cells. First, some organ- 
isms carry, in addition to the tRNAs for a sense codon, 
another tRNA, acylatable with the same amino acid, 
but whose anticodon allows it to decode the termin- 
ation codons UAA or UAG in certain contexts within 
a coding sequence. Second, from bacterial to human 
cells, a specialized translational mechanism employs a 
tRNA that inserts selenocysteine at high frequency at 
strategically located UGA codons within a protein- 
coding sequence. In bacteria, that special tRNA is 
acylated with serine, which is then converted in two 
steps to selenocysteine. The selenocysteyl-tRNA then 
interacts, not with elongation factor Tu, but rather 
with a special Tu-like elongation factor to position 
the special aminoacyl-tRNA at the UGA codon for 
peptide bond formation. 


See also: Elongation Factors; Translation; 
Translational Control 
Any change in the environment in which a genome is
expressed may be expected to have some impact on the 
phenotypes of at least some of the genes. Alterations 
in phenotype, even if subtle, may have an effect on the 
overall fitness (ability to leave reproductive offspring) 
of an organism. Environmental parameters are not 
limited to the obvious ecological ones of temperature, 
moisture, sunlight, and pH but should also be consid- 
ered to include the environment within organisms, 
within particular organs and tissues, and within cells. 
Indeed these may be among the most exacting of 
environments, especially when the additional factor 
of symbiosis is included. 

Symbiosis is a close association of two or more
organisms of different species such that they leave
more offspring (are more fit) in a particular environment
as partners than as individuals. Lichens are
a striking case in point. As a symbiotic association, a lichen can
colonize the dry, bare surface of a gravestone, a habitat
where the individual fungus and alga would not be
able to survive, let alone reproduce. In any intimate
symbiosis the ‘environment’ includes the partners 
themselves and the phenotype is collective, a com- 
bination of all of their expressed genes. 

Genetic consequences of symbioses (especially 
long-term intimate symbioses) may include: 


1. A loss of genes (or gene function) that are redun- 
dant in one or the other partner. 
2. A loss of genes (or gene function) that are no longer 
necessary in the new circumstances. 
3. A transfer of genes from one partner to another, such
as via an exchange of viruses. Such exchanges may be
lethal in some cases but in other cases may result in a
shared coding for a multigenic structure or pathway
and thus a sort of cementing of the relationship.

Loss and transfer of genes may ultimately result in
an obligate relationship in which the partners can
never again be free-living. The phenotype of the 
symbiosis is effectively a shared phenotype of two 
altered genomes. Herbivorous animals are striking
examples in that they generally cannot digest cellulose
on their own, and yet a major aspect of a herbivorous niche is
the consumption of plants. Symbionts of herbivor- 
ous digestive systems (such as those of bovine 
rumens) provide a major part of the overall pheno- 
type by digesting cellulose. 




Parasitism may be seen as a variation of symbiosis, 
although one in which the reproduction of the associ- 
ates is not coordinated. In a typical mutual symbiosis, 
reproduction of one partner is accompanied by repro- 
duction in the other such that the ratio remains the 
same. In contrast, parasites often overrun their hosts 
and may be defined in part by this tendency. Parasites 
and symbionts may be seen as opposite extremes of a 
changeable continuum. Some symbioses may revert 
to more independent (and sometimes pathogenic) liv- 
ing if environmental conditions change. For example, 
green hydra will consume their algal symbionts if kept 
in the dark. Some parasites are lethal only under par- 
ticular circumstances. For example, many intestinal 
microbes are endemic in some populations and not 
especially harmful but can become harmful in an un- 
familiar host. Therefore, parasite genetics may also be 
viewed as a variation on the symbiotic theme and the 
points listed above are valid for parasites as well. For 
example, malaria parasites are truly obligate internal 
inhabitants of their host’s blood cells, in part because 
of a loss of some of their metabolic capabilities and 
a reliance on those of the host. Mitochondria and
chloroplasts may be considered extreme examples
of obligate symbionts, with several additional
consequences for their genetics.


See also: Mitochondria; Predator–Prey and
Parasite–Host Interactions
The symbiosis island of Mesorhizobium loti is a
501.8-kb chromosomally integrated element that 
transfers to nonsymbiotic mesorhizobia in the envir- 
onment and converts them to symbionts able to nodu- 
late and fix nitrogen with Lotus species. The island 
integrates into a phe-tRNA gene, reconstructing the 
gene at one (left) end of the island and producing a 17- 
bp direct repeat of the 3’ end of the tRNA gene at the 
other end. Integration and excision of the island are 
mediated by a phage P4-type integrase encoded just 
within the left end of the island (Sullivan and Ronson, 
1998). The island has a mosaic structure suggesting 
that it evolved in a stepwise fashion via multiple re- 
combination events. It contains nodulation and nitro- 
gen fixation genes, including some which are spread 
across several replicons in other rhizobia, and a wide 


range of other genes. Such genes include those likely 
to be involved in transfer of the island, genes of 
unknown function found on symbiotic replicons in 
other rhizobia, genes with no homologs in current 
databases, several putative regulatory genes, genes 
encoding cell-membrane-associated components in- 
cluding porins, and an array of metabolic genes which 
may contribute to ‘fine tuning’ of nodule metabolism. 
The island is a member of an emerging class of 
acquired genetic elements that may be termed ‘fitness 
islands’ (Preston et al., 1998). These conjugative genet- 
ic elements integrate chromosomally in a site-specific 
manner, and contribute to the diversification and 
adaptation of bacteria to environmental niches. Ex- 
amples of such elements include many pathogenicity 
islands (Kaper and Hacker, 1999), the clc element 
conferring chlorocatechol degradation in Pseudomo- 
nas sp. (Ravatn et al., 1998), and the SXT element 
conferring antibiotic resistance in Vibrio cholerae 
(Hochhut et al., 2000). 
The symbiosome is defined as a membrane-bounded
compartment containing one or more symbionts
that is located in the cytoplasm of eukaryotic cells
(Figure 1). Rhizobia have the ability to infect and
establish a nitrogen-fixing symbiosis with a variety 
of legume plants. This symbiosis involves the
intracellular colonization of root cortical cells contained
in the nodule structure. The symbiosome is
composed of the symbiosome membrane (SM) which
surrounds a symbiosome space (SS) that contains the
symbiont (in the case of rhizobia, the nitrogen-fixing
symbiont, i.e., bacteroid). The term symbiosome is
specific to stable, intracellular symbioses. For
example, it should not be confused with an endosome
resulting from endocytosis of bacteria.

Figure 1 (A) Schematic drawing of a symbiosome
showing the location of the symbiosome membrane
(SM), symbiosome space (SS), and bacteroids. (B)
Transmission electron micrograph showing a symbiosome
within an infected cell of a Lotus japonicus root
nodule. The SM, SS, and Mesorhizobium loti bacteroids
should be easily identifiable by referring to the schematic
diagram. (Drawing and photo kindly provided by Dr John
Dunlap, University of Tennessee.)

The term symbiont has general meaning regardless 
of what system is being discussed. However, homology 
among many intracellular symbionts extends to the 
compartment (vacuole) that encloses such bacteria. 
For example, similar membrane-bounded, bacteria- 
containing structures are found in legume nodules, 
amoeba endosymbionts, intracellular Legionella, 
malaria parasites, etc. Moreover, all intracellular symbionts in
eukaryotic cells have similar problems to overcome,
for example, how to enter the cell, avoid lysosomes,
and obtain nutrients. Terminology
that draws attention to the similarity among intracel- 
lular symbionts could stimulate cooperative research 
and speed discovery. This is the reason why the term 
symbiosome was proposed and has been widely 
adopted. 

Before ‘symbiosome,’ the intracellular compart- 
ment housing rhizobial bacteroids was given a variety 
of names: peribacteroid vacuoles, bacteroid-enclosing
compartments, peribacteroid units, symbiotic ves- 
icles, nifixasome, etc. Although all such terms were 
descriptive, they failed to draw attention to the 
homology of symbiosomes to similar structures in 
other intracellular symbionts. 

Owing to the endosymbiotic theory for the evolu- 
tion of mitochondria and chloroplasts, the suggestion 
has been made that symbiosomes be considered as 
quasi-organelles. 


See also: Rhizobium; Symbionts, Genetics of 
‘Sympatric’ is the term used to describe species (or
higher taxa) coexisting at the same locality. Such taxa 
may coexist in the same habitat or prefer different 
habitats at the same geographical location. 


See also: Phylogeography; Speciation; Species 


Sympatric Speciation 


See: Speciation 
At any restricted part of the phylogeny of life, Hennig
(1966) asserted that there were two kinds of homo- 
logous characters: apomorphies and plesiomorphies. In 
most cases a symplesiomorphy is a homologous char- 
acter shared by two or more taxa and is hypothesized to 
have evolved before the common ancestor of these taxa. 
Symplesiomorphies are also known as shared primi- 
tive or plesiotypic characters. Symplesiomorphy is a 
relative term that is contrasted with an apomorphic 
homolog at this restricted level. Consider the charac- 
ters ‘having feathers on the body’ shared by robins and 
hawks and ‘having epidermal scales on the body’ 
shared by crocodiles and lizards. At this restricted 
level, Hennig argumentation would lead to the 
hypothesis that ‘having scales on the body’ was the 
plesiomorphic homolog while ‘having feathers on 
the body’ was the apomorphic homolog. This is because 
the sister group of lepidosaurs (including lizards) and 
archosaurs (crocodilians and birds) has epidermal 
scales, indicating that the scales were already present 
in the common ancestor of all four taxa. All symple- 
siomorphies are synapomorphies at some higher level 
in the phylogeny that is more inclusive than the 
restricted level considered by the investigator. So, epi- 
dermal scales are synapomorphic at the level of a 
larger monophyletic group that includes all four of
our taxa plus mammals, turtles, etc. In the phylogenetic
system symplesiomorphies cannot be used
to corroborate monophyletic groups because they 
have already been used (actually or logically) to cor- 
roborate a larger, more inclusive, group. Although 
symplesiomorphies have been used to diagnose para- 
phyletic groups by some investigators, this use is sus- 
pect because classifications containing paraphyletic 
groups are logically inconsistent with the phylogeny 
as reconstructed by synapomorphies (Hull, 1964; 
Wiley, 1981). In other words, grouping by symplesio- 
morphy represents the use of a homolog to group at an 
inappropriate level of the phylogeny. 
Hennig (1950, 1966) distinguished between three kinds
of taxic homologs. A synapomorphy is a homologous 
character (evolutionary novelty) shared between two 
or more species or higher taxa whose presence in these 
taxa diagnoses a monophyletic group (clade). If we 
consider all of phylogeny (the complete descent pat- 
tern of all species, the tree of life), all taxic homologies 
are either synapomorphies or autapomorphies (evolu- 
tionary novelties of a single terminal species). At some 
restricted level that considers only part of the tree of 
life, synapomorphies are those homologous characters 
shared by two or more taxa that have evolved during 
the time span covered by the phylogenetic tree the 
investigator is attempting to discover. Other charac- 
ters are either plesiomorphies (apomorphies that have 
evolved earlier) or autapomorphies (character states 
that diagnose single species). Thus, at a restricted 
part of the phylogeny the term synapomorphy is a 
relative term used to differentiate character states 
that are relevant to reconstructing phylogenetic his- 
tory from characters that are not relevant. Because all 
shared taxic homologies are ultimately synapomor- 
phies at some level in the phylogeny, the origin of 
synapomorphies is the same as the origins of the char- 
acters themselves (see Homology). Ultimately, 
synapomorphies arise as mutations that come to be 
fixed in species lineages as autapomorphies and become 
synapomorphies when the ancestral species speciates 
(see Ax, 1987; Haszprunar, 1991). 

Synapomorphies diagnose monophyletic groups 
(clades) precisely because they are the evidence 
required to hypothesize that the descendant species 
shared a common ancestral species. In contrast, shared 
plesiomorphies are not evidence that can be used to 
diagnose a monophyletic group precisely because they 
evolved earlier than the origin of the common ancestor 
hypothesized to be shared by the members of the 
monophyletic group. Since all species and groups of
species form monophyletic groups at some level in the
phylogeny, the use of plesiomorphic characters to
diagnose any group would represent the use of the 
same character at least twice, even though it had a 
singular origin (singular from the taxic viewpoint, 
arising in a single lineage). 

Although the concept of synapomorphy had been 
used by many investigators before Hennig, the sys- 
tematic use of synapomorphies as tools for reconstruct- 
ing phylogenetic trees springs from his systematic 
application of the principle of grouping by synapomor- 
phy, commonly known as Hennig argumentation or 
phylogenetic systematics. In its simplest form, Hennig 
argumentation requires a search for similarities within 
the group of study (the ingroup), formulation of 
hypotheses of character homology (using various criteria), and
comparison of the distribution of these characters 
within groups that are closely related to the group of 
study (outgroups). 

Given an initial hypothesis of homology, if the 
character state in question is uniformly found within 
the ingroup but not found in the closest relative (the 
sister group) or any other closely related groups, then 
the investigator may deduce that the character in ques- 
tion is a synapomorphy for the group as a whole. For 
example, tetrapod limbs are found in all major groups 
of tetrapod vertebrates but are not found in the sister
group of tetrapods (certain fishes generally termed 
rhipidistians) or any other taxon closely related to 
the tetrapods or their sister group (coelacanths, lung- 
fishes, other groups of rhipidistians, etc.). Thus, we 
may deduce that the tetrapod limb is a synapomorphy 
of a monophyletic group, Tetrapoda. This does not 
mean that all tetrapods must have tetrapod limbs. 
Certain lizards and snakes may have only embryonic 
vestiges or limb buds but lack limbs as adults. But, 
they are members of groups whose ancestors are 
hypothesized to have limbs. These seeming anomalies 
are the reason why synapomorphies are diagnostic of 
groups rather than defining groups. 

Hypotheses of relationships within a group are 
argued in a similar manner. If some members of a 
group have one character and other members have a 
different but homologous character, then the character
not found in the sister group and other outgroups is the
synapomorphy that diagnoses a monophyletic group
within the group of study. For example, within the 
plant group composed of mosses and tracheophytes, 
tracheophytes have an independent sporophyte gen- 
eration while in mosses the sporophyte is dependent 
on the gametophyte. The sister group of mosses and 
tracheophytes are the hornworts and in hornworts the 
sporophyte is dependent on the gametophyte. Thus, 
we can conclude that within the monophyletic 
group of land plants composed of mosses and
tracheophytes, the tracheophytes share
the synapomorphy of an independent sporophyte
generation. This hypothesis gains additional corro- 
boration when we observe that liverworts (another 
outgroup, but not part of the sister group) also have 
sporophytes that are dependent on the gametophyte.
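The outgroup reasoning in the sporophyte example can be expressed as a simple rule. This is a minimal sketch of outgroup comparison in its simplest form. It assumes a single character whose states have already been hypothesized to be homologous, and it takes outgroup states at face value. The split of tracheophytes into lycophytes, ferns, and seed plants is added here for illustration. Real analyses weigh several outgroups and many characters, and the final test remains congruence.

```python
# Simplest form of outgroup comparison for one character whose states are
# already hypothesized to be homologous. Taxon names are illustrative.

def candidate_synapomorphies(ingroup: dict[str, str],
                             outgroups: dict[str, str]) -> dict[str, list[str]]:
    """Return {state: taxa} for ingroup states absent from every outgroup
    and shared by at least two ingroup taxa."""
    outgroup_states = set(outgroups.values())   # treated as plesiomorphic
    novel: dict[str, list[str]] = {}
    for taxon, state in ingroup.items():
        if state not in outgroup_states:
            novel.setdefault(state, []).append(taxon)
    # a novelty confined to a single terminal taxon would be an autapomorphy
    return {state: sorted(taxa) for state, taxa in novel.items() if len(taxa) >= 2}


sporophyte = {
    "mosses": "dependent sporophyte",
    "lycophytes": "independent sporophyte",
    "ferns": "independent sporophyte",
    "seed plants": "independent sporophyte",
}
outgroups = {
    "hornworts": "dependent sporophyte",    # sister group
    "liverworts": "dependent sporophyte",   # more distant outgroup
}
print(candidate_synapomorphies(sporophyte, outgroups))
# {'independent sporophyte': ['ferns', 'lycophytes', 'seed plants']}
```

The dependent sporophyte appears in the outgroups, so it is read as plesiomorphic and does not group mosses with anything. Only the independent sporophyte is returned as a candidate synapomorphy of the tracheophytes.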

The final arbiter of synapomorphy and thus
homology is the test of congruence of many independ- 
ent characters corroborating a particular phylogenet- 
ic tree (the congruence test: Patterson, 1988) (see 
Homology). This is because some identical characters 
are not homologies, but homoplasies. The assumption 
inherent in the congruence test is that the most parsi- 
monious explanation of character distribution yields 
the maximum number of hypotheses of homology 
and the minimum number of ad hoc hypotheses of 
homoplasy (Farris, 1980). 
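The parsimony criterion behind the congruence test can be made concrete. For each candidate tree, one counts the minimum number of character-state changes the data require. Steps beyond one per character are the ad hoc hypotheses of homoplasy. The sketch below uses Fitch's algorithm for unordered characters on rooted binary trees. This is a standard method chosen for illustration and is not described in this entry. The four taxa, three characters, and trees are hypothetical.

```python
# Fitch's small-parsimony count for unordered characters on rooted binary trees.
# Taxa, trees, and character data are hypothetical.

def fitch(tree, states):
    """Return (candidate states at this node, minimum changes in this subtree).

    tree   -- a taxon name (leaf) or a 2-tuple of subtrees
    states -- mapping from taxon name to its character state
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left_set, left_cost = fitch(tree[0], states)
    right_set, right_cost = fitch(tree[1], states)
    shared = left_set & right_set
    if shared:                       # children can agree: no change needed here
        return shared, left_cost + right_cost
    return left_set | right_set, left_cost + right_cost + 1  # one extra change


def tree_length(tree, characters):
    """Total number of steps (changes) summed over all characters."""
    return sum(fitch(tree, character)[1] for character in characters)


characters = [
    {"A": 1, "B": 1, "C": 0, "D": 0},
    {"A": 1, "B": 1, "C": 0, "D": 0},
    {"A": 0, "B": 1, "C": 1, "D": 0},   # conflicts with the first two
]
trees = {
    "((A,B),(C,D))": (("A", "B"), ("C", "D")),
    "((A,C),(B,D))": (("A", "C"), ("B", "D")),
    "((A,D),(B,C))": (("A", "D"), ("B", "C")),
}
for name, tree in trees.items():
    print(name, tree_length(tree, characters))   # 4, 6, 5 respectively
```

Each two-state character needs at least one step, so three steps is the floor. The tree ((A,B),(C,D)) requires only one extra step, a single ad hoc hypothesis of homoplasy. It is therefore preferred over the alternatives, which require two and three extra steps.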
The word synapsis is derived from the Greek word
sunapsis, meaning point of contact. The term synapse 
is used to describe a cell-cell junction that allows a 
nerve impulse to pass from one nerve cell to another. 
In the study of DNA metabolism, the word synapse is 
also used to describe the point where two DNA mol- 
ecules come together during recombination. Genetic 
recombination is any process that brings about an
exchange of genetic information between two DNA
molecules, and it can take several forms. Homologous 
genetic recombination, transposition, and site-specific 
recombination are the most common, and each of 
these is described in more detail elsewhere in this 
encyclopedia. In each case, at least two DNA mol- 
ecules or two different segments of the same DNA 
molecule must be brought together at the point where 
the genetic exchange is to take place. This key step, 
which may precede any covalent chemistry, is referred 
to as synapsis. The synapsis may be either DNA- 
mediated or protein-mediated. 


DNA-Mediated Synapsis 


Homologous genetic recombination is a genetic 
exchange between any two DNA molecules (or segments
of the same molecule) with a similar sequence. In
principle, it can occur at any site on any DNA mol- 
ecule. Thus, proteins that bind to a particular sequence 
on the DNA play no role in the synapsis step in this 
process. Instead, synapsis involves the alignment of 
similar sequences in the two DNA molecules, a process 
that requires direct DNA-DNA interaction. Proteins 
participate in this process as catalysts. Proteins that 
facilitate DNA-DNA alignment include the RecA 
protein in bacteria and its homologs: the Rad51 or 
Dmcl proteins in eukaryotes, the Rad1 protein of 
Archaea, and related proteins produced by some 
viruses. These proteins (with the possible exception 
of Dmc1) form helical filaments on single-stranded 
DNA, with the bases of the DNA displayed in the 
major groove of the filament. The DNA is then aligned 
with homologous sequences ina second, duplex DNA, 
in a process sometimes called the search for homology. 
Recent studies indicate that the homology search 
involves base flipping, in which Watson—Crick inter- 
actions in the duplex are weakened and individual 
bases in the duplex are flipped out so that they can 
pair with the bases in the originally bound single 
strand. The homology search leading to synapsis is 
thus mediated by standard Watson—Crick base pairing 
(Figure 1). The sampling is very rapid. Once the cor- 
rect alignment is found, there is an extensive transfer of 
one strand of the duplex to its new pairing partner. 
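The homology search can be caricatured at the level of sequence alone: the bound single strand is tested in every register along the duplex, and a register counts as homologous only if each of its bases can form a Watson-Crick pair with the base flipped out of the complementary duplex strand. This minimal sketch ignores filament geometry, the groove of approach, and kinetics; the function names and sequences are hypothetical.

```python
# Watson-Crick partners for each base.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def pairs(base: str, partner: str) -> bool:
    """True if the two bases can form a Watson-Crick pair."""
    return COMPLEMENT[base] == partner


def pairing_profile(single_strand: str, duplex_top: str) -> list[int]:
    """For each register, count bases of the bound single strand that can pair
    with the (flipped-out) bases of the duplex strand complementary to it."""
    # Complementary strand written base by base under the top strand (read 3'->5').
    duplex_bottom = "".join(COMPLEMENT[b] for b in duplex_top)
    n = len(single_strand)
    profile = []
    for start in range(len(duplex_top) - n + 1):
        window = duplex_bottom[start:start + n]
        profile.append(sum(pairs(s, w) for s, w in zip(single_strand, window)))
    return profile


def homologous_registers(single_strand: str, duplex_top: str) -> list[int]:
    """Registers where every base pairs: candidate sites for synapsis."""
    profile = pairing_profile(single_strand, duplex_top)
    return [i for i, count in enumerate(profile) if count == len(single_strand)]


probe = "GATTC"            # hypothetical RecA-bound single strand
duplex = "CCGATTCAAGATTA"  # hypothetical duplex (top strand shown)
print(homologous_registers(probe, duplex))
print(pairing_profile(probe, duplex)[9])
```

The near-miss at register 9 (four of five bases paired) is rejected, so in this toy model only the fully complementary register is accepted as the synapsis site.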


Protein-Mediated Synapsis 


Site-specific recombination and transposition both generally require the activity of at least one protein that binds to specific DNA sequences. This protein also plays a key role in bringing DNA molecules together in the right orientation for the genetic exchanges catalyzed in these reactions. Synapsis is mediated largely by protein-protein interactions. In many of these systems, very elaborate protein-DNA complexes are formed with the DNA wrapped into the complex in a precise geometry. The architecture of the synaptic complex not only brings two DNA sites together for reaction, but can also determine the outcome of the reaction. In addition, formation of the synaptic complex is often a prerequisite for any covalent chemistry, preventing the occurrence of incomplete DNA cleavage or strand transfer reactions, which could be deleterious to chromosomal DNA.
Figure 1 DNA-mediated synapsis in homologous genetic recombination. The reaction is shown in cross section, with a RecA-bound single strand interacting with an incoming duplex DNA. Only one base (A) or base pair (A:T) from each DNA is shown. Base flipping occurs within the duplex to allow synapsis mediated by a Watson-Crick interaction between the bound single strand and the base flipped out of the duplex. The small filled circles attached to each base represent the DNA backbone, and the duplex is thus shown approaching the single strand via its major groove. Several recent studies have provided strong evidence for a minor groove approach prior to base flipping, and this aspect of the DNA pairing mechanism is still considered controversial.


Site-Specific Recombination 

In conservative site-specific recombination, the genetic exchange occurs at specified sequences in the DNA, which are recognized and bound by the recombinase enzyme and/or auxiliary proteins. There are two large classes of site-specific recombinases, the integrase class and the resolvase/invertase class.

In the integrase class, the simplest forms of the recombination sites consist of two protein-binding sites flanking a short sequence where the actual DNA recombination occurs. The recombinase proteins bind to a recombination site, then bring two bound recombination sites together in a synapse via protein-protein interactions (Figure 2). Recombination is then catalyzed by the recombinase and occurs within the complex.

For the resolvase/invertase class, synapsis not only brings two recombination sites together, but also determines the outcome of the reaction. Synapsis in these systems involves an elaborate complex with multiple proteins, with the DNA wrapped within and around the complex in a precise topology. In addition, the synaptic complex forms efficiently only when the DNA is negatively supercoiled. The synaptic complex acts as a topological filter. In the case of invertases, the architecture of the complex can form only with two recombination sites that are both on the same DNA molecule and inverted in orientation (Figure 3A). As a result, the reaction always leads to an inversion of the DNA sequences between the recombination sites. Similarly, the architecture of the synaptic complex formed by resolvases allows them to catalyze recombination only between two recombination sites on the same DNA molecule that are in the same orientation, leading to a deletion of the intervening sequences (Figure 3B). Each of these systems is thus able to ‘sense’ the relative orientation of two recombination sites in a DNA molecule even though the sites may be separated by thousands of base pairs.
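The filtering logic described above can be written as a small decision rule. This is a deliberately coarse sketch that assumes only the three factors named in the text (same molecule, relative orientation, negative supercoiling) and treats "forms efficiently" as all-or-none; the protein binding and DNA geometry that actually build the complex are not modeled, and the names Site and synaptic_outcome are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Site:
    molecule: str     # hypothetical identifier of the DNA molecule carrying the site
    orientation: int  # +1 or -1: direction of the site's arrow (as in Figure 3)


def synaptic_outcome(system: str, a: Site, b: Site, negatively_supercoiled: bool) -> str:
    """Coarse topological-filter rule for invertase and resolvase systems."""
    if system not in ("invertase", "resolvase"):
        raise ValueError(f"unknown system: {system}")
    # Both classes need the two sites on one negatively supercoiled molecule.
    if a.molecule != b.molecule or not negatively_supercoiled:
        return "no synapsis"
    inverted = a.orientation != b.orientation
    if system == "invertase":
        # Invertases accept only inverted sites, so the result is always an inversion.
        return "inversion" if inverted else "no synapsis"
    # Resolvases accept only directly repeated sites, so the result is a deletion.
    return "deletion" if not inverted else "no synapsis"


head = Site("plasmid", +1)
for system in ("invertase", "resolvase"):
    for partner in (Site("plasmid", -1), Site("plasmid", +1), Site("other", +1)):
        print(system, partner, synaptic_outcome(system, head, partner, True))
```

Each system yields exactly one kind of product for the site arrangements it accepts and refuses the rest, which is the sense in which the complex "filters" topology.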
Figure 2 Protein-mediated synapsis in site-specific recombination by integrase class recombinases. Synaptic complexes generally include four recombinase proteins (as shown in the first panel only), but can be considerably more complex (e.g., the complexes formed by the bacteriophage λ integrase). The DNA-binding sites containing the base pairs specifically recognized by the recombinase proteins are indicated with thickened lines. Only the first few steps of the recombination reaction are shown; the Holliday junction illustrated is a common intermediate in the reactions catalyzed by these enzymes. Integrase class recombinases have an active-site tyrosine that forms a covalent intermediate with the DNA. These tyrosine residues are indicated by Y symbols. Only two of the four subunits promote formation and resolution of the covalent intermediates at any given time, and the active ones are circled. Resolution of the Holliday junction into recombinant DNA molecules involves the noncircled Y residues.


Figure 3 Synaptic complexes as topological filters: the invertase/resolvase class of site-specific recombination systems. Proteins are not shown, to keep the figure uncluttered, although multiple recombinase and auxiliary proteins generally bind to the indicated sites and are essential to form and maintain the DNA architecture shown. (A) The likely architecture of the synaptic complex formed by an invertase system. This complex can readily form between two DNA sites only if the sites are in opposite orientation and the DNA molecule is negatively supercoiled. (B) The likely architecture of a synaptic complex formed by a resolvase system. This complex restricts reaction to recombination sites that have the same orientation within a negatively supercoiled DNA molecule. The structures effectively filter out sites in the incorrect orientation even if they are thousands of base pairs apart. In each complex, the sites labeled x are those where the DNA rearrangement takes place, and these sites are positioned and held together by protein-protein interactions. The orientations of the x sites are indicated with arrows in both panels. The sites labeled y are places where additional recombinase proteins (which do not take part in the chemical steps) or other auxiliary proteins bind to maintain the overall DNA architecture. In the DNA beyond the protein-binding sites, the right-handed twisting represents the natural supertwisting of negatively supercoiled DNA.


Transposition 

Transposons are discrete DNA segments that have the capacity to move between different chromosomal locations that may share no homology. Elaborate protein complexes are also used to bring about synapsis between a migrating transposon and a new target site in a host chromosome. The synaptic complex generally includes the target DNA as well as both ends of the transposon, or three sites altogether. The transposase enzyme that catalyzes the DNA splicing steps of the reaction is always a critical, and sometimes the only, protein component in the complex. There are many types of transposons. Transposition can involve a simple cut-and-paste movement from one site to another, or replication of the transposon so that a copy is left behind in the original location. Some transposons can migrate in either a replicative or nonreplicative mode, and the architecture of the synaptic complex can play a role in defining the mode employed. In some complex transposons, the architecture of the synaptic complex also helps ensure that the two transposon ends in the reaction are inverted relative to each other and thus are likely to have come from the same transposon. An example is the synaptic complex formed by the transposing bacteriophage Mu (Figure 4). The principle is the same as that employed in the invertase/resolvase systems described above, in which the complex architecture serves as a topological filter preventing the juxtaposition of sites that are not properly oriented on the chromosome.

Figure 4 Transposon synaptic complexes. (A) A generic synaptic complex with the two transposon ends juxtaposed over a chromosomal DNA target site. (B) The synaptic complex formed by the transposing bacteriophage Mu. This structure is closely related to the complex formed by the invertase class of site-specific recombinases. There are inverted DNA-binding sites at the ends of the Mu transposon, and the complex helps ensure that the two ends brought together are from the same transposon. The sites labeled x are the sites where the Mu DNA is cleaved. The sites labeled y are places where additional proteins bind to help define the complex architecture. The site labeled z is the target site on a different DNA segment where the Mu transposon will be inserted. As in Figure 3, the proteins needed to form and maintain this architecture are not shown.
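The cut-and-paste and replicative modes of transposition differ in what happens to the donor site, which a string-level sketch makes explicit. The function and sequences are hypothetical; the sketch ignores the short target-site duplications that many transposons create and says nothing about the synaptic complex itself.

```python
def transpose(donor: str, start: int, end: int, target: str, insert_at: int,
              mode: str) -> tuple[str, str]:
    """Move or copy donor[start:end] (the transposon) into target at insert_at.

    Returns (donor after the event, target after the event). Target-site
    duplications and the strand chemistry are deliberately ignored.
    """
    element = donor[start:end]
    new_target = target[:insert_at] + element + target[insert_at:]
    if mode == "cut-and-paste":
        # The element is excised, leaving the donor site empty.
        return donor[:start] + donor[end:], new_target
    if mode == "replicative":
        # A copy stays behind in the original location.
        return donor, new_target
    raise ValueError(f"unknown mode: {mode}")


donor = "aaaTRANSPOSONbbb"  # hypothetical donor; the element is in capitals
target = "cccccc"           # hypothetical target DNA
print(transpose(donor, 3, 13, target, 3, "cut-and-paste"))
print(transpose(donor, 3, 13, target, 3, "replicative"))
```

Both calls place the same element in the target; only the replicative mode leaves the donor unchanged.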




See also: Genetic Recombination; Holliday 
The synaptonemal complex (SC) consists of the two parallel, aligned proteinaceous chromosome cores of a pair of homologous chromosomes at prophase of meiosis.
Chromosome cores
The cores are on the order of 60 to 100 nm wide, lie about 100 nm apart, and are interconnected by transverse filaments. The chromatin is attached to the cores in a series of loops. Superficially, the structure of the SC is similar over a wide range of organisms, from fungi to protists, plants, and invertebrate and vertebrate animals. Although present at meiosis in most sexually reproducing organisms, it is notably absent in some species, such as the fungus Ustilago maydis, the protist Tetrahymena, and males of some species of fruit flies. It is only partially developed in the fission yeast Schizosaccharomyces pombe.

Figure 1 A diagrammatic representation of the development and structure of the synaptonemal complex at successive stages of meiotic prophase. The two chromatids of each homolog (black, wavy lines) are associated with a chromosome core, which begins to form during leptotene. By zygotene, these cores become aligned in parallel and commence ‘zipping up’ by means of the interaction of transverse filaments with the core proteins. The entire structure is termed the synaptonemal complex, which is fully formed by pachytene. In diplotene, the transverse filaments detach and the cores separate for most of their length but remain together at the chiasmata, the points of genetic recombination. The names of the stages and structures are indicated in the diagram. The positions of three of the proteins that form the complex in mice, COR1 (long, dashed lines), SCP2 (dotted gray lines), and SYN1 (gray lines), are shown at the various stages of meiotic prophase.


Historical Background 


The SC was first reported in 1957 in a number of mammals and in an invertebrate species. Originally, the structure could be visualized only with the electron microscope, and the date of the reports coincides with the perfection and commercialization of the electron microscope. It was intuitively assumed that this structure was of fundamental importance to the structure and behavior of chromosomes at meiosis, and this was soon supported by the observation that male dipteran insects with genetic recombination possessed SCs while those without recombination lacked this structure. However, after a number of cases were discovered to have genetic recombination without SCs, the importance of SCs was downplayed in the literature. This trend was reversed in the 1970s with the discovery that recombination-associated nodules (RNs) are uniquely located at the SCs, implying that the SCs play a role in recombination. The SC regained full recognition as a main player in the meiotic process with the discoveries in the 1990s that recombinationally active proteins are located at the SCs and that they interact with SC components.


Development of the Synaptonemal 
Complex 


The diagrams of Figure 1 illustrate the development of the SC. At the leptotene stage of meiotic prophase, the chromosomes are unpaired and small segments of chromosome core appear in the nucleus, as shown in Figure 2. The early unpaired core segments in the figure are visualized by immunofluorescence microscopy using antibodies against one of the core proteins and a secondary antibody that is conjugated with a green fluorochrome.

The short segments become joined into longer stretches of cores, and simultaneously the cores of homologous chromosomes start to associate with each other, thereby forming the first SC segments at the zygotene stage of meiosis. The synapsis of the cores is related to the formation of transverse filaments between the cores. The transverse filament proteins are labeled with their specific antibodies, which are detected with a red-fluorescent secondary antibody. Where the cores have started to synapse in the zygotene nucleus, yellow segments appear owing to the overlap of the red and green fluorochromes. Once synapsis is complete, the antibody against the transverse filaments is present along the entire SCs, as shown in Figure 1 and in the pachytene nucleus of Figure 2.

Figure 2 (See Plate 39) The meiotic chromosome cores/SCs of a mouse are visualized here by indirect immunofluorescence with antibodies against a core protein and against a synaptic protein; the combined signal appears yellow where the two proteins are present at the same site. Three meiotic prophase nuclei at successive stages of development (leptotene, zygotene, and pachytene) are indicated in the figure. Scale bar = 10 µm.

During subsequent development, the chromosome cores start to move apart as the transverse filaments are removed. In the diplotene stage depicted in Figures 1 and 3, the separated cores are green fluorescent and the last remaining points of contact that have transverse filaments are yellow. Some filament material still adheres to the separated cores at a few points. Where two cores show a sharp convergence, this is suspected to be the site of a reciprocal recombination event that gives rise to a chiasma (Figure 3, ch).


Structural Components of the 
Synaptonemal Complex 


Two of the structural components of the meiotic chromosome cores in mammals are a 30 kDa and a 190 kDa protein (Figure 1, COR1 and SCP2), which can form multimers with themselves and with each other, resulting in long and relatively strong cores. The transverse filaments that lie between two cores during SC formation contain a 125 kDa protein (Figure 1, SYN1). This protein can be thought of as the interdigitating elements of a zipper that fastens the cores together in close (about 100 nm) parallel proximity.
Figure 3 (See Plate 40) Near the end of prophase in this mouse nucleus, the homologous chromosomes and their cores separate wherever the synaptic protein is no longer present. The centromeric chromatin (het) is stained with DAPI. Some of the locations that are suspected of having a chiasma are designated ch. Scale bar = 10 µm.


These structural elements were identified by the use of antibodies raised against SCs. In insects and plants, the SCs are a dominant feature of the meiotic prophase nucleus, but their protein components have not yet been determined.

In the yeast Saccharomyces cerevisiae, an entirely different methodology has resulted in the identification of SC components. The products of a number of genes that affected the meiotic process were found to affect the formation of SCs. The gene products of HOP1 and RED1 are required for normal core formation, and the ZIP1 product is present between the cores, where it functions in chromosome synapsis. Surprisingly, none of the mammalian SC components resembles the yeast SC components, even though the SCs themselves are structurally and functionally similar. It cannot yet be decided whether this reflects evolutionary divergence or polyphyletic origins.


Proteins Associated with the 
Synaptonemal Complex 


In S. cerevisiae, it has been estimated that approximately ten different proteins cluster at the site where a double-strand break is induced in the DNA, a preliminary step to the formation of a joint molecule and to recombination. Some of these proteins reside in distinct 100 nm foci on the SCs of yeast, plants, and animals. Antibodies against the recombinationally active proteins RAD51 and DMC1 detect approximately 300 foci at the cores/SCs of early mouse meiotic prophase nuclei. Other proteins may be part of the complex but could be present in too low abundance to be detected with this technique. Proteins involved in later recombination functions, such as BLM and MLH1, are present in SC-associated foci at later stages of meiotic prophase. In addition to these recombinationally active proteins, there are foci at the SC that contain the checkpoint proteins ATR and hRAD1. The functions of these proteins in regulating the cell cycle following DNA damage have been reported for somatic cells, but their functions in meiosis are not well understood.


Chromatin Attachment to the 
Synaptonemal Complex 


The diagrammatic chromatin loops in Figure 1 and the chromatin haloes in Figure 3 illustrate the organization of the chromatin relative to the SC. Particularly evident is the intensely blue-stained centromeric heterochromatin in Figure 3. The average length of the loops varies among species: short (0.5 µm) for yeast, about 5 µm for various mammals, and long (20 µm) for some insects. The implication is that there are specific mechanisms that regulate loop size and attachment to the SC. Foreign DNA from bacteriophage lambda that is inserted into mouse DNA fails to attach to the core/SC, indicating that there is some recognition mechanism.


Syndactyly refers to the complete or partial fusion of two or more fingers or toes. Severe forms of syndactyly involve bony fusion of digits; milder forms include webbed fingers and toes. The condition is frequently familial.


Syngenic 
L Silver 
‘Syngenic’ literally means ‘of the same genotype.’ The term is used most frequently by immunologists to describe interactions between cells from the same inbred strain.


See also: Inbred Strain 


Synovial Sarcoma 


C S Cooper 
Synovial sarcoma is an aggressive soft tissue sarcoma that arises most commonly in adolescents and young adults, with around 200 new cases in the United Kingdom and 800 in the United States each year. Histologically, biphasic and monophasic subtypes can be distinguished. Both subtypes contain the diagnostic translocation t(X;18)(p11.2;q11.2), which results in the fusion of the SYT gene on chromosome 18 with either the SSX1 or the SSX2 gene on the X chromosome.


See also: Sarcomas; Translocation 


Synteny (Syntenic Genes) 
L Silver 
Synteny describes two or more genes or loci that have been mapped to the same linkage group. Conserved synteny refers to the situation where two linked loci in one species (such as the mouse) have homologs that are also linked in another species (such as humans).
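The two definitions above can be checked mechanically from linkage maps and a homolog table. In this minimal sketch the loci, linkage-group labels, and homolog pairings are all hypothetical placeholders, not real gene assignments; a locus pair shows conserved synteny only if it is syntenic in the mouse and its human homologs are syntenic as well.

```python
from itertools import combinations

# Hypothetical maps: locus -> linkage group. These are not real gene assignments.
mouse_map = {"geneA": "M1", "geneB": "M1", "geneC": "M1", "geneD": "M7"}
human_map = {"GENEA": "H2", "GENEB": "H2", "GENEC": "H5", "GENED": "H5"}
# Hypothetical homolog table: mouse locus -> human locus.
homolog = {"geneA": "GENEA", "geneB": "GENEB", "geneC": "GENEC", "geneD": "GENED"}


def syntenic(linkage_map: dict[str, str], locus1: str, locus2: str) -> bool:
    """Two loci are syntenic if they map to the same linkage group."""
    return linkage_map[locus1] == linkage_map[locus2]


def conserved_synteny(locus1: str, locus2: str) -> bool:
    """Syntenic in the mouse, and their human homologs are syntenic too."""
    return (syntenic(mouse_map, locus1, locus2)
            and syntenic(human_map, homolog[locus1], homolog[locus2]))


for a, b in combinations(sorted(mouse_map), 2):
    print(a, b, "syntenic:", syntenic(mouse_map, a, b),
          "conserved:", conserved_synteny(a, b))
```

Here geneA and geneB show conserved synteny; geneA and geneC are syntenic in the mouse only, and the homologs of geneC and geneD are linked in humans but not in the mouse.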


See also: Linkage Map 


Systematics 


See: Taxonomy, Numerical 
Systemic Acquired Resistance (SAR)
Plants defend themselves against pathogens by constitutive barriers and a number of inducible defense mechanisms deployed after contact with a pathogen. Often, a first infection with a fungal, bacterial, or viral pathogen induces resistance toward subsequent infections. To a certain extent, this plant immunization is analogous to immunization in animals. Induced resistance may be expressed locally at the site of infection as well as systemically, in uninfected parts of the plant. This phenomenon was termed ‘systemic acquired resistance’ (SAR) to emphasize the capacity of plants to acquire resistance after an initial infection, even in tissues located remotely from the first infection. This type of resistance is expressed by many plants against a wide variety of pathogens, including organisms unrelated to the inducing pathogen (Table 1). SAR is explained by the production of a signal that is released from the infected leaf and translocated to other parts of the plant, where it induces defense reactions. Nonpathogenic root-colonizing bacteria were also found to induce SAR in leaves. The biochemical nature of the changes induced in infected plants was intensely studied and led to the discovery of a number of proteins termed ‘pathogenesis-related’ (PR) proteins. It was also observed that the simple phenolic compound salicylic acid (SA) can induce PRs in tobacco and protect the plant against tobacco mosaic virus (TMV). Later, SA was shown to be produced by plants locally, at the site of infection, and to be present in the phloem sap and in uninfected systemic leaves, and SA was proposed as a possible endogenous signal for SAR. These observations opened the way for molecular investigations of induced resistance. The field evolved considerably when SAR was found to operate in the genetically tractable system Arabidopsis thaliana.


Reactions after a First Infection 


Generally, the success of the induced defense mechanisms depends on the outcome of the race between the invading pathogen and the reactions of the plant. In compatible interactions, the virulent pathogen is often recognized too late and the plant becomes infected. In incompatible interactions, plants rapidly recognize the avirulent pathogen and their resistance mechanisms efficiently block the invader.

Table 1 Inducing agents and disease agents of plants

Plant
Alfalfa
Arabidopsis thaliana
Asparagus bean
Barley
Bean
Carnation
Cucumber
Muskmelon
Oilseed rape
Pearl millet
Potato
Radish
Red clover
Rice
Sicklepod
Soybean
Stylosanthes guianensis
Tobacco
Tomato
Watermelon

Inducer organism
Colletotrichum lindemuthianum
Turnip crinkle virus
Pseudomonas syringae
Fusarium oxysporum
Pseudomonas fluorescens WCS417
Tobacco necrosis virus
Tobacco rattle virus
Erysiphe graminis f. sp. hordei
Colletotrichum lindemuthianum
Colletotrichum lagenarium
Uromyces phaseoli
Pseudomonas fluorescens
Pseudomonas sp.
Colletotrichum lagenarium
Pseudoperonospora cubensis
Pseudomonas lachrymans
Tobacco necrosis virus
Pseudomonas putida
Serratia marcescens
Colletotrichum lagenarium
Leptosphaeria maculans
Sclerospora graminicola
Phytophthora infestans
Phytophthora cryptogea
Pseudomonas fluorescens
Bean yellow mosaic virus
Pseudomonas syringae
Alternaria cassiae
Colletotrichum lagenarium
Colletotrichum truncatum
Colletotrichum gloeosporioides
Tobacco mosaic virus
Tobacco necrosis virus
Thielaviopsis basicola
Peronospora tabacina
Pseudomonas syringae
Pseudomonas fluorescens CHA0
Phytophthora infestans
Fusarium oxysporum

Systemic protection against
Colletotrichum lindemuthianum
Pseudomonas syringae
Turnip crinkle virus
Pseudomonas syringae
Erysiphe cichoracearum
Botrytis cinerea
Alternaria brassicicola
Fusarium oxysporum f. sp. raphani
Pseudomonas syringae pv. tomato
Tobacco necrosis virus
Erysiphe graminis f. sp. hordei
Colletotrichum lindemuthianum
Tobacco necrosis virus
Pseudomonas syringae pv. phaseolicola
Fusarium
Colletotrichum lagenarium
Cladosporium cucumerinum
Fusarium oxysporum
Pseudomonas lachrymans
Sphaerotheca fuliginea
Tobacco necrosis virus
Colletotrichum orbiculare
Colletotrichum lagenarium
Leptosphaeria maculans
Sclerospora graminicola
Phytophthora infestans
Fusarium oxysporum f. sp. raphani
Pseudomonas syringae pv. tomato
Alternaria brassicicola
Erysiphe polygoni
Magnaporthe grisea
Alternaria cassiae
Colletotrichum truncatum
Colletotrichum gloeosporioides
Thielaviopsis basicola
Phytophthora parasitica
Peronospora tabacina
Pseudomonas syringae
Phytophthora parasitica
Pseudomonas tabaci
Tobacco mosaic virus
Tobacco necrosis virus
Phytophthora infestans
Colletotrichum lagenarium

Figure 1 Schematic, simplified diagram of the signal transduction network operating in SAR. In this diagram, arrows represent a flow of information, and proteins are ordered with respect to the sequence from incoming signals (left side) to the responses (right side). Abbreviations: Eth: ethylene; JA: jasmonic acid; NBS-LRR: nucleotide-binding site leucine-rich-repeat protein; NPR: nonexpresser of PR genes; NDR: non-race-specific disease resistance; PDFs: plant defensins; PRs: pathogenesis-related proteins; SID: salicylic acid induction deficient; SNI: suppressor of NPR1 inducible; TGA: basic leucine zipper (bZIP) transcription factor; TIR-NBS-LRR: toll-interleukin-receptor nucleotide-binding site leucine-rich-repeat protein.

A first infection leads to many changes, some of which may eventually lead to the deployment of various barriers that can block the invading pathogen. These barriers include modifications of the cell wall, such as deposition of lignin. The deposition of this phenolic polymer hinders pathogens in several ways: mechanical reinforcement of the cell wall, formation of a hydrophobic layer that prevents diffusion of water and solutes, and protection of the other cell wall components from the hydrolytic enzymes of the pathogens. Antimicrobial secondary metabolites are produced de novo at the site of infection. These phytoalexins often have a broad, nonspecific activity and represent a toxic barrier to invaders. Activation of a programmed cell death similar to apoptosis in animals also prevents the spread of invaders. This hypersensitive reaction (HR) is genetically determined. It is mostly triggered when the product of a pathogen gene (avirulence gene) is recognized by the product of a host gene (resistance gene) during a so-called gene-for-gene interaction. The synthesis of novel proteins after pathogen attack is perhaps the most intensively documented reaction. These host-encoded PR proteins are induced locally and systemically after pathogen infection. They occur in most plants in which they have been sought and have various biochemical activities (Table 2). Some of the PRs were found to be enzymes, such as β-1,3-glucanases, chitinases, or proteinases, capable of hydrolyzing the cell walls of invading fungal pathogens, while the function of others, for example PR-1, remains unknown. Combinations of PRs (for example glucanase and chitinase) are likely to be most efficient, and different types of PRs are directed at different types of pathogens.
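The gene-for-gene rule can be written as a lookup: an interaction is incompatible only when a resistance gene product in the host recognizes the product of a matching avirulence gene in the pathogen. In this minimal sketch the gene names and the one-to-one recognition table are hypothetical, and, as noted above, the HR is only "mostly" triggered this way.

```python
# Hypothetical recognition table: resistance (R) gene -> avirulence (avr) gene
# whose product it recognizes. Real recognition can be indirect.
RECOGNIZES = {"R1": "avr1", "R2": "avr2"}


def interaction(host_r_genes: set[str], pathogen_avr_genes: set[str]) -> str:
    """Incompatible (HR) if any host R product recognizes a pathogen avr product."""
    recognized = {RECOGNIZES[r] for r in host_r_genes if r in RECOGNIZES}
    if recognized & pathogen_avr_genes:
        return "incompatible: HR, pathogen contained"
    return "compatible: pathogen not recognized in time"


# All four combinations of one R gene and its matching avr gene.
for host in ({"R1"}, set()):
    for pathogen in ({"avr1"}, set()):
        print(sorted(host), sorted(pathogen), interaction(host, pathogen))
```

Only the combination in which the host carries R1 and the pathogen carries avr1 is incompatible; losing either partner makes the interaction compatible.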


The sequence of reactions taking place in a leaf undergoing a first attack by a pathogen has been extensively studied using various mutants of Arabidopsis. After initial recognition of the pathogen by the plant, a cascade of early events is induced that includes ion fluxes, phosphorylation events, and generation of nitric oxide and active oxygen species. SA acts as a secondary signal molecule and is required for increased expression of resistance and of various defense-related proteins such as the PRs. Depending on the inducing microorganism, the signal transduction pathway takes a different course according to the nature of the initial interaction (virulent versus avirulent pathogen, or rhizobacteria). A further level of complexity exists among the incompatible interactions, where the pathway depends on one or the other of two classes of leucine-rich-repeat proteins (LRRs; Figure 1). Resistance against a given pathogen might be activated via different signal transduction pathways. For example, resistance to Pseudomonas syringae induced by leaf pathogens depends on a pathway involving SA, while rhizobacteria-induced SAR acts via the plant hormones ethylene and jasmonic acid (Figure 1). The complexity of these signaling pathways is further illustrated by the occurrence of cross-talk or interference between pathways. For instance, both the induction of PR-1 and the resistance to P. syringae show a strong dependency on the light signal transduction pathway (Figure 1). Another example of cross-talk is given by the signaling pathway used by plants after colonization by rhizobacteria. In this case, the NPR1 protein, which is otherwise part of the SA pathway, is recruited.

Table 2 The families of pathogenesis-related (PR)
proteins

Family    Property
PR-1      Unknown
PR-2      β-1,3-Glucanase
PR-3      Chitinase (types I, II, IV, V, VI, VII)
PR-4      Chitinase (types I, II)
PR-5      Thaumatin-like
PR-6      Proteinase inhibitor
PR-7      Endoproteinase
PR-8      Chitinase (type III)
PR-9      Peroxidase
PR-10     RNase-like
PR-11     Chitinase (type I)
PR-12     Defensin
PR-13     Thionin
PR-14     Lipid-transfer protein

Given the central role of SA in pathogen-induced
signaling for induced resistance, studies have been
directed at understanding the regulation of its produc- 
tion and its molecular mode of action. SA is produced 
from phenylalanine via coumaric and benzoic acid, 
but the exact precursor of SA is still unknown, and 
the enzymes involved in SA biosynthesis have not yet 
been identified or isolated. More work is needed to 
understand the regulation of SA and its localization 
after pathogen attack both locally and systemically. 
Mutants impaired in SA biosynthesis might be a valu- 
able alternative with which to discover enzymes 
involved in SA biosynthesis. The sid1 and sid2 mu- 
tants impaired in SA accumulation after pathogen 
attack represent interesting candidates, and the 
function of these genes is actively being pursued.

The mode of action of SA was investigated by 
searching for SA-binding proteins (SAPs). These SAPs 
include catalase and ascorbate peroxidase. The binding
of SA to such H2O2-scavenging enzymes was
hypothesized to lead to the formation of a phenolic 
radical involved in lipid peroxidation. Lipid peroxi- 
dation products can activate defense gene expression, 
providing a link between SA and defense. It remains to 
be shown that sufficient lipid peroxides are formed by 
such phenolic radicals in the right time-frame for the 
defense response to take place. Another SAP of un- 
known biochemical function shows a higher affinity 
for SA or related functional analogs such as
2,6-dichloroisonicotinic acid (INA) or benzothiadiazole
(BTH; BION™) than catalase, but its biological
relevance remains to be determined. 

Responses induced by SA include transcriptional 
activation of genes. For instance, an SA-inducible protein
kinase (SIPK) belonging to the MAP kinase
family has been identified in tobacco. This SIPK is 
also induced upon infection and is likely to be part
of the chain of phosphorylation events taking place 
downstream of SA. A number of studies have focused 
on the upstream regulatory sequences of the PR-1 gene,
one of the culminating responses in SAR. One in- 
dispensable regulatory element for SA-induced PR-1 
gene expression is a consensus sequence (TGACG) 
recognized by transcription factors of the
bZIP protein family. TGA proteins belonging to the 
plant bZIP transcription factors were shown to bind 
to the TGACG box in the PR-1 promoter of A. thali- 
ana. TGAs were also shown to interact physically 
with NPR1, a 65-kDa ankyrin repeat-containing protein
with homology to IκBα. A further level of regulation
is provided by SNI1, which represses PR gene
expression, presumably by direct binding to a specific 
DNA sequence or via a transcription factor. Regulation
of PR gene expression also involves phosphorylated 
WRKY DNA-binding factors (Figure 1). 


Systemic Signal for SAR 


Evidence for the importance of SA in SAR was provided by
various correlative studies, but most compellingly by
transgenic plants overexpressing a bacterial salicylate 
hydroxylase gene (the NahG gene). When expressed 
in the plant, this enzyme degrades SA effectively. 
Transgenic plants carrying the NahG gene are unable 
to display SAR. The role of SA as a systemic signal was 
critically assessed using leaf detachment assays or 
grafting experiments between NahG-expressors or 
other plants with altered SA levels and wild-type 
plants. These experiments all show that SA is neces- 
sary for the induction of pathogen-induced SAR but 
that a signal other than SA can be translocated to the 
upper leaves and induce resistance. However, SA pro- 
duced in a lower leaf during infection can be tran- 
sported to the upper leaf in sufficient amounts before 
appearance of systemic resistance in that leaf. In con- 
clusion, SA as well as another putative systemic signal 
might be involved in systemic signaling during SAR. 


Reaction in Systemic Tissue 


The systemic responses can be clearly separated from 
the reactions taking place in the infected parts of the 
plant. For instance, the upper leaves of plants inocu- 
lated on the lower leaf have elevated levels of PRs. In 
this case, the systemic signal has triggered defense- 
related reactions before contact with the challenging 
pathogen. In contrast, other reactions such as changes 
in cell wall lignification were only detected after 
challenge infection of the upper leaf but with faster 
induction kinetics. Thus, the systemic signal has con- 
ditioned the tissue to respond faster. Evidence for 
conditioning has been provided using cultured plant
cells. Defense reactions can be induced in cultured 
cells by treatment with elicitors, e.g., molecules 
derived directly from pathogens or released after the 
plant-pathogen interaction. Pretreatment with SA or 
functionally related inducers prior to exposure to an 
elicitor leads to potentiation of the elicitor-induced 
expression of defense-related phenylpropanoid genes 
such as phenylalanine ammonia-lyase (PAL) or 
4-coumarate:CoA ligase. In the same tissue, the
expression of other genes not directly related to 
defense such as mannitol dehydrogenase or anionic 
peroxidase is induced directly by SA. SA as well as 
functional analogs such as INA or BTH appear to 
have a dual function: they directly induce the expression
of some genes and potentiate the expression of elicitor-induced
genes. Similar observations were also made in cucum- 
ber hypocotyls and in whole A. thaliana plants and 
they are similar to the situation in a noninfected upper 
leaf of a plant infected on the lower leaf. Future 
experiments should now be designed to describe 
closely how the systemic signal induces conditioning 
in the induced leaves. 


Prospects 


Much progress has been achieved in the study of SAR 
in the last few years. An increasing number of new 
elements in the signal transduction pathway are being 
discovered and their number will undoubtedly 
increase with the advent of large-scale investigations 
of gene expression. Clearly, to understand how the proteins
encoded by the novel genes operate and
interact, interest is expected to move swiftly to the 
biochemical level for a functional understanding of 
SAR. Among the fascinating questions on SAR are 
those concerning the systemic signal of SAR, its regulation,
and mode of action. SA was initially implicated in this
process, and its role as a key signal in pathogen-induced
SAR is well documented. Its function as a
translocated systemic signal remains a matter of debate.
In the near future, more will be learned about the
regulation and localization of SA synthesis and its
mode of action. The signal transduction involved in 
the regulation of the SAR response turns out to be far 
from a linear chain of events, and several pathways 
interact, partially leading to sets of responses targeted 
to specific pathogens. 

Compounds such as SA represent interesting 
model structures for the development of non- 
antibiotic crop protectants, which trigger the natural 
potential for resistance in various plants. These 
chemicals could be considered as immunostimulants 
by analogy to certain drugs used in humans. A good 
example is BION, a compound recently released on 
the market. 



See also: Arabidopsis thaliana: The Premier Model 
Plant; Rhizobium; Signal Transduction 


T Cell Receptor Gene 
Family 
L Silver 


Copyright © 2001 Academic Press 
doi: 10.1006/rwgn.2001.1275 


The T cell receptor (Tcr) gene family is a member of 
the immunoglobulin gene superfamily. The Tcr gene 
family encodes polypeptides that are placed on the 
surface of the immune cells (called T cells) that provide 
the body’s cellular immune response to foreign viruses 
and bacteria. 


See also: Immunoglobulin Gene Superfamily 
History and Genetics


The t haplotypes of the house mouse are one of the
best-known mammalian examples of meiotic drive. 
t haplotypes are a selfish variant of chromosome 17 
that have evolved the ability to enhance their own 
transmission at the expense of the wild-type homolog. 
They were first discovered in 1927 and were originally 
referred to as ‘t mutations’ or ‘t alleles’ because they 
appeared to be mutant alleles of the T (Brachyury) 
locus on chromosome 17: T/+ animals have short 
tails while T/t animals are tailless (both +/+ and +/t 
mice have tails of normal length). These t mutations
were thought to act in several aberrant pleiotropic 
ways, as they appeared to have effects on embryonic 
development, genetic recombination, male fertility, 
and Mendelian segregation. Subsequently, the com- 
bined use of formal genetic analysis and molecular 
markers revealed that they were instead an extended 
genetic entity with altered chromosomal structure and 
multiple independent loci responsible for the various
phenotypes they express. This extended genetic entity 
became known as a t haplotype (also called the t 
complex). 

t haplotypes can be found worldwide in natural 
populations of all subspecies of the house mouse, 
Mus musculus. Current knowledge of t haplotypes 
indicates that they are genetically complex, and com- 
prise a 20cM (30-40 Mbp) region of the proximal 
third of chromosome 17 (see Figure 1). They are 
defined relative to the wild-type homolog by a series 
of four major, nonoverlapping inversions. These act to 
suppress recombination across the entire region in +/t 
heterozygotes such that the integrity of the t haplo- 
type is maintained, and they are typically transmitted 
as a single genetic entity. Within this region there are 
numerous independent loci which produce the char- 
acteristic effect on tail length, cause embryonic lethal- 
ity and male sterility, and mediate the meiotic drive 
phenotype. 


Transmission Ratio Distortion 


The most characteristic feature of t haplotypes is their 
capacity to distort Mendelian segregation in their 
favor. This is known as transmission ratio distortion 
(TRD). Segregation is normal in +/t females who 
produce offspring in the expected 50:50 Mendelian 
ratios. In contrast, heterozygous +/t males will typ- 
ically transmit the t haplotype to more than 90% of 
their offspring. Early studies demonstrated that meio- 
sis is normal in these males, and that segregation dis- 
tortion occurs postmeiotically. Equal numbers of + 
sperm and t sperm are produced, and thus the TRD 
phenotype is a consequence of the production of func- 
tionally inactivated wild-type sperm. 

Although the mechanism of TRD has not been 
fully resolved, the molecular basis of segregation distortion
is finally beginning to emerge. Multiple independent
loci, the t complex distorters (Tcd) and a t
complex responder (Tcr), interact to cause transmis- 
sion bias in favor of the t haplotype (Figure 1). At 
least three, and perhaps as many as five, distorter loci 
have been identified: Tcd-1, Tcd-2, Tcd-3, Tcd-4,
and Tcd-5. These vary in their individual strength and
effectiveness, and show a cumulative effect on TRD 
such that the extent of distortion is dependent on the
absolute levels of expression of the individual Tcd
genes. Although multiple loci play a role in the expression
of the phenotype, only one (Tcr) is responsible for
determining which of the two homologs is transmitted
at high ratios.

t haplotype and wild-type forms of mouse chromosome 17. Shaded boxes represent the four t-associated inversions, In(17)1 through In(17)4. The second inversion, In(17)2, is believed to have arisen on the wild-type lineage, where it breaks up the Tcp-10 gene family, and not within the t haplotype lineage. Several distorter loci, Tcd-1 through Tcd-3 (and possibly Tcd-4 and Tcd-5), interact with the responder locus, Tcr, to cause the high transmission of the t haplotype. The locations of the t-associated lethal mutations are indicated by stars. The H2 (or mouse major histocompatibility complex) is contained within the fourth inversion, and is displaced proximally relative to its position on the wild-type homolog.

Physiological and genetic studies of sperm have 
demonstrated that motility defects are observed in 
the wild-type sperm from +/t males, and a model 
has been suggested in which the TRD phenotype 
might result from the expression of defective axonemal 
dyneins (the microtubule-associated ATPases that
control flagellar motility, providing locomotive force), 
producing impaired flagellar function of the + sperm. 
A gene encoding an axonemal dynein heavy chain has 
recently been identified as a molecular candidate for 
the most powerful of the distorter elements, Tcd-2. 
Recent studies of the Tcd-1 region indicated that the 
distorter activity of Tcd-1 is independent of any ster- 
ility effects and might be due to more than one inde- 
pendent distorter in the region. A candidate gene for 
Tcr, Tcp-10b, was one of the first candidates to be 
isolated and cloned; however, targeted mutagenesis of
Tcp-10b in t haplotypes failed to eliminate TRD. 


Evolution 


The accumulated evidence from studies addressing the 
origin of t haplotypes suggests that they evolved
through the stepwise assembly of the four inversions. 
Phylogenetic analyses of the DNA sequences of two
t-associated loci (an intron of the Tcp-1 gene in the
second inversion and the Hba-ps4 pseudogene in 
the fourth inversion) suggest that inversion 2 is much 
older (~3 million years) than inversion 4 (~1.5 million
years), and is likely to have been the first inversion
to have arisen. Because the second inversion is so old, 
and can suppress recombination over a region adjacent 
to both the responder (Tcr) and distorter (Tcd) loci, it
is assumed that this inversion was the primary event 
leading to the spread of t haplotypes. The remaining 
inversions probably accumulated subsequently, with 
each one ‘locking in’ additional Tcd loci, increasing the
overall strength of TRD. 

Despite this ancient age, sequence comparisons 
among independent t haplotypes show extremely
reduced levels of nucleotide polymorphism in con- 
trast to the high levels obtained in comparisons 
among independent wild-type chromosomes. Add- 
itionally, there are very few independent t haplotype 
lineages in contrast to the large number of wild-type 
ones that have arisen since their divergence. The finding
that t haplotypes have diverged considerably from
their wild-type homologs, yet have very low levels of
variation among different t haplotypes, has been interpreted
as evidence that all contemporary t haplotypes
may share a more recent common ancestor (dated
from 100 000 to as recently as 10 000 years ago)
which must have spread rapidly, perhaps due to 
drive, across all subspecies of Mus musculus in which 
t haplotypes are now found. In striking contrast with
the low levels of polymorphism found at most t loci
is the large number of recessive lethal alleles associated
with t haplotypes. To date, 16 independent
complementing lethal loci have been identified on t 
haplotypes present in different populations. These 
have presumably accumulated since the recent diver- 
gence of t haplotypes from a common ancestor. This 
empirical finding of a high diversity of recessive 
lethals has led several authors to speculate that reces- 
sive lethality may impart a selective advantage to the t 
chromosome. 


Population Biology 


An obvious consequence of the high transmission of 
t haplotypes is that they should increase in frequency 
and become fixed in natural populations. Yet, like
other meiotic drive systems in Drosophila, they
remain polymorphic, suggesting that strong
natural selection may act against drive systems in gen- 
eral. Two counterbalancing forces account for why t 
haplotypes have not become fixed. First, all males 
homozygous for t haplotypes (t/t) are completely 
sterile due to motility defects of all of their sperm. 
Second, most t haplotypes carry recessive lethal mutations.
Because t haplotypes may carry different recessive
lethals, this results in two possible outcomes. All
mice homozygous for the same lethal t haplotype (e.g.,
t^x/t^x) die early in gestation, while mice carrying two
t haplotypes with different, complementing lethals
(e.g., t^x/t^y) are viable, but male-sterile. Nevertheless,
these counterbalancing forces do not account for why 
the frequencies of t haplotypes in natural populations 
are so low. Mathematical models indicate that for a 
lethal t haplotype with a TRD of 95%, a high equilibrium
frequency of t haplotypes should result, such
that about 77% of wild mice should be heterozygous 
for a t haplotype. All field studies, however, have 
found that far fewer wild mice, as few as 10-15%, 
actually carry t haplotypes. 
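The 77% figure can be reproduced with a simple deterministic model, sketched below in Python under stated assumptions: an infinitely large, randomly mating population with discrete generations; all t/t zygotes die (a lethal t haplotype); +/t males transmit t with probability m and +/t females with probability 1/2; and +/t animals carry no other fitness cost. Setting the frequency h of +/t adults equal in successive generations gives the stable equilibrium h* = 1 − sqrt((1 − m)/m). The function names are illustrative and not taken from any published model.

```python
import math


def next_generation(h: float, m: float) -> float:
    """Frequency of +/t among surviving zygotes in the next generation.

    Illustrative assumptions: random mating, t/t zygotes die, +/t males
    transmit t with probability m, +/t females with probability 1/2, and
    +/t animals have no other fitness cost.
    """
    p_t_egg = h / 2.0   # probability that an egg carries t
    p_t_sperm = h * m   # probability that a sperm carries t
    het = p_t_egg * (1 - p_t_sperm) + (1 - p_t_egg) * p_t_sperm
    wild = (1 - p_t_egg) * (1 - p_t_sperm)
    # t/t zygotes (p_t_egg * p_t_sperm) are lost; renormalize over survivors.
    return het / (het + wild)


def equilibrium(m: float) -> float:
    """Closed-form stable equilibrium of next_generation; 0 if t cannot persist."""
    if m <= 0.5:
        return 0.0
    return 1.0 - math.sqrt((1.0 - m) / m)


h = 0.01  # start with a rare t haplotype
for _ in range(500):
    h = next_generation(h, m=0.95)

print(f"{h:.3f} {equilibrium(0.95):.3f}")  # → 0.771 0.771
```

With m = 0.95 the iteration and the closed form agree at about 0.771, matching the 77% quoted above; with m = 0.90 the formula gives 1 − 1/3 ≈ 0.667. Either way the prediction is far above the 10-15% observed in the field.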

Several forces have been proposed that might 
maintain a low frequency of t haplotypes. Theoretical 
studies show that strong selection to reduce the trans- 
mission bias of drive chromosomes will favor the 
spread of genes that suppress meiotic drive, and this 
kind of genetic suppression has been described for several
drive systems in Drosophila. In t haplotypes, by contrast, the
evidence for modifiers of TRD is mixed. There is some
evidence for a general effect of genetic background in 
long-term laboratory studies. However, studies on 
transmission ratio in matings from wild mice have so 
far found no evidence for reduced TRDs, suggesting 
modifiers are not common in natural populations. 

Why are modifiers of drive not prevalent in this 
system when they have evolved to counteract drive in 
many other systems? Evidence so far suggests that
selection is acting on other components of fitness 
instead. Studies of field-inseminated litters have 
found that multiple matings of +/+ females with 
both +/t and +/+ males can effectively lower TRD 
in a given litter from 90% down to 20%. Selection 
has also been demonstrated to be acting against +/t 
heterozygotes in several studies. Mean litter size has 
been shown to be ~20% less for litters produced by 
either +/t males or +/t females relative to wild-type 
(+/+) litters, and this is likely to be due to a reduced 
viability of +/t embryos in utero. Additionally, a 
broad pattern emerging from empirical studies sug- 
gests a relationship between t haplotypes and popu- 
lation size and structure, with genetic drift or 
inbreeding also contributing to lowering t frequencies
in larger populations. The combined effects of even 
a small reduction in TRD, with a 20% reduction in 
heterozygote fitness, and moderate levels of popu- 
lation subdivision can be shown to considerably 
lower t frequencies in simulation studies, and thus a 
combination of interacting population-level effects 
seems most likely to account for the low frequencies 
of this strong meiotic drive chromosome in natural
populations. 
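How these forces combine can be explored by extending the same kind of deterministic drive model. In the hypothetical Python sketch below, +/t zygotes survive with relative viability w_het (a crude stand-in for the roughly 20% reduction in heterozygote fitness described above), and the male transmission ratio m can be lowered. Population subdivision, drift, and multiple mating are not modeled; capturing them would need a stochastic, structured simulation. All names are illustrative.

```python
def next_generation(h: float, m: float, w_het: float) -> float:
    """One generation of the drive model with a heterozygote viability cost.

    Hypothetical extension: +/t zygotes survive with relative viability
    w_het (1.0 = no cost), t/t zygotes die, +/t males transmit t with
    probability m, and +/t females with probability 1/2.
    """
    p_t_egg = h / 2.0
    p_t_sperm = h * m
    het = p_t_egg * (1 - p_t_sperm) + (1 - p_t_egg) * p_t_sperm
    wild = (1 - p_t_egg) * (1 - p_t_sperm)
    return w_het * het / (w_het * het + wild)


def long_run_frequency(m: float, w_het: float, h0: float = 0.1,
                       generations: int = 2000) -> float:
    """Iterate the model from h0 and return the final +/t frequency."""
    h = h0
    for _ in range(generations):
        h = next_generation(h, m, w_het)
    return h


for m, w_het in [(0.95, 1.0), (0.95, 0.8), (0.90, 0.8), (0.70, 0.8)]:
    freq = long_run_frequency(m, w_het)
    print(f"TRD={m:.2f}  w_het={w_het:.1f}  +/t frequency={freq:.3f}")
```

Under these assumptions a rare t haplotype can spread only if w_het × (0.5 + m) > 1. With m = 0.95, a 20% heterozygote cost lowers the equilibrium +/t frequency from about 0.77 to about 0.48, and if m also falls to 0.70 the t haplotype is lost. Values that stay well above the observed 10-15% are consistent with the suggestion that population structure and drift also contribute.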


See also: Meiotic Drive, Mouse; Mus musculus 
Bacteriophages have been major research tools for
molecular biology; the history of phage research in 
the West is virtually a history of molecular biology 
itself. Many early advances in understanding the 
detailed biology of phage infection were made by 
Max Delbrück (1939), Salvador Luria, Alfred Hershey
(Hershey and Chase, 1952) and their students such as 
A.H. Doermann, using a select group of phages, the 
‘T phages,’ that are still extremely important. Until 
1944, various laboratories had used many different 
kinds of phages and bacteria, often isolating them 
themselves, making it virtually impossible to compare 
results between laboratories. Delbrück set up a phage
training course, regular (still-ongoing) meetings at 
Cold Spring Harbor, Long Island, and a “phage 
truce.” He convinced the community of American 
phage investigators to focus on seven phages selected 
by Demerec and Fano (1945), grown on Escherichia 
coli strain B in nutrient broth at 37 °C. 
These phages, numbered T1-T7 (T for ‘type’), are
all well-behaved in that they give clear, easily count- 
able plaques with virtually 100% plating efficiency and 
show no confusing phenomena such as lysogeny. T2, 
T4, and T6 (the ‘T-even’ phages) happen to be closely
related to each other, as are T3 and T7. Much of the 
early work focused on the T-even phages, the largest 
of the group and structurally the most complex, with a 
genome size of 169 kb and contractile tails that assign 
them to the family Myoviridae. T3 and T7 are Podo- 
viridae, with short stubby tails. They are about a 
quarter the size of the T-even phages and distinguish 
themselves particularly by producing their own 
phage-directed RNA polymerase to transcribe their 
late genes. This polymerase and its associated, distinct 
promoters have been useful for cloning work, particu- 
larly when potentially very toxic gene products are 
involved. T5 belongs to the Siphoviridae, with a long, 
flexible noncontractile tail and an icosahedral head 
(90 nm in diameter); its genome is about two-thirds
the size of T4’s. The last of the group, T1, is also a 
member of the Siphoviridae; it has a 60-nm icosahe- 
dral head and a genome size of about 48.5 kb, and 
looks much like the temperate bacteriophage lambda. 
It has been studied much less than the others, mainly 
because it is so difficult to contain in the laboratory; 
unlike the other T phages, it survives drying, and thus 
often turns up in unexpected and undesired places. 
The T phages infect various strains of E. coli and 
Shigella dysenteriae to varying degrees. While all infect 
most common laboratory strains, T1, for example, 
only replicated in 2 of 290 clinical isolates of E. coli,
and T7 also infects very few wild strains. Relatives of 
the various T phages can be found infecting most or all 
of the gram-negative bacteria, but none have yet been 
seen invading gram-positive bacteria or Archaea. 

T2, T4, and T6 are so closely related that they can 
recombine with one another in a mixed infection, as 
do T3 and T7. The T-even phages show dominance 
over virtually all other phages in mixed infections, 
inhibiting their synthesis just as they do that of the 
host even when the other phages are well into their 
infection cycle. T-even phages further distinguish 
themselves by using an odd base, 5-hydroxymethyl- 
cytosine (HMC) rather than cytosine in their DNA. 
This substitution plays a key role in many aspects of 
the mechanisms the phage uses to efficiently subvert 
the host to its purposes and to avoid attack by most 
host restriction endonucleases. 

Here we focus primarily on the T-even phages 
because of their central role in elucidating many funda- 
mental processes and control mechanisms. For ex- 
ample, T2 was used to first demonstrate that viruses 
encode enzymes and that DNA is the genetic material. 
Other important advances using T2 and/or T4 include
demonstrations of the colinearity of gene and protein; 
the nonoverlapping triplet nature of the code, with 
three specific triplets set aside to signal “terminate 
here”; the existence and properties of mRNA; the 
processes leading to the assembly of complex func- 
tional structures; the mechanism of DNA replication; 
and the occurrence of DNA restriction and modifi- 
cation. We will, however, begin by summarizing the 
properties of the three families of T-odd phages. 


T-Odd Phages 


Bacteriophages T7 and T3 

Bacteriophage T7 has played several important roles 
in the development of molecular biology. It was the 
first of the larger phages to be fully sequenced (by
J.J. Dunn and F.W. Studier in 1983), and the functions
of most of its genes were soon identified. It encodes its 
own RNA polymerase, which transcribes 10 times as 
fast as the host polymerase. This T7 enzyme has been 
used extensively in developing expression systems in 
which a cloned protein can be so overproduced that it 
forms as much as half of the total cell protein — a 
tremendous bonus for gene engineers. 

T7 is the prototype of the Podoviridae phages, 
with a stubby, noncontractile external tail about
10 × 20 nm. The tail also has an intraviral portion that
expands after attachment to the target cell, forming
a complex organ for DNA transfer into the cell. The 
DNA transfer process always starts from the left end on 
the standard genomic map, aided by host-polymerase 
transcription of the first 19% of the genome from 
three strong promoters that are located within the 
first 750 bp. The genes in this ‘first-step transfer’ portion
encode inactivators of the host restriction enzyme
and of the host deoxyguanosine triphosphate (dGTP)
triphosphohydrolase, a protein kinase that also shuts off
host-catalyzed transcription by a phosphorylation- 
independent mechanism, a DNA ligase and the new 
single-subunit RNA polymerase. This polymerase has 
significant homology with the Saccharomyces cerevi- 
siae mitochondrial RNA polymerase. It is required to 
transcribe (and draw in) the remainder of the genome, 
transcribing first a cluster of genes involved primarily 
in DNA metabolism and then, from stronger pro- 
moters, the genes responsible for the phage capsid. 
The promoters for this polymerase consist of a highly 
conserved sequence between bases -17 and +6 relative
to the transcription start site. The polymerases from T7,
T3, and the related Salmonella phage SP6 and Klebsiella
phage K11 show little recognition of one another's
promoters, but changes in a single amino
acid can interconvert the T3 and T7 specificities. T7 
has 10 promoters for the middle genes, five for the late 
genes, and one to initiate replication. 
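A widely cited consensus for the T7 polymerase promoter over the -17 to +6 region is TAATACGACTCACTATAGGGAGA, with transcription starting at the first G of GGG. As a rough illustration, the Python sketch below scans one strand of a sequence for that consensus, optionally allowing a few mismatches. The function name and mismatch threshold are hypothetical choices, and simple mismatch counting is only a stand-in for real polymerase-promoter recognition.

```python
# Widely cited consensus for T7 RNA polymerase promoters,
# covering positions -17 to +6 relative to the transcription start (+1).
T7_CONSENSUS = "TAATACGACTCACTATAGGGAGA"
TSS_OFFSET = 17  # index of +1 (the first G of GGG) within the consensus


def find_t7_promoters(seq: str, max_mismatches: int = 0) -> list[int]:
    """Return 0-based positions of predicted +1 sites on the given strand.

    Simple Hamming-distance scan against the consensus; it does not model
    which positions the polymerase actually tolerates being changed.
    """
    seq = seq.upper()
    k = len(T7_CONSENSUS)
    hits = []
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        mismatches = sum(a != b for a, b in zip(window, T7_CONSENSUS))
        if mismatches <= max_mismatches:
            hits.append(i + TSS_OFFSET)
    return hits


construct = "ccccTAATACGACTCACTATAGGGAGAatgaaa"
print(find_t7_promoters(construct))  # → [21]
```

Only the given strand is searched; promoters on the other strand would be found by running the same scan on the reverse complement.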


T7 DNA replicates as a linear molecule and then 
forms concatemers using unreplicated 160-bp terminal
repeats that are later duplicated during the packaging
process. Growth of T7 and many of its relatives
(but not T3) is inhibited by F plasmids. This inhibition 
involves specific interactions with the F-factor pif 
gene and causes inhibition of membrane functions 
and of all macromolecular synthesis. Several other 
prophages and resident plasmids can inhibit infection 
by T7 or by particular T7 mutants. The RNA poly- 
merases and cognate promoters from T7 and several of 
its relatives have been used to generate tightly con- 
trolled, high-level expression vectors capable of pro- 
ducing as much as 50% of the cellular protein as a 
desired cloned product. The promoter sequences are 
rare enough that these can be engineered to work even 
in eukaryotic cells. 


Bacteriophage T5 

The DNA of T5 is about 121 kb long, with 10-kb
terminal repeats, unique ends, and four nicks at precise 
sites in one strand. The DNA enters the cell in a 
two-step process. The left terminal repeat enters 
first, and pre-early genes it encodes completely shut 
off host replication, transcription, and translation, 
block host restriction systems, and degrade the host 
DNA to free bases and deoxyribonucleosides that are 
ejected from the cell. This first-step transfer DNA 
segment also encodes genes needed for the rapid 
entry of the rest of the genome once this process is 
complete, for shutoff of the pre-early genes, and for 
the orderly expression of early and then late genes 
from the rest of the genome. T5 encodes a variety 
of genes for enzymes of nucleotide metabolism and 
DNA synthesis and for modifiers of RNA polymerase. 
Circular DNA molecules with a single copy of the 
terminal repeat are found inside the cell; it appears to 
replicate in a rolling-circle mode. Precut genomes 
containing both terminal repeats are inserted into the 
preformed heads. About 25 kb of the genome, in three 
large blocks, is in principle deletable; this includes 
genes for tRNAs for all 20 amino acids as well as a 
number of open reading frames (ORFs). However, 
unless there is a compensating insertion, not more 
than 13.3 kb can be deleted and still have the DNA 
packaged. 


Bacteriophage T1

T1 uses proteins of the iron-transport pathway to enter 
the cell, first binding reversibly to membrane protein 
TonA, then irreversibly to TonB. It is the only one of
the T phages requiring an energized cell membrane for 
irreversible binding; its DNA entry is effected by a 
proton symport involving TonB. A transient fall in
proton-motive force (PMF), ATP, and GTP inhibits
the initiation of translation of host proteins while
allowing phage transcription and translation to pro- 
ceed. However, host transcription continues until the 
host DNA is degraded — a process that is tightly 
coupled to phage DNA synthesis (though not required 
to produce phage DNA). Initiation of phage DNA 
synthesis requires phage proteins, but elongation is
carried out by the host pol III α-subunit, a mode of replication
that is common for temperate but not for lytic
phages. The latent period of T1 is only about 13 min, 
with a burst size of approximately 100. The DNA is 
packaged from concatemers by a headful mechanism.
Early in the cycle (or in the absence of host DNA 
degradation), it can package host DNA to produce 
generalized transducing phages; these carry various 
parts of the host genome in predictable ratios. How- 
ever, the transducing phage can be observed only with 
double amber mutant strains plated on nonpermissive 
recipients, since otherwise all of the transductants are 
killed by the large excess of viable, virulent phage. 


T-Even Phages 


The T-even phages include not only T2, T4, and T6 
but also hundreds of other phages that have been 
isolated around the world: for example, from the 
sewers of Long Island, animals in Denver Zoo, and 
patients recovering from dysentery in the USSR and 
eastern Europe. They have been found in substantial 
quantity everywhere they have been sought, such as in 
habitats supporting the growth of E. coli and/or Shi- 
gella sp., and some have also been isolated for acineto- 
bacters, aeromonads, pseudomonads, and vibrios. 
Many have been tabulated by Ackermann and Krisch 
(1997) and by Kutter et al. (1996). Although the early 
work was carried out with T2, the development of a 
detailed genetic system for T4 led to it becoming the 
primary object of study. T4 is the only member of the 
family to have been completely sequenced — a task that 
involved the coordinated efforts of the extensive T4 
community — and has been studied so extensively that 
it is the subject of two entire books. All of the T-even 
phages share the same general gene organization and 
most of the same genes. The main known differences 
among them are in the tail tips and receptors recog- 
nized, in some modifications of the DNA, in the 
complements of genes for tRNAs and for internal 
proteins packaged in the capsid, and in some
additional nonessential genes. Each of the known phages
in this family has been isolated only once from nature, 
emphasizing the extent and variability of the family. 
T4’s spaceship-like capsid (Figure 1) carries
168 903 bp worth of genetic information, coding for 
about 300 genes. Only 161 of these have been function- 
ally characterized (Figure 2). Genes of related function 
are largely clustered. About 40% of the genome codes
for proteins involved in the phage’s complex structure 
and its assembly: 24 for head morphogenesis, 10 of 
which encode structural components, and 26 used in 
the tail and tail fibers, with five additional proteins 
needed for tail assembly. The six tail fibers are attached 
to a hexagonal baseplate; the double tail shaft consists 
of a hollow tail tube built up on the baseplate and sur- 
rounded by the contractile sheath, which has a mol- 
ecule of GTP bound to each subunit to provide the 
energy for contraction. With this complexity and its 
ability to largely carry out its assembly in vitro, T4 has
served extensively as a model system for studying 
protein self-assembly and mediated-assembly pro- 
cesses. The T4 DNA is linear, but the molecules are 
circularly permuted, ending at many different sites in 
the genome, with a terminal redundancy of about 6%, 
so the genetic map is circular; the physical basis for this
phenomenon is discussed in detail elsewhere (see
Bacteriophage Recombination).
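A toy model can make the link between headful packaging of concatemers, circular permutation, and terminal redundancy concrete. The sketch below is illustrative only: it assumes a hypothetical 10-unit "genome" written as letters, a concatemer of six genome copies, and a headful of 12 units, i.e. 20% terminal redundancy (deliberately larger than the roughly 6% stated for T4 so the repeated ends are easy to see). The function names are invented for this example.

```python
from __future__ import annotations


def headful_packaging(genome: str, copies: int, headful: int) -> list[str]:
    """Cut successive headfuls from a concatemer of `copies` genome units.

    Because a headful is longer than one genome, each packaged molecule
    repeats its first bases at its far end (terminal redundancy), and each
    new cut starts where the previous headful ended, so successive
    molecules begin at different points in the genome (circular permutation).
    """
    if headful <= len(genome):
        raise ValueError("a headful must exceed one genome length")
    concatemer = genome * copies
    molecules = []
    start = 0
    while start + headful <= len(concatemer):
        molecules.append(concatemer[start:start + headful])
        start += headful  # the next head is filled starting at this cut
    return molecules


def terminal_redundancy(molecule: str, genome_length: int) -> str:
    """Return the sequence present at both ends of a packaged molecule."""
    extra = len(molecule) - genome_length
    return molecule[:extra]


genome = "ABCDEFGHIJ"  # hypothetical 10-unit genome
for mol in headful_packaging(genome, copies=6, headful=12):
    print(mol, "repeated ends:", terminal_redundancy(mol, len(genome)))
```

Every molecule printed is a rotation of the same genome with a short repeat at its ends, which is why a linear genome of this kind still gives a circular genetic map.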


T4 rapidly directs the bacterial cell to stop making
all of its own macromolecules (DNA, RNA, and
protein) and turns it into a factory for making more
T4. This transition involves a finely orchestrated series 
of developmental steps; the times given in the follow- 
ing list refer to infection at 37 °C in rich medium under 
strong aeration, the conditions under which most stud- 
ies have been carried out. (Recent work indicates that 
T-even phages also grow well anaerobically, as in the 
mammalian gut, but few details are available for anaerobic
fermentation and virtually none for anaerobic
respiration, where no glucose is present and molecules
such as nitrate or fumarate act as electron acceptors in
place of oxygen.)

Figure 1 This electron micrograph of bacteriophage T4 was kindly donated to the phage archives at Evergreen State College by Michael Wurtz, Basel, Switzerland.

Figure 2 Genomic map of bacteriophage T4. Sizes and positions of all the characterized genes have been determined on the basis of the DNA sequence. In addition, there is a similar number of open reading frames whose functions have not yet been determined; there are only a few kilobase pairs of apparently noncoding DNA in the genome. (Diagram prepared by Burton Guttman and Elizabeth Kutter, Evergreen State College.)

Figure 3 The normal course of T4 infection at low multiplicity of infection. The only difference at high multiplicity is that the time of lysis is delayed by several hours, and synthesis of DNA, structural proteins, and phages continues, albeit more slowly.

To summarize the infection process (Figure 3):

1. As the T4 DNA is injected, the host RNA
polymerase binds to nearly 50 strong promoter regions,
leading to transcription of the immediate-early
genes. The products of these genes are mainly
small proteins and are primarily involved with
shutoff of host functions. Most are made only for
the first 3-5 min after infection.

2. A second group of early proteins is made starting 
about 3 min after infection. Some of these delayed- 
early proteins help replicate T4 DNA. Others are 
nucleases that degrade the host DNA, and some are 
proteins that further modify the host RNA poly- 
merase to allow recognition of the late genes and 
reduce early-gene transcription. 

3. Phage DNA synthesis starts about 5 min after 
infection, mediated by an intricate complex of 
eight proteins. Nucleotides are efficiently fed into 
this ‘replisome’ by a second complex made up of 
nucleotide-synthesizing enzymes, most of which 
are phage-encoded. The daughter DNA molecules 
recombine extensively, resulting in a complicated, 
multibranched ball of replicating DNA that can 
include over a hundred genome equivalents. 

4. Synthesis of late phage proteins, mostly those that 
constitute the phage capsid, starts about 7 min after 
infection. Meanwhile, synthesis of the T4 enzymes 
of DNA synthesis gradually stops. If anything 
blocks phage DNA synthesis, such as an antibiotic 
or a mutation in a gene essential for phage DNA 
replication, synthesis of many T4 early enzymes 
continues, as if trying to bypass the block by sheer 
numbers. Furthermore, no structural proteins are
made; the system is regulated so that it does not
make capsids until there are phage DNA molecules
available to be packaged in them.


5. Phage heads, tails, and tail fibers assemble via
independent pathways. The heads assemble bound to
the cell membrane, assisted by chaperonins. Then
each of them is packed with a headful of DNA
from the replicating complex; single-strand
breaks are repaired and branches are resolved in
the process. Tails and tail fibers are then added.


6. Cell lysis generally occurs about 25-30 min after
low-multiplicity infection. Oxidative metabolism
suddenly stops, and lysis is mediated by the action 
of T4 lysozyme, whose mechanism is similar to that 
of known eukaryotic lysozymes. The lysozyme 
begins to be synthesized relatively early in infec- 
tion, but it has no access to the peptidoglycan layer 
until after the synthesis and proper assembly of a 
holin that is encoded by the t gene. The released 
phages (about 100-200 per cell) are then ready to 
start another cycle of infection. 


7. If T4 is forewarned during its intracellular phase
that there is something of a bacterial shortage, it has
yet another growth strategy, ‘lysis inhibition.’ This 
process is initiated if another T-even phage tries to 
get into a cell already infected by T4, indicating an 
overabundance of phage relative to cells and signal- 
ing that the best strategy for reproduction is to 
delay lysis. Instead of lysing the cell after only half
an hour, the virus somehow maintains the cell intact
for 4-6 h through a mechanism that depends on an
interaction between a T4-encoded periplasmic protein,
rI, and the t-encoded holin in response to the
secondary infection. The phages are thus protected
from a relatively inhospitable environment for a
longer period of time, and more phages can be made.
Expansion of the phage population is clearly slower under these lysis-inhibition
conditions than when a new round of infection is 
initiated every half-hour, but this is a more effective 
strategy when the bacterial population is limited. 
Most of the T-even phages show lysis inhibition, 
but the details of the timing and extent vary among 
the different phages and are affected to some degree 
by the specific host being infected. No such phenom- 
enon has yet been reported for phages from non-T- 
even families. 


One T4 particle is enough to cause a normal infection. 
When several T4 infect E. coli at the same time, they 
peacefully coexist, mutually complement any genetic 
defects they may have, recombine extensively with 
each other, and produce progeny with all possible 
combinations of the available genetic information. 
However, if more than 20-40 phages try to infect 
the same cell simultaneously, they damage the bacter- 
ial membrane so much that the cell just disintegrates; 
this phenomenon is called ‘lysis from without.’ T4 
virions, like those of many other viruses, can remain 
viable for many years, unless they dry out, their DNA 
is damaged by radiation, their tail fibers get knocked 
off, or their DNA is released by osmotic shock before 
they encounter a susceptible host. 


Foundations of Viral Genetics 


Mutant Phages 

No technical innovation has been more productive
than the development of genetic analysis, which
depends upon finding mutants and using them to elucidate
the normal structure and operation of specific
systems. Phage genetics began with the recognition
of the existence of T2 and T4 plaque-morphology
mutants. Turbid (tu) mutants make somewhat cloudy
plaques; minute (mi) mutants make small plaques;
and rapid-lysis (r) mutants make larger-than-normal
plaques with sharp edges. Wild-type T4 cannot grow
on a phage-resistant strain of E. coli B (B/4). Host-range
(h) mutants have altered adsorption properties,
so they can grow on B/4 and thus can form clear
plaques on mixed B and B/4 indicator bacteria,
where h+ phages make turbid plaques, since growth
of the B/4 bacteria continues unimpeded. The ability
to find such phage and bacterial mutants shows how
specific the attachment of the phage to the cell surface
is; the resistant bacteria do not adsorb the phage in
question because the requisite surface structures have
been altered, and the phage mutants have altered
adsorption structures suited to the new bacterial
morphology.

Hershey and Rotman (1948) used these various 
plaque morphology mutants to demonstrate genetic 
recombination in phages. R.S. Edgar and R.H. Epstein 
recognized that T4 could only be properly explored 
genetically if one could collect a large number of 
mutants that might, in principle, affect any gene. 
They therefore searched for, and found, temperature- 
sensitive (ts) mutants, able to grow at 30°C but not at 
42°C. Since any protein of the phage might become 
inactivated at high temperature through a change in 
one of its amino acids, ts mutations might be obtained 
in any gene. However, they can only be observed if the 
absence of that gene product is very deleterious to the 
phage under the growth conditions being used. They 
also identified a second general type of conditionally 
lethal mutant, a host-dependent type that is able to 
grow in certain bacterial hosts but not others. Seymour
Benzer had already shown that the rII mutants are
host dependent, growing readily in E. coli strain B but
not in strain K, although it soon became apparent that
the critical factor is really that the K strain they used
carried the unrelated temperate phage lambda. Epstein
and Charles Steinberg (Epstein et al., 1963) searched
for anti-rII mutants that could grow in strain K but
not in B, and found many of them. But in contrast to
rII mutants, these mutants mapped in many genes
around the T4 genome. It thus became evident that
these mutations were of a general type that might
occur in any gene. They were subsequently named
amber (am) in honor of Harris Bernstein, the graduate
student who helped them map a large number of
mutants (the German word ‘Bernstein’ means
‘amber’). Such am mutants have since been identified
in many organisms and viruses; they involve mutation
to the UAG (‘amber’) stop codon in the middle of the
gene and thus lead to premature termination.
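The effect of an amber mutation can be illustrated with a toy translation. The sketch below assumes a hypothetical 18-nt open reading frame, written in DNA letters, and a deliberately partial codon table covering only the codons it uses. A single C-to-T change turns a CAG (glutamine) codon into TAG, the amber stop codon, and truncates the product. The optional `amber_suppressor` argument is an added assumption that mimics a suppressor host (a strain carrying a tRNA that reads TAG as an amino acid), which is the usual basis for am mutants growing in some hosts but not in others.

```python
from typing import Optional

# Partial codon table (DNA sense strand): only the codons used below.
CODON_TABLE = {
    "ATG": "M", "AAA": "K", "CAG": "Q", "GGC": "G", "TGG": "W",
    "TAA": "*", "TAG": "*", "TGA": "*",  # stop codons; TAG is "amber"
}


def translate(orf: str, amber_suppressor: Optional[str] = None) -> str:
    """Translate an ORF from its first codon until a stop codon.

    If amber_suppressor is a one-letter amino acid code, TAG is read as
    that amino acid instead of as a stop (a suppressor host).
    """
    protein = []
    for i in range(0, len(orf) - 2, 3):
        codon = orf[i:i + 3]
        if codon == "TAG" and amber_suppressor:
            protein.append(amber_suppressor)
            continue
        residue = CODON_TABLE[codon]
        if residue == "*":
            break  # normal or premature termination
        protein.append(residue)
    return "".join(protein)


wild_type = "ATGAAACAGGGCTGGTAA"              # hypothetical ORF: M K Q G W stop
amber = wild_type[:6] + "T" + wild_type[7:]   # CAG -> TAG at codon 3

print(translate(wild_type))                    # MKQGW
print(translate(amber))                        # MK (truncated)
print(translate(amber, amber_suppressor="Q"))  # MKQGW
```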


Genetic Fine-Structure Analysis 

Seymour Benzer (1955) carried out a now-classic series
of experiments with the rII mutants that revealed
basic facts about genetic fine structure, which can only
be investigated if one can detect very small numbers of
recombinants among large numbers of progeny. He
took advantage of the fact that when two rII mutants
are crossed, any wild-type recombinants are easily
detectable because they alone will plate on K(λ),
while all progeny plate on B. Most of the rII mutants
carry point mutations, which revert (back-mutate) to
wild-type at measurable rates. However, Benzer also
found rII mutants that involved deletions: they do not
revert to wild-type and they fail to recombine with
two or more point mutants at different sites. If any
two such deletions overlap (that is, delete some common
stretch of the genome), they also will be unable
to recombine. Eventually he was able to arrange 145
deletions in an unambiguous sequence. His experiments
thus showed that the phage genome is topologically
linear, as it should be if its information is
simply encoded in a DNA molecule. Given the deletion
map, it is then easy to map any point mutation by
crossing it against the deletions. It is first localized
roughly by crossing against sets of deletions; then its
position relative to nearby point mutations is determined
by standard crosses. Using this procedure,
Benzer determined that the rII point mutations map
at a large number of sites, some so close to each other
that there must be changes at neighboring nucleotide
pairs in the DNA. These experiments confirmed that
the phage genome can be understood as a simple
DNA molecule, with mutations being changes in any
nucleotide.
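The logic of deletion mapping can be written down compactly. In this sketch the rII region is reduced to hypothetical numbered sites, each deletion is an interval of missing sites, and a cross "gives recombinants" when wild-type phage would appear on K(λ). A point mutation can yield wild-type recombinants with a deletion only if the deletion leaves its site intact, so the deletions that fail to recombine pin the mutation to the region they all remove. Site numbers, deletion names, and the cross results are invented for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Deletion:
    name: str
    start: int  # first missing site (inclusive)
    end: int    # last missing site (inclusive)

    def covers(self, site: int) -> bool:
        return self.start <= site <= self.end


def gives_recombinants(deletion: Deletion, site: int) -> bool:
    """Wild-type recombinants are possible only if the deletion spares the site."""
    return not deletion.covers(site)


def deletions_overlap(a: Deletion, b: Deletion) -> bool:
    """Two deletions sharing missing sites cannot recombine to wild type."""
    return a.start <= b.end and b.start <= a.end


def localize(deletions: list[Deletion], results: dict[str, bool], n_sites: int) -> list[int]:
    """Return every site consistent with the observed cross results.

    results maps each deletion name to True if the cross with the unknown
    point mutant produced wild-type recombinants.
    """
    return [
        site
        for site in range(n_sites)
        if all(gives_recombinants(d, site) == results[d.name] for d in deletions)
    ]


deletions = [Deletion("D1", 0, 9), Deletion("D2", 5, 14), Deletion("D3", 12, 19)]
# Hypothetical outcome: no recombinants with D1 or D2, recombinants with D3.
results = {"D1": False, "D2": False, "D3": True}
print(localize(deletions, results, n_sites=20))       # [5, 6, 7, 8, 9]
print(deletions_overlap(deletions[0], deletions[1]))  # True
print(deletions_overlap(deletions[0], deletions[2]))  # False
```

The remaining ambiguity (sites 5 to 9 in this example) is what the subsequent standard crosses against nearby point mutants would resolve.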


Complementation and Operational Definition of a Gene

Mapping a series of mutations, even those very close 
to one another, still leaves open the question of bound- 
aries between genes. Theoretically, a gene is best 
understood as the region that encodes a single poly- 
peptide; operationally, there must be some way to 
delimit such a region. Benzer determined that the rII
mutations actually fall into two neighboring genes,
both required for growth in K(λ). He infected
K(λ) cells simultaneously with any two rII mutants,
reasoning that if their defects are in different genes,
each one can supply a function that the other lacks;
they are able to complement each other and produce
an infection yielding viable phage. If, on the other
hand, both mutations fall in the same gene, the two
phages together are no better off than each one by
itself, and they cannot grow. This complementation
test is thus an operational definition of a gene. When
applied to the rII mutants, it allowed unambiguous
assignment of each mutation to one of the two neighboring
genes. (The well-defined boundary between
them was then chosen as the origin of the T4 genetic
map.) The same test can be used with any conditional-lethal
(am or ts) mutants. Note that complementation is quite different
from recombination. In recombination tests, the question
is whether, and with what frequency, two genomes
can recombine their information to produce a new
genome; one must wait until the next generation to
determine the answer. In complementation tests, the
question is whether two genomes, each missing some
functional unit, can mutually supply gene products
(generally proteins) to produce a normal function
and, thus, viable phage under otherwise nonpermissive
conditions.
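The complementation test can also be phrased as a small grouping procedure. The sketch below assumes each mutant is a point mutation that inactivates exactly one gene; the hidden gene assignments and the allele names are hypothetical and serve only to simulate whether a mixed infection of K(λ) yields phage. The grouping step itself looks only at pairwise test outcomes, as an experimenter would.

```python
from typing import Callable, Dict, List


def make_mixed_infection(gene_of: Dict[str, str]) -> Callable[[str, str], bool]:
    """Simulate a mixed infection: True (phage produced) if the two mutants
    are defective in different genes, so each supplies what the other lacks."""
    def complements(m1: str, m2: str) -> bool:
        return gene_of[m1] != gene_of[m2]
    return complements


def complementation_groups(mutants: List[str],
                           complements: Callable[[str, str], bool]) -> List[List[str]]:
    """Place each mutant in the first group whose founding member it fails
    to complement; otherwise start a new group (a new gene)."""
    groups: List[List[str]] = []
    for mutant in mutants:
        for group in groups:
            if not complements(group[0], mutant):
                group.append(mutant)
                break
        else:
            groups.append([mutant])
    return groups


hidden = {"r1": "rIIA", "r2": "rIIB", "r3": "rIIA", "r4": "rIIB", "r5": "rIIA"}
mixed_infection = make_mixed_infection(hidden)
print(complementation_groups(list(hidden), mixed_infection))
# [['r1', 'r3', 'r5'], ['r2', 'r4']]
```

The one-gene-per-mutant assumption matters: a deletion removing parts of both genes would fail to complement mutants in either gene and could not be placed in a single group this way.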

By using a large collection of am and ts mutants, 
Epstein et al. (1963) outlined the general structure of 
the T4 genome (Figure 2). The genome has since been
found to contain 168 903 bp; it is conventionally
drawn with its (arbitrary) zero point, the junction
between the rIIA and rIIB genes, at 9 o’clock. It is
largely organized by function, with the late or capsid-related
genes falling predominantly into two large
blocks. The genes for a few late proteins are interspersed
in early regions; such genes are subject to unusual
transcriptional and translational controls. The other regions contain
primarily early genes, many of which are not essential 
under ordinary laboratory conditions and thus cannot 
be defined by am or ts mutations in the usual way. 
Many of these genes have been identified by mutations 
obtained by functional methods; they have names of 
one to four letters which usually relate to their function 
(such as ‘e’ for lysozyme (‘endolysin’), denA and denB 
for DNA endonucleases, or rpbA and rpbB for RNA 
polymerase-binding proteins). Note that genes that 
appear ‘nonessential’ under standard laboratory con- 
ditions may be necessary or at least highly beneficial 
under conditions in the natural environment, most 
commonly the mammalian gut. 


Special Properties of T-Even Phages 

The hydroxymethyl groups of 5-hydroxymethylcytosine
(HMC), which replaces cytosine in T-even phage DNA,
like the methyl groups of thymine, are located in the
major groove of the DNA helix, where they do not
affect base-pairing but can be
used as recognition signals. This is analogous to the
5-methylcytosine that is formed at specific sites after 
DNA synthesis in both prokaryotic and eukaryotic 
systems and acts as a control signal for many cellular 
processes. The use of HMC in place of C in T4 DNA 
facilitates the viral domination of the host in several 
ways: 


1. T4 is immune to most bacterial restriction systems. 
Bacteria protect their own DNA from restriction 
endonucleases by marking it with methyl groups at 
the cleavage site; most of these enzymes modify a C 
residue and are also blocked by the hydroxymethyl 
group, so that they do not attack T4 DNA. E. coli 
does have a nuclease that specifically recognizes 
HMC as foreign, but T4 blocks this nuclease by 
glucosylating the HMC residues in its DNA. No
E. coli enzyme has yet been observed that can attack
the sugar-coated DNA.

2. T4 encodes cytosine-specific nucleases that degrade 
the host DNA but do not attack its own DNA. 

3. T4 inhibits transcription of bacterial DNA by producing
a small protein, gpAlc, which interacts with
both the RNA polymerase and DNA to terminate
the elongation of transcription of all cytosine-containing
DNA.


T4 could also ensure that it does not accidentally
package host DNA by packaging only HMC-containing DNA.
However, it has no such mechanism, and can therefore 
package host DNA and carry it to a new cell, acting as 
a transducing phage, as long as the degradation of host 
DNA is blocked by elimination of the genes that 
encode the specific T4 nucleases. This seems to work 
much more efficiently in complex phage mutants that 
use cytosine rather than HMC in their DNA and are 
missing gpAlc. 

T4 also encodes several enzymes that are particu- 
larly useful in genetic engineering work, even though 
their value to T4 itself is still unclear: 


1. A DNA ligase that can join two blunt-ended pieces 
of DNA. (The other known DNA ligases will only 
join DNA pieces that have complementary single- 
stranded ends, and thus can be held in register.) 

2. An RNA ligase (whose main known function in the 
phage is to join the tail fibers to the tail, a process 
that seems to involve only proteins, not RNA). 
Though at least three T4 genes contain introns 
(the first to be demonstrated in eubacterial systems), 
the RNA ligase apparently plays no role in their 
splicing, which occurs autocatalytically. 

3. A 3′-phosphatase/5′-kinase, which acts on DNA,
RNA, a number of vitamins and cofactors, and a 
variety of other molecules. Mutations in this gene 
have no observable deleterious effects on the phage. 


The tail fiber genes are generally the most diverse 
region between the various T-even phages, as might 
be expected considering their role in host range deter- 
mination. Each tail fiber is made primarily of two very 
long proteins, gp34 and gp37, with two much smaller 
proteins, gp35 and gp36, forming the flexible joint 
between them. In T4, gp38 is added to the distal tip 
of gp37 and provides the specificity of binding to the 
receptor. Many of the other T-even phages have a 
totally different version of gp38 that is involved only 
in the assembly process; in those cases, the receptor 
binding site is near the distal end of gp37 itself. 
T4 phage particles initially adsorb to the surfaces of
sensitive cells through specific contact between the
distal ends of the tail fibers and specific diglucosyl
residues on the outer membrane lipopolysaccharides
of the E. coli B cell surface or the outer membrane
protein OmpC on K strains. Each of the other T-even
phages has its own specific receptors: membrane proteins
OmpF and FadL for T2, Tsx (= NupA) for T6,
OmpA for K3 and Ox2, OmpF for TuIa, and OmpC
for TuIb. The reversible initial binding is quickly
followed by irreversible attachment by means of
gp12, a short tail fiber that extends down from each
vertex of the baseplate. Binding by gp12 leads to an
allosteric hexagon-to-star transition in the arrangement
of the baseplate proteins and thence in the relationship
between the tail sheath monomers. Using
energy from bound GTP, the sheath contracts while
the baseplate stays bound to the cell surface, forcing
the central, noncontracting core of the tail (the tail tube)
through the membrane of the cell.


Introns in Genes of T-Even Phages 

A large fraction of eukaryotic genes are now known to 
be fragmented by the insertion of one or more non- 
translated intervening sequences, or ‘introns,’ within 
their coding sequences, which must then be excised 
from the primary transcripts. However, such com- 
plexities were considered a purely eukaryotic pheno- 
menon until the report by Chu et al. (1984) of a 1-kb 
intron within the thymidylate synthase (td) gene of 
bacteriophage T4. Two additional T4 genes have since 
been shown to also contain introns: those for the 
aerobic and anaerobic ribonucleotide reductases, genes
nrdB and nrdD.

The observation of introns in T4 raises obvious 
questions about splicing mechanisms, especially once 
it was observed that these introns could also be spliced 
out when cloned into E. coli in the absence of other T4 
genes. Does this indicate that E. coli carries splicing 
machinery and thus presumably also contains introns? 
Additional research in the laboratories of Belfort, 
Shub, and Chu has shown that the T4 introns are, in 
fact, self-splicing and are capable of assuming a secondary
structure virtually identical to that of the
eukaryotic group I self-splicing introns. The splicing,
in all cases, occurs via the same mechanism, involving 
a series of transesterifications, or phosphoester bond 
transfers, with the RNA functioning as an enzyme. 
The ease of doing genetic analysis in T4 facilitates 
studying the role of various intron and exon sequences 
in the self-splicing reaction. It is not yet clear whether 
the structural homology between the T4 and eukaryo- 
tic introns reflects some ancient evolutionary origin 
of such splicing or a later transfer of introns from 
eukaryotes to T4; in either case, many interesting 
questions are opened up. 

Phage T4 seems to be a never-ending source of 
novelties for molecular biologists, and the latest of 
them has been described by Huang et al. (1988), who
found a mechanism in which a 50-base segment of the 
information in one gene is skipped over during trans- 
lation. Gene 60 encodes an 18-kDa subunit of the 
DNA topoisomerase, which is involved in phage 
DNA replication. There is strong evidence that the 
extra sequence is not removed as an intron. The extra 
bases, which are bracketed by a direct repeat of 5 nt
pairs, appear to be pushed out in a kind of hairpin loop
so that the codons on either side of it are brought 
together. Though the structure at this point is unusual, 
a ribosome can presumably move right through it, 
translating the messenger properly while ignoring all 
the nucleotides in the loop. This segment of gene 60 
was inserted into the N-terminal coding sequence 
of the β-galactosidase gene of E. coli, where it also
was neither excised nor translated. The fused genes 
showed comparable levels of enzyme activity to a 
fusion without the extra 50 bases, indicating that the 
looped-out sequence has little effect on translation of 
the messenger. This ribosomal bypass region has been 
found only in gene 60 of T4 and a few of the other 
T-even phages; there is nothing like it in the compar- 
able gene of phages T2, T6, and most other family 
members. In those cases, the counterpart of T4’s gene 60 is
fused with the gene for another topoisomerase subunit,
which in T4 is encoded by gene 39. In T4, genes 39 and 60 are
separated by several hundred base pairs that are 
noncoding except for what appears to be the residue 
of one of the homing endonuclease genes described 
here. 
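At the level of sequence, the bypass can be modeled as the ribosome reading up to the first copy of the direct repeat and resuming at the second, so that one copy of the repeat plus the intervening bases is never translated. The sketch below uses an invented message (in DNA letters) with a hypothetical 5-nt repeat whose copies start 50 nt apart; it models only which bases are read, not the hairpin or the ribosome's actual movement, and the function name is made up for this example.

```python
def bypassed_message(mrna: str, repeat: str) -> str:
    """Return the sequence actually read when translation skips from the
    first copy of a direct repeat to the second copy.

    Because the two copies are identical, reading up to the first copy and
    resuming at the second gives the same string as deleting the first copy
    together with the bases between the copies.
    """
    first = mrna.find(repeat)
    if first == -1:
        return mrna
    second = mrna.find(repeat, first + len(repeat))
    if second == -1:
        return mrna
    return mrna[:first] + mrna[second:]


upstream = "ATGAAA"      # hypothetical codons before the gap
repeat = "GGATC"         # hypothetical 5-nt direct repeat
gap = "T" * 45           # hypothetical intervening bases
downstream = "ATGGTAA"   # hypothetical bases after the second copy
mrna = upstream + repeat + gap + repeat + downstream

read = bypassed_message(mrna, repeat)
print(read)                   # ATGAAAGGATCATGGTAA (ATG AAA GGA TCA TGG TAA)
print(len(mrna) - len(read))  # 50
```

Although 50 is not a multiple of 3, the reading frame after the landing point is set by where the ribosome resumes, which is why the bypassed message in this toy example still reads in frame to a stop codon.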


Conclusions 


Bacteriophages, especially the large, complex phages 
discussed here, have long been a major focus of mol- 
ecular biology. An amazing amount of basic biologic- 
al information has been uncovered with the T-even 
coliphages alone. Although much of the excitement of 
molecular biology has now shifted to eukaryotic sys- 
tems, many investigators continue to work with phages 
and continue to astonish their colleagues with dis- 
coveries of previously undreamt-of mechanisms and 
processes. Phage systems, which are relatively easy to 
handle and involve inexpensive materials and short 
time scales, remain excellent material for working 
out the details of many kinds of complex mechanisms 
and for training young investigators. One major 
advantage is the degree of genetic understanding of 
the phage-host system and the ability to combine 
genetic, physical, and biochemical tools in attacking 
a problem. But this line of work reemphasizes an 
important point about biological research: that simply 
knowing the structure of a DNA molecule is not 
enough, because the sequence of nucleotides tells little 
about the function of that sequence, even though it 
may yield important clues. Biology is something more 
than chemistry. A gene is not merely a segment of a 
DNA molecule; it is a meaningful segment, which 
must be expressed and regulated, often through 
complex mechanisms, and there is no way to know
those mechanisms a priori just by doing chemical
experiments. This has again been reemphasized with
the discovery of the folded-out (bypassed) segment in gene 60.
Molecular biology has been fruitful primarily because 
it combines chemical work with biological — especially 
genetic — studies. And much of its fascination lies in its 
promise of another surprise after every experiment. 
The TAL family consists of three proto-oncogenes
(TAL1, TAL2, and LYL1) that were identified through
the analysis of tumor-specific chromosomal translocations
associated with human T-cell acute lymphoblastic
leukemia (T-ALL). Each of these genes encodes a
polypeptide that harbors the basic helix-loop-helix 
domain (bHLH), a DNA-binding motif common 
to many eukaryotic transcription factors. Since the 
bHLH domains of TAL1, TAL2, and LYL1 are more 
closely related to one another than to those of other 
proteins, they constitute a discrete subgroup within 
the larger family of bHLH proteins (Figure 1). 
Nevertheless, each of the TAL genes has a different 
pattern of tissue-specific expression during normal 
development. Also, while gene targeting has shown 
that the mouse Tal1 protein (also called Scl) is essential
for the formation of all blood cell lineages, similar
studies have not revealed overt defects of hematopoiesis
in mice devoid of Tal2 or Lyl1. Thus, despite their 
common role in leukemogenesis and the striking amino 
acid sequence homology of their respective bHLH 
domains, it appears that each of the TAL genes has 
assumed distinct functions during mammalian devel- 
opment. 


TAL Proteins Serve as DNA-Binding Transcription Factors


The bHLH motif is a structural domain of 50-60 amino
acids that forms two amphipathic α-helices separated
by an intervening loop (Figure 1). The bHLH domains
of two separate polypeptides can interact to form a
dimer that binds DNA as a parallel, left-handed four- 
helix bundle. Sequence-specific DNA recognition is 
mediated primarily by a stretch of basic amino acids 
that reside near the N-terminal flank of each dimer- 
ized bHLH motif (Figure 1). Although some bHLH 
proteins can form homodimers, the TAL proteins 
only bind DNA upon heterodimerization with the 
‘E proteins,’ a distinct subgroup of bHLH proteins 
encoded by the E2A, E2-2, and HEB genes. These 
heterodimers (e.g., TAL1/E2A) have been shown to 
bind DNA in a sequence-specific fashion and to modulate
the transcription of reporter genes that contain a
cognate recognition sequence. Thus, at the biochem- 
ical level, the TAL proteins appear to function as 
transcription factors. 

The bHLH domains of TAL1, TAL2, and LYL1 
also interact with the LIM domains of the LIM-only 
oncoproteins LMO1 and LMO2. This interaction 
allows for the assembly of a larger DNA-binding 
complex, which includes not only a bHLH hetero- 
dimer such as TAL1/E2A but also a member of the 
GATA transcription factor family. One such oligomeric
complex (E2A/TAL1/LMO2/LDB1/GATA-1)
has been observed in erythroid cells as well as in 
leukemic T cells derived from patients with T-ALL. 
This complex binds DNA in a bidentate fashion 
in which the E2A/TAL1 heterodimer contacts its 
recognition sequence on DNA (the E-box), while 
GATA-1 binds a neighboring GATA sequence (see
Figure 1 in LMO Family of LIM-only Genes).

Figure 1 TAL family of basic helix-loop-helix (bHLH) proteins. (A) Schematic of the TAL1, TAL2, and LYL1 gene products. Shaded bars represent the DNA-binding bHLH domains. (B) An alignment of amino acid sequences from the bHLH domains. Whereas the bHLH motifs of TAL1, TAL2, and LYL1 share more than 85% amino acid identity, they are less similar to the corresponding motifs of other bHLH proteins (e.g., MYC).


Role of TAL Genes in T-Cell Leukemia 


Certain lymphoid malignancies are characterized 
by common chromosome abnormalities that can be 
found in almost all affected patients. For example, 
more than 95% of patients with Burkitt’s lymphoma 
have chromosome translocations that activate the 
MYC proto-oncogene, while most cases of follicular 
B-cell lymphoma feature a translocation that activates 
the BCL2 proto-oncogene. In contrast, cytogenetic 
studies have not uncovered a common chromosomal 
defect associated with T-ALL. Instead, a series of rare, 
but recurrent, chromosome translocations are found 
in T-ALL patients. Each results in the transposition 
of a proto-oncogene into the T-cell receptor (TCR)
locus on either chromosome 7 (TCR β-chain) or 14
(TCR α/δ-chain). Of the nine proto-oncogenes
known to be activated in this manner in T-ALL,
three encode the members of the TAL family and two
encode the LIM-only proteins with which they are
known to interact (i.e., LMO1 and LMO2). Thus,
although the chromosomal abnormalities associated
with T-ALL are diverse, many of them target proteins
within the same biochemical pathway. Moreover,
malignant activation of the TAL1 gene is especially
frequent in T-ALL. While chromosomal translocations
involving TAL1 are observed in only 3% of
cases, an additional 25% of patients harbor local
rearrangements of the TAL1 gene, and nearly half of
all pediatric cases show evidence of tumor-specific
TAL1 activation. As such, TAL1 represents the most
commonly activated proto-oncogene known to be
involved in T-ALL.

TAL1, TAL2, and LYL1 are not expressed during 
T-cell development. In contrast, the translocated 
alleles of these genes are actively transcribed in T- 
ALL cells, suggesting that ectopic expression of any 
one of these genes in the T-cell lineage is potentially 
leukemogenic. This has been confirmed using mouse 
models in which targeted expression of a Tal1 trans- 
gene in thymocytes results in the formation of clonal 
T-cell leukemias after a long latency. The exact 
mechanisms by which ectopic expression of the TAL 
genes elicits T-cell tumorigenesis are not clear. DNA- 
binding protein complexes containing, for example, 
TAL1 might promote leukemogenesis by altering the 
normal pattern of gene expression in T-lineage cells. 
Alternatively, the TAL gene products might serve as 
dominant-negative inhibitors of the E proteins, which 
normally function as homodimeric bHLH transcription
factors during lymphoid development. By disrupting
these E-protein homodimers, ectopic TAL1
might impair T-cell maturation and thereby increase
the likelihood of leukemic transformation. 
Repeated DNA segments are sometimes found adjacent
to each other in a direct orientation. Schematically,
if ‘ABCD’ represents an ordered genetic sequence, a 
direct tandem repeat could consist of ‘ABCBCD,’ with 
segment ‘BC’ being the repeat unit. Such tandem 
repeats can be of various lengths ranging from several 
nucleotides to entire groups of genes. Tandem repeats 
can occur both in coding and noncoding DNA 
sequences. In certain instances, large numbers of 
these DNA repeats can be found in direct orientation in
repeat arrays. For example, the nucleolar organizer 
region of many organisms including Xenopus, Dros- 
ophila, and humans carries hundreds of rRNA genes 
in a tandem repeat array. 
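The definition of a direct tandem repeat can be made concrete with a short scan. The following is a minimal Python sketch, not a published repeat-finding tool: it assumes exact, uninterrupted copies, and the function name `find_tandem_repeats` and its parameters are hypothetical. It reports each run of a repeat unit that occurs at least twice in direct orientation.

```python
def find_tandem_repeats(seq, min_unit=1, max_unit=6, min_copies=2):
    """Report exact direct tandem repeats as (start, unit, copies).

    Simplified sketch: only perfect, uninterrupted copies are detected,
    and arrays found with different unit sizes are all reported.
    """
    hits = []
    for size in range(min_unit, max_unit + 1):
        i = 0
        while i + size <= len(seq):
            unit = seq[i:i + size]
            copies = 1
            # Extend the array while the next block repeats the unit exactly.
            while seq[i + copies * size:i + (copies + 1) * size] == unit:
                copies += 1
            if copies >= min_copies:
                hits.append((i, unit, copies))
                i += copies * size  # skip past this array
            else:
                i += 1
    return hits


# 'BC' is the repeat unit in the schematic sequence 'ABCBCD'.
print(find_tandem_repeats("ABCBCD", min_unit=2, max_unit=2))  # [(1, 'BC', 2)]
```

Restricting the unit size to 3, the same scan reports the copy number of a trinucleotide array such as `CAGCAGCAGCAG`.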

One prominent characteristic of tandem repeats is 
their instability. Tandem repeats are prone to both 
increases (such as ‘ABCBCD’ to ‘ABCBCBCD’)
and decreases (‘ABCBCD’ to ‘ABCD’) in the number
of repeat elements. Such rearrangements are believed 
to occur by homologous recombination between the 
repeated DNA segments (including unequal crossing- 
over between chromosomes) or by slipped misalign- 
ment of the repeat sequences during DNA replication. 
Repeats at a variety of genetic loci in humans are 
variable enough among individuals (so-called variable 
number tandem repeats or VNTR) that they have been 
used as molecular ‘fingerprints’ for forensic purposes. 
For certain loci in humans, rearrangements between 
tandem repeats can lead to genetic disease. For
example, in Huntington disease, fragile X syndrome, or
myotonic muscular dystrophy, expansion of a trinucleotide
repeat array to a larger size, within their
respective loci, is associated with the disease pheno- 
types. Certain thalassemias are caused by deletion 
between tandem globin genes. 
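The gain and loss of repeat units described above can be illustrated with strings. In this minimal sketch (the function and parameter names are hypothetical), two identical homologs carrying the same array pair out of register, are broken after different repeat copies, and are rejoined reciprocally. It models unequal crossing-over only, not slipped misalignment during replication.

```python
def build(copies, unit="BC", left="A", right="D"):
    """Return a schematic locus: left flank, `copies` tandem units, right flank."""
    return left + unit * copies + right


def unequal_crossover(copies, i, j, unit="BC", left="A", right="D"):
    """Reciprocal exchange between two identical homologs that pair out of register.

    Homolog 1 is broken after its i-th repeat unit and homolog 2 after its
    j-th; when i != j, one product gains and the other loses (i - j) units.
    """
    h1 = build(copies, unit, left, right)
    h2 = build(copies, unit, left, right)
    cut1 = len(left) + i * len(unit)
    cut2 = len(left) + j * len(unit)
    product1 = h1[:cut1] + h2[cut2:]
    product2 = h2[:cut2] + h1[cut1:]
    return product1, product2


print(unequal_crossover(2, i=2, j=1))  # ('ABCBCBCD', 'ABCD')
```

When the break points coincide (i equals j) the exchange is equal and both products keep the original copy number; in every case the total number of units across the two products is conserved.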

How are tandem repeats formed? During replica- 
tion, slipped misalignment at short repeated DNA 
sequences flanking a single-copy gene segment can 
cause the repeat and the gene segment to be dupli- 
cated. Alternatively, mispairing and unequal crossing- 
over between chromosomes can initially produce the 
duplication. Once a duplication has been established, 
it is free to undergo further rearrangements, as dis- 
cussed above, including deletion or expansion in 
repeat number. Genetic selection may sometimes 
play a role in driving expansion of genes into tandem 
arrays. For instance, in bacteria, selection for in- 
creased resistance to certain antibiotics can result in 
the amplification of a drug-resistance gene as direct 
repeats in a tandem array. Selection for increased gene
copy number may also explain the existence of tandem arrays
of rRNA or histone genes in certain organisms. Once 
a gene is duplicated, it is free to diverge in sequence 
from its partner and, therefore, tandem repeats may 
play an important role in evolution. This is probably 
the case for the globin loci of humans where diverged 
globin genes, expressed differentially in development, 
are found in a tandem array. Nevertheless, despite 
the usefulness of certain tandem duplications, many 
tandem-repeated DNA sequences are in noncoding 
DNA with no known function. Tandem duplication 
may be, like genetic mutation, a source of genomic 
flux that can occasionally be beneficial to the organ- 
ism. 


See also: Evolution of Gene Families; 
Globin Genes, Human 
The first method developed for creating transgenic
mice — direct injection of DNA into embryonic 
nuclei — is very powerful, but it has two serious 
limitations. First, it can only be used to add, not sub- 
tract, genetic material. Second, the insertion of genetic 
material cannot be targeted to particular genomic 
locations. In genetic terms, this means that trans- 
genic mice produced by embryonic nuclear injection 
are only useful for the analysis of dominant
phenotypes.

By 1989, a second independent transgenic tech- 
nology — targeted mutagenesis or gene targeting — 
was developed to circumvent these limitations. 
Targeted mutagenesis provides researchers with the 
ability to eliminate, or knockout, any cloned gene. 
The same technology can even be used to replace 
single amino acids or larger regions of a gene to obtain 
an allele with an altered function. This ultimate tool of 
genetic engineering can be used in experiments with 
two different kinds of objectives. First, as a means for
determining gene function by examining the phenotypic
consequences incurred in a developing embryo
or animal that does not express a particular gene, or 
expresses an alternative form of that gene. Second, as a 
means for creating mouse models for human diseases, 
like cystic fibrosis, that are caused by the loss of gene 
function. 

The targeted mutagenesis technology is technically 
more demanding and more complex than nuclear 
injection technology, and its development was depend- 
ent on two critical advances in cell culture that 
occurred during the 1980s. The first was the establishment
of in vitro conditions that allow researchers to
place mouse embryos, at the blastocyst stage, into 
culture where they continue to divide without differ- 
entiating. These cultured cells are called embryonic 
stem (ES) cells. ES cells appear to be similar to cells 
from the inner cell mass (ICM) in that they retain 
totipotency. It is possible to grow cultures containing 
many millions of ES cells from a single embryo, and 
then recover a handful of cells from this culture for 
injection back into the blastocoele cavity of a normal 
embryo, where the ES cells can attach to the ICM, 
divide, and contribute to all of the tissues in the adult 
mouse that develops out of that embryo. Most import- 
antly for geneticists, the ES cells even contribute to 
the germline of these chimeric mice so that genes 
present in the ES cell genome can be passed on to 
future generations. 

The second critical advance necessary for the devel- 
opment of the targeted mutagenesis technology came 
with the establishment of a protocol for homologous 
recombination in ES cells. When mouse cells are trans- 
fected (the word used to describe DNA transform- 
ation of mammalian cells) with mouse-derived DNA, 
chromosomal integration almost always occurs at ran- 
dom locations other than the site from which the 
DNA was derived. However, very occasionally, the 
added DNA will ‘find’ its endogenous homolog and 
replace it by a process of ‘homologous recombination.’
The frequency of homologous recombination
events as a fraction of total integrants is on the order of
10 ~to 10°. 
If ES cells were simply transfected with cloned
fragments of mouse DNA, homologous recombin- 
ation events would not cause any genomic changes. 
But the methods of recombinant DNA technology
provide researchers with tools for modifying cloned 
genes so that they are no longer functional. Now, 
when homologous recombination occurs with one 
of these specially designed knockout constructs, the 
endogenous wild-type gene is replaced by a non- 
functional allele. Finally, in order to make use of the 
homologous recombination technology, researchers 
needed to develop special protocols for identifying 
and recovering the very rare cells in which these events 
took place. 

One appeal of the gene targeting technology, although
not the only one, is the ability to create mouse
models for particular human diseases. But, in essence,
gene targeting can provide investigators with power- 
ful tools to study any cloned gene. While patterns of 
RNA and protein expression provide clues to the 
stages and tissues in which genes are active, it is only 
with mutations that a true understanding of function 
can be obtained. 


Creating Gene Knockouts 


Once a particular gene has been cloned and character- 
ized, the steps involved in obtaining a mouse with a 
null mutation in the corresponding locus can be out- 
lined briefly as follows. 


1. Design and construct an appropriate targeting 
vector in which the gene of interest has been dis- 
rupted with a positive selectable marker; in the 
most commonly used protocol, a negative select- 
able marker is also added at a position that flanks 
the gene sequence. The most commonly used 
positive selectable marker is the neomycin resist- 
ance (neo) gene, and the most commonly used 
negative selectable marker is the thymidine kinase 
(tk) gene. 

2. Introduce the targeting vector into a culture of
embryonic stem (ES) cells (usually derived from 
the 129 strain), then select for those cells in which 
the internal positive selectable marker has become 
integrated into the genome without the flanking 
negative selectable marker. 

3. Screen for clones that have integrated the vector
by homologous recombination rather than by the 
more common nonhomologous recombination in 
random genomic sites. 

4. Once ‘targeted clones’ have been identified, produce
chimeric embryos through the injection of the
mutated ES cells into the inner cavity of a blastocyst
(usually of the B6 strain), and place these chimeric
embryos back into foster mothers who bring them
to term. 
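The reason for pairing a positive and a negative selectable marker in the targeting vector can be expressed as simple arithmetic. The sketch below assumes two hypothetical parameters not taken from this article: `f`, the fraction of integration events that are homologous (these lose the flanking tk marker), and `kill`, the fraction of random integrants (which usually retain tk) eliminated by negative selection.

```python
def targeted_fraction(f, kill):
    """Fraction of surviving clones that carry the targeted (homologous) integration.

    f    -- fraction of all integration events that are homologous (assumed)
    kill -- fraction of random integrants eliminated by negative selection (assumed)

    Assumes every targeted clone survives negative selection, because
    homologous recombination excludes the flanking tk marker.
    """
    targeted = f
    random_survivors = (1 - f) * (1 - kill)
    return targeted / (targeted + random_survivors)


# Hypothetical numbers: 1 homologous event per 1000 integrants.
print(round(targeted_fraction(0.001, 0.0), 4))  # 0.001  (positive selection only)
print(round(targeted_fraction(0.001, 0.9), 4))  # 0.0099 (about tenfold enrichment)
```

Even with this enrichment most surviving clones in the example are still random integrants, which is why the screening step for homologous recombinants remains necessary.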


The experiment is deemed a success if the ES cells 
successfully enter the germline of the chimeric animals 
as demonstrated by breeding. If the disrupted gene 
is indeed transmitted through the germline, the first 
generation of offspring from the chimeric founder 
will include heterozygous animals that can be inter- 
crossed to produce a second generation with individ- 
uals homozygous for the mutated gene. 
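The expected genotype ratios from this breeding scheme can be enumerated directly. This is a minimal Mendelian sketch assuming a single autosomal locus, equally frequent gametes, full viability of all genotypes (a homozygous knockout that is embryonic lethal would be missing among live-born offspring), and a targeted ES cell line that is heterozygous; `+` denotes the wild-type allele and `-` the targeted allele.

```python
from collections import Counter
from itertools import product


def cross(parent1, parent2):
    """Count offspring genotypes over all equally likely gamete pairings."""
    offspring = Counter()
    for a, b in product(parent1, parent2):
        genotype = "/".join(sorted((a, b)))  # '+' sorts before '-'
        offspring[genotype] += 1
    return offspring


# ES-derived germ cells of the chimera (+/-) crossed with a wild-type mate:
print(sorted(cross("+-", "++").items()))  # [('+/+', 2), ('+/-', 2)]
# Intercross of two heterozygous offspring:
print(sorted(cross("+-", "+-").items()))  # [('+/+', 1), ('+/-', 2), ('-/-', 1)]
```

The 1:2:1 ratio in the intercross means that about one quarter of the second-generation offspring are expected to be homozygous for the mutated gene.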


See also: Chimera; Embryonic Stem Cells; 
Knockout 
The TATA box is a conserved A:T-rich sequence of
nucleotides found about 25 bp upstream of the start- 
point in a eukaryotic RNA polymerase II transcrip- 
tion unit; it is involved in positioning the enzyme for 
correct initiation. 
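What ‘a conserved sequence about 25 bp upstream’ means in practice can be shown with a pattern scan. This sketch assumes the commonly cited consensus TATAWAWR (W = A or T, R = A or G) and an arbitrary search window of 20 to 40 bp upstream of the startpoint; both are illustrative choices rather than part of this entry, and many real promoters deviate from the consensus or lack a TATA box.

```python
import re

# Lookahead so that overlapping candidate sites are all reported.
TATA_CONSENSUS = re.compile(r"(?=(TATA[AT]A[AT][AG]))")


def find_tata_boxes(seq, tss, min_up=20, max_up=40):
    """Return (position, site, distance upstream) for consensus matches near the startpoint.

    seq -- DNA sequence containing the promoter region
    tss -- 0-based index of the transcription startpoint within seq
    """
    seq = seq.upper()
    hits = []
    for m in TATA_CONSENSUS.finditer(seq):
        start = m.start()
        distance = tss - start  # bp from the start of the site to the startpoint
        if min_up <= distance <= max_up:
            hits.append((start, m.group(1), distance))
    return hits


upstream = "CCCC" + "TATAAAAG" + "C" * 23  # hypothetical promoter fragment
seq = upstream + "ATGC"
print(find_tata_boxes(seq, tss=len(upstream)))  # [(4, 'TATAAAAG', 31)]
```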


See also: RNA Polymerase; Transcription 
The biodiversity on earth, running to many millions of
species, is so enormous that it would not be possible to 
study it if it were not classified. Endeavors at ordering 
this diversity were made many years ago by the 
Greeks (Theophrastus). From the sixteenth century 
to the time of Linnaeus the main interest was in medi- 
cinal plants, and here correct identification was of the 
utmost importance. The ‘downward’ classification, 
employed by Linnaeus, in which the mass of uniden- 
tified species was divided at every step into smaller 
groups by employing divisional logic (dichotomy), 
rather quickly led to identification: animals are either 
cold-blooded or warm-blooded; if warm-blooded 
they have either hair or feathers; if they have feathers 
they can either fly or not (e.g., the ostrich), etc. This 
was a method of identification but not of classifica- 
tion. It placed far too much weight on single charac- 
ters and often led to very unnatural groupings, like 
placing the whales among the fishes. 
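The ‘downward’ procedure is essentially a dichotomous key: each step asks one yes/no question about a single character. The minimal Python sketch below encodes only the example questions given above; the trait names and group labels are hypothetical. It shows why the procedure identifies quickly, and also why, relying on one character per step, it cannot by itself yield a natural classification.

```python
# Each node is either a group label (str) or a tuple:
# (trait, branch_if_present, branch_if_absent).
KEY = (
    "warm_blooded",
    ("feathers",
     ("flies", "flying birds", "flightless birds (e.g., ostrich)"),
     "hairy animals"),
    "cold-blooded animals",
)


def identify(key, traits):
    """Walk the key, answering one single-character question per step."""
    node = key
    while isinstance(node, tuple):
        trait, if_present, if_absent = node
        node = if_present if traits[trait] else if_absent
    return node


print(identify(KEY, {"warm_blooded": True, "feathers": True, "flies": False}))
# flightless birds (e.g., ostrich)
```

Because each step consults a single character, one misleading character is enough to send a specimen down the wrong branch.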


In the period from c. 1770 to 1830 ‘downward’ 
classification was replaced by ‘upward’ classification. 
This consists of the construction of classes of similar 
species in which the classes are arranged in a hierarchy 
by the degree of their similarity to each other. Such an 
arrangement is called a classification. The definition of 
classification given by a dictionary is approximately as 
follows: a classification is the systematic arrangement 
of entities into groups or classes, according to the 
degree of their similarity or relationship. 
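‘Upward’ classification corresponds to what is now called agglomerative clustering: the most similar species are grouped first, and the groups are then merged step by step into a hierarchy. The sketch below uses average linkage (the mean pairwise distance between two groups) as one possible similarity rule. The linkage rule is an assumption, not the procedure of any historical taxonomist, and the species distances in the example are invented for illustration.

```python
from itertools import combinations


def average_linkage(names, dist):
    """Merge the two closest groups until a single hierarchy remains.

    dist maps frozenset({a, b}) to the distance between species a and b.
    Returns the hierarchy as nested tuples.
    """
    clusters = {name: (name,) for name in names}  # tree -> member species

    def group_distance(members1, members2):
        # Average of all pairwise species distances between the two groups.
        total = sum(dist[frozenset((a, b))] for a in members1 for b in members2)
        return total / (len(members1) * len(members2))

    while len(clusters) > 1:
        t1, t2 = min(
            combinations(clusters, 2),
            key=lambda pair: group_distance(clusters[pair[0]], clusters[pair[1]]),
        )
        clusters[(t1, t2)] = clusters.pop(t1) + clusters.pop(t2)
    return next(iter(clusters))


# Hypothetical distances (larger = less similar).
d = {
    frozenset(("cat", "dog")): 2,
    frozenset(("cat", "whale")): 6,
    frozenset(("dog", "whale")): 6,
    frozenset(("cat", "trout")): 10,
    frozenset(("dog", "trout")): 10,
    frozenset(("whale", "trout")): 9,
}
print(average_linkage(["cat", "dog", "whale", "trout"], d))
# ('trout', ('whale', ('cat', 'dog')))
```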

This concept of classification, based on similarity, 
is widely used in human affairs. Books in a library or 
goods in a store or any other heterogeneous mass of 
entities are classified according to the principle of 
similarity. However, applying this universal method 
to biodiversity ran into difficulties. Different authors 
often disagreed on what was most similar. Worse still, 
some authors fell back on relying on a few conspicu- 
ous characters. As a result, for the next 150 years there 
continued to be much argument as to what would be 
the best classification. 

It had long been recognized by perceptive philoso- 
phers that similarity alone is not always sufficient for a 
good classification. If similarity or difference among 
groups of entities is caused by a particular factor, this 
causal factor must also be taken into consideration in a 
classification. Darwin applied this principle to the 
classification of biodiversity. He realized that, accord- 
ing to the theory of common descent, the descendants 
of a particular ancestor would tend to be more similar 
to each other than they would be to unrelated species. 
More importantly, if owing to superficial similarity, 
an unrelated species is included in a taxon, a detailed 
character analysis would reveal that it had not des- 
cended from the common ancestor of the other species 
of the taxon. Such a superficially similar species is then 
removed from the taxon. Such an analysis is referred 
to as cladistic analysis (see below). 

Darwin presented his new ideas on classification in 
On the Origin of Species (Darwin, 1859). Since then 
evolutionary taxonomists have more or less adopted 
his principles. They are best designated as the method 
of Darwinian classification. 


Darwinian Classification 


Darwinian classification employs two sets of criteria
in the classification of biodiversity: degree of difference
and genealogy. Darwin emphasized repeatedly,
verbally and in his correspondence, that genealogy 
alone cannot produce a good classification. The cru- 
cially important aspect of a Darwinian classification is 
that in the first step the classes of similar species are 
determined (‘classification’) and in the second step 
(‘cladistic analysis’) all species that are not clearly
descendants of the nearest common ancestor are removed
from these taxa. A taxon that consists exclusively of all
descendants of the nearest common ancestor is called a
monophyletic taxon. Haeckel (1866) defined the term
monophyletic taxon as consisting of the descendants of
the nearest common ancestor.


Cladistic Analysis 

Hennig (1966) clearly recognized that only derived 
(apomorph) characters can be used to determine 
branching pattern, not ancestral (patristic, plesiomorph)
characters. Followers of Darwinian classification
likewise use cladistic analysis in order to
determine whether or not a taxon delimited by them 
is monophyletic. There have been earlier authors who 
appreciated the importance of this principle, but 
Hennig was the first to articulate it clearly. It is a 
legitimate method for the weighting of characters. 
The use of cladistic analysis does not make a classi- 
fication cladistic. The most successful recent appli- 
cations of cladistic analysis were made by testing 
additional classifications for strict monophyly. 
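Hennig’s principle can be illustrated with a character matrix coded 0 for the ancestral (plesiomorph) state and 1 for the derived (apomorph) state. In this minimal sketch the taxa, characters, and codings are hypothetical. Each derived state shared by two or more taxa proposes a group, whereas a shared ancestral state proposes none; two proposed groups that neither nest nor are disjoint signal conflicting evidence, such as homoplasy.

```python
def synapomorphy_groups(matrix):
    """Return {character: taxa sharing its derived state} for informative characters.

    matrix maps taxon -> {character: 0 (ancestral) or 1 (derived)}.
    Only derived states shared by at least two taxa are used.
    """
    characters = next(iter(matrix.values())).keys()
    groups = {}
    for char in characters:
        taxa = frozenset(t for t, states in matrix.items() if states[char] == 1)
        if len(taxa) >= 2:
            groups[char] = taxa
    return groups


def compatible(g1, g2):
    """Two proposed groups fit one hierarchy if they are nested or disjoint."""
    return g1 <= g2 or g2 <= g1 or not (g1 & g2)


matrix = {
    "P": {"c1": 1, "c2": 1, "c3": 1},
    "Q": {"c1": 1, "c2": 1, "c3": 0},
    "R": {"c1": 0, "c2": 1, "c3": 1},
    "S": {"c1": 0, "c2": 0, "c3": 0},
}
groups = synapomorphy_groups(matrix)
print(compatible(groups["c1"], groups["c2"]))  # True: {P, Q} nests within {P, Q, R}
print(compatible(groups["c1"], groups["c3"]))  # False: {P, Q} and {P, R} conflict
```

R and S share the ancestral state of c1, yet no group {R, S} is proposed, because shared ancestral states carry no information about branching.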


Objective of Classification 

The purpose of any classification is to serve as an 
information storage and retrieval system. Every 
taxon is relatively homogeneous and all included 
species share some well-defined attributes: all mam- 
mals have a mammalian jaw articulation, are warm- 
blooded, are hairy, and suckle their young with milk. 
Furthermore, almost any Darwinian taxon is adapted 
for a particular niche or adaptive zone. Hence, almost 
invariably, the taxa of a Darwinian classification have 
an ecological significance. Particularly helpful is the 
arrangement of the taxa in a hierarchical pyramid: 
species, genus, family, order, class, phylum. The 
more different two species are, the higher the rank of
the smallest taxon that contains both. Cat and dog belong to
different families, but to the same order (Carnivora). 
Cat and yeast belong to different kingdoms (Animalia 
versus Fungi). 

The Darwinian classification is ideally suited to 
fulfill the functions of a classification and is, therefore, 
also used when, for special purposes, other ordering 
systems are used simultaneously. 


Cladification 


In 1950 Hennig proposed a new ordering system for 
organisms. It was based entirely on the branching 
pattern of the tree of descent and was called by him 
‘phylogenetic systematics.’ This term was rather mis- 
leading because the differences arising during the 
divergence of the various lineages are as much part of 
phylogeny as their branching. Hennig’s system,
consisting of an ordering of the branches, was therefore
renamed cladification, and taxonomists practicing it 
are called cladists. 


Method of cladification 

Cladification is not a method of classification; it does
not establish ‘classes of similar and/or related species.’ 
Rather, it recognizes phyletic lineages, branches of the 
phyletic tree (clades or cladons). A cladon consists of 
the stem species that gave rise to the branch and all of 
its descendants. As a result, for instance, the mammals 
are combined into a cladon with their reptilian ances- 
tors, the Therapsida and Pelycosauria. Taking these 
two groups out of the Reptilia makes the Reptilia for 
a cladist a ‘paraphyletic’ group (see Paraphyly). The 
remaining Reptilia are then no longer a valid taxon. By 
contrast, for a Darwinian taxonomist, birds and mam- 
mals are the only species that answer to the diagnostic 
characters of birds and mammals. Apart from the 
fact that a taxon must be monophyletic, its descent 
does not determine its classification. Cladification is 
most helpful whenever questions of phylogeny are in- 
volved. It sheds light on the time of origin of particular 
characters. For instance, it shows that nest building 
did not originate with the birds, because it occurs 
already in other taxa of the branch of the reptiles, 
the archosaurians (thecodonts, dinosaurs, crocodiles),
from which birds are descended. Cladification permits 
inferences on phylogeny from an analysis of the 
characters of living forms without the use of fossil 
material. 
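The cladist’s test of whether a group is a cladon (the stem species and all of its descendants) can be written as a short tree computation. The tree below is a deliberately simplified, hypothetical topology built from groups named in this entry; it is not a published phylogeny, the internal node names are illustrative, and ‘dinosaurs’ stands for non-avian dinosaurs.

```python
# Hypothetical, simplified tree: each node maps to its parent.
PARENT = {
    "Synapsida": "root", "reptile line": "root",
    "pelycosaurs": "Synapsida", "therapsid line": "Synapsida",
    "therapsids": "therapsid line", "mammals": "therapsid line",
    "lizards": "reptile line", "Archosauria": "reptile line",
    "crocodilians": "Archosauria", "dinosaur line": "Archosauria",
    "dinosaurs": "dinosaur line", "birds": "dinosaur line",
}


def lineage(node):
    """The node followed by all of its ancestors, nearest first."""
    path = [node]
    while node in PARENT:
        node = PARENT[node]
        path.append(node)
    return path


def nearest_common_ancestor(taxa):
    shared = set(lineage(taxa[0]))
    for taxon in taxa[1:]:
        shared &= set(lineage(taxon))
    return next(a for a in lineage(taxa[0]) if a in shared)


def terminal_descendants(node):
    children = [c for c, p in PARENT.items() if p == node]
    if not children:
        return {node}
    return set().union(*(terminal_descendants(c) for c in children))


def is_cladon(taxa):
    """True if the group contains every terminal descendant of its nearest common ancestor."""
    return set(taxa) == terminal_descendants(nearest_common_ancestor(taxa))


reptilia = ["pelycosaurs", "therapsids", "lizards", "crocodilians", "dinosaurs"]
print(is_cladon(reptilia))                                 # False: mammals and birds excluded
print(is_cladon(["crocodilians", "dinosaurs", "birds"]))  # True
```

The traditional Reptilia fails the test only because two of its descendant groups are excluded, which is exactly what makes it ‘paraphyletic’ for a cladist.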

In recent years numerous cladists have suggested 
that the establishment of a cladification (a cladogram) 
would make the Darwinian classification superfluous; 
however, this is not the case. As already described 
above, the virtues of the Darwinian system for infor- 
mation retrieval and its importance in ecology make 
its preservation indispensable. Classification and cla- 
dification have different objectives and can exist side 
by side. 


Incompatibilities between Cladifications and 
Traditional Classifications 

In view of the evident merits of the method of cla- 
dification, the question is often raised why so many 
taxonomists still use the Darwinian classification as 
their preferred ordering system. The main reason is 
that the two systems have entirely different objectives 
and cladification is unable to produce a classification 
as traditionally understood. It is impossible to 
convert a cladogram into a Darwinian classification. 
Therefore, it is not correct to say that establishing a 
cladification makes a classification superfluous. An 
analysis of the incompatibility of the two methods is
the best way to demonstrate how different the two
systems of ordering are. Among the numerous incom- 
patibilities between the two systems the following 
may be listed: 


1. The method of combining all descendants of a stem 
species into a single cladon results in great hetero- 
geneity. For instance, the synapsid stem species 
gave rise to the pelycosaurs, therapsids, and mam- 
mals, a highly heterogeneous cladon. By contrast, 
the taxa in a classification are relatively homoge- 
neous, a property which is very important for infor- 
mation retrieval. 

2. Cladification only uses derived characters. This is
indeed a necessity in cladistic analysis, but in a
classification one must use the totality of all characters,
derived as well as ancestral ones. Indeed, a taxon’s
ancestral characters are often its most diagnostic
characters.

3. Cladists tend to assume that characters originate 
uniquely, hence a study of the distribution of newly 
derived characters permits the construction of un- 
equivocal cladograms. This assumption overlooks 
the frequency of parallelophyly. Parallelophyly is 
the independent origin of the same character in two 
related phyletic lineages owing to their possession 
of a similar ancestral genotype. Parallelophyly is 
one of the major causes for the frequency of homo- 
plasy. The irregular distribution of stalked eyes in 
the acalypteran flies is an example of homoplasy 
owing to parallelophyly. 

4. Autapomorphic characters, that is characters that 
evolved in only one of two sister groups, are largely 
neglected in cladification. This prevents giving 
proper weight to the differences between sister 
groups. One sister group often deserves a much 
higher categorical rank than the other. 

5. The stem species that gives rise to a new cladon
usually has only one (or very few) of the derived
features (synapomorphies) that later characterize
the cladon. Others gradually accumulate during its
further evolution. Even though the stem species
belongs to the cladon, it may lack most of the
apomorphic characters which later become diagnostic
for this cladon.

6. Since degree of difference is, in principle, ignored,
as is the use of ancestral characters, cladification
has no adequate method for the ranking of
cladons. Both of Hennig’s criteria, geological age
and equal ranking of sister groups, have proven
unworkable, and cladists, to achieve some sort of
ranking, developed a new method that makes use of
degree of difference, an approach expressly rejected
by Hennig. This method, called sequencing, is
vulnerable to various difficulties.
Owing to their difficulties with ranking, cladists
cannot construct a hierarchy of cladons that would
reflect the degree of difference among higher taxa.
This is a grave weakness, since a hierarchical
arrangement is the most important property of
any efficient classification.

7. The earlier portions of a highly derived cladon are 
usually members of a well-established taxon in a 
Darwinian classification. The dinosaurs, for instance,
are part of the Reptilia in such a classification. In a
cladification the dinosaurs, which together with birds and
crocodilians form the Archosauria cladon, are removed
from the Reptilia, which thereby become
‘paraphyletic’ and are no longer a valid taxon. All 
those fossil taxa that have given rise to a derived 
(descendant) taxon become paraphyletic in a cladi- 
fication and must be broken up and renamed. The 
adoption of the principle of paraphyly thus results 
in the destruction of the majority of traditional 
taxa, particularly of fossil taxa. Thus, cladification 
is in conflict with the highest objective of classifica- 
tion, namely, stability. Other provisions of cladifi- 
cation likewise are in conflict with stability. 

8. Equally inimical to stability is the custom of cla- 
dists to give an entirely new meaning to traditional 
terms. For instance, phylogeny, when introduced
by Haeckel, referred to both of its components,
cladogenesis and anagenesis. But Hennig restricted 
it to the former. Likewise, the term ‘monophyletic’ 
referred for 100 years to a taxon that was derived 
from the nearest common ancestor. For Hennig, the 
term describes the mode of descent of a branch (see 
Monophyly, Holophyly). 


Other weaknesses of the methods of cladification have 
been pointed out in the recent literature (Cronquist, 
1987; Hedberg, 1995; Knox, 1998). It is for this reason 
that the Darwinian classification continues to be so 
widely adopted. It represents and classifies organic 
diversity better than cladification, which restricts 
itself to the study of branching patterns. 


Other Systems of Ordering Species 


A number of taxonomic methods have been proposed 
that are not specifically evolutionary. 


Special Purpose Classifications 

These are usually based on a single characteristic, like 
diploid versus polyploid plants, or the traditional 
arrangement of plants under the headings trees, 
shrubs, herbs, and grasses. Such special classifications 
are not evolutionary; moreover, their information
content is so low that they cannot be used for broader
generalizations. 
Phenetics

A system of classification based entirely on degree of 
similarity (or difference). Some taxonomists thought 
that by taking enough characters, preferably more 
than one hundred, one was sure to come up with 
taxa that were clearly the descendants of the nearest 
common ancestor. But this method never became 
popular, not only because it was very laborious, but 
also because it was usually impossible to find so many 
reliable differences. Also it encounters great difficul- 
ties owing to homoplasy, mosaic evolution, and the 
absence of criteria for character weighting (Mayr and 
Ashlock, 1991). However, Sibley and Ahlquist’s (1983) 
classification of birds, based on DNA differences
between the taxa, is essentially a phenetic method.
Numerical taxonomy is the grouping of taxonomic
units by numerical methods into groups on the basis 
of their properties. It requires the information about 
taxonomic entities to be converted into numerical 
quantities which can then be analyzed by appropriate 
algorithms. It includes the drawing of phylogenetic 
inferences to the extent that this is possible. Therefore, 
the term broadly covers much of systematics, and
includes the distinct activities of classification (order- 
ing of entities into groups) and identification (assign- 
ment of additional entities to their correct groups). 

Attempts to quantitate biological relationships go 
back many years, and isolated advances were made in 
fields such as statistics, psychometrics, and anthro- 
pology. However, no consistent scheme for choosing 
properties, for coding and comparing the data numeric- 
ally, and for constructing a classification or identifica- 
tion scheme had been developed. These problems 
were addressed in the late 1950s and early 1960s, and 
resulted in the welding together of many single themes 
into a coherent plan of action. This program was made 
feasible by the advent of digital computers. Numerical 
taxonomy has also been referred to as taxometrics, or 
Adansonian taxonomy after the early French botanist 
Adanson who first addressed the problem of differen- 
tial weighting of characters. 

The philosophical bases of numerical taxonomy 
rest on the empirical tradition of statistics and on the 
theory of predictivity propounded by philosophers of 
science such as Whewell and Mill. The groups formed 
by numerical taxonomy are ‘natural groups’ in the 
philosophical sense, though not necessarily natural 
in the commonly used sense of phylogenetic groups. 
Thus, the groups of chemical elements in the periodic 
table, such as halogens or alkali metals, are natural
in the philosophical sense although their members are 
not related by common ancestry. The relationships 
between the entities are highly multivariate, because 
numerous properties are considered. The groups thus 
produced are not defined by the invariable possession 
of certain of the properties, that is they are not mono- 
thetic. They are instead polythetic, which means that 
they share many properties, but no single property
needs to be present in every member of a group. The
groups thus formed can accommodate a limited num- 
ber of exceptional properties. 
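The difference between monothetic and polythetic groups can be checked mechanically. The sketch below uses a hypothetical group of four organisms scored for four properties; the threshold `min_share` (how many of the group’s properties each member must possess) is an illustrative parameter, not a standard value.

```python
def is_monothetic(group):
    """True if at least one property is present in every member."""
    return bool(set.intersection(*group.values()))


def is_polythetic(group, properties, min_share):
    """Every member has at least `min_share` of the properties, yet no property is universal."""
    every_member_shares_many = all(
        len(props & properties) >= min_share for props in group.values()
    )
    return every_member_shares_many and not is_monothetic(group)


# Hypothetical organisms, each lacking a different one of four properties.
properties = {"p1", "p2", "p3", "p4"}
group = {
    "w": {"p2", "p3", "p4"},
    "x": {"p1", "p3", "p4"},
    "y": {"p1", "p2", "p4"},
    "z": {"p1", "p2", "p3"},
}
print(is_polythetic(group, properties, min_share=3))  # True
```

Every member shares three of the four properties with the group, yet no single property could serve as a defining character.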

One of the key concepts in numerical taxonomy is 
predictivity. In essence this means that groups of en- 
tities can predict correctly the most likely situation in 
new, as yet unanalyzed, members. Thus, a new bird will 
be expected to have feathers and a new mammal will be 
expected to have hairs, though exceptions may occur 
(for example whales do not have hair). Despite some 
limitations the concept of predictivity allows one to 
test classifications. Those that make more correct
predictions are superior to those that make fewer.

The first problem is the choice of entities to be 
studied, and especially the choice of properties. In 
most biological work this seldom causes difficulties. 
Thus, the length of an insect will usually be considered 
relevant but not the day of the week the specimen was 
collected. There are still some conceptual problems,
such as how to separate size from shape of organisms,
and how best to combine information that is partly 
contradictory. Numerical taxonomy does require 
deliberate choices of properties, and one can see that 
this will depend on the aims of the classification, 
whether to express for example morphological or 
molecular detail. 

A second important question is to decide
what properties are to be compared across entities. 
It seems obvious that one should compare length of 
wing in one entity with length of wing in another, and 
not with length of foot. In biology this is usually called 
the concept of homology: homologous comparisons 
should be made. Yet this leads to some serious prob- 
lems. Homology is usually taken to mean that the 
properties are the same by ancestry, and wing and foot 
on the gross level have different ancestries. But to be 
sure of homologies one must first know the phyl- 
ogeny, and this is either not certain, or should not be 
prejudged. Instead, character complexes that share the 
most similarity (in some sense) are taken as homolo- 
gous. Thus, wings are more similar to other wings than 
they are to feet. 

These issues are best shown by molecular sequence 
data, where there may be several families of proteins, 
globins for example. One wishes to compare ortholo- 
gous sequences, i.e., those that have the strongest 
matches, and by implication the closest evolutionary 
relationships (aided perhaps by physiological evi- 
dence). Thus, one would compare one myoglobin 
with another myoglobin and one hemoglobin with 
another hemoglobin. The existence of several subfamilies
of sequences (such as α and β globins) and
of pseudogenes that are not expressed can lead to 
difficulties. Once the comparable sequences have been 
chosen there is still the problem of which sites should 
be compared, because of insertions and deletions in 
the sequences. This is again a question of homology, 
homology of sites, and in general it is solved by 
searching for that alignment between two sequences 
that gives the greatest number of matches along the 
entire sequence. The great success of molecular classi- 
fication shows that this approach is sound, although 
there may be some areas that present difficulty. 
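The criterion described here, choosing the alignment that gives the greatest number of matches, can be computed by dynamic programming. The sketch below scores a match as 1 and mismatches and gaps as 0, in which case the best alignment score equals the length of the longest common subsequence. Practical alignment programs use substitution matrices and gap penalties instead, so this only illustrates the principle.

```python
def max_matches(a, b):
    """Greatest number of matched sites over all gapped alignments of a and b.

    Scoring (assumed for illustration): match = 1, mismatch = 0, gap = 0,
    which reduces alignment to the longest common subsequence problem.
    """
    # score[i][j] = best number of matches between a[:i] and b[:j]
    score = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                score[i][j] = score[i - 1][j - 1] + 1
            else:
                score[i][j] = max(score[i - 1][j], score[i][j - 1])
    return score[len(a)][len(b)]


# One deletion in the second sequence: allowing a gap recovers 6 matches,
# whereas comparing site by site without a gap gives only 3.
print(max_matches("GATTACA", "GATACA"))  # 6
```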

A third problem is what weight should be given to 
different properties or characters. Thus, should char- 
acter 1 be considered ten times as important as char- 
acter 2, or perhaps only one-tenth? The answer is in 
general based on the amount of information given by 
a character. Complex characters should be broken 
down into several unit characters, each of which is 
given the same weight. The justification is that each 
unit of information should contribute unit weight. This 
broad principle implies that in numerical taxonomy 
characters are equally weighted as far as possible,
although minor deviations from the rule have little 
effect on the final outcome. The principle prevents 
wildly different weights from producing serious dis- 
tortion of the findings. 

Numerical taxonomy became embroiled in a dis- 
pute about evolution. The view has grown that bio- 
logical groupings should above all reflect phylogeny. 
It is of course perfectly permissible to aim for the best 
phylogenies, and numerous well-founded algorithms 
have been devised for this. But insistence on phyl- 
ogeny does not obviate the need to consider character 
choice, homology, and character weighting. The dispute
centered on the concepts of cladistics, i.e., on
how to recognize phyla or clades. These ideas were
introduced by followers of the German zoologist
Hennig, and in their strict form they required phylogenetic
groups to be formed exclusively from identical
properties that were derived by descent from a com- 
mon ancestor. Such properties clearly cannot be 
known before one knows the phylogeny, which is 
what is to be determined. Strict cladistics is therefore 
not an operational method. Cladistic ideas have since 
become more complicated, and cladistics is now 
becoming largely a synonym for some form of phylo- 
genetic analysis. 


Steps in Numerical Taxonomy 
The steps in numerical taxonomy are as follows: 


1. The entities to be studied are chosen, together with 
the properties that are to be employed. The proper- 
ties are then coded in numerical form. 

2. Similarities between the entities are calculated. 

3. The salient taxonomic structure is determined from 
the similarities and is summarized in the form of 
groups of entities. 

4. The groups are treated as successively inclusive 
groupings using criteria such as taxonomic rank or 
phylogenetic age. 

5. The data are reorganized to give identification sys- 
tems for new, unknown, entities. 


These steps must be carried out in the order given 
above. One cannot, for example, choose the best char- 
acters for identification before the groups have been 
determined. Many studies only continue to step 3, and 
at present few continue to step 5. More details are 
given below. 


Step 1

The entities to be studied, t in number, may be of many
kinds: species, genera, populations, individuals, and
molecular sequences. Because they are so varied, they are termed
operational taxonomic units (OTUs). The properties
are termed characters, each with their character states. 
Thus, length of leaf is a character and 11 cm is a character
state of leaf length. Molecular properties are
usually sites in protein or nucleotide sequences, and 
their character states are amino acids or nucleotides. 
Character complexes are complex properties that can 
be broken down to single characters. Thus, leaf shape 
is a character complex. At this stage decisions on 
homology must be taken to ensure that comparisons 
will be among the correct characters. It is usual to 
employ as many characters as are feasible, covering a 
wide range of properties that are considered relevant, 
because the reliability of the groupings generally 
increases with the number of characters. 

The character states are then coded in a suitable 
numerical form. Characters are either qualitative 
(presence-absence, coded 1 or 0), multistate (e.g.,
amino acids, coded as one of the 20 alternatives), or 
quantitative (e.g., length of leaf, coded in cm). The 
latter require scaling, because the units in which they 
are measured must be controlled, otherwise their 
effect on the analysis becomes indeterminate. Thus, 
5 cm could be scored as 50 mm, 0.05 m, or even
0.00005 km. Some rational solution is needed, and 
this is usually by ranging them between 0 and 1 (for 
the smallest and largest measurement in the OTUs, 
respectively) or else by standardizing them to zero 
mean and standard deviation of 1. The OTUs and 
characters form a rectangular matrix of n rows of 
unit characters with t columns of OTUs, whose 
entries are the coded and scaled character states. 
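The two scaling options can be illustrated with a minimal Python sketch. It assumes that each quantitative character is a list of values, one per OTU. The leaf lengths are invented, and the population standard deviation is used (some authors prefer the sample standard deviation). The sketch also shows that ranging removes the dependence on the unit of measurement.

```python
from statistics import mean, pstdev

def range_scale(values):
    """Ranging: map the smallest value to 0 and the largest to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:  # a constant character carries no information
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

def standardize(values):
    """Rescale to zero mean and unit (population) standard deviation."""
    mu, sd = mean(values), pstdev(values)
    if sd == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sd for v in values]

# Hypothetical leaf lengths for four OTUs, recorded in cm and in mm.
leaf_cm = [5.0, 7.0, 11.0, 9.0]
leaf_mm = [v * 10 for v in leaf_cm]

print(range_scale(leaf_cm))                          # roughly [0.0, 0.333, 1.0, 0.667]
print(range_scale(leaf_mm) == range_scale(leaf_cm))  # True: units no longer matter
print(standardize(leaf_cm))
```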


Step 2 

The similarities between the OTUs are calculated 
using one of the coefficients of similarity or dissimil- 
arity. A simple coefficient is the proportion of 
matches, m, in a set of n qualitative characters. For 
example, 25 matches in 32 characters gives 78.1% 
similarity. This can also be expressed as dissimilarity 
of 1 − m, and it can be represented in the alternative
form of a distance of 0.219. The similarity of identical 
OTUs can be given as 1.0 or 100%, or as a distance
of zero. These values yield a square similarity
matrix of size t × t (though usually only the lower
triangular half is recorded). The relationships can also 
be represented in space, where the positions of the 
OTUs are points in an imaginary space of n dimen- 
sions. There are numerous coefficients of similarity, 
which are chosen to reflect various desired types of 
relationship. 
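The simple matching coefficient and the lower-triangular similarity matrix can be sketched in a few lines of Python. The function names are hypothetical, and the sketch assumes that every OTU is scored for the same characters with no missing entries. The worked example reproduces the 25-in-32 figures given above.

```python
def simple_matching(a, b):
    """Proportion of matching character states, m, between two OTUs."""
    if len(a) != len(b):
        raise ValueError("OTUs must be scored for the same characters")
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return matches / len(a)

def similarity_matrix(otus):
    """Lower-triangular t x t matrix of simple matching similarities."""
    t = len(otus)
    return [[simple_matching(otus[i], otus[j]) for j in range(i + 1)]
            for i in range(t)]

# Worked example from the text: 25 matches in 32 two-state characters.
a = [1] * 32
b = [1] * 25 + [0] * 7
m = simple_matching(a, b)
print(round(m, 3), round(1 - m, 3))  # 0.781 0.219
print(similarity_matrix([a, b]))     # [[1.0], [0.78125, 1.0]]
```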

Certain experimental techniques yield the equiva- 
lent of similarity matrices directly. Thus, a table of 
serological cross-reactions, or of nucleic acid hybrid- 
ization, records similarity between organisms from 
physicochemical reactions. The entries are not char- 
acter states of OTUs. 
Step 3

A table of similarities does not make evident the rela- 
tionships between groups of OTUs. Two main classes 
of method are available for elucidating the taxonomic 
structure. The first leads to tree-like diagrams, or den- 
drograms, in which the OTUs are situated at the tips 
of the branches. The second yields plots of the pos- 
itions of the OTUs in a simplified space, usually two- 
or three-dimensional diagrams. 

In the first group are algorithms for cluster analysis 
and for phylogenetic reconstruction. These differ in 
the assumptions about the cause of the observed rela- 
tionships. Most cluster methods search the similarity 
matrix for the most similar OTUs and group them 
together, and then successively add the next most 
similar OTUs until all have joined. Various criteria 
are used for the joining process. Taking the smallest distance
between any OTU of one cluster and any OTU of another
cluster gives straggly clusters (single linkage analysis).
The average distance between all members of one 
cluster and all of another cluster is the criterion in 
average linkage analyses (the best known is the 
unweighted pair group method with averages, 
UPGMA). The similarity level at which branches 
join forms one axis of the tree and the OTUs are 
given in order of joining along the other axis. The 
tips of the tree are all at the same level, i.e., at similarity 
of 100% or distance of zero. Such a tree is termed a 
phenogram, and the relationships express similarity 
without phylogenetic assumptions. Methods for 
reconstructing phylogeny rely on assumptions (often 
very complex) about the way evolution has proceeded.
The basic principle is that evolutionary change has 
been as small as possible to yield the observed rela- 
tionships between OTUs. This may be described 
loosely as the principle of evolutionary parsimony. 
The term parsimony is also used in more restricted 
senses, so that a most parsimonious tree is different 
from a minimum distance tree or a maximum like- 
lihood tree, though all of these rest on the broad 
principle of minimum evolutionary change. These 
techniques also add OTUs successively to give a 
tree-like diagram, where the branches are phylogen- 
etic groups (phyla or clades). Some methods bypass 
the similarity matrix itself, though similarities are 
implied in some form. Because different rates of evolution
are taken into account, the tips are not all at the
same level. The resultant dendrogram is a phylogen- 
etic tree or cladogram. Furthermore, the position 
representing the earliest point in time, which corres- 
ponds to the most recent common ancestor of the 
OTUs, is often uncertain, and such a tree is termed 
an unrooted tree. Further analysis may be needed to 
determine the root, for example by including a distant 
OTU that is believed to belong to a different clade
from all the other OTUs. The diagram then becomes a
rooted tree. 
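The joining procedure of cluster analysis can be illustrated with a small Python sketch. It contrasts single linkage with average linkage (UPGMA). The function name and the four-OTU distance matrix are invented for illustration. This naive version recomputes between-cluster distances from the original matrix at every step, which keeps it short but slow. Practical programs update the matrix incrementally and also draw the dendrogram.

```python
def agglomerate(dist, labels, method="average"):
    """Repeatedly join the two closest clusters of OTUs.

    dist:   symmetric t x t list of distances between OTUs
    method: "single"  -> smallest OTU-to-OTU distance between clusters
            "average" -> mean of all between-cluster distances (UPGMA)
    Returns the joins as (members_a, members_b, joining_distance).
    """
    clusters = [frozenset([i]) for i in range(len(labels))]

    def between(a, b):
        pairs = [dist[i][j] for i in a for j in b]
        return min(pairs) if method == "single" else sum(pairs) / len(pairs)

    joins = []
    while len(clusters) > 1:
        # Find the closest pair among the current clusters.
        d, x, y = min(
            (between(clusters[p], clusters[q]), p, q)
            for p in range(len(clusters))
            for q in range(p + 1, len(clusters))
        )
        a, b = clusters[x], clusters[y]
        joins.append((sorted(labels[i] for i in a),
                      sorted(labels[i] for i in b), d))
        clusters = [c for k, c in enumerate(clusters) if k not in (x, y)]
        clusters.append(a | b)
    return joins

labels = ["A", "B", "C", "D"]
D = [[0, 2, 6, 10],
     [2, 0, 6, 10],
     [6, 6, 0, 8],
     [10, 10, 8, 0]]
for method in ("single", "average"):
    print(method, agglomerate(D, labels, method))
# The last join is at 8 for single linkage and about 9.33 for UPGMA.
```

The two criteria give the same joining order here. Single linkage, however, places D closer to the rest because it uses only the single nearest pair (C to D).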

A method that is less often used relies on evolu- 
tionary compatibility between pairs of characters, 
known as clique analysis. The character states of a 
pair of characters may allow representation on a phylogenetic
tree such that no repeated mutations, or back
mutations, are required to account for the observed 
data. Such a pair is termed compatible, and the groups 
of mutually compatible characters are taken to indi- 
cate the clades. The criterion of parsimony here is 
parsimony of unnecessary mutations. 
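For two-state characters, pairwise compatibility can be checked with the standard four-gamete test: two characters can share a tree without repeated or back mutations unless all four state combinations (0,0), (0,1), (1,0), and (1,1) occur among the OTUs. The Python sketch below applies this test, with ancestral states left unspecified, and then searches by brute force for the largest clique of mutually compatible characters. The function names and data are hypothetical. The exhaustive search is feasible only for a handful of characters.

```python
from itertools import combinations

def compatible(c1, c2):
    """Two two-state characters (lists of 0/1, one entry per OTU) are
    compatible when at most three of the four state combinations occur."""
    return len(set(zip(c1, c2))) < 4

def largest_clique(chars):
    """Return the indices of a largest set of mutually compatible
    characters, trying the biggest subsets first."""
    n = len(chars)
    for size in range(n, 0, -1):
        for subset in combinations(range(n), size):
            if all(compatible(chars[i], chars[j])
                   for i, j in combinations(subset, 2)):
                return subset
    return ()

# Three invented characters scored for four OTUs.
c0 = [1, 1, 0, 0]
c1 = [1, 0, 0, 0]
c2 = [0, 1, 1, 0]
print(compatible(c0, c1), compatible(c0, c2))  # True False
print(largest_clique([c0, c1, c2]))            # (0, 1)
```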

One pervasive problem is that unlike most methods 
of cluster analysis, many techniques for phylogenetic 
reconstruction cannot be guaranteed to find the opti- 
mal tree. The reason is that the number of alternative 
tree topologies grows very rapidly with increasing 
numbers of OTUs. It is then infeasible to test every 
topology, even with powerful computers, and compu- 
tational short-cuts still leave many alternatives to test. 
For example, there are over 8 × 10^36 unrooted topologies for 30
OTUs. Furthermore, there are often many trees with 
the same greatest optimality, so that although they dif- 
fer little, there may be no criteria to choose among them. 
It is evident that phylogenetic reconstructions must 
always be regarded to some extent as approximate. 
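The growth in the number of topologies follows a known formula for fully bifurcating trees with t labelled tips: there are (2t − 5)!! unrooted trees, which is the product of the odd numbers 1 × 3 × 5 × ... × (2t − 5). There are (2t − 3)!! rooted trees. The short Python sketch below evaluates the formula and recovers the figure for 30 OTUs quoted above.

```python
def unrooted_topologies(t):
    """Number of unrooted, fully bifurcating trees for t >= 3 labelled
    OTUs: (2t - 5)!! = 1 * 3 * 5 * ... * (2t - 5)."""
    if t < 3:
        raise ValueError("need at least three OTUs")
    count = 1
    for k in range(3, 2 * t - 4, 2):  # odd factors 3, 5, ..., 2t - 5
        count *= k
    return count

def rooted_topologies(t):
    """Rooted trees for t OTUs: (2t - 3)!!, i.e. the unrooted count for t + 1."""
    return unrooted_topologies(t + 1)

for t in (4, 10, 30):
    print(t, unrooted_topologies(t))
# 10 OTUs already give 2,027,025 unrooted trees; 30 give about 8.7e36.
```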

The second group of methods is known as ordin- 
ation analysis. They reduce the many dimensions 
represented by the similarity matrix to a few dimen- 
sions that express as much as possible of the observed 
variation. This yields plots in two or three dimensions. 
The OTUs are represented by points on such dia- 
grams, and clusters of closely related OTUs can be 
seen by eye. Well known techniques are principal 
component and principal coordinate analysis. If 
ordination is to be useful, a high proportion of the 
variation must be expressed in the first two or three 
dimensions. There is always a danger that clusters of 
OTUs that are quite separate in multidimensional 
space, and are easily revealed by cluster analysis to 
be distinct, will be overlapped in ordination plots. 
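Principal coordinate analysis (classical scaling) can be sketched with NumPy. The function name is hypothetical. The method double-centres the squared distance matrix and takes its leading eigenvectors. Negative eigenvalues, which arise for non-Euclidean distances, are simply clipped to zero here, which is only one of several conventions. The returned proportions show how much of the variation the first axes carry. This is the quantity the text says must be high if an ordination plot is to be trusted.

```python
import numpy as np

def principal_coordinates(d, k=2):
    """Classical principal coordinate analysis of a t x t distance matrix.

    Returns the first k coordinates of each OTU and the proportion of the
    (positive) variation carried by each of those k axes.
    """
    d = np.asarray(d, dtype=float)
    t = d.shape[0]
    j = np.eye(t) - np.ones((t, t)) / t   # centering matrix
    b = -0.5 * j @ (d ** 2) @ j           # double-centred matrix
    vals, vecs = np.linalg.eigh(b)        # eigenvalues in ascending order
    order = np.argsort(vals)[::-1]        # largest first
    vals, vecs = vals[order], vecs[:, order]
    pos = np.clip(vals, 0, None)          # drop negative eigenvalues
    coords = vecs[:, :k] * np.sqrt(pos[:k])
    explained = pos[:k] / pos.sum()
    return coords, explained

# Four OTUs at the corners of a 3 x 4 rectangle: two axes capture everything.
pts = np.array([[0.0, 0.0], [3.0, 0.0], [0.0, 4.0], [3.0, 4.0]])
d = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)
coords, explained = principal_coordinates(d, k=2)
print(explained)  # roughly [0.64, 0.36]
```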

If a similarity matrix is rearranged to bring highly 
similar OTUs together, the cells of the matrix can be 
shaded in different intensities according to the simi- 
larity values. Such shaded similarity matrices are occa- 
sionally useful in interpreting taxonomic structure. 

These various methods for revealing structure emphasize
different aspects of relationship, and inevitably lead 
to some loss of information. Their choice depends 
largely on the aims of the investigator. 


Step 4 

The taxonomic structure can now be represented for- 
mally as taxonomic or phylogenetic groups of OTUs, 
with appropriate indications of their status. Thus
taxonomic ranks can be defined and named. There are 
some problems, because objective criteria for rank or 
for cladal status are not well developed, and nomen- 
clature can be controversial. Good scientific judge- 
ment in the light of other knowledge is therefore 
indispensable. The final groupings can then be 
described in various forms, such as tables of common 
character states or age of origin. 


Step 5 

The relevant information is now available for pro- 
ducing an identification system whereby further, 
unknown, members of the groups can be identified 
with their correct group. Various strategies are em- 
ployed. One is to construct a diagnostic key, prefer- 
ably by one of the algorithms for this. Such a key 
is similar to ‘rule-based systems’ in information re- 
trieval, and the groups are usually treated as monothetic 
(i.e., it is assumed all new members of a group will 
possess the character states given in the key). Another 
strategy is to treat the groups and the unknown as 
having position in a phenetic space defined by the 
characters that distinguish groups. The unknown is 
then identified with the group to which it is nearest 
(i.e., most similar). A simple form of such a system is a 
diagnostic table, which can be compared with the 
unknown to find the closest match. Such methods 
are very similar to ‘expert systems’ in information 
retrieval. Here the groups are treated as polythetic;
therefore, the correct identity is not excluded by an 
occasional atypical or missing character state of the 
unknown. Also, the probability of a correct identifi- 
cation can usually be calculated. This strategy is close 
to discriminant functions in statistics. Numerical 
identification has been most used in microbiology 
but it is being applied to many fields where identifica- 
tion, recognition, or diagnosis is required. 
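The probabilistic, diagnostic-table strategy can be sketched in Python. The function name, the character names, and every proportion in the table are invented for illustration. The sketch also assumes that characters are independent and that all groups are equally likely beforehand. Missing states in the unknown are skipped. An atypical state lowers a group's score without excluding the group, and this is what polythetic treatment means in practice.

```python
def identify(unknown, table):
    """Score an unknown against a diagnostic table.

    unknown: dict character -> 1 (positive), 0 (negative) or None (missing)
    table:   dict group -> dict character -> proportion of group members
             positive for that character (assumed strictly between 0 and 1)
    Returns the best group and the normalized identification scores.
    """
    likelihood = {}
    for group, probs in table.items():
        lk = 1.0
        for char, state in unknown.items():
            if state is None:  # missing data: simply skipped
                continue
            p = probs[char]
            lk *= p if state == 1 else 1.0 - p
        likelihood[group] = lk
    total = sum(likelihood.values())
    scores = {g: lk / total for g, lk in likelihood.items()}
    return max(scores, key=scores.get), scores

# Hypothetical diagnostic table for two groups.
table = {
    "group A": {"gas": 0.95, "motile": 0.90, "indole": 0.05},
    "group B": {"gas": 0.10, "motile": 0.85, "indole": 0.90},
}
# An atypical (non-motile) unknown with one missing result.
unknown = {"gas": 1, "motile": 0, "indole": None}
print(identify(unknown, table))  # group A, with a score of about 0.86
```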


Additional Techniques 


Criteria of quality are needed for numerical tech- 
niques. Thus, statistical sampling theory leads to esti- 
mates of how accurate a similarity value is likely to be. 
The extent to which a dendrogram represents a simi- 
larity matrix can be measured by the cophenetic cor- 
relation coefficient, and there are techniques to assess 
the agreement or congruence between different data 
sets. Related to this are methods to combine data in the 
form of consensus trees. Repeated random sampling 
can estimate the reliability of clades in a cladogram 
(bootstrap analysis). 
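The cophenetic correlation coefficient can be sketched in plain Python. First cluster the OTUs by UPGMA. Then record, for every pair of OTUs, the level at which the two first join (the cophenetic matrix). Finally, correlate those levels with the original distances. The function names and the matrices are hypothetical. A value of 1 means that the dendrogram reproduces the matrix exactly, which happens when the distances are ultrametric.

```python
from itertools import combinations
from math import sqrt

def upgma_cophenetic(dist):
    """Cluster with UPGMA and return the cophenetic matrix: the distance
    level at which each pair of OTUs first joins the same cluster."""
    t = len(dist)
    coph = [[0.0] * t for _ in range(t)]
    clusters = [[i] for i in range(t)]
    while len(clusters) > 1:
        best = None
        for p, q in combinations(range(len(clusters)), 2):
            a, b = clusters[p], clusters[q]
            d = sum(dist[i][j] for i in a for j in b) / (len(a) * len(b))
            if best is None or d < best[0]:
                best = (d, p, q)
        d, p, q = best
        for i in clusters[p]:
            for j in clusters[q]:
                coph[i][j] = coph[j][i] = d
        merged = clusters[p] + clusters[q]
        clusters = [c for k, c in enumerate(clusters) if k not in (p, q)]
        clusters.append(merged)
    return coph

def cophenetic_correlation(dist):
    """Pearson correlation between original and cophenetic distances."""
    coph = upgma_cophenetic(dist)
    pairs = list(combinations(range(len(dist)), 2))
    x = [dist[i][j] for i, j in pairs]
    y = [coph[i][j] for i, j in pairs]
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

ultra = [[0, 2, 8, 8], [2, 0, 8, 8], [8, 8, 0, 4], [8, 8, 4, 0]]
noisy = [[0, 2, 6, 10], [2, 0, 6, 10], [6, 6, 0, 8], [10, 10, 8, 0]]
print(cophenetic_correlation(ultra))  # 1.0, up to rounding
print(cophenetic_correlation(noisy))  # about 0.97
```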

Current work in biology is largely concentrated on 
reconstructing phylogeny from molecular sequences, 
using algorithms that are highly specialized to pro- 
teins or nucleotide sequences. However, it should be 
remembered that molecular or genomic data are not
necessarily phylogenetic. The distinction between 
phenetic and cladistic relationships is based on the 
methods employed, and phenetic analyses can be 
made from genomic data. 


Other Applications 


The commonest types of analysis, as described 
above, are those that group organisms together; this 
is termed Q analysis. However, one can group the 
characters; this is termed R analysis. This is useful in 
several ways. The grouping of characters can reveal 
complexes that covary, and provide insight into de- 
velopmental genetics. Similarly, grouping geograp- 
hical areas according to their biota can be of 
assistance in biogeography. 

Numerical taxonomy has been successfully applied 
in a wide range of disciplines. It has been adapted to 
problems in ecology, morphometrics, epidemiology, 
and geographical variation. Its concepts are extensively 
used in genomic analysis and information retrieval. 
Most of these applications have had to face basic ques- 
tions, such as homology, choice of characters, and 
character weighting. It has thus led many disciplines 
to re-examine and redefine their aims and assumptions. 
Tay-Sachs disease (TSD) is a progressive, uniformly
fatal, neurodegenerative disorder of infancy: the
acute infantile form of the GM2 gangliosidoses, one
subgroup of the lysosomal storage disorders. TSD is 
named for the two physicians who first described the 
condition in the 1880s, Warren Tay, a British ophthal- 
mologist, and Bernard Sachs, a neurologist in New 
York City. 
Inheritance, Distribution, and Frequency


TSD is inherited with an autosomal recessive pattern 
of transmission. Heterozygote carriers are entirely 
normal. TSD has been described in infants of all racial 
and ethnic groups, but historically has been identified 
predominantly among children of Central/Eastern 
European Jewish ancestry (Ashkenazim). The hetero- 
zygote frequency for TSD among Ashkenazi Jews is 
between 1/25 and 1/30 individuals, with a disease incidence
of about 1 in 3000 births (1/27 × 1/27 × 1/4).
Among general non-Jewish populations, the TSD 
carrier rate is approximately 1 in 300, making the 
disease incidence approximately 1 in 360,000 births
(1/300 × 1/300 × 1/4). Certain non-Jewish isolates
with increased TSD have been found among the 
Pennsylvania-Dutch, the Cajuns of Louisiana, and 
some French Canadians from Quebec, Canada. 
Other non-Jewish isolates with TSD, in China, Japan,
and Morocco, have also been identified; all are probable
examples of genetic founder effect and drift. 
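The incidence figures above follow from a simple calculation. Under random mating, the chance that both parents are carriers is the square of the carrier rate, and a child of two carriers is affected with probability 1/4. The Python sketch below reproduces the two calculations in the text with exact fractions. The function name is hypothetical, and the random-mating assumption is implicit in the text's own multiplications.

```python
from fractions import Fraction

def incidence_from_carrier_rate(carrier_rate):
    """Expected frequency of affected births under random mating:
    P(both parents carriers) * P(child of two carriers is affected)."""
    return carrier_rate * carrier_rate * Fraction(1, 4)

ashkenazi = incidence_from_carrier_rate(Fraction(1, 27))
general = incidence_from_carrier_rate(Fraction(1, 300))
print(ashkenazi)  # 1/2916, i.e. about 1 in 3000
print(general)    # 1/360000
```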


Pathogenesis 


TSD results from the progressive intralysosomal
accumulation of GM2 ganglioside, a normal component of
neuronal membranes. The defect in TSD is deficient
activity of GM2 gangliosidase, the enzyme required to
catalyze the intralysosomal hydrolytic cleavage of the
terminal N-acetylgalactosamine from GM2
ganglioside. This enzyme is also named hexosaminidase A
(HEX A) when its activity is assayed with
colorimetric or fluorogenic artificial substrates. In the
absence of this hydrolysis, GM2 remains intact and,
with continued normal biosynthesis, progressively
accumulates within neuronal lysosomes. Increasing
storage of GM2 leads to progressive engorgement of
cytoplasmic lysosomes, forming the characteristic 
membranous cytoplasmic bodies (‘onion skin lesions’) 
on electron microscopy. Cytoplasmic engorgement 
disrupts normal neuronal cell function as reflected by 
the increasing neurologic symptomatology (weakness, 
blindness, seizures, etc.) and ultimately leads to neur- 
onal cell death. 


Clinical Description 


Although the disease process begins in the fetal ner- 
vous system early in gestation, the affected infant 
appears entirely normal at birth and remains so 
throughout the first 4-6 months of life. Motor weak- 
ness (e.g., floppiness or poor head control) or an 
‘increased startle response’ to sharp sounds may be 
the first difficulties observed by parents. Further,
wandering eye movements at about this age may lead to
specialist referral, where the characteristic ‘cherry red
spot’ in the fovea of the maculae is seen, thus leading 
to the diagnosis. From 6-12 months, there is progres- 
sive deterioration, with increasing weakness, loss of 
motor and developmental milestones if previously 
gained (e.g., rollover, sit alone) and failure to gain 
new ones (e.g., stand, coast, walk, talk). Physical
findings reveal only fundoscopic changes, no enlargement
of the liver, spleen, or other organs, and evidence of
upper motor neuron dysfunction (hyperreflexia, sus- 
tained ankle clonus, pathologic startle response, etc.). 
Diminished vision is evident and progressive seizures 
usually begin around 12-14 months. Deterioration 
continues and by 16-18 months decerebrate postur- 
ing, blindness, and complete loss of meaningful inter- 
action with the immediate environment is apparent. 
Management issues subsequently include feeding, 
hydration, airway care, skin care, seizure control, 
and maintenance of bowel and bladder function. The 
child will remain in this ‘chronic vegetative state’ 
until death between 2 and 5 years of age, usually the 
result of acute aspiration or overwhelming infection, 
secondary to pneumonia. 


Diagnosis 


With artificial substrates, HEX A and HEX B are 
easily quantified in serum, leukocytes, cultured skin 
fibroblasts, or amniotic fluid cell samples from sus- 
pected patients or fetuses at risk for TSD. A complete 
or near complete deficiency of HEX A activity (in the 
presence of normal or increased HEX B) is diagnostic 
of TSD in a symptomatic infant or an affected fetus. 
Heterozygotes have HEX A levels of approximately 
50% of the control level. Where specific mutations are 
known to segregate in a family, PCR-based DNA 
mutation analysis can be used both for diagnosis and 
carrier identification. 


Molecular Genetics 


The 40-kb, 14-exon gene for TSD is located on
chromosome 15q23 and directs the synthesis of the α-subunit
of GM2 gangliosidase (HEX A). This enzyme is
a heterodimer composed of one α-subunit and one
β-subunit (derived from the HEX B gene on
chromosome 5q13). Mutations in the α-subunit gene are
associated with TSD and its later-onset variants, while
β-subunit gene mutations account for Sandhoff
disease and its variants. Mutations in a third gene, the
GM2 activator gene on chromosome 5q, can also lead
to insufficient GM2 degradation and a rare form of
GM2 gangliosidosis, ‘activator-deficient TSD.’ More
than 100 α-subunit (HEX A) gene mutations have
been identified to date.


Treatment 


No specific treatment for TSD is presently available. 
Attempts to introduce GM2 gangliosidase into the
central nervous system by purified enzyme infusions, 
by cellular transfusions or by bone marrow trans- 
plantation have uniformly failed to date. Traversing 
the blood-brain barrier — with enzymatic protein, 
cellular elements, or gene-carrying vectors — remains 
the major obstacle to therapeutic breakthrough. 
The creation of knockout mouse models for both 
TSD and Sandhoff disease provides important new 
avenues for such studies. Most recently, research 
efforts have been directed to minimizing the 
accumulation of GM2 by inhibiting its biosynthesis
with nojirimycin derivatives, relatively nonspecific 
inhibitors of glycosphingolipid production. Some 
hope that this may prove beneficial. Lastly,
while widely discussed, vector-mediated gene therapy 
remains a hope for the future, and will require major 
further research breakthroughs if ever to become a 
reality. 


Prevention 


Major strides have been made to prevent the birth of 
infants affected with TSD. Community-based educa- 
tion, voluntary carrier testing, and genetic counseling 
programs in Jewish communities throughout the 
world have led to a greater than 90% reduction in
the incidence of TSD in Ashkenazi Jewish populations.
Carrier testing (HEX A and/or DNA-based
testing) enables the identification of at-risk couples
(both partners heterozygotes) before the birth of
affected offspring. Such couples, with comprehensive 
genetic counseling, may choose to monitor each preg- 
nancy by amniocentesis or chorionic villus sampling 
and interrupt (abort) those pregnancies where the 
fetus is affected (25% risk with each pregnancy). 
Other options include adoption, use of noncarrier 
sperm or ovum donors, preimplantation testing of embryos
after in vitro fertilization, taking their chances,
or use of carrier status information in marriage or 
mating decisions (as carried out by certain ultra- 
orthodox Jewish groups). Until effective therapeutic 
breakthroughs occur, preventative approaches will 
remain the mainstay in the management of this
dreaded disorder. 
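The 25% risk per pregnancy for a carrier couple follows from Mendelian segregation. Each parent passes on one of its two alleles with equal probability, so the four combinations are equally likely. A minimal Python sketch enumerates them. The allele labels "N" (normal) and "t" (TSD) are arbitrary symbols chosen here. Each pregnancy is independent, so the risk stays at 1 in 4 regardless of the outcome of earlier pregnancies.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

def offspring_risks(parent1, parent2):
    """Enumerate the equally likely child genotypes for two parents,
    each given as a pair of alleles, e.g. ("N", "t")."""
    children = Counter("".join(sorted(pair)) for pair in product(parent1, parent2))
    total = sum(children.values())
    return {genotype: Fraction(n, total) for genotype, n in children.items()}

carrier = ("N", "t")
noncarrier = ("N", "N")
print(offspring_risks(carrier, carrier))     # NN 1/4, Nt 1/2, tt 1/4
print(offspring_risks(carrier, noncarrier))  # NN 1/2, Nt 1/2: no affected child
```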
History


The Brachyury or T locus, first identified through a
mutation described in 1927, was one of the earliest recognized
developmental genes in mice. Mutations at this locus
cause embryonic death in homozygotes and defects in tail
development in heterozygotes (hence the locus name Brachyury,
which means short tail, symbolized as T for tail). In
1990, the gene itself was cloned and found to be a 
novel transcription factor, and it was not long there- 
after that the existence of a family of T-related genes 
was demonstrated by the discovery of genes in both 
Drosophila and mice, with sequence homology to 
T. Discovery and exploration of the family, which 
was called the T-box gene family, has proceeded by 
leaps and bounds and T-box genes have been found in 
species as divergent as the roundworm, Caenorhabdi- 
tis elegans, and Homo sapiens, as well as many species 
in between (Figure 1). 


Defining Features of the Family 


The proteins encoded by the T-box family genes are all 
putative transcription factors. The defining feature of 
the gene family is a region of DNA sequence hom- 
ology that encodes a polypeptide region named the 
T-box, extending across 180 to 190 amino acid resi- 
dues. The gene products of several members of the 
T-box gene family have been shown to have a domain 
of specific DNA binding activity which includes the 
T-box region, leading to the hypothesis that DNA 
binding is conserved among all proteins containing 
the T-box polypeptide domain, even though polypep- 
tides diverge widely outside the region encoded by the 
T-box. DNA-binding activity, along with the nuclear 
localization of the gene products, suggested that these 
proteins act as transcriptional regulators of other 
genes. Indeed, transcriptional regulation has been
demonstrated for several T-box gene products, and it
seems very likely to be a common feature of all
members of the family. A productive area of future
research will be the elucidation of the nature of this
transcriptional control and the discovery of the specific
genes that are regulated by T-box genes.

Figure 1 A phylogenetic tree of the T-box gene family constructed using the neighbor-joining algorithm based on
Poisson-corrected distances between amino acid sequences. The length of the horizontal lines is proportional to
evolutionary distance. T-box subfamilies are grouped and indicated by brackets. Five Caenorhabditis elegans T-box
genes and one ascidian gene at the bottom of the tree have yet to be classified into particular subfamilies. Eight known
human genes are not included but are closely related to their mouse orthologs. as, ascidian; am, amphioxus; ce,
C. elegans; dm, Drosophila; ch, chick; mu, mouse; x, Xenopus; zf, zebrafish. (Reproduced with permission from
Papaioannou and Silver, 1998.)


Phylogenetic Analysis 


Phylogenetic analysis provides a powerful tool for 
dissecting the evolution of gene families, and for 
predicting and understanding functional relationships
among different family members. Phylogenetic
analysis of the T-box gene family (Figure 1) reveals 
that this is an ancient gene family. Its initial expansion 
from a single progenitor sequence appears to have 
occurred at the outset of metazoan evolution and 
further expansions have occurred by gene duplication 
along individual evolutionary lineages. Together with 
gene expression studies, phylogenetic comparisons 
also provide evidence for the existence of T-box gene 
subfamilies whose more recently duplicated members 
retain similar or overlapping patterns of gene expres- 
sion, most likely correlated with conserved function 
as well. For example, the Tbx2 and Tbx3 genes are 
members of an ancient vertebrate subfamily that 
expanded prior to the divergence of bony fish and 
tetrapods. These two genes have expression patterns 
that are broadly similar, both within and between spe- 
cies, although minor temporal and spatial differences in 
expression may reflect divergence of function that 
could have occurred since the separation of the genes. 

Phylogenetics also allows the identification of what 
are likely to be orthologs of the same gene in different 
species. Orthologs are defined as direct descendants 
from a single ancestral gene that was present in the 
genome of the common ancestor of the species under 
analysis, for example, the Brachyury orthologs found 
in many species (Figure 1, T subfamily). Analysis of
the T-box family tree can direct the search for new 
T-box genes by predicting the existence of orthologs 
in species where they have yet to be discovered, thus 
hastening gene discovery. 
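The Figure 1 legend states that the tree was built by neighbor joining from Poisson-corrected distances between amino acid sequences. The distance step can be sketched in Python. The correction d = −ln(1 − p) converts the observed proportion of differing sites, p, into an estimate of substitutions per site that allows for multiple substitutions at the same site. The sketch assumes that the sequences are already aligned, and it drops gapped positions pairwise. That is one common convention, not necessarily the one used for the figure. The function names and the toy sequences are hypothetical.

```python
from math import log

def p_distance(seq1, seq2):
    """Proportion of aligned, ungapped sites at which two sequences differ."""
    sites = [(a, b) for a, b in zip(seq1, seq2) if a != "-" and b != "-"]
    if not sites:
        raise ValueError("no comparable sites")
    return sum(a != b for a, b in sites) / len(sites)

def poisson_distance(seq1, seq2):
    """Poisson-corrected distance d = -ln(1 - p)."""
    p = p_distance(seq1, seq2)
    if p >= 1.0:
        raise ValueError("sequences too divergent for the correction")
    return -log(1.0 - p)

# Two invented 10-residue fragments differing at 2 sites (p = 0.2).
s1 = "MKTAYIAKQR"
s2 = "MKSAYIAKQW"
print(p_distance(s1, s2), poisson_distance(s1, s2))  # 0.2 and about 0.223
```

The corrected distance is always at least as large as p, and the gap between them widens as sequences diverge. For this reason uncorrected distances tend to compress the deep branches of an ancient family such as the T-box genes.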


Role in Development 


T-box genes have been discovered primarily through 
screens designed to detect genes with embryonic 
expression or function. Although several T-box genes 
are known to be expressed in adult tissues, their wide- 
spread expression in embryonic tissues, particularly in 
areas of inductive tissue interactions, emphasizes what 
is almost certainly a family feature: a major role for T- 
box genes in embryonic development. 

One highly effective means of ascertaining the 
function of a gene is to find or create mutations in 
that gene in order to study the effects of its disruption. 
There is currently only a handful of known mutations 
in T-box genes. In addition to the well-studied 
Brachyury mutations in the mouse which affect the 
development of posterior structures including the tail, 
mutant alleles of the Drosophila ortholog, Trg, consti- 
tute a series of alleles with effects of varying severity 
on the development of posterior structures. Similarly, 
mutant alleles of the zebrafish Brachyury ortholog, 
no tail, also show effects on the development of pos- 
terior structures, illustrating comparable functions of 
orthologs in widely divergent species. Among the 
other T-box genes, there are spontaneous mutations
at the Drosophila omb locus, and in two human genes, 
TBX3 and TBX5. The human mutations are of
considerable interest in that they are responsible for
autosomal dominant developmental syndromes
known as ulnar-mammary syndrome and Holt- 
Oram syndrome, respectively. The ulnar-mammary 
syndrome is characterized by limb defects and 
abnormalities of apocrine glands including the mam- 
mary glands, while the Holt-Oram syndrome is char- 
acterized by cardiac septal defects and abnormalities 
of the forelimbs. 

A mutation in the mouse gene Tbx6 has been 
produced by targeted mutagenesis, a technique by 
which specific mutations can be created at will. 
Homozygous mutant embryos have severe defects in 
the specification and differentiation of the somites. 
Although the head region forms normally, neck 
somites are misshapen and more posterior somites 
fail to form at all. Instead, two ectopic neural tubes 
are present in place of the posterior somites. This 
mutant phenotype of three parallel neural tubes and 
no posterior somites, as well as the phenotypes of all 
other known mutations in T-box family genes, indi- 
cates a critical role for these genes in the specification 
and differentiation of tissues and structures during 
embryonic development. 

Even this small number of mutants has been ex- 
tremely valuable in elucidating the nature of T-box gene 
function. Future mutational analysis, particularly by 
targeted mutagenesis, holds the key to understanding 
individual T-box gene function and the functional 
significance of the family as a whole. 


Further Reading 

Herrmann BG (1995) The mouse Brachyury (T) gene. Seminars in 
Developmental Biology 6: 385-394. 

Papaioannou VE and Silver LM (1998) The T-box gene family. 
BioEssays 20: 9-19. 


See also: Gene Family; Transcription Factor 
Telomerase is a ribonucleoprotein complex that maintains
chromosome ends. It is a cellular reverse transcriptase
composed of both RNA and proteins that
employs its internal RNA component as a template 
for the synthesis of telomeric DNA. It stabilizes 
telomere length by adding hexameric (TTAGGG)n
repeats onto telomeric ends of chromosomes. After 
adding six bases, the enzyme is believed to pause while 
it repositions (translocates) the template RNA in 
order to synthesize the subsequent 6 bp repeat. This 
extension of the 3’ DNA template end in turn allows 
replication of the 5’ end of the lagging strand. It thus 
compensates for the continued erosion of telomeres 
and has been referred to as a ‘cellular immortalizing 
enzyme.’ 
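Template-directed addition of repeats can be illustrated with a toy Python model. The DNA grows 5′ to 3′, so the RNA template is read 3′ to 5′. A template segment reading 3′-AAUCCC-5′ therefore directs synthesis of 5′-TTAGGG-3′. The six-nucleotide template and the function names are simplifying assumptions made here. Real telomerase RNA templates are longer and include an alignment region that re-anneals to the newly made DNA during translocation, and the model ignores this.

```python
# RNA -> DNA base pairing used when the RNA template is copied.
PAIR = {"A": "T", "U": "A", "C": "G", "G": "C"}

# Hypothetical minimal template, written 3'->5' (the direction in which it is
# read while DNA grows 5'->3'). The alignment region is ignored.
TEMPLATE_3_TO_5 = "AAUCCC"

def copy_template(template_3_to_5):
    """Return the DNA synthesized by copying the template, written 5'->3'."""
    return "".join(PAIR[base] for base in template_3_to_5)

def extend(telomere_5_to_3, repeats):
    """Add `repeats` hexamers to the 3' end. After each hexamer the real
    enzyme pauses and translocates before copying the template again."""
    for _ in range(repeats):
        telomere_5_to_3 += copy_template(TEMPLATE_3_TO_5)
    return telomere_5_to_3

print(copy_template(TEMPLATE_3_TO_5))  # TTAGGG
print(extend("TTAGGG", 2))             # TTAGGGTTAGGGTTAGGG
```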


See also: Reverse Transcriptase 
The Need for Telomeres


Telomeres are found at the ends of chromosomes; they 
provide the answer to two problems of chromosome 
management. First, there has to be something to dis- 
tinguish true chromosome ends from the accidental 
ends resulting from chromosome breakage. There is 
much evidence from studies of the effect of radiation 
on chromosomes that broken ends are prone to indis- 
criminate rejoining, with the possibility of segmental 
rearrangement. Presumably they provide substrates 
for double-stranded DNA ligase, and they may also 
be subject to erosion by exonuclease. True chromo- 
some termini must be sealed in some way to protect 
them against these hazards. 

Second, there has to be a way of completing DNA 
replication. DNA polymerase extends DNA strands 
from their 3’ ends, and so the two strands of double- 
stranded DNA are replicated in opposite directions. 
The synthesis of one follows the replication fork and 
can be continuous, but the synthesis of the other runs 
‘backwards’ in piecemeal fashion, each piece having 
to be initiated afresh by an RNA primer that is 
subsequently removed. The consequence is that the 
DNA strand with a 5’ terminus cannot be fully repli- 
cated by the regular mechanism, since there is no 3’ 
end to prime the filling of the gap left by removal of 
the last RNA primer (Figure 1). Indeed, there is evi- 
dence that the 5’-terminated strand, already shorter, 
may be further shortened, leaving an even longer 
single-stranded 3’ ‘tail.’ So, without some additional 
end-replication mechanism, one would expect the 
chromosome to get a little shorter with each cycle of 
replication. Cell viability depends on the constant 
replenishment of terminal sequences. 
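The end-replication problem can be made concrete with a small simulation. The sketch below is a simplified model, not a description of any measured system: template positions are integers, Okazaki fragments have a fixed length, and the last primer is assumed (best case) to sit exactly at the chromosome end. The names `lagging_strand_coverage`, `fragment_len`, and `primer_len` are illustrative.

```python
def lagging_strand_coverage(template_len: int, fragment_len: int, primer_len: int) -> set[int]:
    """Template positions copied into the new 5'-terminated strand at one chromosome end.

    Positions run 0..template_len - 1, with the last position at the chromosome
    end (the template's 3' end). Each Okazaki fragment begins with an RNA primer
    at its 5' end (its high-coordinate end) and is extended toward position 0.
    """
    primers = []
    covered = set()
    # Lay out the fragments from the chromosome end inward; the order of
    # synthesis does not matter for the final coverage. Best case assumed:
    # the last primer sits exactly at the chromosome end.
    top = template_len - 1
    while top >= 0:
        bottom = max(0, top - fragment_len + 1)
        primers.append((max(0, top - primer_len + 1), top))
        covered.update(range(bottom, top + 1))
        top = bottom - 1
    # Remove every RNA primer.
    for lo, hi in primers:
        covered.difference_update(range(lo, hi + 1))
    # Each primer gap is filled by extending the 3' end of the neighbouring
    # fragment on the chromosome-end side; the gap at the very end has no
    # such neighbour, so it stays unfilled.
    for lo, hi in primers:
        if hi < template_len - 1:
            covered.update(range(lo, hi + 1))
    return covered


copied = lagging_strand_coverage(template_len=100, fragment_len=20, primer_len=10)
print(100 - len(copied), "terminal template bases left uncopied")
```

With a 10-nucleotide primer footprint, the new 5’-terminated strand stops 10 bases short of the end of its template; repeated over successive rounds with no additional end-replication mechanism, such losses accumulate.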


Figure 1 The end-replication problem. As the DNA
replication fork progresses, one of the two parental
strands can be replicated continuously, primed by its
own 3’ end; the other (the lower in this figure) has to be
replicated ‘backwards’ in patches, the replication of each
patch being primed separately by primase RNA (shown
here as a wavy line). When the replication fork reaches
the chromosome end, the 5’-terminated strand of one
daughter chromatid is incomplete, since the final RNA
primer cannot be replaced by DNA.


Terminal Repeats 


With the notable exception of Drosophila (see below), 
chromosomes, so far as they have been sequenced, 
have short tandem repeats at their extremities. These 
repeats are characteristically rich in G-C base pairs,
with the Gs predominantly in the 3’-terminated
strand, sometimes called the G-strand. The G-rich
repeats are more or less constant within organisms
but variable between organisms: TTGGGG in the
ciliate Tetrahymena, TG, TGG, or TGGG (TG1-3) in
Saccharomyces, and TTAGGG in human and mouse, to
give just a few examples. The total length of this region
of simple repeats ranges from just 20 base pairs in some
ciliates to more than 100 kb in mouse; it can also vary
within species, especially with ageing (see below).
Adjoining the simple repetitive telomeric sequence,
there is generally a region, called the subtelomere, of
less regular and generally longer repeats.
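Because the G-rich repeats are simple and organism-specific, the length of a terminal repeat tract can be measured by matching the repeat unit over and over at the 3’ end of the G-strand. The sketch below uses a Python regular expression; the helper name and the short test sequences are made up for illustration, and the pattern `TG{1,3}` stands in for the irregular TG1-3 repeats of Saccharomyces.

```python
import re


def terminal_repeat_length(g_strand: str, unit_pattern: str) -> int:
    """Length of the uninterrupted run of repeat units at the 3' end of a
    G-rich strand written 5'->3' (0 if the strand does not end in the repeat)."""
    # '(?:unit)+$' matches one or more repeat units anchored at the 3' end.
    match = re.search(rf"(?:{unit_pattern})+$", g_strand)
    return len(match.group(0)) if match else 0


# Hypothetical sequences, made up for illustration.
human_like = "GATC" + "TTAGGG" * 3
yeast_like = "ACGT" + "TGTGGTGGG"
print(terminal_repeat_length(human_like, "TTAGGG"))   # 18
print(terminal_repeat_length(yeast_like, "TG{1,3}"))  # 9
```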


Subtelomeric Structure 


In Saccharomyces yeast, where all chromosomes 
are now completely sequenced, the terminal TG1-3 
repeats are usually flanked on the inside by up to four 
tandemly arranged copies of sequences called Y’, 
which come in two main sizes, 5.2 and 6.7 kb, related 
by an internal deletion. They contain open reading 
frames of uncertain function. Inside the Y’ sequences 
is a segment called X, variable in length among yeast 
telomeres but with a core sequence of about 0.5 kb
which is common to all. It is thought to have a role in
the positioning of the telomeres at the periphery of the 
nucleus. Inside the X segment are further repeats, 
variable in length and sequence from one chromosome 
to another, before the gene-containing interior of the 
chromosome is reached. Further copies of the TG1-3 
repeats may be interspersed among the Y’ sequences 
(Figure 2). 

The X/Y’ subtelomeric sequence tends to attract 
certain DNA-binding proteins which serve as foci 
for the formation of transcriptionally silent chromatin 
structure (see below). 

The telomeres of other organisms are similar to 
those of Saccharomyces in the general sense that their 
terminal G/C-rich repetitive sequences are flanked by 
other kinds of repeats, but these show no consensus 
between organisms. 


Telomerase Function 


Terminal repeats are replenished through the activity 
of telomerase, an enzyme first isolated from Tetra- 
hymena, which is a particularly rich source. Like 
other ciliates, Tetrahymena has two kinds of cell nuclei: 
micronuclei, containing the total genome in single 
copy, and macronuclei, in which those genes currently 
active are excised in fragments and amplified to high 
copy number. Consequently, this organism has an 
exceptionally large number of chromosome ends to 
look after. In spite of the peculiarities of the organism, 
the Tetrahymena telomerase system appears to be 
typical of eukaryotic organisms generally. 

Telomerase is a reverse transcriptase which carries
its own single-stranded RNA molecule to serve as a
template for DNA synthesis. The total length of the
telomerase RNA varies widely between organisms
(159 bases in Tetrahymena, about 500 bases in mam-
mals, and 1.5 kilobases in Saccharomyces), but the
crucial section in each case is a sequence including
the complement of the repetitive G-rich telomere
sequence. Thus in Tetrahymena, the template sequence
3’-AACCCC-5’ corresponds to the telomere sequence
5’-TTGGGG-3’, and in Saccharomyces 3’-CACACCC-5’
corresponds to 5’-GTGTGGG-3’ in the telomere. The
telomerase binds to the chromosome end and, using its
own RNA sequence as a template and the 3’-terminus
at the chromosome end as primer, synthesizes an
additional terminal repeat. The enzyme then moves to
the new end and adds another repeat copy, and so on
(Figure 3).

Figure 2 The general structure of the telomeric DNA
of the budding yeast Saccharomyces cerevisiae. Closely
spaced vertical lines represent the short terminal repeats
(TG1-3, on the 3’-terminated strand), sometimes present
also in the subtelomeric DNA. Open and filled boxes
represent Y’ and X sequences, and internal chromosome
sequence is stippled. The number of Y’ elements varies
between 0 and 4, but only one is shown here.
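The template relationship described above can be checked mechanically. In this sketch (an illustration only), the RNA template is written 3’→5’ so that the DNA repeat, written 5’→3’, can be read off base by base; each round of `add_repeats` stands in for one cycle of synthesis followed by translocation. The function names are hypothetical, and the alignment region of the real template RNA, which pairs with the existing chromosome end, is not modeled.

```python
# Pairing of an RNA template base with the DNA nucleotide added opposite it.
RNA_TO_DNA = {"A": "T", "U": "A", "G": "C", "C": "G"}


def repeat_from_template(template_3_to_5: str) -> str:
    """DNA repeat (written 5'->3') copied from an RNA template written 3'->5'.

    Because the product is antiparallel to the template, writing the template
    3'->5' lets the product be read off base by base in the same order.
    """
    return "".join(RNA_TO_DNA[base] for base in template_3_to_5)


def add_repeats(g_strand: str, template_3_to_5: str, rounds: int) -> str:
    """Extend the 3' end of a G-strand (written 5'->3') by one repeat per round."""
    repeat = repeat_from_template(template_3_to_5)
    for _ in range(rounds):
        g_strand += repeat  # synthesize one repeat, then translocate to the new end
    return g_strand


print(repeat_from_template("AACCCC"))   # Tetrahymena: TTGGGG
print(repeat_from_template("CACACCC"))  # Saccharomyces: GTGTGGG
```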

After a number of such sequential additions to the
3’-terminated telomere strand, the extension can be
made double-stranded by ordinary RNA-primed
‘backwards’ replication (Figure 3). The removal of
the final RNA primer will result in a short single-strand
gap and a consequent shortening of the complementary
C-rich strand, but the length already gained through
telomerase action will generally be more than enough
to compensate for this loss as well as for shortening
resulting from the previous round of replication. An
alternative method of second strand synthesis also
seems possible; if the first strand folded back on itself,
as it might do through G-T base pairing, it could prime
the synthesis of its own complement.


Figure 3 The proposed mechanism for the maintenance
of telomeres. (A) Telomerase protein (represented by
the dotted ellipse) binds to the 3’-terminated G/C-rich
single strand which extends beyond the shortened
5’-terminated strand, and a new repeat sequence is
added, with the single-stranded telomerase RNA acting
as a template. (B) The telomerase shifts to the new 3’
end, and another repeat sequence is added. (C) After a
certain number of repeats have been added (three
depicted here), the extended sequence is made double-
stranded by RNA-primed replication, shown as a dashed
line; the RNA primer is shown as a short wavy line.


Telomere Binding Proteins


Proteins binding at telomeres probably serve several
functions. The most obvious is protection of the DNA
termini against attack by exonuclease. Another, about
which little can be said as yet, concerns the positioning
of the chromosomes in the nucleus. It has been
known for many years that, especially in meiotic cells, 
chromosome ends often appear to be attached to the 
nuclear envelope in clusters, giving the so-called ‘bou- 
quet’ appearance to the prophase chromosomes. It has 
been suggested that this could be a part of the process 
bringing homologous chromosomes together for 
meiotic pairing. 

One idea about the protection of chromosome ter- 
mini is that the G-rich nature of the telomere DNA 
permits the formation of tetrameric associations 
between guanine residues, perhaps resulting in hairpin 
loops in the DNA chain. Such structures might be 
protective against exonuclease attack in themselves 
and might also provide sites for protein binding. 
Tetra-G-binding proteins do exist, but whether they 
function especially at telomeres is not yet clear. 

Most is known about the telomere binding proteins 
of the yeast Saccharomyces. Not all of them are known 
to be necessary for telomere function. The Saccharo- 
myces SIR genes were first identified as necessary for 
the maintenance of the ‘silent’ (i.e., nontranscribed) 
state of auxiliary genes in the yeast mating-type 
switching system. At least some of the proteins 
encoded by them bind not only to the silent mating 
type ‘cassette’ loci but also to the subtelomeric repeats 
of yeast telomeres and help maintain a state of the 
chromatin that silences, in a clonal fashion, genes
artificially inserted within it (an example of position-effect
variegation). When SIR3 is overexpressed, this
‘silent’ form of chromatin, which is akin to the hetero- 
chromatin of higher eukaryotes, can spread along the 
chromosome, extending the silenced region. It is not 
clear, however, that this property of telomeres has 
anything to do with the essential telomere function 
of preserving chromosome ends. 

The DNA binding protein encoded by the essential
gene RAP1 is more clearly concerned with telomere
function. It binds at many sites in the genome and acts
as a transcription regulator, but it is particularly con-
centrated at telomeres and functions in the regulation
of the number of simple repeats. The increasing
amount of bound Rap1p as the telomere elongates
appears to inhibit further elongation. Disruption of the
binding, either by truncation of the Rap1 polypeptide
chain or by alteration of the telomere repeats through
mutations engineered in the TER1 gene, which
encodes the RNA template, results in a lengthening of
the telomeric sequence and a faster rate of turnover
of the terminal repeats (Krauskopf and Blackburn,
1998). At least one other gene, RIF1, appears to
cooperate with RAP1 in limiting telomerase
function. RIF1 protein binds to RAP1 protein, and
temperature-sensitive alleles of RIF1 cause telomere
elongation at the restrictive temperature.
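The inhibition of elongation by accumulating Rap1p amounts to negative feedback, which the toy model below illustrates. It is a sketch under stated assumptions, not a model from the cited work: bound Rap1p is taken to be proportional to repeat-tract length, telomerase acts only when that count falls below a threshold, and every parameter value is hypothetical. Halving the Rap1p bound per base pair, a crude stand-in for truncated Rap1p or altered repeats, raises the length at which the telomere settles, qualitatively matching the lengthening described above.

```python
def simulate_telomere(start_bp, divisions, loss_bp, add_bp, bp_per_rap1, rap1_threshold):
    """Toy negative-feedback model of telomere length regulation.

    Each division removes loss_bp from the end. The number of bound Rap1p
    molecules is assumed proportional to repeat-tract length; telomerase
    adds add_bp only when that number is below rap1_threshold.
    All parameters are hypothetical, not measured values.
    """
    length = start_bp
    history = []
    for _ in range(divisions):
        length -= loss_bp                   # end-replication loss
        bound_rap1 = length // bp_per_rap1  # longer tract, more Rap1p bound
        if bound_rap1 < rap1_threshold:     # few Rap1p bound: telomerase may act
            length += add_bp
        history.append(length)
    return history


normal = simulate_telomere(300, 200, loss_bp=5, add_bp=50, bp_per_rap1=18, rap1_threshold=16)
# Weaker binding: fewer Rap1p molecules counted per base pair of repeat.
weak = simulate_telomere(300, 200, loss_bp=5, add_bp=50, bp_per_rap1=36, rap1_threshold=16)
print(min(normal[-20:]), max(normal[-20:]))
print(min(weak[-20:]), max(weak[-20:]))
```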

Proteins that may be analogous to the yeast telo- 
mere binding proteins have been identified in mouse 
and human cells. The genes TRF1 and TRF2 both 
encode proteins that bind to the TTAGGG terminal 
repeats, and there is strong evidence that TRF1, like
yeast RAP1, regulates telomere length. A mutational
deficiency in the protein results in telomere lengthening,
and its overproduction in telomere shortening. It thus
seems to be functionally similar to yeast RAP1, but there
is no sequence similarity between the two proteins.

The yeast gene EST2, so called because est
mutants have Ever Shorter Telomeres, encodes
the protein component of the telomerase itself.
Telomere-shortening est mutants in general suffer 
early senescence but give rise to occasional clones of 
cells with restored growth. Analysis of the telomeres 
of these revived clones has shown that they had been 
regenerated not by regrowth of the standard telomeric 
repeats, but by recruitment of additional subtelomeric
Y’ elements. It seems that the revived cells have built
up a surplus of Y’ elements by some amplification
process and transferred them to chromosome termini,
probably by recombination between the few TG1-3
repeats still present at the termini and TG1-3 repeats
intercalated among the donor Y’ elements (Lundblad 
and Blackburn, 1993). This example shows that organ- 
isms are not necessarily dependent on one particular 
kind of repeat for maintaining their chromosome 
ends: other kinds of renewable sequence may be able 
to substitute. This principle is still more strikingly 
illustrated by the telomeres of Drosophila. 


Drosophila Telomeres 


Surprisingly, exploration of the ends of Drosophila 
chromosomes has revealed no short G/C-rich ter- 
minal repeats of the kind found in other organisms. 
Instead, the termini are composed largely of a long 
repetitive element (HeT-A), interspersed at some ter- 
mini with another such element (TART). HeT-A and 
TART are respectively 6 kb and 5.1 kb in length and
are allied to the LINE (long interspersed element) 
class of retroelements. TART has open reading frames 
encoding a reverse transcriptase and a protein with 
similarities to gag, typical of retroelements. HeT-A 
encodes no reverse transcriptase, but reveals its rela- 
tionship to a retroelement by possessing a gag-like 
open reading frame. Both have poly-A sequences at 
their 3’ ends, as one would expect of sequences propa- 
gated by reverse transcription from mRNA. Next 
to the HeT-A/TART elements on the centromere side 
is a region of about 10 kb consisting of shorter repeats 
of between 0.5 and 1.8 kb (Figure 4A). 


The current hypothesis for Drosophila telomere 
maintenance is that HeT-A and TART RNA tran- 
scripts are reverse-transcribed into DNA single 
strands, the 3’-poly(A) tails of which are brought
into alignment with, and then ligated to 5’-termini at 
the chromosome ends. The added sequence is then 
made double-stranded by DNA synthesis primed 
from the previous chromosome end (Figure 4B). 
The gag-like DNA binding protein is essential for 
the addition process and may be involved in the align- 
ment step (Figure 4B). 

The addition of a HeT-A or TART element is a very 
substantial elongation by comparison with the small 
additions typically catalyzed by telomerase. One added 
HeT-A or TART copy should sustain chromosome 
replication for many fly generations. 

Whereas previously known LINE elements, such
as the I element of Drosophila, which is responsible for
one kind of hybrid dysgenesis, are inserted all over the
genome, HeT-A and TART seem to be confined to
chromosome ends. LINEs, like other transposons and
retrotransposons, are generally considered to be
‘selfish’ elements, but HeT-A and TART appear to have
been recruited to perform an essential function for
their host organism.

Figure 4 (A) The DNA structure of Drosophila
telomeres. The terminal tandemly arranged HeT-A/TART
retroelements are shown as open boxes; the subterminal
element has been shortened by many cycles of replication.
Stippled boxes represent shorter subtelomeric tandem
repeats of variable length, and the black section non-
telomeric chromosomal DNA. (B) Proposed mechanism
of HeT-A/TART addition: (i) RNA (wavy line) transcribed
from HeT-A or TART is brought, with its 3’-terminal
poly-A tract, into alignment with the chromosome end,
probably with the aid of the gag protein encoded in both
elements; (ii) the 3’-terminated DNA strand is extended
by reverse transcriptase (encoded in TART) using the RNA
as template, so adding a single-strand DNA copy of one
of the retroelements; (iii) the RNA is removed, and second
DNA strand synthesis is primed from the distal end by
primase-synthesized RNA.

The subtelomeric regions of Drosophila chromo- 
somes are like those of other organisms in consisting 
of the order of tens of kilobases of tandemly repetitive 
sequence of no ascribed function. As in Saccharo- 
myces, this repetitive sequence may provide a sub- 
strate for formation of heterochromatin, since genes 
artificially placed within it tend to be ‘silenced,’ i.e., 
not transcribed. 


Telomeres, Aging, and Cancer 


The chromosomes of embryonic cells are adequately 
equipped with telomeric repeated sequences and gen- 
erally (a notable exception is Drosophila) possess telo- 
merase activity. But as cells begin to differentiate, their 
telomerase usually decreases to undetectable levels. 
This is one reason why somatic cells have a limited 
lifespan. The telomeres of cells in culture decrease 
in length with successive transfers (Figure 5), and 
when they fall below a critical minimum length, cell 
division stops. 
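The progressive shortening in culture can be turned into a simple count of population doublings. The sketch below assumes a constant net loss per doubling and a sharp critical length; all numbers are hypothetical and are not taken from Figure 5, and the function name is illustrative. An optional telomerase contribution shows how balancing the loss removes the limit in this model.

```python
from typing import Optional


def doublings_until_arrest(initial_bp: int, critical_bp: int,
                           loss_per_doubling_bp: int,
                           added_per_doubling_bp: int = 0) -> Optional[int]:
    """Population doublings completed before mean telomere length drops below
    critical_bp. Cells divide while length >= critical_bp. Returns None when
    telomerase (added_per_doubling_bp) offsets the loss, i.e. no limit here."""
    net_loss = loss_per_doubling_bp - added_per_doubling_bp
    if net_loss <= 0:
        return None
    length, doublings = initial_bp, 0
    while length >= critical_bp:
        length -= net_loss
        doublings += 1
    return doublings


# Hypothetical numbers chosen only for illustration.
print(doublings_until_arrest(10_000, 5_000, 50))                            # 101
print(doublings_until_arrest(10_000, 5_000, 50, added_per_doubling_bp=50))  # None
```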

On the other hand, cancer cells, or cells with can-
cerous potential, are ‘immortalized’ and can grow
indefinitely. A common, though not quite universal,
characteristic of immortalized cells is that they have at
least detectable levels of telomerase. Significantly, the
lifetime of at least some nonimmortalized human cell
lines in culture can be extended, perhaps indefinitely,
by transfection with DNA vectors encoding telomerase
protein (Bodnar et al., 1998), though other studies have
shown that extra telomerase alone is not sufficient to
immortalize all cell types. An increased amount of
telomerase is not usually considered to be a main cause
of cancer, and neither is the failure to maintain
telomerase levels considered to be a main cause of
programmed cell death (apoptosis). But it does seem
that upregulation of telomerase production usually
accompanies oncogenesis, and is perhaps necessary
for it.

Figure 5 Decrease in telomere length with time of
culture of cells from human vascular tissue. The size of a
terminal restriction fragment (TRF), i.e., the narrowing
distance between a fixed subterminal restriction site and
the chromosome end, is plotted (with standard errors)
against number of cell generations (population
doublings). (From Chang and Harley, 1995.)
Temperate phages are bacteriophages that can
sometimes coexist with their host for extended
periods of time, during which the host and its
internal phage multiply in synchrony. Unlike lytic
(virulent) phages, temperate phages have two life
cycles to choose from: they can undergo a lytic life
cycle (see Virulent Phage for more detail) or a
lysogenic life cycle. The lysogenic life cycle is unique
to temperate phages.

The first step in any phage life cycle is entry of the 
phage into the host cell. After the virus has success- 
fully attached itself to the outside of the bacterial cell it 
inserts its genome (either DNA or RNA) into the cell. 
In the lytic life cycle, the genome is immediately used 
to begin replication of the virus. In the lysogenic life
cycle, the viral genome will remain dormant within
the bacterial cell, either as a plasmid or incorporated 
into the host’s genome, replicating only when the 
host genome is replicated; the regulatory mechanisms 
involved are discussed elsewhere. The lysogenized 
phage can later enter the lytic life cycle; this process 
is called induction. Upon induction, viral genes will be 
transcribed and translated, new progeny phage will 
be made, and the cell will be lysed to release the new 
phage into the environment. Lysogeny presumably
provides temperate phages with a selective advantage,
because they can delay the lytic life cycle until
conditions are favorable.
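The choice between the two life cycles can be summarized as a small state machine. The states, event names, and the `run` helper below are illustrative labels, not terms from the text; removing the lysogenic branch gives the behavior of a virulent phage.

```python
from enum import Enum, auto


class PhageState(Enum):
    FREE = auto()      # virion outside a host
    INJECTED = auto()  # genome inside the host, decision pending
    PROPHAGE = auto()  # dormant: plasmid or integrated into the host genome
    LYTIC = auto()     # phage genes expressed, progeny being made


# (state, event) -> next state. Event names are illustrative labels.
TEMPERATE = {
    (PhageState.FREE, "attach_and_inject"): PhageState.INJECTED,
    (PhageState.INJECTED, "enter_lytic"): PhageState.LYTIC,
    (PhageState.INJECTED, "lysogenize"): PhageState.PROPHAGE,
    (PhageState.PROPHAGE, "host_divides"): PhageState.PROPHAGE,  # replicates with host
    (PhageState.PROPHAGE, "induction"): PhageState.LYTIC,
    (PhageState.LYTIC, "lyse_host"): PhageState.FREE,            # progeny released
}
# A virulent phage lacks every transition into or out of the prophage state.
VIRULENT = {key: value for key, value in TEMPERATE.items()
            if PhageState.PROPHAGE not in (key[0], value)}


def run(transitions, events, state=PhageState.FREE):
    """Apply events in order, rejecting any transition the phage cannot make."""
    for event in events:
        if (state, event) not in transitions:
            raise ValueError(f"{event!r} is not possible from {state.name}")
        state = transitions[(state, event)]
    return state


print(run(TEMPERATE, ["attach_and_inject", "lysogenize", "host_divides",
                      "induction", "lyse_host"]))
```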


See also: Bacteriophages; Lysogeny; Virulent 
Phage 
A temperature-sensitive mutation creates a pheno-
type that varies with the temperature. The classic 
temperature-sensitive (ts) mutants were obtained 
by R.S. Edgar as part of a program to map the phage 
T4 genome extensively; at the same time, host- 
dependent (amber) mutations were found by R.H. 
Epstein and his associates for the same purpose
(see Epstein et al., 1963). (Both types of mutants are
called ‘conditional lethals’; Edgar, 1966.) The ts 
mutants of T4 were defined as those that are able to 
form plaques at 25°C but not at 42°C (Edgar and 
Lielausis, 1964). A set of ts mutants can be mapped 
relative to one another by crosses in which bacteria are 
infected simultaneously by two mutants at the low 
(permissive) temperature, so that they can produce
recombinants. Progeny phage are plated and incubated
at low temperature to count total phage and at high
(restrictive) temperature to count wild-type recombinants,
the only recombinant class able to form plaques there. The ts mutants
are also mapped relative to classical mutants and 
amber mutants. 
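The plating scheme above yields a recombination frequency directly. In this sketch, only wild-type (ts+) recombinants can form plaques at the restrictive temperature, so their count is doubled to include the invisible reciprocal double-mutant class; dilution factors convert plaque counts to titers. The function name and plate counts are hypothetical, and in practice plaques arising from reversion, estimated from single-mutant control infections, would also be subtracted.

```python
def ts_recombination_frequency(plaques_restrictive: int, dilution_restrictive: float,
                               plaques_permissive: int, dilution_permissive: float) -> float:
    """Recombination frequency between two ts mutations from a phage cross.

    Only ts+ (wild-type) recombinants form plaques at the restrictive
    temperature; the reciprocal double-mutant class cannot be seen, so it is
    assumed to be equally frequent and the wild-type titer is doubled.
    """
    wild_type = plaques_restrictive * dilution_restrictive  # ts+ recombinant titer
    total = plaques_permissive * dilution_permissive        # all progeny titer
    return 2 * wild_type / total


# Hypothetical plate counts: 50 plaques at 42 °C from a 10^3 dilution,
# 250 plaques at 25 °C from a 10^5 dilution.
rf = ts_recombination_frequency(50, 1e3, 250, 1e5)
print(f"{rf:.3%}")
```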

Edgar (1966) pointed out that temperature-sensitive 
mutants had been obtained much earlier in other 
organisms. Some classical alleles are also temperature 
sensitive. For instance, some alleles for fur color in 
Siamese cats and mice cause the expression of color 
only in the cooler tissues at the extremities of the feet, 
ears, nose, and tail. 
A template is a single-stranded DNA or RNA polymer
that is used to direct synthesis of another polymer
such as DNA, RNA, or protein. DNA is used as a
template molecule for DNA replication and DNA repair
as well as for transcription. DNA polymerases use
template DNA by covalently linking deoxyribo-
nucleoside 5’-triphosphates that base pair with the
template DNA to form a new, complementary DNA
strand. The polymerase ‘reads’ the template in the
3’→5’ direction while synthesizing DNA in the 5’→3’
direction, forming the antiparallel double-stranded
DNA product.
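The antiparallel copying rule can be written in a few lines. This sketch (an illustration, with a made-up function name) takes a template written 5’→3’, reads it from its 3’ end, and builds the new strand by adding each complementary nucleotide to the new strand’s 3’ end.

```python
# Watson-Crick pairing between template base and incoming nucleotide.
DNA_PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}


def copy_strand(template_5_to_3: str) -> str:
    """Return the new complementary strand, written 5'->3', for a DNA
    template written 5'->3'."""
    new_strand = []
    # Read the template 3'->5' (right to left in the string); each added
    # nucleotide extends the new strand at its 3' end.
    for base in reversed(template_5_to_3):
        new_strand.append(DNA_PAIR[base])
    return "".join(new_strand)


print(copy_strand("ATGCCG"))  # CGGCAT
```

Copying the copy returns the original sequence, which is why each daughter duplex carries the same information as the parent.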




This process is essential for duplication of genomic 
DNA, which precedes cell division. RNA poly- 
merases use template DNA similarly to create com- 
plementary strands of RNA, which are then used as 
messenger RNA and translated into proteins or used 
as RNA components of cellular machinery. 

Template DNA is prepared for replication and 
transcription by the unwinding action of enzymes 
known as helicases. These enzymes utilize energy 
from molecules such as ATP to destabilize the hydrogen
bonds between base pairs, which leads to separation
of the two strands of the duplex. Single-stranded
DNA can quickly reanneal to duplex form; therefore
most cellular organisms have special single-stranded
DNA binding proteins that stabilize the template until
it is copied or transcribed. Excision of damaged DNA
or DNA nuclease activity can also generate single- 
stranded DNA. This template is converted to 
double-stranded DNA during the processes of DNA 
repair or recombination. 

RNA is also used as a template molecule for 
synthesis of both RNA and DNA. Similar to the 
DNA-dependent polymerases described above, 
RNA-dependent polymerases recognize and bind sin- 
gle-stranded RNA template and covalently link com- 
plementary ribonucleotides to form RNA polymers. 
RNA-dependent DNA polymerases known as 
reverse transcriptases use single-stranded RNA tem- 
plate to form complementary strands of DNA. 
Retroviruses such as human immunodeficiency virus
(HIV) use these enzymes to convert their genomic
RNA into single-stranded DNA, which then serves as
the template for synthesis of a second DNA strand;
the resulting double-stranded DNA is integrated into
the host organism’s DNA during infection. RNA-dependent
RNA polymerases also use RNA as a template. RNA
viruses such as the influenza virus and poliovirus use
RNA polymerases to transcribe as well as to replicate
their genomic RNA.

As well as its role in nucleic acid metabolism, 
single-stranded RNA is used as a template during
protein synthesis. Messenger RNA (mRNA), transcribed
from a DNA template (or, in some RNA viruses, from an
RNA template), is in turn translated into amino acid
polymers by the translation machinery of the cell.
During translation, groups of three consecutive bases
on the mRNA (codons) are recognized by the anticodons
of transfer RNAs carrying the corresponding amino
acids, and the amino acids are successively linked to
form polypeptides. Thus, the mRNA template is ‘read’
in the 5’→3’ direction, codon by codon, leading to a
faithful copy of the information in the form of a protein.
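Codon-by-codon reading of an mRNA can be sketched as follows. Only a handful of entries from the standard genetic code are included, so any other codon raises a `KeyError`; the sketch also assumes that reading starts at the first base, whereas in cells the ribosome selects a start codon. The function name is illustrative.

```python
# A handful of entries from the standard genetic code (not the full table).
CODONS = {
    "AUG": "Met", "UUU": "Phe", "GGC": "Gly", "AAA": "Lys", "GCU": "Ala",
    "UAA": None, "UAG": None, "UGA": None,  # stop codons
}


def translate(mrna_5_to_3: str) -> list[str]:
    """Read an mRNA 5'->3' codon by codon, starting at its first base,
    until a stop codon or the end of the message."""
    peptide = []
    for i in range(0, len(mrna_5_to_3) - 2, 3):
        amino_acid = CODONS[mrna_5_to_3[i:i + 3]]
        if amino_acid is None:  # stop codon: chain is released
            break
        peptide.append(amino_acid)
    return peptide


print("-".join(translate("AUGUUUGGCAAAUAAGCU")))  # Met-Phe-Gly-Lys
```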

By using a relay system of templates from replica- 
tion to transcription to translation, living organisms 
can faithfully maintain their genetic information over
successive generations as well as use this genetic
information accurately in the processes of life.


See also: DNA Repair; DNA Replication; 
Polymerase Chain Reaction (PCR); 
Transcription; Translation 


Terminal Redundancy 
Terminal redundancy refers to the situation in which
certain phages carry, at one end of their genome, a
duplicate of the sequence present at the other end.
This allows them to circularize when they enter the
host cell.
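Terminal redundancy can be detected by comparing the two ends of a linear sequence. The sketch below, using a hypothetical toy genome, finds the longest sequence present at both ends and models circularization as recombination between the two copies, which leaves a single copy in the circle; both functions are illustrative simplifications.

```python
def terminal_redundancy(genome: str) -> int:
    """Length of the longest sequence found at both the start and the end of a
    linear genome (limited to half the genome so the two copies do not overlap)."""
    for k in range(len(genome) // 2, 0, -1):
        if genome[:k] == genome[-k:]:
            return k
    return 0


def circularize(genome: str) -> str:
    """Circular sequence left after the two redundant ends recombine, keeping
    one copy of the repeat (written starting at the original left end)."""
    k = terminal_redundancy(genome)
    return genome[:-k] if k else genome


toy = "ACGT" + "TTTT" + "ACGT"  # hypothetical toy genome
print(terminal_redundancy(toy), circularize(toy))  # 4 ACGTTTTT
```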


See also: Bacteriophages 


Termination Codon 


See: Amber Codon; Nonsense Codon; 
Ochre Codon; Opal Codon 


Termination Factors 


See: Release (Termination) Factors 


Terminator 
A terminator is a DNA sequence at the end of a
transcription unit that causes RNA polymerase to stop
transcription.


See also: RNA Polymerase; Transcription 


Test Cross 
Mendel originated the test cross as a cross of a hybrid
individual with a purebred individual homozygous for
the recessive trait(s) segregating in the hybrid. Mendel
proposed that a hybrid individual would make two
kinds of gametes, each equally frequent, for each
heterozygous character, resulting in two types for a
monohybrid, four types for a dihybrid, or eight types
for a trihybrid. These types could only be inferred
from the results of hybrid × hybrid crosses, however.
To test each hybrid more directly for the gamete types
it produces, Mendel introduced the idea of crossing it
to a purebred recessive, since the phenotype of each
progeny would then reflect the gamete type contributed
by the hybrid parent. Test crossing a dihybrid, for
example, resulted in 1:1:1:1 ratios, confirming the
gamete types that underlie the 9:3:3:1 ratios observed
from the self-cross of the dihybrid. Currently test
crosses are widely used to simplify the analysis of
progeny from parents that are heterozygous for many
genes, for example, in linkage and gene mapping
studies.


See also: Backcross; Linkage Map; Punnett Square 
The mammalian testes determining locus (symbolized
as Tdy, because of its location on the Y chromosome) 
is also called the testes determining factor (or TDF) in 
humans. This single gene present on the Y chromo- 
some of all male mammals is necessary and sufficient 
for the development of the fetus along a male pathway 
of differentiation in both germ cell and somatic cell 
tissues. The locus was identified through genetic stud- 
ies of people who carry a Y chromosome but are 
developmentally female (such people have a deletion 
over the TDF gene), and other people who do not 
carry the Y chromosome but are developmentally 
male (such people have a copy of the TDF gene trans- 
located to another chromosome). 


See also: Sex Determination, Human; 
Y Chromosome (Human) 


Tetrad Analysis 
Meiotic Tetrads: Advantages and Availability


In most eukaryotic organisms that have been studied
genetically, the segregation of alleles at meiosis can be
studied only in randomized meiotic products. On the
female side, only one product of each meiotic cell 
survives to form the egg nucleus, and in males, 
although all products are potentially viable, they can 
be recovered in their original tetrads only rarely. In 
many fungi, however, and some algae such as Chlamy- 
domonas, analysis of whole tetrads is possible. Tetrad 
analysis permits the confirmation of rules of meiosis 
which are believed to apply universally but are less 
directly demonstrable in randomized meiotic pro- 
ducts and can be confirmed by microscopy only with 
especially favorable material. 




Some of the same advantages can be obtained by 
half-tetrad analysis, that is the recovery of two out of 
the four postmeiotic chromosome copies in the same 
meiotic product; this is possible in Drosophila mela- 
nogaster through the use of attached-X chromosomes. 


Tetrads and Octads in Fungi 


The meiotic products of the ascomycete fungi are
ascospores, initially held together within the ascus, which
is a sac formed from the cell wall of the meiotic mother
cell. The budding yeast Saccharomyces cerevisiae and
the fission yeast Schizosaccharomyces pombe have four
ascospores in each ascus. Most of the filamentous
species, however (e.g., Neurospora, Sordaria, and
Ascobolus), have eight: a tetrad of spore pairs resulting
from a mitotic division following meiosis.

Figure 1 The 4:4 segregation of a pale-ascospore
mutation (shown as a) in an ascomycete fungus with
ordered asci, such as Sordaria fimicola. (A) No crossing
over between the gene and the centromere results in
segregation at the first division of meiosis. (B) A
crossover between the gene and the centromere results
in segregation at the second division. (From Fincham
JRS (1983) Genetics. Bristol, UK: John Wright & Sons.)

In all these species it is relatively easy to make 
crosses between genetically distinct strains, dissect 
out ascospores before their discharge from the asci, 
and grow them into separate haploid cultures. Tetrad 
analysis has also been performed in the basidiomycete 
fungi, particularly in the mushroom group (agarics, 
e.g., Coprinus and Schizophyllum spp.), but not so
commonly as in the Ascomycetes. 


Confirmation of 1:1 Segregation (and Exceptions)


Simple Mendelism and the chromosome theory pre- 
dict that meiosis in a heterozygote, with alleles A and 
a, should result in two A and two a products in every 
tetrad (or 4:4 in eight-spored asci). This prediction is 
most easily checked when, as in Sordaria and Ascobo- 
lus, mutant alleles are available that affect ascospore 
color and are therefore scorable in undissected asci. 
In Sa. cerevisiae (the most intensively studied yeast) 
such directly visible markers are not available. The asci 
have to be dissected and the ascospores grown, but 
this can now be done sufficiently rapidly for analysis 
of hundreds or thousands of tetrads. 

The data show that the simple Mendelian rule holds 
in the great majority of asci, but not universally. A 
minority of tetrads, usually of the order of 1% or 
0.1% in the filamentous species but typically ranging 
between 1 and 10% in Saccharomyces, show 3:1 or 1:3 
ratios of spores or spore pairs, an anomaly now attrib- 
uted to gene conversion (Gene Conversion). 

Another exception is the occasional segregation of 
the allelic difference at the mitotic division following 
meiosis, giving mismatched sister spores and 5:3, 3:5, 
or (with two mismatches in the ascus) aberrant 4:4 
ratios in the eight-spored species, or 50:50 mosaic 
single-spore colonies in yeast. This postmeiotic segre- 
gation can be explained by gene conversion affecting 
half-chromatids (single DNA strands) at the first pro- 
phase of meiosis. 

Although tetrad analysis provided the means of 
detecting exceptions to the 2:2 rule, gene conversion 
will not be considered in this article, which will pro- 
ceed on the assumption that the rule of 2:2 segregation 
always holds. 


Ordered Tetrads 


The long narrow asci of such Ascomycetes as Neuro-
spora and Sordaria species are ordered in the sense that
the positions of the spores reflect the two divisions of
meiosis. The first division spindle is oriented length- 
wise along the ascus. The second division spindles 
are (with some exceptions, notably in So. brevicollis) 
arranged end-to-end without overlap, so that alleles 
separated from one another at the first division end up 
in spores in different halves of the ascus. The post- 
meiotic mitotic spindles are also nonoverlapping, so 
each product of meiosis is represented by a pair of 
adjacent spores. The occurrence of spore pairs 
carrying different alleles in the same half of the ascus 
indicates second division segregation, the result of 
crossing over between the gene locus and the chromo- 
some centromere (First and Second Division Seg- 
regation). Second division segregation frequency 
generally approaches a maximum of two-thirds for 
genes far from their centromeres (Figure 1). 

Representing the allelic difference as +, a (for wild- 
type and mutant alleles), and writing the constitutions 
of spore pairs in order from the tip to the base of the 
ascus, there are two equally frequent first division 
segregation patterns, + + a a and a a + +, and four 
equally frequent second division segregation patterns, 
+ a + a, a + a +, + a a +, and a + + a (Figure 1). These
statistical equalities show that the orientation of the 
first division bivalents and second division dyads on 
the division spindles is essentially random. 
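The classification of ordered asci described above can be sketched in a few lines of Python. This is a minimal illustration, not a published method. It assumes each ascus is recorded as a four-character string of spore-pair genotypes read from tip to base ('+' for wild type, 'a' for the mutant), and the names `segregation_division` and `all_patterns` are hypothetical. It also assumes nonoverlapping second division spindles, so it would not apply to asci such as those of So. brevicollis.

```python
from itertools import permutations


def segregation_division(ascus: str) -> str:
    """Classify an ordered ascus as first (FDS) or second (SDS) division segregation.

    `ascus` lists the four spore-pair genotypes from tip to base, e.g. "++aa".
    Assumes 2:2 segregation and nonoverlapping second division spindles.
    """
    if sorted(ascus) != ["+", "+", "a", "a"]:
        raise ValueError("expected exactly two '+' and two 'a' spore pairs")
    # The upper half of the ascus holds one product of the first division.
    # If both of its spore pairs carry the same allele, the alleles separated
    # at the first division; otherwise they separated at the second.
    return "FDS" if ascus[0] == ascus[1] else "SDS"


def all_patterns() -> list[str]:
    """Every distinct tip-to-base arrangement of two '+' and two 'a' spore pairs."""
    return sorted({"".join(p) for p in permutations("++aa")})


if __name__ == "__main__":
    for pattern in all_patterns():
        print(" ".join(pattern), segregation_division(pattern))
```

Run as a script, it lists all six possible arrangements: two show first division segregation and four show second division segregation, matching the patterns given in the text.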

Asci in Ascobolus and Saccharomyces are more or 
less spherical and not ordered, so that information on 
first versus second division segregation has to be 
obtained in other ways (see below). 


Two-Locus Segregations: Independent 
Assortment 


When two crossed strains differ at two gene loci, there 
are (neglecting order within the ascus) three possible 
ascus types: parental ditype (PD), nonparental ditype 
(NPD), and tetratype (T). A good example is provided 
by ascospore color mutants, yellow (y) and buff (b) in 
Sordaria spp. Wild-type ascospores are nearly black, 
and double mutant ascospores are white. We can symbolize
a buff × yellow cross as b y⁺ × b⁺ y, the +
superscripts indicating the wild-type alleles. Then PD
asci will have two buff (b y⁺) and two yellow (b⁺ y)
spore pairs, NPD asci will have two black (b⁺ y⁺) and
two white (b y) spore pairs, and T asci will have one
spore pair of each color (Figure 2). If the two segregating
loci are unlinked, PD and NPD asci should be
statistically equal in frequency. If PD asci are signifi- 
cantly more frequent than NPD, that is prima facie 
evidence for linkage. 
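As a sketch of how such data might be scored, the Python below classifies an ascus from its four spore-pair genotypes (ignoring spore order) and then asks whether an excess of PD over NPD asci is larger than chance would produce. The exact one-sided binomial (sign) test is an assumed choice for illustration. The text says only that PD must be "significantly" more frequent. The function names are hypothetical.

```python
from collections import Counter
from math import comb


def ascus_type(spore_pairs, parental):
    """Return 'PD', 'NPD' or 'T' for an ascus of four two-locus spore-pair genotypes.

    spore_pairs: four (allele_at_locus_1, allele_at_locus_2) tuples.
    parental: the two parental genotypes, e.g. {("b", "y+"), ("b+", "y")}.
    """
    for locus in (0, 1):
        # Require 2:2 segregation at each locus (gene conversion is ignored).
        if sorted(Counter(s[locus] for s in spore_pairs).values()) != [2, 2]:
            raise ValueError("each locus must segregate 2:2")
    kinds = set(spore_pairs)
    if len(kinds) == 4:
        return "T"
    # With 2:2 at both loci, a ditype has exactly two complementary genotypes.
    return "PD" if kinds == set(parental) else "NPD"


def pd_excess_p_value(n_pd, n_npd):
    """One-sided exact binomial P(PD >= n_pd) if PD and NPD are equally likely."""
    n = n_pd + n_npd
    return sum(comb(n, k) for k in range(n_pd, n + 1)) / 2 ** n
```

For the buff × yellow cross, the parental set would be `{("b", "y+"), ("b+", "y")}`. A small p-value from `pd_excess_p_value` corresponds to the prima facie evidence for linkage described above.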

If the loci of the two genes are not linked, the 
frequency of T asci depends on the two second division
segregation frequencies, which we will call p and q. 


Figure 2 The three ascus types resulting from a cross 
between two pale-ascospore mutants in Sordaria. b, buff; 
y, yellow; b y, the double mutant, is white. PD, parental 
ditype; NPD, nonparental ditype; T, tetratype. Buff and 
yellow are distinguished from wild-type (black) by dark 
and light stippling, respectively. Wild-type alleles are 
shown as +. 


Tetratypes must always result when one locus seg- 
regates at the first division and the other at the second: 
the overall frequency is p(1 − q) + q(1 − p). They will
also result from one half of the cases where both 
segregate at the second division, i.e., pq/2. One can 
easily show this by writing down all the equally prob- 
able possibilities. PD and NPD asci will each result 
from one half of the cases where both loci segregate at 
the first division, and one quarter of the cases where 
both segregate at the second division. Thus: 


$$f(\mathrm{PD}) = f(\mathrm{NPD}) = \frac{(1-p)(1-q)}{2} + \frac{pq}{4}$$
$$f(\mathrm{T}) = p(1-q) + q(1-p) + \frac{pq}{2}$$
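The two formulas can be checked numerically. The sketch below, with a hypothetical function name, evaluates them using exact fractions. For two loci each at the practical maximum p = q = 2/3, it gives PD, NPD, and T frequencies of 1/6, 1/6, and 2/3. For any p and q, the three frequencies sum to 1.

```python
from fractions import Fraction


def tetrad_frequencies(p, q):
    """Expected (PD, NPD, T) frequencies for two unlinked loci whose second
    division segregation frequencies are p and q."""
    # PD and NPD: half of the cases where both loci show first division
    # segregation, plus a quarter of those where both show second division.
    ditype = (1 - p) * (1 - q) / 2 + p * q / 4
    # T: exactly one locus shows second division segregation, plus half of
    # the cases where both do.
    tetratype = p * (1 - q) + q * (1 - p) + p * q / 2
    return ditype, ditype, tetratype


if __name__ == "__main__":
    print(tetrad_frequencies(Fraction(2, 3), Fraction(2, 3)))
```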


With three independently segregating allelic differences,
a⁺/a, b⁺/b, and c⁺/c, it is in principle possible,
by determining the tetratype frequencies from the
three crosses a⁺ b × a b⁺, a⁺ c × a c⁺, and b⁺ c × b c⁺,
to evaluate all three second division frequencies
from three simultaneous equations. But if second
division segregation frequencies are required, it is 
simpler to find a mutation that virtually always 
segregates at the first division. Then the tetratype 
frequency in a cross of that mutant to any other will 
be the second division segregation frequency of that 
other mutant. 
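Setting p = 0 in the tetratype formula above, for a marker that always segregates at the first division, shows why this works:

$$f(\mathrm{T})\big|_{p=0} = 0\cdot(1-q) + q\,(1-0) + \frac{0\cdot q}{2} = q$$

With p = 0, the ditype formula similarly gives f(PD) = f(NPD) = (1 − q)/2.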


Linked Segregations 


When two segregating loci are linked, the three tetrad 
types give information about crossing over between 
the two. PD asci are most simply interpreted as 
absence of crossing-over, although they can also result 
from two-strand double crossovers, with the second 
crossover involving the same two chromatids as the 
first, so cancelling its effect. (As a first approximation, 
one can neglect the possibility of three or more cross- 
overs in the same interval.) 

Tetratypes result mainly from single crossovers. 
The observation that, except for the most distant link- 
ages, the majority of recombinant products occur in 
tetratype asci is the most direct genetic evidence that 
crossing-over occurs after the chromosomes have 
divided, and involves only one chromatid from each 
chromosome. Tetratype tetrads also confirm that 
crossing-over is reciprocal, with Ab and aB always 
produced together. Except in a case of very close 
linkage, tetratype asci also result from three-strand 
double crossovers — that is with one chromatid cross- 
ing over twice, two chromatids involved once each 
and the fourth one not at all. 

Four-strand doubles, with the second crossover 
occurring between the two chromatids not involved 
in the first (Figure 3), result in nonparental ditype asci. 
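The correspondence between crossover patterns and tetrad types can be checked with a small simulation. The Python below is an illustrative sketch, not a standard tool. It numbers the four chromatids of the A b × a B bivalent of Figure 3 as 0 and 1 (sisters carrying A b) and 2 and 3 (sisters carrying a B). It assumes every listed crossover falls between the two loci and models each crossover as exchanging the B-locus alleles of the two chromatids involved. The name `tetrad_after_crossovers` is hypothetical.

```python
PARENTAL = {("A", "b"), ("a", "B")}  # the cross A b x a B


def tetrad_after_crossovers(crossovers):
    """Apply crossovers, given as (chromatid, chromatid) pairs in order along the
    interval between the loci, and return the resulting tetrad type."""
    left = ["A", "A", "a", "a"]   # A-locus alleles; 0,1 and 2,3 are sisters
    right = ["b", "b", "B", "B"]  # B-locus alleles
    for i, j in crossovers:
        if {i, j} in ({0, 1}, {2, 3}):
            raise ValueError("crossovers occur between non-sister chromatids")
        # Everything distal to the exchange point, including the B locus, swaps.
        right[i], right[j] = right[j], right[i]
    products = set(zip(left, right))
    if len(products) == 4:
        return "T"
    return "PD" if products == PARENTAL else "NPD"
```

A single crossover such as `[(0, 2)]` gives T. Repeating the same exchange, `[(0, 2), (0, 2)]` (a two-strand double), restores the parental ditype. A three-strand double such as `[(0, 2), (0, 3)]` gives T, and a four-strand double such as `[(0, 2), (1, 3)]` gives NPD, as stated above.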


Strand Relationships in Double 
Crossovers 


As Figure 3 shows, there are four kinds of double 
crossover, but their relative frequencies cannot
be obtained from two-point crosses because two- 
strand doubles are indistinguishable from no crossing- 
over and the two kinds of three-strand doubles 
both look like singles. With three linked allelic differ- 
ences, however, the four types can all be distinguished 
when one crossover falls within each interval 
(Figure 4). 

Extensive data, mostly from Saccharomyces and 
Neurospora, support the generalization that the four 
types of double crossover are equally probable, and 
that each chromatid has a 50% chance of being 
involved in each crossover, whether or not it is 
involved in another crossover. This is called absence 
of chromatid interference. 
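The equal frequencies of the four kinds of double crossover follow directly from this assumption. The sketch below enumerates the 4 × 4 equally likely choices of non-sister pair for the two crossovers. It uses a hypothetical chromatid numbering in which 0 and 1 come from one homolog and 2 and 3 from the other. Labeling the two kinds of three-strand double by which homolog supplies the chromatid that crosses over twice is an assumption made for illustration.

```python
from collections import Counter
from itertools import product

# Non-sister chromatid pairs: 0 and 1 are sisters, as are 2 and 3.
NONSISTER_PAIRS = [(0, 2), (0, 3), (1, 2), (1, 3)]


def double_crossover_kind(first, second):
    """Name the strand relationship between two crossovers."""
    shared = set(first) & set(second)
    if len(shared) == 2:
        return "two-strand"
    if not shared:
        return "four-strand"
    # Exactly one chromatid takes part in both crossovers.
    (chromatid,) = shared
    return "three-strand (homolog 1)" if chromatid in (0, 1) else "three-strand (homolog 2)"


def kinds_without_chromatid_interference():
    """Count each kind when both crossovers choose a non-sister pair independently."""
    return Counter(
        double_crossover_kind(a, b) for a, b in product(NONSISTER_PAIRS, repeat=2)
    )


if __name__ == "__main__":
    for kind, n in sorted(kinds_without_chromatid_interference().items()):
        print(f"{kind}: {n}/16")
```

Each kind occurs in 4 of the 16 combinations. Each chromatid also appears in two of the four non-sister pairs, which corresponds to the 50% chance of involvement in any one crossover.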


Figure 3 The explanation of the three tetrad classes, PD, NPD, and T, when the two genes concerned are linked.
The cross is A b x a B, with a and b two linked mutations and A and B the corresponding wild-type alleles. 


Distinguishing between Independence 
and Distant Linkage 


When loci marked by allelic differences (markers) are 
very far apart on the same chromosome, the frequency 
of nonparental ditype tetrads may approach that of 
parental ditypes, and, by this criterion, the loci may be 
judged to be unlinked. However, this situation will 
arise only when the crossover frequency is so high that 
the A/a and B/b pairs of alleles are distributed among 
the four products of meiosis virtually independently 
of each other and at random. The result of the four
meiotic products each receiving one random A allele
and one random B allele will be a PD:NPD:T ratio of
1:1:4. 
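The 1:1:4 ratio can be confirmed by listing the possible arrangements. For illustration, the sketch below assumes the cross A B × a b, so that A B and a b are the parental combinations. It fixes the order A A a a and enumerates every distinct way of distributing two B and two b alleles among the four products. The function names are hypothetical.

```python
from collections import Counter
from itertools import permutations

PARENTAL = {("A", "B"), ("a", "b")}  # assumes the cross A B x a b


def classify(products):
    """Return 'PD', 'NPD' or 'T' for four (A-locus, B-locus) genotypes."""
    kinds = set(products)
    if len(kinds) == 4:
        return "T"
    return "PD" if kinds == PARENTAL else "NPD"


def random_assortment_ratio():
    """Tally ascus types over the distinct B/b arrangements, with A A a a fixed."""
    arrangements = set(permutations("BBbb"))  # 24 orderings, 6 distinct
    return Counter(classify(list(zip("AAaa", arr))) for arr in arrangements)


if __name__ == "__main__":
    print(random_assortment_ratio())
```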

In other words, a crossover frequency high enough 
to bring the NPD and PD frequencies close to equal- 
ity will generate twice as many tetratypes as ditypes. 
With unlinked loci, the frequency of tetratypes can
take any value between two-thirds and zero, depending
on the second division segregation frequencies of
the loci concerned (see equation above). So, while a
tetratype frequency of two-thirds is in itself ambiguous,
a tetratype frequency of significantly less than
two-thirds of the total can be taken as evidence that
the two loci are unlinked.

Note: Given that four meiotic products carry two A and two a
alleles, in arbitrary order A A a a, there are six ways of
distributing two B and two b alleles among them at random:
AB AB ab ab, Ab Ab aB aB, AB Ab aB ab, Ab AB ab aB, AB Ab ab aB,
and Ab AB aB ab.

Figure 4 The detection of four kinds of double crossover (two-strand, four-strand, and two kinds of three-strand)
by tetrad analysis in the cross A B C × a b c (linked loci in that order). They occur with equal probabilities.
The three linked marker mutations are here shown as a, b, c and the corresponding wild-type alleles as A, B, C.


Further Reading 
Fincham JRS and Day PR (1961) Fungal Genetics. Oxford: 
Blackwell Scientific Publications. 


See also: Attached-X and other Compound 
Chromosomes; Chromatid; Chromatid 
Interference; Crossing-Over; First and Second 
Division Segregation; Gene Conversion; Meiosis; 
Mendel’s Laws; Neurospora crassa; Postmeiotic 
Segregation 


Tetraparental Mouse 
The cells contained within the early mammalian
embryo (from zygote to eight-cell morula) are all
undifferentiated and all capable of directing the
development of an entire individual animal by them- 
selves, even though they typically participate in the 
creation of an animal with their sister cells. A bizarre 
consequence of this ‘totipotency’ of cleavage stage 
cells is the formation of chimeras. The term chimera 
comes from Greek mythology and is used to designate 
an embryo or animal that is composed of cells from 
two or more different origins. (The mythological Chi- 
mera is a composite of a lion, goat, and serpent.) 

The production of chimeric mice was first reported 
in 1961 by the Polish embryologist Tarkowski. He 
accomplished this feat by first taking the zona pellu- 
cida off two cleavage stage embryos to obtain denuded 
cell masses that are naturally sticky. When two 
denuded embryos are pushed together, they form a 
single chimeric cell mass that is capable of undergoing 
normal development within the female reproductive 
tract. When the two embryos are obtained from dif- 
ferent females mated to different males, the resulting 
animal has four parents and is considered to be tetra- 
parental. It is also possible to produce hexaparental 
animals that are derived from a combination of three 
embryos. The production of chimeric mice is an essen- 
tial component of the targeted mutagenesis technol- 
ogy that has revolutionized the use of the mouse as a 
model organism for studying human diseases. 


See also: Chimera; Targeted Mutagenesis, Mouse 
Tetratype
J RS Fincham
This term refers to the type of meiotic tetrad of haploid
products formed in a diploid, doubly heterozygous
meiotic cell (A/a B/b) that contains all four
possible combinations of alleles: A B, A b, a B, and a b. 
For formation of a tetratype tetrad at least one of the 
two allelic differences must segregate at the second 
division of meiosis. When the A and B loci are linked 
on the same chromosome, a tetratype is most simply 
explained as due to a single crossover between the two, 
but can also result from certain kinds of double or 
multiple crossing-over. 


See also: Tetrad Analysis 


Thalassemias 
The thalassemias are a group of inherited disorders of
hemoglobin. They were first reported independently 
from the United States and Italy in 1925. The word 
‘thalassemia,’ derived from Greek roots for ‘the sea’
and ‘blood,’ was invented under the mistaken belief
that these diseases were confined to the Mediterranean 
region. It was later discovered that they are the com- 
monest single human gene disorders and have a wide- 
spread distribution in many countries of the world 
(Figure 1). 


Different Types of Thalassemia 


The thalassemias result from inherited defects in the 
synthesis of the globin chains of hemoglobin. Humans 
have different hemoglobins at various stages of development.
Normal adults have a major hemoglobin
(Hb) called HbA, comprising about 97% of the
total, and a minor component, HbA₂, which accounts
for 2-3%. The main hemoglobin in fetal life is HbF,
traces of which are found in normal adults. There are
three embryonic hemoglobins. All these different
hemoglobins are tetramers of two pairs of unlike
globin chains. Adult and fetal hemoglobins have α
chains associated with β (HbA, α₂β₂), δ (HbA₂, α₂δ₂),
or γ chains (HbF, α₂γ₂), whereas in the embryo
there are different α-like chains called ζ chains and
distinct β-like chains called ε chains. Each individual
globin chain has a heme moiety attached to it, to
which oxygen is bound.

There are two common types of thalassemia, α- and
β-thalassemia, which result from defective synthesis of
α or β chains. There are rarer forms in which both δ and
β chain, or ε, γ, δ, and β chain production, is defective,
called δβ- or εγδβ-thalassemia, respectively.

The thalassemias are inherited in a Mendelian 
recessive fashion. The severe, homozygous form of 
the disease is called thalassemia major while the carrier 
state, in which only one defective globin gene is inherit- 
ed, is called the trait. The disease is very heterogeneous 
from the clinical viewpoint and many patients are 
encountered who fall in between these extremes; these 
disorders are called ‘thalassemia intermedia.’ 
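For the simplest case of two carriers of the same recessive defect, the expected outcome can be enumerated directly. The Python sketch below is illustrative only. It uses hypothetical allele symbols ('N' for a normal globin gene, 't' for a thalassemic one) and treats each parent as passing on one allele at random. It ignores thalassemia intermedia, compound heterozygotes, and the fact that α-globin genes are present in two copies per chromosome.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

LABELS = {0: "unaffected", 1: "trait (carrier)", 2: "thalassemia major"}


def offspring_distribution(parent1, parent2):
    """Expected offspring classes for two parents, each given as two alleles."""
    outcomes = Counter()
    for allele1, allele2 in product(parent1, parent2):
        # Each of the four allele combinations is equally likely.
        defective = (allele1 == "t") + (allele2 == "t")
        outcomes[LABELS[defective]] += Fraction(1, 4)
    return dict(outcomes)


if __name__ == "__main__":
    print(offspring_distribution("Nt", "Nt"))
```

For two carriers (`"Nt"` × `"Nt"`), this gives the familiar recessive expectation: 1/4 unaffected, 1/2 carriers, and 1/4 with the homozygous severe form.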


Molecular Pathology 


Most of the thalassemias result from mutations which
involve either the α- or β-globin genes.


α-Thalassemia

The genetics of α-thalassemia is complicated because
normal humans receive two α genes from each parent,
a genotype which is written αα/αα. There are
two main classes of α-thalassemia. First, there are the
α⁰-thalassemias, in which both α genes on a chromosome
are deleted, that is, all or part of the gene is missing;
the homozygous state is written --/--, and the heterozygous
state --/αα. On the other hand, in the α⁺-thalassemias only
one of the α genes is lost; the homozygous and heterozygous
states are designated -α/-α and -α/αα, respectively.
Sometimes α⁺-thalassemia results from a
mutation which inactivates the α-globin gene rather
than deleting it. In this case the heterozygous state is
written αᵀα/αα.


β-Thalassemia

Over 200 different mutations of the β-globin genes
have been found in patients with β-thalassemia. They
may affect gene function at any level, from transcription,
through processing of the primary messenger RNA
transcript and translation, to the posttranslational stability
of the gene product. Rarely, β-thalassemia, like
α-thalassemia, may result from a partial or complete
deletion of the β-globin gene. Some of these mutations
cause an absence of β-chain production and the resulting
disease is called β⁰-thalassemia, while others cause
a reduced output of β chains, β⁺-thalassemia. Some
of the latter forms are extremely mild and may not
be identifiable in carriers; most heterozygotes for
β-thalassemia have very mild anemia and an elevated
level of HbA₂.
Figure 1 The world distribution of the β-thalassemias. Each population has a different set of mutations. These are
described either by the nucleotide base position in introns (IVS 1 or 2) or in the particular codons in exons.
Mutations that are given the prefix − are those in the 5′ noncoding regions of the β-globin genes. Those marked
poly(A) are mutations in the 3′ noncoding regions.


The hallmark of all the thalassemias is imbalanced
globin chain production. In the β-thalassemias this
results in an excess of α chains, which precipitate in
the red cell precursors, damaging them in the
bone marrow and shortening the survival of their progeny
in the peripheral blood. The pathology of the
α-thalassemias is different. In the face of defective α chain
production, excess γ chains produced in fetal life form γ₄
molecules, while in adults excess β chains form β₄
molecules; these homotetramers are called Hb Bart's
(γ₄) and Hb H (β₄), respectively. They do not give up oxygen
at normal physiological tensions and are also
unstable. This leads to a shortened red cell survival and
hence anemia, and patients are further disadvantaged
because the high oxygen affinity of the homotetramers
leads to reduced oxygen delivery to the tissues.


Clinical Features 


The homozygous state for α⁰-thalassemia, that is, the
loss of all four α globin genes, results in the stillbirth
of a hydropic fetus, usually late in pregnancy. These
infants are anemic and edematous and show all the 
features of severe intrauterine hypoxia. Pregnancies 
carrying these babies are complicated by a high fre- 
quency of toxemia and difficulties in delivery, par- 
ticularly because of enormously enlarged placentas. 
Individuals who have lost three of their four α genes
(-α/--) have a condition called hemoglobin H disease
which is associated with moderate anemia and enlargement
of the spleen. Persons who have lost two or one
of their α globin genes are not incapacitated, but of
course may pass on the defective chromosomes to 
their children. 
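The α-globin genotype notation used above lends itself to a simple count of functional genes. The Python sketch below is a hypothetical helper, not a clinical tool. It assumes each haplotype is written with two positions, using 'α' for a gene and '-' for a deletion, with 'ᵀ' after a gene that is present but inactivated (as in αᵀα). It maps the count onto the clinical groups described here and treats an inactivated gene as equivalent to a lost one.

```python
def functional_alpha_genes(genotype: str) -> int:
    """Count working α-globin genes in a genotype such as '-α/αα' or 'αᵀα/αα'."""
    haplotypes = genotype.split("/")
    if len(haplotypes) != 2:
        raise ValueError("expected two haplotypes separated by '/'")
    total = 0
    for haplotype in haplotypes:
        slots = haplotype.replace("ᵀ", "")  # 'ᵀ' marks an inactivated gene
        if len(slots) != 2 or set(slots) - {"α", "-"}:
            raise ValueError(f"unrecognized haplotype: {haplotype!r}")
        total += slots.count("α") - haplotype.count("ᵀ")
    return total


def alpha_phenotype(genotype: str) -> str:
    """Clinical group by number of functional α genes, following the text."""
    n = functional_alpha_genes(genotype)
    if n == 0:
        return "hydrops fetalis (stillbirth)"
    if n == 1:
        return "hemoglobin H disease"
    if n < 4:
        return "not incapacitated (carrier)"
    return "normal"


if __name__ == "__main__":
    for g in ("αα/αα", "-α/αα", "--/αα", "-α/--", "--/--"):
        print(g, alpha_phenotype(g))
```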

The homozygous or compound heterozygous (the
inheritance of two different alleles) states for severe
forms of β-thalassemia are characterized by severe
anemia which becomes manifest during the first year of life,
when the switch from γ- to β-globin chain production
occurs. If these children are not given regular blood
transfusions they usually die within a few months. If
they are inadequately transfused they become growth
retarded, develop a curious mongoloid facial appearance,
have gross skeletal deformities due to overgrowth
of the bone marrow, and suffer a variety of other
complications (Figure 2). Children who are well
transfused grow and develop normally, but if they do
not receive drugs to remove the excess iron gained by
transfusion they die of the effects of iron overload,
which particularly involves the liver, endocrine
glands, and heart. Some of the milder forms of
β-thalassemia are compatible with relatively normal
development without regular blood transfusions,
despite a variable degree of anemia.


Reasons for Clinical Variability 


Particularly in the case of the β-thalassemias, there is
remarkable variability in the clinical severity of the
disorder. Several factors have been identified. First,
children with severe forms of β-thalassemia produce
variable amounts of fetal hemoglobin after the first
year of life. All normal adults produce small amounts
of fetal hemoglobin in some of their red cell precursors;
in β-thalassemia these cells come under intense
selection because part of the excess of α chains, which
destroy red cell precursors, is bound by γ chains
to produce hemoglobin F. It is now clear that one
of the major factors in the clinical variability of
β-thalassemia is a genetically determined ability to
produce unusually high levels of fetal hemoglobin. A
second factor which has been clearly identified is that
the coinheritance of α-thalassemia will ameliorate
β-thalassemia. This remarkable experiment of nature
provides clear-cut evidence that it is the imbalance of
globin chain production, and the excess of α chains,
that is the major reason why β-thalassemia is so severe.
Patients who are fortunate enough to inherit both
types of thalassemia are less severely affected because
the reduction of α chains caused by the α-thalassemia
gene decreases the overall degree of globin chain
imbalance, and hence red cell production in the bone
marrow is more effective.


Coinheritance of Thalassemia with 
Hemoglobin Variants 


Although there are many structural hemoglobin variants,
most of them are rare and only three, hemoglobins
S, C, and E, reach high frequencies. Hence it
is not uncommon for a person with β-thalassemia to
coinherit a gene for one of these variants. The compound
heterozygous state for β-thalassemia and the
sickle cell gene, sickle cell β-thalassemia, results in a
clinical picture very similar to that of sickle cell anemia (Sickle Cell
Anemia). On the other hand, the inheritance of
β-thalassemia together with hemoglobin E, a hemoglobin
variant which is produced at a reduced rate and hence
is associated with a mild β-thalassemia phenotype,
produces a severe form of thalassemia which is usually,
but not always, transfusion-dependent. Hemoglobin
E β-thalassemia is one of the commonest forms of
severe thalassemia, and is becoming a major public
health problem in parts of India and further east,
particularly in Thailand and Indonesia.

Figure 2 An X-ray photograph of the hands of a child
with severe β-thalassemia showing the marked thinning
of the bones of the hands due to expansion of bone
marrow.


Distribution and Population Genetics 


The thalassemias occur at a particularly high frequency 
in a band stretching from the Mediterranean region, 
through the Middle East and Indian subcontinent into 
south-east Asia where they are distributed in a vertical 
line from China, through the Malaysian peninsula and 
into the island population of Indonesia. 

Each population has its own particular varieties of
α- or β-thalassemia, which suggests that they have
arisen by mutation and that the gene frequency has
been increased by a local selective process. There is
good evidence that the milder forms of α⁺-thalassemia
are protective against Plasmodium falciparum malaria.
Although it has not yet been formally proved, it seems
very likely that this will also be the case for carriers of
β-thalassemia.


Control and Treatment 


All the thalassemias can be identified in the carrier 
state, and most forms can be diagnosed in the fetus, 
and thus it is possible to offer counseling and prenatal 
diagnosis for parents who wish to terminate pregnan- 
cies carrying babies with severe forms of the disease. 
This approach has resulted in a major reduction in the 
birth of new cases in some of the Mediterranean 
islands and in other countries. 

The only definitive form of treatment for thalas- 
semia is bone marrow transplantation, which is only 
possible when there is a matching donor relative. 
Symptomatic treatment involves regular blood trans- 
fusion and the use of iron-chelating drugs to remove 
the excess iron which results from transfused blood. 
Children with β-thalassemia who are adequately
transfused and chelated grow and develop normally 
and in some cases are now able to have children of 
their own. They need expert care because they are 
prone to a variety of complications, including blood- 
borne infections, notably hepatitis C and human 
immunodeficiency virus (HIV), endocrine damage 
leading to growth retardation and bone disease, and 
the side effects of chelating agents. 

Future therapeutic efforts are being directed at trying
to stimulate the production of fetal hemoglobin,
or at somatic gene therapy directed at
replacing defective α- or β-globin genes.
Thermophiles, or heat-loving organisms, are generally
defined as those organisms whose optimum growth
temperature (T_opt) is above 55 °C. Although some animals can tolerate brief
exposure to these temperatures, all organisms that
require these temperatures for growth are prokaryotic,
i.e., they lack a membrane-defined nucleus.
Thermophiles can be further described as extreme
thermophiles (T_opt > 65 °C) or hyperthermophiles
(T_opt > 80 °C), and most of the latter are archaebacteria.
Since most of the currently known thermophiles
have only been in culture for 20 years or less, the
pace of development of genetic methods to study them 
has lagged behind that of mesophilic (moderate 
temperature) microbes. Genetics, as considered in 
this discussion, concerns the ability of thermophilic 
microbes to transfer genetic information among 
cells, to express those acquired traits, and to pass 
this information on to progeny. This discussion will 
be limited to those organisms belonging to the 
domain Bacteria. A comprehensive presentation of 
the genetics of Archaea is provided in a separate article 
(see Archaea, Genetics of). 
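The temperature classes defined above can be written as a small function. This sketch makes stated assumptions. It applies the thresholds in the text (optimum growth temperature, T_opt, above 55, 65, and 80 °C) as strict inequalities and returns only the most specific class. An organism whose T_opt lies exactly on a boundary, such as the approximately 80 °C quoted below for Thermotoga, could therefore be classed differently under another convention. The function name is hypothetical.

```python
def thermal_class(t_opt_celsius: float) -> str:
    """Classify an organism by its optimum growth temperature in degrees Celsius.

    Thresholds follow the text and are applied as strict inequalities; the
    most specific matching class is returned.
    """
    if t_opt_celsius > 80:
        return "hyperthermophile"
    if t_opt_celsius > 65:
        return "extreme thermophile"
    if t_opt_celsius > 55:
        return "thermophile"
    return "not a thermophile by this definition"


if __name__ == "__main__":
    for name, t_opt in [("Bacillus stearothermophilus", 60), ("Thermus thermophilus", 70)]:
        print(name, thermal_class(t_opt))
```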


Why the Interest in Thermophiles? 


Since their discovery, thermophilic microbes have 
engendered interest in their ability to withstand high 
temperatures. From both applied and theoretical per- 
spectives, these organisms provide unique opportun- 
ities to understand and exploit their biochemical 
capacities. The intrinsic stability of their cellular com- 
ponents, particularly their proteins, has attracted 
interest in using them as sources of high-temperature 
biological catalysts. The most spectacular examples of 
this are the thermostable DNA polymerases isolated 
from thermophilic microbes for use in the polymerase 
chain reaction (PCR). The first such enzyme was 
obtained from the extremely thermophilic bacterium 
Thermus aquaticus. Since complex biochemical trans- 
formations may require many enzymes to work in 
concert to produce desirable endproducts, it may be 
desirable in these situations to employ whole organ- 
isms engineered to carry out the needed biotrans- 
formations. One goal of developing genetic methods 
with thermophiles is to allow the engineering of whole 
cells to facilitate the development of green technolo- 
gies such as bioremediation and generation of useful 
products from waste biomass. 
More recently, it has become apparent that organisms
very much like modern thermophiles may
have played a pivotal role in the early evolution of 
life. Current evolutionary studies of living organisms 
suggest that the earliest ancestors of all life were 
hyperthermophiles. In addition, as complete genome 
sequences of several thermophiles have become avail- 
able, evidence has accumulated suggesting that genetic 
information has been transferred among thermo- 
philes. The impact this has had upon the evolution of 
life may be profound and will continue to be an 
important topic of study. 


Challenges of Thermophile Genetics 


In Vivo versus In Vitro Genetics 

Using recombinant DNA methods, it is often possible 
for one to remove a gene of interest from an organism 
and study it in another organism (such as Escherichia 
coli) that is easier to handle in the laboratory. This in 
vitro genetics has made possible the study of many 
enzymes from thermophiles by expressing the genes 
encoding those enzymes in hosts such as E. coli. In 
particular, this has allowed investigators to study in 
detail the molecular basis of the thermostability exhib- 
ited by these enzymes and to alter their properties to 
suit given needs in biotechnological applications. 
Although these in vitro genetic studies account for
the great majority of studies of thermophile enzymes, 
they do not involve true genetic manipulations of 
the thermophilic organisms themselves. The actual 
manipulation of DNA within the thermophiles (in 
vivo genetics) is much less common due to techno- 
logical constraints and our general lack of knowledge 
of these microbes. It is these manipulations that will be 
the focus of the following discussion. 


Transfer of DNA Among Thermophiles 

Genetic analysis of thermophilic bacteria is limited to 
only a few species: Bacillus stearothermophilus 
(T_opt ≈ 60 °C), Thermus thermophilus (T_opt ≈ 70 °C),
Thermoanaerobacter species (T_opt ≈ 60 °C), and
Thermotoga species (T_opt ≈ 80 °C). In all cases, the
genetic tools available are much less sophisticated 
than those used to study mesophiles like E. coli. 
Three methods are used to introduce DNA into 
bacteria: transduction, conjugation, and trans- 
formation. Transduction is DNA transfer mediated by 
a bacterial virus (bacteriophage) that contains a segment
of genomic DNA removed from its previous host. 
Although bacteriophage that infect some thermo- 
philes are known, none are known to be capable of 
transferring chromosomal genes to an infected host. 
Conjugal transfer of DNA between cells involves 
cell-to-cell contact during the transfer process and
has been observed among some Thermus strains in
the laboratory. No other thermophiles have been 
shown to do so. Finally, transformation, the process 
by which naked DNA is taken up by cells, occurs in all 
of the thermophiles listed above. However, only Ther- 
mus species undergo ‘natural’ transformation 
whereby naked DNA is taken up by cells during 
growth without pretreatment of those cells in the 
laboratory. The other organisms must be forced to 
take up DNA by prior chemical treatment of the 
cells or by driving the DNA into cells using electrical 
force (electroporation). 


State of Genetics for Specific 
Thermophiles 


Strictly Anaerobic Thermophiles 

In the cases of Thermotoga and Thermoanaerobacter, 
investigators have only recently demonstrated DNA 
uptake and expression of genes encoding selectable 
phenotypes. A major impediment to the use of select- 
able markers is the instability of the selective agents, 
typically antibiotics, in the growth medium. For strict 
anaerobes, the combination of high temperature and 
the reactive compounds needed to exclude oxygen 
from growth media can often inactivate the drugs. 
Additionally, many of the proteins that confer resist- 
ance to antibiotics in mesophiles are themselves 
unstable at high temperatures. The antibiotic kanamy- 
cin has been found to be of great use with thermo- 
philes since it is generally stable to heat in culture 
medium and there is a gene that encodes a heat-stable 
protein that confers resistance to it. This selectable 
phenotype is used in many thermophiles including 
species of these two strictly anaerobic groups. Both 
Thermoanaerobacter and Thermotoga have been 
transformed to kanamycin resistance using artificial 
methods. However, these methods have not as yet 
been used to examine the physiology or molecular 
biology of these organisms. 


Bacillus stearothermophilus 

Bacillus stearothermophilus has been artificially trans- 
formed with plasmid DNA containing the kanamycin- 
resistance gene to allow stable maintenance of the 
plasmid in the cell population. Genes from other 
organisms have also been introduced into this organ- 
ism where their encoded proteins were expressed. 
Mutants of B. stearothermophilus have been con- 
structed by introducing into them a transposon that 
integrated into a resident plasmid encoding the 
ability to degrade phenol. The mutation introduced 
into this plasmid caused those cells to produce cat- 
echol, a chemical useful for a variety of industrial 
processes. 


Thermus thermophilus 

Arguably the most genetically tractable thermophilic 
bacterium is Thermus thermophilus. As noted above, 
this organism, like many Thermus species, can take up 
DNA during growth without prior chemical treatment
of cells. The mechanism by which transformation
occurs is not yet known, though it is currently the
subject of investigation. Unlike most naturally trans- 
formable organisms, T. thermophilus is capable of tak- 
ing up DNA at all phases of growth in the laboratory. 
Further, it is very efficient in doing so: when chromosomal genes are supplied, up to 12% of the cells in a culture take up the added DNA. Thermus cells can be transformed by T. thermophilus DNA that has been cloned into E. coli.
This allows investigators to simply mix E. coli cells 
harboring cloned T. thermophilus DNA with T. thermo- 
philus cells, heat the mixture (killing the E. coli), 
and then incubate the mixture under conditions that 
only allow genetically transformed T. thermophilus 
cells to grow. Thus Thermus genes can be readily 
removed, altered in E. coli, and then returned to 
T. thermophilus cells to observe the effects of the 
alterations. 

A variety of plasmid vectors have been constructed 
to allow manipulation of T. thermophilus genes 
in Thermus itself. Plasmids capable of replication in 
T. thermophilus contain the gene conferring resistance 
to kanamycin. Other plasmids cannot replicate in 
T. thermophilus, but contain portions of chromosomal 
DNA so that they can recombine into defined sites 
within the T. thermophilus genome. These allow stable 
maintenance of cloned genes in single copies in the 
chromosome. In addition, expression of these genes 
can be placed under the control of the promoter where 
the plasmid integrates. 

Recent work has shown that DNA can be trans- 
ferred by conjugation from T. thermophilus strain 
HB8 to a closely related strain, HB27. Strain HB8 
harbors a region of its chromosomal DNA similar in 
sequence to the F plasmid of E. coli, which allows
conjugal transfer of chromosomal genes after it inte- 
grates into the chromosome of the donor cell. It is 
thought that this DNA element allows strain HB8 to 
transfer copies of its chromosomal genes to recipient 
cells. There is as yet no evidence for the existence of 
this mobilizable DNA element as a free plasmid in 
cells. This element is adjacent to genes allowing strain 
HB8 to grow under anaerobic conditions by nitrate 
respiration and the transfer of this trait to strain HB27, 
which cannot respire nitrate, is being used to examine 
the conjugation mechanism. Mutations are being 
made in suspected conjugation genes by site-directed 
mutagenesis to systematically assign functions to 
these genes. 
Theta replication is the primary type of replication
used to replicate circular genomes (such as those of 
bacteria and many viruses and plasmids). These circu- 
lar genomes each have one origin of replication. Theta 
replication proceeds bidirectionally from this origin, 
creating a replication bubble. Two replication forks are set up at the origin and move away from it in opposite directions, so that both halves of the circle are replicated at the same time; at each fork both parental strands are copied, with new DNA always synthesized in the 5’ to 3’ direction. The replication bubble grows during DNA replication (see Figure 1), eventually splitting off as a new double-stranded copy of the genome when the forks traveling in opposite directions merge. The structure formed during this process resembles the Greek letter theta (θ), hence the name of the model.


Figure 1 Theta replication.

In Figure 1, the replicating genome is shown as
having only a single replication bubble. In reality, 
additional replication bubbles can be initiated soon 
after the parental origin of replication has been dupli- 
cated; this is how a bacterium that takes 40 min to 
replicate its DNA can still reproduce with a doubling 
time of only 20 min, and it means that the copy num- 
ber of genes near the origin can be several-fold higher 
than that of genes on the opposite side of the circle. 
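The origin-to-terminus dosage gradient described above can be estimated with the Cooper–Helmstetter model of bacterial replication, which this entry does not discuss and which is assumed here: in steady exponential growth, a gene lying a fraction x of the way from the origin to the terminus is present at 2^(C(1 − x)/τ) copies per terminus copy, where C is the time needed to replicate the chromosome and τ is the doubling time. The function name and parameters are illustrative only.

```python
def relative_gene_dosage(x: float, c_minutes: float, tau_minutes: float) -> float:
    """Copies of a gene at fractional position x (0 = origin, 1 = terminus)
    per copy of a terminus gene, in a steadily growing culture.

    Assumes the Cooper-Helmstetter model: forks move at constant speed, so a
    gene at position x is replicated C * x minutes after initiation.
    """
    if not 0.0 <= x <= 1.0:
        raise ValueError("x must lie between 0 (origin) and 1 (terminus)")
    if c_minutes <= 0 or tau_minutes <= 0:
        raise ValueError("C and tau must be positive")
    return 2.0 ** (c_minutes * (1.0 - x) / tau_minutes)


if __name__ == "__main__":
    # Figures from the text: 40 min to replicate the chromosome, 20 min doubling time.
    for x in (0.0, 0.5, 1.0):
        print(f"x = {x:.1f}: {relative_gene_dosage(x, 40, 20):.2f} copies per terminus copy")
```

With C = 40 min and τ = 20 min, an origin-proximal gene comes out at four copies per terminus copy, consistent with the several-fold difference described in the text.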


See also: Replication 
This is the standard method in classical genetics of
ordering linked genes. Basically, it involves determin- 
ing the relative frequencies of the eight possible pro- 
ducts of meiosis in a triply heterozygous diploid, say 
A/a B/b C/c, where A, B, and C are linked, or sus- 
pected of being linked. If the organism is a haploid 
fungus or alga, the products of meiosis, which will be 
spores of one kind or another, can be individually 
germinated and scored directly. In the case of diploid 
organisms, the triple heterozygote is crossed to a 
homozygous triple recessive a/a b/b c/c, so that the 
eight kinds of meiotic product are represented by 
eight distinguishable phenotypes in the test-cross 
progeny. 

If A, B, C are unlinked, the eight types of meiotic 
product will, apart from sampling error and any dif- 
ferences in viability, be equally frequent (see Independ- 
ent Segregation). If all are linked, defining two 
adjacent intervals in the linear order of the chromo- 
some, two types of reciprocally constituted products 
will represent noncrossovers (parental types), two 
types will result from crossing-over in one interval, 
two from crossing-over in the other, and two from 
crossing-over in both. Table 1 sets out the possibilities, with some hypothetical frequencies for illustration. We would conclude from these data that the
triple heterozygote inherited A B C from one parent 
and a b c from the other, and that the loci were in the 
order as written. The order of the loci is usually 
obvious from the observation that two of the eight 
types are conspicuously less frequent than the others and can therefore be identified as the double crossover
types. It is important that both putative double cross- 
over classes should agree in their low frequency, since 
a low frequency of just one of them could be due to its 
low viability. Approximately equal frequencies of 
reciprocally constituted classes give reassurance that 
the conclusions regarding linkage are not being ser- 
iously distorted by viability differences. 
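The same reasoning that identifies the two rarest classes as double crossovers also fixes the gene order: the middle locus is the only one at which a double-crossover class differs from the parental class it otherwise resembles. A minimal Python sketch, assuming each class is written as a three-letter phenotype (uppercase for the dominant allele) with the loci always in the same order; the function name `infer_gene_order` is hypothetical, and the sketch does not perform the viability checks on reciprocal classes recommended above.

```python
from typing import Dict, Tuple


def infer_gene_order(counts: Dict[str, float]) -> Tuple[str, str, str]:
    """Infer the linear order of three linked loci from test-cross class counts.

    Keys are phenotypes such as "AbC": one letter per locus, uppercase for the
    dominant allele, with the loci always written in the same (arbitrary) order.
    """
    if len(counts) != 8:
        raise ValueError("expected counts for all eight classes")
    ranked = sorted(counts, key=counts.get, reverse=True)
    parentals = ranked[:2]      # noncrossover classes are the most frequent
    double_co = ranked[-1]      # a double-crossover class is among the rarest
    for parental in parentals:
        differing = [i for i in range(3) if parental[i] != double_co[i]]
        if len(differing) == 1:
            # The locus that alone distinguishes the double crossover is in the middle.
            middle = differing[0]
            ends = [i for i in range(3) if i != middle]
            loci = parental.upper()
            return loci[ends[0]], loci[middle], loci[ends[1]]
    raise ValueError("no parental class differs from the rarest class at one locus")
```

Applied to the classes of Table 1, it returns ('A', 'B', 'C'), the order as written; if the loci had been written in a different order, the returned tuple would still place B in the middle.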

The recombination percentages (uncorrected link- 
age map distances) between the loci, taken two at a 
time, are calculated by summing the recombinant 
classes 3, 4, 7 and 8 for A-B and classes 5, 6, 7 and 8 
for B-C: 36 and 25 respectively, based on the numbers 
in Table 1. Unless the loci are sufficiently close
together for there to be no double crossovers (classes 
7 and 8 both zero), the recombination frequency 
between A and C (3 + 4 + 5 + 6, totalling 51 here) 
will always be less than the sum of A-B and B-C. 
Because of double and multiple crossovers, recombin- 
ation frequency does not increase linearly with map 
distance. True map distance, measured in centimor- 
gans, is 100 times the mean crossover frequency per 
meiotic product or (because each crossover involves 
only two out of four chromatids in the meiotic biva- 
lent) 50 times the total mean number of crossovers per 
meiotic cell. This is equal to the percentage recombin- 
ation between the ends of the chromosome interval 
only when double crossovers do not occur. 
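The factor of 50 follows from the chromatid arithmetic in the preceding paragraph. Writing $\bar{k}$ for the mean number of crossovers per meiotic cell in an interval (a symbol introduced here only for illustration), each crossover involves two of the four chromatids, so a meiotic product carries on average $2\bar{k}/4 = \bar{k}/2$ crossovers, and the map distance $d$ is

$$
d\;(\mathrm{cM}) = 100 \times \frac{2\bar{k}}{4} = 50\,\bar{k}
$$

For example, an interval that receives exactly one crossover in every meiosis is 50 cM long; since it never has double crossovers, this also equals its percentage recombination, 50%.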

Three-point data give information about 
crossover interference. If crossovers are formed in- 
dependently, with no effect of one on the chance 
of formation of another, the probability of simultan- 
eous crossing over in both of the marked intervals 
should be the product of the probabilities in each 
interval separately. A lower double crossover fre- 
quency than predicted on this basis is an indication 
of crossover interference, the intensity of which is 
measured by the extent to which the coefficient of 
coincidence (observed/predicted double crossover 
frequency) falls below unity. 


Table 1 A typical result of a three-point test-cross: A B C/a b c × a b c/a b c

Classes   Diploid phenotypes or meiotic products(a)   Frequency   Interpretation
1, 2      ABC, abc                                    44%         No crossover
3, 4      Abc, aBC                                    31%         Single crossover between A and B
5, 6      ABc, abC                                    20%         Single crossover between B and C
7, 8      AbC, aBc                                    5%          Double crossover, A-B and B-C

(a) It is the phenotypes of the diploid progeny of the cross that are scored, but they represent the haploid meiotic products (germ cells) of the triply heterozygous parent A B C/a b c, since the other parent contributes only the recessive alleles a, b, c.


Given the illustrative numbers in Table 1, we would expect, if there were no interference, the frequency of doubles to be 0.36 × 0.25 = 0.09 of the total,
as compared with the 0.05 observed. The coefficient of 
coincidence is therefore 5/9 or 0.56, which is a fairly 
typical value. Generally, interference is stronger over 
shorter distances. 
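The calculations in this entry can be collected into a short Python sketch. It uses the class numbering of Table 1 and, because the table gives only combined frequencies for each reciprocal pair, assumes each pair is split evenly between its two classes; the function name, the dictionary layout, and the interference value (taken as 1 minus the coefficient of coincidence) are illustrative choices.

```python
from typing import Dict


def three_point_summary(freq: Dict[int, float]) -> Dict[str, float]:
    """Recombination fractions and interference from the eight test-cross classes.

    freq maps class numbers 1-8 (numbered as in Table 1) to proportions of the
    progeny; the proportions should sum to 1.
    """
    rf_ab = freq[3] + freq[4] + freq[7] + freq[8]   # recombinant between A and B
    rf_bc = freq[5] + freq[6] + freq[7] + freq[8]   # recombinant between B and C
    rf_ac = freq[3] + freq[4] + freq[5] + freq[6]   # doubles look parental for A-C
    observed_doubles = freq[7] + freq[8]
    expected_doubles = rf_ab * rf_bc                # if crossovers were independent
    coincidence = observed_doubles / expected_doubles
    return {
        "rf_AB": rf_ab,
        "rf_BC": rf_bc,
        "rf_AC": rf_ac,
        "expected_doubles": expected_doubles,
        "coincidence": coincidence,
        "interference": 1.0 - coincidence,
    }


if __name__ == "__main__":
    # Table 1 frequencies, each reciprocal pair split evenly (an assumption).
    table_1 = {1: 0.22, 2: 0.22, 3: 0.155, 4: 0.155,
               5: 0.10, 6: 0.10, 7: 0.025, 8: 0.025}
    for name, value in three_point_summary(table_1).items():
        print(f"{name}: {value:.3f}")
```

The results reproduce the figures in the text: 0.36 and 0.25 for the two adjacent intervals, 0.51 between A and C, 0.09 expected double crossovers against 0.05 observed, and a coefficient of coincidence of about 0.56.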


See also: Independent Segregation; Interference, 
Genetic; Map Distance, Unit; Mapping Function; 
Meiotic Product 
Threonine (Thr or T) is one of the 20 amino acids
commonly found in proteins. Its side-chain contains 
an OH group allowing it to form H-bonds with water. 
It is therefore classed as a polar amino acid. Its chemical structure is given in Figure 1.


[Structure: ⁺H₃N–CH(COO⁻)–CH(OH)–CH₃]

Figure 1 Threonine.


See also: Amino Acids; Proteins and Protein 
Structure 
Threshold characters refer to discrete (i.e., discontinuous) traits that are not inherited in a Mendelian fashion but in a manner more similar to continuous traits. In other words, despite being discontinuous in nature, their inheritance is polygenic rather than monogenic.
An example of a threshold character is litter size in 
mice, pigs, and cattle. Although litter size itself must 
be a whole number, the factors that control litter size 
are quantitative, namely the levels of circulating gona- 
dotrophic hormones. 

The simplest form of discrete trait is one that 
is either present or absent. The liability-threshold model describes how polygenic effects (which are
cumulative and therefore continuous) can lead to a 
dichotomous character. This model assumes the pres- 
ence of an unmeasured continuous variable that, if it 
could be measured, would be appropriate for quan- 
titative genetic analysis. Such an underlying continu- 
ous variable is known as the liability. If the liability of 
an individual exceeds a certain threshold value, then 
the trait will be present in the individual. Conversely, 
if the liability falls below the threshold, then the trait 
will be absent. 

Since liability is defined as an unmeasured variable, 
its frequency distribution in the population is unknown. In order to apply standard quantitative genetic theory to liability, it is usual to assume that it has a normal distribution, and that its joint distribution
in members of a family has a multivariate normal 
distribution. However, it is also possible to incorpor- 
ate the effect of one or more major loci into the 
liability. A model that incorporates both polygenic 
and major locus effects on liability is called a ‘mixed 
model.’ The correlation in liability between any pair 
of relatives is determined by the degree of the relation- 
ship and the underlying genetic model. Empirical data 
on the frequencies of the trait in the general popu- 
lation and in relatives of probands with and without 
the trait, or on pedigrees ascertained via probands, 
can be fitted to the liability-threshold models to 
obtain estimates of the threshold and of the variance 
components of the liability. In this way it is possible to 
test for the presence of a major locus effect (complex 
segregation analysis), and obtain an estimate of the 
heritability of the liability. 
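As an illustration of how trait frequencies in the population and among relatives translate into a threshold and a heritability of liability, the sketch below uses Falconer's classical approximation, a simpler method than the model fitting described above and swapped in here for illustration. It assumes that liability is normally distributed with the same variance in the general population and among relatives, with purely polygenic effects (no major locus); `relationship` is the coefficient of relationship (0.5 for first-degree relatives). The function names and example frequencies are hypothetical.

```python
from statistics import NormalDist

LIABILITY = NormalDist(mu=0.0, sigma=1.0)  # liability in standard-deviation units


def threshold(prevalence: float) -> float:
    """Threshold on the liability scale that a fraction `prevalence` exceeds."""
    if not 0.0 < prevalence < 1.0:
        raise ValueError("prevalence must lie strictly between 0 and 1")
    return LIABILITY.inv_cdf(1.0 - prevalence)


def falconer_heritability(pop_prevalence: float, rel_prevalence: float,
                          relationship: float) -> float:
    """Falconer's approximate heritability of liability from trait frequencies.

    pop_prevalence: frequency of the trait in the general population.
    rel_prevalence: frequency of the trait among relatives of affected probands.
    relationship: coefficient of relationship between proband and relative.
    """
    x_g = threshold(pop_prevalence)             # threshold relative to the population mean
    x_r = threshold(rel_prevalence)             # threshold relative to the relatives' mean
    a_g = LIABILITY.pdf(x_g) / pop_prevalence   # mean liability of affected individuals
    regression = (x_g - x_r) / a_g              # relative's liability on proband's
    return regression / relationship


if __name__ == "__main__":
    # Hypothetical figures: 1% of the population affected, 5% of probands' siblings.
    print(f"threshold: {threshold(0.01):.2f} SD above the mean")
    print(f"h2 of liability: {falconer_heritability(0.01, 0.05, 0.5):.2f}")
```

With these hypothetical figures the threshold lies about 2.33 standard deviations above the mean and the estimated heritability of liability is roughly 0.5; a full analysis would instead fit the liability-threshold model to pedigree data, as described in the text.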

The validity of a liability-threshold model can be 
assessed by goodness-of-fit statistics of empirical data 
involving pedigrees containing pairs of relatives of different classes (e.g., identical and fraternal twins, full and
half siblings). However, rejection of the model may 
mean that complicating factors (such as assortative 
mating) have not been accounted for, rather than 
failure of the basic liability-threshold model. 


See also: Continuous Variation; Heritability; 
Multifactorial Inheritance 
Thymine is a pyrimidine (molecular formula C₅H₆N₂O₂) found primarily within DNA in the form of a deoxynucleotidyl residue, paired with adenine. Thymine is also found in trace quantities within transfer RNA. Chemically, thymine can be considered to be a derivative of uracil, and is sometimes referred to as 5-methyluracil. Because thymine is a critically important constituent of DNA, considerable effort has been put into the design of drugs that might selectively inhibit thymine biosynthesis, thereby blocking DNA replication, especially in rapidly dividing malignant cells. Examples of success in this area include the drugs fluorouracil, methotrexate, and aminopterin, each of which directly or indirectly blocks the attachment of the methyl substituent to the pyrimidine ring of the thymine precursor deoxyuridylic acid.


See also: Pyrimidine 


Thymine Dimer 
A thymine dimer is a cross-linked pair of adjacent
thymine residues in DNA, which results from damage 
induced by ultraviolet radiation. 


See also: DNA Repair; Thymine 
Ti plasmids are large (often more than 200 kb long) catabolic plasmids harbored by Agrobacterium tumefaciens strains. A Ti plasmid can be transferred by conjugation to most Agrobacterium and some Rhizobium species. A major characteristic of a Ti plasmid is that it contains the vir (virulence) genes, which enable a copy of one or more segments (T-DNA) of the Ti plasmid to be transferred into plant cells, where it can become integrated into the plant genome. The genes encoded by the T-DNA are under eukaryotic control and can be expressed in a plant background. This can result in plant cell proliferation (crown gall formation) and in the synthesis and secretion of specific metabolites that are of no use to the plant. These metabolites, called opines, are condensation products of amino acids, such as arginine and lysine, and abundant plant metabolites such as pyruvic acid, ketoglutaric acid, succinate, and mannose.

Ti plasmids also encode functions that allow Agrobacterium to catabolize these opines. Ti plasmids thus give their host the capacity to engineer plant tissue to proliferate (hence the name ‘tumor inducing’) and to force these tissues to synthesize large amounts of compounds that can only be catabolized by Ti-harboring bacteria.

The discovery of the Ti plasmids and the understanding of the mechanism of T-DNA transfer were the starting point for altering these plasmids and turning them into vectors for plant genetic engineering. Transgenic crop plants have now been commercialized and are grown worldwide (on more than 45 million hectares in the year 2000), many of them constructed with the help of the Ti plasmid as gene vector.


See also: Agrobacterium; Plasmids; Transfer of 
Genetic Information from Agrobacterium 
tumefaciens to Plants 


Tissue Culture 
In the modern context, tissue culture usually refers to the long-term culture of dispersed animal or plant cells rather than the short-term incubation of organs or tissues. Its origins, however, lie in the maintenance of organs or tissues ex vivo, and its development has been driven by the aim of mimicking complex cellular interactions in vitro in a manipulable fashion. As such, its history goes back to the beginning of biology. Yet it was only at the end of the nineteenth century that Roux demonstrated the viability of cells outside the body in physiological saline, and not until 1907, when Harrison showed the outgrowth of neurons from explanted tissue, that tissue culture can truly be considered to have begun. These explant cultures were grown in a lymph clot, a technique replaced by the plasma clot and perfected by Carrel. Carrel and Burrows showed the growth-promoting activities of a chick embryo extract and, using this technique with rigid control of sterility, Carrel maintained a strain of cells for over 34 years. This technique became the mainstay of cell culture. It was used by investigators such as Rous, who transferred cells back into chickens and showed that they formed tumors, a phenomenon that was later shown to be due to the Rous sarcoma virus (RSV) named after him. Thus, cancer cell biology was born. However, it was only towards the end of the 1940s, when manipulable cell lines were established, that tissue culture as we know it can be said to have started.


Milestones in Development of Tissue Culture


Many of the cell lines originally isolated in the late 
1940s and early 1950s are still in use today, illustrating 
their long-term culturability. These include the L-cell 
line that Earle showed could be dispersed and plated as 
single cells to grow up as clones, HeLa cells derived from
a human cervical tumor, and Chinese hamster ovary 
(CHO) cells from a disaggregated ovary. A key parallel 
event was the development of defined culture media. 
These were originally developed from physiological salt solutions defined by Earle and Ham, although it was Eagle who systematically tested many reagents and developed the complex medium with over 25 ingredients that bears his name. In fact, many media available today bear these pioneers’ names, and even media developed for specific cell types, such as Iscove’s medium for hematopoietic cells, are based on these original media formulations. These defined media,
however, are not sufficient to support the growth of 
cells but must be supplemented with serum, usually 
fetal bovine. In addition, some cells need the products 
of replication-inactivated (usually by treatment with 
mitomycin C or γ-irradiation) feeder fibroblastic cell
layers. It is thought that the serum or feeder cells 
provide the source of adhesion molecules, growth 
factors and carriers such as transferrin. It was the 
observation in the middle of the 1950s that nerve 
cells required nerve growth factor (NGF) for out- 
growth and maintenance of their viability that allowed 
the development of more defined media for the 
support of differentiated cells. Innumerable growth 
factors have since been isolated such as epidermal 
growth factor (EGF), platelet-derived growth factor 
(PDGF) for epithelial and fibroblastic cells, respect- 
ively, as well as those for hemapoietic cells including 
macrophage and granulocyte colony stimulating fac- 
tors. Thus several cell types can now be cultured 
in completely defined media. However, despite the 
identification of numerous cell-type-specific growth 
factors, the ability of cells to grow in the complete 
absence of serum is still the exception rather than the 
rule. 

Cells are often cultured from tissue explants or 
from disaggregated embryos. These cells are anchor- 
age dependent and retain a normal diploid chromo- 
some complement. The primary cultures can be 
propagated to form secondary cultures. However, Hayflick observed that cultured human fibroblasts have a finite replicative life span, the ‘Hayflick limit,’ after which they cannot be passaged. This cellular senescence has proven to be a fertile ground for the study of
aging. In mouse fibroblasts, a similar replicative life 
span exists. However, the cultures usually go into a 
crisis when most cells cannot be propagated but some 
outgrow as morphologically transformed cells. These 
are usually aneuploid and are immortal. They also 
display less stringent growth requirements including 
the loss of contact inhibition of growth, a lower 
requirement for serum growth factors, anchorage 
independence, and often an ability to form tumors in 
nude (immunologically compromised) mice. These are
also the characteristics of cultured cells derived from 
tumors. Similar changes can be induced in normal 
primary cells by transformation of receptive cells 
with RNA and DNA tumor viruses such as RSV, 
polyoma, and simian virus 40 (SV40). These trans- 
formed cells and the events leading from the mortal 
to the immortal tumorigenic state have been major 
areas of study for cancer biology. Indeed, a major 
assay for the tumorigenic phenotype is the ability of 
cells to grow as foci of morphologically transformed 
cells, or as colonies in semisolid media containing agar
or methylcellulose. These characteristics are not 
shared by normal mortal cells. 
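The replicative life span referred to above as the Hayflick limit is commonly expressed in cumulative population doublings rather than in passages. A minimal sketch of that bookkeeping, assuming the cells are counted at seeding and at harvest in every passage; the function name and the example counts are hypothetical.

```python
import math
from typing import Iterable, Tuple


def cumulative_population_doublings(passages: Iterable[Tuple[float, float]]) -> float:
    """Sum of log2(harvested / seeded) over successive passages.

    Each item is (cells_seeded, cells_harvested) for one passage.
    """
    total = 0.0
    for seeded, harvested in passages:
        if seeded <= 0 or harvested <= 0:
            raise ValueError("cell counts must be positive")
        # Each passage contributes the number of times the population doubled.
        total += math.log2(harvested / seeded)
    return total


if __name__ == "__main__":
    # Hypothetical counts: each passage seeded with 1e5 cells.
    history = [(1e5, 8e5), (1e5, 4e5), (1e5, 2e5)]
    print(cumulative_population_doublings(history))  # 3 + 2 + 1 = 6 doublings
```

A mortal culture approaching senescence shows a shrinking contribution per passage until no further doublings occur, whereas an immortalized line keeps adding doublings indefinitely.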

Almost all cells that can be continuously cultured 
are neoplastically transformed and genetically altered. 
Embryonic stem (ES) cells are an exception. These are derived from blastocysts and represent uncommitted cells from the inner cell mass (the cells
that will form the embryo proper). ES cells can be 
cultured indefinitely either on feeder cells or in the 
presence of the misnamed growth factor, leukemia 
inhibitory factor (LIF). They show many characteris- 
tics of transformed cells including anchorage independ- 
ence, loss of contact inhibition, and the ability to form 
tumors in vivo. However, when reintroduced into a 
blastocyst, they take on normal cell fates and can 
contribute to all tissues including the germline. 
These cells are also unusual in that they permit introduced genes to be integrated into their nuclear DNA by homologous recombination, thus allowing the ablation or mutation of specific genes. By reintroduction of these
mutated cells into the blastocyst, mutations can be 
introduced into the mouse germline. This technology 
of switching between cultured cell and living organism 
has dramatically enhanced the study of mammalian 
development. Furthermore, the pluripotent nature of 
ES cells has been exploited in culture by inducing 
differentiation thereby allowing the study of this 
process. Differentiation in vitro is now becoming 
common and many lineages have cell culture systems 
suitable to study the biochemistry of differentiation. 
These include myogenic, adipogenic, neurogenic, and hematopoietic lineages. To date, however, the maintenance and differentiation of epithelially derived cell types have been difficult.

Cell culture has also been invaluable for genetics. 
The first cloning from single cells allowed the establishment of genetically homogeneous cell lines and made it possible to identify mutants and purify them free from other cells. This search for mutations was particularly fertile in Puck’s CHO cell line, which is, in fact, a proline auxotroph. It was in this cell line that the first temperature-sensitive, and therefore recessive, mutation was identified. This was a paradox, since mammalian cells are diploid, making the identification of recessive mutations theoretically almost impossible. However, it appears that despite its near-diploid chromosome number, the CHO genome has large areas of functional hemizygosity. This cell line is easily
mutagenized and consequently large numbers of 
mutants were isolated in this and some other cell 
lines. The availability of mutants led to attempts to 
rescue the mutations by complementation with DNA 
in a manner similar to that pioneered in bacterial 
genetics. DNA transformation was first achieved by 
Graham and van der Eb using adenovirus DNA and
this soon became routine with both plasmid and 
chromosomal DNA. A similar transfer of genetic 
information could also be achieved into cells with a 
variety of viral vectors. Consequently, DNA trans- 
formation of cultured cells could be used to isolate 
genes by complementation, to study structure-function relationships of gene products, and to study
gene regulation in promoter assays. It also became 
the basis of the commercial use of cell culture for the 
production of biologicals. 

Somatic cell mutants allowed the study of cell hybrids. In this case, fusion is made between cells carrying particular selectable markers. The first metabolic selection was the HAT (hypoxanthine, aminopterin, and thymidine) selection technique developed by Szybalski and exploited by Littlefield, which kills cells that lack either hypoxanthine guanine phosphoribosyl transferase (HGPRT) or thymidine kinase (TK). This is because aminopterin blocks de novo synthesis of purines and the conversion of dUMP to dTMP, forcing the use of salvage pathways. Consequently, parent cells that are TK⁻/HGPRT⁺ and HGPRT⁻/TK⁺ can be hybridized and complementing hybrids selected by incubation in HAT media. Somatic cell hybrids can be
created both within and between species. Fusion can 
be spontaneous or promoted by inactivated Sendai 
virus or, more commonly now, by incubation in poly- 
ethylene glycol. This allows complementation assays 
to be performed and questions to be addressed such as 
the dominance of the neoplastic phenotype. Controversies over some of the results led to the realization that chromosomes were being lost from heterokaryons. In human-hamster hybrids, for example, human chromosomes are preferentially lost. This
enabled the establishment of somatic cell hybrids that 
contained single human chromosomes to be used for 
rapid chromosomal mapping of genes; a technique still 
in use today. The immortalization of normal B cells by 
fusion with myeloma cells led to hybrids that produce 
a single antibody specificity. These cells can be propa- 
gated indefinitely producing a monoclonal antibody 
of defined type and directed to a single epitope. These 
cultures can be scaled up for industrial production of 
monoclonal antibodies. 
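The logic of HAT selection described above can be written as a small truth-table model. It is a deliberate simplification that assumes aminopterin blocks de novo purine and dTMP synthesis completely, so that a cell survives in HAT medium only if it can salvage both hypoxanthine (requiring HGPRT) and thymidine (requiring TK), and that a hybrid expresses any enzyme supplied by either parent; the `Cell` type and the function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cell:
    """Presence (True) or absence (False) of the two salvage enzymes."""
    hgprt: bool
    tk: bool


def survives_in_hat(cell: Cell) -> bool:
    # Aminopterin blocks de novo purine and dTMP synthesis, so the cell must
    # salvage hypoxanthine (via HGPRT) and thymidine (via TK) from the medium.
    return cell.hgprt and cell.tk


def fuse(parent1: Cell, parent2: Cell) -> Cell:
    # Complementation: the hybrid has each enzyme if either parent supplies it.
    return Cell(hgprt=parent1.hgprt or parent2.hgprt, tk=parent1.tk or parent2.tk)


if __name__ == "__main__":
    tk_minus = Cell(hgprt=True, tk=False)      # TK-/HGPRT+ parent
    hgprt_minus = Cell(hgprt=False, tk=True)   # HGPRT-/TK+ parent
    hybrid = fuse(tk_minus, hgprt_minus)
    for name, cell in [("TK- parent", tk_minus), ("HGPRT- parent", hgprt_minus),
                       ("hybrid", hybrid)]:
        print(name, "survives" if survives_in_hat(cell) else "dies")
```

Only the fused, complementing cells survive, which is why incubating the fusion mixture in HAT medium eliminates both kinds of unfused parental cell.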

Concomitant with the development of animal cell 
culture, plant cell culture was also developed. It lagged 
early on because of the lack of appropriate media but, 
by the 1930s, suitable media were developed by White 
and others. In contrast to animal cell culture, plant 
cells can grow in defined media. A major advance occurred when whole plants were regenerated from a single cell, first in carrot by Kato and Takeuchi; this is now routine in many species. This
allowed for clonal propagation of plants as did the 
regeneration of buds, shoots, and roots in culture. 
Thus, in many ways, plant cell culture is in advance 
of animal cell culture since development and differen- 
tiation can be studied, as can the sophisticated inter- 
actions between parasites and hosts in culture. 


Culture Methods 


Isolated cells were originally cultured on glass, hence the term in vitro, literally ‘in glass.’ However, glass has been superseded by specially treated plasticware available from many different suppliers. Flasks or tissue culture plates are usually designated by their surface area, e.g., 25 cm² or 75 cm² flasks, or by their diameter, e.g., 30 mm or 60 mm plates. Generally, cells are grown in a modification of Eagle’s medium, e.g., α-minimal essential medium (αMEM) or Dulbecco’s modified Eagle’s medium (DMEM), supplemented with 5% to 20% serum. Usually the medium is bicarbonate buffered, since this physiological buffering seems to enhance the growth of cells; the cultures therefore require a CO₂-containing atmosphere, and the medium usually contains phenol red as a pH indicator. Cells are subcultured by trypsinization to disperse the cells or, for trypsin-resistant cells, by scraping. Thus, they can be propagated from flask to flask. Cells can also be frozen at temperatures below −70 °C in media containing serum and dimethyl sulfoxide, in a state from which they can be effectively resurrected. Thus, clones or cell lines can be laid down in storage for use many years later.
Surface culture is poorly suited to the large-scale production of cells. Some transformed cells can be adapted to growth in suspension. Suspension culture is usually in spinner flasks that contain a magnetic stirrer bar, and these can be scaled up to 10 liters or
so. After that, for industrial production, various fer- 
menter designs have been developed. However, most 
cells require substrate attachment to grow, limiting the 
large-scale production of cells or their products. Con- 
sequently, the design of culture flasks has been modi- 
fied to contain spirals or racks of plastic to enhance 
surface area, or cells have been grown in suspension 
culture attached to small beads known as microcar- 
riers. Given the ability to transfer DNA into mammal- 
ian cells and to obtain stable expression of gene 
products, such large-scale techniques have become 
an essential component of the biotechnology industry 
to produce recombinant products on a commercial 
scale. At the other end of the scale, hematopoietic
stem cells can be grown in small colonies, usually in 
semisolid medium, and differentiation induced by 
application of particular growth factors. These 
methods have been used to define lineage relation- 
ships. These culture methods have also provided 
invaluable assays for the isolation and eventual cloning 
of hematopoietic growth factors, many of which are
now in commercial production as therapeutics using 
the large-scale culture techniques described above. 
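Because flasks are designated by surface area, the volume of trypsinized cell suspension needed to seed a new flask follows directly from a target density in cells per square centimeter. A minimal sketch; the function name is hypothetical, and the target density and cell concentration in the example are illustrative values that depend on the cell type, not recommendations.

```python
def seeding_volume_ml(area_cm2: float, cells_per_cm2: float,
                      suspension_cells_per_ml: float) -> float:
    """Volume of cell suspension (mL) needed to seed one vessel at a target density."""
    if min(area_cm2, cells_per_cm2, suspension_cells_per_ml) <= 0:
        raise ValueError("all arguments must be positive")
    cells_needed = area_cm2 * cells_per_cm2          # total cells for this surface area
    return cells_needed / suspension_cells_per_ml


if __name__ == "__main__":
    # Illustrative: seed at 1e4 cells/cm2 from a suspension of 5e5 cells/mL.
    for area in (25, 75):
        print(f"{area} cm2 flask: {seeding_volume_ml(area, 1e4, 5e5):.2f} mL")
```

With these illustrative figures a 25 cm² flask needs 0.5 mL of suspension and a 75 cm² flask 1.5 mL, each then made up with fresh medium.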
Joe-Hin Tjio (1919- ) is best known for his part in the
discovery in 1956 that the diploid chromosome num- 
ber in humans is 46 and not 48 as had been dogma 
since 1912. This discovery was made at the Institute of 
Genetics, University of Lund (Sweden) in collabor- 
ation with Albert Levan. 




The successful enumeration of the correct diploid 
chromosome number was the product of many factors, 
among which were Tjio’s acknowledged technical 
skill, the use of an amenable tissue, and application 
of techniques that had been developed for plant and 
animal cytogenetics. The crucial study was performed 
on fetal lung fibroblasts, cultured in vitro by Rune 
Grubb at the University’s Department of Microbiol- 
ogy, which proved to be an excellent source material 
for chromosome preparations. These cells were 
induced to arrest at metaphase (the best stage for 
counting chromosome number) by use of the spindle 
poison, colchicine; Albert Levan had pioneered this 
approach in studies of plant chromosomes. The seminal paper by Tjio and Levan, entitled ‘The chromosome number of man,’ was published in Hereditas in 1956 (Tjio and Levan, 1956). This publication marks the starting point of the discipline of clinical cytogenetics.

Joe-Hin Tjio was born on 11 February 1919 in 
Indonesia where he grew up. He trained as an agrono- 
mist from 1936 to 1940, then took up a position as a 
cytogeneticist at the Botanical Institute of Bogor in 
Indonesia. His initial foray into cytogenetics came to 
an abrupt halt following the invasion of the country 
by the Japanese and his internment for the remainder 
of the war. After the war, Tjio moved to Europe. 
He worked as a research assistant in laboratories in 
Denmark and Sweden (including that of Albert 
Levan), before taking up a position as head of cytogenetics in Zaragoza (Spain), where he remained from
1948 to 1957. This move to Spain did not, however, 
end his association with Albert Levan. Tjio made 
regular study visits to Lund and it was on one of 
these, in 1955, that he carried out the work resulting 
in the identification of the human diploid chromo- 
some number. 

Following the publication of the correct chromo- 
some number and presentation of the observations at 
the First International Human Genetics Congress in 
Copenhagen, Tjio received a number of invitations to 
work in the United States. Although initially reluctant 
to move, in 1957 Tjio joined T.T. Puck at the Univer- 
sity of Colorado. While there, he completed his PhD 
entitled ‘The somatic chromosomes of man.’ In 1959,
Tjio moved to the National Institutes of Health, 
initially at the National Institute of Arthritis and 
Metabolic Diseases, and subsequently, at the National 
Institute of Diabetes and Digestive and Kidney Dis- 
eases. At the latter, he was head of the cytogenetics 
section, a position he held until retirement in 1992. He 
continued to work after retirement until 1997. 

The continuing thread through Joe-Hin Tjio’s 
career was the study of chromosomes. His initial pub- 
lications in 1948 were on plant chromosomes. A series 
of publications followed, many with Albert Levan, in 
which the effects of chemicals on plant chromosomes
were described. In 1954, with encouragement from 
Levan, he turned his attention to mammalian chromo- 
somes, initially using mouse ascites tumors and later
human cells. This interest developed further in the 
United States, and with T.T. Puck and A. Robinson he 
published a number of papers highlighting chromo- 
some abnormalities in constitutional genetic defects. 
He later became especially interested in the Philadelphia
chromosome (a marker chromosome resulting from 
a reciprocal translocation involving chromosomes 9 
and 22) in chronic myeloid leukemia. For many 
years he applied his cytogenetic expertise in studies 
on autoimmunity in mouse model systems. 

Joe-Hin Tjio received many awards and honors for 
his scientific endeavors. The Kennedy International 
Award from the Joseph P. Kennedy Foundation that
he received in 1962 is probably the most prestigious. 
This award recognized his important contribution to 
our understanding of genetically determined mental 
retardation. 


Further Reading 
Tjio J-H and Levan A (1956) The chromosome number of man. 
Hereditas 42: 1-6.


See also: Levan, Albert; Painter, Theophilus Shickel
See: Melting Temperature (Tm)
See: Tumor Necrosis Factor (TNF)
The processes of DNA transcription and replication both require separating the two strands of the double-helical DNA molecule, and they happen at tremendous speed. For example, Escherichia coli DNA replication unwinds about 100 000 base pairs per minute. Because the two strands are wound around each other, unwinding them in one place overwinds the DNA ahead of the moving machinery, leaving the strands wrapped too tightly around each other. Topoisomerases are proteins that solve this problem by cutting the DNA backbone, letting the cut ends twist past each other
to a more relaxed configuration, and then resealing the 
phosphodiester bonds in the backbone to again form 
an intact double helix. DNA topoisomerases bond 
covalently to the DNA phosphate as they break the 
phosphodiester linkage between neighboring nucleo- 
tides, storing the energy in that bond to use in the 
process of resealing the bond. They are thus in effect 
reversible nucleases that break and reseal the DNA 
molecule rapidly and efficiently, with no added energy; they cannot let go until they repair the break they have caused.

There are two families of DNA topoisomerases. 
Members of the topoisomerase I family break only 
one strand, causing a nick in the DNA and letting 
the two free ends rotate relative to each other around 
the phosphodiester linkage in the other strand, driven 
by the stress of any supercoiling of the DNA. Mem- 
bers of the topoisomerase II family are primarily 
responsible for resolving the potential tangle where 
two DNA double helices cross each other. They cut 
both strands of one such molecule at the same time, 
binding to both free ends and thus forming a sort of 
double protein-lined gate. The other double helix can 
be passed through this gate, while the topoisomerase 
still keeps both ends of the cut strands close to each 
other and ready to reseal. Type II topoisomerases are useful, for example, for separating the two interlinked daughter molecules produced when circular DNA is replicated. Some of them need ATP to relax DNA. In eukaryotic cells, the type II enzymes are mainly found during periods of DNA replication, which makes them useful targets for anticancer drugs aimed at rapidly dividing cells. Bacteria also have another form of type II topoisomerase, called gyrase, that is able to introduce extra superhelical turns. Many enzymes that work on DNA can only work properly if the DNA is slightly underwound, carrying so-called negative ‘supercoils’ that make it easier for the two strands of the DNA to separate over a short region. Gyrases can generate such negative supercoils.
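The scale of the problem can be estimated with a little arithmetic. The sketch below is illustrative only: it assumes a helical repeat of about 10.5 base pairs per turn for relaxed B-DNA (a value not given in this entry), uses the standard superhelical density σ = (Lk - Lk0)/Lk0, where Lk is the linking number and Lk0 its value for the relaxed molecule, and relies on the fact that each type II strand-passage event changes the linking number by 2. The function names and the 5250 bp circle are hypothetical.

```python
import math

# Assumed helical repeat of relaxed B-DNA (not stated in the text).
BP_PER_TURN = 10.5


def turns_to_remove(bp_unwound: float, bp_per_turn: float = BP_PER_TURN) -> float:
    """Helical turns that must be removed when bp_unwound base pairs are separated."""
    return bp_unwound / bp_per_turn


def superhelical_density(lk: float, n_bp: int, bp_per_turn: float = BP_PER_TURN) -> float:
    """sigma = (Lk - Lk0) / Lk0, where Lk0 is the linking number of the relaxed molecule."""
    lk0 = n_bp / bp_per_turn
    return (lk - lk0) / lk0


def type2_events_needed(delta_lk: int) -> int:
    """Strand-passage events needed to change Lk by delta_lk (each type II event changes Lk by 2)."""
    if delta_lk % 2 != 0:
        raise ValueError("type II topoisomerases change the linking number in steps of 2")
    return abs(delta_lk) // 2


if __name__ == "__main__":
    # E. coli figure from the text: about 100 000 bp unwound per minute.
    print(f"{turns_to_remove(100_000):.0f} turns per minute")
    # Hypothetical 5250 bp circle (Lk0 = 500) underwound to Lk = 475.
    print(superhelical_density(475, 5250))
    print(type2_events_needed(-24))
```

On these assumptions, unwinding 100 000 base pairs per minute means roughly 9500 helical turns per minute that topoisomerases must remove, which is why their break-and-reseal cycle has to be fast.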


See also: Replication; Transcription 
A well-known feature of the tortoiseshell cat is that
almost all such animals are female. This is a result of X 
chromosome inactivation. In all female mammals one 
of the two X chromosomes in every cell becomes 


genetically inactive early in development. Either X 
chromosome may become inactive, and in each cell 
once the choice has been made the same X chromo- 
some remains inactive throughout the further multi- 
plication of that cell line in development. Thus, in the 
tissues of the adult there may be large clumps of cells 
with the same X chromosome active. If the two X 
chromosomes carry different alleles of an X-linked 
gene, the animal will be a mosaic of two types of 
cells, and if the gene concerned affects some visible 
feature, such as coat color, then this mosaicism will be 
visible as variegation. 

In cats there is an X-linked gene that determines a 
ginger coat versus nonginger (i.e., black or tabby). A 
tortoiseshell cat carries an allele for ginger on one X 
chromosome and an allele for black or tabby on the 
other. As a result the coat is variegated with patches of 
ginger and of black or tabby. Male cats have only a 
single X chromosome (being chromosomally XY) and 
can thus only be either ginger or nonginger. Similar 
X-linked coat color genes occur in other species, 
including the Syrian hamster and the mouse. In add- 
ition, a patchy effect can result from genes affecting 
other characteristics, such as coat texture. The tabby 
pattern in the cat is not due to X chromosome inacti- 
vation as the gene is autosomal, but a gene called tabby 
in the mouse is X-linked and gives a striped effect in 
heterozygotes due to X-inactivation. With appropri- 
ate techniques of histology or cell culture the presence 
of two types of cells can be shown in heterozygotes 
for many X-linked genes. The shape and size of the 
patches depend on patterns of cell growth and cell
mingling during development. In tortoiseshell cats the 
patches are larger if the cat carries an autosomal gene 
for white spotting because this gene reduces the num- 
ber and distance of migration of pigment cells, so that 
descendants of a single cell occupy a larger area. 
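Why fewer pigment-cell founders give larger patches can be shown with a deliberately simplified one-dimensional sketch. Its assumptions are not taken from the text: cells do not mingle, every founder contributes an equal-sized contiguous clone, and each founder keeps one of two X chromosomes (hypothetical labels 'A' and 'B') active with a 50:50 chance.

```python
import random
from typing import List


def clonal_strip(n_cells: int, n_founders: int, rng: random.Random) -> List[str]:
    """Toy 1-D tissue: each founder keeps X 'A' or X 'B' active (50:50),
    and its descendants fill one contiguous block of equal size."""
    if n_founders <= 0 or n_cells % n_founders != 0:
        raise ValueError("n_cells must be a positive multiple of n_founders")
    block = n_cells // n_founders
    strip: List[str] = []
    for _ in range(n_founders):
        active_x = rng.choice("AB")       # random choice made once in the founder
        strip.extend([active_x] * block)  # inherited unchanged by all descendants
    return strip


def mean_patch_length(strip: List[str]) -> float:
    """Average length of runs of neighboring cells that share the same active X."""
    if not strip:
        return 0.0
    runs = 1 + sum(1 for left, right in zip(strip, strip[1:]) if left != right)
    return len(strip) / runs


rng = random.Random(42)
for founders in (4, 16, 64):  # fewer founders stands in for fewer pigment cells
    strip = clonal_strip(1024, founders, rng)
    print(founders, "founders: mean patch length", mean_patch_length(strip))
```

Because every patch contains at least one whole clone, the mean patch length can never fall below n_cells / n_founders. Reducing the number of founders therefore raises the floor on patch size, which is the effect described above for white spotting.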

It is possible to use the distribution of patches to 
determine whether particular structures arise from a 
single cell (monoclonal) or many cells (polyclonal) in 
development; e.g., intestinal crypts arise from a single 
cell, so that all cells of a crypt have the same X chromo- 
some active, and intestinal villi arise from more than 
one cell, and hence show variegation. Similarly, 
X chromosome inactivation can be used to show 
whether tumors arise from one or many cells. 
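The logic of this clonality test can be made quantitative. Variegation shows that a structure contains descendants of more than one cell, but a uniform structure is only consistent with a single-cell origin, because several founders might all have happened to inactivate the same X. Assuming each founder chooses independently, keeping a given X active with probability p (0.5 for an unbiased choice, an assumption not stated in the text), that chance is p^n + (1 - p)^n for n founders. The function names below are hypothetical.

```python
from typing import Sequence


def is_variegated(active_x_labels: Sequence[str]) -> bool:
    """True if the structure contains cells with different active X chromosomes."""
    return len(set(active_x_labels)) > 1


def prob_uniform_if_polyclonal(n_founders: int, p: float = 0.5) -> float:
    """Chance that n_founders independent cells all share the same active X,
    so that a structure of polyclonal origin nevertheless looks uniform."""
    return p ** n_founders + (1 - p) ** n_founders


print(is_variegated(["A", "A", "B"]))  # True: more than one founder cell
print(prob_uniform_if_polyclonal(2))   # 0.5
```

With two founders, half of all structures would look uniform by chance, so uniformity becomes convincing evidence of a monoclonal origin only when it is seen consistently across many structures, as with intestinal crypts.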

Occasionally male tortoiseshell cats are found and 
these usually result from a chromosomal anomaly. If 
an animal is chromosomally XXY it is male, due to 
the male-determining effect of the Y chromosome, but 
having two X chromosomes it undergoes X chromo- 
some inactivation, as in an XX female. Thus, if it is 
heterozygous for ginger and nonginger it will show 
tortoiseshell coloring. A similar pattern can also be 
produced by other developmental events that cause 
the animal to be a mixture of two types of cells.
One such event is a somatic mutation, occurring 
early in development, so that some cells carry the 
mutation and others not. Another possibility is 
the accidental or experimental fusion of two early 
embryos to form a single individual. Because the pattern depends on the multiplication and movement of cells after the two cell populations have formed, different types of event, such as mutation or embryo fusion, produce similar patterns. In some species patterns
similar to tortoiseshell can be produced by autosomal 
genes, if these lead to somatic mutation or to epigen- 
etic developmental changes in gene expression, as in 
the tabby cat. 
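The reasoning in this entry, that coat pattern follows from the number of X chromosomes and the alleles they carry while sex follows from the presence of a Y, can be written as a small prediction function. It is a sketch under simplifying assumptions: 'O' and 'o' are labels chosen here for the ginger and nonginger alleles, and somatic mutation, embryo fusion and autosomal modifiers such as white spotting are ignored.

```python
from typing import Sequence, Tuple

# Labels chosen for illustration: "O" = ginger allele, "o" = nonginger allele.
GINGER, NONGINGER = "O", "o"


def cat_phenotype(x_alleles: Sequence[str], has_y: bool) -> Tuple[str, str]:
    """Predict (sex, coat) from the allele on each X chromosome and presence of a Y.

    With two or more X chromosomes, all but one are inactivated at random in each
    cell lineage, so different alleles produce a mosaic (tortoiseshell) coat.
    """
    if not x_alleles:
        raise ValueError("at least one X chromosome is required")
    sex = "male" if has_y else "female"  # the Y chromosome is male-determining
    alleles = set(x_alleles)
    if len(alleles) > 1:
        coat = "tortoiseshell"
    elif alleles == {GINGER}:
        coat = "ginger"
    else:
        coat = "nonginger"
    return sex, coat


print(cat_phenotype([GINGER, NONGINGER], has_y=False))  # XX heterozygote
print(cat_phenotype([GINGER], has_y=True))              # XY male
print(cat_phenotype([GINGER, NONGINGER], has_y=True))   # XXY male
```

The third call reproduces the rare male tortoiseshell: an XXY heterozygote is male because of its Y chromosome but mosaic because of X inactivation.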


See also: Coat Color Mutations, Animals; 
X-Chromosome Inactivation 


Trans, Cis Configurations 


See: Cis-Trans Configurations
Trans-acting factors are regulatory proteins which act
to control gene transcription. They are therefore also 
known as transcription factors. In eukaryotes, these 
factors play a key role in the regulation of gene expres- 
sion by determining which genes are transcribed in a 
particular tissue or in response to a given stimulus. 

To produce their effects trans-acting transcription 
factors will, in general, require the ability to bind 
directly or indirectly to DNA and then to influence 
gene transcription either positively or negatively. Each 
of these aspects will be considered in turn. 


DNA Binding 


Detailed analysis of a number of different transcrip- 
tion factors has indicated that they have a modular 
structure in which specific regions of the molecule 
are responsible for binding to the DNA while other 
regions produce a stimulatory or inhibitory effect on 
transcription. Studies on the DNA binding regions of 
different transcription factors have revealed several 
distinct structural elements which can produce DNA 
binding. 

Indeed transcription factors are frequently classi- 
fied on the basis of their DNA binding domains. 
Well-characterized DNA binding domains include the helix-turn-helix motif found in the homeobox transcription factors, the two-cysteine two-histidine zinc finger found, for example, in the Sp transcription factor family, the multicysteine zinc finger found in the steroid-thyroid hormone receptor family, the Ets domain, and the basic DNA binding domain.

This last example is of particular interest since fac- 
tors containing the basic DNA binding domain can 
only bind to DNA once they have formed transcrip- 
tion factor dimers. Hence, factors containing the basic 
binding domain are further subgrouped according 
to the nature of the dimerization motif which they 
contain. Thus some of these factors contain a helix-loop-helix motif which mediates dimerization while
others undergo dimerization via the so-called leucine 
zipper motif which contains a regular array of leucine 
residues. 

Thus a wide variety of DNA binding domains 
(which in some cases have associated dimerization 
domains) allow trans-acting transcription factors to 
bind to their appropriate DNA sequences within tar- 
get genes. 


Activation of Transcription 


Many transcription factors contain, in addition to the 
DNA binding domain, specific regions that are neces- 
sary for the activation of transcription. Such regions 
were identified on the basis of their ability to stimulate 
transcription when linked to the DNA binding 
domain of a completely unrelated factor. These 
regions are known as activation domains. As with 
DNA binding domains, a number of distinct types of 
activation domain have been identified. They are classified according to whether they are rich in acidic amino acids, in glutamine residues, or in proline residues.

These activation domains appear to function by 
interacting with components of the basal transcrip- 
tional complex. This is a complex of RNA polymerase 
II and various transcription factors such as TFIIB and 
TFIID which assembles at the gene promoter and 
is essential for transcription to occur. Activation 
domains have been shown to interact either directly 
with specific components of this complex or indirectly 
by interacting with so-called coactivator molecules 
which then interact with the basal complex itself. 
Whatever the case, such interactions appear to enhance transcription, either by increasing the rate at which the basal transcriptional complex assembles or by increasing its activity once assembled.

Hence, following binding to their appropriate 
DNA binding site via the DNA binding domain, the 


activation domains of specific activating transcription 
factors can interact with the basal transcriptional com- 
plex so as to stimulate transcription. In this manner 
the binding of specific transcription factors can stimu- 
late gene transcription. 


Repression of Transcription 


Although it was originally thought that most 
eukaryotic transcription factors acted by stimulating 
transcription, it has now become clear that a wide 
variety of factors act by inhibiting the transcription 
of specific genes and that such inhibitory transcription 
factors may be at least as important as stimulatory 
factors. 

The earliest examples of such inhibitory transcrip- 
tion factors were shown to act by interfering with the 
activity of a positively acting factor thereby blocking 
its stimulatory effect on transcription. This could be 
achieved, for example, by preventing the positively 
acting factor from binding to DNA either via the 
negatively acting factor binding to its DNA bind- 
ing site or by the formation of a non-DNA binding 
protein-protein complex between the positively act- 
ing factor and the negatively acting factor. Alter- 
natively, the negatively acting factor could act by 
interacting with the positively acting factor to block 
the activity of its activation domain in a phenomenon 
known as ‘quenching.’ 

It has now become clear, however, that a class 
of inhibitory transcription factors exists which can 
directly inhibit transcription even in the absence of a 
positively acting factor. These factors can thus reduce 
the basal level of transcription below that observed 
even in the absence of any activating molecule and 
appear to function by interacting either directly or 
indirectly with the basal transcriptional complex so 
as to reduce its activity. They thus constitute the 
antithesis of the activating molecules discussed in the 
previous section and possess defined inhibitory 
domains which are responsible for their effects and 
which, like activation domains, can function when 
transferred to the DNA binding domain of another 
molecule. 

Hence the balance between binding of transcrip- 
tional activators and transcriptional repressors to the 
regulatory region of a particular gene will determine 
its rate of transcription in any particular situation. 
Clearly, however, in order for a particular gene to
respond to specific signals or to be regulated in a cell 
type specific manner, the balance between these ac- 
tivating and repressing molecules must change in dif- 
ferent situations. The mechanisms which are used to 
regulate transcription factor activity are discussed in 
the next section. 
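The modular picture described above, with interchangeable DNA binding and effector domains and a balance between activators and repressors, can be illustrated with a toy model. Everything here is hypothetical: the factor names, the site labels and the assumption that each bound factor multiplies the basal rate by a fixed amount are simplifications chosen for illustration, not measured behavior.

```python
from dataclasses import dataclass
from typing import Iterable, Set


@dataclass(frozen=True)
class Factor:
    """Toy modular transcription factor: a DNA binding domain plus an effector domain."""
    name: str
    binds_site: str  # motif recognized by the DNA binding domain (hypothetical label)
    effect: float    # >1 activation domain, <1 inhibitory domain, 1.0 no effector


def transcription_rate(basal: float, promoter_sites: Set[str], factors: Iterable[Factor]) -> float:
    """Multiply the basal rate by the effect of every factor whose site is present."""
    rate = basal
    for factor in factors:
        if factor.binds_site in promoter_sites:  # only bound factors act
            rate *= factor.effect
    return rate


def swap_domains(dna_binder: Factor, effector_donor: Factor) -> Factor:
    """Fuse the binding specificity of one factor to the effector domain of another."""
    return Factor(f"{dna_binder.name}-{effector_donor.name}",
                  dna_binder.binds_site, effector_donor.effect)


activator = Factor("ActX", "siteA", 5.0)   # activation domain, binds siteA
binder = Factor("BindY", "siteB", 1.0)     # binds siteB, no effector domain
repressor = Factor("RepZ", "siteC", 0.2)   # inhibitory domain, binds siteC

print(transcription_rate(1.0, {"siteB"}, [activator, binder]))  # 1.0: activator cannot bind
fusion = swap_domains(binder, activator)
print(transcription_rate(1.0, {"siteB"}, [fusion]))             # 5.0: fused activation domain works
print(transcription_rate(1.0, {"siteC"}, [repressor]))          # 0.2: below the basal level
```

The fusion mirrors the domain-swap experiments used to define activation and inhibitory domains, and the repressor case shows transcription falling below the basal rate even with no activator present.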


Regulation of Transcription Factors 


Transcription factors can be regulated at two levels, 
namely, the regulation of transcription factor syn- 
thesis and the regulation of transcription factor 
activity. 


Regulation of Synthesis 

In a number of different situations a transcription 
factor is regulated by being synthesized in one par- 
ticular tissue or cell type and not in other tissues. The 
most dramatic example of this concerns the MyoD transcription factor, which is synthesized only in skeletal muscle cells. In this case, overexpression of the MyoD factor in undifferentiated fibroblast cells is sufficient to convert them to skeletal muscle cells, indicating that this factor is critical in the induction of muscle-specific gene expression.


Regulation of Transcription Factor Activity 
Although the regulation of transcription factor synthesis is an important control point, it cannot be the only mechanism that controls transcription factor activity. If it were, the enhanced synthesis of a transcription factor in response to a particular stimulus would require enhanced transcription of its corresponding gene, which in turn would require the de novo synthesis of further transcription factors, requiring new transcription of their genes, and so on indefinitely. An additional mechanism is therefore needed that allows de novo gene transcription through the activation of pre-existing transcription factors.

Such activation of pre-existing transcription factors can occur via a number of different mechanisms, which can involve ligand binding, alterations in protein-protein interactions, and transcription factor phosphorylation. For example, in the case of the steroid receptors, the inactive receptor is associated with the inhibitory heat-shock protein hsp90. Following binding of the steroid hormone ligand, hsp90 dissociates and the receptor moves to the nucleus, where it can bind to its appropriate response element and switch on transcription.

Regulation of transcription factor activity by phos- 
phorylation is seen in the case of the CREB factor 
which binds to the cyclic AMP response element 
(CRE) and plays a critical role in the regulation of 
transcription in response to cyclic AMP. Following treatment with cyclic AMP, the CREB factor becomes phosphorylated on a particular serine residue. This phosphorylation allows CREB to bind another protein, CBP, which does not bind to unphosphorylated CREB.

This CBP factor appears to play a critical role in the activation of transcription. It is able to bind to specific components of the basal transcriptional complex, thereby linking CREB to this complex and allowing stimulation of its activity following cyclic AMP treatment. In addition, however, CBP has been shown to possess histone acetyltransferase activity. Enhanced histone acetylation has been shown to occur in regions of DNA that are active or potentially active in transcription and to be involved in the open chromatin structure characteristic of such regions. It is therefore possible that the binding of CBP to CREB recruits CBP to the DNA and allows it to produce changes in the chromatin structure which lead to enhanced transcription.

Hence, in a specific cell type or in response to a 
specific stimulus, specific transcription factors are 
either synthesized or become activated following 
posttranslational modification. The binding of these 
transcription factors to their appropriate recognition 
sequences thus produces specific patterns of gene 
transcription in specific cell types or in response to 
specific stimuli. 
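The two routes for activating pre-existing factors described in this section can be sketched as simple state objects. This is an assumption-laden illustration with hypothetical class and method names; it captures only the on/off logic in the text: hormone binding releases hsp90 from a steroid receptor, and cyclic AMP-induced phosphorylation lets CREB recruit CBP, and neither step needs new protein synthesis.

```python
from dataclasses import dataclass


@dataclass
class SteroidReceptor:
    """Toy state: hsp90 keeps the receptor inactive until hormone binds."""
    hormone_bound: bool = False

    def hsp90_attached(self) -> bool:
        return not self.hormone_bound

    def activates_transcription(self) -> bool:
        # Once hsp90 has dissociated, the receptor can reach its response element.
        return not self.hsp90_attached()


@dataclass
class Creb:
    """Toy state: CBP binds CREB only after phosphorylation."""
    phosphorylated: bool = False

    def recruits_cbp(self) -> bool:
        return self.phosphorylated


def add_camp(creb: Creb) -> None:
    """Cyclic AMP treatment modifies the existing CREB molecule in place."""
    creb.phosphorylated = True


receptor = SteroidReceptor()
receptor.hormone_bound = True            # ligand binding, no new synthesis
print(receptor.activates_transcription())

creb = Creb()
add_camp(creb)                           # phosphorylation of pre-existing CREB
print(creb.recruits_cbp())
```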


A transcribed spacer is part of an rRNA transcriptional unit that is transcribed but subsequently discarded during maturation of the precursor RNA. It does not form part of any mature rRNA.


See also: Nontranscribed Spacer; Transcription 
Transcription Decodes the Genetic
Information and Is a Complex Process 


Transcription is the process of physically decoding the 
genetic information into an RNA that can be used by 
the cell, either directly or as a template for protein
synthesis. RNA polymerase is the enzyme that carries 
out the synthesis of RNA from a nucleic acid tem- 
plate, and the complexity of the enzyme’s polypeptide 
composition depends on the organism. Likewise the complexity of the cellular machinery that regulates RNA synthesis depends on the organism, and on the need to synthesize particular RNAs at particular times, in particular locations and at particular levels. Transcription is but one of many processes that influence the
overall regulation of gene expression in the cell, and 
cells and their viruses have developed multiple strat- 
egies to modulate the transcription machinery. In 
addition to protein components that have an impact 
on the behavior of RNA polymerase, the template and 
nascent transcript can also be involved in regulation, 
through nucleic acid sequences and structures and/or 
by virtue of the association of the DNA and the RNA 
with other proteins. 

In developing a mechanistic model for transcrip- 
tion regulation, experimentalists have used both 
genetics and biochemistry. More recently, these models have been informed by the availability of three-dimensional structures for RNA polymerases from prokaryotes and eukaryotes (Figure 1). However,
transcription is a cyclic process, and in addition to 
regulating the actual catalytic properties of RNA 
polymerase, there is also regulation in finding the 
proper gene for transcription (promoter recognition), 
in initiating the synthesis of the RNA chain (both 
promoter escape and promoter clearance), in transi- 
tioning into a stable transcript elongation process and 
modulating elongation across a transcription unit, and 
in terminating the synthesis of the transcript at the 
proper location (Figure 2). The polymerase is then 
available to start the cycle again. For transcripts 


synthesized by prokaryotic RNA polymerases, there 
is coordination between synthesis and translation of 
the messenger RNA. In eukaryotes, there is co- 
ordination between synthesis and a variety of proces- 
sing reactions for the transcript: capping (for RNA 
polymerase II), splicing, and 3’ end maturation. Tran- 
scription also must be coordinated with DNA meta- 
bolic processes such as replication, recombination, 
and repair. 
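The cycle shown in Figure 2 can be sketched as a small state machine. This is a minimal illustration with hypothetical stage names taken from the description above; it treats every step as a single transition and ignores the regulation that, as described above, can act at each one.

```python
from enum import Enum, auto
from typing import Dict, List


class Stage(Enum):
    FREE = auto()                  # polymerase not engaged with a gene
    PROMOTER_RECOGNITION = auto()  # finding the proper gene
    INITIATION = auto()            # starting synthesis of the RNA chain
    PROMOTER_ESCAPE = auto()       # promoter escape and promoter clearance
    ELONGATION = auto()            # stable transcript elongation
    TERMINATION = auto()           # release at a termination site (T in Figure 2)


NEXT: Dict[Stage, Stage] = {
    Stage.FREE: Stage.PROMOTER_RECOGNITION,
    Stage.PROMOTER_RECOGNITION: Stage.INITIATION,
    Stage.INITIATION: Stage.PROMOTER_ESCAPE,
    Stage.PROMOTER_ESCAPE: Stage.ELONGATION,
    Stage.ELONGATION: Stage.TERMINATION,
    Stage.TERMINATION: Stage.FREE,  # the polymerase is available to start again
}


def run_cycle(start: Stage = Stage.FREE) -> List[Stage]:
    """Walk once around the cycle, listing each stage visited and the return to start."""
    stages = [start]
    current = NEXT[start]
    while current is not start:
        stages.append(current)
        current = NEXT[current]
    stages.append(current)
    return stages


print(" -> ".join(stage.name for stage in run_cycle()))
```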


Transcription Templates 


Transcription is most commonly discussed in the con- 
text of a double-stranded DNA template being copied 
into RNA by a DNA-dependent RNA polymerase. 
The majority of this entry will focus on that category 
of transcription. However, there are a number of 
viruses that utilize RNA as a template for transcrip- 
tion carried out by RNA-dependent RNA poly- 
merases. There are many mechanistic similarities 
between the DNA- and RNA-templated reactions, 
but the RNA-dependent RNA polymerases often 
contain a subunit with sequences closely related to 
reverse transcriptases (enzymes that copy RNA into DNA). In addition, much less is known about how
these latter enzymes recognize specific genes and the 
proper sites at which to begin transcription. Next to 
nothing is known about their regulation during elonga- 
tion, or about the process of termination. 

In neither prokaryotic nor eukaryotic cells is the 
template highly purified nucleic acid. In eukaryotes, 
positively charged histone and nonhistone chromo- 
somal proteins are associated with the template to 
create the chromatin that is found in the nucleus, and 
these proteins participate in structuring the chromo- 
somes as well as in the regulation of both DNA and 
Figure 1 (See Plate 41) Crystal structures have been determined for (A) T7 RNA polymerase (2.4 Å, 1QLN; Cheetham and Steitz, 1999; image from Brookhaven Protein Database); (B) Thermus aquaticus core RNA polymerase (3.3 Å, 1DDQ; Zhang et al., 1999; image from Brookhaven Protein Database); and (C) Saccharomyces cerevisiae RNA polymerase II (3.0 Å, 1EN0; Cramer et al., 2000; image courtesy of Patrick Cramer and David Bushnell).
Figure 2 Transcription cycle. Representation of RNA
polymerase (RP) during the process of transcription. 
The dark parallel lines represent the DNA template. 
The thin line represents the nascent transcript. T refers 
to a termination site on the DNA template. Details of 
the transcription cycle are given in the text. 


RNA metabolism (see Chromosome). In prokaryotes, 
there are also basic proteins associated with the DNA 
to create a chromosome, but the structure is distinct 
from that in eukaryotes. There are no histones, 
although there are histone-like proteins. The pro- 
karyotic chromosomes are condensed into a nucleoid 
structure, and are associated with the inner cell mem- 
brane. Mutations in the histone-like proteins have a 
significant impact on transcription, and thus, as in 
eukaryotes, the naturally occurring template should 
not be considered as simply free DNA. 


RNA Polymerases 


The DNA-dependent RNA polymerases range in 
subunit complexity from those of the single subunit 
polymerases of bacteriophages (bacterial viruses) to 
the 12-15 subunit complexes associated into an active 
enzyme necessary for catalysis in the eukaryotic 
nucleus (Table 1). Many phages utilize the host cell’s 
polymerase as well as their own encoded RNA poly- 
merase, as is the case with phages (e.g., T7) that encode 
the smaller single subunit RNA polymerases. Some of 
these polymerases have become major reagents in bio- 
technology because they are now relatively easy to 
overproduce in bacterial cells, they need no additional 
protein cofactors to recognize a promoter region, and
they synthesize transcripts at least twice as fast as the 
bacterial host RNA polymerases. Curiously, the single 
subunit bacteriophage RNA polymerase itself can 
recognize the correct gene and start at the correct 
site on purified DNA, whereas the multisubunit 
eukaryotic nuclear polymerases require a very large 
number of accessory protein factors in order to locate 
the proper sites on the template for transcription. 
Bacterial RNA polymerases contain four subunits of 
three nonidentical proteins in the catalytic core of the 
enzyme, and this core complex requires one additional 
accessory factor (called sigma) that both enables the 
recognition of specific sequences that promote tran- 
scription and reduces the binding to nonpromoter 
sites on the DNA. There are several different sigma 
factors in each species of bacteria; one in each species 
is considered to be the primary factor for most trans- 
cription, whereas the others are utilized for special 
regulated events such as the stress response, motility, 
differentiation, and spore formation. 

The genomes of some bacteriophages, such as 
Bacillus subtilis phage SPO1 and coliphage T4, encode
their own sigma factors that redirect the bacterial 
RNA polymerase catalytic core for transcription of 
the viral DNA templates. The catalytic cores of the 
bacterial enzymes remain the same for all this tran- 
scription. Also, many bacteriophages utilize other 
virally encoded and host-encoded regulatory proteins 
to modify subsequent steps in the transcription cycle 
after promoter recognition. 

Bacteriophage N4 uses both the host polymerase 
and two other polymerases encoded by its own genome 
(Table 1). One of the viral polymerases is a very 
large single subunit enzyme that, in contrast to the 
smaller single subunit bacteriophage-encoded poly- 
merases, needs a host cofactor, ssb, a single-stranded 
DNA-binding protein. The ssb protein allows this 
320 kDa polymerase to recognize an unusual hairpin 
structure in the viral DNA. This allows synthesis of 
further viral RNAs, resulting in replication and assem- 
bly of new viral particles. 

Eukaryotic viruses, like their prokaryotic counter- 
parts, use the entire spectrum of transcription strat- 
egies, from encoding their own polymerases (e.g., 
vaccinia, baculovirus) to using the host-cell machinery. 
Several eukaryotic viruses encode RNA-dependent 
RNA polymerases as well (e.g., hepatitis C virus, 
poliovirus). Virally encoded protein factors also modu- 
late transcription in the host cell at many positions in 
the transcription cycle. 

Eukaryotic RNA polymerases can be found in the 
nucleus and in cellular organelles. The polymerase 
found in mitochondria and one polymerase found 
in chloroplasts have similarity to the small, single 
Table 1 DNA-dependent RNA polymerase subunit variation

Source | Number of subunits in core | Sizes (kDa)
Bacteriophage T3, T7, SP6 (a) | 1 | ~100
Bacteriophage N4 (a, b) | 1 | 320
 | 3 | 40, 30, 15
Eubacteria, e.g., Escherichia coli | 4 | 150, 145, 45
Archaebacteria, e.g., Sulfolobus acidocaldarius | 13 | 122, 101, 44, 30, 27, 13.8, 12, 11.8, 10, 9.7, 9.7, 7.5, 5.5
Saccharomyces cerevisiae RNAP I | 14 | 190, 135, 49, 43, 40, 34.5, 27, 23, 19, 14.5, 14, 12.2, 10, 10
Saccharomyces cerevisiae RNAP II | 12 | 191, 140, 35, 25.4, 25, 19, 17.9, 16.5, 14.2, 13, 8.3, 7.7
Saccharomyces cerevisiae RNAP III | 13-17 | 160, 128, 82, 53, 40, 37, 34, 31, 27, 25, 23, 19, 17, 14.5, 11, 10, 10
Human and yeast mitochondria | 1 | 140
Spinach and mustard chloroplast (c) | 1 | 110
 | 4 | 154, 120, 78, 38
 | ~13 | 141, 110, 36, 26 and others not yet fully characterized
Vaccinia virus | 8 | 147, 132, 35, 30, 22, 19, 18, 7
Baculovirus (a) | 4 | 98, 55, 53, 46

Examples are given of several classes of DNA-dependent RNA polymerases.
(a) Indicates that the virus uses the host RNA polymerase as well as its own encoded RNA polymerase for temporal regulation of transcription.
(b) Indicates that the virus encodes more than one RNA polymerase.
(c) Indicates that more than one RNA polymerase is used.


subunit bacteriophage RNA polymerases; chloro- 
plasts have two other RNA polymerases, one of 
which is very similar to the bacterial enzyme (Table 
1). The nuclear RNA polymerases come in three dis- 
tinct complexes isolated biochemically, and they were 
originally categorized by the class of genes each tran- 
scribes. More recently they have been defined by the 
subunits and accessory protein factors that act on 
them to regulate RNA synthesis. Termed RNA polymerases I, II, and III (or A, B, and C, respectively), each has a dozen or more subunits (Table 1). RNA polymerase I transcribes the genes that encode the structural RNAs for the subunits of the ribosome. RNA
polymerase II transcribes the genes that encode pro- 
teins as well as a subset of small RNAs. RNA poly- 
merase III transcribes the genes encoding ribosomal 
5S RNA, tRNAs, and a subset of other small RNAs. 
In Saccharomyces cerevisiae, where the subunits
have all been cloned and characterized by sequence, 
five subunits are shared by all three polymerase com- 
plexes. In addition, there is sequence similarity among 
four other subunits that are unique to each of the three 
polymerases. Three of the four subunits that are related 
by sequence similarities are also related in sequence to 
the three different subunits in the bacterial catalytic 
core RNA polymerase. Biochemical and genetic stud- 
ies suggest that these three subunits in the eukaryotic 
polymerases are probably functionally homologous to 
the bacterial subunits as well, and thus they are likely to form the catalytic core of the enzymes. However, none of
the eukaryotic nuclear polymerases has been reconsti- 
tuted in active form from individually isolated sub- 
units, and thus the function of the individual subunits 
can at best only be inferred from the genetics and from 
biochemical analysis of mutant polymerases. 
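Summing the subunit sizes listed in Table 1 gives a rough estimate of each enzyme's total mass and makes the contrast between single-subunit and multisubunit polymerases concrete. This sketch assumes one copy of each listed subunit. That is not true for E. coli, whose four-subunit core contains only three distinct proteins, so it is left out, as are entries with uncharacterized subunits. The T7 value is the approximate ~100 kDa figure from the table.

```python
from typing import Dict, List

# Subunit sizes (kDa) copied from Table 1; one copy of each subunit is assumed.
SUBUNIT_SIZES_KDA: Dict[str, List[float]] = {
    "Bacteriophage T7": [100.0],  # listed as ~100
    "S. cerevisiae RNAP II": [191, 140, 35, 25.4, 25, 19, 17.9, 16.5, 14.2, 13, 8.3, 7.7],
    "Vaccinia virus": [147, 132, 35, 30, 22, 19, 18, 7],
    "Baculovirus": [98, 55, 53, 46],
}


def total_mass_kda(sizes: List[float]) -> float:
    """Approximate mass of a complex as the sum of its listed subunit sizes."""
    return sum(sizes)


for source, sizes in SUBUNIT_SIZES_KDA.items():
    print(f"{source}: {len(sizes)} subunits, about {total_mass_kda(sizes):.0f} kDa")
```

On this assumption the twelve listed subunits of yeast RNA polymerase II sum to about 513 kDa, roughly five times the mass of the single-subunit phage enzyme.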

These studies, coupled with the recently published 
crystal structures of T7 RNA polymerase, yeast RNA 
polymerase II, and a thermophilic bacterial RNA 
polymerase, have provided great insight into function 
and predictions of function for individual subunits. 
There are some overall similarities but also significant 
differences among the enzymes, especially in the 
potential route for the template DNA and the nascent 
transcript within ternary complexes. The structures 
of the purified polymerases also differ significantly 
from the structure of cocrystals with nucleic acids, 
and there are major conformational changes in going 
from free enzyme to transcriptionally engaged enzyme. 
A number of mechanistic models have been built based 
on the structures of the purified enzymes, assimilating 
both mutational information and biochemical cross- 
linking of the DNA and RNA with polymerase 
subunits. In addition, there are lower resolution struc- 
tures of elongation complexes for both yeast RNA 
polymerase II and E. coli RNA polymerase that inform 
the models. However, not all experimental results are 
consistent with the models that predict the position of 
the nucleic acids; the large conformational differences
between the free enzyme and enzyme associated with
DNA or with DNA and RNA probably explain these 
discrepancies. Nonetheless, the models are extremely 
useful in making predictions for further analyzing the 
molecular mechanisms during transcription. 

In both prokaryotes and eukaryotes, the two larg- 
est subunits form the catalytic center of the polymer- 
ase. The individual nucleotide precursors cross-link to 
each of these subunits, as do the template DNA and 
the nascent RNA transcript. These subunits are also 
contacted by proteins that are regulatory for initiation 
and for elongation, and they are important in recognition
of proper termination sequences. The third subunit
of the bacterial core RNA polymerase, α, is
important for proper structural integrity of the polymerase,
and it also contacts DNA and mediates signals
from activator and repressor proteins when the poly- 
merase is bound at the promoter. In eukaryotes, the 
α-subunit equivalent has also been shown to be
important for the structural integrity of RNA poly- 
merase II in yeast, although there is no information as 
yet about its contacts or function with other regula- 
tory proteins. 

Two of the smaller subunits of yeast RNA poly- 
merase II, 4 and 7, are important in mediating the 
stress response in cells, and in vitro they are essential
for efficient promoter recognition and initiation. Subunit
7 is encoded by an essential gene, whereas a deletion
of the gene encoding subunit 4 renders the cells
temperature-sensitive and results in a weakened association
of subunit 7 with the polymerase. The association
is so weak that both subunits 4 and 7 are
missing from polymerase biochemically isolated from
cells in which only the gene encoding subunit 4 has
been deleted.

No function has been ascribed to most of the sub- 
units that are common to all three yeast nuclear RNA 
polymerases (subunits 5, 6, 8, 10, and 12), although 
subunit 6 exhibits genetic interactions with a factor 
involved in regulating transcript elongation by RNA 
polymerase II and also is important for the assembly 
and structural integrity of RNA polymerases I and II. 
Each polymerase also has a small subunit that is 
related but nonidentical in sequence, subunits A12.2, 
B12.6, and C11, in RNA polymerases I, II, and III,
respectively. This subunit has an inferred role in tran- 
script elongation because it mediates an RNA hydro- 
lysis reaction carried out by the polymerase when 
elongation stalls during transcription. 

Archaebacterial DNA-dependent RNA poly- 
merases are also multisubunit complexes, and they 
resemble eukaryotic polymerases more than prokary- 
otic polymerases. However, they require fewer 
accessory protein factors for promoter recognition, 
and they also contain subunits that are related in 
sequence to a transcript elongation factor (TFIIS) for
yeast RNA polymerase II. 


Transcription Cycle 


The cell utilizes the many reactions and interactions 
during transcription to provide exquisite temporal, 
spatial, and quantitative control over the synthesis 
of RNA. At any point in the transcription cycle 
(Figure 2), a step may be rate-limiting and thus a 
target for control. The step that is rate-limiting may 
also change depending on the needs of the cell. Experi- 
ments looking at transcription tend to focus on one 
specific aspect or another, or simply on whether the 
levels of transcript go up or down in the presence of 
various treatments or in different types of mutant 
cells. Detailing the mechanistic steps in transcription 
assists in appreciating and understanding the com- 
plexity and power of the multiple layers of regulation. 


Finding the Proper Gene to Transcribe: 
Promoter Recognition and Preinitiation 
Complex Formation 
In a complex genome, how does RNA polymerase 
find the location at which RNA synthesis needs to 
begin for a particular gene? This search-and-locate 
mission is referred to as ‘promoter recognition’ 
because the DNA sequences ‘promote’ specific
transcription. As mentioned above, the single-subunit
bacteriophage RNA polymerases can find the location 
and the transcription start site for specific genes with- 
out the assistance of any other proteins. Nearly all 
other polymerases require one or more accessory fac- 
tors to effect promoter recognition. Bacterial RNA 
polymerases need the accessory factor referred to as 
sigma to locate the desired promoter and to position 
the polymerase at the correct start site. Eukaryotic 
nuclear RNA polymerases need a large constellation 
of accessory proteins that work in sequence and in 
concert to recruit the polymerase to the proper loca- 
tion and position it for accurate initiation; this macro- 
molecular complex bound to the DNA at the promoter 
has been referred to as the ‘preinitiation complex.’ 
When the polymerase binds to the double-stranded 
DNA promoter region, it forms what has been termed 
a ‘closed complex,’ because the DNA remains in a 
regular, closed duplex. For bacterial RNA polymer- 
ase, conformational isomerization through several 
physically distinct closed complexes has been observed, 
and these isomerizations precede the opening of the 
DNA in the promoter region to expose the single- 
stranded DNA in an ‘open complex’ that can now be 
‘read’ by RNA polymerase. The open complex can be 
distinguished from the closed complex by physical 
and biochemical properties. Open complex formation 
by bacteriophage and bacterial RNA polymerases
occurs in vitro without any infusion of energy. How- 
ever, the eukaryotic preinitiation complex requires the 
energy of ATP hydrolysis to open the DNA. In the 
open complex the polymerase is poised to receive 
nucleoside triphosphate substrates and begin RNA 
synthesis. A very large amount of published work 
has gone into examining the regulation of preinitiation 
complex formation, and closed-to-open complex for- 
mation. There are a number of rate-limiting steps in 
both the binding and isomerization reactions as well as 
in the association and dissociation of regulatory fac- 
tors. The detailed kinetics of the rate-limiting steps 
have been analyzed in vitro, and genetic analysis has
been invaluable in identification of regulatory factors 
and specific nucleic acid sequences essential in the 
regulation of promoter binding and preinitiation com- 
plex formation. There is still much to learn. 


Regulation of Promoter Recognition and 
Preinitiation Complex Formation 

Multiple sequence-specific DNA-binding proteins 
regulate promoter recognition and preinitiation 
complex formation. Other protein factors that do 
not bind DNA regulate these processes through 
protein-protein interactions. Hundreds of proteins 
in prokaryotes and eukaryotes (such as activators, 
repressors, and insulators, as well as enhancer-binding 
proteins and general transcription factors (GTFs))
have been catalogued based on their presence or 
absence in this phase of the transcription cycle. 
There are many ways in which activators, repressors, 
enhancers, and insulators work and, in large part, the 
regulation focuses upon a specific mechanistic step 
that is rate-limiting. In addition, the activity of these 
factors may be modulated by posttranslational modi- 
fications such as phosphorylation, glycosylation, 
acetylation, or methylation, as well as by the chromosomal
environment and chromatin structure surrounding
the promoter. Moreover, many of these
regulatory proteins also have effects on initiation and 
elongation reactions. Thus, studies of each step in 
transcription in isolation can reveal the individual 
important details of the mechanism, but understand- 
ing the overall regulation requires a broader view of 
the transcription cycle. 


Initiating RNA Synthesis 

In the past, initiation of transcription was defined as 
the formation of the first phosphodiester bond, or 
dinucleotide synthesis. In this view, elongation started 
with the third nucleotide incorporated into a tran- 
script. However, increased understanding of the dif- 
ferent stages in the transcription cycle has redefined 
initiation to include events that continue into the
initial transcribed region. The ‘initial transcribed complexes’
are those engaged in the initiation process that
have not yet entered elongation. The formation of a 
fully processive elongation complex follows the com- 
pletion of the initiation reaction and release of the 
accessory initiation factors, and promoter escape has 
become the operational term for the process that 
occurs prior to the establishment of the stable elonga- 
tion complex. 


Promoter escape 

The ‘switch’ that dictates the release of the initiation 
factors from the polymerase and permits its transition 
into productive elongation is not known. ‘Abortive 
initiation’ is characterized by the repetitive synthesis 
of small oligomeric transcripts near the start site of 
transcription, and these abortive products are released 
from the transcribing complex without release of the 
polymerase or initiation factors. Rather, the polymer- 
ase starts again at the correct initiation site and begins 
synthesis anew. Sequences in the promoter region and 
in the not yet transcribed downstream region can 
influence the length of the abortive products and the 
efficiency of their production. In contrast to the pre- 
dictions of some models for transcription, the length 
and efficiency of production of the abortive products 
are not correlated with the predicted thermodynamic
stabilities of potential RNA-DNA hybrids in the 
initial transcription complexes. Nonetheless, promoter 
escape is clearly rate-limiting for some promoters and 
not for others. 
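Abortive cycling can be pictured as a loop in which the polymerase repeatedly restarts at the same start site until one attempt grows long enough to escape. The Python sketch below is a toy model only: the function name `abortive_initiation`, the single uniform per-nucleotide release probability `abort_prob`, and the fixed `escape_length` are hypothetical simplifications, whereas, as described above, real abortive product lengths and efficiencies depend on the promoter and early transcribed sequences.

```python
import random


def abortive_initiation(escape_length, abort_prob, rng, max_cycles=1000):
    """Toy model of abortive cycling before promoter escape (hypothetical).

    Each cycle restarts synthesis at +1 and extends the RNA one nucleotide
    at a time. Until the RNA reaches escape_length, each new oligonucleotide
    (2 nt or longer) is released with probability abort_prob, and synthesis
    then begins anew at the same start site without the polymerase leaving.
    Returns (list of abortive product lengths, True if the complex escaped).
    """
    products = []
    for _ in range(max_cycles):
        for length in range(2, escape_length):
            if rng.random() < abort_prob:
                products.append(length)  # short RNA released; polymerase stays at promoter
                break
        else:
            # No release in this cycle: the RNA reached escape_length.
            return products, True
    return products, False
```

For example, `abortive_initiation(12, 0.1, random.Random(0))` returns the lengths of the abortive products made before escape; the value 12 is only an illustrative choice within the range at which bacterial sigma factor is released, and raising `abort_prob` increases the number of abortive cycles.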

What events characterize promoter escape? For 
bacterial RNA polymerase, the release of the sigma 
factor marks the end of the initiation process. The 
sigma factor is released when transcripts are between 
9 and 16 nucleotides long, and the length depends on 
the promoter and the early transcribed sequence. For 
eukaryotic RNA polymerases, no discrete event marking
the end of initiation has yet been identified, but the
transition occurs within a broader range, when the 
transcript is between 8 and 40 nucleotides long. Sev- 
eral accessory protein factors influence the transition, 
and regulatory proteins also have an impact on the 
promoter escape process. For RNA polymerase II, 
the phosphorylation of the largest subunit is cor- 
related with promoter escape, and several different 
protein kinases may be involved in this transition. 
However, the specific phosphorylation sites are not 
known. 


Promoter clearance 

The term ‘promoter clearance’ refers to the point in 
the reaction when the polymerase and associated 
accessory factors have moved away from the sequences
necessary for promoter recognition and
preinitiation complex formation. Thus, the ‘cleared’
promoter becomes accessible to another RNA polymerase
molecule with its appropriate attendant proteins.
Escape can occur without clearance, as happens
for heat shock genes in Drosophila. Thus, stalling near
the start site of transcription can have a variety of
effects, including promoter occlusion, although it can
also poise the polymerase to respond rapidly to cellular
needs that are essential for the cell’s survival.


Regulation of the initiation reaction 

The transition from initiation to elongation requires 
the establishment of a ‘ternary elongation complex,’ 
which physically includes the RNA polymerase, the 
template, and the nascent transcript in a stable complex
that is able to continue incorporating nucleotides
as it moves progressively along the template. In addition to cis-acting promoter
sequences and early transcribed sequences, there 
are trans-acting protein factors that influence the 
ability of the polymerase to move from initiation 
to elongation. The sigma factors and the GreA 
and GreB factors in bacteria can have an impact on 
the position and extent of abortive initiation, and 
the location at which the polymerase enters productive
elongation. In more specific examples, the ‘catabolite
regulatory protein’ affects promoter escape in
the E. coli maltose operon, and UTP concentration
regulates promoter escape in the pyrimidine biosynthetic
operon.

For eukaryotic RNA polymerase II, the general 
transcription factors TFIIE and TFIIH are very influ- 
ential in the transition from preinitiation complex 
formation to elongation. The kinase activity of 
TFIIH acts upon RNA polymerase II, changing its
phosphorylation state, but the helicase activities of 
this factor are also critical in this transition. There 
are also situations, as with the heat shock genes of 
Drosophila and yeast, where the polymerase does not 
enter productive elongation until gene-specific regu- 
latory factors have an impact on the initial transcrib- 
ing complex. Clearly, initiation is a series of sequential 
steps that are regulated in the cell. 


Elongation

Table 2 Options for RNA polymerase during elongation

Elongate    Continue incorporation of nucleotide substrates
Pause       Halt during elongation, but resume nucleotide incorporation
            in a finite period of time without the need for an accessory
            factor. Accessory factors can influence the efficiency or
            duration of the pause
Arrest      Halt during elongation, but cannot resume synthesis without
            the assistance of an accessory protein factor. The catalytic
            center of the polymerase and the 3’ end of the transcript
            physically separate, creating a requirement for transcript
            hydrolysis to generate a new 3’ end in the catalytic center
Terminate   Halt during elongation and release the transcript and the
            template; a terminator can be factor-independent or
            factor-dependent

Once the action proximal to the promoter has been
resolved, the polymerase (and occasionally some
associated proteins) can continue with its catalytic
function, the synthesis of the RNA. However, elongating
the nascent transcript is not a monotonic
process of fixed rate. Rather, a large amount of cellular
regulation influences the progress of the polymerase
across a transcription unit. A variety of
blocks to elongation impede the polymerase as it
makes its way across a transcription unit (see Table 1
in Lin and Lynch (1996) for eukaryotic blocks to
elongation), and these regulated stops have a significant
impact on the rate of transcription of any gene. At any
nucleotide position, the polymerase can do one of four
things (Table 2). Of course, it can continue elongation,
incorporating the next nucleotides without
detectable hesitation. Alternatively, the polymerase
can ‘pause’ and eventually resume transcription.
The signals that cause pausing are not clear, although
pausing occurs at very specific locations all along a
transcription unit. Pausing does not seem to occur
when the templates are synthetic homopolymers
(such as poly(dA) or poly(dC)) or alternating
copolymers (such as poly(dAT) or poly(dCG)). The fraction
of polymerase molecules ‘trapped’ in the pause and the
time spent pausing vary for each site and with different
polymerases. Few rules predict a pause site or the
duration of the pause. It is known that sequences
both upstream and downstream of the pause site can
influence the reaction, as can accessory protein factors.
Some pause sites have consecutive Ts in the nontranscribed
DNA strand, or potential hairpin secondary structures
upstream of the pause site. However, these features are
not characteristic of all the sequences that pause
polymerases. Clearly, the frequency and duration of pauses
are attractive targets for regulation. Protein factors that
influence polymerase pausing are discussed below.
During elongation, a polymerase can also ‘arrest.’
Arrest is an operational definition for an in vitro
observation wherein the polymerase stops transcription
upon encountering a block to elongation.
However, in contrast to a paused polymerase, an arrested
polymerase will not spontaneously resume elongation in
the presence of nucleotide substrates; rather the stalled 
and arrested elongation complex requires an accessory 
protein factor. That factor stimulates a transcript 
hydrolysis activity that resides in the polymerase. 
This unusual activity results in cleavage of the 3’ end 
of the transcript, release of a small RNA oligonucleot- 
ide (from 2 to 20 nucleotides long), retention of the 5’ 
portion of the transcript, and the resumption of elonga- 
tion across the site that originally blocked progress of 
the polymerase (Figure 3).

[Figure 3 schematic: Elongation → Arrest → Transcript cleavage (GreA or TFIIS), releasing a cleaved RNA oligonucleotide → Resume elongation]

Figure 3 Transcript cleavage reaction in arrested
elongation ternary complexes. RNA polymerase moving
along the DNA template encounters a block to
elongation and arrests transcription. The catalytic
center of the polymerase moves away from the 3’ end
of the transcript. The catalytic center is stimulated by an
accessory protein (GreA or GreB in bacteria, TFIIS in
eukaryotes) to hydrolyze an oligonucleotide from the 3’
end of the nascent transcript. The hydrolysis results in a
new 3’-OH in register with the catalytic center, and
nucleotide incorporation can resume for continuing the
elongation process. The star represents the catalytic
center of the polymerase. The perpendicular line
represents the block to elongation.

This cleavage activity is necessary because, when
arrested, the polymerase’s catalytic center is displaced
from the 3’ end of the transcript. Lacking a 3’ hydroxyl group in the active
site, the polymerase can no longer incorporate nucleo- 
tide substrates. To create a new 3’ end, the catalytic site 
hydrolyzes the transcript somewhat 5’ to the displaced 
end of the transcript. This creates a new 3’ end in the 
catalytic site that is in proper sequence register with 
the template; the polymerase can resume elongation 
and pass the original site of arrest (Figure 3). This 
cleavage reaction by the polymerase alone is very in- 
efficient. Thus, factors that stimulate the cleavage reac- 
tion were isolated because they could stimulate the 
overall elongation reaction of the polymerase. Both 
prokaryotes and eukaryotes have such factors. Even 
the single-subunit bacteriophage RNA polymerase 
carries out the cleavage reaction, apparently without 
a cofactor. The mechanistic details that characterize 
the transcript cleavage reaction are under intense 
study, as the regulation of this reaction can strongly 
influence the efficiency of the elongation reaction. 
Finally, the elongating polymerase may ‘terminate’
(described below).
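The four options in Table 2, together with the cleavage rescue shown in Figure 3, can be summarized as a small state machine. The following Python sketch is illustrative only: the class and method names are hypothetical, the template is represented directly by the RNA sequence it encodes, and the inefficiency of cleavage by the polymerase alone is simplified to no rescue at all when no GreA/GreB- or TFIIS-like factor is present.

```python
from dataclasses import dataclass, field
from enum import Enum, auto


class State(Enum):
    ELONGATING = auto()
    PAUSED = auto()
    ARRESTED = auto()
    TERMINATED = auto()


@dataclass
class ElongationComplex:
    """Toy ternary elongation complex (hypothetical model, not a real API)."""
    template: str                  # simplification: the RNA sequence the template encodes
    rna: str = ""                  # nascent transcript, 5'->3'
    state: State = State.ELONGATING
    released: list = field(default_factory=list)  # 3' oligonucleotides removed by cleavage

    def add_nucleotide(self) -> None:
        # Only an elongating complex, with the 3' end in the active site, extends the RNA.
        if self.state is not State.ELONGATING:
            raise RuntimeError(f"cannot add nucleotides while {self.state.name}")
        if len(self.rna) >= len(self.template):
            raise RuntimeError("reached the end of the template")
        self.rna += self.template[len(self.rna)]

    def pause(self) -> None:
        if self.state is State.ELONGATING:
            self.state = State.PAUSED

    def resume(self) -> None:
        # A pause ends on its own; no accessory factor is required.
        if self.state is State.PAUSED:
            self.state = State.ELONGATING

    def arrest(self) -> None:
        # The catalytic center separates from the 3' end; resume() cannot rescue this.
        if self.state is State.ELONGATING:
            self.state = State.ARRESTED

    def cleave(self, n: int, factor_present: bool) -> None:
        """Factor-stimulated hydrolysis removing n nucleotides from the 3' end."""
        if self.state is not State.ARRESTED:
            raise RuntimeError("only an arrested complex is rescued by cleavage")
        if not factor_present:
            return  # simplification: cleavage by the polymerase alone is ignored
        if not 2 <= n <= 20 or n >= len(self.rna):
            raise ValueError("oligonucleotide must be 2-20 nt and shorter than the RNA")
        self.released.append(self.rna[-n:])
        self.rna = self.rna[:-n]   # new 3'-OH, in register with the template
        self.state = State.ELONGATING

    def terminate(self) -> str:
        # Transcript and template are released; this complex cannot resume.
        self.state = State.TERMINATED
        return self.rna
```

In this sketch `resume()` rescues only a paused complex, while an arrested one needs `cleave(..., factor_present=True)`, mirroring the operational distinction between pause and arrest in Table 2.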


Regulation during elongation 

The multiple reactions that contribute to elongation 
(pausing, arresting, elongating) highlight the variety of 
targets for regulation. In eukaryotes, there are several 
proteins that reduce the pausing of RNA polymerase 
II in vitro. These include the general transcription 
factor TFIIF, which is also needed for the formation 
of the preinitiation complex. In addition, elongin and 
HMG14 and HMG17 have an effect on pausing. The 
TFIIS/SII protein of eukaryotes and the GreA and
GreB proteins of bacteria stimulate the cleavage reaction
necessary to reverse the arrested state of RNA
polymerase II and bacterial RNA polymerase, respectively.
The TFIIS protein is quite specific for RNA
polymerase II in promoting readthrough of arrested 
polymerases, but it can also function with RNA poly- 
merase I under some conditions to stimulate the cleav- 
age reaction in vitro. The N-TEFb and DSIF protein 
complexes negatively regulate elongation by RNA 
polymerase II, and their action is overcome by a pro- 
tein complex, P-TEFb, that phosphorylates the poly- 
merase. 

There are also many proteins that regulate elonga- 
tion across specific transcription units, as illustrated 
by the following examples. The Tat protein of the 
human immunodeficiency virus stimulates more effi- 
cient elongation across the viral genome. The G2R and 
A18R proteins of vaccinia virus appear to modulate 
elongation for temporal control of viral RNA synthesis.
Several cellular eukaryotic transcription units are
controlled in a tissue-specific manner by regulating
elongation, e.g., adenosine deaminase and retinoic
acid receptor isoforms. In prokaryotes, factors specific
to the purB, Bgl, and S10 operons attenuate transcription
elongation. Other proteins affect elongation by
affecting termination, and examples of such proteins 
are discussed below. 

Proteins that modulate the phosphorylation state 
of RNA polymerase II also regulate the elongation 
reaction, although the mechanism of this effect is 
unclear. The exchange of a protein complex that promotes
initiation (mediator) for one or more complexes that facilitate
elongation (elongator) is also involved. A highly phosphorylated
polymerase is engaged in elongation; however,
the pattern of phosphorylation and the particular
kinase(s) involved are not well defined. There also 
may be the need for the sequential action of distinct 
kinases to generate an elongation complex capable of 
transcribing over long distances. Dephosphorylation 
is carried out by an enzyme that is essential in yeast, 
and is thought to be involved in the cycling of the 
polymerase after termination. It is likely that the 
extent and pattern of phosphorylation of RNA poly- 
merase II changes during the elongation process, and 
that the combined effects of kinases and phosphatases 
have a significant regulatory impact in vivo during
pausing, arrest, and in the coordination of RNA and 
DNA metabolism coincident with transcription. 

In eukaryotes, chromosomal structure also affects 
the elongation reaction, and there are several interac- 
tions between chromatin remodeling machines and 
elongation factors inferred from genetics. In addition, 
proteins genetically involved in chromatin structure 
changes in vivo (such as the Spt4p and Spt5p proteins 
from yeast and their mammalian homologs) have been 
shown subsequently to have an impact on elongation 
in vitro. The Swi/Snf chromatin remodeling complex 
has also been linked in vivo and in vitro to elongation 
control. 

Clearly, when considering the transcript elongation 
reaction, one must think about not only the mechanism 
of the catalytic synthesis but also the variety of behav- 
iors of RNA polymerase during that synthesis (elonga- 
tion, pausing, arrest, transcript cleavage). Overlaying 
the regulation are protein factors that alter each of 
these processes or change the template conformation. 


Termination 

Termination involves release of both the RNA and the 
template by RNA polymerase (Figure 2). It is crucial 
that this termination process is tightly regulated, since 
RNA polymerase is a completely processive enzyme, 
incorporating nucleotides until termination occurs. 
Once the polymerase has terminated transcription, it 
cannot pick up where it left off. This contrasts with 
DNA polymerases, distributive enzymes that dissociate
from and reassociate with a template-primer to
continue extension of a replicating strand of DNA. 
Bacterial and bacteriophage RNA polymerases
recognize specific sites as ‘factor-independent’ termin- 
ators, and bacterial polymerases also recognize a dis- 
tinct set of sites only in the presence of accessory 
termination protein factors such as the rho protein, 
giving ‘factor-dependent’ sites of termination. The 
factor-independent termination sites include a T-rich 
sequence in the nontranscribed strand downstream of 
a potential hairpin secondary structure in the RNA. 
The T-rich sequence in the DNA is thought to signal 
the polymerase to halt transcription, and the hairpin 
structure is thought to contribute to the release reac- 
tion that dissociates the RNA from the polymerase. 
Curiously, T-rich sequences are part of the signal that 
causes pausing, arrest, and sometimes termination for 
eukaryotic RNA polymerases as well, reinforcing the 
notion of conserved signals during transcription. 
There are also a number of ‘antitermination’ proteins 
that regulate the recognition of terminators, and these 
are known to be especially important for the temporal 
regulation of bacteriophage production. In prokaryotes,
the translation apparatus also regulates terminator
recognition within a transcription unit in a process
called ‘attenuation’ (see Attenuation), which is signifi- 
cant in several amino acid biosynthetic operons. 
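The two features of a factor-independent terminator described above, a potential RNA hairpin followed by a T-rich stretch in the nontranscribed strand, suggest a simple sequence scan. The sketch below is a deliberately naive heuristic, not a validated terminator predictor: the function names and the default stem, loop, gap, and T-run sizes are hypothetical, and it demands a perfect inverted repeat, whereas natural hairpins tolerate mismatches and G-U pairs in the RNA.

```python
import re

# Watson-Crick complement for DNA written 5'->3'.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(_COMPLEMENT)[::-1]


def find_intrinsic_terminators(seq, min_t=4, stem=6, loops=range(3, 9), max_gap=2):
    """Toy scan for factor-independent terminator candidates (hypothetical).

    seq is the nontranscribed (nontemplate) strand, 5'->3', so it reads like
    the RNA with T in place of U. A hit is a run of at least min_t Ts whose
    5' end lies within max_gap nt of a perfect inverted repeat (two stem
    arms of length `stem` separated by a loop), i.e. a potential RNA hairpin
    just upstream. Returns (hairpin_start, t_run_start) pairs, 0-based.
    """
    seq = seq.upper()
    hits = []
    for run in re.finditer(rf"T{{{min_t},}}", seq):
        t_start = run.start()
        found = None
        for gap in range(max_gap + 1):
            right_end = t_start - gap          # 3' stem arm ends here (exclusive)
            for loop in loops:
                start = right_end - (2 * stem + loop)
                if start < 0:
                    continue
                left = seq[start:start + stem]
                right = seq[right_end - stem:right_end]
                if left == reverse_complement(right):
                    found = (start, t_start)   # arms can base-pair in the RNA
                    break
            if found is not None:
                break
        if found is not None:
            hits.append(found)
    return hits
```

Scanning, for instance, a sequence built as a GC-rich inverted repeat followed immediately by six Ts reports one candidate, while a T-run with no upstream inverted repeat is not reported; real terminator prediction also weighs hairpin stability, which this sketch ignores.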

In eukaryotes, the termination reaction is best 
understood for RNA polymerase III, for which a 
T-run of four or five nucleotides in the nontemplate 
DNA strand seems sufficient to signal termination. 
For RNA polymerase II, all primary transcripts 
extend beyond the sequence that will become the 3’ 
end of the mature transcript ultimately formed in a 
processing reaction. The termination reaction is 
poorly understood, but it is directly coupled to or 
coordinated with the 3’ end processing events. The 
termination event is not random, but falls in a specific 
delimited region of the transcription unit. Termin- 
ation by RNA polymerase I must be particularly pre- 
cise. The ribosomal RNA genes (rDNA) are arranged 
in clusters of repeats, yet each transcription unit is 
independent of the others. Thus, RNA polymerase I 
terminates at specific locations that fall between the 
transcribed repeats, and this termination event in 
S. cerevisiae is dependent upon the Reb-1 protein. 
Like RNA polymerase II, RNA polymerase I tran- 
scribes well beyond the location of what will become 
the mature 3’ end of the rRNA, and this ‘extra’ tran- 
scription may be involved in regulating preinitiation 
complex formation for the next copy of rDNA down- 
stream in the repeats. 


Regulation during termination 

Accurate and efficient termination prevents interference
with transcription regulation of downstream
genes. That is, if transcription continued unhindered,
transcription regulation of a downstream gene would
be disrupted. Occasionally this ‘promoter occlusion’
is used to the cell’s advantage. In addition, when two
transcription units are oriented facing each other in
opposite directions on the chromosome, the termination
reaction for gene A may also be regulatory for
gene B (Figure 4). However, when the DNA replication
machinery encounters the transcribing RNA
polymerase, termination does not always occur. Surprisingly,
these two molecular machines can pass each
other nearly unhindered, although how this occurs is
not understood.

Figure 4 Colliding polymerases. RNA polymerase
ternary elongation complexes (A and B) are copying
opposite strands of the DNA template and moving
toward each other. Each might interfere with the
progress of the other, and the chicken «-globin gene
cluster uses this mechanism for regulation of transcription
termination.

In prokaryotes, the rho termination factor is essen- 
tial for the proper utilization of factor-dependent ter- 
mination sites in both cellular and bacterial virus 
transcription units. The rho factor is a hexamer of 
identical subunits that binds RNA, has RNA helicase 
activity, and hydrolyzes ATP during the termination 
reaction. There are rho utilization sites (rut sites) in 
rho-dependent termination regions, but the exact 
molecular nature of the sites and their recognition is 
incompletely defined. In a mechanism distinct from
that used by the rho protein, the B. subtilis TRAP
complex directs termination in the tryptophan biosynthetic
operon, in a manner quite different from the ribosome-mediated
attenuation of tryptophan synthesis seen
in E. coli.

Rho-dependent termination sites and factor- 
independent termination sites are ignored by prokar- 
yotic RNA polymerases in the presence of antitermin- 
ation factors such as the bacteriophage lambda N and Q 
proteins and similar proteins from related bacterial 
viruses. The antitermination function of N requires
several host proteins, the N-utilization substances
(Nus), first identified genetically. These include
NusA, NusB, the rho factor, NusE (ribosomal protein
S10), the β-subunit of RNA polymerase, and
NusG. In a distinct mechanism, the psu protein
encoded by phage P4 promotes antitermination 
which allows its partner phage, P2, to complete its 
replication. Cellular antitermination activity has 
been detected genetically in the bacterial ribosomal 
operons, where readthrough of transcription is regu- 
latory and dependent on sites with sequence similarity 


to those needed for effective N function, but the host 
protein that carries out this reaction has not been 
identified as yet. 

There are also mutant bacterial RNA polymerases 
that are considered ‘hyperterminators’ and ‘hypoter- 
minators,’ and these mutations lie in the two largest 
subunits of the catalytic core of the enzyme. The 
change in termination efficiency holds for both factor- 
independent and rho-dependent terminators. How- 
ever, the termination efficiency is unrelated to the 
elongation rate when factor-independent terminators 
are studied, whereas the termination efficiency for 
rho-dependent terminators seems to be related to the 
rate of elongation when the polymerase encounters 
the termination sequences. The mechanistic explan- 
ation for these differences is not known. 

In eukaryotes, the La protein enters into termin- 
ation regulation for RNA polymerase III. The La 
protein binds to U-rich 3’ ends of RNA, characteristic 
of the sequence for terminated RNAs for RNA poly- 
merase III. Whether La is a termination factor that 
promotes release or is a stability factor for terminated 
RNAs is still under study. Genetic analyses of the 
largest subunits of RNA polymerase III very clearly 
delineate regions of the polymerase important for 
the recognition of T-rich regions of the non- 
transcribed DNA strand in the termination reaction. 
An analysis of similar regions of the largest subunits of 
RNA polymerases I and II has not been done, but 
the terminators for these two polymerases are also 
less well defined. In these cases, it may be more diffi- 
cult to select individual mutations in the polymerase 
that have an impact on termination since there is a 
coupling of 3’ end processing with the termination 
reaction. 

Termination is the final step in the synthesis of a 
productive RNA transcript from a transcription unit, 
but it is also the essential step that permits the
polymerase to move from one template sequence to 
another so that the process of transcription can begin 
again. 


Summary 


Transcription makes an accurate copy of the informa- 
tion found within the DNA (or RNA) template so 
that the cell can decode the genetic information and 
allow it to be used in synthesizing protein, as well as 
structural, regulatory, and catalytic RNAs. Transcrip- 
tion is a multistep process with a large number of rate- 
limiting steps, both catalytic and stoichiometric. It 
should be emphasized that transcription is a cycle 
wherein the proteins involved are cycled again and 
again to promote synthesis of the proper RNAs at 
the proper place and time and in appropriate amounts
in the cell. Transcription is one of many processes that
the cell uses to regulate gene expression, and the co- 
ordination of transcription with other RNA and 
DNA metabolic processes emphasizes the importance 
of accurate regulation of these events to ensure survival 
of the organism. 

Transduction is the term used to describe the transfer of genetic material from one cell to another by means of a virus. As microbial genetics developed, it became clear that bacteria can exchange genetic information in two ways: through transformation, in which a cell picks up naked DNA from the medium, and through conjugation, in which two cells come into contact temporarily and copies of some genes are transferred
from one to the other. The requirement for cell-cell 
contact was demonstrated by Davis, who put cells of 
two strains of Escherichia coli, normally capable of 
conjugation, on opposite sides of a filter, thus prevent- 
ing their contact; no recombination occurred, thus 
showing that recombination was not due to transform- 
ation. When Norton Zinder discovered recombin- 
ation between strains LT2 and LT22 of Salmonella 
typhimurium, he repeated the Davis experiment, 
expecting to find no recombination. But recombin- 
ation did occur, showing that conjugation was not 
involved; adding deoxyribonuclease to the medium 
did not prevent recombination, so the mechanism was 
not transformation. Zinder then showed that strain 
LT22 was lysogenic for a phage, designated P22, and 
that P22 virions were carrying genes into LT2 cells. 
Zinder called the phenomenon ‘transduction.’ 
Transduction can take two distinct forms, general- 
ized and specialized. In generalized transduction, as 
in the P22 case, the phage (either temperate or viru- 
lent) multiplies in a host cell while breaking the host 
DNA into fragments. While most virions include 
copies of the phage genome, some ‘pseudovirions’ 
encapsidate fragments of host DNA. When a second 
strain of bacteria, carrying suitable genetic markers, 
is then infected at a low multiplicity of infection, some 
of the bacteria receive the transducing particles; 
instead of becoming infected, they receive a fragment 
of bacterial DNA, which can then recombine with 
their own DNA and render them recombinant. Gen- 
eralized transducing phage can carry any host genes. 
Specialized transduction, in contrast, occurs only 
with a temperate phage whose prophage occupies a 
specific site in the host genome. The best-known case 
is with phage lambda, whose prophage inserts between 
the gal (galactose metabolism) and bio (biotin biosynthesis) genes. When the prophage is excised from the genome and begins lytic multiplication, it usually recombines with a precise reversal of the insertion crossover, so a burst of normal lambda phage is produced. Sometimes, however, the excision crossover occurs at the wrong point, so the excised lambda genome leaves some of its own genes behind and carries some of the host genes from one side or the other. This phenomenon was first studied by using the gal genes; the resulting transducing phage carried gal DNA instead of some critical genes needed for phage multiplication and so were known as λdg (defective, galactose-transducing). Later it was discovered that lambda phage could also carry bio genes, and some of these phage are not defective in their ability to multiply.

Transduction of either type is particularly useful 
for fine-structure genetic analysis. The general se- 
quence of bacterial genes can be determined with 
relative ease through conjugation experiments, but 
such experiments cannot determine the sequence of a 
series of very close markers. Various experimental 
designs can be used for fine-structure mapping. If 
two close markers are transduced together, they are 
said to be cotransduced, and the frequencies of 
cotransduction can indicate the relative sequence of 
three or more contiguous markers, since those closest 
to each other should be cotransduced most frequently. 
One can also determine the sequence of a series of markers affecting a single function (X1, X2, ..., Xn) relative to a nearby outside marker; for illustration, we will use the marker leu (leucine biosynthesis). One then sets up a pair of reciprocal crosses such as the following, where the recipient cells are made leu− and the transducing phage are grown on leu+ cells:


donor:      leu+  x1  +
recipient:  leu−  +   x2

and

donor:      leu+  +   x2
recipient:  leu−  x1  +


One then measures the frequency of leu+ transductants and the frequency of leu+ X+ transductants. The experiment depends on the principle that the more crossover events are required to produce a given recombinant, the rarer that recombinant will be. Notice that in the first cross, one pair of crossovers is needed to bring the leu+ marker into the recipient and a second pair of crossovers is necessary to obtain the wild-type X+. In the second cross, however, a single pair of crossovers can bring in both the leu+ and X+ markers. Thus, if the sequence is indeed leu-X1-X2, more of the leu+ recombinants should also be X+ in the second cross than in the first; if the sequence were leu-X2-X1, the reverse would be true.
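The reasoning behind both mapping designs can be made concrete in a few lines of Python. This is a minimal sketch, not a published analysis method: the hypothetical function `middle_marker` applies the rule that the two markers cotransduced least often must be the outermost pair, and the hypothetical function `infer_order` compares the fraction of leu+ transductants that are also X+ in the two reciprocal crosses shown above. All frequencies and counts are invented numbers used only for illustration.

```python
def middle_marker(cotransduction: dict[frozenset, float]) -> str:
    """Given cotransduction frequencies for all three pairs of three linked
    markers, return the marker lying between the other two.

    The pair cotransduced least often is taken to be the outermost pair,
    so the remaining marker must be in the middle.
    """
    markers = set().union(*cotransduction)           # all marker names
    outer_pair = min(cotransduction, key=cotransduction.get)
    (middle,) = markers - outer_pair                  # exactly one left over
    return middle


def infer_order(cross1: tuple[int, int], cross2: tuple[int, int]) -> str:
    """Each cross is (number of leu+ transductants, number also X+).

    Cross 1: donor leu+ x1 +, recipient leu- + x2.
    Cross 2: donor leu+ + x2, recipient leu- x1 +.
    The cross that needs fewer crossovers gives the higher X+ fraction.
    """
    frac1 = cross1[1] / cross1[0]
    frac2 = cross2[1] / cross2[0]
    if frac1 == frac2:
        return "undetermined"
    return "leu-X1-X2" if frac2 > frac1 else "leu-X2-X1"


# Hypothetical cotransduction frequencies for three markers a, b and c.
freqs = {
    frozenset({"a", "b"}): 0.60,
    frozenset({"b", "c"}): 0.45,
    frozenset({"a", "c"}): 0.20,
}
print(middle_marker(freqs))  # b

# Hypothetical counts: 500 leu+ transductants scored in each cross.
print(infer_order((500, 10), (500, 60)))  # leu-X1-X2
```

Real data would of course need statistical comparison of the two fractions rather than a bare inequality; the sketch only encodes the direction of the inference.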


See also: Bacterial Transformation; Conjugation, 
Bacterial; Specialized Transduction 

The term ‘transfection’ is used differently by bacterial
geneticists and animal cell biologists. Both usages 
describe processes where a DNA fragment is intro- 
duced into a cell and the gene or genes it carries are 
expressed. In bacteria this general process is called 
transformation, and transfection refers to the special 
case where bacterial cells are transformed with DNA 
from an infectious bacterial virus (a bacteriophage), to 
produce an infected cell. In tissue culture of animal 
cells, transfection refers to any process that artificially 
introduces DNA into cultured cells. 


See also: Bacterial Transformation; Tissue 
Culture 

Agrobacterium tumefaciens, a soil-borne phytopathogenic bacterium, transfers a segment of its Ti (tumor-inducing) plasmid, called T-DNA (transferred DNA),
to plants (Figure 1). Virulence proteins, coded for by 
the virulence region also localized on the Ti plasmid, 
mediate this transfer. They are involved in generation, 
translocation, protection, and nuclear localization of 
the T-DNA. Finally, the bacterial DNA integrates into plant chromosomal DNA in a random fashion. Genes
located on the integrated T-DNA carry eukaryotic 
expression signals. Their expression yields enzymes 
providing unique nutrients for the bacterium and 
turning on mitotic activity of the transformed cell. 
The result is a tumor, also called crown gall. Molecular 
understanding of this process allows the use of this 
bacterium as a general tool for generating transgenic 
plants. 


Events in the Bacterium 


In free-living agrobacteria most of the virulence 
genes, which are organized in operons, are inactive. 
Perception by the bacteria of signals emanating from wounded plant cells leads to a unique molecular dialogue. Wounded plant cells secrete phenolic compounds and sugars, which are sensed by the bacterial virulence protein VirA (Table 1) and interpreted as the proximity of a plant cell ready to be invaded. In a cascade of events following attachment of the bacteria to plant cells, receipt of the plant signal is turned into transcriptional activation of all virulence genes. The VirA protein becomes autophosphorylated and subsequently transfers the phosphate group to the virulence transcription factor VirG, thereby activating it. Activated VirG turns on transcription of all virulence genes, acting on specific regions upstream of the virulence promoters. As a consequence, virulence proteins are produced.
These are involved, apart from the signal sensing/ 
transcriptional regulation activities, in generating 
a transferable T-DNA unit and accomplishing its 
transfer to the plant cell with the help of a transfer 
machine. 


Figure 1 Scheme of T-DNA transfer from Agrobacterium tumefaciens to a plant cell. (Adapted from Rossi et al.,


The T-DNA on the Ti plasmid is flanked by two almost identical sequence elements in direct orientation. These so-called border sequences are substrates for the site-specific nicking enzyme composed of VirD1 and VirD2. Upon cleavage, the catalytic subunit VirD2 remains covalently attached to the 5’ terminus of the emerging single-stranded T-DNA. This protein-DNA complex travels to the plant cell by a route that is not yet understood, using the transfer apparatus composed of the extracellular pilus, the mating channel, and cytoplasmic membrane ATPases. Despite intensive research in this area, the exact composition of this machine is not known. How its functional units are connected and, most importantly, how the T-DNA-protein complex makes use of the transfer machinery also await elucidation.


Table 1 Functions of Ti-plasmid encoded virulence proteins. Chromosomally located virulence genes are not included. In addition to the general Ti-plasmid encoded virulence proteins, there exist additional strain-specific proteins.

Protein           Function
VirA              Sensor/transmitter of plant signal
VirG              Transcriptional activator of virulence genes
VirD1, VirD2      Enzyme complex generating a site-specific nick in the border sequence
VirC              Enhancement of virulence
VirE2             Single-stranded DNA binding protein; activity in the bacterium is dependent on the presence of VirE1
VirB1-11, VirD4   Components of transfer machine


Events in the Plant

After the T-DNA-VirD2 complex has reached the plant cell’s interior, by whatever mechanism, it receives a protective coat of VirE2 protein molecules, which most likely reach the plant cell independently of the T-DNA complex. This virus-like particle has a compact, yet flexible structure. It enters the plant nucleus through nuclear pores by virtue of nuclear localization signals contained in the virulence proteins. Specific transporters ferry the T-DNA complex through the pores, which are huge macromolecular machines.

The ‘ultimate goal’ of the pathogenic organism A. tumefaciens is to subvert the metabolism of the infected plant cell to serve the invader’s needs and to produce compounds that only the inciting pathogen can use. To achieve this in a genetically stable way, the T-DNA complex integrates into the plant genome. Both in vitro and in vivo analyses have been employed to elucidate the mechanism of this integration step. Whereas the former approach yielded information on known bacterial- and plant-specific proteins involved in this process, analysis of mutants of the model plant Arabidopsis thaliana impaired in transformation has led to the identification of plant proteins specifically involved in this process.

The tumorous phenotype of plants infected with 
Agrobacterium tumefaciens originally suggested 
oncogenic principles at work that might yield infor- 
mation on human cancer. Once the bacterially 
initiated transformation principle was established, 
this suggested relationship of course could no longer 
be maintained. Indeed, the tumor-inducing principle 
(TIP) could be explained by the presence and expres- 
sion of genes involved in the hormone balance of plant 
cells. The T-DNA was shown to contain genes cod- 
ing for enzymes involved in the biosynthesis of the 
plant hormones auxin and cytokinin. Overexpression 
of these genes leads to uncontrolled proliferation of 
transformed cells, ultimately yielding the phenotype 
of tumors. The underlying pathogenic mechanism 
was found to be the overreplication of transformed 
nuclear DNA and specifically of T-DNA genes 
involved in the production and secretion of specific 
secondary metabolites called opines. These biochem- 
icals, specific for agrobacterial strains inducing their 
synthesis, are used exclusively by the bacterial species 
responsible for the respective tumors as their sole 
nitrogen, carbon, and energy source. This special bacterium-plant relationship thus represents a microcosm of specific exploitation of plant resources by a sophisticated plant pathogen.


Use of Agrobacterium tumefaciens for 
Generation of Transgenic Plants 


During the study of the mechanisms underlying T-DNA-mediated plant transformation, it became obvious that there was an opportunity to exploit this system for generating transgenic plants. In particular, because no T-DNA sequences other than the borders are required for transfer, genes of interest could be inserted into the T-DNA, and the bacterium would faithfully transfer them into the plant. In many plant species Agrobacterium-mediated transformation is the method of choice for generating transgenic plants, as in many cases only one or a few copies of (almost) complete DNA units are integrated. Successful transformation has been achieved not only for dicotyledonous plants, the natural hosts that produce tumors upon infection, but also for some monocotyledonous plants, the group to which the agriculturally important cereals such as wheat, maize, and rice belong. Here, the ability of the transformed cells to regenerate into fertile, transgenic plants has frequently been the critical step. In addition to its role in crop improvement, plant transgenesis is a key factor in the establishment of modern genetics and genomics.
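The point that only the borders are needed can be illustrated with a toy string model. The sketch below is a deliberate simplification: it finds two copies of a border repeat in a plasmid sequence and returns whatever lies between them, which is why a gene of interest cloned between the borders is carried along. The function name `excise_t_dna`, the border motif, and the plasmid sequence are all hypothetical placeholders, not real Ti plasmid sequences; real processing by VirD1/VirD2 is enzymatic and acts on one strand at a defined position within each border, which this model ignores.

```python
def excise_t_dna(plasmid: str, border: str) -> str:
    """Toy model: return the sequence between the first two copies of a
    border repeat (the borders are direct repeats on the plasmid).

    Only illustrates that anything placed between the borders is treated
    as T-DNA; strand-specific nicking is not modeled.
    """
    first = plasmid.find(border)
    if first == -1:
        raise ValueError("no border sequence found")
    second = plasmid.find(border, first + len(border))
    if second == -1:
        raise ValueError("only one border sequence found")
    return plasmid[first + len(border):second]


# Hypothetical border motif and plasmid; "GENEOFINTEREST" stands in for a
# cloned coding sequence placed between the two borders.
BORDER = "TGGCAGG"
plasmid = "AAAA" + BORDER + "GENEOFINTEREST" + BORDER + "CCCC"
print(excise_t_dna(plasmid, BORDER))  # GENEOFINTEREST
```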

The transfer RNAs are the central molecules in translation, or protein biosynthesis. They are the adaptors that mediate the translation of the genetic message into proteins, which are the principal gene products of cells.


History 


When the structure of DNA and the basics of protein synthesis were first clarified, tRNA molecules were unknown. Crick pointed out that it was difficult to understand how a polypeptide could be assembled from an RNA template, since there was no stereochemical complementarity between the codons and the amino acids. He suggested that adaptors, small RNA molecules that could be charged with specific amino acids by enzymes, would decode the messenger RNA (mRNA) by complementarity. These adaptors could thus participate in incorporating the amino acids into a growing polypeptide. Subsequently these adaptors were identified and are now known as the tRNA molecules.


Structure 


When the first nucleotide sequence of a tRNA was determined, the possible base-paired secondary structures were examined. Of the three possibilities suggested, only one, the classical cloverleaf (Figure 1), was consistent with subsequently determined sequences. Here the stem regions contain four to seven base pairs. The cloverleaf is arranged in such a way that the 5’ and 3’ termini are base-paired to each other. The three leaves are formed by three base-paired regions, each closing a loop. The middle loop contains the anticodon of the tRNA. In 1973 the first three-dimensional structures of tRNA were determined, and they revealed a number of surprises. First, the cloverleaf secondary structure was confirmed, but it was found to be folded into the shape of an L (Figure 2). Second, the anticodon was found at one end and the 3’ acceptor end, the CCA sequence, at the opposite end, approximately 80 Å apart. This meant that the anticodon has no possibility of interacting with the amino acid. It also means that when the tRNA delivers its amino acid to the growing polypeptide on the ribosome, the decoding site, where the mRNA is read, is far from the site of peptidyl transfer.

Figure 1 The classical cloverleaf two-dimensional structure of phenylalanine tRNA from yeast. (Figure made by Maria Selmer.)


Synthesis 

The genes for tRNA molecules are dispersed in the genome. There are sometimes several genes for a given tRNA, and the genes are frequently found in clusters of several tRNA genes. Some of them are located among genes for ribosomal RNA molecules. The tRNA genes do not code for the final functional molecule: the primary transcript contains sequences that are removed by specialized tRNA processing nucleases. Likewise, the 3’-terminal residues, CCA, are added by an enzyme to eukaryal tRNA precursors.

In addition, the mature tRNA molecule is extensively modified. Some modifications are so typical that they have given their names to the parts of the structure where they occur: pseudouridine (Ψ) has given its name to the Ψ loop and 5,6-dihydrouridine (D) to the D loop. The anticodon loop is also frequently modified.

Figure 2 The three-dimensional structure of a tRNA (Phe tRNA from yeast). (Figure made by Maria Selmer.)


1988 Transfer RNA (tRNA)


Codon-Anticodon Relationships 


The universal genetic code has 64 words, or codons, three of which designate stop and are not normally read by tRNAs. Since only 20 different amino acids occur in standard proteins, the code is degenerate: serine, arginine, and leucine each have six codons, whereas methionine and tryptophan have only one. This situation is handled differently in different organisms. The codons used must be correlated with the set of tRNAs expressed by the organism. In some organisms codon usage is limited to what a small set of tRNAs (minimally 20) can read, while other species have tRNAs corresponding to most codons. Thus codon usage differs from organism to organism.
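The arithmetic of degeneracy can be checked directly from the code table. The Python sketch below encodes the standard genetic code (the nuclear code; mitochondrial and some other organisms' variant codes differ) as a 64-letter string in one-letter amino acid notation, with `*` marking stop codons, and counts codons per amino acid. The variable names are illustrative only.

```python
from collections import Counter
from itertools import product

BASES = "UCAG"
# Standard genetic code, codons ordered UUU, UUC, UUA, UUG, UCU, ...
# with the first, second and third bases each running through U, C, A, G.
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"  # first base U
    "LLLLPPPPHHQQRRRR"  # first base C
    "IIIMTTTTNNKKSSRR"  # first base A
    "VVVVAAAADDEEGGGG"  # first base G
)

CODE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

codons_per_aa = Counter(CODE.values())
stops = codons_per_aa.pop("*")  # remove stop codons from the amino acid counts

print(len(CODE), stops, len(codons_per_aa))                    # 64 3 20
print(sorted(aa for aa, n in codons_per_aa.items() if n == 6))  # ['L', 'R', 'S']
print(sorted(aa for aa, n in codons_per_aa.items() if n == 1))  # ['M', 'W']
```

The 61 sense codons shared among 20 amino acids are what make wobble pairing, described next, economical for the cell.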

The anticodon, normally positions 34-36 of the tRNA, interacts with the codons of the mRNA primarily by Watson-Crick base pairing. However, the first position of the anticodon (position 34) can base-pair with different nucleotides at the third position of a codon. Thus noncanonical base pairs are formed at the third, or so-called wobble, position of the codon. This may also involve modified bases of the tRNA. This allows a tRNA with a certain anticodon to read several codons; thus a limited set of tRNAs can read a larger set of codons.
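The wobble rules can be written as a small lookup table. The sketch below assumes Crick's classic wobble pairings for anticodon position 34 (G reads C or U; U reads A or G; inosine, a modified base, reads U, C or A; C reads only G; A reads only U) and ignores all other base modifications, which in real tRNAs can widen or restrict these pairings. Anticodons and codons are written 5’→3’, so anticodon position 36 pairs with codon position 1 and position 34 with codon position 3. The function name `codons_read` is hypothetical.

```python
# Simplified Crick wobble rules: base at anticodon position 34 -> codon
# third-position bases it can pair with. Other modified bases are ignored.
WOBBLE = {"G": "CU", "U": "AG", "I": "UCA", "C": "G", "A": "U"}
WATSON_CRICK = {"A": "U", "U": "A", "G": "C", "C": "G"}


def codons_read(anticodon: str) -> list[str]:
    """Return the codons (5'->3') read by an anticodon written 5'->3'
    as positions 34, 35, 36."""
    b34, b35, b36 = anticodon.upper()
    first = WATSON_CRICK[b36]   # codon position 1 pairs with position 36
    second = WATSON_CRICK[b35]  # codon position 2 pairs with position 35
    return sorted(first + second + third for third in WOBBLE[b34])


print(codons_read("GAA"))  # phenylalanine anticodon: ['UUC', 'UUU']
print(codons_read("IGC"))  # alanine anticodon with inosine: ['GCA', 'GCC', 'GCU']
```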


The tRNA Synthetases 


The enzymes that charge the tRNAs with amino acids, the aminoacyl-tRNA synthetases, are each specific for one amino acid and for the corresponding tRNA or family of tRNAs. Thus there are normally 20 tRNA synthetases in an organism, although there are some deviations. There are two classes of tRNA synthetases, with ten members in each class. These two classes have entirely different structures, recognize the tRNA in different ways, and charge the tRNA on the 2’ (class I) or 3’ (class II) hydroxyl of the terminal ribose of the tRNA.

The fidelity of translation primarily depends on the 
synthetases. They are enzymes that recognize two 
different substrates with utmost accuracy. First the 
correct amino acid has to be bound by the enzyme 
and activated by an ATP molecule. In the second step 
the correct tRNA is bound to the enzyme and the 
amino acid has to be transferred to it. Since the distance between the amino acid and the anticodon is large, not all tRNA synthetases contact the anticodon; instead, they select the correct tRNA using specific sequences along the length of the tRNA molecule.


The Adaptor Molecule in Protein 
Synthesis 


The charged aminoacyl-tRNA is protected by a protein that is very abundant in the cell and that carries the tRNA to the ribosome. In Bacteria it is called elongation factor Tu (EF-Tu), and in Eukarya and Archaea elongation factor 1, which is composed of several subunits. This elongation factor is activated by binding a GTP molecule, which enables it to bind the aminoacyl-tRNA. In the complex, the elongation factor interacts mainly with the acceptor stem of the tRNA and with the amino acid, which thereby becomes well protected from hydrolysis.

The ribosome has three main sites for tRNAs: the A-, P-, and E-sites. They span the space between the small and large subunits, and it is here that the long distance between the two functional ends of the tRNA becomes necessary. The mRNA is bound to the small ribosomal subunit, so decoding is performed there. The large subunit contains the peptidyl transferase site, where the acceptor ends of two tRNAs come together so that the nascent peptide can be transferred to the incoming amino acid. To start the synthesis of a protein, a special tRNA, the initiator tRNA, is required to read the first AUG codon. It binds to initiation factor 2 (IF2), which brings it into the ribosomal P-site before the factor dissociates.

Complexes between tRNA and the elongation fac- 
tor Tu (or EF-1) bind to the ribosome in an initial 
selection between cognate and noncognate tRNAs. 
Cognate, and less frequently near-cognate, tRNAs 
cause the elongation factor to hydrolyze its GTP 
molecule. This results in the dissociation of the factor 
from the tRNA and the ribosome. The tRNA mol- 
ecule is now free to place its amino acid in the peptidyl 
transfer site. In a proofreading step the fidelity of the codon-anticodon interaction is improved by the dissociation of near-cognate tRNAs before the nascent peptide is attached. In a subsequent step the peptidyl
tRNA is translocated from the A-site to the P-site, a 
step that is mediated by elongation factor G (EF-G) or 
in Eukarya and Archaea EF-2. The deacylated tRNA 
is moved from the P- to the E-site before it dissociates. 


tRNA Mimicry 


Several proteins participating in protein biosynthesis interact with the ribosomal sites for tRNA, and the structures of several of them are being unraveled. The eukaryotic release factor 1 (eRF1), which triggers release of the completed peptide when a stop codon is encountered in the A-site, has a tRNA-like structure and may bind to the A-site like a tRNA. The ribosome recycling factor is an excellent mimic of a tRNA. It may bind to the A-site after termination to dissociate the mRNA from the ribosome and cause the subunits to dissociate from each other. The ribosomal translocase, EF-G or EF-2, specifically mimics the complex containing EF-Tu and a tRNA; thus parts of that protein mimic the tRNA. During translocation it has been observed to place itself in part of the A-site.

1. Transformation in bacteria or eukaryotic cells is the
acquisition of new genetic markers by incorpor- 
ation of added DNA. 

2. Transformation of eukaryotic cells is the conver- 
sion to an unrestrained growth pattern. 


See also: Bacterial Transformation; Marker 

Transgenes are exogenous DNA sequences introduced into the genome of an organism. These transgenes may include genes from the same organism or novel genes from a completely different organism. The resulting plant, animal, or microorganism is said to be transformed. Transformation occurs naturally in organisms such as bacteria, which can take up DNA from their surrounding environment. In addition, techniques have been developed to introduce and maintain transgenes in plants, animals, and bacteria.
Transgenes can be used to analyze or alter the function 
of a known gene. In other cases, introduction of trans- 
genic DNA has been used to add new functions to 
an organism, such as the expression of a protein 
normally not present in that organism. In addition to 
the application of transgenes in research, transgenic 
DNA has many potential medical applications, 
including the creation of DNA-based vaccines and 
gene therapy. 

The process of transformation was first identified 
in 1928 by Frederick Griffith, in experiments using 
two different strains of streptococcal bacteria. It had 
been demonstrated previously that injection of viru- 
lent S (smooth) strain bacterial cells could kill mice, 
but that injection of cells from the nonvirulent R 
(rough) strain did not. When heat-killed S strain cells 
were mixed with live R strain cells and injected into 
mice, however, the mice also died, indicating that in 
some way the dead S strain cells had been able to 
transform the nonvirulent R strain cells into virulent 
S strain cells. In subsequent experiments, when DNA 
isolated from S strain cells was injected into mice 
along with live R strain cells, the injected animals 
were found to contain a mixture of both R and S strain 
bacterial cells, demonstrating that transformation of 
nonvirulent R strain bacteria into virulent S strain 
bacteria was mediated by the S strain DNA. 

Initial experiments in the transformation of multi- 
cellular animals took place in Drosophila melanogaster 
and in mice. In Drosophila, transformation is carried 
out using a transposable element to transport the 
transgenic DNA into the Drosophila genome. Initial 
transformation experiments in Drosophila made use of 
the Drosophila rosy gene, which is required for normal 
red eye color. In order to create transformed flies, the 
wild-type rosy gene was cloned into the middle of a 
Drosophila transposable element and the transgene/ 
transposon construct was then injected into Dros- 
ophila embryos containing a mutant rosy gene. Flies 
in which the transposon successfully excised from the 
plasmid and then inserted into the genome could be 
identified based on their change in eye color, from 
rosy to the normal red. The first transgenic mice 
were created by microinjection of transgene DNA 
into fertilized mouse eggs. One transgene used in 
these early experiments contained the promoter for 
the metallothionein gene, which is activated by 
increased levels of heavy metals, fused to the gene 
encoding human growth hormone. Using this trans- 
gene, the animals that developed from the transformed 
embryos could be identified based on their increased 
size relative to normal mice when fed either cadmium 
or zinc. 

A critical first step in creating a transgenic organism
is to get the transgene DNA into the organism. In 
bacteria, transformation is carried out by mixing 
transgenic DNA with bacterial cells treated to increase 
their ability to take up DNA. A variety of other 
methods have been used to introduce DNA into 
plant and animal cells, including DNA injection,
electroporation, and microparticle bombardment. A 
different type of approach involves the use of viruses: 
the transgene is cloned into the viral genome and is 
then introduced into cells by viral infection. If the 
transgene-containing virus is competent to carry out 
multiple rounds of infection, the transgene may be 
spread from cell to cell by the virus. Alternatively, if 
the transgenic virus is unable to infect cells on its own 
and requires the presence of a helper virus for infec- 
tion, the transgene will be transmitted to a more limit- 
ed group of cells. 

Typically, transformation is carried out in such a 
way that while many individuals are exposed to the 
transgenic DNA only a small percentage are actually 
transformed. Therefore, a means of distinguishing 
the successful transformants from the background of 
untransformed individuals is essential. One approach 
often used is to include a selectable marker, such as a 
gene that provides antibiotic resistance, in the trans- 
genic DNA; in other cases, transformed individuals 
are identified based on rescue of a mutant phenotype, 
such as described above for the Drosophila rosy gene. 
An increasingly important approach is the use of reporter genes such as lacZ or green fluorescent protein (GFP), which can be used both to identify transformants and to monitor when and where the transgene is expressed in vivo.

In some cases transformation is a transient event; 
for example, when transgenic DNA is introduced into 
tissue culture cells, most of it is lost after several 
rounds of cell division. The creation of stable trans- 
formants requires a means of maintaining the trans- 
genic DNA through multiple rounds of cell division. 
For multicellular animals, there is the additional 
requirement that germ cells must be transformed in 
order to ensure that the transforming DNA is trans- 
mitted to the next generation (in the case of multi- 
cellular plants, transformation can be maintained 
either by transformation of the germline or by clonal 
propagation of transformed cells). 

One way to produce stable transformation is by 
insertion of the transgene into the genomic DNA. In 
many cases where this occurs, the site of transgene 
insertion is random. In species such as the yeast Sac- 
charomyces cerevisiae and Tetrahymena, however, 
transgene insertion occurs by homologous recombination, in which the site of transgene insertion is determined by homology between transgene and genomic sequences. In S. cerevisiae, stable transformation can
also occur when transgenic DNA is cloned into Cen 
plasmids which are faithfully segregated to both 
daughter cells during each cell division due to the 
presence of yeast centromere sequences. A different 
approach to transformation is observed in the nema- 
tode Caenorhabditis elegans, where injection of trans- 
genic DNA into the C. elegans germline can result in 
formation of an extrachromosomal DNA array con- 
taining hundreds of tandemly repeated copies of the 
transgene. 

Transgenes have played a critical role in research. 
They have been used to clone genes by phenotypic 
rescue, in which transgenes are tested for their ability 
to rescue the mutant phenotype of individual genes. 
Using transgenes that fuse gene regulatory sequences to reporter genes such as β-galactosidase or GFP, it is possible to observe gene expression patterns directly. Another important use of transgenes has been the creation of constructs that are targeted to a specific gene based on sequence homology and can be used to alter or knock out gene function. Increasingly, these approaches are now being applied to the creation of genetically altered plants and animals. One example of this growing transgenic technology is the use of mammalian embryonic stem cells, which are pluripotent and therefore have the capacity to form any tissue type.


See also: Embryonic Stem Cells; Genetic 
Engineering; Transfer of Genetic Information 
from Agrobacterium tumefaciens to Plants; 
Transgenic Animals 

A transgenic animal is one carrying an experimentally
introduced gene or DNA segment (transgene) in 
many or all of its cells. The transgene is usually 
acquired by gene transfer at an early embryonic stage 
or by transmission from a transgenic parent. Trans- 
genes may be inserted at random sites in the host 
genome, or they may be targeted to specific loci via 
homologous recombination. Many (but not all) trans- 
genes express a gene product, which in some cases 
alters the phenotype of the animal. Transgenic mice 
are widely used in biomedical research, and the appli- 
cation of similar technologies to other species is of 
potential importance in agriculture. 


History of Transgenic Animals 


Transgenic mice were the first transgenic animals to be 
produced, due to the widespread use of this species as 
a model mammalian organism for genetic and embryo- 
logical studies. This important development resulted 
from a convergence of technical advances in several 
fields, including the culture of preimplantation em- 
bryos in vitro, embryo transfer, recombinant DNA 
methods, and the ability to transfer genes into cul- 
tured cells. The first transgenic mice were produced 
in the mid-1970s, by the viral infection of preimplant- 
ation embryos, which resulted in stable integration of 
the viral DNA into the host genome and transmission 
through the germline. The first transgenic mice carrying
an exogenous eukaryotic gene were produced in
1980 by a different method, the microinjection of 
purified recombinant DNA into a pronucleus of the 
fertilized egg. This method has since been used exten- 
sively in mice, and also adapted for transgenesis in 
many other mammalian species. Later in the 1980s, 
the development of embryonic stem (ES) cell lines that 
could colonize the germline in chimeric mice provided 
another route for the introduction of transgenes, as 
well as for targeted mutagenesis of endogenous genes. 
However, ES cells able to colonize the germline have 
not been produced from other mammals. In the late 
1990s, techniques for cloning sheep, cows, and mice 
via nuclear transfer from fetal or adult somatic cells 
into the oocyte provided a new potential route for 
transgenesis. 


Methods for Production of Transgenic 
Animals, and Genetic and Physical 
Properties of Transgenes 


In order to introduce a transgene into most or all the 
cells of the animal, including the germ cells, trans- 
genesis is normally performed at very early stages of 
embryogenesis, ranging from the oocyte to the blastocyst
(Figure 1). Oocytes, fertilized eggs, or preimplantation
embryos are recovered from female
animals and maintained in culture for several hours
to several days. After introduction of the transgene,
the embryos are reimplanted into the reproductive
tract of a ‘foster mother,’ where they can develop in
utero.


Figure 1 [...]genesis at which transgenes are introduced. Microinjection
of transgene DNA is normally performed at the one-cell
(fertilized egg) stage. A fine, hollow glass needle
containing a DNA solution is inserted through the zona
pellucida (outer glycoprotein envelope) and plasma
membrane of the egg into one of the two pronuclei. A
small volume of the DNA solution is injected. Infection
with retroviral vectors is usually performed between
the four-cell and morula stages: the zona pellucida is
removed and the embryo is exposed to a preparation
of infectious viral particles (not shown). Transgenesis
via embryonic stem (ES) cells is performed at the eight-cell
or blastocyst stage: several ES cells containing the
transgene are aggregated with an eight-cell embryo, or
injected into the cavity of a blastocyst. The ES cells
then intermingle with the host embryonic cells, giving
rise to a chimeric mouse.


Transgenesis via Pronuclear Microinjection
of DNA

Pronuclear microinjection of DNA into fertilized
eggs is the most widely used method for introducing
transgenes when site-specific insertion into the host
genome is not required. The eggs are viewed with a
microscope and, using a glass micropipette controlled
by a micromanipulator, a small volume of a DNA
solution containing several hundred transgene mol- 
ecules is introduced into one or both pronuclei of 
each egg. One or more copies of the transgene may 
integrate stably into a host chromosome, resulting in 
mitotic transmission to all cells in the developing 
embryo. Integration of the transgene occurs in 
approximately 10-40% of injected mouse eggs. Therefore,
each animal born from a microinjected embryo
must be screened to determine whether it is trans- 
genic. For this purpose, genomic DNA is isolated 
from a tissue sample (typically the tail tip) and polymerase
chain reaction (PCR) and/or Southern blot
analyses are performed to detect the transgene and 
determine its physical integrity. An animal that de- 
velops from the injected egg and carries the transgene 
is called a ‘founder,’ and it can be mated with normal 
animals to transmit the transgene and derive a ‘trans- 
genic line.’ Some transgenic founders are mosaics that 
carry the transgene in only a fraction of their cells, 
presumably because it integrated into the genome 
after the first round of DNA replication. Such mosaic 
animals are recognized because they transmit the 
transgene to fewer than the normal 50% of offspring, 
or because they express the transgene in only a frac- 
tion of cells. Except when inherited from a mosaic 
founder, a transgene displays Mendelian transmission. 
It is stably inherited over many generations, implying 
that it is permanently integrated at a single genomic 
locus. The loci of transgene insertions can be deter- 
mined by genetic mapping or by in situ hybridization 
to metaphase chromosomes. Transgenic lines can be 
maintained in either a hemizygous or a homozygous 
state, unless the transgene insertion has induced a 
recessive mutation in an essential host gene (see section
‘Targeted Mutations, Random Insertional Mutations, and Gene Traps,’ below).
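The 50% benchmark for a non-mosaic founder lends itself to a simple statistical check. The sketch below is a hypothetical helper, not a standard tool: it applies a one-sided exact binomial test to a founder's transmission data, assuming the founder carries the transgene at a single locus in the hemizygous state and is mated to non-transgenic animals. The names `prob_at_most` and `looks_mosaic` and the 0.05 cutoff are illustrative choices.

```python
from math import comb


def prob_at_most(k: int, n: int, p: float = 0.5) -> float:
    """Probability of k or fewer transgenic pups out of n, if each pup
    independently inherits the transgene with probability p."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def looks_mosaic(transgenic: int, total: int, alpha: float = 0.05) -> bool:
    """Flag a founder whose transmission is significantly below the 50%
    expected for a hemizygous, non-mosaic founder (one-sided exact test)."""
    return prob_at_most(transgenic, total) < alpha


# Hypothetical litter data: (transgenic pups, total pups) for two founders.
print(looks_mosaic(3, 30))   # True
print(looks_mosaic(14, 30))  # False
```

Three transgenic pups out of 30 would be very unlikely (probability about 4 × 10⁻⁶) under 50% transmission, whereas 14 out of 30 is well within chance. A founder with insertions at two unlinked loci would instead transmit to more than 50% of offspring, which this one-sided test deliberately does not look for.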

Transgenes usually insert at a random site in the 
genome, through an unknown mechanism involving 
illegitimate recombination (Figure 2). This allows 
DNA from any source (eukaryotic, prokaryotic, 
viral, etc.) to be introduced into the mammalian 
genome. Frequently, there is a deletion or rearrange- 
ment of host DNA at the site of insertion. In some 
cases, copies of the transgene insert at two or more 
unlinked loci in a single founder. Each insertion locus 
can contain a single copy of the transgene or, more 
typically, a head-to-tail tandem array of tens or even 
hundreds of transgene copies. Even when DNA from 
the same species is injected, the frequency of insertion 
into the host genome through homologous recombination
is extremely low, about 10⁻⁴. However, microinjected
DNA molecules readily recombine with each
other through a homologous mechanism. This allows 
a large gene to be injected in several overlapping frag- 
ments, which can recombine to reconstruct the gene 
before or during the integration process. The size of a 
transgene appears to be limited only by breakage dur- 
ing handling and microinjection of the DNA, and 
cloned DNA segments several hundred kb in length 
(e.g., yeast artificial chromosomes) can be successfully 
introduced. Similar methods have been used to intro- 
duce transgenes into a wide variety of mammalian 
species, including rats, rabbits, sheep, goats, pigs, and 
cows. However, for unknown reasons the frequency 
of transgenesis is much lower in large mammals than 
in mice and rats. 


Transgenesis via Infection with Retroviral 
Vectors 

Retroviral vectors take advantage of the natural ability 
of retroviruses to enter the cell and integrate into the 
genome in a single copy. A retroviral vector is pro- 
duced by inserting the transgene in place of part of the 
viral genome, and a preparation of infectious viral 
particles is produced by introducing the recombinant 
virus into tissue culture cells. Mouse embryos at the 
cleavage or morula stage are then infected with the 
virus (Figure 1), resulting in retroviral DNA integration
in one or more cells (Figure 2). The site of insertion
is essentially random, and the founder mice are
usually mosaics. The size of the transgene is limited to 
8-9 kb, due to packaging limitations of the retroviral 
particle. 


Transgenesis via Embryonic Stem Cells 
Embryonic stem cells are pluripotent cell lines derived 
from early mouse embryos. ES cells can be cultured in 
vitro, and retain the ability to contribute to all somatic 
and germ cell lineages when introduced into an early 
embryo. Therefore, they represent an important route 
for the introduction of transgenes. The main advan- 
tage of ES cells for transgenesis is that they can be 
transfected with DNA and then subjected to selection
and screening procedures to identify clones of cells in 
which the transgene has inserted at a specific site in the 
host genome. This permits the generation of targeted 
mutations in specific endogenous genes (e.g., gene 
knockouts), as well as the insertion of transgenes at 
specific sites in the genome. 

Transgenes for use in ES cells are designed to 
include a selectable marker gene allowing positive 
selection during cell culture, e.g., a neomycin re- 
sistance gene. The transgene DNA is introduced into 
a large population of cultured ES cells (~10⁷ cells) by
any of several methods for DNA transfection, such as 
electroporation or lipofection. Clones of ES cells that 
carry the transgene, and therefore can grow in the 
presence of the selective agent, are isolated. These 
clones can then be screened to identify rare clones in 
which the transgene has inserted at a specific target 
site, as in gene knockout experiments (Figure 2). Retro- 
viral vectors can also be used to introduce exogenous 
genes or to induce random insertional mutations in 
ES cells. 
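Because correctly targeted clones are rare among the drug-resistant survivors, it helps to estimate how many clones must be picked for screening. A minimal sketch, assuming each resistant clone is independently a homologous recombinant with some probability f. The text gives no value for f in ES cells, and real targeting efficiencies vary widely between loci, so the values below are hypothetical placeholders, as is the function name.

```python
import math


def clones_to_screen(targeted_fraction: float, confidence: float = 0.95) -> int:
    """Smallest number of drug-resistant ES clones to pick so that at least
    one is correctly targeted with the requested probability, assuming each
    clone is independently targeted with probability `targeted_fraction`."""
    if not 0 < targeted_fraction < 1:
        raise ValueError("targeted_fraction must lie strictly between 0 and 1")
    # P(no targeted clone among n) = (1 - f)**n; solve (1 - f)**n <= 1 - confidence.
    n = math.log(1 - confidence) / math.log(1 - targeted_fraction)
    return math.ceil(n)


# Hypothetical targeting efficiencies, for illustration only.
print(clones_to_screen(0.01))  # 299
print(clones_to_screen(0.5))   # 5
```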

Once a transgenic ES cell clone with the desired 
properties has been identified, it is used to produce 
chimeric mice. Several ES cells are microinjected into 
the cavity of a blastocyst-stage embryo or aggregated 
with a cleavage-stage embryo (Figure 1), and several 
such embryos are implanted into the uterus of a foster 
mother mouse and allowed to develop to term. This 
results in the development of chimeric animals (i.e.,
those composed of a mixture of cells from two
sources, in this case the host embryo and the donor
ES cells) in which the cells derived from donor ES cells
carry the transgene. The host embryos are obtained
from a mouse strain genetically distinct from the ES
cells, typically carrying different hair color genes so
that the chimeric mice can be identified by their mixed
hair pigmentation. Chimeras in which the ES cells
have contributed to the germline will transmit the
transgene to some or all of their offspring, which are
then bred to perpetuate the transgenic line.


Figure 2 Modes of transgene insertion into the genome. (A) DNA molecules microinjected into the egg
pronucleus (shown by arrows, with black boxes representing exons of the transgene) usually integrate in a multicopy,
head-to-tail tandem array at a random site in the host genome. In the example illustrated, the transgene has, by
chance, disrupted a host gene (whose six exons are indicated by boxes), resulting in deletion of the first two exons.
(B) Retroviral vectors also insert into the host genome at random sites, but they do not cause deletion of host
sequences. In the example shown, the retroviral vector, which carries a transgene (long black box) in between the
long terminal repeats (open boxes), has by chance inserted into an intron of the same host gene. (C) Targeting
vectors are designed to integrate into a specific host gene via homologous recombination. The vector shown contains
two segments of DNA derived from the host gene, allowing it to undergo site-specific integration via homologous
recombination with its target gene. In the example shown, the targeting vector is specifically designed to delete exons
2–4 of the host gene, replacing them with a neo gene (a selectable marker) and a β-galactosidase reporter gene, which
will thus be expressed under the control of the promoter of the target gene (small arrow).


Transgenesis via Nuclear Transfer 

Somatic cells derived from embryonic, fetal, or adult 
tissues of the same species are cultured and transfected 
with the transgene, and a clone of transgenic cells is 
identified. A transgenic cell is then fused with an 
oocyte whose nucleus has been previously removed, 
resulting in transfer of the transgenic somatic cell 
nucleus into the oocyte. The oocyte is then implanted
into the reproductive tract of a female animal to allow 
it to develop. 

As in the case of ES cells, donor cells are first 
selected for insertion of the transgene, and can be 
screened for the desired transgene insertion site, copy 
number, etc. Because the host oocyte is enucleated, 
the transgenic animal that develops carries the entire 
genome of the donor cell, including the transgene, in 
every cell. 


Expression of Transgenes 


Transgenes that are introduced by pronuclear micro- 
injection are designed to function after insertion into 
diverse sites in the host genome. The expression of 
such a transgene depends on several factors, including 
the regulatory elements (i.e., the sequences that regu- 
late transcription, RNA processing, and translation) 
included in the transgene, the site of insertion into the 
host genome, and the number of copies of the trans- 
gene. Some transgenes consist of a genomic DNA 
segment including a gene (exons and introns) together 
with a certain extent of the natural 5’ and 3’ flanking 
DNA. The expression of such a ‘genomic’ transgene 
depends on whether the DNA segment includes all the 
regulatory elements that normally regulate the gene’s 
expression. In mammals, regulatory elements some- 
times reside >100 kb away from the gene, so that it 
may be necessary to transfer a very large segment of 
genomic DNA to ensure proper expression. More 
frequently, a transgene consists of a cDNA clone 
(providing the coding sequences) joined to a heterologous
promoter, enhancer(s), intron, and polyadenylation
signals to create an artificial ‘cDNA transgene.’
This approach can be used to express essentially any
gene in any cell type or tissue for which appropriate
regulatory sequences have been defined, although the 
level of expression may be lower than that obtained 
with a genomic transgene. The pattern of expression 
dictated by the regulatory sequences in a genomic or 
cDNA transgene may be overridden by the influence 
of neighboring host DNA, a consequence of random 
insertion into the genome. Such ‘position effects’ can 
silence a transgene in all cells or in a fraction of cells, or
alter its level or pattern of expression. There is in 
general a positive correlation between transgene 
copy number and expression level, although this rela- 
tionship can be masked by position effects. 

In addition to regulatory elements that can direct 
expression in specific cell types, tissues, or temporal 
patterns, sequences can be included in a transgene to 
make its expression conditional on administration of a 
drug, a change in temperature, or other experimental 
manipulations. For example, promoters that can be 
regulated by administration of antibiotics, hormones, 
or metal ions have been used extensively to turn trans- 
genes on and off at will during development of the 
embryo or during the life of the adult animal. 

Through the use of ES cells (and potentially 
through nuclear transfer from somatic cells), the prob- 
lem of position effects can be circumvented by tar- 
geting the desired coding sequence to a specific locus, 
where it falls under the control of the natural regulat- 
ory mechanisms at that locus. For example, to express 
a foreign protein in red blood cells, the appropriate 
coding sequences might be fused to the regulatory 
elements of a β-globin gene (encoding the β chain of
hemoglobin) and introduced into random sites in the 
mouse genome by pronuclear microinjection. Altern- 
atively, the coding sequences could be inserted, via 
homologous recombination, to replace the coding 
sequences of the mouse β-globin gene in ES cells,
which would then be used to produce germline chi- 
meric mice. 


Applications of Transgenic Animals 


Transgenesis is an extremely powerful tool for the 
genetic analysis and manipulation of mice and other 
animals. As defined above, a transgene is an experi- 
mentally introduced DNA segment carried in the 
genome of a host animal. A transgene can be designed 
to encode a new gene product in the transgenic animal, 
or it can be introduced with the intent of altering or 
disrupting a host gene at its site of insertion. In many 
cases, a transgene will do both, e.g., disrupt an endo- 
genous gene while expressing a new gene product. 
Thus, the applications of transgenesis take advantage 
of its ability to induce both loss-of-function and gain- 
of-function genetic alterations. 


Targeted Mutations, Random Insertional 
Mutations, and Gene Traps 

Targeted mutations are caused by the insertion of a 
transgene into a specific, predetermined host gene via 
homologous recombination. These include ‘knock- 
outs,’ which are designed to eliminate expression of 
a host gene, and ‘knock-ins,’ which are designed to 
modify a host gene without blocking its expression. 
Targeted mutagenesis, whose aim is generally to study 
the function of a known gene, is described in detail 
elsewhere in this volume. 

A different and complementary application of 
transgenesis is random insertional mutagenesis, which 
is generally conducted with the aim of identifying new 
genes. Because of the random site of insertion of most 
transgenes, a host gene is sometimes disrupted, result- 
ing in a lethal or phenotypically visible mutation in 
approximately 5-10% of transgenic mouse lines. Vir- 
tually all insertional mutations identified in transgenic 
animals are recessive, and can be propagated in the 
heterozygous state even if they are lethal in the homo- 
zygous state. In contrast to mutations induced by 
chemicals or radiation, the molecular analysis of inser- 
tional mutations is relatively easy, because the trans- 
gene provides a tag that can be used to clone DNA 
from the mutant locus. In mammals, it is not generally 
feasible to screen sufficient numbers of transgenic 
animals to identify mutations in a specific gene, or 
those causing a preordained phenotypic defect. 
Nevertheless, random insertional mutations in mice, 
which are often an unanticipated byproduct of other 
transgenic experiments, have led to the discovery of 
many important genes. 

A different approach to the use of random inser- 
tional mutagenesis for gene discovery is ‘gene trap- 
ping.’ Here, a transgene encoding an easily detected 
reporter protein (e.g., the enzyme β-galactosidase),
but lacking its own promoter and/or enhancer, is
introduced at random sites in the genome. In mice, 
gene trapping is usually performed in ES cells, which 
are then used to produce chimeric mice. When the 
gene trap vector lands in a host gene, it is expressed 
under the control of the host gene’s regulatory appar- 
atus. Many animals with different gene trap insertions 
are screened to identify those that express the reporter 
gene at particular anatomical sites or developmental
stages. The inserted gene trap vector provides a tag to 
clone the gene that has been ‘trapped,’ and also often 
generates a loss-of-function mutation. 


Cell-Type-Specific or Conditional Gene 
Disruption 

In addition to heritable mutations, which are present 
in every cell in the animal throughout its development, 
it is sometimes useful to disrupt a gene in only a 
specific cell type or at a specific time. This can be
accomplished in mice by introducing a transgene
encoding a site-specific recombinase (e.g., Cre recombinase
of bacteriophage P1) under the control of
cell-type-specific or inducible regulatory elements.
Animals carrying this transgene are crossed with a 
second transgenic strain in which the target gene has 
been modified by the insertion of recognition sites for 
the recombinase (e.g., loxP sites). In cells that express 
the recombinase transgene, recombination at the tar- 
get locus is specifically induced. Depending on the 
placement of the recognition sites, this can result in 
deletion, silencing, or activation of a target gene. 
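The deletion and activation outcomes can be illustrated with a toy string model. This is a sketch under simplifying assumptions: the function `cre_excise` and the bracketed segment names are hypothetical placeholders, only two loxP sites in the same orientation are handled (sites in opposite orientation, which Cre inverts rather than excises, are not modeled), and recombination is treated as certain in every recombinase-expressing cell. The loxP sequence used is the standard 34-bp site.

```python
# 34-bp loxP site: two 13-bp inverted repeats flanking an asymmetric 8-bp
# spacer (the spacer gives the site its orientation).
LOXP = "ATAACTTCGTATA" + "GCATACAT" + "TATACGAAGTTAT"


def cre_excise(dna: str) -> str:
    """Toy model of Cre-mediated excision between two directly repeated loxP
    sites: the intervening segment and one site are removed, one loxP remains."""
    first = dna.find(LOXP)
    if first == -1:
        return dna  # no recognition site: nothing happens
    second = dna.find(LOXP, first + len(LOXP))
    if second == -1:
        return dna  # a single site cannot recombine with itself
    return dna[:first] + dna[second:]


# Hypothetical constructs; bracketed names stand in for real DNA segments.
floxed_exon = "[promoter][exon1]" + LOXP + "[exon2]" + LOXP + "[exon3]"
lox_stop_lox = "[promoter]" + LOXP + "[stop]" + LOXP + "[reporter]"

# Deletion: exon 2 is excised, disrupting (silencing) the target gene.
print(cre_excise(floxed_exon) == "[promoter][exon1]" + LOXP + "[exon3]")  # True
# Activation: removing the stop cassette lets the promoter drive the reporter.
print(cre_excise(lox_stop_lox) == "[promoter]" + LOXP + "[reporter]")  # True
```

In both cases the outcome is fixed by where the sites were placed when the target strain was built; the recombinase transgene only determines in which cells, and when, the excision occurs.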


Genetic Rescue of Mutations 

When attempting to identify the gene responsible for a
mutant phenotype in the mouse, transgenesis is often 
used to test individual candidate genes. A transgenic 
strain is generated by microinjecting DNA encoding 
the wild-type allele of the candidate gene, which 
inserts at a different locus than the mutant gene. The 
transgenic strain is then crossed with the mutant 
strain, to test whether the transgene can correct (or 
‘rescue’) the phenotypic defect in mutant animals. 
This approach is also useful when attempting to pos- 
itionally clone a mutant gene that is believed to be 
located within a large region of DNA, but whose 
identity is unknown. Large segments of wild-type 
DNA (often cloned as yeast or bacterial artificial 
chromosomes) within the genomic region of interest 
are tested for their ability to rescue the mutation. If a 
successful rescue is observed, smaller clones of DNA 
are tested to narrow in on the gene. 
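The narrowing step rests on a simple inference: if rescue requires a complete copy of the gene, the gene must lie within every clone that rescues. A minimal sketch of that bookkeeping, assuming clones are described by (start, end) coordinates in kb along the candidate region. The function name and the coordinates in the example are hypothetical. Clones that fail to rescue are not used to exclude territory here, because a clone that covers the gene only partially will also fail.

```python
from typing import Iterable, Optional, Tuple

Interval = Tuple[int, int]  # (start, end) in kb along the candidate region


def candidate_interval(rescuing: Iterable[Interval]) -> Optional[Interval]:
    """Return the region shared by all rescuing clones, which must contain the
    whole gene if rescue requires an intact copy; None if there is no overlap."""
    start: Optional[int] = None
    end: Optional[int] = None
    for s, e in rescuing:
        start = s if start is None else max(start, s)
        end = e if end is None else min(end, e)
    if start is None or end is None or start > end:
        return None  # no clones, or clones that do not overlap (assumption violated)
    return (start, end)


# Hypothetical artificial-chromosome clones that each rescued the mutant.
print(candidate_interval([(0, 180), (120, 300)]))  # (120, 180)
```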


Mapping Transcriptional Regulatory 
Elements 

One of the earliest applications of transgenesis was to 
delimit the regulatory DNA sequences (e.g., promoter 
elements and enhancers) required to direct correct 
tissue-specific or developmental-stage-specific expres- 
sion of eukaryotic genes. The rationale of such experi- 
ments is that if a transgene displays a consistent 
pattern of expression when inserted at several differ- 
ent sites in the host genome, then that pattern must be 
dictated by cis-acting regulatory elements included in 
the transgene DNA. A transgene whose RNA or pro- 
tein product can be distinguished from endogenous 
gene products is used to produce several transgenic 
animals or lines, and the expression pattern is char- 
acterized. Often a reporter gene encoding an easily 
detected protein is inserted in place of the coding 
sequences of the transgene. The expression of several 
transgenes containing varying amounts of 5’ or 3’
flanking DNA is compared to deduce the location 
of cis-acting regulatory elements. Such regulatory 
elements can then be used to direct the expression of
other transgenes. 
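The logic of such a deletion series can be made explicit. A minimal sketch under stated assumptions: the constructs are nested 5’ deletions, a single upstream element is required for the expression pattern, and a construct counts as active when a majority of its independent transgenic lines express it correctly, with discordant lines attributed to position effects. The majority cutoff, the function names, and the example data are all hypothetical.

```python
from typing import Dict, List, Optional, Tuple


def consistent(results: List[bool], cutoff: float = 0.5) -> bool:
    """A construct counts as active if more than `cutoff` of its independent
    transgenic lines show the expected pattern; discordant lines are put down
    to position effects at their insertion sites."""
    if not results:
        return False
    return sum(results) / len(results) > cutoff


def locate_element(series: Dict[float, List[bool]]) -> Optional[Tuple[float, float]]:
    """`series` maps kb of 5' flanking DNA in each nested construct to per-line
    results. Returns (longest inactive, shortest active) flank lengths, the
    window expected to contain a single required upstream element."""
    active = [kb for kb, res in series.items() if consistent(res)]
    if not active:
        return None
    shortest_active = min(active)
    # Every construct shorter than the shortest active one is inactive.
    shorter = [kb for kb in series if kb < shortest_active]
    if not shorter:
        return None  # the series never loses expression, so nothing is bracketed
    return (max(shorter), shortest_active)


# Hypothetical deletion series: flank length (kb) -> expression in each line.
series = {
    10.0: [True, True, True, False],  # one silent line: likely a position effect
    5.0: [True, True, True],
    2.0: [False, False, True],        # one expressing line: likely a position effect
    0.5: [False, False, False],
}
print(locate_element(series))  # (2.0, 5.0)
```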

Gene targeting can also be used to define the 
regulatory elements of a gene, for example, by deleting 
a segment of flanking DNA from a host gene, and 
examining the consequences for the gene’s expression. 


Ectopic Expression or Overexpression of 
Normal Gene Products 

A frequent application of transgenesis involves inten- 
tionally altering the developmental or tissue-specific 
pattern of expression of a normal gene product. This is 
usually accomplished by placing the coding sequences 
of one gene under the control of promoter and enhan- 
cer elements of a second gene, and introducing the 
hybrid transgene into an animal. Alternatively, the 
coding sequences can be targeted, via homologous 
recombination, to a new locus in the genome, where 
they come under the control of regulatory elements of 
a different gene. These approaches have been used 
extensively to study the roles of proteins such as 
growth factors, receptors, and transcription factors 
during animal development. Two examples are 
shown in Figures 3 and 4. Figure 3 illustrates one of
the earliest examples of genetic engineering in mice, in
which growth hormone coding DNA sequences were
expressed under the control of the promoter of the
metallothionein gene, resulting in expression in and
secretion from ectopic tissues, and abnormal growth.
Figure 4 illustrates an experiment in which a mouse
homeobox cDNA, Hoxd4, was placed under the control
of the promoter of a different homeobox gene,
Hoxa1, resulting in expression of the Hoxd4 protein
in a more anterior region of the embryo than normal,
and causing a ‘homeotic’ transformation of certain
bones in the axial skeleton.


Figure 3 Abnormal growth of a transgenic mouse
expressing growth hormone under the control of the
mouse metallothionein promoter. Growth hormone
coding sequences were fused to the metallothionein
promoter, resulting in abnormal expression of growth
hormone in ectopic tissues such as liver and intestine,
and its consequent overproduction in the transgenic
animals. Transgenic mice (left) grew two to three times
as fast as controls, and reached a size of up to twice
normal. (Photograph courtesy of Dr. Ralph L. Brinster.)


Figure 4 The ectopic expression of a homeobox
transcription factor in transgenic mice causes a homeotic
transformation of the occipital bones of the skull. To
study the role of Hox genes in anterior-posterior
patterning of the body plan, the Hoxd4 coding sequences
were fused to the promoter of the Hoxa1 gene, which is
expressed more anteriorly than Hoxd4 in the mesoderm
of the developing embryo. The photograph shows the
axial skeletons of a transgenic (left) and a normal
newborn mouse (right). The ectopic expression of
Hoxd4 caused an absence of the supraoccipital (S) and
exoccipital (E) bones and a reduction in size of the
interparietal bone (I) of the skull. Furthermore, the
transgenic mouse contained ectopic bony structures
located anteriorly to the first cervical vertebra (C1) and
fused ventrally to the basioccipital bone (B). (Photograph
courtesy of Dr. Thomas Lufkin.)


Expressing Dominant Gain-of-Function or
Dominant-Negative Mutant Gene Products

Certain diseases and developmental defects in humans
and animals, including many forms of cancer, are
caused by dominantly acting mutations, which may
encode a modified gene product or cause inappropri- 
ate expression of a normal gene product. Animal mod- 
els of such diseases can be generated by introducing 
the mutant gene as a transgene. For example, many 
mouse models of cancer have been generated by intro- 
ducing dominantly acting oncogenes with appropriate 
regulatory elements to target their expression to a 
specific tissue or cell type. Similarly, the individual 
roles of viral gene products in pathogenesis can be 
examined by expressing viral genes in transgenic 
animals. 

Dominant-negative mutations are those that cause 
a mutant protein to interfere with the function of its 
wild-type counterpart in a heterozygote. An example 
is an enzyme that can bind its substrate but is 
catalytically inactive, and therefore competes with the 
wild-type enzyme for substrate. Transgenes encoding 
dominant-negative mutant proteins can be used to 
block the function of a host gene product. This 
approach is particularly useful in cases where it is not 
feasible to generate a loss-of-function mutation, or 
where it is desirable to block gene function in only a 
limited population of cells in the animal. 


Genetic Markers of Specific Cell Types or 
Lineages 

Transgenic animals that express a reporter gene product
(e.g., β-galactosidase or GFP) in a specific cell
type, cell lineage, or anatomical region are useful in a 
wide variety of biological experiments. Transgenic 
strains of this type are commonly produced in three 
ways: (1) by introducing a transgene in which the 
reporter gene is controlled by regulatory elements 
from a gene with the desired expression pattern, (2) 
by identifying a strain carrying a randomly inserted 
reporter gene with the desired expression pattern, i.e., 
a ‘gene trap,’ or (3) by targeting the reporter gene into 
the locus of a specific gene, and thereby placing it 
under the control of regulatory elements at that 
locus. An example of the expression of a β-galactosidase
reporter gene in a specific region of the developing
nervous system is shown in Figure 5.


Genetic Ablation of Specific Cell Lineages 
Instead of marking the cells expressing a specific gene, 
it is also possible to ablate those cells, and all of their 
descendants, in the developing animal. This can be 
used to analyze cell lineage relationships or to gener- 
ate disease models. Cell lineage ablation is accom- 
plished by introducing a transgene encoding a toxin 
(e.g., diphtheria toxin A fragment) that will kill the 
cells in which it is expressed, but will not harm sur- 
rounding cells. 
Figure 5 β-galactosidase staining reveals the expression
of a transgene at the mid-hindbrain junction of an
E10.5 transgenic mouse embryo. The embryo
carried a transgene encoding β-galactosidase under the
control of the Engrailed-2 enhancer and the promoter of 
the hsp68 heat-shock gene. Engrailed-2 is a homeobox 
gene required for normal brain development. The arrow 
points to the mid-hindbrain junction, a site of Engrailed-2 
expression, while the arrowhead indicates expression of 
the transgene in the spinal cord, which is due to the 
activity of the hsp68 promoter. (Photograph courtesy of 
Dr. Alexandra Joyner.) 


Applications in Agriculture and the 
Pharmaceutical Industry 

Transgenic animal models of human disease can be 
useful for preclinical drug testing. Animals engineered 
to be susceptible to human viruses, by introduction of 
viral receptors or other host range determinants, can 
also be used for testing human vaccines. 

Transgenic animals can serve as ‘factories’ that, in 
some cases, may produce large amounts of proteins 
more efficiently than alternative expression systems 
such as bacteria, yeast, or mammalian cell cultures. 
Transgenic mice have been engineered to express 
human antibodies (which are superior to murine anti- 
bodies for use as drugs) by introducing large segments 
of human DNA encoding human immunoglobulin 
genes, and breeding these transgenic animals with 
strains in which the endogenous immunoglobulin 
loci are mutated. In transgenic large animals such as 
cows or sheep, proteins of pharmaceutical value can be 
produced in large quantity in milk (and later purified) 
by introducing the appropriate gene under the control 
of regulatory elements that direct expression in the
mammary glands. 

Transgenesis can in principle be used to alter many 
phenotypic properties that may increase the value 
of agriculturally important animals. These include 
growth rate, fat composition, milk production, and 
hair texture. It may also be possible to modify do- 
mestic animals such as pigs to make them more suit- 
able as organ donors for human transplant patients. 
A transient polymorphism is one where the genetic
variation (polymorphism) currently found in a popu- 
lation at a certain genetic locus is expected to be of 
limited duration, and will ultimately be eliminated 
over time by the evolutionary force(s) acting on the 
population. The end result in such cases will necessar- 
ily be a monomorphic population with only one 
genetic type (allele and genotype) at the locus in ques- 
tion. 

The biological significance of transient polymorph- 
isms naturally depends strongly on the number of 
generations for which the existing genetic variation 
will be maintained. If this is a very long time, say 
hundreds or thousands of generations, then, for all 
practical purposes, the observed variation may be 
effectively permanent. Moreover, in such cases, environmental
or other relevant conditions are apt to
change during such a long time span, and with them
the evolutionary forces acting in the population on
the genotypes at this locus. Once conditions change,
genetic variation may either be maintained or eliminated
even faster under the new evolutionary regime.
Genetic variation would thus tend to be maintained
in such a population as long as the current conditions
prevail, which is all one can ordinarily hope to predict.

Because of the critical impact of the time frame of 
transient polymorphisms, it is important for evolu- 
tionary biologists to take this into account when 
deriving and interpreting the biological conditions 
under which genetic variation will be maintained in a 
population. In particular, these conditions may be 
much broader than suggested by the standard ones 
based simply on when there will be a stable poly- 
morphism (i.e., a locally stable equilibrium at which 
genetic variation will be maintained at the locus in 
question). The time-dependent dynamics of the sys- 
tem must also be examined under the conditions in 
which genetic variation will ultimately be lost to 
determine if and when long-lasting, transient poly- 
morphisms (and effectively permanent genetic vari- 
ation) will be maintained. 

An example of a long-term, but transient poly- 
morphism is provided by the case of a recessive lethal 
allele at a single, diallelic autosomal locus within a 
diploid population of organisms. Suppose, in particu-
lar, that a locus has the two alleles A1 and A2, where
the three genotypes (A1A1, A1A2, and A2A2) have the
relative viabilities of 1, 1, and 0, respectively. This
means that genotypes with one or two copies of the
dominant A1 allele survive equally well from birth to
reproduction, while homozygotes for the recessive A2
allele are lethal, with none surviving to reproduce.
Suppose further that this selection is the only evolu-
tionary force acting on this locus in the population. In
particular, this is a random mating population with no
mutation to new alleles at this locus, no migration, and
no stochastic effects from genetic drift.

Under these evolutionary conditions, natural selec-
tion will steadily reduce the frequency of the deleteri-
ous A2 allele to 0. Genetic variation will thus always
ultimately be eliminated at this locus, as long as these
conditions prevail; however, this selection regime can
result in a long-lasting, transient polymorphism, with
the frequency of the recessive lethal allele remaining
above 0.001 for hundreds of generations (approaching
a thousand when its initial frequency is high), as shown
in Figure 1. The reason for this slow decline is that,
once the deleterious allele is rare, individuals will only
rarely carry two copies of it. The recessive lethal allele
is then effectively shielded from selection, since almost
all of its carriers are heterozygotes with full fitness, as
they carry at least one normal, dominant allele.

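The allele frequency trajectories in Figure 1 are
based on the recursion equation giving the new fre-
quency of a recessive lethal allele A2 (q′) after one
generation of selection, in terms of its previous fre-
quency (q). This is given by:

$$q' = \frac{q(1-q)}{1-q^{2}} = \frac{q}{1+q}$$

where $1 - q^2$ is the fraction of zygotes that survive
to reproduce (all except the A2A2 homozygotes), and
$q(1 - q)$, half the frequency of the A1A2 heterozygotes,
counts the A2 alleles that those survivors carry.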


Figure 1 Trajectory through time in generations of the frequency of a recessive lethal allele A2 (vertical axis: A2 frequency), showing how its frequency will only slowly be reduced below 0.001 by natural selection, resulting in a long-lasting, transient polymorphism.
This recursion equation can be solved to obtain an
explicit dynamical solution for the frequency $q_t$ of
the deleterious allele A2 after any number $t$ of
generations of selection:

$$q_t = \frac{q_0}{1 + t\,q_0}$$

where $q_0$ is its initial frequency. From this solution we
see clearly that, over the course of many generations,
natural selection will by itself slowly but steadily
reduce the frequency of a recessive lethal allele to 0.
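The recursion and its closed-form solution can be checked numerically. The sketch below is a minimal illustration, assuming the idealized model described above (viabilities 1, 1, and 0; random mating; no mutation, migration, or drift); the function names and the starting frequency q0 = 0.5 are choices made for this example only.

```python
import math


def next_freq(q: float) -> float:
    """One generation of selection against a recessive lethal allele A2.

    With viabilities 1, 1, 0 for A1A1, A1A2, A2A2 and random mating,
    q' = q(1 - q) / (1 - q**2), which simplifies to q / (1 + q).
    """
    return q / (1.0 + q)


def iterate(q0: float, t: int) -> float:
    """Apply the one-generation recursion t times, starting from q0."""
    q = q0
    for _ in range(t):
        q = next_freq(q)
    return q


def closed_form(q0: float, t: int) -> float:
    """Explicit solution q_t = q0 / (1 + t * q0)."""
    return q0 / (1.0 + t * q0)


def generations_until_below(q0: float, threshold: float) -> int:
    """Smallest whole t with q_t < threshold, using t > 1/threshold - 1/q0."""
    return max(0, math.floor(1.0 / threshold - 1.0 / q0) + 1)


if __name__ == "__main__":
    q0 = 0.5  # illustrative starting frequency (an assumption of this sketch)
    for t in (0, 10, 100, 1000, 10000):
        print(t, iterate(q0, t), closed_form(q0, t))
    print("generations until q < 0.001:", generations_until_below(q0, 0.001))
```

Rearranging the closed form, $q_t$ falls below a threshold $c$ once $t > 1/c - 1/q_0$; because $q_t$ is close to $1/t$ for large $t$, each further tenfold reduction in the frequency of the allele takes roughly ten times as many generations.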


Transition is the term that describes the change of a
purine to a purine on the same strand of nucleic acid, 
and of a pyrimidine to a pyrimidine on the comple- 
mentary strand. Changes from A:T to G:C, T:A to 
C:G, G:C to A:T, and C:G to T:A are transitions. 
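The definition can be expressed as a small classifier. This is an illustrative sketch, assuming single-base substitutions written on one strand with the DNA letters A, C, G, and T; the function name substitution_class is hypothetical. Because each purine pairs with a pyrimidine, a transition on one strand is matched by a transition on the complementary strand, so each of the four base-pair changes listed above classifies the same way on both strands.

```python
PURINES = frozenset("AG")
PYRIMIDINES = frozenset("CT")
VALID_BASES = PURINES | PYRIMIDINES


def substitution_class(before: str, after: str) -> str:
    """Classify a single-base change on one DNA strand.

    Purine <-> purine or pyrimidine <-> pyrimidine is a transition;
    purine <-> pyrimidine is a transversion.
    """
    before, after = before.upper(), after.upper()
    if before not in VALID_BASES or after not in VALID_BASES:
        raise ValueError("bases must be single letters A, C, G or T")
    if before == after:
        raise ValueError("not a substitution")
    pair = {before, after}
    if pair <= PURINES or pair <= PYRIMIDINES:
        return "transition"
    return "transversion"


# The four base-pair changes listed in the text, e.g. A:T -> G:C.
CHANGES = [(("A", "T"), ("G", "C")), (("T", "A"), ("C", "G")),
           (("G", "C"), ("A", "T")), (("C", "G"), ("T", "A"))]
for (b1, c1), (b2, c2) in CHANGES:
    # Classify the change on this strand and on the complementary strand.
    print(f"{b1}:{c1} -> {b2}:{c2}", substitution_class(b1, b2), substitution_class(c1, c2))
```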


See also: Transversion Mutation 
A genome may contain 500 to 50 000 genes that are
translated into the corresponding proteins. In most
genomes the hereditary material is in the form of 
DNA; however, certain viruses contain genomic 
RNA. The process of translation is usually preceded 
by transcription, and the product of translation, a 
protein, is frequently transported into a different com- 
partment of the cell from that where it has been 
synthesized. 


Genetic Code 


The genetic code is the universal dictionary by which 
genetic information is translated into the functional 
machinery of living organisms, the proteins. The words 
or ‘codons’ of the genetic message are three nucleo- 
tides long. Since there are four different nucleotides 
used in messenger RNA (mRNA), this results in a 
dictionary of 64 words. Twenty amino acids are nor-
mally used in proteins. In addition, translation requires
signals for ‘start’ and ‘stop.’ The start codon defines the start of
translation as well as the reading frame (the sequence 
of nucleotide triplets) that is to be translated. The start 
or initiator codon is identical to the methionine codon. 
Special mechanisms are used to identify the correct 
initiation site; in addition there are three stop codons. 
Thus 61 codons are available for 20 amino acids, and 
hence the genetic code is degenerate. In the case of 
leucine, serine, and arginine, there are as many as six 
codons, whereas methionine and tryptophan have only 
one codon. 
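The arithmetic of the code (64 codons, 3 stop codons, and 61 sense codons for 20 amino acids) can be checked directly from a codon table. This sketch assumes the standard nuclear code, encoded here in the compact one-letter form used by NCBI translation table 1 (bases in the order U, C, A, G, with the third position varying fastest and ‘*’ marking a stop); it does not cover mitochondrial variants.

```python
from collections import Counter
from itertools import product

BASES = "UCAG"
# One letter per codon, first base varying slowest and third fastest ("*" = stop).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

stop_codons = sorted(c for c, aa in CODE.items() if aa == "*")
sense_codons = {c: aa for c, aa in CODE.items() if aa != "*"}
codons_per_amino_acid = Counter(sense_codons.values())

print(len(CODE), "codons; stops:", stop_codons)
print(len(sense_codons), "sense codons for", len(codons_per_amino_acid), "amino acids")
# Leucine (L), serine (S), arginine (R), methionine (M), tryptophan (W).
print({aa: codons_per_amino_acid[aa] for aa in "LSRMW"})
print("AUG ->", CODE["AUG"])
```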




The universal genetic code deviates slightly in mito-
chondria, where a few codons are translated in alter-
native ways. The most common deviations involve
methionine and tryptophan, which have two codons
instead of the usual one. Different organisms use the
degenerate genetic code differently. The usage of
codons is coupled to the availability of tRNAs that
can translate them. Thus codon usage can differ to the
extent that a gene transferred from one organism to
another cannot be translated efficiently unless the new
organism is supplemented with extra tRNAs.


Transcription 


Genomic DNA cannot be translated but has to be 
copied or transcribed into RNA by different RNA 
polymerases. Here the classic base-pairing mechanism
proposed by Watson and Crick applies. One strand of
the double-stranded DNA (the negative strand) is
copied through Watson-Crick base-pairing into a
positive strand of RNA. The process of transcription
is in all cases strongly regulated. Some proteins are
synthesized in large numbers, whereas others are
present in only a few copies per cell. Likewise, some
proteins are synthesized during a brief period in the
life of the cell, whereas others are produced more or
less continuously.
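Template-directed copying by base-pairing can be sketched in a few lines. The example assumes both DNA strands are written 5′ to 3′ and uses the hypothetical names transcribe, RNA_PARTNER, and DNA_PARTNER; because RNA is made antiparallel to its template (the strand called negative above), the template is read in reverse, and the transcript ends up with the same sequence as the nontemplate DNA strand, with U in place of T. Promoters, regulation, and polymerase mechanics are not modeled.

```python
# Base laid down in RNA opposite each template DNA base (Watson-Crick pairing, U for T).
RNA_PARTNER = {"A": "U", "T": "A", "G": "C", "C": "G"}
# Complementary DNA base, used here only to build an example template strand.
DNA_PARTNER = {"A": "T", "T": "A", "G": "C", "C": "G"}


def transcribe(template: str) -> str:
    """Return the RNA, written 5'->3', copied from a DNA template strand written 5'->3'."""
    # RNA grows 5'->3' while the template is read 3'->5', so walk the template backwards.
    return "".join(RNA_PARTNER[base] for base in reversed(template.upper()))


coding = "ATGGCCTAA"  # hypothetical nontemplate strand, 5'->3'
template = "".join(DNA_PARTNER[base] for base in reversed(coding))
rna = transcribe(template)
print(template, "->", rna)
print(rna == coding.replace("T", "U"))  # transcript matches the nontemplate strand
```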

In eukarya, transcription is performed in the nucleus
and the transcript is transported into the cytoplasm to
be translated. Transcription and translation in mito-
chondria and chloroplasts are performed in these
cellular organelles. In the case of eubacteria and
archaea, the whole process is performed in the cyto- 
plasm. The eubacterial transcripts frequently contain 
several genes controlled by one operator; the mRNA 
is polycistronic. 


Processing of Transcribed RNA 


A number of transcribed RNAs are never translated
but function in the cell as RNA. These are primarily
ribosomal RNA (rRNA) and transfer RNA (tRNA).
The transcribed RNA, called the ‘primary transcript,’
frequently has to be processed to become mRNA.
Several different processes are involved; the processes
in eukarya differ from those in eubacteria. The primary
transcripts of eukaryal genes normally contain long or
short regions that are not retained in the mRNA. These
form the so-called introns, while the retained regions
form exons. The splicing machinery removes the
introns by cutting and ligation. Eukaryotic mRNAs are
also modified by the addition of a poly(A) tail at the 3’
end of the message.
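The two eukaryal processing steps just described, intron removal and poly(A) addition, can be sketched as string operations. This is a toy illustration only: the exon coordinates, the example sequence, and the default tail length are hypothetical, and real splicing depends on sequence signals recognized by the splicing machinery rather than on coordinates supplied in advance.

```python
from typing import List, Tuple


def process_transcript(primary: str, exons: List[Tuple[int, int]], tail_length: int = 200) -> str:
    """Join the exons of a primary transcript and add a 3' poly(A) tail.

    `exons` are hypothetical half-open (start, end) coordinates listed 5'->3';
    the stretches between them are treated as introns and dropped.
    `tail_length` is an illustrative default, not a measured value.
    """
    pieces = []
    previous_end = 0
    for start, end in exons:
        if not previous_end <= start < end <= len(primary):
            raise ValueError(f"exon {(start, end)} is out of order or out of range")
        pieces.append(primary[start:end])  # keep the exon; the gap before it was intron
        previous_end = end
    return "".join(pieces) + "A" * tail_length


# Hypothetical two-exon transcript; the intron begins with GU and ends with AG.
exon1, intron, exon2 = "AUGGCC", "GUAAGUCCCCAG", "UUUGGGUAA"
primary = exon1 + intron + exon2
mature = process_transcript(primary, [(0, 6), (18, 27)], tail_length=20)
print(mature)
```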

In eukarya the primary transcripts are also fre- 
quently edited to become mRNAs. This is sometimes 
done by changes of U to C or vice versa. More
extensive editing occurs in the mitochondria of try-
panosomes, where the mRNAs are extensively modi-
fied by large enzymatic particles that use templates
called ‘guide RNAs.’


Reading Frame and Usage of Genetic Code


The initiator AUG codon defines the reading frame
of an mRNA. Translation proceeds from this start in
steps of three nucleotides (one codon). The frequent
occurrence of termination codons out of frame pre-
vents translation in the wrong frame for more than
short stretches. However, some mRNAs need a change
of reading frame for correct translation. This is the
case for the mRNA of Escherichia coli termination or
release factor-2 (RF2). Another departure from stand-
ard decoding, the readthrough of a stop codon,
requires a tRNA that decodes a stop (nonsense) codon
as a sense codon and incorporates a specific amino
acid. Such tRNAs are called ‘suppressor tRNAs.’
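Reading-frame selection can be illustrated with a short translation routine. The sketch assumes the standard code (again in the compact NCBI table 1 form), an uppercase RNA string, and that the first AUG is the start codon; the special initiation-site mechanisms, frameshifting, stop-codon readthrough, and Se-Cys insertion described in this entry are deliberately not modeled, and the function names are hypothetical. Translating the same example from a different offset shows how a change of frame alters every downstream codon.

```python
from itertools import product

BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate_frame(mrna: str, offset: int) -> str:
    """Translate codon by codon from `offset` until a stop codon or the end of the mRNA."""
    protein = []
    for i in range(offset, len(mrna) - 2, 3):
        amino_acid = CODE[mrna[i:i + 3]]
        if amino_acid == "*":  # stop codon: termination
            break
        protein.append(amino_acid)
    return "".join(protein)


def translate_from_start(mrna: str) -> str:
    """Use the first AUG to fix the reading frame, then translate in that frame."""
    start = mrna.find("AUG")
    return "" if start == -1 else translate_frame(mrna, start)


mrna = "GGAUGGCCUUUUAAGC"  # hypothetical mRNA with its AUG at index 2
print(translate_from_start(mrna))
print(translate_frame(mrna, 0))  # the same mRNA read in a different frame
```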

A few proteins in eubacteria and eukarya contain
selenocysteine (Se-Cys). Unlike most other nonstand-
ard amino acids, it is not introduced by a posttransla-
tional modification; instead, Se-Cys is incorporated
during translation in response to one of the stop
codons, UGA. The mechanism involves a special
tRNA (tRNA^Sec) that reads the stop codon. A set of
enzymes has specific functions in this system. One of
them is a special form of elongation factor Tu called
SelB, which uniquely binds tRNA^Sec. SelB has the
property of identifying a specific secondary structure
of the mRNA lying immediately downstream of the
stop codon that corresponds to Se-Cys. This leads to
the suppression of the stop codon and the incorpor-
ation of Se-Cys.


Translation on Ribosomes 


The process of translation occurs on the ribosome in 
the cytoplasm or in the cellular organelles, mitochon- 
dria and chloroplasts. The ribosome is a complex of a 
few large rRNA molecules and between 50 and 90 
different proteins. It is made of two subunits (large 
and small) with different functions that dissociate 
from each other during part of the process. Translation 
is traditionally divided into three steps: initiation, 
elongation, and termination. A fourth step, ribosome 
recycling, also belongs to the process. Soluble protein 
factors catalyze the process by binding to the ribo- 
some transiently. More than 10 factors participate in 
eubacterial translation, whereas a considerably larger 
number participate in eukaryal translation. 


Initiation 
Translation is initiated by the binding of a messenger 
RNA (mRNA) to the ribosomal small subunit. In
this process, the initiation (methionine) codon is
selected and bound at the ribosomal decoding site. 
Subsequently the initiator methionyl-tRNA and the 
large ribosomal subunit are bound to this initiation 
complex. Eubacterial initiation is stimulated by 
three initiation factors, IF-1, IF-2, and IF-3. In eukary- 
otes a much larger number of initiation factors 
participate. 


Elongation 

In each cycle of elongation, one amino acid is in- 
corporated into the nascent peptide. There are three 
elongation factors in eubacteria, which catalyze two 
of the basic steps in translation: the binding of an 
aminoacyl-tRNA to the A-site and the translocation 
of the peptidyl-tRNA from the A-site to the P-site. 
During translocation the tRNAs and the mRNA are 
moved to expose the next codon in the ribosomal 
A-site. However, the central event in elongation, pep-
tidyl transfer, is a process for which no protein
factor is needed.

The recognition of the codon by the anticodon of 
the tRNA is a process that is done in several steps. In 
the initial selection, the anticodon of the aminoacyl- 
tRNA in complex with elongation factor Tu (EF-Tu) 
and GTP is matched against the codon in the A-site of 
the ribosome. In the case of a good match, the ribo- 
some induces EF-Tu to hydrolyze its bound GTP to 
GDP and phosphate. The EF-Tu:GDP complex has a 
conformation that has low affinity for the aminoacyl- 
tRNA and the ribosome; accordingly it dissociates. 
The aminoacyl moiety of the tRNA, when bound to 
EF-Tu, is located far from the peptidyl transfer center 
and, after the dissociation of EF-Tu, has to reorient
itself into the A-site of the ribosome, while retaining 
the interaction with its codon. This process coincides 
with the proofreading of the anticodon of the tRNA 
by the codon of the mRNA. An incorrect (noncog- 
nate) match of the anticodon to the codon increases 
the likelihood that the aminoacyl-tRNA will dissoci- 
ate before its amino acid has reached the peptidyl 
transfer site of the ribosome. 

Peptidyl transfer is catalyzed by the rRNA of the 
large subunit without direct assistance of ribosomal 
proteins or translation factors. A completely con-
served nucleotide, A2451 of the E. coli 23S rRNA,
has been proposed to serve as a general base during
peptide bond formation. Once the aminoacyl moiety
reaches the A-site of the peptidyl transfer center, the
peptide on the peptidyl-tRNA in the P-site can be
transferred to it. This leads to a peptidyl-tRNA in the
A-site and a deacylated tRNA in the P-site.

The final step of elongation is the translocation of 
the peptidyl-tRNA from the A-site to the P-site and 
the movement of the mRNA by three nucleotides so 
that the next codon is exposed in the A-site. EF-G,
which catalyzes this process, binds to the ribosome in
complex with GTP. After translocation is performed,
it dissociates in complex with GDP. A surprising find-
ing is that the ternary complex of EF-Tu with GTP
and aminoacyl-tRNA closely resembles EF-G in shape. It
remains possible that EF-G, when it dissociates from 
the ribosome, leaves an imprint into which this tern- 
ary complex fits nicely. 


Termination 

The termination of protein synthesis depends on the 
exposure of one of the three stop codons (UAA, 
UAG, and UGA) in the decoding part of the A-site. 
In eubacteria two release factors, RF1 and RF2, par-
ticipate in decoding the stop codons (RF1 recognizes
UAA and UAG, RF2 recognizes UAA and UGA) and
in hydrolyzing the completed peptide from the P-site
tRNA. In eukarya
they correspond to a single decoding factor, eRF1. 
The crystal structure of eRF1 indicates that these 
factors may perform their function by mimicking 
tRNA. The termination factor RF3 in all cases cata- 
lyzes the dissociation of the decoding factors from the 
ribosome. 


Ribosome Recycling 

The ribosome-recycling factor (RRF) removes mRNA 
from the ribosome so that the ribosome is available to 
synthesize new protein from new mRNAs. It per- 
forms this role together with EF-G. An amazing 
observation is that RRF also closely mimics tRNA. 
This suggests that RRF binds to a tRNA binding site, 
possibly the A-site, and is translocated from it by 
EF-G. This then leads to the dissociation of the 
mRNA from the ribosome and the subunits from 
each other. 


Inhibition of Translation 

Translation can be inhibited by a large group of
antibiotics. Translational repression also
occurs and has been observed for some eubacterial 
ribosomal proteins. When these specific proteins 
have been synthesized in excess over rRNA they 
bind to a specific region of their own polycistronic 
mRNA and prevent further synthesis of any of the 
proteins encoded by this mRNA. 


Transport of Product 


The translated protein is frequently targeted to an- 
other compartment of the cell distinct from the site 
of synthesis. This is the case for membrane proteins in 
general, but certain proteins, functioning in different 
cellular compartments, are synthesized in the cyto- 
plasm and subsequently transported to their final des- 
tination by various different transport systems. One 
of the best studied involves the signal recognition
particle (SRP), which sorts proteins according to
their different destinations. The protein to be trans-
ported has an N-terminal sequence of amino acids
(tag) that is recognized by SRP, which assists the pro-
tein in passing through the cytoplasmic membrane.
The SRP is composed of an RNA molecule and a
number of proteins.


Molecular System of Translation 


The ribosome is essential to the process of translation. 
In translation, different molecules bind to the ribo- 
some and proteins are produced. mRNA is vital, 
because it contains the message to be translated. The 
tRNA molecules are also absolutely essential compon-
ents of the system, since they are the ultimate tools of
translation. They are the adaptors that at one end
read the codons of the message and at the other end
carry the amino acid that corresponds to the codon,
which is incorporated into the growing polypeptide
chain. The factors participating in translation, which
differ in number depending on the type of cell, catalyze
the different steps of translation. In their absence, the
process becomes so slow that the life of the cell is
impossible. It is interesting that several of the protein
factors, EF-G, RRF, and eRF1, imitate tRNA. This
suggests that they bind to a tRNA binding site on the
ribosome to perform their catalytic function.

The ribosome itself has a number of functional 
sites, primarily the decoding site and the peptidyl 
transfer sites. It is remarkable that these sites are 
made up of RNA without any direct participation of 
ribosomal or factor proteins. Thus the ribosome is a 
ribozyme. This observation, together with the fact that 
tRNA and mRNA are the only additional essential 
components in translation, suggests that the early 
translation system could have been constructed 
entirely of RNA. This is consistent with the idea that 
the prebiotic world was an RNA world. 
Regulation of expression of a protein-encoding gene
can occur at several molecular levels between the 
DNA and the accumulation of a functional protein: 
transcription, mRNA processing and stability, trans- 
lation (ribosome-dependent, mRNA-programmed 
protein synthesis), and protein modification and turn-
over. Translational control usually refers to regulation
of expression at the level of translation. Strictly, it
means modulation of the efficiency of
mRNA translation at the initiation, elongation, or 
termination stages of polypeptide synthesis, although 
it is sometimes used more broadly to include trans- 
lation-coupled regulation of mRNA stability. Trans- 
lational control allows cells to respond rapidly to 
changes in physiological conditions. This is especially 
important in organisms with a nuclear barrier between 
the sites of mRNA synthesis and translation, as well as 
a considerable time lag due to the interval between 
activation of nuclear pathways for the synthesis of 
mRNA and its transport to the cytoplasm. Because 
it involves virtually instantaneous recruitment and 
action of regulatory macromolecules, translational 
control is particularly well suited to regulation of the 
structural and temporal aspects of cell proliferation, 
developmental processes, and cell differentiation, 
and for integrating the various metabolic pathways 
in the cell. A distinction can be made between ‘global’ 
and ‘selective’ translational control. The former 
affects the entire population of mRNAs in a cell, 
switching their translation on or off or modulating it 
by degrees in unison. This type is usually achieved by 
adjustments in the activity of general components 
of the protein synthetic machinery acting in a non- 
specific manner. By contrast, selective controls affect a 
subset of the mRNAs in a cell, sometimes even just a 
single species. 

Cis-acting elements of mRNA often interact with 
trans-acting molecules (mostly, but not exclusively, 
proteins) to achieve activation or deactivation of 
mRNA translation. For example, the iron-responsive
element (IRE), a sequence- and structure-specific
negative regulatory element, is found in the 5’-
untranslated region (5’UTR) of the mRNA for
ferritin, the iron storage protein. The IRE regulates
translation of ferritin mRNA in accordance with
changes in the level of cellular iron. This regulation is
mediated by a trans-acting iron regulatory protein
(IRP) that binds to the IRE. At low iron concentra-
tion, IRP binds to the 5’UTR of the ferritin mRNA
and prevents ferritin translation; at higher iron levels,
the IRP is saturated with iron and falls off the ferritin
mRNA. Release of the IRP from the ferritin mRNA
leads to efficient translation of the mRNA. Transla-
tion of mRNAs for ribosomal proteins and Drosophila
sperm tail proteins is also regulated by 5’UTRs.

In one mechanism for repressing translation, 
termed ‘translational masking,’ the RNA is seques- 
tered into translationally silent mRNP particles. 
Translational masking was first described for mRNAs
that are stored during gametogenesis for use at a later
stage in development. Several
‘masking’ phosphoproteins bind to the mRNA with 
relatively little sequence specificity and, by preventing 
mobilization of the mRNA to ribosomes, inhibit 
translation. Specific sequences in the 3’-untranslated 
region (3’ UTR) are necessary for unmasking at spe- 
cific stages of development. 

The length of an mRNA’s poly(A) tail seems to be 
important for modulating the level of its translation. 
During oogenesis and embryogenesis, the translational 
activity of specific mRNAs is regulated by poly(A) 
length, and sequence elements in the 3’UTR have
been shown to control polyadenylation. For some 
mRNAs, a long poly(A) tail activates translation but 
a short one does not. The 3’UTR of some developmen- 
tally regulated genes is also known to be involved in 
repression of mRNA translation by interaction with 
trans-acting factors, as, for example, in the case of the 
15-lipoxygenase (Lox) gene. The Lox protein is 
required for the breakdown of internal membranes in 
mature reticulocytes. Lox RNA is synthesized in bone 
marrow, but translation of the RNA is repressed until 
reticulocytes reach the blood. The repression is 
achieved by binding of a 48-kDa protein specifically 
to a repeat motif in the 3'UTR of Lox mRNA. 

A novel mechanism of translational regulation by 
3'UTR sequences is displayed by the interaction of the 
lin-14 and lin-4 genes, which regulate the timing of 
developmental events in Caenorhabditis elegans. Sur-
prisingly, lin-4 encodes not a protein but two small
RNA transcripts with partial complementarity to
sequences within the lin-14 mRNA 3’UTR. Transla-
tional repression of lin-14 mRNA may be caused by
the binding of the lin-4 antisense RNAs to a repeated
sequence motif in the 3'UTR of the lin-14 mRNA. 
The majority of known cases of translational con-
trol operate at the initiation stage of polypeptide
synthesis, but examples are known for both the elonga-
tion and termination stages. Particularly notable are
mechanisms involving frameshifting to produce longer
or shorter proteins for specific functions and termin-
ation codon readthrough to produce a longer protein
with a specific function. Finally, it should be men-
tioned that translational control of transcription is
observed in many cases of transcription attenuation,
where the termination or continuation of transcription
by RNA polymerase is determined by the progression
and positioning of the ribosome while it translates a
leader region of the nascent RNA transcript.


See also: Gene Expression; Translation 


Translocation 


C V Beechey and A G Searle 


Copyright © 2001 Academic Press 
doi: 10.1006/rwgn.2001.1310 


Translocation is the transfer of a length of genomic 
material to a new site, either in a chromosome or 
chromatid. It is, therefore, a type of ‘structural aberra-
tion.’ When within a chromosome (intrachange) it is
generally known as a ‘shift.’ When between chromo- 
somes (interchange) it may be a ‘reciprocal trans- 
location’ or a nonreciprocal ‘insertion’ of material. 
Reciprocal translocations are symmetrical when the 
centric part of one broken chromosome combines 
with the acentric part of the other, or asymmetrical 
when the centric parts combine with each other, as do 
the acentric parts. A symmetrical exchange is usually 
viable, but an asymmetrical one, with its dicentric and 
acentric products, is usually cell-lethal. In a ‘whole- 
arm translocation,’ all or nearly all of a chromosome 
arm from a metacentric is interchanged with that from 
a telocentric or another metacentric. Closely related 
to this is the ‘Robertsonian translocation,’ or ‘centric 
fusion,’ in which the long arms of two acrocentrics 
unite to form a metacentric, with loss of a centric 
fragment. The reverse process of ‘centric fission,’ or 
dissociation, leads to the formation of two acro- 
centrics from one metacentric. Each is important in 
evolution, as is reciprocal translocation. 


Reciprocal Translocation (Symbol T in Mouse, t in Humans)


Induction in Male Germ Cells 
Reciprocal translocations were first studied in Dros- 
ophila by H.J. Muller and coworkers, who induced 
them by X-irradiation of spermatozoa. This has been
done in the mouse too, where work has concentrated 
on products of treated spermatogonial stem cells, 
namely, primary spermatocytes and subsequent pro- 
geny. Translocations have also been studied in many 
other animals and plants. 


Meiotic Effects 

Translocations are revealed in spermatocytes at dia- 
kinesis/metaphase I by their characteristic multivalent 
configurations, typically rings or chains of four elem- 
ents (because of the association between nonhomo- 
logous chromosomes) among the normal bivalents 
(see Figure 1). F1 progeny that carry a translocation
generally show ‘semi-sterility’ on mating to normal
mice: litter size is approximately halved because about
half the gametes produced by a translocation hetero-
zygote will be unbalanced (see Figure 2), with death
in utero of resultant zygotes. This chromosomal
imbalance may arise from normal disjunction of the
adjacent-1 type (alternate disjunction gives balanced
gametes), which leads to duplications and deficiencies
of segments distal to the translocation breakpoint, or
from the rarer adjacent-2 disjunction, in which prox-
imal (centric) regions are duplicated and deficient in
the gamete.
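The balance of the gametes shown in Figure 2 can be checked by bookkeeping of chromosome segments. In this sketch each chromosome is a pair (centric segment, distal segment), following the figure's labels: normal chromosomes AB and CD, and translocated chromosomes AD and CB after exchange of the distal segments B and D. Crossing over within the quadrivalent is ignored, and the function names are hypothetical.

```python
from collections import Counter

# Chromosomes as (centric segment, distal segment), using the labels of Figure 2.
AB, CD = ("A", "B"), ("C", "D")  # normal chromosomes
AD, CB = ("A", "D"), ("C", "B")  # reciprocal translocation products

# 2:2 disjunction from the quadrivalent: the two chromosomes sent to each pole.
SEGREGATIONS = {
    "alternate": ((AB, CD), (AD, CB)),
    "adjacent-1": ((AB, CB), (AD, CD)),  # non-homologous centromeres (A, C) together
    "adjacent-2": ((AB, AD), (CD, CB)),  # homologous centromeres (A, A or C, C) together
}


def segment_dosage(*chromosomes):
    """Count how many copies of each segment a set of chromosomes carries."""
    return Counter(segment for chromosome in chromosomes for segment in chromosome)


def is_balanced(gamete):
    """A haploid gamete is balanced if it carries each of A, B, C, D exactly once."""
    return segment_dosage(*gamete) == Counter("ABCD")


for mode, gametes in SEGREGATIONS.items():
    print(mode, [is_balanced(g) for g in gametes])

# Figure 2 note: two complementary unbalanced gametes (AB.CB from one parent and
# AD.CD from the other) give a zygote with every segment present twice.
zygote = segment_dosage(AB, CB, AD, CD)
print("AB.CB x AD.CD zygote balanced:", zygote == Counter("ABCD" * 2))
```

If alternate and adjacent-1 disjunction are roughly equally frequent and adjacent-2 is rare, about half of the gametes come out unbalanced, which is the basis of the semi-sterility described above.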


X-Autosome Translocations 

In male mice, translocations between an autosome and 
a sex chromosome cause sterility of carriers through 
breakdown of spermatogenesis. Female carriers of 
X-autosome translocations have small litters due to 
‘semi-sterility,’ although there could also be effects on 
oogenesis. Some X-autosome translocations in which
autosomal coat color genes are moved on to the X
have thrown light on X-inactivation. If this process
is random, cells in which the normal X is inactivated
will express the autosomal color gene, but if the trans-
located X is affected, inactivation may spread into the
autosomal segment to suppress expression of the color
gene and produce variegation. In most X-autosome
translocations inactivation is random, but in T(X;16)
16H the normal X is inactive in all cells. This has led
to its extensive use in studies on sex determination.


Marker Chromosomes 


Meiotic effects 

Some autosomal translocations are male-sterile in the 
mouse through defective spermatogenesis, especially 
when one breakpoint is near the centromere and the 
other is distal, so that long and short ‘marker chromo- 
somes’ tend to be formed. This may lead to failures 
of pairing at meiosis, to spermatogenic breakdown, 
and to the production by fertile females of tertiary 
(or partial) trisomic and monosomic zygotes, which 
sometimes survive. 


Experimental use 

Translocations which give long and/or short chromo- 
some markers in somatic cells and which are fertile 
have been used extensively in transplantation and 
chimerism research since no special pretreatment is 
needed to reveal the transplanted cells, e.g., the very 
short T6 marker in the translocation T(14;15)6Ca (see
Figure 1). Unequal translocations have also proved
useful in the fine mapping of loci by in situ hybridiza- 
tion since probe signals can easily be detected as being 
distal or proximal to the breakpoint. 


Figure 1 Quadrivalents observed amongst bivalents in metaphase preparations from male mice heterozygous for reciprocal translocations. (A) Ring of four chromosomes (arrowed). (B) Chain of four chromosomes from a mouse carrying the small T6Ca chromosome (arrowed) referred to in the section ‘Experimental use.’ In both preparations the unequal X/Y bivalent is shown with an arrowhead marking the junction of the large X and small Y chromosomes. (Photograph courtesy of E.P. Evans.)
Figure 2 Gametic products of a reciprocal translocation (RT). (1) Initial lesions in nonhomologous chromosomes (may occur at diploid or haploid stage); (2) exchange of distal regions to give RT; (3) resultant cross-configuration at meiotic pachytene which gives characteristic rings or chains at diakinesis/metaphase I; (4) haploid products of normal disjunction, half unbalanced; (5) products of rarer nondisjunction, all unbalanced. Note that if an unbalanced gamete AB.CB combines with another unbalanced gamete AD.CD then a balanced zygote heterozygous for the translocation results, but it has uniparental partial disomy (see section ‘Genomic Imprinting’) for the chromosomes concerned.


Locating Breakpoints on Linkage Maps 
Translocation breakpoints can be mapped because the 
heterozygote behaves as if it had a dominant gene 
for semi-sterility at that point. Some translocations 
have a phenotypic effect associated with a breakpoint. 
Thus T(2;8)26H in the mouse is completely linked to
the non-agouti locus with a distinctive dark agouti 
phenotype in the homozygote. With some transloca- 
tions the homozygote is lethal. 


Genomic Imprinting 

Offspring with both copies of a specific chromosome 
region derived from one parent only (uniparental par- 
tial disomy) can be generated by intercrossing hetero- 
zygotes for reciprocal translocations (see Figure 2). 
Such offspring can be recognized by the use of marker 
genes on the chromosome regions concerned. How-
ever, for certain chromosome regions and transloca-
tion breakpoints this expected complementation of
unbalanced gametes to form normal offspring results
in abnormalities instead. This is one manifestation of
genomic imprinting, which means that, for certain
genetic factors, passage through both parental germ-
lines is necessary for normal development. Work
with many mouse translocations (both reciprocal and
Robertsonian) has helped us to understand this phe-
nomenon and the chromosome regions involved.


Insertion (Symbol Is in Mouse) 


These seem to be rarer than reciprocal translocations, 
though some apparent duplications could really be
insertions, as demonstrated by chromosome painting
techniques. Only three have been described in the
laboratory mouse, of which the most interesting is
Cattanach’s translocation, an inverted insertion of
about one-third of chromosome 7 into the X chromo-
some, with symbol Is(In7;X)1Ct. This has facili-
tated studies on X-inactivation, imprinting, and gene
dosage.




Shift (Transposition) 


Only one chromosomal shift has been described in the 
mouse. Otherwise known as sex-reversed (Sxr) this 
causes a part of the proximal end of the Y chromosome 
to move to its distal end. Its effects include the partial 
sex reversal of XX and XO females to males with small 
testes. It has proved useful in studies on sex determin- 
ation. 


Robertsonian Translocation (Symbol Rb in Mouse)


Nondisjunctional Effects 
There is no evidence for the induction of Robert- 
sonian translocations by ionizing radiation but they 
are a common type of aberration in animals, espe-
cially mammals. Because their heterozygotes tend to 
undergo nondisjunction, they lead to the formation of 
monosomic and trisomic zygotes (see Figure 3). All 
the former and most of the latter die in utero, but in 
humans and mice some trisomics survive beyond 
birth, usually with severe abnormalities. Thus hetero- 
zygotes for a Robertsonian involving human chromo- 
some 21 may have offspring with trisomy 21, which 
causes Down syndrome. Since mouse chromosome 16 
shows extensive genetic homology with human 21, 
Robertsonians for the former have been used to gen- 
erate mice trisomic for 16 as models for this syndrome. 
Higher rates of nondisjunction are found in mice 
having two Robertsonians with an arm in common, 
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Figure 3 Robertsonian translocation and its gametic consequences. (I) Initial centric lesions in nonhomologous 
acrocentric chromosomes AB and CD; (2) formation of metacentric with loss of small product (genetically inert); (3) 
haploid balanced gametes from normal disjunction in Robertsonian heterozygote: |:| ratio of gametes with both 
acrocentrics or the metacentric alone; (4) trisomic unbalanced gametes from nondisjunction, with two copies of 
chromosome arm AB or of chromosome arm CD; (5) monosomic unbalanced gametes, with absence of 
chromosomes AB or CD. Note that in intercrosses of Robertsonian heterozygotes a trisomic unbalanced gamete 
(shown in 4) may combine with a monosomic one (shown in 5) to give a balanced zygote heterozygous for the 
translocation but with uniparental disomy for AB or CD (see section ‘Use for Imprinting Studies’). 
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i.e., monobrachial homology; a mouse stock with tri- 
brachial homology generates 100% nondisjunction 
(Beechey and Searle, 1991). 


Use for Imprinting Studies 

When Robertsonian heterozygotes are intercrossed, complementation of unbalanced gametes can result in offspring with both copies of a specific chromosome inherited from the same parent (uniparental disomy). As with reciprocal translocations, abnormal phenotypes result from uniparental disomy for certain chromosomes.


Robertsonian Translocations in Wild Mammals


Mice 

Robertsonian variation is studied in many wild and domestic mammals, particularly mice and shrews. The rate of chromosomal evolution in both species is high. The known distribution of Robertsonian races in the mouse stretches from North Africa to Scotland but is concentrated in mountainous regions around Switzerland, home of the tobacco mouse (Gropp and Winking, 1981). The tobacco mouse is homozygous for seven centric fusions, so it has a diploid complement of 14 metacentric and 12 acrocentric chromosomes, in contrast to the 40 acrocentrics found in the house mouse. Hybrids between tobacco and laboratory mice have reduced fertility, with high but variable frequencies of nondisjunction, but these Robertsonians have been transferred successfully into laboratory strains.
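The chromosome counts given for the tobacco mouse follow from simple bookkeeping: each homozygous centric fusion replaces four acrocentrics (two copies each of two nonhomologous chromosomes) with two metacentrics, while the number of chromosome arms stays the same. A minimal Python sketch, assuming an all-acrocentric starting karyotype and fusions that are all homozygous; the function name is illustrative only:

```python
def robertsonian_karyotype(acrocentric_2n: int, homozygous_fusions: int) -> dict[str, int]:
    """Chromosome counts after homozygous centric (Robertsonian) fusions.

    Assumes the starting karyotype is entirely acrocentric and that each fusion
    joins two different acrocentrics and is present on both homologs, so every
    fusion turns 4 acrocentrics into 2 metacentrics.
    """
    if homozygous_fusions * 4 > acrocentric_2n:
        raise ValueError("not enough acrocentrics for that many fusions")
    metacentrics = 2 * homozygous_fusions
    acrocentrics = acrocentric_2n - 4 * homozygous_fusions
    return {
        "metacentrics": metacentrics,
        "acrocentrics": acrocentrics,
        "diploid_number": metacentrics + acrocentrics,
        # Arms are conserved: a metacentric has two arms, an acrocentric one.
        "chromosome_arms": 2 * metacentrics + acrocentrics,
    }

# House mouse (2n = 40, all acrocentric) versus seven homozygous fusions.
print(robertsonian_karyotype(40, 7))
```

Seven fusions reduce 2n from 40 to 26 (14 metacentrics and 12 acrocentrics), while all 40 chromosome arms are conserved.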


Shrews 

In the common shrew (Sorex araneus) 52 karyotypic 
races, differing in the arrangements of acrocentrics 
and metacentrics, have now been described. Apparent- 
ly they are all derived through Robertsonian fusions 
from an all-acrocentric ancestral stock, although 
whole-arm translocation may be responsible for some 
metacentric arrangements (Searle et al., 1990). They 
have been used mainly to study evolutionary problems, 
such as the links between chromosomal variation and 
speciation and the significance of hybrid zones. 
Transmissible spongiform encephalopathies (TSEs) comprise an unusual group of invariably fatal neurodegenerative diseases which affect both humans and animals. They are unusual in that they are not transmitted
by conventional microorganisms such as viruses. The 
infective agent is believed to be a normal host protein, 
the prion protein, which has undergone a posttransla- 
tional modification that changes the conformation of 
the protein and renders it resistant to enzyme degrad- 
ation. Natural transmission of the protease-resistant 
form of the prion protein in food, or its inoculation 
parenterally, for example in contaminated hormones, 
are known routes of infection. Mutation of the prion 
protein gene can lead to familial forms of TSE in 
humans and it has been shown that in these cases 
transmission to experimental animals can occur by 
inoculation of infected brain material from deceased 
family members. The disease is thus unusual in that it 
is both genetic and infective. Another important char- 
acteristic is that the incubation period after infection is 
measured in years rather than days. 
TSEs affect the central nervous system and produce characteristic neuropathological features, including vacuolation of the gray matter, loss of neurons, astrocytosis, and the occurrence of amyloid plaques composed largely of abnormal prion protein. There is no evidence of any immune response, which would be expected if the disease were due to a viral pathogen or to an autoimmune reaction. Treatments that inactivate viruses, such as heat, irradiation, formaldehyde, and DNA and RNA nucleases, fail to inactivate TSE agents.

The following lists the various TSEs that have been recognized in animals and humans. In animals:

Scrapie in sheep and goats.
Chronic wasting disease in mule deer and elk.
Transmissible mink encephalopathy in farmed mink.
Bovine spongiform encephalopathy (BSE) in cattle, exotic ungulates, and carnivores, including cats.

In humans:

Sporadic Creutzfeldt-Jakob disease (CJD).
Kuru (see Kuru).
Iatrogenic CJD (from transplants and hormones).
Gerstmann-Sträussler-Scheinker (GSS) syndrome.
Fatal familial insomnia.
Variant CJD (from BSE).
See also: Kuru; Spongiform Encephalopathies (Transmissible), Genetic Aspects of
Transmission genetics is often referred to as Mendelian
genetics or classical genetics. Genetic crosses and 
pedigrees are studied in transmission genetics in order 
to reveal the mode of inheritance for a trait. The mode 
of inheritance describes whether a gene influencing a 
trait is dominant or recessive, and if the trait is trans- 
mitted in an autosomal, sex-linked, or maternal fash- 
ion. The mode of inheritance is most easily identified 
for traits that are determined by the action of a single 
gene, which are also known as Mendelian traits. In 
contrast, polygenic traits reflect the combined activ- 
ities of more than one gene, and the inheritance of 
these traits is more difficult to trace. An extremely 
important consideration in transmission genetics is to sort out the degree to which the variation in an observed trait (phenotype) is due to the genetic constitution of an individual (genotype) and how much is attributable to influences in the environment.

In the past century, several major achievements in 
genetics blossomed from work in transmission genet- 
ics. These successes include the discovery that genes are 
the unit of inheritance, that genes reside on chromo- 
somes, and that mutation in the DNA sequences of 
genes can alter their function. The tools and concepts of transmission genetics are used today in a wide range of applications, including agriculture, biotechnology, medicine, genetic mapping, genetic counseling, and evolutionary studies.


Selective Breeding 


Domestication of Animals 

In 1906, 22 years after the death of Gregor Mendel, 
William Bateson offered the name ‘genetics’ for the 
new branch of biology concerned with the study of 
heredity. Despite its recent scientific origin, work in transmission genetics began in prehistoric times with the development of agriculture. As early as 8000 BC humans observed the transmission of biological traits from parent to offspring as they bred and domesticated animals such as horses, oxen, and camels. Heredity was important to early breeders and had a major impact on the value of their livestock. For example, a prize-winning ram was priceless if its advantages were passed on to its offspring, but would have little value if they were not. Without an understanding of the mechanism of heredity it was difficult for breeders to predict the outcomes of matings accurately. As a result, many early breeding methods involved a curious mixture of folklore, magic, and science.


Acquired Traits 

The story of Jacob in the book of Genesis (30: 27-42) 
is an example of the use of folklore and magic in the 
breeding of animals. Jacob desired to increase the 
number of rare speckled and spotted animals in his 
father-in-law’s flock, in order to keep these animals 
for his own profit. He tried to improve his chances for 
success by using his own form of magic to influence 
the coat color patterns in offspring. To do this he 
displayed wood rods decorated with black and 
white speckled patterns at the watering hole where 
the animals mated. According to the story, Jacob’s 
breeding technique resulted in large numbers of 
speckled and spotted animals. The modern genetic 
explanation of Jacob’s success is that many of the 
solid colored animals in Jacob’s flock were carriers of 
recessive alleles for spotting and speckling. Since the 
speckled and spotted coat patterns are recessive traits, 
these traits will appear in offspring of parents that are 
both carriers. 


Up until the twentieth century, the spotted and speckled patterns in Jacob’s flocks would likely have been explained as an acquired trait or character. Support for the inheritance of acquired characteristics waned in the late nineteenth century as a result of the work of
embryologists. They demonstrated the existence of 
both germ cells and somatic cells. Germ cells were 
shown to carry the genetic material transmitted to 
future generations, and this material was not readily 
influenced by environmental or life experiences. 


Cultivation of Plants 

Despite the long history of animal husbandry, the first 
key insights into the mechanism of heredity did not 
come from animal breeding, but rather from botany. 
Mendel’s model for heredity stemmed from his obser- 
vations of dominance, segregation and independent 
assortment in experiments using the garden pea. 
Mendel’s work was in turn the culmination of an 
ancient history of applied genetics in plants. As early 
as 8000 BC, humans in various regions of the world
began cultivating and selectively breeding plants such 
as maize, wheat, barley, rice, and the date palm. While 
some crops of relatively minor importance have been 
domesticated since the Middle Ages, the majority of 
modern crop plants (and domesticated animals) were 
brought under human management during prehistoric 
times. The selective breeding strategy used for crop 
development was not complex, but it was very effect- 
ive. Farmers would collect seeds from those plants 
they liked the most and grow them more frequently 
than plants that were less desirable. The powerful effects of long-term selective breeding on crop development are evident in the common cabbage-like vegetables that are very popular today. These plants include kale, cabbage, kohlrabi, cauliflower, broccoli, and Brussels sprouts. While all of these plants look very different, they are in fact members of the same species, Brassica oleracea.


Plant Hybrids 

The powerful effects of selective breeding that resulted in plants such as the cabbage-like vegetables gradually reach their limit as plants become inbred and genetic variation decreases. One of the greatest challenges to early plant breeders was to introduce new traits into crops in order to improve their productivity. Some crop variants appear spontaneously as a result of mutations, whereas others result from cross-pollination. Artificial pollination of plants has an
ancient history as evidenced by plants such as the 
date palm. The date palm is a type of plant with 
separate male and female individuals in a population. 
Cultivation of the date palm occurred as early as 4000 BC, and ancient Assyrian art dating from 800 BC shows that pollination ceremonies involved a curious combination of science and magic. Despite
this early use of artificial pollination in plant breeding, 
the first systematic study of plant hybrids was not 
carried out until late in the eighteenth century. Cross- 
pollination studies became popular with academics 
and plant breeders in the early nineteenth century 
who were interested in vegetative vigour and potential 
economic significance of hybrid plants. In 1866, 
Mendel’s principles were presented in a paper titled 
‘Experiments in plant hybridization.’ Unfortunately, 
it took 34 years for the scientific community to 
appreciate Mendel’s contribution to biology. Modern 
genetics began with the study of trait transmission and 
the production of plant hybrids. 


Mendelian Inheritance 


Mendelian Traits 

Mendelian genetics displaced several erroneous ideas 
about heredity. Mendel used the common garden pea 
(Pisum sativum) as his experimental organism. Pisum 
proved to be a remarkably tractable organism for 
Mendel’s genetic studies. Pea plants have a relatively 
short generation time when compared to other plants 
such as fruit trees, which were among those plants 
used by Mendel’s predecessors. An additional advan- 
tage of Pisum was the wide range of variable traits 
available for use in cross-pollination studies. Further- 
more, pea plants self-fertilize when left alone. This 
allowed Mendel to produce purebred (homozygous) 
parental strains by self-crossing plants for several 
generations. Mendel studied traits influenced by the 
action of single genes and these traits are now referred 
to as Mendelian traits. The single-gene traits studied 
by Mendel produced easily visible and distinguishable 
phenotypes such as round versus wrinkled seed, yel- 
low versus green seedpod, and tall versus dwarf stem. 
These phenotypes were readily recognizable in 
subsequent generations and were not complicated by 
in-between forms in progeny. Mendel studied the 
transmission of seven different single-gene traits in 
hybridization experiments using purebred parental 
strains of pea plants. 


Reciprocal Crosses 

Since ancient times a major question regarding the 
mechanism of heredity was whether males and females 
have an equal contribution to the traits of offspring. 
Since the golden age of Greece, philosophers and 
scientists debated whether the genetic contribution 
of one sex exceeded the contribution of the other 
sex. Mendel proposed that the genetic contribution of both sexes was equal, and that organisms inherit pairs of unit factors (genes) for each trait, with one member of each pair derived from each parent. Mendel’s most conclusive evidence for an equal genetic contribution from each sex came from his use of reciprocal crosses. For example, Mendel designed crosses in which pollen from a tall plant fertilized a dwarf plant, and vice versa. He obtained the same results with all seven traits in his study and concluded that the transmission of these traits was not sex-dependent.


Traits Do Not Blend 

The blending theory of inheritance was a deeply 
entrenched misconception that was eventually re- 
solved by Mendelian genetics. The blending model 
proposed that maternal and paternal genetic material 
mixed together after fertilization, just like two differ- 
ent colored liquids in a cup. Blending theories gained 
support from observations based on continuously 
varying traits, such as skin color and height, where the physical appearance (phenotype) of offspring was intermediate between those of the parents. Blending theories failed to explain
the behavior of discontinuous traits, or discrete traits, 
that consisted of only two contrasting phenotypes, 
with no intermediate phenotypes between. These dis- 
crete traits were not altered in offspring and could skip 
generations. By studying discrete traits, now known 
as Mendelian traits, Mendel’s work demonstrated that 
genes are stable entities that are inherited in pairs. 
Mendel’s results also showed that in hybrid organisms 
dominant versions of genes, or alleles, could mask the 
presence of recessive alleles. Recessive alleles, such as 
those for speckled coat patterns in Jacob’s flock, are 
therefore hidden in hybrids but are stable and can be 
transmitted to future generations. 


Statistical Approach 

Mendel’s experiments involved 287 crosses of 70 dif- 
ferent purebred plants and used approximately 28 000 
pea plants. One of Mendel’s major contributions to 
genetics was his methodical approach. His approach 
involved setting up clearly defined crosses of plants 
with readily distinguishable variables, and then applying mathematical analysis (statistics) in order to interpret his results. Mendel’s method of counting large numbers of progeny followed by statistical analysis became
widely applied in biological research and helped to 
earn early research in transmission genetics the nick- 
name ‘bean bag’ genetics. Despite earning this un- 
desirable nickname, the statistical analysis of trait 
transmission has been instrumental to the mapping 
of genes, evolutionary studies, and models used for 
population genetics. 
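As an illustration of the kind of statistical test later applied to such counts (Mendel himself did not use it; Pearson's chi-square test dates from 1900), a minimal Python sketch compares an F2 count against a 3:1 expectation. The counts used are Mendel's widely reported F2 totals for round versus wrinkled seeds (5474 : 1850); the function name is illustrative, and 3.841 is the standard 5% critical value for one degree of freedom:

```python
def chi_square_3_to_1(dominant: int, recessive: int) -> float:
    """Pearson chi-square statistic comparing observed F2 counts with a 3:1 ratio."""
    total = dominant + recessive
    observed = (dominant, recessive)
    expected = (0.75 * total, 0.25 * total)
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Standard 5% critical value for 1 degree of freedom (two classes minus one).
CRITICAL_5_PERCENT_DF1 = 3.841

# Mendel's reported F2 counts for round versus wrinkled seeds.
stat = chi_square_3_to_1(5474, 1850)
print(f"chi-square = {stat:.3f}")
print("consistent with 3:1" if stat < CRITICAL_5_PERCENT_DF1 else "deviates from 3:1")
```

A statistic well below 3.841 means the deviation from an exact 3:1 split is no larger than chance alone would commonly produce.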


Monohybrid Cross and Test Cross 
Mendel’s cross-hybridization studies involved purebred plants that differed with regard to a single contrasting trait. Purebred, homozygous parental stocks were crossed, and the offspring of this cross are called F1 hybrids, or monohybrids. In the F1 generation, all of the hybrids resembled the parent with the dominant trait. The genotype of these monohybrid, or heterozygous, plants can be represented as Aa, with the uppercase letter representing the dominant allele and the lowercase letter representing the recessive allele. The F1 hybrid plants were next self-fertilized (Aa × Aa), and this cross is known as a monohybrid cross. In the offspring of monohybrid crosses, the F2 generation, Mendel repeatedly observed a ratio of three plants with the dominant phenotype to one plant with the recessive phenotype (3:1 phenotype ratio). Mendel predicted that the plants with a dominant phenotype in the F2 generation were of mixed genotypes, some being homozygous dominant (genotype AA) and others heterozygous (genotype Aa). In order to determine the genotypes of plants with dominant phenotypes in the F2 generation, Mendel devised the test cross.
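The 3:1 ratio can be recovered by enumerating the four equally likely pairings of gametes from two Aa parents, as in a Punnett square. A minimal Python sketch, assuming single-letter allele symbols with uppercase for the dominant allele; the helper names are illustrative:

```python
from collections import Counter
from itertools import product

def cross(parent1: str, parent2: str) -> Counter:
    """Offspring genotype counts for a one-locus cross such as "Aa" x "Aa".

    Each allele of a parent is assumed to enter a gamete with equal
    probability (segregation), and gametes are assumed to combine at random.
    """
    offspring = Counter()
    for allele1, allele2 in product(parent1, parent2):
        # sorted() puts the uppercase (dominant) allele first: "aA" -> "Aa"
        offspring["".join(sorted(allele1 + allele2))] += 1
    return offspring

def phenotype(genotype: str) -> str:
    # One dominant (uppercase) allele is enough to show the dominant trait.
    return "dominant" if any(a.isupper() for a in genotype) else "recessive"

genotypes = cross("Aa", "Aa")
phenotypes = Counter()
for genotype, n in genotypes.items():
    phenotypes[phenotype(genotype)] += n

print(dict(genotypes))   # 1 AA : 2 Aa : 1 aa
print(dict(phenotypes))  # 3 dominant : 1 recessive
```

The 1 AA : 2 Aa : 1 aa genotype ratio underlies the 3:1 phenotype ratio; two-thirds of the dominant-phenotype F2 plants are Aa, and these are the plants the test cross is designed to distinguish from AA.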

The test cross takes the organism with a dominant 
phenotype but unknown genotype and crosses it to 
a homozygous recessive individual with a known 
genotype aa. In a test cross with a plant of genotype AA, all offspring will have the dominant phenotype and the heterozygous genotype Aa. However, if a plant with genotype Aa is used in a test cross, then 50% of the offspring will have the genotype Aa and display the dominant trait. The other 50% will display the recessive phenotype, since they will have the homozygous recessive genotype aa. Mendel’s test cross method is still used today in breeding procedures with plants and animals in order to determine the genotype of individuals with dominant phenotypes.
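The two possible outcomes of a test cross can be computed directly. A minimal Python sketch, assuming two-letter genotype strings with uppercase for the dominant allele; the function name is illustrative:

```python
from fractions import Fraction
from itertools import product

def fraction_recessive(genotype: str, tester: str = "aa") -> Fraction:
    """Fraction of offspring showing the recessive phenotype in a cross.

    Every pairing of one allele from each parent is treated as equally
    likely, as in a Punnett square; an offspring is recessive only if it
    receives a lowercase allele from both parents.
    """
    pairs = list(product(genotype, tester))
    recessive = sum(1 for a, b in pairs if a.islower() and b.islower())
    return Fraction(recessive, len(pairs))

for unknown in ("AA", "Aa"):
    print(unknown, "x aa ->", fraction_recessive(unknown), "of offspring recessive")
```

Any recessive offspring rules out AA. Because an Aa parent still yields only dominant offspring with probability (1/2)^n among n offspring, several progeny are normally scored before concluding that the tested individual is AA.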


Independent Assortment 

Mendel also studied crosses where he followed the 
segregation of two separate pairs of contrasting traits. 
The initial cross involved two homozygous parents that differed in two different traits, represented by the cross AABB × aabb. The F1 offspring of this cross were dihybrid plants of the genotype AaBb. Mendel performed a dihybrid cross (AaBb × AaBb) and examined the phenotypes and genotypes of F2 plants. He observed that each pair of traits was inherited independently: when the A and B pairs of traits were considered separately, as independent crosses (Aa × Aa and Bb × Bb), each gave a 3:1 ratio of dominant to recessive trait. When considered together in one cross (AaBb × AaBb), the combinations of traits appeared in the phenotype ratio of 9/16 with both dominant traits, 3/16 with one dominant trait, 3/16 with the other, and 1/16 with both recessive traits. This ratio is designated as Mendel’s 9:3:3:1 dihybrid ratio and is based on probability events involving segregation, independent assortment, and random fertilization.
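The 9:3:3:1 ratio can be checked by listing the four gamete types an AaBb plant produces under independent assortment and pairing them in all 16 equally likely ways. A minimal Python sketch, assuming the two loci assort independently (that is, they are unlinked); the names are illustrative:

```python
from collections import Counter
from itertools import product

def gametes(genotype: str) -> list[str]:
    """Gametes of a two-locus genotype such as "AaBb" (one allele per locus)."""
    locus_a, locus_b = genotype[:2], genotype[2:]
    return [a + b for a, b in product(locus_a, locus_b)]

def phenotype(gamete1: str, gamete2: str) -> str:
    # "A_" means at least one dominant A allele; "aa" means homozygous recessive.
    a = "A_" if "A" in (gamete1[0], gamete2[0]) else "aa"
    b = "B_" if "B" in (gamete1[1], gamete2[1]) else "bb"
    return a + b

f1_gametes = gametes("AaBb")  # ["AB", "Ab", "aB", "ab"]
f2 = Counter(phenotype(g1, g2) for g1, g2 in product(f1_gametes, repeat=2))
print(dict(f2))  # A_B_: 9, A_bb: 3, aaB_: 3, aabb: 1
```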


Meiosis 

Within a few years of the rediscovery of Mendel’s 
principles of heredity, the fields of cytology and genet- 
ics were brought together through the work of Walter 
Sutton and Theodor Boveri. The work of these investi- 
gators and others introduced the concept that chromo- 
somes in somatic cells exist as definite pairs. In contrast, 
gametes were found to be haploid and thus contained 
only one member of each chromosome pair. These 
observations were incorporated into a chromosome 
theory of heredity, which states that chromosomes are 
the carriers of genes and serve as the basis for 
Mendelian mechanisms of segregation and independ- 
ent assortment. Both segregation and independent 
assortment are explained by events involving distri- 
bution and sorting of homologous chromosomes 
during meiosis I. Independent assortment occurs 
when homologous pairs of chromosomes randomly 
align during metaphase I of meiosis and segregation 
results from the separation of homologous pairs dur- 
ing anaphase I. 
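Because each homologous pair aligns at metaphase I independently of the others, a cell with n pairs can produce 2^n chromosomally distinct gametes. A minimal Python sketch that lists them, ignoring crossing-over; the maternal/paternal labels are hypothetical:

```python
from itertools import product

def gamete_combinations(n_pairs: int) -> list[tuple[str, ...]]:
    """Every set of chromosomes a gamete can receive from n homologous pairs.

    Pair i has a maternal homolog "M{i}" and a paternal homolog "P{i}". Each
    pair is assumed to align at metaphase I independently of the others, so a
    gamete receives one homolog of each pair in every possible combination.
    Crossing-over is ignored.
    """
    pairs = [(f"M{i}", f"P{i}") for i in range(1, n_pairs + 1)]
    return list(product(*pairs))

for combo in gamete_combinations(2):
    print(combo)

# The count grows as 2**n; for the 23 human chromosome pairs:
print(2 ** 23)
```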


Linkage 

As early as 1903, Sutton and Boveri suggested that there are likely many more genes than there are chromosomes. This indicated that if chromosomes were involved in heredity, then each chromosome would have to contain many genes linked together, like beads on a string. These predictions based on cytology were
supported by studies in transmission genetics. Specif- 
ically, it was observed that in certain dihybrid crosses 
some traits were inherited together and were therefore 
not transmitted according to the law of independent 
assortment. Mendel’s principle of independent assort- 
ment states that the distribution of one pair of genes 
into gametes is independent of the distribution of 
another pair. Genes located on the same chromosome 
are said to be linked genes and tend to be inherited as a 
group. Linkage of genes on chromosomes is usually 
not complete. New combinations of linked genes 
occur as a result of crossing-over during prophase I 
of meiosis. Crossing-over results in a reshuffling of the 
allele repertoire for a particular chromosome and adds 
to the genetic variation of sexually reproducing organisms. The degree of crossing-over between two loci on a single chromosome is roughly proportional to the distance between the two loci. This correlation provided the basis for the construction of the first chromosome maps. Genetic mapping therefore owes its origins to experiments in transmission genetics, or ‘bean bag’ genetics, conducted by two Drosophila geneticists, Thomas H. Morgan and Alfred H. Sturtevant. These
experiments involved scoring the phenotypes of off- 
spring and calculating recombination frequencies of 
linked genes. Along with providing the basis of 
genetic mapping, the work of Sturtevant and Morgan 
also confirmed the chromosomal theory of heredity 
because they established that chromosomes contain 
genes in a linear order, and these genes were the units 
of inheritance observed by Mendel. 
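Recombination frequency is the proportion of offspring carrying a new combination of linked alleles, and by the convention that grew out of this work 1% recombination corresponds to one map unit (centimorgan). A minimal Python sketch; the test-cross counts are hypothetical, invented purely for illustration:

```python
def recombination_frequency(parental: int, recombinant: int) -> float:
    """Fraction of test-cross offspring carrying a recombinant allele combination."""
    total = parental + recombinant
    if total == 0:
        raise ValueError("no offspring scored")
    return recombinant / total

def map_distance_cM(rf: float) -> float:
    # 1% recombination is taken as 1 map unit (centimorgan). This linear
    # conversion is only reliable for short distances, where double
    # crossovers are rare.
    return 100 * rf

# Hypothetical counts: 830 parental-type and 170 recombinant-type offspring.
rf = recombination_frequency(830, 170)
print(f"RF = {rf:.2f}, distance ~ {map_distance_cM(rf):.0f} map units")
```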


Non-Mendelian Inheritance 


Interactions among Alleles 

In the early twentieth century, researchers sought to 
confirm and extend Mendel’s observations of heredity 
in Pisum using different organisms. In addition to the 
observation of genetic linkage, researchers also began 
to encounter other examples of non-Mendelian 
inheritance. Non-Mendelian patterns of inheritance 
were identified when crosses yielded a modified version of Mendel’s 3:1 phenotype ratio in the F2 generation of a monohybrid cross. In some cases the altered phenotype ratios in the F2 generation reflected different types of dominance relationships among the alleles of a gene. This is observed for phenotypes resulting from incomplete dominance and codominance, both of which yield distinctive phenotypes for heterozygous offspring (Aa). Incomplete dominance results in heterozygotes with intermediate phenotypes, as in the case of snapdragons, where crossing parents with red flowers and white flowers produces heterozygous offspring with pink flowers. Codominance occurs when both alleles are fully expressed in the heterozygote, as in the case of the AB blood type (I^A I^B) in humans. Furthermore, the human ABO blood groups represent another deviation from Mendelian simplicity, since there are more than two alleles (A, B, and O)
for this particular trait. Deviations from Mendelian 
inheritance are also observed in traits with phenotypes 
having variable penetrance and expressivity. In these 
cases, individuals with the same allele combination can 
produce different degrees of a phenotype in different 
individuals. An example of a trait that is incompletely 
penetrant is polydactyly, which is a dominant trait 
causing extra fingers and toes. In some cases, the 
dominant trait of polydactyly can skip generations 
due to incomplete penetrance and then reappear in 
future generations. 
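Under incomplete dominance each genotype has its own phenotype, so the F2 of a red × white snapdragon cross shows the genotype ratio directly, as 1 red : 2 pink : 1 white instead of 3:1. A minimal Python sketch; the allele symbols R (red) and r (white) are hypothetical:

```python
from collections import Counter
from itertools import product

# Hypothetical allele symbols: R (red) and r (white). With incomplete
# dominance the heterozygote has its own, intermediate phenotype.
PHENOTYPE = {"RR": "red", "Rr": "pink", "rr": "white"}

def f2_phenotypes(f1: str = "Rr") -> Counter:
    """Phenotype counts from self-fertilizing an F1 heterozygote."""
    counts = Counter()
    for a, b in product(f1, repeat=2):
        genotype = "".join(sorted(a + b))  # "rR" -> "Rr"
        counts[PHENOTYPE[genotype]] += 1
    return counts

print(dict(f2_phenotypes()))  # red 1, pink 2, white 1
```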


Interactions among Genes 

Not long after the rediscovery of Mendel’s work, 
research in transmission genetics revealed that discrete 
traits are often regulated by more than one gene. This indicated that the role of genes in determining phenotype was more sophisticated than Mendel’s
observations had suggested. We know now that in 
reality few phenotypes result from the action of a 
single gene acting alone. Phenotypes result from the 
combined actions of the many genes and their protein 
products. The protein products of many genes func- 
tionally interact in metabolic pathways and regulatory 
processes, or physically interact as structural proteins 
within and between cells. 

Epistasis is a type of functional interaction between nonallelic gene pairs. In epistasis, a homozygous pair of recessive alleles at one locus can override the phenotypic input of a separate locus. Epistasis was first observed for genes that participate as part of a pathway involving many gene products, such as those controlling coat color and patterns in animals. An example of epistasis is albinism. In albinism, the homozygous pair of recessive albino alleles overrides the function of other nonallelic gene pairs involved in pigmentation.
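In a dihybrid cross involving such a gene, the 9:3:3:1 ratio collapses to 9:3:4, because every albino individual looks the same whatever it carries at the other pigmentation locus. A minimal Python sketch; the symbols C/c (pigment present/albino) and B/b (black/brown) follow a common textbook mouse example and are assumptions here, not taken from this article:

```python
from collections import Counter
from itertools import product

def coat_color(g1: str, g2: str) -> str:
    """Phenotype of a zygote formed from two gametes such as "CB" and "cb".

    Assumes: cc is albino regardless of the B locus (recessive epistasis);
    otherwise at least one B allele gives black and bb gives brown.
    """
    if "C" not in (g1[0], g2[0]):
        return "albino"
    return "black" if "B" in (g1[1], g2[1]) else "brown"

gametes = [c + b for c, b in product("Cc", "Bb")]  # CB, Cb, cB, cb
counts = Counter(coat_color(g1, g2) for g1, g2 in product(gametes, repeat=2))
print(dict(counts))  # expected counts: black 9, brown 3, albino 4
```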


Effect of Sex on Phenotype 

Another type of non-Mendelian inheritance involves 
traits that are affected by the sex of an organism. In 
cases of sex-limited traits, expression is exclusively 
limited to one sex. In sex-limited traits, the expression of genes is modified by an individual’s sex hormones.
Examples of sex-limited inheritance include genes 
influencing the heaviness of beard growth in humans 
and genes influencing sex-limited differences in tail 
and neck plumage in domestic fowl. In comparison, 
sex-influenced traits are influenced by the sex of the 
organism, but are not limited to one sex or the other. 
Examples of sex-influenced traits include pattern 
baldness in humans, horn formation in sheep, and 
certain coat patterns in cattle. 


Mendelian Traits in Man 


Mendel’s Laws Are Universal 

Although the mechanisms of heredity were initially 
demonstrated using the garden pea plant, these prin- 
ciples were rapidly confirmed in a variety of different 
organisms. Mendelian inheritance was demonstrated 
in 1902 in poultry and mice. In 1905, W.E. Castle, a 
chief pioneer in genetics, introduced Drosophila as 
an experimental organism for genetic studies. In 
1902, Sir Archibald Garrod made the observation 
that the human disease alkaptonuria was caused by a 
block in a metabolic reaction sequence. Garrod 
hypothesized that the metabolic block was due to a 
congenital deficiency of a specific enzyme, and he 
suggested that the trait appeared to be inherited as a 
recessive trait. Mendelian inheritance was officially extended to humans in 1903, when albinism became the first human trait classified as a Mendelian recessive trait, with normally pigmented skin inherited as a dominant trait. By 1910, other human traits such as brachydactyly (digit malformation) and the ABO blood groups were shown to be genetically determined. Additional examples of dominant and recessive Mendelian traits are listed in Table 1.

Table 1 A representative list of Mendelian traits in humans

Recessive traits                  Dominant traits
Albinism                          Achondroplasia
Alkaptonuria                      Brachydactyly
Ataxia telangiectasia             Ehlers-Danlos syndrome
Cystic fibrosis                   Huntington disease
Duchenne muscular dystrophy       Hypercholesterolemia
Galactosemia                      Marfan syndrome
Hemophilia                        Neurofibromatosis
Lesch-Nyhan syndrome              Osteogenesis imperfecta
Phenylketonuria                   Phenylthiocarbamide (PTC) tasting
Sickle-cell anemia                Porphyria
Straight hairline                 Widow’s peak hairline


Early Genetic Mapping 

Genetic mapping of human traits has its origins in 
1911, with the assignment of the gene resulting in 
colorblindness (a recessive trait) to the X chromosome 
when it was observed that the trait was inherited by sons from mothers who saw colors normally. Other disorders that affected males only were also mapped to
the X chromosome. For X-linked disorders, females 
are protected by a normal copy of the gene on their 
second X chromosome, while males only have one 
copy of the X chromosome. 
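The inheritance pattern behind this mapping, a mother with normal color vision who carries one recessive X-linked allele and a father with normal vision, can be sketched in Python. The allele labels "Xc" and "X+" are hypothetical:

```python
from collections import Counter

MUTANT = "Xc"  # hypothetical label for an X carrying the recessive allele

def x_linked_cross(mother: tuple[str, str], father_x: str) -> tuple[Counter, Counter]:
    """Expected sons and daughters for an X-linked recessive allele.

    Each of the mother's two X chromosomes is assumed equally likely to be
    transmitted. Sons receive the father's Y; daughters receive his X.
    """
    sons, daughters = Counter(), Counter()
    for maternal_x in mother:
        # A son's only X comes from his mother; the Y carries no copy of the gene.
        sons["affected" if maternal_x == MUTANT else "unaffected"] += 1
        # A daughter is affected only if both of her X chromosomes carry the allele.
        copies = [maternal_x, father_x].count(MUTANT)
        daughters[("normal", "carrier", "affected")[copies]] += 1
    return sons, daughters

sons, daughters = x_linked_cross(("X+", MUTANT), "X+")
print("sons:", dict(sons))
print("daughters:", dict(daughters))
```

Half of the sons are expected to be affected and none of the daughters, although half of the daughters are carriers, matching the pattern of affected sons born to mothers with normal color vision.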


Somatic Cell Hybrids 

The other 22 pairs of human chromosomes were vir- 
tually uncharted until late in the 1960s. The first break- 
through in mapping genes to autosomes resulted from 
studies using somatic cell hybrids, which are cell lines made by fusing mouse and human cells and which contain only a few different human chromosomes.
Advances in the 1970s ultimately led to our modern 
methods in gene mapping. The first of these advances 
was the development of specific stains that produced 
banding patterns, making it easier for researchers to 
identify human chromosomes in hybrid cells. 


Genetic Markers and DNA Sequencing 
A major advance in genetic mapping occurred in the 1970s with the development of recombinant DNA techniques. Recombinant DNA technology led to new mapping strategies based on using DNA variations as markers on chromosomes and to the technique of in situ hybridization. DNA sequencing was first introduced in the 1970s, and major advances in the technique, including automated DNA sequencing in the 1980s, allowed the determination of the order of bases in a strand of DNA and ultimately revealed the molecular structure of genes on chromosomes.


Mendelian Trait Database 

Many human disorders, involving some 3500 disease genes, are inherited as simple dominant or recessive Mendelian traits. Most Mendelian disorders are rare, and recessive traits occur more frequently when offspring result from matings between related individuals. Dominant disorders sometimes appear in families with no history of the trait, and these are often cases resulting from spontaneous mutations in the germline of a parent. A database known as Online Mendelian Inheritance in Man (OMIM) contains information on Mendelian traits in humans. The OMIM database contains over 11 000 phenotypes in humans that are presumed to represent traits caused by a single gene. Over 6000 of the entries in the OMIM database represent mapped genetic loci, and this number will likely increase rapidly as work in the human genome project draws to completion.


Modes of Inheritance 


Pedigrees 

Researchers today still use transmission genetics and the methods used by Mendel to study the inheritance of a trait. With many experimental organisms, the first step is to design genetic crosses to study the inheritance of a phenotype. The results are most significant when many such crosses can be set up to yield large numbers of offspring for statistical analysis. However, how do we study trait transmission in organisms such as humans, where designed crosses and large numbers of offspring are neither practical nor available? The answer is that the inheritance of a phenotype in humans can be determined by pedigree analysis. Pedigrees are charts that present family-tree information, depicting family relationships and phenotypes. In pedigrees, males are represented by squares and females by circles, and shaded symbols represent individuals with a particular phenotype. Examples of pedigrees demonstrating two different modes of inheritance are shown in Figure 1.
Figure 1 Pedigrees demonstrating two different modes of inheritance. Males are represented by squares and females by circles. Horizontal lines join parents, vertical lines connect successive generations, and elevated horizontal lines depict siblings. Symbols for individuals displaying the phenotype are shaded. (A) This pedigree is consistent with an autosomal recessive mode of inheritance. Individuals labeled 1 and 2 are both heterozygotes for the recessive trait (Aa × Aa). (B) This pedigree is consistent with an autosomal dominant mode of inheritance. Individuals labeled 1 and 2 do not have the phenotype and are therefore homozygous recessive (aa × aa).
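The genotype ratios implied by crosses such as Aa × Aa in panel (A) follow from Mendelian segregation and can be checked by enumerating a Punnett square. The sketch below is a minimal illustration, assuming a single autosomal locus with alleles written A (dominant) and a (recessive), complete dominance, and complete penetrance; the function names are hypothetical.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def punnett(parent1: str, parent2: str) -> dict[str, Fraction]:
    """Offspring genotype probabilities for one autosomal locus.

    Each parent is a two-letter genotype such as "Aa"; each of a parent's
    two alleles is transmitted with probability 1/2 (segregation).
    """
    counts = Counter(
        "".join(sorted(a + b))  # "aA" and "Aa" are the same genotype
        for a, b in product(parent1, parent2)
    )
    total = sum(counts.values())
    return {genotype: Fraction(n, total) for genotype, n in counts.items()}


def risk_affected(parent1: str, parent2: str, recessive: bool) -> Fraction:
    """Probability that a child shows the trait, assuming complete penetrance."""
    probs = punnett(parent1, parent2)
    if recessive:
        # Only "aa" children show a recessive trait.
        return probs.get("aa", Fraction(0))
    # Dominant trait: any child carrying at least one "A" allele is affected.
    return sum((p for g, p in probs.items() if "A" in g), Fraction(0))


print(risk_affected("Aa", "Aa", recessive=True))   # 1/4
print(risk_affected("Aa", "aa", recessive=False))  # 1/2
```

Two carriers thus have a 1/4 (25%) chance of an affected child, a heterozygote with a dominant trait has a 1/2 (50%) chance, and `risk_affected("aa", "aa", recessive=False)` returns 0, as expected for the unaffected parents in panel (B).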


Standard Modes of Inheritance 

There are six standard modes of inheritance that can be reliably determined by careful pedigree analysis. The modes of inheritance reflect whether a trait is dominant or recessive and the chromosomal linkage of the trait in question. When first attempting to eliminate modes of inheritance, it is easiest to assume that the trait displayed in the pedigree is due to the action of a single gene, and that the trait shows complete penetrance and uniform expressivity. The six standard modes of inheritance, with example diseases, are (1) autosomal recessive (cystic fibrosis, Tay-Sachs disease), (2) autosomal dominant (achondroplasia, neurofibromatosis), (3) X-linked recessive (hemophilia, color blindness), (4) X-linked dominant (congenital generalized hypertrichosis, Rett syndrome), (5) Y-linked (genes involved in male fertility and development), and (6) mitochondrial inheritance (Leber optic atrophy, Leigh syndrome). In each case, characteristic patterns in pedigrees are used to eliminate other modes of inheritance. Important characteristics used to determine inheritance patterns are presented in Table 2.
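The elimination procedure described above can be sketched as a set of checks applied to parent-child trios. The code below is a simplified illustration, not a complete pedigree-analysis method: it encodes only the patterns summarized in Table 2, under the stated assumptions of a single gene, complete penetrance, and no new mutations, and it additionally assumes that a mitochondrial trait is expressed exactly when the mother is affected. The `Trio` record and the function names are hypothetical.

```python
from dataclasses import dataclass

MODES = (
    "autosomal recessive",
    "autosomal dominant",
    "X-linked recessive",
    "X-linked dominant",
    "Y-linked",
    "mitochondrial",
)


@dataclass(frozen=True)
class Trio:
    """Phenotypes of two parents and one of their children (hypothetical record)."""
    father_affected: bool
    mother_affected: bool
    child_is_male: bool
    child_affected: bool


def consistent(mode: str, t: Trio) -> bool:
    """False if this trio rules out `mode` (single gene, complete penetrance,
    no new mutations, homoplasmic mitochondria assumed)."""
    f, m, son, c = t.father_affected, t.mother_affected, t.child_is_male, t.child_affected
    if mode == "autosomal recessive":
        # Two affected (aa) parents can only have affected children.
        return not (f and m and not c)
    if mode == "autosomal dominant":
        # An affected child must have at least one affected parent.
        return not (c and not (f or m))
    if mode == "X-linked recessive":
        if son:
            # A son's X comes from his mother: an affected mother has only affected sons.
            return not (m and not c)
        # An affected daughter needs an affected father;
        # two affected parents have only affected daughters.
        return not (c and not f) and not (f and m and not c)
    if mode == "X-linked dominant":
        if son:
            # Sons of an unaffected mother are unaffected.
            return not (c and not m)
        # All daughters of an affected father are affected,
        # and an affected daughter needs at least one affected parent.
        return not (f and not c) and not (c and not (f or m))
    if mode == "Y-linked":
        # Females never show the trait; a son always matches his father.
        return c == f if son else not c
    if mode == "mitochondrial":
        # Children match their mother; the father's phenotype is irrelevant.
        return c == m
    raise ValueError(f"unknown mode: {mode}")


def compatible_modes(trios: list[Trio]) -> list[str]:
    """Modes of inheritance not ruled out by any trio in the pedigree."""
    return [mode for mode in MODES if all(consistent(mode, t) for t in trios)]


# Unaffected parents with an affected daughter.
print(compatible_modes([Trio(False, False, False, True)]))  # ['autosomal recessive']
```

A single unaffected couple with an affected daughter already leaves only autosomal recessive inheritance among the six modes; larger pedigrees simply contribute more trios, each of which can only remove candidates.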


Polygenic and Multifactorial Traits 

Traits that result from the activities of more than one gene are polygenic traits. Unlike single-gene traits, which produce discrete phenotypes, polygenic traits produce a range of phenotypes. In a population, the distribution of phenotypic classes produced by a polygenic trait follows a bell-shaped curve. Polygenic traits are often multifactorial, since the resulting phenotypes are influenced to a certain degree by the environment. Human behavior and many diseases are multifactorial, and it is often difficult to determine how much of the phenotype is genetically determined. A list of representative multifactorial disorders in humans is presented in Table 3.
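Why many genes of small effect produce a bell-shaped distribution can be illustrated with the simplest additive model. The sketch below assumes a cross between two true-breeding lines that differ at `n_loci` unlinked loci, each with one contributing and one noncontributing allele of equal effect, no dominance or epistasis, and no environmental influence; in the F2 the number of contributing alleles then follows a binomial distribution. The function name is hypothetical.

```python
from math import comb


def phenotype_distribution(n_loci: int) -> list[float]:
    """Frequencies of phenotypic classes in the F2 of a cross between two
    true-breeding lines that differ at `n_loci` unlinked additive loci.

    Class k is the number of contributing alleles an individual carries
    (0 .. 2 * n_loci); each allele is contributing with probability 1/2.
    """
    n_alleles = 2 * n_loci
    total = 2 ** n_alleles
    return [comb(n_alleles, k) / total for k in range(n_alleles + 1)]


for n in (1, 2, 3):
    freqs = phenotype_distribution(n)
    print(f"{n} locus/loci, {len(freqs)} phenotypic classes")
    for k, p in enumerate(freqs):
        print(f"{k:2d} {'#' * round(64 * p)}")  # text histogram
```

With one locus the classes fall in a 1:2:1 ratio and with two loci in 1:4:6:4:1; as loci are added the histogram approaches a bell shape, and environmental variation (left out here) would blur the remaining steps between classes.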

The recurrence risk of Mendelian traits in a family can be predicted by establishing the mode of inheritance using pedigree analysis. However, it is much more difficult to predict the recurrence risks for polygenic traits, and to do so geneticists must use a variety of information from family and population studies. The human genome project will help with the diagnosis of many polygenic disorders as many of the genes that predispose people to these illnesses are identified. Analysis of human genome project data with quantitative trait loci (QTL) algorithms can provide insight into key genes controlling multifactorial traits such as intelligence and hypertension. QTL analysis can reveal these genes by detecting loci that account for as little as 1% of the observed variance in a trait. The Human Genome Project has the potential to radically change medicine, since someday the diagnosis of many multifactorial disorders may begin at birth, decades before an individual experiences the first symptoms.


Table 2 Mode of inheritance and pedigree analysis

Mode of inheritance: some characteristic patterns in pedigree

Autosomal recessive
• Affected offspring usually born to unaffected parents
• Chance of affected offspring is 25% for children of carriers
• Affects either sex
• If both parents are affected, all children will exhibit trait
• Increased incidence with parental consanguinity

Autosomal dominant
• Affected individual has at least one affected parent*
• Children with one affected parent have 50% risk of being affected
• Affects either sex

X-linked recessive
• Affects almost exclusively males
• Not transmitted from father to son
• If female inherits, father must have trait

X-linked dominant
• All daughters of affected fathers exhibit the trait
• No sons of an unaffected mother have the trait

Y-linked
• Females never exhibit trait
• Son always has same phenotype as father

Mitochondrial inheritance
• All children of an affected mother inherit the disorder
• None of the children of an affected father inherit the disorder

*May not apply in cases of non-penetrance or spontaneous mutation.


Table 3 Representative examples of multifactorial human genetic disorders

Multifactorial disorders
Breast cancer
Bipolar affective disorder
Cleft palate
Dyslexia
Diabetes mellitus
Hypertension
Neural tube defects
Schizophrenia
Seizure disorders


Hereditarianism


Complex Traits

In the early 1900s the public was introduced to Mendelian inheritance. At that time little distinction was made between the inheritance of Mendelian traits and that of complex traits. Hereditarianism is the concept that all human traits are controlled solely by genetic inheritance; it ignores the contribution of the environment. Many single-gene traits, such as those studied by Mendel, conform to hereditarian analysis since they are relatively resistant to environmental influences. Hereditarianism viewed human personality traits as Mendelian traits and presented them in contrasting pairs such as politeness versus bluntness and obedience versus disobedience. Hereditarianism did not acknowledge the possibility that not all familial traits are biologically inherited, and that even inherited traits can have complex causes.


Eugenics 

In 1911, geneticist Reginald Crundall Punnett was not alone when he warned that the knowledge of heredity in humans was “at present too slight and too uncertain to base legislation upon.” Nevertheless, by the early twentieth century hereditarianism had become a part of American popular and political culture in the form of eugenic ideology. Eugenicists argued that society pays a high price for the birth of ‘socially inadequate’ people. They warned that undesirable traits such as pauperism, feeblemindedness, alcoholism, rebelliousness, nomadism, criminality, and prostitution were spreading in the general population. The goal of eugenic programs was to institute social policies that promoted certain human matings (positive eugenics) while discouraging others (negative eugenics). Social policies supported by eugenicists included marriage restrictions, immigration restrictions, and sterilization laws.


Sterilization law 

An example of a dangerous eugenic social policy that is still valid is the 1927 Supreme Court ruling in the case of Buck v. Bell. The case involved 17-year-old Carrie Buck, who was chosen as the first person to be sterilized under the Eugenical Sterilization Act passed in 1924 by the State of Virginia. Carrie had a child but was not married, and her mother was a resident at an asylum. A lower court decreed that Carrie was “the probable potential parent of socially inadequate offspring” and that her sterilization would be a benefit to society, since “experience has shown that heredity plays an important part in the transmission of insanity, imbecility, etc.” The US Supreme Court upheld the lower court opinion, and in the decision Justice Holmes issued the infamous phrase “three generations of imbeciles are enough.” Carrie Buck and more than 60 000 Americans in institutions for the mentally ill were involuntarily sterilized following the 1927 decision. The practice of sterilizing the mentally ill continued into the mid-1970s, and the Buck v. Bell precedent allowing sterilization of the “feebleminded” has never been overruled.


Conclusions and Prospects 


It is essential to examine both heredity and variation when tracing the passage of traits from generation to generation. One aspect of heredity that has been apparent since prehistoric times is that the offspring of sexually reproducing organisms are not exact duplicates of their parents; instead they usually vary in many traits. Plant and animal breeders have been able to harness this genetic variation and, using controlled breeding, have accentuated certain desired traits in offspring over many generations.

The principles of transmission genetics have enduring practical applications in the design of selective breeding strategies for agriculture. Hybrid corn provides yields far greater than those of inbred varieties. Poultry farmers can separate male and female chicks upon hatching through careful use of stocks with sex-linked traits influencing plumage. Transgenic farm animals can produce rare human proteins for pharmaceutical use in their milk (or semen in the case of boars) as a result of an innovative technique that crosses species barriers and streamlines gene transmission by inserting foreign DNA directly into the germline.
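Chick sexing by plumage works because in birds the female is the heterogametic sex (ZW) and the male the homogametic sex (ZZ), so a sex-linked allele passed from the mother to her sons can mark all male chicks. The sketch below assumes the classic sex-linked barring cross, in which a dominant barring allele (written ZB here, with ZB for barred and Zb for non-barred) lies on the Z chromosome; the allele symbols and the function name are illustrative and not taken from the text.

```python
from itertools import product


def zw_cross(father: tuple[str, str], mother: tuple[str, str]) -> list[tuple[str, str]]:
    """All equally likely offspring of a ZW cross, as (sex, phenotype) pairs.

    The father is written ("Z<allele>", "Z<allele>") and the mother
    ("Z<allele>", "W"). The W chromosome carries no barring allele, and
    "ZB" (barred) is dominant over "Zb" (non-barred).
    """
    offspring = []
    for from_father, from_mother in product(father, mother):
        chromosomes = (from_father, from_mother)
        sex = "female" if "W" in chromosomes else "male"
        barred = "ZB" in chromosomes
        offspring.append((sex, "barred" if barred else "non-barred"))
    return offspring


# Non-barred male (Zb Zb) x barred female (ZB W).
chicks = zw_cross(("Zb", "Zb"), ("ZB", "W"))
print(sorted(set(chicks)))  # [('female', 'non-barred'), ('male', 'barred')]
```

Mating a non-barred male with a barred female gives barred sons and non-barred daughters, so the sexes can be told apart at hatching; the reciprocal cross gives all-barred chicks and would not serve this purpose.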

Mankind has applied the principles of transmission genetics since primitive times through agriculture and selective breeding. Genetics is still a young science, and our knowledge of heredity has been radically altered within just a few generations. Our fascination with heredity has had a dark past, as evidenced by the eugenic policies of the early twentieth century. The genome project is an exciting endeavor that is rapidly transforming genetics into an information science. It is important to consider the rights of individuals and to remember the lessons of the past when using genetic information in the future. Overall, both science and society have been radically transformed by research related to transmission genetics.


Transposable Elements


Transposable elements, or transposons, are discrete segments of DNA that can move within genomes. They were discovered in maize by Barbara McClintock as the cause of unstable mutations (the instability resulting from excision of the transposon), and have since been found in every organism that has been analyzed in any detail.

To a first approximation, transposons are probably best viewed as molecular parasites: segments of genetic material that can ensure their own replication (albeit with the aid of multiple host factors). This does not rule out that some transposons may have acquired a function that is beneficial to the host organism; in ecological terms, one could say that in those cases strict parasitism has turned into symbiosis. A well-known present-day example is the rapid spread of antibiotic resistance genes via transposable elements. A more hypothetical example is the integration of the RAG transposon into a predecessor of the immunoglobulin genes; this triggered the development of the current repertoire of segments of vertebrate immunoglobulin genes, which are combined through V(D)J recombination, a descendant of RAG transposon excision.

Some bacterial transposons encode factors for conjugative pili and thus ensure their own spread between cells. One step further along these lines are transposons in which the state after leaving one cell and before entering another has become stabilized. Thus bacteriophage Mu can be viewed as a transposon, but it is also a perfectly good temperate bacteriophage. Similarly, retroviruses such as MoMLV and MMTV can be viewed as retrotransposons that can be packaged into virus particles.

Not all genetic elements that move within or between genomes are considered transposons. In general, the ability to integrate into many different positions in the genome distinguishes transposons from other elements. Other mobile elements instead move by site-specific recombination, as is the case for the well-studied bacteriophage lambda (see Site-Specific Recombination).


Structure of Transposons 


Transposons are usually flanked by short direct repeats of host DNA, which result from duplication of the target DNA during the integration reaction (see below). The termini of the transposons themselves are often inverted repeats; some level of symmetry should be no surprise if one realizes that the mechanistic events that integrate each of the two transposon ends are usually identical. Transposons minimally encode their own transposase, the protein(s) required for the transposition reaction (see below). In addition, they can encode other genes: antibiotic resistance genes in the case of many bacterial transposons, and structural virion genes in the case of retroviruses and bacteriophage Mu. Bacterial transposons such as Tn10 and Tn5 are composite elements: they consist of two insertion sequences (IS10 for Tn10, and IS50 for Tn5) flanking a unique middle segment that carries antibiotic resistance genes. The transposase is encoded by the flanking IS elements, which are also found as separate mobile elements in the genome.
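The two structural signatures described above, terminal inverted repeats and flanking target-site duplications, can be checked directly on a sequence. The sketch below assumes perfect repeats, uppercase A/C/G/T sequence, and element boundaries that are already known; real elements often have imperfect inverted repeats, and the function names and example sequences are hypothetical.

```python
from typing import Optional

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase A/C/G/T sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def has_inverted_repeats(element: str, length: int) -> bool:
    """True if the first `length` bases of the element equal the reverse
    complement of its last `length` bases (perfect terminal inverted repeats)."""
    if length <= 0 or 2 * length > len(element):
        raise ValueError("repeat length must fit twice inside the element")
    return element[:length] == reverse_complement(element[-length:])


def target_site_duplication(locus: str, start: int, end: int, length: int) -> Optional[str]:
    """Return the direct repeat flanking locus[start:end] if the `length`
    bases just before the element equal the `length` bases just after it."""
    if start < length or end + length > len(locus):
        return None  # not enough flanking sequence on one side
    before = locus[start - length:start]
    after = locus[end:end + length]
    return before if before == after else None


element = "CAGTTG" + "A" * 8 + "CAACTG"  # hypothetical element with 6-bp inverted repeats
locus = "GGCC" + "ATGCA" + element + "ATGCA" + "TTAA"  # hypothetical 5-bp duplication
start = locus.index(element)
end = start + len(element)
print(has_inverted_repeats(element, 6))               # True
print(target_site_duplication(locus, start, end, 5))  # ATGCA
```

The inverted repeats belong to the element and read the same on both strands, whereas the direct repeats belong to the host and read the same in the same orientation on either side.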

Bacterial transposase proteins often show a cis-preference, meaning that the protein acts preferentially on the nearest transposon termini it encounters after it has been synthesized, namely those of the element that encodes it. In eukaryotes, where transposase protein synthesis in the cytoplasm is spatially separated from transposase activity in the nucleus, there can be no preference for any transposase to act on the encoding copy of the transposon. Hence, in the common situation where a genome contains multiple copies of a given transposon, there is no selective disadvantage for a given copy to lose its transposase gene, as long as it can move using the transposase encoded by other copies. Copies encoding an active transposase are called autonomous; mutant derivatives (usually deletions) that have lost their own transposase genes are called nonautonomous. The first class of transposons, discovered by McClintock, owes its double name to this: Ac/Ds for
the autonomous activator Ac, and the nonautonomous 
dissociator Ds. 


Classification 


For an extensive overview of classes of transposons, readers are referred to the monographs mentioned below. A classification by host organism makes little sense, since families of transposons that share the most important features of their jumping mechanism, and that often also show extensive sequence similarity, have been found in plants as well as in animals, fungi, and prokaryotes. A more meaningful distinction is by the mechanism of jumping and (often related) the structure of the element.

One distinction is between elements that transpose through a DNA intermediate and those that have an RNA intermediate. The former are called DNA transposons, the latter retrotransposons.


DNA Transposons 

DNA transposons in turn come in two types. The first type jumps via a simple cut-and-paste mechanism. To name some of the best-studied examples, these include the bacterial transposons Tn7 and Tn10, the P element of Drosophila, the Tc1 transposon of Caenorhabditis elegans and the related mariner transposons of other organisms, and the Tam and Ac/Ds elements in plants. In all cases the transposition reaction is initiated by double-strand DNA breaks at the transposon termini, after which the excised element can move to a new genomic target and reintegrate. Note that for these transposons the transposition process itself does not increase transposon copy number. For the transposon to multiply within a genome, it must therefore depend on additional features. One possibility is that the transposon preferentially transposes from replicated DNA into a part of the genome that has not yet been replicated, which results in duplication of the transposon in one of the two daughter cells. Another mechanism that seems to be responsible for transposon duplication in some cases is templated repair of the donor site after transposon excision: a transposon excises, and the break left in the donor DNA is repaired using either the sister chromatid or the homologous chromosome as a template. This repair replication will then often insert a new copy of the transposon into its old position.

The other class of DNA transposons is characterized by a transposition process that is intrinsically replicative. Examples are the bacterial transposons Mu, gamma-delta, and Tn3. These transposons are never excised from their original position in the genome. Instead, the only breaks occur at the 3’ ends of the two transposon termini. These are then fused to a new target in the genome; this reaction creates a forked structure at each transposon end, which is very similar to a replication fork and can indeed be zipped open by replication of the transposon. Note that, as shown below, the difference between the two classes of transposons, replicative and nonreplicative, is smaller than one might have thought.


Retrotransposons 

These transposons do not excise, and do not even undergo single-strand DNA breaks at their termini to initiate transposition. Instead, they are transcribed. The resulting RNA can be reverse transcribed into DNA by the enzyme reverse transcriptase (RT), which is usually encoded by the pol (for polymerase) gene of the transposon. There are two classes of retrotransposons: LTR and non-LTR. LTR stands for long terminal repeat. LTR transposons are first transcribed into an RNA that contains part of the LTR at each terminus. Then, via a reverse transcription process that includes a complex series of template jumps between the two termini, a full-length cDNA is generated that has complete LTR copies at each end. This DNA copy is then integrated into a new chromosomal DNA target, a reaction catalyzed by the transposon-encoded integrase protein. Well-known LTR retroelements are the retroviruses MoMLV, MMTV, and HIV, the yeast transposons Ty1 and Ty3, copia of Drosophila melanogaster, Ta1 of Arabidopsis thaliana, and IAP of mice.
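How an RNA that carries only part of each LTR can yield a cDNA with two complete LTRs can be followed with labeled strings. The sketch below assumes the conventional division of each LTR into U3, R, and U5 regions (terms not used in the text above), with transcription starting at R in the left LTR and ending after R in the right LTR; the template jumps themselves are not modeled, only their net result, and the class and method names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LTRElement:
    """An integrated LTR element: LTR, internal region, LTR.

    Each LTR is assumed to consist of U3, R and U5 regions in that order.
    """
    u3: str
    r: str
    u5: str
    internal: str

    @property
    def ltr(self) -> str:
        return self.u3 + self.r + self.u5

    def integrated(self) -> str:
        """The element as it sits in the chromosome."""
        return self.ltr + self.internal + self.ltr

    def transcript(self) -> str:
        """RNA from R in the left LTR to the end of R in the right LTR:
        only R-U5 at the 5' end and U3-R at the 3' end."""
        return self.r + self.u5 + self.internal + self.u3 + self.r

    def cdna(self) -> str:
        """Net result of reverse transcription and its template jumps:
        U3 copied from the 3' end of the RNA ends up on the left, and U5
        copied from the 5' end ends up on the right, restoring complete
        LTRs. Extra terminal nucleotides later removed by donor cleavage
        are ignored."""
        return self.u3 + self.transcript() + self.u5


element = LTRElement(u3="[U3]", r="[R]", u5="[U5]", internal="gag-pol")
print(element.transcript())  # [R][U5]gag-pol[U3][R]
print(element.cdna())        # [U3][R][U5]gag-pol[U3][R][U5]
```

The transcript contains no complete LTR, yet the cDNA matches the integrated element, which is why each round of transposition can regenerate both LTRs.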

Non-LTR retroelements include, e.g., the LINEs and SINEs that make up a considerable part of the human genome, and the I element of Drosophila. SINEs encode neither RT nor integrase; they probably use the transposition machinery of LINEs (also see Retroposon).


Mechanism of Transposition 


Chemical Steps in Transposition 

While the above classification might suggest otherwise, the chemical steps in transposition are universal. In all cases the transposon is inserted into its new target DNA by a pair of nucleophilic attacks of the 3’ hydroxyls at the two termini of the transposon on two phosphodiester bonds in opposite strands of the target DNA. Since these two phosphodiester bonds are always a few positions apart, there is a stagger in the target DNA, which causes the target duplication characteristic of all transposons (see Figure 1). It is important to note that the target DNA is thus never simply cut by hydrolysis: the in-line attacks of the transposon 3’ hydroxyls remove and replace the original hydroxyl groups at the target. This implies that no exogenous energy is required to ligate the transposon to its new target, since the energy of the broken bonds is retained in the new bonds.

Figure 1 Donor cleavage. (A) Cutting of 3’ ends only (e.g., Mu, Tn3). (B) Cutting of both ends (e.g., Tn10, Tn7, P element, Tc1). (C) Retrovirus donor cleavage.

How then can replicative and nonreplicative transposons all integrate via this one universal reaction? All reactions are initiated by a simple hydrolysis that releases the 3’ transposon terminus from its flank. (Note that this is even true for most LTR retrotransposons: the products of the reverse transcription of, e.g., HIV are two base pairs longer at each end than the integrated copies; two nucleotides are removed from each 3’ end by hydrolysis.) The difference between replicative and nonreplicative elements is what happens to the other strands. These are cut for simple cut-and-paste transposons, but not for, e.g., phage Mu. Then in both cases the newly generated 3’ ends are fused to the target DNA; the difference is that the replicative elements still have the complete flanking DNA attached, while the cut-and-paste transposons carry at most a few nucleotides of it (as a result of the short stagger with which the transposon was excised). The point is illustrated in Figure 1B and discussed in more detail in Sherratt (1995).
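The way a staggered pair of attacks produces a target duplication can be followed on a single strand of sequence. The sketch below assumes a double-stranded target written as its top strand only: the two transposon 3’ ends join phosphodiester bonds on opposite strands `stagger` nucleotides apart, and gap repair then copies the single-stranded stretches, so the bases between the two joining points end up on both sides of the element. The function name and target sequence are hypothetical; the 4- and 5-bp staggers correspond to the MLV and HIV target duplications mentioned later in this entry.

```python
def integrate(target: str, position: int, element: str, stagger: int) -> str:
    """Top strand of the target after strand transfer and gap repair.

    The two transposon 3' ends attack phosphodiester bonds on opposite
    strands, `stagger` nucleotides apart, starting at `position`. Repair
    synthesis fills the single-stranded gaps on either side, so
    target[position:position + stagger] appears on both sides of the
    element as a short direct repeat.
    """
    if stagger < 0 or not 0 <= position <= len(target) - stagger:
        raise ValueError("both joining points must lie within the target")
    return target[:position + stagger] + element + target[position:]


target = "GGATCCAGTACTTTAAA"
for name, stagger in (("MLV", 4), ("HIV", 5)):
    print(name, integrate(target, 6, "[element]", stagger))
# MLV GGATCCAGTA[element]AGTACTTTAAA
# HIV GGATCCAGTAC[element]AGTACTTTAAA
```

A stagger of zero (both attacks at the same position) would give no duplication, which is why the size of the duplication directly reports the spacing of the two attacked bonds.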

The standard technical term for the hydrolysis that releases the 3’ hydroxyl groups from their flanking DNA is donor cleavage. The reaction in which these 3’ hydroxyls are fused to the new target DNA is called strand transfer. Donor cleavage and strand transfer are not exactly the same as excision and integration: e.g., retrotransposons never excise, but, as mentioned above, most of them need a donor cleavage to remove a few nucleotides from the double-stranded linear transposon DNA before integration can occur. Integration is almost identical to strand transfer, except that the strand transfer reaction per se only fuses the 3’ ends of the transposon to its new target, so that subsequent DNA repair is required before the transposon is fully integrated.


Target Choice 

While transposons are, to a first approximation, distinguished from site-specific recombination systems by their ability to integrate randomly, there are actually varying degrees of freedom in target choice, and there is hardly ever complete randomness. Some transposons have absolute target requirements (e.g., the Tc1/mariner transposons need a TA dinucleotide at the target). Most transposons have some preference for a few nucleotides at the target DNA; for Mu, Tn10, and Tc1 a preferred integration consensus has been found. These consensus sequences are loose enough that the transposon will still integrate into most genes at many positions. Some yeast retroelements prefer to integrate into promoter regions. The bacterial Tn7 transposon has two modes of integration: site-specific and (at a lower frequency) random. Apart from general target preferences, there can be other restraints, such as the preference of P elements and plant transposons to integrate into the region of the chromosome they excised from; apparently there is only limited diffusion of the excised transposon, either in three dimensions or by scanning along the chromosome after excision.
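The contrast between an absolute target requirement and a loose consensus can be made concrete with two simple scans. The sketch below is illustrative only: `exact_sites` lists every occurrence of a required motif such as the TA dinucleotide, the only places where a Tc1/mariner element could insert, while `loose_sites` tolerates a set number of mismatches to a consensus. The example consensus is arbitrary, not a published preference of any transposon named here, and real target choice also depends on factors that are not modeled.

```python
def exact_sites(seq: str, motif: str) -> list[int]:
    """0-based starts of every exact (possibly overlapping) match of `motif`."""
    seq = seq.upper()
    return [i for i in range(len(seq) - len(motif) + 1) if seq.startswith(motif, i)]


def loose_sites(seq: str, consensus: str, max_mismatches: int) -> list[int]:
    """0-based starts of windows that differ from `consensus` at no more than
    `max_mismatches` positions; "N" in the consensus matches any base."""
    seq = seq.upper()
    k = len(consensus)
    hits = []
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        mismatches = sum(1 for c, s in zip(consensus, window) if c != "N" and c != s)
        if mismatches <= max_mismatches:
            hits.append(i)
    return hits


dna = "GCTTAGCAAAGCTGAGCTA"
print(exact_sites(dna, "TA"))          # [3, 17]: candidate TA insertion points
print(loose_sites(dna, "GCTNAGC", 1))  # windows within one mismatch of the consensus
```

Relaxing `max_mismatches` adds weaker matches, mirroring how a loose consensus still leaves many acceptable positions in most genes.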


The Transposase Proteins 


Both the donor cleavage and the strand transfer reactions are catalyzed by the transposase. Starting from the assumption that transposons are selfish DNA, one would expect that in principle transposons encode their own transposase, and this is indeed what is found. In all cases there is a separate domain that specifically binds to the termini of the transposon (usually to a region of approximately 20 base pairs that lies a few base pairs from the actual transposon end). Then there is a catalytic domain. Interestingly, these catalytic domains show clear conservation at two levels. First, the overall fold of the Mu transposase (MuA) resembles that of retroviral integrases (and also that of the RuvC protein, involved in resolution of intermediates in homologous recombination). Second, a characteristic triad of amino acid residues is observed, with a conserved spacing: the so-called DD(35)E motif. These residues are at the heart of the catalytic site of these phosphoryltransferase proteins and are thought to coordinate the divalent cations (commonly Mg²⁺) that are required for the reaction.

Some analysis has been done concerning the stoichiometry of the transposase complex; it is clear that at least two, possibly four, transposase subunits are required for the complete reaction. The two ends of the transposon are held together in one complex during both the donor cleavage and the strand transfer reaction. It has been shown that cutting at one transposon end depends on proper recognition of the other end as well; in the strand transfer reaction, it is the precise spatial organization of the two transposon termini that determines the distance between the phosphodiester bonds in the target DNA that are attacked. Small differences in the multimeric integrase complex can thus explain why the MLV genome is always flanked by four and the HIV genome by five base pair target duplications.

Figure 2 Donor cleavage and strand transfer. (A) Shown here for an integrating retrovirus are the donor cleavage (as in Figure 1), indicated by small arrows, followed by strand transfer. Note that the stagger shown in the target DNA results in short single-strand sequences in the target DNA that flank the integrated element; after repair replication these will become short target duplications. (B) The product of strand transfer shown for a retrovirus, a replicative DNA transposon (Mu; note that this structure needs to be resolved by DNA replication through the Mu transposon), and a cut-and-paste transposon, Tn10. In the latter case the two ends of the broken donor DNA are still shown in the picture; in reality they may be far removed from the integration site, but they are shown to emphasize the similarity in the reaction mechanisms. (See Sherratt, 1995.)

There is limited knowledge of the host proteins 
required in transposition. What is known is almost 
all based on bacterial transposons, where small basic 
proteins are sometimes required or at least strongly 
stimulate transposition. 


Regulation of Transposition 


All organisms contain transposons in their genomes, and the frequency of transposition must clearly be controlled. The population biology is as for any parasite: the transposon cannot replicate too aggressively, since it should not reduce the fitness of the host too much. The regulation of transposition is still an open area for research. The P element of Drosophila encodes its own repressor; alternative splicing of the transposase transcript determines whether an active transposase or a repressor protein is synthesized. This alternative splicing ensures that the transposon can jump in the germline but remains silent in somatic cells. Interestingly, the opposite is found for the Tc1 element of the nematode C. elegans, which is active in somatic cells of all strains but is kept quiet in the germline of some strains; the nature of the silencing genes (mut genes) remains to be determined. Plant transposable elements are also regulated at the substrate level by DNA methylation. Especially with the aid of ongoing genome projects, it should now become feasible to identify the genes that control transposon activity.


Transposons as Tools 


Transposons generate DNA fusions. This can be exploited in experimental genetics in many ways: e.g., interruption by a transposon can inactivate a gene, and insertion next to a gene can activate it. After transposon insertion mutagenesis, the transposon can be used as a tag to recover the mutant gene relatively easily. An enhancerless reporter gene can be included in a transposon, so that after large-scale transposon insertion one can screen for organisms in which the reporter is expressed in an interesting fashion (e.g., tissue-specifically or only conditionally); such experiments are referred to as enhancer traps or gene traps. Other applications focus less on the integration site as such; transposons can be used for transgenesis, to ensure that a precisely defined DNA segment is integrated in single copy (though at an unknown position in the genome). Transposons have also found numerous applications in gene mapping and sequencing projects.
The concept of transposable elements (TEs) was first published in 1948 by Barbara McClintock as a result of combined genetic and cytological studies in maize. In contrast to the linear and relatively stable arrangement of genes in linkage groups, transposable elements seemed to be able to change their position in the genome; they could even jump from one chromosome to another. After molecular analysis had revealed the nature of transposable elements and it had become clear that they are ubiquitous in prokaryotic and eukaryotic organisms, McClintock’s discovery and work were finally rewarded with the Nobel Prize in 1983. TEs have, in fact, been found in every organism analyzed so far. In plants they can make up more than 80% of the total genomic DNA. In general, plants serve as hosts for elements that are structurally and functionally similar to those found in yeast or mammals. However, there are differences in the genomic organization of plant elements and in the heterogeneity of their families (see below).

In prokaryotes (where the number of transposable elements is much lower) there are clear advantages for a host that carries, for example, elements conferring antibiotic resistance. Despite the often enormous contribution of these elements to plant DNA, it is still unclear whether there are long-term benefits for the host species carrying them or whether they must be regarded mainly as ‘selfish DNA.’


Discovery of Transposable Elements in Maize


McClintock’s successful studies in the 1940s were based on classical genetic premises. For example, she concentrated on suitable genetic traits, such as loci associated with plant pigments, and on distinct morphological characters of an organ ideally suited to genetic analysis, the maize caryopsis. Although transposable elements are normally known to cause unstable mutations by insertion into marker genes, the first evidence of transposition came from a genetic constellation in which a defective nonautonomous element, able to cause aberrant transposition events, was activated in trans by an autonomous element of the same family.

Although the molecular basis of transposition was not known for another 30 years, McClintock made three observations that enabled her to devise the concept of transposable elements. The first concerned what she termed ‘unstable loci,’ such as c (colorless aleurone) and wx (waxy): loci known to be stable during propagation suddenly became unstable. All the mutable loci were located on the short arm of chromosome 9. They became unstable because a locus termed Ds (Dissociation) caused chromosome breakage, which resulted in the loss of the distal chromosome fragment carrying the markers in subsequent cell divisions. The second observation was that chromosome breakage occurred only in the presence of a controlling element, which McClintock termed Ac (Activator). The third key to the idea of transposition was the proof that Ac and Ds could both move from their loci to other sites in the genome. Both displayed Mendelian inheritance; however, they changed their location on the chromosomes by transposition events (McClintock, 1987). Later, molecular analysis revealed the Ds at the proximal position on the short arm of chromosome 9 to be a defective nonautonomous element of the Ac/Ds family called double Ds: one Ds copy had jumped into another Ds, and the resulting element often causes chromosome breaks and rarely transposes. The Ac element is the autonomous master element of the family and mobilizes nonautonomous Ds elements in trans. Transposons of the Ac/Ds family are DNA elements (Class II transposons) (see ‘Ac/Ds Superfamily in Maize and Antirrhinum’ below), which transpose by a cut-and-paste mechanism.


Overview and Classification 


The transposition mechanisms of TEs in plants are more diverse than those of prokaryotic elements. In plants, a transposition event leads to the insertion of a mobile element at a new acceptor site in the genome. However, the various types of elements differ in how the inserted sequence is generated. Two main classes, comprising elements with similar mechanisms of transposition, can be distinguished (Figure 1). Class I elements are retrotransposons, which transpose via an RNA intermediate; transcription is therefore the first step in generating the inserted copy. In the next step, the RNA is reverse-transcribed into extrachromosomal DNA, and this DNA is finally inserted at the target site. Class II elements transpose via a ‘cut-and-paste’ mechanism: the DNA copy is excised at the donor locus and inserted at the acceptor site. In Class I elements the life cycle is replicative and increases the number of elements per cell. In Class II elements the mechanism is nonreplicative, and the number of elements is not increased by the transposition mechanism per se. The elements of the Ac/Ds family described above belong to Class II (see below). Class I and Class II elements can be subdivided into groups displaying similar structures or similar modes of transposition. Both classes comprise autonomous and nonautonomous elements; nonautonomous transposable elements have to be mobilized in trans by autonomous ones. In plants, Class I retrotransposable elements occur in greater numbers. In contrast to nonreplicative Class II elements, Class I elements give rise to stable mutations because they are not excised from their donor site. Autonomous Class II elements create unstable mutations, and gene function may be restored by their excision.


Class I: Retrotransposons 


A clear correlation between structure and mode of transposition can be observed for all transposable elements. Retrotransposition always starts with transcription of the element, but only the full retroviral life cycle generates the characteristic long terminal repeats (LTRs). The presence of these repeats is therefore the basis for a further classification of retrotransposable elements: although all retrotransposons transpose via an RNA intermediate, some of them follow the mechanism of retroviral replication, while others do not. In plants, the first retroelements were described some 15 years ago. The LTR elements Cin1 and Bs1 as well as the non-LTR element Cin4 have all been isolated from maize. LTR elements have been detected in algae, bryophytes, pteridophytes, gymnosperms, and angiosperms. Non-LTR elements have also been found throughout the plant kingdom. Among the elements sequenced so far, the vast majority of plant retroelements are nonautonomous; only a few actively transposing retroelements have been isolated to date.


LTR-Retrotransposons in Plants 

In comparison with other retroelements, LTR retrotransposons are the most retrovirus-like TEs. They encode the gag, pol, RT/RNaseH, and int (integrase) sequences that constitute genes characteristic of retroviruses; however, the env genes are missing. (If a coding sequence similar to env (envelope) can be detected, the element is generally classified as a retrovirus.) In yeast, retrovirus-like particles are formed by the product of the Ty1 gag gene, which encodes the structural proteins of the virion core. However, as the env homolog is missing, they do not give rise to infectious viruses. Thus, the elements might have originated from retroviruses by loss of the env gene, or, conversely, the incorporation of this gene into an LTR element might have given rise to a retrovirus. Because of the missing env gene, LTR retrotransposons can be regarded as retroviruses that are unable to leave their host cells.

Figure 1 Transposition of Class I and Class II elements results in the integration of a mobile DNA sequence at a new host acceptor site. Class II elements transpose via a cut-and-paste mechanism: the inserted sequence has to be cut out prior to another insertion. Class I elements generate the inserted sequence by transcription in the first step and retrotransposition in the second.

In yeast and Drosophila these elements have been classified into two groups according to differences in their structural organization and in their sequence homology: Ty1/copia and Ty3/gypsy. Their structural organization and mechanism of transposition correspond to those of retroviruses. Even the arrangement of the retroviral proteins is similar in both groups: the gag and protease genes precede the reverse transcriptase and RNaseH genes. However, the two groups differ in the location of the int (integrase) gene (Figure 2) (Kunze et al., 1997).

The transposition mechanism of LTR retrotransposons and retroviruses is thought to be the same. In all cases retrotransposition starts with transcription of the element from a promoter located in the 5’ LTR. The primary transcript is then reverse-transcribed into double-stranded cDNA and finally integrated into the host genome by integrase action. This enzymatic activity creates a staggered nick at the new host site. The gaps resulting from ligation of the double-stranded element to the single-stranded, staggered host DNA ends are filled in by cellular repair enzymes. Therefore, the inserted copy shows LTRs (produced by the retrovirus-like replication) that are bordered by short direct repeats (produced by integration into the host DNA), usually 5 bp long. The length of these repeats is set by the distance between the staggered cuts made by the integrase.
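The origin of the short direct repeats can be illustrated with a string model of a staggered cut. The Python sketch below is a simplification under stated assumptions, not a published tool: the host is represented by its top strand only, the two strands are assumed to be cut `stagger` bases apart (5 bp here, as for LTR elements), and repair is assumed to fill each single-stranded gap with an exact copy. The function name `insert_with_tsd` and all sequences are hypothetical.

```python
def insert_with_tsd(host: str, site: int, element: str, stagger: int) -> str:
    """Toy model of integration at a staggered cut (top strand only).

    The top strand is cut before host[site] and the bottom strand `stagger`
    bases further on, so each side of the inserted element carries one single
    strand of host[site:site + stagger]. Filling in both gaps makes the stretch
    double-stranded on both sides, i.e. a direct repeat flanking the element.
    """
    if not 0 <= site <= len(host) - stagger:
        raise ValueError("cut site too close to the end of the host sequence")
    target = host[site:site + stagger]  # bases that end up duplicated
    return host[:site] + target + element + target + host[site + stagger:]


# Hypothetical LTR element: identical LTRs around an internal region.
ltr = "TGTTGG"
element = ltr + "nnnnnnnnnn" + ltr
host = "GATCCTTACGTAGGATC"

new = insert_with_tsd(host, site=5, element=element, stagger=5)
print(new)  # TTACG appears on both sides of the element
```

In this toy model, setting `stagger` to 8 or 3 gives the fixed 8 bp and 3 bp duplications of the Ac and CACTA superfamilies described below; the variable 3-16 bp duplications of non-LTR elements arise by a different, homology-dependent mechanism that the model does not capture.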

In plants, elements of the Ty1/copia group seem to be ubiquitous. They have been identified in algae, ferns, gymnosperms, monocots, and dicots. Sequence analysis has revealed that most of the elements identified seem to be inactive: they either carry defects in the protein-coding regions but can still be mobilized in trans by a reverse transcriptase activity encoded elsewhere, or they have accumulated mutations in the LTRs after integration that completely prevent mobilization.

The first functional, autonomously transposing Ty1/copia elements to be isolated from plants were the Tnt1 and Tto1 retrotransposons from tobacco. Tnt1 is 5.3 kb long, has 610 bp LTRs, and generates 5 bp target site duplications upon integration. There are more than 100 Tnt1 copies in the tobacco genome. Its polyprotein-encoding sequence is similar in sequence and length to that of the Drosophila copia element; at the protein level, homologies range from 29 to 42% (gag, prot, endo, and RT/RNaseH domains). The number of elements may vary enormously: Bs1 in maize, for example, is present in only one to five copies, whereas BARE-1 in barley exists in more than 50 000 copies per genome and thus constitutes a considerable part of the genome.

Elements of the Ty3/gypsy group have been isolated from plants as well, but so far they have been investigated in less detail than the Ty1/copia elements. The Ty3/gypsy group has been detected in gymnosperms and angiosperms and is expected to occur throughout the plant kingdom. The first Ty3/gypsy member to be identified was the del1-46 element from Lilium henryi (see Figure 2), which occurs in about 13 000 copies in the nuclear DNA. Despite the similar structural organization, protein homology with the Ty1/copia group is rather limited. The most obvious structural difference is the position of the int coding region: in Ty3/gypsy elements it is located downstream of the RNaseH region, whereas in Ty1/copia elements it precedes the reverse transcriptase. The copy number of certain Ty3/gypsy transposons may be as high as that of Ty1/copia elements. In maize, for instance, there are about 20 000 copies of Cinful-1, which is 8.6 kb long and terminated by 586 bp LTRs (Kumar and Bennetzen, 1999).

Owing to their transposition mechanism, the retrotransposon copy number increases with every transcript that is eventually converted into cDNA and integrated. Many retrotransposons have been isolated from plants. In general, plant genomes seem to be surprisingly tolerant of increasing copy numbers of TE sequences, to the point where these finally contribute significant amounts of DNA to the host genome. Whereas the number of Ty1 elements in yeast, for example, is rather limited (some 40 copies), an estimated 1 000 000 LTR retrotransposons occur in Vicia faba.


LINEs and SINEs: Non-LTR-Retrotransposons in Plants
The non-LTR-retrotransposons constitute a second 
group of retrotransposable elements. The lack of LTRs 
distinguishes them structurally and mechanistically 
from LTR elements (the latter following the retroviral 
replication cycle). LINEs (long interspersed nuclear 
elements) and SINEs (short interspersed nuclear elem- 
ents) occur in the genome as part of the repetitive 
DNA that is not arranged in tandem repeats but is 
interspersed. 

Interestingly, LINEs partly encode the same proteins as LTR retrotransposons: gag and RT/RNaseH. Instead of int (integrase), a coding sequence for an endonuclease (EN) seems to be responsible for insertion into the host genome. The terminal poly(A) tract, the missing LTRs, and the flanking direct repeats indicate that, despite the partly identical protein functions, these elements use a transposition mechanism different from that of LTR-bordered transposons. Hints about the mode of transposition were obtained from the isolation of the first plant non-LTR retrotransposon.

Figure 2 (A) Tnt1 from tobacco (belonging to the Ty1/copia group) and del1-46 from Lilium henryi (a member of the Ty3/gypsy group) display a similar structural organization. LTRs (long terminal repeats) mark the termini of the integrated elements, and target site duplications are indicated by triangles outside the LTRs; however, the location of the int (integrase) region differs. (B) Retrotransposition of LTR retrotransposons starts with transcription, which is followed by reverse transcription. The new copy is inserted by integrase action, and the host repair machinery fills in the gaps resulting from the insertion process.

The first plant LINE, Cin4-1, was detected as an insertion in the 3’ untranslated region of the A1 gene in maize. Full-length Cin4 elements are 4 kb in size and present in 50 to 100 copies per genome. Further isolated Cin4 elements showed identical 3’ ends but heterogeneously truncated 5’ ends. Based on the observation that short regions of homology could be identified between the sequenced Cin4 5’ ends and the adjoining target site duplication, a model for LINE transposition as depicted in Figure 3 was developed (Schwarz-Sommer et al., 1987). Transcription starts at a promoter at the 5’ end of the element; LINEs carry RNA polymerase II promoters and SINEs carry RNA polymerase III promoters. These promoters are transposed as well, thus ensuring further mobility of an element that has already moved in the genome. In the next step, reverse transcription starts at the 3’ end of the element. It is primed by a free 3’ hydroxyl end that results from endonuclease (EN) activity on the genomic DNA. The second strand is finally synthesized by the cellular repair machinery, using the free 3’-OH end in the opposite genomic strand as a primer. Because integration relies on short homologies between the 5’ end of the element and the host DNA at the staggered nick, the length of the target site duplication varies from about 3 to 16 bp. In contrast, LTR elements (see ‘LTR-Retrotransposons in Plants,’ above) and Class II elements (see below) produce discrete target site duplications (generally 5 bp in the case of LTR elements).

Figure 3 (A) Typical LINE structure and p-SINE1 from rice. The LINE encodes the Gag (coat protein), the RT/RNaseH (reverse transcriptase/RNaseH), and the EN (endonuclease). The varying target site duplications are marked by nested and non-nested triangles. A hypothetical transposition model for non-LTR retrotransposons starts with transcription from Pol II or Pol III promoters. (B) In the second step the endonuclease (EN) cuts the host acceptor site. Reverse transcription is primed by RNA invasion, using the free 3’-hydroxyl end in the host DNA as a primer. The host DNA of the opposite strand serves as primer for second-strand synthesis.

Of special interest with regard to quantitative effects is the del2 LINE of lily. A 4.5 kb unit is present in 250 000 copies in Lilium henryi, where it constitutes 4% of the genome. The element is found in many monocotyledons but, interestingly, in only four of eight lily species investigated. The vast number of copies and the element’s presence in only some lily genomes illustrate the astonishing plasticity of the plant genome. LINEs in particular contribute a major part of the repetitive DNA in plants.

SINEs are relatively small elements. In contrast to LINEs, their transcripts do not encode the proteins necessary for transposition; therefore, they are also termed retrogenes. For mobilization these functions have to be supplied in trans. However, the cis-acting determinants and structures are very similar to those in LINEs. A promoter that is part of the transcript, a 3’ end that terminates in poly(A) or A- or T-rich sequences, and target site duplications in the host DNA all indicate that the transposition mechanism is similar to that of LINEs. Most probably the necessary trans-acting functions are provided by other retroelements present in the host genome. The promoter regions show motifs similar to those known from transfer RNA (tRNA) genes. The promoter, usually an RNA polymerase III promoter that is part of the transcript, distinguishes SINEs from intronless pseudogenes.

In rice, a family of related sequences has been identified that occurs in more than 100 copies in the genome: the p-SINE1 elements are on average 125 bp long, are flanked by 14-15 bp target site duplications, and contain an RNA polymerase III promoter (Figure 3). However, they terminate in a T-rich pyrimidine tract at their 3’ ends. Similar elements have been found in Craterostigma plantagineum, where the insertions are flanked by 12-17 bp target site duplications and likewise terminate in an oligo(T) tract. Their size varies between 0.65 and 0.9 kb, which is unusually large for SINE-like elements.

Assuming that the model for the transposition mechanism of LINEs and SINEs is correct, the question arises to what extent normal genomic transcripts could be retrotransposed as well. If a poly(A) tail is sufficient to prime integration and reverse transcription, any polyadenylated transcript could serve as a substrate for retroelement enzymatic activities. Indeed, DNA sequences resulting from such events can be found in plant genomes; they are termed processed pseudogenes. In contrast to the genes they originated from, they have neither a promoter nor any intron sequences, yet they possess the poly(A) tail: they represent processed RNA transcripts converted into DNA.

Therefore, after integration these byproducts of retroelement activity are immobilized and cannot be transcribed any more. They mark the end of a line of reduction that runs from complex autonomous elements such as retroviruses, through LTR retrotransposons and non-LTR retrotransposons (LINEs), to SINEs that encode no proteins: processed pseudogenes are one-way products of templates chosen by accident by trans-acting reverse transcriptase activities in the genome.
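The features of processed pseudogenes follow directly from copying a mature transcript rather than the gene itself. The Python sketch below is a toy model under simplifying assumptions: the hypothetical `Gene` class holds a promoter that is not itself transcribed, plus exons and introns; splicing is modeled as simply joining the exons; the poly(A) length of 20 is arbitrary; and reverse transcription is treated as a one-to-one copy written in DNA letters. All names and sequences are illustrative.

```python
from dataclasses import dataclass


@dataclass
class Gene:
    """Hypothetical toy gene: an untranscribed promoter, exons and introns."""
    promoter: str
    exons: list[str]
    introns: list[str]  # introns[i] lies between exons[i] and exons[i + 1]

    def genomic(self) -> str:
        """Return the gene as it sits in the chromosome."""
        parts = [self.promoter]
        for i, exon in enumerate(self.exons):
            parts.append(exon)
            if i < len(self.introns):
                parts.append(self.introns[i])
        return "".join(parts)


def processed_pseudogene(gene: Gene, poly_a_len: int = 20) -> str:
    """DNA copy of the mature transcript of `gene`.

    The promoter is not part of the transcript, splicing removes the introns,
    and a poly(A) tail is added; reverse transcription is modeled as a plain
    copy written in DNA letters.
    """
    mature_mrna = "".join(gene.exons) + "A" * poly_a_len
    return mature_mrna


gene = Gene(promoter="TATAAAAG", exons=["ATGGCC", "GATTGA"], introns=["GTAAGTCCAG"])
pseudo = processed_pseudogene(gene)
print(gene.genomic())
print(pseudo)
```

The copy contains neither the promoter nor the intron, which is why, in this model, an integrated processed pseudogene has no means of being transcribed again.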


Genome Distribution and Chromosomal Organization

Two types of repetitive DNA, structurally different and dissimilarly arranged, constitute a major part of the genetic material in higher plants. Tandemly arranged sequences make up satellite and microsatellite DNAs. Interspersed sequences widely distributed in the genome consist mainly of retrotransposable elements; in maize they constitute more than 80% of the genome. Besides their individual structure, which is correlated with their mode of transposition, their arrangement at the chromosome level is of special interest.

Although the elements are often widely interspersed, investigations in Beta species by fluorescent in situ hybridization (FISH) revealed a nonrandom distribution of LINE elements, which are located in discrete clusters. Homogenization of certain areas (by loss of TEs) might be a genomic mechanism to limit the presence of retroelements to specific chromosomal locations. Alternatively, preferential insertion at specific positions might concentrate the elements. Such a tendency has been shown for Zepp, a non-LTR retroelement in Chlorella vulgaris that preferentially integrates into other Zepp copies (Schmidt, 1999).

Ty1/copia elements are dispersed in euchromatin. In some species they are unevenly distributed, and the preferred regions differ between elements and between species. Some families are clustered in, or absent from, chromosomal areas such as centromeres, telomeres, or paracentric regions. In euchromatin (the location of most genes), the elements occur mainly in spacer DNA (intergenic regions). Drosophila-specific retroelements are detected at telomere or centromere sites, whereas in the Gramineae the cereba and Ty3/gypsy sequences cluster in centromeric regions. To date, no known function can be correlated with these different patterns of occurrence.


Class II: DNA Transposons


Since the discovery of DNA transposons in maize more than 50 years ago, the elements of this class have been investigated in detail. They transpose via a nonreplicative cut-and-paste mechanism. Because the transposon is cut out at the donor site, the number of Class II transposons does not increase during transposition. Yet they can multiply when they transpose from an already replicated site into one that is replicated later. In general, the number of DNA transposons ranges from a few copies up to several hundred per genome. The elements encode only one or a few proteins, which constitute trans-acting factors sufficient for transposition to occur. However, the structure of the termini (the cis-determinants for transpositional activity) is more complicated than the corresponding retroelement structures. Current models for transposition of Class II elements emphasize the necessity of concerted cuts at both ends to avoid chromosome breakage. The DNA-based cis-requirements for excision seem to be more intricate than those for retroelements. The termini contain two types of cis-acting signals: terminal inverted repeats (TIRs), which determine the position to be cut, and subterminal motifs, which seem to guarantee the alignment of the ends in an appropriate arrangement for the cut-and-paste mechanism.
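The statement that Class II elements can multiply when they move from an already replicated site into one replicated later can be made concrete by counting copies on the two sister chromatids. The Python sketch below is an illustrative model with assumptions not taken from the text above: a single element moves once during S phase; if the donor site is already replicated, only one sister copy (chromatid A) is excised; and if the target site is already replicated, the element lands on that same chromatid only. The function name is hypothetical. Without transposition each chromatid would carry one copy.

```python
from itertools import product


def copies_after_replication(donor_replicated: bool,
                             target_replicated: bool) -> tuple[int, int]:
    """Element copies on sister chromatids (A, B) after one cut-and-paste event in S phase.

    Toy-model assumptions: a single element moves once; if the donor site has
    already been replicated, only the copy on chromatid A is excised; if the
    target site has already been replicated, the element inserts into
    chromatid A only. Sites replicated after the event are copied onto both
    chromatids.
    """
    a = b = 0
    if donor_replicated:
        b += 1  # the sister copy on chromatid B stays at the donor site
    if target_replicated:
        a += 1  # insertion behind the fork reaches one chromatid only
    else:
        a += 1  # insertion ahead of the fork is copied onto both chromatids
        b += 1
    return a, b


for donor, target in product([False, True], repeat=2):
    a, b = copies_after_replication(donor, target)
    print(f"donor replicated: {donor}, target replicated: {target} -> A={a}, B={b}, total={a + b}")
```

Only the combination "donor replicated, target not yet replicated" yields three copies in total (one chromatid carrying two), matching the case described in the text; the reverse combination leaves one chromatid without the element.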


Ac/Ds Superfamily in Maize and Antirrhinum 

The Ac/Ds family comprises some hundred elements per genome, most of them inactive or nonautonomous. Nonautonomous elements are derived from the autonomous Ac by, for example, base substitutions or deletions in the protein-coding region. In general, only a few Ac master elements exist per genome. Ac is 4565 bp long, and its central part encodes the 807-amino-acid transposase (TPase) (see Figure 4). The TPase recognizes two different types of cis-acting signals in the Ac termini via a bipartite DNA-binding domain. The outermost ends of Ac are marked by 11 bp TIRs. Multiple 3-4 bp DNA-binding motifs located internal to the TIRs are bound by the TPase as well. Despite the absence of sequence homology between them, both the 11 bp TIRs and the short motifs are recognized by the TPase (Becker and Kunze, 1997). According to the transposome hypothesis, the subterminal motifs are required for the correct alignment of both ends through TPase-TPase interactions. Concerted cuts at both TIRs might then be made by TPase molecules bound to the TIRs. During integration a staggered incision is made in the host DNA, and the element is ligated to the single strands of the locus. The gaps are finally filled in by host repair enzymes, creating 8 bp target site duplications. After excision, a more or less perfect trace of the element’s visit can be detected in the form of ‘footprints,’ generated by imprecise excision and repair (see ‘Transposable Elements and DNA Diversity’ and Figure 4B).
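How a footprint arises from the 8 bp target site duplication can be illustrated with two string operations. This Python sketch assumes, hypothetically, that excision cuts exactly at the element termini and that repair simply rejoins the flanks; the `trim` parameter is an invented, crude stand-in for imprecise repair and does not reproduce the actual footprint-generating mechanism. Function names and sequences are illustrative only.

```python
def insert_class_ii(host: str, site: int, element: str, tsd_len: int = 8) -> str:
    """Cut-and-paste insertion that duplicates host[site:site + tsd_len]."""
    tsd = host[site:site + tsd_len]
    return host[:site] + tsd + element + tsd + host[site + tsd_len:]


def excise(locus: str, element: str, trim: int = 0) -> str:
    """Remove the first occurrence of `element`, leaving both flanking duplications.

    trim > 0 deletes that many bases on each side of the rejoined junction,
    a crude, hypothetical stand-in for imprecise repair.
    """
    start = locus.index(element)
    left, right = locus[:start], locus[start + len(element):]
    if trim > 0:
        left, right = left[:-trim], right[trim:]
    return left + right


host = "AAAAGTCCATGCTTTT"               # hypothetical empty donor site
locus = insert_class_ii(host, 4, "<Ds>")
print(locus)                            # AAAAGTCCATGC<Ds>GTCCATGCTTTT
print(excise(locus, "<Ds>"))            # footprint: the 8 bp duplication remains
print(excise(locus, "<Ds>", trim=2))    # altered footprint after "imprecise" repair
```

Even a perfectly precise excision in this model does not restore the original sequence: the 8 bp duplication stays behind, which is why a transposon's former presence can still be read from its donor site.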

The TPase acts in trans on the Ac (Activator) element as well as on the nonautonomous Ds elements. In some cases Ds elements can clearly be identified as descendants of autonomous elements. In other instances, however, for example Ds1, sequence comparison indicates that the element is not derived from Ac, although 11 bp TIRs and subterminal motifs can also be identified.

The presence of many different TEs in the same plant species raises the question of whether, and if so to what extent, TEs are related within and between organisms. The Ac/Ds transposons, comprising several hundred elements in maize, belong to the Ac superfamily. All of these elements generate 8 bp target site duplications and display similarities in their TIR sequences. In maize, the TEs Ds1, Bg, and rDT belong to this superfamily. Further members are found in Chlamydomonas, Petunia, Pisum, and Petroselinum. One of the genetically and molecularly best-characterized transposons, Tam3 of Antirrhinum majus, is also a member of this superfamily. The Tam3 (transposon Antirrhinum majus) DNA of 3629 bp is bordered by 12 bp TIRs and contains an open reading frame coding for 749 amino acids. An overall 30% similarity is found between the Tam3 TPase and the 807-amino-acid Ac TPase (in some conserved regions the similarity rises to about 65%). These findings indicate that homologies between elements from different species are not limited to the cis-acting determinants for transposition at the DNA level but also extend to the proteins necessary for transposition. The homologies at the DNA and protein levels indicate common patterns and blueprints and thus clear systematic relationships.


En/Spm in Maize: CACTA Superfamily 

The CACTA superfamily of TEs is defined by similarities in the outermost TIR sequences. Elements belonging to this superfamily have been identified in maize, rice, Antirrhinum (Tam1), Glycine max, and Pisum. The En/Spm element is the most thoroughly analyzed member of the CACTA superfamily. It was discovered independently by Peterson in 1953, who termed the element Enhancer (En), and by McClintock in 1954, who called it Suppressor-mutator (Spm). En/Spm, as depicted in Figure 5, is the autonomous master element that carries all the necessary cis- and trans-determinants for transposition. As in Ac, the central protein-coding part of the 8287 bp element is dispensable as far as the cis-acting transposition signals are concerned. The cis-determinants for transposition consist of perfect 13 bp TIRs and 12 bp subterminal sequence motifs. Hence, the basic organization of Ac and En/Spm is similar. However, En/Spm integration generates only 3 bp target site duplications, which are characteristic of all members of this superfamily. In further contrast to Ac, En/Spm encodes two proteins. A primary transcript is subjected to alternative splicing, and the two mRNAs of 2.5 kb and 6 kb are translated into two proteins, the 67 kDa TNPA and the 132 kDa TNPD (transposon proteins A and D). The quantitatively dominant TNPA binds to the subterminal 12 bp motifs and presumably acts as a glue between the termini, ensuring the correct alignment of the TIRs. Whereas TNPA recognizes the subterminal motifs, TNPD presumably binds the 13 bp TIRs, which are not recognized by TNPA. Both proteins are required for transposition, as has been shown in tobacco as a heterologous host.

Figure 4 The Ac-encoded TPase (transposase) recognizes the TIRs (terminal inverted repeats) and short subterminal motifs (indicated by black bars). The target site duplications at the outermost ends of the TIRs are depicted as triangles. The three black arrows mark transcription start sites. In the hypothetical transposome (a complex arrangement of the Ac termini and TPase molecules) the TPase binds to the short subterminal motifs, bringing the ends together in correct alignment. TPase molecules that have been shown to recognize the TIRs might then cut the termini. The host repair machinery fills in the gaps that result from the first steps of integration. At the donor site a more or less perfect target site duplication is left as a footprint.

Figure 5 The ends of the En/Spm element are marked by 13 bp TIRs. Target site duplications in the host DNA are indicated by triangles. The element encodes TNPA and TNPD (transposon proteins A and D, respectively). The transcription start site is marked by the small arrow. The corresponding mRNAs are generated by alternative splicing. TNPA recognizes the subterminal motifs depicted as black bars. Although TNPD is essential for transposition, specific DNA binding by TNPD has not yet been demonstrated. In a hypothetical transposome, TNPD recognizes the TIRs and cuts just outside the termini.

In Antirrhinum, a close relative of En/Spm is the 15164 bp-long Tam1 element, which is bordered by 13 bp TIRs and also generates 3 bp target site duplications. Tam1 encodes two putative proteins, TNP1 and TNP2, which in all probability are the functional equivalents of TNPA and TNPD.
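The two structural features described above, perfect TIRs at the element ends and a short target site duplication created on insertion, can be illustrated with a minimal sketch. It assumes that the transposase makes staggered cuts a fixed distance apart in the target (3 bp for En/Spm and Tam1) and that filling in the resulting single-stranded gaps copies the target site onto both flanks. The function names and the 13 bp TIR below are hypothetical toy values, not the real En/Spm sequences.

```python
# Minimal sketch (hypothetical helper names): perfect TIRs and a target site
# duplication (TSD) produced when a transposon inserts into host DNA.
# Assumption: staggered cuts tsd_len bp apart are filled in after insertion,
# which copies the target site onto both flanks of the element.

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase A/C/G/T sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def has_perfect_tirs(element: str, tir_len: int) -> bool:
    """True if the first tir_len bases are the reverse complement of the last tir_len."""
    return element[:tir_len] == reverse_complement(element[-tir_len:])


def insert_with_tsd(host: str, pos: int, element: str, tsd_len: int) -> str:
    """Insert element at pos; host[pos:pos + tsd_len] ends up on both sides."""
    target = host[pos:pos + tsd_len]
    return host[:pos] + target + element + target + host[pos + tsd_len:]


# Toy 13 bp TIR (not the real En/Spm sequence) and a short toy interior.
tir = "CACTACCGTTAGC"
element = tir + "GGGG" + reverse_complement(tir)
print(has_perfect_tirs(element, 13))                  # True
print(insert_with_tsd("TTAGCATGCAA", 4, element, 3))  # element flanked by CAT repeats
```

Because the TSD is a direct repeat while the TIRs are inverted repeats, the inserted element reads as host, direct repeat, inverted-repeat-bounded element, direct repeat, host; changing `tsd_len` to 9 would mimic the Mu pattern.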


Mutator Elements: An Unusual Class II 
Family 

Mutator (Mu) elements were originally identified by their abnormally high mutation rate, hence the name. Although the independently isolated autonomous elements appear to be virtually invariant and therefore form just one subfamily, the nonautonomous elements have been classified into six subfamilies, whose internal sequences are often totally unrelated between subfamilies but strongly homologous within them (Figure 6). Thus, in comparison with other Class II elements such as Ac/Ds or En/Spm, members of the Mutator family display an unusual degree of sequence diversity. However, similar TIRs border autonomous and nonautonomous elements alike. To date, large numbers of Mu elements have been isolated and sequenced.

The autonomous element is termed MuDR. It is 4942 bp long and is bordered by c. 220 bp-long TIRs containing the Mu promoters. The first 180 bp of the TIRs of MuDR elements show 99% identity. Two major transcripts have been identified, a 2.8 kb mudrA and a 1.0 kb mudrB product. Deletions in the mudrA-coding region abolish Mu activity. The 823-amino-acid MUR-A product contains a sequence motif that is also found in the transposases of nine prokaryotic IS (insertion sequence) elements; MUR-A is therefore presumed to be the transposase of this family. No clear function has yet been defined for the mudrB transcript, for which alternative splicing has been reported: either two or three small introns are spliced out. The longer mRNA encodes a 207-amino-acid protein and the shorter mRNA a polypeptide of 167 amino acids (Figure 6).

Besides their sequence diversity, another interesting feature of Mu elements is their unusual transposition behavior. In contrast to other Class II elements, which transpose only via a cut-and-paste mechanism, two different types of mechanism have to be postulated for Mu. The number of elements can increase threefold per generation in low-copy stocks, and new Mu insertions can arise at a frequency of 10-15 copies per gamete per generation, yet the germinal reversion rate is extremely low. A simple cut-and-paste mechanism does not seem sufficient to explain this transposition behavior. At the same time, somatic revertant sectors show characteristic footprints resembling those of other Class II TEs. Mutator therefore seems to use two different types of transposition depending on the tissue and developmental stage of the host cell. The somatic effects can be explained by a cut-and-paste mechanism (as shown for Ac in Figure 4). During the development of the germ cells, however, Mu transposition was found to be duplicative: the double-strand break at the excision site triggers a gene conversion-like gap repair, which may use either the sister chromatid or the homologous chromosome as a template. Such a transposition event ultimately leads to a duplication of the element. Although the exact function of MUR-B is still unknown, in situ immunolocalization in germinal tissues has revealed an accumulation pattern of MUR-B that would be consistent with a role in switching between the two alternative pathways: a gene conversion-like gap repair in germinal cells and a fill-in/religation repair belonging to the cut-and-paste mechanism in somatic tissues (for a review see Kunze et al., 1997).

Figure 6 (A) The ends of the autonomous MuDR element are marked by 220 bp TIRs. The host DNA exhibits 9 bp target site duplications, as indicated by triangles. Transcription start sites are marked by the arrows above the element. Two convergent transcripts encode MUR-A (the presumed transposase) and MUR-B; MUR-B is possibly involved in switching between the two types of transposition mechanism described in the text. (B) The gap repair model for Mu transposition ultimately leads to sequence duplication of the Mu element. The double-stranded gap left behind after transposition is widened at the donor site by exonuclease activity. A homologous genomic sequence serves as a template for the gap repair. After completion of the repair process, the element has been duplicated.
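The contrast between the two Mu pathways can be expressed as simple bookkeeping. In the minimal sketch below (hypothetical names; it assumes each event is purely of one type and that the gap-repair template always carries the element, which need not hold for a homolog), cut-and-paste leaves the copy number unchanged but empties the donor site, whereas gap repair adds a copy and leaves the donor occupied. This is one way to see how copy number can rise quickly while germinal revertants stay rare.

```python
from dataclasses import dataclass


@dataclass
class Outcome:
    copies: int         # element copies in the genome after the event
    donor_empty: bool   # True if the donor site has lost its element


def transpose(copies: int, mode: str) -> Outcome:
    """Net effect of one hypothetical Mu transposition event."""
    if mode == "cut_and_paste":
        # Element leaves the donor and inserts elsewhere; the donor gap is
        # filled in and religated, leaving a footprint (somatic pathway).
        return Outcome(copies=copies, donor_empty=True)
    if mode == "gap_repair":
        # The donor double-strand gap is repaired from a sister chromatid or
        # homolog assumed to carry the element, so the donor keeps a copy
        # and the new insertion adds one (germinal pathway).
        return Outcome(copies=copies + 1, donor_empty=False)
    raise ValueError(f"unknown mode: {mode}")


def run(start: int, modes: list[str]) -> tuple[int, int]:
    """Apply events in order; return (final copy number, donor sites emptied)."""
    copies, emptied = start, 0
    for mode in modes:
        outcome = transpose(copies, mode)
        copies = outcome.copies
        emptied += int(outcome.donor_empty)
    return copies, emptied


print(run(5, ["gap_repair"] * 10))     # (15, 0)
print(run(5, ["cut_and_paste"] * 10))  # (5, 10)
```

Starting from 5 copies, ten gap-repair events give 15 copies (a threefold increase) with no emptied donor sites, whereas ten cut-and-paste events leave 5 copies and ten potential revertant sites.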


Regulation of Transposition 


Any transposition event is potentially harmful, to a greater or lesser degree, to the affected genome. Hence, a limitation of transposition frequency might be advantageous for both the host and the TEs. Yet, if there is any benefit in downregulating TE activity, how can such an advantage be reconciled with the enormous numbers of retroelements mentioned above, which indicate an impressive tolerance and plasticity of the plant genome for repetitive DNA?

One possibility is that downregulation need not limit the overall number of elements but only the number of transposition events over time, reducing potential damage by distributing rare events over long periods. As mentioned above, plant genomes contain vast amounts of retroelements, which occur as diverse but related sequences. However, active elements are either absent or extremely limited in number. The transposition rate of retrotransposons is generally controlled at the level of transcription. The transcription rate is normally low, and transposition has been detected for only a few plant retroelements.

Some mechanisms of regulation have already been demonstrated experimentally. In Tnt1 (a member of the Ty1/copia group), transcription is generally low but can be stimulated by microbial factors and by abiotic factors such as wounding and freezing. The promoter of the Tnt1 element in tobacco is located in the 5’ LTR, and the corresponding regulatory sequences have been identified in its U3 region. Activation by genomic stress has likewise been established for Class II elements such as Ac and En/Spm, which also display tight regulation of the transposition rate.

Moreover, the Ac/Ds and En/Spm TE families can be inactivated epigenetically. This inactivation is associated with cytosine methylation of the element sequences upstream of their promoters or in nearby regions. Inactive Ac elements are hypermethylated; however, they can be activated in trans by an active Ac element that supplies the TPase. Reactivation seems to be correlated with partial demethylation and stimulation of transcription. Similar effects have been found for En/Spm, for which different states of inactivation can be distinguished. Cryptic inactive elements are not transiently reactivated by an active En/Spm in the genome, whereas silent inactive elements are activated by the TNPA protein supplied in trans. TNPA binds to the En/Spm promoter in a methylation-sensitive manner. Like Ac, En/Spm encodes a protein, TNPA, that maintains the unmethylated state (Becker and Kunze, 1996).

On the other hand, reactivation can be induced by gamma irradiation or chromosome breakage, leading to mass activation of transposable elements comparable to the hybrid dysgenesis caused by the Drosophila P elements.

Apart from such stress activation, other factors influence the timing and rate of transposition. Transposition of Ac, for instance, seems to depend on the number of elements in the genome. A negative dosage effect can be observed in certain genetic backgrounds: the more autonomous elements are present in the genome, the less frequently and the later in development transposition events occur. Thus, an increasing number of elements correlates with a reduction of transposition.


Transposons as Tools: Transposon 
Tagging and Reverse Genetics 


The classical transposon tagging strategy has been widely used in bacteria and plants (for a review in plants, see Kunze et al., 1997). An inserted transposon sequence serves as a tag for isolating the neighboring host sequences, which enables rapid isolation of the gene it has inactivated. In the era of functional genomics, this classical strategy has been extended by more quantitative approaches: the aim has shifted from the isolation of a specific gene to quantitative mutagenesis of the genome. Saturation mutagenesis with either TEs or T-DNA transferred from Agrobacterium tumefaciens to the plant has been applied successfully for this purpose.

Figure 7 (See Plate 42) The excision of the Ac transposable element at the P locus of Zea mays resulting in a variegated colour pattern in the pericarp of the kernels.

Additionally, quantitative mutagenesis of the genome allows reverse genetics. A gene of interest whose function is still unknown can be investigated once an individual carrying a suitable insertion in it has been identified. PCR primers directed against the TE and the gene of interest allow such insertion mutants to be identified. The PCR screens are usually performed on pooled DNA from many plants, so that a specific insertion mutant can be isolated quickly. In the model plant Arabidopsis thaliana, the well-analyzed En/Spm system has been used successfully for this purpose. Reverse genetic strategies based on endogenous TEs have also been applied effectively in Antirrhinum, Petunia, and maize.
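The pooling idea can be sketched in a few lines. The text does not specify a pooling design, so this sketch assumes a hypothetical two-dimensional layout: plants are arranged in a grid and DNA is pooled by row and by column. A plant carrying the insertion makes both its row pool and its column pool give a product with the TE primer plus the gene-specific primer, and candidate plants lie at the intersections. All function names are illustrative.

```python
def pcr_positive_pools(mutants: set[tuple[int, int]]) -> tuple[set[int], set[int]]:
    """Row and column pools that would give a TE-primer/gene-primer PCR product."""
    rows = {r for r, _ in mutants}
    cols = {c for _, c in mutants}
    return rows, cols


def candidate_plants(pos_rows: set[int], pos_cols: set[int]) -> list[tuple[int, int]]:
    """Grid positions at the intersections of positive row and column pools."""
    return [(r, c) for r in sorted(pos_rows) for c in sorted(pos_cols)]


def pcr_reactions(n_rows: int, n_cols: int) -> int:
    """Number of pooled PCRs needed for one pass over the grid."""
    return n_rows + n_cols


# One insertion mutant at row 2, column 5 of a hypothetical 20 x 20 grid.
rows, cols = pcr_positive_pools({(2, 5)})
print(candidate_plants(rows, cols))  # [(2, 5)]
print(pcr_reactions(20, 20))         # 40 reactions instead of 400
```

The design trades reactions for ambiguity: with two mutants in different rows and columns, four intersections light up and two of them are false candidates that must be retested individually.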


Quantitative Contributions to the 
Genetic Material 


The vast number of retrotransposons and their impressive contribution to the total amount of nuclear DNA in plants reflect the astonishing tolerance and plasticity of the plant genome. Whereas the copy number of Class II elements in maize is usually limited to a maximum of several hundred, thousands of different Class I element families are estimated to account for 70-85% of its nuclear DNA. In comparison, Saccharomyces cerevisiae contains only five LTR retrotransposon families, which contribute 3% of its genome. In Arabidopsis, several hundred families of LTR and non-LTR elements account for 14% of the genome. In mammals, by contrast, LINEs and SINEs are the dominant groups, comprising up to 100 000 copies and 35% of the nuclear genome, whereas elements of the Ty1/copia group are either absent or present only in small numbers. Plants, in turn, possess elements of the Ty1/copia and Ty3/gypsy groups as well as LINE-like and SINE-like elements. Overall, plant TEs surpass those of the other kingdoms in diversity and abundance (Kumar and Bennetzen, 1999).


Transposable Elements and 
DNA Diversity 


Owing to their ability to insert at new genomic sites, Class I and Class II TEs can generate mutations and create homologous sequences at nonallelic loci. Both are known to be essential for the generation of genetic diversity. Homologous recombination between such nonallelic copies leads to chromosomal rearrangements, including deletions, duplications, inversions, and translocations. Genetic diversity is further increased by alterations in gene expression, for example, when elements insert into regulatory regions (usually leading to reduction or loss of function of a pathway). The unstable mutations observed for Class II elements reflect insertions and excisions at the DNA level. Additional sequence diversity is generated upon excision (as also found for the P elements of Drosophila): at the donor site, a more or less precise target site duplication is left behind. Whereas target site duplications lead to additional bases and/or amino acids, the excision mechanism often deletes or adds bases and thus creates imprecise footprints (Saedler and Nevers, 1985; Kunze et al., 1997). The many different TE footprints result in altered host sequences: over 90% of more than 800 analyzed Ds excision products of maize Waxy alleles were mutant sequences that did not restore the wild type. However, the sequence deviations proved to be ‘surprisingly nonrandom’ (Scott et al., 1996): depending on the allele and the insertion site, some 37-88% carry a predominant footprint, and even the less prevalent footprints are often similar to the prevalent ones. Moreover, in 1-6% of the excisions, no footprint was formed. Only the remainder appears to consist of random sequences.
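Because a footprint adds or removes a few bases, its effect on a coding sequence depends largely on whether the net length change is a multiple of three. The sketch below classifies excision products on that basis and tallies the categories used above (no footprint, predominant footprint). It is a hypothetical illustration on toy sequences, not the Waxy data; it assumes the excision site lies within a coding region and does not check whether an in-frame footprint creates a stop codon.

```python
from collections import Counter


def footprint_effect(wild_type: str, product: str) -> str:
    """Classify one excision product by its net length change relative to wild type."""
    if product == wild_type:
        return "no footprint"
    delta = len(product) - len(wild_type)
    if delta % 3 != 0:
        return "frameshift"                       # reading frame disrupted
    if delta > 0:
        return f"in-frame, +{delta // 3} codon(s)"  # amino acids added
    if delta < 0:
        return f"in-frame, -{(-delta) // 3} codon(s)"  # amino acids removed
    return "in-frame substitution"                # same length, altered bases


def summarize(wild_type: str, products: list[str]) -> dict:
    """Share of products with no footprint, and the predominant footprint's share."""
    counts = Counter(p for p in products if p != wild_type)
    total = len(products)
    no_footprint = (total - sum(counts.values())) / total
    predominant, n = counts.most_common(1)[0] if counts else (None, 0)
    return {"no_footprint": no_footprint,
            "predominant": predominant,
            "predominant_share": n / total}


wt = "ATGGCCAAATTT"                               # toy coding sequence
plus3 = "ATGGCCGCCAAATTT"                         # 3 bp footprint
print(footprint_effect(wt, plus3))                # in-frame, +1 codon(s)
print(summarize(wt, [plus3, plus3, plus3, wt]))
```

Only footprints whose length is a multiple of three can add whole amino acids without shifting the downstream reading frame.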

Interestingly, sequence deviations leading to the addition of one to three amino acids to a Waxy protein did not abolish protein function: although the new alleles encode proteins with reduced enzymatic activities, wild-type function is still largely maintained. The question whether, and to what extent, such results obtained in cultivated plants (whose existence depends on human care and interest) can be extrapolated to the origin of plant species in the wild leads us to our next TE topic.


Transposons and Evolution 


The present discussion about transposons and evolution is one of the most interesting and undoubtedly also one of the most controversial topics of TE research. Given that about one third (and sometimes up to 70-85%) of the total nuclear DNA of plants can consist of transposon sequences (not to mention the similar situation in animals and humans), the obvious question is: what, if any, is the biological function of TEs?

In contrast to bacteria, where a range of clear-cut functions has been detected and described in detail (see entries on related topics), it is not possible to make a similar sweeping statement for the TEs of eukaryotes. In the latter, the search for a general biological function has so far yielded no more than several alluring yet contradictory hypotheses without much hard scientific evidence behind them (for reviews see Kunze et al., 1997; Lönnig and Saedler, 1997).

The two main hypotheses on the existence of TEs 
may be presented as follows: 


1. Transposons as evolutionary tools. TEs are thought to have a general biological function, contributing essentially to the origin of eukaryotic species and (perhaps also) higher systematic categories (Nevers et al., 1986; McClintock, 1987). At present, this hypothesis rests mainly on the enormous mutagenic potential of TEs: as mentioned above, TE activities cover nearly the whole range of mutations. The possible mutagenic effects of plant TEs even include the generation of intron-like sequences and the ectopic expression of genes. Moreover, TEs can generate mutations at such accelerated rates that no other known natural mutagenic agent can compete with them. So the basic inference is: DNA variation is necessary in evolution; TEs produce DNA variation; thus TEs are important in evolution. In sum, transposon existence and spread are due to natural selection of superior phenotypes caused by TE-induced advantageous mutations.

2. Transposons as parasites. The second hypothesis was first formulated by Doolittle and Sapienza (1980) and Orgel and Crick (1980): TEs have no general biological function. They exist only for their own sake and not for the organism’s. As transposons represent at least a slight energetic burden for the organisms harboring them, and since many TE activities are clearly destructive, they essentially consist of ‘selfish DNA’, which may also be labeled ‘parasitic DNA’ or even the ‘ultimate parasite’ (Orgel and Crick, 1980). Their only ‘function’ is survival in their environment, where they can spread as long as they do not become too harmful to their hosts. TEs can replicate and spread because they happen to be in surroundings in which DNA replication is part and parcel of the regular key events of cell division (for several further points see below, as well as the review by Kunze et al., 1997). However, the hypothesis does not exclude occasional contributions to the origin of useful variation in nature.


Concerning hypothesis (1), it may be conceded that TEs have a place in the microevolution of wild populations (a possibility that has yet to be tested) (Lönnig and Saedler, 1997). We must, nevertheless, distinguish between ‘necessary’ and ‘sufficient’ causes in explaining a phenomenon such as the origin of the plant (and animal) world: is the origin of species and higher systematic categories fully explained by natural selection of TE-induced mutations together with the DNA sequence variation produced by the other mutational processes?

It was no less a person than Charles Darwin himself who provided the following sufficiency test for his theory (Darwin, 1859): “If it could be demonstrated that any complex organ existed, which could not possibly have been formed by numerous, successive, slight modifications, my theory would absolutely break down.” However, Darwin stated that he could “not find out such a case.” Whether the situation has changed over the intervening 150 years or so of biological research is a question that several biologists answer in the affirmative. Michael J. Behe (Behe, 1996, 2000) has refined Darwin’s statement by introducing and defining his concept of “irreducibly complex systems,” specifying:


By irreducibly complex I mean a single system composed of 
several well-matched, interacting parts that contribute to the 
basic function, wherein the removal of any one of the parts 
causes the system to effectively cease functioning. 


Behe and like-minded researchers are convinced that they have detected several such systems at the biochemical level (the origin of the cilium, the flagellum, blood clotting, vesicular transport, and further examples; see Behe, 1996, 2000). Lönnig (1998) suggests that irreducible complexity may also be found at the anatomical level (in combination with biochemical systems) in angiosperms, for instance in the trap mechanism(s) of Utricularia and several other carnivorous plants.

Supposing that such systems exist, could the non- 
Darwinian (more or less saltational) view of TEs as 
evolutionary tools as favored by many TE researchers 
(for a review, see Kunze et al., 1997) help solve the 
problem in a naturalistic way? 

For a balanced answer, we have to consider briefly what proponents of hypothesis (2), i.e., transposons as ‘parasites,’ can further object to a general evolutionary role of TEs:


1. Since “low mutation rates are necessary for life as we know it” (Alberts et al., 1994, p. 243, with a review of the evidence), the unusually frequent movements of activated TEs can be life-threatening for the natural populations affected by them. High mutation rates result in ‘error catastrophe’, leading to the extinction of a population (for the details of population genetics, see ReMine, 1993).

2. Apart from very few exceptions, almost all TE 
insertions into coding sequences cause losses of 
gene functions. 

3. Even the footprints leading to an addition of one to three amino acids to a protein must mainly be classified as regressive evolution, i.e., protein function is reduced (for examples, see Lönnig and Saedler, 1997).

4. The hierarchy of gene functions has to be consid- 
ered. Transposon activities in genes coding for his- 
tones, ubiquitin, actin, many tRNAs, and other 
ultraconservative and conservative parts of the gen- 
ome have to be classified as nearly always ‘parasitic.’ 

5. Yet, even in the plant genome’s more redundant 
parts (flower color, plant height, form of leaf mar- 
gins, etc.), TE activities can be disadvantageous. For 
instance, loss of the Nivea gene function (one of the 
basic functions in the anthocyanin pathway) affects 
not only flower color, but also lowers resistance to 
stresses, such as UV light, cold, pathogens, and 
mechanical damage. (As for the problems of gene 
duplications and exon shuffling by TEs, as well as a 
possible synthesis between different views, see 
Kunze et al., 1997.) 

6. The probability that activated TEs will simultan- 
eously generate several independent, advantageous 
mutations in different parts of the genome, salt- 
ationally resulting in irreducibly complex struc- 
tures or organs appears to be very low. 

7. Besides the problem of irreducible complexity, and pertaining more generally to the origin of species, the following points should be considered. The Lilium case mentioned above already hints at the fact that there is no correlation between the number and kinds of TEs and the number of species and genera formed within plant families, nor is there a strict connection between the overall complexity of a species (or higher systematic category) and its DNA amount (C-value paradox).


Concerning the question whether TE activities could account for the origin of irreducibly complex systems and organs in particular and the generation of species in general, you are invited to judge for yourself whether the facts and arguments presented so far point toward an answer to the problems raised (for further reading see Starlinger, 1993; Kunze et al., 1997; Lönnig, 2001).
Transposase is the transposon-encoded protein that is responsible for the element’s transposition. The transposase recognizes and binds to both ends of its cognate transposon, brings the two ends together in a synaptic complex, and cuts the DNA at the two 3’ ends (and in some cases at the 5’ ends as well). In a single combined cleavage-ligation step, the transposase then inserts the 3’ transposon ends into target DNA.


See also: Insertion Sequence; P Elements; 
Transposable Elements; Transposable Elements 
in Plants; Transposons as Tools 
Transposons are mobile genetic elements. Their movement within and between DNA molecules can result in a wide range of genome rearrangements, including insertions, deletions, inversions, duplications, replicon fusions, and probably chromosomal translocations. In addition, they are important elements in the spread of antibiotic resistance genes in bacteria. They are extremely abundant, being found in almost all organisms, and can constitute a significant percentage of the total genomic DNA of a species.

A large number of different types of transposons have been identified. Of these, a fairly large subset transpose by a mechanism in which the transposon is simply excised from the flanking ‘donor’ DNA by a pair of double-strand breaks, one at each transposon end. The excised transposon intermediate is then inserted into a new site, which usually has no sequence relationship to the transposon or the donor site. The excision and insertion steps are catalyzed by one or more transposon-encoded proteins called transposases.

Three different strategies for generating the excised transposon intermediate have been documented. In the first, two separate transposase proteins are involved in making the double-strand break at each end. For example, in Tn7 (a bacterial transposon) the proteins TnsB and TnsA are each responsible for cleaving a different DNA strand, at and just outside of the transposon end, respectively.

However, other transposons such as the bacterial 
transposons Tn10 and Tn5 encode only a single trans- 
posase protein with a single active site. The excision 
reaction takes place within a nucleoprotein complex in 
which there are only two molecules of the transposase 
present. In this situation excision takes place by a 
mechanism in which a transposon end hairpin inter- 
mediate is formed. Here, the single active site of one 
transposase monomer first introduces a nick to expose 
a 3’OH group at the transposon terminus. Then the
same active site is used to catalyze the joining of this 
3’OH terminus to a phosphate group on the opposite 
strand of the same end. This generates the transposon 
end hairpin and severs the final connection between 
the transposon and the flanking donor DNA in one 
chemical step. The hairpin end must then be cleaved to 
reexpose the 3’OH group for joining to the target 
DNA in the final step. It is likely that a number of 
plant transposons also use this mechanism of excision. 

In the third excision mechanism, the transposase first makes a nick at only one transposon end. The exposed 3’OH terminus is then joined to a phosphate group on the same DNA strand just outside of the second end. This generates an excision intermediate in which the cleaved strand forms a covalently closed circle and the two transposon ends are held together by a single-strand bridge. Double-strand circles are generated from this intermediate by an as yet undefined mechanism. The transposase then introduces a pair of single-strand cleavages at the abutting transposon ends. This opens up the circle, generating a linear form of the excised transposon that can be inserted directly into a new target site. This mechanism is used by the bacterial transposon IS911 and other members of the IS3 family. It is not understood what constraints are responsible for the evolution of this seemingly complex excision pathway.

Interestingly, the hairpin mechanism for transposon excision is also used in the formation of double-strand DNA breaks in V(D)J recombination, the process whereby antigen receptor genes are pieced together from separate coding segments in developing T and B cells in the immune systems of jawed vertebrates. Furthermore, the proteins that catalyze this double-strand break reaction, RAG1 and RAG2, have been shown to catalyze DNA transposition reactions in vitro via the hairpin mechanism. The use of a common mechanism for double-strand break formation in V(D)J recombination and bacterial DNA transposition suggests that the V(D)J recombination system evolved from an ancient bacterial transposon.

It will be interesting to see if other mechanisms for 
carrying out transposon excision are used and what 
constraints favor these mechanisms. 
Transposons are ubiquitous mobile DNA elements that can relocate within the genome of their hosts. These elements generally have either inverted or direct repeats at their termini and encode enzymes important for transposition. Movement occurs either by duplication and insertion of the new copy into the genome (replicative) or by excision of an existing copy and insertion at a new site (nonreplicative). Insertion of the transposon causes duplication of the target site, and these elements are therefore flanked by direct repeats. Retrotransposons are a class of transposons that contain long terminal repeats and use an RNA intermediate for transposition. These elements encode reverse transcriptase and integrase, enzymes required for their duplication and transposition. Early observations by McClintock and others indicated that transposons could mediate transfer of genetic material within and between species. The application of transposons as molecular biology tools did not begin until the 1970s, coincident with our increased understanding of transposon structure and movement and the ability to manipulate these elements.

Figure 1 Common uses for transposons.


Transposons as Molecular Biology Tools 


Transposable elements have emerged as versatile, 
powerful, and informative biological tools since their 
discovery 40 years ago. Current applications of trans- 
poson technology include gene identification and tag- 
ging, positional cloning, gene disruption, generating 
reporter fusions and enhancer traps, DNA sequen- 
cing, altering protein localization or gene expression, 
introducing protein tags, and introducing new genes 
(Figure 1). Prior to their use as tools, transposons are 
generally modified in order to control their replication 
and to allow detection of their presence. Typically, the 
endogenous transposase is deleted from the trans- 
poson and expressed in trans when movement is 
required. If not already present, a suitable selectable 
marker gene (usually encoding antibiotic resistance or 
a metabolic enzyme) is inserted to allow identification 
of transposon-bearing organisms. Further modifica- 
tions are made depending on the specific transposon 
and the goals of the application. 

The methods of introduction of transposons into 
their target sites have evolved considerably over 
the years. Traditionally, in vivo transposition was 
achieved by crossing strains that lack the transposon 
to those that contain the transposon and a functional 
transposase. A more commonly used method is to transform a vector containing the modified transposon into the cells of interest and induce expression of the transposase gene in trans to achieve transposition. Although this yields random insertions, the method is not ideal if the organism of choice is not easily transformed, lacks a manipulable transposon system, or is not compatible with the expressed transposase. These problems can be overcome using in vitro transposition. In the presence of the appropriate transposase, the transposon (e.g., Tn5, Mu) randomly inserts into the target DNA, which is subsequently transformed into the cells. In vitro transposition allows the creation of large libraries of randomly tagged clones that can simplify DNA sequencing, gene mapping, and mutagenesis. Tools such as genetic footprinting were developed as a result of the ability to generate large numbers of transposon-mutagenized clones quickly and economically. Most recently, Tn5 transposition complexes have been formed in vitro and introduced into cells via electroporation. In vitro transposition will allow researchers to apply transposon technology to answer questions in a larger number of organisms.
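How large a randomly tagged library must be can be estimated with a standard probability argument. Assuming, as a simplification, that insertions land uniformly at random (real transposons, including Tn5 and Mu, show some target-site preferences), the chance that a target of length L in a sequence of length G receives at least one of N insertions is 1 - (1 - L/G)^N. The sketch below computes this and inverts it for N; the function names and example sizes are hypothetical.

```python
import math


def prob_hit(n_insertions: int, target_len: int, genome_len: int) -> float:
    """P(target receives >= 1 insertion) if each insertion lands uniformly at random."""
    f = target_len / genome_len  # chance that a single insertion lands in the target
    return 1.0 - (1.0 - f) ** n_insertions


def insertions_needed(p: float, target_len: int, genome_len: int) -> int:
    """Smallest N with prob_hit(N, ...) >= p, from N >= ln(1 - p) / ln(1 - L/G)."""
    f = target_len / genome_len
    return math.ceil(math.log(1.0 - p) / math.log(1.0 - f))


# Hypothetical sizes: a 1 kb target in a 1 Mb sequence (L/G = 0.001).
n = insertions_needed(0.95, 1_000, 1_000_000)
print(n, round(prob_hit(n, 1_000, 1_000_000), 3))
```

For a target occupying 0.1% of the sequence, roughly 3,000 random insertions give a 95% chance of at least one hit; insertion-site bias would raise this number for disfavored targets.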


Gene Identification and Tagging 


Transposons are used to identify genes of interest in a 
wide variety of organisms that include viruses, bac- 
teria, fungi, flies, plants, worms, and mice. Historically, 
the Drosophila melanogaster transposon known as the 
P element has been used to isolate genes of interest to 
researchers. When female flies from laboratory strains 
are crossed to male flies from natural populations, the 
resulting progeny exhibit a variety of mutant pheno- 
types. This phenomenon, known as hybrid dysgen- 
esis, is the result of transposition of the P element 
introduced into the cross by the male flies. Progeny 
flies exhibiting a phenotype of interest are selected, 
and DNA from these flies is isolated. Since the linkage 
of the P element to the mutant gene serves as a tag, 
radioactively labeled P element sequence is used as a 
probe to identify the DNA fragments that contain a P 
element. The DNA encoding the gene of interest is 
present in a subset of this population and once identi- 
fied, is cloned and sequenced. 

Today, P elements are used to identify Drosophila 
genes by inducing movement within the germline and 
screening for the phenotype of interest. These elements 
have been modified to encode markers that affect eye 
color to allow identification of flies that contain a P 
element insertion. Upon mutagenesis with these modi- 
fied elements, flies that exhibit the phenotype of inter- 
est as well as the eye color encoded by the P element 
are selected. The sequence of the DNA flanking the 
insertion element can be identified by inverse PCR. If 
the P element contains a bacterial selectable marker 
and origin of replication, the flanking region can be 
cloned by plasmid rescue in Escherichia coli. 

Transposons are used to circumvent the challenges 
associated with identifying genes in large eukaryotic 
genomes. Random transposon mutagenesis in intron- 
rich genomes does not yield many gene mutations 
since the transposons often insert into introns and 
are spliced out of the mRNA. Springer and colleagues 
developed a Dissociation (Ds) transposon-based tool 
for gene trapping in Arabidopsis thaliana that
bypasses this problem. The transposon contains a small
intron, three splice site acceptors, a reporter gene, and 
a splice donor site in the order listed. Insertion of this 
transposon into an intron leads to alternative splicing
and expression of the reporter gene. The presence of 
the reporter gene also disrupts the coding sequence of 
the tagged gene and might therefore generate a pheno- 
type that allows identification of strains bearing muta- 
tions in the gene of interest. 

Transposons are used to identify genes that have 
tissue-specific or developmentally induced expres- 
sion patterns. Transposons designed for this purpose 
usually contain a reporter gene that is fused to a weak 
promoter (see ‘reporter fusions and gene expression’). 
The reporter gene is not expressed unless there is an en- 
hancer nearby. This tool allows identification of genes 
on the basis of their expression patterns. This approach 
has been used in worms, flies, and plants to identify 
genes that are tissue or developmentally regulated. 


Markers for Gene Mapping and Cloning 


Transposons are used as markers for mapping and
positional cloning of genes in maize and Arabidopsis
thaliana. A collection of Arabidopsis plants that contain
Ds transposon insertions carrying the kanamycin
resistance gene is being constructed. After 1000-2000
genomic transposon insertions have been mapped,
they will become valuable tools for fine-scale mapping
and cloning of gene mutations. There are currently
enough molecular markers in Arabidopsis to allow
preliminary mapping of a gene mutation to within 10 to
20 cM. Transposon tagging will provide approximately
20 mapped transposons within such a 10 to 20 cM
region. This will facilitate further mapping, since
tagged plant lines will have several transposons that
bear dominant genetic markers throughout the genome.
The strain containing the mutation to be mapped
can be crossed to the appropriate tagged lines, and
progeny in which the phenotype of interest is linked
to the dominant marker are selected. On the basis of
linkage, the region containing the mutation can then be
mapped to the sequence between two mapped transposons.
Because a recombination event will have occurred
between the mutation of interest and the transposon,
the recombination breakpoint will facilitate
cloning of the gene of interest.
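The interval-mapping step can be sketched in Python as a lookup of the nearest mapped transposon on each side of a mutation's estimated map position. This is a simplification: real mapping works from recombination frequencies with each marker, and the marker names and positions below are hypothetical, not data from the Arabidopsis collection.

```python
from bisect import bisect_left

def flanking_markers(markers, mutation_cm):
    """Return the mapped transposon markers immediately left and right of a mutation.

    markers: list of (name, position_cM) pairs for mapped insertions.
    mutation_cm: estimated map position of the mutation, e.g. from linkage data.
    A side is None when the mutation lies beyond the outermost mapped marker.
    """
    ordered = sorted(markers, key=lambda m: m[1])
    positions = [pos for _, pos in ordered]
    i = bisect_left(positions, mutation_cm)
    left = ordered[i - 1] if i > 0 else None
    right = ordered[i] if i < len(ordered) else None
    return left, right

# Hypothetical mapped Ds insertions on one chromosome arm (names and positions invented).
ds_markers = [("Ds-C", 17.0), ("Ds-A", 12.0), ("Ds-D", 19.5), ("Ds-B", 14.5)]
print(flanking_markers(ds_markers, 15.8))  # → (('Ds-B', 14.5), ('Ds-C', 17.0))
```

Sorting first means the markers can be supplied in any order; the binary search then finds the interval in logarithmic time.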


Gene Mutagenesis 


Once a gene has been identified, researchers can use 
transposons to produce disruptions or small muta- 
tions within its coding sequence. Transposon in- 
sertions can be generated in the gene by inducing 
transposon movement through genetic crosses or 
other methods and selecting or screening for the
mutation of interest. If the organism of choice lacks a
manipulable transposon system, in vitro transposition
or electroporation can be used to introduce the
transposon. Disruptions, insertions, and small
deletions within the coding sequence are the most 
common mutagenic effects. Disruptive alleles are the 
result of transposon insertion while deletions and 
small insertions are caused by imprecise excision of 
nonreplicative transposons. 

Disruption of a gene of interest is made easier if 
there is a transposon in a nearby chromosomal location.
Many eukaryotic transposons, such as P elements
in Drosophila and Ac/Ds elements in maize and
Arabidopsis, tend to reinsert very close (within 200 kb) to
the initial site when mobilized. Thus one can begin
with a strain in which a transposon is located near the
gene of interest, induce transposase expression, and
generate a collection of mutations, a subset of which
will contain disruptions in the gene of interest.
Alternatively, if the gene has been cloned into a stable
extrachromosomal DNA element, it can be mutagenized
with transposons in bacterial cells and reintroduced
into its host by transformation or gene transfer.
An advantage of many transposon-based disruption
alleles is that induction of a second round of
transposon movement can restore the wild-type
allele by precise excision. This feature is often used to
verify that the phenotype observed is the result of the
transposon insertion.

Transposons are used to generate multiple inser- 
tional alleles in a gene of interest. Depending on the 
location of a transposon insertion within a gene, it can 
cause complete or partial inactivation of the gene, or 
alter its activity or the conditions under which it is 
active. Generation of a large number of insertional 
alleles in a gene of interest is useful in dissecting gene 
function since the effect of truncating a protein at 
different positions can then be assessed. In addition,
many nonreplicative transposons, such as fly P elements,
plant Ac/Ds, and worm Tc1, excise
imprecisely and cause deletions of various sizes and
insertions. Deletions occur when flanking DNA is 
removed during excision and insertions occur when 
transposon sequences are not completely excised. If 
there is a transposon near a gene of interest, one can 
generate many different mutations within the gene 
and analyze the phenotypes of the resulting strains. 

If the phenotype of a mutation in the gene of inter- 
est is not known, mutagenized strains can be screened 
to identify those that contain mutations in the gene of 
interest. For example, in Caenorhabditis elegans large 
mutant libraries that contain Tc1-mutagenized genes
have been generated. Tc1 is a ubiquitous transposon
whose excision usually leads to deletions. After
inducing Tc1 excision, the mutagenized worms are
subcultured and allowed to produce progeny. Every
subculture will contain siblings that share the same
mutation. Genomic DNAs from the progeny of
Tc1-mutagenized strains are pooled and screened for
mutations in the gene of interest by PCR. Since Tc1
excision usually leads to deletions, the fragment
produced by the mutant allele is smaller than that
made by the wild-type allele and is favored in PCR
reactions. Once a positive result is obtained, genomic
DNA derived from each subculture is recovered and
screened to identify which subculture contains the
worms of interest.
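A minimal sketch of the two-stage pooled screen, assuming each subculture's DNA can be summarized by the PCR product sizes it would yield across the target gene. The subculture IDs, pool size, and fragment lengths are hypothetical, and the code does not model the preferential amplification of the shorter product that, in practice, makes a rare deletion detectable within a pool.

```python
def has_deletion(product_sizes, wild_type_len):
    """True if any PCR product is shorter than the wild-type amplicon."""
    return any(size < wild_type_len for size in product_sizes)

def screen_pools(subcultures, wild_type_len, pool_size):
    """Two-stage PCR screen of pooled subcultures for deletion alleles.

    subcultures: dict mapping subculture ID -> list of product sizes (bp) that PCR
        across the target gene would give for that subculture's DNA.
    Stage 1 tests DNA pooled from pool_size subcultures; stage 2 retests each
    subculture of a positive pool to find the one carrying the deletion.
    """
    ids = sorted(subcultures)
    hits = []
    for start in range(0, len(ids), pool_size):
        pool = ids[start:start + pool_size]
        pooled_products = [size for sid in pool for size in subcultures[sid]]
        if has_deletion(pooled_products, wild_type_len):  # stage 1: pooled DNA
            # Stage 2: retest the members of the positive pool one by one.
            hits.extend(sid for sid in pool
                        if has_deletion(subcultures[sid], wild_type_len))
    return hits

# Hypothetical data: 24 subcultures, one carrying a 1.3 kb Tc1 excision deletion.
WILD_TYPE = 2500
cultures = {f"sub{i:02d}": [WILD_TYPE] for i in range(1, 25)}
cultures["sub17"] = [WILD_TYPE, 1200]
print(screen_pools(cultures, WILD_TYPE, pool_size=8))  # → ['sub17']
```

In this toy example, 24 subcultures are screened with 3 pooled reactions plus 8 follow-up reactions on the one positive pool.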

Transposons are used to mutagenize genes of inter- 
est in a wide variety of organisms that include viruses, 
bacteria, fungi, plants, worms, and mice. Cloned 
genomic DNA or cDNA from many organisms can 
be mutagenized in bacterial cells and reintroduced to 
assess the phenotypic effects of the resulting muta- 
tions. Endogenous transposons such as P elements in 
flies, Ty in yeast, Ac/Ds in plants, or Tc1 in worms can
also be used to mutagenize their hosts. 


Reporter Fusions and Gene Expression 


Transposons facilitate gene expression studies by
generating reporter gene fusions. Reporter genes (for
example, lacZ, which encodes β-galactosidase) encode
proteins whose presence can be easily detected using
colorimetric or fluorescence assays. Transposon insertions
that lead to production of reporter proteins are
tremendously useful in determining tissue-specific
expression patterns in multicellular organisms and
the conditions under which genes are expressed in
unicellular organisms.

Reporter fusions can be generated in two ways: the 
first method involves expression of a fusion protein 
that contains sequences from the endogenous protein 
as well as the reporter, while the second involves
expression of an intact reporter protein from endo- 
genous regulatory sequences. Production of a fusion 
protein occurs when the reporter gene is missing a 
start codon (ATG) and therefore cannot be expressed 
unless it is downstream of and in frame with the start 
codon for the endogenous gene. A transposon is modi- 
fied so that a reporter gene that lacks a start codon is 
adjacent to one end. Insertion of this transposon into 
the coding region of a gene in the appropriate reading 
frame results in expression of a fusion protein.
Expression of the reporter gene then reflects the
expression of the tagged gene, so an expression
profile can be established for that gene. This method
has been used to study gene expression patterns in
yeast, plants, and flies.
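The reading-frame requirement for a fusion can be expressed as simple modular arithmetic. The sketch below assumes a hypothetical construct in which 38 nucleotides of transposon end precede the first reporter codon, that the insertion is in the correct orientation, and that no stop codon intervenes; the function name is illustrative.

```python
def reporter_in_frame(insertion_offset, end_to_reporter_bp):
    """Return True if an ATG-less reporter at one transposon end is in frame.

    insertion_offset: host coding nucleotides (counted from the A of the host ATG)
        that lie upstream of the insertion site.
    end_to_reporter_bp: transposon nucleotides between its end and the first
        reporter codon (a hypothetical, construct-specific value).
    Assumes the correct orientation and no stop codon before the reporter.
    """
    return (insertion_offset + end_to_reporter_bp) % 3 == 0

# Scan every insertion site in a hypothetical 900-nt coding sequence.
in_frame_sites = [p for p in range(900) if reporter_in_frame(p, end_to_reporter_bp=38)]
print(len(in_frame_sites))   # → 300
print(in_frame_sites[:3])    # → [1, 4, 7]
```

Under these assumptions one insertion site in three gives an in-frame fusion; if orientation is also random, roughly one in six insertions into a coding region would express the reporter.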

Transposons can also be used for enhancer trap- 
ping. In this application, the transposon is modified 
so that it contains a weak promoter and an intact 
reporter gene whose expression is dependent on inser- 
tion near a chromosomal enhancer. The expression of 
the reporter gene reflects the expression pattern con- 
ferred by the enhancer. The transposon does not have 
to insert within the coding sequence of a gene to be 
expressed and does not generate a fusion protein 
(Figure 2). The approach is identical to that used to 
identify genes on the basis of their expression patterns 
(see above). It helps researchers to identify regulatory 
elements and determine the conditions under which 
they are active, and is valuable in identifying enhancers 
involved in tissue and developmentally regulated gene 
expression. 
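Enhancer-trap behavior can be modeled as a distance rule: the reporter is expressed wherever an enhancer close enough to the insertion is active. This is a deliberately crude sketch; the enhancer positions, tissue names, and the 5 kb `max_distance` are hypothetical, and real enhancers act over very different ranges.

```python
def enhancer_trap_pattern(insertion_pos, enhancers, max_distance):
    """Predict the tissues in which an enhancer-trap reporter is expressed.

    insertion_pos: transposon insertion site (bp).
    enhancers: list of (position_bp, tissues) pairs for chromosomal enhancers.
    max_distance: assumed range (bp) over which an enhancer can act on the
        transposon's weak promoter.
    The weak promoter alone is assumed to give no detectable expression.
    """
    tissues = set()
    for position, active_in in enhancers:
        if abs(position - insertion_pos) <= max_distance:
            tissues |= set(active_in)
    return tissues

# Hypothetical enhancers and insertion sites.
enhancers = [(10_000, {"eye"}), (25_000, {"wing", "leg"}), (80_000, {"gut"})]
for site in (9_000, 22_000, 50_000):
    print(site, sorted(enhancer_trap_pattern(site, enhancers, max_distance=5_000)))
# 9000 ['eye'], 22000 ['leg', 'wing'], 50000 []
```

The insertion at 50 kb lies far from every enhancer, so its reporter stays silent, which is why enhancer-trap lines are screened by their expression pattern rather than by position.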

Figure 2 Multiple applications of transposons: transposon generation of gene disruptions, gene tags, truncated proteins, reporter fusions, protein tags and enhancer traps.

A popular modification of the traditional enhancer
trap experiment involves cloning a promoter into a 
transposon and then using it to cause inappropriate 
expression of nearby genes. Such misexpression data
are extremely useful in dissecting the functions of
genes that are phenotypically silent when disrupted.
This was demonstrated by Rørth
and colleagues, who used a P element that contained
the yeast GAL promoter to carry out gain-of-function
screens in Drosophila. Upon insertion of this P element,
expression of the tagged gene was under the
control of the GAL promoter. Mating these tagged
lines to flies that express the Gal4p activator resulted
in controlled overexpression. P elements tend to insert
at the 5’ ends of genes and so lead to production of
full-length transcripts. In some cases, the P element inserts
downstream of a start site and in the opposite
orientation. This leads to the expression of antisense
transcripts, which can result in loss-of-function phenotypes.

Cellular localization of fusion proteins provides 
important clues about gene function. Transposons 
can be used to alter protein localization by placing 
specific localization signals within the transposon. The 
fusion proteins encoded by the transposon-tagged 
genes are then targeted on the basis of the transposon- 
mediated signal. Researchers constructed a trans- 
poson that contains a yeast promoter and the DNA 
binding domain of the Gal4p protein. Insertion of this 
transposon often leads to the generation of Gal4p 
fusion proteins that localize to the nucleus and bind 
the GAL promoter. The transcriptional activity of the 
tagged protein can then be assessed in one-hybrid 
assays. Transposon tagging can therefore allow direct 
detection of a target protein or localize it to facilitate 
further analysis of protein function. 

Transposons can therefore be used to generate 
fusion proteins or reporter proteins that reflect endo- 
genous gene expression, identify enhancers and alter 
gene expression levels or cellular localization. 


Protein Tags 


Transposons are also used to generate protein tags that 
can be used for easy in vivo and in vitro protein 
detection (Figure 2). For this approach to be success- 
ful, the sequence encoding the tag must be inserted 
such that the reading frame of the transposon-bearing 
gene is maintained through the tag; the resulting pro- 
tein can be visualized by detecting the presence of the 
tag. Common tags include green fluorescent protein 
(GFP), and the hemagglutinin and myc epitopes. 
Epitope-tagging a protein also allows researchers to 
learn if a protein is a part of a larger complex through 
coimmunoprecipitation studies. The tagged protein 
can be identified using an antibody to the tag and 
immunoblot analysis, and purification may be aided
by the use of affinity columns that contain antibody to
the epitope.


Gene Transfer 


Transposons can also be used to introduce genes 
into organisms — an approach currently being exploit- 
ed to create transgenic strains of zebrafish and flies. 
Gene targeting by homologous recombination is
inefficient in these organisms, making it difficult to
incorporate genes into the chromosome by that route.
Transposon insertion is a highly efficient method of
introducing genes into the genome. In the case of
zebrafish, researchers clone a gene of interest into a
transposon, which is subsequently introduced into
zebrafish embryos. A subset of the embryos develop
into fish that can transmit the transposon insertion
through the germline. The resulting transgenic
organisms are used to dissect gene function
during development.


DNA Sequencing 


Transposons are currently being used to facilitate 
large-scale DNA sequencing in a cost-efficient and 
accurate manner. When using a transposon-based 
approach to genome sequencing, large clones are 
broken into smaller redundant and overlapping clones 
that are subsequently subjected to transposon muta- 
genesis. The transposition conditions are controlled so 
that there is approximately one insertion every three 
kilobases. After mapping the locations of the trans- 
posons, clones are aligned so that a group of clones 
that together represent a transposon insertion at every 
300 bases within a region are identified (see Figure 3). 
The DNA on both sides of the transposon is 
sequenced using primers that are specific to the ends 
of the transposon. Since an identical priming site is 
used for all the sequencing reactions, the cost of 
synthesizing primers is very low and there is no lag 
time spent waiting for new sequence data in order to 
design new primers. Multiple transposon insertions 
decrease the length of DNA to be sequenced in one 
run, thereby increasing the probability of obtaining
accurate data in a single sequencing run. Additionally,
random transposon insertion into cosmids and other 
large clones facilitates rapid sequencing and does not 
require mapping or subcloning. As testimony to the 
practicality of this approach, transposons have already 
been used to help sequence large amounts of bacterial, 
insect, and human DNA. 
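Choosing clones so that the region has an insertion every 300 bases can be framed as a covering problem. The sketch below is a hypothetical helper, not a published pipeline: it pools mapped insertion sites from many clones and greedily jumps to the farthest insertion within 300 bp of the last one chosen, a rule that uses as few insertions as possible.

```python
def choose_insertions(positions, region_len, max_gap=300):
    """Greedily pick transposon insertions so no stretch of the region exceeds max_gap.

    positions: insertion sites (bp) pooled from many mapped clones.
    region_len: length of the region to be sequenced (bp), starting at 0.
    Returns the chosen sites, or None if some gap cannot be bridged.
    """
    sites = sorted(set(positions))
    chosen, current, i = [], 0, 0
    while region_len - current > max_gap:
        best = None
        # Advance to the farthest site still within max_gap of the current point.
        while i < len(sites) and sites[i] <= current + max_gap:
            best = sites[i]
            i += 1
        if best is None or best <= current:
            return None  # no insertion bridges this gap
        chosen.append(best)
        current = best
    return chosen

# Hypothetical insertion sites pooled from several clones of a 1 kb region.
print(choose_insertions([120, 280, 450, 560, 700, 900, 950], 1000))  # → [280, 560, 700]
```

Because each chosen insertion is sequenced outward in both directions from the transposon-specific primers, reads longer than about 150 bp from neighboring insertions would meet across every 300-bp gap.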


Combining Transposon Applications 


All the transposon applications discussed have been 
used in isolation from each other. Using individual 
procedures, a gene is often tagged, sequenced, and
disrupted, and reporter fusions generated. It is possible,
however, to use a single transposon to attain all
of these goals in a minimal number of steps. A multi- 
purpose mini-Tn3 transposon designed for this pur- 
pose is shown in Figure 4. This transposon contains a 
lacZ reporter gene that is missing its promoter 
sequences and initiator methionine codon near one 
end of the transposon. Lacking these sequences, the lacZ
gene is expressed only if it lies downstream of and in
frame with a promoter and an initiator methionine codon. This
engineered mobile element contains yeast and bacter- 
ial selectable markers that allow selection for its pre- 
sence in both organisms. The reporter gene and 
selectable markers are separated from the terminal 
repeats by loxR and loxP sites, and there are three 
copies of the hemagglutinin (HAT) epitope (a con- 
venient protein tag) between the loxP site and the 
terminal repeat. In the presence of cre recombinase, 
recombination occurs between the lox sites, resulting
in excision of the intervening region. The remaining 
transposon sequence encodes three copies of the 
hemagglutinin epitope that are in the same reading 
frame as the recently removed reporter gene; this 
reduced transposon is transcribed and translated as 
part of the surrounding gene. 
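The cre/lox step can be sketched as list surgery on the transposon's elements. The element order is a simplified reading of the description above and of Figure 4 (the Xa site, for example, is omitted), not an exact map of the construct, and the function name is illustrative.

```python
def cre_excise(elements, first_lox="loxR", second_lox="loxP"):
    """Model cre recombination between two directly repeated lox sites.

    The DNA between the sites is excised and a single lox site remains
    (represented here by keeping second_lox).
    """
    i, j = elements.index(first_lox), elements.index(second_lox)
    if i >= j:
        raise ValueError("expected first_lox upstream of second_lox")
    return elements[:i] + elements[j:]

# Simplified element order assumed from the text (not an exact map of the construct).
mini_tn = ["TR", "loxR", "lacZ", "URA3", "tet", "res", "loxP", "3xHA", "TR"]
print(cre_excise(mini_tn))  # → ['TR', 'loxP', '3xHA', 'TR']

# The sequence left behind is 279 bp (Figure 4): a whole number of codons,
# so host codons downstream of the tag stay in their original reading frame.
HAT_TAG_BP = 279
print(HAT_TAG_BP % 3, HAT_TAG_BP // 3)  # → 0 93
```

The 279 bp remnant corresponds to 93 codons, which is why the reduced transposon can be translated as part of the surrounding gene.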

This highly modified transposon is currently being 
used for the large-scale analysis of gene function in 
yeast. Specifically, a yeast genomic DNA library is 
mutagenized with the multipurpose transposon de- 
scribed above. Strains carrying reporter fusions are 
used for quantitative and qualitative measurements of 
gene expression. Strains bearing HAT-tagged genes
allow protein detection by immunoblot
analysis, immunoprecipitation, and immunolocalization.
Finally, strains bearing insertion and disruption
alleles are being used for phenotypic analysis. Further
modification of this transposon by replacement of the
HAT sequence with that of GFP will allow in vivo
protein localization studies. Additionally, the insertion
of a protease cleavage site near the HAT tag will
enable controlled cleavage of the tagged protein,
thereby increasing the versatility and efficacy of this
transposon for further studies.

[Figure 4 diagram: the minitransposon, with labeled elements TR, loxR, lacZ, URA3, tet, res, loxP, Xa, 3xHA; cre recombinase excision leaves a 279 bp HAT tag]

Figure 4 A multipurpose transposon. TR, terminal repeats; Xa, protease cleavage site; loxR and loxP, sites that recombine in the presence of cre recombinase; lacZ, β-galactosidase gene; URA3, yeast selectable marker that allows growth in the absence of uracil; tet, bacterial selectable marker encoding tetracycline resistance; res, transposon resolvase site required for resolution of co-integrates; 3x HA, three copies of the hemagglutinin epitope.


Conclusion 


Transposable elements have clearly emerged as a
versatile and informative tool in modern molecular
biology. Their study has repeatedly led to the development
of new biological tools. As evidence of this fact,
consider the ever-increasing collection of Drosophila and
Saccharomyces strains that bear transposon-mediated
single-gene disruptions. These libraries have proven
invaluable in determining gene function in these
model organisms. The recent development of a
transposon that can mutagenize Mycobacterium tuberculosis
will greatly speed the identification of proteins
required for the virulence of this organism. Transposons
are also widely used in medical diagnostics, since
their conserved sequences provide primer sites for
amplification of nearby DNA and consequent
identification of the infecting organism. As our
understanding of transposons in different model
systems continues to grow, additional uses are likely to
emerge in the analysis of their respective organisms.
Most instances of RNA splicing involve removal of
internal sequences of a single molecule and splicing
together of the two surrounding sequences. In
contrast, trans-splicing results in the splicing of two
originally separate RNA molecules. Trans-splicing
can take several different forms: (1) the splicing of
a so-called spliced leader (SL) onto the 5’ ends of
mRNAs, which occurs in trypanosomes, Euglena,
roundworms, flatworms, and primitive chordates; (2)
group II splicing of separate RNAs in certain
organellar systems; and (3) situations in which separate RNAs
undergo trans-splicing by group-I-dependent, group- 
II-dependent, or spliceosome-dependent mechanisms, 
either because they have been engineered to do so or 
because of a rare, poorly understood, low-frequency 
event. Because these three kinds of trans-splicing are 
unrelated processes and are grouped here only because 
they are each classified as ‘trans-splicing,’ they will be 
considered separately. 


Spliced Leader Addition 


In this kind of trans-splicing, a short donor RNA con- 
tributes its 5’ end to form the 5’ end of an mRNA. The 
spliced leader (SL) replaces the 5’ sequences of the pre- 
mRNA, and the reaction is catalyzed by most of the 
same machinery that catalyzes nuclear intron removal. 
That is, these are spliceosome-catalyzed reactions. 
The donor in trans-splicing is itself a small nuclear 
ribonucleoprotein particle. It consists of a short
RNA, the SL RNA, which is 100-135 nucleotides in
length, and several bound proteins. The first 21 to 51
nucleotides of the SL RNA are transferred to a
recipient RNA by trans-splicing. The SL RNA is folded
into a three-stem/loop structure with a conventional
Sm protein-binding site located between the second
and third stems. The SL snRNP contains the Sm
proteins also found on U1, U2, U4, and U5 snRNPs, as
well as some unique proteins that have not been found
on other snRNPs. The spliced leader itself is
immediately followed by a conventional 5’ splice site that acts
as the donor in trans-splicing. The recipient in trans- 
splicing is a standard pre-mRNA in most respects. 
However, it differs from most pre-mRNAs by
beginning with an intron-like sequence, sometimes called
an outron, instead of the usual exon at the 5’ end. The 
outron ends with a conventional 3’ splice site that acts 
as the trans-splice acceptor. The 5’ splice site on the SL 
RNA interacts with a branch point in the outron to 
form a Y-branched intermediate that is subsequently 
resolved by splicing of the short SL to the 3’ splice site 
at the end of the outron. The resulting products are (1) 
the SL spliced to the first exon of the acceptor RNA, 
and (2) the outron branched to the downstream por- 
tion of the SL RNA. The latter is presumably de- 
branched and the nucleotides recycled as with the 
lariat byproducts of cis-splicing or intron removal, 
the more familiar nuclear splicing event.
Trans-splicing is catalyzed by most of the same snRNPs that
catalyze cis-splicing. One exception, though, is the U1
snRNP, which is responsible for recognition and choice of the
5’ splice site. In trans-splicing U1 plays no role, since
the 5’ splice site is already present on a snRNP. In fact,
in all known SL snRNPs this 5’ splice site is base-paired
to the SL itself in a short helix reminiscent of the U1 RNA/5’
splice site helix. However, this base pairing is not
required for trans-splicing in vitro or in vivo.

In trypanosomes and at least some nematodes (and 
possibly some flatworms) many or all of the acceptor 
molecules are synthesized as polycistronic precursors. 
Each pre-mRNA contains RNA copies of several 
genes, and in these cases trans-splicing is used to 
resolve the polycistronic precursor into mature
monocistronic mRNAs. In addition, 3’ end formation
occurs just upstream of each internal trans-splice site
(generally about 100-400 nt upstream), which results in
a polyadenylated upstream mRNA and an SL-containing downstream mRNA. In
these cases, the trans-splicing reaction follows the same 
course as described above except the branching occurs 
at a branch point between genes rather than near the 5’ 
end of the mRNA. In trypanosomes, there is only a 
single SL RNA, which is used for trans-splicing both 
at the 5’ ends and at internal sites in polycistronic 
mRNAs. In the nematode Caenorhabditis elegans, 
about 25% of genes are transcribed as parts of poly- 
cistronic precursors containing two to more than five 
genes. There is a special SL RNA, called SL2, which is 
used for trans-splicing at trans-splice sites between 
genes in these polycistronic pre-mRNAs. SL2 RNA 
has a secondary structure similar to the SL RNAs 
described above, but its sequence is different. 
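The resolution of a polycistronic precursor can be sketched as a rule-based model. The gene names and coordinates are hypothetical; the function simply assigns SL1 to the 5'-most gene and SL2 to downstream genes in nematodes, or the single SL in trypanosomes, and reports the spacing between each poly(A) site and the next trans-splice acceptor.

```python
def resolve_operon(genes, organism="nematode"):
    """Split a polycistronic precursor into monocistronic mRNAs (simplified model).

    genes: (name, trans_splice_acceptor, polyA_site) tuples in 5'->3' order,
        with positions in nt along the precursor.
    Nematodes: the 5'-most gene receives SL1, downstream genes receive SL2.
    Trypanosomes: the single SL RNA is used at every trans-splice site.
    Returns the mature mRNAs as (leader, gene, body_length_nt) tuples, plus the
    spacer between each poly(A) site and the next trans-splice acceptor.
    """
    mrnas, spacers = [], []
    for index, (name, acceptor, polya) in enumerate(genes):
        if organism == "trypanosome":
            leader = "SL"
        else:
            leader = "SL1" if index == 0 else "SL2"
        mrnas.append((leader, name, polya - acceptor))
        if index + 1 < len(genes):
            spacers.append(genes[index + 1][1] - polya)
    return mrnas, spacers

# Hypothetical three-gene operon; coordinates are illustrative only.
operon = [("geneA", 50, 1550), ("geneB", 1650, 2900), ("geneC", 3010, 4200)]
mrnas, spacers = resolve_operon(operon)
print(mrnas)    # → [('SL1', 'geneA', 1500), ('SL2', 'geneB', 1250), ('SL2', 'geneC', 1190)]
print(spacers)  # → [100, 110]
```

The model captures only which leader each mRNA receives; it says nothing about how SL2 is selected, which the following paragraphs discuss.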

In C. elegans, many polycistronic precursors must
undergo SL1 trans-splicing at their 5’ ends, intron
removal throughout, and SL2 trans-splicing at
internal trans-splice sites between genes. How are
these different processes accomplished with
specificity? Not all the players in the reaction are known
yet, but it is clear that an intron or synthetic
intron-like RNA can serve as an outron if placed at the 5’ end
of a pre-mRNA. Furthermore, an outron can be
excised as an intron if a 5’ splice site is placed within
it. Thus the context of a 3’ splice site, rather than any
particular sequence, determines whether it is
subjected to trans- or cis-splicing. In general, a 3’ splice
site near the 5’ end of a pre-mRNA, with no upstream
5’ splice site, will be trans-spliced. This can be most
easily understood by envisioning a spliceosome
beginning to form around a 3’ splice site: if an upstream U1
snRNP bound to a 5’ splice site pairs with it, then
cis-splicing occurs, whereas if no upstream site is found,
then the SL snRNP provides a 5’ splice site in trans.
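The context rule can be sketched as follows: each 3′ splice site pairs with the nearest unused upstream 5′ splice site if one exists, and otherwise is trans-spliced to the SL. Splice-site positions are hypothetical, the function name is illustrative, and the model ignores SL2 specificity and any sequence features beyond site positions.

```python
def classify_3prime_sites(five_prime_sites, three_prime_sites):
    """Predict how each 3' splice site in a pre-mRNA is used (simplified rule).

    A 3' splice site with an unused 5' splice site upstream is cis-spliced to the
    nearest such site (an intron is removed); one with no available upstream
    5' site is trans-spliced, the SL snRNP supplying the 5' splice site.
    Positions are nucleotide coordinates along the pre-mRNA.
    """
    available = sorted(five_prime_sites)
    result = {}
    for site in sorted(three_prime_sites):
        upstream = [p for p in available if p < site]
        if upstream:
            donor = upstream[-1]       # nearest upstream 5' splice site
            available.remove(donor)    # each 5' site is used only once
            result[site] = ("cis", donor)
        else:
            result[site] = ("trans-SL", None)
    return result

# Outron ending at nt 60, then two conventional introns (hypothetical coordinates).
print(classify_3prime_sites([300, 700], [60, 450, 900]))
# → {60: ('trans-SL', None), 450: ('cis', 300), 900: ('cis', 700)}

# Placing a 5' splice site inside the outron turns it into an ordinary intron.
print(classify_3prime_sites([20], [60]))  # → {60: ('cis', 20)}
```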

The rules for SL2 trans-splicing at internal sites in 
polycistronic pre-mRNAs are less clear. In trypano- 
somes the downstream trans-splicing event deter- 
mines the location of upstream 3’ end formation. 
However, in worms the events are largely independ- 
ent, although interference with 3’ end formation does 
affect the SL2 specificity of trans-splicing. In the only 
operon studied so far, a 22-nucleotide U-rich sequence 
about 30 nt downstream of the 3’ end formation site 
has been shown to be required for utilization of SL2. It 
is not yet known what trans-acting factors interact 
with this sequence. The sequence of the remainder 
of the intercistronic region is not required for trans- 
splicing. 


Group II Trans-Splicing 


Group II introns occur in plant mitochondria and 
chloroplasts. Most exist between adjacent exons and 
their removal by cis-splicing is dependent on a com- 
plex secondary structure containing six stem/loop 
domains. However, in some instances, especially in 
the nad1, nad2, and nad5 genes of higher plant mito- 
chondria, the exons have become rearranged. In these 
cases the individual pieces of the genes are transcribed 
separately. The transcripts can then form the analo- 
gous stems by intermolecular base pairing and the 
splicing occurs in trans just as if it were occurring 
within a single transcript. In some cases one of the 
transcripts contains no exon sequences; apparently 
its only purpose is to bring the correct exons together 
in a trimolecular stem-loop structure to allow the 
correct trans-splicing to occur. 


Other Instances of Trans-Splicing 


It is well established that autocatalytic Group I spli- 
cing can be engineered to occur in trans, as can Group 
II intron splicing that normally occurs in cis. Further-
more the eukaryotic nuclear mRNA splicing ma- 
chinery can splice together two separate RNAs by 
conventional mechanisms in vitro. This reaction is 
relatively efficient when spliceosomes are offered 
two substrates, one of which contains only a 3’ splice 
site while the other has only a 5’ splice site. In these 
cases, splice sites normally used for cis-splicing are 
used in trans. A low level of trans-splicing occurs 
in vivo as well. Trans-splicing has been detected in a 
variety of cells, in each case as splicing between two
different pre-mRNAs at cis-splice sites. These have been
detected by RT-PCR, which greatly amplifies rare 
products, and by isolation of single rare cDNA clones. 
Nevertheless, there have now been numerous reports 
that can be explained only by trans-splicing having 
occurred. Most of these examples have occurred in 
mammalian cells. So far, it is not clear what has 
brought the two exons from separate mRNAs to- 
gether. One possibility would be formation of an 
RNA double helix or other tertiary structure involv- 
ing the two molecules which could artificially bring 
the two splice sites into proximity. It has been possible 
to force trans-splicing to occur in mammalian systems 
by engineering two molecules in which portions of the 
‘intron’ sequences can anneal. In these cases it appears 
as if the splicing machinery is ‘fooled’ into believing 
the 5’ and 3’ splice sites are on the same molecule 
and so it splices them together, creating a hybrid 
molecule. This is splicing in trans, but it is presumably 
mechanistically identical to normal splicing, since 
conventional 5’ and 3’ splice sites are used. The sig- 
nificance of these rare events is unclear, since there are 
no cases in which the products of these trans-spliced 
chimeric RNAs have been shown to function. Pre- 
sumably this sort of trans-splicing is just an unin- 
tended consequence of normal nuclear pre-mRNA 
processing events. 


See also: Pre-mRNA Splicing 
Transvection is the ability of a locus to influence the
activity of an allele on the other homolog only when 
the two chromosomes are synapsed. 


See also: Synapsis in DNA Transactions 
Transversion mutation is a specific kind of point
mutation, one in which a single purine is substituted 
for a pyrimidine or vice versa. As the result of a 
transversion mutation, the mutated position in the 
gene may for example have an adenine where it had a 
thymine or cytosine. Transversions are much less 
common than transition mutations — the other form 
of point substitution mutations, in which one of the 
two purines or pyrimidines is substituted for the other 
— because the generation of transversions during repli- 
cation requires much greater distortion of the double 
helix than does the production of transition
mutations. In a transition, the mistakenly inserted base
has the same general shape as the correct one, and
the change is mainly
in electron distribution; this may happen simply
because the base in the template strand is, for example, 
in a rare enol state rather than its more common keto 
form at the moment of replication. On the other hand, 
the base pair formed during the production of trans- 
version mutations is either much larger (involving two 
purines) or much smaller (with two pyrimidines) than 
the standard base pair. 
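Classifying a point substitution needs only the purine/pyrimidine grouping of the bases. The sketch below is a minimal illustration; the function name is hypothetical.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}

def classify_substitution(old, new):
    """Classify a single-base substitution as a transition or a transversion."""
    old, new = old.upper(), new.upper()
    if old not in PURINES | PYRIMIDINES or new not in PURINES | PYRIMIDINES:
        raise ValueError("bases must be A, C, G, or T")
    if old == new:
        raise ValueError("not a substitution")
    # Transition: purine <-> purine or pyrimidine <-> pyrimidine.
    same_class = {old, new} <= PURINES or {old, new} <= PYRIMIDINES
    return "transition" if same_class else "transversion"

print(classify_substitution("A", "G"))  # → transition
print(classify_substitution("A", "T"))  # → transversion
kinds = [classify_substitution(a, b) for a in "ACGT" for b in "ACGT" if a != b]
print(kinds.count("transition"), kinds.count("transversion"))  # → 4 8
```

Although 8 of the 12 possible substitutions are transversions, transitions are observed more often, for the structural reasons given above.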

Interestingly, the genetic code has evolved in such a
way that transversion mutations are much more likely 
than are transition mutations to lead to substituting an 
amino acid with very different properties and thus to 
significant changes in the properties of the protein, 
due to the relationships between the sequence patterns 
of the codons for the various amino acids. Because of 
the degree of degeneracy in the third nucleotide 
of most codons (except those for tryptophan and 
methionine), a transversion in the third nucleotide is 
less likely to affect the organism than a transversion in 
the second or first nucleotides. 
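The effect of degeneracy can be checked by enumerating every transversion at each codon position in the standard genetic code (NCBI translation table 1) and counting how many leave the encoded amino acid, or stop signal, unchanged. The helper names are illustrative, and stop codons are treated like an extra amino acid.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code; codons ordered TTT, TTC, TTA, TTG, TCT, ... ('*' = stop).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}
TRANSVERSIONS = {"A": "CT", "G": "CT", "C": "AG", "T": "AG"}

def synonymous_transversion_fraction(position):
    """Fraction of transversions at codon position 0, 1 or 2 that do not change
    the encoded amino acid (or stop), over all 64 codons."""
    synonymous = total = 0
    for codon, aa in CODE.items():
        for alt in TRANSVERSIONS[codon[position]]:
            mutant = codon[:position] + alt + codon[position + 1:]
            total += 1
            synonymous += CODE[mutant] == aa
    return synonymous / total

f = [synonymous_transversion_fraction(p) for p in range(3)]
print(f)                                 # → [0.03125, 0.0, 0.53125]
print(CODE["GAG"], "->", CODE["GTG"])    # → E -> V
```

In the standard code about half of third-position transversions are synonymous, versus 4 of 128 at the first position and none at the second. The last line translates the sickle cell change in codon 6 of the β-globin gene, GAG to GTG.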

A classic example of a transversion leading to a 
major change in protein properties is found in the 
sickle cell mutation in human hemoglobin. In those 
with sickle cell anemia, a thymine is substituted for an 
adenine in the second position of the sixth codon of 
the gene for the β subunit, leading to the incorporation
of valine (a hydrophobic amino acid) rather than glu-
tamic acid (which is very hydrophilic) at that position.
As a consequence of this change in each of the two
β subunits, the individual hemoglobin tetramers
stick to each other to form very long chains that lead
to the characteristic sickle shape of the cells. In the
case of sickle cell trait, where only one of the two
hemoglobin alleles carries the sickle-cell mutation,
only half of the subunits are mutated, so no long chains
are formed.


See also: Invariants, Phylogenetic; Sickle Cell 
Anemia; Transition 
In evolutionary biology, phylogenetic trees of organisms or genes are often simply called ‘trees.’ Mathematically, a tree is defined in graph-theoretic terms as follows: all the nodes are connected via edges, and there is exactly one path connecting any two nodes. A network is therefore not a tree, because at least one pair of nodes is connected by more than one path (route). Nodes are divided into external and internal ones. External nodes are also called operational taxonomic units (OTUs) in evolutionary biology, or leaves in computer science. There are five OTUs (1-5) and four internal nodes (X, Y, Z, and R1) in the tree of Figure 1A. Edges are usually called branches in evolutionary biology. Besides the topological relationship (how the nodes are connected), a distance value is often assigned to each branch. Branches can also be divided into external and internal ones. An external branch connects an external node to an internal node (e.g., branch 1-X of Figure 1), while an internal branch connects two internal nodes (e.g., branch X-Z of Figure 1).

Figure 1 Tree types. (A, B) Rooted; (C) unrooted.
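The graph-theoretic definition can be checked mechanically. The sketch below relies on the standard result that a graph with n nodes is a tree exactly when it is connected and has n - 1 edges. It also separates external nodes (OTUs, which have degree 1) from internal ones. The helper names are illustrative. The four-OTU topology is hypothetical and is not taken from any of the figures.

```python
from collections import defaultdict, deque


def is_tree(nodes, edges):
    """True if the undirected graph (nodes, edges) is a tree.

    Uses the standard equivalence: a graph with n nodes is a tree exactly when
    it is connected and has n - 1 edges (so every pair of nodes has one path).
    """
    if not nodes or len(edges) != len(nodes) - 1:
        return False
    adjacency = defaultdict(list)
    for a, b in edges:
        adjacency[a].append(b)
        adjacency[b].append(a)
    start = next(iter(nodes))
    seen, queue = {start}, deque([start])
    while queue:  # breadth-first search to test connectivity
        for neighbour in adjacency[queue.popleft()]:
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return len(seen) == len(nodes)


def split_nodes(nodes, edges):
    """Return (external, internal) nodes; external nodes (OTUs, leaves) have degree 1."""
    degree = dict.fromkeys(nodes, 0)
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    external = sorted(n for n in nodes if degree[n] == 1)
    internal = sorted(n for n in nodes if degree[n] > 1)
    return external, internal


# Hypothetical unrooted tree with OTUs 1-4 and internal nodes X, Y.
nodes = {"1", "2", "3", "4", "X", "Y"}
edges = [("1", "X"), ("2", "X"), ("X", "Y"), ("Y", "3"), ("Y", "4")]
print(is_tree(nodes, edges))                 # True
print(is_tree(nodes, edges + [("1", "3")]))  # False: a network, two paths now join 1 and 3
print(split_nodes(nodes, edges))             # (['1', '2', '3', '4'], ['X', 'Y'])
```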

A tree can be either rooted or unrooted. A rooted tree has a special node called the root, which is defined as the position, R, of the common ancestor (see Figures 1A and 1B). There is a unique path from the root to any other node, and its direction is that of time in evolution. A phylogenetic tree in the ordinary sense is a rooted tree. Unfortunately, however, many methods for building phylogenetic trees produce unrooted trees, such as the tree shown in Figure 1C. An unrooted tree can be converted to a rooted tree if the position of the root is specified. The trees of Figures 1A and 1B were produced in this way from the unrooted tree of Figure 1C. Rooted and unrooted trees are also called directed and undirected trees, respectively, in mathematics.

Figure 2 Tree types. (A, B) Bifurcating; (C) multifurcating.

This relation between rooted and unrooted trees is used for the ‘outgroup’ method of rooting, as follows. To determine the phylogenetic relationship among n sequences, we add one (or more) sequences that are known to be an outgroup relative to the n sequences. The unrooted tree obtained for the n + 1 sequences can easily be converted into a rooted tree of the n sequences. Sequence 5 corresponds to the outgroup in the tree of Figure 1C when the root is R1, and the tree of Figure 1A is then obtained. When the root is R2, sequences 3 and 4 are considered to be the outgroup to sequences 1, 2, and 5, and we obtain the tree of Figure 1B.
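The following is a minimal sketch of outgroup rooting. It assumes that the unrooted tree is given as a list of branches (pairs of node labels) and that the outgroup is a single OTU. Placing the root on a branch turns each side of that branch into a subtree. `root_on_edge` and `root_with_outgroup` are hypothetical helper names. The seven-branch topology is invented for illustration and is not read from Figure 1C.

```python
from collections import defaultdict


def root_on_edge(edges, u, v):
    """Place the root on branch u-v of an unrooted tree; return nested tuples."""
    adjacency = defaultdict(set)
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)
    if v not in adjacency[u]:
        raise ValueError("u and v must be joined by a branch")

    def build(node, parent):
        # Everything reachable from `node` without going back through `parent`.
        children = [c for c in sorted(adjacency[node]) if c != parent]
        return node if not children else tuple(build(c, node) for c in children)

    return (build(u, v), build(v, u))


def root_with_outgroup(edges, outgroup):
    """Root on the external branch leading to a single outgroup OTU."""
    (neighbour,) = [b if a == outgroup else a for a, b in edges if outgroup in (a, b)]
    return root_on_edge(edges, outgroup, neighbour)


# Hypothetical unrooted tree: OTUs 1-5, internal nodes X, Y, Z.
edges = [("1", "X"), ("2", "X"), ("X", "Z"), ("Z", "5"),
         ("Z", "Y"), ("Y", "3"), ("Y", "4")]
print(root_with_outgroup(edges, "5"))  # ('5', (('1', '2'), ('3', '4')))
print(root_on_edge(edges, "Z", "Y"))   # (('5', ('1', '2')), ('3', '4'))
```

In the first call, OTU 5 is the outgroup, and the rooted tree of the remaining four OTUs is the second element of the result. In the second call, the root sits on an internal branch, so OTUs 3 and 4 form one side of the root and OTUs 1, 2, and 5 form the other.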

The number of possible tree topologies increases rapidly with the number of OTUs. The general equation for the number of possible topologies of bifurcating unrooted trees (T_n) for n OTUs is:

$$T_n = \frac{(2n-5)!}{2^{n-3}\,(n-3)!}$$

Applying this equation gives 221,643,095,476,699,771,875 possible tree topologies for 20 OTUs. Clearly, the search for the true phylogenetic tree of many sequences is a very difficult problem, which is why so many methods have been proposed for building phylogenetic trees.
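The formula is easy to evaluate exactly with integer arithmetic, and doing so reproduces the 20-OTU figure. (The quotient equals the product of odd numbers 1 × 3 × 5 × ... × (2n - 5).) The sketch also uses the standard correspondence between rooted trees of n OTUs and unrooted trees of n + 1 OTUs, in which the extra OTU marks the root, as in outgroup rooting. The function names are illustrative.

```python
from math import factorial


def unrooted_topologies(n):
    """T_n = (2n-5)! / (2^(n-3) (n-3)!) bifurcating unrooted topologies for n OTUs."""
    if n < 3:
        raise ValueError("the formula needs at least 3 OTUs")
    return factorial(2 * n - 5) // (2 ** (n - 3) * factorial(n - 3))


def rooted_topologies(n):
    """Rooted trees of n OTUs correspond to unrooted trees of n + 1 OTUs."""
    return unrooted_topologies(n + 1)


for n in (4, 5, 10, 20):
    print(n, unrooted_topologies(n), rooted_topologies(n))
```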

Other important concepts are bifurcating and multifurcating trees. The trees shown in Figure 1 and those in Figures 2A and 2B are all bifurcating, while the tree in Figure 2C is multifurcating. Theoretically, multifurcating trees can be considered as bifurcating trees in which some branches have zero length. For example, the tree of Figure 2C is equivalent to the trees of Figures 2A and 2B when branches S and T of Figures 2A and 2B, respectively, have zero length. This relationship is used to produce ‘consensus’ trees. The tree structures of Figures 2A and 2B differ slightly, and if we ignore branches S and T, we obtain Figure 2C. This tree can be considered the consensus tree of those in Figures 2A and 2B.
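The consensus idea can be phrased in terms of clusters. Each internal branch of a rooted tree defines the set of OTUs below it. Keeping only the clusters found in every input tree gives what is usually called a strict consensus tree. The sketch below assumes rooted trees written as nested tuples. The two five-OTU trees are hypothetical stand-ins for Figures 2A and 2B, and the helper names are illustrative.

```python
def clusters(tree):
    """Leaf sets below each internal node of a rooted tree given as nested tuples."""
    found = set()

    def walk(node):
        if not isinstance(node, tuple):
            return frozenset([node])
        leaves = frozenset().union(*(walk(child) for child in node))
        found.add(leaves)
        return leaves

    walk(tree)
    return found


def strict_consensus(trees):
    """Keep only the clusters (internal branches) shared by every input tree."""
    common = set.intersection(*(clusters(t) for t in trees))

    def build(cluster):
        inside = [c for c in common if c < cluster]
        # Children are the largest shared clusters nested directly inside `cluster`.
        maximal = sorted((c for c in inside if not any(c < d for d in inside)), key=sorted)
        covered = frozenset().union(*maximal)
        return tuple([build(c) for c in maximal] + sorted(cluster - covered))

    return build(max(common, key=len))


# Two hypothetical rooted trees that differ only in how OTUs 3, 4 and 5 are resolved.
tree_a = ((1, 2), (3, (4, 5)))
tree_b = ((1, 2), ((3, 4), 5))
print(strict_consensus([tree_a, tree_b]))  # ((1, 2), (3, 4, 5))
```

The branch that resolves 3, 4, and 5 differently in the two input trees is dropped, which leaves a multifurcation. This mirrors the way ignoring branches S and T yields Figure 2C.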

If the evolutionary rate is constant, we obtain a particular type of rooted tree, which can be called a ‘clock’ tree; an example is shown in Figure 3A. When evolutionary rates differ among the lineages of a tree, a non-clock tree is obtained, as in Figure 3B. It should be noted, however, that trees that look like clock trees can be constructed by assuming a constant evolutionary rate, even if constancy does not hold in reality. The unweighted pair group method with arithmetic mean (UPGMA) is one such method for producing a clock-like tree. Many other tree-making methods do not assume a constant evolutionary rate. However, they produce only unrooted trees, unlike UPGMA, which always produces rooted trees.

Figure 3 Rooted trees. (A) Clock tree; (B) non-clock tree.
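A compact UPGMA sketch follows. At each step it joins the closest pair of clusters. It places the new node at half their distance, which is the constant-rate assumption that makes the tree clock-like and rooted. It then averages the distances to the other clusters, weighted by cluster size. The input format (a dict keyed by `frozenset` pairs) and the small ultrametric matrix are assumptions made for illustration. Ties are broken by label order.

```python
def upgma(dist):
    """Minimal UPGMA sketch.

    dist maps frozenset({x, y}) -> distance for every pair of leaf labels.
    Returns (tree, root_height), where tree is a nested tuple of leaf labels.
    """
    nodes = {}
    for leaf_pair in dist:
        for leaf in leaf_pair:
            nodes[leaf] = (leaf, 1)  # label -> (subtree, number of leaves)
    d = dict(dist)
    height = 0.0
    while len(nodes) > 1:
        labels = sorted(nodes)
        a, b = min(((x, y) for i, x in enumerate(labels) for y in labels[i + 1:]),
                   key=lambda xy: d[frozenset(xy)])
        height = d[frozenset((a, b))] / 2  # clock assumption: node sits halfway
        (ta, na), (tb, nb) = nodes.pop(a), nodes.pop(b)
        merged = f"({a},{b})"
        for k in nodes:  # size-weighted (arithmetic mean) distance to the new cluster
            d[frozenset((merged, k))] = (na * d[frozenset((a, k))]
                                         + nb * d[frozenset((b, k))]) / (na + nb)
        nodes[merged] = ((ta, tb), na + nb)
    ((tree, _),) = nodes.values()
    return tree, height


pairs = {("A", "B"): 2, ("C", "D"): 4, ("A", "C"): 10,
         ("A", "D"): 10, ("B", "C"): 10, ("B", "D"): 10}
tree, root_height = upgma({frozenset(p): dist for p, dist in pairs.items()})
print(tree, root_height)  # (('A', 'B'), ('C', 'D')) 5.0
```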

Ideally, branch lengths of a phylogenetic tree are 
proportional to the physical time since divergence. 
Thus branches a and b of Figure 4A should be the 
same length. We call this type of rooted tree the 
‘expected tree.’ Both species and gene trees have 
their expected trees, but their properties are somewhat 
different from each other. An expected gene tree 
directly reflects the history of DNA replications, 
while an expected species tree is a gross simplification 
of the course of differentiation of populations. There- 
fore, the speciation time is not always clear. 

The genealogical relationship of genes, or expected gene tree, is independent of the mutation process. However, mutation events are essential for the reconstruction of phylogenetic trees, so at best we can estimate a gene tree from the mutation events realized on its expected gene tree. We call this ideal reconstruction the ‘realized’ gene tree (Figure 4B), while the tree reconstructed from observed data is called the ‘estimated’ gene tree. Branch lengths of realized and estimated gene trees are proportional to the number of mutational events, which is not necessarily proportional to physical time. By definition, expected gene trees are strictly bifurcating, while realized and estimated gene trees may be multifurcating. This is because a branch may carry no mutation at all, such as branch X of Figure 4A.

Figure 4 Phylogenetic trees. (A) ‘Expected’; (B) ‘realized.’
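The reason a realized tree can be multifurcating can be put in numbers. Assume, as a simplification, that mutations fall on a branch as a Poisson process with a constant rate. The chance that a branch of length t carries no mutation is then exp(-rate × t), and such a branch collapses in the realized tree. The sketch checks this formula against a small simulation. The rate and length values are arbitrary, and the function names are illustrative.

```python
import math
import random


def p_no_mutation(rate, length):
    """Probability of zero mutations on a branch, assuming a Poisson process."""
    return math.exp(-rate * length)


def simulated_no_mutation(rate, length, trials=100_000, seed=0):
    """Monte Carlo check: a branch carries no mutation if the exponential waiting
    time to its first mutation is longer than the branch itself."""
    rng = random.Random(seed)
    empty = sum(rng.expovariate(rate) > length for _ in range(trials))
    return empty / trials


rate, length = 1.0, 1.0  # arbitrary: on average one mutation per branch
print(p_no_mutation(rate, length), simulated_no_mutation(rate, length))
```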

A species tree reconstructed from observed data is called an ‘estimated’ species tree, whereas there is no realized species tree. It should also be noted that both expected and realized trees are rooted, while estimated trees are often unrooted because of the limited information available.


See also: Gene Trees; Genetic Distance; 
Phylogeny; Species Trees; Taxonomy, Numerical 
Plant hairs, also called trichomes, are specialized epidermal cells. On aerial organs, trichomes protect the plant against insects and sunlight. In Arabidopsis thaliana, trichomes are easily accessible and have become a genetic model system for the analysis of pattern formation and cell differentiation.


Genetic Dissection of Trichome 
Development in Arabidopsis 


In Arabidopsis, trichomes are unicellular, branched 
cells that are regularly distributed on most aerial sur- 
faces. Systematic screens for trichome mutants in 
Arabidopsis revealed 37 complementation groups. 
The analysis of these mutants enabled the dissection 
of trichome development into distinct, genetically 
controlled steps (Figure 1): (1) initiation, (2) endore- 
duplication, (3) differentiation, (4) branching, 
(5) expansion, and (6) maturation. 


Trichome Initiation 


On leaves, trichomes are initiated at the base in a field 
of dividing epidermal cells. The incipient trichome 
cells are separated by three to four epidermal cells 
and show a characteristic spacing pattern. Trichome 
patterning does not seem to involve cell lineage. 
Rather it is thought to be based on a mechanism 
where initially equivalent epidermal cells compete 
with each other via cell-cell interactions (Figure 2A). 
According to current models, two proteins function as positive regulators of trichome development: GLABRA1 (GL1), a MYB-related transcription factor that is expressed in developing trichomes, and TRANSPARENT TESTA GLABRA1 (TTG1), a WD40 protein. Epidermal cells surrounding a young trichome are inhibited from becoming trichomes by the negative regulator TRIPTYCHON (TRY), probably through downregulation of the two positive regulators GL1 and TTG1. TRY encodes a MYB-related protein lacking an activation domain and is assumed to exert its inhibitory function by moving directly into neighboring cells.
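The competition idea can be illustrated with a deliberately crude one-dimensional caricature, which is not the published model. A single hypothetical "activator" level stands in for GL1/TTG1 activity. The undecided cell with the highest level commits first, and it suppresses every undecided cell within a fixed radius, which stands in for TRY acting on neighbors. Even this greedy rule yields trichomes that are spaced out rather than clustered. `pattern_trichomes` and its parameters are illustrative assumptions.

```python
import random


def pattern_trichomes(n_cells, radius, seed=0):
    """Toy lateral-inhibition caricature on a 1-D row of epidermal cells.

    Returns the indices of cells that adopt the trichome fate.
    """
    rng = random.Random(seed)
    # Initially near-equivalent cells: tiny random differences in a hypothetical activator level.
    level = [1.0 + rng.uniform(-0.01, 0.01) for _ in range(n_cells)]
    fate = [None] * n_cells  # None = undecided, True = trichome, False = epidermal (inhibited)
    while None in fate:
        # The undecided cell with the most activator wins the competition...
        winner = max((i for i in range(n_cells) if fate[i] is None), key=level.__getitem__)
        fate[winner] = True
        # ...and inhibits all undecided cells within `radius` (standing in for TRY).
        for j in range(max(0, winner - radius), min(n_cells, winner + radius + 1)):
            if fate[j] is None:
                fate[j] = False
    return [i for i, f in enumerate(fate) if f]


positions = pattern_trichomes(60, radius=3, seed=1)
print(positions)
print([b - a for a, b in zip(positions, positions[1:])])  # every spacing exceeds the radius
```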


Endoreduplication 


Incipient trichome cells stop dividing but proceed, on average, through four cycles of DNA replication without division (called endoreduplication). The number of endoreduplication cycles in trichomes is controlled by two genetic pathways. One pathway depends on the plant hormone gibberellin (GA). Mutants deficient in GA biosynthesis lack trichomes. The spindly (spy) mutant, in which the GA signal transduction pathway is constitutively activated, has trichomes with an increased DNA content (Figure 2B). In addition, three genes control trichome endoreduplication in a GA-independent pathway. Strikingly, two of them, GL1 and TRY, also play a role in trichome patterning, with GL1 promoting and TRY inhibiting additional endoreduplication cycles. The third gene, GL3, is required as a positive regulator. gl3 mutants undergo only three cycles of endoreduplication. Because this phenotype cannot be rescued by any of the other overreplicating mutants, GL3 is assumed to act upstream of all other known genes.
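The DNA content reached by endoreduplication follows from simple doubling. Assuming the cell enters the first endocycle with a 2C nucleus, four cycles give 2 × 2^4 = 32C, and the three cycles of gl3 mutants give 16C. The helper below is a hypothetical illustration of that arithmetic.

```python
def dna_content(endocycles, start_c=2):
    """Nuclear DNA content (in C units) after `endocycles` rounds of replication
    without division, assuming each round exactly doubles the DNA."""
    if endocycles < 0:
        raise ValueError("endocycles must be non-negative")
    return start_c * 2 ** endocycles


print(dna_content(4), dna_content(3))  # 32 16  (average wild-type trichome vs. gl3)
```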


Differentiation Mutants 


Five genes, GLABRA2, ROOT HAIRLESS1 (RHL1), RHL2, RHL3, and ECTOPIC ROOT HAIR3 (ERH3), appear to function early in trichome differentiation by regulating genes that act later. The corresponding mutants show a wide range of trichome phenotypes: trichome size and branching are generally reduced, and mutant trichomes often lack papillae on their surface. These phenotypic aspects resemble the single-mutant phenotypes of other trichome morphogenesis mutants. It is therefore believed that the differentiation genes are required to integrate the function of later-acting trichome morphogenesis genes. Consistent with this idea, cloning of the GLABRA2 gene revealed that it encodes a protein with sequence similarity to homeodomain transcription factors.


Branching Mutants 


Fifteen genes have been identified that function as 
positive or negative regulators of branch number. They 
fall into two groups. One group establishes a connec- 
tion between the DNA content and branch number 
(Figure 2B,C). Accordingly, mutants with a reduced 
DNA content, e.g. glabra3, have fewer branches 
while mutants with an increased DNA content, e.g. 
triptychon, show more branches. Changes in the DNA content of wild-type plants, produced either by inhibitors of DNA replication or by tetraploidy, also result in a correlation between DNA content and branch number. It is therefore unlikely that the mutants have two separate roles in the two processes. This suggests that either cell growth or cell size controls branch number.

Figure 1 Trichome development mutants. Left: Schematic illustration of developmental steps during trichome formation. Right: Examples of mutants affecting various developmental steps. Abbreviations: try (triptychon), gl2 (glabra2), gl3 (glabra3), sti (stichel), klk (klunker) (a member of the distorted class of mutants), cha (chablis).

Figure 2 Genetic models of trichome development. (A) Trichome cell selection. The genetic model postulates that GLABRA1 (GL1) and TRANSPARENT TESTA GLABRA1 (TTG) form a positive regulatory loop and that trichome development is initiated by trichome differentiation genes such as GLABRA2 (GL2). Cell-cell communication is mediated by TRIPTYCHON (TRY), which is activated by TTG and downregulates GL1. As a result, cells compete with each other to become a trichome cell. In the upper situation, both cells are in equilibrium. Below, the shaded cell on the right has gained higher concentrations of GL1 and TTG and suppresses trichome development in the left cell. (B) Endoreduplication. The number of endoreduplication cycles is controlled by positive and negative regulators. Arrows indicate positive regulatory events; blunted bars indicate negative regulatory events. Abbreviations: GA (gibberellin), GL1 (GLABRA1), GL3 (GLABRA3), SPY (SPINDLY), PYC (POLYCHOM), KAK (KAKTUS), RFI (RASTAFARI), TRY (TRIPTYCHON). (C) Branching. The number of branches is controlled by several independent pathways. Abbreviations: STI (STICHEL), FRC1 (FURCA1), FRC2 (FURCA2), FRC3 (FURCA3), FRC4 (FURCA4), STA (STACHEL), ZWI (ZWICHEL), AN (ANGUSTIFOLIA), NOK (NOEK).

In the second group of mutants, the DNA content is
the same as in the wild type. Genetically, they seem to act
largely in independent pathways (Figure 2C). Only 
FURCA2 and STACHEL seem to function redun- 
dantly and downstream of ZWICHEL, FURCA4, 
and NOEK. The DNA content-related pathway 
seems to be mediated by ANGUSTIFOLIA. Pre- 
sently the underlying molecular mechanisms are 
unknown. Only the ZWICHEL gene has been cloned. 
The ZWICHEL gene encodes a member of the kin- 
esin superfamily of motor proteins that contains a 
calmodulin-binding site. It is therefore thought to be involved either in the transport of important intracellular components or in the reorganization of microtubules prior to branch initiation. A role for microtubules in branch formation is also suggested by drug inhibitor experiments: destabilization of
microtubules results in unbranched trichomes and 
the stabilization of microtubules can trigger branch 
formation in the unbranched stichel mutant. 


Trichome Expansion 


Eight genes, grouped in the DISTORTED class, are 
required to maintain the directionality of trichome cell 
expansion. Development of trichomes in distorted mutants is nearly normal until branch initiation, but later growth is irregular, resulting in mature trichomes with a twisted and distorted phenotype. None of these genes has been cloned yet. However, an analysis of the cytoskeleton in these mutants suggests that the DISTORTED genes play a role in the organization of the actin cytoskeleton: all distorted mutants show strong abnormalities in actin organization. The biological relevance of the actin cytoskeleton for expansion growth has been independently demonstrated with drug inhibitors. Drugs that interfere with actin organization result in a phenotype indistinguishable from that of the distorted mutants.


Trichome Maturation 


During trichome maturation, the cell wall thickens and small papillae are formed. This step is affected in five mutants: underdeveloped trichome (udt), trichome birefringence (tbr), chablis (cha), chardonnay (cdo), and retsina (rts). These mutants appear transparent and may even collapse at some point. Only the tbr mutant has been studied in some detail, and it was shown to be affected in cellulose deposition.


Further Reading 

Hülskamp M (2000) How plants split hairs. Current Biology 10: R308-R310.

Hülskamp M, Folkers U and Schnittger A (1999) Trichome development in Arabidopsis thaliana. International Review of Cytology 186: 147-178.

Marks MD (1997) Molecular genetic analysis of trichome development in Arabidopsis. Annual Review of Plant Physiology and Plant Molecular Biology 48: 137-163.

Oppenheimer D (1998) Genetics of plant cell shape. Current Opinion in Plant Biology 1: 520-524.

Szymanski DB, Lloyd AM and Marks MD (2000) Progress in the molecular genetic analysis of trichome initiation and morphogenesis in Arabidopsis. Trends in Plant Sciences 5: 53.


See also: Arabidopsis thaliana: The Premier Model 
Plant; Cell Lineage 


Trinucleotide Repeats: 
Dynamic DNA and Human 
Disease 

V Brown and S T Warren 


Copyright © 2001 Academic Press 
doi: 10.1006/rwgn.2001.1337 


For many years a handful of heritable disorders puz- 
zled geneticists by showing a tendency of the disease 
phenotype to become more severe or have earlier age- 
of-onset as the disease is passed on through sub- 
sequent generations in a family (Wells and Warren, 
1998). This has been termed genetic anticipation, 
which is similar to the Sherman paradox described in 
fragile X syndrome, where the likelihood of having 
an affected child increases through subsequent gen- 
erations of a pedigree (Warren and Sherman, 2000). 
In 1991, through research on spinal and bulbar muscular atrophy and the fragile X syndrome, scientists discovered a new class of genetic mutation termed trinucleotide repeat expansions, or dynamic mutations (La Spada et al., 1991; Warren and Sherman, 2000).
Understanding this novel type of mutation revealed 
in molecular terms the underlying mechanism of 
both genetic anticipation and the Sherman paradox. 
To date, at least 24 neurological diseases and 17 non- 
neurologic genetic diseases show evidence of genetic 
anticipation (Wells and Warren, 1998) and at least 
20 genetic disorders have been linked to mutations 
in trinucleotide repeat tracts (Table 1). 


Table 1 Human disorders caused by trinucleotide repeats

Repeat | Locale | Normal | Affected | Disorder | Inheritance | Gene | Locus | Protein | Result
CGG | 5' UTR | 6-52 | 52-230 (pre); >230-2000 (full) | Fragile X syndrome | X-linked dominant | FMR1 (FRAXA) | Xq27.3 | RNA-BP | CpG DNA methylated, transcriptionally silenced; mental retardation
GCC | 5' UTR | 7-35 | 130-150 (pre); 230-750 (full) | Fragile X, mental retardation | X-linked dominant | FMR2 (FRAXE) | Xq28 | Transcription factor | CpG DNA methylated, transcriptionally silenced; mild mental impairment
CGG | 5' UTR | 11 | 80 (pre); 100-1000 (full) | Jacobsen syndrome (FRA11B) | Expansion carrier, recessive | CBL2 | 11q23.3 | Proto-oncogene | 11q23→qter loss; mild mental impairment
CTG | 3' UTR | 5-37 | 50-80 (pre); >80-3000 (full) | Myotonic dystrophy (DM) | Autosomal dominant | DMPK | 19q13 | Ser-Thr protein kinase | Impaired nuclear export of mRNA
GAA | Intron | <39 | 80 (pre); >112-1700 (full) | Friedreich's ataxia (FA) | Autosomal recessive | X25 (FRDA1) | 9q13-21.1 | Mitochondrial iron-binding protein | Low expression of mature mRNA, impaired iron transport
CAG | Coding (Gln) | 11-33 | 38-66 | Spinobulbar muscular atrophy or Kennedy's disease (SBMA) | X-linked recessive | AR | Xq12-21 | Androgen receptor | Mildly androgen insensitive (loss of function); neuronal loss: spinal cord, cranial nerves
CAG | Coding (Gln) | 6-39 | 36-121 | Huntington disease (HD) | Autosomal dominant | IT15 | 4p16.3 | Huntingtin | Intracellular protein aggregates; neuronal loss: striatum, cerebral cortex
CAG | Coding (Gln) | 7-34 | 51-88 | Dentatorubral pallidoluysian atrophy or Haw River syndrome (DRPLA) | Autosomal dominant | DRPLA B27 | 12p13 | Atrophin-1 or drplap | Nuclear protein aggregates; neuronal loss: dentate nucleus, neocortex
CAG | Coding (Gln) | 6-39 | 41-81 | Spinocerebellar ataxia type 1 (SCA1) | Autosomal dominant | SCA1 | 6p23 | Ataxin-1 | Nuclear protein aggregates; neuronal loss: Purkinje cells, brainstem
CAG | Coding (Gln) | 14-31 | 35-64 | Spinocerebellar ataxia type 2 (SCA2) | Autosomal dominant | SCA2 | 12q24.1 | Ataxin-2 | No protein aggregate detected
CAG | Coding (Gln) | 12-41 | 40-84 | Spinocerebellar ataxia type 3 or Machado-Joseph disease (SCA3/MJD) | Autosomal dominant | SCA3 (MJD1) | 14q32.1 | Ataxin-3 | Nuclear/perinuclear protein aggregates; neuronal loss: basal ganglia, brainstem
CAG | Coding (Gln) | 7-18 | 20-23 (EA2); 21-27 (SCA6) | Spinocerebellar ataxia type 6 (SCA6) or episodic ataxia type 2 (EA2) | Autosomal dominant | CACNA1A | 19p13 | α1A voltage-dependent Ca2+ channel subunit | Cytoplasmic protein aggregates; neuronal loss: Purkinje cells
CAG | Coding (Gln) | 7-17 | 38-130 | Spinocerebellar ataxia type 7 (SCA7) | Autosomal dominant | SCA7 | 3p12-13 | Ataxin-7 | Nuclear protein aggregates; neuronal loss: cerebellum, inferior olive
(CTA)3-17 CTG | 3' UTR | <100 | 107-250; >250 (?) | Spinocerebellar ataxia type 8 (SCA8); schizophrenia or BPAD (allelic variant?) | Autosomal dominant | SCA8 (antisense), KLHL1 (sense) | 13q21 | Antisense RNA; sense RNA-BP | Spastic, ataxic dysarthria; cerebellar atrophy
CAG | Coding (Gln) | 7-28 | >65 | Spinocerebellar ataxia type 12 (SCA12) | Autosomal dominant | PP2A (putative) | 11q22-24 | Regulatory subunit of protein phosphatase 2A | Tremor, ataxia, dementia; damaged basal ganglia and cerebellum
GCG | Coding (Ala) | 6-7 | 8-13 (AD); 7, homozygote (AR) | Oculopharyngeal muscular dystrophy | Autosomal dominant (AD); autosomal recessive (AR) | PABP2 | 14q11.2 | Poly-A RNA-BP, polyadenylation factor | Nuclear filament protein accumulation
Two common techniques are used to identify
repeat expansions in known genes: Southern blotting 
and polymerase chain reaction (PCR). Southern blots 
of genomic DNA are hybridized with radioactive 
probes that will bind to the unique DNA sequence 
in or near the repeat region in question. The radio- 
active probe identifies the DNA fragment harboring 
the repeat in question and the size of that fragment 
reflects the number of triplet repeats at that locus. 
PCR uses primers that bind to the unique sequences 
on each flank of a repeat and amplifies the DNA frag- 
ment containing the repeat in question. The size of 
the amplified DNA product reflects the size of the 
repeated region (Wells and Warren, 1998; Warren and 
Sherman, 2000). To locate new genes affected by 
repeat expansion, genome-wide screens are performed 
in search of large trinucleotide tracts of DNA. Tagged 
synthetic oligonucleotides of any desired repeat 
sequence (for example, (GAC)17) bind any DNA fragment
containing GAC/CAG repeats. By virtue of
the oligonucleotide tag, repeat-containing DNA frag- 
ments are tracked down, cloned, and analyzed for 
unique flanking sequences. The unique flanking 
sequence provides information about the genomic 
location of the large repeat tracts. The method of 
finding trinucleotide repeat expansions without pre- 
vious knowledge of genomic location is named repeat 
expansion detection (RED) (Vincent et al., 2000). 
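Both sizing approaches come down to simple arithmetic once the repeat is flanked by known unique sequence. The sketch below shows two calculations. First, it estimates the repeat number from a PCR product size, assuming the combined length of the non-repeat flanks (primers plus unique sequence) is known. Second, it counts the longest uninterrupted run of a triplet directly in a sequence. The flank length, example sequence, and helper names are hypothetical.

```python
import re


def repeats_from_amplicon(product_bp, flank_bp, unit_len=3):
    """Estimate repeat number: (PCR product size - non-repeat flank length) / unit length."""
    if product_bp < flank_bp:
        raise ValueError("product is shorter than its flanks")
    return (product_bp - flank_bp) / unit_len


def longest_repeat_run(seq, unit):
    """Longest uninterrupted tandem run of `unit` in `seq`, counted in repeat units."""
    unit = unit.upper()
    runs = re.findall(f"(?:{re.escape(unit)})+", seq.upper())
    return max((len(run) // len(unit) for run in runs), default=0)


# Hypothetical PCR assay with 190 bp of primer and unique flank in total.
print(repeats_from_amplicon(280, flank_bp=190))  # 30.0
seq = "TTT" + "CGG" * 10 + "AGG" + "CGG" * 9 + "TTT"
print(longest_repeat_run(seq, "CGG"))  # 10: the AGG interruption splits the tract
```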

Repeated triplet sequences occur naturally within 
expressed genes. The ‘dynamic’ repeats show intergenerational instability: the number of triplet units found in a child’s gene differs from that in the parental alleles. However, this instability is typical only in
families with the disorder. In most normal families, 
repeats are stable. Although most known dynamic 
triplet repeats are GC-rich, the GAA expansion in 
Friedreich’s ataxia implies that any triplet combin- 
ation may be subject to dynamic instability. Each 
trinucleotide tract has a characteristic range of poly- 
morphic alleles in the normal population (Table 1) 
(Wells and Warren, 1998; Cummings and Zoghbi, 
2000). Massive expansions occur in 5’ untranslated 
regions, introns, or 3’ untranslated regions, which are 
noncoding regions of genes that are transcribed into 
RNA but not translated into protein. The CGG repeat 
in fragile X syndrome, the GAA repeat in Friedreich’s 
ataxia, and the CTG repeat in myotonic dystrophy 
can each expand by hundreds to thousands of repeats 
during transmission from parent to child. In contrast, 
even moderate expansions are sufficient to cause dis- 
ease when they occur in the coding regions of genes 
as seen with the CAG repeats in the spinocerebellar 
ataxias. The trinucleotide repeat diseases can therefore 
be subdivided into two categories: coding (moderate) 
expansions and noncoding (massive) expansions. 


Trinucleotide Repeat Mutations in 
Coding Regions of Genes 


In the coding region of a gene, even moderate expan- 
sions of roughly 15-100 repeats can be pathogenic 
(Table 1). Each added DNA triplet encodes an extra 
amino acid, which potentially disrupts the protein 
structure and function. As the repeat expands beyond 
a certain threshold, protein function is pathologic- 
ally altered. With the exception of oculopharyngeal 
muscular dystrophy, all of the known coding repeat 
expansion mutations code for polyglutamine (polyQ) 
stretches, and almost universally cause neurologic 
disorders (SBMA, HD, SCA1, SCA3/MJD, SCA6, 
SCA7, SCA12). Polyglutamine diseases are domin- 
antly inherited. CAG expansions are defined as 
altered-function alleles that cause a hallmark cellular 
phenotype: intracellular protein aggregates contain- 
ing the mutant polyQ protein, ubiquitin, transglutam- 
inase, and heat-shock proteins (Kaytor and Warren, 
1999). Although each of these proteins harboring expanded polyQ tracts is found throughout the brain, a particular mutant protein causes degeneration of only a selective subset of neurons (Table 1).
Interacting proteins that are expressed in a specific 
subset of cells and interact with a specific mutant 
polyQ protein may impart neuronal specificity. In 
support of this hypothesis, huntingtin, ataxin-1, atro- 
phin-1, and the SBMA androgen receptor protein are 
each associated with a different cell-specific partner 
(Wells and Warren, 1998). Another view is that cell- 
specific proteases release pathogenic peptides from 
specific polyQ protein substrates. These peptides seed 
intracellular protein aggregates which may either 
catalyze or be a consequence of neuronal death. Cell 
culture experiments have established that polyQ pep- 
tides are more toxic than polyQ stretches within the 
context of a full-length huntingtin protein, and indeed 
caspases are sufficient to facilitate polyQ peptide 
toxicity (Ferrigno and Silver, 2000). One view is that 
polyQ stretches impede protein degradation by the 
proteasome such that mutant protein overwhelms 
the normal protein turnover machinery of the cell. The 
consequent disruption of general protein metabolism 
would then lead to cell death. Cellular ubiquitin is a ‘tag’ that targets proteins for degradation by the proteasome (Alves-Rodrigues et al., 1998). Cummings and coworkers studied pathogenesis in cells that could not ubiquitinate expanded mutant ataxin-1. They used transgenic mice that express mutant ataxin-1 in cerebellar Purkinje cells but lack a ubiquitin ligase. Surprisingly, in these mice there was no correlation between nuclear inclusions (protein aggregates) and pathogenicity. In fact, the brains of mice with fewer protein aggregates showed more neurodegeneration (Cummings and Zoghbi, 2000). This has fueled debate over whether protein aggregation may somehow protect neurons rather than catalyze cell death.
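The point that each added triplet encodes one extra amino acid can be seen by translating a hypothetical CAG tract with the standard genetic code. An interrupting CAA codon still encodes glutamine. As a result, the polyQ stretch in the protein can be longer than the longest pure CAG run in the DNA. The sequence and the helper names (`translate`, `longest_run`) are illustrative only.

```python
import re

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code: the 64 codons listed in TCAG order map onto AMINO ("*" = stop).
CODE = dict(zip((a + b + c for a in BASES for b in BASES for c in BASES), AMINO))


def translate(cds):
    """Translate a coding sequence codon by codon, stopping at the first stop codon."""
    protein = []
    for i in range(0, len(cds) - 2, 3):
        aa = CODE[cds[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


def longest_run(text, unit):
    """Longest uninterrupted tandem run of `unit` in `text`, counted in units."""
    runs = re.findall(f"(?:{re.escape(unit)})+", text)
    return max((len(run) // len(unit) for run in runs), default=0)


cds = "ATG" + "CAG" * 5 + "CAA" + "CAG" * 20 + "TAA"  # hypothetical coding sequence
protein = translate(cds)
print(longest_run(cds, "CAG"), longest_run(protein, "Q"))  # 20 26
```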


Trinucleotide Repeat Mutations in 
Noncoding Regions of Genes 


Trinucleotide repeat mutations are not always found 
in coding regions of genes; they have also been 
detected within introns as well as within noncoding 
exons of the 5'- and 3’-untranslated regions in mature 
mRNA. Moderate expansions of trinucleotide tracts 
that do not code for amino acids are often asymp- 
tomatic, as seen in fragile X syndrome, myotonic 
dystrophy, and Friedreich’s ataxia. The moderate 
expansions generate ‘pre-mutation’ alleles — highly 
unstable trinucleotide tracts that may undergo mas- 
sive expansions, adding hundreds to thousands of 
triplet repeats during transmission from parent to off- 
spring. These massive expansions can become patho- 
genic by silencing transcription, inhibiting translation, 
altering mRNA splicing, or impeding RNA export 
from the nucleus (Wells and Warren, 1998). In fragile 
X syndrome, as the triplet repeat expands into the pre- 
mutation range, more disease-causing ‘full-mutation’ 
alleles arise through transmission to offspring, which 
explains the increased penetrance of fragile X syn- 
drome in successive generations and thus the Sherman 
paradox. The full-mutation CGG tract, which lies near the promoter, becomes a target for DNA methylation, thereby silencing transcription (Cummings and Zoghbi, 2000). In myotonic dystrophy, the transcribed expanded CUG repeat in the 3' UTR of the DMPK mRNA is believed to sequester CUG-binding proteins. This draws them away from other cellular mRNAs and impairs general cellular RNA splicing and export (Cummings and Zoghbi, 2000). Massive GAA expansions in intron 1 of the X25 gene are believed to impair transcription, or possibly splicing of the expanded intron, reducing the cellular level of mature frataxin (Cummings and Zoghbi, 2000).


DNA Expansion Mechanism 


DNA expansions within trinucleotide tracts are attributable to the unique biochemical difficulties of replicating long stretches of repeat DNA. Mounting in vitro evidence suggests that triplets are deleted from or added to long repetitive tracts as the cellular machinery tries to replicate through DNA hairpins formed by triplet repeat sequences (Warren and Sherman, 2000). During replication of the genome, DNA synthesis on the lagging-strand template requires short pieces of DNA, Okazaki fragments, to be synthesized along the template and then joined into one continuous strand (Figure 1) (Zubay, 1993). To join Okazaki fragments, the upstream fragment extends downstream (3') to the next fragment and displaces a small portion of it as a single-stranded ‘flap’ of DNA (Figure 1, 1). Flap endonuclease 1 (FEN1) is the enzyme responsible for removing the single-stranded ‘flap’ (Warren and Sherman, 2000). This allows end-to-end ligation of the Okazaki fragments and conserves the wild-type DNA sequence. In trinucleotide repeat tracts, however, the single-stranded flap may form a stable hairpin structure that is not removed by FEN1 (Figure 1, 2). Upon ligation to the neighboring Okazaki fragment, the displaced hairpin region is incorporated into the new DNA strand, effectively expanding the repeat region. The model predicts that DNA replication through hairpins in Okazaki fragments will cause insertion of triplet units (Figure 1A), whereas hairpins formed in the template strand will cause triplet deletions (Figure 1B). In support of this model, yeast mutants defective for the FEN1 homolog RAD27 show increased trinucleotide expansion rates (reviewed in Warren and Sherman, 2000). Other supporting data show that CGG trinucleotide repeats harboring AGG interruptions are more stable than pure CGG tracts. The AGG interruptions may act as landmarks on the DNA that facilitate proper primer/template alignment during replication, or they may destabilize transient hairpins in the DNA. Interestingly, many of these dynamic mutations show sex-biased instability: expansion or contraction occurs only during inheritance through the female (fragile X, Friedreich's ataxia, myotonic dystrophy) or through the male (spinocerebellar ataxias). This ‘meiotic drive,’ however, is often accompanied by somatic mosaicism, in which nongametic cells harbor a spectrum of different allele sizes (Wells and Warren, 1998).

Figure 1 DNA replication (panel labels: leading strand template; lagging strand template; Okazaki fragments). Polymerization proceeds from 5' to 3' for the newly synthesized DNA strands. On the lagging strand, synthesis proceeds 5' to 3' for each Okazaki fragment, but overall lagging strand synthesis proceeds 3' to 5' as the fragments extend, meet, and are ligated together. Okazaki fragments are shown in chronological order of synthesis, indicated by 1, 2, 3.

For diseases with genetic anticipation (polyQ diseases,
fragile X syndrome, myotonic dystrophy, Friedreich’s
ataxia), a negative correlation between triplet
repeat length and age of onset, coupled
with a positive correlation between repeat length and
disease severity, reveals a story much like the Sherman
paradox (McInnis, 1996). Again, by elucidating
the mechanism of trinucleotide expansions through
transmission to offspring, this anticipation can be
defined at a molecular level. For coding mutations,
as the CAG repeats expand, the polyQ stretches
become longer and acquire the dominant gain of
function conferred by expanded polyQ.

Both coding and non-coding dynamic triplet repeat
genes have a characteristic range of ‘normal’
repeat lengths, allowing some degree of heterogeneity in
the unaffected population (Table 1). PABP2, the
polyA-binding protein 2 gene for oculopharyngeal
muscular dystrophy, does not tolerate even small
expansions or contractions. Expansion of the GCG
repeat in PABP2, which codes for polyalanine, becomes
pathogenic after addition of just one repeat (recessive
form) or three repeats (dominant form) (Brais et al., 1998).

Discovering the mutational mechanism that under- 
lies trinucleotide repeat expansion has allowed us to 
define genetic anticipation phenomena in molecular 
terms. We now have insight into an entire class of heritable
disorders and can relate phenotype to the underlying
genotype in trinucleotide repeat diseases. As
we learn more about each of these diseases we are 
also gaining insight into genomic polymorphisms, 
chromatin structure, DNA replication, methylation, 
transcription, translation, and protein degradation. 
Trinucleotide repeats have thus opened a new window 
into dynamic research on dynamic DNA. 
The direct demonstration that the genetic code is a
triplet code came only when the relationship between 
the nucleotide sequence of a segment of DNA and the 
amino acid sequence of the polypeptide encoded by 
it was shown to be three nucleotides to one amino 
acid. Although procedures for determining the amino 
acid sequence of small polypeptides were extant in the 
late 1950s, DNA-sequencing protocols were not gen- 
erally available until the late 1970s. By that time the 
triplet nature of the code had become established 
through the interpretation of the results of both 
genetic and biochemical experiments. 

Published in 1961, a brilliant series of genetic 
experiments clearly indicated that the genetic code is 
a triplet code. Proflavin, a derivative of acridine, is 
highly mutagenic for replicating genomes. In the 
experiments alluded to, addition of proflavin to a 
bacterial culture infected with phage T4 yielded 
many mutant phage progeny. In general, the resulting 
mutant phage characteristically did not revert to 
the wild-type phenotype after subsequent treatment 
with base-substitution mutagens, but did do so after 
treatment with mutagenic acridines. It was proposed 
that acridine mutations and the majority of spontan- 
eous mutations are not caused by base substitutions 
but by addition or deletion of nucleotides, resulting in 
disruption of synthesis of the protein encoded by the 
mutated gene. That disruption is best explained by 
supposing that the sequence of letters comprising the 
genetic code is read in one direction, from a specific 
starting point, in groups of three or whatever the 
coding ratio might be. Consequently, the addition 
or removal of a letter would displace the reading 
mechanism by one letter from the site of alteration 
onward so that every triplet thereafter would be mis- 
read relative to the wild-type gene. Reversion of 
acridine-induced mutations is nearly always by sup- 
pressor mutations at a different but closely linked site, 
mutations that are assumed to work by adding a base 
where one has been deleted, and vice versa, so that the 
proper wild-type number of nucleotides (letters in the 
sequence) would be restored. 
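The reading-frame argument above can be made concrete with a toy model. The Python sketch below is illustrative only: it assumes that reading starts at the first base and proceeds in non-overlapping triplets, it uses an arbitrary repeat of the triplet CAT rather than any real T4 sequence, and the helper names (`read_codons`, `insert_base`, `delete_base`) are hypothetical. A single inserted base shifts every downstream triplet, and a nearby deletion restores the wild-type reading beyond the second site.

```python
def read_codons(seq):
    """Read seq in non-overlapping triplets from a fixed start; drop an incomplete final triplet."""
    n = len(seq) - len(seq) % 3
    return [seq[i:i + 3] for i in range(0, n, 3)]


def insert_base(seq, pos, base):
    """Return seq with one extra base before index pos (a hypothetical '+' mutation)."""
    return seq[:pos] + base + seq[pos:]


def delete_base(seq, pos):
    """Return seq with the base at index pos removed (a hypothetical '-' mutation)."""
    return seq[:pos] + seq[pos + 1:]


wild_type = "CAT" * 6                        # toy 'gene' of six identical codons
plus = insert_base(wild_type, 3, "G")        # insertion just after the first codon
plus_minus = delete_base(plus, 9)            # compensating deletion a few bases downstream

print(read_codons(wild_type))   # ['CAT', 'CAT', 'CAT', 'CAT', 'CAT', 'CAT']
print(read_codons(plus))        # ['CAT', 'GCA', 'TCA', 'TCA', 'TCA', 'TCA']
print(read_codons(plus_minus))  # ['CAT', 'GCA', 'TCA', 'CAT', 'CAT', 'CAT']
```

In the double mutant only the codons between the two altered sites differ from wild type, which is how a suppressed mutant can show a wild-type or pseudo-wild-type phenotype.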

The 1961 study began with the isolation of independent,
spontaneous reversions of a proflavin-induced
mutation in the rII region of phage T4
and the subsequent demonstration in recombination
experiments that the reversions were second-site
suppressor mutations, as hypothesized, and that, after
separation from the original mutation, each displayed
the mutant phenotype. Once the suppressor mutations
were isolated, they could be treated like any
new rII mutations. First, they were mapped, and all
were found to be located close to and on either side of
the original rII mutation that they suppressed. Second,
reversions of the separated suppressor mutations
were obtained and the revertant strains were analyzed 
in the same way as before. Once again the reversions 
were found to be due to suppressor mutations, that is, 
to suppressors of suppressors! When these ‘second- 
generation suppressors’ were isolated, they proved, 
like the first-generation suppressors, to have the typical 
suppressor phenotype and to map in the same segment 
of the rII region. They, in turn, were reverted and their
suppressors, that is, suppressors of suppressors of
suppressors, or third-generation suppressors, were
isolated. In this way some 80 new rII mutations of the
spontaneously reverting, acridine type were obtained.

The investigators, assuming that the original
acridine-induced mutation resulted from, for example,
the insertion of an extra nucleotide, and that the
reversions were due to a compensating deletion of a
nucleotide at other sites close to it, arbitrarily assigned
a plus sign (+) to the original mutation and a minus
sign (−) to its suppressors. Similarly, all the suppressors
of those suppressors could be labeled as ‘plus’
mutations, while the third-generation suppressors
would be ‘minus’ mutations. It did not matter whether
the original mutation was an insertion or a deletion;
the important point was that a mutation and its
suppressor had opposite signs. According to this
convention, some of the 80 new rII mutations were plus
and some were minus. The question addressed next was:
What happens if new combinations of these mutations
are constructed by recombination? First, any combination
of a plus and a minus mutation gave a wild-type
or pseudo-wild-type phenotype. Second, as expected,
combinations of two pluses (++) or two minuses
(−−) always yielded the mutant phenotype. On the
other hand, if the coding ratio is really 3, or a multiple
of 3, the addition or subtraction of three nucleotides
(a +++ or −−− triple mutant) should not throw the
reading mechanism out of alignment with the code.
This was verified experimentally.
(Of course the reading between the outermost inser- 
tions or deletions will be different from wild-type, but 
for a particular gene segment this may not affect the 
protein product.) The conclusion of this study was 
that the coding ratio is probably 3, or, if more than 
one nucleotide is added or removed at a time, some 
multiple of 3. 
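The sign-counting rule can be checked with a small simulation. In this hedged Python sketch, each ‘+’ is modeled as a single-base insertion and each ‘−’ as a single-base deletion, as the investigators assumed; the 30-base CAT repeat, the mutation positions, and all function names are hypothetical. Every mutation lies near the start of the sequence, so the last three codons are downstream of all changes, and the sketch reports whether the reading there matches wild type alongside whether the net number of added bases is a multiple of 3.

```python
WILD_TYPE = "CAT" * 10  # hypothetical 30-base gene, read in triplets from base 0


def read_codons(seq):
    """Non-overlapping triplets from a fixed start; an incomplete final triplet is dropped."""
    n = len(seq) - len(seq) % 3
    return [seq[i:i + 3] for i in range(0, n, 3)]


def apply_mutations(seq, mutations):
    """Apply (position, sign) edits given in wild-type coordinates.

    '+' inserts an extra G before `position`; '-' deletes the base at `position`.
    Edits are applied right to left so that earlier positions remain valid.
    """
    for pos, sign in sorted(mutations, reverse=True):
        if sign == "+":
            seq = seq[:pos] + "G" + seq[pos:]
        else:
            seq = seq[:pos] + seq[pos + 1:]
    return seq


def tail_in_frame(mutations, k=3):
    """True if the last k codons of the mutant (downstream of every change) match wild type."""
    mutant = apply_mutations(WILD_TYPE, mutations)
    return read_codons(mutant)[-k:] == read_codons(WILD_TYPE)[-k:]


combos = {
    "+": [(4, "+")],
    "+-": [(4, "+"), (10, "-")],
    "++": [(4, "+"), (10, "+")],
    "--": [(4, "-"), (10, "-")],
    "+++": [(4, "+"), (7, "+"), (10, "+")],
    "---": [(4, "-"), (7, "-"), (10, "-")],
}

for label, muts in combos.items():
    net = sum(1 if s == "+" else -1 for _, s in muts)
    # columns: label, net shift divisible by 3, downstream reading restored
    print(label, net % 3 == 0, tail_in_frame(muts))
# + False False
# +- True True
# ++ False False
# -- False False
# +++ True True
# --- True True
```

The two columns agree for every combination, matching the conclusion that a net change of three bases (or none) leaves the downstream reading in phase.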

The triplet nature of the code was supported by
other mutational studies, utilizing, independently,
tobacco mosaic virus (TMV) coat protein and the
tryptophan synthetase alpha chain (encoded by the
trpA gene) of Escherichia coli. Strong support for a
triplet code came from correlation of amino acid sub- 
stitutions with the type of action of specific mutagens 
(TMV), from analyses of amino acid substitutions 
in forward mutations and subsequent nonwild-type 
reversions (trpA), and particularly intracodon recom- 
bination in trpA. Biochemical confirmation of the 
triplet nature of the code and specific assignment of 
codons to their amino acids came from translation of 
defined RNA polymers, transcription and translation 
of DNA polymers, and finally from ribosome/triplet 
aminoacyl-tRNA-binding assays. 


See also: Genetic Code; Universal Genetic Code 
Triploidy is the term referring to the presence of a
complete extra haploid set of chromosomes in an
organism or cell line, resulting in a 3n number of
chromosomes, where n is the haploid chromosome number for the
species concerned. Triploids are both euploid and poly- 
ploid in that they contain a completely balanced extra 
set of chromosomes. They are rarely found in a viable 
state in wild animal populations but can occur in plant 
communities and are widely used in both commercial 
fruit and fish production. As discussed below, triploid 
organisms are invariably infertile. 


Triploidy in Humans 


In humans, triploid embryos occur with chromosome
constitutions of 69, XXX, 69, XXY, or 69, XYY.
Triploidy is one of the commonest chromosome
aberrations in humans and ends in spontaneous abortion
in nearly all cases; it constitutes an estimated 1% of
all recognized human conceptions and 17% of all
chromosomally abnormal abortuses. Approximately
50 cases that survived to birth but died shortly
afterwards are described in the literature.

Two distinct triploid fetal phenotypes have been
recognized: one in which a relatively well-developed
fetus occurs together with an extremely large cystic
placenta known as a partial hydatidiform mole, and a
second characterized by a grossly retarded embryo
and placenta. Other severe malformations frequently
present in both forms include cardiac defects, cleft
palate, and skeletal defects. Chromosome and DNA 
marker studies have demonstrated that placental and 
fetal enlargement in triploid conceptions occurs when 
two paternal and one maternal set of chromosomes are 
present (diandry) and the phenotypic form with a 
reduced fetal and placental development arises in the 
presence of two maternal and one paternal set of 
chromosomes (digyny). Various lines of evidence, 
principally from mouse embryo nuclear transfer 
experiments, suggest that variations in fetal develop- 
ment associated with an excess of either paternal or 
maternal genome material are related to differences in 
parent specific imprinting patterns induced during 
gamete formation. 

Diandrous triploidy most frequently arises through
the fertilization of a single oocyte by two spermatozoa,
and digynous triploidy by complete failure of chromosome
segregation in either the maternal first or second
meiotic division, giving rise to a diploid egg with two
maternal sets of chromosomes.

The 69, XXX and 69, XXY triploids occur equally
frequently, whereas the 69, XYY form is rarely
observed, suggesting a much reduced viability of the
69, XYY triploid relative to the other two.


Other Organisms 


Triploidy is encountered occasionally in natural 
populations of flowering plants containing diploid 
(2n) and tetraploid (4n) plants. It is presumed that
such triploids arise by natural crosses between diploid
and tetraploid plants in the same population. However,
there are many examples of triploid strains of
cultivated plants that have been induced artificially
by crossing diploid and tetraploid parental strains.
Unlike human triploids, such triploid plants appear
to be morphologically normal, but they are completely
infertile and can only be propagated vegetatively.
Their infertility arises during gamete formation.
Typically, during meiosis the three homologs of each
chromosome pair and cross over to produce a trivalent
at the first meiotic division. The segregation of
chromosomes from each trivalent is completely random
(two homologs go to one pole and one to the other),
and it is extremely unlikely that a sufficiently large
number of genetically balanced gametes can be produced
to provide fertility. This phenomenon of triploid
sterility was widely studied in the
1930s and 1940s in various plant species with notable 
contributions from Darlington and Mather in triploid 
Hyacinthus (hyacinth) and Dermen in Petunia by 
studying chromosome segregation in pollen. 
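Why random segregation from trivalents makes balanced gametes so rare can be shown with a short calculation. The sketch below is a simplified model rather than a description of any particular species: it assumes that every one of the n trivalents disjoins 2:1, and that a given gamete independently receives one or two copies of each chromosome with probability 1/2. A gamete is then euploid only if it receives one copy of every chromosome (n) or two copies of every chromosome (2n), so the expected euploid fraction is 2 × (1/2)^n. The function name is hypothetical.

```python
from fractions import Fraction


def euploid_gamete_fraction(n: int) -> Fraction:
    """Expected fraction of euploid gametes from a triploid with n chromosome types.

    Simplifying assumptions: each trivalent disjoins 2:1, and a given gamete
    independently receives one or two copies of each chromosome with
    probability 1/2. Only gametes with one copy of every chromosome (n) or
    two copies of every chromosome (2n) count as balanced.
    """
    half = Fraction(1, 2)
    all_single = half ** n  # one copy of each of the n chromosomes
    all_double = half ** n  # two copies of each of the n chromosomes
    return all_single + all_double


for n in (3, 7, 11, 12):
    p = euploid_gamete_fraction(n)
    print(f"n = {n:2d}: {p} ({float(p):.5f})")
```

Under these assumptions a triploid with n = 11, as in Musa acuminata, would yield only 1 euploid gamete in 1024, and the fraction halves with each additional chromosome type.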

One of the most famous and ancient examples of a
triploid plant is the cultivated banana, characterized
by its widely used, fleshy, seedless fruit. The
cultivated banana is believed to have been derived
from crosses involving the diploid species Musa
acuminata and M. balbisiana, both of
which produce seeded fruit, some 1000 years ago in
Southeast Asia. This gave rise to a sterile triploid
plant with large seedless fruit and enormous food- 
producing properties. Propagation of the cultivated 
banana occurs by dividing its root system. There are 
now more than 600 varieties of cultivated banana, 
including the plantain, which have been introduced 
into the majority of tropical countries. Although the 
original seeded wild species are still available, they are 
considered to be so inferior that they are only eaten in 
times of famine when the cultivated banana crop fails. 

Modern plant breeders have adopted the same strat- 
egy of using the triploid status to produce seedless fruit 
for various fruit varieties of which currently the most 
important and fashionable ones are grapes, watermelons,
and citrus fruits. Breeders have developed
sophisticated methods of vegetative propagation,
including in vitro tissue culture and vegetative
regeneration from endosperm tissue via somatic embryogenesis.

Although viable triploidy is rarely found in the
animal kingdom, natural newt populations in which
triploids occurred together with diploid and tetraploid
animals were described in the early 1940s.
In 1978 it was demonstrated that triploid newts could
be induced experimentally by fertilizing heat-shocked
eggs from diploid mothers; the heat shock treatment
was found to cause retention of the second
polar body, producing a diploid egg. Similar strategies
have been used to induce triploidy in fish for commercial
reasons. Triploid salmon are sterile and show
steady, long-term growth combined with
improved flesh quality compared with fertile
salmon, and they have the further advantage that they
can be harvested at any time of the year. Triploid grass
carp have been introduced into rivers and ponds in the
United States for weed control without the risk
that they will reproduce and spread in an
uncontrolled fashion.


See also: Aneuploid; Chromosome Number; 
Euploid 
Trisomy is the term that refers to the occurrence of
an extra chromosome in a diploid (2n) organism.


Trisomy is an example of aneuploidy in that the pre- 
sence of the extra chromosome induces a genetic and 
numerical imbalance of the genome of the organism 
concerned. 

Trisomy of a chromosome frequently leads to in- 
duction of a specific phenotype for the chromosome 
concerned. In a series of classical studies carried out 
between 1923 and 1931, Blakeslee demonstrated spe- 
cific phenotypic effects associated with trisomy of 
each type of chromosome in Datura stramonium, the 
Jimson weed. The haploid chromosome number of 
diploid Jimson weed is 12 and Blakeslee observed 
12 different characteristic phenotypes expressed as 
changes in the general size and shape of the seed capsules
and the structure of hooks carried on their surface. 
Further, Blakeslee noted that the phenotypic changes 
induced by trisomy were significantly larger than 
those caused by induction of either triploidy or tetra- 
ploidy and concluded that the genetic imbalance of tri- 
somy was the major cause of the phenotypic changes. 

These studies came to the notice of the scientific
community, and in 1933 the Dutch ophthalmologist
Waardenburg speculated that Down syndrome in
humans might also be caused by the presence of an
extra specific chromosome, an opinion that was later
endorsed by Penrose.

Various improvements in human chromosome preparations
led to the accurate counting of human
chromosomes by Tjio and Levan in 1956. In 1959,
Lejeune and colleagues identified an extra chromosome,
that is, trisomy of a specific chromosome, in patients
with Down syndrome. The chromosome involved
was subsequently designated chromosome 21, and
Down syndrome became known as trisomy 21 syndrome.
This observation was confirmed by Pat Jacobs in the
same year, who also observed an extra X chromosome
in Klinefelter syndrome patients and monosomy X
in Turner syndrome. In the following year, trisomy 18,
giving rise to Edwards syndrome, and trisomy 13,
associated with Patau syndrome, were described.
The birth frequencies of trisomies 21, 18, and 13 have
been estimated at 1/700, 1/7000, and 1/12000,
respectively. The short form of the ISCN human
chromosome nomenclature refers to trisomy 21 as
47, XX, +21 in the case of female Down syndrome
patients and 47, XY, +21 for male Down syndrome
patients. Complete trisomies of autosomes other than
21, 18, and 13 do not appear to be compatible with
life after birth in humans.

However, there are occasional descriptions in the
literature of mosaic trisomies for several
other chromosomes in liveborn individuals, including
trisomies for chromosomes 8 and 9. Approximately
5% of Down syndrome cases do not arise
from a complete extra chromosome 21, but are caused
by unbalanced translocations resulting in three copies
of all or part of the long arm of chromosome 21. The
most common form of translocation is a Robertsonian 
translocation formed by centric fusion involving the 
long arm of 21 and the long arm of another acrocentric 
chromosome, usually 14 or 13. The short arms of the
translocation chromosomes are eliminated, resulting in
a phenotypically normal translocation carrier with 45
chromosomes in place of 46. Missegregation of the
translocation in gametes of translocation carriers
results in Down syndrome children with 46 chromosomes,
including three copies of 21q. Other forms of
translocation in an unbalanced state can give rise to partial
trisomy in which just part of a given chromosome is 
trisomic. Specific phenotypes of partial trisomies for 
chromosomes besides 21, 18, and 13 have been 
described. In particular, partial trisomies involving
the short arm of chromosome 9 occur regularly and 
have a very characteristic phenotype. Partial trisomies 
can also arise from meiotic recombination within the 
inversion loop in carriers of pericentric inversions, 
resulting in a recombinant chromosome with a partial 
monosomy at the end of one arm of the chromosome 
and partial trisomy at the end of the other arm. Inter- 
stitial duplications have also been described which 
result in partial trisomy and an associated clinical 
phenotype. Investigation of the size and position of
partial trisomies has shown that only part of the
chromosome needs to be present in three copies to
produce a trisomic phenotype, which makes it possible
to define a minimal essential region. Gene mapping
studies of such cases have helped identify which genes
are involved in causing the effects of genetic imbalance.
In the case of chromosome 21, the recent sequencing of
the entire chromosome suggests that far fewer genes
are involved in causing Down syndrome than originally believed.
Many independent studies indicate that the liveborn
incidence of Down syndrome represents only a
small fraction of its presumed incidence at
conception. Chromosome studies on spontaneous
abortions, which represent between 15% and 25% of all
recognized pregnancies, show that at least 80% of
Down syndrome pregnancies must spontaneously
abort. Trisomy 21 is involved in approximately 10%
of all trisomic spontaneous abortions. Importantly,
trisomy has been observed for every chromosome in
spontaneous abortions, at varying frequencies, and its
frequency generally increases with maternal age in
the abortus population. Astonishingly, monosomy X
(Turner syndrome) is the only complete chromosome
monosomy found in either human liveborns or
spontaneous abortions. However, studies in the mouse on
fetal wastage associated with chromosomal abnor- 
malities carried out by Alfred Gropp in the early 1970s 
clearly show that monosomies are involved in concep- 
tion at an equal frequency to trisomies (the theoretical 
expectation); autosomal monosomic conceptions 
never come to term and are lost at a much earlier 
stage of pregnancy. We may assume that the same is
true for humans, but that monosomic embryos are lost
too early in pregnancy even to be recognized. The
inevitable conclusion is that the frequency of chromosomal
aneuploidies at conception, including trisomies,
is much higher than can be deduced from frequencies
in either liveborns or spontaneous abortions. Further,
the frequency of chromosome abnormalities at conception
can be expected to rise rapidly with increasing
maternal age, such that the majority of embryos are
chromosomally abnormal in women approaching
their 40th year. This expectation is confirmed by
recent investigations on the chromosome status of
IVF embryos, which demonstrate that nearly all
embryos derived from women older than 37 years are
chromosomally abnormal.

Geneticists have been fascinated for decades with 
the factors determining the enormous increases in 
chromosome nondisjunction in the human female 
with increasing age. One possible factor, first described
by Edwards and Henderson in 1968 by direct examination
of mouse meiotic oocytes, is reduced meiotic
crossing-over (chiasma formation), resulting in
homologous chromosomes failing to segregate normally
into daughter cells at the first meiotic division.
This phenomenon has also
been shown to take place in humans by using genetic 
marker segregation analysis; the level of genetic 
recombination between genes located on a particular 
chromosome is a direct measure of the number of 
chiasmata occurring on that chromosome during the 
preceding meiosis. It appears that the specific
chromosome 21s that give rise to Down syndrome in
children exhibit a much lower level of genetic
recombination than chromosome 21s from normal
children. Interestingly, the reduced recombination
frequency is not confined to chromosome 21, and
genetic marker analysis of all chromosomes in Down
syndrome children shows a genome-wide reduction
in genetic recombination. This leads to the conclusion
that there is a subpopulation of oocytes with a strongly
reduced level of recombination over all chromosomes,
which predisposes the oocyte to nondisjunction of
chromosome 21. The central problem is to explain how
oocytes from such a subpopulation could be ovulated
with increasing frequency as maternal age advances.
Henderson and Edwards advanced the ‘production
line theory’ to explain this: the theory assumes
that oocytes which were the first to differentiate during
development of the embryonic ovary had a higher
chromosome recombination frequency than those
which differentiated later; oocytes were subsequently
released during monthly ovulation in the order of their
embryonic differentiation, according to the maxim
‘first in, first out; last in, last out.’ Many questions
arising from the production line theory remain
unanswered, including how the order of oocyte
differentiation could induce the postulated differences in
recombination frequency. In 1996 Lamb et al. introduced
their two-hit model of nondisjunction, in which
the first hit was reduced recombination in particular
oocytes and the second hit a generalized ovarian aging
related to maternal age. This model attributes the rare
occurrence of Down syndrome in babies of young
mothers primarily to reduced recombination, and the
much higher incidence in older mothers to a combination
of ovarian aging and reduced chromosome recombination.


See also: Aneuploid; Down Syndrome; Gene 
Mapping; Klinefelter Syndrome; Monosomy; 
Nondisjunction; Trisomy 18; Turner Syndrome 
Trisomy 18 syndrome, also known as Edwards syndrome,
was originally described by Professor John
Edwards of Oxford University and his colleagues in 
a single case report published in 1960 (Edwards et al., 
1960). Other case descriptions in North America soon 
followed and this syndromic pattern became estab- 
lished by the mid-1960s. Since then, hundreds of 
case reports and several series have been published 
throughout the world. The trisomy 18 syndrome is 
due to an extra copy of chromosome 18 and represents 
the third most common autosomal syndrome behind 
trisomy 21/Down syndrome and the deletion 22q11 
syndrome. Based on a number of population studies 
performed in different areas of North America, 
Europe, and Australia, the prevalence at birth in 
liveborn infants ranges from 1 in 3600 to 1 in 8500 
(summarized by Embleton et al., 1996). The most
accurate estimate of the frequency of trisomy 18 in
live births, with only minimal influence from prenatal
screening, comes from the Utah study, which documents
a prevalence of just under 1 in 6000.

The pattern of malformation, i.e., the syndrome,
consists of a recognizable constellation of major
and minor anomalies, a predisposition to increased
neonatal and infant mortality, and a significant
neurodevelopmental and motor disability in the
older surviving children. The most consistent findings
include the presence of prenatal growth deficiency 
(low weight for gestational age), recognizable cranio- 
facial features (high forehead, short palpebral fissures, 
small face for cranium, external ear anomalies, and 
micrognathia), a distinctive hand posturing (overrid- 
ing fingers, camptodactyly, and nail hypoplasia), a 
short sternal length, and foot deformities. While the 
facial features are not as distinctive as in the newborn 
with Down syndrome, the clinical diagnosis in the 
neonate with trisomy 18 is relatively straightforward 
for the clinician with training in clinical genetics,
dysmorphology, and neonatology. The hand findings are
particularly characteristic: the combination of overriding
fingers (the second over the third) with camptodactyly
is observed in only a few other congenital malformation
syndromes. In addition to
the prenatal growth deficiency and pattern of minor 
anomalies, infants with trisomy 18 are at risk for a 
number of medically significant structural defects: 
over 90% of children will have a cardiovascular mal- 
formation; other congenital anomalies that occur in 
varying proportions of 10-50% of patients include 
omphalocoele, tracheoesophageal fistula, radial aplasia,
and talipes equinovarus. Defects of the external ear
are common and usually comprise a small, broad ear
with an unraveled helix, frequently
accompanied by fusion to the scalp skin (cryptotia).
About 5% of infants will have an open neural tube
defect, usually meningomyelocele. A summary of
these and other features of trisomy 18 is available in
pediatric and genetic texts.

The cardiovascular malformations, as in most
chromosome syndromes, are nonrandom and relatively
specific. In large series, about 90% of trisomy
18 patients with a heart defect have a ventricular septal 
defect with polyvalvular disease. Usually the valvular 
dysplasia involves two or three of the heart valves and 
occasionally produces the tetralogy of Fallot. About 
10% of patients with trisomy 18 syndrome will have 
a more complicated cardiac malformation, such as a 
double outlet right ventricle or endocardial cushion 
defect. 

The most significant manifestation in infants with
trisomy 18 syndrome is the increased neonatal and infant
mortality. This particular feature of the syndrome is well
known to perinatologists, neonatologists, and
pediatricians. Since the mid-1980s, five population-based
survival studies have been performed in various parts of
the United Kingdom, Australia, and the United States.
The actual figures are remarkably similar in all of the
investigations and indicate that about 50% of newborns
with trisomy 18 die by 7 days of age. Approximately
80-90% have succumbed by 6 months of age, with
over 90% having died by 12 months of age. It is this
increased infant mortality that has often led physicians 
to regard trisomy 18 as a ‘lethal condition.’ However, 
it is important to emphasize that about 5% of children 
will survive the first year of life. The exact cause of
death in children with this syndrome is not completely
clear. Recent investigations into the natural history
of trisomy 18 have indicated that central apnea, alone
or in combination with other medical problems,
is the primary cause of the high infant mortality.
Aspiration events, upper airway obstruction,
hypoventilation, and the heart defects accompany central
apnea as part of the multifactorial complex of findings
that are related to this increased infant mortality.
Other medical problems that add to morbidity
in infancy include feeding difficulties, gastroesophageal
reflux, and the potential for congestive heart
failure related to the cardiovascular malformations.
The majority of infants with trisomy 18 who survive 
the newborn period are not able to feed by mouth and 
require tube feedings. Placement of a gastrostomy 
tube becomes a consideration for older infants. 

The other major manifestation of trisomy 18 is the
neurodevelopmental disability. As mentioned, fewer
than 10% of children survive the first year of life.
However, once a child with trisomy 18 is older than
12 months, the individual appears to have passed an
important threshold regarding the occurrence of central
apnea and the aspiration events mentioned above.
(In later childhood, serious illness or death usually
has a more specific medical cause, such as pneumonia
or pulmonary hypertension.) All older infants and
children with trisomy 18 exhibit a significant but
nonregressive neurodevelopmental and motor disability.
While all children progress in their milestones, they
do so quite slowly. Toddlers and older children are
developmentally affected to the extent that they are
unable to walk unassisted or to communicate verbally. A study
by Baty et al. (1994a, b) reviewed the developmental 
and medical records of 62 children with trisomy 18. 
Analysis of these developmental evaluations showed
that on average the age-equivalent performance skills
were between 6 and 10 months of age regardless
of chronological age. Although all individuals with
trisomy 18 in the study were functioning at a level of
severe to profound developmental delay, the children
did achieve many skills of early childhood and
continued to learn throughout life: a number of older
children with trisomy 18 could use a walker, and several
were able to feed themselves and understand cause
and effect. Of note, there is one child in the medical 
literature with trisomy 18 who was able to walk 
unsupported. Investigations on this child showed 
that he had full trisomy 18 without mosaicism.
Because of the developmental progress in older infants 
and young children, referral for early intervention 
programs is recommended. Due to the increased 
infant mortality and degree of developmental disabil- 
ity, the ethical issues surrounding the management of 
infants and children with trisomy 18 are quite compli- 
cated, yet seldom discussed. As mentioned above, a 
small but definite proportion of infants with trisomy 
18 will be alive at 12 months of age and it is usually not 
possible in the newborn period to predict survival 
accurately. These issues are discussed in more detail 
in a recent review by the author (Carey, 2001). Other 
complications of older children include hearing loss, 
scoliosis, and increased risk for Wilms’ tumor. The 
author has outlined suggestions for routine health 
supervision (Carey, 2001) in persons with trisomy 18. 

Trisomy 18 is frequently recognized in the prenatal
setting because of the high occurrence of ultrasound
abnormalities in second-trimester fetuses
with this condition. In addition, the prenatal triple screen
typically shows low values of all three
parameters, and such population programs detect
about 60% of second-trimester fetuses with trisomy
18. Thus, clinicians now commonly face the dilemma of
discussing the condition and its natural history in the
prenatal setting.

Trisomy 18 is usually due to a complete trisomy,
i.e., three copies of chromosome 18. Over 90% of
infants in recent population series who had the Edwards
syndrome phenotype have complete trisomy, while
about 8% have either mosaicism or a partial 18q
trisomy. In full trisomy 18, the extra chromosome is
presumably present because of a nondisjunction event
in meiosis. The nondisjunction error predominantly
occurs in oogenesis and is evenly divided between
meiosis I and II. This is in contrast to all other human
trisomies, in which the error usually occurs in maternal
meiosis I. Thus, the biology of nondisjunction in trisomy
18 is unique. However, as in the other common
autosomal trisomies, i.e., 21 and 13, there is a
maternal age effect. As in other chromosome
syndromes, prenatal diagnosis with amniocentesis or
chorionic villus sampling in future pregnancies is
routinely offered to families who have had a child
with trisomy 18.

Families of infants and children with trisomy 18 in
Utah formed an international lay advocacy group in
1980. This group, called the Support Organization
for Trisomy 18, 13, and Related Disorders (SOFT),
now includes thousands of families from all over the
world. SOFT publishes a newsletter six times a year,
holds an annual conference, and connects families to
each other for support. The contact address is SOFT,
2982 South Union Street, Rochester, NY 14624, USA,
1-800-716-SOFT, and the web page is www.trisomy.org.
The contact address for SOFT UK is
Tudor Lodge Redwood, Ross on Wye, Herefordshire
HR9 5UD, UK (tel: 01-989-67480). Another organization,
the Chromosome 18 Registry and Research Society,
San Antonio, Texas, USA (http://www.chr18.uthscsa.edu),
focuses on research aspects of chromosome
disorders involving chromosome 18.


References 

Baty BJ, Blackburn BL and Carey JC (1994a) Natural history of
trisomy 18 and trisomy 13: I. Growth, physical assessment,
medical histories, survival, and recurrence risk. American 
Journal of Medical Genetics 49: 175-188. 

Baty BJ, Jorde LB, Blackburn BL and Carey JC (1994b) Natural 
history of trisomy 18 and trisomy 13: II. Psychomotor
development. American Journal of Medical Genetics 49: 189-194.

Carey JC (2001) The Common Medically Serious Classical Tri- 
somy Syndromes. In: Cassidy S and Allanson J (eds) Manage- 
ment of Common Genetic Syndromes. New York: John Wiley. 

Chromosome 18 Registry and Research Society, San Antonio, 
Texas. http://www.chr18.uthscsa.edu

Embleton ND, Wyllie JP, Wright MJ, Burn J and Hunter S (1996) 
Natural history of trisomy 18. Archives of Disease in Childhood 
75: F38-F41.

Edwards JH, Harnden DG, Cameron AH, Crosse VM and Wolff 
OH (1960) A new trisomic syndrome. Lancet 1: 787-788.

SOFT. http://www.trisomy.org 


See also: Trisomy 


Triticum Species (Wheat) 
J Dvorak 


Copyright © 2001 Academic Press 
doi: 10.1006/rwgn.2001.1672 


Evolution and Domestication 


Evolution 

Triticum (wheat) comprises six biological species at 
the diploid, tetraploid, and hexaploid levels (Table 1). 
The polyploid Triticum species originated by hy- 
bridization between Triticum and the neighboring 
genus Aegilops (goatgrass), as shown schematically 
in Figure 1. The tetraploid species, T. turgidum (genomes
AABB) and T. timopheevii (genomes AAGG), are
polyphyletic. The A genomes of both species were 
contributed by T. urartu. The B and G genomes are 
closely related to the genome of Ae. speltoides (S gen- 
ome). The designation of these genomes as B and G 
rather than as modified S genomes has been retained 
for historical reasons. Since T. turgidum is older than
T. timopheevii, the B genome has diverged more from
the S genome of Ae. speltoides than the G genome has.
The G genome is virtually identical to the S genome at
the molecular level, but it differs from it, as well as
from the B genome, by major structural chromosome
rearrangements. Hexaploid T. aestivum originated some
6000–7000 years ago by the hybridization of tetraploid
wheat, most likely cultivated emmer (T. turgidum
subsp. dicoccon), with Ae. tauschii. Ae. tauschii
subsp. strangulata in Transcaucasia was the principal
source of the wheat D-genome gene pool, but since
several hybridization events were responsible for the
formation of this gene pool, it cannot be excluded
that Ae. tauschii from other geographical regions
participated in shaping it. Hexaploid T. zhukovskyi
originated recently by interspecific hybridization of
cultivated T. timopheevii with cultivated T. monococcum.

Figure 1 Evolution of the polyploid complex of
Triticum. [Diagram: diploids T. urartu (AA),
T. monococcum (A^mA^m), Ae. speltoides (SS), and
Ae. tauschii (DD); tetraploids T. turgidum (AABB) and
T. timopheevii (AAGG); hexaploids T. aestivum
(AABBDD) and T. zhukovskyi (AAGGA^mA^m).]

Table 2 summarizes genome relationships in the
Triticum–Aegilops alliance. Genomes designated by the
same capital letter share homologous chromosomes.
Different superscripts attached to a common capital
letter mark slightly differentiated (modified) versions
of a basic genome. Diploid sources of the genomes
designated as X and Y are currently uncertain, although some
evidence suggests that the X genome evolved from an
ancient S genome. Table 2 also summarizes relationships
among cytoplasms (plasmons) of the species in
the Triticum–Aegilops alliance.


Domestication 

Cultivated T. monococcum, T. turgidum, and
T. timopheevii originated by domestication of wild
progenitors (Table 1). Einkorn wheat, T. monococcum, was
domesticated in the Karacadag mountains of
southeastern Turkey about 10 000 years ago. Emmer wheat
was domesticated at a currently unknown site, either
at the same time as T. monococcum or slightly later.
T. timopheevii was probably domesticated in
Transcaucasia. There is no evidence that hexaploid wheat
ever existed in the wild. The semi-wild T. aestivum
subsp. tibetanum (Table 1) is a weedy race of unknown
origin. Hulled, brittle-rachis forms of T. aestivum have
often been considered ancestral to the free-threshing
forms (Table 1). However, both molecular and
archeological evidence suggest that modern hulled forms of
T. aestivum were probably derived by hybridization
between free-threshing hexaploid wheats and hulled
domesticated or wild emmer wheat.

Table 1 Ploidy, domestication status, and spike characteristics of Triticum species and subspecies

Ploidy | Status | Spike | Species | Subspecies
2x | wild | hulled, brittle | T. monococcum | aegilopoides (wild einkorn wheat)
2x | cult. | hulled, nonbrittle | T. monococcum | monococcum (cultivated einkorn wheat)
2x | wild | hulled, brittle | T. urartu | –
4x | wild | hulled, brittle | T. turgidum | dicoccoides (wild emmer wheat)
4x | cult. | hulled, nonbrittle | T. turgidum | dicoccon (cultivated emmer wheat), paleocolchicum
4x | cult. | naked, nonbrittle | T. turgidum | durum (durum), turgidum (pollard wheat), turanicum (Khorassan wheat), polonicum (Polish wheat), carthlicum (Persian wheat), isphahanicum
4x | wild | hulled, brittle | T. timopheevii | armeniacum (syn. araraticum)
4x | cult. | hulled, nonbrittle | T. timopheevii | timopheevii
6x | cult. | hulled, brittle | T. aestivum | macha, tibetanum (Tibetan wheat)
6x | cult. | hulled, partially brittle | T. aestivum | spelta (spelt), vavilovii, yunnanense (Yunnan wheat)
6x | cult. | naked, nonbrittle | T. aestivum | aestivum (bread wheat), compactum (club wheat), sphaerococcum (Indian dwarf wheat), petropavlovskyi (Chinese rice wheat)
6x | cult. | hulled, nonbrittle | T. zhukovskyi | –


Cytogenetic Structure 


Genomes within the Triticum–Aegilops alliance are
large, ranging from 4.1 pg (4024 Mb) of DNA per 1C
nucleus in Ae. tauschii (Arumuganathan and Earle,
1991) to 6–7 pg of DNA per 1C nucleus in a number
of other diploid species (Bennett, 1972; Arumuga- 
nathan and Earle, 1991). The 1C DNA content of 
T. aestivum was estimated to be 16.5 pg (15 966 Mb) 
(Arumuganathan and Earle, 1991) and 18 pg (Bennett, 
1972). The relative sizes of the three T. aestivum 
genomes are B > A > D (Figure 2). Repeated nucleo- 
tide sequences comprise a minimum of 83% of the 
T. aestivum genomes (Flavell et al., 1974). 
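The genome sizes above are quoted both as DNA mass (pg) and as sequence length (Mb), and the two can be cross-checked with a single conversion factor. The sketch below assumes a factor of about 965 Mb per picogram, which roughly reproduces the paired values quoted above; this factor is an assumption here (a value near 978 Mb per pg is also in common use), and the helper names are illustrative only.

```python
# Convert 1C nuclear DNA content between picograms (pg) and megabase pairs (Mb).
# Assumption: about 965 Mb per pg; other studies use about 978 Mb per pg.
MB_PER_PG = 965.0


def pg_to_mb(pg: float, mb_per_pg: float = MB_PER_PG) -> float:
    """Return the genome size in Mb for a DNA mass given in pg."""
    return pg * mb_per_pg


def mb_to_pg(mb: float, mb_per_pg: float = MB_PER_PG) -> float:
    """Return the DNA mass in pg for a genome size given in Mb."""
    return mb / mb_per_pg


# 1C sizes in Mb as quoted in the text (Arumuganathan and Earle, 1991)
for species, mb in [("Ae. tauschii", 4024), ("T. aestivum", 15966)]:
    print(f"{species}: {mb} Mb = {mb_to_pg(mb):.2f} pg")

# Mean size of one of the three T. aestivum genomes (ignores the B > A > D differences)
print(f"mean T. aestivum genome: {15966 / 3:.0f} Mb")
```

With this factor the T. aestivum value comes out at about 16.5 pg, matching the text, while the Ae. tauschii value comes out slightly above the quoted 4.1 pg.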

The basic chromosome number x is 7 in all genomes 
of the Triticum—Aegilops alliance. Accessory chromo- 
somes have been reported only in Aegilops mutica 
(syn. Amblyopyrum muticum) and Ae. speltoides. 

Triticum aestivum cultivar Chinese Spring has been 
shown to possess a ‘primitive’ chromosome structure 
and has been extensively used in genetic studies and 
the development of cytogenetic stocks. Its idiogram 
(Figure 2) was adopted as the wheat standard. 
The ability to replace a chromosome in one wheat 
genome by a chromosome from another wheat 
genome (as in the nullisomic-tetrasomic lines) led to 


the identification of related (homoeologous) chromo- 
somes in T. aestivum A, B, and D genomes (Figure 2). 
To reflect homoeologous relationships among wheat 
chromosomes, the 21 chromosomes of T. aestivum 
have been assigned to seven groups of three homoeo- 
logous chromosomes, one from each genome. For 
example, chromosomes 1A, 1B, and 1D are the A-, B- 
and D-genome chromosomes, respectively, of wheat 
homoeologous group 1 (Figure 2). Wheat homoeolo- 
gous groups are the standard for the assignment of 
homoeologous relationships among chromosomes 
across the tribe Triticeae. 
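The naming scheme just described (a group number followed by a genome letter) amounts to a 7 × 3 lookup table. This is a minimal sketch with hypothetical names; it encodes only the nominal grid of homoeologous groups and ignores the structural rearrangements that affect chromosomes 4A, 5A, and 7B.

```python
# Minimal sketch of the T. aestivum homoeologous-group naming scheme:
# 7 groups x 3 genomes (A, B, D) = 21 chromosomes per haploid set.
GENOMES = ("A", "B", "D")
N_GROUPS = 7  # basic chromosome number x = 7

# Map each homoeologous group number to its three member chromosomes.
HOMOEOLOGOUS_GROUPS = {
    group: [f"{group}{genome}" for genome in GENOMES]
    for group in range(1, N_GROUPS + 1)
}


def homoeologs(chromosome: str) -> list[str]:
    """Return the two homoeologs of a chromosome name such as '5B'."""
    group = int(chromosome[:-1])
    return [c for c in HOMOEOLOGOUS_GROUPS[group] if c != chromosome]


all_chromosomes = [c for members in HOMOEOLOGOUS_GROUPS.values() for c in members]
print(len(all_chromosomes))    # 21
print(HOMOEOLOGOUS_GROUPS[1])  # ['1A', '1B', '1D']
print(homoeologs("5B"))        # ['5A', '5D']
```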

The regions of wheat chromosomes containing 
constitutive heterochromatin can be revealed by the 
C-banding procedure followed by staining with 
Giemsa (heterochromatin stains dark). The pattern 
of C-bands facilitates unequivocal identification of 
each of the wheat chromosomes (Figure 2). Chromo- 
somes of the B genome are most heavily heterochro- 
matic (Figure 2) with the principal heterochromatic 
sequence being the Ag-satellite, based on the
trinucleotide motif (GAA)n. Wheat chromosome arms are
designated S for the short arm and L for the long arm.
(The designations p for short and q for long, which are
used in other genomes, were also applied to wheat
chromosome arms but did not find broad acceptance and
were abandoned after a few years.) The subdivision of wheat chromosome
arms into smaller segments based on C-banding is 
shown in Figure 2. 

The majority of Chinese Spring chromosomes
belong to a single homoeologous group. The exceptions
are the structurally rearranged chromosomes 4A, 5A,
and 7B. The first step in these rearrangements was the
fixation of a 4A-5A translocation in einkorn wheat
prior to the divergence of T. monococcum and T. urartu.
The remaining rearrangements were fixed during
the evolution of T. turgidum subsp. dicoccoides.
These involved a pericentric inversion in 4A, which
converted the long arm into the short arm, a paracentric
inversion in the 4AL arm, and a reciprocal translocation
between 7BS and the rearranged 4AL arm. Translocations
fixed during the evolution of T. turgidum
subsp. dicoccoides differ from those fixed during
the evolution of T. timopheevii subsp. armeniacum.
Although additional small terminal translocations and
inversions in the A and B genomes may have been
fixed during the evolution of the A, B, and D genomes
of T. aestivum and have escaped molecular detection,
the order of loci in wheat homoeologous chromosomes
is largely colinear.

Figure 2 Idiogram of the Triticum aestivum cultivar Chinese Spring complement. Chromosomes 1A to 7A,
1B to 7B, and 1D to 7D belong to the A, B, and D genomes, respectively. Chromosomes designated by the same Arabic
numeral are homoeologous. Chromosome arms are designated S for short and L for long. Except for chromosome 4A,
S arms are homoeologous with each other and L arms are homoeologous with each other in the seven homoeologous
groups. Each arm is divided into one to three regions, numbered 1 through 3, delineated by major C-bands (numbers to
the left of chromosomes). The regions are further subdivided into C-positive (solid rectangles) and C-negative (open
rectangles) bands. The first band in each region is a C-positive band. Bands within a region are numbered in the proximal
to distal direction. The position of major C-bands in a chromosome arm is characterized by the fraction length
(numbers to the right of chromosomes) of the arm (Endo and Gill, 1996). (Courtesy of B. S. Gill.)

Table 2 Genome and plasmon constitution of species in the Triticum–Aegilops alliance

Species | Plasmon(a) | Genome(b)
T. monococcum | A? | A^m
T. urartu | ? | A
Ae. speltoides | S, G, G? | S
Ae. searsii | S^v | S^s
Ae. bicornis | S^b | S^b
Ae. sharonensis | S^l | S^l
Ae. longissima | S^l? | S^l
Ae. uniaristata | N | N
Ae. comosa (incl. Ae. heldreichii) | M, M^h | M
Ae. caudata | C | C
Ae. umbellulata | U | U
Ae. tauschii | D | D
Ae. mutica (syn. Amblyopyrum muticum) | TI | T
T. turgidum | B | AB (B is related to S)
T. aestivum | B | ABD
T. timopheevii | G | AG (G is related to S)
T. zhukovskyi | G | AGA^m
Ae. cylindrica | D | DC
Ae. ventricosa | D | DN
Ae. crassa 4x | D? | D^cX
Ae. crassa 6x | D? | D^cXD
Ae. vavilovii | D? | D^cXS^s
Ae. juvenalis | D? | DXU
Ae. triuncialis | U, C? | UC
Ae. columnaris | U? | UY
Ae. neglecta 4x (syn. Ae. triaristata) | U | UY
Ae. neglecta 6x (syn. Ae. recta) | U | UYN
Ae. geniculata (syn. Ae. ovata) | M^o (T) | UM^o
Ae. biuncialis | U | UM?
Ae. kotschyi | S^v | US^l
Ae. peregrina (syn. Ae. variabilis) | S^v | US^l

(a) Superscripts indicate minor differentiation of a plasmon from the basic type (for more
details see Wang et al., 1997).
(b) Modified from Dvořák (1998).

The genomes in the Triticum–Aegilops alliance possess
either one or two pairs of nucleolus organizer
regions (NORs). In the chromosome complement of
T. aestivum, NORs are on chromosome arms 1AS 
(Nor9), 1BS (Nor1), 6BS (Nor2), and 5DS (Nor3). 
These multigene loci contain several hundred to sev- 
eral thousand repeated 18S-5.8S-26S rRNA gene units 
arranged in tandem and separated by nontranscribed 
spacers. Although additional minor loci were detected 
by in situ hybridization in wheat, they may not con- 
tain complete gene units and no evidence exists that 
they function as NORs. Also, comparative mapping 
in Triticeae showed that the multigene Nor loci and 
multigene loci encoding 5S rRNA have occasionally 
transposed into new locations during the evolution of 
wheat and other Triticeae genomes without perturb- 
ation of the colinearity of surrounding chromosome 
regions. The genetic mechanism of these transposition 
events is not known. 


Cytogenetic Stocks and their Use 


Aneuploids and Alien Addition and 
Substitution Lines 

Because of hexaploidy, T. aestivum is exceptionally
tolerant of aneuploidy, and sets of monosomics,
tetrasomics, ditelosomics, and double ditelosomics for
each of the 21 chromosomes of Chinese Spring have 
been developed (Sears, 1954; Kimber and Sears, 1968). 
Since limited polymorphism exists in Chinese Spring 
wheat, Chinese Spring aneuploid stocks are not abso- 
lutely isogenic. That fact must be considered in experi- 
mental designs and interpretation of results. The 
definitions and designations of the various types of 
wheat aneuploids and alien addition and substitution 
lines and structural chromosome variants are com- 
piled in Table 3. 

Monosomics hold a central position among wheat 
aneuploid stocks since they facilitate development of 
other cytogenetic stocks and have played a critical role 
in gene mapping in wheat. Selfed monosomics segre- 
gate for nullisomics in their progeny. Theoretically, 
half of the microspores or megaspores produced on a 
monosomic plant should receive the monosome and 
half should not. However, since the univalent is lost in 
about 50% of meioses, only about 25% of the micro- 
spores and megaspores acquire the monosome; about 
75% are nullisomic. Since nullisomic eggs are func- 
tional in T. aestivum, a monosomic produces about 


25% euploid eggs and 75% nullisomic eggs. However, 
the transmission of nullisomy via pollen is adversely 
affected by competition between monosomic and 
nullisomic pollen grains. Sears (1954) reported the
frequency of nullisomic progeny from selfed 
monosomics to range from 0.9% for monosomic 5D 
to 7.6% for monosomic 3B. From these frequencies, 
he inferred that only 4% of the pollen grains involved 
in fertilization are nullisomic while 96% are euploid. 
Hence, a selfed monosomic plant is, on average, 
expected to produce 24% euploid, 73% monosomic, 
and 3% nullisomic zygotes. 
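The arithmetic in this paragraph can be reproduced by letting eggs and pollen combine at random. The sketch below uses the rates quoted above: the univalent is lost in about half of meioses, so about 25% of eggs carry the monosome, and about 4% of functional pollen is nullisomic. Random union of gametes and the function names are assumptions of this sketch. It also inverts the model to show the pollen nullisomy implied by Sears' observed range of nullisomic frequencies.

```python
UNIVALENT_LOSS = 0.5                           # univalent lost in about half of meioses
EGG_TRANSMISSION = (1 - UNIVALENT_LOSS) * 0.5  # = 0.25 of eggs carry the monosome


def selfed_monosomic(egg_transmission: float = EGG_TRANSMISSION,
                     nullisomic_pollen: float = 0.04) -> dict[str, float]:
    """Expected zygote classes from selfing a wheat monosomic (2n = 41).

    egg_transmission: fraction of eggs carrying the monosome (n = 21).
    nullisomic_pollen: fraction of functional pollen lacking it (n - 1 = 20).
    Eggs and pollen are assumed to unite at random.
    """
    egg_n, egg_null = egg_transmission, 1 - egg_transmission
    pollen_n, pollen_null = 1 - nullisomic_pollen, nullisomic_pollen
    return {
        "euploid (42)": egg_n * pollen_n,
        "monosomic (41)": egg_n * pollen_null + egg_null * pollen_n,
        "nullisomic (40)": egg_null * pollen_null,
    }


print({k: round(v, 2) for k, v in selfed_monosomic().items()})
# {'euploid (42)': 0.24, 'monosomic (41)': 0.73, 'nullisomic (40)': 0.03}


def nullisomic_pollen_from(observed_nullisomics: float,
                           egg_transmission: float = EGG_TRANSMISSION) -> float:
    """Invert the model: pollen nullisomy implied by an observed nullisomic frequency."""
    return observed_nullisomics / (1 - egg_transmission)


# Range reported by Sears (1954): 0.9% (monosomic 5D) to 7.6% (monosomic 3B)
for observed in (0.009, 0.076):
    print(f"{observed:.3f} -> {nullisomic_pollen_from(observed):.3f}")
```

Under this model, the reported range corresponds to roughly 1% to 10% nullisomic pollen among successful pollen grains.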

Irregular disjunction of the univalent in mono- 
somics occasionally results in centromere misdivision, 
leading to the occurrence of telocentric chromosomes 
(telosomes) and isochromosomes (isosomes) in 
the progeny. Sears (1954) and Sears and Sears (1979) 
developed ditelosomic and double ditelosomic stocks 
for all 42 chromosome arms of T. aestivum cv. Chinese 
Spring and isosomic stocks for most of the arms. 
Monotelosomics can be produced by crossing ditelo- 
somics with corresponding monosomics. Monotelo- 
somics are viable and fertile and have the advantage 
over monosomics in that the aneuploid chromosome 
is identified easily with a microscope, which can be 
critical in crossing schemes in wheat. Monosomy has 
been transferred from Chinese Spring to other T. aes- 
tivum varieties by recurrent backcrossing, thereby 
facilitating chromosome manipulation in other 
genetic backgrounds. 

Some cultivars of T. aestivum (Chinese Spring 
being one of them) possess genes for high interspecific 
crossability, kr1 (5B), kr2 (5A), kr3 (5D), and kr4
(1A). Such wheat genotypes can be used to produce 
hybrids with virtually any species in the tribe Triticeae. 
By backcrossing amphiploids produced from inter- 
specific hybrids, individual chromosomes of Aegilops, 
rye, barley, Lophopyrum, Thinopyrum, and other spe- 
cies have been added to the wheat chromosome com- 
plement. These genetic stocks are called monosomic or 
disomic alien addition lines (Table 3). Alien addition 
lines can be used as the initial material in substituting 
an alien chromosome for a specific wheat homoeolog 
(alien substitution lines) (Table 3). 

Some wheat and alien chromosomes carry gametocidal
genes. An example of a wheat gametocidal
gene is the Pollen killer (Ki) locus on chromosome
arm 6BL. In the Ki/ki heterozygote, male gametophytes
not carrying the Ki allele are aborted; only
pollen grains having the Ki allele are able to function,
resulting in an extreme form of segregation distortion.
The breakage of wheat chromosomes, a
frequent result of the activity of gametocidal genes,
has been exploited in the development of wheat
deletion stocks. An example is the breakage of wheat
chromosomes in plants with the monosomic addition
of Ae. cylindrica chromosome 2C. Gametophytes lacking
the chromosome suffer chromosome breakage, and
the aberrant chromosomes are often transmitted to
progeny. Over 400 stocks with terminal deletions
have been isolated in the genetic background of Chinese
Spring, and the locations of the breakpoints on the
chromosome arms have been determined (Endo and Gill, 1996).
These deletion stocks are a powerful tool for gene
mapping in wheat.

Table 3 Types, designation, sporophytic chromosome numbers, and meiotic pairing configurations of wheat
aneuploids, alien addition and substitution lines, and structural variants

Name | Designation (group 1 and 2 chromosomes are used as examples) | Sporophytic chromosome no. | Meiotic pairing(a)
Disomic | D1A | 42 | 21″
Monosomic | M1A | 41 | 20″ + 1′
Nullisomic | N1A | 40 | 20″
Trisomic | Tri1A | 43 | 20″ + 1‴
Tetrasomic | T1A | 44 | 20″ + 1⁗
Nullisomic-tetrasomic | N1A-T1B | 42 | 19″ + 1⁗
Monotelosomic | Mt1AS | 40 + t | 20″ + t′
Ditelosomic | Dt1AS | 40 + 2t | 20″ + t″
Double monotelosomic | dMt1A | 40 + 2t | 20″ + t′ + t′
Double ditelosomic | dDt1A | 40 + 4t | 20″ + t″ + t″
Ditelo-monotelosomic | Dt1AS-Mt1AL | 40 + 3t | 20″ + t″ + t′
Monoisosomic | Mi1AS | 40 + i | 20″ + i′
Diisosomic | Di1AS | 40 + 2i | 20″ + i″
Monotelodisomic | MtD1AS | 41 + t | 20″ + t1″
Double monotelo-trisomic | dMtTri1A | 41 + 2t | 20″ + (t + t)1‴
Monoisodisomic | MiD1AS | 41 + i | 20″ + i1″
Double monoiso-trisomic | dMiTri1A | 41 + 2i | 20″ + (i + i)1‴
Double monosomic | dM1A-M2A | 40 | 19″ + 1′ + 1′
Double monotelo-disomic | dMtD1AS-MtD2AS | 40 + 2t | 19″ + t1″ + t1″
Monosomic addition | MA1R(b) | 43 | 21″ + 1′
Disomic addition | DA1R | 44 | 22″
Monotelosomic addition | MtA1RS | 42 + t | 21″ + t′
Ditelosomic addition | DtA1RS | 42 + 2t | 21″ + t″
Monosomic substitution | MS1R(1A) | 41 | 20″ + 1′
Disomic substitution | DS1R(1A) | 42 | 21″
Substitution double monosomic | SdM1R(1A) | 42 | 20″ + 1′ + 1′
Intervarietal disomic substitution | DS1ACnn(1ACS)(c) | 42 | 21″
Terminal translocation | T1AS·1AL-1BL(d, e) | 42 | 21″
Terminal translocation (explicit description) | T1AS·1AL1.4::1BL1.2 | 42 | 21″
Intercalary translocation | T1AS·1AL-1DL-1AL | 42 | 21″
Intercalary translocation (explicit description) | T1AS·1AL1.4::1DL1.2::1AL | 42 | 21″
Deletion | del1AS-1 | 42 | 21″

(a) ′, ″, ‴, and ⁗ indicate univalent, bivalent, trivalent, and quadrivalent, respectively. Arabic numerals indicate the number of
complete chromosomes present; t, a telosome; i, an isosome. Telosomes or isosomes for the opposite arms of a
chromosome in a single multivalent are placed in parentheses.
(b) Rye chromosome 1R.
(c) Cnn: T. aestivum cultivar Cheyenne; CS: cultivar Chinese Spring.
(d) Centromere is indicated by '·'.
(e) Translocation breakpoint is indicated by '::'.
Modified from Kimber and Sears (1968).

Compared to T. aestivum, tetraploid T. turgidum 
is far less tolerant of aneuploidy. T. turgidum mono- 
somics are difficult to produce, have poor vigor and 
fertility, and the monosomic state is poorly trans- 
mitted to progeny. Therefore, a set of disomic substi- 
tution lines of the D-genome chromosomes for their 
A- and B-genome homoeologs, developed in the
genetic background of durum variety Langdon, is 
used instead of monosomics in mapping studies at 
the tetraploid level. 


Gene Mapping with Monosomics 

Synteny mapping with T. aestivum monosomics (or 
monotelosomics) exploits the altered segregation ratio 
that characterizes progeny of a monosomic compared 
to a disomic. In the monosomic portion of progeny 
from a cross between a monosomic female and euploid 
male, the monosome is contributed by the male parent.
If the male has a recessive allele on the chromosome,
the monosomic F1 will express the recessive
allele. In practice, a homozygous recessive (aa) male
is crossed with each of the 21 possible monosomics.
Only one of the 21 monosomic F1 progenies will express the recessive
phenotype, indicating the chromosome on which the
locus is located. The entire F2 progeny derived from
that F1 will show the recessive phenotype, provided that
the phenotype of the rare nullisomics is the same as that of
the hemizygous (a-) or homozygous (aa) plants. If it is
not, the F2 progeny will not be uniform. Nevertheless,
even in this instance, the F2 progeny will show a vast
excess of recessive phenotypes.
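The screening logic described above can be written as a short loop over the 21 monosomic lines. This sketch assumes, hypothetically, that every monosomic stock carries the dominant allele A at the locus and that the male parent is aa. Only the line monosomic for the chromosome carrying the locus gives a hemizygous (a-) F1 that shows the recessive phenotype. All names are illustrative.

```python
# Hypothetical screen: an aa male crossed to each of the 21 monosomic lines,
# all assumed to carry the dominant allele A at the locus being mapped.
CHROMOSOMES = [f"{group}{genome}" for group in range(1, 8) for genome in "ABD"]


def monosomic_f1_genotype(monosomic_for: str, locus_chromosome: str) -> str:
    """Genotype at the locus of the monosomic F1 ('-' marks the missing homolog)."""
    if monosomic_for == locus_chromosome:
        # The nullisomic egg lacked this chromosome, so only the male's a allele is present.
        return "a-"
    # A different chromosome is monosomic; the locus itself is disomic and heterozygous.
    return "Aa"


def phenotype(genotype: str) -> str:
    return "dominant" if "A" in genotype else "recessive"


def critical_lines(locus_chromosome: str) -> list[str]:
    """Monosomic lines whose F1 shows the recessive phenotype."""
    return [c for c in CHROMOSOMES
            if phenotype(monosomic_f1_genotype(c, locus_chromosome)) == "recessive"]


print(critical_lines("3B"))  # ['3B']
```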

If the genotypes of the parents are reversed, the male 
parent being homozygous for the dominant allele (AA) 
and the monosomic female hemizygous for the recessive
allele (a-), the hemizygous (A-) monosomic F1
progeny from the cross usually expresses the dominant
phenotype and is phenotypically similar to the
disomic (AA). In the F2 generation, most plants are
either homozygous AA or hemizygous A-. Since
nullisomics are usually rare, segregation in the F2
generation is greatly distorted in favor of the dominant class.
In synteny mapping schemes, a homozygous dominant
(AA) male is crossed with the entire set of 21
possible monosomics. All F1 monosomic progenies
from these crosses are usually phenotypically identical.
In the F2 generation, 20 progenies will segregate
in the classical monohybrid phenotypic 3:1 ratio, but
one will show an excess of dominant phenotypes,
indicating the syntenic group in which the locus
resides. In some crosses, the superiority of the euploid
pollen over the nullisomic pollen (see 'Aneuploids and
alien addition and substitution lines') is weak. If the
frequency of nullisomics approaches 0.25 and the phenotype
of the nullisomic (- -) is the same as that of the recessive
homozygote (aa), it may be wrongly concluded that a normal
Mendelian 3 A-:1 aa segregation has occurred.
Meiotic examination of the recessive class in each
cross is needed in such cases. All recessive plants in the
critical progeny will be found to be nullisomic. This
outcome identifies the syntenic group in which the
locus resides.
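The F2 expectations in the dominant-allele scheme, and the pitfall of weak pollen competition, follow from the gamete frequencies given earlier for selfed monosomics. The sketch below assumes random union of gametes, 25% transmission of the monosome through eggs, and nullisomics (--) that look like the recessive class. The nullisomic pollen fraction is a parameter: 0.04 matches Sears' estimate, and about 1/3 illustrates a cross in which euploid pollen has little advantage. Function names are hypothetical.

```python
from collections import Counter
from itertools import product


def gametes(f1: str, via_egg: bool, egg_transmission: float = 0.25,
            nullisomic_pollen: float = 0.04) -> Counter:
    """Gamete frequencies at one locus; '-' stands for the absent chromosome."""
    freqs = Counter()
    if "-" in f1:  # monosomic F1 such as "A-": unequal transmission
        allele = f1.replace("-", "")
        carried = egg_transmission if via_egg else 1 - nullisomic_pollen
        freqs[allele] += carried
        freqs["-"] += 1 - carried
    else:  # disomic F1 such as "Aa": Mendelian 1:1
        for allele in f1:
            freqs[allele] += 0.5
    return freqs


def dominant_fraction(f1: str, nullisomic_pollen: float = 0.04) -> float:
    """Share of F2 plants with the dominant phenotype (nullisomics count as recessive)."""
    eggs = gametes(f1, True)
    pollen = gametes(f1, False, nullisomic_pollen=nullisomic_pollen)
    return sum(pe * pp
               for (e, pe), (p, pp) in product(eggs.items(), pollen.items())
               if "A" in e + p)


print(round(dominant_fraction("Aa"), 2))         # 0.75  (20 noncritical lines)
print(round(dominant_fraction("A-"), 2))         # 0.97  (critical line)
print(round(dominant_fraction("A-", 1 / 3), 2))  # 0.75  (weak pollen competition)
```

The last line shows why cytology is needed. When a third of the functional pollen is nullisomic, the critical line gives exactly 25% nullisomics and is indistinguishable from an ordinary 3:1 segregation by phenotype alone.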


Intervarietal Substitution Lines 

In an intervarietal disomic substitution (DS) line
(Table 3), a single chromosome pair from wheat
variety A (chromosome donor) is substituted for the
homologous pair in wheat variety B (chromosome
recipient). To develop an intervarietal disomic
substitution line, variety A is crossed as a male with variety
B monosomic for the targeted chromosome. The
monosomic F1 is selected cytologically. The monosome,
which is contributed by the donor variety, cannot
recombine because of the absence of a homolog.
The F1 monosomic is backcrossed to the monosomic
of line B, and again a monosomic is selected in the
progeny. This backcross is repeated six or more times,
thereby producing a monosomic intervarietal substi- 
tution of a specific chromosome of line A in the 
genetic background of line B. An intervarietal disomic 
substitution line is then produced by selfing. Any 
wheat genotype can be used as a source of a chromo- 
some but the choice of a chromosome recipient is 
limited by the availability of a set of monosomics in 
the specific genetic background. 
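How quickly repeated backcrossing restores the recipient background can be estimated with the standard expectation that each backcross halves the remaining donor contribution. The sketch assumes no linkage drag and no selection on the background. It applies only to the chromosomes other than the substituted one, which is carried as an intact monosome. The function name and the counting convention (BC1 = the first cross of the F1 to line B) are assumptions.

```python
from fractions import Fraction


def recipient_share(backcrosses: int) -> Fraction:
    """Expected fraction of the background (outside the substituted chromosome)
    derived from recipient variety B after a given number of backcrosses to B.

    The F1 is 1/2 B; each backcross to B halves the remaining donor share.
    Linkage and selection are ignored.
    """
    return 1 - Fraction(1, 2) ** (backcrosses + 1)


for n in (1, 3, 6, 7):
    share = recipient_share(n)
    print(f"BC{n}: {share} = {float(share):.4f}")
```

Under these assumptions, six backcrosses leave about 99.2% (127/128) of the background from the recipient variety.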

Alien disomic substitution lines, such as the D- 
genome disomic substitution lines in the Langdon 
genetic background, can be used instead of mono- 
somics in the production of intervarietal DS lines. In 
this scenario, an alien disomic substitution line is used 
as a recurrent parent instead of a monosomic. In 
each backcross generation, the male parent is mono- 
somic for two homoeologous chromosomes (double 
monosomic). The absence of recombination between 
homoeologous monosomes ensures that an intact 
wheat chromosome is ultimately substituted. 

Intervarietal disomic substitution lines partition 
the genome of a donor variety into individual chromo- 
somes in the nearly isogenic background of a recipient 
variety and thus provide a powerful tool for gene 
mapping. If the genetic background of the recipient 
variety has been fully restored by backcrossing, all 
phenotypic differences between a disomic substitution
line and the recipient variety are due to the activity
of genes located on the substituted chromosome.

A locus placed into a syntenic group by analysis of 
disomic substitution lines can be further mapped by 
employing disomic recombinant substitution lines 
(disomic RSLs) for that specific chromosome. A dis- 
omic RSL is a line in which a single pair of recombined 
chromosomes is substituted into the genetic back- 
ground of a recipient variety. In a population of di- 
somic RSLs, each line is homozygous for a pair of 
recombined chromosomes. To develop a disomic 
RSL mapping population, a line with disomic substi- 
tution of a specific chromosome of line A in the 
genetic background of line B is crossed with line B. 
The F1 progeny are crossed as males with the corresponding


monosomic of line B. Monosomic progeny harbor 
recombined (A/B) monosomes. Homozygous di- 
somic RSLs are produced by selfing. Because of iso- 
genicity of the genetic background, populations of 
disomic RSLs can be used for mapping of genes with 
minor effects and genes affecting quantitative traits. 
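Each disomic RSL traces back to a single recombined monosome transmitted through the pollen of an F1 plant. The recombination fraction between two loci on the substituted chromosome can therefore be estimated directly as the proportion of lines carrying a recombinant (A/B) chromosome. The sketch assumes that the donor (A) or recipient (B) allele can be scored at each locus. The genotype list is invented for illustration and is not data.

```python
import math


def recombination_fraction(lines: list[tuple[str, str]]) -> tuple[float, float]:
    """Estimate the recombination fraction r, and its standard error, from RSLs.

    Each entry gives the parental origin ('A' donor, 'B' recipient) of a line's
    substituted chromosome at locus 1 and locus 2. Each line samples one
    meiosis, so r is the recombinant proportion (binomial standard error).
    """
    n = len(lines)
    recombinants = sum(1 for first, second in lines if first != second)
    r = recombinants / n
    return r, math.sqrt(r * (1 - r) / n)


# Invented scores for ten hypothetical RSLs (illustration only)
rsls = [("A", "A"), ("A", "B"), ("B", "B"), ("B", "B"), ("A", "A"),
        ("B", "A"), ("A", "A"), ("B", "B"), ("A", "A"), ("B", "B")]
r, se = recombination_fraction(rsls)
print(f"r = {r:.2f} +/- {se:.2f}")  # r = 0.20 +/- 0.13
```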


Genetic Transmission 


All polyploid species in the Triticum—Aegilops alliance 
are allopolyploid. Their chromosome complements 
are composed of either two or three pairs of related 
genomes. Artificial allopolyploids invariably show 
some heterogenetic chromosome pairing (pairing 
between homoeologous chromosomes at meiosis I). 
In marked contrast, chromosomes pair only homogen- 
etically (only homologs pair) in natural polyploids. 
Because of this, natural allopolyploids show strictly 
disomic inheritance. Studies of aneuploids and aneu- 
ploid interspecific hybrids showed that heterogenetic 
chromosome pairing in wheat is prevented by the 
activity of a completely dominant gene, Ph1 (pairing 
homoeologous), on the long arm of chromosome 5B. 
If the locus is absent, heterogenetic chromosome pair- 
ing occurs. Additional suppressors of heterogenetic 
pairing have been detected in the A and D genomes. 
Of these weaker loci, the best characterized is the Ph2 
locus on the short arm of chromosome 3D. While Ph2 
has been found to exist in Ae. tauschii, a diploid, where 
it plays an unknown role, the evolutionary source of 
Ph1 is currently unknown, since no diploid species has 
so far been found to compensate fully for the absence 
of Ph1 in Ph1-deficient interspecific hybrids. Interest- 
ingly, accessory chromosomes of Ae. speltoides and 
Ae. mutica exert a pairing effect on homoeologous 
chromosomes similar to that of Ph1. 

Genotypes of virtually all polyploid Aegilops spe- 
cies were found to suppress pairing of homoeologous 
chromosomes to some degree, indicating that disomic 
inheritance in these species is a result of genetic sup- 
pression of heterogenetic chromosome pairing. Since 
none of these species possesses the Ph1 gene, genetic
suppression of heterogenetic chromosome pairing in 
Aegilops must employ different genes. 

The suppression of heterogenetic chromosome 
pairing by Ph1, Ph2, and other loci is opposed by a 
number of genes that either promote heterogenetic 
chromosome pairing or inhibit the expression of sup- 
pressors. Ae. speltoides and Ae. mutica are polymorphic
for major genes inhibiting Ph1 activity.

The mechanism by which Ph1 and other genes with
a similar function regulate heterogenetic chromosome 
pairing is currently unknown. It has been suggested 
that Ph1 regulates premeiotic association of chromo- 
somes and that premeiotic associations, mediated 
by centromere-spindle interactions, predetermine the
meiotic pairing pattern. Studies of recombination
between chromosomes composed of homologous and 
homoeologous segments showed that Ph1 precludes 
recombination in homoeologous segments even if the 
centromere and telomere are simultaneously homo- 
logous, which is inconsistent with these hypotheses. 
The activity of Ph1 also affects meiosis I pairing of
homologous chromosomes by detecting heterozygosity
in homologous chromosome pairs in wheat intervarietal
F1 hybrids. Partial suppression of crossovers in
F1 plants from intervarietal crosses reduces the regularity
of chromosome disjunction and results in the presence
of aneuploids in the F2 and early selfing generations.


Mating Systems 


All species in the Triticum–Aegilops alliance, except
for Ae. mutica and Ae. speltoides, are naturally self- 
pollinating. The outcrossing rates vary among the 
self-pollinating species. In wheat, the outcrossing 
rate is typically about 1% in field conditions. 


Recombination between Homoeologous 
Chromosomes 


The Ph1 locus prevents recombination and meiosis I 
pairing between homoeologous chromosomes not 
only in tetraploid and hexaploid wheat but also in 
wheat haploids and interspecific hybrids. As a result, 
Ph1 represents a potent barrier for the introgression of 
genes from related species into wheat. The initial strat- 
egy to incorporate alien genes into wheat chromo- 
somes relied on the production of translocations 
between alien and wheat chromosomes by irradiation. 
The elucidation of the genetic control of heterogenetic 
chromosome pairing in wheat facilitated the devel- 
opment of techniques for introgression of alien genes 
by recombination between homoeologous chromo- 
somes allowed by nullisomy for chromosome 5B or 
by homozygosity or hemizygosity for a recessive 
mutation of Ph1. Several ph1 mutations exist in
T. aestivum; the ph1b deletion mutation has been
most extensively used. Only one mutant, ph1c, exists
in T. turgidum.


Genetic Mapping 


Mapping of traits in wheat has to a large extent relied 
on natural variation since induced mutations are 
rare in polyploid wheats. Mutations induced by ioniz- 
ing radiation are most often large deletions. Many 
genes controlling isozymes, disease resistance, envir- 
onmental stress tolerance, morphological traits, and 
other types of genetic markers have been placed into 
syntenic groups by monosomic, nullisomic-tetrasomic,
and ditelosomic analyses, and analyses of alien
disomic addition and alien and intervarietal disomic 
substitution lines. A compilation of mapped wheat 
genes can be accessed in the Wheat Gene Catalog in 
GrainGenes (http://wheat.pw.usda.gov).

Linkage maps employing RFLP and simple se- 
quence repeat (SSR) markers have been developed 
for T. aestivum and T. turgidum and many of these 
maps have been compiled in GrainGenes
(http://wheat.pw.usda.gov). SSR markers, most based on
dinucleotide motifs, are highly polymorphic in 
T. aestivum, and most primer sets amplify DNA 
from only a single genome. 

Extensive deletion maps have been constructed 
for all 21 chromosomes of common wheat. Deletion 
mapping is an efficient means of placing molecular 
markers into bins delineated by the breakpoints of ter- 
minal deletions and is the backbone of the expressed 
sequence tag (EST) mapping in wheat. 
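The logic of assigning a marker to a deletion bin can be sketched in a few lines. This is a minimal illustration, not a published mapping pipeline: it assumes that each deletion line carries a single terminal deletion on one chromosome arm, that breakpoints are expressed as a fraction length (FL, 0 at the centromere and 1 at the telomere), and that the marker is scored simply as present or absent in each line. The function name, line names, and FL values are hypothetical.

```python
from typing import Dict, Tuple


def assign_deletion_bin(
    breakpoints: Dict[str, float], present: Dict[str, bool]
) -> Tuple[float, float]:
    """Return the (proximal, distal) FL bounds of the bin containing a marker.

    breakpoints: deletion line -> FL of its breakpoint on one arm
                 (assumed scale: 0 = centromere, 1 = telomere).
    present:     deletion line -> True if the marker is still detected.
    """
    proximal, distal = 0.0, 1.0
    for line, fl in breakpoints.items():
        if present[line]:
            # Marker survived the terminal deletion: it lies proximal to the breakpoint.
            distal = min(distal, fl)
        else:
            # Marker was lost with the distal segment: it lies beyond the breakpoint.
            proximal = max(proximal, fl)
    if proximal >= distal:
        raise ValueError("pattern is inconsistent with nested terminal deletions")
    return proximal, distal


# Hypothetical deletion lines for one chromosome arm.
lines = {"del-1": 0.30, "del-2": 0.60, "del-3": 0.85}
scores = {"del-1": False, "del-2": True, "del-3": True}
print(assign_deletion_bin(lines, scores))  # (0.3, 0.6)
```

A marker detected in every line falls in the most proximal bin, and one missing from every line falls in the most distal bin; each added breakpoint subdivides the arm further.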

In wheat genomes, over 30% of loci are duplicated 
or multiplicated. That fact must be considered in com- 
parative mapping and other uses of wheat genetic 
maps. 

A notable characteristic of the wheat linkage maps 
is their great distortion relative to physical-type maps 
(such as the deletion maps) employing the same mark- 
ers. These distortions reflect the fact that crossovers 
are preferentially localized at the ends of the chromo- 
somes while large proximal regions of chromosomes 
are largely devoid of crossovers. These proximal 
euchromatic regions of the wheat chromosomes also 
tend to be poor in gene content. Genes tend to be 
clustered in gene-rich islands in wheat chromosomes. 
The locations of these islands are currently being 
investigated. 
A ‘trivial equilibrium’ in evolutionary biology generally
refers to an equilibrium state in which a population
is monomorphic, that is, has only one version (allele)
of a certain gene. This state is also called a ‘fixation
equilibrium’ or ‘boundary equilibrium.’ When there
are only two alleles present at a particular genetic
locus, the stability of the equilibrium at which the
frequency of one of these two alleles is 0 determines
whether or not that allele can be maintained in the
population. If the fixation equilibrium at which the
frequency of allele A, for instance, is 0 is unstable,
then the frequency of allele A will always move away
from 0, and thus increase when it is rare, thereby
ensuring that it can never be eliminated as long as
the current conditions prevail. If such an equilibrium
is locally stable, however, the frequency of allele A
will approach 0, and the allele will be lost, whenever
its frequency starts or becomes sufficiently low.
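For the simplest case, constant genotype fitnesses at one diallelic locus under random mating, the stability of a fixation equilibrium can be checked by linearizing the standard selection recursion. The notation here (fitnesses $w_{AA}$, $w_{Aa}$, $w_{aa}$; frequency $p$ of allele A and $q = 1 - p$ of allele a) is introduced for illustration and is not taken from this entry:

$$
p' = \frac{p\,(p\,w_{AA} + q\,w_{Aa})}{\bar w}, \qquad
\bar w = p^2 w_{AA} + 2pq\,w_{Aa} + q^2 w_{aa}.
$$

Near $p = 0$, $\bar w \approx w_{aa}$ and $p\,w_{AA} + q\,w_{Aa} \approx w_{Aa}$, so

$$
p' \approx \frac{w_{Aa}}{w_{aa}}\,p .
$$

The fixation equilibrium $p = 0$ is therefore unstable when $w_{Aa} > w_{aa}$ and locally stable when $w_{Aa} < w_{aa}$. By the same argument, $q' \approx (w_{Aa}/w_{AA})\,q$ near $p = 1$, so both fixation equilibria are unstable exactly when $w_{Aa} > w_{AA}$ and $w_{Aa} > w_{aa}$, that is, under heterozygote advantage.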

The stability of the two trivial equilibria, or fixation 
states, where only one of two alleles is present, can 
also give useful insight into the conditions under 
which both alleles (and thus genetic variation) are 
maintained in the population. In particular, when fix- 
ation is unstable for both alleles at a diallelic genetic 
locus (when either allele has a frequency of 0), then 
neither allele can be lost, and we say we have a
‘protected polymorphism.’ A protected polymorphism
thus ensures that permanent genetic variation will be
maintained in the population. Because of this useful
property, researchers analyzing more complicated
population genetic models, where it is difficult if not
impossible to derive the full analytic conditions for
the maintenance of genetic variation, often rely
instead on determining the conditions for a protected
polymorphism.

Figure 1 Sample allele frequency trajectories as a function of time in generations under a form of frequency-dependent selection for various initial allele frequencies (p0). Left: fitness conditions giving an unstable-stable-unstable (USU) equilibrium pattern with a protected polymorphism; right: a stable-unstable-stable-unstable-stable (SUSUS) equilibrium pattern without a protected polymorphism. The first and last entries of ‘unstable’ (U) or ‘stable’ (S) refer to the stability of the two fixation equilibria with allele frequencies of 0 and 1. The intermediate entries refer to the stability of the internal, polymorphic equilibria with allele frequencies greater than 0 and less than 1. The fitness parameters (a, b, c, d) give the values of the pairwise fitnesses for the symmetric pairwise interaction model defined in Table 5 of Asmussen and Basnayake (1990).

In interpreting protected polymorphism results, 
however, it should be realized that a protected poly- 
morphism is only sufficient, and not always necessary, 
to retain a population’s existing allelic variation. This 
is because it is sometimes possible to have a situation 
where an internal, polymorphic equilibrium is stable 
along with one or both of the two trivial equilibria. 
Such a combination of simultaneously stable internal 
and fixation equilibria occasionally arises in more com- 
plicated models of natural selection, such as frequency- 
dependent selection, where the fitnesses of the various 
genotypes vary with the changing genetic composition 
of the population in which they are found. 

In such cases the evolutionary fate of existing 
genetic variation depends on the initial genetic com- 
position of the population; from some initial allelic 
frequencies (e.g., those near the stable fixation equili- 
brium) the population will converge to the stable fix- 
ation equilibrium and lose all genetic variation, while 
from other initial frequencies it will converge to a 
stable internal equilibrium and maintain that vari- 
ation. Genetic variation can thus be maintained in 
such populations under certain initial conditions, 
even in the absence of a protected polymorphism. 
In these populations, the conditions for a protected
polymorphism underestimate the full conditions
under which allelic variation is maintained.

Figure 1 illustrates this type of situation in the
symmetric pairwise interaction model of frequency-dependent
selection introduced by Asmussen and
Basnayake (1990). In this model both alleles are maintained
via a protected polymorphism whenever the
fitnesses in like × like interactions (homozygote ×
like homozygote) are lower than those in like × unlike
interactions (homozygote × heterozygote). Given the
right initial conditions, however, both alleles can also
be maintained in the population when fitnesses in
homozygote × like homozygote interactions are less
than a weighted average of the fitnesses in homozygote
× unlike homozygote and heterozygote × heterozygote
interactions, with weights of 1 and 2, respectively.
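One way to see where these two conditions come from is sketched below. The parameterization is an assumption made for illustration: it reproduces the conditions just stated, but it may differ in detail from Table 5 of Asmussen and Basnayake (1990). Let $a$, $b$, $c$, and $d$ be the fitnesses in homozygote × like homozygote, homozygote × heterozygote (taken to be the same for both partners), homozygote × unlike homozygote, and heterozygote × heterozygote interactions. Let genotypes AA, Aa, aa occur in Hardy-Weinberg proportions $x = p^2$, $y = 2pq$, $z = q^2$, and let each genotype's fitness be its average over random pairwise encounters:

$$
\begin{aligned}
w_{AA} &= a x + b y + c z, \\
w_{Aa} &= b\,(x + z) + d y, \\
w_{aa} &= c x + b y + a z, \\
\Delta p &= \frac{pq\,(w_A - w_a)}{\bar w}, \qquad
w_A = p\,w_{AA} + q\,w_{Aa}, \quad w_a = p\,w_{Aa} + q\,w_{aa}.
\end{aligned}
$$

As $p \to 0$, $w_A - w_a \to b - a$, so rare A increases exactly when $b > a$; by the A ↔ a symmetry of the model the same holds for rare a, which gives the protected-polymorphism condition. At $p = 1/2$, $w_{AA} = w_{aa}$, so $p = 1/2$ is always an equilibrium, and

$$
\left.\frac{d}{dp}\,(w_A - w_a)\right|_{p = 1/2} = \frac{3a - c - 2d}{2},
$$

which is negative, making $p = 1/2$ locally stable (provided selection is not so strong that the approach overshoots), exactly when $a < (c + 2d)/3$.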

In the left panel of Figure 1, the genotypic fitnesses
in the pairwise interactions among the three genotypes 
are such that there is a protected polymorphism; fix- 
ation for both alleles is unstable, and there is a single, 
stable polymorphic equilibrium, with each allele hav- 
ing a frequency of 0.5, to which the population con- 
verges from all initially polymorphic states. Genetic 
variation is thus always maintained under these con- 
ditions. Genetic variation can also be maintained in 
this model under the selection conditions shown in the 
right panel, however, where both fixation equilibria 
are actually stable. Here there is most definitely not a 
protected polymorphism, since when either allele is 
rare it is lost from the population; however, a perman- 
ent polymorphism may still be reached, since for 
intermediate initial frequencies the population conver- 
ges over time to a stable internal equilibrium at which 
both alleles are maintained at a frequency of 0.5. The 
fitness conditions giving a protected polymorphism
thus underestimate the potential for permanent genetic 
variation under this type of (frequency-dependent) 
selection. 
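The contrast between the two panels can be mimicked by iterating a one-locus recursion. The sketch below rests on stated assumptions: random mating (Hardy-Weinberg genotype proportions every generation), a pairwise fitness matrix with entries a (homozygote × like homozygote), b (homozygote × heterozygote, the same for both partners), c (homozygote × unlike homozygote), and d (heterozygote × heterozygote), and genotype fitnesses averaged over random encounters. This parameterization may differ in detail from Asmussen and Basnayake (1990), and the two parameter sets are hypothetical values chosen to produce USU and SUSUS patterns, not the values used for Figure 1.

```python
from typing import List, Tuple

Params = Tuple[float, float, float, float]  # (a, b, c, d) as described above


def next_p(p: float, params: Params) -> float:
    """One generation of frequency-dependent selection in the assumed pairwise model."""
    a, b, c, d = params
    q = 1.0 - p
    x, y, z = p * p, 2.0 * p * q, q * q  # Hardy-Weinberg frequencies of AA, Aa, aa
    # Each genotype's fitness is its average over random pairwise encounters.
    w_AA = a * x + b * y + c * z
    w_Aa = b * (x + z) + d * y
    w_aa = c * x + b * y + a * z
    w_bar = x * w_AA + y * w_Aa + z * w_aa
    # Frequency of A among gametes contributed by the selected parents.
    return (x * w_AA + 0.5 * y * w_Aa) / w_bar


def trajectory(p0: float, params: Params, generations: int) -> List[float]:
    """Frequency of allele A in each generation, starting from p0."""
    ps = [p0]
    for _ in range(generations):
        ps.append(next_p(ps[-1], params))
    return ps


# Hypothetical parameter sets (a, b, c, d).
USU = (1.0, 1.1, 1.1, 1.1)    # b > a: both fixations unstable (protected polymorphism)
SUSUS = (1.0, 0.9, 1.2, 1.2)  # b < a: both fixations stable, yet a < (c + 2d) / 3

for name, params in (("USU", USU), ("SUSUS", SUSUS)):
    finals = [round(trajectory(p0, params, 2000)[-1], 4) for p0 in (0.05, 0.3, 0.7, 0.95)]
    print(name, finals)
```

With these assumed values, a population starting with A rare (p0 = 0.05) converges to 0.5 under the USU set but loses A under the SUSUS set, whereas intermediate starting frequencies (0.3 or 0.7) converge to 0.5 under both. This is the pattern described for the two panels.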
The individualization of the trophectoderm from the
inner cell mass during the late morula stage is the first 
differentiation event of the developing mammalian 
embryo. Trophoblast cells are derived from the tro- 
phectoderm and mediate embryonic implantation, the 
process by which an embryo initiates and maintains 
contact with the maternal uterine epithelia or stroma. 
Trophoblast precursors are diploid, mononuclear 
stem cells that differentiate to form nonproliferating 
trophoblast and giant cells. Giant cells are charac- 
terized by their large size, increased phagocytosis, 
and the endoreduplication of their DNA. In humans, 
giant cells fuse with each other to form multinucleate 
syncytiotrophoblasts. 

Implantation begins with the apposition of the tro- 
phectoderm to the uterine epithelia and continues 
with the formation of adhesive contacts between 
them. In some mammals, adhesion may be followed 
by the invasion of the uterine epithelium by the tro- 
phoblast cells during the formation of the placenta. 
There are two main types of placentation, epithelio- 
chorial and hemochorial. In the epitheliochorial, or 
yolk sac placenta, the uterine epithelia and trophoblast 
maintain intimate contacts through the interdigitation 
of microvilli without invasion of the uterine epithelial 
layer by the trophoblast. Nutrient, waste, and gas 
exchange occurs across these cell layers. In the hemochorial,
or chorioallantoic, placenta, the trophoblast
invades the uterine epithelial cell layer, either by
intruding into the uterine stroma or by displacing
the uterine epithelium.

Most ungulates, such as pigs, cattle, and sheep,
have an epitheliochorial placenta. The attachment of a 
blastocyst is followed by adhesion and the formation 
of microvilli which interdigitate the trophoblast cells 
with the uterine epithelium. The trophoblast, or 
chorion, expands to cover the entire surface of the 
uterine epithelia surrounding the implantation site, 
and the maternal and fetal blood vessels form beside 
the uterine epithelia and trophoblast cell layers, 
respectively. Although the distance between the
vessels may be as little as 2 µm, the integrity of the uterine
epithelial cell layer is not breached by the trophoblast.
There may be specialized areas of the epitheliochorial 
placenta for the exchange of gases or other nutrients. 
The majority of marsupials also exhibit epitheliochor- 
ial placentation, even though the gestation period of 
marsupials is short compared with other mammals, 
and most development of the offspring is supported 
by lactation after birth, rather than in utero by 
placentation. 

In rodents and humans a hemochorial placenta 
forms when the trophoblast invades the uterine stro- 
ma. In the rat and mouse, the mural trophectoderm 
cells that are not in contact with the inner cell mass 
attach to the uterine epithelia. They become highly 
phagocytic and engulf the apoptotic uterine epi- 
thelial cell layer as they invade into the uterine 
stroma. Invasion of the stroma is facilitated by the 
production of serine, matrix metallo- and cysteine 
proteinases by the trophoblast cells. After the invasion 
of the stroma, the mural trophectoderm cells differ- 
entiate to form the primary trophoblast giant cell 
layer. The polar trophectoderm cells remain as diploid 
stem cells that continue to proliferate and form 
the ectoplacental cone. Some of these cells will 
become giant cells, while others become the extraem- 
bryonic ectoderm, which ultimately forms the chor- 
ion of the placenta. 

Upon terminal differentiation to giant cells, the 
trophoblast cells expand in size and endoredupli- 
cate their DNA. During endoreduplication multiple 
rounds of DNA replication occur without intervening 
mitosis, resulting in the accumulation of extra chro- 
matids. In murine trophoblast giant cells, the down- 
regulation of a transcription factor called Snail may 
promote endoreduplication. 
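The arithmetic of endoreduplication is repeated doubling. The sketch below assumes that every endocycle replicates the whole genome exactly once and that the cell starts as a diploid G1 nucleus with 2C DNA content (C being the haploid DNA content). Real giant cells need not meet either assumption exactly, and the function names are hypothetical.

```python
import math


def dna_content_after(rounds: int, start_c: float = 2.0) -> float:
    """DNA content, in C units, after `rounds` S phases with no intervening mitosis."""
    if rounds < 0:
        raise ValueError("rounds must be non-negative")
    return start_c * 2 ** rounds  # each complete round of replication doubles the DNA


def rounds_to_reach(target_c: float, start_c: float = 2.0) -> int:
    """Smallest number of complete endocycles that gives at least `target_c`."""
    if target_c <= start_c:
        return 0
    return math.ceil(math.log2(target_c / start_c))


for n in (1, 3, 9):
    print(f"{n} round(s) -> {dna_content_after(n)}C")
```

Because content grows geometrically, a nucleus reaches very high DNA content after only a handful of endocycles.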

The trophoblast cells produce hormones, like pla- 
cental lactogen, that help to maintain pregnancy, and 
factors that promote maternal angiogenesis. Before
the placenta forms, the embryonic trophoblast cells
directly contact maternal blood sinuses and, in
combination with Reichert’s membrane and the yolk
sac, serve as a primitive placenta that enables nutrient
and gas exchange. The definitive placenta forms when
the allantois, which is derived from the embryonic
mesoderm, attaches to the chorion. Fetal blood vessels,
including the umbilical cord vessels, form from
this mesoderm and invade the chorion and labyrinthine
layers of the placenta, where nutrient and
gas exchange occur.

After the mouse placenta forms, the trophoblast 
giant cells and diploid trophoblast cells are found in 
the labyrinthine and spongiotrophoblast layers. In the 
labyrinth, the trophoblast cells will contact the fetal 
blood vessels and maternal blood sinuses, whereas 
in the spongiotrophoblast layer the maternal blood 
sinuses and stromal cells are in close apposition. In
humans, the cytotrophoblast cells invade and fuse with
the maternal spiral arterioles, coming into direct contact
with maternal blood, whereas the syncytiotrophoblast
layer is bathed in maternal blood. The embryo proper 
does not come in direct contact with maternal blood. 
The trophoblast giant cell layer isolates the embryo 
early in pregnancy, and maintains a barrier between 
fetal and maternal blood vessels after the placenta 
forms. 

Female mammals inherit one copy of the X chromo- 
some from each parent, and one copy is tran- 
scriptionally inactivated in the trophectoderm at the 
blastocyst stage. In mice the trophoblast cells select- 
ively inactivate the paternally inherited X chromo- 
some, whereas either the maternal or paternal X 
chromosome is inactivated in the embryonic tissue. 
Genomic imprinting and X chromosome inactivation
prevent the complete development of parthenogenetic,
gynogenetic, and androgenetic mouse embryos. In
parthenogenotes and gynogenotes, which contain only
maternal chromosomes, the extraembryonic tissues fail
to develop properly. In androgenotes, which contain only
paternal chromosomes, the embryonic tissues do not form.

The proper development of the trophoblast is 
essential to embryo implantation and normal develop- 
ment. In humans, implantation failure is a frequent 
cause of infertility. In preeclampsia, a disease of
pregnancy, the cytotrophoblast exhibits shallow invasion
of the maternal spiral arterioles, resulting in poor
placental development, fetal growth retardation and,
often, fetal death. Choriocarcinoma, a rare form of
cancer, is a highly invasive malignant growth of the
trophoblast in the absence of an embryo and
may cause maternal death.


See also: Embryonic Development, Mouse; 
Embryonic Stem Cells 
Trypsin, the canonical serine protease, has often been
used to investigate the relationship between the structure
of proteins and their functions. This
relationship is a delicate balance that has been refined
through evolution, and it remains a significant
challenge to understand. Historically, a variety of
chemical and biochemical methods, some of them harsh,
have been used to dissect this relationship. However,
with modern molecular biology and biophysical tech- 
nology, one may examine and explore this relationship 
in an intentional, systematic, and more selective man- 
ner. The coupling of molecular cloning with tech- 
niques such as X-ray crystallography has provided a 
means to directly examine the various relationships 
that exist between the amino acids of a given protein. 
Once the high resolution three-dimensional structure 
of an enzyme has been determined, the potential 
role(s) of individual amino acids in a protein often 
becomes more obvious, and many times new or unex- 
pected features are discovered. Furthermore, the 
understanding of these relationships may be used to 
alter pre-existing functions or to introduce new func- 
tions into a protein in a predictable manner, i.e., pro- 
tein engineering. The proteolytic enzyme trypsin has 
played a fundamental role in our current understand- 
ing of the relationship between the structures of the 
serine proteases and their activities. A common three-dimensional
structure and catalytic mechanism have
been conserved among the eukaryotic and prokaryotic 
members of this large family of enzymes. However, 
the diverse activities of the serine proteases that range 
from digestion to fertilization are the results of differ- 
ent sets of amino acids that are used by each enzyme 
for its specific function. This article profiles various 
approaches to study trypsin and further our under- 
standing of enzyme structure and function, particu- 
larly in the serine proteases. 


Background 


Trypsin (Enzyme Commission number EC 3.4.21.4) was
first described in the late 1800s as a proteolytic activity 
present in pancreatic secretions. Subsequent studies 
revealed that this enzyme specifically hydrolyzed 
peptide bonds C-terminal to the amino acid residues 
of Lys (lysine) and Arg (arginine) 10° times faster 
than hydrolysis by hydroxide ion. Since its initial dis- 
covery, trypsin has been identified in all animals, 
including insects, fish, and mammals. Trypsin from
each source can differ slightly in activity, but the nat- 
ural substrate for the enzyme is generally any peptide 
that contains Lys or Arg. The specificity of trypsin 
allows it to serve both digestive and regulatory func- 
tions. As a digestive agent, it degrades large polypep- 
tides into smaller fragments. As a regulatory protease, 
it activates other proteins through limited proteolysis 
at specific Lys or Arg bonds. 
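The cleavage rule stated above (hydrolysis of the peptide bond on the C-terminal side of Lys or Arg) can be applied to a sequence in a few lines. This is a simplified in silico sketch: real digests are not always complete. The optional rule that blocks cleavage before proline is a convention widely used in proteomics software rather than something stated in this entry, and the test sequences are arbitrary.

```python
from typing import List


def trypsin_digest(sequence: str, block_before_proline: bool = False) -> List[str]:
    """Split a one-letter protein sequence after every Lys (K) or Arg (R).

    If block_before_proline is True, a K or R followed by Pro (P) is not cleaved
    (a common proteomics convention; an assumption, not from the text).
    """
    seq = sequence.upper()
    fragments: List[str] = []
    start = 0
    for i, residue in enumerate(seq):
        if residue not in "KR":
            continue
        following = seq[i + 1] if i + 1 < len(seq) else ""
        if block_before_proline and following == "P":
            continue
        fragments.append(seq[start : i + 1])  # bond C-terminal to K/R is cleaved
        start = i + 1
    if start < len(seq):
        fragments.append(seq[start:])  # trailing fragment without a C-terminal K/R
    return fragments


print(trypsin_digest("MKWVTFISLLR"))                       # ['MK', 'WVTFISLLR']
print(trypsin_digest("AKPGR", block_before_proline=True))  # ['AKPGR']
```

Because only the identity of the residue on the N-terminal side of the scissile bond is checked, the sketch captures the narrow P1 specificity described here and nothing about extended-site preferences.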


A Model for Protein Engineering 


Trypsin has been extensively studied both mechanis- 
tically and structurally producing a vast amount of 
information regarding how its structure relates to cer- 
tain aspects of its function. This large informational 
database makes this enzyme an ideal candidate for 
examining protein structure and function in detail. 
Additional features of this enzyme have also made it 
a desirable candidate for protein engineering. It is 
relatively small (molecular mass about 25 kDa), and it
is monomeric. The protein can be overexpressed 
in bacteria or yeast and its primary structure can be 
readily altered through the use of recombinant DNA 
technology. The enzyme is very robust and stable 
for long periods of time at low temperature, either 
lyophilized or in solution at low pH. 


Structural Aspects 

Detailed analysis of various high-resolution crystal 
structures of trypsin provided the initial understand- 
ing of how the three-dimensional structure relates to 
its function. The crystal structures of trypsin from 
several sources have been solved under a wide range 
of conditions in the presence and absence of both 
small molecule and macromolecular inhibitors. Ana- 
lyses of these structures have identified key amino 
acids that are important for activation, catalysis, and 
substrate recognition. 


Zymogen activation 

In all natural systems, trypsin is expressed in an in- 
active form, or zymogen, known as trypsinogen. In 
mammals, trypsinogen is expressed in the pancreas 
and then secreted into the duodenum where it is acti- 
vated by the highly selective protease, enteropeptidase. 
Enteropeptidase recognizes a specific sequence
(Asp-Asp-Asp-Asp-Lys) at the amino-terminus of trypsinogen
and cleaves the peptide bond C-terminal to the
Lys residue to produce the active enzyme. A compari- 
son of the high-resolution crystal structures of trypsi- 
nogen and trypsin revealed that after removal of this 
specific N-terminal segment, a buried aspartate 
(Asp194) rotates about the polypeptide backbone and 
forms a salt bridge with the new N-terminus at Ile16
(Figure 1). This interaction orders a site on the
polypeptide that is essential for stabilizing the negatively
charged transition state of the substrate. Once this 
region is stabilized, trypsin is irreversibly activated. 
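The enteropeptidase step described above amounts to locating one recognition motif and discarding everything up to and including its Lys. In the sketch below the zymogen sequence is a hypothetical toy fragment, and the `max_offset` limit on where the motif may start is an illustrative assumption standing in for "at the amino-terminus."

```python
ENTEROPEPTIDASE_SITE = "DDDDK"  # Asp-Asp-Asp-Asp-Lys in one-letter code


def activate_trypsinogen(zymogen: str, max_offset: int = 10) -> str:
    """Return the chain left after cleavage C-terminal to the Lys of the site.

    max_offset (hypothetical) limits how far from the N-terminus the site may begin.
    """
    idx = zymogen.upper().find(ENTEROPEPTIDASE_SITE)
    if idx == -1 or idx > max_offset:
        raise ValueError("no N-terminal enteropeptidase site found")
    return zymogen[idx + len(ENTEROPEPTIDASE_SITE):]


toy_zymogen = "GSDDDDKIVNGEQ"  # hypothetical; the Ile after the Lys stands in for Ile16
print(activate_trypsinogen(toy_zymogen))  # IVNGEQ
```

The new N-terminal residue of the returned chain is the one whose amino group, in the real enzyme, forms the salt bridge with Asp194.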


Catalysis 

Catalytic activity in trypsin proceeds by the utiliza- 
tion of an elegant arrangement of three amino acids 
that constitute the ‘catalytic triad.’ Using a numbering 
system based on chymotrypsinogen, this essential 
triumvirate consists of serine 195, histidine 57, and 
aspartate 102. These three amino acids are strictly 
conserved among all members of the trypsin family. 
Several features identified via crystallographic analysis 
suggested that the functional moieties of the triad do 
not act independently, but instead work in concert 
to facilitate peptide and ester bond hydrolysis. The 
crystal structures show that the δ-oxygen of Asp102
accepts a hydrogen bond from the δ1-nitrogen of
His57 (Figure 2A). This interaction highly polarizes
the His such that there is a proton on the δ1-nitrogen
and not on the ε2-nitrogen. Consequently, the ε2-nitrogen
faces Ser195 and acts as a general base to
increase the nucleophilic character of the hydroxyl
group of Ser195.

Figure 1 (See Plate 43) Arrangement of key residues in trypsinogen (Tg) and trypsin (Tn). The arrangement of the catalytic triad is essentially identical in the zymogen (peach) and mature forms (pink) of the enzyme. The major difference is the position of Asp194. This residue rotates about the backbone and forms a salt bridge with the N-terminus of the mature enzyme. The salt bridge formed between Asp194 and the N-terminus of the mature enzyme causes a subtle change in backbone structure and forms a site on the enzyme that stabilizes the transition state.

Figure 2 Schematic representation of the structure and mechanism of the trypsin active site. (A) Key features of trypsin: the arrangement of Asp102, His57, and Ser195 is essential for catalysis. The δ-oxygen of Asp102 accepts a hydrogen bond from the δ1-nitrogen of His57. The ε2-nitrogen faces Ser195 and acts as a general base to increase the nucleophilic character of the hydroxyl group of Ser195, which attacks the carbonyl carbon of the scissile peptide bond. The base of the S1 site is occupied by Asp189, which facilitates Lys and Arg recognition primarily through electrostatic interaction. (B) Stabilization of the transition state: after nucleophilic attack of Ser195 on the carbonyl carbon, the backbone amide hydrogens of Gly193 and Ser195 stabilize the negatively charged tetrahedral intermediate formed in peptide hydrolysis.

In the currently accepted mechanism of catalysis, after the initial complex between trypsin
and a substrate is formed, the hydroxyl oxygen of 
Ser195 attacks the carbonyl carbon of the scissile 
bond. The histidine then acts as a general acid and 
donates the proton abstracted from Ser195 to the 
newly formed amine or alcohol group. The first prod- 
uct then dissociates and a covalent acyl-enzyme 
complex is simultaneously formed (Figure 2B). De- 
acylation occurs through the same mechanism except 
that solvent provides the attacking nucleophile. The 
serine and histidine residues had been shown to be 
essential for catalysis by chemical modification 
experiments. Defining the role of the third member 
of the catalytic triad required more sophisticated ana- 
lytical methods. 
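The two-step mechanism described above (acylation, then deacylation by water) is usually summarized by a standard kinetic scheme for serine proteases. The scheme and steady-state expressions below are textbook results added for illustration rather than taken from this entry: $K_s$ is the dissociation constant of the initial complex, $k_2$ the acylation rate constant, and $k_3$ the deacylation rate constant.

$$
\mathrm{E} + \mathrm{S} \;\overset{K_s}{\rightleftharpoons}\; \mathrm{ES}
\;\xrightarrow{\;k_2\;}\; \mathrm{E{-}Acyl} + \mathrm{P_1}
\;\xrightarrow[\mathrm{H_2O}]{\;k_3\;}\; \mathrm{E} + \mathrm{P_2}
$$

$$
k_{\mathrm{cat}} = \frac{k_2 k_3}{k_2 + k_3}, \qquad
K_m = K_s\,\frac{k_3}{k_2 + k_3}, \qquad
\frac{k_{\mathrm{cat}}}{K_m} = \frac{k_2}{K_s}.
$$

When acylation is much slower than deacylation ($k_2 \ll k_3$), as is generally the case for amide substrates, $k_{\mathrm{cat}} \approx k_2$. This is why the acylation rate constant is a natural point of comparison between trypsin variants and related enzymes.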


Substrate recognition 

The understanding of substrate recognition in trypsin 
has been facilitated by co-crystal structures of trypsin 
complexed with macromolecular inhibitors such as 
pancreatic trypsin inhibitor. The structural basis for 
recognition of Arg and Lys residues is a clearly 
defined region in the protein referred to as the S1 
site. An Asp residue (Asp189) occupies the base of 
this site and forms favorable interactions with the 
positively charged Arg and Lys side chains of bound 
substrates (see Figure 2). The cocrystal structures 
also revealed multiple interactions at the protease- 
inhibitor interface that define the extended substrate- 
binding pocket. These additional interactions increase 
the catalytic efficiency of trypsin toward peptide sub- 
strates. Residues lining the sides of the S1 site also 
affect substrate recognition. Glycines at positions 
216 and 226 (Gly216 and Gly226) lie on opposite 
walls of the pocket and interact with the aliphatic 
portion of the long side chains of Lys and Arg. Large 
hydrophobic residues do not bind productively in the 
S1 site and negatively charged residues are chemically 
incompatible. Similar, but less obvious, principles 
are involved in the recognition of extended peptide 
substrates. 


Mutational Analysis 

Structural analysis of native trypsin provided a frame- 
work to further clarify the existing biochemical data 
and to assign specific roles for individual amino acids 
from a physical frame of reference. However, the pro- 
posed roles for various amino acids could not be tested 
directly until the mid-1980s, when advances in recombinant
DNA technology allowed their direct replacement.


Catalysis 
Further support for the importance of each of the
residues in the catalytic triad was obtained by
replacing Ser195 and His57 of rat trypsin with ala- 
nines. These substitutions led to a 10*-fold decrease 
in activity. The role of the active site aspartic acid was 
evaluated by replacement with an asparagine (Asn). 
Kinetic and structural analysis of the Asp102→Asn
variant of trypsin revealed that the role of Asp102 is 
to maintain the imidazole ring of His in the correct 
three-dimensional arrangement. Substituting any resi- 
due in the catalytic triad dramatically reduces activity. 
Such deleterious effects verified that each component 
of the triad is an indispensable part of the catalytic 
machinery. 


Substrate recognition 

Both mutational analysis and genetic selection have 
been used to generate trypsin variants that address the 
role of Asp189 in trypsin-catalyzed reactions. Kinetic 
analysis of these variants, such as Asp189→Ser, demonstrated
that the presence of a negative charge at
the base of the specificity pocket is essential for cata- 
lysis. Variants that did not contain the negative charge 
were 10°-fold lower in activity toward Arg or Lys 
containing substrates. This conclusion is further sub- 
stantiated by the observation that activity was par- 
tially restored if acetate were added to the assay 
mixture. The crystal structure of this variant reveals 
that the acetate occupies the base of the specificity 
pocket, facilitating interaction between the variant 
trypsin and a substrate molecule, similar to Asp189 in 
native trypsin. However, substitution of Asp189 with 
the positively charged Lys did not result in a reversal 
of charge recognition. Clearly, Asp189 is not the only 
residue involved in substrate recognition. 


Substrate specificity 

Trypsin has been used to explore the structural fea- 
tures that govern substrate specificity among the ser- 
ine proteases. For example, the arrangement of the 
catalytic residues and the positioning of the S1 site 
are remarkably similar between trypsin and chymo- 
trypsin, but the residue at the base of the specificity 
pocket differs. Chymotrypsin has a Ser instead of an 
Asp at position 189 that facilitates association of the 
large hydrophobic residues typically recognized by 
chymotrypsin. When this residue in trypsin was sub- 
stituted with Ser, the catalytic efficiency of the variant 
enzyme was markedly reduced against substrates con- 
taining Lys and Arg, but no increase in activity was 
observed against typical chymotrypsin substrates. 
Therefore, there are other factors governing the spe- 
cificity of the protease toward its substrate. In fact, to 
achieve chymotrypsin-like substrate specificity, two 
surface loops in trypsin must be replaced by the cor- 
responding loops in chymotrypsin. The new ‘hybrid’ 
trypsin variant exhibited an acylation rate constant
equal to that of chymotrypsin toward phenylalanine
amide substrates, but still was 10°-fold lower in over- 
all efficiency relative to wild-type chymotrypsin. This 
result demonstrated that portions of the enzyme distal 
to the active site could play essential roles in affect- 
ing catalysis. Structural comparisons of the hybrid 
enzyme with native trypsin and chymotrypsin illu- 
strated that other features, such as the backbone con- 
formation of Gly216, are important to substrate 
specificity. These features were altered to increase 
the activity of the hybrid trypsin toward typical chy- 
motrypsin substrates. 
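Comparisons of catalytic efficiency such as those above are conventionally made with the specificity constant $k_{\mathrm{cat}}/K_m$. For two substrates A and B competing for the same enzyme, standard Michaelis-Menten kinetics gives the ratio of their rates of hydrolysis (a textbook relation added here for illustration):

$$
\frac{v_A}{v_B} = \frac{(k_{\mathrm{cat}}/K_m)_A\,[\mathrm{A}]}{(k_{\mathrm{cat}}/K_m)_B\,[\mathrm{B}]}.
$$

At equal concentrations, an enzyme therefore discriminates between two substrates in proportion to their specificity constants, whatever its degree of saturation. For a hypothetical example, a variant whose $k_{\mathrm{cat}}/K_m$ is 100 times higher toward Lys-containing than toward Phe-containing substrates would, at equal concentrations, cleave the former 100 times faster.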


Altering Protein Function 

A goal in studying protein structure-function rela- 
tionships is to understand the basic principles of struc- 
ture that lead to specific function(s). The intent is to 
develop the capability to design a protein de novo that 
possesses a predicted function. Information from the
structures of other proteins, coupled with that from
the numerous crystal and cocrystal structures of
trypsin, has been used as a guide for designing new
regulatory functions into trypsin. In certain cases at
least, the existing structural and biochemical information
may be used to predictably alter the function of a
protein by making subtle changes in its structure. For
example, an enzyme whose activity could be regulated 
by transition metals was made. A variant of trypsin was 
designed that utilized the tendency of histidines to 
coordinate transition metals in proteins. Through 
molecular modeling, an Arg residue near His57 was
selectively substituted with a His (Arg96→His). The
expectation was that in the presence of certain transi- 
tion metals, the two His residues would coordinate the 
metal, precluding use of His57 in catalysis. Indeed, in 
the presence of copper, nickel, or zinc, the proteolytic 
activity of this variant was arrested. Furthermore, the 
activity could be restored by the addition of a strong
metal chelator, supporting the assumption that the
metal was indeed acting in the predicted manner. Struc- 
tural analysis of the variant trypsin verified that a 
metal-regulated protease had been constructed. 
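The on/off behavior described above can be pictured with a simple binding model. The sketch below is a hypothetical illustration, not data from the study: it assumes metal binds the engineered His57/His96 site with 1:1 stoichiometry and an invented dissociation constant (KD_UM), that metal-bound enzyme is fully inactive, and that the chelator binds metal stoichiometrically and far more tightly than the enzyme does.

```python
def free_metal(total_metal_uM: float, chelator_uM: float = 0.0) -> float:
    """Free metal after a strong 1:1 chelator has bound all it can.

    Assumes metal is in large excess over enzyme, so enzyme-bound metal is ignored.
    """
    return max(0.0, total_metal_uM - chelator_uM)


def fraction_active(free_metal_uM: float, kd_uM: float) -> float:
    """Fraction of enzyme with an empty His57/His96 metal site, for 1:1 binding.

    Metal-bound enzyme is treated as inactive because His57 is tied up by the metal.
    """
    return 1.0 / (1.0 + free_metal_uM / kd_uM)


KD_UM = 20.0  # hypothetical dissociation constant in micromolar, for illustration only

for metal, chelator in [(0, 0), (100, 0), (1000, 0), (1000, 2000)]:
    active = fraction_active(free_metal(metal, chelator), KD_UM)
    print(f"metal {metal:>5} uM, chelator {chelator:>5} uM -> active fraction {active:.2f}")
```

Raising the metal concentration drives the active fraction toward zero, and adding chelator in excess of the metal restores it, which mirrors the qualitative result reported for the variant.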

A similar design scheme was developed in which 
metal binding was used to alter the substrate specifi- 
city of trypsin. Two residues in the extended binding 
pocket of trypsin were substituted with histidines that 
could act as metal ligands. Additionally, a histidine in 
the substrate was positioned such that, if it completed
the metal coordination polyhedron, it would register
the scissile bond at the cleavage position in the protein.
A substrate atypical for trypsin, containing tyrosine
and a correctly positioned histidine, was used to test
the design. The two protein ligands and one substrate
ligand coordinated the metal, thereby forcing the
tyrosine residue into the specificity pocket. This
strategic placement of a histidine in the substrate,
which facilitated productive binding of the substrate
to the enzyme, resulted in a form of ‘substrate-assisted’
catalysis. Not only was activity observed, but the
efficiency was also increased through further
engineered substitutions. By replacing the Asp in the
specificity pocket with a Ser residue, which is more
characteristic of chymotrypsin (the serine protease
that typically recognizes tyrosine as a substrate), a
metal-dependent proteolytic activity was achieved.


Conclusion 


Investigations with trypsin have demonstrated that 
information obtained through various studies may 
be used to define principles and relationships of pro- 
tein structure that lead to function. Each new techno- 
logy provides not only a deeper understanding of how 
this particular enzyme works, but also new means to 
explore novel mechanisms that enzymes in general 
may employ. The activities that have been engineered 
into trypsin may already be employed by closely related
family members. Trypsin will continue to serve as a 
powerful and effective ‘divining rod’ to discover 
examples of these activities. 


See also: Proteins and Protein Structure 
Tryptophan (Trp or W) is one of the 20 amino acids
commonly found in proteins. Its nonpolar, aromatic
side chain makes it only slightly soluble in water.
Tryptophan mapping is a technique that exploits the
fluorescent nature of tryptophan residues to determine
the chemical environment of a specific tryptophan in a
protein. The chemical structure of tryptophan is given
in Figure 1.
Figure 1 Tryptophan.


See also: Amino Acids; Proteins and Protein 
Structure 
The trp Operon and Tryptophan
Biosynthesis 


All organisms that synthesize tryptophan use the same 
sequence of biochemical reactions. In Escherichia coli,
these reactions are catalyzed by enzymes formed from
polypeptide chains encoded in the structural genes of a
single transcriptional unit, the trp operon. This article
will describe the organization of the trp operon of
E. coli, the pathway and enzymes of tryptophan bio- 
synthesis, and the regulatory mechanisms this organ- 
ism employs to regulate trp operon expression. 


Gene Arrangement in the trp Operon of 
Escherichia coli 


The trp operon of E. coli and some other enteric 
microorganisms contains five major structural genes, 
designated trpE through trpA (Figure 1). These five 
genes encode five polypeptides bearing the seven 
functional domains that are necessary for tryptophan 
formation. The genetic segments corresponding to 
two pairs of functional domains are fused, yielding
the bifunctional polypeptides TrpG-TrpD and
TrpC-TrpF. Dissection studies with these fused
polypeptides have established that each domain is a
more or less independent functional unit.

[Figure 1: gene map of the trp operon (trpL, trpE, trpD, trpC, trpB, trpA), the enzyme complexes E2(G-D)2 (anthranilate synthase) and A2B2 (tryptophan synthase), and the reaction sequence from chorismate to L-tryptophan.]

Figure 1 The trp operon of Escherichia coli, its specified polypeptides, the enzyme complexes they form, the reactions in tryptophan biosynthesis, and the polypeptide or polypeptide domain responsible for catalysis of each reaction. The operon consists of a transcription regulatory region followed by five structural genes and tandem sites of transcription termination (t and t'). The principal promoter (trpP1) overlaps multiple operators (op.) at which the tryptophan-activated trp repressor can bind and inhibit transcription initiation. Following the promoter, there is a transcribed regulatory leader region containing the coding region (trpL) for a 14-residue peptide. Transcription may either terminate at a regulated site of transcription termination, the attenuator (attn.), located in this leader region, or proceed into the structural genes of the operon. Two of the structural genes, trpD and trpC, consist of fused genetic segments. Each genetic segment specifies a polypeptide domain that can catalyze one of the tryptophan biosynthetic reactions. There is an internal promoter (trpP2) near the distal end of trpD. TrpA through TrpG (and A through G) refer to the polypeptide domains responsible for catalysis of the indicated reactions. Four of the five trp polypeptides form enzyme complexes. APR transferase, anthranilate phosphoribosyl transferase; PRA isomerase, phosphoribosyl anthranilate isomerase; IGP synthase, indoleglycerol phosphate synthase; IGP aldolase, indoleglycerol phosphate aldolase; PRPP, 5-phosphoribosyl-1-pyrophosphate.
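The gene-to-domain relationships described above fit naturally in a small lookup table. This Python sketch only restates the text (gene order trpE through trpA, with trpD and trpC each carrying two fused domains); the variable and function names are illustrative, not part of any established tool.

```python
from typing import List, Tuple

# Structural genes of the E. coli trp operon in transcription order, with the
# polypeptide domains each one encodes (domain letters as in Figure 1).
TRP_OPERON: List[Tuple[str, List[str]]] = [
    ("trpE", ["E"]),       # anthranilate synthase
    ("trpD", ["G", "D"]),  # fused: glutamine amidotransferase + APR transferase
    ("trpC", ["C", "F"]),  # fused: IGP synthase + PRA isomerase
    ("trpB", ["B"]),       # tryptophan synthase: L-serine hydro-lyase
    ("trpA", ["A"]),       # tryptophan synthase: IGP aldolase
]


def bifunctional_genes(operon: List[Tuple[str, List[str]]]) -> List[str]:
    """Genes whose single polypeptide carries more than one functional domain."""
    return [gene for gene, domains in operon if len(domains) > 1]


n_genes = len(TRP_OPERON)
n_domains = sum(len(domains) for _, domains in TRP_OPERON)
print(n_genes, n_domains)              # 5 7
print(bifunctional_genes(TRP_OPERON))  # ['trpD', 'trpC']
```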


The Pathway and Enzymes of 
Tryptophan Biosynthesis 


The pathway of tryptophan biosynthesis proceeds 
from chorismate, the common precursor of the three 
aromatic amino acids. Chorismate also serves as pre- 
cursor of several minor aromatic metabolites, includ- 
ing p-aminobenzoic acid, a component of folic acid. 
The biochemical reactions proceeding from chorismate 
to tryptophan, and the seven polypeptide domains
catalyzing these reactions, are illustrated in Figure 1. The
synthesis of anthranilate from chorismate, and phos- 
phoribosyl anthranilate from anthranilate, are cataly- 
zed by a tetrameric enzyme complex consisting of two 
TrpE and two TrpG-TrpD polypeptides. Although 
L-glutamine is the preferred amino group donor during 
the synthesis of anthranilate (o-aminobenzoate) from 
chorismate, ammonia may be used as an alternative source
of this amino group by the complex or by the TrpE 
polypeptide alone. Glutamine utilization requires 
the TrpG glutamine amidotransferase domain. In 
the conversion of anthranilate to phosphoribosyl 
anthranilate, by the TrpD domain, 5-phosphoribo- 
syl-1-pyrophosphate (PRPP) contributes the side 
chain of phosphoribosyl anthranilate. Phosphoribosyl
anthranilate is then rearranged by phosphoribosyl
anthranilate isomerase, the TrpF domain, to
form 1-(o-carboxyphenylamino)-1-deoxyribulose-
5-phosphate (CdRP). The carboxyl group of CdRP
is then removed and the pyrrole ring of the indole 
moiety is formed, yielding the next intermediate in 
the pathway, indole-3-glycerol phosphate. The latter 
reaction is catalyzed by indoleglycerol phosphate 
synthase, the TrpC domain. Indoleglycerol phosphate
is then converted to indole by the TrpA polypeptide of 
the tryptophan synthase enzyme complex; this tetra- 
meric complex consists of two molecules each of TrpA 
and TrpB. Finally, indole is condensed with a pyridoxal
phosphate derivative of L-serine, to form L-tryptophan; 
the final reaction is catalyzed by the TrpB polypeptide 
of the tryptophan synthase complex. 

Synthesis of tryptophan from chorismate requires
the products of four additional biosynthetic
pathways: L-glutamine, phosphoribosyl-1-pyrophosphate,
L-serine, and pyridoxal phosphate.
Glutamine provides the amino group of anthranilate,
phosphoribosyl pyrophosphate is the source of two
carbon atoms of the pyrrole ring of indole, L-serine
provides the alanyl side chain of tryptophan, and
pyridoxal phosphate is the coenzyme essential for
activation of L-serine during catalysis of the final
reaction in tryptophan formation.
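The reaction sequence and its extra inputs can be summarized as an ordered table of steps. This is a toy model under stated assumptions: each step is treated as requiring only the listed domain(s), the ammonia route that lets TrpE work without TrpG is ignored, and pyridoxal phosphate is listed as a coenzyme rather than a consumed substrate. The function `furthest_product` is hypothetical; it reports how far the pathway could proceed if given domains were absent, which is a simplification and not a prediction of what particular mutants excrete.

```python
from typing import List, Set, Tuple

# One entry per reaction: (substrate, product, required domains, other inputs).
# Toy assumptions: step 1 needs TrpG as well as TrpE (ammonia route ignored);
# pyridoxal phosphate is a coenzyme, not a consumed substrate.
Step = Tuple[str, str, Set[str], List[str]]
PATHWAY: List[Step] = [
    ("chorismate", "anthranilate", {"E", "G"}, ["L-glutamine"]),
    ("anthranilate", "phosphoribosyl anthranilate", {"D"}, ["PRPP"]),
    ("phosphoribosyl anthranilate", "CdRP", {"F"}, []),
    ("CdRP", "indole-3-glycerol phosphate", {"C"}, []),
    ("indole-3-glycerol phosphate", "indole", {"A"}, []),
    ("indole", "L-tryptophan", {"B"}, ["L-serine", "pyridoxal phosphate (coenzyme)"]),
]


def furthest_product(missing: Set[str]) -> str:
    """Last compound the pathway can reach if the domains in `missing` are absent."""
    compound = PATHWAY[0][0]
    for _substrate, product, domains, _inputs in PATHWAY:
        if domains & missing:  # a required domain is absent: the pathway stops here
            break
        compound = product
    return compound


def other_inputs() -> List[str]:
    """Compounds supplied by other biosynthetic pathways, in order of use."""
    return [item for _s, _p, _d, inputs in PATHWAY for item in inputs]


print(furthest_product(set()))  # L-tryptophan
print(furthest_product({"C"}))  # CdRP
print(other_inputs())
```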


Structure/Function Studies with the 
Tryptophan Biosynthetic Enzymes 


The mechanism of enzymatic catalysis of each of the 
tryptophan biosynthetic reactions has been investi- 
gated and appreciable information has been gathered 
on the key active site residues in each biosynthetic 
protein or protein domain. The three-dimensional 
structure of the tryptophan synthase enzyme complex 
of Salmonella typhimurium has been determined, as 
well as the structures of complexes containing mutant 
protein variants. The three-dimensional structure of the 
bifunctional phosphoribosyl anthranilate isomerase- 
indoleglycerol phosphate synthase of E. coli has also 
been determined. These structures have revealed that 
the TrpA, TrpC, and TrpF polypeptide domains have
similar structures of the α/β (TIM) barrel type, raising
the possibility that they evolved from one another or
from a common ancestor. Structural studies with the 
tryptophan synthase enzyme complex have shown 
that a tunnel connects the active site of the TrpA 
polypeptide to the active site of the TrpB polypeptide. 
Indole, generated in the TrpA active site, travels 
through this tunnel to the TrpB active site, where it 
is condensed with serine. Studies with this enzyme 
complex have also revealed features of the complex 
that explain the mutual activation of each polypeptide 
upon complex formation with the heterologous poly- 
peptide. 


Regulation of Expression of the trp 
Operon of Escherichia coli 


The five structural genes of the trp operon are pre- 
ceded by a transcription regulatory region consisting 
of a promoter/operator, at which transcription
initiation is regulated, and a transcribed leader segment,
within which transcription termination is regulated.
Initiation at the trp promoter is regulated by the
tryptophan-activated trp repressor protein; the extent 
of repression varies in response to changes in the intra- 
cellular concentration of free tryptophan. Repression 
regulates operon expression over about an 80-fold 
range. Polymerase molecules that have initiated tran- 
scription at the trp promoter and escaped repression 
are subject to a second regulatory mechanism, tran- 
scription attenuation. The latter mechanism deter- 
mines whether or not transcription will terminate at 
a site located in the distal portion of the leader region. 
This decision is influenced by the intracellular
concentration of tryptophan-charged tRNA^Trp. When the
Trp-tRNA^Trp concentration is high, transcription
terminates in the leader region. When tRNA^Trp is
mostly uncharged, which occurs when cells
experience a severe tryptophan deficiency, termination is
avoided and transcription proceeds to the end of the 
operon. Transcription attenuation in the trp operon of 
E. coli regulates transcription of the structural genes of 
the operon over about an eightfold range. The com- 
bined action of repression and attenuation regulates 
transcription of the structural genes of the operon 
over about a 600-fold range. There is an internal pro- 
moter located in the distal portion of trpD (Figure 1). 
Transcription initiation at this promoter is
unregulated and occurs at less than 10% of the
frequency of initiation at the principal promoter. Tandem
sites of transcription termination are located following the
trpA structural gene; the first is protein-factor-
independent, a so-called intrinsic terminator, while the
second requires the protein Rho. Completion of
transcription of the operon yields a polycistronic
messenger RNA. Ribosomes can initiate translation at any of
the five major ribosome binding sites on this polycis- 
tronic messenger. 
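The overall figure quoted above follows from multiplying the two ranges, on the assumption that repression and attenuation act independently: 80 × 8 = 640, i.e. roughly 600-fold. A minimal Python version of that arithmetic (the helper name is illustrative):

```python
import math


def combined_range(*fold_ranges: float) -> float:
    """Overall regulatory range, assuming independent mechanisms whose effects multiply."""
    return math.prod(fold_ranges)


repression = 80   # approximate range attributed to repression in the text
attenuation = 8   # approximate range attributed to attenuation in the text
print(combined_range(repression, attenuation))  # 640, i.e. "about 600-fold"
```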

The trp promoter region of E. coli contains three 
operators that can bind trp repressor. Operator-bound 
repressor inhibits transcription initiation. The trp 
repressor also regulates transcription initiation in sev- 
eral other operons concerned with tryptophan metab- 
olism. The three-dimensional structures of the trp 
aporepressor (aporepressor lacks bound tryptophan), 
the trp repressor, and the trp repressor-operator
complex, have been determined. These structures have
revealed the features of this protein that are respon- 
sible for its activation by tryptophan and its recogni- 
tion of specific operators. 

The transcribed leader region of the trp operon of 
E. coli is about 160 bp in length. As mentioned, this 
genetic segment encodes an mRNA segment that can 
cause transcription termination in the leader region. 
The transcript of the leader region can fold to form 
three RNA structures, termed terminator, antitermin- 
ator, and transcription pause structure. The terminator 
and antiterminator are alternative RNA structures, 
i.e., they have a sequence of nucleotides in common, 
thus either, but not both, can exist at one time. When 
cells are deficient in charged tRNA^Trp, the
antiterminator forms; this precludes formation of the terminator.
When cells have adequate levels of charged tRNA^Trp,
the terminator forms and transcription terminates in
the leader region. A deficiency of charged tRNA^Trp is
sensed during attempted translation of tandem Trp
codons in a 14-residue leader peptide coding region,
trpL, located near the 5’ end of the trp operon
transcript. Coupling of transcription and translation,
essential to this mechanism of attenuation, is achieved
by the formation of the transcription pause structure,
located near the 5’ end of the transcript. Polymerase
pausing allows a ribosome to bind to the transcript 
and initiate synthesis of the leader peptide. The move- 
ment of this ribosome then releases the paused tran- 
scription complex, and transcription and translation 
proceed in unison. 
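The sensing step described above can be sketched as two small functions. This is a toy illustration under stated assumptions: the leader sequence below is hypothetical (it is not the actual trpL sequence), and the charged-tRNA threshold is an invented parameter standing in for the kinetic competition between ribosome movement and RNA folding.

```python
from typing import List


def tandem_trp_codons(mrna: str, start: int = 0) -> List[int]:
    """Codon indices (0-based, frame beginning at `start`) where UGG is immediately followed by UGG."""
    codons = [mrna[i:i + 3] for i in range(start, len(mrna) - 2, 3)]
    return [i for i in range(len(codons) - 1)
            if codons[i] == "UGG" and codons[i + 1] == "UGG"]


def leader_outcome(charged_trp_trna: float, threshold: float = 0.5) -> str:
    """Toy attenuation switch keyed to the fraction of tRNA^Trp that is charged.

    `threshold` is a hypothetical stand-in for the kinetics of the real system.
    """
    if charged_trp_trna < threshold:
        # Ribosome stalls at the Trp codons; the antiterminator forms instead of the terminator.
        return "antiterminator: transcription continues into the structural genes"
    # Ribosome translates through the Trp codons; the terminator forms.
    return "terminator: transcription stops in the leader region"


# Hypothetical leader ORF, NOT the real trpL sequence: AUG ... UGG UGG ... UGA
HYPOTHETICAL_LEADER = "".join(["AUG", "AAA", "GCA", "UGG", "UGG", "CGC", "ACC", "UCU", "UGA"])
print(tandem_trp_codons(HYPOTHETICAL_LEADER))  # [3]
print(leader_outcome(0.1))
print(leader_outcome(0.9))
```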

Two of the trp polypeptides, the products of genes
trpE and trpA, lack tryptophan; therefore, they are
synthesized preferentially during severe tryptophan
starvation. An additional regulatory feature,
translational coupling, ensures equimolar synthesis of the
polypeptide products of two pairs of adjacent genes, 
trpE and trpD, and trpB and trpA. As mentioned, the 
products of these genes form enzyme complexes. The 
enzyme complex catalyzing the first two reactions in 
the pathway is feedback-inhibited by tryptophan. The 
tryptophan binding site is located in the TrpE poly- 
peptide. 

The use of two transcription regulatory mechanisms 
and feedback inhibition of anthranilate synthase activ- 
ity allows E. coli to regulate tryptophan biosynthesis 
efficiently in response to changes in the availability of 
tryptophan and the rate of protein synthesis. 

Tumor Antigens Encoded by Simian Virus 40

Simian virus 40 (SV40) is a small (45 nm)
DNA-containing virus that establishes a lifelong, harmless
persistent infection in the kidney of its natural host,
the rhesus macaque. SV40 causes tumors in some
rodents and confers tumorigenic properties on cell
types from many species. Because it grows well in
established monkey kidney cell lines and efficiently 
transforms rodent cell lines in culture, SV40 has been 
studied extensively as a model for viral productive 
infection and as a probe to understand molecular 
mechanisms of tumorigenesis. 

SV40 is a member of the polyomavirus subfamily of 
the Papovavirus family. The other subfamily includes 
the papillomaviruses. Polyomaviruses are character- 
ized by small icosahedral nonenveloped virions and 
circular, double-stranded DNA genomes. Some
members of the polyomavirus subfamily, such as murine
polyomavirus or simian lymphotropic
polyomavirus, are tumorigenic in their natural hosts. Other
members of the subfamily, such as murine K virus
and budgerigar fledgling disease virus, are important
pathogens. Two human polyomaviruses, BKV and 
JCV, have been identified and both are closely related 
to SV40. Both BKV and JCV establish lifelong, harm- 
less persistent infections of the kidney in most 
humans. JCV can undergo a productive infection of 
the brain in immunocompromised individuals and is
the causative agent of progressive multifocal leukoen- 
cephalopathy, an AIDS-associated dementia. 


Infectious Cycle of SV40 


The SV40 virion consists of a single molecule of
circular, double-stranded DNA of 5243 base
pairs complexed with cellular histones and three
viral-encoded proteins termed VP1, VP2, and VP3.
The viral genome also encodes three proteins that are 
not present in the mature virion. One of these, the 
agnoprotein, is poorly understood but is thought to 
be involved in virion assembly and/or release from 
cells once the infectious cycle is complete. The other
two proteins, termed large tumor antigen (T antigen)
and small tumor antigen (t antigen), play central roles
in regulating the infectious cycle and are responsible 
for the tumorigenic properties of SV40. 

The circular viral genome is divided into two tran- 
scriptional units. The early promoter produces a single 
primary transcript that is differentially spliced to yield 
two mRNAs, one encoding large T antigen (T antigen) 
and the other small t antigen (t antigen). Expression of 
these messenger RNAs (mRNAs) requires only the 
cellular transcription apparatus and consequently the 
SV40 early promoter is frequently used in mammalian 
expression vectors to drive transcription of
heterologous genes. The agnoprotein, VP1, VP2, and VP3,
are encoded by differentially spliced mRNAs derived
from the viral late promoter. The SV40 late promoter
is inactive in most cell types. Large T antigen is a
potent activator of the SV40 late promoter. Hence, 
transcription of the virion proteins requires the prior 
expression of T antigen in infected cells. 

The infectious cycle starts when an SV40 virion 
attaches to a susceptible cell. Most studies have been 
done on cultures of established African green monkey 
kidney cells. Internalized virions are thought to be 
transported to the nucleus with uncoating of the 
viral chromatin occurring either during this transport 
or subsequent to arrival in the nucleus. The cellular 
transcription apparatus then drives expression from 
the SV40 early promoter resulting in expression of 
large T antigen and small t antigen. T/t expression is 
followed by an increase in the transcription of a num- 
ber of cellular genes, many of which are involved 
in nucleotide metabolism (thymidine kinase), DNA 
replication (histones, DNA polymerase), and cell 
growth (rRNAs). Approximately 24 h postinfection,
the infected cells enter S phase and cellular DNA is 
replicated. Shortly after this, viral DNA replication is 
initiated and continues throughout infection. Tran- 
scription from the viral late promoter begins at about 
the same time as viral DNA replication, followed by 
expression of the virion proteins and the coordinated 
assembly of progeny virions. Cell death occurs about 
96 h postinfection, with approximately 300 infectious
progeny being released from each infected cell. 

Not all cell types are permissive for SV40 infection. 
For example, human fibroblasts are semipermissive 
with viral replication being restricted to a few cells in 
the population. Rodent fibroblasts are nonpermissive 
and thus no progeny viruses are produced following 
infection of these cells. 


Role of Large T Antigen in Viral DNA 
Replication 


Large T antigen is the only viral protein directly 
required for viral DNA replication. The remaining 
proteins are recruited from the cellular replication 
apparatus. Viral DNA replication initiates at a unique 
site termed the origin of replication (ori) and proceeds 
bidirectionally around the genome. The minimal ori is
a 64 bp fragment that contains three important
elements: (1) a 21 bp imperfect palindrome; (2) a central
region consisting of four repeats of the pentanucleo- 
tide GAGGC; and (3) an AT-rich region. Large T 
antigen possesses a DNA-binding domain that recog- 
nizes the GAGGC pentanucleotide. In addition, T 
antigen possesses an ATP-binding and hydrolysis 
domain. To initiate viral replication, T antigen mono- 
mers bind to the pentanucleotide and this is followed 
by the ATP-dependent assembly of a double hexamer 
structure that cooperatively forms around the central 
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portion of ori. The formation of the T antigen double 
hexamer leads to a distortion of the AT-rich region of 
ori that is essential for initiation of replication. T anti- 
gen then recruits the cellular replication apparatus to 
ori by forming direct associations with DNA poly- 
merase, primase, RPA, and topoisomerase. Following 
initiation, T antigen serves as a DNA helicase with 
each of the two hexamers hydrolyzing ATP. 
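Locating the GAGGC pentanucleotide that T antigen recognizes is a simple string search, but two details matter for a circular genome: a match can span the arbitrary point where the sequence numbering starts, and a site can lie on either strand (read as GCCTC on the top strand). The sketch below handles both; the test sequences are made up and are not SV40 DNA.

```python
from typing import List, Tuple

_COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq: str) -> str:
    return "".join(_COMPLEMENT[base] for base in reversed(seq.upper()))


def find_motif_circular(genome: str, motif: str = "GAGGC") -> List[Tuple[int, str]]:
    """(start position, strand) of every motif occurrence on a circular DNA molecule.

    '+' means the motif reads 5'->3' on the given strand; '-' means its reverse
    complement (GCCTC for GAGGC) was found there, i.e. the motif is on the other strand.
    """
    g, motif = genome.upper(), motif.upper()
    k = len(motif)
    wrapped = g + g[:k - 1]  # lets matches run across the end/start junction of the circle
    rc = reverse_complement(motif)
    hits = []
    for i in range(len(g)):
        window = wrapped[i:i + k]
        if window == motif:
            hits.append((i, "+"))
        if rc != motif and window == rc:
            hits.append((i, "-"))
    return hits


# Made-up sequences (not SV40 DNA) to exercise both strands and the wrap-around case.
print(find_motif_circular("TTGAGGCAAGCCTCAA"))  # [(2, '+'), (9, '-')]
print(find_motif_circular("AGGCTTTTG"))         # [(8, '+')]
```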

The replication functions of T antigen are both 
positively and negatively regulated by
phosphorylation. Phosphorylation at threonine 124 is necessary for the
cooperative assembly of double hexamers at ori; thus,
T antigen molecules not modified at this site are
defective for replication. On the other hand, the
phosphorylation of several serine residues antagonizes double
hexamer formation. 


Regulation of Gene Expression by the 
Large and Small T Antigens 


Both large and small T antigens are transcriptional
regulators. Large T antigen can act as a
transcriptional repressor: T antigen recognition sequences
(GAGGC elements) are present in the early promoter,
and when T antigen binds to these sequences, early
region transcription is reduced. Thus, T antigen
regulates its own levels via an autoregulatory feedback
loop. This is clearly shown during SV40 productive 
infections where T antigen levels rise during the first 
24h postinfection, but then reach a steady-state level. 
Mutation of the T antigen binding sequences in the
early promoter eliminates this autoregulation and leads
to constitutively high levels of T antigen.
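The autoregulatory loop can be illustrated with a minimal differential-equation model. All parameters here (synthesis rate, degradation rate, repression constant K, Hill coefficient n) are hypothetical, chosen only to reproduce the qualitative behavior described above: with feedback, T antigen rises and then levels off at a lower steady state; without functional binding sites in the early promoter, it climbs to a higher level.

```python
def t_antigen_level(autoregulated: bool, hours: float = 72.0, dt: float = 0.01,
                    k_syn: float = 10.0, k_deg: float = 0.1,
                    K: float = 20.0, n: float = 2.0) -> float:
    """Euler integration of dT/dt = k_syn * promoter_activity(T) - k_deg * T.

    With autoregulation, promoter_activity(T) = 1 / (1 + (T/K)**n), a Hill-type
    decrease in early-promoter output as T antigen accumulates; without it the
    promoter runs at full activity. Every parameter value here is hypothetical.
    """
    level = 0.0
    for _ in range(round(hours / dt)):
        promoter_activity = 1.0 / (1.0 + (level / K) ** n) if autoregulated else 1.0
        level += dt * (k_syn * promoter_activity - k_deg * level)
    return level


with_feedback = t_antigen_level(autoregulated=True)
binding_sites_mutated = t_antigen_level(autoregulated=False)
print(f"autoregulated: {with_feedback:.1f}; no feedback: {binding_sites_mutated:.1f}")
```

A Hill term is used here only as a common, generic way to express promoter repression; the text does not specify the form of the feedback.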

Large T antigen is also a transcriptional activator. 
For example, large T antigen is necessary for activation 
of the viral late promoter, allowing expression of the 
virion structural proteins. Large T antigen also
activates expression of a number of cellular genes, many of
which are required to drive the cells into and through 
the cell cycle (see below). Transcriptional activation 
does not require the DNA-binding activity of T anti- 
gen. Rather, T antigen associates with the basal tran- 
scriptional apparatus and, by mechanisms that are still 
unclear, this association leads to transcriptional activa- 
tion. For example, T antigen binds directly to TBP and 
to the transcriptional adapters CBP/p300 as well as to a
number of transcription factors. 

Small t antigen also activates the transcription of
cellular genes, including cyclin D and cyclin A. Small t
antigen-mediated activation of cyclin D transcription
is indirect and requires small t antigen interaction with
the cellular phosphatase pp2A. This association leads
to inhibition of pp2A activity, which in turn leads to
increased cyclin D transcription signaled by activation
of the MAP kinase pathway. The mechanism by which
small t antigen transactivates the cyclin A gene is not
clear, but this action does not require interaction with
pp2A. Rather, the small t antigen J domain, a domain
conserved in all members of the DnaJ class of
molecular chaperones, is necessary to activate cyclin
A expression.


Viral Tumorigenesis 


SV40 normally grows in growth-arrested, terminally 
differentiated cells; however, a number of cellular pro- 
teins required for viral replication are only expressed 
during S phase. Thus, a successful infection requires 
that SV40 drives the cells into the cell cycle. In per- 
missive cells, this is followed by viral DNA replica- 
tion, late gene expression, virion assembly, and cell 
death. Thus, tumor cells rarely, if ever, result from an
SV40 productive infection. On the other hand, rodent
cells are nonpermissive for SV40 replication. For 
example, SV40 infection of mouse cells results in T 
antigen expression and the subsequent driving of the 
infected cells into the cell cycle, but viral DNA replic- 
ation and late gene expression are blocked. This results 
in a population of cells that remain in the cell cycle as 
long as T antigen is present. However, since no viral
replication occurs, T antigen is diluted as the cells
divide, and eventually the culture returns to its normal 
growth-arrested state. This phenomenon is termed 
abortive transformation. 
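The loss of T antigen during abortive transformation can be approximated as halving at each division. The sketch assumes, hypothetically, that the nonreplicating viral DNA responsible for T antigen expression is partitioned evenly between daughter cells and that expression stops once the per-cell amount drops below an arbitrary threshold; neither assumption is quantified in the text.

```python
import math


def divisions_until_below(initial_amount: float, threshold: float) -> int:
    """Divisions before an evenly halved, non-replicating quantity drops below threshold.

    After n divisions each cell holds initial_amount / 2**n (an idealization).
    """
    if initial_amount < threshold:
        return 0
    return math.floor(math.log2(initial_amount / threshold)) + 1


# Hypothetical: 100 units of viral template per cell, expression lost below 1 unit.
print(divisions_until_below(100, 1))  # 7
```

Because the amount falls geometrically, even a large initial template load is lost within a modest number of divisions, consistent with the culture eventually returning to its growth-arrested state.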

In rare instances, SV40 DNA becomes integrated in 
the cellular chromosome in such a manner that allows 
continuous T antigen expression. Such cells are per- 
manently transformed and display a variety of altered 
properties including the ability to: (1) grow in the 
absence of serum or specific growth factors; (2) over- 
grow a monolayer of normal growth-arrested cells; (3) 
grow in the absence of anchorage to a substrate; and 
(4) form tumors in animals. The expression of both 
large and small T antigen is required for full transform- 
ation by SV40. 

Small t antigen does not induce transformation by 
itself, but rather cooperates with large T antigen to 
induce the fully transformed phenotype. This func- 
tion is clearly linked to small t antigen inhibition of 
pp2A activity and the consequent increase in cyclinD 
levels. Small t antigen-mediated activation of the 
cyclinA promoter, which is independent of its action 
on pp2A, also contributes to transformation. 

Large T antigen is necessary for transformation and 
is sufficient to confer some aspects of the transformed 
phenotype. Large T antigen effects transformation by 
blocking the Rb and p53 tumor suppressor pathways. 
The Rb family of tumor suppressors (pRb, p107, and 
p130) block cell proliferation by inhibiting the action 
of the E2F family of transcription factors. T antigen
binds to Rb proteins, resulting in the release of E2F
from Rb-mediated repression. This results in the acti- 
vation of E2F-responsive genes and the consequent 
progression through the cell cycle. One cellular 
response to this unscheduled entry into S phase is the 
activation of the p53 tumor suppressor pathway. The 
subsequent activation of p53-responsive genes results 
in cell cycle arrest and apoptosis. SV40 large T antigen 
circumvents this defense by binding to p53 and pre- 
venting it from activating its target genes. 
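The logic of the two pathways can be summarized as a toy Boolean model. It is a deliberate simplification of the description above (it ignores small t antigen, growth factor signaling, and everything quantitative), meant only to show why blocking Rb alone is not enough: the resulting unscheduled S-phase entry triggers p53 unless p53 is also bound. The function name and outcome labels are illustrative.

```python
def cell_outcome(t_binds_rb: bool, t_binds_p53: bool) -> str:
    """Toy Boolean reading of the Rb and p53 pathways as described in the text."""
    # Rb proteins keep E2F repressed unless T antigen binds them and releases E2F.
    e2f_released = t_binds_rb
    if not e2f_released:
        return "growth-arrested"
    # Unscheduled S-phase entry activates p53 unless T antigen binds and blocks it.
    p53_active = not t_binds_p53
    if p53_active:
        return "cell cycle arrest or apoptosis"
    return "continued proliferation"


for rb, p53 in [(False, False), (True, False), (True, True)]:
    print(f"T binds Rb={rb}, T binds p53={p53}: {cell_outcome(rb, p53)}")
```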

Clearly, SV40 tumorigenesis is complicated, requiring
the activation and inhibition of multiple cellular
pathways. Studies with animal models suggest that
tumorigenesis may be even more complicated than
originally thought. T antigen expression in some
tissues, for example the β islet cells of the pancreas, is
sufficient to drive cell proliferation resulting in
hyperplasia, but is not sufficient for progression to
carcinoma. This progression requires the subsequent
mutation of as-yet-unidentified cellular genes.
Nevertheless, the T antigens are a powerful model for
understanding how the perturbation of specific cellular
pathways contributes to cancer. 


Future Directions 


The DNA tumor viruses, including SV40, continue to 
provide surprises and insight into the mechanisms of 
cellular growth control. Future studies will be aimed 
at obtaining a detailed understanding of the biochem- 
ical mechanisms used by the T antigens to alter their 
cellular targets, and at discerning which specific 
aspects of the transformed phenotype are altered by 
T antigen action on each target. The next few years 
should prove very exciting in this regard. 




See also: Origin (ori); Transformation; Virus 


Tumor Necrosis Factor 
(TNF) 
T H Rabbitts 


Copyright © 2001 Academic Press 
doi: 10.1006/rwgn.2001.1627 


Tumor necrosis factor (TNF; also called cachectin,
macrophage cytotoxin, necrosin, lymphotoxin) is a
mediator of inflammatory responses. It is made by
macrophages and activated monocytes, eosinophils,
and NK cells. Its function in tumor biology is complex:
it can have cytotoxic and cytostatic effects. It also
affects angiogenesis and contributes to cachexia in
cancer patients (hence its other name, cachectin).

The gene is on human chromosome 6p2 and mouse 
chromosome 17. It has four exons and produces an 
mRNA encoding a proprotein of 233 amino acids, 
cleaved to 157 amino acids in its mature form. 


See also: Angiogenesis 
Normal cellular genes whose inactivation by mutation
predisposes to cancer are termed ‘tumor suppressors.’ 
Generally speaking, the protein products of tumor 
suppressor genes work in processes that prevent the 
transformation of normal cells into cancer cells. Inacti- 
vation of both copies of a tumor suppressor gene 
vitiates the protective function of its product, predis- 
posing to transformation. Germline mutations in cer- 
tain tumor suppressor genes give rise to inherited 
cancer predisposition syndromes. In predisposed indi- 
viduals, the second copy of the gene is inactivated in 
tumors by somatic mutations. 

The first tumor suppressor gene to be characterized 
was Rb, associated with the rare cancer retinoblas- 
toma. Retinoblastoma can exhibit a familial or non- 
familial (sporadic) pattern of incidence. In the 1970s, 
Knudson proposed that both types could be the result 
of a ‘two hit’ mutational process inactivating a pro- 
tective gene, a notion later proven with the identifica- 
tion of Rb. Familial cases inherit one defective copy of 
Rb in the germline, and inactivation of the remaining 
copy suffices to trigger retinoblastoma. In sporadic 
cases, both copies of Rb must be inactivated by 
somatic mutation in the tumor cells. 
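Why one remaining 'hit' makes tumors so much more likely than two can be shown with a back-of-envelope probability model. The per-copy inactivation probability and the number of target cells below are hypothetical, and hits are assumed to occur independently; Knudson's actual analysis rested on age-of-onset statistics rather than on a calculation like this.

```python
import math


def p_tumor(hits_needed: int, p_hit: float, n_cells: float) -> float:
    """Probability that at least one of n_cells loses all remaining functional copies.

    Each copy is inactivated independently with probability p_hit, so a given cell
    completes the required hits with probability p_hit ** hits_needed.
    """
    per_cell = p_hit ** hits_needed
    # 1 - (1 - per_cell) ** n_cells, computed stably for tiny per_cell
    return -math.expm1(n_cells * math.log1p(-per_cell))


P_HIT = 1e-6   # hypothetical per-copy inactivation probability per cell
N_CELLS = 1e6  # hypothetical number of susceptible retinal cells

familial = p_tumor(1, P_HIT, N_CELLS)  # one copy already inactive in the germline
sporadic = p_tumor(2, P_HIT, N_CELLS)  # both copies must be hit somatically
print(f"familial: {familial:.3f}, sporadic: {sporadic:.1e}")
```

With these made-up numbers the familial case is several hundred thousand times more likely to yield a tumor, illustrating why inheriting the first hit produces a dominant-appearing predisposition.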


Two Classes 


Broadly speaking, the protein products of tumor sup- 
pressor genes can be classified on the basis of their 
cellular function as ‘gatekeepers’ (which regulate cell 
division, death, or lifespan) or ‘caretakers’ (which 
preserve genetic stability) (see Figure 1). 
Figure 1 An overview of the events leading to
tumorigenesis following the inactivation of tumor 
suppressor genes. Top: Multiple genetic alterations are 
selected during the evolution of a cancer cell from a 
normal cell. The loss of tumor suppressor genes 
belonging to the caretaker and gatekeeper classes 
facilitates the evolution of a cancer cell. 


Inactivation of gatekeeper genes facilitates, through a variety of mechanisms, the unrestrained growth typical of cancer cells. Mutations in several different gatekeeper genes may be required for the neoplastic transformation of a cell. Gatekeeper gene products often participate in the control of the cell cycle (for example, the Rb gene product) or in the signals that regulate proliferation (for example, the APC gene product).

In contrast, caretaker gene mutations lead indirectly to carcinogenesis. Inactivation of a caretaker gene typically increases the mutation rate, which in turn favors somatic mutations in gatekeeper genes. Caretaker gene products often participate in DNA repair or in the pathways that maintain chromosome stability. Examples include the breast cancer susceptibility genes BRCA1 and BRCA2, and the DNA repair genes mutated in the different complementation groups of the disorder xeroderma pigmentosum.


The Biological Basis of Tumor Suppression


Space does not permit a comprehensive discussion of the functions of known tumor suppressor genes. Nonetheless, it is instructive to examine some of the biological processes in which they participate.

The division of mammalian cells is normally triggered or limited by signaling pathways transduced by ligand-receptor interactions at the cell surface. Several tumor suppressor genes of the gatekeeper class serve to regulate these pathways. For example, inactivation of the APC tumor suppressor dysregulates the signaling pathway initiated when Wnt ligands bind their cell-surface receptors.

Many tumor suppressors of the gatekeeper class are key regulators of progression through the cell cycle. Of particular importance in human tumors is the dysregulation, by tumor suppressor inactivation, of the events that induce progression from the G1 to the S-phase. Normally, a complex series of inhibitory interactions governed by the Rb and p53 tumor suppressor proteins prevents inappropriate G1 to S progression. It has been estimated that this pathway is inactivated by mutation in over 80% of human tumors.

The elimination of abnormal, damaged, or unwanted cells is accomplished by a mechanism of programmed cell death (apoptosis), whose initiation and execution are controlled by several tumor suppressor gene products belonging to the gatekeeper class. Inactivation of these genes prevents or impedes apoptosis, prolonging the survival of aberrant cells that may evolve into tumors.

Normal human cells have a finite lifespan and, in culture, enter a state of replicative arrest (termed ‘senescence’) after a limited number of cell divisions. Many tumor suppressor genes of the gatekeeper class (for example, p53 and p16) participate in senescence induction. Their inactivation eliminates one important control preventing the unlimited cell proliferation typical of cancer cells.

Multiple mechanisms governed by tumor suppressor genes belonging to the caretaker class are responsible for the maintenance of genetic stability. They include the mechanisms that ensure that a cell can sense, signal, and repair damage to DNA that either arises during processes such as replication or is induced by exogenous agents such as UV radiation. For example, inactivation of the tumor suppressor gene Msh2, involved in the correction of mismatched DNA bases, increases the frequency of mutations throughout the genome and is associated with hereditary predisposition to colorectal cancer. Moreover, genes whose products participate in the mechanisms that ensure the correct segregation of duplicated chromosomes to daughter cells during mitosis may also behave as caretakers, although this remains to be firmly established.

Finally, it is important to appreciate that tumor suppressor genes may perform multiple cellular functions whose inactivation is relevant to carcinogenesis. An important example is the p53 tumor suppressor, which has been implicated in pathways for G1-S checkpoint control, DNA damage sensing, DNA repair, and programmed cell death by apoptosis. The breast cancer genes BRCA1 and BRCA2 have been implicated in the regulation of transcription as well as in DNA repair. The APC tumor suppressor participates in intracellular signaling through the Wnt pathway, but also in the regulation of mitosis. In these and other examples, it is overly simplistic to ascribe tumor suppression to a single biological function of the mutant gene.


Approaches to Identification 


The majority of ‘classical’ tumor suppressor genes have been characterized through the identification of germline mutations associated with predisposition to human cancer. Most of these genes were first isolated using linkage analysis in rare, large families with highly penetrant, autosomal dominant genetic predisposition to cancer. Rb was the first tumor suppressor gene identified by this approach. These inherited cancer syndromes account for less than 5% of the global cancer burden. A different strategy, relying mostly on association studies and linkage disequilibrium, will be required to identify high-prevalence, low-penetrance predisposition genes. The proportion of cancers attributable to these genes is probably much higher than 5%.

Tumor suppressor genes are recessive at the cellular level, and therefore inactivation of both alleles is required. Most often this is accomplished by mutation of one allele and deletion of the second. In other cases, both alleles are deleted (homozygous deletion), or the second allele is silenced by methylation, with consequent loss of expression, or inactivated by mutation. Finally, some mutations act in a dominant-negative manner, so that a single event effectively inactivates both alleles.

p53 was the first human tumor suppressor gene identified by mutational analysis of sporadic tumors, and several others have since been described. Classic tumor suppressor genes are defined by mutation in both familial and sporadic forms of cancer. An increasing number of candidate tumor suppressor genes have been identified through somatic mutations but have not been associated with genetic predisposition. Examples include BUB1, BUBR1, TGF-βRII, Axin, DPC4, p300, and PPARγ.

The most frequent mechanism of inactivation of the second allele of a tumor suppressor gene is allelic deletion, and therefore loss of specific chromosomal regions occurs frequently in human neoplasia. The classic method used to detect these allelic losses was Southern blotting with restriction fragment length polymorphism (RFLP) and, later, variable-number tandem repeat (VNTR) probes. The advent of PCR and the mapping of highly informative microsatellite markers have greatly facilitated screening for loss of heterozygosity (LOH). More recently, single-nucleotide polymorphisms (SNPs) have also been used in LOH studies. PCR-based methods can also be used for whole-genome screens for homozygous deletions, and several tumor suppressor genes have been cloned using homozygous deletion mapping (for example, p16, DPC4, PTEN, and hSNF5). Chromosomal copy number loss can also be detected using hybridization-based approaches: comparative genomic hybridization (CGH) and array-based CGH. Tumor suppressor genes have also been identified by functional approaches, by chromosome transfer-based complementation, and by mouse genetics.
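Microsatellite LOH screening is usually scored by comparing the relative signal of the two alleles in tumor DNA with that in matched normal DNA. The Python sketch below shows one common way to do this; the function names, the (T1/T2)/(N1/N2) ratio, and the 0.5/2.0 cutoffs are assumptions for illustration, not a procedure given in the text.

```python
"""Sketch of scoring LOH at a microsatellite marker from allele peak heights.

Assumptions (not from the text): peak heights or areas are proportional to
allele dosage, and a ratio at or beyond 0.5 or 2.0 is scored as allelic loss.
These cutoffs are a common convention but vary between laboratories.
"""
import math
from typing import Optional, Tuple

Peaks = Tuple[float, float]  # (allele 1 signal, allele 2 signal)


def allelic_imbalance_ratio(normal: Peaks, tumor: Peaks) -> Optional[float]:
    """Return (T1/T2) / (N1/N2), or None if the marker is uninformative."""
    n1, n2 = normal
    t1, t2 = tumor
    if n1 <= 0 or n2 <= 0:
        # Normal DNA is homozygous at this marker, so loss cannot be detected.
        return None
    if t1 <= 0 and t2 <= 0:
        raise ValueError("no tumor signal at this marker")
    if t2 <= 0:
        return math.inf  # allele 2 completely absent from the tumor
    return (t1 / t2) / (n1 / n2)


def call_loh(normal: Peaks, tumor: Peaks, low: float = 0.5, high: float = 2.0) -> str:
    """Classify a marker as 'LOH', 'retained', or 'uninformative'."""
    ratio = allelic_imbalance_ratio(normal, tumor)
    if ratio is None:
        return "uninformative"
    return "LOH" if ratio <= low or ratio >= high else "retained"


# Hypothetical peak heights for three markers in one tumor/normal pair.
print(call_loh(normal=(1000, 950), tumor=(980, 300)))  # -> LOH
print(call_loh(normal=(1000, 950), tumor=(900, 870)))  # -> retained
print(call_loh(normal=(1200, 0), tumor=(1100, 0)))     # -> uninformative
```

The cutoffs stop short of demanding complete loss because tumor samples usually contain some normal cells, whose retained allele dilutes the apparent imbalance.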


Examples 


An exhaustive review of all tumor suppressor genes identified to date and their functions is beyond the space available here; therefore, known tumor suppressor genes are listed in Table 1, and a selected group of tumor suppressor pathways is discussed in more detail.


Rb-p53 Pathway: Cell Cycle Control at the G1-S Transition

The Rb gene, first identified in a search for the gene mutated in familial retinoblastoma, encodes a member of a small family of proteins that has a critical role in the control of progression from the G1- to the S-phase of the cell cycle. It has been estimated that the pathway controlled by Rb is defective in over 85% of all cancers. Rb itself is mutated not only in retinoblastoma but also in a variety of sporadic tumors arising in different tissues.

Rb and related proteins regulate the E2-F family of transcription factors at the G1-S transition. Activation of the heterodimeric E2-F complex, which contains one of the E2-F subunits (E2-F1 to E2-F5) bound to the DP protein, suffices to initiate the transcription of a number of genes essential for entry into the S-phase, including DNA polymerase, enzymes involved in nucleotide synthesis, and cyclin E. Prior to S-phase entry, E2-F is held in an inactive state bound to Rb. Passage through the G1-S transition is triggered by cyclin-dependent protein kinases (CDKs) that form active complexes with cyclins D, E, or A. When the cyclin D-CDK complex is activated in response to mitogenic growth regulatory signals, it hyperphosphorylates Rb. Hyperphosphorylated Rb is released from its association with E2-F subunits, leaving them free to transactivate genes required for S-phase entry. Cyclin E, itself an E2-F transactivation target, initiates a positive feedback circuit by promoting the assembly of cyclin E-CDK complexes that can also hyperphosphorylate Rb.
Table 1 Tumor suppressor genes in human cancer.

| Tumor suppressor gene | Chromosomal location | Gene function | Germline mutations in hereditary cancer syndromes | Somatic mutations in sporadic cancers |
|---|---|---|---|---|
| RB1 | 13q14 | Transcriptional regulator of cell cycle (G1-S) | Familial retinoblastoma | Retinoblastoma, osteosarcoma, lung cancer |
| WT1 | 11p13 | Transcriptional regulator | Wilms' tumor | Nephroblastoma, AML |
| p53 | 17p13.1 | Guardian of the genome | Li-Fraumeni syndrome | 50% of human cancers |
| hCHK2 | 22q12.1 | G2-M checkpoint enforcement | Li-Fraumeni syndrome | Unknown |
| NF1 | 17q11 | Ras-GAP activity | Von Recklinghausen neurofibromatosis | Neurofibroma, sarcoma, glioma |
| NF2 | 22q12 | ERM protein/cytoskeletal regulator | Neurofibromatosis type 2 | Schwannomas, meningiomas |
| VHL | 3p25 | Regulator of proteolysis; interacts with elongins | Von Hippel-Lindau syndrome | Renal cell carcinoma, pheochromocytoma, hemangioma |
| APC | 5q21 | Negative regulator of Wnt signaling; association with microtubule cytoskeleton | Familial adenomatous polyposis | Colorectal cancer |
| INK4a (CDKN2A, p16, MTS1) | 9p21 | Cyclin-dependent kinase inhibitor | FAMM | Pancreatic cancer; melanoma; esophageal carcinoma |
| PTC | 9q22.3 | Receptor for sonic hedgehog | Gorlin syndrome | Basal cell carcinoma, medulloblastoma |
| BRCA1 | 17q21 | DNA repair; transcriptional regulator | Familial breast cancer | Breast and ovarian carcinoma (rare) |
| BRCA2 | 13q12.3 | DNA repair; transcriptional regulator | Familial breast cancer | Breast and ovarian carcinoma; pancreatic carcinoma (rare) |
| DPC4 | 18q21.1 | Regulator of TGF-β pathway | Juvenile polyposis | Pancreatic and colorectal carcinoma; hamartoma |
| PTEN | 10q23 | Dual-specificity phosphatase; regulation of PI3K/AKT pathway | Cowden syndrome | Glioblastoma; prostate and breast carcinoma |
| TSC1/TSC2 | 9q34/16p13.3 | TSC1 associates with TSC2, a putative GTPase-activating protein | Tuberous sclerosis | |
| LKB1 | 19p13 | Serine/threonine kinase | Peutz-Jeghers syndrome | |
| E-cadherin (CDH1) | 16q22.1 | Cell adhesion and Wnt signaling | Hereditary diffuse gastric cancer | Gastric (diffuse), breast (lobular), and gynecologic carcinoma |
| hMSH2 | 2p22 | DNA MMR | HNPCC | Colorectal carcinoma; MMR-deficient tumors |
| hMLH1 | 3p21 | DNA MMR | HNPCC | Colorectal carcinoma; MMR-deficient tumors |
| hPMS1 | 2q31 | DNA MMR | HNPCC | Colorectal carcinoma; MMR-deficient tumors |
| hPMS2 | 7p22 | DNA MMR | HNPCC | Colorectal carcinoma; MMR-deficient tumors |
| hMSH6 | 2p16 | DNA MMR | HNPCC | Colorectal carcinoma; MMR-deficient tumors |
| hMSH3 | 5q11-q12 | DNA MMR | HNPCC | Colorectal carcinoma; MMR-deficient tumors |
| CBP/p300 | 16p13.3/22q13 | Transcriptional regulator; acetylase | Rubinstein-Taybi syndrome (CBP) | Colorectal, pancreatic, gastric, and breast carcinoma (p300) |
| hSNF5 | 22q11.2 | SWI/SNF multiprotein complex | Hereditary MRTs | Sporadic MRTs; choroid plexus carcinomas |
| PPARγ | 3p25 | Nuclear hormone receptor family of transcription factors | – | Colorectal carcinoma |
| Axin1 | 16p13.3 | Wnt signaling | – | Liver carcinoma |
| Axin2 (Conductin) | 17q23-q24 | Wnt signaling | – | Colorectal carcinoma |
| EXT1/EXT2 | 8q24.11-q24.13/11p12-p11 | Glycosyltransferase activity (heparan sulfate metabolism) | Familial exostoses | |
| ATM | 11q23 | DNA damage/repair | Ataxia telangiectasia (autosomal recessive) | B-CLL |
| XP(A-H) | Multiple loci | DNA excision repair | Xeroderma pigmentosum (autosomal recessive) | |
| NBS1 | 8q21 | DNA double-stranded break repair | Nijmegen breakage syndrome (autosomal recessive) | |
| BLM | 15q26.1 | DNA helicase | Bloom syndrome (autosomal recessive) | |
| FANCA-H | Multiple loci | DNA damage/repair (interstrand crosslinks) | Fanconi anemia (autosomal recessive) | |

FAMM: familial atypical mole melanoma syndrome; AML: acute myeloid leukemia; B-CLL: B-cell chronic lymphocytic leukemia; HNPCC: hereditary nonpolyposis colorectal cancer; MMR: mismatch repair; MRT: malignant rhabdoid tumor.


Cyclin-CDK activity is inhibited by two families of inhibitory proteins (CDK-Is), whose prototypic members are the p21 and p16INK4 proteins, respectively. The p21 family includes the CDK-Is p27KIP1 and p57, whereas the p16 family includes p19ARF/p14ARF, an alternatively spliced form, and p15INK4B. By preventing Rb hyperphosphorylation, CDK-Is antagonize the cascade of events that culminates in S-phase entry.

CDK-Is of the p21 family are generally induced through the activation of the p53 tumor suppressor, which serves to integrate the cellular response to metabolic or genotoxic stress. p53 protein is normally present only at very low concentrations in cells, owing to its rapid proteolysis induced by association with a negative regulator, Mdm-2. Following exposure to stress, p53 activity becomes elevated through a variety of mechanisms. Levels of p53 protein increase following release from Mdm-2 association. Moreover, stress induces posttranslational modifications of p53, such as phosphorylation, that activate the protein. Active p53 functions as a sequence-specific activator of transcription. A number of different targets have been identified that mediate cell-cycle arrest, DNA repair, and programmed cell death by apoptosis. CDK-Is such as p21 are important targets for p53 transactivation. When levels of p21 protein increase following p53 activation, p21 binds to and inactivates cyclin-CDK complexes, preventing S-phase entry and cell-cycle progression. Thus, the G1-S transition is rendered sensitive to DNA damage and other cellular stresses through the activation of the p53 pathway.
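The G1-S control circuit described above can be caricatured as a Boolean network in which each gene product is simply on or off. The Python sketch below is a hypothetical toy model, not a published one: it treats release of E2-F as a proxy for S-phase entry and shows how loss of Rb, p16, or p53 each removes a different brake on the same transition.

```python
"""Toy Boolean sketch of Rb-p53 control of the G1-S transition.

Illustrative only: each component is simply on or off, and release of E2-F is
used as a proxy for S-phase entry. Function and field names are hypothetical.
"""
from dataclasses import dataclass


@dataclass(frozen=True)
class Genotype:
    rb: bool = True    # functional Rb protein present
    p53: bool = True   # functional p53 present
    p16: bool = True   # functional p16 present


def e2f_released(mitogens: bool, dna_damage: bool, p16_induced: bool,
                 g: Genotype = Genotype()) -> bool:
    """Return True if E2-F is free to transactivate S-phase genes."""
    p21_active = g.p53 and dna_damage    # stress -> p53 -> p21, a CDK-I
    p16_active = g.p16 and p16_induced   # p16 blocks cyclin D-CDK activity
    cyclin_d_cdk = mitogens and not (p16_active or p21_active)
    # Rb holds E2-F inactive unless Rb is absent or hyperphosphorylated by
    # cyclin D-CDK. Cyclin E-CDK feedback only reinforces a release that has
    # already happened, so this one-step sketch leaves it out.
    rb_restrains_e2f = g.rb and not cyclin_d_cdk
    return not rb_restrains_e2f


print(e2f_released(True, False, False))                       # -> True
print(e2f_released(True, True, False))                        # -> False
print(e2f_released(True, True, False, Genotype(p53=False)))   # -> True
print(e2f_released(False, False, False, Genotype(rb=False)))  # -> True
```

In this sketch, loss of p53 lets cells pass G1-S despite DNA damage, loss of p16 removes one brake on cyclin D-CDK, and loss of Rb makes E2-F release independent of mitogens altogether, mirroring the point that different mutations can disable the same checkpoint.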

The multifaceted control mechanisms that govern the transition from the G1- to the S-phase of the cell cycle can be perturbed by cancer-associated mutations that affect, besides Rb, the genes encoding the cyclin-CDK complexes, the CDK-Is, and other components of the p53 pathway. The net effect of these mutation-induced changes is to remove the controls that prevent unfettered entry into the S-phase, enabling unrestrained cell proliferation.

Cyclin D1 is often overexpressed in human cancers following gene amplification or translocation to an active chromosomal locus. Indeed, the chromosomal region (11q13) that contains the cyclin D1 gene is amplified in a diverse group of human cancers, including head and neck tumors (>40%), breast cancer (~15%), and small cell lung tumors (10%). Similarly, the CDK4 gene, encoding the CDK that binds to cyclin D1, can also undergo cancer-associated amplification or mutation. The observation that overexpression of cyclin D1 in transgenic mice promotes carcinogenesis suggests a direct role for cyclin D1 amplification in tumor cell proliferation.

Genes encoding the CDK-Is are frequent targets of tumor-associated mutations that result in the inactivation or deletion of their encoded proteins. For instance, the p16 gene is mutated in familial cases of melanoma, in over 40% of sporadic pancreatic carcinomas, and in about one-third of (sporadic) tumors of the esophagus. The locus encoding p16 undergoes homozygous deletion in about half of all gliomas, in 40% of pancreatic carcinomas, and, less frequently, in acute lymphocytic leukemias, ovarian cancer, and lung carcinomas. Consistent with these observations, mice deficient in p16 spontaneously develop a number of different tumors.

Over half of all human tumors contain mutations in p53. Typically, these mutations cause a missense alteration in one allele, resulting in the synthesis of a faulty p53 protein, with the second p53 allele often inactivated by gross chromosomal rearrangements such as deletions. Over 90% of the missense mutations found in human cancers affect the DNA-binding domain of p53, located in the middle third of the protein. Many result in the loss of sequence-specific DNA binding, or transactivation, by the p53 molecule. Because p53 works as a tetrameric complex, mutant p53 can poison the function of the wild-type molecule in a trans-dominant manner. This property may explain why humans who inherit a single defective p53 allele (Li-Fraumeni syndrome) are predisposed to a variety of different tumors.
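The trans-dominant effect can be quantified with simple binomial arithmetic. The Python sketch below assumes, for illustration only, that wild-type and mutant p53 subunits are produced in equal amounts, mix randomly into tetramers, and that any mutant subunit disables the complex; the function names are hypothetical.

```python
"""Dominant-negative arithmetic for a tetrameric transcription factor such as p53.

Assumptions (illustrative, not from the text): wild-type and mutant subunits
are made in the stated proportion, assemble into complexes at random, and a
single mutant subunit is enough to inactivate the whole complex.
"""
from math import comb


def tetramer_composition(frac_wild_type: float, n_subunits: int = 4) -> list[float]:
    """Binomial probability that a complex contains k wild-type subunits, k = 0..n."""
    p = frac_wild_type
    return [comb(n_subunits, k) * p ** k * (1 - p) ** (n_subunits - k)
            for k in range(n_subunits + 1)]


def functional_fraction(frac_wild_type: float, n_subunits: int = 4) -> float:
    """Fraction of complexes made entirely of wild-type subunits."""
    return tetramer_composition(frac_wild_type, n_subunits)[-1]


# Heterozygous cell with equal expression of the two alleles.
print(f"functional tetramers:      {functional_fraction(0.5):.4f}")                # -> 0.0625
print(f"if p53 acted as a monomer: {functional_fraction(0.5, n_subunits=1):.4f}")  # -> 0.5000
```

Under these assumptions only (1/2)^4 = 1/16, about 6%, of tetramers in a heterozygous cell are fully wild type, compared with 50% functional protein for a monomeric factor, so a single missense allele can remove most p53 activity before the second allele is lost.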

It should be emphasized that the transactivation capacity of p53 is required not only for correct regulation of the G1-S transition, but also for many other functions whose inactivation is relevant to tumorigenesis. Thus, although dysregulation of the Rb pathway by p53 mutations is likely to make an important contribution to transformation, it is unlikely, in isolation, to explain the multifaceted role of p53 in tumor suppression.


APC and E-cadherin: Wnt Signaling, Cell Adhesion, and Human Tumorigenesis
APC mutations were initially identified in familial adenomatous polyposis (FAP), a rare hereditary form of colorectal cancer. Patients with FAP develop hundreds to thousands of adenomatous polyps during the first three decades of life, and some of these polyps almost inevitably progress to invasive carcinoma. APC mutations were subsequently described in the large majority of sporadic colorectal cancers.

The APC protein binds to β-catenin and acts to prevent its accumulation. Because β-catenin participates in both processes, this finding suggests a function of APC in both cadherin-based cellular adhesion and Wnt signaling.

The canonical Wnt signaling pathway has β-catenin at its heart. In unstimulated cells, free cytoplasmic β-catenin is destabilized by a multiprotein complex containing the APC tumor suppressor, Axin, and glycogen synthase kinase-3β. Phosphorylation of β-catenin by this complex earmarks it for ubiquitination and subsequent degradation by the proteasome. In contrast, when cells are stimulated by Wnt ligands, the cytoplasmic protein Dishevelled is recruited to the membrane, where it binds, directly or indirectly, to Frizzled, the seven-transmembrane Wnt receptor. The mechanism by which Dishevelled inhibits the Axin complex is not completely understood, but the net result is stabilization of β-catenin, which is released from the Axin complex and translocated into the nucleus, where it binds to TCF proteins and stimulates transcription of Wnt target genes (including c-Myc and cyclin D).

In addition to its role in Wnt signaling, β-catenin binds to the homotypic adhesion molecule E-cadherin and links the cadherin junctions to the actin cytoskeleton to mediate cell adhesion. The current model of APC's function in tumor suppression proposes that its main role is to shuttle β-catenin from the nucleus and the cytoplasm to the junctional compartment of epithelial cells, where β-catenin is delivered either to the Axin complex for degradation or to E-cadherin for incorporation into adherens junctions. A role for APC in spindle function has also been suggested and could explain the chromosomal instability observed in APC-mutant colorectal cancer cells.
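The logic of the β-catenin destruction complex can be summarized as a small Boolean sketch. It is a hypothetical illustration, not a published model: it shows why loss of APC (or of Axin) mimics constitutive Wnt stimulation, since β-catenin is stabilized whether or not ligand is present.

```python
"""Toy Boolean sketch of beta-catenin control by the APC/Axin/GSK-3beta complex.

Illustrative only: the function name and the all-or-none logic are
assumptions, not a quantitative model of Wnt signaling.
"""


def beta_catenin_stabilized(wnt_ligand: bool, apc: bool = True,
                            axin: bool = True, gsk3b: bool = True) -> bool:
    """Return True if free beta-catenin escapes degradation and can reach TCF."""
    # The destruction complex needs all three components to phosphorylate
    # beta-catenin and mark it for ubiquitination and proteasomal degradation.
    complex_intact = apc and axin and gsk3b
    # Wnt binding to Frizzled recruits Dishevelled, which inhibits the complex.
    complex_active = complex_intact and not wnt_ligand
    return not complex_active


for label, kwargs in [
    ("unstimulated, wild type", dict(wnt_ligand=False)),
    ("Wnt-stimulated, wild type", dict(wnt_ligand=True)),
    ("unstimulated, APC mutant", dict(wnt_ligand=False, apc=False)),
]:
    print(f"{label:28s} stabilized={beta_catenin_stabilized(**kwargs)}")
```

The third case is the situation in APC-mutant colorectal cells: β-catenin escapes degradation without any Wnt signal and can drive TCF target genes such as c-Myc and cyclin D.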

E-cadherin is a homophilic cell adhesion molecule that also participates in the Wnt signaling pathway. Somatic mutations of the E-cadherin gene, CDH1, were originally identified in diffuse gastric, lobular breast, and gynecological carcinomas. Subsequently, using a combination of linkage and candidate-gene analysis, truncating germline mutations of CDH1 were identified in kindreds with autosomal dominant predisposition to diffuse gastric cancer, affecting predominantly young individuals. The mechanism through which E-cadherin inactivation initiates tumorigenesis is poorly understood, but it is tempting to speculate that it might result from an increase in the cytoplasmic pool of β-catenin.


Breast Cancer Genes BRCA1 and BRCA2: New Paradigms in Tumor Suppression
About one-tenth of all breast cancer cases exhibit a familial pattern of inheritance. Among these familial cases, germline mutations in either BRCA1 or BRCA2 occur in between 20 and 60%. Mutations in BRCA1 or BRCA2 are not a feature of nonfamilial (sporadic) breast cancer. BRCA1 and BRCA2 were first identified in 1994-1995 through the analysis of families exhibiting a predisposition to early-onset breast cancer. Founder mutations affecting these genes occur in Iceland and among the Ashkenazim, where they confer a highly penetrant risk of breast, ovarian, and other cancers (including cancers of the male breast, pancreas, and prostate).

The cellular functions of the proteins encoded by these large genes remain uncertain. In meiotic cells, colocalization of BRCA1 and BRCA2 proteins to the synaptonemal complexes of developing axial elements has been reported, consistent with a role in meiotic recombination, a process that is initiated by DNA double-stranded breakage. Similarly, there is increasing evidence that BRCA1 and BRCA2 are essential in mitotic cells for the repair of DNA double-stranded breaks by homologous recombination. Targeted disruption of the murine homologs of BRCA1 or BRCA2 gives rise to genotoxin hypersensitivity and chromosomal instability suggestive of defective DNA double-stranded break repair. Furthermore, homology-directed repair of double-stranded DNA breaks introduced into chromosomal substrates is impaired by the disruption of BRCA1 or BRCA2 pathways, whereas repair by nonhomologous end-joining remains unaffected.

BRCA2 interacts directly, and at high stoichiometry, with Rad51, the mammalian homolog of the RecA protein essential for DNA repair by recombination. It has therefore been proposed that BRCA2 controls the activity or availability of Rad51, although the precise molecular mechanism remains to be defined. The interaction of BRCA1 with Rad51 is less well characterized, although both proteins colocalize, along with BRCA2, to discrete nuclear foci following DNA damage.

There is good evidence that BRCA genes work as caretakers of genetic stability. Cells that harbor disruptions in BRCA1 or BRCA2 accumulate aberrations in chromosome structure reminiscent of those in diseases such as Bloom syndrome or Fanconi's anemia, where chromosomal instability is associated with cancer predisposition. This instability is likely to increase the frequency of gross chromosomal rearrangements, such as translocations and deletions, throughout the genome.

Homozygous inactivation of BRCA1 or BRCA2 results in the failure of cell proliferation through the activation of cell-cycle checkpoints responsive to DNA damage. There is evidence that checkpoints operative during mitosis may be of particular relevance in preventing the proliferation of BRCA2-deficient cells. It therefore seems likely that inactivation of these checkpoints will be an important step in the transformation of cells lacking the BRCA genes.

It is unclear why carcinogenesis in individuals who inherit one mutant allele of BRCA1 or BRCA2 should exhibit a predilection for specific tissues such as the breast or ovary. Both BRCA1 and BRCA2 are expressed in many different cell types and have been implicated in biological functions common to all tissues. Several possible explanations have been proposed, but none is as yet supported by conclusive evidence.


Mismatch Repair Genes 

The identification of the mismatch repair (MMR) genes is probably the only example in which a molecular phenotype combined with cross-species comparison helped in cloning a novel class of cancer genes. Microsatellite analysis of colorectal cancers from members of kindreds with hereditary nonpolyposis colorectal cancer (HNPCC) revealed evidence of replication errors. These replication errors were similar to those identified in yeast with defective MMR genes, which are homologous to bacterial MutS and MutL. Human homologs of the bacterial and yeast MMR genes were cloned and shown to be responsible for the cancer predisposition in HNPCC. Loss of MMR genes, as with other genes with caretaker function, predisposes to cancer by increasing the DNA mutation rate, thereby increasing the chance of inactivation of gatekeeper genes such as APC.
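The caretaker argument can be made concrete with a back-of-the-envelope calculation. The Python sketch below assumes, purely for illustration, that each allele of a gatekeeper gene is hit independently at the same per-division rate; the rates, the 100-fold mutator effect, and the division count are hypothetical values, not measurements from the text.

```python
"""Why a caretaker (mutator) defect matters: chance of two hits in one gatekeeper.

Assumptions (illustrative, not from the text): both alleles of a gatekeeper
gene are inactivated independently by point mutation at the same per-division
rate over a fixed number of divisions. Real second hits often arise by
deletion or LOH at different rates.
"""
import math


def p_allele_hit(mu_per_division: float, divisions: int) -> float:
    """Probability that one allele is mutated at least once: 1 - (1 - mu) ** divisions."""
    return -math.expm1(divisions * math.log1p(-mu_per_division))


def p_both_alleles_hit(mu_per_division: float, divisions: int) -> float:
    """Both alleles hit, assuming the two alleles mutate independently."""
    return p_allele_hit(mu_per_division, divisions) ** 2


DIVISIONS = 1_000
NORMAL_MU = 1e-7               # hypothetical rate with intact mismatch repair
MUTATOR_MU = 100 * NORMAL_MU   # hypothetical 100-fold increase after MMR loss

normal = p_both_alleles_hit(NORMAL_MU, DIVISIONS)
mutator = p_both_alleles_hit(MUTATOR_MU, DIVISIONS)
print(f"normal: {normal:.2e}  mutator: {mutator:.2e}  fold: {mutator / normal:,.0f}")
```

Because two independent hits are required, the probability scales roughly with the square of the mutation rate, so in this sketch a 100-fold mutator effect raises the chance of losing both alleles of a given gatekeeper nearly 10,000-fold.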


Von Hippel-Lindau Gene 

The von Hippel-Lindau (VHL) gene product forms part of a complex that targets proteins for proteolytic degradation. One of the VHL targets is hypoxia-inducible factor-1 (HIF1), which in VHL-negative tumor cells is stabilized irrespective of oxygen concentration, leading to induction of vascular endothelial growth factor (VEGF). Other growth-regulatory molecules are also regulated by proteolysis and are potential VHL targets.


Mutations in Cancer Pathogenesis 


Cancer arises as a result of successive rounds of mutation and clonal selection, leading to the progressive conversion of normal human cells into cancer cells (see Figure 1). Several independent lines of evidence support the contention that cancer is a multistep process. Measurements of age-dependent human cancer incidence have shown that the rate of tumor development is proportional to the 4th to 6th power of elapsed time, implicating four to six rate-limiting, independent stochastic events. Estimates of the number of genetic alterations in colorectal tumorigenesis indicate that carcinomas arise from at least five events. Remarkably, experiments involving the introduction of genes into human cells indicate that a minimum of four distinct pathways must be disrupted to convert a normal cell into a tumor cell: the mitogen-response pathway, the telomere maintenance pathway, the retinoblastoma pathway, and the p53 pathway. In normal cells, genomic integrity at the sequence and karyotypic levels is maintained by complex systems of DNA-monitoring and -repair enzymes and of checkpoint enforcement, and therefore mutations are rare. Disruption of one or more of these systems results in genomic instability, an enabling characteristic that allows mutations to accumulate at the rate required for malignant transformation.
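The link between the age exponent and the number of events is usually made with a multistage model of the kind proposed by Armitage and Doll. The equations below are a sketch under simplifying assumptions not stated in the text: k rare, independent, rate-limiting events must all occur in one cell lineage, event i occurs at a constant rate u_i, P_k(t) is the probability that all k events have occurred by age t, and I_k(t) is the age-specific incidence.

```latex
\begin{aligned}
P_k(t) &= \prod_{i=1}^{k}\left(1 - e^{-u_i t}\right) \approx \Bigl(\prod_{i=1}^{k} u_i\Bigr)\, t^{k}, && u_i t \ll 1,\\
I_k(t) &\approx \frac{dP_k}{dt} \approx k\Bigl(\prod_{i=1}^{k} u_i\Bigr)\, t^{k-1}, && P_k(t) \ll 1,\\
\log I_k(t) &\approx \log\Bigl(k\prod_{i=1}^{k} u_i\Bigr) + (k-1)\log t .
\end{aligned}
```

In this sketch cumulative risk scales as t^k and age-specific incidence as t^(k-1), so the number of events inferred from the slope of a log-log plot depends on which quantity was fitted; the model also ignores cell number and clonal expansion between steps, both of which alter the estimate.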

Clonal evolution through mutation results in the acquisition of capabilities that underlie the malignant transformation process: self-sufficiency in growth signals, insensitivity to antigrowth signals, evasion of apoptosis, evasion of senescence, the ability to induce and sustain angiogenesis, and the ability to invade and metastasize. Inactivation of tumor suppressor genes may underlie the acquisition of one or more of these six essential malignant characteristics (gatekeeper genes) and/or result in the enabling genetic instability (caretaker genes). Tumor suppressor genes are integral components of the molecular pathways that have evolved to maintain cellular homeostasis. It is not clear how many genes have to be mutated in order to disrupt a pathway, and inactivation of some genes can have consequences in several pathways (p53, for example, regulates genomic stability, the cell cycle, and apoptosis). In pancreatic carcinoma, the Rb pathway is abrogated by inactivation of the p16 gene (through point mutation, homozygous deletion, or methylation) in over 95% of cases, and disruption of other members of the pathway (Rb, cyclin D, and CDK4) is almost never seen. This contrasts with the situation in breast cancer, where p16 mutations are rare (as are mutations in Rb) and cyclin D amplification is relatively common. This might indicate that different tissues have different gatekeepers and therefore require different mutations for tumor initiation and progression.


Conclusions 


Much of the recent progress in illuminating the biological functions of tumor suppressor genes and their role in carcinogenesis has proceeded in steps, from identification of the genes by linkage and mutation studies in humans to the analysis of mouse strains in which the genes have been disrupted by targeting. The recent availability of the human genomic DNA sequence, and the anticipated availability of the murine genomic sequence in the near future, should considerably facilitate both identification and biological analysis. Moreover, the widespread availability of sequence information, coupled with new and better techniques to manipulate the genomes of organisms ranging from mice to fruit flies, promises to make possible global searches for genes whose disruption or alteration confers increased cancer predisposition. It will be particularly interesting and important if these studies yield fresh insights into why germline tumor suppressor gene mutations are generally associated with a tissue-specific, rather than ubiquitous, increase in cancer risk. A better understanding of this problem, perhaps through the identification of tissue-specific gatekeeper genes for carcinogenesis, will be an important step in the evolution of new strategies for cancer treatment and prevention.
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Although the syndrome described by Turner in 1938 
comprised infantilism, congenital webbed neck, and 
cubitus valgus in women, it has since been recognized 
that the important feature of Turner syndrome is 
abnormal ovarian development. In affected infants 
the ovaries appear normal and contain oogonia, but 
by adolescence primordial germ cells are virtually 
absent and the ovaries are replaced by thin streaks of 
ovarian stroma. These ‘streak’ gonads are incapable of 
producing sufficient estrogens for feminization and so 
the vulva remains infantile, breasts and pubic and 
axillary hair fail to develop, and there is primary amen- 
orrhea. Short stature is invariable, and the final height 
is usually no more than 140cm. Webbed neck and 
cubitus valgus are only two of a number of associated 
congenital abnormalities that may or may not be 
present. These associated malformations are so 


characteristic that they are often termed “Turner stig- 
mata.’ The important ones are shield chest, webbed 
neck, cubitus valgus, peripheral lymphedema at birth, 
short IVth metacarpals, hypoplastic nails, multiple pig- 
mented naevi, atrial septal defect, bicuspid aortic valve, 
and coarctation of the aorta. The condition presents in 
infancy with malformations, in childhood with short 
stature, and in adolescence with primary amenorrhea. 
Plasma and urinary gonadotrophins are elevated at 
puberty to postmenopausal levels. Substitution ther- 
apy with estrogens allows the adult to develop second- 
ary sex characteristics and to live a comparatively 
satisfactory, although sterile, married life. Pregnancy 
can be achieved by ovum donation, cyclical hormone 
therapy, and IVF. Moderate improvement in stature
(up to an additional 5 cm) is possible by prolonged
treatment with growth hormone started in childhood 
and continued through adolescence. 

The more frequent occurrence of coarctation of the 
aorta in males prompted Polani and colleagues in 1954 
to examine the sex chromatin status of three patients 
with Turner syndrome and coarctation of the aorta. 
Sex chromatin was found to be absent and this was 
interpreted as evidence that the patients were sex- 
reversed males. A study of the frequency of color 
blindness in a large series of patients tended to confirm 
this as the frequency found corresponded with the 
frequency in normal males. However, in 1959 Ford 
and colleagues showed that the sex chromosome com- 
plement consisted of a single X chromosome and no Y. 
While most patients with the complete Turner pheno- 
type and associated malformations have been shown 
to have a 45,X karyotype, a wide variety of variants 
of Turner syndrome are described with structural 
abnormalities of the X or Y chromosomes. Those 
patients with deletions of the short arm of the X 
chromosome tend to have short stature and Turner 
stigmata, whereas those with long-arm deletions of 
the X tend to have few Turner stigmata and stature 
within the normal range. The fact that normal XY 
males do not have Turner stigmata and short stature 
is explained by the existence of X/Y homologous loci 
present in a double dose. In other words, the disabil- 
ities associated with Turner syndrome are the result of 
haploinsufficiency of X/Y homologous loci. 

The more common structural sex chromosome 
abnormalities found in Turner patients include long- 
arm X isochromosome, long-arm Y isochromosome, 
short-arm deletions of the X chromosome and ring-X 
chromosomes. These structural abnormalities are 
invariably associated with mosaicism for 45,X cells. 
46,XX/45,X and 46,XY/45,X are the more common
forms of mosaicism found in patients with features 
of Turner syndrome. The presence of a normal cell 
line clearly modifies the phenotype. 46,XX/45,X cases
may menstruate, and 46,XY/45,X cases may show 
variable degrees of masculinization. Some XY/X 
mosaics may have asymmetrical sex differentiation
with a ‘streak’ gonad on one side and a rudimentary
testis on the other; in these cases Müllerian derivatives
are repressed on the side of the testis, and the external 
genitalia are ambiguous. 

In Turner patients with structural X-chromosome 
aberrations the abnormal X is preferentially inactiv- 
ated. This is reflected in the size of the sex chromatin 
body. Isochromosomes for the long arm of the X have 
measurably larger sex chromatin bodies than normal, 
while X-chromosome deletions have smaller than nor- 
mal sex chromatin. Among cases of 45,X Turner syn- 
drome two classes are recognized depending on the 
parental origin of the X chromosome. Those with a 
paternally derived X chromosome have better social 
communication and cognitive skills than those in 
whom the single X is maternally derived. The findings 
are explained on the basis of differential imprinting of 
maternal and paternal X-linked genes influencing 
behavior. 

The finding that a substantial proportion of early 
spontaneous abortions have a 45,X karyotype indi- 
cates that most conceptions with Turner syndrome 
are inviable. In fact, it has been estimated that 97% 
of 45,X conceptions are lost early in pregnancy. Many 
of those that continue in pregnancy can be recognized 
by ultrasound examination. The most important fea- 
ture is cystic hygroma resulting from hypoplasia of 
the lymphatic system, a precursor not only of the 
webbed neck evident after birth but also of the asso- 
ciated cardiac malformations and peripheral lymphe- 
dema. The frequency of Turner syndrome among 
female live births is believed to be of the order of 1 in
3000.


See also: Imprinting, Genomic; Klinefelter 
Syndrome; Sex Chromatin; Sex Determination, 
Human; X-Chromosome Inactivation 
The twisting number is the total number of base pairs
divided by the number of base pairs per turn of a DNA 
double helix. 
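As a quick check of the arithmetic in this definition, the sketch below divides a total base-pair count by a helical repeat. The default of 10.5 bp per turn is an assumption (a value often quoted for B-form DNA), not something stated in this entry, and `twisting_number` is a hypothetical helper name.

```python
def twisting_number(total_bp: int, bp_per_turn: float = 10.5) -> float:
    """Total base pairs divided by base pairs per helical turn.

    The default of 10.5 bp per turn is an illustrative assumption;
    pass the helical repeat appropriate to the DNA being considered.
    """
    if bp_per_turn <= 0:
        raise ValueError("bp_per_turn must be positive")
    return total_bp / bp_per_turn


# A hypothetical 5250 bp circle at 10.5 bp per turn makes 500 helical turns.
print(twisting_number(5250))  # 500.0
```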


See also: DNA Structure 
See: Transposons as Tools
Tyrosine (Tyr or Y) is one of the 20 amino acids
commonly found in proteins. Its side-chain contains 
a hydrophilic hydroxyl group attached to an extreme- 
ly hydrophobic benzene ring, making its chemical 
properties somewhat ambiguous. Its chemical
structure is given in Figure 1.


Figure 1 Tyrosine.


See also: Amino Acids; Proteins and Protein 
Structure 
Ubiquitin (Ub) is a 76-residue protein that exists in
cells either free or covalently conjugated to other pro- 
teins. Ub is present in all eukaryotes but apparently 
absent from prokaryotes. Among eukaryotes, Ub is 
one of the most conserved proteins. For example, the 
sequences of yeast and mammalian Ub differ by two 
residues out of 76. Ub is a component of a multipath- 
way intracellular proteolytic system called the Ub 
system, or the Ub-proteasome system. Most of the 
Ub-dependent pathways involve processive degrad- 
ation of Ub-conjugated (ubiquitylated) proteins by 
the 26S proteasome, an ATP-dependent multisubunit 
protease. (Note: Ub whose C-terminal (glycine 76) 
carboxyl group is covalently linked to another com- 
pound is called the ubiquityl moiety, with the deriva- 
tive terms being ubiquitylation, ubiquitylated. The 
abbreviation Ub refers to both free ubiquitin and 
the ubiquityl moiety.) 

Ub is conjugated to other proteins (including other 
Ub molecules) through an amide bond, called the 
isopeptide bond, between the C-terminal residue
(glycine 76) of Ub and the ε-amino group of a lysine
residue in a substrate protein. Ub is activated for the 
conjugation reaction by a Ub-activating enzyme (E1), 
which couples ATP hydrolysis to the formation of 
a high-energy thioester bond between glycine 76 of 
Ub and a specific cysteine residue of the E1 enzyme 
(Figure 1). The E1-linked Ub moiety is moved, in a
transesterification reaction, from E1 to a specific 
cysteine residue of a Ub-conjugating enzyme (E2), and 
from there to a lysine residue of an ultimate substrate 
protein, yielding a Ub-protein conjugate (Figure 1). 
This latter step requires participation of another 
component, called E3 or recognin. One function of 
E3 is to select a protein for ubiquitylation through an 
interaction with the protein’s degradation signal. 

Some ubiquitylated proteins (for example, histone 
H2A in mammalian chromosomes) bear a single Ub 
moiety and, in that state, appear to be metabolically
stable. However, most of the other Ub conjugates
contain a substrate-linked multi-Ub chain, in which 
the C-terminal glycine of Ub is linked to an inter- 
nal lysine of an adjacent Ub, resulting in a chain of 
Ub-Ub conjugates containing two or more Ub 
moieties. One function of a substrate-linked multi- 
Ub chain is to facilitate the substrate’s degradation 
by the 26S proteasome, in part through the binding 
of the multi-Ub chain to a specific component of the 
proteasome. 

The covalent bond between Ub and other proteins 
can be cleaved: every eukaryotic cell contains multi- 
ple, ATP-independent proteases that recognize a Ub 
moiety and cleave at the Ub-adduct junction. The
multiplicity of these Ub-specific proteases stems in 
part from the diversity of their targets, which include 
Ub precursors (linear, DNA-encoded fusions of Ub to 
other proteins, including Ub itself); Ub adducts with 
small nucleophiles such as glutathione; and either free 
or substrate-linked multi-Ub chains. 

Degradation signals, or degrons, are features of 
proteins that confer metabolic instability. In most 
cases, Ub is a secondary degradation signal, in that it 
is linked to a protein that bears one of several primary 
degradation signals recognized by the Ub system. 
Degradation signals can be active constitutively or 
conditionally. Signals of the latter class, found in 
many regulatory proteins, including cyclins and tran- 
scription factors, are controlled through phosphoryla- 
tion or interactions with other proteins whose binding 
may either shield a degron or activate it by providing a 
missing determinant. 

Ub-dependent degradation signals consist of two 
major determinants: an amino acid sequence or a con- 
formational feature that is specific for a given degron, 
and a lysine residue, the latter being the site of ubi- 
quitylation. An example of the first determinant is a 
destabilizing N-terminal residue of a short-lived pro- 
tein, which is recognized by a specific E2-E3 targeting 
complex. The set of amino acid residues that are desta- 
bilizing in a given cell yields a rule, called the N-end 
rule, which relates the in vivo half-life of a protein to
the identity of its N-terminal residue. The corresponding
Ub-dependent system, called the N-end
rule pathway, is one of several proteolytic pathways
of the Ub system.

Figure 1 Ubiquitin pathways.

Among the other Ub-dependent pathways are the
systems whose large E3-E2 complexes target for
degradation a class of conditionally
unstable regulatory proteins called cyclins. The result- 
ing periodic changes in the concentrations of cyc- 
lins (which function as subunits of cyclin-dependent 
kinases) drive and regulate the cell cycle. For example, 
mitotic cyclins are destroyed specifically at the end of 
mitosis, in part through the recognition of a sequence 
motif in these cyclins called the ‘destruction box.’ 
Most, though not all, of the damaged and otherwise 
abnormal cytosolic and nuclear proteins are recog- 
nized and selectively destroyed by the Ub system, 
apparently through the exposure of their normally 
buried degrons. Moreover, proteins which are trans- 
located from the cytosol across the endoplasmic re- 
ticulum (ER) membrane into the ER but fail to fold 
properly can be selectively transported back to the
cytosol for their degradation by the Ub system. In
addition, the conjugation of Ub to some membrane 
proteins acts as a signal for their endocytosis and 
delivery to lysosomes, in contrast to the more com- 
mon function of Ub as a signal for the proteasome- 
mediated proteolysis in the cytosol. Under certain 
conditions, Ub can also function as a molecular chap- 
erone: the transient, cotranslational linkage of Ub to 
specific ribosomal proteins was found to be essential 
for the efficient biogenesis of ribosomes. Eukaryotic 
cells also contain several homologs of Ub, called Ub- 
like proteins, which, similarly to Ub, exist either free 
or conjugated to other proteins. The conjugation of 
Ub-like proteins to their substrates involves distinct 
sets of conjugating enzymes and, in contrast to Ub 
conjugation, appears to have functions different from 
proteolysis. 


The vast functional range of Ub stems from the fact 
that a large fraction of intracellular proteins (cyclins, 
transcription factors, components of signal transduc- 
tion pathways, damaged proteins) are physiological 
substrates of the Ub system. This system has been 
shown to play major roles in a legion of biological
processes: the cell cycle, cell growth and differentiation,
embryogenesis, apoptosis, signal transduction, DNA 
repair, regulation of DNA transcription, replication 
and segregation, transmembrane transport, endocy- 
tosis, stress responses, antigen presentation, other 
aspects of the immune response, and functions of the 
nervous system, including circadian rhythms, axon 
guidance, and acquisition of memories. A number of 
tumor suppressors and proto-oncoproteins are specif- 
ic components of the Ub system. Viruses often target 
Ub-dependent proteolysis to bypass or suppress the 
host’s immune response. The Ub system and its per- 
turbations have been implicated in the pathogenesis of 
cancer, senescence, bacterial and viral infections, spe- 
cific genetic syndromes, and major neurodegenerative 
diseases. 


Further Reading 

Hershko A, Ciechanover A and Varshavsky A (2000) The
ubiquitin system. Nature Medicine 6: 1073-1081.

Hicke L (1997) Ubiquitin-dependent internalization and down- 
regulation of plasma membrane proteins. FASEB Journal 84: 
277-287. 

Peters J-M, Harris JR and Finley D (eds) (1998) Ubiquitin and the 
Biology of the Cell. New York: Plenum Press. 


See also: Cell Cycle; Cyclin-Dependent Kinases; 
Mitosis; Proteolysis; Proteome 


Umber Mutation 
See: Nonsense Mutation; Start, Stop Codons 


Underdominance 
‘Underdominance’ or ‘heterozygote inferiority’ refers
to cases of natural selection in diploid organisms 
where the fitness of genetic heterozygotes, carrying 
two different forms (alleles) of a gene (e.g., A₁A₂), is
strictly less than that of both of the corresponding
homozygotes, which carry two copies of one of
those alleles (e.g., A₁A₁ and A₂A₂). This situation is
perhaps most commonly found in interspecific 
hybrids formed by the mating of two distinct biological
species. In such cases, each parental species may be
fixed for one allele at a certain genetic locus that has 
proved advantageous in its typical habitat, and all 
individuals of that species are accordingly homozy- 
gous for that allele; hybrids carrying a mixed combin- 
ation of the two parental alleles sometimes have 
reduced viability or fertility in the two parental envir- 
onments and thus show underdominance there. In 
other, intermediate or alternative environments, how- 
ever, the hybrids may instead have an intermediate 
fitness, or even enjoy a fitness advantage over both 
parental species, since the relative fitness of genotypes 
is often habitat or environment specific. 


Evolutionary Outcome under Constant 
Underdominant Selection 


The evolutionary outcome from underdominance is 
readily predicted if there are only two alleles at a 
certain locus, e.g., A₁ and A₂, and heterozygotes have
a constant fitness disadvantage through a lower viabil- 
ity (rate of survival from birth to reproduction) than 
either of the associated homozygous genotypes. If no 
other evolutionary forces such as mutation, nonran- 
dom mating, migration, or genetic drift act on this 
genetic locus, constant underdominant selection will 
always eliminate one of the two alleles from the popu- 
lation, with the frequency of one allele monotonically 
declining to zero, and that of the other monotonically 
increasing to 1. 

Which allele is lost (frequency becomes zero) and 
which fixed (frequency becomes 1) is readily predicted 
and depends on the relative fitnesses of the three 
genotypes, as well as upon the initial allele frequencies 
in the population. This works as follows: the relative 
fitness values determine a threshold allele frequency 
for each allele, such that populations with frequencies 
initially below that value will lose that allele, while, in 
populations with higher initial allele frequencies, the 
frequency of that allele will increase to 1 and the other 
allele will be lost. In general, an allele increases in 
frequency under constant viability selection if and 
only if it has the higher average, or marginal fitness, 
which equals the weighted average of the fitnesses of 
the two genotypes that carry the allele, with the 
weights given by the frequency of the other allele in 
that genotype. The marginal fitness of allele A₁, for
instance, equals the fitness of A₁A₁ weighted by the
frequency of A₁, plus the fitness of A₁A₂ weighted by
the frequency of A₂. In underdominance, the marginal
fitnesses of the two alleles vary in their relative mag- 
nitudes, such that each allele enjoys the advantage 
when it is at sufficiently high frequencies, but not at 
lower frequencies. 
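The marginal-fitness rule can be illustrated with a short Python sketch. It assumes random mating, so that an A₁ allele is paired with another A₁ at frequency p and with A₂ at frequency q = 1 − p. The function name and the illustrative genotype fitnesses (1.1, 1.0 and 1.2 for A₁A₁, A₁A₂ and A₂A₂) are choices made here for demonstration.

```python
def marginal_fitnesses(p: float, w11: float, w12: float, w22: float) -> tuple[float, float]:
    """Return (w1, w2), the marginal fitnesses of alleles A1 and A2.

    p is the frequency of A1 and q = 1 - p that of A2. Assuming random mating,
    an A1 allele sits in an A1A1 genotype with probability p and in A1A2 with
    probability q (and correspondingly for A2).
    """
    q = 1.0 - p
    w1 = p * w11 + q * w12  # A1A1 weighted by freq(A1), A1A2 weighted by freq(A2)
    w2 = p * w12 + q * w22  # A1A2 weighted by freq(A1), A2A2 weighted by freq(A2)
    return w1, w2


# Illustrative fitnesses: A1A1 = 1.1, A1A2 = 1.0, A2A2 = 1.2.
for p in (0.7, 0.6):
    w1, w2 = marginal_fitnesses(p, 1.1, 1.0, 1.2)
    winner = "A1" if w1 > w2 else "A2"
    print(f"p = {p}: w1 = {w1:.3f}, w2 = {w2:.3f} -> {winner} increases")
```

At p = 0.7 allele A₁ has the higher marginal fitness; at p = 0.6 allele A₂ does. This is the frequency-dependent reversal that characterizes underdominance.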
The threshold allele frequency determining when
each allele is fixed or eliminated by selection can be
specified precisely if we introduce some notation and
let the viabilities of the two homozygotes, A₁A₁ and
A₂A₂, be 1 + s and 1 + t, respectively, relative to a
value of 1 for A₁A₂ heterozygotes. The critical threshold
frequency for allele A₁ is then t/(s + t). The frequency
of A₁ will steadily decrease to zero (and that of
the alternative allele A₂ will increase to 1) if it is
initially below this threshold value, and A₁ will steadily
increase to 1 (and the alternative allele A₂ will
decrease to zero) if it starts above t/(s + t). The
frequency of allele A₁ will remain at t/(s + t) if it starts
exactly there. This critical allele frequency is an
example of an unstable equilibrium and unstable
polymorphism, since the population always moves away
from this polymorphic equilibrium value if it is
perturbed from it. (Since there are only two alleles at this
locus, their frequencies add to 1, implying that the
corresponding threshold frequency for the alternate
allele A₂ is s/(s + t).)
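The threshold follows from setting the two marginal fitnesses equal: p(1 + s) + q = p + q(1 + t), which reduces to ps = qt and hence p = t/(s + t). Below is a minimal sketch of that calculation, using Python's `fractions.Fraction` for exact arithmetic. The helper name is hypothetical.

```python
from fractions import Fraction


def threshold_frequency(s, t):
    """Unstable equilibrium frequency of A1.

    Viabilities are 1 + s (A1A1), 1 (A1A2) and 1 + t (A2A2). Equal marginal
    fitnesses, p(1 + s) + q = p + q(1 + t), simplify to p*s = q*t, so
    p* = t / (s + t). Underdominance requires s > 0 and t > 0.
    """
    if s <= 0 or t <= 0:
        raise ValueError("underdominance requires s > 0 and t > 0")
    return t / (s + t)


p_star = threshold_frequency(Fraction(1, 10), Fraction(2, 10))
print(p_star, 1 - p_star)  # thresholds for A1 and A2: 2/3 1/3
```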

Figure 1 shows underdominant allele frequency
dynamics. This represents a numerical example in
which the relative fitnesses of the three genotypes
(A₁A₁, A₁A₂, and A₂A₂) are 1.1, 1.0, and 1.2,
respectively, so that s = 0.1 and t = 0.2. The threshold
frequency for allele A₁ is thus t/(s + t) = 0.2/(0.1 + 0.2) =
2/3. As shown by the upper and lower allele frequency
trajectories in Figure 1, the frequency of allele A₁
steadily increases to 1 if its frequency starts above
2/3 (e.g., from 0.7), and steadily declines to zero if
its frequency is initially below two-thirds (e.g., from
0.6). The frequency of allele A₁ remains at 2/3 should
it start at precisely that value.

Figure 1 Trajectory through time in generations of the
frequency of allele A₁, subject to constant underdominant
selection (see text), for three initial allele frequencies (p₀).
The alternate allele does the reverse, decreasing in
frequency to zero when allele A₁ increases to 1, and vice versa.

Note that in this case the threshold frequency for
allele A₁ is more than 0.5, which means that its
frequency declines to 0 (and that of the alternate allele
increases to 1) for more than half the possible initial
values; in this example, for all allele frequencies less
than 2/3. This asymmetrical disadvantage to allele A₁
will occur whenever A₁A₁ homozygotes have the
lower fitness (of the two homozygotes), as they do
here (1.1 versus 1.2). If A₁A₁ homozygotes instead
have the higher fitness, then the threshold frequency
for allele A₁ is below 0.5 and its frequency will
increase to 1 (and that of the alternate allele decrease
to 0) for more than half its possible initial values.

Figure 1 was computed using the recursion equation
predicting the new frequency of allele A₁ (p′) after
one generation of underdominant selection, in terms
of its previous frequency (p) and the fitness advantages
(s and t) of the two homozygotes. This is given by:

$$p' = p\,\frac{1 + sp}{1 + sp^2 + tq^2}$$

where q = 1 − p is the frequency of the alternate allele A₂.
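The recursion can be iterated directly. The sketch below works under the assumptions already stated (random mating, no mutation, migration or drift). The function names and the 300-generation horizon are arbitrary choices, not taken from the computation behind the figure.

```python
def next_frequency(p: float, s: float, t: float) -> float:
    """One generation of underdominant viability selection.

    p' = p(1 + s p) / (1 + s p^2 + t q^2), with fitnesses 1 + s (A1A1),
    1 (A1A2) and 1 + t (A2A2); the denominator is the mean fitness.
    """
    q = 1.0 - p
    mean_fitness = 1.0 + s * p * p + t * q * q
    return p * (1.0 + s * p) / mean_fitness


def trajectory(p0: float, s: float, t: float, generations: int) -> list[float]:
    """Frequencies of A1 from generation 0 through `generations`."""
    freqs = [p0]
    for _ in range(generations):
        freqs.append(next_frequency(freqs[-1], s, t))
    return freqs


# The three starting frequencies discussed in the text, with s = 0.1 and t = 0.2.
for p0 in (0.7, 2 / 3, 0.6):
    final = trajectory(p0, 0.1, 0.2, 300)[-1]
    print(f"p0 = {p0:.3f} -> p after 300 generations = {final:.4f}")
```

With these values, the run starting at 0.7 approaches 1 and the run starting at 0.6 approaches 0. The run starting at 2/3 stays close to 2/3, apart from floating-point rounding error, which the unstable equilibrium slowly amplifies.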

Lastly, it should be emphasized that Figure 1 gives
just one numerical example of how the relative fitnesses
of the three genotypes set the threshold frequency that
determines which allele is lost. The
strength of selection, which depends on the magnitude 
of the differences among the relative fitnesses, also 
strongly affects the time until equilibrium is reached 
and the relevant allele is lost from the population. 
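To see how the strength of selection affects timing, the threshold can be held fixed (t/(s + t) = 2/3) while both s and t are scaled down tenfold. The sketch then counts generations until the frequency of A₁ comes within a small tolerance of 0 or 1. The tolerance of 0.001 and the function name are arbitrary assumptions.

```python
def generations_to_near_fixation(p0: float, s: float, t: float,
                                 tol: float = 1e-3,
                                 max_generations: int = 1_000_000) -> int:
    """Generations until the frequency of A1 is within `tol` of 0 or 1.

    Applies p' = p(1 + s p) / (1 + s p^2 + t q^2) once per generation.
    `tol` and `max_generations` are arbitrary illustrative choices.
    """
    p, generations = p0, 0
    while tol < p < 1.0 - tol:
        if generations >= max_generations:
            raise RuntimeError("frequency stayed near the unstable equilibrium")
        q = 1.0 - p
        p = p * (1.0 + s * p) / (1.0 + s * p * p + t * q * q)
        generations += 1
    return generations


# Same threshold t/(s + t) = 2/3 in both cases; selection is ten times weaker in the second.
strong = generations_to_near_fixation(0.7, 0.1, 0.2)
weak = generations_to_near_fixation(0.7, 0.01, 0.02)
print(f"strong selection: {strong} generations; weak selection: {weak} generations")
```

Both runs end with A₁ fixed, because both start above the same threshold. The weaker-selection run, however, takes many more generations to get there.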


Further Reading 

Hartl DL and Clark AG (1997) Principles of Population Genetics, 
3rd edn, ch. 6. Sunderland, MA: Sinauer Associates. 

Hedrick PW (2000) Genetics of Populations, 2nd edn, ch. 3.
Sudbury, MA: Jones & Bartlett.


See also: Balanced Polymorphism; Fitness 
Underwinding of DNA is produced by negative
supercoiling, when the double helix itself is coiled in 
the opposite sense from the intertwined strands. 


See also: DNA Supercoiling; Negative 
Supercoiling; Overwinding 


Unidirectional Replication
Unidirectional replication occurs with the movement
of a single replication fork from a given origin. 


See also: Replication; Replication Fork 
Definition


Unequal crossing over occurs as a result of pairing 
between similar chromosome segments residing at 
different chromosome loci. This occurs most readily 
when tandemly repetitious segments are present close 
together on the same chromosome, since in this case 
only a slight mutual slippage in normal meiotic pairing 
is required. Unequal crossing-over in this situation 
yields products differing only in that one has an 
extra tandem repeat at the expense of the other. 


Bar in Drosophila 


The classical example is the Bar mutant of Drosophila 
melanogaster, investigated in the 1920s and 1930s by 
Sturtevant and Bridges. Bar is a tandem duplication 
of a segment of the X chromosome comprising about 
six polytene chromosome bands. In the male or the 
homozygous female it causes an extreme narrowing of 
the eye; when heterozygous with wild-type in the 
female it has a less extreme effect. Stocks of Bar are 
not completely stable, mutating to wild-type or a more 
extreme mutant allele called Double-bar, with a fre- 
quency of the order of 1 in 1000 flies. Examination of 
the polytene chromosomes shows that the apparently 
wild-type revertants had lost the duplicated segment 
and that Double-bar was a tandem triplication. The 
explanation in terms of out-of-phase meiotic pairing 
of the repeats and unequal crossing over is shown in 
Figure 1.
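The out-of-phase explanation can be expressed as copy-number arithmetic. In this sketch, copy i of one homolog pairs with copy j of the other, and the exchange falls within that paired copy. The function name and the 1-based indexing convention are assumptions made for illustration. The total number of copies is conserved, so one product gains what the other loses.

```python
def unequal_crossover_counts(m: int, n: int, i: int, j: int) -> tuple[int, int]:
    """Tandem-copy numbers of the two products of a single exchange.

    Homolog A carries m tandem copies and homolog B carries n. Copy i of A
    (1-based) is paired with copy j of B, and the crossover occurs within
    that paired copy. Product 1 has A's left end and B's right end;
    product 2 has B's left end and A's right end.
    """
    if not (1 <= i <= m and 1 <= j <= n):
        raise ValueError("paired copies must exist on both homologs")
    product_1 = (i - 1) + 1 + (n - j)  # A copies before i, the exchanged copy, B copies after j
    product_2 = (j - 1) + 1 + (m - i)  # B copies before j, the exchanged copy, A copies after i
    return product_1, product_2


# Homozygous Bar female: each X carries the duplicated segment (2 copies).
print(unequal_crossover_counts(2, 2, 1, 1))  # (2, 2): in-register pairing, Bar and Bar
print(unequal_crossover_counts(2, 2, 2, 1))  # (3, 1): Double-bar and wild-type
```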

From his original experiment, Sturtevant con- 
cluded that the wild-type and double-bar derivatives 
resulted from unequal crossing-over. This was made 
virtually certain by the finding that they nearly all 
showed recombination of allelic differences at two 
loci, fused (fu) and forked (f), closely placed on each
side of the Bar locus (Figure 1). Though resulting
from out-of-phase pairing, the unequal crossovers
still had the normal crossover property of interference,
reducing the probability of crossing-over in an
adjacent chromosome interval.


Unequal Exchanges between Sister 
Chromatids in Drosophila? 


In Drosophila, as in other eukaryotes, certain genes 
are naturally tandemly repeated to high copy number; 
the ribosomal RNA-encoding (rDNA) sequences are 
the prime example. Drosophila rDNA occurs in about 
400 copies, divided about equally between the X and Y 
chromosomes. The bobbed (bb) series of mutations, 
detectable through their effect on bristles, are drastic 
reductions in rDNA copies. A bobbed fly stock with a
special Y chromosome (Ybb⁻) with mutant bobbed,
and an X chromosome with another reduced copy 
number bb allele (Xbb) shows instability of rDNA 
copy number on the X chromosome of male flies. 
New alleles, either with magnification of copy num- 
ber up to wild-type levels or further diminution, arise 
with high frequency (although the diminished alleles 
are recovered less often because of their low viability). 
Unequal meiotic crossing-over between homologous 
sequences appears to be ruled out here, since homolog 
exchange does not occur in Drosophila males. Perhaps 
the most plausible hypothesis is unequal sister-strand 
exchange within the single X, both at meiosis and 
mitosis. It is not known what feature of the Ybb⁻
chromosome induces this instability in the X. 


Figure 1 The reversion of the X-linked Bar mutant of
Drosophila melanogaster to wild-type, and the origin of 
the more extreme mutant Double-bar, by slippage in the 
pairing of the Bar duplication and unequal crossing-over. 
The females whose progeny were screened were 
homozygous Bar and heterozygous with respect to 
two closely-placed flanking mutations, fused (fu) and 
forked (f). The non-Bar (‘wild-type’) and double-Bar
derivatives were, with very few exceptions, reciprocally- 
constituted crossovers with respect to the flanking 
markers. 
Figure 2 The presumed modes of origin of the human
mutant globins Lepore and anti-Lepore by out-of-register
pairing and crossing-over between the β- and
δ-globin genes. The two kinds of variant globin were
identified in different families.


Unequal Crossing-Over in Yeast 


In more recent years, the new techniques of DNA 
manipulation and transformation have been used to 
‘engineer’ tandem duplications in the budding yeast, 
Saccharomyces cerevisiae. Here the frequency of 
unequal crossing-over is much higher than in Dros- 
ophila; in one study, about one-fourth of all the cross- 
overs in the region of a tandem repeat were unequal, 
with one participating chromatid gaining the gene 
copy that the other had lost. In yeast the situation is 
complicated by the fact that some of the gains and 
losses of repeats are due to conversion events without 
crossing-over. 


The Human β-Globin Gene Cluster


In humans, as in other mammals, the genes encoding
the globin polypeptides of hemoglobins occur in
tandemly arranged clusters, one for α-related and one for
β-related globins. The human β-globin cluster
consists, in order of linkage, of the genes ε (embryonic
globin), Gγ, Aγ (two closely similar forms of fetal
globin), ψβ (a degenerate and nonfunctional β-related
pseudogene), and δ and β (δ and β adult globins). It is
speculated that this tandem array of homologous genes 
resulted from unequal crossing-over, either meiotic 
or mitotic, in the course of vertebrate evolution, 
although the initial generation of two copies from 
one must have occurred without much homology. 
Following the generation of repeats, the initially 
redundant extra copies would seem to have either 
diverged in the times of their expression and, to dif- 
ferent extents, in their coding sequences, or decayed by 
mutation to loss-of-function, as in the pseudogenes. 
This gene arrangement still entails some small risk
of expansion and deletion by unequal exchange.

One abnormal, though functional, type of hemoglobin,
called hemoglobin Lepore, is the result of the
replacement of the normally separate δ and β genes by
a single δ-β chimera, with the N-terminal sequence of
δ joined to the C-terminal part of β. The reciprocal
condition, called anti-Lepore, has also been found,
with the N-terminus of β joined to the C-terminus of
δ, and normal copies of the β and δ genes retained
(Figure 2). The β and δ genes, which are relatively
close together and more than 90% identical, are
presumably more prone to this kind of mistake than the
other members of the cluster.
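The same out-of-register logic can be applied to gene labels rather than copy counts. The sketch below uses a hypothetical helper, models only the δ and β genes (written 5′ to 3′), pairs δ on one chromosome with β on the other, and crosses over within them. The two products correspond to the Lepore and anti-Lepore arrangements described above.

```python
def unequal_exchange(chrom_a: list[str], chrom_b: list[str],
                     i: int, j: int) -> tuple[list[str], list[str]]:
    """Recombinant products of an exchange within gene i of chrom_a,
    paired out of register with gene j of chrom_b (0-based, genes 5' to 3').

    The exchanged gene becomes a hybrid written "X-Y": its N-terminal (5')
    part comes from X and its C-terminal (3') part from Y.
    """
    product_1 = chrom_a[:i] + [f"{chrom_a[i]}-{chrom_b[j]}"] + chrom_b[j + 1:]
    product_2 = chrom_b[:j] + [f"{chrom_b[j]}-{chrom_a[i]}"] + chrom_a[i + 1:]
    return product_1, product_2


# Only delta and beta are modelled; the rest of the cluster is omitted.
normal = ["delta", "beta"]
lepore, anti_lepore = unequal_exchange(normal, normal, 0, 1)
print(lepore)       # ['delta-beta']
print(anti_lepore)  # ['delta', 'beta-delta', 'beta']
```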


Further Reading 

Bridges CB (1936) The Bar ‘gene’ is a duplication. Science 83: 210. 

Hawley ES and Tartof K (1985) A two-stage model for the
control of rDNA magnification. Genetics 109: 691-700.

duplicated genetic elements in Saccharomyces cerevisiae.
Genetics 109: 303-332.

Sturtevant AH (1961) Further study of the so-called mutation at 
the Bar locus of Drosophila. Reprinted in Genetics and 
Evolution — Selected Papers of A. H. Sturtevant, pp. 107-114. 
See also: Globin Genes, Human; rDNA
Amplification; Tandem Repeats 
When offspring inherit their genotype from only one
parent, this is known as uniparental inheritance. This 
term can be applied to a wide range of genetic events, 
some examples of which are given below. Much of 
what follows addresses uniparental inheritance in 
diploid organisms. Uniparental inheritance in plants 
will not be considered here. Nonetheless, a substantial 
proportion of the earth’s biomass is composed of 
organisms with a predominantly haploid life cycle 
such as fungi and algae and these deserve a mention. 
For example, the fission yeast Schizosaccharomyces 
pombe normally reproduces asexually during vegeta- 
tive growth, thus propagating the haploid state by 
uniparental inheritance. This yeast is also, however, 
capable of sexual reproduction to produce diploid 
cells. In response to starvation, these diploids will 
sporulate and give rise to haploid cells again by meio- 
sis. When nutrients are plentiful, budding yeast strains 
such as the baker’s yeast Saccharomyces cerevisiae 
prefer to proliferate as diploid cells. On starvation 
they too will undergo meiosis and can subsequently
either proliferate asexually as haploids or sexually by
mating with cells of the opposite mating type to form 
diploids. Yeast haploids are a valuable genetic tool for 
the identification and analysis of mutant genes in key 
cellular pathways such as cell-cycle control. In the life 
cycle of the unicellular freshwater alga, Chlamydo- 
monas, uniparental inheritance is evident through the 
asexual reproductive activity of the haploid cells. Once 
again, in adverse environmental conditions, some of
these cells are transformed into gametes and a pair 
fuse to form a diploid zygote. In favorable conditions 
after this sexual phase, meiosis occurs which gives rise 
to a new haploid generation. 

Mitochondria are organelles which occupy a sub- 
stantial cytoplasmic portion of the eukaryotic cell. 
They are responsible for completing the energy
conversion used to drive cellular reactions. These
organelles contain DNA which in mammals is about 10⁻⁵
times the size of the nuclear genome and are capable of 
carrying out their own DNA replication, transcrip- 
tion and protein synthesis. The mitochondrial genome 
in humans encodes tRNAs, rRNAs and 13
polypeptides. Mitochondrial genes undergo non-Mendelian
cytoplasmic inheritance, which in higher mammals is
uniparental because the egg contributes much more 
cytoplasm to the zygote than does the sperm. Hence, 
this uniparental inheritance is maternal in origin. 


Parthenogenesis in Invertebrates 


Parthenogenesis can be defined as the production of 
an embryo from a female gamete without any genetic 
contribution from a male gamete, with or without the 
eventual development into an adult. It is distinct from 
asexual reproduction since it involves the production 
of egg cells. Parthenogenesis is a normal method of 
reproduction in many lower organisms, but does not 
lead to viable mammalian offspring. Parthenogenetic 
development can proceed by various routes depending 
on whether meiosis has occurred or has been
suppressed, in which case the egg develops as a result of
mitotic divisions. Whenever sex is determined by
chromosome constitution, parthenogenetic offspring
will, in the absence of effective meiosis, be all or mostly
female. In birds, however (see below), the offspring
are male as in this case females are the heterogametic 
sex. In bees, males originate by haploid
parthenogenesis, while diploid females are produced by
fertilization in the normal way. Other insects, such as
aphids (greenfly; Hemiptera), have generations that alternate
between parthenogenesis and fertilization, so-called
cyclical parthenogenesis. The formation of female
parthenogenetic offspring is widespread among
many orders of insects. For example, in Drosophila
parthenogenetica, a small proportion of eggs laid by
virgin females develop to produce viable adults.
Another example is the parthenogenetic grasshopper 
Warramaba virgo, a species which consists of females 
only. Parthenogenesis is also successful in some 
Crustaceae such as the brine shrimp, Artemia salina. 
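The sex outcomes described above can be sketched as a small lookup under simplifying assumptions. Offspring produced with meiosis suppressed (called apomixis here) are genetic copies of the mother. Offspring produced after meiosis either develop as haploids or have their single chromosome set doubled. Doubling is modeled here as gametic duplication, which is only one of several ways diploidy may be restored. WW birds are treated as inviable. All names are hypothetical. In the bee case the labels A are placeholders for a chromosome set, because bee sex is modeled here by ploidy alone.

```python
from collections import Counter

# Sex of a genotype under two chromosomal sex-determination systems.
SEX_TABLES = {
    "XY": {"XX": "female", "XY": "male", "YY": "inviable"},
    "ZW": {"ZZ": "male", "WZ": "female", "WW": "inviable"},
}


def sex_of(genotype: tuple[str, ...], system: str) -> str:
    if system == "haplodiploid":
        # Bees: haploid individuals are male, diploid individuals female.
        return "male" if len(genotype) == 1 else "female"
    return SEX_TABLES[system]["".join(sorted(genotype))]


def offspring_sexes(mother: tuple[str, str], mechanism: str, system: str) -> Counter:
    """Count the equally likely sexes of parthenogenetic offspring."""
    if mechanism == "apomixis":               # meiosis suppressed: copy of mother
        genotypes = [mother]
    elif mechanism == "gametic_duplication":  # meiosis, then haploid set doubled
        genotypes = [(c, c) for c in mother]
    elif mechanism == "haploid":              # meiosis, no doubling
        genotypes = [(c,) for c in mother]
    else:
        raise ValueError(f"unknown mechanism: {mechanism}")
    return Counter(sex_of(g, system) for g in genotypes)


print(offspring_sexes(("X", "X"), "apomixis", "XY"))             # all female
print(offspring_sexes(("Z", "W"), "gametic_duplication", "ZW"))  # male or inviable
print(offspring_sexes(("A", "A"), "haploid", "haplodiploid"))    # all male
```

The ZW case shows why parthenogenetic birds are male. Restoring diploidy after meiosis yields ZZ or WW, and only ZZ survives.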


Uniparental Inheritance in Vertebrates 


In vertebrates, experimentally
induced parthenogenesis has been used to test the
developmental potential of parthenogenetically pro-
duced embryos. For example, in several species of
amphibia including Rana japonica, R. nigromaculata, 
and R. pipiens, viable parthenogenetic adults have 
been described. Parthenogenetic fish have also been 
reported and in these abnormal situations it is thought 
that the egg activation may result from infection with 
the phycomycete Ichthyophonus hoferi. Surpris-
ingly, for some reptiles of the order Squamata, the
natural mode of reproduction is through partheno- 
genesis in some unisexual populations, and through 
fertilization in populations with two sexes. 

In warm-blooded vertebrates, parthenogenetic off- 
spring are less viable and there is no known instance of 
parthenogenesis as the normal mode of reproduction. 
Nonetheless, in some breeds of chicken and turkey, 
parthenogenetic development has been reported. In 
the latter case, the parthenogenetically produced 
male turkeys (in birds, males are the homogametic 
sex) were small with reduced fertility. In mammals, 
spontaneously occurring cleavage divisions in oocytes
have been described, notably in the LT/Sv strain of
mice, in which parthenogenesis occurs regularly in a
small percentage of virgin females, resulting in
development to the blastocyst stage. A high incidence
of ovarian teratomas is also seen in this mouse strain.
These benign teratomas are derived from germ cells 
and consist of both differentiated and undifferentiated 
cells. In humans, benign teratomas (dermoid cysts) 
account for 15-20% of ovarian neoplasms, and there
have been several reports of differentiated human 
tissue within these teratomas. 

The inability of activated mammalian eggs to 
develop into viable term fetuses is most likely due to 
the phenomenon of genomic imprinting. Genomic 
imprinting is a process that causes some genes to be 
expressed solely from one parental allele. Some im- 
printed genes are expressed from the maternal chromo- 
some and others from the paternally inherited 
homolog. This results in a requirement for both par- 
ental genomes in order for development to proceed 
normally. The embryological consequences of geno- 
mic imprinting have been studied experimentally in 
mice. Uniparental conceptuses were generated by egg 
activation to make parthenogenones, or via pronuclear 
transplantation to make bimaternal gynogenones or
bipaternal androgenones. Gynogenetic and partheno- 
genetic conceptuses die around midgestation and 
exhibit growth retardation and poor development of 
extraembryonic tissues. If these embryos are given 
normal extraembryonic components, development 
will proceed only slightly further, indicating that the 
lethality is not solely due to the failure of the placenta 
but that a diploid maternal contribution to the embryo 
itself is incompatible with life. Androgenetic embryos
fare less well than their bimaternal counterparts. 
These embryos rarely survive gastrulation and usually 
the conceptus resembles a mass of extraembryonic 
tissue in the absence of embryonic components. 
These findings suggest that a maternal genome is 
required for the development of embryonic compon- 
ents and a paternal genome for the development of 
extraembryonic tissues, at least at these early stages. 
Androgenetic conceptuses are reminiscent of the com- 
plete hydatidiform molar pregnancy in humans which 
has a diploid paternal genotype in the absence of a 
maternal genetic component. A small number of com-
plete moles have been reported that contain dif-
ferentiated embryonic cells. The vast majority of complete
hydatidiform moles are diploid and are likely to have
arisen through duplication of the paternal genome 
(homozygous) or by dispermy (heterozygous). The 
majority of partial moles are triploid with the extra 
genome set being paternal in origin. 
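The requirement for both parental genomes can be illustrated by counting the active copies of imprinted genes in each kind of conceptus. This is a simplified sketch that treats imprinting as all-or-none. It uses four well-known mouse imprinted genes purely as examples: Igf2 and Peg3 are expressed from the paternal allele, and H19 and Igf2r from the maternal allele. None of these genes is named in the text above, and the dictionaries and functions are hypothetical.

```python
# Parent whose allele is expressed; mouse examples chosen for illustration only.
IMPRINTED = {
    "Igf2": "paternal",
    "Peg3": "paternal",
    "H19": "maternal",
    "Igf2r": "maternal",
}

CONCEPTUS_TYPES = {
    (1, 1): "normal biparental conceptus",
    (2, 0): "gynogenone or parthenogenone",
    (0, 2): "androgenone (cf. complete hydatidiform mole)",
    (1, 2): "triploid with extra paternal set (cf. partial mole)",
}


def active_copies(maternal_sets: int, paternal_sets: int) -> dict[str, int]:
    """Active copies of each imprinted gene: one per set from the expressing parent."""
    return {
        gene: maternal_sets if parent == "maternal" else paternal_sets
        for gene, parent in IMPRINTED.items()
    }


def describe(maternal_sets: int, paternal_sets: int) -> tuple[str, list[str]]:
    """Return the conceptus type and the imprinted genes left completely silent."""
    copies = active_copies(maternal_sets, paternal_sets)
    silent = sorted(gene for gene, n in copies.items() if n == 0)
    label = CONCEPTUS_TYPES.get((maternal_sets, paternal_sets), "other")
    return label, silent


for genomes in [(1, 1), (2, 0), (0, 2), (1, 2)]:
    print(genomes, *describe(*genomes))
```

In this model, each uniparental conceptus lacks the entire class of genes expressed only from the missing parent. The triploid case instead carries extra doses of the paternally expressed genes.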

Mouse chimeras made with normal cells and either
parthenogenetic or androgenetic cells allow the
developmental potential of these experimental cells
to be studied at later stages. The presence of
normal cells rescues the lethality observed in the par- 
thenogenetic embryos. The resulting chimeras are 
small, but morphologically normal and fertile. Par- 
thenote cells are predominantly found in tissues of 
ectodermal and neurectodermal origin and are not 
well represented in mesodermal derivatives. In con-
trast, androgenetic chimeras develop to term only if
their contribution is low. Resulting chimeras show 
dysmorphology of the axial skeleton and embryos 
show growth enhancement. The androgenetic cells 
do not favour neurectodermal lineages but prefer to 
contribute to mesoderm. Both androgenetic and 
parthenogenetic cells can contribute to the germline. 
These studies suggest that imprinted genes have roles 
to play in regulating growth and in the development 
of mesodermal and neurectodermal lineages. This is 
consistent with the functions of the small number of 
imprinted genes identified to date. 

Successful androgenesis has been reported to occur 
naturally in interspecific hybrids of the Sicilian stick
insect (Bacillus rossius). In contrast with the failure of
mammalian androgenesis, androgenetic development
can be induced to occur in lower organisms including
some vertebrates, notably fish. In fish research, andro- 
genesis has been used successfully to generate homo- 
zygous lines of fish. In addition, the fish species 
Oncorhynchus mykiss and Oreochromis niloticus have
been recovered from cryopreserved sperm, via sperm- 
sperm fusion followed by fertilization of irradiated 
eggs. The resulting fish survive and appear normal 
suggesting that if genomic imprinting occurs in these 
androgenetic organisms, it is not essential for devel- 
opment. 


Mammalian Cloning 


A clone is any collection of cells that are descendants 
of a single ancestor cell. In the case of the cloning of a 
whole complex organism, that single ancestor is 
usually an enucleated egg which, instead of being 
fertilized, has received a diploid nucleus from a 
somatic cell. This can be considered as another form 
of uniparental inheritance as offspring are produced 
asexually without a contribution from two parents. 
Such nuclear transplantations in vertebrates were car-
ried out in frogs in the 1950s and 1960s in order to test the ability
of nuclei from differentiated cells to support normal 
development. These studies, which addressed the ques- 
tion of whether irreversible changes in genes accom- 
pany differentiation, resulted in development to 
adulthood in a small number of cases. In general, in 
frogs, the later the developmental stage of the nucleus 
used, the more limited the developmental potential of 
the embryo. These studies suggested that there was 
limited reversibility of the differentiated state. 

Until recently, mammalian cloning from fully dif- 
ferentiated cell types was even less successful. In cattle 
and sheep, a series of technical advances allowed trans- 
planted nuclei from undifferentiated embryonic cells 
to give rise to viable offspring. More recently, donor 
nuclei from differentiated adult cells were also found 
to be capable of making a sheep clone after nuclear 
transplantation into an egg. This significant result 
indicates that the differentiated mammalian genome 
can be ‘reprogrammed’ to support development. This 
approach has now been used to produce mouse clones 
at a frequency of about 2% from adult somatic cell 
nuclei injected into enucleated eggs. The frequency of 
successful mammalian cloning is very low and not all 
somatic cell nuclei are capable of producing adult 
clones. This historic achievement provides a valu-
able model system in which to address key questions
regarding the regulatory mechanisms of genome pro-
gramming and reprogramming. In addition, it has opened
a wide debate over the ethical issues surrounding the
application of this technology, notably in ‘reproduct-
ive cloning.’ Nonetheless, the potential benefits of cloning
technology are great, for example in cell and
tissue therapy (‘therapeutic cloning’), as well as in the
preservation and propagation of endangered species.
Mammalian cloning is an example of how uniparental 
inheritance has moved from nature into the labora- 
tory with profound implications for genetics and bio- 
medicine. 
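With a success rate of about 2% per reconstructed egg, simple probability shows how many eggs a cloning experiment may need. The sketch below assumes that each egg is an independent trial with the same success probability, which is an idealization. The function names are illustrative.

```python
import math


def prob_at_least_one(p: float, n: int) -> float:
    """Chance that at least one of n independent attempts succeeds."""
    return 1.0 - (1.0 - p) ** n


def attempts_needed(p: float, target: float) -> int:
    """Smallest n with prob_at_least_one(p, n) >= target."""
    return math.ceil(math.log(1.0 - target) / math.log(1.0 - p))


p = 0.02  # roughly the per-egg success rate quoted above for mouse clones
print(50 * p)                    # expected clones from 50 reconstructed eggs
print(attempts_needed(p, 0.95))  # eggs for a 95% chance of at least one clone
```

Under these assumptions, about 150 eggs are needed for a 95% chance of obtaining even one clone, which is one way to see why a low per-egg frequency makes cloning laborious.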


Further Reading 
Mittwoch U (1978) Parthenogenesis. Journal of Medical Genetics 
15: 165-181. 


See also: Androgenone; Imprinting, Genomic; 
Mitochondria; Parthenogenesis, Mammalian 


Unique DNA 


E Kutter 
Unique DNA is a stretch of DNA that is present in
only a single copy per haploid genome. This includes most DNA
in bacterial cells and most of the DNA that is 
expressed as mRNA in eukaryotic cells. However, as 
reported by Roy Britten and David Kohne in 1968, 
most eukaryotic cells also have classes of DNA that 
are present in many copies in the genome. In reasso- 
ciation kinetics studies, where the DNA was frag- 
mented into pieces a few hundred nucleotides long, 
denatured, and allowed to reanneal, at least three gen- 
eral populations of DNA were seen. There are sections 
that reanneal very rapidly, indicating that they must be 
present in over a million copies per cell to be able to 
find a mate so quickly. This is now termed highly 
repetitive DNA. Segments that anneal as slowly as 
one would expect from the size of the genome if each 
DNA fragment had to find a unique mate are termed 
unique DNA. Fragments that reanneal at intermediate 
rates, as if a few hundred nearly identical copies were 
present in the genome, are called moderately (or mid-
dle) repetitive DNA. The amount of unique DNA
tends to be fairly similar between different related 
species, even though the total DNA content may vary 
widely; in amphibians and plants, the total DNA in 
the haploid genome can vary by over two orders of 
magnitude, to nearly 10¹¹ base pairs! In most higher
eukaryotes, including humans, it appears that only a 
few percent of the DNA encodes proteins. 
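In the ideal case, the reassociation behavior described above follows second-order kinetics. The fraction of a sequence still single-stranded is C/C0 = 1/(1 + k·C0·t), so a plot against C0t has its midpoint at C0t½ = 1/k. A component present in r nearly identical copies finds partners about r times more often than unique DNA. Its C0t½, measured against total DNA, is therefore roughly that of unique DNA divided by r. The sketch below sums such components. The genome composition and the C0t½ of unique DNA are hypothetical numbers, chosen only to separate the three transitions.

```python
def single_stranded_fraction(cot, components, cot_half_unique):
    """Fraction of total DNA still single-stranded at a given C0t.

    Each component is (fraction_of_genome, copies_per_genome). Under ideal
    second-order kinetics a component's C0t1/2, measured against total DNA,
    scales inversely with its copy number.
    """
    total = 0.0
    for fraction, copies in components:
        cot_half = cot_half_unique / copies
        total += fraction / (1.0 + cot / cot_half)
    return total


# Hypothetical genome: values chosen only to show three separate transitions.
GENOME = [
    (0.10, 1_000_000),  # highly repetitive (e.g. satellite) DNA
    (0.25, 300),        # moderately repetitive DNA
    (0.65, 1),          # unique DNA
]
COT_HALF_UNIQUE = 1000.0  # assumed, in mol nucleotide * s / L

for exponent in range(-5, 6):
    cot = 10.0 ** exponent
    frac = single_stranded_fraction(cot, GENOME, COT_HALF_UNIQUE)
    print(f"C0t = 1e{exponent:+d}: {frac:.2f} single-stranded")
```

The printed curve falls in three steps. The first step is highly repetitive DNA near C0t ≈ 10⁻³, then moderately repetitive DNA near C0t ≈ 3, then unique DNA near C0t ≈ 10³.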

Much of the highly repetitive DNA has an average 
base composition significantly different from that of the
bulk DNA of the cell, and thus shows up as a separate, 
or ‘satellite,’ peak during density-gradient centrifuga-
tion. This satellite DNA hybridizes mainly to the
heterochromatic centromere regions of the chromo- 
some and consists of short sequences repeated a very 
large number of times. Much of the other rapidly an- 
nealing DNA may be related to large numbers of copies 
of various (generally defective) viruses present in the 
DNA. Moderately repetitive DNA may also be clus- 
tered as tandem repeats which can include functional 
genes, such as the ribosomal RNA and tRNA genes. 


See also: Repetitive (DNA) Sequence 


Universal Genetic Code 


E Kutter 


Copyright © 2001 Academic Press 
doi: 10.1006/rwgn.2001.1353 


Throughout the whole of the prokaryotic, plant, and 
animal kingdoms, the same codons are used for the 
same amino acids, with very few exceptions — the code 
is thus almost universal. However, the genetic code in 
mitochondria has several significant differences from 
the common universal code (Table 1). 

In mitochondria, the third position of the codon is also
less important in codon recognition: a single tRNA can
often read all four codons of a family, reducing the
number of tRNAs needed in this compact organelle.

A second known variation from the universal
genetic code is that ciliated protozoans read UAA
and UAG, which are Stop signals in the common code, as glutamine (Gln).


Table 1 Genetic code in mitochondria

Codon   Common code   Mitochondrial code
AUA     Ile           Met
AGA     Arg           Stop
AGG     Arg           Stop
UGA     Stop          Trp
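The reassignments in Table 1 can be expressed as overrides on the standard codon table. This Python sketch builds the standard table from the conventional 64-letter string, with bases in the order U, C, A, G, and then applies the four changes listed in Table 1. These changes correspond to the vertebrate mitochondrial code; mitochondria of other lineages differ further. The test sequence is made up for illustration.

```python
from itertools import product

BASES = "UCAG"
# Standard code, codons enumerated UUU, UUC, UUA, UUG, UCU, ... ('*' = Stop).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
STANDARD = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

# The four reassignments listed in Table 1.
MITOCHONDRIAL = {**STANDARD, "AUA": "M", "AGA": "*", "AGG": "*", "UGA": "W"}


def translate(mrna: str, code: dict[str, str]) -> str:
    """Translate codon by codon from the first base, stopping at the first Stop."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        residue = code[mrna[i:i + 3]]
        if residue == "*":
            break
        protein.append(residue)
    return "".join(protein)


seq = "AUGAUAUGAAGAUUU"
print(translate(seq, STANDARD))       # MI  (UGA read as Stop)
print(translate(seq, MITOCHONDRIAL))  # MMW (AUA = Met, UGA = Trp, AGA = Stop)
```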


See also: Genetic Code; Start, Stop Codons 


Unscheduled DNA 
Synthesis 
Unscheduled DNA synthesis is any DNA synthesis
occurring outside the S phase in the eukaryotic cell. 


See also: Cell Cycle; S Phase 




Unstable Equilibrium 


See: Equilibrium 


Up, Down Mutations 


See: Alleles; Promoters 


Upstream 
The term ‘upstream’ identifies sequences that precede a
reference point, that is, sequences lying in the direction
opposite to that in which expression proceeds.


See also: Downstream; Gene Expression 


Upstream, Downstream 
Site 
J Parker 
The terms ‘upstream’ and ‘downstream’ are used to
refer to the location of sites on the DNA relative to 
other sites in the same gene. The first step in the 
expression of a gene is transcription, the production 
of an RNA copy of the gene using one strand of the 
DNA as a template. For a given gene it is always the 
same DNA strand that is used as a template and tran- 
scription starts at a specific site on the double- 
stranded DNA molecule and proceeds in only one 
direction. The RNA molecule is synthesized from its 
5’ to its 3’ end. For genes whose products are proteins, 
translation also begins at a specific sequence on this 
RNA transcript and proceeds unidirectionally in the 
5’ to 3’ direction. This common directionality for 
both transcription and translation means that from 
the point of view of expression, one end of the gene 
is its beginning (the end where transcription begins) 
and the RNA polymerases and ribosomes flow uni- 
directionally from this end toward the other end of 
the gene, something like a stream. 

This has led to the practice of referring to specific 
DNA sequences as ‘upstream’ or ‘downstream’ as com- 
pared to other sequences in a gene (or in relationship 
to the gene itself). For instance, the promoter, a site 
involved in transcription initiation, is upstream of the 
coding sequences of a gene, but may be downstream of 
some regulatory sites on the DNA. Note that a
site could be upstream with regard to some sequences 
and downstream with regard to others. In addition, 
although some sites invariably have a fixed orientation 
relative to other sites, others do not. For example, in a 
gene encoding a protein, the sequence that encodes the 
start codon is of necessity upstream from the sequence 
encoding the stop codon. However, the position of 
certain regulatory sequences, such as an operator, 
may be either upstream or downstream of other regu- 
latory sequences, such as the promoter. Such differ- 
ences in position may have important consequences 
for gene regulation. For sequences called enhancers, 
which are involved in transcription, their position 
relative to the gene(s) whose expression they influence 
seems less important. However, such sites tend to be 
exceptional. 
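A gene may be transcribed from either strand of a double-stranded molecule, so ‘upstream’ cannot be read directly off map coordinates. The following is a minimal sketch. It assumes that positions are numbered left to right along one reference strand. It also assumes that the gene's orientation is given as '+' (expression toward higher coordinates) or '-' (toward lower coordinates). The function name and coordinates are hypothetical.

```python
def relative_position(site: int, reference: int, strand: str) -> str:
    """Say whether `site` lies upstream or downstream of `reference`.

    Positions are coordinates on one reference strand, increasing left to
    right; `strand` is '+' if the gene is expressed toward increasing
    coordinates and '-' otherwise.
    """
    if strand not in ("+", "-"):
        raise ValueError("strand must be '+' or '-'")
    if site == reference:
        return "same position"
    # '+': expression runs toward higher coordinates; '-': toward lower ones.
    earlier = site < reference if strand == "+" else site > reference
    return "upstream" if earlier else "downstream"


# Hypothetical gene on the minus strand: start codon at 5000, stop codon at 3500.
print(relative_position(5000, 3500, "-"))  # start codon vs stop codon: upstream
print(relative_position(5100, 5000, "-"))  # a promoter-like site: upstream
print(relative_position(3400, 3500, "-"))  # beyond the stop codon: downstream
```

The start codon comes out upstream of the stop codon even though its coordinate is larger, which matches the point made above that some relationships are fixed regardless of map orientation.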

Many sequences upstream of the coding region
of a gene play important roles in the regulation of gene 
expression. These include the promoter, various sites 
that make up the promoter (e.g., TATA box and GC 
box), and sites where regulatory proteins bind (e.g., 
operators and activator-binding sites). Another site 
found in yeast is the upstream activator site (UAS) 
which functions somewhat like an enhancer but must 
be upstream of the beginning of transcription. Often 
these regulatory sites may be simply referred to as 
‘upstream sites.’ Of course, there are also sites that 
are invariably downstream of the coding region of a 
gene, such as transcription terminators and, in eu- 
karyotes, the sites encoding sequences involved in the 
polyadenylation of messenger RNA. 


See also: Enhancers; Operators; Promoters; 
Transcription; Translation 


Uracil 


R L Somerville 
Uracil (U; molecular formula, C4H4N2O2) is a nat-
urally occurring representative of the heterocyclic
nitrogen-containing aromatic bases termed pyrimi- 
dines. Although found predominantly in RNA, uracil 
is also a major constituent of the DNA of certain 
bacterial viruses. In other organisms, uracil is a
transient constituent of DNA, arising through the
random deamination of cytosine residues. Efficient
surveillance and repair systems exist within most 
cells to prevent uracil residues from accumulating 
within DNA. Because the base-pairing properties of 
uracil are identical to those of thymine, replication 
of DNA containing a G:U base pair would give rise 
to a daughter molecule containing an A:U base pair 
and a granddaughter molecule containing an A:T base 
pair. If the original G:C base pair happened to be 
indispensable, the failure to replace U would be lethal. 
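The two rounds of replication described above can be traced mechanically. The sketch below assumes no repair, so uracil is never removed from DNA, and that DNA polymerase always inserts A opposite U. The function names are illustrative.

```python
from collections import Counter

# Base inserted opposite each template base; U is read like T and pairs with A.
PARTNER = {"A": "T", "T": "A", "G": "C", "C": "G", "U": "A"}


def replicate(pair: tuple[str, str]) -> list[tuple[str, str]]:
    """Semiconservative replication: each old strand templates a new partner."""
    top, bottom = pair
    return [(top, PARTNER[top]), (PARTNER[bottom], bottom)]


def descendants(pair: tuple[str, str], rounds: int) -> Counter:
    molecules = [pair]
    for _ in range(rounds):
        molecules = [child for m in molecules for child in replicate(m)]
    return Counter(f"{a}:{b}" for a, b in molecules)


print(descendants(("G", "U"), 1))  # one G:C daughter, one A:U daughter
print(descendants(("G", "U"), 2))  # granddaughters now include an A:T pair
```

After two rounds, one granddaughter carries a permanent G:C to A:T transition. The A:U molecule continues to produce A:T copies in later rounds.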


See also: Pyrimidine 


URF 
A URF (unidentified reading frame) is an open read-
ing frame that is presumed to code for protein, but for
which no product has been found. 


See also: Reading Frame 
UTP (Uridine
Triphosphate) 
E J Murgola 
Uridine-5’-triphosphate (UTP) is an energy-rich,
activated precursor for RNA synthesis. It is synthe- 
sized in the cell by phosphorylation of uridine diphos- 
phate (UDP), catalyzed by a nucleoside diphosphate 
kinase, with adenosine triphosphate (ATP) as the 
phosphate donor: 


UDP + ATP → UTP + ADP


For the synthesis of deoxyuridine triphosphate
(dUTP), a precursor of DNA, the 2’ hydroxyl group
of the ribose moiety of a uridine nucleotide (usually
UDP) is replaced by a hydrogen atom. This reduction
is catalyzed by ribonucleotide reductase.


See also: RNA 


UVR Genes 


See: Excision Repair 


V Gene 
A V gene (variable gene) is a sequence coding for the
major part of the variable (N-terminal) region of an 
immunoglobulin chain. 


See also: Immunoglobulin Gene Superfamily; 
Variable Region 


Valine 
E J Murgola 
Valine is one of the 20 amino acids commonly found in
proteins. Its abbreviation is Val and its single letter 
designation is V. As one of the essential amino acids in 
humans, it is not synthesized by the body and so must 
be provided in an individual’s diet. 

The chemical structure of valine is given in Figure 1.


Figure 1 Valine.


See also: Amino Acids; Proteins and Protein 
Structure 


Variable Codons 


See: Codons, Invariable; Genetic Code 


Variable Region 
Variable regions are regions in the amino acid
sequence of both heavy and light chains of immuno- 
globulins with great diversity of sequence. They are 
associated with the antigen-binding areas. 


See also: Constant Regions; Immunoglobulin 
Gene Superfamily; V Gene 


Variegation 
Variegation of phenotype is a phenomenon caused by
a change in genotype during somatic development. 


See also: Somatic Mutation 


Vascular Endothelial 
Growth Factor (VEGF) 


J Laurén 
Vascular endothelial growth factor (VEGF) is essen-
tial for the embryonic differentiation and growth of
endothelial cells and for tumor angiogenesis. VEGF 
expression is induced by hypoxia, activated oncogenes, 
and a variety of cytokines. Five different isoforms of 
VEGF, generated by alternative splicing, bind to 
two endothelial tyrosine-kinase receptors, VEGFR-1 
(flt-1) and VEGFR-2 (flk-1/KDR). VEGF-induced 
intracellular signaling results in increased cell pro- 
liferation, migration, and inhibition of apoptosis. 
VEGFs have great potential in the induction of thera-
peutic angiogenesis in ischemic diseases, and blocking
their signal transduction is a promising approach for 
the inhibition of tumor angiogenesis. 


See also: Angiogenesis; Signal Transduction 


Vectors 


W E Jack 
Vectors were the first DNA tools used in genetic
engineering, and continue to be cornerstones of this
technology. A vector is a DNA segment that can repli-
cate independently of the chromosome and be trans-
ferred between hosts. DNA fragments integrated
into the vector are coreplicated, making multiple iden- 
tical copies, or clones, of the target sequence. Accord- 
ingly, these elements are often referred to as cloning 
vectors or vehicles. Vectors have been developed and 
adapted for a wide range of uses. Two primary uses are 
(1) to isolate, identify and archive fragments of a larger 
genome and (2) to selectively express proteins en- 
coded by specific genes. 


Vector Characteristics 


Several vector characteristics are central to their func- 
tion as tools in molecular biology. The first feature is 
autonomous replication. This autonomy facilitates 
purification of the vector and its integrated DNA in- 
sert, paving the way for characterization and mani- 
pulation of the DNA sequence. Such manipulations 
frequently include schemes for expression and/or 
mutagenesis of associated gene-coding regions. This 
autonomy also confers the ability to have multiple 
copies of the DNA insert in the cell, improving 
DNA yields and increasing the gene dosage. 

A second central feature is the availability of 
mechanisms to insert desired DNA fragments into 
the vector. Ideally, precisely defined DNA fragments 
can be readily integrated into the vector without dis- 
rupting essential vector functions. If this step can be 
accomplished with high efficiency, screening for the 
desired construct is minimized. Traditionally, restric- 
tion endonucleases have been used to cleave both 
vector and insert, which are then joined by DNA 
ligase to form the desired clone. In this regard, vectors 
with a large number of unique restriction endonu- 
clease sites provide the greatest flexibility in accommo- 
dating inserts, which can be flanked by any of a variety 
of restriction sites. When these sites are clustered in 
the vector, this cluster is referred to as a multiple 
cloning site (MCS). The advent of PCR amplification 
has dramatically altered the landscape of cloning in 
that precise insert regions can be amplified, adding 
flanking restriction sites if necessary to facilitate 
attachment to the vector. Recently, enzymes that func- 
tion in recombination, such as lambda int/xis and Cre 
recombinase, have also been used to transfer DNA 
segments between vectors. 
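Finding enzymes that cut a vector only once is a simple string search, and that property is what makes a site usable for cloning. This sketch uses the recognition sequences of EcoRI, BamHI, HindIII, and PstI. All four are palindromic, so searching one strand suffices. It ignores degenerate or non-palindromic sites, which would also need a reverse-complement search, and it ignores methylation. The 30 bp ‘vector’ is a made-up toy sequence and all function names are hypothetical.

```python
# Recognition sequences of four common type II restriction enzymes (all palindromic).
ENZYMES = {"EcoRI": "GAATTC", "BamHI": "GGATCC", "HindIII": "AAGCTT", "PstI": "CTGCAG"}


def count_sites(seq: str, site: str, circular: bool = True) -> int:
    """Count (possibly overlapping) occurrences of a palindromic site."""
    seq = seq.upper()
    if circular:
        # Append the start so a site spanning the junction of a circle is found once.
        seq += seq[: len(site) - 1]
    count, start = 0, 0
    while (i := seq.find(site, start)) != -1:
        count += 1
        start = i + 1
    return count


def unique_cutters(seq: str) -> list[str]:
    return sorted(name for name, site in ENZYMES.items() if count_sites(seq, site) == 1)


toy_vector = "GAATTCAAGGATCCTTGGATCCAAAAGCTT"  # hypothetical, not a real vector
print(unique_cutters(toy_vector))  # ['EcoRI', 'HindIII']: BamHI cuts twice, PstI never
```

A real multiple cloning site packs many such unique sites into a short stretch. This check would be run over the whole plasmid to confirm that none of those enzymes also cuts elsewhere.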

A final feature is the ability to select for hosts 
containing the vector. Introducing purified DNA 
into cells, a process called transformation or transfec- 
tion, is inefficient in many cell types. Accordingly, 
transformation is generally followed by a selection in 
which untransformed cells are killed. Selection often 
employs cytotoxic agents, such as antibiotics, to kill 
untransformed cells, relying on drug resistance genes 
on the vector to confer survival to transformed cells. 
Selection can also rely on metabolic enzymes on the 
plasmid that complement either naturally-occurring 
or introduced defects in the host cell. 

In addition to these basic features, DNA sequences 
on the vector can act to extend the host range or 
control expression of cloned gene-coding regions, pro- 
viding an opportunity to selectively express a desired 
protein in a variety of novel contexts, as when tissue- 
or cell cycle-specific expression is desired. Still other 
sequences can provide means for producing and 
selecting protein variants with desired properties. 

Two broad categories of vectors can be defined. The 
first is composed of DNA derived from naturally- 
occurring elements (plasmids) in the cell. These elem- 
ents can differ widely in size and complexity and are 
most often circular. The second utilizes the genomes 
of viruses infecting bacteria or higher organisms such 
as fungi, plants, insects or mammals. These are refer- 
red to as viral vectors, or as bacteriophage (phage) 
vectors when the host is a bacterium. 


Plasmid Vectors 


While plasmids have been reported in eukaryotes such 
as yeast, their presence is largely limited to bacteria 
where their existence is widespread. Accordingly, the 
discussion here focuses on the bacterial systems, 
especially on Escherichia coli, the most widely used 
bacterial system for manipulating DNA. The wealth 
of knowledge and experience concerning DNA
manipulations in E. coli makes it a starting point for 
almost all in vitro DNA manipulations, no matter 
what the intended use or fate of the construct DNA. 
One obvious utility of plasmids is the ability to 
separate that DNA from cellular chromosomal DNA, 
permitting a vast array of in vitro manipulations.


Plasmid purification protocols rely both on the rela- 
tively small size and closed circular structure of plas- 
mids to effect separation. The ability to perform such 
purifications rapidly on a small scale is of great advan- 
tage in analysis. 

The origin of replication is the key determinant of 
plasmid properties. This DNA element controls the 
host range, copy number and compatibility of the 
plasmid. Replication initiates within that element, 
and typically involves host proteins, frequently in 
association with plasmid-encoded proteins. Despite 
using host proteins, the mode of replication can differ 
from that of the host chromosome, allowing independ- 
ent accumulation of the plasmid. In one manifestation 
of this effect, the replication of certain plasmids is 
independent of new protein synthesis (relaxed con- 
trol), unlike host chromosomal replication (stringent 
control). As a result, in the presence of protein syn- 
thesis inhibitors, plasmid replication continues in the 
absence of chromosomal replication, resulting in a 
selective amplification of plasmid DNA. 

The type of origin of replication also dictates the 
copy number, or number of plasmids per cell. Copy 
numbers range from one to thousands per cell, and can 
be affected by growth conditions. A high copy num- 
ber is an advantage in increasing the yield of DNA, 
and can also elevate gene expression by the concomi- 
tant increase in gene dosage. However, if the expressed 
genes are toxic to the cell, lower copy number vectors 
are advised. 

The origin of replication determines whether the 
plasmid will replicate in a single host, or in multiple 
hosts. Bacterial plasmids do not, however, replicate in 
eukaryotic cells. When there is a need to replicate in 
disparate hosts, multiple replication origins can be 
included in a single shuttle vector. Thus, constructs 
that can be more readily assembled in a bacterial clon- 
ing system can still be used with more complex genetic 
systems. 

An additional feature arising from replication origin 
interactions is plasmid incompatibility: the inability of 
plasmids with similar replication origins to be stably 
maintained in the same cell. When incompatible plas- 
mids are introduced into the same cell, the plasmids 
compete for survival in subsequent generations of 
cells, eventually segregating to leave two populations 
of cells, each containing only one type of plasmid. 
When multiple types of plasmids are desired in the 
same cell, they ideally will have compatible replication 
origins, and each carry a different selectable marker. 
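A small stochastic simulation can illustrate why incompatible plasmids segregate. The model is a hypothetical simplification. Two plasmid types share one copy-number control system, so replication picks templates at random without regard to type, and copies are partitioned randomly at division. Under these assumptions, one type is eventually lost from every lineage. The copy number and function names are illustrative.

```python
import random


def generations_to_segregate(copies: int = 10, seed: int = 0,
                             max_generations: int = 100_000) -> tuple[int, str]:
    """Follow one cell lineage carrying two incompatible plasmids, A and B.

    Returns (generations until only one type remains, surviving type).
    """
    rng = random.Random(seed)
    cell = ["A"] * (copies // 2) + ["B"] * (copies - copies // 2)
    for generation in range(1, max_generations + 1):
        # Shared copy-number control: templates for the next `copies`
        # replication events are chosen at random, regardless of type.
        pool = cell + [rng.choice(cell) for _ in range(copies)]
        # Random partition at division; follow one daughter's share.
        rng.shuffle(pool)
        cell = pool[:copies]
        if len(set(cell)) == 1:
            return generation, cell[0]
    raise RuntimeError("plasmids did not segregate within max_generations")


for seed in range(5):
    print(generations_to_segregate(seed=seed))
```

With compatible plasmids, each type would be counted by its own control system. The random drift toward a single type modeled here would then not apply.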


Viral Vectors 


Viruses are distinguished from plasmids in that the 
viral DNA is encapsulated by a protein coat. This viral 
capsid not only protects the DNA, but also provides
an efficient mechanism to deliver viral DNA to the 
cell. This packaging facilitates isolation of the viral 
DNA via purification of the viral capsids. In general, 
viral vectors have larger genomes, encoding proteins 
involved in packaging viral DNA, and in some cases 
proteins involved in viral replication, transcription 
and/or translation. 

The diverse biology of viral vectors has been ex- 
ploited for specialized applications. For example, one 
class of bacteriophages packages a circular single strand 
of the viral genome. DNA isolated from the resulting 
viral particles yields an excellent template for in vitro 
DNA replication, useful in DNA sequencing and oligo-
nucleotide-directed mutagenesis. Other viruses can
integrate their genome into the host chromosome, 
providing stable transgenes for phenotypic analysis. 

A number of eukaryotic viral vectors have been 
constructed, although development has lagged behind 
phage vectors because of the more complex systems 
they represent. Although different elements govern 
replication, transcription and translation, the basic 
features of viral vectors mimic those of phage vectors. 
Viral vectors have played an essential role in elucidat- 
ing gene function, as well as the function of regulatory 
sequences, and promise to play a prominent role in the 
development and implementation of gene therapy. 


Library Construction in Vectors 


A fundamental use of vectors is to isolate and store 
smaller segments of a larger genome. Such fragmen- 
tation simplifies DNA sequence determination and 
analysis of associated gene-coding regions. Such col- 
lections, or libraries, are created by fragmenting a 
genome, and combining each of the ensuing fragments 
with a vector backbone. This genomic library will 
ideally have at least one representation of all DNA 
sequences. Even better, there will be multiple repre- 
sentations, each bounded by different borders to 
increase the probability that a single clone can be 
isolated containing all the relevant contiguous genetic 
elements, such as those associated with a single gene- 
coding region. Correct clones can be identified by a 
number of methods, with one common method being 
hybridization with a probe sharing DNA sequence 
similarity with the desired region. 
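The widely used Clarke–Carbon formula estimates how many clones a genomic library needs: N = ln(1 − P) / ln(1 − f). Here P is the desired probability that any given sequence is present, and f is the insert size as a fraction of the genome. The sketch assumes equal-sized inserts sampled uniformly at random. That is an idealization, given the biases against large inserts noted in this entry. The genome and insert sizes are illustrative round numbers.

```python
import math


def clones_needed(genome_bp: float, insert_bp: float, prob: float = 0.99) -> int:
    """Clarke-Carbon estimate of library size for a given chance of including any sequence.

    Assumes equal-sized inserts drawn at random from the genome.
    """
    f = insert_bp / genome_bp  # fraction of the genome carried by one clone
    return math.ceil(math.log(1.0 - prob) / math.log(1.0 - f))


# Illustrative sizes: a 3.0e9 bp genome in ~5 kb plasmid vs ~40 kb cosmid inserts.
for insert in (5_000, 40_000):
    print(insert, clones_needed(3.0e9, insert))
```

Raising the insert size eightfold cuts the required library size roughly eightfold. This is one reason large-insert vectors matter for screening and mapping.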

In higher organisms, study of gene-coding regions 
is facilitated by constructing cDNA libraries, formed 
by making double-stranded DNA copies of the 
mRNA found in the cell. These cDNA libraries sim-
plify analysis in that introns and intergenic sequences
are not represented.

One final library, an expression library, bears men- 
tioning. This type of library joins coding sequences 
from the genome with a specific coding sequence
expressed on the vector, producing a hybrid protein 
containing at least a portion of the protein encoded in 
the genome. Identification of this expressed protein, 
for example by interaction with a specific antibody, 
identifies the underlying gene-coding region, and 
through this correlation opens avenues to identify 
and isolate the complete protein-coding region. 

As can be imagined, a critical feature of these 
libraries is the size of the insert that can be cloned. 
High copy number vectors (> 20 copies per cell) can 
easily accommodate inserts of 3-8 kb, a reasonable 
range for a gene-coding region (a 60kDa protein- 
coding region is about 1.6 kb in length). Larger inserts 
tend to be less stable, particularly in high copy number 
vectors, presumably due to the selective advantage of 
replicating smaller plasmids, including those with 
spontaneous deletions. This bias against large inserts 
is also seen in phage vectors that do not limit the 
amount of DNA packaged in the capsid. 
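The figure of about 1.6 kb for a 60 kDa protein follows from a rough conversion. The sketch assumes an average residue mass of about 110 Da, which is a common rule of thumb rather than a value from the text. It also assumes three base pairs per codon plus one stop codon.

```python
AVG_RESIDUE_MASS_DA = 110.0  # assumed average mass of one amino acid residue


def coding_length_bp(protein_kda: float, include_stop: bool = True) -> int:
    """Approximate length of the coding region for a protein of the given mass."""
    residues = round(protein_kda * 1000 / AVG_RESIDUE_MASS_DA)
    return residues * 3 + (3 if include_stop else 0)


length = coding_length_bp(60)
print(length)           # 1638 bp, i.e. about 1.6 kb
print(length <= 3_000)  # True: well inside the 3-8 kb insert range quoted above
```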

However, screening, mapping and DNA sequence 
compilation often benefits from having large inserts. 
Many viral cloning vectors allow larger inserts, with 
the length limited by the cavity within the viral capsid. 
Insert sizes in these viral vectors can be increased 
by deleting nonessential viral genes. In an extreme 
example, cosmid vectors derived from bacteriophage 
lambda can package inserts of 30-45 kb, only slightly 
smaller than the 48 kb wild-type genome. Single copy 
plasmids such as bacterial artificial chromosomes 
(BAC) and yeast artificial chromosomes (YAC) can 
maintain even larger inserts (350-1000 kb). 


Vectors for Protein Expression 


A specialized class of vectors is designed for protein
expression. Vectors in this class promote high yields of
a desired protein in either bacterial or eukaryotic
systems, although usually not both, owing to differences
in transcriptional and translational signals. It should
also be noted that many eukaryotic proteins require
posttranslational modifications that are lacking in
bacterial systems; such proteins must be expressed in
eukaryotic systems to be fully functional.

High yields in either prokaryotic or eukaryotic
expression systems can be fostered by multicopy
vectors that increase the gene dosage, and by placing
transcriptional and translational control sequences
next to the gene-coding sequence. Inducible
transcriptional promoters are often used, and in
eukaryotes promoters can also be controlled in a
cell-cycle or tissue-specific manner, providing a means
to control the timing and localization of protein
expression. This is useful not only in producing large
amounts of protein for biochemical analysis, but also
in establishing regulatory pathways and phenotypic
associations for particular gene products.

One variety of expression vectors modifies the 
expressed protein to include extra amino acid se- 
quences at one or both termini. These added 
sequences can target the resulting fusion protein to 
specific cellular compartments, or can act as detection 
tags in defining spatial and temporal patterns of 
expression. Still other fusions act as affinity tags to 
assist in purification of the attached protein. 


Conclusions 


In the short history of recombinant DNA, vectors 
have played a major role in advances in technology. 
Without a doubt, refinement and development of new 
vectors will continue to pace the rapidly expanding 
field of molecular biology. 
DNA; Transduction
Vertical Transmission
D E Wilcox 
Vertical transmission occurs when a trait or a disease is
passed down through several generations, directly
from an affected individual to affected descendants in
successive generations. It is typically seen in
autosomal dominant inheritance. In this pattern of
inheritance, both sexes can be affected and, in turn,
transmit the trait or disease to both males and females.
It is also seen in mitochondrial inheritance, but here,
although both males and females can be affected, only
females transmit the trait. This is because only eggs,
and not sperm, transmit mitochondria to the zygote.

Example pedigrees may be seen at
http://www.gla.ac.uk/medicalgenetics/encyclopedia.htm


See also: Mitochondrial Inheritance 


Viroids 
C Beamish and E Kutter 
Viroids might be considered ‘naked viruses’: viruses
without their spaceships. They are infectious circular
RNA molecules only about 300 nucleotides long, with 
no protein coats. Cells somehow take them up, repli- 
cate them extensively and then release them. Some- 
times they harm their host cell, sometimes not. They 
cause a variety of problematic diseases in higher 
plants. For example, in potato spindle-tuber disease, 
a viroid causes the potatoes to become cracked, 
gnarled, and elongated. Viroids also damage coconut 
palms, hops, peaches, cucumbers, and avocados. 
They are transmitted between plants mechanically or 
through pollen or ovules. In addition to their large 
economic impact on plant crops, these RNAs are of 
great interest to molecular biologists because they 
are the smallest and simplest replicating molecules 
known, and because it is possible that they are a sort 
of living fossils, reflecting precellular evolution in a 
hypothetical RNA world. 

Viroids were discovered in 1971 by T.O. Diener, a 
plant pathologist. It is still not clear how they cause 
disease, especially since they may cause severe prob- 
lems in one plant and no particular symptoms in a 
related species. They encode no proteins. Perhaps 
they bind to something in the cell and disrupt some 
regulatory mechanism. The RNA of many viroids 
contains sections of nucleotide sequence complemen- 
tary to key regions at the boundaries of RNA introns; 
maybe that is how they damage cells. They also have 
nucleotide sequences similar to some seen in trans- 
posons and retroviruses. 

Two quite distinct groups of viroids have been
identified. Group A viroids (such as peach latent mosaic
viroid) replicate in chloroplasts, while group B viroids
multiply in the nucleus of the host cell. Viroid
replication involves either symmetric or asymmetric
rolling circle replication mediated by an RNA-directed
RNA polymerase, long thought to be unique to plant cells.
(Now it seems that there may be a similar animal-cell
enzyme that is involved in replicating hepatitis delta
virus RNA.) The infecting circular monomer is copied
to make a long linear multimeric minus strand. In the
symmetric mode, the multimeric minus strand is cut
and sealed to make minus circular monomers. These
then act as templates to form the plus-strand
monomers by the same set of three reactions (copying,
cleavage, and sealing). All group A viroids studied to
date use the symmetric mode, and their multimeric
strands are cleaved through autocatalysis involving a
so-called ‘hammerhead’ ribozyme structure. Such
hammerhead structures have also been seen in other
RNA molecules that have enzyme-like properties. In the
asymmetric mode, seen for group B viroids, the minus
strand serves directly as a template to make linear
multimeric plus strands, which are then cleaved into
monomers and sealed into closed circles. No
hammerheads are seen in the group B viroids and no
self-cleavage has yet been observed, but neither has a
cellular enzyme yet been identified that is responsible
for their cleavage. Thus, a still-unidentified form of
autocatalysis may be involved.
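The difference between the two rolling circle modes can be illustrated with a toy model. The Python sketch below does not model any real viroid: the five-nucleotide ‘genome’ is hypothetical, each circle is written as a linear string opened at its cleavage site, and cleavage is assumed to fall exactly on unit-length boundaries of each multimer. Both modes regenerate plus-strand circles; the symmetric mode adds a round of cleavage and sealing on the minus strand and a second round of rolling circle copying.

```python
PAIR = {"A": "U", "U": "A", "G": "C", "C": "G"}


def complement_strand(rna: str) -> str:
    """Complementary, antiparallel RNA strand, written 5'->3'."""
    return "".join(PAIR[base] for base in reversed(rna))


def roll(circle: str, turns: int) -> str:
    """Copy a circular template `turns` times without stopping,
    giving one linear multimer of the complementary strand."""
    return complement_strand(circle * turns)


def cleave_and_seal(multimer: str, unit: int) -> list[str]:
    """Cut a multimer into unit-length monomers and seal each into a circle
    (each circle is represented here by its monomer string)."""
    assert len(multimer) % unit == 0, "toy model: cuts fall on unit boundaries"
    return [multimer[i:i + unit] for i in range(0, len(multimer), unit)]


def asymmetric(plus_circle: str, turns: int) -> list[str]:
    minus_multimer = roll(plus_circle, turns)
    # The linear minus multimer itself templates a linear plus multimer.
    plus_multimer = complement_strand(minus_multimer)
    return cleave_and_seal(plus_multimer, len(plus_circle))


def symmetric(plus_circle: str, turns: int) -> list[str]:
    unit = len(plus_circle)
    plus_circles = []
    # The minus multimer is first cut and sealed into minus circles ...
    for minus_circle in cleave_and_seal(roll(plus_circle, turns), unit):
        # ... each of which is rolled again to make plus multimers.
        plus_circles.extend(cleave_and_seal(roll(minus_circle, turns), unit))
    return plus_circles


if __name__ == "__main__":
    genome = "GGCAU"  # hypothetical toy sequence
    print(len(asymmetric(genome, 3)), len(symmetric(genome, 3)))
```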
A virulent phage will always substantially alter the
physiology of the cell it infects and lyse its bacterial
host at the completion of its infection cycle. Because
of this distinguishing characteristic, virulent phages
are also known as lytic phages. Coliphages T4 and
T7 are classic examples of virulent phages. The life
cycle of virulent phages (Figure 1) is initiated with
adsorption of the phage to some specific set of
macromolecules it uses as receptors on a bacterial host.
Once adsorption has occurred, the viral DNA enters the
host and the viral genes are transcribed and translated
in a specific preprogrammed sequence. The resulting
viral proteins shut down the host’s cellular machinery
and begin building new viruses. A lysozyme or other
endolysin is produced and, once a phage-encoded
holin gives it access across the inner membrane to
the bacterial cell wall, the wall is degraded and the
new phages are released into the surrounding
environment. Under optimal conditions, it takes about
25 min for T4 to complete this life cycle.

Figure 1 The life cycle of virulent phages.


See also: Lysis; T Phages 
A virus can be any of a variety of minute, particulate,
infectious agents consisting of protein and nucleic acid, 
each capable of multiplying in the cells of a specific 
type of organism. Each type of virus causes a character- 
istic infection, which commonly culminates in destruc- 
tion of the host cells and liberation of more viruses. 
Viruses of all types of organisms (bacteria, fungi, algae, 
protists, plants, animals) have been identified. 

As microbiologists of the late nineteenth and early 
twentieth centuries gradually sorted out the entities of 
their domain, they depended on what we would now 
consider quite primitive tools. Microscopes of the 
time were good enough to reveal bacteria of various 
forms, but the theoretical limit of resolution of a 
microscope using visible light, about 0.2 µm, kept
them from observing any smaller particles. However, 
the porcelain filters that were commonly used to ster- 
ilize solutions provided a useful criterion of size. Thus, 
when Iwanowski demonstrated, in 1892, that infected 
tobacco plants carried the infectious agent in their sap, 
he could not observe the agent with a microscope, but, 
by demonstrating that it passed through a filter that 
retained bacteria, he showed it to be a previously 
unknown life form. In 1899, Beijerinck independently
discovered the same phenomenon and called it a
contagium vivum fluidum (a contagious living fluid).
Similar infectious agents of other diseases were iden- 
tified during the same period (foot-and-mouth disease 
by Loeffler and Frosch in 1898, yellow fever by Reed 
and his colleagues in 1900) as filterable viruses, from 
the Latin virus, meaning a poison or stench. 

The study of viruses was advanced considerably by 
the discovery of bacterial viruses, or bacteriophages 
(phages; see article on Bacteriophages), by Twort in 
1915 and by d’Herelle (see D’Herelle, Félix) in 1917. 
As described by Evans (1952): 


When one adds a small amount of bacterial virus to a vigor- 
ously growing bacterial culture, nothing occurs immedi- 
ately. Then, suddenly, the suspension begins to foam, as 
materials inside the bacterial cell are liberated into the me- 
dium, and within a short time the heavy mass of growing 
bacteria has been replaced by floating shreds of debris that 
settle slowly to the bottom of the containing vessel. The 
clear bluish supernatant fluid now contains a hundred fold 
multiplication of the original virus inoculum. 


Phage multiplication is easy to study because of 
plaque formation: an appropriately diluted suspen- 
sion of phage is mixed with susceptible bacteria in 
a few milliliters of melted agar, and the mixture is 
poured over the layer of nutrient medium in a petri 
plate. After overnight incubation, the plate is covered 
with a continuous layer, or ‘lawn,’ of bacteria which 
is interrupted by small, circular clearings called 
‘plaques,’ where bacteria have been killed. This
phenomenon reveals at least two important points.
First, bacteria are killed at discrete points on the plate,
not nebulously throughout the culture, showing that
the infectious agents are particulate. Second, the
number of plaques formed is directly proportional to
the amount of the original suspension added to the
plate, which provides a convenient way to determine
the concentration, or titer, of infectious particles: the
number of plaques formed multiplied by the dilution
factor and divided by the volume plated. One can also
count viruses by electron microscopy and thus
determine the plating efficiency: the fraction of
particles in a preparation that are capable of forming
plaques. For many phages, the plating efficiency is
essentially 1, but for other viruses it may be quite low.
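The titer and plating-efficiency arithmetic described above fits in a few lines of Python. This is a sketch with hypothetical plaque and particle counts; expressing the titer per millilitre by dividing by the volume plated is the usual laboratory convention, and the function names are illustrative.

```python
def titer_pfu_per_ml(plaques: int, dilution_factor: float, volume_plated_ml: float) -> float:
    """Plaque-forming units (PFU) per mL of the original suspension.

    dilution_factor is the reciprocal of the dilution plated
    (e.g. 1e6 for a 10^-6 dilution).
    """
    return plaques * dilution_factor / volume_plated_ml


def plating_efficiency(pfu_per_ml: float, particles_per_ml: float) -> float:
    """Fraction of physical particles (e.g. counted by electron microscopy)
    that are able to form plaques."""
    return pfu_per_ml / particles_per_ml


if __name__ == "__main__":
    # Hypothetical counts: 150 plaques from 0.1 mL of a 10^-6 dilution.
    titer = titer_pfu_per_ml(150, 1e6, 0.1)
    print(f"titer = {titer:.2e} PFU/mL")
    # Hypothetical electron-microscope particle count of the same stock.
    print(f"plating efficiency = {plating_efficiency(titer, 1.5e9):.2f}")
```

A plating efficiency near 1, as here, corresponds to the "essentially 1" case for many phages; a particle count far above the titer would indicate a low plating efficiency.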

In a series of classic experiments, Ellis and 
Delbrück (1939) and Delbrück (1940) showed the value 
of the plaque method and used it to demonstrate that 
bacteriophages multiply in steps, a pattern now displayed
in a single-step growth curve. Consider each plaque
to be formed by an infective center; the number of 
such centers stays constant for about 25-30 min after 
phages and bacteria have been mixed in a flask, and 
then it rises dramatically, typically by a factor of about 
100. This shows that the phages multiply within bac- 
teria that remain intact for a time and then suddenly 
burst (lyse). Until 25-30 min, the infective centers are 
intact, infected bacteria which burst on the plate, each 
yielding a single plaque; after that time, more and 
more of the infective centers are free phage particles 
that have been liberated by cells bursting in the infec- 
tion flask. Experiments with viruses of other cells, 
including plants and animals, give comparable results, 
except that infected cells do not commonly burst so 
suddenly and dramatically but rather disintegrate 
gradually, liberating the virus particles that have been 
formed within. The understanding generated by these 
experiments has defined much of our thinking about 
viruses in general. 
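The shape of a single-step growth curve can be sketched numerically. The Python model below takes the latent period (about 25 min) and the roughly 100-fold rise from the text; the assumption that infected cells burst at a uniform rate over a 10-minute window is a simplification chosen for illustration, not a measured value.

```python
def infective_centers(t_min: float, infected_cells: float, burst_size: float = 100.0,
                      latent_min: float = 25.0, rise_min: float = 10.0) -> float:
    """Idealized single-step growth curve.

    Until the end of the latent period every infected cell counts as one
    infective center. Afterwards, cells are assumed to burst at a uniform
    rate over `rise_min` minutes, each releasing `burst_size` free phages
    that each form their own plaque.
    """
    if t_min <= latent_min:
        return infected_cells
    burst_fraction = min(1.0, (t_min - latent_min) / rise_min)
    intact = infected_cells * (1 - burst_fraction)            # still one plaque each
    released = infected_cells * burst_fraction * burst_size   # free phage particles
    return intact + released


def estimated_burst_size(plateau_count: float, early_count: float) -> float:
    """Average yield per infected cell: final plateau / initial infective centers."""
    return plateau_count / early_count


if __name__ == "__main__":
    for t in (0, 20, 25, 30, 35, 45):
        print(t, infective_centers(t, 1000))
```

The flat stretch followed by a sharp rise to a new plateau is the step; dividing the plateau by the initial count recovers the average burst size.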


Nature of Viruses 


The most often-asked question about viruses is “Are 
they alive?” This is a semantic problem that could 
only be answered by addressing the more complicated 
question of what the criteria for life are. Instead of 
addressing that issue, we will adopt the now-well- 
accepted viewpoint of Evans (1952) that viruses 
are entities with their own distinctive characteristics 
and are quite distinct from other entities called organ- 
isms. In Lwoff’s words, “viruses are to be considered 
viruses because viruses are viruses.” As Lwoff and 
Tournier emphasized, there are no entities intermedi- 
ate between viruses and organisms. Certain kinds of 
infectious bacteria (rickettsias, chlamydias) show 
virus-like features such as intracellular multiplication 
and extreme metabolic dependency on their hosts, but 
they are still recognizable as bacteria. The differences 
between viruses and organisms are easily shown by 
the following comparison. 


1. An organism is always a cell or collection of cells. 
No virus has such a structure. A virus is a particle, 
called a ‘virion,’ which consists of a nucleic acid 
genome enclosed in a protein covering: (a) The 
virion contains only one kind of nucleic acid,
either DNA or RNA, whereas every cell needs
both kinds to function. Viruses reproduce solely
using the information from this one nucleic acid,
whereas organisms, including infectious organisms, 
reproduce through an integrated action of their 
nucleic acid constituents; (b) Cells grow by enlarge- 
ment and binary fission. No virus grows in this 
way. The virion is merely a vehicle for transporting 
the nucleic acid genome to another host cell. The 
genome enters the host cell and begins an infection, 
which results in production of a large number of 
new virions. 

2. Viral genomes do not contain the information for 
any kind of apparatus to generate high-potential 
energy — what Lwoff called a ‘Lipmann system.’ 
The virus is thus totally dependent upon its host 
cell for a chemiosmotic potential, for ATP, and for 
any other source of energy. 

3. A virus makes use of its host’s protein-synthesizing 
apparatus: its ribosomes, transfer RNAs, and other 
factors. Some viral genomes encode special tRNAs, 
but no virus supplies the entire protein-synthetic 
system. Again, it is absolutely dependent upon its 
host. 


Characteristics of Virions 


Individual virus particles (virions) have distinct- 
ive features that help to identify them in electron 
micrographs, but all virions share certain basic features. 
Each virion consists of a nucleic acid genome (see 
Genome; Nucleic Acid) and a protective covering of 
protein called the ‘capsid’; the combination of nucleic 
acid and protein makes the nucleocapsid, which in 
many cases is the entire virion. In other cases, how- 
ever, the nucleocapsid is enclosed by an envelope, or 
‘peplos,’ made of a somewhat modified cell membrane 
from the cell in which the virion was made. 

Crick and Watson (1956) pointed out that even a 
small virus has too much capsid protein to be encoded 
by its genome if the protein were one unique se- 
quence. Instead, they argued, a capsid must be made 
of small subunits (protomers) that combine to form 
the large (multimeric) capsid. Caspar and Klug (1962), 
in considering design principles for virus structure, 
combined this principle of subassembly with a prin- 
ciple of self-assembly: the protomeric units should 
assemble themselves spontaneously into the correct 
(that is, lowest energy) form without the need for 
additional structural information from outside. If 
identical protomers are to assemble themselves into 
a structure, the bonds between all subunits must be 
identical and all subunits must bear the same geomet- 
rical relationship to one another; the resulting struc- 
ture must therefore be symmetrical. The laws of 
crystallography limit the possible modes of symmetry 
to two classes: helical or cubic. In helical virions, the
nucleic acid associates with the capsid protein units 
to form a helix coated with protein. In cubic virions, 
the nucleic acid is wound up rather like a ball of 
thread inside a closed shell of protein subunits. The 
capsid of a cubic virion is therefore a surface crystal of 
protein. 


Helical Capsids 

The helical capsid, exemplified by tobacco mosaic
virus, is easiest to describe (Figure 1). The capsid is a
large multimer of a single type of polypeptide enclos- 
ing the RNA genome in a groove. Protomeric units 
associate with one another to initially form a disk, 
which soon is transformed into the beginnings of a 
helix. Protomers assemble along the RNA until it is 
entirely enclosed, thus determining the length of the 
capsid. This type of capsid is therefore very similar 
to other large, helical protein structures such as the 
flagella and pili of bacteria. 


Cubic Capsids 

Electron microscopy shows that all cubic capsids 
actually have the form of an icosahedron, with 20 
equilateral triangular faces; this becomes a dodeca- 
hedron in the limit for the smallest viruses. In fact, 
the principle of construction was discovered by 
R. Buckminster Fuller, in facing the challenge of 
designing easily assembled structures (“assembly by 
child,” as he put it). A plane can be paved with iden- 
tical equilateral-triangle tiles; in this plane, six tiles 
meet at many points of sixfold symmetry. The plane 
can be bent into a third dimension, so it can start to 
enclose a space, by removing a wedge of tiles touching 
one point and connecting the remaining tiles, thus 
creating a vertex with fivefold symmetry. Completely 
enclosing a volume requires 12 fivefold vertices (mak- 
ing a dodecahedron), but more generally the space 
between vertices is filled in with varying numbers of 
units of sixfold symmetry, thus creating an icosahe- 
dron. This is the basis of Fuller’s geodesic dome. Just 
as geodesic domes can vary in size, actual cubic capsids
vary in number of protomers. All possible
icosadeltahedra can be defined by a number $T$, the
triangulation number, where the number of subunits is
$60T$. $T$ is given by the rule $T = Pf^2$, where $f$ is
any positive integer and $P = h^2 + hk + k^2$, where $h$
and $k$ are any two integers with no common factor.
Virions of different sizes are known to have T-values
of 1, 3, 4, 7, 9, 16, 25, and 81.
When negatively stained and examined by electron 
microscopy, the surface of an icosahedral nucleocap- 
sid shows distinct units called capsomers; those at the 
vertices are pentons, made of five protomers, and the 
rest are hexons, made of six protomers. 
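The rule for T can be made concrete by enumerating its allowed values. This Python sketch applies the formula as stated above, taking $h \ge 1$ and $0 \le k \le h$ (a convenience choice so that mirror-image pairs are not counted twice), and derives the penton and hexon counts from the $60T$ subunit total: with 12 pentons of five protomers each, $60T = 5 \cdot 12 + 6 \cdot \text{hexons}$, so hexons $= 10(T - 1)$. The function names are illustrative.

```python
from math import gcd, isqrt


def allowed_p(limit: int) -> list[int]:
    """Values P = h^2 + hk + k^2 <= limit, with h >= 1, 0 <= k <= h, gcd(h, k) == 1."""
    values = set()
    for h in range(1, isqrt(limit) + 1):   # P >= h^2, so h <= sqrt(limit)
        for k in range(0, h + 1):
            p = h * h + h * k + k * k
            if p <= limit and gcd(h, k) == 1:
                values.add(p)
    return sorted(values)


def triangulation_numbers(t_max: int) -> list[int]:
    """All T = P * f^2 (f a positive integer) up to t_max."""
    ts = set()
    for p in allowed_p(t_max):
        f = 1
        while p * f * f <= t_max:
            ts.add(p * f * f)
            f += 1
    return sorted(ts)


def capsid_composition(t: int) -> tuple[int, int, int]:
    """(protomers, pentons, hexons) for an icosahedral capsid of triangulation number t."""
    return 60 * t, 12, 10 * (t - 1)


if __name__ == "__main__":
    print(triangulation_numbers(30))
    print(capsid_composition(3))
```

Every T-value listed in the text (1, 3, 4, 7, 9, 16, 25, 81) appears in the enumeration, while values such as 2, 5, or 6 never do.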


Figure 1 The helical structure of tobacco mosaic
virus. The nucleocapsid consists of many identical 
protein molecules enclosing the RNA genome. 


Types of Virus—Host Interactions 


Virus–host interactions are exemplified by lytic virus
multiplication: one or more virions infect a host cell,
which is then converted into a factory for the synthesis
of new viruses. Virions accumulate in the cell, which
eventually disintegrates and scatters its contents.

However, this is not the only possible type of virus- 
host interaction. Many viruses establish a state of 
lysogeny (see Lysogeny), in which the viral genome 
remains in a stable condition (a provirus) inside the 
cell. Lysogenized cells may multiply indefinitely, like 
other cells, each member of the clone retaining its own 
copy of the provirus. 

A number of bacterial viruses are known to multiply 
in their hosts in a nondestructive way. A copy of the 
viral genome remains inside the cell, replicating at a 
low rate and directing the synthesis of viral proteins; 
but, instead of lysing, the cell remains intact and new 
virions are extruded from the cell surface. Such cells 
can apparently continue to produce virions indefin- 
itely. 


Common Pattern of Lytic Multiplication 


All viruses whose lytic cycles have been studied show
very similar patterns of virus multiplication. The
method of infection varies, but, once a viral genome
is established in the host cytoplasm, a series of early
genes (see Early Genes (in Phage Genomes)) is
expressed. These genes may encode proteins that
disrupt host activities, for example by hydrolyzing the
host genome and stopping translation of host mRNAs,
or new enzymes essential for the replication of the
viral genome. Synthesis of these proteins ceases by the
middle of the infection period and a set of late genes
(see Late Genes) is then turned on. The late proteins
encoded by these genes are primarily structural
proteins of the virion; their synthesis continues
throughout the rest of the infection period, and the
lysed cell contains capsid proteins and nucleic acids
that have not formed virions.


Table 1 Known types of viral genomes

Description                                     Example

DNA genomes
  Single-stranded, circular                     Phages φX174, M13
  Single-stranded, linear                       Parvoviruses
  Double-stranded, linear                       Many viruses
  Double-stranded, linear with nicks            Phage T5
  Double-stranded, circular                     Papovaviruses

RNA genomes
  Single-stranded (linear), positive strand     Picornaviruses
  Single-stranded, positive strand, segmented   Brome mosaic virus
  Single-stranded, negative strand              Rhabdoviruses; some paramyxoviruses
  Single-stranded, negative strand, segmented   Orthomyxoviruses
  Double-stranded (linear), segmented           Reoviruses


Types of Viral Genomes 


Each type of virus has a genome of either DNA or 
RNA, but no virus carries both. The genome may be 
either single- or double-stranded, and may take a var- 
iety of forms. Viral nucleic acid strands are also desig- 
nated either positive (plus) or negative (minus), where 
the viral messenger RNA is defined as being positive. 
Finally, some viruses carry segmented genomes made 
of separate nucleic acid molecules, and a complete 
genome requires all of the segments; in this case, how- 
ever, a successful infection can result from simultan- 
eous infection by several virions that contribute all 
of the segments in combination, even if no one virion 
has a complete genome. Table 1 shows the types of
genomes that have been identified. 

When viruses with double-stranded genomes infect,
transcription can occur as it does in a normal cell,
although special transcriptases may be required.
However, infection by a single-stranded genome
requires the formation of a double-stranded
intermediate called a ‘replicative form’ (RF). For
instance, the single-stranded DNA genome of the phage
φX174 is a plus strand. Upon infection, it is converted
by the bacterial DNA polymerase into a double-stranded
RF, whose negative strand is then used as the template
strand for transcription of mRNA and also as the
template for the production of new positive-strand
genomes. Similarly, when a virus with a positive-strand
RNA genome infects, that RNA is itself capable of
being a messenger, and one of the early products made
by translation of this mRNA is an RNA replicase, an
RNA-dependent RNA polymerase which converts
some of the infecting strands into RFs. As with a
single-stranded DNA, these RFs are the sources of
new mRNA molecules and new positive-strand
genomes.
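The strand-sign convention can be checked with a toy sequence. In this Python sketch the nine-nucleotide fragment is hypothetical (it is not part of the φX174 genome): the RF minus strand is made by complementary, antiparallel copying of the plus strand, and transcribing that minus strand yields an mRNA with the plus-strand sequence (U in place of T), which is why a plus strand has the same sense as the viral messenger.

```python
DNA_PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}
RNA_FROM_DNA = {"A": "U", "T": "A", "G": "C", "C": "G"}


def replicate_strand(template: str) -> str:
    """DNA strand synthesized on a DNA template (antiparallel, written 5'->3')."""
    return "".join(DNA_PAIR[b] for b in reversed(template))


def transcribe(template: str) -> str:
    """mRNA synthesized on a DNA template strand (antiparallel, written 5'->3')."""
    return "".join(RNA_FROM_DNA[b] for b in reversed(template))


if __name__ == "__main__":
    plus_genome = "ATGGCTTAA"                     # hypothetical plus-strand fragment
    minus_strand = replicate_strand(plus_genome)  # completes the replicative form
    mrna = transcribe(minus_strand)               # minus strand is the template
    new_plus = replicate_strand(minus_strand)     # progeny plus-strand genome
    print(minus_strand, mrna, new_plus)
```

The same minus strand serves both purposes described in the text: as the template for mRNA and as the template for new plus-strand genomes.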

Infection by a virus with a negative-strand RNA 
genome requires a transcriptase associated with the 
virion; this enzyme transcribes the genome to produce 
positive mRNA, which is then translated into pro- 
teins. Among these proteins is a replicase, which con- 
verts the infecting genomes into RFs, from which new 
negative-strand genomes are formed. 

The most unusual and complex form of replication
occurs among the retroviruses, such as human
immunodeficiency virus (HIV) and Rous sarcoma virus
(RSV). They carry positive-strand RNA genomes
along with an RNA-dependent DNA polymerase, a
so-called reverse transcriptase. Upon infection, this
polymerase converts the RNA genome into a unique
double-stranded form (+RNA combined with
−DNA). After removal of the RNA strand by a
ribonuclease activity of the same enzyme, the remaining
negative DNA strand is replicated to form a
double-stranded proviral DNA. The proviral DNA is
then integrated into a host chromosome, from which it
can be transcribed into new RNA strands that serve
either as mRNAs or as genomes for new viruses.
Cells infected in this manner then take on new
characteristics of their own and may be transformed
into tumor cells.

Figure 2 Forms of representative DNA viruses. (A) Viruses with double-stranded DNA; (B) viruses with
single-stranded DNA.


Classification of Viruses 


The general scheme of virus classification proposed
by Lwoff and Tournier, based on the characteristics
of the virion, has been considerably modified to take
account of other features. The Lwoff–Tournier system
divides viruses into riboviruses or deoxyviruses as
their nucleic acid is RNA or DNA; then into helical or
cubic classes, depending on the symmetry of the
nucleocapsid, thus defining four classes: Ribocubica,
Ribohelica, Deoxycubica, and Deoxyhelica. Finally, the
nucleocapsid may be naked or enveloped, creating a
tertiary division and defining eight orders. However,
this scheme leaves no room for the common
bacteriophages, which have icosahedral heads and
helical tails, nor for some viruses with complex (and
sometimes still unknown) structures. The classification
currently being developed by an international committee
is explained by Lwoff and Tournier (1966). Figures 2
and 3 show some major families of viruses, defined
by a combination of nucleocapsid form, presence or
absence of an envelope, and type of nucleic acid.
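The three binary divisions of the Lwoff–Tournier scheme multiply out to 2 × 2 × 2 = 8 groups. The Python sketch below enumerates them; it uses only the four class names given above and labels each order by its envelope state, since the order names themselves are not given here. The lookup deliberately fails for a structure that fits neither symmetry, mirroring the scheme's difficulty with tailed phages.

```python
from itertools import product

PREFIX = {"RNA": "Ribo", "DNA": "Deoxy"}
SUFFIX = {"cubic": "cubica", "helical": "helica"}


def lwoff_tournier_class(nucleic_acid: str, symmetry: str) -> str:
    """Class name from nucleic acid type and nucleocapsid symmetry.

    Raises KeyError for anything outside the two symmetries, e.g. a tailed
    phage whose head is cubic but whose tail is helical.
    """
    return PREFIX[nucleic_acid] + SUFFIX[symmetry]


def all_divisions() -> list[tuple[str, str]]:
    """Every (class, envelope state) combination: the scheme's eight orders."""
    return [
        (lwoff_tournier_class(acid, sym), envelope)
        for acid, sym, envelope in product(PREFIX, SUFFIX, ("naked", "enveloped"))
    ]


if __name__ == "__main__":
    for division in all_divisions():
        print(*division)
```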
Figure 3 Forms of representative RNA viruses. (A) Viruses with double-stranded RNA; (B) viruses with single-
stranded RNA (+ strand); (C) viruses with single-stranded RNA (− strand).
In contrast to the large number of bacteriophages and
viruses of eukaryotes known to date, only about two 
dozen viruses of the Archaea have so far been identi- 
fied and studied in some detail. Several of these viruses 
have unique morphologies, although their genome 
structures and virus-host relationships show certain 
similarities to either bacterial or eukaryotic viruses, 
furnishing evidence for the primeval existence of 
common ancestral modules. 


Viruses of Euryarchaeota 


The morphotypes of archaeal viruses reflect the divi- 
sion of the domain Archaea into two kingdoms, the 
Euryarchaeota and the Crenarchaeota. All but two 
viruses of euryarchaeotes are typical head-and-tail 
phages, including virions with contractile and
noncontractile tails, thus belonging to the families
Myoviridae and Siphoviridae. All have double-stranded
DNA genomes. Circular permutation and terminal 
redundancy of the genomes of some phages indicate 
a headful mechanism of packaging from concatemeric 
precursors. 
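Why headful packaging produces both features can be seen in a toy model. The Python sketch below uses a hypothetical ten-letter ‘genome’ and a head that holds 12 letters; the packaging start point and head capacity are illustrative assumptions, not measured values for any archaeal phage. Because each head takes slightly more than one genome length from the concatemer, every packaged molecule repeats its first few units at its end (terminal redundancy), and each successive cut starts at a different point in the sequence (circular permutation).

```python
def headful_packaging(genome: str, headful: int, n_heads: int) -> list[str]:
    """Cut successive headfuls from a concatemer of tandem genome copies.

    Packaging starts at position 0 (an assumed initiation site), and each
    later cut is made wherever the previous head became full.
    """
    if headful <= len(genome):
        raise ValueError("a headful must exceed one genome length")
    copies = (headful * n_heads) // len(genome) + 1
    concatemer = genome * copies
    return [concatemer[i * headful:(i + 1) * headful] for i in range(n_heads)]


if __name__ == "__main__":
    genome = "ABCDEFGHIJ"  # hypothetical 10-unit genome
    for dna in headful_packaging(genome, headful=12, n_heads=3):
        redundancy = len(dna) - len(genome)
        # Same ends -> terminal redundancy; shifting start -> circular permutation.
        print(dna, dna[:redundancy] == dna[-redundancy:])
```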

Both temperate and lytic viruses have been found. The
prophage of Halobacterium phage φH persists as a
circular episome, similar to the prophage form of the
coliphage P1, rather than being integrated in the host’s
chromosome. The regulation of lysogeny has features
resembling the regulation of lysogeny in the lambda
phage of Escherichia coli. The promoters of the genes
encoding an early lytic product that is necessary for
the expression of late genes and the repressor of that
transcript are situated back to back, in a manner
similar to that of cI and cro in lambda, and transcription
of the two genes is mutually exclusive.

About 25% of the genome of the haloarchaeophage
φH and the complete genome of the Methanobacterium
phage ψM2 have been sequenced. Similarities with
bacteriophages were again found in the genome organ- 
ization. Several open reading frames (ORFs) of the 
Methanobacterium phage show significant similarities 
to genes encoding structural proteins, proteins in- 
volved in packaging DNA into capsids, and a site- 
specific recombinase of bacteriophages that infect 
Bacillus and other gram-positive hosts. 

There are two examples of euryarchaeal viruses that 
have morphologies different from those of tailed 
phages: His1, a virus infecting Haloarcula hispanica,
and a virus-like particle produced by Methanococcus 
voltae strain A3 both have a spindle-like shape. 


Viruses of Crenarchaeota 


Viruses have been described for only two genera of 
the kingdom Crenarchaeota, the hyperthermophile 
Thermoproteus and the extreme thermophile Sulfolo- 
bus. All of these viruses have unique morphotypes and 
have been assigned to four novel virus families: Fusel- 
loviridae (the spindle-shaped enveloped viruses SSV1, 
SSV2, and SSV3 of Sulfolobus), Rudiviridae (the stiff 
rod-shaped, nonenveloped viruses SIRV1 and SIRV2 
of Sulfolobus), Lipothrixviridae (the filamentous 
enveloped viruses TTV1, TTV2, and TTV3 of
Thermoproteus, DAFV of Acidianus, and SIFV of Sulfolobus),
and Guttaviridae (the droplet-shaped virus SNDV 
of Sulfolobus). Typical virus particles are shown in 
Figure 1.

Only viruses TTV1 and TTV4 are lytic. The
fuselloviruses are temperate and the rest are present in
their hosts in a more or less stable carrier state.
Possibly this strategy helps them to escape prolonged
direct confrontation with the harsh natural
environment, with temperatures up to 100 °C and, for
viruses of the acidophilic Sulfolobus, pH values down
to 1.5. However, because these viruses significantly
inhibit the growth of their host cells, plaque tests could
be established for all viruses infecting Sulfolobus
except SNDV.

In fusellovirus lysogens the viral genome is inte- 
grated specifically into the host genome by means of a 
virally encoded integrase and is also present as a plas- 
mid copy. As in the case of some temperate bacterio- 
phages, integration occurs in a tRNA gene of the host. 
Similar to bacterial lysogens, virus production can be 
induced by UV irradiation or by mitomycin C treat- 
ment, apparently resulting from an SOS-like response 
of the host cells. This response includes activation of a 
short transcription unit which is situated between two 
large ‘back-to-back’ transcription units, similar to the 
C1 gene of the lambdoid bacterial phage 186. 

The rudivirus SIRV1 shows unusual behavior in new
hosts. The virus genome varies by extensive
accumulation of point mutations with a rate of about
$10^{-3}$ substitutions per nucleotide per replication
cycle, unprecedented for DNA viruses and approaching
values seen for the most rapidly mutating RNA 
viruses. Accumulation of point mutations eventually 
leads to the selection of conditionally stable virus 
variants, coinciding with the recovery of high fidelity 
replication. Such stable variants of SIRV1 produce 
further variants when infecting a new host, demon- 
strating that stability of the viral genome in certain 
hosts does not exclude the potential to vary. SIRV2, 
which has a similar but 3.2 kb longer genome, remains 
stable in the same hosts. 

The virus TTV1 shows genetic variability indicative
of an undefined recombination mechanism. The
variance arises from the regrouping of homologous 
specific sequences between two nonadjacent reading 
frames. 

The genomes of all crenarchaeal viruses isolated to 
date consist of double-stranded DNA. In members of 
the Rudiviridae and the Lipothrixviridae it is linear, 
and in members of the two other families covalently 
closed circular. The circular DNA of SSV1 was shown 
to be positively supercoiled. The termini of the linear 
genome of the lipothrixvirus SIFV are modified in an 
as yet uncharacterized manner, and those of the rudi-
viruses are covalently closed: the two DNA strands
form a continuous polynucleotide chain. Such a struc-
ture is characteristic of the linear genomes of eukaryotic
poxviruses, Chlorella viruses, and African swine 
fever virus. The genomes of rudiviruses share with 
these genomes one more characteristic feature, long 
inverted terminal repeats. Presumably the mode of
DNA replication is similar in rudiviruses and these
eukaryotic viruses.


Figure 1 Electron micrographs of representatives of the four families of viruses of the Crenarchaeota.
(A) Lipothrixvirus SIFV; (B) rudivirus SIRV2; (C) fusellovirus SSV1; (D) guttavirus SNDV. Scale bars = 200 nm. (Reprinted
from Prangishvili et al., 2001 with permission from Elsevier Science.)

The complete genomes of both rudiviruses and of 
the fuselloviruses SSV1 and SSV2, and more than 90% 
of the genomes of the lipothrixviruses TTV1 and SIFV 
have been sequenced. Except for the latter two, the 
sequences of members of the same families are highly 
homologous to each other. The two rudiviruses share 
16 ORFs with the lipothrixvirus SIFV, indicating that 
the two virus families may be related. Paralleling the 
unique morphotypes of the viruses, only a very 
limited number of their ORFs show any similarity 
to proteins from other viruses or from organisms. On 
the basis of sequence similarity, only the SSV1 gene 
encoding an integrase and the SIRV1 genes encoding a 
dUTPase and a Holliday junction resolvase could be 
unambiguously identified. All three genes have been 
functionally expressed in E. coli. 

The virus SSV1 has proven to be a useful model 
for studying transcription in the Archaea. During 
the analysis of viral transcripts and promoters it was 
found that the promoter sequences contained TATA
boxes resembling those in promoters of eukaryotic RNA
polymerase II rather than those of bacteria.


Evolutionary Considerations 


Studies on archaeal viruses allow insight into virus 
evolution. Conservation of a characteristic virion mor-
phology and sequence similarity of some of the genes
indicate that euryarchaeal viruses could share ances-
try with tailed bacteriophages. Certain characteristics
of the genomes as well as virus-host relationships of
crenarchaeal fuselloviruses indicate a common ances-
try with lambdoid bacteriophages. Crenarchaeal rudi-
viruses, sharing with poxviruses, Chlorella viruses,
and African swine fever virus peculiarities of genome 
organization and replication, seem to be related to 
them. Considering biochemical barriers between the 
three domains of life, e.g., the incompatibility of the 
archaeal and the bacterial transcription systems, and 
between different life styles, direct spreading of 
viruses from one domain to the others appears un- 
likely. It seems more plausible to assume common 
ancestors prior to the divergence of the lineages of 
their hosts. 
The Visconti-Delbrück hypothesis was offered by
N. Visconti and M. Delbrück in 1953 to rationalize
the linkage relations observed in genetic crosses 
conducted with the T-even phages, T2 and T4. 

Phage crosses usually involve the mixing of about 
1 × 10⁸ bacteria (in 1 ml) with about 7 × 10⁸ phage
particles of each of (usually) two different genotypes. 
Phage particles are given time to adsorb to the bacteria 
and inject their DNA, thereby entering the ‘vegeta-
tive’ state. Phage particles that fail to adsorb to cells are
eliminated, and the infected cells are diluted so that
progeny phage particles released when the cells lyse 
do not readsorb to bacteria or bacterial debris. The 
progeny phage are assayed by plaque formation, and 
the genotypes of the particles determined by either the 
morphology of the plaques generated or by their abil- 
ity to grow under one or another condition. The frac- 
tion of the progeny that is recombinant for a given pair 
of markers defines the recombinant frequency for 
those markers. For phages T2 and T4, early studies 
suggested three ‘linkage groups.’ Crosses involving 
markers in the same linkage group gave convincingly 
less than 50% recombinants, and the frequencies 
observed allowed the construction of linear maps 
based on the rule that markers giving the largest fre- 
quencies of recombinants be placed farthest apart on 
the map. However, crosses with markers on different 
linkage groups gave 40-45% recombinants, signi- 
ficantly less than the Mendelian expectation of 50%. 
The linkage data departed from those usually ob-
served with eukaryotic genetic crosses in an additional
way: Frequencies of double recombinants for adjacent
intervals were higher than (negative interference),
rather than equal to (no interference) or lower than
(positive interference), those expected if exchanges
were randomly distributed among the linkage groups 
of different progeny particles. Furthermore, simul-
taneous infection by three appropriately marked geno-
types of phage produced progeny in which some of
the particles had markers derived from each of the 
three infecting genotypes (‘triparental recombinants’). 
In an effort to put these novel data into a meiotic 
framework, Visconti and Delbrück proposed that a 
phage cross should be interpreted within a ‘population 
genetics’ framework. They proposed that the vegeta- 
tive phage within an infected cell paired (‘mated’) with 
each other. During a mating, the linkage groups are 
randomly assorted, and markers within the linkage 
groups are recombined by reciprocal exchanges that 
do not interfere with each other. Since there is no 
evidence of mating types or sexes in a phage popu- 
lation, the vegetative phage were assumed to mate 
with each other at random with respect to their geno- 
type, with the consequence that half the matings are 
unproductive of recombinants. Under this assump- 
tion, one round of mating would result in 25% 
recombination for unlinked markers. To account for 
the observed 40-45% recombination between mark- 
ers on different linkage groups, the progeny particles 
must have had several such matings, with partners 
chosen at random for each mating. Since mating was 
assumed to be contemporaneous with replication, this 
is equivalent to saying that progeny particles derive 
from lineages that have experienced several matings 
on the average. These assumptions explained the 


formation of triparental recombinants as having arisen 
in successive biparental matings. They explained the 
negative interference as a consequence of the unequal 
numbers of productive matings experienced by the 
different lineages. The model, formulated algebraically 
(see below), accounted for most of the data it was in- 
tended to explain. An apparent weakness in the model, 
however, related to the assumption of reciprocal 
exchange. Crosses with T2 had shown that comple- 
mentary recombinants emerge from individual infected 
bacteria in numbers that are uncorrelated. At face value,
this suggests that the recombination process is not reci- 
procal, i.e., the exchange process does not result in the 
formation of complementary recombinants in the same 
event. Visconti and Delbrück argued that the vegetative 
phage that have emerged from a mating could, by 
chance, enjoy differing opportunities for replication, 
obscuring the reciprocality of recombination which 
they had assumed. However, subsequent considerations 
revealed that the assumption of reciprocality played 
no role in the form of the final equations obtained.
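The 25% figure, and the need for several rounds of mating, can be checked with a short calculation. The Python sketch below is an illustrative reconstruction under stated assumptions, not Visconti and Delbrück's own derivation: it treats the vegetative phage in a cell as a randomly mating population, starts with parents AB (fraction f) and ab (fraction 1 - f), and uses the standard population-genetics result that the linkage disequilibrium between two markers shrinks by a factor (1 - p) in each synchronous round of random mating. The function name is hypothetical.

```python
def recombinant_fraction_after_rounds(k: int, f: float = 0.5, p: float = 0.5) -> float:
    """Expected recombinant fraction after k synchronous rounds of random mating.

    Illustrative assumptions: parental genotypes AB (fraction f) and ab (fraction 1 - f),
    partners chosen at random with respect to genotype, and probability p of an
    exchange between the two markers in any one mating (p = 0.5 for unlinked markers).
    """
    d0 = f * (1.0 - f)          # initial linkage disequilibrium between the two markers
    d_k = d0 * (1.0 - p) ** k   # disequilibrium shrinks by a factor (1 - p) per round
    # Each recombinant class (Ab and aB) has frequency f(1 - f) - D.
    return 2.0 * (f * (1.0 - f) - d_k)


for rounds in range(1, 6):
    print(rounds, recombinant_fraction_after_rounds(rounds))
```

One round gives 0.25, as stated in the text, and a fixed three rounds already gives 0.4375, inside the observed 40-45% range. When the number of matings instead varies among lineages, as the hypothesis assumes, a somewhat larger mean number of matings is needed to reach the same recombinant frequency.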

The elementary Visconti-Delbrück equation relat-
ing recombinant frequency R to the probability
of recombination in a single mating p is

$$R = 2f(1 - f)\left(1 - e^{-mp}\right)$$

where m is the average number of matings per lineage
and f is the fraction of one of the two parental geno-
types in the infecting phage population. When the two
infecting types are equal, as was the usual intention,
the expression simplifies to $R = \tfrac{1}{2}\left(1 - e^{-mp}\right)$. These
equations suppose that matings are Poisson-distribu-
ted among lineages.
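The role of the Poisson assumption can be made concrete. Suppose, as an illustrative assumption, that a lineage which has experienced exactly k matings is recombinant with probability 2f(1 - f)(1 - (1 - p)^k). Averaging over k drawn from a Poisson distribution with mean m, and using the identity E[(1 - p)^k] = e^{-mp}, gives exactly the Visconti-Delbrück equation above. The Python sketch below checks this numerically; the function names are hypothetical, and the fixed-k expression is a standard population-genetics argument rather than necessarily the route the original authors took.

```python
import math


def visconti_delbruck_r(m: float, p: float, f: float = 0.5) -> float:
    """Closed form R = 2f(1 - f)(1 - exp(-m p))."""
    return 2.0 * f * (1.0 - f) * (1.0 - math.exp(-m * p))


def poisson_averaged_r(m: float, p: float, f: float = 0.5, k_max: int = 200) -> float:
    """Average the fixed-k result 2f(1 - f)(1 - (1 - p)^k) over k ~ Poisson(m).

    The sum is truncated at k_max, which is ample for the moderate m used here.
    """
    total = 0.0
    pmf = math.exp(-m)                 # Poisson probability of k = 0
    for k in range(k_max + 1):
        total += pmf * 2.0 * f * (1.0 - f) * (1.0 - (1.0 - p) ** k)
        pmf *= m / (k + 1)             # Poisson probability of k + 1 from that of k
    return total


print(visconti_delbruck_r(5, 0.5))     # about 0.459
print(poisson_averaged_r(5, 0.5))      # agrees with the closed form
```

The agreement between the two functions is the whole content of the Poisson assumption: heterogeneity in the number of matings enters the equation only through the exponential term.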

The mean number of matings, m, per lineage was 
estimated to be about 5 by letting p be 0.5 for markers 
that gave R = 0.45. With m set at 5, the observed R 
values were converted to p values that showed no 
crossover interference. The Visconti-Delbrück theory
met similar success with linkage data from phage 
lambda, when m was set at 1. 
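The estimation procedure just described amounts to inverting the equal-input form R = ½(1 - e^{-mp}) twice: first for m, using p = 0.5 and R = 0.45 for unlinked markers, then for p at each observed R with m held fixed. The Python sketch below is a hedged illustration of that arithmetic; the function names are hypothetical, and R = 0.10 in the last line is an arbitrary illustrative value, not data from the article.

```python
import math


def _check_r(r: float) -> None:
    # The f = 1/2 equation can only produce R values in [0, 0.5).
    if not 0.0 <= r < 0.5:
        raise ValueError("R must lie in [0, 0.5) for the equal-input equation")


def estimate_m(r_unlinked: float, p_unlinked: float = 0.5) -> float:
    """Solve R = (1/2)(1 - exp(-m p)) for m, given R and p for unlinked markers."""
    _check_r(r_unlinked)
    return -math.log(1.0 - 2.0 * r_unlinked) / p_unlinked


def r_to_p(r: float, m: float) -> float:
    """Convert an observed recombinant frequency R into a per-mating probability p."""
    _check_r(r)
    return -math.log(1.0 - 2.0 * r) / m


m_hat = estimate_m(0.45)
print(round(m_hat, 2))          # 4.61
print(r_to_p(0.45, 5))          # about 0.46
print(r_to_p(0.10, 5))          # hypothetical R of 10% between linked markers
```

The first estimate, about 4.6, is consistent with the article's rounded value of about 5. For phage lambda, the same conversion would simply use m = 1, so that p = -ln(1 - 2R).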

For T-even phage, the mating theory required modi- 
fication when it was demonstrated that the ‘three link- 
age groups’ were, in fact, well separated segments of 
a single, circular linkage map. When the theory was 
adjusted for this circularity, the data were best ex- 
plained by the assumption that there was exactly one 
exchange per mating, rendering the concept of mating 
superfluous. The negative interference in the data, 
which had been the primary justification for the notion 
of mating, proved to be the result of several contribut- 
ing factors: (1) The classical definition of interference is 
inappropriate for a circular map; (2) as envisioned by 
Visconti and Delbrück, some vegetative phage are with- 
drawn from the mating pool (becoming infectious 
particles) before others, further contributing to
heterogeneity in opportunities for recombination; (3) 
when a mixture containing equal numbers of two 
phage genotypes is added to cells to initiate a cross, 
some cells, by chance, are infected by unequal numbers 
of the two types. This ‘finite input’ effect introduces 
further heterogeneity in recombination opportunities. 
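Why heterogeneity in recombination opportunities produces apparent negative interference can be illustrated with a toy calculation. The Python sketch below assumes, for illustration only, that a lineage with exactly k matings is recombinant in each of two adjacent intervals with probability r(k) = ½(1 - (1 - p)^k), and that, given k, the two intervals recombine independently, so no interference is built into any single lineage. The coefficient of coincidence (observed double recombinants divided by the product of the single-interval frequencies) is then compared for lineages that all mate the same number of times and for lineages whose number of matings is Poisson-distributed with the same mean. Function names and parameter values are hypothetical.

```python
import math


def r_interval(k: int, p: float) -> float:
    """Recombinant fraction for one interval after k matings (equal parental inputs)."""
    return 0.5 * (1.0 - (1.0 - p) ** k)


def poisson_weights(m: float, k_max: int = 60) -> dict[int, float]:
    """Poisson(m) probabilities for k = 0..k_max (truncated; ample for m near 5)."""
    return {k: math.exp(-m) * m**k / math.factorial(k) for k in range(k_max + 1)}


def coincidence(k_weights: dict[int, float], p1: float, p2: float) -> float:
    """Coefficient of coincidence: observed doubles / (single1 * single2).

    k_weights maps a number of matings k to the fraction of lineages with that k.
    Toy assumption: given k, the two intervals recombine independently.
    """
    single1 = sum(w * r_interval(k, p1) for k, w in k_weights.items())
    single2 = sum(w * r_interval(k, p2) for k, w in k_weights.items())
    double = sum(w * r_interval(k, p1) * r_interval(k, p2) for k, w in k_weights.items())
    return double / (single1 * single2)


fixed = coincidence({5: 1.0}, 0.05, 0.05)                # every lineage mates exactly 5 times
varied = coincidence(poisson_weights(5.0), 0.05, 0.05)   # mean of 5 matings, Poisson-distributed
print(fixed, varied)
```

With these toy parameters the fixed case gives a coefficient of 1, while the Poisson case gives about 1.16: more double recombinants than independent exchanges would predict, although no lineage has any interference built in.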

For T4 phage, the mating theory was dealt its final 
blow with the demonstration that separate regions of 
the linkage map behave independently. The demon- 
stration made use of intragenic crosses employing 
markers (call them 1 and 2) in both the r and the e
genes. One parent was the double mutant r1 e1; the
other was r2 e2. The frequencies of r⁺ and of e⁺
recombinants were measured in the absence and in
the presence of a third infecting phage, which was
e1 e2 and was deleted for the r gene. In the presence
of the third phage, the reduction in frequency of e⁺
progeny phage was greater than the reduction in
frequency of r⁺ progeny phage. Apparently, the e1
and e2 mutant genes indulged in fruitless interactions
with the e1 e2 double mutant gene, while the r1 and r2
genes interacted to produce r⁺ as if the third phage were
not present in the cross.


See also: Interference, Genetic; T Phages 
Vitamins are organic molecules that are required in
very small amounts by some, but not all, living organ-
isms. After ingestion, vitamins are converted to active
derivatives, in which form they act primarily as cofac- 
tors for enzymatic reactions. Vitamins such as A and 
D are also involved in regulating gene expression, 
while others have antioxidant capabilities. Vitamins 
differ from other metabolic intermediates because
they either cannot be synthesized by the organism or 
are present in such low quantities that they must be 
acquired externally. Historically, vitamins have been 
classified by their solubility: The B vitamins and vita- 
min C are soluble in water, whereas vitamins A, D, E, 
and K are soluble in organic solvents. 


Vitamins Act Catalytically as Cofactors 
in General Cell Metabolism 


Most vitamins or their active metabolites are needed as 
cofactors in the synthesis, catabolism, and modifica- 
tion of organic compounds. They interact, covalently 
and noncovalently, with enzymes and are necessary
for their activity. 

Biotin is a key cofactor required by the enzymes 
acetyl coenzyme A (acetyl-CoA) carboxylase (EC 6.4.1.2),
pyruvate carboxylase (EC 6.4.1.1), transcarboxylase
(EC 2.1.3.1), methylmalonyl-CoA decarboxylase (EC
4.1.1.41), and oxaloacetate decarboxylase (EC 4.1.1.3),
each of which catalyzes carboxyl transfer reactions.
The widespread distribution of biotin in foodstuffs 
and the ability to absorb some of the biotin synthe-
sized by intestinal bacteria make biotin deficiency in
humans rare. However, deficiency has been observed 
in humans and experimental animals on diets contain- 
ing large amounts of uncooked egg white. The glyco- 
protein avidin, found in abundant quantities in egg 
white, binds biotin with high affinity, thereby prevent- 
ing its absorption through the gastrointestinal wall. 

Another vitamin directly involved in the synthesis 
and degradation of organic compounds is vitamin B5
(pantothenic acid). This compound becomes the func-
tional moiety of coenzyme A and of the acyl carrier
protein (see Figure 1). Vitamin B1 (thiamine) is neces-
sary for energy-yielding metabolism, as its active form
thiamine pyrophosphate is the prosthetic group for 
some enzymes such as pyruvate dehydrogenase (EC 
1.2.4.1) and 2-oxoglutarate dehydrogenase (EC 1.2.4.2) 
that are involved in oxidative decarboxylation. 

Many dehydrogenases use nicotinamide adenine 
dinucleotide (NAD⁺) or nicotinamide adenine di-
nucleotide phosphate (NADP⁺). Niacin is a precursor
in the synthesis of both of these cofactors. Other
enzymes, designated flavoproteins, use the riboflavin
(vitamin B2)-derived oxidation-reduction cofactors,
flavin mononucleotide (FMN) and flavin adenine 
dinucleotide (FAD). 

Once translated, many proteins must be modified 
to be enzymatically or structurally active. Of signifi- 
cance in this regard is the action of vitamin C (ascorbic 
acid) as a cofactor for the hydroxylation of proline 
residues in collagen. This reaction is important for the 
maintenance of connective tissue, with deficiency 
causing related symptoms, including muscle fatigue, 
easily bruised skin, swollen gums, osteoporosis, and 
poor wound healing. Vitamin B6, in phosphorylated
form, is a necessary cofactor for transamination reac- 
tions required for amino acid synthesis and in the 
breakdown of the glucose storage polymer glycogen. 
Vitamin K is a cofactor for the γ-carboxylation of
glutamate residues, which converts blood coagulation
precursor proteins to their active forms.


Folic Acid and Vitamin B12 are Critical
for DNA Synthesis 


Sufficient amounts of folic acid and cobalamin (vita-
min B12) are required for nucleotide synthesis. Folate
derivatives directly associate as cofactors with enzymes
that synthesize purines and the pyrimidine thymidine,
while vitamin B12 is necessary to ensure that enough
folate-derived cofactor is present to support nucle-
otide biosynthesis.

Folic acid deficiency leads to megaloblastic and 
macrocytic anemia, a hallmark of improper DNA 
replication and cell division in rapidly dividing hema-
topoietic cells. This condition is attributable to insuffi-
cient thymidine synthesis, as purines can normally be
obtained in the diet. 5,10-Methylenetetrahydrofolate, 
a cofactor derived from folic acid, acts as a methyl 
donor for thymidylate synthase (EC 2.1.1.45), which 
converts deoxyuridine monophosphate (dUMP) to 
thymidine monophosphate (TMP). A lack of folic 
acid causes an accumulation of dUMP as well as a 
TMP deficiency. In folate-deficient cells, RNA tran-
scription and subsequent translation are normal, but
normal DNA replication is impeded. Consequently, 
the cytoplasm is able to expand, but the growth and 
division of the nucleus lags behind, preventing cell 
division. An overabundance of dUMP coupled with 
a deficiency of TMP also increases the likelihood of 
misincorporation of dUMP in place of TMP, which in
turn increases DNA instability and the probability of 
chromosome breakage. 

One of the functions of vitamin B12 is to act as a
cofactor for the enzyme methionine synthase (EC
2.1.1.13), which catalyzes the conversion of homocys-
teine to methionine and in the process demethylates
5-methyltetrahydrofolate. The resulting tetrahydrofolate
(THF) can be recycled and converted to such cofactors
as 5,10-methylenetetrahydrofolate to be used for
thymidine biosynthesis. Vitamin B12 deficiency leads
to an accumulation of homocysteine and, more
importantly, THF is not regenerated from 5-methyl-
tetrahydrofolate, mimicking the effects of folate
deficiency.


Vitamins E and C Act as Antioxidants 


Reactive oxygen species (ROS) cause extensive 
damage to DNA and membrane lipids. Enzymatic 
reactions catalyzed by superoxide dismutase, perox-
idase, and catalase counteract the effects of ROS. Pro-
tection against ROS, such as the highly reactive
peroxyl radicals, is also accomplished by antioxidant
molecules such as vitamins E and C. These react with 
the initial oxygen species to form less reactive species, 
which are readily quenched by molecules other than 
DNA and lipids. 

Vitamin E consists of a mixture of related com- 
pounds named tocopherols, all of which are soluble 
in lipids. Owing to their lipophilic nature, they ac- 
cumulate in cell membranes and adipose tissue. By 
behaving as a lipid peroxyl radical scavenger, vitamin
E prevents the peroxidation of polyunsaturated mem-
brane fatty acids. Once reacted, the radical form of
vitamin E can be converted back to α-tocopherol
in a redox reaction, enhancing its antioxidant cap-
abilities. Deficiency of vitamin E can cause red blood
cell instability, but there are no major vitamin E
deficiency diseases, because this substance is present
in most food sources.


Figure 1 Pantothenic acid is an essential component of coenzyme A (SCoA). (A, B) The vitamin pantothenic acid
(A) is incorporated as a functional component of coenzyme A (B); the dashed box in (B) shows the substituents of
coenzyme A that were derived from pantothenic acid. (C, D) Coenzyme A can be attached to acetyl and other
carbonyl groups, as in acetyl-CoA, to act as a good leaving group in carbon-carbon bond formation. A basic residue (X)
of an enzyme can deprotonate the α-carbon of acetyl-CoA. The carbanion created can attack the electrophilic carbon of the
carbonyl group of another acetyl-CoA, creating a carbon-carbon bond. The curved arrows show the direction of
electron flow in the reaction mechanism. (E) The product of the substitution reaction.

Vitamin C (ascorbic acid) is water soluble and acts 
as a free radical scavenger in the cytoplasm and organ-
elles of cells. Ascorbic acid is able to react with the
α-tocopheryl radical as well as donate electrons to ROS.
This accomplishes two important tasks: the recycling 
of vitamin E and the prevention of the tocopheryl 
radical from starting a phospholipid peroxidation 
chain reaction. 


Vitamin A Regulates Cell Growth and 
Development 


A member of the vitamin A family of molecules, ret- 
inoic acid, affects DNA replication and cell division. 


Like the other vitamin A molecules, retinol and retinal, 
retinoic acid is also derived from carotene. Retinal is an 
integral part of the membrane-bound light receptors 
rhodopsin and iodopsin. However, retinoic acid acts 
as a signaling molecule by binding ligand-dependent 
nuclear receptor proteins in a manner comparable with 
the behavior of steroid hormones. 

Retinoic acid binds to retinoic acid recep-
tors (RARs) and retinoid X receptors (RXRs),
allowing dimeric complexes of receptors to bind 
recognition sequences in the promoter region of target 
genes. RAR is able to heterodimerize with RXR, 
while RXRs can homodimerize or heterodimerize 
with other members of the nuclear receptor super- 
family. DNA binding can activate transcription or 
block the binding by other transcription factors, 
both of which contribute to repressing cell replication 
and thymidine uptake. 

11-cis-retinal is bound to the photopigments 
rhodopsin, in rod cells, and iodopsin, in cone cells. 
Light absorption by the opsins stimulates a series of 
conformational changes of the bound retinoid. Each 
conformational change makes the association between
retinal and the protein progressively less stable.
This process culminates in the dissociation of
all-trans-retinal. Light activation of the opsin triggers
a signaling cascade that closes sodium channels, hyperpolarizing
the cell membrane. The change in membrane potential 
is transmitted as a nervous impulse along the optic 
neurons. 


Vitamin D is Important for Calcium and 
Phosphorus Metabolism 


The most important derivative of vitamin D, the bio-
logically active form calcitriol (1,25-dihydroxyvitamin
D3) acts similarly to retinoic acid (see Vitamin A 
regulates cell growth and development), by binding 
the nuclear vitamin D receptor (VDR). The binding to 
response elements in promoter regions by vitamin 
D-bound VDR dimers induces transcription of genes 
involved in increasing the intestinal absorption and 
kidney resorption of calcium and phosphorus. A 
deficiency reduces overall calcium and phosphorus 
levels, leading to the childhood disease rickets, char- 
acterized by the incomplete mineralization of bones. 
An analogous condition in adults is osteomalacia, 
which is the result of the demineralization of mature 
bones. Deficiencies are rare because of the capacity of 
skin cells to synthesize vitamin D in a light-dependent
reaction. 


The Evolution of Vitamin C Synthesis 


Most mammals, birds, amphibians, and reptiles are 
able to synthesize ascorbic acid. However, guinea 
pigs and primates (including humans) have lost the
ability to synthesize this vitamin. This phenomenon 
is caused by mutations in the gene encoding the 
enzyme L-gulonolactone oxidase (EC 1.1.3.8), which 
is responsible for converting gulonolactone to ascor- 
bic acid. It is generally believed that this deficiency 
was maintained because adequate dietary intake
removed any selective advantage of synthesizing the vitamin.


Future Research 


There is still much to be understood about the bio- 
synthesis and mechanism of action of vitamins. The 
role of antioxidant vitamins in preventing cancer and 
possibly increasing longevity is not fully understood. 
Further research on vitamin gene regulation and the 
metabolic pathways affected will help to elucidate
vitamins’ role in development. Ultimately, what is
learned can be applied to improve dietary intake and,
consequently, human health.
Glycogen storage diseases are metabolic disorders
resulting in storage of abnormal amounts and/or 
forms of glycogen. Von Gierke disease is a glycogen 
storage disease caused by defective liver glucose-6-
phosphatase activity. The disease-causing muta-
tion(s) can either be in the gene coding for the liver
glucose-6-phosphatase enzyme (G6PC) or in the 
gene coding for the endoplasmic reticulum substrate 
and/or product transport proteins of the glucose-6- 
phosphatase system (see Figure 1). 


History 


In 1929, the pathologist von Gierke carried out an 
autopsy on a child who had died of influenza. He 
noticed a very large liver that stained positively for 
glycogen. He sent tissue samples to Schoenheimer, a 
biochemist, who showed that the glycogen levels were 
36% of the dry weight of the liver. In subsequent 
years, more patients were described with abnormal 
liver glycogen storage. It took over 20 years for the 
first enzyme defect in glycogen storage disease to be 
delineated. In 1952, Carl and Gerty Cori found abnor- 
mally low glucose-6-phosphatase enzyme activity in 
livers of some (but not all) patients with glycogen 
storage disease. Glucose-6-phosphatase deficiency
was therefore termed von Gierke disease. Since 1952,
defects in virtually all proteins involved in the synthesis
or degradation of glycogen (see Figure 2) and its regula-
tion have been found to cause forms of glycogen stor-
age disease. The glycogen storage diseases are now
usually either named after the enzyme that is defective
or numbered in the order in which the enzymatic
defects were identified. Von Gierke disease was the
first glycogen storage disease to be delineated and it is
now more commonly called type 1 glycogen storage
disease or glucose-6-phosphatase deficiency.


Figure 1 A schematic representation of the human
liver glucose-6-phosphatase system. G6PC, glucose-
6-phosphatase enzyme; G6PT1, endoplasmic reticulum
glucose-6-phosphate transport protein; Pi, inorganic
phosphate; G-6-P, glucose-6-phosphate.


Function of Liver Glycogen 


Glucose is the primary source of energy for most 
mammalian cells. Most tissues cannot make sufficient 
glucose to meet their metabolic needs. Blood glucose 
levels must stay within a narrow range to maintain 
normal metabolic function in brain and other tissues. 
It is, therefore, advantageous to an individual to have 
the ability to store glucose at times of plenty (for 
example, after a meal) in a compact macromolecular 
form, which can be rapidly broken down and released 
into the bloodstream at times of need. In human liver, 
glycogen is the storage form of glucose. 


Role of Liver Glucose-6-Phosphatase 


Whenever blood glucose levels fall, or at times of 
stress, the liver releases glucose into the bloodstream. 
In addition to breaking down glycogen to form glu- 
cose, the liver can also synthesize new glucose via 
the pathway called gluconeogenesis (see Figure 2). 
Glucose-6-phosphatase catalyzes the final step of both
gluconeogenesis and glycogen breakdown. Glucose-6-
phosphatase breaks down glucose-6-phosphate to
glucose and inorganic phosphate and is the only 
enzyme that is capable of forming significant amounts
of glucose in the body. The major role of liver glucose- 
6-phosphatase is therefore to produce glucose for use 
by other tissues. 

In patients with type 1 glycogen storage disease 
significant amounts of glucose cannot be made by 
either pathway. In contrast, in most other types of 
glycogen storage disease only glycogen breakdown is 
affected, and glucose can still be made via the gluco- 
neogenic pathway. 


The Liver Glucose-6-Phosphatase 
System 


Liver glucose-6-phosphatase is an integral membrane 
protein and its active site is inside the lumen of the 
endoplasmic reticulum, whereas all the other enzymes 
that produce or use glucose-6-phosphate are in the 
cytoplasm (see Figure 2). This means that the sub- 
strates and products of glucose-6-phosphatase must 
cross the endoplasmic reticulum membrane (see 
Figure 1). The substrate and product transport pro- 
teins are facilitative transporters that transport their 
substrates down a concentration gradient. The two 
common forms of type 1 glycogen storage disease 
are termed type 1a and type 1b glycogen storage
disease.


Molecular Bases of Type 1a Glycogen
Storage Disease: Glucose-6-Phosphatase 
Enzyme (G6PC) Deficiency 


A human liver glucose-6-phosphatase enzyme has 
been cloned, and the human glucose-6-phosphatase 
gene (G6PC1), which has five exons, is located on 
chromosome 17 at q21. To date, over 40 different 
mutations in the glucose-6-phosphatase enzyme gene 
have been found in patients with type 1a glycogen 
storage disease. Mutations have been found through- 
out all five exons and at the intron-exon boundaries. 
Two other highly related genes, G6PC2 and G6PC3, 
have been found recently. The three G6PC genes have 
different tissue-specific patterns of expression. Only 
G6PC1, the gene mutated in type 1a glycogen storage 
disease, is expressed at significant levels in human 
liver. 


Molecular Bases of Type 1b Glycogen
Storage Disease: G6PT1 Deficiency


The gene mutated in type 1b glycogen storage disease 
has been mapped to chromosome 11q23. The gene has 
nine exons spanning a region of approximately 4 kb. 
To date, mutations have been found in all the exons 
except exon 7. The gene is differentially spliced in a
tissue-specific manner and exon 7 is not expressed in
liver. The type 1b gene (G6PT1) is expressed more widely
than the G6PC1 gene, which may explain why patients with
type 1b glycogen storage disease often have symptoms
in addition to those seen in type 1a glycogen storage
disease. For example, patients with type 1b glycogen
storage disease often have neutropenia, and G6PT1
(but not G6PC1) is expressed in neutrophils.


Figure 2 Schematic representation of the pathways of liver glucose production.


Clinical Presentation of Glycogen 
Storage Disease 


Type 1a

The clinical manifestations of type 1a glycogen storage 
disease are many and varied, including the expected
effects of defective glucose production, e.g., growth
retardation, hepatomegaly, fasting hypoglycemia, lac-
tic acidemia, hyperuricemia, and hyperlipidemia. The
severity of individual symptoms varies greatly among
patients, who may be virtually asymptomatic in rare
cases. The disease normally presents in childhood
but, surprisingly, some cases present in adulthood. Long-
term complications include gout, hepatic adenomas,
hepatomas, and renal disease. 


Type 1b

Type 1b glycogen storage disease is often more severe 
than type 1a glycogen storage disease. Type 1b glyco- 
gen storage disease has a similar clinical course to type 
1a glycogen storage disease, with the additional find- 
ings of neutropenia and impaired neutrophil function 
resulting in recurrent bacterial infections. Oral and 
intestinal mucosa ulceration commonly occur, and 
cases of chronic inflammatory bowel disease have 
been reported. 


Management 


In the past, many patients with type 1 glycogen stor- 
age disease died, and prognosis was guarded in those 
who survived. In the past 20 years, major progress has 
been made in managing this disorder. Current treat- 
ment of type 1 glycogen storage disease involves the 
nocturnal nasogastric infusion of glucose and/or oral 
uncooked cornstarch. Early diagnosis and initiation of 
treatment have improved the prognosis, with normal
growth and pubertal development and reduced risk of 
gout in adult patients. 


Further Reading 

Cori GT and Cori CF (1952) Glucose-6-phosphatase of the liver in
glycogen storage disease. Journal of Biological Chemistry 199:
661-667.


See also: Glucose 6-Phosphate Dehydrogenase 
(G6PD) Deficiency 
Germline mutations in the VHL tumor suppressor
gene cause von Hippel-Lindau disease, a dominantly
inherited familial cancer syndrome characterized by 
the development of vascular tumors in the retina and 
central nervous system (hemangioblastomas), clear cell 
renal cell carcinomas, pheochromocytomas, pancreatic
islet cell, and other tumors. Tumors from von Hippel-
Lindau patients have somatic inactivation of the wild-
type allele, and somatic inactivation of both alleles
occurs in most sporadic clear cell renal cell carcinomas
and hemangioblastomas. The VHL gene product
plays a critical role in regulating proteasomal degrad-
ation of the α subunits of the hypoxia-inducible tran-
scription factors HIF-1 and HIF-2.


See also: Tumor Suppressor Genes 


Von Willebrand Disease 
Von Willebrand disease is the commonest inherited
bleeding disorder in humans. Deficiency of von
Willebrand factor (vWF) causes defective platelet
adhesion and abnormal factor VIII function; in its
effects it thus mimics hemophilia A. Unlike hemophilia
A, the condition is inherited as an autosomal trait,
usually autosomal dominant, although a severe
autosomal recessive form, due to homozygosity for a
mutant allele, is also recognized. The vWF gene has
been sequenced, and many mutations associated with
the disease have been identified.


See also: Hemophilia 


X Chromosome 
Y Boyd 


Copyright © 2001 Academic Press 
doi: 10.1006/rwgn.2001.1392 


The human X chromosome contains around
150 000 000 base pairs (150 Mb) of DNA, approximately
5% of the genetic content of each cell. There are
estimated to be around 3000 to 5000 genes carried on
the X chromosome, and several hundred of these have
been associated with clinical disease. Lists of genes
and genetic diseases that have been mapped to the
human X chromosome can be found on the Genome
Database (GDB) website. Two features of the X
chromosome have made it particularly amenable to
study. The first is the difference in X chromosome
content between females and males: XX females
inherit one X chromosome from each parent, whereas
XY males inherit their single X chromosome from their
mothers and the sex-determining Y chromosome from
their fathers. Pedigrees for X-linked recessive traits,
such as color blindness and hemophilia, therefore show
a distinctive inheritance pattern, with the trait being
manifested almost exclusively in males and being passed
on through their unaffected daughters to their
grandsons. The second catalyst for research on
X-chromosome genes was the discovery that the
chromosome carries the locus encoding hypoxanthine
phosphoribosyl transferase (HPRT), a nonessential
enzyme in the purine salvage pathway. Because HPRT
can be subject to both forward and reverse chemical
selection, the presence or absence of the entire X
chromosome, or of portions of the X chromosome
containing HPRT, can be selected for in cell culture
systems. This has led to the production of somatic cell
hybrid panels containing different portions of the X
chromosome, which have been used extensively to map
X-chromosomal genes.
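The distinctive pedigree pattern described above follows from simple bookkeeping of which parent contributes each sex chromosome. The Python sketch below enumerates the offspring of two successive crosses for a hypothetical X-linked recessive allele "a" (normal allele "A"). It assumes full penetrance, no new mutations, and equal numbers of X- and Y-bearing sperm; the allele symbols and function names are illustrative only.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def cross(mother, father):
    """Offspring classes and probabilities for a single X-linked locus.

    mother: her two X-linked alleles, e.g. ("A", "a")
    father: (allele on his X, "Y"), e.g. ("A", "Y")
    "A" (normal) and "a" (recessive mutant) are hypothetical labels.
    """
    counts = Counter()
    for egg, sperm in product(mother, father):
        if sperm == "Y":
            # A son gets his only X from his mother and the Y from his father.
            counts[("son", egg)] += 1
        else:
            # A daughter gets one X from each parent; sort for a canonical genotype.
            counts[("daughter", tuple(sorted((egg, sperm))))] += 1
    total = sum(counts.values())
    return {child: Fraction(n, total) for child, n in counts.items()}


def affected(child):
    """Assumed fully recessive: sons need one mutant allele, daughters two."""
    sex, genotype = child
    return genotype == "a" if sex == "son" else genotype == ("a", "a")


# Affected man x noncarrier woman: his children.
gen1 = cross(mother=("A", "A"), father=("a", "Y"))
# His (carrier) daughter x an unaffected man: his grandchildren.
gen2 = cross(mother=("A", "a"), father=("A", "Y"))

for label, offspring in (("children", gen1), ("grandchildren", gen2)):
    for child, p in sorted(offspring.items()):
        status = "affected" if affected(child) else "unaffected"
        print(label, child, p, status)
```

Under these assumptions none of the affected man's children are affected, every daughter is a carrier, and half of that daughter's sons are expected to be affected, which is the "skipping through daughters to grandsons" pattern.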

The X chromosome has also been a focus of research
because of its association with X-inactivation (see
X-Chromosome Inactivation), the phenomenon whereby
most genes on one of the two X chromosomes, selected
at random, are silenced in early female development.
Recent advances in molecular biology have brought
considerable insight into the mechanisms behind this
phenomenon, which is thought to have arisen as a
means of ensuring that both XX females and XY males
have a single active dose of the products of X-linked
genes in all their somatic cells.


Evolutionary Origin of the X Chromosome


Examination of the sex chromosomes of nonmammalian
vertebrates such as snakes has led to the conclusion
that the mammalian X and Y chromosomes were derived
from a pair of autosomal homologs which initially
differed at only one or a few loci required for sex
determination. The difference in size and shape
between the X and Y chromosomes, known as
heteromorphism, is thought to have arisen through
recombination suppression and loss of genes from the
sex-determining Y chromosome. There are still some
regions of homology between the human X and Y
chromosomes, including the 5 Mb pseudoautosomal
region where the two chromosomes pair during meiosis
(Figure 1). The ancestral sex chromosomes of mammals
are probably represented by the region of approximately
120 Mb covering the entire human X long arm and the
centromeric region of the short arm, which is
homologous to the X chromosome found in noneutherian
mammals such as marsupials. The additional material
on the X chromosome short arm of eutherian (placental)
mammals is autosomal in marsupials and is thought to
have been transposed onto the X chromosome at
intervals during evolution. X and Y chromosomes of
differing sizes are also present in insects and worms.
Birds have a reciprocal system in which females are the
heterogametic sex (i.e., they make two different types of
gametes) and carry two different sex chromosomes, in
this case called Z and W, while males have a ZZ
complement. The sex chromosomes of birds are thought
to have evolved independently from those of mammals,
as genes that are Z-linked in birds have been mapped to
a range of human and mouse autosomes.

Figure 1 The human X chromosome represented as a
Giemsa-banded ideogram. Xp (X chromosome short arm)
and Xq (X chromosome long arm) are divided into
stained and unstained regions that are referred to by a
standard nomenclature (numbers given to the right of
the chromosome). The four main regions of homology
between the human X and Y chromosomes, including
the Xp-Yp homology of the pseudoautosomal region, are
indicated by arrows. There are also much smaller
regions of homology represented by single genes.

Figure 2 Comparative map of the human and mouse X chromosomes. Each rectangular block represents a
chromosomal region, known as a conserved segment or homologous block, which contains the same genes in the
same order on both chromosomes. The seven largest blocks have been given equivalent numbers on the human and
mouse X chromosomes. It is not known whether there is any sequence homology between the human and mouse
centromeres. Whereas only two large conserved segments (blocks 6 and 7) are shared between the human X
chromosome long arm and the mouse X chromosome, the human X chromosome short arm is composed of several
smaller conserved segments. Genes responsible for human X-linked disorders that have either natural or
engineered mouse models are shown to the side of each chromosome.


Conservation of X Linkage in Mammals 


In the late 1960s Susumu Ohno predicted that genes 
that were X-linked in one mammalian species would 
also be X-linked in another to avoid the imbalance in 
gene dosage that would occur if X chromosome genes 
were moved onto an autosome. This prediction has 
been shown to be true and has helped to identify 
animal models for human X-linked diseases, which 
can be recognized by their unusual inheritance pat- 
tern. However, although the same genes lie on the X 
chromosome in all mammals, mapping experiments 
have revealed that groups of loci in the same order, 
known as conserved segments or homologous blocks, 
have been rearranged with respect to each other dur- 
ing evolution. A good example of this is illustrated 
by the comparative map of the human and mouse X 
chromosomes which comprises a series of conserved 
segments that range in size from 100 000 to 50 000 000 
base pairs (Figure 2). The pseudoautosomal region,
which contains genes that are expressed from the active
X, the inactive X, and the Y chromosome and are
therefore not subject to dosage compensation, is the
only region in human and mouse that does not have a
conserved gene content.


Human X-Linked Disease 


The most common human syndromes associated with
the X chromosome are anomalies in sex chromosome
number that arise through nondisjunction at meiosis.
Turner syndrome occurs in approximately 1 in 2000
female births and is caused by the loss of an entire sex
chromosome, leading to an XO karyotype. To explain
why the presence of a single X chromosome is
deleterious in XO females but not in XY males, it has
been proposed that Turner syndrome is caused by a
single, rather than double, dose of one or more of the
few genes that normally escape X-inactivation. This is
consistent with the observation that mice with an XO
karyotype do not have an overt phenotype and that
fewer mouse genes have been reported to escape
X-inactivation. An additional X chromosome is present
in the approximately 1 in 600 males with Klinefelter
syndrome, who have an XXY karyotype. More rarely,
females have also been identified with XXX and
XXXX complements. Mutations or rearrangements in
genes that are important in primary or secondary sex
determination can give rise to females with an XY
chromosome complement and males with an XX
complement.

Mutations in single X-linked genes are fully
expressed in males and give rise to ‘sex-linked’
disorders, for example, Duchenne muscular dystrophy,
which has an incidence of around 1 in 3000 males, and
the fragile X mental retardation syndrome, which has
an incidence of around 1 in 10 000 males. As a result of
the random inactivation of one of their two X
chromosomes in early development, all females are
mosaics of two populations of cells, and the relative
numbers of cells in these two populations differ
between individuals. Females heterozygous for a
mutated gene are often completely unaffected, because
the population of cells expressing the nonmutated
allele either provides a sufficient quantity of normal
gene product or, during development or lineage
differentiation, predominates over the population of
cells carrying the mutant allele. However, some female
carriers of X-linked ‘recessive’ diseases show some
disease symptoms because of a natural skew in favor of
cells with the mutated X as the active chromosome.
Very occasionally, carrier females may manifest the
same severity of disorder as that seen in males.
Mutations in X-linked genes may also give rise to
X-linked ‘dominant’ disorders found only in females; in
these instances it is assumed that affected males die
before birth. The most common example of an X-linked
dominant disorder is Rett syndrome, a severe
progressive neurological disorder affecting
approximately 1 in 20 000 females, which has recently
been associated with mutations in the gene encoding
methyl-CpG-binding protein 2 (MECP2).
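Why the relative sizes of the two cell populations vary between females, and why some carriers are markedly skewed, can be illustrated with a simple binomial sampling model. Assume (this is an illustrative assumption, not a claim from the text) that X-inactivation happens when a tissue derives from a small pool of precursor cells, each of which independently and with equal probability keeps one X active, and that its descendants inherit that choice. The Python sketch below counts how often a simulated carrier ends up with at least 80% of her cells using one particular X; the pool sizes, the 80% cut-off and the function names are all hypothetical.

```python
import random


def mutant_active_count(pool_size, rng):
    """Number of precursor cells that keep the mutant-bearing X active.

    Assumes each precursor chooses independently with probability 0.5
    and that all its descendants inherit the same active X.
    """
    return sum(rng.random() < 0.5 for _ in range(pool_size))


def proportion_skewed(pool_size, n_females=5000, threshold=0.8, seed=1):
    """Fraction of simulated carriers in which either cell population
    makes up at least `threshold` of the precursor pool."""
    rng = random.Random(seed)
    skewed = 0
    for _ in range(n_females):
        k = mutant_active_count(pool_size, rng)
        # Compare integer counts to avoid floating-point edge cases.
        if max(k, pool_size - k) >= threshold * pool_size:
            skewed += 1
    return skewed / n_females


for pool in (8, 16, 100):
    print(pool, proportion_skewed(pool))
```

In this toy model, the smaller the precursor pool, the more often chance alone produces a strongly skewed female. The model deliberately ignores the selection for or against cell populations during development that the text also mentions.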


Further Reading 

Boyd Y, Blair HJ, Cunliffe P, Masson WK and Reed V (2000) 
A phenotype map of the mouse X chromosome: models 
for human X-linked disease. Genome Research 10: 277-292. 

Genome Database (GDB) http://www.gdb.org/gdb.

Lahn BT and Page DC (1999) Four evolutionary strata on the
human X chromosome. Science 286: 964-967.

Miller JR (1990) X-Linked Traits: A Catalog of Loci in Non-Human 
Mammals. Cambridge: Cambridge University Press. 

Ohno S (1969) Evolution of sex chromosomes in mammals. 
Annual Review of Genetics 3: 495-521. 


See also: Ohno’s Law; Sex Determination, 
Human; Sex Linkage; W Chromosome; 
X-Chromosome Inactivation; Z Chromosome 
In mammals sex is determined by the X and Y
chromosomes, females being chromosomally XX and
males XY. In females one of the two X chromosomes in
each cell becomes genetically inactive and largely
untranscribed early in development and remains so
throughout life. This is termed X-chromosome
inactivation (XCI). The result is that the effective
dosages of products of X-linked genes are equal in
males and females. The X chromosome is typically
large and carries many genes unconnected with sex,
whereas the Y chromosome is typically much smaller
and carries orthologs of only a few of the X-linked
genes. Without XCI, males and females would thus
differ in levels of X-linked gene products;
X-chromosome inactivation is therefore a form of
dosage compensation. The existence of XCI was first
suggested by Mary Lyon in 1961. For a time this
suggestion was known as the ‘Lyon hypothesis,’ and
the inactive X chromosome was said to be ‘lyonized.’
These terms are now outdated, however.

Either X chromosome can be inactivated in differ- 
ent cells in the embryo proper of eutherian mammals. 
Once the choice has been made the same X chromo- 
some remains inactive in the descendants of each cell 
throughout life. By contrast, in the extraembryonic
membranes of eutherian mammals, and in all cells of
marsupials, the paternally derived X chromosome is
inactivated in all cells. In female germ cells the
inactive X chromosome is reactivated as the cells
approach meiosis, whereas in male germ cells the
single X chromosome becomes inactive. Rare
individuals are found with supernumerary or missing
X chromosomes such as XXY males, or XXX or 
XO females. In these individuals a single X chromo- 
some remains active no matter how many are present. 
Thus, there is a counting mechanism ensuring that a 
single X chromosome remains active per two auto- 
some sets. 

When the X chromosome becomes inactive it takes
on a set of characteristic properties. It replicates its
DNA late in S phase, and it remains condensed during
interphase, forming the sex chromatin body against
the nuclear membrane. It shows hypoacetylation of
lysine residues in histones, and in somatic cells of
eutherian mammals the cytosines in the CpG islands of
housekeeping gene promoters are methylated on the
inactive X but not on the active X. In these respects
the inactive X chromosome (Xi) behaves like
heterochromatin, whereas the active X chromosome
(Xa) in the same cell behaves as euchromatin. In
addition, a specific histone variant, macroH2A1, is
concentrated in the Xi.

X-chromosome inactivation in marsupials differs 
from that in eutherians not only in preferential pater- 
nal X inactivation but also in the lack of differential 
methylation. Late replication and hypoacetylation of 
histones are seen as in eutherians. The inactivation is 
both less complete and less stable than in eutherians. 

The mechanism of XCI remains unknown but 
there have been recent major advances in knowledge. 
The initiation of inactivation in early development 
requires the presence of the X-inactivation center 
(XIC) on the X chromosome. Segments of X chromo- 
some lacking an XIC through translocation or deletion 
do not undergo XCI. A single gene Xist (X inactive 
specific transcript) located at the XIC is essential for 
XCI. Gene knockouts show that it is needed for both 
random and paternal XCI, but is not needed for sper- 
matogenesis (when the single X chromosome becomes 
inactive). Knockout of the promoter and first exon 
prevented initiation of inactivation of the affected X 
chromosome but counting of X chromosomes still 
occurred. Knockout of the region 3’ to exon 6 pre- 
vented counting. Insertion of transgenes for Xist 
showed that a 40-kb cosmid including the Xist gene 
was sufficient for counting and inactivation. Xist 
RNA can coat an autosome into which the transgene
has been inserted and inactivate autosomal genes;
hence X-specific sequences are not essential for Xist
function. Before the onset of XCI, Xist is transcribed
from both X chromosomes, but the transcripts are
unstable. At XCI the transcript from the
incipient Xi becomes stabilized and its RNA accumu- 
lates over the entire length of the Xi and appears to 
coat it. The allele on the Xa is then silenced. Methy- 
lation of cytosines at certain sites in Xist is required to 
maintain its silence on the Xa. Methylation is also 
thought to form the imprint that prevents the matern- 
ally derived X chromosome becoming inactive in the 
extraembryonic membranes. Xist has not yet been 
found in marsupials. It may be that they have lost the 
gene or it may be present but very poorly conserved. 

It is not clear whether Xist has a role in maintaining
the inactive state once initiated. Loss of Xist activity 
after the X chromosome has become inactive does not 
necessarily lead to reactivation. There is good evi- 
dence that differential methylation of cytosines on 
the Xi is important in stabilizing inactivation. Lack 
of differential methylation of the Xi in marsupials is 
thought to explain the lower stability of XCI in this 
group. In addition, late replication of the Xi is thought 
to provide a stabilizing mechanism in both eutherians 
and marsupials. 


How Xist RNA brings about the conversion of the 
Xi from the active euchromatic to the inactive hetero- 
chromatic state remains unknown. Since late replica- 
tion of DNA and hypoacetylation of histones are both 
found in marsupials as well as eutherians they are 
thought to be part of the fundamental mechanism. 
During the process of initiation of XCI late replica- 
tion appears earlier than hypoacetylation, suggesting 
that induction of a delay in replication may be the 
first step leading on to hypoacetylation, condensation, 
and inactivation. However, other possibilities remain 
open. 


See also: Sex Chromosomes; X Chromosome 
Xenology is the condition of a gene having been, in its
history, transferred not from parent to offspring but
from one species to another (horizontal transfer). It
does not include transfer between organelles and the
nucleus.


See also: Horizontal Transfer 
The South African clawed frog, Xenopus laevis
(Figure 1), is a model organism for the analysis of
vertebrate development. Its long generation time of
1.5 years makes it impractical to generate mutants for
genetic analysis. Nonetheless, the availability of large
numbers of large eggs and the ease of experimentally
manipulating the early embryos allow the
identification and characterization of genes that are
important for development.

While amphibians have long been used for embry- 
ological research, X. laevis has been the frog of choice 
since 1960. An important reason for this choice is that 
X. laevis is the easiest amphibian to maintain as a 
breeding colony. Unlike most other frogs, it remains 
aquatic as an adult, it tolerates dirty water, and it does 
not require live food. Females can be injected with
hormones to induce egg-laying every few months
over many years, and tadpoles are easily raised to
become reproductive adults.

Figure 1 Xenopus laevis. (From: Amphibians: Guidelines
for Breeding, Care, and Management of Laboratory Animals
(1974) National Academy of Sciences, USA.)

Although it takes too long to do crosses to charac- 
terize genes by mutation, the combination of molecu- 
lar and embryological techniques in X. laevis allows 
genes to be analyzed. The embryological advantages 
include the numbers of eggs, their large size, the speed 
of development, and the hardiness of the embryos. An 
adult female will spawn hundreds of eggs at a time, 
each 1.3 mm in diameter. By 3 days after fertilization, 
many organ systems have formed and the embryo 
begins to swim. It is simple to obtain large numbers 
of synchronously developing embryos and to dissect 
out particular parts of them. Similarly it is easy to 
inject molecules into the embryos and to perform 
microsurgery. Pieces of embryo can be transplanted 
to other embryos or joined together as recombinants 
in culture, requiring only simple salt solutions and 
antibiotics. 

In place of mutagenesis to find genes of interest, 
those working with X. laevis use various criteria to 
select genes from cDNA libraries. Criteria may 
include: expression of the gene at one stage or in one 
tissue of the embryo and not in another, homology to a
gene of interest in another model organism such as the 
fruit fly Drosophila or the nematode Caenorhabditis 
elegans, or an activity in the embryo itself. The activity 
of any cDNA is easily assayed by injecting its corres- 
ponding RNA into the dividing embryo and examin- 
ing the effect on development. The simplicity of the 
RNA injection assay allows molecular screens for 
genes in place of mutant screens. For example, the 
dorsal axis is the defining feature of the vertebrates 
and consists of the central nervous system, the back- 
bone, and the body musculature. Use of molecular 
approaches on X. laevis embryos identified genes
important for the development of the dorsal axis. 
These include Siamois, goosecoid, chordin, noggin, 
and Cerberus, most of which play important roles in 
mammalian development. 

The strengths of X. laevis as a model system can be 
seen by the variety of investigations that can be done. 
Signaling between two tissues in development can be 
detected by the simple microsurgical procedure of 
making a recombinant in culture between those two 
tissues. Basic life processes can be analyzed using pre- 
parations from X. laevis eggs. For example, extracts 
made from eggs or early embryos continue to exhibit 
features of cell division in the test tube, and molecules 
can be added to the extracts to test their role in the cell 
cycle. In addition, the X. laevis egg itself can serve as a
tiny, 1-μl test tube in which to examine the activities of
molecules. For instance, RNAs encoding proteins that
serve as ion channels can be injected into eggs, which
then make and use those channels. In this way, the
chloride channel involved in cystic fibrosis was analyzed.
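Treating the egg as a perfect sphere 1.3 mm in diameter (the size quoted earlier; sphericity is a simplifying assumption), its volume works out to roughly one microliter, which is what makes a single egg usable as a miniature reaction vessel. A minimal arithmetic check in Python:

```python
import math

diameter_mm = 1.3                                 # egg diameter quoted in the text
radius_mm = diameter_mm / 2
volume_mm3 = (4 / 3) * math.pi * radius_mm ** 3   # volume of a sphere
volume_ul = volume_mm3                            # 1 mm^3 equals exactly 1 microliter
print(f"{volume_ul:.2f} microliters")             # prints "1.15 microliters"
```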

Finally, new genes can be introduced into X. laevis 
by mixing cDNAs with sperm nuclei. The sperm 
nuclei are injected into unfertilized eggs to start devel- 
opment. This transgenic procedure permits more 
extensive testing of genes in X. laevis development as 
well as the establishment of genetic lines that carry a 
gene of interest. X. laevis and its relative X. tropicalis, 
which has a simpler, diploid genome, provide valuable
opportunities for the study of vertebrate development and
cell biology. 


See also: Developmental Genetics 
Xeroderma pigmentosum (XP) is the classical human
recessive disorder caused by defective nucleotide
excision repair of DNA damage, including pyrimidine
dimers induced by UV radiation. Sun-exposed skin of
XP patients appears parchment-like and
hyperpigmented, and carries a more than 1000-fold
increased risk of skin cancer. In the most severely
affected cases there is also progressive neuronal
degeneration. Clinical management is largely restricted
to stringent protection from sunlight. Seven different
genes are involved. Of these, XPA, XPC, and XPE are
required for DNA lesion recognition; the XPB and XPD
gene products are helicases mediating local strand
unwinding; and XPF and XPG specify structure-specific
endonucleases that perform strand incision on either
side of the lesion. There is an additional, relatively
mild ‘variant’ form of XP caused by defective DNA
polymerase η (eta), a translesion polymerase that can
replicate DNA templates containing UV damage.


See also: DNA Polymerase (Eta); Excision 
Repair; Pyrimidine Dimers 
Organisms with XX:XY systems of sex determination
generally exhibit equal levels of X-linked gene
expression in XX females and XY males. This
phenomenon, known as dosage compensation, is
achieved in mammals by inactivation of one X
chromosome in female cells (Lyon, 1961). As a result,
the two X chromosomes in the same cell are
distinguishable from one another. Almost from the
inception of the X-chromosome inactivation
hypothesis, it was realized that this unequal expression
of the two Xs in the same cell could best be explained
by a single initiation site on the X chromosome from
which inactivation would spread. Studies of
X:autosome translocations supported this idea, in that
only one X chromosome segment of a rearrangement
could be inactivated, and studies of the X chromosome
breakpoints from different translocations allowed the
initiation site to be approximately located. In addition,
heritable variation in X chromosome susceptibility to
inactivation permitted approximate mapping of the
initiation site, and the results agreed with those from
the translocation studies. The initiation site was named
the X inactivation center (XIC), but little progress in
identifying the specific gene involved was made until
the late 1980s.


Discovery 


Realizing that the controlling gene might have the
unique property of being expressed only from the
inactive X chromosome, Willard and his group began
in 1989 a systematic search for X-linked genes that
escape inactivation, concentrating on the region
believed to contain the XIC. In 1991 they discovered a
gene that was expressed from the inactive X alone and
named it X inactive specific transcript (XIST/Xist;
Brown et al., 1991). (Human genes are depicted in
italicized capital letters, such as XIST, while mouse
genes are depicted in italics with an initial capital
followed by lower-case letters, such as Xist. For
simplicity, the all-capitals version will be used here.)
Both the human and mouse XIST transcripts lack any
significant open reading frame and remain in the
nucleus, with the processed transcript coating the
inactive X. Prior to inactivation, XIST is expressed
from both X chromosomes, but in an unstable form
that does not coat either X chromosome. A critical
point in the inactivation process is the stabilization of
the XIST transcript from one X, leading to coating of
that chromosome and its inactivation. At this stage,
XIST transcription from the other X ceases.


Developmental Studies 


If XIST is the gene that initiates X inactivation, its 
expression should precede the other features charac- 
teristic of X inactivation, such as widespread promoter 
methylation, hypoacetylation of histones, late replica- 
tion, and histone macroH2A1.2 association. This tim- 
ing was verified, as XIST expression is first detected at 
the four-cell stage in the mouse (Kay et al., 1993) 
before any obvious signs of X inactivation, and it 
continues to be expressed through later developmen- 
tal stages and into adult life. In the mouse, the early 
expression of XIST is imprinted, with the paternal
allele being exclusively expressed in extraembryonic 
tissues and, accordingly, the paternal X chromosome 
is exclusively inactivated. Later, in the embryo proper, 
XIST expression and inactivation are random. 


Requirement of XIST for X-Inactivation 


Deletion of XIST, including the promoter, prevents 
that chromosome from becoming inactivated, which 
indicates that XIST is required in cis for inactivation to 
occur (Penny et al., 1996). Integration of multiple
copies of an XIST transgene into a murine autosome
can bring about some of the characteristics of
X-inactivation on that autosome (Lee et al., 1996).
These results, along with the findings from
developmental studies, demonstrate that XIST is
necessary and probably sufficient for initiating
X-inactivation.


Determining the Active versus the Inactive X


X-inactivation occurs normally only in cells with 
more than one structurally normal X chromosome. 
In cells with multiple X chromosomes, but a normal 
number of autosomes, only one X remains active. In 
polyploid cells, however, more than one X can remain 
active. This means that the initiating factor must be
involved in ‘counting’ the number of X chromosomes 
relative to the autosome content. Furthermore, the 
paternal X in the mouse is selectively inactivated in 
extraembryonic cells, and in the embryo proper, 
where inactivation is normally random, a choice 
must be made in each cell as to which chromosome 
will remain active or be inactivated. This is the ‘choice’ 
aspect of X-inactivation. 

Some light has been shed on these questions by
XIST knockout experiments in mice. As mentioned
earlier, when the 5’ region of XIST, including the
promoter, is deleted, no XIST RNA is produced and
the chromosome cannot be inactivated. The normal X
is inactivated in about half the cells, and in the
remaining cells both the deleted and the normal X are
active. The cells with both Xs active are not viable in
the embryo. This outcome implies that in about half
the cells the deleted X was chosen for inactivation but
could not be inactivated because of the 5’ deletion.
Most importantly, this result suggests that the 5’ end
of XIST may not be needed for the ‘counting and
choice’ functions of X-inactivation. In contrast to the
5’ deletion, a 3’ deletion of XIST ensures that the
deleted X will be inactivated; this is true even if there
is only a single X in the cell (Clerc and Avner, 1998).
This latter observation indicates that the ‘counting’
function can be destroyed without affecting inactivation.
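The inference drawn from the two knockouts can be made explicit with a toy simulation. The Python sketch below encodes the interpretation given above as three assumed rules: an X carrying the 3’ deletion is always the one silenced; otherwise, in a cell with two Xs, one X is chosen at random; and an X carrying the 5’ deletion cannot be silenced even when chosen, leaving both Xs active (such cells are not viable). The labels "del5" and "del3" and the function names are hypothetical.

```python
import random
from collections import Counter


def inactivated_x(xs, rng):
    """Return the label of the X that is silenced in one cell, or None.

    xs holds hypothetical labels: "normal", "del5" (5' deletion, no XIST RNA)
    or "del3" (3' deletion, which bypasses counting and choice).
    """
    if "del3" in xs:
        return "del3"          # the 3'-deleted X is always silenced, even if alone
    if len(xs) < 2:
        return None            # a single intact X is counted and stays active
    chosen = rng.choice(xs)    # random choice between the two Xs
    if chosen == "del5":
        return None            # chosen but cannot be silenced: both Xs stay active
    return chosen


def tally(xs, n_cells=10000, seed=0):
    rng = random.Random(seed)
    return Counter(inactivated_x(xs, rng) for _ in range(n_cells))


het5 = tally(["normal", "del5"])
print(het5)              # about half "normal"; the rest None (two active Xs)
print(tally(["del3"]))   # a lone 3'-deleted X is still silenced
```

In cells heterozygous for the 5’ deletion, the only surviving pattern is silencing of the normal X, which is the nonrandom outcome described above.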

In humans a mutation in the XIST promoter 
appears to alter the probability of inactivation of the 
chromosome carrying the mutation. If this
interpretation is correct, it would mean that in humans,
in contrast to mice, the 5’ end of XIST has a role in counting and
choice. Alternative interpretations of these data are 
possible and these will have to be tested before the 
human results can be considered to be contrary to the 
murine findings. 

The recent discovery of a gene, TSIX, which is
antisense to XIST, suggests an explanation for the
different functions of the 5’ and 3’ ends of XIST (Lee
et al., 1999). Like XIST, TSIX produces an untranslated
nuclear RNA. It has its own promoter, which is deleted
in the 3’ XIST deletion referred to above, and it is
transcribed at low levels from both Xs prior to
inactivation. As the inactivation process begins, in an
in vitro embryonic culture system, TSIX transcription
is shut down on the chromosome to be inactivated,
while on the other X low-level TSIX and XIST
expression are maintained. The next step appears to
be an enhancement and stabilization of XIST
expression on the X that does not express TSIX. XIST
is then shut down on the other X (destined to be the
active X). When the differentiation process leading to
an active X and an inactive X in the same cell is
complete, TSIX transcription is silenced on both Xs,
and XIST transcription and accumulation occur only
from the inactive X. This series of events implies that
TSIX expression directly blocks XIST accumulation in
cis and that silencing TSIX on one X is all that is
necessary to produce an inactive X chromosome. The
establishment of the active X chromosome requires
that XIST be shut down before TSIX is silenced, so
that XIST will not accumulate.
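The proposed logic, in which TSIX blocks XIST accumulation in cis and the order of silencing events decides each chromosome's fate, can be expressed as a small state model. The Python sketch below is a deliberately simplified reading of the sequence described above: it assumes XIST accumulates on (and silences) a chromosome whenever XIST is still transcribed there and TSIX is off, and it folds the "enhancement and stabilization" step into that single rule. The class, method and event names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class XChromosome:
    name: str
    tsix_on: bool = True        # low-level TSIX transcription before inactivation
    xist_on: bool = True        # low-level, unstable XIST transcription
    xist_coated: bool = False   # stabilized XIST RNA coating this X

    def _update(self):
        # Assumed rule: TSIX blocks XIST accumulation in cis, so XIST
        # accumulates once TSIX is off while XIST is still transcribed.
        if self.xist_on and not self.tsix_on:
            self.xist_coated = True

    def silence_tsix(self):
        self.tsix_on = False
        self._update()

    def shut_down_xist(self):
        if not self.xist_coated:    # a coated X keeps expressing XIST
            self.xist_on = False

    @property
    def inactive(self):
        return self.xist_coated


def run(order):
    """Apply (chromosome, event) steps in order; report which Xs end inactive."""
    xs = {"X1": XChromosome("X1"), "X2": XChromosome("X2")}
    for name, event in order:
        getattr(xs[name], event)()
    return {name: x.inactive for name, x in xs.items()}


# Order described in the text: TSIX off on X1, XIST off on X2, then TSIX off on X2.
normal = run([("X1", "silence_tsix"), ("X2", "shut_down_xist"), ("X2", "silence_tsix")])
# Counterfactual: TSIX silenced on X2 before its XIST is shut down.
wrong = run([("X1", "silence_tsix"), ("X2", "silence_tsix"), ("X2", "shut_down_xist")])
print(normal)  # {'X1': True, 'X2': False}
print(wrong)   # {'X1': True, 'X2': True}
```

Reversing the last two steps leaves both chromosomes coated, which illustrates why the active X requires XIST to be shut down before TSIX is silenced.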


Control of TSIX and XIST Expression 


Since the promoters of both genes are rich in CpG 
dinucleotides, promoter methylation appears to be a 
possible means for switching transcription on and off. 
The XIST promoter has been well studied: in fully
differentiated female cells the XIST promoter on the
active X chromosome is hypermethylated, while the
promoter on the inactive X is hypomethylated. In
mature sperm the XIST promoter is hypermethylated,
while in the oocyte it is hypomethylated. Soon after
fertilization, however, there is a general wave of
demethylation that includes XIST, and current
methylation studies of early mouse embryonic stages
show no XIST methylation from the eight-cell stage
through blastula formation. We expect no XIST
promoter methylation on the inactive X, as XIST is
expressed from that chromosome. On the active X,
however, XIST is silenced, and in fully differentiated in
vitro cultured embryonic cells and in vivo somatic cells
the XIST promoter on the active X chromosome is fully
methylated. Apparently, the initial shutdown of XIST
on the active X is not determined by methylation,
although methylation of the promoter must occur
soon after silencing. The identity of the initial silencing
factor for XIST remains to be determined. As
mentioned earlier, the key to inactivating an X is
silencing TSIX, and limited evidence suggests that
promoter methylation of TSIX may be involved.


Aneuploidy and X-Inactivation 


Another important question is: what are the details of the inactivation process in cells with a normal complement of autosomes but multiple X chromosomes? Most workers assume that one X is first randomly selected to be active and that the remaining Xs are inactivated by a default mechanism. The fact that a single X carrying the 3’ XIST deletion can be inactivated could be considered support for this idea; that is, the 3’ region would contain the site for ‘marking’ and/or ‘counting’ the active X, and its deletion would allow that X to be inactivated. Observations of polyploid cells may also bear on this ‘marking’ event. In a triploid with two Xs and a Y, both Xs remain active, but in a triploid with three Xs, either one or two Xs may be inactivated. That inactivation does not occur in a triploid cell with two Xs and a Y suggests that the ‘counting’ mechanism must involve not only the number of X chromosomes but also an autosomal:X chromosome ratio, as in Drosophila and Caenorhabditis. An excess of autosomes relative to X chromosomes would act to prevent inactivation, as in normal males and in triploids with two Xs and a Y, while an equal or lower autosomal:X chromosome ratio, as in normal females or triploids with three Xs, would result in inactivation. The absence of inactivation in the triploid XXY cell also argues against inactivation being initiated by pairing or interaction between X chromosomes.
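The counting argument above can be made concrete with a toy calculation. The sketch below is a minimal illustration only, assuming that the "autosomal:X ratio" means the number of haploid autosome sets divided by the number of X chromosomes, and that the hypothesized threshold is a ratio of 1; the function names are hypothetical. It predicts only whether inactivation occurs, not how many Xs are inactivated (a triploid with three Xs may inactivate one or two).

```python
from fractions import Fraction


def autosome_to_x_ratio(autosome_sets: int, x_count: int) -> Fraction:
    """Ratio of haploid autosome sets to X chromosomes in one cell (assumed definition)."""
    if autosome_sets < 1 or x_count < 1:
        raise ValueError("a cell needs at least one autosome set and one X")
    return Fraction(autosome_sets, x_count)


def predicts_inactivation(autosome_sets: int, x_count: int) -> bool:
    """Hypothesized rule: an excess of autosomes (ratio > 1) blocks inactivation,
    while an equal or lower ratio allows it."""
    return autosome_to_x_ratio(autosome_sets, x_count) <= 1


# The four karyotypes discussed in the text: (autosome sets, number of Xs).
cases = {
    "normal male (2A, XY)": (2, 1),
    "normal female (2A, XX)": (2, 2),
    "triploid XXY (3A)": (3, 2),
    "triploid XXX (3A)": (3, 3),
}

for name, (a, x) in cases.items():
    ratio = autosome_to_x_ratio(a, x)
    print(f"{name}: A:X = {ratio}, inactivation = {predicts_inactivation(a, x)}")
```

Exact fractions are used so that a ratio such as 3:3 compares as exactly 1 rather than as a rounded float.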

The primary experimental system used to obtain much of the developmental information on XIST is the murine embryonic stem (ES) cell culture system. XX ES cells in an undifferentiated state represent a pre-X-inactivation embryonic state; upon induction of differentiation one X becomes inactivated, mimicking the process in vivo. With respect to the comments just made on ploidy and X-inactivation, it would be very valuable to have ES cultures with more than two X chromosomes and various levels of polyploidy, so that these questions could be explored more fully.


How Does XIST Work?


At the outset it must be stated that we have no defini- 
tive answer to this most important question. There are 
numerous observations and interpretations, however, 
that can help develop a working model. 

As noted, XIST RNA binds to the inactive X. If the 
chromatin is digested away, XIST RNA remains 
bound in the nucleus, suggesting that it is not directly 
in contact with chromosomal DNA, but rather bound 
to protein components of the nuclear matrix, some of 
which must be the chromosomal scaffold. The amount 
of XIST RNA in the nucleus has been calculated to be 
insufficient to cover all the coding regions of the in- 
active X, which implies that XIST RNA does not 
interact with individual genes. Careful cytogenetic 
studies have shown that XIST RNA exhibits a banded 
structure on the murine inactive X chromosome, with 
the RNA being excluded from regions of constitutive 
heterochromatin. 

Silencing of the X is a complex phenomenon involving XIST RNA, promoter methylation, hypoacetylation, late replication, association with histone macroH2A1.2, and undoubtedly other as yet undiscovered factors. The interrelationships and possible interactions of these silencing factors have yet to be fully worked out, but it is already clear that at least some of them have a considerable degree of independence. For example, the XIST gene can be deleted from the inactive X of a somatic cell and inactivation at individual loci is still maintained. Inactivation is less stable, however, when XIST RNA is not properly localized, as is the case in the transformed cells from which XIST was deleted. In such cases it is likely that promoter hypermethylation would be sufficient for silencing. It is also possible that, even in the presence of XIST, a particular gene on the inactive X can escape inactivation if its promoter is hypomethylated and it replicates early. XIST RNA has been shown to colocalize with the histone variant macroH2A1.2 very early in development (Costanzi et al., 2000). It is likely that direct or indirect interactions with other proteins will be found in the near future. One possible mode of action of XIST RNA is that interactions with various proteins could bring about facultative heterochromatinization of the inactive X, thus acting as an initial silencing agent. This could also mark the chromosome for further modifications by additional silencing factors, such as methylation and histone hypoacetylation. Since neither XIST RNA nor macroH2A1.2 is critical for maintaining inactivation, it is also possible that XIST is not directly involved in silencing but acts as an early developmental mark that compartmentalizes the inactive X and permits its modification by silencing agents such as methylation, hypoacetylation, and late replication.


Spreading of the X-Inactivation Signal 


Another important aspect of X-inactivation is how the inactivation signal spreads from the XIC throughout the X. With the discovery of XIST and its role in the initiation of X-inactivation, the question of spreading becomes one of how XIST RNA spreads in cis along the inactive X. As discussed above, XIST RNA is likely to interact with the X chromosome via protein intermediates, and macroH2A1.2, mentioned above, may be such a protein. Since XIST RNA covers a large part of the inactive X and may, under certain conditions, bind to autosomal segments, it is unlikely that the XIST RNA-protein complex binds to a DNA sequence unique to the X chromosome. Lyon (2000) suggested that such an XIST-protein complex might bind to a long interspersed repeat element, LINE-1 (L1); L1 elements, especially the younger ones, are more frequent on the X chromosome than on autosomes. L1 elements are retrotransposons unique to mammals (Burton et al., 1986). Lyon's hypothesis remains to be confirmed.


XIST Expression in Male Meiosis 


XIST RNA has also been detected in male meiosis, 
at a time when the X is going through precocious 
condensation. Eventually all the sperm chromosomes 
become highly condensed. It was natural to think that
XIST expression plays a role in this condensation, but 
the level of XIST expression in spermatogenesis is 
extremely low and could cover only a small fraction 
of the X chromosome. It would be interesting to know 
if TSIX is also expressed at this stage, thereby pre- 
venting any accumulation of XIST RNA. 


Future Questions 


Although the discovery of XIST has led to rapid progress in our understanding of the X-inactivation process, significant questions remain. What does XIST RNA actually do? What is the nature of the autosomal signal in the initiation process? What mechanism switches TSIX and XIST on and off on the active and inactive X chromosomes?

Finally, we can consider the question of whether 
X-inactivation resulting from stable XIST expression 
can be initiated at more than one time in development. 
X-inactivation is initiated at an early and developmen- 
tally specific stage. Its occurrence appears to depend 
on differentiation, because the absence of dosage com- 
pensation is tolerated in early undifferentiated states. 
Can inactivation be induced at a later developmental 
stage in cells that have more than one active X chromo- 
some, such as in a male tumor or in a somatic cell 
culture system? XIST gene expression has been re- 
activated on the active X in transformed somatic cells, 
but without any detectable silencing effect on the rest 
of the chromosome. This may mean that the complete 
inactivation process is restricted to a specific early 
developmental stage. In contrast, there is evidence of 
sex chromatin (Atkin and Baker, 1992) and XIST 
expression (Looijenga et al., 1997) in male germ cell 
tumors with more than one X, although demonstra- 
tion of inactivation of genes on the ‘inactivated X’ has 
not been reported. It is unlikely that these germ cell 
tumors would have been present at the time of normal 
embryonic inactivation; if this work is correct, there- 
fore, it would mean that the inactivation process is not 
restricted in developmental time and place. 
X-ray crystallography is a technique for determining the three-dimensional structure of molecules, including complex biological macromolecules such as proteins and nucleic acids, at atomic resolution. Data are collected by diffracting X-rays from a single crystal, which has an ordered, regularly repeating arrangement of atoms. From the diffraction pattern produced when X-rays scatter off this periodic assembly of molecules or atoms, the electron density within the crystal can be reconstructed.
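The reconstruction step can be written as the standard Fourier synthesis of crystallography, which the entry itself does not state. In the relation below (given here as a textbook formula, not taken from the original article), V is the unit-cell volume, x, y, z are fractional coordinates within the cell, h, k, l are the indices of each diffracted reflection, and F(hkl) is the structure factor. The measured intensity I(hkl) yields only the amplitude |F(hkl)|; the phase α(hkl) is not recorded and must be obtained by other means:

```latex
\rho(x,y,z) = \frac{1}{V}\sum_{h}\sum_{k}\sum_{l}
  \lvert F(hkl)\rvert \, e^{i\alpha(hkl)} \, e^{-2\pi i\,(hx + ky + lz)},
\qquad
I(hkl) \propto \lvert F(hkl)\rvert^{2}
```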

The use of X-ray diffraction patterns in the study of molecular structure dates to the early part of the twentieth century. In 1901, Röntgen received the first Nobel Prize for Physics for the discovery of X-rays. In 1912, von Laue, Friedrich, and Knipping discovered X-ray diffraction by crystals, and this discovery, together with the research of W.H. Bragg and W.L. Bragg on the structure of crystals (for which the 1915 Nobel Prize for Physics was awarded), laid the foundation for the field of X-ray crystallography. In 1953, X-ray diffraction patterns from ordered fibers of DNA were instrumental in Watson and Crick’s discovery of the double-helical structure of DNA.


See also: DNA, History of 
The Y chromosome has been studied extensively in humans and mice; in both species it is required for testis determination and spermatogenesis. In humans the chromosome also contributes to normal somatic development. This brief article considers the organization and gene content of the human Y chromosome, with particular reference to the unique features that have shaped its evolution. The mammalian sex chromosomes evolved from an ancestral pair of autosomes. The dominant male-determining gene arose on the proto-Y and became genetically sequestered from the rest of the genome through the suppression of recombination with the proto-X over most of the length of the proto-Y. This suppression of recombination has led to the rapid degeneration of genes on the Y chromosome and, consequently, to an imbalance in gene dosage between males and females for genes that were carried on both progenitor sex chromosomes and retained as functional loci on the modern X chromosome. The evolution of dosage compensation has restored equivalence of gene expression between males and females for genes on the X chromosome; in most mammals this is achieved by random X-inactivation in the female.

Thus, the Y chromosome differs from other mammalian chromosomes in two fundamental ways. First, as mentioned above, it is the only chromosome that does not recombine along the majority of its length, and second, it is present in only one sex, the male. The evolution of the genetic functions, and therefore the gene content, of the Y chromosome is believed to reflect these basic properties. Two principal theories for the evolution of genes on the Y chromosome have been formulated, each based on one of these two distinctive properties. First, it is theorized that the absence of recombination prevents the segregation of deleterious mutations from advantageous ones, leading to an inevitable deterioration in the genetic content of the nonrecombining region of the Y chromosome (NRY). The lack of Y-linked genetic functions, which originally inspired this theory over 80 years ago, is still the best-known feature of the Y chromosome after sex determination. Second, the male-specific nature of the Y chromosome may promote the accumulation of male-enhancing/female-damaging alleles, known as sexually antagonistic (SA) alleles, leading to the Y chromosome becoming a specialized male chromosome carrying, for example, genes involved in spermatogenesis or in promoting the fertilization success of the male. These two theories are not mutually exclusive, but they pull in opposite directions: the first proposes a loss of genetic information, the second a gain.


Sequence Content of the Human Y Chromosome


Despite the morphological and sequence divergence of the Y chromosome from its ancestral homolog, the X, all mammals retain a small region of strict X-Y homology to permit pairing and correct segregation of the sex chromosomes during male gametogenesis. This segment is known as the pseudoautosomal region (PAR), or pairing region, and humans have two: a major one (PAR1; 2.6 Mb) at the extremities of the X and Y short arms and a lesser one (PAR2; 300-400 kb) at the extremities of the X and Y long arms (see Figure 1). In humans there is an obligatory crossover in PAR1 at each male meiosis, indicating that recombination between the sex chromosomes is unrestricted in this region. Recombination also occurs in PAR2, but at a much lower frequency. The PARs show considerable variation in gene content between different groups of species. The evolution of the sex chromosomes and the establishment of distinct PARs is believed to have occurred both by the acquisition of regions from autosomes onto the X, with subsequent recombination onto the Y, and by the loss of material from the male-determining Y chromosome through rearrangements and degeneration: the addition-attrition model of sex chromosome evolution.

Within the nonrecombining portions of the sex 
chromosomes several other blocks of homology and 
X-Y homologous genes have been described. By 
studying the patterns of homology in different species, 
the events leading to discrete blocks of sequence con- 
servation can be reconstructed. Consequently, regions 
of X-Y homology defined on the modern human sex 
chromosomes represent either the ancient remnants of 
the ancestral pair of autosomes, or reflect exchanges of 
material with the X chromosome that have occurred 
more recently during evolution. These NRY X-Y 
homologous regions have arisen through transfers 
mediated by PAR recombination and subsequent rear- 
rangements on the Y (e.g., inversions) and by direct 
duplication and transposition from the X into the 
NRY. There is evidence to suggest that sequences 
have also been recruited directly to the Y chromosome 
from autosomes through duplicative transpositions; 
for example, the DAZ genes. The result of these
different mechanisms of sequence recruitment to 
the Y has been to create a chromosome where X-Y 
homologies, Y-autosome homologies, and Y-specific 
sequences are interspersed within the euchromatic 
portion of the chromosome. 

Whilst most genes on the X chromosome are sub- 
ject to inactivation on one of the X chromosomes in 
each female cell, this does not apply to genes found to 
be homologous between the X and Y chromosomes 
where both X and Y copies are functional. These genes 
escape X-inactivation and are, therefore, expressed in 
diploid dose in both males and females. This applies to 
almost all genes within the PARs and to X-Y hom- 
ologous genes located in the NRY. Deficits of genes in 
this latter category lead to the features of Turner syn- 
drome (XO females, and females and males with par- 
tial deletions of the X and Y), suggesting that they are 
required in diploid dose in both males and females. 
Other subtle aspects of somatic phenotype may also 
be influenced by such genes. 

The third prominent sequence feature of the Y chromosome is the block of Y long arm heterochromatin that accounts for almost half of the 50 Mb of the human Y. This is composed of two tandem repeat sequences and contains virtually no single-copy DNA sequences. Amongst individual males, there is extreme polymorphism in the size of the Yq heterochromatin, ranging from its complete absence to the occupation of almost half of the Y chromosome DNA content. As no clinical consequences are associated with the absence of Yq heterochromatin, it is believed that no critical genes reside in this part of the chromosome.


Amplification of Sequences on the Y Chromosome


Sequences related to the highly amplified Yq heterochromatin repeats are found in other regions of the genome, but at much lower copy number. This indicates that these sequences became amplified once they had been placed on the Y chromosome. Such amplification is a distinguishing feature of DNA sequences on this chromosome: much of the sequence content of the euchromatic NRY consists of amplified sequence and gene families. It would appear that the absence of recombination may remove restraints upon copy number and that, where this does not compromise chromosome function, amplification is tolerated. There may also be selection for the amplification of genes coding for particular functions; for example, the DAZ and RBM genes, believed to be important for successful spermatogenesis. There may be several reasons for this selection. First, multiple copies of a gene may be selected because they increase the total amount of a gene product whose quality is suboptimal owing to accumulated deleterious mutations. Second, subtle variants of the same amplified gene that act synergistically may have been selected. Third, the inability to reconstruct defective genes through recombination may drive gene amplification, reducing selection against any member of the gene family that suffers an alteration. One or a combination of these possibilities may lie behind the emergence of amplified sequence and gene families on the Y chromosome.


Model of the Y Chromosome 


A bipartite model of the human Y chromosome 
emerges from the above observations. The key 
features are as follows: 


1. The Y has two pairing regions that are strictly homologous with the X and within which recombination, and hence genetic exchange, takes place between the two sex chromosomes.

2. The Y has a nonrecombining region (NRY) composed of the Yq heterochromatin and a euchromatic segment containing an interspersed arrangement of Y-specific, X-Y homologous, and Y-autosome homologous sequences.

3. The Y is populated by amplified sequence and gene families that are likely to have arisen from the nonrecombining status of the majority of the chromosome.

This model is summarized in Figure 1A.


The analysis of the genetic functions encoded by the Y chromosome and of its gene content has been facilitated, first, by extensive deletion mapping and, second, by the cloning of the entire euchromatin (some 30 Mb of DNA) in a series of overlapping yeast, P1, and bacterial artificial chromosomes. Deletion mapping has exploited structural abnormalities of the Y chromosome, together with a wide range of single-copy cloned DNA and STS (sequence-tagged site) markers, to score the presence or absence of Y chromosome regions in different individuals. This has allowed a series of deletion intervals to be correlated with the phenotypes of individuals carrying particular Y chromosome abnormalities. By mapping cloned markers and STSs back onto the physical clone maps, it has been possible to determine which set of clones covers each deletion interval associated with a genetic function defined by a phenotype. These clones provide the basis for determining the gene content of the deletion intervals through a variety of molecular analyses. Figure 2 summarizes the deletion map, the location of genetic functions assigned to the Y, the blocks of homology with the X, and the genes and pseudogenes that have been mapped to the chromosome.
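The logic of assigning a genetic function to a deletion interval can be sketched in a few lines. This is a simplified illustration, not the procedure used in the original mapping studies: it assumes hypothetical STS names (STS1 to STS6) listed in map order, a presence/absence score for each marker in each individual, and a fully penetrant phenotype caused by loss of a single interval. Under those assumptions the candidate interval is the set of markers deleted in every affected individual but in no unaffected one.

```python
from typing import Dict, List, Set

# Hypothetical STS markers, listed in their (assumed) order along the chromosome.
MARKERS: List[str] = ["STS1", "STS2", "STS3", "STS4", "STS5", "STS6"]


def score(deleted: Set[str]) -> Dict[str, bool]:
    """Presence/absence profile for one individual (True = marker detected)."""
    return {m: m not in deleted for m in MARKERS}


def absent(profile: Dict[str, bool]) -> Set[str]:
    """Markers scored as absent (deleted) in one individual."""
    return {m for m, present in profile.items() if not present}


def candidate_interval(affected: List[Dict[str, bool]],
                       unaffected: List[Dict[str, bool]]) -> List[str]:
    """Markers deleted in every affected individual and in no unaffected one."""
    if not affected:
        return []
    shared = set(MARKERS)
    for profile in affected:
        shared &= absent(profile)    # keep only the overlap of all deletions
    for profile in unaffected:
        shared -= absent(profile)    # loss of these markers is tolerated
    return [m for m in MARKERS if m in shared]  # report in map order


patients = [score({"STS3", "STS4", "STS5"}), score({"STS2", "STS3", "STS4"})]
unaffected = [score({"STS4", "STS5", "STS6"})]
print(candidate_interval(patients, unaffected))  # ['STS3']
```

The clones covering the markers returned would then be the starting point for identifying genes in that interval.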


Genes and Phenotypes Mapped to the Y Chromosome


The following is a brief summary of the genes and 
phenotypes that have been assigned to the human Y 
chromosome. These will be listed starting at the Yp 
telomere. 


PAR1

The only definitive phenotype that has been assigned 
to the PAR1 is short stature associated with Turner 
syndrome. Several genes have been assigned to this 
region. These are: 


1. SHOX (a homeodomain-containing gene): mutations have been found in individuals with idiopathic short stature, suggesting its involvement in the short stature of Turner syndrome.

2. CSF2RA encodes the α chain of the GM-CSF (granulocyte-macrophage colony-stimulating factor) receptor heterodimer; its frequent deletion in the M2 subtype of acute myeloid leukemia suggests its involvement in this malignancy.

3. IL3RA encodes the α subunit of the receptor for the cytokine IL-3, which promotes the growth of hematopoietic cells. This receptor shares its β subunit with the GM-CSF and IL-5 receptors.

4. ANT3 encodes an ADP/ATP translocase and is believed to be involved in energy metabolism.

5. ASMT encodes the enzyme acetylserotonin methyltransferase, which catalyses the final step in the synthesis of melatonin (from N-acetylserotonin) in the retina and pineal gland. There have been suggestions that this gene may be involved in affective disorders.

6. XE7 encodes a ubiquitously expressed protein of unknown function.

7. MIC2 encodes a surface antigen expressed on all cells except spermatozoa. It is adjacent to a related pseudogene, MIC2R, that has probably arisen by gene duplication.

8. On the Y, XG represents a nonfunctional copy of its X homolog: only the first three exons are present in the Y-linked sequence. The gene on the X encodes the red blood cell antigen XG.


Yp NRY Euchromatin 

A number of genes have been mapped to the nonrecombining region of the Y chromosome short arm. The majority of these are X-Y homologous genes, but they do not participate in genetic exchange with their X counterparts. Three major genetic functions have been assigned to Yp by deletion mapping: sex determination, to distal Yp close to the boundary with PAR1; the locus (or loci) involved in the lymphoedemic anomalies of Turner syndrome, to the human-specific block of X-Y homology between Yp11.2 and Xq21.3; and the locus causing gonadoblastoma (GBY), to proximal Yp. The genes on Yp are:


1. SRY encodes a transcription factor belonging to the HMG gene family. It has been shown to be the testis-determining factor (TDF), which determines male gonadal development.

2. RPS4Y encodes a ribosomal protein, and it has been suggested that haploinsufficiency of this protein may contribute to aspects of the Turner phenotype. This remains controversial.

3. ZFY encodes a zinc finger protein with the potential to function as a transcription factor. The function of this gene remains unknown.

4. PCDHY encodes a protocadherin that resembles cadherin neuronal receptors. The gene is expressed in the brain and may be involved in forming neuronal networks.
Figure 2 The human Y chromosome (panel: genetic functions/phenotypes).
5. RBM encodes an RNA-binding protein and is
part of an amplified gene family of 30-50 genes 
on the Y. Copies on Yq are involved in germ cell 
differentiation. 

6. TTY1 and TTY2 encode testis-specific transcripts 
with no evident open reading frames. More than 
one locus exists on the Y for both transcripts. 

7. TSPY encodes a testis-specific protein with 
some homology to the SET oncogene, a nuclear 
phosphoprotein. It is believed to be involved in 
germ cell differentiation and it has been suggested 
that it may play a role in the development of germ 
cell tumors; a possible candidate for GBY. The 
gene is part of a tandemly amplified gene family 
with two clusters (TSPYA and TSPYB) on Yp (see 
Figure 2). 

8. PRKY encodes a protein kinase related to the 
cAMP-dependent kinases. Its function is 
unknown. 

9. AMELY encodes amelogenin, a constituent of tooth enamel, and may contribute to tooth size in males. Both PRKY and AMELY map to a second block of homology on Yp with the Xp22.3 region of the X chromosome.

10. PRY (PTP-BL related) encodes a tyrosine phos- 
phatase and is also present at more than one locus 
on the Y. 


Yq NRY Euchromatin 

Several genes have been mapped to the long arm of the human Y chromosome. Two features are evident when the gene content of Yq is considered. First, at least three loci associated with germ cell development and male infertility have been mapped to Yq. This supports the idea that genes controlling spermatogenesis tend to accumulate on the Y. Second, there is an accumulation of pseudogenes homologous to functional genes mapping to the X chromosome. Two further pseudogenes, ASSP6 of the argininosuccinate synthetase gene family and ACTP2 of the actin gene family, have also been assigned to Yq11. The long arm also contains several copies of sequences homologous to retroviruses. Deletion analysis has assigned phenotypes for male infertility (AZFa, AZFb, and AZFc), Turner syndrome skeletal anomalies, and growth (GCY; including tooth size) to Yq. The following have been mapped to the Yq euchromatin:


1. A series of nonfunctional pseudogenes whose functional homologs map to Xp. These are: ribosomal protein RPS24Y; arylsulfatases ARSFY, ARSEY, and ARSDY; glycogenin 2 GLY2P; XG pseudogene XGPY; apical protein Xenopus-like APXLY; and an SP1 transcription cofactor, CRSP2Y (formerly known as EXLM1Y).

2. DFFRY (also known as USP9Y) encodes a ubiquitin-specific protease and is homologous to the Drosophila developmental gene faf.

3. DBY encodes a potential RNA helicase that may be involved in mRNA translation. Both DFFRY and DBY are removed by deletions resulting in the AZFa male infertility phenotype, which is primarily characterized by Sertoli cell-only syndrome. One or both of these genes are, therefore, likely to underpin the AZFa phenotype.

4. UTY encodes a tetratricopeptide repeat protein that may have a role in transcriptional repression.

5. TB4Y is homologous to thymosin β4 and has homologs on Xq22 and various autosomes.

6. KALp is a nonfunctional copy of the X-linked KAL gene, which is responsible for Kallmann syndrome (anosmia and hypogonadism). The sequence of KAL resembles those of cell adhesion molecules, and the gene is believed to have a role in neuronal cell migration.

7. BPY1 (basic protein on the Y) encodes a basic protein of unknown function. The gene is now known as VCY (variable charge protein on the Y) and has been shown to have homologous genes in Xp22.3. There are two Y copies. BPY2 is unrelated but potentially encodes a different basic protein.

8. STSp is a nonfunctional copy of the steroid sulfatase gene located in Xp22.3. Deletion of the X-linked gene leads to the skin condition X-linked ichthyosis. Closely linked to STSp is the pseudogene GS1p, homologous to an X-linked gene of unknown function in Xp22.3.

9. CDY encodes a protein containing a chromodomain and may be involved in remodeling chromatin during the maturation stages of spermatogenesis. There are at least two loci for this gene on the Y.

10. XKRY encodes a protein related to XK, a putative membrane transport protein.

11. SMCY encodes the male-specific HY antigen (HYA) expressed on the surface of male cells. This gene has a homolog mapping to Xp11.2.

12. EIF1AY encodes a translation initiation and elongation factor with homologs in Xq22 and on chromosome 1p.

13. DAZ encodes another RNA-binding protein and has been suggested as a candidate for the AZFc infertility phenotype. Although DAZ and RBM are strong candidates for the AZFc and AZFb phenotypes, respectively, both intervals contain loci for a number of other genes. It should, therefore, be kept in mind that combinations of these genes may underpin the spermatogenic phenotypes associated with these regions.


PAR2

The PAR2 is of recent evolutionary origin, having appeared after the divergence of the chimpanzee and human lineages. Two genes have been mapped to this minor pairing region at the Yq telomere. The first, SYBL1, encodes a synaptobrevin-like protein that may have a function in synaptic vesicle docking. Unlike other potentially functional X-Y homologous genes, its Y copy is specifically inactivated. The second, IL9R, encodes the interleukin 9 receptor.


Conclusion 


The sex chromosomes evolved from a common ances- 
tral pair of homologs as a chromosomal basis for sex 
determination emerged. The suppression of recombin- 
ation outside the regions pairing with the X chromo- 
some has created unique conditions on the differential 
portion of the Y, leading to the rapid degeneration of 
its sequence and gene content. This genetic isolation 
has driven Y chromosome evolution resulting in an 
accumulation of repeated sequence and gene families, 
defective pseudogenes, and functional genes shared 
with the X and autosomes. There is a continuous pro- 
cess of addition and attrition of sequences and genes 
on the Y chromosome, creating a rapidly changing 
genetic content. The absence of recombination over most of the Y chromosome, together with its presence only in males, has produced evolutionary pressures predicted to lead to the accumulation of male-specific functions and dimorphisms on the Y chromosome. Many of the genes and phenotypes assigned to the Y support this prediction, as they have been shown to be associated with male-specific traits such as male sexual development, greater stature, and spermatogenesis. It is expected that the concept of sexual selection (the selection for genes on the Y that confer an advantage in male competition for fertilization success) may account for the presence of other functional genes on the Y chromosome once their biological functions have been clarified.


See also: Sex Determination, Human; 
X Chromosome 
See: Linkage; Y Chromosome (Human)
Yeast artificial chromosomes (YACs) were originally constructed in order to study chromosome behavior in mitosis and meiosis without the complications of manipulating and destabilizing native chromosomes. Structures can be altered on such a nonessential chromosome and their effects studied. There are three essential components for chromosome maintenance and stability (Figure 1): a functioning centromere (CEN); an origin of replication (autonomously replicating sequence, ARS); and telomeres (TEL) at the ends. A technical barrier to building YACs was creating telomeres. CENs and ARSs could be cloned on small plasmids in Escherichia coli and their function tested by shuttling them into yeast. As linear DNA molecules cannot be maintained in E. coli, the TEL component was constructed as inverted repeats of telomere sequence that could resolve into functioning telomeres when moved into yeast. Tetrahymena telomere repeats function as telomeres in yeast and are used in most YAC constructs.

YACs that contain the three essential components 
and are of sufficient size behave as normal chromo- 
somes. They replicate and segregate properly during 
mitosis and meiosis and are affected similarly by 
mutations that alter native chromosome behavior. 
Using YACs we now know that centromere function 
is severely impaired on very short YACs (less than 
20 kb), which may be due to antagonism between 
centromere and telomere functions. 

Figure 1 Basic structure of a yeast artificial 
chromosome. TEL, Tetrahymena telomere-derived 
sequences; (plasmid), sequences derived from a bacterial 
cloning vector such as pBR322; YSM-1 and YSM-2, 
yeast genes for selecting yeast host transformants, 
generally prototrophic markers; ARS, yeast autono- 
mously replicating sequence; CEN, yeast centromere 
DNA. 

Using markers along a pair of homologous YACs, it has been shown 
that crossovers near the telomeres are not sufficient to 
guarantee proper segregation in meiosis, while those 
that are more internal are sufficient. YACs were also 
used to demonstrate distributive segregation, an alter- 
native segregation mechanism originally described in 
Drosophila. This occurs when a chromosome lacks a 
homolog or when homologs have not undergone crossing 
over. YACs continue to be used to 
study the function of chromosome components, and 
as an assay for segregation problems in different 
mutant backgrounds and treatments. 

It was quickly recognized that YACs could pro- 
vide a new cloning vehicle for very large contiguous 
fragments. These could be much bigger than those 
cloned using the conventional cloning vectors of the 
time. Technically this is more difficult than other 
cloning methods, as it requires ligation of three 
molecules (the two vector arms and the insert) and 
uptake of very large fragments by yeast. 
Despite this, several large YAC libraries of various 
genomes, including the human genome, have been constructed. 
These have proven very useful for physical character- 
ization, as well as positional cloning of genes of inter- 
est. The ability of telomeres from other organisms 
to function as telomeres in yeast has led to the clon- 
ing of large terminal fragments of chromosomes as 
‘half-YACs.’ 

The advantages of using YACs go beyond the ability 
to clone large fragments. The high level of homologous 
recombination in yeast, together with methods for 
altering a sequence in virtually any desired way, allows 
specific mutations to be made in the YAC sequences 
without having to resort to subcloning. In many 
cases these altered YACs can be directly moved back 
into the organism or cell type of origin for assaying 
phenotypes. Recombination between YACs that 
partially overlap can be used to generate new YACs 
of larger size, allowing for the building up of contigu- 
ous fragments larger than the original library inserts. 
This is particularly useful when a gene or region of 
interest spans more DNA than a single library insert 
can hold. YACs in excess of 2 megabases have 
been constructed in this fashion. 
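Building a larger YAC by recombination can be sketched as interval merging: each insert is treated as a (start, end) range of genomic coordinates in kilobases, and two inserts can recombine only if they overlap, because homologous recombination needs shared sequence. This is a simplification under stated assumptions: it ignores insert orientation, the need to select for markers on the appropriate vector arms, and chimeric or unstable clones. The function names and coordinates are hypothetical.

```python
def recombine(a, b):
    """Combine two YAC inserts given as (start_kb, end_kb) genomic coordinates.

    The inserts must overlap to share homology; the recombinant is modeled
    as spanning the union of the two inserts.
    """
    (a_start, a_end), (b_start, b_end) = sorted([a, b])
    if b_start >= a_end:
        raise ValueError("inserts do not overlap: no shared homology")
    return (a_start, max(a_end, b_end))


def build_contig(inserts):
    """Fold a chain of overlapping inserts, left to right, into one YAC."""
    ordered = sorted(inserts)
    contig = ordered[0]
    for nxt in ordered[1:]:
        contig = recombine(contig, nxt)
    return contig


if __name__ == "__main__":
    # Three hypothetical library clones, each under 1 Mb.
    clones = [(0, 900), (700, 1600), (1450, 2300)]
    start, end = build_contig(clones)
    print(end - start)  # 2300 (kb), larger than any single insert
```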

The disadvantages of YACs are threefold. The first 
disadvantage is the presence of a significant number of 
chimeric YACs that are due to coligation of fragments 
from different parts of the genome being cloned. A 
second disadvantage is the instability of certain 
sequences in yeast such as the alphoid repeats from 
human centromeres. Finally, YACs, as originally con- 
structed, are not amenable to easy separation from 
yeast genomic DNA for further analysis and ma- 
nipulation. For these reasons, the use of YACs for 
construction of a library in genome projects is being 
replaced by alternatives such as BACs (bacterial 
artificial chromosomes). There is a recent resurgence 
in the use of YACs due to a technique that combines 
the advantages of yeast recombination with the ease of 
manipulation of BACs. Transformation-associated 
recombination can be used to target specific regions 
of genomic DNA for cloning as a YAC. The YAC 
vector can incorporate bacterial sequences that allow a 
large circular YAC to be shuttled into E. coli as a BAC. 
This removes two of the problems associated with 
YACs and may solve the stability problem by utilizing 
an alternative host. 


See also: BAC (Bacterial Artificial Chromosome); 
Saccharomyces Chromosomes 
Charles Yanofsky was born in New York City on 17 
April 1925. He received his BS degree from City 
College of New York and PhD degree from Yale 
University under the guidance of David Bonner. His 
academic career began with his appointment as an 
Assistant Professor at Western Reserve University 
Medical School. He moved to Stanford University in 
1958 where he is currently Herzstein Professor of 
Biological Sciences. His early pioneering studies with 
Neurospora crassa involved suppressor mutations that 
restored the ability of this organism to form an active 
enzyme in a mutant that previously produced an 
inactive protein. Considerably later, working with 
the A protein subunit of tryptophan synthetase in 
Escherichia coli, he established that suppression acts 
by causing errors in amino acid incorporation: the 
specific amino acid responsible for the loss of 
activity in the mutant protein is replaced by one 
that restores enzymatic activity. It was later shown 
that these suppressor 
mutant strains contained altered tRNAs. The switch 
from Neurospora to E. coli allowed Charles Yanofsky 
to carry out fine-structure genetic analysis of a large 
number of isolated tryptophan-requiring A protein 
mutants of E. coli and to isolate sufficient quantities 
of the corresponding mutant proteins for sequence 
analysis. By aligning the order of mutations on the 
genetic map with the order of positions of the amino 
acid changes in the corresponding A protein mutants, 
direct evidence was provided that gene structure 
and protein structure are colinear in bacteria. The 
further analysis of different amino acid changes at 
the same codon position in the A protein provided in 
vivo verification of the genetic code deduced from 
in vitro studies. The genetic and biochemical analysis 
of the tryptophan synthetase system in E. coli subse- 
quently led to the discovery of attenuation as a major 
regulatory mechanism that controls the level of tran- 
scription of the tryptophan operon in response to the 
cellular level of tryptophanyl-tRNA. These funda- 
mental contributions of Charles Yanofsky have been 
key to the rapid development of molecular genetics 
and our basic understanding of mechanisms control- 
ling the flow of information from gene to protein. 
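The colinearity argument amounts to a simple ordering test, sketched here in Python under assumed inputs: each mutant is a pair of (genetic map position, altered residue number). The function name is hypothetical, and the example data are invented for illustration; they are not Yanofsky's measurements.

```python
def is_colinear(mutants):
    """Test gene-protein colinearity for a set of mutants.

    Each mutant is (map_position, residue_number). Colinearity means that
    ordering mutants by map position also orders their amino acid changes.
    Either direction is accepted, because map orientation is arbitrary, and
    ties are allowed, because distinct mutations can alter the same codon.
    """
    residues = [residue for _, residue in sorted(mutants)]
    pairs = list(zip(residues, residues[1:]))
    ascending = all(x <= y for x, y in pairs)
    descending = all(x >= y for x, y in pairs)
    return ascending or descending


if __name__ == "__main__":
    # Hypothetical data: two alleles (0.4 and 0.41) alter the same residue.
    colinear = [(0.1, 15), (0.4, 60), (0.41, 60), (0.9, 180), (1.3, 240)]
    scrambled = [(0.1, 15), (0.4, 180), (0.9, 60)]
    print(is_colinear(colinear))   # True
    print(is_colinear(scrambled))  # False
```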


See also: Tryptophan Operon 
Yeasts are simple, single-celled eukaryotes that pro- 
vide outstanding model systems for understanding 
basic cell biology. The ability to manipulate yeast 
cells in the laboratory depends upon the ability to 
transform them with engineered plasmids and to 
maintain these plasmids within the cell. Naturally 
occurring yeast plasmids provided the original tem- 
plate used to design laboratory vectors. These have 
been further developed so that modern laboratory 
plasmids provide a variety of sophisticated features. 
These plasmids can be used as research tools to study 
yeast biology, or as practical tools to manipulate yeast 
cells. 


Naturally Occurring Plasmids 


Yeast cells have naturally occurring double-stranded 
circular DNA plasmids. In order to persist in a popu- 
lation of growing cells, these plasmids must replicate, 
and they must segregate to both daughter cells dur- 
ing cell division. In the well-studied budding yeast 
Saccharomyces cerevisiae, a naturally occurring plas- 
mid called 2-micron is present at up to 100 copies 
per cell. This plasmid provides a useful example of 
the strategies required to replicate and transmit an 
extrachromosomal element. The 2-micron plasmid 
provides no apparent selective advantage or disadvan- 
tage to the cell that harbors it. It is maintained in the 
nucleus and its DNA is packaged in the same way as 
normal chromatin. It contains four genes, a unique 
origin of DNA replication required for its replication 
once per cell cycle, and a partitioning system, still not 
completely understood, that ensures its transmission 
to both cells during cell division. 

How can a plasmid that is replicated only once per 
cell cycle achieve such high copy number? The struc- 
ture of the 2-micron plasmid provides a clue. It con- 
tains two tracts of repeated sequence that separate 
the molecule into two halves. If these homologous 
regions are aligned with one another and recombin- 
ation occurs between them, the net effect is to flip the 
orientation of one half of the plasmid relative to the 
other. If a bidirectional replication fork is proceeding 
around the plasmid at the time of recombination, with 
the two halves separated by the homologous recom- 
bination region, then this event effectively reverses the 
orientation of one of the forks relative to the other. 
That is, the forks follow one another, rather than 
converge. This rearrangement allows amplification 
by additional replication of the plasmid. Another 
recombination event restores the original orientation, 
and when the forks finally converge, replication 
ceases. The enzyme responsible for these rearrange- 
ments is encoded in the plasmid genome. Similar plas- 
mids have been isolated from a number of yeast 
species, and they all appear to employ the recombin- 
ation method for amplification. The existence of these 
natural plasmids provides yeast geneticists with useful 
molecular tools. 


Engineered Plasmids 


In order for yeast plasmids to be useful in the labora- 
tory, they require several features. First, there must be 
a means of preparing large quantities of pure plasmid 
DNA. For this purpose, recombinant yeast plasmids 
are built as yeast/Escherichia coli shuttle vectors that 
contain a bacterial origin of replication and a bacterial 
selective marker, such as the β-lactamase gene that 
confers ampicillin resistance. With these components, 
large amounts of the recombinant plasmid can be 
manipulated in and purified from E. coli cultures. 
The second requirement is that the plasmid be main- 
tained in the yeast cell. Thus, it requires a yeast origin 
of replication. For S. cerevisiae plasmids, the 2-micron 
origin isolated from the naturally occurring plasmid is 
commonly used. However, chromosomal replication 
origins can also be employed. In fact, the ability of a 
fragment of chromosomal DNA to support plasmid 
maintenance as an autonomously replicating sequence 
(ARS) is one of the definitions of a chromosomal 
replication origin in yeast. Because the ARS elements 
from the S. cerevisiae chromosome are compact, on 
the order of 100 base pairs of DNA, they are easily 
added to a plasmid. The centromeres from S. cerevisiae 
are also sufficiently compact to be encompassed on a 
plasmid, again on the order of a few hundred base 
pairs, so that an ARS + CEN-containing plasmid can 
be maintained and transmitted through mitosis and 
meiosis as a circular minichromosome. 

Not all yeasts provide such handy cellular compon- 
ents. The fission yeast Schizosaccharomyces pombe is 
another popular experimental system. However, na- 
turally occurring plasmids have not been studied in 
this organism, so there is no native equivalent to the 2- 
micron origin. Sequence fragments with ARS function 
from the fission yeast genome have been cloned based 
on their ability to support plasmid maintenance and 
these are commonly used in plasmid constructions. 
These fission yeast ARS elements are considerably larger 
than the replication origins in Sa. cerevisiae, typically 
over a kilobase of DNA. However, the centromeres of 
the fission yeast approach 100 kilobases of DNA, and 
are far too large to be included on any plasmid. There- 
fore, most fission yeast plasmids rely upon a Sc. pombe 
ARS element and random segregation; as a result, they 
suffer a relatively high frequency of loss compared to 
plasmids in Sa. cerevisiae. 

A third necessary feature is a selectable marker, so 
that yeast cells containing the plasmid (‘transform- 
ants’) can be distinguished from those that do not. 
Because yeasts are eukaryotes, they are not sensitive 
to antibacterial drugs such as ampicillin. Instead, the 
plasmids contain wild-type yeast genes to comple- 
ment nutrient-requiring yeast mutants. For example, 
Sa. cerevisiae cells lacking a functional URA3 gene are 
unable to grow in the absence of exogenous uracil. 
However, if the active URA3 gene is included on a 
plasmid, cells that take up the plasmid and maintain it 
will grow in the absence of uracil (‘complemen- 
tation’). URA3 thus provides a positive selection for 
plasmid-containing strains. Similarly, in Sc. pombe, 
the ura4+ gene on the plasmid will complement a 
strain with a ura4 deficiency. Since many standard 
laboratory yeast strains carry multiple auxotrophic 
mutations, and yeast origins do not suffer from 
replication interference, it is possible to transform a 
single strain with several plasmids that differ only in 
their selectable markers. 
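The logic of auxotrophic selection can be expressed as a small predicate. In this sketch, the mapping from omitted nutrient to marker gene is an assumption: URA3 comes from the text, while LEU2, HIS3, and TRP1 are other widely used Sa. cerevisiae markers included for illustration. The function name `grows` is hypothetical.

```python
# Assumed lookup: nutrient -> gene whose loss makes cells require that nutrient.
GENE_FOR_NUTRIENT = {
    "uracil": "URA3",
    "leucine": "LEU2",
    "histidine": "HIS3",
    "tryptophan": "TRP1",
}


def grows(host_mutant_genes, plasmid_markers, medium_lacks):
    """Predict growth of a transformant on a medium missing some nutrients.

    host_mutant_genes: genes nonfunctional in the host (e.g. {"URA3"} for ura3).
    plasmid_markers: wild-type marker genes carried on the plasmids present.
    medium_lacks: nutrients omitted from the medium.
    """
    for nutrient in medium_lacks:
        gene = GENE_FOR_NUTRIENT[nutrient]
        # Without the nutrient, the cell needs the gene product; a plasmid
        # copy of the wild-type gene complements the chromosomal mutation.
        if gene in host_mutant_genes and gene not in plasmid_markers:
            return False
    return True


if __name__ == "__main__":
    host = {"URA3", "LEU2", "HIS3"}  # a ura3 leu2 his3 triple auxotroph
    # Selecting for two plasmids at once: omit both uracil and leucine.
    print(grows(host, {"URA3"}, {"uracil", "leucine"}))          # False
    print(grows(host, {"URA3", "LEU2"}, {"uracil", "leucine"}))  # True
```

Because each marker answers to a different omitted nutrient, a single medium can select for several plasmids in one strain simultaneously.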

A final requirement for yeast plasmids is one of 
size. Because simple methods of plasmid purification 
are more difficult with large molecules, most work- 
able plasmids are at most 20 to 30 kb. Larger plasmids 
exist, but are refractory to simple manipulation in E. 
coli and difficult to transform into yeast. Instead, they 
are manipulated in the yeast and moved from strain to 
strain using classical genetics. 
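The four requirements above can be collected into a checklist. This sketch assumes hypothetical feature labels and treats 30 kb as a practical upper size limit; it checks only that features are present, not that they actually function.

```python
from dataclasses import dataclass, field


@dataclass
class ShuttleVector:
    # Hypothetical feature labels, for illustration only.
    features: set = field(default_factory=set)
    size_kb: float = 0.0


YEAST_ORIGINS = {"2-micron ori", "ARS"}
MAX_PRACTICAL_KB = 30  # upper end of the 20 to 30 kb range given in the text


def check(vector: ShuttleVector) -> list:
    """List unmet requirements for a yeast/E. coli shuttle vector."""
    issues = []
    # 1. Propagation and selection in E. coli.
    if not {"bacterial ori", "bacterial marker"} <= vector.features:
        issues.append("cannot be propagated and selected in E. coli")
    # 2. Maintenance in yeast (integrating vectors omit this on purpose).
    if not vector.features & YEAST_ORIGINS:
        issues.append("no yeast origin: not maintained as an episome")
    # 3. Selection of yeast transformants.
    if "yeast marker" not in vector.features:
        issues.append("no yeast selectable marker")
    # 4. Workable size.
    if vector.size_kb > MAX_PRACTICAL_KB:
        issues.append("too large for routine purification and transformation")
    return issues


if __name__ == "__main__":
    cen_plasmid = ShuttleVector(
        {"bacterial ori", "bacterial marker", "ARS", "CEN", "yeast marker"}, 7.5)
    print(check(cen_plasmid))  # []
```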


Different Plasmids Have Different Uses 


Additional plasmid features depend upon their 
intended use. First, there are simple cloning vectors 
designed to maintain genomic DNA fragments. These 
typically add a set of useful restriction enzyme sites to 
the basic plasmid backbone described in the preceding 
section. Such plasmids are often used to construct 
genomic DNA libraries, in which each plasmid con- 
tains a random fragment of the genome and the pool of 
plasmids represents the entire genome. These libraries 
are useful for cloning genes by complementation of 
a mutant strain. A second class of plasmids allows 
expression of a cloned gene under controlled condi- 
tions. These require a regulated promoter. A number 
of different yeast promoters that can be turned on and 
off in response to particular growth conditions have 
been isolated from both Sa. cerevisiae and Sc. pombe. 
Such expression plasmids can be used not only to 
express yeast genes, but also to express genes from 
other species in yeast cells. By fusing the cloned frag- 
ment to targeting signals, such as secretion signals, a 
heterologous protein can be produced and secreted or 
otherwise directed to specific cellular compartments. 
Yeast cells can therefore be used as factories to pro- 
duce large amounts of heterologous recombinant 
proteins. 

A specialized subset of plasmids comprises those designed 
to integrate into the yeast genome and be maintained 
as part of a chromosome rather than as free episomes 
in the cell; in a formal sense, these are not yeast plas- 
mids at all, because they are only maintained as true 
plasmids in E. coli. Simply removing the yeast replica- 
tion origin will prevent a plasmid from being main- 
tained efficiently as an episome in the yeast cell. 
Because yeast cells are proficient at homologous 
recombination, an integrating plasmid is likely to 
insert at a position in the chromosome that matches 
some sequences on the plasmid, such as the marker, or 
the cloned gene of interest. Integration ensures that 
the plasmid will be present in single copy in the cell, 
and eliminates concerns about copy number variation 
and inefficient transmission through the cell cycle. 
An integrated plasmid is relatively stable, so that 
it is likely to be maintained even in the absence of 
selection, and it can be moved from strain to strain 
genetically as any other chromosomal marker. 


Manipulating Cells with Plasmids 


Once a plasmid is constructed, it can be used in a 
variety of experiments. First, the yeast cells must be 
induced to take up the plasmid in a process referred to 
as transformation, usually involving chemical treat- 
ment or electroporation. The transformed cells are 
plated under selective conditions so that only those 
cells that have successfully taken up and established 
the plasmid and its marker will grow. Subsequently, 
the gene(s) contained on the plasmid can be analyzed 


2164 Yeast Two-Hybrid System 


for ability to complement mutations in the host strain, 
or for phenotypes associated with overproduction. 
Using plasmid libraries of genomic DNA or cDNA 
transformed into mutant strains, the investigator can 
clone genes and suppressors by complementation. For 
example, by transforming the plasmid library into a 
host strain that contains a temperature-sensitive muta- 
tion in an interesting gene, and selecting for growth 
at the restrictive temperature, plasmids that contain 
the wild-type copy of the mutant gene or a suppressor 
of the mutant can be isolated and subsequently char- 
acterized. If a yeast plasmid contains an equivalent 
gene from human cells (for example), the ability 
of that gene to replace the mutated yeast gene can be 
assessed. Such cross-complementation has been used 
to isolate homologous genes from different species. 

However, the presence of a plasmid-borne gene in the 
cell may have unanticipated effects. Expression of a 
toxic gene on a plasmid may provide a negative selec- 
tion that counters the positive selection of the plasmid 
marker. This reduces the efficiency of plasmid main- 
tenance and attenuates viability of the host strain. 
Such toxic phenotypes also confer a genetic selection 
for random mutations that reduce expression or other- 
wise modify the responsible gene or the host strain to 
ameliorate the effects. This adaptability can be turned 
to the investigator’s advantage. An example is a method 
called the ‘plasmid shuffle,’ which exploits the ability 
to select against a plasmid. In this technique, the 
chromosomal copy of an essential gene is inactivated, 
so expression of the gene from a plasmid is essential 
for viability of the cell. The investigator transforms 
the strain with a second plasmid containing a mutant 
derivative of the same gene and a different selectable 
marker. If the missing nutrient is provided, the first 
plasmid’s marker is no longer selected, so that plasmid 
can be lost or actively selected against. Cells survive 
its loss only if the mutant derivative is still functional. 
This provides a way to assess the function of mutations 
in vivo. 
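The plasmid shuffle can be modeled by enumerating which plasmid combinations a cell could keep under a given selection. This sketch assumes that the chromosomal copy of the essential gene is deleted, that the wild-type plasmid carries URA3 and the mutant plasmid LEU2, and that the URA3 plasmid is selected against (in practice, 5-fluoroorotic acid is commonly used for this). The function names are hypothetical.

```python
from itertools import combinations


def surviving_states(plasmids, selected=frozenset(), counterselected=frozenset()):
    """Enumerate plasmid combinations a cell can hold and still grow.

    plasmids: name -> {"marker": gene, "functional": bool}, where "functional"
    says whether that plasmid's copy of the essential gene works. The
    chromosomal copy of the essential gene is assumed to be deleted.
    selected: markers the medium selects for (their plasmids must be kept).
    counterselected: markers selected against (their plasmids must be lost).
    """
    states = []
    names = sorted(plasmids)
    for k in range(len(names) + 1):
        for kept in combinations(names, k):
            markers = {plasmids[n]["marker"] for n in kept}
            if not set(selected) <= markers or markers & set(counterselected):
                continue  # this combination is eliminated by the medium
            if any(plasmids[n]["functional"] for n in kept):
                states.append(set(kept))  # essential function is supplied
    return states


def shuffle_succeeds(mutant_functional):
    """True if cells can survive with only the mutant plasmid."""
    plasmids = {
        "wild-type": {"marker": "URA3", "functional": True},
        "mutant": {"marker": "LEU2", "functional": mutant_functional},
    }
    # Leucine omitted (keep the mutant plasmid); URA3 plasmid selected against.
    return bool(surviving_states(plasmids, {"LEU2"}, {"URA3"}))


if __name__ == "__main__":
    print(shuffle_succeeds(True))   # True: the mutant allele supports growth
    print(shuffle_succeeds(False))  # False: no viable cells after counterselection
```

Without counterselection, a nonfunctional mutant simply forces cells to retain both plasmids, which is why selecting against the first plasmid gives the clearer readout.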

Integrating plasmids provide an opportunity to ma- 
nipulate the yeast chromosome directly. As described 
in the previous section, integration relies upon the 
yeast cell’s proficient homologous recombination. 
This can be exploited to target insertion of DNA to a 
particular locus. Depending on the exact construction, 
an integrating plasmid may be used to replace a geno- 
mic copy of a gene of interest, to insert a mutation or 
an epitope tag, or to physically link a chromosomal 
locus to a selectable plasmid marker. This can be 
important for subsequent genetic analysis. 


Plasmids in yeast thus provide the ability to iden- 
tify unknown genes and examine their function, to 
manipulate the yeast chromosome, and to program 
the yeast cell to produce any protein of interest. 
These episomes provide essential laboratory tools, as 
well as important models for studying extrachromo- 
somal elements. Without them, yeast genetics would 
never have developed as a powerful model system 
for understanding eukaryotic cell biology. 
See also: Saccharomyces cerevisiae (Brewer’s 
Yeast); Saccharomyces Chromosomes; 
Schizosaccharomyces pombe, the Principal Subject 
of Fission Yeast Genetics; Transposable Elements 
The yeast two-hybrid system is a valuable tool used to 
identify interacting proteins. The protein of interest is 
expressed in yeast as a fusion to the DNA-binding 
domain of a transcription factor lacking a transcrip- 
tion activation domain. The DNA-binding fusion 
protein is generally called the bait. The yeast strain 
also contains one or more reporter genes with binding 
sites for the DNA-binding domain. To identify pro- 
teins that interact with the bait, a plasmid library that 
expresses cDNA-encoded proteins fused to a tran- 
scription activation domain is introduced into the 
strain. Interaction of a cDNA-encoded protein with 
the bait results in activation of the reporter genes, 
allowing cells containing the interactors to be identified. 
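The decision rule of a two-hybrid screen can be sketched as follows. Binding is represented by a hypothetical lookup table of interacting pairs, and the names `reporter_on` and `screen` are illustrative; the model ignores practical complications such as baits that activate the reporter on their own.

```python
def reporter_on(dbd_fusion, ad_fusion, interactions):
    """Model reporter activation in a two-hybrid cell.

    dbd_fusion: protein fused to the DNA-binding domain (the bait), or None.
    ad_fusion: protein fused to the activation domain (the prey), or None.
    interactions: set of frozensets naming protein pairs that bind.
    The DNA-binding fusion occupies the reporter's binding sites but cannot
    activate it alone; the activation domain reaches the promoter only by
    binding the bait.
    """
    if dbd_fusion is None or ad_fusion is None:
        return False
    return frozenset((dbd_fusion, ad_fusion)) in interactions


def screen(bait, library, interactions):
    """Return library clones whose activation-domain fusions turn on the reporter."""
    return [prey for prey in library if reporter_on(bait, prey, interactions)]


if __name__ == "__main__":
    # Hypothetical proteins and binding data.
    binds = {frozenset(("BaitX", "PreyB")), frozenset(("PreyA", "PreyC"))}
    print(screen("BaitX", ["PreyA", "PreyB", "PreyC"], binds))  # ['PreyB']
```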


See also: cDNA; DNA-Binding Proteins; Reporter 
Gene 
Avian and mammalian sex chromosomes evolved 
independently (Fridolfsson et al., 1998; Nanda et al., 
1999, 2000) and should therefore have fundamentally 
different sex-determining genes. The sex chromo- 
somes in birds are designated Z and W: the female is 
the heteromorphic (ZW) and the male the homo- 
morphic (ZZ) sex. The average avian Z chromosome 
is a medium-sized macrochromosome. It is not clear 
in birds whether the Z or W chromosome determines 
sex. 
A zig-zag-like structure of the DNA chain that is 
observed in GC-rich segments of DNA, particularly 
those with alternating G and C residues, that form 
left-handed helices. 


Zinc Finger Proteins 
See: DNA-Binding Proteins 
A zoo blot is a technique using Southern blotting to 
evaluate the ability of a DNA probe from one species 
to hybridize with genomic DNA from a variety of 
other species. 


See also: Southern Blotting 
The zygote, or fertilized egg, is a single cell produced 
by fusion of female and male germ cells, that is, the 
unfertilized egg and sperm, respectively. Since germ 
cells undergo meiotic divisions to a haploid state (n) 
during oogenesis and spermatogenesis, fusion of the 
unfertilized egg and sperm (fertilization) restores a 
diploid (2n) number of chromosomes to the zygote. 
In mammals, the second meiotic division of the egg, 
with separation of chromatids, occurs shortly after 
fusion with sperm. At an appropriate time after fertil- 
ization, the zygote begins to divide mitotically, even- 
tually giving rise to a multicellular organism that 
exhibits all of the characteristics of the species. 

Nuclei contributed to the zygote by the unfertilized 
egg and sperm are called female and male pronuclei, 
respectively. In mice, the female pronucleus forms 
at ~7.5 h and the male pronucleus at ~5.5 h after 
fusion of the unfertilized egg and sperm. The two 
pronuclei must come together near the center of the 
zygote and form a single diploid nucleus. The timing 
of nuclear formation varies greatly from one species to 
another; for example, it takes < 1 h in sea urchins and 
> 12 h in mice. However, in mice, pronuclei approach each 
other but do not actually fuse to become a diploid 
nucleus. Rather, pronuclear membranes disappear and 
chromosomes assemble on a spindle. DNA replication 
occurs ~14.5 h after fertilization, as the pronuclei 
migrate toward the center of the zygote. In mice, the 
first cleavage division occurs at ~20 h after fertilization, 
when chromosomes are assembled on a spindle. 

In many animals, sperm contribute a centriole to 
the zygote and this organelle helps to organize the 
first mitotic spindle on which the chromosomes are 
arranged. In this respect, the sperm centriole acts 
as a microtubule-organizing center in the zygote. On 
the other hand, sperm contribute very few of the 
large number of mitochondria found in the zygote 
(<0.01%), ensuring that mitochondrial DNA is 
maternally inherited. 

The zygote is inactive with respect to nascent 
transcription of genomic DNA, although translation 
of maternal transcripts takes place. The onset of tran- 
scription is delayed until after the first cleavage division 
in mammals and after the first 12 cleavage divisions in 
some nonmammals. Presumably, this period of tran- 
scriptional inactivity exhibited by the zygote provides 
time to remodel parental chromosomes. 

In mammals, genomes derived from the unfertil- 
ized egg and sperm appear not to be equivalent. In 
some cases, only the maternally derived allele of a 
particular gene is active, whereas, in others, only the 
paternally derived allele is active (‘genomic imprinting’). 
Some of these genes are absolutely essential for 
normal development. Apparently, as a result of this 
nonequivalence of pronuclei, parthenogenetic (bima- 
ternal), gynogenetic (bimaternal), and androgenetic 
(bipaternal) mammalian zygotes cannot give rise to 
normal fetuses and live births. 
A zygotic lethal gene is a gene that leads invariably or 
almost invariably to the death of an organism prior to 
the reproductive stage. ‘Zygotic’ in this case refers 
to the organism that develops from a single-celled 
diploid zygote. For some zygotic lethal genes, a rare 
individual carrying the gene in a dose that is normally 
lethal survives to the reproductive stage. Such rare 
survivors are called ‘escapers.’ Zygotic lethal genes 
are to be distinguished from gametic 
or haplophasic lethal genes, which exert their effects in 
haploid gametes; from sterile genes, which allow their 
bearers to reach reproductive maturity but render 
them sterile; and from maternal-effect or paternal- 
effect lethal genes, which kill the progeny of affected 
individuals. 

Many zygotic lethal genes result in developmental 
arrest and death at a particular stage during develop- 
ment, referred to as the lethal phase. Zygotic lethal 
genes may thus be classified as embryonic or postem- 
bryonic, depending on their lethal phase. Examples of 
postembryonic zygotic lethal genes in insects include 
larval, pupal and early adult lethals. 

A dominant zygotic lethal gene is lethal when pres- 
ent as a single copy per cell, even when a wild-type 
allele of the same gene is also present in the diploid 
cells. Such lethal genes are rarely studied because they 
cannot be propagated by breeding. Recessive zygotic 
lethal genes are lethal only when they are present in 
the homozygous or hemizygous condition. Indivi- 
duals that are heterozygous for the lethal gene are 
viable because the wild-type allele is dominant to the 
lethal allele. A recessive lethal gene can be maintained 
and propagated in a heterozygous stock. One-fourth 
of the progeny produced from the mating of hetero- 
zygous lethal parents are expected to be homozygous 
for the lethal gene and exhibit the lethal phenotype. 
Two-thirds of the surviving progeny are expected to 
be heterozygous for the lethal gene and can be used 
to propagate the lethal gene. In rare cases, a recessive 
zygotic lethal gene may confer a visible phenotype 
dominantly, in which case individuals heterozygous 
for the lethal can be readily identified. For most lethal 
genes, however, the heterozygous lethal organisms are 
indistinguishable from the homozygous wild-type 
organisms, unless a closely linked marker gene con- 
ferring a visible phenotype is used to track either the 
lethal-bearing chromosome or its non-lethal homo- 
log. Chromosomal rearrangements are often used to 
suppress recombination between the lethal gene and 
the visible tag. 
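The one-fourth and two-thirds figures follow from enumerating gametes, as in this sketch. The notation ('+' for the wild-type allele, 'l' for the recessive lethal allele) and the function name are hypothetical, and exact fractions are used to avoid rounding.

```python
from fractions import Fraction
from itertools import product


def cross(mother, father):
    """Offspring genotype frequencies for one autosomal gene.

    Genotypes are two-character strings: '+' is the wild-type allele and
    'l' the recessive lethal allele.
    """
    counts = {}
    for a, b in product(mother, father):  # every pair of gametes, equally likely
        genotype = "".join(sorted(a + b))  # '+l' and 'l+' are the same genotype
        counts[genotype] = counts.get(genotype, 0) + 1
    total = sum(counts.values())
    return {g: Fraction(n, total) for g, n in counts.items()}


if __name__ == "__main__":
    offspring = cross("+l", "+l")
    lethal = offspring["ll"]  # homozygous lethal class
    survivors = {g: f for g, f in offspring.items() if g != "ll"}
    het_share = survivors["+l"] / sum(survivors.values())
    print(lethal, het_share)  # 1/4 2/3
```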

X-linked zygotic lethal genes have long (since 
1912) been recognized in the fruit fly Drosophila 
melanogaster because the progeny of a heterozygous 
mother mated to a wild-type male exhibit an altered 
sex ratio: half of the sons are hemizygous for the 
zygotic lethal gene and inviable, whereas all of the 
daughters receive a dominant wild-type allele from 
their father. Half of the daughters will be heterozygous 
for the recessive lethal gene; they can be identified 
by the altered sex ratio of their progeny. 
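The altered sex ratio can be derived by enumerating the four zygote classes from a heterozygous mother and a wild-type father. The allele labels and function name are hypothetical; the sketch assumes equal numbers of X- and Y-bearing sperm and no viability effect other than that of the lethal allele.

```python
from fractions import Fraction
from itertools import product


def x_linked_cross(mother=("X+", "Xl"), father=("X+", "Y")):
    """Surviving progeny of a cross involving an X-linked recessive lethal.

    'X+' carries the wild-type allele and 'Xl' the lethal allele. Each
    gamete combination is assumed equally likely. Returns (sex, genotype)
    tuples for the zygotes that survive.
    """
    survivors = []
    for egg, sperm in product(mother, father):
        if sperm == "Y":
            sex, genotype = "son", (egg,)  # hemizygous for the X
        else:
            sex, genotype = "daughter", tuple(sorted((egg, sperm)))
        # Lethal only when no wild-type X is present.
        if "X+" in genotype:
            survivors.append((sex, genotype))
    return survivors


if __name__ == "__main__":
    progeny = x_linked_cross()
    daughters = [g for sex, g in progeny if sex == "daughter"]
    print(Fraction(len(daughters), len(progeny)))  # 2/3
    print(sum(g == ("X+", "Xl") for g in daughters), "of", len(daughters))  # 1 of 2
```

Surviving progeny are therefore two daughters for every son, and half of those daughters carry the lethal allele.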

The phenotype conferred by a conditional zygotic 
lethal gene can be influenced by changes in the growth 
conditions of the organisms carrying it or in the 
genotypic background in which the lethal gene is 
embedded. An example of a zygotic lethal gene influ- 
enced by growth conditions is a gene that causes le- 
thality only when the organism carrying it is raised at 
an elevated (non-permissive or restrictive) tempera- 
ture. In this case, the conditional lethal gene can be 
propagated in homozygous stocks maintained at per- 
missive conditions. A shift to restrictive conditions 
permits analysis of the lethal phenotype. 

Most recessive zygotic lethal genes differ from their 
wild-type alleles by having reduced or no wild-type 
gene activity. Such genes are referred to as vital or 
essential because wild-type gene activity, even if pro- 
vided by a single wild-type gene per cell, is required 
for development of the organism to the reproductive 
stage. Very approximate estimates of numbers of es- 
sential genes have been made for organisms that have 
been intensively studied genetically. For example, it 
has been estimated that the fruit fly D. melanogaster 
and the nematode Caenorhabditis elegans each have 
5000 essential genes and that the mouse Mus musculus 
has 5000-10 000 essential genes. For all three of these 
organisms, it is estimated that there are many more 
genes that are active and unessential. Any essential 
function provided by two or more genes redundantly 
would not have been counted as essential in these 
estimates because a loss-of-function allele of one 
gene within an overlapping set would not have been 
recessive lethal. 

In Drosophila and C. elegans, the earliest stages 
of embryogenesis are controlled largely by genes 
expressed in the mother rather than in the embryo. 
The products of such maternal-effect genes are stored 
in the oocyte prior to fertilization and are needed for 
early embryogenesis, before the activities of many 
zygotic genes are required. 

In general, the essential physiological role played 
by a zygotic lethal gene cannot be deduced simply 
from an analysis of its lethal phenotype. A molecular 
analysis of the gene and its products is usually re- 
quired, as well as other methodologies. Essential genes 
may be required at more than one stage of develop- 
ment, including stages that in normal development 
occur after the lethal phase. The role of an essential 
gene in stages subsequent to the lethal phase may be 
studied in genetic mosaics, in which only some cells 
of an organism are homozygous for the lethal gene. 
Genetic mosaics of D. melanogaster have been used to 
show that only a small proportion of genes repre- 
sented by recessive zygotic lethal alleles are essential 
for the viability of all cells of the animal. 


See also: Balanced Translocation; Chimera; 
Maternal Effect 


